Machine Learning Notes 04

 

Classification

  For binary classification, $y\in\{0,1\}$. (It may also be that $y\in\{0,1,2,3,\cdots\}$; that is called a multi-class classification problem, which we will discuss later.)
  So we use a model called Logistic Regression, whose hypothesis $h_{\theta}(x)$ takes values between 0 and 1; that is, $0\leq h_{\theta}(x)\leq 1$.


Hypothesis Representation

  • Logistic Regression:     $h_{\theta}(x)=g(\theta^{T}x)$, where $g(z)=\frac{1}{1+e^{-z}}$ is the sigmoid function
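A minimal sketch of this hypothesis in plain Python (function names are my own):

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)); maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x); theta and x are plain lists of equal length."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)

# g(0) = 0.5, and the output always stays strictly between 0 and 1.
print(sigmoid(0))                          # 0.5
print(hypothesis([1.0, 2.0], [1.0, 0.0]))  # = g(1)
```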

  • Interpretation of the hypothesis output: the value of $h_{\theta}(x)$ is the estimated probability that $y=1$ (on input $x$, parameterized by $\theta$). That is, $h_{\theta}(x)=P(y=1\mid x;\theta)$

  • Decision Boundary
    The decision boundary is the set of points where $h_{\theta}(x)=0.5$, i.e. where $\theta^{T}x=0$: on one side the model predicts $y=1$, on the other $y=0$.

  Emphasis: the decision boundary is a property of the hypothesis function and its parameters $\theta$, not a property of the training set.
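For a concrete (made-up) parameter vector: with $\theta=(-3,1,1)$ and features $(1, x_1, x_2)$, we have $\theta^{T}x=-3+x_1+x_2$, so the line $x_1+x_2=3$ is the decision boundary:

```python
def predict(theta, x):
    """Predict y = 1 iff h_theta(x) >= 0.5, which holds iff theta^T x >= 0."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1 if z >= 0 else 0

theta = [-3.0, 1.0, 1.0]                # illustrative values; boundary: x1 + x2 = 3
print(predict(theta, [1.0, 2.0, 2.0]))  # x1 + x2 = 4 > 3 -> 1
print(predict(theta, [1.0, 1.0, 1.0]))  # x1 + x2 = 2 < 3 -> 0
```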


Logistic Regression: How to Fit the Parameters $\theta$

  • Cost Function of Logistic Regression:
    $\mathrm{Cost}(h_{\theta}(x), y)=-y\log(h_{\theta}(x))-(1-y)\log(1-h_{\theta}(x))$
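A minimal sketch of this cost in plain Python (function names are mine); note how the cost stays near zero for a confident correct prediction and blows up for a confident wrong one:

```python
import math

def cost(h, y):
    """Per-example cost: -y*log(h) - (1-y)*log(1-h), for h in (0, 1) and y in {0, 1}."""
    return -y * math.log(h) - (1 - y) * math.log(1 - h)

def total_cost(hs, ys):
    """J(theta) = (1/m) * sum of the per-example costs over the training set."""
    return sum(cost(h, y) for h, y in zip(hs, ys)) / len(ys)

# A confident correct prediction is cheap; a confident wrong one is expensive.
print(cost(0.99, 1), cost(0.01, 1))
```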

  • Gradient Descent:
    To minimize $J(\theta)$:

   Repeat {
      $\theta_{j}:=\theta_{j}-\alpha\frac{1}{m}\sum_{i=1}^{m}(h_{\theta}(x^{(i)})-y^{(i)})x_{j}^{(i)}$   (simultaneously for every $j$)
   }
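This loop can be sketched in plain Python (the toy data and names below are my own, purely illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_step(theta, X, y, alpha):
    """One simultaneous update: theta_j -= alpha * (1/m) * sum_i (h(x_i) - y_i) * x_ij."""
    m = len(y)
    h = [sigmoid(sum(t * xij for t, xij in zip(theta, xi))) for xi in X]
    return [tj - alpha * sum((h[i] - y[i]) * X[i][j] for i in range(m)) / m
            for j, tj in enumerate(theta)]

# Tiny toy set: each row is (1, feature); y = 1 for the larger feature values.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 4.0], [1.0, 5.0]]
y = [0, 0, 1, 1]
theta = [0.0, 0.0]
for _ in range(1000):          # the "Repeat { ... }" loop
    theta = gradient_step(theta, X, y, alpha=0.5)
```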

  And we can see that this algorithm looks identical to linear regression!
  But actually, their hypotheses are different.

 Linear Regression: $h_{\theta}(x)=\theta^{T}x$
 Logistic Regression: $h_{\theta}(x)=\frac{1}{1+e^{-\theta^{T}x}}$

Besides,

“…use a vectorized implementation, so that a vectorized implementation can update all of these parameters in one fell swoop.”
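The vectorized update the quote refers to can be sketched with NumPy (a sketch; variable names are mine): $\theta:=\theta-\frac{\alpha}{m}X^{T}(g(X\theta)-y)$, which updates every $\theta_j$ in a single step with no explicit loop over $j$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def vectorized_step(theta, X, y, alpha):
    """theta := theta - (alpha/m) * X^T (g(X theta) - y): all parameters at once."""
    m = len(y)
    return theta - (alpha / m) * X.T @ (sigmoid(X @ theta) - y)

# Same illustrative toy set as before; first column of X is the bias term.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 4.0], [1.0, 5.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = np.zeros(2)
for _ in range(1000):
    theta = vectorized_step(theta, X, y, alpha=0.5)
```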


Advanced Optimization

  • Gradient Descent
  • Conjugate Gradient
  • BFGS
  • L-BFGS
  • ……
  • Advantages: no need to manually pick $\alpha$; often faster than gradient descent;

  • Disadvantages: More complex;
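In practice these optimizers are usually called from a library rather than hand-coded. A minimal sketch with SciPy (the toy quadratic cost below is my own example, not the logistic cost): you supply the cost and its gradient, and the optimizer chooses the step size itself.

```python
import numpy as np
from scipy.optimize import minimize

def cost(theta):
    """Toy convex cost with its minimum at theta = (1, -2)."""
    return (theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2

def grad(theta):
    """Gradient of the toy cost, supplied so BFGS can use it directly."""
    return np.array([2.0 * (theta[0] - 1.0), 2.0 * (theta[1] + 2.0)])

# No learning rate alpha to tune; BFGS handles the step size internally.
res = minimize(cost, x0=np.zeros(2), jac=grad, method="BFGS")
print(res.x)  # close to [1, -2]
```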


Multi-class classification: One-vs-all

  • For example, to solve a three-class problem, we can “turn this into three separate two-class classification problems.”
  • On a new input $x$, to make a prediction, pick the class $i$ that maximizes $h_{\theta}^{(i)}(x)$.
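The one-vs-all prediction step can be sketched like this (plain Python; the per-class parameter vectors are made up for illustration, not trained):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def h(theta, x):
    """One binary classifier's output h_theta(x) = g(theta^T x)."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

def predict_one_vs_all(thetas, x):
    """Run every class's classifier on x and pick the class i maximizing h_theta_i(x)."""
    return max(range(len(thetas)), key=lambda i: h(thetas[i], x))

# One parameter vector per class (illustrative values).
thetas = [[2.0, -1.0], [0.0, 0.5], [-2.0, 1.0]]
print(predict_one_vs_all(thetas, [1.0, 0.0]))  # class 0 scores highest here
```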
