Classification
For binary classification, \(y\in \{0,1\}\). (When \(y\in \{0,1,2,3,\cdots\}\), the problem is called multiclass classification; we will discuss it later.)
So we use a model called Logistic Regression, whose hypothesis \(h_{\theta}(x)\) should take values between 0 and 1. That is, \(0\leq h_{\theta}(x)\leq 1\).
Hypothesis Representation

Logistic Regression: \(h_{\theta}(x)=g(\theta^{T}x)\), where \(g(z)=\frac{1}{1+e^{-z}}\) is the sigmoid (logistic) function.

Interpretation of Hypothesis Output. The value of \(h_{\theta}(x)\) is the estimated probability that \(y=1\) on input \(x\), parameterized by \(\theta\). That is, \(h_{\theta}(x)=P(y=1\mid x ; \theta)\).
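To make the hypothesis concrete, here is a minimal Python sketch of \(h_{\theta}(x)=g(\theta^{T}x)\) using the standard logistic function \(g(z)=\frac{1}{1+e^{-z}}\). The parameter and feature values are hypothetical, chosen only for illustration, and \(x\) is assumed to carry the bias term \(x_{0}=1\) as its first entry.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)          # safe: z < 0, so e^z is in (0, 1)
    return ez / (1.0 + ez)

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x); x is assumed to include x_0 = 1."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)

theta = [-3.0, 1.0, 1.0]      # hypothetical parameters
x = [1.0, 2.0, 2.0]           # x_0 = 1, followed by two features
p = hypothesis(theta, x)      # estimated P(y = 1 | x; theta), about 0.73
```

Here \(\theta^{T}x=1\), so the output is \(g(1)\approx 0.73\): the model estimates a 73% chance that \(y=1\) for this input.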

Decision Boundary
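With a threshold of 0.5, we predict \(y=1\) when \(h_{\theta}(x)\geq 0.5\), which is the same as \(\theta^{T}x\geq 0\), and \(y=0\) otherwise. The decision boundary is the set of points where \(\theta^{T}x=0\).
Emphasis: The decision boundary is a property of the hypothesis function and its parameters, not a property of the training set.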
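As a small illustration, the sketch below predicts \(y=1\) whenever \(\theta^{T}x\geq 0\), which is equivalent to \(h_{\theta}(x)\geq 0.5\). The parameters are hypothetical; with \(\theta=(-3,1,1)\) the boundary is the line \(x_{1}+x_{2}=3\), and no training data is needed to draw it.

```python
def predict(theta, x):
    """Predict y = 1 when theta^T x >= 0, i.e. when h_theta(x) >= 0.5."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1 if z >= 0 else 0

theta = [-3.0, 1.0, 1.0]                  # hypothetical: boundary is x1 + x2 = 3
above = predict(theta, [1.0, 2.5, 2.5])   # x1 + x2 = 5, on the y = 1 side
below = predict(theta, [1.0, 1.0, 1.0])   # x1 + x2 = 2, on the y = 0 side
```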
Logistic Regression—How to fit the parameters of theta

Cost Function of Logistic Regression:
\(\mathrm{Cost}(h_{\theta}(x), y)=-y\log(h_{\theta}(x))-(1-y)\log(1-h_{\theta}(x))\)
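The following sketch evaluates the per-example cost \(-y\log(h)-(1-y)\log(1-h)\) and its average over a set of examples. The function names `cost` and `average_cost` are illustrative, and the inputs are assumed to satisfy \(0<h<1\) so that the logarithms are defined.

```python
import math

def cost(h, y):
    """Per-example cost: -y*log(h) - (1 - y)*log(1 - h), for 0 < h < 1."""
    return -y * math.log(h) - (1 - y) * math.log(1 - h)

def average_cost(hs, ys):
    """Mean of the per-example costs over all examples."""
    return sum(cost(h, y) for h, y in zip(hs, ys)) / len(ys)

confident_right = cost(0.9, 1)   # small penalty: predicted 0.9, truth is 1
confident_wrong = cost(0.1, 1)   # large penalty: predicted 0.1, truth is 1
```

The cost grows without bound as a confident prediction turns out wrong, which is what pushes the parameters away from such predictions.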
Gradient Descent:
To minimize \(J(\theta)=\frac{1}{m}\sum_{i=1}^{m}\mathrm{Cost}(h_{\theta}(x^{(i)}), y^{(i)})\):
Repeat{
\(\theta_{j}:=\theta_{j}-\alpha\frac{1}{m}\sum_{i=1}^{m}(h_{\theta}(x^{(i)})-y^{(i)})x_{j}^{(i)}\) (simultaneously update all \(\theta_{j}\))
}
And we can see that this algorithm looks identical to the one for linear regression!
But the hypotheses are actually different:
Linear Regression: \(h_{\theta}(x)=\theta^{T}x\)
Logistic Regression: \(h_{\theta}(x)=\frac{1}{1+e^{-\theta^{T}x}}\)
Besides, the parameters need not be updated one at a time in a loop:
“…use a vectorized implementation, so that a vectorized implementation can update all of these \(n+1\) parameters all in one fell swoop.”
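A vectorized version of the gradient descent update can be sketched with NumPy as follows. This is an illustration under assumptions, not a reference implementation: the toy data, the learning rate, and the iteration count are all hypothetical, and `X` is assumed to hold one example per row with a leading column of ones for the bias term.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, iterations=1000):
    """Batch gradient descent for logistic regression.

    X has shape (m, n + 1) with a leading column of ones; y has shape (m,).
    """
    m = X.shape[0]
    theta = np.zeros(X.shape[1])
    for _ in range(iterations):
        h = sigmoid(X @ theta)        # predictions for all m examples
        grad = X.T @ (h - y) / m      # gradient for every theta_j at once
        theta -= alpha * grad         # simultaneous update of all parameters
    return theta

# Hypothetical toy data: one feature plus the bias column.
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = gradient_descent(X, y, alpha=0.5, iterations=5000)
predictions = (sigmoid(X @ theta) >= 0.5).astype(int)
```

The single expression `X.T @ (h - y) / m` replaces the per-parameter sum, so no explicit loop over \(j\) is needed.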
Advanced Optimization
 Gradient Descent
 Conjugate Gradient
 BFGS
 L-BFGS
 ……

Advantages (of the methods other than gradient descent): no need to manually pick \(\alpha\); often faster than gradient descent.

Disadvantages: more complex.
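In practice these optimizers are usually called from a library rather than written by hand. The sketch below assumes SciPy is available and uses `scipy.optimize.minimize` with `method="BFGS"`; the function `cost_and_grad` and the toy data are hypothetical. Note that no learning rate \(\alpha\) is supplied.

```python
import numpy as np
from scipy.optimize import minimize

def cost_and_grad(theta, X, y):
    """Return J(theta) and its gradient for logistic regression."""
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))
    h = np.clip(h, 1e-12, 1.0 - 1e-12)    # keep the logarithms finite
    J = -np.mean(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))
    grad = X.T @ (h - y) / len(y)
    return J, grad

# Hypothetical toy data (not linearly separable, so the minimum is finite).
features = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
X = np.column_stack([np.ones_like(features), features])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])

# jac=True tells SciPy that cost_and_grad returns (cost, gradient).
result = minimize(cost_and_grad, np.zeros(2), args=(X, y),
                  method="BFGS", jac=True)
theta = result.x
```

Passing `method="L-BFGS-B"` instead selects SciPy's limited-memory variant with the same calling convention.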
Multiclass classification: One-vs-all
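 For example, to solve the three-class problem, we can “turn this into three separate two-class classification problems”, training one classifier \(h_{\theta}^{(i)}(x)\) for each class \(i\) to estimate \(P(y=i\mid x;\theta)\).
 On a new input \(x\), to make a prediction, pick the class \(i\) that maximizes \(h_{\theta}^{(i)}(x)\).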
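The one-vs-all procedure can be sketched as follows: train one binary logistic regression classifier per class, then predict the class whose classifier gives the highest output. The sketch uses NumPy with plain gradient descent; the data, learning rate, iteration count, and function names are all hypothetical, and classes are assumed to be labeled \(0,1,2\).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_binary(X, y, alpha=0.3, iterations=5000):
    """Fit one logistic regression classifier with batch gradient descent."""
    theta = np.zeros(X.shape[1])
    for _ in range(iterations):
        theta -= alpha * (X.T @ (sigmoid(X @ theta) - y)) / len(y)
    return theta

def train_one_vs_all(X, labels, num_classes):
    """Row k holds the parameters of the classifier 'class k vs. the rest'."""
    return np.array([train_binary(X, (labels == k).astype(float))
                     for k in range(num_classes)])

def predict(all_theta, x):
    """Pick the class whose classifier outputs the largest probability."""
    return int(np.argmax(sigmoid(all_theta @ x)))

# Hypothetical toy data: three clusters in the plane, bias column first.
points = np.array([[0, 0], [1, 0], [0, 1],
                   [4, 0], [5, 0], [4, 1],
                   [0, 4], [0, 5], [1, 4]], dtype=float)
X = np.column_stack([np.ones(len(points)), points])
labels = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

all_theta = train_one_vs_all(X, labels, num_classes=3)
new_prediction = predict(all_theta, np.array([1.0, 4.5, 0.5]))
```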