Jellyland

Machine Learning Notes 02

 

Linear Regression with one variable


Gradient descent

Goal: minimize \(J(\theta_{0},\theta_{1})\) or, more generally, \(J(\theta_{0},\theta_{1},\cdots,\theta_{n})\)

   Repeat until convergence {
      \( \theta_{j}:=\theta_{j}-\alpha\frac{\partial }{\partial \theta_{j}}J(\theta_{0},\theta_{1})\quad(\text{for } j=0 \text{ and } j=1)\)
   }
    ( \(:=\) denotes assignment     \(\alpha\) is the learning rate )

Warning: \(\theta_{0}\) and \(\theta_{1}\) should be updated simultaneously!
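A minimal sketch of what "simultaneously" means in code (plain Python; `grad0` and `grad1` are hypothetical functions standing in for the two partial derivatives):

```python
# One simultaneous gradient descent step (grad0/grad1 are illustrative stand-ins
# for the partial derivatives of J with respect to theta0 and theta1).
def gradient_step(theta0, theta1, alpha, grad0, grad1):
    # Compute both updates from the *old* parameter values first...
    temp0 = theta0 - alpha * grad0(theta0, theta1)
    temp1 = theta1 - alpha * grad1(theta0, theta1)
    # ...then assign, so the theta1 update never sees an already-updated theta0.
    return temp0, temp1
```

Updating \(\theta_{0}\) in place and then computing \(\theta_{1}\) from it would mix old and new values and is not the same algorithm.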

  In particular, when gradient descent is applied to linear regression, the partial derivatives become

     \(\frac{\partial }{\partial \theta_{j}}J(\theta_{0},\theta_{1})=\frac{1}{n}\sum_{i=1}^{n}\bigl(h_{\theta}(x^{(i)})-y^{(i)}\bigr)\,x_{j}^{(i)}\)
where \(n\) is the number of training examples, and for \(j=0\) we take \(x_{0}^{(i)}=1\).
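
Putting the update rule and these derivatives together gives batch gradient descent for one-variable linear regression. The sketch below assumes the usual squared-error cost \(J=\frac{1}{2n}\sum_{i}(h_{\theta}(x^{(i)})-y^{(i)})^{2}\); the function name, learning rate, and iteration count are illustrative choices, not fixed by the notes:

```python
import numpy as np

# Rough sketch: batch gradient descent for h_theta(x) = theta0 + theta1 * x,
# assuming the squared-error cost J = (1/2n) * sum((h(x) - y)^2).
def gradient_descent(x, y, alpha=0.01, num_iters=1000):
    n = len(y)
    theta0, theta1 = 0.0, 0.0
    for _ in range(num_iters):
        h = theta0 + theta1 * x           # h_theta(x^{(i)}) for every example
        error = h - y                     # (h_theta(x^{(i)}) - y^{(i)})
        grad0 = error.sum() / n           # x_0^{(i)} = 1
        grad1 = (error * x).sum() / n     # x_1^{(i)} = x^{(i)}
        # simultaneous update of both parameters
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

# Quick check on synthetic data where y is roughly 2 + 3x.
x = np.linspace(0, 1, 50)
y = 2 + 3 * x + 0.05 * np.random.randn(50)
print(gradient_descent(x, y, alpha=0.5, num_iters=5000))
```

With a suitable learning rate the two parameters should approach the values used to generate the data; too large an \(\alpha\) makes the updates diverge, too small an \(\alpha\) makes convergence slow.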

#Machine Learning