Multivariate Linear Regression
- Hypothesis: \( h_{\theta}(x) = \sum_{i=0}^{n}\theta_{i}x_{i} \) (with \(x_{0}=1\))
\( x=\begin{bmatrix}x_{0}\\ x_{1}\\ \vdots\\ x_{n}\end{bmatrix}\in \mathbb{R}^{n+1},\quad \theta=\begin{bmatrix}\theta_{0}\\ \theta_{1}\\ \vdots\\ \theta_{n}\end{bmatrix}\in \mathbb{R}^{n+1}\)
Also, \(h_{\theta}(x) = \theta^{T}x\)
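A minimal NumPy sketch of the vectorized hypothesis; the feature values and parameters below are made up for illustration:

```python
import numpy as np

# Hypothetical example: n = 2 features plus the bias term x_0 = 1
theta = np.array([1.0, 0.5, -2.0])   # [theta_0, theta_1, theta_2]
x = np.array([1.0, 3.0, 4.0])        # [x_0 = 1, x_1, x_2]

# h_theta(x) = theta^T x
h = theta @ x
print(h)  # 1.0 + 0.5*3.0 - 2.0*4.0 = -5.5
```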
Gradient Descent
- Algorithm:
Repeat{
\(\theta_{j}:=\theta_{j}-\alpha\frac{1}{m}\sum_{i=1}^{m}(h_{\theta}(x^{(i)})-y^{(i)})x_{j}^{(i)}\)
(simultaneously update \(\theta_{j}\) for every \(j=0,\dots,n\))
}
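A sketch of the vectorized update, assuming a design matrix X whose first column is all 1s; the toy data and the gradient_descent helper name are my own, not from the course:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=1000):
    """Batch gradient descent for linear regression.

    X is the m x (n+1) design matrix with x_0 = 1 in the first column,
    y is the m-vector of targets. All theta_j are updated simultaneously
    in one vectorized step per iteration.
    """
    m = len(y)
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        errors = X @ theta - y                        # h_theta(x^(i)) - y^(i) for all i
        theta -= alpha * (1.0 / m) * (X.T @ errors)   # simultaneous update of all theta_j
    return theta

# Toy data (made up): y = 2 + 3*x1
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 5.0, 8.0, 11.0])
print(gradient_descent(X, y, alpha=0.1, num_iters=5000))  # approximately [2., 3.]
```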
Feature Scaling
- Idea: Make sure features are on a similar scale.
- Mean normalization
Replace \(x_{i}\) with \(x_{i}-\mu_{i}\) to make features have approximately zero mean (do not apply to \(x_{0}\), which we suppose equals 1)
\(x_{i}:=\frac{x_{i}-\mu_{i}}{s_{i}}\quad(i\neq 0)\)
(\(\mu_{i}\): average value of \(x_{i}\) in the training set; \(s_{i}\): range (max \(-\) min) or standard deviation)
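A small sketch of mean normalization, assuming the standard deviation is used for \(s_{i}\); the house-size/bedroom numbers are made up:

```python
import numpy as np

def mean_normalize(X):
    """Mean normalization: x_i := (x_i - mu_i) / s_i for each feature column.

    X here holds only the real features (no x_0 = 1 column); s_i is taken
    as the standard deviation, though the range (max - min) also works.
    """
    mu = X.mean(axis=0)
    s = X.std(axis=0)
    return (X - mu) / s, mu, s

# Made-up example: house size (sq. feet) and number of bedrooms
X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [2400.0, 4.0],
              [1416.0, 2.0]])
X_norm, mu, s = mean_normalize(X)
print(X_norm.mean(axis=0))  # approximately [0, 0]
```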
Learning Rate
- If \(\alpha\) is too small: slow convergence;
- If \(\alpha\) is too large: \(J(\theta)\) may not decrease on every iteration and may not converge (slow convergence is also possible); see the sketch below.
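A rough sketch with made-up toy data that tracks \(J(\theta)\) for a few candidate values of \(\alpha\); the largest one makes \(J(\theta)\) grow instead of decrease:

```python
import numpy as np

# Toy data (made up): y = 2 + 3*x1, with x_0 = 1 already in the first column
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 5.0, 8.0, 11.0])
m = len(y)

for alpha in (0.01, 0.1, 0.5):                       # candidate learning rates
    theta = np.zeros(2)
    J = []
    for _ in range(100):
        theta -= alpha / m * (X.T @ (X @ theta - y))             # gradient step
        J.append(((X @ theta - y) ** 2).sum() / (2 * m))         # cost J(theta)
    decreasing = all(a >= b for a, b in zip(J, J[1:]))
    print(f"alpha={alpha}: J decreases every iteration: {decreasing}, final J={J[-1]:.4g}")
```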
Normal Equation: solve for \(\theta\) analytically
- For the input, add a column of 1s on the left (the \(x_{0}\) feature) to form the design matrix \(X\);
- Then, \(\theta = (X^{T}X)^{-1}X^{T}y\)
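A minimal sketch of the normal equation on made-up data; np.linalg.solve is used instead of forming the inverse explicitly, which is numerically preferable but gives the same \(\theta\):

```python
import numpy as np

# Made-up training data with one real feature x1
x1 = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([2.0, 5.0, 8.0, 11.0])

# Add the column of 1s on the left (the x_0 feature) to build the design matrix X
X = np.column_stack([np.ones_like(x1), x1])

# theta = (X^T X)^{-1} X^T y, computed by solving the linear system
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # approximately [2., 3.]
```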
The difference between Gradient Descent and Normal Equation: