LINEAR MODELS
Linear Regression
$$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n = \mathbf{x}^T \boldsymbol{\beta} \quad (x_0 = 1)$$
ŷ = predicted value
β₀ = intercept (bias term)
βᵢ = coefficient for feature i
X = design matrix [m×(n+1)]: one row per sample, first column all ones for the intercept
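A minimal NumPy sketch of the prediction step (array values and variable names below are illustrative, not from the text): prepend a column of ones so β₀ is absorbed into the matrix product.

```python
import numpy as np

# Illustrative values only; any m x n feature array works.
X_raw = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])                        # m = 3 samples, n = 2 features
X = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])  # prepend bias column x0 = 1
beta = np.array([0.5, 1.0, -2.0])                      # [beta0, beta1, beta2]
y_hat = X @ beta                                       # predictions, shape (m,)
print(y_hat)                                           # [-2.5 -4.5 -6.5]
```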
Cost Function (MSE)
$$J(\boldsymbol{\beta}) = \frac{1}{2m} \sum_{i=1}^{m} (h(x^{(i)}) - y^{(i)})^2 = \frac{1}{2m} \|\mathbf{X}\boldsymbol{\beta} - \mathbf{y}\|^2$$
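A small helper for the cost above, assuming X already carries the bias column (the function name `mse_cost` is mine):

```python
import numpy as np

def mse_cost(X, beta, y):
    """J(beta) = (1 / 2m) * ||X beta - y||^2, matching the formula above."""
    m = y.shape[0]
    r = X @ beta - y
    return (r @ r) / (2 * m)
```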
Minimize using Normal Equation
$$\boldsymbol{\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$
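A sketch of the closed-form fit; `np.linalg.solve` is used instead of a literal inverse for numerical stability, and the helper name is mine:

```python
import numpy as np

def fit_normal_equation(X, y):
    """Solve (X^T X) beta = X^T y without forming an explicit inverse."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# If X^T X is singular (e.g. collinear features), np.linalg.pinv(X) @ y is a common fallback.
```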
Or Gradient Descent
$$\boldsymbol{\beta} := \boldsymbol{\beta} - \alpha \frac{1}{m} \mathbf{X}^T(\mathbf{X}\boldsymbol{\beta} - \mathbf{y})$$
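A sketch of the batch update above (`alpha` and `iters` are illustrative hyperparameters, not prescribed by the text):

```python
import numpy as np

def fit_gradient_descent(X, y, alpha=0.01, iters=1000):
    """Batch gradient descent: beta := beta - alpha * (1/m) * X^T (X beta - y)."""
    m, p = X.shape
    beta = np.zeros(p)
    for _ in range(iters):
        grad = X.T @ (X @ beta - y) / m   # gradient of the MSE cost above
        beta -= alpha * grad
    return beta
```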
Training: O(mn² + n³) via the Normal Equation, O(kmn) via Gradient Descent (k iterations)
Prediction: O(n)
Logistic Regression
$$P(y=1|\mathbf{x}) = \sigma(z) = \frac{1}{1 + e^{-z}} \quad \text{where } z = \boldsymbol{\beta}^T\mathbf{x}$$
σ(z) = sigmoid function
z = linear combination βᵀx
Output: probability ∈ (0, 1)
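A sketch of the forward pass, assuming X includes the bias column (helper names are mine):

```python
import numpy as np

def sigmoid(z):
    """sigma(z) = 1 / (1 + exp(-z)); outputs lie strictly in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, beta):
    """P(y = 1 | x) for each row of X, with X carrying the bias column."""
    return sigmoid(X @ beta)
```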
Cost Function (Negative Log-Likelihood / Cross-Entropy)
$$J(\boldsymbol{\beta}) = -\frac{1}{m} \sum_{i=1}^{m} [y^{(i)}\log(h(x^{(i)})) + (1-y^{(i)})\log(1-h(x^{(i)}))]$$
Gradient (no closed-form solution)
$$\nabla J = \frac{1}{m} \mathbf{X}^T(\sigma(\mathbf{X}\boldsymbol{\beta}) - \mathbf{y})$$
Update rule
$$\boldsymbol{\beta} := \boldsymbol{\beta} - \alpha \nabla J$$
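Putting the cost, gradient, and update together in one sketch (labels y are assumed to be 0/1, X to carry the bias column; `alpha` and `iters` are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy_cost(X, beta, y, eps=1e-12):
    """J(beta) from above; eps guards against log(0)."""
    h = sigmoid(X @ beta)
    return -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))

def fit_logistic(X, y, alpha=0.1, iters=1000):
    """Repeat beta := beta - alpha * grad J, with grad J = (1/m) X^T (sigma(X beta) - y)."""
    m, p = X.shape
    beta = np.zeros(p)
    for _ in range(iters):
        grad = X.T @ (sigmoid(X @ beta) - y) / m
        beta -= alpha * grad
    return beta
```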
Training: O(kmn) with k gradient-descent iterations
Prediction: O(n)