Optimization

Book: Kambo N.S. Mathematical

Classes will be done in two factors.

Course Content
  • Unconstrained Optimization : Convex sets and functions, Optimality conditions: first order and second order, line search methods, least squares, steepest descent, Newton's method, quasi-Newton methods, conjugate gradient methods.

    Pre-requisite : Basic understanding of matrix operations, determinants, derivatives and higher derivatives, partial derivatives, gradients (Mathematics of class 11 and 12 in general). Lecture Plan is here ⇒

    • Lecture 1 (Jan 11) : Vector spaces, Linear combination, Basis, Dimension.

    • Lecture 2 (Jan 18): Convex sets and convex combinations, convex and concave functions, necessary and sufficient conditions for local extrema in functions of one variable

    • Lecture 3 (Jan 19): Gradient of a multivariable function, Hessian Matrix, Positive definite Matrix, Positive Semi-definite Matrix, Negative definite Matrix, Negative Semi-definite Matrix, Necessary and sufficient conditions for Local extrema in functions of several variables, Started discussing Golden Section Method.

    • Lecture 4 (Jan 25): Line Search Methods : Golden section Method and Fibonacci Method

    • Lecture 5 (Feb 1): Definition of Taylor's series, Newton's Method, Quasi Newton's Method

    • Lecture 6 (Feb 2): Directional derivatives, Steepest descent Method (& Quiz - I)

    • Lecture 7 (Feb 8): Orthogonal Vectors, Conjugate directions, Conjugate gradient Method

    • Lecture 8 (Feb 9): System of linear equations, Least square approximations (& Quiz - II)

  • Constrained Optimization : barrier method, penalty method, interior point methods, KKT conditions and Lagrangian duality, simplex, Frank-Wolfe method, applications to dynamic programming and optimal control.

Exams
  • 50% internal

    • 30% for 2 quizzes

      • 2 Feb

      • 9 Feb

    • 20% Assignments

      • 1st deadline ⇒ 23 Feb

  • 50% Main

Material

Lecture 1:(11/01/2025)

Class Recording

Optimization is the act of obtaining the best result under the given circumstances.

Minimize the effort (cost) or maximize the output (profit).

Objective function ⇒ the function to be minimized or maximized, as stated in the problem.

Feasible solution ⇒ a vector satisfying all the constraints of a given problem is called a feasible solution of that problem.

Feasible region ⇒ the collection of all feasible solutions is called the feasible region.

Feasible solutions ⇒ in an LPP, an optimal solution lies at a corner of the feasible region.

Optimal solution ⇒ a feasible solution that attains the minimum or maximum, depending on the problem statement.

Linear Programming Problem (LPP) ⇒ a problem in which the maximum power of the variables is 1, e.g. x + y = 8

$\mathbb{R}^n = \{(x_1, x_2, x_3, \ldots, x_n) \mid x_i \in \mathbb{R}\}$

Non-Linear Programming Problem (NLPP) ⇒ a problem in which the power of some variable is more than 1

  • Vector and tuple

  • Vector space

Lecture 2:(18/01/2025)

Class Recording

Convex set: a set is convex if the line segment joining any two points of the set lies entirely within the set.

Convex combination:

A convex combination is a weighted sum of vectors where the scalars are non-negative and sum to 1 (like convex sets).

Such a combination lies on the line segment joining $v_1$ and $v_2$.

Functions:

f(x) = x

Local minima: a function of one variable f(x) has a local minimum at x* if f(x*) ≤ f(x* + h) for all sufficiently small positive and negative values of h.

Local maxima:

f(x*) ≥ f(x* + h)

If a point satisfies neither condition, it is neither a local minimum nor a local maximum.

Global Minima:

f(x*) ≤ f(x) for all x in the domain of f

Necessary Condition:

x* is a critical point of f(x) if df(x)/dx = 0 at x*.

(The derivative is the slope.)

Sufficient condition:

Use the Second Derivative Test

  • If $f''(x_c) > 0$, $x_c$ is a local minimum.

  • If $f''(x_c) < 0$, $x_c$ is a local maximum.

  • If $f''(x_c) = 0$, further analysis may be needed.
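The second-derivative test above can be sketched in Python; this is a minimal illustration using a finite-difference approximation (the test function, step size, and tolerance are my own illustrative choices, not from the notes):

```python
# Classify a critical point of a one-variable function with the
# second-derivative test, approximating f''(x) by central differences.

def second_derivative(f, x, h=1e-4):
    """Central-difference approximation of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

def classify_critical_point(f, xc, h=1e-4, tol=1e-6):
    d2 = second_derivative(f, xc, h)
    if d2 > tol:
        return "local minimum"
    if d2 < -tol:
        return "local maximum"
    return "inconclusive"  # f''(xc) = 0: further analysis needed

f = lambda x: x**2 + 2 * x               # critical point at x = -1
print(classify_critical_point(f, -1.0))  # local minimum
```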

Convex function(U): (upward parabola) A function f(x) is convex if the graph lies below or on the straight line joining f(x) and f(y) for any two points x and y in the domain.

$f(tx_1 + (1-t)x_2) \le t f(x_1) + (1-t) f(x_2), \quad \forall t \in [0,1]$

Concave function(reverse U): (downward parabola) A function f(x) is concave if the graph lies above or on the straight line joining f(x) and f(y) for any two points x and y in the domain.

$f(tx_1 + (1-t)x_2) \ge t f(x_1) + (1-t) f(x_2), \quad \forall t \in [0,1]$

A function is neither convex nor concave if the condition fails for some pair of points (e.g. a shape with a U and a reverse U together).

If both conditions are satisfied (equality holds), the function is a straight line.

The empty set is a convex set.

A singleton set is a convex set, i.e. {a} is also convex.

Line search method

Lecture 3: (19/01/2025)

Class Recording

Hessian matrix

The Hessian matrix of a function f(x,y) is a square matrix of second-order partial derivatives. It represents the curvature of the function.


Symmetric matrix

A symmetric matrix is a square matrix that is equal to its transpose, meaning the elements across the diagonal are mirror images of each other.

That is, interchanging rows and columns does not change the matrix.

Hessian is always a Symmetric matrix

Determinants

The leading principal minor determinants are calculated from the top-left corner of the matrix,

e.g. det(1) = $a_{11}$, det(2) = $a_{11} a_{22} - a_{12} a_{21}$, and so on...

Positive definite Hessian (Local minima)

all its leading principal minor determinants are positive, i.e. calculating from left to right, all the determinants are positive (none zero).

To have a local minimum at a critical point, the Hessian H must be positive definite.

If the positive-definite pattern holds but some of the values are 0, the matrix is positive semi-definite.

Negative definite Hessian (Local maxima)

all its leading principal minors alternate in sign, starting with negative (none zero). This means: det(1) < 0, det(2) > 0, det(3) < 0, and so on.

If the negative-definite pattern holds but some of the values are 0, the matrix is negative semi-definite.

Saddle

If neither condition is satisfied, the Hessian is indefinite. A saddle point is a point on a multivariable function's graph where the tangent plane is flat, but it is not a local minimum or maximum.
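The leading-principal-minor test above can be sketched in Python; the determinant helper and the sample matrix (the constant Hessian from the Newton example later in these notes) are illustrative choices:

```python
# Classify a symmetric Hessian via leading principal minors, as in the
# notes. Pure-Python determinant (Laplace expansion, fine for small n).

def det(M):
    """Determinant by Laplace expansion along the first row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def classify_hessian(H):
    # k-th leading principal minor: determinant of top-left k x k block
    minors = [det([row[:k] for row in H[:k]]) for k in range(1, len(H) + 1)]
    if all(m > 0 for m in minors):
        return "positive definite (local minimum)"
    if all((m < 0 if k % 2 == 1 else m > 0) for k, m in enumerate(minors, 1)):
        return "negative definite (local maximum)"
    return "indefinite or semi-definite: check further"

H = [[4, 2], [2, 2]]        # minors: 4 > 0, 4 > 0
print(classify_hessian(H))  # positive definite (local minimum)
```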

To determine convexity or concavity using the Hessian matrix:


If the Hessian matrix is constant (i.e., independent of x and y) and positive semidefinite, then the function is convex; similarly, if it is negative semidefinite, the function is concave.


convex / concave

  • A function is convex if its Hessian matrix is positive semi-definite

  • A function is concave if its Hessian matrix is negative semi-definite


Global Minima/Maxima

In some cases, when the Hessian matrix depends on the point (x, y):

if the value of f(x, y) is lowest at (x, y) among all the critical points, then the point (x, y) is the global minimum.

Method

Unimodal function ⇒ f(x) has only one local minimum or maximum (but not both) in the region [a, b] ([a, b] is the interval of search)

Line search method

In a line search method, the function being optimised should be unimodal along the search direction. This means the function has only one minimum or maximum within the search interval, ensuring that a clear optimum exists.

For example, $f(x) = x^2$ is unimodal because it has a single minimum at x = 0.

We will learn 2 methods ⇒ the Golden section method and the Fibonacci method

Goal ⇒ find the max or min of a single-variable unimodal objective function

Equal interval method (deprecated)

In the equal interval method we use divide and conquer: take midpoints of the interval, compare the function values, and discard one of the regions.

Lecture 4: (25/01/2025)

Class Recording

Golden section method

Golden Ratio => 1.618

In Golden section Method we take g = Golden Ratio - 1 = 0.618

g = 0.618

d = g(b − a) ⇒ in place of the midpoint (as in the equal interval method) we calculate d

x1 = a + d

x2 = b − d (note x2 < x1)

Then we calculate f(x1) and f(x2) and compare:

If f(x2) < f(x1), we reject the x1-to-b region

If f(x2) > f(x1), we reject the a-to-x2 region
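The comparison rules above can be sketched in Python, assuming minimization of a unimodal function on [a, b] (the test function (x − 2)² and the tolerance are my own illustrations):

```python
# Golden-section search for the minimum of a unimodal function on [a, b],
# following the notes' convention: x1 = a + d, x2 = b - d, d = g(b - a).

def golden_section(f, a, b, tol=1e-5):
    g = 0.618                       # golden ratio - 1
    while b - a > tol:
        d = g * (b - a)
        x1, x2 = a + d, b - d       # note x2 < x1
        if f(x2) < f(x1):
            b = x1                  # reject the x1-to-b region
        else:
            a = x2                  # reject the a-to-x2 region
    return (a + b) / 2

f = lambda x: (x - 2) ** 2
print(round(golden_section(f, 0, 5), 3))  # 2.0
```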

Fibonacci Search Method

Here we use Fibonacci numbers to get the value of d, and we decide the number of iterations before iterating (that is the change from the Golden section method).

Fibonacci series are ⇒ 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377

Formula

First we need to calculate n; for n we need to find $F_n$ such that

$F_n \geq \frac{(b - a)}{\epsilon}$

ϵ is the desired precision, generally we take ϵ = 0.02 for hand calculations

Now compare $(b-a)/\epsilon$ with the Fibonacci sequence and choose the smallest n such that $F_n$ meets or exceeds the required value. For example ⇒

if $(b-a)/\epsilon$ = 55.8 then we take the index of 89, i.e. n = 12

if $(b-a)/\epsilon$ = 35.2 then we take the index of 55, i.e. n = 11

Now that we know the number of iterations, we use this formula to calculate x,

$x_k = a_k + \frac{F_{n-k}}{F_{n+1}} (b_k - a_k)$

$y_k = b_k - \frac{F_{n-k}}{F_{n+1}} (b_k - a_k)$
  • a, b are the interval bounds.

  • $F_n$ is the Fibonacci number at step n.

  • k is the current iteration.

Now we have [x_k, y_k] as two points; we calculate f(x_k) and f(y_k) and select the new [a, b] similarly to the Golden section method. We repeat this until k = n.
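The choice of n above can be sketched in Python, using the notes' 1-based indexing with $F_1 = 0$ (the function names and the sample intervals are my own illustrations):

```python
# Pick the number of Fibonacci-search iterations n so that
# F_n >= (b - a) / eps, as described in the notes.

def fibonacci_upto(limit):
    """Fibonacci numbers 0, 1, 1, 2, ... up to the first value >= limit."""
    fibs = [0, 1]
    while fibs[-1] < limit:
        fibs.append(fibs[-1] + fibs[-2])
    return fibs

def required_n(a, b, eps=0.02):
    target = (b - a) / eps
    fibs = fibonacci_upto(target)
    # 1-based position (as in the notes, with F_1 = 0) of the first F_n >= target
    return len(fibs)

# Example from the notes: (b - a)/eps = 55.8, first Fibonacci number
# >= 55.8 is 89, which is F_12 when F_1 = 0, so n = 12.
print(required_n(0, 1.116, eps=0.02))  # 12
```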

Lecture 5: (01/02/2025)

Class Recording


Descent Method

The descent method is an optimization technique used to minimize a function. It works by iteratively moving in the direction of the steepest decrease of the function.

Newton's Method

Newton's method is a powerful optimization technique that uses second-order derivatives to find minima or maxima of functions. Here's a structured breakdown of key concepts from the lecture notes:

1. Foundation: Taylor Series Expansion

For an infinitely differentiable function $f(x)$ around a point $a$:

$f(x) \approx f(a) + f'(a)(x-a) + \frac{1}{2}f''(a)(x-a)^2$

This approximation forms the basis for deriving Newton's method[1].

2. Single Variable Newton Method

Algorithm Steps:

  1. Compute first and second derivatives: $f'(x)$, $f''(x)$

  2. Update rule:

$a_{k+1} = a_k - \frac{f'(a_k)}{f''(a_k)}$

  3. Stop when $|f'(x)| < \varepsilon$

Example: $f(x) = x^2 + 2x$

  • $f'(x) = 2x + 2$, $f''(x) = 2$

  • Starting at $a_1 = 0$:

$a_2 = 0 - \frac{2}{2} = -1$

  • Verification: $f'(-1) = 0$ indicates a local minimum[1]

3. Multivariable Extension

For $f(\mathbf{x})$ where $\mathbf{x} = (x_1, x_2, ..., x_n)$:

  1. Compute gradient $\nabla f$ and Hessian $H_f$

  2. Update rule:

$\mathbf{x}_{k+1} = \mathbf{x}_k - H_f^{-1}(\mathbf{x}_k)\nabla f(\mathbf{x}_k)$

Key Components:

  • Gradient: $\nabla f = \left(\frac{\partial f}{\partial x_1}, ..., \frac{\partial f}{\partial x_n}\right)^T$

  • Hessian: $H_f = \left[\frac{\partial^2 f}{\partial x_i \partial x_j}\right]_{n \times n}$

Example Optimization: $f(x,y) = x - y + 2x^2 + 2xy + y^2$

  • Gradient: $\nabla f = \begin{pmatrix} 1+4x+2y \\ -1+2x+2y \end{pmatrix}$

  • Hessian: $H_f = \begin{pmatrix} 4 & 2 \\ 2 & 2 \end{pmatrix}$

  • Starting at (0,0): using $A^{-1} = \frac{1}{ad-bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$, we get $H_f^{-1} = \frac{1}{4}\begin{pmatrix} 2 & -2 \\ -2 & 4 \end{pmatrix}$, so $\mathbf{x}_2 = (0,0) - H_f^{-1}\nabla f(0,0) = (-1, \tfrac{3}{2})$; the gradient vanishes there, and since $f$ is quadratic the minimum is reached in one step.

Key Takeaways:

  • Requires computation of second derivatives

  • Quadratic convergence rate near minima

  • Hessian must be positive definite for minima

  • Matrix inversion can be computationally expensive for high dimensions

This method is particularly effective when good initial estimates are available and second derivatives can be efficiently computed. The lecture examples demonstrate both its power in quick convergence and the computational complexity involved in matrix operations
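The multivariable example above can be checked with a short Python sketch; the gradient and the 2×2 Hessian inverse of $f(x,y) = x - y + 2x^2 + 2xy + y^2$ are hard-coded, and the function names are my own illustrative choices:

```python
# One Newton step for f(x, y) = x - y + 2x^2 + 2xy + y^2,
# whose Hessian H = [[4, 2], [2, 2]] is constant.

def grad(x, y):
    return (1 + 4*x + 2*y, -1 + 2*x + 2*y)

def newton_step(x, y):
    # det(H) = 4, so H^{-1} = (1/4) [[2, -2], [-2, 4]]
    gx, gy = grad(x, y)
    dx = (2*gx - 2*gy) / 4     # first component of H^{-1} g
    dy = (-2*gx + 4*gy) / 4    # second component of H^{-1} g
    return x - dx, y - dy

print(newton_step(0.0, 0.0))   # (-1.0, 1.5): the gradient vanishes here
```

Because the objective is quadratic, a single Newton step lands exactly on the minimizer.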


Lecture 6: (02/02/2025)

Class Recording:

The steepest descent method is an iterative optimization technique used to find the minimum of a function. Here's a structured breakdown of the key concepts and example problem from the lecture notes:


1. Core Concept: Steepest Descent Method

  • Objective: Minimize a function $f: \mathbb{R}^n \rightarrow \mathbb{R}$.

  • Update Rule: $x_{k+1} = x_k + \alpha_k d_k$, where $d_k = -\nabla f(x_k)$ (steepest descent direction) and $\alpha_k$ is the step size.


2. Mathematical Foundations

Directional Derivative

For $f(x)$ at point $x$ in direction $d$: $\text{Directional derivative} = d^T \cdot \nabla f(x)$

  • Descent direction: $d^T \cdot \nabla f(x) < 0$

  • Ascent direction: $d^T \cdot \nabla f(x) > 0$

Gradient Properties

  • Steepest ascent direction: $\nabla f(x)$

  • Steepest descent direction: $-\nabla f(x)$


3. Algorithm Steps

  1. Initialize: Choose starting point $x_1$. Set iteration $i = 1$.

  2. Compute Search Direction: $d_i = -\nabla f(x_i)$.

  3. Find Optimal Step Size: Minimize $f(x_i + \alpha_i d_i)$ with respect to $\alpha_i$.

  4. Update: $x_{i+1} = x_i + \alpha_i d_i$.

  5. Check Stopping Criterion: If $\| f(x_{i+1}) - f(x_i) \| < \epsilon$, stop. Otherwise, increment $i$ and repeat.


4. Example: Minimize $f(x, y) = x^2 - xy + y^2$

Initialization

  • Start at $x_1 = \left(1, \frac{1}{2}\right)$.

  • Gradient:

    $\nabla f(x, y) = \begin{pmatrix} 2x - y \\ -x + 2y \end{pmatrix} \implies \nabla f\left(1, \frac{1}{2}\right) = \begin{pmatrix} \frac{3}{2} \\ 0 \end{pmatrix}$

  • Search direction: $d_1 = -\nabla f(x_1) = \begin{pmatrix} -\frac{3}{2} \\ 0 \end{pmatrix}$.

Iteration 1

  • Minimize $f\left(1 - \frac{3}{2}\alpha_1, \frac{1}{2}\right)$:

    • Solve $\frac{d}{d\alpha_1} f = 0 \implies \alpha_1 = \frac{1}{2}$.

    • Update:

      $x_2 = \left(1, \frac{1}{2}\right) + \frac{1}{2} \begin{pmatrix} -\frac{3}{2} \\ 0 \end{pmatrix} = \left(\frac{1}{4}, \frac{1}{2}\right)$

    • Function values: $f(x_1) = \frac{3}{4}$, $f(x_2) = \frac{3}{16}$.

    • Difference: $\| f(x_2) - f(x_1) \| = 0.56 > 0.05$.

Iteration 2

  • Gradient at $x_2$: $\nabla f\left(\frac{1}{4}, \frac{1}{2}\right) = \begin{pmatrix} 0 \\ \frac{3}{4} \end{pmatrix}$.

  • Search direction: $d_2 = \begin{pmatrix} 0 \\ -\frac{3}{4} \end{pmatrix}$.

  • Step size $\alpha_2 = \frac{1}{2}$.

  • Update: $x_3 = \left(\frac{1}{4}, \frac{1}{2}\right) + \frac{1}{2} \begin{pmatrix} 0 \\ -\frac{3}{4} \end{pmatrix} = \left(\frac{1}{4}, \frac{1}{8}\right)$.

  • Difference: $\| f(x_3) - f(x_2) \| = 0.14 > 0.05$.

Iteration 3

  • Gradient at $x_3$: $\nabla f\left(\frac{1}{4}, \frac{1}{8}\right) = \begin{pmatrix} \frac{3}{8} \\ 0 \end{pmatrix}$.

  • Search direction: $d_3 = \begin{pmatrix} -\frac{3}{8} \\ 0 \end{pmatrix}$.

  • Step size $\alpha_3 = \frac{1}{2}$.

  • Update: $x_4 = \left(\frac{1}{16}, \frac{1}{8}\right)$.

  • Difference: $\| f(x_4) - f(x_3) \| = 0.03 < 0.05$. Stop.
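The worked iterations above can be reproduced with a short Python sketch. For a quadratic $f = \frac{1}{2}x^T Q x$ the exact line-search step is $\alpha = (g^T g)/(g^T Q g)$, a standard identity not derived in the notes; here $Q = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}$, and the function names are illustrative:

```python
# Steepest descent for f(x, y) = x^2 - xy + y^2 with exact line search,
# stopping when |f(x_{k+1}) - f(x_k)| < 0.05 as in the worked example.

def f(x, y):
    return x*x - x*y + y*y

def grad(x, y):
    return (2*x - y, -x + 2*y)

def steepest_descent(x, y, tol=0.05):
    pts = [(x, y)]
    while True:
        gx, gy = grad(x, y)
        # exact step for the quadratic: alpha = (g.g) / (g.Qg)
        qgx, qgy = 2*gx - gy, -gx + 2*gy
        alpha = (gx*gx + gy*gy) / (gx*qgx + gy*qgy)
        xn, yn = x - alpha*gx, y - alpha*gy
        pts.append((xn, yn))
        if abs(f(xn, yn) - f(x, y)) < tol:
            return pts
        x, y = xn, yn

path = steepest_descent(1.0, 0.5)
print(path[1])   # (0.25, 0.5)
print(path[-1])  # (0.0625, 0.125), i.e. (1/16, 1/8) as in the notes
```

The printed iterates match $x_2 = (1/4, 1/2)$ and the final point $x_4 = (1/16, 1/8)$, zigzagging along alternating coordinates exactly as described.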


5. Key Takeaways

  • Step Size Calculation: Critical for convergence; found by minimizing $f(x_k + \alpha d_k)$.

  • Convergence Check: Monitor $\| f(x_{k+1}) - f(x_k) \|$ to decide termination.

  • Oscillation Pattern: The example shows zigzagging toward the minimum due to repeated direction changes in alternating coordinates.


6. Tips for Implementation

  • Use symbolic computation tools (e.g., SymPy) to compute gradients and step sizes.

  • For faster convergence, consider combining with conjugate gradient methods.

  • Verify second derivatives ($\frac{d^2 f}{d\alpha^2}$) to confirm minima during line searches.

Lecture 7: (08/02/2025)

Class Recording:

The Conjugate Gradient Method is an iterative optimization algorithm for minimizing quadratic functions, particularly effective for solving large systems of linear equations. Here's a structured summary of key concepts from the lecture notes:


Quadratic Function Form

The method minimizes functions of the form:

$f(x) = \frac{1}{2} x^T Q x + b^T x + c$

where $Q$ is a positive definite symmetric matrix, $b$ is a vector, and $c$ is a constant.


Conjugate Directions

  • Definition: Non-zero vectors $d_1, d_2$ are $Q$-conjugate if:

    $d_1^T Q d_2 = 0$

  • Key Property: Conjugate directions ensure that the algorithm converges to the minimum in at most $n$ iterations (for $n$-dimensional problems).


Algorithm Steps

  1. Initialization: Start at $x_1$. Compute initial gradient $\nabla f(x_1)$.

  2. First Direction: $d_1 = -\nabla f(x_1)$.

  3. Iterative Update:

    • Compute step size $\alpha_i$ by minimizing $f(x_i + \alpha_i d_i)$.

    • Update $x_{i+1} = x_i + \alpha_i d_i$.

    • Calculate new gradient $\nabla f(x_{i+1})$.

    • Compute next direction:

      $d_{i+1} = -\nabla f(x_{i+1}) + \frac{\|\nabla f(x_{i+1})\|^2}{\|\nabla f(x_i)\|^2} d_i$

  4. Stopping Condition: Terminate when $\|\nabla f(x_k)\| < \varepsilon$.


Example Application

Minimize $f(x, y) = x - y + 2x^2 + 2xy + y^2$ starting at $(0, 0)$:

  1. Gradient:

    $\nabla f(x, y) = \begin{pmatrix} 1 + 4x + 2y \\ -1 + 2x + 2y \end{pmatrix}$

  2. Iteration 1:

    • $\nabla f(0, 0) = (1, -1)$, $d_1 = (-1, 1)$.

    • Optimal $\alpha_1 = 1$, leading to $x_2 = (-1, 1)$.

  3. Iteration 2:

    • $\nabla f(-1, 1) = (-1, -1)$, $d_2 = (0, 2)$.

    • Optimal $\alpha_2 = \frac{1}{4}$, leading to $x_3 = (-1, 1.5)$.

    • $\nabla f(-1, 1.5) = (0, 0)$, confirming optimality.
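The two iterations above can be reproduced with a short Python sketch of the update rule, using the exact step $\alpha = -(g^T d)/(d^T Q d)$ for a quadratic (a standard identity, not derived in the notes); here $Q = \begin{pmatrix} 4 & 2 \\ 2 & 2 \end{pmatrix}$ and the function names are illustrative:

```python
# Conjugate gradient (with the notes' Fletcher-Reeves-style beta) on
# f(x, y) = x - y + 2x^2 + 2xy + y^2; converges in n = 2 steps.

def grad(x, y):
    return (1 + 4*x + 2*y, -1 + 2*x + 2*y)

def conjugate_gradient(x, y, eps=1e-10):
    gx, gy = grad(x, y)
    dx, dy = -gx, -gy
    for _ in range(2):                        # at most n = 2 iterations
        qdx, qdy = 4*dx + 2*dy, 2*dx + 2*dy   # Q d
        alpha = -(gx*dx + gy*dy) / (dx*qdx + dy*qdy)
        x, y = x + alpha*dx, y + alpha*dy
        ngx, ngy = grad(x, y)
        if ngx*ngx + ngy*ngy < eps:           # gradient vanished: stop
            break
        beta = (ngx*ngx + ngy*ngy) / (gx*gx + gy*gy)
        dx, dy = -ngx + beta*dx, -ngy + beta*dy
        gx, gy = ngx, ngy
    return x, y

print(conjugate_gradient(0.0, 0.0))  # (-1.0, 1.5)
```

Tracing by hand gives $\alpha_1 = 1$, $x_2 = (-1, 1)$, then $\beta = 1$, $d_2 = (0, 2)$, $\alpha_2 = 1/4$, $x_3 = (-1, 1.5)$, matching the iterations in the notes.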


Connection to Linear Systems

Solving $Ax = B$ is equivalent to minimizing:

$f(x) = \frac{1}{2} x^T A x - B^T x$

Example: For the system:

$\begin{cases} x + y = 3 \\ x - y = 1 \end{cases}$

the corresponding quadratic function is:

$f(x, y) = \frac{1}{2}x^2 + xy - \frac{1}{2}y^2 - 3x - y$

Setting $\nabla f = 0$ recovers the original equations.


Key Takeaways

  1. Efficiency: Converges in $n$ steps for $n$-dimensional problems.

  2. Q-Conjugacy: Ensures search directions are non-interfering.

  3. Applications: Beyond optimization, it solves linear systems when $A$ is symmetric positive definite.

This method combines gradient descent’s simplicity with conjugate directions’ efficiency, making it powerful for large-scale problems.

Lecture 8: (09/02/2025)

Class Recording:

Least Squares Method Overview

Finds the best-fit line $y = mx + b$ for non-collinear data points by minimizing the sum of squared vertical errors $e = \sum_{i=1}^n (y_i - (mx_i + b))^2$.


Core Components

Error Function: $e = \sum_{i=1}^n (b + mx_i - y_i)^2$

Optimization Approach:

  1. Take partial derivatives of $e$ with respect to $b$ and $m$

  2. Set derivatives to zero for minimization:

    $\begin{cases} \frac{\partial e}{\partial b} = 2\sum (b + mx_i - y_i) = 0 \\ \frac{\partial e}{\partial m} = 2\sum (b + mx_i - y_i)x_i = 0 \end{cases}$

Matrix Formulation

System Setup:

$A = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix},\ X = \begin{bmatrix} b \\ m \end{bmatrix},\ Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$

Normal Equations:

$A^T A X = A^T Y$

$\begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix} \begin{bmatrix} b \\ m \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix}$

Example Solution Walkthrough

Given Points: (0,6), (1,0), (2,0)

  1. Error Function:

    $e = (b-6)^2 + (m+b)^2 + (2m+b)^2$

  2. Partial Derivatives:

    $\frac{\partial e}{\partial b} = 6b + 6m - 12 = 0$

    $\frac{\partial e}{\partial m} = 10m + 6b = 0$

  3. Solve System:

    $\begin{cases} 6b + 6m = 12 \\ 10m + 6b = 0 \end{cases} \Rightarrow b = 5,\ m = -3$

Best Fit Line: $y = -3x + 5$
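The normal equations above can be solved directly; a Python sketch using Cramer's rule, checked against the worked example (the function name is my own illustrative choice):

```python
# Least-squares line fit y = mx + b via the normal equations
# [[n, sum x], [sum x, sum x^2]] [b, m]^T = [sum y, sum xy]^T.

def fit_line(points):
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    # Solve the 2x2 system by Cramer's rule
    det = n * sxx - sx * sx
    b = (sy * sxx - sx * sxy) / det
    m = (n * sxy - sx * sy) / det
    return b, m

b, m = fit_line([(0, 6), (1, 0), (2, 0)])
print(b, m)  # 5.0 -3.0, i.e. y = -3x + 5
```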


Key Takeaways

  1. Geometric Interpretation: Minimizes vertical distances from the points to the line

  2. Matrix Advantage: Systematic solution for any number of points

  3. Uniqueness: Solution is unique if $A^T A$ is invertible (the $x_i$ are not all equal)

  4. Verification: The Hessian matrix $\begin{bmatrix} 6 & 6 \\ 6 & 10 \end{bmatrix}$ is positive definite → true minimum

Applications: Curve fitting, trend analysis, and statistical regression models.


Practice Exercise

Using the normal equations, find the best-fit line for: (5,16), (10,19), (15,23), (20,26), (25,30)

Hint: Set up equations:

$\begin{cases} 5b + 75m = 114 \\ 75b + 1375m = 1885 \end{cases}$

Factor 2 Start

In Factor 2 I make notes topic-wise.
