Optimization
Book: Kambo N.S. Mathematical
Classes will be held in two factors.
Course Content
Unconstrained Optimization : Convex sets and functions, optimality conditions (first order and second order), line search methods, least squares, steepest descent, Newton's method, quasi-Newton methods, conjugate gradient methods.
Pre-requisite : Basic understanding of matrix operations, determinants, derivatives and higher derivatives, partial derivatives, and gradients (Mathematics of classes 11 and 12 in general). Lecture Plan is Here ⇒
Lecture 1 (Jan 11) : Vector spaces, Linear combination, Basis, Dimension.
Lecture 2 (Jan 18): Convex sets and convex combinations, convex and concave functions, necessary and sufficient conditions for local extrema in functions of one variable
Lecture 3 (Jan 19): Gradient of a multivariable function, Hessian Matrix, Positive definite Matrix, Positive Semi-definite Matrix, Negative definite Matrix, Negative Semi-definite Matrix, Necessary and sufficient conditions for Local extrema in functions of several variables, Started discussing Golden Section Method.
Lecture 4 (Jan 25): Line Search Methods : Golden section Method and Fibonacci Method
Lecture 5 (Feb 1): Definition of Taylor's series, Newton's Method, Quasi Newton's Method
Lecture 6 (Feb 2): Directional derivatives, Steepest descent Method (& Quiz - I)
Lecture 7 (Feb 8): Orthogonal Vectors, Conjugate directions, Conjugate gradient Method
Lecture 8 (Feb 9): System of linear equations, Least square approximations (& Quiz - II)
Constrained Optimization : barrier method, penalty method, interior point methods, KKT method and Lagrangian Duality, simplex, Frank and Wolfe method, applications to dynamic programming and optimal control.
Lecture 1: (11/01/2025)
Optimization is the act of obtaining the best result under the given circumstances.
Minimize the effort ( cost ) or Maximize the output ( Profit ).
Objective function ⇒ the function to be minimized or maximized, as stated in the problem.
Feasible solution ⇒ a vector satisfying all the constraints of a given problem is called a feasible solution of that problem.
Feasible region ⇒ the collection of all feasible solutions is called the feasible region.

Optimal solution ⇒ the feasible solution that attains the best (min/max) objective value, depending on the problem statement.
In an LPP, an optimal solution lies at a corner of the feasible region.

Linear Programming Problem (LPP) ⇒ a problem in which the maximum power of each variable is 1, e.g. x + y = 8
Non-Linear Programming Problem (NLPP) ⇒ a problem in which some variable appears with power greater than 1
Vector and tuple
Vector space
Lecture 2: (18/01/2025)
Convex set: a set is convex if the line segment joining any two points of the set lies entirely within the set.
Convex combination:
A convex combination is a weighted sum of vectors in which the scalars are non-negative and sum to 1; a set is convex exactly when it contains all convex combinations of its points.

Functions:
f(x) = x
Local minima: A function of one variable f(x) has a local minimum at x* if f(x*) <= f(x* + h) for all sufficiently small positive and negative values of h.
Local maxima:
f(x*) >= f(x* + h) for all sufficiently small positive and negative values of h.
If a point satisfies neither condition, it is neither a local minimum nor a local maximum.
Global Minima:
f(x*) <= f(x) for all x in the domain of f.
Necessary Condition:
x* can be a local extremum of f(x) only if df/dx = 0 at x* (the derivative, i.e. the slope of the tangent, vanishes there).
Sufficient condition:
Use the Second Derivative Test
If f′′(x_c) > 0, then x_c is a local minimum.
If f′′(x_c) < 0, then x_c is a local maximum.
If f′′(x_c) = 0, further analysis is needed.
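The second-derivative test can be sketched in a few lines of Python; the function f(x) = x² + 2x and its critical point x = −1 are used purely as an illustration.

```python
# A minimal sketch of the second-derivative test, using the illustrative
# function f(x) = x**2 + 2*x, whose only critical point is x_c = -1.

def fprime(x):
    return 2*x + 2      # f'(x)

def fsecond(x):
    return 2.0          # f''(x), constant for this quadratic

xc = -1.0               # critical point: f'(-1) = 0

assert abs(fprime(xc)) < 1e-9      # confirm x_c really is a critical point
if fsecond(xc) > 0:
    kind = "local minimum"
elif fsecond(xc) < 0:
    kind = "local maximum"
else:
    kind = "inconclusive"          # further analysis needed

print(kind)  # local minimum
```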
Convex function (U, upward parabola): A function f(x) is convex if its graph lies below or on the straight line joining (x, f(x)) and (y, f(y)) for any two points x and y in the domain.
Concave function (reverse U, downward parabola): A function f(x) is concave if its graph lies above or on the straight line joining (x, f(x)) and (y, f(y)) for any two points x and y in the domain.
A function is neither convex nor concave if it satisfies neither condition (e.g. a curve with a U and a reverse-U piece together).
If it satisfies both conditions, it is a straight line.
The empty set is a convex set.
A singleton set, e.g. {a}, is also a convex set.
Line search method

Lecture 3: (19/01/2025)
Hessian matrix
The Hessian matrix of a function f(x,y) is the square matrix of its second-order partial derivatives. It represents the curvature of the function.


Symmetric matrix
A symmetric matrix is a square matrix that is equal to its transpose, meaning the elements across the diagonal are mirror images of each other.
Interchanging rows and columns (transposing) does not change the matrix.
The Hessian is always a symmetric matrix (for functions with continuous second partial derivatives).

Determinants
The leading principal minors are determinants calculated from the top-left corner of the matrix,
e.g. det(1) = a_11, det(2) = a_11 * a_22 − a_12 * a_21, and so on.

Positive definite Hessian (Local minima)
All its leading principal minor determinants are positive, i.e. calculating from the top-left outward, every determinant is positive (none are 0).
To have a local minimum at a critical point, the Hessian H must be positive definite.
If a matrix that would be positive definite instead has some of these determinants equal to 0, it is positive semi-definite.
Negative definite Hessian (Local maxima)
All its leading principal minors alternate in sign, starting with negative (none are 0): det(1) < 0, det(2) > 0, det(3) < 0, and so on.

If a matrix that would be negative definite instead has some of these determinants equal to 0, it is negative semi-definite.
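As a sketch (not part of the lecture), the leading-principal-minor rules above can be checked mechanically with numpy; the 2×2 test matrix is illustrative.

```python
# Classify a symmetric matrix by its leading principal minors, following the
# sign rules above. The example matrix [[4, 2], [2, 2]] is illustrative.
import numpy as np

def classify(H):
    """Return a definiteness label based on leading principal minors."""
    n = H.shape[0]
    minors = [np.linalg.det(H[:k, :k]) for k in range(1, n + 1)]
    if all(m > 0 for m in minors):
        return "positive definite"       # -> local minimum at a critical point
    if all((m < 0) if k % 2 == 1 else (m > 0)
           for k, m in enumerate(minors, start=1)):
        return "negative definite"       # -> local maximum at a critical point
    return "indefinite or semi-definite" # needs further analysis

H = np.array([[4.0, 2.0], [2.0, 2.0]])
print(classify(H))   # positive definite (minors are 4 and 4)
```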
Saddle
If neither condition is satisfied, the point is neither a minimum nor a maximum. A saddle point is a point on a multivariable function's graph where the tangent plane is flat (the gradient is zero), but it is not a local minimum or maximum.
To determine convexity or concavity using the Hessian matrix:
If the Hessian matrix is constant (i.e., independent of x and y) and positive semidefinite, then the function is convex; likewise, a constant negative semidefinite Hessian implies the function is concave.

However, positive definiteness at a single point does not imply that the function is convex: convexity is a global property, while positive definiteness checks only local behavior.
convex / concave
A function is convex on a region if its Hessian matrix is positive semi-definite at every point of the region.
A function is concave on a region if its Hessian matrix is negative semi-definite at every point of the region.
But a positive semi-definite Hessian at a single point does not guarantee a convex function.
Global Minima/Maxima
In some cases (e.g. when the Hessian test is inconclusive) we compare function values directly:
if the value of f(x,y) at a critical point is the lowest among all critical points, then that point is the global minimum (similarly, the highest value gives the global maximum).

Method
Unimodal function ⇒ f(x) has only one local minimum or maximum (but not both) in the region [a, b], where [a, b] is the interval of search.
Line search method
In a line search method, the function being optimised should be unimodal along the search direction. This means the function has only one minimum or maximum within the search interval, ensuring that a clear optimum exists.
For example, a function such as f(x) = x² is unimodal, because it has a single minimum at x = 0.
We will learn 2 methods ⇒ the golden section method and the Fibonacci method
Goal ⇒ to find the max or min of a single-variable unimodal objective function
Equal interval method (deprecated)
In the equal interval method we use a divide-and-conquer approach: take points around the midpoint of each interval, compare the function values, and discard the region that cannot contain the optimum.
Lecture 4: (25/01/2025)
Golden section method
Golden Ratio ⇒ 1.618
In the golden section method we take g = Golden Ratio − 1 = 0.618
d = g(b − a) ⇒ instead of the midpoint (as in the equal interval method), we calculate d
x1 = a + d
x2 = b − d
Then we calculate f(x1) and f(x2) and compare:
If f(x2) < f(x1), we reject the region from x1 to b (new interval [a, x1]).
If f(x2) > f(x1), we reject the region from a to x2 (new interval [x2, b]).
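The steps above can be sketched as a small Python routine; the interval [0, 5] and the test function f(x) = (x − 2)² are assumed for illustration.

```python
# A runnable sketch of the golden section method for minimization.
# Note that with d = 0.618*(b - a), x1 = a + d lies to the RIGHT of x2 = b - d.

def golden_section(f, a, b, tol=1e-5):
    g = 0.618                          # Golden Ratio - 1, as in the notes
    while b - a > tol:
        d = g * (b - a)
        x1, x2 = a + d, b - d          # x2 < x1 inside [a, b]
        if f(x2) < f(x1):
            b = x1                     # reject the region from x1 to b
        else:
            a = x2                     # reject the region from a to x2
    return (a + b) / 2

x_min = golden_section(lambda x: (x - 2)**2, 0, 5)
print(round(x_min, 3))  # approximately 2.0, the true minimizer
```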

Fibonacci Search Method
Here we use the Fibonacci sequence to get the value of d, and we decide the number of iterations before starting them (that is the change from the golden section method).
Fibonacci series are ⇒ 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377
Formula
First we need to find n: compute the required value F_n = (b − a)/ϵ,
where ϵ is the desired precision; we generally take ϵ = 0.02 for hand calculations.
Now compare this value with the Fibonacci sequence and choose the smallest n such that the n-th Fibonacci number meets or exceeds it. For example ⇒
if Fn = 55.8 then we take index of 89, i.e. n = 12
if Fn = 35.2 then we take index of 55, i.e. n = 11
Now that we know the number of iterations, we calculate x using the lecture's formula, in which:
a, b are the interval bounds,
F_n is the Fibonacci number at step n,
k is the current iteration.
Now we have two points [x_k, y_k]; we calculate F(x_k) and F(y_k) and select the new [a, b] as in the golden section method. We repeat this until k = n.
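One common textbook form of the Fibonacci search can be sketched as follows. The exact point-placement formula here is a standard variant and may differ slightly from the lecture's; the test function f(x) = (x − 2)² on [0, 5] is illustrative.

```python
# A sketch of Fibonacci search for a unimodal minimum on [a, b].
# The number of iterations is fixed in advance from F_n >= (b - a) / eps.

def fib_search(f, a, b, eps=0.02):
    F = [1, 1]
    while F[-1] < (b - a) / eps:       # build Fibonacci numbers until large enough
        F.append(F[-1] + F[-2])
    n = len(F) - 1                     # number of iterations is decided up front
    for k in range(1, n - 1):
        L = b - a
        x1 = a + (F[n - k - 1] / F[n - k + 1]) * L   # left trial point
        x2 = a + (F[n - k]     / F[n - k + 1]) * L   # right trial point
        if f(x1) > f(x2):
            a = x1                     # reject the region from a to x1
        else:
            b = x2                     # reject the region from x2 to b
    return (a + b) / 2

x_min = fib_search(lambda x: (x - 2)**2, 0, 5)
print(round(x_min, 2))  # close to 2.0, the true minimizer
```

This version re-evaluates f at one reused point per iteration; a hand calculation would carry the value over, but the interval-reduction logic is the same.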
Lecture 5: (01/02/2025)
Descent Method
A descent method is an optimization technique used to minimize a function. It works by iteratively moving in a direction along which the function decreases.
Newton's Method
Newton's method is a powerful optimization technique that uses second-order derivatives to find minima or maxima of functions. Here's a structured breakdown of key concepts from the lecture notes:
1. Foundation: Taylor Series Expansion
For an infinitely differentiable function f(x) around a point a:
f(x) = f(a) + f′(a)(x − a) + (f′′(a)/2!)(x − a)² + ⋯
Truncating after the quadratic term and minimizing the resulting approximation forms the basis for deriving Newton's method.
2. Single Variable Newton Method
Algorithm Steps:
Compute first and second derivatives: f′(x), f′′(x)
Update rule: x_{k+1} = x_k − f′(x_k)/f′′(x_k)
Stop when ∣f′(x)∣<ε
Example: f(x) = x² + 2x
f′(x) = 2x + 2, f′′(x) = 2
Starting at a_1 = 0: a_2 = 0 − f′(0)/f′′(0) = 0 − 2/2 = −1
Verification: f′(−1) = 0 indicates a stationary point, and f′′(−1) = 2 > 0 confirms a local minimum.
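The single-variable example above can be sketched as a short loop; the function and starting point are the ones from the notes.

```python
# Newton's method in one variable: x_{k+1} = x_k - f'(x_k)/f''(x_k),
# applied to f(x) = x**2 + 2*x starting from x = 0.

def newton_1d(fp, fpp, x, eps=1e-8, max_iter=50):
    for _ in range(max_iter):
        if abs(fp(x)) < eps:           # stop when |f'(x)| < eps
            break
        x = x - fp(x) / fpp(x)         # Newton update
    return x

x_star = newton_1d(lambda x: 2*x + 2,   # f'(x)
                   lambda x: 2.0,       # f''(x)
                   0.0)
print(x_star)  # -1.0: one step suffices, since f is quadratic
```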
3. Multivariable Extension
For f(x) where x = (x_1, x_2, ..., x_n):
Compute the gradient ∇f and the Hessian H_f
Update rule: x_{k+1} = x_k − H_f(x_k)⁻¹ ∇f(x_k)
Key Components:
Gradient: ∇f = (∂f/∂x_1, ..., ∂f/∂x_n)ᵀ
Hessian: H_f = [∂²f/∂x_i∂x_j], an n×n matrix
Example Optimization: f(x, y) = x − y + 2x² + 2xy + y²
Gradient: ∇f = (1 + 4x + 2y, −1 + 2x + 2y)ᵀ
Hessian: H_f = [[4, 2], [2, 2]]
Starting at (0, 0): ∇f(0, 0) = (1, −1)ᵀ, so x_2 = (0, 0) − H_f⁻¹(1, −1)ᵀ = (−1, 3/2), and ∇f(−1, 3/2) = (0, 0): one step reaches the minimum, as expected for a quadratic function.
Key Takeaways:
Requires computation of second derivatives
Quadratic convergence rate near minima
Hessian must be positive definite for minima
Matrix inversion can be computationally expensive for high dimensions
This method is particularly effective when good initial estimates are available and second derivatives can be efficiently computed. The lecture examples demonstrate both its power in quick convergence and the computational complexity involved in matrix operations
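The 2-D example can be reproduced with a short numpy sketch. Solving the linear system H d = ∇f instead of explicitly inverting H is a standard numerical choice, not something the lecture prescribes.

```python
# Multivariable Newton's method on f(x, y) = x - y + 2x^2 + 2xy + y^2
# starting at (0, 0). The Hessian is constant because f is quadratic.
import numpy as np

def grad(v):
    x, y = v
    return np.array([1 + 4*x + 2*y, -1 + 2*x + 2*y])

H = np.array([[4.0, 2.0], [2.0, 2.0]])   # constant Hessian

x = np.array([0.0, 0.0])
for _ in range(10):
    g = grad(x)
    if np.linalg.norm(g) < 1e-10:        # stop at a stationary point
        break
    x = x - np.linalg.solve(H, g)        # solve H d = g rather than invert H

print(x)  # (-1, 1.5): one Newton step suffices for a quadratic
```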
Lecture 6: (02/02/2025)
The steepest descent method is an iterative optimization technique used to find the minimum of a function. Here's a structured breakdown of the key concepts and example problem from the lecture notes:
1. Core Concept: Steepest Descent Method
Objective: Minimize a function f: ℝⁿ → ℝ.
Update Rule: x_{k+1} = x_k + α_k d_k, where d_k = −∇f(x_k) (the steepest descent direction) and α_k is the step size.
2. Mathematical Foundations
Directional Derivative
For f(x) at point x in direction d: directional derivative = dᵀ∇f(x)
Descent direction: dᵀ∇f(x) < 0
Ascent direction: dᵀ∇f(x) > 0
Gradient Properties
Steepest ascent direction: ∇f(x)
Steepest descent direction: −∇f(x)
3. Algorithm Steps
Initialize: Choose a starting point x_1. Set iteration i = 1.
Compute Search Direction: d_i = −∇f(x_i).
Find Optimal Step Size: Minimize f(x_i + α_i d_i) with respect to α_i.
Update: x_{i+1} = x_i + α_i d_i.
Check Stopping Criterion: If |f(x_{i+1}) − f(x_i)| < ϵ, stop. Otherwise, increment i and repeat.
4. Example: Minimize f(x, y) = x² − xy + y²
Initialization
Start at x_1 = (1, 1/2), with tolerance ϵ = 0.05.
Gradient:
∇f(x, y) = (2x − y, −x + 2y)ᵀ ⟹ ∇f(1, 1/2) = (3/2, 0)ᵀ
Search direction: d_1 = −∇f(x_1) = (−3/2, 0)ᵀ.
Iteration 1
Minimize f(1 − (3/2)α_1, 1/2):
Solve df/dα_1 = 0 ⟹ α_1 = 1/2.
Update:
x_2 = (1, 1/2) + (1/2)(−3/2, 0) = (1/4, 1/2)
Function values: f(x_1) = 3/4, f(x_2) = 3/16.
Difference: |f(x_2) − f(x_1)| = 0.56 > 0.05.
Iteration 2
Gradient at x_2: ∇f(1/4, 1/2) = (0, 3/4)ᵀ.
Search direction: d_2 = (0, −3/4)ᵀ.
Step size α_2 = 1/2.
Update: x_3 = (1/4, 1/2) + (1/2)(0, −3/4) = (1/4, 1/8).
Difference: |f(x_3) − f(x_2)| = 0.14 > 0.05.
Iteration 3
Gradient at x_3: ∇f(1/4, 1/8) = (3/8, 0)ᵀ.
Search direction: d_3 = (−3/8, 0)ᵀ.
Step size α_3 = 1/2.
Update: x_4 = (1/16, 1/8).
Difference: |f(x_4) − f(x_3)| = 0.03 < 0.05. Stop.
5. Key Takeaways
Step Size Calculation: Critical for convergence; found by minimizing f(x_k + α d_k) over α.
Convergence Check: Monitor |f(x_{k+1}) − f(x_k)| to decide termination.
Oscillation Pattern: The example shows zigzagging toward the minimum due to repeated direction changes in alternating coordinates.
6. Tips for Implementation
Use symbolic computation tools (e.g., SymPy) to compute gradients and step sizes.
For faster convergence, consider combining with conjugate gradient methods.
Verify the second derivative d²f/dα² > 0 to confirm a minimum during line searches.
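The worked example above can be sketched with numpy. For a quadratic f(v) = ½ vᵀQv, the exact line-search step along d works out to α = (dᵀd)/(dᵀQd); that closed form is a derivation, not something stated in the notes.

```python
# Steepest descent on f(x, y) = x^2 - x*y + y^2 = 0.5 v^T Q v, from (1, 1/2),
# with an exact line search and the eps = 0.05 stopping rule from the notes.
import numpy as np

Q = np.array([[2.0, -1.0], [-1.0, 2.0]])

def f(v):
    return v @ Q @ v / 2

def grad(v):
    return Q @ v                            # (2x - y, -x + 2y)

x = np.array([1.0, 0.5])
prev = f(x)
for _ in range(100):
    d = -grad(x)                            # steepest descent direction
    if np.linalg.norm(d) < 1e-12:
        break
    alpha = (d @ d) / (d @ Q @ d)           # exact minimizer of f(x + alpha*d)
    x = x + alpha * d
    if abs(f(x) - prev) < 0.05:             # |f(x_{k+1}) - f(x_k)| < eps
        break
    prev = f(x)

print(x)  # (1/16, 1/8), matching x_4 in the worked example
```

Every computed α here equals 1/2, reproducing the zigzag pattern of the hand calculation.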
Lecture 7: (08/02/2025)
The Conjugate Gradient Method is an iterative optimization algorithm for minimizing quadratic functions, particularly effective for solving large systems of linear equations. Here's a structured summary of key concepts from the lecture notes:
Quadratic Function Form
The method minimizes functions of the form f(x) = (1/2) xᵀQx + bᵀx + c,
where Q is a positive definite symmetric matrix, b is a vector, and c is a constant.
Conjugate Directions
Definition: Non-zero vectors d_1, d_2 are Q-conjugate if d_1ᵀQd_2 = 0.
Key Property: Conjugate directions ensure that the algorithm converges to the minimum in at most n iterations (for n-dimensional problems).
Algorithm Steps
Initialization: Start at x_1. Compute the initial gradient ∇f(x_1).
First Direction: d_1 = −∇f(x_1).
Iterative Update:
Compute step size α_i by minimizing f(x_i + α_i d_i).
Update x_{i+1} = x_i + α_i d_i.
Calculate the new gradient ∇f(x_{i+1}).
Compute the next direction:
d_{i+1} = −∇f(x_{i+1}) + (‖∇f(x_{i+1})‖² / ‖∇f(x_i)‖²) d_i
Stopping Condition: Terminate when ‖∇f(x_k)‖ < ε.
Example Application
Minimize f(x, y) = x − y + 2x² + 2xy + y² starting at (0, 0):
Gradient:
∇f(x, y) = (1 + 4x + 2y, −1 + 2x + 2y)ᵀ
Iteration 1:
∇f(0, 0) = (1, −1)ᵀ, d_1 = (−1, 1)ᵀ.
Optimal α_1 = 1, leading to x_2 = (−1, 1).
Iteration 2:
∇f(−1, 1) = (−1, −1)ᵀ, d_2 = (0, 2)ᵀ.
Optimal α_2 = 1/4, leading to x_3 = (−1, 3/2).
∇f(−1, 3/2) = (0, 0)ᵀ, confirming optimality.
Connection to Linear Systems
Solving Ax = B (with A symmetric positive definite) is equivalent to minimizing f(x) = (1/2) xᵀAx − Bᵀx,
since setting ∇f = Ax − B = 0 recovers the original equations. The lecture's example converts a small linear system into its quadratic function in exactly this way.
Key Takeaways
Efficiency: Converges in n steps for n-dimensional problems.
Q-Conjugacy: Ensures search directions are non-interfering.
Applications: Beyond optimization, it solves linear systems when A is symmetric positive definite.
This method combines gradient descent’s simplicity with conjugate directions’ efficiency, making it powerful for large-scale problems.
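The example above can be sketched as a Fletcher–Reeves-style conjugate gradient loop in numpy; writing f as ½ vᵀQv + bᵀv with Q = [[4, 2], [2, 2]] and b = (1, −1) is an algebraic restatement of the notes' function.

```python
# Conjugate gradient on f(x, y) = x - y + 2x^2 + 2xy + y^2 from (0, 0),
# i.e. f(v) = 0.5 v^T Q v + b^T v with the Q and b below.
import numpy as np

Q = np.array([[4.0, 2.0], [2.0, 2.0]])
b = np.array([1.0, -1.0])

def grad(v):
    return Q @ v + b                        # (1 + 4x + 2y, -1 + 2x + 2y)

x = np.array([0.0, 0.0])
g = grad(x)
d = -g                                      # first direction: -grad f(x_1)
for _ in range(2):                          # at most n = 2 steps for n = 2
    alpha = -(g @ d) / (d @ Q @ d)          # exact line search on the quadratic
    x = x + alpha * d
    g_new = grad(x)
    if np.linalg.norm(g_new) < 1e-12:
        break
    beta = (g_new @ g_new) / (g @ g)        # ||grad_{i+1}||^2 / ||grad_i||^2
    d = -g_new + beta * d                   # next Q-conjugate direction
    g = g_new

print(x)  # (-1, 1.5), the minimizer found in the notes
```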
Lecture 8: (09/02/2025)
Least Squares Method Overview
Finds the best-fit line y = mx + b for non-collinear data points by minimizing the sum of squared vertical errors e = Σ_{i=1}^{n} (y_i − (m x_i + b))².
Core Components
Error Function: e = Σ_{i=1}^{n} (b + m x_i − y_i)²
Optimization Approach:
Take partial derivatives of e with respect to b and m
Set the derivatives to zero for minimization:
∂e/∂b = 2 Σ (b + m x_i − y_i) = 0,  ∂e/∂m = 2 Σ (b + m x_i − y_i) x_i = 0
Matrix Formulation
System Setup: write the overdetermined system A v ≈ y, where each row of A is [1, x_i] and v = (b, m)ᵀ.
Normal Equations: AᵀA v = Aᵀy.
Example Solution Walkthrough
Given Points: (0, 6), (1, 0), (2, 0)
Error Function:
e = (b − 6)² + (m + b)² + (2m + b)²
Partial Derivatives:
∂e/∂b = 6b + 6m − 12 = 0,  ∂e/∂m = 10m + 6b = 0
Solve the System:
6b + 6m = 12, 6b + 10m = 0 ⇒ b = 5, m = −3
Best-Fit Line: y = −3x + 5
Key Takeaways
Geometric Interpretation: Minimizes vertical distances from points to line
Matrix Advantage: Systematic solution for any number of points
Uniqueness: The solution is unique if AᵀA is invertible (data not collinear)
Verification: The Hessian [[6, 6], [6, 10]] is positive definite → a true minimum
Applications: Curve fitting, trend analysis, and statistical regression models.
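The worked example can be checked with the normal equations in numpy; building A with rows [1, x_i] so that v = (b, m)ᵀ follows the matrix formulation above.

```python
# Least squares fit of y = m*x + b via the normal equations A^T A v = A^T y,
# using the example points (0, 6), (1, 0), (2, 0) from the notes.
import numpy as np

xs = np.array([0.0, 1.0, 2.0])
ys = np.array([6.0, 0.0, 0.0])

A = np.column_stack([np.ones_like(xs), xs])   # each row is [1, x_i]
v = np.linalg.solve(A.T @ A, A.T @ ys)        # solve the normal equations
b_fit, m_fit = v

print(b_fit, m_fit)  # approximately 5.0 and -3.0, i.e. y = -3x + 5
```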
Practice Exercise
Using the normal equations, find the best-fit line for: (5,16), (10,19), (15,23), (20,26), (25,30)
Hint: Set up equations:
Factor 2 Start
In Factor 2, I make notes topic-wise.