# Optimization

Classes will be conducted in two factors.

<details>

<summary>Course Content</summary>

* **Unconstrained Optimization**: Convex sets and functions, optimality conditions (first order and second order), line search methods, least squares, steepest descent, Newton's method, quasi-Newton methods, conjugate gradient methods.

  Pre-requisite: Basic understanding of matrix operations, determinants, derivatives and higher derivatives, partial derivatives, gradients (Class 11 and 12 mathematics in general). **Lecture Plan is Here ⇒**&#x20;

  * **Lecture 1 (Jan 11)** : Vector spaces, Linear combination, Basis, Dimension.
  * **Lecture 2 (Jan 18)**: Convex sets and Convex combinations, convex and concave functions, Necessary and sufficient conditions for Local extrema in functions of one variables&#x20;
  * **Lecture 3 (Jan 19)**: Gradient of a multivariable function, Hessian Matrix, Positive definite Matrix, Positive Semi-definite Matrix, Negative definite Matrix, Negative Semi-definite Matrix, Necessary and sufficient conditions for Local extrema in functions of several variables, Started discussing Golden Section Method.&#x20;
  * **Lecture 4 (Jan 25)**: Line Search Methods : Golden section Method and Fibonacci Method&#x20;
  * **Lecture 5 (Feb 1)**: Definition of Taylor's series, Newton's Method, Quasi Newton's Method&#x20;
  * **Lecture 6 (Feb 2)**: Directional derivatives, Steepest descent Method (& Quiz - I)&#x20;
  * **Lecture 7 (Feb 8)**: Orthogonal Vectors, Conjugate directions, Conjugate gradient Method&#x20;
  * **Lecture 8 (Feb 9)**: System of linear equations, Least square approximations (& Quiz - II)
* **Constrained Optimization** : barrier method, penalty method, interior point methods, KKT method and Lagrangian Duality, simplex, Frank and Wolfe method, applications to dynamic programming and optimal control.

</details>

<details>

<summary>Exams</summary>

* 50% internal
  * 30% for 2 quizzes
    * 2 Feb
    * 9 Feb
  * 20% Assignments
    * [1st Deadline ⇒ 23 Feb](https://m-tech-in-artificial-intelligenc.gitbook.io/manvendrapratapsinghdev/trimester-1/broken-reference)
* 50% Main

</details>

<details>

<summary>Material </summary>

* [**Class Recordings**](https://general-smile-94b.notion.site/ODS-Class-Recording-1990dfee4e4380b5a982d9a89a5d7e29)
* [**Class Material**](https://github.com/manvendrapratapsinghdev/IITJMaterial/tree/main/T1/ODS)
* [**Books**](https://m-tech-in-artificial-intelligenc.gitbook.io/manvendrapratapsinghdev/trimester-1/broken-reference)
* Draw Graph&#x20;
  * <https://www.geogebra.org/classic?lang=en>
  * <https://www.desmos.com/calculator/rtckjjms9z>
* Probability & Statistics
  * <https://youtu.be/TxgnIbLifW0?feature=shared>

</details>

## Lecture 1: *(<mark style="color:orange;">11/01/2025</mark>)*

[**Class Recording**](https://drive.google.com/file/d/1VuUdq0BxwlxlhdDWofKgJH9xfKp7mH6s/view?usp=sharing)

{% file src="<https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FvKcFe2c2guzogv3tKSgQ%2FLecture_1_slides.pdf?alt=media&token=9c273ed7-5d67-4e8a-9a53-658cd4744dc5>" %}

{% file src="<https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FrgeV7iKcbeziBcENnjSR%2FLecture_1_Classnotes.pdf?alt=media&token=58ad761f-c8f3-45dd-817b-db56bb14a089>" %}

> Optimization is the act of obtaining the best result under the given circumstances.
>
> Minimize the effort ( cost ) or Maximize the output ( Profit ).

**Objective function** ⇒ the function that expresses the problem statement (the quantity to be minimized or maximized).

**Feasible solution ⇒** a vector satisfying all the constraints of a given problem is called a **feasible solution** of that problem.

**Feasible region ⇒** The collection of all feasible solutions is called the **feasible region**.

```
The minimization problem is to find a feasible solution k such that f(k) ≤ f(z) for all feasible
solutions z. Such a solution is called an optimal solution.
• Min f(z) = Max (−f(z))
• Max f(z) = Min (−f(z))
```

<div align="center"><figure><img src="https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2F3kiUmZR3fUj69OZeJJUQ%2FScreenshot%202025-01-19%20at%2010.48.03%E2%80%AFPM.png?alt=media&#x26;token=62b4f33f-63af-481d-8a28-420c6d5b3df9" alt="" width="375"><figcaption></figcaption></figure></div>

**Feasible solutions** ⇒ In an LPP, among all feasible solutions the optimum lies at a corner of the feasible region.&#x20;

**Optimal solution ⇒** the feasible solution that attains the minimum/maximum of the objective, depending on the problem statement.&#x20;

<div align="center"><figure><img src="https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2F0gK7J53XARWIqdkjGHme%2FScreenshot%202025-01-19%20at%2010.45.40%E2%80%AFPM.png?alt=media&#x26;token=61d52e9e-046c-4e4e-9e02-ec44705e7689" alt="" width="375"><figcaption></figcaption></figure></div>

Linear Programming Problem (**LPP**) ⇒ a problem in which every variable has maximum power 1, e.g. x + y = 8

$$
\mathbb{R}^n = \{ (x\_1, x\_2, \ldots, x\_n) \mid x\_i \in \mathbb{R} \}
$$

Non-Linear Programming Problem (**NLPP**) ⇒ a problem in which some variable has power greater than 1

* Vector and tuple&#x20;
* Vector space

## Lecture 2: *(<mark style="color:orange;">18/01/2025</mark>)*

[**Class Recording**](https://drive.google.com/file/d/10pBIVlkkg-0OEYIB-AOH9dPSpBDOS-tn/view?usp=sharing)

{% file src="<https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2F5XyO0CtjGsES0EycF9xd%2FLecture_2_Classnotes.pdf?alt=media&token=f5623a16-8a8e-41ff-aaaa-877312ec4b8a>" %}

**Convex set**: a set is convex if the line segment joining any 2 points of the set lies entirely within the set.

**Convex combination**:&#x20;

A convex combination is a weighted sum of vectors where the scalars are non-negative and sum to 1.

<figure><img src="https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2Fa4Z4IcN0hInyFkNYU6Es%2FScreenshot%202025-01-20%20at%2010.20.33%E2%80%AFAM.png?alt=media&#x26;token=653fc18f-b919-4491-b05b-f404dd5e3d91" alt="" width="375"><figcaption><p>This combination lies on the line segment joining v1 ​ and v2</p></figcaption></figure>

### Functions:

f(x) = x

**Local minima**: A function of one variable f(x) has a local minimum at x\* if f(x\*) <= f(x\* + h) for all sufficiently small positive and negative values of h.

**Local maxima**:

&#x20;f(x\*) >= f(x\* + h)

If a point satisfies neither condition, it is neither a local minimum nor a local maximum.

**Global Minima**:&#x20;

f(x\*) <= f(x) for all x in the domain of f

**Necessary Condition**:&#x20;

x\* is a critical point of f(x) if df(x)/dx = 0 at x\*

(the derivative is the slope of the function)

**Sufficient condition:**&#x20;

Use the Second Derivative Test&#x20;

If f''(x\_c) > 0, x\_c is a local minimum.

If f''(x\_c) < 0, x\_c is a local maximum.

If f''(x\_c) = 0, further analysis is needed.
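The second derivative test can be sketched in Python; the example function f(x) = x³ − 3x and the helper name below are my own illustrative choices, not from the lecture:

```python
def second_derivative_test(f2, xc, eps=1e-9):
    """Classify a critical point xc using the sign of f''(xc)."""
    v = f2(xc)
    if v > eps:
        return "local minimum"
    if v < -eps:
        return "local maximum"
    return "inconclusive"

# Example: f(x) = x^3 - 3x has critical points at x = ±1
# f'(x) = 3x^2 - 3, f''(x) = 6x
f2 = lambda x: 6 * x

print(second_derivative_test(f2, 1.0))   # f''(1) = 6 > 0  -> local minimum
print(second_derivative_test(f2, -1.0))  # f''(-1) = -6 < 0 -> local maximum
```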

**Convex function (U)**: (<mark style="color:green;">upward parabola</mark>) A function *f(x)* is convex if its graph lies **below or on** the straight line (chord) joining the points (x, f(x)) and (y, f(y)) for any two points x and y in the domain.

$$
f(tx\_1 +(1−t)x\_2)≤
tf(x\_1)+(1−t)f(x\_2), ∀t∈ \[0,1]
$$

**Concave function** (reverse U): (<mark style="color:green;">downward parabola</mark>) A function f(x) is concave if its graph lies **above or on** the straight line (chord) joining the points (x, f(x)) and (y, f(y)) for any two points x and y in the domain.

$$
f(tx\_1 +(1−t)x\_2) ≥
tf(x\_1)+(1−t)f(x\_2), ∀t∈ \[0,1]
$$

A function is neither **CONVEX** nor **CONCAVE** if neither condition is satisfied (e.g. a curve with a U and a reverse U together).

&#x20;If both conditions are satisfied, the function is a straight line.

<mark style="color:green;">**Empty set is Convex set**</mark>

<mark style="color:green;">**A singleton set is a convex set, i.e.**</mark> {a} is also a convex set&#x20;
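The defining inequality for convexity above can be checked numerically by random sampling; this is a sanity check, not a proof, and the example functions are my own:

```python
import random

random.seed(0)  # reproducible sampling

def is_convex_sampled(f, lo, hi, trials=1000, tol=1e-9):
    """Test f(t*x1 + (1-t)*x2) <= t*f(x1) + (1-t)*f(x2) on random samples
    drawn from [lo, hi]. A numerical sanity check, not a proof."""
    for _ in range(trials):
        x1, x2 = random.uniform(lo, hi), random.uniform(lo, hi)
        t = random.random()
        lhs = f(t * x1 + (1 - t) * x2)
        rhs = t * f(x1) + (1 - t) * f(x2)
        if lhs > rhs + tol:
            return False
    return True

print(is_convex_sampled(lambda x: x * x, -10, 10))   # upward parabola: True
print(is_convex_sampled(lambda x: -x * x, -10, 10))  # downward parabola: False
```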

### Line search method

<figure><img src="https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2Fq0WfHwDeURkUzn7yQFBI%2FScreenshot%202025-01-20%20at%2010.26.39%E2%80%AFAM.png?alt=media&#x26;token=111c235c-0efc-4c0a-a4af-45cfef4a017d" alt="" width="563"><figcaption></figcaption></figure>

## Lecture 3: *(<mark style="color:orange;">19/01/2025</mark>)*

[**Class Recording**](https://drive.google.com/file/d/1h-oUp6cPg9yZbg2KhCrmb7neRKIgs-CR/view?usp=sharing)

{% file src="<https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FW23PAQrE6BCHEdyuJHlf%2FLecture_3_Classnotes.pdf?alt=media&token=2371406b-7ee7-41f6-b024-ab7e05994a4f>" %}

### **Hessian matrix**

The **Hessian matrix** of a function f(x,y) is a square matrix of second-order partial derivatives. It represents the curvature of the function.

<figure><img src="https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FIud0ET7X4RpuK3258piE%2FScreenshot%202025-01-20%20at%2010.43.23%E2%80%AFAM.png?alt=media&#x26;token=71046073-cee5-4c56-99e0-0d6d228d893d" alt="" width="375"><figcaption><p>Formula</p></figcaption></figure>

<figure><img src="https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FnaKeUuQPsOZm6wH0409y%2F1.png?alt=media&#x26;token=52d78a69-13f8-4679-b2c2-3513bc85d1c4" alt="" width="563"><figcaption><p>Example</p></figcaption></figure>

### Symmetric matrix

A **symmetric matrix** is a square matrix that is equal to its transpose, meaning the elements across the diagonal are mirror images of each other.

Transposing (swapping rows and columns) leaves a symmetric matrix unchanged.

The **Hessian** is always a symmetric matrix (when the second partial derivatives are continuous)<br>

<figure><img src="https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FTYRPZ763zMe0WY5c4GTA%2FScreenshot%202025-01-20%20at%2010.46.02%E2%80%AFAM.png?alt=media&#x26;token=d3d6f8d5-b28f-4d5b-83d2-86d77b5fa7cf" alt="" width="375"><figcaption></figcaption></figure>

### **Determinants**

The leading principal minors (determinants) are calculated from the top-left corner of the matrix,

like det(1) = a\_11, det(2) = \[a\_11 \* a\_22 - a\_12 \* a\_21] and so on...

<figure><img src="https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2F9EAaIUnSoXfkLB24KQnF%2FScreenshot%202025-01-19%20at%2011.58.29%20AM.png?alt=media&#x26;token=18a3f023-9fc8-4d3f-9550-332313de5574" alt="" width="563"><figcaption></figcaption></figure>

#### Positive definite Hessian (Local minima)

A matrix is positive definite if all its **leading principal minors** are positive, i.e. computing the determinants from the top-left outward, every one is strictly positive (no zeros allowed).

To have a **local minimum** at a critical point, the Hessian H must be **positive definite**.&#x20;

If the positive definite pattern holds but some of the values are 0, the matrix is **positive semi-definite**.

#### Negative definite Hessian (Local maxima)

A matrix is negative definite if its **leading principal minors** alternate in sign, starting with negative (no zeros allowed). This means:

<figure><img src="https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FHsU0XDNAl8wyPx61VcbS%2FScreenshot%202025-01-20%20at%2011.17.19%E2%80%AFAM.png?alt=media&#x26;token=138ac417-9e9b-45b6-bd16-300a2a0b4fc9" alt="" width="244"><figcaption></figcaption></figure>

If the negative definite pattern holds but some of the values are 0, the matrix is **negative semi-definite**.
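The leading-principal-minor tests above can be sketched with NumPy; the helper name and example matrices are my own:

```python
import numpy as np

def classify_hessian(H, tol=1e-12):
    """Classify a symmetric matrix via its leading principal minors,
    i.e. the determinants of the top-left k x k submatrices."""
    H = np.asarray(H, dtype=float)
    minors = [np.linalg.det(H[:k, :k]) for k in range(1, H.shape[0] + 1)]
    if all(m > tol for m in minors):
        return "positive definite"          # local minimum
    if all((-1) ** k * m > tol for k, m in enumerate(minors, start=1)):
        return "negative definite"          # minors alternate -,+,-,... -> local maximum
    return "indefinite or semi-definite"    # needs further analysis

print(classify_hessian([[4, 2], [2, 2]]))    # minors 4, 4 -> positive definite
print(classify_hessian([[-4, 2], [2, -2]]))  # minors -4, 4 -> negative definite
```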

#### Saddle

If neither condition is satisfied, the point is neither a minimum nor a maximum. A saddle point is a point on a multivariable function's graph where the tangent plane is flat, but it is not a local minimum or maximum.

### To determine convexity or **concavity** using the Hessian matrix:&#x20;

{% hint style="info" %}
If the <mark style="color:blue;">Hessian matrix</mark> is **constant** (i.e., <mark style="color:blue;">independent of x and y</mark>) and <mark style="color:green;">**positive semidefinite**</mark>, then the function is <mark style="color:green;">**convex**</mark>; similarly, constant and negative semidefinite implies **concave**.
{% endhint %}

<figure><img src="https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FImYUpIWXgTq7KJreuqJR%2FScreenshot%202025-01-20%20at%2011.34.36%E2%80%AFAM.png?alt=media&#x26;token=95f8dcd7-85de-40d0-84a7-1731f333c72f" alt="" width="375"><figcaption></figcaption></figure>

{% hint style="warning" %}
However, **positive definiteness at a single point** does not imply that the function is convex, as <mark style="color:green;">convexity is a global property</mark>, while positive definiteness at a point checks only local behavior.
{% endhint %}

#### **convex / concave**

* A function is **convex** if its Hessian matrix is **positive semi-definite**&#x20;
* A function is **concave** if its Hessian matrix is **negative semi-definite**

{% hint style="danger" %}
but a positive semi-definite Hessian at a single point does not guarantee a convex function; the property must hold everywhere on the domain
{% endhint %}

### Global Minima/Maxima

When the Hessian matrix depends on x and y (is not constant), then:

if the value of f(x, y) is lowest at (x, y) among all the critical points, then the point (x, y) is the global minimum.

<figure><img src="https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FlBlfncyPbhxnVBfgkMGh%2FScreenshot%202025-01-24%20at%2012.03.04%E2%80%AFAM.png?alt=media&#x26;token=def68fdc-4be1-4363-b120-d8532c153ffb" alt=""><figcaption></figcaption></figure>

### Method

**Unimodal** function ⇒ f(x) has only one local minimum or maximum (but not both) in the region \[a, b], where \[a, b] is the interval of search.

#### Line search method

In a **line search method**, the function being optimised should be **unimodal** along the search direction. This means the function has only one minimum or maximum within the search interval, ensuring that a clear optimum exists.

For example,\
in a function, <img src="https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2F0fEV3ElUvxC74AczvbRW%2FScreenshot%202025-01-20%20at%2010.35.10%E2%80%AFAM.png?alt=media&#x26;token=d08b3270-546c-498b-83c5-7bf95ea491ea" alt="" data-size="line">  the function is **unimodal** because it has a single minimum at x=0

**`We will learn 2 methods => Golden section method and Fibonacci method`**

**To find** => the max or min of a single-variable **unimodal** objective function&#x20;

#### **Equal interval method** (deprecated)

In the equal interval method we use divide and conquer: evaluate the function at points around the midpoint of each interval, compare the values, and discard the region that cannot contain the optimum.

## Lecture 4: *(<mark style="color:orange;">25/01/2025</mark>)*

[**Class Recording**](https://drive.google.com/file/d/1hh1AA8WahyqLMwKP3tB9cfnBIwwCipn0/view?usp=sharing)

### Golden section method

{% file src="<https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FhdSJyAb0wugXPVJXhE3l%2FGoldenSectionMethod.pdf?alt=media&token=797549fe-6df4-48c5-8673-8e913f26127e>" %}

{% file src="<https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FtRz6QloMLIPwuLwDr8mB%2FGolden_Section_Method.xlsx?alt=media&token=2ac35eba-950b-49e7-b835-af11f4cbc885>" %}

Golden Ratio => 1.618

In the Golden section method we take `g = Golden Ratio - 1 = 0.618`

`g = 0.618`

`d = g(b-a) => instead of the midpoint used in the Equal interval method, we calculate d`&#x20;

`X1 = a+d`

`X2 = b-d`

Then we calculate f(x1) and f(x2) and compare (for a minimum):&#x20;

`If f(x2) < f(x1) then we reject the x1 to b area`

`If f(x2) > f(x1) then we reject the a to x2 area`

<figure><img src="https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FXdeLELR90tlAVvvHose4%2Fgol.png?alt=media&#x26;token=2bbb2154-2c3a-4bf5-b162-fa9996b1112d" alt="" width="481"><figcaption></figcaption></figure>
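The rejection rules above can be sketched in Python; this is a minimal minimisation version, and the test function (x − 2)² is my own choice, not from the lecture:

```python
def golden_section_min(f, a, b, tol=1e-5):
    """Golden section search for the minimum of a unimodal f on [a, b]."""
    g = 0.618  # golden ratio - 1, as in the notes
    while (b - a) > tol:
        d = g * (b - a)
        x1, x2 = a + d, b - d      # note x2 < x1
        if f(x2) < f(x1):
            b = x1                 # reject the (x1, b] area
        else:
            a = x2                 # reject the [a, x2) area
    return (a + b) / 2

# f(x) = (x - 2)^2 is unimodal with its minimum at x = 2
xmin = golden_section_min(lambda x: (x - 2) ** 2, 0, 5)
print(round(xmin, 3))  # ≈ 2.0
```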

### Fibonacci Search Method

{% file src="<https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FSPG28FYegvALKc0gXQrX%2FFibonacci_search_method.pdf?alt=media&token=5c9ef88b-c34f-4ad4-8b78-49f2d804cecb>" %}

{% file src="<https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FZ9Irzg4kvovhrhN4qF0y%2FFibonacci_Search_Method.xlsx?alt=media&token=daab69f2-b098-42a2-a1a0-7cc435846729>" %}

Here we use Fibonacci search to get the value of **d**. We decide the number of iterations before starting them (that is the change from the Golden section method).

**Fibonacci series** are ⇒ 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377

#### **Formula**

First we need to calculate **n**; for that, we need the required value of F\_n:

$$
F\_n \geq \frac{(b - a)}{\epsilon}
$$

ϵ is the **desired precision**; generally we take *ϵ* **= 0.02** for hand calculations

Now compare this value with the Fibonacci sequence and choose the ***smallest n*** such that F\_n meets or exceeds the required value. For example ⇒&#x20;

`if (b-a)/ϵ = 55.8 then we take the index of 89, i.e. n = 12`&#x20;

`if (b-a)/ϵ = 35.2 then we take the index of 55, i.e. n = 11`<br>

Now that we know the number of iterations, we calculate the two interior points using these formulas:

$$
x\_k = a\_k + \frac{F\_{n-k}}{F\_{n+1}} (b\_k - a\_k)
$$

$$
y\_k = b\_k - \frac{F\_{n-k}}{F\_{n+1}} (b\_k - a\_k)
$$

* *a, b* are the interval bounds.
* *F\_n* is the Fibonacci number at step *n*.
* *k* is the current iteration.

Now we have the two points x\_k and y\_k; we calculate f(x\_k) and f(y\_k) and select the new \[a, b] as in the Golden section method. We repeat this until k = n.
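The procedure can be sketched in Python. Fibonacci indexing conventions vary between texts, so the sketch below is one common variant rather than a transcription of the formulas above; the helper name and test function are my own:

```python
def fibonacci_search_min(f, a, b, eps=0.02):
    """Fibonacci search for the minimum of a unimodal f on [a, b].
    The iteration count is fixed up front from (b - a)/eps."""
    F = [1, 1]
    while F[-1] < (b - a) / eps:       # choose n so that F_n >= (b - a)/eps
        F.append(F[-1] + F[-2])
    n = len(F) - 1
    for k in range(1, n - 1):
        r = F[n - k] / F[n - k + 1]    # ratio ≈ 0.618, ending at 1/2
        x1, x2 = b - r * (b - a), a + r * (b - a)   # x1 < x2
        if f(x1) < f(x2):
            b = x2                     # minimum lies in [a, x2]
        else:
            a = x1                     # minimum lies in [x1, b]
    return (a + b) / 2

# f(x) = (x - 2)^2 is unimodal with its minimum at x = 2
xmin = fibonacci_search_min(lambda x: (x - 2) ** 2, 0, 5)
print(round(xmin, 2))  # close to 2
```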

## Lecture 5: *(<mark style="color:orange;">01/02/2025</mark>)*

[**Class Recording**](https://drive.google.com/file/d/1JgjKqvjpMe3o03xt0gPiOXNKWFbqrUMf/view?usp=sharing)

{% file src="<https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2Fsb4AkHUkPIh0zj7rOvma%2FLecture_5.pdf?alt=media&token=d72efc92-6df6-41eb-a90e-9db53738efcc>" %}

<figure><img src="https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FOhrAy9XURlbNREO9LdYi%2Fformula.jpeg?alt=media&#x26;token=255f27ac-c38a-4ea7-b8c6-3a8b396da621" alt=""><figcaption></figcaption></figure>

### **Descent Method**&#x20;

This is an optimization technique used to minimize a function. It works by iteratively moving in the direction of the steepest decrease of the function.

### Newton's Method

Newton's method is a powerful optimization technique that uses second-order derivatives to find minima or maxima of functions. Here's a structured breakdown of key concepts from the lecture notes:

### 1. Foundation: Taylor Series Expansion

For an infinitely differentiable function $$f(x)$$ around point $$a$$:

$$
f(x) \approx f(a) + f'(a)(x-a) + \frac{1}{2}f''(a)(x-a)^2
$$

This approximation forms the basis for deriving Newton's method.

### 2. Single Variable Newton Method

**Algorithm Steps:**

1. Compute first and second derivatives: $$f'(x)$$, $$f''(x)$$
2. Update rule:

$$
a\_{k+1} = a\_k - \frac{f'(a\_k)}{f''(a\_k)}
$$

3. Stop when $$|f'(x)| < \varepsilon$$

**Example:** $$f(x) = x^2 + 2x$$

* $$f'(x) = 2x+2$$, $$f''(x) = 2$$
* Starting at $$a\_1=0$$:

$$
a\_2 = 0 - \frac{2}{2} = -1
$$

* Verification: $$f'(-1) = 0$$ indicates a local minimum
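The single-variable update rule can be sketched in Python on the lecture example (the helper name is my own):

```python
def newton_1d(f1, f2, a, eps=1e-8, max_iter=50):
    """1-D Newton's method: a_{k+1} = a_k - f'(a_k)/f''(a_k).
    f1 and f2 are the first and second derivatives of f."""
    for _ in range(max_iter):
        if abs(f1(a)) < eps:          # stop when |f'(a)| < eps
            break
        a = a - f1(a) / f2(a)
    return a

# Lecture example: f(x) = x^2 + 2x, so f'(x) = 2x + 2 and f''(x) = 2
print(newton_1d(lambda x: 2 * x + 2, lambda x: 2.0, 0.0))  # -1.0
```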

### 3. Multivariable Extension

For $$f(\mathbf{x})$$ where $$\mathbf{x} = (x\_1,x\_2,...,x\_n)$$:

1. Compute gradient $$\nabla f$$ and Hessian $$H\_f$$
2. Update rule:

$$
\mathbf{x}\_{k+1} = \mathbf{x}\_k - H\_f^{-1}(\mathbf{x}\_k)\nabla f(\mathbf{x}\_k)
$$

**Key Components:**

* Gradient: $$\nabla f = \left(\frac{\partial f}{\partial x\_1}, ..., \frac{\partial f}{\partial x\_n}\right)^T$$
* Hessian: $$H\_f = \left\[\frac{\partial^2 f}{\partial x\_i \partial x\_j}\right]\_{n\times n}$$

**Example Optimization:** $$f(x,y) = x-y + 2x^2 + 2xy + y^2$$

* Gradient: $$\nabla f = \begin{pmatrix} 1+4x+2y \\ -1+2x+2y \end{pmatrix}$$
* Hessian: $$H\_f = \begin{pmatrix} 4 & 2 \\ 2 & 2 \end{pmatrix}$$
* Starting at (0,0), one update gives:

$$
\mathbf{x}\_2 = \begin{pmatrix} 0 \\ 0 \end{pmatrix} - \frac{1}{4}\begin{pmatrix} 2 & -2 \\ -2 & 4 \end{pmatrix}\begin{pmatrix} 1 \\ -1 \end{pmatrix} = \begin{pmatrix} -1 \\ 1.5 \end{pmatrix}
$$

**Key Takeaways:**

* Requires computation of second derivatives&#x20;
* Quadratic convergence rate near minima&#x20;
* Hessian must be positive definite for minima&#x20;
* Matrix inversion can be computationally expensive for high dimensions&#x20;

This method is particularly effective when good initial estimates are available and second derivatives can be efficiently computed. The lecture examples demonstrate both its power in quick convergence and the computational complexity involved in matrix operations.
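The multivariable update rule can be sketched with NumPy on the lecture's example f(x, y) = x − y + 2x² + 2xy + y². The function names are my own; solving H·d = ∇f replaces the explicit matrix inverse:

```python
import numpy as np

def newton_multivariable(grad, hess, x0, eps=1e-8, max_iter=50):
    """Newton's method: x_{k+1} = x_k - H^{-1}(x_k) grad(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < eps:          # stopping condition |grad| < eps
            break
        x = x - np.linalg.solve(hess(x), g)  # solve H d = g instead of inverting H
    return x

# Lecture example: f(x, y) = x - y + 2x^2 + 2xy + y^2
grad = lambda v: np.array([1 + 4 * v[0] + 2 * v[1], -1 + 2 * v[0] + 2 * v[1]])
hess = lambda v: np.array([[4.0, 2.0], [2.0, 2.0]])

print(newton_multivariable(grad, hess, [0.0, 0.0]))  # converges to (-1, 1.5)
```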

### 4. Practical Considerations

**Stopping Conditions:**

* Gradient magnitude: $$|\nabla f(\mathbf{x}\_k)| < \varepsilon$$
* Function value convergence: $$|f(\mathbf{x}\_{k+1}) - f(\mathbf{x}\_k)| < \varepsilon$$

**Matrix Inversion:**
For a 2x2 matrix $$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$:

$$
A^{-1} = \frac{1}{ad-bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}
$$

This inverse is used for the Hessian in the multivariable updates.

### 5. Complex Example: Rosenbrock Function

$$f(x,y) = 100(y-x^2)^2 + (1-x)^2$$

* Gradient: $$\nabla f = \begin{pmatrix} -400x(y-x^2)-2(1-x) \\ 200(y-x^2) \end{pmatrix}$$
* The Hessian contains the second partial derivatives
* Demonstrates convergence from (0,0) to (1,1) in 2 iterations

## Lecture 6: *(<mark style="color:orange;">02/02/2025</mark>)*

[**Class Recording**](https://drive.google.com/file/d/1RFQpmY52v31uRRKHsZzDV9Q3okTVTOEO/view?usp=sharing):

{% file src="<https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FQS8dqzSDeJi5xjAueqdq%2FLecture_6_steepest_descent_Method.pdf?alt=media&token=4b554f77-a3ff-4278-8890-f27e0abe8980>" %}

The steepest descent method is an iterative optimization technique used to find the minimum of a function. Here's a structured breakdown of the key concepts and example problem from the lecture notes:

***

#### **1. Core Concept: Steepest Descent Method**

* **Objective**: Minimize a function $$f: \mathbb{R}^n \rightarrow \mathbb{R}$$.
* **Update Rule**:\
  $$x\_{k+1} = x\_k + \alpha\_k d\_k$$,\
  where $$d\_k = -\nabla f(x\_k)$$ (steepest descent direction) and $$\alpha\_k$$ is the step size.

***

#### **2. Mathematical Foundations**

**Directional Derivative**

For $$f(x)$$ at point $$x$$ in direction $$d$$:\
$$\text{Directional derivative} = d^T \cdot \nabla f(x)$$

* **Descent direction**: $$d^T \cdot \nabla f(x) < 0$$
* **Ascent direction**: $$d^T \cdot \nabla f(x) > 0$$

**Gradient Properties**

* Steepest ascent direction: $$\nabla f(x)$$
* Steepest descent direction: $$-\nabla f(x)$$

***

#### **3. Algorithm Steps**

1. **Initialize**: Choose starting point $$x\_1$$. Set iteration $$i = 1$$.
2. **Compute Search Direction**:\
   $$d\_i = -\nabla f(x\_i)$$.
3. **Find Optimal Step Size**:\
   Minimize $$f(x\_i + \alpha\_i d\_i)$$ with respect to $$\alpha\_i$$.
4. **Update**:\
   $$x\_{i+1} = x\_i + \alpha\_i d\_i$$.
5. **Check Stopping Criterion**:\
   If $$| f(x\_{i+1}) - f(x\_i) | < \epsilon$$, stop. Otherwise, increment $$i$$ and repeat.

***

#### **4. Example: Minimize** $$f(x, y) = x^2 - xy + y^2$$

**Initialization**

* Start at $$x\_1 = \left(1, \frac{1}{2}\right)$$.
* Gradient:

  $$
  \nabla f(x, y) = \begin{pmatrix} 2x - y \\ -x + 2y \end{pmatrix} \implies \nabla f\left(1, \frac{1}{2}\right) = \begin{pmatrix} \frac{3}{2} \\ 0 \end{pmatrix}
  $$
* Search direction: $$d\_1 = -\nabla f(x\_1) = \begin{pmatrix} -\frac{3}{2} \\ 0 \end{pmatrix}$$.

**Iteration 1**

* Minimize $$f\left(1 - \frac{3}{2}\alpha\_1, \frac{1}{2}\right)$$:
  * Solve $$\frac{d}{d\alpha\_1} f = 0 \implies \alpha\_1 = \frac{1}{2}$$.
  * Update:

    $$
    x\_2 = \left(1, \frac{1}{2}\right) + \frac{1}{2} \begin{pmatrix} -\frac{3}{2} \\ 0 \end{pmatrix} = \left(\frac{1}{4}, \frac{1}{2}\right)
    $$
  * Function values: $$f(x\_1) = \frac{3}{4}$$, $$f(x\_2) = \frac{3}{16}$$.
  * Difference: $$| f(x\_2) - f(x\_1) | = 0.56 > 0.05$$.

**Iteration 2**

* Gradient at $$x\_2$$: $$\nabla f\left(\frac{1}{4}, \frac{1}{2}\right) = \begin{pmatrix} 0 \\ \frac{3}{4} \end{pmatrix}$$.
* Search direction: $$d\_2 = \begin{pmatrix} 0 \\ -\frac{3}{4} \end{pmatrix}$$.
* Step size $$\alpha\_2 = \frac{1}{2}$$.
* Update: $$x\_3 = \left(\frac{1}{4}, \frac{1}{2}\right) + \frac{1}{2} \begin{pmatrix} 0 \\ -\frac{3}{4} \end{pmatrix} = \left(\frac{1}{4}, \frac{1}{8}\right)$$.
* Difference: $$| f(x\_3) - f(x\_2) | = 0.14 > 0.05$$.

**Iteration 3**

* Gradient at $$x\_3$$: $$\nabla f\left(\frac{1}{4}, \frac{1}{8}\right) = \begin{pmatrix} \frac{3}{8} \\ 0 \end{pmatrix}$$.
* Search direction: $$d\_3 = \begin{pmatrix} -\frac{3}{8} \\ 0 \end{pmatrix}$$.
* Step size $$\alpha\_3 = \frac{1}{2}$$.
* Update: $$x\_4 = \left(\frac{1}{16}, \frac{1}{8}\right)$$.
* Difference: $$| f(x\_4) - f(x\_3) | = 0.03 < 0.05$$. **Stop**.

***

#### **5. Key Takeaways**

* **Step Size Calculation**: Critical for convergence; found by minimizing $$f(x\_k + \alpha d\_k)$$.
* **Convergence Check**: Monitor $$| f(x\_{k+1}) - f(x\_k) |$$ to decide termination.
* **Oscillation Pattern**: The example shows zigzagging toward the minimum due to repeated direction changes in alternating coordinates.

***

#### **6. Tips for Implementation**

* Use symbolic computation tools (e.g., SymPy) to compute gradients and step sizes.
* For faster convergence, consider combining with conjugate gradient methods.
* Verify second derivatives ($$\frac{d^2 f}{d\alpha^2}$$) to confirm minima during line searches.
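The algorithm and the worked example can be sketched with NumPy; for a quadratic f(x) = ½xᵀQx the exact line-search step is α = (gᵀg)/(gᵀQg), which reproduces the α = ½ values found by hand. The function name is my own:

```python
import numpy as np

def steepest_descent_quadratic(Q, x0, eps=0.05, max_iter=100):
    """Steepest descent for f(x) = 1/2 x^T Q x with exact line search.
    Stops when |f(x_{k+1}) - f(x_k)| < eps, as in the notes."""
    x = np.asarray(x0, dtype=float)
    fx = 0.5 * x @ Q @ x
    for _ in range(max_iter):
        g = Q @ x                       # gradient of 1/2 x^T Q x
        if np.allclose(g, 0.0):
            break
        alpha = (g @ g) / (g @ Q @ g)   # exact minimiser along d = -g
        x = x - alpha * g
        fx_new = 0.5 * x @ Q @ x
        if abs(fx_new - fx) < eps:      # stopping criterion from the notes
            break
        fx = fx_new
    return x

# Lecture example: f(x, y) = x^2 - x*y + y^2  =>  Q = [[2, -1], [-1, 2]]
Q = np.array([[2.0, -1.0], [-1.0, 2.0]])
print(steepest_descent_quadratic(Q, [1.0, 0.5]))  # ends near (1/16, 1/8), as by hand
```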

## Lecture 7: *(<mark style="color:orange;">08/02/2025</mark>)*

[**Class Recording**](https://drive.google.com/file/d/1qGQ8_MuRIdHJwcrADqmeA3XZCof63gyc/view?usp=drive_link):

{% file src="<https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FSNp10yUTVAE0qxizqNVZ%2FLecture_7_Conjugate_Gradient_Method.pdf?alt=media&token=d45d3ba2-af06-40b5-8518-bbf673969312>" %}

The Conjugate Gradient Method is an iterative optimization algorithm for minimizing quadratic functions, particularly effective for solving large systems of linear equations. Here's a structured summary of key concepts from the lecture notes:

***

### **Quadratic Function Form**

The method minimizes functions of form:

$$
f(x) = \frac{1}{2} x^T Q x + b^T x + c
$$

where $$Q$$ is a **positive definite symmetric matrix**, $$b$$ is a vector, and $$c$$ is a constant.

***

### **Conjugate Directions**

* **Definition**: Non-zero vectors $$d\_1, d\_2$$ are $$Q$$-conjugate if:

  $$
  d\_1^T Q d\_2 = 0
  $$
* **Key Property**: Conjugate directions ensure that the algorithm converges to the minimum in at most $$n$$ iterations (for $$n$$-dimensional problems).

***

### **Algorithm Steps**

1. **Initialization**: Start at $$x\_1$$. Compute initial gradient $$\nabla f(x\_1)$$.
2. **First Direction**: $$d\_1 = -\nabla f(x\_1)$$.
3. **Iterative Update**:
   * Compute step size $$\alpha\_i$$ by minimizing $$f(x\_i + \alpha\_i d\_i)$$.
   * Update $$x\_{i+1} = x\_i + \alpha\_i d\_i$$.
   * Calculate new gradient $$\nabla f(x\_{i+1})$$.
   * Compute next direction:

     $$
     d\_{i+1} = -\nabla f(x\_{i+1}) + \frac{|\nabla f(x\_{i+1})|^2}{|\nabla f(x\_i)|^2} d\_i
     $$
4. **Stopping Condition**: Terminate when $$|\nabla f(x\_k)| < \varepsilon$$.

***

### **Example Application**

Minimize $$f(x, y) = x - y + 2x^2 + 2xy + y^2$$ starting at $$(0, 0)$$:

1. **Gradient**:

   $$
   \nabla f(x, y) = \begin{pmatrix} 1 + 4x + 2y \\ -1 + 2x + 2y \end{pmatrix}
   $$
2. **Iteration 1**:
   * $$\nabla f(0, 0) = (1, -1)$$, $$d\_1 = (-1, 1)$$.
   * Optimal $$\alpha\_1 = 1$$, leading to $$x\_2 = (-1, 1)$$.
3. **Iteration 2**:
   * $$\nabla f(-1, 1) = (-1, -1)$$, $$d\_2 = (0, 2)$$.
   * Optimal $$\alpha\_2 = \frac{1}{4}$$, leading to $$x\_3 = (-1, 1.5)$$.
   * $$\nabla f(-1, 1.5) = (0, 0)$$, confirming optimality.

***

### **Connection to Linear Systems**

Solving $$Ax = B$$ is equivalent to minimizing:

$$
f(x) = \frac{1}{2} x^T A x - B^T x
$$

**Example**: For the system:

$$
\begin{cases}
x + y = 3 \\
x - y = 1
\end{cases}
$$

the corresponding quadratic function is:

$$
f(x, y) = \frac{1}{2}x^2 + xy - \frac{1}{2}y^2 - 3x - y
$$

Setting $$\nabla f = 0$$ recovers the original equations.

***

### **Key Takeaways**

1. **Efficiency**: Converges in $$n$$ steps for $$n$$-dimensional problems.
2. **Q-Conjugacy**: Ensures search directions are non-interfering.
3. **Applications**: Beyond optimization, it solves linear systems when $$A$$ is symmetric positive definite.

This method combines gradient descent’s simplicity with conjugate directions’ efficiency, making it powerful for large-scale problems.
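The algorithm can be sketched with NumPy for the quadratic form above, using the lecture's example f(x, y) = x − y + 2x² + 2xy + y², i.e. Q = [[4, 2], [2, 2]] and b = (1, −1). The function name is my own:

```python
import numpy as np

def conjugate_gradient_quadratic(Q, b, x0, eps=1e-10, max_iter=50):
    """Conjugate gradient for f(x) = 1/2 x^T Q x + b^T x (Q symmetric PD),
    with the Fletcher-Reeves update for the next direction, as in the notes."""
    x = np.asarray(x0, dtype=float)
    g = Q @ x + b                 # gradient
    d = -g                        # first direction = steepest descent
    for _ in range(max_iter):
        if np.linalg.norm(g) < eps:
            break
        alpha = -(g @ d) / (d @ Q @ d)    # exact line search along d
        x = x + alpha * d
        g_new = Q @ x + b
        beta = (g_new @ g_new) / (g @ g)  # |grad_{k+1}|^2 / |grad_k|^2
        d = -g_new + beta * d
        g = g_new
    return x

# Lecture example: f(x, y) = x - y + 2x^2 + 2xy + y^2
Q = np.array([[4.0, 2.0], [2.0, 2.0]])
b = np.array([1.0, -1.0])
print(conjugate_gradient_quadratic(Q, b, [0.0, 0.0]))  # converges to (-1, 1.5)
```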

## Lecture 8: *(<mark style="color:orange;">09/02/2025</mark>)*

[**Class Recording**](https://drive.google.com/file/d/1VPK0gNNypqjMjzVBLjLRSRWvOK1fHOEB/view?usp=classroom_web\&authuser=0):

{% file src="<https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FA7F3tUZmN9BN96abQlNE%2FLecture_8_Least_square_approximations.pdf?alt=media&token=ecd2da04-6ed4-4223-aa74-5fa66669c53c>" %}

### Least Squares Method Overview

Finds the best-fit line $$y = mx + b$$ for non-collinear data points by minimizing the sum of squared vertical errors $$e = \sum\_{i=1}^n (y\_i - (mx\_i + b))^2$$.

***

#### Core Components

**Error Function**:\
$$e = \sum\_{i=1}^n (b + mx\_i - y\_i)^2$$

**Optimization Approach**:

1. Take partial derivatives of $$e$$ with respect to $$b$$ and $$m$$
2. Set derivatives to zero for minimization:

   $$
   \begin{cases}
   \frac{\partial e}{\partial b} = 2\sum (b + mx\_i - y\_i) = 0 \\
   \frac{\partial e}{\partial m} = 2\sum (b + mx\_i - y\_i)x\_i = 0
   \end{cases}
   $$

***

#### Matrix Formulation

**System Setup**:

$$
A = \begin{bmatrix}
1 & x\_1 \\
1 & x\_2 \\
\vdots & \vdots \\
1 & x\_n
\end{bmatrix},\\
X = \begin{bmatrix}
b \\ m
\end{bmatrix},\\
Y = \begin{bmatrix}
y\_1 \\ y\_2 \\ \vdots \\ y\_n
\end{bmatrix}
$$

**Normal Equations**:

$$
A^TAX = A^TY
$$

$$
\begin{bmatrix}
n & \sum x\_i \\
\sum x\_i & \sum x\_i^2
\end{bmatrix}
\begin{bmatrix}
b \\ m
\end{bmatrix}
=
\begin{bmatrix}
\sum y\_i \\ \sum x\_iy\_i
\end{bmatrix}
$$

***

#### Example Solution Walkthrough

**Given Points**: (0,6), (1,0), (2,0)

1. **Error Function**:

   $$
   e = (b-6)^2 + (m+b)^2 + (2m+b)^2
   $$
2. **Partial Derivatives**:

   $$
   \frac{\partial e}{\partial b} = 6b + 6m - 12 = 0
   $$

   $$
   \frac{\partial e}{\partial m} = 10m + 6b = 0
   $$
3. **Solve System**:

   $$
   \begin{cases}
   6b + 6m = 12 \\
   10m + 6b = 0
   \end{cases}
   \Rightarrow b=5,\ m=-3
   $$

**Best Fit Line**:\
$$y = -3x + 5$$

***

#### Key Takeaways

1. **Geometric Interpretation**: Minimizes vertical distances from points to line
2. **Matrix Advantage**: Systematic solution for any number of points
3. **Uniqueness**: Solution is unique if $$A^TA$$ is invertible (data not collinear)
4. **Verification**: Hessian matrix $$\begin{bmatrix}6&6\\6&10\end{bmatrix}$$ positive definite → True minimum

**Applications**: Curve fitting, trend analysis, and statistical regression models.
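The normal-equations solution can be sketched with NumPy; the helper name is my own, and `np.linalg.lstsq` would work equally well:

```python
import numpy as np

def best_fit_line(xs, ys):
    """Solve the normal equations A^T A X = A^T Y for the line y = m x + b."""
    A = np.column_stack([np.ones(len(xs)), xs])   # columns: [1, x_i]
    b, m = np.linalg.solve(A.T @ A, A.T @ np.asarray(ys, dtype=float))
    return m, b

# Lecture example: points (0,6), (1,0), (2,0)
m, b = best_fit_line([0, 1, 2], [6, 0, 0])
print(m, b)  # m = -3, b = 5, matching the hand calculation
```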

***

#### Practice Exercise

Using the normal equations, find the best-fit line for:\
(5,16), (10,19), (15,23), (20,26), (25,30)

*Hint: Set up equations*:

$$
\begin{cases}
5b + 75m = 114 \\
75b + 1375m = 1885
\end{cases}
$$
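After working the exercise by hand, the same normal equations can be solved with NumPy to check the answer (a verification sketch, not part of the lecture):

```python
import numpy as np

# Check the practice answer via the normal equations A^T A X = A^T Y
xs = np.array([5.0, 10.0, 15.0, 20.0, 25.0])
ys = np.array([16.0, 19.0, 23.0, 26.0, 30.0])
A = np.column_stack([np.ones(len(xs)), xs])
b, m = np.linalg.solve(A.T @ A, A.T @ ys)
print(f"y = {m:.2f}x + {b:.2f}")
```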

## <mark style="color:red;">Factor 2 Start</mark>

<mark style="color:red;">In Factor 2 I make the notes topic-wise</mark>

{% file src="<https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FWaW3wKzgPYMbokkd3DXP%2FProfessor_All_Notes.pdf?alt=media&token=ca3281b2-a1d1-459e-bcef-1b8ce31ac227>" %}

{% embed url="<https://www.wolframalpha.com/input?i2d=true&i=Divide%5B1%2CSquare%5B%5C%2840%293%2B%CE%BB%5C%2841%29%5D%5D+++%2B+Divide%5B1%2CSquare%5B%5C%2840%291%2B%CE%BB%5C%2841%29%5D%5D+++%2B+Divide%5B1%2CSquare%5B%5C%2840%292%2B%CE%BB%5C%2841%29%5D%5D++%3D1>" %}
