# Machine Learning

ML comes in where human expertise does not exist.

<details>

<summary>Course Content</summary>

**Introduction**: Definitions, Datasets for Machine Learning, Different Paradigms of Machine Learning, Data Normalization, Hypothesis Evaluation, VC-Dimensions and Distribution, Bias-Variance Tradeoff, Linear Regression, Classification (5-6 Lectures)&#x20;

• Bayes Decision Theory: Bayes decision rule, Minimum error rate classification, Normal density and discriminant functions Parameter Estimation: Maximum Likelihood and Bayesian Parameter Estimation (3-4 Lectures)&#x20;

• Discriminative Methods: SVM, Distance-based methods, Linear Discriminant Functions, Decision Tree, Random Decision Forest and Boosting (4 Lectures)&#x20;

• Dimensionality Reduction: PCA, LDA, ICA, SFFS, SBFS (2-3 Lectures)&#x20;

• Clustering: k-means clustering, Gaussian Mixture Modeling, EM-algorithm (3 Lectures)&#x20;

• Kernels and Neural Networks, Kernel Tricks, SVMs (primal and dual forms), K-SVR, K-PCA (2 Lectures)&#x20;

• Artificial Neural Networks: MLP, Backprop, and RBF-Net (3 Lectures)&#x20;

• Foundations of Deep Learning: CNN, Autoencoders (2-3 lectures)&#x20;

• Time series analysis

</details>

<details>

<summary>Exams</summary>

* 50% internal
  * 22.5% (7.5% each) for 3 quizzes
  * 12.5% for Assignment 1 (group of 2)
  * 15% for Assignment 2 (group of 3)
* 50% Main exam

</details>

<details>

<summary>Material </summary>

* [**Class Recordings**](https://general-smile-94b.notion.site/ML-Class-Recording-1990dfee4e4380fd8ce0cf27e0531a74)
* [**Class Material**](https://github.com/manvendrapratapsinghdev/IITJMaterial/tree/main/T1/ML)
* Python Library
  * <https://scikit-learn.org/stable/>
* Videos:
  * [Cost function](https://www.youtube.com/watch?v=7uwa9aPbBRU\&list=PLTDARY42LDV7WGmlzZtY-w9pemyPrKNUZ\&index=1)

</details>

{% file src="<https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FtB0ia95RitYKgw7PQ5GD%2FHands%20On%20ML.pdf?alt=media&token=b05e2440-61f1-4a30-9cd1-27745db339b1>" %}

{% file src="<https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FRjrWS7KT70X6nEfnQS2s%2FData%20Analysis%20Cheat%20Sheet.pdf?alt=media&token=9f510753-272e-4ceb-9773-23c8fdfaed46>" %}

## [*Gist of complete course*](https://app.napkin.ai/page/CgoiCHByb2Qtb25lEiwKBFBhZ2UaJDExNjg2YTlkLTQwYTYtNDdmMy1hNDBlLTg4YzFlZTIyYWQ4Mg?s=1)&#x20;

## Lecture 1: *(<mark style="color:orange;">11/01/2025</mark>`)`*

[**Class Recording**](https://futurense.zoom.us/rec/play/ZK98Z22v_ogK2QceGzu7tGf7v4yJHuVMpP1bgfdbROVE4cukCMnySDoO0b0ed6xOUF3fEEDnx7a-ht2F.wTjDIhS1ev3j25M7)

{% file src="<https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2Fae3PIjgIKS0aiWldp833%2FLecture%201.pdf?alt=media&token=fed422cd-dbe4-4fbe-9b70-05ac0eb33664>" %}

Category of Data Set: \<Explanation needed>

### What is ML

* Learning is any process by which a system improves performance from experience – Herbert Simon
* A computer program is said to learn from experience E with respect to some task T and some performance measure P, if **its performance on T, as measured by P, improves with experience E**. —Tom Mitchell, 1997

### E, T, P examples

* **Checkers**
  * **T**: Playing checkers
  * **P**: Percentage of games won against an arbitrary opponent
  * **E**: Playing practice games against itself
* **Handwriting recognition**
  * **T**: Recognizing hand-written words
  * **P**: Percentage of words correctly classified
  * **E**: Database of human-labeled images of handwritten words
* **Autonomous driving**
  * **T**: Driving on four-lane highways using vision sensors
  * **P**: Average distance traveled before a human-judged error
  * **E**: A sequence of images and steering commands recorded while observing a human driver
* **Email spam filtering**
  * **T**: Categorize email messages as spam or legitimate
  * **P**: Percentage of email messages correctly classified
  * **E**: Database of emails, some with human-given labels

<figure><img src="https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FT4q6sJsKLqj3T6UIlt88%2FScreenshot%202025-01-14%20at%2010.02.38%E2%80%AFAM.png?alt=media&#x26;token=c76aa29a-077d-43c9-b558-2c6f172e4502" alt=""><figcaption></figcaption></figure>

### Task T

* Classifications of data
* Ranking
* Recommendation&#x20;
* Clustering&#x20;
* Density estimation

<figure><img src="https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2F69U3DX2u7fjQgJDOSyjr%2F1.png?alt=media&#x26;token=714a7d80-8253-48d8-a5cb-f6982dd29a3c" alt=""><figcaption></figcaption></figure>

### When do we use Machine Learning?

• Human expertise does not exist (navigating on Mars)&#x20;

• Humans can’t explain their expertise (speech recognition)&#x20;

• Models must be customized (personalized medicine)&#x20;

• Models are based on huge amounts of data (genomics)&#x20;

• Learning isn't always useful

<figure><img src="https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2F9noA61h1fs5Q5NvNXhJn%2FScreenshot%202025-01-14%20at%2010.11.15%E2%80%AFAM.png?alt=media&#x26;token=f5d4ec32-f5bc-4ccb-879c-2f9ad25d1419" alt=""><figcaption></figcaption></figure>

### Sample Applications of ML

* Web search &#x20;
* Computational biology
* Finance
* E-commerce
* Space exploration
* Robotics
* Information extraction
* Social networks
* Debugging software
* Medical imaging

## Lecture 2: *(<mark style="color:orange;">12/01/2025</mark>`)`*

[**Class Recording**](https://futurense.zoom.us/rec/play/TKTf0vJ54QxTT-b3DnDoPNni6nQTQGrtENY1CTs6LouRCLvaLOcZbRY5N_DN-EIuowLzZ6L9NqXMasH0.NOXIG_SsVPYX_sO_)

### **Supervised Learning and Unsupervised Learning**

### **Supervised learning**:

When we have training data together with the desired output (labels).

Example: email spam detection, or identifying dogs/cats in animal images.

#### **Binary vs Multi-class classification**

Binary => two classes (e.g., true/false)

Multi-class => more than two classes

### Unsupervised learning

When we have training data only.

There is no labelled data.

The goal is to find patterns (clusters) in the given data.

Example: astronomical data, market segmentation.

### Clustering:

Finding patterns.\
The number of clusters is supplied by the user; clustering is an unsupervised learning task.

### Reinforcement Learning

Learning with rewards:\
the agent gets a reward for a right action and a penalty for a wrong one.

Example: ChatGPT and self-driving cars.

### ML System Classification

Batch vs Online Learning

**Batch** => learning from offline (static) data

**Online Learning** => learning incrementally from streaming data (e.g., ChatGPT)

Instance-Based vs Model-Based

### Challenges of Machine Learning

* Insufficient data
* Non-representative training data
* Poor-quality data
* Irrelevant features

### Performance Measure

It is also called the cost.

Root Mean Square Error:

$$
RMSE = \sqrt{\frac{1}{n}\sum\_{i=1}^{n}(P\_i - O\_i)^2}
$$

Mean Square Error:

$$
MSE = \frac{1}{n}\sum\_{i=1}^{n}(P\_i - O\_i)^2
$$
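
A minimal NumPy sketch of both measures, assuming `predicted` and `observed` are arrays of the same length (the numbers are illustrative):

```python
import numpy as np

# Hypothetical predictions and observed target values
predicted = np.array([2.1, 2.9, 3.7])
observed = np.array([2.0, 2.8, 3.6])

mse = np.mean((predicted - observed) ** 2)  # Mean Square Error
rmse = np.sqrt(mse)                         # Root Mean Square Error

print(f"MSE: {mse:.4f}, RMSE: {rmse:.4f}")
```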

### Explore Data

1. **Understand the data**

* missing data
* data range, count

2. **Visually inspect the data using histograms**

* look for patterns and outliers
* Library => import matplotlib.pyplot as plt

3. **Duplicate data** => remove it
4. **Segregate data**
5. **Get a unique identifier**
6. **Find correlations** => Standard Correlation Coefficient (Pearson’s r)

Set aside 20% of the data as a test set, so that evaluation depends less on the particular training data.
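
A rough sketch of these exploration steps with pandas and scikit-learn; the file name and column contents are placeholders:

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

df = pd.read_csv("housing.csv")      # hypothetical dataset

print(df.info())                     # missing values, counts, dtypes
print(df.describe())                 # ranges, mean, std

df.hist(bins=50, figsize=(12, 8))    # look for patterns and outliers
plt.show()

df = df.drop_duplicates()            # remove duplicate rows

print(df.corr(numeric_only=True))    # Pearson's r between numeric features

# Set aside 20% of the data for testing
train_set, test_set = train_test_split(df, test_size=0.2, random_state=42)
```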

<figure><img src="https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2F93wWL8eF7WVieyVCjm6U%2FScreenshot%202025-01-14%20at%2010.21.34%E2%80%AFAM.png?alt=media&#x26;token=147452e6-21ed-4829-b89f-413d59eff3d9" alt=""><figcaption></figcaption></figure>

## Lecture 3: (*<mark style="color:orange;">18/01/2025</mark>*)

[**Class Recording**](https://futurense.zoom.us/rec/play/ojgH9HzpeGnwfdn1GdKUsZ-0BUivPtLq2B4ddL-fEj2zB1ryFeWaQPENxvxafDvXeg2NQrx0EKe3tZX_.OReCu7j0uGAxI8pj)

{% file src="<https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FJ5QgboVyKyC9VNnCEUqX%2FLecture_2_DataPreProcessing.pdf?alt=media&token=49dcbedf-73dd-4074-a1cb-aa7a2a343fd4>" %}

### Data Preparation:

80% of data analysis time is spent on cleaning and preparing data.

Imputation: replacing null or blank values with zero, the mean, or the median, so that the whole record is not discarded; the missing value is mapped to a substitute value instead.

Good Imputation: ?? Homework

### Data Cleaning:&#x20;

Capping: limiting or removing outliers

Encoding

<figure><img src="https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FHpAH8YgpcHjHjG88GdiI%2Fdata%20cleaning.png?alt=media&#x26;token=38ed39b0-11c1-4b1b-8626-4492730b42b3" alt="" width="563"><figcaption></figcaption></figure>

Converting text to numbers, i.e., mapping text into numerical values (e.g., ordinal\_encoder).
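
A small sketch of imputation and ordinal encoding with scikit-learn; the columns and values are made up for illustration:

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({
    "rooms": [3, None, 4, 5],   # numeric column with a missing value
    "ocean_proximity": ["NEAR BAY", "INLAND", "INLAND", "NEAR OCEAN"],
})

# Imputation: fill missing numeric values with the median instead of dropping rows
imputer = SimpleImputer(strategy="median")
df[["rooms"]] = imputer.fit_transform(df[["rooms"]])

# Encoding: map text categories to numbers
encoder = OrdinalEncoder()
df[["ocean_proximity"]] = encoder.fit_transform(df[["ocean_proximity"]])

print(df)
```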

<figure><img src="https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FeBaNPVEcQQQcZnTH0Y5j%2Fdata%20cleaning%20dig.png?alt=media&#x26;token=71df9e35-0f81-44b7-9d69-1796c2eb787f" alt="" width="375"><figcaption></figcaption></figure>

### Features Scaling and transformation&#x20;

ML algorithms don’t perform well when the numerical input attributes have very different scales.\
Features with a roughly Gaussian distribution (e.g., centered near 0 after standardization) work best for ML.

1. Feature Scaling: Adjusting the range of features (e.g., normalization or standardization) to ensure all features contribute equally to the model, preventing dominance by features with larger magnitudes.
2. Feature Transformation: Modifying features (e.g., log, square root, or polynomial transformations) to make data more suitable for modeling, often improving linearity or addressing skewness.

### Multimodal Distribution

Hyperparameter tuning: grid search method

Feature importance: drop features with near-zero importance

Evaluate on the test set

Linear Regression:

{% hint style="info" %}
for reference ML ppt CS229
{% endhint %}

### Mean square error problem ([Cost function](https://m-tech-in-artificial-intelligenc.gitbook.io/manvendrapratapsinghdev/trimester-1/broken-reference))

Iterate to find θ.

**Gradient Descent**: <mark style="color:red;">Mountain Example</mark>&#x20;

<figure><img src="https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FSj770lClP2Xk4YrjgmgX%2FScreenshot%202025-01-19%20at%208.10.02%E2%80%AFAM.png?alt=media&#x26;token=71707b80-5a91-490a-a8fb-4e9132ec2812" alt="" width="563"><figcaption></figcaption></figure>

## Lecture 4: *(<mark style="color:orange;">19/01/2025</mark>`)`*

[**Class Recording**](https://futurense.zoom.us/rec/play/A4bn_Ki2KcmMhjFBi5-HAfteRI0xOwVHqG1Ft6PSRB-Psmlum_-ERDujYOlX92-6xCn0ytXkTNqxR78v.YypNZ7tV2gWM0lwj)

{% file src="<https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FVJ7drU1466AfYeL3HEm1%2FLecture_4_LinearRegressin.pdf?alt=media&token=fb3fffe0-6320-4f54-87eb-4f7b4ca5e192>" %}

### <mark style="color:purple;">**Linear Regression**</mark>&#x20;

models the relationship between a dependent variable Y and one or more independent variables X using a linear equation:

$$
Y= \beta\_0 + \beta\_1 X + \epsilon
$$

<mark style="background-color:yellow;">where β\_0 is the intercept, β\_1 is the slope, and epsilon(ϵ) is the error term.</mark>

#### <mark style="color:green;">**Example**</mark>**:** Predicting house prices based on square footage.&#x20;

If Y= 50000 + 200X,&#x20;

then a 1000 sq. ft. house costs 50000+200(1000) = 250000.

### <mark style="color:purple;">**Hypothesis function**</mark>&#x20;

In Linear Regression, the hypothesis function represents the predicted output as a linear combination of input features:

$$
h(X) = \theta\_0 + \theta\_1 X\_1
$$

<mark style="background-color:yellow;">where θ\_0(intercept) and θ\_1 (slope) are learned parameters</mark>.

#### <mark style="color:orange;">why we calculate the hypothesis?</mark>

"To estimate the relationship between input 𝑋 and output 𝑌, allowing us to make predictions for new data."

<mark style="color:green;">**Example**</mark>**:** If h(X)= 50 + 10X, for X=5, the predicted value is h(5)=50+10(5)= 50 + 10(5) = 100.

#### <mark style="color:purple;">**Hypothesis function for Multiple Linear Regression**</mark>&#x20;

where predictions depend on multiple input features:

$$
h(X)= \theta\_0 + \theta\_1 X\_1 + \theta\_2 X\_2 + ... + \theta\_n X\_n
$$

<mark style="background-color:yellow;">Each X\_i represents an independent variable, and θ\_i are the learned coefficients.</mark>

<mark style="color:green;">**Example**</mark>**:** Predicting house price based on size (X\_1) and number of rooms (X\_2):

h(X)=50000+200X\_1+10000X\_2

For X\_1= 1000 sq. ft, X\_2 = 3 rooms, the price is **₹2,80,000**.
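
A hedged scikit-learn sketch of fitting such a model; the tiny dataset below is invented to mirror the example:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Columns: size (sq. ft), number of rooms -- invented sample data
X = np.array([[800, 2], [1000, 3], [1200, 3], [1500, 4]])
y = np.array([230000, 280000, 320000, 390000])

model = LinearRegression()
model.fit(X, y)

print("Intercept (theta_0):", model.intercept_)
print("Coefficients (theta_1, theta_2):", model.coef_)
print("Price for 1000 sq. ft, 3 rooms:", model.predict([[1000, 3]])[0])
```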

### <mark style="color:purple;">Calculation Of θ ⇒</mark> <mark style="color:red;">Cost Function</mark>

The values of θ\_0, θ\_1, … are found using **Gradient Descent** or the **Normal Equation**.

#### **1. Gradient Descent Algorithm**

Minimizes the cost function:

$$
J(θ)= \frac{1}{2m} \sum\_{i=1}^{m} (h(X\_i) - Y\_i)^2
$$

* <mark style="background-color:yellow;">**m**</mark> <mark style="background-color:yellow;"></mark><mark style="background-color:yellow;">= Total number of training examples.</mark>
* <mark style="background-color:yellow;">**Xi​**</mark> <mark style="background-color:yellow;"></mark><mark style="background-color:yellow;">= Input features of the ith training example.</mark>
* <mark style="background-color:yellow;">**Yi​**</mark> <mark style="background-color:yellow;"></mark><mark style="background-color:yellow;">= Actual output (target value) of the ith training example.</mark>
* <mark style="background-color:yellow;">**h(Xi​)**</mark> <mark style="background-color:yellow;"></mark><mark style="background-color:yellow;">= Predicted output using the hypothesis function.</mark>

**Update Rule:**

$$
\theta\_j := \theta\_j - \alpha \frac{\partial J}{\partial \theta\_j}
$$

<mark style="background-color:yellow;">where α is the learning rate.</mark>

<mark style="color:green;">**Example**</mark>**:** For data points (1,2), (2,2.8), (3,3.6), running gradient descent iteratively updates θ\_0 and θ\_1 to best fit h(X).

#### **2. Normal Equation (Direct Method)**

Used when data is small, as it’s computationally expensive for large datasets.

Solves for θ without iteration:

$$
θ=(X^TX)^{-1}X^TY
$$

### <mark style="color:orange;">Role of the hypothesis function</mark>

The hypothesis function serves as a **mathematical model** that maps inputs X to outputs Y, whether continuous (regression) or discrete (classification)

In **Regression**, h(X) outputs continuous values, meaning the predictions can take any real number. <mark style="color:green;">**Example**</mark>**:** Predicting house prices: h(X) = 50000 + 200X can output any value like ₹2,50,000 or ₹2,50,500.\ <mark style="background-color:orange;">Predict quantities</mark>

In **Classification**, h(X) outputs discrete values, meaning predictions belong to predefined categories. **Example:** Spam detection: h(X) predicts either **Spam (1)** or **Not Spam (0)** based on email features.

<mark style="background-color:orange;">Predict label</mark>

### <mark style="color:purple;">Least Squares Optimization Problem</mark>

The **Least Squares Optimization Problem** finds the best-fit line by minimizing the sum of squared errors between predicted and actual values:

$$
J(θ)= \sum\_{i=1}^{m} (Y\_i - h(X\_i))^2
$$

where

$$
h(X)= \theta\_0 + \theta\_1 X.
$$

**Methods:**

1. **Gradient Descent** iteratively updates θ to minimize J(θ).
2. **Normal Equation** directly computes θ without iteration.

<mark style="color:green;">**Example**</mark>**:** Fitting a line to points (1,2),(2,2.8),(3,3.6)(1,2), (2,2.8), (3,3.6) by minimizing the squared differences between actual Y and predicted h(X).

### **Pitfalls of Least Squares Optimization:**

1. **Sensitive to Outliers:** Large errors get squared, making the model biased toward extreme values.
2. **Overfitting in High Dimensions:** Too many features (X) can lead to poor generalization.
3. **Multicollinearity:** Highly correlated features cause unstable parameter estimates.
4. **Non-Linearity:** Least squares assumes a linear relationship, failing for complex patterns.
5. **Heteroscedasticity:** Unequal variance in errors violates model assumptions.

<mark style="color:green;">**Example**</mark>**:** If one house in a dataset has an extreme price (₹1 crore while others are ₹10-20 lakhs), the least squares model will be skewed.

### <mark style="color:purple;">**Learning Rate**</mark><mark style="color:purple;">:</mark>&#x20;

The **learning rate** (α) is a hyperparameter that controls how much Gradient Descent updates the model parameters in each step:

<figure><img src="https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FFx49R8AfhD381nRmqfzM%2FScreenshot%202025-01-19%20at%208.17.30%E2%80%AFAM.png?alt=media&#x26;token=4aa8037c-48cb-4e92-bac4-8ec5b2a957b8" alt="" width="375"><figcaption></figcaption></figure>

$$
\theta\_j := \theta\_j - \alpha \frac{\partial J}{\partial \theta\_j}
$$

#### **Effects:**

* **Too high (α≫1)** → Divergence (jumps over the minimum).
* **Too low (α≪1)** → Slow convergence.

<mark style="color:green;">**Example**</mark>**:** If α= 0.01, the model learns steadily, but if α= 10, it may overshoot and fail to minimize the cost function.

* When do we stop?
  * after a fixed number of iterations
  * when the improvement falls below a threshold

`Numerical on MSE`

### <mark style="color:purple;">**Feature Scaling**</mark>&#x20;

improves Gradient Descent convergence by normalizing feature values. Two common methods:

**`Min-Max Scaling:`**

$$
X′= \frac{X - X\_{\min}}{X\_{\max} - X\_{\min}}
$$

<mark style="background-color:yellow;">Scales values between 0 and 1.</mark>

**`Standardization (Z-score):`**

$$
X′= \frac{X - \mu}{\sigma}
$$

<mark style="background-color:yellow;">Centers mean at 0 with unit variance.</mark>

**Example:** If house sizes range from 500 to 5000 sq. ft, without scaling, Gradient Descent takes longer to converge. Normalizing makes updates uniform, speeding up learning.
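
A short scikit-learn sketch of both methods; the house sizes are illustrative:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

sizes = np.array([[500.0], [1200.0], [3000.0], [5000.0]])  # house sizes in sq. ft

minmax = MinMaxScaler().fit_transform(sizes)    # values scaled to [0, 1]
zscore = StandardScaler().fit_transform(sizes)  # mean 0, unit variance

print("Min-Max:\n", minmax.ravel())
print("Z-score:\n", zscore.ravel())
```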

<figure><img src="https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FkdRtOZEugM6jaylD2gfS%2FScreenshot%202025-01-19%20at%208.19.58%E2%80%AFAM.png?alt=media&#x26;token=e4fb2f0d-4881-4a0e-8e34-8fae917d43ea" alt="" width="375"><figcaption></figcaption></figure>

### <mark style="color:purple;">Batch Gradient Decent vs Stochastic GD</mark>

* **Batch Gradient Descent** computes gradients using the entire dataset, making it slow for large datasets but stable. **Batch** is like a linear search: all the data is given to the machine at every step, so the time per update grows with the dataset size.
* **Stochastic Gradient Descent (SGD)** updates parameters using one random instance at a time, making it faster but noisy. **Stochastic** is like a random search: it picks a random point and takes a gradient step; it never settles exactly at the minimum but ends up close to it, because the training instance changes on every update.

**SGD does not converge exactly** but oscillates near the minimum, helping escape local minima.

<mark style="color:green;">**Example**</mark>**:** In house price prediction, Batch GD updates after processing all houses, while SGD updates after each house, making it faster but less stable.

<figure><img src="https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FZTQfnynfgGYAn2tagJU9%2FScreenshot%202025-01-19%20at%208.25.45%E2%80%AFAM.png?alt=media&#x26;token=44a0ea5e-749e-4ce3-9b70-aa9381c4117d" alt="" width="563"><figcaption></figcaption></figure>

### <mark style="color:purple;">Mini Batch Gradient Decent</mark>

Mini-batch GD is a mix of both: instead of a single random instance, it picks small random subsets (mini-batches) and performs a batch update on each, so it converges very close to the global minimum.
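
A rough sketch of mini-batch-style updates with scikit-learn's `SGDRegressor` and `partial_fit`; the synthetic data, batch size, and epoch count are arbitrary:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(1000, 1))
y = 3.0 * X.ravel() + 5.0 + rng.normal(0, 1, size=1000)  # noisy line y = 5 + 3x

sgd = SGDRegressor(learning_rate="constant", eta0=0.01)

batch_size = 32
for _ in range(50):                     # epochs
    idx = rng.permutation(len(X))       # shuffle before each epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        sgd.partial_fit(X[batch], y[batch])   # one update per mini-batch

print(sgd.intercept_, sgd.coef_)  # should end up near 5 and 3
```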

### Normal Equation Derivation

* Not useful for large datasets, because it requires inverting a matrix, which is costly.
* It computes θ directly, with no iterative gradient steps.
* In practice it is suitable only for moderately sized datasets (up to roughly 70k rows).
* If the inverse does not exist, clean the data or use another approach (e.g., the pseudo-inverse, as in the sketch below).
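
A NumPy sketch of the Normal Equation on the three example points, using the pseudo-inverse so the computation still works when the plain inverse does not exist:

```python
import numpy as np

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # first column of 1s for the intercept
Y = np.array([2.0, 2.8, 3.6])

# theta = (X^T X)^{-1} X^T Y, computed via the pseudo-inverse for numerical safety
theta = np.linalg.pinv(X.T @ X) @ X.T @ Y
print(theta)  # approximately [1.2, 0.8]
```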

### <mark style="color:purple;">Polynomial Regression</mark>

**Polynomial Regression** extends Linear Regression by adding polynomial terms to capture non-linear relationships:

$$
h(X) = \theta\_0 + \theta\_1 X + \theta\_2 X^2 + ... + \theta\_n X^n
$$

**Example:** Predicting salary based on experience, where a simple linear model fails. If

h(X) = 5000 + 2000X + 300X^2

for X = 5 years, the predicted salary is **₹22,500**.
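
A small scikit-learn sketch of fitting this kind of curve with `PolynomialFeatures`; the experience values are invented and the targets follow the example formula:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[1], [2], [3], [5], [8]])              # years of experience (invented)
y = 5000 + 2000 * X.ravel() + 300 * X.ravel() ** 2   # salaries following the example curve

model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)

print(model.predict([[5]]))  # roughly 22,500 for 5 years of experience
```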

### <mark style="color:purple;">**Learning Curves**</mark>

A plot of training and validation errors vs. training size, showing model performance.

#### <mark style="color:blue;">**Underfitting**</mark>

Occurs when the model is too simple (high bias), leading to high training and validation errors.\ <mark style="color:green;">**Example**</mark>**:** Linear regression on a curved dataset results in poor predictions.

#### <mark style="color:blue;">**Overfitting**</mark>

Occurs when the model is too complex (high variance), fitting noise instead of patterns.\ <mark style="color:green;">**Example**</mark>**:** A high-degree polynomial perfectly fits training data but performs poorly on new data.
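
One way to draw such a plot is scikit-learn's `learning_curve`; the model and data below are placeholders (a plain linear model on slightly curved data, so some underfitting is visible):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import learning_curve

rng = np.random.default_rng(0)
X = rng.uniform(0, 3, size=(200, 1))
y = 2 + 0.8 * X.ravel() ** 2 + rng.normal(0, 0.3, size=200)  # curved data

sizes, train_scores, val_scores = learning_curve(
    LinearRegression(), X, y, cv=5,
    train_sizes=np.linspace(0.1, 1.0, 10),
    scoring="neg_mean_squared_error",
)

plt.plot(sizes, -train_scores.mean(axis=1), label="training error")
plt.plot(sizes, -val_scores.mean(axis=1), label="validation error")
plt.xlabel("Training set size"); plt.ylabel("MSE"); plt.legend(); plt.show()
```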

## Lecture 5: *(<mark style="color:orange;">25/01/2025</mark>`)`*

[**Class Recording**](https://futurense.zoom.us/rec/play/wClklIgXDOZnTiJHfeRfvt-3ksHIskHZb9lK4zItUpPHl9E8PWEYjtLGZhSEDGSbb0cqQFmY3FsJss0Q.IQz19p--Vw6m3CTW)

### <mark style="color:purple;">Practical Session only</mark>

## Lecture 6: *(<mark style="color:orange;">01/02/2025</mark>`)`*

[**Class Recording**](https://futurense.zoom.us/rec/play/j5WSMXnsi7dDCuWCqmxB9o2JIJPmfF8wMmw-wDEKxFnjHwrzqMrpAgavUxUWEIJLgUhJ-SbNfFIauXTg.SLcZTN3I487thmxw)

{% file src="<https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FXhnd9wZYxWR0oqEyPqg1%2FLecture_6_LogisticRegression.pdf?alt=media&token=7e5c3296-df00-442b-ac73-ad71aecfc1b5>" %}

### Regularised Linear Models - tackle overfitting

Lasso Regression

Elastic Net

Error: Bias & Variance Tradeoff & Irreducible error

**Bias** => the model does not fit the data well, i.e., it underfits

**Variance** => a small change in the data changes the model a lot, i.e., it overfits

**Irreducible** => noise in the data that no model can fit; clean the data and remove outliers to reduce it
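
A brief sketch of Lasso and Elastic Net in scikit-learn; the data and alpha values are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.1, size=100)  # only 2 informative features

lasso = Lasso(alpha=0.1).fit(X, y)                     # L1 penalty: pushes some coefficients to 0
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)   # mix of L1 and L2 penalties

print("Lasso coefficients:      ", lasso.coef_)
print("Elastic Net coefficients:", enet.coef_)
```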

## Lecture 7: *(<mark style="color:orange;">02/02/2025</mark>`)`*

[**Class Recording**](https://futurense.zoom.us/rec/play/6SA3as3yDl6U5dU_YYqJjZOfolIcXAus-vwXJkRfkwovGxcyJMqaR5JWk5wvfDXEsU1wyx2PHhDCkryX.idOUmn4znoWW8nj5)

{% file src="<https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FGNR7JzhRhBWJqYAuB8bU%2FLecture_7_Classification.pdf?alt=media&token=8b7e5a6e-2084-4311-b3b9-04a648176baa>" %}

<figure><img src="https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2F5lGUfrScv2C0MvVayJLn%2Fregression-vs-classification_simple-comparison-image_v3.png?alt=media&#x26;token=6d65f7e7-0ed5-4f69-8431-173db9aaeeed" alt=""><figcaption></figcaption></figure>

<figure><img src="https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FLjnWJJLhDao4BBS1LCWo%2Ftypes-of-regression.jpg?alt=media&#x26;token=ab2aa900-64a6-4db3-9e97-9559173820ae" alt="" width="563"><figcaption></figcaption></figure>

## Lecture 8: *(<mark style="color:orange;">08/02/2025</mark>`)`*

[**Class Recording**](https://futurense.zoom.us/rec/play/6ojdaYMA9jcG2augzQbFX-0byogaiFCdkCQVoNA14JQkEFM3TLV-WWShZKt4RO5Sapov4el91WpzQF-C.LwnO69GmhpCjsPDc)

{% file src="<https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FWFXDDAGiq4IGYtRElV6u%2FLecture_8_DecisionTree.pdf?alt=media&token=4e01d49d-9f07-4ae5-be6e-6b9896e29ab9>" %}

[**https://chatgpt.com/share/67ac361b-1e38-800c-8d24-9e3991a11f25**](https://chatgpt.com/share/67ac361b-1e38-800c-8d24-9e3991a11f25)

## Lecture 9: *(<mark style="color:orange;">09/02/2025</mark>`)`*

[**Class Recording**](https://futurense.zoom.us/rec/play/C59FJ9vdLJ5HWwoBqMbG1sBeCtblObDVSJx1viu2a8iVNOkaTVDKsrW6r6CSXrKJUHgJMIhYNsNF4_z4.bSh7D9e9tunM16I0)

[**Doubt session  Recording**](https://futurense.zoom.us/rec/play/YNrPUbqtYKkTEkHUTp99wYBFgrUwHqpFITE61ZKNmxuAo59XGLKHXrWzC9u6Je_Mci2RMAmZ56DNGnYM.YggLz4fIBwO-Ov0q)

\
[**https://chatgpt.com/share/67af6047-4a34-8006-a25b-168265542c77**](https://chatgpt.com/share/67af6047-4a34-8006-a25b-168265542c77)

## Lecture 10: *(<mark style="color:orange;">15/02/2025</mark>`)`*

[**Class Recording**](https://futurense.zoom.us/rec/play/pHUB84cOxmdk4_vmRvOmpKtnNA-BMun-FTF_A34RsnEtYmt8AnscZOhMJqyQQWPrHjzn6OryMxVP1IaN.Qe4elChmrF_GNd30)

{% file src="<https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FDPQgFiVS73aLj240DvI7%2FLecture_10_RandomForest.pdf?alt=media&token=9d084151-e451-4ee9-b6ed-89fa8af6c150>" %}

### <mark style="color:purple;">Random Forest</mark>

**Random Forest** is an ensemble learning method that builds multiple decision trees and aggregates their predictions to improve accuracy and reduce overfitting. It uses **bagging** and **feature randomness** for robustness.

### <mark style="color:purple;">**Bagging (Bootstrap Aggregating)**</mark>&#x20;

in Random Forest improves stability and accuracy by training each decision tree on a different **random subset** of the dataset with replacement. This reduces variance and prevents overfitting.

<mark style="color:green;">**Example**</mark>**:** In a customer churn prediction model, each tree is trained on a different bootstrapped sample, and the final decision is made by averaging (regression) or voting (classification).

<mark style="color:green;">**Example 2**</mark>**&#x20;⇒**&#x20;

**Bagging in Random Forest** can be understood using an example of classifying apples and oranges. Suppose we have a dataset of fruits with features like **color, weight, and texture**.

Each decision tree in the Random Forest is trained on a **random subset** of this dataset (with replacement). Some trees may focus more on **color**, while others on **weight**. When classifying a new fruit, the final decision is made by majority voting.

<mark style="color:green;">**Like**</mark>**:**

* **Tree 1:** Says "Apple" based on red color
* **Tree 2:** Says "Orange" based on texture
* **Tree 3:** Says "Apple" based on weight

Final prediction: **"Apple" (majority vote).**

### <mark style="color:purple;">**Feature Importance in Random Forest**</mark>&#x20;

measures how much each feature contributes to the model's decision-making. It helps in feature selection by identifying the most influential features.

<mark style="color:green;">**Example**</mark>**:** In a fruit classification model, **color** might be the most important feature, followed by **texture** and **weight**.

**Formula:**

$$
FI\_j = \frac{1}{N} \sum\_{i=1}^{N} I\_{split, j}^{(i)}
$$

<mark style="background-color:yellow;">where</mark>:

* FI\_j = Feature importance of feature j
* N = Number of trees
* I\_{split, j}^{(i)} = Importance of feature j in tree i

**Code to get Feature Importance:**
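
A possible sketch with scikit-learn's `RandomForestClassifier` on the same made-up fruit data (feature names and values are illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

feature_names = ["color", "weight", "texture"]
X = np.array([[0.9, 150, 0.2], [0.8, 160, 0.3], [0.3, 120, 0.8],
              [0.2, 130, 0.9], [0.85, 155, 0.25], [0.25, 125, 0.85]])
y = np.array([1, 1, 0, 0, 1, 0])  # 1 = apple, 0 = orange

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

# feature_importances_ averages each feature's split importance across all trees
for name, importance in zip(feature_names, forest.feature_importances_):
    print(f"{name}: {importance:.3f}")
```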

## Lecture 11: *(<mark style="color:orange;">15/02/2025</mark>`)(Evening class)`*

[**Class Recording**](https://futurense.zoom.us/rec/play/jpi0HLOQad6EtlbhH98nh5gXAttQPSZkZjt2L8E8OXGgS5cupI-wcANYhw2slC3t44gWj96y9kuB8zA.BE8sibTfcLs92g-g)

{% file src="<https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FfuRpyJ5DDelfyrcZ2Yui%2FLecture_11_SVM.pdf?alt=media&token=abf09cd0-840c-4011-8423-8e3f803bd218>" %}

### <mark style="color:purple;">Boosting</mark>

**Boosting** is an ensemble technique that combines weak learners sequentially, where each model corrects the errors of the previous one, improving overall accuracy. It reduces bias and variance.

<mark style="color:green;">**Example**</mark>**:** In spam detection, boosting refines misclassified emails by focusing more on difficult examples in each iteration.

### <mark style="color:purple;">**Example: Financial Fraud Detection (Using Bagging + Boosting Together)**</mark>

#### <mark style="color:orange;">**How?**</mark>

1. **Bagging inside Boosting:** Use **Random Forest** (bagging) as the base estimator in **AdaBoost/XGBoost** to make boosting more robust.
2. **Boosting inside Bagging:** Train multiple boosted models (e.g., **Gradient Boosted Trees**) and aggregate their predictions like bagging.

<mark style="color:orange;">**Step 1: Bagging (Random Forest) for Robust Feature Selection**</mark>

* A **Random Forest** model is trained using multiple decision trees on different subsets of transaction data.
* Each tree gives independent predictions, and majority voting ensures stable, less overfitting-prone results.
* <mark style="color:green;">**Example**</mark>**:**
  * Tree 1: Says "Fraud" based on transaction amount.
  * Tree 2: Says "Not Fraud" based on merchant type.
  * Tree 3: Says "Fraud" based on location difference.
  * **Final Bagging Prediction:** "Fraud" (majority vote).

<mark style="color:orange;">**Step 2: Boosting (XGBoost) for Enhanced Accuracy**</mark>

* The output from Random Forest is then fed into an **XGBoost model**, which corrects misclassifications.
* The model assigns **higher weights** to misclassified transactions and improves fraud detection.
* **Example:**
  * If Bagging misclassified a fraud case due to a rare merchant, Boosting will refine it using new weighted trees.

<mark style="color:orange;">**Final Outcome**</mark>

By combining **Bagging (for robustness)** and **Boosting (for accuracy improvement)**, the system detects fraud more reliably, reducing false positives and catching hard-to-detect fraudulent transactions.&#x20;

### <mark style="color:purple;">**Support Vector Machine (SVM)**</mark>

Support Vector Machine (SVM) is a supervised learning algorithm that finds the optimal **hyperplane** to separate classes with maximum margin. It works well for both linear and non-linear classification using **kernels**.

<mark style="color:green;">**Example**</mark>**:** In spam detection, SVM separates spam and non-spam emails based on word frequency patterns.

### <mark style="color:purple;">**Hard Margin vs. Soft Margin in SVM**</mark>

1. **Hard Margin SVM:**
   * Used when data is **linearly separable** with no misclassification.
   * <mark style="color:green;">**Example**</mark>**:** Perfectly separating red and blue balls in a 2D plane without overlap.
2. **Soft Margin SVM:**
   * Allows some misclassification for better generalization in **non-linearly separable** data.
   * <mark style="color:green;">**Example 1**</mark>**:** Classifying emails as spam or non-spam, where some emails might be misclassified due to ambiguous words.
   * <mark style="color:green;">**Example 2**</mark>**:** Separating dog and cat images where some breeds (e.g., Pomeranian vs. Persian cat) have similar features.

### <mark style="color:purple;">**Regularization Hyperparameter (CC)**</mark>

The **Regularization Hyperparameter (CC)** in SVM controls the trade-off between **maximizing margin** and **minimizing misclassification**.

* **High CC (low regularization)** → Focuses more on classifying all points correctly, leading to **overfitting**.
* **Low CC (high regularization)** → Allows some misclassification, leading to **better generalization**.

<mark style="color:green;">**Example 1**</mark>**:** In spam detection, a high CC might overfit to specific spam words, while a low CC generalizes better.\ <mark style="color:green;">**Example 2**</mark>**:** In image classification, a low CC prevents overfitting to noise in training images.

### <mark style="color:purple;">**Non-Linear SVM**</mark>

When data is **not linearly separable**, SVM uses **kernel tricks** to map it into a higher-dimensional space where a hyperplane can separate the classes.

<mark style="color:green;">**Example**</mark>**:**

Classifying red and blue points that form concentric circles. A linear SVM fails, but using a **Radial Basis Function (RBF) kernel**, we transform data into a higher dimension where a clear separation is possible.

<mark style="color:orange;">**Graph (Visualization of Non-Linear SVM)**</mark>

<figure><img src="https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FqNVU2MPdvt4hjnybDeey%2FScreenshot%202025-02-15%20at%208.43.58%20PM.png?alt=media&#x26;token=91bf814d-fc88-47a2-85c7-f0ddedcc1bc7" alt="" width="563"><figcaption></figcaption></figure>

The plot shows how **SVM with an RBF kernel** separates non-linearly distributed data (moons dataset). The **decision boundary** curves around the data, demonstrating how kernel tricks enable SVM to handle complex patterns.&#x20;
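
A bare sketch of the same idea with scikit-learn's `make_moons` and an RBF-kernel `SVC`; the gamma and C values are arbitrary:

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.15, random_state=42)

# The RBF kernel implicitly maps the moons into a space where they become separable
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma=2, C=1))
clf.fit(X, y)

print("Training accuracy:", clf.score(X, y))
```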

## Lecture 12: *(<mark style="color:orange;">22/02/2025</mark>`)`*

[**Class Recording**](https://futurense.zoom.us/rec/play/6XyB9M5pTZe61hYGYHrv0r7G-9U8c8DToUMSJwV7ysDCn5sp-waQtR2XYtoVrKV9TjrEC_-MAKkrmQZr.s8MOBFD_jFVruf_u)

### <mark style="color:purple;">**Kernel Function in SVM**</mark>

A **kernel function** transforms non-linearly separable data into a **higher-dimensional space**, making it linearly separable.

#### <mark style="color:orange;">**Common Kernel Types**</mark>**:**

1. **Linear Kernel** – Used when data is already linearly separable.
   * **Example:** Separating spam vs. non-spam emails based on word frequency.
2. **Polynomial Kernel** – Maps data into polynomial space for curved decision boundaries.
   * **Example:** Classifying different species of flowers with overlapping petal lengths.
3. **RBF (Gaussian) Kernel** – Maps data to an infinite-dimensional space, capturing complex patterns.
   * **Example:** Detecting fraudulent transactions with non-linear relationships.
4. **Sigmoid Kernel** – Similar to a neural network activation function.
   * **Example:** Handwriting recognition where patterns need non-linear separation.

[<mark style="color:blue;">**SVM CODE**</mark>](https://github.com/manvendrapratapsinghdev/IITJMaterial/blob/main/T1/ML/Code/SVM.ipynb)

## Lecture 13: *(<mark style="color:orange;">23/02/2025</mark>`)`*

[**Class Recording**](https://futurense.zoom.us/rec/play/SS27MBKAcWTrbLhhyLWhVuyKlSPh85SqPOtzpnVnMKiOjaeVRNWjMeb2ezOw7eMZ2kLaPPoJoOZ5egjJ.EoZ5fGxoSTFmk2Yx)

#### **Analyzing Covariance Matrix in ML**

**What is a Covariance Matrix?**

A **covariance matrix** is a square matrix that captures the relationships between multiple variables in a dataset. Each element C(i, j) represents the covariance between variable X\_i and X\_j:

$$
C(i,j)= \frac{1}{n} \sum\_{k=1}^{n} (X\_{ki} - \bar{X\_i})(X\_{kj} - \bar{X\_j})
$$

* **Positive covariance** → Variables increase together.
* **Negative covariance** → One variable increases while the other decreases.
* **Zero covariance** → No linear relationship.

***

**Why is it Important in ML?**

1. **Feature Relationship**: Helps understand how features interact.
2. **Dimensionality Reduction**: Used in **PCA (Principal Component Analysis)** to find uncorrelated axes.
3. **Multicollinearity Detection**: Identifies redundant features in regression models.

***

**Example with Visualization**

Consider a dataset with two features, **Height (cm)** and **Weight (kg)**.

<figure><img src="https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FcCtdzIYWyu1OuVIUFO5V%2F121.png?alt=media&#x26;token=c0953f88-2803-4d12-8fcf-0a54ea36a163" alt="" width="563"><figcaption></figcaption></figure>

**Interpretation**: If the covariance matrix has a **high positive value**, Height and Weight are strongly correlated.

***

**Graphical Representation**

* **Heatmap of the Covariance Matrix**

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Illustrative heights (cm) and weights (kg); rows are variables, columns are samples
data = np.array([[170.0, 165.0, 180.0, 175.0, 160.0],
                 [ 68.0,  60.0,  80.0,  72.0,  55.0]])
cov_matrix = np.cov(data)

sns.heatmap(cov_matrix, annot=True, cmap="coolwarm")
plt.title("Covariance Matrix Heatmap")
plt.show()
```

This helps visualize how different features are related in high-dimensional datasets.

<figure><img src="https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2F2cuTbFd4ZGPi8R4Zgb6U%2FScreenshot%202025-02-23%20at%201.06.00%E2%80%AFPM.png?alt=media&#x26;token=aa28d999-2abe-4846-92cd-d6bb37314fd9" alt="" width="563"><figcaption></figcaption></figure>

***

#### **Relevance to PCA (Dimensionality Reduction)**

PCA relies on the **eigenvectors** and **eigenvalues** of the covariance matrix to transform correlated variables into **uncorrelated principal components**.

```python
from sklearn.decomposition import PCA

# `data` is the 2 x n heights/weights array from the covariance example above
pca = PCA(n_components=2)
pca.fit(data.T)  # PCA expects samples as rows, so the 2 x n array is transposed

print("Principal Components:\n", pca.components_)  # exact numbers depend on the data
```

This transforms the dataset into new axes where features are **uncorrelated**, making ML models more efficient.

```
Principal Components:
 [[ 0.77334214  0.63398891]
 [-0.63398891  0.77334214]]
```

***

**Conclusion**

The covariance matrix is a fundamental tool in ML for **understanding feature relationships**, **reducing dimensions**, and **improving model efficiency.**

## Lecture 14: *(<mark style="color:orange;">01/03/2025</mark>`)`*

<mark style="color:red;">**Class Cancelled**</mark>&#x20;

## Lecture 15: *(<mark style="color:orange;">02/03/2025</mark>`)`*

**Class Recording**

{% file src="<https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FZvrhu6eCeXtJ0SNHxCjU%2FLecture_KMEans.pdf?alt=media&token=bae304d5-a9ce-40cd-bc51-62135e0a30f8>" %}

## <mark style="color:red;background-color:yellow;">Unsupervised Learning</mark>

Unsupervised learning finds hidden patterns or structures in data without labeled outputs. It is widely used in clustering, anomaly detection, and dimensionality reduction.

### <mark style="color:purple;">K-means Clustering: A Simple Yet Powerful Algorithm</mark>

K-Means is an unsupervised learning algorithm that groups data into K clusters by minimizing intra-cluster distance. It follows an iterative process of centroid initialization, point assignment, centroid update, and convergence.

* <mark style="color:blue;">How K-Means Works:</mark>
  * Select the number of clusters (K).
  * Randomly initialize K centroids.
  * Assign data points to the nearest centroid.
  * Recalculate centroids based on cluster means.
  * Repeat until convergence.
* <mark style="color:blue;">Finding Optimal K (Elbow Method):</mark>
  * Run K-Means for different K values.
  * Calculate total variation within clusters.
  * Plot results and find the "elbow point" where adding clusters no longer reduces variation significantly.
* <mark style="color:blue;">Applications & Considerations</mark>:
  * Works for **1D, 2D, and multi-dimensional** data.
  * Used in **customer segmentation, image compression, and heatmaps**.
  * Running multiple times helps counter randomness in centroid initialization.

#### <mark style="color:blue;">Limitations of K-Means Clustering</mark>

* **Sensitivity to Initialization** – The algorithm's final clustering results can vary due to different initial centroid placements, leading to inconsistent outcomes.
* **Fixed Number of Clusters (K)** – K-means requires specifying the number of clusters in advance, which can be challenging without prior knowledge of the data structure.
* **Struggles with Non-Spherical Clusters** – It assumes clusters are spherical and evenly sized, making it ineffective for complex, irregularly shaped clusters.
* **Sensitivity to Outliers** – Outliers can distort centroid positions, leading to inaccurate cluster assignments and affecting overall performance.

#### <mark style="color:blue;">Cost function</mark>

The cost function for **K-Means Clustering** is the **Sum of Squared Errors (SSE)**, also known as **Inertia**. It measures the compactness of clusters by calculating the squared distance between each data point and its assigned centroid.

$$
J= \sum\_{i=1}^{K} \sum\_{x \in C\_i} || x - \mu\_i ||^2
$$

```
Where:
K = Number of clusters
x = Data point
μ_i = Centroid of cluster C_i
|| x - μ_i ||^2 = Squared Euclidean distance between the point and the centroid
```

The objective of **K-Means** is to **minimize** this cost function to achieve the best clustering.

#### <mark style="color:blue;">Pseudocode</mark>

<figure><img src="https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FEIVUs1eReVHv6hnEjQ7D%2Fflow_digaram_ML_k_mean.jpeg?alt=media&#x26;token=05b694a5-a11f-4b05-834b-3add180c418a" alt="" width="326"><figcaption></figcaption></figure>

```
Initialize K centroids randomly  
Repeat until convergence:  
  Assign each data point to the nearest centroid  
  Update centroids by computing the mean of assigned points  
  Check for convergence (centroids no longer change)

```

### WORKING......

### Mini-Batch K-Means and the Elbow Method
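
A sketch of K-Means, Mini-Batch K-Means, and the elbow method with scikit-learn; the blob dataset, K range, and batch size are arbitrary choices:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=1000, centers=4, random_state=42)

# Elbow method: plot inertia (SSE) for a range of K values
inertias = [KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_
            for k in range(1, 10)]
plt.plot(range(1, 10), inertias, marker="o")
plt.xlabel("K"); plt.ylabel("Inertia (SSE)"); plt.show()

# Mini-Batch K-Means: centroids are updated on small random batches, which is faster
mbk = MiniBatchKMeans(n_clusters=4, batch_size=100, n_init=10, random_state=42)
labels = mbk.fit_predict(X)
print(mbk.cluster_centers_)
```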

## Lecture 16: *(<mark style="color:orange;">08/03/2025</mark>`)`*

**Class Recording**

{% file src="<https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FAPD9awO3kAhWzcj7zkx8%2FLecture_KMEans.pdf?alt=media&token=8c96f7bf-038b-485a-887c-333ac2ecaca2>" %}

### <mark style="color:purple;">Silhouette Coefficient</mark>&#x20;

The **Silhouette Coefficient (or Silhouette Score)** is a metric used to evaluate the quality of clustering in unsupervised learning. It measures how similar a data point is to its own cluster compared to other clusters. The score ranges from **-1 to 1**, where:

* **1** → The data point is well clustered.
* **0** → The data point is on the border between clusters.
* **-1** → The data point is likely misclassified.

$$
S(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))}
$$

```
where:
a(i) = Average intra-cluster distance 
(distance from i to all other points in the same cluster).

b(i) = Average nearest-cluster distance 
(distance from i to all points in the closest neighboring cluster).
```
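
A quick sketch of computing the score with scikit-learn on synthetic blobs; the dataset and K range are arbitrary:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=500, centers=3, random_state=0)

# Higher silhouette = better-separated, more compact clusters
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(f"K={k}: silhouette = {silhouette_score(X, labels):.3f}")
```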

<figure><img src="https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FtvblC99FaYciVNQPt0gW%2F12.png?alt=media&#x26;token=a118c112-016c-4011-8e1a-ec1e8f92fe6d" alt=""><figcaption></figcaption></figure>

### <mark style="color:purple;">Bayesian Decision Theory Risk function</mark>

Bayesian Decision Theory provides a probabilistic approach to decision-making under uncertainty. The **Risk Function** quantifies the expected loss when making decisions based on uncertain information.

## Lecture 17: *(<mark style="color:orange;">09/03/2025</mark>`)`*

**Class Recording**

{% file src="<https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2F5SWFACdWzAnirhbKpUDp%2FLecture_PCA.pdf?alt=media&token=b187137b-741b-4235-a6f9-1eebf6802d3a>" %}

### <mark style="color:purple;">Principal Component Analysis (PCA)</mark>

Locally Linear Embedding

#### <mark style="color:orange;">Eigenvalue</mark>

## Lecture 18: *(<mark style="color:orange;">16/03/2025</mark>`)`*

**Class Recording**

{% file src="<https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2F4xra4a5Ez6gReR53U2cl%2FLecture_BayesClassification.pdf?alt=media&token=ee4246c2-0ce9-4953-90d6-eaf4246d02b7>" %}

## Lecture : *(<mark style="color:orange;">22/03/2025</mark>`)`*

**Class Recording**

{% file src="<https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FsRw0C0iVqEAGwiAPCHL8%2FExample_BayesClass.pdf?alt=media&token=4b31e703-a3f6-45ba-844d-77e5269f079b>" %}

## Lecture : *(<mark style="color:orange;">23/03/2025</mark>`)`*

**Class Recording**

{% file src="<https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FH6VYXUwvKHPawvfcapfL%2FLecture_Neural%20Networks.pdf?alt=media&token=189d7d37-0349-4569-8515-84985cc5651d>" %}

### <mark style="color:purple;">Neural Networks</mark>

## Lecture : *(<mark style="color:orange;">29/03/2025</mark>`)`*

**Class Recording**

<figure><img src="https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FoTfFwyEBjVp7hBnCjYEj%2Fnural.png?alt=media&#x26;token=16f8e775-6a08-4355-9163-699df7e53b36" alt="" width="563"><figcaption></figcaption></figure>

### <mark style="color:purple;">Neural Networks</mark>

**Input Encoding**: The original photo is processed into a latent feature space using a CNN.

**Style Conditioning**: A text encoder converts the prompt "Ghibli style" into a style embedding.

**Latent Fusion**: Cross-attention fuses the photo’s content with the Ghibli style.

**Diffusion Refinement**: An iterative diffusion model denoises the fused latent space to align it with the desired style.

**Decoding**: A decoder converts the refined latent representation back into the final stylized image.

<figure><img src="https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FN6jCCrkAMMqlsyq4FWQe%2Fnu.png?alt=media&#x26;token=9506e19c-858b-47d5-9afa-f830d7f4ee2c" alt="" width="563"><figcaption></figcaption></figure>

{% file src="<https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FC8YORhcr65sXmlxwEisz%2FNeural_Networks.pdf?alt=media&token=aabb4ac6-3c98-4707-8380-bcd260b83e44>" %}

## Lecture : *(<mark style="color:orange;">30/03/2025</mark>`)`*

**Class Recording**

<figure><img src="https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2F9cuDGHyGihClHGvYTrFf%2Factivation.png?alt=media&#x26;token=297d4b9b-4223-419f-b23e-07b7a2cf6d43" alt=""><figcaption></figcaption></figure>

## Lecture : *(<mark style="color:orange;">05/04/2025</mark>`)`*

**Class Recording**

{% file src="<https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FLYc8Vj8G5BLdnsXtbfjp%2FCNN_250403_115321.pdf?alt=media&token=a2b0f0e9-f378-4bb9-8303-1fb48f456482>" %}

<mark style="color:purple;">**CNN**</mark>

Convolutional Neural Networks are a specialized kind of neural network designed for processing structured grid data like images. They are particularly effective in visual recognition tasks.

<figure><img src="https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2Fn9hRjF0Few9p7MnWlt3A%2FCNN.png?alt=media&#x26;token=407e9a6c-5373-470a-9dcb-eaa93e7f13d0" alt=""><figcaption></figcaption></figure>

## Lecture : *(<mark style="color:orange;">06/04/2025</mark>`)`*

**Class Recording**

<mark style="color:purple;">**CNN**</mark>

{% embed url="<https://onyx-jay-6db.notion.site/DL-Notes-1d2d947d0c3e80539b46ea40c05d2533>" %}
Detailed notes by Akash
{% endembed %}

{% embed url="<https://onyx-jay-6db.notion.site/DL-Notes-1d2d947d0c3e80539b46ea40c05d2533>" %}
Good notes by Ashish
{% endembed %}

{% file src="<https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FpkLV6xht7rhPjJjchM4j%2FANN%2C%20Neurons%2C%20Activation%20Function.pdf?alt=media&token=1c413dc3-b8e1-4aaa-85ec-f13a51e55c80>" %}
Handwritten notes by Ashish
{% endfile %}

## Lecture : *(<mark style="color:orange;">12/04/2025</mark>`)`*

**Class Recording**

### <mark style="color:purple;">**Autoencoders**</mark>

Autoencoders are neural networks designed to learn efficient representations (encodings) of data, typically for dimensionality reduction, denoising, or generative tasks. They work by trying to reconstruct their inputs.

<figure><img src="https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2Fkm3tyEXOzwdZ8XGPxYyE%2Fauto.png?alt=media&#x26;token=7b72c99c-a1ef-4ba8-b587-f5ac99b125eb" alt=""><figcaption></figcaption></figure>

### <mark style="color:purple;">**RNN (Recurrent Neural Network)**</mark> <a href="#bh-v_nri3fq0zyn4sudr9nn1" id="bh-v_nri3fq0zyn4sudr9nn1"></a>

RNNs are neural networks designed for sequential data, where the current output depends not only on the current input but also on previous inputs. They are widely used in tasks involving time series, language, and sequences.

<figure><img src="https://993787502-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxTZGvcOxnCaOvsmUzDj5%2Fuploads%2FvrkpdX7s0MB14Q8cnxaj%2Fautoi.png?alt=media&#x26;token=49069b3c-6743-43a9-ab2a-7bbbb32d7742" alt=""><figcaption></figcaption></figure>

## Lecture : *(<mark style="color:orange;">13/04/2025</mark>`)`*

<mark style="color:orange;">QUIZ</mark>
