📠Machine Learning
ML is used where human expertise does not exist
Course Content
Introduction: Definitions, Datasets for Machine Learning, Different Paradigms of Machine Learning, Data Normalization, Hypothesis Evaluation, VC-Dimensions and Distribution, Bias-Variance Tradeoff, Linear Regression, Classification (5-6 Lectures)
• Bayes Decision Theory: Bayes decision rule, Minimum error rate classification, Normal density and discriminant functions Parameter Estimation: Maximum Likelihood and Bayesian Parameter Estimation (3-4 Lectures)
• Discriminative Methods: SVM, Distance-based methods, Linear Discriminant Functions, Decision Tree, Random Decision Forest and Boosting (4 Lectures)
• Dimensionality Reduction: PCA, LDA, ICA, SFFS, SBFS (2-3 Lectures)
• Clustering: k-means clustering, Gaussian Mixture Modeling, EM-algorithm (3 Lectures)
• Kernels and Neural Networks, Kernel Tricks, SVMs (primal and dual forms), K-SVR, K-PCA (2 Lectures)
• Artificial Neural Networks: MLP, Backprop, and RBF-Net (3 Lectures)
• Foundations of Deep Learning: CNN, Autoencoders (2-3 lectures)
• Time series analysis
Exams
50% internal
22.5% (7.5% each) for 3 quizzes
12.5% for Assignment 1 (groups of 2)
15% for Assignment 1 (groups of 3)
50% Main
Lecture 1: (11/01/2025)
Category of Data Set: <explanation needed>
What is ML
Learning is any process by which a system improves performance from experience – Herbert Simon
A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E. —Tom Mitchell, 1997
E, T, P examples
Checkers
T: Playing checkers
P: Percentage of games won against an arbitrary opponent
E: Playing practice games against itself
Handwriting recognition
T: Recognizing hand-written words
P: Percentage of words correctly classified
E: Database of human-labeled images of handwritten words
Autonomous driving
T: Driving on four-lane highways using vision sensors
P: Average distance traveled before a human-judged error
E: A sequence of images and steering commands recorded while observing a human driver.
Email spam filtering
T: Categorize email messages as spam or legitimate.
P: Percentage of email messages correctly classified.
E: Database of emails, some with human-given labels

Task T
Classifications of data
Ranking
Recommendation
Clustering
Density estimation

When do we use Machine Learning ?
• Human expertise does not exist (navigating on Mars)
• Humans can’t explain their expertise (speech recognition)
• Models must be customized (personalized medicine)
• Models are based on huge amounts of data (genomics)
• Learning isn't always useful

Sample Applications of ML
Web search
Computational biology
Finance
E-commerce
Space exploration
Robotics
Information extraction
Social networks
Debugging software
Medical imaging
Lecture 2: (12/01/2025)
Supervised Learning and Unsupervised Learning
Supervised learning:
When we have training data and the desired output (labels)
Example: email spam filtering, or classifying dog/cat from animal images
Binary vs Multi-class classification
Binary => true/ false
Multi-class => Multiple options
Unsupervised learning
When we have training data only
There is no labelled data
Finding patterns (clusters) in the given data
Example: astronomical data, market segmentation
Clustering:
Finding patterns in the data, where the number of clusters is supplied by the user, is unsupervised learning
Reinforcement Learning
Learning with rewards: the agent gets a reward for correct actions and a penalty for wrong ones
Example: ChatGPT and self-driving cars
ML System Classification
Batch vs Online Learning
Batch => learning offline on the full dataset at once
Online Learning => the model keeps learning incrementally from new data (e.g., ChatGPT)
Instance Base vs Model Base
Challenges of Machine learning
Insufficient data
Non-representative training data
Poor-quality data
Irrelevant features
Performance Measure
It is also called the cost.
Root Mean Square error
RMSE = sqrt [(Σ(Pi – Oi)²) / n]
Mean Square error
MSE = (Σ(Pi – Oi)²) / n
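Both measures can be computed directly with NumPy; the numbers below are made up for illustration:

```python
import numpy as np

# Predicted (Pi) and observed (Oi) values -- illustrative numbers only
predicted = np.array([2.5, 0.0, 2.1, 7.8])
observed = np.array([3.0, -0.5, 2.0, 8.0])

mse = np.mean((predicted - observed) ** 2)  # MSE = Σ(Pi - Oi)² / n
rmse = np.sqrt(mse)                         # RMSE = sqrt(MSE)
print(mse, rmse)
```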
Explore Data
Understand the data like
missing data
data range, count
Visually inspect the data using histograms
Look for patterns and outliers
Library => import matplotlib.pyplot as plt
Duplicate Data => Remove it
Segregate Data
Get Unique Identifier
Find Correlations => Standard Correlation Coefficient (Pearson's r)
Set aside 20% of the data (the proportion depends on dataset size) for testing
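The correlation step above can be sketched with pandas; the column names and values here are invented for illustration:

```python
import pandas as pd

# Tiny hypothetical housing-style dataset
df = pd.DataFrame({
    "size_sqft": [500, 800, 1000, 1500, 2000],
    "price": [100, 155, 210, 310, 400],
    "age_years": [30, 25, 20, 10, 5],
})

# Standard correlation coefficient (Pearson's r) between every pair of features
corr = df.corr()
print(corr["price"].sort_values(ascending=False))
```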

Lecture 3: (18/01/2025)
Data Preparation:
80% of data analysis is spent on the process of cleaning and preparing data.
Imputation: replacing null or blank values with zero, the mean, or the median, so that rows are not dropped just because a single value is missing.
Good Imputation: ?? Homework
Data Cleaning:
Capping: clip outliers to a threshold value
Encoding

Converting text to numbers, i.e., mapping text into numerical values, e.g., with an ordinal encoder
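A minimal sketch of ordinal encoding with scikit-learn's OrdinalEncoder; the category values are hypothetical:

```python
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical categorical column, e.g. proximity to the ocean
categories = [["INLAND"], ["NEAR OCEAN"], ["INLAND"], ["ISLAND"]]

encoder = OrdinalEncoder()
encoded = encoder.fit_transform(categories)
print(encoder.categories_)  # learned mapping: category -> integer index
print(encoded.ravel())      # -> [0. 2. 0. 1.] (alphabetical category order)
```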

Features Scaling and transformation
ML algorithms don't perform well when input numerical attributes have very different scales. Many algorithms behave best when features are roughly Gaussian, or scaled so the minimum is 0.
Feature Scaling: Adjusting the range of features (e.g., normalization or standardization) to ensure all features contribute equally to the model, preventing dominance by features with larger magnitudes.
Feature Transformation: Modifying features (e.g., log, square root, or polynomial transformations) to make data more suitable for modeling, often improving linearity or addressing skewness.
Multimodal Distribution
Hyperparameter tuning: grid search method
Feature importance: drop features with near-zero importance
Evaluate on Test
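The grid search mentioned above can be sketched with scikit-learn's GridSearchCV; the estimator, dataset, and parameter grid here are illustrative choices, not the ones used in class:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression

# Synthetic regression data for illustration
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=42)

# Try every combination of hyperparameters with 3-fold cross-validation
param_grid = {"n_estimators": [10, 30], "max_depth": [3, 5]}
search = GridSearchCV(RandomForestRegressor(random_state=42), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```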
Linear Regression:
for reference ML ppt CS229
Mean square error problem (Cost function)
Iteration to find theta
Gradient Descent: Mountain Example

Lecture 4: (19/01/2025)
Linear Regression
Models the relationship between a dependent variable Y and one or more independent variables X using a linear equation:
Y = β_0 + β_1 X + ϵ
where β_0 is the intercept, β_1 is the slope, and ϵ is the error term.
Example: Predicting house prices based on square footage.
If Y= 50000 + 200X,
then a 1000 sq. ft. house costs 50000+200(1000) = 250000.
Hypothesis function
In Linear Regression, the hypothesis function h(X) represents the predicted output as a linear combination of input features:
h(X) = θ_0 + θ_1 X
where θ_0 (intercept) and θ_1 (slope) are learned parameters.
why we calculate the hypothesis?
"To estimate the relationship between input 𝑋 and output 𝑌, allowing us to make predictions for new data."
Example: If h(X) = 50 + 10X, for X = 5, the predicted value is h(5) = 50 + 10(5) = 100.
Hypothesis function for Multiple Linear Regression
Predictions depend on multiple input features:
h(X) = θ_0 + θ_1 X_1 + θ_2 X_2 + … + θ_n X_n
Each X_i represents an independent variable, and θ_i are the learned coefficients.
Example: Predicting house price based on size (X_1) and number of rooms (X_2):
h(X)=50000+200X_1+10000X_2
For X_1= 1000 sq. ft, X_2 = 3 rooms, the price is ₹2,80,000.
Calculation Of θ ⇒ Cost Function
The values of θ_0,θ_1,… are found using Gradient Descent or the Normal Equation.
1. Gradient Descent Algorithm
Minimizes the cost function:
J(θ) = (1 / 2m) Σ (h(Xi) - Yi)²
m = Total number of training examples.
Xi = Input features of the ith training example.
Yi = Actual output (target value) of the ith training example.
h(Xi) = Predicted output using the hypothesis function.
Update Rule:
θ_j := θ_j - α (1/m) Σ (h(Xi) - Yi) Xi_j  (Xi_j is the j-th feature of the i-th example)
where α is the learning rate.
Example: For data points (1,2), (2,2.8), (3,3.6), running gradient descent iteratively updates θ_0 and θ_1 to best fit h(X).
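The iterative update above can be sketched in plain NumPy on those same three data points; the learning rate and iteration count are arbitrary choices for illustration:

```python
import numpy as np

# Data points from the example: (1, 2), (2, 2.8), (3, 3.6)
X = np.array([1.0, 2.0, 3.0])
Y = np.array([2.0, 2.8, 3.6])
m = len(X)

theta0, theta1 = 0.0, 0.0  # initial parameters
alpha = 0.1                # learning rate

for _ in range(5000):
    h = theta0 + theta1 * X                        # hypothesis h(X)
    theta0 -= alpha * (1 / m) * np.sum(h - Y)      # gradient step for θ_0
    theta1 -= alpha * (1 / m) * np.sum((h - Y) * X)  # gradient step for θ_1

print(theta0, theta1)  # converges to the exact fit Y = 1.2 + 0.8X
```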
2. Normal Equation (Direct Method)
Used when data is small, as it’s computationally expensive for large datasets.
Solves for θ without iteration:
θ = (Xᵀ X)⁻¹ Xᵀ Y
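The same three points from the gradient descent example, solved directly with the normal equation (a NumPy sketch):

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0])
Y = np.array([2.0, 2.8, 3.6])

# Add a bias column of ones so theta[0] plays the role of the intercept
Xb = np.c_[np.ones(len(X)), X]

# θ = (XᵀX)⁻¹ Xᵀ Y : direct solution, no iteration
theta = np.linalg.inv(Xb.T @ Xb) @ Xb.T @ Y
print(theta)  # -> [1.2, 0.8]
```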
Role of the hypothesis function
The hypothesis function serves as a mathematical model that maps inputs X to outputs Y, whether continuous (regression) or discrete (classification)
Regression: h(X) outputs continuous values, meaning the predictions can take any real number. Example: Predicting house prices, where h(X) = 50000 + 200X can output any value like ₹2,50,000 or ₹2,50,500. Regression predicts quantities.
Classification: h(X) outputs discrete values, meaning predictions belong to predefined categories. Example: Spam detection, where h(X) predicts either Spam (1) or Not Spam (0) based on email features.
Classification predicts labels.
Least Squares Optimization Problem
The Least Squares Optimization Problem finds the best-fit line by minimizing the sum of squared errors between predicted and actual values:
J(θ) = Σ (Yi - h(Xi))²
where h(Xi) is the predicted value for the i-th example.
Methods:
Gradient Descent iteratively updates θ to minimize J(θ).
Normal Equation directly computes.
Example: Fitting a line to points (1,2), (2,2.8), (3,3.6) by minimizing the squared differences between actual Y and predicted h(X).
Pitfalls of Least Squares Optimization:
Sensitive to Outliers: Large errors get squared, making the model biased toward extreme values.
Overfitting in High Dimensions: Too many features (X) can lead to poor generalization.
Multicollinearity: Highly correlated features cause unstable parameter estimates.
Non-Linearity: Least squares assumes a linear relationship, failing for complex patterns.
Heteroscedasticity: Unequal variance in errors violates model assumptions.
Example: If one house in a dataset has an extreme price (₹1 crore while others are ₹10-20 lakhs), the least squares model will be skewed.
Learning Rate:
Learning rate hyperparameter.
The learning rate (α) controls how much Gradient Descent updates model parameters in each step:
θ := θ - α ∇J(θ)
Effects:
Too high (α≫1) → Divergence (jumps over the minimum).
Too low (α≪1) → Slow convergence.
Example: If α= 0.01, the model learns steadily, but if α= 10, it may overshoot and fail to minimize the cost function.
When do we stop?
After a fixed number of iterations
Or when the improvement falls below a threshold
Numerical on MSE
Feature Scaling
improves Gradient Descent convergence by normalizing feature values. Two common methods:
Min-Max Scaling: x' = (x - min) / (max - min)
Scales values between 0 and 1.
Standardization (Z-score): z = (x - μ) / σ
Centers mean at 0 with unit variance.
Example: If house sizes range from 500 to 5000 sq. ft, without scaling, Gradient Descent takes longer to converge. Normalizing makes updates uniform, speeding up learning.
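Both scaling methods can be sketched in a few lines of NumPy; the house sizes below are made up:

```python
import numpy as np

sizes = np.array([500.0, 1200.0, 3000.0, 5000.0])  # house sizes in sq. ft

# Min-Max scaling: maps values into [0, 1]
minmax = (sizes - sizes.min()) / (sizes.max() - sizes.min())

# Standardization (Z-score): mean 0, unit variance
zscore = (sizes - sizes.mean()) / sizes.std()

print(minmax)
print(zscore)
```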

Batch Gradient Descent vs Stochastic GD
Batch Gradient Descent computes gradients using the entire dataset, making it slow for large datasets but stable. Batch GD is like a linear search: all the data is fed to the machine, so each update takes more time.
Stochastic Gradient Descent (SGD) updates parameters using one random instance at a time, making it faster but noisy. SGD is like a random search: it picks a random point for each update, so it never stops exactly at the minimum but only gets close to it, because the training instance changes on every step.
SGD does not converge exactly but oscillates near the minimum, helping escape local minima.
Example: In house price prediction, Batch GD updates after processing all houses, while SGD updates after each house, making it faster but less stable.

Mini-Batch Gradient Descent
A mix of both: instead of a single random instance, it picks small random subsets (mini-batches) and performs a batch update on each, getting very close to the global minimum.
Normal Equation Derivation
Not useful for large datasets, because we must invert a matrix, which is costly.
For small datasets, solving the normal equation directly is faster than iterating gradient descent.
It works best only up to moderate dataset sizes (around 70k examples).
If the inverse does not exist, clean the data or use another approach (e.g., the pseudo-inverse).
Polynomial Regression
Polynomial Regression extends Linear Regression by adding polynomial terms to capture non-linear relationships:
h(X) = θ_0 + θ_1 X + θ_2 X² + … + θ_d X^d
Example: Predicting salary based on experience, where a simple linear model fails. If
h(X) = 5000 + 2000X + 300X²
for X = 5 years, the predicted salary is 5000 + 2000(5) + 300(25) = ₹22,500.
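A scikit-learn sketch of the salary example: PolynomialFeatures adds the squared term, and the training data is generated from the quadratic above, so the fit is exact:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Hypothetical salary-vs-experience data following the quadratic trend
X = np.arange(1, 8, dtype=float).reshape(-1, 1)        # years of experience
y = 5000 + 2000 * X.ravel() + 300 * X.ravel() ** 2     # salary

poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)                         # adds the X² column

model = LinearRegression().fit(X_poly, y)
pred = model.predict(poly.transform([[5.0]]))
print(pred)  # ≈ 5000 + 2000·5 + 300·25 = 22500
```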
Learning Curves
A plot of training and validation errors vs. training size, showing model performance.
Underfitting
Occurs when the model is too simple (high bias), leading to high training and validation errors. Example: Linear regression on a curved dataset results in poor predictions.
Overfitting
Occurs when the model is too complex (high variance), fitting noise instead of patterns. Example: A high-degree polynomial perfectly fits training data but performs poorly on new data.
Lecture 5: (25/01/2025)
Practical Session only
Lecture 6: (01/02/2025)
Regularised Linear Models: tackle overfitting
Lasso Regression
Elastic Net
Error Bias & Variance Tradeoff & Irreducible
Bias => the model does not fit the data well, i.e., it underfits
Variance => a small change in the data changes the result a lot, i.e., it overfits
Irreducible => noise in the data that no model can fit; clean the data and remove outliers
Lecture 7: (02/02/2025)

Lecture 8: (08/02/2025)
https://chatgpt.com/share/67ac361b-1e38-800c-8d24-9e3991a11f25
Lecture 9: (09/02/2025)
https://chatgpt.com/share/67af6047-4a34-8006-a25b-168265542c77
Lecture 10: (15/02/2025)
Random Forest
Random Forest is an ensemble learning method that builds multiple decision trees and aggregates their predictions to improve accuracy and reduce overfitting. It uses bagging and feature randomness for robustness.
Bagging (Bootstrap Aggregating)
in Random Forest improves stability and accuracy by training each decision tree on a different random subset of the dataset with replacement. This reduces variance and prevents overfitting.
Example: In a customer churn prediction model, each tree is trained on a different bootstrapped sample, and the final decision is made by averaging (regression) or voting (classification).
Example 2 ⇒
Bagging in Random Forest can be understood using an example of classifying apples and oranges. Suppose we have a dataset of fruits with features like color, weight, and texture.
Each decision tree in the Random Forest is trained on a random subset of this dataset (with replacement). Some trees may focus more on color, while others on weight. When classifying a new fruit, the final decision is made by majority voting.
Like:
Tree 1: Says "Apple" based on red color
Tree 2: Says "Orange" based on texture
Tree 3: Says "Apple" based on weight
Final prediction: "Apple" (majority vote).
Feature Importance in Random Forest
measures how much each feature contributes to the model's decision-making. It helps in feature selection by identifying the most influential features.
Example: In a fruit classification model, color might be the most important feature, followed by texture and weight.
Formula:
FI_j = (1/N) Σ_i I_split,j(i)
where:
FI_j = feature importance of feature j
N = number of trees
I_split,j(i) = importance of feature j in tree i
Code to get Feature Importance:
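A minimal scikit-learn sketch, using the built-in iris dataset as a stand-in since the fruit data from the example is not available:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Illustrative stand-in dataset for the fruit example
data = load_iris()
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(data.data, data.target)

# feature_importances_ sums to 1 across all features
for name, importance in zip(data.feature_names, model.feature_importances_):
    print(f"{name}: {importance:.3f}")
```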
Lecture 11: (15/02/2025)(Evening class)
Boosting
Boosting is an ensemble technique that combines weak learners sequentially, where each model corrects the errors of the previous one, improving overall accuracy. It reduces bias and variance.
Example: In spam detection, boosting refines misclassified emails by focusing more on difficult examples in each iteration.
Example: Financial Fraud Detection (Using Bagging + Boosting Together)
How?
Bagging inside Boosting: Use Random Forest (bagging) as the base estimator in AdaBoost/XGBoost to make boosting more robust.
Boosting inside Bagging: Train multiple boosted models (e.g., Gradient Boosted Trees) and aggregate their predictions like bagging.
Step 1: Bagging (Random Forest) for Robust Feature Selection
A Random Forest model is trained using multiple decision trees on different subsets of transaction data.
Each tree gives independent predictions, and majority voting ensures stable, less overfitting-prone results.
Example:
Tree 1: Says "Fraud" based on transaction amount.
Tree 2: Says "Not Fraud" based on merchant type.
Tree 3: Says "Fraud" based on location difference.
Final Bagging Prediction: "Fraud" (majority vote).
Step 2: Boosting (XGBoost) for Enhanced Accuracy
The output from Random Forest is then fed into an XGBoost model, which corrects misclassifications.
The model assigns higher weights to misclassified transactions and improves fraud detection.
Example:
If Bagging misclassified a fraud case due to a rare merchant, Boosting will refine it using new weighted trees.
Final Outcome
By combining Bagging (for robustness) and Boosting (for accuracy improvement), the system detects fraud more reliably, reducing false positives and catching hard-to-detect fraudulent transactions.
Support Vector Machine (SVM)
Support Vector Machine (SVM) is a supervised learning algorithm that finds the optimal hyperplane to separate classes with maximum margin. It works well for both linear and non-linear classification using kernels.
Example: In spam detection, SVM separates spam and non-spam emails based on word frequency patterns.
Hard Margin vs. Soft Margin in SVM
Hard Margin SVM:
Used when data is linearly separable with no misclassification.
Example: Perfectly separating red and blue balls in a 2D plane without overlap.
Soft Margin SVM:
Allows some misclassification for better generalization in non-linearly separable data.
Example 1: Classifying emails as spam or non-spam, where some emails might be misclassified due to ambiguous words.
Example 2: Separating dog and cat images where some breeds (e.g., Pomeranian vs. Persian cat) have similar features.
Regularization Hyperparameter (C)
The Regularization Hyperparameter (C) in SVM controls the trade-off between maximizing the margin and minimizing misclassification.
High C (low regularization) → focuses more on classifying all points correctly, leading to overfitting.
Low C (high regularization) → allows some misclassification, leading to better generalization.
Example 1: In spam detection, a high C might overfit to specific spam words, while a low C generalizes better. Example 2: In image classification, a low C prevents overfitting to noise in training images.
Non-Linear SVM
When data is not linearly separable, SVM uses kernel tricks to map it into a higher-dimensional space where a hyperplane can separate the classes.
Example:
Classifying red and blue points that form concentric circles. A linear SVM fails, but using a Radial Basis Function (RBF) kernel, we transform data into a higher dimension where a clear separation is possible.
Graph (Visualization of Non-Linear SVM)

The plot shows how SVM with an RBF kernel separates non-linearly distributed data (moons dataset). The decision boundary curves around the data, demonstrating how kernel tricks enable SVM to handle complex patterns.
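The moons setup described above can be reproduced with a short scikit-learn sketch; the hyperparameters are illustrative defaults:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Moons dataset: two interleaving half-circles, not linearly separable
X, y = make_moons(n_samples=200, noise=0.1, random_state=42)

# RBF kernel maps the data so a separating surface can be found
rbf_svm = SVC(kernel="rbf", C=1.0, gamma="scale")
rbf_svm.fit(X, y)
print(rbf_svm.score(X, y))  # training accuracy, typically near 1.0 here
```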
Lecture 12: (22/02/2025)
Kernel Function in SVM
A kernel function transforms non-linearly separable data into a higher-dimensional space, making it linearly separable.
Common Kernel Types:
Linear Kernel – Used when data is already linearly separable.
Example: Separating spam vs. non-spam emails based on word frequency.
Polynomial Kernel – Maps data into polynomial space for curved decision boundaries.
Example: Classifying different species of flowers with overlapping petal lengths.
RBF (Gaussian) Kernel – Maps data to an infinite-dimensional space, capturing complex patterns.
Example: Detecting fraudulent transactions with non-linear relationships.
Sigmoid Kernel – Similar to a neural network activation function.
Example: Handwriting recognition where patterns need non-linear separation.
Lecture 13: (23/02/2025)
Analyzing Covariance Matrix in ML
What is a Covariance Matrix?
A covariance matrix is a square matrix that captures the relationships between multiple variables in a dataset. Each element C(i, j) represents the covariance between variables X_i and X_j:
C(i, j) = E[(X_i - μ_i)(X_j - μ_j)]
Positive covariance → Variables increase together.
Negative covariance → One variable increases while the other decreases.
Zero covariance → No linear relationship.
Why is it Important in ML?
Feature Relationship: Helps understand how features interact.
Dimensionality Reduction: Used in PCA (Principal Component Analysis) to find uncorrelated axes.
Multicollinearity Detection: Identifies redundant features in regression models.
Example with Visualization
Consider a dataset with two features, Height (cm) and Weight (kg).

Interpretation: If the covariance matrix has a high positive value, Height and Weight are strongly correlated.
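A NumPy sketch of the Height/Weight example; the measurements below are invented so that the two features are perfectly correlated:

```python
import numpy as np

# Hypothetical height (cm) and weight (kg) measurements
height = np.array([150.0, 160.0, 170.0, 180.0, 190.0])
weight = np.array([50.0, 58.0, 66.0, 74.0, 82.0])

# np.cov expects rows = variables, columns = observations
cov = np.cov(np.vstack([height, weight]))
print(cov)  # off-diagonal entry is the (positive) height-weight covariance
```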
Graphical Representation
Heatmap of the Covariance Matrix
This helps visualize how different features are related in high-dimensional datasets.

Relevance to PCA (Dimensionality Reduction)
PCA relies on the eigenvectors and eigenvalues of the covariance matrix to transform correlated variables into uncorrelated principal components.
This transforms the dataset into new axes where features are uncorrelated, making ML models more efficient.
Conclusion
The covariance matrix is a fundamental tool in ML for understanding feature relationships, reducing dimensions, and improving model efficiency.
Lecture 14: (01/03/2025)
Class Cancelled
Lecture 15: (02/03/2025)
Class Recording
Unsupervised Learning
Unsupervised learning finds hidden patterns or structures in data without labeled outputs. It is widely used in clustering, anomaly detection, and dimensionality reduction.
K-means Clustering: A Simple Yet Powerful Algorithm
K-Means is an unsupervised learning algorithm that groups data into K clusters by minimizing intra-cluster distance. It follows an iterative process of centroid initialization, point assignment, centroid update, and convergence.
How K-Means Works:
Select the number of clusters (K).
Randomly initialize K centroids.
Assign data points to the nearest centroid.
Recalculate centroids based on cluster means.
Repeat until convergence.
Finding Optimal K (Elbow Method):
Run K-Means for different K values.
Calculate total variation within clusters.
Plot results and find the "elbow point" where adding clusters no longer reduces variation significantly.
Applications & Considerations:
Works for 1D, 2D, and multi-dimensional data.
Used in customer segmentation, image compression, and heatmaps.
Running multiple times helps counter randomness in centroid initialization.
Limitations of K-Means Clustering
Sensitivity to Initialization – The algorithm's final clustering results can vary due to different initial centroid placements, leading to inconsistent outcomes.
Fixed Number of Clusters (K) – K-means requires specifying the number of clusters in advance, which can be challenging without prior knowledge of the data structure.
Struggles with Non-Spherical Clusters – It assumes clusters are spherical and evenly sized, making it ineffective for complex, irregularly shaped clusters.
Sensitivity to Outliers – Outliers can distort centroid positions, leading to inaccurate cluster assignments and affecting overall performance.
Cost function
The cost function for K-Means Clustering is the Sum of Squared Errors (SSE), also known as Inertia. It measures the compactness of clusters by calculating the squared distance between each data point and its assigned centroid:
J = Σ_k Σ_{x in cluster k} ||x - μ_k||²
where μ_k is the centroid of cluster k. The objective of K-Means is to minimize this cost function to achieve the best clustering.
Pseudocode

initialize K centroids randomly
repeat until assignments stop changing:
  assign each data point to its nearest centroid
  recompute each centroid as the mean of its assigned points
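The algorithm can be sketched with scikit-learn's KMeans on synthetic 2-D blobs; all data here is generated for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Two well-separated blobs of 2-D points
cluster_a = rng.normal(loc=[0, 0], scale=0.5, size=(50, 2))
cluster_b = rng.normal(loc=[5, 5], scale=0.5, size=(50, 2))
X = np.vstack([cluster_a, cluster_b])

# n_init=10 reruns with different random centroids to counter bad initialization
km = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)
print(km.cluster_centers_)  # ≈ [0, 0] and [5, 5], in some order
print(km.inertia_)          # the SSE cost function being minimized
```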
Mini-batch
Elbow method
Lecture 16: (08/03/2025)
Class Recording
Silhouette Coefficient
The Silhouette Coefficient (or Silhouette Score) is a metric used to evaluate the quality of clustering in unsupervised learning. It measures how similar a data point is to its own cluster compared to other clusters: for a point, s = (b - a) / max(a, b), where a is the mean distance to points in its own cluster and b is the mean distance to points in the nearest other cluster. The score ranges from -1 to 1, where:
1 → The data point is well clustered.
0 → The data point is on the border between clusters.
-1 → The data point is likely misclassified.
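A minimal sketch of computing the score with scikit-learn, on two well-separated synthetic clusters (the data is generated for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Two tight, well-separated 2-D clusters
X = np.vstack([
    rng.normal([0, 0], 0.3, size=(40, 2)),
    rng.normal([4, 4], 0.3, size=(40, 2)),
])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
score = silhouette_score(X, labels)
print(score)  # close to 1 for well-separated clusters
```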

Bayesian Decision Theory Risk function
Bayesian Decision Theory provides a probabilistic approach to decision-making under uncertainty. The Risk Function quantifies the expected loss when making decisions based on uncertain information: the conditional risk of taking action α_i given observation x is R(α_i | x) = Σ_j λ(α_i | ω_j) P(ω_j | x), where λ(α_i | ω_j) is the loss incurred by taking action α_i when the true state is ω_j.
Lecture 17: (09/03/2025)
Class Recording
Principal Component Analysis (PCA)
Local Linear Embedding
Eigenvalue
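A minimal PCA sketch with scikit-learn, on synthetic 2-D data stretched along one direction so the first principal component (the largest eigenvalue direction of the covariance matrix) dominates:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# Second feature is almost a multiple of the first, plus a little noise
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + 0.1 * rng.normal(size=200)])

pca = PCA(n_components=2).fit(X)
print(pca.explained_variance_ratio_)  # first component carries most variance
```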
Lecture 17: (16/03/2025)
Class Recording
Lecture : (22/03/2025)
Class Recording
Lecture : (23/03/2025)
Class Recording
Neural Networks
Lecture : (29/03/2025)
Class Recording

Neural Networks
Input Encoding: The original photo is processed into a latent feature space using a CNN.
Style Conditioning: A text encoder converts the prompt "Ghibli style" into a style embedding.
Latent Fusion: Cross-attention fuses the photo’s content with the Ghibli style.
Diffusion Refinement: An iterative diffusion model denoises the fused latent space to align it with the desired style.
Decoding: A decoder converts the refined latent representation back into the final stylized image.

Lecture : (30/03/2025)
Class Recording

Lecture : (05/04/2025)
Class Recording
CNN
Convolutional Neural Networks are a specialized kind of neural network designed for processing structured grid data like images. They are particularly effective in visual recognition tasks.

Lecture : (06/04/2025)
Class Recording
CNN
Lecture : (12/04/2025)
Class Recording
Autoencoders
Autoencoders are neural networks designed to learn efficient representations (encodings) of data, typically for dimensionality reduction, denoising, or generative tasks. They work by trying to reconstruct their inputs.

RNN (Recurrent Neural Network)
RNNs are neural networks designed for sequential data, where the current output depends not only on the current input but also on previous inputs. They are widely used in tasks involving time series, language, and sequences.

Lecture : (13/04/2025)
QUIZ