📠Machine Learning

ML is used where human expertise does not exist

Course Content

• Introduction: Definitions, Datasets for Machine Learning, Different Paradigms of Machine Learning, Data Normalization, Hypothesis Evaluation, VC-Dimensions and Distribution, Bias-Variance Tradeoff, Linear Regression, Classification (5-6 Lectures)

• Bayes Decision Theory: Bayes decision rule, Minimum error rate classification, Normal density and discriminant functions Parameter Estimation: Maximum Likelihood and Bayesian Parameter Estimation (3-4 Lectures)

• Discriminative Methods: SVM, Distance-based methods, Linear Discriminant Functions, Decision Tree, Random Decision Forest and Boosting (4 Lectures)

• Dimensionality Reduction: PCA, LDA, ICA, SFFS, SBFS (2-3 Lectures)

• Clustering: k-means clustering, Gaussian Mixture Modeling, EM-algorithm (3 Lectures)

• Kernels and Neural Networks, Kernel Tricks, SVMs (primal and dual forms), K-SVR, K-PCA (2 Lectures)

• Artificial Neural Networks: MLP, Backprop, and RBF-Net (3 Lectures)

• Foundations of Deep Learning: CNN, Autoencoders (2-3 lectures)

• Time series analysis

Exams
  • 50% internal

    • 22.5% (7.5% each) for 3 quizzes

    • 12.5% for Assignment 1 (groups of 2)

    • 15% for Assignment 2 (groups of 3)

  • 50% Main

Material
PDF (56 MB)

Lecture 1: (11/01/2025)

Class Recording

PDF (4 MB)

Category of Data Set: <explanation needed>

What is ML?

  • Learning is any process by which a system improves performance from experience – Herbert Simon

  • A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E. —Tom Mitchell, 1997

E, T, P examples

  • Checkers

    • T: Playing checkers

    • P: Percentage of games won against an arbitrary opponent

    • E: Playing practice games against itself

  • Handwriting recognition

    • T: Recognizing hand-written words

    • P: Percentage of words correctly classified

    • E: Database of human-labeled images of handwritten words

  • Self-driving car

    • T: Driving on four-lane highways using vision sensors

    • P: Average distance traveled before a human-judged error

    • E: A sequence of images and steering commands recorded while observing a human driver.

  • Email spam filtering

    • T: Categorize email messages as spam or legitimate.

    • P: Percentage of email messages correctly classified.

    • E: Database of emails, some with human-given labels

Task T

  • Classifications of data

  • Ranking

  • Recommendation

  • Clustering

  • Density estimation

When do we use Machine Learning?

• Human expertise does not exist (navigating on Mars)

• Humans can’t explain their expertise (speech recognition)

• Models must be customized (personalized medicine)

• Models are based on huge amounts of data (genomics)

• Learning isn't always useful (some tasks are solved better with explicit rules)

Sample Applications of ML

  • Web search

  • Computational biology

  • Finance

  • E-commerce

  • Space exploration

  • Robotics

  • Information extraction

  • Social networks

  • Debugging software

  • Medical imaging

Lecture 2: (12/01/2025)

Class Recording

Supervised Learning and Unsupervised Learning

Supervised learning:

When we have training data and the desired output (labels)

Example: email spam filtering, or telling dogs from cats in animal images

Binary vs Multi-class classification

Binary => true/false

Multi-class => Multiple options

Unsupervised learning

When we have training data only

There is no labelled data

Finding the patterns (clusters) in the given data

Example: astronomical data, market segmentation

Clustering:

Finding patterns in unlabelled data; even when the number of clusters is supplied by the user, it is still unsupervised learning

Reinforcement Learning

Learning with rewards: the agent gets a reward for correct actions and a penalty for wrong ones

Examples: ChatGPT and self-driving cars

ML System Classification

Batch vs Online Learning

Batch => learning from offline data, all at once

Online Learning => learning incrementally from incoming data (instructor's example: ChatGPT)

Instance-Based vs Model-Based

Challenges of Machine learning

  • Insufficient data

  • Non-representative training data

  • Poor-quality data

  • Irrelevant features

Performance Measure

It is also called the cost function

Root Mean Square error

RMSE = sqrt [(Σ(Pi – Oi)²) / n]

Mean Square error

MSE = (Σ(Pi – Oi)²) / n
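Both error measures can be checked in a few lines of NumPy (the P and O values here are made up for illustration):

```python
import numpy as np

# Predicted (P_i) and observed (O_i) values; the numbers are illustrative
P = np.array([2.0, 2.8, 3.6])
O = np.array([2.1, 2.5, 4.0])

mse = np.mean((P - O) ** 2)   # MSE = (Σ(Pi - Oi)²) / n
rmse = np.sqrt(mse)           # RMSE = sqrt(MSE)
```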

Explore Data

  1. Understand the data:

    • missing data

    • data range, counts

  2. Visually inspect the data using histograms:

    • look for patterns and outliers

    • Library => import matplotlib.pyplot as plt

  3. Duplicate data => remove it

  4. Segregate the data

  5. Get a unique identifier

  6. Find correlations => standard correlation coefficient (Pearson's r)

Segregate 20% of the data for testing purposes (so results depend less on any one split of the dataset)
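The correlation check and the 20% hold-out can be sketched with NumPy on synthetic data (in practice pandas' corr() and sklearn's train_test_split are the usual tools):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))                       # two made-up features
y = 2 * X[:, 0] + rng.normal(scale=0.1, size=100)   # target driven by feature 0

# Standard correlation coefficient (Pearson's r)
r = np.corrcoef(X[:, 0], y)[0, 1]

# Hold out 20% of the rows for testing
idx = rng.permutation(len(X))
n_test = int(0.2 * len(X))
X_test, X_train = X[idx[:n_test]], X[idx[n_test:]]
```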

Lecture 3: (18/01/2025)

Class Recording

Data Preparation:

80% of data analysis is spent on the process of cleaning and preparing data.

Imputation: replacing null or blank values with zero, the mean, or the median, so that rows with missing values need not be dropped entirely; the missing entry is mapped to some substitute value.

Good Imputation: ?? Homework
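A minimal NumPy sketch of median imputation (sklearn's SimpleImputer is the standard tool; the column values here are made up):

```python
import numpy as np

col = np.array([4.0, np.nan, 7.0, 5.0, np.nan])   # a column with missing values

median = np.nanmedian(col)                        # median of the non-missing entries
imputed = np.where(np.isnan(col), median, col)    # fill the gaps with the median
```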

Data Cleaning:

Capping: limiting extreme values to reduce the effect of outliers

Encoding

Converting text to numbers, i.e., mapping text categories onto numerical values (e.g., with an ordinal encoder such as sklearn's OrdinalEncoder).
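A hand-rolled sketch of ordinal encoding (sklearn's OrdinalEncoder does the same for whole datasets; the category names here are illustrative):

```python
# Map each category to an integer, preserving an assumed order
categories = ["low", "medium", "high"]
mapping = {c: i for i, c in enumerate(categories)}  # {'low': 0, 'medium': 1, 'high': 2}

data = ["medium", "low", "high", "high"]
encoded = [mapping[v] for v in data]                # [1, 0, 2, 2]
```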

Features Scaling and transformation

ML algorithms don't perform well when input numerical attributes have very different scales. Many models behave best when features follow a roughly Gaussian distribution centered near zero.

  1. Feature Scaling: Adjusting the range of features (e.g., normalization or standardization) to ensure all features contribute equally to the model, preventing dominance by features with larger magnitudes.

  2. Feature Transformation: Modifying features (e.g., log, square root, or polynomial transformations) to make data more suitable for modeling, often improving linearity or addressing skewness.

Multimodal Distribution

Hyperparameter tuning: grid search method

Feature importance: drop features with near-zero importance

Evaluate on the test set

Linear Regression:

For reference: the CS229 ML slides

Mean square error problem (Cost function)

Iterate to find θ

Gradient Descent: Mountain Example

Lecture 4: (19/01/2025)

Class Recording

Linear Regression

models the relationship between a dependent variable Y and one or more independent variables X using a linear equation:

Y = β_0 + β_1 X + ε

where β_0 is the intercept, β_1 is the slope, and ε is the error term.

Example: Predicting house prices based on square footage.

If Y= 50000 + 200X,

then a 1000 sq. ft. house costs 50000+200(1000) = 250000.

Hypothesis function

In Linear Regression, the hypothesis function represents the predicted output as a linear combination of input features:

h(X) = θ_0 + θ_1 X_1

where θ_0 (intercept) and θ_1 (slope) are learned parameters.

Why do we calculate the hypothesis?

"To estimate the relationship between input 𝑋 and output 𝑌, allowing us to make predictions for new data."

Example: If h(X) = 50 + 10X, then for X = 5 the predicted value is h(5) = 50 + 10(5) = 100.

Hypothesis function for Multiple Linear Regression

where predictions depend on multiple input features:

h(X) = θ_0 + θ_1 X_1 + θ_2 X_2 + ... + θ_n X_n

Each X_i represents an independent variable, and θ_i are the learned coefficients.

Example: Predicting house price based on size (X_1) and number of rooms (X_2):

h(X)=50000+200X_1+10000X_2

For X_1 = 1000 sq. ft and X_2 = 3 rooms, the price is 50000 + 200(1000) + 10000(3) = ₹2,80,000.

Calculation Of θ ⇒ Cost Function

The values of θ_0,θ_1,… are found using Gradient Descent or the Normal Equation.

1. Gradient Descent Algorithm

Minimizes the cost function:

J(θ) = (1/2m) Σ_{i=1}^{m} (h(X_i) − Y_i)²

  • m = Total number of training examples.

  • X_i = Input features of the ith training example.

  • Y_i = Actual output (target value) of the ith training example.

  • h(X_i) = Predicted output using the hypothesis function.

Update Rule:

θ_j := θ_j − α ∂J/∂θ_j

where α is the learning rate.

Example: For data points (1,2), (2,2.8), (3,3.6), running gradient descent iteratively updates θ_0 and θ_1 to best fit h(X).
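That example can be run directly; a minimal NumPy sketch of batch gradient descent on those three points (learning rate and iteration count are assumptions):

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0])
Y = np.array([2.0, 2.8, 3.6])
m = len(X)

theta0, theta1, alpha = 0.0, 0.0, 0.1
for _ in range(5000):
    h = theta0 + theta1 * X
    # Partial derivatives of J(θ) = (1/2m) Σ (h(X_i) - Y_i)²
    g0 = np.sum(h - Y) / m
    g1 = np.sum((h - Y) * X) / m
    theta0 -= alpha * g0
    theta1 -= alpha * g1
# The three points lie exactly on Y = 1.2 + 0.8·X
```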

2. Normal Equation (Direct Method)

Used when data is small, as it’s computationally expensive for large datasets.

Solves for θ without iteration:

θ = (XᵀX)⁻¹ XᵀY
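The normal equation can be verified on the same three points from the gradient descent example (NumPy sketch; np.linalg.pinv is the safer choice when XᵀX is singular):

```python
import numpy as np

X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])        # first column of 1s gives the intercept term
Y = np.array([2.0, 2.8, 3.6])

theta = np.linalg.inv(X.T @ X) @ X.T @ Y   # θ = (XᵀX)⁻¹ XᵀY
```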

Role of the hypothesis function

The hypothesis function serves as a mathematical model that maps inputs X to outputs Y, whether continuous (regression) or discrete (classification)

In regression, h(X) outputs continuous values, meaning the predictions can take any real number. Example: predicting house prices, where h(X) = 50000 + 200X can output any value like ₹2,50,000 or ₹2,50,500. It predicts quantities.

In classification, h(X) outputs discrete values, meaning predictions belong to predefined categories. Example: spam detection, where h(X) predicts either Spam (1) or Not Spam (0) based on email features. It predicts labels.

Least Squares Optimization Problem

The Least Squares Optimization Problem finds the best-fit line by minimizing the sum of squared errors between predicted and actual values:

J(θ) = Σ_{i=1}^{m} (Y_i − h(X_i))²

where

h(X) = θ_0 + θ_1 X

Methods:

  1. Gradient Descent iteratively updates θ to minimize J(θ).

  2. Normal Equation computes θ directly (no iteration).

Example: Fitting a line to points (1,2), (2,2.8), (3,3.6) by minimizing the squared differences between actual Y and predicted h(X).

Pitfalls of Least Squares Optimization:

  1. Sensitive to Outliers: Large errors get squared, making the model biased toward extreme values.

  2. Overfitting in High Dimensions: Too many features (X) can lead to poor generalization.

  3. Multicollinearity: Highly correlated features cause unstable parameter estimates.

  4. Non-Linearity: Least squares assumes a linear relationship, failing for complex patterns.

  5. Heteroscedasticity: Unequal variance in errors violates model assumptions.

Example: If one house in a dataset has an extreme price (₹1 crore while others are ₹10-20 lakhs), the least squares model will be skewed.

Learning Rate:

The learning rate is a hyperparameter.

The learning rate (α) controls how much Gradient Descent updates model parameters in each step:

θ_j := θ_j − α ∂J/∂θ_j

Effects:

  • Too high (α≫1) → Divergence (jumps over the minimum).

  • Too low (α≪1) → Slow convergence.

Example: If α= 0.01, the model learns steadily, but if α= 10, it may overshoot and fail to minimize the cost function.

  • When do we stop?

    • after a fixed number of iterations

    • when improvement falls below a threshold

Numerical example on MSE

Feature Scaling

improves Gradient Descent convergence by normalizing feature values. Two common methods:

Min-Max Scaling:

X′ = (X − X_min) / (X_max − X_min)

Scales values between 0 and 1.

Standardization (Z-score):

X′ = (X − μ) / σ

Centers mean at 0 with unit variance.

Example: If house sizes range from 500 to 5000 sq. ft, without scaling, Gradient Descent takes longer to converge. Normalizing makes updates uniform, speeding up learning.
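Both scalings in NumPy (the house sizes are illustrative; sklearn's MinMaxScaler and StandardScaler are the usual tools):

```python
import numpy as np

sizes = np.array([500.0, 1200.0, 3000.0, 5000.0])   # house sizes in sq. ft

minmax = (sizes - sizes.min()) / (sizes.max() - sizes.min())  # scaled into [0, 1]
zscore = (sizes - sizes.mean()) / sizes.std()                 # mean 0, unit variance
```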

Batch Gradient Descent vs Stochastic GD

  • Batch Gradient Descent computes gradients using the entire dataset, making it slow for large datasets but stable. Intuition: batch GD is like a linear search, since every update touches all the data, so each step takes longer.

  • Stochastic Gradient Descent (SGD) updates parameters using one random instance at a time, making it faster but noisy. Intuition: SGD is like a random search; it picks a random point for each step, so it never settles exactly at the minimum but ends up close to it, because the sampled data changes on every update.

SGD does not converge exactly but oscillates near the minimum, helping escape local minima.

Example: In house price prediction, Batch GD updates after processing all houses, while SGD updates after each house, making it faster but less stable.

Mini-Batch Gradient Descent

A mix of both: instead of single random instances, it picks small random subsets (mini-batches) and performs a batch update on each, so it gets very close to the global minimum.
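A NumPy sketch of mini-batch updates on synthetic data (the learning rate, batch size, and data are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 3, size=200)
Y = 1.2 + 0.8 * X + rng.normal(scale=0.05, size=200)  # noisy line

theta0, theta1 = 0.0, 0.0
alpha, batch = 0.05, 32
for epoch in range(200):
    order = rng.permutation(len(X))        # reshuffle each epoch
    for start in range(0, len(X), batch):
        b = order[start:start + batch]     # one random mini-batch
        h = theta0 + theta1 * X[b]
        theta0 -= alpha * np.mean(h - Y[b])
        theta1 -= alpha * np.mean((h - Y[b]) * X[b])
```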

Normal Equation Derivation

  • Not useful for large datasets, because it requires inverting a matrix, which is costly.

  • For small datasets, solving via the normal equation is faster than iterating gradient descent.

  • Lecture's rule of thumb: it works well up to roughly 70k examples.

  • If the inverse does not exist (XᵀX is singular), clean the data or use another approach (e.g., the pseudo-inverse).

Polynomial Regression

Polynomial Regression extends Linear Regression by adding polynomial terms to capture non-linear relationships:

h(X) = θ_0 + θ_1 X + θ_2 X² + ... + θ_n Xⁿ

Example: Predicting salary based on experience, where a simple linear model fails. If

h(X) = 5000 + 2000X + 300X²

then for X = 5 years, the predicted salary is 5000 + 2000(5) + 300(25) = ₹22,500.
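The same idea as a NumPy sketch, recovering the salary curve's coefficients from sampled points (sklearn's PolynomialFeatures with LinearRegression is the usual route):

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # years of experience (made up)
y = 5000 + 2000 * X + 300 * X**2          # salaries following the curve above

# Polynomial design matrix [1, X, X²], solved by least squares
A = np.column_stack([np.ones_like(X), X, X**2])
theta, *_ = np.linalg.lstsq(A, y, rcond=None)
```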

Learning Curves

A plot of training and validation errors vs. training size, showing model performance.

Underfitting

Occurs when the model is too simple (high bias), leading to high training and validation errors. Example: Linear regression on a curved dataset results in poor predictions.

Overfitting

Occurs when the model is too complex (high variance), fitting noise instead of patterns. Example: A high-degree polynomial perfectly fits training data but performs poorly on new data.

Lecture 5: (25/01/2025)

Class Recording

Practical Session only

Lecture 6: (01/02/2025)

Class Recording

Regularised Linear Models: tackle overfitting

Lasso Regression

Elastic Net

Error: Bias & Variance Tradeoff & Irreducible Error

Bias => the model does not fit the data well, i.e. underfits

Variance => a small change in the data changes the result a lot, i.e. overfits

Irreducible => noisy data; if the noise prevents the model from fitting, clean up the data (e.g., remove outliers)

Lecture 7: (02/02/2025)

Class Recording

Lecture 8: (08/02/2025)

Class Recording

https://chatgpt.com/share/67ac361b-1e38-800c-8d24-9e3991a11f25

Lecture 9: (09/02/2025)

Class Recording

Doubt session Recording

https://chatgpt.com/share/67af6047-4a34-8006-a25b-168265542c77

Lecture 10: (15/02/2025)

Class Recording

Random Forest

Random Forest is an ensemble learning method that builds multiple decision trees and aggregates their predictions to improve accuracy and reduce overfitting. It uses bagging and feature randomness for robustness.

Bagging (Bootstrap Aggregating)

in Random Forest improves stability and accuracy by training each decision tree on a different random subset of the dataset with replacement. This reduces variance and prevents overfitting.

Example: In a customer churn prediction model, each tree is trained on a different bootstrapped sample, and the final decision is made by averaging (regression) or voting (classification).

Example 2

Bagging in Random Forest can be understood using an example of classifying apples and oranges. Suppose we have a dataset of fruits with features like color, weight, and texture.

Each decision tree in the Random Forest is trained on a random subset of this dataset (with replacement). Some trees may focus more on color, while others on weight. When classifying a new fruit, the final decision is made by majority voting.

Like:

  • Tree 1: Says "Apple" based on red color

  • Tree 2: Says "Orange" based on texture

  • Tree 3: Says "Apple" based on weight

Final prediction: "Apple" (majority vote).

Feature Importance in Random Forest

measures how much each feature contributes to the model's decision-making. It helps in feature selection by identifying the most influential features.

Example: In a fruit classification model, color might be the most important feature, followed by texture and weight.

Formula:

FI_j = (1/N) Σ_{i=1}^{N} I_{split,j}^{(i)}

where:

  • FI_j = Feature importance of feature j

  • N = Number of trees

  • I_{split,j}^{(i)} = Importance of feature j in tree i

Code to get Feature Importance:
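A sketch using scikit-learn (synthetic data stands in for the fruit example; the feature names are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for a 3-feature fruit dataset (color, weight, texture)
X, y = make_classification(n_samples=300, n_features=3, n_informative=2,
                           n_redundant=0, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
for name, score in zip(["color", "weight", "texture"], forest.feature_importances_):
    print(f"{name}: {score:.3f}")
```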

Lecture 11: (15/02/2025)(Evening class)

Class Recording

Boosting

Boosting is an ensemble technique that combines weak learners sequentially, where each model corrects the errors of the previous one, improving overall accuracy. It reduces bias and variance.

Example: In spam detection, boosting refines misclassified emails by focusing more on difficult examples in each iteration.

Example: Financial Fraud Detection (Using Bagging + Boosting Together)

How?

  1. Bagging inside Boosting: Use Random Forest (bagging) as the base estimator in AdaBoost/XGBoost to make boosting more robust.

  2. Boosting inside Bagging: Train multiple boosted models (e.g., Gradient Boosted Trees) and aggregate their predictions like bagging.

Step 1: Bagging (Random Forest) for Robust Feature Selection

  • A Random Forest model is trained using multiple decision trees on different subsets of transaction data.

  • Each tree gives independent predictions, and majority voting ensures stable, less overfitting-prone results.

  • Example:

    • Tree 1: Says "Fraud" based on transaction amount.

    • Tree 2: Says "Not Fraud" based on merchant type.

    • Tree 3: Says "Fraud" based on location difference.

    • Final Bagging Prediction: "Fraud" (majority vote).

Step 2: Boosting (XGBoost) for Enhanced Accuracy

  • The output from Random Forest is then fed into an XGBoost model, which corrects misclassifications.

  • The model assigns higher weights to misclassified transactions and improves fraud detection.

  • Example:

    • If Bagging misclassified a fraud case due to a rare merchant, Boosting will refine it using new weighted trees.

Final Outcome

By combining Bagging (for robustness) and Boosting (for accuracy improvement), the system detects fraud more reliably, reducing false positives and catching hard-to-detect fraudulent transactions.

Support Vector Machine (SVM)

Support Vector Machine (SVM) is a supervised learning algorithm that finds the optimal hyperplane to separate classes with maximum margin. It works well for both linear and non-linear classification using kernels.

Example: In spam detection, SVM separates spam and non-spam emails based on word frequency patterns.

Hard Margin vs. Soft Margin in SVM

  1. Hard Margin SVM:

    • Used when data is linearly separable with no misclassification.

    • Example: Perfectly separating red and blue balls in a 2D plane without overlap.

  2. Soft Margin SVM:

    • Allows some misclassification for better generalization in non-linearly separable data.

    • Example 1: Classifying emails as spam or non-spam, where some emails might be misclassified due to ambiguous words.

    • Example 2: Separating dog and cat images where some breeds (e.g., Pomeranian vs. Persian cat) have similar features.

Regularization Hyperparameter (C)

The Regularization Hyperparameter (C) in SVM controls the trade-off between maximizing margin and minimizing misclassification.

  • High C (low regularization) → Focuses more on classifying all points correctly, leading to overfitting.

  • Low C (high regularization) → Allows some misclassification, leading to better generalization.

Example 1: In spam detection, a high C might overfit to specific spam words, while a low C generalizes better. Example 2: In image classification, a low C prevents overfitting to noise in training images.

Non-Linear SVM

When data is not linearly separable, SVM uses kernel tricks to map it into a higher-dimensional space where a hyperplane can separate the classes.

Example:

Classifying red and blue points that form concentric circles. A linear SVM fails, but using a Radial Basis Function (RBF) kernel, we transform data into a higher dimension where a clear separation is possible.

Graph (Visualization of Non-Linear SVM)

The plot shows how SVM with an RBF kernel separates non-linearly distributed data (moons dataset). The decision boundary curves around the data, demonstrating how kernel tricks enable SVM to handle complex patterns.
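A minimal sketch of that setup, using scikit-learn's moons dataset and an RBF-kernel SVC (plotting omitted):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.15, random_state=0)

# The RBF kernel lets the decision boundary curve around each moon
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
acc = clf.score(X, y)   # training accuracy
```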

Lecture 12: (22/02/2025)

Class Recording

Kernel Function in SVM

A kernel function transforms non-linearly separable data into a higher-dimensional space, making it linearly separable.

Common Kernel Types:

  1. Linear Kernel – Used when data is already linearly separable.

    • Example: Separating spam vs. non-spam emails based on word frequency.

  2. Polynomial Kernel – Maps data into polynomial space for curved decision boundaries.

    • Example: Classifying different species of flowers with overlapping petal lengths.

  3. RBF (Gaussian) Kernel – Maps data to an infinite-dimensional space, capturing complex patterns.

    • Example: Detecting fraudulent transactions with non-linear relationships.

  4. Sigmoid Kernel – Similar to a neural network activation function.

    • Example: Handwriting recognition where patterns need non-linear separation.

SVM Code

Lecture 13: (23/02/2025)

Class Recording

Analyzing Covariance Matrix in ML

What is a Covariance Matrix?

A covariance matrix is a square matrix that captures the relationships between multiple variables in a dataset. Each element C(i, j) represents the covariance between variable X_i and X_j:

C(i,j) = (1/n) Σ_{k=1}^{n} (X_ki − X̄_i)(X_kj − X̄_j)

  • Positive covariance → Variables increase together.

  • Negative covariance → One variable increases while the other decreases.

  • Zero covariance → No linear relationship.
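A quick check with NumPy (made-up height/weight values; note that np.cov divides by n−1 by default, while the formula above uses 1/n):

```python
import numpy as np

height = np.array([150.0, 160.0, 170.0, 180.0])   # cm
weight = np.array([50.0, 58.0, 66.0, 74.0])       # kg

C = np.cov(height, weight)   # 2x2 covariance matrix; C[0,1] > 0 here,
                             # since height and weight increase together
```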


Why is it Important in ML?

  1. Feature Relationship: Helps understand how features interact.

  2. Dimensionality Reduction: Used in PCA (Principal Component Analysis) to find uncorrelated axes.

  3. Multicollinearity Detection: Identifies redundant features in regression models.


Example with Visualization

Consider a dataset with two features, Height (cm) and Weight (kg).

Interpretation: If the covariance matrix has a high positive value, Height and Weight are strongly correlated.


Graphical Representation

  • Heatmap of the Covariance Matrix

This helps visualize how different features are related in high-dimensional datasets.


Relevance to PCA (Dimensionality Reduction)

PCA relies on the eigenvectors and eigenvalues of the covariance matrix to transform correlated variables into uncorrelated principal components.

This transforms the dataset into new axes where features are uncorrelated, making ML models more efficient.


Conclusion

The covariance matrix is a fundamental tool in ML for understanding feature relationships, reducing dimensions, and improving model efficiency.

Lecture 14: (01/03/2025)

Class Cancelled

Lecture 15: (02/03/2025)

Class Recording

Unsupervised Learning

Unsupervised learning finds hidden patterns or structures in data without labeled outputs. It is widely used in clustering, anomaly detection, and dimensionality reduction.

K-means Clustering: A Simple Yet Powerful Algorithm

K-Means is an unsupervised learning algorithm that groups data into K clusters by minimizing intra-cluster distance. It follows an iterative process of centroid initialization, point assignment, centroid update, and convergence.

  • How K-Means Works:

    • Select the number of clusters (K).

    • Randomly initialize K centroids.

    • Assign data points to the nearest centroid.

    • Recalculate centroids based on cluster means.

    • Repeat until convergence.

  • Finding Optimal K (Elbow Method):

    • Run K-Means for different K values.

    • Calculate total variation within clusters.

    • Plot results and find the "elbow point" where adding clusters no longer reduces variation significantly.

  • Applications & Considerations:

    • Works for 1D, 2D, and multi-dimensional data.

    • Used in customer segmentation, image compression, and heatmaps.

    • Running multiple times helps counter randomness in centroid initialization.

Limitations of K-Means Clustering

  • Sensitivity to Initialization – The algorithm's final clustering results can vary due to different initial centroid placements, leading to inconsistent outcomes.

  • Fixed Number of Clusters (K) – K-means requires specifying the number of clusters in advance, which can be challenging without prior knowledge of the data structure.

  • Struggles with Non-Spherical Clusters – It assumes clusters are spherical and evenly sized, making it ineffective for complex, irregularly shaped clusters.

  • Sensitivity to Outliers – Outliers can distort centroid positions, leading to inaccurate cluster assignments and affecting overall performance.

Cost function

The cost function for K-Means Clustering is the Sum of Squared Errors (SSE), also known as Inertia. It measures the compactness of clusters by calculating the squared distance between each data point and its assigned centroid.

J = Σ_{i=1}^{K} Σ_{x∈C_i} ||x − μ_i||²

The objective of K-Means is to minimize this cost function to achieve the best clustering.

Pseudocode

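The K-Means loop described above, sketched from scratch in NumPy (two synthetic blobs; the initialization picks one point from each blob purely for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated synthetic blobs in 2D
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])

K = 2
centroids = X[[0, 50]].copy()        # 1. initialize centroids
for _ in range(20):
    d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
    labels = d.argmin(axis=1)        # 2. assign each point to its nearest centroid
    centroids = np.array([X[labels == k].mean(axis=0)
                          for k in range(K)])  # 3. update centroids from cluster means

labels = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
J = ((X - centroids[labels]) ** 2).sum()   # SSE / inertia cost
```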

Mini-batch Elbow method

Lecture 16: (08/03/2025)

Class Recording

Silhouette Coefficient

The Silhouette Coefficient (or Silhouette Score) is a metric used to evaluate the quality of clustering in unsupervised learning. It measures how similar a data point is to its own cluster compared to other clusters. The score ranges from -1 to 1, where:

  • 1 → The data point is well clustered.

  • 0 → The data point is on the border between clusters.

  • -1 → The data point is likely misclassified.

S(i) = (b(i) − a(i)) / max(a(i), b(i))

where a(i) is the mean distance from point i to the other points in its own cluster, and b(i) is the mean distance from point i to the points in the nearest other cluster.
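A sketch computing the score with scikit-learn on two well-separated synthetic blobs (the data is made up; well-separated clusters should score close to 1):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (40, 2)), rng.normal(4, 0.3, (40, 2))])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
score = silhouette_score(X, labels)   # near 1 for well-separated clusters
```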

Bayesian Decision Theory Risk function

Bayesian Decision Theory provides a probabilistic approach to decision-making under uncertainty. The Risk Function quantifies the expected loss when making decisions based on uncertain information.

Lecture 17: (09/03/2025)

Class Recording

PDF (8 MB)

Principal Component Analysis (PCA)

Locally Linear Embedding (LLE)

Eigenvalue

Lecture 17: (16/03/2025)

Class Recording

Lecture : (22/03/2025)

Class Recording

Lecture : (23/03/2025)

Class Recording

Neural Networks

Lecture : (29/03/2025)

Class Recording

Neural Networks

Input Encoding: The original photo is processed into a latent feature space using a CNN.

Style Conditioning: A text encoder converts the prompt "Ghibli style" into a style embedding.

Latent Fusion: Cross-attention fuses the photo’s content with the Ghibli style.

Diffusion Refinement: An iterative diffusion model denoises the fused latent space to align it with the desired style.

Decoding: A decoder converts the refined latent representation back into the final stylized image.

Lecture : (30/03/2025)

Class Recording

Lecture : (05/04/2025)

Class Recording

CNN

Convolutional Neural Networks are a specialized kind of neural network designed for processing structured grid data like images. They are particularly effective in visual recognition tasks.

Lecture : (06/04/2025)

Class Recording

CNN

Detailed notes by Akash
Good notes by Ashish
Handwritten notes by Ashish

Lecture : (12/04/2025)

Class Recording

Autoencoders

Autoencoders are neural networks designed to learn efficient representations (encodings) of data, typically for dimensionality reduction, denoising, or generative tasks. They work by trying to reconstruct their inputs.

RNN (Recurrent Neural Network)

RNNs are neural networks designed for sequential data, where the current output depends not only on the current input but also on previous inputs. They are widely used in tasks involving time series, language, and sequences.

Lecture : (13/04/2025)

QUIZ
