# Data Structures

## Course Content

A tentative list of the topics covered in this course: Introduction to Algorithms, Insertion Sort, Divide and Conquer, Merge Sort, Solving Recurrences, Hashing, Binary Search Tree, Red-Black Tree, Augmenting Data Structures, Disjoint Sets, Graphs, Dijkstra's Algorithm, Maximum Flow, Matching, String Matching, KMP Algorithm, Dynamic Programming (Rod Cutting, LCS, etc.), complexity classes (P, NP, co-NP, NP-complete, NP-hard), and example reductions between problems.

## Exams

* 55% internal
  * 30% for 2 assignments
    * [1st deadline ⇒ 23 Feb](https://m-tech-in-artificial-intelligenc.gitbook.io/manvendrapratapsinghdev/trimester-1/broken-reference)
  * **25% quiz**
    * 23 Feb ⇒ 1st quiz
* 45% main exam

## Material

* [**Class Recordings**](https://general-smile-94b.notion.site/DSA-Class-Recording-1980dfee4e4380ed9bb0c9a2b4787540)
* [**Class Material**](https://github.com/manvendrapratapsinghdev/IITJMaterial/tree/main/T1/DSA)
* [**Books**](https://m-tech-in-artificial-intelligenc.gitbook.io/manvendrapratapsinghdev/trimester-1/broken-reference)

## Lecture 1: *(11/01/2025)*

[**Class Recording**](https://drive.google.com/file/d/1pgt51KGnl74hSujvVaHRswwKOwF_nCro/view?usp=sharing)

* What is an algorithmic problem?
* What does it mean to say that an algorithm solves a problem?

## Lecture 2: *(12/01/2025)*

[**Class Recording**](https://drive.google.com/file/d/1TGfVwGxfmn1Jz3mWV-nqs7y-4tvVfFeL/view?usp=sharing)

### Insertion Sort

**Technique** ⇒ Incremental

{% code title="Pseudocode" %}
```plaintext
InsertionSort(arr, n):            // arr is 1-indexed, n = length(arr)
    for i = 2 to n:
        key = arr[i]
        j = i - 1
        while j > 0 and arr[j] > key:
            arr[j + 1] = arr[j]
            j = j - 1
        arr[j + 1] = key
```
{% endcode %}

```python
# Python code for insertion sort (0-indexed)
array = [6, 5, 4, 3, 1, 2]
print("Initial Array =", array)
for i in range(1, len(array)):
    ele = array[i]
    j = i - 1
    # Shift every element larger than ele one position to the right
    while j >= 0 and ele < array[j]:
        array[j + 1] = array[j]
        j = j - 1
    array[j + 1] = ele
    print("i =", i, "Then Array =", array)
```

```plaintext
// Dry Run
Initial Array = [6, 5, 4, 3, 1, 2]
i = 1 Then Array = [5, 6, 4, 3, 1, 2]
i = 2 Then Array = [4, 5, 6, 3, 1, 2]
i = 3 Then Array = [3, 4, 5, 6, 1, 2]
i = 4 Then Array = [1, 3, 4, 5, 6, 2]
i = 5 Then Array = [1, 2, 3, 4, 5, 6]
```

### Measuring the Run Time

One option is to **code it and check the runtime**.

**Drawbacks**:

* The measured time depends on the size of the array.
* The measured time depends on the hardware.

### Run Time

The run time is the number of basic operations an algorithm performs, expressed as a function f(n) of the input size n. Take InsertionSort as an example, where t_i is the number of times the `while` test runs for a given i:

| Statement                      | Basic operations | No. of times         | Total time     |
| ------------------------------ | ---------------- | -------------------- | -------------- |
| for i = 2 to n:                | 2                | n                    | 2n             |
| key = arr\[i]                  | 1                | n-1                  | n-1            |
| j = i - 1                      | 1                | n-1                  | n-1            |
| while j > 0 and arr\[j] > key: | 2                | Σ t_i, i = 2..n      | 2 \* Σ t_i     |
| arr\[j + 1] = arr\[j]          | 1                | Σ (t_i - 1), i = 2..n | Σ (t_i - 1)   |
| j = j - 1                      | 1                | Σ (t_i - 1), i = 2..n | Σ (t_i - 1)   |
| arr\[j + 1] = key              | 1                | n-1                  | n-1            |

```
T(n) = sum over all statements of (basic operations * no. of times)
T(n) = 2n + 3(n-1) + 2 Σ t_i + 2 Σ (t_i - 1)
```

**Best case scenario:**

```
when t_i = 1 (the array is already sorted)
T(n) = 2n + 3(n-1) + 2(n-1) + 0
     = 7n - 5
     ==> an + b, where a and b are constants
     = O(n)
```

**Worst case scenario:**

```
when t_i = i (the array is in reverse order)
T(n) = 2n + 3(n-1) + 2(n(n+1)/2 - 1) + 2(n(n-1)/2)
     = 2n^2 + 5n - 5
     ==> an^2 + bn + c, where a, b and c are constants
     = O(n^2)
```

#### Why only the leading term matters

> *A higher power takes more time to execute for a large data set; for small values it might take less time, depending on the constants in the expression.*

**This can be seen by plotting the functions on a graph.**

## Lecture 3: *(18/01/2025)*

[**Class Recording**](https://drive.google.com/file/d/1aoe4ySwsR52dkSfV2ZAQvWYCvIbBqLIb/view?usp=sharing)

### Merge Sort

**Technique** ⇒ Divide and conquer

* **Divide** ⇒ split the problem into one or more smaller subproblems.
* **Conquer** ⇒ solve the subproblems recursively.
* **Combine** ⇒ merge the solutions of the subproblems into a solution of the original problem.

    call => MergeSort(arr, 1, length(arr))

    MergeSort(arr, s, e):
        if s >= e:
            return
        mid = s + floor((e - s) / 2)
        MergeSort(arr, s, mid)
        MergeSort(arr, mid + 1, e)
        left  = copy of arr[s .. mid]
        right = copy of arr[mid+1 .. e]
        Merge(arr, left, right, s)
    // end of method MergeSort

    Merge(arr, left, right, s):
        i = 1, j = 1
        k = s                  // write position: the merged run overwrites arr[s .. e]
        n = length(left)
        m = length(right)
        while i <= n and j <= m:
            if left[i] <= right[j]:
                arr[k] = left[i]
                i++
            else
                arr[k] = right[j]
                j++
            k++
        // end of while
        while i <= n:          // copy whatever is left in the left run
            arr[k] = left[i]
            i++, k++
        while j <= m:          // copy whatever is left in the right run
            arr[k] = right[j]
            j++, k++
    // end of Merge method
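The pseudocode above can be tried out directly with a short Python version. This is only a sketch: it is 0-indexed and returns a new sorted list instead of sorting in place, and the function names `merge` and `merge_sort` are chosen here for illustration.

```python
def merge(left, right):
    """Merge two already-sorted lists into one sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:      # <= keeps equal keys in their original order
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])          # at most one of these two is non-empty
    merged.extend(right[j:])
    return merged


def merge_sort(arr):
    """Divide, conquer recursively, then combine with merge()."""
    if len(arr) <= 1:                # base case: already sorted
        return list(arr)
    mid = len(arr) // 2
    return merge(merge_sort(arr[:mid]), merge_sort(arr[mid:]))


print(merge_sort([6, 5, 4, 3, 1, 2]))  # [1, 2, 3, 4, 5, 6]
```

Returning new lists keeps the sketch short; the in-place pseudocode avoids allocating a result list at every level but still needs temporary copies of the two runs.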

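### Complexity

Merging n elements takes cn time, so Merge Sort satisfies $$ T(n) = 2T(n/2) + cn $$. Expanding the recurrence k times gives

$$ T(n)=2^kT(n/2^k)+kcn $$

With $$k = \log\_2 n$$ the subproblems have size 1, so

T(n) = cN log N + c’N => O(N log N), where c’ = T(1) (b/c N log N is the leading term)

### Recurrence Relation

A recurrence relation is an equation according to which a term of a sequence of numbers is equal to some combination of the previous terms.

**Recursion Tree Method**

$$ T(n) = 3T(n/4) + cn^2 $$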

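* **Height** => $$\log\_4 n$$, so the recursion tree has $$\log\_4 n + 1$$ levels
* **Number of leaves** => $$3^{\log\_4 n} = n^{\log\_4 3}$$
* **Total Time** => $$O(n^2)$$, because level i costs $$(3/16)^i cn^2$$ and this geometric series sums to less than $$\frac{16}{13}cn^2$$

## Lecture 4: *(19/01/2025)*

[**Class Recording**](https://drive.google.com/file/d/1RvZ1KQZWuhIsJO4WfjdbTtMGPUq22wmh/view?usp=sharing)

**Self learning:** Stack, queue and linked list

### Hashing

Hashing is typically used to maintain a **dynamic set** that supports **three operations**: create (insert), read (search) and delete (CRD).

**Dynamic set**: a set of elements where every element has a key.

### Direct-address Tables (Array)

* Every element of the dynamic set has a distinct key.
* The key is used directly as the position of the element, just as an index is the key in an array.

Key = {0, 1, …, n-1}

**Cons** => Space: the table needs one slot for every possible key, even if only a few keys are stored.

To tackle this problem we use hash tables.

### Hash Tables

A hash table is denoted by T\[0: m-1], where m < n and Key = {0, 1, …, n-1}.

#### Hash function

An element x is stored in slot $$T\[h(x.key)]$$, where $$h: U \rightarrow \lbrace 0, 1, \ldots, m-1 \rbrace$$.

**Issue**: when two keys map to the same slot, this is called a **collision**.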

#### Chaining

* Chaining is a way to handle collisions.
* Each slot of the table holds a linked list.
* A key is mapped to the linked list of its slot, and the new element is connected to that list.
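A minimal sketch of chaining, under a few assumptions that are not in the lecture notes: the class name `ChainedHashTable`, the default table size, and the use of Python's built-in `hash` as the hash function are all illustrative, and a Python list stands in for each linked list.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by chaining."""

    def __init__(self, m=8):
        self.m = m                                 # number of slots
        self.slots = [[] for _ in range(m)]        # one chain per slot

    def _h(self, key):
        return hash(key) % self.m                  # slot index in 0..m-1

    def insert(self, key, value):
        chain = self.slots[self._h(key)]
        for idx, (k, _) in enumerate(chain):
            if k == key:                           # key already present: overwrite
                chain[idx] = (key, value)
                return
        chain.append((key, value))                 # collision or empty slot: extend chain

    def search(self, key):
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None                                # key not stored

    def delete(self, key):
        q = self._h(key)
        before = len(self.slots[q])
        self.slots[q] = [(k, v) for k, v in self.slots[q] if k != key]
        return len(self.slots[q]) < before         # True if something was removed


table = ChainedHashTable(m=4)
table.insert(1, "a")
table.insert(5, "b")            # 1 and 5 both hash to slot 1 when m = 4
print(table.slots[1])           # [(1, 'a'), (5, 'b')]
print(table.search(5))          # b
```

All three operations only walk the chain of a single slot, so their cost is proportional to that chain's length.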

## Lecture 5: *(25/01/2025)*

[**Class Recording**](https://drive.google.com/file/d/1X6MibH2gHpPqvyX6O505QkVrA4RvXlB8/view?usp=sharing)

#### **Independent Uniform Hashing**

* **Definition**: A hash function $$h: U \rightarrow \lbrace 0, 1, \ldots, m-1 \rbrace$$ maps keys from a universe $$U = \lbrace 0, 1, \ldots, n-1 \rbrace$$ to a hash table $$T\[0: m-1]$$ ($$m < n$$) such that:
  * Each key $$k$$ is assigned a uniformly random slot $$h(k)$$, independently of the other keys.
  * Repeated calls for the same $$k$$ return the same $$h(k)$$.
* **Expected Chain Length**:
  * For slot $$j$$, define random variables $$X\_0, X\_1, \ldots, X\_{n-1}$$, where $$X\_i = 1$$ if $$h(i) = j$$, else $$0$$.
  * The chain length is $$T\_j = X\_0 + X\_1 + \ldots + X\_{n-1}$$.
  * Using linearity of expectation: $$ \mathbb{E}\[T\_j] = \sum\_{i=0}^{n-1} \mathbb{E}\[X\_i] = n \cdot \frac{1}{m} = \frac{n}{m} $$
  * When $$n = O(m)$$ (for example $$n = m$$), the expected chain length is $$O(1)$$, so the expected search time is $$O(1)$$.

***

#### **Open Addressing**

* **Concept**: All elements are stored directly in the hash table. Collisions are resolved by probing other slots.
* **Probe Sequence**: For key $$k$$, the sequence $$\langle h(k, 0), h(k, 1), \ldots, h(k, m-1) \rangle$$ must be a permutation of $$\langle 0, 1, \ldots, m-1 \rangle$$, so that every slot is eventually tried.
* **Operations**:
  * **Insert**: Iterate through the probe sequence until an empty slot is found.

    ```plaintext
    OA-INSERT(T, x):
        i = 0
        do:
            q = h(x.key, i)
            if T[q] == NIL:
                T[q] = x
                return q
            else:
                i += 1
        while i ≠ m
        error "Hash table overflow"
    ```
  * **Search**: Follow the probe sequence until the key is found or a NIL slot is encountered.
  * **Delete**: Marking a slot as NIL can break the probe sequences of other keys, because a later search would stop at that slot. Deletion therefore requires careful handling, typically by storing a special DELETED marker instead of NIL.
* **Probing Methods**:
  * **Linear Probing**: $$h(k, i) = (h'(k) + i) \mod m$$.
  * **Double Hashing**: $$h(k, i) = (h\_1(k) + i \cdot h\_2(k)) \mod m$$, where $$\gcd(h\_2(k), m) = 1$$ so that the probe sequence visits every slot.

***

#### **Binary Search Trees (BSTs)**

* **Structure**: A binary tree where each node contains:
  * Key and satellite data.
  * Pointers to parent, left child, and right child.
* **BST Property**: For any node $$x$$:
  * Nodes in the left subtree have keys $$\leq x.key$$.
  * Nodes in the right subtree have keys $$\geq x.key$$.
* **Supported Operations**:
  * Insert, Search, Delete.
  * Find minimum/maximum.
  * Find successor/predecessor.

***

#### **Summary**

* **Hashing**: Independent uniform hashing gives an expected chain length of $$\frac{n}{m}$$. Open addressing resolves collisions via probing (for example linear probing or double hashing) but requires careful deletion handling.
* **BSTs**: Maintain dynamic sets by preserving the BST property; every operation takes time proportional to the height of the tree.

## Lecture 6: *(02/02/2025)*

[**Class Recording**](https://drive.google.com/file/d/1dMTmKL823nMtTo-vWRZOF0AJfHjrISYm/view?usp=sharing)

### **Binary Search Tree**

{% embed url="" %}
Read this
{% endembed %}

## Lecture 7: *(08/02/2025)*

[**Class Recording**](https://drive.google.com/file/d/1SeU_q0Y0rR-lV-6o09XqNRPFZoTclz0N/view?usp=sharing)

### **Red-Black Trees**

Everything you need to know about red-black trees is covered in this blog:

{% embed url="" %}

Red-black trees are self-balancing BSTs that maintain O(log n) height through these properties:

1. Every node is **red** or **black**.
2. The root is always black.
3. All NIL leaves are black.
4. Red nodes cannot have red children.
5. Every path from the root to a NIL leaf contains the same number of black nodes (the black height).

## Lecture 8: *(09/02/2025)*

[**Class Recording**](https://drive.google.com/file/d/1guUqlHYvbBBmtn5PKgxYkq8kznAC9qcb/view?usp=sharing)

Red-Black Trees are balanced binary search trees that maintain logarithmic height through color-based constraints. This lecture covers their height properties and insertion mechanics through detailed proofs and case analyses.
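The five red-black properties from Lecture 7 can be checked mechanically, which helps when testing an insertion routine. The sketch below is illustrative only: the `Node` class, the colour constants and the function names are assumptions made for this example, `None` plays the role of a black NIL leaf, and the BST ordering of keys is not checked.

```python
from dataclasses import dataclass
from typing import Optional

RED, BLACK = "red", "black"


@dataclass
class Node:
    key: int
    color: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def black_height(node):
    """Return the black height of the subtree, or -1 if a property is violated."""
    if node is None:
        return 1                                   # property 3: NIL leaves are black
    if node.color not in (RED, BLACK):             # property 1
        return -1
    lh = black_height(node.left)
    rh = black_height(node.right)
    if lh == -1 or rh == -1 or lh != rh:           # property 5: equal black counts
        return -1
    if node.color == RED:
        for child in (node.left, node.right):
            if child is not None and child.color == RED:   # property 4
                return -1
        return lh
    return lh + 1                                  # a black node adds one to the count


def is_red_black(root):
    if root is not None and root.color != BLACK:   # property 2: black root
        return False
    return black_height(root) != -1


valid = Node(10, BLACK, Node(5, RED), Node(15, RED))
red_red = Node(10, BLACK, Node(5, RED, Node(2, RED)))
print(is_red_black(valid))     # True
print(is_red_black(red_red))   # False
```

Each node is visited once, so the check runs in O(n) time.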
## Lecture 9: *(15/02/2025)*

**Class canceled**

## Lecture 9: *(16/02/2025)*

[**Class Recording**](https://drive.google.com/file/d/1mCfSKVs9e7Bi1MZegioY71u2-WBvS-SK/view)

### **Red-Black Tree Deletion**

Everything you need to know about red-black tree deletion is covered in this blog:

{% embed url="" %}

## Lecture 10: *(22/02/2025)*

[**Class Recording**](https://drive.google.com/file/d/1rAY886NHZiD4GcE6vkEeVUrpBXZFZzXo/view)

* Red-Black Tree Deletion Cases

## Lecture 11: *(23/02/2025)*

**Minor Exam**

## Lecture 11: *(01/03/2025)*

[**Class Recording**](https://drive.google.com/file/d/1d2CLGRxjBvSgVDwPRqqKh2MQ0bw34oK2/view?usp=sharing)

### Augmenting Data Structure

The augmented structure is an extension of the red-black tree in which every node stores one extra value, `size`: the number of elements contained in the subtree rooted at that node.

#### Finding the ith element

```plaintext
FUNCTION Ith_element(node, i):
    IF node == NULL:
        RETURN NULL                      // i is out of range
    Left_node_size = 0
    IF node.left ≠ NULL:
        Left_node_size = node.left.size
    IF i == Left_node_size + 1:
        RETURN node.key                  // Found the i-th element
    ELSE IF i ≤ Left_node_size:
        RETURN Ith_element(node.left, i) // Continue in the left subtree
    ELSE:
        // Skip the left subtree and the node itself, continue in the right subtree
        RETURN Ith_element(node.right, i - (Left_node_size + 1))
```

#### Finding the Rank of an element

```plaintext
FUNCTION rank(T, key):                   // size(NULL) is taken to be 0
    x = find_node(T.root, key)           // Find the node containing 'key'
    IF x == NULL:
        RETURN 0                         // Key not found
    r = size(x.left) + 1
    y = x
    WHILE y ≠ T.root:
        IF y == y.p.right:               // Everything in the parent's left subtree,
            r = r + size(y.p.left) + 1   // plus the parent itself, comes before y
        y = y.p
    RETURN r

FUNCTION find_node(node, key):
    WHILE node ≠ NULL AND node.key ≠ key:
        IF key < node.key:
            node = node.left
        ELSE:
            node = node.right
    RETURN node
```

## Lecture 12: *(02/03/2025)*

[**Class Recording**](https://drive.google.com/file/d/1mbbbS1GSUVPWirv09FuTgzvwfl9yCFS9/view?usp=sharing)

### Disjoint-Set Data Structure

* Disjoint-set is an advanced data structure used for handling collections of disjoint dynamic sets.
* Also known as **Union-Find** or **Merge-Find Set**.
* Commonly used in **graph algorithms** to determine connected components.

#### Sets

* A **set** is a collection of distinct objects.
* **Union**: Combines two sets into one.
* **Intersection**: Finds the common elements of two sets.
* **Disjoint Sets**: Two sets are **disjoint** if they have no common elements.

#### **Core Operations**

The disjoint-set data structure supports three fundamental operations:

**1. MAKE-SET(x)** Creates a new set containing only element `x`.

```
MAKE-SET(1), MAKE-SET(2)
Sets: {1}, {2}
```

**2. UNION(x, y)** Merges the sets containing `x` and `y` into one.

```
UNION(2, 6) → {2, 6}
```

**3. FIND-SET(x)** Finds the representative (leader) of the set containing `x`.

```
FIND-SET(2) → 1 (if 1 is the representative)
```

#### Implementation

The most efficient implementation uses a forest of trees:

* Each element is a node in a tree.
* Each tree represents one subset.
* The root of each tree serves as the representative of that set.
* Initially, each element forms its own set (it is its own parent).

#### **Optimizations (Heuristics)**

#### **1. Union by Rank**

* Always attach the root of the tree with the smaller rank (an upper bound on its height) to the root of the tree with the larger rank.
* Ensures shallower trees, improving efficiency.

#### **2. Path Compression**

* During `FIND-SET(x)`, update each node on the path to point directly to the root.
* Makes future queries much faster.

### **Applications**

* **Graph Algorithms**:
  * Handling queries about connectivity in graphs that change over time.
  * **Kruskal's Algorithm** for Minimum Spanning Tree.
* **Networking**: Finding clusters in a network.
* **Image Processing**: Segmenting images into different regions based on pixel similarity.

Check out this [*blog*](https://www.hackerearth.com/practice/data-structures/disjoint-data-strutures/basics-of-disjoint-data-structures/tutorial/) for more information about disjoint sets.

## Lecture 13: *(08/03/2025)*

[**Class Recording**](https://drive.google.com/file/d/11TynGRonsxZu2p1xyccmW0vi7-PeXwA-/view?usp=sharing)

### Disjoint Set As Tree

#### MAKE-SET

```
function MakeSet(x) is
    if x is not already in the forest then
        x.parent := x
        x.size := 1     // if nodes store size
        x.rank := 0     // if nodes store rank
    end if
end function
```

Complexity of Make-Set is ***O(1)***

#### FIND-SET

```
function FIND-SET(x) is
    if x.parent == x then
        return x
    else
        return FIND-SET(x.parent)
    end if
end function
```

Complexity of Find-Set is ***O(log n)*** when the trees are built with union by rank.

#### UNION

```
function UNION(x, y) is
    x_root := FIND-SET(x)
    y_root := FIND-SET(y)
    if x_root == y_root then
        return                      // x and y are already in the same set
    end if
    if x_root.rank < y_root.rank then
        x_root.parent := y_root
    else if x_root.rank > y_root.rank then
        y_root.parent := x_root
    else
        // If ranks are the same, make one point to the other and increment its rank
        y_root.parent := x_root
        x_root.rank := x_root.rank + 1
    end if
end function
```

Trees grow only through Union, and Union performs a **rank check** instead of simply attaching one tree under the other. Because of this, the maximum height of every tree is **O(log n)**, so:

* Find-Set complexity = O(log n)
* Union complexity = 2 \* O(log n) ***⇒ O(log n)*** (two Find-Set calls plus constant work)

**Example of Disjoint-Tree**

### Path Compression Heuristic

Path compression is an optimization of the **Find** operation that flattens the tree structure by making the nodes on the search path point directly to the root.

#### **Why We Do It?**

* To **speed up future Find operations** by reducing tree depth.
* It helps in **keeping the tree nearly flat**. Combined with union by rank, each Union-Find operation takes amortized **O(α(n))** time, where **α(n)** is the inverse Ackermann function (very slow-growing), which is almost constant.

#### **Benefit**

* **Improves efficiency**, making `Find(x)` nearly **O(1)** amortized for large datasets.
* Reduces redundant traversal, because a path that has been compressed is never walked in full again.
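Union by rank and path compression can be combined in a few lines. The following is a sketch rather than the lecture's implementation: the class name `DisjointSet` and the dictionary-based storage are assumptions made for this example, and `find_set` is written iteratively in two passes (locate the root, then re-point the path).

```python
class DisjointSet:
    """Disjoint-set forest with union by rank and path compression."""

    def __init__(self):
        self.parent = {}
        self.rank = {}

    def make_set(self, x):
        if x not in self.parent:
            self.parent[x] = x          # x is its own representative
            self.rank[x] = 0

    def find_set(self, x):
        root = x
        while self.parent[root] != root:    # first pass: locate the root
            root = self.parent[root]
        while self.parent[x] != root:       # second pass: path compression
            next_node = self.parent[x]
            self.parent[x] = root
            x = next_node
        return root

    def union(self, x, y):
        rx, ry = self.find_set(x), self.find_set(y)
        if rx == ry:
            return                          # already in the same set
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx                 # make rx the root with the larger rank
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1              # equal ranks: the new root grows by one


ds = DisjointSet()
for v in range(1, 7):
    ds.make_set(v)
ds.union(1, 2)
ds.union(3, 4)
ds.union(2, 4)
print(ds.find_set(4) == ds.find_set(1))  # True
print(ds.find_set(5) == ds.find_set(1))  # False
```

After `find_set(4)` returns, node 4 points directly at the root, so the next query on it takes a single step.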

## Lecture 14: *(09/03/2025)*

[**Class Recording**](https://drive.google.com/file/d/1PLWuweJK6386fvmHuv2K2GohgiZmr1K9/view?usp=sharing)

### Graph

1. **Graphs in Real Life**\
   Graphs are used to represent relationships in real-world scenarios, such as airline networks, job assignments, and network connectivity.
2. **Undirected and Directed Graphs**\
   An undirected graph consists of vertices connected by edges with no direction, whereas a directed graph has edges with specified directions.
3. **Graph Representation**\
   Graphs can be represented using adjacency lists (efficient for sparse graphs) or adjacency matrices (efficient for checking edge existence).
4. **Basic Terminology**\
   Concepts like adjacency, degree of a vertex, paths, cycles, and shortest distance help describe relationships between nodes in a graph.
5. **Graph Applications**\
   Graph theory is applied in solving shortest path problems, network flow optimization, social network analysis, and circuit design.
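The two representations from item 3 can be compared on a small example. This is a sketch under assumptions: vertices are numbered 0..n-1, and the helper names `build_adjacency_list` and `build_adjacency_matrix` as well as the sample edge list are made up for illustration.

```python
def build_adjacency_list(n, edges, directed=False):
    """Adjacency list: adj[u] holds the neighbours of u. Space O(V + E)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        if not directed:
            adj[v].append(u)
    return adj


def build_adjacency_matrix(n, edges, directed=False):
    """Adjacency matrix: mat[u][v] == 1 iff edge (u, v) exists. Space O(V^2)."""
    mat = [[0] * n for _ in range(n)]
    for u, v in edges:
        mat[u][v] = 1
        if not directed:
            mat[v][u] = 1
    return mat


edges = [(0, 1), (0, 2), (1, 2), (2, 3)]   # illustrative undirected graph
adj = build_adjacency_list(4, edges)
mat = build_adjacency_matrix(4, edges)

print(adj[2])        # neighbours of vertex 2: [0, 1, 3]
print(len(adj[2]))   # degree of vertex 2: 3
print(mat[1][3])     # 0, there is no edge between 1 and 3
```

The list is the natural choice when an algorithm iterates over the neighbours of a vertex, while the matrix answers "is there an edge (u, v)?" in constant time.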
## Lecture 15: *(15/03/2025)*

[**Class Recording**](https://drive.google.com/file/d/1klziVZ0R6uGin9uMrA3D8R3DBWO6UWE5/view)

### Breadth-first Search, Dijkstra

## Lecture 16: *(22/03/2025)*

[**Class Recording**](https://drive.google.com/file/d/12JfJbMdYa8KNJRZMRVwUIUbHkx5NwWRu/view)

### Dijkstra, Flow Networks

## Lecture 17: *(23/03/2025)*

[**Class Recording**](http://drive.google.com/file/d/1QEBXIvkwE7vmrjlYy4nyGhNe6UVQUwEe/view)

### Flow Networks, Ford-Fulkerson Method

## Lecture 18: *(29/03/2025)*

[**Class Recording**](https://drive.google.com/file/d/1VC-IJVf4Kcg4yBl_sRZjnNtBKp6e64kv/view)

### Bipartite Maximum Matching

* Bipartite graph
* Bipartite matching
* Maximum bipartite matching

## Lecture 19: *(30/03/2025)*

[**Class Recording**](https://drive.google.com/file/d/1ojSnfHymR50_wnqmKbLnniw03hT2XL2-/view)

### Dynamic Programming

Dynamic Programming (DP) is a powerful algorithmic technique to solve complex problems by breaking them down into simpler subproblems. It is particularly effective for **optimization** problems that exhibit two key properties: **overlapping** subproblems and **optimal** substructure.

### **Rod Cutting Problem**

The Rod Cutting Problem is a classic optimization problem: given a price table with one price per segment length, determine how to cut a rod of length *n* into smaller segments so that the total revenue is maximized.
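A bottom-up sketch of rod cutting is shown below. It assumes the price table is a list where `prices[i]` is the price of a segment of length `i` (index 0 unused); the function name `cut_rod` and the sample prices are illustrative, not taken from the lecture.

```python
def cut_rod(prices, n):
    """Return (best revenue, list of segment lengths) for a rod of length n."""
    best = [0] * (n + 1)          # best[L] = maximum revenue for a rod of length L
    first_cut = [0] * (n + 1)     # first_cut[L] = first segment of an optimal cut
    for length in range(1, n + 1):
        best_value, best_piece = float("-inf"), 0
        for piece in range(1, length + 1):
            # Sell one segment of size `piece`, cut the remainder optimally
            candidate = prices[piece] + best[length - piece]
            if candidate > best_value:
                best_value, best_piece = candidate, piece
        best[length], first_cut[length] = best_value, best_piece

    pieces = []                   # rebuild the cut from the recorded first segments
    remaining = n
    while remaining > 0:
        pieces.append(first_cut[remaining])
        remaining -= first_cut[remaining]
    return best[n], pieces


# Hypothetical price table for lengths 1..10
prices = [0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30]
print(cut_rod(prices, 4))   # (10, [2, 2])
```

Each subproblem `best[L]` is solved once and reused, which is where the overlapping-subproblems property pays off: the two nested loops give O(n²) time instead of exploring all 2^(n-1) ways to cut the rod.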

## Lecture 20: *(06/03/2025)*

[**Class Recording**](https://drive.google.com/file/d/1Xqa-8fAp3BHoIlAsdcaFl_JyiVPO5Vm3/view)

### More Dynamic Programming Examples

**LCS Dynamic Programming**

* **Subsequence Definition**: A subsequence is obtained by dropping zero or more characters while preserving the order of the rest.
* **LCS Problem**: Find the longest subsequence common to two strings.
* **Applications**: Measures similarity (e.g., DNA, text comparison).
* **Brute Force Approach**: Exponential time, because it checks all subsequences.
* **Optimal Substructure**: An LCS can be built from the solutions of subproblems on prefixes.
* **Recurrence Relation**: Forms the DP backbone with match/mismatch rules.
* **Overlapping Subproblems**: Justifies using memoization/tabulation.
* **Top-Down DP**: Recursion plus memoization avoids recomputation.
* **Bottom-Up DP**: Iteratively fills the DP table using the recurrence.
* **Top-Down vs Bottom-Up**: Top-down solves only the subproblems that are actually reached but pays for recursion; bottom-up avoids recursion but fills the full table.
* **Time Complexity**: Both DP methods run in O(nm), where n and m are the lengths of the two strings.
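The bottom-up variant from the list above can be sketched as follows. The function name `lcs` and the sample strings are chosen for illustration; `dp[i][j]` holds the LCS length of the first `i` characters of `a` and the first `j` characters of `b`, with the usual match/mismatch rule.

```python
def lcs(a, b):
    """Return one longest common subsequence of strings a and b."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:                 # match: extend the diagonal
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:                                    # mismatch: drop one character
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Walk back from dp[n][m] to recover one optimal subsequence
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(len(lcs("ABCBDAB", "BDCABA")))   # 4
```

Several subsequences of the same maximal length can exist; which one is returned depends on how ties are broken during the walk back.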

**Test your answer here:**

{% embed url="" %}

## Lecture 21: *(06/03/2025)*

[**Class Recording**](https://drive.google.com/file/d/12q3l-1vqE_B5pXGBV7IalPwbMSroTbUs/view)

### P, NP, and NP-completeness

### Search Vs Decision Problem

In computational complexity theory, search and decision problems represent two fundamental types of computational challenges.

A decision problem asks a yes-or-no question based on given input values. For example:

* Is a given number prime?
* Does x evenly divide y?
* Is a formula satisfiable?

A search problem requires finding an actual solution rather than just determining its existence. For example:

* Find a satisfying assignment for a formula
* Find a clique of size k in a graph
* Find a prime factor of a number

While both decision and search problems are fundamental in computational complexity theory, decision problems hold an important advantage: a decision problem is never harder than its search version, because any algorithm that constructs a solution also tells us whether one exists. Decision problems only need to determine existence rather than construct a complete solution.

**If a decision problem is not solvable in polynomial time, then its corresponding search problem cannot be solved in polynomial time either.**

### Complexity Class P

**Topics of this lecture**

1. **Search vs. Decision Problems**\
   Search problems find solutions; decision problems answer Yes/No.
2. **Complexity Class P**\
   P contains the decision problems solvable in polynomial time.
3. **Complexity Class NP**\
   NP contains the decision problems whose Yes answers are verifiable in polynomial time.
4. **P vs NP**\
   The key question: Is every NP problem also in P?
5. **NP-Completeness**\
   NP-complete problems are the hardest in NP; a polynomial-time algorithm for one would give a polynomial-time algorithm for all of NP.
6. **SAT and 3SAT**\
   SAT and 3SAT are classic NP-complete problems involving Boolean formulas.
7. **NP-Complete Network**\
   Various NP-complete problems are interlinked by polynomial-time reductions.
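To make "verifiable in polynomial time" concrete for SAT, here is a small sketch of a verifier. The encoding is an assumption made for this example: a formula in CNF is a list of clauses, each clause a list of non-zero integers where `k` means variable `k` and `-k` means its negation, and the function name `verify_cnf` and the sample formula are hypothetical.

```python
def verify_cnf(clauses, assignment):
    """Check in time linear in the formula size whether `assignment` satisfies it.

    clauses    : list of clauses, each a list of non-zero ints (k = x_k, -k = NOT x_k)
    assignment : dict mapping variable number to True/False
    """
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


# (x1 OR NOT x2 OR x3) AND (NOT x1 OR x2 OR x3) AND (NOT x3 OR NOT x1 OR x2)
formula = [[1, -2, 3], [-1, 2, 3], [-3, -1, 2]]

print(verify_cnf(formula, {1: True, 2: True, 3: False}))    # True
print(verify_cnf(formula, {1: True, 2: False, 3: False}))   # False
```

Checking a proposed assignment is easy; the hard part, as far as anyone knows, is finding one, since there are 2^n candidate assignments for n variables.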