Data Structures

Total lectures: 21

Course Content

A tentative list of topics to be covered in this course:

Introduction to Algorithms, Insertion Sort, Divide and Conquer, Merge Sort, Solving Recurrences, Hashing, Binary Search Tree, Red-Black Tree, Augmenting Data Structures, Disjoint Sets, Graphs, Dijkstra's Algorithm, Maximum Flow, Matching, String Matching, KMP Algorithm, Dynamic Programming (Rod Cutting, LCS, etc.), Complexity classes P, NP, co-NP, NP-complete, NP-hard, and example reductions between problems.

Exams
  • 55% internal

    • 30% for 2 assignments

      • 1st deadline ⇒ 23 Feb

    • 25% quizzes

      • 23 Feb ⇒ 1st quiz

  • 45% main exam

Material

Lecture 1: (11/01/2025)

Class Recording

What is an algorithm?

Problem

When do we say an algorithm solves a problem?

Lecture 2:(12/01/2025)

Class Recording

Insertion Sort:

Technique ⇒ Incremental

Measure the Run Time

Code it and check the runtime
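As a concrete starting point, here is a minimal Python sketch of insertion sort (the incremental technique above) that one could code and time:

```python
def insertion_sort(arr):
    # Grow a sorted prefix one element at a time (incremental technique).
    for i in range(1, len(arr)):
        key = arr[i]
        j = i - 1
        # Shift larger elements right to open a slot for key.
        while j >= 0 and arr[j] > key:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key
    return arr
```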

Drawbacks:

  • Depends on the size of the array

  • Depends on the hardware

Run Time

The number of basic operations an algorithm performs, as a function of the input size: f(n).

Let's take insertion sort as an example:

| Basic operation | Cost | No. of times | Total time |
| --- | --- | --- | --- |
| for i = 2 to n: | 2 | n | 2n |
| key = arr[i] | 1 | n − 1 | n − 1 |
| j = i − 1 | 1 | n − 1 | n − 1 |
| while j > 0 and arr[j] > key: | 2 | Σ_{i=2..n} t_i | 2·Σ t_i |
| arr[j + 1] = arr[j] | 1 | Σ_{i=2..n} (t_i − 1) | Σ (t_i − 1) |
| j = j − 1 | 1 | Σ_{i=2..n} (t_i − 1) | Σ (t_i − 1) |
| arr[j + 1] = key | 1 | n − 1 | n − 1 |

(t_i is the number of times the while-loop test runs for a given i.)

Best-case scenario: the array is already sorted, so every t_i = 1 and the total time is linear in n.

Worst-case scenario: the array is in reverse order, so t_i = i and the total time is quadratic in n.

Why only leading term matters

A term with a larger power takes more time for large inputs; for smaller values a lower-order term with a large constant may dominate, depending on the expression. This can be shown with a graph.
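A small numeric check of this claim (illustrative values only): 100n beats n² for small n, but the higher power wins once n passes the crossover point.

```python
# 100n has the larger constant; n^2 has the larger power (the leading term).
def linear(n):
    return 100 * n

def quadratic(n):
    return n * n

# Crossover at n = 100:
#   n = 10   -> linear(10)   = 1000    > quadratic(10)   = 100
#   n = 100  -> linear(100)  = 10000   = quadratic(100)  = 10000
#   n = 1000 -> linear(1000) = 100000  < quadratic(1000) = 1000000
```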

Lecture 3:(18/01/2025)

Class Recording

Merge Sort

Technique ⇒ Divide and conquer

Divide ⇒ split into one or more smaller subproblems

Conquer ⇒ solve the subproblems recursively

Combine ⇒ merge the solutions of the subproblems into a solution of the original problem

Complexity

Unrolling T(n) = 2T(n/2) + cn for k levels gives T(n) = 2^k · T(n/2^k) + c·n·k.

With k = log₂ n: T(n) = c′·n·log n + c·n ⇒ O(n log n) (because n log n is the leading term).
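The divide/conquer/combine steps above can be sketched in Python as follows (a minimal out-of-place version, not an optimized one):

```python
def merge_sort(arr):
    # Divide: split into halves until size <= 1.
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])    # Conquer: sort each half recursively.
    right = merge_sort(arr[mid:])
    # Combine: merge the two sorted halves.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```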

Recurrence Relation

A recurrence relation is an equation in which each term of a sequence is expressed as some combination of previous terms.

Recursive Tree Method

T(n) = 3T(n/4) + cn²

Levels ⇒ log₄ n + 1

Height ⇒ log₄ n

Number of leaves ⇒ 3^{log₄ n} = n^{log₄ 3}

Total time ⇒ O(n²), since the cn² cost at the root dominates the geometrically shrinking per-level costs

Lecture 4:(19/01/2025)

Class Recording

Self-learning: stack, queue, and linked list

Hashing

Hashing is typically used to maintain a dynamic set that supports three operations ⇒ CRD (create, read, delete).

Dynamic set: a set of elements where every element has a key.

Direct-address Tables (Array)

Every element of the dynamic set has a distinct key.

Every element has a key; e.g., in an array the index is the key.

Keys = {0, 1, …, n − 1}

Con ⇒ space issue: the table needs one slot for every possible key.

To tackle this problem we use Hash Tables

Hash Tables

A hash table is denoted T[0 : m − 1], where m < n.

Keys = {0, 1, …, n − 1}

Hash function

T[h(x.key)], where h: U → {0, 1, …, m − 1}

Issue: when two keys map to the same slot, this is called a collision (in the example figure, slot 2 is the collision point).

Chaining

  • Used to handle collisions.

  • We create a linked list per slot.

  • The key maps to a linked list, and the new element is linked into that list.
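A minimal Python sketch of chaining (plain Python lists stand in for linked lists, and the modular hash function is an assumption, not necessarily the one used in class):

```python
class ChainedHashTable:
    """Hash table with chaining: slot j holds a chain of [key, value] pairs."""

    def __init__(self, m=8):
        self.m = m
        self.slots = [[] for _ in range(m)]

    def _h(self, key):
        # Assumed hash function: Python's hash reduced modulo m.
        return hash(key) % self.m

    def insert(self, key, value):
        chain = self.slots[self._h(key)]
        for pair in chain:
            if pair[0] == key:      # key already present: update in place
                pair[1] = value
                return
        chain.append([key, value])  # collision handled by appending to chain

    def search(self, key):
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        idx = self._h(key)
        self.slots[idx] = [p for p in self.slots[idx] if p[0] != key]
```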

Lecture 5: (25/01/2025)

Class Recording

PDF (3 MB)

Independent Uniform Hashing

  • Definition: A hash function h: U → {0, 1, …, m−1} maps keys from a universe U = {0, 1, …, n−1} to a hash table T[0 : m−1] (m < n) such that:

    • Each key k is assigned a uniformly random slot h(k).

    • Repeated calls for the same k return the same h(k).

  • Expected Chain Length:

    • For slot j, define random variables X_0, X_1, …, X_{n−1}, where X_i = 1 if h(i) = j, else 0.

    • The chain length T_j = X_0 + X_1 + … + X_{n−1}.

    • Using linearity of expectation:

      E[T_j] = Σ_{i=0}^{n−1} E[X_i] = n · (1/m) = n/m

    • When n = m, the expected search time is O(1).
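A quick simulation (not a proof) of the expected chain length: hashing n keys uniformly into m slots and averaging the chain length at one fixed slot should give roughly n/m. The parameter values here are illustrative.

```python
import random

def avg_chain_length(n, m, trials, slot=0, seed=1):
    # Simulate independent uniform hashing: each key lands in a uniformly
    # random slot; count how many land in `slot`, averaged over trials.
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += sum(1 for _ in range(n) if rng.randrange(m) == slot)
    return total / trials

# With n = 1000, m = 200 the expected chain length is n/m = 5.
```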


Open Addressing

  • Concept: All elements are stored directly in the hash table. Collisions are resolved by probing subsequent slots.

  • Probe Sequence: A permutation ⟨h(k, 0), h(k, 1), …, h(k, m−1)⟩ for key k.

  • Operations:

    • Insert: Iterate through probe sequence until an empty slot is found.

    • Search: Follow probe sequence until the key is found or a NIL slot is encountered.

    • Delete: Marking a slot as NIL can break probe sequences for other keys, requiring careful handling.

  • Probing Methods:

    • Linear Probing: h(k, i) = (h′(k) + i) mod m.

    • Double Hashing: h(k, i) = (h₁(k) + i·h₂(k)) mod m, where gcd(h₂(k), m) = 1.
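A sketch of open addressing with linear probing; the DELETED tombstone marker (instead of resetting a slot to NIL) is one common way to do the "careful handling" of deletion mentioned above, and the modular hash is an assumption:

```python
DELETED = object()  # tombstone: slot was occupied, then deleted

class LinearProbingTable:
    def __init__(self, m=8):
        self.m = m
        self.slots = [None] * m           # None = never used (NIL)

    def _probe(self, key):
        # Linear probe sequence: h(k, i) = (h'(k) + i) mod m.
        for i in range(self.m):
            yield (hash(key) + i) % self.m

    def insert(self, key, value):
        for idx in self._probe(key):
            s = self.slots[idx]
            if s is None or s is DELETED or s[0] == key:
                self.slots[idx] = (key, value)
                return
        raise RuntimeError("hash table overflow")

    def search(self, key):
        for idx in self._probe(key):
            s = self.slots[idx]
            if s is None:                 # NIL ends the probe sequence
                return None
            if s is not DELETED and s[0] == key:
                return s[1]
        return None

    def delete(self, key):
        for idx in self._probe(key):
            s = self.slots[idx]
            if s is None:
                return
            if s is not DELETED and s[0] == key:
                self.slots[idx] = DELETED  # tombstone, not NIL, so later
                return                     # keys stay reachable
```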


Binary Search Trees (BSTs)

  • Structure: A binary tree where each node contains:

    • Key and satellite data.

    • Pointers to parent, left child, and right child.

  • BST Property: For any node xx:

    • Nodes in the left subtree have keys ≤ x.key.

    • Nodes in the right subtree have keys ≥ x.key.

  • Supported Operations:

    • Insert, Search, Delete.

    • Find minimum/maximum.

    • Find successor/predecessor.
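A minimal Python sketch of the BST structure and a few of these operations (insert, search, minimum), assuming the ≤/≥ property above:

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

def bst_insert(root, key):
    # Walk down, preserving the BST property: left <= node.key <= right.
    node = Node(key)
    parent, cur = None, root
    while cur is not None:
        parent = cur
        cur = cur.left if key < cur.key else cur.right
    node.parent = parent
    if parent is None:
        return node                # tree was empty: new node is the root
    if key < parent.key:
        parent.left = node
    else:
        parent.right = node
    return root

def bst_search(root, key):
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root

def bst_min(root):
    # Minimum is the leftmost node.
    while root.left is not None:
        root = root.left
    return root
```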


Summary

  • Hashing: Independent uniform hashing ensures an expected chain length of n/m. Open addressing resolves collisions via probing but requires careful deletion handling.

  • BSTs: Maintain dynamic sets with efficient operations by preserving the BST property. Probing methods like linear and double hashing enable collision resolution in open addressing.

Lecture 6: (02/02/2025)

Class Recording

PDF (1 MB)

Binary Search Tree

Read this

Lecture 7: (08/02/2025)

Class Recording

PDF (920 KB)

Red-Black Trees

All you need to know about R-B trees is covered in this blog:

Self-balancing BSTs that maintain O(log n) height through these properties:

  1. Every node is red or black

  2. Root is always black

  3. All Nil leaves are black

  4. Red nodes cannot have red children

  5. Every root-to-leaf path has the same number of black nodes (the black-height)

Lecture 8: (09/02/2025)

Class Recording

PDF (1019 KB)

Red-Black Trees are balanced binary search trees that maintain logarithmic height through color-based constraints. This lecture covers their height properties and insertion mechanics through detailed proofs and case analyses.

Lecture 9: (15/02/2025)

Class canceled

Lecture 9: (16/02/2025)

Class Recording

PDF (703 KB)

Red-Black Tree Deletion

All you need to know about R-B deletion is covered in this blog:

Lecture 10: (22/02/2025)

Class Recording

PDF (592 KB)
  • Red-Black Tree Deletion Cases

Lecture 11: (23/02/2025)

Minor Exam

PDF (576 KB)

Lecture 11: (01/03/2025)

Class Recording

PDF (767 KB)

Augmenting Data Structure

It is an extension of the RB tree in which each node stores an extra value: the number of elements contained in its subtree.

Finding the i-th element

Finding the rank of an element
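A sketch of the i-th element query (order-statistic select) on a size-augmented tree; RB colors are omitted for brevity, and the node layout here is an assumption:

```python
class OSNode:
    # Size-augmented node: size = number of elements in this subtree.
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.size = 1 + (left.size if left else 0) + (right.size if right else 0)

def os_select(x, i):
    # Return the node holding the i-th smallest key (1-indexed).
    r = (x.left.size if x.left else 0) + 1   # rank of x within its subtree
    if i == r:
        return x
    if i < r:
        return os_select(x.left, i)          # answer is in the left subtree
    return os_select(x.right, i - r)         # skip the r smaller elements
```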

Lecture 12: (02/03/2025)

Class Recording

PDF (657 KB)

Disjoint-Set Data Structure

  • Disjoint-set is an advanced data structure used for handling collections of disjoint dynamic sets.

  • Also known as Union-Find or Merge-Find Set.

  • Commonly used in graph algorithms to determine connected components.

Sets
  • A set is a collection of distinct objects.

  • Union: Combines two sets into one.

  • Intersection: Finds common elements between two sets.

  • Disjoint Sets: Two sets are disjoint if they have no common elements.

PDF (32 KB)

Core Operations

The disjoint-set data structure supports three fundamental operations:

1. MAKE-SET(x) Creates a new set containing only element x.

2. UNION(x, y) Merges the sets containing x and y into one.

3. FIND-SET(x) Finds the representative (leader) of the set containing x.

Implementation

The most efficient implementation uses a forest of trees:

  • Each element is a node in a tree

  • Each tree represents one subset

  • The root of each tree serves as the representative of that set

  • Initially, each element forms its own set (is its own parent)

Optimizations (Heuristics)

1. Union by Rank

  • Always attach the tree with smaller rank under the root of the tree with larger rank.

  • Ensures shallower trees, improving efficiency.

2. Path Compression

  • During FIND-SET(x), update each node on the path to directly point to the root.

  • Makes future queries much faster
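The three operations plus both heuristics can be sketched as follows (a dict-based forest; a minimal illustration, not a production implementation):

```python
parent = {}
rank = {}

def make_set(x):
    # Each element starts as its own one-node tree (its own parent).
    parent[x] = x
    rank[x] = 0

def find_set(x):
    # Path compression: point every node on the path directly at the root.
    if parent[x] != x:
        parent[x] = find_set(parent[x])
    return parent[x]

def union(x, y):
    rx, ry = find_set(x), find_set(y)
    if rx == ry:
        return                     # already in the same set
    if rank[rx] < rank[ry]:        # union by rank: attach shorter tree
        rx, ry = ry, rx            # under the taller tree's root
    parent[ry] = rx
    if rank[rx] == rank[ry]:
        rank[rx] += 1
```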

Applications

  • Graph Algorithms:

    • Handling queries about connectivity in graphs that change over time.

    • Kruskal's Algorithm for Minimum Spanning Tree.

  • Networking: Finding clusters in a network.

  • Image Processing: Segmenting images into different regions based on pixel similarity

Check out this blog for more information about disjoint sets.

Lecture 13: (08/03/2025)

Class Recording

PDF (931 KB)

Disjoint Sets as Trees

MAKE-SET

Complexity of Make-Set is O(1)

FIND-SET

Complexity of Find-Set is O(log n)

UNION

When building the structure we perform UNION, and inside UNION we do a rank check rather than just attaching one tree to the other.

From this, every tree has height O(log n), so FIND-SET costs O(log n), and UNION (two FIND-SETs plus a link) costs 2·O(log n) ⇒ O(log n).

Example of Disjoint-Tree

Path compression Heuristic

Path compression is an optimization in the Find operation that flattens the tree structure by making nodes point directly to the root.

Why We Do It?

  • To speed up future Find operations by reducing tree depth.

  • Helps in keeping the tree nearly flat, making Union-Find operations almost constant time O(α(n)), where α(n) is the inverse Ackermann function (very slow-growing).

Benefit

  • Improves efficiency, making Find(x) nearly O(1) for large datasets.

  • Reduces redundant traversal, optimizing memory and computation in Disjoint Set operations. 🚀

Lecture 14: (09/03/2025)

Class Recording

PDF (4 MB)

Graph

  1. Graphs in Real Life: Graphs represent relationships in real-world scenarios, such as airline networks, job assignments, and network connectivity.

  2. Undirected and Directed Graphs: An undirected graph consists of vertices connected by edges with no direction, whereas a directed graph has edges with specified directions.

  3. Graph Representation: Graphs can be represented using adjacency lists (efficient for sparse graphs) or adjacency matrices (efficient for checking edge existence).

  4. Basic Terminology: Concepts like adjacency, degree of a vertex, paths, cycles, and shortest distance describe relationships between nodes in a graph.

  5. Graph Applications: Graph theory is applied to shortest-path problems, network-flow optimization, social-network analysis, and circuit design.
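A minimal adjacency-list sketch for an undirected graph, as described in point 3 above (the helper names are illustrative):

```python
from collections import defaultdict

def build_graph(edges):
    # Adjacency list: each vertex maps to the list of its neighbors.
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)   # undirected: store the edge in both directions
    return adj

def degree(adj, u):
    # Degree of a vertex = number of incident edges.
    return len(adj[u])
```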

Lecture 15: (15/03/2025)

Class Recording

PDF (2 MB)

Breadth-first Search, Dijkstra

Lecture 16: (22/03/2025)

Class Recording

PDF (1 MB)

Dijkstra, Flow Networks

Lecture 17: (23/03/2025)

Class Recording

PDF (1 MB)

Flow Networks, Ford-Fulkerson Method

Lecture 18: (29/03/2025)

Class Recording

PDF (2 MB)

Bipartite Maximum Matching

Bipartite Graph

Bipartite matching

Maximum Bipartite matching

Bipartite Matching

Lecture 19: (30/03/2025)

Class Recording

PDF (1 MB)

Dynamic Programming

Dynamic Programming (DP) is a powerful algorithmic technique to solve complex problems by breaking them down into simpler subproblems. It is particularly effective for optimization problems that exhibit two key properties: overlapping subproblems and optimal substructure.

Rod Cutting Problem

The Rod Cutting Problem is a classic optimization challenge: determine the optimal way to cut a rod of length n into smaller segments to maximize total revenue, given a price table for each segment length.
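A bottom-up DP sketch for rod cutting; the price-table layout (prices[i] is the price of a piece of length i + 1) is an assumption:

```python
def rod_cutting(prices, n):
    # revenue[j] = max revenue obtainable from a rod of length j.
    revenue = [0] * (n + 1)
    for j in range(1, n + 1):
        best = 0
        for i in range(1, j + 1):
            # Cut off a first piece of length i, then optimally cut the rest.
            best = max(best, prices[i - 1] + revenue[j - i])
        revenue[j] = best
    return revenue[n]
```

With the classic price table (1, 5, 8, 9, 10, 17, 17, 20), a rod of length 4 is best cut into two pieces of length 2 for revenue 5 + 5 = 10.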

Lecture 20: (06/03/2025)

Class Recording

PDF (1 MB)

More Dynamic Programming Examples

LCS Dynamic Programming

  • Subsequence Definition: Dropping 0+ characters while preserving order.

  • LCS Problem: Find longest subsequence common to two strings.

  • Applications: Measures similarity (e.g., DNA, text comparison).

  • Brute Force Approach: Exponential time, checks all subsequences.

  • Optimal Substructure: LCS can be built from solutions of subproblems.

  • Recurrence Relation: Forms the DP backbone with match/mismatch rules.

  • Overlapping Subproblems: Justifies using memoization/tabulation.

  • Top-Down DP: Recursive + memoization avoids recomputation.

  • Bottom-Up DP: Iteratively fills DP table using recurrence.

  • Top-Down vs Bottom-Up: Trade-off between recursion and full coverage.

  • Time Complexity: Both DP methods run in O(nm).
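The bottom-up recurrence above can be sketched as:

```python
def lcs_length(x, y):
    # dp[i][j] = length of the LCS of prefixes x[:i] and y[:j].
    n, m = len(x), len(y)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1          # match: extend diagonal
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])  # mismatch: drop a char
    return dp[n][m]   # O(nm) time and space
```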

Test your answer here:

Lecture 21: (06/03/2025)

Class Recording

PDF (692 KB)

P, NP, and NP-completeness

Search Vs Decision Problem

In computational complexity theory, search and decision problems represent two fundamental types of computational challenges.

A decision problem asks a yes-or-no question about a given input. For example:

  • Is a given number prime?

  • Does x evenly divide y?

  • Is a formula satisfiable?

A search problem requires finding an actual solution rather than just determining its existence. For example:

  • Find a satisfying assignment for a formula

  • Find a clique of size k in a graph

  • Find a prime factor of a number

While both decision and search problems are fundamental in computational complexity theory, decision problems hold several important advantages over search problems.

Decision problems are often more tractable computationally. They typically require less computational resources since they only need to determine existence rather than construct a complete solution.

If a decision problem is not solvable in polynomial time, then its corresponding search problem cannot be solved in polynomial time either.

Complexity Class P

Topics of this lecture

  1. Search vs. Decision Problems: Search problems find solutions; decision problems answer Yes/No.

  2. Complexity Class P: P contains problems solvable in polynomial time.

  3. Complexity Class NP: NP contains problems verifiable in polynomial time.

  4. P vs NP: The key question — is every NP problem also in P?

  5. NP-Completeness: NP-complete problems are the hardest in NP; solving one solves all.

  6. SAT and 3SAT: SAT and 3SAT are classic NP-complete problems involving Boolean formulas.

  7. NP-Complete Network: Various NP-complete problems are interlinked by polynomial-time reductions.
