Data Structures

Total lectures: 21

Course Content

A tentative list of topics to be covered in this course:

Introduction to Algorithms, Insertion Sort, Divide and Conquer, Merge Sort, Solving Recurrences, Hashing, Binary Search Tree, Red-Black Tree, Augmenting Data Structures, Disjoint Sets, Graphs, Dijkstra's Algorithm, Maximum Flow, Matching, String Matching, KMP Algorithm, Dynamic Programming (Rod Cutting, LCS, etc.), Complexity classes P, NP, co-NP, NP-complete, NP-hard, and example reductions between problems.

Exams
  • 55% internal

    • 30% for 2 assignments

      • 1st deadline ⇒ 23 Feb

    • 25% quizzes

      • 23 Feb ⇒ 1st quiz

  • 45% main exam

Material

Lecture 1: (11/01/2025)

Class Recording

What is an algorithm?

Problem

When do we say an algorithm solves a problem?

Lecture 2:(12/01/2025)

Class Recording

Insertion Sort:

Technique ⇒ Incremental

Measure the Run Time

Code it and check the runtime
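As a concrete starting point, here is a minimal Python sketch of insertion sort (the incremental technique above) that one could code and time:

```python
def insertion_sort(arr):
    # Grow a sorted prefix one element at a time (incremental technique).
    for i in range(1, len(arr)):
        key = arr[i]
        j = i - 1
        # Shift larger elements right to open a slot for key.
        while j >= 0 and arr[j] > key:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key
    return arr
```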

Drawbacks:

  • Depends on the size of the array

  • Depends on the hardware

Run Time

The number of basic operations an algorithm performs, as a function of the input size: f(n).

Let's take insertion sort as an example:

| Basic operation | Cost | No. of times | Total time |
| --- | --- | --- | --- |
| for i = 2 to n: | 2 | n | 2n |
| key = arr[i] | 1 | n − 1 | n − 1 |
| j = i − 1 | 1 | n − 1 | n − 1 |
| while j > 0 and arr[j] > key: | 2 | Σ_{i=2..n} t_i | 2·Σ t_i |
| arr[j + 1] = arr[j] | 1 | Σ_{i=2..n} (t_i − 1) | Σ (t_i − 1) |
| j = j − 1 | 1 | Σ_{i=2..n} (t_i − 1) | Σ (t_i − 1) |
| arr[j + 1] = key | 1 | n − 1 | n − 1 |

(t_i is the number of times the while-loop test runs for a given i.)

Best-case scenario: the array is already sorted, so every t_i = 1 and the total time is linear in n.

Worst-case scenario: the array is in reverse order, so t_i = i and the total time is quadratic in n.

Why only leading term matters

A term with a larger power takes more time for large inputs; for smaller values a lower-order term with a large constant may dominate, depending on the expression. This can be shown with a graph.
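A small numeric check of this claim (illustrative values only): 100n beats n² for small n, but the higher power wins once n passes the crossover point.

```python
# 100n has the larger constant; n^2 has the larger power (the leading term).
def linear(n):
    return 100 * n

def quadratic(n):
    return n * n

# Crossover at n = 100:
#   n = 10   -> linear(10)   = 1000    > quadratic(10)   = 100
#   n = 100  -> linear(100)  = 10000   = quadratic(100)  = 10000
#   n = 1000 -> linear(1000) = 100000  < quadratic(1000) = 1000000
```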

Lecture 3:(18/01/2025)

Class Recording

Merge Sort

Technique ⇒ Divide and conquer

Divide ⇒ split into one or more smaller subproblems

Conquer ⇒ solve the subproblems recursively

Combine ⇒ merge the solutions of the subproblems into a solution of the original problem

Complexity

Unrolling T(n) = 2T(n/2) + cn for k levels gives T(n) = 2^k · T(n/2^k) + c·n·k.

With k = log₂ n: T(n) = c′·n·log n + c·n ⇒ O(n log n) (because n log n is the leading term).
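The divide/conquer/combine steps above can be sketched in Python as follows (a minimal out-of-place version, not an optimized one):

```python
def merge_sort(arr):
    # Divide: split into halves until size <= 1.
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])    # Conquer: sort each half recursively.
    right = merge_sort(arr[mid:])
    # Combine: merge the two sorted halves.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```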

Recurrence Relation

A recurrence relation is an equation in which each term of a sequence is expressed as some combination of previous terms.

Recursive Tree Method

T(n) = 3T(n/4) + cn²

Levels ⇒ log₄ n + 1

Height ⇒ log₄ n

Number of leaves ⇒ 3^{log₄ n} = n^{log₄ 3}

Total time ⇒ O(n²), since the cn² cost at the root dominates the geometrically shrinking per-level costs

Lecture 4:(19/01/2025)

Class Recording

Self-learning: stack, queue, and linked list

Hashing

Hashing is typically used to maintain a dynamic set that supports three operations ⇒ CRD (create, read, delete).

Dynamic set: a set of elements where every element has a key.

Direct-address Tables (Array)

Every element of the dynamic set has a distinct key.

Every element has a key; e.g., in an array the index is the key.

Keys = {0, 1, …, n − 1}

Con ⇒ space issue: the table needs one slot for every possible key.

To tackle this problem we use Hash Tables

Hash Tables

A hash table is denoted T[0 : m − 1], where m < n.

Keys = {0, 1, …, n − 1}

Hash function

T[h(x.key)], where h: U → {0, 1, …, m − 1}

Issue: when two keys map to the same slot, this is called a collision (in the example figure, slot 2 is the collision point).

Chaining

  • Used to handle collisions.

  • We create a linked list per slot.

  • The key maps to a linked list, and the new element is linked into that list.
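A minimal Python sketch of chaining (plain Python lists stand in for linked lists, and the modular hash function is an assumption, not necessarily the one used in class):

```python
class ChainedHashTable:
    """Hash table with chaining: slot j holds a chain of [key, value] pairs."""

    def __init__(self, m=8):
        self.m = m
        self.slots = [[] for _ in range(m)]

    def _h(self, key):
        # Assumed hash function: Python's hash reduced modulo m.
        return hash(key) % self.m

    def insert(self, key, value):
        chain = self.slots[self._h(key)]
        for pair in chain:
            if pair[0] == key:      # key already present: update in place
                pair[1] = value
                return
        chain.append([key, value])  # collision handled by appending to chain

    def search(self, key):
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        idx = self._h(key)
        self.slots[idx] = [p for p in self.slots[idx] if p[0] != key]
```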

Lecture 5: (25/01/2025)

Class Recording

PDF (3 MB)

Independent Uniform Hashing

  • Definition: A hash function h: U → {0, 1, …, m−1} maps keys from a universe U = {0, 1, …, n−1} to a hash table T[0 : m−1] (m < n) such that:

    • Each key k is assigned a uniformly random slot h(k).

    • Repeated calls for the same k return the same h(k).

  • Expected Chain Length:

    • For slot j, define random variables X_0, X_1, …, X_{n−1}, where X_i = 1 if h(i) = j, else 0.

    • The chain length T_j = X_0 + X_1 + … + X_{n−1}.

    • Using linearity of expectation:

      E[T_j] = Σ_{i=0}^{n−1} E[X_i] = n · (1/m) = n/m

    • When n = m, the expected search time is O(1).
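A quick simulation (not a proof) of the expected chain length: hashing n keys uniformly into m slots and averaging the chain length at one fixed slot should give roughly n/m. The parameter values here are illustrative.

```python
import random

def avg_chain_length(n, m, trials, slot=0, seed=1):
    # Simulate independent uniform hashing: each key lands in a uniformly
    # random slot; count how many land in `slot`, averaged over trials.
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += sum(1 for _ in range(n) if rng.randrange(m) == slot)
    return total / trials

# With n = 1000, m = 200 the expected chain length is n/m = 5.
```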


Open Addressing

  • Concept: All elements are stored directly in the hash table. Collisions are resolved by probing subsequent slots.

  • Probe Sequence: A permutation ⟨h(k, 0), h(k, 1), …, h(k, m−1)⟩ for key k.

  • Operations:

    • Insert: Iterate through probe sequence until an empty slot is found.

    • Search: Follow probe sequence until the key is found or a NIL slot is encountered.

    • Delete: Marking a slot as NIL can break probe sequences for other keys, requiring careful handling.

  • Probing Methods:

    • Linear Probing: h(k, i) = (h′(k) + i) mod m.

    • Double Hashing: h(k, i) = (h₁(k) + i·h₂(k)) mod m, where gcd(h₂(k), m) = 1.
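A sketch of open addressing with linear probing; the DELETED tombstone marker (instead of resetting a slot to NIL) is one common way to do the "careful handling" of deletion mentioned above, and the modular hash is an assumption:

```python
DELETED = object()  # tombstone: slot was occupied, then deleted

class LinearProbingTable:
    def __init__(self, m=8):
        self.m = m
        self.slots = [None] * m           # None = never used (NIL)

    def _probe(self, key):
        # Linear probe sequence: h(k, i) = (h'(k) + i) mod m.
        for i in range(self.m):
            yield (hash(key) + i) % self.m

    def insert(self, key, value):
        for idx in self._probe(key):
            s = self.slots[idx]
            if s is None or s is DELETED or s[0] == key:
                self.slots[idx] = (key, value)
                return
        raise RuntimeError("hash table overflow")

    def search(self, key):
        for idx in self._probe(key):
            s = self.slots[idx]
            if s is None:                 # NIL ends the probe sequence
                return None
            if s is not DELETED and s[0] == key:
                return s[1]
        return None

    def delete(self, key):
        for idx in self._probe(key):
            s = self.slots[idx]
            if s is None:
                return
            if s is not DELETED and s[0] == key:
                self.slots[idx] = DELETED  # tombstone, not NIL, so later
                return                     # keys stay reachable
```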


Binary Search Trees (BSTs)

  • Structure: A binary tree where each node contains:

    • Key and satellite data.

    • Pointers to parent, left child, and right child.

  • BST Property: For any node xx:

    • Nodes in the left subtree have keys ≤ x.key.

    • Nodes in the right subtree have keys ≥ x.key.

  • Supported Operations:

    • Insert, Search, Delete.

    • Find minimum/maximum.

    • Find successor/predecessor.
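A minimal Python sketch of the BST structure and a few of these operations (insert, search, minimum), assuming the ≤/≥ property above:

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

def bst_insert(root, key):
    # Walk down, preserving the BST property: left <= node.key <= right.
    node = Node(key)
    parent, cur = None, root
    while cur is not None:
        parent = cur
        cur = cur.left if key < cur.key else cur.right
    node.parent = parent
    if parent is None:
        return node                # tree was empty: new node is the root
    if key < parent.key:
        parent.left = node
    else:
        parent.right = node
    return root

def bst_search(root, key):
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root

def bst_min(root):
    # Minimum is the leftmost node.
    while root.left is not None:
        root = root.left
    return root
```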


Summary

  • Hashing: Independent uniform hashing ensures an expected chain length of n/m. Open addressing resolves collisions via probing but requires careful deletion handling.

  • BSTs: Maintain dynamic sets with efficient operations by preserving the BST property. Probing methods like linear and double hashing enable collision resolution in open addressing.

Lecture 6: (02/02/2025)

Class Recording

PDF (1 MB)

Binary Search Tree

Read this

Lecture 7: (08/02/2025)

Class Recording

PDF (920 KB)

Red-Black Trees

All you need to know about R-B trees is covered in this blog:

Self-balancing BSTs that maintain O(log n) height through these properties:

  1. Every node is red or black

  2. Root is always black

  3. All Nil leaves are black

  4. Red nodes cannot have red children

  5. Every root-to-leaf path has the same number of black nodes (the black-height)

Lecture 8: (09/02/2025)

Class Recording

PDF (1019 KB)

Red-Black Trees are balanced binary search trees that maintain logarithmic height through color-based constraints. This lecture covers their height properties and insertion mechanics through detailed proofs and case analyses.

Lecture 9: (15/02/2025)

Class canceled

Lecture 9: (16/02/2025)

Class Recording

PDF (703 KB)

Red-Black Tree Deletion

All you need to know about R-B deletion is covered in this blog:

Lecture 10: (22/02/2025)

Class Recording

PDF (592 KB)
  • Red-Black Tree Deletion Cases

Lecture 11: (23/02/2025)

Minor Exam

PDF (576 KB)

Lecture 11: (01/03/2025)

Class Recording

PDF (767 KB)

Augmenting Data Structure

It is an extension of the RB tree in which each node stores an extra value: the number of elements contained in its subtree.

Finding the i-th element

Finding the rank of an element
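A sketch of the i-th element query (order-statistic select) on a size-augmented tree; RB colors are omitted for brevity, and the node layout here is an assumption:

```python
class OSNode:
    # Size-augmented node: size = number of elements in this subtree.
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.size = 1 + (left.size if left else 0) + (right.size if right else 0)

def os_select(x, i):
    # Return the node holding the i-th smallest key (1-indexed).
    r = (x.left.size if x.left else 0) + 1   # rank of x within its subtree
    if i == r:
        return x
    if i < r:
        return os_select(x.left, i)          # answer is in the left subtree
    return os_select(x.right, i - r)         # skip the r smaller elements
```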

Lecture 12: (02/03/2025)

Class Recording

PDF (657 KB)

Disjoint-Set Data Structure

  • Disjoint-set is an advanced data structure used for handling collections of disjoint dynamic sets.

  • Also known as Union-Find or Merge-Find Set.

  • Commonly used in graph algorithms to determine connected components.

Sets
  • A set is a collection of distinct objects.

  • Union: Combines two sets into one.

  • Intersection: Finds common elements between two sets.

  • Disjoint Sets: Two sets are disjoint if they have no common elements.

PDF (32 KB)

Core Operations

The disjoint-set data structure supports three fundamental operations:

1. MAKE-SET(x) Creates a new set containing only element x.

2. UNION(x, y) Merges the sets containing x and y into one.

3. FIND-SET(x) Finds the representative (leader) of the set containing x.

Implementation

The most efficient implementation uses a forest of trees:

  • Each element is a node in a tree

  • Each tree represents one subset

  • The root of each tree serves as the representative of that set

  • Initially, each element forms its own set (is its own parent)

Optimizations (Heuristics)

1. Union by Rank

  • Always attach the tree with smaller rank under the root of the tree with larger rank.

  • Ensures shallower trees, improving efficiency.

2. Path Compression

  • During FIND-SET(x), update each node on the path to directly point to the root.

  • Makes future queries much faster
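The three operations plus both heuristics can be sketched as follows (a dict-based forest; a minimal illustration, not a production implementation):

```python
parent = {}
rank = {}

def make_set(x):
    # Each element starts as its own one-node tree (its own parent).
    parent[x] = x
    rank[x] = 0

def find_set(x):
    # Path compression: point every node on the path directly at the root.
    if parent[x] != x:
        parent[x] = find_set(parent[x])
    return parent[x]

def union(x, y):
    rx, ry = find_set(x), find_set(y)
    if rx == ry:
        return                     # already in the same set
    if rank[rx] < rank[ry]:        # union by rank: attach shorter tree
        rx, ry = ry, rx            # under the taller tree's root
    parent[ry] = rx
    if rank[rx] == rank[ry]:
        rank[rx] += 1
```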

Applications

  • Graph Algorithms:

    • Handling queries about connectivity in graphs that change over time.

    • Kruskal's Algorithm for Minimum Spanning Tree.

  • Networking: Finding clusters in a network.

  • Image Processing: Segmenting images into different regions based on pixel similarity

Check out this blog for more information about disjoint sets.

Lecture 13: (08/03/2025)

Class Recording

PDF (931 KB)

Disjoint Sets as Trees

MAKE-SET

Complexity of Make-Set is O(1)

FIND-SET

Complexity of Find-Set is O(log n)

UNION

When building the structure we perform UNION, and inside UNION we do a rank check rather than just attaching one tree to the other.

From this, every tree has height O(log n), so FIND-SET costs O(log n), and UNION (two FIND-SETs plus a link) costs 2·O(log n) ⇒ O(log n).

Example of Disjoint-Tree

Path compression Heuristic

Path compression is an optimization in the Find operation that flattens the tree structure by making nodes point directly to the root.

Why We Do It?

  • To speed up future Find operations by reducing tree depth.

  • Helps in keeping the tree nearly flat, making Union-Find operations almost constant time O(α(n)), where α(n) is the inverse Ackermann function (very slow-growing).

Benefit

  • Improves efficiency, making Find(x) nearly O(1) for large datasets.

  • Reduces redundant traversal, optimizing memory and computation in Disjoint Set operations. 🚀

Lecture 14: (09/03/2025)

Class Recording

PDF (4 MB)

Graph

  1. Graphs in Real Life: Graphs represent relationships in real-world scenarios, such as airline networks, job assignments, and network connectivity.

  2. Undirected and Directed Graphs: An undirected graph consists of vertices connected by edges with no direction, whereas a directed graph has edges with specified directions.

  3. Graph Representation: Graphs can be represented using adjacency lists (efficient for sparse graphs) or adjacency matrices (efficient for checking edge existence).

  4. Basic Terminology: Concepts like adjacency, degree of a vertex, paths, cycles, and shortest distance describe relationships between nodes in a graph.

  5. Graph Applications: Graph theory is applied to shortest-path problems, network-flow optimization, social-network analysis, and circuit design.
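A minimal adjacency-list sketch for an undirected graph, as described in point 3 above (the helper names are illustrative):

```python
from collections import defaultdict

def build_graph(edges):
    # Adjacency list: each vertex maps to the list of its neighbors.
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)   # undirected: store the edge in both directions
    return adj

def degree(adj, u):
    # Degree of a vertex = number of incident edges.
    return len(adj[u])
```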

Lecture 15: (15/03/2025)

Class Recording

PDF (2 MB)

Breadth-first Search, Dijkstra

Lecture 16: (22/03/2025)

Class Recording

PDF (1 MB)

Dijkstra, Flow Networks

Lecture 17: (23/03/2025)

Class Recording

PDF (1 MB)

Flow Networks, Ford-Fulkerson Method

Lecture 18: (29/03/2025)

Class Recording

PDF (2 MB)

Bipartite Maximum Matching

Bipartite Graph

Bipartite matching

Maximum Bipartite matching

Bipartite Matching

Lecture 19: (30/03/2025)

Class Recording

PDF (1 MB)

Dynamic Programming

Dynamic Programming (DP) is a powerful algorithmic technique to solve complex problems by breaking them down into simpler subproblems. It is particularly effective for optimization problems that exhibit two key properties: overlapping subproblems and optimal substructure.

Rod Cutting Problem

The Rod Cutting Problem is a classic optimization challenge: determine the optimal way to cut a rod of length n into smaller segments to maximize total revenue, given a price table for each segment length.
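A bottom-up DP sketch for rod cutting; the price-table layout (prices[i] is the price of a piece of length i + 1) is an assumption:

```python
def rod_cutting(prices, n):
    # revenue[j] = max revenue obtainable from a rod of length j.
    revenue = [0] * (n + 1)
    for j in range(1, n + 1):
        best = 0
        for i in range(1, j + 1):
            # Cut off a first piece of length i, then optimally cut the rest.
            best = max(best, prices[i - 1] + revenue[j - i])
        revenue[j] = best
    return revenue[n]
```

With the classic price table (1, 5, 8, 9, 10, 17, 17, 20), a rod of length 4 is best cut into two pieces of length 2 for revenue 5 + 5 = 10.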

Lecture 20: (06/03/2025)

Class Recording

PDF (1 MB)

More Dynamic Programming Examples

LCS Dynamic Programming

  • Subsequence Definition: Dropping 0+ characters while preserving order.

  • LCS Problem: Find longest subsequence common to two strings.

  • Applications: Measures similarity (e.g., DNA, text comparison).

  • Brute Force Approach: Exponential time, checks all subsequences.

  • Optimal Substructure: LCS can be built from solutions of subproblems.

  • Recurrence Relation: Forms the DP backbone with match/mismatch rules.

  • Overlapping Subproblems: Justifies using memoization/tabulation.

  • Top-Down DP: Recursive + memoization avoids recomputation.

  • Bottom-Up DP: Iteratively fills DP table using recurrence.

  • Top-Down vs Bottom-Up: Trade-off between recursion and full coverage.

  • Time Complexity: Both DP methods run in O(nm).
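The bottom-up recurrence above can be sketched as:

```python
def lcs_length(x, y):
    # dp[i][j] = length of the LCS of prefixes x[:i] and y[:j].
    n, m = len(x), len(y)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1          # match: extend diagonal
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])  # mismatch: drop a char
    return dp[n][m]   # O(nm) time and space
```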

Test your answer here:

Lecture 21: (06/03/2025)

Class Recording

PDF (692 KB)

P, NP, and NP-completeness

Search Vs Decision Problem

In computational complexity theory, search and decision problems represent two fundamental types of computational challenges.

A decision problem asks a yes-or-no question about a given input. For example:

  • Is a given number prime?

  • Does x evenly divide y?

  • Is a formula satisfiable?

A search problem requires finding an actual solution rather than just determining its existence. For example:

  • Find a satisfying assignment for a formula

  • Find a clique of size k in a graph

  • Find a prime factor of a number

While both decision and search problems are fundamental in computational complexity theory, decision problems hold several important advantages over search problems.

Decision problems are often more tractable computationally. They typically require less computational resources since they only need to determine existence rather than construct a complete solution.

If a decision problem is not solvable in polynomial time, then its corresponding search problem cannot be solved in polynomial time either.

Complexity Class P

Topics of this lecture

  1. Search vs. Decision Problems: Search problems find solutions; decision problems answer Yes/No.

  2. Complexity Class P: P contains problems solvable in polynomial time.

  3. Complexity Class NP: NP contains problems verifiable in polynomial time.

  4. P vs NP: The key question — is every NP problem also in P?

  5. NP-Completeness: NP-complete problems are the hardest in NP; solving one solves all.

  6. SAT and 3SAT: SAT and 3SAT are classic NP-complete problems involving Boolean formulas.

  7. NP-Complete Network: Various NP-complete problems are interlinked by polynomial-time reductions.
