Data Structures and Algorithms for TU IT Adhikrit

 

Data Structure

A data structure is a way of organizing and storing data in a computer so that it can be accessed and used efficiently. It refers to the logical or mathematical representation of data, as well as the implementation in a computer program.


Classification:

Data structures can be classified into two broad categories:


Linear Data Structure: A data structure in which data elements are arranged sequentially or linearly, where each element is attached to its previous and next adjacent elements, is called a linear data structure. Examples are array, stack, queue, etc.

Non-linear Data Structure: Data structures where data elements are not placed sequentially or linearly are called non-linear data structures. Examples are trees and graphs.


Applications of Data Structures:

Data structures are used in a wide range of computer programs and applications, including:

Databases: Data structures are used to organize and store data in a database, allowing for efficient retrieval and manipulation.

Operating systems: Data structures are used in the design and implementation of operating systems to manage system resources, such as memory and files.

Computer graphics: Data structures are used to represent geometric shapes and other graphical elements in computer graphics applications.

Artificial intelligence: Data structures are used to represent knowledge and information in artificial intelligence systems.


Advantages of Data Structures:

The use of data structures provides several advantages, including:

Efficiency: Data structures allow for efficient storage and retrieval of data, which is important in applications where performance is critical.

Flexibility: Data structures provide a flexible way to organize and store data, allowing for easy modification and manipulation.

Reusability: Data structures can be used in multiple programs and applications, reducing the need for redundant code.

Maintainability: Well-designed data structures can make programs easier to understand, modify, and maintain over time.

Abstract Data Types

An Abstract Data Type (ADT) is a conceptual model that defines a set of operations and behaviors for a data structure, without specifying how these operations are implemented or how the data is organized in memory. The definition of an ADT mentions only what operations are to be performed, not how they will be implemented or what algorithms will be used. It is called "abstract" because it provides an implementation-independent view.

The process of providing only the essentials and hiding the details is known as abstraction.


Features of ADT

Abstract data types (ADTs) are a way of encapsulating data and operations on that data into a single unit. Some of the key features of ADTs include:

  • Abstraction: The user does not need to know how the data structure is implemented; only the essential operations are exposed.

  • Better Conceptualization: ADT gives us a better conceptualization of the real world.

  • Robustness: Programs built on ADTs are more robust and better able to catch errors, because all access to the data goes through a well-defined interface.

  • Encapsulation: ADTs hide the internal details of the data and provide a public interface for users to interact with the data. This allows for easier maintenance and modification of the data structure.

  • Data Abstraction: ADTs provide a level of abstraction from the implementation details of the data. Users only need to know the operations that can be performed on the data, not how those operations are implemented.


List ADT

A list is a linear data structure in which data need not be stored in a contiguous fashion. A list consists of data storage units called 'nodes'. These nodes are linked to each other, i.e. each node holds the address of another node. In this way, all the nodes are connected to each other through these links.

Typical operations of the List ADT −

size(), returns the number of elements present in the list

insert(x), inserts the element x into the list

remove(x), removes the given element x from the list

get(i), returns the element at position i

replace(x, y), replaces the value x with the value y
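
As an illustration only (not from the original notes), such a List ADT can be sketched in C++ as an abstract interface; the class name List and its method names simply mirror the operations listed above, and this is a declaration-only sketch with no particular implementation chosen.

#include <cstddef>

// Hypothetical List ADT: declares WHAT the operations are,
// not HOW they are implemented (array, linked list, ...).
template <typename T>
class List {
public:
    virtual ~List() = default;
    virtual std::size_t size() const = 0;              // number of elements in the list
    virtual void insert(const T& x) = 0;               // insert one element
    virtual void remove(const T& x) = 0;               // remove the given element
    virtual T get(std::size_t i) const = 0;            // element at position i
    virtual void replace(const T& x, const T& y) = 0;  // replace x with y
};

Any concrete class (for example one backed by an array, another by a linked list) could implement this interface without the user of the ADT having to change their code.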


Stack ADT −

The Stack ADT is a linear data structure that follows the LIFO (Last In, First Out) principle. Data in a stack can be accessed only from its top. Its two basic operations are push (used to insert data at the top of the stack) and pop (used to remove data from the top of the stack).

Functions:

isFull(), checks whether the stack is full

isEmpty(), checks whether the stack is empty

push(x), pushes x onto the stack

pop(), removes one element from the top of the stack

peek(), returns the topmost element of the stack without removing it

size(), returns the number of elements present in the stack


Queue ADT −

Queue is a linear data structure in which data can be accessed from both of its ends, i.e. the front and the rear. Its two basic operations are insert (used to add data at the rear of the queue) and delete (used to remove data from the front of the queue).

Functions:

isFull(), checks whether the queue is full

isEmpty(), checks whether the queue is empty

insert(x), adds x to the queue at the rear end

delete(), removes one element from the front end of the queue

size(), returns the number of elements present in the queue


Stack


  • Stack is a linear data structure that follows the LIFO (Last In First Out) principle: the last element inserted is the first to be popped out. Both insertion and deletion operations happen at one end only, i.e. the top of the stack.

  • It is named a stack because it behaves like real-world stacks, for example − a pack of cards or a pile of plates.


LIFO(Last In First Out) Principle

Here are some real-world examples of LIFO:

Consider a stack of plates. When we add a plate, we add at the top. When we remove, we remove from the top.

A shuttlecock box (or any other box that is closed at one end) is another great real-world example of the LIFO (Last In, First Out) principle, where insertions and removals are done from the same open end.

Stack Representation

A stack allows all data operations at one end only. At any given time, we can only access the top element of a stack.


Basic Operations on Stack:

In order to manipulate a stack, the following operations are provided to us:

  • push() to insert an element into the stack

  • pop() to remove an element from the stack

  • top() Returns the top element of the stack.

  • isEmpty() returns true if stack is empty else false.

  • isFull() returns true if the stack is full else false.

To implement stack, we need to maintain reference to the top item.


Push Operation on Stack

The push operation adds an element to the top of the stack. This is the primary way to insert data into the stack.

If the stack is full, then it is said to be an Overflow condition.

Algorithm for Push Operation:

Before pushing the element to the stack, we check if the stack is full.

If the stack is full (top == capacity-1), then Stack Overflow occurs and we cannot insert the element into the stack.

Otherwise, we increment the value of top by 1 (top = top + 1) and the new value is inserted at the top position.

The elements can be pushed into the stack till we reach the capacity of the stack.


Pop Operation in Stack

The pop operation removes the top element from the stack and returns it. This operation follows the LIFO principle, ensuring that the last element added is the first to be removed.

Removes an item from the stack. The items are popped in the reverse order in which they were pushed. If the stack is empty, then it is said to be an Underflow condition.

Algorithm for Pop Operation:

Before popping the element from the stack, we check if the stack is empty.

If the stack is empty (top == -1), then Stack Underflow occurs and we cannot remove any element from the stack.

Otherwise, we store the value at top, decrement the value of top by 1 (top = top - 1) and return the stored top value.


Top or Peek Operation on Stack

The peek or top operation retrieves the top element of the stack without removing it. This allows you to see what is at the top of the stack without modifying its content.

Returns the top element of the stack.

Algorithm for Top Operation:

Before returning the top element from the stack, we check if the stack is empty.

If the stack is empty (top == -1), we simply print “Stack is empty”.

Otherwise, we return the element stored at index = top.


IsEmpty

The isEmpty operation checks whether the stack is empty. This is useful for preventing errors such as popping from an empty stack, which can result in underflow.

Returns true if the stack is empty, else false.

Algorithm for isEmpty Operation:

Check for the value of top in stack.

If (top == -1), then the stack is empty so return true.

Otherwise, the stack is not empty so return false.


isFull Operation in Stack Data Structure:

This operation checks if the stack has reached its maximum capacity. This is particularly relevant for stacks implemented with a fixed size, like in array-based implementations.

Returns true if the stack is full, else false.

Algorithm for isFull Operation:

Check for the value of top in stack.

If (top == capacity-1), then the stack is full so return true.

Otherwise, the stack is not full so return false.
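
Putting the push, pop, peek, isEmpty and isFull algorithms above together, here is a minimal array-based stack sketch in C++. The fixed capacity of 5 and the names (stackArr, CAPACITY) are illustrative choices, not part of the original notes.

#include <iostream>
using namespace std;

const int CAPACITY = 5;   // maximum number of elements (arbitrary for this example)
int stackArr[CAPACITY];
int top = -1;             // -1 means the stack is empty

bool isEmpty() { return top == -1; }
bool isFull()  { return top == CAPACITY - 1; }

void push(int x) {
    if (isFull()) { cout << "Stack Overflow" << endl; return; }
    stackArr[++top] = x;          // increment top, then store the new value
}

int pop() {
    if (isEmpty()) { cout << "Stack Underflow" << endl; return -1; }
    return stackArr[top--];       // return the top value, then decrement top
}

int peek() {
    if (isEmpty()) { cout << "Stack is empty" << endl; return -1; }
    return stackArr[top];
}

int main() {
    push(10); push(20); push(30);
    cout << peek() << endl;   // 30 (last in, first out)
    cout << pop() << endl;    // 30
    cout << pop() << endl;    // 20
    return 0;
}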


Application of Stack in Data Structures

Here are the top 7 applications of the stack in data structure:

Expression Evaluation and Conversion

Backtracking

Function Call

Parentheses Checking

String Reversal

Syntax Parsing

Memory Management

Now you will understand all the applications one at a time.


1. Expression Evaluation and Conversion

There are three types of expression that you use in programming, they are:

Infix Expression: An infix expression is a single letter, or an operator preceded by one infix string and followed by another infix string; in other words, the operator appears between its operands.

X

X + Y

(X + Y ) + (A - B)


Prefix Expression: A prefix expression is a single letter or an operator followed by two prefix strings.

+ X Y

+ + X Y - A B


Postfix Expression: A postfix expression (also called Reverse Polish Notation) is a single letter or an operator preceded by two postfix strings.

X

X Y +

X Y + A B - +

The stack is used to evaluate these expressions and to convert between them, for example infix to prefix or infix to postfix.
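
As a rough sketch of how a stack evaluates a postfix expression (not taken from the notes), the following C++ function handles single-digit operands and the four basic operators; the function name evaluatePostfix is an illustrative choice.

#include <cctype>
#include <iostream>
#include <stack>
#include <string>
using namespace std;

// Evaluate a postfix expression containing single-digit numbers and + - * /.
int evaluatePostfix(const string& expr) {
    stack<int> st;
    for (char c : expr) {
        if (c == ' ') continue;               // skip spaces
        if (isdigit(c)) {
            st.push(c - '0');                 // operand: push its numeric value
        } else {
            int b = st.top(); st.pop();       // right operand
            int a = st.top(); st.pop();       // left operand
            switch (c) {
                case '+': st.push(a + b); break;
                case '-': st.push(a - b); break;
                case '*': st.push(a * b); break;
                case '/': st.push(a / b); break;
            }
        }
    }
    return st.top();                          // the final result remains on the stack
}

int main() {
    cout << evaluatePostfix("2 3 + 4 *") << endl;   // (2 + 3) * 4 = 20
    return 0;
}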


2. Backtracking

  • Backtracking is a recursive algorithm mechanism that is used to solve optimization problems.

  • To solve an optimization problem with backtracking, you explore multiple candidate solutions; a candidate may turn out to be wrong. While finding all the possible solutions, the previously computed states are stored on the stack, and the algorithm uses them to backtrack and try the next possibility.

  • The N-queen problem is an example of backtracking, a recursive algorithm where the stack is used to solve this problem.


3. Function Call

Whenever you call one function from another function in a program, the reference (return address) of the calling function is stored on the stack. When the called function terminates, program control moves back to the calling function with the help of the references stored on the stack.


4. Parentheses Checking

  • Stack in data structures is used to check whether parentheses like ( ), { } in a program are valid, i.e. whether every opening bracket is matched by a closing bracket in the correct order (the brackets are balanced).

  • Opening brackets are stored on the stack as they are encountered and popped when the matching closing bracket appears, which controls the flow of the check.

For example, ((a + b) * (c + d)) is valid but {{a+b})) *(b+d}] is not valid.
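
A minimal sketch of such a parentheses checker using a stack (the function name isBalanced is ours, not from the notes):

#include <iostream>
#include <stack>
#include <string>
using namespace std;

// Returns true if every ( { [ has a matching ) } ] in the correct order.
bool isBalanced(const string& expr) {
    stack<char> st;
    for (char c : expr) {
        if (c == '(' || c == '{' || c == '[') {
            st.push(c);                         // opening bracket: remember it
        } else if (c == ')' || c == '}' || c == ']') {
            if (st.empty()) return false;       // closing bracket with no opener
            char open = st.top(); st.pop();
            if ((c == ')' && open != '(') ||
                (c == '}' && open != '{') ||
                (c == ']' && open != '[')) return false;   // mismatched pair
        }
    }
    return st.empty();                          // any leftover opener means unbalanced
}

int main() {
    cout << boolalpha;
    cout << isBalanced("((a + b) * (c + d))") << endl;   // true
    cout << isBalanced("{{a+b})) *(b+d}]") << endl;      // false
    return 0;
}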


5. String Reversal

  • Another exciting application of stack is string reversal. Each character of a string gets stored in the stack.

  • The string's first character is held at the bottom of the stack, and the last character of the string is held at the top of the stack, resulting in a reversed string after performing the pop operation.


6. Syntax Parsing

  • Since many programming languages are context-free languages, the stack is used for syntax parsing by many compilers.


7. Memory Management

  • Memory management is an essential feature of the operating system, so the stack is heavily used to manage memory.


In short

Applications of Stacks:

  • Function calls: Stacks are used to keep track of the return addresses of function calls, allowing the program to return to the correct location after a function has finished executing.

  • Recursion: Stacks are used to store the local variables and return addresses of recursive function calls, allowing the program to keep track of the current state of the recursion.

  • Expression evaluation: Stacks are used to evaluate expressions in postfix notation (Reverse Polish Notation).

  • Syntax parsing: Stacks are used to check the validity of syntax in programming languages and other formal languages.

  • Memory management: Stacks are used to allocate and manage memory in some operating systems and programming languages.

Queue

  • A queue is a linear data structure that follows the FIFO (First In First Out) principle: the first element inserted is the first element to be accessed. A queue is an Abstract Data Type (ADT) similar to a stack; what makes a queue different from a stack is that a queue is open at both of its ends. Data is inserted into the queue through one end (the rear) and deleted from it through the other end (the front). Queues are used very frequently in most programming languages.

  • A real-world example of a queue is a single-lane one-way road, where the vehicle that enters first exits first. More real-world examples can be seen in queues at ticket windows and bus stops.


Representation of Queue



Queue Operations

  • Enqueue: Adds an element to the end (rear) of the queue. If the queue is full, an overflow error occurs.

  • Dequeue: Removes the element from the front of the queue. If the queue is empty, an underflow error occurs.

  • Peek/Front: Returns the element at the front without removing it.

  • Size: Returns the number of elements in the queue.

  • isEmpty: Returns true if the queue is empty, otherwise false.

  • isFull: Returns true if the queue is full, otherwise false.


Queue uses two pointers − front and rear. The rear pointer is used to insert data at the rear end (enqueue), while the front pointer is used to access and remove data from the front end (dequeue).


Queue Insertion Operation: Enqueue()

The enqueue() is a data manipulation operation that is used to insert elements into the queue.

Adds an element to the end (rear) of the queue. If the queue is full, an overflow error occurs

Algorithm

1. START

2. Check if the queue is full.

3. If the queue is full, produce overflow error and exit.

4. If the queue is not full, increment rear pointer to point 

   the next empty space.

5. Add data element to the queue location, where the rear 

   is pointing.

6. Return success.

7. END


Queue Deletion Operation: dequeue()

The dequeue() is a data manipulation operation that is used to remove elements from the queue. It removes the element from the front of the queue. If the queue is empty, an underflow error occurs.

Algorithm

1. START

2. Check if the queue is empty.

3. If the queue is empty, produce an underflow error and exit.

4. If the queue is not empty, access the data where the front is pointing.

5. Increment front pointer to point to the next available data element.

6. Return success.

7. END


Queue - The peek() Operation

The peek() is an operation which is used to retrieve the frontmost element in the queue without deleting it. This operation is used to check the element that the front pointer refers to.

Algorithm

1. START

2. Return the element at the front of the queue

3. END


Queue - The isFull() Operation

The isFull() operation verifies whether the queue is full.

Algorithm

1. START

2. If the count of queue elements equals the queue size,

   return true

3. Otherwise, return false

4. END


Queue - The isEmpty() operation

The isEmpty() operation verifies whether the queue is empty. This operation is used to check the status of the queue with the help of the element count.

Algorithm

1. START

2. If the count of queue elements equals zero, return true

3. Otherwise, return false

4. END


Queue - The size() Operation

This operation returns the size of the queue i.e. the total number of elements it contains.
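
Combining the enqueue, dequeue, peek, isFull, isEmpty and size operations above, here is a small circular-array queue sketch in C++; the capacity of 5, the struct name Queue and the member names are illustrative choices, not from the notes.

#include <iostream>
using namespace std;

struct Queue {
    static const int CAPACITY = 5;   // fixed size for this example
    int data[CAPACITY];
    int front = 0;                   // index of the front element
    int count = 0;                   // number of elements currently stored

    bool isEmpty() const { return count == 0; }
    bool isFull()  const { return count == CAPACITY; }
    int  size()    const { return count; }

    void enqueue(int x) {
        if (isFull()) { cout << "Queue Overflow" << endl; return; }
        int rear = (front + count) % CAPACITY;   // next free slot (wraps around)
        data[rear] = x;
        ++count;
    }

    int dequeue() {
        if (isEmpty()) { cout << "Queue Underflow" << endl; return -1; }
        int value = data[front];
        front = (front + 1) % CAPACITY;          // advance front circularly
        --count;
        return value;
    }

    int peek() const {
        if (isEmpty()) { cout << "Queue is empty" << endl; return -1; }
        return data[front];
    }
};

int main() {
    Queue q;
    q.enqueue(1); q.enqueue(2); q.enqueue(3);
    cout << q.peek() << endl;      // 1 (first in, first out)
    cout << q.dequeue() << endl;   // 1
    cout << q.size() << endl;      // 2
    return 0;
}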


Types of Queues

Queue data structure can be classified into the following types:

  • Simple Queue: Simple Queue simply follows FIFO Structure. We can only insert the element at the back and remove the element from the front of the queue. A simple queue is efficiently implemented either using a linked list or a circular array.

  • Double-Ended Queue (Deque): In a double-ended queue, both insertion and deletion can be performed at either end. Deques are of two types:

Input Restricted Queue: In this type of deque, input (insertion) can be done at only one end, but deletion can be done from either end.

Output Restricted Queue: In this type of deque, input can be done at either end, but deletion can be done from only one end.

  • Priority Queue: A priority queue is a special queue where the elements are accessed based on the priority assigned to them. They are of two types:

Ascending Priority Queue: In Ascending Priority Queue, the elements are arranged in increasing order of their priority values. Element with smallest priority value is popped first.

Descending Priority Queue: In Descending Priority Queue, the elements are arranged in decreasing order of their priority values. Element with largest priority is popped first.


Applications of Queue

Queue, as the name suggests, is utilized when you need to regulate a group of objects in order. This data structure caters to the need for First Come First Serve problems in different software applications. The scenarios mentioned below are a few systems that use the queue data structure to serve their needs -

  • Printers: Queue data structure is used in printers to maintain the order of pages while printing.

  • Interrupt handling in computers: The interrupts are handled in the same order as they arrive, i.e., the interrupt that comes first will be dealt with first.

  • Process scheduling in Operating systems: Queues are used to implement round-robin scheduling algorithms in computer systems.

  • Switches and Routers: Both switch and router interfaces maintain ingress (inbound) and egress (outbound) queues to store packets.

  • Customer service systems: Call-center phone systems use queues to serve callers in the order in which they call.


Differences Between Stack and Queue

Here is a table that highlights the key differences between stack and queue data structures:


Definition

Stack: A linear data structure that follows the Last In First Out (LIFO) principle.

Queue: A linear data structure that follows the First In First Out (FIFO) principle.


Primary Operations

Stack: Push (add an item), Pop (remove an item), Peek (view the top item).

Queue: Enqueue (add an item), Dequeue (remove an item), Front (view the first item), Rear (view the last item).


Insertion/Removal

Stack: Elements are added and removed from the same end (the top).

Queue: Elements are added at the rear and removed from the front.


Use Cases

Stack: Function call management (call stack), expression evaluation and syntax parsing, undo mechanisms in text editors.

Queue: Scheduling processes in operating systems, managing requests in a printer queue, breadth-first search in graphs.


Examples

Stack: Browser history (back button), reversing a word.

Queue: Customer service lines, CPU task scheduling.


Real-World Analogy

Stack: A stack of plates: you add and remove plates from the top.

Queue: A queue at a ticket counter: the first person in line is the first to be served.


Complexity (Amortized)

Stack: Push: O(1), Pop: O(1), Peek: O(1).

Queue: Enqueue: O(1), Dequeue: O(1), Front: O(1), Rear: O(1).


Memory Structure

Stack: Typically uses a contiguous block of memory or a linked list.

Queue: Typically uses a circular buffer or a linked list.


Implementation

Stack: Can be implemented using arrays or linked lists.

Queue: Can be implemented using arrays, linked lists, or circular buffers.


Conclusion

Stacks and queues are fundamental data structures that serve different purposes based on their unique characteristics and operations. Stacks follow the LIFO principle and are used for backtracking, function call management, and expression evaluation. Queues follow the FIFO principle and are used for task scheduling, resource management, and breadth-first search algorithms. Understanding the differences between these two data structures helps in selecting the appropriate one for specific applications, leading to more efficient and effective programming solutions.

Linked List

A linked list is a linear data structure which stores a collection of "nodes" connected together via links, i.e. pointers. Linked list nodes are not stored at contiguous locations; rather, they are linked using pointers to different memory locations. A node consists of the data value and a pointer to the address of the next node within the linked list.

A linked list is a dynamic linear data structure whose memory can be allocated or de-allocated at run time based on insertion or deletion operations; this helps in using system memory efficiently. Linked lists can be used to implement various data structures like a stack, queue, graph, hash maps, etc.

A linked list starts with a head node which points to the first node. Every node consists of data which holds the actual data (value) associated with the node and a next pointer which holds the memory address of the next node in the linked list. The last node is called the tail node in the list which points to null indicating the end of the list.


OR,

  • A linked list in data structures is a collection of elements called nodes, where each node contains some data and a reference to the next node in the sequence. This setup allows elements to be easily added or removed without reorganizing the entire structure.

  • Linked lists are crucial in data structures and algorithms (DSA) because they provide a dynamic way to manage data, adapting the size of the structure as needed without the high overhead associated with arrays.


Structure or Representation of Linked List

Node: Each element of a linked list is called a node. A node has two components:

Data: The actual value or information that needs to be stored.

Next: A pointer (or reference) to the next node in the linked list.

Head: The first node in a linked list is called the head. It is the entry point to the list and used as a reference point to traverse it.

Null: The last node of a linked list, which points to null, indicating the end of the list.
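
As a small illustration (not part of the original notes), the node and head described above might be declared in C++ like this; the struct name Node is an illustrative choice.

// One node of a singly linked list.
struct Node {
    int data;       // the value stored in this node
    Node* next;     // pointer to the next node (nullptr marks the end of the list)
};

// The list itself is usually identified by a pointer to its first node.
Node* head = nullptr;   // an empty list: head points to null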


Types of Linked List in Data Structures

There are three primary types of linked lists:

1. Singly Linked List

A singly linked list is a collection of nodes where each node contains data and a reference to the next node in the sequence. This structure allows for a simple, one-directional traversal from the head to the end of the list.

Example:

Consider managing a simple task list. Each task points to the next one until the end:

Task1 -> Task2 -> Task3 -> null

Operations like adding or removing tasks involve adjusting the 'next' reference of the preceding node, which is straightforward but only allows moving forward through the list.


2. Doubly Linked List

Doubly linked lists expand on singly linked lists by having each node contain two references: one pointing to the next node and one to the previous node. This allows nodes to be traversed in both forward and backward directions.

Example:

Imagine a playlist where you can move to the next or previous song:

null <- PrevSong <-> CurrentSong <-> NextSong -> null

This setup is particularly useful for applications like image viewers or document viewers where users may need to go forward and backward between images or pages.

3. Circular Linked List

A circular linked list is a type of data structure where each node points to the next node, and the last node points back to the first, forming a circle. 

This setup allows for an endless cycle of node traversal, which can be particularly useful in applications where the end of the list naturally reconnects to the beginning.

Example:

In multiplayer games, a circular linked list manages player turns, cycling back to the first player after the last one finishes.


Application/Uses of Linked Lists

Data structure linked lists offer several practical uses across different applications:

1. Dynamic Memory Allocation

Linked lists are ideal for applications where the amount of data is unknown in advance and can change dynamically. They are extensively used in implementing memory management algorithms.


2. Implementing Stacks and Queues 

Linked lists provide a natural way to implement other abstract data types like stacks and queues. 

For example, a singly linked list can serve as a stack or queue by adding or removing nodes from one end.


3. Browser History Management

The functionality of back and forward navigation in web browsers can be effectively managed using a doubly linked list where each node points to the previous and next web pages visited.


4. Undo Functionality in Applications 

Linked lists are used in implementing undo mechanisms in software applications, where operations are stored in a list and can be reversed by traversing backwards.


5. Music Playlists and Image Viewers 

Applications that require sequential access to elements, such as media players or image viewers, often use doubly linked lists to facilitate easy navigation between items (next and previous).


6. Graphs Representation

Adjacency lists, which are used to represent graphs, can be implemented using linked lists, where each node represents a vertex and its linked list represents the adjacent vertices.


7. Polynomial Arithmetic

Linked lists are useful in representing and manipulating polynomials. Each node can represent a term of the polynomial, allowing for dynamic adjustments to the polynomial as terms are added or subtracted.


Linked List Operations

Understanding how to perform operations on linked lists is crucial for using them effectively in programming. 

1. Insertion

  • Insertion in a linked list includes adding a new node to the list. You can insert a node at the beginning, in the middle, or at the end of the list.

  • Adding a new node to a linked list is a multi-step activity: first, create a node using the same node structure, and then find the location where it has to be inserted.

Insertion at Beginning

In this operation, we are adding an element at the beginning of the list.

Algorithm

1. START

2. Create a node to store the data

3. Check if the list is empty

4. If the list is empty, add the data to the node and 

   assign the head pointer to it.

5. If the list is not empty, add the data to a node and link to the 

   current head. Assign the head to the newly added node.

6. END


Insertion at Ending

In this operation, we are adding an element at the ending of the list.

Algorithm

1. START

2. Create a new node and assign the data

3. Find the last node

4. Point the last node to new node

5. END


Insertion at a Given Position

In this operation, we are adding an element at any position within the list.

Algorithm

1. START

2. Create a new node and assign data to it

3. Iterate until the node just before the desired position is found

4. Point the new node to the node currently at that position, and point the previous node to the new node

5. END
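
A rough C++ sketch of the insertion-at-beginning and insertion-at-end algorithms above for a singly linked list; the Node structure and function names are ours, used only for illustration.

#include <iostream>
using namespace std;

struct Node { int data; Node* next; };

// Insert a new node at the beginning: the new node becomes the head.
void insertAtBeginning(Node*& head, int value) {
    Node* node = new Node{value, head};   // new node points to the current head
    head = node;                          // head now refers to the new node
}

// Insert a new node at the end: walk to the last node and link it.
void insertAtEnd(Node*& head, int value) {
    Node* node = new Node{value, nullptr};
    if (head == nullptr) { head = node; return; }    // empty list: new node is the head
    Node* last = head;
    while (last->next != nullptr) last = last->next; // find the last node
    last->next = node;                               // point the last node to the new node
}

int main() {
    Node* head = nullptr;
    insertAtEnd(head, 2);
    insertAtEnd(head, 3);
    insertAtBeginning(head, 1);            // list is now 1 -> 2 -> 3
    for (Node* p = head; p != nullptr; p = p->next) cout << p->data << " ";
    cout << endl;
    return 0;
}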


2. Deletion

Deletion removes a node from the linked list. This can also be done at the beginning, at a specific position, or at the end.

Deletion at Beginning

In this deletion operation of the linked list, we are deleting an element from the beginning of the list. For this, we point the head to the second node.

Algorithm

1. START

2. Assign the head pointer to the next node in the list

3. END


Deletion at Ending

In this deletion operation of the linked list, we are deleting an element from the end of the list.

Algorithm

1. START

2. Iterate until you find the second-last node in the list.

3. Assign NULL to the next pointer of the second-last node, making it the new last node.

4. END


Deletion at a Given Position

In this deletion operation of the linked list, we are deleting an element at any position of the list.

Algorithm

1. START

2. Iterate until the node at the given position is found in the list.

3. Make the previous node point to the node that comes after the current node, so the current node is bypassed.

4. END


3. Search

  • Searching in a linked list involves traversing through the list from the head to find a node containing a specific value.

  • Searching for an element in the list using a key element. This operation is done in the same way as array search; comparing every element in the list with the key element given.

Algorithm

1. START

2. If the list is not empty, iteratively check if the list contains the key

3. If the key element is not present in the list, the search is unsuccessful

4. END


4. Traversal

  • Traversal means going through each node in the list from the beginning to the end to access or modify data.

  • The traversal operation walks through all the elements of the list in an order and displays the elements in that order.

Algorithm

1. START

2. While the list is not empty and did not reach the end of the list, print the data in each node

3. END
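
A small C++ sketch of the search and traversal operations described above for a singly linked list of integers (the function names traverse and search are ours):

#include <iostream>
using namespace std;

struct Node { int data; Node* next; };

// Traversal: visit every node from the head to the end and print its data.
void traverse(Node* head) {
    for (Node* p = head; p != nullptr; p = p->next)
        cout << p->data << " ";
    cout << endl;
}

// Search: compare every node's data with the key; true if found.
bool search(Node* head, int key) {
    for (Node* p = head; p != nullptr; p = p->next)
        if (p->data == key) return true;
    return false;           // reached the end of the list: unsuccessful search
}

int main() {
    // Build the list 1 -> 2 -> 3 by hand for the example.
    Node* third  = new Node{3, nullptr};
    Node* second = new Node{2, third};
    Node* head   = new Node{1, second};
    traverse(head);                                 // prints: 1 2 3
    cout << boolalpha << search(head, 2) << endl;   // true
    cout << boolalpha << search(head, 9) << endl;   // false
    return 0;
}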


Advantages of Linked Lists

  • Dynamic Sizing: Linked lists can grow and shrink during runtime as they do not have a fixed size, unlike arrays.

  • Efficient Insertions and Deletions: Adding or removing nodes does not require the elements to be contiguous in memory, so operations can be performed without reallocating or reorganizing the entire structure.

  • No Memory Waste: Since there's no need to pre-allocate memory, linked lists can manage data more efficiently without wasting memory on unused space.

  • Ease of Implementation: Can easily be implemented and manipulated, particularly when it comes to complex data structures such as stacks, queues, and other abstract data types.

  • Flexible Structure: Nodes can easily be rearranged by changing their next pointers without the need for data movement, which is beneficial for applications that require frequent reordering of data.


Disadvantages of Linked Lists (Limitations)

  • Higher Memory Use: Each node in a linked list requires additional memory for a pointer to the next (and possibly previous) node, which is more memory-intensive than the tight data packing in arrays.

  • Slower Access Time: Direct access is not feasible with linked lists. To access an element at a specific index, you have to traverse the list from the head to that position, which takes more time than array access.

  • Complexity in Navigation: Navigating a linked list can be more complex compared to navigating an array, as it requires pointer manipulation, which can be prone to errors like pointer corruption or memory leaks.

  • Extra Memory for Pointers: The requirement for storing pointers typically means using extra memory, which can be significant in systems with a large number of elements.

Tree

  • A tree in DSA (Data Structures and Algorithms) is a way to organize data. It looks like an upside-down tree with a root at the top and branches spreading out. Each point on the tree is called a node, and the lines connecting them are called edges. The top node is the root, and nodes with no children are called leaves.

  • The basic idea of a tree is to show a hierarchy. 

  • For example, a family tree shows how family members are related. Trees help us organize and find data quickly. They are important because they make searching, sorting, and organizing data more efficient. 

  • Tree data structures are used in many applications like databases, file systems, and even in games to make decisions. Understanding trees helps us solve problems faster and manage data better.


Types of Trees in Data Structure

Let’s learn about the different types of trees in data structure:


1. Binary Tree

A binary tree is a tree data structure where each node has at most two children, referred to as the left child and the right child.

Properties:

Each node can have zero, one, or two children.

The left and right subtrees are also binary trees.


2. Binary Search Tree (BST)

A binary search tree is a binary tree where each node has a value, and the left child's value is less than the parent's value, while the right child's value is greater than the parent's value.

Properties:

Allows efficient searching, insertion, and deletion.

In-order traversal of a BST gives a sorted sequence of values.


3. AVL Tree

An AVL tree is a self-balancing binary search tree where the difference in heights between the left and right subtrees of any node is at most one.

Properties:

Ensures O(log n) time complexity for insertion, deletion, and search operations.

Automatically balances itself after insertions and deletions.


4. Red-Black Tree

A red-black tree is a self-balancing binary search tree where each node contains an extra bit for color (red or black) to ensure the tree remains balanced.

Properties:

No two consecutive red nodes.

Every path from the root to a leaf has the same number of black nodes.

Ensures O(log n) time complexity for insertion, deletion, and search operations.


5. Heap

A heap is a special tree-based data structure that satisfies the heap property: in a max-heap, each parent node is greater than or equal to its children; in a min-heap, each parent node is less than or equal to its children.


Properties:

Complete binary tree.

The root is the maximum (max-heap) or minimum (min-heap) element.


6. B-Tree

A B-tree is a self-balancing search tree in which nodes can have multiple children. It is commonly used in databases and file systems.

Properties:

Ensures balanced height.

Nodes can have more than two children.

Efficiently supports insertion, deletion, and search operations.


7. Segment Tree

A segment tree is a binary tree used for storing intervals or segments. It allows querying which segments contain a given point.

Properties:

Efficiently supports range queries and updates.

Each node represents an interval.



Tree Traversals

Tree traversal is the process of visiting all the nodes in a tree data structure in a specific order. There are four main types of tree traversals: in-order, pre-order, post-order, and level-order. 

Each traversal method visits nodes in a different sequence.


1. In-order Traversal (Left, Root, Right)

In in-order traversal, the nodes are recursively visited in this order: left subtree, root node, and then the right subtree.

Algorithm:

Traverse the left subtree by recursively calling the in-order function.

Visit the root node.

Traverse the right subtree by recursively calling the in-order function.

Example:

Consider the following binary tree: node 1 is the root, with 2 as its left child and 3 as its right child; node 2 has children 4 (left) and 5 (right).

In-order Traversal Steps:

Traverse left subtree of node 1 (root): 4, 2, 5

Visit root node: 1

Traverse right subtree of node 1: 3

In-order Traversal Result: 4, 2, 5, 1, 3


2. Pre-order Traversal (Root, Left, Right)

In pre-order traversal, the nodes are recursively visited in this order: root node, left subtree, and then the right subtree.

Algorithm:

  • Visit the root node.

  • Traverse the left subtree by recursively calling the pre-order function.

  • Traverse the right subtree by recursively calling the pre-order function.

Example:

Consider the following binary tree:

Pre-order Traversal Steps:

Visit root node: 1

Traverse left subtree of node 1: 2, 4, 5

Traverse right subtree of node 1: 3

Pre-order Traversal Result: 1, 2, 4, 5, 3


3. Post-order Traversal (Left, Right, Root)

In post-order traversal, the nodes are recursively visited in this order: left subtree, right subtree, and then the root node.


Algorithm:

  • Traverse the left subtree by recursively calling the post-order function.

  • Traverse the right subtree by recursively calling the post-order function.

  • Visit the root node.

Example:

Consider the following binary tree:

Post-order Traversal Steps:

Traverse left subtree of node 1: 4, 5, 2

Traverse right subtree of node 1: 3

Visit root node: 1

Post-order Traversal Result: 4, 5, 2, 3, 1
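
For the tree used in these examples (root 1 with children 2 and 3, and 4 and 5 as children of 2), the three recursive traversals can be sketched in C++ roughly as follows; the struct and function names are ours.

#include <iostream>
using namespace std;

struct TreeNode {
    int data;
    TreeNode* left;
    TreeNode* right;
};

void inorder(TreeNode* n)   { if (!n) return; inorder(n->left);  cout << n->data << " "; inorder(n->right);  }  // Left, Root, Right
void preorder(TreeNode* n)  { if (!n) return; cout << n->data << " "; preorder(n->left); preorder(n->right); }  // Root, Left, Right
void postorder(TreeNode* n) { if (!n) return; postorder(n->left); postorder(n->right); cout << n->data << " "; } // Left, Right, Root

int main() {
    // Build the example tree: 1 at the root, 2 and 3 as its children, 4 and 5 under 2.
    TreeNode n4{4, nullptr, nullptr}, n5{5, nullptr, nullptr};
    TreeNode n3{3, nullptr, nullptr};
    TreeNode n2{2, &n4, &n5};
    TreeNode n1{1, &n2, &n3};

    inorder(&n1);   cout << endl;   // 4 2 5 1 3
    preorder(&n1);  cout << endl;   // 1 2 4 5 3
    postorder(&n1); cout << endl;   // 4 5 2 3 1
    return 0;
}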


Operations on Tree Data Structure

Tree operations include insertion, deletion, searching, and traversal. These operations are fundamental for managing and manipulating data within tree structures:


1. Insertion

Insertion involves adding a new node to the tree. The location of the new node depends on the type of tree.

Example:

Consider inserting the value 6 into a binary tree:

Steps to Insert 6:

Start at the root node.

Find the first available position in level order.

Insert 6 as the right child of 3.

Result:


2. Deletion

Deletion involves removing a node from the tree. In a binary tree, the node to be deleted is replaced by the deepest and rightmost node to maintain the tree's structure.

Example:

Consider deleting the value 3 from the following binary tree:

Steps to Delete 3:

Find the node to be deleted (3).

Find the deepest and rightmost node (5).

Replace 3 with 5.

Remove the deepest and rightmost node.

Result:


3. Search

Searching involves finding a node with a given value in the tree. The search operation can be implemented using any traversal method.

Example:

Consider searching for the value 4 in the following binary tree:

Steps to Search for 4:

Start at the root node (1).

Check the left subtree.

Move to node 2.

Check the left subtree of node 2.

Find the node 4.

Result:  Node 4 is found.


4. Traversal

Traversal involves visiting all the nodes in the tree in a specific order. The main traversal methods are in-order, pre-order, post-order, and level-order.

Example:

Consider the following binary tree:

In-order Traversal:

Traverse left subtree of 1: 4, 2, 5

Visit root node: 1

Traverse right subtree of 1: 3

Result: 4, 2, 5, 1, 3


Pre-order Traversal:

Visit root node: 1

Traverse left subtree of 1: 2, 4, 5

Traverse right subtree of 1: 3

Result: 1, 2, 4, 5, 3


Post-order Traversal:

Traverse left subtree of 1: 4, 5, 2

Traverse right subtree of 1: 3

Visit root node: 1

Result: 4, 5, 2, 3, 1


Level-order Traversal:

Visit nodes level by level from left to right.

Result: 1, 2, 3, 4, 5


Applications of Trees

Trees are versatile data structures that play a crucial role in various applications across computer science and related fields:


1. Efficient Searching and Sorting 

Binary Search Trees (BST) are used to maintain a sorted sequence of elements and provide efficient search, insertion, and deletion operations.

Example: Database indexing systems use BSTs to quickly locate records.


2. Priority Queues

Heap trees are used to implement priority queues where the highest (or lowest) priority element is accessed first.

Example: Task scheduling systems use heaps to manage tasks by priority.


3. Arithmetic Expressions Evaluation

Expression trees represent arithmetic expressions. The leaf nodes contain operands, and the internal nodes contain operators.

Example: Parsing and evaluating mathematical expressions in compilers.


4. Data Compression

Huffman trees are used to generate optimal prefix codes for characters based on their frequencies, reducing the overall size of the data.

Example: Compression algorithms like Huffman coding use these trees to compress data efficiently.


5. Database and File Systems

B-Trees are self-balancing search trees that maintain sorted data and allow searches, sequential access, insertions, and deletions in logarithmic time.

Example: Used in databases and file systems to manage large blocks of data.


6. Efficient Retrieval of Keys

Tries are used to store a dynamic set of strings where keys are usually strings. They provide efficient search and retrieval operations.

Example: Implementing dictionaries and autocomplete features.


Properties of Tree Data Structure:

  • Number of edges: An edge can be defined as the connection between two nodes. If a tree has N nodes then it will have (N-1) edges. There is only one path from each node to any other node of the tree.

  • Depth of a node: The depth of a node is defined as the length of the path from the root to that node. Each edge adds 1 unit of length to the path. So, it can also be defined as the number of edges in the path from the root of the tree to the node.

  • Height of a node: The height of a node can be defined as the length of the longest path from the node to a leaf node of the tree.

  • Height of the Tree: The height of a tree is the length of the longest path from the root of the tree to a leaf node of the tree.

  • Degree of a Node: The total count of subtrees attached to that node is called the degree of the node. The degree of a leaf node must be 0. The degree of a tree is the maximum degree of a node among all the nodes in the tree.

Binary search tree (BST)

  • A binary search tree (BST) is a type of data structure used in computer science to store and manage data. In a BST, each node is a part of the tree that holds a value. Each node can have up to two children nodes. The node at the top is called the root.


The main rule of a BST is that for any node:

  • All the values in the left child and its descendants are smaller than the node's value.

  • All the values in the right child and its descendants are greater than the node's value.

  • This structure helps to quickly find, add, or remove values from the tree.

For example, if you want to find a value, you can start at the root and move left or right depending on whether the value is smaller or larger than the current node, making the search very efficient.


Binary Search Tree Properties

A Binary Search Tree (BST) has specific properties that make it efficient for search, insertion, and deletion operations:


1. Node Structure

Each node in a BST contains three parts:

Data: The value stored in the node.

Left Child: A pointer/reference to the left child node.

Right Child: A pointer/reference to the right child node.


2. Binary Tree

A BST in data structure is a type of binary tree, which means each node can have at most two children: left and right.


3. Ordered Nodes

Left Subtree: For any given node, all the values in its left subtree are smaller than the value of the node.

Right Subtree: For any given node, all the values in its right subtree are greater than the value of the node.

Example:

Consider the following binary search tree:

The root node is 10.

The left child of 10 is 5, and all values in the left subtree (3, 5, 7) are less than 10.

The right child of 10 is 15, and all values in the right subtree (15, 20) are greater than 10.


4. No Duplicate Values

A BST in data structure does not contain duplicate values. Each value must be unique to maintain the order property.


5. Recursive Definition

Each subtree of a BST is also a BST. This means the left and right children of any node are roots of their own Binary Search Trees.


Binary Search Tree Operations in Data Structure

Let’s learn about the different operations on binary search tree (BST):

1. Insertion

Insertion in binary search tree includes adding a new node in a way that maintains the BST property: for any node, all values in its left subtree are smaller, and all values in its right subtree are greater.

Algorithm:

  1. Start at the root node.

  2. Compare the value to be inserted with the current node's value.

  3. If the value is smaller, move to the left child.

  4. If the value is greater, move to the right child.

  5. Repeat steps 2-4 until you find an empty spot.

  6. Insert the new node at the empty spot.

Or,

Whenever an element is to be inserted, first locate its proper location. Start searching from the root node, then if the data is less than the key value, search for the empty location in the left subtree and insert the data. Otherwise, search for the empty location in the right subtree and insert the data.

Algorithm

1. START

2. If the tree is empty, insert the first element as the root node of the tree. The following elements are added as the leaf nodes.

3. If an element is less than the root value, it is added into the left subtree as a leaf node.

4. If an element is greater than the root value, it is added into the right subtree as a leaf node.

5. The final leaf nodes of the tree point to NULL values as their child nodes.

6. END


Example:

Insert the value 8 into the following BST:

Insertion in binary search tree

Step-by-Step Insertion:

Start at the root (10).

8 is less than 10, move to the left child (5).

8 is greater than 5, move to the right child (7).

8 is greater than 7, move to the right child of 7 (which is empty).

Insert 8 as the right child of 7.

Result:


2. Deletion

Binary search tree node deletion involves removing a node and reorganizing the tree to maintain the BST property. 

There are three cases to handle:

  • Node with no children (leaf node)

  • Node with one child

  • Node with two children


Algorithm:

No Child (Leaf Node): Simply remove the node.

One Child: Remove the node and replace it with its child.

Two Children: Find the in-order predecessor (largest value in the left subtree) or in-order successor (smallest value in the right subtree), replace the node's value with the predecessor/successor, and delete the predecessor/successor.

Example:

Delete the value 5 from the following BST:

Binary search tree node deletion

Step-by-Step Deletion:

  1. Start at the root (10).

  2. 5 is less than 10, move to the left child (5).

  3. Node 5 has two children. Find the in-order successor (smallest value in the right subtree of 5), which is 7.

  4. Replace 5 with 7.

  5. Delete node 7.

Result:


3. Searching

Searching in a binary search tree involves finding a node with a given value by leveraging the BST property to reduce the number of comparisons.

Algorithm:

  1. Compare the target value with the current node's value.

  2. If the value is equal, the search is successful.

  3. If the value is smaller, move to the left child.

  4. If the value is greater, move to the right child.

  5. Repeat steps 1-4 until you find the value or reach an empty spot (unsuccessful search).

Or,

Whenever an element is to be searched, start searching from the root node. Then if the data is less than the key value, search for the element in the left subtree. Otherwise, search for the element in the right subtree. Follow the same algorithm for each node.

Algorithm

1. START

2. Check whether the tree is empty or not

3. If the tree is empty, search is not possible

4. Otherwise, first search the root of the tree.

5. If the key does not match with the value in the root, search its subtrees.

6. If the value of the key is less than the root value, search the left subtree

7. If the value of the key is greater than the root value, search the right subtree.

8. If the key is not found in the tree, return unsuccessful search.

9. END

Example:

Search for the value 8 in the following BST:

Searching in a binary search tree

Step-by-Step Searching:

  1. Start at the root (10).

  2. 8 is less than 10, move to the left child (5).

  3. 8 is greater than 5, move to the right child (7).

  4. 8 is greater than 7, move to the right child (8).

  5. 8 is equal to 8, search is successful.
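
A rough recursive sketch of the BST insertion and searching algorithms above; the example builds a tree like the one used in these examples (root 10, then 5, 15, 3, 7, 20, and finally 8). The struct and function names are ours, and duplicate values are simply ignored.

#include <iostream>
using namespace std;

struct BSTNode {
    int key;
    BSTNode* left = nullptr;
    BSTNode* right = nullptr;
    BSTNode(int k) : key(k) {}
};

// Insert: smaller keys go into the left subtree, greater keys into the right subtree.
BSTNode* insert(BSTNode* root, int key) {
    if (root == nullptr) return new BSTNode(key);    // empty spot found: place the node here
    if (key < root->key)      root->left  = insert(root->left, key);
    else if (key > root->key) root->right = insert(root->right, key);
    return root;                                     // duplicates are ignored
}

// Search: move left or right depending on how the key compares to the current node.
bool search(BSTNode* root, int key) {
    if (root == nullptr)  return false;    // reached an empty spot: not found
    if (key == root->key) return true;
    if (key < root->key)  return search(root->left, key);
    return search(root->right, key);
}

int main() {
    BSTNode* root = nullptr;
    int values[] = {10, 5, 15, 3, 7, 20, 8};   // builds the example tree, then adds 8
    for (int v : values) root = insert(root, v);
    cout << boolalpha;
    cout << search(root, 8) << endl;    // true  (path 10 -> 5 -> 7 -> 8)
    cout << search(root, 12) << endl;   // false
    return 0;
}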


Applications of Binary Search Tree (BST)

Binary Search Trees have numerous applications due to their efficient searching, insertion, and deletion capabilities:


1. Databases and Indexing

BSTs are used in database indexing to quickly locate records.

By storing records in a BST, databases can perform search, insert, and delete operations efficiently. Balanced BSTs like AVL trees or Red-Black trees are particularly useful to maintain performance.


2. Dynamic Sets and Dictionaries

Binary search trees are used to implement dynamic sets and dictionaries.

BSTs provide efficient methods to insert, delete, and look up elements. This makes them suitable for implementing data structures like sets, which require these operations.


3. Symbol Tables in Compilers

Compilers use binary search tree data structure to manage symbol tables.

Symbol tables store information about variables, functions, objects, etc., in a program. Using a BST allows compilers to quickly look up symbols and their associated information.


4. Autocomplete and Spell Check

A binary search tree is used in autocomplete systems and spell checkers.


By storing a dictionary of words in a BST, these systems can quickly suggest or validate words as users type.


5. Implementing Priority Queues

Although heap data structures are generally used for priority queues, BSTs can also be used to keep elements sorted by priority. Self-balancing BSTs ensure that the highest or lowest priority element can be accessed quickly.


6. Network Routing Algorithms

Routing tables store paths and their associated costs. Using BSTs allows for efficient management and quick lookup of routing paths.


7. File Systems and Folder Hierarchies

A binary search tree data structure can organize files and directories in a hierarchical structure, making it easy to search for and manage files.


8. Data Compression (Huffman Coding)

Huffman trees, a special kind of binary tree, are used to generate optimal prefix codes for characters based on their frequencies. This helps in compressing data efficiently.


9. Game Development (AI and Decision Trees)

Decision trees, which are tree structures, are used to represent possible actions and outcomes in games. This helps in creating intelligent behavior for game characters.


10. Event-Driven Simulations

Events in a simulation are scheduled and processed in a specific order. Using a BST ensures that events are processed efficiently based on their scheduled times.


Recursion

  • The process in which a function calls itself directly or indirectly is called recursion, and the corresponding function is called a recursive function.
  • A recursive algorithm takes one step toward the solution and then calls itself recursively to make further progress. The algorithm stops once the solution is reached.

  • Since the called function may call itself again, this process could continue forever, so it is essential to provide a base case to terminate the recursion.


Example:

  • To understand recursion better, imagine you have a stack of books, and you want to count how many books are in the stack. You could take the top book off the stack, count it as one, and then ask someone to count the rest of the books in the remaining stack. 

  • They would do the same thing—take the top book, count it as one, and ask the next person to count the rest. This process would continue until there is only one book left in the stack. The final person says "one" and passes the number back up the line, with each person adding one to the count they receive. Eventually, you get the total count of books.

Need of Recursion

  • Recursion helps in logic building. Recursive thinking helps in solving complex problems by breaking them into smaller subproblems.

  • Recursive solutions work as a basis for Dynamic Programming and Divide and Conquer algorithms.

  • Certain problems can be solved quite easily using recursion like Towers of Hanoi (TOH), Inorder/Preorder/Postorder Tree Traversals, DFS of Graph, etc.


Algorithm /Steps to Implement Recursion

  1. Step1 – Define a base case: Identify the simplest (or base) case for which the solution is known or trivial. This is the stopping condition for the recursion, as it prevents the function from infinitely calling itself.

  2. Step2 – Define a recursive case: Define the problem in terms of smaller subproblems. Break the problem down into smaller versions of itself, and call the function recursively to solve each subproblem.

  3. Step3 – Ensure the recursion terminates: Make sure that the recursive function eventually reaches the base case, and does not enter an infinite loop.

  4. Step4 – Combine the solutions: Combine the solutions of the subproblems to solve the original problem.


Components of Recursive Algorithm

A recursive algorithm is made up of three main components: the base case, the recursive case, and the recurrence relation. 


1. Base Case

The base case is the condition that stops the recursion. It’s the simplest instance of the problem that can be solved directly without further recursion. The base case prevents the algorithm from calling itself infinitely and eventually allows the recursion to unwind and return a final result.

Example: If you are calculating the factorial of a number (n!), the base case is when n = 1 because the factorial of 1 is simply 1, and there’s no need for further recursion.


2. Recursive Case

The recursive case is where the function calls itself with a smaller or simpler version of the original problem. This part of the algorithm is responsible for breaking down the problem into smaller pieces that are easier to solve.

Example: In the factorial example, the recursive case involves calculating n * factorial(n-1). The function calls itself with n-1, gradually reducing the problem until it reaches the base case.


3. Recurrence Relation

The recurrence relation is the mathematical formula or expression that defines how the original problem is split into smaller subproblems. It links the original problem to its subproblems and ensures that each recursive step brings the algorithm closer to the base case.
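
A short C++ sketch of the factorial example, marking the three components; the recurrence assumed here is factorial(n) = n * factorial(n - 1) with factorial(1) = 1.

#include <iostream>
using namespace std;

// Recurrence relation: factorial(n) = n * factorial(n - 1), with factorial(1) = 1.
long long factorial(int n) {
    if (n <= 1) {
        return 1;                     // base case: stops the recursion
    }
    return n * factorial(n - 1);      // recursive case: smaller subproblem
}

int main() {
    cout << factorial(5) << endl;     // 120
    return 0;
}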

Syntax to Declare a Recursive Function

returntype recursionfunction() {

    // base case check goes here to stop the recursion

    recursionfunction();   // calling self function (recursive case)

}


Example: Fibonacci using recursion

#include <iostream>
using namespace std;

// Returns the nth Fibonacci number using the recurrence
// fibonacci(n) = fibonacci(n - 1) + fibonacci(n - 2).
int fibonacci(int n) {
    if (n == 0) {
        return 0;              // base case 1
    } else if (n == 1) {
        return 1;              // base case 2
    } else {
        return fibonacci(n - 1) + fibonacci(n - 2);   // recursive case
    }
}

int main() {
    int n, f;
    n = 12;
    f = fibonacci(n);
    cout << f << endl;         // prints 144, the 12th Fibonacci number
    return 0;
}

Advantages of Recursive Algorithms

  • Simplifies Complex Problems: Recursion can make solving complex problems easier by breaking them down into smaller, manageable subproblems.

  • Reduces Code Size: Recursive solutions often require fewer lines of code compared to iterative solutions, making the code more concise and easier to read.

  • Natural Fit for Certain Problems: Problems like tree traversals, factorial calculation, and the Tower of Hanoi are naturally suited to recursion, leading to more intuitive solutions.

  • Cleaner Code: Recursive algorithms can lead to cleaner, more elegant code, especially for problems that involve repeated operations.


Disadvantages of Recursive Algorithms

  • Higher Memory Usage: Recursion can use more memory due to the overhead of maintaining multiple function calls on the call stack, which can lead to stack overflow if the recursion is too deep.

  • Slower Performance: Recursive algorithms may be slower than their iterative counterparts due to the overhead of function calls and stack management.

  • Risk of Infinite Recursion: If the base case is not defined or not reached, recursion can lead to infinite loops and program crashes.

  • Harder to Debug: Recursive algorithms can be more difficult to debug, as tracking the sequence of function calls can be challenging, especially in complex recursion.


Applications of Recursive Algorithms

1. Sorting Algorithms

Merge Sort: Uses recursion to divide the array into smaller subarrays and then merge them in a sorted manner.

Quick Sort: Recursively partitions the array around a pivot and sorts the partitions.


2. Search Algorithms

Binary Search: Recursively divides a sorted array in half to locate a target element.
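
A minimal recursive binary search sketch in C++ (the sample array, bounds and the function name binarySearch are ours, shown only to illustrate the idea above):

#include <iostream>
using namespace std;

// Returns the index of key in the sorted range arr[low..high], or -1 if absent.
int binarySearch(const int arr[], int low, int high, int key) {
    if (low > high) return -1;               // base case: empty range, not found
    int mid = low + (high - low) / 2;        // middle index (written to avoid overflow)
    if (arr[mid] == key) return mid;
    if (key < arr[mid])
        return binarySearch(arr, low, mid - 1, key);    // search the left half
    return binarySearch(arr, mid + 1, high, key);       // search the right half
}

int main() {
    int arr[] = {2, 5, 8, 12, 16, 23, 38};
    int n = sizeof(arr) / sizeof(arr[0]);
    cout << binarySearch(arr, 0, n - 1, 23) << endl;   // 5
    cout << binarySearch(arr, 0, n - 1, 7) << endl;    // -1
    return 0;
}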


3. Mathematical Computations

Factorial Calculation: Computes the factorial of a number by recursively multiplying it by the factorial of the previous number.

Fibonacci Sequence: Generates Fibonacci numbers using recursion by summing the previous two numbers in the sequence.


4. Tree and Graph Traversals

Tree Traversals (Preorder, Inorder, Postorder): Recursively visits nodes in a tree data structure.

Depth-First Search (DFS): Recursively explores all possible paths in a graph before backtracking.


5. Dynamic Programming

Memoization: Recursive algorithms with memoization store the results of subproblems to avoid redundant calculations, such as in the computation of Fibonacci numbers or the Knapsack problem.


6. Divide and Conquer Algorithms

Strassen’s Matrix Multiplication: Recursively multiplies matrices by breaking them down into smaller submatrices.

Karatsuba Algorithm: Recursively multiplies large numbers by dividing them into smaller parts.


Properties of Recursion

  • It solves a problem by breaking it down into smaller sub-problems, each of which can be solved in the same way.

  • A recursive function must have a base case or stopping criteria to avoid infinite recursion.

  • Recursion involves calling the same function within itself, which leads to a call stack.

  • Recursive functions may be less efficient than iterative solutions in terms of memory and performance.


Divide and Conquer

  • Divide and Conquer Algorithm is a problem-solving technique used to solve problems by dividing the main problem into subproblems, solving them individually and then merging them to find the solution to the original problem.

  •  Divide and Conquer is mainly useful when we divide a problem into independent subproblems. If we have overlapping subproblems, then we use Dynamic Programming.

For Example:

Imagine you have a big jigsaw puzzle with 1,000 pieces. Instead of trying to solve the whole puzzle at once, you could divide it into smaller sections, like the corners, edges, and middle. You solve each section separately, and then you combine them to complete the entire puzzle. This approach is similar to how a divide and conquer algorithm works.

Working of Divide and Conquer Algorithm

Divide and Conquer Algorithm can be divided into three steps: Divide, Conquer and Merge.

1. Divide:

  • Break down the original problem into smaller subproblems.

  • Each subproblem should represent a part of the overall problem.

  • The goal is to divide the problem until no further division is possible.

In Merge Sort, we divide the input array in two halves. Please note that the divide step of Merge Sort is simple, but in Quick Sort, the divide step is critical. In Quick Sort, we partition the array around a pivot.


2. Conquer:

  • Solve each of the smaller subproblems individually.

  • If a subproblem is small enough (often referred to as the “base case”), we solve it directly without further recursion.

  • The goal is to find solutions for these subproblems independently.

In Merge Sort, the conquer step is to sort the two halves individually.


3. Merge:

  • Combine the sub-problems to get the final solution of the whole problem.

  • Once the smaller subproblems are solved, we recursively combine their solutions to get the solution of larger problem.

  • The goal is to formulate a solution for the original problem by merging the results from the subproblems.
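
To tie the three steps together, here is a minimal C++ sketch of Merge Sort; the array values and helper names are illustrative:

#include <iostream>
#include <vector>
using namespace std;

// Merge: combine the two sorted halves arr[l..m] and arr[m+1..r] into one sorted range.
void merge(vector<int>& arr, int l, int m, int r) {
    vector<int> tmp;
    int i = l, j = m + 1;
    while (i <= m && j <= r)
        tmp.push_back(arr[i] <= arr[j] ? arr[i++] : arr[j++]);
    while (i <= m) tmp.push_back(arr[i++]);
    while (j <= r) tmp.push_back(arr[j++]);
    for (int k = 0; k < (int)tmp.size(); k++) arr[l + k] = tmp[k];
}

void mergeSort(vector<int>& arr, int l, int r) {
    if (l >= r) return;           // base case: a single element is already sorted
    int m = l + (r - l) / 2;      // Divide: split the range into two halves
    mergeSort(arr, l, m);         // Conquer: sort the left half
    mergeSort(arr, m + 1, r);     // Conquer: sort the right half
    merge(arr, l, m, r);          // Merge: combine the two sorted halves
}

int main() {
    vector<int> a = {5, 2, 9, 1, 6};
    mergeSort(a, 0, (int)a.size() - 1);
    for (int x : a) cout << x << " ";   // prints 1 2 5 6 9
    cout << endl;
    return 0;
}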


Characteristics of Divide and Conquer Algorithm

Divide and Conquer Algorithm involves breaking down a problem into smaller, more manageable parts, solving each part individually, and then combining the solutions to solve the original problem. The characteristics of Divide and Conquer Algorithm are:

  • Dividing the Problem: The first step is to break the problem into smaller, more manageable subproblems. This division can be done recursively until the subproblems become simple enough to solve directly.

  • Independence of Subproblems: Each subproblem should be independent of the others, meaning that solving one subproblem does not depend on the solution of another. This allows for parallel processing or concurrent execution of subproblems, which can lead to efficiency gains.

  • Conquering Each Subproblem: Once divided, the subproblems are solved individually. This may involve applying the same divide and conquer approach recursively until the subproblems become simple enough to solve directly, or it may involve applying a different algorithm or technique.

  • Combining Solutions: After solving the subproblems, their solutions are combined to obtain the solution to the original problem. This combination step should be relatively efficient and straightforward, as the solutions to the subproblems should be designed to fit together seamlessly.


Applications of Divide and Conquer Algorithm

The following are some standard algorithms that follow Divide and Conquer algorithm:

  • Binary Search is an efficient algorithm for finding an element in a sorted array by repeatedly dividing the search interval in half. It works by comparing the target value with the middle element and narrowing the search to either the left or right half, depending on the comparison (a recursive sketch follows this list).

  • Quicksort is a sorting algorithm that picks a pivot element and rearranges the array elements so that all elements smaller than the picked pivot element move to the left side of the pivot, and all greater elements move to the right side. Finally, the algorithm recursively sorts the subarrays on the left and right of the pivot element.

  • Merge Sort is also a sorting algorithm. The algorithm divides the array into two halves, recursively sorts them, and finally merges the two sorted halves.

  • Closest Pair of Points: The problem is to find the closest pair of points in a set of points in the x-y plane. The problem can be solved in O(n^2) time by calculating the distances of every pair of points and comparing the distances to find the minimum. The Divide and Conquer algorithm solves the problem in O(n log n) time.

  • Karatsuba algorithm for fast multiplication multiplies two binary strings in O(n^1.59) time, where n is the length of the binary strings.
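
As referenced in the Binary Search item above, here is a minimal recursive C++ sketch of binary search as a divide and conquer algorithm; the array values are illustrative:

#include <iostream>
#include <vector>
using namespace std;

// Recursive binary search on a sorted vector.
// Returns the index of target, or -1 if it is not present.
int binarySearch(const vector<int>& arr, int low, int high, int target) {
    if (low > high) return -1;                 // base case: empty range
    int mid = low + (high - low) / 2;          // Divide: look at the middle element
    if (arr[mid] == target) return mid;        // found it
    if (arr[mid] > target)
        return binarySearch(arr, low, mid - 1, target);   // search the left half
    return binarySearch(arr, mid + 1, high, target);      // search the right half
}

int main() {
    vector<int> arr = {2, 5, 8, 12, 16, 23, 38};
    cout << binarySearch(arr, 0, (int)arr.size() - 1, 23) << endl;  // prints 5
    return 0;
}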

Advantages of Divide and Conquer Algorithms

Efficiency: Many divide and conquer algorithms, like Merge Sort and Quick Sort, are highly efficient and have better time complexity compared to simpler algorithms.


Parallelism: The independent subproblems can often be solved in parallel, making these algorithms well-suited for parallel and distributed computing environments.


Modularity: Divide and conquer algorithms break down complex problems into smaller, more manageable subproblems, making the code easier to understand, implement, and debug.


Scalability: These algorithms perform well on large datasets, which makes them scalable for real-world applications involving big data.


Reusability: The same divide and conquer approach can be applied to different problems, making it a versatile strategy in algorithm design.


Disadvantages of Divide and Conquer Algorithms

Overhead: Recursive function calls and managing the division of the problem can introduce additional overhead, which may lead to inefficiency for small or simple problems.


Complexity in Combining Solutions: In some cases, combining the solutions of subproblems can be complex and may negate the benefits of dividing the problem.


Not Always the Best Choice: For problems that are inherently simple, a divide and conquer approach may be overkill and less efficient than a straightforward algorithm.


Space Complexity: Some divide and conquer algorithms, like Merge Sort, require additional space for combining solutions, which can be a drawback in memory-constrained environments.

Dynamic Programming

Dynamic Programming is a commonly used algorithmic technique for optimizing recursive solutions when the same subproblems are called again.

  • The core idea behind DP is to store solutions to subproblems so that each is solved only once.

  • To solve DP problems, we first write a recursive solution in a way that there are overlapping subproblems in the recursion tree (the recursive function is called with the same parameters multiple times)

  • To make sure that a recursive value is computed only once (to improve time taken by algorithm), we store results of the recursive calls.

  • There are two ways to store the results: one is top-down (or memoization) and the other is bottom-up (or tabulation).


When to Use Dynamic Programming (DP)/Key Concepts of Dynamic Programming Algorithm

Dynamic programming is used for solving problems that have the following characteristics:

1. Overlapping Subproblems

  • Overlapping subproblems in dynamic programming occur when a problem can be broken down into smaller subproblems that are solved multiple times. 

  • Instead of solving the same subproblem repeatedly, dynamic programming solves it once and stores the result. This saved result can then be reused whenever the same subproblem is encountered again.

  • In other words, when the same subproblems are solved repeatedly in different parts of the problem, the problem is said to have the Overlapping Subproblems property.

  • Example: Consider the problem of computing the Fibonacci series. To compute the Fibonacci number at index n, we need to compute the Fibonacci numbers at indices n-1 and n-2. This means that the subproblem of computing the Fibonacci number at index n-2 is used twice (note that the call for n-1 will make two calls, one for n-2 and the other for n-3) in the solution to the larger problem of computing the Fibonacci number at index n.

If you draw the recursion tree for the Nth Fibonacci number, these overlapping subproblems appear as repeated subtrees.


2. Optimal Substructure

  • Optimal substructure means that the optimal solution to a problem can be built from the optimal solutions to its subproblems. In other words, solving smaller parts of the problem optimally will lead to an overall optimal solution.

  • In other words, the optimal substructure property means that we use the optimal results of subproblems to achieve the optimal result of the bigger problem.

  • For example, in the shortest path problem, if you know the shortest path from A to B and the shortest path from B to C, then the shortest path from A to C will be the combination of these two shortest paths. Dynamic programming uses this concept to build up the solution to a larger problem by solving and combining the solutions to smaller subproblems.


Approaches of Dynamic Programming (DP)

Dynamic programming can be achieved using two approaches:


1. Top-Down Approach (Memoization):

  • In the top-down approach, also known as memoization, we keep the solution recursive and add a memoization table to avoid repeated calls for the same subproblems.

  • Before making any recursive call, we first check whether the memoization table already has a solution for it.

  • After the recursive call is over, we store the solution in the memoization table.


2. Bottom-Up Approach (Tabulation):

  • In the bottom-up approach, also known as tabulation, we start with the smallest subproblems and gradually build up to the final solution.

  • We write an iterative solution (avoid recursion overhead) and build the solution in bottom-up manner.

  • We use a dp table where we first fill in the solutions for the base cases and then fill the remaining entries of the table using the recursive formula.

  • We only use recursive formula on table entries and do not make recursive calls.
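
As an illustration of the two approaches, here is a minimal C++ sketch that computes Fibonacci numbers first with memoization (top-down) and then with tabulation (bottom-up); the function names are illustrative:

#include <iostream>
#include <vector>
using namespace std;

// Top-down (memoization): keep the recursion, cache each answer the first time it is computed.
int fibMemo(int n, vector<int>& memo) {
    if (n <= 1) return n;                       // base cases
    if (memo[n] != -1) return memo[n];          // reuse a stored result
    return memo[n] = fibMemo(n - 1, memo) + fibMemo(n - 2, memo);
}

// Bottom-up (tabulation): fill a table iteratively from the base cases upward.
int fibTab(int n) {
    if (n <= 1) return n;
    vector<int> dp(n + 1);
    dp[0] = 0;
    dp[1] = 1;
    for (int i = 2; i <= n; i++)
        dp[i] = dp[i - 1] + dp[i - 2];          // recursive formula applied to table entries
    return dp[n];
}

int main() {
    int n = 12;
    vector<int> memo(n + 1, -1);
    cout << fibMemo(n, memo) << " " << fibTab(n) << endl;   // both print 144
    return 0;
}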


How will Dynamic Programming (DP) Work?

Let us now look at the recursion tree of the naive recursive Fibonacci solution, with overlapping subproblems highlighted in the same color. We can clearly see that the recursive solution is doing a lot of work again and again, which causes the time complexity to become exponential. Imagine the time taken to compute a large Fibonacci number.

  • Identify Subproblems: Divide the main problem into smaller, independent subproblems, i.e., F(n-1) and F(n-2)

  • Store Solutions: Solve each subproblem and store the solution in a table or array so that we do not have to recompute the same again.

  • Build Up Solutions: Use the stored solutions to build up the solution to the main problem. For F(n), look up F(n-1) and F(n-2) in the table and add them.

  • Avoid Recomputation: By storing solutions, DP ensures that each subproblem (for example, F(2)) is solved only once, reducing computation time.


Example of Dynamic Programming (DP)

Example 1: Consider the problem of finding the Fibonacci sequence:

Fibonacci sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, …

// C++ program to find 

// fibonacci number using recursion.

#include <bits/stdc++.h>

using namespace std;

// Function to find nth fibonacci number

int fib(int n) {

    if (n <= 1) {

        return n;

    }

    return fib(n - 1) + fib(n - 2);

}

int main() {

    int n = 5;

    cout << fib(n);

    return 0;

}

Output: 5

Note that this is the plain recursive solution; dynamic programming improves it by storing the results of the subproblems (memoization or tabulation), so that each Fibonacci value is computed only once.


Advantages of Dynamic Programming

  • Efficiency: Reduces the time complexity by avoiding redundant calculations through the reuse of already computed solutions.

  • Optimal Solutions: Ensures that the solutions to problems are optimal by solving and combining the results of subproblems.

  • Versatility: Applicable to a wide range of problems, from combinatorial optimization to sequence alignment in bioinformatics.

  • Handles Complex Problems: Can solve problems that are difficult or impossible to solve using other techniques, especially those with overlapping subproblems.

  • Improves Recursive Algorithms: Converts exponential-time recursive algorithms into polynomial-time solutions.


Disadvantages of Dynamic Programming

  • High Space Complexity: Can require significant memory to store solutions for all subproblems, especially in multi-dimensional DP problems.

  • Complexity in Formulation: Identifying subproblems, defining recurrence relations, and setting up base cases can be challenging and may require deep insight into the problem.

  • Not Always Necessary: DP might be overkill for simpler problems where a greedy approach or divide-and-conquer technique would suffice.

  • Difficult to Debug: Debugging dynamic programming solutions can be harder due to the complexity of the state transitions and table updates.

  • May Lead to Suboptimal Performance: If not carefully implemented (e.g., unnecessary recalculations or improper state representation), DP can still result in inefficient solutions.


Applications of Dynamic Programming

  • Optimization Problems: Used to solve various optimization problems like the 0/1 Knapsack, Longest Increasing Subsequence (LIS), and Maximum Subarray problems.

  • String Matching: Helps in problems like Edit Distance, Longest Common Subsequence (LCS), and String Segmentation.

  • Graph Algorithms: Applied in finding shortest paths (e.g., Bellman-Ford, Floyd-Warshall), and in network flow problems.

  • Resource Allocation: Used in scenarios where resources need to be allocated optimally, such as in production planning and scheduling.

  • Bioinformatics: Employed in DNA sequence alignment, protein structure prediction, and evolutionary tree construction.

  • Game Theory: Helps in solving combinatorial game problems where the goal is to find optimal strategies.

Greedy 

  • A greedy algorithm is a way to solve problems by making the best choice that seems right at each step, without thinking about the future. It focuses on taking the most immediate, obvious solution that looks like it will work best right now. This approach doesn’t worry about whether the decision will be perfect in the long run, just that it’s the best option at the moment.

  • Greedy is an algorithmic paradigm that builds up a solution piece by piece, always choosing the next piece that offers the most obvious and immediate benefit. Greedy algorithms are used for optimization problems. 

  • An optimization problem can be solved using Greedy if the problem has the following property: 

  • At every step, we can make a choice that looks best at the moment, and we get the optimal solution to the complete problem. 

  • Some popular Greedy Algorithms are Fractional Knapsack, Dijkstra’s algorithm, Kruskal’s algorithm, Huffman coding and Prim’s Algorithm


Components of Greedy Algorithm

1. Greedy Choice Property

  • This property means that a greedy algorithm always makes the choice that seems the best at the moment. It selects the option that offers the most immediate benefit without considering the future consequences. 

  • The idea is that by choosing the best local option at each step, the algorithm will eventually reach an optimal solution for the entire problem.

  • or, The optimal solution can be constructed by making the best local choice at each step.


2. Optimal Substructure

  • Optimal substructure refers to the idea that a problem can be broken down into smaller subproblems, and the optimal solution to the overall problem can be achieved by solving these subproblems optimally. 

  • In other words, if you solve each small part of the problem in the best possible way, you can combine these solutions to get the best solution for the whole problem. This is a key characteristic that allows greedy algorithms to work effectively.

  • OR,the optimal solution to the problem contains the optimal solutions to its subproblems.


Characteristics of Greedy Algorithm

Here are the characteristics of a greedy algorithm:

  • Greedy algorithms are simple and easy to implement.

  • They are efficient in terms of time complexity, often providing quick solutions. Greedy algorithms are typically preferred over Dynamic Programming for problems where both apply, for example the Jump Game problem and the Single-Source Shortest Path problem (Dijkstra is preferred over Bellman-Ford when there are no negative weights).

  • These algorithms do not reconsider previous choices, as they make decisions based on current information without looking ahead.


How does the Greedy Algorithm work?

A greedy algorithm solves optimization problems by making the best local choice at each step in the hope of finding the global optimum. It's like taking the best option available at each moment, hoping it will lead to the best overall outcome.

Here’s how it works:

  • Start with the initial state of the problem. This is the starting point from where you begin making choices.

  • Evaluate all possible choices you can make from the current state. Consider all the options available at that specific moment.

  • Choose the option that seems best at that moment, regardless of future consequences. This is the “greedy” part – you take the best option available now, even if it might not be the best in the long run.

  • Move to the new state based on your chosen option. This becomes your new starting point for the next iteration.

  • Repeat steps 2-4 until you reach the goal state or no further progress is possible. Keep making the best local choices until you reach the end of the problem or get stuck.


Example:

Let's say you have a set of coins with values [1, 2, 5, 10] and you need to give change for 39 using the minimum number of coins.

The greedy algorithm for making change would work as follows:

  1. Step 1: Start with the largest coin value that is less than or equal to the amount to be changed. In this case, the largest coin less than or equal to 39 is 10.

  2. Step 2: Subtract the largest coin value from the amount to be changed, and add the coin to the solution. In this case, subtracting 10 from 39 gives 29, and we add one 10-coin to the solution.

  3. Repeat steps 1 and 2 until the amount to be changed becomes 0.
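
A minimal C++ sketch of this greedy change-making procedure is shown below; the variable names are illustrative:

#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

int main() {
    vector<int> coins = {1, 2, 5, 10};   // denominations from the example
    int amount = 39;

    // Greedy choice: always take the largest coin that still fits.
    sort(coins.rbegin(), coins.rend());  // largest denomination first
    vector<int> used;
    for (int c : coins) {
        while (amount >= c) {            // keep taking coin c while it fits
            amount -= c;
            used.push_back(c);
        }
    }

    cout << "Coins used (" << used.size() << "): ";
    for (int c : used) cout << c << " ";
    cout << endl;                        // 10 10 10 5 2 2 -> 6 coins
    return 0;
}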


Applications of Greedy Algorithms

Network Routing: Greedy algorithms like Dijkstra’s algorithm are used to find the shortest path in network routing, optimizing data packet delivery across the internet.

Task Scheduling: Used in job sequencing problems to maximize profit or minimize deadlines by selecting jobs in order of priority.

Resource Allocation: Applied in scenarios like the fractional knapsack problem to allocate resources efficiently based on value-to-weight ratios.

Data Compression: Huffman coding uses a greedy approach to compress data efficiently by encoding frequently used characters with shorter codes.

Coin Change: A greedy approach is often used to provide the minimum number of coins for a given amount of money, particularly when the denominations are standard (like in most currencies).

Traveling Salesman Problem (TSP) Heuristic: Greedy heuristics are used in approximating solutions to the TSP by selecting the nearest unvisited city at each step.


Advantages of Greedy Algorithms

Simplicity: Greedy algorithms are easy to understand and implement because they follow a straightforward approach of making the best choice at each step.

Efficiency: They often have lower time complexity compared to other algorithms like dynamic programming, making them faster for certain problems.

Locally Optimal Decisions: By making locally optimal choices, greedy algorithms can quickly arrive at a solution without the need for complex backtracking.

Works Well for Certain Problems: Greedy algorithms are particularly effective for problems that exhibit the greedy choice property and optimal substructure, such as Minimum Spanning Tree and Huffman Coding.

Disadvantages of Greedy Algorithms

May Not Always Produce Optimal Solutions: Greedy algorithms focus on immediate benefits, which can sometimes lead to suboptimal solutions for problems where a global perspective is needed.

Limited Applicability: Greedy algorithms only work well for problems that satisfy the greedy choice property and optimal substructure; they may fail or give incorrect results otherwise.

No Backtracking: Once a choice is made, it cannot be undone, which can lead to incorrect results if the wrong choice is made early on.

Lack of Flexibility: Greedy algorithms are rigid in their approach, and modifying them to account for different problem constraints can be challenging.


Backtracking

Backtracking is a problem-solving algorithmic technique that involves finding a solution incrementally by trying different options and undoing them if they lead to a dead end.

It is commonly used in situations where you need to explore multiple possibilities to solve a problem, like searching for a path in a maze or solving puzzles like Sudoku. When a dead end is reached, the algorithm backtracks to the previous decision point and explores a different path until a solution is found or all possibilities have been exhausted.

Examples of Backtracking Algorithms

N-Queens Problem, Sudoku Solver, Subset Sum Problem, Knight's Tour Problem, etc.

In other words, backtracking can be defined as a general algorithmic technique that considers searching every possible combination in order to solve a computational problem.

The backtracking algorithm is a problem-solving approach that tries out all the possible solutions and chooses the best or desired ones. Generally, it is used to solve problems that have multiple solutions. The term backtracking suggests that for a given problem, if the current solution is not suitable, eliminate it and then backtrack to try other solutions.


When do we use backtracking algorithm?

A backtracking algorithm can be used for the following problems:

The problem has multiple solutions or requires finding all possible solutions.

When the given problem can be broken down into smaller subproblems that are similar to the original problem.

If the problem has some constraints or rules that must be satisfied by the solution.


Types of Backtracking Problems

Problems associated with backtracking can be categorized into 3 categories:

Decision Problems: Here, we search for a feasible solution.

Optimization Problems: For this type, we search for the best solution.

Enumeration Problems: We find the set of all possible feasible solutions to problems of this type.


How does Backtracking work?

As we know, the backtracking algorithm explores each and every possible path in order to find a valid solution. This exploration of paths can be pictured as a recursion tree:

In such a tree, "IS" represents the Initial State where the recursion call starts to find a valid solution.

C: represents the different Checkpoints (intermediate states) for the recursive calls.

TN: represents the Terminal Nodes where no further recursive calls can be made; these nodes act as the base case of the recursion, and at this state we determine whether the current solution is valid or not.

At each Checkpoint, the program makes a decision and moves to other checkpoints until it reaches a Terminal Node. After determining whether a solution is valid or not, the program reverts back to earlier checkpoints and tries to explore other paths. For example, TN1…TN5 could be terminal nodes where the solution is not acceptable, while TN6 is the state where a valid solution is found.

Backtracking is the step where we revert the changes made at a checkpoint before exploring a different path.
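
As a concrete illustration, here is a minimal C++ sketch of backtracking applied to the Subset Sum problem mentioned above; the input values and function names are illustrative. Each element is either included or excluded, and the choice is undone before the next path is explored.

#include <iostream>
#include <vector>
using namespace std;

// Backtracking for Subset Sum: try including or excluding each element,
// and undo (backtrack) the choice before exploring the next option.
void subsetSum(const vector<int>& nums, int index, int target,
               vector<int>& current, vector<vector<int>>& solutions) {
    if (target == 0) {                    // valid solution found (a "terminal node")
        solutions.push_back(current);
        return;
    }
    if (index == (int)nums.size() || target < 0)   // dead end: backtrack
        return;

    // Choice 1: include nums[index]
    current.push_back(nums[index]);
    subsetSum(nums, index + 1, target - nums[index], current, solutions);
    current.pop_back();                   // undo the choice (the backtracking step)

    // Choice 2: exclude nums[index]
    subsetSum(nums, index + 1, target, current, solutions);
}

int main() {
    vector<int> nums = {3, 34, 4, 12, 5, 2};
    vector<int> current;
    vector<vector<int>> solutions;
    subsetSum(nums, 0, 9, current, solutions);
    for (auto& s : solutions) {           // prints each subset summing to 9
        for (int x : s) cout << x << " ";
        cout << endl;
    }
    return 0;
}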


Advantages of Backtracking Algorithms

Simple and Intuitive: Easy to understand and implement for a wide range of problems.

Flexibility: Can be applied to many types of problems, especially combinatorial and constraint satisfaction problems.

Exhaustive Search: Guarantees finding a solution if one exists by exploring all possible options.

Effective Pruning: Can reduce the search space significantly by eliminating paths that cannot lead to a solution.

Handles Complex Problems: Suitable for problems with complex constraints that other algorithms may struggle with.


Disadvantages of Backtracking Algorithms

High Time Complexity: Can be very slow due to the need to explore all possible solutions, leading to exponential time complexity in many cases.

Memory Intensive: Uses a lot of memory, especially for deep recursion and large decision trees.

Inefficient for Large Problems: Not practical for large-scale problems due to the combinatorial explosion of possibilities.

Non-Optimal Solutions: While it finds a solution, it may not be the most efficient or optimal one without further optimization.

Requires Careful Pruning: Effective use of backtracking depends on good pruning techniques, which can be complex to implement correctly.


Applications of Backtracking Algorithm

Puzzle Solving: Used in solving puzzles like Sudoku, crosswords, and the N-Queens problem.

Combinatorial Optimization: Generates all permutations, combinations, and subsets of a given set.

Constraint Satisfaction Problems: Applied in problems like graph coloring, scheduling, and job assignment where constraints must be met.

Pathfinding: Solves mazes and other pathfinding problems, such as the Rat in a Maze problem.

Game Solving: Used in strategy games to explore possible moves, such as in the Knight’s Tour problem.

Decision-Making: Helps in making optimal decisions in situations where multiple choices are possible, like in the Subset Sum problem.

Difference Between Recursion and Backtracking

Recursion

Recursion does not always need backtracking

Solving problems by breaking them into smaller, similar subproblems and solving them recursively.

Controlled by function calls and call stack.

Applications of Recursion: Tree and Graph Traversal, Towers of Hanoi, Divide and Conquer Algorithms, Merge Sort, Quick Sort, and Binary Search.


Backtracking

Backtracking is usually implemented using recursion to explore and undo choices.

Solving problems with multiple choices and exploring options systematically, backtracking when needed.

Managed through recursion together with explicit state: choices are recorded and then undone (backtracked) when they lead to a dead end.

Application of Backtracking: N Queen problem, Rat in a Maze problem, Knight’s Tour Problem, Sudoku solver, and Graph coloring problems.

Depth-first Search and Breadth-first Search

  • Depth-First Search (DFS) is a popular algorithm used for traversing or searching through graph or tree data structures. It starts at a source node (or vertex) and explores as far along each branch or path as possible before backtracking. 

  • This means it dives deep into the graph before moving to other branches, hence the name "depth-first." DFS can be implemented using either recursion or a stack.


working of DFS

Approach: DFS explores as deep as possible along a branch before backtracking. It uses a stack (or recursion) to keep track of nodes and backtracks when it reaches a dead end.


How it works: Starting from the source node, DFS goes deeper into the graph, visiting one branch of the graph first before backtracking to explore others.

Traversal Order: 

Consider an example graph where node 0 is connected to nodes 1, 2 and 3; node 1 to nodes 4 and 5; node 2 to node 6; and node 3 to node 7. Starting from node 0:

Visit node 0 → Visit node 1 → Visit node 4 (deepest) → Backtrack to node 1 → Visit node 5 → Backtrack to node 0 → Visit node 2 → Visit node 6 → Backtrack to node 2 → Backtrack to node 0 → Visit node 3 → Visit node 7.

Output: 0, 1, 4, 5, 2, 6, 3, 7
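
Below is a minimal recursive DFS sketch in C++; the adjacency list is an assumed encoding of the example graph described above:

#include <iostream>
#include <vector>
using namespace std;

// Recursive DFS on an adjacency-list graph: go as deep as possible, then backtrack.
void dfs(int node, const vector<vector<int>>& adj, vector<bool>& visited) {
    visited[node] = true;
    cout << node << " ";
    for (int next : adj[node])
        if (!visited[next])
            dfs(next, adj, visited);
}

int main() {
    // Assumed example graph: 0-1, 0-2, 0-3, 1-4, 1-5, 2-6, 3-7
    vector<vector<int>> adj = {
        {1, 2, 3}, {0, 4, 5}, {0, 6}, {0, 7},
        {1}, {1}, {2}, {3}
    };
    vector<bool> visited(adj.size(), false);
    dfs(0, adj, visited);     // prints 0 1 4 5 2 6 3 7
    cout << endl;
    return 0;
}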


BFS (Breadth-First Search)

BFS (Breadth-First Search) is a graph traversal algorithm that explores nodes level by level, starting from a given source node. It visits all neighboring nodes at the current depth (or level) before moving on to nodes at the next depth level. BFS is commonly used for traversing or searching tree or graph data structures.


working of BFS

Approach: BFS explores the graph level by level, starting from the source node. It visits all the nodes at one level before moving to the next. BFS uses a queue to keep track of nodes that need to be explored.

How it works: From the source node, BFS moves to all its direct neighbors (Layer 1), then moves to the next layer (Layer 2), continuing until all nodes are visited.

Traversal Order: 

Starting from node 0 of the same example graph:

Layer 0: Visit node 0 (source node).

Layer 1: Visit nodes 1, 2, 3 (neighbors of node 0).

Layer 2: Visit nodes 4, 5, 6, 7 (neighbors of nodes 1, 2, 3).

Output: 0, 1, 2, 3, 4, 5, 6, 7.
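
Below is a minimal queue-based BFS sketch in C++ for the same assumed example graph:

#include <iostream>
#include <vector>
#include <queue>
using namespace std;

// BFS on an adjacency-list graph: visit all nodes at the current level before the next.
void bfs(int source, const vector<vector<int>>& adj) {
    vector<bool> visited(adj.size(), false);
    queue<int> q;
    visited[source] = true;
    q.push(source);
    while (!q.empty()) {
        int node = q.front();
        q.pop();
        cout << node << " ";
        for (int next : adj[node])
            if (!visited[next]) {
                visited[next] = true;   // mark when enqueued to avoid duplicates
                q.push(next);
            }
    }
}

int main() {
    // Same assumed example graph as in the DFS sketch above.
    vector<vector<int>> adj = {
        {1, 2, 3}, {0, 4, 5}, {0, 6}, {0, 7},
        {1}, {1}, {2}, {3}
    };
    bfs(0, adj);              // prints 0 1 2 3 4 5 6 7
    cout << endl;
    return 0;
}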


Time Complexity

Both have the same time complexity, O(V + E), where V is the number of vertices and E is the number of edges.


Difference between BFS and DFS

  • BFS stands for Breadth First Search.

  • BFS(Breadth First Search) uses Queue data structure for finding the shortest path.

  • BFS is a traversal approach in which we first walk through all nodes on the same level before moving on to the next level. 

  • BFS builds the tree level by level.

  • It works on the concept of FIFO (First In First Out).

  • BFS is more suitable for searching vertices closer to the given source.

  • BFS is used in various applications such as bipartite graphs, shortest paths, etc. If the weight of every edge is the same, then BFS gives the shortest path from the source to every other vertex.


DFS stands for Depth First Search.

  • DFS(Depth First Search) uses Stack data structure.

  • DFS is also a traversal approach in which the traversal begins at the root node and proceeds through the nodes as far as possible until we reach a node with no unvisited adjacent nodes.

  • DFS builds the tree sub-tree by sub-tree.

  • It works on the concept of LIFO (Last In First Out).

  • DFS is more suitable when the solutions lie far away from the source.

  • DFS is used in various applications such as acyclic graphs and finding strongly connected components etc. There are many applications where both BFS and DFS can be used like Topological Sorting, Cycle Detection, etc.

 Shortest Path Problem

https://www.hackerearth.com/practice/algorithms/graphs/shortest-path-algorithms/tutorial/

Shortest path algorithms are the ones that focus on calculating the minimum travelling cost from a source node to a destination node of a graph in optimal time and space.


Types of Shortest Path Algorithms:

As we know, there are various types of graphs (weighted, unweighted, negative, cyclic, etc.), so having a single algorithm that handles all of them efficiently is not possible. In order to tackle different problems, we have different shortest-path algorithms, which can be categorised into two categories: single source shortest path algorithms and all-pairs shortest path algorithms (such as Floyd-Warshall, discussed below).


Single Source Shortest Path Algorithms:

In these algorithms, we determine the shortest path to all the nodes in the graph with respect to a single node, i.e. we have only one source node.


Depth-First Search (DFS)

Breadth-First Search (BFS)

Multi-Source BFS

Dijkstra's algorithm

Bellman-Ford algorithm

Topological Sort

A* search algorithm


Let us discuss these algorithms one by one.


1. Shortest Path Algorithm using Depth-First Search(DFS):

The DFS algorithm recursively explores the adjacent nodes until it reaches a depth where no more valid recursive calls are possible.

For DFS to be used as a shortest path algorithm, the graph needs to be acyclic, i.e. a tree. It won't work for cyclic graphs because, due to cycles, the destination node can have multiple paths from the source node, and DFS will not be able to choose the best path.

If there does not exist a path between source node and destination node then we can store -1 as the shortest path between those nodes.


Algorithm:

Distance of source node to source node is initialized to 0.

Start the DFS from the source node.

As we go to a neighbouring node, we set its distance from the source node = edge weight + distance of the parent node.

DFS ends when all the leaf nodes are visited.

Complexity Analysis:


Time Complexity: O(N) where N is the number of nodes in the tree.

Auxiliary Space: O(N) for the call stack in the case of a skewed tree.


2. Breadth-First Search (BFS) for Shortest Path Algorithm:

BFS is a great shortest path algorithm for unweighted graphs: the path found by breadth-first search to any node is the shortest path to that node, i.e. the path that contains the smallest number of edges.

Reason: This works because, unlike DFS, BFS visits all the neighbouring nodes of the current node before moving deeper. As a result, all nodes at distance d from the source are visited only after all the nodes at a distance smaller than d. In simple terms, all the nodes at distance 1 from the source are visited first, then the distance-2 nodes, then distance 3, and so on.


Algorithm:

Use a queue to store a pair of values {node, distance}

Distance of source node to source node is initialized to 0 i.e. push {src, 0} to the queue

While queue is not empty do the following:

Get the {node, distance} value of top of the queue

Pop the top element of the queue

Store the distance for the node as the shortest distance for that node from the source

For all non visited neighbours of the current node, push {neighbour, distance+1} into the queue and mark it visited


Complexity Analysis:

Time Complexity: O(V + E), where V is the number of nodes and E is the number of edges in the graph.

Auxiliary Space: O(V) for storing the distance of each node and the queue.


3. Multi-Source BFS for Shortest Path Algorithm:

Like BFS, it is also applicable to unweighted graphs. To perform a multi-source search, we start BFS from multiple source nodes at the same time; the traversal then gives, for every node, the shortest distance from its nearest source.


Algorithm:

Push all source nodes in a Queue with distance 0.

While the queue is not empty:

Get the {node, distance} value of top of the queue

Pop the top element of the queue

Store the distance for the node as the shortest distance for that node from the source

For all non visited neighbours of the current node, push {neighbour, distance+1} into the queue and mark it visited


Complexity Analysis:

Time Complexity: O(V + E), where V is the number of nodes and E is the number of edges.

Auxiliary Space: O(V) for the queue and the stored distances.


4. Dijkstra's Algorithm for Shortest Path Algorithm:

Dijkstra's algorithm finds the shortest path from a source node to all other nodes in a weighted graph by iteratively selecting the node with the smallest tentative distance and updating the distances to its neighbours. It ensures the shortest path is progressively discovered and is based on the principle of greedy optimization.


Algorithm:

Create a Set sptSet (shortest path tree set) that keeps track of vertices included in the shortest path tree, i.e., whose minimum distance from the source is calculated and finalized. Initially, this set is empty.

Assign a distance value to all vertices in the input graph. Initialize all distance values as INFINITE. Assign the distance value as 0 for the source vertex so that it is picked first.

While sptSet doesn't include all vertices:

Pick a vertex 'u' that is not in sptSet and has the minimum distance value.

Include 'u' to sptSet.

Then update the distance value of all adjacent vertices of u.

To update the distance values, iterate through all adjacent vertices.

For every adjacent vertex v, if the sum of the distance value of u (from source) and weight of edge u-v, is less than the distance value of v, then update the distance value of v.
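
A minimal C++ sketch of Dijkstra's algorithm using a min-priority queue is shown below; the example graph and its weights are illustrative:

#include <iostream>
#include <vector>
#include <queue>
using namespace std;

const int INF = 1e9;

// Dijkstra: repeatedly pick the unfinished vertex with the smallest tentative
// distance and relax its outgoing edges.
vector<int> dijkstra(int src, const vector<vector<pair<int,int>>>& adj) {
    int V = adj.size();
    vector<int> dist(V, INF);
    priority_queue<pair<int,int>, vector<pair<int,int>>, greater<>> pq;  // {dist, node}
    dist[src] = 0;
    pq.push({0, src});
    while (!pq.empty()) {
        auto [d, u] = pq.top();
        pq.pop();
        if (d > dist[u]) continue;             // stale entry, skip
        for (auto [v, w] : adj[u])             // relax edge u -> v of weight w
            if (dist[u] + w < dist[v]) {
                dist[v] = dist[u] + w;
                pq.push({dist[v], v});
            }
    }
    return dist;
}

int main() {
    // Assumed small weighted graph: adj[u] holds pairs {v, weight}.
    vector<vector<pair<int,int>>> adj(4);
    adj[0] = {{1, 4}, {2, 1}};
    adj[2] = {{1, 2}, {3, 5}};
    adj[1] = {{3, 1}};
    vector<int> dist = dijkstra(0, adj);
    for (int d : dist) cout << d << " ";       // prints 0 3 1 4
    cout << endl;
    return 0;
}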


Complexity Analysis:

Time Complexity: O(E*log(V)), where E is the number of edges and V is the number of vertices in the graph.

Auxiliary Space: O(V)


5. Bellman-Ford algorithm for Shortest Path Algorithm:

Bellman-Ford is a single source shortest path algorithm that determines the shortest path between a given source vertex and every other vertex in a graph. This algorithm can be used on both weighted and unweighted graphs.

A Bellman-Ford algorithm is also guaranteed to find the shortest path in a graph, similar to Dijkstra’s algorithm. Although Bellman-Ford is slower than Dijkstra’s algorithm, it is capable of handling graphs with negative edge weights, which makes it more versatile. The shortest path cannot be found if there exists a negative cycle in the graph. If we continue to go around the negative cycle an infinite number of times, then the cost of the path will continue to decrease (even though the length of the path is increasing). As a result, Bellman-Ford is also capable of detecting negative cycles, which is an important feature.


Algorithm:

Initialize distance array dist[] for each vertex ‘v‘ as dist[v] = INFINITY.

Assume any vertex (let's say '0') as the source and assign dist[source] = 0.

Relax all the edges (u, v, weight) V-1 times as per the below condition:

dist[v] = minimum(dist[v], dist[u] + weight)
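
A minimal C++ sketch of Bellman-Ford, including the extra pass used to detect a negative cycle, is shown below; the edge list is illustrative:

#include <iostream>
#include <vector>
using namespace std;

const int INF = 1e9;

struct Edge { int u, v, w; };

int main() {
    int V = 4;
    // Assumed example edge list (u, v, weight); a negative edge is allowed.
    vector<Edge> edges = {{0, 1, 4}, {0, 2, 5}, {1, 2, -3}, {2, 3, 4}};

    vector<int> dist(V, INF);
    dist[0] = 0;                       // vertex 0 is the source

    // Relax every edge V-1 times.
    for (int i = 0; i < V - 1; i++)
        for (auto& e : edges)
            if (dist[e.u] != INF && dist[e.u] + e.w < dist[e.v])
                dist[e.v] = dist[e.u] + e.w;

    // One extra pass: if anything still relaxes, there is a negative cycle.
    bool negativeCycle = false;
    for (auto& e : edges)
        if (dist[e.u] != INF && dist[e.u] + e.w < dist[e.v])
            negativeCycle = true;

    if (negativeCycle) {
        cout << "Negative cycle detected" << endl;
    } else {
        for (int d : dist) cout << d << " ";   // prints 0 4 1 5
        cout << endl;
    }
    return 0;
}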


Complexity Analysis:

Time Complexity: O(V*E), where V is the number of vertices and E is the number of edges.

Auxiliary Space: O(V)


6. Topological Sort:

Topological sorting can be used as a single source shortest path algorithm for Directed Acyclic Graphs (DAGs). This works because a DAG guarantees that there are no cycles (and hence no negative cycles).

Working: In DAGs, the nodes with indegree = 0 act as source nodes because they cannot be reached from any other node. These source nodes are put into a queue, and the nodes are then processed in topological order, relaxing the edges to their adjacent nodes to calculate the shortest path for the nodes one by one.


Complexity Analysis:

Time Complexity: O(V+E).

Auxiliary space: O(V).


7. Floyd-Warshall algorithm for Shortest Path Algorithm:

The Floyd-Warshall Algorithm solves the all-pairs shortest path problem: finding the shortest distances between every pair of vertices in a given edge-weighted directed graph.

It is an algorithm for finding the shortest path between all the pairs of vertices in a weighted graph. This algorithm follows the dynamic programming approach to find the shortest path.


Algorithm:

Initialize the solution matrix same as the input graph matrix as a first step.

Then update the solution matrix by considering all vertices as an intermediate vertex.

The idea is to pick all vertices one by one and update all shortest paths that include the picked vertex as an intermediate vertex in the shortest path.

When we pick vertex number k as an intermediate vertex, we already have considered vertices {0, 1, 2, .. k-1} as intermediate vertices.

For every pair (i, j) of the source and destination vertices respectively, there are two possible cases.

k is not an intermediate vertex in shortest path from i to j. We keep the value of dist[i][j] as it is.

k is an intermediate vertex in shortest path from i to j. We update the value of dist[i][j] as dist[i][k] + dist[k][j] if dist[i][j] > dist[i][k] + dist[k][j]
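
A minimal C++ sketch of Floyd-Warshall over an adjacency matrix is shown below; the matrix values are illustrative and INF marks a missing edge:

#include <iostream>
#include <vector>
using namespace std;

const int INF = 1e9;

int main() {
    // Assumed example graph as an adjacency matrix (INF means no direct edge).
    vector<vector<int>> dist = {
        {0,   5,   INF, 10},
        {INF, 0,   3,   INF},
        {INF, INF, 0,   1},
        {INF, INF, INF, 0}
    };
    int V = dist.size();

    // Try every vertex k as an intermediate vertex for every pair (i, j).
    for (int k = 0; k < V; k++)
        for (int i = 0; i < V; i++)
            for (int j = 0; j < V; j++)
                if (dist[i][k] != INF && dist[k][j] != INF &&
                    dist[i][k] + dist[k][j] < dist[i][j])
                    dist[i][j] = dist[i][k] + dist[k][j];

    for (int i = 0; i < V; i++) {          // shortest distance between every pair
        for (int j = 0; j < V; j++)
            cout << (dist[i][j] == INF ? -1 : dist[i][j]) << " ";
        cout << endl;
    }
    return 0;
}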

Complexity Analysis:

Time Complexity: O(V^3), where V is the number of vertices.

Auxiliary Space: O(V^2)


8. A* Search Algorithm for Shortest Path Algorithm:

A* Search algorithm is one of the best and popular technique used in path-finding and graph traversals.

Informally speaking, the A* Search algorithm, unlike other traversal techniques, has "brains": it uses a heuristic estimate of the remaining distance to the goal to guide the search, which is what separates it from the other conventional algorithms.

It is also worth mentioning that many games and web-based maps use this algorithm to find the shortest path very efficiently (approximation). 


Complexity Analysis:

Time Complexity: O(E*logV), where E is the number of edges and V is the number of Vertices in the graph.

Auxiliary Space: O(V)

Minimum Spanning Tree

  • A minimum spanning tree (MST) is defined as a spanning tree that has the minimum weight among all the possible spanning trees.

  • The minimum spanning tree has all the properties of a spanning tree with an added constraint of having the minimum possible weights among all possible spanning trees. Like a spanning tree, there can also be many possible MSTs for a graph.


Algorithms to find Minimum Spanning Tree

There are several algorithms to find the minimum spanning tree of a given graph; some of them are listed below:


Kruskal’s Minimum Spanning Tree Algorithm

Kruskal's Algorithm is a greedy algorithm used to find the Minimum Spanning Tree (MST) of a connected, weighted graph by selecting edges with the least weight and ensuring no cycles are formed.

The algorithm sorts all edges by weight and adds them one by one to the MST, skipping those that would form a cycle, until all vertices are included in the MST.

This is one of the popular algorithms for finding the minimum spanning tree from a connected, undirected graph. This is a greedy algorithm. 

The algorithm workflow is as below:

First, it sorts all the edges of the graph by their weights, 

Then starts the iterations of finding the spanning tree. 

At each iteration, the algorithm adds the next lowest-weight edge, such that the edges picked so far do not form a cycle.


Prim’s Minimum Spanning Tree Algorithm:

Prim's Algorithm is a greedy algorithm that finds the Minimum Spanning Tree (MST) by building the tree one vertex at a time, always choosing the minimum weight edge that connects a vertex in the tree to a vertex outside the tree.

The algorithm starts with a single vertex and repeatedly adds the smallest edge that expands the growing spanning tree until all vertices are included.

This is also a greedy algorithm. 

This algorithm has the following workflow:

It starts by selecting an arbitrary vertex and then adding it to the MST. 

Then, it repeatedly checks for the minimum edge weight that connects one vertex of MST to another vertex that is not yet in the MST. 

This process is continued until all the vertices are included in the MST. 
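
A minimal C++ sketch of Prim's algorithm using a min-priority queue is shown below; the example graph and its weights are illustrative:

#include <iostream>
#include <vector>
#include <queue>
using namespace std;

// Prim's algorithm: grow the MST one vertex at a time, always taking the cheapest
// edge that connects the tree to a vertex that is not yet in the tree.
int primMST(const vector<vector<pair<int,int>>>& adj) {
    int V = adj.size();
    vector<bool> inMST(V, false);
    priority_queue<pair<int,int>, vector<pair<int,int>>, greater<>> pq;  // {weight, vertex}
    pq.push({0, 0});                      // start from an arbitrary vertex (0)
    int totalWeight = 0;
    while (!pq.empty()) {
        auto [w, u] = pq.top();
        pq.pop();
        if (inMST[u]) continue;           // vertex already inside the tree
        inMST[u] = true;
        totalWeight += w;
        for (auto [v, wt] : adj[u])
            if (!inMST[v]) pq.push({wt, v});
    }
    return totalWeight;
}

int main() {
    // Assumed small undirected weighted graph: adj[u] holds {v, weight}.
    vector<vector<pair<int,int>>> adj(4);
    auto addEdge = [&](int u, int v, int w) {
        adj[u].push_back({v, w});
        adj[v].push_back({u, w});
    };
    addEdge(0, 1, 1); addEdge(1, 2, 2); addEdge(0, 2, 4); addEdge(2, 3, 3);
    cout << primMST(adj) << endl;   // MST weight: 1 + 2 + 3 = 6
    return 0;
}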


Applications of Minimum Spanning Trees:

  • Network design: Spanning trees can be used in network design to find the minimum number of connections required to connect all nodes. Minimum spanning trees, in particular, can help minimize the cost of the connections by selecting the cheapest edges.

  • Image processing: Spanning trees can be used in image processing to identify regions of similar intensity or color, which can be useful for segmentation and classification tasks.

  • Biology: Spanning trees and minimum spanning trees can be used in biology to construct phylogenetic trees to represent evolutionary relationships among species or genes.

  • Social network analysis: Spanning trees and minimum spanning trees can be used in social network analysis to identify important connections and relationships among individuals or groups.

Directed Acyclic Graph (DAG)

https://www.geeksforgeeks.org/introduction-to-directed-acyclic-graph/

Directed Acyclic Graph (DAG) is a directed graph that does not contain any cycles.



Features of a Directed Acyclic Graph (DAG)

A Directed Acyclic Graph has two important features:

  • Directed Edges: In Directed Acyclic Graph, each edge has a direction, meaning it goes from one vertex (node) to another. This direction signifies a one-way relationship or dependency between nodes.

  • Acyclic: The term "acyclic" indicates that there are no cycles or closed loops within the graph. In other words, you cannot traverse a sequence of directed edges and return to the same node, following the edge directions. Formation of cycles is prohibited in DAG. Hence this characteristic is essential.


Properties of a Directed Acyclic Graph (DAG):

Directed Acyclic Graphs (DAGs) have several properties that make them useful in graph problems.

The following are properties of a Directed Acyclic Graph (DAG):

  • Reachability Relation: In DAG, we can determine if there is a reachability relation between two nodes. Node A is said to be reachable from node B if there exists a directed path that starts at node B and ends at node A. This implies that you can follow the direction of edges in the graph to get from B to A.

  • Transitive Closure: The transitive closure of a directed graph is a new graph that represents all the direct and indirect relationships or connections between nodes in the original graph. In other words, it tells you which nodes can be reached from other nodes by following one or more directed edges.

  • Transitive Reduction: The transitive reduction of a directed graph is a new graph that retains only the essential, direct relationships between nodes, while removing any unnecessary indirect edges. In essence, it simplifies the graph by eliminating edges that can be inferred from the remaining edges.

  • Topological Ordering: A DAG can be topologically sorted, which means you can linearly order its nodes in such a way that, for every edge, the start node of the edge occurs earlier in the sequence than its end node. This property is useful for tasks like scheduling and dependency resolution.


Practical Applications of DAG:

  • Data flow Analysis: In compiler design and optimization, DAGs are used to represent data flow within a program. This aids in optimizing code by identifying redundant calculations and dead code. DAGs are also used to represent the structure of basic blocks in Compiler Design.

  • Task Scheduling: DAGs are used in project management and job scheduling. Each task or job is represented as a node in the DAG, with directed edges indicating dependencies. The acyclic nature of the DAG ensures tasks are scheduled in a logical order, preventing circular dependencies.

Complexity Analysis of Algorithms

Complexity analysis is defined as a technique to characterise the time taken by an algorithm with respect to input size (independent from the machine, language and compiler). It is used for evaluating the variations of execution time on different algorithms.


What is the need for Complexity Analysis?

Complexity Analysis determines the amount of time and space resources required to execute an algorithm.

It is used for comparing different algorithms on different input sizes.

Complexity helps to determine the difficulty of a problem, which is often measured by how much time and space (memory) it takes to solve it.


Asymptotic Notations in Complexity Analysis:

1. Big O Notation

Big-O notation represents the upper bound of the running time of an algorithm. Therefore, it gives the worst-case complexity of an algorithm. Using Big-O notation, we can asymptotically bound the growth of the running time from above by a constant multiple of g(n). It is a model for quantifying algorithm performance.

Mathematical Representation of Big-O Notation:

O(g(n)) = { f(n): there exist positive constants c and n0 such that 0 ≤ f(n) ≤ cg(n) for all n ≥ n0 }


2. Omega Notation

Omega notation represents the lower bound of the running time of an algorithm. Thus, it provides the best-case complexity of an algorithm.

The execution time serves as a lower bound on the algorithm’s time complexity. It is defined as the condition that allows an algorithm to complete statement execution in the shortest amount of time.


Mathematical Representation of Omega notation :

Ω(g(n)) = { f(n): there exist positive constants c and n0 such that 0 ≤ cg(n) ≤ f(n) for all n ≥ n0 }

Note: Ω (g) is a set


3. Theta Notation

Theta notation encloses the function from above and below. Since it represents both the upper and the lower bound of the running time of an algorithm, it is used for analyzing the average-case complexity of an algorithm. The execution time serves as both a lower and an upper bound on the algorithm's time complexity; that is, it gives a tight bound for a given input size.

Mathematical Representation:

Θ (g(n)) = {f(n): there exist positive constants c1, c2 and n0 such that 0 ≤ c1 * g(n) ≤ f(n) ≤ c2 * g(n) for all n ≥ n0}


Parameters to Measure Complexities

Time Complexity

Time complexity is the amount of time taken by an algorithm to execute each statement of the code till its completion. The time considered here is a function of the length of the input, not the actual execution time on the machine on which the algorithm is running. The time complexity of algorithms is commonly expressed using Big O notation.

To calculate the time complexity, total the cost of each fundamental instruction and the number of times the instruction is executed.


Space Complexity

Space complexity is the amount of memory space an algorithm or a problem takes during the execution of that particular problem/algorithm. The space used here includes both the variables and the input values. Space complexity is the sum of the auxiliary space and the space used by the input values. Auxiliary space is the temporary space required by an algorithm/problem during its execution.


How does Complexity affect any algorithm?

Time complexity of an algorithm quantifies the amount of time taken by an algorithm to run as a function of length of the input. While, the space complexity of an algorithm quantifies the amount of space or memory taken by an algorithm to run  as a function of the length of the input.


How to optimize the time and space complexity of an Algorithm?

Optimization means modifying the brute-force approach to a problem. It is done to derive the best possible solution to solve the problem so that it will take less time and space complexity. We can optimize a program by either limiting the search space at each step or occupying less search space from the start.

We can optimize a solution using both time and space optimization. To optimize a program,

We can reduce the time taken to run the program and increase the space occupied;

we can reduce the memory usage of the program and increase its total run time, or

we can reduce both time and space complexity by deploying relevant algorithms

Worst, Best and Average Case Analysis

1. Worst Case Analysis (Mostly used) 

In the worst-case analysis, we calculate the upper bound on the running time of an algorithm. We must know the case that causes a maximum number of operations to be executed.

For Linear Search, the worst case happens when the element to be searched (x) is not present in the array. When x is not present, the search() function compares it with all the elements of arr[] one by one.

This is the most commonly used analysis of algorithms (We will be discussing below why). Most of the time we consider the case that causes maximum operations.

2. Best Case Analysis (Very Rarely used) 

In the best-case analysis, we calculate the lower bound on the running time of an algorithm. We must know the case that causes a minimum number of operations to be executed.

For linear search, the best case occurs when x is present at the first location. The number of operations in the best case is constant (not dependent on n). So the order of growth of time taken in terms of input size is constant.

3. Average Case Analysis (Rarely used) 

In average case analysis, we take all possible inputs and calculate the computing time for all of the inputs. Sum all the calculated values and divide the sum by the total number of inputs.

We must know (or predict) the distribution of cases. For the linear search problem, let us assume that all cases are uniformly distributed (including the case of x not being present in the array). So we sum all the cases and divide the sum by (n+1). We take (n+1) to consider the case when the element is not present.
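
To make the three cases concrete, here is a minimal linear search sketch in C++; the array contents are illustrative:

#include <iostream>
#include <vector>
using namespace std;

// Linear search: compares x with each element one by one.
// Best case:  x is at index 0            -> 1 comparison (constant).
// Worst case: x is not present           -> n comparisons (linear).
// Average case (uniform distribution)    -> about n/2 comparisons, still O(n).
int search(const vector<int>& arr, int x) {
    for (int i = 0; i < (int)arr.size(); i++)
        if (arr[i] == x) return i;
    return -1;                    // not found: the worst case
}

int main() {
    vector<int> arr = {7, 3, 9, 1, 5};
    cout << search(arr, 7) << endl;   // best case: prints 0
    cout << search(arr, 4) << endl;   // worst case: prints -1
    return 0;
}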


Why is Worst Case Analysis Mostly Used?

Average Case: The average case analysis is not easy to do in most practical cases and it is rarely done. In the average case analysis, we need to consider every input, its frequency and the time taken by it, which may not be possible in many scenarios.

Best Case: The best case analysis is considered bogus. Guaranteeing a lower bound on an algorithm doesn't provide any information, as in the worst case an algorithm may take years to run.

Worst Case: This is easier than average case analysis and gives an upper bound, which is useful information for analyzing software products.

Time and Space Analysis of Algorithms 

Time Complexity:

  • Time complexity measures the amount of time an algorithm takes to complete as a function of the input size. 

  • Time complexity measures the time taken by every statement of the algorithm. Hence, it highly depends on the size of processed data. To estimate the time complexity, we need to consider the cost of each fundamental instruction and the number of times the instruction is executed.

  • We often express time complexity using Big O notation, which describes the upper bound of the growth rate.

or,

  • Time complexity is a metric used to describe how the execution time of an algorithm changes relative to the size of the input data. It provides a way to estimate the number of steps an algorithm will take to complete its task as the amount of data increases. 

  • This is crucial in data structures and algorithms (DSA) because it helps predict how algorithms will perform as they handle larger datasets, ensuring that applications remain efficient and responsive under various conditions.

  • Thus, time complexity helps developers choose the most appropriate data structures and algorithms, ensuring optimal performance for software applications.


Different Types of Time Complexities

The following are the primary types of time complexities that commonly appear in algorithm analysis:


1. Constant Time: O(1)

An algorithm is said to have a constant time complexity when the time to complete does not depend on the size of the input data. This is the ideal scenario for any operation. 

Examples: 

Accessing any element in an array by index.


2. Logarithmic Time: O(log n)

Logarithmic time complexity occurs when the algorithm reduces the problem size by a factor with each step. Algorithms that operate this way become faster than linear time as the size of the input data increases. 

Examples:

Binary search is a classic example of logarithmic time complexity, where the dataset is halved with each iteration.


3. Linear Time: O(n)

An algorithm is said to have linear time complexity when the time to complete is directly proportional to the input size. This means if you double the input size, the runtime will also double. 

Examples:

Scanning an array or a linked list from start to finish


4. Linearithmic Time: O(n log n)

This complexity arises when an algorithm performs a logarithmic operation (like divide and conquer) multiple times linearly across the data set. Common sorting algorithms like mergesort and heapsort exhibit linearithmic time complexity. 

They are highly efficient for large datasets and are preferred over simpler sorting methods like bubble sort.


5. Quadratic Time: O(n²)

Algorithms with quadratic time complexity are generally characterized by two nested loops. Each element of the input data is processed multiple times, making these algorithms inefficient as the data size grows. 

Examples:

Simple sorting algorithms like bubble sort, insertion sort, and selection sort.


6. Cubic Time: O(n³)

Cubic time complexity appears in algorithms involving three nested loops. This is generally seen in more complex mathematical computations such as matrix multiplication or solving three-body problems in physics.


7. Exponential Time: O(2^n)

Algorithms with exponential time complexities double the amount of work with every additional element in the input data. 

These are usually seen in brute-force solutions of complex problems, like calculating the Fibonacci sequence using a naive recursive approach or solving the traveling salesman problem through all permutations of cities.


Space complexity

Space complexity is a measure of the amount of memory (or space) an algorithm needs to run as a function of the length of the input. 

It includes both the temporary space allocated by the algorithm during its execution and the space required for the inputs to the algorithm. 

Space complexity is analyzed similarly to time complexity, as it helps in understanding the resource usage of an algorithm, which is crucial for applications where memory usage is a critical factor (such as in embedded systems or mobile applications).


TYPES:

Similar to time complexity, there are different types of space complexity, depending on the memory consumed by each algorithm.

1. Constant Space (O(1)):

Algorithms with constant space complexity use a fixed amount of memory that does not depend on the input size.


2. Linear Space (O(n)):

Linear space complexity implies that the memory usage grows linearly with the size of the input.
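
The two cases can be contrasted with a small sketch (illustrative only): summing a list with a single accumulator uses constant extra space, while building a list of prefix sums uses extra space proportional to the input.

# O(1) extra space: only one accumulator, regardless of input length.
def total(values):
    s = 0
    for v in values:
        s += v
    return s

# O(n) extra space: the result list grows with the input.
def prefix_sums(values):
    sums = []
    running = 0
    for v in values:
        running += v
        sums.append(running)
    return sums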

Hashing

  • Hashing is a technique for storing data so that it can be looked up very quickly. Hashing uses a special formula called a hash function to map data to a location in the data structure.

  • The hash function takes the data as input and returns an index in the data structure where the data should be stored. This allows us to quickly retrieve the data by using the hash function to calculate the index and then looking up the data at that index.


Imagine you have a list of names and you want to look up a specific name in the list. You could use a hash function to map the name to an index in a data structure, such as an array or a hash table. This allows you to quickly find the name in the data structure without having to search through the entire list.
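
For instance, a short Python sketch of that idea (Python's built-in hash() is used here only as a stand-in hash function, and table_size = 10 is an arbitrary assumption):

# Map a name to a slot in a table of 10 slots.
table_size = 10

def slot_for(name):
    return hash(name) % table_size

for name in ["Sita", "Ram", "Hari"]:
    # The exact slot numbers vary between runs because Python
    # randomises string hashing, but they are consistent within a run.
    print(name, "->", slot_for(name))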


What is Hash Function?

A hash function is a mathematical function that takes an input and returns a fixed-size output (for example, a string of bytes or a number). The output of the hash function is called a hash value or hash code. The hash function is designed to be fast and efficient, so that it can quickly calculate the hash value for a given input.


Hash functions have several important properties:

They are deterministic, meaning that the same input will always produce the same output.

They are fast, so the hash value for a given input can be computed quickly.

Cryptographic hash functions are also designed to be one-way, meaning that it is difficult to reverse-engineer the input from the hash value.

They produce a fixed-size output, so the hash value is always the same length.


What is Hash Table?

A hash table is a data structure that makes use of a hash function to map keys to values. It consists of an array of buckets, where each bucket stores a key-value pair.

The hash function is used to calculate the index of the bucket where the key-value pair should be stored. This allows us to quickly retrieve the value associated with a key by using the hash function to calculate the index and then looking up the value at that index.


Properties of Hash Table

Hash tables have several important properties:

They provide fast lookups, insertions, and deletions, with an average time complexity of O(1).

They can store a large amount of data easily, with a space complexity of O(n).

They can handle collisions, where two keys map to the same index, by using techniques like chaining or open addressing.


Collision in Hashing

Collision in hashing occurs when the hash function produces the same hash value (and hence the same index) for two different input values. This can happen because the range of possible hash values is limited compared to the range of possible inputs.
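
One common way of dealing with collisions is chaining, where each bucket keeps a small list of all key-value pairs that hash to it. A minimal illustrative sketch (the class name and the bucket count of 8 are arbitrary choices, not a standard API):

# Minimal hash table with chaining: each bucket is a list of (key, value) pairs.
class ChainedHashTable:
    def __init__(self, buckets=8):
        self.buckets = [[] for _ in range(buckets)]

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                  # key already present: update it
                bucket[i] = (key, value)
                return
        bucket.append((key, value))       # new key: append to the chain

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None                       # key not found

table = ChainedHashTable()
table.put("apple", 3)
table.put("mango", 5)
print(table.get("apple"))                 # 3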


Hashing Algorithms

There are several hashing algorithms that are commonly used in computer science, such as:

MD5 (Message Digest Algorithm 5)

SHA-1 (Secure Hash Algorithm 1)

SHA-256 (Secure Hash Algorithm 256)

SHA-512 (Secure Hash Algorithm 512)

CRC32 (Cyclic Redundancy Check 32)


Applications of Hashing

Hashing is used in various applications in computer science, such as:

Storing and retrieving data quickly.

Implementing data structures like hash tables and hash maps.

Searching and indexing data efficiently.

Ensuring data integrity and security.

Password hashing and encryption.


Sorting

Sorting refers to rearranging the elements of a given array or list according to a comparison operator on the elements. The comparison operator is used to decide the new order of elements in the respective data structure.

Sorting algorithms are used to arrange a list of elements in a specific order, such as ascending or descending, or any other user-specified order, like sorting strings by length.


Imagine you have a list of numbers or names, and you want to organize them from smallest to largest, or alphabetically. Sorting algorithms help you do that efficiently.


Characteristics of Sorting Algorithms

The main characteristics and properties of sorting algorithms are:

In-place Sorting and Not-in-place Sorting

Some sorting algorithms need only a constant amount of extra space for comparisons and temporary storage of a few data elements; the sorting happens within the array itself. This is called in-place sorting. Bubble sort is an example of in-place sorting.

However, some sorting algorithms require working space that is equal to or greater than the number of elements being sorted. Sorting that uses this much extra space is called not-in-place sorting. Merge sort is an example of not-in-place sorting.


Stable and Not Stable Sorting

If a sorting algorithm, after sorting the contents, does not change the relative order of equal elements, it is called stable sorting.

If a sorting algorithm, after sorting the contents, may change the relative order of equal elements, it is called unstable sorting.


Adaptive and Non-Adaptive Sorting Algorithm

A sorting algorithm is said to be adaptive if it takes advantage of elements that are already sorted in the list to be sorted. That is, while sorting, if parts of the source list are already in order, an adaptive algorithm takes this into account and tries not to re-order them.

A non-adaptive algorithm is one which does not take already-sorted elements into account: it processes every element as if the list were completely unsorted.



Classification of Sort Algorithms

We can classify the various types of sorting in data structure as:


1. Based on Comparison:

Comparison-based Sorting: These algorithms sort data by comparing elements. Examples include Bubble Sort, Selection Sort, Insertion Sort, Merge Sort, Quick Sort, and Heap Sort.


Non-comparison-based Sorting: These algorithms sort data without comparing elements directly. Examples include Counting Sort, Radix Sort, and Bucket Sort.


2. Based on Stability

Stable Sorting Algorithms: Stable sort algorithms maintain the relative order of equal elements. Examples include Bubble Sort, Merge Sort, and Insertion Sort.


Unstable Sorting Algorithms: Unstable sort algorithms do not guarantee the relative order of equal elements. Examples include Quick Sort, Heap Sort, and Selection Sort.


Types of Sorting Algorithms in Data Structure

These are the main types of sorting in data structures:


1. Comparison-based:

Bubble sort

Selection sort

Insertion sort

Merge sort

Quick sort

Heap sort


2. Non-comparison-based:

Counting sort

Radix sort

Bucket sort


Bubble sort

Bubble Sort is the simplest sorting algorithm that works by repeatedly swapping the adjacent elements if they are in the wrong order. This algorithm is not suitable for large data sets as its average and worst-case time complexity are quite high.


Bubble Sort Algorithm

Bubble Sort is an elementary sorting algorithm, which works by repeatedly exchanging adjacent elements, if necessary. When no exchanges are required, the list is sorted.

We assume list is an array of n elements. We further assume that swap function swaps the values of the given array elements.

Step 1 − Check if the first element in the input array is greater than the next element in the array.

Step 2 − If it is greater, swap the two elements; otherwise move the pointer forward in the array.

Step 3 − Repeat Step 2 until we reach the end of the array.

Step 4 − Check if the elements are sorted; if not, repeat Steps 1 to 3 for another pass over the array.

Step 5 − The final output achieved is the sorted array.


Algorithm: Sequential-Bubble-Sort (A)

for i ← 1 to length[A] do
   for j ← length[A] down-to i+1 do
      if A[j] < A[j-1] then
         Exchange A[j] ⟷ A[j-1]
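
A runnable Python version of the same idea (a straightforward sketch; the early exit when no swaps occur is a common optimisation, not part of the pseudocode above):

def bubble_sort(a):
    n = len(a)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:            # adjacent pair out of order
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:                    # no swaps in this pass: already sorted
            break
    return a

print(bubble_sort([5, 1, 4, 2, 8]))        # [1, 2, 4, 5, 8]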


Complexity Analysis of Bubble Sort:

Time Complexity: O(n²)

Auxiliary Space: O(1)



Advantages of Bubble Sort:

Bubble sort is easy to understand and implement.

It does not require any additional memory space.

It is a stable sorting algorithm, meaning that elements with the same key value maintain their relative order in the sorted output.

Disadvantages of Bubble Sort:

Bubble sort has a time complexity of O(n²), which makes it very slow for large data sets.

Bubble sort has almost no or limited real world applications. It is mostly used in academics to teach different ways of sorting.


Insertion Sort

Insertion sort is a simple sorting algorithm that works by iteratively inserting each element of an unsorted list into its correct position in a sorted portion of the list. It is like sorting playing cards in your hands. You split the cards into two groups: the sorted cards and the unsorted cards. Then, you pick a card from the unsorted group and put it in the right place in the sorted group.


Insertion Sort Algorithm

Now we have a bigger picture of how this sorting technique works, so we can derive simple steps by which we can achieve insertion sort.

Step 1 − If it is the first element, it is already sorted.

Step 2 − Pick next element

Step 3 − Compare with all elements in the sorted sub-list

Step 4 − Shift all the elements in the sorted sub-list that are greater than the value to be inserted

Step 5 − Insert the value

Step 6 − Repeat until list is sorted
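
A short Python sketch of these steps (illustrative, using a simple list of numbers):

def insertion_sort(a):
    for i in range(1, len(a)):
        key = a[i]                     # next element to place
        j = i - 1
        while j >= 0 and a[j] > key:   # shift larger sorted elements right
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key                 # insert the value at its position
    return a

print(insertion_sort([9, 5, 1, 4, 3]))  # [1, 3, 4, 5, 9]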


Selection Sort

Selection Sort is a comparison-based sorting algorithm. It sorts an array by repeatedly selecting the smallest (or largest) element from the unsorted portion and swapping it with the first unsorted element. This process continues until the entire array is sorted.

First we find the smallest element and swap it with the first element. This way we get the smallest element at its correct position.

Then we find the smallest among remaining elements (or second smallest) and swap it with the second element.

We keep doing this until we get all elements moved to correct position.


Selection Sort Algorithm

This type of sorting is called selection sort because it works by repeatedly selecting the next smallest element. That is: we first find the smallest value in the array and exchange it with the element in the first position, then find the second smallest element and exchange it with the element in the second position, and we continue the process in this way until the entire array is sorted.

1. Set MIN to location 0.

2. Search the minimum element in the list.

3. Swap with value at location MIN.

4. Increment MIN to point to next element.

5. Repeat until the list is sorted.
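
The same steps in Python (a minimal sketch):

def selection_sort(a):
    n = len(a)
    for i in range(n - 1):
        min_idx = i                    # assume the current position holds the minimum
        for j in range(i + 1, n):
            if a[j] < a[min_idx]:
                min_idx = j
        a[i], a[min_idx] = a[min_idx], a[i]   # one swap per pass
    return a

print(selection_sort([64, 25, 12, 22, 11]))   # [11, 12, 22, 25, 64]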


Complexity Analysis of Selection Sort

Time Complexity: O(n²)


Advantages of Selection Sort

Easy to understand and implement, making it ideal for teaching basic sorting concepts.

Requires only a constant O(1) extra memory space.

It requires fewer swaps (memory writes) than many other standard algorithms; only cycle sort beats it in terms of memory writes. Therefore it can be a simple algorithm choice when memory writes are costly.

Disadvantages of the Selection Sort

Selection sort has a time complexity of O(n²), which makes it slower than algorithms like Quick Sort or Merge Sort.

Does not maintain the relative order of equal elements which means it is not stable.

Applications of Selection Sort

Perfect for teaching fundamental sorting mechanisms and algorithm design.

Suitable for small lists where the overhead of more complex algorithms isn’t justified, and for situations where memory writes are costly, since it requires fewer memory writes than other standard sorting algorithms.

Heap Sort algorithm is based on Selection Sort.


Merge Sort

Merge sort is a popular sorting algorithm known for its efficiency and stability. It follows the divide-and-conquer approach. It works by recursively dividing the input array into two halves, recursively sorting the two halves and finally merging them back together to obtain the sorted array.


How does Merge Sort work?

Here’s a step-by-step explanation of how merge sort works:

Divide:  Divide the list or array recursively into two halves until it can no longer be divided. 

Conquer:  Each subarray is sorted individually using the merge sort algorithm. 

Merge:  The sorted subarrays are merged back together in sorted order. The process continues until all elements from both subarrays have been merged. 


Merge Sort Algorithm

Merge sort keeps dividing the list into equal halves until it can no longer be divided. By definition, a list containing only one element is considered sorted. Merge sort then combines the smaller sorted lists, keeping the new list sorted too.

Step 1: If the list contains only one element, consider it already sorted, so return.

Step 2: Divide the list recursively into two halves until it can no longer be divided.

Step 3: Merge the smaller lists into a new list in sorted order.
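
A compact Python sketch of the divide, conquer and merge steps (illustrative; it returns a new sorted list rather than sorting in place):

def merge_sort(a):
    if len(a) <= 1:                    # a one-element list is already sorted
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid])         # sort each half recursively
    right = merge_sort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:        # <= keeps equal elements in order (stable)
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])            # append whatever remains in either half
    merged.extend(right[j:])
    return merged

print(merge_sort([38, 27, 43, 3, 9, 82, 10]))  # [3, 9, 10, 27, 38, 43, 82]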


Time Complexity: O(n log n)


Advantages

Stability: Merge sort is a stable sorting algorithm, which means it maintains the relative order of equal elements in the input array.

Guaranteed worst-case performance: Merge sort has a worst-case time complexity of O(n log n), which means it performs well even on large datasets.

Simple to implement: The divide-and-conquer approach is straightforward.

Naturally parallel: Subarrays are merged independently, which makes merge sort suitable for parallel processing.


Disadvantages

Space complexity: Merge sort requires additional memory to store the merged sub-arrays during the sorting process.

Not in-place: Merge sort is not an in-place sorting algorithm, which means it requires additional memory to store the sorted data. This can be a disadvantage in applications where memory usage is a concern.

Merge sort is generally slower than quick sort, because quick sort is more cache-friendly as it works in place.


Quick Sort

QuickSort is a sorting algorithm based on divide and conquer: it picks an element as a pivot and partitions the given array around the picked pivot, placing the pivot in its correct position in the sorted array.

It works on the principle of divide and conquer, breaking down the problem into smaller sub-problems.


There are mainly three steps in the algorithm:

Choose a Pivot: Select an element from the array as the pivot. The choice of pivot can vary (e.g., first element, last element, random element, or median).

Partition the Array: Rearrange the array around the pivot. After partitioning, all elements smaller than the pivot will be on its left, and all elements greater than the pivot will be on its right. The pivot is then in its correct position, and we obtain the index of the pivot.

Recursively Call: Recursively apply the same process to the two partitioned sub-arrays (left and right of the pivot).

Base Case: The recursion stops when there is only one element left in the sub-array, as a single element is already sorted.
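
A compact Python sketch of these steps; it picks the last element as the pivot and uses the Lomuto partition scheme, which is one common choice among several:

def quick_sort(a, low=0, high=None):
    if high is None:
        high = len(a) - 1
    if low < high:
        pivot = a[high]                # choose the last element as the pivot
        i = low - 1
        for j in range(low, high):     # partition: smaller elements go left
            if a[j] <= pivot:
                i += 1
                a[i], a[j] = a[j], a[i]
        a[i + 1], a[high] = a[high], a[i + 1]
        p = i + 1                      # the pivot is now at its final position
        quick_sort(a, low, p - 1)      # recursively sort the left part
        quick_sort(a, p + 1, high)     # recursively sort the right part
    return a

print(quick_sort([10, 7, 8, 9, 1, 5]))  # [1, 5, 7, 8, 9, 10]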


Heap Sort

Heap sort is a comparison-based sorting technique based on the Binary Heap data structure. It can be seen as an optimization over selection sort where we first find the max (or min) element and swap it with the last (or first). We repeat the same process for the remaining elements. In heap sort, we use a binary heap so that we can quickly find and move the max element in O(log n) instead of O(n) and hence achieve the O(n log n) time complexity.
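
A Python sketch of the idea: first build a max-heap, then repeatedly move the maximum to the end and restore the heap (an illustrative implementation written without library helpers):

def heap_sort(a):
    n = len(a)

    def sift_down(root, size):
        # Push a[root] down until the max-heap property holds again.
        while True:
            largest = root
            left, right = 2 * root + 1, 2 * root + 2
            if left < size and a[left] > a[largest]:
                largest = left
            if right < size and a[right] > a[largest]:
                largest = right
            if largest == root:
                return
            a[root], a[largest] = a[largest], a[root]
            root = largest

    for i in range(n // 2 - 1, -1, -1):   # build the max-heap
        sift_down(i, n)
    for end in range(n - 1, 0, -1):       # move the current max to the end
        a[0], a[end] = a[end], a[0]
        sift_down(0, end)
    return a

print(heap_sort([12, 11, 13, 5, 6, 7]))   # [5, 6, 7, 11, 12, 13]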


Applications of Sorting Algorithms:

Quickly Finding k-th Smallest or K-th Largest : Once we sort the array, we can find k-th smallest and k-th largest elements in O(1) time for different values of k.

Searching Algorithms: Sorting is often a crucial step in search algorithms like binary search and ternary search, where the data needs to be sorted before searching for a specific element.

Data management: Sorting data makes it easier to search, retrieve, and analyze.

Database optimization: Sorting data in databases improves query performance. We typically keep the data sorted by primary index so that we can do quick queries.

Machine learning: Sorting is used to prepare data for training machine learning models.

Data Analysis: Sorting helps in identifying patterns, trends, and outliers in datasets. It plays a vital role in statistical analysis, financial modeling, and other data-driven fields.

Operating Systems: Sorting algorithms are used in operating systems for tasks like task scheduling, memory management, and file system organization.

Advantages of Sorting Algorithms:

Efficiency: Sorting algorithms help in arranging data in a specific order, making it easier and faster to search, retrieve, and analyze information.

Improved Performance: By organizing data in a sorted manner, algorithms can perform operations more efficiently, leading to improved performance in various applications.

Simplified data analysis: Sorting makes it easier to identify patterns and trends in data.

Reduced memory consumption: Sorted data makes it easy to detect and remove duplicate elements, which can reduce memory usage.

Improved data visualization: Sorted data can be visualized more effectively in charts and graphs.

Disadvantages of Sorting Algorithms:

Insertion: If we wish to keep data sorted, then insertion operation becomes costly as we have to maintain sorted order. If we do not have to maintain sorted order, we can simply insert at the end.

Algorithm selection: Choosing the most appropriate sorting algorithm for a given dataset can be challenging.

For many problems, such as finding distinct elements or finding a pair with a given sum, hashing works better than sorting.

Search

https://www.geeksforgeeks.org/introduction-to-searching-algorithms-2/

Searching is the fundamental process of locating a specific element or item within a collection of data. This collection of data can take various forms, such as arrays, lists, trees, or other structured representations.

The primary objective of searching is to determine whether the desired element exists within the data, and if so, to identify its precise location or retrieve it. It plays an important role in various computational tasks and real-world applications, including information retrieval, data analysis, decision-making processes, and more.


Importance of Searching in DSA

Efficiency: Efficient searching algorithms improve program performance.

Data Retrieval: Quickly find and retrieve specific data from large datasets.

Database Systems: Enables fast querying of databases.

Problem Solving: Used in a wide range of problem-solving tasks.


Characteristics of Searching

Understanding the characteristics of searching in data structures and algorithms is crucial for designing efficient algorithms and making informed decisions about which searching technique to employ. Here, we explore key aspects and characteristics associated with searching:


1. Target Element:

In searching, there is always a specific target element or item that you want to find within the data collection. This target could be a value, a record, a key, or any other data entity of interest.


2. Search Space:

The search space refers to the entire collection of data within which you are looking for the target element. Depending on the data structure used, the search space may vary in size and organization.


3. Complexity:

Searching can have different levels of complexity depending on the data structure and the algorithm used. The complexity is often measured in terms of time and space requirements.


4. Deterministic vs Non-deterministic:

Some searching algorithms, like binary search, follow a clear and systematic approach, narrowing the search space in a predictable way. Others, such as linear search over unordered data, may need to examine the entire search space in the worst case.


Applications of Searching:

Searching algorithms have numerous applications across various fields. Here are some common applications:

Information Retrieval: Search engines like Google, Bing, and Yahoo use sophisticated searching algorithms to retrieve relevant information from vast amounts of data on the web.

Database Systems: Searching is fundamental in database systems for retrieving specific data records based on user queries, improving efficiency in data retrieval.

E-commerce: Searching is crucial in e-commerce platforms for users to find products quickly based on their preferences, specifications, or keywords.

Networking: In networking, searching algorithms are used for routing packets efficiently through networks, finding optimal paths, and managing network resources.

Artificial Intelligence: Searching algorithms play a vital role in AI applications, such as problem-solving, game playing (e.g., chess), and decision-making processes

Pattern Recognition: Searching algorithms are used in pattern matching tasks, such as image recognition, speech recognition, and handwriting recognition.


Types of Searching

The two most common searching algorithms, chosen based on the type of input array, are:


1. Linear Search:

Linear Search, also known as Sequential Search, is one of the simplest and most straightforward searching algorithms. It works by sequentially examining each element in a collection of data (array or list) until a match is found or the entire collection has been traversed.


Algorithm of Linear Search:

The Algorithm examines each element, one by one, in the collection, treating each element as a potential match for the key you're searching for.

If it finds any element that is exactly the same as the key you're looking for, the search is successful, and it returns the index of key.

If it goes through all the elements and none of them matches the key, then that means "No match is Found".


Pseudo Code for Linear Search:

def LinearSearch(collection, key):
    for index, element in enumerate(collection):
        if element == key:
            return index          # return the position of the match
    return "Not found"


Complexity Analysis of Linear Search:

Time Complexity: O(N)

Auxiliary Space: O(1), since apart from the variable used to iterate through the list, no extra space is needed.

When to use Linear Search:

When the collection of data is small.

When the data is unordered.


2. Binary Search:

Binary Search is defined as a searching algorithm used in a sorted array by repeatedly dividing the search interval in half. The idea of binary search is to use the information that the array is sorted and reduce the time complexity to O(log N).

Algorithm of Binary Search:

Divide the search space into two halves by finding the middle index “mid”.

Compare the middle element of the search space with the key.

If the key is found at the middle element, the process is terminated.

If the key is not found at the middle element, choose which half will be used as the next search space.

If the key is smaller than the middle element, the left half is used for the next search.

If the key is larger than the middle element, the right half is used for the next search.

This process is continued until the key is found or the total search space is exhausted.


Pseudo Code for Binary Search:

Below is the pseudo code for implementing binary search:

def binarySearch(collection, key):
    left = 0
    right = len(collection) - 1
    while left <= right:
        mid = (left + right) // 2
        if collection[mid] == key:
            return mid            # key found at index mid
        elif collection[mid] < key:
            left = mid + 1        # search the right half
        else:
            right = mid - 1       # search the left half
    return "Not found"


Complexity Analysis of Binary Search:

Time Complexity:

O(log N) in the worst case (when the key is not present), since the search space is halved with each comparison.


When to use Binary Search:

When the data collection is sorted (monotonic); this is an essential condition.

When efficiency is required, especially for large datasets.

Graph

https://www.geeksforgeeks.org/what-is-graph-data-structure/

A Graph is a non-linear data structure consisting of vertices and edges. The vertices are sometimes also referred to as nodes, and the edges are lines or arcs that connect any two nodes in the graph. More formally, a Graph is composed of a set of vertices (V) and a set of edges (E), and is denoted by G(V, E).

Components of a Graph

Vertices: Vertices are the fundamental units of the graph. Sometimes, vertices are also known as nodes. Every node/vertex can be labeled or unlabeled.

Edges: Edges are drawn or used to connect two nodes of the graph. In a directed graph, an edge is an ordered pair of nodes. Edges can connect any two nodes in any possible way; there are no rules. Sometimes, edges are also known as arcs. Every edge can be labeled or unlabeled.
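
In code, a graph is commonly stored as an adjacency list, i.e. a mapping from each vertex to the list of vertices it connects to. A small illustrative sketch (the vertex names are made up):

# Undirected graph G(V, E) stored as an adjacency list.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C"],
}

print(graph["A"])   # neighbours of A: ['B', 'C']
print(len(graph))   # number of vertices: 4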

Types Of Graph

  • Null Graph: A graph with no edges.

  • Trivial Graph: A graph having only a single vertex; it is the smallest possible graph.

  • Undirected Graph: A graph in which edges do not have any direction. That is, the nodes form unordered pairs in the definition of every edge. 

  • Directed Graph: A graph in which each edge has a direction. That is, the nodes form ordered pairs in the definition of every edge.

  • Connected Graph: A graph in which every node can be reached from any other node. 

  • Disconnected Graph: A graph in which at least one node is not reachable from some other node.

  • Regular Graph: A graph in which the degree of every vertex is equal to K is called a K-regular graph.

  • Complete Graph: A graph in which there is an edge from each node to every other node.

  • Cycle Graph: A graph that consists of a single cycle; the degree of each vertex is 2. 

  • Cyclic Graph: A graph containing at least one cycle.

  • Directed Acyclic Graph: A directed graph that does not contain any cycle. 

  • Bipartite Graph: A graph whose vertices can be divided into two sets such that no edge connects two vertices within the same set.

  • Weighted Graph: A graph in which each edge is assigned a weight. Weighted graphs can be further classified as directed weighted graphs and undirected weighted graphs. 

Graph traversal

https://techskillguru.com/ds/depth-first-search-traversal

Graph traversal is the process of visiting all the vertices (nodes) in a graph in a systematic way. It involves exploring or traversing each vertex in a graph by following the edges that connect the vertices.


There are two common methods for graph traversal: Breadth-First Search (BFS) and Depth-First Search (DFS). BFS explores all the neighboring nodes at the current depth before moving on to nodes at the next depth level, while DFS explores the deepest vertices of a graph before backtracking to explore other vertices.


Graph traversal is used in many graph algorithms, such as finding the shortest path between two vertices, checking if a graph is connected, finding cycles in a graph, and more. By visiting all the vertices in a graph, graph traversal helps to uncover the structure and properties of the graph, which can be used to solve various problems.


Types of Graph Traversal

There are two common methods for graph traversal: Breadth-First Search (BFS) and Depth-First Search (DFS).


Breadth-First Search (BFS)

Breadth-First Search (BFS) is a graph traversal algorithm that explores all the neighboring nodes at the current depth before moving on to nodes at the next depth level. It starts at the root node (or any other randomly chosen node) and explores all the vertices in the graph.

The BFS algorithm uses a queue data structure to keep track of the nodes to visit.


Breadth-First Search (BFS)  Algorithm

Step 1: Define a queue whose size equals the number of nodes in the graph.

Step 2: Start traversal from the starting node: put it into the queue and mark it as visited.

Step 3: Enqueue all the neighboring nodes of the visited node that have not been visited yet.

Step 4: When all the neighbors of the front node have been visited, dequeue that node from the queue.

Step 5: Repeat steps 3 and 4 until the queue is empty.

Step 6: When the queue is empty, build the final spanning tree by removing unused edges from the graph.
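
A compact Python sketch of BFS over an adjacency-list graph (the dictionary-of-neighbour-lists representation shown earlier is an assumption of this sketch, not part of the algorithm itself):

from collections import deque

def bfs(graph, start):
    visited = [start]                  # order in which nodes are visited
    queue = deque([start])
    while queue:
        node = queue.popleft()         # dequeue the front node
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.append(neighbour)   # mark and enqueue unvisited neighbours
                queue.append(neighbour)
    return visited

graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
print(bfs(graph, "A"))                 # ['A', 'B', 'C', 'D']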


DFS (Depth First Search)

DFS (Depth First Search) is a simple algorithm used for traversing or searching a tree or graph data structure.

The algorithm works by exploring a path as far as possible before backtracking. Starting at a root node or vertex, the algorithm explores each branch of the tree or graph by visiting the first unvisited node it encounters, then recursively applying the same rule to each subsequent node or vertex until it reaches a dead end or has visited all the vertices in the graph.


Algorithm to Perform Depth First Search Traversal in Graph

We use the following steps to implement DFS traversal...


Step 1 - Define a stack of size equal to the number of vertices in the graph.

Step 2 - Choose any vertex as the starting point for traversal. Visit that vertex and push it onto the stack.

Step 3 - Push any non-visited vertex adjacent to the vertex on top of the stack.

Step 4 - Repeat step 3 until no new vertex can be visited from the vertex on top of the stack.

Step 5 - If there are no new vertices to visit, backtrack by popping one vertex from the stack.

Step 6 - Repeat steps 3, 4 and 5 until the stack becomes empty.

Step 7 - When the stack is empty, remove the unused graph edges to obtain the final spanning tree.
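
A short Python sketch of the same stack-based idea (again assuming the adjacency-list representation used above):

def dfs(graph, start):
    visited = []
    stack = [start]                    # explicit stack instead of recursion
    while stack:
        node = stack.pop()             # take the most recently pushed vertex
        if node not in visited:
            visited.append(node)
            # Push unvisited neighbours; reversed so they are explored
            # in the order they appear in the adjacency list.
            for neighbour in reversed(graph[node]):
                if neighbour not in visited:
                    stack.append(neighbour)
    return visited

graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
print(dfs(graph, "A"))                 # ['A', 'B', 'D', 'C']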









