
Hashing in Data Structure

In data structures,

• There are several searching techniques, such as linear search, binary search and search trees.
• In these techniques, the time taken to search for a particular element depends on the total number of elements.
 
Example-

• Linear search takes O(n) time to search an unsorted array of n elements.
• Binary search takes O(log n) time to search a sorted array of n elements.
• Searching a balanced binary search tree of n elements takes O(log n) time; an unbalanced tree can degrade to O(n).
 
Drawback-

The main drawback of these techniques is-

• As the number of elements increases, the time taken to perform the search also increases.
• This becomes problematic when the total number of elements becomes very large.
 

Hashing in Data Structure-


In data structures,

• Hashing is a well-known technique for searching for a particular element among several elements.
• It minimizes the number of comparisons made while performing the search.

Hashing is defined as follows...

Hashing is the process of indexing and retrieving an element (data) in a data structure, using a hash key to provide a faster way of finding the element.

Advantage-
Unlike other searching techniques,

• Hashing is extremely efficient.
• With a good hash function and few collisions, the time taken to perform the search does not depend on the total number of elements.
• It completes the search in constant time, O(1), on average; with many collisions the worst case can degrade to O(n).
 
Hashing Mechanism-
In hashing,

• An array data structure called a hash table is used to store the data items.
• Data items are inserted into the hash table based on their hash key value.
• A hash table is defined as follows...

A hash table is an array that maps a key (data) to a position in the array with the help of a hash function, so that insertion, deletion and search operations take constant time, O(1), on average.

Hash Key Value-

• A hash key value is a special value that serves as an index for a data item.
• It indicates where the data item should be stored in the hash table.
• The hash key value is generated using a hash function.

Basic concept of hashing and hash table is shown in the following figure...

Types of Hash Functions-


 
There are various types of hash functions, such as-
1. Division Hash Function
2. Mid Square Hash Function
3. Folding Hash Function, etc.

1. Division method
In this method, the hash function is the remainder of a division. For example, suppose the records 52, 68, 99 and 84 are to be placed in a hash table of size 10.

Then:
h(key) = key % table size
h(52) = 52 % 10 = 2
h(68) = 68 % 10 = 8
h(99) = 99 % 10 = 9
h(84) = 84 % 10 = 4
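
The division method above can be sketched in C as follows. This is a minimal illustration, not a definitive implementation: the function names `hash_division` and `print_division_hashes` are hypothetical, and keys are assumed to be non-negative.

```c
#include <stdio.h>

/* Division-method hash: the index is the remainder of key / table_size.
   Assumes key >= 0 and table_size > 0 (an assumption of this sketch). */
int hash_division(int key, int table_size) {
    return key % table_size;
}

/* Print the home index of each record, one per line. */
void print_division_hashes(const int *keys, int count, int table_size) {
    for (int i = 0; i < count; i++)
        printf("h(%d) = %d\n", keys[i], hash_division(keys[i], table_size));
}
```

Calling `hash_division` with the records 52, 68, 99 and 84 and a table size of 10 gives the indices 2, 8, 9 and 4 worked out above.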

2. Mid square method

In this method, the key is first squared and the middle part of the result is taken as the index. For example, suppose we want to place the record 3101 in a table of size 1000. Then 3101 * 3101 = 9616201, so h(3101) = 162 (the middle 3 digits).
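
A possible C sketch of the mid square method is shown below. The function names are hypothetical, and one detail is an assumption of this sketch: when the number of surplus digits is odd, the extra digit is dropped from the left. The key is also assumed to be small enough that its square fits in an `unsigned long long`.

```c
/* Count the decimal digits of a non-negative number. */
static int count_digits(unsigned long long v) {
    int d = 1;
    while (v >= 10) {
        v /= 10;
        d++;
    }
    return d;
}

/* Mid-square hash: square the key and return the middle r digits.
   Assumption: key * key does not overflow unsigned long long. */
unsigned long long hash_mid_square(unsigned long long key, int r) {
    unsigned long long sq = key * key;
    int d = count_digits(sq);
    int drop = (d > r) ? (d - r) / 2 : 0;   /* digits removed on the right */
    unsigned long long mod = 1;

    for (int i = 0; i < drop; i++)
        sq /= 10;                           /* discard low-order digits */
    for (int i = 0; i < r; i++)
        mod *= 10;                          /* mod = 10^r */
    return sq % mod;                        /* keep the next r digits */
}
```

For the record 3101 and r = 3, the square 9616201 has 7 digits, 2 digits are dropped on the right, and the result is 162, as in the example above.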

3. Digit folding method

In this method, the key is divided into separate parts, and these parts are combined using simple operations to produce a hash key. For example, the record 12465512 is divided into the parts 124, 655 and 12, which are then added together.

H(key) = 124 + 655 + 12
= 791
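
The folding step can be sketched in C as below. The function name `hash_folding` and the `group` parameter (the number of digits per part) are hypothetical. The sketch splits the key from the left, as in the example above, and returns the plain sum; in practice the sum would typically be reduced modulo the table size, which is an assumption not stated in the text.

```c
#include <stdio.h>
#include <string.h>

/* Digit-folding hash: split the decimal digits of key into groups of
   `group` digits, starting from the left, and add the groups.
   Assumes group > 0. */
unsigned long hash_folding(unsigned long key, int group) {
    char digits[32];
    unsigned long sum = 0;

    snprintf(digits, sizeof digits, "%lu", key);
    size_t len = strlen(digits);

    for (size_t start = 0; start < len; start += (size_t)group) {
        unsigned long part = 0;
        /* Build the numeric value of one group; the last may be shorter. */
        for (size_t i = start; i < start + (size_t)group && i < len; i++)
            part = part * 10 + (unsigned long)(digits[i] - '0');
        sum += part;
    }
    return sum;
}
```

With key 12465512 and groups of 3 digits, the parts are 124, 655 and 12, and the function returns 791.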

Collision
A collision is a situation in which the hash function returns the same hash key for more than one record. Resolving a collision may in turn lead to an overflow condition; a hash function that produces frequent collisions and overflows is a poor hash function.

Collision resolution technique
• A collision occurs when the hash value of a new key maps to an occupied bucket of the hash table.
• Collision resolution techniques are classified as-

1) Chaining (open hashing)

In this method, an additional field, a chain, is stored with the data. A chain is maintained at the home bucket: when a collision occurs, the colliding records are kept in a linked list.

The idea is to make each cell of the hash table point to a linked list of records that have the same hash function value.
Let us consider a simple hash function, "key mod 7", and the sequence of keys 50, 700, 76, 85, 92, 73, 101.
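
A minimal C sketch of chaining for this example follows. The names `chain_table`, `chain_insert`, `chain_search` and `chain_length` are hypothetical, keys are assumed to be non-negative, and new keys are appended at the tail of a chain (inserting at the head is an equally valid choice).

```c
#include <stdlib.h>

#define BUCKETS 7

struct node {
    int key;
    struct node *next;
};

/* Each bucket holds the head of a linked list (chain); NULL means empty. */
struct node *chain_table[BUCKETS];

/* Append key to the chain of its home bucket (key mod 7).
   Returns 0 on success, -1 if memory allocation fails. */
int chain_insert(int key) {
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->key = key;
    n->next = NULL;

    struct node **link = &chain_table[key % BUCKETS];
    while (*link != NULL)            /* walk to the end of the chain */
        link = &(*link)->next;
    *link = n;
    return 0;
}

/* Return 1 if key is present in its chain, 0 otherwise. */
int chain_search(int key) {
    for (struct node *p = chain_table[key % BUCKETS]; p != NULL; p = p->next)
        if (p->key == key)
            return 1;
    return 0;
}

/* Number of keys stored in one bucket's chain. */
int chain_length(int bucket) {
    int len = 0;
    for (struct node *p = chain_table[bucket]; p != NULL; p = p->next)
        len++;
    return len;
}
```

Inserting 50, 700, 76, 85, 92, 73, 101 in that order leaves 700 in bucket 0, the chain 50, 85, 92 in bucket 1, the chain 73, 101 in bucket 3 and 76 in bucket 6; the other buckets stay empty.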

Example2

Open Addressing-
In open addressing,

• Unlike separate chaining, all the keys are stored inside the hash table.
• No key is stored outside the hash table.

Techniques used for open addressing are-
• Linear Probing
• Quadratic Probing
• Double Hashing

1) Linear probing

This is a simple method of handling collisions. When a collision occurs, the new record is placed in the next empty slot found by scanning linearly down the table. The method suffers from clustering, meaning that blocks of occupied slots build up in parts of the hash table.

Let hash(x) be the slot index computed using the hash function, and let S be the table size.
If slot hash(x) % S is full, then we try (hash(x) + 1) % S
If (hash(x) + 1) % S is also full, then we try (hash(x) + 2) % S
If (hash(x) + 2) % S is also full, then we try (hash(x) + 3) % S
..................................................
..................................................

Example: Let us consider a hash table of size 10 with the hash function H(key) = key % table size.

In this diagram, 56 and 36 both map to the same bucket (index 6). With linear probing, the colliding record is placed in the next empty slot further down, so 36 is placed at index 7.
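
The probing rule above can be sketched in C as follows. The names `lp_init`, `lp_insert`, `lp_search` and the `LP_EMPTY` marker are hypothetical; the sketch assumes non-negative keys and uses -1 to mark an empty slot.

```c
#define LP_EMPTY (-1)

/* Mark every slot of the table as empty. */
void lp_init(int *table, int size) {
    for (int i = 0; i < size; i++)
        table[i] = LP_EMPTY;
}

/* Insert a non-negative key using hash(x) = x % size and linear probing.
   Returns the index used, or -1 if the table is full. */
int lp_insert(int *table, int size, int key) {
    int home = key % size;
    for (int i = 0; i < size; i++) {
        int slot = (home + i) % size;    /* try home, home+1, home+2, ... */
        if (table[slot] == LP_EMPTY) {
            table[slot] = key;
            return slot;
        }
    }
    return -1;
}

/* Return the index holding key, or -1 if the key is absent.
   The scan stops at the first empty slot (no deletions are assumed). */
int lp_search(const int *table, int size, int key) {
    int home = key % size;
    for (int i = 0; i < size; i++) {
        int slot = (home + i) % size;
        if (table[slot] == LP_EMPTY)
            return -1;
        if (table[slot] == key)
            return slot;
    }
    return -1;
}
```

With a table of size 10, inserting 56 returns index 6 and then inserting 36 returns index 7, matching the example above.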

Example2
2) Quadratic probing

This method reduces the clustering problem of linear probing. The probe sequence is defined by H(key, x) = (H(key) + x*x) % table size, where x = 0, 1, 2, ... is the probe number.

We look for the i^2-th slot in the i-th iteration.

Let hash(x) be the slot index computed using the hash function, and let S be the table size.
If slot hash(x) % S is full, then we try (hash(x) + 1*1) % S
If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S
If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S
..................................................
..................................................
Example1

Suppose we have to insert the following elements:

67, 90, 55, 17, 49.

Here 67, 90 and 55 can be inserted directly, but 17 collides: with x = 0, (17 + 0*0) % 10 = 7, and bucket 7 is already occupied by 67. Incrementing x to 1 gives (17 + 1*1) % 10 = 8. Bucket 8 is empty, so 17 is placed at index 8.
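
A minimal C sketch of quadratic probing for this example is given below. The names `qp_init`, `qp_insert` and `QP_EMPTY` are hypothetical, and keys are assumed to be non-negative. Note that quadratic probing does not always visit every slot, so the sketch gives up after `size` probes.

```c
#define QP_EMPTY (-1)

/* Mark every slot of the table as empty. */
void qp_init(int *table, int size) {
    for (int i = 0; i < size; i++)
        table[i] = QP_EMPTY;
}

/* Insert a non-negative key, probing (key % size + x*x) % size for
   x = 0, 1, 2, ... Returns the index used, or -1 if no empty slot was
   found within `size` probes. */
int qp_insert(int *table, int size, int key) {
    int home = key % size;
    for (int x = 0; x < size; x++) {
        int slot = (home + x * x) % size;
        if (table[slot] == QP_EMPTY) {
            table[slot] = key;
            return slot;
        }
    }
    return -1;
}
```

Inserting 67, 90, 55, 17, 49 into a table of size 10 places them at indices 7, 0, 5, 8 and 9 respectively.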

3) Double hashing
In this technique, a second hash function is used when a collision occurs. The first hash function is simple, the same as in the division method. The second hash function must follow two important rules:

1. It must never evaluate to zero.

2. It must ensure that all buckets can be probed.

The hash functions for this technique are:

H1(key) = key % table size

H2(key) = P - (key mod P)

where P is a prime number smaller than the size of the hash table.

We use another hash function hash2(x) and look for the i*hash2(x)-th slot in the i-th iteration.
Let hash(x) be the slot index computed using the first hash function, and let S be the table size.
If slot hash(x) % S is full, then we try (hash(x) + 1*hash2(x)) % S
If (hash(x) + 1*hash2(x)) % S is also full, then we try (hash(x) + 2*hash2(x)) % S
If (hash(x) + 2*hash2(x)) % S is also full, then we try (hash(x) + 3*hash2(x)) % S
..................................................
..................................................

Example1: Let us consider we have to insert 89,18,49,58,69

Example2: Let us consider we have to insert 67, 90,55,17,49.


Here 67, 90 and 55 can be inserted into the hash table using the first hash function, but 17 maps to bucket 7, which is already full. In this case we use the second hash function, H2(key) = P - (key mod P), where P is a prime number smaller than the table size; here P = 7.

That is, H2(17) = 7 - (17 % 7) = 7 - 3 = 4, which means we jump 4 slots from index 7 to place 17. Since (7 + 4) % 10 = 1, 17 is placed at index 1.
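
The two hash functions above can be combined in a short C sketch. The names `dh_init`, `dh_hash2`, `dh_insert` and `DH_EMPTY` are hypothetical, and keys are assumed to be non-negative. With a table size of 10, some step values (for example 2 or 5) cannot reach every bucket, so the sketch simply stops after `size` probes; a prime table size is the usual way to avoid this.

```c
#define DH_EMPTY (-1)

/* Mark every slot of the table as empty. */
void dh_init(int *table, int size) {
    for (int i = 0; i < size; i++)
        table[i] = DH_EMPTY;
}

/* Second hash function H2(key) = P - (key mod P).
   The result is always in 1..P, so it never evaluates to zero. */
int dh_hash2(int key, int p) {
    return p - (key % p);
}

/* Insert a non-negative key, probing (H1 + i * H2) % size for
   i = 0, 1, 2, ... Returns the index used, or -1 if no empty slot was
   found within `size` probes. */
int dh_insert(int *table, int size, int p, int key) {
    int home = key % size;           /* H1(key) */
    int step = dh_hash2(key, p);     /* H2(key) */
    for (int i = 0; i < size; i++) {
        int slot = (home + i * step) % size;
        if (table[slot] == DH_EMPTY) {
            table[slot] = key;
            return slot;
        }
    }
    return -1;
}
```

With table size 10 and P = 7, inserting 67, 90, 55, 17, 49 gives indices 7, 0, 5, 1 and 9. Under the same assumptions, the keys of Example1 (89, 18, 49, 58, 69) land at indices 9, 8, 6, 3 and 0.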

Separate Chaining vs. Open Addressing

S.No. | Separate Chaining | Open Addressing
1. | Chaining is simpler to implement. | Open addressing requires more computation.
2. | In chaining, the hash table never fills up; we can always add more elements to a chain. | In open addressing, the table may become full.
3. | Chaining is less sensitive to the hash function or load factor. | Open addressing requires extra care to avoid clustering and a high load factor.
4. | Chaining is mostly used when it is unknown how many keys there are and how frequently they may be inserted or deleted. | Open addressing is used when the frequency and number of keys are known.
5. | Cache performance of chaining is not good, as keys are stored in linked lists. | Open addressing provides better cache performance, as everything is stored in the same table.
6. | Wastage of space (some parts of the hash table in chaining are never used). | In open addressing, a slot can be used even if an input doesn't map to it.
7. | Chaining uses extra space for links. | There are no links in open addressing.

MCQs
1)A hash table of length 10 uses open addressing with hash function h(k)=k mod 10, and
linear probing. After inserting 6 values into an empty hash table, the table is as shown
below. 

 
Which one of the following choices gives a possible order in which the key values could have
been inserted in the table?

A 46, 42, 34, 52, 23, 33


B 34, 42, 23, 52, 33, 46
C 46, 34, 42, 23, 52, 33
D 42, 46, 33, 23, 34, 52
Answer: (C) 

How many different insertion sequences of the key values using the hash function h(k) =
k mod 10 and linear probing will result in the hash table shown below? 

A 10
B 20
C 30
D 40
Answer: C

Explanation:
In a valid insertion sequence, the elements 42, 23 and 34 must appear before 52 and 33, and 46 must appear before 33. Total number of different sequences = 3! x 5 = 30. In this expression, 3! is for the elements 42, 23 and 34, as they can appear in any order, and 5 is for element 46, as it can appear at 5 different places.
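
The count can be checked by brute force. The C sketch below is only an illustration under one stated assumption: since the table figure is not reproduced here, the target table is taken to be the one produced by inserting the sequence of option (C) of the previous question (46, 34, 42, 23, 52, 33). All function names are hypothetical.

```c
#define N_KEYS 6
#define T_SIZE 10
#define T_EMPTY (-1)

/* Insert keys in the given order with h(k) = k mod 10 and linear probing. */
static void build_table(const int *order, int *table) {
    for (int i = 0; i < T_SIZE; i++)
        table[i] = T_EMPTY;
    for (int i = 0; i < N_KEYS; i++) {
        int slot = order[i] % T_SIZE;
        while (table[slot] != T_EMPTY)
            slot = (slot + 1) % T_SIZE;
        table[slot] = order[i];
    }
}

/* Return 1 if two tables have identical contents, 0 otherwise. */
static int same_table(const int *a, const int *b) {
    for (int i = 0; i < T_SIZE; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}

/* Generate every permutation of keys[k..N_KEYS-1] by swapping and count
   those whose insertion produces the target table. */
static int count_from(int *keys, int k, const int *target) {
    if (k == N_KEYS) {
        int table[T_SIZE];
        build_table(keys, table);
        return same_table(table, target);
    }
    int total = 0;
    for (int i = k; i < N_KEYS; i++) {
        int tmp = keys[k]; keys[k] = keys[i]; keys[i] = tmp;
        total += count_from(keys, k + 1, target);
        tmp = keys[k]; keys[k] = keys[i]; keys[i] = tmp;   /* undo swap */
    }
    return total;
}

/* Count the insertion orders that give the same table as option (C). */
int count_valid_sequences(void) {
    int option_c[N_KEYS] = {46, 34, 42, 23, 52, 33};
    int keys[N_KEYS] = {46, 34, 42, 23, 52, 33};
    int target[T_SIZE];
    build_table(option_c, target);
    return count_from(keys, 0, target);
}
```

Under that assumption, `count_valid_sequences()` tries all 720 orderings and returns 30, in agreement with the explanation above.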

4)Consider a hash table of size seven, with starting index zero, and a hash function
(3x + 4)mod7. Assuming the hash table is initially empty, which of the following is the
contents of the table when the sequence 1, 3, 8, 10 is inserted into the table using closed
hashing? Note that ‘_’ denotes an empty location in the table.

A 8, _, _, _, _, _, 10
B 1, 8, 10, _, _, _, 3
C 1, _, _, _, _, _,3
D 1, 10, 8, _, _, _, 3
Answer: B

Explanation

 Let us put values 1, 3, 8, 10 in the hash of size 7. Initially, hash table is empty
- - - - - - -

0 1 2 3 4 5 6

The value of function (3x + 4)mod 7 for 1 is 0, so let us put the value at 0

1 - - - - - -

0 1 2 3 4 5 6

The value of function (3x + 4)mod 7 for 3 is 6, so let us put the value at 6

1 - - - - - 3

0 1 2 3 4 5 6

The value of function (3x + 4)mod 7 for 8 is 0, but 0 is already occupied, let us put the
value(8) at next available space(1)

1 8 - - - - 3

0 1 2 3 4 5 6

The value of function (3x + 4)mod 7 for 10 is 6, but 6 is already occupied, let us put the
value(10) at next available space(2)

1 8 10 - - - 3

0 1 2 3 4 5 6

 5)Consider a hash table with 100 slots. Collisions are resolved using chaining.
Assuming simple uniform hashing, what is the probability that the first 3 slots are
unfilled after the first 3 insertions?

A (97 × 97 × 97)/100^3
B (99 × 98 × 97)/100^3
C (97 × 96 × 95)/100^3
D (97 × 96 × 95)/(3! × 100^3)
Answer:A

6) Which one of the following hash functions on integers will distribute keys most uniformly over 10 buckets numbered 0 to 9 for i ranging from 0 to 2020?

A h(i) = i^2 mod 10
B h(i) = i^3 mod 10
C h(i) = (11 * i^2) mod 10
D h(i) = (12 * i) mod 10


Answer: B
Explanation

since mod 10 is used, the last digit matters. If you do cube all numbers from 0 to 9, you get
following
Number Cube Last Digit in Cube
0 0 0
1 1 1
2 8 8
3 27 7
4 64 4
5 125 5
6 216 6
7 343 3
8 512 2
9 729 9
Therefore all numbers from 0 to 2020 are divided almost equally among the 10 buckets. If we make a table for squares, we don't get an equal distribution. In the following table, 1, 4, 6 and 9 are repeated, so these buckets would have more entries, and buckets 2, 3, 7 and 8 would be empty.
Number Square Last Digit in Square
0 0 0
1 1 1
2 4 4
3 9 9
4 16 6
5 25 5
6 36 6
7 49 9
8 64 4
9 81 1
Alternative approach - using the cycle of last digits as i increases: (a) (0,1,4,9,6,5,6,9,4,1) repeated, (b) (0,1,8,7,4,5,6,3,2,9) repeated, (c) (0,1,4,9,6,5,6,9,4,1) repeated, (d) (0,2,4,6,8) repeated.
So only h(i) = i^3 mod 10 covers all the digits from 0 to 9. Option (B) is correct.
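
The distribution claimed in this explanation can also be checked by counting. The following C sketch is an illustration only; the function names `bucket_counts` and `empty_buckets` and the use of the option letters as a selector are hypothetical.

```c
/* Fill counts[0..9] with the number of values i in 0..max_i that each
   bucket receives under one of the four candidate hash functions. */
void bucket_counts(char option, int max_i, int counts[10]) {
    for (int b = 0; b < 10; b++)
        counts[b] = 0;
    for (long long i = 0; i <= max_i; i++) {   /* long long avoids overflow of i^3 */
        long long h;
        switch (option) {
        case 'A': h = (i * i) % 10; break;
        case 'B': h = (i * i * i) % 10; break;
        case 'C': h = (11 * i * i) % 10; break;
        default:  h = (12 * i) % 10; break;    /* option 'D' */
        }
        counts[h]++;
    }
}

/* Number of buckets that received no key at all. */
int empty_buckets(const int counts[10]) {
    int empty = 0;
    for (int b = 0; b < 10; b++)
        if (counts[b] == 0)
            empty++;
    return empty;
}
```

For i from 0 to 2020, option B fills every bucket (203 keys in bucket 0 and 202 in each of the others), options A and C leave 4 buckets empty, and option D leaves 5 buckets empty.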

7)Given a hash table T with 25 slots that stores 2000 elements, the load factor α for T is
__________
A 80
B 0.0125
C 8000
D 1.25
Answer: A
Explanation: 
load factor = (no. of elements) / (no. of table slots) = 2000/25 = 80

8)Which of the following statement(s) is TRUE?


I. A hash function takes a message of arbitrary length and generates a fixed length code.
II. A hash function takes a message of fixed length and generates a code of variable length.
III. A hash function may give the same hash value for distinct messages.
 
A I only
B II and III only
C I and III only
D II only
Answer: C

9)An advantage of chained hash table (external hashing) over the open addressing scheme is
A Worst case complexity of search operations is less
B Space used is less
C Deletion is easier
D None of the above
Answer: C

10)Insert the characters of the string K R P C S N Y T J M into a hash table of size 10. Use
the hash function
h(x) = ( ord(x) – ord("a") + 1 ) mod10
If linear probing is used to resolve collisions, then the following insertion causes collision
A Y
B C
C M
D P
Answer: C
Explanation: 
(a) The hash table of size 10 has indices 0 to 9. Taking ord("A") = 1, the hash function is h(x) = (ord(x) - ord("A") + 1) mod 10. So for the string K R P C S N Y T J M:
K is inserted at index (11-1+1) mod 10 = 1
R at index (18-1+1) mod 10 = 8
P at index (16-1+1) mod 10 = 6
C at index (3-1+1) mod 10 = 3
S at index (19-1+1) mod 10 = 9
N at index (14-1+1) mod 10 = 4
Y at index (25-1+1) mod 10 = 5
T at index (20-1+1) mod 10 = 0
J at index (10-1+1) mod 10 = 0 // first collision occurs
M at index (13-1+1) mod 10 = 3 // second collision occurs
Only J and M cause collisions; of the listed options, only M appears.
(b) The final hash table will be:
0 T
1 K
2 J
3 C
4 N
5 Y
6 P
7 M
8 R
9 S

Graph Data Structure

A graph is a data structure (V, E) that consists of
• A collection of vertices V
• A collection of edges E, represented as ordered pairs of vertices (u, v)

In the graph,

V = {0, 1, 2, 3}
E = {(0,1), (0,2), (0,3), (1,2)}
G = {V, E}

Graph Terminology
Path
A path can be defined as the sequence of nodes that are followed in order to reach some terminal node V from the initial node U.

Closed Path
A path is called a closed path if its initial node is the same as its terminal node, that is, if V0 = VN.

Cycle
A cycle can be defined as a path that has no repeated edges or vertices except the first and last vertices.

Connected Graph
A connected graph is one in which some path exists between every two vertices (u, v) in V. There are no isolated nodes in a connected graph.

Complete Graph
A complete graph is one in which every node is connected to all other nodes. A complete undirected graph contains n(n-1)/2 edges, where n is the number of nodes in the graph.

Weighted Graph
In a weighted graph, each edge is assigned some data such as a length or weight. The weight of an edge e is written w(e); it is usually a positive value indicating the cost of traversing the edge.

Digraph
A digraph is a directed graph, in which each edge is associated with a direction and can be traversed only in that direction.

Loop
An edge whose two end points are the same node is called a loop.

Adjacent Nodes
If two nodes u and v are connected via an edge e, then the nodes u and v are called neighbours or adjacent nodes.

Degree of the Node

The degree of a node is the number of edges that are connected to that node. A node with degree 0 is called an isolated node.

Graph Representation
By Graph representation, we simply mean the technique which is to be used in order to
store some graph into the computer's memory.

1. Sequential / Adjacency Matrix Representation


In sequential representation, we use an adjacency matrix to store the mapping between vertices and edges. In the adjacency matrix, the rows and columns represent the graph vertices, and an entry records whether an edge exists between the corresponding pair of vertices. For a graph with n vertices, the matrix has dimension n x n.

An undirected graph and its adjacency matrix representation is shown in the following
figure.

A directed graph and its adjacency matrix representation is shown in the following
figure.
The weighted directed graph along with the adjacency matrix representation is shown in
the following figure.

Pros: The representation is easy to implement and follow. Removing an edge takes O(1) time. Queries such as whether there is an edge from vertex 'u' to vertex 'v' are efficient and can be done in O(1) time.

Cons: It consumes more space, O(V^2). Even if the graph is sparse (contains few edges), it consumes the same space. Adding a vertex takes O(V^2) time.
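
A minimal C sketch of the adjacency matrix representation follows. The struct and function names (`graph_matrix`, `gm_init`, `gm_add_edge`, `gm_remove_edge`, `gm_has_edge`, `gm_degree`) and the capacity `GM_MAX` are hypothetical, and the graph is assumed to be undirected and unweighted.

```c
#define GM_MAX 10

struct graph_matrix {
    int n;                       /* number of vertices in use (at most GM_MAX) */
    int adj[GM_MAX][GM_MAX];     /* adj[u][v] == 1 if there is an edge u-v */
};

/* Create a graph with n vertices and no edges. */
void gm_init(struct graph_matrix *g, int n) {
    g->n = n;
    for (int u = 0; u < GM_MAX; u++)
        for (int v = 0; v < GM_MAX; v++)
            g->adj[u][v] = 0;
}

/* Add an undirected edge: both symmetric entries are set. O(1). */
void gm_add_edge(struct graph_matrix *g, int u, int v) {
    g->adj[u][v] = 1;
    g->adj[v][u] = 1;
}

/* Remove an undirected edge. O(1). */
void gm_remove_edge(struct graph_matrix *g, int u, int v) {
    g->adj[u][v] = 0;
    g->adj[v][u] = 0;
}

/* Edge query. O(1). */
int gm_has_edge(const struct graph_matrix *g, int u, int v) {
    return g->adj[u][v];
}

/* Degree of u: number of set entries in row u. O(V). */
int gm_degree(const struct graph_matrix *g, int u) {
    int deg = 0;
    for (int v = 0; v < g->n; v++)
        deg += g->adj[u][v];
    return deg;
}
```

For the graph with V = {0, 1, 2, 3} and E = {(0,1), (0,2), (0,3), (1,2)} given earlier, vertex 0 has degree 3 and vertex 3 has degree 1, and the edge query for (1,3) returns 0.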

2. Linked / Adjacency List Representation

In the linked representation, an adjacency list is used to store the graph in the computer's memory.
Consider the undirected graph shown in the following figure and check the adjacency list representation. The sum of the lengths of the adjacency lists is equal to twice the number of edges in an undirected graph.

Consider the directed graph shown in the following figure and check the adjacency list representation of the graph. In a directed graph, the sum of the lengths of all the adjacency lists is equal to the number of edges in the graph.

In the case of a weighted directed graph, each list node contains an extra field that holds the weight of the corresponding edge. The adjacency list representation of a weighted directed graph is shown in the following figure.
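
The adjacency list representation can be sketched in C as below. The names (`graph_list`, `gl_node`, `gl_init`, `gl_add_edge`, `gl_total_length`) and the capacity `GL_MAX` are hypothetical; the sketch handles an undirected, unweighted graph and, for brevity, does not free the list nodes.

```c
#include <stdlib.h>

#define GL_MAX 10

struct gl_node {
    int vertex;                  /* the neighbour stored in this list node */
    struct gl_node *next;
};

struct graph_list {
    int n;                       /* number of vertices in use (at most GL_MAX) */
    struct gl_node *head[GL_MAX];/* head[u] is the adjacency list of u */
};

/* Create a graph with n vertices and empty adjacency lists. */
void gl_init(struct graph_list *g, int n) {
    g->n = n;
    for (int i = 0; i < GL_MAX; i++)
        g->head[i] = NULL;
}

/* Prepend v to one list. Returns 0 on success, -1 if allocation fails. */
static int gl_push(struct gl_node **head, int v) {
    struct gl_node *node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    node->vertex = v;
    node->next = *head;
    *head = node;
    return 0;
}

/* Add an undirected edge: v goes into u's list and u into v's list. */
int gl_add_edge(struct graph_list *g, int u, int v) {
    if (gl_push(&g->head[u], v) != 0)
        return -1;
    return gl_push(&g->head[v], u);
}

/* Sum of the lengths of all adjacency lists. */
int gl_total_length(const struct graph_list *g) {
    int total = 0;
    for (int u = 0; u < g->n; u++)
        for (const struct gl_node *p = g->head[u]; p != NULL; p = p->next)
            total++;
    return total;
}
```

For an undirected graph with the 4 edges (0,1), (0,2), (0,3), (1,2), `gl_total_length` returns 8, which is twice the number of edges, as stated above.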
Breadth first search
Breadth First Search is a level-wise vertex traversal process. Unlike a tree, a graph may contain cycles, so a traversal can reach the same vertex more than once; to avoid processing a vertex again, BFS keeps track of the vertices it has already visited.

Algorithm:

To implement BFS we use a queue and an array (for the visited marks).

There are two cases in the algorithm:

1. Whenever we visit a vertex, we mark it visited, add its adjacent non-visited vertices to the queue, and remove the current vertex from the queue.
2. If all the adjacent vertices are already visited, we only remove the current vertex from the queue.

The algorithm works as follows:

1. Start by putting any one of the graph's vertices at the back of a queue.
2. Take the front item of the queue and add it to the visited list.
3. Create a list of that vertex's adjacent nodes. Add the ones which aren't in
the visited list to the back of the queue.
4. Keep repeating steps 2 and 3 until the queue is empty.
BFS example
Let's see how the Breadth First Search algorithm works with an example. We use
an undirected graph with 5 vertices.

We start from vertex 0. The BFS algorithm starts by putting it in the Visited list and putting all its adjacent vertices in the queue.

Next, we visit the element at the front of queue i.e. 1 and go to its adjacent
nodes. Since 0 has already been visited, we visit 2 instead.

Vertex 2 has an unvisited adjacent vertex in 4, so we add that to the back of the
queue and visit 3, which is at the front of the queue.
 

Only 4 remains in the queue since the only adjacent node of 3 i.e. 0 is already
visited. We visit it.

Since the queue is empty, we have completed the Breadth First Traversal of the
graph.

C Program
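#include <stdio.h>

#define MAX 20

/* a: adjacency matrix, q: queue, visited: visit marks; vertices are 1..n */
int a[MAX + 1][MAX + 1], q[MAX + 1], visited[MAX + 1], n;

/* Breadth first search starting from vertex v */
void bfs(int v) {
    int f = 0, r = -1, i, u;
    visited[v] = 1;              /* mark on enqueue so no vertex is queued twice */
    q[++r] = v;
    while (f <= r) {
        u = q[f++];              /* take the vertex at the front of the queue */
        for (i = 1; i <= n; i++)
            if (a[u][i] && !visited[i]) {
                visited[i] = 1;
                q[++r] = i;      /* add unvisited neighbour at the back */
            }
    }
}

int main(void) {
    int i, j, v, all = 1;
    printf("\n Enter the number of vertices:");
    if (scanf("%d", &n) != 1 || n < 1 || n > MAX) {
        printf("\n The number of vertices must be between 1 and %d", MAX);
        return 1;
    }
    for (i = 1; i <= n; i++)
        visited[i] = 0;
    printf("\n Enter graph data in matrix form:\n");
    for (i = 1; i <= n; i++)
        for (j = 1; j <= n; j++)
            scanf("%d", &a[i][j]);
    printf("\n Enter the starting vertex:");
    if (scanf("%d", &v) != 1 || v < 1 || v > n) {
        printf("\n The starting vertex must be between 1 and %d", n);
        return 1;
    }
    bfs(v);
    printf("\n The nodes which are reachable are:\n");
    for (i = 1; i <= n; i++)
        if (visited[i])
            printf("%d\t", i);
        else
            all = 0;
    if (!all)
        printf("\n Not all nodes are reachable from the starting vertex");
    return 0;
}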

Example 2

Consider this graph,


According to our algorithm, the traversal continues like,

DFS algorithm
Depth First Search is a depth-wise vertex traversal process. Unlike a tree, a graph may contain cycles, so a traversal can reach the same vertex more than once; to avoid processing a vertex again, DFS keeps track of the vertices it has already visited.
• To implement DFS we use a stack and an array (for the visited marks).

The DFS algorithm works as follows:

1. Start by putting any one of the graph's vertices on top of a stack.


2. Take the top item of the stack and add it to the visited list.
3. Create a list of that vertex's adjacent nodes. Add the ones which aren't in
the visited list to the top of stack.
4. Keep repeating steps 2 and 3 until the stack is empty.
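
The four steps above can be sketched as an iterative C function that uses an explicit stack. The function name `dfs_iterative`, the capacity `DFS_MAX` and the `order` output array are hypothetical. Because a vertex can be pushed by several of its neighbours before it is visited, the sketch skips a popped vertex that is already visited, and it pushes neighbours in reverse order so that the lowest-numbered neighbour is visited first (one possible tie-breaking choice).

```c
#define DFS_MAX 10

/* Iterative depth first search over the adjacency matrix g with n
   vertices (0..n-1), starting at `start`. The visit order is written to
   order[] and the number of visited vertices is returned. */
int dfs_iterative(int g[DFS_MAX][DFS_MAX], int n, int start, int order[DFS_MAX]) {
    int visited[DFS_MAX] = {0};
    int stack[DFS_MAX * DFS_MAX + 1];   /* a vertex may be pushed once per edge */
    int top = 0, count = 0;

    stack[top++] = start;               /* step 1: push the start vertex */
    while (top > 0) {                   /* step 4: repeat until the stack is empty */
        int v = stack[--top];           /* step 2: take the top item */
        if (visited[v])
            continue;                   /* it was pushed more than once */
        visited[v] = 1;
        order[count++] = v;
        /* step 3: push unvisited neighbours, highest number first */
        for (int w = n - 1; w >= 0; w--)
            if (g[v][w] && !visited[w])
                stack[top++] = w;
    }
    return count;
}
```

As a usage example, take a hypothetical undirected graph with 5 vertices and the edges 0-1, 0-2, 0-3, 1-2 and 2-4. Starting from vertex 0, the function visits the vertices in the order 0, 1, 2, 4, 3.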

DFS example
Let's see how the Depth First Search algorithm works with an example. We use
an undirected graph with 5 vertices.
We start from vertex 0, the DFS algorithm starts by putting it in the Visited list and
putting all its adjacent vertices in the stack.

Next, we visit the element at the top of stack i.e. 1 and go to its adjacent nodes.
Since 0 has already been visited, we visit 2 instead.

Vertex 2 has an unvisited adjacent vertex in 4, so we add that to the top of the
stack and visit it.
After we visit the last element 3, it doesn't have any unvisited adjacent nodes, so
we have completed the Depth First Traversal of the graph.

Depth First Search (DFS) Algorithm


n ← number of nodes
Initialize visited[ ] to false (0)
for(i=0;i<n;i++)
    visited[i] = 0;

void DFS(vertex i) [DFS starting from i]
{
    visited[i]=1;
    for each w adjacent to i
        if(!visited[w])
            DFS(w);
}
Depth First Search (DFS) Program in C [Adjacency Matrix]
#include<stdio.h>

void DFS(int);
int G[10][10],visited[10],n;    //n is no of vertices and graph is stored in array G[10][10]

int main(void)
{
    int i,j;
    printf("Enter number of vertices:");
    if(scanf("%d",&n)!=1 || n<1 || n>10)
        return 1;               //the arrays hold at most 10 vertices

    //read the adjacency matrix
    printf("\nEnter adjacency matrix of the graph:");

    for(i=0;i<n;i++)
        for(j=0;j<n;j++)
            scanf("%d",&G[i][j]);

    //visited is initialized to zero
    for(i=0;i<n;i++)
        visited[i]=0;

    DFS(0);
    return 0;
}

//visit vertex i, then recursively visit each unvisited adjacent vertex
void DFS(int i)
{
    int j;
    printf("\n%d",i);
    visited[i]=1;

    for(j=0;j<n;j++)
        if(!visited[j]&&G[i][j]==1)
            DFS(j);
}
