Skip to main content

Hashing in DBMS

Hashing in DBMS

In a huge database structure, it is very inefficient to search all the index values and reach the desired data. Hashing technique is used to calculate the direct location of a data record on the disk without using index structure.

In this technique, data is stored at the data blocks whose address is generated by using the hashing function. The memory location where these records are stored is known as data bucket or data blocks.

In this, a hash function can choose any of the column value to generate the address. Most of the time, the hash function uses the primary key to generate the address of the data block. A hash function is a simple mathematical function to any complex mathematical function. We can even consider the primary key itself as the address of the data block. That means each row whose address will be the same as a primary key stored in the data block.


The above diagram shows data block addresses same as primary key value. This hash function can also be a simple mathematical function like exponential, mod, cos, sin, etc. Suppose we have mod (5) hash function to determine the address of the data block. In this case, it applies mod (5) hash function on the primary keys and generates 3, 3, 1, 4 and 2 respectively, and records are stored in those data block addresses.



Types of Hashing:



Static Hashing

In static hashing, the resultant data bucket address will always be the same. That means if we generate an address for EMP_ID =103 using the hash function mod (5) then it will always result in same bucket address 3. Here, there will be no change in the bucket address.

Hence in this static hashing, the number of data buckets in memory remains constant throughout. In this example, we will have five data buckets in the memory used to store the data.



Operations of Static Hashing

  • Searching a record
When a record needs to be searched, then the same hash function retrieves the address of the bucket where the data is stored.
  • Insert a Record
When a new record is inserted into the table, then we will generate an address for a new record based on the hash key and record is stored in that location.
  • Delete a Record
To delete a record, we will first fetch the record which is supposed to be deleted. Then we will delete the records for that address in memory.
  • Update a Record
To update a record, we will first search it using a hash function, and then the data record is updated.

If we want to insert some new record into the file but the address of a data bucket generated by the hash function is not empty, or data already exists in that address. This situation in the static hashing is known as bucket overflow. This is a critical situation in this method.

To overcome this situation, there are various methods. Some commonly used methods are as follows:


1. Open Hashing
When a hash function generates an address at which data is already stored, then the next bucket will be allocated to it. This mechanism is called as Linear Probing.

For example: suppose R3 is a new address which needs to be inserted, the hash function generates address as 112 for R3. But the generated address is already full. So the system searches next available data bucket, 113 and assigns R3 to it.


2. Close Hashing
When buckets are full, then a new data bucket is allocated for the same hash result and is linked after the previous one. This mechanism is known as Overflow chaining.

For example: Suppose R3 is a new address which needs to be inserted into the table, the hash function generates address as 110 for it. But this bucket is full to store the new data. In this case, a new bucket is inserted at the end of 110 buckets and is linked to it.



Dynamic Hashing
  • The dynamic hashing method is used to overcome the problems of static hashing like bucket overflow.
  • In this method, data buckets grow or shrink as the records increases or decreases. This method is also known as Extendable hashing method. 
  • This method makes hashing dynamic, i.e., it allows insertion or deletion without resulting in poor performance.


How to search a key
  • First, calculate the hash address of the key.
  • Check how many bits are used in the directory, and these bits are called as i.
  • Take the least significant i bits of the hash address. This gives an index of the directory.
  • Now using the index, go to the directory and find bucket address where the record might be.

How to insert a new record
  • Firstly, you have to follow the same procedure for retrieval, ending up in some bucket.
  • If there is still space in that bucket, then place the record in it.
  • If the bucket is full, then we will split the bucket and redistribute the records.


For example:
Consider the following grouping of keys into buckets, depending on the prefix of their hash address:


The last two bits of 2 and 4 are 00. So it will go into bucket B0. The last two bits of 5 and 6 are 01, so it will go into bucket B1. The last two bits of 1 and 3 are 10, so it will go into bucket B2. The last two bits of 7 are 11, so it will go into B3.


Insert key 9 with hash address 10001 into the above structure:
  • Since key 9 has hash address 10001, it must go into the first bucket. But bucket B1 is full, so it will get split.
  • The splitting will separate 5, 9 from 6 since last three bits of 5, 9 are 001, so it will go into bucket B1, and the last three bits of 6 are 101, so it will go into bucket B5.
  • Keys 2 and 4 are still in B0. The record in B0 pointed by the 000 and 100 entry because last two bits of both the entry are 00. 
  • Keys 1 and 3 are still in B2. The record in B2 pointed by the 010 and 110 entry because last two bits of both the entry are 10.
  • Key 7 are still in B3. The record in B3 pointed by the 111 and 011 entry because last two bits of both the entry are 11.



Advantages of dynamic hashing
  • In this method, the performance does not decrease as the data grows in the system. It simply increases the size of memory to accommodate the data.
  • In this method, memory is well utilized as it grows and shrinks with the data. There will not be any unused memory lying.
  • This method is good for the dynamic database where data grows and shrinks frequently.

Disadvantages of dynamic hashing
  • In this method, if the data size increases then the bucket size is also increased. These addresses of data will be maintained in the bucket address table. This is because the data address will keep changing as buckets grow and shrink. If there is a huge increase in data, maintaining the bucket address table becomes tedious. 
  • In this case, the bucket overflow situation will also occur. But it might take little time to reach this situation than static hashing.

Comments

Popular posts from this blog

DBMS Keys

DBMS Keys KEYS in DBMS is an attribute or set of attributes which helps you to identify a row (tuple) uniquely in a relation(table). They allow you to find the relation between two tables. Keys help you uniquely identify a row in a table by a combination of one or more columns in that table. Key is also helpful for finding unique record or row from the table. Database key is also helpful for finding unique record or row from the table. Example: Employee ID FirstName LastName 11 Andrew Johnson 22 Tom Wood 33 Alex Hale In the above-given example, employee ID is a primary key because it uniquely identifies an employee record. In this table, no other employee can have the same employee ID. Here are some reasons for using sql key in the DBMS system. Keys help you to identify any row of data in a table. In a real-world application, a table could contain thousands of records. Moreover, the records could be duplicated. Keys in RDBMS ensure that you can uniquely identify a table record despite ...

Colors in CSS

Ways to declare Colors in CSS (Image by - Sharma Guides | Subham232330) 1. Color Name 2. Hex Value 3. RGB() and RGBA() 4. HSL() and HSLA() 5. HWB() * Color Name:- background-color:red; * HEX Value:- background-color:#001122; * RGB():- background-color:rgb(25,31,52); * RGBA():- background-color:rgba(0,0,0,1.5);          |           Transparency The hexadecimal system uses values from 0 to 255 but in RGB we can use 0% to 100% as well.

Computer Short Questions

Computer Short Questions & Answers: 1. What is any part of the computer that you can physically touch? – Hardware 2. Which generation of computers is still under development? – Fifth 3. What is the most common storage device for the personal computer? – Hard Disk Drive 4. Which key is used in combination with another key to perform a specific task? – Control 5. What is the pattern of printed lines on most products? – Barcodes 6. To make the number pad act as a directional arrow, we press which key? – Shift 7. Which devices let the computer communicate with you? – Input 8. What is the most frequently used piece of hardware for inputting data? – Hardware 9. What is the place where the computer stores programs and data? – Storage unit 10. What is the process of dividing the disk into tracks and sectors? – Formatting 11. What is the space in your computer that loads’ and works with data? – RAM memory 12. What is the storage which stores or retains data after power off? – Non-volatile s...