Hashing
Motivation
• Sequential Searching can be done in O(N) access time,
meaning that the number of seeks grows in proportion to
the size of the file.
• B-Trees improve on this greatly, providing O(log_k N)
access, where k is a measure of the leaf size (i.e., the
number of records that can be stored in a leaf).
• What we would like to achieve, however, is an O(1)
access, which means that no matter how big a file grows,
access to a record always takes the same small number of
seeks.
• Static hashing techniques can achieve such performance,
provided that the file does not grow over time.
What is Hashing?
• A Hash function is a function h(K) which transforms a
key K into an address.
• Hashing is like indexing in that it involves associating
a key with a relative record address.
• Hashing, however, is different from indexing in two
important ways:
– With hashing, there is no obvious connection
between the key and the location.
– With hashing, two different keys may be transformed
to the same address.
Collisions
• When two different keys produce the same
address, there is a collision. The keys involved are
called synonyms.
• Coming up with a hashing function that avoids
collisions is extremely difficult. It is best to simply
find ways to deal with them.
• Possible Solutions:
– Spread out the records
– Use extra memory
– Put more than one record at a single address.
A Simple Hashing Algorithm
• Step 1: Represent the key in
numerical form
• Step 2: Fold and Add
• Step 3: Divide by a prime
number and use the remainder as
the address.
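The three steps above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name, the padding length of 12, the two-character fold, the intermediate modulus 19937, and the table size 101 are all assumptions chosen for the example.

```python
def fold_and_add_hash(key: str, table_size: int = 101,
                      pad_to: int = 12, fold_limit: int = 19937) -> int:
    """Hash a short ASCII key to an address in range(table_size).

    All default parameters are illustrative assumptions; pad_to must be even.
    """
    # Step 1: represent the key numerically as ASCII codes, blank-padded
    # (or truncated) to a fixed length. Upper-casing keeps letter codes
    # at two digits.
    codes = key.upper().ljust(pad_to)[:pad_to].encode("ascii")

    # Step 2: fold the codes into two-character chunks and add them,
    # reducing modulo fold_limit so the running sum stays small.
    total = 0
    for i in range(0, pad_to, 2):
        chunk = codes[i] * 100 + codes[i + 1]  # "LO" -> 76, 79 -> 7679
        total = (total + chunk) % fold_limit

    # Step 3: divide by a prime table size and keep the remainder.
    return total % table_size
```

With these defaults, `fold_and_add_hash("LOWELL")` folds to 13883 and returns 46.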
Hashing Functions and Record Distributions
• Records can be distributed among addresses in
different ways: there may be (a) no synonyms (uniform
distribution); (b) only synonyms (worst case); (c) a few
synonyms (happens with random distributions).
• Purely uniform distributions are difficult to obtain and
may not be worth searching for.
• Random distributions can be easily derived, but they
are not perfect since they may generate a fair number
of synonyms.
• We want better hashing methods.
Some Other Hashing Methods
• Though there is no hash function that guarantees
better-than-random distributions in all cases, by
taking into consideration the keys that are being
hashed, certain improvements are possible.
• Here are some methods that are potentially better
than random:
– Examine keys for a pattern
– Fold parts of the key
– Divide the key by a number
– Square the key and take the middle digits
– Radix transformation
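Two of the methods listed above, squaring the key and radix transformation, can be sketched as follows. The function names, the choice of two middle digits, base 11, and an address space of 99 are assumptions made for illustration only.

```python
def mid_square_hash(key: int, digits: int = 2) -> int:
    """Square the key and take `digits` digits from the middle."""
    squared = str(key * key)
    start = max(0, (len(squared) - digits) // 2)
    return int(squared[start:start + digits])


def radix_transform_hash(key: int, base: int = 11, table_size: int = 99) -> int:
    """Rewrite the key in another base, read the digits back as a
    decimal number, and divide by the address-space size.

    Simplification: a base-11 digit of 10 is folded in as the value 10.
    """
    digits = []
    n = key
    while n > 0:
        n, d = divmod(n, base)
        digits.append(d)
    value = 0
    for d in reversed(digits):
        value = value * 10 + d
    return value % table_size
```

For the key 453, the square is 205209, so the middle two digits give address 52; in base 11 the key reads 382, and 382 mod 99 gives address 85.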
Predicting the Distribution of Records
• When using a random distribution, we can use a
number of mathematical tools to obtain
conservative estimates of how our hashing
function is likely to behave:
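• Using the Poisson function $p(x) = \frac{(r/N)^x \, e^{-(r/N)}}{x!}$,
where r is the number of records and N is the number of
available addresses, we can estimate the probability p(x)
that a given address has exactly x records assigned to it.
• In general, if there are N addresses, then the
expected number of addresses with x records
assigned to them is N p(x).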
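As a minimal sketch of this estimate, the Poisson function can be evaluated directly. The function names `poisson` and `expected_addresses` are illustrative and not taken from the slides; r is the number of records and N the number of addresses.

```python
import math


def poisson(x: int, records: int, addresses: int) -> float:
    """Probability that a given address receives exactly x records."""
    ratio = records / addresses  # r/N
    return ratio ** x * math.exp(-ratio) / math.factorial(x)


def expected_addresses(x: int, records: int, addresses: int) -> float:
    """Expected number of addresses holding exactly x records: N * p(x)."""
    return addresses * poisson(x, records, addresses)
```

For example, with 500 records and 1,000 addresses, `expected_addresses(0, 500, 1000)` is about 607, so roughly 607 addresses are expected to stay empty.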
Predicting Collisions for a Full File
• Suppose you have a hashing function that you
believe will distribute records randomly and you
want to store 10,000 records in 10,000 addresses.
• How many addresses do you expect to have no
records assigned to them?
• How many addresses should have one, two, and
three records assigned respectively?
• How can we reduce the number of overflow
records?
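A worked estimate for the first two questions, assuming the Poisson function p(x) = (r/N)^x e^{-(r/N)} / x! with r = N = 10,000 (so r/N = 1). Values are rounded to the nearest whole address.

```latex
\begin{align*}
N\,p(0) &= 10000 \cdot \frac{1^0 e^{-1}}{0!} \approx 3679 \\
N\,p(1) &= 10000 \cdot \frac{1^1 e^{-1}}{1!} \approx 3679 \\
N\,p(2) &= 10000 \cdot \frac{1^2 e^{-1}}{2!} \approx 1839 \\
N\,p(3) &= 10000 \cdot \frac{1^3 e^{-1}}{3!} \approx 613
\end{align*}
```

Under this model about 37% of the addresses stay empty even though the file is full. If each address holds only one record, about 3,679 records overflow (10,000 records minus the roughly 6,321 occupied addresses).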
Increasing Memory Space I
• Reducing collisions can be done by choosing a good
hashing function or using extra memory.
• The question asked here is: how much extra memory
should be used to obtain a given rate of collision
reduction?
• Definition: Packing density refers to the ratio of the
number of records to be stored (r) to the number of
available spaces (N).
• The packing density gives a measure of the amount of
space in a file that is used.
Increasing Memory Space II
• The Poisson Distribution allows us to predict the number
of collisions that are likely to occur given a certain packing
density. We use the Poisson Distribution to answer the
following questions:
• How many addresses should have no records assigned to
them?
• How many addresses should have exactly one record
assigned (no synonym)?
• How many addresses should have one record plus one or
more synonyms?
• Assuming that only one record can be assigned to each home
address, how many overflow records can be expected?
• What percentage of records should be overflow records?
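The questions above can be answered with a short calculation. The sketch below assumes a random (Poisson) distribution and one record per home address; the function name and the dictionary keys are illustrative.

```python
import math


def overflow_estimate(records: int, addresses: int) -> dict:
    """Poisson-based estimates for a file with one record per address."""
    density = records / addresses          # packing density r/N
    p0 = math.exp(-density)                # p(0): address stays empty
    p1 = density * math.exp(-density)      # p(1): exactly one record

    empty = addresses * p0
    exactly_one = addresses * p1
    with_synonyms = addresses - empty - exactly_one
    # Every non-empty address stores one record at home; the rest overflow.
    overflow = records - (addresses - empty)
    return {
        "empty_addresses": empty,
        "one_record_addresses": exactly_one,
        "addresses_with_synonyms": with_synonyms,
        "overflow_records": overflow,
        "overflow_percent": 100.0 * overflow / records,
    }
```

For 500 records in 1,000 addresses (packing density 0.5), this gives roughly 607 empty addresses, 303 addresses with exactly one record, 90 addresses with synonyms, and about 107 overflow records, or about 21% of the file.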
Collision Resolution by Progressive Overflow
• How do we deal with records that cannot fit into their home
address? A simple approach: Progressive Overflow or
Linear Probing.
• If a key, k1, hashes into the same address, a1, as another
key, k2, then look for the first available address, a2,
following a1 and place k1 in a2. If the end of the address
space is reached, then wrap around it.
• When searching for a key that is not in the file, the search
stops when it reaches an empty address or, if the address
space is full, when it comes back to where it began.
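A minimal in-memory sketch of progressive overflow, with a Python list standing in for the file's address space. The toy hash function `home_address` (sum of the key's bytes modulo the table size) and the function names are assumptions for illustration.

```python
from typing import List, Optional


def home_address(key: str, size: int) -> int:
    """Toy hash function (an assumption): sum of the key's bytes mod size."""
    return sum(key.encode("utf-8")) % size


def insert(table: List[Optional[str]], key: str) -> int:
    """Place key at its home address or the first free address after it."""
    size = len(table)
    start = home_address(key, size)
    for step in range(size):
        addr = (start + step) % size  # wrap around at the end
        if table[addr] is None:
            table[addr] = key
            return addr
    raise RuntimeError("address space is full")


def search(table: List[Optional[str]], key: str) -> Optional[int]:
    """Return the address of key, or None if it is not in the table."""
    size = len(table)
    start = home_address(key, size)
    for step in range(size):
        addr = (start + step) % size
        if table[addr] is None:   # empty address: key cannot be further on
            return None
        if table[addr] == key:
            return addr
    return None                   # came back to where the search began
```

With a table of 7 addresses, the keys "ab" and "ba" are synonyms with home address 6: "ab" is stored at 6, and "ba" wraps around to address 0.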
Search Length when using Progressive Overflow
• Progressive Overflow causes extra searches and
thus extra disk accesses.
• If there are many collisions, then many records
will be far from “home”.
• Definitions: Search length refers to the number of
accesses required to retrieve a record from
secondary memory. The average search length is
the average number of times you can expect to
have to access the disk to retrieve a record.
• Average search length = (Total search length)/
(Total number of records)
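The formula above can be sketched as follows, assuming progressive overflow so that a record's search length is its distance from its home address plus one. The record names and addresses in the usage note are hypothetical.

```python
from typing import Dict


def average_search_length(home: Dict[str, int],
                          actual: Dict[str, int],
                          table_size: int) -> float:
    """Average number of accesses needed to retrieve a record."""
    total = 0
    for key, home_addr in home.items():
        # One access for the home address plus one per address probed
        # after it; the modulo handles wrap-around.
        total += (actual[key] - home_addr) % table_size + 1
    return total / len(home)
```

For example, five records with home addresses 20, 21, 21, 22, 20 that end up stored at addresses 20 through 24 have search lengths 1, 1, 2, 2, and 5, for an average of 11/5 = 2.2.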
Storing More than One Record per Address: Buckets
• Definition: A bucket describes a block of records
sharing the same address that is retrieved in one
disk access.
• When a record is to be stored or retrieved, its
home bucket address is determined by hashing.
When a bucket is filled, we still have to worry
about the record overflow problem, but this occurs
much less often than when each address can hold
only one record.
Effect of Buckets on Performance
• To compute how densely packed a file is, we need
to consider 1) the number of addresses (buckets), N;
2) the number of records we can put at each address
(the bucket size), b; and 3) the number of records, r.
Then, Packing Density = r/(bN).
• Though the packing density does not change when
halving the number of addresses and doubling the
size of the buckets, the expected number of
overflows decreases dramatically.
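The effect can be checked numerically. The sketch below assumes a Poisson distribution of records over addresses and counts every record beyond the bucket capacity as an overflow; the function name and the cut-off `max_x` are illustrative.

```python
import math


def expected_overflow(records: int, addresses: int, bucket_size: int,
                      max_x: int = 100) -> float:
    """Expected number of records that do not fit in their home bucket."""
    ratio = records / addresses      # r/N, records per address
    p = math.exp(-ratio)             # p(0)
    overflow_per_address = 0.0
    for x in range(1, max_x + 1):
        p *= ratio / x               # p(x) computed from p(x - 1)
        if x > bucket_size:
            # An address with x records overflows x - b of them.
            overflow_per_address += (x - bucket_size) * p
    return addresses * overflow_per_address
```

For 750 records at a packing density of 0.75, `expected_overflow(750, 1000, 1)` is about 222 records, while `expected_overflow(750, 500, 2)` is about 140, even though both layouts use the same amount of space.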
Making Deletions
• Deleting a record from a hashed file is more
complicated than adding a record for two reasons:
– The slot freed by the deletion must not be allowed to
hinder later searches
– It should be possible to reuse the freed slot for later
additions.
• In order to deal with deletions we use tombstones, i.e.,
markers indicating that a record once lived there but
no longer does. Tombstones solve both of the problems
caused by deletion.
• Insertion of records is slightly different when using
tombstones.
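A minimal sketch of deletion with tombstones under progressive overflow. The toy hash function, the `TOMBSTONE` marker object, and the function names are assumptions for illustration. Insertion differs in that it may reuse the first tombstone it passes, but it must keep probing until an empty address to be sure the key is not already stored further on.

```python
TOMBSTONE = object()  # marks a slot whose record was deleted


def home_address(key, size):
    """Toy hash function (an assumption): sum of the key's bytes mod size."""
    return sum(key.encode("utf-8")) % size


def search(table, key):
    size = len(table)
    start = home_address(key, size)
    for step in range(size):
        addr = (start + step) % size
        slot = table[addr]
        if slot is None:          # truly empty: stop searching
            return None
        if slot is not TOMBSTONE and slot == key:
            return addr           # tombstones are skipped, not stopped at
    return None


def delete(table, key):
    addr = search(table, key)
    if addr is None:
        return False
    table[addr] = TOMBSTONE       # keep later searches from stopping early
    return True


def insert(table, key):
    size = len(table)
    start = home_address(key, size)
    first_free = None
    for step in range(size):
        addr = (start + step) % size
        slot = table[addr]
        if slot is TOMBSTONE:
            if first_free is None:
                first_free = addr  # remember the slot, keep checking
        elif slot is None:
            if first_free is None:
                first_free = addr
            break
        elif slot == key:
            return addr            # key is already stored
    if first_free is None:
        raise RuntimeError("address space is full")
    table[first_free] = key
    return first_free
```

In a table of 7 addresses, deleting "ab" from address 6 leaves a tombstone, so a search for its synonym "ba" at address 0 still succeeds, and a later insertion of "zz" (home address 6) reuses the freed slot.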
Effects of Deletions and Additions on Performance
• After a large number of deletions and additions
have taken place, one can expect to find many
tombstones occupying places that could be
occupied by records whose home address precedes
them but that are stored after them.
• This deteriorates average search lengths.
• There are 3 types of solutions for dealing with this
problem: a) local reorganization during deletions;
b) global reorganization when the average search
length is too large; c) use of a different collision
resolution algorithm.
Other Collision Resolution Techniques
• There are a few variations on random hashing that
may improve performance:
– Double Hashing: When an overflow occurs, use a
second hashing function to map the record to its
overflow location.
– Chained Progressive Overflow: Like Progressive
overflow except that synonyms are linked together with
pointers.
– Chaining with a Separate Overflow Area: Like
chained progressive overflow except that overflow
addresses do not occupy home addresses.
– Scatter Tables: The hash file contains no records, but
only pointers to records; that is, it is an index.
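As one illustration of these variations, double hashing can be sketched as follows. This is one common formulation, not necessarily the one intended by the slides: the second hash function yields a step size that is added repeatedly to the home address. The functions `h1` and `h2`, integer keys, and a prime table size are all assumptions.

```python
def h1(key: int, size: int) -> int:
    """Home address (assumed hash function)."""
    return key % size


def h2(key: int, size: int) -> int:
    """Step size in 1..size-1; co-prime with size when size is prime."""
    return 1 + (key % (size - 1))


def insert(table: list, key: int) -> int:
    """Store key at its home address or, on overflow, at the first free
    address reached by stepping h2(key) addresses at a time."""
    size = len(table)
    addr = h1(key, size)
    step = h2(key, size)
    for _ in range(size):
        if table[addr] is None:
            table[addr] = key
            return addr
        addr = (addr + step) % size
    raise RuntimeError("address space is full")
```

In a table of 7 addresses, keys 10, 17, and 24 all have home address 3, but their overflow locations differ: 17 steps by 6 to address 2 and 24 steps by 1 to address 4, so synonyms do not pile up in adjacent slots.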
Pattern of Record Access
• If we have some information about what records
get accessed most often, we can optimize their
location so that these records will have short
search lengths.
• By doing this, we try to decrease the effective
average search length even if the nominal average
search length remains the same.
• This principle is related to the one used in
Huffman encoding.
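The difference between nominal and effective average search length can be sketched as a weighted average. The function name and the access counts in the usage note are hypothetical.

```python
from typing import Dict


def effective_average_search_length(search_lengths: Dict[str, int],
                                    access_counts: Dict[str, int]) -> float:
    """Average search length weighted by how often each record is accessed."""
    total_accesses = sum(access_counts[key] for key in search_lengths)
    weighted = sum(search_lengths[key] * access_counts[key]
                   for key in search_lengths)
    return weighted / total_accesses
```

For three records with search lengths 1, 1, and 3, the nominal average is 5/3, about 1.67. If the two records with search length 1 receive 80 and 15 of every 100 accesses and the third receives 5, the effective average drops to (80 + 15 + 15)/100 = 1.1.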

More Related Content

What's hot

Database , 8 Query Optimization
Database , 8 Query OptimizationDatabase , 8 Query Optimization
Database , 8 Query OptimizationAli Usman
 
FILE STRUCTURE IN DBMS
FILE STRUCTURE IN DBMSFILE STRUCTURE IN DBMS
FILE STRUCTURE IN DBMS
Abhishek Dutta
 
Web mining
Web mining Web mining
Web mining
TeklayBirhane
 
Nosql-Module 1 PPT.pptx
Nosql-Module 1 PPT.pptxNosql-Module 1 PPT.pptx
Nosql-Module 1 PPT.pptx
Radhika R
 
Page replacement algorithms
Page replacement algorithmsPage replacement algorithms
Page replacement algorithms
sangrampatil81
 
Data structure power point presentation
Data structure power point presentation Data structure power point presentation
Data structure power point presentation
Anil Kumar Prajapati
 
Chapter 9 Operating Systems silberschatz
Chapter 9 Operating Systems silberschatzChapter 9 Operating Systems silberschatz
Chapter 9 Operating Systems silberschatz
GiulianoRanauro
 
Indexing and Hashing
Indexing and HashingIndexing and Hashing
Indexing and Hashing
sathish sak
 
Adbms 40 heuristics in query optimization
Adbms 40 heuristics in query optimizationAdbms 40 heuristics in query optimization
Adbms 40 heuristics in query optimization
Vaibhav Khanna
 
Presentation on Segmentation
Presentation on SegmentationPresentation on Segmentation
Presentation on Segmentation
Priyanka bisht
 
Data cube
Data cubeData cube
Data cube
Hitesh Mohapatra
 
database recovery techniques
database recovery techniques database recovery techniques
database recovery techniques Kalhan Liyanage
 
Swap-space Management
Swap-space ManagementSwap-space Management
Swap-space Management
Agnas Jasmine
 
Os Swapping, Paging, Segmentation and Virtual Memory
Os Swapping, Paging, Segmentation and Virtual MemoryOs Swapping, Paging, Segmentation and Virtual Memory
Os Swapping, Paging, Segmentation and Virtual Memory
sgpraju
 
Data cube computation
Data cube computationData cube computation
Data cube computationRashmi Sheikh
 
File organization 1
File organization 1File organization 1
File organization 1
Rupali Rana
 
Concurrency control
Concurrency control Concurrency control
Concurrency control
Abdelrahman Almassry
 
Data storage and indexing
Data storage and indexingData storage and indexing
Data storage and indexing
pradeepa velmurugan
 

What's hot (20)

Database , 8 Query Optimization
Database , 8 Query OptimizationDatabase , 8 Query Optimization
Database , 8 Query Optimization
 
FILE STRUCTURE IN DBMS
FILE STRUCTURE IN DBMSFILE STRUCTURE IN DBMS
FILE STRUCTURE IN DBMS
 
Web mining
Web mining Web mining
Web mining
 
Nosql-Module 1 PPT.pptx
Nosql-Module 1 PPT.pptxNosql-Module 1 PPT.pptx
Nosql-Module 1 PPT.pptx
 
Page replacement algorithms
Page replacement algorithmsPage replacement algorithms
Page replacement algorithms
 
Data structure power point presentation
Data structure power point presentation Data structure power point presentation
Data structure power point presentation
 
Chapter 9 Operating Systems silberschatz
Chapter 9 Operating Systems silberschatzChapter 9 Operating Systems silberschatz
Chapter 9 Operating Systems silberschatz
 
Indexing and Hashing
Indexing and HashingIndexing and Hashing
Indexing and Hashing
 
Lecture 1 ddbms
Lecture 1 ddbmsLecture 1 ddbms
Lecture 1 ddbms
 
Adbms 40 heuristics in query optimization
Adbms 40 heuristics in query optimizationAdbms 40 heuristics in query optimization
Adbms 40 heuristics in query optimization
 
Presentation on Segmentation
Presentation on SegmentationPresentation on Segmentation
Presentation on Segmentation
 
Data cube
Data cubeData cube
Data cube
 
database recovery techniques
database recovery techniques database recovery techniques
database recovery techniques
 
Swap-space Management
Swap-space ManagementSwap-space Management
Swap-space Management
 
Os Swapping, Paging, Segmentation and Virtual Memory
Os Swapping, Paging, Segmentation and Virtual MemoryOs Swapping, Paging, Segmentation and Virtual Memory
Os Swapping, Paging, Segmentation and Virtual Memory
 
Data cube computation
Data cube computationData cube computation
Data cube computation
 
Paging and segmentation
Paging and segmentationPaging and segmentation
Paging and segmentation
 
File organization 1
File organization 1File organization 1
File organization 1
 
Concurrency control
Concurrency control Concurrency control
Concurrency control
 
Data storage and indexing
Data storage and indexingData storage and indexing
Data storage and indexing
 

Similar to Hashing

FS-Mod5.pptx
FS-Mod5.pptxFS-Mod5.pptx
FS-Mod5.pptx
PurushottamPurshi
 
Data Indexing Presentation-My.pptppt.ppt
Data Indexing Presentation-My.pptppt.pptData Indexing Presentation-My.pptppt.ppt
Data Indexing Presentation-My.pptppt.ppt
sdsm2
 
Designing data intensive applications
Designing data intensive applicationsDesigning data intensive applications
Designing data intensive applications
Hemchander Sannidhanam
 
Elasticsearch Arcihtecture & What's New in Version 5
Elasticsearch Arcihtecture & What's New in Version 5Elasticsearch Arcihtecture & What's New in Version 5
Elasticsearch Arcihtecture & What's New in Version 5
Burak TUNGUT
 
Data Distribution Theory
Data Distribution TheoryData Distribution Theory
Data Distribution Theory
William LaForest
 
Probabilistic data structures. Part 4. Similarity
Probabilistic data structures. Part 4. SimilarityProbabilistic data structures. Part 4. Similarity
Probabilistic data structures. Part 4. Similarity
Andrii Gakhov
 
Chap11 slides
Chap11 slidesChap11 slides
Chap11 slides
BaliThorat1
 
Inverted index
Inverted indexInverted index
Inverted index
Krishna Gehlot
 
Hashing
HashingHashing
Heap Memory Management.pptx
Heap Memory Management.pptxHeap Memory Management.pptx
Heap Memory Management.pptx
Viji B
 
hasing introduction.pptx
hasing introduction.pptxhasing introduction.pptx
hasing introduction.pptx
vvwaykule
 
The Hows and Whys of a Distributed SQL Database - Strange Loop 2017
The Hows and Whys of a Distributed SQL Database - Strange Loop 2017The Hows and Whys of a Distributed SQL Database - Strange Loop 2017
The Hows and Whys of a Distributed SQL Database - Strange Loop 2017
Alex Robinson
 
Lecture 15 run timeenvironment_2
Lecture 15 run timeenvironment_2Lecture 15 run timeenvironment_2
Lecture 15 run timeenvironment_2
Iffat Anjum
 
chapter6.ppt
chapter6.pptchapter6.ppt
chapter6.ppt
2020CE19
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive Queries
DataWorks Summit
 
CS 2212- UNIT -4.pptx
CS 2212-  UNIT -4.pptxCS 2212-  UNIT -4.pptx
CS 2212- UNIT -4.pptx
LilyMkayula
 
Allocation and free space management
Allocation and free space managementAllocation and free space management
Allocation and free space management
rajshreemuthiah
 
Disperse xlator ramon_datalab
Disperse xlator ramon_datalabDisperse xlator ramon_datalab
Disperse xlator ramon_datalab
Gluster.org
 
Hash table methods
Hash table methodsHash table methods
Hash table methodsunyil96
 

Similar to Hashing (20)

FS-Mod5.pptx
FS-Mod5.pptxFS-Mod5.pptx
FS-Mod5.pptx
 
Data Indexing Presentation-My.pptppt.ppt
Data Indexing Presentation-My.pptppt.pptData Indexing Presentation-My.pptppt.ppt
Data Indexing Presentation-My.pptppt.ppt
 
Designing data intensive applications
Designing data intensive applicationsDesigning data intensive applications
Designing data intensive applications
 
Elasticsearch Arcihtecture & What's New in Version 5
Elasticsearch Arcihtecture & What's New in Version 5Elasticsearch Arcihtecture & What's New in Version 5
Elasticsearch Arcihtecture & What's New in Version 5
 
Data Distribution Theory
Data Distribution TheoryData Distribution Theory
Data Distribution Theory
 
Probabilistic data structures. Part 4. Similarity
Probabilistic data structures. Part 4. SimilarityProbabilistic data structures. Part 4. Similarity
Probabilistic data structures. Part 4. Similarity
 
Chap11 slides
Chap11 slidesChap11 slides
Chap11 slides
 
Inverted index
Inverted indexInverted index
Inverted index
 
Hashing
HashingHashing
Hashing
 
Heap Memory Management.pptx
Heap Memory Management.pptxHeap Memory Management.pptx
Heap Memory Management.pptx
 
hasing introduction.pptx
hasing introduction.pptxhasing introduction.pptx
hasing introduction.pptx
 
12-6810-12.ppt
12-6810-12.ppt12-6810-12.ppt
12-6810-12.ppt
 
The Hows and Whys of a Distributed SQL Database - Strange Loop 2017
The Hows and Whys of a Distributed SQL Database - Strange Loop 2017The Hows and Whys of a Distributed SQL Database - Strange Loop 2017
The Hows and Whys of a Distributed SQL Database - Strange Loop 2017
 
Lecture 15 run timeenvironment_2
Lecture 15 run timeenvironment_2Lecture 15 run timeenvironment_2
Lecture 15 run timeenvironment_2
 
chapter6.ppt
chapter6.pptchapter6.ppt
chapter6.ppt
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive Queries
 
CS 2212- UNIT -4.pptx
CS 2212-  UNIT -4.pptxCS 2212-  UNIT -4.pptx
CS 2212- UNIT -4.pptx
 
Allocation and free space management
Allocation and free space managementAllocation and free space management
Allocation and free space management
 
Disperse xlator ramon_datalab
Disperse xlator ramon_datalabDisperse xlator ramon_datalab
Disperse xlator ramon_datalab
 
Hash table methods
Hash table methodsHash table methods
Hash table methods
 

More from Devyani Vaidya

Fundamental file structure concepts & managing files of records
Fundamental file structure concepts & managing files of recordsFundamental file structure concepts & managing files of records
Fundamental file structure concepts & managing files of records
Devyani Vaidya
 
Introduction to the design and specification of file structures
Introduction to the design and specification of file structuresIntroduction to the design and specification of file structures
Introduction to the design and specification of file structures
Devyani Vaidya
 
Mobile Phone Cloning
 Mobile Phone Cloning Mobile Phone Cloning
Mobile Phone Cloning
Devyani Vaidya
 
Data warehousing
Data warehousingData warehousing
Data warehousing
Devyani Vaidya
 
secued cloud
 secued cloud secued cloud
secued cloud
Devyani Vaidya
 
Cloud Cmputing Security
Cloud Cmputing SecurityCloud Cmputing Security
Cloud Cmputing Security
Devyani Vaidya
 
Cloud Security
Cloud SecurityCloud Security
Cloud Security
Devyani Vaidya
 
Wireless network
Wireless networkWireless network
Wireless network
Devyani Vaidya
 
Environmental law
Environmental lawEnvironmental law
Environmental law
Devyani Vaidya
 
Wireless mobile charging using microwaves
Wireless mobile charging using microwavesWireless mobile charging using microwaves
Wireless mobile charging using microwaves
Devyani Vaidya
 
Secure Cloud Issues
Secure Cloud IssuesSecure Cloud Issues
Secure Cloud Issues
Devyani Vaidya
 
Energy Harvesing Through Reverse Electrowetting
Energy Harvesing Through Reverse Electrowetting Energy Harvesing Through Reverse Electrowetting
Energy Harvesing Through Reverse Electrowetting
Devyani Vaidya
 
Wireless Charging Of Mobile
Wireless Charging Of Mobile  Wireless Charging Of Mobile
Wireless Charging Of Mobile
Devyani Vaidya
 
Applet programming
Applet programming Applet programming
Applet programming
Devyani Vaidya
 
Seminar on telephone directory
Seminar on telephone directorySeminar on telephone directory
Seminar on telephone directory
Devyani Vaidya
 
History of Laptop
History of LaptopHistory of Laptop
History of Laptop
Devyani Vaidya
 
Ppt on open and close door using Applet
Ppt on open and close door using Applet Ppt on open and close door using Applet
Ppt on open and close door using Applet
Devyani Vaidya
 
Resource management
Resource managementResource management
Resource management
Devyani Vaidya
 
Ppt on use of biomatrix in secure e trasaction
Ppt on use of biomatrix in secure e trasactionPpt on use of biomatrix in secure e trasaction
Ppt on use of biomatrix in secure e trasaction
Devyani Vaidya
 
Secued Cloud
 Secued  Cloud Secued  Cloud
Secued Cloud
Devyani Vaidya
 

More from Devyani Vaidya (20)

Fundamental file structure concepts & managing files of records
Fundamental file structure concepts & managing files of recordsFundamental file structure concepts & managing files of records
Fundamental file structure concepts & managing files of records
 
Introduction to the design and specification of file structures
Introduction to the design and specification of file structuresIntroduction to the design and specification of file structures
Introduction to the design and specification of file structures
 
Mobile Phone Cloning
 Mobile Phone Cloning Mobile Phone Cloning
Mobile Phone Cloning
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
secued cloud
 secued cloud secued cloud
secued cloud
 
Cloud Cmputing Security
Cloud Cmputing SecurityCloud Cmputing Security
Cloud Cmputing Security
 
Cloud Security
Cloud SecurityCloud Security
Cloud Security
 
Wireless network
Wireless networkWireless network
Wireless network
 
Environmental law
Environmental lawEnvironmental law
Environmental law
 
Wireless mobile charging using microwaves
Wireless mobile charging using microwavesWireless mobile charging using microwaves
Wireless mobile charging using microwaves
 
Secure Cloud Issues
Secure Cloud IssuesSecure Cloud Issues
Secure Cloud Issues
 
Energy Harvesing Through Reverse Electrowetting
Energy Harvesing Through Reverse Electrowetting Energy Harvesing Through Reverse Electrowetting
Energy Harvesing Through Reverse Electrowetting
 
Wireless Charging Of Mobile
Wireless Charging Of Mobile  Wireless Charging Of Mobile
Wireless Charging Of Mobile
 
Applet programming
Applet programming Applet programming
Applet programming
 
Seminar on telephone directory
Seminar on telephone directorySeminar on telephone directory
Seminar on telephone directory
 
History of Laptop
History of LaptopHistory of Laptop
History of Laptop
 
Ppt on open and close door using Applet
Ppt on open and close door using Applet Ppt on open and close door using Applet
Ppt on open and close door using Applet
 
Resource management
Resource managementResource management
Resource management
 
Ppt on use of biomatrix in secure e trasaction
Ppt on use of biomatrix in secure e trasactionPpt on use of biomatrix in secure e trasaction
Ppt on use of biomatrix in secure e trasaction
 
Secued Cloud
 Secued  Cloud Secued  Cloud
Secued Cloud
 

Recently uploaded

Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
Vikramjit Singh
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
TechSoup
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
EugeneSaldivar
 
Introduction to Quality Improvement Essentials
Introduction to Quality Improvement EssentialsIntroduction to Quality Improvement Essentials
Introduction to Quality Improvement Essentials
Excellence Foundation for South Sudan
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
Jisc
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
Anna Sz.
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
Balvir Singh
 
Fish and Chips - have they had their chips
Fish and Chips - have they had their chipsFish and Chips - have they had their chips
Fish and Chips - have they had their chips
GeoBlogs
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
kaushalkr1407
 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
Celine George
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
Delapenabediema
 
Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)
rosedainty
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
Jheel Barad
 
Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......
Ashokrao Mane college of Pharmacy Peth-Vadgaon
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
Pavel ( NSTU)
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
beazzy04
 
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCECLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
BhavyaRajput3
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
Jisc
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
Vivekanand Anglo Vedic Academy
 
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
Nguyen Thanh Tu Collection
 

Recently uploaded (20)

Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
 
Introduction to Quality Improvement Essentials
Introduction to Quality Improvement EssentialsIntroduction to Quality Improvement Essentials
Introduction to Quality Improvement Essentials
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
 
Fish and Chips - have they had their chips
Fish and Chips - have they had their chipsFish and Chips - have they had their chips
Fish and Chips - have they had their chips
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
 
Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
 
Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
 
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCECLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
 
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
 

Hashing

  • 2. 2 Motivation • Sequential Searching can be done in O(N) access time, meaning that the number of seeks grows in proportion to the size of the file. • B-Trees improve on this greatly, providing O(Logk N) access where k is a measure of the leaf size (i.e., the number of records that can be stored in a leaf). • What we would like to achieve, however, is an O(1) access, which means that no matter how big a file grows, access to a record always takes the same small number of seeks. • Static Hashing techniques can achieve such performance provided that the file does not increase in time.
  • 3. What is Hashing? • A hash function is a function h(K) which transforms a key K into an address. • Hashing is like indexing in that it involves associating a key with a relative record address. • Hashing, however, is different from indexing in two important ways: – With hashing, there is no obvious connection between the key and the location. – With hashing, two different keys may be transformed to the same address.
  • 4. Collisions • When two different keys produce the same address, there is a collision. The keys involved are called synonyms. • Coming up with a hashing function that avoids collisions is extremely difficult. It is best to simply find ways to deal with them. • Possible Solutions: – Spread out the records – Use extra memory – Put more than one record at a single address.
  • 5. A Simple Hashing Algorithm • Step 1: Represent the key in numerical form • Step 2: Fold and Add • Step 3: Divide by a prime number and use the remainder as the address.
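The three steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, the padding length of 12 characters, the use of ASCII codes, the two-character fold width, and the table size of 101 are all choices made for this example and are not given on the slide.

```python
def fold_and_add_hash(key: str, table_size: int = 101, pad_to: int = 12) -> int:
    """Map a string key to an address in range(table_size)."""
    # Step 1: represent the key in numerical form. Here (an assumption) the key
    # is upper-cased, padded or truncated to a fixed width, and each character
    # is replaced by its ASCII code.
    padded = key.upper().ljust(pad_to)[:pad_to]
    codes = [ord(ch) for ch in padded]

    # Step 2: fold and add. Adjacent codes are paired into one number
    # (e.g. 76 and 79 become 7679) and the pairs are summed.
    total = 0
    for i in range(0, len(codes), 2):
        second = codes[i + 1] if i + 1 < len(codes) else 0
        total += codes[i] * 100 + second

    # Step 3: divide by a prime number and use the remainder as the address.
    return total % table_size


print(fold_and_add_hash("LOWELL"))
```

Python integers do not overflow, so the running sum needs no intermediate reduction here; an implementation with fixed-width integers would have to bound the sum while folding.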
  • 6. Hashing Functions and Record Distributions • Records can be distributed among addresses in different ways: there may be (a) no synonyms (uniform distribution); (b) only synonyms (worst case); (c) a few synonyms (happens with random distributions). • Purely uniform distributions are difficult to obtain and may not be worth searching for. • Random distributions can be easily derived, but they are not perfect since they may generate a fair number of synonyms. • We want better hashing methods.
  • 7. Some Other Hashing Methods • Though there is no hash function that guarantees better-than-random distributions in all cases, certain improvements are possible by taking into consideration the keys that are being hashed. • Here are some methods that are potentially better than random: – Examine keys for a pattern – Fold parts of the key – Divide the key by a number – Square the key and take the middle – Radix transformation
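Two of the listed methods, "square the key and take the middle" and "radix transformation", might look roughly like this in Python. The function names, the integer keys, the base of 11, and the table size of 99 are assumptions chosen for illustration; the slide does not specify any of them.

```python
def mid_square_hash(key: int, digits: int = 2) -> int:
    """Square the key and return its middle `digits` decimal digits."""
    squared = str(key * key).zfill(digits)
    start = (len(squared) - digits) // 2
    return int(squared[start:start + digits])


def radix_transform_hash(key: int, base: int = 11, table_size: int = 99) -> int:
    """Rewrite the key in another base, read the digits back as a decimal
    number, then divide by the table size and keep the remainder."""
    digits = []
    n = key
    while n > 0:
        digits.append(n % base)
        n //= base
    value = 0
    for d in reversed(digits):
        value = value * 10 + d
    return value % table_size


# 453 squared is 205209, whose middle two digits are 52.
print(mid_square_hash(453))
# 453 in base 11 is 382, and 382 mod 99 is 85.
print(radix_transform_hash(453))
```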
  • 8. Predicting the Distribution of Records • When using a random distribution, we can use a number of mathematical tools to obtain conservative estimates of how our hashing function is likely to behave. • Using the Poisson function $p(x) = \frac{(r/N)^x \, e^{-(r/N)}}{x!}$ applied to hashing, we can conclude that: • In general, if there are N addresses and r records, then the expected number of addresses with x records assigned to them is $N\,p(x)$.
  • 9. Predicting Collisions for a Full File • Suppose you have a hashing function that you believe will distribute records randomly and you want to store 10,000 records in 10,000 addresses. • How many addresses do you expect to have no records assigned to them? • How many addresses should have one, two, and three records assigned respectively? • How can we reduce the number of overflow records?
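The first two questions can be answered numerically with the Poisson function, taking r = 10,000 records and N = 10,000 addresses. The sketch below is one way to do the arithmetic; the function names are made up for this example.

```python
import math


def poisson(x: int, r: int, n: int) -> float:
    """Probability that a given address receives exactly x of r records
    when records are spread randomly over n addresses."""
    load = r / n
    return load ** x * math.exp(-load) / math.factorial(x)


def expected_addresses(x: int, r: int, n: int) -> float:
    """Expected number of addresses holding exactly x records: N * p(x)."""
    return n * poisson(x, r, n)


for x in range(4):
    print(x, round(expected_addresses(x, 10_000, 10_000)))
```

Rounded to whole addresses, this gives about 3,679 addresses with no record, 3,679 with exactly one, 1,839 with two, and 613 with three.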
  • 10. Increasing Memory Space I • Reducing collisions can be done by choosing a good hashing function or using extra memory. • The question asked here is: how much extra memory should be used to obtain a given reduction in the collision rate? • Definition: Packing density refers to the ratio of the number of records to be stored (r) to the number of available spaces (N). • The packing density gives a measure of the amount of space in a file that is used.
  • 11. Increasing Memory Space II • The Poisson distribution allows us to predict the number of collisions that are likely to occur given a certain packing density. We use the Poisson distribution to answer the following questions: • How many addresses should have no records assigned to them? • How many addresses should have exactly one record assigned (no synonym)? • How many addresses should have one record plus one or more synonyms? • Assuming that only one record can be assigned to each home address, how many overflow records can be expected? • What percentage of records should be overflow records?
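For the last two questions, a short derivation helps. If each home address holds one record, every address that receives at least one record stores exactly one at home, so the expected number of records stored at home is N(1 - p(0)) and the expected number of overflow records is r - N(1 - p(0)). The sketch below evaluates this as a percentage of all records; the function name and the sample densities are assumptions for illustration.

```python
import math


def overflow_percentage(packing_density: float) -> float:
    """Expected percentage of records that overflow when each address holds
    one record and records are distributed randomly (Poisson model)."""
    p0 = math.exp(-packing_density)           # share of addresses left empty
    stored_at_home = 1.0 - p0                 # per address: one record if non-empty
    overflow = packing_density - stored_at_home
    return 100.0 * overflow / packing_density


for density in (0.1, 0.25, 0.5, 0.75, 1.0):
    print(f"packing density {density:.2f}: "
          f"{overflow_percentage(density):.1f}% overflow")
```

Under this model roughly 21% of records overflow at a packing density of 0.5, rising to about 37% when the file is full.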
  • 12. Collision Resolution by Progressive Overflow • How do we deal with records that cannot fit into their home address? A simple approach: Progressive Overflow, or Linear Probing. • If a key, k1, hashes into the same address, a1, as another key, k2, then look for the first available address, a2, following a1 and place k1 in a2. If the end of the address space is reached, then wrap around to the beginning. • When searching for a key that is not in the file, if the address space is not full, an empty address will be reached; otherwise the search will come back to where it began.
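A minimal in-memory sketch of progressive overflow follows. A Python list stands in for the file's address space, `None` marks an empty address, and `home_address` is a toy hash function invented for this example; none of these details come from the slide.

```python
def home_address(key: str, n: int) -> int:
    """Toy hash function (an assumption): sum of character codes mod n."""
    return sum(ord(ch) for ch in key) % n


def insert(table: list, key: str) -> int:
    """Store key at its home address or the first free address after it,
    wrapping around at the end. Returns the address used."""
    n = len(table)
    start = home_address(key, n)
    for step in range(n):
        slot = (start + step) % n
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("address space is full")


def search(table: list, key: str):
    """Return the address of key, or None if it is not in the table."""
    n = len(table)
    start = home_address(key, n)
    for step in range(n):
        slot = (start + step) % n
        if table[slot] is None:      # empty address reached: key is absent
            return None
        if table[slot] == key:
            return slot
    return None                      # came back to where the search began


table = [None] * 5
insert(table, "zz")   # home address 4
insert(table, "y{")   # also hashes to 4, so it wraps around to address 0
print(table)
```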
  • 13. Search Length when using Progressive Overflow • Progressive Overflow causes extra searches and thus extra disk accesses. • If there are many collisions, then many records will be far from “home”. • Definitions: Search length refers to the number of accesses required to retrieve a record from secondary memory. The average search length is the average number of times you can expect to have to access the disk to retrieve a record. • Average search length = (Total search length) / (Total number of records)
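To make the definition concrete, the sketch below loads a handful of records by progressive overflow and counts how many accesses each one would need. The keys and their home addresses are made-up illustrative data, and the function name is an assumption.

```python
def average_search_length(records, n: int) -> float:
    """Load (key, home_address) pairs by progressive overflow into n
    addresses and return total search length / number of records."""
    table = [None] * n
    total = 0
    for key, home in records:
        for step in range(n):
            slot = (home + step) % n
            if table[slot] is None:
                table[slot] = key
                total += step + 1    # one access per address examined
                break
        else:
            raise RuntimeError("address space is full")
    return total / len(records)


# Hypothetical keys with their home addresses in a 25-address file.
records = [("Adams", 20), ("Bates", 21), ("Cole", 21), ("Dean", 22), ("Evans", 20)]
print(average_search_length(records, 25))
```

Here the individual search lengths are 1, 1, 2, 2, and 5, so the average is 11/5 = 2.2 accesses per record.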
  • 14. Storing More than One Record per Address: Buckets • Definition: A bucket describes a block of records sharing the same address that is retrieved in one disk access. • When a record is to be stored or retrieved, its home bucket address is determined by hashing. When a bucket is filled, we still have to worry about the record overflow problem, but this occurs much less often than when each address can hold only one record.
  • 15. Effect of Buckets on Performance • To compute how densely packed a file is, we need to consider (1) the number of addresses (buckets), N; (2) the number of records we can put at each address (the bucket size), b; and (3) the number of records, r. Then, Packing Density = r/(bN). • Though the packing density does not change when halving the number of addresses and doubling the size of the buckets, the expected number of overflows decreases dramatically.
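The effect can be checked with the Poisson model: an address that receives x > b records overflows x - b of them, so the expected number of overflow records is N times the sum of (x - b) p(x) over all x > b. The sketch below compares two layouts with the same packing density; the function name and the figures r = 750, N = 1000 or 500 are assumptions chosen for illustration.

```python
import math


def expected_overflow(r: int, n: int, b: int, terms: int = 60) -> float:
    """Expected number of overflow records for r records, n addresses and
    bucket size b, assuming records are distributed randomly."""
    load = r / n
    total = 0.0
    # Terms far beyond the load are negligible, so the sum is truncated.
    for x in range(b + 1, b + 1 + terms):
        p_x = load ** x * math.exp(-load) / math.factorial(x)
        total += (x - b) * p_x
    return n * total


r = 750
for n, b in ((1000, 1), (500, 2)):
    overflow = expected_overflow(r, n, b)
    print(f"N={n}, b={b}, packing density={r / (b * n):.2f}, "
          f"overflow={100 * overflow / r:.1f}% of records")
```

Both layouts have a packing density of 0.75, yet the expected overflow drops from roughly 30% of the records with one-record addresses to under 19% with two-record buckets.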
  • 16. Making Deletions • Deleting a record from a hashed file is more complicated than adding a record for two reasons: – The slot freed by the deletion must not be allowed to hinder later searches – It should be possible to reuse the freed slot for later additions. • In order to deal with deletions we use tombstones, i.e., markers indicating that a record once lived there but no longer does. Tombstones solve both of the problems caused by deletion. • Insertion of records is slightly different when using tombstones.
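One possible way to combine tombstones with progressive overflow is sketched below. The sentinel objects, the toy hash function, and the choice to check for an existing copy of the key before reusing a tombstone are all assumptions made for this illustration.

```python
EMPTY = None
TOMBSTONE = object()   # marks an address whose record was deleted


def home_address(key: str, n: int) -> int:
    """Toy hash function (an assumption): sum of character codes mod n."""
    return sum(ord(ch) for ch in key) % n


def search(table: list, key: str):
    n = len(table)
    start = home_address(key, n)
    for step in range(n):
        slot = (start + step) % n
        if table[slot] is EMPTY:          # a truly empty address ends the search
            return None
        if table[slot] is not TOMBSTONE and table[slot] == key:
            return slot                   # tombstones are skipped, not stopped at
    return None


def delete(table: list, key: str) -> bool:
    slot = search(table, key)
    if slot is None:
        return False
    table[slot] = TOMBSTONE
    return True


def insert(table: list, key: str) -> int:
    # The difference from plain insertion: first make sure the key is not
    # stored further along, then reuse the first tombstone or empty address.
    existing = search(table, key)
    if existing is not None:
        return existing
    n = len(table)
    start = home_address(key, n)
    for step in range(n):
        slot = (start + step) % n
        if table[slot] is EMPTY or table[slot] is TOMBSTONE:
            table[slot] = key
            return slot
    raise RuntimeError("address space is full")
```

With a five-address table, inserting "ab" and then "ba" (both hash to address 0) places them at addresses 0 and 1; after deleting "ab", a search for "ba" still succeeds because the tombstone at address 0 does not stop the probe, and a later insertion that hashes to 0 reuses that slot.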
  • 17. Effects of Deletions and Additions on Performance • After a large number of deletions and additions have taken place, one can expect to find many tombstones occupying places that could be occupied by records whose home address precedes them but that are stored after them. • This increases average search lengths. • There are three types of solutions for dealing with this problem: a) local reorganization during deletions; b) global reorganization when the average search length is too large; c) use of a different collision resolution algorithm.
  • 18. Other Collision Resolution Techniques • There are a few variations on random hashing that may improve performance: – Double Hashing: When an overflow occurs, use a second hashing function to map the record to its overflow location. – Chained Progressive Overflow: Like progressive overflow, except that synonyms are linked together with pointers. – Chaining with a Separate Overflow Area: Like chained progressive overflow, except that overflow records do not occupy home addresses. – Scatter Tables: The hash file contains no records, only pointers to records; that is, it is an index.
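As an illustration of the first technique, the sketch below uses a second hash function to pick the step between probes. The integer keys, both hash functions, and the requirement that the table size be prime (so that every step size is coprime to it and the probe sequence reaches every address) are assumptions of this example, not details from the slide.

```python
def probe_sequence(key: int, n: int) -> list:
    """Addresses examined for key in a table of prime size n."""
    start = key % n                  # first hash function: home address
    step = 1 + key % (n - 1)         # second hash function: step in 1..n-1
    return [(start + i * step) % n for i in range(n)]


def insert(table: list, key: int) -> int:
    """Place key at the first free address in its probe sequence."""
    for slot in probe_sequence(key, len(table)):
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("address space is full")


table = [None] * 11
insert(table, 25)   # home address 3
insert(table, 14)   # also hashes to 3; its step of 5 sends it to address 8
print(table)
```

Because synonyms generally get different step sizes, overflow records are scattered instead of piling up next to the home address as they do with progressive overflow.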
  • 19. Pattern of Record Access • If we have some information about what records get accessed most often, we can optimize their location so that these records will have short search lengths. • By doing this, we try to decrease the effective average search length even if the nominal average search length remains the same. • This principle is related to the one used in Huffman encoding.