FS-Mod5.pptx
1.
2.
3. • Keys that hash to the same address are called synonyms
Solutions:
• Spread out the records
• Use extra memory
• Put more than one record at a single address.
8. Some other Hashing Methods
• Examine keys for a pattern
• Fold parts of the key
• Divide the key by a number
• Square the key and take the middle.
Key = 453, square = 205209, so addr = 52 (the middle two digits)
• Radix transformation
Key = 453, base-11 equivalent = 382, addr = 382 % 99 = 85
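The two worked examples above can be checked with a short sketch. The function names, the choice of two middle digits, and the defaults (base 11, 99 addresses) are taken from the examples or assumed for illustration; this is not a general-purpose hash function.

```python
def mid_square(key: int, digits: int = 2) -> int:
    """Square the key and return its middle `digits` digits."""
    squared = str(key * key)
    start = (len(squared) - digits) // 2
    return int(squared[start:start + digits])


def to_base(key: int, base: int) -> int:
    """Write `key` in `base`, then read the digit string as a decimal number.

    Assumption: every digit is below 10, so the result is a plain digit string.
    """
    digits = []
    while key > 0:
        key, digit = divmod(key, base)
        if digit > 9:
            raise ValueError("digit above 9 is not handled in this sketch")
        digits.append(str(digit))
    return int("".join(reversed(digits)) or "0")


def radix_transform(key: int, base: int = 11, n_addresses: int = 99) -> int:
    """Convert the key to another base, then take the remainder."""
    return to_base(key, base) % n_addresses


print(mid_square(453))        # 453 * 453 = 205209, middle digits "52"
print(radix_transform(453))   # 453 is 382 in base 11, 382 % 99 = 85
```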
9. Predicting the Distribution of Records
Poisson Distribution
• The Poisson function helps predict how many collisions are likely to occur in a file.
• Let N be the number of addresses in the file.
• When a key is hashed, there are two possible outcomes for a given address:
1. A: the address is not chosen; p(A) is written a
2. B: the address is chosen; p(B) is written b
If N=10,
b=1/10=0.1
a=1-0.1= 0.9
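As a small illustration of how a and b combine over several keys, the sketch below computes the probability that one given address is chosen exactly x times when r keys are hashed. The binomial form used here is not stated on the slide; it is the standard result for independent, uniformly random hashing, and the name `p_chosen` is hypothetical.

```python
from math import comb


def p_chosen(x: int, r: int, n: int) -> float:
    """Probability that a given address is chosen exactly x times by r keys.

    Assumes each key picks one of n addresses uniformly and independently.
    """
    b = 1 / n      # address chosen
    a = 1 - b      # address not chosen
    return comb(r, x) * b**x * a**(r - x)


# With N = 10 and two keys: neither key, one key, or both keys land on the address.
for x in range(3):
    print(x, p_chosen(x, r=2, n=10))
```

For large N and r the Poisson function is the usual approximation to this expression.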
10. How Much Extra Memory Should Be Used?
• Poisson function applied to hashing
• To reduce the collisions, use extra memory.
• Packing density measures how much of the space in a file is used: the number of records divided by the number of available addresses.
• If there are 75 records and 100 addresses, then the packing density is 75/100 = 0.75 = 75%
11. • Predicting collision for different packing densities
• Suppose that 1000 addresses are allocated to hold 500 records in a randomly
hashed file, and that each address can hold one record. The packing density
for the file is 500/1000 = 0.5.
Let us answer the following questions about the distribution of records among
the available addresses in the file:
• How many addresses should have no records assigned to them?
• How many addresses should have exactly one record assigned (no synonyms)?
• How many addresses should have one record plus one or more synonyms?
• Assuming that only one record can be assigned to each home address, how
many overflow records can be expected?
• What percentage of records should be overflow records?
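The questions above can be worked through with the Poisson function p(x) = (r/N)^x e^(-r/N) / x!, where r is the number of records and N the number of addresses. The sketch below is one way to do the arithmetic; the function and key names are hypothetical, and the overflow count assumes each non-empty address keeps exactly one record.

```python
from math import exp, factorial


def poisson(x: int, r: int, n: int) -> float:
    """Estimated fraction of addresses that receive exactly x records."""
    ratio = r / n
    return ratio**x * exp(-ratio) / factorial(x)


def expected_distribution(r: int, n: int) -> dict:
    p0 = poisson(0, r, n)
    p1 = poisson(1, r, n)
    empty = n * p0
    # Every non-empty address stores one record; the rest overflow.
    overflow = r - (n - empty)
    return {
        "no_records": empty,
        "exactly_one": n * p1,
        "one_plus_synonyms": n * (1 - p0 - p1),
        "overflow_records": overflow,
        "overflow_percent": 100 * overflow / r,
    }


for name, value in expected_distribution(r=500, n=1000).items():
    print(f"{name}: {value:.1f}")
```

For 500 records in 1000 addresses this gives roughly 607 empty addresses, 303 with exactly one record, 90 with synonyms, and about 107 overflow records (a little over 21% of the file).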
13. Extendible Hashing
• Direct access (hashing) files have a static size, so they are not suitable for files whose
size is unknown in advance
• A dynamic file structure is desired that retains fast retrieval by primary key
and also expands and contracts as the number of records in the file fluctuates
(without reorganizing the whole file)
• Similar motivation in both cases:
• Indexed-sequential File ==> B tree
• Hashing ==> Extendible Hashing
14. • How Extendible Hashing works
• The idea comes from tries (radix searching)
• The branching factor of the tree is equal to the number of alternative symbols in
each position of the key
e.g.) Radix 26 trie - able, abrahms, adams, anderson, andrews, baird
• Use the first n characters for branching
15.
16. Turning the trie into a directory
• Using Trie for extendible hashing
(1) Use Radix 2 Trie :
Keys in A : beginning with 0
Keys in B : beginning with 10
Keys in C : beginning with 11
(2) Retrieve from secondary storage the buckets containing keys,
instead of individual keys
17. Representation of Tries
• Rather than representing the trie as a tree, represent it as an array of
contiguous records forming a directory of
• hash addresses and
• pointers to the corresponding buckets.
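A minimal sketch of such a directory for the radix-2 trie with buckets A, B and C: two bits of the hash address index a four-cell array, and both cells starting with 0 point to A. The array layout and the name `find_bucket` are assumptions for illustration; bucket names stand in for real bucket pointers.

```python
DEPTH = 2  # number of leading address bits used by the directory

# Index is the 2-bit prefix: 00, 01, 10, 11.
# Keys beginning with 0 go to A, so cells 00 and 01 both reference A.
DIRECTORY = ["A", "A", "B", "C"]


def find_bucket(address_bits: str) -> str:
    """Return the bucket for a hash address given as a string of '0'/'1'."""
    index = int(address_bits[:DEPTH], 2)
    return DIRECTORY[index]


print(find_bucket("0110"))  # prefix 01 -> A
print(find_bucket("1001"))  # prefix 10 -> B
print(find_bucket("1110"))  # prefix 11 -> C
```

A lookup is a single array access, which is the point of flattening the trie.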
19. Splitting to handle overflow
• When overflow occurs
e.g.1) Overflowing of bucket A
• Split A into A and D
• Use an additional, previously unused bit of the hash address
• No need to expand the directory
20. • Overflowing of bucket B
• No additional unused bits are available
(the directory must be expanded)
1. Divide B using 3 bits of the hash address
2. Extend the trie into a complete (full) binary tree
3. Collapse it into the directory structure
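The two overflow cases can be sketched together: a bucket that uses fewer bits than the directory splits in place, while a bucket that already uses every directory bit forces the directory to double first. This is a simplified in-memory sketch, not the slides' implementation; the class names, the 8-bit address width and the bucket size of 2 are assumptions.

```python
ADDRESS_BITS = 8   # assumed width of a hash address
BUCKET_SIZE = 2    # assumed bucket capacity


class Bucket:
    def __init__(self, depth: int):
        self.depth = depth   # leading bits shared by every key in the bucket
        self.keys = []


class ExtendibleHash:
    def __init__(self):
        self.global_depth = 0
        self.directory = [Bucket(0)]

    def _index(self, address: int) -> int:
        # The leading `global_depth` bits select the directory cell.
        return address >> (ADDRESS_BITS - self.global_depth)

    def insert(self, address: int) -> None:
        bucket = self.directory[self._index(address)]
        if len(bucket.keys) < BUCKET_SIZE:
            bucket.keys.append(address)
            return
        self._split(bucket)
        self.insert(address)   # retry; may trigger another split

    def _split(self, bucket: Bucket) -> None:
        if bucket.depth == ADDRESS_BITS:
            raise OverflowError("no address bits left to split on")
        if bucket.depth == self.global_depth:
            # No unused bit available: double the directory.
            self.directory = [b for b in self.directory for _ in range(2)]
            self.global_depth += 1
        bucket.depth += 1
        sibling = Bucket(bucket.depth)
        # Redistribute keys on the newly used address bit.
        shift = ADDRESS_BITS - bucket.depth
        old_keys, bucket.keys = bucket.keys, []
        for key in old_keys:
            target = sibling if (key >> shift) & 1 else bucket
            target.keys.append(key)
        # Repoint the directory cells whose new bit is 1 to the sibling.
        dir_shift = self.global_depth - bucket.depth
        for i, b in enumerate(self.directory):
            if b is bucket and (i >> dir_shift) & 1:
                self.directory[i] = sibling


table = ExtendibleHash()
for address in [0b00000000, 0b01000000, 0b10000000, 0b11000000, 0b00100000]:
    table.insert(address)
print(table.global_depth, len(table.directory))
```

In this run the first and fifth inserts that overflow double the directory; a later insert of 0b10100000 splits a bucket that still has an unused bit, so the directory keeps its size.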
21.
22. Dynamic Hashing
• Similar to extendible hashing
• Uses a directory to track bucket addresses
• Extends the directory through the use of tries
• Starts with a hash function that covers an address space of a fixed size
• When overflow occurs
• the bucket splits; the splits form the leaves of a trie that grows down from the
original address node
Two kinds of nodes
External node: references a data bucket
Internal node: points to two child index nodes
When a node's bucket splits, the node changes from an external node to an internal node
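The node types and the external-to-internal change can be sketched as follows. The sketch assumes that each key comes with a home address in the original address space and a fixed-length bit string (for example from a second hash function) that guides the descent; both inputs, the class names and the bucket size of 2 are hypothetical.

```python
BUCKET_SIZE = 2  # assumed bucket capacity


class Node:
    def __init__(self):
        self.bucket = []       # external node: references a data bucket
        self.children = None   # internal node: (child for bit 0, child for bit 1)

    def is_external(self) -> bool:
        return self.children is None


class DynamicHash:
    def __init__(self, n_addresses: int = 4):
        # One trie root per address in the original, fixed-size address space.
        self.roots = [Node() for _ in range(n_addresses)]

    def insert(self, home: int, bits: str, key: str) -> None:
        node, depth = self.roots[home], 0
        while True:
            # Follow the key's bits down to an external node.
            while not node.is_external():
                node = node.children[int(bits[depth])]
                depth += 1
            if len(node.bucket) < BUCKET_SIZE:
                node.bucket.append((bits, key))
                return
            if depth >= len(bits):
                raise OverflowError("no bits left to split on")
            # Overflow: the external node becomes an internal node.
            old, node.bucket = node.bucket, None
            node.children = (Node(), Node())
            for b, k in old:
                node.children[int(b[depth])].bucket.append((b, k))


index = DynamicHash(4)
index.insert(3, "000", "k1")
index.insert(3, "100", "k2")
index.insert(3, "110", "k3")   # overflows address 3 and splits it
print(index.roots[3].is_external())
```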
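23. [Figure: growth of the index in dynamic hashing, panels (a), (b), (c). Each panel shows the original address space (addresses 1 to 4). In (b), address 4 has split into nodes 40 and 41; in (c), node 40 has split into 410 and 411 and address 2 into 20 and 21.]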