Collision Resolution Lesson Plan - 9
Contents Evocation Objective Introduction Collision Resolution Separate Chaining Open addressing Linked list Bucket Hashing
ANNEXURE-I Evocation
Objective To learn the basic knowledge about collision resolution methods
ANNEXURE-II Introduction-Collision resolution Hash a new key to address may create a collision When an element is inserted, it hashes to the same value as an already inserted element, then we have a collision and need to resolve it Two categories of collision resolution techniques Open addressing (closed hashing) Separate Chaining (open hashing)
Collision resolution concept In hashing algorithm, there must be some empty element in list all times Full list is defined as a list in which all elements except one contains data Load factor is defined as number of elements in list divided by number of physical elements allocated for list expressed in percentage α= (k/n) × 100 k - Number of filled elements in list n- Total number of elements allocated to list
Clustering Tendency of data to build up across a hashed list is known as clustering Two types of clustering Primary clustering Secondary clustering Primary Clustering Primary clustering occurs when data cluster around a home address Easy to identify Secondary Clustering Secondary clustering occurs when data become grouped along a collision path throughout a list Difficult to identify
Collision Resolution Methods Linear Probe Quadratic Probe Pseudorandom Key offset Collision Resolution Open Addressing Linked listBuckets
Separate Chaining The idea is to keep a list of all elements that hash to the same value The array elements are pointers to the first nodes of the lists A new item is inserted to the front of the list Two list operations are insertion and deletion Advantages: Better space utilization for large items Simple collision handling: searching linked list Overflow: we can store more items than the hash table size Deletion is quick and easy: deletion from the linked list
Example Keys: 0, 1, 4, 9, 16, 25, 36, 49, 64, 81 hash(key) = key %
Example
Insertion: insert = 4 x mod 11 =
Analyzing of Hashing with separate chaining Worst case All keys hash into the same bucket a single linked list insert, delete, find take O(n) time Average case Keys are uniformly distributed into buckets O(1+N/B): N is the number of elements in a hash table, B is the number of buckets If N = O(B), then O(1) time per operation N/B is called the load factor of the hash table
Open Addressing Separate chaining has the disadvantage of using linked lists Requires the implementation of a second data structure Open addressing resolves collisions in prime area, the area that contains all of home address In an open addressing hashing system, all the data go inside the table A bigger table is needed Generally the load factor should be below 0.5 If a collision occurs, alternative cells are tried until an empty cell is found
Open Addressing More formally: Cells h 0 (x), h 1 (x), h 2 (x), …are tried in succession where h i (x) = (hash(x) + f(i)) mod Table Size, with f(0) = 0 The function f is the collision resolution strategy There are three common collision resolution strategies: Linear Probing Quadratic probing Pseudorandom Collision resolution Key offset Double hashing
Evocation
Linear Probing Data cannot be stored in home address then resolve the collision by adding 1 to current address In linear probing, collisions are resolved by sequentially scanning an array (with wraparound) until an empty cell is found i.e. f is a linear function of i, typically f(i)= i Example: Insert items with keys: 89, 18, 49, 58, 9 into an empty hash table Table size is 10 Hash function is hash(x) = x mod 10 f(i) = i Advantages Simple to implement Data tend to remain near home address
Linear Probing Example
Linear Probing – Analysis Example What is the average number of probes for a successful search and an unsuccessful search for this hash table? Hash Function: h(x) = x mod 11 Successful Search: 20: : : : 2, : 3,4 24: 2,3,4, : : 9,10, 0 Avg. Probe for SS = ( )/8=15/8 Unsuccessful Search: We assume that the hash function uniformly distributes the keys 0: 0,1 -- 1: : 2,3,4,5,6 -- 3: 3,4,5,6 4: 4,5,6 -- 5: 5,6 -- 6: : : 8,9,10,0,1 9: 9,10,0, : 10,0,1 Avg. Probe for US = ( )/11=31/11
Linear Probing (insert 12) 12 = 1 x mod 11 = 1
Search with linear probing (Search 15) 15 = 1 x mod 11 = 4 NOT FOUND !
Deletion with linear probing: LAZY (Delete 9) 9 = 0 x mod 11 = 9 FOUND ! D
Linear Probe collision resolution Hash First insert: no collision Second insert: no collision Marry Dodd Sarah Trapp Bryan Devaux Harry Eagle Patrick Linn Tuan Ngo Feldman [000] [001] [002] [003] [004] [005] [006] [007] [008] [305] [306] Probe 1 Probe 2
Quadratic Probe Quadratic Probing eliminates primary clustering problem of linear probing Secondary clustering can be eliminated by adding a value other than 1 to the current address In quadratic probing, the increment is the collision probe number squared Collision function is quadratic The popular choice is f(i) = i 2 If the hash function evaluates to h and a search in cell h is inconclusive, we try cells h + 1 2, h+2 2, … h + i 2 i.e. It examines cells 1,4,9 and so on away from the original probe Remember that subsequent probe points are a quadratic number of positions from the original probe point
Quadratic Probe Disadvantage Time required to square the probe number Eliminate the multiplication factor by using increment factor that increases by 2 with each probe Add the increment factor to the previous increment gives the next increment Limitation Impossible to generate a new address for every element in list
Quadratic Collision Resolution Increments Probe numberCollision location Probe 2 and increment New address = 11+1 = = 42+4= = 96+9= = = = = = = = = = = = = = = 86
Quadratic Probe
Quadratic Probing (insert 12)
ANNEXURE-III Deep Breathing Meditation Sit comfortably with your back straight. Put one hand on your chest and the other on your stomach Breathe in through your nose. The hand on your stomach should rise. The hand on your chest should move very little Exhale through your mouth, pushing out as much air as you can while contracting your abdominal muscles. The hand on your stomach should move in as you exhale, but your other hand should move very little Continue to breathe in through your nose and out through your mouth. Try to inhale enough so that your lower abdomen rises and falls. Count slowly as you exhale
Optical Illusion What do you see in the image below?
Logo Identification
Quiz The technologically advanced humanoid robot ASIMO is made by which car company? Along with whom did Bill Gates found Microsoft? What science fiction writer wrote the three laws of robotics? Nano, Shuffle, Classic and Touch are variations of what?
Double Hashing When collision occurs use a second hash function Hash 2 (x) = R – (x mod R) R: greatest prime number smaller than table-size Inserting 12 H 2 (x) = 7 – (x mod 7) = 7 – (12 mod 7) = 2 Check H(x) If collision occurs check H(x) + 2 If collision occurs check H(x) + 4 If collision occurs check H(x) + 6 If collision occurs check H(x) + 8 H(x) + i * H 2 (x)
Double Hashing (insert 12)
Pseudorandom Collision Resolution Use pseudorandom number to resolve collision Instead of using key as a factor in random number calculation, use the collision address Resolve the collision using pseudorandom number generator, where a is 3 and c is 5 Y=(ax+c) modulo listsize = (3*1 + 5) Modulo 307 = 8 Limitation All keys follow only one collision path through the list
Pseudorandom Collision Resolution First insert: no collision Hash Second insert: no collision [000] [001] [002] [003] [004] [005] [006] [007] [008] [305] [306] Marry Dodd Sarah Trapp Bryan Devaux Patrick Linn Harry Eagle Tuan Ngo Feldman Probe1 Pseudorandom Y=3X+5
Key Offset Double hashing method that produces different collision paths for different keys Pseudorandom number generator produces a new address as a function of previous address, key offset calculates the new address as function of old address and key offset = [ key/listsize] address = ((offset + old address) modulo listSize) Example When key is and list size is 307 using modulo division hashing method generates address of 1 offset = [166702/307] = 543 address = (( ) modulo 307) =237
Key Offset If 237 were a collision, repeat the process to locate the next address offset = [166702/307] = 543 address = (( ) modulo 307) =166 KeyHome address Key offset Probe 1Probe
Evocation
Linked list Collision Resolution Major disadvantage to open addressing is that each collision resolution increases the probability of future collisions Eliminated by linked list approach Linked list is ordered collection of data in which element contains the location of next element
Linked List Collision Resolution [000] [001] [002] [003] [004] [005] [006] [007] [008] [305] [306] Marry Dodd Sarah Trapp Bryan Devaux Patrick Linn Tuan Ngo Feldman Harry Eagle ChrisWalljasper
Linked list Collision Resolution Use separate area to store collisions and chains together in linked list Two storage areas: prime area and overflow area Each element in prime area contains additional field a link header pointer to a linked list of overflow data in overflow area When collision occurs, one element is stored in prime area and chained to corresponding linked list in over flow area overflow area is typically implemented as linked list in dynamic memory
Linked list Collision Resolution Linked list is stored in any order, but LIFO sequence or key sequence LIFO sequence is fastest for insert because the linked list need not be scanned to insert data Element being inserted into overflow is placed at beginning of linked list and linked to node in prime area In key sequenced lists, key in prime area is smallest to provide for faster search retrieval
Evocation
Bucket Hashing Keys are hashed to bucket nodes that accommodate multiple data occurrences Bucket hold multiple data, collisions are postponed until bucket is full Example Each address is large enough to hold data for three employees Collision will not occur until tried to add fourth employee to address Two problems Use more space because many of bucket are empty or partially empty at any time It will not completely resolves collision problem
Bucket Hashing Marry Dodd Sarah Trapp Harry Eagle Ann Giorgis Byan Devaux Chris jasper Feldman [000] Bucket 0 [001] Bucket 1 [002] Bucket 2 [003] Bucket 307
ANNEXURE-IV Mind Map Collision Resolution Clustering Separate Chaining Open Addressing Bucket hashing Linked List Linear probe Quadratic probe Pseudo random Key offset
ANNEXURE -V Summary Hash a new key to address may create a collision Clustering is tendency of data to build up across a hashed list Three methods used t resolve collisions are open addressing, linked list and buckets Open addressing method can be divided into linear probe, quadratic probe, Pseudorandom rehashing and key offset rehashing In linear probe method, when collision occurs new data are stored in next available address
Summary In quadratic method, increment is the collision probe number squared In pseudorandom method, a random number generator is used to rehash address In key offset method, an offset is used to rehash address In linked list technique, separate areas store collisions and chain all synonyms together in linked list In bucket hashing, bucket accommodates multiple data occurrence