Download - Cormen Algo-lec8
-
8/14/2019 Cormen Algo-lec8
1/19
Introduction to Algorithms
6.046J/18.401J
LECTURE
8Hashing II Universal hashing
Universality theorem Constructing a set of
universal hash functions
Perfect hashing
Prof. Charles E. LeisersonOctober 5, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.1
-
8/14/2019 Cormen Algo-lec8
2/19
A weakness of hashing
Problem: For any hash function h, a set of keys exists that can cause the average access time of a hash table to skyrocket. An adversary can pick all keys from
{kU: h(k) = i} for some slot i.IDEA: Choose the hash function at random,independently of the keys.
Even if an adversary can see your code,he or she cannot find a bad set of keys,since he or she doesnt know exactly
which hash function will be chosen.October 5, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.2
-
8/14/2019 Cormen Algo-lec8
3/19
Universal hashing
Definition. Let Ube a universe of keys, andletH be a finite collection of hash functions,
each mapping Uto {0, 1, , m1}. We sayH is universalif for allx,y U, wherex y,we have |{h H : h(x) = h(y)}| = |H| /m.That is, the chance of a collision {h : h(x) = Hh(y)}
betweenx andy is
1/m if we choose h |H|randomly fromH.
m
October 5, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.3
-
8/14/2019 Cormen Algo-lec8
4/19
Universality is good
Theorem. Let hbe a hash function chosen
(uniformly) at random from a universal setH
of hash functions. Suppose h is used to hash
n arbitrary keys into the m slots of a table T.Then, for a given keyx, we have
E[#collisions withx] < n/m.
October 5, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.4
-
8/14/2019 Cormen Algo-lec8
5/19
Proof of theorem
Proof. Let Cxbe the random variable denoting
the total number of collisions of keys in Twithx, and let1 ifh(x) = h(y),
c =xy 0 otherwise.
Note: E[cxy] = 1/m and Cx = cxy . }{Ty x
October 5, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.5
-
8/14/2019 Cormen Algo-lec8
6/19
Proof (continued)
[CE x ]=E
Take expectation of both sides.
cxy }{Ty x
October 5, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.6
-
8/14/2019 Cormen Algo-lec8
7/19
Proof (continued)
[CE x ]=E Take expectationof both sides.
cxy }{Ty x
= [cE xy ] Linearity ofexpectation. }{Ty x
October 5, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.7
-
8/14/2019 Cormen Algo-lec8
8/19
Proof (continued)
[CE x ]=E
Take expectation of both sides.
cxy }{Ty x
= [cE xy ] Linearity ofexpectation. }{Ty x
= 1/ m E[cxy] = 1/m. }{Ty x
October 5, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.8
-
8/14/2019 Cormen Algo-lec8
9/19
Proof (continued)
CE[ x ]=E Take expectationof both sides.
cxyTy x }{
= [cE xy ] Linearity ofexpectation.Ty x }{
= 1/ m E[cxy] = 1/m.Ty x }{1 Algebra..n=m
October 5, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.9
-
8/14/2019 Cormen Algo-lec8
10/19
Constructing a set of
universal hash functionsLet mbe prime. Decompose key kinto r+ 1digits, each with value in the set {0, 1, , m1}.That is, let k= k0, k1, , kr, where 0 ki < m.Randomized strategy:Picka = a0, a1, , arwhere each ai is chosen randomly from {0, 1, , m1}.
rDefine ha (k) = ka i mod m . Dot product,i modulo mi=0
|H| = mr+ 1. REMEMBERHow big is H = {ha}?THIS!
October 5, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.10
-
8/14/2019 Cormen Algo-lec8
11/19
Universality of dot-product
hash functionsTheorem. The setH = {ha} is universal.
Proof. Suppose that x = x0,x1, ,xrandy =y0,y1, ,yrbe distinct keys. Thus, they differin at least one digit position, wlog position 0.
For how many ha H dox andy collide?We must have ha(x) = ha(y), which implies that
r r
xa i ya (mod m) .i i ii=0 i=0
October 5, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.11
-
8/14/2019 Cormen Algo-lec8
12/19
Proof (continued)
Equivalently, we haver
ai (xi yi ) 0 (mod m)i=0or
ra0 (x0 y0 ) + ai (xi yi ) 0 (mod m) ,
i=1which implies that
r
a0 (x0 y0 ) ai (xi yi ) (mod m) .i=1
October 5, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.12
-
8/14/2019 Cormen Algo-lec8
13/19
Fact from number theory
Theorem. Let mbe prime. For anyzZmsuch thatz0, there exists a uniquez1 Zmsuch that
zz1 1 (mod m).Example: m = 7.
z 1 2 3 4 5 6z1 1 4 5 2 3 6
October 5, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.13
-
8/14/2019 Cormen Algo-lec8
14/19
Back to the proof
We haver
a0 (x0 y0 ) ai (xi yi ) (mod m) ,i=1and sincex0 y0 , an inverse (x0y0 )1 must exist,which implies that
ra0 ai (xi yi ) (x0 y0 )1 (mod m).
i=1
Thus, for any choices ofa1, a2, , ar, exactlyone choice ofa0 causesx andy to collide.
October 5, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.14
-
8/14/2019 Cormen Algo-lec8
15/19
Proof (completed)
Q. How many has causex andy to collide?
A. There are m choices for each ofa1, a2, , ar,but once these are chosen, exactly one choicefora0 causesx andy to collide, namely
r (x0 y0 ) 1
= ai(xi yi) mod m .a0 i 1=Thus, the number ofh s that causex andy
ato collide is mr 1 = mr= |H|/m.
October 5, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.15
-
8/14/2019 Cormen Algo-lec8
16/19
40 37 22
26
14 274 31
1 00
9 86
Perfect hashing
Given a set ofn keys, construct a static hash table of size m = O(n) such that SEARCH takes (1) time in the worst case.
40 37
26
14 27
S4
S6
4 31
1 00
9 86
h31(14) = h31(27) = 1
T S1IDEA: Two-
level scheme
01with universal 2
hashing at 3both levels. 4
No collisions 5at level 2!
6m a 0 1 2 3 4 5 6 7 8
October 5, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.16
22
-
8/14/2019 Cormen Algo-lec8
17/19
Collisions at level 2
Theorem. LetHbe a class of universal hash
functions for a table of size m = n2. Then, if we
use a random h H to hash n keys into the table,the expected number of collisions is at most 1/2.
Proof. By the definition of universality, the
probability that 2 given keys in the table collidenunderh is 1/m = 1/n22
of keys that can possibly collide, the expectedSince there are pairs.
number of collisions is
n 1 =
(nn 1) 1
-
8/14/2019 Cormen Algo-lec8
18/19
No collisions at level 2
Corollary. The probability of no collisionsis at least 1/2.
Proof. Markovs inequality says that for anynonnegative random variableX, we have
Pr{Xt} E[X]/t.Applying this inequality with t= 1, we findthat the probability of1 or more collisions is
at most 1/2.Thus, just by testing random hash functions
in H, well quickly find one that works. October 5, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.18
-
8/14/2019 Cormen Algo-lec8
19/19
Analysis of storage
For the level-1 hash table T, choose m = n, andlet nibe random variable for the number of keys
that hash to slot i in T. By using ni2 slots for thelevel-2 hash table Si, the expected total storagerequired for the two-level scheme is therefore
E
m 1
i=0( ) ni
2 = (n),
since the analysis is identical to the analysis fromrecitation of the expected running time of bucketsort. (For a probability bound, apply Markov.)
October 5, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.19