Direct File Organization 2006
Hakan Uraz - File Organization
Computed Chaining
• The methods that use a link field provide better performance but require more storage.
• Those that don't use a link field require less space but provide poorer performance.
• Consider a third class in between. Instead of storing an actual address in the link, these methods store a pseudolink, which requires additional processing before it yields an actual address.
• If storage is limited, the information needed can be computed rather than stored.
Computed Chaining
• The performance of the pseudolink methods is better than that of the nonlink methods. The number of offsets, or locations that would need to be searched to locate the successor item, is stored in the pseudolink so that we can compute the successor's actual address.
• The intermediate locations do not have to be searched, so performance is improved.
• It takes fewer bits (less storage) to store the number of offsets than a full address.
• On the negative side, these methods do not perform as well as the coalesced hashing methods, and they require more storage than the nonlink methods.
Computed Chaining
• Computed chaining follows the process of coalesced hashing in that the first item on a chain points to its immediate successor, which in turn points to its immediate successor.
• To eliminate the coalescing of chains, computed chaining also moves a record that is stored at another record's home address.
• There is no coalescing, which improves performance.
• Unlike the nonlink methods, computed chaining uses the key of the record stored at a probe address to locate the next probe address, and not the key of the record being inserted or retrieved.
Computed Chaining
• Using a function of the item stored at a location ensures that only one actual probe will be needed to locate the successor record.
Computed Chaining
The probe function computes the address of the successor element given the key of the record stored at the current location and the pseudolink of the current location. The incrementing scheme of linear quotient is used to calculate the successor position.
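The probe function described above can be sketched as follows. This is only an illustration: the table size 11 and the form of i(key) are taken from the example in these slides, the names `increment` and `successor` are hypothetical, and replacing a zero increment with 1 is an assumption that the slides do not state.

```python
N = 11  # table size (assumption: taken from the example in these slides)

def increment(key):
    """i(key) = Quotient(key / N) mod N; a zero result becomes 1 (assumption)."""
    q = (key // N) % N
    return q if q != 0 else 1

def successor(location, stored_key, nof):
    """Address reached from `location` after `nof` offsets of i(stored_key)."""
    return (location + nof * increment(stored_key)) % N

# The record with key 27 sits at location 5 and has a pseudolink of 2.
print(successor(5, 27, 2))
```

Only the key stored at the current location and its pseudolink are needed, so the successor is reached without visiting the intermediate locations.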
Computed Chaining - Example
• Keys: 27, 18, 29, 28, 39, 13, 16, 38, 53
• Hash(key) = key mod 11
• i(key) = Quotient(key / 11) mod 11
Computed Chaining - Example
"nof" represents the number of offsets, that is, the pseudolink value.
Computed Chaining - Example
• The increment is changed only when we reach the successor element on a probe chain.
• The current increment is always the one associated with the most recently visited record on the current probe chain.
Computed Chaining - Example
• We continue with an increment of two until we find an empty location or until we find that the table is full.
Computed Chaining - Example
• Which pseudolink field do we need to set?
• The one associated with 16.
• Since we had to look at two additional locations, the pseudolink value is two.
Computed Chaining - Example
• We finally add the record with key 53.
• We observe a move process. We move 16 plus all its successors; otherwise we would no longer be able to retrieve them.
• The record with key 53 may then be inserted directly at location 9.
• We now reinsert the two records that were displaced by the insertion.
Computed Chaining - Example
• We need to keep a pointer to, or remember, location 9 so that we can find the end of the chain that those two records were on previously.
• We need to identify the immediate predecessor of the item that was removed from location 9 so that we can relink it to the new location of the removed record.
• We know where to begin the search by hashing 16, which yields location 5. We search from that location until we find a pseudolink to location 9. We find that immediately at location 5.
Computed Chaining - Example
In general, the search for the immediate predecessor of a removed item may be represented as:
[Figure not recovered]
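A minimal sketch of that predecessor search, under stated assumptions: the function name `find_predecessor` is hypothetical, the table is held in two parallel lists, a pseudolink of 0 marks the end of a chain, and the table state below is one assumed to hold just before the record at location 9 is displaced.

```python
N = 11  # table size from the example

def increment(key):
    q = (key // N) % N
    return q if q != 0 else 1  # assumption: never step by zero

def find_predecessor(keys, nof, removed_location, removed_key):
    """Walk the chain from the removed key's home address until a
    pseudolink leads to `removed_location`; return that record's location."""
    loc = removed_key % N  # home address of the removed record
    while nof[loc] != 0:
        nxt = (loc + nof[loc] * increment(keys[loc])) % N
        if nxt == removed_location:
            return loc
        loc = nxt
    return None  # no record points at removed_location

# Assumed state: 16 is stored at location 9, linked from 27 at location 5.
keys = [38, None, 13, None, None, 27, 28, 18, 29, 16, 39]
nof = [0, 0, 0, 0, 0, 2, 2, 1, 0, 2, 0]
print(find_predecessor(keys, nof, 9, 16))
```

The search starts at the home address of the removed key, so it only ever touches records on that one probe chain.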
Computed Chaining - Example
• We begin the search at the home address, which contains a record, and continue until we find a record that points to the location of the removed item, which is indicated by the temporary pointer.
• We now know the location of the pseudolink field that needs to be modified in the reinsertion process.
• In the example, we then need to use the increment associated with 27 until we find another empty space in which to insert the record with key 16.
Computed Chaining - Example
• Did we degrade the performance for retrieving 16 by moving it?
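The whole insertion process, including the move of a record found at another record's home address, can be sketched as below. This is an illustrative reading of the slides rather than the course's reference algorithm: the class and method names are hypothetical, a pseudolink of 0 is assumed to mean "end of chain", a zero increment is replaced by 1, and the key sequence is the one assumed for the example.

```python
N = 11  # table size from the example

def home(key):
    return key % N

def increment(key):
    q = (key // N) % N
    return q if q != 0 else 1  # assumption: never step by zero

class ComputedChainTable:
    def __init__(self):
        self.keys = [None] * N
        self.nof = [0] * N  # pseudolink: number of offsets, 0 = end of chain

    def _next(self, loc):
        return (loc + self.nof[loc] * increment(self.keys[loc])) % N

    def _append(self, key):
        """Store key at its home, or at the end of its home's probe chain."""
        loc = home(key)
        if self.keys[loc] is None:
            self.keys[loc] = key
            return
        while self.nof[loc] != 0:          # follow pseudolinks to the chain's end
            loc = self._next(loc)
        step = increment(self.keys[loc])   # increment of the last record visited
        for n in range(1, N):
            candidate = (loc + n * step) % N
            if self.keys[candidate] is None:
                self.keys[candidate] = key
                self.nof[loc] = n          # offsets needed to reach the new record
                return
        raise OverflowError("table is full")

    def _detach_from(self, loc):
        """Unlink and remove the record at loc plus all its successors."""
        pred = home(self.keys[loc])
        while self._next(pred) != loc:     # search for the immediate predecessor
            pred = self._next(pred)
        self.nof[pred] = 0
        displaced = []
        while True:
            displaced.append(self.keys[loc])
            following = self._next(loc) if self.nof[loc] != 0 else None
            self.keys[loc], self.nof[loc] = None, 0
            if following is None:
                return displaced
            loc = following

    def insert(self, key):
        h = home(key)
        occupant = self.keys[h]
        if occupant is not None and home(occupant) != h:
            displaced = self._detach_from(h)   # occupant is not at its own home
            self.keys[h] = key
            for k in displaced:                # reinsert in chain order
                self._append(k)
        else:
            self._append(key)

    def find(self, key):
        """Return (location, probes) or None."""
        loc, probes = home(key), 1
        if self.keys[loc] is None or home(self.keys[loc]) != loc:
            return None
        while self.keys[loc] != key:
            if self.nof[loc] == 0:
                return None
            loc, probes = self._next(loc), probes + 1
        return loc, probes

table = ComputedChainTable()
for k in [27, 18, 29, 28, 39, 13, 16, 38, 53]:
    table.insert(k)
print(table.keys)
print(table.find(16))
```

In this sketch the record with key 16 takes two probes to retrieve both before and after it is moved, because its predecessor's pseudolink is simply recomputed for the new location.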
Computed Chaining - Discussion
• With computed chaining, the pseudolink field is normally only of a size sufficient to store the largest possible pseudolink value.
• But what if we do not know what the largest value is? Or what if only a limited, fixed number of bits is available for the pseudolink field?
• If insufficient bits exist to store the actual pseudolink value, its greatest factor that fits into the given number of bits is stored instead. This, however, increases the number of retrieval probes.
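As a small illustration of the "greatest factor that fits" rule, the sketch below picks the value to store for a given field width. The function name `storable_pseudolink` is hypothetical, and the field is assumed to hold unsigned values from 0 to 2^bits - 1.

```python
def storable_pseudolink(nof, bits):
    """Return nof if it fits in `bits` bits, else its greatest factor that does."""
    largest = (1 << bits) - 1
    if nof <= largest:
        return nof
    return max(f for f in range(1, largest + 1) if nof % f == 0)

# With a 3-bit field the largest storable value is 7, so 12 is stored as 6.
print(storable_pseudolink(12, 3))
```

Because the stored value divides the true number of offsets, repeated jumps of the stored size still land on the successor, at the cost of the extra probes mentioned above.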
Computed Chaining - Discussion
• The number of probes required to retrieve a record is equal to the position of that record in its probe chain.
• The worst-case retrieval performance is bounded by the number of records that collide at any one home address. The other collision resolution methods have worst-case performance that depends on the placement of the records inserted previously into the table.
• With computed chaining, the worst-case retrieval performance can be improved by choosing a hashing function that evenly distributes the probable addresses for the given data.
Computed Chaining - Deletion
• Deleting a record from a table formed using computed chaining requires the reinsertion of all the records that follow the deleted record on that probe chain.
• Computed chaining requires more time for inserting and deleting so that less time may be needed for retrieval.
Computed Chaining - Variant
Instead of searching an entire file of records or an entire probe chain of records, we could divide the records into smaller groups. The search narrows considerably if we know the one group that the record must be in, if it is in the file at all. Then we conquer.
Computed Chaining - Variant
• The concept is to have multiple probe chains instead of a single probe chain for organizing records.
• Each chain will be smaller and the searching will be faster.
Computed Chaining - Variant
• In addition to the hashing function, Hash(key), which gives the home address of a record, and the incrementing function, i(key), which gives an increment, we introduce a third hashing function, g(key), to tell us which probe chain to insert into or to search. This is an example of triple hashing.
• g(key) → 0, 1, ..., R - 1, where R is the number of subgroupings.
• A simple function for g(key) is g(key) = key mod R
An Example with Multiple Chains
• Keys: 27, 18, 29, 28, 39, 13, 16, 38, 53
• Hash(key) = key mod 11
• i(key) = Quotient(key / 11) mod 11
• g(key) = key mod 2
• If a key is even, it goes on the zeroth chain; if it is odd, it goes on the first chain. The records are inserted as before, the only difference being that the pseudolink value is placed in chain zero for even key values and in chain one for odd key values.
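To see how g(key) splits the work, the following sketch groups the example keys by home address and chain number. It is only an illustration of the three functions, with hypothetical names `home` and `chain`; it does not build the table itself, and the key list is the one assumed for the example.

```python
N, R = 11, 2  # table size and number of chains, as in the example

def home(key):
    return key % N   # Hash(key)

def chain(key):
    return key % R   # g(key): which pseudolink field the key uses

keys = [27, 18, 29, 28, 39, 13, 16, 38, 53]
groups = {}
for k in keys:
    groups.setdefault((home(k), chain(k)), []).append(k)

for (h, c), members in sorted(groups.items()):
    print(f"home {h}, chain {c}: {members}")
```

Keys that share both a home address and a chain number are the only ones that can lengthen each other's search path, which is why each chain ends up shorter than a single combined chain.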
Comparison of Collision Resolution Methods
For successful searches: [table not recovered]
Although the difference may not seem that great for a single retrieval, multiply those numbers by 10000 or 10000000.
Comparison of Collision Resolution Methods
Although the worst-case performance for locating a record with both LISCH and computed chaining is n, their typical performance would be better, because only the records on one chain need to be searched. With computed chaining, all the records on a chain must have the same home address, which further limits the size of the chain.
Comparison of Collision Resolution Methods
• The method that provides the lowest average number of probes, in general, is DCWC. The second is computed chaining.
• Computed chaining gives better results than LISCH because it eliminates coalescing. Without coalescing, LISCH is DCWC, and it does perform better than computed chaining.
• In computed chaining, for a record that has a large number of locations visited in locating its successor, it may be necessary to store a factor of the number of offsets. That requires more probes on retrieval.
Comparison of Collision Resolution Methods
• If storage is somewhat scarce, computed chaining will then have an advantage over DCWC.
• If space is limited, the method that gives the best performance without requiring additional storage for the records is the binary tree method. However, it requires more processing time and temporary storage for insertion than those methods that do not use a link or pseudolink.
Perfect Hashing
hash(key) → unique address
• Both primary and secondary clustering are eliminated.
• With perfect hashing, we need only a single probe to retrieve a record.
• Perfect hashing is applicable to files with only a relatively small number of records, because the computing time is large.
Perfect Hashing
• A perfect hashing function maps each key to a unique address. If the range of potential addresses is the same as the number of keys, the function is a minimal (in space) perfect hashing function.
• A separate hashing function needs to be devised for each set of keys. If one or more of the keys change, a new hashing function must be constructed.
Perfect Hashing
• In the minicycle algorithm, for a table of size N, a perfect hashing function may be characterized as
p.hash(key) = (h0(key) + g[h1(key)] + g[h2(key)]) mod N
• What needs to be decided are the functions h0, h1, h2 and g.
• These functions should be efficient.
Perfect Hashing
• The worst case of the minicycle algorithm is O(r^6), where r is the number of records in the set. An upper bound for r seems to be about 512.
• A special case of the minicycle algorithm is Cichelli's algorithm. It is not as efficient as the minicycle algorithm, is appropriate for r up to 60, and has further disadvantages, but it is straightforward to understand.
• For larger amounts, the minicycle algorithm is appropriate.
Cichelli's Algorithm
h0 = length(key)
h1 = first_character(key)
h2 = last_character(key)
g = T(x)
• Here T is a table of values associated with the individual characters x that may appear in a key. The time-consuming part is determining T.
• In the table, a value may be assigned to more than one character.
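Once T is known, evaluating the hash is trivial. The sketch below assumes the table values reached at the end of the worked example in these slides; the function name `cichelli_hash` is hypothetical, and no "mod N" is applied, matching the way the example computes its values.

```python
def cichelli_hash(key, T):
    """p.hash(key) = length(key) + T(first character) + T(last character)."""
    return len(key) + T[key[0]] + T[key[-1]]

# Character values assumed from the end of the worked example.
T = {"t": 0, "d": 0, "g": 4, "c": 0, "r": 2, "a": 3, "p": 4}

for word in ["cat", "ant", "dog", "gnat", "chimp", "rat", "toad"]:
    print(word, cichelli_hash(word, T))
```

Retrieval costs one table lookup per end character plus a length computation; all of the expense lies in finding T in the first place.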
Cichelli's Algorithm
Let's apply the algorithm to the keyword begin.
p.hash(begin) = 5 + T(h1(key)) + T(h2(key)) = 5 + T(b) + T(n) = 5 + 15 + 13 = 33
Cichelli's Algorithm
• The algorithm involves ordering the keys based on the frequencies of occurrence of the first and last characters in the keys.
• Values are assigned to the characters in the first and last positions of the keys, working from the top of the ordering to the bottom.
• The algorithm uses an exhaustive search with backtracking.
Cichelli's Algorithm - Example
Keys: cat, ant, dog, gnat, chimp, rat, toad
• We assume that the maximum value that may be assigned to a character is 4. If we cannot find a solution using 4, we would try a larger value.
• If this maximum value is too small, we will either have no solution or a great amount of backtracking. If the value is too large, we won't obtain a minimal solution.
Cichelli's Algorithm - Example
• The frequencies of occurrence of the first and last characters are
a = 1, c = 2, d = 2, g = 2, p = 1, r = 1, t = 5
Cichelli's Algorithm - Example
We can compute the sum of the frequencies of occurrence of the first and last characters of each key. We obtain
cat = 7, ant = 6, dog = 4, gnat = 7, chimp = 3, rat = 6, toad = 7
Cichelli's Algorithm - Example
• We next order the keys in descending order based on the sum of the frequencies to obtain the ordering [figure not recovered].
• We arbitrarily chose the descending order. A random ordering would have been equally acceptable.
Cichelli's Algorithm - Example
• Then we check the ordering to see if any keys exist that have both their first and last characters appearing in previous keys.
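The ordering steps above can be sketched as follows. The variable names are illustrative, and two details are assumptions: ties between equal sums are broken by input order (so the result may differ from the slide's ordering among keys with equal sums), and a key whose first and last characters have both already appeared is moved up to directly follow the key that completed the pair.

```python
from collections import Counter

keys = ["cat", "ant", "dog", "gnat", "chimp", "rat", "toad"]

# Frequency of each character over all first and last positions.
freq = Counter(ch for k in keys for ch in (k[0], k[-1]))

# Sum of the two end-character frequencies for each key.
sums = {k: freq[k[0]] + freq[k[-1]] for k in keys}

# Descending order by sum (Python's sort is stable, so ties keep input order).
by_sum = sorted(keys, key=lambda k: -sums[k])

# Promote keys whose first and last characters both occur in earlier keys.
ordered, seen, pending = [], set(), list(by_sum)
while pending:
    k = pending.pop(0)
    ordered.append(k)
    seen.update((k[0], k[-1]))
    ready = [p for p in pending if p[0] in seen and p[-1] in seen]
    for p in ready:
        pending.remove(p)
        ordered.append(p)

print(dict(freq))
print(sums)
print(ordered)
```

Promoting such keys early means a conflict caused by already-fixed character values is discovered as soon as possible, which cuts down on wasted search.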
Cichelli's Algorithm - Example
• We begin assigning values to the characters.
Assignments: t = 0, d = 0
Hashes: p.hash(toad) = 4
Assignments: t = 0, d = 0, g = 1
Hashes: p.hash(toad) = 4, p.hash(gnat) = 5
Cichelli's Algorithm - Example
• In "dog" we have a conflict, so we backtrack.
Assignments: t = 0, d = 0, g = 2
Hashes: p.hash(toad) = 4, p.hash(gnat) = 6
Assignments: t = 0, d = 0, g = 2
Hashes: p.hash(toad) = 4, p.hash(gnat) = 6, p.hash(dog) = 5
Cichelli's Algorithm - Example
Assignments: t = 0, d = 0, g = 2, c = 0
Hashes: p.hash(toad) = 4, p.hash(gnat) = 6, p.hash(dog) = 5, p.hash(cat) = 3
Cichelli's Algorithm - Example
Assignments: t = 0, d = 0, g = 2, c = 0, r = 4
Hashes: p.hash(toad) = 4, p.hash(gnat) = 6, p.hash(dog) = 5, p.hash(cat) = 3, p.hash(rat) = 7
Cichelli's Algorithm - Example
• We are not able to assign a value to a without a collision, so we backtracked.
Assignments: t = 0, d = 0, g = 3, c = 0, r = 2
Hashes: p.hash(toad) = 4, p.hash(gnat) = 7, p.hash(dog) = 6, p.hash(cat) = 3, p.hash(rat) = 5
Cichelli's Algorithm - Example
• The problem still existed and we backtracked.
Assignments: t = 0, d = 0, g = 4, c = 0, r = 2, a = 3
Hashes: p.hash(toad) = 4, p.hash(gnat) = 8, p.hash(dog) = 7, p.hash(cat) = 3, p.hash(rat) = 5, p.hash(ant) = 6
Cichelli's Algorithm - Example
Assignments: t = 0, d = 0, g = 4, c = 0, r = 2, a = 3, p = 4
Hashes: p.hash(toad) = 4, p.hash(gnat) = 8, p.hash(dog) = 7, p.hash(cat) = 3, p.hash(rat) = 5, p.hash(ant) = 6, p.hash(chimp) = 9
Since this solution maps the seven keys into seven consecutive locations, it is a minimal perfect hashing function.
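The exhaustive search with backtracking used in this example can be sketched as below. It is a minimal illustration, not Cichelli's original program: the function name `cichelli_search` is hypothetical, the key ordering is the one implied by the order in which the slides assign letters, values are tried in ascending order from 0 to the maximum, and the only check made is that no two keys receive the same hash value.

```python
def cichelli_search(ordered_keys, max_value):
    """Assign a value in 0..max_value to each first/last character so that
    length + T[first] + T[last] is distinct for every key; None if impossible."""
    T = {}          # character -> assigned value
    used = set()    # hash values already taken

    def place(i, letters):
        if i == len(ordered_keys):
            return True
        key = ordered_keys[i]
        if letters is None:  # first visit to this key: find unassigned letters
            letters = [c for c in dict.fromkeys((key[0], key[-1])) if c not in T]
        if letters:
            c, rest = letters[0], letters[1:]
            for v in range(max_value + 1):
                T[c] = v
                if place(i, rest):
                    return True
            del T[c]         # every value failed: backtrack
            return False
        h = len(key) + T[key[0]] + T[key[-1]]
        if h in used:
            return False     # collision with an earlier key
        used.add(h)
        if place(i + 1, None):
            return True
        used.remove(h)
        return False

    return T if place(0, None) else None

order = ["toad", "gnat", "dog", "cat", "rat", "ant", "chimp"]
solution = cichelli_search(order, 4)
print(solution)
print({k: len(k) + solution[k[0]] + solution[k[-1]] for k in order})
```

With these assumptions the search backtracks over g several times before settling on the same assignment as the slides, which gives a feel for how quickly the work grows with the number of keys.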
Cichelli's Algorithm - Discussion
• The amount of backtracking can be large, and the computation time is substantial. On the positive side, the function needs to be computed only once.
• The disadvantages: it requires that no two keys of the same length share the same first and last characters. For a list with more than 45 elements, it may be necessary to segment it into sublists. For certain lists, it may be impossible to tell in advance whether the method will yield a minimal perfect hashing function.