DATA STRUCTURES WITH C
10CS35 DATA STRUCTURES WITH C (Common to CSE & ISE)
Subject Code: 10CS35    Hours/Week: 04    Total Hours: 52
I.A. Marks: 25    Exam Hours: 03    Exam Marks: 100

PART – A

UNIT - 1    8 Hours
BASIC CONCEPTS: Pointers and Dynamic Memory Allocation, Algorithm Specification, Data Abstraction, Performance Analysis, Performance Measurement

UNIT - 2    6 Hours
ARRAYS and STRUCTURES: Arrays, Dynamically Allocated Arrays, Structures and Unions, Polynomials, Sparse Matrices, Representation of Multidimensional Arrays

UNIT - 3    6 Hours
STACKS AND QUEUES: Stacks, Stacks Using Dynamic Arrays, Queues, Circular Queues Using Dynamic Arrays, Evaluation of Expressions, Multiple Stacks and Queues.

UNIT - 4    6 Hours
LINKED LISTS: Singly Linked Lists and Chains, Representing Chains in C, Linked Stacks and Queues, Polynomials, Additional List Operations, Sparse Matrices, Doubly Linked Lists

PART - B

UNIT - 5    6 Hours
TREES – 1: Introduction, Binary Trees, Binary Tree Traversals, Threaded Binary Trees, Heaps.

UNIT - 6    6 Hours
TREES – 2, GRAPHS: Binary Search Trees, Selection Trees, Forests, Representation of Disjoint Sets, Counting Binary Trees, The Graph Abstract Data Type.

UNIT - 7    6 Hours
PRIORITY QUEUES: Single- and Double-Ended Priority Queues, Leftist Trees, Binomial Heaps, Fibonacci Heaps, Pairing Heaps.

UNIT - 8    8 Hours
EFFICIENT BINARY SEARCH TREES: Optimal Binary Search Trees, AVL Trees, Red-Black Trees, Splay Trees.

Text Book:
1. Horowitz, Sahni, Anderson-Freed: Fundamentals of Data Structures in C, 2nd Edition, Universities Press, 2007. (Chapters 1, 2.1 to 2.6, 3, 4, 5.1 to 5.3, 5.5 to 5.11, 6.1, 9.1 to 9.5, 10)

Reference Books:
1. Yedidyah, Augenstein, Tannenbaum: Data Structures Using C and C++, 2nd Edition, Pearson Education, 2003.
2. Debasis Samanta: Classic Data Structures, 2nd Edition, PHI, 2009.
3. Richard F. Gilberg and Behrouz A. Forouzan: Data Structures A Pseudocode Approach with C, Cengage Learning, 2005.
4. Robert Kruse & Bruce Leung: Data Structures & Program Design in C, Pearson Education, 2007.
DEPT OF CSE, SJBIT
TABLE OF CONTENTS

UNIT – 1: BASIC CONCEPTS
    1.1. Pointers and Dynamic Memory Allocation
    1.2. Algorithm Specification
    1.3. Data Abstraction
    1.4. Performance Analysis
    1.5. Performance Measurement
    1.6. RECOMMENDED QUESTIONS
UNIT – 2: ARRAYS and STRUCTURES
    2.1. ARRAY
    2.2. Dynamically Allocating Multidimensional Arrays
    2.3. Structures and Unions
    2.4. Polynomials
    2.5. Sparse Matrices
    2.6. Representation of Multidimensional Arrays
    2.7. RECOMMENDED QUESTIONS
UNIT – 3: STACKS AND QUEUES
    3.1. Stacks
    3.2. Stacks Using Dynamic Arrays
    3.3. Queues
    3.4. Circular Queues Using Dynamic Arrays
    3.5. Evaluation of Expressions: Evaluating a postfix expression
    3.6. Multiple Stacks and Queues
    3.7. RECOMMENDED QUESTIONS
UNIT – 4: LINKED LISTS
    4.1. Singly Linked Lists and Chains
    4.2. Representing Chains in C
    4.3. Linked Stacks and Queues
    4.4. Polynomials
    4.5. Additional List Operations
    4.6. Sparse Matrices
    4.7. Doubly Linked Lists
    4.8. RECOMMENDED QUESTIONS
UNIT – 5: TREES – 1
    5.1. Introduction
    5.2. Binary Trees
    5.3. Binary Tree Traversals
    5.4. Threaded Binary Trees
    5.5. Heaps
    5.6. RECOMMENDED QUESTIONS
UNIT – 6: TREES – 2, GRAPHS
    6.1. Binary Search Trees
    6.2. Selection Trees
    6.3. Forests
    6.4. Representation of Disjoint Sets
    6.5. Counting Binary Trees
    6.6. The Graph Abstract Data Type
    6.7. RECOMMENDED QUESTIONS
UNIT – 7: PRIORITY QUEUES
    7.1. Single- and Double-Ended Priority Queues
    7.2. Leftist Tree
    7.3. Binomial Heaps
    7.4. Fibonacci Heaps
    7.5. Pairing Heaps
    7.6. RECOMMENDED QUESTIONS
UNIT – 8: EFFICIENT BINARY SEARCH TREES
    8.1. Optimal Binary Search Trees
    8.2. AVL Trees
    8.3. Red-Black Trees
    8.5. RECOMMENDED QUESTIONS

UNIT – 1: BASIC CONCEPTS
1.1. Pointers and Dynamic Memory Allocation:
In computer science, a pointer is a programming language data type whose value refers directly to (or "points to") another value stored elsewhere in the computer memory using its address. For high-level programming languages, pointers effectively take the place of general-purpose registers in low-level languages such as assembly language or machine code, but may be in available memory. A pointer references a location in memory, and obtaining the value at the location a pointer refers to is known as dereferencing the pointer. A pointer is a simple, more concrete implementation of the more abstract reference data type. Several languages support some type of pointer, although some have more restrictions on their use than others.

Pointers to data significantly improve performance for repetitive operations such as traversing strings, lookup tables, control tables and tree structures. In particular, it is often much cheaper in time and space to copy and dereference pointers than it is to copy and access the data to which the pointers point. Pointers are also used to hold the addresses of entry points for called subroutines in procedural programming and for run-time linking to dynamic link libraries (DLLs). In object-oriented programming, pointers to functions are used for binding methods, often using what are called virtual method tables.

Declaring a pointer variable is quite similar to declaring a normal variable: all you have to do is insert a star '*' before the name. The general form of a pointer declaration is
    type *name;
where type represents the type of the object the pointer points to. Pointers to built-in as well as user-defined types can be declared.

Pointer Initialization:
    variable_type *pointer_name = 0;
or
    variable_type *pointer_name = NULL;
    char *pointer_name = "string value here";

The operator that gets the value from a pointer variable is * (the indirection operator). Applying it is called dereferencing the pointer. After
    p = &a;
the pointer p holds the address of a, and the value stored at that address can be accessed as
    *p
So each of the following statements has the same effect, adding 1 to a:
    a++;
    a = a + 1;
    *p = *p + 1;
    (*p)++;

While "pointer" has been used to refer to references in general, it more properly applies to data structures whose interface explicitly allows the pointer to be manipulated (arithmetically, via pointer arithmetic) as a memory address, as opposed to a magic cookie or capability where this is not possible.
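The declaration, dereferencing and pointer-arithmetic ideas above can be tried out with a small C sketch. The function names increment_through and sum_with_pointer are invented for this illustration and do not come from the textbook.

```c
#include <assert.h>
#include <stddef.h>

/* Adds 1 to the int that p points to: the "*p = *p + 1" form from the text. */
void increment_through(int *p)
{
    assert(p != NULL);            /* dereferencing a null pointer is undefined */
    *p = *p + 1;
}

/* Sums n ints by walking a pointer across the array (pointer arithmetic). */
int sum_with_pointer(const int *first, size_t n)
{
    int total = 0;
    const int *end = first + n;   /* one past the last element */

    for (const int *p = first; p != end; ++p)
        total += *p;              /* *p reads the element p currently points to */
    return total;
}
```

With int a = 5; and int *p = &a;, calling increment_through(p) changes a itself, because p holds the address of a rather than a copy of its value.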
Fig 1: Pointer a pointing to the memory address associated with variable b. Note that in this particular diagram, the computing architecture uses the same address space and data primitive for both pointers and non-pointers; this need not be the case.

Pointers and Dynamic Memory Allocation:
Although arrays are good things, we cannot adjust their size in the middle of the program. If our array is too small, our program will fail for large data. If our array is too big, we waste a lot of space, again restricting what we can do. The right solution is to build the data structure from small pieces, and add a new piece whenever we need to make it larger. Pointers are the connections which hold these pieces together!

Pointers in Real Life
In many ways, telephone numbers serve as pointers in today's society. To contact someone, you do not have to carry them with you at all times. All you need is their number. Many different people can all have your number simultaneously. All you need do is copy the pointer. More complicated structures can be built by combining pointers, for example phone trees or directory information. Addresses are a more physically correct analogy for pointers, since they really are memory addresses.

Linked Data Structures
All the dynamic data structures we will build have certain shared properties:
- We need a pointer to the entire object so we can find it. Note that this is a pointer, not a cell.
- Each cell contains one or more data fields, which is what we want to store.
- Each cell contains a pointer field to at least one "next" cell. Thus much of the space used in linked data structures is not data!
- We must be able to detect the end of the data structure. This is why we need the NIL pointer (NULL in C).

There are four functions defined in the C standard for dynamic memory allocation: calloc, free, malloc and realloc. At the heart of dynamic memory allocation (DMA), however, there are only two of them: malloc and free. malloc stands for memory allocation and is used to allocate memory from the heap, while free is used to return memory allocated by malloc back to the heap.
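As a rough illustration of the "array that is too small" problem, the sketch below grows a heap block on demand with realloc, one of the four functions just listed. The IntVec type, its field names and the doubling policy are assumptions made for this example, not something the notes prescribe.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable array of ints; all names are illustrative only. */
typedef struct {
    int    *items;      /* heap block, or NULL while empty */
    size_t  count;      /* elements in use */
    size_t  capacity;   /* elements allocated */
} IntVec;

/* Appends value, doubling the block when it is full.
   Returns 1 on success, 0 if allocation fails (the vector is left unchanged). */
int intvec_push(IntVec *v, int value)
{
    assert(v != NULL);
    if (v->count == v->capacity) {
        size_t new_cap = v->capacity ? v->capacity * 2 : 4;
        int *grown = realloc(v->items, new_cap * sizeof *v->items);
        if (grown == NULL)
            return 0;             /* the old block is still valid */
        v->items = grown;
        v->capacity = new_cap;
    }
    v->items[v->count++] = value;
    return 1;
}

/* Returns the block to the heap and resets the vector to empty. */
void intvec_free(IntVec *v)
{
    assert(v != NULL);
    free(v->items);
    v->items = NULL;
    v->count = v->capacity = 0;
}
```

A vector starts out as IntVec v = {NULL, 0, 0}; because realloc applied to a null pointer behaves like malloc, the first push allocates the initial block. Assigning the result of realloc to a temporary first avoids losing the old block when the call fails.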
Both malloc and free are declared in the standard library header <stdlib.h>.

Warning: free() should be used only on memory previously obtained from malloc, realloc or calloc. Freeing a random, uninitialized or compiler-allocated address is undefined behavior; in practice it corrupts the heap and can crash the program in ways that are hard to trace.

The prototype of malloc() is
    void *malloc(size_t number_of_bytes);
The important thing to note is that malloc returns a void pointer, which can be converted to any object pointer type. size_t is an unsigned integer type defined in <stddef.h> (and also made available by <stdlib.h>) that is capable of holding the size of the largest object that can be allocated; number_of_bytes is a value of type size_t giving the amount of memory to allocate. malloc() returns a null pointer if the allocation fails, and a pointer to the start of the allocated region when it succeeds. Check the returned pointer for failure before using the memory; programmers generally provide some error-handling code for this case. In C the returned pointer never needs a typecast, since a void pointer converts implicitly; some programmers add one anyway because C++ requires it and a C++ compiler reports an error without it. The sizeof operator is commonly used with malloc to calculate number_of_bytes from the size of built-in and user-defined types and variables.

The prototype of free() is
    void free(void *p);
free() is the opposite of malloc and returns memory previously allocated by the other DMA functions. Only memory allocated using DMA should be released with free(); otherwise you may, at a minimum, corrupt the memory allocation system.

The C source code below shows a simple way of using dynamic memory allocation:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int *p;

        p = (int *) malloc(sizeof(int));   // dynamic memory allocation
        if (p == NULL)                     // allocation failed: run the error-handling code
        {
            printf("\nOut of Memory");
            exit(1);
        }
        *p = 100;
        printf("\n p = %d", *p);           // displays 100
        free(p);                           // return the memory to the heap
        return 0;
    }

Dynamic Allocation:
The remaining examples in this section use Modula-3-style notation rather than C. To get dynamic allocation, use New:
    p := New(ptype);
New(ptype) allocates enough space to store exactly one object of the type ptype. Further, it returns a pointer to this empty cell. Before a New or other explicit initialization, a pointer variable has an arbitrary value which points to trouble!

Warning - initialize all pointers before use. Since you cannot initialize them to explicit constants, your only choices are:
- NIL, meaning explicitly nothing.
- New(ptype), a fresh chunk of memory.

Pointer Examples
Example:
    p := New(node);
    q := New(node);
p^.x grants access to the field x of the record pointed to by p.
    p^.info := "music";
    q^.next := nil;
The pointer value itself may be copied, which does not change any of the other fields. Note this difference between assigning pointers and what they point to.
    p := q;
We get a real mess. We have completely lost access to music and can't get it back! Pointers are unidirectional. Alternatively, we could copy the object being pointed to instead of the pointer itself.
    p^ := q^;
What happens in each case if we now did:
    p^.info := "data structures";

Where Does the Space Come From?
Can we really get as much memory as we want without limit just by using New? No, because there are physical limits imposed by the size of the memory of the computer we are using. Usually Modula-3 systems let the dynamic memory come from the "other side" of the "activation record stack" used to maintain procedure calls. Just as the stack reuses memory when a procedure exits, dynamic storage must be recycled when we don't need it anymore.
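The difference between copying a pointer and copying the cell it points to can be reproduced in C, with malloc playing the role of New. The sketch below is one possible translation; the Node layout, the function names and the value "jazz" are invented for the illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct Node {
    const char  *info;   /* data field */
    struct Node *next;   /* link to the next cell, NULL at the end */
} Node;

/* Rough counterpart of New(node): returns a fresh cell, or NULL on failure. */
Node *new_node(const char *info)
{
    Node *n = malloc(sizeof *n);
    if (n != NULL) {
        n->info = info;
        n->next = NULL;
    }
    return n;
}

/* Walks through both kinds of assignment.
   Returns 1 on success, 0 if an allocation failed. */
int assignment_demo(void)
{
    Node *p = new_node("music");
    Node *q = new_node("jazz");          /* "jazz" is an invented value */
    if (p == NULL || q == NULL) {
        free(p);
        free(q);
        return 0;
    }

    Node *music = p;     /* second pointer, so the "music" cell is not lost */

    p = q;                                             /* p := q   */
    p->info = "data structures";
    assert(strcmp(q->info, "data structures") == 0);   /* q sees the change */
    assert(strcmp(music->info, "music") == 0);         /* old cell untouched */

    p = music;
    *p = *q;                                           /* p^ := q^ */
    p->info = "algorithms";
    assert(strcmp(q->info, "data structures") == 0);   /* q is unaffected */

    free(music);
    free(q);
    return 1;
}
```

Without the extra pointer music, the assignment p = q would leave the first cell unreachable. C has no garbage collector, so that cell would stay allocated until the program exits.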
Garbage Collection
The Modula-3 system constantly keeps watch on the dynamic memory which it has allocated, making sure that something is still pointing to it. If not, there is no way for you to get access to it, so the space might as well be recycled. The garbage collector automatically frees up the memory which has nothing pointing to it. It frees you from having to worry about explicitly freeing memory, at the cost of leaving certain structures which it can't figure out are really garbage, such as a circular list.

Explicit Deallocation
Although certain languages like Modula-3 and Java support garbage collection, others like C and C++ require you to explicitly deallocate memory when you don't need it. Dispose(p) is the opposite of New: it takes the object which is pointed to by p and makes it available for reuse. Note that each dispose takes care of only one cell in a list. To dispose of an entire linked structure we must do it one cell at a time. Note that we can get into trouble with dispose:
    Dispose(p)
Of course, it is too late to dispose of music, so it will endure forever without garbage collection. Suppose we Dispose(p), and later allocate more dynamic memory with New. The cell we disposed of might be reused. Now what does q point to? Answer: the same location, but it means something else! So-called dangling references are a horrible error, and are the main reason why Modula-3 supports garbage collection. A dangling reference is like a friend left with your old phone number after you move. Reach out and touch someone - eliminate dangling references!

Security in Java
It is possible to explicitly dispose of memory in Modula-3 when it is really necessary, but it is strongly discouraged. Java does not allow one to do such operations on pointers at all. The reason is security. Pointers allow you access to raw memory locations. In the hands of skilled but evil people, unchecked access to pointers permits you to modify the operating system's or other people's memory contents.
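In C the role of Dispose is played by free. The sketch below shows one way to release a linked structure one cell at a time and to clear the caller's pointer afterwards, so that this pointer at least does not dangle. The Cell type and the function names are assumptions made for the illustration.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Cell {
    int          value;
    struct Cell *next;
} Cell;

/* Puts a new cell in front of head. Returns the new head, or NULL if the
   allocation fails (the existing list is untouched in that case). */
Cell *push_front(Cell *head, int value)
{
    Cell *c = malloc(sizeof *c);
    if (c == NULL)
        return NULL;
    c->value = value;
    c->next = head;
    return c;
}

/* Frees every cell, one at a time, then sets the caller's pointer to NULL.
   Returns the number of cells released. */
size_t dispose_list(Cell **head)
{
    assert(head != NULL);
    size_t freed = 0;
    Cell *cur = *head;

    while (cur != NULL) {
        Cell *next = cur->next;   /* read the link before the cell is freed */
        free(cur);
        cur = next;
        ++freed;
    }
    *head = NULL;                 /* this pointer no longer dangles */
    return freed;
}
```

Only the pointer passed in is cleared. Any other copy of it, like q in the discussion above, still holds the old address and must not be used after the call.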
1.2. Algorithm Specification:
A pragmatic approach to algorithm specification and verification is presented. The language AL provides a level of abstraction between a mathematical specification notation and a programming language, supporting compact but expressive algorithm description. Proofs of correctness about algorithms written in AL can be done via an embedding of the semantics of the language in a proof system; implementations of algorithms can be done through translation to standard programming languages. The proofs of correctness are more tractable than direct verification of programming language code; descriptions in AL are more easily related to executable programs than standard mathematical specifications. AL provides an independent, portable description which can be related to different proof systems and different programming languages. Several interfaces have been explored, and tools for fully automatic translation of AL specifications into the HOL logic and Standard ML executable code have been implemented. A substantial case study uses AL as the common specification language from which both the formal proofs of correctness and executable code have been produced.
1.3. Data Abstraction
Abstraction is the process by which data and programs are defined with a representation similar to their meaning (semantics), while hiding away the implementation details. Abstraction tries to reduce and factor out details so that the programmer can focus on a few concepts at a time. A system can have several abstraction layers whereby different meanings and amounts of detail are exposed to the programmer. For example, low-level abstraction layers expose details of the hardware where the program is run, while high-level layers deal with the business logic of the program.

The following English definition of abstraction helps to understand how this term applies to computer science, IT and objects:
    abstraction - a concept or idea not associated with any specific instance [1]

Abstraction captures only those details about an object that are relevant to the current perspective. The concept originated by analogy with abstraction in mathematics. The mathematical technique of abstraction begins with mathematical definitions, making it a more technical approach than the general concept of abstraction in philosophy. For example, in both computing and mathematics, numbers are concepts in the programming languages, as founded in mathematics. Implementation details depend on the hardware and software, but this is not a restriction because the computing concept of number is still based on the mathematical concept.

In computer programming, abstraction can apply to control or to data: control abstraction is the abstraction of actions, while data abstraction is that of data structures. Control abstraction involves the use of subprograms and related concepts of control flow. Data abstraction allows handling data bits in meaningful ways. For example, it is the basic motivation behind the datatype. One can regard the notion of an object (from object-oriented programming) as an attempt to combine abstractions of data and code. The same abstract definition can be used as a common interface for a family of objects with different implementations and behaviors but which share the same meaning. The inheritance mechanism in object-oriented programming can be used to define an abstract class as the common interface.

Data abstraction enforces a clear separation between the abstract properties of a data type and the concrete details of its implementation. The abstract properties are those that are visible to client code that makes use of the data type (the interface to the data type), while the concrete implementation is kept entirely private, and indeed can change, for example to incorporate efficiency improvements over time. The idea is that such changes are not supposed to have any impact on client code, since they involve no difference in the abstract behaviour.

For example, one could define an abstract data type called lookup table which uniquely associates keys with values, and in which values may be retrieved by specifying their corresponding keys. Such a lookup table may be implemented in various ways: as a hash table, a binary search tree, or even a simple linear list of (key:value) pairs. As far as client code is concerned, the abstract properties of the type are the same in each case.

Of course, this all relies on getting the details of the interface right in the first place, since any changes there can have major impacts on client code. As one way to look at this: the interface forms a contract on agreed behaviour between the data type and client code; anything not spelled out in the contract is subject to change without notice.

Languages that implement data abstraction include Ada and Modula-2. Object-oriented languages are commonly claimed to offer data abstraction; however, their inheritance concept tends to put information in the interface that more properly belongs in the implementation; thus, changes to such information end up impacting client code, leading directly to the fragile binary interface problem.
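The lookup table described above can be sketched in C with an opaque type: client code sees only the declarations at the top, while the representation (here the simple linear list of key:value pairs) stays private and could later be swapped for a hash table or a search tree. All type and function names below are hypothetical, and in a real program the two parts would usually live in a header file and a source file.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* ---- interface: all that client code is meant to rely on ---- */
typedef struct LookupTable LookupTable;   /* opaque: layout is hidden */
LookupTable *table_create(void);
int  table_put(LookupTable *t, const char *key, int value);       /* 1 = ok    */
int  table_get(const LookupTable *t, const char *key, int *out);  /* 1 = found */
void table_destroy(LookupTable *t);

/* ---- one possible implementation: a linear list of (key:value) pairs ---- */
typedef struct Entry {
    char         *key;
    int           value;
    struct Entry *next;
} Entry;

struct LookupTable {
    Entry *head;
};

LookupTable *table_create(void)
{
    LookupTable *t = malloc(sizeof *t);
    if (t != NULL)
        t->head = NULL;
    return t;
}

static Entry *find_entry(const LookupTable *t, const char *key)
{
    for (Entry *e = t->head; e != NULL; e = e->next)
        if (strcmp(e->key, key) == 0)
            return e;
    return NULL;
}

int table_put(LookupTable *t, const char *key, int value)
{
    assert(t != NULL && key != NULL);
    Entry *e = find_entry(t, key);
    if (e != NULL) {              /* keys are unique: overwrite the value */
        e->value = value;
        return 1;
    }
    e = malloc(sizeof *e);
    if (e == NULL)
        return 0;
    e->key = malloc(strlen(key) + 1);
    if (e->key == NULL) {
        free(e);
        return 0;
    }
    strcpy(e->key, key);          /* the table keeps its own copy of the key */
    e->value = value;
    e->next = t->head;
    t->head = e;
    return 1;
}

int table_get(const LookupTable *t, const char *key, int *out)
{
    assert(t != NULL && key != NULL && out != NULL);
    const Entry *e = find_entry(t, key);
    if (e == NULL)
        return 0;
    *out = e->value;
    return 1;
}

void table_destroy(LookupTable *t)
{
    if (t == NULL)
        return;
    Entry *e = t->head;
    while (e != NULL) {
        Entry *next = e->next;
        free(e->key);
        free(e);
        e = next;
    }
    free(t);
}
```

Code that calls only table_create, table_put, table_get and table_destroy would not need to change if the list were replaced by another representation, which is the separation between interface and implementation that the text describes.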
1.4. Performance Analysis:
Performance analysis involves gathering formal and informal data to help customers and sponsors define and achieve their goals. Performance analysis uncovers several perspectives on a problem or opportunity, determining any and all drivers towards or barriers to successful performance, and proposing a solution system based on what is discovered.

A lighter definition is: performance analysis is the front end of the front end. It's what we do to figure out what to do. Some synonyms are planning, scoping, auditing, and diagnostics.

What does a performance analyst do? Here's a list of some of the things you may be doing as part of a performance analysis:
- Interviewing a sponsor
- Reading the annual report
- Chatting at lunch with a group of customer service representatives
- Reading the organization's policy on customer service, focusing particularly on the recognition and incentive aspects
- Listening to audiotapes associated with customer service complaints
- Leading a focus group with supervisors
- Interviewing some randomly drawn representatives
- Reviewing the call log
- Reading an article in a professional journal on the subject of customer service performance improvement
- Chatting at the supermarket with somebody who is a customer, who wants to tell you about her experience with customer service

We distinguish three basic steps in the performance analysis process: data collection, data transformation, and data visualization.

Data collection is the process by which data about program performance are obtained from an executing program. Data are normally collected in a file, either during or after execution, although in some situations they may be presented to the user in real time. Three basic data collection techniques can be distinguished:
- Profiles record the amount of time spent in different parts of a program. This information, though minimal, is often invaluable for highlighting performance problems. Profiles typically are gathered automatically.
- Counters record either frequencies of events or cumulative times. The insertion of counters may require some programmer intervention.
- Event traces record each occurrence of various specified events, thus typically producing a large amount of data. Traces can be produced either automatically or with programmer intervention.
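A hand-inserted counter with a cumulative timer can be sketched in C using clock() from <time.h>, which reports processor time used by the program. The RegionStats record, timed_call and the sample workload sum_below are names invented for this illustration; real profiling tools gather such data automatically.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical instrumentation record for one region of a program. */
typedef struct {
    unsigned long calls;   /* counter: how often the region ran */
    clock_t       ticks;   /* cumulative processor time spent in it */
} RegionStats;

/* Runs fn(arg) and charges the processor time it used to stats. */
void timed_call(RegionStats *stats, void (*fn)(void *), void *arg)
{
    assert(stats != NULL && fn != NULL);
    clock_t start = clock();
    fn(arg);
    stats->ticks += clock() - start;
    stats->calls += 1;
}

/* Converts the accumulated ticks to seconds. */
double region_seconds(const RegionStats *stats)
{
    assert(stats != NULL);
    return (double)stats->ticks / CLOCKS_PER_SEC;
}

/* Sample workload (illustrative): replaces *n with 0 + 1 + ... + (*n - 1). */
void sum_below(void *arg)
{
    unsigned long *n = arg;
    unsigned long total = 0;

    for (unsigned long i = 0; i < *n; ++i)
        total += i;
    *n = total;
}
```

The two clock() calls are themselves overhead added to the measured program, and the resolution of clock() limits how short a region can be timed meaningfully.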
The raw data produced by profiles, counters, or traces are rarely in the form required to answer performance questions. Hence, data transformations are applied, often with the goal of reducing total data volume. Transformations can be used to determine mean values or other higher-order statistics or to extract profile and counter data from traces. For example, a profile recording the time spent in each subroutine on each processor might be transformed to determine the mean time spent in each subroutine on each processor, and the standard deviation from this mean. Similarly, a trace can be processed to produce a histogram giving the distribution of message sizes. Each of the various performance tools described in subsequent sections incorporates some set of built-in transformations; more specialized transformations can also be coded by the programmer.

Parallel performance data are inherently multidimensional, consisting of execution times, communication costs, and so on, for multiple program components, on different processors, and for different problem sizes. Although data reduction techniques can be used in some situations to compress performance data to scalar values, it is often necessary to be able to explore the raw multidimensional data. As is well known in computational science and engineering, this process can benefit enormously from the use of data visualization techniques. Both conventional and more specialized display techniques can be applied to performance data.

As we shall see, a wide variety of data collection, transformation, and visualization tools are available. When selecting a tool for a particular task, the following issues should be considered:

Accuracy. In general, performance data obtained using sampling techniques are less accurate than data obtained by using counters or timers. In the case of timers, the accuracy of the clock must be taken into account.

Simplicity. The best tools in many circumstances are those that collect data automatically, with little or no programmer intervention, and that provide convenient analysis capabilities.

Flexibility. A flexible tool can be extended easily to collect additional performance data or to provide different views of the same data. Flexibility and simplicity are often opposing requirements.

Intrusiveness. Unless a computer provides hardware support, performance data collection inevitably introduces some overhead. We need to be aware of this overhead and account for it when analyzing data.

Abstraction. A good performance tool allows data to be examined at a level of abstraction appropriate for the programming model of the parallel program. For example, when analyzing an execution trace from a message-passing program, we probably wish to see individual messages, particularly if they can be related to send and receive statements in the source program. However, this presentation is probably not appropriate when studying a data-parallel program, even if compilation generates a message-passing program. Instead, we would like to see communication costs related to data-parallel program statements.
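The data transformation step described above (reducing raw timings to a mean and a spread) can be sketched in C. This is a minimal illustration, not part of any particular performance tool: the function name summarize and the use of the population variance are assumptions made for the example. The standard deviation is the square root of the variance returned here.

```c
#include <assert.h>
#include <stddef.h>

/* Reduce n raw timing samples to their mean and population variance.
 * (Hypothetical helper for illustration; the standard deviation is the
 * square root of *variance.) */
void summarize(const double *samples, size_t n, double *mean, double *variance)
{
    assert(samples != NULL && mean != NULL && variance != NULL && n > 0);

    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    double m = sum / (double)n;

    double squares = 0.0;              /* sum of squared deviations */
    for (size_t i = 0; i < n; i++) {
        double d = samples[i] - m;
        squares += d * d;
    }

    *mean = m;
    *variance = squares / (double)n;
}
```

In practice the samples would come from a timer such as clock() wrapped around repeated runs of the code being measured, with the accuracy of that clock kept in mind as noted under Accuracy.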
1.5. Performance Measurement:
Performance measurement is the process whereby an organization establishes the parameters within which programs, investments, and acquisitions are reaching the desired results.

Good Performance Measures:
• Provide a way to see if our strategy is working
• Focus employees' attention on what matters most to success
• Allow measurement of accomplishments, not just of the work that is performed
• Provide a common language for communication
• Are explicitly defined in terms of owner, unit of measure, collection frequency, data quality, expected value (targets), and thresholds
• Are valid, to ensure measurement of the right things
• Are verifiable, to ensure data collection accuracy

Example: the Performance Reference Model of the Federal Enterprise Architecture.

This process of measuring performance often requires the use of statistical evidence to determine progress toward specific defined organizational objectives. Performance measurement is a fundamental building block of TQM and a total quality organization. Historically, organisations have always measured performance in some way through financial performance, be this success by profit or failure through liquidation. However, traditional performance measures, based on cost accounting information, provide little to support organizations on their quality journey, because they do not map process performance and improvements seen by the customer. In a successful total quality organisation, performance will be measured by the improvements seen by the customer as well as by the results delivered to other stakeholders, such as the shareholders.

This section covers why measuring performance is important. This is followed by a description of cost of quality measurement, which has been used for many years to drive improvement activities and raise awareness of the effect of quality problems in an organisation.
A simple performance measurement framework is outlined, which includes more than just measuring: it also covers defining and understanding metrics, collecting and analysing data, then prioritising and taking improvement actions. A description of the balanced scorecard approach is also covered.

Why measure performance?
'When you can measure what you are speaking about and express it in numbers, you know something about it'.
'You cannot manage what you cannot measure'.

These are two often-quoted statements that demonstrate why measurement is important. Yet it is surprising that organisations find the area of measurement so difficult to manage. In the cycle of never-ending improvement, performance measurement plays an important role in:
• Identifying and tracking progress against organisational goals
• Identifying opportunities for improvement
• Comparing performance against both internal and external standards

Reviewing the performance of an organisation is also an important step when formulating the direction of the strategic activities. It is important to know where the strengths and weaknesses of the organisation lie, and as part of the 'Plan – Do – Check – Act' cycle, measurement plays a key role in quality and productivity improvement activities. The main reasons it is needed are:
• To ensure customer requirements have been met
• To be able to set sensible objectives and comply with them
• To provide standards for establishing comparisons
• To provide visibility and a "scoreboard" for people to monitor their own performance level
• To highlight quality problems and determine areas for priority attention
• To provide feedback for driving the improvement effort

It is also important to understand the impact of TQM on improvements in business performance, on sustaining current performance and reducing any possible decline in performance.
1.6 RECOMMENDED QUESTIONS
1. Define data structures.
2. What is a pointer variable?
3. What is the difference between an Abstract Data Type, a Data Type and a Data Structure?
4. Define an Abstract Data Type (ADT).
5. Give any 2 advantages and disadvantages of using pointers.
UNIT - 2: ARRAYS and STRUCTURES

2.1 ARRAY:
Definition: An array is a variable that holds multiple elements, all of the same data type.

Declaring Arrays: We can declare an array by specifying its data type, its name, and the number of elements the array holds between square brackets immediately following the array name. Here is the syntax:

data_type array_name[size];

For example, to declare an integer array which contains 100 elements we can write:

int a[100];

There are some rules for array declarations. The data type can be any valid C data type, including structure and union. The array name has to follow the rules for variable names, and the size of the array has to be a positive constant integer. We can access array elements via indexes: array_name[index]. Array indexes start from 0, not 1, so the last element of an array is array_name[size-1].

Initializing Arrays: Like a variable, an array can be initialized. To initialize an array, you provide initializing values which are enclosed within curly braces in the declaration and placed following an equals sign after the array name. Here is an example of initializing an integer array.

int list[5] = {2,1,3,7,8};
Array Representation
• Operations require simple implementations.
• Insert, delete, and search require linear time.
• Inefficient use of space.
It is appropriate that we begin our study of data structures with the array. The array is often the only means for structuring data which is provided in a programming language. Therefore it deserves a significant amount of attention. If one asks a group of programmers to define an array, the most often quoted saying is: a consecutive set of memory locations. This is unfortunate because it clearly reveals a common point of confusion, namely the distinction between a data structure and its representation. It is true that arrays are almost always implemented by using consecutive memory, but not always. Intuitively, an array is a set of pairs, index and value. For each index which is defined, there is a value associated with that index. In mathematical terms we call this a correspondence or a mapping. However, as computer scientists we want to provide a more functional definition by giving the operations which are permitted on this data structure. For arrays this means we are concerned with only two operations, which retrieve and store values. Using our notation this object can be defined as:

structure ARRAY(value, index)
   declare CREATE() → array
           RETRIEVE(array, index) → value
           STORE(array, index, value) → array;
   for all A ∈ array, i, j ∈ index, x ∈ value let
      RETRIEVE(CREATE, i) ::= error
      RETRIEVE(STORE(A, i, x), j) ::= if EQUAL(i, j) then x else RETRIEVE(A, j)
   end
end ARRAY
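To make the point that the axioms do not depend on consecutive memory, the specification can be sketched in C with a representation that is deliberately not a block of memory: a chain of (index, value) pairs. This is only an illustration under stated assumptions. The names create, store, retrieve and destroy, the int index and value types, and the 1/0 return code standing in for "error" are all choices made for this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* One (index, value) pair. The array is the chain of all pairs stored so far. */
typedef struct pair {
    int index;
    int value;
    struct pair *rest;      /* the array A that STORE was applied to */
} pair;

typedef pair *array;        /* NULL plays the role of CREATE() */

array create(void)
{
    return NULL;
}

/* STORE(A, i, x): returns a new array whose newest pair is (i, x),
 * or NULL if memory runs out (A itself is left untouched). */
array store(array a, int i, int x)
{
    pair *p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    p->index = i;
    p->value = x;
    p->rest = a;
    return p;
}

/* RETRIEVE(A, j): if EQUAL(i, j) for the newest pair then x, else look in
 * the remaining array. Returns 1 and sets *x on success, 0 for "error". */
int retrieve(array a, int j, int *x)
{
    assert(x != NULL);
    for (; a != NULL; a = a->rest) {
        if (a->index == j) {
            *x = a->value;
            return 1;
        }
    }
    return 0;               /* RETRIEVE(CREATE, j) ::= error */
}

/* Release every pair in the chain. */
void destroy(array a)
{
    while (a != NULL) {
        pair *next = a->rest;
        free(a);
        a = next;
    }
}
```

With this representation RETRIEVE takes time proportional to the number of stores, which is why real arrays use consecutive memory. The behaviour seen by the caller is the same in both cases.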
The function CREATE produces a new, empty array. RETRIEVE takes as input an array and an index, and either returns the appropriate value or an error. STORE is used to enter new index-value pairs. The second axiom is read as "to retrieve the j-th item where x has already been stored at index i in A is equivalent to checking if i and j are equal and if so, x, or search for the j-th value in the remaining array, A." This axiom was originally given by J. McCarthy. Notice how the axioms are independent of any representation scheme. Also, i and j need not necessarily be integers; we assume only that an EQUAL function can be devised. If we restrict the index values to be integers, then assuming a conventional random access memory we can implement STORE and RETRIEVE so that they operate in a constant amount of time. If we interpret the indices to be n-dimensional, (i1, i2, ..., in), then the previous axioms apply immediately and define n-dimensional arrays. In section 2.6 we will examine how to implement RETRIEVE and STORE for multidimensional arrays using consecutive memory locations.

Array and Pointer:
Array elements occupy consecutive memory locations, and in most expressions the array name is converted to a pointer to the first element. Besides accessing an array via an index, we can use a pointer to manipulate the array. This program helps you visualize the memory address of each array element and shows how to access array elements using a pointer.
#include <stdio.h>

#define SIZE 5

int main(void)
{
    int list[SIZE] = {2,1,3,7,8};
    int *plist = list;
    int i;

    // print memory address of array elements
    for (i = 0; i < SIZE; i++)
    {
        printf("list[%d] is in %p\n", i, (void *)&list[i]);
    }

    // accessing array elements using pointer
    for (i = 0; i < SIZE; i++)
    {
        printf("list[%d] = %d\n", i, *plist);

        /* increase memory address of pointer so it goes to the next
           element of the array */
        plist++;
    }

    return 0;
}

Here is the output of one run. The addresses vary between systems and runs, and %p typically prints them in hexadecimal; they are shown here in decimal so that the 4-byte spacing between int elements is easy to see.

list[0] is in 1310568
list[1] is in 1310572
list[2] is in 1310576
list[3] is in 1310580
list[4] is in 1310584
list[0] = 2
list[1] = 1
list[2] = 3
list[3] = 7
list[4] = 8
You can store pointers in an array; in this case we have an array of pointers. This code snippet declares an array of ten integer pointers.

int *ap[10];

Multidimensional Arrays: An array with more than one index is called a multidimensional array. All the arrays above are single-dimensional arrays. To declare a multidimensional array, use the following syntax:

data_type array_name[size1][size2]...[sizeN];

The number of pairs of square brackets specifies the dimension of the array. For example, to declare a two-dimensional integer array we can write:

int matrix[3][3];

Initializing Multidimensional Arrays: You can initialize a multidimensional array in the same way as a single-dimensional array, with one brace-enclosed list per row. Here is an example of initializing a two-dimensional integer array:

int matrix[3][3] =
{
    {11,12,13},
    {21,22,23},
    {31,32,33},
};
Dynamically Allocated Arrays:
One-Dimensional Arrays

In C (as standardized in C89), arrays must have their extents defined at compile time. There's no way to postpone the definition of the size of an array until run time. Luckily, with pointers and malloc, we can work around this limitation.

To allocate a one-dimensional array of length N of some particular type, simply use malloc to allocate enough memory to hold N elements of the particular type, and then use the resulting pointer as if it were an array. For example, the following code snippet allocates a block of N ints, and then, using array notation, fills it with the values 0 through N-1. (A real program should check that malloc did not return NULL before using A.)

int *A = malloc(sizeof(int) * N);
int i;

for (i = 0; i < N; i++)
    A[i] = i;

This idea is very useful for dealing with strings, which in C are represented by arrays of chars, terminated with a '\0' character. These arrays are nearly always expressed as pointers in the declaration of functions, but accessed via C's array notation. For example, here is a function that implements strlen:

int strlen(char *s)
{
    int i;

    for (i = 0; s[i] != '\0'; i++)
        ;
    return (i);
}
2.2. Dynamically Allocating Multidimensional Arrays
We've seen that it's straightforward to call malloc to allocate a block of memory which can simulate an array, but with a size which we get to pick at run time. Can we do the same sort of thing to simulate multidimensional arrays? We can, but we'll end up using pointers to pointers. If we don't know how many columns the array will have, we'll clearly allocate memory for each row (as many columns wide as we like) by calling malloc, and each row will therefore be represented by a pointer. How will we keep track of those pointers? There are, after all, many of them, one for each row. So we want to simulate an array of pointers, but we don't know how many rows there will be, either, so we'll have to simulate that array (of pointers) with another pointer, and this will be a pointer to a pointer. This is best illustrated with an example:

#include <stdio.h>
#include <stdlib.h>

int **array;
array = malloc(nrows * sizeof(int *));
if (array == NULL)
{
    fprintf(stderr, "out of memory\n");
    /* exit or return */
}
for (i = 0; i < nrows; i++)
{
    array[i] = malloc(ncolumns * sizeof(int));
    if (array[i] == NULL)
    {
        fprintf(stderr, "out of memory\n");
        /* exit or return */
    }
}
array is a pointer-to-pointer-to-int: at the first level, it points to a block of pointers, one for each row. That first-level pointer is the first one we allocate; it has nrows elements, with each element big enough to hold a pointer-to-int, or int *. If we successfully allocate it, we then fill in the pointers (all nrows of them) with a pointer (also obtained from malloc) to ncolumns number of ints, the storage for that row of the array. If this isn't quite making sense, a picture should make everything clear:

Fig 1: representation of array

Once we've done this, we can (just as for the one-dimensional case) use array-like syntax to access our simulated multidimensional array. If we write array[i][j] we're asking for the i'th pointer pointed to by array, and then for the j'th int pointed to by that inner pointer. (This is a pretty nice result: although some completely different machinery, involving two levels of pointer dereferencing, is going on behind the scenes, the simulated, dynamically allocated two-dimensional "array" can still be accessed just as if it were an array of arrays, i.e. with the same pair of bracketed subscripts.)

If a program uses simulated, dynamically allocated multidimensional arrays, it becomes possible to write "heterogeneous" functions which don't have to know (at compile time) how big the "arrays" are. In other words, one function can operate on "arrays" of various sizes and shapes. The function will look something like

void func2(int **array, int nrows, int ncolumns)
{
}

This function does accept a pointer-to-pointer-to-int, on the assumption that we'll only be calling it with simulated, dynamically allocated multidimensional arrays. (We must not call this function on a "true" multidimensional array, that is, one declared with explicit dimensions such as int a2[5][7].) The function also accepts the dimensions of the arrays as parameters, so that it will know how many "rows" and "columns" there are, so that it can iterate over them correctly. Here is a function which zeros out a pointer-to-pointer, two-dimensional "array":

void zeroit(int **array, int nrows, int ncolumns)
{
    int i, j;
    for (i = 0; i < nrows; i++)
    {
        for (j = 0; j < ncolumns; j++)
            array[i][j] = 0;
    }
}

Finally, when it comes time to free one of these dynamically allocated multidimensional "arrays," we must remember to free each of the chunks of memory that we've allocated. (Just freeing the top-level pointer, array, wouldn't cut it; if we did, all the second-level pointers would be lost but not freed, and would waste memory.) Here's what the code might look like:

for (i = 0; i < nrows; i++)
    free(array[i]);
free(array);
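The allocation and freeing steps above can be packaged into a pair of helper functions. The following is a minimal sketch, not a definitive implementation: the names alloc2d and free2d are hypothetical, and the choice to zero-fill the rows with calloc and to undo a partial allocation on failure are assumptions made for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* Free a pointer-to-pointer "array" built by alloc2d. Rows that were never
 * allocated are NULL, and free(NULL) does nothing. */
void free2d(int **array, int nrows)
{
    if (array == NULL)
        return;
    for (int i = 0; i < nrows; i++)
        free(array[i]);
    free(array);
}

/* Allocate nrows x ncolumns ints, all initialized to 0.
 * Returns NULL (with nothing leaked) if any allocation fails. */
int **alloc2d(int nrows, int ncolumns)
{
    assert(nrows > 0 && ncolumns > 0);

    int **array = malloc((size_t)nrows * sizeof *array);
    if (array == NULL)
        return NULL;

    /* Mark every row as "not yet allocated" so cleanup is always safe. */
    for (int i = 0; i < nrows; i++)
        array[i] = NULL;

    for (int i = 0; i < nrows; i++) {
        array[i] = calloc((size_t)ncolumns, sizeof **array);
        if (array[i] == NULL) {
            free2d(array, nrows);   /* release the rows obtained so far */
            return NULL;
        }
    }
    return array;
}
```

A caller would write int **a = alloc2d(nrows, ncolumns); check the result against NULL, use a[i][j] as usual, and finish with free2d(a, nrows).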
2.3. Structures and Unions: Structure:
A structure is a user-defined data type. You have the ability to define a new type of data considerably more complex than the types we have been using. A structure is a collection of one or more variables, possibly of different types, grouped together under a single name for convenient handling. Structures are called "records" in some languages, notably Pascal. Structures help organize complicated data, particularly in large programs, because they permit a group of related variables to be treated as a unit instead of as separate entities.

Today's applications require complex data structures to support them. A structure is a collection of related elements where each element may belong to a different type. Another way to look at a structure is as a template, a pattern. For example, the graphical user interface found in a window requires structures. A typical example of the use of structures is the file table, which holds key data such as the logical file name, the location of the file on disc, and so on. The file table in C is a type-defined structure, FILE.

Each element of a structure is also called a field. A field has many characteristics similar to those of a normal variable. An array and a structure are slightly different: the former has a collection of homogeneous elements, but the latter has a collection of heterogeneous elements. The general format for a structure in C is shown below.

struct {field_list} variable_identifier;

struct struct_name
{
    type1 fieldname1;
    type2 fieldname2;
    . . .
    typeN fieldnameN;
};
struct struct_name variables;

The above format is not fixed and can vary; the different flavours of structure declaration are shown below.

struct
{
    ....
} variable_identifier;

Example

struct mob_equip
{
    long int IMEI;
    char rel_date[10];
    char model[10];
    char brand[15];
};

The above example can be upgraded with typedef. A program to illustrate the working of the structure is shown in the previous section.

typedef struct mob_equip
{
    long int IMEI;
    char rel_date[10];
    char model[10];
    char brand[15];
    int count;
} MOB;
MOB m1;

struct tag
{
    ....
};
struct tag variable_identifiers;

typedef struct
{
    ....
} TYPE_IDENTIFIER;
TYPE_IDENTIFIER variable_identifiers;

Accessing a structure
A structure variable can be used to access the members of a structure with the help of a special operator '.', also called the member operator. In our previous example, the IMEI of the mobile equipment held in the structure variable m1 (of type MOB) is accessed as m1.IMEI. Since a structure member can be treated as a normal variable, all the I/O functions for a normal variable hold good for structure members also, with slight modification. The scanf statement to read the input into IMEI is given below:

scanf("%ld", &m1.IMEI);

Increment and decrement operations are the same as for normal variables; this includes the postfix and prefix forms. The member operator has higher precedence than increment or decrement. Suppose that, in the example quoted earlier, we want to increment count; then:

m1.count++;
++m1.count;
Unions
Unions are very similar to structures; whatever has been discussed so far holds good for unions also. Then why do we need unions? All the members of a union share the same storage, so the size of a union depends on the size of its largest member, whereas the size of a structure is at least the sum of the sizes of all its members. Example:

union abc1
{
    int a;
    float b;
    char c;
};

The size of the union abc1 is 4 bytes on a typical system, where int and float each occupy 4 bytes. Therefore, at any point in time only one member of the union holds a valid value, and the programmer needs to remember which one.

Using structure data
Now that we have assigned values to the six simple variables, we can do anything we desire with them. In order to keep this first example simple, we will simply print out the values to see if they really do exist as assigned. If you carefully inspect the printf statements, you will see that there is nothing special about them. The compound name of each variable is specified because that is the only valid name by which we can refer to these variables. Structures are a very useful method of grouping data together in order to make a program easier to write and understand.
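The structure and union points made in this section can be brought together in one short sketch. It reuses the MOB and abc1 declarations from the text. Everything else is an assumption made for illustration: the helper names make_sample and bytes_saved, the struct abc1_as_struct used for the size comparison, and the sample field values (which are made up and are not real device data).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct mob_equip {
    long int IMEI;
    char rel_date[10];
    char model[10];
    char brand[15];
    int count;
} MOB;

union abc1 { int a; float b; char c; };

/* Same members as abc1, but as a structure (hypothetical, for comparison). */
struct abc1_as_struct { int a; float b; char c; };

/* Build one record using the member operator on every field. */
MOB make_sample(void)
{
    MOB m;
    m.IMEI = 123456L;               /* illustrative value, not a real IMEI */
    strcpy(m.rel_date, "2010-01");  /* 7 chars + '\0' fits in rel_date[10] */
    strcpy(m.model, "X1");
    strcpy(m.brand, "Acme");
    m.count = 0;
    m.count++;                      /* parsed as (m.count)++ */
    ++m.count;                      /* parsed as ++(m.count) */
    return m;
}

/* A union shares one block of storage among its members, so it is smaller
 * than the structure that stores the same members side by side. */
size_t bytes_saved(void)
{
    assert(sizeof(union abc1) < sizeof(struct abc1_as_struct));
    return sizeof(struct abc1_as_struct) - sizeof(union abc1);
}
```

The exact sizes depend on the compiler and platform, which is why the sketch compares the two sizes rather than hard-coding 4 bytes.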
2.4. Polynomials:
In mathematics, a polynomial (from Greek poly, "many" and medieval Latin binomium, "binomial") is an expression of finite length constructed from variables (also known as indeterminates) and constants, using only the operations of addition, subtraction, multiplication, and non-negative integer exponents. For example, x^2 − 4x + 7 is a polynomial, but x^2 − 4/x + 7x^(3/2) is not, because its second term involves division by the variable x (4/x) and because its third term contains an exponent that is not a whole number (3/2). The term "polynomial" can also be used as an adjective, for quantities that can be expressed as a polynomial of some parameter, as in "polynomial time", which is used in computational complexity theory.

Polynomials appear in a wide variety of areas of mathematics and science. For example, they are used to form polynomial equations, which encode a wide range of problems, from elementary word problems to complicated problems in the sciences; they are used to define polynomial functions, which appear in settings ranging from basic chemistry and physics to economics and social science; they are used in calculus and numerical analysis to approximate other functions. In advanced mathematics, polynomials are used to construct polynomial rings, a central concept in abstract algebra and algebraic geometry.

A polynomial is made up of terms that are only added, subtracted or multiplied.

A polynomial looks like this:

Fig 2: example of a polynomial (this one has 3 terms)

Polynomial comes from poly- (meaning "many") and -nomial (in this case meaning "term"), so it says "many terms".

A polynomial can have:
• constants (like 3, -20, or ½)
• variables (like x and y)
• exponents (like the 2 in y^2), but only 0, 1, 2, 3, ... etc.

These can be combined using:
• + addition,
• - subtraction, and
• × multiplication
... but not division!

Those rules keep polynomials simple, so they are easy to work with!

Polynomial or Not?
Fig 3: These are polynomials:
• 3x
• x - 2
• -6y^2 - (7/9)x
• 3xyz + 3xy^2z - 0.1xz - 200y + 0.5
• 512v^5 + 99w^5
• 1
(Yes, even "1" is a polynomial; it has one term, which just happens to be a constant.)

And these are not polynomials:
• 2/(x+2) is not, because dividing by a variable is not allowed
• 1/x is not
• 3xy^(-2) is not, because the exponent is "-2" (exponents can only be 0, 1, 2, ...)
• √x is not, because the exponent is "½" (see fractional exponents)

But these are allowed:
• x/2 is allowed, because it is also (½)x (the constant is ½, or 0.5)
• also 3x/8, for the same reason (the constant is 3/8, or 0.375)
• √2 is allowed, because it is a constant (= 1.4142...etc)

Monomial, Binomial, Trinomial
There are special names for polynomials with 1, 2 or 3 terms: monomial (1 term), binomial (2 terms) and trinomial (3 terms).

How do you remember the names? Think cycles!

Fig 4: (There is also quadrinomial (4 terms) and quintinomial (5 terms), but those names are not often used.)
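Since a polynomial is just a list of terms, one common way to hold it in C is an array of structures, one per term. The sketch below assumes a single variable x, terms kept in strictly decreasing order of exponent, and no stored zero coefficients. The type name term and the functions poly_add and poly_eval are hypothetical names chosen for this example, not taken from the text.

```c
#include <assert.h>
#include <stddef.h>

/* One term coef * x^expon. A polynomial is an array of terms sorted by
 * strictly decreasing exponent, with no zero coefficients stored. */
typedef struct {
    double coef;
    int expon;
} term;

/* Add a (na terms) and b (nb terms) into out, which must have room for
 * na + nb terms. Returns the number of terms written. */
size_t poly_add(const term *a, size_t na, const term *b, size_t nb, term *out)
{
    assert(a != NULL && b != NULL && out != NULL);
    size_t i = 0, j = 0, n = 0;

    while (i < na && j < nb) {
        if (a[i].expon > b[j].expon) {
            out[n++] = a[i++];
        } else if (a[i].expon < b[j].expon) {
            out[n++] = b[j++];
        } else {
            /* Same exponent: add coefficients, drop the term if they cancel. */
            double c = a[i].coef + b[j].coef;
            if (c != 0.0) {
                out[n].coef = c;
                out[n].expon = a[i].expon;
                n++;
            }
            i++;
            j++;
        }
    }
    while (i < na) out[n++] = a[i++];   /* copy whatever is left of a */
    while (j < nb) out[n++] = b[j++];   /* copy whatever is left of b */
    return n;
}

/* Evaluate the polynomial at x by summing coef * x^expon over all terms. */
double poly_eval(const term *p, size_t n, double x)
{
    assert(p != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        double power = 1.0;
        for (int k = 0; k < p[i].expon; k++)
            power *= x;
        sum += p[i].coef * power;
    }
    return sum;
}
```

For instance, x^2 − 4x + 7 is stored as the three terms {1, 2}, {−4, 1}, {7, 0}. Adding x^3 + 4x to it cancels the x terms and leaves x^3 + x^2 + 7.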
2.5. Sparse Matrices:
A sparse matrix is a matrix that allows special techniques to take advantage of the large number of zero elements. This definition helps to define "how many" zeros a matrix needs in order to be "sparse." The answer is that it depends on what the structure of the matrix is, and what you want to do with it. For example, a randomly generated sparse matrix with entries scattered randomly throughout the matrix is not sparse in the sense of Wilkinson.

Creating a sparse matrix: If a matrix A is stored in ordinary (dense) format, then the Matlab command S = sparse(A) creates a copy of the matrix stored in sparse format. For example:

>> A = [0 0 1;1 0 2;0 -3 0]
A =
     0     0     1
     1     0     2
     0    -3     0
>> S = sparse(A)
S =
   (2,1)        1
   (3,2)       -3
   (1,3)        1
   (2,3)        2
>> whos
  Name      Size      Bytes  Class
  A         3x3          72  double array
  S         3x3          64  sparse array
Grand total is 13 elements using 136 bytes

Unfortunately, this form of the sparse command is not particularly useful, since if A is large, it can be very time-consuming to first create it in dense format. The command S = sparse(m,n) creates an m-by-n zero matrix in sparse format. Entries can then be added one by one:

>> A = sparse(3,2)
A =
   All zero sparse: 3-by-2
>> A(1,2)=1;
>> A(3,1)=4;
>> A(3,2)=-1;
>> A
A =
   (3,1)        4
   (1,2)        1
   (3,2)       -1
(Of course, for this to be truly useful, the nonzeros would be added in a loop.)

Another version of the sparse command is S = sparse(I,J,S,m,n,maxnz). This creates an m-by-n sparse matrix with entry (I(k),J(k)) equal to S(k). The optional argument maxnz causes Matlab to pre-allocate storage for maxnz nonzero entries, which can increase efficiency in the case when more nonzeros will be added later to S.

The most common type of sparse matrix is a banded matrix, that is, a matrix with a few nonzero diagonals. Such a matrix can be created with the spdiags command. Consider the following matrix:

>> A
A =
    64   -16     0   -16     0     0     0     0     0
   -16    64   -16     0   -16     0     0     0     0
     0   -16    64     0     0   -16     0     0     0
   -16     0     0    64   -16     0   -16     0     0
     0   -16     0   -16    64   -16     0   -16     0
     0     0   -16     0   -16    64     0     0   -16
     0     0     0   -16     0     0    64   -16     0
     0     0     0     0   -16     0   -16    64   -16
     0     0     0     0     0   -16     0   -16    64

This is a matrix with 5 nonzero diagonals. In Matlab's indexing scheme, the nonzero diagonals of A are numbers -3, -1, 0, 1, and 3 (the main diagonal is number 0, the first subdiagonal is number -1, the first superdiagonal is number 1, and so forth). To create the same matrix in sparse format, it is first necessary to create a matrix containing the nonzero diagonals of A. Of course, the diagonals, regarded as column vectors, have different lengths; only the main diagonal has length 9. In order to gather the various diagonals in a single matrix, the shorter diagonals must be padded with zeros. The rule is that the extra zeros go at the bottom for subdiagonals and at the top for superdiagonals. Thus we create the following matrix:

>> B = [
   -16   -16    64     0     0
   -16   -16    64   -16     0
   -16     0    64   -16     0
   -16   -16    64     0   -16
   -16   -16    64   -16   -16
   -16     0    64   -16   -16
     0   -16    64     0   -16
     0   -16    64   -16   -16
     0     0    64   -16   -16
];

(Notice the technique for entering the rows of a large matrix on several lines.) The spdiags command also needs the indices of the diagonals:

>> d = [-3,-1,0,1,3];
The matrix is then created as follows:

>> S = spdiags(B,d,9,9);

The last two arguments give the size of S. Perhaps the most common sparse matrix is the identity. Recall that an identity matrix can be created, in dense format, using the command eye. To create the identity matrix in sparse format, use I = speye(n). Another useful command is spy, which creates a graphic displaying the sparsity pattern of a matrix. For example, the above penta-diagonal matrix A can be displayed by the following command; see Fig 5:

>> spy(A)

Fig 5: The sparsity pattern of a matrix
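The same idea, storing only the nonzero entries together with their positions, can be sketched in C as an array of (row, column, value) triples. This is an illustrative sketch rather than the Matlab implementation: the names triple, to_sparse and sparse_get are assumptions, indices are 0-based as usual in C (Matlab's are 1-based), and the triples are produced in row order, whereas the Matlab listing above is in column order.

```c
#include <assert.h>
#include <stddef.h>

/* One nonzero entry of the matrix. */
typedef struct {
    int row;
    int col;
    int value;
} triple;

/* Scan a dense nrows x ncols matrix (stored row by row in a flat array)
 * and record its nonzero entries in out, in row order.
 * Returns the number of nonzeros, or -1 if there are more than max_terms. */
int to_sparse(const int *dense, int nrows, int ncols,
              triple *out, int max_terms)
{
    assert(dense != NULL && out != NULL);
    int n = 0;
    for (int r = 0; r < nrows; r++) {
        for (int c = 0; c < ncols; c++) {
            int v = dense[r * ncols + c];
            if (v != 0) {
                if (n == max_terms)
                    return -1;          /* not enough room in out */
                out[n].row = r;
                out[n].col = c;
                out[n].value = v;
                n++;
            }
        }
    }
    return n;
}

/* Look up entry (r, c): any position that is not stored is zero. */
int sparse_get(const triple *t, int n, int r, int c)
{
    assert(t != NULL);
    for (int k = 0; k < n; k++) {
        if (t[k].row == r && t[k].col == c)
            return t[k].value;
    }
    return 0;
}
```

For the 3-by-3 matrix A used in the first Matlab example, this stores 4 triples instead of 9 values. The saving only becomes worthwhile when the matrix is large and mostly zero.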
2.6. Representation of Multidimensional Arrays
For a two-dimensional array, the element with indices i, j would have address B + c · i + d · j, where the coefficients c and d are the row and column address increments, respectively. More generally, in a k-dimensional array, the address of an element with indices i1, i2, …, ik is

B + c1 · i1 + c2 · i2 + … + ck · ik

This formula requires only k multiplications and k additions, for any array that can fit in memory. Moreover, if any coefficient is a fixed power of 2, the multiplication can be replaced by bit shifting. The coefficients ck must be chosen so that every valid index tuple maps to the address of a distinct element. If the minimum legal value for every index is 0, then B is the address of the element whose indices are all zero. As in the one-dimensional case, the element indices may be changed by changing the base address B. Thus, if a two-dimensional array has rows and columns indexed from 1 to 10 and 1 to 20, respectively, then replacing B by B + c1 − 3 c2 (writing c1 and c2 for c and d) will cause them to be renumbered from 0 through 9 and 4 through 23, respectively. Taking advantage of this feature, some languages (like FORTRAN 77) specify that array indices begin at 1, as in mathematical tradition, while other languages (like Fortran 90, Pascal and Algol) let the user choose the minimum value for each index.

Compact layouts
Often the coefficients are chosen so that the elements occupy a contiguous area of memory. However, that is not necessary. Even if arrays are always created with contiguous elements, some array slicing operations may create non-contiguous sub-arrays from them. There are two systematic compact layouts for a two-dimensional array. For example, consider the matrix

In the row-major order layout (adopted by C for statically declared arrays), the elements of each row are stored in consecutive positions:

In column-major order (traditionally used by Fortran), the elements of each column are consecutive in memory:
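The two layouts can be made concrete with a small C sketch. The example matrix from the original figure is not reproduced here, so the 2-by-3 shape used in the usage note is an assumption chosen only for illustration, as are the names row_major, col_major, to_row_major and to_col_major.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { ROWS = 2, COLS = 3 };   /* illustrative shape, not from the text */

/* Position of element (i, j) in a flat array under each layout. */
size_t row_major(size_t i, size_t j) { return i * COLS + j; }
size_t col_major(size_t i, size_t j) { return j * ROWS + i; }

/* C already stores m row by row, so copying its bytes unchanged yields
 * the row-major sequence. */
void to_row_major(int m[ROWS][COLS], int flat[ROWS * COLS])
{
    assert(m != NULL && flat != NULL);
    memcpy(flat, m, sizeof(int) * ROWS * COLS);
}

/* Rearrange m so that the elements of each column become consecutive. */
void to_col_major(int m[ROWS][COLS], int flat[ROWS * COLS])
{
    assert(m != NULL && flat != NULL);
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            flat[col_major(i, j)] = m[i][j];
}
```

For the matrix with rows {1, 2, 3} and {4, 5, 6}, the row-major sequence is 1 2 3 4 5 6 and the column-major sequence is 1 4 2 5 3 6.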
For arrays with three or more indices, "row major order" puts in consecutive positions any two elements whose index tuples differ only by one in the last index. "Column major order" is analogous with respect to the first index.
In systems which use processor cache or virtual memory, scanning an array is much faster if successive elements are stored in consecutive positions in memory, rather than sparsely scattered. Many algorithms that use multidimensional arrays will scan them in a predictable order. A programmer (or a sophisticated compiler) may use this information to choose between row- or column-major layout for each array. For example, when computing the product A·B of two matrices, it would be best to have A stored in row-major order, and B in column-major order.
The Representation of Multidimensional Arrays: for an n-dimensional array A[M_0][M_1]...[M_{n-1}], the address of any entry A[i_0][i_1]...[i_{n-1}], counted in elements, is
$$\text{base} + i_0 M_1 M_2 \cdots M_{n-1} + i_1 M_2 M_3 \cdots M_{n-1} + i_2 M_3 M_4 \cdots M_{n-1} + \cdots + i_{n-2} M_{n-1} + i_{n-1} = \text{base} + \sum_{j=0}^{n-1} i_j a_j,$$
where
$$a_j = \prod_{k=j+1}^{n-1} M_k \quad (0 \le j < n-1), \qquad a_{n-1} = 1.$$
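The address formula for an n-dimensional array can be evaluated without computing each product a_j separately. The sketch below is one possible way to do it, using Horner-style nesting; the function name row_major_offset and its parameters are hypothetical, and the result is an offset in elements, not bytes.

```c
#include <stddef.h>

/* Offset (in elements) of A[idx[0]][idx[1]]...[idx[n-1]] from the base of an
   array with dimensions dims[0..n-1], stored in row-major order.
   Equivalent to the sum of idx[j] * a_j with a_j = dims[j+1] * ... * dims[n-1]. */
size_t row_major_offset(const size_t dims[], const size_t idx[], size_t n)
{
    size_t offset = 0;
    for (size_t j = 0; j < n; j++)
        offset = offset * dims[j] + idx[j];   /* one multiply, one add per index */
    return offset;
}
```

For dimensions 2 × 3 × 4 and indices (1, 2, 3), the offset is 1·12 + 2·4 + 3 = 23.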
Array resizing: Static arrays have a size that is fixed at allocation time and consequently do not allow elements to be inserted or removed. However, by allocating a new array and copying the contents of the old array to it, it is possible to effectively implement a dynamic or growable version of an array. If this operation is done infrequently, insertions at the end of the array require only amortized constant time.
Some array data structures do not reallocate storage, but do store a count of the number of elements of the array in use, called the count or size. This effectively makes the array a dynamic array with a fixed maximum size or capacity; Pascal strings are examples of this.
Non-linear formulas
More complicated ("non-linear") formulas are occasionally used. For a compact two-dimensional triangular array, for instance, the addressing formula is a polynomial of degree 2.
Efficiency
Both store and select take (deterministic worst case) constant time. Arrays take linear (O(n)) space in the number of elements n that they hold. In an array with element size k and on a machine with a cache line size of B bytes, iterating through an array of n elements requires a minimum of ceiling(nk/B) cache misses, because its elements occupy contiguous memory locations. This is roughly a factor of B/k better than the number of cache misses needed to access n elements at random memory locations. As a consequence, sequential iteration over an array is noticeably faster in practice than iteration over many other data structures, a property called locality of reference (this does not mean, however, that using a perfect hash or trivial hash within the same (local) array will not be even faster, and achievable in constant time).
Memory-wise, arrays are compact data structures with no per-element overhead. There may be a per-array overhead, e.g. to store index bounds, but this is language-dependent. It can also happen that elements stored in an array require less memory than the same elements stored in individual variables, because several array elements can be stored in a single word; such arrays are often called packed arrays. An extreme (but commonly used) case is the bit array, where every bit represents a single element.
Linked lists allow constant time removal and insertion in the middle but take linear time for indexed access. Their memory use is typically worse than arrays, but is still linear.
An alternative to a multidimensional array structure is to use a one-dimensional array of references to arrays of one dimension less. For two dimensions, in particular, this alternative structure would be a vector of pointers to vectors, one for each row. Thus an element in row i and column j of an array A would be accessed by double indexing (A[i][j] in typical notation). This alternative structure allows ragged or jagged arrays, where each row may have a different size or, in general, where the valid range of each index depends on the values of all preceding indices. It also saves one multiplication (by the column address increment), replacing it by a bit shift (to index the vector of row pointers) and one extra memory access (fetching the row address), which may be worthwhile in some architectures.
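The vector-of-row-pointers structure can be sketched in C as follows. This is only an illustration under stated assumptions: the names ragged_alloc and ragged_free are invented for this example, and every row length is assumed to be at least 1.

```c
#include <stdlib.h>

/* Allocate a ragged 2-D array: row i holds len[i] ints, all initialised to 0.
   Assumes rows >= 1 and every len[i] >= 1. Returns NULL if allocation fails. */
int **ragged_alloc(size_t rows, const size_t len[])
{
    int **a = malloc(rows * sizeof *a);        /* the vector of row pointers */
    if (a == NULL)
        return NULL;
    for (size_t i = 0; i < rows; i++) {
        a[i] = calloc(len[i], sizeof **a);     /* one separately sized row */
        if (a[i] == NULL) {
            while (i > 0)                      /* undo the rows already made */
                free(a[--i]);
            free(a);
            return NULL;
        }
    }
    return a;
}

/* Release every row and then the vector of row pointers. */
void ragged_free(int **a, size_t rows)
{
    if (a == NULL)
        return;
    for (size_t i = 0; i < rows; i++)
        free(a[i]);
    free(a);
}
```

An element is then reached with the usual double indexing, a[i][j]: first the row pointer a[i] is fetched, then the j-th element of that row.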
2.7. RECOMMENDED QUESTIONS
1. How does a structure differ from a union? Mention any 2 uses of a structure.
2. Explain the single-dimensional array.
3. Explain the declaration and initialization of a 2-D array.
4. How is the transposing of a matrix done?
5. Explain the representation of a multidimensional array.
6. Explain the sparse matrix representation.
UNIT – 3 : STACKS AND QUEUES
3.1. Stacks:
A stack is an ordered collection of items into which new items may be inserted and from which items may be deleted at one end, called the top of the stack. A stack is a dynamic, constantly changing object, as the definition of the stack provides for the insertion and deletion of items. It has a single end, the top of the stack, where both insertion and deletion of elements take place. The last element inserted into the stack is the first element deleted, so a stack is also called a last-in, first-out (LIFO) list. After several insertions and deletions, it is possible for the stack to return to a state it was in earlier.
Primitive Operations
When an item is added to a stack, it is pushed onto the stack. When an item is removed, it is popped from the stack. Given a stack s and an item i, performing the operation push(s, i) adds the item i to the top of stack s:
    push(s, H);
    push(s, I);
    push(s, J);
The operation pop(s) removes the top element. That is, if i = pop(s), then the removed element is assigned to i:
    pop(s);
Because of the push operation, which adds elements to a stack, a stack is sometimes called a pushdown list. Conceptually, there is no upper limit on the number of items that may be kept in a stack. If a stack contains a single item and the stack is popped, the resulting stack contains no items and is called the empty stack. The push operation is applicable to any stack. The pop operation cannot be applied to the empty stack; attempting to do so causes underflow. A Boolean operation empty(s) returns TRUE if the stack is empty and FALSE otherwise.
Representing stacks in C
Before programming a problem solution that uses a stack, we must decide how to represent the stack in a programming language. A stack is an ordered collection of items, and in C the array is an ordered collection of items. But a stack and an array are two different things. The number of elements in an array is fixed, whereas a stack is a dynamic object whose size is constantly changing. So an array can be declared large enough for the maximum size of the stack. A stack in C is declared as a structure containing two objects:
• An array to hold the elements of the stack.
• An integer to indicate the position of the current stack top within the array.
    #define STACKSIZE 100
    struct stack {
        int top;
        int items[STACKSIZE];
    };
A stack s may be declared by
    struct stack s;
The stack items may be int, float, char, etc. The empty stack contains no elements and can therefore be indicated by top = -1. To initialize a stack s to the empty state, we may initially execute s.top = -1.
To determine the stack-empty condition:
    if (s.top == -1)
        stack is empty;
    else
        stack is not empty;
The operation empty(s) may be written as follows, with TRUE and FALSE defined as 1 and 0:
    #define TRUE 1
    #define FALSE 0
    int empty(struct stack *ps)
    {
        if (ps->top == -1)
            return (TRUE);
        else
            return (FALSE);
    }
Aggregating the set of implementation-dependent trouble spots into small, easily identifiable units is an important method of making a program more understandable and modifiable. This concept is known as modularization, in which individual functions are isolated into low-level modules whose properties are easily verifiable. These low-level modules can then be used by more complex routines, which do not have to concern themselves with the details of the low-level modules but only with their function. The complex routines may themselves then be viewed as modules by still higher-level routines that use them independently of their internal details.
• Implementing the pop operation
If the stack is empty, print a warning message and halt execution. Otherwise remove the top element from the stack and return this element to the calling program. The code uses printf from <stdio.h> and exit from <stdlib.h>.
    int pop(struct stack *ps)
    {
        if (empty(ps)) {
            printf("%s", "stack underflow");
            exit(1);
        }
        return (ps->items[ps->top--]);
    }
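The push operation is described in the text but not coded. The following is a minimal sketch of how push could sit alongside empty and pop, assuming the same struct stack layout with int items; the overflow handling (print a message and halt) simply mirrors the pop routine and is an assumption, not something the text prescribes.

```c
#include <stdio.h>
#include <stdlib.h>

#define STACKSIZE 100

struct stack {
    int top;                  /* index of the top element, -1 when empty */
    int items[STACKSIZE];
};

/* Nonzero if the stack holds no elements. */
int empty(const struct stack *ps)
{
    return ps->top == -1;
}

/* Add x on top of the stack; halt on overflow (assumed policy). */
void push(struct stack *ps, int x)
{
    if (ps->top == STACKSIZE - 1) {
        fprintf(stderr, "stack overflow\n");
        exit(1);
    }
    ps->items[++ps->top] = x;
}

/* Remove and return the top element; halt on underflow. */
int pop(struct stack *ps)
{
    if (empty(ps)) {
        fprintf(stderr, "stack underflow\n");
        exit(1);
    }
    return ps->items[ps->top--];
}
```

A caller initialises a stack with s.top = -1 and then calls push(&s, x) and pop(&s); items come back in last-in, first-out order.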
3.2. Stacks Using Dynamic Arrays:
For example:
    typedef struct {
        char *str;
    } words;
    int main(void)
    {
        words x[100];   /* fixed size; the aim is instead to grow the array dynamically as data comes in */
        return 0;
    }
As another example, consider the following array, into which individual words are read from a .txt file and saved word by word:
    char words[1000][15];
Here 1000 is the number of words the array can hold, and each word may comprise no more than 15 characters. The program should instead allocate memory dynamically for the number of words it counts. For example, a .txt file may contain more than 1000 words; the program should count the number of words and allocate memory accordingly. Since a variable cannot be used in place of [1000] in this declaration, a fixed-size array cannot meet this requirement.
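One common way to remove the fixed limit is to keep the stack's storage on the heap and enlarge it with realloc when it fills up. The sketch below doubles the capacity each time, which keeps the amortized cost of a push constant. The type and function names (dstack, dstack_push and so on) are hypothetical, and the initial capacity is assumed to be at least 1.

```c
#include <stdlib.h>

struct dstack {
    int *items;      /* heap-allocated storage */
    int top;         /* index of the top element, -1 when empty */
    int capacity;    /* number of slots currently allocated */
};

/* Create an empty stack with room for `capacity` items (capacity >= 1).
   Returns 1 on success, 0 if memory could not be allocated. */
int dstack_init(struct dstack *ps, int capacity)
{
    ps->items = malloc((size_t)capacity * sizeof *ps->items);
    if (ps->items == NULL)
        return 0;
    ps->top = -1;
    ps->capacity = capacity;
    return 1;
}

/* Push x, doubling the array first if it is full. Returns 1 on success. */
int dstack_push(struct dstack *ps, int x)
{
    if (ps->top == ps->capacity - 1) {
        int newcap = ps->capacity * 2;
        int *p = realloc(ps->items, (size_t)newcap * sizeof *p);
        if (p == NULL)
            return 0;            /* the old block is still valid */
        ps->items = p;
        ps->capacity = newcap;
    }
    ps->items[++ps->top] = x;
    return 1;
}

/* Pop into *x. Returns 0 if the stack is empty. */
int dstack_pop(struct dstack *ps, int *x)
{
    if (ps->top == -1)
        return 0;
    *x = ps->items[ps->top--];
    return 1;
}

/* Release the storage. */
void dstack_free(struct dstack *ps)
{
    free(ps->items);
    ps->items = NULL;
    ps->top = -1;
    ps->capacity = 0;
}
```

Starting from a capacity of 1, ten pushes grow the array through capacities 2, 4, 8 and 16.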
3.3. Queues:
A queue is like a line of people waiting for a bank teller. The queue has a front and a rear. When we talk of queues we talk about two distinct ends: the front and the rear. Additions to the queue take place at the rear. Deletions are made from the front. So, if a job is submitted for execution, it joins at the rear of the job queue. The job at the front of the queue is the next one to be executed.
• New people must enter the queue at the rear. This corresponds to a push, although it is usually called an enqueue operation.
• When an item is taken from the queue, it always comes from the front. This corresponds to a pop, although it is usually called a dequeue operation.
What is a Queue?
• Ordered collection of elements that has two ends, front and rear.
• Delete from the front end.
• Insert at the rear end.
• A queue can be implemented with an array. For example, a queue may contain the integers 4 (at the front), 8 and 6 (at the rear).
Queue Operations
• Queue overflow
• Insertion of an element into the queue
• Queue underflow
• Deletion of an element from the queue
• Display of the queue
Example:
    #include <stdio.h>
    #define size 5
    struct queue {
        int que[size];
        int front, rear;
    } Q;
    /* the queue is full once rear has reached the last slot */
    int Qfull(void)
    {
        if (Q.rear >= size - 1)
            return 1;
        else
            return 0;
    }
    /* the queue is empty before the first insert or once front passes rear */
    int Qempty(void)
    {
        if ((Q.front == -1) || (Q.front > Q.rear))
            return 1;
        else
            return 0;
    }
    int insert(int item)
    {
        if (Q.front == -1)
            Q.front++;
        Q.que[++Q.rear] = item;
        return Q.rear;
    }
    int delete(void)
    {
        int item;
        item = Q.que[Q.front];
        Q.front++;
        return item;            /* return the removed item */
    }
    void display(void)
    {
        int i;
        for (i = Q.front; i <= Q.rear; i++)
            printf(" %d", Q.que[i]);
    }
    int main(void)
    {
        int choice, item;
        Q.front = -1;
        Q.rear = -1;
        do {
            printf("Enter your choice : 1:I, 2:D, 3:Display");
            if (scanf("%d", &choice) != 1)
                break;
            switch (choice) {
            case 1:
                if (Qfull())
                    printf("Cannot Insert");
                else {
                    scanf("%d", &item);
                    insert(item);
                }
                break;
            case 2:
                if (Qempty())
                    printf("Underflow");
                else
                    delete();
                break;
            case 3:
                if (!Qempty())
                    display();
                break;
            }
        } while (choice >= 1 && choice <= 3);   /* any other choice ends the loop */
        return 0;
    }
3.4. Circular Queues Using Dynamic Arrays:
Circular Queue
• When an element moves past the end of a circular array, it wraps around to the beginning.
A more efficient queue representation is obtained by regarding the array Q(1:n) as circular. It now becomes more convenient to declare the array as Q(0:n - 1). When rear = n - 1, the next element is entered at Q(0) in case that spot is free. Using the same conventions as before, front will always point one position counterclockwise from the first element in the queue. Again, front = rear if and only if the queue is empty. Initially we have front = rear = 1. Figure 3.4 illustrates some of the possible configurations for a circular queue containing the four elements J1-J4 with n > 4. The assumption of circularity changes the ADD and DELETE algorithms slightly. In order to add an element, it will be necessary to move rear one position clockwise, i.e.,
if rear = n - 1 then rear ← 0 else rear ← rear + 1.
Figure : Circular queue of n elements
Using the modulo operator, which computes remainders, this is just rear ← (rear + 1) mod n. Similarly, it will be necessary to move front one position clockwise each time a deletion is made. Again, using the modulo operation, this can be accomplished by front ← (front + 1) mod n. An examination of the algorithms indicates that addition and deletion can now be carried out in a fixed amount of time, or O(1). For example:
• O O O O O 7 9 6 3 → 4 O O O O 7 9 6 3 (after Enqueue(4))
• After Enqueue(4), the rear wraps around: the rear element changes from 3 (in the last slot) to 4 (in the first slot).
Queue Full Condition:
    if (front == (rear + 1) % size)
        queue is full
• Where do we insert:
    rear = (rear + 1) % size;
    queue[rear] = item;
• After deletion:
    front = (front + 1) % size;
Example of a Circular Queue
• A circular queue Q of size 5 has three elements 20, 40, and 60, where front is 0 and rear is 2. What are the values of front and rear after each of these operations?
Q = 20, 40, 60, -, -       front = 0 (20), rear = 2 (60)
Insert item 50:
Q = 20, 40, 60, 50, -      front = 0 (20), rear = 3 (50)
Insert item 10:
Q = 20, 40, 60, 50, 10     front = 0 (20), rear = 4 (10)
Insert 30: rear = (rear + 1) % size = (4 + 1) % 5 = 0, which is the slot front points to, hence overflow.
Delete an item: 20 is deleted, front = (front + 1) % size = (0 + 1) % 5 = 1
Delete an item: 40 is deleted, front = (front + 1) % size = (1 + 1) % 5 = 2
Insert 30 at position 0: rear = (rear + 1) % size = (4 + 1) % 5 = 0
Similarly, insert 80 at position 1.
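The worked example can be replayed with a small circular queue in C. This is a sketch under one stated assumption: it keeps an explicit count of stored elements to tell a full queue from an empty one, which lets all 5 slots be used as in the example. That choice, and the names cqueue, cq_insert and cq_delete, are illustrative and not taken from the text.

```c
#define SIZE 5

struct cqueue {
    int items[SIZE];
    int front;    /* index of the first element */
    int rear;     /* index of the last element */
    int count;    /* number of elements stored (assumed extra field) */
};

/* Start empty: rear sits just "before" front on the circle. */
void cq_init(struct cqueue *q)
{
    q->front = 0;
    q->rear = SIZE - 1;
    q->count = 0;
}

/* Insert at the rear. Returns 0 on overflow, 1 on success. */
int cq_insert(struct cqueue *q, int item)
{
    if (q->count == SIZE)
        return 0;
    q->rear = (q->rear + 1) % SIZE;     /* wrap around past the last slot */
    q->items[q->rear] = item;
    q->count++;
    return 1;
}

/* Delete from the front into *item. Returns 0 on underflow, 1 on success. */
int cq_delete(struct cqueue *q, int *item)
{
    if (q->count == 0)
        return 0;
    *item = q->items[q->front];
    q->front = (q->front + 1) % SIZE;
    q->count--;
    return 1;
}
```

Inserting 20, 40, 60, 50 and 10 fills the queue, so inserting 30 is refused. After two deletions (20 and 40), 30 goes into slot 0 and 80 into slot 1.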
3.5. Evaluation of Expressions: Evaluating a postfix expression
When pioneering computer scientists conceived the idea of higher level programming languages, they were faced with many technical hurdles. One of the biggest was the question of how to generate machine language instructions which would properly evaluate any arithmetic expression. A complex assignment statement such as
X ← A/B ** C + D * E - A * C     (3.1)
might have several meanings; and even if it were uniquely defined, say by a full use of parentheses, it still seemed a formidable task to generate a correct and reasonable instruction sequence. Fortunately the solution we have today is both elegant and simple. Moreover, it is so simple that this aspect of compiler writing is really one of the more minor issues.
An expression is made up of operands, operators and delimiters. The expression above has five operands: A, B, C, D, and E. Though these are all one letter variables, operands can be any legal variable name or constant in our programming language. In any expression the values that variables take must be consistent with the operations performed on them. These operations are described by the operators. In most programming languages there are several kinds of operators which correspond to the different kinds of data a variable can hold. First, there are the basic arithmetic operators: plus, minus, times, divide, and exponentiation (+, -, *, /, **). Other arithmetic operators include unary plus, unary minus and mod, ceil, and floor. The latter three may sometimes be library subroutines rather than predefined operators. A second class are the relational operators: <, ≤, =, ≠, ≥, >. These are usually defined to work for arithmetic operands, but they can just as easily work for character string data. ('CAT' is less than 'DOG' since it precedes 'DOG' in alphabetical order.) The result of an expression which contains relational operators is one of the two constants: true or false. Such an expression is called Boolean, named after the mathematician George Boole, the father of symbolic logic.
The first problem with understanding the meaning of an expression is to decide in what order the operations are carried out. This means that every language must uniquely define such an order. For instance, if A = 4, B = C = 2, D = E = 3, then in eq. 3.1 we might want X to be assigned the value
4/(2 ** 2) + (3 * 3) - (4 * 2) = (4/4) + 9 - 8 = 2.
Let us now consider an example. Suppose that we are asked to evaluate the following postfix expression:
623+-382/+*2$3+
symb   opnd1  opnd2  value  opndstk
6                           6
2                           6,2
3                           6,2,3
+      2      3      5      6,5
-      6      5      1      1
3      6      5      1      1,3
8      6      5      1      1,3,8
2      6      5      1      1,3,8,2
/      8      2      4      1,3,4
+      3      4      7      1,7
*      1      7      7      7
2      1      7      7      7,2
$      7      2      49     49
3      7      2      49     49,3
+      49     3      52     52
Each time we read an operand, we push it onto a stack. When we reach an operator, its operands will be the top two elements on the stack. We can then pop these two elements, perform the indicated operation on them, and push the result on the stack so that it will be available for use as an operand of the next operator. The maximum size of the stack is the number of operands that appear in the input expression. But usually, the actual size of the stack needed is less than the maximum, as each operator pops the top two operands.
Program to evaluate postfix expression
Along with the push, pop and empty operations, we have the eval, isdigit and oper operations. The operand stack must hold double values here, so the items array of struct stack is declared as double and push and pop work with double. The code uses <stdio.h>, <stdlib.h> and <math.h>, and isdigit and oper must be declared before eval in a complete program.
eval – the evaluation algorithm
    double eval(char expr[])
    {
        int c, position;
        double opnd1, opnd2, value;
        struct stack opndstk;
        opndstk.top = -1;
        for (position = 0; (c = expr[position]) != '\0'; position++)
            if (isdigit(c))
                /* operand: push its numeric value */
                push(&opndstk, (double) (c - '0'));
            else {
                /* operator: the top of the stack is the second operand */
                opnd2 = pop(&opndstk);
                opnd1 = pop(&opndstk);
                value = oper(c, opnd1, opnd2);
                push(&opndstk, value);
            }
        return (pop(&opndstk));
    }
isdigit – called by eval, to determine whether or not its argument is an operand
    int isdigit(char symb)
    {
        return (symb >= '0' && symb <= '9');
    }
oper – to implement the operation corresponding to an operator symbol
    double oper(int symb, double op1, double op2)
    {
        switch (symb) {
        case '+': return (op1 + op2);
        case '-': return (op1 - op2);
        case '*': return (op1 * op2);
        case '/': return (op1 / op2);
        case '$': return (pow(op1, op2));
        default:
            printf("%s", "illegal operation");
            exit(1);
        }
    }
Converting an expression from infix to postfix
Consider the given parentheses-free infix expression: A+B*C
      symb   postfix string   opstk
1     A      A
2     +      A                +
3     B      AB               +
4     *      AB               + *
5     C      ABC              + *
6            ABC*             +
7            ABC*+
Consider the given parenthesized infix expression: (A+B)*C
      symb   postfix string   opstk
1     (                       (
2     A      A                (
3     +      A                ( +
4     B      AB               ( +
5     )      AB+
6     *      AB+              *
7     C      AB+C             *
8            AB+C*
Program to convert an expression from infix to postfix
Along with pop, push, empty and popandtest, we also make use of additional functions such as isoperand, prcd and postfix.
isoperand – returns TRUE if its argument is an operand and FALSE otherwise
prcd – accepts two operator symbols as arguments and returns TRUE if the first has precedence over the second when it appears to the left of the second in an infix string and FALSE otherwise
postfix – prints the postfix string
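The text names the helper functions but gives no code for the conversion. The following is a rough sketch of how they might fit together, under several assumptions that are not from the text: operands are single letters or digits, the input is well formed and shorter than MAXLEN, '$' (exponentiation) associates to the right, and postfix writes its result into a caller-supplied buffer instead of printing it. The helper level is invented for this example.

```c
#define MAXLEN 100

/* Precedence level of a binary operator; a higher level binds tighter. */
static int level(char op)
{
    switch (op) {
    case '+': case '-': return 1;
    case '*': case '/': return 2;
    case '$':           return 3;
    default:            return 0;
    }
}

/* Nonzero if op1 (already on the stack) must be output before op2 is handled. */
static int prcd(char op1, char op2)
{
    if (op1 == '(' || op2 == '(')
        return 0;                       /* nothing crosses an open parenthesis */
    if (op2 == ')')
        return 1;                       /* unwind back to the matching '(' */
    if (op1 == '$' && op2 == '$')
        return 0;                       /* exponentiation: right to left */
    return level(op1) >= level(op2);
}

/* Nonzero if c is an operand (a single letter or digit in this sketch). */
static int isoperand(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9');
}

/* Convert infix to postfix; out must have room for the result and '\0'. */
void postfix(const char *infix, char *out)
{
    char opstk[MAXLEN];                 /* operator stack */
    int top = -1;
    int n = 0;
    for (const char *p = infix; *p != '\0'; p++) {
        char c = *p;
        if (c == ' ')
            continue;
        if (isoperand(c)) {
            out[n++] = c;               /* operands go straight to the output */
            continue;
        }
        while (top >= 0 && prcd(opstk[top], c))
            out[n++] = opstk[top--];    /* pop operators that take precedence */
        if (c == ')') {
            if (top >= 0)
                top--;                  /* discard the matching '(' */
        } else {
            opstk[++top] = c;
        }
    }
    while (top >= 0)
        out[n++] = opstk[top--];        /* flush the remaining operators */
    out[n] = '\0';
}
```

With this sketch, A+B*C converts to ABC*+ and (A+B)*C converts to AB+C*.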
3.6. Multiple Stacks and Queues:
Up to now we have been concerned only with the representation of a single stack or a single queue in the memory of a computer. For these two cases we have seen efficient sequential data representations. What happens when a data representation is needed for several stacks and queues? Let us once again limit ourselves to sequential mappings of these data objects into an array V(1:m). If we have only 2 stacks to represent, then the solution is simple. We can use V(1) for the bottommost element in stack 1 and V(m) for the corresponding element in stack 2. Stack 1 can grow towards V(m) and stack 2 towards V(1). It is therefore possible to utilize efficiently all the available space. Can we do the same when more than 2 stacks are to be represented? The answer is no, because a one dimensional array has only two fixed points, V(1) and V(m), and each stack requires a fixed point for its bottommost element. When more than two stacks, say n, are to be represented sequentially, we can initially divide the available memory V(1:m) into n segments and allocate one of these segments to each of the n stacks. This initial division of V(1:m) into segments may be done in proportion to the expected sizes of the various stacks if the sizes are known. In the absence of such information, V(1:m) may be divided into equal segments. For each stack i we shall use B(i) to represent a position one less than the position in V for the bottommost element of that stack. T(i), 1 ≤ i ≤ n, will point to the topmost element of stack i. We shall use the boundary condition B(i) = T(i) iff the i'th stack is empty. If we grow the i'th stack in lower memory indexes than the i + 1'st, then with roughly equal initial segments we have
$$B(i) = T(i) = \lfloor m/n \rfloor (i - 1), \quad 1 \le i \le n \qquad (3.2)$$
as the initial values of B(i) and T(i) (see figure 3.9). Stack i, 1 ≤ i ≤ n, can grow from B(i) + 1 up to B(i + 1) before it catches up with the i + 1'st stack. It is convenient both for the discussion and the algorithms to define B(n + 1) = m. Using this scheme the add and delete algorithms become:
    procedure ADD(i, X)
      //add element X to the i'th stack, 1 ≤ i ≤ n//
      if T(i) = B(i + 1) then call STACK-FULL(i)
      T(i) ← T(i) + 1
      V(T(i)) ← X     //add X to the i'th stack//
    end ADD
    procedure DELETE(i, X)
      //delete topmost element of stack i//
      if T(i) = B(i) then call STACK-EMPTY(i)
      X ← V(T(i))
      T(i) ← T(i) - 1
    end DELETE
The algorithms to add and delete appear to be as simple as in the case of only 1 or 2 stacks. This really is not the case, since the STACK_FULL condition in algorithm ADD does not imply that all m locations of V are in use. In fact, there may be a lot of unused space between stacks j and j + 1 for 1 ≤ j ≤ n and j ≠ i. The procedure STACK_FULL(i) should therefore determine whether there is any free space in V and shift stacks around so as to make some of this free space available to the i'th stack.
Several strategies are possible for the design of algorithm STACK_FULL. We shall discuss one strategy in the text and look at some others in the exercises. The primary objective of algorithm STACK_FULL is to permit the adding of elements to stacks so long as there is some free space in V. One way to guarantee this is to design STACK_FULL along the following lines:
a) determine the least j, i < j ≤ n, such that there is free space between stacks j and j + 1, i.e., T(j) < B(j + 1). If there is such a j, then move stacks i + 1, i + 2, ..., j one position to the right (treating V(1) as leftmost and V(m) as rightmost), thereby creating a space between stacks i and i + 1.
b) if there is no j as in a), then look to the left of stack i. Find the largest j such that 1 ≤ j < i and there is space between stacks j and j + 1, i.e., T(j) < B(j + 1). If there is such a j, then move stacks j + 1, j + 2, ..., i one space left, creating a free space between stacks i and i + 1.
c) if there is no j satisfying either the conditions of a) or b), then all m spaces of V are utilized and there is no free space.
It should be clear that the worst case performance of this representation for the n stacks together with the above strategy for STACK_FULL would be rather poor. In fact, in the worst case O(m) time may be needed for each insertion (see exercises).
In the next chapter we shall see that if we do not limit ourselves to sequential mappings of data objects into arrays, then we can obtain a data representation for m stacks that has a much better worst case performance than the representation described here.
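The ADD and DELETE procedures for n stacks sharing one array translate to C quite directly. The sketch below keeps the names V, B and T but shifts everything to C's 0-based indexing, so the initial boundaries become (M/N)·i − 1 and the sentinel is B[N] = M − 1. The sizes M and N and the function names are assumptions for illustration, and a full segment is simply reported to the caller; the repacking done by STACK_FULL is not attempted here.

```c
#define M 12    /* total number of slots in V (hypothetical) */
#define N 3     /* number of stacks sharing V (hypothetical) */

static int V[M];
static int B[N + 1];    /* B[i]: index one below the bottom of stack i */
static int T[N];        /* T[i]: index of the top element of stack i */

/* Divide V into N equal segments; every stack starts empty (B[i] == T[i]). */
void ms_init(void)
{
    for (int i = 0; i < N; i++)
        B[i] = T[i] = (M / N) * i - 1;
    B[N] = M - 1;                      /* sentinel boundary after the last stack */
}

/* ADD: push x on stack i. Returns 0 if stack i has reached stack i + 1. */
int ms_add(int i, int x)
{
    if (T[i] == B[i + 1])
        return 0;                      /* where STACK_FULL(i) would be called */
    V[++T[i]] = x;
    return 1;
}

/* DELETE: pop the top of stack i into *x. Returns 0 if stack i is empty. */
int ms_delete(int i, int *x)
{
    if (T[i] == B[i])
        return 0;                      /* where STACK_EMPTY(i) would be called */
    *x = V[T[i]--];
    return 1;
}
```

With M = 12 and N = 3 each stack starts with a segment of 4 slots, so a fifth push on one stack is refused even while the other segments are empty. This is the situation the STACK_FULL strategy is meant to handle.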
3.7. RECOMMENDED QUESTIONS
1. Assume A=1, B=2, C=3. Evaluate the following postfix expressions: a) AB+C-BA+C$ b) ABC+*CBA-+*
2. Convert the following infix expression to postfix: ((A-(B+C))*D)$(E+F)
3. Explain the different ways of representing expressions.
4. State the advantages of using infix and postfix notations.
5. State the rules to be followed during infix to postfix conversions.
6. State the rules to be followed during infix to prefix conversions.
7. Mention the advantages of representing stacks using linked lists rather than arrays.
UNIT – 4 : LINKED LISTS
4.1. Singly Linked lists and Chains:
Let us discuss the drawbacks of stacks and queues in their sequential representation. During implementation, overflow can occur, and no simple solution exists for accommodating several stacks and queues. In a sequential representation, the items of a stack or queue are implicitly ordered by the sequential order of storage. If instead the items are explicitly ordered, that is, each item contains within itself the address of the next item, we obtain a new data structure known as a linear linked list. Each item in the list is called a node and contains two fields, an information field and a next address field. The information field holds the actual element on the list. The next address field contains the address of the next node in the list. Such an address, which is used to access a particular node, is known as a pointer. The null pointer is used to signal the end of a list. The list with no nodes is called the empty list or null list.
The notations used in algorithms are: if p is a pointer to a node, node(p) refers to the node pointed to by p, info(p) refers to the information portion of that node, and next(p) refers to the next address portion. If next(p) is not null, info(next(p)) refers to the information portion of the node that follows node(p) in the list.
A linked list (or more clearly, "singly linked list") is a data structure that consists of a sequence of nodes, each of which contains a reference (i.e., a link) to the next node in the sequence.
Fig : A linked list whose nodes contain two fields: an integer value and a link to the next node
Linked lists are among the simplest and most common data structures. They can be used to implement several other common abstract data structures, including stacks, queues, associative arrays, and symbolic expressions, though it is not uncommon to implement the other data structures directly without using a list as the basis of implementation.
The principal benefit of a linked list over a conventional array is that the list elements can easily be added or removed without reallocation or reorganization of the entire structure, because the data items need not be stored contiguously in memory or on disk. Linked lists allow insertion and removal of nodes at any point in the list, and can do so with a constant number of operations if the link previous to the link being added or removed is maintained during list traversal.
On the other hand, simple linked lists by themselves do not allow random access to the data other than the first node's data, or any form of efficient indexing.
Fig : Inserting and removing nodes from a list
A list is a dynamic data structure. The number of nodes on a list may vary dramatically and dynamically as elements are inserted and removed. For example, let us consider a list with elements 5, 3 and 8, to which we need to add the integer 6 at the front. Then,
    p = getnode();
    info(p) = 6;
    next(p) = list;
    list = p;
Similarly, removing an element from the front of the list is almost exactly the opposite of the process of adding a node to the front of the list. To remove the first node of a nonempty list and store the value of its info field into a variable x:
    p = list;
    list = next(p);
    x = info(p);
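The two notation-level sequences map almost line for line onto C pointers. The following sketch assumes a node with an int info field and uses malloc and free in place of getnode and node release; the function names push_front and pop_front are hypothetical.

```c
#include <stdlib.h>

struct node {
    int info;               /* information field */
    struct node *next;      /* next address field */
};

/* Add x at the front of *list. Returns 0 if no memory is available. */
int push_front(struct node **list, int x)
{
    struct node *p = malloc(sizeof *p);   /* p = getnode()   */
    if (p == NULL)
        return 0;
    p->info = x;                          /* info(p) = x     */
    p->next = *list;                      /* next(p) = list  */
    *list = p;                            /* list = p        */
    return 1;
}

/* Remove the first node of *list and store its value in *x.
   Returns 0 if the list is empty. */
int pop_front(struct node **list, int *x)
{
    struct node *p = *list;               /* p = list        */
    if (p == NULL)
        return 0;
    *list = p->next;                      /* list = next(p)  */
    *x = p->info;                         /* x = info(p)     */
    free(p);                              /* release the node */
    return 1;
}
```

Starting from the list 5, 3, 8, push_front(&list, 6) yields 6, 5, 3, 8, and a following pop_front returns 6 and restores the original list.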
4.2. Representing Chains in C:
A chain is a linked list in which each node represents one element.
There is a link or pointer from one element to the next. The last node has a NULL (or 0) pointer.
• An array and a sequential mapping were used to represent simple data structures in the previous chapters.
• This representation has the property that successive nodes of the data object are stored a fixed distance apart:
(1) If the element a_ij is stored at location L_ij, then a_i,j+1 is at location L_ij + 1.
(2) If the i-th element in a queue is at location L_i, then the (i+1)-th element is at location (L_i + 1) % n for the circular representation.
(3) If the topmost node of a stack is at location L_T, then the node beneath it is at location L_T − 1, and so on.
• When a sequential mapping is used for ordered lists, operations such as insertion and deletion of arbitrary elements become expensive.
In a linked representation:
– To access list elements in the correct order, with each element we store the address or location of the next element in the list.
– A linked list is comprised of nodes; each node has zero or more data fields and one or more link or pointer fields.
4.3. Linked Stacks and Queues: The two stack methods below are written in C++.

Pushing a Linked Stack

    Error_code Stack::push(const Stack_entry &item)
    /* Post: Stack_entry item is added to the top of the Stack; returns success,
       or returns a code of overflow if dynamic memory is exhausted. */
    {
        /* nothrow form of new, so that failure yields NULL (needs #include <new>) */
        Node *new_top = new (std::nothrow) Node(item, top_node);
        if (new_top == NULL) return overflow;
        top_node = new_top;
        return success;
    }

Popping a Linked Stack

    Error_code Stack::pop()
    /* Post: The top of the Stack is removed. If the Stack is empty the method
       returns underflow; otherwise it returns success. */
    {
        Node *old_top = top_node;
        if (top_node == NULL) return underflow;
        top_node = old_top->next;
        delete old_top;
        return success;
    }

A queue is a particular kind of collection in which the entities in the collection are kept in order and the principal (or only) operations on the collection are the addition of entities to the rear terminal position and removal of entities from the front terminal position. This makes the queue a First-In-First-Out (FIFO) data structure. In a FIFO data structure, the first element added to the queue will be the first one to be removed. This is equivalent to the requirement that once an element is added, all elements that were added before have to be removed before the new element can be removed. A queue is an example of a linear data structure. Queues provide services in computer science, transport, and operations research where various entities such as data, objects, persons, or events are stored and held to be processed later. In these contexts, the queue performs the function of a buffer.

    #include <stdio.h>
    #include <stdlib.h>

    /* The queue is a chain that hangs off a dummy header node (root). */
    struct node {
        int value;
        struct node *next;
    };

    void Init(struct node *n)
    {
        n->next = NULL;
    }

    /* Append a new node at the rear; walks the whole chain, so O(n). */
    void Enqueue(struct node *root, int value)
    {
        struct node *j = (struct node *)malloc(sizeof(struct node));
        struct node *temp;
        if (j == NULL) {
            printf("Out of memory\n");
            return;
        }
        j->value = value;
        j->next = NULL;
        temp = root;
        while (temp->next != NULL) {
            temp = temp->next;
        }
        temp->next = j;
        printf("Value Enqueued is : %d\n", value);
    }

    /* Remove the node just after the header, i.e. the front of the queue. */
    void Dequeue(struct node *root)
    {
        if (root->next == NULL) {
            printf("No Element to Dequeue\n");
        } else {
            struct node *temp;
            temp = root->next;
            root->next = temp->next;
            printf("Value Dequeued is %d\n", temp->value);
            free(temp);
        }
    }

    int main(void)
    {
        struct node sample_queue;
        Init(&sample_queue);
        Enqueue(&sample_queue, 10);
        Enqueue(&sample_queue, 50);
        Enqueue(&sample_queue, 570);
        Enqueue(&sample_queue, 5710);
        Dequeue(&sample_queue);
        Dequeue(&sample_queue);
        Dequeue(&sample_queue);
        return 0;
    }
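The queue program above walks the whole chain on every insertion. A common alternative, sketched here under the assumption that a separate `struct queue` with `front` and `rear` pointers is acceptable (these names are not from the notes), makes both operations take constant time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct qnode {
    int value;
    struct qnode *next;
};

/* Keeping a pointer to the rear node makes enqueue O(1). */
struct queue {
    struct qnode *front;   /* next node to be removed, NULL if empty */
    struct qnode *rear;    /* most recently added node, NULL if empty */
};

void queue_init(struct queue *q)
{
    q->front = q->rear = NULL;
}

bool queue_is_empty(const struct queue *q)
{
    return q->front == NULL;
}

/* Add value at the rear; returns false if memory is exhausted. */
bool queue_enqueue(struct queue *q, int value)
{
    struct qnode *n = malloc(sizeof *n);
    if (n == NULL)
        return false;
    n->value = value;
    n->next = NULL;
    if (q->rear == NULL)
        q->front = n;          /* first node of an empty queue */
    else
        q->rear->next = n;
    q->rear = n;
    return true;
}

/* Remove and return the front value; the queue must not be empty. */
int queue_dequeue(struct queue *q)
{
    assert(!queue_is_empty(q));
    struct qnode *old = q->front;
    int value = old->value;
    q->front = old->next;
    if (q->front == NULL)
        q->rear = NULL;        /* queue became empty */
    free(old);
    return value;
}
```

The price of the constant-time enqueue is one extra pointer that must be kept consistent when the queue becomes empty.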
4.4. Polynomials: In mathematics, a polynomial (from Greek poly, "many" and medieval Latin binomium, "binomial") is an expression of finite length constructed from variables (also known as indeterminates) and constants, using only the operations of addition, subtraction, multiplication, and non-negative integer exponents. For example, x^2 - 4x + 7 is a polynomial, but x^2 - 4/x + 7x^(3/2) is not, because its second term involves division by the variable x (4/x) and because its third term contains an exponent that is not a whole number (3/2). The term "polynomial" can also be used as an adjective, for quantities that can be expressed as a polynomial of some parameter, as in "polynomial time", which is used in computational complexity theory.

Polynomials appear in a wide variety of areas of mathematics and science. For example, they are used to form polynomial equations, which encode a wide range of problems, from elementary word problems to complicated problems in the sciences.

A polynomial is a mathematical expression involving a sum of powers in one or more variables multiplied by coefficients. A polynomial in one variable (i.e., a univariate polynomial) with constant coefficients is given by

    a_n x^n + ... + a_2 x^2 + a_1 x + a_0.    (1)

The individual summands with the coefficients (usually) included are called monomials (Becker and Weispfenning 1993, p. 191), whereas the products of the form x_1^(e_1) ... x_n^(e_n) in the multivariate case, i.e., with the coefficients omitted, are called terms (Becker and Weispfenning 1993, p. 188). However, the term "monomial" is sometimes also used to mean polynomial summands without their coefficients, and in some older works, the definitions of monomial and term are reversed. Care is therefore needed in attempting to distinguish these conflicting usages.

The highest power in a univariate polynomial is called its order, or sometimes its degree. Any polynomial P(x) = a_n x^n + ... + a_1 x + a_0 with a_n ≠ 0 can be expressed as

    P(x) = a_n (x - r_1)(x - r_2) ... (x - r_n),    (2)

where the product runs over the roots r_i of P(x) = 0 and it is understood that multiple roots are counted with multiplicity.

A polynomial in two variables (i.e., a bivariate polynomial) with constant coefficients is given by

    a_nm x^n y^m + ... + a_22 x^2 y^2 + a_21 x^2 y + a_12 x y^2 + a_11 x y + a_10 x + a_01 y + a_00.    (3)

The sum of two polynomials is obtained by adding together the coefficients sharing the same powers of variables (i.e., the same terms) so, for example,

    (a_2 x^2 + a_1 x + a_0) + (b_1 x + b_0) = a_2 x^2 + (a_1 + b_1) x + (a_0 + b_0)    (4)

and has order less than (in the case of cancellation of leading terms) or equal to the maximum order of the original two polynomials. Similarly, the product of two polynomials is obtained by multiplying term by term and combining the results, for example

    (a_2 x^2 + a_1 x + a_0)(b_1 x + b_0)
        = a_2 x^2 (b_1 x + b_0) + a_1 x (b_1 x + b_0) + a_0 (b_1 x + b_0)    (5)
        = a_2 b_1 x^3 + (a_2 b_0 + a_1 b_1) x^2 + (a_1 b_0 + a_0 b_1) x + a_0 b_0    (6)

and has order equal to the sum of the orders of the two original polynomials.

A polynomial quotient

    R(x) = P(x) / Q(x)    (7)

of two polynomials P(x) and Q(x) is known as a rational function. The process of performing such a division is called long division, with synthetic division being a simplified method of recording the division. For any polynomial P(x), x - y divides P(x) - P(y), meaning that the polynomial quotient is a rational polynomial or, in the case of an integer polynomial, another integer polynomial (N. Sato, pers. comm., Nov. 23, 2004).

Exchanging the coefficients of a univariate polynomial end-to-end produces a polynomial

    a_0 x^n + a_1 x^(n-1) + ... + a_(n-1) x + a_n    (8)

whose roots are the reciprocals 1/r_i of the original roots r_i.
Horner's rule provides a computationally efficient method of forming a polynomial from a list of its coefficients, and can be implemented in Mathematica as follows.

    Polynomial[l_List, x_] := Fold[x #1 + #2&, 0, l]

The following table gives special names given to polynomials of low orders.

    polynomial order    polynomial name
    2                   quadratic polynomial
    3                   cubic polynomial
    4                   quartic
    5                   quintic
    6                   sextic
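Horner's rule, mentioned above, can also be written in C. The sketch below assumes the coefficients are stored from the highest power down, which matches the left-to-right fold of the Mathematica one-liner; the function name `horner` is chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Evaluate a[0]*x^(n-1) + a[1]*x^(n-2) + ... + a[n-1] by Horner's rule.
   Coefficients are listed from the highest power down.
   Uses one multiplication and one addition per coefficient. */
double horner(const double *a, size_t n, double x)
{
    assert(a != NULL || n == 0);
    double result = 0.0;
    for (size_t i = 0; i < n; i++)
        result = result * x + a[i];
    return result;
}
```

For the polynomial x^2 - 4x + 7 the coefficient array is {1, -4, 7}, and evaluating at x = 2 gives 3.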
Polynomials of fourth degree may be computed using three multiplications and five additions if a few quantities are calculated first (Press et al. 1989). Similarly, a polynomial of fifth degree may be computed with four multiplications and five additions, and a polynomial of sixth degree may be computed with four multiplications and seven additions.

The use of linked lists is well suited to polynomial operations. We can easily imagine writing a collection of procedures for input, output, addition, subtraction and multiplication of polynomials using linked lists as the means of representation. A hypothetical user wishing to read in polynomials A(x), B(x) and C(x) and then compute D(x) = A(x) * B(x) + C(x) would write in his main program:

    call READ(A)
    call READ(B)
    call READ(C)
    T ← PMUL(A, B)
    D ← PADD(T, C)
    call PRINT(D)
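To make the linked representation concrete, here is a hedged sketch of polynomial addition in C. It assumes each node holds one term and that terms are kept in decreasing order of exponent; `struct term`, `attach` and `padd` are names chosen for this illustration, and memory exhaustion is handled only by an assertion.

```c
#include <assert.h>
#include <stdlib.h>

/* One term coef * x^expon; terms are kept in decreasing order of expon. */
struct term {
    int coef;
    int expon;
    struct term *link;
};

/* Append a term after *rear and advance *rear.
   Out-of-memory is treated as a failed assertion in this sketch. */
static void attach(int coef, int expon, struct term **rear)
{
    struct term *t = malloc(sizeof *t);
    assert(t != NULL);
    t->coef = coef;
    t->expon = expon;
    t->link = NULL;
    (*rear)->link = t;
    *rear = t;
}

/* Return a new chain representing a + b; a and b are not modified. */
struct term *padd(const struct term *a, const struct term *b)
{
    struct term head = {0, 0, NULL};    /* temporary header node */
    struct term *rear = &head;
    while (a != NULL && b != NULL) {
        if (a->expon > b->expon) {
            attach(a->coef, a->expon, &rear);
            a = a->link;
        } else if (a->expon < b->expon) {
            attach(b->coef, b->expon, &rear);
            b = b->link;
        } else {
            int sum = a->coef + b->coef;
            if (sum != 0)               /* drop cancelled terms */
                attach(sum, a->expon, &rear);
            a = a->link;
            b = b->link;
        }
    }
    for (; a != NULL; a = a->link)      /* copy whatever is left of a */
        attach(a->coef, a->expon, &rear);
    for (; b != NULL; b = b->link)      /* copy whatever is left of b */
        attach(b->coef, b->expon, &rear);
    return head.link;
}
```

Because both inputs are ordered, the sum is produced in a single pass whose cost is proportional to the total number of terms.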
Now our user may wish to continue computing more polynomials. At this point it would be useful to reclaim the nodes which are being used to represent T(x). This polynomial was created only as a partial result towards the answer D(x). By returning the nodes of T(x), they may be used to hold other polynomials.

    procedure ERASE(T)
    //return all the nodes of T to the available space list avoiding repeated calls to procedure RET//
      if T = 0 then return
      p ← T
      while LINK(p) ≠ 0 do    //find the end of T//
        p ← LINK(p)
      end
      LINK(p) ← AV            //p points to the last node of T//
      AV ← T                  //available list now includes T//
    end ERASE

Study this algorithm carefully. It cleverly avoids using the RET procedure to return the nodes of T one node at a time, but makes use of the fact that the nodes of T are already linked. The time required to erase T(x) is still proportional to the number of nodes in T. This erasing of entire polynomials can be carried out even more efficiently by modifying the list structure so that the LINK field of the last node points back to the first node as in figure 4.8. A list in which the last node points back to the first will be termed a circular list. A chain is a singly linked list in which the last node has a zero link field. Circular lists may be erased in a fixed amount of time independent of the number of nodes in the list. The algorithm below does this.

    procedure CERASE(T)
    //return the circular list T to the available pool//
      if T = 0 then return
      X ← LINK(T)
      LINK(T) ← AV
      AV ← X
    end CERASE
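The two erase procedures translate to C roughly as follows. This is a sketch under the assumption that the available-space list is a file-level pointer `av`; `get_node` and `av_count` are helper names invented here to show that erased nodes can be reused.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int data;
    struct node *link;
};

/* Head of the available-space list (AV in the text). */
static struct node *av = NULL;

/* ERASE: return a whole chain to av; walks to the last node, so O(n). */
void erase(struct node **t)
{
    struct node *p;
    if (*t == NULL)
        return;
    for (p = *t; p->link != NULL; p = p->link)
        ;                       /* find the last node of the chain */
    p->link = av;               /* hook the old available list behind it */
    av = *t;
    *t = NULL;
}

/* CERASE: return a circular list to av in constant time.
   *t may point at any node of the circle. */
void cerase(struct node **t)
{
    struct node *x;
    if (*t == NULL)
        return;
    x = (*t)->link;             /* node after *t becomes the head of av */
    (*t)->link = av;            /* break the circle; *t is now last */
    av = x;
    *t = NULL;
}

/* GETNODE: take one node from av; av must not be empty in this sketch. */
struct node *get_node(void)
{
    assert(av != NULL);
    struct node *p = av;
    av = av->link;
    p->link = NULL;
    return p;
}

/* Count the nodes on the available list (for checking only). */
int av_count(void)
{
    int n = 0;
    for (const struct node *p = av; p != NULL; p = p->link)
        n++;
    return n;
}
```

Note that `cerase` performs three pointer assignments regardless of how many nodes the circular list holds.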
4.5. Additional List operations: It is often necessary and desirable to build a variety of routines for manipulating singly linked lists. Some that we have already seen are: 1) INIT, which originally links together the AV list; 2) GETNODE and 3) RET, which get and return nodes to AV. Another useful operation is one which inverts a chain. This routine is especially interesting because it can be done "in place" if we make use of 3 pointers.

    procedure INVERT(X)
    //a chain pointed at by X is inverted so that if X = (a1, ...,am) then after execution X = (am, ...,a1)//
      p ← X; q ← 0
      while p ≠ 0 do
        r ← q; q ← p          //r follows q; q follows p//
        p ← LINK(p)           //p moves to next node//
        LINK(q) ← r           //link q to previous node//
      end
      X ← q
    end INVERT
The reader should try this algorithm out on at least 3 examples: the empty list, and lists of length 1 and 2 to convince himself that he understands the mechanism. For a list of m ≥ 1 nodes, the while loop is executed m times and so the computing time is linear or O(m). Another useful subroutine is one which concatenates two chains X and Y.

    procedure CONCATENATE(X, Y, Z)
    //X = (a1, ...,am), Y = (b1, ...,bn), m,n ≥ 0, produces a new chain Z = (a1, ...,am,b1, ...,bn)//
      Z ← X
      if X = 0 then [Z ← Y; return]
      if Y = 0 then return
      p ← X
      while LINK(p) ≠ 0 do    //find last node of X//
        p ← LINK(p)
      end
      LINK(p) ← Y             //link last node of X to Y//
    end CONCATENATE

This algorithm is also linear in the length of the first list. From an aesthetic point of view it is nicer to write this procedure using the case statement in SPARKS. This would look like:

    procedure CONCATENATE(X, Y, Z)
      case
        : X = 0 : Z ← Y
        : Y = 0 : Z ← X
        : else : p ← X; Z ← X
                 while LINK(p) ≠ 0 do
                   p ← LINK(p)
                 end
                 LINK(p) ← Y
      end
    end CONCATENATE
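In C, the inversion and concatenation of chains might look like the sketch below. The functions return the resulting first node instead of assigning to a parameter, which is a design choice of this example rather than something the notes prescribe.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int data;
    struct node *link;
};

/* INVERT: reverse a chain in place using three pointers; returns the new
   first node. Runs in O(m) for a chain of m nodes. */
struct node *invert(struct node *x)
{
    struct node *p = x, *q = NULL, *r;
    while (p != NULL) {
        r = q;          /* r follows q */
        q = p;          /* q follows p */
        p = p->link;    /* p moves to the next node */
        q->link = r;    /* link q to the previous node */
    }
    return q;
}

/* CONCATENATE: append chain y to chain x and return the combined chain.
   Linear in the length of x because the last node of x must be found. */
struct node *concatenate(struct node *x, struct node *y)
{
    struct node *p;
    if (x == NULL)
        return y;
    if (y == NULL)
        return x;
    assert(x != y);     /* joining a chain to itself would create a cycle */
    for (p = x; p->link != NULL; p = p->link)
        ;               /* find the last node of x */
    p->link = y;
    return x;
}
```

Trying `invert` on the empty chain and on chains of length 1 and 2, as the text suggests, is a quick way to check the pointer updates.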
Now consider a circular list A = (x1, x2, x3). Suppose we want to insert a new node at the front of this list. We have to change the LINK field of the node containing x3. This requires that we move down the entire length of A until we find the last node. It is more convenient if the name of a circular list points to the last node rather than the first.

Now we can write procedures which insert a node at the front or at the rear of a circular list and take a fixed amount of time.

    procedure INSERT_FRONT(A, X)
    //insert the node pointed at by X to the front of the circular list A, where A points to the last node//
      if A = 0 then [A ← X
                     LINK(X) ← A]
      else [LINK(X) ← LINK(A)
            LINK(A) ← X]
    end INSERT_FRONT

To insert X at the rear, one only needs to add the additional statement A ← X to the else clause of INSERT_FRONT.

As a last example of a simple procedure for circular lists, we write a function which determines the length of such a list.

    procedure LENGTH(A)
    //find the length of the circular list A//
      i ← 0
      if A ≠ 0 then [ptr ← A
                     repeat
                       i ← i + 1; ptr ← LINK(ptr)
                     until ptr = A]
      return (i)
    end LENGTH
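The circular-list procedures above can be sketched in C as follows, keeping the convention that the list is named by a pointer to its last node. The functions return the (possibly updated) last-node pointer, which is an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int data;
    struct node *link;
};

/* A circular list is named by a pointer to its LAST node, so the first
   node is always last->link. Both inserts below take constant time. */

/* Insert node x at the front; returns the (possibly new) last node. */
struct node *insert_front(struct node *last, struct node *x)
{
    assert(x != NULL);
    if (last == NULL) {         /* empty list: x points to itself */
        x->link = x;
        return x;
    }
    x->link = last->link;
    last->link = x;
    return last;
}

/* Insert node x at the rear: same links, but x becomes the last node. */
struct node *insert_rear(struct node *last, struct node *x)
{
    insert_front(last, x);
    return x;
}

/* Count the nodes by walking once around the circle. */
int length(const struct node *last)
{
    int i = 0;
    if (last != NULL) {
        const struct node *ptr = last;
        do {
            i++;
            ptr = ptr->link;
        } while (ptr != last);
    }
    return i;
}
```

The only difference between the two inserts is which node the caller treats as last afterwards, mirroring the single extra statement A ← X in the text.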
4.6. Sparse Matrices: A sparse matrix is a matrix populated primarily with zeros (Stoer & Bulirsch 2002, p. 619). The term itself was coined by Harry M. Markowitz. Conceptually, sparsity corresponds to systems which are loosely coupled. Consider a line of balls connected by springs from one to the next; this is a sparse system. By contrast, if the same line of balls had springs connecting each ball to all other balls, the system would be represented by a dense matrix. The concept of sparsity is useful in combinatorics and application areas such as network theory, which have a low density of significant data or connections.

A sparse matrix is a matrix that allows special techniques to take advantage of the large number of zero elements. This definition helps to define "how many" zeros a matrix needs in order to be "sparse." The answer is that it depends on what the structure of the matrix is, and what you want to do with it. For example, a randomly generated sparse matrix with entries scattered randomly throughout the matrix is not sparse in the sense of Wilkinson (for direct methods) since it takes
.
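One simple way to take advantage of the zeros in C is to store only the nonzero entries as (row, column, value) triples. The sketch below is an illustration only; `struct entry`, `to_triplets` and `get` are hypothetical names, indices are 0-based, and entries are collected column by column.

```c
#include <assert.h>
#include <stddef.h>

/* One nonzero entry of a sparse matrix in coordinate (triplet) form. */
struct entry {
    int row, col;
    double value;
};

/* Copy the nonzeros of a dense rows x cols matrix (stored row-major) into
   out[], scanning column by column. Returns the number of nonzeros;
   out must have room for all of them. */
size_t to_triplets(const double *dense, int rows, int cols,
                   struct entry *out, size_t capacity)
{
    size_t nnz = 0;
    for (int c = 0; c < cols; c++) {
        for (int r = 0; r < rows; r++) {
            double v = dense[r * cols + c];
            if (v != 0.0) {
                assert(nnz < capacity);
                out[nnz].row = r;
                out[nnz].col = c;
                out[nnz].value = v;
                nnz++;
            }
        }
    }
    return nnz;
}

/* Look up element (r, c); entries that are not stored are zero. */
double get(const struct entry *e, size_t nnz, int r, int c)
{
    for (size_t i = 0; i < nnz; i++)
        if (e[i].row == r && e[i].col == c)
            return e[i].value;
    return 0.0;
}
```

Storage is proportional to the number of nonzeros, at the cost of a linear search in `get`; ordered or linked representations trade extra bookkeeping for faster access.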
Creating a sparse matrix

If a matrix A is stored in ordinary (dense) format, then the command S = sparse(A) creates a copy of the matrix stored in sparse format. For example:

    >> A = [0 0 1;1 0 2;0 -3 0]
    A =
         0     0     1
         1     0     2
         0    -3     0
    >> S = sparse(A)
    S =
       (2,1)        1
       (3,2)       -3
       (1,3)        1
       (2,3)        2
    >> whos
      Name      Size      Bytes  Class
      A         3x3          72  double array
      S         3x3          64  sparse array
    Grand total is 13 elements using 136 bytes

Unfortunately, this form of the sparse command is not particularly useful, since if A is large, it can be very time-consuming to first create it in dense format. The command S = sparse(m,n) creates an m-by-n zero matrix in sparse format. Entries can then be added one-by-one:

    >> A = sparse(3,2)
    A =
       All zero sparse: 3-by-2
    >> A(1,2)=1;
    >> A(3,1)=4;
    >> A(3,2)=-1;
    >> A
    A =
       (3,1)        4
       (1,2)        1
       (3,2)       -1
(Of course, for this to be truly useful, the nonzeros would be added in a loop.) Another version of the sparse command is S = sparse(I,J,S,m,n,maxnz). This creates an m-by-n sparse matrix with entry (I(k),J(k)) equal to S(k). The optional argument maxnz causes Matlab to pre-allocate storage for maxnz nonzero entries, which can increase efficiency in the case when more nonzeros will be added later to S.

The most common type of sparse matrix is a banded matrix, that is, a matrix with a few nonzero diagonals. Such a matrix can be created with the spdiags command. Consider the following matrix:

    >> A
    A =
        64   -16     0   -16     0     0     0     0     0
       -16    64   -16     0   -16     0     0     0     0
         0   -16    64     0     0   -16     0     0     0
       -16     0     0    64   -16     0   -16     0     0
         0   -16     0   -16    64   -16     0   -16     0
         0     0   -16     0   -16    64     0     0   -16
         0     0     0   -16     0     0    64   -16     0
         0     0     0     0   -16     0   -16    64   -16
         0     0     0     0     0   -16     0   -16    64
This is a matrix with 5 nonzero diagonals. In Matlab's indexing scheme, the nonzero diagonals of A are numbers -3, -1, 0, 1, and 3 (the main diagonal is number 0, the first subdiagonal is number -1, the first superdiagonal is number 1, and so forth). To create the same matrix in sparse format, it is first necessary to create a matrix containing the nonzero diagonals of A. Of course, the diagonals, regarded as column vectors, have different lengths; only the main diagonal has length 9. In order to gather the various diagonals in a single matrix, the shorter diagonals must be padded with zeros. The rule is that the extra zeros go at the bottom for subdiagonals and at the top for superdiagonals. Thus we create the following matrix:

    >> B = [
      -16   -16    64     0     0
      -16   -16    64   -16     0
      -16     0    64   -16     0
      -16   -16    64     0   -16
      -16   -16    64   -16   -16
      -16     0    64   -16   -16
        0   -16    64     0   -16
        0   -16    64   -16   -16
        0     0    64   -16   -16
    ];

(notice the technique for entering the rows of a large matrix on several lines). The spdiags command also needs the indices of the diagonals:

    >> d = [-3,-1,0,1,3];

The matrix is then created as follows:

    S = spdiags(B,d,9,9);

The last two arguments give the size of S.

Perhaps the most common sparse matrix is the identity. Recall that an identity matrix can be created, in dense format, using the command eye. To create the identity matrix in sparse format, use I = speye(n).

Another useful command is spy, which creates a graphic displaying the sparsity pattern of a matrix. For example, the above penta-diagonal matrix A can be displayed by the following command; see Figure 6:

    >> spy(A)
Figure 6: The sparsity pattern of a matrix
4.7. Doubly Linked Lists: Although a circularly linked list has advantages over linear lists, it still has some drawbacks. One cannot traverse such a list backward. Doubly linked lists require more space per node, and their elementary operations are more expensive; but they are often easier to manipulate because they allow sequential access to the list in both directions. In particular, one can insert or delete a node in a constant number of operations given only that node's address. (Compare this with singly linked lists, which require the previous node's address in order to correctly insert or delete.) Some algorithms require access in both directions. On the other hand, doubly linked lists do not allow tail-sharing, and cannot be used as persistent data structures.

Operations on Doubly Linked Lists

One operation that can be performed on a doubly linked list but not on an ordinary linked list is to delete a given node knowing only its address. The following C routine deletes the node pointed to by p from a doubly linked list and stores its contents in x. It is called by delete(p, &x).

    /* Assumes p has both a left and a right neighbour, as in a circular
       doubly linked list with a header node. */
    void delete(NODEPTR p, int *px)
    {
        NODEPTR q, r;
        if (p == NULL) {
            printf("Void Deletion\n");
            return;
        }
        *px = p->info;
        q = p->left;
        r = p->right;
        q->right = r;
        r->left = q;
        freenode(p);
        return;
    }

A node can be inserted on the right or on the left of a given node. Let us consider insertion at the right side of a given node. The routine insertright inserts a node with information field x to the right of node(p) in a doubly linked list.

    void insertright(NODEPTR p, int x)
    {
        NODEPTR q, r;
        if (p == NULL) {
            printf("Void Insertion\n");
            return;
        }
        q = getnode();
        q->info = x;
        r = p->right;
        r->left = q;
        q->right = r;
        q->left = p;
        p->right = q;    /* p's right link (not its left) must point to q */
        return;
    }
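The routines above rely on NODEPTR, getnode and freenode, which are defined elsewhere. A self-contained variant can be sketched as below, assuming a circular doubly linked list with a header node so that every node always has two neighbours; the names `dnode`, `dlist_create`, `insert_right` and `delete_node` are chosen for this example.

```c
#include <assert.h>
#include <stdlib.h>

struct dnode {
    int info;
    struct dnode *left, *right;
};

/* Create an empty circular list: a header node linked to itself, so
   every node always has a left and a right neighbour. */
struct dnode *dlist_create(void)
{
    struct dnode *h = malloc(sizeof *h);
    if (h != NULL) {
        h->info = 0;
        h->left = h->right = h;
    }
    return h;
}

/* Insert x to the right of node p; returns the new node or NULL. */
struct dnode *insert_right(struct dnode *p, int x)
{
    assert(p != NULL);
    struct dnode *q = malloc(sizeof *q);
    if (q == NULL)
        return NULL;
    q->info = x;
    q->left = p;
    q->right = p->right;
    p->right->left = q;
    p->right = q;
    return q;
}

/* Unlink and free node p (which must not be a lone header);
   returns the contents of the deleted node. */
int delete_node(struct dnode *p)
{
    assert(p != NULL && p->right != p);
    int x = p->info;
    p->left->right = p->right;
    p->right->left = p->left;
    free(p);
    return x;
}
```

Both operations touch a fixed number of links and need no traversal, which is the advantage over a singly linked list described above.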
4.8. RECOMMENDED QUESTIONS
1. Define linked lists.
2. List and explain the basic operations carried out in a linked list.
3. List any two applications of linked lists and any two advantages of a doubly linked list over a singly linked list.
4. Write a C program to simulate an ordinary queue using a singly linked list.
5. Give an algorithm to insert a node at a specified position for a given singly linked list.
6. Write a C program to perform the following operations on a doubly linked list: i) To create a list by adding each node at the front. ii) To display all the elements in the reverse order.

UNIT – 5 : TREES – 1
5.1 Introduction: A tree is a finite set of one or more nodes such that: (i) there is a specially designated node called the root; (ii) the remaining nodes are partitioned into n ≥ 0 disjoint sets T1, ...,Tn where each of these sets is a tree. T1, ...,Tn are called the subtrees of the root. A tree structure means that the data is organized so that items of information are related by branches. One very common place where such a structure arises is in the investigation of genealogies.

    AbstractDataType tree {
        instances
            A set of elements:
            (1) empty or having a distinguished root element
            (2) each non-root element having exactly one parent element
        operations
            root()
            degree()
            child(k)
    }

Some basic terminology for trees:

Trees are formed from nodes and edges. Nodes are sometimes called vertices. Edges are sometimes called branches.
Nodes may have a number of properties including value and label.
Edges are used to relate nodes to each other. In a tree, this relation is called "parenthood."
An edge (a, b) between nodes a and b establishes a as the parent of b. Also, b is called a child of a.
Although edges are usually drawn as simple lines, they are really directed from parent to child. In tree drawings, this is top-to-bottom.

Informal Definition: a tree is a collection of nodes, one of which is distinguished as "root," along with a relation ("parenthood") that is shown by edges.

Formal Definition: This definition is "recursive" in that it defines tree in terms of itself. The definition is also "constructive" in that it describes how to construct a tree.
1. A single node is a tree. It is "root."
2. Suppose N is a node and T1, T2, ..., Tk are trees with roots n1, n2, ..., nk, respectively. We can construct a new tree T by making N the parent of the nodes n1, n2, ..., nk. Then, N is the root of T and T1, T2, ..., Tk are subtrees.
The tree T, constructed using k subtrees
More terminology

A node is either internal or it is a leaf. A leaf is a node that has no children.
Every node in a tree (except root) has exactly one parent.
The degree of a node is the number of children it has.
The degree of a tree is the maximum degree of all of its nodes.

Paths and Levels

Definition: A path is a sequence of nodes n1, n2, ..., nk such that node ni is the parent of node ni+1 for all 1 <= i < k.
Definition: The length of a path is the number of edges on the path (one less than the number of nodes).
Definition: The descendants of a node are all the nodes that are on some path from the node to any leaf.
Definition: The ancestors of a node are all the nodes that are on the path from the node to the root.
Definition: The depth of a node is the length of the path from root to the node. The depth of a node is sometimes called its level.
Definition: The height of a node is the length of the longest path from the node to a leaf.
Definition: The height of a tree is the height of its root.
A general tree, showing node depths (levels)
In the example above:

The nodes Y, Z, U, V, and W are leaf nodes.
The nodes R, S, T, and X are internal nodes.
The degree of node T is 3. The degree of node S is 1.
The depth of node X is 2. The depth of node Z is 3.
The height of node Z is zero. The height of node S is 2. The height of node R is 3.
The height of the tree is the same as the height of its root R. Therefore the height of the tree is 3.
The sequence of nodes R,S,X is a path.
The sequence of nodes R,X,Y is not a path because the sequence does not satisfy the "parenthood" property (R is not the parent of X).
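The definitions of degree, depth and height can be checked with a small C sketch. It stores a general tree in left-child / right-sibling form with an extra parent pointer; the structure and function names are assumptions of this example, and the test tree is one reconstruction that agrees with every fact listed above (R has children S and T, S has child X, X has children Y and Z, T has children U, V and W).

```c
#include <assert.h>
#include <stddef.h>

/* General tree node in left-child / right-sibling form. */
struct tnode {
    char label;
    struct tnode *parent;
    struct tnode *first_child;
    struct tnode *next_sibling;
};

/* Make c the last child of p. */
void add_child(struct tnode *p, struct tnode *c)
{
    assert(p != NULL && c != NULL);
    c->parent = p;
    c->next_sibling = NULL;
    if (p->first_child == NULL) {
        p->first_child = c;
    } else {
        struct tnode *s = p->first_child;
        while (s->next_sibling != NULL)
            s = s->next_sibling;
        s->next_sibling = c;
    }
}

/* Degree: number of children. */
int degree(const struct tnode *n)
{
    int d = 0;
    for (const struct tnode *c = n->first_child; c != NULL; c = c->next_sibling)
        d++;
    return d;
}

/* Depth: number of edges on the path from the root down to n. */
int depth(const struct tnode *n)
{
    int d = 0;
    for (; n->parent != NULL; n = n->parent)
        d++;
    return d;
}

/* Height: number of edges on the longest path from n down to a leaf. */
int height(const struct tnode *n)
{
    int h = 0;
    for (const struct tnode *c = n->first_child; c != NULL; c = c->next_sibling) {
        int hc = 1 + height(c);
        if (hc > h)
            h = hc;
    }
    return h;
}
```

Depth follows parent links upward, while height recurses downward over all children, which mirrors the two definitions.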
Representation of trees
1. List representation
2. Left child, right sibling representation
3. Representation as a degree two tree
5.2 Binary Trees:
Definition: A binary tree is a tree in which each node has degree of exactly 2 and the children of each node are distinguished as "left" and "right." Some of the children of a node may be empty.

Formal Definition: A binary tree is:
1. either empty, or
2. it is a node that has a left and a right subtree, each of which is a binary tree.

Definition: A full binary tree (FBT) is a binary tree in which each node has exactly 2 non-empty children or exactly two empty children, and all the leaves are on the same level. (Note that this definition differs from the text definition.)

Definition: A complete binary tree (CBT) is a FBT except, perhaps, that the deepest level may not be completely filled. If not completely filled, it is filled from left-to-right. A FBT is a CBT, but not vice-versa.

Examples of Binary Trees

A Binary Tree. As usual, the empty children are not explicitly shown.

A Full Binary Tree. In a FBT, the number of nodes at level i is 2^i (levels numbered from 0).

A Complete Binary Tree.

Not a Complete Binary Tree. The tree above is not a CBT because the deepest level is not filled from left-to-right.

Properties

Full tree: A tree with all the leaves at the same level, and all the non-leaves having the same degree.
Complete tree: A full tree in which the 'last' elements are deleted.
Level h of a full tree of degree d has d^(h-1) nodes (levels numbered from 1).
The first h levels of a full tree have 1 + d + d^2 + ... + d^(h-1) = (d^h - 1)/(d - 1) nodes.
A tree of height h and degree d has at most d^h - 1 elements.

Representations

1. Nodes consisting of a data field and k pointers.
2. Nodes consisting of a data field and two pointers: a pointer to the first child, and a pointer to the next sibling.
3. A tree of degree k assumes an array for holding a complete tree of degree k, with empty cells assigned for missing elements.
5.3 Binary Tree Traversals: There are many operations that we often want to perform on trees. One notion that arises frequently is the idea of traversing a tree or visiting each node in the tree exactly once. A full traversal produces a linear order for the information in a tree. This linear order may be familiar and useful. When traversing a binary tree we want to treat each node and its subtrees in the same fashion. If we let L, D, R stand for moving left, printing the data, and moving right when at a node then there are six possible combinations of traversal: LDR, LRD, DLR, DRL, RDL, and RLD. If we adopt the convention that we traverse left before right then only three traversals remain: LDR, LRD and DLR. To these we assign the names inorder, postorder and preorder because there is a natural correspondence between these traversals and producing the infix, postfix and prefix forms of an expression.

In the procedures below, degree() and child(i) are applied to the current node x.

Level order

    x := root()
    if( x ) queue(x)
    while( queue not empty ){
        x := dequeue()
        visit(x)
        i := 1
        while( i <= degree() ){
            queue( child(i) )
            i := i + 1
        }
    }

Preorder

    procedure preorder(x){
        visit(x)
        i := 1
        while( i <= degree() ){
            preorder( child(i) )
            i := i + 1
        }
    }

Postorder

    procedure postorder(x){
        i := 1
        while( i <= degree() ){
            postorder( child(i) )
            i := i + 1
        }
        visit(x)
    }

Inorder

Meaningful just for binary trees.

    procedure inorder(x){
        if( left_child_for(x) ) { inorder( left_child(x) ) }
        visit(x)
        if( right_child_for(x) ) { inorder( right_child(x) ) }
    }

Usages for 'visit': determine the height, count the number of elements.
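For a binary tree, the four traversals can be sketched in C as below. In this illustration "visiting" a node simply appends its label to an output buffer; the names `bnode`, `visit` and `MAX_NODES`, and the fixed-size array used as the level-order queue, are assumptions of the example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 64

struct bnode {
    char data;
    struct bnode *left, *right;
};

/* Visiting a node here means recording its label in out[]. */
static void visit(const struct bnode *n, char *out, size_t *len)
{
    out[(*len)++] = n->data;
}

void inorder(const struct bnode *n, char *out, size_t *len)    /* L D R */
{
    if (n == NULL)
        return;
    inorder(n->left, out, len);
    visit(n, out, len);
    inorder(n->right, out, len);
}

void preorder(const struct bnode *n, char *out, size_t *len)   /* D L R */
{
    if (n == NULL)
        return;
    visit(n, out, len);
    preorder(n->left, out, len);
    preorder(n->right, out, len);
}

void postorder(const struct bnode *n, char *out, size_t *len)  /* L R D */
{
    if (n == NULL)
        return;
    postorder(n->left, out, len);
    postorder(n->right, out, len);
    visit(n, out, len);
}

/* Level order: an array serves as the queue (at most MAX_NODES nodes). */
void levelorder(const struct bnode *root, char *out, size_t *len)
{
    const struct bnode *queue[MAX_NODES];
    size_t head = 0, tail = 0;
    if (root != NULL)
        queue[tail++] = root;
    while (head < tail) {
        const struct bnode *n = queue[head++];
        visit(n, out, len);
        if (n->left != NULL) {
            assert(tail < MAX_NODES);
            queue[tail++] = n->left;
        }
        if (n->right != NULL) {
            assert(tail < MAX_NODES);
            queue[tail++] = n->right;
        }
    }
}
```

The three recursive traversals differ only in where the call to `visit` sits relative to the two recursive calls.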
5.4. Threaded Binary Trees: If we look carefully at the linked representation of any binary tree, we notice that there are more null links than actual pointers. As we saw before, there are n + 1 null links and 2n total links. A clever way to make use of these null links has been devised by A. J. Perlis and C. Thornton. Their idea is to replace the null links by pointers, called threads, to other nodes in the tree. If the RCHILD(P) is normally equal to zero, we will replace it by a pointer to the node which would be printed after P when traversing the tree in inorder. A null LCHILD link at node P is replaced by a pointer to the node which immediately precedes node P in inorder. The tree T has 9 nodes and 10 null links which have been replaced by threads. If we traverse T in inorder the nodes will be visited in the order H D I B E A F C G. For example node E has a predecessor thread which points to B and a successor thread which points to A. In the memory representation we must be able to distinguish between threads and normal pointers. This is done by adding two extra one bit fields LBIT and RBIT.

    LBIT(P) = 1    if LCHILD(P) is a normal pointer
    LBIT(P) = 0    if LCHILD(P) is a thread
    RBIT(P) = 1    if RCHILD(P) is a normal pointer
    RBIT(P) = 0    if RCHILD(P) is a thread
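With threads in place, the inorder successor of a node can be found without a stack. The C sketch below is an illustration under two stated assumptions: the two dangling threads (of the first and last node in inorder) are stored as NULL instead of pointing to a head node, and the names `tbnode`, `insucc` and `first_inorder` are chosen for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct tbnode {
    char data;
    struct tbnode *lchild, *rchild;
    bool lbit, rbit;    /* true: normal pointer, false: thread */
};

/* Inorder successor of p: follow the right thread, or go right once and
   then left as far as normal pointers lead.
   Returns NULL after the last node (its right thread is NULL here). */
struct tbnode *insucc(const struct tbnode *p)
{
    struct tbnode *s = p->rchild;
    if (p->rbit) {
        assert(s != NULL);      /* a normal pointer always has a target */
        while (s->lbit)
            s = s->lchild;
    }
    return s;
}

/* Leftmost node of the tree: the first node in inorder. */
struct tbnode *first_inorder(struct tbnode *root)
{
    if (root == NULL)
        return NULL;
    while (root->lbit)
        root = root->lchild;
    return root;
}
```

Starting from `first_inorder(root)` and applying `insucc` repeatedly visits every node in inorder using only the links already stored in the tree.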
5.5. Heaps: A heap is a complete tree with an ordering-relation R holding between each node and its descendant.
Examples for R: smaller-than, bigger-than.
Assumption: In what follows, R is the relation 'bigger-than', and the trees have degree 2.

Heap

Not a heap

Adding an Element

1. Add a node to the tree.
2. Move the elements in the path from the root to the new node one position down, if they are smaller than the new element.
(Figure: the new element and the modified tree)
3. Insert the new element to the vacant node.
4. A complete tree of n nodes has depth of about log n, hence the time complexity is O(log n).

Deleting an Element

1. Delete the value from the root node, and delete the last node while saving its value.
(Figure: the heap before and after)
2. As long as the saved value is smaller than a child of the vacant node, move up into the vacant node the largest value of the children.
3. Insert the saved value into the vacant node.
4. The time complexity is O(log n).

Initialization: Brute Force

Given a sequence of n values e1, ..., en, repeatedly use the insertion module on the n given values.

Level h in a complete tree has at most 2^(h-1) = O(2^h) elements.
Levels 1, ..., h - 1 have 2^0 + 2^1 + ... + 2^(h-2) = O(2^h) elements.
Each element requires O(log n) time. Hence, brute force initialization requires O(n log n) time.
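The insertion and deletion steps can be sketched in C on an array that holds the complete tree level by level. This is a minimal illustration assuming a fixed capacity and the 'bigger-than' relation (a max-heap); the names `struct heap`, `heap_insert` and `heap_delete_max` are not from the notes.

```c
#include <assert.h>
#include <stddef.h>

#define HEAP_CAPACITY 64

/* Max-heap stored as a complete binary tree in an array:
   the children of slot i are slots 2i+1 and 2i+2. */
struct heap {
    int item[HEAP_CAPACITY];
    size_t n;               /* number of elements in use */
};

/* Add a value: open a vacancy at the end and slide smaller ancestors
   down into it until the value fits. O(log n). */
void heap_insert(struct heap *h, int value)
{
    assert(h->n < HEAP_CAPACITY);
    size_t i = h->n++;
    while (i > 0 && h->item[(i - 1) / 2] < value) {
        h->item[i] = h->item[(i - 1) / 2];
        i = (i - 1) / 2;
    }
    h->item[i] = value;
}

/* Remove and return the largest value: save the last element, then move
   the larger child up into the vacancy while it exceeds the saved value. */
int heap_delete_max(struct heap *h)
{
    assert(h->n > 0);
    int top = h->item[0];
    int saved = h->item[--h->n];
    size_t i = 0, child;
    while ((child = 2 * i + 1) < h->n) {
        if (child + 1 < h->n && h->item[child + 1] > h->item[child])
            child++;                    /* pick the larger child */
        if (h->item[child] <= saved)
            break;
        h->item[i] = h->item[child];
        i = child;
    }
    if (h->n > 0)
        h->item[i] = saved;
    return top;
}
```

Calling `heap_insert` once per value is exactly the brute force initialization described above, with O(n log n) total cost.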
Initialization: Efficient

Insert the n elements e1, ..., en into a complete tree.

For each node, starting from the last one and ending at the root, reorganize into a heap the subtree whose root node is given. The reorganization is performed by interchanging the new element with the child of greater value, until the new element is greater than its children.

The time complexity is

    O(0 * (n/2) + 1 * (n/4) + 2 * (n/8) + ... + (log n) * 1)
      = O(n (0 * 2^(-1) + 1 * 2^(-2) + 2 * 2^(-3) + ... + (log n) * 2^(-log n)))
      = O(n)

since the following equalities hold, with every sum taken over k = 1, 2, 3, ...:

    Σ (k - 1) 2^(-k) = 2 [Σ (k - 1) 2^(-k)] - [Σ (k - 1) 2^(-k)]
                     = [Σ k 2^(-k)] - [Σ (k - 1) 2^(-k)]
                     = Σ [k - (k - 1)] 2^(-k)
                     = Σ 2^(-k)
                     = 1

Applications:

Priority Queue: A dynamic set in which elements are deleted according to a given ordering-relation.

Heap Sort: Build a heap from the given set (O(n) time), then repeatedly remove the elements from the heap (O(n log n) time).
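The efficient initialization and heap sort can be sketched in C as follows. The sketch assumes a max-heap stored in a plain int array and starts the reorganization at the last internal node, since leaves are already one-element heaps; `sift_down`, `build_heap` and `heap_sort` are names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Restore the max-heap property for the subtree rooted at slot i of
   a[0..n-1], assuming both of its child subtrees are already heaps. */
static void sift_down(int *a, size_t n, size_t i)
{
    int value = a[i];
    size_t child;
    while ((child = 2 * i + 1) < n) {
        if (child + 1 < n && a[child + 1] > a[child])
            child++;                /* pick the child of greater value */
        if (a[child] <= value)
            break;
        a[i] = a[child];
        i = child;
    }
    a[i] = value;
}

/* Efficient initialization: fix subtrees from the last internal node
   back to the root. O(n) overall. */
void build_heap(int *a, size_t n)
{
    for (size_t i = n / 2; i-- > 0; )
        sift_down(a, n, i);
}

/* Heap sort: build the heap, then repeatedly swap the maximum to the end
   of the shrinking heap. Sorts a[] into ascending order in O(n log n). */
void heap_sort(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    build_heap(a, n);
    for (size_t end = n; end > 1; end--) {
        int tmp = a[0];
        a[0] = a[end - 1];
        a[end - 1] = tmp;
        sift_down(a, end - 1, 0);
    }
}
```

Sorting in place this way needs no storage beyond the array itself, because each removed maximum is parked in the slot the shrinking heap has just vacated.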
5.6. RECOMMENDED QUESTIONS 1. Constr uct a binar y tr ee f or : ((6+(3-2)*5)^2+3) 2. What is thr eaded binar y tr ee? Ex plain r ight in and lef t in thr eaded binar y tr ees. 3. Def ine a tr ee. Wr ite the pr ocedur e to conver t gener al tr ee to binar y tr ee. 4. Def ine degr ee of the node, leaves, r oot 5. Def ine inter nal nodes, par ent node, de pth and height of a tr ee 6. State the pr o per ties of a binar y tr ee 7. What is meant by binar y tr ee tr aver sal? What ar e the dif f er ent binar y tr ee tr aver sal techniques 8. State the mer its & demer it of linear r e pr esentation of binar y tr ees. 9. Def ine r ight-in thr eaded tr ee & lef t-in thr eaded tr ee DEPT OF CSE, SJBIT
Page 70
DATA STR UCTER S WITH C
10CS35
UNIT – 6 : TREES – 2, GRAPHS
Introduction
The first recorded evidence of the use of graphs dates back to 1736 when Euler used them to solve the now classical Koenigsberg bridge problem. Some of the applications of graphs are: analysis of electrical circuits, finding shortest routes, analysis of project planning, identification of chemical compounds, statistical mechanics, genetics, cybernetics, linguistics, social sciences, etc. Indeed, it might well be said that of all mathematical structures, graphs are the most widely used.
Figure 6.1 Section of the river Pregel in Koenigsberg and Euler's graph.
Definitions and Terminology
A graph, G, consists of two sets V and E. V is a finite non-empty set of vertices. E is a set of pairs of vertices; these pairs are called edges. V(G) and E(G) will represent the sets of vertices and edges of graph G. We will also write G = (V, E) to represent a graph. In an undirected graph the pair of vertices representing any edge is unordered. Thus, the pairs (v1, v2) and (v2, v1) represent the same edge. In a directed graph each edge is represented by a directed pair <v1, v2>; v1 is the tail and v2 the head of the edge. Therefore <v2, v1> and <v1, v2> represent two different edges. Figure 6.2 shows three graphs G1, G2 and G3.
Figure 6.2 Three sample graphs.
The graphs G1 and G2 are undirected. G3 is a directed graph.
V(G1) = {1,2,3,4}; E(G1) = {(1,2),(1,3),(1,4),(2,3),(2,4),(3,4)}
V(G2) = {1,2,3,4,5,6,7}; E(G2) = {(1,2),(1,3),(2,4),(2,5),(3,6),(3,7)}
V(G3) = {1,2,3}; E(G3) = {<1,2>, <2,1>, <2,3>}
Note that the edges of a directed graph are drawn with an arrow from the tail to the head. The graph G2 is also a tree while the graphs G1 and G3 are not. Trees can be defined as a special case of graphs. In addition, since E(G) is a set, a graph may not have multiple occurrences of the same edge. When this restriction is removed from a graph, the resulting data object is referred to as a multigraph. The data object of figure 6.3 is a multigraph which is not a graph.
The number of distinct unordered pairs (vi, vj) with vi ≠ vj in a graph with n vertices is n(n - 1)/2. This is the maximum number of edges in any n vertex undirected graph. An n vertex undirected graph with exactly n(n - 1)/2 edges is said to be complete. G1 is the complete graph on 4 vertices while G2 and G3 are not complete graphs. In the case of a directed graph on n vertices the maximum number of edges is n(n - 1).
If (v1, v2) is an edge in E(G), then we shall say the vertices v1 and v2 are adjacent and that the edge (v1, v2) is incident on vertices v1 and v2. The vertices adjacent to vertex 2 in G2 are 4, 5 and 1. The edges incident on vertex 3 in G2 are (1,3), (3,6) and (3,7). If <v1, v2> is a directed edge, then vertex v1 will be said to be adjacent to v2 while v2 is adjacent from v1. The edge <v1, v2> is incident to v1 and v2. In G3 the edges incident to vertex 2 are <1,2>, <2,1> and <2,3>.
Figure 6.3 Example of a multigraph that is not a graph.
A subgraph of G is a graph G' such that V(G') ⊆ V(G) and E(G') ⊆ E(G). Figure 6.4 shows some of the subgraphs of G1 and G3.
A path from vertex vp to vertex vq in graph G is a sequence of vertices vp, vi1, vi2, ..., vin, vq such that (vp,vi1), (vi1,vi2), ..., (vin,vq) are edges in E(G). If G is directed then the path consists of the edges <vp,vi1>, <vi1,vi2>, ..., <vin,vq> in E(G).
The length of a path is the number of edges on it. A simple path is a path in which all vertices except possibly the first and last are distinct. A path such as (1,2) (2,4) (4,3) we write as 1,2,4,3. Paths 1,2,4,3 and 1,2,4,2 are both of length 3 in G1. The first is a simple path while the second is not. 1,2,3 is a simple directed path in G3. 1,2,3,2 is not a path in G3 as the edge <3,2> is not in E(G3). A cycle is a simple path in which the first and last vertices are the same. 1,2,3,1 is a cycle in G1. 1,2,1 is a cycle in G3. For the case of directed graphs we normally add on the prefix "directed" to the terms cycle and path.
In an undirected graph, G, two vertices v1 and v2 are said to be connected if there is a path in G from v1 to v2 (since G is undirected, this means there must also be a path from v2 to v1). An undirected graph is said to be connected if for every pair of distinct vertices vi, vj in V(G) there is a path from vi to vj in G. Graphs G1 and G2 are connected while G4 of figure 6.5 is not. A connected component, or simply a component, of an undirected graph is a maximal connected subgraph. G4 has two components H1 and H2 (see figure 6.5). A tree is a connected acyclic (i.e., has no cycles) graph.
A directed graph G is said to be strongly connected if for every pair of distinct vertices vi, vj in V(G) there is a directed path from vi to vj and also from vj to vi. The graph G3 is not strongly connected as there is no path from v3 to v2. A strongly connected component is a maximal subgraph that is strongly connected. G3 has two strongly connected components.
Figure 6.4 (a) Some of the subgraphs of G1 and (b) some of the subgraphs of G3.
Figure 6.5 A graph with two connected components.
Figure 6.6 Strongly connected components of G3.
The degree of a vertex is the number of edges incident to that vertex. The degree of vertex 1 in G1 is 3. In case G is a directed graph, we define the in-degree of a vertex v to be the number of edges for which v is the head. The out-degree is defined to be the number of edges for which v is the tail. Vertex 2 of G3 has in-degree 1, out-degree 2 and degree 3. If $d_i$ is the degree of vertex i in a graph G with n vertices and e edges, then it is easy to see that $e = \frac{1}{2}\sum_{i=1}^{n} d_i$.
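The degree definitions above can be checked on the sample graphs with a short C sketch. The function name count_degrees and the edge-list layout (each row holds the tail and head of one edge, vertices numbered from 1) are assumptions made for this illustration. For an undirected graph listed once per edge, the degree of a vertex is the sum of the two counters.

```c
/* Count in-degree and out-degree for vertices 1..n from a list of e edges.
   edges[k][0] is the tail and edges[k][1] the head of edge k.
   in_deg and out_deg must have room for n + 1 entries (index 0 is unused). */
void count_degrees(int n, int e, int edges[][2], int in_deg[], int out_deg[])
{
    for (int v = 0; v <= n; v++)
        in_deg[v] = out_deg[v] = 0;
    for (int k = 0; k < e; k++) {
        out_deg[edges[k][0]]++;   /* the edge leaves its tail */
        in_deg[edges[k][1]]++;    /* the edge enters its head */
    }
}
```

Since every edge contributes to exactly two counters, summing in_deg[v] + out_deg[v] over all vertices always gives 2e, which is the identity stated above.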
6.1 Binary Search Trees
Characteristics
Trees in which the key of an internal node is greater than the keys in its left subtree and is smaller than the keys in its right subtree.
Search
search(tree, key)
    IF empty tree THEN return not-found
    IF key == value in root THEN return found
    IF key < value in root THEN return search(left-subtree, key)
    ELSE return search(right-subtree, key)
Time: O(depth of tree)
Insertion
[Figure: inserting the keys 6 and 10 into a binary search tree.]
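Search and insertion in a binary search tree can be sketched in C as follows. This is an illustrative sketch, not code from these notes: the type bst_node and the functions bst_search and bst_insert are assumed names, keys are plain integers, and duplicate keys are ignored.

```c
#include <stdlib.h>

typedef struct bst_node {
    int key;
    struct bst_node *left, *right;
} bst_node;

/* Return the node holding key, or NULL if the key is absent.
   The loop follows one root-to-leaf path: O(depth of tree). */
bst_node *bst_search(bst_node *root, int key)
{
    while (root != NULL && root->key != key)
        root = (key < root->key) ? root->left : root->right;
    return root;
}

/* Insert key below *root. Returns 1 if inserted, 0 if the key was already
   present, and -1 if memory could not be allocated. */
int bst_insert(bst_node **root, int key)
{
    while (*root != NULL) {
        if (key == (*root)->key)
            return 0;
        root = (key < (*root)->key) ? &(*root)->left : &(*root)->right;
    }
    bst_node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->key = key;
    n->left = n->right = NULL;
    *root = n;                    /* hang the new leaf on the empty link */
    return 1;
}
```

Passing a pointer to the link (bst_node **) lets the same code handle the empty tree and every other case without a special branch for the root.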
Deletion
The way the deletion is made depends on the type of node holding the key.
Node of degree 0: Delete the node.
Node of degree 1: Delete the node, while connecting its predecessor (parent) to its successor (only child).
Node of degree 2: Replace the node containing the deleted key with the node having the largest key in the left subtree, or with the node having the smallest key in the right subtree.
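The three deletion cases can be sketched in C as below. The names bst_node, bst_new and bst_delete are assumptions for this sketch (the node type is repeated so that the block compiles on its own), and for a node of degree 2 the sketch picks the smallest key of the right subtree, which is one of the two options named above.

```c
#include <stdlib.h>

typedef struct bst_node {
    int key;
    struct bst_node *left, *right;
} bst_node;

/* Helper for building small trees by hand; returns NULL if malloc fails. */
bst_node *bst_new(int key, bst_node *left, bst_node *right)
{
    bst_node *n = malloc(sizeof *n);
    if (n != NULL) { n->key = key; n->left = left; n->right = right; }
    return n;
}

/* Remove key from the tree rooted at *root.
   Returns 1 if a node was deleted and 0 if the key was not found. */
int bst_delete(bst_node **root, int key)
{
    /* Locate the link that points at the node holding key. */
    while (*root != NULL && (*root)->key != key)
        root = (key < (*root)->key) ? &(*root)->left : &(*root)->right;
    if (*root == NULL)
        return 0;

    bst_node *victim = *root;
    if (victim->left != NULL && victim->right != NULL) {
        /* Degree 2: copy up the smallest key of the right subtree, then
           delete the node that held it (it has no left child). */
        bst_node **succ = &victim->right;
        while ((*succ)->left != NULL)
            succ = &(*succ)->left;
        victim->key = (*succ)->key;
        root = succ;
        victim = *succ;
    }
    /* Degree 0 or 1: connect the parent link to the only child (or NULL). */
    *root = (victim->left != NULL) ? victim->left : victim->right;
    free(victim);
    return 1;
}
```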
6.2. Selection Trees
A selection tree is a complete binary tree in which the leaf nodes hold a set of keys, and each internal node holds the "winner" key among its children.
Modifying a Key
It takes O(log n) time to modify a selection tree in response to a change of a key in a leaf.
Initialization
The construction of a selection tree from scratch takes O(n) time by traversing it level-wise from the bottom up.
Application: External Sort
Given a set of n values (n = 16)
16 9 10 8 6 11 12 1 4 7 14 13 2 15 5 3
divide it into M chunks (M = 4),
16 9 10 8
6 11 12 1
4 7 14 13
2 15 5 3
internally sort each chunk,
8 9 10 16
1 6 11 12
4 7 13 14
2 3 5 15
construct a complete binary tree of M leaves with the chunks attached to the leaves.
Convert the tree into a selection tree with the keys being fed to the leaves from the chunks.
Remove the winner from the tree.
Feed to the empty leaf the next value from its corresponding chunk.
Adjust the selection tree to the change in the leaf.
Repeat the deletion subprocess until all the values are consumed.
The algorithm takes O(n log(n/M)) time to internally sort the elements of the chunks (M chunks of n/M elements each), O(M) to initialize the selection tree, and O(n log M) to perform the selection phase. For M « n the total time complexity is O(n log n). To reduce I/O operations, inputs from the chunks to the selection tree should go through buffers.
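The merge phase of the external sort can be sketched with a small winner tree in C. Everything below is an assumption made for illustration: the names NUM_CHUNKS, CHUNK_SIZE, current_key, play and selection_tree_merge, the choice of the smaller key as the winner, the restriction to a power-of-two number of chunks held in memory, and the use of INT_MAX as the key of an exhausted chunk (so real keys must be smaller than INT_MAX).

```c
#include <limits.h>

#define NUM_CHUNKS 4   /* M: number of sorted chunks (a power of two here) */
#define CHUNK_SIZE 4   /* elements per chunk */

/* Current front key of chunk r, or INT_MAX once the chunk is used up. */
int current_key(int chunks[][CHUNK_SIZE], const int pos[], int r)
{
    return pos[r] < CHUNK_SIZE ? chunks[r][pos[r]] : INT_MAX;
}

/* Replay the match at internal node i: the child with the smaller key wins.
   tree[i] stores the index of the winning chunk. */
void play(int tree[], int chunks[][CHUNK_SIZE], const int pos[], int i)
{
    int a = tree[2 * i], b = tree[2 * i + 1];
    tree[i] = current_key(chunks, pos, a) <= current_key(chunks, pos, b) ? a : b;
}

/* Merge the sorted chunks into out[] using a selection (winner) tree. */
void selection_tree_merge(int chunks[][CHUNK_SIZE], int out[])
{
    int tree[2 * NUM_CHUNKS];          /* tree[1] is the root, tree[0] unused */
    int pos[NUM_CHUNKS] = {0};         /* read position inside each chunk */

    for (int r = 0; r < NUM_CHUNKS; r++)
        tree[NUM_CHUNKS + r] = r;      /* leaf r feeds from chunk r */
    for (int i = NUM_CHUNKS - 1; i >= 1; i--)
        play(tree, chunks, pos, i);    /* O(M) bottom-up initialization */

    for (int k = 0; k < NUM_CHUNKS * CHUNK_SIZE; k++) {
        int w = tree[1];               /* overall winner */
        out[k] = chunks[w][pos[w]];
        pos[w]++;                      /* feed the next value to that leaf */
        for (int i = (NUM_CHUNKS + w) / 2; i >= 1; i /= 2)
            play(tree, chunks, pos, i);   /* O(log M) adjustment */
    }
}
```

In a real external sort the chunks would be read through buffers from disk; the in-memory arrays here only stand in for those buffers.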
6.3 Forests
The default interdomain trust relationships are created by the system during domain controller creation. The number of trust relationships that are required to connect n domains is n – 1, whether the domains are linked in a single, contiguous parent-child hierarchy or they constitute two or more separate contiguous parent-child hierarchies. When it is necessary for domains in the same organization to have different namespaces, create a separate tree for each namespace. In Windows 2000, the roots of trees are linked automatically by two-way, transitive trust relationships. Trees linked by trust relationships form a forest. A single tree that is related to no other trees constitutes a forest of one tree. The tree structures for the entire Windows 2000 forest are stored in Active Directory in the form of parent-child and tree-root relationships. These relationships are stored as trust account objects (class trustedDomain) in the System container within a specific domain directory partition. For each domain in a forest, information about its connection to a parent domain (or, in the case of a tree root, to another tree root domain) is added to the configuration data that is replicated to every domain in the forest. Therefore, every domain controller in the forest has knowledge of the tree structure for the entire forest, including knowledge of the links between trees. You can view the tree structure in Active Directory Domain Tree Manager.
6.4 Representation of Disjoint Sets
Set
In computer science, a set is an abstract data structure that can store certain values, without any particular order, and no repeated values. It is a computer implementation of the mathematical concept of a finite set. Some set data structures are designed for static sets that do not change with time, and allow only query operations, such as checking whether a given value is in the set, or enumerating the values in some arbitrary order. Other variants, called dynamic or mutable sets, also allow the insertion and/or deletion of elements from the set.
A set can be implemented in many ways. For example, one can use a list, ignoring the order of the elements and taking care to avoid repeated values. Sets are often implemented using various flavors of trees, tries, hash tables, and more. A set can be seen, and implemented, as a (partial) associative array, in which the value of each key-value pair has the unit type. In type theory, sets are generally identified with their indicator function: accordingly, a set of values of type $A$ may be denoted by $2^{A}$ or $\mathcal{P}(A)$. (Subtypes and subsets may be modeled by refinement types, and quotient sets may be replaced by setoids.)
Operations
Typical operations that may be provided by a static set structure S are:
• element_of(x, S): checks whether the value x is in the set S.
• empty(S): checks whether the set S is empty.
• size(S): returns the number of elements in S.
• enumerate(S): yields the elements of S in some arbitrary order.
• pick(S): returns an arbitrary element of S.
• build(x1, x2, …, xn): creates a set structure with values x1, x2, …, xn.
The enumerate operation may return a list of all the elements, or an iterator, a procedure object that returns one more value of S at each call.
Dynamic set structures typically add:
• create(n): creates a new set structure, initially empty but capable of holding up to n elements.
• add(S, x): adds the element x to S, if it is not there already.
• delete(S, x): removes the element x from S, if it is there.
• capacity(S): returns the maximum number of values that S can hold.
Some set structures may allow only some of these operations. The cost of each operation will depend on the implementation, and possibly also on the particular values stored in the set, and the order in which they are inserted.
There are many other operations that can (in principle) be defined in terms of the above, such as:
• pop(S): returns an arbitrary element of S, deleting it from S.
• find(S, P): returns an element of S that satisfies a given predicate P.
• clear(S): deletes all elements of S.
In particular, one may define the Boolean operations of set theory:
• union(S, T): returns the union of sets S and T.
• intersection(S, T): returns the intersection of sets S and T.
• difference(S, T): returns the difference of sets S and T.
• subset(S, T): a predicate that tests whether the set S is a subset of set T.
Other operations can be defined for sets with elements of a special type:
• sum(S): returns the sum of all elements of S (for some definition of "sum").
• nearest(S, x): returns the element of S that is closest in value to x (by some criterion).
In theory, many other abstract data structures can be viewed as set structures with additional operations and/or additional axioms imposed on the standard operations. For example, an abstract heap can be viewed as a set structure with a min(S) operation that returns the element of smallest value.
Implementations
Sets can be implemented using various data structures, which provide different time and space trade-offs for various operations. Some implementations are designed to improve the efficiency of very specialized operations, such as nearest or union. Implementations described as "general use" typically strive to optimize the element_of, add, and delete operations. Sets are commonly implemented in the same way as associative arrays, namely, a self-balancing binary search tree for sorted sets (which has O(log n) for most operations), or a hash table for unsorted sets (which has O(1) average-case, but O(n) worst-case, for most operations). A sorted linear hash table[1] may be used to provide deterministically ordered sets. Other popular methods include arrays. In particular, a subset of the integers 1..n can be implemented efficiently as an n-bit bit array, which also supports very efficient union and intersection operations. A Bloom map implements a set probabilistically, using a very compact representation but risking a small chance of false positives on queries.
The Boolean set operations can be implemented in terms of more elementary operations (pop, clear, and add), but specialized algorithms may yield lower asymptotic time bounds. If sets are implemented as sorted lists, for example, the naive algorithm for union(S, T) will take time proportional to the length m of S times the length n of T, whereas a variant of the list merging algorithm will do the job in time proportional to m+n. Moreover, there are specialized set data structures (such as the union-find data structure) that are optimized for one or more of these operations, at the expense of others.
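The text names the union-find structure for disjoint sets without showing it. The following C sketch shows one common array-based form, with union by size and path compression. It is not taken from these notes: the type disjoint_sets, the functions ds_init, ds_find and ds_union, the limit MAX_ELEMS, and the convention that elements are the integers 0..n-1 are all assumptions of this sketch.

```c
#define MAX_ELEMS 100

/* parent[i] <  0 : i is the root of its set and -parent[i] is the set size
   parent[i] >= 0 : parent[i] is the parent of i in the set's tree */
typedef struct {
    int n;
    int parent[MAX_ELEMS];
} disjoint_sets;

/* Start with n singleton sets {0}, {1}, ..., {n-1}; n must not exceed MAX_ELEMS. */
void ds_init(disjoint_sets *ds, int n)
{
    ds->n = n;
    for (int i = 0; i < n; i++)
        ds->parent[i] = -1;
}

/* Return the root that identifies the set containing x. */
int ds_find(disjoint_sets *ds, int x)
{
    int root = x;
    while (ds->parent[root] >= 0)
        root = ds->parent[root];
    while (x != root) {               /* path compression */
        int next = ds->parent[x];
        ds->parent[x] = root;
        x = next;
    }
    return root;
}

/* Merge the sets containing a and b. Returns 1 if two sets were merged and
   0 if a and b were already in the same set. */
int ds_union(disjoint_sets *ds, int a, int b)
{
    int ra = ds_find(ds, a), rb = ds_find(ds, b);
    if (ra == rb)
        return 0;
    if (ds->parent[ra] > ds->parent[rb]) {   /* make ra the larger set */
        int t = ra; ra = rb; rb = t;
    }
    ds->parent[ra] += ds->parent[rb];        /* sizes add up */
    ds->parent[rb] = ra;                     /* smaller tree hangs below */
    return 1;
}
```

Two elements belong to the same set exactly when ds_find returns the same root for both.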
6.5 Counting Binary Trees:
Definition: A binary tree has a special vertex called its root. From this vertex at the top, the rest of the tree is drawn downward. Each vertex may have a left child and/or a right child.
Example. The number of binary trees with 1, 2, 3 vertices is 1, 2 and 5 respectively. [Figure: the binary trees with 1, 2 and 3 vertices.]
Example. The number of binary trees with 4 vertices is 14. [Figure: the binary trees with 4 vertices.]
Conjecture: The number of binary trees on n vertices is $b_n = \frac{1}{n+1}\binom{2n}{n}$.
Proof: Every binary tree either:
• Has no vertices ($x^0$), or
• Breaks down as one root vertex ($x$) along with two binary trees beneath ($B(x)^2$).
Therefore, the generating function for binary trees satisfies $B(x) = 1 + xB(x)^2$. We conclude $b_n = \frac{1}{n+1}\binom{2n}{n}$.
Another way: Find a recurrence for $b_n$. Note: $b_4 = b_0b_3 + b_1b_2 + b_2b_1 + b_3b_0$. In general, $b_n = \sum_{i=0}^{n-1} b_i b_{n-1-i}$. Therefore, $B(x)$ equals

$$1 + \sum_{n\ge 1}\left(\sum_{i=0}^{n-1} b_i b_{n-1-i}\right)x^n = 1 + x\sum_{n\ge 1}\left(\sum_{i=0}^{n-1} b_i b_{n-1-i}\right)x^{n-1} = 1 + x\sum_{k\ge 0}\left(\sum_{i=0}^{k} b_i b_{k-i}\right)x^k = 1 + x\left(\sum_{k\ge 0} b_k x^k\right)\left(\sum_{k\ge 0} b_k x^k\right) = 1 + xB(x)^2.$$
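The recurrence and the closed form can be compared numerically with a short C sketch. The function names count_binary_trees and closed_form are assumptions for this illustration, and both functions are only intended for small n (up to about 30) so that the 64-bit arithmetic does not overflow.

```c
/* Fill b[0..n] with the number of binary trees on 0..n vertices using
   b_0 = 1 and b_m = sum over i = 0..m-1 of b_i * b_(m-1-i). */
void count_binary_trees(int n, unsigned long long b[])
{
    b[0] = 1;
    for (int m = 1; m <= n; m++) {
        b[m] = 0;
        for (int i = 0; i < m; i++)
            b[m] += b[i] * b[m - 1 - i];   /* i vertices left, m-1-i right */
    }
}

/* Closed form b_n = C(2n, n) / (n + 1). The binomial coefficient is built
   up as C(n+k, k) for k = 1..n, and every intermediate division is exact. */
unsigned long long closed_form(int n)
{
    unsigned long long c = 1;
    for (int k = 1; k <= n; k++)
        c = c * (unsigned long long)(n + k) / (unsigned long long)k;
    return c / (unsigned long long)(n + 1);
}
```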
6.6 The Graph Abstract Data Type:
In computer science, a graph is an abstract data type that is meant to implement the graph and hypergraph concepts from mathematics. A graph data structure consists of a finite (and possibly mutable) set of ordered pairs, called edges or arcs, of certain entities called nodes or vertices. As in mathematics, an edge (x, y) is said to point or go from x to y. The nodes may be part of the graph structure, or may be external entities represented by integer indices or references. A graph data structure may also associate to each edge some edge value, such as a symbolic label or a numeric attribute (cost, capacity, length, etc.).
Algorithms
Graph algorithms are a significant field of interest within computer science. Typical higher-level operations associated with graphs are: finding a path between two nodes, like depth-first search and breadth-first search, and finding the shortest path from one node to another, like Dijkstra's algorithm. A solution to finding the shortest path from each node to every other node also exists in the form of the Floyd–Warshall algorithm. A directed graph can be seen as a flow network, where each edge has a capacity and each edge receives a flow. The Ford–Fulkerson algorithm is used to find the maximum flow from a source to a sink in a graph.
Operations
The basic operations provided by a graph data structure G usually include:
• adjacent(G, x, y): tests whether there is an edge from node x to node y.
• neighbors(G, x): lists all nodes y such that there is an edge from x to y.
• add(G, x, y): adds to G the edge from x to y, if it is not there.
• delete(G, x, y): removes the edge from x to y, if it is there.
• get_node_value(G, x): returns the value associated with the node x.
• set_node_value(G, x, a): sets the value associated with the node x to a.
Structures that associate values to the edges usually also provide:
• get_edge_value(G, x, y): returns the value associated to the edge (x, y).
• set_edge_value(G, x, y, v): sets the value associated to the edge (x, y) to v.
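The basic operations listed above can be sketched in C on top of an adjacency matrix. This is only one possible layout and is not prescribed by the notes: the type graph, the limit MAX_NODES and the function names (add_edge and delete_edge stand in for add and delete) are assumptions, and nodes are identified by integers in the range 0..n-1.

```c
#include <string.h>

#define MAX_NODES 16

typedef struct {
    int n;                                     /* nodes are 0..n-1 */
    unsigned char edge[MAX_NODES][MAX_NODES];  /* edge[x][y] != 0: edge x -> y */
    int node_value[MAX_NODES];
} graph;

/* Create a graph with n nodes (n <= MAX_NODES) and no edges. */
void graph_init(graph *g, int n)
{
    memset(g, 0, sizeof *g);
    g->n = n;
}

int adjacent(const graph *g, int x, int y) { return g->edge[x][y] != 0; }
void add_edge(graph *g, int x, int y)      { g->edge[x][y] = 1; }
void delete_edge(graph *g, int x, int y)   { g->edge[x][y] = 0; }

/* Store in out[] every y with an edge x -> y; returns how many were found. */
int neighbors(const graph *g, int x, int out[])
{
    int count = 0;
    for (int y = 0; y < g->n; y++)
        if (g->edge[x][y])
            out[count++] = y;
    return count;
}

int get_node_value(const graph *g, int x)    { return g->node_value[x]; }
void set_node_value(graph *g, int x, int a)  { g->node_value[x] = a; }
```

With this layout adjacent, add_edge and delete_edge take constant time, while neighbors scans a full row of the matrix.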
Representations
Different data structures for the representation of graphs are used in practice:
Adjacency list – Vertices are stored as records or objects, and every vertex stores a list of adjacent vertices. This data structure allows the storage of additional data on the vertices.
Incidence list – Vertices and edges are stored as records or objects. Each vertex stores its incident edges, and each edge stores its incident vertices. This data structure allows the storage of additional data on vertices and edges.
Adjacency matrix – A two-dimensional matrix, in which the rows represent source vertices and columns represent destination vertices. Data on edges and vertices must be stored externally. Only the cost for one edge can be stored between each pair of vertices.
Incidence matrix – A two-dimensional Boolean matrix, in which the rows represent the vertices and columns represent the edges. The entries indicate whether the vertex at a row is incident to the edge at a column.
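As an illustration of the adjacency list representation, the C sketch below keeps one singly linked list of neighbours per vertex. The names adj_node, adj_graph, adj_init, adj_add_edge, adj_has_edge, adj_degree and adj_free, and the fixed limit MAX_VERTICES, are assumptions of this sketch. An undirected edge is stored by adding it in both directions.

```c
#include <stdlib.h>

#define MAX_VERTICES 16

typedef struct adj_node {
    int vertex;                 /* the adjacent vertex */
    struct adj_node *next;
} adj_node;

typedef struct {
    int n;                          /* valid vertex indices are 0..n-1 */
    adj_node *head[MAX_VERTICES];   /* head[v]: list of vertices adjacent to v */
} adj_graph;

void adj_init(adj_graph *g, int n)
{
    g->n = n;
    for (int v = 0; v < MAX_VERTICES; v++)
        g->head[v] = NULL;
}

/* Add the directed edge from -> to. Returns 0 on success, -1 if out of memory. */
int adj_add_edge(adj_graph *g, int from, int to)
{
    adj_node *node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    node->vertex = to;
    node->next = g->head[from];     /* push on the front of the list */
    g->head[from] = node;
    return 0;
}

int adj_has_edge(const adj_graph *g, int from, int to)
{
    for (const adj_node *p = g->head[from]; p != NULL; p = p->next)
        if (p->vertex == to)
            return 1;
    return 0;
}

/* Number of list entries of v, i.e. its out-degree. */
int adj_degree(const adj_graph *g, int v)
{
    int d = 0;
    for (const adj_node *p = g->head[v]; p != NULL; p = p->next)
        d++;
    return d;
}

void adj_free(adj_graph *g)
{
    for (int v = 0; v < MAX_VERTICES; v++) {
        while (g->head[v] != NULL) {
            adj_node *next = g->head[v]->next;
            free(g->head[v]);
            g->head[v] = next;
        }
    }
}
```

The space used is proportional to the number of vertices plus the number of stored edges, which is the usual reason to prefer this layout for sparse graphs.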
6.7. RECOMMENDED QUESTIONS
1. Define forest. Explain the forest traversals.
2. Define binary search tree. Explain with an example.
3. Define a path in a tree and terminal nodes in a tree.
4. Why is searching for a node in a binary search tree said to be more efficient than in a simple binary tree?
5. List the applications of the set ADT.
6. What do you mean by the disjoint set ADT?
7. List the abstract operations on a set.
8. Define graph. What is a directed graph and an undirected graph?
9. What is a weighted graph? Define path in a graph.
10. Define outdegree and indegree of a graph.
UNIT – 7 : PRIORITY QUEUES
Priority Queue:
Need for priority queue:
In a multi-user environment, the operating system scheduler must decide which of several processes to run, each only for a fixed period of time. For that we can use a QUEUE, where jobs are initially placed at the end of the queue. The scheduler repeatedly takes the first job on the queue, runs it until either it finishes or its time limit is up, and places it at the end of the queue if it doesn't finish. This strategy is generally not appropriate, because very short jobs will seem to take a long time because of the wait involved before they run. Generally, it is important that short jobs finish as fast as possible, so these jobs should have precedence over jobs that have already been running. Furthermore, some jobs that are not short are still very important and should also have precedence. This particular application seems to require a special kind of queue, known as a PRIORITY QUEUE.
Priority Queue:
It is a collection of ordered elements that provides fast access to the minimum or maximum element. The basic operations performed by a priority queue are:
1. Insert operation
2. DeleteMin operation
The Insert operation is the equivalent of the queue's Enqueue operation. The DeleteMin operation is the priority queue equivalent of the queue's Dequeue operation.
[Figure: priority queue H with the operations Insert(H) and DeleteMin(H).]
Implementation:
There are three ways of implementing a priority queue. They are:
1. Linked list
2. Binary search tree
3. Binary heap
7.1. Single- and Double-Ended Priority Queues:
A (single-ended) priority queue is a data type supporting the following operations on an ordered set of values: 1) find the maximum value (FindMax); 2) delete the maximum value (DeleteMax); 3) add a new value x (Insert(x)). Obviously, the priority queue can be redefined by substituting operations 1) and 2) with FindMin and DeleteMin, respectively. Several structures, some implicitly stored in an array and some using more complex data structures, have been presented for implementing this data type, including max-heaps (or min-heaps).
Conceptually, a max-heap is a binary tree having the following properties:
a) heap-shape: all leaves lie on at most two adjacent levels, and the leaves on the last level occupy the leftmost positions; all other levels are complete.
b) max-ordering: the value stored at a node is greater than or equal to the values stored at its children.
A max-heap of size n can be constructed in linear time and can be stored in an n-element array; hence it is referred to as an implicit data structure [g]. When a max-heap implements a priority queue, FindMax can be performed in constant time, while both DeleteMax and Insert(x) have logarithmic time. We shall consider a more powerful data type, the double-ended priority queue, which allows both FindMin and FindMax, as well as DeleteMin, DeleteMax, and Insert(x) operations. An important application of this data type is in external quicksort. A traditional heap does not allow efficient implementation of all the above operations; for example, FindMin requires linear (instead of constant) time in a max-heap. One approach to overcoming this intrinsic limitation of heaps is to place a max-heap "back-to-back" with a min-heap.
Definition
A double-ended priority queue (DEPQ) is a collection of zero or more elements. Each element has a priority or value. The operations performed on a double-ended priority queue are:
1. isEmpty() ... return true iff the DEPQ is empty
2. size() ... return the number of elements in the DEPQ
3. getMin() ... return element with minimum priority
4. getMax() ... return element with maximum priority
5. put(x) ... insert the element x into the DEPQ
6. removeMin() ... remove an element with minimum priority and return this element
7. removeMax() ... remove an element with maximum priority and return this element
Application to External Sorting
The internal sorting method that has the best expected run time is quicksort (see Section 19.2.3). The basic idea in quicksort is to partition the elements to be sorted into three groups L, M, and R. The middle group M contains a single element called the pivot, all elements in the left group L are <= the pivot, and all elements in the right group R are >= the pivot. Following this partitioning, the left and right element groups are sorted recursively.
In an external sort, we have more elements than can be held in the memory of our computer. The elements to be sorted are initially on a disk and the sorted sequence is to be left on the disk. When the internal quicksort method outlined above is extended to an external quicksort, the middle group M is made as large as possible through the use of a DEPQ. The external quicksort strategy is:
1. Read in as many elements as will fit into an internal DEPQ. The elements in the DEPQ will eventually be the middle group of elements.
2. Read in the remaining elements. If the next element is <= the smallest element in the DEPQ, output this next element as part of the left group. If the next element is >= the largest element in the DEPQ, output this next element as part of the right group. Otherwise, remove either the max or min element from the DEPQ (the choice may be made randomly or alternately); if the max element is removed, output it as part of the right group; otherwise, output the removed element as part of the left group; insert the newly input element into the DEPQ.
3. Output the elements in the DEPQ, in sorted order, as the middle group.
4. Sort the left and right groups recursively.
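Steps 1 to 3 of the external quicksort strategy can be simulated in memory with the C sketch below. It is a teaching sketch under several assumptions: the input and the three output groups are ordinary arrays instead of disk files, the DEPQ is a tiny sorted array of capacity DEPQ_CAP (so put and removeMin cost O(capacity) rather than O(log n)), and the removed end alternates between min and max. All names (depq, depq_put, depq_remove_min, depq_remove_max, depq_partition) are hypothetical.

```c
#include <stddef.h>

#define DEPQ_CAP 4   /* how many elements "fit in memory" in this sketch */

/* A deliberately simple DEPQ: elements kept in ascending order in an array. */
typedef struct {
    int a[DEPQ_CAP];
    size_t n;
} depq;

void depq_put(depq *q, int x)               /* caller ensures q->n < DEPQ_CAP */
{
    size_t i = q->n++;
    while (i > 0 && q->a[i - 1] > x) {
        q->a[i] = q->a[i - 1];
        i--;
    }
    q->a[i] = x;
}

int depq_remove_min(depq *q)
{
    int m = q->a[0];
    for (size_t i = 1; i < q->n; i++)
        q->a[i - 1] = q->a[i];
    q->n--;
    return m;
}

int depq_remove_max(depq *q) { return q->a[--q->n]; }

/* One partitioning pass: split in[0..n-1] into left, middle and right groups.
   left and right need room for n elements, middle for DEPQ_CAP elements. */
void depq_partition(const int in[], size_t n,
                    int left[], size_t *nl,
                    int middle[], size_t *nm,
                    int right[], size_t *nr)
{
    depq q = { {0}, 0 };
    size_t i = 0;
    int take_max = 0;                 /* alternate between min and max */
    *nl = *nr = 0;

    while (i < n && q.n < DEPQ_CAP)   /* step 1: fill the DEPQ */
        depq_put(&q, in[i++]);

    for (; i < n; i++) {              /* step 2: route the remaining input */
        int x = in[i];
        if (x <= q.a[0]) {
            left[(*nl)++] = x;
        } else if (x >= q.a[q.n - 1]) {
            right[(*nr)++] = x;
        } else {
            if (take_max)
                right[(*nr)++] = depq_remove_max(&q);
            else
                left[(*nl)++] = depq_remove_min(&q);
            take_max = !take_max;
            depq_put(&q, x);
        }
    }

    *nm = q.n;                        /* step 3: middle group, already sorted */
    for (size_t k = 0; k < q.n; k++)
        middle[k] = q.a[k];
}
```

Step 4 would apply the same pass recursively to the left and right groups; replacing the sorted array with one of the heap-based DEPQs discussed in this unit gives the logarithmic bounds.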
Generic Methods for DEPQs:
General methods exist to arrive at efficient DEPQ data structures from single-ended priority queue (PQ) data structures that also provide an efficient implementation of the remove(theNode) operation (this operation removes the node theNode from the PQ). The simplest of these methods, the dual structure method, maintains both a min PQ and a max PQ of all the DEPQ elements together with correspondence pointers between the nodes of the min PQ and the max PQ that contain the same element. Figure 1 shows a dual heap structure for the elements 6, 7, 2, 5, 4. Correspondence pointers are shown as red arrows.
Figure 1 Dual heap
Although the figure shows each element stored in both the min and the max heap, it is necessary to store each element in only one of the two heaps. The isEmpty and size operations are implemented by using a variable size that keeps track of the number of elements in the DEPQ. The minimum element is at the root of the min heap and the maximum element is at the root of the max heap. To insert an element x, we insert x into both the min and the max heaps and then set up correspondence pointers between the locations of x in the min and max heaps. To remove the minimum element, we do a removeMin from the min heap and a remove(theNode), where theNode is the corresponding node for the removed element, from the max heap. The maximum element is removed in an analogous way.
Total and leaf correspondence
Total correspondence and leaf correspondence are more sophisticated correspondence methods. In both of these, half the elements are in the min PQ and the other half in the max PQ. When the number of elements is odd, one element is retained in a buffer. This buffered element is not in either PQ. In total correspondence, each element a in the min PQ is paired with a distinct element b of the max PQ. (a, b) is a corresponding pair of elements such that priority(a) <= priority(b). Figure 2 shows a total correspondence heap for the 11 elements 2, 3, 4, 4, 5, 5, 6, 7, 8, 9, 10. The element 9 is in the buffer. Corresponding pairs are shown by red arrows.
Figure 2 Total correspondence heap
In leaf correspondence, each leaf element of the min and max PQ is required to be part of a corresponding pair. Nonleaf elements need not be in any corresponding pair. Figure 3 shows a leaf correspondence heap.
Figure 3 A leaf correspondence heap
Total and leaf correspondence structures require less space than do dual structures. However, the DEPQ algorithms for total and leaf correspondence structures are more complex than those for dual structures. Of the three correspondence types, leaf correspondence generally results in the fastest DEPQ correspondence structures. Using any of the described correspondence methods, we can arrive at DEPQ structures from heaps, height-biased leftist trees, and pairing heaps. In these DEPQ structures, the operations put(x), removeMin(), and removeMax() take O(log n) time (n is the number of elements in the DEPQ; for pairing heaps, this is an amortized complexity), and the remaining DEPQ operations take O(1) time.
7.2. Leftist tree:
Definitions: An external node is an imaginary node in a location of a missing child.
Notation. Let the s-value of a node be the shortest distance from the node to an external node.
An external node has the s-value of 0. An internal node has the s-value of 1 plus the minimum of the s-values of its internal and external children.
Height-Biased Leftist Trees
In a height-biased leftist tree the s-value of a left child of a node is not smaller than the s-value of the right child of the node.
[Figure: a height-biased tree and a non height-biased tree.]
Merging Height-Biased Leftist Trees
Recursive algorithm
Consider two nonempty height-biased leftist trees A and B, and a relation (e.g., smaller than) on the values of the keys.
• Assume the key-value of A is not bigger than the key-value of B.
• Let the root of the merged tree have the same left subtree as the root of A.
• Let the root of the merged tree have the right subtree obtained by merging B with the right subtree of A.
• If in the merged tree the s-value of the left subtree is smaller than the s-value of the right subtree, interchange the subtrees.
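The recursive merge can be sketched in C as follows. The names leftist_node, s_value, leftist_merge, leftist_insert and leftist_delete_min are assumptions of this sketch, keys are integers with the smallest key at the root, and an empty tree (NULL) plays the role of an external node with s-value 0, which also serves as the base case of the recursion.

```c
#include <stddef.h>

typedef struct leftist_node {
    int key;
    int s;                                /* s-value of this node */
    struct leftist_node *left, *right;
} leftist_node;

/* s-value of a possibly empty subtree; an external node (NULL) has 0. */
int s_value(const leftist_node *t) { return t != NULL ? t->s : 0; }

/* Merge two height-biased leftist trees and return the new root. */
leftist_node *leftist_merge(leftist_node *a, leftist_node *b)
{
    if (a == NULL) return b;
    if (b == NULL) return a;
    if (b->key < a->key) {                /* make a the tree with the smaller root */
        leftist_node *t = a; a = b; b = t;
    }
    a->right = leftist_merge(b, a->right);    /* a keeps its left subtree */
    if (s_value(a->left) < s_value(a->right)) {
        leftist_node *t = a->left;        /* restore the leftist property */
        a->left = a->right;
        a->right = t;
    }
    a->s = 1 + s_value(a->right);         /* right child has the smaller s-value */
    return a;
}

/* Insertion is a merge with a one-node tree supplied by the caller. */
leftist_node *leftist_insert(leftist_node *root, leftist_node *node)
{
    node->left = node->right = NULL;
    node->s = 1;
    return leftist_merge(root, node);
}

/* Deleting the minimum merges the two subtrees of the (nonempty) root. */
leftist_node *leftist_delete_min(leftist_node *root)
{
    return leftist_merge(root->left, root->right);
}
```

Both insertion and deletion of the minimum reduce to a single merge, so their cost is bounded by the length of the rightmost paths involved.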
For the following example, assume the key-value of each node equals its s-value.
Time complexity
Linear in the rightmost path of the outcome tree.
The rightmost path of the outcome tree is a shortest path.
A shortest path can't contain more than log n nodes.
Proof: If the shortest path can contain more than log n nodes, then the first 1 + log n levels should include $2^0 + 2^1 + \cdots + 2^{\log n} = 2^{1+\log n} - 1 = 2n - 1$ nodes. In such a case, for n > 1 we end up with $n \ge 2n - 1$, which is a contradiction.
7.3. Binomial Heaps: A binomial heap is a heap similar to a binary heap that also supports quickly merging two heaps. This is achieved by using a special tree structure. It is important as an implementation of the mergeable heap abstract data type (also called meldable heap), which is a priority queue supporting a merge operation.
A binomial heap is implemented as a collection of binomial trees (compare with a binary heap, which has the shape of a single binary tree). A binomial tree is defined recursively:
A binomial tree of order 0 is a single node.
A binomial tree of order k has a root node whose children are roots of binomial trees of orders k−1, k−2, ..., 2, 1, 0 (in this order).
[Figure: binomial trees of order 0 to 3. Each tree has a root node with subtrees of all lower-ordered binomial trees, which have been highlighted. For example, the order 3 binomial tree is connected to an order 2, 1, and 0 (highlighted as blue, green and red respectively) binomial tree.]
A binomial tree of order k has $2^k$ nodes and height k. Because of its unique structure, a binomial tree of order k can be constructed from two trees of order k−1 trivially by attaching one of them as the leftmost child of the root of the other one. This feature is central to the merge operation of a binomial heap, which is its major advantage over other conventional heaps. The name comes from the shape: a binomial tree of order $n$ has $\binom{n}{d}$ nodes at depth $d$ (see binomial coefficient).
Structure of a binomial heap
A binomial heap is implemented as a set of binomial trees that satisfy the binomial heap properties:
1. Each binomial tree in a heap obeys the minimum-heap property: the key of a node is greater than or equal to the key of its parent.
2. There can be at most one binomial tree of each order, including order zero.
The first property ensures that the root of each binomial tree contains the smallest key in the tree, which applies to the entire heap. The second property implies that a binomial heap with n nodes consists of at most log n + 1 binomial trees. In fact, the number and orders of these trees are uniquely determined by the number of nodes n: each binomial tree corresponds to one 1 digit in the binary representation of n. For example, 13 is 1101 in binary, and thus a binomial heap with 13 nodes consists of three binomial trees of orders 3, 2, and 0 (see figure below).
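The correspondence between the heap size and the tree orders can be checked with a few lines of C. This is only an illustrative sketch; the function name binomial_orders and its parameters are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Stores in out[] the orders of the binomial trees that make up a heap of
   n nodes, lowest order first, and returns how many trees there are.
   Each 1 bit at position k of n stands for one tree of order k. */
size_t binomial_orders(unsigned long n, unsigned out[], size_t cap) {
    size_t count = 0;
    for (unsigned order = 0; n != 0; n >>= 1, order++) {
        if (n & 1UL) {
            assert(count < cap);   /* caller must supply enough room */
            out[count++] = order;
        }
    }
    return count;
}
```

For n = 13 (binary 1101) the function reports three trees, of orders 0, 2 and 3.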
[Figure: example of a binomial heap containing 13 nodes with distinct keys. The heap consists of three binomial trees with orders 0, 2, and 3.]
Implementation
Because no operation requires random access to the root nodes of the binomial trees, the roots of the binomial trees can be stored in a linked list, ordered by increasing order of the tree.
Merge
As mentioned above, the simplest and most important operation is the merging of two binomial trees of the same order within two binomial heaps. Due to the structure of binomial trees, they can be merged trivially. As the root node of each tree is the smallest element within that tree, comparing the two root keys gives the minimum key, and the root holding it becomes the new root node. The other tree then becomes a subtree of the combined tree. This operation is basic to the complete merging of two binomial heaps.
function mergeTree(p, q)
    if p.root.key <= q.root.key
        return p.addSubTree(q)
    else
        return q.addSubTree(p)
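In C, mergeTree could look like the sketch below. It assumes a leftmost-child, right-sibling representation, which is one common choice but is not prescribed by these notes; BNode, bnode_new, binomial_link and binomial_size are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Binomial tree node in leftmost-child / right-sibling form (assumed layout). */
typedef struct BNode {
    int key;
    unsigned order;            /* order of the tree rooted here */
    struct BNode *child;       /* leftmost child, the one of highest order */
    struct BNode *sibling;     /* next child of the same parent */
} BNode;

BNode *bnode_new(int key) {
    BNode *n = malloc(sizeof *n);
    assert(n != NULL);
    n->key = key;
    n->order = 0;
    n->child = n->sibling = NULL;
    return n;
}

/* Link two trees of the same order k into one tree of order k + 1.
   The root with the larger key becomes the leftmost child of the other. */
BNode *binomial_link(BNode *p, BNode *q) {
    assert(p != NULL && q != NULL && p->order == q->order);
    if (q->key < p->key) {     /* make p the root with the smaller key */
        BNode *t = p; p = q; q = t;
    }
    q->sibling = p->child;
    p->child = q;
    p->order++;
    return p;
}

/* Number of nodes in the tree; should equal 2^order. */
unsigned long binomial_size(const BNode *t) {
    unsigned long n = 1;
    for (const BNode *c = t->child; c != NULL; c = c->sibling)
        n += binomial_size(c);
    return n;
}
```

The link takes constant time because only two pointers and the order field change.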
[Figure: to merge two binomial trees of the same order, first compare the root keys. Since 7 > 3, the black tree on the left (with root node 7) is attached to the grey tree on the right (with root node 3) as a subtree. The result is a tree of order 3.]
The operation of merging two heaps is perhaps the most interesting and can be used as a subroutine in most other operations. The lists of roots of both heaps are traversed simultaneously, similarly to the merge step of merge sort. If only one of the heaps contains a tree of order j, this tree is moved to the merged heap. If both heaps contain a tree of order j, the two trees are merged into one tree of order j+1 so that the minimum-heap property is satisfied. Note that it may later be necessary to merge this tree with some other tree of order j+1 present in one of the heaps. In the course of the algorithm, we need to examine at most three trees of any order (two from the two heaps we merge and one composed of two smaller trees).
Because each binomial tree in a binomial heap corresponds to a bit in the binary representation of its size, there is an analogy between the merging of two heaps and the binary addition of the sizes of the two heaps, from right to left. Whenever a carry occurs during addition, this corresponds to a merging of two binomial trees during the merge.
Each tree has order at most log n and therefore the running time is O(log n).
In the pseudocode below, heap is the merged heap being built, and mergeTree is assumed to return the other tree when one of its arguments is empty.
function merge(p, q)
    while not (p.end() and q.end())
        tree = mergeTree(p.currentTree(), q.currentTree())
        if not heap.currentTree().empty()
            tree = mergeTree(tree, heap.currentTree())
        heap.addTree(tree)
        heap.next()
        p.next()
        q.next()
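The analogy with binary addition can be made explicit in C. The sketch below is a simplification and an assumption, not the linked-list layout described above: each heap keeps its roots in an array indexed by order, so that slot k plays the role of bit k of the heap size. All names (BHeap, bheap_merge, bheap_insert, bheap_min, MAX_ORDER) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_ORDER 32   /* assumed bound: enough for fewer than 2^32 nodes */

typedef struct BNode {
    int key;
    unsigned order;
    struct BNode *child, *sibling;
} BNode;

/* slot[k] holds the tree of order k, or NULL if the heap has none. */
typedef struct { BNode *slot[MAX_ORDER]; } BHeap;

/* Link two trees of equal order; the smaller root stays on top. */
static BNode *link_trees(BNode *p, BNode *q) {
    assert(p->order == q->order);
    if (q->key < p->key) { BNode *t = p; p = q; q = t; }
    q->sibling = p->child;
    p->child = q;
    p->order++;
    return p;
}

/* Merge src into dst like a binary adder, leaving src empty. */
void bheap_merge(BHeap *dst, BHeap *src) {
    BNode *carry = NULL;
    for (unsigned k = 0; k < MAX_ORDER; k++) {
        BNode *t[3];              /* at most three trees of order k */
        int cnt = 0;
        if (dst->slot[k]) t[cnt++] = dst->slot[k];
        if (src->slot[k]) t[cnt++] = src->slot[k];
        if (carry)        t[cnt++] = carry;
        dst->slot[k] = src->slot[k] = carry = NULL;
        if (cnt % 2 == 1) dst->slot[k] = t[--cnt];       /* sum bit */
        if (cnt == 2) carry = link_trees(t[0], t[1]);    /* carry bit */
    }
    assert(carry == NULL);        /* otherwise MAX_ORDER was too small */
}

/* Insert = merge with a one-node heap. */
void bheap_insert(BHeap *h, int key) {
    BNode *n = malloc(sizeof *n);
    assert(n != NULL);
    n->key = key; n->order = 0; n->child = n->sibling = NULL;
    BHeap one = {{NULL}};
    one.slot[0] = n;
    bheap_merge(h, &one);
}

/* Minimum key: scan the roots (the heap must not be empty). */
int bheap_min(const BHeap *h) {
    const BNode *best = NULL;
    for (unsigned k = 0; k < MAX_ORDER; k++)
        if (h->slot[k] && (!best || h->slot[k]->key < best->key))
            best = h->slot[k];
    assert(best != NULL);
    return best->key;
}
```

With this layout, a heap of 13 nodes occupies slots 0, 2 and 3, and merging it with a heap of 3 nodes leaves a single tree in slot 4, just as 1101 + 0011 = 10000 in binary.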
[Figure: the merger of two binomial heaps. This is accomplished by merging two binomial trees of the same order one by one. If the resulting merged tree has the same order as one binomial tree in one of the two heaps, then those two are merged again.]
Insert
Inserting a new element into a heap can be done by simply creating a new heap containing only this element and then merging it with the original heap. Due to the merge, insert takes O(log n) time; however, it has an amortized time of O(1) (i.e. constant).
Find minimum
To find the minimum element of the heap, find the minimum among the roots of the binomial trees. This can again be done easily in O(log n) time, as there are just O(log n) trees and hence roots to examine. By using a pointer to the binomial tree that contains the minimum element, the time for this operation can be reduced to O(1). The pointer must be updated when performing any operation other than Find minimum. This can be done in O(log n) without raising the running time of any operation.
Delete minimum
To delete the minimum element from the heap, first find this element, remove it from its binomial tree, and obtain a list of its subtrees. Then transform this list of subtrees into a separate binomial heap by reordering them from smallest to largest order. Then merge this heap with the original heap. Since each root has at most log n children, creating this new heap is O(log n). Merging heaps is O(log n), so the entire delete minimum operation is O(log n).
function deleteMin(heap)
    min = heap.trees().first()
    for each current in heap.trees()
        if current.root < min.root then min = current
    tmp = new empty heap
    for each tree in min.subTrees()
        tmp.addTree(tree)
    heap.removeTree(min)
    merge(heap, tmp)
Decrease key
After decreasing the key of an element, it may become smaller than the key of its parent, violating the minimum-heap property. If this is the case, exchange the element with its parent, and possibly also with its grandparent, and so on, until the minimum-heap property is no longer violated. Each binomial tree has height at most log n, so this takes O(log n) time.
Delete
To delete an element from the heap, decrease its key to negative infinity (that is, some value lower than any element in the heap) and then delete the minimum in the heap.
Performance
All of the following operations work in O(log n) time on a binomial heap with n elements:
Insert a new element into the heap
Find the element with minimum key
Delete the element with minimum key from the heap
Decrease the key of a given element
Delete a given element from the heap
Merge two given heaps into one heap
Finding the element with minimum key can also be done in O(1) by using an additional pointer to the minimum.
Applications
Discrete event simulation, priority queues
7.4. Fibonacci Heaps: A Fibonacci heap is a heap data structure consisting of a collection of trees. It has a better amortized running time than a binomial heap. Fibonacci heaps were developed by Michael L. Fredman and Robert E. Tarjan in 1984 and first published in a scientific journal in 1987. The name of the Fibonacci heap comes from the Fibonacci numbers, which are used in the running time analysis.
Find-minimum takes O(1) amortized time. Operations insert, decrease key, and merge (union) work in constant amortized time. Operations delete and delete minimum work in O(log n) amortized time. This means that starting from an empty data structure, any sequence of a operations from the first group and b operations from the second group would take O(a + b log n) time. In a binomial heap such a sequence of operations would take O((a + b) log n) time. A Fibonacci heap is thus better than a binomial heap when b is asymptotically smaller than a.
Using Fibonacci heaps for priority queues improves the asymptotic running time of important algorithms, such as Dijkstra's algorithm for computing the shortest path between two nodes in a graph.
Structure
[Figure 1: example of a Fibonacci heap. It has three trees of degrees 0, 1 and 3. Three vertices are marked (shown in blue). Therefore the potential of the heap is 9.]
A Fibonacci heap is a collection of trees satisfying the minimum-heap property, that is, the key of a child is always greater than or equal to the key of the parent. This implies that the minimum key is always at the root of one of the trees. Compared with binomial heaps, the structure of a Fibonacci heap is more flexible. The trees do not have a prescribed shape and in the extreme case the heap can have every element in a separate tree. This flexibility allows some operations to be executed in a "lazy" manner, postponing the work for later operations. For example, merging heaps is done simply by concatenating the two lists of trees, and operation decrease key sometimes cuts a node from its parent and forms a new tree.
However, at some point some order needs to be introduced to the heap to achieve the desired running time. In particular, degrees of nodes (here degree means the number of children) are kept quite low: every node has degree at most O(log n) and the size of a subtree rooted in a node of degree k is at least $F_{k+2}$, where $F_k$ is the kth Fibonacci number. This is achieved by the rule that we can cut at most one child of each non-root node. When a second child is cut, the node itself needs to be cut from its parent and becomes the root of a new tree (see Proof of degree bounds, below). The number of trees is decreased in the operation delete minimum, where trees are linked together.
As a result of a relaxed structure, some operations can take a long time while others are done very quickly. In the amortized running time analysis we pretend that very fast operations take a little bit longer than they actually do. This additional time is then later subtracted from the actual running time of slow operations. The amount of time saved for later use is measured at any given moment by a potential function. The potential of a Fibonacci heap is given by
Potential = t + 2m
where t is the number of trees in the Fibonacci heap, and m is the number of marked nodes. A node is marked if at least one of its children was cut since this node was made a child of another node (all roots are unmarked).
Thus, the root of each tree in a heap has one unit of time stored. This unit of time can be used later to link this tree with another tree at amortized time 0. Also, each marked node has two units of time stored. One can be used to cut the node from its parent. If this happens, the node becomes a root and the second unit of time will remain stored in it as in any other root.
Implementation of operations
To allow fast deletion and concatenation, the roots of all trees are linked using a circular, doubly linked list. The children of each node are also linked using such a list. For each node, we maintain its number of children and whether the node is marked. Moreover, we maintain a pointer to the root containing the minimum key.
Operation find minimum is now trivial because we keep the pointer to the node containing it. It does not change the potential of the heap, therefore both actual and amortized cost is constant. As mentioned above, merge is implemented simply by concatenating the lists of tree roots of the two heaps. This can be done in constant time and the potential does not change, leading again to constant amortized time. Operation insert works by creating a new heap with one element and doing merge. This takes constant time, and the potential increases by one, because the number of trees increases. The amortized cost is thus still constant.
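The constant-time merge and insert can be illustrated in C with a deliberately stripped-down sketch. It models only the circular, doubly linked root list and the minimum pointer; child lists, degrees and marks are omitted, so this is not a complete Fibonacci heap. The names FNode, FHeap, fheap_merge, fheap_insert and fheap_root_count are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Root-list node only: children, degree and mark bit are left out. */
typedef struct FNode {
    int key;
    struct FNode *prev, *next;   /* circular doubly linked root list */
} FNode;

typedef struct {
    FNode *min;                  /* root with the minimum key, or NULL */
    size_t n;                    /* number of elements */
} FHeap;

/* Merge b into a in O(1) by splicing the two rings together; b becomes empty. */
void fheap_merge(FHeap *a, FHeap *b) {
    if (b->min == NULL) return;
    if (a->min == NULL) {
        *a = *b;
    } else {
        FNode *a_next = a->min->next;    /* ring a: a->min -> a_next ... */
        FNode *b_last = b->min->prev;    /* ring b: b->min ... -> b_last */
        a->min->next = b->min;
        b->min->prev = a->min;
        b_last->next = a_next;
        a_next->prev = b_last;
        if (b->min->key < a->min->key) a->min = b->min;
        a->n += b->n;
    }
    b->min = NULL;
    b->n = 0;
}

/* Insert = create a one-element heap and merge it in. */
void fheap_insert(FHeap *h, int key) {
    FNode *x = malloc(sizeof *x);
    assert(x != NULL);
    x->key = key;
    x->prev = x->next = x;       /* a ring of one node */
    FHeap one = { x, 1 };
    fheap_merge(h, &one);
}

/* Number of roots t, the first term of the potential t + 2m. */
size_t fheap_root_count(const FHeap *h) {
    size_t t = 0;
    if (h->min != NULL) {
        const FNode *x = h->min;
        do { t++; x = x->next; } while (x != h->min);
    }
    return t;
}
```

Because no linking happens here, every inserted element stays a separate root until a later extract minimum consolidates the trees.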
[Figure: Fibonacci heap from Figure 1 after the first phase of extract minimum. The node with key 1 (the minimum) was deleted and its children were added as separate trees.]
Operation extract minimum (same as delete minimum) operates in three phases. First we take the root containing the minimum element and remove it. Its children will become roots of new trees. If the number of children was d, it takes time O(d) to process all new roots and the potential increases by d−1. Therefore the amortized running time of this phase is O(d) = O(log n).
[Figure: Fibonacci heap from Figure 1 after extract minimum is completed. First, nodes 3 and 6 are linked together. Then the result is linked with the tree rooted at node 2. Finally, the new minimum is found.]
However, to complete the extract minimum operation, we need to update the pointer to the root with minimum key. Unfortunately there may be up to n roots we need to check. In the second phase we therefore decrease the number of roots by successively linking together roots of the same degree. When two roots u and v have the same degree, we make one of them a child of the other so that the one with the smaller key remains the root. Its degree will increase by one. This is repeated until every root has a different degree. To find trees of the same degree efficiently we use an array of length O(log n) in which we keep a pointer to one root of each degree. When a second root is found of the same degree, the two are linked and the array is updated. The actual running time is O(log n + m) where m is the number of roots at the beginning of the second phase. At the end we will have at most O(log n) roots (because each has a different degree). Therefore the difference in the potential function from before this phase to after it is O(log n) − m, and the amortized running time is then at most O(log n + m) + O(log n) − m = O(log n), since we can scale up the units of potential stored at insertion in each node by the constant factor in the O(m) part of the actual cost for this phase.
In the third phase we check each of the remaining roots and find the minimum. This takes O(log n) time and the potential does not change. The overall amortized running time of extract minimum is therefore O(log n).
[Figure: Fibonacci heap from Figure 1 after decreasing the key of node 9 to 0. This node as well as its two marked ancestors are cut from the tree rooted at 1 and placed as new roots.]
Operation decrease key will take the node, decrease the key and, if the heap property becomes violated (the new key is smaller than the key of the parent), cut the node from its parent. If the parent is not a root, it is marked. If it has been marked already, it is cut as well and its parent is marked. We continue upwards until we reach either the root or an unmarked node. In the process we create some number, say k, of new trees. Each of these new trees except possibly the first one was marked originally but as a root it will become unmarked. One node can become marked. Therefore the potential decreases by at least k − 2. The actual time to perform the cutting was O(k), therefore the amortized running time is constant.
Finally, operation delete can be implemented simply by decreasing the key of the element to be deleted to minus infinity, thus turning it into the minimum of the whole heap. Then we call extract minimum to remove it. The amortized running time of this operation is O(log n).
Proof of degree bounds
The amortized performance of a Fibonacci heap depends on the degree (number of children) of any tree root being O(log n), where n is the size of the heap. Here we show that the (sub)tree rooted at any node x of degree d in the heap must have size at least $F_{d+2}$, where $F_k$ is the kth Fibonacci number. The degree bound follows from this and the fact (easily proved by induction) that $F_{d+2} \ge \varphi^d$ for all integers $d \ge 0$, where $\varphi = (1 + \sqrt{5})/2 \approx 1.618$. (We then have $n \ge F_{d+2} \ge \varphi^d$, and taking the log to base $\varphi$ of both sides gives $d \le \log_\varphi n$ as required.)
Consider any node x somewhere in the heap (x need not be the root of one of the main trees). Define size(x) to be the size of the tree rooted at x (the number of descendants of x, including x itself). We prove by induction on the height of x (the length of a longest simple path from x to a descendant leaf) that size(x) ≥ $F_{d+2}$, where d is the degree of x.
Base case: If x has height 0, then d = 0, and size(x) = 1 = $F_2$.
Inductive case: Suppose x has positive height and degree d > 0. Let $y_1, y_2, \ldots, y_d$ be the children of x, indexed in order of the times they were most recently made children of x ($y_1$ being the earliest and $y_d$ the latest), and let $c_1, c_2, \ldots, c_d$ be their respective degrees. We claim that $c_i \ge i - 2$ for each i with 2 ≤ i ≤ d: just before $y_i$ was made a child of x, $y_1, \ldots, y_{i-1}$ were already children of x, and so x had degree at least i−1 at that time. Since trees are combined only when the degrees of their roots are equal, it must have been that $y_i$ also had degree at least i−1 at the time it became a child of x. From that time to the present, $y_i$ can only have lost at most one child (as guaranteed by the marking process), and so its current degree $c_i$ is at least i−2. This proves the claim.
Since the heights of all the $y_i$ are strictly less than that of x, we can apply the inductive hypothesis to them to get size($y_i$) ≥ $F_{c_i+2}$ ≥ $F_{(i-2)+2}$ = $F_i$. The nodes x and $y_1$ each contribute at least 1 to size(x), and so we have
$\mathrm{size}(x) \ge 2 + \sum_{i=2}^{d} \mathrm{size}(y_i) \ge 2 + \sum_{i=2}^{d} F_i = 1 + \sum_{i=0}^{d} F_i.$
A routine induction proves that $1 + \sum_{i=0}^{d} F_i = F_{d+2}$ for any $d \ge 0$, which gives the desired lower bound on size(x).
Worst case
Although the total running time of a sequence of operations starting with an empty structure is bounded by the bounds given above, some (very few) operations in the sequence can take very long to complete (in particular delete and delete minimum have linear running time in the worst case). For this reason Fibonacci heaps and other amortized data structures may not be appropriate for real-time systems. It is possible to create a data structure which has the same worst case performance as the Fibonacci heap has amortized performance.[3] However, the resulting structure is very complicated, so it is not useful in most practical cases.
7.5. Pairing heaps
Pairing heaps are a type of heap data structure with relatively simple implementation and excellent practical amortized performance. However, it has proven very difficult to determine the precise asymptotic running time of pairing heaps. Pairing heaps are heap-ordered multiway trees. Describing the various heap operations is relatively simple (in the following we assume a min-heap):
find-min: simply return the top element of the heap.
merge: compare the two root elements; the smaller remains the root of the result, and the larger element and its subtree are appended as a child of this root.
insert: create a new heap for the inserted element and merge it into the original heap.
decrease-key (optional): remove the subtree rooted at the key to be decreased, then merge it with the heap.
delete-min: remove the root and merge its subtrees. Various strategies are employed.
The amortized time per delete-min is O(log n).[1] The operations find-min, merge, and insert take O(1) amortized time[2] and decrease-key takes $O(2^{2\sqrt{\log\log n}})$ amortized time.[3] Fredman proved that the amortized time per decrease-key is at least Ω(log log n).[4] That is, they are less efficient than Fibonacci heaps, which perform decrease-key in O(1) amortized time.
Implementation
A pairing heap is either an empty heap, or a pair consisting of a root element and a possibly empty list of pairing heaps. The heap ordering property requires that all the root elements of the subheaps in the list are not smaller than the root element of the heap. The following description assumes a purely functional heap that does not support the decrease-key operation.
type PairingHeap[Elem] = Empty | Heap(elem: Elem, subheaps: List[PairingHeap[Elem]])
Operations
find-min:
The function find-min simply returns the root element of the heap:
function find-min(heap)
    if heap == Empty
        error
    else
        return heap.elem
merge:
Merging with an empty heap returns the other heap; otherwise a new heap is returned that has the minimum of the two root elements as its root element and just adds the heap with the larger root to the list of subheaps:
function merge(heap1, heap2)
    if heap1 == Empty
        return heap2
    elsif heap2 == Empty
        return heap1
    elsif heap1.elem < heap2.elem
        return Heap(heap1.elem, heap2 :: heap1.subheaps)
    else
        return Heap(heap2.elem, heap1 :: heap2.subheaps)
insert:
The easiest way to insert an element into a heap is to merge the heap with a new heap containing just this element and an empty list of subheaps:
function insert(elem, heap)
    return merge(Heap(elem, []), heap)
delete-min:
The only non-trivial fundamental operation is the deletion of the minimum element from the heap. The standard strategy first merges the subheaps in pairs (this is the step that gave this data structure its name) from left to right and then merges the resulting list of heaps from right to left:
function delete-min(heap)
    if heap == Empty
        error
    elsif length(heap.subheaps) == 0
        return Empty
    elsif length(heap.subheaps) == 1
        return heap.subheaps[0]
    else
        return merge-pairs(heap.subheaps)
This uses the auxiliary function merge-pairs:
function merge-pairs(l)
    if length(l) == 0
        return Empty
    elsif length(l) == 1
        return l[0]
    else
        return merge(merge(l[0], l[1]), merge-pairs(l[2.. ]))
That this does indeed implement the described two-pass left-to-right then right-to-left merging strategy can be seen from this reduction:
   merge-pairs([H1, H2, H3, H4, H5, H6, H7])
=> merge(merge(H1, H2), merge-pairs([H3, H4, H5, H6, H7]))
     # merge H1 and H2 to H12, then the rest of the list
=> merge(H12, merge(merge(H3, H4), merge-pairs([H5, H6, H7])))
     # merge H3 and H4 to H34, then the rest of the list
=> merge(H12, merge(H34, merge(merge(H5, H6), merge-pairs([H7]))))
     # merge H5 and H6 to H56, then the rest of the list
=> merge(H12, merge(H34, merge(H56, H7)))
     # switch direction, merge the last two resulting heaps, giving H567
=> merge(H12, merge(H34, H567))
     # merge the last two resulting heaps, giving H34567
=> merge(H12, H34567)
     # finally, merge the first merged pair with the result of merging the rest
=> H1234567
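The pairing heap operations translate to C roughly as follows. Unlike the purely functional description above, this sketch mutates nodes in place and stores the list of subheaps as a child pointer plus sibling links; those are assumptions made for the example, and the names PNode, pheap_merge, pheap_insert, pheap_find_min and pheap_delete_min are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* A heap is a pointer to its root; NULL is the empty heap. */
typedef struct PNode {
    int elem;
    struct PNode *child;     /* first subheap */
    struct PNode *sibling;   /* next subheap in the parent's list */
} PNode;

/* merge: the root with the larger element becomes the first subheap of the other. */
PNode *pheap_merge(PNode *h1, PNode *h2) {
    if (h1 == NULL) return h2;
    if (h2 == NULL) return h1;
    if (h1->elem < h2->elem) {
        h2->sibling = h1->child;    /* h2 :: h1.subheaps */
        h1->child = h2;
        return h1;
    }
    h1->sibling = h2->child;        /* h1 :: h2.subheaps */
    h2->child = h1;
    return h2;
}

/* insert: merge with a new one-element heap. */
PNode *pheap_insert(PNode *h, int elem) {
    PNode *n = malloc(sizeof *n);
    assert(n != NULL);
    n->elem = elem;
    n->child = n->sibling = NULL;
    return pheap_merge(n, h);
}

/* find-min: the root element (the heap must not be empty). */
int pheap_find_min(const PNode *h) {
    assert(h != NULL);
    return h->elem;
}

/* merge-pairs: pair up subheaps left to right, then combine right to left. */
static PNode *merge_pairs(PNode *first) {
    if (first == NULL) return NULL;
    PNode *second = first->sibling;
    if (second == NULL) return first;
    PNode *rest = second->sibling;
    first->sibling = NULL;          /* detach the pair from the list */
    second->sibling = NULL;
    return pheap_merge(pheap_merge(first, second), merge_pairs(rest));
}

/* delete-min: drop the root and merge its subheaps in pairs. */
PNode *pheap_delete_min(PNode *h) {
    assert(h != NULL);
    PNode *subheaps = h->child;
    free(h);
    return merge_pairs(subheaps);
}
```

Repeatedly calling pheap_find_min and pheap_delete_min returns the stored elements in ascending order, which gives a simple heap sort.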
7.6. RECOMMENDED QUESTIONS
1. Define a priority queue.
2. Define a deque (double-ended queue).
3. What is the need for a priority queue?
4. What are the applications of priority queues?
5. What are binomial heaps?
6. Explain height-biased leftist trees.
UNIT – 8 : EFFICIENT BINARY SEARCH TREES
8.1. Optimal Binary Search Trees: An optimal binary search tree is a binary search tree for which the nodes are arranged on levels such that the tree cost is minimum. For the purpose of a better presentation of optimal binary search trees, we will consider "extended binary search trees", which have the keys stored at their internal nodes. Suppose n keys k1, k2, …, kn are stored at the internal nodes of a binary search tree. It is assumed that the keys are given in sorted order, so that k1 < k2 < … < kn. An extended binary search tree is obtained from the binary search tree by adding successor nodes to each of its terminal nodes, as indicated in the following figure by squares:
In the extended tree:
The squares represent terminal nodes. These terminal nodes represent unsuccessful searches of the tree for key values. The searches did not end successfully because they represent key values that are not actually stored in the tree.
The round nodes represent internal nodes; these are the actual keys stored in the tree.
Assuming that the relative frequency with which each key value is accessed is known, weights can be assigned to each node of the extended tree (p1 … p6). They represent the relative frequencies of searches terminating at each node, that is, they mark the successful searches.
If the user searches for a particular key in the tree, 2 cases can occur:
1 – The key is found, so the corresponding weight 'p' is incremented;
2 – The key is not found, so the corresponding 'q' value is incremented.
GENERALIZATION: the terminal node in the extended tree that is the left successor of k1 can be interpreted as representing all key values that are not stored and are less than k1. Similarly, the terminal node in the extended tree that is the right successor of kn represents all key values not stored in the tree that are greater than kn. The terminal node that is visited between k(i-1) and ki in an inorder traversal represents all key values not stored that lie between k(i-1) and ki.
An obvious way to find an optimal binary search tree is to generate each possible binary search tree for the keys, calculate the weighted path length, and keep the tree with the smallest weighted path length. This search through all possible solutions is not feasible, since the number of such trees grows exponentially with n.
An alternative would be a recursive algorithm. Consider the characteristics of any optimal tree. Of course it has a root and two subtrees. Both subtrees must themselves be optimal binary search trees with respect to their keys and weights. First, any subtree of any binary search tree must be a binary search tree. Second, the subtrees must also be optimal. Since there are n possible keys as candidates for the root of the optimal tree, the recursive solution must try them all. For each candidate key as root, all keys less than that key must appear in its left subtree while all keys greater than it must appear in its right subtree. Stating the recursive algorithm based on these observations requires some notation:
OBST(i, j) denotes the optimal binary search tree containing the keys $k_{i+1}, \ldots, k_j$ (it has no keys when i = j).
$W_{i,j}$ denotes the weight matrix for OBST(i, j). $W_{i,j}$ can be defined using the following formula:
$W_{i,j} = \sum_{k=i+1}^{j} p_k + \sum_{k=i}^{j} q_k$
$C_{i,j}$, $0 \le i \le j \le n$, denotes the cost matrix for OBST(i, j). $C_{i,j}$ can be defined recursively, in the following manner:
$C_{i,i} = W_{i,i}$
$C_{i,j} = W_{i,j} + \min_{i < k \le j} \left( C_{i,k-1} + C_{k,j} \right)$
8.2. AVL Trees
Definitions
Named after Adelson-Velskii and Landis.
Trees of height O(log n) are said to be balanced. AVL trees are a special case in which the subtrees of each node differ by at most 1 in their height. Balanced trees can be used to search, insert, and delete arbitrary keys in O(log n) time. In contrast, height-biased leftist trees rely on non-balanced trees to speed up insertions and deletions in priority queues.
Height
Claim: AVL trees are balanced.
Proof. Let $N_h$ denote the minimum number of nodes in an AVL tree of depth h. Then
$$
\begin{aligned}
N_h &\ge N_{h-1} + N_{h-2} + 1 \\
    &> 2N_{h-2} + 1 \\
    &> 1 + 2(1 + 2N_{h-4}) = 1 + 2 + 2^2 N_{h-4} \\
    &> 1 + 2 + 2^2 + 2^3 N_{h-6} \\
    &\;\;\vdots \\
    &> 1 + 2 + 2^2 + \cdots + 2^{h/2 - 1} = 2^{h/2} - 1
\end{aligned}
$$
Hence, for a tree with n nodes,
$2^{h/2} - 1 < n$
$h/2 < \log_2(n + 1)$
$h < 2 \log_2(n + 1)$
A more careful analysis, based on the theory of Fibonacci numbers, implies the tighter bound of $1.44 \log_2(n + 2)$.
Rotations
[Figures: the LL, RR, LR and RL rotations]
Insertions and Deletions
Insertions and deletions are performed as in binary search trees, and followed by rotations to correct imbalances in the outcome trees. In the case of insertions, one rotation is sufficient. In the case of deletions, at most O(log n) rotations are needed, from the first point of discrepancy going up toward the root.
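Insertion with rebalancing might be sketched in C as below. This is a compact illustration rather than the notes' own code: each node stores its height (an empty tree has height 0, a leaf height 1), duplicates are ignored, an LR or RL double rotation is counted as one correction, and the names ANode and avl_insert are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct ANode {
    int key;
    int height;                  /* height of the subtree rooted here */
    struct ANode *left, *right;
} ANode;

static int height(const ANode *t) { return t ? t->height : 0; }

static void update(ANode *t) {
    int hl = height(t->left), hr = height(t->right);
    t->height = 1 + (hl > hr ? hl : hr);
}

/* Single right rotation: fixes an LL imbalance at y. */
static ANode *rotate_right(ANode *y) {
    ANode *x = y->left;
    y->left = x->right;
    x->right = y;
    update(y);
    update(x);
    return x;
}

/* Single left rotation: fixes an RR imbalance at x. */
static ANode *rotate_left(ANode *x) {
    ANode *y = x->right;
    x->right = y->left;
    y->left = x;
    update(x);
    update(y);
    return y;
}

/* Insert key as in a binary search tree, then rebalance on the way back up. */
ANode *avl_insert(ANode *t, int key) {
    if (t == NULL) {
        ANode *n = malloc(sizeof *n);
        assert(n != NULL);
        n->key = key;
        n->height = 1;
        n->left = n->right = NULL;
        return n;
    }
    if (key < t->key)      t->left = avl_insert(t->left, key);
    else if (key > t->key) t->right = avl_insert(t->right, key);
    else                   return t;           /* duplicate: nothing to do */

    update(t);
    int balance = height(t->left) - height(t->right);
    if (balance > 1) {                          /* left subtree too high */
        if (key > t->left->key)                 /* LR: first rotate the child */
            t->left = rotate_left(t->left);
        return rotate_right(t);                 /* LL */
    }
    if (balance < -1) {                         /* right subtree too high */
        if (key < t->right->key)                /* RL: first rotate the child */
            t->right = rotate_right(t->right);
        return rotate_left(t);                  /* RR */
    }
    return t;
}
```

Inserting the keys 1 to 7 in ascending order, which would produce a chain in a plain binary search tree, yields a perfectly balanced tree with root 4.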
[Figure: Delete 4. Imbalance at '3' implies an LL rotation with '2'. Imbalance at '5' implies an RR rotation with '8'.]
8.3. Red-black Trees
Properties
A red-black tree is a binary search tree in which:
1. The root is colored black.
2. All the paths from the root to the leaves agree on the number of black nodes.
3. No path from the root to a leaf may contain two consecutive nodes colored red.
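The three properties can be verified mechanically. The C sketch below checks only the coloring rules, not the search-tree ordering, and counts an empty subtree as a black node; the names RBNode, Color and rb_is_valid are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { RED, BLACK } Color;

typedef struct RBNode {
    int key;
    Color color;
    struct RBNode *left, *right;
} RBNode;

/* Returns the number of black nodes on every path from t down to an empty
   subtree (the empty subtree itself counts as one black node), or -1 if the
   paths disagree or a red node has a red parent. */
static int black_height(const RBNode *t, int parent_is_red) {
    if (t == NULL) return 1;
    assert(t->color == RED || t->color == BLACK);  /* guard against bad data */
    if (t->color == RED && parent_is_red) return -1;       /* property 3 */
    int l = black_height(t->left, t->color == RED);
    int r = black_height(t->right, t->color == RED);
    if (l < 0 || r < 0 || l != r) return -1;                /* property 2 */
    return l + (t->color == BLACK ? 1 : 0);
}

/* Returns 1 if the tree satisfies the three coloring properties. */
int rb_is_valid(const RBNode *root) {
    if (root == NULL) return 1;
    if (root->color != BLACK) return 0;                     /* property 1 */
    return black_height(root, 0) > 0;
}
```

A checker like this is handy as a debugging aid after every insertion or deletion while developing the rebalancing code.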
Empty subtrees of a node are treated as subtrees with roots of black color. The relation $n > 2^{h/2} - 1$ implies the bound $h < 2 \log_2(n + 1)$.
Insertions
1. Insert the new node the way it is done in binary search trees.
2. Color the node red.
3. If a discrepancy arises for the red-black tree, fix the tree according to the type of discrepancy.
A discrepancy can result from a parent and a child both having a red color. The type of discrepancy is determined by the location of the node with respect to its grandparent, and the color of the sibling of the parent. Discrepancies in which the sibling is red are fixed by changes in color. Discrepancies in which the sibling is black are fixed through AVL-like rotations. Changes in color may propagate the problem up toward the root. On the other hand, at most one rotation is sufficient for fixing a discrepancy.
[Figure: LLr. If 'A' is the root, then it should be repainted black.]
[Figure: LRr. If 'A' is the root, then it should be repainted black.]
[Figure: LLb]
[Figure: LRb]
Discrepancies of type RRr, RLr, RRb, and RLb are handled in a similar manner.
[Figure: insert 1]
[Figure: insert 2]
[Figure: insert 3, RRb discrepancy]
inser t 4 RR r discr e pancy
inser t 5 RR b discr e pancy Deletions
Delete a k ey, and a node, the way it is done in binar y sear ch tr ees. A node to be deleted will have at most one child. If the deleted node is r ed, the tr ee is still a r ed black tr ee. If the deleted node has a r ed child, r e paint the child to black . If a discr e pancy ar ises f or the r ed- black tr ee, f ix the tr ee accor ding to the ty pe of discr e pancy. A discr e pancy can r esult only f r om a loss of a black node.
Let A denote the lowest node with unbalanced subtrees. The type of discrepancy is determined by the location of the deleted node (Right or Left), the color of the sibling (black or red), the number of red children in the case of black siblings, and the number of grandchildren in the case of red siblings. In the case of discrepancies which result from the addition of nodes, the correction mechanism may propagate the color problem (i.e., parent and child painted red) up toward the root, where it is stopped on the way by a single rotation. Here, in the case of discrepancies which result from the deletion of nodes, the discrepancy of a missing black node may propagate toward the root, and is stopped on the way by an application of an appropriate rotation.
Rb0: change of color, sending the deficiency up to the root of the subtree
Rb1
Rb2
Rr0: might result in an LLb discrepancy, with parent and child both colored red
Rr1
Rr2
Similar transformations apply to Lb0, Lb1, Lb2, Lr0, Lr1, and Lr2.
8.4. Splay Trees :
Splay Trees (self-adjusting search trees): These notes just describe the bottom-up splaying algorithm, the proof of the access lemma, and a few applications. Every time a node is accessed in a splay tree, it is moved to the root of the tree. The amortized cost of the operation is O(log n). Just moving the element to the root by rotating it up the tree does not have this property. In splay trees the movement is done in a very special way that guarantees this amortized bound. I'll describe the algorithm by giving three rewrite rules in the form of pictures. In these pictures, x is the node that was accessed (that will eventually be at the root of the tree). By looking at the local structure of the tree defined by x, x's parent, and x's grandparent we decide which of the following three rules to follow. We continue to apply the rules until x is at the root of the tree:
Zig (terminal case):

        y              x
       /     ====>      \
      x                  y

Zig-zag:

        z               z
       /               /                 x
      y     ====>     x      ====>      / \
       \             /                 y   z
        x           y

Zig-zig:

          z                              x
         /              y                 \
        y    ====>     / \     ====>       y
       /              x   z                 \
      x                                      z
Notes
(1) Each rule has a mirror image variant, which covers all the cases.
(2) The zig-zig rule is the one that distinguishes splaying from just rotating x to the root of the tree.
(3) Top-down splaying is much more efficient in practice. Code for doing this is on my web site (www.cs.cmu.edu/~sleator).
Here are some examples:

                6                     0                         3
               /                       \                       / \
              5                         5                     /   \
             /                         / \                   /     \
            4         splay(0)        3   6     splay(3)    0       5
           /          =======>       / \        =======>     \     / \
          3                         1   4                     1   4   6
         /                           \                         \
        2                             2                         2
       /
      1
     /
    0

To analyze the performance of splaying, we start by assuming that each node x has a weight w(x) > 0. These weights can be chosen arbitrarily. For each assignment of weights we will be able to derive a bound on the cost of a sequence of accesses. We can choose the assignment that gives the best bound. By giving the frequently accessed elements a high weight, we will be able to get tighter bounds on the running time. Note that the weights are only used in the analysis, and do not change the algorithm at all. (A commonly used case is to assign all the weights to be 1.)
Before we can state our performance lemma, we need to define two more quantities. The size of a node x (denoted s(x)) is the total weight of all the nodes in the subtree rooted at x. The rank of a node x (denoted r(x)) is the floor of the base-2 logarithm of the size of x. Restating these:

    s(x) = Sum (over y in the subtree rooted at x) of w(y)
    r(x) = floor(log2(s(x)))

For each node x, we'll keep r(x) tokens on that node. (Alternatively, the potential function will just be the sum of the ranks of all the nodes in the tree.) Here's an example to illustrate this: Here's a tree, labeled with sizes on the left and ranks on the right.

              9o                        3o
              /                         /
            8o                        3o
            / \                       / \
          6o   o1                   2o   o0
          / \                       / \
        2o   o3                   1o   o1
        /   /                     /   /
      1o   o2                   0o   o1
            \                         \
             o1                        o0
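The definitions of size, rank and potential translate directly into code. The following is a small sketch in C, assuming positive integer weights; the Node layout and the names size_of, floor_log2, rank_of and potential are chosen here for illustration and do not come from the notes. For clarity it recomputes sizes recursively, which is quadratic in the worst case and is only meant for checking small examples by hand.

```c
#include <stddef.h>

typedef struct Node {
    struct Node *left, *right;
    unsigned weight;            /* w(x); assumed to be a positive integer here */
} Node;

/* s(x): total weight of the subtree rooted at x (0 for an empty subtree). */
unsigned size_of(const Node *x) {
    return x ? x->weight + size_of(x->left) + size_of(x->right) : 0;
}

/* floor(log2(s)) for s >= 1, computed with integer shifts. */
unsigned floor_log2(unsigned s) {
    unsigned r = 0;
    while (s >>= 1) r++;
    return r;
}

/* r(x) = floor(log2(s(x))); x must not be NULL. */
unsigned rank_of(const Node *x) {
    return floor_log2(size_of(x));
}

/* Potential of a tree: the sum of the ranks of all its nodes. */
unsigned potential(const Node *x) {
    return x ? rank_of(x) + potential(x->left) + potential(x->right) : 0;
}
```

Building the nine-node example tree with all weights equal to 1 and calling potential on its root gives 11, the sum of the ranks shown in the right-hand picture (3 + 3 + 2 + 0 + 1 + 1 + 0 + 1 + 0).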
Notes about this potential:
(1) Doing a rotation between a pair of nodes x and y only affects the ranks of the nodes x and y, and no other nodes in the tree. Furthermore, if y was the root before the rotation, then the rank of y before the rotation equals the rank of x after the rotation.
(2) Assuming all the weights are 1, the potential of a balanced tree is O(n), and the potential of a long chain (most unbalanced tree) is O(n log n).
(3) In the banker's view of amortized analysis, we can think of having r(x) tokens on node x.
Access lemma: The amortized number of splaying steps done when splaying node x in a tree with root t is at most 3(r(t) - r(x)) + 1.
Proof: As we do the work, we must pay one token for each splay step we do. Furthermore we must make sure that there are always r(x) tokens on node x as we do our restructuring. We are going to allocate 3(r(t) - r(x)) + 1 tokens to do the splay. Our job in the proof is to show that this is enough.
First we need the following observation about ranks, called the Rank Rule. Suppose that two siblings have the same rank, r. Then the parent has rank at least r+1. This is because if the rank is r, then the size is at least 2^r. If both siblings have size at least 2^r, then the total size is at least 2^(r+1) and we conclude that the rank is at least r+1. We can represent this with the following diagram:

          x
         / \        Then x >= r+1
        r   r

Conversely, suppose we find a situation where a node and its parent have the same rank, r. Then the other sibling of the node must have rank < r. So if we have three nodes configured as follows, with these ranks:

          r
         / \        Then x < r
        x   r

Now we can go back to proving the lemma. The approach we take is to show that 3(r'(x) - r(x)) tokens are sufficient to pay for a zig-zag or a zig-zig step, and that 3(r'(x) - r(x)) + 1 is sufficient to pay for the zig step. (Here r'() represents the rank function after the step, and r() represents the rank function before the step.) When we sum these costs to compute the amortized cost for the entire splay operation, they telescope to:
    3(r(t) - r(x)) + 1.

Note that the +1 comes from the zig step, which can happen only once. Here's an example, the labels are ranks:

              2o                            2x
              /                             / \
            2o                            0o   o2
            /                                 /
          2o          splay(x)              2o
          /          ==========>            / \
        2x                                1o   o0
        / \                               /
      0o   o1                           0o
          /
         o0
    ----------------                    ----------------
    Total:         9                                   7

    We allocated:                        3(2-2)+1 = 1
    extra tokens from restructuring:          9-7 = 2
                                                  -----
                                                    3

There are 2 splay steps. So 3 > 2, and we have enough. It remains to show these bounds for the individual steps. There are three cases, one for each of the types of splay steps.
Zig:

      r o                 o r
       /       ==>         \
    a o                     o b <= r

The actual cost is 1, so we spend one token on this. We take the tokens on a and augment them with another r-a and put them on b. Thus the total number of tokens needed is 1+r-a. This is at most 1+3(r-a).
Zig-zag: We'll split it into 2 cases:
Case 1: The rank does not increase between the starting node and ending node of the step.

      r o                    o r
       /                    / \
    r o        ===>      a o   o b
       \
      r o

By the Rank Rule, one of a or b must be < r, so one token is released from the data structure. We use this to pay for the work. Thus, our allocation of 3(r-r) = 0 is sufficient to pay for this.
Case 2: (The rank does increase)

      r o                    o r
       /                    / \
    b o        ===>      c o   o d
       \
      a o

The tokens on c can be supplied from those on b. (There are enough there because b >= c.) Note that r-a > 0. So:
    use r-a (which is at least 1) to pay for the work
    use r-a to augment the tokens on a to make enough for d
Summing these two gives: 2(r-a). But 2(r-a) <= 3(r-a), which completes the zig-zag case.
Zig-zig: Again, we split into two cases.
Case 1: The rank does not increase between the starting node and the ending node of the step.

        r o               r o                  o r
         /                / \                   \
      r o     =====>   r o   o d    ====>        o c <= r
       /                                          \
    r o                 (d < r)                    o d < r

By the Rank Rule applied to the middle tree, d < r, so at least one token is released from the data structure. We use this to pay for the work, and our allocation of 3(r-r) = 0 is sufficient.
Case 2: (The rank does increase)

        r o                                    o r
         /                                      \
      b o      ============>                     o c <= r
       /                                          \
    a o                                            o d <= r

    use r-a (which is at least 1) to pay for the work
    use r-a to boost the tokens on a to cover those needed for d
    use r-a to boost the tokens on b to cover those needed for c
Summing these gives 3(r-a), which completes the analysis of the zig-zig case.
This completes the proof of the access lemma.
Balance Theorem: A sequence of m splays in a tree of n nodes takes time O(m log(n) + n log(n)).
Proof: We apply the access lemma with all the weights equal to 1. For a given splay, r(t) <= log(n), and r(x) >= 0. So the amortized cost of the splay is at most:
    3 log(n) + 1
We now switch to the world of potentials (potential = total tokens in the tree). To bound the cost of the sequence we add this amount for each splay, then add the initial minus the final potential. The initial potential is at most n log(n), and the final potential is at least 0. This gives a bound of:
    m (3 log(n) + 1) + n log(n) = O(m log(n) + n log(n))
Splaying
Splay step at x
let p(x) = parent of node x
case 1 (zig): p(x) is the root of the tree
case 2 (zig-zig): p(x) is not the root, and x and p(x) are both left (right) children
case 3 (zig-zag): p(x) is not the root, and x is a left (right) child and p(x) is a right (left) child
To splay a node X, repeat the splay step on X until it is the root.
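The three cases of the splay step can be sketched in C with parent pointers as follows. This is an illustrative bottom-up version, not the top-down code mentioned earlier in these notes; the Node layout and the names rotate_up and splay are assumptions, and the function returns the number of splay steps it performed so that the count can be compared against the bounds in the analysis.

```c
#include <stddef.h>

typedef struct Node {
    int key;
    struct Node *left, *right, *parent;
} Node;

/* Single rotation that moves x above its parent p. */
static void rotate_up(Node *x) {
    Node *p = x->parent, *g = p->parent;
    if (p->left == x) {
        p->left = x->right;
        if (x->right) x->right->parent = p;
        x->right = p;
    } else {
        p->right = x->left;
        if (x->left) x->left->parent = p;
        x->left = p;
    }
    p->parent = x;
    x->parent = g;
    if (g != NULL) {
        if (g->left == p) g->left = x;
        else g->right = x;
    }
}

/*
 * Repeat the splay step on x until it is the root.
 * Returns the number of splay steps; afterwards x->parent == NULL,
 * so the caller should treat x as the new root of the tree.
 */
int splay(Node *x) {
    int steps = 0;
    while (x->parent != NULL) {
        Node *p = x->parent, *g = p->parent;
        if (g == NULL) {
            rotate_up(x);                 /* case 1 (zig) */
        } else if ((g->left == p) == (p->left == x)) {
            rotate_up(p);                 /* case 2 (zig-zig): parent first, */
            rotate_up(x);                 /* then x                          */
        } else {
            rotate_up(x);                 /* case 3 (zig-zag): x twice */
            rotate_up(x);
        }
        steps++;
    }
    return steps;
}
```

In the zig-zig case the parent is rotated before x; rotating x twice instead would give plain move-to-root, which does not have the amortized bound. Applied to a left chain with keys 6 down to 0, splay on the node with key 0 takes three zig-zig steps, and a following splay on the node with key 3 takes one zig-zag step.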
[Figures: splay step for Case 1, Case 2 and Case 3; example: Splay B]
Splay vs. Move-to-root
[Figures: Move-to-root A; Splay A]
Performance of Splay Tree
Splaying at a node of depth d takes Theta(d) time.
c_k = actual cost of operation k
ĉ_k = amortized cost of operation k
D_k = the state of the data structure after applying the k'th operation to D_(k-1)
Φ(D_k) = potential associated with D_k
so we get:
    ĉ_k = c_k + Φ(D_k) - Φ(D_(k-1))
The actual amount of work required for a sequence of m operations is given by:
    Sum (k = 1 to m) of c_k = Sum (k = 1 to m) of ĉ_k + Φ(D_0) - Φ(D_m)
So we need the total amortized work and the difference in potential.
Potential for Splay Trees
Let:
w(x) = weight of node x, a fixed but arbitrary value
size(x) = Sum (over y in the subtree rooted at x) of w(y)
rank(x) = lg(size(x))
Example
Let w(x) = 1/n where n is the number of nodes in the tree
Lemma
The amortized time to splay node x in a tree with root at t is at most 3(r(t) - r(x)) + 1 = O(lg(s(t)/s(x)))
Let s, r denote the size, rank functions before a splay
Let s', r' denote the size, rank functions after a splay
Count rotations
Case 1 (zig) One rotation
Amortized time of this step is:
       1 + [r'(x) + r'(y)] - r(x) - r(y)        only x and y change rank
    <= 1 + r'(x) - r(x)                         r(y) >= r'(y)
    <= 1 + 3(r'(x) - r(x))                      r'(x) >= r(x)
Case 2 (zig-zig) Two rotations
Amortized time of this step is:
       2 + r'(x) + r'(y) + r'(z) - r(x) - r(y) - r(z)      only x, y, z change rank
     = 2 + r'(y) + r'(z) - r(x) - r(y)                      r'(x) = r(z)
    <= 2 + r'(x) + r'(z) - 2r(x)                            r'(x) >= r'(y) and r(y) >= r(x)
Assume that 2r'(x) - r(x) - r'(z) >= 2. Then
       2 + r'(x) + r'(z) - 2r(x) <= 2r'(x) - r(x) - r'(z) + r'(x) + r'(z) - 2r(x) = 3r'(x) - 3r(x)
Need to show 2r'(x) - r(x) - r'(z) >= 2
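Claim 1
If a > 0, b > 0 and a + b <= 1, then lg(a) + lg(b) <= -2.
Since lg is increasing, the sum is largest when a + b = 1, so set b = 1-a.
We have
    d/da [lg(a) + lg(1-a)] = (1/a - 1/(1-a)) / ln(2)
Setting this to 0 to find the extreme value we get
    1/a = 1/(1-a)
so
    a = 1 - a
that is a = 1/2 and b = 1/2
but lg(1/2) + lg(1/2) = -2
End claim 1
Claim 2
2r'(x) - r(x) - r'(z) >= 2
Recall that:
    r(x) = lg(s(x))
We have:
    r(x) + r'(z) - 2r'(x) = lg(s(x)) + lg(s'(z)) - 2lg(s'(x)) = lg(s(x)/s'(x)) + lg(s'(z)/s'(x))
Now s(x) + s'(z) <= s'(x) (Why? The subtree of x before the step and the subtree of z after the step are disjoint, and both lie inside the subtree of x after the step.)
so
    0 <= s(x)/s'(x) + s'(z)/s'(x) <= 1
Set s(x)/s'(x) = a and s'(z)/s'(x) = b in claim 1 to get
    lg(s(x)/s'(x)) + lg(s'(z)/s'(x)) <= -2
that is, 2r'(x) - r(x) - r'(z) >= 2.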
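Claim 1 can also be cross-checked without calculus. The short derivation below is an alternative route added here for illustration, assuming only a, b > 0 and a + b <= 1; it uses the arithmetic-geometric mean inequality rather than the derivative argument of the claim.

```latex
\begin{align*}
ab &\le \left(\frac{a+b}{2}\right)^{2} \le \left(\frac{1}{2}\right)^{2} = \frac{1}{4}, \\
\lg a + \lg b &= \lg(ab) \le \lg\frac{1}{4} = -2 .
\end{align*}
```

Equality holds exactly when a = b = 1/2, which agrees with the extreme value found in the claim.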