CS-305 Data Structure
Contents:
- Introduction
- Types
- Array
- Stack
- Queue
- Linked List
- Trees
- Graphs

Computer Science is the study of data, its representation and its transformation by a computer. For every data object, we consider the class of operations to be performed and then the way to represent the object so that these operations may be carried out efficiently. We require two techniques for this:
- Devise alternative forms of data representation.
- Analyse the algorithm which operates on the structure.

Several terms used above need to be understood carefully before we proceed. These include data structure, data type and data representation.

A data type is a term which refers to the kinds of data that variables may hold. Every programming language has a set of built-in data types. This means that the language allows variables to name data of that type and provides a set of operations which meaningfully manipulate these variables. Some data types are easy to provide because they are built into the computer's machine language instruction set, such as integer and character. Other data types require considerably more effort to implement. Some languages have features which allow one to construct combinations of the built-in types (like structures in C). Such a mechanism is necessary to create new complex data types which are not provided by the programming language. The new type must also be meaningful for manipulation. Such meaningful data types are referred to as abstract data types.

ABSTRACT DATA TYPE:
An abstract data type (ADT) can be regarded as a mathematical model with a collection of operations defined on that model, i.e. an ADT is a new data type derived or created from basic or built-in data types based on a particular logical or mathematical model.

For example, a set of integers consisting of different numbers may be an ADT. A set is a combination of more than one integer, and the operations on a set, such as union, intersection, product and difference, are generalized operations on its integers. The set ADT thus encapsulates these mathematical operations and generalizes them.

Basic properties of an ADT are:
(i) Encapsulation
(ii) Generalization

Let us consider the following example:

    struct student
    {
        int rno;
        char name[20], branch[10];
        int marks;
    };

The above structure can be used to store or retrieve the information of a student. The structure can be called an ADT if all the operations on a student can be performed using the structure.
DATA STRUCTURE:

An implementation of an abstract data type is a data structure, i.e. a mathematical or logical model of a particular organization of data is called a data structure. Thus, a data structure is the portion of memory allotted for a model, in which the required data can be arranged in a proper fashion.

Types:

A data structure can be broadly classified into
(i) Primitive data structure
(ii) Non-primitive data structure

(i) Primitive data structure: The data structures that are directly operated upon by machine-level instructions, i.e. the fundamental data types such as int, float and double in the case of C, are known as primitive data structures.

(ii) Non-primitive data structure: The data structures which are not primitive are called non-primitive data structures. There are two types of non-primitive data structures.

(a) Linear data structures: A list which shows the relationship of adjacency between elements is said to be a linear data structure. The simplest linear data structure is a 1-D array, but because of its deficiencies, a list is frequently used for different kinds of data.

(b) Non-linear data structures: A list which does not show the relationship of adjacency between elements is said to be a non-linear data structure.

Linear Data Structure:
A list is an ordered collection of data items connected by means of links or pointers. This type of list is also called a linked list. A linked list may be a single linked list or a double linked list.
• Single linked list: - A single linked list is traversed among the nodes in one direction only.
• Double linked list: - A double linked list can be traversed among the nodes in both directions.

A linked list is commonly used to represent data in word-processing applications and is also applied in different DBMS packages.

A list has two subsets. They are:
• Stack: - It is also called a last-in-first-out (LIFO) system. It is a linear list in which insertion and deletion take place only at one end. It is used to evaluate different expressions.
• Queue: - It is also called a first-in-first-out (FIFO) system. It is a linear list in which insertion takes place at one end and deletion at the other.

The frequently used non-linear data structures are:
(a) Trees: - A tree maintains a hierarchical relationship between various elements.
(b) Graphs: - A graph maintains a random or point-to-point relationship between various elements.

OPERATION ON DATA STRUCTURES:
The four major operations performed on data structures are:
(i) Insertion: - Insertion means adding new details or a new node into the data structure.
(ii) Deletion: - Deletion means removing a node from the data structure.
(iii) Traversal: - Traversing means accessing each node exactly once so that the nodes of a data structure can be processed. Traversing is also called visiting.
(iv) Searching: - Searching means finding the location of the node for a given key value.
Apart from the four operations mentioned above, there are two more operations occasionally performed on data structures. They are:
(a) Sorting: - Sorting means arranging the data in a particular order.
(b) Merging: - Merging means joining two lists.

REPRESENTATION OF DATA STRUCTURES:

Any data structure can be represented in two ways. They are:
(i) Sequential representation
(ii) Linked representation

(i) Sequential representation: - A sequential representation maintains the data in contiguous memory locations. Retrieval takes less time, but insertion and deletion are costly. Because of the sequential nature, a slot must be freed before a new element can be inserted at a particular position of the list. To free that slot, one must shift the data of the list towards the right, starting from the position where the data has to be inserted. Thus, the time taken by the CPU to shift the data is much higher than that of the insertion itself and adds to the complexity of the algorithm. Similarly, while deleting an item from the list, one must shift the following data items towards the left, which again consumes CPU time.

Drawback of sequential representation: The major drawback of sequential representation is that insertion and deletion take much time because of the shifting, which increases the complexity of the algorithm.

(ii) Linked representation: - Linked representation maintains the list by means of links between adjacent elements, which need not be stored in contiguous memory locations. During insertion and deletion, links are created or removed between nodes, which takes less time than the corresponding operations of sequential representation. Because of this advantage, linked representation is generally preferred.
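The cost of shifting in the sequential representation can be made concrete with a small sketch. The function names seq_insert and seq_delete, the 0-based positions and the returned shift count are assumptions made for this illustration and are not part of the notes.

```c
#include <assert.h>
#include <stddef.h>

/* Insert value at position pos (0-based) of list[0..*n-1].
   capacity is the size of the array. Returns the number of elements
   that had to be shifted right, or -1 on overflow or a bad position. */
int seq_insert(int list[], size_t *n, size_t capacity, size_t pos, int value)
{
    assert(list != NULL && n != NULL);
    if (*n >= capacity || pos > *n)
        return -1;
    int shifted = 0;
    for (size_t i = *n; i > pos; i--) {   /* shift right, starting at the end */
        list[i] = list[i - 1];
        shifted++;
    }
    list[pos] = value;
    (*n)++;
    return shifted;
}

/* Delete the element at position pos (0-based). Returns the number of
   elements that had to be shifted left, or -1 on a bad position. */
int seq_delete(int list[], size_t *n, size_t pos)
{
    assert(list != NULL && n != NULL);
    if (pos >= *n)
        return -1;
    int shifted = 0;
    for (size_t i = pos; i + 1 < *n; i++) {   /* shift left over the gap */
        list[i] = list[i + 1];
        shifted++;
    }
    (*n)--;
    return shifted;
}
```

Inserting near the front of a list of n elements moves almost n elements, whereas a linked list would only change two links.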
Algorithm Analysis:

An algorithm is a finite set of instructions that, if followed, accomplishes a particular task. In addition, all algorithms must satisfy the following criteria:
1. Input
2. Output
3. Definiteness
4. Finiteness
5. Effectiveness

Criteria 1 and 2 require that an algorithm has zero or more inputs and produces one or more outputs. According to criterion 3, each operation must be definite, so that it is perfectly clear what should be done. According to criterion 4, the algorithm should terminate after a finite number of operations. According to criterion 5, every instruction must be basic enough that it can be carried out by a person using only pencil and paper.

There may be many algorithms devised for an application, and we must analyse and validate them to judge the suitable one. The most important factors in judging an algorithm are those that have a direct relationship to its performance. These have to do with computing time and storage requirements (referred to as time complexity and space complexity).

Space Complexity: The space complexity of an algorithm is the amount of memory it needs to run.

Time Complexity: The time taken by a program is the sum of the compile time and the run time. The time complexity of an algorithm is given by the number of steps taken by the algorithm to compute the function it was written for.

[NOTE: an example of how to calculate computing time will be given later]
Storage Structure for Arrays:

An array is a set of homogeneous data items stored in contiguous memory locations, referred to by a common name and a sequence of indices (starting from 0 in C). An array is the simplest data structure that makes use of a computed address to locate its elements. The size of an array is fixed, and it therefore requires a fixed number of memory locations. Suppose A is an array of n elements and its starting address is given; then the location of element i will be

$$\mathrm{LOC}(A[i]) = \text{Base address of } A + (i - 1) \times W$$

where W is the width of each element. The formula counts subscripts from 1; when indices start from 0, the term (i - 1) becomes i.

A multidimensional array can be represented by an equivalent one-dimensional array. A two-dimensional array consisting of a number of rows and columns is a combination of more than one one-dimensional array. A two-dimensional array can be stored in two different ways, taking either the row or the column as the major order. If we consider the row as the major order, then the elements are stored row by row, and the addressing function is

$$\mathrm{LOC}(A[r][c]) = \text{Base address of } A + [(r - 1) \times N + (c - 1)] \times W$$

where r and c are subscripts, N is the number of columns per row, and W is the width of each element. If we consider the column as the major order, then the elements are stored column by column.
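The addressing functions can be checked with a few lines of C. This is only a sketch: the function names are made up for the illustration, subscripts are counted from 1 as in the formulas, and the column-major function is the standard counterpart of the row-major one (the notes do not state it), with n_rows standing for the number of rows.

```c
#include <assert.h>
#include <stddef.h>

/* Address of A[i] in a one-dimensional array, subscripts from 1. */
size_t loc_1d(size_t base, size_t i, size_t w)
{
    assert(i >= 1);
    return base + (i - 1) * w;
}

/* Row-major address of A[r][c]; n_cols is the number of columns per row. */
size_t loc_row_major(size_t base, size_t r, size_t c, size_t n_cols, size_t w)
{
    assert(r >= 1 && c >= 1 && c <= n_cols);
    return base + ((r - 1) * n_cols + (c - 1)) * w;
}

/* Column-major address of A[r][c]; n_rows is the number of rows
   (assumed counterpart of the row-major formula). */
size_t loc_col_major(size_t base, size_t r, size_t c, size_t n_rows, size_t w)
{
    assert(r >= 1 && c >= 1 && r <= n_rows);
    return base + ((c - 1) * n_rows + (r - 1)) * w;
}
```

For instance, with base address 1000, W = 2 and 4 columns per row, element A[2][3] in row-major order lies at 1000 + (1 * 4 + 2) * 2 = 1012.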
Sparse Matrices:

Matrices with a relatively high proportion of zero or null entries are called sparse matrices. When matrices are sparse, much space and computing time can be saved if only the non-zero entries are stored explicitly, i.e. by ignoring the zero entries, the processing time and space can be minimized.

[Figure: a sparse matrix of 6 rows and 7 columns with 5 non-zero entries]
In the above matrix we have 6 rows and 7 columns. There are 5 non-zero entries out of 42 entries. An alternate form is required to represent the matrix without considering the null entries. The alternate data structure that we consider to represent a sparse matrix is a triplet. The triplet is a two-dimensional array having t+1 rows and 3 columns, where t is the total number of non-zero entries. The first row of the triplet contains the number of rows, columns and non-zero entries of the matrix in its 1st, 2nd and 3rd columns respectively. From the second row onwards it contains the row subscript, column subscript and value of a non-zero entry in its 1st, 2nd and 3rd columns respectively. Let us represent the above matrix in the following triplet of 6 rows and 3 columns:
[Table: triplet of 6 rows and 3 columns; its first row is 6 7 5]
The above triplet contains only the non-zero details, saving the space otherwise used for null entries.

[Follow the algorithms taught in class]
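As a rough illustration of the triplet form, the sketch below converts a small matrix into a triplet. The 4 x 5 matrix size, the sample values used with it and the name to_triplet are assumptions chosen for the example. They are not the matrix of the notes.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 4                       /* hypothetical matrix size */
#define COLS 5
#define MAX_TERMS (ROWS * COLS)

/* Build the triplet of matrix m.
   triplet[0] holds (number of rows, number of columns, t).
   triplet[1..t] hold (row subscript, column subscript, value),
   with subscripts counted from 1, in row-by-row order.
   Returns t, the number of non-zero entries. */
int to_triplet(int m[ROWS][COLS], int triplet[MAX_TERMS + 1][3])
{
    assert(m != NULL && triplet != NULL);
    int t = 0;
    for (int r = 0; r < ROWS; r++) {
        for (int c = 0; c < COLS; c++) {
            if (m[r][c] != 0) {
                t++;
                triplet[t][0] = r + 1;
                triplet[t][1] = c + 1;
                triplet[t][2] = m[r][c];
            }
        }
    }
    triplet[0][0] = ROWS;
    triplet[0][1] = COLS;
    triplet[0][2] = t;
    return t;
}
```

A matrix with 3 non-zero entries thus needs only (3 + 1) x 3 = 12 cells instead of 20.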
Stacks:

A stack is a linear data structure in which an element may be inserted or deleted only at one end, called the top of the stack, i.e. the elements are removed from a stack in the reverse order of that in which they were inserted. A stack follows the last-in-first-out (LIFO) principle. In stack terminology, PUSH and POP are the terms used for the insert and delete operations.

Representation of Stacks:

A stack may be represented by means of a one-way list or a linear array. Unless otherwise stated, each stack will be maintained by a linear array STACK. A variable TOP contains the location of the top element of the stack. A variable N gives the maximum number of elements that can be held by the stack. The condition TOP = NULL indicates that the stack is empty. The condition TOP = N indicates that the stack is full.

[Follow the push and pop algorithms discussed in class]
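One possible array-based version of PUSH and POP is sketched below. It is not the algorithm given in class. Because C arrays are indexed from 0, top = -1 plays the role of TOP = NULL and top = N - 1 the role of TOP = N; the struct layout, the value of N and the function names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define N 4                 /* maximum number of elements (hypothetical) */

struct stack {
    int items[N];           /* the linear array STACK */
    int top;                /* index of the top element, -1 when empty */
};

void stack_init(struct stack *s)
{
    assert(s != NULL);
    s->top = -1;
}

/* PUSH: returns false on overflow (stack full). */
bool push(struct stack *s, int value)
{
    assert(s != NULL);
    if (s->top == N - 1)
        return false;
    s->items[++s->top] = value;
    return true;
}

/* POP: returns false on underflow (stack empty). */
bool pop(struct stack *s, int *out)
{
    assert(s != NULL && out != NULL);
    if (s->top == -1)
        return false;
    *out = s->items[s->top--];
    return true;
}
```

Elements pushed in the order 1, 2, 3 are popped in the order 3, 2, 1.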
Application of Stacks:

There are two important applications of stacks:
a) Recursion
b) Arithmetic Expression

Recursion:

Recursion is an important facility in many programming languages. There are many problems whose algorithmic description is best given in a recursive manner. A function is called recursive if the function definition refers to itself, or refers to another function which in turn refers back to the same function. In order for the definition not to be circular, it must have the following properties:
(i) There must be certain arguments, called base values, for which the function does not refer to itself.
(ii) Each time the function does refer to itself, the argument of the function must be closer to a base value.
A recursive function with these two properties is said to be well defined. Let us consider the factorial of a number and its algorithm described recursively. We know that

N! = N * (N-1)!
(N-1)! = (N-1) * (N-2)!

and so on down to 1.

FACT(N)
1. if N = 1 return 1
2. else return N * FACT(N-1)
3. end

Let N be 5. Then according to the definition FACT(5) will call FACT(4), FACT(4) will call FACT(3), FACT(3) will call FACT(2), and FACT(2) will call FACT(1). Then the execution will return by finishing the execution of FACT(1), then FACT(2), and so on up to FACT(5), as described below.

1) 5! = 5 * 4!
2) 4! = 4 * 3!
3) 3! = 3 * 2!
4) 2! = 2 * 1!
5) 1! = 1
6) 2! = 2 * 1 = 2
7) 3! = 3 * 2 = 6
8) 4! = 4 * 6 = 24
9) 5! = 5 * 24 = 120
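The FACT(N) procedure translates almost directly into C. In this sketch the name fact is hypothetical, N = 0 is treated as an additional base value so that 0! = 1 (the notes use only N = 1), and the argument is limited to 12 because 13! does not fit in a 32-bit unsigned long.

```c
#include <assert.h>

/* Recursive factorial following FACT(N).
   Base value: n <= 1 returns 1. Each recursive call uses n - 1,
   which is closer to the base value. */
unsigned long fact(unsigned int n)
{
    assert(n <= 12);        /* 13! overflows a 32-bit unsigned long */
    if (n <= 1)
        return 1;
    return n * fact(n - 1);
}
```

Calling fact(5) produces the same chain of calls and returns as the nine steps listed above.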
From the above example it is clear that every sub-function contains parameters and local variables. The parameters are the arguments which receive values from objects in the calling program and which transmit values back to the calling program. The sub-function must also keep track of the return address in the calling program. This return address is essential, since control must be transferred back to its proper place in the calling program. After completion of the sub-function, when control is transferred back to the calling program, the local values and the return address are no longer needed. Suppose our sub-program is a recursive one. When it calls itself, the current values must be saved, since they will be used again when the program is reactivated. Thus, a recursive process requires a data structure to hold the data of the ongoing function calls, and the function which is called last must be processed first, i.e. the data stored last must be processed first (the last-in-first-out principle). So, a stack, which follows LIFO, is a suitable data structure to implement recursion.
Arithmetic Expression:

This section deals with the mechanical evaluation or compilation of infix expressions. It is found to be more efficient to evaluate an infix arithmetic expression with a stack by first converting it to a suffix or postfix expression and then evaluating the postfix expression. This approach eliminates the repeated scanning of an infix expression in order to obtain its value.

A normal arithmetic expression is called an infix expression. E.g. A+B

A Polish mathematician found a way to represent the same expression, called Polish notation or prefix expression, by keeping operators as a prefix. E.g. +AB

We use the reverse of the above expression for our evaluation. The representation is called Reverse Polish Notation (RPN) or postfix expression. E.g. AB+

The arithmetic expression evaluation is performed in two phases:
- Conversion of infix to postfix expression
- Evaluation of postfix expression

[Follow the algorithms and examples discussed in class]
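The second phase, evaluation of a postfix expression, can be sketched with a small stack of integers. This is a simplified illustration and not the algorithm discussed in class: the name eval_postfix is hypothetical, operands are assumed to be single decimal digits, only + - * / are supported, and the input is assumed to be a valid postfix expression.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Evaluate a postfix expression such as "234*+".
   Operands are pushed; an operator pops two operands,
   applies itself and pushes the result. */
int eval_postfix(const char *expr)
{
    int stack[64];
    int top = -1;

    assert(expr != NULL);
    for (const char *p = expr; *p != '\0'; p++) {
        if (isdigit((unsigned char)*p)) {
            assert(top < 63);               /* no overflow */
            stack[++top] = *p - '0';
        } else if (*p == '+' || *p == '-' || *p == '*' || *p == '/') {
            assert(top >= 1);               /* two operands available */
            int right = stack[top--];
            int left = stack[top--];
            int result;
            switch (*p) {
            case '+': result = left + right; break;
            case '-': result = left - right; break;
            case '*': result = left * right; break;
            default:  result = left / right; break;
            }
            stack[++top] = result;
        }
        /* any other character, e.g. a space, is skipped */
    }
    assert(top == 0);                       /* exactly one value remains */
    return stack[top];
}
```

The expression is scanned only once from left to right, which is the advantage over repeated scanning of the infix form.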
Queue
Circular Queue:

The linear arrangement of the queue always considers the elements in the forward direction. In the above two algorithms, we have seen that the pointers FRONT and REAR are always incremented when we delete or insert an element respectively. Suppose in a queue of 10 elements FRONT points to the 4th element and REAR points to the 8th element as follows.

Position:  1   2   3   4   5   6   7   8   9   10
Element:   -   -   -   XX  XX  XX  XX  XX  -   -
FRONT = 4, REAR = 8
When we insert two more elements, the array will become

Position:  1   2   3   4   5   6   7   8   9   10
Element:   -   -   -   XX  XX  XX  XX  XX  XX  XX
FRONT = 4, REAR = 10
Later, when we try to insert some more elements, then according to the logic an overflow situation is encountered, because REAR is N. But some positions are left blank at the beginning of the array. To utilize those leftover spaces, the queue is represented in a circular fashion. The circular queue reassigns the REAR pointer to 1 if it reaches N and the beginning elements are free, and the same is done for FRONT during deletion. Such queues are called circular queues.
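A minimal sketch of a circular queue follows. It uses 0-based indices, so wrapping to position 1 becomes wrapping to index 0 through the modulo operator, and it keeps an explicit element count to tell a full queue from an empty one. The struct layout, the value of N and the function names are assumptions for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define N 5                 /* queue capacity (hypothetical) */

struct cqueue {
    int items[N];
    int front;              /* index of the first element */
    int rear;               /* index of the last element */
    int count;              /* number of stored elements */
};

void cq_init(struct cqueue *q)
{
    assert(q != NULL);
    q->front = 0;
    q->rear = N - 1;        /* the first insertion wraps to index 0 */
    q->count = 0;
}

/* Insert at the rear; returns false on overflow. */
bool cq_insert(struct cqueue *q, int value)
{
    assert(q != NULL);
    if (q->count == N)
        return false;
    q->rear = (q->rear + 1) % N;    /* wrap around after the last cell */
    q->items[q->rear] = value;
    q->count++;
    return true;
}

/* Delete from the front; returns false on underflow. */
bool cq_delete(struct cqueue *q, int *out)
{
    assert(q != NULL && out != NULL);
    if (q->count == 0)
        return false;
    *out = q->items[q->front];
    q->front = (q->front + 1) % N;
    q->count--;
    return true;
}
```

After two deletions from a full queue, two further insertions succeed by reusing the cells at the beginning of the array.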
Types of QUEUE:

There are two types of queue:
- Priority Queue
- Double Ended Queue

Priority Queue:

A priority queue is a collection of elements in which each element has been assigned a priority value, and the order in which elements are deleted and processed follows these rules:
1. An element of higher priority is processed before any element of lower priority.
2. Two elements with the same priority are processed according to the order in which they were added to the queue.

There are various ways of maintaining a priority queue in memory. One is to use a one-way list. The sequential representation is never preferred for a priority queue. We use the linked representation.
Double Ended Queue:

A double ended queue (deque) is a linear list in which elements can be inserted or deleted at either end. There are two variations of a deque:
1. Input restricted deque
2. Output restricted deque

A deque which allows insertion at only one end of the list but allows deletion at both ends is called an input restricted deque. A deque which allows deletion at only one end of the list but allows insertion at both ends is called an output restricted deque.
Garbage Collection and Compaction:

Garbage collection is the process of collecting all unused nodes and returning them to available space. This process is carried out in two phases. In the first phase, known as the marking phase, all nodes in use are marked. In the second phase, all unmarked nodes are returned to the available space list. The second phase is trivial when all nodes are of a fixed size. In this case, the second phase requires only the examination of each node to see whether or not it has been marked. In this situation it is only the first or marking phase that is of any interest in designing an algorithm. When variable-size nodes are in use, it is desirable to compact memory so that all free nodes form a contiguous block of memory. In this case, the second phase is referred to as memory compaction. Compaction of disk space to reduce average retrieval time is desirable even for fixed-size nodes.
Linked List (One way List):

We have seen that the sequential representation of an ordered list is expensive when inserting or deleting arbitrary elements, because the elements are stored at fixed distances in a fixed block of memory. The linked representation reduces the expense, because the elements are not stored at fixed distances and may be placed anywhere in memory, and operations such as insertion and deletion require a change of links rather than movement of data. A linked list is a linked representation of the ordered list. It is a linear collection of data elements, termed nodes, whose linear order is given by means of links or pointers. Every node consists of two parts. The first part, called INFO, contains the information of the data, and the second part, called LINK, contains the address of the next node in the list. A variable called START always points to the first node of the list, and the link part of the last node always contains the null value. A null value in the START variable denotes that the list is empty.
[Figure: a linked list stored in memory in INFO and LINK arrays of 10 cells, with START = 4 and AVAIL = 2]

START -> A -> B -> C -> NULL
Along with the linked list in memory, a special list is maintained which consists of unused memory cells or unused nodes. This list is called the list of available space, availability list, free storage list or free pool. A variable AVAIL is used to store the starting address of the availability list. Sometimes, during insertion, there may be no space available for inserting data into a data structure; this situation is called OVERFLOW. Programmers generally handle the situation by checking whether AVAIL is NULL or not. The situation where one wants to delete data from a data structure that is empty is called UNDERFLOW. This situation is encountered when START is NULL.
Header Linked List:

A header linked list is a linked list which always contains a special node, called the header node, at the beginning of the list. The header node contains overall information about the list which is frequently required for many operations, so that it need not be recomputed each time. There are two kinds of header lists:
a) A grounded header list is a header list where the last node contains the null pointer.
b) A circular header list is a header list where the last node points back to the header node.

Circular Linked List:

A linked list is called circular if the last node contains the address of the first node or of the header node. The advantage of a circular linked list is that nodes which have already been passed can be reached again by continuing the traversal, without moving back to the starting node.
Linked Stack:

The problem with array-based stacks is that the size must be determined at compile time. Instead, let us use a linked list, with the stack pointer top pointing to the top element. Let fresh be the new node. To push a new element on the stack, we must do:

    fresh->next = top;
    top = fresh;

To pop an item from a linked stack, we just have to reverse the operation:

    p = top;
    top = top->next;
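The two fragments above can be completed into a small working sketch. The node layout (info, next), the use of malloc and free for node storage, and the boolean return values are assumptions added for the illustration; the pointer updates are the ones given in the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct node {
    int info;
    struct node *next;
};

/* Push value on the stack whose top pointer is *top.
   Returns false when no free storage is available (overflow). */
bool push(struct node **top, int value)
{
    assert(top != NULL);
    struct node *fresh = malloc(sizeof *fresh);
    if (fresh == NULL)
        return false;
    fresh->info = value;
    fresh->next = *top;     /* fresh->next = top; */
    *top = fresh;           /* top = fresh;       */
    return true;
}

/* Pop the top value into *out.
   Returns false when the stack is empty (underflow). */
bool pop(struct node **top, int *out)
{
    assert(top != NULL && out != NULL);
    if (*top == NULL)
        return false;
    struct node *p = *top;  /* p = top;           */
    *out = p->info;
    *top = p->next;         /* top = top->next;   */
    free(p);                /* return the node to free storage */
    return true;
}
```

An empty stack is represented by a NULL top pointer, so no maximum size has to be fixed in advance.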
Linked Queues
Application of Linked List:

1) Polynomial Manipulation:

A polynomial has multiple terms, each carrying the same kind of information, namely a coefficient and a power. Each term of a polynomial is treated as a node of a list, and normally a linked list is used to represent a polynomial. Polynomial addition is the operation most commonly discussed. Multiplication of polynomials can be obtained by performing repeated additions. Each polynomial is stored in decreasing order of power, i.e. the term with the highest power is stored in the first node and the term with the least power is stored last. This ordering makes the addition of polynomials easy. In fact, two polynomials can be added by scanning each of their terms only once.

[Follow the algorithms discussed in class]

2) Linked Dictionary:

An important part of any compiler is the construction and maintenance of a dictionary containing names and their associated values. Such a dictionary is also called a symbol table. There may be several symbols corresponding to variable names, labels, literals, etc. The constraints which must be considered in the design of symbol tables are processing time and memory space. There are many phases associated with the construction of symbol tables. The main phases are building and referencing.

It is very easy to construct a very fast symbol table system, provided that a large section of memory is available. In such a case a unique address is assigned to each name. The most straightforward method of accessing a symbol table is the linear search technique. This method involves arranging the symbols sequentially in memory via an array or a simple linked list. An insertion can be easily handled by adding the new element to the end of the list. When it is desired to access a particular symbol, the table is searched sequentially from its beginning until the symbol is found. On average it will take n/2 comparisons to find a particular symbol. The insertion mechanism is fast, but the referencing is extremely slow. The referencing will be fast if we use the binary search technique. To implement a binary search on a symbol table, a tree representation is used.
Double Linked List (Two way List):

Since a node of a single linked list contains only one pointer, which points to the next node of the list, there is only one-way traversal. So, movement in the reverse direction is not possible in a single linked list. For bi-directional movement, a two-way list or double linked list is considered. It is a linear collection of data elements, called nodes, where each node is divided into three parts: INFO, PRIOR and NEXT. The INFO part contains the information of the node, and PRIOR and NEXT are pointers which hold the addresses of the predecessor node and successor node respectively. The list also has two pointers: START and LAST. START points to the first node of the list, whereas LAST points to the last node of the list. The PRIOR part of the first node and the NEXT part of the last node always contain the null value.
[Figure: a double linked list stored in memory in PRIOR, INFO and NEXT arrays of 10 cells, with START = 4, LAST = 6 and AVAIL = 2]

NULL <- A <-> B <-> C -> NULL
Difference between single linked list and double linked list:

A single linked list can be traversed in one direction only. That means one cannot get the address of the predecessor of a node directly, i.e. when we look for any previous information of the list during an operation, we have to traverse again from the start node of the one-way list, which uses an extra pointer and additional searching time. In a double linked list, however, we have the address of the next as well as the previous node. So, when we look for the address of the previous node, we can obtain it through the PRIOR part of the node, which requires no extra traversal pointer and takes less time than in the single linked list. So, apart from the bi-directional movement facility, the two-way list also saves time during traversal, at the cost of one additional pointer field in every node.
Trees:

A tree is a nonlinear data structure and is generally defined as a nonempty finite set T of elements, called nodes, such that:
1. T contains a distinguished node called the root of the tree.
2. The remaining elements of the tree form an ordered collection of zero or more disjoint subsets called subtrees.

Binary Tree:

A binary tree is defined as a finite set T of elements, called nodes, such that:
1) T is empty (called the null tree or empty tree), or
2) T contains a distinguished node called the root node, and the remaining nodes form an ordered pair of disjoint binary trees T1 and T2.

        R
       / \
     T1   T2

In the above tree R is the root node, and T1 and T2 are called subtrees. T1 and T2 are the left and right successors of R. The node R is called the parent node and the roots of T1 and T2 are called its children. All lower-level nodes are called descendants, and upper-level nodes are called ancestors of their descendants. The line drawn between a parent and a child is called an edge or arc, whereas the sequence of lines between an ancestor and a descendant is called a path. A node without any children is called a terminal or leaf node, and all others are called non-terminal or non-leaf nodes. A path ending with a leaf is called a branch.

Each node in a binary tree is assigned a level number, as follows. The root node is assigned the level number 0, and every other node is assigned a level number which is 1 more than the level number of its parent. The nodes with the same level number are said to belong to the same generation. Nodes of the same parent are called siblings. The depth or height of a tree is the maximum number of nodes in a branch of the tree, which is 1 more than the maximum level number.

Two trees are said to be similar if they have the same structure, and are said to be copies if they are similar and have the same contents at corresponding nodes.
Complete Binary Tree: A binary tree is said to be complete if all its levels, except possibly the last, have the maximum number of possible nodes, and if all the nodes at the last level appear as far left as possible.

Full Binary Tree: A binary tree is said to be full if all its levels have the maximum number of possible nodes.

Extended Binary Tree (Strictly Binary Tree or 2-tree): A binary tree is said to be an extended binary tree if each node has either 0 or 2 children. In this case the leaf nodes are called external nodes and the nodes with two children are called internal nodes.

Skewed Tree: A tree is called skewed if all the nodes of the tree are attached to one side only, i.e. a left-skewed tree has no right child in any of its nodes, and a right-skewed tree has no left child in any of its nodes.

[Figure: a left-skewed tree and a right-skewed tree]

Binary Search Trees: A tree is called a binary search tree if each node of the tree has the following property: the value at a node is greater than every value in its left subtree and less than every value in its right subtree.
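The binary search tree property leads directly to simple insertion and search routines. The sketch below is only an illustration: the node layout and the names bst_insert and bst_search are assumptions, nodes are allocated with malloc, and a key that is already present is ignored because the ordering in the definition is strict.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct tnode {
    int info;
    struct tnode *left, *right;
};

/* Insert key into the tree rooted at root and return the (possibly new) root.
   Smaller keys go to the left subtree, larger keys to the right subtree. */
struct tnode *bst_insert(struct tnode *root, int key)
{
    if (root == NULL) {
        struct tnode *n = malloc(sizeof *n);
        assert(n != NULL);              /* no free storage left */
        n->info = key;
        n->left = n->right = NULL;
        return n;
    }
    if (key < root->info)
        root->left = bst_insert(root->left, key);
    else if (key > root->info)
        root->right = bst_insert(root->right, key);
    /* an equal key is already in the tree and is ignored */
    return root;
}

/* Return true if key is present; each comparison discards one subtree. */
bool bst_search(const struct tnode *root, int key)
{
    while (root != NULL) {
        if (key == root->info)
            return true;
        root = (key < root->info) ? root->left : root->right;
    }
    return false;
}
```

Inserting 50, 30, 70 and 60 in that order places 30 to the left of 50, 70 to its right, and 60 as the left child of 70.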
Heap: A heap is a complete binary tree whose nodes satisfy an ordering property. There are two types of heap: max heap and min heap. A heap is called a maximum heap if the value of each node is greater than or equal to the value of each of its descendant nodes. A heap is called a minimum heap if the value of each node is less than or equal to the value of each of its descendant nodes.
Representation of Binary Search Tree:

Sequential Representation:

The sequential representation of a tree stores data in an array as per the following rules:
1. The root node is stored in the 1st position.
2. The left and right children of a parent node at location K are stored in the (2*K)th position and the (2*K+1)th position respectively.
The following example shows the representation of a binary tree in an array.

[Figure: a binary tree with root F and its representation in an array of 16 positions]
Suppose an array is representing a tree; then its tree representation can be drawn using the same rules, and an example is shown below.

[Figure: an array of 16 positions with root A in position 1, and the binary tree drawn from it]
Linked Representation of Tree:

The linked representation of a tree maintains three parallel arrays. An INFO array contains the data of each node, a LEFT array contains the location of the left child, and a RIGHT array contains the location of the right child. A ROOT pointer points to the root node of the tree.

[Figure: a binary tree stored in memory in LEFT, INFO and RIGHT arrays, with the ROOT and AVAIL pointers]
Header Nodes:

When a binary tree is maintained in memory by means of a linked representation, sometimes an extra, special node, called a header node, is added to the beginning of the tree. When this extra node is used, the tree pointer variable, which is called HEAD, will point to the header node, and the left pointer of the header node will point to the root. Generally, the header node of a tree contains the general or overall information of the tree which is required frequently in the operation of the tree, for example the number of employees present in an employee tree, the number of nodes present in a tree, or the cumulative value of all the nodes, so that the nodes of the tree need not be traversed to access such information.
Threads and Threaded Binary Tree:

Approximately half of the entries in the pointer fields LEFT and RIGHT of any binary tree contain null elements. This space may be used more efficiently by replacing the null entries with some other type of information. Specifically, we replace certain null entries by special pointers which point to nodes higher in the tree. These special pointers are called threads, and the tree is called a threaded binary tree. There are many ways to thread a binary tree, but each threading corresponds to a particular traversal of the tree. Unless otherwise stated, threading will correspond to in-order traversal. There are two types of threading:
- One way threading
- Two way threading

In one way threading, either the left pointer or the right pointer is used for threading. When the left pointer is used to point to the predecessor node according to in-order traversal, the threading is called left-in threading. When the right pointer is used to point to the successor node according to in-order traversal, the threading is called right-in threading. In two way threading, both left and right pointers are used, to point to the predecessor and successor nodes respectively according to in-order traversal.
Height Balanced Tree (AVL Tree):

Adelson-Velskii and Landis in 1962 introduced a binary tree structure that is balanced with respect to the heights of the subtrees. As a result of the balanced nature of this type of tree, dynamic retrievals can be performed in less time. At the same time, an identifier may be inserted into or deleted from the tree in less time.

Definition: An empty tree is height balanced. If T is a nonempty binary tree with TL and TR as its left and right subtrees, then T is height balanced iff
a) hL - hR is -1, 0 or 1, where hL and hR are the heights of the left subtree TL and the right subtree TR respectively, and
b) TL and TR are height balanced.

The difference hL - hR at a node is called its balance factor. Generally, we balance the tree while inserting or deleting an identifier. Balancing of a tree is carried out using essentially two kinds of rotation: left rotation and right rotation. When the balance factor at a node is less than -1, the tree at that node is considered to be right heavy. To balance a right heavy tree, the tree at that node is rotated towards the left. Similarly, if the balance factor at a node is more than 1, the tree at that node is considered to be left heavy. To balance a left heavy tree, the tree is rotated towards the right at that node.
Application of Binary Tree:
1. Symbol Table Construction:
The notion of a symbol table arises frequently in computer science while building compilers, loaders, linkers, assemblers etc. A symbol table is a set of name-value pairs. Associated with each name in the table is an attribute, a collection of attributes, or some directions about what further processing is needed. One of the criteria that a symbol table routine must meet is that table searching must be performed efficiently. This requirement originates in the compilation phase, which handles many lexemes and tokens of the program. The three required operations on the symbol table are:
a) Insertion of a new entry
b) Deletion of an existing entry
c) Looking up the information of an existing entry
Each of the above operations requires searching.
Generally, a tree is used to construct a symbol table because
a) if the symbol table entries, as encountered, are uniformly distributed according to lexicographic order, then table searching becomes approximately equivalent to a binary search, as long as the tree is maintained in lexicographic order;
b) a binary tree is easily maintained in lexicographic order.
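A symbol table kept as a binary search tree in lexicographic order can be sketched as below. The class and method names (`Entry`, `SymbolTable`, `insert`, `lookup`, `names`) are hypothetical, the stored value is assumed to be a single attribute, and deletion is left out to keep the sketch short.

```python
class Entry:
    def __init__(self, name, value):
        self.name = name
        self.value = value
        self.left = None
        self.right = None


class SymbolTable:
    def __init__(self):
        self.root = None

    def insert(self, name, value):
        """Add a name-value pair, or update the value if the name exists."""
        if self.root is None:
            self.root = Entry(name, value)
            return
        cur = self.root
        while True:
            if name == cur.name:
                cur.value = value
                return
            # Smaller names go left, larger names go right.
            branch = "left" if name < cur.name else "right"
            child = getattr(cur, branch)
            if child is None:
                setattr(cur, branch, Entry(name, value))
                return
            cur = child

    def lookup(self, name):
        """Return the value stored for name, or None if it is absent."""
        cur = self.root
        while cur is not None:
            if name == cur.name:
                return cur.value
            cur = cur.left if name < cur.name else cur.right
        return None

    def names(self):
        """All names in lexicographic order (iterative in-order traversal)."""
        out, stack, cur = [], [], self.root
        while stack or cur is not None:
            while cur is not None:
                stack.append(cur)
                cur = cur.left
            cur = stack.pop()
            out.append(cur.name)
            cur = cur.right
        return out
```

Each lookup discards one subtree per comparison, which is why the search behaves like a binary search when the names arrive in a well-mixed order.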
2. Manipulation of the Arithmetic Expressions:
We observed that formulas in Reverse Polish notation are very useful in the compilation process. There is a close relationship between binary trees and formulas in prefix or suffix notation. Let us write the infix formula as a binary tree in which a node has an operator as its value and the left and right subtrees are the left and right operands of that operator. The leaves of the tree are the variables and constants of the expression. We represent the expression as a binary tree because of the similarity of infix to inorder traversal and of postfix to postorder traversal of the tree. The tree used for the expression is called a parse tree. [Follow the conversion process taught in the classroom]
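The correspondence between traversals and notations can be seen in a small sketch. It assumes every operator is binary; the names `Node`, `infix`, `postfix` and `prefix` are hypothetical, and the tree is built by hand rather than by the classroom conversion process.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def infix(node):
    """Inorder traversal, fully parenthesised so precedence stays explicit."""
    if node.left is None and node.right is None:
        return node.value                      # leaf: variable or constant
    return "(" + infix(node.left) + " " + node.value + " " + infix(node.right) + ")"


def postfix(node):
    """Postorder traversal: operands first, operator last (Reverse Polish)."""
    if node is None:
        return []
    return postfix(node.left) + postfix(node.right) + [node.value]


def prefix(node):
    """Preorder traversal: operator first, then its operands."""
    if node is None:
        return []
    return [node.value] + prefix(node.left) + prefix(node.right)


# Parse tree for (a + b) * (c - d)
tree = Node("*",
            Node("+", Node("a"), Node("b")),
            Node("-", Node("c"), Node("d")))
```

For this tree, `infix(tree)` gives `((a + b) * (c - d))`, `postfix(tree)` gives the symbols `a b + c d - *`, and `prefix(tree)` gives `* + a b - c d`.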
Graphs
A graph is a nonlinear data structure that has a point-to-point relationship among its nodes. Each node of the graph is called a vertex, and the link or line drawn between two nodes is called an edge.
Mathematically, a graph 'G' consists of two sets V and E such that G = {V, E}, where V is a finite nonempty set of vertices or nodes and E is a set of edges. V(G) represents the set of vertices and E(G) represents the set of edges. According to the above example,
V(G) = {1, 2, 3, 4, 5, 6}
E(G) = {(1,2), (2,1), (1,4), (4,1), ...}
Suppose an edge e = {u, v}; then the nodes u and v are called the end points of the edge e. The node u is called the source node and the node v is called the destination node, and the nodes u and v are called adjacent nodes. The line drawn between two adjacent nodes is called an edge. If an edge has a direction, then the source node is said to be adjacent to the destination, and the destination node is adjacent from the source.
Path: A path is a sequence of consecutive edges between a source and a destination through different nodes. A path is said to be closed if the source is equal to the destination. A path is said to be simple if all its nodes are distinct.
Cycle: A cycle is a closed path of length 3 or more. A cycle of length k is called a k-cycle.
Loop: If an edge has identical end points, then the edge is called a loop.
Degree (order): The degree of a node is the number of edges containing that node. The number of edges pointing towards the node is called its in-degree (in-order). The number of edges pointing away from the node is called its out-degree (out-order).
A graph in which the edges have a direction is called a directed graph or digraph; otherwise the graph is called an undirected graph.
Isolated node: If the degree of a node is zero, i.e. the node does not have any edges, then the node is called an isolated node.
Complete Graph: A graph is called complete if all the nodes of the graph are adjacent to each other. A complete graph with n nodes has n*(n-1)/2 edges.
Weighted Graph: A graph is said to be weighted if each edge in the graph is assigned a non-negative numerical value called the weight or cost of the edge. If an edge does not have any weight, then its weight is considered to be 1.
Multigraph: A graph that has parallel (multiple) edges between the same pair of nodes, or that has loops, is said to be a multigraph.
Representation of Graph:
A graph may be represented in two ways: Sequential Representation and Linked Representation.
Sequential Representation: The graph is represented in sequential memory using two matrices: the Adjacency Matrix and the Path Matrix.
Adjacency Matrix: Suppose G is a graph with n nodes, and the nodes of G have been ordered and are called v1, v2, v3, ..., vn. Then the adjacency matrix A = (aij) of the graph G is defined as
aij = 1, if vi is adjacent to vj
aij = 0, otherwise
The adjacency matrix, which contains only 1's and 0's, is also called a bit matrix.
A =
0 0 1 0
1 0 0 1
0 1 0 1
1 0 0 0
Path Matrix or Reachability Matrix:
Suppose G is a graph with n nodes, and the nodes of G have been ordered and are called v1, v2, v3, ..., vn. Then the path matrix P = (pij) of the graph G is defined as
pij = 1, if there is a path from vi to vj
pij = 0, otherwise
The adjacency matrix A records the paths of length 1. Similarly, A^2, A^3, A^4, ..., A^n record the paths of length 2, 3, 4, ..., n respectively. To find P, the matrix Bn is calculated first, where
Bn = A + A^2 + A^3 + ... + A^n
All non-zero elements of Bn are then replaced with 1 to form the path matrix P.
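Both matrices can be computed with a short sketch. It assumes a directed graph whose nodes are numbered from 1 and whose edges are given as (i, j) pairs; the function names are hypothetical. The path matrix is obtained by summing the powers A, A^2, ..., A^n and replacing every non-zero entry of the sum with 1.

```python
def adjacency_matrix(n, edges):
    """Build the n x n bit matrix from directed edges given as 1-based (i, j) pairs."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i - 1][j - 1] = 1
    return a


def mat_mul(x, y):
    """Ordinary product of two square matrices stored as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def path_matrix(a):
    """Replace the non-zero entries of A + A^2 + ... + A^n with 1."""
    n = len(a)
    power = a                          # current power, starting at A^1
    total = [row[:] for row in a]      # running sum, starting at A
    for _ in range(n - 1):
        power = mat_mul(power, a)
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return [[1 if total[i][j] else 0 for j in range(n)] for i in range(n)]


# The edges that correspond to the 4 x 4 adjacency matrix A shown above.
edges = [(1, 3), (2, 1), (2, 4), (3, 2), (3, 4), (4, 1)]
A = adjacency_matrix(4, edges)
P = path_matrix(A)
```

For this graph every node can reach every node, itself included, so every entry of `P` is 1. For a simple chain 1 -> 2 -> 3, the path matrix has 1's only above the diagonal.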
Linked Representation: The matrix representation of a graph does not keep track of the information related to the nodes. Hence a linked representation, called the adjacency structure, is used to represent a graph. The adjacency structure of the graph maintains two lists, called the node list and the edge list.
Node List: Each node in the node list corresponds to a node in the graph and has three fields: the information of the node, called INFO; a pointer to the next node of the list, called NEXT; and a pointer to the edge list, called ADJ.
Edge List: Each element of the edge list corresponds to an edge of the graph and has two fields: DEST contains the address of the destination node, and LINK contains the address of the next node of the edge list.
[Follow the algorithm taught in class for node and edge insertion and deletion]
Graph Traversal:
Traversing a graph means visiting all the vertices in the graph exactly once. It is of two types: Breadth First Traversal and Depth First Traversal.
Breadth First Traversal:
The traversal starts at a node v. After marking v, the traversal follows all the edges incident to v and marks the nodes at their other ends, then moves to an adjacent node and repeats the process. The traversal continues until all unmarked nodes in the graph have been visited. A queue is maintained to hold the marked nodes whose incident edges are still to be explored. It is more appropriate for a digraph.
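The procedure can be sketched as below, assuming the graph is stored as a dictionary that maps each node to the list of its adjacent nodes (a hypothetical representation chosen for brevity). The sketch visits only the nodes reachable from the start node; covering a whole graph would mean restarting it from each node that is still unmarked.

```python
from collections import deque


def breadth_first(graph, start):
    """Return the nodes reachable from start in breadth-first order."""
    marked = {start}
    order = []
    queue = deque([start])
    while queue:
        v = queue.popleft()            # oldest marked node still waiting
        order.append(v)
        for w in graph.get(v, []):     # every edge incident to v
            if w not in marked:
                marked.add(w)
                queue.append(w)
    return order


# A small digraph used only for illustration.
graph = {1: [2, 4], 2: [3], 4: [3, 5], 3: [6], 5: [6], 6: []}
```

Starting at node 1, the nodes are visited level by level: 1, then 2 and 4, then 3 and 5, then 6.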
Depth First Traversal:
A depth first search of an arbitrary graph can be used to perform a traversal of a general graph. The technique picks a node and marks it. An unmarked node adjacent to the previous node is then selected and marked; it becomes the new start node, possibly leaving the previous node with unexplored edges for the present. The traversal continues recursively until all unmarked nodes on the current path have been visited. The process is then continued for all the paths of the graph.
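A recursive sketch of this technique follows, again assuming a dictionary of adjacency lists (a hypothetical representation). As written it visits only the nodes reachable from the start node.

```python
def depth_first(graph, start):
    """Return the nodes reachable from start in depth-first order."""
    marked = set()
    order = []

    def visit(v):
        marked.add(v)
        order.append(v)
        for w in graph.get(v, []):
            if w not in marked:
                visit(w)       # w becomes the new start node; the remaining
                               # edges of v are explored after this returns

    visit(start)
    return order


# A small digraph used only for illustration.
graph = {1: [2, 4], 2: [3], 4: [3, 5], 3: [6], 5: [6], 6: []}
```

Starting at node 1, the traversal follows the path 1, 2, 3, 6 to its end before backing up to explore 4 and 5. Very deep graphs would need an explicit stack instead of recursion.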
HASHING
Hashing is a searching technique based on key-to-address transformation. The normal linear and binary search techniques search for a key via a sequence of comparisons. Hashing differs from these in that the address or location of an identifier X is obtained by computing some arithmetic function f of X; f(X) gives the address of X in the table. This address is referred to as the hash or home address of X. Depending on the address yielded by the function, the data are stored in sequential memory locations, called the hash table.
Hash Table:
The memory available to maintain the symbol table is assumed to be sequential. This memory is referred to as the hash table HT. The hash table is partitioned into b buckets, HT(0), HT(1), ..., HT(b-1). Each bucket is divided into s slots, each slot being large enough to hold 1 record. Usually s = 1, and each bucket can hold exactly 1 record. A hashing function, f(X), is used to perform an identifier transformation on X. f(X) maps the set of possible identifiers onto the integers 0 through b-1. The ratio n/T is the identifier density, while n/(s*b) is the loading density or loading factor, where n is the number of identifiers, b is the number of buckets, T is the total number of possible identifiers, and s is the number of slots per bucket.
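The two ratios can be checked with a short calculation. The function names and the numbers used below are hypothetical and serve only to illustrate the formulas n/T and n/(s*b).

```python
def identifier_density(n, total_possible):
    """n / T: fraction of all possible identifiers that are actually in use."""
    return n / total_possible


def loading_density(n, s, b):
    """n / (s * b): fraction of the table's slots that are occupied."""
    return n / (s * b)
```

For example, 6 identifiers stored in a table of b = 10 buckets with s = 1 slot each give a loading density of 6 / (1 * 10) = 0.6.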
HASHING FUNCTION
A hashing function, f, transforms an identifier X into a bucket address in the hash table. As mentioned earlier, the desired properties of such a function are that it be easily computable and that it minimize the number of collisions. Since many programs use several identifiers with the same first letter, we would like the function to depend upon all the characters in the identifiers. In addition, we would like the hash function to be such that it does not result in a biased use of the hash table for random inputs. Several kinds of uniform hash functions are in use:
1. Division
2. Mid-square
3. Folding
4. Digit Analysis
Only the division method is used frequently, and it is the most preferred one.
Division Method:
This is the most common method used for hash functions. A divisor is chosen, which may be a prime number or the number of buckets. The key is divided by this number, and the remainder is the hash address for that key. For example, let us consider a hash table of 10 buckets and find the addresses of values such as 34, 56, 432 and 651. The home address of 34 will be 34 % 10 = 4. The home address of 56 will be 56 % 10 = 6. And so on for the others, as mentioned in the table.
KEY    0    1    2    3    4    5    6    ...
INFO   XX   651  432  XX   34   XX   56   ...
(XX marks an empty bucket.)
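The division method can be sketched in a few lines. The function names are hypothetical, the table is assumed to have one slot per bucket, and a collision simply raises an error because collision handling is a separate topic.

```python
def hash_division(key, buckets):
    """Home address of key: the remainder after dividing by the bucket count."""
    return key % buckets


def build_table(keys, buckets):
    """Place each key at its home address; None marks an empty bucket."""
    table = [None] * buckets
    for key in keys:
        address = hash_division(key, buckets)
        if table[address] is not None:
            raise ValueError("collision at bucket %d" % address)
        table[address] = key
    return table


table = build_table([34, 56, 432, 651], 10)
```

With 10 buckets, 651 lands in bucket 1, 432 in bucket 2, 34 in bucket 4 and 56 in bucket 6, and the other buckets stay empty.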
Sometimes two different keys may yield the same hash address. Then there will be a collision between the keys. There are a few techniques for resolving the collision.
Collision Resolution Technique:
When there is a collision, a random rehashing function is used to resolve it. The efficiency of a collision resolution procedure is measured by the average number of probes (key comparisons) needed to find the location of the record with the given key. Normally the collision is resolved by dividing each bucket into multiple slots, so that keys with the same address can be kept in different slots of the same bucket. There are two different ways to resolve the collision.
Open Addressing and Chaining:
Open addressing uses a sequential representation for the hash table, such as a two-dimensional or three-dimensional array. Chaining uses a linked representation: each bucket is linked to a linked list that maintains the slots of that bucket.
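Both approaches can be sketched under some stated assumptions. The class names are hypothetical, the division method is used as the hash function, and Python lists stand in for the linked lists of chaining. For open addressing the sketch uses linear probing over a one-dimensional array with one slot per bucket, which is only one of several possible probing schemes and is chosen here for simplicity.

```python
class ChainedHashTable:
    """Chaining: every bucket keeps its own list of colliding keys."""

    def __init__(self, buckets=10):
        self.buckets = [[] for _ in range(buckets)]

    def insert(self, key):
        chain = self.buckets[key % len(self.buckets)]
        if key not in chain:
            chain.append(key)

    def contains(self, key):
        return key in self.buckets[key % len(self.buckets)]


class LinearProbingTable:
    """Open addressing: a colliding key moves to the next free slot."""

    def __init__(self, buckets=10):
        self.slots = [None] * buckets

    def insert(self, key):
        """Store key and return the slot index where it was placed."""
        b = len(self.slots)
        home = key % b
        for step in range(b):
            i = (home + step) % b          # probe home, home+1, ... cyclically
            if self.slots[i] is None or self.slots[i] == key:
                self.slots[i] = key
                return i
        raise OverflowError("hash table is full")

    def find(self, key):
        """Return the slot index holding key, or None if it is absent."""
        b = len(self.slots)
        home = key % b
        for step in range(b):
            i = (home + step) % b
            if self.slots[i] is None:      # an empty slot ends the search
                return None
            if self.slots[i] == key:
                return i
        return None
```

With 10 buckets, the keys 34, 44 and 54 all have home address 4. The chained table keeps all three in the list of bucket 4, while the linear probing table places them in slots 4, 5 and 6. The `find` method assumes keys are never deleted, since a deleted slot would cut a probe sequence short.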