DATABASE SOLUTIONS (2nd Edition) THOMAS M CONNOLLY & CAROLYN E BEGG
SOLUTIONS TO REVIEW QUESTIONS
Chapter 1 Introduction - Review Questions
1.1
List four examples of database systems other than those listed in Section 1.1.
Some examples could be:
• A system that maintains component part details for a car manufacturer;
• An advertising company keeping details of all clients and adverts placed with them;
• A training company keeping course information and participants’ details;
• An organization maintaining all sales order information.
1.2
Discuss the meaning of each of the following terms:
(a) data
For end users, this constitutes all the different values connected with the various objects/entities that are of concern to them.
(b) database
A shared collection of logically related data (and a description of this data), designed to meet the information needs of an organization.
(c) database management system
A software system that enables users to define, create, and maintain the database and provides controlled access to this database.
(d) application program
A computer program that interacts with the database by issuing an appropriate request (typically an SQL statement) to the DBMS.
(e) data independence
This is essentially the separation of underlying file structures from the programs that operate on them, also called program-data independence.
(f) views
A virtual table that does not necessarily exist in the database but is generated by the DBMS from the underlying base tables whenever it’s accessed. These present only a subset of the database that is of
particular interest to a user. Views can be customized, for example, field names may change, and they also provide a level of security preventing users from seeing certain data.
1.3
Describe the main characteristics of the database approach.
Focus is now on the data first, and then the applications. The structure of the data is now kept separate from the programs that operate on the data. This structure is held in the system catalog or data dictionary. Programs can now share data, which is no longer fragmented. There is also a reduction in redundancy, and achievement of program-data independence.
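As a rough illustration of the structure being held in a catalog rather than in each program, the sketch below uses Python's built-in sqlite3 module, where the catalog is exposed as the sqlite_master table. The Branch table, its columns, and the describe_tables helper are hypothetical names chosen for the example; they do not come from the text.

```python
import sqlite3

def describe_tables(conn):
    """Return {table name: [column names]} by reading the catalog at run time."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    return {t: [col[1] for col in conn.execute(f"PRAGMA table_info({t})")]
            for t in tables}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Branch (branchNo TEXT PRIMARY KEY, city TEXT)")
conn.execute("INSERT INTO Branch VALUES ('B001', 'London')")
before = describe_tables(conn)

# The structure changes, but a program that names only the columns it needs
# keeps working unchanged: a small instance of program-data independence.
conn.execute("ALTER TABLE Branch ADD COLUMN telNo TEXT")
after = describe_tables(conn)
rows = conn.execute("SELECT branchNo, city FROM Branch").fetchall()
```

The query at the end is the same one that would have worked before the new column was added, which is the point of keeping the description of the data with the DBMS.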
1.4
Describe the five components of the DBMS environment and discuss how they relate to each other.
(1) Hardware: The computer system(s) that the DBMS and the application programs run on. This can range from a single PC, to a single mainframe, to a network of computers.
(2) Software: The DBMS software and the application programs, together with the operating system, including network software if the DBMS is being used over a network.
(3) Data: The data acts as a bridge between the hardware and software components and the human components. As we’ve already said, the database contains both the operational data and the meta-data (the ‘data about data’).
(4) Procedures: The instructions and rules that govern the design and use of the database. This may include instructions on how to log on to the DBMS, make backup copies of the database, and how to handle hardware or software failures.
(5) People: This includes the database designers, database administrators (DBAs), application programmers, and the end-users.
1.5
Describe the problems with the traditional two-tier client-server architecture and discuss how these problems were overcome with the three-tier client-server architecture.
In the mid-1990s, as applications became more complex and potentially could be deployed to hundreds or thousands of end-users, the client side of this architecture gave rise to two problems:
• A ‘fat’ client, requiring considerable resources on the client’s computer to run effectively (resources include disk space, RAM, and CPU power).
• A significant client-side administration overhead.
By 1995, a new variation of the traditional two-tier client-server model appeared to solve these problems, called the three-tier client-server architecture. This new architecture proposed three layers, each potentially running on a different platform:
(1) The user interface layer, which runs on the end-user’s computer (the client).
(2) The business logic and data processing layer. This middle tier runs on a server and is often called the application server.
(3) The DBMS, which stores the data required by the middle tier and may run on a separate server called the database server.
The three-tier design has several advantages over the traditional two-tier design:
• A ‘thin’ client, which requires less expensive hardware.
• Simplified application maintenance, as a result of centralizing the business logic for many end-users into a single application server. This eliminates the concerns of software distribution that are problematic in the traditional two-tier client-server architecture.
• Added modularity, which makes it easier to modify or replace one tier without affecting the other tiers.
• Easier load balancing, again as a result of separating the core business logic from the database functions. For example, a Transaction Processing Monitor (TPM) can be used to reduce the number of connections to the database server. (A TPM is a program that controls data transfer between clients and servers in order to provide a consistent environment for transaction processing.)
The three-tier architecture also maps naturally to the Web environment, with a Web browser acting as the ‘thin’ client, and a Web server acting as the application server. The three-tier client-server architecture is illustrated in Figure 1.4.
1.6
Describe the functions that should be provided by a modern full-scale multi-user DBMS.
Data Storage, Retrieval and Update
Authorization Services
A User-Accessible Catalog
Support for Data Communication
Transaction Support
Integrity Services
Concurrency Control Services
Services to Promote Data Independence
Recovery Services
Utility Services
1.7
Of the functions described in your answer to Question 1.6, which ones do you think would not be needed in a standalone PC DBMS? Provide justification for your answer.
Concurrency Control Services - only single user.
Authorization Services - only single user, but may be needed if different individuals are to use the DBMS at different times.
Utility Services - limited in scope.
Support for Data Communication - only standalone system.
1.8
Discuss the advantages and disadvantages of DBMSs.
Some advantages of the database approach include control of data redundancy, data consistency, sharing of data, and improved security and integrity. Some disadvantages include complexity, cost, reduced performance, and higher impact of a failure.
Chapter 2 The Relational Model - Review Questions
2.1
Discuss each of the following concepts in the context of the relational data model:
(a) relation
A table with columns and rows.
(b) attribute
A named column of a relation.
(c) domain
The set of allowable values for one or more attributes.
(d) tuple
A record of a relation.
(e) relational database
A collection of normalized tables.
2.2
Discuss the properties of a relational table.
A relational table has the following properties:
• The table has a name that is distinct from all other tables in the database.
• Each cell of the table contains exactly one value. (For example, it would be wrong to store several telephone numbers for a single branch in a single cell. In other words, tables don’t contain repeating groups of data. A relational table that satisfies this property is said to be normalized or in first normal form.)
• Each column has a distinct name.
• The values of a column are all from the same domain.
• The order of columns has no significance. In other words, provided a column name is moved along with the column values, we can interchange columns.
• Each record is distinct; there are no duplicate records.
• The order of records has no significance, theoretically.
2.3
Discuss the differences between the candidate keys and the primary key of a table. Explain what is meant by a foreign key. How do foreign keys of a table relate to candidate keys? Give examples to illustrate your answer.
The primary key is the candidate key that is selected to identify tuples uniquely within a relation. A foreign key is an attribute or set of attributes within one relation that matches the candidate key of some (possibly the same) relation.
2.4
What does a null represent?
A null represents a value for a column that is currently unknown or is not applicable for this record.
2.5
Define the two principal integrity rules for the relational model. Discuss why it is desirable to enforce these rules.
Entity integrity
In a base table, no column of a primary key can be null.
Referential integrity
If a foreign key exists in a table, either the foreign key value must match a candidate key value of some record in its home table or the foreign key value must be wholly null.
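The two rules can be tried out with a small sketch using Python's built-in sqlite3 module. The Branch and Staff tables, their columns, and the try_insert helper are hypothetical names used only for illustration. Two SQLite-specific assumptions are worth noting: foreign keys are enforced only after PRAGMA foreign_keys = ON, and primary key columns are declared NOT NULL explicitly because SQLite otherwise tolerates nulls in most primary keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE Branch (branchNo TEXT NOT NULL PRIMARY KEY, city TEXT)")
conn.execute("""CREATE TABLE Staff (
    staffNo  TEXT NOT NULL PRIMARY KEY,
    name     TEXT,
    branchNo TEXT REFERENCES Branch(branchNo))""")  # nullable foreign key
conn.execute("INSERT INTO Branch VALUES ('B001', 'London')")

def try_insert(sql, params):
    """Return True if the row is accepted, False if an integrity rule rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

results = {
    # Entity integrity: a null primary key is rejected.
    "null primary key": try_insert("INSERT INTO Branch VALUES (?, ?)", (None, "Leeds")),
    # Referential integrity: the foreign key matches an existing Branch row ...
    "matching foreign key": try_insert("INSERT INTO Staff VALUES (?, ?, ?)", ("S1", "Ann", "B001")),
    # ... or is wholly null ...
    "null foreign key": try_insert("INSERT INTO Staff VALUES (?, ?, ?)", ("S2", "Bob", None)),
    # ... but may not point at a branch that does not exist.
    "dangling foreign key": try_insert("INSERT INTO Staff VALUES (?, ?, ?)", ("S3", "Cy", "B999")),
}
```

Only the second and third inserts are accepted. Without these checks, a Staff row could refer to a branch that is not in the database, and a Branch row could exist with no way of identifying it.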
Chapter 3 SQL and QBE - Review Questions
3.1
What are the two major components of SQL and what function do they serve?
A data definition language (DDL) for defining the database structure.
A data manipulation language (DML) for retrieving and updating data.
3.2
Explain the function of each of the clauses in the SELECT statement. What restrictions are imposed on these clauses?
FROM specifies the table or tables to be used;
WHERE filters the rows subject to some condition;
GROUP BY forms groups of rows with the same column value;
HAVING filters the groups subject to some condition;
SELECT specifies which columns are to appear in the output;
ORDER BY specifies the order of the output.
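The clause roles listed above can be seen together in one query. The following is a minimal sketch using Python's built-in sqlite3 module; the Staff table and its sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT, branchNo TEXT, position TEXT, salary REAL)")
conn.executemany("INSERT INTO Staff VALUES (?, ?, ?, ?)", [
    ("S1", "B001", "Manager",   30000), ("S2", "B001", "Assistant", 12000),
    ("S3", "B001", "Assistant", 14000), ("S4", "B002", "Manager",   28000),
    ("S5", "B002", "Assistant", 11000), ("S6", "B003", "Assistant",  9000),
])

rows = conn.execute("""
    SELECT   branchNo, COUNT(*) AS assistants, AVG(salary) AS avgSalary  -- output columns
    FROM     Staff                                                       -- source table
    WHERE    position = 'Assistant'                                      -- filter rows
    GROUP BY branchNo                                                    -- one group per branch
    HAVING   AVG(salary) > 10000                                         -- filter groups
    ORDER BY avgSalary DESC                                              -- order the output
""").fetchall()
# rows == [('B001', 2, 13000.0), ('B002', 1, 11000.0)]
```

The managers are removed by WHERE before any grouping takes place, and branch B003 is removed by HAVING only after its group average (9000) has been computed.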
3.3
What restrictions apply to the use of the aggregate functions within the SELECT statement? How do nulls affect the aggregate functions?
An aggregate function can be used only in the SELECT list and in the HAVING clause.
Apart from COUNT(*), each function eliminates nulls first and operates only on the remaining non-null values. COUNT(*) counts all the rows of a table, regardless of whether nulls or duplicate values occur.
3.4
Explain how the GROUP BY clause works. What is the difference between the WHERE and HAVING clauses?
SQL first applies the WHERE clause. Then it conceptually arranges the table based on the grouping column(s). Next, it applies the HAVING clause and finally orders the result according to the ORDER BY clause.
WHERE filters rows subject to some condition; HAVING filters groups subject to some condition.
3.5
What is the difference between a subquery and a join? Under what circumstances would you not be able to use a subquery?
With a subquery, the columns specified in the SELECT list are restricted to one table. Thus, a subquery cannot be used if the SELECT list contains columns from more than one table.
3.6
What is QBE and what is the relationship between QBE and SQL?
QBE is an alternative, graphical-based, ‘point-and-click’ way of querying the database, which is particularly suited for queries that are not too complex, and can be expressed in terms of a few tables. QBE has acquired the reputation of being one of the easiest ways for non-technical users to obtain information from the database.
QBE queries are converted into their equivalent SQL statements before transmission to the DBMS server.
Chapter 4 Database Systems Development Lifecycle - Review Questions
4.1
Describe what is meant by the term software crisis.
The past few decades have witnessed a dramatic rise in the number of software applications. Many of these applications proved to be demanding, requiring constant maintenance. This maintenance involved correcting faults, implementing new user requirements, and modifying the software to run on new or upgraded platforms. With so much software around to support, the effort spent on maintenance began to absorb resources at an alarming rate. As a result, many major software projects were late, over budget, and the software produced was unreliable, difficult to maintain, and performed poorly. This led to what has become known as the ‘software crisis’. Although this term was first used in the late 1960s, more than 30 years later, the crisis is still with us. As a result, some people now refer to the software crisis as the ‘software depression’.
4.2
Discuss the relationship between the information systems lifecycle and the database system development lifecycle.
An information system comprises the resources that enable the collection, management, control, and dissemination of data/information throughout a company. The database is a fundamental component of an information system.
Typically, the stages of the information systems lifecycle include: planning, requirements collection and analysis, design (including database design), prototyping, implementation, testing, conversion, and operational maintenance. As a database is a fundamental component of the larger company-wide information system, the database system development lifecycle is inherently linked with the information systems lifecycle.
4.3
Briefly describe the stages of the database system development lifecycle.
See Figure 4.1: Stages of the database system development lifecycle.
Database planning comprises the management activities that allow the stages of the database system development lifecycle to be realized as efficiently and effectively as possible.
System definition involves identifying the scope and boundaries of the database system including its major user views. A user view can represent a job role or business application area.
Requirements collection and analysis is the process of collecting and analyzing information about the company that is to be supported by the database system, and using this information to identify the requirements for the new system.
There are three approaches to dealing with multiple user views, namely the centralized approach, the view integration approach, and a combination of both. The centralized approach involves collating the users’ requirements for different user views into a single list of requirements. A data model representing all the user views is created during the database design stage. The view integration approach involves leaving the users’ requirements for each user view as separate lists of requirements. Data models representing each user view are created and then merged at a later stage of database design.
Database design is the process of creating a design that will support the company’s mission statement and mission objectives for the required database. This stage includes the logical and physical design of the database.
The aim of DBMS selection is to select a system that meets the current and future requirements of the company, balanced against costs that include the purchase of the DBMS product and any additional software/hardware, and the costs associated with changeover and training.
Application design involves designing the user interface and the application programs that use and process the database. This stage involves two main activities: transaction design and user interface design.
Prototyping involves building a working model of the database system, which allows the designers or users to visualize and evaluate the system.
Implementation is the physical realization of the database and application designs.
Data conversion and loading involves transferring any existing data into the new database and converting any existing applications to run on the new database.
Testing is the process of running the database system with the intent of finding errors.
Operational maintenance is the process of monitoring and maintaining the system following installation.
4.4
Describe the purpose of creating a mission statement and mission objectives for the required database during the database planning stage.
The mission statement defines the major aims of the database system, while each mission objective identifies a particular task that the database must support.
4.5
Discuss what a user view represents when designing a database system.
A user view defines what is required of a database system from the perspective of a particular job (such as Manager or Supervisor) or business application area (such as marketing, personnel, or stock control).
4.6
Compare and contrast the centralized approach and view integration approach to managing the design of a database system with multiple user views.
An important activity of the requirements collection and analysis stage is deciding how to deal with the situation where there is more than one user view. There are three approaches to dealing with multiple user views:
• the centralized approach,
• the view integration approach, and
• a combination of both approaches.
Centralized approach
The centralized approach involves collating the requirements for different user views into a single list of requirements for the new database system. A logical data model representing all user views is created during the database design stage. A diagram representing the management of user views 1 to 3 using the centralized approach is shown in Figure 4.4. Generally, this approach is preferred when there is a significant overlap in requirements for each user view and the database system is not overly complex.
See Figure 4.4: The centralized approach to managing multiple user views 1 to 3.
View integration approach
The view integration approach involves leaving the requirements for each user view as separate lists of requirements. We create data models representing each user view. A data model that represents a single user view is called a local logical data model. We then merge the local data models, later during the database design stage, to create a global logical data model representing all user views of the company.
A diagram representing the management of user views 1 to 3 using the view integration approach is shown in Figure 4.5. Generally, this approach is preferred when there are significant differences between user views and the database system is sufficiently complex to justify dividing the work into more manageable parts.
See Figure 4.5: The view integration approach to managing multiple user views 1 to 3.
For some complex database systems it may be appropriate to use a combination of both the centralized and view integration approaches to managing multiple user views. For example, the requirements for two or more user views may be first merged using the centralized approach and then used to create a local logical data model. (Therefore in this situation the local data model represents not just a single user view but the number of user views merged using the centralized approach.) The local data models representing one or more user views are then merged using the view integration approach to form the global logical data model representing all user views.
4.7
Explain why it is necessary to select the target DBMS before beginning the physical database design phase.
Database design is made up of two main phases called logical and physical design. During logical database design, we identify the important objects that need to be represented in the database and the relationships between these objects. During physical database design, we decide how the logical design is to be physically implemented (as tables) in the target DBMS. Therefore it is necessary to have selected the target DBMS before we are able to proceed to physical database design.
See Figure 4.1: Stages of the database system development lifecycle.
4.8
Discuss the two main activities associated with application design.
The database and application design stages are parallel activities of the database system development lifecycle. In most cases, we cannot complete the application design until the design of the database itself has taken place.
The two main activities associated with the application design stage are the design of the user interface and the design of the application programs that use and process the database.
We must ensure that all the functionality stated in the requirements specifications is present in the application design for the database system. This involves designing the interaction between the user and the data, which we call transaction design. In addition to designing how the required functionality is to be achieved, we have to design an appropriate user interface to the database system.
4.9
Describe the potential benefits of developing a prototype database system.
The purpose of developing a prototype database system is to allow users to use the prototype to identify the features of the system that work well, or are inadequate, and if possible to suggest improvements or even new features for the database system. In this way, we can greatly clarify the requirements and evaluate the feasibility of a particular system design. Prototypes should have the major advantage of being relatively inexpensive and quick to build.
4.10
Discuss the main activities associated with the implementation stage.
The database implementation is achieved using the Data Definition Language (DDL) of the selected DBMS or a graphical user interface (GUI), which provides the same functionality while hiding the low-level DDL statements. The DDL statements are used to create the database structures and empty database files. Any specified user views are also implemented at this stage.
The application programs are implemented using the preferred third or fourth generation language (3GL or 4GL). Parts of these application programs are the database transactions, which we implement using the Data Manipulation Language (DML) of the target DBMS, possibly embedded within a host programming language, such as Visual Basic (VB), VB.net, Python, Delphi, C, C++, C#, Java, COBOL, Fortran, Ada, or Pascal. We also implement the other components of the application design such as menu screens, data entry forms, and reports. Again, the target DBMS may have its own fourth generation tools that allow rapid development of applications through the provision of non-procedural query languages, reports generators, forms generators, and application generators.
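As one possible picture of a database transaction written in DML and embedded in a host language, the sketch below uses Python with its built-in sqlite3 module. The Staff table and the transfer_staff function are hypothetical and stand in for whatever transactions the application design actually specifies.

```python
import sqlite3

def transfer_staff(conn, staff_no, new_branch):
    """Move one member of staff to another branch as a single transaction."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        cur = conn.execute(
            "UPDATE Staff SET branchNo = ? WHERE staffNo = ?",  # embedded DML
            (new_branch, staff_no))
        if cur.rowcount != 1:
            raise LookupError(f"no such staff member: {staff_no}")

conn = sqlite3.connect(":memory:")
# DDL creates the (empty) structure; DML then populates and updates it.
conn.execute("CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, branchNo TEXT)")
conn.execute("INSERT INTO Staff VALUES ('S1', 'B001')")
conn.commit()

transfer_staff(conn, "S1", "B002")
current_branch = conn.execute(
    "SELECT branchNo FROM Staff WHERE staffNo = 'S1'").fetchone()[0]
```

The host language supplies the control flow and error handling, while the statements passed to the DBMS remain plain DML with parameters.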
Security and integrity controls for the application are also implemented. Some of these controls are implemented using the DDL, but others may need to be defined outside the DDL using, for example, the supplied DBMS utilities or operating system controls.
4.11
Describe the purpose of the data conversion and loading stage.
This stage is required only when a new database system is replacing an old system. Nowadays, it’s common for a DBMS to have a utility that loads existing files into the new database. The utility usually requires the specification of the source file and the target database, and then automatically converts the data to the required format of the new database files. Where applicable, it may be possible for the developer to convert and use application programs from the old system for use by the new system.
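A loading utility of the kind described can be approximated in a few lines. The sketch below, in Python with the built-in csv and sqlite3 modules, assumes the old system's data has been exported as a CSV file with a header row; the load_csv function, the Branch table, and the sample data are all invented for illustration and are not a real DBMS utility.

```python
import csv
import io
import sqlite3

def load_csv(conn, source, table, columns):
    """Load rows from a CSV source (with a header row) into an existing table."""
    reader = csv.DictReader(source)
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    # Convert to the target format: here, empty fields become nulls.
    rows = [tuple(rec[c] or None for c in columns) for rec in reader]
    with conn:  # load every row or none of them
        conn.executemany(sql, rows)
    return len(rows)

# Stand-in for a file exported from the old system.
old_file = io.StringIO(
    "branchNo,city,telNo\n"
    "B001,London,0207-555-0100\n"
    "B002,Glasgow,\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Branch (branchNo TEXT PRIMARY KEY, city TEXT, telNo TEXT)")
loaded = load_csv(conn, old_file, "Branch", ["branchNo", "city", "telNo"])
```

As with the utility in the text, the caller specifies only the source and the target; the conversion rule (empty string to null) is one example of the format changes such a step might apply.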
4.12
Explain the purpose of testing the database system.
Before going live, the newly developed database system should be thoroughly tested. This is achieved using carefully planned test strategies and realistic data so that the entire testing process is methodically and rigorously carried out. Note that in our definition of testing we have not used the commonly held view that testing is the process of demonstrating that faults are not present. In fact, testing cannot show the absence of faults; it can show only that software faults are present. If testing is conducted successfully, it will uncover errors in the application programs and possibly the database structure. As a secondary benefit, testing demonstrates that the database and the application programs appear to be working according to their specification and that performance requirements appear to be satisfied. In addition, metrics collected from the testing stage provide a measure of software reliability and software quality.
As with database design, the users of the new system should be involved in the testing process. The ideal situation for system testing is to have a test database on a separate hardware system, but often this is not available. If real data is to be used, it is essential to have backups taken in case of error.
Testing should also cover usability of the database system. Ideally, an evaluation should be conducted against a usability specification. Examples of criteria that can be used to conduct the evaluation include (Sommerville, 2000):
• Learnability - How long does it take a new user to become productive with the system?
• Performance - How well does the system response match the user’s work practice?
• Robustness - How tolerant is the system of user error?
• Recoverability - How good is the system at recovering from user errors?
• Adaptability - How closely is the system tied to a single model of work?
Some of these criteria may be evaluated in other stages of the lifecycle. After testing is complete, the database system is ready to be ‘signed off’ and handed over to the users.
4.13
What are the main activities associated with the operational maintenance stage?
In this stage, the database system now moves into a maintenance stage, which involves the following activities:
• Monitoring the performance of the database system. If the performance falls below an acceptable level, the database may need to be tuned or reorganized.
• Maintaining and upgrading the database system (when required). New requirements are incorporated into the database system through the preceding stages of the lifecycle.
Chapter 5 Database Administration and Security - Review Questions
5.1
Define the purpose and tasks associated with data administration and database administration.
Data administration is the management and control of the corporate data, including database planning, development and maintenance of standards, policies and procedures, and logical database design.
Database administration is the management and control of the physical realization of the corporate database system, including physical database design and implementation, setting security and integrity controls, monitoring system performance, and reorganizing the database as necessary.
5.2
Compare and contrast the main tasks carried out by the DA and DBA.
The Data Administrator (DA) and Database Administrator (DBA) are responsible for managing and controlling the activities associated with the corporate data and the corporate database, respectively. The DA is more concerned with the early stages of the lifecycle, from planning through to logical database design. In contrast, the DBA is more concerned with the later stages, from application/physical database design to operational maintenance. Depending on the size and complexity of the organization and/or database system, the DA and DBA can be the responsibility of one or more people.
5.3
Explain the purpose and scope of database security.
Security considerations do not only apply to the data held in a database. Breaches of security may affect other parts of the system, which may in turn affect the database. Consequently, database security encompasses hardware, software, people, and data. To effectively implement security requires appropriate controls, which are defined in specific mission objectives for the system. This need for security, while often having been neglected or overlooked in the past, is now increasingly recognized by organizations. This turn-around is due to the increasing amounts of crucial corporate data being stored on computer and the acceptance that any loss or unavailability of this data could be potentially disastrous.
5.4
List the main types of threat that could affect a database system, and for each, describe the possible outcomes for an organization.
See Figure 5.1: A summary of the potential threats to computer systems.
5.5
Explain the following in terms of providing security for a database:
authorization; views; backup and recovery; integrity; encryption; RAID.
Authorization
Authorization is the granting of a right or privilege that enables a subject to have legitimate access to a system or a system’s object. Authorization controls can be built into the software, and govern not only what database system or object a specified user can access, but also what the user may do with it. The process of authorization involves authentication of a subject requesting access to an object, where ‘subject’ represents a user or program and ‘object’ represents a database table, view, procedure, trigger, or any other object that can be created within the database system.
Views
A view is a virtual table that does not necessarily exist in the database but can be produced upon request by a particular user, at the time of request. The view mechanism provides a powerful and flexible security mechanism by hiding parts of the database from certain users. The user is not aware of the existence of any columns or rows that are missing from the view. A view can be defined over several tables with a user being granted the appropriate privilege to use it, but not to use the base tables. In this way, using a view is more restrictive than simply having certain privileges granted to a user on the base table(s).
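The hiding effect of a view can be sketched with Python's built-in sqlite3 module. The Staff table, the StaffB001 view, and the sample rows are hypothetical. SQLite has no privilege system, so only the restriction of rows and columns is shown here; in a multi-user DBMS the user would additionally be granted access to the view and not to the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, name TEXT, branchNo TEXT, salary REAL)")
conn.executemany("INSERT INTO Staff VALUES (?, ?, ?, ?)", [
    ("S1", "Ann", "B001", 30000),
    ("S2", "Bob", "B002", 12000),
])

# The view exposes only staff at branch B001 and omits the salary column.
conn.execute("""
    CREATE VIEW StaffB001 AS
    SELECT staffNo, name FROM Staff WHERE branchNo = 'B001'
""")

cur = conn.execute("SELECT * FROM StaffB001")
view_columns = [d[0] for d in cur.description]  # column names visible through the view
view_rows = cur.fetchall()
```

A user querying StaffB001 sees one row with two columns and has no indication that the salary column or the B002 row exists.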
Backup and recovery
Backup is the process of periodically taking a copy of the database and log file (and possibly programs) onto offline storage media. A DBMS should provide backup facilities to assist with the recovery of a database following failure. To keep track of database transactions, the DBMS maintains a special file called a log file (or journal) that contains information about all updates to the database. It is always advisable to make backup copies of the database and log file at regular intervals and to ensure that the copies are in a secure location. In the event of a failure that renders the database unusable, the backup copy and the details captured in the log file are used to restore the database to the latest possible consistent state. Journaling is the process of keeping and maintaining a log file (or journal) of all changes made to the database to enable recovery to be undertaken effectively in the event of a failure.
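As a minimal sketch of the backup half of this process, the following uses the Connection.backup method of Python's sqlite3 module. The Branch table and its single row are hypothetical, and both databases live in memory purely to keep the example self-contained; a real backup would target a file on offline media. Recovery from a log file (journaling) is not modeled here.

```python
import sqlite3

# Hypothetical operational database.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE Branch (branchNo TEXT PRIMARY KEY, city TEXT)")
source.execute("INSERT INTO Branch VALUES ('B001', 'Example City')")
source.commit()

# Take a backup copy. In practice the target would be a file stored in a
# secure location rather than a second in-memory database.
backup_copy = sqlite3.connect(":memory:")
source.backup(backup_copy)

# Simulate a failure that destroys the operational data.
source.execute("DROP TABLE Branch")
source.commit()

# The backup copy still holds the state as of the time of the backup.
restored_rows = backup_copy.execute(
    "SELECT branchNo, city FROM Branch"
).fetchall()
print(restored_rows)
```

Any updates made after the backup was taken would be lost in this sketch, which is exactly the gap the log file is meant to close.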
Integrity
Integrity constraints contribute to maintaining a secure database system by preventing data from becoming invalid, and hence giving misleading or incorrect results.
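One way to see an integrity constraint preventing invalid data is a CHECK constraint, sketched below with Python's sqlite3 module. The Video table definition, the rule that dailyRental must be positive, and the sample values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the CHECK constraint rejects non-positive rental rates.
conn.execute(
    """CREATE TABLE Video (
           catalogNo   TEXT PRIMARY KEY,
           title       TEXT NOT NULL,
           dailyRental REAL CHECK (dailyRental > 0)
       )"""
)
conn.execute("INSERT INTO Video VALUES ('V001', 'Example Title', 5.00)")

try:
    # A negative rental rate is invalid data, so the DBMS refuses the insert.
    conn.execute("INSERT INTO Video VALUES ('V002', 'Another Title', -1.00)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Video").fetchone()[0]
print(rejected, row_count)
```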
Encryption
Encryption is the encoding of the data by a special algorithm that renders the data unreadable by any program without the decryption key. If a database system holds particularly sensitive data, it may be deemed necessary to encode it as a precaution against possible external threats or attempts to access it. Some DBMSs provide an encryption facility for this purpose. The DBMS can access the data (after decoding it), although there is degradation in performance because of the time taken to decode it. Encryption also protects data transmitted over communication lines. There are a number of techniques for encoding data to conceal the information; some are termed irreversible and others reversible. Irreversible techniques, as the name implies, do not permit the original data to be known. However, the data can be used to obtain valid statistical information. Reversible techniques are more commonly used. To transmit data securely over insecure networks requires the use of a cryptosystem, which includes:
• an encryption key to encrypt the data (plaintext);
• an encryption algorithm that, with the encryption key, transforms the plaintext into ciphertext;
• a decryption key to decrypt the ciphertext;
• a decryption algorithm that, with the decryption key, transforms the ciphertext back into plaintext.
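The four parts of a cryptosystem listed above can be made concrete with a toy reversible cipher. The sketch below XORs each byte of the data with a random key of the same length, so the encryption and decryption algorithms are the same function and the encryption and decryption keys are the same key. The function name xor_bytes and the sample plaintext are hypothetical. This is for illustration only and is not a substitute for a vetted encryption library; in particular, the key must never be reused.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Toy algorithm: XOR each data byte with the matching key byte."""
    if len(key) < len(data):
        raise ValueError("key must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


plaintext = b"salary=46000"

# Key: random bytes, as long as the plaintext (used for both directions).
key = secrets.token_bytes(len(plaintext))

# Encryption algorithm + key: plaintext -> ciphertext.
ciphertext = xor_bytes(plaintext, key)

# Decryption algorithm + key: ciphertext -> plaintext.
recovered = xor_bytes(ciphertext, key)

print(recovered == plaintext)
```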
Redundant Array of Independent Disks (RAID)
RAID works by having a large disk array comprising an arrangement of several independent disks that are organized to improve reliability and at the same time increase performance. The hardware that the DBMS is running on must be fault-tolerant, meaning that the DBMS should continue to operate even if one of the hardware components fails. This suggests having redundant components that can be seamlessly integrated into the working system whenever one or more components fail. The main hardware components that should be fault-tolerant include disk drives, disk controllers, CPU, power supplies, and cooling fans. Disk drives are the most vulnerable components, with the shortest times between failures of any of the hardware components.
Chapter 6 Fact-Finding - Review questions
6.1 Briefly describe what the process of fact-finding attempts to achieve for a database developer.
Fact-finding is the formal process of using techniques such as interviews and questionnaires to collect facts about systems, requirements, and preferences.
The database developer uses fact-finding techniques at various stages throughout the database system development lifecycle to capture the facts needed to build the required database system. These facts cover the business and the users of the database system, including the terminology, problems, opportunities, constraints, requirements, and priorities.
6.2 Describe how fact-finding is used throughout the stages of the database system development lifecycle.
There are many occasions for fact-finding during the database system development lifecycle. However, fact-finding is particularly crucial to the early stages of the lifecycle, including the database planning, system definition, and requirements collection and analysis stages. It’s during these early stages that the database developer learns about the terminology, problems, opportunities, constraints, requirements, and priorities of the business and the users of the system. Fact-finding is also used during database design and the later stages of the lifecycle, but to a lesser extent. For example, during physical database design, fact-finding becomes technical as the developer attempts to learn more about the DBMS selected for the database system. Also, during the final stage, operational maintenance, fact-finding is used to determine whether a system requires tuning to improve performance or further development to include new requirements.
6.3 For each stage of the database system development lifecycle identify examples of the facts captured and the documentation produced.
6.4 A database developer normally uses several fact-finding techniques during a single database project. The five most commonly used techniques are examining documentation, interviewing, observing the business in operation, conducting research, and using questionnaires. Describe each fact-finding technique and identify the advantages and disadvantages of each.
Examining documentation can be useful when you’re trying to gain some insight as to how the need for a database arose. You may also find that documentation can be helpful to provide information on the business (or part of the business) associated with the problem. If the problem relates to the current system there should be documentation associated with that system. Examining documents, forms, reports, and files associated with the current system is a good way to quickly gain some understanding of the system.
Interviewing is the most commonly used, and normally most useful, fact-finding technique. You can interview to collect information from individuals face-to-face. There can be several objectives to using interviewing such as finding out facts, checking facts, generating user interest and feelings of involvement, identifying requirements, and gathering ideas and opinions.
Observation is one of the most effective fact-finding techniques you can use to understand a system. With this technique, you can either participate in, or watch a person perform, activities to learn about the system. This technique is particularly useful when the validity of data collected through other methods is in question or when the complexity of certain aspects of the system prevents a clear explanation by the end-users.
A useful fact-finding technique is to research the application and problem. Computer trade journals, reference books, and the Internet are good sources of information. They can provide you with information on how others have solved similar problems, plus you can learn whether or not software packages exist to solve your problem.
Another fact-finding technique is to conduct surveys through questionnaires. Questionnaires are special-purpose documents that allow you to gather facts from a large number of people while maintaining some control over their responses. When dealing with a large audience, no other fact-finding technique can tabulate the same facts as efficiently.
6.5 Describe the purpose of defining a mission statement and mission objectives for a database system.
The mission statement defines the major aims of the database system. Those driving the database project within the business (such as the Director and/or owner) normally define the mission statement. A mission statement helps to clarify the purpose of the database project and provides a clearer path towards the efficient and effective creation of the required database system.
6.7 How do the contents of a users’ requirements specification differ from a systems specification?
There are two main documents created during the requirements collection and analysis stage, namely the users’ requirements specification and the systems specification. The users’ requirements specification describes in detail the data to be held in the database and how the data is to be used. The systems specification describes any features to be included in the database system such as the required performance and the levels of security.
6.8 Describe one approach to deciding whether to use centralized, view integration, or a combination of both when developing a database system for multiple user views.
Chapter 7 Entity-Relationship Modeling - Review questions
7.1 Describe what entities represent in an ER model and provide examples of entities with physical or conceptual existence.
An entity is a set of objects with the same properties, which are identified by a user or company as having an independent existence. Each object, which should be uniquely identifiable within the set, is called an entity occurrence. An entity has an independent existence and can represent objects with a physical (or ‘real’) existence or objects with a conceptual (or ‘abstract’) existence.
7.2 Describe what relationships represent in an ER model and provide examples of unary, binary, and ternary relationships.
A relationship is a set of meaningful associations among entities. As with entities, each association should be uniquely identifiable within the set. A uniquely identifiable association is called a relationship occurrence. Each relationship is given a name that describes its function. For example, the Actor entity is associated with the Role entity through a relationship called Plays, and the Role entity is associated with the Video entity through a relationship called Features.
The entities involved in a particular relationship are referred to as participants. The number of participants in a relationship is called the degree and indicates the number of entities involved in a relationship. A relationship of degree one is called unary, which is commonly referred to as a recursive relationship. A unary relationship describes a relationship where the same entity participates more than once in different roles. An example of a unary relationship is Supervises, which represents an association of staff with a supervisor where the supervisor is also a member of staff. In other words, the Staff entity participates twice in the Supervises relationship: the first participation as a supervisor, and the second participation as a member of staff who is supervised (supervisee). See Figure 7.5 for a diagrammatic representation of the Supervises relationship.
A relationship of degree two is called binary.
A relationship of a degree higher than binary is called a complex relationship. A relationship of degree three is called ternary. An example of a ternary relationship is Registers with three participating entities, namely Branch, Staff, and Member. The purpose of this relationship is to represent the situation where a member of staff registers a member at a particular branch, allowing for members to register at more than one branch, and members of staff to move between branches.
Figure 7.4 Example of a ternary relationship called Registers.
7.3 Describe what attributes represent in an ER model and provide examples of simple, composite, single-valued, multi-valued, and derived attributes.
An attribute is a property of an entity or a relationship. Attributes represent what we want to know about entities. For example, a Video entity may be described by the catalogNo, title, category, dailyRental, and price attributes. These attributes hold values that describe each video occurrence, and represent the main source of data stored in the database.
A simple attribute is an attribute composed of a single component. Simple attributes cannot be further subdivided. Examples of simple attributes include the category and price attributes for a video.
A composite attribute is an attribute composed of multiple components. Composite attributes can be further divided to yield smaller components with an independent existence. For example, the name attribute of the Member entity with the value ‘Don Nelson’ can be subdivided into fName (‘Don’) and lName (‘Nelson’).
A single-valued attribute is an attribute that holds a single value for an entity occurrence. The majority of attributes are single-valued for a particular entity. For example, each occurrence of the Video entity has a single value for the catalogNo attribute (for example, 207132), and therefore the catalogNo attribute is referred to as being single-valued.
A multi-valued attribute is an attribute that holds multiple values for an entity occurrence. Some attributes have multiple values for a particular entity. For example, each occurrence of the Video entity may have multiple values for the category attribute (for example, ‘Children’ and ‘Comedy’), and therefore the category attribute in this case would be multi-valued. A multi-valued attribute may have a set of values with specified lower and upper limits. For example, the category attribute may have between one and three values.
A derived attribute is an attribute that represents a value that is derivable from the value of a related attribute, or set of attributes, not necessarily in the same entity. Some attributes may be related for a particular entity. For example, the age of a member of staff (age) is derivable from the date of birth (DOB) attribute, and therefore the age and DOB attributes are related. We refer to the age attribute as a derived attribute, the value of which is derived from the DOB attribute.
7.4 Describe what multiplicity represents for a relationship.
Multiplicity is the number of occurrences of one entity that may relate to a single occurrence of an associated entity.
7.5 What are business rules and how does multiplicity model these constraints?
Multiplicity constrains the number of entity occurrences that relate to other entity occurrences through a particular relationship. Multiplicity is a representation of the policies established by the user or company, and is referred to as a business rule. Ensuring that all appropriate business rules are identified and represented is an important part of modeling a company. The multiplicity for a binary relationship is generally referred to as one-to-one (1:1), one-to-many (1:*), or many-to-many (*:*). Examples of three types of relationships include:
• A member of staff manages a branch.
• A branch has members of staff.
• Actors play in videos.
7.6 How does multiplicity represent both the cardinality and the participation constraints on a relationship?
Multiplicity actually consists of two separate constraints known as cardinality and participation. Cardinality describes the number of possible relationships for each participating entity. Participation determines whether all or only some entity occurrences participate in a relationship. The cardinality of a binary relationship is what we have been referring to as one-to-one, one-to-many, and many-to-many. A participation constraint represents whether all entity occurrences are involved in a particular relationship (mandatory participation) or only some (optional participation). The cardinality and participation constraints for the Staff Manages Branch relationship are shown in Figure 7.11.
7.7 Provide an example of a relationship with attributes.
An example of a relationship with an attribute is the relationship called PlaysIn, which associates the Actor and Video entities. We may wish to record the character played by an actor in a given video. This information is associated with the PlaysIn relationship rather than the Actor or Video entities. We create an attribute called character to store this information and assign it to the PlaysIn relationship, as illustrated in Figure 7.12. Note that in this figure the character attribute is shown using the symbol for an entity; however, to distinguish between a relationship with an attribute and an entity, the rectangle representing the attribute is associated with the relationship using a dashed line.
Figure 7.12 A relationship called PlaysIn with an attribute called character.
7.8 Describe how strong and weak entities differ and provide an example of each.
We can classify entities as being either strong or weak. A strong entity is not dependent on the existence of another entity for its primary key. A weak entity is partially or wholly dependent on the existence of another entity, or entities, for its primary key. For example, as we can distinguish one actor from all other actors and one video from all other videos without the existence of any other entity, Actor and Video are referred to as being strong entities. In other words, the Actor and Video entities are strong because they have their own primary keys. An example of a weak entity is Role, which represents characters played by actors in videos. If we are unable to uniquely identify one Role entity occurrence from another without the existence of the Actor and Video entities, then Role is referred to as being a weak entity. In other words, the Role entity is weak because it has no primary key of its own.
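One possible relational rendering of the strong and weak entities described above is sketched below with Python's sqlite3 module. The column names actorNo and actorName, the sample rows, and the decision to give Role a composite primary key built from the keys of Actor and Video are all assumptions made for illustration; the source states only that Role has no primary key of its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript(
    """
    -- Strong entities: each has its own primary key.
    CREATE TABLE Actor (actorNo TEXT PRIMARY KEY, actorName TEXT);
    CREATE TABLE Video (catalogNo TEXT PRIMARY KEY, title TEXT);

    -- Weak entity: identified only through the keys of Actor and Video.
    CREATE TABLE Role (
        actorNo     TEXT NOT NULL REFERENCES Actor(actorNo),
        catalogNo   TEXT NOT NULL REFERENCES Video(catalogNo),
        "character" TEXT,
        PRIMARY KEY (actorNo, catalogNo)
    );

    INSERT INTO Actor VALUES ('A1', 'Actor One');
    INSERT INTO Video VALUES ('V1', 'Video One');
    INSERT INTO Role  VALUES ('A1', 'V1', 'Hero');
    """
)

try:
    # A Role cannot exist without an owning Actor occurrence.
    conn.execute("INSERT INTO Role VALUES ('A9', 'V1', 'Ghost')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

role_count = conn.execute("SELECT COUNT(*) FROM Role").fetchone()[0]
print(orphan_rejected, role_count)
```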
Figure 7.6 Diagrammatic representation of attributes for the Video, Role, and Actor entities.
Strong entities are sometimes referred to as parent, owner, or dominant entities and weak entities as child, dependent, or subordinate entities.
7.9 Describe how fan and chasm traps can occur in an ER model and how they can be resolved.
Fan and chasm traps are two types of connection traps that can occur in ER models. The traps normally occur due to a misinterpretation of the meaning of certain relationships. In general, to identify connection traps we must ensure that the meaning of a relationship (and the business rule that it represents) is fully understood and clearly defined. If we don’t understand the relationships we may create a model that is not a true representation of the ‘real world’.
A fan trap may occur when two entities have a 1:* relationship that fans out from a third entity, but the two entities should have a direct relationship between them to provide the necessary information. A fan trap may be resolved through the addition of a direct relationship between the two entities that were originally separated by the third entity.
A chasm trap may occur when an ER model suggests the existence of a relationship between entities, but the pathway does not exist between certain entity occurrences. More specifically, a chasm trap may occur where there is a relationship with optional participation that forms part of the pathway between the entities that are related. Again, a chasm trap may be resolved by the addition of a direct relationship between the two entities that were originally related through a pathway that included optional participation.
Chapter 8 Normalization - Review questions
8.1 Discuss how normalization may be used in database design.
Normalization can be used in database design in two ways: the first is to use normalization as a bottom-up approach to database design; the second is to use normalization in conjunction with ER modeling.
Using normalization as a bottom-up approach involves analyzing the associations between attributes and, based on this analysis, grouping the attributes together to form tables that represent entities and relationships. However, this approach becomes difficult with a large number of attributes, where it’s difficult to establish all the important associations between the attributes. Alternatively, you can use a top-down approach to database design. In this approach, we use ER modeling to create a data model that represents the main entities and relationships. We then translate the ER model into a set of tables that represents this data. It’s at this point that we use normalization to check whether the tables are well designed.
8.2 Describe the types of update anomalies that may occur on a table that has redundant data.
Tables that have redundant data may have problems called update anomalies, which are classified as insertion, deletion, or modification anomalies. See Figure 8.2 for an example of a table with redundant data called StaffBranch. There are two main types of insertion anomalies, which we illustrate using this table.
Insertion anomalies
(1) To insert the details of a new member of staff located at a given branch into the StaffBranch table, we must also enter the correct details for that branch. For example, to insert the details of a new member of staff at branch B002, we must enter the correct details of branch B002 so that the branch details are consistent with values for branch B002 in other records of the StaffBranch table. The data shown in the StaffBranch table is also shown in the Staff and Branch tables shown in Figure 8.1. These tables do not have redundant data and do not suffer from this potential inconsistency, because for each staff member we only enter the appropriate branch number into the Staff table. In addition, the details of branch B002 are recorded only once in the database as a single record in the Branch table.
(2) To insert details of a new branch that currently has no members of staff into the StaffBranch table, it’s necessary to enter nulls into the staff-related columns, such as staffNo. However, as staffNo is the primary key for the StaffBranch table, attempting to enter nulls for staffNo violates entity integrity, and is not allowed. The design of the tables shown in Figure 8.1 avoids this problem because new branch details are entered into the Branch table separately from the staff details. The details of staff ultimately located at a new branch can be entered into the Staff table at a later date.
Deletion anomalies
If we delete a record from the StaffBranch table that represents the last member of staff located at a branch, the details about that branch are also lost from the database. For example, if we delete the record for staff Art Peters (S0415) from the StaffBranch table, the details relating to branch B003 are lost from the database. The design of the tables in Figure 8.1 avoids this problem because branch records are stored separately from staff records and only the column branchNo relates the two tables. If we delete the record for staff Art Peters (S0415) from the Staff table, the details on branch B003 in the Branch table remain unaffected.
Modification anomalies
If we want to change the value of one of the columns of a particular branch in the StaffBranch table, for example the telephone number for branch B001, we must update the records of all staff located at that branch. If this modification is not carried out on all the appropriate records of the StaffBranch table, the database will become inconsistent. In this example, branch B001 would have different telephone numbers in different staff records.
The above examples illustrate that the Staff and Branch tables of Figure 8.1 have more desirable properties than the StaffBranch table of Figure 8.2. In the following sections, we examine how normal forms can be used to formalize the identification of tables that have desirable properties from those that may potentially suffer from update anomalies.
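The modification anomaly described above can be reproduced in a few lines of Python. The rows below are hypothetical stand-ins in the style of the StaffBranch table (they are not the contents of Figure 8.2), and the helper names are illustrative. The sketch updates the telephone number in only one of the records for branch B001 and then detects the resulting inconsistency.

```python
# Hypothetical StaffBranch-style rows: branch details repeat for every
# member of staff at that branch.
staff_branch = [
    {"staffNo": "S0001", "branchNo": "B001", "telNo": "555-0100"},
    {"staffNo": "S0002", "branchNo": "B001", "telNo": "555-0100"},
    {"staffNo": "S0003", "branchNo": "B002", "telNo": "555-0200"},
]


def branch_phone_numbers(rows):
    """Map each branchNo to the set of telNo values recorded for it."""
    phones = {}
    for row in rows:
        phones.setdefault(row["branchNo"], set()).add(row["telNo"])
    return phones


def inconsistent_branches(rows):
    """Return the branches that have more than one telephone number."""
    return sorted(
        branch
        for branch, numbers in branch_phone_numbers(rows).items()
        if len(numbers) > 1
    )


before = inconsistent_branches(staff_branch)

# Incomplete update: only one of the two B001 records is changed.
staff_branch[0]["telNo"] = "555-0199"

after = inconsistent_branches(staff_branch)
print(before, after)
```

With separate Staff and Branch tables, the telephone number would be stored once, so a single update could not leave the data in this state.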
8.3 Describe the characteristics of a table that violates first normal form (1NF) and then describe how such a table is converted to 1NF.
A table in first normal form (1NF) is one in which the intersection of every column and record contains only one value. In other words, a table that contains more than one atomic value in the intersection of one or more columns for one or more records is not in 1NF. The non-1NF table can be converted to 1NF by restructuring the original table: the column with the multiple values is removed, along with a copy of the primary key, to create a new table. See Figure 8.4 for an example of this approach. The advantage of this approach is that the resultant tables may be in normal forms later than 1NF.
8.4 What is the minimal normal form that a relation must satisfy? Provide a definition for this normal form.
8.6 Describe the characteristics of a table in second normal form (2NF).
A table in second normal form (2NF) is a table that is already in 1NF and in which the values in each non-primary-key column can only be worked out from the values in all the columns that make up the primary key.
8.7 Describe what is meant by full functional dependency and describe how this type of dependency relates to 2NF. Provide an example to illustrate your answer.
The formal definition of second normal form (2NF) is a table that is in first normal form and in which every non-primary-key column is fully functionally dependent on the primary key. Full functional dependency indicates that if A and B are columns of a table, B is fully functionally dependent on A if B is functionally dependent on A but not on any proper subset of A. If B is dependent on a subset of A, this is referred to as a partial dependency. If a partial dependency exists on the primary key, the table is not in 2NF. The partial dependency must be removed for a table to achieve 2NF.
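A partial dependency can be looked for in sample data with a small check. The sketch below is an assumption-laden illustration: the table, its composite primary key (staffNo, branchNo), the column names, and the function holds are all hypothetical. Note that sample rows can only refute a functional dependency; a dependency that holds in the sample still has to be confirmed against the business rules.

```python
def holds(rows, determinant, dependent):
    """Return True if determinant -> dependent holds in the given rows."""
    seen = {}
    for row in rows:
        key = tuple(row[column] for column in determinant)
        value = tuple(row[column] for column in dependent)
        # setdefault stores the first value seen for this key; any later
        # row with the same key but a different value breaks the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical table with composite primary key (staffNo, branchNo).
allocation = [
    {"staffNo": "S1", "branchNo": "B1", "name": "Ann", "hoursPerWeek": 10},
    {"staffNo": "S1", "branchNo": "B2", "name": "Ann", "hoursPerWeek": 6},
    {"staffNo": "S2", "branchNo": "B1", "name": "Bob", "hoursPerWeek": 8},
]

# hoursPerWeek needs the whole key: a full functional dependency.
hours_on_full_key = holds(allocation, ["staffNo", "branchNo"], ["hoursPerWeek"])
hours_on_staff_only = holds(allocation, ["staffNo"], ["hoursPerWeek"])

# name depends on staffNo alone: a partial dependency, so not 2NF.
name_on_staff_only = holds(allocation, ["staffNo"], ["name"])

print(hours_on_full_key, hours_on_staff_only, name_on_staff_only)
```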
See Section 8.4 for an example.
8.8 Describe the characteristics of a table in third normal form (3NF).
A table in third normal form (3NF) is a table that is already in 1NF and 2NF, and in which the values in all non-primary-key columns can be worked out from only the primary key (or candidate key) column(s) and no other columns.
8.9 Describe what is meant by transitive dependency and describe how this type of dependency relates to 3NF. Provide an example to illustrate your answer.
The formal definition for third normal form (3NF) is a table that is in first and second normal forms and in which no non-primary-key column is transitively dependent on the primary key. Transitive dependency is a type of functional dependency that occurs when a particular type of relationship holds between columns of a table. For example, consider a table with columns A, B, and C. If B is functionally dependent on A (A → B) and C is functionally dependent on B (B → C), then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C). If a transitive dependency exists on the primary key, the table is not in 3NF. The transitive dependency must be removed for a table to achieve 3NF. See Section 8.5 for an example.
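Removing a transitive dependency amounts to splitting the table so that the intermediate column becomes the key of its own table. The sketch below assumes a hypothetical table in which staffNo → branchNo and branchNo → branchAddress; the rows, column names, and the helper project are illustrative and are not taken from Section 8.5.

```python
def project(rows, columns):
    """Project rows onto columns, dropping duplicates but keeping order."""
    result = []
    for row in rows:
        projected = {column: row[column] for column in columns}
        if projected not in result:
            result.append(projected)
    return result


# Hypothetical table: branchAddress depends on staffNo only via branchNo.
staff_rows = [
    {"staffNo": "S1", "name": "Ann", "branchNo": "B1", "branchAddress": "1 Main St"},
    {"staffNo": "S2", "name": "Bob", "branchNo": "B1", "branchAddress": "1 Main St"},
    {"staffNo": "S3", "name": "Cy", "branchNo": "B2", "branchAddress": "9 Oak Ave"},
]

# Decompose: branchAddress moves to a table keyed by branchNo.
staff = project(staff_rows, ["staffNo", "name", "branchNo"])
branch = project(staff_rows, ["branchNo", "branchAddress"])

# Joining on branchNo reconstructs the original rows, so nothing is lost.
rejoined = [
    dict(s, **b) for s in staff for b in branch if s["branchNo"] == b["branchNo"]
]
print(len(branch), rejoined == staff_rows)
```

After the split each branch address is stored once, and the join shows that the decomposition loses no information for this sample.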
Chapter 9 Logical Database Design - Step 1 - Review questions
9.1 Describe the purpose of a design methodology.
A design methodology is a structured approach that uses procedures, techniques, tools, and documentation aids to support and facilitate the process of design.
9.2 Describe the main phases involved in database design.
Database design is made up of two main phases: logical and physical database design.
Logical database design is the process of constructing a model of the data used in a company based on a specific data model, but independent of a particular DBMS and other physical considerations. In the logical database design phase we build the logical representation of the database, which includes identification of the important entities and relationships, and then translate this representation to a set of tables. The logical data model is a source of information for the physical design phase, providing the physical database designer with a vehicle for making tradeoffs that are very important to the design of an efficient database.
Physical database design is the process of producing a description of the implementation of the database on secondary storage; it describes the base tables, file organizations, and indexes used to achieve efficient access to the data, and any associated integrity constraints and security restrictions. In the physical database design phase we decide how the logical design is to be physically implemented in the target relational DBMS. This phase allows the designer to make decisions on how the database is to be implemented. Therefore, physical design is tailored to a specific DBMS.
9.3 Identify important factors in the success of database design.
The following are important factors to the success of database design:
• Work interactively with the users as much as possible.
• Follow a structured methodology throughout the data modeling process.
• Employ a data-driven approach.
• Incorporate structural and integrity considerations into the data models.
• Use normalization and transaction validation techniques in the methodology.
• Use diagrams to represent as much of the data models as possible.
• Use a database design language (DBDL).
• Build a data dictionary to supplement the data model diagrams.
• Be willing to repeat steps.
9.4 Discuss the important role played by users in the process of database design.
Users play an essential role in confirming that the logical database design is meeting their requirements. Logical database design is made up of two steps and at the end of each step (Steps 1.9 and 2.5) users are required to review the design and provide feedback to the designer.
9.5 Discuss the main activities associated with each step of the logical database design methodology.
The logical database design phase of the methodology is divided into two main steps.
• In Step 1 we create a data model and check that the data model has minimal redundancy and is capable of supporting user transactions. The output of this step is the creation of a logical data model, which is a complete and accurate representation of the company (or part of the company) that is to be supported by the database.
• In Step 2 we map the ER model to a set of tables. The structure of each table is checked using normalization. Normalization is an effective means of ensuring that the tables are structurally consistent and logical, with minimal redundancy. The tables are also checked to ensure that they are capable of supporting the required transactions. The required integrity constraints on the database are also defined.
9.6 Discuss the main activities associated with each step of the physical database design methodology.
Physical database design is divided into six main steps:
• Step 3 involves the design of the base tables and integrity constraints using the available functionality of the target DBMS.
• Step 4 involves choosing the file organizations and indexes for the base tables. Typically, DBMSs provide a number of alternative file organizations for data, with the exception of PC DBMSs, which tend to have a fixed storage structure.
• Step 5 involves the design of the user views originally identified in the requirements analysis and collection stage of the database system development lifecycle.
• Step 6 involves designing the security measures to protect the data from unauthorized access.
• Step 7 considers relaxing the normalization constraints imposed on the tables to improve the overall performance of the system. This is a step that you should undertake only if necessary, because of the inherent problems involved in introducing redundancy while still maintaining consistency.
• Step 8 is an ongoing process of monitoring and tuning the operational system to identify and resolve any performance problems resulting from the design and to implement new or changing requirements.
9.7 Discuss the purpose of Step 1 of logical database design.
The purpose of Step 1 is to build a logical data model of the data requirements of a company (or part of a company) to be supported by the database.
Each logical data model comprises:
• entities,
• relationships,
• attributes and attribute domains,
• primary keys and alternate keys,
• integrity constraints.
The logical data model is supported by documentation, including a data dictionary and ER diagrams, which you'll produce throughout the development of the model.
9.8 Identify the main tasks associated with Step 1 of logical database design.
Step 1 Create and check ER model
Step 1.1 Identify entities
Step 1.2 Identify relationships
Step 1.3 Identify and associate attributes with entities or relationships
Step 1.4 Determine attribute domains
Step 1.5 Determine candidate, primary, and alternate key attributes
Step 1.6 Specialize/Generalize entities (optional step)
Step 1.7 Check model for redundancy
Step 1.8 Check model supports user transactions
Step 1.9 Review model with users
9.9 Discuss an approach to identifying entities and relationships from a users' requirements specification.
Identifying entities
Identifying relationships
Having identified the entities, the next step is to identify all the relationships that exist between these entities. When you identify entities, one method is to look for nouns in the users' requirements specification. Again, you can use the grammar of the requirements specification to identify relationships. Typically, relationships are indicated by verbs or verbal expressions. For example:
• Branch Has Staff
• Branch IsAllocated VideoForRent
• VideoForRent IsPartOf RentalAgreement
The fact that the users' requirements specification records these relationships suggests that they are important to the users, and should be included in the model. Take great care to ensure that all the relationships that are either explicit or implicit in the users' requirements specification are noted. In principle, it should be possible to check each pair of entities for a potential relationship between them, but this would be a daunting task for a large system comprising hundreds of entities.
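To put a number on why checking every pair of entities becomes daunting, the short sketch below counts unordered pairs, n(n - 1)/2. The entity names are borrowed from the StayHome examples in this manual, and the figure of 300 entities is only an illustrative assumption.

```python
from itertools import combinations
from math import comb

# Entity names taken from the StayHome examples; the list itself is illustrative.
entities = ["Branch", "Staff", "VideoForRent", "RentalAgreement", "Member"]

# Every unordered pair of entities is a candidate for a relationship check.
candidate_pairs = list(combinations(entities, 2))


def pairs_to_check(entity_count: int) -> int:
    """Number of unordered entity pairs: n * (n - 1) / 2."""
    return comb(entity_count, 2)


print(len(candidate_pairs))   # pairs for the five entities above
print(pairs_to_check(300))    # pairs for a hypothetical system of 300 entities
```

Because the pair count grows quadratically, using the grammar of the specification to find relationships is usually the more practical starting point.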
By far the easiest thing to do when you've identified an entity or a relationship in the users' requirements specification is to consider "What information are we required to hold on . . .?". The answer to this question should be described in the specification. However, in some cases, you may need to ask the users to clarify the requirements. Unfortunately, they may give you answers that also contain other concepts, so users' responses must be carefully considered.
9.11 Discuss an approach to che…
(1) re-examining one-to-one (1:1) relationships;
(2) removing redundant relationships;
(3) considering the time dimension when assessing redundancy.
However, to answer this question you need only describe one approach. We describe approach (1) here.
An example of approach (1)
In the identification of entities, you may have identified two entities that represent the same object in the company. For example, you may have identified two entities named Branch and …
Figure 9.2 Extract from the data dictionary for the Branch user views of StayHome showing a description of entities.
ER diagrams
Throughout the database design phase, ER diagrams are used whenever necessary, to help build up a picture of what you're attempting to model. Different people use different notations for ER diagrams. In this book, we've used the latest object-oriented notation called UML (Unified Modeling Language), but other notations perform a similar function.
Document relationships
As you identify relationships, assign them names that are meaningful and obvious to the user, and also record relationship descriptions and the multiplicity constraints in the data dictionary.
Figure 9.7 Extract from the data dictionary for the Branch user views of StayHome showing descriptions of relationships.
Document attributes
As you identify attributes, assign them names that are meaningful and obvious to the user. Where appropriate, record the following information for each attribute:
• attribute name and description;
• data type and length;
• any aliases that the attribute is known by;
• whether the attribute must always be specified (in other words, whether the attribute allows or disallows nulls);
• whether the attribute is multi-valued;
• whether the attribute is composite, and if so, which simple attributes make up the composite attribute;
• whether the attribute is derived and, if so, how it should be computed;
• default values for the attribute (if specified).
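The items listed above map naturally onto one record per attribute. The following is a minimal sketch of such a data dictionary entry; the class name, field names, and the sample entries (including `fName`) are assumptions made for illustration, not a format prescribed by the methodology.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AttributeEntry:
    """One data dictionary record for an attribute (field names are hypothetical)."""
    name: str
    description: str
    data_type: str
    length: Optional[int] = None
    aliases: List[str] = field(default_factory=list)
    nulls_allowed: bool = False          # False means the attribute must always be specified
    multi_valued: bool = False
    composite_of: List[str] = field(default_factory=list)  # simple attributes of a composite
    derived_from: Optional[str] = None   # how a derived attribute is computed
    default: Optional[str] = None

    @property
    def is_composite(self) -> bool:
        return bool(self.composite_of)

    @property
    def is_derived(self) -> bool:
        return self.derived_from is not None


# Illustrative entries only; descriptions and lengths are invented for the example.
staff_no = AttributeEntry("staffNo", "Uniquely identifies a member of staff", "varchar", length=5)
name = AttributeEntry("name", "Name of a member of staff", "varchar", length=30,
                      composite_of=["fName", "lName"])
```

Keeping the entry as structured data makes it easy to swap the data type and length for a domain name later, when attribute domains are documented.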
Figure 9.8 Extract from the data dictionary for the Branch user views of StayHome showing descriptions of attributes.
Document attribute domains
As you identify attribute domains, record their names and characteristics in the data dictionary. Update the data dictionary entries for attributes to record their domain in place of the data type and length information.
Document candidate, primary, and alternate keys
Record the identification of candidate, primary, and alternate keys (when available) in the data dictionary.
Figure 9.10 Extract from the data dictionary for the Branch user views of StayHome showing attributes with primary and alternate keys identified.
Document entities
You now have a logical data model that represents the database requirements of the company (or part of the company). The logical data model is checked to ensure that the model supports the required transactions. This process creates documentation that ensures that all the information (entities, relationships, and their attributes) required by each transaction is provided by the model, by documenting a description of each transaction's requirements. An alternative approach to validating the data model against the required transactions involves representing the pathway taken by each transaction directly on the ER diagram. Clearly, the more transactions that exist, the more complex this diagram would become, so for readability you may need several such diagrams to cover all the transactions.
Chapter 10 Logical Database Design – Step 2
Review Questions
10.1 Describe the main purpose and tasks of Step 2 of the logical database design methodology.
To create tables for the logical data model and to check the structure of the tables. The tasks involved in Step 2 are:
• Step 2.1 Create tables
• Step 2.2 Check table structures using normalization
• Step 2.3 Check tables support user transactions
• Step 2.4 Check business rules
• Step 2.5 Review logical database design with users
10.2 Describe the rules for creating tables that represent:
(a) strong and weak entities;
(b) one-to-many (1:*) binary relationships;
(c) one-to-many (1:*) recursive relationships;
(d) one-to-one (1:1) binary relationships;
(e) one-to-one (1:1) recursive relationships;
(f) many-to-many (*:*) binary relationships;
(g) complex relationships;
(h) multi-valued attributes.
Give examples to illustrate your answers.
Examples are provided throughout the description of Step 2.1 in Chapter 10.
10.3 Discuss how the technique of normalization can be used to check the structure of the tables created from the ER model and supporting documentation.
The purpose of the technique of normalization is to examine the groupings of columns in each table created in Step 2.1. You check the composition of each table using the rules of normalization, to avoid unnecessary duplication of data. You should ensure that each table created is in at least third normal form (3NF). If you identify tables that are not in 3NF, this may indicate that part of the ER model is incorrect, or that you have introduced an error while creating the tables from the model. If necessary, you may need to restructure the data model and/or tables.
10.4 Discuss one approach that can be used to check that the tables support the transactions required by the users.
10.5 Discuss what business rules represent. Give examples to illustrate your answers.
Business rules are the constraints that you wish to impose in order to protect the database from becoming incomplete, inaccurate, or inconsistent. Although you may not be able to implement some business rules within the DBMS, this is not the question here. At this stage, you are concerned only with high-level design, that is, specifying what business rules are required irrespective of how this might be achieved. Having identified the business rules, you will have a logical data model that is a complete and accurate representation of the organization (or part of the organization) to be supported by the database. If necessary, you could produce a physical database design from the logical data model, for example, to prototype the system for the user. We consider the following types of business rules: required data, column domain constraints, entity integrity, multiplicity, referential integrity, other business rules.
10.5 Describe the alternative strategies that can be applied if there is a child record referencing a parent record that we wish to delete.
If a record of the parent table is deleted, referential integrity is lost if there is a child record referencing the deleted parent record. In other words, referential integrity is lost if the deleted branch currently has one or more members of staff working at it. There are several strategies you can consider in this case:
• NO ACTION: Prevent a deletion from the parent table if there are any referencing child records. In our example, 'You cannot delete a branch if there are currently members of staff working there'.
• CASCADE: When the parent record is deleted, automatically delete any referencing child records. If any deleted child record also acts as a parent record in another relationship then the delete operation should be applied to the records in this child table, and so on in a cascading manner. In other words, deletions from the parent table cascade to the child table. In our example, 'Deleting a branch automatically deletes all members of staff working there'. Clearly, in this situation, this strategy would not be wise.
• SET NULL: When a parent record is deleted, the foreign key values in all related child records are automatically set to null. In our example, 'If a branch is deleted, indicate that the current branch for those members of staff previously working there is unknown'. You can only consider this strategy if the columns comprising the foreign key can accept nulls, as defined in Step 1.3.
• SET DEFAULT: When a parent record is deleted, the foreign key values in all related child records are automatically set to their default values. In our example, 'If a branch is deleted, indicate that the current assignment of members of staff previously working there is being assigned to another (default) branch'. You can only consider this strategy if the columns comprising the foreign key have default values, as defined in Step 1.3.
• NO CHECK: When a parent record is deleted, do nothing to ensure that referential integrity is maintained. This strategy should only be considered in extreme circumstances.
10.6 Discuss what business rules represent. Give examples to illustrate your answers.
Finally, you consider constraints known as business rules. Business rules should be represented as constraints on the database to ensure that only permitted updates to tables governed by 'real world' transactions are allowed. For example, StayHome has a business rule that prevents a member from renting more than 10 videos at any one time.
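Returning to the delete strategies listed under question 10.5, the sketch below tries four of them on the Branch and Staff example. It is a hedged illustration that assumes SQLite (through Python's standard `sqlite3` module) as the target DBMS; the branch numbers, the default branch 'B999', and the helper name `staff_after_branch_delete` are invented for the example. SQLite has no NO CHECK action; the nearest equivalent would be leaving foreign key enforcement switched off.

```python
import sqlite3


def staff_after_branch_delete(action: str):
    """Delete branch B001 under the given ON DELETE action and report the outcome."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.execute("CREATE TABLE Branch (branchNo TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE Staff ("
        " staffNo TEXT PRIMARY KEY,"
        " branchNo TEXT DEFAULT 'B999'"
        f" REFERENCES Branch(branchNo) ON DELETE {action})"
    )
    conn.executemany("INSERT INTO Branch VALUES (?)", [("B001",), ("B999",)])
    conn.execute("INSERT INTO Staff VALUES ('S1', 'B001')")
    try:
        conn.execute("DELETE FROM Branch WHERE branchNo = 'B001'")
    except sqlite3.IntegrityError:
        return "delete rejected"
    return conn.execute("SELECT staffNo, branchNo FROM Staff").fetchall()


results = {
    action: staff_after_branch_delete(action)
    for action in ("NO ACTION", "CASCADE", "SET NULL", "SET DEFAULT")
}
for action, outcome in results.items():
    print(action, "->", outcome)
```

Under these assumptions NO ACTION blocks the delete, CASCADE removes the member of staff, SET NULL leaves the member of staff with an unknown branch, and SET DEFAULT reassigns the member of staff to the default branch.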
Chapter 11 Enhanced Entity–Relationship Modeling
Review Questions
11.1 Describe what a superclass and a subclass represent.
Superclass is an entity that includes one or more distinct groupings of its occurrences, which require to be represented in a data model.
Subclass is a distinct grouping of occurrences of an entity, which require to be represented in a data model.
11.2 Describe the relationship between a superclass and its subclass.
The relationship between a superclass and any one of its subclasses is one-to-one (1:1) and is called a superclass/subclass relationship. For example, Staff/Manager forms a superclass/subclass relationship. Each member of a subclass is also a member of the superclass but has a distinct role.
11.3 Describe and illustrate using an example the process of attribute inheritance.
An entity occurrence in a subclass represents the same 'real world' object as in the superclass. Hence, a member of a subclass inherits those attributes associated with the superclass, but may also have subclass-specific attributes. For example, a member of the SalesPersonnel subclass has subclass-specific attributes, salesArea, vehLicenseNo, and carAllowance, and all the attributes of the Staff superclass, namely staffNo, name, position, salary, and branchNo.
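Attribute inheritance in the Staff/SalesPersonnel example has a close analogue in class inheritance. The sketch below uses Python dataclasses purely as an illustration; the attribute names come from the answer above, while the Python types are assumptions.

```python
from dataclasses import dataclass, fields


@dataclass
class Staff:
    """Superclass: attributes shared by every member of staff."""
    staffNo: str
    name: str
    position: str
    salary: float
    branchNo: str


@dataclass
class SalesPersonnel(Staff):
    """Subclass: inherits the Staff attributes and adds its own."""
    salesArea: str
    vehLicenseNo: str
    carAllowance: float


inherited = [f.name for f in fields(Staff)]
all_attributes = [f.name for f in fields(SalesPersonnel)]
subclass_specific = [n for n in all_attributes if n not in inherited]

print("inherited:", inherited)
print("subclass-specific:", subclass_specific)
```

Every SalesPersonnel occurrence is also a Staff occurrence, which mirrors the one-to-one superclass/subclass relationship described in 11.2.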
11.4 What are the main reasons for introducing the concepts of superclasses and subclasses into an EER model?
There are two important reasons for introducing the concepts of superclasses and subclasses into an ER model. The first reason is that it avoids describing similar concepts more than once, thereby saving you time and making the ER model more readable. The second reason is that it adds more semantic information to the design in a form that is familiar to many people. For example, the assertions that 'Manager IS-A member of staff' and 'van IS-A type of vehicle' communicate significant semantic content in an easy-to-follow form.
11.5 Describe what a shared subclass represents.
A subclass is an entity in its own right and so it may also have one or more subclasses. A subclass with more than one superclass is called a shared subclass. In other words, a member of a shared subclass must be a member of the associated superclasses. As a consequence, the attributes of the superclasses are inherited by the shared subclass, which may also have its own additional attributes. This process is referred to as multiple inheritance.
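A shared subclass can be sketched with multiple inheritance. In the illustration below, Staff and Manager appear in this manual, but SalesManager and the attributes `bonus`, `salesArea`, and `salesTarget` on these particular classes are hypothetical choices made only to show the mechanism.

```python
class Staff:
    attributes = ("staffNo", "name")


class Manager(Staff):
    attributes = ("bonus",)          # hypothetical subclass-specific attribute


class SalesPersonnel(Staff):
    attributes = ("salesArea",)


class SalesManager(Manager, SalesPersonnel):
    """Shared subclass: it has two superclasses (hypothetical example)."""
    attributes = ("salesTarget",)    # hypothetical additional attribute


def all_attributes(entity) -> list:
    """Collect attributes from the most general class down to the entity itself."""
    collected = []
    for cls in reversed(entity.__mro__):
        for attr in cls.__dict__.get("attributes", ()):
            if attr not in collected:
                collected.append(attr)
    return collected


print(all_attributes(SalesManager))
```

The shared subclass ends up with the attributes of both superclasses, the attributes they in turn inherit from Staff, and its own additional attribute.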
11.6 Describe and contrast the process of specialization with the process of generalization.
Specialization is the process of maximizing the differences between members of an entity by identifying their distinguishing characteristics. Specialization is a top-down approach to defining a set of superclasses and their related subclasses. The set of subclasses is defined on the basis of some distinguishing characteristics of the entities in the superclass. When we identify a subclass of an entity, we then associate attributes specific to the subclass (where necessary), and also identify any relationships between the subclass and other entities or subclasses (where necessary).
Generalization is the process of minimizing the differences between entities by identifying their common features. The process of generalization is a bottom-up approach, which results in the identification of a generalized superclass from the original subclasses. The process of generalization can be viewed as the reverse of the specialization process.
11.7 Describe the two main constraints that apply to a specialization/generalization relationship.
There are two constraints that may apply to a superclass/subclass relationship called participation constraints and disjoint constraints.
Participation constraint determines whether every occurrence in the superclass must participate as a member of a subclass. A participation constraint may be mandatory or optional. A superclass/subclass relationship with a mandatory participation specifies that every entity occurrence in the superclass must also be a member of a subclass. A superclass/subclass relationship with optional participation specifies that a member of a superclass need not belong to any of its subclasses.
Disjoint constraint describes the relationship between members of the subclasses and indicates whether it's possible for a member of a superclass to be a member of one, or more than one, subclass. The disjoint constraint only applies when a superclass has more than one subclass. If the subclasses are disjoint, then an entity occurrence can be a member of only one of the subclasses. To represent a disjoint superclass/subclass relationship, an '…
… (called nondisjoint), then an entity occurrence may be a member of more than one subclass. The participation and disjoint constraints of specialization/generalization are distinct, giving the following four categories: mandatory and nondisjoint, optional and nondisjoint, mandatory and disjoint, and optional and disjoint.
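The four categories can be read as rules about how many subclasses one superclass occurrence may belong to. The sketch below enumerates them; the function name and the choice to test memberships of 0, 1, and 2 are assumptions made for illustration.

```python
from itertools import product


def membership_allowed(subclass_count: int, mandatory: bool, disjoint: bool) -> bool:
    """Check the number of subclasses a single superclass occurrence belongs to."""
    if mandatory and subclass_count == 0:
        return False   # mandatory participation: must belong to at least one subclass
    if disjoint and subclass_count > 1:
        return False   # disjoint: may belong to at most one subclass
    return True


# For each (mandatory, disjoint) category, list the permitted membership counts
# out of 0, 1, and 2 subclasses.
categories = {
    (mandatory, disjoint): [n for n in range(3) if membership_allowed(n, mandatory, disjoint)]
    for mandatory, disjoint in product((True, False), repeat=2)
}

for (mandatory, disjoint), counts in categories.items():
    label = f"{'mandatory' if mandatory else 'optional'} and {'disjoint' if disjoint else 'nondisjoint'}"
    print(label, "->", counts)
```

Mandatory and disjoint is the tightest combination (exactly one subclass), while optional and nondisjoint places no restriction on membership.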
Chapter 12 Physical Database Design – Step 3
Review Questions
12.1 Explain the difference between logical and physical database design. Why might these tasks be carried out by different people?
Logical database design is independent of implementation details, such as the specific functionality of the target DBMS, application programs, programming languages, or any other physical considerations. The output of this process is a logical data model that includes a set of relational tables together with supporting documentation, such as a data dictionary. These represent the sources of information for the physical design process, and they provide you with a vehicle for making trade-offs that are so important to an efficient database design.
Whereas logical database design is concerned with the what, physical database design is concerned with the how. In particular, the physical database designer must know how the computer system hosting the DBMS operates, and must also be fully aware of the functionality of the target DBMS. As the functionality provided by current systems varies widely, physical design must be tailored to a specific DBMS. However, physical database design is not an isolated activity – there is often feedback between physical, logical, and application design. For example, decisions taken during physical design to improve performance, such as merging tables together, might affect the logical data model.
12.2 Describe the inputs and outputs of physical database design.
The inputs are the logical data model and the data dictionary. The outputs are the base tables, integrity rules, file organization specified, secondary indexes determined, user views and security mechanisms.
12.3 Describe the purpose of the main steps in the physical design methodology presented in this chapter.
Step 3 produces a relational database schema from the logical data model, which defines the base tables, integrity rules, and how to represent derived data.
12.4 Describe the types of information required to design the base tables.
You will need to know:
• how to create base tables;
• whether the system supports the definition of primary keys, foreign keys, and alternate keys;
• whether the system supports the definition of required data (that is, whether the system allows columns to be defined as NOT NULL);
• whether the system supports the definition of domains;
• whether the system supports relational integrity rules;
• whether the system supports the definition of business rules.
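As one concrete illustration of the checklist above, the sketch below defines two base tables on SQLite through Python's `sqlite3` module. SQLite is only an assumed target DBMS here; the `telNo` alternate key, the sample values, and the `rejected` helper are hypothetical. A CHECK constraint stands in for a domain, because SQLite has no separate domain definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # foreign keys are not enforced by default in SQLite
conn.executescript("""
CREATE TABLE Branch (
    branchNo TEXT PRIMARY KEY,               -- primary key
    telNo    TEXT NOT NULL UNIQUE            -- alternate key (hypothetical column)
);
CREATE TABLE Staff (
    staffNo  TEXT PRIMARY KEY,
    name     TEXT NOT NULL,                  -- required data
    salary   REAL NOT NULL CHECK (salary >= 0),                 -- simple domain constraint
    branchNo TEXT NOT NULL REFERENCES Branch(branchNo)          -- foreign key
);
""")
conn.execute("INSERT INTO Branch VALUES ('B001', '555-0100')")
conn.execute("INSERT INTO Staff VALUES ('S1', 'Ann', 30000, 'B001')")


def rejected(sql: str) -> bool:
    """Return True if the DBMS refuses the statement because of a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


print(rejected("INSERT INTO Staff VALUES ('S2', NULL, 20000, 'B001')"))   # missing required data
print(rejected("INSERT INTO Staff VALUES ('S3', 'Bob', 20000, 'B404')"))  # unknown branch
```

Running small probes like these against the target DBMS is one way to find out which of the listed features it actually enforces.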
12.5 Describe how you would handle the representation of derived data in the database. Give an example to illustrate your answer.
From a physical database design perspective, whether a derived column is stored in the database or calculated every time it's needed is a trade-off. To decide, you should calculate:
• the additional cost to store the derived data and keep it consistent with the data from which it is derived, and
• the cost to calculate it each time it's required,
and choose the less expensive option subject to performance constraints.
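The comparison above is simple arithmetic once the two costs are estimated. The sketch below is a minimal illustration; the function name, the cost units, and the workload figures are all assumptions, and it does not model the performance constraints mentioned in the answer.

```python
def cheaper_option(updates_per_day, cost_per_update,
                   reads_per_day, cost_per_calculation,
                   storage_cost_per_day=0.0):
    """Compare the daily cost of storing a derived column with recalculating it."""
    # Storing: pay for the space plus keeping the value consistent on every update.
    cost_to_store = storage_cost_per_day + updates_per_day * cost_per_update
    # Calculating: pay for the computation each time the value is required.
    cost_to_calculate = reads_per_day * cost_per_calculation
    if cost_to_store < cost_to_calculate:
        return ("store", cost_to_store)
    return ("calculate", cost_to_calculate)


# Hypothetical workloads, in arbitrary cost units.
read_heavy = cheaper_option(updates_per_day=10, cost_per_update=2.0,
                            reads_per_day=1000, cost_per_calculation=1.0)
write_heavy = cheaper_option(updates_per_day=1000, cost_per_update=2.0,
                             reads_per_day=10, cost_per_calculation=1.0)
print(read_heavy, write_heavy)
```

A value that is read far more often than its source data changes tends to favor storage, and the reverse favors calculation on demand.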
Chapter 13 Physical Database Design – Step 4
Review Questions
13.1 Describe the purpose of Step 4 in the database design methodology.
Step 4 determines the file organizations for the base tables. This takes account of the nature of the transactions to be carried out, which also determine where secondary indexes will be of use.
13.2 Discuss the purpose of analyzing the transactions that have to be supported and describe the type of information you would collect and analyze.
You can't make meaningful physical design decisions until you understand in detail the transactions that have to be supported. In analyzing the transactions, you're attempting to identify performance criteria, such as:
• the transactions that run frequently and will have a significant impact on performance;
• the transactions that are critical to the operation of the business;
• the times of the day/week when there will be a high demand made on the database (called the peak load).
You'll use this information to identify the parts of the database that may cause performance problems. At the same time, you need to identify the high-level functionality of the transactions, such as the columns that are updated in an update transaction or the columns that are retrieved in a query. You'll use this information to select appropriate file organizations and indexes.
13.3 When would you not add any indexes to a table?
(1) Do not index small tables. It may be more efficient to search the table in memory than to store an additional index structure.
(2) Avoid indexing a column or table that is frequently updated.
(3) Avoid indexing a column if the query will retrieve a significant proportion (for example, 25%) of the records in the table, even if the table is large. In this case, it may be more efficient to search the entire table than to search using an index.
(4) Avoid indexing columns that consist of long character strings.
13.4 Discuss some of the main reasons for selecting a column as a potential candidate for indexing. Give examples to illustrate your answer.
(1) In general, index the primary key of a table if it's not a key of the file organization. Although the SQL standard provides a clause for the specification of primary keys as discussed in Step 3.1 covered in the last chapter, note that this does not guarantee that the primary key will be indexed in some DBMSs.
(2) Add a secondary index to any column that is heavily used for data retrieval. For example, add a secondary index to the Member table based on the column lName, as discussed above.
(3) Add a secondary index to a foreign key if there is frequent access based on it. For example, you may frequently join the VideoForRent and Branch tables on the column branchNo (the branch number). Therefore, it may be more efficient to add a secondary index to the VideoForRent table based on branchNo.
(4) Add a secondary index on columns that are frequently involved in: (a) selection or join criteria; (b) ORDER BY; (c) GROUP BY; (d) other operations involving sorting (such as UNION or DISTINCT).
(5) Add a secondary index on columns involved in built-in functions, along with any columns used to aggregate the built-in functions. For example, to find the average staff salary at each branch, you could use the following SQL query:
    SELECT branchNo, AVG(salary) FROM Staff GROUP BY branchNo;
From the previous guideline, you could consider adding an index to the branchNo column by virtue of the GROUP BY clause. However, it may be more efficient to consider an index on both the branchNo column and the salary column. This may allow the DBMS to perform the entire query from data in the index alone, without having to access the data file. This is sometimes called an index-only plan, as the required response can be produced using only data in the index.
(6) As a more general case of the previous guideline, add a secondary index on columns that could result in an index-only plan.
13.5 Having identified a column as a potential candidate, under what circumstances would you decide against indexing it?
Having drawn up your 'wish-list' of potential indexes, consider the impact of each of these on update transactions. If the maintenance of the index is likely to slow down important update transactions, then consider dropping the index from the list.
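The index-only plan described in guideline (5) of the answer to 13.4 can be observed on a real system. The sketch below assumes SQLite via Python's `sqlite3` module, where such a plan is reported as a "covering index"; the index name and sample rows are invented, and other DBMSs report their plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, name TEXT, salary REAL, branchNo TEXT)"
)
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?, ?, ?)",
    [("S1", "Ann", 20000, "B001"), ("S2", "Bob", 30000, "B001"), ("S3", "Cy", 40000, "B002")],
)

# Index on both the grouping column and the aggregated column.
conn.execute("CREATE INDEX idx_staff_branch_salary ON Staff (branchNo, salary)")

query = "SELECT branchNo, AVG(salary) FROM Staff GROUP BY branchNo"

# The last column of each EXPLAIN QUERY PLAN row holds the human-readable detail.
plan = " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query))
averages = conn.execute(query).fetchall()

print(plan)
print(averages)
```

The trade-off raised in 13.5 still applies: every insert or salary update on Staff must now maintain this index as well.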
Chapter 14 Physical Database Design – Steps 5 and 6
Review Questions
14.1 Describe the purpose of the main steps in the physical design methodology presented in this chapter.
Step 5 designs the user views for the database implementation. Step 6 designs the security mechanisms for the database implementation. This includes designing the access rules on the base relations.
14.2 Discuss the difference between system security and data security.
System security covers access and use of the database at the system level, such as a username and password. Data security covers access and use of database objects (such as tables and views) and the actions that users can have on the objects.
14.3 Describe the access control facilities of SQL.
Each database user is assigned an authorization identifier by the Database Administrator (DBA); usually, the identifier has an associated password, for obvious security reasons. Every SQL statement that is executed by the DBMS is performed on behalf of a specific user. The authorization identifier is used to determine which database objects that user may reference, and what operations may be performed on those objects. Each object that is created in SQL has an owner, who is identified by the authorization identifier. By default, the owner is the only person who may know of the existence of the object and perform any operations on the object.
Privileges are the actions that a user is permitted to carry out on a given base table or view. For example, SELECT is the privilege to retrieve data from a table and UPDATE is the privilege to modify records of a table. When a user creates a table using the SQL CREATE TABLE statement, he or she automatically becomes the owner of the table and receives full privileges for the table. The WITH GRANT OPTION clause can be specified with the GRANT statement to allow the receiving user(s) to pass the privilege(s) on to other users. Privileges can be revoked using the SQL REVOKE statement.
When a user creates a view with the CREATE VIEW statement, he or she automatically becomes the owner of the view, but does not necessarily receive full privileges on the view. To create the view, a user must have SELECT privilege to all the tables that make up the view. However, the owner will only get other privileges if he or she holds those privileges for every table in the view.
14.3 Describe the security features of Microsoft Access 2002.
Access provides a number of security features including the following two methods:
(a) setting a password for opening a database (system security);
(b) user-level security, which can be used to limit the parts of the database that a user can read or update (data security).
In addition to the above two methods of securing a Microsoft Access database, other security features include:
• Encryption/decryption: encrypting a database compacts a database file and makes it indecipherable by a utility program or word processor. This is useful if you wish to transmit a database electronically or when you store it on a floppy disk or compact disc. Decrypting a database reverses the encryption.
• Preventing users from replicating a database, setting passwords, or setting startup options;
• Securing VBA code: this can be achieved by setting a password that you enter once per session or by saving the database as an MDE file, which compiles the VBA source code before removing it from the database. Saving the database as an MDE file also prevents users from modifying forms and reports without requiring them to specify a log on password or without you having to set up user-level security.
Chapter 15 Physical Database Design – Step 7
Review Questions
15.1 Describe the purpose of Step 7 in the database design methodology.
Step 7 considers relaxing the normalization constraints imposed on the logical data model to improve the overall performance of the system.
15.2 Explain the meaning of denormalization.
Formally, the term denormalization refers to a change to the structure of a base table, such that the new table is in a lower normal form than the original table. However, we also use the term more loosely to refer to situations where we combine two tables into one new table, where the new table is in the same normal form but contains more nulls than the original tables.
15.3 Discuss when it may be appropriate to denormalize a table. Give examples to illustrate your answer.
There are no fixed rules for determining when to denormalize tables. Some of the more common situations for considering denormalization to speed up frequent or critical transactions are:
• Step 7.2.1 Combining one-to-one (1:1) relationships
• Step 7.2.2 Duplicating nonkey columns in one-to-many (1:*) relationships to reduce joins
• Step 7.2.3 Duplicating foreign key columns in one-to-many (1:*) relationships to reduce joins
• Step 7.2.4 Duplicating columns in many-to-many (*:*) relationships to reduce joins
• Step 7.2.5 Introducing repeating groups
• Step 7.2.6 Creating extract tables
• Step 7.2.7 Partitioning tables
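As one illustration from the list above, duplicating a nonkey column in a one-to-many relationship (Step 7.2.2) can be sketched as follows. SQLite via Python's `sqlite3` module is assumed, and the `city` and `branchCity` columns and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Branch (branchNo TEXT PRIMARY KEY, city TEXT NOT NULL);
CREATE TABLE Staff (
    staffNo    TEXT PRIMARY KEY,
    name       TEXT NOT NULL,
    branchNo   TEXT NOT NULL REFERENCES Branch(branchNo),
    branchCity TEXT                 -- duplicated nonkey column (the denormalization)
);
INSERT INTO Branch VALUES ('B001', 'Seattle'), ('B002', 'Portland');
INSERT INTO Staff (staffNo, name, branchNo)
    VALUES ('S1', 'Ann', 'B001'), ('S2', 'Bob', 'B002');
-- Populate the duplicate from its source; it must be kept in step from now on.
UPDATE Staff
   SET branchCity = (SELECT city FROM Branch WHERE Branch.branchNo = Staff.branchNo);
""")

# Normalized access path: a join is needed to find each member of staff's city.
with_join = conn.execute(
    "SELECT s.staffNo, b.city FROM Staff s JOIN Branch b ON b.branchNo = s.branchNo "
    "ORDER BY s.staffNo"
).fetchall()

# Denormalized access path: the same answer without the join.
without_join = conn.execute(
    "SELECT staffNo, branchCity FROM Staff ORDER BY staffNo"
).fetchall()

print(with_join == without_join)
```

The saved join comes at the cost of redundancy: any change to a branch's city must now also be applied to every matching Staff row.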
15.4 Describe the two main approaches to partitioning and discuss when each may be an appropriate way to improve performance. Give examples to illustrate your answer.
Horizontal partitioning: Distributing the records of a table across a number of (smaller) tables.
Vertical partitioning: Distributing the columns of a table across a number of (smaller) tables (the primary key is duplicated to allow the original table to be reconstructed).
Partitions are particularly useful in applications that store and analyze large amounts of data. For example, let's suppose there are hundreds of thousands of records in the VideoForRent table that are held indefinitely for analysis purposes. Searching for a particular record at a branch could be quite time consuming; however, we could reduce this time by horizontally partitioning the table, with one partition for each branch.
There may also be circumstances where we frequently examine particular columns of a very large table and it may be appropriate to vertically partition the table into those columns that are frequently accessed together and another vertical partition for the remaining columns (with the primary key replicated in each partition to allow the original table to be reconstructed).
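Both approaches can be sketched on in-memory records. The example below is a minimal illustration, not a DBMS feature: the column names `videoNo`, `title`, and `dailyRental`, the sample rows, and the helper functions are all assumptions.

```python
from collections import defaultdict

# Hypothetical VideoForRent records.
records = [
    {"videoNo": "V1", "branchNo": "B001", "title": "Alpha", "dailyRental": 4.0},
    {"videoNo": "V2", "branchNo": "B002", "title": "Beta", "dailyRental": 5.0},
    {"videoNo": "V3", "branchNo": "B001", "title": "Gamma", "dailyRental": 4.5},
]


def horizontal_partition(rows, key):
    """Distribute whole records across partitions, one partition per key value."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return dict(partitions)


def vertical_partition(rows, pk, hot_columns):
    """Split columns into two partitions, replicating the primary key in each."""
    hot = [{c: row[c] for c in (pk, *hot_columns)} for row in rows]
    cold = [{c: v for c, v in row.items() if c == pk or c not in hot_columns} for row in rows]
    return hot, cold


def rejoin(hot, cold, pk):
    """Reconstruct the original records by matching on the replicated primary key."""
    cold_by_pk = {row[pk]: row for row in cold}
    return [{**h, **cold_by_pk[h[pk]]} for h in hot]


by_branch = horizontal_partition(records, "branchNo")
hot, cold = vertical_partition(records, "videoNo", ["title"])
print({branch: len(rows) for branch, rows in by_branch.items()})
print(rejoin(hot, cold, "videoNo") == records)
```

A search restricted to one branch only has to scan that branch's partition, and the vertical split can be undone because the primary key is present in both halves.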
Chapter 16 Physical Database Design – Step 8
Review Questions
16.1 Describe the purpose of the main steps in the physical design methodology presented in this chapter.
Step 8 monitors the database application systems and improves performance by making amendments to the design as appropriate.
16.2 What factors can be used to measure efficiency?
There are a number of factors that we may use to measure efficiency:
• Transaction throughput: this is the number of transactions processed in a given time interval. In some systems, such as airline reservations, high transaction throughput is critical to the overall success of the system.
• Response time: this is the elapsed time for the completion of a single transaction. From a user's point of view, you want to minimize response time as much as possible. However, there are some factors that influence response time that you may have no control over, such as system loading or communication times. You can shorten response time by:
  - reducing contention and wait times, particularly disk I/O wait times;
  - reducing the amount of time resources are required;
  - using faster components.
• Disk storage: this is the amount of disk space required to store the database files. You may wish to minimize the amount of disk storage used.
16.3 Discuss how the four basic hardware components interact and affect system performance.
• main memory
• CPU
• disk I/O
• network.
Each of these resources may affect other system resources. Equally well, an improvement in one resource may effect an improvement in other system resources. For example:
• Adding more main memory should result in less paging. This should help avoid CPU bottlenecks.
• More effective use of main memory may result in less disk I/O.
16.4
How should you distribute data across disks?
Figure 16.1 illustrates the basic principles of distributing the data across disks:
• The operating system files should be separated from the database files.
• The main database files should be separated from the index files.
• The recovery log file, if available and if used, should be separated from the rest of the database.
Figure 16.1 Typical disk configuration.
16.5
What is RAID technology and how does it improve performance and reliability?
RAID originally stood for Redundant Array of Inexpensive Disks, but more recently the ‘I’ in RAID has come to stand for Independent. RAID works on having a large disk array comprising an arrangement of several independent disks that are organized to increase performance and at the same time improve reliability. Performance is increased through data striping: the data is segmented into equal-size partitions (the striping unit), which are transparently distributed across multiple disks. This gives the appearance of a single large, very fast disk where in actual fact the data is distributed across several smaller disks. Striping improves overall I/O performance by allowing multiple I/O ...
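The round-robin placement behind data striping can be sketched as follows. This is a simplified illustration: the function names, and the assumption that striping units are assigned to disks in strict rotation with no parity or mirroring, are not taken from the text.

```python
def locate(block, num_disks):
    """Map a logical striping unit to (disk index, unit offset on that disk)."""
    return block % num_disks, block // num_disks

def stripe(data, unit_size, num_disks):
    """Split data into equal-size striping units and spread them across disks."""
    disks = [[] for _ in range(num_disks)]
    units = [data[i:i + unit_size] for i in range(0, len(data), unit_size)]
    for n, unit in enumerate(units):
        disk, _ = locate(n, num_disks)
        disks[disk].append(unit)
    return disks

layout = stripe(b"ABCDEFGHIJKL", unit_size=2, num_disks=3)
```

Because consecutive units sit on different disks, a request for units 0 to 2 touches three separate disks rather than queuing on one.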
Chapter 19 Current and Emerging Trends - Review Questions
19.1
Discuss the general characteristics of advanced database applications.
• Design data is characterized by a large number of types, each with a small number of instances. Conventional databases are typically the opposite.
• Designs may be very large, perhaps consisting of millions of parts, often with many interdependent subsystem designs.
• The design is not static but evolves through time. When a design change occurs, its implications must be propagated through all design representations. The dynamic nature of design may mean that some actions cannot be foreseen at the beginning.
• Updates are far-reaching because of topological or functional relationships, tolerances, and so on.
• There may be hundreds of staff involved with the design, and they may work in parallel on multiple versions of a large design. Even so, the end product must be consistent and coordinated. This is sometimes referred to as cooperative engineering.
19.2
Discuss why the we...
Poor representation of ‘real world’ entities
Normalization generally leads to the creation of tables that do not correspond to entities in the ‘real world’. The fragmentation of a ‘real world’ entity into many tables, with a physical representation that reflects this structure, is inefficient, leading to many joins during query processing.
Semantic overloading
The relational model has only one construct for representing data and relationships between data, namely the table. For example, to represent a many-to-many (*:*) relationship between two entities A and B, we create three tables, one to represent each of the entities A and B, and one to represent the relationship. There is no mechanism to distinguish between entities and relationships, or to distinguish between different kinds of relationship that exist between entities. For example, a 1:* relationship might be Has, Supervises, Manages, and so on. If such distinctions could be made, then it might be possible to build the semantics into the operations. It is said that the relational model is semantically overloaded.
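The point can be seen in a small schema. The sketch below uses Python’s sqlite3 module with hypothetical tables Staff, Project, and WorksOn (names invented for illustration): the two entities and the *:* relationship between them are all declared with the same CREATE TABLE construct, and the catalog records each of them simply as a table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Staff   (staffNo   TEXT PRIMARY KEY, name  TEXT);
    CREATE TABLE Project (projectNo TEXT PRIMARY KEY, title TEXT);
    -- The *:* relationship is just another table holding two foreign keys.
    CREATE TABLE WorksOn (
        staffNo   TEXT REFERENCES Staff(staffNo),
        projectNo TEXT REFERENCES Project(projectNo),
        PRIMARY KEY (staffNo, projectNo)
    );
""")

# The catalog makes no distinction between entity and relationship tables.
kinds = dict(conn.execute(
    "SELECT name, type FROM sqlite_master WHERE type = 'table'"))
```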
Poor support for business rules
In Section 2.3, we introduced the concepts of entity and referential integrity, and in Section 2.2.1 we introduced domains, which are also types of business rules. Unfortunately, many commercial systems do not fully support these rules, and it’s necessary to build them into the applications. This, of course, is dangerous and can lead to duplication of effort and, worse still, inconsistencies. Furthermore, there is no support for other types of business rules in the relational model, which again means they have to be built into the DBMS or the application.
Limited operations
The relational model has only a fixed set of operations, such as set and record-oriented operations, operations that are provided in the SQL specification. However, SQL currently does not allow new operations to be specified. Again, this is too restrictive to model the behavior of many ‘real world’ objects. For example, a GIS application typically uses points, lines, line groups, and polygons, and needs operations for distance, intersection, and containment.
Difficulty handling recursive queries
Atomicity of data means that repeating groups are not allowed in the relational model. As a result, it’s extremely difficult to handle recursive queries: that is, queries about relationships that a table has with itself (directly or indirectly). To overcome this problem, SQL can be embedded in a high-level programming language, which provides constructs to facilitate iteration. Additionally, many DBMSs provide a report writer with similar constructs. In either case, it is the application rather than the inherent capabilities of the system that provides the required functionality.
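A minimal sketch of the workaround described above, using Python with the sqlite3 module: the table has a relationship with itself (each member of staff has a supervisor), and the iteration needed to follow that relationship indirectly is supplied by the host language rather than by a single query. The Staff table and its contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, supervisorNo TEXT)")
conn.executemany("INSERT INTO Staff VALUES (?, ?)",
                 [("S1", None), ("S2", "S1"), ("S3", "S2"), ("S4", "S2")])

def all_subordinates(conn, staff_no):
    """Find direct and indirect subordinates by looping in the application."""
    found, frontier = [], [staff_no]
    while frontier:
        boss = frontier.pop()
        direct = [row[0] for row in conn.execute(
            "SELECT staffNo FROM Staff WHERE supervisorNo = ? ORDER BY staffNo",
            (boss,))]
        found.extend(direct)
        frontier.extend(direct)  # follow the relationship one level deeper
    return sorted(found)

team = all_subordinates(conn, "S1")
```

Each pass issues one ordinary query; it is the `while` loop in the application, not the query itself, that walks the hierarchy.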
Impedance mismatch
In Section 3.1.1, we noted that until the most recent version of the standard SQL lacked computational completeness. To overcome this problem and to provide additional flexibility, the SQL standard provides embedded SQL to help develop more complex database applications. However, this approach produces an impedance mismatch because we are mixing different programming paradigms:
(1) SQL is a declarative language that handles rows of data, whereas a high-level language such as ‘C’ is a procedural language that can handle only one row of data at a time.
(2) SQL and 3GLs use different models to represent data. For example, SQL provides the built-in data types Date and Interval, which are not available in traditional programming languages. Thus, it’s necessary for the application program to convert between the two representations, which is inefficient, both in programming effort and in the use of runtime resources. Furthermore, since we are using two different type systems, it’s not possible to automatically type check the application as a whole.
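Both mismatches can be seen in a short host-language program. The sketch below uses Python and sqlite3 purely for illustration (the text discusses ‘C’ and embedded SQL), and the Rental table is hypothetical. The query describes a set of rows, but the program consumes them one at a time through a cursor, and the date value, which SQLite hands back as text here, must be converted by hand into the host language’s own type.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Rental (rentalNo INTEGER PRIMARY KEY, dateOut DATE)")
conn.executemany("INSERT INTO Rental VALUES (?, ?)",
                 [(1, "2004-02-04"), (2, "2004-03-15")])

# Declarative: one statement describes the whole set of rows wanted.
cursor = conn.execute("SELECT rentalNo, dateOut FROM Rental ORDER BY rentalNo")

# Procedural: the host program steps through the result one row at a time
# and converts each value into its own type system.
rentals = []
for rental_no, date_out in cursor:
    rentals.append((rental_no, date.fromisoformat(date_out)))
```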
The latest release of the SQL standard, SQL3, addresses some of the above deficiencies with the introduction of many new features, such as the ability to define new data types and operations as part of the data definition language, and the addition of new constructs to make the language computationally complete.
19.3
Explain what is meant by a DDBMS, and discuss the motivation in providing such a system.
A Distributed Database Management System (DDBMS) consists of a single logical database that is split into a number of fragments. Each fragment is stored on one or more computers (replicas) under the control of a separate DBMS, with the computers connected by a communications network. Each site is capable of independently processing user requests that require access to local data (that is, each site has some degree of local autonomy) and is also capable of processing data stored on other computers in the network.
19.4
Compare and contrast a DDBMS with distributed processing. Under what circumstances would you choose a DDBMS over distributed processing?
Distributed processing: a centralized database that can be accessed over a computer network. The key point with the definition of a distributed DBMS is that the system consists of data that is physically distributed across a number of sites in the network. If the data is centralized, even though other users may be accessing the data over the network, we do not consider this to be a distributed DBMS, simply distributed processing.
19.5
Discuss the advantages and disadvantages of a DDBMS.
Advantages
Reflects organizational structure
Many organizations are naturally distributed over several locations. It’s natural for databases used in such an application to be distributed over these locations.
Improved shareability and local autonomy
The geographical distribution of an organization can be reflected in the distribution of the data; users at one site can access data stored at other sites. Data can be placed at the site close to the users who normally use that data. In this way, users have local control of the data, and they can consequently establish and enforce local policies regarding the use of this data.
Improved availability
In a centralized DBMS, a computer failure terminates the operations of the DBMS. However, a failure at one site of a DDBMS, or a failure of a communication link making some sites inaccessible, does not make the entire system inoperable.
Improved reliability
As data may be replicated so that it exists at more than one site, the failure of a node or a communication link does not necessarily make the data inaccessible.
Improved performance
As the data is located near the site of ‘greatest demand’, and given the inherent parallelism of DDBMSs, it may be possible to achieve faster database access than with a remote centralized database. Furthermore, since each site handles only a part of the entire database, there may not be the same contention for CPU and I/O services as characterized by a centralized DBMS.
Economics
It’s generally accepted that it costs much less to create a system of smaller computers with the equivalent power of a single large computer. This makes it more cost-effective for corporate divisions and departments to obtain separate computers. It’s also much more cost-effective to add workstations to a network than to update a mainframe system.
Modular growth
In a distributed environment, it’s much easier to handle expansion. New sites can be added to the network without affecting the operations of other sites. This flexibility allows an organization to expand relatively easily.
Disadvantages
Complexity
A DDBMS that hides the distributed nature from the user and provides an acceptable level of performance, reliability, and availability is inherently more complex than a centralized DBMS. Replication also adds an extra level of complexity, which, if not handled adequately, will lead to degradation in availability, reliability, and performance compared with the centralized system, and the advantages we cited above will become disadvantages.
Cost
Increased complexity means that we can expect the procurement and maintenance costs for a DDBMS to be higher than those for a centralized DBMS. Furthermore, a DDBMS requires additional hardware to establish a network between sites. There are ongoing communication costs incurred with the use of this network. There are also additional manpower costs to manage and maintain the local DBMSs and the underlying network.
Security
In a centralized system, access to the data can be easily controlled. However, in a DDBMS not only does access to replicated data have to be controlled in multiple locations, but the network itself has to be made secure. In the past, networks were regarded as an insecure communication medium. Although this is still partially true, significant developments have been made recently to make networks more secure.
Integrity control more difficult
Enforcing integrity constraints generally requires access to a large amount of data that defines the constraint, but is not involved in the actual update operation itself. In a DDBMS, the communication and processing costs that are required to enforce integrity constraints may be prohibitive.
Lack of standards
Although DDBMSs depend on effective communication, we are only now starting to see the appearance of standard communication and data access protocols. This lack of standards has significantly limited the potential of DDBMSs. There are also no tools or methodologies to help users convert a centralized DBMS into a distributed DBMS.
Lack of experience
General-purpose DDBMSs have not been widely accepted, although many of the protocols and problems are well understood. Consequently, we do not yet have the same level of experience in industry as we have with centralized DBMSs. For a prospective adopter of this technology, this may be a significant deterrent.
Database design more complex
Besides the normal difficulties of designing a centralized database, the design of a distributed database has to take account of fragmentation of data, allocation of fragments to specific sites, and data replication.
19.6
Describe the expected functionality of a replication server.
At its basic level, we expect a distributed data replication service to be capable of copying data from one database to another, synchronously or asynchronously. However, there are many other functions that need to be provided, such as:
• Specification of replication schema: The system should provide a mechanism to allow a privileged user to specify the data and objects to be replicated.
• Subscription mechanism: The system should provide a mechanism to allow a privileged user to subscribe to the data and objects available for replication.
• Initialization mechanism: The system should provide a mechanism to allow for the initialization of a target replica.
• Scalability: The service should be able to handle the replication of both small and large volumes of data.
• Mapping and transformation: The service should be able to handle replication across different DBMSs and platforms. This may involve mapping and transforming the data from one data model into a different data model, or the data in one data type to a corresponding data type in another DBMS.
• Object replication: It should be possible to replicate objects other than data. For example, some systems allow indexes and stored procedures (or triggers) to be replicated.
• Easy administration: It should be easy for the DBA to administer the system and to check the status and monitor the performance of the replication system components.
19.7
Compare and contrast the different ownership models for replication. Give examples to illustrate your answer.
With master/slave ownership, asynchronously replicated data is owned by one site, the master or primary site, and can be updated by only that site. Using a ‘publish-and-subscribe’ metaphor, the master site (the publisher) makes data available.
Workflow ownership allows the right to update replicated data to move from site to site. However, at any one moment, there is only ever one site that may update that particular data set. A typical example of workflow ownership is an order processing system, where the processing of orders follows a series of steps, such as order entry, credit approval, invoicing, shipping, and so on. In a centralized DBMS, applications of this nature access and update the data in one integrated database: each application updates the order data in sequence when, and only when, the state of the order indicates that the previous step has been completed.
Update-anywhere (symmetric replication) ownership
The two previous models share a common property: at any given moment, only one site may update the data; all other sites have read-only access to the replicas. In some environments, this is too restrictive. The update-anywhere model creates a peer-to-peer environment where multiple sites have equal rights to update replicated data. This allows local sites to function autonomously, even when other sites are not available. Shared ownership can lead to conflict scenarios and the replication architecture has to be able to employ a methodology for conflict detection and resolution. A simple mechanism to detect conflict within a single table is for the source site to send both the old and new values (before- and after-images) for any records that have been updated since the last refresh. At the target site, the replication server can check each record in the target database that has also been updated against these values. However, consideration has to be given to detecting other types of conflict such as violation of referential integrity between two tables. There have been many mechanisms proposed for conflict resolution, but some of the most common are: earliest/latest timestamps, site priority, and holding for manual resolution.
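The before-image check and one of the resolution strategies just listed (latest timestamp) can be sketched as follows. The record layout, field names, and function names are assumptions made for illustration; a real replication server would also need to handle cross-table conflicts such as referential integrity violations.

```python
def detect_conflict(before_image, target_value):
    """A conflict exists if the target no longer holds the source's old value."""
    return target_value != before_image

def apply_update(target, key, before, after, source_ts):
    """Apply a replicated update, resolving conflicts by latest timestamp."""
    current = target[key]
    if detect_conflict(before, current["value"]):
        if source_ts <= current["ts"]:
            return "conflict: kept target value"
        target[key] = {"value": after, "ts": source_ts}
        return "conflict: took source value"
    target[key] = {"value": after, "ts": source_ts}
    return "applied"

# Hypothetical target replica: the row was changed locally at time 5.
replica = {"order42": {"value": "shipped", "ts": 5}}
outcome = apply_update(replica, "order42", before="invoiced",
                       after="cancelled", source_ts=7)
```

Swapping the timestamp comparison for a site-priority lookup, or returning the conflicting pair for a person to inspect, would give the other two strategies named in the answer.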
19.8
Give a definition of an OODBMS. What are the advantages and disadvantages of an OODBMS?
OODM
A (logical) data model that captures the semantics of objects supported in object-oriented programming.
OODB
A persistent and sharable collection of objects defined by an OODM.
OODBMS
The manager of an OODB.
19.9
Give a definition of an ORDBMS. What are the advantages and disadvantages of an ORDBMS?
Thus, there is no single extended relational model; rather, there are a variety of these models, whose characteristics depend upon the way and the degree to which extensions were made. However, all the models do share the same basic relational tables and query language, all incorporate some concept of ‘object’, and some have the ability to store methods (or procedures or triggers) as well as data in the database.
19.10
Give a definition of a data warehouse. Discuss the benefits of implementing a data warehouse.
Data warehouse: a consolidated/integrated view of corporate data drawn from disparate operational data sources and a range of end-user access tools capable of supporting simple to highly complex queries to support decision-making.
Benefits:
• Potential high return on investment
• Competitive advantage
• Increased productivity of corporate decision-makers
19.11
Describe the characteristics of the data held in a data warehouse.
The data held in a data warehouse is described as being subject-oriented, integrated, time-variant, and non-volatile (Inmon, 1993).
• Subject-oriented as the warehouse is organized around the major subjects of the organization (such as customers, products, and sales) rather than the major application areas (such as customer invoicing, stock control, and product sales). This is reflected in the need to store decision-support data rather than application-oriented data.
• Integrated because of the coming together of source data from different organization-wide applications systems. The source data is often inconsistent, using, for example, different data types and/or formats. The integrated data source must be made consistent to present a unified view of the data to the users.
• Time-variant because data in the warehouse is only accurate and valid at some point in time or over some time interval. The time-variance of the data warehouse is also shown in the extended time that the data is held, the implicit or explicit association of time with all data, and the fact that the data represents a series of snapshots.
19.12
Discuss how data marts differ from data warehouses and identify the main reasons for implementing a data mart.
A data mart holds a subset of the data in a data warehouse, normally in the form of summary data relating to a particular department or business area such as Marketing or Customer Services. The data mart can be stand-alone or linked centrally to the corporate data warehouse. As a data warehouse grows larger, the ability to serve the various needs of the organization may be compromised. The popularity of data marts stems from the fact that corporate data warehouses proved difficult to build and use.
19.13
Discuss what online analytical processing (OLAP) is and how OLAP differs from data warehousing.
Online analytical processing (OLAP): The dynamic synthesis, analysis, and consolidation of large volumes of multi-dimensional data. The key characteristics of OLAP applications include multi-dimensional views of data, support for complex calculations, and time intelligence.
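As an illustration of consolidating multi-dimensional data, the sketch below rolls a tiny set of sales facts up along any chosen combination of dimensions. The dimensions (region, product, quarter), the figures, and the function name are invented for the example.

```python
from collections import defaultdict

# Hypothetical facts: (region, product, quarter, sales amount).
facts = [
    ("North", "DVD", "Q1", 100), ("North", "DVD", "Q2", 120),
    ("North", "VHS", "Q1", 40), ("South", "DVD", "Q1", 80),
    ("South", "VHS", "Q2", 30),
]
DIMENSIONS = ("region", "product", "quarter")

def roll_up(facts, *keep):
    """Consolidate sales over every dimension not named in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for *dims, amount in facts:
        totals[tuple(dims[i] for i in idx)] += amount
    return dict(totals)

by_region = roll_up(facts, "region")
by_region_quarter = roll_up(facts, "region", "quarter")
```

Keeping the quarter dimension in the roll-up is a simple form of the time-based view that the answer calls time intelligence.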
19.14
Describe OLAP applications and identify the characteristics of such applications.
An essential requirement of all OLAP applications is the ability to provide users with just-in-time (JIT) information, which is necessary to make effective decisions about an organization’s strategic directions.
19.15
Discuss how data mining can realize the value of a data warehouse.
Simply storing information in a data warehouse does not provide the benefits an organization is seeking. To realize the value of a data warehouse, it’s necessary to extract the knowledge hidden within the warehouse. However, as the amount and complexity of the data in a data warehouse grows, it becomes increasingly difficult, if not impossible, for business analysts to identify trends and relationships in the data using simple query and reporting tools. Data mining is one of the best ways to extract meaningful trends and patterns from huge amounts of data. Data mining discovers information within data warehouses that queries and reports cannot effectively reveal.
19.16
Why would we want to dynamically generate Web pages from data held in the operational database? List some general requirements for Web-database integration.
An HTML/XML document stored in a file is an example of a static Web page: the content of the document does not change unless the file itself is changed. A dynamic Web page is generated each time it’s accessed. As a result, a dynamic Web page can have features that are not found in static pages, such as:
• It can respond to user input from the browser. For example, returning data requested by the completion of a form or the results of a database query.
• It can be customized by and for each user. For example, once a user has specified some preferences when accessing a particular site or page (such as area of interest or level of expertise), this information can be retained and information returned appropriate to these preferences.
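A minimal sketch of a dynamic page: the HTML is built from the database each time the function is called, in response to user input. The Video table, the function name, and the use of Python’s sqlite3 and html modules are assumptions for illustration; a real deployment would sit behind a Web server.

```python
import html
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Video (title TEXT, category TEXT)")
conn.executemany("INSERT INTO Video VALUES (?, ?)",
                 [("Alien", "Sci-Fi"), ("Shrek", "Children")])

def render_page(conn, category):
    """Generate the page from current data for the category the user asked for."""
    rows = conn.execute(
        "SELECT title FROM Video WHERE category = ? ORDER BY title", (category,))
    items = "".join(f"<li>{html.escape(title)}</li>" for (title,) in rows)
    return f"<html><body><ul>{items}</ul></body></html>"

page = render_page(conn, "Sci-Fi")
```

Unlike a static file, the next call to `render_page` reflects any rows added to the table in the meantime.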
Not in any ranked order, the requirements are as follows:
• The ability to access valuable corporate data in a secure manner.
• Data and vendor independent connectivity to allow freedom of choice in the selection of the DBMS now and in the future.
• The ability to interface to the database independent of any proprietary Web browser or Web server.
• A connectivity solution that takes advantage of all the features of an organization’s DBMS.
• An open-architecture approach to allow interoperability with a variety of systems and technologies.
• A cost-effective solution that allows for scalability, growth, and changes in strategic directions, and helps reduce the costs of developing and maintaining applications.
• Support for transactions that span multiple HTTP requests.
• Support for session- and application-based authentication.
• Acceptable performance.
• Minimal administration overhead.