Chapter 2 Data Models
Answers to Review Questions

1. Discuss the importance of data modeling.
A data model is a relatively simple representation, usually graphical, of a more complex real-world object or event. The data model's main function is to help us understand the complexities of the real-world environment. The database designer uses data models to facilitate the interaction among designers, application programmers, and end users. In short, a good data model is a communications device that helps eliminate (or at least substantially reduce) discrepancies between the database design's components and the real-world data environment. The development of data models, bolstered by powerful database design tools, has made it possible to substantially diminish the database design error potential. (Review Section 2.1 in detail.)

2. What is a business rule, and what is its purpose in data modeling?
A business rule is a brief, precise, and unambiguous description of a policy, procedure, or principle within a specific organization's environment. In a sense, business rules are misnamed: they apply to any organization, large or small (a business, a government unit, a religious group, or a research laboratory), that stores and uses data to generate information. Business rules are derived from a description of operations. As its name implies, a description of operations is a detailed narrative that describes the operational environment of an organization. Such a description requires great precision and detail. If the description of operations is incorrect or incomplete, the business rules derived from it will not reflect the real-world data environment accurately, thus leading to poorly defined data models, which lead to poor database designs. In turn, poor database designs lead to poor applications, thus setting the stage for poor decision making, which may ultimately lead to the demise of the organization.

Note especially that business rules help to create and enforce actions within that organization's environment. Business rules must be rendered in writing and updated to reflect any change in the organization's operational environment. Properly written business rules are used to define entities, attributes, relationships, and constraints. Because these components form the basis for a database design, the careful derivation and definition of business rules is crucial to good database design.

3. How do you translate business rules into data model components?
As a general rule, a noun in a business rule will translate into an entity in the model, and a verb (active or passive) associating nouns will translate into a relationship among the entities. For example, the business rule "a customer may generate many invoices" contains two nouns (customer and invoice) and a verb ("generate") that associates them.
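The noun-to-entity and verb-to-relationship translation can be sketched in code. The following is a minimal illustration rather than anything taken from the text: it uses Python's built-in sqlite3 module, and the table and column names (CUSTOMER, INVOICE, CUS_CODE, INV_NUMBER) are assumptions chosen for the example. Each noun becomes a table, and the verb "generate" becomes a foreign key that ties every invoice to exactly one customer.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.executescript("""
CREATE TABLE CUSTOMER (              -- noun 1: customer
    CUS_CODE  INTEGER PRIMARY KEY,
    CUS_LNAME TEXT NOT NULL
);
CREATE TABLE INVOICE (               -- noun 2: invoice
    INV_NUMBER INTEGER PRIMARY KEY,
    -- verb "generate": each invoice belongs to exactly one customer
    CUS_CODE   INTEGER NOT NULL REFERENCES CUSTOMER (CUS_CODE)
);
""")

# One customer may generate many invoices (the "many" side of 1:M).
conn.execute("INSERT INTO CUSTOMER VALUES (1, 'Smith')")
conn.executemany("INSERT INTO INVOICE VALUES (?, ?)", [(1001, 1), (1002, 1)])
invoice_count = conn.execute(
    "SELECT COUNT(*) FROM INVOICE WHERE CUS_CODE = 1"
).fetchone()[0]

# An invoice that names a nonexistent customer violates the business rule.
try:
    conn.execute("INSERT INTO INVOICE VALUES (1003, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

The point for discussion is that the constraint lives in the database structure, so the business rule is enforced no matter which application inserts the rows.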
4. What three languages emerged to standardize the basic network data model, and why was such standardization important to users and designers?
The three languages were:

1. The DDL (schema) constitutes the Data Definition Language for the database schema. The DDL's use enabled the database administrator to define the database schema, i.e., its overall blueprint.
2. The DDL (subschema) allows the definition of the specific database components that will be used by each application.
3. The DML is the Data Manipulation Language that allows us to manipulate the database contents.
Standardization is important to users and designers because it allows them to shift from one commercial application to another with little trouble when they operate at the logical level.

5. Describe the basic features of the relational data model and discuss their importance to the end user and the designer.
A relational database is a single data repository that provides both structural and data independence while maintaining conceptual simplicity. The relational database model is perceived by the user to be a collection of tables in which data are stored. Each table resembles a matrix composed of rows and columns. Tables are related to each other by sharing a common value in one of their columns. The relational model represents a breakthrough for users and designers because it lets them operate in a simpler conceptual environment. End users find it easier to visualize their data as a collection of data organized as a matrix. Designers find it easier to deal with conceptual data representation, freeing them from the complexities associated with physical data representation.

6. Explain how the entity relationship (ER) model helped produce a more structured relational database design environment.
An entity relationship model, also known as an ERM, helps identify the database's main entities and their relationships. Because the ERM components are graphically represented, their role is more easily understood. Using the ER diagram, it's easy to map the ERM to the relational database model's tables and attributes. This mapping process uses a series of well-defined steps to generate all the required database structures. (This structures mapping approach is augmented by a process known as normalization, which is covered in detail in Chapter 6, "Normalization of Database Tables.")
7. Consider the scenario described by the statement "A customer can make many payments, but each payment is made by only one customer" as the basis for an entity relationship diagram (ERD) representation.
This scenario yields the ERDs shown in Figure Q2.7. (Note the use of the PowerPoint Crow's Foot template. We will start using the Visio Professional-generated Crow's Foot ERDs in Chapter 3, but you can, of course, continue to use the template if you do not have access to Visio Professional.)
Figure Q2.7 The Chen and Crow's Foot ERDs for Question 7
NOTE: Remind your students again that we have not (yet) illustrated the effect of optional relationships on the ERD's presentation. Optional relationships and their treatment are covered in detail in Chapter 4, "Entity Relationship (ER) Modeling."
8. Why is an object said to have greater semantic content than an entity?
An object has greater semantic content because it embodies both data and behavior. That is, in addition to data, the object contains the description of the operations that may be performed by the object.

9. What is the difference between an object and a class in the object oriented data model (OODM)?
An object is an instance of a specific class. It is useful to point out that the object is a run-time concept, while the class is a more static description. Objects that share similar characteristics are grouped in classes. A class is a collection of similar objects with shared structure (attributes) and behavior (methods). Therefore, a class resembles an entity set.
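The class/object distinction is easy to demonstrate in any object-oriented language. The short sketch below uses Python; the Customer class, its attributes, and the generate_invoice method are hypothetical names invented for the illustration. The class is the static description (shared structure and behavior), and each object is a run-time instance that carries its own data.

```python
class Customer:
    """The class: a static description shared by all customer objects."""

    def __init__(self, code, last_name):
        # Structure (attributes) that every instance will have.
        self.code = code
        self.last_name = last_name
        self.invoices = []

    def generate_invoice(self, number):
        # Behavior (method) packaged together with the data.
        self.invoices.append(number)
        return len(self.invoices)


# Objects: run-time instances of the class, each holding its own values.
smith = Customer(1, "Smith")
jones = Customer(2, "Jones")
smith.generate_invoice(1001)
```

Both objects share the same attributes and method because they belong to the same class, yet the invoice added to smith does not appear in jones. This combination of data and behavior is the greater semantic content referred to in Question 8.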
10. How would you model Question 7 with an OODM? (Use Figure 2.4 as your guide.)
The OODM that corresponds to Question 7's ERD is shown in Figure Q2.10.
Figure Q2.10 The OODM Model for Question 10
11. What is an ERDM, and what role does it play in the modern (production) database environment?
The Extended Relational Data Model (ERDM) is the relational data model's response to the Object Oriented Data Model (OODM). Most current RDBMSes support at least a few of the ERDM's extensions. For example, support for large binary objects (BLOBs) is now common.
Although the "ERDM" label has frequently been used in the database literature to describe the relational database model's response to the OODM's challenges, C. J. Date objects to the ERDM label for the following reasons [1]:

• The useful contribution of "the object model" is its ability to let users define their own, and often very complex, data types. However, mathematical structures known as "domains" in the relational model also provide this ability. Therefore, a relational DBMS that properly supports such domains greatly diminishes the reason for using the object model. Given proper support for domains, relational database models are quite capable of handling the complex data encountered in time series, engineering design, office automation, financial modeling, and so on. Because the relational model can support complex data types, the notion of an "extended relational database model" or ERDM is "extremely inappropriate and inaccurate" and "it should be firmly resisted." (The capability that is supposedly being extended is already there!)
• Even the label object/relational model (O/RDM) is not quite accurate, because the relational database model's domain is not an object model structure. However, there are already quite a few O/R products, also known as Universal Database Servers, on the market. Therefore, Date concedes that we are probably stuck with the O/R label. In fact, Date believes that "an O/R system is in everyone's future." More precisely, Date argues that a true O/R system would be "nothing more nor less than a true relational system, which is to say, a system that supports the relational model, with all that such support entails."

C. J. Date concludes his discussion by observing that "We need do nothing to the relational model to achieve object functionality. (Nothing, that is, except implement it, something that doesn't yet seem to have been tried in the commercial world.)"

[1] C. J. Date, "Back To the Relational Future", http://www.dbpd.com/vault/B=Bdate.html

12.

Remind students to review the definitions of data independence and structural independence found in Chapter 1, "Database Systems."

Data independence exists when it is possible to make changes in the data storage characteristics without affecting the application program's ability to access the data. Conversely, data dependence exists when an application program is unable to access the data after a change in the data storage characteristics has been made. The practical significance of data dependence is the difference between the logical data format (how the human being views the data) and the physical data format (how the computer "sees" the data). Because a file system exhibits data dependence, any program that accesses a file system's file must not only tell the computer what to do, but also how to do it.

In contrast to the file system, a relational database exhibits data independence. Therefore, any program can access the data regardless of a change in the data storage characteristics. For example, if you want to get a listing of all the customers whose last name is "Smith" in a relational database, the command set

    SELECT * FROM CUSTOMER WHERE CUS_LNAME = 'Smith';

reads the same regardless of the last name field's size (for example, a maximum of 20 bytes or some other limit) or characteristics (fixed field length or variable field length).
NOTE: Although students have not yet been introduced to Structured Query Language (SQL), the relational database query language standard, this command set is simple enough to serve as a discussion vehicle. If you explain that the "*" symbol in the SELECT statement means "all," indicating that all the attributes of each matching record are to be selected, the remaining components (FROM and WHERE) are self-explanatory. You can demonstrate this command with any of the MS Access databases that include a CUSTOMER table. For example, use the Ch92=
Structural independence exists when it is possible to make changes in the database structure without affecting the application program's ability to access the data. For example, the preceding SQL command set works fine regardless of whether the CUS_LNAME is listed first, third, or last in the CUSTOMER table structure.
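If a quick classroom demonstration is wanted and no Access database is at hand, the same point can be sketched with Python's built-in sqlite3 module. This is an illustrative sketch only: the helper function last_names and the extra columns CUS_CODE and CUS_FNAME are assumptions, and SQLite treats declared lengths such as VARCHAR(20) as hints rather than enforced limits, so the example shows only that the query text never changes.

```python
import sqlite3

# The query from the answer above, reused unchanged against both structures.
QUERY = "SELECT * FROM CUSTOMER WHERE CUS_LNAME = 'Smith'"


def last_names(create_sql, insert_sql, row):
    """Build a CUSTOMER table with the given structure and run QUERY on it."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row      # access columns by name, not position
    conn.execute(create_sql)
    conn.execute(insert_sql, row)
    return [r["CUS_LNAME"] for r in conn.execute(QUERY)]


# Structure A: CUS_LNAME listed first, declared as a variable-length field.
first = last_names(
    "CREATE TABLE CUSTOMER (CUS_LNAME VARCHAR(20), CUS_CODE INTEGER)",
    "INSERT INTO CUSTOMER VALUES (?, ?)",
    ("Smith", 1),
)

# Structure B: CUS_LNAME listed last, declared as a fixed-length field.
last = last_names(
    "CREATE TABLE CUSTOMER (CUS_CODE INTEGER, CUS_FNAME TEXT, CUS_LNAME CHAR(35))",
    "INSERT INTO CUSTOMER VALUES (?, ?, ?)",
    (1, "Ann", "Smith"),
)
```

Both calls return the same single last name even though the column's position and declared type differ, which is the behavior the terms structural independence and data independence describe.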
The comparisons are summarized in Table Q2.12.

TABLE Q2.12 Data and Structural Independence

MODEL                  DATA INDEPENDENCE   STRUCTURAL INDEPENDENCE
File system            No                  No
Hierarchical           Yes                 No
Network                Yes                 No
Relational             Yes                 Yes
Entity Relationship    Yes                 Yes
Object Oriented        Yes                 Yes
13. What is a relationship, and what three types of relationships exist?
A relationship is an association among (two or more) entities. Three types of relationships exist: one-to-one (1:1), one-to-many (1:M), and many-to-many (M:N or M:M).

14. Give an example of each of the three types of relationships.
1:1 An academic department is chaired by one professor; a professor may chair only one academic department.
1:M A customer may generate many invoices; each invoice is generated by one customer.
M:N An employee may have earned many degrees; a degree may have been earned by many employees.

15. What is a table, and what role does it play in the relational model?
Strictly speaking, the relational data model bases data storage on relations. These relations are based on algebraic set theory.
16. What is a relational diagram? Give an example.
A relational diagram is a visual representation of the relational database's entities, the attributes within those entities, and the relationships between those entities. Therefore, it is easy to see what the entities represent and to see what types of relationships (1:1, 1:M, M:N) exist among the entities and how those relationships are implemented. An example of a relational diagram is found in the text's Figure 2.2.

17. What is connectivity?
Connectivity is the relational term to describe the types of relationships (1:1, 1:M, M:N).
In the figure, the business rule that an advisor can advise many students and a student has only one assigned advisor is shown in a relationship with a connectivity of 1:M. The business rule that a student can register only one vehicle to park on campus and a vehicle can be registered by only one student is shown with a relationship with a connectivity of 1:1. Finally, the rule that a student can register for many classes, and a class can be registered for by many students, is shown by the relationship with a connectivity of M:N.

18. Describe the Big Data phenomenon.
Over the last few years, a new wave of data has emerged into the limelight. Such data have always existed but did not receive the attention that they are receiving today. These data are characterized by being high volume (petabyte size and beyond), high frequency (data are generated almost constantly), and mostly semi-structured. These data come from multiple and varied sources such as web site logs, posts on social web sites, and machine-generated information (GPS, sensors, etc.). Such data have been accumulating over the years, and companies are now awakening to the fact that they contain a lot of hidden information that could help the day-to-day business (such as browsing patterns, purchasing preferences, behavior patterns, etc.). The need to manage and leverage these data has triggered a phenomenon labeled "Big Data." Big Data refers to a movement to find new and better ways to manage large amounts of web-generated data and derive business insight from it, while, at the same time, providing high performance and scalability at a reasonable cost.

19. What is sparse data? Give an example.
Sparse data refers to cases in which the number of attributes is very large but the actual number of distinct value instances is relatively small. For example, if you are modeling census data, you will have an entity called person. This entity can have hundreds of attributes, such as first name, last name, degree, employer, income, veteran status, foreign born, etc. Although there would be many millions of rows of data, one for each person, many attributes will be left blank; for example, not all persons will have a degree, an income, or an employer. Even fewer persons will be veterans or foreign born. Whenever you have a data entity that has many columns but few data instances for those columns (many empty attribute occurrences), you are said to have sparse data.

A related term, data sparsity, refers to the number of different values a given column could have. In this case, a column such as "gender," although it will have values for all rows, has low data sparsity because the number of different values is only two: male or female. A column such as name or birthdate will have high data sparsity because the number of different values is high.

20. Define and describe the basic characteristics of a NoSQL database.
Every time you search for a product on Amazon, send messages to friends on Facebook, watch a video on YouTube, or search for directions in Google Maps, you are using a NoSQL database. NoSQL refers to a new generation of databases that address the very specific challenges of the "big data" era and have the following general characteristics:

• Not based on the relational model. These databases are generally based on a variation of the key-value data model rather than on the relational model, hence the NoSQL name. The key-value data model is based on a structure composed of two data elements, a key and a value, in which for every key there is a corresponding value (or a set of values). The key-value data model is also referred to as the attribute-value or associative data model. In the key-value data model, each row represents one attribute of one entity instance. The "key" column points to an attribute and the "value" column contains the actual value for the attribute. The data type of the "value" column is generally a long string to accommodate the variety of actual data types of the values that are placed in the column.
• Support distributed database architectures. One of the big advantages of NoSQL databases is that they generally use a distributed architecture. In fact, several of them (Cassandra, Big Table) are designed to use low-cost commodity servers to form a complex network of distributed database nodes.
• Provide high scalability, high availability, and fault tolerance. NoSQL databases are designed to support the ability to add capacity (add database nodes to the distributed database) when the demand is high and to do it transparently and without downtime. Fault tolerant means that if one of the nodes in the distributed database fails, the database will keep operating as normal.
• Support very large amounts of sparse data. Because NoSQL databases use the key-value data model, they are suited to handle very high volumes of sparse data; that is, cases where the number of attributes is very large but the number of actual data instances is low.
• Geared toward performance rather than transaction consistency. One of the biggest problems of very large distributed databases is to enforce data consistency. Distributed databases automatically make copies of data elements at multiple nodes to ensure high availability and fault tolerance. If the node with the requested data goes down, the request can be served from any other node with a copy of the data.
21. Using the example of a medical clinic with patients and tests, provide a simple representation of how to model this example using the relational model and how it would be represented using the key-value data modeling technique.
As you can see in Figure Q2.21, the relational model stores data in a tabular format in which each row represents a "record" for a given patient. In contrast, the key-value data model uses three different fields to represent each data element in the record. Therefore, for each patient row, there are three rows in the key-value model.
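The contrast can also be worked through in code. The sketch below is an assumption-laden illustration, not a reproduction of Figure Q2.21: the table names (PATIENT_TEST, PATIENT_KV), the column names, the three test attributes, and the sample values are all invented. It stores two patients relationally, then rewrites each non-empty attribute as one (entity, key, value) row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Relational form: one row per patient, one column per test (hypothetical names).
CREATE TABLE PATIENT_TEST (
    PATIENT_ID INTEGER PRIMARY KEY,
    BP TEXT, GLUCOSE TEXT, HDL TEXT
);
-- Key-value form: three fields per data element.
CREATE TABLE PATIENT_KV (
    ENTITY_ID INTEGER, ATTR_KEY TEXT, ATTR_VALUE TEXT
);
""")

# Patient 10 has all three test results; patient 11 has only one (sparse data).
conn.executemany(
    "INSERT INTO PATIENT_TEST VALUES (?, ?, ?, ?)",
    [(10, "120/80", "95", "60"), (11, None, None, "52")],
)

attributes = ["BP", "GLUCOSE", "HDL"]
relational_rows = conn.execute(
    "SELECT PATIENT_ID, BP, GLUCOSE, HDL FROM PATIENT_TEST ORDER BY PATIENT_ID"
).fetchall()

# Each non-empty attribute of a relational row becomes one key-value row.
for patient_id, *values in relational_rows:
    for key, value in zip(attributes, values):
        if value is not None:  # assumption: empty attributes are simply not stored
            conn.execute(
                "INSERT INTO PATIENT_KV VALUES (?, ?, ?)", (patient_id, key, value)
            )

kv_rows = conn.execute(
    "SELECT ENTITY_ID, ATTR_KEY, ATTR_VALUE FROM PATIENT_KV "
    "ORDER BY ENTITY_ID, ATTR_KEY"
).fetchall()
```

Patient 10's single relational row becomes three key-value rows, matching the description above, while patient 11 contributes only one. Skipping empty attributes is a design choice made for this sketch; it shows why the key-value layout suits the sparse data discussed in Question 19.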
22. What is logical independence?

Logical independence exists when you can change the internal model without affecting the conceptual model. When you discuss logical and other types of independence, it's worthwhile to discuss and review some basic modeling concepts and terminology:

• In general terms, a model is an abstraction of a more complex real-world object or event. A model's main function is to help you understand the complexities of the real-world environment. Within the database environment, a data model represents data structures and their characteristics, relations, constraints, and transformations.
• As its name implies, a purely conceptual model stands at the highest level of abstraction and focuses on the basic ideas (concepts) that are explored in the model, without specifying the details that will enable the designer to implement the model. For example, a conceptual model would include entities and their relationships, and it may even include at least some of the attributes that define the entities, but it would not include attribute details such as the nature of the attributes (text, numeric, etc.) or the physical storage requirements of those attributes.

The terms data model and database model are often used interchangeably. In the text, the term database model is used to refer to the implementation of a data model in a specific database system.

• Data models (relatively simple representations, usually graphical, of more complex real-world data structures), bolstered by powerful database design tools, have made it possible to substantially diminish the potential for errors in database design.
• The internal model is the representation of the database as "seen" by the DBMS. In other words, the internal model requires the designer to match the conceptual model's characteristics and constraints to those of the selected implementation model.
• An internal schema depicts a specific representation of an internal model, using the database constructs supported by the chosen database.
• The external model is the end users' view of the data environment.
23. What is physical independence?
You have physical independence when you can change the physical model without affecting the internal model. Therefore, a change in storage devices or methods and even a change in operating system will not affect the internal model. The terms physical model and internal model may require a bit of additional discussion:

• The physical model operates at the lowest level of abstraction, describing the way data are saved on storage media such as disks or tapes. The physical model requires the definition of both the physical storage devices and the (physical) access methods required to reach the data within those storage devices, making it both software and hardware dependent. The storage structures used are dependent on the software (DBMS, operating system) and on the type of storage devices that the computer can handle. The precision required in the physical model's definition demands that database designers who work at this level have a detailed knowledge of the hardware and software used to implement the database design.
• The internal model is the representation of the database as "seen" by the DBMS. In other words, the internal model requires the designer to match the conceptual model's characteristics and constraints to those of the selected implementation model. An internal schema depicts a specific representation of an internal model, using the database constructs supported by the chosen database.