Database characteristics Database Management Systems and features
Data Independence Components of database environment
2
Data / Information
Objectives of Data Management • Organization Organization of data for retrieval retrieval and maintenance • Data definition definition and manipulation manipulation functions functions
Data: raw facts about things and events Information: transformed data that has va ue or ec s on ma ng Data management (data retrieval and maintenance) are essential for organization’s success
• Concurrency • Reliability • Security • Data independence 3
Data Management Technology Era
Generation
Orientation
Major Features
1960s
1 st Generation
File
1970s
2 nd Generation
Network Navigation
File structures and proprietary program interfaces Networks and hierarchies of related records,
1980s
3 rd Generation
Relational
What is a Database A collection of persistent data that can be shared and interrelated interrelated.. (Mannino) A shared collection of logically related data, and a description of this data, designed to meet the information needs of an organization.. organization (Connolly & Begg)
A shared shared,, integrated computer structure that houses a collection of
• End user data, that is, raw facts of interest to the end user. user.
Multi-media, active, distributed processing, more powerful operators
• Metadata, or data about data, through which the data are integrated.. integrated (Rob & Coronel) 5
6
Lec1.1
Why Use a Database
Characteristics of Data in Databases
• Centralized Centralized control over over data
• Persistent (stored) • Inter-related
• Data integration: integration: to remove data redundancy redundancy,, inconsistency and to facilitate access •
• Shared
• Independence Independence between applications applications and data data • Performance • Scalability • Reduced application application program maintenance maintenance 7
8
Water Utility Database
University Database Registration
Grade Recording
Entities: students, faculty, courses, offerings, enrollments Relationships: faculty teach offerings, students enroll in offerings, offerings made of courses, ...
Faculty Assignment
Course Scheduling
University Database
9
10
Database Management System
Features of commercial DBMSs
• A collection of components that support data acquisition, maintenance, dissemination, dissemination, storage, maintenance, retrieval and formatting. (Mannino) • A software system that enables users to define, create, maintain and control access to the database. database.
(Connolly & Begg)
• A collection of programs that manage the database structure and control access to the data stored in the database.
Database creation Non-procedural Non-procedural data access Application development Procedural language interface Transaction processing
(Rob & Coronel)
• The software which performs management of the data in databases, including data definition, access, update, and control. 11
12
Lec1.2
University Database (ERD) Student StdSSN StdFirstName StdLastName
Entity is a distinct object (a person, place, thing, concept, event) in the organization to be represented in the database. re a ons ons p s an assoc assoc a on e ween ween entities. An attribute is a property that describes some aspect of the object that we want to record.
SQL query specifies what data to access, not how to navigate the database to retr eve t e ata Query optimizer determines the access path Improves application development productivity and performance 17
18
Lec1.3
Sample Report
Sample Data Entry Form
19
Transaction Processing
20
Airline Transaction Example START TRANSACTION Display greeting Get reservation preferences from user SELECT departure and return flight records If reservation is acceptable then UPDATE seats remaining of departure flight record UPDATE seats remaining of return flight record INSERT reservation record Print ticket if requested End If On Error: ROLLBACK COMMIT
Transaction: unit of work and unit of recovery Reliable and efficient processing of large volumes of transactions Concurrency control (simultaneous users) Recover from failures
21
Three Schema Architecture (ANSI/SPARC Model)
Data Independence
22
Software maintenance is a major part (>50%) of information system budgets
View 1 External to oncep ua Mappings
minimal impact on applications
Conceptual to Internal Mappings
View 2
Conceptual Schema
Internal Schema
23
View n
External Level
Conceptual Level
Internal Level
24
Lec1.4
Data Independence
Three Levels of ANSI/SPARC model • External view: A subset of the conceptual schema that represents user’s (application) point of view • Conceptual schema: An abstraction of the real world system in the form of a conceptual model . . • Internal schema: Low-level representation of the database, implementation of conceptual view. A physical DB consists of a collection of data files stored on one or more phy sical discs.
• Logical data independence - changes in conceptual schema do not impact on external views – e.g. adding a new attribute • Physical data independence - changes in the internal schema do not impact on conceptual schema (and external views) – e.g. add a disk for physical storage 25
26
Client-Server Architecture a) Client, server, and database on the same computer
Database Specialists
b) Mulitple clients and 1 server on different computers
Client
Client
Server
Database administrator (DBA)
Server Client
Client
Database c) Multiple servers and databases on different computers Client
a a a m n s ra or
Database
Server
Server
More technical DBMS specific skills
Less technical Planning role
Client Client
Client
Database
Database
27
28
Summary
Components of a DB system
DBA Developer
Users
DBMS Applications
Hardware
Databases and database technology vital to modern organizations Database technology supports daily Non-procedural access is a crucial feature Many opportunities to work with databases
Database
29
30
Lec1.5
Lecture 2 Relational Model
31
Relational Model
Data Structures
Data Structures Data Integrity Data Manipulation
Relations Attributes Tuples
(tables) (columns) (rows)
Cardinality (number of rows) A relation schema R(A 1, A 2, ... , A n) A relation (instance) r is a set of tuples
33
34
Relation Patient
Alternative Terminology
PATIENT Relation: Degree = 4, Cardinality = 5
D at ab as e Table
R el at io na l Files Model Relation Record-type, file
Row
Tuple
Record
Column
Attribute
Field
Relation schema: PATIENT(PATNO, NAME, SEX, AGE)
PATNO 103456 103458 103476 103496 103497
35
NAME Curran Woods Adams Harris Wilson
SEX M M F M F
AGE 59 76 62 34 24
36
Lec1.6
Properties of Relation
Alternative Representation
The ordering of rows has no significance The ordering of columns has no significance Rows must be unique Attribute values must be atomic
AGE 62
SEX F
59 76
PATNAME Adams Curran Woods
M M
PATNO 103476 103456 103458
24
Wilson
F
103497
Order of attributes and tuples in a relation has no significance
37
38
Relational Database
Attributes (Columns)
A relational database schema (intention) is a set of relation schemas {R1, R2, ... , Rn} and a set of integrity constraints. A relational database (extension) is a set of relation instances {r1, r2, ... , rn} such that each ri is an instance of Ri , and the integrity constraints defined in the relational database schema are s atisfied
Values of an attribute are simple (atomic) are drawn from same domain may be a null Nu Va ues Missing or not applicable data values Domains Set of all possible values from which an attribute is drawn Defined on data types, probably with further constraints
39
40
The Uniqueness Property
Candidate Keys
ADMISSION Relation
ADMNO PATNO DATE 101 103476 10-FEB-90
102 103
103456 11-FEB-90 103458 11-FEB-90
All tuples in a relation must have a unique relation key
Types of Keys candidate keys primary key alternate keys 41
An attribute or combination of attributes which uniquely identify tuples in a relation Uniqueness property: No two distinct tuples key K of R Irreducibility (minimality) property: No proper subset of a candidate key has the uniqueness property 42
Lec1.7
Primary and Alternate Keys
Foreign keys An attribute or combination of attributes that correspond to a primary key of some other (target) relation. Foreign key values must equal to the primary key values in the referenced re a on or e w o y nu .
Exactly one candidate key of a relation is chosen as primary key The reminder, if any, are called alternate keys
43
Data Integrity
44
Referential Integrity Example ASSIGN
Domain Integrity - attribute values are drawn from corresponding domains Entity Integrity - every relation must have primary key cannot accept null values Referential Integrity - foreign key values in the referencing relation must equal to primary key values in the target relation, or be entirely null
NAME david
RATE 90
CODE BUDGET p01 90000
SDATE EDATE 01-FEB-90 01-JUN-90
p01 p01 p01
s ue jean joe
60 45 45
p02 p03
01-MAR-90 01-APR-90 02-JUN-90 30-AUG-90
p02 p02 p03 p03 p03 p03
sue bruce david peter bruce j oe
65 50 90 60 45 45
45
Referential Integrity Rule
20000 50000
PERSON NAME joe
adam sue david bruce jean peter t om
JOB programmer manager analyst manager programmer programmer analyst m ana ge r
46
Relational Algebra
Defined on foreign keys for delete and update operations
PROJECT
CODE p01
RESTRICT (inhibit) – cannot proceed with the o eration when related records exist CASCADE – cascades the operation to related records NULLIFY – sets the foreign key values of related records to null
Algebra: SL age>60 PATIENT SELECT patno, name, sex, age FROM patient WHERE age > 60
NAME
AGE
Curran Woods Adams Harris Wilson
59 76 62 34 24
SELECT name, age FROM patient
Algebra: PJ name,age PATIENT 49
PRODUCT (Cartesian Product) Give all combination of admission and patient records. SELECT admission.*, patient .* FROM admission, patient
Algebra: ADMISSION CP PATIENT
50
ADMNO PATNO DATE
PATNO NAME
101
103476 10-FEB-90
103456 Curran M
59
101
103476 10-FEB-90
103458 Woods M
76
101
103476 10-FEB-90
103476 Adams F
62
101
103476 10-FEB-90
103496 Harris
M
34
101
103476 10-FEB-90
103497 Wilson F
24
102
103456 10-FEB-90
103456 Curran M
59
102
103456 10-FEB-90
103458 Woods M
76
102
103456 10-
103476
62
102
103456 10-FEB-90
103496 Harris
M
34
102
103456 10-FEB-90
103497 Wilson F
24
103
103458 11-FEB-90
103456 Curran M
59
103
103458 11-FEB-90
103458 Woods M
76
103
103458 11-FEB-90
103476 Adams F
62
103
103458 11-FEB-90
103496 Harris
M
34
103
103458 11-FEB-90
103497 Wilson F
24
-90
SEX
AGE
ams
51
52
θ-JOIN
Natural JOIN
List all patients admitted to the hospital, include all patient data. ADMNO PATNO DATE 101 103476 10-FEB-90 103 103458 11-FEB-90
PATNO NAME SEX 103476 Adams F
AGE 62
103458 Woods M
76
Algebra: ADMISSION JN patno=patno PATIENT SELECT admission.*, patient .* FROM admission, patient WHERE admission.patno = patient.patno
List all patients admitted to the hospital, include all patient data. ADMNO
PATNO
DATE
NAME
SEX
101 102 103
103476 103456 103458
10-FEB-90 11-FEB-90 11-FEB-90
Adams F Curran M Woods M
AGE 62 59 76
Algebra: ADMISSION NJ PATIENT
53
54
Lec1.9
Set Operators
SET OPERATORS INPATIENT Relation
A UNION B
PATNO NAME
SEX
103456 Curran M 103458 Woods M 103476 Adams F
A INTERSECT B
OUTPATIENT Relation AGE
PATNO NAME
SEX
59 76 62
103458 Woods M 103496 Harris M 103497 Wilson F
AGE 76 34 24
A MINUS B
55
UNION
Union Compatibility
56
List both inpatients and outpatients.
Same number of columns Column compatibility (i.e. the same domain)
PATNO NAME
SEX
AGE
103456 Curran oo s 103476 Adams 103496 Harris 103497 Wilson
M
59
F M F
62 34 24
A ge ra: INPATIENT UN OUTPATIENT
SELECT patno, name, sex, age FROM inpatient UNION SELECT patno, name, sex, age FROM outpatient 57
INTERSECTION
DIFFERENCE (MINUS)
Give all patients who have been treated as both inpatients and outpatients. PATNO NAME
SEX
103458 Woods M
58
Give all patients treated as inpatients only (i.e. exclude outpatients).
AGE
PATNO NAME SEX 103456 Curran M 103476 Adams F
76
AGE 59 62
Algebra: INPATIENT INT OUTPATIENT Algebra: INPATIENT DF OUTPATIENT
SELECT patno, name, sex, age FROM inpatient INTERSECT SELECT patno, name, sex, age FROM outpatient
SELECT patno, name, sex, age FROM inpatient MINUS SELECT patno, name, sex, age FROM outpatient 59
60
Lec1.10
DIVISION
Algebraic Rules
Give people who are assigned to all projects NAME Sue
Algebra: (PJcode,nameASSIGN) DV (PJcodePROJECT)
Commutative R op S = S op R UN, INT, CP, JN Associative (R op S) op T = R op (S op T) = R op S op T UN, INT, CP, JN Distributive R op1 (S op2 T) = (R op1 S) op2 (R op1 T) PJA(R UN S) = (PJAR) UN (PJAS) PJA(R INT S)≠ (PJAR) INT (PJAS) SLp(R DF S) = (SLpR) DF (SLpS)
61
62
Lecture 3 SQL Part I
63
Outline
Structured Query Language - SQL
SQL origin and evolution Basic Select Statement Arithmetic Calculations and Functions Grouping (Aggregate) Operations Update Operations
65
IBM SQL - origins of SQL and major influence, Sequel, 1974, System R, 1976 ANSI X3H2, 1982 SQL86 (120 pages) SQL/89 Integrity Enhancement SQL/92 (622 pages) SQL:1999 Object-Relational (1,000s of pages) SQL:2003 XML Key industry standard – guides the evolution of database technologies 66
Lec1.11
Schema Definition and Manipulation
Data Manipulation and Transactions
DDL Data Definition Language
create and alter table schema, drop tables create and drop index
MARGARITA Cheese, Tomato, Spice HAM Ham NAPOLITANA Anchovies, Olives STAGIONY H am , M ush ro om , O li ve s, An cho vi es HAWAIIAN Ham, Pineapple CABANOSSI Cabanossi I L The lot SICILIANO O ni on s, C ap si cu m, Ol iv es, An ch ovi es AMERICANO Salami, Pepperoni MEXICANO O ni on s, C ap si cu m, M us hr oo m, C hi l i VEGETERIAN O ni on s, C ap si cu m, M us hr oo m, P ea s MUSHROOM Mushrooms SEAFOOD Seafood Combination GARLIC Garlic
The SELECT clause corresponds to the projection operation of the relational algebra, it is used to list the columns desired in the result of a query The WHERE clause corresponds to the selection predicate of the relational algebra, it consists of a predicate involving columns of the tables that appear in the from clause The FROM clause corresponds to the Cartesian product operation of the relational algebra. It lists the tables to be scanned in the evaluation of the query
SELECT FROM WHERE GROUP BY HAVING ORDER BY
79
80
Listing Rows of Data
Ordering Rows
List all data in the MENU table.
List all pizzas in alphabetic order.
select * from menu
select * from menu order by pizza
PIZZA M AR GA RI TA HAM N AP OL IT AN A STA GI ONY HAWAIIAN CABANOSSI SPECIAL S IC IL IA NO A ME RI CA NO M EX IC AN O VEGETERIAN MUSHROOM SEAFOOD GARLIC
PIZZA A ME RI CA NO C AB AN OS SI GARLIC HAM HAWAIIAN M AR GA RI TA MEXICANO MUSHROOM N AP OL IT AN A SEAFOOD S IC IL IA NO S PE CI AL S TA GI ON Y VEGETERIAN
PIZZA SPECIAL SEAFOOD STAGIONY AMERICANO CABANOSSI HAWAIIAN MEXICANO NAPOLITANA SICILIANO VEGETERIAN HAM MUSHROOM MARGARITA GARLIC
83
COUNTRY
I TA LY
HAWAII I TA LY I TA LY I TA LY MEXICO USA I TA LY
PIZZA GARLIC M AR GA RI TA HAM MUSHROOM VEGETERIAN HAWAIIAN C AB AN OS SI N AP OL IT AN A S I CI LI AN O MEXICANO AMERICANO S TA GI ON Y SEAFOOD SPECIAL
84
Lec1.14
Eliminating Duplicates
Conditional Retrieval List all Italian pizzas.
List all price categories recorded in the MENU table, eliminating duplicates.
List all pizzas that cost less than $7.00. select pizza from menu where price < 7 PIZZA MARGARITA GARLIC
85
Compound Predicate Clauses
Selecting Sets of Values (IN) List all pizzas that do not originate from Italy or USA.
List all Italian pizzas that cost less than $7.00.
select pizza from menu where country not in ('ITALY', 'USA')
select pizza from menu where price < 7 and country = 'ITALY'
PIZZA HAWAIIAN MEXICANO I
PIZZA MARGARITA
Give all information for the VEGETERIAN, AMERICANO, MEXICANO, and GARLIC pizzas.
List all pizzas that do not originate from Italy or USA. select pizza from menu where not (country = 'ITALY' or country = 'USA')
select pizza, price, country, base from menu where pizza in ('VEGETERIAN', 'AMERICANO', 'MEXICANO', 'GARLIC')
PIZZA HAWAIIAN MEXICANO
87
Selecting Ranges (BETWEEN)
PIZZA GARLIC MEXICANO AMERICANO VEGETERIAN
PRICE COUNTRY 3.50 7.40 MEXICO 7.40 USA 7.40
BASE WM WF WM WM
88
Matching Character Strings (LIKE)
List pizzas that cost between 6 and 7 dollars.
% matches any sequence of zero or more characters List pizza with beginning with the letter 'M'.
select pizza, price from menu where price between 6 and 7 PIZZA MARGARITA
86
select pizza, price, country from menu where pizza like 'M%'
PRICE 6.20
PIZZA MARGARITA MEXICANO MUSHROOM
PRICE COUNTRY 6.20 ITALY 7.40 MEXICO 7.30
List pizzas with name ending with the letters 'ANO'. select pizza, price, country from menu where pizza like '%ANO'
89
PIZZA SICILIANO AMERIC ANO MEXICANO
PRICE 7.40 7.40 7.40
COUNTRY ITALY USA MEXICO
90
Lec1.15
Matching Character Strings (LIKE)
Matching Null Values (IS NOT NULL)
List pizzas with the substring 'II' anywhere within the pizza name.
List all pizzas, giving PIZZA name, PRICE and COUNTRY of origin where the COUNTRY of origin has a valid v alue (i.e. is not missing).
select pizza from menu where pizza like '%II%'
select pizza, price, country from menu w ere country s not nu
PIZZA HAWAIIAN
PIZZA MARGARITA NAPOLITANA STAGIONY HAWAIIAN CABANOSSI SICILIA NO AMERICANO MEXICANO
List pizzas where the third character of the COUNTRY of origin is 'X' followed by any three characters. select pizza, price, country from menu where country like '__X___' PIZZA M E XI CA NO
PRICE 6.20 7.40 7.80 7.40 7.40 7.40 7.40 7.40
COUNTRY ITALY ITALY ITALY HAWAII ITALY ITALY USA MEXICO
PRICE COUNTRY 7 . 40 M E XI CO
91
Matching Null Values (IS NULL)
Arithmetic Calculations and Functions
List all pizzas, giving PIZZA name, PRICE and COUNTRY of origin where the COUNTRY of origin has NOT been recorded (i.e. is missing).
List prices for medium and large pizzas, given that the price of a large pizza is 1.5 times that of medium size pizza. select pizza, price, price* 1.5 from menu
select pizza, price, country from menu where countr is null PIZZA HAM SPECIAL VEGETERIAN MUSHROOM SEAFOOD G AR LI C
92
PIZZA PRICE PRICE*1.5 * MARGARITA 6.20 9.3 HAM 7.30 10.95 NAPOLITANA 7.40 11.1 STAGIONY 7.80 11.7 H AW AI IA N 7 .4 0 1 1. 1 CABANOSSI 7.40 11.1 S PE CI AL 9 .9 0 14 .8 5 S IC IL IA NO 7 .4 0 1 1. 1 A ME RI CA NO 7 .4 0 1 1. 1 M EX IC AN O 7 .4 0 1 1. 1 VEGETERIAN 7.40 11.1 MUSHROOM 7.30 10.95 S EA FO OD 9 .2 0 1 3 .8 GARLIC 3.50 5.25
PRICE COUNTRY 7.30 9.90 7.40 7.30 9.20 3 .5 0
93
Column Aliases
SQL Functions
List prices for medium and large pizzas, given that the price of a large pizza is 1.5 times that of medium size pizza.
List prices for medium and large pizzas, given that the price of a large pizza is 1.5 times that of medium size pizza. select pizza, price medium, round(price* 1.5,1) large from menu
select pizza, price medium, price* 1.5 large from menu PIZZA MEDIUM MARGARITA 6.2 HAM 7.3 NAPOLITANA 7.4 STAGIONY 7.8 HAWAIIAN 7.4 CABANOSSI 7.4 SPECIAL 9.9 SICILIANO 7.4 A ME RI CA NO 7 .4 ME XI CA NO 7 .4 VEGETERIAN 7.4 MUSHROOM 7.3 SEAFOOD 9.2 G AR LI C 3 .5
94
PIZZA MEDIUM M AR GA RI TA 6 .2 H AM 7 .3 NAPOLITANA 7.4 STAGIONY 7.8 HAWAIIAN 7.4 CABANOSSI 7.4 SPECIAL 9.9 S IC IL IA NO 7 .4 A ME RI CA NO 7 .4 MEXICANO 7.4 VEGETERIAN 7.4 MUSHROOM 7.3 S EA FO OD 9 .2 GARLIC 3.5
Determine the total number of rows in the menu table.
Give the number of values recorded in the COUNTRY column.
select count(*) from menu
select count(country) from menu NO_OF_COUNTRIES 4
COUNT(*) 14
How many pizzas have a whole meal base.
How many different countries of origin are recorded in the MENU table.
select count(*) from menu where base = 'WM'
select count(distinct country) no_of_countries from menu COUNT(COUNTRY) 8
COUNT(*) 7
97
98
AVG and SUM Functions
MIN and MAX Functions Give price of the cheapest Italian pizza.
Give the average price of all pizzas.
select min(price) from menu where country = 'ITALY'
select avg(price) from menu AVG(PRICE) 7.35714286
MIN(PRICE) 6.2
Give price of the most expensive pizza.
Give the total price for MARGARITA and VEGETERIAN pizzas.
select max(price) from menu
select sum(price) from menu where pizza in ('MARGARITA', 'VEGETERIAN')
MAX(PRICE) 9.9
SUM(PRICE) 13.6
99
100
Simple Insert Operations
Grouping Operations Give the average price of pizzas for each COUNTRY of origin. select from where group
country, avg(price) menu country is not null by country
COUNTRY
AVG(PRICE)
HAWAII ITALY MEXI MEXICO USA
7.4 7.24 7.4 7.4 7.4
Insert a new row into the MENU table, containing the following information: information: PIZZA = 'AUSTRALIAN', PRICE = 8.50, COUNTRY = 'AUST', and BASE = 'WM'. insert into menu '
Give the average price of pizzas for each COUNTRY of origin, do not list countries with only one pizza. select country, avg(price) from menu where country is not null group by country having count(*) > 1
COUNTRY
AVG(PRICE)
HAWAII MEXICO
7.4 7.4
', .
,'
', '
'
Insert a new row into the MENU table, containing the following information: PIZZA = 'SUPREME', PRICE = 10.50, and BASE = 'WM' insert into menu (pizza, price, base) values('SUPREME', 10.50, 'WM')
101
102
Lec1.17
Simple Delete Operations
Updating Column Values (UPDATE)
Remove the GARLIC pizza from the MENU.
Increase the PRICE for all pizzas by 20%.
delete from menu where pizza = 'GARLIC'
update menu set price = price * 1.2
Remove all ITALIAN pizzas from the MENU.
PIZZA MARGARITA HAM NAPOLITANA STAGIONY HAWAIIAN CABANOSSI SPECIAL SICILIANO AMERICANO MEXICANO VEGETERIAN MUSHROOM SEAFOOD GARLIC
COUNTRY ITALY ITALY ITALY HAWAII ITALY ITALY USA MEXICO
BASE WF WF WF WM WM WF WF WM WM WF WM WM WF WM
103
104
Lecture 4 SQL Part II
105
Outline
Join Operations List all ingredients and their type for the MARGARITA pizza.
Join operations Outer-joins Self-joins Su queries Subqueries with aggregate functions
select from where and
i.ingredient, type recipe r, items i pizza pizza = 'MARGARITA 'MARGARITA'' i.ingre i.ingredie dient nt = r.ingr r.ingredi edient ent
INGREDIENT TYPE CHEESE DAIRY SPICE SPICE T T
List all FISH ingredients used in pizzas. select i.ingredient i.ingredient,, r.pizza r.pizza from items i, recipe r where i.ingredient = r.ingredient and i.type = 'FISH' order by 1
107
INGREDIENT ANCHOVIES ANCHOVIES ANCHOVIES ANCHOVIES ANCHOVIES ANCHOVIES ANCHOVIES SEAFOO SEAFOOD D SEAFOO SEAFOOD D
PIZZA NAPOLITANA SICILIANO SICILIANO SPECIAL SPECIAL STAGIONY O NY SEAFOOD FOOD SPECIAL CIAL
108
Lec1.18
Outer Joins
Join Operations - Alternative Syntax List all ingredients and their type for the MARGARITA pizza. select from where
ingredient, type recipe natural join items pizza = 'MARGARITA' 'MARGARITA'
INGREDIENT CHEESE SPICE T OM OM AT AT O
TYPE DAIRY SPICE V EG EG
List all FISH ingredients used in pizzas. select select ingredient, ent, pizza pizza from items join recipe using(ingredient) where type = 'FISH' order by 1 INGREDIENT ANCHOVIES ANCHOVIES ANCHOVIES ANCHOVIES ANCHOVIES SEAFOO SEAFOOD D SEAFOO SEAFOOD D
Inner join excludes non matching rows Preserving non matching rows is important in some situations Full outer join One-sided outer join
PIZZA NAPOLITANA SICILIANO SICILIANO SPECIAL SPECIAL STAGIONY O NY SEAFOO SEAFOOD D SPECIA SPECIAL L
109
110
Outer Join Operators
Outer Join Operations
Full outer join
List all FISH ingredients and the pizzas which use them. select ingredient, pizza from items natural left outer join recipe where type = 'FISH’ order by 1
et
uter o n
Unmatched rows of the left table
on
Matched rows using the join condition
g t uter o n
INGREDIENT PIZZA ANCHOVIES NAPOLITANA ANCHOVIES SPECIAL ANCHOVIES ANCHOVIES STAGIONY O NY ANCHOVIES ANCHOVIES SICILIANO A NO PRAWN SEAFOOD FOOD SEAFOO SEAFOOD D SEAFOOD FOOD SPECIA SPECIAL L
Unmatched rows of the right table
111
112
Joins involving more than two tables
Self-Join Operations Give all pizzas that originate from the same country as the SICILIANO pizza.
For the AMERICANO pizza give the base, and list all ingredients and their type.
select from where and and
select from where and and
m.pizza, m.base, i.ingredient, i.type menu m, recipe r, items i m.pizza = r.pizza r.ingredient = i.ingredient m.pizza = 'AMERICANO'
PIZZA AMERICANO RICANO AMERICANO AMERICANO A ME ME RI RI CA CA NO NO
BASE WM WM WM
m1.pizza menu m1, menu m2 m1.country = m2.country m2.pizza = 'SICILIANO' m1.pizza <> 'SICILIANO'
PIZZA MARGARITA NAPOLITANA STAGIONY CABANOSSI
INGREDIENT PEPPER PEPPERONI SALAMI SALAMI S PI PI CE CE
List all pizzas that cost more than STAGIONY pizza; give their prices.
Alternative Syntax
select from where
m1.pizza, m1.price menu m1, menu m2 m1.price > m2.price
select from where
PIZZA SPECIAL SEAFOO SEAFOOD D
PRICE 9.90 9.20 9.20
and
m2.pizza = 'STAGIONY'
113
TYPE SPICE CE MEAT S PI PI CE CE
pizza, base, ingredient, type menu natural join recipe natural join items pizza = 'AMERICANO'
114
Lec1.19
Subqueries
Multiple Table Retrieval Using Subquery
Uses WHERE and HAVING clauses for nesting subqueries Inner query executes only once o re erences o ou er query rom nner query
List all ingredients for the Mexican pizza, (i.e. COUNTRY = 'MEXICO'). select ingredient from recipe where pizza = (select pizza from menu where country = 'MEXICO') INGREDIENT CAPSICUM CHILLI ILLI MUSHROOM ONION SPICE
Alternative Syntax
select ingredient from recipe where pizza in (select pizza from menu where country = 'MEXICO')
115
INGREDIENT CAPSICUM CHILLI MUSHROOM ONION SPICE
Multiple Table Retrieval Using Subquery
116
Subqueries using ALL and ANY Give pizzas and prices for pizzas that are more expensive than all Italian pizza.
List pizzas, their COUNTRY of origin, and BASE for all pizzas containing FISH ingredients.
select pizza, price from menu where price > all (select price from menu where country = 'ITALY')
select pizza, country, base from menu where pizza in (select pizza rom rec pe where ingredient in (select ingredient from items where type = 'FISH')) PIZZA COUNTRY NAPOLITANA ITALY SEAFOOD SICILIANO ITALY SPECIAL ST AGI ON Y IT ALY
PIZZA S PE CI AL S E AF O OD
PRICE 9 .9 0 9 . 20
List pizzas with at least one MEAT ingredient. select distinct pizza from recipe where ingredient = any (select ingredient from items where type = 'MEAT')
BASE WF WF WM WF WM
117
PIZZA AMERICANO CABANOSSI HAM HAWAIIAN SPECIAL STAGIONY
118
Aggregate Functions in Subquery (HAVING)
Aggregate Functions in Subquery Give the name and price of the most expensive pizza.
Find the pizza which uses the largest number of ingredients.
select pizza, price from menu where price = (select max(price) from menu ) PIZZA S PE CII AL L
select pizza from recipe group by pizza having count(*) >= all (select count(*) from reci e group by pizza)
PRICE 9. 9 0
List pizzas costing less than the average price. select pizza, price from menu where price < (select avg(price) from menu ) PIZZA MARGARITA HAM MUSHROOM GARLIC
543-21-0987 Victoria Emmanual 654-32-1098 Leonard Fibon 098-76-5432 Leonard Vince
Querying Hierarchical Structures List faculty members who have a higher salary than their supervisors. SELECT Subr.FacSSN, Subr.FacLastName, Subr.FacSalary, Supr.FacSSN, Supr.FacLastName, Supr.FacSalary FROM Faculty Subr, Faculty Supr WHERE Subr.FacSupervisor = Supr.FacSSN AND Subr.FacSalary > Supr.FacSalary;
765-43-2109 Nicki Macon 987-65-4321 Julia Mills
876-54-3210 Christopher Colan
121
122
Lecture 5 SQL Part III
123
Outline
Correlated Subquery List each ingredient and the pizza that contains the largest amount of this ingredient.
select ingredient, pizza, amount from recipe r where amount = (select max(amount) from recipe where ingredient = r.ingredient) order by ingredient
125
INGREDIENT PIZZA AMOUNT ANCHOVIES NAPOLITANA 100 B AC ON S PE CI AL 25 CABANOSSI CABANOSSI 150 CAPSICUM SICILIANO 75 CAPSICUM MEXICANO 75 CHEESE MARGARITA 120 CHILLI MEXICANO 25 EGG SPECIAL 25 G AR LI C GA RL IC 25 HAM HAM 150 MUSHROOM MUSHROOM 100 OLIVES NAPOLITANA 75 O NI ON M EX IC AN O 75 PEAS VEGETARIAN 50 PEPPERONI AMERICANO 75 PINEAPPLE HAWAII AN 100 SALAMI AMERIC ANO 120 S EA FO OD S EA FO OD 2 00 SPICE MEXICANO 20 TOMATO MARGARITA 100
126
Lec1.21
Correlated Subquery …
Subquery using EXISTS
List ingredients used in more than one pizza.
List ingredients used in more than one pizza.
select distinct ingredient from recipe r where ingredient in (select ingredient from recipe where pizza <> r.pizza)
select distinct ingredient from recipe r where exists (select * from recipe where ingredient = r.ingredient and pizza <> r.pizza)
Which ingredients are not used in any pizza? select ingredient from items i where not exists (select * from recipe r where i.ingredient = r.ingredient) INGREDIENT PRAWN
127
Multiple-level Correlated Subquery
128
Multiple-level Correlated Subquery …
Are there any ingredients used in all pizzas?
Is there a pizza that uses all the ingredients?
select ingredient from items i where not exists (select * from menu m where not exists (select * from recipe r where r.ingredient = i.ingredient and m.. izza = r.. izza
select pizza from menu m where not exists (select * from items i where not exists (select * from recipe r where r.ingredient = i.ingredient and m. izza = r. izza))
Alternative
select ingredient from items i where not exists (select pizza from menu minus select pizza from recipe where ingredient = i.ingredient)
no records selected
INGREDIENT SPICE
129
Advanced Update Operations
130
Advanced Update Operations
Insert a new pizza ('SUPREME', 10.50, 'AUST', 'WM') which contains all the ingredients (set amount 50).
Increase the price of all pizzas that contain meat ingredients by $1.00. update menu set price = price + 1.00 where pizza in (select pizza from recipe r, items i where r.ingredient = i.ingredient and i.type = 'MEAT' )
step 1: (insert data into the MENU table) insert into menu (pizza, price, country, base) values ('SUPREME', 10.50, 'AUST', 'WM') 1 record created.
6 records updated.
step 2: (insert data into the RECIPE table) insert into recipe(pizza, ingredient, amount) select 'SUPREME', ingredient, 50 from items 20 records created.
131
132
Lec1.22
Advanced Update Operations
UNION
Remove the HAM ingredient from the RECIPE table for pizzas that contain both HAM and SALAMI.
Give ingredients used in the MUSHROOM or VEGETARIAN pizzas. select ingredient from recipe where pizza = 'MUSHROOM' union select ingredient from recipe where pizza = 'VEGETARIAN'
delete from recipe where ingredient = 'HAM' and pizza in (select pizza from recipe r1 where exists (select * from recipe r2 where r1.pizza = r2.pizza and r2.ingredient = 'SALAMI' )) 1 record deleted.
Give ingredients used in the MUSHROOM pizza, but not used in the VEGETARIAN pizza.
Give ingredients used in both the MUSHROOM, and the VEGETARIAN pizzas.
select ingredient from recipe where pizza = 'MUSHROOM' minus select ingredient from recipe where pizza = 'VEGETARIAN'
select ingredient from recipe where pizza = 'VEGETARIAN' intersect select ingredient from recipe where pizza = 'MUSHROOM' INGREDIENT MUSHROOM SPICE
no records selected
135
Parametric Queries
136
Implementation Specific Functions
List all ingredients for a given pizza. List all ingredients for a given pizza, match similar sounding names.
select ingredient from recipe where pizza = '&1' Enter value for 1: GARLIC
select pizza, ingredient from recipe where soundex(pizza) = soundex('&1')
INGREDIENT GARLIC SPICE
n er va ue or : amer can PIZZA AMERICANO AMERICANO AMERICANO
137
INGREDIENT SALAMI PEPPERONI SPICE
138
Lec1.23
SQL Views
Creating and Altering Database Objects CREATING TABLES AND INDEXES
create table menu ( pizza char(10) PRIMARY KEY, price number (7, 2 ), co un tr y c ha r( 7) , base char(2) )
create unique index menu_i01 on menu(pizza)
Data independence Controlling data access Simplify access View updatability
ALTERING TABLES alter table menu modify country char(10) alter table items add cost number(7,2) DELETING TABLES drop table menu 139
SQL Views
What is a View?
140
Derived (logical) table Treated as a base table in SQL Updatability may be restricted for some views
Create a view containing all Italian pizzas. create view italian as select * from menu where country = 'ITALY' View created. select * from italian PIZZA MARGARITA NAPOLITANA S TA GI ON Y C AB AN OS SI S IC IL IA NO
PRICE 6.20 7.40 7 .8 0 7 .4 0 7 .4 0
COUNTRY ITALY ITALY I T AL Y I T AL Y I TA LY
BASE WF WF WM WF WM
141
Views - continued
142
Views - continued
Create a view giving the total amount of ingredients used in each pizza.
Create a view consisting of all vegetarian pizzas (i.e. pizzas without meat ingredients).
create view totals(pizza, amt) as select pizza, sum(amount) from recipe group by pizza
create view vegetarian as select * from menu m where not exists (select * from items i, recipe r where i.ingredient = r.ingredient and i.t e = 'MEAT' and r.pizza = m.pizza )
View created. select * from totals
View created.
PIZZA AMT AMERIC ANO 205 CABANOSSI 160 GARLIC 35 HAM 55 HAWAIIAN 205 MARGARITA 225 MEXICANO 245 MUSHROOM 105 NAPOLITANA 185 S EA FO OD 2 05 SICILIANO 240 S PE CI AL 3 85 STAGIONY 210 VEGETARIAN 255
select * from vegetarian PIZZA M AR GA RI TA NAPOLIT ANA SI CI LI AN O MEXICANO VEGETARIAN MUSHROOM SEAFOOD GARLIC
143
PRICE COUNTRY 6 .2 0 IT AL Y 7.40 ITALY 7 .4 0 I TAL Y 7.40 MEXICO 7.40 7.30 9.20 3.50
BASE WF WF WM WF WM WM WF WM
144
Lec1.24
View Updatability
Authorization
A read-only views can be only queried Updatable view can be updated using INSERT, UPDATE and DELETE statements and requires 1-1 correspondence between rows in the view and base table rows View includes primary keys of base tables View includes all NOT NULL columns of base tables
grant on to
< user list>
revoke on from
< rivile e list> < ex pr es si on >
privilege: RETRIEVE[(column list)] INSERT UPDATE[(column list)] DELETE ALL
145
146
147
148
Authorization Examples grant on to
ALL menu managers
grant on
UPDATE(amount) recipe where pizza in (‘SPECIAL’, ‘MEXICANO’,‘GARLIC’) Chris, Anna
to
Lecture 6 Entity Relationship Modeling I
Outline
Database Development ER modeling basic concepts
Relationships
Generalisation hierarchies Alternative notations
150
Lec1.25
Database Development Data requirements
Cross Checking Requirements
Conceptual Data Modeling
System Requirements
ERD
Data Requirements
Logical Database Design
Development
Tables
ERDs, Table Design, ...
Distributed Database Design
Internal Schema
Process Models, Interaction Models, Prototypes
Operational Applications
151
152
Objectives of Data modeling
Visio Professional Oracle Designer 10g ER Assistant
Development
Operational System
E-R Modeling Tools
Cross Checking
Operational Database
Distribution Schema Physical Database Design
Application Requirements
Drawing too Diagram checking
To facilitate database design To facilitate communication between the database designers and the end users o ocumen e overa a a mo e o e system
153
154
Essential Properties of Data Models
Completeness
Recording the same data more than once can lead to
How accurately does the model reflect and enforce the business rules?
Stability and Flexibility
An E-R model is a logical representation of data of interest to an organization Components of E-R models Entities (entity types) Relationships (relationship types) Attributes An E-R model is normally expressed as a E-R diagram
Enforcement of business rules
Does the model support all the essential data objects?
Non-redundancy
E-R Model Components
How well will the model cope with possible changes to the business requirements? 155
156
Lec1.26
Entities
Entity instances
An entity (entity occurrence) corresponds to a real world object An entity type is a collection of entities of the same type that share same properties. Entity type is something about which organizations wish to store data Each entity type is identified by a name
Entity type
Two entity instances
Entity set: Employee Attributes: Employee No, Name, address, City 101 Michelle 100 pacific Ave San Francisco
201 Richard 100 City road Sydney
157
158
Relationships
Relationship Examples
A relationship (relationship occurrence) is an association of entity instances. A relationship type is a collection of relationshi s of the same t e Each relationship type is identified by a name
Relationship types: Student takes subject Staff member teaches subject
Relationship instances: Peter takes 31487 Helen teaches 31487
159
160
Relationship Participation
Relationship Cardinality
Given a relationship between entities A and B: The participation is optional if an occurrence of entity A does not require a correspon ing occurrence o entity B The participation is mandatory if occurrence of entity A has at least one corresponding occurrence of entity B
Person
ID Name
use
Workstation
serialNo
161
Person
ID Name
use
Person
ID Name
use
Person
ID Name
use
Workstation
Workstation
Workstation
serialNo
serialNo
serialNo
162
Lec1.27
Attributes and Identifiers
Basic Symbols
An Attribute is a property of an entity or a relationship type that describes some aspects of the entity or the relationship attributes that uniquely identifies individual occurrences of an entity (relationship) type
163
Cardinalities Course
164
Cardinality Notation Single line: one cardinality Inside symbol: minimum cardinality
Offering
Course1
Offering1
Course2
Offering2
Course
Offering3
CourseNo CrsDesc CrsUnits
Course3
Crow's foot: many cardinality
Offering
Has
Offering4 Outside symbol: maximum cardinality
OfferNo OffLocation OffTime
Circle: zero cardinality
165
Summary of Cardinalities
Classification of Cardinalities
Classification
Minimum cardinality based
Mandatory
Mandatory Optional
Maximum cardinality based
166
Functional 1-M M-N 1-1 167
Cardinality Restrictions
Minimum cardinality ≥ 1
Optional
Minimum cardinality = 0
Functional or singlevalued
Maximum cardinality = 1
1-M
Maximum cardinality = 1 in one direction and maximum cardinality > 1 in the other direction.
Business rule representation Diagram rules Guidelines for analysing business information needs Transformations for generating alternative designs Finalising an ERD
Ensure that ERD notation is correctly used Similar to syntax rules for a computer language omp e eness
All entities must have identifiers All relationships must have cardinality and participation specified
Supported by the ER Assistant
193
194
Naming Consistency Rules
Relationship Names
Entity Name Rule: entity type names must be unique Attribute Name Rule: attribute names must e un que w t n eac ent ty type an relationship
Participating entities provide a context for relationship names Must provide unique names for multiple re at ons ps etween t e same ent ty types
195
196
Identification Dependency Rules
Some Business Data Modeling Issues
Weak entity rule: weak entities have at least one identifying relationship
Poorly defined
Identifying relationship rule: at least one participating entity type must be weak for eac en y ng re a ons p
Identification dependency cardinality rule: each dependent entity occurrence must have one and only one parent entity occurrence associated with it 197
Conflicting statements Wide scope Missing details Many stakeholders Requirements in many formats
Add structure
Eliminate irrelevant details Add missing details Narrow the scope
198
Lec1.33
Entity Examples
Determine Entity Types and Attributes
For entity types, look for nouns that represent groups of people, places, things, and events For attributes, look for properties that provide details about the entity types If no additional properties are available treat as attribute not entity
Customer CustNo CustName CustAddr C us tT e
Meter MeterNo MtrAddr MtrSize
Bill BillNo BillDate BillStartDate BillEndDate BillDueDate
Relationship references involve associations amon nouns re resentin entit t es Sentences that involve an entity type having another entity type as a property Sentences that involve an entity type having a collection of another entity type
Use notation precisely Strive for design simplicity Consider alternative designs se agram ru es o ensure s ruc ura consistency and completeness Review design for common errors Justify important design decisions Practice 210
Note: AuthNo and ISBN references to Author and Book table, respectively. Both are NOT NULL.
Note: StdSSN and OfferNo references to Student and Offering table, respectively. Both are NOT NULL.
for
Book
ISBN Title
219
Super/Subset Transformations
Self-referencing relationships a) manager-subordinate
Faculty FacSSN FacName
Supervises
220
b) course prerequisites
Course
Employee EmpNo EmpName EmpHireDate
PrereqTo
CourseNo CrsDesc
a) Faculty(FacSSN,FacName,FacSupervisor*) FacSupervisor references to Faculty table, can be NULL b) Course(CourseNo,CrsDesc) PrereqTo(CourseNo, PrereqTo) 221
SalaryEmp
HourlyEmp
EmpSalary
EmpRate
Employee(EmpNo,EmpName,EmpHireDate,EmpSalary, EmpRate) Note: some attributes take NULL values
222
Lec1.37
Super/Subset Transformations …
Super/Subset Transformations …
Employee EmpNo EmpName EmpHireDate
Note: EmpNo references to Empoyee, cannot take NULL.
R7. Reading(ReadNo,ReadTime,ReadLevel, EmpNo,EmpName,MeterNo*,BillNo*) FDs: ReadNo → ReadTime,ReadLevel, EmpNo,EmpName,MeterNo*,BillNo EmpNo → EmpName R1-R6 are in BCNF as there are no partial and transitive dependencies (all determinants in their FDs are keys). R7 is in 2NF as there is a transitive dependency
Employee(EmpNo, EmpName) with EmpNo → EmpName Reading(ReadNo,ReadTime,ReadLevel, EmpNo*, MeterNo*,BillNo*) EmpNo references Employee MeterNo references Meter BillNo references Bill with ReadNo → ReadTime,ReadLevel, EmpNo, MeterNo,BillNo
Customer Name: Address:
233
Terry
100 Riverton Road, Bently, WA
Item Number
Descripti on
Quantity
Price ($)
Type
Total ($)
EC765432
ABC
1000
55
I-3
55,000
DR567890
BCD
2050
60.2
J-6
123,410
Sum
178,410
234
Lec1.39
Example
ERD – transformation (1)
Clarify business rules each order is prepared for one customer only each item is only appears once in one order each customer has a unique cust. number each item number is unique total value of order (Sum) can be calculated
235
236
ERD – transformation (2)
Lecture 10 Normalization Part I
Customer(CustNo, CustName, Address) Order (OrderNo, Date, CustNo*)
1. Top-Down using ERD: From requirements, create an E-R diagram, then transform it into a set of normalized tables
Data modification anomalies Functional dependencies (FDs) Findin candidate ke s for a relation Partial and transitive functional dependencies
Requirements
Transform
ERD
Check for BCNF
2. Bottom-Up normalization: From requirements form relations and normalize to BCNF Requirements
239
1NF
2NF
3NF
BCNF
240
Lec1.40
Avoiding Redundancies
Redundancy - Derived Facts
Redundancy creates inconsistencies – data modification anomalies Avoid duplication of attributes vo er ve ac s .e. a r u es a can e calculated from other attributes) To avoid redundancy, store one fact in one place
calculated totals ordno o1 o1 o1
partno p1 p2 p3
price 120 650 150
qty 4 1 3
total 480 650 450
Note: Total can be derived from price and qty.
241
242
Redundancy - Duplication
Data Modification Anomalies
ordno o1 o1 o2 o2 o3
partno p1 p2 p1 p3 p1
description keyboard screen keyboard disk drive keyboard
mu p e s orage of facts
description is only dependent on partno
Change multiple rows to change one fact. When description of the part p1 is changed, it must be changed in all 3 places Insertion anomalies: When another order wit p1 is inserte , an i erent description is entered, data will be inconsistent Deletion anomalies: If the last order with p1 is deleted, the description of p1 is lost
243
Redundant Design
244
Tables without Redundancy
StdSSN
StdMajor
OfferNo
OffTerm
Grade CourseNo
CrsDesc
S1
IS
O1
Spring
3.5
C1
DB
S1
IS
O2
Spring
3.3
C2
VB
C2
VB
St dSSN
O ff er No
G ra de
CourseNo
CrsDesc
S1 S1 S2 S2
O1 O2 O3 O2
3.5 3.3 3.1 3.4
C1 C2 C3
DB VB OO
S td SS N
S td M a jo r
S1 S2
IS FIN
. S2
FIN
O2
Spring
3.4
Problems: •Student (S1) enrolls into a new offering: •Update C2 description •S2 withdraws from offering O3 Causes: StdMajor is only dependent on StdSSN, OffTerm is only dependent on OfferNo and CrsDesc is only dependent on CourseNo. PK is 245 {StdSSN,OfferNo)
O ff e rN o
O ff Te r m
C ou r se N o
O1 O2 O3
Spring Spring Autumn
C1 C2 C3
246
Lec1.41
Functional Dependencies (FDs)
Another example (2) SNo 5432 1234 5432 2377 5432
SubjNo 31434 31434 31545 31435 31435
Mark 65 75 66 80 70
TeleNo 123456 234567 123456 456789 123456
Address Broadway Station Rd Broadway K eymark Broadway
Problems: Student (Sno5432) changes address: Student (Sno5432) takes another subject 32443 & gets 70: Student (Sno1234) withdraws from 31434:
247
Definition for functional dependency X Y : Let X and Y be arbitrary subset of attributes of a relation R . Then a functional dependency X → Y holds (i.e., X functionally determines Y or Y is functionally dependent on X ) iff (if and only if) for eac va ue o X t ere is on y one istinct va ue o Y . e.g., relation R(StdSSN,StdClass,OfferNo,OffYear,Grade,CourseNo,CrsD esc) : StdSSN →StdClass is a M:1 relationship from StdSSN(X) to StdClass(Y) within R , whenever two tuples of R agree on their X -value, they also agree on their Y -value 248
FD Definition
Functional Dependencies Determinant
X → Y X (functionally) determines Y X: left-hand-side (LHS) or determinant For each X value, there is at most one Y value
person-id
name
person-id
address
person-id project-no
hours-worked
Person-id determines Name, and is functionally dependent on Person-id 249
Can disprove FD by looking at data, i.e., two rows that have the same X value but a different Y value, e.g., OfferNo is not functionally dependent on OffYear
•
Cannot prove existence of a FD based using sample data as data is limited, violation of a FD may not appear in the subset of values e.g., StdSSN → OffYear
F1 (reflexivity): if B ⊆ A (B is a subset of A), then A → B. (SNo, Sname) → Sname
F2 au mentation : if A → B then AC → BC. If SNo → Sname, then (SNo,CNo) → (Sname,CNo)
F3 (transitivity): if A → B and B → C, then A → C. if SNo → TaxNo, TaxNo → AccNo, then SNo → AccNo
F4 (self-determination): A → A
253
254
Armstrong’s Inference Rules
Example Suppose we are given R with attributes {A, B, C, D, E, F}, and the FDs {A BC, B E, CD EF}, show AD F holds in R
F5 (decomposition): if A → BC, then A → B and A → C. if SNo → Sname, Address, then SNo → Sname, and SNo → Address F6 (union): if A → B and A → C, then A → BC. o name, o ress then SNo → (Sname, Address)
1. A 2.
1, 5
3. AD
F7 (composition): if A → B and C → D, then AC → BD. If SNo → Sname, AccNo → TaxNo then (SNo, AccNo) → (Sname,TaxNo)
BC (given) CD (2, F2)
4. CD
EF (given)
5. AD
EF (3 and 4, F3)
6. AD
F (5, F5)
255
256
Closure of a set of Attributes
Closure Calculation Example
Definition: Given a set F of FDs and a set X of attributes, the set of all attributes that are functionally dependent on X is called the closure of X , and is denoted as X + .
Suppose we are given a relation R with attributes {A, B, C, D, E, F}, and the FDs {A BC, B E, CD EF}, compute {A, D}+ Procedure: +=
/* n a za on
(2) X+ = ABCD /* first iteration
X → Y is derivable from F by using Armstrong’s axioms iff Y ⊆ X+.
(3) X+ = ABCDEF /* second iteration i.e., {A, D}+ = {A, B, C, D}+ = {A, B, C, D, E, F}+ = {A, B, C, D, E, F} 257
A superkey for a relation R is a set of attributes of R that includes at least one candidate key of R as a subset K is a superkey of a relation R iff K → A o s or every attri ute A o R K is a superkey of a relation R iff K+ consists of all attributes of R A candidate key is an irreducible superkey
Candidate Keys Suppose we are given a relation R with attributes {A, B, C, D, E, F}, and FDs {A
BC, B
E, CD
EF},
find all candidate keys for R. Note: In order to find candidate keys, finding superkeys first (1) Start with ABCD ABCD is a superkey as {A, B, C, D}+ = {A, B, C, D, E, F}=R
Key attribute: An attribute A in relation R is called a key attribute if A is a member of any candidate key of R
(2) Check the subsets of ABCD are superkeys ABC? No as {A, B,C}+ = {B, C, E}
BCD? No as {B, C, D}+ = {B, C, D, E, F} CDA? Yes as {A, C, D}+ = {A, B, C, D, E, F}; DAB? Yes as {D,A,B}+ = {A,B,C,D,E,F}
Non-key attribute: If A is not a member of any candidate key, then A is a non-key attribute
(3) Check the subsets of CDA and DAB are superkeys CD? No as CD+ = {CDEF}; DA? Yes; AC? No , AB? No, BD? No (4) AD is the only candidate key (A? No, D? No) 263
264
Lec1.44
Partial FDs
Example …
Only need to consider FDs between non-key attributes and key attributes
Given Student (SNo, Sname, Address, TeleNo) with FDs as SNo → Sname, Address, TeleNo
Definition of partial FDs: A non-key attribute is functionally dependent on a part of a candidate key, not a whole key.
A non-key attribute is dependent on another non-key attribute(s).
80 88 80
e.g. vehicle(regno, owner, address) regno → owner
owner → address
s1 changes name s1 takes a new subject and input a different name, like the last row in the table s1 takes no subject, his name is lost
regno
owner
address
ABC123 XYZ888 VBR746
Joe Joe Adam
123 Main St 123 Main St 34 Harris St
267
268
Non-key determinant FDs
Problems with Transitive FDs
Description of non-key determinant FDs:
regno
owner
ABC123 XYZ888 VBR746
Joe Joe Adam
address
The determinant is not a candidate key.
123 Main St 123 Main St 34 Harris St
e.g. participation(proj-id, person-id,
mgr-id, time-spent)
FDs: pro mgrmgr-id → proj-id proj-id, person-id → time-spent
Changing Joe’s address When buying a new car, Joe’s address is duplicated When Adam car is sold, his address is lost
Candidate keys: {proj-id, person-id}, {mgr-id, person-id} → mgr-id and mgr-id → proj-id are the nonkey determinant FDs because their determinants proj-id and mgr-id are not candidates on their own FDs proj-id
• 1NF avoids nesting or repeating groups in tables • To convert a unnnormalized table into 1NF, remove repeating groups of attributes
StdSSN StdClass
OfferNo
OffYear Grade CourseNo CrsDesc
S1
JUN
O1
2003
3.5
C1
DB
S1
JUN
O2
2003
3.3
C2
VB
S2
JUN
O3
2003
3.1
C3
OO
S2
JUN
O2
2003
3.4
C2
VB
The candidate key is (StdSSN, OfferNo)
283
284
Convert a relation/table into 2NF
2NF
• Ensure that the relation is in 1NF; • Remove partial dependencies 1. Break out each partial FD to become new table 2. Each partial FD has a determinant, this e erm nan w ecome e n a new relation 3. Determinant become FKs in the original relations Caution: the above decomposition should be lossless and all FDs must be preserved.
Relation is in 2NF if it is 1NF, and each non-key attribute is irreducibly dependent (i.e. fully functionally dependent) on relation key(s) A relation is in 2NF if every non-key attribute is dependent on , . 2NF avoids partial functional dependences Note: Violations can only occur in relations that have composite keys
285
286
2NF Example
Example of conversion to 2NF Given relation StudentScore (sno, subjno, sname, score) with FDs: sno
sname; (sno,subjno)
score
• Determination of candidate key(s) (sno,subjno) is a candidate key, sno and subjno are key attributes and score is a non-key attribute • sno → sname ---- art a
v o ates
bigUniv(StdSSN, StdCity, StdClass, OfferNo, OffTerm, OffYear, CourseNo, CrsDesc, EnrGrade) is not in 2NF because StdSSN → StdCity, StdClass OfferNo → OffTerm, OffYear, CourseNo, CrsDesc are partial FDs. Splitting the table UnivTable1 (StdSSN, StdCity, StdClass) UnivTable2 (OfferNo, OffTerm, OffYear, CourseNo, CrsDesc) UnivTable0 (StdSSN*, OfferNo*, EnrGrade)
ru e
• The highest normal form for StudentScore is 1NF
• Covert StudentScore into 2NF relations
Remove the partial FDs, we have
Student(sno, sname) with sno → sname StdScore(sno*, subjno, score) with sno,subjno → score • Check if the decomposition is lossless and dependency-preserving 287
288
Lec1.48
Convert a relation/table into 3NF
3NF
• A relation/table is in 2NF; • Remove transitive dependencies
Relation must be in 2NF, and there must be no interdependencies between non-key attributes.
1. decompose each transitive FD to form a new relation . , determinant will become the PK in the new table
3NF avoids transitive dependencies
3. Caution: ensure that the above decomposition
is lossless and dependency-preserving.
289
290
Example of Conversion to 3NF Given a relation vehicle(regno, owner, address) with regno → owner; owner → address, • Candidate key: regno • Regno is the key-attribute and the others are non-key attributes • ve c e oes no ave any par a , s n • owner → address is a transitive FD, it is not in 3NF • The highest normal form for vehicle is 2NF • Convert vehicle into 3NF by removing the TFD customer(owner, address) with owner → address vehicle(regno, owner*) with regno → owner • Check if the decomposition is lossless and dependency-preserving.
More examples of conversion into 3NF(1) Given a relation POrder(OrderNo, OrderDate,SuppName, SuppAdd, ItemNo, ItemDesc, ItemQty) with FDs:
• Determine the candidate key(s): Starting superkey: {OrderNo, ItemNo, SuppName} {OrderNo, ItemNo} is a candidate key • Determine the highest normal form POrder has partial FDs, so its highest normal form is 1NF 292
291
More examples of conversion into 3NF(2)
3NF Example
1NF-2NF: PO1(OrderNo, OrderDate,SuppName,SuppAdd) with OrderNo → OrderDate, SuppName SuppName
PO2(ItemNo, ItemDesc) with ItemNo → ItemDesc PO3(OrderNo*, ItemNo*, ItemQty) with OrderNo, ItemNo → ItemQty
2NF-3NF:
UnivTable2 ( OfferNo, OffTerm, OffYear, CourseNo, CrsDesc) with OfferNo → OffTerm, OffYear, CourseNo, CrsDesc is not in 3NF because CourseNo → CrsDesc is a artial FD. Splitting the table
PO11(SuppName,SuppAdd) with SuppName → SuppAdd PO12(OrderNo, OrderDate,SuppName*) with
BCNF is violated in situations where there is partial dependency between relation keys
Every determinant must be a candidate key. Note: • s a rev se an s ronger an e or g na 3NF definition • if a relation contains only one candidate key, 3NF and BCNF are equivalent
Note:
This is only possible if there are multiple composite candidate keys. Relations with multiple composite candidate keys are not common. 295
296
BCNF violation example 1
BCNF violation example 2
Given a relation Participation(proj-id, person-id, mgr-id, time-spent) with FDs: proj-id → mgr-id mgr-id → proj-id proj-id, person-id → time-spent
StdSSN
OfferNo
Email
EnrGrade
UnivTable4(StdSSN,OfferNo, Email, EnrGrade) , er o → nr ra e OfferNo,Email → EnrGrade StdSSN → Email, Email → StdSSN
Candidate keys: {proj-id, person-id}, {mgr-id, person-id} Participation has no partial nor transitive FDs, so it is in 3NF. However, it has FDs between proj-id and mgr-id which violates the BCNF rule. It is not in BCNF.
Candidate keys: {StdSSN, OfferNo}, {OfferNo, Email} UnivTable4 has no partial nor transitive FDs, so it is in 3NF, but StdSSN → Email, Email → StdSSN violates BCNF rule, it is not in BCNF 297
298
Convert a relation into BCNF
Example 1 of Conversion to BCNF •
• Ensure 3NF; • Ensure that all determinants are keys
•
. along with the attributes that it determines
•
2. The determinants become the PKs
• 299
Participation(proj-id, person-id, mgr-id, time-spent) proj-id → mgr-id mgr-id → proj-id proj-id, person-id → time-spent Candidate keys: - , - , - , Remove the non-key determinant FDs Project(proj-id, mgr-id) with proj-id → mgr-id and mgr-id → proj-id Participation(proj-id*, person-id, time-spent) with proj-id, person-id → time-spent Check if the decomposition is lossless and dependency-preserving. 300
Given a relation R(A, B, C, D, E, F) is in 3NF. One more FD is D → B, C Convert R into relations that are in BCNF. Remove the non-key determinant FD: R1(D,B,C) with D → B, C R2(A,D*,E,F)
Relations Customer and Order are in BCNF because their FDs don’t include partial FDs neither transitive FDs and all determinants of their FDs are keys.
Relations Items and OrderItem are in BCNF, but OrderCust is in 2NF as it has a transitive FD. 311
Transaction concepts Concurrency control Recovery management Transaction design issues
316
Transactions
Business transactions support operation of organizations Collection of database operations processed
Avoids loss of data integrity
Airline Transaction Example BEGIN TRANSACTION Display greeting Get reservation preferences from user SELECT departure and return flight records If reservation is acce table then UPDATE seats remaining of departure flight record UPDATE seats remaining of return flight record INSERT reservation record Print ticket if requested End If On Error: ROLLBACK COMMIT;
Interference among multiple concurrent users Failure recovery
317
318
Lec1.53
Transaction Processing Services
Transaction Properties
Atomic (atomicity): all or nothing Consistent (consistency): database must be consistent before and after a transaction so a e so a on : par a resu s o operations within a transaction are not revealed to other users Durable (durability): database changes are permanent after the transaction completes
Concurrency control -- To guarantee isolation and consistency in multi-user environments Recovery management -- To guarantee fault
319
Lost Update Problem
Concurrency Control
320
Transaction A Time Transaction B
Objective
Read SR (10)
Avoid loss of consistency Maximise throughput: number of transactions rocessed er unit of time Avoid hot-spots
If SR > 0 then SR = SR -1
T1 T2 T3 T4
Write SR (9)
T5 T6
Read SR (10)
If SR > 0 then SR = SR -1 Write SR (9)
321
322
Inconsistent Retrieval Problems
Uncommitted Dependency Problem
Transaction A Read SR (10) SR = SR - 1 Write SR (9) Rollback
Time Transaction B T1 T2 T3 T4 Read SR (9) T5
323
Interference causes inconsistency among multiple retrievals of a subset of data Incorrect summar Phantom read Non repeatable read
324
Lec1.54
Incorrect Summary Problem Transaction A Read SR 1 (10) SR 1 = SR 1 - 1 Write SR 1 (9)
Time T1 T2 T3 T4 5
Read SR 2 (5) SR 2 = SR 2 - 1 Write SR 2 (4)
T6 T7 T8 T9 T 10
Locking Fundamentals
Transaction B
Read SR 1 (9 ) um = um 1 Read SR 2 (5 ) Sum = Sum + SR 2
Fundamental tool for concurrency control Obtain lock before accessing an item Wait if a conflicting lock is held
S are oc : con icts wit exc usive oc s Exclusive lock: conflicts with all other kinds of locks
Concurrency control manager maintains the lock table
325
326
Locking Granularity
Deadlock (Mutual Waiting) Transaction A
Database
Table
Time Transaction B
XLock SR1
T1
XLock SR2 (wait)
T3
Index
Page Record
T4
XLock SR1 (wait)
Field 327
328
Deadlock Resolution
Detection
Two Phase Locking (2PL)
Identify deadlocks
Resolution
Protocol to prevent lost update problems All transactions must follow Conditions
Widely used although timeout interval may be difficult to determine
329
O tain oc e ore accessing item Cannot obtain new locks after releasing locks
330
Lec1.55
2PL Implementation
Optimistic Approaches
ld e h s k c
L
G
ph ing w ro
se
Assumes conflicts are rare No locks Check for conflicts
r n ng phase
Evaluation
BOT
Time
EOT
After each read and write At end of transaction Less overhead More variability
331
332
Recovery Management
Storage Device Basics
Device characteristics and failure types Recovery tools Recovery processes
Volatile: loses state after a shutdown Nonvolatile: retains state after a shutdown Nonvolatile is more reliable than volatile but a ures can cause oss o a a Use multiple levels and redundant levels of nonvolatile storage for valuable data
333
334
Failure Types
Local
Detected and abnormal termination Limited to a single transaction
Operating System
Transaction Log
Affects all active transactions Less common than local failures
Device
History of database changes Large storage overhead Operations
Un o: revert to previous state Redo: reestablish a new state
Fundamental tool of recovery management
Affects all active and past transactions Least common 335
336
Lec1.56
Checkpoints
Transaction Log Example TransNo Action Time Table Row Column Old
New
T101
BEGIN 10:29
T101
Update 10:30 Acct
A10
Bal
100
200
T101
Update 10:30 Acct
A25
Bal
500
400
T101
Insert
H100 CustNo
Reduces restart work but adds overhead
10:32 Hist
C100
T101
END
10:33
Checkpoint log record Write log buffers and database buffers
Checkpoint interval: time between checkpoints Types of checkpoints
Cache consistent Fuzzy
337
338
Other Recovery Tools
Recovery from a Media Failure
Force writing
Restore database from the most recent backup Redo all committed transactions since the
Restart active transactions
Checkpoint time End of transaction
Complete Incremental
339
340
Recovery Timeline Checkpoint
Recovery Processes Failure
Time
T1
Depend on timing of database writes Immediate update approach:
Before commit -
protocol)
T3
Deferred update approach
T4
T5
341
After commit Undo operations not needed 342
Lec1.57
Immediate Update Recovery
Deferred Update Recovery
C lass T1
D escription Finished before C P
R estart W ork N one
Class Description
Restart Work
T2
Started before CP ; f i n i s h e d b e f o r e f a i lu r e
R edo forw ard from checkpoint
T1
Finished before CP
None
T2
T3
Started after C P; f i n is h e d b e f o r e f a i lu r e
Division of work into transactions Objective: minimize transaction duration integrity constraints Transaction boundary decision can affect hot spots
345
346
Registration Form Example
Transaction Boundary Choices
One transaction for the entire form One transaction for the main form and one transaction for all subform records
separate transactions for each subform record
347
348
Lec1.58
Common Isolation Levels
Isolation Levels
Degree to which a transaction is separated from the actions of other transactions Balance concurrency control overhead with interference problems ome ransac ons can o era e uncommitted dependency and inconsistent retrieval problems Specify using the SET TRANSACTION statement
Serialisable: no interference problems Read stability: prevents lost update, uncommitted dependency, and some Uncommitted read: prevents only lost update problems
349
350
SQL Isolation Levels
Integrity Constraint Timing
L ev el
X Lo ck s SL oc ks
P L oc ks
I nt er fer en ce
None
Uncommitted dependency
Read None uncommitted
None
Read committed
Long
Short
None
Repeatable read
Long
Long
Short (S), Phantom reads Long (X)
Serializable
Long
Long
Long
Most constraints checked immediately Can defer constraint checking to EOT SQL SET CONSTRAINTS statement
All except uncommitted dependency
None 351
352
Save Points
Summary
Some transactions have tentative actions SAVEPOINT statement determines n erme a e po n s ROLLBACK to specified save points
Transaction: user-defined collection of work DBMS support ACID properties Knowledge of concurrency control and