Contents

Articles
• Database
• Entity–relationship model
• Database design
• Relational database
• Relational model
• Binary relation
• Database normalization
• First normal form
• Second normal form
• Third normal form
• Boyce–Codd normal form
• Fourth normal form
• Fifth normal form
• Domain/key normal form
• Sixth normal form
• Relation (database)
• Functional dependency
• Multivalued dependency
• Join dependency
• Concurrency control

References
• Article Sources and Contributors
• Image Sources, Licenses and Contributors
• Article Licenses
Database
A database is an organized collection of data. The data are typically organized to model relevant aspects of reality (for example, the availability of rooms in hotels) in a way that supports processes requiring this information (for example, finding a hotel with vacancies). The term database is correctly applied to the data and their supporting data structures, and not to the database management system (DBMS); the data collection together with its DBMS is called a database system. The term database system implies that the data are managed to some level of quality (measured in terms of accuracy, availability, usability, and resilience), which in turn often implies the use of a general-purpose database management system (DBMS).[1] A general-purpose DBMS is typically a complex software system that meets many usage requirements to properly maintain its databases, which are often large and complex. The use of databases is now so widespread that virtually every technology and product relies on databases and DBMSs for its development and commercialization, or may even have DBMS software embedded in it. Organizations and companies, from small to large, also depend heavily on databases for their operations. Well-known DBMSs include FoxPro, IBM DB2, Linter, Microsoft Access, Microsoft SQL Server, MySQL, Oracle, PostgreSQL, and SQLite. A database is not generally portable across different DBMSs, but different DBMSs can interoperate to some degree by using standards such as SQL and ODBC together to support a single application built over more than one database. A DBMS also needs to provide effective run-time execution to properly support (e.g., in terms of performance, availability, and security) as many database end-users as needed. One way to classify databases is by the type of their contents, for example: bibliographic, document-text, statistical, or multimedia objects.
Another way is by their application area, for example: accounting, music compositions, movies, banking, manufacturing, or insurance. The term database may be narrowed to specify particular aspects of an organized collection of data, and may refer to the logical database, to the physical database as data content in computer data storage, or to many other database sub-definitions.
History

Database concept

The database concept has evolved since the 1960s to ease the increasing difficulties in designing, building, and maintaining complex information systems (typically with many concurrent end-users and a large amount of diverse data). It has evolved together with database management systems, which enable the effective handling of databases. Though the terms database and DBMS define different entities, they are inseparable: a database's properties are determined by its supporting DBMS. The Oxford English Dictionary cites a 1962 technical report as the first to use the term "data-base." With progress in technology in the areas of processors, computer memory, computer storage, and computer networks, the sizes, capabilities, and performance of databases and their respective DBMSs have grown by orders of magnitude. For decades it has been unlikely that a complex information system could be built effectively without a proper database supported by a DBMS. The use of databases is now so widespread that virtually every technology and product relies on databases and DBMSs for its development and commercialization, or may even have such software embedded in it. Organizations and companies, from small to large, also depend heavily on databases for their operations. No widely accepted exact definition exists for DBMS. However, a system needs to provide considerable functionality to qualify as a DBMS. Accordingly, its supported data collection needs to meet respective usability requirements (broadly defined by the requirements below) to qualify as a database. Thus, a database and its supporting DBMS are defined here by a set of general requirements listed below. Virtually all existing mature
DBMS products meet these requirements to a great extent, while less mature ones either meet them or are converging to meet them.
Evolution of database and DBMS technology

See also: Database management system § History

The introduction of the term database coincided with the availability of direct-access storage (disks and drums) from the mid-1960s onwards. The term represented a contrast with the tape-based systems of the past, allowing shared interactive use rather than daily batch processing. In the earliest database systems, efficiency was perhaps the primary concern, but it was already recognized that there were other important objectives. One of the key aims was to make the data independent of the logic of application programs, so that the same data could be made available to different applications. The first generation of database systems were navigational:[2] applications typically accessed data by following pointers from one record to another. The two main data models at this time were the hierarchical model, epitomized by IBM's IMS system, and the Codasyl model (Network model), implemented in a number of products such as IDMS. The relational model, first proposed in 1970 by Edgar F. Codd, departed from this tradition by insisting that applications should search for data by content, rather than by following links. This was considered necessary to allow the content of the database to evolve without constant rewriting of applications. Relational systems placed heavy demands on processing resources, and it was not until the mid-1980s that computing hardware became powerful enough to allow them to be widely deployed. By the early 1990s, however, relational systems were dominant for all large-scale data processing applications, and they remain dominant today (2012) except in niche areas. The dominant database language is the standard SQL for the Relational model, which has also influenced database languages for other data models.
Because the relational model emphasizes search rather than navigation, it does not make relationships between different entities explicit in the form of pointers, but represents them instead using primary keys and foreign keys. While this is a good basis for a query language, it is less well suited as a modeling language. For this reason a different model, the entity-relationship model, which emerged shortly afterwards (1976), gained popularity for database design. In the period since the 1970s, database technology has kept pace with the increasing resources becoming available from the computing platform: notably the rapid increase in the capacity and speed (and reduction in price) of disk storage, and the increasing capacity of main memory. This has enabled ever larger databases and higher throughputs to be achieved. The rigidity of the relational model, in which all data are held in tables with a fixed structure of rows and columns, has increasingly been seen as a limitation when handling information that is richer or more varied in structure than the traditional 'ledger-book' data of corporate information systems: for example, document databases, engineering databases, multimedia databases, or databases used in the molecular sciences. Various attempts have been made to address this problem, many of them gathering under banners such as post-relational or NoSQL. Two developments of note are the object database and the XML database. The vendors of relational databases have fought off competition from these newer models by extending the capabilities of their own products to support a wider variety of data types.
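As a small illustration of searching by content rather than navigating pointers, the sketch below uses Python's built-in sqlite3 module; the hotel/room tables and their values are invented for the example. The relationship between the two tables exists only through matching key values.

```python
import sqlite3

# Illustrative in-memory database; table and column names are invented.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE hotel (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE room (
    id INTEGER PRIMARY KEY,
    hotel_id INTEGER REFERENCES hotel(id),   -- foreign key, not a pointer
    vacant INTEGER)""")
conn.execute("INSERT INTO hotel VALUES (1, 'Seaview')")
conn.execute("INSERT INTO room VALUES (10, 1, 1)")

# The application searches by content: a join matches key values.
row = conn.execute("""SELECT hotel.name FROM room
    JOIN hotel ON room.hotel_id = hotel.id
    WHERE room.vacant = 1""").fetchone()
print(row[0])  # Seaview
```

The join can be evaluated however the DBMS sees fit, which is exactly what lets the stored structure evolve without rewriting applications.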
General-purpose DBMS
A DBMS has evolved into a complex software system, and its development typically requires thousands of person-years of effort. Some general-purpose DBMSs, like Oracle, Microsoft SQL Server, FoxPro, and IBM DB2, have been undergoing upgrades for thirty years or more. General-purpose DBMSs aim to satisfy as many applications as possible, which typically makes them even more complex than special-purpose databases. However, the fact that they can be used "off the shelf", as well as their cost amortized over many applications and instances, makes them an attractive alternative to one-time development whenever they meet an application's requirements. Though attractive in many cases, a general-purpose DBMS is not always the optimal solution: when certain applications are pervasive, with many operating instances each serving many users, a general-purpose DBMS may introduce unnecessary overhead and too large a "footprint" (a large amount of unnecessary, unutilized software code). Such applications usually justify dedicated development. Typical examples are email systems: although they need certain DBMS properties, they are built in a way that optimizes the handling and management of email messages, and they do not need significant portions of general-purpose DBMS functionality.

Types of people involved
Three types of people are involved with a general-purpose DBMS:

1. DBMS developers - The people who design and build the DBMS product, and the only ones who touch its code. They are typically employees of a DBMS vendor (e.g., Oracle, IBM, Microsoft, Sybase) or, in the case of open source DBMSs (e.g., MySQL), volunteers or people supported by interested companies and organizations. They are typically skilled systems programmers. DBMS development is a complicated task, and some of the popular DBMSs have been under development and enhancement (also to follow progress in technology) for decades.

2. Application developers and database administrators - The people who design and build a database-based application that uses the DBMS. Database administrators design the needed database and maintain it; application developers write the application programs that the application comprises. Both are thoroughly familiar with the DBMS product and use its user interfaces (as well as, usually, other tools) for their work. Sometimes the application itself is packaged and sold as a separate product, which may include the DBMS inside (see embedded database; subject to proper DBMS licensing) or be sold separately as an add-on to the DBMS.

3. Application end-users (e.g., accountants, insurance people, medical doctors) - These people know the application and its end-user interfaces, but need not know or understand the underlying DBMS. Thus, though they are the intended and main beneficiaries of a DBMS, they are only indirectly involved with it.

Database machines and appliances
In the 1970s and 1980s, attempts were made to build database systems with integrated hardware and software. The underlying philosophy was that such integration would provide higher performance at lower cost. Examples were the IBM System/38, the early offering of Teradata, and the Britton Lee, Inc. database machine. Another approach to hardware support for database management was ICL's CAFS accelerator, a hardware disk controller with programmable search capabilities. In the long term, these efforts were generally unsuccessful because specialized database machines could not keep pace with the rapid development and progress of general-purpose computers. Thus most database systems nowadays are software systems running on general-purpose hardware, using general-purpose computer data storage. However, this idea is still pursued for certain applications by some companies, like Netezza and Oracle (Exadata).
Database research

Database research has been an active and diverse area, with many specializations, carried out since the early days of the database concept in the 1960s. It has strong ties with database technology and DBMS products. Database research has taken place at the research and development groups of companies (notably at IBM Research, which contributed technologies and ideas to virtually every DBMS existing today), at research institutes, and in academia. Research has been done both through theory and through prototypes. The interaction between research and database-related product development has been very productive for the database area, and many key concepts and technologies emerged from it. Notable are the Relational and the Entity-relationship models, the atomic transaction concept and related Concurrency control techniques, Query languages and Query optimization methods, RAID, and more. Research has provided deep insight into virtually all aspects of databases, though it has not always been pragmatic or effective (and cannot and should not always be: research is exploratory in nature and does not always lead to accepted or useful ideas). Ultimately, market forces and real needs determine which problem solutions and related technologies are selected, including among those proposed by research. However, occasionally the winner is not the best and most elegant solution (e.g., SQL). Throughout their history, DBMSs and their respective databases have, to a great extent, been the outcome of such research, while real product requirements and challenges have triggered database research directions and sub-areas. The database research area has several notable dedicated academic journals (e.g., ACM Transactions on Database Systems - TODS, Data and Knowledge Engineering - DKE, and more) and annual conferences (e.g., ACM SIGMOD, ACM PODS, VLDB, IEEE ICDE, and more), as well as an active and quite heterogeneous (subject-wise) research community all over the world.
Database type examples

The following are examples of various database types. Some of them are not mainstream types, but most of them have received special attention (e.g., in research) due to end-user requirements. Some exist as specialized DBMS products, and some have their functionality incorporated in existing general-purpose DBMSs. Though they may differ in nature and functionality, these various types typically have to comply with the usability requirements below to qualify as databases.

• Active database

An active database includes an event-driven architecture which can respond to conditions both inside and outside the database. Possible uses include security monitoring, alerting, statistics gathering, and authorization. Most modern relational databases include active database features in the form of database triggers.

• Cloud database

A cloud database relies on cloud technology. Both the database and most of its DBMS reside remotely, "in the cloud," while its applications are developed by programmers and later maintained and used by the application's end-users through a web browser and open APIs. More and more such database products are emerging, both from new vendors and from virtually all established database vendors.

• Data warehouse

Data warehouses archive data from operational databases and often from external sources such as market research firms. Often operational data undergo transformation on their way into the warehouse, getting summarized, anonymized, reclassified, etc. The warehouse becomes the central source of data for use by managers and other end-users who may not have access to operational data. For example, sales data might be aggregated to weekly totals and converted from internal product codes to UPCs so that they can be compared with ACNielsen data.
Some basic and essential components of data warehousing include retrieving, analyzing, and mining data, and transforming, loading, and managing data so
as to make them available for further use. Operations in a data warehouse are typically concerned with bulk data manipulation, and as such, it is unusual and inefficient to target individual rows for update, insert, or delete. Bulk native loaders for input data and bulk SQL passes for aggregation are the norm.
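A minimal sketch of such a bulk aggregation pass, using Python's built-in sqlite3 module; the sales table, dates, and product code are invented for the example. Daily rows are summarized into weekly totals in a single SQL statement, as a warehouse load step might do:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day TEXT, upc TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("2012-01-02", "012345678905", 10.0),   # same week as the next row
    ("2012-01-03", "012345678905", 15.0),
    ("2012-01-09", "012345678905", 20.0),   # the following week
])
# One bulk SQL pass: group daily sales into per-week totals.
rows = conn.execute("""
    SELECT strftime('%Y-%W', day) AS week, upc, SUM(amount)
    FROM sales GROUP BY week, upc ORDER BY week""").fetchall()
for week, upc, total in rows:
    print(week, upc, total)
```

Each output row is one (week, product) aggregate, replacing many detail rows.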
• Distributed database

The definition of a distributed database is broad and is used with different meanings. In general it refers to a modular DBMS architecture that allows distinct DBMS instances to cooperate as a single DBMS over processes, computers, and sites, while managing a single database that is itself distributed over multiple computers and different sites. Examples are databases of local work-groups and departments at regional offices, branch offices, manufacturing plants, and other work sites. These databases can include both segments shared by multiple sites and segments specific to one site and used only locally at that site.

• Document-oriented database

A document-oriented database is designed for storing, retrieving, and managing document-oriented, or semi-structured, information. Document-oriented databases are one of the main categories of so-called NoSQL databases, and the popularity of the term "document-oriented database" (or "document store") has grown with the use of the term NoSQL itself. They are used to conveniently store, manage, edit, and retrieve documents.

• Embedded database

An embedded database system is a DBMS which is tightly integrated with an application that requires access to stored data, in such a way that the DBMS is "hidden" from the application's end-users and requires little or no ongoing maintenance. It is actually a broad technology category that includes DBMSs with differing properties and target markets. The term "embedded database" can be confusing because only a small subset of embedded database products is used in real-time embedded systems such as telecommunications switches and consumer electronics devices.[3]

• End-user database

These databases consist of data developed by individual end-users. Examples are collections of documents, spreadsheets, presentations, multimedia, and other files. Several products exist to support such databases.
Some of them are much simpler than full-fledged DBMSs, with more elementary DBMS functionality (e.g., not supporting multiple concurrent end-users on the same database), with basic programming interfaces, and with a relatively small "footprint" (not much code to run, compared with "regular" general-purpose databases). However, general-purpose DBMSs can often also be used for this purpose, if they provide basic user interfaces for straightforward database applications (limited query and data display; no real programming needed), while still providing the database qualities and protections that these DBMSs can offer.

• Federated database and multi-database

A federated database is an integrated database that comprises several distinct databases, each with its own DBMS. It is handled as a single database by a federated database management system (FDBMS), which transparently integrates multiple autonomous DBMSs, possibly of different types (in which case it is also a heterogeneous database), and provides them with an integrated conceptual view. The constituent databases are interconnected via a computer network and may be geographically decentralized. Sometimes the term multi-database is used as a synonym for federated database, though it may refer to a less integrated group of databases (e.g., without an FDBMS and a managed integrated schema) that cooperate in a single application. In this case, middleware is typically used for distribution, which
typically includes an atomic commit protocol (ACP), e.g., the two-phase commit protocol, to allow distributed (global) transactions (vs. local transactions confined to a single DBMS) across the participating databases.
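The two-phase commit idea can be sketched as follows; the Participant class and its vote logic are hypothetical stand-ins for real DBMS instances, not an actual FDBMS implementation. The coordinator first collects votes (phase 1), then broadcasts a unanimous decision (phase 2):

```python
# Minimal sketch of the two-phase commit protocol (illustrative only).
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit   # whether this DB can commit its part
        self.state = "active"

    def prepare(self):                 # phase 1: vote yes/no
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):                  # phase 2: apply the global decision
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    # Phase 1: collect votes; any single "no" aborts the global transaction.
    if all(p.prepare() for p in participants):
        for p in participants:         # Phase 2: unanimous "yes" -> commit all
            p.commit()
        return "committed"
    for p in participants:             # otherwise abort everywhere
        p.abort()
    return "aborted"

print(two_phase_commit([Participant("db1"), Participant("db2")]))  # committed
```

A real protocol also logs each step durably so the decision survives crashes; that recovery machinery is omitted here.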
• Graph database

A graph database is a kind of NoSQL database that uses graph structures with nodes, edges, and properties to represent and store information. General graph databases that can store any graph are distinct from specialized graph databases such as triplestores and network databases.

• Hypermedia databases

The World Wide Web can be thought of as a database, albeit one spread across millions of independent computing systems. Web browsers "process" these data one page at a time, while web crawlers and other software provide the equivalent of database indexes to support search and other activities.

• Hypertext database

In a hypertext database, any word or piece of text representing an object, e.g., another piece of text, an article, a picture, or a film, can be linked to that object. Hypertext databases are particularly useful for organizing large amounts of disparate information. For example, they are useful for organizing online encyclopedias, where users can conveniently jump between texts, in a controlled way, by using hyperlinks.

• In-memory database

An in-memory database (IMDB; also main memory database or MMDB) is a database that primarily resides in main memory but is typically backed up by non-volatile computer data storage. Main memory databases are faster than disk databases. Accessing data in memory reduces the I/O reading activity when, for example, querying the data. In applications where response time is critical, such as telecommunications network equipment, main memory databases are often used.[4]

• Knowledge base

A knowledge base (abbreviated KB, kb or Δ[5][6]) is a special kind of database for knowledge management, providing the means for the computerized collection, organization, and retrieval of knowledge. It can also be a collection of data representing problems with their solutions and related experiences.

• Operational database

These databases store detailed data about the operations of an organization.
They are typically organized by subject matter and process relatively high volumes of updates using transactions. Essentially every major organization on earth uses such databases. Examples include customer databases that record contact, credit, and demographic information about a business's customers; personnel databases that hold information such as salary, benefits, and skills data about employees; enterprise resource planning databases that record details about product components and parts inventory; and financial databases that keep track of the organization's money, accounting, and financial dealings.

• Parallel database

A parallel database, run by a parallel DBMS, seeks to improve performance through parallelization of tasks such as loading data, building indexes, and evaluating queries. Parallel databases improve processing and input/output speeds by using multiple central processing units (CPUs) (including multi-core processors) and storage in parallel. In parallel processing, many operations are performed simultaneously, as opposed to serial, sequential processing, where operations are performed with no time overlap. The major parallel DBMS architectures (which are induced by the underlying hardware architecture) are:

• Shared memory architecture, where multiple processors share the main memory space, as well as other data storage.
• Shared disk architecture, where each processing unit (typically consisting of multiple processors) has its own main memory, but all units share the other storage.
• Shared nothing architecture, where each processing unit has its own main memory and other storage.
• Real-time database

A real-time database is one in which the DBMS responds to users' requests within guaranteed time bounds.

• Spatial database

A spatial database can store data with multidimensional (spatial) features. Queries on such data include location-based queries, such as "Where is the closest hotel in my area?".

• Temporal database

A temporal database is a database with built-in time aspects, for example a temporal data model and a temporal version of Structured Query Language (SQL). More specifically, the temporal aspects usually include valid time and transaction time.

• Unstructured-data database
An unstructured-data database is intended to store in a manageable and protected way diverse objects that do not fit naturally and conveniently in common databases. It may include email messages, documents, journals, multimedia objects etc. The name may be misleading since some objects can be highly structured. However, the entire possible object collection does not fit into a predefined structured framework. Most established DBMSs now support unstructured data in various ways, and new dedicated DBMSs are emerging.
Major database usage requirements

The major purpose of a database is to provide the information system (in its broadest sense) that uses it with the information the system needs according to its own requirements. A certain broad set of requirements refines this general goal. These database requirements translate into requirements for the respective DBMS, to allow a proper database for the given application to be built conveniently. If this goal is met by a DBMS, then the designers and builders of the specific database can concentrate on the application's aspects rather than on building and maintaining the underlying DBMS. Also, since a DBMS is complex and expensive to build and maintain, it is not economical to build a new DBMS for every application. Rather, it is desirable to provide a flexible tool that can handle databases for as many given applications as possible, i.e., a general-purpose DBMS.
Functional requirements

Certain general functional requirements need to be met in conjunction with a database. They describe what needs to be defined in a database for any specific application.

The database's structure must be defined. The database needs to be based on a data model that is sufficiently rich to describe all the needed aspects of the respective application. Data definition languages exist to describe databases within a data model; such languages are typically data-model specific.

A database's data model needs the support of a sufficiently rich data manipulation language to allow database manipulations and to allow information to be generated from the data. Such a language is typically data-model specific.

A database needs built-in security means to protect its content (and users) from the dangers of unauthorized users (either humans or programs). Protection is also provided against types of unintentional breach. Security types and levels should be defined by the database owners.

Manipulating database data often involves processes of several interdependent steps at different times (e.g., when different people's interactions are involved, as in generating an insurance policy). Data manipulation languages are typically intended to describe what is needed in a single such step. Dealing with multiple steps typically requires
writing quite complex programs. Most applications are programmed using common programming languages and software development tools. However, the area of process description has evolved in the frameworks of workflow and business processes, with supporting languages and software packages which considerably simplify the tasks. Traditionally these frameworks have been outside the scope of common DBMSs, but their use has become commonplace, and they are often provided as add-ons to DBMSs.
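A minimal sketch of data definition and data manipulation, using Python's built-in sqlite3 module; the policy table, its columns, and values are invented for the example. The CREATE TABLE statement defines structure (including a built-in protection, a CHECK constraint), while INSERT and SELECT manipulate and generate information from the data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Data definition language: declare the database's structure.
conn.execute("""CREATE TABLE policy (
    policy_no INTEGER PRIMARY KEY,
    holder    TEXT NOT NULL,
    premium   REAL CHECK (premium > 0))""")

# Data manipulation language: insert content and derive information from it.
conn.execute("INSERT INTO policy VALUES (1, 'A. Smith', 120.0)")
total = conn.execute("SELECT SUM(premium) FROM policy").fetchone()[0]
print(total)  # 120.0

# The CHECK constraint acts as built-in protection: bad data is rejected.
try:
    conn.execute("INSERT INTO policy VALUES (2, 'B. Jones', -5.0)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Both the definition and the manipulation statements are expressed in the same data-model-specific language (here SQL, for the relational model).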
Operational requirements

Operational requirements need to be met by a database in order to effectively support an application when it is operational. Though it might be expected that operational requirements are automatically met by a DBMS, in most cases they are not: meeting them typically requires substantial design and tuning work by database administrators. This is typically done through specific instructions/operations via special database user interfaces and tools, and thus may be viewed as secondary functional requirements (no less important than the primary ones).

Availability
A DB should maintain needed levels of availability, i.e., the DB needs to be available in a way that a user's action does not have to wait beyond a certain time range before it begins executing against the DB. Availability also relates to failure and recovery (see Recovery from failure and disaster below): upon failure and during recovery, normal availability changes, and special measures are needed to satisfy availability requirements.

Performance
Users' actions upon the DB should be executed within needed time ranges.

Isolation between users
When multiple users access the database concurrently, the actions of one user should be uninterrupted and unaffected by the actions of other users. These concurrent actions should maintain the DB's consistency (i.e., keep the DB from corruption).

Recovery from failure and disaster
All computer systems, including DBMSs, are prone to failures for many reasons (both software and hardware related). Failures typically corrupt the DB, often to the extent that it is impossible to repair without special measures. The DBMS should provide automatic recovery procedures that repair the DB and return it to a well-defined state.

Backup and restore
Sometimes it is desired to bring a database back to a previous state (for many reasons, e.g., when the database is found to be corrupted due to a software error, or when it has been updated with erroneous data). To achieve this, a backup operation is done occasionally or continuously, whereby each desired database state (i.e., the values of its data and their embedding in the database's data structures) is kept within dedicated backup files (many techniques exist to do this effectively). When a database administrator decides to bring the database back to a given state (e.g., by specifying a desired point in time when the database was in that state), these files are used to restore it.
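A small sketch of backup and restore, using the backup API of Python's built-in sqlite3 module; the table and its values are invented for the example. The current state is copied to a second database, the source is then damaged by an erroneous update, and the saved state remains available:

```python
import sqlite3

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE t (x INTEGER)")
src.execute("INSERT INTO t VALUES (42)")
src.commit()

# Back up the current database state into a second database.
dst = sqlite3.connect(":memory:")
src.backup(dst)

# Simulate an erroneous update that destroys data in the source...
src.execute("DELETE FROM t")
src.commit()

# ...the earlier state survives in the backup and can be restored from it.
restored = dst.execute("SELECT x FROM t").fetchone()[0]
print(restored)  # 42
```

In production the backup target would be a file (or many incremental files) rather than a second in-memory database, but the principle is the same.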
Data independence
Data independence pertains to a database's life cycle (see Database building, maintaining, and tuning below). It strongly impacts the convenience and cost of maintaining an application and its database, and has been the major motivation for the emergence and success of the Relational model, as well as for the convergence to a common database architecture. In general, the term "data independence" means that changes in the database's structure do not require changes in its application's computer programs, and that changes in the database at a certain architectural level (see below) do not affect the levels above. Data independence is achieved to a great extent in contemporary DBMSs, but it is not completely attainable, and is achieved to different degrees for different types of database structural changes.
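One common way this kind of independence is approximated in practice is through views: applications query a stable view while the underlying tables are restructured beneath it. A sketch using Python's built-in sqlite3 module; the person table, view, and values are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person_v1 (id INTEGER, fullname TEXT)")
conn.execute("INSERT INTO person_v1 VALUES (1, 'Ada Lovelace')")
# Applications query through a view, never the base table directly.
conn.execute("CREATE VIEW person AS SELECT id, fullname FROM person_v1")

def app_query(c):
    # Application code depends only on the view's columns.
    return c.execute("SELECT fullname FROM person WHERE id = 1").fetchone()[0]

print(app_query(conn))  # Ada Lovelace

# Restructure the storage level: split the name into two columns...
conn.executescript("""
    CREATE TABLE person_v2 (id INTEGER, first TEXT, last TEXT);
    INSERT INTO person_v2 VALUES (1, 'Ada', 'Lovelace');
    DROP VIEW person;
    DROP TABLE person_v1;
    CREATE VIEW person AS
        SELECT id, first || ' ' || last AS fullname FROM person_v2;
""")
# ...and the unchanged application code still works against the new structure.
print(app_query(conn))  # Ada Lovelace
```

The application program never changed, even though the table layout did, which is exactly the property data independence asks for.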
Major database functional areas

The functional areas are domains and subjects that have evolved in order to provide proper answers and solutions to the functional requirements above.
Data models

A data model is an abstract structure that provides the means to effectively describe the specific data structures needed to model an application. As such, a data model needs sufficient expressive power to capture the needed aspects of applications. These applications are often typical of commercial companies and other organizations (manufacturing, human resources, stock, banking, etc.). For effective utilization and handling, it is desirable that a data model be relatively simple and intuitive. This may conflict with the high expressive power needed to deal with certain complex applications. Thus any popular general-purpose data model usually strikes a balance between being intuitive and relatively simple, and possessing high expressive power. The application's semantics are usually not explicitly expressed in the model, but rather are implicit (detailed by documentation external to the model) and hinted at by the names of data item types (e.g., "part-number") and their connections (as expressed by the generic data structure types provided by each specific model).

Early data models
These models were popular in the 1960s and 1970s, but nowadays can be found primarily in old legacy systems. They are characterized primarily by being navigational, with strong connections between their logical and physical representations, and by deficiencies in data independence.
Hierarchical model
In the hierarchical model, different record types (representing real-world entities) are embedded in a predefined hierarchical (tree-like) structure. This hierarchy is used as the physical order of records in storage. Record access is done by navigating through the data structure using pointers combined with sequential accessing. This model has been supported primarily by the IBM IMS DBMS, one of the earliest DBMSs. Various limitations of the model were compensated for in later IMS versions by additional logical hierarchies imposed on the base physical hierarchy.
Network model
In this model a hierarchical relationship between two record types (representing real-world entities) is established by the set construct. A set consists of circular linked lists where one record type, the set owner or parent, appears once in each circle, and a second record type, the subordinate or child, may appear multiple times in each circle. In this way a hierarchy may be established between any two record types, e.g., type A is the owner of B. At the same time another set may be defined where B is the owner of A. Thus all the sets comprise a general directed graph (ownership defines a direction), or network construct. Access to records is either sequential (usually in each record
type) or by navigation in the circular linked lists. This model is more general and powerful than the hierarchical model, and was the most popular before being superseded by the relational model. It was standardized by CODASYL. Popular DBMS products that utilized it were Cincom Systems' Total and Cullinet's IDMS. IDMS gained a considerable customer base and is still supported today. In the 1980s it adopted the relational model and SQL in addition to its original tools and languages.
Inverted file model
An inverted file or inverted index of a first file, by a field in that file (the inversion field), is a second file in which this field is the key. A record in the second file includes a key and pointers to records in the first file where the inversion field has the value of the key. This is also the logical structure of contemporary database indexes. The related inverted file data model utilizes inverted files of primary database files to efficiently access needed records in those files directly. Notable for using this data model is the ADABAS DBMS of Software AG, introduced in 1970. ADABAS gained a considerable customer base and is still supported today. In the 1980s it adopted the relational model and SQL in addition to its original tools and languages.
Relational model
The relational model is a simple model that provides flexibility. It organizes data into two-dimensional arrays known as relations, or tables as related to databases. These relations consist of a heading and a set of zero or more tuples in arbitrary order. The heading is an unordered set of zero or more attributes, the columns of the table. Each tuple maps the attributes to values and corresponds to a row of data in the table. Data can be associated across multiple tables with a key. A key is a single attribute, or set of attributes, that is common to both tables. The most common language associated with the relational model is the Structured Query Language (SQL), though SQL deviates from the relational model in some respects.
Object model
In recent years, the object-oriented paradigm has been applied in areas such as engineering and spatial databases, telecommunications, and various scientific domains. The conglomeration of object-oriented programming and database technology led to this new kind of database. These databases attempt to bring the database world and the application-programming world closer together, in particular by ensuring that the database uses the same type system as the application program. This aims to avoid the overhead (sometimes referred to as the impedance mismatch) of converting information between its representation in the database (for example as rows in tables) and its representation in the application program (typically as objects). At the same time, object databases attempt to introduce key ideas of object programming, such as encapsulation and polymorphism, into the world of databases. A variety of ways have been tried for storing objects in a database. Some products have approached the problem from the application-programming side, by making the objects manipulated by the program persistent. This also typically requires the addition of some kind of query language, since conventional programming languages do not provide language-level functionality for finding objects based on their information content. Others have attacked the problem from the database end, by defining an object-oriented data model for the database, and defining a database programming language that allows full programming capabilities as well as traditional query facilities.
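The persistence-from-the-programming-side approach can be sketched in a few lines. This is a toy illustration, not any product's API: the class, field names, and the in-memory dict standing in for durable object storage are all invented, and a real object DBMS would persist to disk and provide a proper query language.

```python
import pickle

class Part:
    """An ordinary application object; the database uses the same type system."""
    def __init__(self, number, weight):
        self.number = number
        self.weight = weight

store = {}  # stand-in for persistent object storage

def persist(obj):
    # Serialize the object as-is, with no relational row conversion
    # (avoiding the "impedance mismatch" described above).
    store[obj.number] = pickle.dumps(obj)

def query(predicate):
    """Find objects by their content -- the role a query language fills."""
    return [o for o in (pickle.loads(b) for b in store.values()) if predicate(o)]

persist(Part("P-100", 2.5))
persist(Part("P-200", 7.0))
heavy = query(lambda p: p.weight > 5)
print([p.number for p in heavy])  # ['P-200']
```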
Other database models
Products offering a more general data model than the relational model are sometimes classified as post-relational.[7] Alternate terms include "hybrid database", "object-enhanced RDBMS" and others. The data model in such products incorporates relations but is not constrained by E.F. Codd's Information Principle, which requires that all information in the database must be cast explicitly in terms of values in relations and in no other way.[8] Some of these extensions to the relational model integrate concepts from technologies that pre-date the relational model. For example, they allow representation of a directed graph with trees on the nodes. The German company sones implements this concept in its GraphDB. Some post-relational products extend relational systems with non-relational features. Others arrived in much the same place by adding relational features to pre-relational systems. Paradoxically, this allows products that are historically pre-relational, such as PICK and MUMPS, to make a plausible claim to be post-relational. The resource space model (RSM) is a non-relational data model based on multi-dimensional classification.[9]
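Graph-shaped data of the kind these post-relational products target can be illustrated with a minimal sketch; this is a generic toy, not GraphDB's or any product's interface, and the node names are invented. A directed graph is held as an adjacency mapping, and a reachability query traverses the pointers directly rather than through repeated relational joins.

```python
# Directed graph as an adjacency mapping: node -> list of successor nodes.
edges = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}

def reachable(graph, start):
    """Return the set of nodes reachable from start (depth-first traversal)."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen

print(sorted(reachable(edges, "a")))  # ['a', 'b', 'c', 'd']
```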
Database languages
Database languages are dedicated programming languages, tailored and utilized to
• define a database (i.e., its specific data types and the relationships among them),
• manipulate its content (e.g., insert new data occurrences, and update or delete existing ones), and
• query it (i.e., request information: compute and retrieve any information based on its data).
Database languages are data-model-specific, i.e., each language assumes and is based on a certain structure of the data (which typically differs among different data models). They typically have commands to instruct execution of the desired operations in the database. Each such command is equivalent to a complex expression (program) in a regular programming language, and thus programming in dedicated (database) languages simplifies the task of handling databases considerably. An expression in a database language is automatically transformed (by a compiler or interpreter, as with regular programming languages) into a proper computer program that runs while accessing the database and providing the needed results. The following are notable examples:
SQL for the Relational model
A major relational model language, supported by all relational DBMSs, and a standard. SQL was one of the first commercial languages for the relational model. Despite not adhering to the relational model as described by Codd, it has become the most widely used database language.[10][11] Though often described as, and to a great extent being, a declarative language, SQL also includes procedural elements. SQL became a standard of the American National Standards Institute (ANSI) in 1986, and of the International Organization for Standardization (ISO) in 1987. Since then the standard has been enhanced several times with added features. However, issues of SQL code portability between major RDBMS products still exist due to lack of full compliance with, or different interpretations of, the standard. Among the reasons mentioned are the large size and incomplete specification of the standard, as well as vendor lock-in.
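The three roles of a database language listed above (definition, manipulation, query) can be sketched in SQL, run here through Python's built-in SQLite driver; the table and column names are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Define: declare a data type (a relation) and its attributes.
con.execute("CREATE TABLE product (sku TEXT PRIMARY KEY, price REAL)")
# Manipulate: insert new data occurrences and update an existing one.
con.execute("INSERT INTO product VALUES ('A1', 9.5), ('B2', 12.0)")
con.execute("UPDATE product SET price = 10.0 WHERE sku = 'A1'")
# Query: compute information based on the data.
total = con.execute("SELECT SUM(price) FROM product").fetchone()[0]
print(total)  # 22.0
```

Each one-line SQL statement here stands in for what would be a longer program in a general-purpose language, which is the simplification the paragraph above describes.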
OQL for the Object model
An object model language standard (by the Object Data Management Group) that has influenced the design of some of the newer query languages like JDOQL and EJB QL, though they cannot be considered different flavors of OQL.
XQuery for the XML model
XQuery is an XML-based database language (also named XQL). SQL/XML combines XQuery and XML with SQL.[12]
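XQuery itself is not available in Python's standard library; as a rough analogue of querying XML-modeled data, the sketch below uses ElementTree's limited XPath support to select elements by structure and attribute. The document is invented for illustration.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<library>
  <book year="1970"><title>A Relational Model of Data</title></book>
  <book year="1976"><title>The Entity-Relationship Model</title></book>
</library>
""")
# Select books by an attribute predicate, then project their titles
# (the kind of structural query XQuery expresses for the XML model).
titles = [b.findtext("title") for b in doc.findall("book[@year='1976']")]
print(titles)  # ['The Entity-Relationship Model']
```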
Database architecture
Database architecture (to be distinguished from DBMS architecture; see below) may be viewed, to some extent, as an extension of data modeling. It is used to conveniently answer the requirements of different end-users of the same database, as well as for other benefits. For example, the financial department of a company needs the payment details of all employees as part of the company's expenses, but not the many other details about employees that are of interest to the human resources department. Thus different departments need different views of the company's database; both may include the employees' payments, possibly at different levels of detail (and presented in different visual forms). To meet such requirements effectively, database architecture consists of three levels: external, conceptual and internal. Clearly separating the three levels was a major feature of the relational database model implementations that dominate 21st century databases.[13]
• The external level defines how each end-user type understands the organization of its respective relevant data in the database, i.e., the different needed end-user views. A single database can have any number of views at the external level.
• The conceptual level unifies the various external views into a coherent, global whole.[13] It provides the common denominator of all the external views. It comprises all the generic data needed by end-users, i.e., all the data from which any view may be derived/computed. It is provided in the simplest possible form of such generic data, and comprises the backbone of the database. It is out of the scope of the various database end-users; it serves database application developers and is defined by the database administrators that build the database.
• The internal level (or physical level) is in fact part of the database implementation inside a DBMS (see Implementation section below).
It is concerned with cost, performance, scalability and other operational matters. It deals with the storage layout of the conceptual level, provides supporting storage structures like indexes to enhance performance, and occasionally stores data of individual views (materialized views), computed from generic data, if performance justification exists for such redundancy. It balances all the external views' performance requirements, possibly conflicting, in an attempt to optimize the overall database usage by all its end-users according to the database goals and priorities. All three levels are maintained and updated according to changing needs by database administrators, who often also participate in the database design. The above three-level database architecture relates to, and is motivated by, the concept of data independence, which has long been described as a desired database property and was one of the major initial driving forces of the relational model. In the context of the above architecture it means that changes made at a certain level do not affect definitions and software developed with higher-level interfaces, and are incorporated at the higher level automatically. For example, changes in the internal level do not affect application programs written using conceptual-level interfaces, which saves substantial change work that would otherwise be needed. In summary, the conceptual level is a level of indirection between internal and external. On one hand it provides a common view of the database, independent of different external view structures, and on the other hand it is uncomplicated by details of how the data are stored or managed (internal level). In principle every level, and even every external view, can be presented by a different data model. In practice usually a given DBMS uses the same
data model for both the external and the conceptual levels (e.g., relational model). The internal level, which is hidden inside the DBMS and depends on its implementation (see Implementation section below), requires a different level of detail and uses its own data structure types, typically different in nature from the structures of the external and conceptual levels which are exposed to DBMS users (e.g., the data models above): while the external and conceptual levels are focused on and serve DBMS users, the concern of the internal level is effective implementation details.
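The external/conceptual separation can be sketched with a SQL view, run here through SQLite; the schema echoes the finance-department example above, with invented table and column names. The view exposes only the payment-related columns of the underlying conceptual-level table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Conceptual level: the full employee relation.
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT,
                           salary REAL, home_address TEXT);
    INSERT INTO employee VALUES (1, 'Alice', 5000, '1 Elm St'),
                                (2, 'Bob', 6000, '2 Oak Ave');
    -- External level: finance sees payments, not personal details.
    CREATE VIEW payroll AS SELECT name, salary FROM employee;
""")
rows = con.execute("SELECT * FROM payroll ORDER BY name").fetchall()
print(rows)  # [('Alice', 5000.0), ('Bob', 6000.0)]
```

Because `payroll` is derived from the base table, restructuring the conceptual level can leave the external view definition, and hence the finance application, unchanged, which is the data independence the section describes.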
Database security
Database security deals with the various aspects of protecting the database content, its owners, and its users. It ranges from protection against intentional unauthorized database uses to unintentional database accesses by unauthorized entities (e.g., a person or a computer program). The following are major areas of database security (among many others).
Access control
Database access control deals with controlling who (a person or a certain computer program) is allowed to access what information in the database. The information may comprise specific database objects (e.g., record types, specific records, data structures), certain computations over certain objects (e.g., query types, or specific queries), or the utilization of specific access paths to the former (e.g., using specific indexes or other data structures to access information). Database access controls are set by special personnel (authorized by the database owner) who use dedicated protected security DBMS interfaces.
Data security
The definition of data security varies and may overlap with other database security aspects. Broadly it deals with protecting specific chunks of data, both physically (i.e., from corruption, destruction, or removal; e.g., see Physical security) and against the interpretation of them, or parts of them, into meaningful information (e.g., by looking at the strings of bits that they comprise and concluding specific valid credit-card numbers; e.g., see Data encryption).
Database audit
Database audit primarily involves monitoring that no security breach, in any aspect, has happened. If a security breach is discovered, then all possible corrective actions are taken.
Database design
Database design is done before building a database, to meet the needs of end-users within a given application/information system that the database is intended to support. The database design defines the needed data and data structures that such a database comprises. A design is typically carried out according to the common three architectural levels of a database (see Database architecture above). First, the conceptual level is designed, which defines the overall picture/view of the database and reflects all the real-world elements (entities) the database intends to model, as well as the relationships among them. On top of it, the external level, the various views of the database, is designed according to the (possibly completely different) needs of specific end-user types. More external views can be added later. External view requirements may modify the design of the conceptual level (i.e., add/remove entities and relationships), but usually a well-designed conceptual level for an application supports most of the needed external views. The conceptual view also determines the internal level (which primarily deals with data layout in storage) to a great extent. External view requirements may add supporting storage structures, like materialized views and indexes, for enhanced performance. Typically the internal layer is optimized for top performance, in an average way that takes into account the performance requirements (possibly conflicting) of different external views according to
their relative importance. While the conceptual and external level designs can usually be done independently of any DBMS (DBMS-independent design software packages exist, possibly with interfaces to some specific popular DBMSs), the internal level design relies heavily on the capabilities and internal data structures of the specific DBMS utilized (see the Implementation section below). A common way to carry out conceptual level design is to use the entity-relationship model (ERM) (both the basic one and the possible enhancements it has undergone), since it provides a straightforward, intuitive perception of an application's elements and semantics. An alternative approach, which preceded the ERM, is to use the relational model and dependencies (mathematical relationships) among data to normalize the database, i.e., to define the ("optimal") relations (data record or tuple types) in the database. Though a large body of research exists for this method, it is more complex, less intuitive, and not more effective than the ERM method. Thus normalization is less utilized in practice than the ERM method. The ERM may be less subtle than normalization in several aspects, but it captures the main needed dependencies, which are induced by keys/identifiers of entities and relationships. Also the ERM inherently includes the important inclusion dependencies (i.e., an entity instance that does not exist (has not been explicitly inserted) cannot appear in a relationship with other entities), which usually have been ignored in normalization.[14] In addition the ERM allows entity type generalization (the Is-a relationship) and implied property (attribute) inheritance (similar to that found in the object model). Another aspect of database design is its security. It involves both defining access control to database objects (e.g., entities, views) as well as defining security levels and methods for the data themselves (see Database security above).
Entities and relationships
The most common database design methods are based on the entity-relationship model (ERM, or ER model). This model views the world in a simplistic but very powerful way: it consists of "entities" and the "relationships" among them. Accordingly, a database consists of entity and relationship types, each with defined attributes (field types) that model concrete entities and relationships. Modeling a database in this way typically yields an effective one with desired properties (as in some normal forms; see Database normalization below). Such models can be translated to any other data model required by any specific DBMS for building an effective database.
Database normalization
In the design of a relational database, the process of organizing database relations to minimize redundancy is called normalization. The goal is to produce well-structured relations so that additions, deletions, and modifications of a field can be made in just one relation (table) without worrying about the appearance and update of the same field in other relations. The process is algorithmic and based on dependencies (mathematical relations) that exist among relations' field types. The result of the process is to bring the database relations into a certain "normal form". Several normal forms exist, with different properties.
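The redundancy that normalization removes can be shown with a tiny sketch; the relation names and data are invented. The unnormalized relation repeats the department name for every employee, so a department rename would have to touch many rows; splitting it into two relations leaves the name in exactly one place.

```python
# Unnormalized: employee(name, dept_id, dept_name) repeats dept_name.
unnormalized = [
    ("Alice", "D1", "Sales"),
    ("Bob",   "D1", "Sales"),   # "Sales" is stored redundantly
    ("Carol", "D2", "Research"),
]

# Normalized: employee(name, dept_id) and department(dept_id, dept_name),
# so a department rename is a modification to a single row.
employee = [(name, dept_id) for name, dept_id, _ in unnormalized]
department = sorted({(dept_id, dept_name)
                     for _, dept_id, dept_name in unnormalized})

print(department)  # [('D1', 'Sales'), ('D2', 'Research')]
```

This split is driven by the functional dependency dept_id → dept_name, an instance of the dependency-based reasoning the paragraph describes.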
Database building, maintaining, and tuning
After a database is designed for an application comes the stage of building it. Typically an appropriate general-purpose DBMS can be selected to be utilized for this purpose. A DBMS provides the needed user interfaces to be utilized by database administrators to define the needed application's data structures within the DBMS's respective data model. Other user interfaces are used to select needed DBMS parameters (security related, storage allocation parameters, etc.). When the database is ready (all its data structures and other needed components are defined), it is typically populated with the application's initial data (database initialization, which is typically a distinct project; in many cases using specialized DBMS interfaces that support bulk insertion) before making it operational. In some cases the database
becomes operational while still empty of application data, and data are accumulated during its operation. After the database is built and made operational comes the database maintenance stage: various database parameters may need changes and tuning for better performance, the application's data structures may be changed or added, new related application programs may be written to add to the application's functionality, etc.
Miscellaneous areas
Database migration between DBMSs
See also Database migration in Data migration
A database built with one DBMS is not portable to another DBMS (i.e., the other DBMS cannot run it). However, in some situations it is desirable to move, or migrate, a database from one DBMS to another. The reasons are primarily economic (different DBMSs may have different total costs of ownership (TCO)), functional, and operational (different DBMSs may have different capabilities). The migration involves the database's transformation from one DBMS type to another. The transformation should keep (if possible) the database-related application (i.e., all related application programs) intact. Thus, the database's conceptual and external architectural levels should be maintained in the transformation. It may be desired that some aspects of the internal architectural level are maintained as well. A complex or large database migration may be a complicated and costly (one-time) project by itself, which should be factored into the decision to migrate. This is in spite of the fact that tools may exist to help migration between specific DBMSs. Typically a DBMS vendor provides tools to help importing databases from other popular DBMSs.
Implementation: Database management systems or How database usage requirements are met
A database management system (DBMS) is a system that allows one to build and maintain databases, as well as to utilize their data and retrieve information from them. A DBMS implements solutions (see Major functional areas above) to the database usability requirements above. It defines the database type that it supports, as well as its functionality and operational capabilities. A DBMS provides the internal processes for external applications built on it. The end-users of some such specific application are usually exposed only to that application and do not directly interact with the DBMS. Thus end-users enjoy the effects of the underlying DBMS, but its internals are completely invisible to them. Database designers and database administrators interact with the DBMS through dedicated interfaces to build and maintain the applications' databases, and thus need more knowledge and understanding about how DBMSs operate and about the DBMSs' external interfaces and tuning parameters. A DBMS consists of software that operates databases, providing storage, access, security, backup and other facilities to meet needed requirements. DBMSs can be categorized according to the database model(s) that they support, such as relational or XML; the type(s) of computer they support, such as a server cluster or a mobile phone; the query language(s) that access the database, such as SQL or XQuery; performance trade-offs, such as maximum scale or maximum speed; or other criteria. Some DBMSs cover more than one entry in these categories, e.g., supporting multiple query languages. Database software typically supports the Open Database Connectivity (ODBC) standard, which allows the database to integrate (to some extent) with other databases. The development of a mature general-purpose DBMS typically takes several years and many man-years.
Developers of DBMSs typically update their products to follow and take advantage of progress in computer and storage technologies. Several DBMS products like Oracle and IBM DB2 have been in ongoing development since the 1970s and 1980s. Since DBMSs comprise a significant economic market, computer and storage vendors often take DBMS requirements into account in their own development plans.
DBMS architecture: major DBMS components
DBMS architecture specifies its components (including descriptions of their functions) and their interfaces. DBMS architecture is distinct from database architecture. The following are major DBMS components:
• DBMS external interfaces - These are the means to communicate with the DBMS (both ways, to and from the DBMS) to perform all the operations needed for the DBMS. These can be operations on a database, or operations to operate and manage the DBMS. For example: direct database operations (defining data types, assigning security levels, updating data, querying the database, etc.), and operations related to DBMS operation and management (backup and restore, database recovery, security monitoring, database storage allocation and database layout configuration monitoring, performance monitoring and tuning, etc.). An external interface can be either a user interface (e.g., typically for a database administrator) or an application programming interface (API) used for communication between an application program and the DBMS.
• Database language engines (or processors) - Most operations upon databases are performed through expressions in database languages (see above). Languages exist for data definition, data manipulation and queries (e.g., SQL), as well as for specifying various aspects of security, and more. Language expressions are fed into a DBMS through proper interfaces. A language engine processes the language expressions (by a compiler or language interpreter) to extract the intended database operations from the expression in a way that they can be executed by the DBMS.
• Query optimizer - Performs query optimization on every query to choose for it the most efficient query plan (a partial order (tree) of operations) to be executed to compute the query result.
• Database engine - Performs the received database operations on the database objects, typically at their higher-level representation.
• Storage engine - Translates the operations to low-level operations on the storage bits. In some references the storage engine is viewed as part of the database engine.
• Transaction engine - For correctness and reliability purposes most DBMS internal operations are performed encapsulated in transactions (see below). Transactions can also be specified externally to the DBMS to encapsulate a group of operations. The transaction engine tracks all the transactions and manages their execution according to the transaction rules (e.g., proper concurrency control, and proper commit or abort for each).
• DBMS management and operation component - Comprises many components that deal with all the DBMS management and operational aspects like performance monitoring and tuning, backup and restore, recovery from failure, security management and monitoring, database storage allocation and database storage layout monitoring, etc.
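The query optimizer's plan choice can be observed directly in SQLite via its EXPLAIN QUERY PLAN facility; the sketch below uses an invented table and index, and the exact plan wording varies between SQLite versions. Adding an index changes the chosen access path from a full table scan to an index search.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (k INTEGER, v TEXT)")

# Without an index, the optimizer can only choose a full scan of t.
plan_before = con.execute(
    "EXPLAIN QUERY PLAN SELECT v FROM t WHERE k = 42").fetchall()

con.execute("CREATE INDEX t_k ON t(k)")

# With the index available, the optimizer picks an index search instead.
plan_after = con.execute(
    "EXPLAIN QUERY PLAN SELECT v FROM t WHERE k = 42").fetchall()

print(plan_before)
print(plan_after)
```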
Database storage
Database storage is the container of the physical materialization of a database. It comprises the internal (physical) level in the database architecture. It also contains all the information needed (e.g., metadata, "data about the data", and internal data structures) to reconstruct the conceptual level and external level from the internal level when needed. It is not part of the DBMS but rather is manipulated by the DBMS (by its storage engine; see above) to manage the database that resides in it. Though typically accessed by a DBMS through the underlying operating system (and often utilizing the operating system's file systems as intermediates for storage layout), storage properties and configuration settings are extremely important for the efficient operation of the DBMS, and thus are closely maintained by database administrators. A DBMS, while in operation, always has its database residing in several types of storage (e.g., memory and external storage). The database data and the additional needed information, possibly in very large amounts, are coded into bits. Data typically reside in the storage in structures that
look completely different from the way the data look at the conceptual and external levels, but in ways that attempt to optimize these levels' reconstruction when needed by users and programs, as well as the computation of additional types of needed information from the data (e.g., when querying the database). In principle the database storage can be viewed as a linear address space, where every bit of data has its unique address in this address space. Practically only a very small percentage of addresses are kept as initial reference points (which also requires storage); most of the database data are accessed by indirection using displacement calculations (distance in bits from the reference points) and data structures which define access paths (using pointers) to all needed data in an effective manner, optimized for the needed data access operations.
Coding the data and error-correcting codes
• Data are encoded by assigning a bit pattern to each language alphabet character, digit, other numerical pattern, and multimedia object. Many standards exist for encoding (e.g., ASCII, JPEG, MPEG-4).
• By adding bits to each encoded unit, redundancy allows both detecting errors in coded data and correcting them based on mathematical algorithms. Errors occur regularly in low probabilities due to random bit value flipping, or "physical bit fatigue", the loss of a physical bit in storage of its ability to maintain a distinguishable value (0 or 1), or due to errors in inter- or intra-computer communication. A random bit flip (e.g., due to random radiation) is typically corrected upon detection. A bit, or a group of malfunctioning physical bits (the specific defective bit is not always known; group definition depends on the specific storage device), is typically automatically fenced out, taken out of use by the device, and replaced with another functioning equivalent group in the device, where the corrected bit values are restored (if possible). The cyclic redundancy check (CRC) method is typically used in storage for error detection.
Data compression
Data compression methods allow in many cases the representation of a string of bits by a shorter bit string ("compress") and reconstruction of the original string ("decompress") when needed. This allows utilizing substantially less storage (by tens of percent) for many types of data, at the cost of more computation (compressing and decompressing when needed). Analysis of the trade-off between storage cost savings and the costs of related computations and possible delays in data availability is done before deciding whether to keep certain data in a database compressed or not. Data compression is typically controlled through the DBMS's data definition interface, but in some cases may be a default and automatic.
Data encryption
For security reasons certain types of data (e.g., credit-card information) may be kept encrypted in storage to prevent the possibility of unauthorized information reconstruction from chunks of storage snapshots (taken either via unforeseen vulnerabilities in a DBMS, or more likely, by bypassing it). Data encryption is typically controlled through the DBMS's data definition interface, but in some cases may be a default and automatic.
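The effect described above can be sketched with a toy one-time-pad (XOR) scheme: a storage snapshot of the ciphertext reveals nothing without the key, which is held outside the storage. This is only an illustration with an invented card number; production DBMSs use standard ciphers such as AES rather than this scheme.

```python
import secrets

plaintext = b"4111-1111-1111-1111"          # hypothetical card number
key = secrets.token_bytes(len(plaintext))    # kept outside the database storage

# What actually lands in storage: plaintext XORed with the random key.
ciphertext = bytes(p ^ k for p, k in zip(plaintext, key))

# Authorized reconstruction: XOR with the same key recovers the data.
decrypted = bytes(c ^ k for c, k in zip(ciphertext, key))
assert decrypted == plaintext
```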
Data storage types
This collection of bits describes both the contained database data and the related metadata (i.e., data that describe the contained data and allow computer programs to manipulate the database data correctly). The size of a database can nowadays be tens of terabytes, where a byte is eight bits. The physical materialization of a bit can employ various existing technologies, while new and improved technologies are constantly under development. Common examples are:
• Magnetic medium (e.g., in a magnetic disk) - orientation of the magnetic field in magnetic regions on the surface of a material (two orientation directions, for 0 and 1).
• Dynamic random-access memory (DRAM) - state of a miniature electronic circuit consisting of a few transistors (among millions nowadays) in an integrated circuit (two states, for 0 and 1).
These two examples respectively represent two major storage types:
• Nonvolatile storage can maintain its bit states (0s and 1s) without an electrical power supply, or when the power supply is interrupted;
• Volatile storage loses its bit values when the power supply is interrupted (i.e., its content is erased).
Sophisticated storage units, which can in fact be effective dedicated parallel computers supporting a large amount of nonvolatile storage, typically must also include components with volatile storage. Some such units employ batteries that can provide power for several hours in case of external power interruption (e.g., see the EMC Symmetrix) and thus keep the content of the volatile storage parts intact. Just before such a device's batteries lose their power, the device typically automatically backs up its volatile content (into nonvolatile storage) and shuts off to protect its data. Databases are usually too expensive (in terms of importance and the investment of resources, e.g., time and money, needed to build them) to be lost to a power interruption. Thus at any point in time most of their content resides in nonvolatile storage.
Even if for operational reasons very large portions of them reside in volatile storage (e.g., tens of gigabytes in volatile memory, for in-memory databases), most of this is backed up in nonvolatile storage. A relatively small portion of this, which temporarily may not have a nonvolatile backup, can be reconstructed by proper automatic database recovery procedures after a loss of volatile storage content. More examples of storage types:
• Volatile storage can be found in processors, computer memory (e.g., DRAM), etc.
• Non-volatile storage types include ROM, EPROM, hard disk drives, flash memory and drives, storage arrays, etc.

Storage metrics
Databases always use several types of storage when operational (and, implicitly, several when idle). Different types may differ significantly in their properties, and the optimal mix of storage types is determined by the types and quantities of operations that each storage type needs to perform, as well as by considerations like physical space and energy consumption and dissipation (which may become critical for a large database). Storage types can be categorized by the following attributes:
• Volatile/nonvolatile.
• Cost of the medium (e.g., per megabyte).
• Cost to operate (cost of the energy consumed per unit time).
• Access speed (e.g., bytes per second).
• Granularity - from fine to coarse (e.g., size in bytes of an access operation).
• Reliability (the probability of spontaneous bit value change under various conditions).
• Maximal possible number of writes (of any specific bit or specific group of bits); this could be constrained by the technology used (e.g., "write once" or "write twice"), or by "physical bit fatigue," the loss of the ability to distinguish between the 0 and 1 states due to many state changes (e.g., in flash memory).
• Power needed to operate (energy per unit time; energy per byte accessed), energy efficiency, heat to dissipate.
• Packaging density (e.g., realistic number of bytes per volume unit).

Protecting storage device content: Device mirroring (replication) and RAID
See also Disk storage replication
While the malfunction of a group of bits may be resolved by error detection and correction mechanisms (see above), a storage device malfunction requires different solutions. The following solutions are commonly used and valid for most storage devices:
• Device mirroring (replication) - A common solution to the problem is constantly maintaining an identical copy of the device's content on another device (typically of the same type). The downside is that this doubles the storage, and both devices (copies) need to be updated simultaneously, with some overhead and possibly some delays. The upside is the possibility of concurrent reads of the same data group by two independent processes, which increases performance. When one of the replicated devices is detected to be defective, the other copy is still operational and is used to generate a new copy on another device (usually one available and operational in a pool of stand-by devices kept for this purpose).
• Redundant array of independent disks (RAID) - This method generalizes the device mirroring above by allowing one device in a group of N devices to fail and be replaced, with its content restored (device mirroring is RAID with N=2). RAID groups of N=5 or N=6 are common. N>2 saves storage compared with N=2, at the cost of more processing during both regular operation (often with reduced performance) and defective-device replacement.
Device mirroring and typical RAID are designed to handle a single device failure in a RAID group of devices. However, if a second failure occurs before the RAID group is completely repaired from the first failure, then data can be lost. The probability of a single failure is typically small. Thus the probability of two failures in the same RAID group in close time proximity is much smaller (approximately the probability squared, i.e., multiplied by itself). If a database cannot tolerate even this smaller probability of data loss, then the RAID group itself is replicated (mirrored).
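The redundancy that lets a RAID group survive one failed device can be sketched with simple XOR parity (an illustrative toy, not an actual RAID implementation): the parity block is the XOR of all data blocks, so any single lost block can be recomputed from the survivors.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# N-1 data blocks plus one parity block, as in single-failure-tolerant RAID.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing one device: block 1 is gone.
survivors = [data[0], data[2], parity]

# XORing all surviving blocks reconstructs the lost one.
recovered = xor_blocks(survivors)
assert recovered == data[1]
```

Because XOR is its own inverse, storing one extra parity device per group is enough for any single failure, which is why N>2 saves storage relative to full mirroring.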
In many cases such mirroring is done geographically remotely, in a different storage array, so as also to handle recovery from disasters (see disaster recovery above).

Database storage layout
Database bits are laid out in storage in data structures and groupings that can take advantage both of known effective algorithms for retrieving and manipulating them and of the storage's own properties. Typically the storage itself is designed to meet the requirements of the various areas that extensively utilize storage, including databases. A DBMS in operation always simultaneously utilizes several storage types (e.g., memory and external storage), with respective layout methods.

Database storage hierarchy
A database, while in operation, resides simultaneously in several types of storage. By the nature of contemporary computers, most of the database part inside a computer that hosts the DBMS resides (partially replicated) in volatile storage. Data (pieces of the database) that are being processed/manipulated reside inside a processor, possibly in the processor's caches. These data are read from/written to memory, typically through a computer bus (so far, typically volatile storage components). Computer memory communicates data (transferred to/from) with external storage, typically through standard storage interfaces or networks (e.g., Fibre Channel, iSCSI). A storage array, a common external storage unit, typically has a storage hierarchy of its own, from a fast cache, typically consisting of (volatile and fast) DRAM, which is connected (again via standard interfaces) to drives, possibly of different speeds, like flash drives and magnetic disk drives (non-volatile). The drives may be connected to magnetic tapes, on which the least active parts of a large database, or database backup generations, typically reside.
A correlation typically exists between storage speed and price, and the faster storage is typically volatile.

Data structures
A data structure is an abstract construct that embeds data in a well-defined manner. An efficient data structure allows the data to be manipulated in efficient ways. The data manipulation may include data insertion, deletion, updating, and retrieval in various modes. A given data structure type may be very effective for certain operations and very ineffective for others. A data structure type is selected during DBMS development to best support the operations needed for the types of data it contains. The type of data structure selected for a certain task also typically takes into consideration the type of storage it resides in (e.g., speed of access, minimal size of the storage chunk accessed, etc.). In some DBMSs, database administrators have the flexibility to select among data structure options to contain user data, for performance reasons. Sometimes the data structures have selectable parameters to tune the database performance. Databases may store data in many data structure types. [15] Common examples are the following:
• ordered/unordered flat files
• hash tables
• B+ trees
• ISAM
• heaps
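The practical difference between these structures shows up in how lookups proceed. As a sketch (with hypothetical record data), a hash table answers an exact-key lookup in a single step, while an ordered (sorted) file supports binary search and range scans:

```python
import bisect

# Hypothetical employee records keyed by ID.
records = {1003: "Chen", 1001: "Codd", 1002: "Bachman"}

# Hash table (Python dict): exact-key lookup in O(1) on average.
assert records[1002] == "Bachman"

# Ordered flat file (sorted list of keys): O(log n) binary search,
# plus efficient range queries, which a plain hash table cannot provide.
sorted_ids = sorted(records)            # [1001, 1002, 1003]
i = bisect.bisect_left(sorted_ids, 1002)
assert sorted_ids[i] == 1002

# Range scan: all IDs in [1001, 1002].
lo = bisect.bisect_left(sorted_ids, 1001)
hi = bisect.bisect_right(sorted_ids, 1002)
assert sorted_ids[lo:hi] == [1001, 1002]
```

This is exactly the kind of trade-off a DBMS designer weighs when choosing a structure for a given workload: point lookups favor hashing, ordered access favors sorted structures like B+ trees.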
Application data and DBMS data
A typical DBMS does not store only the data of the application it serves. To handle the application data, the DBMS needs to store these data in data structures that constitute data in their own right. In addition, the DBMS needs its own data structures and many types of bookkeeping data, like indexes and logs. The DBMS data are an integral part of the database and may comprise a substantial portion of it.

Database indexing
Indexing is a technique for improving database performance. The many types of indexes share the common property that they reduce the need to examine every entry when running a query. In large databases, this can reduce query time/cost by orders of magnitude. The simplest form of index is a sorted list of values that can be searched using a binary search, with an adjacent reference to the location of each entry, analogous to the index at the back of a book. The same data can have multiple indexes (an employee database could be indexed by both last name and hire date). Indexes affect performance, but not results. Database designers can add or remove indexes without changing application logic, reducing maintenance costs as the database grows and database usage evolves. Given a particular query, the DBMS's query optimizer is responsible for devising the most efficient strategy for finding the matching data. Indexes can speed up data access, but they consume space in the database and must be updated each time the data are altered. Indexes therefore can speed data access but slow data maintenance. These two properties determine whether a given index is worth its cost.

Database data clustering
In many cases a substantial performance improvement is gained if different types of database objects that are usually utilized together are laid out in storage in proximity, i.e., clustered. This usually allows the needed related objects to be retrieved from storage in a minimal number of input operations (each of which can be substantially time consuming). Even for in-memory databases, clustering provides a performance advantage, due to the common use of large caches for input-output operations in memory, with similar resulting behavior.
For example, it may be beneficial to cluster a record of an item in stock with all its respective order records. The decision of whether to cluster certain objects depends on the objects' utilization statistics, object sizes, cache sizes, storage types, etc. In a relational database, clustering the two respective relations "Items" and "Orders" saves the expensive execution of a join operation between the two relations whenever such a join is needed in a query (the join result is already materialized in storage by the clustering, available to be utilized).

Database materialized views
Often storage redundancy is employed to increase performance. A common example is storing materialized views, which consist of frequently needed external views or query results. Storing such views saves the expensive recomputation of them each time they are needed. The downsides of materialized views are the overhead incurred when updating them to keep them synchronized with the original database data from which they derive, and the cost of the storage redundancy.

Database and database object replication
See also Replication below
Occasionally a database employs storage redundancy by replicating database objects (with one or more copies) to increase data availability (both to improve the performance of simultaneous multiple end-user accesses to the same database object, and to provide resiliency in case of a partial failure of a distributed database). Updates of a replicated object need to be synchronized across the object's copies. In many cases the entire database is replicated.
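The materialized-view idea from the previous section can be sketched as a cached query result that must be refreshed whenever the underlying data change (hypothetical table and aggregate; real DBMSs automate this bookkeeping):

```python
# Base data: hypothetical order rows (item, quantity).
orders = [("widget", 3), ("gadget", 5), ("widget", 2)]

def compute_totals(rows):
    """The 'expensive' query: total quantity per item."""
    totals = {}
    for item, qty in rows:
        totals[item] = totals.get(item, 0) + qty
    return totals

# Materialize the view once; later reads are cheap dictionary lookups.
totals_view = compute_totals(orders)
assert totals_view["widget"] == 5

# The downside named above: every update to the base data must also
# refresh the view to keep it synchronized.
orders.append(("widget", 4))
totals_view = compute_totals(orders)
assert totals_view["widget"] == 9
```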
Database transactions
As with every software system, a DBMS that operates in a faulty computing environment is prone to failures of many kinds. A failure can corrupt the respective database unless special measures are taken to prevent this. A DBMS achieves certain levels of fault tolerance by encapsulating operations within transactions. The concept of a database transaction (or atomic transaction) has evolved in order to enable both a well-understood database system behavior in a faulty environment where crashes can happen at any time, and recovery from a crash to a well-understood database state. A database transaction is a unit of work, typically encapsulating a number of operations over a database (e.g., reading a database object, writing, acquiring a lock, etc.), an abstraction supported in database and other systems. Each transaction has well-defined boundaries in terms of which program/code executions are included in that transaction (determined by the transaction's programmer via special transaction commands).

ACID rules
Every database transaction obeys the following rules:
• Atomicity - Either the effects of all or none of its operations remain ("all or nothing" semantics) when a transaction is completed (committed or aborted, respectively). In other words, to the outside world a committed transaction appears (by its effects on the database) to be indivisible (atomic), and an aborted transaction does not leave effects on the database at all, as if it never existed.
• Consistency - Every transaction must leave the database in a consistent (correct) state, i.e., maintain the predetermined integrity rules of the database (constraints upon and among the database's objects). A transaction must transform a database from one consistent state to another consistent state (however, it is the responsibility of the transaction's programmer to make sure that the transaction itself is correct, i.e., performs correctly what it intends to perform from the application's point of view, while the predefined integrity rules are enforced by the DBMS). Thus, since a database can normally be changed only by transactions, all of the database's states are consistent. An aborted transaction does not change the database state it started from, as if it never existed (atomicity above).
• Isolation - Transactions cannot interfere with each other (as an end result of their executions). Moreover, usually (depending on the concurrency control method) the effects of an incomplete transaction are not even visible to
another transaction. Providing isolation is the main goal of concurrency control.
• Durability - The effects of successful (committed) transactions must persist through crashes (typically by recording the transaction's effects and its commit event in non-volatile memory).

Isolation, concurrency control, and locking
Isolation provides the ability for multiple users to operate on the database at the same time without corrupting the
data.
• Concurrency control comprises the underlying mechanisms in a DBMS that handle isolation and guarantee related correctness. It is heavily utilized by the database and storage engines (see above) both to guarantee the correct execution of concurrent transactions and (with different mechanisms) the correctness of other DBMS processes. The transaction-related mechanisms typically constrain the timing of the database's data access operations (transaction schedules) to certain orders characterized by the serializability and recoverability schedule properties. Constraining database access operation execution typically means reduced performance (rates of execution), and thus concurrency control mechanisms are typically designed to provide the best performance possible under the constraints. Often, when possible without harming correctness, the serializability property is compromised for better performance. However, recoverability cannot be compromised, since compromising it typically results in a quick database integrity violation.
• Locking is the most common transaction concurrency control method in DBMSs, used to provide both serializability and recoverability for correctness. In order to access a database object, a transaction first needs to acquire a lock for that object. Depending on the access operation type (e.g., reading or writing an object) and on the lock type, acquiring the lock may be blocked and postponed if another transaction is holding a lock for that object.
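Atomicity and rollback can be seen directly with Python's built-in `sqlite3` module (a sketch with a hypothetical accounts table): if anything in the transaction fails, the whole transaction is rolled back and the database is left in its prior consistent state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

# A transfer is one transaction: both updates commit, or neither does.
try:
    with conn:  # the connection context manager commits or rolls back as a unit
        conn.execute("UPDATE accounts SET balance = balance - 30 WHERE name = 'alice'")
        conn.execute("UPDATE accounts SET balance = balance + 30 WHERE name = 'zoe'")
        raise ValueError("transfer target does not exist")  # simulate a mid-transaction failure
except ValueError:
    pass

# Alice's debit was rolled back together with the failed credit ("all or nothing").
balance = conn.execute("SELECT balance FROM accounts WHERE name = 'alice'").fetchone()[0]
assert balance == 100
```

SQLite here supplies atomicity and durability; in a multi-user DBMS the locking described above would additionally keep concurrent transfers isolated from each other.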
Query optimization
A query is a request for information from a database. It can be as simple as "find the address of the person with SS# 123-45-6789," or more complex, like "find the average salary of all employed married men in California between the ages of 30 and 39 who earn less than their wives." Query results are generated by accessing the relevant database data and manipulating them in a way that yields the requested information. Since database structures are complex, in most cases, and especially for not-very-simple queries, the data needed for a query can be collected from a database by accessing it in different ways, through different data structures, and in different orders. Each way typically requires different processing time. Processing times of the same query may have large variance, from a fraction of a second to hours, depending on the way chosen. The purpose of query optimization, which is an automated process, is to find the way to process a given query in minimum time. The large possible variance in time justifies performing query optimization, though finding the exact optimal way to execute a query, among all possibilities, is typically very complex, time consuming in itself, may be too costly, and is often practically impossible. Thus query optimization typically tries to approximate the optimum by comparing several common-sense alternatives, to provide in reasonable time a "good enough" plan which typically does not deviate much from the best possible result.
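The idea of comparing alternative ways to execute a query can be sketched as cost-based plan selection. The cost formulas below are illustrative toys, not those of any real optimizer: estimate the cost of each candidate access path and pick the cheapest.

```python
import math

def estimate_cost(plan, n_rows, selectivity):
    """Toy I/O-cost formulas for two access paths (illustrative only)."""
    if plan == "full_scan":
        return n_rows                            # read every row sequentially
    if plan == "index_lookup":
        # traverse a B+ tree, then fetch matching rows; random fetches are
        # pricier than sequential reads, hence the factor of 4
        return math.log2(n_rows) + selectivity * n_rows * 4
    raise ValueError(plan)

def choose_plan(n_rows, selectivity):
    plans = ["full_scan", "index_lookup"]
    return min(plans, key=lambda p: estimate_cost(p, n_rows, selectivity))

# A highly selective predicate favors the index ...
assert choose_plan(n_rows=1_000_000, selectivity=0.0001) == "index_lookup"
# ... while a predicate matching most rows favors a plain scan.
assert choose_plan(n_rows=1_000_000, selectivity=0.9) == "full_scan"
```

Real optimizers enumerate far more alternatives (join orders, join algorithms, access paths) and use statistics gathered from the data, but the shape of the decision is the same.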
DBMS support for the development and maintenance of a database and its application
A DBMS typically intends to provide a convenient environment in which to develop, and later maintain, an application built around its respective database type. A DBMS either provides such tools itself or allows integration with external tools. Examples of such tools relate to database design, application programming, application program maintenance, database performance analysis and monitoring, database configuration monitoring, DBMS hardware configuration (a DBMS and its related database may span computers, networks, and storage units) and related database mapping (especially for a distributed DBMS), storage allocation and database layout monitoring, storage migration, etc.
References
[1] Jeffrey Ullman and Jennifer Widom 1997: A First Course in Database Systems, Prentice-Hall Inc., Simon & Schuster, Page 1, ISBN 0-13-861337-0.
[2] C. W. Bachman, The Programmer as Navigator
[3] Graves, Steve. "COTS Databases For Embedded Systems" (http://www.embedded-computing.com/articles/id/?2020), Embedded Computing Design magazine, January, 2007. Retrieved on August 13, 2008.
[4] "TeleCommunication Systems Signs up as a Reseller of TimesTen; Mobile Operators and Carriers Gain Real-Time Platform for Location-Based Services" (http://findarticles.com/p/articles/mi_m0EIN/is_2002_June_24/ai_87694370). Business Wire. 2002-06-24.
[5] Argumentation in Artificial Intelligence by Iyad Rahwan, Guillermo R. Simari
[6] "OWL DL Semantics" (http://www.obitko.com/tutorials/ontologies-semantic-web/owl-dl-semantics.html). Retrieved 10 December 2010.
[7] Introducing databases by Stephen Chu, in Conrick, M. (2006) Health informatics: transforming healthcare with technology, Thomson, ISBN 0-17-012731-1, p. 69.
[8] Date, C. J. (June 1, 1999). "When's an extension not an extension?" (http://intelligent-enterprise.informationweek.com/db_area/archives/1999/990106/online1.jhtml;jsessionid=Y2UNK1QFKXMBTQE1GHRSKH4ATMY32JVN). Intelligent Enterprise 2 (8).
[9] Zhuge, H. (2008). The Web Resource Space Model. Web Information Systems Engineering and Internet Technologies Book Series. 4. Springer. ISBN 978-0-387-72771-4.
[10] Chapple, Mike. "SQL Fundamentals" (http://databases.about.com/od/sql/a/sqlfundamentals.htm). Databases. About.com. Retrieved 2009-01-28.
[11] "Structured Query Language (SQL)" (http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp?topic=com.ibm.db2.udb.admin.doc/doc/c0004100.htm). International Business Machines. October 27, 2006. Retrieved 2007-06-10.
[12] Wagner, Michael (2010), "1. Auflage", SQL/XML:2006 - Evaluierung der Standardkonformität ausgewählter Datenbanksysteme, Diplomica Verlag, ISBN 3-8366-9609-6
[13] Date 1990, pp. 31-32
[14] Johann A. Makowsky, Victor M. Markowitz and Nimrod Rotics, 1986: "Entity-relationship consistency for relational schemas" (http://www.springerlink.com/content/p67756164r127m18/) Proceedings of the 1986 Conference on Database Theory (ICDT '86), Lecture Notes in Computer Science, 1986, Volume 243/1986, pp. 306-322, Springer, doi:10.1007/3-540-17187-8_43
[15] Lightstone, Teorey & Nadeau 2007
Further reading
• Ling Liu and Tamer M. Özsu (Eds.) (2009). Encyclopedia of Database Systems (http://www.springer.com/computer/database+management+&+information+retrieval/book/978-0-387-49616-0), 4100 p. 60 illus. ISBN 978-0-387-49616-0.
• Beynon-Davies, P. (2004). Database Systems. 3rd Edition. Palgrave, Houndmills, Basingstoke.
• Connolly, Thomas and Carolyn Begg. Database Systems. New York: Harlow, 2002.
• Date, C. J. (2003). An Introduction to Database Systems, Fifth Edition. Addison Wesley. ISBN 0-201-51381-1.
• Gray, J. and Reuter, A. Transaction Processing: Concepts and Techniques, 1st edition, Morgan Kaufmann Publishers, 1992.
• Kroenke, David M. and David J. Auer. Database Concepts. 3rd ed. New York: Prentice, 2007.
• Lightstone, S.; Teorey, T.; Nadeau, T. (2007). Physical Database Design: the database professional's guide to exploiting indexes, views, storage, and more. Morgan Kaufmann Press. ISBN 0-12-369389-6.
• Teorey, T.; Lightstone, S. and Nadeau, T. Database Modeling & Design: Logical Design, 4th edition, Morgan Kaufmann Press, 2005. ISBN 0-12-685352-5
External links
• Database (http://www.dmoz.org/Computers/Data_Formats/Database/) at the Open Directory Project
Entity–relationship model
In software engineering, an entity–relationship model (ER model for short) is an abstract way to describe a database. It usually starts with a relational database, which stores data in tables. Some of the data in these tables point to data in other tables - for instance, your entry in the database could point to several entries, one for each of your phone numbers. The ER model would say that you are an entity, each phone number is an entity, and the relationship between you and the phone numbers is 'has a phone number'. Diagrams created to design these entities and relationships are called entity–relationship diagrams or ER diagrams.
A sample Entity – Relationship diagram using Chen's notation
This article refers to the techniques proposed in Peter Chen's 1976 paper. [1] However, variants of the idea existed previously, [2] and have been devised subsequently, such as supertype and subtype data entities [3] and commonality relationships (an example with additional concepts is the enhanced entity–relationship model).
Overview
Using the three-schema approach to software engineering, there are three levels of ER models that may be developed.
The conceptual data model is the highest-level ER model, in that it contains the least granular detail but establishes the overall scope of what is to be included within the model set. The conceptual ER model normally defines master reference data entities that are commonly used by the organization. Developing an enterprise-wide conceptual ER model is useful for documenting the data architecture of an organization. A conceptual ER model may be used as the foundation for one or more logical data models. The purpose of the conceptual ER model is then to establish structural metadata commonality for the master data entities between the set of logical ER models. The conceptual data model may be used to form commonality relationships between ER models as a basis for data model integration.
A logical ER model does not require a conceptual ER model, especially if the scope of the logical ER model is to develop a single disparate information system. The logical ER model contains more detail than the conceptual ER model. In addition to master data entities, operational and transactional data entities are now defined. The details of each data entity are developed, and the entity relationships between these data entities are established. The logical ER model is, however, developed independently of the technology in which it will be implemented.
One or more physical ER models may be developed from each logical ER model. The physical ER model is normally developed to be instantiated as a database. Therefore, each physical ER model must contain enough detail to produce a database, and each physical ER model is technology dependent, since each database management system is somewhat different.
The physical model is normally forward-engineered to instantiate the structural metadata into a database management system as relational database objects, such as database tables, database indexes (such as unique key indexes), and database constraints (such as a foreign key constraint or a commonality constraint). The ER model is also normally used to design modifications to the relational database objects and to maintain the structural metadata of the database.
The first stage of information system design uses these models during requirements analysis to describe information needs or the type of information that is to be stored in a database. The data modeling technique can be used to describe any ontology (i.e., an overview and classification of used terms and their relationships) for a certain area of interest. In the case of the design of an information system based on a database, the conceptual data model is, at a later stage (usually called logical design), mapped to a logical data model, such as the relational model; this in turn is mapped to a physical model during physical design. Note that sometimes both of these phases are referred to as "physical design".
A UML metamodel of Extended Entity Relationship models
The building blocks: entities, relationships, and attributes
An entity may be defined as a thing which is recognized as being capable of an independent existence and which can be uniquely identified. An entity is an abstraction from the complexities of a domain. When we speak of an entity, we normally speak of some aspect of the real world which can be distinguished from other aspects of the real world. [4]
An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order. Although the term entity is the one most commonly used, following Chen we should really distinguish between an entity and an entity-type. An entity-type is a category. An entity, strictly speaking, is an instance of a given entity-type. There are usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most people tend to use the term entity as a synonym for it.
Two related entities
An entity with an attribute
A relationship with an attribute
Entities can be thought of as nouns. Examples: a computer, an employee, a song, a mathematical theorem. A relationship captures how entities are related to one another. Relationships can be thought of as verbs, linking two or more nouns.
Primary key
Examples: an owns relationship between a company and a computer, a supervises relationship between an employee and a department, a performs relationship between an artist and a song, a proved relationship between a mathematician and a theorem.
The model's linguistic aspect described above is utilized in the declarative database query language ERROL, which mimics natural language constructs. ERROL's semantics and implementation are based on reshaped relational algebra (RRA), a relational algebra adapted to the entity–relationship model that captures its linguistic aspect.
Entities and relationships can both have attributes. Examples: an employee entity might have a Social Security Number (SSN) attribute; the proved relationship may have a date attribute.
Every entity (unless it is a weak entity) must have a minimal set of uniquely identifying attributes, which is called the entity's primary key.
Entity–relationship diagrams don't show single entities or single instances of relations. Rather, they show entity sets and relationship sets. Example: a particular song is an entity; the collection of all songs in a database is an entity set. The eaten relationship between a child and her lunch is a single relationship; the set of all such child-lunch relationships in a database is a relationship set. In other words, a relationship set corresponds to a relation in mathematics, while a relationship corresponds to a member of the relation. Certain cardinality constraints on relationship sets may be indicated as well.
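These building blocks can be sketched in code (with hypothetical entity-types and attribute names): entities are instances of an entity-type carrying a primary key, entity sets collect them, and a relationship set links members of two entity sets and may carry attributes of its own.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Artist:            # an entity-type; each instance is an entity
    artist_id: int       # primary key: uniquely identifies the entity
    name: str

@dataclass(frozen=True)
class Song:
    song_id: int
    title: str

# Entity sets: collections of entities of one entity-type.
artists = {Artist(1, "Ella"), Artist(2, "Miles")}
songs = {Song(10, "Summertime"), Song(11, "So What")}

# A relationship set for 'performs', with a 'year' attribute on each
# relationship, stored as (artist_id, song_id, year) tuples.
performs = {(1, 10, 1958), (2, 11, 1959)}

# A relationship is one member of the relationship set,
# just as an entity is one member of an entity set.
assert (1, 10, 1958) in performs
assert Artist(1, "Ella") in artists
```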
Relationships, roles and cardinalities In Chen's original paper he gives an example of a relationship and its roles. He describes a relationship "marriage" and its two roles "husband" and "wife". A person plays the role of husband in a marriage (relationship) and another person plays the role of wife in the (same) marriage. These words are nouns. That is no surprise; naming things requires a noun. However, as is quite usual with new ideas, many eagerly appropriated the new terminology but then applied it to their own old ideas. Thus the lines, arrows and crow's-feet of their diagrams owed more to the earlier Bachman diagrams than to Chen's relationship diamonds. And they similarly misunderstood other important concepts. In particular, it became fashionable (now almost to the point of exclusivity) to "name" relationships and roles as verbs or phrases.
Relationship names A relationship expressed with a single verb implying direction makes it impossible to discuss the model using proper English. For example:
• the song and the artist are related by a 'performs'
• the husband and wife are related by an 'is-married-to'.
Expressing the relationships with a noun resolves this:
• the song and the artist are related by a 'performance'
• the husband and wife are related by a 'marriage'.
Traditionally, the relationships are expressed twice (using present continuous verb phrases), once in each direction. This gives two English statements per relationship. For example:
• the song is performed by the artist
• the artist performs the song
Role naming It has also become prevalent to name roles with phrases, e.g. is-the-owner-of and is-owned-by. Correct nouns in this case are "owner" and "possession". Thus "person plays the role of owner" and "car plays the role of possession" rather than "person plays the role of is-the-owner-of" etc. The use of nouns has a direct benefit when generating physical implementations from semantic models. When a person has two relationships with a car then it is possible to very simply generate names such as "owner_person" and "driver_person" which are immediately meaningful.
Cardinalities Modifications to the original specification can be beneficial. Chen described look-across cardinalities. As an aside, the Barker-Ellis notation, used in Oracle Designer, uses same-side for minimum cardinality (analogous to optionality) and role, but look-across for maximum cardinality (the crow's foot). In Merise,[5] Elmasri & Navathe[6] and others[7] there is a preference for same-side for roles and both minimum and maximum cardinalities. Recent researchers (Feinerer,[8] Dullea et al.[9]) have shown that this is more coherent when applied to n-ary relationships of order > 2. In Dullea et al. one reads "A 'look across' notation such as used in the UML does not effectively represent the semantics of participation constraints imposed on relationships where the degree is higher than binary." In Feinerer it says "Problems arise if we operate under the look-across semantics as used for UML associations. Hartmann[10] investigates this situation and shows how and why different transformations fail." (Although the "reduction" mentioned is spurious, as the two diagrams 3.4 and 3.5 are in fact the same.) He adds: "As we will see on the next few pages, the look-across interpretation introduces several difficulties which prevent the extension of simple mechanisms from binary to n-ary associations."
Semantic modelling The father of ER modelling said in his seminal paper: "The entity-relationship model adopts the more natural view that the real world consists of entities and relationships. It incorporates some of the important semantic information about the real world." [1] He is here in accord with philosophic and theoretical traditions from the time of the Ancient Greek philosophers Socrates, Plato and Aristotle (428 BC) through to modern epistemology, semiotics and logic of Peirce, Frege and Russell. Plato himself associates knowledge with the apprehension of unchanging Forms (the Forms, according to Socrates, are roughly speaking archetypes or abstract representations of the many types of things, and properties) and their relationships to one another. In his original 1976 article Chen explicitly contrasts entity–relationship diagrams with record modelling techniques: "The data structure diagram is a representation of the organisation of records and is not an exact representation of entities and relationships." Several other authors also support his program: Kent in "Data and Reality" [11]: "One thing we ought to have clear in our minds at the outset of a modelling endeavour is whether we are intent on describing a portion of "reality" (some human enterprise) or a data processing activity." Abrial in "Data Semantics": "... the so called "logical" definition and manipulation of data are still influenced (sometimes unconsciously) by the "physical" storage and retrieval mechanisms currently available on computer systems." Stamper: "They pretend to describe entity types, but the vocabulary is from data processing: fields, data items, values. Naming rules don't reflect the conventions we use for naming people and things; they reflect instead techniques for locating records in files."
In Jackson's words: "The developer begins by creating a model of the reality with which the system is concerned, the reality which furnishes its [the system's] subject matter ..."
Elmasri, Navathe: "The ER model concepts are designed to be closer to the user's perception of data and are not meant to describe the way in which data will be stored in the computer." A semantic model is a model of concepts; it is sometimes called a "platform independent model". It is an intensional model. At the latest since Carnap, it is well known that: [12] "...the full meaning of a concept is constituted by two aspects, its intension and its extension. The first part comprises the embedding of a concept in the world of concepts as a whole, i.e. the totality of all relations to other concepts. The second part establishes the referential meaning of the concept, i.e. its counterpart in the real or in a possible world". An extensional model is one which maps to the elements of a particular methodology or technology, and is thus a "platform specific model". The UML specification explicitly states that associations in class models are extensional, and this is in fact self-evident by considering the extensive array of additional "adornments" provided by the specification over and above those provided by any of the prior candidate "semantic modelling languages". "UML as a Data Modeling Notation, Part 2" [13]
Diagramming conventions Chen's notation for entity–relationship modeling uses rectangles to represent entity sets, and diamonds to represent relationships appropriate for first-class objects: they can have attributes and relationships of their own. Entity sets are drawn as rectangles, relationship sets as diamonds. If an entity set participates in a relationship set, they are connected with a line. Attributes are drawn as ovals and are connected with a line to exactly one entity or relationship set. Cardinality constraints are expressed as follows:
• a double line indicates a participation constraint, totality or surjectivity: all entities in the entity set must participate in at least one relationship in the relationship set;
• an arrow from entity set to relationship set indicates a key constraint, i.e. injectivity: each entity of the entity set can participate in at most one relationship in the relationship set;
• a thick line indicates both, i.e. bijectivity: each entity in the entity set is involved in exactly one relationship.
Various methods of representing the same one to many relationship. In each case, the diagram shows the relationship between a person and a place of birth: each person must have been born at one, and only one, location, but each location may have had zero or more people born at it.
• an underlined name of an attribute indicates that it is a key: two different entities or relationships with this attribute always have different values for this attribute.
Attributes are often omitted as they can clutter up a diagram; other diagram techniques often list entity attributes within the rectangles drawn for entity sets.
Two related entities shown using Crow's Foot notation. In this example, an optional relationship is shown between Artist and Song; the symbols closest to the Song entity represent "zero, one, or many", whereas a Song has "one and only one" Artist. The former is therefore read as: an Artist can perform "zero, one, or many" song(s).
Related diagramming convention techniques:
• Bachman notation
• Barker's notation
• EXPRESS
• IDEF1X[14]
• Martin notation
• (min, max)-notation of Jean-Raymond Abrial in 1974
• UML class diagrams
• Merise
• Object-Role Modeling
Crow's Foot Notation Crow's Foot notation is used in Barker's Notation, SSADM and Information Engineering. Crow's Foot diagrams represent entities as boxes, and relationships as lines between the boxes. Different shapes at the ends of these lines represent the cardinality of the relationship. Crow's Foot notation was used in the 1980s by the consultancy practice CACI. Many of the consultants at CACI (including Richard Barker) subsequently moved to Oracle UK, where they developed the early versions of Oracle's CASE tools, introducing the notation to a wider audience. The following tools use Crow's Foot notation: ARIS, System Architect, Visio, PowerDesigner, Toad Data Modeler, DeZign for Databases, Devgems Data Modeler, OmniGraffle, MySQL Workbench and SQL Developer Data Modeler. CA's ICASE tool, CA Gen aka Information Engineering Facility also uses this notation.
ER diagramming tools There are many ER diagramming tools. Free software ER diagramming tools that can interpret and generate ER models and SQL and do database analysis are MySQL Workbench (formerly DBDesigner), and Open ModelSphere (open-source). A freeware ER tool that can generate database and application layer code (webservices) is the RISE Editor. Proprietary ER diagramming tools are Avolution, dbForge Studio for MySQL, ER/Studio, ERwin, MagicDraw, MEGA International, ModelRight, Navicat Data Modeler, OmniGraffle, Oracle Designer, PowerDesigner, Rational Rose, Sparx Enterprise Architect, SQLyog, System Architect, Toad Data Modeler, and Visual Paradigm. Free software diagram tools just draw the shapes without having any knowledge of what they mean, nor do they generate SQL. These include Creately, yEd, LucidChart, Kivio, and Dia.
Limitations ER models assume information content that can readily be represented in a relational database. They describe only a relational structure for this information. Hence, they are inadequate for systems in which the information cannot readily be represented in relational form, such as with semi-structured data. Furthermore, for many systems, the possible changes to the information contained are nontrivial and important enough to warrant explicit specification. Some authors have extended ER modeling with constructs to represent change, an approach supported by the original author; [15] an example is Anchor Modeling. An alternative is to model change separately, using a process modeling technique. Additional techniques can be used for other aspects of systems. For instance, ER models roughly correspond to just one of the 14 different modeling techniques offered by UML. Another limitation: ER modeling is aimed at specifying information from scratch. This suits the design of new, standalone information systems, but is of less help in integrating pre-existing information sources that already define their own data representations in detail. Even where it is suitable in principle, ER modeling is rarely used as a separate activity. One reason for this is today's abundance of tools that support diagramming and other design work directly on relational database management systems. These tools can readily extract database diagrams that are very close to ER diagrams from existing databases, and they provide alternative views on the information contained in such diagrams. In a survey, Brodie and Liu[16] could not find a single instance of entity–relationship modeling inside a sample of ten Fortune 100 companies. Badia and Lemire[17] blame this lack of use on the lack of guidance but also on the lack of benefits, such as lack of support for data integration.
Also, the enhanced entity–relationship model (EER modeling) introduces several concepts which are not present in ER modeling.
References
[1] "The Entity Relationship Model: Toward a Unified View of Data" (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.123.1085) for entity–relationship modeling.
[2] A.P.G. Brown, "Modelling a Real-World System and Designing a Schema to Represent It", in Douque and Nijssen (eds.), Data Base Description, North-Holland, 1975, ISBN 0-7204-2833-5.
[3] "Designing a Logical Database: Supertypes and Subtypes" (http://technet.microsoft.com/en-us/library/cc505839.aspx)
[4] Paul Beynon-Davies (2004). Database Systems. Houndmills, Basingstoke, UK: Palgrave
[5] Hubert Tardieu, Arnold Rochfeld and René Colletti, La méthode MERISE: Principes et outils (Paperback - 1983)
[6] Elmasri, Ramez; Navathe, Shamkant B., Fundamentals of Database Systems, third ed., Addison-Wesley, Menlo Park, CA, USA, 2000.
[7] ER 2004: 23rd International Conference on Conceptual Modeling, Shanghai, China, November 8-12, 2004 (http://books.google.com/books?id=odZK99osY1EC&pg=PA52&img=1&pgis=1&dq=genova&sig=ACfU3U3tDC_q8WOMqUJW4EZCa5YQywoYLw&edge=0)
[8] A Formal Treatment of UML Class Diagrams as an Efficient Method for Configuration Management, 2007 (http://publik.tuwien.ac.at/files/pub-inf_4582.pdf)
[9] James Dullea, Il-Yeol Song, Ioanna Lamprou, "An analysis of structural validity in entity-relationship modeling", 2002 (http://www.ischool.drexel.edu/faculty/song/publications/p_DKE_03_Validity.pdf)
[10] "Reasoning about participation constraints and Chen's constraints", S. Hartmann, 2003 (http://www.acs.org.au/documents/public/crpit/CRPITV17Hartmann.pdf)
[11] http://www.bkent.net/Doc/darxrp.htm
[12] http://wenku.baidu.com/view/8048e7bb1a37f111f1855b22.html
[13] http://www.tdan.com/view-articles/8589
[14] IDEF1X (https://idbms.navo.navy.mil/DataModel/IDEF1X.html)
[15] P. Chen. Suggested research directions for a new frontier: Active conceptual modeling (http://www.springerlink.com/content/5160x2634402663r/). ER 2006, volume 4215 of Lecture Notes in Computer Science, pages 1-4. Springer Berlin / Heidelberg, 2006.
[16] M. L. Brodie and J. T. Liu. The power and limits of relational technology in the age of information ecosystems (http://www.michaelbrodie.com/documents/The Power and Limits of Relational Technology In the Age of Information Ecosystems V2.pdf). On The Move Federated Conferences, 2010.
[17] A. Badia and D. Lemire. A call to arms: revisiting database design (http://dl.acm.org/citation.cfm?id=2070750). SIGMOD Record 40, 3 (November 2011), 61-69.
Further reading
• Richard Barker (1990). CASE Method: Tasks and Deliverables. Wokingham, England: Addison-Wesley.
• Paul Beynon-Davies (2004). Database Systems. Houndmills, Basingstoke, UK: Palgrave
• Peter Chen (March 1976). "The Entity-Relationship Model - Toward a Unified View of Data" (http://csc.lsu.edu/news/erd.pdf). ACM Transactions on Database Systems 1 (1): 9-36. doi:10.1145/320434.320440. ISSN 0362-5915.
External links
• Entity Relationship Modeling (http://www.devarticles.com/c/a/Development-Cycles/Entity-Relationship-Modeling/) - Article from Development Cycles
• Entity Relationship Modelling (http://www.databasedesign.co.uk/bookdatabasesafirstcourse/chap3/chap3.htm)
• An Entity Relationship Diagram Example (http://rapidapplicationdevelopment.blogspot.com/2007/06/entity-relationship-diagram-example.html). Demonstrates the crow's feet notation by way of an example.
• "Entity-Relationship Modeling: Historical Events, Future Trends, and Lessons Learned" (http://bit.csc.lsu.edu/~chen/pdf/Chen_Pioneers.pdf) by Peter Chen.
• "English, Chinese and ER diagrams" (http://bit.csc.lsu.edu/~chen/pdf/ER_C.pdf) by Peter Chen.
• Case study: E-R diagram for Acme Fashion Supplies (http://www.cilco.co.uk/briefing-studies/acme-fashion-supplies-feasibility-study/slides/logical-data-structure.html) by Mark H. Ridley.
• Logical Data Structures (LDSs) - Getting started (http://www.cems.uwe.ac.uk/~tdrewry/lds.htm) by Tony Drewry.
• Introduction to Data Modeling (http://www.utexas.edu/its/archive/windows/database/datamodeling/index.html)
• Lecture by Prof. Dr. Muhittin GÖKMEN (http://www3.itu.edu.tr/~gokmen/SE-lecture-5.pdf), Department of Computer Engineering, Istanbul Technical University.
• ER-Diagram Convention (http://www.scribd.com/doc/3053988/ER-Diagram-convention)
• Crow's Foot Notation (http://www2.cs.uregina.ca/~bernatja/crowsfoot.html)
• "Articulated Entity Relationship (AER) Diagram for Complete Automation of Relational Database Normalization" (http://airccse.org/journal/ijdms/papers/0510ijdms06.pdf) P. S. Dhabe, Dr. M. S. Patwardhan, Asavari A. Deshpande.
Database design
Database design is the process of producing a detailed data model of a database. This logical data model contains all the needed logical and physical design choices and physical storage parameters needed to generate a design in a Data Definition Language, which can then be used to create a database. A fully attributed data model contains detailed attributes for each entity. The term database design can be used to describe many different parts of the design of an overall database system. Principally, and most correctly, it can be thought of as the logical design of the base data structures used to store the data. In the relational model these are the tables and views. In an object database the entities and relationships map directly to object classes and named relationships. However, the term database design could also be used to apply to the overall process of designing, not just the base data structures, but also the forms and queries used as part of the overall database application within the database management system (DBMS). [1] The process of doing database design generally consists of a number of steps which will be carried out by the database designer. Usually, the designer must:
• Determine the relationships between the different data elements.
• Superimpose a logical structure upon the data on the basis of these relationships. [2]
ER Diagram (Entity-relationship model) Database designs also include ER (Entity-relationship model) diagrams. An ER diagram is a diagram that helps to design databases in an efficient way. Attributes in ER diagrams are usually modeled as an oval with the name of the attribute, linked to the entity or relationship that contains the attribute. Within the relational model the final step can generally be broken down into two further steps, that of determining the grouping of information within the system, generally determining what are the basic objects about which information is being stored, and then determining the relationships between these groups of information, or objects. This step is not necessary with an Object database. [2]
A sample Entity-relationship diagram
The Design Process[3]
1. Determine the purpose of the database - This helps prepare for the remaining steps.
2. Find and organize the information required - Gather all of the types of information to record in the database, such as product name and order number.
3. Divide the information into tables - Divide information items into major entities or subjects, such as Products or Orders. Each subject then becomes a table.
4. Turn information items into columns - Decide what information needs to be stored in each table. Each item becomes a field, and is displayed as a column in the table. For example, an Employees table might include fields such as Last Name and Hire Date.
5. Specify primary keys - Choose each table's primary key. The primary key is a column that is used to uniquely identify each row. An example might be Product ID or Order ID.
6. Set up the table relationships - Look at each table and decide how the data in one table is related to the data in other tables. Add fields to tables or create new tables to clarify the relationships, as necessary.
7. Refine the design - Analyze the design for errors. Create tables and add a few records of sample data. Check if results come from the tables as expected. Make adjustments to the design, as needed.
8. Apply the normalization rules - Apply the data normalization rules to see if tables are structured correctly. Make adjustments to the tables, as needed.
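Steps 3 to 7 can be sketched concretely. The following uses Python's built-in sqlite3 and follows the Products/Orders example from the steps; the exact table and column names are illustrative choices for this sketch, not prescribed by the process:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Steps 3-5: divide information into tables, turn items into columns,
# and specify a primary key for each table.
conn.execute("""CREATE TABLE products (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL)""")
# Step 6: set up the table relationship by adding a field (product_id)
# that references the related table.
conn.execute("""CREATE TABLE orders (
    order_id   INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES products(product_id),
    quantity   INTEGER)""")
# Step 7: add a few records of sample data and check the results.
conn.execute("INSERT INTO products VALUES (1, 'Widget')")
conn.execute("INSERT INTO orders VALUES (100, 1, 3)")
result = conn.execute("""
    SELECT o.order_id, p.product_name, o.quantity
    FROM orders o JOIN products p ON p.product_id = o.product_id
""").fetchone()
print(result)
```

If the joined row comes back as expected, the relationship between the two tables is working; otherwise the design is adjusted as step 7 describes.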
Determining data to be stored In a majority of cases, a person who is doing the design of a database is a person with expertise in the area of database design, rather than expertise in the domain from which the data to be stored is drawn e.g. financial information, biological information etc. Therefore the data to be stored in the database must be determined in cooperation with a person who does have expertise in that domain, and who is aware of what data must be stored within the system. This process is one which is generally considered part of requirements analysis, and requires skill on the part of the database designer to elicit the needed information from those with the domain knowledge. This is because those with the necessary domain knowledge frequently cannot express clearly what their system requirements for the database are as they are unaccustomed to thinking in terms of the discrete data elements which must be stored. Data to be stored can be determined by Requirement Specification. [4]
Normalization In the field of relational database design, normalization is a systematic way of ensuring that a database structure is suitable for general-purpose querying and free of certain undesirable characteristics (insertion, update, and deletion anomalies) that could lead to a loss of data integrity. A standard piece of database design guidance is that the designer should create a fully normalized design; selective denormalization can subsequently be performed, but only for performance reasons. However, some modeling disciplines, such as the dimensional modeling approach to data warehouse design, explicitly recommend non-normalized designs, i.e. designs that in large part do not adhere to 3NF. The normal forms are 1NF, 2NF, 3NF, Boyce-Codd NF (3.5NF), 4NF and 5NF.
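To make the update anomaly concrete (a hypothetical example invented for this sketch, not from the text): a table that repeats a customer's city on every order row must be updated in many places when the city changes; a normalized design stores each fact once. Using Python's built-in sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Normalized design: the customer's city is stored exactly once, in
# `customers`, rather than being repeated on every order row (which
# would invite update anomalies when the city changes).
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        city        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id)
    );
""")
conn.execute("INSERT INTO customers VALUES (1, 'Leeds')")
conn.executemany("INSERT INTO orders VALUES (?, 1)", [(10,), (11,), (12,)])

# One UPDATE now corrects the city for all of this customer's orders;
# in the unnormalized design, three rows would need changing.
conn.execute("UPDATE customers SET city = 'York' WHERE customer_id = 1")
cities = conn.execute("""
    SELECT DISTINCT c.city FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
""").fetchall()
print(cities)
```

The same split also removes the deletion anomaly: deleting a customer's last order no longer destroys the only record of their city.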
Types of Database design Conceptual schema Once a database designer is aware of the data which is to be stored within the database, they must then determine where dependency is within the data. Sometimes when data is changed you can be changing other data that is not visible. For example, in a list of names and addresses, assuming a situation where multiple people can have the same address, but one person cannot have more than one address, the address is dependent upon the name: knowing the name determines a unique address. The other way around does not hold, since several names may share one address. (NOTE: A common misconception is that the relational model is so called because of the stating of relationships between data elements therein. This is not true. The relational model is so named because it is based upon the mathematical structures known as relations.)
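A dependency like the one in the names-and-addresses example can be checked mechanically: attribute Y is functionally dependent on attribute X exactly when no two rows agree on X but differ on Y. The helper and sample data below are invented for this sketch:

```python
# Return True if attribute `y` is functionally dependent on attribute `x`:
# no two rows may share an x-value while holding different y-values.
def depends_on(rows, x, y):
    seen = {}
    for row in rows:
        if row[x] in seen and seen[row[x]] != row[y]:
            return False
        seen[row[x]] = row[y]
    return True

people = [
    {"name": "Ann", "address": "1 High St"},
    {"name": "Bob", "address": "1 High St"},  # a shared address is allowed
    {"name": "Ann", "address": "1 High St"},
]
# name -> address holds (each person has one address),
# but address -> name fails (two people live at "1 High St").
print(depends_on(people, "name", "address"))
print(depends_on(people, "address", "name"))
```

In practice such checks only refute a dependency; confirming one is a statement about the domain, not about any particular sample of rows.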
Logically structuring data Once the relationships and dependencies amongst the various pieces of information have been determined, it is possible to arrange the data into a logical structure which can then be mapped into the storage objects supported by the database management system. In the case of relational databases the storage objects are tables which store data in rows and columns. Each table may represent an implementation of either a logical object or a relationship joining one or more instances of one or more logical objects. Relationships between tables may then be stored as links connecting child tables with parents. Since complex logical relationships are themselves tables they will probably have links to more than one parent. In an Object database the storage objects correspond directly to the objects used by the Object-oriented programming language used to write the applications that will manage and access the data. The relationships may be defined as attributes of the object classes involved or as methods that operate on the object classes.
Physical design The physical design of the database specifies the physical configuration of the database on the storage media. This includes detailed specification of data elements, data types, indexing options and other parameters residing in the DBMS data dictionary. It is the detailed design of a system that includes modules & the database's hardware & software specifications of the system.
References
[1] Gehani, N. (2006). The Database Book: Principles and practice using MySQL. 1st ed., Summit, NJ: Silicon Press
[2] Teorey, T.J., Lightstone, S.S., et al. (2009). Database Design: Know It All. 1st ed. Burlington, MA: Morgan Kaufmann Publishers
[3] Database design basics. (n.d.). Retrieved May 1, 2010, from http://office.microsoft.com/en-us/access/HA012242471033.aspx
[4] Teorey, T.; Lightstone, S. and Nadeau, T. (2005) Database Modeling & Design: Logical Design, 4th edition, Morgan Kaufmann Press. ISBN 0-12-685352-5
Further reading
• S. Lightstone, T. Teorey, T. Nadeau, Physical Database Design: the database professional's guide to exploiting indexes, views, storage, and more, Morgan Kaufmann Press, 2007. ISBN 0-12-369389-6
External links
• (http://www.sqlteam.com/article/database-design-and-modeling-fundamentals)
• (http://office.microsoft.com/en-us/access/HA012242471033.aspx)
• Database Normalization Basics (http://databases.about.com/od/specificproducts/a/normalization.htm) by Mike Chapple (About.com)
• Database Normalization Intro (http://www.databasejournal.com/sqletc/article.php/1428511), Part 2 (http://www.databasejournal.com/sqletc/article.php/26861_1474411_1)
• "An Introduction to Database Normalization" (http://web.archive.org/web/20110606025027/http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html). Archived from the original (http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html) on 2011-06-06. Retrieved 2012-02-25.
• "Normalization" (http://web.archive.org/web/20100106115112/http://www.utexas.edu/its/archive/windows/database/datamodeling/rm/rm7.html). Archived from the original (http://www.utexas.edu/its/windows/database/datamodeling/rm/rm7.html) on 2010-01-06. Retrieved 2012-02-25.
• Efficient Database Design (http://www.sum-it.nl/cursus/enindex.php3#dbdesign)
• Data Modelers Community (http://www.datamodelers.com/)
• Relational database design tutorial (http://en.tekstenuitleg.net/articles/software/database-design-tutorial/intro.html)
• Database design (http://www.dmoz.org/Computers/Data_Formats/Database/) at the Open Directory Project
Relational database A relational database is a collection of data items organised as a set of formally described tables from which data can be accessed easily. A relational database is created using the relational model. The software used in a relational database is called a relational database management system (RDBMS). The relational database is the predominant choice for storing data, favoured over other models such as the hierarchical database model or the network model. The relational database was first defined in 1970 by Edgar Codd, of IBM's San Jose Research Laboratory. [1]
Terminology Relational database theory uses a set of mathematical terms, which are roughly equivalent to SQL database terminology. The table below summarizes some of the most important relational database terms and their SQL database equivalents.
Relational database terminology:

  Relational term         SQL equivalent
  ----------------------  ------------------------------
  relation, base relvar   table
  derived relvar          view, query result, result set
  tuple                   row
  attribute               column
Relations or Tables A relation is defined as a set of tuples that have the same attributes. A tuple usually represents an object and information about that object. Objects are typically physical objects or concepts. A relation is usually described as a table, which is organized into rows and columns. All the data referenced by an attribute are in the same domain and conform to the same constraints. The relational model specifies that the tuples of a relation have no specific order and that the tuples, in turn, impose no order on the attributes. Applications access data by specifying queries, which use operations such as select to identify tuples, project to identify attributes, and join to combine relations. Relations can be modified using the insert, delete, and update operators. New tuples can supply explicit values or be derived from a query. Similarly, queries identify tuples for updating or deleting. It is necessary for each tuple of a relation to be uniquely identifiable by some combination (one or more) of its attribute values. This combination is referred to as the primary key.
Base and derived relations In a relational database, all data are stored and accessed via relations. Relations that store data are called "base relations", and in implementations are called "tables". Other relations do not store data, but are computed by applying relational operations to other relations. These relations are sometimes called "derived relations". In implementations these are called "views" or "queries". Derived relations are convenient in that they act as a single relation, even though they may grab information from several relations. Also, derived relations can be used as an abstraction layer.
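The distinction between base and derived relations can be sketched with SQLite (the table, view, and column names below are illustrative, not taken from the text): the base table stores the data, while the view is computed from it on demand yet can be queried as if it were a single relation.

```python
import sqlite3

# Base relation ("table") stores data; derived relation ("view") is computed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [("Ann", "Sales", 50000), ("Bob", "IT", 60000)])

# The view acts as a single relation even though its data lives elsewhere.
conn.execute("CREATE VIEW high_paid AS "
             "SELECT name FROM employee WHERE salary > 55000")
rows = conn.execute("SELECT name FROM high_paid").fetchall()
print(rows)  # [('Bob',)]
```

Here the view also serves as the abstraction layer mentioned above: a query against high_paid keeps working even if the underlying employee table is later restructured, so long as the view definition is updated to match.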
Domain A domain describes the set of possible values for a given attribute, and can be considered a constraint on the value of the attribute. Mathematically, attaching a domain to an attribute means that any value for the attribute must be an element of the specified set. The character data value 'ABC', for instance, is not in the integer domain. The integer value 123 satisfies the domain constraint.
Constraints Constraints make it possible to further restrict the domain of an attribute. For instance, a constraint can restrict a given integer attribute to values between 1 and 10. Constraints provide one method of implementing business rules in the database. SQL implements constraint functionality in the form of check constraints. Constraints restrict the data that can be stored in relations. These are usually defined using expressions that result in a boolean value, indicating whether or not the data satisfies the constraint. Constraints can apply to single attributes, to a tuple (restricting combinations of attributes) or to an entire relation. Since every attribute has an associated domain, there are constraints ( domain constraints). The two principal rules for the relational model are known as entity integrity and referential integrity.
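The 1-to-10 example above can be sketched as a SQL check constraint (SQLite; the table and column names are illustrative). The expression evaluates to a boolean, and the DBMS rejects any tuple for which it is false.

```python
import sqlite3

# A CHECK constraint restricting an integer attribute to values 1..10.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rating (stars INTEGER CHECK (stars BETWEEN 1 AND 10))")
conn.execute("INSERT INTO rating VALUES (7)")   # satisfies the constraint

try:
    conn.execute("INSERT INTO rating VALUES (42)")  # outside the domain
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True: the DBMS refuses data that fails the constraint
```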
Primary keys A primary key uniquely identifies each tuple within a table. In order for an attribute to be a good primary key it must not repeat. While natural attributes are sometimes good primary keys, surrogate keys are often used instead. A surrogate key is an artificial attribute assigned to an object which uniquely identifies it (for instance, in a table of information about students at a school they might all be assigned a student ID in order to differentiate them). The surrogate key has no intrinsic (inherent) meaning, but rather is useful through its ability to uniquely identify a tuple. Another common occurrence, especially with regard to N:M cardinality, is the composite key. A composite key is a key made up of two or more attributes within a table that (together) uniquely identify a record. (For example, in a database relating students, teachers, and classes, classes could be uniquely identified by a composite key of their room number and time slot, since no other class could have exactly the same combination of attributes. In fact, use of a composite key such as this can be a form of data verification, albeit a weak one.)
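The classes example above can be sketched as a composite primary key in SQLite (the column names room_no and time_slot are hypothetical): two classes may share a room or a time slot, but not both.

```python
import sqlite3

# Composite primary key: (room number, time slot) together identify a class.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE class (
    room_no   INTEGER,
    time_slot TEXT,
    teacher   TEXT,
    PRIMARY KEY (room_no, time_slot))""")
conn.execute("INSERT INTO class VALUES (101, 'Mon 9am', 'Codd')")
conn.execute("INSERT INTO class VALUES (101, 'Mon 10am', 'Date')")  # same room, new slot: OK

try:
    conn.execute("INSERT INTO class VALUES (101, 'Mon 9am', 'Darwen')")  # both repeat
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```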
Foreign key A foreign key is a field in a relational table that matches the primary key column of another table. The foreign key can be used to cross-reference tables. Foreign keys need not have unique values in the referencing relation. Foreign keys effectively use the values of attributes in the referenced relation to restrict the domain of one or more attributes in the referencing relation. A foreign key could be described formally as: "For all tuples in the referencing relation projected over the referencing attributes, there must exist a tuple in the referenced relation projected over those same attributes such that the values in each of the referencing attributes match the corresponding values in the referenced attributes."
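The formal description above amounts to this in practice: every value of the referencing attribute must already exist in the referenced relation. A minimal sketch in SQLite (hypothetical schema; note that SQLite only enforces foreign keys when the pragma is enabled):

```python
import sqlite3

# A foreign key restricting order.customer_id to values present in customer.id.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite requires this to enforce FKs
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute('''CREATE TABLE "order" (
    order_no    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(id))''')
conn.execute("INSERT INTO customer VALUES (1, 'Munmun')")
conn.execute('INSERT INTO "order" VALUES (100, 1)')  # references an existing customer

try:
    conn.execute('INSERT INTO "order" VALUES (101, 99)')  # no customer 99
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
print(fk_enforced)  # True
```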
Stored procedures A stored procedure is executable code that is associated with, and generally stored in, the database. Stored procedures usually collect and customize common operations, like inserting a tuple into a relation, gathering statistical information about usage patterns, or encapsulating complex business logic and calculations. Frequently they are used as an application programming interface (API) for security or simplicity. Implementations of stored procedures on SQL RDBMSs often allow developers to take advantage of procedural extensions (often vendor-specific) to the standard declarative SQL syntax. Stored procedures are not part of the relational database model, but all commercial implementations include them.
Index An index is one way of providing quicker access to data. Indices can be created on any combination of attributes on a relation. Queries that filter using those attributes can find matching tuples directly using the index, without having to check each tuple in turn. This is analogous to using the index of a book to go directly to the page on which the information you are looking for is found, so that you do not have to read the entire book to find what you are looking for. Relational databases typically supply multiple indexing techniques, each of which is optimal for some combination of data distribution, relation size, and typical access pattern. Indices are usually implemented via B+ trees, R-trees, and bitmaps. Indices are usually not considered part of the database, as they are considered an implementation detail, though indices are usually maintained by the same group that maintains the other parts of the database. Use of efficient indexes on both primary and foreign keys can dramatically improve query performance: B-tree indexes result in query times proportional to log(n), where n is the number of rows in a table, and hash indexes result in constant-time queries (no size dependency, so long as the relevant part of the index fits into memory).
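One way to see an index being used is SQLite's EXPLAIN QUERY PLAN facility (the table, data, and index names below are illustrative): after the index is created, the query plan reports a search via the index rather than a scan of every row.

```python
import sqlite3

# Create a table, index one attribute, and inspect the resulting query plan.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, surname TEXT)")
conn.executemany("INSERT INTO person (surname) VALUES (?)",
                 [(s,) for s in ["Codd", "Date", "Darwen"] * 100])
conn.execute("CREATE INDEX idx_surname ON person (surname)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM person WHERE surname = 'Codd'"
).fetchall()
print(plan)  # the plan text mentions idx_surname rather than a full table scan
```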
Relational operations Queries made against the relational database, and the derived relvars in the database, are expressed in a relational calculus or a relational algebra. In his original relational algebra, Codd introduced eight relational operators in two groups of four operators each. The first four operators were based on the traditional mathematical set operations:
• The union operator combines the tuples of two relations and removes all duplicate tuples from the result. The relational union operator is equivalent to the SQL UNION operator.
• The intersection operator produces the set of tuples that two relations share in common. Intersection is implemented in SQL in the form of the INTERSECT operator.
• The difference operator acts on two relations and produces the set of tuples from the first relation that do not exist in the second relation. Difference is implemented in SQL in the form of the EXCEPT or MINUS operator.
• The cartesian product of two relations is a join that is not restricted by any criteria, resulting in every tuple of the first relation being matched with every tuple of the second relation. The cartesian product is implemented in SQL as the CROSS JOIN operator.
The remaining operators proposed by Codd involve special operations specific to relational databases:
• The selection, or restriction, operation retrieves tuples from a relation, limiting the results to only those that meet specific criteria, i.e. a subset in terms of set theory. The SQL equivalent of selection is the SELECT query statement with a WHERE clause.
• The projection operation extracts only the specified attributes from a tuple or set of tuples.
• The join operation defined for relational databases is often referred to as a natural join. In this type of join, two relations are connected by their common attributes. SQL's approximation of a natural join is the INNER JOIN operator.
• The relational division operation is a slightly more complex operation, which involves essentially using the tuples of one relation (the dividend) to partition a second relation (the divisor). The relational division operator is effectively the opposite of the cartesian product operator (hence the name).
Other operators have been introduced or proposed since Codd's introduction of the original eight, including relational comparison operators and extensions that offer support for nesting and hierarchical data, among others.
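The set-based operators above map directly onto SQL, which can be sketched with SQLite (the two single-column relations r and s are illustrative):

```python
import sqlite3

# Union, intersection, difference, and cartesian product in SQL terms.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE r (x INTEGER)")
conn.execute("CREATE TABLE s (x INTEGER)")
conn.executemany("INSERT INTO r VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO s VALUES (?)", [(2,), (3,), (4,)])

def q(sql):
    return [row[0] for row in conn.execute(sql)]

union = q("SELECT x FROM r UNION SELECT x FROM s ORDER BY x")      # duplicates removed
inter = q("SELECT x FROM r INTERSECT SELECT x FROM s ORDER BY x")  # shared tuples
diff  = q("SELECT x FROM r EXCEPT SELECT x FROM s ORDER BY x")     # in r but not s
prod  = conn.execute("SELECT * FROM r CROSS JOIN s").fetchall()    # every pairing

print(union, inter, diff, len(prod))  # [1, 2, 3, 4] [2, 3] [1] 9
```

Selection and projection are the WHERE clause and the column list of an ordinary SELECT, so they need no separate operator here.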
Normalization Normalization was first proposed by Codd as an integral part of the relational model. It encompasses a set of procedures designed to eliminate nonsimple domains (non-atomic values) and the redundancy (duplication) of data, which in turn prevents data manipulation anomalies and loss of data integrity. The most common forms of normalization applied to databases are called the normal forms.
Relational database management systems Relational databases, as implemented in relational database management systems, have become a predominant choice for the storage of information in new databases used for financial records, manufacturing and logistical information, personnel data and much more. Relational databases have often replaced legacy hierarchical databases and network databases because they are easier to understand and use, even though they are much less efficient. As computer power has increased, the inefficiencies of relational databases, which made them impractical in earlier times, have been outweighed by their ease of use. However, relational databases have been challenged by object databases, which were introduced in an attempt to address the object-relational impedance mismatch in relational databases, and by XML databases. The three leading commercial relational database vendors are Oracle, Microsoft, and IBM.[2] The three leading open source implementations are MySQL, PostgreSQL, and SQLite. Amazon Relational Database Service is a database as a service offering MySQL and Oracle database engines.
References
[1] Codd, E.F. (1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM 13 (6): 377-387. doi:10.1145/362384.362685.
[2] Gartner Says Worldwide Relational Database Market Increased 14 Percent in 2006 (http://www.gartner.com/it/page.jsp?id=507466), includes revenue estimates for leading database companies
Relational model The relational model for database management is a database model based on first-order predicate logic, first formulated and proposed in 1969 by Edgar F. Codd. [1][2] In the relational model of a database, all data is represented in terms of tuples, grouped into relations. A database organized in terms of the relational model is a relational database. The purpose of the relational model is to provide a declarative method for specifying data and queries: users directly state what information the database contains and what information they want from it, and let the database management system software take care of describing data structures for storing the data and retrieval procedures for answering queries. Most implementations of the relational model use the SQL data definition and query language. A table in an SQL database schema corresponds to a predicate variable; the contents of a table to a relation; key constraints, other constraints, and SQL queries correspond to predicates. However, SQL databases, including DB2, deviate from the relational model in many details; Codd fiercely argued against deviations that compromise the original principles.[4]
Diagram of an example database according to the Relational model. [3]
Overview The relational model's central idea is to describe a database as a collection of predicates over a finite set of predicate variables, describing constraints on the possible values and combinations of values. The content of the database at any given time is a finite (logical) model of the database, i.e. a set of relations, one per predicate variable, such that all predicates are satisfied. A request for information from the database (a database query) is also a predicate.
In the relational model, related records are linked together with a "key".
Alternatives to the relational model Other models are the hierarchical model and network model. Some systems using these older architectures are still in use today in data centers with high data volume needs, or where existing systems are so complex and abstract it would be cost-prohibitive to migrate to systems employing the relational model; also of note are newer object-oriented databases.
Relational model concepts.
A recent development is the Object-Relation type-Object model, which is based on the assumption that any fact can be expressed in the form of one or more binary relationships. The model is used in Object Role Modeling (ORM), RDF/Notation 3 (N3). The relational model was the first database model to be described in formal mathematical terms. Hierarchical and network databases existed before relational databases, but their specifications were relatively informal. After the relational model was defined, there were many attempts to compare and contrast the different models, and this led to the emergence of more rigorous descriptions of the earlier models; though the procedural nature of the data manipulation interfaces for hierarchical and network databases limited the scope for formalization.
Relational model
Implementation There have been several attempts to produce a true implementation of the relational database model as originally defined by Codd and explained by Date, Darwen and others, but none have been popular successes so far. Rel is one of the more recent attempts to do this.
History The relational model was invented by E.F. (Ted) Codd as a general model of data, and subsequently maintained and developed by Chris Date and Hugh Darwen among others. In The Third Manifesto (first published in 1995) Date and Darwen show how the relational model can accommodate certain desired object-oriented features.
Controversies Codd himself, some years after publication of his 1970 model, proposed a three-valued logic (True, False, Missing or NULL) version of it to deal with missing information, and in his The Relational Model for Database Management Version 2 (1990) he went a step further with a four-valued logic (True, False, Missing but Applicable, Missing but Inapplicable) version. But these have never been implemented, presumably because of attending complexity. SQL's NULL construct was intended to be part of a three-valued logic system, but fell short of that due to logical errors in the standard and in its implementations.
Relational model topics The model The fundamental assumption of the relational model is that all data is represented as mathematical n-ary relations, an n-ary relation being a subset of the Cartesian product of n domains. In the mathematical model, reasoning about such data is done in two-valued predicate logic, meaning there are two possible evaluations for each proposition: either true or false (and in particular no third value such as unknown, or not applicable, either of which are often associated with the concept of NULL). Data are operated upon by means of a relational calculus or relational algebra, these being equivalent in expressive power. The relational model of data permits the database designer to create a consistent, logical representation of information. Consistency is achieved by including declared constraints in the database design, which is usually referred to as the logical schema. The theory includes a process of database normalization whereby a design with certain desirable properties can be selected from a set of logically equivalent alternatives. The access plans and other implementation and operation details are handled by the DBMS engine, and are not reflected in the logical model. This contrasts with common practice for SQL DBMSs in which performance tuning often requires changes to the logical model. The basic relational building block is the domain or data type, usually abbreviated nowadays to type. A tuple is an ordered set of attribute values. An attribute is an ordered pair of attribute name and type name. An attribute value is a specific valid value for the type of the attribute. This can be either a scalar value or a more complex type. A relation consists of a heading and a body. A heading is a set of attributes. A body (of an n-ary relation) is a set of n-tuples. The heading of the relation is also the heading of each of its tuples. A relation is defined as a set of n-tuples.
In both mathematics and the relational database model, a set is an unordered collection of unique, non-duplicated items, although some DBMSs impose an order to their data. In mathematics, a tuple has an order, and allows for duplication. E.F. Codd originally defined tuples using this mathematical definition. [5] Later, it was one of E.F. Codd's great insights that using attribute names instead of an ordering would be so much more convenient (in general) in a computer language based on relations. This insight is still being used today. Though the concept has changed, the name "tuple" has not. An immediate and important
consequence of this distinguishing feature is that in the relational model the Cartesian product becomes commutative. A table is an accepted visual representation of a relation; a tuple is similar to the concept of row, but note that in the database language SQL the columns and the rows of a table are ordered. A relvar is a named variable of some specific relation type, to which at all times some relation of that type is assigned, though the relation may contain zero tuples. The basic principle of the relational model is the Information Principle: all information is represented by data values in relations. In accordance with this Principle, a relational database is a set of relvars and the result of every query is presented as a relation. The consistency of a relational database is enforced, not by rules built into the applications that use it, but rather by constraints, declared as part of the logical schema and enforced by the DBMS for all applications. In general, constraints are expressed using relational comparison operators, of which just one, "is subset of" (⊆), is theoretically sufficient. In practice, several useful shorthands are expected to be available, of which the most important are candidate key (really, superkey) and foreign key constraints.
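The claim that named attributes make the Cartesian product commutative can be sketched directly in plain Python, modelling a tuple as a set of (attribute name, value) pairs rather than as an ordered sequence (the helper names here are illustrative):

```python
# Model a relational tuple as a frozenset of (attribute, value) pairs,
# and a relation as a set of such tuples.
def tup(**attrs):
    return frozenset(attrs.items())

def product(rel1, rel2):
    # Combine every tuple of rel1 with every tuple of rel2 by taking the
    # union of their attribute/value pairs (assumes disjoint attribute names).
    return {t1 | t2 for t1 in rel1 for t2 in rel2}

r = {tup(a=1), tup(a=2)}
s = {tup(b="x")}

# With named attributes there is no left/right position, so the order of
# the operands cannot matter.
print(product(r, s) == product(s, r))  # True
```

With positional (mathematical) tuples, product(r, s) and product(s, r) would differ, since (1, "x") is not the same ordered pair as ("x", 1); naming the attributes removes that asymmetry.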
Interpretation To fully appreciate the relational model of data it is essential to understand the intended interpretation of a relation. The body of a relation is sometimes called its extension. This is because it is to be interpreted as a representation of the extension of some predicate, this being the set of true propositions that can be formed by replacing each free variable in that predicate by a name (a term that designates something). There is a one-to-one correspondence between the free variables of the predicate and the attribute names of the relation heading. Each tuple of the relation body provides attribute values to instantiate the predicate by substituting each of its free variables. The result is a proposition that is deemed, on account of the appearance of the tuple in the relation body, to be true. Contrariwise, every tuple whose heading conforms to that of the relation but which does not appear in the body is deemed to be false. This assumption is known as the closed world assumption: it is often violated in practical databases, where the absence of a tuple might mean that the truth of the corresponding proposition is unknown. For example, the absence of the tuple ('John', 'Spanish') from a table of language skills cannot necessarily be taken as evidence that John does not speak Spanish. For a formal exposition of these ideas, see the section Set-theoretic Formulation, below.
Application to databases A data type as used in a typical relational database might be the set of integers, the set of character strings, the set of dates, or the two boolean values true and false, and so on. The corresponding type names for these types might be the strings "int", "char", "date", "boolean", etc. It is important to understand, though, that relational theory does not dictate what types are to be supported; indeed, nowadays provisions are expected to be available for user-defined types in addition to the built-in ones provided by the system. Attribute is the term used in the theory for what is commonly referred to as a column. Similarly, table is commonly
used in place of the theoretical term relation (though in SQL the term is by no means synonymous with relation). A table data structure is specified as a list of column definitions, each of which specifies a unique column name and the type of the values that are permitted for that column. An attribute value is the entry in a specific column and row, such as "John Doe" or "35". A tuple is basically the same thing as a row, except in an SQL DBMS, where the column values in a row are ordered. (Tuples are not ordered; instead, each attribute value is identified solely by the attribute name and never by its ordinal position within the tuple.) An attribute name might be "name" or "age".
A relation is a table structure definition (a set of column definitions) along with the data appearing in that structure. The structure definition is the heading and the data appearing in it is the body, a set of rows. A database relvar (relation variable) is commonly known as a base table. The heading of its assigned value at any time is as specified in the table declaration, and its body is that most recently assigned to it by invoking some update operator (typically, INSERT, UPDATE, or DELETE). The heading and body of the table resulting from evaluation of some query are determined by the definitions of the operators used in the expression of that query. (Note that in SQL the heading is not always a set of column definitions as described above, because it is possible for a column to have no name and also for two or more columns to have the same name. Also, the body is not always a set of rows because in SQL it is possible for the same row to appear more than once in the same body.)
SQL and the relational model SQL, initially pushed as the standard language for relational databases, deviates from the relational model in several places. The current ISO SQL standard doesn't mention the relational model or use relational terms or concepts. However, it is possible to create a database conforming to the relational model using SQL if one does not use certain SQL features. The following deviations from the relational model have been noted in SQL. Note that few database servers implement the entire SQL standard and in particular do not allow some of these deviations. Whereas NULL is ubiquitous, for example, allowing duplicate column names within a table or anonymous columns is uncommon. Duplicate rows The same row can appear more than once in an SQL table. The same tuple cannot appear more than once in a relation. Anonymous columns A column in an SQL table can be unnamed and thus unable to be referenced in expressions. The relational model requires every attribute to be named and referenceable. Duplicate column names Two or more columns of the same SQL table can have the same name and therefore cannot be referenced, on account of the obvious ambiguity. The relational model requires every attribute to be referenceable. Column order significance The order of columns in an SQL table is defined and significant, one consequence being that SQL's implementations of Cartesian product and union are both noncommutative. The relational model requires there to be no significance to any ordering of the attributes of a relation. Views without CHECK OPTION Updates to a view defined without CHECK OPTION can be accepted but the resulting update to the database does not necessarily have the expressed effect on its target. For example, an invocation of INSERT can be accepted but the inserted rows might not all appear in the view, or an invocation of UPDATE can result in rows disappearing from the view. 
The relational model requires updates to a view to have the same effect as if the view were a base relvar. Columnless tables unrecognized SQL requires every table to have at least one column, but there are two relations of degree zero (of cardinality one and zero) and they are needed to represent extensions of predicates that contain no free variables. NULL This special mark can appear instead of a value wherever a value can appear in SQL, in particular in place of a column value in some row. The deviation from the relational model arises from the fact that the implementation of this ad hoc concept in SQL involves the use of three-valued logic, under which the
comparison of NULL with itself does not yield true but instead yields the third truth value, unknown; similarly the comparison of NULL with something other than itself does not yield false but instead yields unknown. It is because of this behaviour in comparisons that NULL is described as a mark rather than a value. The relational model depends on the law of excluded middle under which anything that is not true is false and anything that is not false is true; it also requires every tuple in a relation body to have a value for every attribute of that relation. This particular deviation is disputed by some if only because E.F. Codd himself eventually advocated the use of special marks and a 4-valued logic, but this was based on his observation that there are two distinct reasons why one might want to use a special mark in place of a value, which led opponents of the use of such logics to discover more distinct reasons and at least as many as 19 have been noted, which would require a 21-valued logic. SQL itself uses NULL for several purposes other than to represent "value unknown". For example, the sum of the empty set is NULL, meaning zero, the average of the empty set is NULL, meaning undefined, and NULL appearing in the result of a LEFT JOIN can mean "no value because there is no matching row in the right-hand operand".
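The three-valued behaviour of NULL described above can be observed directly in SQLite, where the unknown truth value surfaces as None in Python:

```python
import sqlite3

# NULL compared with anything, including itself, yields unknown (None),
# and aggregating over an empty set also yields NULL.
conn = sqlite3.connect(":memory:")
null_eq_null = conn.execute("SELECT NULL = NULL").fetchone()[0]
null_eq_one  = conn.execute("SELECT NULL = 1").fetchone()[0]
sum_empty    = conn.execute(
    "SELECT SUM(x) FROM (SELECT 1 AS x WHERE 0)").fetchone()[0]

print(null_eq_null, null_eq_one, sum_empty)  # None None None
```

Neither comparison returns true or false; both return the unknown mark, which is exactly the deviation from two-valued logic that the relational model forbids.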
Relational operations Users (or programs) request data from a relational database by sending it a query that is written in a special language, usually a dialect of SQL. Although SQL was originally intended for end-users, it is much more common for SQL queries to be embedded into software that provides an easier user interface. Many web sites, such as Wikipedia, perform SQL queries when generating pages. In response to a query, the database returns a result set, which is just a list of rows containing the answers. The simplest query is just to return all the rows from a table, but more often, the rows are filtered in some way to return just the answer wanted. Often, data from multiple tables are combined into one, by doing a join. Conceptually, this is done by taking all possible combinations of rows (the Cartesian product), and then filtering out everything except the answer. In practice, relational database management systems rewrite ("optimize") queries to perform faster, using a variety of techniques. There are a number of relational operations in addition to join. These include project (the process of eliminating some of the columns), restrict (the process of eliminating some of the rows), union (a way of combining two tables with similar structures), difference (which lists the rows in one table that are not found in the other), intersect (which lists the rows found in both tables), and product (mentioned above, which combines each row of one table with each row of the other). Depending on which other sources you consult, there are a number of other operators – many of which can be defined in terms of those listed above. These include semi-join, outer operators such as outer join and outer union, and various forms of division. Then there are operators to rename columns, and summarizing or aggregating operators, and if you permit relation values as attributes (RVA – relation-valued attribute), then operators such as group and ungroup. 
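The join described above, conceptually a filtered Cartesian product, can be sketched with SQLite (two illustrative tables; in practice the optimizer never materializes the full product):

```python
import sqlite3

# A join combines rows from two tables on a matching condition and
# returns a result set of rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE skill (person_id INTEGER, language TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(1, "John"), (2, "Mary")])
conn.executemany("INSERT INTO skill VALUES (?, ?)",
                 [(1, "French"), (2, "Spanish")])

result = conn.execute("""
    SELECT person.name, skill.language
    FROM person JOIN skill ON skill.person_id = person.id
    ORDER BY person.name""").fetchall()
print(result)  # [('John', 'French'), ('Mary', 'Spanish')]
```

Of the 2 × 2 = 4 combinations in the Cartesian product of these tables, the ON condition keeps only the two where the key values match.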
The SELECT statement in SQL serves to handle all of these except for the group and ungroup operators. The flexibility of relational databases allows programmers to write queries that were not anticipated by the database designers. As a result, relational databases can be used by multiple applications in ways the original designers did not foresee, which is especially important for databases that might be used for a long time (perhaps several decades). This has made the idea and implementation of relational databases very popular with businesses.
Database normalization Relations are classified based upon the types of anomalies to which they're vulnerable. A database that's in the first normal form is vulnerable to all types of anomalies, while a database that's in the domain/key normal form has no modification anomalies. Normal forms are hierarchical in nature. That is, the lowest level is the first normal form, and the database cannot meet the requirements for higher level normal forms without first having met all the requirements of the lesser normal forms. [6]
Examples

An idealized, very simple example of a description of some relvars (relation variables) and their attributes:

• Customer (Customer ID, Tax ID, Name, Address, City, State, Zip, Phone, Email)
• Order (Order No, Customer ID, Invoice No, Date Placed, Date Promised, Terms, Status)
• Order Line (Order No, Order Line No, Product Code, Qty)
• Invoice (Invoice No, Customer ID, Order No, Date, Status)
• Invoice Line (Invoice No, Invoice Line No, Product Code, Qty Shipped)
• Product (Product Code, Product Description)
In this design we have six relvars: Customer, Order, Order Line, Invoice, Invoice Line and Product. The bold, underlined attributes are candidate keys. The non-bold, underlined attributes are foreign keys. Usually one candidate key is arbitrarily chosen to be called the primary key and used in preference over the other candidate keys, which are then called alternate keys. A candidate key is a unique identifier enforcing that no tuple will be duplicated; duplication would turn the relation into something else, namely a bag, violating the basic definition of a set. Both foreign keys and superkeys (which include candidate keys) can be composite, that is, composed of several attributes. Below is a tabular depiction of a relation of our example Customer relvar; a relation can be thought of as a value that can be attributed to a relvar.
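As a sketch, two of these relvars could be declared in SQL, using Python's sqlite3 to show the DBMS enforcing the keys; the column types and exact constraint choices here are assumptions for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
con.executescript("""
    CREATE TABLE customer (
        customer_id TEXT PRIMARY KEY,   -- chosen candidate key = primary key
        tax_id      TEXT UNIQUE,        -- alternate candidate key
        name        TEXT
    );
    CREATE TABLE "order" (
        order_no    TEXT PRIMARY KEY,
        customer_id TEXT NOT NULL REFERENCES customer(customer_id)  -- foreign key
    );
""")
con.execute("INSERT INTO customer VALUES ('1234567890', '555-5512222', 'Munmun')")
con.execute("INSERT INTO \"order\" VALUES ('O-1', '1234567890')")  # FK satisfied

try:
    # Duplicate primary key: a second customer 1234567890 must be rejected.
    con.execute("INSERT INTO customer VALUES ('1234567890', '555-0000000', 'Dup')")
    pk_rejected = False
except sqlite3.IntegrityError:
    pk_rejected = True

try:
    # Foreign key violation: no such customer exists.
    con.execute("INSERT INTO \"order\" VALUES ('O-2', '9999999999')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print(pk_rejected, fk_rejected)  # True True
```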
Customer relation

Customer ID | Tax ID      | Name       | Address          | [More fields....]
------------|-------------|------------|------------------|------------------
1234567890  | 555-5512222 | Munmun     | 323 Broadway     | ...
2223344556  | 555-5523232 | Wile E.    | 1200 Main Street | ...
3334445563  | 555-5533323 | Ekta       | 871 1st Street   | ...
4232342432  | 555-5325523 | E. F. Codd | 123 It Way       | ...
If we attempted to insert a new customer with the ID 1234567890, this would violate the design of the relvar since Customer ID is a primary key and we already have a customer 1234567890. The DBMS must reject a transaction such as this that would render the database inconsistent by a violation of an integrity constraint. Foreign keys are integrity constraints enforcing that the value of the attribute set is drawn from a candidate key in
another relation. For example, in the Order relation the attribute Customer ID is a foreign key. A join is the operation that draws on information from several relations at once. By joining relvars from the example above we could query the database for all of the Customers, Orders, and Invoices. If we only wanted the tuples for a specific customer, we would specify this using a restriction condition. If we wanted to retrieve all of the Orders for Customer 1234567890, we could query the database to return every row in the Order table with Customer ID 1234567890 and join the Order table to the Order Line table based on Order No.
There is a flaw in our database design above. The Invoice relvar contains an Order No attribute, so each tuple in the Invoice relvar has one Order No, which implies that there is precisely one Order for each Invoice. But in reality an invoice can be created against many orders, or indeed for no particular order. Additionally, the Order relvar contains an Invoice No attribute, implying that each Order has a corresponding Invoice. But again this is not always true in the real world. An order is sometimes paid through several invoices, and sometimes paid without an invoice. In other words, there can be many Invoices per Order and many Orders per Invoice. This is a many-to-many relationship between Order and Invoice (also called a non-specific relationship). To represent this relationship in the database, a new relvar should be introduced whose role is to specify the correspondence between Orders and Invoices:

OrderInvoice (Order No, Invoice No)

Now the Order relvar has a one-to-many relationship to the OrderInvoice table, as does the Invoice relvar. If we want to retrieve every Invoice for a particular Order, we can query for all orders where Order No in the Order relation equals the Order No in OrderInvoice, and where Invoice No in OrderInvoice equals the Invoice No in Invoice.
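A sketch of the OrderInvoice junction relvar resolving the many-to-many link, again using sqlite3; the identifiers and sample values are illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE "order"  (order_no   TEXT PRIMARY KEY);
    CREATE TABLE invoice  (invoice_no TEXT PRIMARY KEY);
    CREATE TABLE order_invoice (
        order_no   TEXT REFERENCES "order"(order_no),
        invoice_no TEXT REFERENCES invoice(invoice_no),
        PRIMARY KEY (order_no, invoice_no)   -- composite candidate key
    );
    INSERT INTO "order"  VALUES ('O-1'), ('O-2');
    INSERT INTO invoice  VALUES ('I-1'), ('I-2');
    -- One invoice covering two orders, and one order split over two invoices:
    INSERT INTO order_invoice VALUES ('O-1','I-1'), ('O-2','I-1'), ('O-1','I-2');
""")

# Every Invoice for Order O-1, via the junction table:
invoices = [r[0] for r in con.execute("""
    SELECT i.invoice_no
    FROM "order" o
    JOIN order_invoice oi ON oi.order_no = o.order_no
    JOIN invoice i        ON i.invoice_no = oi.invoice_no
    WHERE o.order_no = 'O-1'
    ORDER BY i.invoice_no
""")]
print(invoices)  # ['I-1', 'I-2']
```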
Set-theoretic formulation

Basic notions in the relational model are relation names and attribute names. We will represent these as strings such as "Person" and "name", and we will usually use variables such as r, s, t and a, b, c to range over them. Another basic notion is the set of atomic values that contains values such as numbers and strings.

Our first definition concerns the notion of tuple, which formalizes the notion of row or record in a table:

Tuple
  A tuple is a partial function from attribute names to atomic values.
Header
  A header is a finite set of attribute names.
Projection
  The projection of a tuple t on a finite set of attributes A is t[A] = { (a, v) : (a, v) ∈ t, a ∈ A }.

The next definition defines relation, which formalizes the contents of a table as it is defined in the relational model:

Relation
  A relation is a tuple (H, B) with H, the header, and B, the body, a set of tuples that all have the domain H.

Such a relation closely corresponds to what is usually called the extension of a predicate in first-order logic, except that here we identify the places in the predicate with attribute names. Usually in the relational model a database schema is said to consist of a set of relation names, the headers that are associated with these names, and the constraints that should hold for every instance of the database schema.

Relation universe
  A relation universe U over a header H is a non-empty set of relations with header H.
Relation schema
  A relation schema (H, C) consists of a header H and a predicate C(R) that is defined for all relations R with header H. A relation satisfies a relation schema (H, C) if it has header H and satisfies C.
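A minimal sketch of these definitions in Python, modeling a tuple as a dict (a partial function from attribute names to values); the sample attribute names are illustrative.

```python
# A tuple is a partial function from attribute names to atomic values (a dict),
# a header is a set of attribute names, and a relation is a (header, body) pair.

def project(tup, attrs):
    """Projection of a tuple on a set of attribute names."""
    return {a: v for a, v in tup.items() if a in attrs}

header = frozenset({"name", "city"})
body = [
    {"name": "Munmun", "city": "New York"},
    {"name": "Ekta",   "city": "Boston"},
]
relation = (header, body)

# Every tuple in the body has exactly the header as its domain:
well_formed = all(set(t) == set(header) for t in body)
print(well_formed)                                      # True
print(project({"name": "Munmun", "city": "New York"}, {"name"}))  # {'name': 'Munmun'}
```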
Key constraints and functional dependencies

One of the simplest and most important types of relation constraints is the key constraint. It tells us that in every instance of a certain relational schema the tuples can be identified by their values for certain attributes.

Superkey
  A superkey is written as a finite set of attribute names. A superkey K holds in a relation (H, B) if:
  • K ⊆ H, and
  • there exist no two distinct tuples t1, t2 ∈ B such that t1[K] = t2[K].
  A superkey holds in a relation universe U if it holds in all relations in U.
  Theorem: A superkey K holds in a relation universe U over H if and only if K ⊆ H and K → H holds in U.

Candidate key
  A superkey K holds as a candidate key for a relation universe U if it holds as a superkey for U and there is no proper subset of K that also holds as a superkey for U.
Functional dependency

A functional dependency (FD for short) is written as X → Y for X and Y finite sets of attribute names. A functional dependency X → Y holds in a relation (H, B) if:
• X, Y ⊆ H, and
• for all tuples t1, t2 ∈ B, t1[X] = t2[X] implies t1[Y] = t2[Y].
A functional dependency X → Y holds in a relation universe U if it holds in all relations in U.

Trivial functional dependency

A functional dependency is trivial under a header H if it holds in all relation universes over H.
Theorem: An FD X → Y is trivial under a header H if and only if Y ⊆ X ⊆ H.
Closure

Armstrong's axioms: The closure of a set S of FDs under a header H, written S⁺, is the smallest superset of S such that:
• Y ⊆ X ⊆ H  implies  X → Y ∈ S⁺   (reflexivity)
• X → Y ∈ S⁺ and Y → Z ∈ S⁺  imply  X → Z ∈ S⁺   (transitivity)
• X → Y ∈ S⁺ and Z ⊆ H  imply  (X ∪ Z) → (Y ∪ Z) ∈ S⁺   (augmentation)
Theorem: Armstrong's axioms are sound and complete; given a header H and a set S of FDs that contain only subsets of H, X → Y ∈ S⁺ if and only if X → Y holds in all relation universes over H in which all FDs in S hold.
Completion

The completion of a finite set of attributes X under a finite set of FDs S, written X⁺, is the smallest superset of X such that:
• Y → Z ∈ S and Y ⊆ X⁺  imply  Z ⊆ X⁺.
The completion of an attribute set can be used to compute whether a certain dependency is in the closure of a set of FDs.
Theorem: Given a set S of FDs, X → Y ∈ S⁺ if and only if Y ⊆ X⁺.
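The completion and the membership theorem above can be sketched in Python; representing FDs as pairs of frozensets is an assumption made for illustration.

```python
# X+ is the smallest superset of X such that Y -> Z in S and Y ⊆ X+ imply Z ⊆ X+.

def completion(attrs, fds):
    """Completion (attribute closure) of attrs under a list of (lhs, rhs) FDs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return frozenset(closure)

def fd_in_closure(lhs, rhs, fds):
    """X -> Y is in S+ iff Y ⊆ X+ (the theorem above)."""
    return set(rhs) <= completion(lhs, fds)

fds = [(frozenset({"A"}), frozenset({"B"})),
       (frozenset({"B"}), frozenset({"C"}))]
print(sorted(completion({"A"}, fds)))    # ['A', 'B', 'C']
print(fd_in_closure({"A"}, {"C"}, fds))  # True: A -> C follows by transitivity
```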
Irreducible cover

An irreducible cover of a set S of FDs is a set T of FDs such that:
• S⁺ = T⁺,
• there exists no U ⊂ T such that U⁺ = T⁺,
• Y is a singleton set for every FD X → Y in T, and
• X is minimal for every FD X → Y in T; that is, no proper subset of X functionally determines Y under S⁺.
Algorithm to derive candidate keys from functional dependencies

INPUT:  a set S of FDs that contain only subsets of a header H
OUTPUT: the set C of superkeys that hold as candidate keys in
        all relation universes over H in which all FDs in S hold

begin
  C := ∅;          // found candidate keys
  Q := { H };      // superkeys that contain candidate keys
  while Q <> ∅ do
    let K be some element from Q;
    Q := Q – { K };
    minimal := true;
    for each X → Y in S do
      K' := (K – Y) ∪ X;      // derive new superkey
      if K' ⊂ K then
        minimal := false;
        Q := Q ∪ { K' };
      end if
    end for
    if minimal and there is not a subset of K in C then
      remove all supersets of K from C;
      C := C ∪ { K };
    end if
  end while
end
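A direct Python rendering of the pseudocode above, under the same representation assumption as earlier (FDs as pairs of frozensets):

```python
def candidate_keys(header, fds):
    """Derive the candidate keys of header under the FDs, per the pseudocode."""
    C = set()                # found candidate keys
    Q = {frozenset(header)}  # superkeys that may contain smaller candidate keys
    while Q:
        K = Q.pop()
        minimal = True
        for X, Y in fds:
            K2 = (K - Y) | X           # derive a new superkey
            if K2 < K:                 # proper subset: K was not minimal
                minimal = False
                Q.add(K2)
        if minimal and not any(c <= K for c in C):
            C = {c for c in C if not c >= K}  # remove supersets of K
            C.add(K)
    return C

H = {"A", "B", "C"}
S = [(frozenset({"A"}), frozenset({"B"})),
     (frozenset({"B"}), frozenset({"C"}))]
print(candidate_keys(H, S))  # {frozenset({'A'})}: A determines B and C
```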
References

[1] "Derivability, Redundancy, and Consistency of Relations Stored in Large Data Banks", E. F. Codd, IBM Research Report, 1969.
[2] "A Relational Model of Data for Large Shared Data Banks", in Communications of the ACM, 1970.
[3] Data Integration Glossary (http://knowledge.fhwa.dot.gov/tam/aashto.nsf/All+Documents/4825476B2B5C687285256B1F00544258/$FILE/DIGloss.pdf), U.S. Department of Transportation, August 2001.
[4] E. F. Codd, The Relational Model for Database Management, Addison-Wesley Publishing Company, 1990, ISBN 0-201-14192-2.
[5] Codd, E. F. (1970). "A Relational Model of Data for Large Shared Data Banks" (http://www.acm.org/classics/nov95/toc.html). Communications of the ACM 13 (6): 377–387. doi:10.1145/362384.362685.
[6] David M. Kroenke, Database Processing: Fundamentals, Design, and Implementation (1997), Prentice-Hall, Inc., pages 130–144.
Further reading

• Date, C. J., Darwen, H. (2000). Foundation for Future Database Systems: The Third Manifesto, 2nd edition, Addison-Wesley Professional. ISBN 0-201-70928-7.
• Date, C. J. (2003). Introduction to Database Systems, 8th edition, Addison-Wesley. ISBN 0-321-19784-4.
External links

• Feasibility of a set-theoretic data structure: a general structure based on a reconstituted definition of relation (Childs' 1968 research cited by Codd's 1970 paper) (http://hdl.handle.net/2027.42/4164)
• The Third Manifesto (TTM) (http://www.thethirdmanifesto.com/)
• Relational Databases (http://www.dmoz.org/Computers/Software/Databases/Relational/) at the Open Directory Project
• Relational Model (http://c2.com/cgi/wiki?RelationalModel)
• Binary relations and tuples compared with respect to the semantic web (http://blogs.sun.com/bblfish/entry/why_binary_relations_beat_tuples)
Binary relation

In mathematics, a binary relation on a set A is a collection of ordered pairs of elements of A. In other words, it is a subset of the Cartesian product A² = A × A. More generally, a binary relation between two sets A and B is a subset of A × B. The terms dyadic relation and 2-place relation are synonyms for binary relations.

An example is the "divides" relation between the set of prime numbers P and the set of integers Z, in which every prime p is associated with every integer z that is a multiple of p (and not with any integer that is not a multiple of p). In this relation, for instance, the prime 2 is associated with numbers that include −4, 0, 6, 10, but not 1 or 9; and the prime 3 is associated with numbers that include 0, 6, and 9, but not 4 or 13.

Binary relations are used in many branches of mathematics to model concepts like "is greater than", "is equal to", and "divides" in arithmetic, "is congruent to" in geometry, "is adjacent to" in graph theory, "is orthogonal to" in linear algebra and many more. The concept of function is defined as a special kind of binary relation. Binary relations are also heavily used in computer science.

A binary relation is the special case n = 2 of an n-ary relation R ⊆ A1 × … × An, that is, a set of n-tuples where the jth component of each n-tuple is taken from the jth domain Aj of the relation.

In some systems of axiomatic set theory, relations are extended to classes, which are generalizations of sets. This extension is needed for, among other things, modeling the concepts of "is an element of" or "is a subset of" in set theory, without running into logical inconsistencies such as Russell's paradox.
Formal definition

A binary relation R is usually defined as an ordered triple (X, Y, G) where X and Y are arbitrary sets (or classes), and G is a subset of the Cartesian product X × Y. The sets X and Y are called the domain (or the set of departure) and codomain (or the set of destination), respectively, of the relation, and G is called its graph.

The statement (x, y) ∈ R is read "x is R-related to y", and is denoted by xRy or R(x, y). The latter notation corresponds to viewing R as the characteristic function on X × Y for the set of pairs of G.

The order of the elements in each pair of G is important: if a ≠ b, then aRb and bRa can be true or false, independently of each other.
A relation as defined by the triple (X, Y, G) is sometimes referred to as a correspondence instead.[1] In this case the relation from X to Y is the subset G of X × Y, and "from X to Y" must always be either specified or implied by the context when referring to the relation. In practice, correspondence and relation tend to be used interchangeably.
Is a relation more than its graph?

According to the definition above, two relations with the same graph may be different if they differ in the sets X and Y. For example, if G = {(1, 2), (1, 3), (2, 7)}, then (ℤ, ℤ, G), (ℝ, ℕ, G), and (ℕ, ℝ, G) are three distinct relations. Some mathematicians, especially in set theory, do not consider the sets X and Y to be part of the relation, and therefore define a binary relation as being a subset of X × Y, that is, just the graph G.

A special case of this difference in points of view applies to the notion of function. Many authors insist on distinguishing between a function's codomain and its range. Thus, a single "rule", like mapping every real number x to x², can lead to distinct functions f: ℝ → ℝ and f: ℝ → ℝ⁺, depending on whether the images under that rule are understood to be reals or, more restrictively, non-negative reals. But others view functions as simply sets of ordered pairs with unique first components.

This difference in perspectives does raise some nontrivial issues. As an example, the former camp considers surjectivity (being onto) as a property of functions, while the latter sees it as a relationship that functions may bear to sets. Either approach is adequate for most uses, provided that one attends to the necessary changes in language, notation, and the definitions of concepts like restriction, composition, inverse relation, and so on. The choice between the two definitions usually matters only in very formal contexts, like category theory.
Example

Suppose there are four objects {ball, car, doll, gun} and four persons {John, Mary, Ian, Venus}. Suppose that John owns the ball, Mary owns the doll, and Venus owns the car. Nobody owns the gun and Ian owns nothing. Then the binary relation "is owned by" is given as

R = ({ball, car, doll, gun}, {John, Mary, Ian, Venus}, {(ball, John), (doll, Mary), (car, Venus)}).
Thus the first element of R is the set of objects, the second is the set of people, and the last element is a set of ordered pairs of the form (object, owner). The pair (ball, John), denoted by ball R John, means that the ball is owned by John.

Two different relations could have the same graph. For example, the relation ({ball, car, doll, gun}, {John, Mary, Venus}, {(ball, John), (doll, Mary), (car, Venus)}) is different from the previous one as everyone is an owner. But the graphs of the two relations are the same. Nevertheless, R is usually identified or even defined as G(R), and "an ordered pair (x, y) ∈ G(R)" is usually denoted as "(x, y) ∈ R".
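A small sketch of the example as an (X, Y, G) triple in Python: two relations with equal graphs but different codomains compare as different objects.

```python
X  = frozenset({"ball", "car", "doll", "gun"})
Y1 = frozenset({"John", "Mary", "Ian", "Venus"})
Y2 = frozenset({"John", "Mary", "Venus"})      # everyone here owns something
G  = frozenset({("ball", "John"), ("doll", "Mary"), ("car", "Venus")})

R1 = (X, Y1, G)   # "is owned by" with Ian in the codomain
R2 = (X, Y2, G)   # same graph, smaller codomain

print(R1 == R2)        # False: the codomains differ
print(R1[2] == R2[2])  # True: the graphs coincide
```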
Special types of binary relations

Some important types of binary relations R between X and Y are listed below.

Uniqueness properties:
• injective (also called left-unique[2]): for all x and z in X and y in Y it holds that if xRy and zRy then x = z.
• functional (also called right-unique[2] or right-definite): for all x in X, and y and z in Y, it holds that if xRy and xRz then y = z; such a binary relation is called a partial function.
• one-to-one (also written 1-to-1): injective and functional.

Totality properties:
• left-total[2]: for all x in X there exists a y in Y such that xRy (this property, although sometimes also referred to as total, is different from the definition of total in the next section).
• surjective (also called right-total[2]): for all y in Y there exists an x in X such that xRy.

Uniqueness and totality properties:
• A function: a relation that is functional and left-total.
• A bijection: a one-to-one correspondence; such a relation is a function and is said to be bijective.
Relations over a set

If X = Y then we simply say that the binary relation is over X, or that it is an endorelation over X. Some types of endorelations are widely studied in graph theory, where they are known as directed graphs. The set of all binary relations B(X) on a set X is a semigroup with involution, with the involution being the mapping of a relation to its inverse relation.

Some important types of binary relations over a set X are:
• reflexive: for all x in X it holds that xRx. For example, "greater than or equal to" is a reflexive relation but "greater than" is not.
• irreflexive (or strict): for all x in X it holds that not xRx. "Greater than" is an example of an irreflexive relation.
• coreflexive: for all x and y in X it holds that if xRy then x = y. "Equal to and odd" is an example of a coreflexive relation.
• symmetric: for all x and y in X it holds that if xRy then yRx. "Is a blood relative of" is a symmetric relation, because x is a blood relative of y if and only if y is a blood relative of x.
• antisymmetric: for all distinct x and y in X, if xRy then not yRx.
• asymmetric: for all x and y in X, if xRy then not yRx. (So asymmetry is stronger than antisymmetry. In fact, asymmetry is equivalent to antisymmetry plus irreflexivity.)
• transitive: for all x, y and z in X it holds that if xRy and yRz then xRz. (Note that, under the assumption of transitivity, irreflexivity and asymmetry are equivalent.)
• total: for all x and y in X it holds that xRy or yRx (or both). "Is greater than or equal to" is an example of a total relation (this definition for total is different from left-total in the previous section).
• trichotomous: for all x and y in X exactly one of xRy, yRx or x = y holds. "Is greater than" is an example of a trichotomous relation.
• Euclidean: for all x, y and z in X it holds that if xRy and xRz, then yRz (and zRy). Equality is a Euclidean relation because if x = y and x = z, then y = z.
• serial: for all x in X, there exists y in X such that xRy. "Is greater than" is a serial relation on the integers. But it is not a serial relation on the positive integers, because there is no y in the positive integers such that 1 > y.[3] However, "is less than" is a serial relation on the positive integers (the natural numbers), the rational numbers and the real numbers. Every reflexive relation is serial.
• set-like: for every x in X, the class of all y such that yRx is a set. (This makes sense only if we allow relations on proper classes.) The usual ordering < on the class of ordinal numbers is set-like, while its inverse > is not.

A relation that is reflexive, symmetric, and transitive is called an equivalence relation. A relation that is reflexive, antisymmetric, and transitive is called a partial order. A partial order that is total is called a total order, simple order, linear order, or a chain.[4] A linear order where every nonempty set has a least element is called a well-order. A relation that is symmetric, transitive, and serial is also reflexive.
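For a finite relation given as a set of ordered pairs, these properties can be checked mechanically; a sketch:

```python
def is_reflexive(R, X):
    return all((x, x) in R for x in X)

def is_symmetric(R):
    return all((y, x) in R for (x, y) in R)

def is_transitive(R):
    return all((x, w) in R for (x, y) in R for (z, w) in R if y == z)

X = {1, 2, 3}

# "Congruent mod 2" on X is an equivalence relation:
R = {(a, b) for a in X for b in X if a % 2 == b % 2}
print(is_reflexive(R, X), is_symmetric(R), is_transitive(R))  # True True True

# "Greater than" on X is transitive but not reflexive:
G = {(a, b) for a in X for b in X if a > b}
print(is_reflexive(G, X), is_transitive(G))  # False True
```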
Operations on binary relations

If R is a binary relation over X and Y, then the following is a binary relation over Y and X:
• Inverse or converse: R⁻¹, defined as R⁻¹ = { (y, x) | (x, y) ∈ R }. A binary relation over a set is equal to its inverse if and only if it is symmetric. See also duality (order theory).

If R is a binary relation over X, then each of the following is a binary relation over X:
• Reflexive closure: R⁼, defined as R⁼ = { (x, x) | x ∈ X } ∪ R, or the smallest reflexive relation over X containing R. This can be seen to be equal to the intersection of all reflexive relations containing R.
• Reflexive reduction: R≠, defined as R≠ = R \ { (x, x) | x ∈ X }, or the largest irreflexive relation over X contained in R.
• Transitive closure: R⁺, defined as the smallest transitive relation over X containing R. This can be seen to be equal to the intersection of all transitive relations containing R.
• Transitive reduction: R⁻, defined as a minimal relation having the same transitive closure as R.
• Reflexive transitive closure: R*, defined as R* = (R⁺)⁼, the smallest preorder containing R.
• Reflexive transitive symmetric closure: R≡, defined as the smallest equivalence relation over X containing R.

If R, S are binary relations over X and Y, then each of the following is a binary relation:
• Union: R ∪ S ⊆ X × Y, defined as R ∪ S = { (x, y) | (x, y) ∈ R or (x, y) ∈ S }.
• Intersection: R ∩ S ⊆ X × Y, defined as R ∩ S = { (x, y) | (x, y) ∈ R and (x, y) ∈ S }.

If R is a binary relation over X and Y, and S is a binary relation over Y and Z, then the following is a binary relation over X and Z (see the main article, composition of relations):
• Composition: S ∘ R, also denoted R ; S (or, more ambiguously, R ∘ S), defined as S ∘ R = { (x, z) | there exists y ∈ Y such that (x, y) ∈ R and (y, z) ∈ S }.
The order of R and S in the notation S ∘ R used here agrees with the standard notational order for composition of functions.
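Two of the operations above, composition and transitive closure, sketched in Python for relations given as sets of pairs:

```python
def compose(S, R):
    """S ∘ R = {(x, z) | there exists y with (x, y) in R and (y, z) in S}."""
    return {(x, z) for (x, y) in R for (y2, z) in S if y == y2}

def transitive_closure(R):
    """Smallest transitive relation containing R, by repeated composition."""
    closure = set(R)
    while True:
        step = closure | compose(closure, closure)
        if step == closure:
            return closure
        closure = step

R = {(1, 2), (2, 3)}
print(compose(R, R))          # {(1, 3)}
print(transitive_closure(R))  # {(1, 2), (2, 3), (1, 3)}
```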
Complement

If R is a binary relation over X and Y, then the following is too:
• The complement S is defined as x S y if not x R y.

The complement of the inverse is the inverse of the complement. If X = Y, the complement has the following properties:
• If a relation is symmetric, the complement is too.
• The complement of a reflexive relation is irreflexive, and vice versa.
• The complement of a strict weak order is a total preorder, and vice versa.

The complement of the inverse has these same properties.
Restriction

The restriction of a binary relation on a set X to a subset S is the set of all pairs (x, y) in the relation for which x and y are in S.

If a relation is reflexive, irreflexive, symmetric, antisymmetric, asymmetric, transitive, total, trichotomous, a partial order, total order, strict weak order, total preorder (weak order), or an equivalence relation, its restrictions are too. However, the transitive closure of a restriction is a subset of the restriction of the transitive closure, i.e., in general not equal. Also, the various concepts of completeness (not to be confused with being "total") do not carry over to restrictions. For example, on the set of real numbers a property of the relation "≤" is that every non-empty subset S of R with an upper bound in R has a least upper bound (also called supremum) in R. However, for a set of rational numbers this
supremum is not necessarily rational, so the same property does not hold on the restriction of the relation "≤" to the set of rational numbers. The left-restriction (right-restriction, respectively) of a binary relation between X and Y to a subset S of its domain (codomain) is the set of all pairs (x, y) in the relation for which x (respectively y) is an element of S.
Sets versus classes

Certain mathematical "relations", such as "equal to", "member of", and "subset of", cannot be understood to be binary relations as defined above, because their domains and codomains cannot be taken to be sets in the usual systems of axiomatic set theory. For example, if we try to model the general concept of "equality" as a binary relation =, we must take the domain and codomain to be the "set of all sets", which is not a set in the usual set theory. In most mathematical contexts, references to the relations of equality, membership and subset are harmless because they can be understood implicitly to be restricted to some set in the context. The usual work-around to this problem is to select a "large enough" set A that contains all the objects of interest, and work with the restriction =_A instead of =. Similarly, the "subset of" relation ⊆ needs to be restricted to have domain and codomain P(A) (the power set of a specific set A): the resulting set relation can be denoted ⊆_A. Also, the "member of" relation needs to be restricted to have domain A and codomain P(A) to obtain a binary relation ∈_A that is a set. Another solution to this problem is to use a set theory with proper classes, such as NBG or Morse–Kelley set theory, and allow the domain and codomain (and so the graph) to be proper classes: in such a theory, equality, membership, and subset are binary relations without special comment. (A minor modification needs to be made to the concept of the ordered triple (X, Y, G), as normally a proper class cannot be a member of an ordered tuple; or of course one can identify the function with its graph in this context.)[5] With this definition one can for instance define a function relation between every set and its power set.
The number of binary relations

The number of distinct binary relations on an n-element set is 2^(n²) (sequence A002416 in OEIS):

Number of n-element binary relations of different types

n      all      transitive  reflexive  preorder  partial order  total preorder  total order  equivalence relation
0      1        1           1          1         1              1               1            1
1      2        2           1          1         1              1               1            1
2      16       13          4          4         3              3               2            2
3      512      171         64         29        19             13              6            5
4      65536    3994        4096       355       219            75              24           15
OEIS   A002416  A006905     A053763    A000798   A001035        A000670         A000142      A000110
Notes:
• The number of irreflexive relations is the same as that of reflexive relations.
• The number of strict partial orders (irreflexive transitive relations) is the same as that of partial orders.
• The number of strict weak orders is the same as that of total preorders.
• The total orders are the partial orders that are also total preorders. The number of preorders that are neither a partial order nor a total preorder is, therefore, the number of preorders, minus the number of partial orders, minus the number of total preorders, plus the number of total orders: 0, 0, 0, 3, and 85, respectively.
• The number of equivalence relations is the number of partitions, which is the Bell number.
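The small entries of the table can be verified by brute force; a sketch for n = 2, enumerating all 2^(n²) relations:

```python
from itertools import combinations

n = 2
X = range(n)
pairs = [(a, b) for a in X for b in X]

def is_reflexive(R):
    return all((x, x) in R for x in X)

def is_transitive(R):
    return all((x, w) in R for (x, y) in R for (z, w) in R if y == z)

# All subsets of X × X, i.e. all binary relations on X:
relations = [set(c) for k in range(len(pairs) + 1) for c in combinations(pairs, k)]
counts = (len(relations),
          sum(is_transitive(R) for R in relations),
          sum(is_reflexive(R) for R in relations))
print(counts)  # (16, 13, 4) — the 'all', 'transitive' and 'reflexive' columns
```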
The binary relations can be grouped into pairs (relation, complement), except that for n = 0 the relation is its own complement. The non-symmetric ones can be grouped into quadruples (relation, complement, inverse, inverse complement).
Examples of common binary relations

• order relations, including strict orders:
  • greater than
  • greater than or equal to
  • less than
  • less than or equal to
  • divides (evenly)
  • is a subset of
• equivalence relations:
  • equality
  • is parallel to (for affine spaces)
  • is in bijection with
  • isomorphy
• dependency relation, a finite, symmetric, reflexive relation.
• independency relation, a symmetric, irreflexive relation which is the complement of some dependency relation.
Binary relations by property

                       reflexive   symmetric   transitive   symbol        example
directed graph         –           –           –            →             –
undirected graph       No          Yes         No           –             –
tournament             No          No          –            –             pecking order
dependency             Yes         Yes         –            –             –
weak order             –           –           Yes          ≤             –
preorder               Yes         –           Yes          ≤             preference
partial order          Yes         No          Yes          ≤             subset
partial equivalence    –           Yes         Yes          –             –
equivalence relation   Yes         Yes         Yes          ∼, ≅, ≈, ≡    equality
strict partial order   No          No          Yes          <             proper subset
Notes

[1] Encyclopedic dictionary of Mathematics (http://books.google.co.uk/books?id=azS2ktxrz3EC&pg=PA1331&hl=en&sa=X&ei=glo6T_PmC9Ow8QPvwYmFCw&ved=0CGIQ6AEwBg#v=onepage&f=false). MIT. 2000. pp. 1330–1331. ISBN 0-262-59020-4.
[2] Kilp, Knauer and Mikhalev: p. 3.
[3] Yao, Y.Y.; Wong, S.K.M. (1995). "Generalization of rough sets using relationships between attribute values" (http://www2.cs.uregina.ca/~yyao/PAPERS/relation.pdf). Proceedings of the 2nd Annual Joint Conference on Information Sciences: 30–33.
[4] Joseph G. Rosenstein, Linear orderings, Academic Press, 1982, ISBN 012597680, p. 4.
[5] Tarski, Alfred; Givant, Steven (1987). A formalization of set theory without variables. American Mathematical Society. p. 3. ISBN 0-8218-1041-3.
References

• M. Kilp, U. Knauer, A.V. Mikhalev, Monoids, Acts and Categories: with Applications to Wreath Products and Graphs, De Gruyter Expositions in Mathematics vol. 29, Walter de Gruyter, 2000, ISBN 3-11-015248-7.
• Gunther Schmidt, 2010. Relational Mathematics. Cambridge University Press, ISBN 978-0-521-76268-7.
External links

• Hazewinkel, Michiel, ed. (2001), "Binary relation" (http://www.encyclopediaofmath.org/index.php?title=p/b016380), Encyclopedia of Mathematics, Springer, ISBN 978-1-55608-010-4
Database normalization

Database normalization is the process of organizing the fields and tables of a relational database to minimize
redundancy and dependency. Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships. Edgar F. Codd, the inventor of the relational model, introduced the concept of normalization and what we now know as the First Normal Form (1NF) in 1970. [1] Codd went on to define the Second Normal Form (2NF) and Third Normal Form (3NF) in 1971,[2] and Codd and Raymond F. Boyce defined the Boyce-Codd Normal Form (BCNF) in 1974.[3] Informally, a relational database table is often described as "normalized" if it is in the Third Normal Form. [4] Most 3NF tables are free of insertion, update, and deletion anomalies. A standard piece of database design guidance is that the designer should create a fully normalized design; selective denormalization can subsequently be performed for performance reasons. [5]
Objectives of normalization

A basic objective of the first normal form defined by Codd in 1970 was to permit data to be queried and manipulated using a "universal data sub-language" grounded in first-order logic.[6] (SQL is an example of such a data sub-language, albeit one that Codd regarded as seriously flawed.)[7] The objectives of normalization beyond 1NF (First Normal Form) were stated as follows by Codd:

1. To free the collection of relations from undesirable insertion, update and deletion dependencies;
2. To reduce the need for restructuring the collection of relations, as new types of data are introduced, and thus increase the life span of application programs;
3. To make the relational model more informative to users;
4. To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by.

—E.F. Codd, "Further Normalization of the Data Base Relational Model"[8]
The sections below give details of each of these objectives.
Free the database of modification anomalies

When an attempt is made to modify (update, insert into, or delete from) a table, undesired side-effects may follow. Not all tables can suffer from these side-effects; rather, the side-effects can only arise in tables that have not been sufficiently normalized. An insufficiently normalized table might have one or more of the following characteristics:

• The same information can be expressed on multiple rows; therefore updates to the table may result in logical inconsistencies. For example, each record in an "Employees' Skills" table might contain an Employee ID, Employee Address, and Skill; thus a change of address for a particular employee will potentially need to be applied to multiple records (one for each of his skills). If the update is not carried through successfully, that is, if the employee's address is updated on some records but not others, then the table is left in an inconsistent state. Specifically, the table provides conflicting answers to the question of what this particular employee's address is. This phenomenon is known as an update anomaly.
An update anomaly. Employee 519 is shown as having different addresses on different records.

An insertion anomaly. Until the new faculty member, Dr. Newsome, is assigned to teach at least one course, his details cannot be recorded.
• There are circumstances in which certain facts cannot be recorded at all. For example, each record in a "Faculty and Their Courses" table might contain a Faculty ID, Faculty Name, Faculty Hire Date, and Course Code; thus we can record the details of any faculty member who teaches at least one course, but we cannot record the details of a newly hired faculty member who has not yet been assigned to teach any courses, except by setting the Course Code to null. This phenomenon is known as an insertion anomaly.

A deletion anomaly. All information about Dr. Giddens is lost when he temporarily ceases to be assigned to any courses.

• Under certain circumstances, deletion of data representing certain facts necessitates deletion of data representing completely different facts. The "Faculty and Their Courses" table described in the previous example suffers from this type of anomaly, for if a faculty member temporarily ceases to be assigned to any courses, we must delete the last of the records on which that faculty member appears, effectively also deleting the faculty member. This phenomenon is known as a deletion anomaly.
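The update anomaly above can be made concrete with a small sketch (the employee ID comes from the caption above; the addresses and skills are hypothetical). Because the address is stored once per skill, a partial update leaves the table contradicting itself:

```python
# "Employees' Skills" rows, with the address repeated once per skill
# (hypothetical data illustrating the update anomaly described above).
skills = [
    {"employee_id": 519, "address": "73 Industrial Way", "skill": "Typing"},
    {"employee_id": 519, "address": "73 Industrial Way", "skill": "Shorthand"},
]

# A buggy, partial update: change the address on the first matching row only.
for row in skills:
    if row["employee_id"] == 519:
        row["address"] = "27 Suburban Ave"
        break  # stops after one row; the duplicate row keeps the old address

# The table now gives conflicting answers about employee 519's address.
addresses = {row["address"] for row in skills if row["employee_id"] == 519}
print(sorted(addresses))  # two distinct addresses for one employee
```

In a normalized design the address would be stored exactly once, so a single update could never leave the data in this inconsistent state.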
Minimize redesign when extending the database structure

When a fully normalized database structure is extended to allow it to accommodate new types of data, the pre-existing aspects of the database structure can remain largely or entirely unchanged. As a result, applications interacting with the database are minimally affected.
Make the data model more informative to users

Normalized tables, and the relationship between one normalized table and another, mirror real-world concepts and their interrelationships.
Avoid bias towards any particular pattern of querying

Normalized tables are suitable for general-purpose querying. This means any queries against these tables, including future queries whose details cannot be anticipated, are supported. In contrast, tables that are not normalized lend themselves to some types of queries, but not others. For example, consider an online bookseller whose customers maintain wishlists of books they'd like to have. For the obvious, anticipated query (what books does this customer want?) it's enough to store the customer's wishlist in the table as, say, a homogeneous string of authors and titles. With this design, though, the database can answer only that one single query. It cannot by itself answer interesting but unanticipated queries: What is the most-wished-for book? Which customers are interested in WWII espionage? How does Lord Byron stack up against his contemporary poets? Answers to these questions must come from special adaptive tools completely separate from the database. One such tool might be software written especially to handle these queries; this special adaptive software has just one purpose: in effect, to normalize the non-normalized field. With a normalized table, by contrast, unforeseen queries can be answered trivially, and entirely within the database framework.
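The bias described above can be sketched in a few lines (hypothetical customers and titles). Against normalized (customer, book) rows, the unanticipated "most wished-for book" query is trivial; against one opaque string per customer, it would require ad-hoc parsing code:

```python
from collections import Counter

# Unnormalized design (hypothetical data): one opaque wishlist string per
# customer. Only the anticipated per-customer lookup is easy here.
unnormalized = {
    "alice": "Byron - Don Juan; Keats - Odes",
    "bob": "Byron - Don Juan",
}

# Normalized design: one (customer, book) row per wish; every value is
# exposed to the query layer directly.
normalized = [
    ("alice", "Byron - Don Juan"),
    ("alice", "Keats - Odes"),
    ("bob", "Byron - Don Juan"),
]

# "What is the most wished-for book?" is a one-liner over the normalized rows.
most_wished, count = Counter(book for _, book in normalized).most_common(1)[0]
print(most_wished, count)

# Answering the same question from `unnormalized` would require special
# parsing code whose correctness depends on the ad-hoc string format.
```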
Example

Querying and manipulating the data within an unnormalized data structure, such as the following non-1NF representation of customers' credit card transactions, involves more complexity than is really necessary:

Customer: Jones
    Tr. ID   Date         Amount
    12890    14-Oct-2003  −87
    12904    15-Oct-2003  −50

Customer: Wilkins
    Tr. ID   Date         Amount
    12898    14-Oct-2003  −21

Customer: Stevens
    Tr. ID   Date         Amount
    12907    15-Oct-2003  −18
    14920    20-Nov-2003  −70
    15003    27-Nov-2003  −60
To each customer corresponds a repeating group of transactions. The automated evaluation of any query relating to customers' transactions therefore would broadly involve two stages:

1. Unpacking one or more customers' groups of transactions, allowing the individual transactions in a group to be examined, and
2. Deriving a query result based on the results of the first stage.

For example, in order to find out the monetary sum of all transactions that occurred in October 2003 for all customers, the system would have to know that it must first unpack the Transactions group of each customer, then sum the Amounts of all transactions thus obtained where the Date of the transaction falls in October 2003. One of Codd's important insights was that this structural complexity could always be removed completely, leading to much greater power and flexibility in the way queries could be formulated (by users and applications) and evaluated (by the DBMS). The normalized equivalent of the structure above would look like this:
Customer   Tr. ID   Date         Amount
Jones      12890    14-Oct-2003  −87
Jones      12904    15-Oct-2003  −50
Wilkins    12898    14-Oct-2003  −21
Stevens    12907    15-Oct-2003  −18
Stevens    14920    20-Nov-2003  −70
Stevens    15003    27-Nov-2003  −60
Now each row represents an individual credit card transaction, and the DBMS can obtain the answer of interest, simply by finding all rows with a Date falling in October, and summing their Amounts. The data structure places all of the values on an equal footing, exposing each to the DBMS directly, so each can potentially participate directly in queries; whereas in the previous situation some values were embedded in lower-level structures that had to be handled specially. Accordingly, the normalized design lends itself to general-purpose query processing, whereas the unnormalized design does not.
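The two evaluation strategies can be put side by side in a short sketch (Python with the standard-library sqlite3 module; table and column names are invented for the example). The nested structure needs the explicit two-stage unpack-and-sum; the normalized rows answer the same question with a single declarative query:

```python
import sqlite3
from datetime import date

# Two-stage evaluation over the nested (non-1NF) structure: one repeating
# group of transactions per customer, as in the first table above.
nested = {
    "Jones":   [(12890, date(2003, 10, 14), -87), (12904, date(2003, 10, 15), -50)],
    "Wilkins": [(12898, date(2003, 10, 14), -21)],
    "Stevens": [(12907, date(2003, 10, 15), -18),
                (14920, date(2003, 11, 20), -70),
                (15003, date(2003, 11, 27), -60)],
}
# Stage 1: unpack each customer's group; stage 2: filter and sum.
october_total = sum(amount
                    for group in nested.values()
                    for _, when, amount in group
                    if (when.year, when.month) == (2003, 10))

# The same data in normalized form: one row per transaction.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions "
             "(customer TEXT, tr_id INTEGER, tr_date TEXT, amount INTEGER)")
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)",
                 [(cust, tid, when.isoformat(), amount)
                  for cust, group in nested.items()
                  for tid, when, amount in group])

# A single declarative query replaces the two-stage procedure entirely.
(total,) = conn.execute(
    "SELECT SUM(amount) FROM transactions "
    "WHERE tr_date BETWEEN '2003-10-01' AND '2003-10-31'").fetchone()
print(october_total, total)  # both -176
```

Both strategies agree on the answer, but only the second leaves the query planning to the DBMS, which is the flexibility Codd's insight buys.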
Background to normalization: definitions

Functional dependency

In a given table, an attribute Y is said to have a functional dependency on a set of attributes X (written X → Y) if and only if each X value is associated with precisely one Y value. For example, in an "Employee" table that includes the attributes "Employee ID" and "Employee Date of Birth", the functional dependency {Employee ID} → {Employee Date of Birth} would hold: each {Employee ID} value is associated with precisely one {Employee Date of Birth} value.

Trivial functional dependency
A trivial functional dependency is a functional dependency of an attribute on a superset of itself. {Employee ID, Employee Address} → {Employee Address} is trivial, as is {Employee Address} → {Employee Address}.

Full functional dependency

An attribute is fully functionally dependent on a set of attributes X if it is:
• functionally dependent on X, and
• not functionally dependent on any proper subset of X.
{Employee Address} has a functional dependency on {Employee ID, Skill}, but not a full functional dependency, because it is also dependent on {Employee ID}.

Transitive dependency

A transitive dependency is an indirect functional dependency, one in which X → Z holds only by virtue of X → Y and Y → Z.

Multivalued dependency

A multivalued dependency is a constraint according to which the presence of certain rows in a table implies the presence of certain other rows.

Join dependency

A table T is subject to a join dependency if T can always be recreated by joining multiple tables each having a subset of the attributes of T.

Superkey

A superkey is a combination of attributes that can be used to uniquely identify a database record. A table might have many superkeys.

Candidate key

A candidate key is a special subset of superkeys that do not have any extraneous information in them: it is a minimal superkey.

Examples: Imagine a table with the fields , , and . This table has many possible superkeys. Three of these are , and . Of those listed, only is a candidate key, as the others contain information not necessary to uniquely identify records ('SSN' here refers to Social Security Number, which is unique to each person).

Non-prime attribute

A non-prime attribute is an attribute that does not occur in any candidate key. Employee Address would be a non-prime attribute in the "Employees' Skills" table.

Prime attribute

A prime attribute, conversely, is an attribute that does occur in some candidate key.

Primary key

Most DBMSs require a table to be defined as having a single unique key, rather than a number of possible unique keys. A primary key is a key which the database designer has designated for this purpose.
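The definitions above can be checked mechanically against sample rows. The sketch below (hypothetical data and helper names) tests whether a functional dependency X → Y holds in a set of rows, and enumerates the minimal attribute combinations that uniquely identify every sample row. Note the caveat in the second docstring: sample data can only refute a key, never prove one, since keys are declared properties of the schema.

```python
from itertools import combinations

def holds(rows, x, y):
    """Check whether the functional dependency X -> Y holds in the sample:
    each X-value must be associated with precisely one Y-value."""
    seen = {}
    for row in rows:
        xv = tuple(row[a] for a in x)
        yv = tuple(row[a] for a in y)
        if seen.setdefault(xv, yv) != yv:
            return False
    return True

def candidate_keys(rows):
    """Minimal attribute combinations that uniquely identify every sample row.
    Sample data can only refute a key, never prove one: real key constraints
    are declared properties of the schema, not of one data set."""
    attrs = sorted(rows[0])
    found = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            distinct = {tuple(r[a] for a in combo) for r in rows}
            # Keep only combos that are unique and not supersets of a
            # smaller key already found (i.e. minimal superkeys).
            if len(distinct) == len(rows) and not any(set(k) <= set(combo) for k in found):
                found.append(combo)
    return found

# Hypothetical rows: SSN is unique to each person.
people = [
    {"ssn": "111", "name": "Ann", "age": 30},
    {"ssn": "222", "name": "Ben", "age": 30},
    {"ssn": "333", "name": "Ann", "age": 41},
]
print(holds(people, ["ssn"], ["name"]))  # True: one name per SSN
print(holds(people, ["age"], ["name"]))  # False: age 30 maps to Ann and Ben
print(candidate_keys(people))            # ('ssn',) is among the minimal keys
```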
Normal forms

The normal forms (abbrev. NF) of relational database theory provide criteria for determining a table's degree of vulnerability to logical inconsistencies and anomalies. The higher the normal form applicable to a table, the less vulnerable it is. Each table has a "highest normal form" (HNF): by definition, a table always meets the requirements of its HNF and of all normal forms lower than its HNF; also by definition, a table fails to meet the requirements of any normal form higher than its HNF. The normal forms are applicable to individual tables; to say that an entire database is in normal form n is to say that all of its tables are in normal form n. Newcomers to database design sometimes suppose that normalization proceeds in an iterative fashion, i.e. a 1NF design is first normalized to 2NF, then to 3NF, and so on. This is not an accurate description of how normalization typically works. A sensibly designed table is likely to be in 3NF on the first attempt; furthermore, if it is 3NF, it is overwhelmingly likely to have an HNF of 5NF. Achieving the "higher" normal forms (above 3NF) does not usually require an extra expenditure of effort on the part of the designer, because 3NF tables usually need no modification to meet the requirements of these higher normal forms. The main normal forms are summarized below.
1NF (First normal form). Two versions: E.F. Codd (1970)[1] and C.J. Date (2003).[9] The table faithfully represents a relation and has no repeating groups.

2NF (Second normal form). E.F. Codd (1971).[2] No non-prime attribute in the table is functionally dependent on a proper subset of any candidate key.

3NF (Third normal form). Two versions: E.F. Codd (1971)[2] and C. Zaniolo (1982).[10] Every non-prime attribute is non-transitively dependent on every candidate key in the table; in other words, no transitive dependency on a candidate key is allowed.

EKNF (Elementary Key Normal Form). C. Zaniolo (1982).[10] Every non-trivial functional dependency in the table is either the dependency of an elementary key attribute or a dependency on a superkey.

BCNF (Boyce-Codd normal form). Raymond F. Boyce and E.F. Codd (1974).[11] Every non-trivial functional dependency in the table is a dependency on a superkey.

4NF (Fourth normal form). Ronald Fagin (1977).[12] Every non-trivial multivalued dependency in the table is a dependency on a superkey.

5NF (Fifth normal form). Ronald Fagin (1979).[13] Every non-trivial join dependency in the table is implied by the superkeys of the table.

DKNF (Domain/key normal form). Ronald Fagin (1981).[14] Every constraint on the table is a logical consequence of the table's domain constraints and key constraints.

6NF (Sixth normal form). C.J. Date, Hugh Darwen, and Nikos Lorentzos (2002).[15] The table features no non-trivial join dependencies at all (with reference to the generalized join operator).
Denormalization

Databases intended for online transaction processing (OLTP) are typically more normalized than databases intended for online analytical processing (OLAP). OLTP applications are characterized by a high volume of small transactions such as updating a sales record at a supermarket checkout counter. The expectation is that each transaction will leave the database in a consistent state. By contrast, databases intended for OLAP operations are primarily "read mostly" databases. OLAP applications tend to extract historical data that has accumulated over a long period of time. For such databases, redundant or "denormalized" data may facilitate business intelligence applications. Specifically, dimensional tables in a star schema often contain denormalized data. The denormalized or redundant data must be carefully controlled during extract, transform, load (ETL) processing, and users should not be permitted to see the data until it is in a consistent state. The normalized alternative to the star schema is the snowflake schema. In many cases, the need for denormalization has waned as computers and RDBMS software have become more powerful, but since data volumes have generally increased along with hardware and software performance, OLAP databases often still use denormalized schemas. Denormalization is also used to improve performance on smaller computers, as in computerized cash registers and mobile devices, since these may use the data for look-up only (e.g. price lookups). Denormalization may also be used when no RDBMS exists for a platform (such as Palm), or when no changes are to be made to the data and a swift response is crucial.
Non-first normal form (NF² or N1NF)

In recognition that denormalization can be deliberate and useful, the non-first normal form is a definition of database designs which do not conform to first normal form, by allowing "sets and sets of sets to be attribute domains" (Schek 1982). The languages used to query and manipulate data in the model must be extended accordingly to support such values. One way of looking at this is to consider such structured values as being specialized types of values (domains), with their own domain-specific languages. However, what is usually meant by non-1NF models is the approach in which the relational model and the languages used to query it are extended with a general mechanism for such structure; for instance, the nested relational model supports the use of relations as domain values, by adding two additional operators (nest and unnest) to the relational algebra that can create and flatten nested relations, respectively. Consider the following table:
First Normal Form

Person   Favourite Colour
Bob      blue
Bob      red
Jane     green
Jane     yellow
Jane     red
Assume a person has several favourite colours. The favourite colours of each person thus form a set, which the table above models as repeated rows. To transform a 1NF table into an NF² table, a "nest" operator is required which extends the relational algebra of the higher normal forms. Applying the "nest" operator to the 1NF table yields the following NF² table:
Non-First Normal Form

Person   Favourite Colours
Bob      Favourite Colour
         blue
         red
Jane     Favourite Colour
         green
         yellow
         red
To transform this NF² table back into 1NF, an "unnest" operator is required which extends the relational algebra of the higher normal forms. The unnest, in this case, would make "colours" into its own table. Although "unnest" is the mathematical inverse to "nest", the operator "nest" is not always the mathematical inverse of "unnest". A further constraint is that the operators must be bijective, which is covered by the Partitioned Normal Form (PNF).
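The nest/unnest pair can be sketched over the example above, with Python frozensets standing in for relation-valued attributes (the real operators are defined on the extended relational algebra, not on Python data structures):

```python
from itertools import groupby
from operator import itemgetter

def nest(rows):
    """Group flat (person, colour) rows into person -> frozenset-of-colours,
    the Python stand-in for a relation-valued attribute."""
    return {person: frozenset(colour for _, colour in group)
            for person, group in groupby(sorted(rows), key=itemgetter(0))}

def unnest(nested):
    """Flatten person -> colours back into individual (person, colour) rows."""
    return {(person, colour)
            for person, colours in nested.items()
            for colour in colours}

flat = {("Bob", "blue"), ("Bob", "red"),
        ("Jane", "green"), ("Jane", "yellow"), ("Jane", "red")}

# unnest undoes nest on this data; the converse does not hold in general,
# which is what Partitioned Normal Form (PNF) adds conditions to guarantee.
print(unnest(nest(flat)) == flat)  # True
```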
Notes and references

[1] Codd, E.F. (June 1970). "A Relational Model of Data for Large Shared Data Banks" (http://www.acm.org/classics/nov95/toc.html). Communications of the ACM 13 (6): 377–387. doi:10.1145/362384.362685.
[2] Codd, E.F. "Further Normalization of the Data Base Relational Model". (Presented at Courant Computer Science Symposia Series 6, "Data Base Systems", New York City, May 24–25, 1971.) IBM Research Report RJ909 (August 31, 1971). Republished in Randall J. Rustin (ed.), Data Base Systems: Courant Computer Science Symposia Series 6. Prentice-Hall, 1972.
[3] Codd, E.F. "Recent Investigations into Relational Data Base Systems". IBM Research Report RJ1385 (April 23, 1974). Republished in Proc. 1974 Congress (Stockholm, Sweden, 1974). New York, N.Y.: North-Holland (1974).
[4] C.J. Date. An Introduction to Database Systems. Addison-Wesley (1999), p. 290.
[5] Chris Date, for example, writes: "I believe firmly that anything less than a fully normalized design is strongly contraindicated ... [Y]ou should 'denormalize' only as a last resort. That is, you should back off from a fully normalized design only if all other strategies for improving performance have somehow failed to meet requirements." Date, C.J. Database in Depth: Relational Theory for Practitioners. O'Reilly (2005), p. 152.
[6] "The adoption of a relational model of data ... permits the development of a universal data sub-language based on an applied predicate calculus. A first-order predicate calculus suffices if the collection of relations is in first normal form. Such a language would provide a yardstick of linguistic power for all other proposed data languages, and would itself be a strong candidate for embedding (with appropriate syntactic modification) in a variety of host languages (programming, command- or problem-oriented)." Codd, "A Relational Model of Data for Large Shared Data Banks" (http://www.acm.org/classics/nov95/toc.html), p. 381.
[7] Codd, E.F. Chapter 23, "Serious Flaws in SQL", in The Relational Model for Database Management: Version 2. Addison-Wesley (1990), pp. 371–389.
[8] Codd, E.F. "Further Normalization of the Data Base Relational Model", p. 34.
[9] Date, C.J. "What First Normal Form Really Means" in Date on Database: Writings 2000–2006 (Springer-Verlag, 2006), pp. 127–128.
[10] Zaniolo, Carlo. "A New Normal Form for the Design of Relational Database Schemata." ACM Transactions on Database Systems 7(3), September 1982.
[11] Codd, E.F. "Recent Investigations into Relational Data Base Systems". IBM Research Report RJ1385 (April 23, 1974). Republished in Proc. 1974 Congress (Stockholm, Sweden, 1974). New York, N.Y.: North-Holland (1974).
[12] Fagin, Ronald (September 1977). "Multivalued Dependencies and a New Normal Form for Relational Databases" (http://www.almaden.ibm.com/cs/people/fagin/tods77.pdf). ACM Transactions on Database Systems 2 (1): 267. doi:10.1145/320557.320571.
[13] Ronald Fagin. "Normal Forms and Relational Database Operators". ACM SIGMOD International Conference on Management of Data, May 31–June 1, 1979, Boston, Mass. Also IBM Research Report RJ2471, Feb. 1979.
[14] Ronald Fagin (1981). "A Normal Form for Relational Databases That Is Based on Domains and Keys" (http://www.almaden.ibm.com/cs/people/fagin/tods81.pdf). ACM Transactions on Database Systems, vol. 6, pp. 387–415.
[15] C.J. Date, Hugh Darwen, Nikos Lorentzos. Temporal Data and the Relational Model. Morgan Kaufmann (2002), p. 176.
• Paper: "Non First Normal Form Relations" by G. Jaeschke, H.-J. Schek; IBM Heidelberg Scientific Center. A paper studying the normalization and denormalization operators nest and unnest described above.
Further reading

• Litt's Tips: Normalization (http://www.troubleshooters.com/littstip/ltnorm.html)
• Date, C.J. (1999). An Introduction to Database Systems (http://www.aw-bc.com/catalog/academic/product/0,1144,0321197844,00.html) (8th ed.). Addison-Wesley Longman. ISBN 0-321-19784-4.
• Kent, W. (1983). "A Simple Guide to Five Normal Forms in Relational Database Theory" (http://www.bkent.net/Doc/simple5.htm), Communications of the ACM, vol. 26, pp. 120–125.
• Date, C.J., Darwen, H., & Pascal, F. Database Debunkings (http://www.dbdebunk.com)
• H.-J. Schek, P. Pistor. "Data Structures for an Integrated Data Base Management and Information Retrieval System"
External links

• Database Normalization Basics (http://databases.about.com/od/specificproducts/a/normalization.htm) by Mike Chapple (About.com)
• Database Normalization Intro (http://www.databasejournal.com/sqletc/article.php/1428511), Part 2 (http://www.databasejournal.com/sqletc/article.php/26861_1474411_1)
• An Introduction to Database Normalization (http://mikehillyer.com/articles/an-introduction-to-database-normalization/) by Mike Hillyer
• A tutorial on the first 3 normal forms (http://phlonx.com/resources/nf3/) by Fred Coulson
• DB Normalization Examples (http://www.dbnormalization.com/)
• Description of the database normalization basics (http://support.microsoft.com/kb/283878) by Microsoft
• Database Normalization and Design Techniques (http://www.barrywise.com/2008/01/database-normalization-and-design-techniques/) by Barry Wise, recommended reading for the Harvard MIS
• A Simple Guide to Five Normal Forms in Relational Database Theory (http://www.bkent.net/Doc/simple5.htm)
First normal form

First normal form (1NF) is a property of a relation in a relational database. A relation is in first normal form if the domain of each attribute contains only atomic values, and the value of each attribute contains only a single value from that domain.[1] Edgar Codd, in a 1971 conference paper, defined a relation in first normal form to be one such that none of the domains of that relation should have elements which are themselves sets.[2]

First normal form is an essential property of a relation in a relational database. Database normalization is the process of representing a database in terms of relations in standard normal forms, where first normal form is a minimal requirement.
Examples

The following scenario illustrates how a database design might violate first normal form.

Domains and values

Suppose a designer wishes to record the names and telephone numbers of customers. He defines a customer table which looks like this:
Customer

Customer ID   First Name   Surname     Telephone Number
123           Robert       Ingram      555-861-2025
456           Jane         Wright      555-403-1659
789           Maria        Fernandez   555-808-9633
The designer then becomes aware of a requirement to record multiple telephone numbers for some customers. He reasons that the simplest way of doing this is to allow the "Telephone Number" field in any given record to contain more than one value:
Customer

Customer ID   First Name   Surname     Telephone Number
123           Robert       Ingram      555-861-2025
456           Jane         Wright      555-403-1659, 555-776-4100
789           Maria        Fernandez   555-808-9633
Assuming, however, that the Telephone Number column is defined on some telephone number-like domain, such as the domain of 12-character strings, the representation above is not in first normal form. It is in violation of first normal form as a single field has been allowed to contain multiple values. A typical relational database management system will not allow fields in a table to contain multiple values in this way.
A design that complies with 1NF

A design that is unambiguously in first normal form makes use of two tables: a Customer Name table and a Customer Telephone Number table.
Customer Name

Customer ID   First Name   Surname
123           Robert       Ingram
456           Jane         Wright
789           Maria        Fernandez

Customer Telephone Number

Customer ID   Telephone Number
123           555-861-2025
456           555-403-1659
456           555-776-4100
789           555-808-9633
Repeating groups of telephone numbers do not occur in this design. Instead, each Customer-to-Telephone Number link appears on its own record. With Customer ID as the linking field, a "parent-child" or one-to-many relationship exists between the two tables. A record in the "parent" table, Customer Name, can have many telephone number records in the "child" table, Customer Telephone Number, but each telephone number belongs to one, and only one, customer. It is worth noting that this design meets the additional requirements for second and third normal form.
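The two-table design above can be sketched directly in SQL (shown here via Python's standard-library sqlite3 module; the snake_case table and column names are a rendering choice for the example). A join reassembles the original view, with Jane's two numbers appearing as two rows:

```python
import sqlite3

# The 1NF parent-child design from the tables above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_name (
        customer_id INTEGER PRIMARY KEY,
        first_name  TEXT,
        surname     TEXT
    );
    CREATE TABLE customer_phone (
        customer_id INTEGER REFERENCES customer_name(customer_id),
        phone       TEXT,
        PRIMARY KEY (customer_id, phone)
    );
""")
conn.executemany("INSERT INTO customer_name VALUES (?, ?, ?)",
                 [(123, "Robert", "Ingram"),
                  (456, "Jane", "Wright"),
                  (789, "Maria", "Fernandez")])
conn.executemany("INSERT INTO customer_phone VALUES (?, ?)",
                 [(123, "555-861-2025"), (456, "555-403-1659"),
                  (456, "555-776-4100"), (789, "555-808-9633")])

# One row per customer-to-number link: Jane's two numbers are two rows.
rows = conn.execute("""
    SELECT n.first_name, p.phone
    FROM customer_name n JOIN customer_phone p USING (customer_id)
    WHERE n.customer_id = 456
    ORDER BY p.phone
""").fetchall()
print(rows)  # [('Jane', '555-403-1659'), ('Jane', '555-776-4100')]
```

The composite primary key on customer_phone also prevents the same number being recorded twice for one customer, a constraint the multi-valued single-field design could not express.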
Atomicity

Edgar F. Codd's definition of 1NF makes reference to the concept of 'atomicity'. Codd states that the "values in the domains on which each relation is defined are required to be atomic with respect to the DBMS."[3] Codd defines an atomic value as one that "cannot be decomposed into smaller pieces by the DBMS (excluding certain special functions)",[4] meaning a field should not be divided into parts with more than one kind of data in it, such that what one part means to the DBMS depends on another part of the same field.

Hugh Darwen and Chris Date have suggested that Codd's concept of an "atomic value" is ambiguous, and that this ambiguity has led to widespread confusion about how 1NF should be understood.[5][6] In particular, the notion of a "value that cannot be decomposed" is problematic, as it would seem to imply that few, if any, data types are atomic:

• A character string would seem not to be atomic, as the RDBMS typically provides operators to decompose it into substrings.
• A fixed-point number would seem not to be atomic, as the RDBMS typically provides operators to decompose it into integer and fractional components.

Date suggests that "the notion of atomicity has no absolute meaning":[7] a value may be considered atomic for some purposes, but may be considered an assemblage of more basic elements for other purposes. If this position is accepted, 1NF cannot be defined with reference to atomicity. Columns of any conceivable data type (from string types and numeric types to array types and table types) are then acceptable in a 1NF table, although perhaps not always desirable (for example, it would be more desirable to separate a Customer Name field into two separate fields, First Name and Surname). First normal form, as defined by Chris Date, permits relation-valued attributes (tables within tables). Date argues that relation-valued attributes, by means of which a field within a table can contain a table, are useful in rare cases.[8]
1NF tables as representations of relations

According to Date's definition, a table is in first normal form if and only if it is "isomorphic to some relation", which means, specifically, that it satisfies the following five conditions:[9]

1. There's no top-to-bottom ordering to the rows.
2. There's no left-to-right ordering to the columns.
3. There are no duplicate rows.
4. Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else).
5. All columns are regular [i.e. rows have no hidden components such as row IDs, object IDs, or hidden timestamps].

Violation of any of these conditions would mean that the table is not strictly relational, and therefore that it is not in first normal form. Examples of tables (or views) that would not meet this definition of first normal form are:

• A table that lacks a unique key. Such a table would be able to accommodate duplicate rows, in violation of condition 3.
• A view whose definition mandates that results be returned in a particular order, so that the row-ordering is an intrinsic and meaningful aspect of the view.[10] This violates condition 1. The tuples in true relations are not ordered with respect to each other.
• A table with at least one nullable attribute. A nullable attribute would be in violation of condition 4, which requires every field to contain exactly one value from its column's domain. It should be noted, however, that this aspect of condition 4 is controversial. It marks an important departure from Codd's later vision of the relational model,[11] which made explicit provision for nulls.[12]
References
[1] Elmasri, Ramez and Navathe, Shamkant B. (July 2003). Fundamentals of Database Systems, Fourth Edition. Pearson. p. 315. ISBN 0321204484. "It states that the domain of an attribute must include only atomic (simple, indivisible) values and that the value of any attribute in a tuple must be a single value from the domain of that attribute."
[2] E. F. Codd (Oct 1972). "Further Normalization of the Data Base Relational Model". Data Base Systems. Courant Institute: Prentice-Hall. ISBN 013196741X. "A relation is in first normal form if it has the property that none of its domains has elements which are themselves sets."
[3] Codd, E. F. The Relational Model for Database Management Version 2 (Addison-Wesley, 1990).
[4] Codd, E. F. The Relational Model for Database Management Version 2 (Addison-Wesley, 1990), p. 6.
[5] Darwen, Hugh. "Relation-Valued Attributes; or, Will the Real First Normal Form Please Stand Up?", in C. J. Date and Hugh Darwen, Relational Database Writings 1989-1991 (Addison-Wesley, 1992).
[6] "[F]or many years," writes Date, "I was as confused as anyone else. What's worse, I did my best (worst?) to spread that confusion through my writings, seminars, and other presentations." Date, C. J. "What First Normal Form Really Means" (http://www.dbdebunk.com/page/page/629796.htm) in Date on Database: Writings 2000-2006 (Springer-Verlag, 2006), p. 108.
[7] Date, C. J. "What First Normal Form Really Means" (http://www.dbdebunk.com/page/page/629796.htm), p. 112.
[8] Date, C. J. "What First Normal Form Really Means" (http://www.dbdebunk.com/page/page/629796.htm), pp. 121-126.
[9] Date, C. J. "What First Normal Form Really Means" (http://www.dbdebunk.com/page/page/629796.htm), pp. 127-128.
[10] Such views cannot be created using SQL that conforms to the SQL:2003 standard.
[11] "Codd first defined the relational model in 1969 and didn't introduce nulls until 1979." Date, C. J. SQL and Relational Theory (O'Reilly, 2009), Appendix A.2.
[12] The third of Codd's 12 rules states that "Null values ... [must be] supported in a fully relational DBMS for representing missing information and inapplicable information in a systematic way, independent of data type." Codd, E. F. "Is Your DBMS Really Relational?" Computerworld, October 14, 1985.
Further reading
• Litt's Tips: Normalization (http://www.troubleshooters.com/littstip/ltnorm.html)
• Date, C. J., Lorentzos, N., & Darwen, H. (2002). Temporal Data & the Relational Model (http://www.elsevier.com/wps/product/cws_home/680662) (1st ed.). Morgan Kaufmann. ISBN 1-55860-855-9.
• Date, C. J. (1999), An Introduction to Database Systems (http://www.aw-bc.com/catalog/academic/product/0,1144,0321197844,00.html) (8th ed.). Addison-Wesley Longman. ISBN 0-321-19784-4.
• Kent, W. (1983) A Simple Guide to Five Normal Forms in Relational Database Theory (http://www.bkent.net/Doc/simple5.htm), Communications of the ACM, vol. 26, pp. 120-125.
• Date, C. J., Darwen, H., & Pascal, F. Database Debunkings (http://www.dbdebunk.com)
Second normal form

Second normal form (2NF) is a normal form used in database normalization. 2NF was originally defined by E.F. Codd in 1971.[1]

A table that is in first normal form (1NF) must meet additional criteria if it is to qualify for second normal form. Specifically: a table is in 2NF if and only if it is in 1NF and no non-prime attribute is dependent on any proper subset of any candidate key of the table. A non-prime attribute of a table is an attribute that is not a part of any candidate key of the table.

Put simply, a table is in 2NF if and only if it is in 1NF and every non-prime attribute of the table is either dependent on the whole of a candidate key, or on another non-prime attribute. Note that when a 1NF table has no composite candidate keys (candidate keys consisting of more than one attribute), the table is automatically in 2NF.
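The "no non-prime attribute depends on a proper subset of a candidate key" test can be checked mechanically against sample data. The helpers below are my own sketch (names and data are illustrative); note that a dependency observed in a sample only suggests, and cannot prove, a dependency in the schema itself.

```python
from itertools import combinations

def fd_holds(rows, lhs, rhs):
    """True if rows that agree on the lhs attributes also agree on rhs,
    i.e. the functional dependency lhs -> rhs holds in this sample."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

def partial_dependency(rows, candidate_key, attribute):
    """Return a proper, non-empty subset of candidate_key on which
    attribute depends -- the kind of dependency 2NF forbids for
    non-prime attributes -- or None if no such subset is found."""
    for size in range(1, len(candidate_key)):
        for subset in combinations(candidate_key, size):
            if fd_holds(rows, subset, (attribute,)):
                return subset
    return None

# Sample rows shaped like the employees' skills example that follows.
rows = [
    {"Employee": "Jones", "Skill": "Typing", "Location": "114 Main Street"},
    {"Employee": "Jones", "Skill": "Shorthand", "Location": "114 Main Street"},
    {"Employee": "Ellis", "Skill": "Alchemy", "Location": "73 Industrial Way"},
]
print(partial_dependency(rows, ("Employee", "Skill"), "Location"))
# -> ('Employee',): Location depends on part of the key, so 2NF is violated
```

A `None` result would mean no partial dependency was visible in the sample, which is consistent with (but does not establish) 2NF.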
Example

Consider a table describing employees' skills:

Employees' Skills

Employee   Skill           Current Work Location
Jones      Typing          114 Main Street
Jones      Shorthand       114 Main Street
Jones      Whittling       114 Main Street
Bravo      Light Cleaning  73 Industrial Way
Ellis      Alchemy         73 Industrial Way
Ellis      Flying          73 Industrial Way
Harrison   Light Cleaning  73 Industrial Way
Neither {Employee} nor {Skill} is a candidate key for the table. This is because a given Employee might need to appear more than once (he might have multiple Skills), and a given Skill might need to appear more than once (it might be possessed by multiple Employees). Only the composite key {Employee, Skill} qualifies as a candidate key for the table.

The remaining attribute, Current Work Location, is dependent on only part of the candidate key, namely Employee. Therefore the table is not in 2NF. Note the redundancy in the way Current Work Locations are represented: we are told three times that Jones works at 114 Main Street, and twice that Ellis works at 73 Industrial Way. This redundancy makes the table vulnerable to update anomalies: it is, for example, possible to update Jones' work location on his "Typing" and "Shorthand" records and not update his "Whittling" record. The resulting data would imply contradictory answers to the question "What is Jones' current work location?"

A 2NF alternative to this design would represent the same information in two tables: an "Employees" table with candidate key {Employee}, and an "Employees' Skills" table with candidate key {Employee, Skill}:
Employees

Employee   Current Work Location
Jones      114 Main Street
Bravo      73 Industrial Way
Ellis      73 Industrial Way
Harrison   73 Industrial Way
Employees' Skills

Employee   Skill
Jones      Typing
Jones      Shorthand
Jones      Whittling
Bravo      Light Cleaning
Ellis      Alchemy
Ellis      Flying
Harrison   Light Cleaning
Neither of these tables can suffer from update anomalies. Not all 2NF tables are free from update anomalies, however. An example of a 2NF table which suffers from update anomalies is:
Tournament Winners

Tournament            Year  Winner          Winner Date of Birth
Des Moines Masters    1998  Chip Masterson  14 March 1977
Indiana Invitational  1998  Al Fredrickson  21 July 1975
Cleveland Open        1999  Bob Albertson   28 September 1968
Des Moines Masters    1999  Al Fredrickson  21 July 1975
Indiana Invitational  1999  Chip Masterson  14 March 1977
Even though Winner and Winner Date of Birth are determined by the whole key {Tournament, Year} and not part of it, particular Winner / Winner Date of Birth combinations are shown redundantly on multiple records. This leads to an update anomaly: if updates are not carried out consistently, a particular winner could be shown as having two different dates of birth. The underlying problem is the transitive dependency to which the Winner Date of Birth attribute is subject. Winner Date of Birth actually depends on Winner, which in turn depends on the key Tournament / Year. This problem is addressed by third normal form (3NF).
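The 3NF-style fix can be sketched in SQLite: once each birth date lives in a one-row-per-player table with a primary key, a second, contradictory date of birth becomes unrepresentable. The schema below is my own sketch of the split described above; table and column names are illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")

# Decomposed design: birth dates live in one place, keyed by player,
# so contradictory dates cannot be recorded.
con.execute("""CREATE TABLE player (
    name TEXT PRIMARY KEY,
    date_of_birth TEXT NOT NULL)""")
con.execute("""CREATE TABLE tournament_winner (
    tournament TEXT,
    year INTEGER,
    winner TEXT NOT NULL REFERENCES player (name),
    PRIMARY KEY (tournament, year))""")

con.execute("INSERT INTO player VALUES ('Al Fredrickson', '21 July 1975')")
con.execute("INSERT INTO tournament_winner VALUES "
            "('Indiana Invitational', 1998, 'Al Fredrickson')")
con.execute("INSERT INTO tournament_winner VALUES "
            "('Des Moines Masters', 1999, 'Al Fredrickson')")

# A second, conflicting birth date is now rejected by the primary key.
try:
    con.execute("INSERT INTO player VALUES ('Al Fredrickson', '22 July 1975')")
    conflict_allowed = True
except sqlite3.IntegrityError:
    conflict_allowed = False
print(conflict_allowed)  # prints False
```

The two wins still reference a single player row, so "What is Al Fredrickson's date of birth?" has exactly one answer by construction.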
2NF and candidate keys A table for which there are no partial functional dependencies on the primary key is typically, but not always, in 2NF. In addition to the primary key, the table may contain other candidate keys; it is necessary to establish that no non-prime attributes have part-key dependencies on any of these candidate keys. Multiple candidate keys occur in the following table:
Electric Toothbrush Models

Manufacturer  Model        Model Full Name                Manufacturer Country
Forte         X-Prime      Forte X-Prime                  Italy
Forte         Ultraclean   Forte Ultraclean               Italy
Dent-o-Fresh  EZbrush      Dent-o-Fresh BananaBrush-2000  USA
Kobayashi     ST-60        Kobayashi ST-60                Japan
Hoch          Toothmaster  Hoch Toothmaster               Germany
Hoch          X-Prime      Hoch X-Prime                   Germany
Even if the designer has specified the primary key as {Model Full Name}, the table is not in 2NF. {Manufacturer, Model} is also a candidate key, and Manufacturer Country is dependent on a proper subset of it: Manufacturer. To make the design conform to 2NF, it is necessary to have two tables:
Electric Toothbrush Manufacturers

Manufacturer  Manufacturer Country
Forte         Italy
Dent-o-Fresh  USA
Kobayashi     Japan
Hoch          Germany
Electric Toothbrush Models

Manufacturer  Model        Model Full Name
Forte         X-Prime      Forte X-Prime
Forte         Ultraclean   Forte Ultraclean
Dent-o-Fresh  EZbrush      Dent-o-Fresh BananaBrush-2000
Kobayashi     ST-60        Kobayashi ST-60
Hoch          Toothmaster  Hoch Toothmaster
Hoch          X-Prime      Hoch X-Prime
References
[1] Codd, E. F. "Further Normalization of the Data Base Relational Model." (Presented at Courant Computer Science Symposia Series 6, "Data Base Systems," New York City, May 24th-25th, 1971.) IBM Research Report RJ909 (August 31st, 1971). Republished in Randall J. Rustin (ed.), Data Base Systems: Courant Computer Science Symposia Series 6. Prentice-Hall, 1972.
Further reading
• Litt's Tips: Normalization (http://www.troubleshooters.com/littstip/ltnorm.html)
• Date, C. J., Lorentzos, N., & Darwen, H. (2002). Temporal Data & the Relational Model (http://www.elsevier.com/wps/product/cws_home/680662) (1st ed.). Morgan Kaufmann. ISBN 1-55860-855-9.
• Date, C. J. (2004). Introduction to Database Systems (8th ed.). Boston: Addison-Wesley. ISBN 978-0-321-19784-9.
• Kent, W. (1983) A Simple Guide to Five Normal Forms in Relational Database Theory (http://www.bkent.net/Doc/simple5.htm), Communications of the ACM, vol. 26, pp. 120-125.
• Date, C. J., Darwen, H., & Pascal, F. Database Debunkings (http://www.dbdebunk.com)
External links
• Database Normalization Basics (http://databases.about.com/od/specificproducts/a/normalization.htm) by Mike Chapple (About.com)
• An Introduction to Database Normalization (http://mikehillyer.com/articles/an-introduction-to-database-normalization/) by Mike Hillyer
• A tutorial on the first 3 normal forms (http://phlonx.com/resources/nf3/) by Fred Coulson
• Description of the database normalization basics (http://support.microsoft.com/kb/283878) by Microsoft
Third normal form

In computer science, the third normal form (3NF) is a normal form used in database normalization. 3NF was originally defined by E.F. Codd in 1971.[1]

Codd's definition states that a table is in 3NF if and only if both of the following conditions hold:
• The relation R (table) is in second normal form (2NF).
• Every non-prime attribute of R is non-transitively dependent (i.e. directly dependent) on every superkey of R.

A non-prime attribute of R is an attribute that does not belong to any candidate key of R.[2] A transitive dependency is a functional dependency in which X → Z (X determines Z) indirectly, by virtue of X → Y and Y → Z (where it is not the case that Y → X).[3]

A 3NF definition that is equivalent to Codd's, but expressed differently, was given by Carlo Zaniolo in 1982. This definition states that a table is in 3NF if and only if, for each of its functional dependencies X → A, at least one of the following conditions holds:
• X contains A (that is, X → A is a trivial functional dependency), or
• X is a superkey, or
• Every element of A - X, the set difference between A and X, is a prime attribute (i.e., each column in A - X is contained in some candidate key).[4][5]

Zaniolo's definition gives a clear sense of the difference between 3NF and the more stringent Boyce–Codd normal form (BCNF). BCNF simply eliminates the third alternative ("A is a prime attribute").
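Zaniolo's three conditions translate directly into code. The sketch below is my own helper (attribute names are illustrative): it checks each given functional dependency against a relation's candidate keys, using the fact that a set of attributes is a superkey exactly when it contains some candidate key.

```python
def is_3nf(fds, candidate_keys):
    """Zaniolo's test: every FD (lhs, rhs) must be trivial, have a
    superkey left-hand side, or add only prime attributes.
    fds: list of (lhs, rhs) pairs of attribute sets.
    candidate_keys: list of attribute sets."""
    prime = set().union(*candidate_keys)       # attributes in some candidate key

    def is_superkey(attrs):
        return any(key <= set(attrs) for key in candidate_keys)

    for lhs, rhs in fds:
        added = set(rhs) - set(lhs)            # A - X
        if not added:                          # X contains A: trivial FD
            continue
        if is_superkey(lhs):                   # X is a superkey
            continue
        if added <= prime:                     # every element of A - X is prime
            continue
        return False
    return True

# Shaped like the Tournament Winners example below: {Tournament, Year} is
# the only candidate key, and Winner -> DOB is a transitive dependency.
fds = [({"Tournament", "Year"}, {"Winner", "DOB"}),
       ({"Winner"}, {"DOB"})]
print(is_3nf(fds, [{"Tournament", "Year"}]))   # prints False
```

Dropping the transitive dependency (by splitting the table, as the example later does) makes the remaining dependency's left-hand side a superkey, and the test passes.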
"Nothing but the key" A memorable statement of Codd's definition of 3NF, paralleling the traditional pledge to give true evidence in a court of law, was given by Bill Kent: "[Every] non-key [attribute] must provide a fact about the key, the whole key, and nothing but the key." [6] A common variation supplements this definition with the oath: "so help me Codd". [7] Requiring existence of "the key" ensures that the table is in 1NF; requiring that non-key attributes be dependent on "the whole key" ensures 2NF; further requiring that non-key attributes be dependent on "nothing but the key" ensures 3NF. Chris Date refers to Kent's summary as "an intuitively attractive characterization" of 3NF, and notes that with slight adaptation it may serve as a definition of the slightly stronger Boyce – Codd Codd normal form: "Each attribute must represent a fact about the key, the whole key, and nothing but the key." [8] The 3NF version of the definition is weaker than Date's BCNF variation, as the former is concerned only with ensuring that non-key attributes are dependent on keys. Prime attributes (which are keys or parts of keys) must not be functionally dependent at all; they each represent a fact about the key in the sense of providing part or all of the key itself. (It should be noted here that this rule applies only to functionally dependent attributes, as applying it to all attributes would implicitly prohibit composite candidate keys, since each part of any such key would violate the "whole key" clause.) An example of a 2NF table that fails to meet the requirements of 3NF is:
Tournament Winners

Tournament            Year  Winner          Winner Date of Birth
Indiana Invitational  1998  Al Fredrickson  21 July 1975
Cleveland Open        1999  Bob Albertson   28 September 1968
Des Moines Masters    1999  Al Fredrickson  21 July 1975
Indiana Invitational  1999  Chip Masterson  14 March 1977
Because each row in the table needs to tell us who won a particular Tournament in a particular Year, the composite key {Tournament, Year} is a minimal set of attributes guaranteed to uniquely identify a row. That is, {Tournament, Year} is a candidate key for the table. The breach of 3NF occurs because the non-prime attribute Winner Date of Birth is transitively dependent on the candidate key {Tournament, Year} via the non-prime attribute Winner. The fact that Winner Date of Birth is functionally dependent on Winner makes the table vulnerable to logical inconsistencies, as there is nothing to stop the same person from being shown with different dates of birth on different records. In order to express the same facts without violating 3NF, it is necessary to split the table into two:
Tournament Winners

Tournament            Year  Winner
Indiana Invitational  1998  Al Fredrickson
Cleveland Open        1999  Bob Albertson
Des Moines Masters    1999  Al Fredrickson
Indiana Invitational  1999  Chip Masterson
Player Dates of Birth

Player          Date of Birth
Chip Masterson  14 March 1977
Al Fredrickson  21 July 1975
Bob Albertson   28 September 1968
Update anomalies cannot occur in these tables, which are both in 3NF.
Derivation of Zaniolo's conditions The definition of 3NF offered by Carlo Zaniolo in 1982, and given above, is proved in the following way: Let X → A be a nontrivial FD (i.e. one where X does not contain A) and let A be a non-key attribute. Also let Y be a key of R. Then Y → X. Therefore A is not transitively dependent on Y if and only if X → Y, that is, if and only if X is a superkey.[9]
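Spelled out in symbols (my paraphrase of the argument above, not part of the original text):

```latex
% Setting: X \to A is nontrivial (A \notin X), A is a non-key attribute,
% and Y is a key of R, so Y functionally determines every attribute:
Y \to X \quad\text{and}\quad Y \to A .
% By definition, A is transitively dependent on Y via X exactly when
Y \to X, \qquad X \not\to Y, \qquad X \to A .
% The first and third conditions hold here, so A fails to be transitively
% dependent on Y precisely when the middle one fails:
A \text{ not transitively dependent on } Y \iff X \to Y .
% Finally, X \to Y for a key Y means X determines all attributes of R,
% which is exactly the statement that X is a superkey.
```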
Normalization beyond 3NF

Most 3NF tables are free of update, insertion, and deletion anomalies. Certain types of 3NF tables, rarely met with in practice, are affected by such anomalies; these are tables which either fall short of Boyce–Codd normal form (BCNF) or, if they meet BCNF, fall short of the higher normal forms 4NF or 5NF.
References
[1] Codd, E. F. "Further Normalization of the Data Base Relational Model." (Presented at Courant Computer Science Symposia Series 6, "Data Base Systems," New York City, May 24th-25th, 1971.) IBM Research Report RJ909 (August 31st, 1971). Republished in Randall J. Rustin (ed.), Data Base Systems: Courant Computer Science Symposia Series 6. Prentice-Hall, 1972.
[2] Codd, p. 43.
[3] Codd, pp. 45-46.
[4] Zaniolo, Carlo. "A New Normal Form for the Design of Relational Database Schemata." ACM Transactions on Database Systems 7(3), September 1982.
[5] Abraham Silberschatz, Henry F. Korth, S. Sudarshan, Database System Concepts (http://www.db-book.com/) (5th edition), pp. 276-277.
[6] Kent, William. "A Simple Guide to Five Normal Forms in Relational Database Theory" (http://www.bkent.net/Doc/simple5.htm), Communications of the ACM 26 (2), Feb. 1983, pp. 120-125.
[7] The author of a 1989 book on database management credits one of his students with coming up with the "so help me Codd" addendum. Diehr, George. Database Management (Scott, Foresman, 1989), p. 331.
[8] Date, C. J. An Introduction to Database Systems (7th ed.) (Addison Wesley, 2000), p. 379.
[9] Zaniolo, p. 494.
Further reading
• Date, C. J. (1999), An Introduction to Database Systems (http://www.aw-bc.com/catalog/academic/product/0,1144,0321197844,00.html) (8th ed.). Addison-Wesley Longman. ISBN 0-321-19784-4.
• Kent, W. (1983) A Simple Guide to Five Normal Forms in Relational Database Theory (http://www.bkent.net/Doc/simple5.htm), Communications of the ACM, vol. 26, pp. 120-125.
External links
• Litt's Tips: Normalization (http://www.troubleshooters.com/littstip/ltnorm.html)
• Database Normalization Basics (http://databases.about.com/od/specificproducts/a/normalization.htm) by Mike Chapple (About.com)
• An Introduction to Database Normalization (http://mikehillyer.com/articles/an-introduction-to-database-normalization/) by Mike Hillyer
• A tutorial on the first 3 normal forms (http://phlonx.com/resources/nf3/) by Fred Coulson
• Description of the database normalization basics (http://support.microsoft.com/kb/283878) by Microsoft
Boyce–Codd normal form

Boyce–Codd normal form (or BCNF or 3.5NF) is a normal form used in database normalization. It is a slightly stronger version of the third normal form (3NF). BCNF was developed in 1974 by Raymond F. Boyce and Edgar F. Codd to address certain types of anomaly not dealt with by 3NF as originally defined.[1]

Chris Date has pointed out that a definition of what we now know as BCNF appeared in a paper by Ian Heath in 1971.[2] Date writes: "Since that definition predated Boyce and Codd's own definition by some three years, it seems to me that BCNF ought by rights to be called Heath normal form. But it isn't."[3]

If a relational schema is in BCNF then all redundancy based on functional dependency has been removed, although other types of redundancy may still exist. A relational schema R is in Boyce–Codd normal form if and only if for every one of its dependencies X → Y, at least one of the following conditions holds:[4]
• X → Y is a trivial functional dependency (Y ⊆ X)
• X is a superkey for schema R
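This definition can be tested mechanically once the functional dependencies are written down: compute the attribute closure of each determinant and check that it covers the whole heading. The closure algorithm is standard; the schema below is my own simplified rendering of the court-booking example that follows (a subset of its dependencies, with illustrative names).

```python
def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs
    under the FDs, each FD given as a (lhs, rhs) pair of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_bcnf(heading, fds):
    """BCNF: every nontrivial FD X -> Y must have X a superkey."""
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):                  # trivial FD
            continue
        if closure(lhs, fds) >= set(heading):     # X is a superkey
            continue
        return False
    return True

heading = {"Court", "Start Time", "End Time", "Rate Type"}
fds = [({"Court", "Start Time"}, {"End Time", "Rate Type"}),
       ({"Rate Type"}, {"Court"})]
print(is_bcnf(heading, fds))  # prints False: {Rate Type} is not a superkey
```

The first dependency's determinant is a candidate key, so it passes; the second fails because the closure of {Rate Type} is only {Rate Type, Court}.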
3NF tables not meeting BCNF

Only in rare cases does a 3NF table not meet the requirements of BCNF. A 3NF table which does not have multiple overlapping candidate keys is guaranteed to be in BCNF.[5] Depending on what its functional dependencies are, a 3NF table with two or more overlapping candidate keys may or may not be in BCNF.

An example of a 3NF table that does not meet BCNF is:
Today's Court Bookings

Court  Start Time  End Time  Rate Type
1      09:30       10:30     SAVER
1      11:00       12:00     SAVER
1      14:00       15:30     STANDARD
2      10:00       11:30     PREMIUM-B
2      11:30       13:30     PREMIUM-B
2      15:00       16:30     PREMIUM-A
• Each row in the table represents a court booking at a tennis club that has one hard court (Court 1) and one grass court (Court 2)
• A booking is defined by its Court and the period for which the Court is reserved
• Additionally, each booking has a Rate Type associated with it. There are four distinct rate types:
  • SAVER, for Court 1 bookings made by members
  • STANDARD, for Court 1 bookings made by non-members
  • PREMIUM-A, for Court 2 bookings made by members
  • PREMIUM-B, for Court 2 bookings made by non-members
Note that Court 1 (normal quality) is less expensive than Court 2 (high quality).

The table's superkeys are:
• S1 = {Court, Start Time}
• S2 = {Court, End Time}
• S3 = {Rate Type, Start Time}
• S4 = {Rate Type, End Time}
• S5 = {Court, Start Time, End Time}
• S6 = {Rate Type, Start Time, End Time}
• S7 = {Court, Rate Type, Start Time}
• S8 = {Court, Rate Type, End Time}
• ST = {Court, Rate Type, Start Time, End Time}, the trivial superkey
Note that even though in the above table the Start Time and End Time attributes have no duplicate values, we still have to admit that on other days two different bookings on Court 1 and Court 2 could start at the same time or end at the same time. This is the reason why {Start Time} and {End Time} cannot be considered the table's superkeys. However, only S1, S2, S3 and S4 are candidate keys (that is, minimal superkeys for that relation): for example, S1 ⊂ S5, so S5 cannot be a candidate key.

Recall that 2NF prohibits partial functional dependencies of non-prime attributes (i.e. attributes that do not occur in ANY candidate key) on candidate keys, and that 3NF prohibits transitive functional dependencies of non-prime attributes on candidate keys. In the Today's Court Bookings table, there are no non-prime attributes: that is, all attributes belong to some candidate key. Therefore the table adheres to both 2NF and 3NF.

The table does not adhere to BCNF. This is because of the dependency Rate Type → Court, in which the determining attribute (Rate Type) is neither a candidate key nor a superset of a candidate key. The dependency Rate Type → Court holds because a Rate Type should only ever apply to a single Court.

The design can be amended so that it meets BCNF:
Rate Types

Rate Type  Court  Member Flag
SAVER      1      Yes
STANDARD   1      No
PREMIUM-A  2      Yes
PREMIUM-B  2      No
Today's Bookings

Rate Type  Start Time  End Time
SAVER      09:30       10:30
SAVER      11:00       12:00
STANDARD   14:00       15:30
PREMIUM-B  10:00       11:30
PREMIUM-B  11:30       13:30
PREMIUM-A  15:00       16:30
The candidate keys for the Rate Types table are {Rate Type} and {Court, Member Flag}; the candidate keys for the Today's Bookings table are {Rate Type, Start Time} and {Rate Type, End Time}. Both tables are in BCNF. Having one Rate Type associated with two different Courts is now impossible, so the anomaly affecting the original table has been eliminated.
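The candidate-key reasoning above can be checked by brute force: enumerate attribute subsets, keep those whose closure covers the heading, and discard non-minimal ones. The sketch below is my own code, with the booking example's dependencies written out as FDs under shortened, illustrative attribute names.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under FDs given as (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(heading, fds):
    """All minimal subsets of heading whose closure is the whole heading.
    Smaller subsets are tried first, so supersets of found keys are pruned."""
    keys = []
    for size in range(1, len(heading) + 1):
        for combo in combinations(sorted(heading), size):
            if closure(combo, fds) == set(heading) and \
               not any(key <= set(combo) for key in keys):
                keys.append(frozenset(combo))
    return keys

heading = {"Court", "Start", "End", "Rate"}
fds = [({"Court", "Start"}, {"End", "Rate"}),
       ({"Court", "End"}, {"Start", "Rate"}),
       ({"Rate", "Start"}, {"End"}),
       ({"Rate", "End"}, {"Start"}),
       ({"Rate"}, {"Court"})]
for key in candidate_keys(heading, fds):
    print(sorted(key))
```

The enumeration yields exactly four two-attribute keys, matching S1 through S4; every larger superkey is discarded as non-minimal.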
Achievability of BCNF

In some cases, a non-BCNF table cannot be decomposed into tables that satisfy BCNF and preserve the dependencies that held in the original table. Beeri and Bernstein showed in 1979 that, for example, a set of functional dependencies {AB → C, C → B} cannot be represented by a BCNF schema.[6] Thus, unlike the first three normal forms, BCNF is not always achievable.

Consider the following non-BCNF table whose functional dependencies follow the {AB → C, C → B} pattern:

Nearest Shops

Person    Shop Type    Nearest Shop
Davidson  Optician     Eagle Eye
Davidson  Hairdresser  Snippets
Wright    Bookshop     Merlin Books
Fuller    Bakery       Doughy's
Fuller    Hairdresser  Sweeney Todd's
Fuller    Optician     Eagle Eye
For each Person / Shop Type combination, the table tells us which shop of this type is geographically nearest to the person's home. We assume for simplicity that a single shop cannot be of more than one type.

The candidate keys of the table are:
• {Person, Shop Type}
• {Person, Nearest Shop}

Because all three attributes are prime attributes (i.e. belong to candidate keys), the table is in 3NF. The table is not in BCNF, however, as the Shop Type attribute is functionally dependent on a non-superkey: Nearest Shop.

The violation of BCNF means that the table is subject to anomalies. For example, Eagle Eye might have its Shop Type changed to "Optometrist" on its "Fuller" record while retaining the Shop Type "Optician" on its "Davidson" record. This would imply contradictory answers to the question: "What is Eagle Eye's Shop Type?" Holding each shop's Shop Type only once would seem preferable, as doing so would prevent such anomalies from occurring:
Shop Near Person

Person    Shop
Davidson  Eagle Eye
Davidson  Snippets
Wright    Merlin Books
Fuller    Doughy's
Fuller    Sweeney Todd's
Fuller    Eagle Eye
Shop

Shop            Shop Type
Eagle Eye       Optician
Snippets        Hairdresser
Merlin Books    Bookshop
Doughy's        Bakery
Sweeney Todd's  Hairdresser
In this revised design, the "Shop Near Person" table has a candidate key of {Person, Shop}, and the "Shop" table has a candidate key of {Shop}. Unfortunately, although this design adheres to BCNF, it is unacceptable on different grounds: it allows us to record multiple shops of the same type against the same person. In other words, its candidate keys do not guarantee that the functional dependency {Person, Shop Type} → {Shop} will be respected. A design that eliminates all of these anomalies (but does not conform to BCNF) is possible. [7] This design consists of the original "Nearest Shops" table supplemented by the "Shop" table described above.
Nearest Shops

Person    Shop Type    Nearest Shop
Davidson  Optician     Eagle Eye
Davidson  Hairdresser  Snippets
Wright    Bookshop     Merlin Books
Fuller    Bakery       Doughy's
Fuller    Hairdresser  Sweeney Todd's
Fuller    Optician     Eagle Eye
Shop

Shop            Shop Type
Eagle Eye       Optician
Snippets        Hairdresser
Merlin Books    Bookshop
Doughy's        Bakery
Sweeney Todd's  Hairdresser
If a referential integrity constraint is defined to the effect that {Shop Type, Nearest Shop} from the first table must refer to a {Shop Type, Shop} from the second table, then the data anomalies described previously are prevented.
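A referential-integrity constraint of that shape can be sketched in SQLite. Table and column names below are mine; note that the composite foreign key needs a matching UNIQUE constraint on the parent columns.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked

con.execute("""CREATE TABLE shop (
    shop       TEXT PRIMARY KEY,
    shop_type  TEXT NOT NULL,
    UNIQUE (shop_type, shop))""")        # parent columns of the composite FK

con.execute("""CREATE TABLE nearest_shop (
    person     TEXT,
    shop_type  TEXT,
    shop       TEXT,
    PRIMARY KEY (person, shop_type),
    FOREIGN KEY (shop_type, shop) REFERENCES shop (shop_type, shop))""")

con.execute("INSERT INTO shop VALUES ('Eagle Eye', 'Optician')")
con.execute("INSERT INTO nearest_shop VALUES "
            "('Davidson', 'Optician', 'Eagle Eye')")

# Recording Eagle Eye under a different Shop Type is now rejected,
# preventing the contradictory-Shop-Type anomaly described in the text.
try:
    con.execute("INSERT INTO nearest_shop VALUES "
                "('Fuller', 'Optometrist', 'Eagle Eye')")
    anomaly_allowed = True
except sqlite3.IntegrityError:
    anomaly_allowed = False
print(anomaly_allowed)  # prints False
```

Every (shop_type, shop) pair in the child table must now match a row of the "shop" table, so each shop's type is recorded exactly once.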
References
[1] Codd, E. F. "Recent Investigations into Relational Data Base Systems." IBM Research Report RJ1385 (April 23, 1974). Republished in Proc. 1974 Congress (Stockholm, Sweden, 1974). New York, N.Y.: North-Holland (1974).
[2] Heath, I. "Unacceptable File Operations in a Relational Database." Proc. 1971 ACM SIGFIDET Workshop on Data Description, Access, and Control, San Diego, Calif. (November 11th-12th, 1971).
[3] Date, C. J. Database in Depth: Relational Theory for Practitioners. O'Reilly (2005), p. 142.
[4] Silberschatz, Abraham (2006). Database System Concepts (6th ed.). McGraw-Hill. p. 333. ISBN 978-0-07-352332-3.
[5] Vincent, M.W. and B. Srinivasan. "A Note on Relation Schemes Which Are in 3NF But Not in BCNF." Information Processing Letters 48(6), 1993, pp. 281-283.
[6] Beeri, Catriel and Bernstein, Philip A. "Computational problems related to the design of normal form relational schemas." ACM Transactions on Database Systems 4(1), March 1979, p. 50.
[7] Zaniolo, Carlo. "A New Normal Form for the Design of Relational Database Schemata." ACM Transactions on Database Systems 7(3), September 1982, p. 493.
Bibliography
• Date, C. J. (1999). An Introduction to Database Systems (8th ed.). Addison-Wesley Longman. ISBN 0-321-19784-4.
External links
• Rules Of Data Normalization (http://web.archive.org/web/20080805014412/http://www.datamodel.org/NormalizationRules.html)
• Advanced Normalization (http://web.archive.org/web/20080423014733/http://www.utexas.edu/its/archive/windows/database/datamodeling/rm/rm8.html) by ITS, University of Texas.
Fourth normal form

Fourth normal form (4NF) is a normal form used in database normalization. Introduced by Ronald Fagin in 1977, 4NF is the next level of normalization after Boyce–Codd normal form (BCNF). Whereas the second, third, and Boyce–Codd normal forms are concerned with functional dependencies, 4NF is concerned with a more general type of dependency known as a multivalued dependency. A table is in 4NF if and only if, for every one of its non-trivial multivalued dependencies X ↠ Y, X is a superkey; that is, X is either a candidate key or a superset thereof.[1]
Multivalued dependencies
If the column headings in a relational database table are divided into three disjoint groupings X, Y, and Z, then, in the context of a particular row, we can refer to the data beneath each group of headings as x, y, and z respectively. A multivalued dependency X →→ Y signifies that if we choose any x actually occurring in the table (call this choice xc), and compile a list of all the xc yz combinations that occur in the table, we will find that xc is associated with the same y entries regardless of z. A trivial multivalued dependency X →→ Y is one where either Y is a subset of X, or X and Y together form the whole set of attributes of the relation.
A functional dependency is a special case of a multivalued dependency. In a functional dependency X → Y, every x determines exactly one y, never more than one.
Example
Consider the following example:

Pizza Delivery Permutations

Restaurant          Pizza Variety   Delivery Area
A1 Pizza            Thick Crust     Springfield
A1 Pizza            Thick Crust     Shelbyville
A1 Pizza            Thick Crust     Capital City
A1 Pizza            Stuffed Crust   Springfield
A1 Pizza            Stuffed Crust   Shelbyville
A1 Pizza            Stuffed Crust   Capital City
Elite Pizza         Thin Crust      Capital City
Elite Pizza         Stuffed Crust   Capital City
Vincenzo's Pizza    Thick Crust     Springfield
Vincenzo's Pizza    Thick Crust     Shelbyville
Vincenzo's Pizza    Thin Crust      Springfield
Vincenzo's Pizza    Thin Crust      Shelbyville
Each row indicates that a given restaurant can deliver a given variety of pizza to a given area. The table has no non-key attributes because its only key is {Restaurant, Pizza Variety, Delivery Area}. Therefore it meets all normal forms up to BCNF. If we assume, however, that pizza varieties offered by a restaurant are not affected by delivery area, then it does not meet 4NF. The problem is that the table features two non-trivial multivalued dependencies on the {Restaurant} attribute (which is not a superkey). The dependencies are:
• {Restaurant} →→ {Pizza Variety}
• {Restaurant} →→ {Delivery Area}
These non-trivial multivalued dependencies on a non-superkey reflect the fact that the varieties of pizza a restaurant offers are independent from the areas to which the restaurant delivers. This state of affairs leads to redundancy in the table: for example, we are told three times that A1 Pizza offers Stuffed Crust, and if A1 Pizza starts producing Cheese Crust pizzas then we will need to add multiple rows, one for each of A1 Pizza's delivery areas. There is, moreover, nothing to prevent us from doing this incorrectly: we might add Cheese Crust rows for all but one of A1 Pizza's delivery areas, thereby failing to respect the multivalued dependency {Restaurant} →→ {Pizza Variety}. To eliminate the possibility of these anomalies, we must place the facts about varieties offered into a different table from the facts about delivery areas, yielding two tables that are both in 4NF:
Varieties By Restaurant

Restaurant          Pizza Variety
A1 Pizza            Thick Crust
A1 Pizza            Stuffed Crust
Elite Pizza         Thin Crust
Elite Pizza         Stuffed Crust
Vincenzo's Pizza    Thick Crust
Vincenzo's Pizza    Thin Crust
Delivery Areas By Restaurant

Restaurant          Delivery Area
A1 Pizza            Springfield
A1 Pizza            Shelbyville
A1 Pizza            Capital City
Elite Pizza         Capital City
Vincenzo's Pizza    Springfield
Vincenzo's Pizza    Shelbyville
In contrast, if the pizza varieties offered by a restaurant sometimes did legitimately vary from one delivery area to another, the original three-column table would satisfy 4NF. Ronald Fagin demonstrated that it is always possible to achieve 4NF. [2] Rissanen's theorem is also applicable to multivalued dependencies.
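Fagin's decomposition can also be checked mechanically: joining the two 4NF tables back together on Restaurant must reproduce the original three-column table exactly (a lossless join). The sketch below demonstrates this with Python's standard sqlite3 module; the table and column names are invented for this illustration, not part of any article schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The two 4NF tables from the example.
cur.execute("CREATE TABLE varieties (restaurant TEXT, variety TEXT)")
cur.execute("CREATE TABLE areas (restaurant TEXT, area TEXT)")
cur.executemany("INSERT INTO varieties VALUES (?, ?)", [
    ("A1 Pizza", "Thick Crust"), ("A1 Pizza", "Stuffed Crust"),
    ("Elite Pizza", "Thin Crust"), ("Elite Pizza", "Stuffed Crust"),
    ("Vincenzo's Pizza", "Thick Crust"), ("Vincenzo's Pizza", "Thin Crust"),
])
cur.executemany("INSERT INTO areas VALUES (?, ?)", [
    ("A1 Pizza", "Springfield"), ("A1 Pizza", "Shelbyville"),
    ("A1 Pizza", "Capital City"), ("Elite Pizza", "Capital City"),
    ("Vincenzo's Pizza", "Springfield"), ("Vincenzo's Pizza", "Shelbyville"),
])

# The natural join on restaurant reconstructs the original table.
rows = cur.execute("""
    SELECT v.restaurant, v.variety, a.area
    FROM varieties v JOIN areas a ON v.restaurant = a.restaurant
""").fetchall()
print(len(rows))  # 12: exactly the rows of Pizza Delivery Permutations
```

Had the multivalued dependencies not held, the join would have produced spurious extra rows, and the decomposition would have been lossy.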
4NF in practice A 1992 paper by Margaret S. Wu notes that the teaching of database normalization typically stops short of 4NF, perhaps because of a belief that tables violating 4NF (but meeting all lower normal forms) are rarely encountered in business applications. This belief may not be accurate, however. Wu reports that in a study of forty organizational databases, over 20% contained one or more tables that violated 4NF while meeting all lower normal forms. [3]
References
[1] "A relation schema R* is in fourth normal form (4NF) if, whenever a nontrivial multivalued dependency X →→ Y holds for R*, then so does the functional dependency X → A for every column name A of R*. Intuitively all dependencies are the result of keys." Fagin, Ronald (September 1977). "Multivalued Dependencies and a New Normal Form for Relational Databases" (http://www.almaden.ibm.com/cs/people/fagin/tods77.pdf). ACM Transactions on Database Systems 2 (1): 267. doi:10.1145/320557.320571.
[2] Fagin, p. 268.
[3] Wu, Margaret S. (March 1992). "The Practical Need for Fourth Normal Form". ACM SIGCSE Bulletin 24 (1): 19–23. doi:10.1145/135250.134515.
Further reading
• Date, C. J. (1999), An Introduction to Database Systems (http://www.aw-bc.com/catalog/academic/product/0,1144,0321197844,00.html) (8th ed.). Addison-Wesley Longman. ISBN 0-321-19784-4.
• Kent, W. (1983) A Simple Guide to Five Normal Forms in Relational Database Theory (http://www.bkent.net/Doc/simple5.htm), Communications of the ACM, vol. 26, pp. 120–125.
• Date, C.J., Darwen, H., & Pascal, F. Database Debunkings (http://www.dbdebunk.com)
• Advanced Normalization (http://www.utexas.edu/its/windows/database/datamodeling/rm/rm8.html) by ITS, University of Texas.
Fifth normal form
Fifth normal form (5NF), also known as project-join normal form (PJ/NF), is a level of database normalization designed to reduce redundancy in relational databases recording multi-valued facts by isolating semantically related multiple relationships. A table is said to be in 5NF if and only if every join dependency in it is implied by the candidate keys. A join dependency *{A, B, ..., Z} on R is implied by the candidate key(s) of R if and only if each of A, B, ..., Z is a superkey for R. [1]
Example Consider the following example:
Travelling Salesman Product Availability By Brand

Travelling Salesman   Brand     Product Type
Jack Schneider        Acme      Vacuum Cleaner
Jack Schneider        Acme      Breadbox
Willy Loman           Robusto   Pruning Shears
Willy Loman           Robusto   Vacuum Cleaner
Willy Loman           Robusto   Breadbox
Willy Loman           Robusto   Umbrella Stand
Louis Ferguson        Robusto   Vacuum Cleaner
Louis Ferguson        Robusto   Telescope
Louis Ferguson        Acme      Vacuum Cleaner
Louis Ferguson        Acme      Lava Lamp
Louis Ferguson        Nimbus    Tie Rack
The table's predicate is: products of the type designated by Product Type, made by the brand designated by Brand, are available from the travelling salesman designated by Travelling Salesman. In the absence of any rules restricting the valid possible combinations of Travelling Salesman, Brand, and Product Type, the three-attribute table above is necessary in order to model the situation correctly. Suppose, however, that the following rule applies: a travelling salesman has certain Brands and certain Product Types in his repertoire. If Brand B is in his repertoire, and Product Type P is in his repertoire, then (assuming Brand B makes Product Type P) the travelling salesman must offer the products of Product Type P made by Brand B.
In that case, it is possible to split the table into three:
Product Types By Travelling Salesman

Travelling Salesman   Product Type
Jack Schneider        Vacuum Cleaner
Jack Schneider        Breadbox
Willy Loman           Pruning Shears
Willy Loman           Vacuum Cleaner
Willy Loman           Breadbox
Willy Loman           Umbrella Stand
Louis Ferguson        Telescope
Louis Ferguson        Vacuum Cleaner
Louis Ferguson        Lava Lamp
Louis Ferguson        Tie Rack
Brands By Travelling Salesman

Travelling Salesman   Brand
Jack Schneider        Acme
Willy Loman           Robusto
Louis Ferguson        Robusto
Louis Ferguson        Acme
Louis Ferguson        Nimbus
Product Types By Brand

Brand     Product Type
Acme      Vacuum Cleaner
Acme      Breadbox
Acme      Lava Lamp
Robusto   Pruning Shears
Robusto   Vacuum Cleaner
Robusto   Breadbox
Robusto   Umbrella Stand
Robusto   Telescope
Nimbus    Tie Rack
Note how this setup helps to remove redundancy. Suppose that Jack Schneider starts selling Robusto's products. In the previous setup we would have to add two new entries since Jack Schneider is able to sell two Product Types covered by Robusto: Breadboxes and Vacuum Cleaners. With the new setup we need only add a single entry (in Brands By Travelling Salesman).
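The claim that the three tables carry the same information as the original (given the rule) can be verified directly: under the rule, the three-way join of the projections reproduces the original table and nothing more. The sketch below uses Python's standard sqlite3 module; the table and column names are invented for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE salesman_ptype (salesman TEXT, ptype TEXT)")
cur.execute("CREATE TABLE salesman_brand (salesman TEXT, brand TEXT)")
cur.execute("CREATE TABLE brand_ptype (brand TEXT, ptype TEXT)")

cur.executemany("INSERT INTO salesman_ptype VALUES (?, ?)", [
    ("Jack Schneider", "Vacuum Cleaner"), ("Jack Schneider", "Breadbox"),
    ("Willy Loman", "Pruning Shears"), ("Willy Loman", "Vacuum Cleaner"),
    ("Willy Loman", "Breadbox"), ("Willy Loman", "Umbrella Stand"),
    ("Louis Ferguson", "Telescope"), ("Louis Ferguson", "Vacuum Cleaner"),
    ("Louis Ferguson", "Lava Lamp"), ("Louis Ferguson", "Tie Rack"),
])
cur.executemany("INSERT INTO salesman_brand VALUES (?, ?)", [
    ("Jack Schneider", "Acme"), ("Willy Loman", "Robusto"),
    ("Louis Ferguson", "Robusto"), ("Louis Ferguson", "Acme"),
    ("Louis Ferguson", "Nimbus"),
])
cur.executemany("INSERT INTO brand_ptype VALUES (?, ?)", [
    ("Acme", "Vacuum Cleaner"), ("Acme", "Breadbox"), ("Acme", "Lava Lamp"),
    ("Robusto", "Pruning Shears"), ("Robusto", "Vacuum Cleaner"),
    ("Robusto", "Breadbox"), ("Robusto", "Umbrella Stand"),
    ("Robusto", "Telescope"), ("Nimbus", "Tie Rack"),
])

# Three-way join: a row appears iff all three pairwise facts hold,
# which is exactly what the rule dictates.
rows = cur.execute("""
    SELECT sb.salesman, sb.brand, sp.ptype
    FROM salesman_brand sb
    JOIN salesman_ptype sp ON sb.salesman = sp.salesman
    JOIN brand_ptype bp ON bp.brand = sb.brand AND bp.ptype = sp.ptype
""").fetchall()
print(len(rows))  # 11: the original eleven-row table is recovered
```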
Usage
Only in rare situations does a 4NF table not conform to 5NF. These are situations in which a complex real-world constraint governing the valid combinations of attribute values in the 4NF table is not implicit in the structure of that table. If such a table is not normalized to 5NF, the burden of maintaining the logical consistency of the data within the table must be carried partly by the application responsible for insertions, deletions, and updates to it; and there is a heightened risk that the data within the table will become inconsistent. In contrast, the 5NF design excludes the possibility of such inconsistencies.
References
[1] Analysis of normal forms for anchor-tables (http://www.anchormodeling.com/wp-content/uploads/2010/08/6nf.pdf)
Further reading
• Kent, W. (1983) A Simple Guide to Five Normal Forms in Relational Database Theory (http://www.bkent.net/Doc/simple5.htm), Communications of the ACM, vol. 26, pp. 120–125.
• Date, C.J., Darwen, H., & Pascal, F. Database Debunkings (http://www.dbdebunk.com)
Domain/key normal form
Domain/key normal form (DKNF) is a normal form used in database normalization which requires that the database contains no constraints other than domain constraints and key constraints.
A domain constraint specifies the permissible values for a given attribute, while a key constraint specifies the attributes that uniquely identify a row in a given table. The domain/key normal form is achieved when every constraint on the relation is a logical consequence of the definition of keys and domains, and enforcing key and domain constraints causes all other constraints to be met. Thus, it avoids all non-temporal anomalies.
The reason to use domain/key normal form is to avoid having general constraints in the database that are not clear domain or key constraints. Most databases can easily test domain and key constraints on attributes. General constraints, however, would normally require special database programming in the form of stored procedures that are expensive to maintain and expensive for the database to execute. Therefore general constraints are split into domain and key constraints.
It's much easier to build a database in domain/key normal form than it is to convert lesser databases which may contain numerous anomalies. However, successfully building a domain/key normal form database remains a difficult task, even for experienced database programmers. Thus, while the domain/key normal form eliminates the problems found in most databases, it tends to be the most costly normal form to achieve. However, failing to achieve the domain/key normal form may carry long-term, hidden costs due to anomalies which appear over time in databases adhering only to lower normal forms.
The third normal form, Boyce–Codd normal form, fourth normal form and fifth normal form are special cases of the domain/key normal form. All have either functional, multi-valued or join dependencies that can be converted into (super)keys. The domains on those normal forms were unconstrained so all domain constraints are satisfied.
However, transforming a higher normal form into domain/key normal form is not always a dependency-preserving transformation and therefore not always possible.
Example A violation of DKNF occurs in the following table:
Wealthy Person

Wealthy Person   Wealthy Person Type     Net Worth in Dollars
Steve            Eccentric Millionaire   124,543,621
Roderick         Evil Billionaire        6,553,228,893
Katrina          Eccentric Billionaire   8,829,462,998
Gary             Evil Millionaire        495,565,211
(Assume that the domain for Wealthy Person consists of the names of all wealthy people in a pre-defined sample of wealthy people; the domain for Wealthy Person Type consists of the values 'Eccentric Millionaire', 'Eccentric Billionaire', 'Evil Millionaire', and 'Evil Billionaire'; and the domain for Net Worth in Dollars consists of all integers greater than or equal to 1,000,000.) There is a constraint linking Wealthy Person Type to Net Worth in Dollars, even though we cannot deduce one from the other. The constraint dictates that an Eccentric Millionaire or Evil Millionaire will have a net worth of 1,000,000 to 999,999,999 inclusive, while an Eccentric Billionaire or Evil Billionaire will have a net worth of 1,000,000,000 or higher. This constraint is neither a domain constraint nor a key constraint; therefore we cannot rely on domain constraints and key constraints to guarantee that an inconsistent Wealthy Person Type / Net Worth in Dollars combination does not make its way into the database. The DKNF violation could be eliminated by altering the Wealthy Person Type domain to make it consist of just two values, 'Evil' and 'Eccentric' (the wealthy person's status as a millionaire or billionaire is implicit in their Net Worth in Dollars, so no useful information is lost).
Wealthy Person

Wealthy Person   Wealthy Person Type   Net Worth in Dollars
Steve            Eccentric             124,543,621
Roderick         Evil                  6,553,228,893
Katrina          Eccentric             8,829,462,998
Gary             Evil                  495,565,211
Wealthiness Status

Status        Minimum         Maximum
Millionaire   1,000,000       999,999,999
Billionaire   1,000,000,000   999,999,999,999
DKNF is frequently difficult to achieve in practice.
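After the fix, every remaining constraint on the table really is a domain or key constraint, which is exactly the kind of constraint an off-the-shelf DBMS enforces cheaply. The sketch below (SQLite via Python's sqlite3 module; the schema and names are our own illustration, not from the article) expresses the Wealthy Person Type domain as a CHECK constraint, so a violating row is rejected by the engine with no stored-procedure logic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# In the corrected design the Type domain is just {'Eccentric', 'Evil'}
# and Net Worth is any integer >= 1,000,000: pure domain constraints,
# plus a key constraint on the person's name.
cur.execute("""
    CREATE TABLE wealthy_person (
        name TEXT PRIMARY KEY,
        person_type TEXT CHECK (person_type IN ('Eccentric', 'Evil')),
        net_worth INTEGER CHECK (net_worth >= 1000000)
    )
""")
cur.execute("INSERT INTO wealthy_person VALUES ('Steve', 'Eccentric', 124543621)")

# A row using the old, inconsistent domain is now rejected by the DBMS itself.
try:
    cur.execute(
        "INSERT INTO wealthy_person VALUES ('Gary', 'Evil Millionaire', 495565211)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```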
References
• Fagin, Ronald (1981). "A Normal Form for Relational Databases That Is Based on Domains and Keys" [1]. ACM Transactions on Database Systems 6: 387–415. doi:10.1145/319587.319592.

[1] http://www.almaden.ibm.com/cs/people/fagin/tods81.pdf
External links
• Database Normalization Basics (http://databases.about.com/od/specificproducts/a/normalization.htm) by Mike Chapple (About.com)
• An Introduction to Database Normalization (http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html) by Mike Hillyer.
• Normalization (http://www.utexas.edu/its-archive/windows/database/datamodeling/rm/rm7.html) by ITS, University of Texas.
• A tutorial on the first 3 normal forms (http://phlonx.com/resources/nf3/) by Fred Coulson
• Description of the database normalization basics (http://support.microsoft.com/kb/283878) by Microsoft
Sixth normal form
Sixth normal form (6NF) is a term in relational database theory, used in two different ways.
6NF (C. Date's definition)
A book by Christopher J. Date and others on temporal databases [1] defined sixth normal form as a normal form for databases based on an extension of the relational algebra. In this work, the relational operators, such as join, are generalized to support a natural treatment of interval data, such as sequences of dates or moments in time. [2] Sixth normal form is then based on this generalized join, as follows:
A relvar R [table] is in sixth normal form (abbreviated 6NF) if and only if it satisfies no nontrivial join dependencies at all — where, as before, a join dependency is trivial if and only if at least one of the projections (possibly U_projections) involved is taken over the set of all attributes of the relvar [table] concerned. [Date et al.] [3]
Any relation in 6NF is also in 5NF. Sixth normal form is intended to decompose relation variables to irreducible components. Though this may be relatively unimportant for non-temporal relation variables, it can be important when dealing with temporal variables or other interval data. For instance, if a relation comprises a supplier's name, status, and city, we may also want to add temporal data, such as the time during which these values are, or were, valid (e.g., for historical data), but the three values may vary independently of each other and at different rates. We may, for instance, wish to trace the history of changes to Status. For further discussion on temporal aggregation in SQL, see Zimanyi. [4] For a different approach, see TSQL2. [5]
DKNF Some authors use the term sixth normal form differently, namely, as a synonym for Domain/key normal form (DKNF). This usage predates Date et al.'s work. [6]
Usage The sixth normal form is currently being used in some data warehouses where the benefits outweigh the drawbacks,[7] for example using Anchor Modeling. Although using 6NF leads to an explosion of tables, modern databases can prune the tables from select queries (using a process called 'table elimination') where they are not required and thus speed up queries that only access several attributes.
References
[1] Date et al., 2003
[2] op. cit., chapter 9: Generalizing the relational operators
[3] op. cit., section 10.4, p. 176
[4] Zimanyi 2005
[5] Snodgrass, Richard T. TSQL2 Temporal Query Language (http://www.cs.arizona.edu/~rts/tsql2.html). Describes history, gives references to standard and original book.
[6] See www.dbdebunk.com for a discussion on this topic (http://www.dbdebunk.com/page/page/621935.htm)
[7] See the Anchor Modeling website (http://www.anchormodeling.com) for a website that describes a data warehouse modelling method based on the sixth normal form
Further reading
• Date, C.J. (2006). The relational database dictionary: a comprehensive glossary of relational terms and concepts, with illustrative examples. O'Reilly Series Pocket References. O'Reilly Media, Inc. p. 90. ISBN 978-0-596-52798-3.
• Date, Chris J.; Hugh Darwen, Nikos A. Lorentzos (January 2003). Temporal Data and the Relational Model: A Detailed Investigation into the Application of Interval and Relation Theory to the Problem of Temporal Database Management. Oxford: Elsevier LTD. ISBN 1-55860-855-9.
• Zimanyi, E. (June 2006). "Temporal Aggregates and Temporal Universal Quantification in Standard SQL" (http://www.sigmod.org/publications/sigmod-record/0606/sigmod-record.june2006.pdf) (PDF). ACM SIGMOD Record, volume 35, number 2, page 16. ACM.
Relation (database)
In a relational database, a relation is a set of tuples (d1, d2, ..., dj), where each element dj is a member of Dj, a data domain. [1] Each distinct domain used in the definition of a relation is called an attribute, and each attribute may be named. In SQL, a query language for relational databases, relations are represented by tables, where each row of a table represents a single tuple, and where the values of each attribute form a column.
Relation, tuple, and attribute represented as table, row, and column.
E. F. Codd originally used the term in its mathematical sense of a finitary relation, a set of tuples on some set of n sets S1, S2, ..., Sn. [2] In this sense, the term was used by Augustus De Morgan in 1858. [3] Where all values of every attribute of a relation are atomic, that relation is said to be in first normal form.
Examples
Below is an example of a relation having three named attributes: 'ID' from the domain of integers, and 'Name' and 'Address' from the domain of strings:

ID (Integer)   Name (String)        Address (String)
102            Yonezawa Akinori     Naha, Okinawa
202            Murata Makoto        Sendai, Miyagi
104            Sakamura Ken         Kumamoto, Kumamoto
152            Matsumoto Yukihiro   Okinawa, Okinawa
The tuples are unordered - one cannot say "The tuple of 'Murata Makoto' is above the tuple of 'Matsumoto Yukihiro'", nor can one say "The tuple of 'Yonezawa Akinori' is the first tuple."
Base relation variable and derived relation variable (view)
Relation variables (relvars) are classified into two classes: base relation variables and derived relation variables. By applying a relational algebra expression or relational calculus expression to one or more relation variables, a new relation value is derived. A base relation variable is a source relation variable which is not derived from any other relation variables. In SQL, a database language for relational databases, the term base table can be roughly considered a base relation variable. A Data Definition Language (DDL) is used to define base relation variables; in SQL, CREATE TABLE syntax is used. The following is an example.

CREATE TABLE List_of_people (
    ID INTEGER,
    Name CHAR(40),
    Address CHAR(200),
    PRIMARY KEY (ID)
)
A derived relation variable is a relation variable which is derived from one or more relation variables by applying a relational algebra expression or relational calculus expression. A view is considered a derived relation variable. A Data Definition Language (DDL) is also used to define derived relation variables; in SQL, CREATE VIEW syntax is used. The following is an example.

CREATE VIEW List_of_Okinawa_people AS (
    SELECT ID, Name, Address
    FROM List_of_people
    WHERE Address LIKE '%, Okinawa'
)
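As a quick demonstration that the base table and the view work together, here is a minimal session using Python's standard sqlite3 module, with data values taken from the example relation above (the optional parentheses around the view body are dropped here to keep the statement portable):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Base relation variable: defined with CREATE TABLE.
cur.execute("""
    CREATE TABLE List_of_people (
        ID INTEGER,
        Name CHAR(40),
        Address CHAR(200),
        PRIMARY KEY (ID)
    )
""")

# Derived relation variable: defined with CREATE VIEW.
cur.execute("""
    CREATE VIEW List_of_Okinawa_people AS
    SELECT ID, Name, Address
    FROM List_of_people
    WHERE Address LIKE '%, Okinawa'
""")

cur.executemany("INSERT INTO List_of_people VALUES (?, ?, ?)", [
    (102, "Yonezawa Akinori", "Naha, Okinawa"),
    (202, "Murata Makoto", "Sendai, Miyagi"),
    (152, "Matsumoto Yukihiro", "Okinawa, Okinawa"),
])

# The view's value is derived on demand from the base table.
rows = cur.execute("SELECT Name FROM List_of_Okinawa_people ORDER BY ID").fetchall()
print(rows)  # only the two Okinawa addresses match the LIKE pattern
```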
References
[1] E. F. Codd (Oct 1972). "Further normalization of the data base relational model". Data Base Systems. Courant Institute: Prentice-Hall. ISBN 013196741X. "R is a relation on these n sets if it is a set of elements of the form (d1, d2, ..., dj) where dj ∈ Dj for each j=1,2,...,n."
[2] Codd, Edgar F (June 1970). "A Relational Model of Data for Large Shared Data Banks" (http://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf). Communications of the ACM (Association for Computing Machinery) 13 (6): 377–87. doi:10.1145/362384.362685. "The term relation is used here in its accepted mathematical sense"
[3] Augustus De Morgan (1858). "On the Syllogism, No. III". Transactions of the Cambridge Philosophical Society 10: 208. "When two objects, qualities, classes, or attributes, viewed together by the mind, are seen under some connexion, that connexion is called a relation."
• Date, C. J. (1994). An Introduction to Database Systems (6 ed.). Addison-Wesley. ISBN 978-0-201-82458-2.
• Date, C. J. (2005). Database in Depth: Relational Theory for Practitioners. Beijing: O'Reilly Media. ISBN 978-0-596-10012-4.
Functional dependency
A functional dependency is a constraint between two sets of attributes in a relation from a database. Given a relation R, a set of attributes X in R is said to functionally determine another attribute Y, also in R (written X → Y), if, and only if, each X value is associated with precisely one Y value. Customarily we call X the determinant set and Y the dependent attribute. Thus, given a tuple and the values of the attributes in X, one can determine the corresponding value of the Y attribute. In simple words, if the X value is known, the Y value is certainly known. For the purposes of simplicity, given that X and Y are sets of attributes in R, X → Y denotes that X functionally determines each of the members of Y; in this case Y is known as the dependent set. Thus, a candidate key is a minimal set of attributes that functionally determine all of the attributes in a relation.
The concept of functional dependency arises when one attribute is dependent on another attribute and also uniquely determines it. (Note: the "function" being discussed in "functional dependency" is the function of identification.) A functional dependency FD: X → Y is called trivial if Y is a subset of X.
The determination of functional dependencies is an important part of designing databases in the relational model, and in database normalization and denormalization. The functional dependencies, along with the attribute domains, are selected so as to generate constraints that would exclude as much data inappropriate to the user domain from the system as possible.
For example, suppose one is designing a system to track vehicles and the capacity of their engines. Each vehicle has a unique vehicle identification number (VIN). One would write VIN → EngineCapacity because it would be inappropriate for a vehicle's engine to have more than one capacity. (Assuming, in this case, that vehicles only have one engine.) However, EngineCapacity → VIN is incorrect because there could be many vehicles with the same engine capacity.
This functional dependency may suggest that the attribute EngineCapacity be placed in a relation with candidate key VIN. However, that may not always be appropriate. For example, if that functional dependency occurs as a result of the transitive functional dependencies VIN → VehicleModel and VehicleModel → EngineCapacity, then that would
not result in a normalized relation.
Irreducible function depending set
A functional depending set S is irreducible if the set has the following three properties:
1. Each right set of a functional dependency of S contains only one attribute.
2. Each left set of a functional dependency of S is irreducible. It means that reducing any one attribute from the left set will change the content of S (S will lose some information).
3. Reducing any functional dependency will change the content of S.
Sets of functional dependencies (FDs) with these properties are also called canonical or minimal.
Properties of functional dependencies
Given that X, Y, and Z are sets of attributes in a relation R, one can derive several properties of functional dependencies. Among the most important are Armstrong's axioms, which are used in database normalization:
• Subset Property (Axiom of Reflexivity): If Y is a subset of X, then X → Y
• Augmentation (Axiom of Augmentation): If X → Y, then XZ → YZ
• Transitivity (Axiom of Transitivity): If X → Y and Y → Z, then X → Z
From these rules, we can derive these secondary rules:
• Union: If X → Y and X → Z, then X → YZ
• Decomposition: If X → YZ, then X → Y and X → Z
• Pseudotransitivity: If X → Y and WY → Z, then WX → Z
Equivalent sets of functional dependencies are called covers of each other. Every set of functional dependencies has a canonical cover.
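Armstrong's axioms give a simple fixed-point algorithm for computing the closure X+ of an attribute set X: repeatedly apply every FD whose left-hand side already lies inside the closure. The sketch below (Python; the encoding of FDs as pairs of frozensets is our own) uses it to confirm the transitive dependency from the vehicle example:

```python
def closure(attrs, fds):
    """Compute the closure of a set of attributes under a set of FDs.

    fds is an iterable of (lhs, rhs) pairs of frozensets. An FD fires
    when its left side is already contained in the running closure.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# VIN -> VehicleModel and VehicleModel -> EngineCapacity together
# imply VIN -> EngineCapacity by transitivity.
fds = [
    (frozenset({"VIN"}), frozenset({"VehicleModel"})),
    (frozenset({"VehicleModel"}), frozenset({"EngineCapacity"})),
]
print(closure({"VIN"}, fds))  # contains all three attributes
```

A set X is a superkey exactly when its closure contains every attribute of the relation, so this routine also decides key-ness.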
Example
This example illustrates the concept of functional dependency. The situation modeled is that of college students visiting one or more lectures in each of which they are assigned a teaching assistant (TA). Let's further assume that every student is in some semester and is identified by a unique integer ID.

StudentID   Semester   Lecture             TA
1234        6          Numerical Methods   John
2380        4          Numerical Methods   Peter
1234        6          Visual Computing    Amina
1201        4          Numerical Methods   Peter
1201        4          Physics II          Simone
We notice that whenever two rows in this table feature the same StudentID, they also necessarily have the same Semester values. This basic fact can be expressed by a functional dependency:
• StudentID → Semester
Other nontrivial functional dependencies can be identified, for example:
• {StudentID, Lecture} → TA
• {StudentID, Lecture} → {TA, Semester}
The latter expresses the fact that the set {StudentID, Lecture} is a superkey of the relation.
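The definition ("each X value is associated with precisely one Y value") translates directly into a check over the rows of a table. A small sketch (Python; encoding attributes as column indices is our own convention) verifies the dependencies identified above:

```python
# The example table as (StudentID, Semester, Lecture, TA) tuples.
rows = [
    (1234, 6, "Numerical Methods", "John"),
    (2380, 4, "Numerical Methods", "Peter"),
    (1234, 6, "Visual Computing", "Amina"),
    (1201, 4, "Numerical Methods", "Peter"),
    (1201, 4, "Physics II", "Simone"),
]

def holds(rows, lhs, rhs):
    """True iff the FD lhs -> rhs holds on rows: equal lhs values
    never map to two different rhs values. lhs and rhs are lists of
    column indices."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        val = tuple(row[i] for i in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

print(holds(rows, [0], [1]))     # StudentID -> Semester: True
print(holds(rows, [0, 2], [3]))  # {StudentID, Lecture} -> TA: True
print(holds(rows, [1], [3]))     # Semester -> TA: False (4 maps to Peter and Simone)
```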
External links
• Gary Burt (summer, 1999). "CS 461 (Database Management Systems) lecture notes" [1]. University of Maryland Baltimore County Department of Computer Science and Electrical Engineering.
• Jeffrey D. Ullman. "CS345 Lecture Notes" [2] (PostScript). Stanford University.
• Osmar Zaiane (June 9, 1998). "CMPT 354 (Database Systems I) lecture notes" [3]. Simon Fraser University Department of Computing Science.
References
[1] http://www.cs.umbc.edu/courses/461/current/burt/lectures/lec14/
[2] http://www-db.stanford.edu/~ullman/cs345notes/slides01-1.ps
[3] http://www.cs.sfu.ca/CC/354/zaiane/material/notes/Chapter6/node10.html
Multivalued dependency
In database theory, a multivalued dependency is a full constraint between two sets of attributes in a relation.
In contrast to the functional dependency, the multivalued dependency requires that certain tuples be present in a relation. Therefore, a multivalued dependency is a special case of tuple-generating dependency. The multivalued dependency plays a role in the 4NF database normalization.
Formal definition
The formal definition is given as follows. [1]
Let R be a relation schema and let α ⊆ R and β ⊆ R be sets of attributes. The multivalued dependency α →→ β (which can be read as α multidetermines β) holds on R if, in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that t1[α] = t2[α], there exist tuples t3 and t4 in r such that:

t1[α] = t2[α] = t3[α] = t4[α]
t3[β] = t1[β]
t3[R − β] = t2[R − β]
t4[β] = t2[β]
t4[R − β] = t1[R − β]

In more simple words the above condition can be expressed as follows: if we denote by (x, y, z) the tuple having values for α, β, and R − α − β collectively equal to x, y, z correspondingly, then whenever the tuples (a, b, c) and (a, d, e) exist in r, the tuples (a, b, e) and (a, d, c) should also exist in r.
Example Consider this example of a database of teaching courses, the books recommended for the course, and the lecturers who will be teaching the course:
Course  Book          Lecturer
AHA     Silberschatz  John D
AHA     Nederpelt     John D
AHA     Silberschatz  William M
AHA     Nederpelt     William M
AHA     Silberschatz  Christian G
AHA     Nederpelt     Christian G
OSO     Silberschatz  John D
OSO     Silberschatz  William M
Because the lecturers attached to the course and the books attached to the course are independent of each other, this database design has a multivalued dependency; if we were to add a new book to the AHA course, we would have to add one record for each of the lecturers on that course, and vice versa. Put formally, there are two multivalued dependencies in this relation: {course} ↠ {book} and equivalently {course} ↠ {lecturer}. Databases with multivalued dependencies thus exhibit redundancy. In database normalization, fourth normal form requires that either every multivalued dependency X ↠ Y is trivial or, for every nontrivial multivalued dependency X ↠ Y, X is a superkey.
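The cross-product requirement behind a multivalued dependency can be checked mechanically. The following sketch is illustrative (the function name and row encoding are assumptions, not part of the article): within each group of rows sharing the same X-values, every combination of a Y-value and a "rest"-value seen in the group must itself appear as a row.

```python
from itertools import product

def mvd_violations(rows, x, y):
    """Return the tuples missing for the MVD X ->> Y to hold; an empty
    list means the dependency is satisfied."""
    rest = [a for a in rows[0] if a not in x and a not in y]
    groups = {}
    for row in rows:
        groups.setdefault(tuple(row[a] for a in x), []).append(row)
    missing = []
    for key, grp in groups.items():
        ys = {tuple(r[a] for a in y) for r in grp}
        rests = {tuple(r[a] for a in rest) for r in grp}
        have = {(tuple(r[a] for a in y), tuple(r[a] for a in rest)) for r in grp}
        for yv, rv in product(ys, rests):       # required cross product
            if (yv, rv) not in have:
                missing.append(key + yv + rv)
    return missing

table = [
    {"course": "AHA", "book": "Silberschatz", "lecturer": "John D"},
    {"course": "AHA", "book": "Nederpelt",    "lecturer": "John D"},
    {"course": "AHA", "book": "Silberschatz", "lecturer": "William M"},
    {"course": "AHA", "book": "Nederpelt",    "lecturer": "William M"},
    {"course": "AHA", "book": "Silberschatz", "lecturer": "Christian G"},
    {"course": "AHA", "book": "Nederpelt",    "lecturer": "Christian G"},
    {"course": "OSO", "book": "Silberschatz", "lecturer": "John D"},
    {"course": "OSO", "book": "Silberschatz", "lecturer": "William M"},
]

print(mvd_violations(table, ["course"], ["book"]))  # []: {course} ->> {book} holds
# Removing one AHA row breaks the required cross product:
print(mvd_violations(table[:1] + table[2:], ["course"], ["book"]))
```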
Interesting properties
• If α ↠ β, then α ↠ R − α − β
• If α ↠ β and γ ⊆ δ, then αδ ↠ βγ
• If α ↠ β and β ↠ γ, then α ↠ γ − β
The following also involve functional dependencies:
• If α → β, then α ↠ β
• If α ↠ β, γ ⊆ β, and δ → γ for some δ disjoint from β, then α → γ
The above rules are sound and complete.
• A decomposition of R into (X, Y) and (X, R − Y) is a lossless-join decomposition if and only if X ↠ Y holds in R.
Definitions
full constraint
A constraint which expresses something about all attributes in a database. (In contrast to an embedded constraint.) That a multivalued dependency is a full constraint follows from its definition, as it says something about the attributes α ∪ β ∪ (R − β) = R.
tuple-generating dependency
A dependency which explicitly requires certain tuples to be present in the relation.
trivial multivalued dependency 1
A multivalued dependency which involves all the attributes of a relation, i.e. α ∪ β = R. A trivial multivalued dependency implies, for tuples t1 and t2, tuples t3 and t4 which are equal to t1 and t2.
trivial multivalued dependency 2
A multivalued dependency for which β ⊆ α.
References
[1] Silberschatz, Abraham; Korth; Sudarshan (2006). Database System Concepts (5th ed.). McGraw-Hill. p. 295. ISBN 0-07-124476-X.
External links
• Multivalued dependencies and a new Normal form for Relational Databases (http://www.almaden.ibm.com/cs/people/fagin/tods77.pdf) (PDF) - Ronald Fagin, IBM Research Lab
Join dependency
A join dependency is a constraint on the set of legal relations over a database scheme. A table T is subject to a join dependency if T can always be recreated by joining multiple tables each having a subset of the attributes of T. If one of the tables in the join has all the attributes of the table T, the join dependency is called trivial.
The join dependency plays an important role in the fifth normal form, also known as project-join normal form, because it can be proven that if you decompose a scheme R into tables R1 to Rn, the decomposition will be a lossless-join decomposition if you restrict the legal relations on R to a join dependency on R called *(R1, R2, …, Rn).
Another way to describe a join dependency is to say that the set of relationships in the join dependency is independent of each other.
Formal definition
Let R be a relation schema and let R1, R2, …, Rn be a decomposition of R. The relation r(R) satisfies the join dependency *(R1, R2, …, Rn) if
r = πR1(r) ⋈ πR2(r) ⋈ … ⋈ πRn(r).
A join dependency *(R1, R2, …, Rn) is trivial if one of the Ri is R itself.
—Silberschatz, Korth. Database System Concepts, 1st Edition [1]
Example
Given a pizza-chain that models purchases in table Customer = {order-number, customer-name, pizza-name, courier}, it is obvious that you can derive the following relations:
• customer-name depends on order-number
• pizza-name depends on order-number
• courier depends on order-number
Since the relationships are independent, you can say there is a join dependency as follows: *((order-number, customer-name), (order-number, pizza-name), (order-number, courier)).
If each customer has his own courier, however, you could have a join dependency like this: *((order-number, customer-name), (order-number, courier), (customer-name, courier), (order-number, pizza-name)), but *((order-number, customer-name, courier), (order-number, pizza-name)) would be valid as well. This makes it obvious that just having a join dependency is not enough to normalize a database scheme.
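The lossless-join condition behind a join dependency can be tested by brute force: project the relation onto each component and check that the natural join of the projections reproduces exactly the original rows. This sketch (function name, row encoding, and sample data are assumptions for illustration) does just that:

```python
def satisfies_jd(rows, components):
    """Check the join dependency *(components): the relation must equal
    the natural join of its projections onto each component."""
    projections = [{frozenset((a, r[a]) for a in comp) for r in rows}
                   for comp in components]
    joined = projections[0]
    for proj in projections[1:]:
        new = set()
        for t1 in joined:
            for t2 in proj:
                d1, d2 = dict(t1), dict(t2)
                # Natural join: keep pairs that agree on shared attributes.
                if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                    new.add(frozenset({**d1, **d2}.items()))
        joined = new
    return joined == {frozenset(r.items()) for r in rows}

orders = [
    {"order": 1, "customer": "Ann", "pizza": "Margherita", "courier": "Bob"},
    {"order": 2, "customer": "Ann", "pizza": "Hawaii",     "courier": "Bob"},
    {"order": 3, "customer": "Cal", "pizza": "Margherita", "courier": "Dee"},
]

# order-number determines each attribute independently, so this JD holds:
print(satisfies_jd(orders, [("order", "customer"),
                            ("order", "pizza"),
                            ("order", "courier")]))          # True
# A decomposition whose components share no key is lossy:
print(satisfies_jd(orders, [("order", "customer"),
                            ("pizza", "courier")]))          # False
```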
References
[1] Silberschatz, Korth. Database System Concepts, 1st Edition
Concurrency control
In information technology and computer science, especially in the fields of computer programming, operating systems, multiprocessors, and databases, concurrency control ensures that correct results for concurrent operations are generated, while getting those results as quickly as possible.
Computer systems, both software and hardware, consist of modules, or components. Each component is designed to operate correctly, i.e., to obey or meet certain consistency rules. When components that operate concurrently interact by messaging or by sharing accessed data (in memory or storage), a certain component's consistency may be violated by another component. The general area of concurrency control provides rules, methods, design methodologies, and theories to maintain the consistency of components operating concurrently while interacting, and thus the consistency and correctness of the whole system. Introducing concurrency control into a system means applying operation constraints which typically result in some performance reduction. Operation consistency and correctness should be achieved with as good efficiency as possible, without reducing performance below reasonable levels. For example, a failure in concurrency control can result in data corruption from torn read or write operations.
Concurrency control in databases Comments:
1. This section is applicable to all transactional systems, i.e., to all systems that use database transactions (atomic transactions; e.g., transactional objects in Systems management and in networks of smartphones which typically implement private, dedicated database systems), not only general-purpose database management systems (DBMSs).
2. DBMSs also need to deal with concurrency control issues not typical just to database transactions but rather to operating systems in general. These issues (e.g., see Concurrency control in operating systems below) are out of the scope of this section.
Concurrency control in Database management systems (DBMS; e.g., Bernstein et al. 1987, Weikum and Vossen 2001), other transactional objects, and related distributed applications (e.g., Grid computing and Cloud computing) ensures that database transactions are performed concurrently without violating the data integrity of the respective databases. Thus concurrency control is an essential element for correctness in any system where two or more database transactions, executed with time overlap, can access the same data, e.g., virtually in any general-purpose database system. Consequently a vast body of related research has been accumulated since database systems emerged in the early 1970s. A well established concurrency control theory for database systems is outlined in the references mentioned above: serializability theory, which makes it possible to effectively design and analyze concurrency control methods and mechanisms. An alternative theory, for concurrency control of atomic transactions over abstract data types, is presented in (Lynch et al. 1993), and is not utilized below. This theory is more refined and complex, with a wider scope, and has been less utilized in the Database literature than the classical theory above.
Each theory has its pros and cons, emphasis and insight. To some extent they are complementary, and their merging may be useful. To ensure correctness, a DBMS usually guarantees that only serializable transaction schedules are generated, unless serializability is intentionally relaxed to increase performance, but only in cases where application correctness is not harmed. For maintaining correctness in cases of failed (aborted) transactions (which can always happen for many reasons), schedules also need to have the recoverability (from abort) property. A DBMS also guarantees that no effect of committed transactions is lost, and no effect of aborted (rolled back) transactions remains in the related database. Overall transaction characterization is usually summarized by the ACID rules below. As databases have
become distributed, or needed to cooperate in distributed environments (e.g., Federated databases in the early 1990s, and Cloud computing currently), the effective distribution of concurrency control mechanisms has received special attention.
Database transaction and the ACID rules
The concept of a database transaction (or atomic transaction) has evolved in order to enable both a well understood database system behavior in a faulty environment where crashes can happen any time, and recovery from a crash to a well understood database state. A database transaction is a unit of work, typically encapsulating a number of operations over a database (e.g., reading a database object, writing, acquiring a lock, etc.), an abstraction supported in database and also other systems. Each transaction has well defined boundaries in terms of which program/code executions are included in that transaction (determined by the transaction's programmer via special transaction commands). Every database transaction obeys the following rules (by support in the database system; i.e., a database system is designed to guarantee them for the transactions it runs):
• Atomicity - Either the effects of all or none of its operations remain ("all or nothing" semantics) when a transaction is completed (committed or aborted respectively). In other words, to the outside world a committed transaction appears (by its effects on the database) to be indivisible and atomic, and an aborted transaction does not leave effects on the database at all, as if it never existed.
• Consistency - Every transaction must leave the database in a consistent (correct) state, i.e., maintain the predetermined integrity rules of the database (constraints upon and among the database's objects). A transaction must transform a database from one consistent state to another consistent state (however, it is the responsibility of the transaction's programmer to make sure that the transaction itself is correct, i.e., performs correctly what it intends to perform from the application's point of view, while the predefined integrity rules are enforced by the DBMS). Thus, since a database can normally be changed only by transactions, all the database's states are consistent.
An aborted transaction does not change the database state it has started from, as if it never existed (atomicity above).
• Isolation - Transactions cannot interfere with each other (as an end result of their executions). Moreover, usually (depending on the concurrency control method) the effects of an incomplete transaction are not even visible to another transaction. Providing isolation is the main goal of concurrency control.
• Durability - Effects of successful (committed) transactions must persist through crashes (typically by recording the transaction's effects and its commit event in a non-volatile memory).
The concept of the atomic transaction has been extended over the years to what have become Business transactions, which actually implement types of Workflow and are not atomic. However, such enhanced transactions typically utilize atomic transactions as components.
Why is concurrency control needed?
If transactions are executed serially, i.e., sequentially with no overlap in time, no transaction concurrency exists. However, if concurrent transactions with interleaving operations are allowed in an uncontrolled manner, some unexpected, undesirable results may occur. Here are some typical examples:
1. The lost update problem: A second transaction writes a second value of a data-item (datum) on top of a first value written by a first concurrent transaction, and the first value is lost to other transactions running concurrently which need, by their precedence, to read the first value. The transactions that have read the wrong value end with incorrect results.
2. The dirty read problem: Transactions read a value written by a transaction that has been later aborted. This value disappears from the database upon abort, and should not have been read by any transaction ("dirty read"). The reading transactions end with incorrect results.
3. The incorrect summary problem: While one transaction takes a summary over the values of all the instances of a repeated data-item, a second transaction updates some instances of that data-item. The resulting summary does not reflect a correct result for any (usually needed for correctness) precedence order between the two transactions (if one is executed before the other), but rather some random result, depending on the timing of the updates, and on whether certain update results have been included in the summary or not.
Most high-performance transactional systems need to run transactions concurrently to meet their performance requirements. Thus, without concurrency control such systems can neither provide correct results nor maintain their databases consistent.
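The lost update problem above can be illustrated with a small sketch, using Python threads as stand-ins for concurrent transactions (an illustration of the idea, not DBMS code):

```python
import threading

balance = 100                      # shared data item
lock = threading.Lock()

def deposit_unsafe(amount):
    """Read-modify-write without concurrency control: another thread may
    write between our read and our write, and that update is then lost."""
    global balance
    read = balance                 # read a value...
    balance = read + amount        # ...write based on a possibly stale read

def deposit_safe(amount):
    """Pessimistic concurrency control: the lock blocks other writers
    until the update completes, so no update can be lost."""
    global balance
    with lock:
        balance += amount

threads = [threading.Thread(target=deposit_safe, args=(1,)) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(balance)  # 200: all 100 deposits applied, none lost
```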
Concurrency control mechanisms Categories
The main categories of concurrency control mechanisms are:
• Optimistic - Delay the checking of whether a transaction meets the isolation and other integrity rules (e.g., serializability and recoverability) until its end, without blocking any of its (read, write) operations ("...and be optimistic about the rules being met..."), and then abort a transaction to prevent the violation, if the desired rules would be violated upon its commit. An aborted transaction is immediately restarted and re-executed, which incurs an obvious overhead (versus executing it to the end only once). If not too many transactions are aborted, then being optimistic is usually a good strategy.
• Pessimistic - Block an operation of a transaction, if it may cause violation of the rules, until the possibility of violation disappears. Blocking operations typically involves performance reduction.
• Semi-optimistic - Block operations in some situations, if they may cause violation of some rules, and do not block in other situations while delaying rules checking (if needed) to the transaction's end, as done with optimistic.
Different categories provide different performance, i.e., different average transaction completion rates (throughput), depending on transaction type mix, computing level of parallelism, and other factors. If selection and knowledge about trade-offs are available, then the category and method should be chosen to provide the highest performance.
The mutual blocking between two or more transactions (where each one blocks the other) results in a deadlock, where the transactions involved are stalled and cannot reach completion. Most non-optimistic mechanisms (with blocking) are prone to deadlocks, which are resolved by an intentional abort of a stalled transaction (which releases the other transactions in that deadlock), and its immediate restart and re-execution. The likelihood of a deadlock is typically low.
Blocking, deadlocks, and aborts all result in performance reduction, and hence the trade-offs between the categories.
Methods
Many methods for concurrency control exist. Most of them can be implemented within either main category above. The major methods, [1] each of which has many variants, and which in some cases may overlap or be combined, are:
1. Locking (e.g., Two-phase locking - 2PL) - Controlling access to data by locks assigned to the data. Access of a transaction to a data item (database object) locked by another transaction may be blocked (depending on lock type and access operation type) until lock release.
2. Serialization graph checking (also called Serializability, Conflict, or Precedence graph checking) - Checking for cycles in the schedule's graph and breaking them by aborts.
3. Timestamp ordering (TO) - Assigning timestamps to transactions, and controlling or checking access to data by timestamp order.
4. Commitment ordering (or Commit ordering; CO) - Controlling or checking transactions' chronological order of commit events to be compatible with their respective precedence order.
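The locking idea, and the SS2PL rule discussed below of releasing locks only at transaction end, can be sketched as a toy in-memory lock manager. The class names and behavior are illustrative assumptions, not a real DBMS interface; a real implementation would make a conflicting transaction wait rather than fail:

```python
class LockManager:
    """Tracks which transaction holds an exclusive lock on each data item."""
    def __init__(self):
        self.owner = {}

    def acquire(self, txn, key):
        holder = self.owner.get(key)
        if holder is not None and holder is not txn:
            # In a real DBMS the transaction would block (wait) here.
            raise RuntimeError("blocked: %r is locked" % key)
        self.owner[key] = txn

    def release_all(self, txn):
        for key in [k for k, o in self.owner.items() if o is txn]:
            del self.owner[key]

class Transaction:
    def __init__(self, lock_manager, db):
        self.lm, self.db = lock_manager, db

    def read(self, key):
        self.lm.acquire(self, key)      # lock on first access
        return self.db.get(key)

    def write(self, key, value):
        self.lm.acquire(self, key)
        self.db[key] = value

    def commit(self):
        self.lm.release_all(self)       # SS2PL: release all locks only at end

db, lm = {}, LockManager()
t1, t2 = Transaction(lm, db), Transaction(lm, db)
t1.write("x", 1)
try:
    t2.read("x")                        # conflicts with t1's lock on x
except RuntimeError as e:
    print(e)
t1.commit()                             # locks released at transaction end
print(t2.read("x"))                     # 1: t2 may now proceed
```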
Other major concurrency control types that are utilized in conjunction with the methods above include:
• Multiversion concurrency control (MVCC) - Increasing concurrency and performance by generating a new version of a database object each time the object is written, and allowing transactions' read operations to use the several last relevant versions (of each object), depending on the scheduling method.
• Index concurrency control - Synchronizing access operations to indexes, rather than to user data. Specialized methods provide substantial performance gains.
• Private workspace model (Deferred update) - Each transaction maintains a private workspace for its accessed data, and its changed data become visible outside the transaction only upon its commit (e.g., Weikum and Vossen 2001). This model provides a different concurrency control behavior with benefits in many cases.
The most common mechanism type in database systems since their early days in the 1970s has been Strong strict Two-phase locking (SS2PL; also called Rigorous scheduling or Rigorous 2PL), which is a special case (variant) of both Two-phase locking (2PL) and Commitment ordering (CO). It is pessimistic. In spite of its long name (for historical reasons) the idea of the SS2PL mechanism is simple: "Release all locks applied by a transaction only after the transaction has ended." SS2PL (or Rigorousness) is also the name of the set of all schedules that can be generated by this mechanism, i.e., the SS2PL (or Rigorous) schedules, which have the SS2PL (or Rigorousness) property.
Major goals of concurrency control mechanisms
Concurrency control mechanisms firstly need to operate correctly, i.e., to maintain each transaction's integrity rules (as related to concurrency; application-specific integrity rules are out of scope here) while transactions are running concurrently, and thus the integrity of the entire transactional system. Correctness needs to be achieved with as good performance as possible. In addition, increasingly a need exists to operate effectively while transactions are distributed over processes, computers, and computer networks. Other subjects that may affect concurrency control are recovery and replication.
Correctness
Serializability
For correctness, a common major goal of most concurrency control mechanisms is generating schedules with the Serializability property. Without serializability undesirable phenomena may occur, e.g., money may disappear from accounts, or be generated from nowhere. Serializability of a schedule means equivalence (in the resulting database values) to some serial schedule with the same transactions (i.e., one in which transactions are sequential with no overlap in time, and thus completely isolated from each other: no concurrent access by any two transactions to the same data is possible). Serializability is considered the highest level of isolation among database transactions, and the major correctness criterion for concurrent transactions. In some cases compromised, relaxed forms of serializability are allowed for better performance (e.g., the popular Snapshot isolation mechanism) or to meet availability requirements in highly distributed systems (see Eventual consistency), but only if the application's correctness is not violated by the relaxation (e.g., no relaxation is allowed for money transactions, since by relaxation money can disappear, or appear from nowhere). Almost all implemented concurrency control mechanisms achieve serializability by providing Conflict serializability, a broad special case of serializability (i.e., it covers, enables most serializable schedules, and does not impose significant additional delay-causing constraints) which can be implemented efficiently.
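Conflict serializability can be tested with the standard precedence-graph construction: add an edge Ti → Tj for each pair of conflicting operations (same item, at least one write) where Ti's operation comes first, and check the graph for cycles. The schedule encoding below is an assumption for illustration:

```python
def conflict_serializable(schedule):
    """schedule: list of (txn, op, item) triples with op in {"r", "w"}.
    Returns True iff the precedence graph is acyclic."""
    graph = {}
    for i, (t1, op1, x1) in enumerate(schedule):
        for t2, op2, x2 in schedule[i + 1:]:
            if t1 != t2 and x1 == x2 and "w" in (op1, op2):
                graph.setdefault(t1, set()).add(t2)  # edge t1 -> t2

    visited, stack = set(), set()
    def has_cycle(node):
        if node in stack:
            return True
        if node in visited:
            return False
        visited.add(node)
        stack.add(node)
        if any(has_cycle(n) for n in graph.get(node, ())):
            return True
        stack.discard(node)
        return False

    return not any(has_cycle(n) for n in list(graph))

# Classic lost-update interleaving: T1 and T2 each read x, then write x.
print(conflict_serializable([("T1", "r", "x"), ("T2", "r", "x"),
                             ("T1", "w", "x"), ("T2", "w", "x")]))  # False
# The same operations executed serially are trivially conflict serializable.
print(conflict_serializable([("T1", "r", "x"), ("T1", "w", "x"),
                             ("T2", "r", "x"), ("T2", "w", "x")]))  # True
```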
Recoverability
See Recoverability in Serializability Comment: While in the general area of systems the term "recoverability" may refer to the ability of a system to
recover from failure or from an incorrect/forbidden state, within concurrency control of database systems this term has received a specific meaning. Concurrency control typically also ensures the Recoverability property of schedules for maintaining correctness in cases of aborted transactions (which can always happen for many reasons). Recoverability (from abort) means that no committed transaction in a schedule has read data written by an aborted transaction. Such data disappear from the database (upon the abort) and are parts of an incorrect database state. Reading such data violates the consistency rule of ACID. Unlike Serializability, Recoverability cannot be compromised or relaxed in any case, since any relaxation results in quick database integrity violation upon aborts. The major methods listed above provide serializability mechanisms. None of them in its general form automatically provides recoverability, and special considerations and mechanism enhancements are needed to support recoverability. A commonly utilized special case of recoverability is Strictness, which allows efficient database recovery from failure (but excludes optimistic implementations; e.g., Strict CO (SCO) cannot have an optimistic implementation, but has semi-optimistic ones).
Comment: Note that the Recoverability property is needed even if no database failure occurs and no database recovery from failure is needed. It is rather needed to correctly and automatically handle transaction aborts, which may
be unrelated to database failure and recovery from it. Distribution
With the fast technological development of computing the difference between local and distributed computing over low latency networks or buses is blurring. Thus the quite effective utilization of local techniques in such distributed environments is common, e.g., in computer clusters and multi-core processors. However the local techniques have their limitations and use multi-processes (or threads) supported by multi-processors (or multi-cores) to scale. This often turns transactions into distributed ones, if they themselves need to span multi-processes. In these cases most local concurrency control techniques do not scale well. Distributed serializability and Commitment ordering
See Distributed serializability in Serializability.
As database systems have become distributed, or started to cooperate in distributed environments (e.g., Federated databases in the early 1990s, and nowadays Grid computing, Cloud computing, and networks with smartphones), some transactions have become distributed. A distributed transaction means that the transaction spans processes, and may span computers and geographical sites. This generates a need for effective distributed concurrency control mechanisms. Achieving the Serializability property of a distributed system's schedule (see Distributed serializability and Global serializability (Modular serializability)) effectively poses special challenges typically not met by most of the regular serializability mechanisms, which were originally designed to operate locally. This is especially due to a need for costly distribution of concurrency control information amid communication and computer latency. The only known general effective technique for distribution is Commitment ordering, which was disclosed publicly in 1991 (after being patented). Commitment ordering (Commit ordering, CO; Raz 1992) means that transactions' chronological order of commit events is kept compatible with their respective precedence order. CO does not require the distribution of concurrency control information and provides a general effective solution (reliable, high-performance, and scalable) for both distributed and global serializability, also in a heterogeneous environment with database systems (or other transactional objects) with different (any) concurrency control mechanisms. [1] CO is indifferent to which mechanism is utilized, since it does not interfere with any transaction operation scheduling (which most mechanisms control), and only determines the order of commit events.
Thus, CO enables the efficient distribution of all other mechanisms, and also the distribution of a mix of different (any) local mechanisms, for achieving
distributed and global serializability. The existence of such a solution has been considered "unlikely" until 1991, and by many experts also later, due to misunderstanding of the CO solution (see Quotations in Global serializability). An important side-benefit of CO is automatic distributed deadlock resolution. Contrary to CO, virtually all other techniques (when not combined with CO) are prone to distributed deadlocks (also called global deadlocks) which need special handling. CO is also the name of the resulting schedule property: A schedule has the CO property if the chronological order of its transactions' commit events is compatible with the respective transactions' precedence (partial) order. SS2PL mentioned above is a variant (special case) of CO and thus also effective to achieve distributed and global serializability. It also provides automatic distributed deadlock resolution (a fact overlooked in the research literature even after CO's publication), as well as Strictness and thus Recoverability. Possessing these desired properties together with known efficient locking based implementations explains SS2PL's popularity. SS2PL has been utilized to efficiently achieve Distributed and Global serializability since the 1980s, and has become the de facto standard for it. However, SS2PL is blocking and constraining (pessimistic), and with the proliferation of distribution and utilization of systems different from traditional database systems (e.g., as in Cloud computing), less constraining types of CO (e.g., Optimistic CO) may be needed for better performance.
Comments:
1. The Distributed conflict serializability property in its general form is difficult to achieve efficiently, but it is achieved efficiently via its special case Distributed CO: Each local component (e.g., a local DBMS) needs both to provide some form of CO, and enforce a special vote ordering strategy for the Two-phase commit protocol (2PC: utilized to commit distributed transactions). Differently from the general Distributed CO, Distributed SS2PL exists automatically when all local components are SS2PL based (in each component CO exists, implied, and the vote ordering strategy is now met automatically). This fact has been known and utilized since the 1980s (i.e., that SS2PL exists globally, without knowing about CO) for efficient Distributed SS2PL, which implies Distributed serializability and strictness (e.g., see Raz 1992, page 293; it is also implied in Bernstein et al. 1987, page 78). Less constrained Distributed serializability and strictness can be efficiently achieved by Distributed Strict CO (SCO), or by a mix of SS2PL based and SCO based local components.
2. About the references and Commitment ordering: (Bernstein et al. 1987) was published before the discovery of CO in 1990. The CO schedule property is called Dynamic atomicity in (Lynch et al. 1993, page 201). CO is described in (Weikum and Vossen 2001, pages 102, 700), but the description is partial and misses CO's essence. (Raz 1992) was the first refereed article about CO algorithms to be accepted for publication (however, publications about an equivalent Dynamic atomicity property can be traced to 1988). Other CO articles followed. (Bernstein and Newcomer 2009) [1] note CO as one of the four major concurrency control methods, and CO's ability to provide interoperability among other methods.
Distributed recoverability
Unlike Serializability, Distributed recoverability and Distributed strictness can be achieved efficiently in a straightforward way, similarly to the way Distributed CO is achieved: In each database system they have to be applied locally, and employ a vote ordering strategy for the Two-phase commit protocol (2PC; Raz 1992, page 307). As has been mentioned above, Distributed SS2PL, including Distributed strictness (recoverability) and Distributed commitment ordering (serializability), automatically employs the needed vote ordering strategy, and is achieved (globally) when employed locally in each (local) database system (as has been known and utilized for many years; as a matter of fact locality is defined by the boundary of a 2PC participant (Raz 1992) ).
Other major subjects of attention
The design of concurrency control mechanisms is often influenced by the following subjects: Recovery
All systems are prone to failures, and handling recovery from failure is a must. The properties of the generated schedules, which are dictated by the concurrency control mechanism, may have an impact on the effectiveness and efficiency of recovery. For example, the Strictness property (mentioned in the section Recoverability above) is often desirable for an efficient recovery. Replication
For high availability, database objects are often replicated. Updates of replicas of the same database object need to be kept synchronized. This may affect the way concurrency control is done (e.g., Gray et al. 1996 [2]).
References
• Philip A. Bernstein, Vassos Hadzilacos, Nathan Goodman (1987): Concurrency Control and Recovery in Database Systems [3] (free PDF download), Addison Wesley Publishing Company, 1987, ISBN 0-201-10715-5
• Gerhard Weikum, Gottfried Vossen (2001): Transactional Information Systems [4], Elsevier, ISBN 1-55860-508-8
• Nancy Lynch, Michael Merritt, William Weihl, Alan Fekete (1993): Atomic Transactions in Concurrent and Distributed Systems [5], Morgan Kauffman (Elsevier), August 1993, ISBN 978-1-55860-104-8, ISBN 1-55860-104-X
• Yoav Raz (1992): "The Principle of Commitment Ordering, or Guaranteeing Serializability in a Heterogeneous Environment of Multiple Autonomous Resource Managers Using Atomic Commitment." [6] (PDF [7]), Proceedings of the Eighteenth International Conference on Very Large Data Bases (VLDB), pp. 292-312, Vancouver, Canada, August 1992. (also DEC-TR 841, Digital Equipment Corporation, November 1990)
Footnotes
[1] Philip A. Bernstein, Eric Newcomer (2009): Principles of Transaction Processing, 2nd Edition (http://www.elsevierdirect.com/product.jsp?isbn=9781558606234), Morgan Kaufmann (Elsevier), June 2009, ISBN 978-1-55860-623-4 (page 145)
[2] Gray, J.; Helland, P.; O'Neil, P.; Shasha, D. (1996). "The dangers of replication and a solution" (ftp://ftp.research.microsoft.com/pub/tr/tr-96-17.pdf). Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data. pp. 173-182. doi:10.1145/233269.233330.
[3] http://research.microsoft.com/en-us/people/philbe/ccontrol.aspx
[4] http://www.elsevier.com/wps/find/bookdescription.cws_home/677937/description#description
[5] http://www.elsevier.com/wps/find/bookdescription.cws_home/680521/description#description
[6] http://www.informatik.uni-trier.de/~ley/db/conf/vldb/Raz92.html
[7] http://www.vldb.org/conf/1992/P292.PDF
Concurrency control in operating systems
Multitasking operating systems, especially real-time operating systems, need to maintain the illusion that all of the tasks running on top of them are running at the same time, even though only one or a few tasks are actually executing at any given moment, owing to the limitations of the hardware the operating system runs on. Such multitasking is fairly simple when all tasks are independent of each other. However, when several tasks try to use the same resource, or when tasks try to share information, the result can be confusion and inconsistency. Solving this problem is the task of concurrent computing. Some solutions involve "locks" similar to the locks used in databases, but these risk causing problems of their own, such as deadlock. Other solutions are non-blocking algorithms.
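The lock-based approach described above can be illustrated with a minimal sketch in Python's standard `threading` module (an illustrative example, not tied to any particular operating system's API): several threads share one counter, and a mutual-exclusion lock makes each read-modify-write step atomic, preventing lost updates.

```python
import threading

counter = 0
lock = threading.Lock()

def deposit(times: int) -> None:
    """Increment the shared counter; the lock serializes each update."""
    global counter
    for _ in range(times):
        with lock:        # without the lock, two threads could interleave the
            counter += 1  # read-increment-write sequence and lose updates

threads = [threading.Thread(target=deposit, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000 -- deterministic only because of the lock
```

Removing the `with lock:` line reintroduces exactly the kind of race the section describes; a non-blocking alternative would replace the lock with an atomic compare-and-swap loop, at the cost of more intricate code.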
References
• Andrew S. Tanenbaum, Albert S. Woodhull (2006): Operating Systems Design and Implementation, 3rd Edition, Prentice Hall, ISBN 0-13-142938-8
• Silberschatz, Avi; Galvin, Peter; Gagne, Greg (2008). Operating System Concepts, 8th edition. John Wiley & Sons. ISBN 0-470-12872-0.
Article Sources and Contributors
Article Sources and Contributors http://en.wikipedia.org/w/index.php?oldid=518806445 dex.php?oldid=518806445 Contributors: *drew, 05winsjp, 10285658sdsaa, 10metreh, 110808028 amol, 16@r, 206.31.111.xxx, 25or6to4, Database Source: http://en.wikipedia.org/w/in 28421u2232nfenfcenc, 28nebraska, 2D, 4twenty42o, 65.10.163.xxx, APH, Aaron Brenneman, Abhikumar1995, Addihockey10, Aditya gopal3, Admfirepanther, Adrian J. Hunter, Aepanico, Afluegel, Ahodgkinson, Ahoerstemeier, Ahy1, Aitias, Aj.robin, Akamad, Al Wiseman, Alain Amiouni, Alansohn, Alasdair, Ale jrb, Allan McInnes, Allecher, Alpha Quadrant (alt), Alphax, Alzpp, Amaraiel, Amd628, Anders Torlind, Andonic, Andre Engels, Andrewferrier, AndriuZ, Angela, Anikingos, AnjaliSinha, AnmaFinotera, Ann Stouter, AnonUser, Anonymous Dissident, Antandrus, Antrax, Apparition11, Arbitrarily0, Arcann, Argon233, Arjun01, Armen1304, ArnoLagrange, Arthena, Arthur Rubin, Arved, ArwinJ, Asyndeton, AtheWeatherman, Atkinsdc, AutumnSnow, Avenged Eightfold, AwamerT, Ayecee, AzaToth, Baa, Babbling.Brook, Barneca, Bbatsell, Bblank, Bcartolo, Bcontins, Beeblebrox, Beetstra, Beland, Ben Ben, Ben-Zin, Benni39, Bentogoa, Bernd in Japan, Beta M, Betterusername, Bharath357, Bjcubsfan, Bkhouser, Blanchardb, BlindEagle, Blood Red Sandman, BluCreator, Bluemask, Bluerocket, BobStepno, Bobblewik, Bogdangiusca, Bogey97, Boing! 
said Zebedee, Bongwarrior, Bowlderizer, Branzman, Brick Thrower, BrokenSphere, BryanG, Btilm, Bubba hotep, Burner0718, Buzzimu, Bwhynot14, C12H22O11, CIreland, COMPFUNK2, CableCat, Calabe1992, Call me Bubba, Callanecc, Calliopejen1, Caltas, Calutuigor, Cambalachero, Cambapp, Cammo33, Camw, Can't sleep, clown will eat me, CanisRufus, Canterbury Tail, Cantras, Capricorn42, Captain-n00dle, Captain-tucker, Carbonite, CardinalDan, Caster23, CasualVisitor, Cavanagh, Cenarium, CesarB, Cevalsi, Ceyjan, Chaojoker, Chester Markel, Childzy, Chirpy, Chocolateboy, ChorizoLasagna, Chrax, Chris 73, Chris G, Chrislk02, Christophe.billiottet, Chriswiki, Chtuw, Chuckhoffmann, Chuunen Baka, Clarince63, Clark89, Click23, Closedmouth, Colindolly, Colonies Chris, Cometstyles, Comm ander Keane, Comps, Constructive, Conversion script, Courcelles, Cpereyra, Cpl Syx, Cpuwhiz11, Craftyminion, Craig Stuntz, Crashdoom, Credema, Crucis, Cryptic, Culverin, Cyan, Cyan Gardevoir, Cybercobra, Cyberjoac, CynicalMe, D. Recorder, DARTH SIDIOUS 2, DEddy, DFS454, DJ Clayworth, DVD R W, DVdm, DamnRandall, Dan100, Dancayta, Dancter, Danhash, Daniel.Cardenas, DanielCD, Danieljamesscott, Danim, Dart88, Darth Mike, Darth Panda, Darthvader023, Davewild, David Fuchs, David0811, Dbates1999, Dbfirs, DePiep, Dead3y3, DeadEyeArrow, DeadlyAssassin, Deathlasersonline, Decrease789, DeirdreGerhardt, Denisarona, DerBorg, DerHexer, Deville, Dgw, Diamondland, DigitalEnthusiast, Discospinster, Djordjes, Djsasso, Dkastner, Doc glasgow, Doddsy1993, Dogposter, Donama, Doniago, DougBarry, Dougofborg, Doulos Christos, Dreadstar, Dreamyshade, Drivenapart, Drumroll99, Duyanfang, Dwolt, Dysepsion, E23, Eagleal, Earlypsychosis, EarthPerson, EastTN, Echartre, Edgar181, Edgarde, Edivorce, Edward, Eeekster, Ejrrjs, ElKevbo, Elwikipedista, Epbr123, Era7bd, Eric Bekins, Eric Burnett, Ericlaw02, Erikrj, Escape Orbit, Etxrge, EugeneZelenko, Everyking, Evildeathmath, Excirial, Exor674, Explicit, Ezeu, FFGeyer, Fang Aili, FatalError, 
Favonian, Feedmecereal, FetchcommsAWB, Feydey, Fieldday-sunday, Filx, Finlay McWalter, Flewis, Flubeca, Fluffernutter, FlyingToaster, Fooker69, Fortdj33, Foxfax555, Frankman, Franky21, Franl, Fratrep, Frsparrow, Fubar Obfusco, Furrykef, Fuzzie, Fæ, G12kid, GDonato, GHe, GLaDOS, Gadfium, Gail, Galzigler, Garyzx, Giftlite, Ginsengbomb, Girl2k, Gishac, Glacialfox, GlenPeterson, GnuDoyng, Gogo Dodo, GoingBatty, Gonfus, Gozzy345, Graeme Bartlett, GraemeL, Graham87, GrayFullbuster, GregWPhoto, Gregfitzy, GregorB, Grim23, Grsmca, Grstain, Gsallis, Gscshoyru, Gwizard, Gzkn, Haakon, Hadal, HaeB, Hamtechperson, Hankhuck, HappyInGeneral, Harej, Hasek is the best, HeliXx, Helixblue, Helloher, HexaChord, Heymid, Heysim0n, Hotstaff, Hugsandy, Huntthetroll, Hurricane111, HybridBoy, Hydrogen Iodide, IElonex!, Iced Kola, Igoldste, Imfargo, Imnotminkus, Imran, Informatwr, Insineratehymn, Inspector 34, Ironman5247, Isfisk, Itafran2010, ItsZippy, Ixfd64, J.delanoy, JCLately, JForget, JLaTondre, JMRyan, Ja 62, JaGa, Jab843, Jackacon, JamesBWatson, Jan1nad, Jasimab, Jasper Deng, Jauerback, Javert, Jaxl, Jay, Jb-adder, Jclemens, Jdlambert, JeffTan, JeffreyYasskin, Jennavecia, JephapE, Jerome Charles Potts, Jhfireboy, Jk2q3jrklse, Jmanigold, Jni, JoanneB, Joel7687, John Vandenberg, John of Re ading, Johnuniq, Jojalozzo, Jonathan Webley, JonathanFreed, Jondel, Jonearles, Jonwynne, Joshnpowell, Joshwa1234567890, Journalist, Jschnur, Jstaniek, Jvhertum, Jwoodger, Jwy, KILLERKEA23, Kanonkas, Karlhahn, Karmaf ist, Katalaveno, Keenan Pepper, Keilana, Kekekecakes, Kellyk99, Kenny sh, Kevins, KeyStroke, Khoikhoi, Kiand, Kimberly ayoma, Kimera Kat, King of Hearts, Kingius, Kingpin13, KingsleyIdehen, Kivar2, Kkailas, Knbanker, Koavf, Komal.Ar, KoshVorlon, KotetsuKat, Kozmando, Kraftlos, Krashlandon, Kslays, Kubigula, Kukini, Kunaldeo, Kungfuadam, Kuru, Kushal one, Kvasilev, Kwiki, KyraVixen, Kzzl, L Kensington, LC, LaosLos, Larsinio, Latka, Leaderofearth, Leandrod, LeaveSleaves, LeeHam2007, 
Leonnicholls07, LessHeard vanU, Levin, Levin Carsten, Lexo, Lflores92201, Lfstevens, Lguzenda, Lights, LindaEllen, Lingliu07, Lingwitt, Linkspamremover, LittleOldMe, LittleWink, Llyntegid, Lod, Logan, Lotje, Lovefamosos, Lovelac7, Lowellian, Lradrama, Lsschwar, LuK3, Lucyin, Luizfsc, Luna Santin, M.badnjki, M4gnum0n, MBisanz, MECiAf., MER-C, Machdohvah, Madhava 1947, Majorly, Malvikiran, Mandarax, Manikandan 2030, Mannafredo, Marasmusine, Mark Renier, MarkSutton, MartinSpamer, Materialscientist, Mathewforyou, Mato, Matthewrbowker, Matticus78, Mattisse, Maty18, Maury Markowitz, Max Naylor, Maxferrario, Maxime.Debosschere, Maxmarengo, Mayur, Mazca, Mboverload, McGeddon, Mdd, Meaghan, Megatronium, Melucky2getu, Mentifisto, Mephistophelian, Mercy11, Methnor, Mhkay, Michael Hardy, Michael Slone, Microchip08, Mike Dillon, Mike Rosoft, Mike Schwartz, MikeSy, Mikeblas, Mikey180791, MilerWhite, Millermk, Milo99, Mindmatrix, Minimac, Minna Sora no Shita, Mkeranat, Moreschi, Morwen, MrNoblet, Mrt3366, Mspraveen, Mugaliens, Mukherjeeassociates, Mulad, Mumonkan, Mushroom, Mxn, N1RK4UDSK714, N25696, NAHID, NSR, Nafc lark, Namlemez, Nanshu, NathanBeach, NawlinWiki, NetManage, Netizen, NewEnglandYankee, Ngpd, Nick, Nicoosuna, Niteowlneils, Nk, Noah Salzm an, Noctibus, Noldoaran, Northamerica1000, Northernhenge, Nsaa, Nurg, Oca asi, Oda Mari, Odavy, Ohka-, Oho1, Oli Filth, Olinga, OllieFury, OnePt618, OrgasGirl, Oroso, OverlordQ, PJM, PaePae, Pak21, Parzi, PatrikR, Paul August, Paul Drye, Paul E Ester, Paul Foxworthy, Paulinho28, Pcb21, Pdcook, Peashy, Pee Tern, PeeAeMKay, Pengo, PeregrineAY, Peruvianllama, Peter Karlsen, Peter.C, Pgk, Phantomsteve, Pharaoh of the Wizards, Phearlez, PhilKnight, Philip Trueman, Philippe, Phoenix-wiki, Piano non troppo, Pillefj, Pingveno, Pinkadelica, Pjoef, Plrk, Pnm, Poeloq, Pol098, Poor Yorick, Poterxu, Praba tuty, Prari, Prashanthns, PrePress, Preet91119, Proofreader77, Prunesqualer, Psaajid, Psb777, Puchiko, Pvjohnson, Pyfan, Quadell, 
Qwertykris, Qwyrxian, Qxz, R'n'B, R3miixasim, RIH-V, RadioFan, RadioKirk, Railgun, Rakeki, Ravinjit, RayGates, RayMetz100, RazorXX8, Rdsmith4, Reaper Eternal, Refactored, Regancy42, Reidh21234, Rettetast, RexNL, Rhobite, Rich Farmbrough, Ricky81682, Ringbang, Rishu arora11, Riverraisin, Rj Haseeb, Rjwilmsi, Robert Merkel, Robert Skyhawk, Robocoder, Robth, Romanm, Rotanagol, Rothwellisretarded, Roux, Rursus, Ruud Koot, Ryager, Ryanslater, Ryanslater2, Ryulong, S.K., SAE1962, SDSWIKI, SJP, SWAdair, Sae1962, Saiken79, Salvio giuliano, Sam Barsoom, Sam Korn, SamJohnston, Samir, Sandman, Sango123, Sarchand, SarekOfVulcan, SatuSuro, Saturdayswiki, Savh, ScMeGr, Sceptre, Seanust 1, Seaphoto, SebastianHelm, Serketan, Several Pending, Sewebster, Shadowjams, Shadowseas, Sheeana, Shipmaster, Shirulashem, Siebren, Silly rabbit, S imeon, Simetrical, SimonMorgan, Sintaku, Sir Nicholas de Mimsy-Porpington, Sissi's bd, Siteobserver, Sjakkalle, Sjc, Skybrian, Slakr, Sleske, SnoFox, Somchai1029, Sonett72, Sonia, Soosed, Soumark, SpK, Spartaz, Spazure, Spdegabrielle, SpikeTorontoRCP, SpuriousQ, Srdju001, Srikeit, Ssd, StaticVision, Stdazi, Stephen Gilbert, Stevertigo, Stifle, Stirling Newberry, Storm Rider, Strongsauce, Stuhacking, Sucker666, Sudarevic, Suffusion of Yellow, SuperHamster, Supertouch, Supreme Deliciousness, Supten, SwisterTwister, SymlynX, Sythy2, Tabletop, Tablizer, TakuyaMurata, TalkyLemon, Tasc, Tazmaniacs, Th1rt3en, Thatperson, The Anome, The Thing That Should Not Be, The Wiki Octopus, The wub, TheGrimReaper NS, TheNewPhobia, Thedjatclubrock, Thehulkmonster, Theimmaculatechemist, Theodolite, Theory of deadman, Thingg, Think outside the box, Thinktdub, Thomasryno, ThumbFinger, Thumperward, Tictacsir, Tide rolls, Tim Q. 
Wells, TimBentley, TittoAssini, Tobias Bergemann, Tomatronster, Tonydent, Toquinha, Tpbradbury, Treekids, TrentonLipscomb, Trevor MacInnis, Triwbe, Troels Arvin, Trusilver, Tualha, Tudorol, Tuhl, Turlo Lomon, Turnstep, Twebby, Twelvethirteen, TwistOfCain, TwoTwoHello, Twsx, TyA, UberScienceNerd, Ubiq, Ugebgroup8, Ulric1313, Ultraexactzz, Uncle Dick, Unyoyega, VNeumann, Vary, Velella, Versus22, Veryprettyfish, Vespristiano, Victor fa lk, Vikreykja, Vincent Liu, Vipinhari, Vishnava, Visor, Vivacewwxu, VoxLuna, Vrenator, W m ccall, W163, WOSlinker, Waggers, Waveguy, Wavelength, Wee toddid, Welsh, Werdna, Widefox, Wifione, Wik, Wiki alf, Wiki tiki tr, Wikidrone, Wikipelli, WikiuserNI, Willking1979, Wimt, Windsok, Winterst, Wipe, Wmahan, Woohookitty, Woseph, Writeread82, Wulfila, Wwmbes, Wya 7890, Xhelllox, Xin0427, Yossman007, ZenerV, Zhenqinli, Zhou Yu, Zipircik, Zippanova, ZooPro, Zro, Zundark, Zzuuzz, Σ, М И Ф, ﮐ ﯿ, 2838 anonymous edits http://en.wikipedia.org/w/index.php?oldid=518701197 p?oldid=518701197 Contributors: 16@r, AJackl, Ahy1, Alksentrs, Alleborgo, Amitchaudhary, Anarchitect, Andrejj, Ann Entity –relationship model Source: http://en.wikipedia.org/w/index.ph Stouter, Anshuk, Ardonik, Armoreno10, Arronax50, Arunloboforever, Asavari24, Axeltroike, Ayush Sama ntroy, Bencoder, Benthompson, BigSmoke, Bignose, Bilby, Bobo192, Bolo1729, Borgx, BozMo, Brick Thrower, Bsodmike, CDV, Caltas, CapitalR, Causa sui, Cedric.claidiere, Centrx, Charlesbc, ChrisEich, Claygate, Cohesion, Colin Angus Mackay, Corb3126, Craigwb, DStoykov, Danhuby, Danim, Datapolitical, Dburbank, DeadEyeArrow, Deragon, Diligentdogs, Diptanshu.D, Discospinster, Diza, Doctorambient, Dodo bird, Doscmaker, Doug Bell, Dream of Nyx, Duke Ganote, Dumelow, ERfan111, EWikist, EagleFan, Edward, Egrabczewski, Eirikr, Ekillaby, Elsendero, Elwikipedista, Erics, Fan-1967, Fieldday-sunday, Finell, Firsfron, Forage, Ftiercel, Fuper, Gebe, Godrickwok, Gonwin, GraemeL, GreyCat, Gronky, Gsmgm, HTS3000, 
Heightwatcher, HerrSchnapps, Hex, HexaChord, HisSpaceResearch, Hu, Hu12, Hugsandy, ITBlair, Ian channing, Informatwr, Ixfd64, J.delanoy, JD554, JLaTondre, Ja 62, Jan Hidders, Jandalhandler, Jason Quinn, Jay, Jay Gatsby, Jeff3000, JerkerNyberg, Johncartmell, Johncurrier, Jolta, Jonathan Webley, JorisvS, Jwilkinson, Kalotus, Karlcuya, Karnesky, Kate, Keilana, Kemorgan, KennethJ, KeyStroke, Khalid hassani, Khazar2, Kirklm, Klausness, Kliford, Koavf, Kocio, Kop, Kubigula, Kushalbiswas777, Lambmj, Lancew, Legaia, Leggattst, Libcub, Ligulem, LockeShocke, Lprichar, MBisanz, Malcolmxl5, Mann jess, Manning Bartlett, Marekich, Mark Arsten, Mark Renier, Master of Puppets, Materialscientist, Matthew 1130, Mcowpland, Mdd, Meegs, MelbourneStar, Mhkay, Michael Hardy, Michael miceli, Miriup, Misteror, Mjb, Motyka, Msnicki, NBthee, NSLE, Nareshyalagala, Nickcarr, Nishadha, Nn123645, Northernhenge, Northumbrian, Nt777, Octahedron80, Ohnoitsjamie, OsamaBinLogin, Ottomachin, Patrick Deelen, Patriotic dissent, Paul Bassin, Paulgeorgebassin, Paullewis4372, Pde, Perrydegner, Perter Nokia, Petr Dlouhý, Plhale667, Pm master, Pmmulligan, Pooryorick, Puffin, Raffen, Ralph Corderoy, Rambaldi47, Raven4x4x, Ravinjit, Rejeantremblay, René, Riccardo.fabris, RichardVeryard, Roadmr, RobertG, RobertRoggenbuck, Robsta, RockMFR, Ronjouch, Ronz, Rosattin, Rp, Rsrikanth05, Rvstephenson, Sae1962, SarekOfVulcan, Saturday, Saurabha5, Saustin1, Sbachmn, ScottHJohnson, Sean Whitton, Slovakjoe, SoftwareDeveloper, Softwiki, Sstrauch, Stanislav Nowak, Stephenb, SteveChervitzTrutane, SteveLichtman, Stuque, Stwalkerster, Sulai, S upten, Svirf, THEN WHO WAS PHONE?, Tdrewry, Texture, The undertow, TheMattrix, Thejoshwolfe, Threazy, Tide rolls, Toyota prius 2, Tumble, Turnstep, TwoTwoHello, TypoBoy, Udit90, Udo Altmann, UnitedStatesian, Vespristiano, Vishnava, Vrenator, Wdhoke, Whadda, Willking1979, Wizgha, Wsandiego, Wwmbes, Xilaworp, Yahya Abdal-Aziz, Zanaq, Zhenyu, Zondor, ZweiOhren, ^demon, 540 anonymous edits 
Database design Source: http://en.wikipedia.org/w/ind http://en.wikipedia.org/w/index.php?oldid=507370466 ex.php?oldid=507370466 Contributors: A. B., Agrumer, Allecher, Allen3, Alpha Quadrant, AnmaFinotera, Anna Lincoln, ArchCarrier,
Arthur Rubin, Binksternet, Bobo192, Bruce s r, Capricorn42, Codeczero, Cschutijser, Cyrius, DVdm, Danim, David Delony, DavidCHay, Dethomas, Disnarda, Dr. Zombieman, Dr1819, Dreikin, Ejiskolo, ElKevbo, Elwikipedista, Falcon8765, Focus22, Fraggle81, Friendlydata, Giteshtrivedi, Grsmca, Handcover, Hrundi Bakshi, Jason Quinn, Jon Awbrey, Jwoodger, KeyStroke, Kku, LAX, Lguzenda, LilHelpa, Lingliu07, M4gnum0n, Magioladitis, Maimai009, Mark Arsten, Mark Renier, Matt Beard, Mattbrundage, Mdd, Mentifisto, Mikimik, Miquonranger03, Mixer, Myasuda, Neerajadsul, Ohnoitsjamie, Oxguy3, Puchiko, RMFan1, Ravenmichael, Ra vinjit, Rbellika, Rednblu, RichardVeryard, Sanskritkanji, Seaphoto, Shamanx, SkyWalker, Smokris, Soap, Sobloku, Tarotcards, The Letter J, Tkgd2007, Troels Arvin, Turgan, TyA, White Trillium, Xcentaur, Xhienne, Zzuuzz, 160 anonymous edits http://en.wikipedia.org/w/index.php?oldid=518158542 dex.php?oldid=518158542 Contributors: *Kat*, 01001, 127, 217.162.105.xxx, 64.192.12.xxx, Abdull, Abolen, Adamcscott, Adamrush, Relational database Source: http://en.wikipedia.org/w/in Admrboltz, Agateller, Ahoerstemeier, Alain Amiouni, Alansohn, Andre Engels, Angela, Anuja297, Appzter, Astheg, AutumnSnow, Banes, Beland, Beno1000, Bitnine, Bobo2000, Boothy443, Booyabazooka, Bpalitaa, Brick Thrower, Bsdlogical, CALR, Calmer Waters, Calvernaz, Chris.Giles, Chrislk02, Conversion script, Craig Stuntz, Crosbiesmith, Cww, DARTH SIDIOUS 2, DVdm, Dandv, Danim, Dannydaman9, Darth Mike, Dave6, David Delony, Dfeuer, DinosaursLoveExistence, Dionyziz, Dirk P Broer, Drgs100, Dschwart11, Dumbledad, EagleFan, Eik Corell, El
101
Article Sources and Contributors C, ElKevbo, Emperorbma, Fabrictramp(public), FatalError, FayssalF, Ferkelparade, Fidimayor, Fieldday-sunday, Filiocht, Findling67, FlyingDoctor, Francs2000, Fratrep, Fred Bradstadt, Freediving-beava, Frigotoni, Fuddle, Gaur1982, Gerbrant, Giftlite, Glane23, GoingBatty, Graham87, HJ Mitchell, Hapsiainen, Harold f, Herostratus, Hmrox, Hp-NJITWILL, I do not exist, ILikeBeer, IRP, Ideogram, Iohannes Animosus, J.delanoy, JCLately, JLaTondre, JaGa, Jacobrothstein, Jan Hidders, Jitendraapi, Jncraton, John Vandenberg, Johnuniq, Jon Awbrey, Jwoodger, Jóna Þórunn, K faiad, KingsleyIdehen, Klausness, KnowledgeOfSelf, Kostmo, Kraron, Kristensson, Krogstadt, Kuru, Lalapicklem, Larsinio, Leandrod, Lfstevens, Linlasj, Logthis, Looxix, Luna Santin, MC MasterChef, MER-C, Mac, MainlyDigGrammar, Mandarax, Manop, Mark Renier, Mark T, Mav, Mblumber, Mckaysalisbury, Merlion444, Metroking, Michael Hardy, Michael Hodgson, Mikeblas, MilerWhite, Mindmatrix, Msikma, NHRHS2010, Nannahara, Nanshu, Nisantha04, Niteowlneils, Nocohen, Ns.code, Odie5533, Odysseus1479, Olinga, Oursinees324, OwenBlacker, Oxymoron83, Pablo323, Pdcook, Pearle, Philcha, Pietdesomere, Pinkadelica, Psb777, Psychcf, Quitchy, Rasmus Faber, RayGates, Rchertzy, Rfl, Riggz0rer, Romanm, Rrburke, SandyGeorgia, Scouser0phil, Sequologist, Sfe1, Sgiovannini, Shinju, Sir Nicholas de Mimsy-Porpington, Sir Vicious, Slightlyusefulcat, Smjg, Solipsist, Sonett72, Specialbrad, Spiritia, Spudtater, SqlPac, Stare at the sun, SteinbDJ, Steve Casburn, Supten, TJRC, Tcncv, Tcnuk, Ted Longstaffe, Teles, TheDJ, Thingg, Tobias Bergemann, Todd Vredevoogd, Tom harrison, Triddle, Triwbe, Troels Arvin, Ttguy, Turnstep, Utcursch, Vespristiano, Wavelength, Wesley, Wolfraem, Wolfsbane2k, Xiong, Xphile2868, Zahid Abdassabur, Zipircik, 536 anonymous edits http://en.wikipedia.org/w/index.php?oldid=515822104 ndex.php?oldid=515822104 Contributors: 130.94.122.xxx, 62.114.199.xxx, Adamcscott, Altenmann, AndrewWTaylor, 
AndrewWarden, Relational model Source: http://en.wikipedia.org/w/i AndyKali, AnonMoos, Arthur Rubin, Ashrust, Audiodude, AutumnSnow, Aytharn, BD2412, BMF81, Babbling.Brook, Bblfish, Be land, Bento00, Bobo192, BonsaiViking, Brick Thrower, Brion VIBBER, Budloveall, CBM, Cadr, Cathy Linton, Cconnett, ChaosControl, Chessphoon, Chrisahn, Chrissi, Conti, Conversion script, Cra ig Stuntz, Crashoffer12345, Crosbiesmith, DARTH SIDIOUS 2, Damian Yerrick, Danim, DaveVoorhis, David Eppstein, Derek Ross, Dreadstar, Drunken Pirate, EagleFan, Ehn, Elwikipedista, Emx, Enric Naval, Erik Garrison, Evildeathmath, Furrykef, Fyrael, Gadfium, Gary D, Gary King, Giftlite, Grassnbread, Greenrd, Gregbard, GregorB, Gurch, Hans Adler, Helvetius, Hyacinth, Ideogram, Iluvitar, Immunize, Irishguy, Ixfd64, J04n, JCLately, Jadedcrypto, Jalesh, Jan Hidders, Jarble, Jbolden1517, Jeff3000, JesseW, Jklowden, Jmabel, Joelm, Jon Awbrey, Jpbowen, Kassie, Kendrick Hang, Ketiltrout, Khalid hassani, Kimchi.sg, Kjkolb, Klausness, Korrawit, Lahiru k, Larsinio, Leandrod, Leifbk, Lethe, Lfstevens, MER-C, Madjestic, Maokart444, MarXidad, Marc Venot, Mark Renier, Materialscientist, Matt Deres, Mckaysalisbury, Mdd, Metaeducation, Mets501, Mhkay, Michael Hardy, MilerWhite, Mindmatrix, Moogwrench, Muntfish, NSash, Nad, Nascar1996, Neilc, Niteowlneils, NonDucor, Nsd, Ocaasi, Ocrow, Ozten, Pablothegreat85, Paul Foxworthy, Pitix, Pmsyyz, Pol098, Quazak Zouski, R'n'B, Razorbliss, Rbrwr, Reedy, Reyk, RonaldKunenborg, Ronhjones, Rp, Rursus, Ruud Koot, S.K., Sdorrance, Seraphim, SeventyThree, Sietse Snel, Simetrical, SimonP, Sonett72, Spartan-James, Spellcast, SpuriousQ, SqlPac, SteinbDJ, Stevertigo, THEN WHO WAS PHONE?, Tatrgel, Teknic, The-G-Unit-Boss, Tjic, Tobias Bergemann, Tohobbes, Tony1, Toreau, Troels Arvin, Tualha, Turnstep, Vikreykja, Welsh, Wgsimon, Windharp, Winhunter, Woohookitty, Zklink, 285 anonymous edits http://en.wikipedia.org/w/index.php?oldid=512222707 .php?oldid=512222707 Contributors: AHMartin, 
Adrian.benko, Alansohn, Andycjp, Anonymous Dissident, Ardric47, Arthur Rubin, Binary relation Source: http://en.wikipedia.org/w/index AxelBoldt, BYS2, Boute, Brambleclawx, Bryan Derksen, CBM, CRGreathouse, Charles Matthews, Chocolateboy, Classicalecon, Constructive editor, Conversion script, Cybercobra, Dallashan, DavidCBryant, Dbtfz, Djjrjr, Dmcq, Dominus, Dpv, Dreftymac, DuaneLAnderson, Egriffin, Eje211, Elwikipedista, EmilJ, Ernsts, Floquenbeam, Fre sheneesz, Fropuff, Geero, Giftlite, Hahahaha4, Hairy Dude, Hans Adler, Happy-melon, Henry Delforn (old), Ht686rg90, IdealOmniscience, Isheden, Isomorphic, Ixfd64, Jan Hidders, Jellystones, Jitse Niesen, Jlr, Jon Awbrey, Jorge Stolfi, Joriki, Keenan Pepper, Koertefa, Koffieyahoo, Labus, LajujKej, Lambiam, Lerutit, Linas, MFH, Mark Renier, MathMartin, Mets501, Mfwitten, MiNombreDeGuerra, Michael Hardy, Mjb, NearSetAccount, Netrapt, Obradovic Goran, Oleg Alexandrov, OnePt618, Owlgorithm, Patrick, Paul August, PaulTanenbaum, Pcap, Penumbra2000, Pigcatian, Plmday, Policron, Quarl, Quux0r, Randall Holmes, Reindra, ResearchRave, Rlupsa, Roadrunner, Robin S, Royalguard11, SEIBasaurus, Sftd, Shanes, Slipstream, Snigbrook, SpaceFlight89, TakuyaMurata, Tarquin, Tide rolls, Timwi, Tobias Bergemann, Tomo, Trovatore, Turms, Vanish2, Vipinhari, WillowW, Wshun, Xantharius, Xavic69, Yrodro, Yuval m adar, Zundark, 116 anonymous edits Database normalization Source: http://en.wikipedia.org/w/index.php?oldid=518100250 http://en.wikipedia.org/w/index.php?oldid=518100250 Contributors: 1exec1, A3 nm, ARPIT SRIVASTAV, Ahoerstemeier, Akamad, Akhristov, Alai,
Alasdair, Alest, Alpha 4615, Amr40, AndrewWTaylor, Antonielly, Anwar saadat, Apapadop, Arakunem, Archer3, Arcturus, Arthena, Ascend, AstroWiki, AubreyEllenShomo, Autocracy, AutumnSnow, Azhar600-1, BMF81, Babbling.Brook, Bernard Fra nçois, Bewildebeast, Billben74, Billpennock, BillyPreset, Black Eagle, Blade44, Blakewest, Blanchardb, Bloodshedder, Blowdart, BlueNovember, BlueWanderer, Bongwarrior, Boson, Bovineone, BradBeattie, Brick Thrower, BrokenSegue, Bruceshining, Bschmidt, Bugsbunny1611, BuzCo, CLW, Callavinash1, Can't sleep, clown will eat me, Chairboy, Chrislk02, Citral, Cl22333, CodeNaked, Conversion script, Creature, Crenner, Crosbiesmith, DARTH SIDIOUS 2, Damian Yerrick, DanMS, Dancraggs, Danim, Danlev, Datasmid, David Colbourn, DavidConrad, DavidHOzAu, Davidhorman, Dean001, Decrease789, Demosta, Denisarona, DerHexer, Dfass, Dflock, Discospinster, Doc vogt, DocRuby, Docu, Don Hammond, Doud101, Dreftymac, Drowne, Dthomsen8, Duke Ganote, Ed Poor, Edward Z. Yang, Eghanvat, Elcool83, Elwikipedista, EmmetCaulfield, Emperorbma, Encognito, Enric Naval, Epepke, Eric Burnett, Escape Orbit, Ethan, Evilyuffie, Falcon8765, Farquaadhnchmn, Fathergod, FauxFaux, F ieldday-sunday, Fireman biff, Flewellyn, Fluffernutter, Fmjohnson, Fraggle81, Fred Bradstadt, Furrykef, Gadfium, GateKeeper, Gimboid13, Gk5885, Gogo Dodo, Gottabekd, Gregbard, GregorB, Groganus, Gustavb, Guybrush, Guðsþegn, Gwr2004, Hadal, Hairy Dude, Hanifbbz, Hapsiainen, HbJ, Hbf, Heracles31, HiDrNick, History2007, Hoo man, Hu12, Hydrogen Iodide, Ianblanes, IceUnshattered, Imre Fabian, Inquam, Intgr, Jadvinia, Jakew, JamesBWatson, Jamesday, Jamesjusty, Jan Hidders, Japo, Jason Quinn, Javert16, Jdlambert, Jgro, Jjjjjjjjjj, Jklin, Joness59, Joseph Dwayne, Jpatokal, Jpo, Justin W Smith, KAtremer, KathrynLybarger, Keane2007, Keegan, KevinOwen, KeyStroke, Keyvez, Kgwikipedian, Kingpin13, Klausness, Kushalbiswas777, L Kensington, L'Aquatique, LOL, Larsinio, Lawrence Cohen, Leandrod, Lee J Haywood, Legless the oaf, 
Leleutd, Leotohill, Lerdthenerd, Les boys, Lethe, Libcub, Lifeweaver, Linhvn88, LittleOldMe, Longhair, Lssilva, Lujianxiong, Lulu of the Lotus-Eaters, Lumingz, Luna Santin, M4gnum0n, MER-C, Magantygk, Manavkataria, Mark Renier, Marknew, MarownIOM, MartinHarper, Masterstupid, Materialscientist, Matmota, Matthew 1130, Mckaysalisbury, Metaeducation, Michael Hardy, Michalis Famelis, Michealt, Microtony, Mike Rosoft, Mikeblas, Mikeo, Mindmatrix, Miss Madeline, Mjhorrell, Mo0, Modeha, Mooredc, Mpd, Mr Stephen, MrDarcy, Nabav, NawlinWiki, Nick1nildram, NickCT, NoahWolfe, Noisy, Northamerica1000, Nsaa, NubKnacker, Obradovic Goran, Ocrow, OliverMay, Opes, Oxymoron83, Pagh, Peachey88, Pearle, Perfectblue97, Pete142, Pharaoh of the Wizards, Phil Boswell, Philip Trueman, Pie Man 360, Plastic rat, Polluxian, Prakicov, ProveIt, Purplepiano, Quarl, RB972, RadioFan, Railgun, Rathgemz, Rdsmith4, Rdummarf, Reedy, Regancy42, Reinyday, Remy B, Reofi, RichF, Rjwilmsi, Robert McClenon, Robomaeyhem, Rockcool19, Rodasmith, Romke, Ronfagin, Rp, Rumplefish, Ruud Koot, Ryulong, Sam Hocevar, Sasha.sheinberg, SchuminWeb, ScottJ, Scwlong, Seaphoto, Sfnhltb, Shadowjams, Shakinglord, Shawn wiki, Shreyasjoshis, Shyamal, Silpi, Simeon, Simetrical, Sixpence, Skritek, Smjg, Smurfix, Snezzy, Snigbrook, Socialservice, Sonett72, Soulpatch, Soumyasch, Spacesoon, Sstrader, Stannered, Starwiz, Stephen e nelson, Stephenb, SteveHL, Stifle, Stolkin, Sue Rangell, Superjaws, Sydneyw, Sylvain Mielot, Szathmar, Taw, Tbhotch, Tcamacho, Tedickey, Teknic, Tgantos, The Thing That Should Not Be, The undertow, The1physicist, Tide rolls, Titofhr, Tobias Bergemann, Toddst1, Tom Lougheed, Tom Morris, Tommy2010, Toxicwaste288, Traxs7, Troels Arvin, Turnstep, Twinney12, Tyc20, Unforgettableid, Upholder, Utcursch, Vald, Valdor65, Velella, VinceBowdren, Vladsinger, Vodak, Voidxor, Waggers, Wakimakirolls, Wavelength, Wexcan, WikiPuppies, WikipedianYknOK, Wildheat, Wilfordbrimley, Wilsondavidc, Winterst, Wjhonson, Woohookitty, 
WookieInHeat, Xiong Chiamiov, Xiroth, Yong-Yeol Ahn, Zedla, Zeyn1, Zhenqinli, Zzuuzz, 石 庭 石 庭 豐 豐 , 1253 anonymous edits http://en.wikipedia.org/w/index.php?oldid=518836121 ex.php?oldid=518836121 Contributors: Aeonx, Alansohn, Alxndr, Ambuj.Saxena, Bernard Ladenthin, BillyPreset, Boson, Brianga, First normal form Source: http://en.wikipedia.org/w/ind Brick Thrower, Burner0718, Closedmouth, Crosbiesmith, Davidhorman, Dfass, Dreftymac, Eallik, Ebraminio, Eibcga, Foxyshadis, Funnyfarmofdoom, General Wesc, GermanX, GregorB, Gwernol, Hamidrizeh, Heathcliff, Hobsonlane, Isnow, Jacobolus, Jason Quinn, Jgzheng, John of Reading, Klilidiplomus, Kwetal, LarRan, Lordmwesh, M.r santosh kumar., Mahlon, Mark Renier, Mfpinhal, Montchav, Morra, Mystagogue, Nabav, NawlinWiki, ReformatMe, Vegpuff, VictorAnyakin, VinceBowdren, Whitmerj, 파핀, 168 anonymous edits http://en.wikipedia.org/w/index.php?oldid=518458517 ?oldid=518458517 Contributors: Ak786, Allens, Apugazh, Benjamin.Cramphorn, Bernard Ladenthin, Boson, Btilm, Second normal form Source: http://en.wikipedia.org/w/index.php Carlhoerberg, Chrislk02, Crosbiesmith, DARTH SIDIOUS 2, DVdm, Don Hammond, ESkog, Ebraminio, GermanX, Glane23, GregorB, Haffasoul, Ijliao, Jason Quinn, Javert16, JianzhouZhou, JimpsEd, Mark Renier, Materialscientist, Mordashov, Nabav, Sanchitideas, Shreyasjoshis, SqlPac, Uncle Dick, VinceBowdren, Whitmerj, 파핀, 87 anonymous edits http://en.wikipedia.org/w/index.php?oldid=517451053 .php?oldid=517451053 Contributors: Alvin-cs, Amalthea, Anabus, Arcturus, Azrich, Be rnard Ladenthin, Blahma, Boson, Bxn1358, Third normal form Source: http://en.wikipedia.org/w/index CapitalR, Centrx, Codeculturist, DVdm, Diego Moya, Don Hammond, Dorfl, Ebraminio, Edward Z. 
Yang, Electriccatfish2, Furrykef, Garde, GermanX, Gingerjoos, Gwen-chan, Ijliao, Jason Quinn, Jcsalterego, Jitse Niesen, Joseph Dwayne, Jswhitten, Kitkatbeard, Leasabp, Michalp, Michealt, MsHyde, Nabav, Natural Cut, Ollie, Pinethicket, Semaphorite, Shreyasjoshis, Sleske, Someusername222, THEN WHO WAS PHONE?, Thingg, Toyota prius 2, USConsLib, Unara, Vegpuff, VinceBowdren, Vlad2000Plus, Wavelength, Whitmerj, Wikimiro, Willking1979, Wyadbb, 94 anonymous edits Boyce –Codd normal form Source: http://en.wikipedia.org/w/ind http://en.wikipedia.org/w/index.php?oldid=500051408 ex.php?oldid=500051408 Contributors: Allens, Anugrah atreya, Athirubansm, Bernard Ladenthin, Boson, Briangregory2000,
Chitransh saxena, Christian75, CiudadanoGlobal, Ebraminio, Eggman64, Elizium23, Fctseng, Fieldday-sunday, Hairy Dude, Island Monkey, JForget, Jgzheng, JimpsEd, Leflyman, Mikeblas, Nabav, Nay Min Thu, NeerajKawathekar, Niddriesteve, Njsg, Obradovic Goran, Oxymoron83, P.kmetski, ProbePLayer, Quarl, Raztus, Simetrical, Smurfix, Solomon423, SqlPac, Su30, Torzsmokus, Twarther, Uzume, VinceBowdren, Whitepaw, Yachtsman1, ZenSaohu, 石 庭 石 庭 豐 豐 , 80 anonymous edits Fourth normal form Source: http://en.wikipedia.org/w/ind http://en.wikipedia.org/w/index.php?oldid=514805202 ex.php?oldid=514805202 Contributors: Akerans, Ashimjgec08, Bernard Ladenthin, Britannica, Ebraminio, Fetchcomms, Geeoharee,
GermanX, Jason Quinn, Jmabel, Mark Renier, Nabav, Northamerica1000, Patrick, Savh, Selfworm, VinceBowdren, Vjosullivan, WikHead, Winterst, 29 anonymous edits Fifth normal form Source: http://en.wikipedia.org/w/index.php http://en.wikipedia.org/w/index.php?oldid=513381881 ?oldid=513381881 Contributors: Andy M. Wang, Bernard Ladenthin, B rick Thrower, Cool Blue, Dsn.naruka, Dugo, Ebraminio,
FineganCJ, Flying Panda, GermanX, Igor Yalovecky, Jason Quinn, Libcub, MarcosWozniak, Mark Renier, Nabav, Quarl, RonaldKunenborg, Siryendor, SqlPac, Stamfest, Systemparadox, Tide rolls, TonyTheTiger, VinceBowdren, 40 anonymous edits Domain/key normal form Source: http://en.wikipedia.org/w/index.php http://en.wikipedia.org/w/index.php?oldid=508185658 ?oldid=508185658 Contributors: Bernard Ladenthin, Dcoetzee, Emurphy42, GermanX, Gigs, Gregbard, Jmabel, Mark
Renier, Mugaliens, Nabav, Ott2, RonaldKunenborg, SqlPac, VinceBowdren, Whitepaw, 19 anonymous edits Sixth normal form Source: http://en.wikipedia.org/w/i http://en.wikipedia.org/w/index.php?oldid=511111497 ndex.php?oldid=511111497 Contributors: Boson, DePiep, Emurphy42, Esran, Favonian, GregorB, Jason Quinn, Mark Renier, Nabav,
Quarl, RBarryYoung, Roenbaeck, RonaldKunenborg, Rp, 14 anonymous edits
102
Article Sources and Contributors Relation (database) Source: http://en.wikipedia.org/w/i http://en.wikipedia.org/w/index.php?oldid=517036792 ndex.php?oldid=517036792 Contributors: Asocall, AutumnSnow, Crosbiesmith, Ed Poor, Fratrep, Georgeryp, Icairns, Lfstevens,
MaD70, Mark Renier, MusiKk, NickCT, Rob Bednark, Subversive.sound, Universalss, 11 anonymous edits Functional dependency Source: http://en.wikipedia.org/w/index.ph http://en.wikipedia.org/w/index.php?oldid=518356288 p?oldid=518356288 Contributors: 1ForTheMoney, Armagedescu, AutumnSnow, BartVB, Cdrdata, Citral, Crosbiesmith,
D6bmg, EagleFan, Ejrh, Eric22, Favonian, Fragment, Ganesh121292, GeorgeBills, Graham87, Grassnbread, Hut 8.5, Igor Yalovecky, Iridiumcao, J. M., Jan Hidders, Jleedev, Jwp, KelvSYC, KeyStroke, Kushalbiswas777, LOL, MaBoehm, Mark Renier, MartinWaite, Matthiaspaul, MaxDel, Michael Hardy, Michaelcomella, Mroberts297, Notheruser, Odsh, Pdfpdf, Porges, Poromenos, Qetuth, Ruud Koot, Slavy13, Spamduck, Thebulbs, VinceBowdren, Waitak, Wilson44691, 90 anonymous edits http://en.wikipedia.org/w/index.php?oldid=518358625 ex.php?oldid=518358625 Contributors: AutumnSnow, Cdrdata, Citral, Dweller, Emma li mk, Gromuald, Justin W Smith, Kilopi, Multivalued dependency Source: http://en.wikipedia.org/w/ind Mark Renier, Michael Hardy, Michaelcomella, Mkagenius, Poa, ProveIt, R'n'B, VinceBowdren, 32 anonymous edits http://en.wikipedia.org/w/index.php?oldid=507767926 ex.php?oldid=507767926 Contributors: Boson, Hairy Dude, History2007, RonaldKunenborg, Signalhead, 6 anonymous edits Join dependency Source: http://en.wikipedia.org/w/ind http://en.wikipedia.org/w/index.php?oldid=516604541 dex.php?oldid=516604541 Contributors: 2GooD, Acdx, Adrianmunyua, Augsod, Bdesham, B rick Thrower, CanisRufus, CarlHewitt, Concurrency control Source: http://en.wikipedia.org/w/in Christian75, Clausen, Comps, Craig Stuntz, DavidCary, Furrykef, Gdimitr, GeraldH, JCLately, Jesse Viviano, Jirislaby, John of Reading, JonHarder, Jose Icaza, Karada, KeyStroke, Kku, Leibniz, M4gnum0n, Magioladitis, Malbrain, Mark Renier, Mgarcia, Mindmatrix, Miym, Nealcardwell, Nguyen Thanh Quang, Peak, Poor Yorick, Re edy, Rholton, Ruud Koot, Siskus, Smallman12q, The Anome, Thingg, Thoreaulylazy, TonyW, Touko vk, Tumble, Victor falk, Vincnet, Wbm1058, Wikidrone, YUL89YYZ, 84 anonymous edits
103
Image Sources, Licenses and Contributors
Image:ER Diagram MMORPG.png Source: http://en.wikipedia.org/w/index.php?title=File:ER_Diagram_MMORPG.png License: GNU Free Documentation License Contributors: Original uploader was TheMattrix at en.wikipedia
File:Entity Relationship metamodel.png Source: http://en.wikipedia.org/w/index.php?title=File:Entity_Relationship_metamodel.png License: Creative Commons Zero Contributors: Ottomachin
Image:Erd-entity-relationship-example1.svg Source: http://en.wikipedia.org/w/index.php?title=File:Erd-entity-relationship-example1.svg License: Creative Commons ShareAlike 1.0 Generic Contributors: Chanueting
Image:Erd-entity-with-attribute.svg Source: http://en.wikipedia.org/w/index.php?title=File:Erd-entity-with-attribute.svg License: Creative Commons Attribution-Sharealike 2.5 Contributors: Original uploader was Bigsmoke at en.wikipedia
Image:Erd-relationship-with-attribute.png Source: http://en.wikipedia.org/w/index.php?title=File:Erd-relationship-with-attribute.png License: Creative Commons Attribution-Sharealike 2.5 Contributors: Original uploader was Bigsmoke at en.wikipedia
Image:Erd-id-as-primary-key.png Source: http://en.wikipedia.org/w/index.php?title=File:Erd-id-as-primary-key.png License: Creative Commons Attribution-Sharealike 2.5 Contributors: Original uploader was Bigsmoke at en.wikipedia
Image:ERD Representation.svg Source: http://en.wikipedia.org/w/index.php?title=File:ERD_Representation.svg License: Public Domain Contributors: Benthompson
Image:ERD-artist-performs-song.svg Source: http://en.wikipedia.org/w/index.php?title=File:ERD-artist-performs-song.svg License: Public Domain Contributors: Original uploader was Bignose at en.wikipedia
Image:Relational database terms.svg Source: http://en.wikipedia.org/w/index.php?title=File:Relational_database_terms.svg License: Public Domain Contributors: User:Booyabazooka
File:Relational Model.svg Source: http://en.wikipedia.org/w/index.php?title=File:Relational_Model.svg License: Public Domain Contributors: U.S. Department of Transportation vectorization:
File:Relational key.png Source: http://en.wikipedia.org/w/index.php?title=File:Relational_key.png License: Public Domain Contributors: LionKimbro
File:Relational model concepts.png Source: http://en.wikipedia.org/w/index.php?title=File:Relational_model_concepts.png License: GNU Free Documentation License Contributors: User:AutumnSnow
File:Update anomaly.svg Source: http://en.wikipedia.org/w/index.php?title=File:Update_anomaly.svg License: Public Domain Contributors: Nabav,
File:Insertion anomaly.svg Source: http://en.wikipedia.org/w/index.php?title=File:Insertion_anomaly.svg License: Public domain Contributors: en:User:Nabav, User:Stannered
File:Deletion anomaly.svg Source: http://en.wikipedia.org/w/index.php?title=File:Deletion_anomaly.svg License: Public domain Contributors: en:User:Nabav, User:Stannered
104
License
Creative Commons Attribution-Share Alike 3.0 Unported //creativecommons.org/licenses/by-sa/3.0/
105