Chapter 6 Distributed DBMS Architecture
This chapter introduces the architectures of different distributed systems, such as client/server systems and peer-to-peer distributed systems. Because distributed systems are so diverse, it is very difficult to derive a single equivalent architecture for distributed DBMSs. Alternative architectures of distributed database systems, together with the advantages and disadvantages of each, are discussed here in detail. This chapter also introduces the concept of a multi-database system (MDBS), which is used to manage the heterogeneity of different DBMSs in a heterogeneous distributed DBMS environment. The classification of multi-database systems and the architectures of such databases are presented in this chapter. The outline of this chapter is as follows. Section 6.1.1 introduces alternative architectures for client/server systems and the pros and cons of such systems. Section 6.1.2 discusses alternative architectures for peer-to-peer distributed systems. Section 6.1.3 focuses on multi-database systems (MDBS); the classifications of MDBS and their corresponding architectures are illustrated in that section.
6.1 Introduction
The architecture of a system reflects the structure of the underlying system. It defines the different components of the system, the functions of these components and the overall interactions and relationships among them. This concept holds for general computer systems as well as for software systems. The software architecture of a program or computing system is the structure or structures of the system, which comprise software elements or modules, the externally visible properties of these elements and the relationships among them. Software architecture can be thought of as a representation of an engineered software system and of the process and discipline for effectively implementing its design(s). A distributed database system can be considered a large-scale software system; thus, its architecture can be defined in a manner similar to that of other software systems. This chapter introduces the different alternative reference architectures of distributed database systems, such as client/server, peer-to-peer and multi-database systems.
6.1.1 Client/Server System
In the late 1970s and early 1980s, smaller systems (minicomputers) were developed that required less power and air conditioning. The term client/server was first used in the 1980s and gained acceptance in reference to personal computers (PCs) on a network. In the late 1970s, Xerox developed the standard and technology that is familiar today as Ethernet. This provided a standard means of linking together computers from different manufacturers and formed the basis for modern local area networks (LANs) and wide area networks (WANs). Client/server systems were developed to cope with the rapidly changing business environment. The general forces that drive the move to client/server systems are as follows:
• A strong business requirement for decentralized computing horsepower.
• Standard, powerful computers with user-friendly interfaces.
• Mature, shrink-wrapped user applications with widespread acceptance.
• Inexpensive, modular systems designed with enterprise-class architecture, such as power and network redundancy and file archiving, together with network protocols to link them together.
• Growing cost/performance advantages of PC-based platforms.
The client/server system is a versatile, message-based and modular infrastructure that is intended to improve usability, flexibility, interoperability and scalability as compared to centralized, mainframe, time-sharing computing. In the simplest sense, the client and the server can be defined as follows.
• A client is an individual user's computer or a user application that does a certain amount of processing on its own and sends requests to, and receives replies from, one or more servers for other processing and/or data.
• A server consists of one or more computers or an application program that receives and processes requests from one or more client machines. A server is typically designed with some redundancy in power, network, computing and file storage.
Usually, a client is defined as a requester of services and a server is defined as the provider of services. A single machine can be both a client and a server, depending on the software configuration. Sometimes the term server or client refers to the software rather than the machines. Generally, server software runs on powerful computers dedicated exclusively to business applications, whereas client software runs on common PCs or workstations. The properties of a server are:
• Passive (slave)
• Waiting for requests
• On request, serves clients and sends a reply.
The properties of a client are:
• Active (master)
• Sending requests
• Waiting until the reply arrives.
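The master/slave roles listed above can be illustrated with a minimal Python sketch. It is a stand-in under stated assumptions, not a networked implementation: two threads in one process play the client and the server, a `queue.Queue` plays the network, and the employee data is hypothetical.

```python
# Minimal sketch of the request/reply roles: threads and queues in one
# process stand in for two machines and the network (an assumption).
import queue
import threading


def server(requests: queue.Queue, data: dict) -> None:
    """Passive (slave): block waiting for a request, serve it, send a reply."""
    while True:
        key, reply_to = requests.get()      # waits for a request
        if key is None:                     # shutdown signal
            break
        reply_to.put(data.get(key))         # serves the client and replies


def client(requests: queue.Queue, key: str):
    """Active (master): send a request, then wait until the reply arrives."""
    reply_to = queue.Queue(maxsize=1)
    requests.put((key, reply_to))
    return reply_to.get()                   # blocks until the server answers


requests = queue.Queue()
worker = threading.Thread(target=server, args=(requests, {"E1": "Smith"}))
worker.start()
answer = client(requests, "E1")
print(answer)
requests.put((None, None))                  # stop the server
worker.join()
```

The server in this sketch keeps nothing between requests, so it behaves like the stateless kind of server; a stateful variant would, for example, record per-client information across calls.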
A server can be stateless or stateful. A stateless server does not keep any information between requests, whereas a stateful server can remember information between requests.
6.1.1.1 Advantages and disadvantages of Client/Server System
A client/server system provides a number of advantages over a powerful centralized mainframe system. The major advantage is that it improves usability, flexibility, interoperability and scalability as compared to centralized, mainframe, time-sharing computing. In addition, a client/server system has the following advantages:
• A client/server system can distribute the computing workload between client workstations and shared servers.
• A client/server system allows the end user to use the graphical user interfaces of microcomputers, thereby improving functionality and simplicity.
• It provides better performance at a lower hardware and software cost than alternative minicomputer or mainframe solutions.
However, the client/server environment is more difficult to maintain, for a variety of reasons:
• The client/server architecture creates a more complex environment in which it is often difficult to manage different platforms (LANs, operating systems, DBMSs, etc.).
• In a client/server system, the operating system software is distributed over many machines rather than a single system, which increases complexity.
• A client/server system may suffer from security problems as the number of users and processing sites increases.
• The workstations in a client/server system are geographically distributed, and each of them is administered and controlled by an individual department, which adds extra complexity. Furthermore, a communication cost is incurred for each processing request.
• The maintenance cost of a client/server system is greater than that of alternative minicomputer or mainframe solutions.
6.1.1.2 Architecture of Client/Server Distributed System
Client/server architecture is a prerequisite for the proper development of client/server systems. The client/server architecture is based on hardware and software components that interact to form a distributed system. In a client/server distributed database system, the entire data can be viewed as a single logical database, while at the physical level the data may be distributed. From the data organization point of view, the architecture of a client/server distributed database system mainly concentrates on the software components of the system, which include three main components: clients, servers and communication middleware.
(i) A client is an individual computer, process or user application that requests services from the server. A client is also known as a front-end application, since the end user usually interacts with the client process. The software components required on a client machine are the client operating system, the client DBMS and the client graphical user interface. The client process runs on an operating system that has at least some multitasking capabilities. End users interact with the client process via the graphical user interface. In addition, a client DBMS is required at the client side, which is responsible for managing the data that is cached at the client. In some client/server architectures, communication software is embedded in the client machine, as a substitute for communication middleware, so that it can interact efficiently with other machines in the network.
(ii) A server consists of one or more computers, or is a computer process or application, that provides services to clients. A server is also known as a back-end application, since the server process provides the background services for the client processes. A server provides most of the data management services to clients, such as query processing and optimization, transaction management, recovery management, storage management and integrity services. In addition, communication software sometimes resides on the server machine, instead of communication middleware, to manage communications with clients and other servers in the network.
(iii) Communication middleware is any process (or set of processes) through which clients and servers communicate with each other. The communication middleware is usually associated with a network that controls data and information transmission between clients and servers. Communication middleware software consists of three main components: an application program interface (API), a database translator and a network translator. The API is public to client applications, which communicate with the communication middleware through it. The middleware API allows the client process to be independent of the database server. The database translator translates SQL requests into the syntax of the specific database server, thus enabling a DBMS from one vendor to communicate directly with a DBMS from another vendor without a gateway. The network translator manages the network communication protocols, thus allowing clients to be independent of the network protocol. To accomplish the connection between the client and the server, the communication middleware software operates at two different levels. The physical level deals with the communications between the client and the server computers (computer to computer), whereas the logical level deals with the communications between the client and the server processes (interprocess).
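To make the role of the database translator concrete, the following sketch assumes a hypothetical middleware API in which a client submits one server-independent request (`TopNRequest`) and the translator renders it in each server's dialect. Row limiting is chosen only because its syntax genuinely differs across products (`LIMIT`, `TOP` and the SQL-standard `FETCH FIRST`); the class and function names are illustrative, and a real translator would cover far more of SQL.

```python
# Hypothetical sketch of a middleware database translator: one generic
# request is rendered in the row-limiting syntax of each target dialect.
from dataclasses import dataclass


@dataclass(frozen=True)
class TopNRequest:
    """Server-independent request: first `n` rows of `table` ordered by `order_by`."""
    table: str
    order_by: str
    n: int


def translate(req: TopNRequest, dialect: str) -> str:
    """Render the generic request as SQL for one target server dialect."""
    if dialect in ("postgresql", "mysql"):
        return f"SELECT * FROM {req.table} ORDER BY {req.order_by} LIMIT {req.n}"
    if dialect == "sqlserver":
        return f"SELECT TOP {req.n} * FROM {req.table} ORDER BY {req.order_by}"
    if dialect == "standard":  # SQL-standard FETCH FIRST form
        return (f"SELECT * FROM {req.table} ORDER BY {req.order_by} "
                f"FETCH FIRST {req.n} ROWS ONLY")
    raise ValueError(f"no translation rule for dialect {dialect!r}")


req = TopNRequest("employee", "salary DESC", 5)
for d in ("mysql", "sqlserver", "standard"):
    print(translate(req, d))
```

The client code only ever builds a `TopNRequest`, which is the sense in which the middleware API makes the client independent of the database server.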
Figure 6.1 Client/Server Reference Architecture: a client machine (graphical user interface, application program, client DBMS, OS) sends SQL queries through communication middleware to a server machine (query optimizer, transaction manager, recovery manager and other components, run-time support processor, OS), which manages the database and returns data.
A client/server architecture is intended to provide a scalable architecture, whereby each computer or process on the network is either a client or a server.
6.1.1.3 Architectural Alternatives for Client/Server System
A client/server system provides several architectural alternatives, known as two-tier, three-tier and multi-tier (or n-tier) architectures.
Two-tier architecture: A generic client/server architecture has two types of nodes on the network: clients and servers. As a result, these generic architectures are sometimes referred to as two-tier architectures. In a two-tier client/server architecture, the user system interface is usually located in the user's desktop environment, and the database management services are usually located in a server that serves many clients. Processing management is split between the user system interface environment and the database management server environment. The general two-tier architecture of a client/server system is illustrated in the following figure.
Figure 6.2 Two-tier Client/Server Architecture: clients Client1 to Clientn connected through a communication network to a print server, a file server and a DBMS server.
In a two-tier client/server system, multiple clients may be served by a single server; this is called the multiple client-single server approach. Alternatively, multiple servers may provide services to multiple clients; this is called the multiple client-multiple server approach. In the multiple client-multiple server approach, two alternative management strategies are possible: either each client manages its own connection to the appropriate server, or each client communicates only with its home server, which in turn communicates with other servers as required. The former approach simplifies the server code but complicates the client code with additional functionality, which leads to a heavy (fat) client system. The latter approach, on the other hand, loads the server machine with all data management responsibilities and thus leads to a light (thin) client system. Depending on the extent to which processing is shared between the client and the server, a server can likewise be described as fat or thin: a fat server carries the larger proportion of the processing load, whereas a thin server carries a smaller processing load.
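The two management strategies of the multiple client-multiple server approach can be contrasted in a small sketch. All class, server and relation names are hypothetical: a fat client keeps its own directory of which server holds which relation, whereas a thin client sends every request to its home server, which forwards the request when the data lives elsewhere.

```python
# Hypothetical sketch contrasting fat clients (client-side routing) with
# thin clients (home-server routing). All names and data are illustrative.
class Server:
    def __init__(self, name, relations):
        self.name = name
        self.relations = relations  # relation name -> list of rows stored here
        self.peers = {}             # relation name -> Server that stores it

    def execute(self, relation):
        """Answer locally if the relation is here; otherwise forward it."""
        if relation in self.relations:
            return self.relations[relation]
        return self.peers[relation].execute(relation)


class FatClient:
    """Keeps its own directory and connects to the appropriate server itself."""
    def __init__(self, directory):
        self.directory = directory  # relation name -> Server

    def query(self, relation):
        return self.directory[relation].execute(relation)


class ThinClient:
    """Talks only to its home server; all routing happens on the server side."""
    def __init__(self, home):
        self.home = home

    def query(self, relation):
        return self.home.execute(relation)


s1 = Server("S1", {"employee": [("E1", "Smith")]})
s2 = Server("S2", {"project": [("P1", "CAD")]})
s1.peers["project"] = s2        # S1 knows where to forward project requests
s2.peers["employee"] = s1

fat = FatClient({"employee": s1, "project": s2})
thin = ThinClient(home=s1)
print(fat.query("project"))     # the client itself chose S2
print(thin.query("project"))    # S1 forwarded the request to S2
```

The routing knowledge sits in `FatClient.directory` in one case and in `Server.peers` in the other, which mirrors where the extra code ends up in each strategy.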
The two-tier client/server architecture is a good solution for distributed computing when work groups of up to 100 people interact simultaneously on a local area network. However, it also has a number of limitations. The major limitation is that performance begins to deteriorate when the number of users exceeds 100. A second limitation of the two-tier architecture is that implementing processing management services with vendor-proprietary database procedures restricts flexibility and the choice of DBMS for applications.
Three-tier architecture: Some client/server networks consist of three different kinds of nodes: clients; application servers, which process data for the clients; and database servers, which store data for the application servers. This is called a three-tier architecture. The three-tier architecture (also referred to as the multi-tier architecture) emerged to overcome the limitations of the two-tier architecture. In the three-tier architecture, a middle tier is added between the user system interface client environment and the database management server environment. The middle tier can perform queuing, application execution and database staging. There are various ways of implementing the middle tier, such as transaction processing monitors, message servers, web servers or application servers. The typical three-tier architecture of a client/server system is depicted in figure 6.3.
Figure 6.3 Three-tier Client/Server Architecture: client (graphical user interface, web interface), application server or web server (application programs, web pages) and database server (database management system).
The most basic type of three-tier architecture has a middle layer consisting of transaction processing (TP) monitor technology. TP monitor technology is a type of message queuing, transaction scheduling and prioritization service in which the client connects to the TP monitor (middle tier) instead of the database server. The transaction is accepted by the monitor, which queues it and takes responsibility for managing it to completion, thus freeing up the client. TP monitor technology also provides a number of services, such as updating multiple different DBMSs in a single transaction, connectivity to a variety of data sources (including flat files, non-relational DBMSs and the mainframe), the ability to attach priorities to transactions, and robust security. When all these functionalities are provided by third-party middleware vendors, the TP monitor code becomes complex; this is referred to as “TP heavy”, and such a monitor can service thousands of users. On the other hand, if all these functionalities are embedded in the DBMS, the system can be considered a two-tier architecture, and this is referred to as “TP Lite”. A limitation of TP monitor technology is that the implementation code is usually written in a lower-level language and is not yet widely available in the popular visual toolsets.
Messaging is another way to implement three-tier architectures. Messages are prioritized and processed asynchronously. The message server connects to the relational DBMS and other data sources. The message server architecture mainly focuses on intelligent messages. Messaging systems are good solutions for wireless infrastructure.
The three-tier architecture with a middle layer consisting of an application server allocates the main body of an application to a shared host for execution, rather than to the user system interface client environment. The application server shares business logic, computations and a data retrieval engine. The major advantages of an application server are therefore that, with less software on the client side, there is less security to worry about, applications are more scalable, and installation costs are lower on a single server than when maintaining each application on a desktop client. Currently, developing client/server systems using technologies that support distributed objects is gaining popularity, as these technologies support interoperability across languages and platforms and enhance the maintainability and adaptability of the system.
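The queuing and prioritization that a TP monitor or message server performs in the middle tier can be sketched with Python's `heapq`. This is a simplified illustration under stated assumptions: the `TPMonitor` class is hypothetical, a dictionary stands in for the database server, and there is no failure handling or atomic commit across several DBMSs, which a real TP monitor would provide.

```python
# Hypothetical sketch of middle-tier queuing and prioritization. A dict
# stands in for the database server; no failure handling is attempted.
import heapq
import itertools


class TPMonitor:
    def __init__(self, database):
        self.database = database         # stand-in for the database server
        self._queue = []                 # heap of (priority, seq, txn_id, ops)
        self._seq = itertools.count()    # tie-breaker: FIFO within a priority
        self.completed = []

    def submit(self, txn_id, ops, priority=5):
        """Accept a transaction and return at once, freeing the client."""
        heapq.heappush(self._queue, (priority, next(self._seq), txn_id, ops))

    def run(self):
        """Drive queued transactions to completion, lowest priority number first."""
        while self._queue:
            _, _, txn_id, ops = heapq.heappop(self._queue)
            for key, delta in ops:
                self.database[key] = self.database.get(key, 0) + delta
            self.completed.append(txn_id)


db = {"A": 100, "B": 50}
monitor = TPMonitor(db)
monitor.submit("T1", [("A", -10), ("B", +10)])   # default priority 5
monitor.submit("T2", [("B", -5)], priority=1)    # urgent: runs first
monitor.submit("T3", [("A", +1)])
monitor.run()
print(monitor.completed, db)   # ['T2', 'T1', 'T3'] {'A': 91, 'B': 55}
```

The client's only interaction is `submit`, which returns immediately; completion is the monitor's responsibility, which is the decoupling the middle tier is meant to provide.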
There are currently two prominent distributed object technologies: the Common Object Request Broker Architecture (CORBA) and COM (Component Object Model)/DCOM.
The major advantage of the three-tier client/server architecture is that it provides better performance for groups with a large number of users and improves flexibility compared with the two-tier approach. Because data processing is separated onto different servers in a three-tier architecture, it also provides more scalability. The disadvantage of the three-tier architecture is that it puts a greater load on the network. Moreover, it is much more difficult to program and test software in a three-tier architecture than in a two-tier architecture, because more devices have to communicate to complete a user's transaction.
In general, a multi-tier (or n-tier) architecture may deploy any number of distinct services, including transitive relations between application servers implementing different functions of business logic, each of which may or may not employ a distinct or shared database system.
6.1.2 Peer-to-Peer Distributed System
The peer-to-peer architecture is a good way to structure a distributed system so that it consists of many identical software processes or modules, each running on a different computer or node. The software modules at the different sites communicate with each other to carry out the processing required by distributed applications. A peer-to-peer architecture provides both client and server functionality on each computer. Therefore, in a peer-to-peer distributed system, each node can access services from other nodes as well as provide services to other nodes. In contrast with the client/server architecture, in a peer-to-peer distributed system each node provides user interaction facilities as well as processing capabilities.
Considering the complexity associated with discovering, communicating with and managing the large number of computers involved in a distributed system, the software module at each node in a peer-to-peer distributed system is typically structured in layers. Thus, the software module of a peer-to-peer application can be divided into three layers, known as the base overlay layer, the middleware layer and the application layer. The base overlay layer deals with discovering other participants in the system and creating a mechanism for all nodes to communicate with each other; it ensures that all participants in the network are aware of the other participants. The middleware layer includes additional software components that can potentially be reused by many different applications. The functionalities provided by this layer include the ability to create a distributed index for information in the system, a publish/subscribe facility and security services. The functions provided by the middleware layer are not necessary for all applications, but they are developed to be reused by more than one application. The application layer provides software packages intended for users and developed to exploit the distributed nature of the peer-to-peer infrastructure. There is no standard terminology across different implementations of peer-to-peer systems, and thus the term “peer-to-peer” is used for general descriptions of the functionalities required for building a generic peer-to-peer system. Most peer-to-peer systems are developed as single applications.
As a database management system, each node in a peer-to-peer distributed system provides all data management services and can execute local queries as well as global queries. Thus, in this system there is no distinction between a client DBMS and a server DBMS.
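The three layers can be mapped onto a single hypothetical `Peer` class, as in the sketch below. The hash-based index (`zlib.crc32` of the key modulo the number of peers) is an assumed, deliberately simple stand-in for a real distributed index; peers joining after data has been stored would require rebalancing, which this sketch ignores.

```python
# Hypothetical sketch of one peer module with the three layers described
# above. The crc32-modulo index is a simplified, assumed distributed index.
import zlib


class Peer:
    def __init__(self, name):
        self.name = name
        self.store = {}     # data held at this node
        self.members = []   # every known participant, including self

    # Base overlay layer: discovery and membership.
    def join(self, network):
        network.append(self)
        for peer in network:
            peer.members = sorted(network, key=lambda p: p.name)

    # Middleware layer: distributed index mapping a key to its owning peer.
    def owner(self, key):
        return self.members[zlib.crc32(key.encode()) % len(self.members)]

    # Application layer: every peer acts as both client and server.
    def put(self, key, value):
        self.owner(key).store[key] = value

    def get(self, key):
        target = self.owner(key)
        if target is self:
            return self.store.get(key)   # served locally (server role)
        return target.get(key)           # asked another peer (client role)


network = []
peers = [Peer(n) for n in ("P1", "P2", "P3")]
for p in peers:
    p.join(network)
peers[0].put("emp:E1", "Smith")
print(peers[2].get("emp:E1"))            # found whichever peer owns the key
```

Because the same `get` method either answers from the local store or forwards to another peer, no node is fixed as client or server, matching the absence of a client DBMS/server DBMS distinction.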
As a single application program, the DBMS at each node accepts user requests and manages their execution. As in a client/server system, in a peer-to-peer distributed database system the data is viewed as a single logical database, although it is distributed at the physical level. In this context, it is necessary to identify a reference architecture for distributed database systems.
6.1.2.1 Reference Architecture of Distributed DBMS
This section introduces the reference architecture of a distributed database system. Owing to the diversity of distributed DBMSs, it is much more difficult to present an equivalent architecture that is generally applicable to all applications. However, it may be useful to present a possible reference architecture that addresses data distribution. Data in a distributed system is usually fragmented and replicated. Considering these fragmentation and replication issues, the reference architecture of a distributed DBMS consists of the following schemas (as described in figure 6.4):
• A set of global external schemas
• A global conceptual schema
• A fragmentation schema and an allocation schema
• A set of schemas for each local DBMS, conforming to the ANSI-SPARC three-level architecture.
The reference architecture of distributed DBMS is illustrated in figure 6.4.
Figure 6.4 Reference Architecture of Distributed DBMS: global external schemas 1 to n over a global conceptual schema, which maps through the fragmentation schema and the allocation schema to, for each site i = 1 to n, local mapping schema i, local conceptual schema i and local internal schema i over database DBi.
Global External Schema. In a distributed system, user applications and user access to the distributed database are represented by a number of global external schemas. This is the topmost level in the reference architecture of a distributed DBMS. This level describes the part of the distributed database that is relevant to different users.
Global Conceptual Schema. The global conceptual schema represents the logical description of the entire database as if it were not distributed. This level corresponds to the conceptual level of the ANSI-SPARC architecture of a centralized DBMS and contains definitions of all entities, the relationships among entities, and security and integrity information for the whole database stored at all sites in a distributed system.
Fragmentation Schema and Allocation Schema. In a distributed database, the data can be split into a number of non-overlapping portions, called fragments. There are several different ways to perform this fragmentation operation. The fragmentation schema describes how the data is to be logically partitioned in a distributed database. The global conceptual schema consists of a set of global relations, and the mapping between the global relations and fragments is defined in the fragmentation schema. This mapping is one-to-many: a number of fragments correspond to one global relation, but only one global relation corresponds to each fragment. The allocation schema is a description of where the data (fragments) is to be located, taking account of any replication. The type of mapping defined in the allocation schema determines whether the distributed database is redundant or non-redundant. In the case of redundant data distribution the mapping is one-to-many, while in the case of non-redundant data distribution the mapping is one-to-one.
Local Schemas. Each local DBMS in a distributed system has its own set of schemas. The local conceptual and local internal schemas correspond to the equivalent levels of the ANSI-SPARC architecture.
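As an illustration of how a fragmentation schema and an allocation schema relate, the sketch below assumes a hypothetical global relation EMPLOYEE horizontally fragmented by branch, with one fragment replicated at two sites. Because that fragment is allocated to more than one site, the allocation mapping is one-to-many and the database is redundant in the sense described above. All relation, fragment and site names are illustrative.

```python
# Hypothetical sketch: a global relation, its fragmentation schema and an
# allocation schema. All names and rows are illustrative assumptions.
employee = [
    ("E1", "Smith", "London"),
    ("E2", "Jones", "Paris"),
    ("E3", "Brown", "London"),
]

# Fragmentation schema: global relation -> {fragment name: selection predicate}
fragmentation = {
    "EMPLOYEE": {
        "EMP_LONDON": lambda row: row[2] == "London",
        "EMP_PARIS": lambda row: row[2] == "Paris",
    }
}

# Allocation schema: fragment -> sites holding a copy (EMP_LONDON is replicated)
allocation = {
    "EMP_LONDON": ["S1", "S3"],
    "EMP_PARIS": ["S2"],
}


def fragment(relation_name, rows):
    """Apply the fragmentation schema: one global relation -> many fragments."""
    return {name: [r for r in rows if pred(r)]
            for name, pred in fragmentation[relation_name].items()}


def is_redundant(alloc):
    """Redundant iff some fragment is allocated to more than one site."""
    return any(len(sites) > 1 for sites in alloc.values())


frags = fragment("EMPLOYEE", employee)
# The fragments are non-overlapping and together reconstruct the relation.
assert sorted(frags["EMP_LONDON"] + frags["EMP_PARIS"]) == sorted(employee)
print({name: len(rows) for name, rows in frags.items()}, is_redundant(allocation))
```

Removing `"S3"` from the allocation of `EMP_LONDON` would make every fragment map to exactly one site, so `is_redundant` would return `False` for the resulting non-redundant distribution.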
In a distributed database system, the physical data organization at each machine is probably different, and therefore each site requires an individual internal schema definition, called the local internal schema. To handle fragmentation and replication issues, the logical organization of data at each site is described by a third layer in the architecture, called the local conceptual schema. The global conceptual schema is the union of all local conceptual schemas; thus, the local conceptual schemas are mappings of the global schema onto each site. This mapping is done by local mapping schemas. A local mapping schema maps fragments in the allocation schema into external objects in the local database, and this mapping depends on the type of local DBMS. Therefore, in a heterogeneous distributed DBMS, there may be different types of local mappings at different nodes.
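A local mapping schema can be pictured as a per-site table that resolves a fragment name to an object in that site's local database. In this sketch the site names, local table names and contents are all hypothetical; the differing table names stand in for differences between local DBMSs, and the global relation is rebuilt as the union of its fragments.

```python
# Hypothetical sketch: per-site local mapping schemas resolve allocated
# fragments to local objects. All names and rows are illustrative.
local_databases = {
    "S1": {"lon_emp": [("E1", "Smith", "London"), ("E3", "Brown", "London")]},
    "S2": {"EMPLOYEE_PARIS": [("E2", "Jones", "Paris")]},
}

local_mapping = {                  # site -> {fragment: local object name}
    "S1": {"EMP_LONDON": "lon_emp"},
    "S2": {"EMP_PARIS": "EMPLOYEE_PARIS"},
}

allocation = {"EMP_LONDON": "S1", "EMP_PARIS": "S2"}   # non-redundant here


def read_fragment(fragment_name):
    """Resolve a fragment through the allocation and local mapping schemas."""
    site = allocation[fragment_name]
    local_name = local_mapping[site][fragment_name]
    return local_databases[site][local_name]


# The global relation is the union of its fragments, wherever they are stored.
employee = [row for frag in ("EMP_LONDON", "EMP_PARIS")
            for row in read_fragment(frag)]
print(len(employee))
```

Only `local_mapping` changes if a site renames its local table, which is why the mapping layer shields the global level from local differences.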
This architecture provides a very general conceptual framework for understanding distributed databases. Furthermore, such databases are typically designed in a top-down manner; therefore, all external view definitions are made globally.
6.1.2.2 Component Architecture of Distributed DBMS
This section introduces a component architecture of a distributed DBMS that is independent of the reference architecture. The four major components of a distributed DBMS that have been identified are as follows.
• Distributed DBMS (DDBMS) component
• Data communications (DC) component
• Global system catalog (GSC)
• Local DBMS (LDBMS) component
Distributed DBMS (DDBMS) Component. The DDBMS component is the controlling unit of the entire system. This component provides the different levels of transparency, such as data distribution transparency, transaction transparency, performance transparency and DBMS transparency (in the case of a heterogeneous distributed DBMS). [OZSU] has identified four major components of the DDBMS, as listed below.
(a) The user interface handler – This component is responsible for interpreting user commands as they come into the system and formatting the result data as it is sent to the user.
(b) The semantic data controller – This component is responsible for checking the integrity constraints and authorizations that are defined in the global conceptual schema before processing the user requests.
(c) The global query optimizer and decomposer – This component determines an execution strategy that minimizes a cost function and translates the global queries into local ones using the global and local conceptual schemas as well as the global system catalog. The global query optimizer is responsible for generating the best strategy to execute distributed join operations.
(d) The distributed execution monitor – This component coordinates the distributed execution of the user request. It is also known as the distributed transaction manager. During the execution of distributed queries, the execution monitors at various sites may, and usually do, communicate with one another.
Data Communications (DC) Component. The DC component is the software that enables all sites to communicate with each other. The DC component contains all information about the sites and the links.
Global System Catalog (GSC). The global system catalog provides the same functionality as the system catalog of a centralized DBMS. In addition to the metadata of the entire database, a GSC contains all fragmentation, replication and allocation details, reflecting the distributed nature of a distributed DBMS. It can itself be managed as a distributed database and thus can be fragmented and distributed, fully replicated or centralized, like any other relation in the system. (The details of global system catalog management are introduced in Chapter 12, Section 12.2.)
Local DBMS (LDBMS) Component. The local DBMS component is a standard DBMS, stored at each site that has a database, and is responsible for controlling local data. Each LDBMS component has its own local system catalog that contains information about the data stored at that particular site.
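The following sketch assumes a hypothetical, heavily simplified global system catalog and shows how a global query decomposer might consult it to split one global selection into local queries, pruning fragments whose predicate cannot match. Sending each local query to its site and combining the results (the execution monitor's job) is not shown, and the SQL strings are built by interpolation only for readability; real systems would use parameter binding.

```python
# Hypothetical sketch of global query decomposition driven by a simplified
# GSC. Catalog contents, site names and SQL strings are illustrative only.
gsc = {
    "EMPLOYEE": [  # one entry per fragment: name, site, fragmenting value
        {"fragment": "EMP_LONDON", "site": "S1", "branch": "London"},
        {"fragment": "EMP_PARIS", "site": "S2", "branch": "Paris"},
    ]
}


def decompose(relation, branch=None):
    """Return one (site, local SQL) pair per fragment that can hold matching rows."""
    plan = []
    for entry in gsc[relation]:
        if branch is not None and entry["branch"] != branch:
            continue  # fragment predicate contradicts the query: prune it
        sql = f"SELECT * FROM {entry['fragment']}"
        if branch is not None:
            sql += f" WHERE branch = '{branch}'"
        plan.append((entry["site"], sql))
    return plan


print(decompose("EMPLOYEE", branch="Paris"))
print(len(decompose("EMPLOYEE")))   # no predicate: every fragment is needed
```

Because the catalog records which fragment holds which branch, the query for Paris touches only site S2, which is the kind of saving the global query optimizer looks for.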
In a homogeneous distributed DBMS, the local DBMS component is the same product, replicated at each site while in a heterogeneous distributed DBMS, there must be at least two sites with different DBMS products and/or platforms. The major components of a local DBMS are as follows.
(a) The local query optimizer – This component acts as the access path selector and is responsible for choosing the best access path to any data item for the execution of a query (the query may be a local query or part of a global query executed at that site).
(b) The local recovery manager – The local recovery manager ensures the consistency of the local database in spite of failures.
(c) The run-time support processor – This component physically accesses the database according to the commands in the schedule generated by the query optimizer and is responsible for managing main memory buffers. The run-time support processor is the interface to the operating system and contains the database buffer (or cache) manager.
6.1.2.3 Distributed Data Independence
The reference architecture of a distributed DBMS is an extension of the ANSI/SPARC architecture; therefore, data independence is supported by this model. Distributed data independence means that upper levels are unaffected by changes to lower levels in the distributed database architecture. As in a centralized DBMS, both distributed logical data independence and distributed physical data independence are supported by this architecture. In a distributed system, users query data irrespective of its location, fragmentation or replication. Furthermore, any changes made to the global conceptual schema do not affect the user views at the global external schemas. Thus, distributed logical data independence is provided by the global external schemas in the distributed database architecture. Similarly, the global conceptual schema provides distributed physical data independence in the distributed database environment.
6.1.3 Multi-Database System (MDBS)
In recent years, multi-database systems have been gaining the attention of many researchers. A multi-database system attempts to logically integrate several different, independent distributed DBMSs while allowing the local DBMSs to maintain complete control of their operations. Complete autonomy means that no software modifications can be made to the local DBMSs. Thus, a multi-database system (MDBS) is an additional software layer on top of the local DBMSs that provides the necessary functionality. An MDBS is software that can be manipulated and accessed through a single manipulation language with a single common data model (i.e., through a single application) in a heterogeneous environment without interfering with the normal execution of the individual database systems. The MDBS developed from the requirement to manage and retrieve data from multiple databases within a single application while providing complete autonomy to the individual database systems. To support DBMS transparency, an MDBS resides on top of existing databases and file systems and presents a single database to its users. An MDBS maintains a global schema against which users issue queries and updates, and this global schema is constructed by integrating the schemas of the local databases. To execute a global query, the MDBS first translates it into a number of subqueries and then converts these subqueries into appropriate local queries to be run on the local DBMSs. After execution completes, the local results are merged and the final global result is generated for the user. An MDBS controls multiple gateways and manages the local databases through these gateways.
MDBSs can be classified into two different categories based on the autonomy of the individual DBMSs: non-federated MDBS and federated MDBS. A federated MDBS is further categorized as a loosely coupled federated MDBS or a tightly coupled federated MDBS, based on who manages the federation and how the components are integrated. Further, a tightly coupled federated MDBS can be classified as a single federation tightly coupled federated MDBS or a multiple federations tightly coupled federated MDBS. The complete taxonomy of multi-database systems [Sheth and Larson, 1990] is depicted in figure 6.5.
Multi-database System
    Non-federated MDBS
    Federated MDBS
        Loosely Coupled Federated MDBS
        Tightly Coupled Federated MDBS
            Single Federation Tightly Coupled Federated MDBS
            Multiple Federations Tightly Coupled Federated MDBS
Figure 6.5 Taxonomy of Multi-database Systems
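The taxonomy in figure 6.5 can be read as a small decision procedure. The Python sketch below is illustrative only: the three properties in the hypothetical `MDBSProfile` are a simplification, assumed here, of the criteria the text uses (whether local users are distinguished from global users, who manages the federation, and how many federated schemas are supported).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MDBSProfile:
    has_local_users: bool        # are local users distinguished from global users?
    federation_managed_by: str   # "users" or "administrators"
    federated_schemas: int       # number of federated schemas supported


def classify(p: MDBSProfile) -> str:
    """Place an MDBS in the taxonomy of figure 6.5."""
    if not p.has_local_users:
        # All users are global and see one fully integrated schema.
        return "Non-federated MDBS"
    if p.federation_managed_by == "users":
        # Users create and maintain the federation themselves.
        return "Loosely Coupled Federated MDBS"
    if p.federation_managed_by != "administrators":
        raise ValueError("federation_managed_by must be 'users' or 'administrators'")
    # The federation and its administrators control the integration.
    if p.federated_schemas == 1:
        return "Single Federation Tightly Coupled Federated MDBS"
    return "Multiple Federations Tightly Coupled Federated MDBS"
```

The order of the checks follows the tree from the root downwards: the federated/non-federated split is decided first, and the number of federated schemas matters only once the system is known to be tightly coupled.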
Federated MDBS. A federated multi-database system (FMDBS) is a collection of cooperating database management systems that are autonomous but participate in a federation to allow partial and controlled sharing of their data. In a federated MDBS, all component DBMSs cooperate to allow different degrees of integration. There is no centralized control in a federated architecture because the component databases control access to their data. To allow controlled sharing of data while preserving the autonomy of the component DBMSs and the continued execution of existing applications, a federated MDBS supports two types of operations: local and global (or federation). Local operations are submitted directly to a component DBMS and involve data only from that component database. Global operations access data from multiple databases managed by multiple component DBMSs via the federated multi-database management system. Therefore, a federated MDBS is a cross between a distributed DBMS and a centralized DBMS: it is a distributed system to global users and a centralized DBMS to local users. Put simply, a multi-database system is said to be a federated multi-database system (FMDBS) if users interface with it through some integrated views and there is no connection between any two integrated views. The features of an FMDBS are listed in the following.
• Integrated schema exists – The FMDBS administrator (MDBA) is responsible for the creation of integrated schemas in the heterogeneous environment.
• Component databases are transparent to users – Users are not aware of the multiple component DBMSs in an FMDBS; thus, users only need to understand the integrated schemas to perform operations on the FMDBS. They cannot change the integrated components when they operate on the FMDBS.
• A common data model (CDM) is required to implement the federation – The CDM must be powerful enough to represent all the data models of the different components. The export schemas of the component data models are integrated on the basis of the CDM.
• Update transactions are a difficult issue in an FMDBS – The component databases are completely independent and join the federation through the integrated schema. It is difficult to decide whether the FMDBS or the local component database systems should control the transactions.
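The two kinds of operation in a federated MDBS, together with the global query processing described for an MDBS (translate into subqueries, run them through gateways, merge the results), can be illustrated with a toy Python model. Everything here is hypothetical: each component DBMS is an in-memory dictionary, each gateway is a plain function, the `ToyFederation` class and its column mappings are invented for illustration, and the merge step is a simple union. Update transactions, which the text identifies as a difficult issue, are not modelled.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
# A gateway takes a local table name and returns that table's rows
# from one component DBMS (a stand-in for a real connection).
Gateway = Callable[[str], List[Row]]


class ToyFederation:
    def __init__(self,
                 gateways: Dict[str, Gateway],
                 exports: Dict[str, List[Tuple[str, str, Dict[str, str]]]]):
        self.gateways = gateways
        # federated relation -> [(component, local table, local col -> federated col)]
        self.exports = exports

    def local_op(self, component: str, table: str) -> List[Row]:
        """Local operation: submitted directly to one component DBMS."""
        return self.gateways[component](table)

    def global_op(self, relation: str) -> List[Row]:
        """Global operation: one subquery per exporting component, each result
        translated to the federated column names, then merged by union."""
        merged: List[Row] = []
        for component, table, colmap in self.exports[relation]:
            for row in self.gateways[component](table):
                # Only exported columns are visible to global users.
                merged.append({colmap[c]: v for c, v in row.items() if c in colmap})
        return merged


# Usage: two components that store employees under different names.
db_a = {"emp": [{"ename": "Asha", "sal": 100, "pin": 1234}]}
db_b = {"staff": [{"name": "Ravi", "salary": 200}]}
fed = ToyFederation(
    gateways={"A": lambda t: db_a[t], "B": lambda t: db_b[t]},
    exports={"EMPLOYEE": [
        ("A", "emp", {"ename": "name", "sal": "salary"}),  # "pin" is not exported
        ("B", "staff", {"name": "name", "salary": "salary"}),
    ]},
)
```

A local user of component A still sees the `pin` column through `local_op`, while a global user of `EMPLOYEE` sees only the exported, renamed columns from both components.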
Two types of FMDBS have been identified, namely, loosely coupled FMDBS and tightly coupled FMDBS, depending on how the multiple component databases are integrated. An FMDBS is loosely coupled if it is the user's responsibility to create and maintain the federation and there is no control enforced by the federated system and its administrators. In contrast, an FMDBS is tightly coupled if the federation and its administrator(s) have the responsibility for creating and maintaining the integration and actively control access to the component databases. A federation is built by a selective and controlled integration of its components. A tightly coupled FMDBS may have one or more federated schemas. A tightly coupled FMDBS is said to have a single federation if it allows the creation and management of only one federated schema. On the other hand, a tightly coupled FMDBS is said to have multiple federations if it allows the creation and management of multiple federated schemas. A loosely coupled FMDBS always supports multiple federated schemas.
Non-federated MDBS. In contrast to a federated multi-database system, a non-federated multi-database system does not distinguish between local and global users. In a non-federated MDBS, all component databases are fully integrated to provide a single global schema (sometimes called the enterprise or corporate schema); such a system is known as a unified MDBS. Thus, in a non-federated MDBS, all applications are global applications (because there are no local users) and data are accessed through the single global schema. It logically appears to its users like a distributed database.
6.1.3.1 Five-Level Schema Architecture of Federated MDBS
The terms federated database system and federated database architecture were introduced by Heimbigner and McLeod (1985) to address the interaction and sharing among independently designed databases. Their main purpose was to build a loosely coupled federation of different component databases. Later, [Sheth & Larson, 1990] identified a five-level schema architecture for a federated MDBS to address the heterogeneity of an FMDBS, which is depicted in figure 6.6.
(i) Local Schema – A local schema is used to represent each component database of a federated MDBS. A local schema is expressed in the native data model of the component DBMS, and hence different local schemas can be expressed in different data models.
(ii) Component Schema – A component schema is generated by translating a local schema into a data model called the canonical or Common Data Model (CDM) of the FMDBS. Component schemas are used to facilitate negotiation and integration among divergent local schemas in order to execute global tasks.
Figure 6.6 Five-level Schema Architecture of Federated MDBS (levels, top to bottom: external schemas, federated schemas, export schemas, component schemas, local schemas, component databases)
(iii) Export Schema – An export schema is a subset of a component schema, and it is used to represent only those portions of a local database that are authorized by the local DBMS for access by non-local users. The purpose of defining export schemas is to control and manage the autonomy of the component databases.
(iv) Federated Schema – A federated schema is an integration of multiple export schemas. It is always connected with a data dictionary that stores information about the data distribution and the definitions of the different schemas in the heterogeneous environment. There may be multiple federated schemas in an FMDBS, one for each class of federation users.
(v) External Schema or Application Schema – An external schema or application schema is derived from the federated schema and is tailored to different users. An application schema can be a subset of a large, complicated federated schema or may be translated into a different data model to fit a specific user interface and fulfil the requirements of different users. This allows users to impose additional integrity constraints or access control constraints on the federated schema.
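The five levels can be traced with a small Python sketch that derives each schema from the one below it. It assumes, for illustration only, that the CDM is a simple relational model (a relation name mapped to a set of attribute names) and that one component uses a nested document model. The function names are hypothetical, and `federate` performs a naive union rather than real schema integration with conflict resolution.

```python
from typing import Dict, Set

# Assumed CDM: relational, i.e. relation name -> set of attribute names.
Schema = Dict[str, Set[str]]


def flatten(fields: dict, prefix: str = "") -> Set[str]:
    """Flatten nested document fields into attribute names (address.city -> address_city)."""
    names: Set[str] = set()
    for key, sub in fields.items():
        if isinstance(sub, dict):
            names |= flatten(sub, f"{prefix}{key}_")
        else:
            names.add(f"{prefix}{key}")
    return names


def to_component(local: dict, native_model: str) -> Schema:
    """Level (i) -> (ii): translate a local schema into the CDM."""
    if native_model == "relational":
        return {rel: set(attrs) for rel, attrs in local.items()}
    if native_model == "document":
        return {coll: flatten(fields) for coll, fields in local.items()}
    raise ValueError(f"no translator for {native_model!r}")


def restrict(schema: Schema, allowed: Schema) -> Schema:
    """Keep only the allowed relations and attributes. Used both for level (iii),
    the export schema, and for level (v), the external schema."""
    out: Schema = {}
    for rel, attrs in schema.items():
        kept = attrs & allowed.get(rel, set())
        if kept:
            out[rel] = kept
    return out


def federate(*exports: Schema) -> Schema:
    """Level (iv): integrate export schemas. Naively assumes that relations
    with the same name mean the same thing (no conflict resolution)."""
    fed: Schema = {}
    for exp in exports:
        for rel, attrs in exp.items():
            fed.setdefault(rel, set()).update(attrs)
    return fed


# Usage: one relational and one document-model component database.
comp_a = to_component({"emp": ["eno", "ename", "salary"]}, "relational")
comp_b = to_component({"emp": {"eno": "int", "address": {"city": "str"}}}, "document")
exp_a = restrict(comp_a, {"emp": {"eno", "ename"}})      # salary stays private
exp_b = restrict(comp_b, {"emp": {"eno", "address_city"}})
fed = federate(exp_a, exp_b)
ext = restrict(fed, {"emp": {"ename", "address_city"}})  # one user class's view
```

Export and external schemas are both produced by the same `restrict` helper because, in this simplified model, both are subsets of the schema beneath them; they differ in who defines them (the local DBMS versus a class of federation users).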
6.1.3.1.1 Reference Architecture of Tightly Coupled Federated MDBS
The architecture of a federated MDBS is primarily determined by which schemas are present, how they are arranged and how they are constructed. A reference architecture is necessary to understand, categorize and compare different architectural options for developing federated database systems. This section describes the reference architecture of a tightly coupled federated MDBS. Usually, a federated MDBS is designed in a bottom-up manner to integrate a number of existing heterogeneous databases. In a tightly coupled federated MDBS, the federated schema takes the form of schema integration. For simplicity, a single (logical) federation is considered for the entire system, and it is represented by a global conceptual schema (GCS). A number of export schemas are integrated into the global conceptual schema, where the export schemas are created by negotiation between the local databases and the global conceptual schema. Thus, in an FMDBS, the global conceptual schema is a subset of the local conceptual schemas and consists of the data that each local DBMS agrees to share. The global conceptual schema of a tightly coupled federated MDBS involves the integration of either parts of local conceptual schemas or local external schemas. Global external schemas are generated by negotiation between global users and the global conceptual schema.
Figure 6.7 Reference Architecture of Tightly Coupled Federated MDBS (components shown: two global external schemas, a global conceptual schema, four local external schemas and, at each of two sites, a local conceptual schema, a local internal schema and a DB)
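A tightly coupled federated MDBS with a single federation can be modelled as an object that owns the one global conceptual schema. In the hypothetical Python sketch below, the federation integrates each site's export schema into the GCS, using a correspondence table (assumed here to be the outcome of negotiation between the site and the federation) to resolve a naming conflict, and it accepts a global external schema only if that schema can be derived from the GCS. The class and method names are invented for illustration.

```python
from typing import Dict, List, Set, Tuple

Schema = Dict[str, Set[str]]  # relation name -> attribute names


class TightFederation:
    """Toy single-federation, tightly coupled FMDBS with one GCS."""

    def __init__(self) -> None:
        self.gcs: Schema = {}
        # global relation -> [(site, local relation)] that contribute to it
        self.sources: Dict[str, List[Tuple[str, str]]] = {}

    def integrate(self, site: str, export: Schema,
                  correspondences: Dict[str, str]) -> None:
        """Merge one site's export schema into the GCS. `correspondences`
        maps local relation names to the agreed global names."""
        for local_rel, attrs in export.items():
            global_rel = correspondences.get(local_rel, local_rel)
            self.gcs.setdefault(global_rel, set()).update(attrs)
            self.sources.setdefault(global_rel, []).append((site, local_rel))

    def define_external(self, view: Schema) -> Schema:
        """Accept a global external schema only if it is derivable from the GCS."""
        for rel, attrs in view.items():
            if rel not in self.gcs or not attrs <= self.gcs[rel]:
                raise ValueError(f"{rel} {sorted(attrs)} is not in the GCS")
        return view


# Usage: site2 calls its relation STAFF, which the federation maps onto EMP.
fed = TightFederation()
fed.integrate("site1", {"EMP": {"eno", "name"}}, {})
fed.integrate("site2", {"STAFF": {"eno", "dept"}}, {"STAFF": "EMP"})
```

The key design point is that integration and validation live inside the federation object, mirroring the text's statement that in a tightly coupled FMDBS the federation and its administrators, not the users, create and maintain the integration.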
6.1.3.1.2 Reference Architecture of Loosely Coupled Federated MDBS
In contrast with a tightly coupled federated MDBS, schema integration does not take place in a loosely coupled federated MDBS; therefore, a loosely coupled federated MDBS cannot have a global conceptual schema. In this case, federated schemas for global users are defined by importing export schemas using a user interface or an application program, or by defining a multidatabase language query that references export schema objects of local databases. Export schemas are created based on the local component databases. Thus, in a loosely coupled federated MDBS, a global external schema consists of one or more local conceptual schemas. The reference architecture of a loosely coupled federated MDBS is depicted in figure 6.8.
Figure 6.8 Reference Architecture of Loosely Coupled Federated MDBS (components shown: two global external schemas, four local external schemas and, at each of two sites, a local conceptual schema, a local internal schema and a DB; there is no global conceptual schema)
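For contrast, the hypothetical Python sketch below models a loosely coupled federated MDBS. It keeps only the sites' export schemas and maintains no global conceptual schema; each user imports export schema objects under names of their own choosing, so different users can build different federated schemas over the same sources. The class, method and view names are invented for illustration.

```python
from typing import Dict, Set, Tuple

Schema = Dict[str, Set[str]]       # relation name -> attribute names
Import = Tuple[str, str, Set[str]]  # (site, exported relation, attributes)


class LooseFederation:
    """Toy loosely coupled FMDBS: it stores export schemas only; there is
    no global conceptual schema and no administrator-built integration."""

    def __init__(self, exports: Dict[str, Schema]) -> None:
        self.exports = exports  # site -> export schema

    def import_schema(self, imports: Dict[str, Import]) -> Dict[str, Import]:
        """Build one user's federated schema. `imports` maps a name chosen by
        the user to (site, exported relation, attributes). The user, not the
        system, owns and maintains this mapping."""
        view: Dict[str, Import] = {}
        for name, (site, rel, attrs) in imports.items():
            exported = self.exports.get(site, {}).get(rel)
            # Only objects that a site has actually exported can be imported.
            if exported is None or not attrs <= exported:
                raise ValueError(f"{site}.{rel} does not export {sorted(attrs)}")
            view[name] = (site, rel, set(attrs))
        return view


# Usage: two users build independent federated schemas over the same exports.
fed = LooseFederation({"site1": {"EMP": {"eno", "name"}},
                       "site2": {"STAFF": {"eno", "dept"}}})
hr_view = fed.import_schema({"Worker": ("site1", "EMP", {"name"}),
                             "Unit": ("site2", "STAFF", {"dept"})})
audit_view = fed.import_schema({"People": ("site1", "EMP", {"eno", "name"})})
```

Here `hr_view` and `audit_view` are two separate federated schemas defined by their users, which mirrors the statement that a loosely coupled FMDBS always supports multiple federated schemas.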
6.2 Chapter Summary
This chapter introduces several alternative architectures for a distributed database system, such as client/server, peer-to-peer and multi-database systems.
• A client/server system is a versatile, message-based and modular infrastructure that is intended to improve usability, flexibility, interoperability and scalability as compared to centralized, mainframe, time-sharing computing. In a client/server system, there are two different kinds of nodes: clients and servers. In the simplest sense, clients request services from the server and servers provide services to the clients.
• In a peer-to-peer architecture, each node provides user interaction facilities as well as processing capabilities. A peer-to-peer architecture provides both client and server functionality on each node.
• A multi-database system is a software system that attempts to logically integrate several different independent distributed DBMSs while allowing the local DBMSs to maintain complete control of their operations. MDBSs can be classified into two different categories: non-federated MDBS and federated MDBS. A federated MDBS is further categorized as loosely coupled federated MDBS and tightly coupled federated MDBS. Further, a tightly coupled federated MDBS can be classified as single federation tightly coupled federated MDBS and multiple federations tightly coupled federated MDBS.
6.3 Review Questions
1. Describe the architecture of a client/server system. Compare and contrast the client/server and peer-to-peer architectures of a distributed DBMS.
2. Write down the benefits of a client/server system.
3. Compare and contrast the two-tier and three-tier architectures of a client/server system.
4. Write a short note on peer-to-peer architecture.
5. Briefly discuss the component architecture of a distributed DBMS. Draw the reference architecture of a distributed database.
6. Why are the top three layers in the reference architecture for distributed database systems often referred to as site-independent schemas? Define the physical image of a global relation at a site.
7. Comment on the different types of dependency for the bottom layers in the reference architecture of a distributed DBMS. Explain the role of the local mapping schema in the integration of heterogeneous multi-site databases in this context.
8. Comment on the statement “The allocation schema in distributed database architecture is site independent”.
9. What is distributed data independence? Explain how distributed data independence is provided by the architecture of a distributed DBMS.
10. What is a multi-database system? Discuss the utilities of such a database system.
11. Briefly discuss the classification of multi-database systems.
12. Differentiate between federated and non-federated multi-database systems.
13. Write down the features of a federated multi-database system.
14. Compare the federated schema and the export schema of a federated MDBS.
15. Describe the reference architecture of a loosely coupled federated multi-database system.
Exercises
1. Multiple Choice Questions:
(i) Which of the following computing models is used by a distributed database system?
a. Mainframe computing model
b. Disconnected, personal computing model
c. Client/Server computing model
d. None of these.
(ii) Which of the following is not a property of a server?
a. Active (Master)
b. Waiting for requests
c. On request, serves clients and sends reply
d. None of these.
(iii) Which of the following is not a property of a client?
a. Active (Master)
b. Sending requests
c. Waits until reply arrives
d. None of these.
(iv) Which of the following statements is correct?
a. A heavy client complicates the server code
b. A heavy client simplifies the client code
c. A heavy client simplifies the server code
d. None of these.
(v) A thin client
a. Simplifies the server code
b. Complicates the client code
c. Complicates the server code
d. Simplifies both client and server code.
(vi) In distributed DBMS, distributed physical data independence is provided by
a. Local conceptual schema
b. Local external schema
c. Global conceptual schema
d. Local mapping schema.
(vii) In distributed DBMS, distributed logical data independence is provided by
a. Local conceptual schema
b. Global external schema
c. Global conceptual schema
d. Local mapping schema.
(viii) Which of the following statements is correct?
a. There is no local user in a federated MDBS
b. There is no global user in a non-federated MDBS
c. There is no local user in a non-federated MDBS
d. None of these.
(ix) Which of the following statements is incorrect?
a. An MDBS provides DBMS transparency
b. An MDBS provides complete autonomy to individual database systems
c. An MDBS controls multiple gateways
d. None of the above
e. All of the above.
(x) Which of the following statements is true?
a. A loosely coupled federated MDBS has no global conceptual schema
b. A tightly coupled federated MDBS has no global external schema
c. A loosely coupled federated MDBS has no local external schema
d. A tightly coupled federated MDBS has no local conceptual schema.
(xi) In a peer-to-peer architecture,
a. Each node has the client functionality
b. Each node has the same capability
c. Each node has the client functionality as well as server functionality
d. Both a. and c.
(xii) Which of the following schemas is used in a federated MDBS?
a. Component schema
b. Federated schema
c. Export schema
d. All of these
e. None of these.