Visit: http://adityamacro.blogspot.in/
MASTER OF COMPUTER APPLICATIONS (MCA)
MCA/ASSIGN/SEMESTER-IV ASSIGNMENTS (July - 2016 & January - 2017)
SCHOOL OF COMPUTER AND INFORMATION SCIENCES
INDIRA GANDHI NATIONAL OPEN UNIVERSITY
MAIDAN GARHI, NEW DELHI – 110 068

Course Code: MCS-043
Course Title: Advanced Database Management Systems
Assignment No.: MCA(4)/043/Assignment/16-17
Submission Schedule: July-December Session – 15th October, 2016; January-June Session – 15th April, 2017

Solved By, Aditya ([email protected])
Q3. Explain MVD (Multi Valued Dependencies) and join dependency with the help of an example of your choice.
Answer: Classification of Dependencies in DBMS:

Dependency                 Normal Form that removes it
Partial Dependencies       Second Normal Form (2NF)
Transitive Dependencies    Third Normal Form (3NF)
Multivalued Dependencies   Fourth Normal Form (4NF)
Join Dependencies          Fifth Normal Form (5NF)
Inclusion Dependency       (dependencies among the relations/tables or databases)
Multivalued Dependency: Consider a relation Faculty (FID, Course, Book) which consists of two multivalued attributes (Course and Book). The two multivalued attributes are independent of each other.

With multivalued attributes:
FID   Course   Book
1     C1/C2    B1/B2
2     C1       B1

After flattening the multivalued attributes:
FID   Course   Book
1     C1       B1
1     C1       B2
1     C2       B1
1     C2       B2
2     C1       B1
It is clear that there are multiple copies of the information about Course and Book. This is an example of a multivalued dependency, which occurs when a relation has more than one independent multivalued attribute. A multivalued dependency occurs when a relation R has attributes A (FID), B (Course) and C (Book) such that:
• A determines a set of values for B,
• A determines a set of values for C, and
• B and C are independent of each other (there is no relation between Course and Book).
These multivalued dependencies are written as: FID →→ Course and FID →→ Book.
Join Dependency: Let R be a relation and let A, B, …, Z be arbitrary subsets of R's attributes. R satisfies the join dependency JD *(A, B, …, Z) if and only if R is equal to the join of its projections on A, B, …, Z. A join dependency JD(R1, R2, …, Rn) specified on relation schema R is a trivial JD if one of the relation schemas Ri in JD(R1, R2, …, Rn) is equal to R.
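As a quick check, the decomposition implied by FID →→ Course and FID →→ Book can be verified over the sample tuples above (a minimal Python sketch):

```python
# Faculty(FID, Course, Book) should equal the natural join of its
# projections on (FID, Course) and (FID, Book) if the MVDs hold.

faculty = {(1, "C1", "B1"), (1, "C1", "B2"), (1, "C2", "B1"),
           (1, "C2", "B2"), (2, "C1", "B1")}

fid_course = {(fid, c) for fid, c, _ in faculty}   # projection on (FID, Course)
fid_book   = {(fid, b) for fid, _, b in faculty}   # projection on (FID, Book)

# Natural join of the two projections on FID
joined = {(fid, c, b) for fid, c in fid_course
                      for fid2, b in fid_book if fid == fid2}

print(joined == faculty)  # True: the decomposition is lossless
```

The same test generalises to join dependencies: R satisfies JD *(A, B, …, Z) exactly when joining all the projections reproduces R with no spurious tuples.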
Q4. Define a simple hash-join and explain the process and cost calculations of hash-join with the help of an example.
Answer: Hash-Join: This is applicable to both equi-joins and natural joins. A hash function h is used to partition the tuples of both relations, where h maps the joining attribute (enroll no in our example) values to {0, 1, ..., n-1}. The join attribute is hashed to the join-hash partitions. In the example of Figure 4 we have used the mod 100 function for hashing, i.e., n = 100.
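The partition-and-probe process can be sketched as follows (a minimal Python illustration; the tuple contents here are made up for the example):

```python
# Simple hash-join of STUDENT and MARKS on enroll_no, with h(x) = x mod 100
# as in the text. Only tuples landing in the same partition are compared.

N = 100  # number of partitions

student = [(101, "Anu"), (202, "Ben"), (305, "Cal")]   # (enroll_no, name)
marks   = [(101, 78), (305, 91), (404, 55)]            # (enroll_no, marks)

def partition(rel, key_index):
    parts = [[] for _ in range(N)]
    for t in rel:
        parts[t[key_index] % N].append(t)   # hash on the join attribute
    return parts

s_parts, m_parts = partition(student, 0), partition(marks, 0)

# Build-and-probe: for each partition, build an in-memory hash table on
# the STUDENT side and probe it with the MARKS side.
result = []
for i in range(N):
    build = {t[0]: t for t in s_parts[i]}
    for m in m_parts[i]:
        if m[0] in build:
            result.append(build[m[0]] + m[1:])

print(result)  # [(101, 'Anu', 78), (305, 'Cal', 91)]
```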
Once the partition tables of STUDENT and MARKS are made on the enrolment number, only the corresponding partitions participate in the join: a STUDENT tuple and a MARKS tuple that satisfy the join condition have the same value for the join attribute, so they hash to the same partition and can be joined easily.
Cost calculation for Simple Hash-Join:
(i) Cost of partitioning r and s: all the blocks of r and s are read once and, after partitioning, written back, so cost1 = 2 (blocks of r + blocks of s).
(ii) Cost of performing the hash-join using build and probe: at least one block transfer is required for reading the partitions, so cost2 = (blocks of r + blocks of s).
(iii) A few more blocks in main memory may be used for evaluation and may be read or written back. We ignore this cost as it is negligible compared with cost1 and cost2.
Thus, the total cost = cost1 + cost2 = 3 (blocks of r + blocks of s).
Q5. The following are the relational schemes of Employee, Project and Assigned-to: Employee (Emp#, Emp_name, Profession), Project (Proj#, Proj_name, Chief_Architect), Assigned-to (Proj#, Emp#). Create appropriate samples of each relation according to the question. Write the following queries in SQL. (i) Get Emp# of employees working on Project numbered MCS-043. (ii) Get details of employees working on database projects.
(iii) Finally create an optimal query tree for each query.
Answer: Let's put some dummy values into the tables of the database as below:

Table Name: Employee
Emp#   Emp_name      Profession
Emp1   Vinay         DBA
Emp2   Harish        DBA
Emp3   Ramesh        Web Developer
Emp4   SatyaRanjan   Web Designer
Emp5   Gagan         Software Engineer

Table Name: Project
Proj#     Proj_name          Chief_Architect
MCS-043   Database           Vinay
MCS-041   Operating System   Akhand
MCS-042   Networking         Kishor
MCS-012   C Language         Abhinav
MCS-022   Java               Amit

Table Name: Assigned-to
Proj#     Emp#
MCS-043   Emp1
MCS-043   Emp2
MCS-042   Emp3
MCS-041   Emp4
MCS-012   Emp5
1. To get Emp# of employees working on Project numbered MCS-043, the query below will be used:
Select Emp# from Assigned-to where Proj# = 'MCS-043';
And the output will be Emp1, Emp2.
2. To get details of employees working on database projects, join all three tables and filter on the project name:
Select E.* from Employee E, Assigned-to A, Project P where E.Emp# = A.Emp# and A.Proj# = P.Proj# and P.Proj_name = 'Database';
And the output will be (Emp1, Vinay, DBA) and (Emp2, Harish, DBA).
3. Optimal query trees (selections and projections pushed down as far as possible):
Query (i): π Emp# (σ Proj# = 'MCS-043' (Assigned-to))
Query (ii): Employee ⋈ (Assigned-to ⋈ π Proj# (σ Proj_name = 'Database' (Project)))
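The two queries can be tried out against the sample data (a sketch using Python's built-in sqlite3 as a stand-in SQL engine; the table is named Assigned_to because a hyphen is not a valid unquoted SQL identifier, and column names containing # must be quoted):

```python
import sqlite3

# In-memory database mirroring the sample tables above
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute('CREATE TABLE Employee ("Emp#" TEXT, Emp_name TEXT, Profession TEXT)')
cur.execute('CREATE TABLE Project ("Proj#" TEXT, Proj_name TEXT, Chief_Architect TEXT)')
cur.execute('CREATE TABLE Assigned_to ("Proj#" TEXT, "Emp#" TEXT)')
cur.executemany('INSERT INTO Employee VALUES (?,?,?)', [
    ('Emp1', 'Vinay', 'DBA'), ('Emp2', 'Harish', 'DBA'),
    ('Emp3', 'Ramesh', 'Web Developer'), ('Emp4', 'SatyaRanjan', 'Web Designer'),
    ('Emp5', 'Gagan', 'Software Engineer')])
cur.executemany('INSERT INTO Project VALUES (?,?,?)', [
    ('MCS-043', 'Database', 'Vinay'), ('MCS-041', 'Operating System', 'Akhand'),
    ('MCS-042', 'Networking', 'Kishor'), ('MCS-012', 'C Language', 'Abhinav'),
    ('MCS-022', 'Java', 'Amit')])
cur.executemany('INSERT INTO Assigned_to VALUES (?,?)', [
    ('MCS-043', 'Emp1'), ('MCS-043', 'Emp2'), ('MCS-042', 'Emp3'),
    ('MCS-041', 'Emp4'), ('MCS-012', 'Emp5')])

# (i) Emp# of employees working on project MCS-043
q1 = cur.execute('SELECT "Emp#" FROM Assigned_to WHERE "Proj#" = ? '
                 'ORDER BY "Emp#"', ('MCS-043',)).fetchall()
print(q1)  # [('Emp1',), ('Emp2',)]

# (ii) details of employees working on database projects
q2 = cur.execute('''
    SELECT E.* FROM Employee E
    JOIN Assigned_to A ON E."Emp#" = A."Emp#"
    JOIN Project P ON A."Proj#" = P."Proj#"
    WHERE P.Proj_name = 'Database'
    ORDER BY E."Emp#" ''').fetchall()
print(q2)  # [('Emp1', 'Vinay', 'DBA'), ('Emp2', 'Harish', 'DBA')]
```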
Q6. Given the following semi-structured data in XML, create the DTD (Document Type Declaration) for it.
What are the different options available for storing XML data? Answer: In semi-structured data, the schema or format information is mixed with the data values, since each data object can have different attributes that are not known earlier. Thus, this type of data is sometimes referred to as self-describing data. The basic object in XML is the XML document. There are two main structuring concepts that construct an XML document: elements and attributes. Attributes in XML describe elements. Elements are identified in a document by their start tag and end tag. The tag names are enclosed between angular brackets <…>, and end tags are further identified by a slash </…>. Complex elements are constructed from other elements hierarchically, whereas simple elements contain data values. Thus, there is a correspondence between the XML textual representation and the tree structure. In the tree representation of XML, internal nodes represent complex elements, whereas leaf nodes represent simple elements. That is why the XML model is called a tree model or a hierarchical model.
Document Type Declaration (DTD): DTD is one of the components of an XML document. A DTD is used to define the syntax and grammar of a document, that is, it defines the meaning of the document elements. XML defines a set of keywords, rules, data types, etc. to define the permissible structure of XML documents. In other words, you use the DTD grammar to define the grammar of your XML documents. The given semi-structured XML data is:
<Student_Data>
  <Student>
    <Name>Vishal</Name>
    <Address>Rohini Sector 22, Delhi</Address>
  </Student>
  <Student>
    <Name>Vinod</Name>
    <Address>Rajeev Chowk, New Delhi</Address>
  </Student>
</Student_Data>
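A DTD for this data might look as follows (the element names Student, Name and Address are assumptions, since only the root name Student_Data and the data values survive in the source):

```xml
<!DOCTYPE Student_Data [
  <!ELEMENT Student_Data (Student+)>
  <!ELEMENT Student (Name, Address)>
  <!ELEMENT Name (#PCDATA)>
  <!ELEMENT Address (#PCDATA)>
]>
```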
Q7. What is data mining? How is it different from OLTP? What is classification in context of data mining? Answer: Data Mining: Data is growing at a phenomenal rate today and users expect more sophisticated information from this data. There is a need for new techniques and tools that can automatically generate useful information and knowledge from large volumes of data. Data mining is one such technique of generating hidden information from the data. Data mining can be defined as: "an automatic process of extraction of non-trivial or implicit or previously unknown but potentially useful information or patterns from data in large databases, data warehouses or in flat files". Data mining is related to the data warehouse in this respect: a data warehouse is well equipped for providing data as input for the data mining process. The advantages of using the data of a data warehouse for data mining are many; some of them are listed below: • Data quality and consistency are essential for data mining, to ensure the accuracy of the predictive models. In data warehouses, before loading the data, it is first extracted, cleaned and transformed. We will get good results only if we have good quality data. • A data warehouse consists of data from multiple sources. The data in data warehouses is integrated and subject-oriented, and the data mining process is performed on this data. • In data mining, the required data may be aggregated or summarised data. This is already there in the data warehouse. • A data warehouse provides the capability of analysing data by using OLAP operations. Thus, the results of a data mining study can be analysed for hitherto uncovered patterns. As defined earlier, data mining generates potentially useful information or patterns from data. In fact, the information generated through data mining can be used to create knowledge. So let us, first, define the three terms data, information and knowledge.
OLTP vs. OLAP: We can divide IT systems into transactional (OLTP) and analytical (OLAP). In general, we can assume that OLTP systems provide source data to data warehouses, whereas OLAP systems help to analyse it.
OLTP (On-line Transaction Processing) is characterized by a large number of short on-line transactions (INSERT, UPDATE, DELETE). The main emphasis for OLTP systems is put on very fast query processing, maintaining data integrity in multi-access environments, and an effectiveness measured by the number of transactions per second. An OLTP database contains detailed and current data, and the schema used to store transactional databases is the entity model (usually 3NF). OLAP (On-line Analytical Processing) is characterized by a relatively low volume of transactions. Queries are often very complex and involve aggregations. For OLAP systems, response time is the effectiveness measure. OLAP applications are widely used by data mining techniques. An OLAP database contains aggregated, historical data, stored in multi-dimensional schemas (usually a star schema). The following table summarizes the major differences between OLTP and OLAP system design.
Specification: OLTP System (Online Transaction Processing; Operational System) vs. OLAP System (Online Analytical Processing; Data Warehouse)

Source of data:
  OLTP: Operational data; OLTPs are the original source of the data.
  OLAP: Consolidated data; OLAP data comes from the various OLTP databases.
Purpose of data:
  OLTP: To control and run fundamental business tasks.
  OLAP: To help with planning, problem solving, and decision support.
What the data reveals:
  OLTP: A snapshot of ongoing business processes.
  OLAP: Multi-dimensional views of various kinds of business activities.
Inserts and updates:
  OLTP: Short and fast inserts and updates initiated by end users.
  OLAP: Periodic long-running batch jobs refresh the data.
Queries:
  OLTP: Relatively standardized and simple queries returning relatively few records.
  OLAP: Often complex queries involving aggregations.
Processing speed:
  OLTP: Typically very fast.
  OLAP: Depends on the amount of data involved; batch data refreshes and complex queries may take many hours; query speed can be improved by creating indexes.
Space requirements:
  OLTP: Can be relatively small if historical data is archived.
  OLAP: Larger due to the existence of aggregation structures and history data; requires more indexes than OLTP.
Database design:
  OLTP: Highly normalized with many tables.
  OLAP: Typically de-normalized with fewer tables; use of star and/or snowflake schemas.
Backup and recovery:
  OLTP: Backup religiously; operational data is critical to run the business, and data loss is likely to entail significant monetary loss and legal liability.
  OLAP: Instead of regular backups, some environments may consider simply reloading the OLTP data as a recovery method.
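Classification, in the context of data mining, is the task of assigning a data item to one of a set of predefined classes on the basis of a model learnt from previously labelled (training) data; decision trees, Bayesian classifiers and nearest-neighbour methods are typical techniques. A minimal sketch of the idea, using a 1-nearest-neighbour classifier (the training data below is invented for illustration):

```python
# Classification sketch: label a new sample with the class of its
# closest labelled training example (1-nearest-neighbour).

def classify(sample, training_data):
    """Assign `sample` the label of its nearest training example."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))  # squared Euclidean
    nearest = min(training_data, key=lambda pair: dist(sample, pair[0]))
    return nearest[1]

# Labelled training set: (income, debt) -> credit-risk class
training = [((90, 10), "low-risk"),
            ((30, 70), "high-risk"),
            ((80, 20), "low-risk")]

print(classify((85, 15), training))  # low-risk
print(classify((25, 80), training))  # high-risk
```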
Q8. How will you enforce referential integrity constraints in Oracle? Explain with the help of one example.
Answer: Data integrity explains how to use integrity constraints to enforce the business rules associated with your database and prevent the entry of invalid information into tables. Introduction to Data Integrity: It is important that data adhere to a predefined set of rules, as determined by the database administrator or application developer. As an example of data integrity, consider the tables employees and departments and the business rules for the information in each of the tables, as illustrated in the figure.
Types of Data Integrity: This section describes the rules that can be applied to table columns to enforce different types of data integrity. Null Rule: A null rule is a rule defined on a single column that allows or disallows inserts or updates of rows containing a null (the absence of a value) in that column. Unique Column Values: A unique value rule defined on a column (or set of columns) allows the insert or update of a row only if it contains a unique value in that column (or set of columns). Primary Key Values: A primary key value rule defined on a key (a column or set of columns) specifies that each row in the table can be uniquely identified by the values in the key. Referential Integrity Rules: A referential integrity rule is a rule defined on a key (a column or set of columns) in one table that guarantees that the values in that key match the values in a key in a related table (the referenced value). Referential integrity also includes the rules that dictate what types of data manipulation are allowed on referenced values and how these actions affect dependent values. The rules associated with referential integrity are: Restrict: Disallows the update or deletion of referenced data.
Set to Null: When referenced data is updated or deleted, all associated dependent data is set to NULL.
Set to Default: When referenced data is updated or deleted, all associated dependent data is set to a default value.
Cascade: When referenced data is updated, all associated dependent data is correspondingly updated. When a referenced row is deleted, all associated dependent rows are deleted.
No Action: Disallows the update or deletion of referenced data. This differs from RESTRICT in that it is checked at the end of the statement, or at the end of the transaction if the constraint is deferred. (Oracle uses No Action as its default action.)
How Oracle Enforces Data Integrity: Oracle enables you to define and enforce each type of data integrity rule defined in the previous section. Most of these rules are easily defined using integrity constraints or database triggers.
Integrity Constraints Description: An integrity constraint is a declarative method of defining a rule for a column of a table. Oracle supports the following integrity constraints:
• NOT NULL constraints for the rules associated with nulls in a column
• UNIQUE key constraints for the rule associated with unique column values
• PRIMARY KEY constraints for the rule associated with primary identification values
• FOREIGN KEY constraints for the rules associated with referential integrity. Oracle supports the use of FOREIGN KEY integrity constraints to define the referential integrity actions, including: update and delete No Action, delete CASCADE, and delete SET NULL.
• CHECK constraints for complex integrity rules
Note: You cannot enforce referential integrity using declarative integrity constraints if child and parent tables are on different nodes of a distributed database. However, you can enforce referential integrity in a distributed database using database triggers.
Database Triggers: Oracle also lets you enforce integrity rules with a non-declarative approach using database triggers (stored database procedures automatically invoked on insert, update, or delete operations).
Overview of Integrity Constraints: Oracle uses integrity constraints to prevent invalid data entry into the base tables of the database. You can define integrity constraints to enforce the business rules you want to associate with the information in a database. If any of the results of a DML statement execution violate an integrity constraint, then Oracle rolls back the statement and returns an error. For example, assume that you define an integrity constraint for the salary column of the employees table. This integrity constraint enforces the rule that no row in this table can contain a numeric value greater than 10,000 in this column. If an INSERT or UPDATE statement attempts to violate this integrity constraint, then Oracle rolls back the statement and returns an informative error message.
Referential Integrity Constraints: Different tables in a relational database can be related by common columns, and the rules that govern the relationship of the columns must be maintained. Referential integrity rules guarantee that these relationships are preserved. The following terms are associated with referential integrity constraints.
Foreign key: The column or set of columns included in the definition of the referential integrity constraint that reference a referenced key.
Referenced key: The unique key or primary key of the same or a different table that is referenced by a foreign key.
Dependent or child table: The table that includes the foreign key. Therefore, it is the table that is dependent on the values present in the referenced unique or primary key.
Referenced or parent table: The table that is referenced by the child table's foreign key. It is this table's referenced key that determines whether specific inserts or updates are allowed in the child table.
A referential integrity constraint requires that for each row of a table, the value in the foreign key matches a value in a parent key. Foreign keys can be defined as multiple columns. However, a composite foreign key must reference a composite primary or unique key with the same number of columns and the same datatypes. Because composite primary and unique keys are limited to 32 columns, a composite foreign key is also limited to 32 columns.
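As a concrete example of these rules in action, the following sketch builds a parent/child pair and shows an orphan insert being rejected and a delete cascading. SQLite is used here purely as a convenient stand-in engine; the FOREIGN KEY ... REFERENCES ... ON DELETE CASCADE clause expresses the same idea as the equivalent Oracle DDL, and the table and column names are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, dept_name TEXT)")
conn.execute("""CREATE TABLE employees (
                  emp_id   INTEGER PRIMARY KEY,
                  emp_name TEXT,
                  dept_id  INTEGER REFERENCES departments(dept_id) ON DELETE CASCADE)""")
conn.execute("INSERT INTO departments VALUES (10, 'Sales')")
conn.execute("INSERT INTO employees VALUES (1, 'Asha', 10)")

# A child row whose dept_id has no matching parent violates the constraint
rejected = False
try:
    conn.execute("INSERT INTO employees VALUES (2, 'Ravi', 99)")
except sqlite3.IntegrityError:
    rejected = True
print("orphan insert rejected:", rejected)  # True

# Deleting the parent row cascades to its dependent rows
conn.execute("DELETE FROM departments WHERE dept_id = 10")
print(conn.execute("SELECT COUNT(*) FROM employees").fetchone())  # (0,)
```

With Restrict or No Action instead of Cascade, the DELETE on departments would itself be rejected while dependent rows exist.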
Q9. How does PostgreSQL perform storage and indexing of tables? Briefly discuss the types of indexes involved in PostgreSQL.
Answer: PostgreSQL Indexes: Indexes are special lookup tables that the database search engine can use to speed up data retrieval. Simply put, an index is a pointer to data in a table. An index in a database is very similar to an index in the back of a book. For example, if you want to reference all pages in a book that discuss a certain topic, you first refer to the index, which lists all topics alphabetically, and then turn to one or more specific page numbers. An index helps speed up SELECT queries and WHERE clauses, but it slows down data input with UPDATE and INSERT statements. Indexes can be created or dropped with no effect on the data. Creating an index involves the CREATE INDEX statement, which allows you to name the index, to specify the table and which column or columns to index, and to indicate whether the index is in ascending or descending order. Indexes can also be unique, similar to the UNIQUE constraint, in that the index prevents duplicate entries in the column or combination of columns on which there's an index.
The CREATE INDEX Command: The basic syntax of CREATE INDEX is as follows:
CREATE INDEX index_name ON table_name (column_name, ...);
Index Types: PostgreSQL provides several index types: B-tree, Hash, GiST, SP-GiST and GIN. Each index type uses a different algorithm that is best suited to different types of queries. By default, the CREATE INDEX command creates B-tree indexes, which fit the most common situations.
Single-Column Indexes: A single-column index is one that is created based on only one table column. The basic syntax is as follows:
CREATE INDEX index_name ON table_name (column_name);
Multicolumn Indexes: A multicolumn index is defined on more than one column of a table.
The basic syntax is as follows:
CREATE INDEX index_name ON table_name (column1_name, column2_name);
Whether to create a single-column index or a multicolumn index, take into consideration the column(s) that you may use very frequently in a query's WHERE clause as filter conditions. Should there be only one column used, a single-column index should be the choice. Should there be two or more columns that are frequently used in the WHERE clause as filters, the multicolumn index would be the best choice.
Unique Indexes: Unique indexes are used not only for performance, but also for data integrity. A unique index does not allow any duplicate values to be inserted into the table. The basic syntax is as follows:
CREATE UNIQUE INDEX index_name ON table_name (column_name);
Partial Indexes: A partial index is an index built over a subset of a table; the subset is defined by a conditional expression (called the predicate of the partial index). The index contains entries only for those table rows that satisfy the predicate. The basic syntax is as follows:
CREATE INDEX index_name ON table_name (column_name) WHERE conditional_expression;
Implicit Indexes: Implicit indexes are indexes that are automatically created by the database server when an object is created. Indexes are automatically created for primary key constraints and unique constraints.
Example: Following is an example where we will create an index on the COMPANY table for the salary column:
# CREATE INDEX salary_index ON COMPANY (salary);
Now, let's list all the indexes available on the COMPANY table using the \d company command:
# \d company
This will produce the following result, where company_pkey is an implicit index which got created when the table was created.

       Table "public.company"
 Column  |     Type      | Modifiers
---------+---------------+-----------
 id      | integer       | not null
 name    | text          | not null
 age     | integer       | not null
 address | character(50) |
 salary  | real          |
Indexes:
    "company_pkey" PRIMARY KEY, btree (id)
    "salary_index" btree (salary)
You can list all indexes database-wide using the \di command.
The DROP INDEX Command: An index can be dropped using the PostgreSQL DROP INDEX command. Care should be taken when dropping an index because performance may be slowed or improved. The basic syntax is as follows:
DROP INDEX index_name;
You can use the following statement to delete the previously created index:
# DROP INDEX salary_index;