Bill Inmon vs. Ralph Kimball
Vladimir Marković, software architect at Banca Intesa Beograd
Agenda •
•
Facts without bias –
History
–
Data warehouse paradigms
–
Diversity
–
Pros & Cons
–
Make decision
Make compromise –
Find the weaknesses
–
Development techniques
–
Take the best of the both approaches
Facts without bias
History •
•
•
1990 - Inmon publishes “Building the Data Warehouse” 1996 - Kimball publishes “The Data Warehouse Toolkit” 2002 –
–
Inmon updates book and defines architecture for collection of disparate sources into detailed, time variant data store - the top down approach Kimball updates book and defines multiple databases called data marts that are organized by business processes, but use enterprise standard data bus -the bottom-up approach
Data warehouse paradigms •
•
Bill Inmon's paradigm: Data warehouse is one part of the overall business intelligence system. An enterprise has one data warehouse, and data marts source their information from the data warehouse. In the data warehouse, information is stored in 3rd normal form. Ralph Kimball's paradigm: Data warehouse is the conglomerate of all data marts within the enterprise. Information is always stored in the dimensional model.
Diversity Ralph Kimball
Bill Inmon
Build business process oriented small data marts which are joined to each other using common dimensions between business process.
One centralize data warehouse which will act as a enterprise-wide datawarehouse and then build data mart as per need for specific department or process
It is known as bottom-up approach
It is known as top down approach
Data marts should be build on dimensional modelling approach
Central data warehouse to follow ER modeling approach
Inmon’s top-down
approach
Bill Inmon, an early and influential practitioner, has formally defined a data warehouse in the following terms; –
Subject-oriented •
–
Time-variant •
–
The changes to the data in the database are tracked and recorded so that reports can be produced showing changes over time;
Non-volatile •
–
The data in the database is organized so that all the data elements relating to the same real-world event or object are linked together;
Data in the database is never over-written or deleted - once committed, the data is static, read-only, but retained for future reporting;
Integrated •
The database contains data from most or all of an organization's operational applications, and that this data is made consistent
Kimball’s bottom-up
approach
Ralph Kimball, a leading proponent of the dimensional approach to building data warehouses, provides a succinct definition for a data warehouse: –
“A copy of transaction data specifically structured for query and analysis.“
Inmon vs. Kimball: Similar or different? •
Kimball, in 1997, stated that –
–
•
"...the data warehouse is nothing more than the union of all the data marts", Kimball indicates a bottom-up data warehousing methodology in which individual data marts providing thin views into the organizational data could be created and later combined into a larger all-encompassing data warehouse.
Inmon responded in 1998 by saying, –
–
"You can catch all the minnows in the ocean and stack them together and they still do not make a whale ,“ This indicates the opposing view that the data warehouse should be designed from the top-down to include all corporate data. In this methodology, data marts are created only after the complete data warehouse has been created.
Inmon vs. Kimball: Similar or different?
Pros and cons of both the approaches
How to decide? •
•
•
•
Insurance: It is vital to get the overall picture with respect to individual clients, groups, history of claims, mortality rate tendencies, demography, profitability of each plan and agents, etc. All aspects are inter-related and therefore suited for the Inmon’s approach. Marketing: This is a specialized division, which does not call for enterprise warehouse. Only data marts are required. Hence, Kimball’s approach is suitable. CRM in banks: The focus is on parameters such as products sold, up-sell and cross-sell at a customer-level. It is not necessary to get an overall picture of the business. For example, there is no need to link a customer’s details to the treasury department dealing with forex transactions and regulations. Since the scope is limited, you can go for Kimball’s method. However, if the entire processes and divisions in the bank are to be linked, the obvious choice is Inmon’s design vs. Kimball’s. Manufacturing: Multiple functions are involved here, irrespective of the budget involved. Thus, where there is a systemic dependency as in this case, an enterprise model is required. Hence Inmon’s method is ideal.
In a nutshell
Bill Inmon’s enterprise data warehouse approach (the top-down design): A normalized data model is designed first. Then the dimensional data marts, which contain data required for specific business processes or specific departments are created from the data warehouse. Ralph Kimball’s dimensional design approach (the bottom-up design): The data marts facilitating reports and analysis are created first; these are then combined together to create a broad data warehouse.
Make compromise ???
Find the weaknesses
Development techniques •
User oriented
•
Operational oriented
•
Business process oriented
User oriented warehouse development •
•
•
•
the significant metrics are based on the company’s critical success factors and these factors depend on the company’s management strategy and goals the corresponding dimensions relate to influences on the critical success factors use interviews in gathering the informational requirements problems: –
potential ignorance of the users
–
most users have only a slight idea of their demands
Operational oriented warehouse development •
•
•
to analyze the data models of the underlying operational sources and to identify the relevant transactions the major problem with operational oriented warehouse development is the restriction of data models. data models specify the business of a company incompletely
Business process oriented warehouse development (Kimball) •
•
the business process model contains a formal description of the informational requirements of the users possibility to identify informational requirements that could not be satisfied with the actual information offerings of the source systems
How to design dimensional model -Kimball approach•
DW gathering and documenting enterpriselevel business requirements
•
Analytic themes
•
Report bus matrix
•
Data quality review – gap analysis
•
Design DW data model
•
Data dictionary
What would the sponsors like to see? •
Decrease startup cost
•
Speed up initial set – up
•
Transparent business value of the deliverables – how to measure the investment
•
Decision making support
•
…
What would the BI users like to see? •
•
Covered requested analysis Understandable and easy to use structures for ad hoc reporting
•
Predefined reports and dashboards
•
Excellent BI tools
•
…
What would IT folks like to see? •
Big budget
•
Clearly functional specification
•
Enough time for developing without stress
•
Transparent UAT
•
Easy to maintenance
•
Easy to enhancement
•
Easy to enrichment
•
…
Take the best of the both approaches •
Use Inmon approach to design EDW – –
•
Spend time for ETL conceptual design & strategy –
– –
•
– –
• • •
Create two type of tables for each entities : staging table with SCD Type1 attributes and row version table with SCD Type2 attributes Establish view layer if it possible Establish metadata repository and try to make code generator
Use Kimball approach to design DMs –
•
design EDW data model for specific industry if you have time or buy if you have money try to identify all entities (dimensions) without insight in particular data mart (DM)
cover all requested analysis use shared dimensions and star schema make dimension model understandable and easy to use for business users
Populate EDW with all dimensional data – serve improvement Populate EDW with fact data necessary to feed DMs Populate DDMs using EDW as staging Develop BI applications – try to get more without redesign DM and developing ETLs
Books: •
•
•
•
Building The Data Warehouse. Third Edition – Bill Inmon The Data Warehouse Toolkit, Second Edition - Ralph Kimball, Margy Ross The Data Warehouse Lifycycle Toolkit, Second Edition – Ralph Kimball, Margy Ross, Waren Thornthwaite, Joy Mundy, Bob Bacer The Microsoft Data Warehouse Toolkit - Ralph Kimball, Waren Thornthwaite, Joy Mundy
Articles: • • •
•
• • • • • •
Data Warehousing: Our Great Debate Wraps Up Can't Inmon and Kimball just get along? Inmon vs. Kimball data warehousing: the debate over DW architecture How to choose between the Inmon vs. Kimball approach for data warehouse design Bill Inmon: Kimball methodology ignores the value of textual data Data Warehouse Architecture: The Great Debate Ralph Kimball vs. Bill Inmon approaches to data warehouse design Alternative data warehousing frameworks Resources for Oracle DBA learning about DW principles Is an Inmon-modeled BI system, like Madison, the future of data warehousing?