Data Warehousing - ETL Project Life Cycle
Posted by Suresh at 1:08 AM Labels: Data Warehouse, ETL, Life Cycle
The following are the phases involved in an ETL project development life cycle:
1) Requirement Gathering
   a) Business Requirement Collection ( BRD )
   b) System Requirement Collection ( SRD )
2) Design Phase
   a) High Level Design Document ( HLD )
   b) Low Level Design Document ( LLD )
3) Development Phase
   a) Mapping Design
   b) Code Review
   c) Peer Review
4) Testing
   a) Unit Testing
   b) System Integration Testing
   c) User Acceptance Testing ( UAT )
5) Pre-Production
6) Production ( Go-Live )
Requirement Gathering: ---------------------------------------------
Business Requirement Collection ( BRD ):
-> Business requirement gathering is carried out by a Business Analyst, an onsite technical lead, and the client's business users.
-> In this phase, the Business Analyst prepares the Business Requirement Document ( BRD ), also called the Business Requirement Specifications ( BRS ).
-> BR collection takes place at the client location.
-> The outputs of BR analysis are:
   -> BRS ( Business Requirement Specifications )
   -> SRS ( System Requirement Specifications )
System Requirement Collection ( SRD ):
Senior technical people or an ETL architect will prepare the SRS, which contains the software and hardware details. The SRS includes:
a) The OS to be used ( Windows or Unix )
b) The RDBMS required to build the database ( Oracle, Teradata, etc. )
c) The ETL tools required ( Informatica, DataStage )
d) The OLAP tools required ( Cognos, BO )
The SRS is also called the Technical Requirement Specifications ( TRS ).
Design Phase:
---------------------------------------
The outputs of the design and planning phase are:
a) HLD ( High Level Design ) Document
b) LLD ( Low Level Design ) Document
HLD ( High Level Design ) Document:
An ETL Architect and a DWH Architect participate in designing a solution to build the DWH. The HLD document is prepared based on the business requirements.
LLD ( Low Level Design ) Document:
Based on the HLD, a senior ETL developer prepares the Low Level Design Document. The LLD contains the more technical details of the ETL system: a data flow diagram ( DFD ) and details of the source and target of each mapping. The LLD also contains information about full and incremental loads. Once the LLD is complete, the Development Phase starts.
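To illustrate the full vs. incremental load distinction that the LLD documents, here is a minimal Python sketch. The in-memory rows and the `updated_at` watermark column are hypothetical stand-ins for real source and target tables.

```python
# A minimal sketch of full vs. incremental loads. The table rows and the
# "updated_at" watermark column are hypothetical stand-ins for real tables.

def full_load(source_rows):
    """Full load: truncate the target and reload every source row."""
    target = []                       # stands in for a truncated target table
    target.extend(source_rows)
    return target

def incremental_load(source_rows, target, watermark):
    """Incremental load: append only rows changed since the last run."""
    new_rows = [r for r in source_rows if r["updated_at"] > watermark]
    target.extend(new_rows)
    # Advance the watermark to the latest timestamp just loaded.
    new_watermark = max((r["updated_at"] for r in new_rows), default=watermark)
    return target, new_watermark
```

A real ETL tool tracks the watermark in a control table, but the logic is the same: the incremental run touches only the rows that changed since the previous run.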
Development Phase:
-------------------------------------------------
Mapping:
-> Based on the LLD, the ETL team creates the mappings ( the ETL code ).
-> After designing the mappings, the code ( the mappings ) is reviewed by developers.
Code Review:
-> Code review is done by a developer.
-> In code review, the developer reviews the code and the logic, but not the data.
-> The following activities take place in code review:
   -> Check the naming standards of transformations, mappings, etc.
   -> Check the source-to-target mapping ( whether the correct logic is placed in the mapping ).
Peer Review:
-> The code is reviewed by a team member ( a third-party developer ).
Testing:
--------------
The following types of testing are carried out in the testing environment:
1) Unit Testing
2) Development Integration Testing
3) System Integration Testing
4) User Acceptance Testing

Unit Testing:
-> A unit test for the DWH is a white-box test; it checks the ETL procedures and mappings.
-> The following test cases can be executed by an ETL developer:
   1) Verify data loss
   2) Number of records in the source and target
   3) Data load / insert
   4) Data load / update
   5) Incremental load
   6) Data accuracy
   7) Verify naming standards
   8) Verify column mapping
-> Unit testing is carried out by the ETL developer in the development phase.
-> The ETL developer also has to do the data validations in this phase.

Development Integration Testing:
-> Run all the mappings in sequence.
-> First run the source-to-stage mappings.
-> Then run the mappings related to dimensions and facts.

System Integration Testing:
-> After the development phase, the code is moved to the QA environment.
-> In this environment, the testing people are given read-only permission.
-> They test all the workflows.
-> They also test the code according to their standards.

User Acceptance Testing ( UAT ):
-> This test is carried out in the presence of client-side technical users to verify the data migration from source to destination.
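Two of the unit-test cases above, record counts and data loss, can be sketched as simple assertions. The in-memory "tables" and the `id` key column are hypothetical, but the checks mirror what an ETL developer would run against real source and target tables.

```python
# A sketch of two unit-test checks: record counts and data loss.
# The in-memory "tables" and the "id" key column are hypothetical.

def check_row_counts(source, target):
    """The number of records in the source and target must match."""
    assert len(source) == len(target), "row count mismatch"

def check_no_data_loss(source, target, key="id"):
    """Every source key must reach the target (no data loss)."""
    missing = {row[key] for row in source} - {row[key] for row in target}
    assert not missing, f"keys lost during load: {missing}"
```

In practice the same comparisons are done with SQL `COUNT(*)` and key anti-joins, but the pass/fail criteria are identical.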
Production Environment:
---------------------------------
-> Migrate the code into the Go-Live environment from the test environment ( the QA environment ).
Thanks: shivakrishnas
Fact Table
Posted by Suresh at 12:36 AM Labels: Data Warehouse, Fact Table
Fact Table:
A fact table consists of the measurements, metrics, or facts of a business process.
Measures or Metrics: These represent the business process being measured; for example, transactions are the business in banking, and selling is the business in retail.
Grain or Granularity: The grain of a fact table represents the most atomic level by which the facts may be defined. The grain of a SALES fact table might be stated as "Sales volume by Day by Product by Store". Each record in this fact table is therefore uniquely defined by a day, product, and store. Other dimensions might be members of this fact table (such as location/region), but these add nothing to the uniqueness of the fact records. These "affiliate dimensions" allow for additional slices of the independent facts but generally provide insights at a higher level of aggregation (a region contains many stores).
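The "Sales volume by Day by Product by Store" grain can be made concrete with a small in-memory example; the column names and values here are illustrative.

```python
# The grain "Sales volume by Day by Product by Store" made concrete.
# Column names and values are illustrative.
sales_facts = [
    {"day": "2024-01-01", "product": "P1", "store": "S1", "sales_volume": 10},
    {"day": "2024-01-01", "product": "P1", "store": "S2", "sales_volume": 4},
    {"day": "2024-01-02", "product": "P1", "store": "S1", "sales_volume": 7},
]

# The grain columns must uniquely identify every fact row.
grain_keys = [(f["day"], f["product"], f["store"]) for f in sales_facts]
assert len(grain_keys) == len(set(grain_keys)), "duplicate rows at the declared grain"
```

If two rows shared the same (day, product, store) combination, the table would violate its declared grain, which is exactly what the uniqueness check above catches.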
Types of Facts: There are three types of facts:

Additive: Additive facts are facts that can be summed up through all of the dimensions in the fact table.

Semi-Additive: Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others.

Non-Additive: Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.

Let us use examples to illustrate each of the three types of facts. The first example assumes that we are a retailer, and we have a fact table with the following columns:

Date
Store
Product
Sales_Amount

The purpose of this table is to record the sales amount for each product in each store on a daily basis. Sales_Amount is the fact. In this case, Sales_Amount is an additive fact, because you can sum up this fact along any of the three dimensions present in the fact table: date, store, and product. For example, the sum of Sales_Amount for all 7 days in a week represents the total sales amount for that week.

Say we are a bank with the following fact table:

Date
Account
Current_Balance
Profit_Margin

The purpose of this table is to record the current balance for each account at the end of each day, as well as the profit margin for each account for each day. Current_Balance and Profit_Margin are the facts. Current_Balance is a semi-additive fact, as it makes sense to add it up for all accounts (what's the total current balance for all accounts in the bank?), but it does not make sense to add it up through time (adding up the current balances for a given account for each day of the month does not give us any useful information). Profit_Margin is a non-additive fact, for it does not make sense to add it up at the account level or the day level.
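The semi-additive behaviour of Current_Balance can be demonstrated with a small in-memory fact table; the account names and balance values are made up.

```python
# Daily end-of-day balances for two accounts over two days (made-up values).
facts = [
    {"date": "D1", "account": "A", "current_balance": 100},
    {"date": "D1", "account": "B", "current_balance": 200},
    {"date": "D2", "account": "A", "current_balance": 150},
    {"date": "D2", "account": "B", "current_balance": 250},
]

# Summing ACROSS ACCOUNTS for one day is meaningful:
total_d1 = sum(f["current_balance"] for f in facts if f["date"] == "D1")
# -> 300: the bank's total balance at the end of day D1.

# Summing ACROSS TIME for one account is NOT meaningful:
over_time = sum(f["current_balance"] for f in facts if f["account"] == "A")
# -> 250, but "account A's balances added over two days" answers no business question.
```

Both sums execute fine; semi-additivity is a modelling rule, not a technical one, so the BI layer has to know which dimension a fact may legally be summed over.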
Data Warehouse Definition
Posted by Suresh at 11:32 PM Labels: Data Warehouse
The most popular definition came from Bill Inmon, who provided the following: A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process.

Subject-Oriented: A data warehouse can be used to analyze a particular subject area. For example, "sales" can be a particular subject.

Integrated: A data warehouse integrates data from multiple data sources. For example, source A and source B may have different ways of identifying a product, but in a data warehouse, there will be only a single way of identifying a product.

Time-Variant: Historical data is kept in a data warehouse. For example, one can retrieve data from 3 months, 6 months, 12 months, or even older from a data warehouse. This contrasts with a transaction system, where often only the most recent data is kept. For example, a transaction system may hold the most recent address of a customer, whereas a data warehouse can hold all addresses associated with a customer.
Non-Volatile: Once data is in the data warehouse, it will not change. So, historical data in a data warehouse should never be altered.
Ralph Kimball provided a more concise definition of a data warehouse: A data warehouse is a copy of transaction data specifically structured for query and analysis.

This is a functional view of a data warehouse. Kimball did not address how the data warehouse is built, as Inmon did; rather, he focused on the functionality of a data warehouse.