Data Warehouse: Meaning, Characteristics and Benefits Article Shared by Diksha S
After reading this article you will learn about: 1. Meaning of Data Warehouse 2. Characteristics of Data Warehouse 3. Benefits 4. Dimensions.
Meaning of Data Warehouse:
As companies have grown larger they have become separated both geographically and culturally from the markets and customers they serve. Disney, an American corporation, has operations in Europe, Asia and Australasia, as well as in the USA. Benetton, the Italian fashion brand, has operations across five continents. In retailing alone it operates over 7,000 stores and concessions. Companies such as these generate a huge volume of data that needs to be converted into information that can be used for both operational and analytical purposes. The data warehouse is a solution to that problem.
Data warehouses are really no more than repositories of large amounts of operational, historical and other customer-related data. Data volume can reach terabyte levels, i.e. 2^40 bytes of data. A warehouse is a repository for data imported from other databases. Attached to the front end of the warehouse is a set of analytical procedures for making sense out of the data. Retailers, home shopping companies and banks have been early adopters of data warehouses.
Different people have different definitions for a data warehouse. The most popular definition came from Bill Inmon, who provided the following: "A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process."
Subject-Oriented: A data warehouse can be used to analyze a particular subject area. For example, "sales" can be a particular subject.
Integrated: A data warehouse integrates data from multiple data sources. For example, source A and source B may have different ways of identifying a product, but in a data warehouse, there will be only a single way of identifying a product.
Time-Variant: Historical data is kept in a data warehouse. For example, one can retrieve data from 3 months, 6 months, 12 months, or even older data from a data warehouse. This contrasts with a transaction system, where often only the most recent data is kept. For example, a transaction system may hold the most recent address of a customer, whereas a data warehouse can hold all addresses associated with a customer.
Non-volatile: Once data is in the data warehouse, it will not change. So, historical data in a data warehouse should never be altered.
Ralph Kimball provided a more concise definition of a data warehouse: "A data warehouse is a copy of transaction data specifically structured for query and analysis." This is a functional view of a data warehouse. Kimball did not address how the data warehouse is built as Inmon did; rather, he focused on the functionality of a data warehouse.
A data warehouse (DW) is a database used for reporting and analysis. The data stored in the warehouse is uploaded from the operational systems. The data may pass through an operational data store for additional operations before it is used in the DW for reporting.
A data warehouse maintains its functions in three layers: Layer 1: Staging, Layer 2: Integration, Layer 3: Access.
Staging is used to store raw data for use by developers. The integration layer is used to integrate data and to have a level of abstraction from users. The access layer is for getting data out for users. One thing to mention about data warehouses is that they can be subdivided into data marts. A data mart stores a subset of the data from a warehouse and focuses on a specific aspect of a company, such as sales or a marketing process. This definition of the data warehouse focuses on data storage. The main source of the data is cleaned, transformed, catalogued and made available for use by managers and other business professionals for data mining, online analytical processing, market research and decision support. However, the means to retrieve and analyze data, to extract, transform and load data, and to manage the data dictionary are also considered essential components of a data warehousing system.
Characteristics of Data Warehouse:
i. Subject-oriented: The warehouse organizes data around the essential subjects of the business (customers and products) rather than around applications such as inventory management or order processing.
ii. Integrated: It is consistent in the way that data from several sources is extracted and transformed. For example, coding conventions are standardized: M = male, F = female.
iii. Time-variant: Data are organized by various time-periods (e.g. months).
iv. Non-volatile: The warehouse's database is not updated in real time. There is periodic bulk uploading of transactional and other data. This makes the data less subject to momentary change.
There are a number of steps and processes in building a warehouse. First, you must identify where the relevant data is stored. This can be a challenge. When the Commonwealth Bank opted to implement CRM in its retail banking business, it found that relevant customer data were resident on over 80 separate systems. Secondly, data must be extracted from those systems. It is possible that when these systems were developed they were not expected to align with other systems. The data then needs to be transformed into a standardized, consistent and clean format. Data in different systems may have been stored in different forms. Also, the cleanliness of data from different parts of the business may vary. The culture in sales may be very driven by quarterly performance targets. Getting sales representatives to maintain their customer files may not be straightforward.
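As a rough illustration of the transformation step, the sketch below standardizes gender codes from two source systems into the warehouse convention (M = male, F = female). The source names, the raw codes, and the fallback code U for unrecognized values are assumptions made for this example, not part of the article.

```python
# Hypothetical source-specific gender codes mapped to the warehouse standard
# (M = male, F = female). Source names and raw codes are assumptions.
CODE_MAP = {
    "crm": {"1": "M", "2": "F"},
    "billing": {"male": "M", "female": "F"},
}


def standardize_gender(source, raw_value):
    """Return the warehouse code for a raw source value, or 'U' if unknown."""
    mapping = CODE_MAP.get(source, {})
    return mapping.get(str(raw_value).strip().lower(), "U")


def transform(source, rows):
    """Yield cleaned copies of the extracted rows with a standardized gender."""
    for row in rows:
        cleaned = dict(row)
        cleaned["gender"] = standardize_gender(source, row.get("gender", ""))
        yield cleaned


crm_rows = [{"customer_id": 1, "gender": "1"}, {"customer_id": 2, "gender": "2"}]
billing_rows = [
    {"customer_id": 3, "gender": "Female "},
    {"customer_id": 4, "gender": "n/a"},
]

warehouse_rows = list(transform("crm", crm_rows)) + list(transform("billing", billing_rows))
```

Keeping the mapping in one table per source makes it easy to flag, rather than silently drop, values that no source convention explains.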
Benefits of a Data Warehouse:
A data warehouse maintains a copy of information from the source transaction systems. This architectural complexity provides the opportunity to:
a. Congregate data from multiple sources into a single database.
b. Maintain data history, even if the source transaction systems do not.
c. Improve data quality, by providing consistent codes and descriptions, flagging or even fixing bad data.
d. Present the organization's information consistently.
e. Provide a single common data model for all data of interest regardless of the data's source.
f. Restructure the data so that it makes sense to the business users.
g. Restructure the data so that it delivers excellent query performance, even for complex analytic queries, without impacting the operational systems.
h. Add value to operational business applications, notably customer relationship management (CRM) systems.
Dimensions of Data Warehouse:
A dimension is a data element that categorizes each item in a data set into non-overlapping regions. A data warehouse dimension provides the means to "slice and dice" data in a data warehouse. Dimensions provide structured labeling information to otherwise unordered numeric measures. For example, "Customer", "Date", and "Product" are all dimensions that could be applied meaningfully to a sales receipt. A dimensional data element is similar to a categorical variable in statistics. The primary function of dimensions is threefold: to provide filtering, grouping and labeling. For example, in a data warehouse where each person is categorized as having a gender of male, female or unknown, a user of the data warehouse would then be able to filter or categorize each presentation or report by either filtering based on the gender dimension or displaying results broken out by the gender. Each dimension in a data warehouse may have one or more hierarchies applied to it. For the "Date" dimension, there are several possible hierarchies.
a. Conformed dimension b. Junk dimension c. Degenerate dimension d. Role-playing dimension
a. Conformed dimension: A conformed dimension is a set of data attributes that have been physically implemented in multiple database tables using the same structure, attributes, domain values, definitions and concepts in each implementation. Dimensions are conformed when they are either exactly the same (including keys) or one is a perfect subset of the other.
b. Junk dimension: One solution is to create a new dimension for each of the remaining attributes, but due to their nature, it could be necessary to create a vast number of new dimensions, resulting in a fact table with a very large number of foreign keys. The designer could also decide to leave the remaining attributes in the fact table, but this could make the row length of the table unnecessarily large if, for example, the attribute is a long text string. The solution to this challenge is to identify all the attributes and then put them into one or several junk dimensions. One junk dimension can hold several true/false or yes/no indicators that have no correlation with each other, so it would be convenient to convert the indicators into a more descriptive attribute. An example would be an indicator about whether a package had arrived: instead of indicating this as "yes" or "no", it would be converted into "arrived" or "pending" in the junk dimension. The designer can choose to build the dimension table so it ends up holding all the indicators occurring with every other indicator so that all combinations are covered. This sets up a fixed size for the table itself, which would be 2^x rows, where x is the number of indicators. This solution is appropriate in situations where the designer would expect to encounter a lot of different combinations and where the possible combinations are limited to an acceptable level. In a situation where the number of indicators is large, thus creating a very big table, or where the designer only expects to encounter a few of the possible combinations, it would be more appropriate to build each row in the junk dimension as new combinations are encountered. To limit the size of the tables, multiple junk dimensions might be appropriate in other situations depending on the correlation between various indicators. Junk dimensions are also appropriate for placing attributes like non-generic comments from the fact table.
Such attributes might consist of data from an optional comment field when a customer places an order and as a result will probably be blank in many cases. Therefore the junk dimension should contain a single row representing the blanks as a surrogate key that will be used in the fact table for every row returned with a blank comment field.
c. Degenerate dimension: A dimension key, such as a transaction number, invoice number, ticket number, or bill-of-lading number, that has no attributes and hence does not join to an actual dimension table. Degenerate dimensions are very common when the grain of a fact table represents a single transaction item or line item because the degenerate dimension represents the unique identifier of the parent. Degenerate dimensions often play an integral role in the fact table's primary key.
d. Role-playing dimensions: Dimensions are often recycled for multiple applications within the same database. For instance, a "Date" dimension can be used for "Date of Sale", as well as "Date of Delivery", or "Date of Hire". This is often referred to as a "role-playing dimension".
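The fixed-size junk dimension described above (2^x rows for x two-valued indicators) can be sketched as a cross product of the indicator values. This is only an illustration: the indicator names, their descriptive labels, and the `junk_key` surrogate key column are hypothetical.

```python
from itertools import product

# Hypothetical yes/no indicators, each stored with descriptive labels.
INDICATORS = {
    "delivery_status": ("pending", "arrived"),
    "gift_wrapped": ("not wrapped", "wrapped"),
    "paid_online": ("paid offline", "paid online"),
}


def build_junk_dimension(indicators):
    """Return one row per combination of indicator values, with a surrogate key."""
    names = list(indicators)
    rows = []
    for key, combo in enumerate(product(*indicators.values()), start=1):
        row = {"junk_key": key}
        row.update(zip(names, combo))
        rows.append(row)
    return rows


junk_dimension = build_junk_dimension(INDICATORS)
```

With three indicators the table has 2^3 = 8 rows; the fact table then stores only `junk_key` instead of three separate flags. When most combinations never occur, building rows on demand, as the text suggests, avoids the exponential growth.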
Business
Data warehouse (DWH) in its simplest form is a data repository/store specifically modeled/designed for high performance and efficient reporting and analysis of historic, current and calculated data. Usually a good Business Intelligence Solution is backed by a Data Warehouse.
1. In a Data Warehouse data is historized/versioned. Ex - Customer moves from status A to B. Transactional systems usually store only the current status, whereas DWH stores both the records with time periods (valid from and valid to).
2. High performance (minimum time) during data read operations. Data is denormalized in the access layer of the Data Warehouse by intentionally introducing controlled redundancy. Usually the data access layer of DWH is modeled as Star Schema.
3. Data lineage is maintained, i.e. data sources can be tracked.
4. Data is clean and unified. Ex - one data source may store
8. Has large data storage capacity.
9. Usually DWH is loaded using batch jobs and the jobs are asynchronous to data changes in source systems.
10. New data for existing data areas can be added with ease.
11. New data sources can be integrated with ease.
12. Data within the DWH is arranged based on subject areas rather than based on applications. Ex - 2 or more source applications may have sales data. However in a DWH when a sales Data
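The historization in point 1 (a customer moving from status A to B, with both records kept and given a valid-from and valid-to period) could look roughly like the sketch below. The field names, the far-future `OPEN_END` marker for the current record, and the half-open interval convention are assumptions made for illustration.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # assumed marker meaning "still valid"


def record_status(history, customer_id, new_status, change_date):
    """Close the customer's current row and append a new versioned row."""
    for row in history:
        if row["customer_id"] == customer_id and row["valid_to"] == OPEN_END:
            if row["status"] == new_status:
                return  # nothing changed, keep the current version
            row["valid_to"] = change_date
    history.append({
        "customer_id": customer_id,
        "status": new_status,
        "valid_from": change_date,
        "valid_to": OPEN_END,
    })


def status_on(history, customer_id, day):
    """Return the status that was valid on the given day, or None."""
    for row in history:
        if row["customer_id"] == customer_id and row["valid_from"] <= day < row["valid_to"]:
            return row["status"]
    return None


history = []
record_status(history, 42, "A", date(2023, 1, 1))
record_status(history, 42, "B", date(2023, 6, 1))
```

A transactional system would simply overwrite A with B; here both versions survive, so `status_on` can answer what the status was on any past day.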
Data Warehousing and Data Mining
1. Data warehousing
The data warehouse is a modern concept in database management systems. The term data warehouse was coined by W.H. Inmon.
• W.H. Inmon: "A subject-oriented, integrated, non-volatile, time-variant collection of data in support of management decisions is called a data warehouse."
• Ralph Kimball: A data warehouse is the conglomerate of all data marts within the enterprise. Information is always stored in the dimensional model.
• A collection of decision support technologies, aimed at enabling the knowledge worker (executive, manager, and analyst) to make better and faster decisions, is called a data warehouse.
Characteristics of data warehouse:
• Multi-user support
• Accessibility
• Transparency
• Client-server Architecture
• Flexible Reporting
• Generic Dimensionality
• Multidimensional conceptual views
Functionality of data warehouse:
• Roll-up
• Roll-down
• Pivot
• Slice and Dice
• Sorting
• Selection
• Derived (computed) attributes
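Two of the operations listed above, roll-up and slice, can be sketched on a tiny in-memory fact list. The sample sales facts and the column names are invented for this example; a real warehouse would run the equivalent aggregations in its query engine.

```python
from collections import defaultdict

# Hypothetical sales facts: (year, month, region, product, amount)
COLUMNS = ("year", "month", "region", "product", "amount")
FACTS = [
    (2023, 1, "North", "Shoes", 100),
    (2023, 1, "South", "Shoes", 80),
    (2023, 2, "North", "Hats", 40),
    (2024, 1, "North", "Shoes", 120),
]


def roll_up(facts, *dims):
    """Sum the amount per combination of the given dimensions.

    Passing fewer dimensions gives a coarser (rolled-up) level.
    """
    idx = [COLUMNS.index(d) for d in dims]
    totals = defaultdict(int)
    for fact in facts:
        totals[tuple(fact[i] for i in idx)] += fact[-1]
    return dict(totals)


def slice_facts(facts, dim, value):
    """Keep only the facts where one dimension has a fixed value."""
    i = COLUMNS.index(dim)
    return [f for f in facts if f[i] == value]


by_year_month = roll_up(FACTS, "year", "month")  # finer level
by_year = roll_up(FACTS, "year")                 # rolled up to year
north_by_product = roll_up(slice_facts(FACTS, "region", "North"), "product")
```

Going from `by_year` back to `by_year_month` is the reverse (roll-down) direction; it needs the detailed facts, which is why the warehouse keeps them.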
On the basis of architecture, there are three data warehouse models:
(a) Enterprise Warehouse: An enterprise warehouse collects all of the information about subjects concerning the entire organization. It provides corporate-wide data integration.
(b) Data Mart: Data marts are usually implemented on low cost departmental servers. The implementation cycle of a data mart is generally measured in weeks rather than months or years.
(c) Virtual warehouse: A virtual warehouse is a set of views over operational databases. For efficient query processing, only some of the possible summary views may be materialized.
Figure: Virtual Warehouse
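The difference between a plain view and a materialized summary in a virtual warehouse can be sketched with Python's built-in `sqlite3` module. The `orders` table and its data are hypothetical, and because SQLite has no materialized views, a snapshot table created from the view stands in for one here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical operational table
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL);
    INSERT INTO orders (region, amount) VALUES ('North', 100), ('North', 50), ('South', 70);

    -- Virtual warehouse: a view computed on demand from the operational data
    CREATE VIEW sales_by_region AS
        SELECT region, SUM(amount) AS total FROM orders GROUP BY region;

    -- Stand-in for a materialized summary: a snapshot table filled from the view
    CREATE TABLE sales_by_region_mat AS SELECT * FROM sales_by_region;
""")

# A new operational row arrives after the snapshot was taken.
conn.execute("INSERT INTO orders (region, amount) VALUES ('South', 30)")

view_rows = conn.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region"
).fetchall()
mat_rows = conn.execute(
    "SELECT region, total FROM sales_by_region_mat ORDER BY region"
).fetchall()
```

The view always reflects the current operational data but recomputes the aggregate on every query; the snapshot answers faster but is stale until it is refreshed, which is the trade-off behind materializing only some summary views.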
Data Model:
The foundation of the data warehousing system is the data model. A good data model will allow the data warehousing system to grow easily, as well as allowing for good performance. In a data warehousing project, the logical data model is built based on user requirements, and then it is translated into the physical data model:
(a) Conceptual data model: At this level, the data modeler attempts to identify the highest level relationships among the different entities.
• Includes the important entities and the relationships among them.
• No attribute is specified.
• No primary key is specified.
(b) Logical data model: At this level, the data modeler attempts to describe the data in as much detail as possible, without regard to how they will be physically implemented in the database. In data warehousing, it is common for the conceptual data model and the logical data model to be combined into a single step (deliverable). The steps for designing the logical data model are as follows:
1. Identify all entities.
2. Specify primary keys for all entities.
3. Find the relationships between different entities.
4. Find all attributes for each entity.
5. Resolve many-to-many relationships.
6. Normalization.
(c) Physical data model: At this level, the data modeler will specify how the logical data model will be realized in the database schema.
• Specification of all tables and columns.
• Foreign keys are used to identify relationships between tables.
• De-normalization may occur based on user requirements.
• Physical considerations may cause the physical data model to be quite different from the logical data model.
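To make the physical level concrete, here is a minimal sketch of two logical entities realized as tables with a foreign key, again using `sqlite3`. The `customer` and `sales_order` tables and their columns are hypothetical examples, not a schema from these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    -- The foreign key realizes the logical relationship "customer places order".
    CREATE TABLE sales_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        order_date  TEXT NOT NULL,
        amount      REAL NOT NULL
    );
""")
conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
conn.execute("INSERT INTO sales_order VALUES (10, 1, '2024-01-15', 99.5)")

# An order for a customer that does not exist violates the relationship.
try:
    conn.execute("INSERT INTO sales_order VALUES (11, 999, '2024-01-16', 10.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Column types, `NOT NULL` constraints and key enforcement are all physical decisions; the logical model only says that each order belongs to one customer.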
Data warehouse usage:
A data warehouse contains integrated and processed data used to perform data analysis at the time of decision making and planning. It is a very important tool for business executives.
(a) Information Processing: It supports querying, basic statistical analysis, and reporting using crosstabs, tables, charts, or graphs.
(b) Analytical Processing: It is helpful in multidimensional analysis of data warehouse data and supports basic OLAP (On-line Analytical Processing) operations (slice-dice, drilling, pivoting).
(c) Data Mining: Data mining is a process of intelligent pattern discovery from the data warehouse. It supports finding associations, constructing analytical models, performing classification and prediction, and presenting the mining results using crosstabs, graphs, and other visualization tools.
2. Data Mining
An information extraction activity whose goal is to discover hidden facts contained in databases is termed data mining. Using a combination of machine learning, statistical analysis, modeling techniques and database technology, data mining finds patterns and subtle relationships in data and infers rules that allow the prediction of future results. Typical applications include market segmentation, customer profiling, fraud detection, evaluation of retail promotions, and credit risk analysis.
• Data mining refers to the mining or discovery of new information in terms of patterns or rules from vast amounts of data.
• Data mining helps in extracting meaningful patterns that cannot necessarily be found by merely querying or processing data or metadata in the data warehouse.
• Data mining is a process of data analysis using powerful analysis tools capable of extracting business intelligence from a large repository of electronic data.
• Data mining is the result of the natural evolution of information technology in general and database technology in particular.
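As a very small illustration of "finding patterns" in the sense described above, the sketch below counts which pairs of items are bought together often enough to be interesting. It is a naive frequent-pair count, not a full association-rule algorithm, and the baskets and the support threshold are made up for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets
BASKETS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "tea"},
    {"bread", "butter", "tea"},
]


def frequent_pairs(baskets, min_support):
    """Return {pair: support} for item pairs found in at least min_support of baskets."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each pair a canonical order, e.g. ("bread", "butter").
        counts.update(combinations(sorted(basket), 2))
    total = len(baskets)
    return {pair: n / total for pair, n in counts.items() if n / total >= min_support}


pairs = frequent_pairs(BASKETS, min_support=0.5)
```

Here only bread and butter appear together in at least half of the baskets. A plain query could confirm that pair once someone thought to ask about it; the mining step is what surfaces it without being told which pair to look for.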
Data Mining Application:
Data mining does not replace skilled business analysts or managers, but rather gives them powerful new tools to improve the job they are doing. It is a departure from the traditional tracks of decision making and business planning. It offers great promise in helping organizations to uncover patterns hidden in their data that can be used to predict the behavior of customers, products and processes.
• Biomedical and DNA data analysis: Genetic engineering is a young discipline of engineering which is based on the structure of genes. Genes are present in the human body, and a pair of genes is responsible for controlling a specific characteristic. The genes are present in DNA (Deoxyribonucleic Acid), which is made from nucleotides: Adenine (A), Cytosine (C), Guanine (G), and Thymine (T). Genetic engineering is a boon for people suffering from hereditary disease. After fertilization, the sequence of the disease-carrying gene in the zygote is changed.
• Image processing: Data mining provides efficient tools for image processing.
• Financial data analysis: Banks and business organizations often rely on data mining for collection, high quality accuracy, better customer service and satisfaction, loan payment, credit rating etc.
• Retail Industry: Customers are the major objective for any business organization. Products and services are designed to focus on customers. Data mining is helpful in predicting the behavior of customers in the market. It is used to identify customer buying behavior, improve customer service, enhance the customer and goods ratio, design more effective goods and discover cost effective transportation methods etc.
• Manufacturing sectors: The manufacturing section of any organization is dependent on data mining for designing the most acceptable products. Markets are competitive: without competition a monopoly can earn high profits, but nowadays monopolies do not last long. Data mining helps executives to design customer oriented products.
• Telecommunication Industry: Telecommunication industries are the backbone of any organization. Mismanagement in the communication industry can spoil many business organizations, industries, universities, military systems etc., because it carries not only normal data but also confidential data. In the telecommunication industry data mining is used for identifying telecommunication patterns, catching fraudulent activities, making better use of resources, and improving quality of services.
3. KDD (Knowledge Discovery in Database)
The KDD process is still expanding.
What is business intelligence? Business intelligence usually refers to the information that is available for the enterprise to make decisions on. A data warehousing (or data mart) system is the backend, or the infrastructural, component for achieving business intelligence. Business intelligence also includes the insight gained from doing data mining analysis, as well as unstructured data (thus the need for content management systems). For our purposes here, we will discuss business intelligence in the context of using a data warehouse infrastructure.
• Data selection
• Encoding
• Data cleaning
• Enrichment
• Data Mining, and
• Reporting
4. Data mining and warehousing:
• The goal of a data warehouse is to support decision making with data. Data mining can be used in conjunction with a data warehouse to help with certain types of decisions.
• Data mining can be applied to operational databases with individual transactions. To make data mining more efficient, the data warehouse should have an aggregated or summarized collection of data. Data mining helps in extracting meaningful new patterns that cannot necessarily be found by merely querying the data warehouse.
• Data mining applications should therefore be strongly considered early, during the design of the data warehouse.
• Data mining tools should be designed to facilitate their use in conjunction with data warehouses.
5. Web Data Mining
The World Wide Web provides rich sources for data mining. It is too huge for effective data warehousing and data mining, and too complex and heterogeneous because it has no standard structure. The WWW is a huge, widely distributed, global information service center for:
• Information services: news, advertisements, consumer information, financial management, education, government, e-commerce, etc.
• Hyper-link information
• Access and usage information
Data Warehouse
• A data warehouse (DW, DWH), or an enterprise data warehouse (EDW), is a system, usually separated from the original system.
• It is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources.
• It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources.
• A process of transforming data into information and making it available to users in a timely enough manner to make a difference.
• It is a relational or multidimensional database management system designed to support management decision making.
• A data warehouse is a copy of transaction data specifically structured for querying and reporting.
History
★ The concept of data warehousing dates back to the late 1980s when IBM researchers Barry Devlin and Paul Murphy developed the "business data warehouse".
★ 1960s - General Mills and Dartmouth College, in a joint research project, develop the terms dimensions and facts.
★ 1970s - ACNielsen and IRI provide dimensional data marts for retail sales.
★ 1983 - Teradata introduces a database management system specifically designed for decision support.
★ 1988 - Barry Devlin and Paul Murphy publish the article "An architecture for a business and information system" in IBM Systems Journal, where they introduce the term "business data warehouse".
Data Warehouse includes
★ Retrieving data
★ Analyzing data
★ Extracting data
★ Loading data
★ Transforming data
★ Managing data
Benefits
★ Maintains a copy of information from the source transaction systems
★ Congregate data from multiple sources into a single database
★ Maintain data history, even if the source transaction systems do not.
★ Improve data quality, by providing consistent codes and descriptions, flagging or even fixing bad data.
★ Present the organization's information consistently
★ Provide a single common data model for all data of interest regardless of the data's source
★ Restructure the data so that it makes sense to the business users
★ Saves time
★ Generates a high ROI
Characteristics of Data Warehouse
• The concept of a Data Warehouse was introduced by Bill Inmon, the father of Data Warehouse. Here are the characteristics:
★ Subject Orientation
★ Time variance
★ Non Volatile
★ Integrated
Subject Orientation
• Data warehouses are designed to help you analyze data. For example, to learn more about your company's sales data, you can build a warehouse that concentrates on sales. Using this warehouse, you can answer questions like "Who was our best customer for this item last year?" This ability to define a data warehouse by subject matter, sales in this case, makes the data warehouse subject oriented.
Time variance
• In order to discover trends in business, analysts need large amounts of data. This is very much in contrast to online transaction processing (OLTP) systems, where performance requirements demand that historical data be moved to an archive. A data warehouse's focus on change over time is what is meant by the term time variant.
Non Volatile
• Nonvolatile means that, once entered into the warehouse, data should not change. This is logical because the purpose of a warehouse is to enable you to analyze what has occurred.
Integrated
• Integration is closely related to subject orientation. Data warehouses must put data from disparate sources into a consistent format. They must resolve such problems as naming conflicts and inconsistencies among units of measure. When they achieve this, they are said to be integrated.
Types of Systems
★ Data Mart
★ Online analytical processing (OLAP)
★ Online Transaction Processing (OLTP)
★ Predictive analysis
Data Mart
• Data Mart: A data mart is a simple form of a data warehouse that is focused on a single subject (or functional area), such as sales, finance or marketing. Data marts are often built and controlled by a single department within an organization. Given their single-subject focus, data marts usually draw data from only a few sources. The sources could be internal operational systems, a central data warehouse, or external data.
Online analytical processing (OLAP)
• Online analytical processing (OLAP) is characterized by a relatively low volume of transactions. Queries are often very complex and involve aggregations. For OLAP systems, response time is an effectiveness measure. OLAP applications are widely used by Data Mining techniques. OLAP databases store aggregated, historical data in multidimensional schemas (usually star schemas). OLAP systems typically have data latency of a few hours, as opposed to data marts, where latency is expected to be closer to one day.
Online Transaction Processing (OLTP)
• Online Transaction Processing (OLTP) is characterized by a large number of short online transactions (INSERT, UPDATE, DELETE). OLTP systems emphasize very fast query processing and maintaining data integrity in multi-access environments. For OLTP systems, effectiveness is measured by the number of transactions per second. OLTP databases contain detailed and current data. The schema used to store transactional databases is the entity model (usually 3NF).
Predictive analysis
• Predictive analysis is about finding and quantifying hidden patterns in the data using complex mathematical models that can be used to predict future outcomes. Predictive analysis is different from OLAP in that OLAP focuses on historical data analysis and is reactive in nature, while predictive analysis focuses on the future. These systems are also used for CRM (Customer Relationship Management).
Data warehouse vs OLTP
Data warehouse | OLTP
★ Subject oriented | ★ Application oriented
★ Used to analyze business | ★ Used to run business
★ Summarized and refined | ★ Detailed data
★ Snapshot data | ★ Current, up to date
★ Integrated data | ★ Isolated data
★ Knowledge user (manager) | ★ Clerical user
★ Large volumes accessed at a time (millions) | ★ Few records accessed at a time
★ Mostly read (batch update) | ★ Read/update access
★ Redundancy present | ★ No data redundancy
★ Database size 100 GB - few terabytes | ★ Database size 100 MB - 100 GB
★ Query throughput is the performance metric | ★ Transaction throughput is the performance metric
★ Hundreds of users | ★ Thousands of users
★ Managed by subsets | ★ Managed in entirety
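The contrast in the comparison above, short read/update transactions on a few records versus read-mostly queries that summarize many records, can be illustrated with `sqlite3`. The `account` table and its values are hypothetical, and a real deployment would keep the two workloads in separate systems rather than in one database as done here for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (account_id INTEGER PRIMARY KEY, branch TEXT, balance REAL)"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [(1, "East", 500.0), (2, "East", 250.0), (3, "West", 900.0)],
)

# OLTP-style work: a short transaction that touches a few records.
with conn:  # commits on success, rolls back if an exception is raised
    conn.execute("UPDATE account SET balance = balance - 100 WHERE account_id = 1")
    conn.execute("UPDATE account SET balance = balance + 100 WHERE account_id = 2")

# Warehouse-style work: a read-only query that scans and summarizes many records.
summary = conn.execute(
    "SELECT branch, COUNT(*), SUM(balance) FROM account GROUP BY branch ORDER BY branch"
).fetchall()
```

The transfer is measured by how many such transactions complete per second; the summary query is measured by how quickly one large scan returns.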
Data warehouse environment
• The environment for data warehouses and marts includes the following:
★ Source systems that provide data to the warehouse or mart
★ Data integration technology and processes that are needed to prepare the data for use
★ Different architectures for storing data in an organization's data warehouse or data marts
★ Different tools and applications for the variety of users
★ Metadata, data quality, and governance processes must be in place to ensure that the warehouse or mart meets its purposes
Data Warehouse Components
• The primary components of a data warehouse are:
★ Operational Data
★ Load Manager
★ Warehouse Manager
★ Query Manager
★ Detailed, Lightly and Highly summarized data
★ Archive and Backup Data
★ Meta Data
★ End user access tools
Data Warehouse Components (Operational Data)
★ The data comes from the mainframe systems in the traditional network and hierarchical format.
★ Data can come from traditional RDBMS like Oracle, Informix etc.
★ Operational data can also come from external sources (e.g. commercial databases and databases associated with suppliers and customers).

Data Warehouse Components (Load Manager)
★ Performs all the operations associated with extracting and loading data into the data warehouse.
★ Operations include simple transformations of the data to prepare it for entry into the warehouse.
★ The size and complexity of this component vary between data warehouses. It may be constructed using a combination of vendor data-loading tools and custom-built programs.

Data Warehouse Components (Warehouse Manager)
★ Analyzing data to ensure consistency
★ Transforming and merging the source data from temporary storage into data warehouse tables
★ Creating indexes and views on the base tables
★ Denormalization
★ Generating aggregations
★ Backing up and archiving data
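
The warehouse manager duties listed above (consistency checks, merging from temporary storage, indexes, views, and aggregation) can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table names `staging_sales` and `sales_fact`, the view `sales_by_region`, and the consistency rule (no missing values, no negative amounts) are assumptions made for illustration only; a real warehouse manager would apply its own rules and run on a dedicated database.

```python
import sqlite3


def build_warehouse(conn, staged_rows):
    """Merge staged rows into a warehouse table, then index and aggregate it."""
    cur = conn.cursor()

    # Temporary storage, assumed to have been filled by the load manager.
    cur.execute(
        "CREATE TEMP TABLE staging_sales (sale_date TEXT, region TEXT, amount REAL)"
    )
    cur.executemany("INSERT INTO staging_sales VALUES (?, ?, ?)", staged_rows)

    # Base warehouse table (hypothetical schema).
    cur.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact "
        "(sale_date TEXT, region TEXT, amount REAL)"
    )

    # Consistency check while merging: skip rows with missing values
    # or negative amounts (a NULL amount also fails the comparison).
    cur.execute(
        """
        INSERT INTO sales_fact (sale_date, region, amount)
        SELECT sale_date, region, amount
        FROM staging_sales
        WHERE sale_date IS NOT NULL AND region IS NOT NULL AND amount >= 0
        """
    )

    # Index and aggregating view on the base table.
    cur.execute("CREATE INDEX IF NOT EXISTS idx_sales_region ON sales_fact (region)")
    cur.execute(
        """
        CREATE VIEW IF NOT EXISTS sales_by_region AS
        SELECT region, SUM(amount) AS total_amount, COUNT(*) AS n_sales
        FROM sales_fact
        GROUP BY region
        """
    )

    # The temporary storage is no longer needed once the merge is done.
    cur.execute("DROP TABLE staging_sales")
    conn.commit()


conn = sqlite3.connect(":memory:")
build_warehouse(conn, [("2024-01-01", "EU", 10.0), ("2024-01-02", "US", 7.5)])
print(conn.execute("SELECT * FROM sales_by_region ORDER BY region").fetchall())
```

The view recomputes its totals on every query. A production warehouse would more likely store such aggregates in summary tables, as described under summarized data.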

Data Warehouse Components (Query Manager)
★ Performs all operations associated with the management of user queries.
★ This component is usually constructed using vendor end-user access tools, data warehouse monitoring tools, database facilities, and custom-built programs.
★ The complexity of a query manager is determined by the facilities provided by the end-user access tools.

Data Warehouse Components (Detailed, Lightly and Highly Summarized Data)
★ This area of the warehouse stores all the detailed data in the database schema.
★ In most cases, detailed data is not stored online but is aggregated to the next level of detail.
★ This area of the data warehouse also stores all the predefined lightly and highly summarized (aggregated) data.
★ This area of the warehouse is transient: it changes on an ongoing basis in order to respond to changing query profiles.
★ The purpose of the summarized information is to speed up query performance.

Data Warehouse Components (Archive and Backup Data)
★ This area of the warehouse stores detailed and summarized data for the purpose of archiving and backup.
★ The data is transferred to storage archives such as magnetic tapes or optical disks.

Data Warehouse Components (Metadata)

The data warehouse stores all the metadata (data about data) definitions used by all processes in the warehouse. It is used for a variety of purposes, including:
★ The extraction and loading process: metadata is used to map data sources to a common view of information within the warehouse.
★ The warehouse management process: metadata is used to automate the production of summary tables.
★ The query management process: metadata is used to direct a query to the most appropriate data source.
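
The idea that metadata directs a query to the most appropriate data source can be shown with a small sketch. Everything here is hypothetical: the catalog entries, the table names, the row counts, and the rule of picking the smallest table whose level of detail can still answer the query. It illustrates the principle and does not describe any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableInfo:
    name: str        # hypothetical table name
    grain: tuple     # columns at which the table keeps its data
    row_count: int   # rough size, used to prefer smaller tables


# Hypothetical metadata catalog: one detailed table and two summary tables.
CATALOG = [
    TableInfo("sales_detail", ("day", "store", "product"), 50_000_000),
    TableInfo("sales_by_day_store", ("day", "store"), 400_000),
    TableInfo("sales_by_month", ("month",), 120),
]

# Columns that can be derived from finer ones (a month rolls up from days).
ROLLUPS = {"month": {"day"}}


def can_answer(table, needed_columns):
    """True if every needed column is stored in, or derivable from, the table."""
    for column in needed_columns:
        if column in table.grain:
            continue
        if ROLLUPS.get(column, set()) & set(table.grain):
            continue
        return False
    return True


def route_query(needed_columns, catalog=CATALOG):
    """Pick the smallest table in the catalog that can answer the query."""
    candidates = [t for t in catalog if can_answer(t, needed_columns)]
    if not candidates:
        raise LookupError(f"no table can provide {needed_columns}")
    return min(candidates, key=lambda t: t.row_count).name


print(route_query(["month"]))
print(route_query(["store", "month"]))
print(route_query(["product"]))
```

Routing to the smallest suitable table is the reason summary data speeds up queries: a monthly report never has to touch the detailed rows.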

Data Warehouse Components: ETL (Extract, Transform, and Load)

Extract, transform, and load (ETL) refers to a process in database usage, and especially in data warehousing, that:
★ Extracts data from outside sources
★ Transforms it to fit operational needs, which can include quality levels
★ Loads it into the end target (a database or, more specifically, an operational data store, data mart, or data warehouse)

Data Warehouse Components (Extract)

The first part of an ETL process involves extracting the data from the source systems. In many cases this is the most challenging aspect of ETL, since extracting data correctly sets the stage for the success of subsequent processes.
★ Most data warehousing projects consolidate data from different source systems.
★ Each separate system may also use a different data organization and/or format.
★ Common data source formats are relational databases and flat files, but sources may also include non-relational databases.
★ Extracted data can also be streamed and loaded on the fly into the destination database.
★ The goal of the extraction phase is to convert the data into a single format appropriate for transformation processing.

Data Warehouse Components (Transform)

The transform stage applies a series of rules or functions to the data extracted from the source in order to derive the data for loading into the end target. Typical transformations include:
★ Selecting only certain columns to load
★ Translating coded values
★ Encoding free-form values
★ Deriving a new calculated value
★ Sorting
★ Joining data from multiple sources and deduplicating the data
★ Aggregation
★ Generating surrogate key values
★ Transposing or pivoting
★ Splitting a column into multiple columns
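
Several of the transformations listed above can be combined in one short pass over the extracted rows. The following is a minimal sketch in plain Python. The input field names (`id`, `name`, `gender`, `qty`, `unit_price`) and the code table `GENDER_CODES` are assumptions chosen for illustration, not part of any specific ETL tool.

```python
from itertools import count

# Hypothetical code table used to translate coded values.
GENDER_CODES = {"M": "Male", "F": "Female"}


def transform(rows):
    """Apply a handful of typical ETL transformations to extracted rows.

    Each input row is assumed to be a dict with the keys
    id, name, gender, qty, and unit_price (other keys are ignored).
    """
    seen_ids = set()
    next_key = count(1)
    result = []
    for row in rows:
        # Deduplicate on the source identifier.
        if row["id"] in seen_ids:
            continue
        seen_ids.add(row["id"])

        # Split one column into multiple columns.
        first_name, _, last_name = row["name"].partition(" ")

        # Select only certain columns: anything not copied here is dropped.
        result.append(
            {
                "sk": next(next_key),  # generated surrogate key
                "source_id": row["id"],
                "first_name": first_name,
                "last_name": last_name,
                # Translate a coded value into its description.
                "gender": GENDER_CODES.get(row["gender"], "Unknown"),
                # Derive a new calculated value.
                "total": round(row["qty"] * row["unit_price"], 2),
            }
        )

    # Sort by the derived value, largest first.
    result.sort(key=lambda r: r["total"], reverse=True)
    return result


extracted = [
    {"id": 10, "name": "Ada Lovelace", "gender": "F", "qty": 2, "unit_price": 9.5},
    {"id": 11, "name": "Alan Turing", "gender": "M", "qty": 1, "unit_price": 30.0},
    {"id": 10, "name": "Ada Lovelace", "gender": "F", "qty": 2, "unit_price": 9.5},
]
for record in transform(extracted):
    print(record)
```

Surrogate keys are assigned in arrival order, before sorting, so they stay independent of both the source identifier and the final row order.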

Data Warehouse Components (Load)

The load phase loads the data into the end target, usually the data warehouse (DW). Depending on the requirements of the organization, this process varies widely. Some data warehouses may overwrite existing information with cumulative information; updating extracted data is frequently done on a daily, weekly, or monthly basis. Other data warehouses (or even other parts of the same data warehouse) may add new data in a historical form at regular intervals, for example hourly. To understand this, consider a data warehouse that is required to maintain sales records of the last year. This data warehouse overwrites any data older than a year with newer data.

Data Warehouse Architectures

Data warehouses and their architectures vary depending upon the specifics of an organization's situation. Three common architectures are:
★ Data Warehouse Architecture (Basic)
★ Data Warehouse Architecture (with a Staging Area)
★ Data Warehouse Architecture (with a Staging Area and Data Marts)

Data Warehouse Architecture (Basic)

End users directly access data derived from several source systems through the data warehouse. The figure for this architecture illustrates three things:
★ Data sources (operational systems and files)
★ Warehouse (metadata, summary data, and raw data)
★ Users (analysis, reporting, and mining)

[Figure: Data Warehouse Architecture (Basic)]

Data Warehouse Architecture (with a Staging Area)

You need to clean and process your operational data before putting it into the warehouse. You can do this programmatically, although most data warehouses use a staging area instead. A staging area simplifies building summaries and general warehouse management.

Staging area: a place where data is processed before entering the warehouse.

The figure for this architecture illustrates four things:
★ Data sources (operational systems and files)
★ Staging area (where data sources go before the warehouse)
★ Warehouse (metadata, summary data, and raw data)
★ Users (analysis, reporting, and mining)
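
What "cleaning and processing" in a staging area can mean in practice is sketched below in plain Python: records from different sources are normalized into one format, and records that cannot be repaired are held back instead of entering the warehouse. The field names, the two accepted date formats, and the rejection rules are assumptions made for this example.

```python
from datetime import datetime

# Date formats the source systems are assumed to use.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def clean_record(raw):
    """Return a normalized copy of a raw record, or None if it cannot be repaired."""
    parsed_date = None
    for fmt in DATE_FORMATS:
        try:
            parsed_date = datetime.strptime(raw["date"].strip(), fmt).date()
            break
        except ValueError:
            continue
    if parsed_date is None:
        return None

    try:
        amount = float(raw["amount"])
    except (TypeError, ValueError):
        return None

    return {
        "date": parsed_date.isoformat(),         # one date format for the warehouse
        "store": raw["store"].strip().upper(),   # consistent spelling of store names
        "amount": amount,
    }


def run_staging(raw_records):
    """Split raw records into rows ready for the warehouse and rejected rows."""
    ready, rejected = [], []
    for raw in raw_records:
        cleaned = clean_record(raw)
        if cleaned is None:
            rejected.append(raw)   # kept in staging for inspection
        else:
            ready.append(cleaned)
    return ready, rejected


raw_records = [
    {"date": "2024-03-01", "store": " paris ", "amount": "12.50"},
    {"date": "02/03/2024", "store": "Rome", "amount": 8},
    {"date": "March 3", "store": "Oslo", "amount": "5"},
]
ready, rejected = run_staging(raw_records)
print(ready)
print(rejected)
```

Keeping rejected rows in the staging area, rather than silently dropping them, lets data quality problems be traced back to the source system.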

Data Warehouse Architecture (with a Staging Area and Data Marts)

You may want to customize your warehouse's architecture for different groups within your organization. You can do this by adding data marts, which are systems designed for a particular line of business. In the example for this architecture, purchasing, sales, and inventories are separated. A financial analyst might want to analyze historical data for purchases and sales.

The figure for this architecture illustrates five things:
★ Data sources (operational systems and flat files)
★ Staging area (where data sources go before the warehouse)
★ Warehouse (metadata, summary data, and raw data)
★ Data marts (purchasing, sales, and inventory)
★ Users (analysis, reporting, and mining)

Data Warehousing Software Tools