This paper includes an application that was implemented at my college. It explains data warehousing and data mining and gives a full description of the project.
INDEX
1. ABSTRACT
2. DATA WAREHOUSING: Introduction, Need of Data Warehousing, Purpose of Data Warehousing, Characteristics, Life cycle, Architecture, Tools and technologies, Applications
3. DATA MINING: Introduction, Types of Data Mining, Major elements of Data Mining, Data Mining: A KDD process, Steps in KDD process, Methods of Data Mining
4. PROJECT ON DATA MINING: Website Data Mining: Aim of project, Implementation, Working, Advantages
5. CONCLUSION
6. REFERENCE
DATA WAREHOUSING AND DATA MINING
ABSTRACT: Fast, accurate and scalable data analysis techniques are needed to extract useful information from huge piles of data. A data warehouse is a single, integrated source of decision-support information, formed by collecting data from multiple sources, both internal and external to the organization, and transforming and summarizing this information to enable improved decision making. A data warehouse is designed for easy access by users to large amounts of information, and data access is typically supported by specialized analytical tools and applications. Typical applications include decision support systems and executive information systems. Data mining is the exploration and analysis of large quantities of data in order to discover valid, novel, potentially useful, and ultimately understandable patterns in data. It is “an information extraction activity whose goal is to discover hidden facts contained in databases”.
It is also described as the process of extracting valid, previously unknown, comprehensible and actionable information from large databases and using it to make crucial business decisions. The project entitled “Website Data Mining” is an application of data mining built to help website developers create effective websites on the Internet.
Data mining finds patterns and subtle relationships in data and infers rules that allow the prediction of future results; in other words, it produces output values for a given set of input values. Typical applications include market segmentation, customer profiling, fraud detection, evaluation of retail promotions, and credit risk analysis.
DATA WAREHOUSING
Increasingly, organizations are analyzing current and historical data to identify useful patterns and support business strategies. A large amount of the right information is the key to survival in today’s competitive environment, and this kind of information can be made available only if there is a totally integrated enterprise data warehouse.
What is data warehousing?
A data warehouse is a subject-oriented, integrated, non-volatile and time-variant collection of data in support of management’s decisions.
Need for Data Warehousing:
• IT or business staff spend a lot of time developing special reports for decision makers.
• Many PC-based or small server systems obtain extracts of data but cannot present a holistic view of the entire gamut of information.
• The same data are present on different systems in different departments, and users may be unaware of this fact.
• It is difficult to get meaningful information in a timely manner.
• Multiple systems give different answers to the same business questions.
• Decision makers and policy planners do less analysis because sophisticated tools and easily decipherable, timely and comprehensive information are not available.
Purpose of Data Warehousing:
• Better business intelligence for end users.
• Reduction in time to access and analyze information.
• Consolidation of disparate information sources.
• Replacement of older, less-responsive decision support systems.
• Faster time to market for products and services.
Data Warehouse Characteristics:
1. Subject-oriented: the warehouse is organized around the major subjects of the enterprise rather than the major application areas. This is reflected in the need to store decision-support data rather than application-oriented data.
2. Integrated: the source data come together from different enterprise-wide application systems. The source data are often inconsistent (for example, they use different formats), so the integrated data source must be made consistent to present a unified view of the data to the users.
3. Time-variant: the source data in the warehouse are only accurate and valid at some point in time or over some time interval. The time-variance of the data warehouse is also shown in the extended time that the data are held, the implicit or explicit association of time with all data, and the fact that the data represent a series of snapshots.
4. Non-volatile: data are not updated in real time but are refreshed from the operational systems on a regular basis. New data are always added as a supplement to the database rather than as a replacement. The database continually absorbs this new data, incrementally integrating it with previous data.
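The time-variant and non-volatile characteristics can be illustrated with a small append-only store. This is a minimal in-memory sketch in Java, not how any particular warehouse product stores data; the class and method names (SnapshotStore, append, valueAsOf) are hypothetical. Each refresh adds dated snapshots instead of overwriting earlier values, so a measure can be read as it stood on any past date.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Hypothetical append-only store: rows are only ever added (non-volatile),
// and every row carries the date of its snapshot (time-variant).
final class SnapshotStore {

    // One row: the value of a measure for a subject on a snapshot date.
    static final class Snapshot {
        final LocalDate date;
        final String subject;
        final double value;

        Snapshot(LocalDate date, String subject, double value) {
            this.date = date;
            this.subject = subject;
            this.value = value;
        }
    }

    private final List<Snapshot> rows = new ArrayList<>();

    // The only write operation: a regular refresh appends new snapshots.
    void append(LocalDate date, String subject, double value) {
        rows.add(new Snapshot(date, subject, value));
    }

    // Read-only view, so callers cannot rewrite history.
    List<Snapshot> history() {
        return Collections.unmodifiableList(rows);
    }

    // Value as known on a given date: the latest snapshot on or before it.
    Optional<Double> valueAsOf(String subject, LocalDate date) {
        Snapshot best = null;
        for (Snapshot s : rows) {
            if (s.subject.equals(subject) && !s.date.isAfter(date)
                    && (best == null || s.date.isAfter(best.date))) {
                best = s;
            }
        }
        return best == null ? Optional.empty() : Optional.of(best.value);
    }

    public static void main(String[] args) {
        SnapshotStore store = new SnapshotStore();
        store.append(LocalDate.of(2011, 1, 31), "sales", 100.0);
        store.append(LocalDate.of(2011, 2, 28), "sales", 120.0); // added, not overwritten
        System.out.println(store.valueAsOf("sales", LocalDate.of(2011, 2, 15))); // Optional[100.0]
        System.out.println(store.history().size()); // 2
    }
}
```

Keeping both snapshots is what lets the January figure still be reported after the February refresh.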
DATA WAREHOUSE LIFE CYCLE:
Data warehousing is a concept. It is not a product that can be purchased off the shelf. It is a set of hardware and software components integrated together, which can be used to analyze massive amounts of stored data efficiently. It is a process through which one can build a successful data warehouse. The following are the five steps toward building a successful data warehouse:
1) JUSTIFICATION
2) REQUIREMENT ANALYSIS
3) DESIGN
4) DEVELOPMENT & IMPLEMENTATION
5) DEPLOYMENT
DATA WAREHOUSE ARCHITECTURE:
[Figure: Typical architecture of a data warehouse. Components shown: operational data sources 1 to n and an operational data store (ODS); load manager; warehouse manager; query manager; DBMS holding meta-data, detailed data, lightly summarized data, highly summarized data, and archive/backup data; end-user access tools (reporting, query, application development and EIS (executive information system) tools; OLAP (online analytical processing) tools; data mining tools).]
Main Components:
• Operational data sources: the data for the warehouse is supplied from mainframe operational data held in first-generation hierarchical and network databases, departmental data held in proprietary file systems, private data held on workstations and private servers, and external systems such as the Internet, commercially available databases, or databases associated with an organization’s suppliers or customers.
• Operational data store (ODS): a repository of current and integrated operational data used for analysis. It is often structured and supplied with data in the same way as the data warehouse, but may in fact simply act as a staging area for data to be moved into the warehouse.
• Load manager: also called the frontend component, it performs all the operations associated with the extraction and loading of data into the warehouse. These operations include simple transformations of the data to prepare the data for entry into the warehouse.
• Warehouse manager: performs all the operations associated with the management of the data in the warehouse. The operations performed by this component include analysis of data to ensure consistency, transformation and merging of source data, creation of indexes and views, generation of denormalizations and aggregations, and archiving and backing up data.
• Query manager: also called the backend component, it performs all the operations associated with the management of user queries. The operations performed by this component include directing queries to the appropriate tables and scheduling the execution of queries.
• End-user access tools: can be categorized into five main groups: data reporting and query tools, application development tools, executive information system (EIS) tools, online analytical processing (OLAP) tools, and data mining tools.
Tools and Technologies:
• The critical steps in the construction of a data warehouse are:
a. Extraction
b. Cleansing
c. Transformation
After these critical steps, loading the results into the target system can be carried out either by separate products or by a single, integrated solution. Integrated solutions fall into the following categories:
• Code generators
• Database data replication tools
• Dynamic transformation engines
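The extraction, cleansing, transformation and loading steps can be sketched in a few methods. This is a minimal illustration in Java under stated assumptions: two hypothetical operational sources deliver comma-separated customer rows, one coding gender as M/F and the other as male/female; the class name MiniEtl and its cleansing rules are invented for the example. Tools in the categories above perform the same kind of work at far larger scale.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical extract, cleanse, transform and load steps. Two operational
// sources describe customers as "id,name,gender" with different conventions.
final class MiniEtl {

    // Extraction: split each source line into fields (-1 keeps empty fields).
    static List<String[]> extract(List<String> sourceLines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : sourceLines) {
            rows.add(line.split(",", -1));
        }
        return rows;
    }

    // Cleansing: drop malformed rows and rows without a customer id.
    static List<String[]> cleanse(List<String[]> rows) {
        List<String[]> clean = new ArrayList<>();
        for (String[] r : rows) {
            if (r.length == 3 && !r[0].trim().isEmpty()) {
                clean.add(r);
            }
        }
        return clean;
    }

    // Transformation: trim fields and map both gender codings to M, F or U (unknown).
    static String[] transform(String[] r) {
        String g = r[2].trim().toUpperCase(Locale.ROOT);
        String gender = g.startsWith("M") ? "M" : g.startsWith("F") ? "F" : "U";
        return new String[] {r[0].trim(), r[1].trim(), gender};
    }

    // Loading: append the transformed rows to the target table (never overwrite).
    static void load(List<String[]> target, List<String[]> rows) {
        for (String[] r : rows) {
            target.add(transform(r));
        }
    }

    public static void main(String[] args) {
        List<String> salesSystem = Arrays.asList("101,Asha,F", "102,Ravi,M", ",Unknown,M");
        List<String> crmSystem = Arrays.asList("103, Meena ,female", "104,Kiran,male");

        List<String[]> warehouse = new ArrayList<>();
        load(warehouse, cleanse(extract(salesSystem)));
        load(warehouse, cleanse(extract(crmSystem)));

        for (String[] row : warehouse) {
            System.out.println(String.join(" | ", row)); // e.g. "103 | Meena | F"
        }
    }
}
```

The row without an id is removed during cleansing, and after transformation both sources use the same gender coding, which is the "integrated" property described earlier.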
Applications:
• Online Transaction Processing (OLTP):
OLTP systems are the major kind of enterprise application. Examples: order entry systems, inventory control systems, reservation systems, point-of-sale systems, tracking systems, etc.
• Executive information systems (EIS):
EIS present information at the highest level of summarization using corporate business measures. They are designed for extreme ease of use and, in many cases, only a mouse is required. Graphics are usually generously incorporated to provide at-a-glance indications of performance.
• Decision support systems (DSS):
DSS ideally present information in graphical and tabular form, providing the user with the ability to drill down on selected information. Compared with EIS, they offer increased detail and more data manipulation options.
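Drill-down, mentioned for DSS above, means moving from a summary grouped by a coarse dimension to one grouped by a finer dimension. This is a minimal in-memory sketch in Java, assuming hypothetical sales rows with region, city and amount; the class and method names are invented for illustration, and a real DSS would run the equivalent GROUP BY queries against the warehouse instead.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sales facts used to illustrate drill-down: start from totals
// per region, then expand one selected region into its cities.
final class DrillDown {

    static final class Sale {
        final String region;
        final String city;
        final double amount;

        Sale(String region, String city, double amount) {
            this.region = region;
            this.city = city;
            this.amount = amount;
        }
    }

    // Summary level: total amount per region.
    static Map<String, Double> totalsByRegion(List<Sale> sales) {
        Map<String, Double> totals = new TreeMap<>();
        for (Sale s : sales) {
            totals.merge(s.region, s.amount, Double::sum);
        }
        return totals;
    }

    // Detail level: total amount per city inside the selected region.
    static Map<String, Double> drillDown(List<Sale> sales, String region) {
        Map<String, Double> totals = new TreeMap<>();
        for (Sale s : sales) {
            if (s.region.equals(region)) {
                totals.merge(s.city, s.amount, Double::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Sale> sales = Arrays.asList(
                new Sale("North", "Delhi", 50),
                new Sale("North", "Jaipur", 20),
                new Sale("North", "Delhi", 30),
                new Sale("South", "Chennai", 40));

        System.out.println(totalsByRegion(sales));     // {North=100.0, South=40.0}
        System.out.println(drillDown(sales, "North")); // {Delhi=80.0, Jaipur=20.0}
    }
}
```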
DATA MINING
What is data mining?
Data mining refers to the process of analyzing data from different perspectives and summarizing it into useful information. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified.
Definition: Data mining is the process of finding correlations or patterns among fields in large relational databases. It is also defined as “the process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make crucial business decisions”.
Different Types of Data Mining: business, scientific and Internet data mining.
Five major elements of Data Mining:
1. Extract, transform, and load transaction data onto the data warehouse system.
2. Store and manage the data in a multidimensional database system.
3. Provide data access to business analysts and IT professionals.
4. Analyze the data with application software.
5. Present the data in a useful format, such as a graph or table.
DATA MINING: A KDD Process:
Steps of KDD Process:
1. Learning the application domain: relevant prior knowledge and goals of the application.
2. Creating a target data set: data selection.
3. Data cleaning and preprocessing.
4. Data reduction and transformation: finding useful features, dimensionality or variable reduction, and invariant representation.
5. Choosing the functions of data mining: summarization, classification, regression, association, clustering.
6. Choosing the mining algorithm(s).
7. Data mining: search for patterns of interest.
8. Pattern evaluation and knowledge presentation: visualization, transformation, removing redundant patterns, etc.
9. Use of discovered knowledge.
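The KDD steps above can be traced through a deliberately tiny Java pipeline. Everything in it is an assumption for illustration: the input is a hypothetical list of raw survey answers, selection keeps non-null answers, cleaning trims and drops blanks, transformation lowercases spelling variants, the mining step is a plain frequency count, and evaluation keeps answers that reach a minimum count. Real KDD projects use far richer methods at each step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, heavily simplified walk through the KDD steps on raw survey
// answers to a question such as "Which site do you use most?".
final class KddSketch {

    // Data selection: keep only answers that were actually given.
    static List<String> select(List<String> raw) {
        List<String> out = new ArrayList<>();
        for (String a : raw) {
            if (a != null) out.add(a);
        }
        return out;
    }

    // Cleaning and preprocessing: trim whitespace and drop blank answers.
    static List<String> clean(List<String> answers) {
        List<String> out = new ArrayList<>();
        for (String a : answers) {
            String t = a.trim();
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Transformation: map case variants of the same answer to one form.
    static List<String> transform(List<String> answers) {
        List<String> out = new ArrayList<>();
        for (String a : answers) out.add(a.toLowerCase(Locale.ROOT));
        return out;
    }

    // Data mining: count how often each answer occurs.
    static Map<String, Integer> mine(List<String> answers) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String a : answers) counts.merge(a, 1, Integer::sum);
        return counts;
    }

    // Pattern evaluation: keep only answers given at least minCount times.
    static Map<String, Integer> evaluate(Map<String, Integer> counts, int minCount) {
        Map<String, Integer> out = new TreeMap<>();
        counts.forEach((k, v) -> { if (v >= minCount) out.put(k, v); });
        return out;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("Facebook", " facebook ", null, "", "Orkut", "FACEBOOK", "Twitter", "orkut");
        Map<String, Integer> patterns = evaluate(mine(transform(clean(select(raw)))), 2);
        System.out.println(patterns); // {facebook=3, orkut=2}
    }
}
```

Each stage is a separate method so that, as in the KDD process, any one step can be revised and the later steps rerun.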
Methods of Data Mining:
1. Classification
2. Regression
3. Clustering
4. Association rules
5. Visualization
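Of the methods listed, association rules are the easiest to show in a few lines. This Java sketch computes the two standard measures for a rule X → Y: support, the fraction of transactions that contain a given set of items, and confidence, support(X ∪ Y) / support(X). The market baskets are hypothetical, and a full rule miner such as Apriori would also search for every itemset above a minimum support instead of scoring one rule at a time.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Support and confidence for association rules X -> Y over hypothetical baskets.
final class AssociationRules {

    // support(items) = fraction of transactions containing every one of the items.
    static double support(List<Set<String>> transactions, Set<String> items) {
        int hits = 0;
        for (Set<String> t : transactions) {
            if (t.containsAll(items)) hits++;
        }
        return (double) hits / transactions.size();
    }

    // confidence(X -> Y) = support(X union Y) / support(X); 0 if X never occurs.
    static double confidence(List<Set<String>> transactions, Set<String> x, Set<String> y) {
        Set<String> both = new HashSet<>(x);
        both.addAll(y);
        double sx = support(transactions, x);
        return sx == 0 ? 0.0 : support(transactions, both) / sx;
    }

    // Small helper to build an item set.
    static Set<String> items(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    public static void main(String[] args) {
        List<Set<String>> baskets = Arrays.asList(
                items("bread", "milk"),
                items("bread", "butter"),
                items("bread", "milk", "butter"),
                items("milk"));

        System.out.println(support(baskets, items("bread", "milk")));           // 0.5
        System.out.println(confidence(baskets, items("bread"), items("milk"))); // 2/3, about 0.667
    }
}
```

Here bread appears in three of four baskets and bread with milk in two, so the rule bread → milk has support 0.5 and confidence 2/3.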
PROJECT ON DATA MINING: “Website Data Mining”
We have created an application that performs data mining for website developers. The project has been implemented successfully on a local server and has received excellent feedback.
• Aim of the project:
To give the user a simple graph summarizing the information collected about websites.
• Implementation:
The data warehouse used for the project is information gathered through a survey. The data have been collected into a database, which contains information on many websites and is used by the application. This is a large database, built from questionnaires submitted by the users of those websites. The application we created is web-based. It creates a particular type of graph, such as a pie chart, line chart or bar graph, according to the parameters selected by the website builder. The parameter selection looks as shown in the figure below:
The constraints entered by the user are used to generate the charts, and the required data are extracted from the database efficiently. For example, if a website builder wants to know where social networking sites are used most according to the database, the result will look as below:
• Working:
JavaServer Pages (JSP) is used to program the application. The database is stored in Microsoft Access. For implementation purposes, a local Tomcat 6.0 server is used. For generating the charts in JSP, we made use of the JFreeChart library. Page navigation is used to take the inputs, in the following order:
index.jsp → genechart.jsp
In index.jsp, the parameters are taken from the user. These parameters are posted to the genechart.jsp file on the server, where fixed SQL queries retrieve the appropriate records. These records are used to build the charts. An example of the SQL code in the JSP is as follows:

    <%@ page import="java.sql.*" %>
    <%
    String url = "jdbc:odbc:Driver={Microsoft Access Driver (*.mdb)};DBQ=/FinalDB.mdb;DriverID=22;READONLY=true";
    Class.forName("sun.jdbc.odbc.JdbcOdbcDriver"); // JDBC-ODBC bridge, available up to Java 7
    Connection con = DriverManager.getConnection(url, "", "");
    Statement st = con.createStatement();
    ResultSet rs = st.executeQuery(sSql);       // sSql: the fixed query for the selected parameters
    int n = rs.getMetaData().getColumnCount();  // number of columns to print
    while (rs.next()) {
        out.println("<tr>");
        for (int i = 1; i <= n; i++) {          // Note: JDBC columns start at 1, not 0
            out.println("<td>" + rs.getString(i) + "</td>");
        }
        out.println("</tr>");
    }
    rs.close(); st.close(); con.close();
    %>

After the records are retrieved, an algorithm computes statistics over the data; these statistics are the values plotted in the chart. The charts are generated with the following code:

    // data is the DefaultPieDataset for the chart; cvi[] holds the statistic computed for each record
    int i = 0;
    while (rs3.next()) {
        data.setValue(rs3.getString(1), cvi[i++]);
    }
    JFreeChart chart = ChartFactory.createPieChart("Website survey", data, true, true, false); // illustrative title
    final ChartRenderingInfo info = new ChartRenderingInfo(new StandardEntityCollection());
    final File file1 = new File("../piechart3.png");
    ChartUtilities.saveChartAsPNG(file1, chart, 600, 400, info);

The generated chart is saved as a ‘.png’ image file, which is then displayed to the user as the output.
• Advantages:
The website builder can retrieve the factors he or she wants to know about before creating a site. The results of a large survey can be condensed from the records into a simple, understandable chart that can be used by the surveyors.
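The paper does not show the statistics algorithm used before charting. As one plausible reading, and purely an assumption, the values passed to data.setValue could be each answer's percentage share of all responses. This Java sketch computes such shares from (answer, count) pairs like those a GROUP BY query would return; the class name SurveyShares and the sample counts are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical statistics step: turn answer counts from the SQL query
// into percentage shares suitable for a pie chart.
final class SurveyShares {

    // answers.get(i) was chosen counts.get(i) times.
    static Map<String, Double> percentages(List<String> answers, List<Integer> counts) {
        if (answers.size() != counts.size()) {
            throw new IllegalArgumentException("answers and counts differ in length");
        }
        int total = 0;
        for (int c : counts) total += c;

        // LinkedHashMap keeps the query's row order for the chart legend.
        Map<String, Double> shares = new LinkedHashMap<>();
        for (int i = 0; i < answers.size(); i++) {
            double pct = total == 0 ? 0.0 : 100.0 * counts.get(i) / total;
            shares.put(answers.get(i), pct);
        }
        return shares;
    }

    public static void main(String[] args) {
        Map<String, Double> shares = percentages(
                Arrays.asList("Social networking", "Shopping", "News"),
                Arrays.asList(50, 30, 20));
        System.out.println(shares); // {Social networking=50.0, Shopping=30.0, News=20.0}
    }
}
```

A pie chart would render the raw counts in the same proportions, so percentages mainly help when the numbers are also shown as labels.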
CONCLUSION
Data warehousing provides the means to change raw data into information for making effective business decisions; the emphasis is on information, not data. The data warehouse is the hub for decision-support data. Data mining is a useful tool with multiple algorithms that can be tuned for specific tasks, and it can benefit business, medicine, and science. More efficient algorithms are still needed to speed up the data mining process.