Big Data Analytics
This course provides practical, foundation-level training that enables immediate and effective participation in big data projects. By the end of the course, students will be familiar with the fundamental concepts of Big Data management and analytics; will be able to recognize the challenges faced by applications that handle very large volumes of data and to propose scalable solutions for them; and will understand how Big Data affects business intelligence, scientific discovery, and day-to-day life.
Core topics:
Introduction to the Big Data problem: current challenges, trends, and applications
Technologies for Big Data management
Hands-on prototype projects that show how these technologies actually work
Module 1: Introduction to Big Data Analytics
The evolution of data management, defining Big Data, traditional and advanced analytics. History of Big Data, its elements, career-related knowledge, advantages, and disadvantages. Application perspective of Big Data, covering topics such as using Big Data in marketing, analytics, retail, hospitality, consumer goods, defense, etc.
Module 2: Introduction to Big Data and Hadoop ecosystem
This module focuses on Data Explosion, Types of Data, Need for Big Data, Big Data and Its Sources, Characteristics of Big Data Technology, Leveraging Multiple Sources of Data, Hadoop/Spark-based technologies for Handling Big Data.
Module 3: Interactive analysis
Hive: Introducing Hive, Getting Started with Hive, Hive Variables, Hive Properties, Data types in Hive, Loading Files into Tables, Application in Hive, Inserting Data into Tables, Update in Hive.
Introduction to schema on write, dimensional models to exploit/analyse business metrics, data pond for analysis, metrics and KPIs, drill-down/roll-up and slice/dice for big data, implementing a sales analysis system with Hive.
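As a rough illustration of the roll-up and slice operations listed above, the following is a minimal sketch in plain Python rather than Hive. The sales records, the dimension names, and the helper functions roll_up and slice_rows are all hypothetical and exist only for this example.

```python
from collections import defaultdict

# Hypothetical sales facts: (region, product, quarter, amount)
SALES = [
    ("North", "Laptop", "Q1", 1200.0),
    ("North", "Phone", "Q1", 800.0),
    ("South", "Laptop", "Q1", 500.0),
    ("South", "Laptop", "Q2", 700.0),
    ("North", "Phone", "Q2", 300.0),
]
DIMENSIONS = ("region", "product", "quarter")


def roll_up(rows, keep):
    """Sum the amount, grouping only by the dimensions named in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[3]
    return dict(totals)


def slice_rows(rows, dimension, value):
    """Keep only the rows where one dimension has a fixed value."""
    i = DIMENSIONS.index(dimension)
    return [r for r in rows if r[i] == value]


# Roll up product and quarter away, leaving totals per region.
by_region = roll_up(SALES, ["region"])

# Slice on quarter = Q1, then roll up to totals per product.
q1_by_product = roll_up(slice_rows(SALES, "quarter", "Q1"), ["product"])
```

In Hive the same idea would typically be expressed with GROUP BY for the roll-up and a WHERE clause for the slice; the Python version only shows the logic.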
Module 4: Advanced Analytics (structured and time series analysis)
Introduction to Analysis Base Tables, dimension reduction, ETL for analysis, data pond for analytics, implementing a customer profiling system with SparkSQL.
HBase: Introduction, Characteristics of HBase, Companies Using HBase, HBase Architecture, Storage Model of HBase, Row Distribution of Data between Region Servers, Data Storage in HBase, Data Model, HBase vs. RDBMS, implementing a time series analysis system for IoT.
Module 5: Text Analytics
Pig: Introducing Pig, the Pig Architecture, Benefits of Pig, Installing Pig, Properties of Pig, Running Pig, Running Pig Programs, Pig Latin Structure, Application Flow.
Text analysis, tokenizing, filtering, scoring, corpus creation, implementing a sentiment analysis system for Tweets.
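The tokenizing, filtering, and scoring steps named above can be sketched as a small pipeline. This is a minimal illustration in plain Python, not the course's Pig implementation; the stop-word list and the positive and negative word lists are tiny hypothetical lexicons chosen for the example.

```python
import re

# Hypothetical, deliberately tiny word lists for illustration only.
STOPWORDS = {"the", "a", "is", "this", "and", "but", "i"}
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "slow", "hate"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def filter_tokens(tokens):
    """Drop stop words that carry no sentiment."""
    return [t for t in tokens if t not in STOPWORDS]


def score(tokens):
    """Add 1 for each positive word and subtract 1 for each negative word."""
    return sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)


def sentiment(text):
    """Run the full tokenize -> filter -> score pipeline on one tweet."""
    return score(filter_tokens(tokenize(text)))
```

A score above zero suggests positive sentiment and a score below zero suggests negative sentiment; a real system would use a much larger lexicon or a trained model.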
Module 6: Real time Analytics
Introduction to Lambda architecture, aggregations, anomaly detection, CEP, batch layer models, real-time thresholds, implementing a real-time analytics system for impressions.
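One simple way to picture a real-time threshold for anomaly detection on impression counts is a sliding-window check. The sketch below is an assumption-laden illustration, not the method taught in the module: the class name ThresholdDetector, the window size, the multiplier k, and the sample stream are all hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


class ThresholdDetector:
    """Flag a count that deviates from the recent window by more than k sigma."""

    def __init__(self, window=5, k=3.0):
        self.history = deque(maxlen=window)  # most recent counts only
        self.k = k

    def observe(self, count):
        is_anomaly = False
        # Only judge once the window is full, so the baseline is stable.
        if len(self.history) == self.history.maxlen:
            mu = mean(self.history)
            sigma = pstdev(self.history)
            # Floor sigma at 1.0 so a perfectly flat window does not
            # flag every tiny change.
            is_anomaly = abs(count - mu) > self.k * max(sigma, 1.0)
        # Note: the observed count joins the window even when flagged.
        self.history.append(count)
        return is_anomaly


# Hypothetical per-minute impression counts with one spike.
detector = ThresholdDetector(window=5, k=3.0)
stream = [100, 102, 98, 101, 99, 100, 400, 101]
flags = [detector.observe(c) for c in stream]
```

Here only the spike of 400 is flagged. Because flagged values still enter the window, a spike temporarily widens the threshold; excluding them is a design choice a real system would need to weigh.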
Dr. Sridhar Vaithianathan
[email protected] u.in