Big Data Analytics Course Guide TOC

2016
Big Data Technologies
Hadoop and Analytics
Course Guide
Venue:
Indian Institute of Corporate Affairs (IICA)
(Under Ministry of Corporate Affairs)
Plot No. 6,7,8 Sector 5
IMT Manesar, Gurgaon
Haryana

Big Data Technologies • HADOOP • Analytics IICA
Centre for e-Governance • Indian Institute of Corporate Affairs
Hands on with Big Data Technologies and Analytics
Center for e-Governance
Indian Institute of Corporate Affairs
(Under Ministry of Corporate Affairs)
Plot No. 6,7,8 Sector 5
IMT Manesar, Gurgaon
Haryana
Website: http://www.iica.in
Updated Dec 2016

Table of Contents
Module 1 - Introduction to Linux
- Linux as a prerequisite for Big Data and Hadoop
- Overview of Linux Operating System
- Understanding the Linux command line
- Linux Commands and Shell Scripts
- Working with Linux GUI
- Exercises
Module 2 - Understanding Big Data
- Introduction to Big Data Technologies
- The 3 Vs of Big Data (Volume, Variety and Velocity)
- Structured and Unstructured Data
- Centralized vs. Distributed computing
- Applications and use cases of Big Data
- Opportunities and challenges of Big Data
Module 3 - Getting started with Hadoop
- What is Hadoop, and why is it popular
- Overview of Apache Bigtop and Hadoop installation
- Hadoop configuration files
- Overview of Hadoop Vendor Distributions
- Distributed File Systems (DFS)
- Various types of DFS
- Getting familiar with Hadoop Virtual Machine Environment
- Hadoop Ecosystem Tools and Components
- Hadoop Command line (CLI) and Graphical interface (GUI)
- Exercises
Module 4 - Understanding the Hadoop Architecture
- Name Node and Data Nodes
- Difference between Hadoop 1.x and 2.x
- Hadoop Distributed File System (HDFS)
- HDFS Overview and Architecture
- HDFS Data Flows (Read and Write)
- HDFS Interfaces - Command Line Interface, File System, Administrative and Web Interface
- Copying data into HDFS, and working with data in HDFS
- Advanced HDFS features such as Data replication, Rack awareness, and Fuse-DFS
- Overview of HDFS Federation, High Availability, DistCp and Hadoop Archives
- Exercises

Module 5 - YARN and MapReduce
- Functional Programming paradigms
- What is MapReduce
- Shuffling and Sorting
- YARN Resource Manager UI
- Standalone, Pseudo distributed, and Fully distributed mode
- MapReduce v1 compared to YARN and MapReduce v2
- Examples of MapReduce programs
- Exercises
Module 6 - Data Ingestion in HDFS
- Importing data to HDFS
- Introduction to Sqoop
- Sqoop configuration
- Ingesting data in HDFS using Sqoop
- Exporting data to RDBMS
- Introduction to Flume
- Flume configuration
- Capturing data in real-time using Flume
- Exercises
Module 7 - Working with Hive
- Introduction to Hive and its Architecture
- Different Modes of executing Hive queries
- HiveQL (DDL & DML Operations)
- External vs. Managed Tables
- Hive vs. Impala
- User-Defined Functions (UDFs)
- Exercises
Module 8 - Working with Pig
- Different Modes of executing Pig
- Pig Data Types
- Pig Latin language Constructs (LOAD, STORE, DUMP, SPLIT etc.)
- User-Defined Functions (UDFs)
- Developing and deploying Pig programs
- Exercises
Module 9 - Getting familiar with Apache Hadoop Ecosystem Tools
- Introduction to Oozie workflows, designs and deployments
- Apache Mahout, and Building a Recommender using Mahout
- Introduction to Avro, Kafka, Storm, and ZooKeeper
- Exercises

Module 10 - Introduction to NoSQL Databases
- Review of RDBMS
- Need for NoSQL
- Brewer's CAP Theorem
- ACID vs. BASE
- Schema on Read vs. Schema on Write
- Different levels of consistency
- Different types of NoSQL databases
- Exercises
Module 11 - Working with NoSQL Databases
- Document stores - Couchbase, MongoDB
- Graph databases - Neo4j
- Key-value stores - Riak
- Column Family - Cassandra, HBase
- Overview of Hybrid NoSQL Databases
- Exercises
Module 12 - Working with Apache Spark
- Understanding Spark Architecture
- Comparing Hadoop and Spark
- Introduction to RDD
- Spark SQL
- Sample programs in Spark
- Exercises
Module 13 - Introduction to Data Analytics
- Difference between Data Analysis and Analytics
- Types of Analytics
- Big Data Analytics
- Business Analytics
- Predictive Analytics
- Real-Time Analytics
- Web Analytics
- Customized Analytics Solutions
- Exercises
Module 14 - Big Data Proof of Concepts and Use Cases
- Text Mining
- Traditional case of Watson
- Sentiment Analysis
- Weather Data Analysis
- Trending Topics and Conclusion
- Exercises
