2016
Big Data Technologies
Hadoop and Analytics
Course Guide
Venue:
Indian Institute of Corporate Affairs (IICA)
(Under Ministry of Corporate Affairs)
Plot No. 6,7,8 Sector 5
IMT Manesar, Gurgaon
Haryana
Big Data Technologies • HADOOP • Analytics IICA
Centre for e-Governance • Indian Institute of Corporate Affairs 2
Hands-on with Big Data Technologies and Analytics
Centre for e-Governance
Indian Institute of Corporate Affairs
(Under Ministry of Corporate Affairs)
Plot No. 6,7,8 Sector 5
IMT Manesar, Gurgaon
Haryana
Website: http://www.iica.in
Updated Dec 2016
Table of Contents
Module 1 - Introduction to Linux
- Linux as a prerequisite for Big Data and Hadoop
- Overview of Linux Operating System
- Understanding the Linux command line
- Linux Commands and Shell Scripts
- Working with Linux GUI
- Exercises
Module 2 - Understanding Big Data
- Introduction to Big Data Technologies
- The 3 Vs of Big Data (Volume, Variety and Velocity)
- Structured and Unstructured Data
- Centralized vs. Distributed computing
- Applications and use cases of Big Data
- Opportunities and challenges of Big Data
Module 3 - Getting started with Hadoop
- What Hadoop is and why it is popular
- Overview of Apache Bigtop and Hadoop installation
- Hadoop configuration files
- Overview of Hadoop Vendor Distributions
- Distributed File Systems (DFS)
- Various types of DFS
- Getting familiar with Hadoop Virtual Machine Environment
- Hadoop Ecosystem Tools and Components
- Hadoop Command line (CLI) and Graphical interface (GUI)
- Exercises
Module 4 - Understanding the Hadoop Architecture
- Name Node and Data Nodes
- Difference between Hadoop 1.x and 2.x
- Hadoop Distributed File System (HDFS)
- HDFS Overview and Architecture
- HDFS Data Flows (Read and Write)
- HDFS Interfaces - Command Line, File System, Administrative and Web Interfaces
- Copying data into HDFS, and working with data in HDFS
- Advanced HDFS features such as Data Replication, Rack Awareness and FUSE-DFS
- Overview of HDFS Federation, High Availability, DistCp and Hadoop Archives
- Exercises
Module 5 - YARN and MapReduce
- Functional Programming paradigms
- What is MapReduce
- Shuffling and Sorting
- YARN Resource Manager UI
- Standalone, Pseudo-distributed and Fully distributed modes
- MapReduce v1 compared to YARN and MapReduce v2
- Examples of MapReduce programs
- Exercises
Module 6 - Data Ingestion in HDFS
- Importing data to HDFS
- Introduction to Sqoop
- Sqoop configuration
- Ingesting data in HDFS using Sqoop
- Exporting data to RDBMS
- Introduction to Flume
- Flume configuration
- Capturing data in real-time using Flume
- Exercises
Module 7 - Working with Hive
- Introduction to Hive and its Architecture
- Different Modes of executing Hive queries
- HiveQL (DDL & DML Operations)
- External vs. Managed Tables
- Hive vs. Impala
- User-Defined Functions (UDFs)
- Exercises
Module 8 - Working with Pig
- Different Modes of executing Pig
- Pig Data Types
- Pig Latin Language Constructs (LOAD, STORE, DUMP, SPLIT, etc.)
- User-Defined Functions (UDFs)
- Developing and deploying Pig programs
- Exercises
Module 9 - Getting familiar with Apache Hadoop Ecosystem Tools
- Introduction to Oozie workflows, designs and deployments
- Apache Mahout, and Building a Recommender using Mahout
- Introduction to Avro, Kafka, Storm and ZooKeeper
- Exercises
Module 10 - Introduction to NoSQL Databases
- Review of RDBMS
- Need for NoSQL
- Brewer's CAP Theorem
- ACID vs. BASE
- Schema on Read vs. Schema on Write
- Different levels of consistency
- Different types of NoSQL databases
- Exercises
Module 11 - Working with NoSQL Databases
- Document stores - Couchbase, MongoDB
- Graph databases - Neo4j
- Key-value stores - Riak
- Column Family - Cassandra, HBase
- Overview of Hybrid NoSQL Databases
- Exercises
Module 12 - Working with Apache Spark
- Understanding Spark Architecture
- Comparing Hadoop and Spark
- Introduction to RDD
- Spark SQL
- Sample programs in Spark
- Exercises
Module 13 - Introduction to Data Analytics
- Difference between Data Analysis and Analytics
- Types of Analytics
- Big Data Analytics
- Business Analytics
- Predictive Analytics
- Real-Time Analytics
- Web Analytics
- Customized Analytics Solutions
- Exercises
Module 14 - Big Data Proofs of Concept and Use Cases
- Text Mining
- Classic case study of IBM Watson
- Sentiment Analysis
- Weather Data Analysis
- Trending Topics and Conclusion
- Exercises

More Related Content

What's hot

2 bda module-2 apache hive
2 bda module-2 apache hive2 bda module-2 apache hive
2 bda module-2 apache hiveYashaswiniAS1
 
SQL on Hadoop
SQL on HadoopSQL on Hadoop
SQL on Hadoopnvvrajesh
 
Big Data on the Microsoft Platform
Big Data on the Microsoft PlatformBig Data on the Microsoft Platform
Big Data on the Microsoft PlatformAndrew Brust
 
Vijay_hadoop admin
Vijay_hadoop adminVijay_hadoop admin
Vijay_hadoop adminvijay vijay
 
Hadoop Architecture
Hadoop Architecture Hadoop Architecture
Hadoop Architecture Ganesh B
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Thanh Nguyen
 
Rapid Cluster Computing with Apache Spark 2016
Rapid Cluster Computing with Apache Spark 2016Rapid Cluster Computing with Apache Spark 2016
Rapid Cluster Computing with Apache Spark 2016Zohar Elkayam
 
Comparison among rdbms, hadoop and spark
Comparison among rdbms, hadoop and sparkComparison among rdbms, hadoop and spark
Comparison among rdbms, hadoop and sparkAgnihotriGhosh2
 
ATHOKPAM NABAKUMAR SINGH's HADOOP ADMIN
ATHOKPAM NABAKUMAR SINGH's HADOOP ADMINATHOKPAM NABAKUMAR SINGH's HADOOP ADMIN
ATHOKPAM NABAKUMAR SINGH's HADOOP ADMINAthokpam Nabakumar
 
Teradata Loom Introductory Presentation
Teradata Loom Introductory PresentationTeradata Loom Introductory Presentation
Teradata Loom Introductory Presentationmlang222
 
Taming Big Data with Big SQL 3.0
Taming Big Data with Big SQL 3.0Taming Big Data with Big SQL 3.0
Taming Big Data with Big SQL 3.0Nicolas Morales
 
Big data - Online Training
Big data - Online TrainingBig data - Online Training
Big data - Online TrainingLearntek1
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopVigen Sahakyan
 

What's hot (20)

2 bda module-2 apache hive
2 bda module-2 apache hive2 bda module-2 apache hive
2 bda module-2 apache hive
 
SQL on Hadoop
SQL on HadoopSQL on Hadoop
SQL on Hadoop
 
Big Data on the Microsoft Platform
Big Data on the Microsoft PlatformBig Data on the Microsoft Platform
Big Data on the Microsoft Platform
 
Hadoop
HadoopHadoop
Hadoop
 
SQL Server 2012 and Big Data
SQL Server 2012 and Big DataSQL Server 2012 and Big Data
SQL Server 2012 and Big Data
 
Vijay_hadoop admin
Vijay_hadoop adminVijay_hadoop admin
Vijay_hadoop admin
 
Hadoop Architecture
Hadoop Architecture Hadoop Architecture
Hadoop Architecture
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1
 
Hadoop
HadoopHadoop
Hadoop
 
Rapid Cluster Computing with Apache Spark 2016
Rapid Cluster Computing with Apache Spark 2016Rapid Cluster Computing with Apache Spark 2016
Rapid Cluster Computing with Apache Spark 2016
 
Comparison among rdbms, hadoop and spark
Comparison among rdbms, hadoop and sparkComparison among rdbms, hadoop and spark
Comparison among rdbms, hadoop and spark
 
Hadoop
HadoopHadoop
Hadoop
 
ATHOKPAM NABAKUMAR SINGH's HADOOP ADMIN
ATHOKPAM NABAKUMAR SINGH's HADOOP ADMINATHOKPAM NABAKUMAR SINGH's HADOOP ADMIN
ATHOKPAM NABAKUMAR SINGH's HADOOP ADMIN
 
Anju
AnjuAnju
Anju
 
Teradata Loom Introductory Presentation
Teradata Loom Introductory PresentationTeradata Loom Introductory Presentation
Teradata Loom Introductory Presentation
 
Taming Big Data with Big SQL 3.0
Taming Big Data with Big SQL 3.0Taming Big Data with Big SQL 3.0
Taming Big Data with Big SQL 3.0
 
Cloudera search
Cloudera searchCloudera search
Cloudera search
 
Big data - Online Training
Big data - Online TrainingBig data - Online Training
Big data - Online Training
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 

Viewers also liked

Қазақ ұлттық шешендік өнерінің тарихы
Қазақ ұлттық шешендік өнерінің тарихыҚазақ ұлттық шешендік өнерінің тарихы
Қазақ ұлттық шешендік өнерінің тарихыBilim All
 
Аппликация
АппликацияАппликация
АппликацияBilim All
 
Ban Josip Jelačić_Ema
Ban Josip Jelačić_EmaBan Josip Jelačić_Ema
Ban Josip Jelačić_EmaZeljka Ditrih
 
HP-UX 11iv3 Ignite-UX with NFSv4 and SSH Tunnel by Dusan Baljevic
HP-UX 11iv3 Ignite-UX with NFSv4 and SSH Tunnel by Dusan BaljevicHP-UX 11iv3 Ignite-UX with NFSv4 and SSH Tunnel by Dusan Baljevic
HP-UX 11iv3 Ignite-UX with NFSv4 and SSH Tunnel by Dusan BaljevicCircling Cycle
 
генетика оқулық
генетика оқулықгенетика оқулық
генетика оқулықAsem Sarsembayeva
 
Big Data & Text Mining
Big Data & Text MiningBig Data & Text Mining
Big Data & Text MiningMichel Bruley
 
а ну ка девочки
а ну ка девочкиа ну ка девочки
а ну ка девочкиsed49
 
Columbus Bar Association - Common Ethics Mistake Law Firms Make When Marketin...
Columbus Bar Association - Common Ethics Mistake Law Firms Make When Marketin...Columbus Bar Association - Common Ethics Mistake Law Firms Make When Marketin...
Columbus Bar Association - Common Ethics Mistake Law Firms Make When Marketin...Get Noticed Get Found
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 

Viewers also liked (12)

Estou contigo
Estou contigoEstou contigo
Estou contigo
 
Қазақ ұлттық шешендік өнерінің тарихы
Қазақ ұлттық шешендік өнерінің тарихыҚазақ ұлттық шешендік өнерінің тарихы
Қазақ ұлттық шешендік өнерінің тарихы
 
Аппликация
АппликацияАппликация
Аппликация
 
Ban Josip Jelačić_Ema
Ban Josip Jelačić_EmaBan Josip Jelačić_Ema
Ban Josip Jelačić_Ema
 
HP-UX 11iv3 Ignite-UX with NFSv4 and SSH Tunnel by Dusan Baljevic
HP-UX 11iv3 Ignite-UX with NFSv4 and SSH Tunnel by Dusan BaljevicHP-UX 11iv3 Ignite-UX with NFSv4 and SSH Tunnel by Dusan Baljevic
HP-UX 11iv3 Ignite-UX with NFSv4 and SSH Tunnel by Dusan Baljevic
 
Iot 융합기술 적용사례 및 발전전망(keti)
Iot 융합기술 적용사례 및 발전전망(keti)Iot 융합기술 적용사례 및 발전전망(keti)
Iot 융합기술 적용사례 및 발전전망(keti)
 
генетика оқулық
генетика оқулықгенетика оқулық
генетика оқулық
 
Big Data & Text Mining
Big Data & Text MiningBig Data & Text Mining
Big Data & Text Mining
 
а ну ка девочки
а ну ка девочкиа ну ка девочки
а ну ка девочки
 
LIÇÃO 10 - MANSIDÃO: TORNA O CRENTE APTO PARA EVITAR PELEJAS
LIÇÃO 10 - MANSIDÃO: TORNA O CRENTE APTO PARA EVITAR PELEJASLIÇÃO 10 - MANSIDÃO: TORNA O CRENTE APTO PARA EVITAR PELEJAS
LIÇÃO 10 - MANSIDÃO: TORNA O CRENTE APTO PARA EVITAR PELEJAS
 
Columbus Bar Association - Common Ethics Mistake Law Firms Make When Marketin...
Columbus Bar Association - Common Ethics Mistake Law Firms Make When Marketin...Columbus Bar Association - Common Ethics Mistake Law Firms Make When Marketin...
Columbus Bar Association - Common Ethics Mistake Law Firms Make When Marketin...
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 

Similar to Big Data Analytics Course Guide TOC

Hadoop training kit from lcc infotech
Hadoop   training kit from lcc infotechHadoop   training kit from lcc infotech
Hadoop training kit from lcc infotechlccinfotech
 
Hadoop 2.0-development
Hadoop 2.0-developmentHadoop 2.0-development
Hadoop 2.0-developmentKnowledgehut
 
Hadoop Adminstration with Latest Release (2.0)
Hadoop Adminstration with Latest Release (2.0)Hadoop Adminstration with Latest Release (2.0)
Hadoop Adminstration with Latest Release (2.0)Edureka!
 
Hadoop Administration pdf
Hadoop Administration pdfHadoop Administration pdf
Hadoop Administration pdfEdureka!
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft PlatformJesus Rodriguez
 
Overview of big data & hadoop v1
Overview of big data & hadoop   v1Overview of big data & hadoop   v1
Overview of big data & hadoop v1Thanh Nguyen
 
Hadoop_Architect__eVenkat
Hadoop_Architect__eVenkatHadoop_Architect__eVenkat
Hadoop_Architect__eVenkatVenkat Krishnan
 
The Nuts and Bolts of Hadoop and it's Ever-changing Ecosystem, Presented by J...
The Nuts and Bolts of Hadoop and it's Ever-changing Ecosystem, Presented by J...The Nuts and Bolts of Hadoop and it's Ever-changing Ecosystem, Presented by J...
The Nuts and Bolts of Hadoop and it's Ever-changing Ecosystem, Presented by J...NashvilleTechCouncil
 
Data Engineer's Lunch #55: Get Started in Data Engineering
Data Engineer's Lunch #55: Get Started in Data EngineeringData Engineer's Lunch #55: Get Started in Data Engineering
Data Engineer's Lunch #55: Get Started in Data EngineeringAnant Corporation
 
SplunkLive! Hunk Technical Overview
SplunkLive! Hunk Technical OverviewSplunkLive! Hunk Technical Overview
SplunkLive! Hunk Technical OverviewSplunk
 
Architecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchArchitecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchHortonworks
 
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)Eric Baldeschwieler
 
Hadoop Developer
Hadoop DeveloperHadoop Developer
Hadoop DeveloperEdureka!
 

Similar to Big Data Analytics Course Guide TOC (20)

Hadoop content
Hadoop contentHadoop content
Hadoop content
 
Hadoop training kit from lcc infotech
Hadoop   training kit from lcc infotechHadoop   training kit from lcc infotech
Hadoop training kit from lcc infotech
 
Big Data and Hadoop Training in Bangalore by myTectra
Big Data and Hadoop Training in Bangalore by myTectraBig Data and Hadoop Training in Bangalore by myTectra
Big Data and Hadoop Training in Bangalore by myTectra
 
Hadoop 2.0-development
Hadoop 2.0-developmentHadoop 2.0-development
Hadoop 2.0-development
 
Hadoop Adminstration with Latest Release (2.0)
Hadoop Adminstration with Latest Release (2.0)Hadoop Adminstration with Latest Release (2.0)
Hadoop Adminstration with Latest Release (2.0)
 
Hadoop training in Bangalore
Hadoop training in BangaloreHadoop training in Bangalore
Hadoop training in Bangalore
 
Hadoop Administration pdf
Hadoop Administration pdfHadoop Administration pdf
Hadoop Administration pdf
 
Hareesh
HareeshHareesh
Hareesh
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft Platform
 
Overview of big data & hadoop v1
Overview of big data & hadoop   v1Overview of big data & hadoop   v1
Overview of big data & hadoop v1
 
Hadoop_Architect__eVenkat
Hadoop_Architect__eVenkatHadoop_Architect__eVenkat
Hadoop_Architect__eVenkat
 
The Nuts and Bolts of Hadoop and it's Ever-changing Ecosystem, Presented by J...
The Nuts and Bolts of Hadoop and it's Ever-changing Ecosystem, Presented by J...The Nuts and Bolts of Hadoop and it's Ever-changing Ecosystem, Presented by J...
The Nuts and Bolts of Hadoop and it's Ever-changing Ecosystem, Presented by J...
 
Data Engineer's Lunch #55: Get Started in Data Engineering
Data Engineer's Lunch #55: Get Started in Data EngineeringData Engineer's Lunch #55: Get Started in Data Engineering
Data Engineer's Lunch #55: Get Started in Data Engineering
 
Hadoop
HadoopHadoop
Hadoop
 
SplunkLive! Hunk Technical Overview
SplunkLive! Hunk Technical OverviewSplunkLive! Hunk Technical Overview
SplunkLive! Hunk Technical Overview
 
Architecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchArchitecting the Future of Big Data and Search
Architecting the Future of Big Data and Search
 
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)
Hadoop - Where did it come from and what's next? (Pasadena Sept 2014)
 
Hadoop ppt1
Hadoop ppt1Hadoop ppt1
Hadoop ppt1
 
Hadoop Developer
Hadoop DeveloperHadoop Developer
Hadoop Developer
 
Robin_Hadoop
Robin_HadoopRobin_Hadoop
Robin_Hadoop
 

More from Manish Chopra

AWS and Slack Integration - Sending CloudWatch Notifications to Slack.pdf
AWS and Slack Integration - Sending CloudWatch Notifications to Slack.pdfAWS and Slack Integration - Sending CloudWatch Notifications to Slack.pdf
AWS and Slack Integration - Sending CloudWatch Notifications to Slack.pdfManish Chopra
 
Getting Started with ChatGPT.pdf
Getting Started with ChatGPT.pdfGetting Started with ChatGPT.pdf
Getting Started with ChatGPT.pdfManish Chopra
 
Grafana and AWS - Implementation and Usage
Grafana and AWS - Implementation and UsageGrafana and AWS - Implementation and Usage
Grafana and AWS - Implementation and UsageManish Chopra
 
Containers Auto Scaling on AWS.pdf
Containers Auto Scaling on AWS.pdfContainers Auto Scaling on AWS.pdf
Containers Auto Scaling on AWS.pdfManish Chopra
 
OpenKM Solution Document
OpenKM Solution DocumentOpenKM Solution Document
OpenKM Solution DocumentManish Chopra
 
Alfresco Content Services - Solution Document
Alfresco Content Services - Solution DocumentAlfresco Content Services - Solution Document
Alfresco Content Services - Solution DocumentManish Chopra
 
Jenkins Study Guide ToC
Jenkins Study Guide ToCJenkins Study Guide ToC
Jenkins Study Guide ToCManish Chopra
 
Ansible Study Guide ToC
Ansible Study Guide ToCAnsible Study Guide ToC
Ansible Study Guide ToCManish Chopra
 
Microservices with Dockers and Kubernetes
Microservices with Dockers and KubernetesMicroservices with Dockers and Kubernetes
Microservices with Dockers and KubernetesManish Chopra
 
Unix and Linux Operating Systems
Unix and Linux Operating SystemsUnix and Linux Operating Systems
Unix and Linux Operating SystemsManish Chopra
 
Working with Hive Analytics
Working with Hive AnalyticsWorking with Hive Analytics
Working with Hive AnalyticsManish Chopra
 
Preparing a Dataset for Processing
Preparing a Dataset for ProcessingPreparing a Dataset for Processing
Preparing a Dataset for ProcessingManish Chopra
 
Organizations with largest hadoop clusters
Organizations with largest hadoop clustersOrganizations with largest hadoop clusters
Organizations with largest hadoop clustersManish Chopra
 
Distributed File Systems
Distributed File SystemsDistributed File Systems
Distributed File SystemsManish Chopra
 
Difference between hadoop 2 vs hadoop 3
Difference between hadoop 2 vs hadoop 3Difference between hadoop 2 vs hadoop 3
Difference between hadoop 2 vs hadoop 3Manish Chopra
 
Oracle solaris 11 installation
Oracle solaris 11 installationOracle solaris 11 installation
Oracle solaris 11 installationManish Chopra
 
Emergence and Importance of Cloud Computing for the Enterprise
Emergence and Importance of Cloud Computing for the EnterpriseEmergence and Importance of Cloud Computing for the Enterprise
Emergence and Importance of Cloud Computing for the EnterpriseManish Chopra
 
Steps to create an RPM package in Linux
Steps to create an RPM package in LinuxSteps to create an RPM package in Linux
Steps to create an RPM package in LinuxManish Chopra
 
Setting up a HADOOP 2.2 cluster on CentOS 6
Setting up a HADOOP 2.2 cluster on CentOS 6Setting up a HADOOP 2.2 cluster on CentOS 6
Setting up a HADOOP 2.2 cluster on CentOS 6Manish Chopra
 
The Anatomy of GOOGLE Search Engine
The Anatomy of GOOGLE Search EngineThe Anatomy of GOOGLE Search Engine
The Anatomy of GOOGLE Search EngineManish Chopra
 

More from Manish Chopra (20)

AWS and Slack Integration - Sending CloudWatch Notifications to Slack.pdf
AWS and Slack Integration - Sending CloudWatch Notifications to Slack.pdfAWS and Slack Integration - Sending CloudWatch Notifications to Slack.pdf
AWS and Slack Integration - Sending CloudWatch Notifications to Slack.pdf
 
Getting Started with ChatGPT.pdf
Getting Started with ChatGPT.pdfGetting Started with ChatGPT.pdf
Getting Started with ChatGPT.pdf
 
Grafana and AWS - Implementation and Usage
Grafana and AWS - Implementation and UsageGrafana and AWS - Implementation and Usage
Grafana and AWS - Implementation and Usage
 
Containers Auto Scaling on AWS.pdf
Containers Auto Scaling on AWS.pdfContainers Auto Scaling on AWS.pdf
Containers Auto Scaling on AWS.pdf
 
OpenKM Solution Document
OpenKM Solution DocumentOpenKM Solution Document
OpenKM Solution Document
 
Alfresco Content Services - Solution Document
Alfresco Content Services - Solution DocumentAlfresco Content Services - Solution Document
Alfresco Content Services - Solution Document
 
Jenkins Study Guide ToC
Jenkins Study Guide ToCJenkins Study Guide ToC
Jenkins Study Guide ToC
 
Ansible Study Guide ToC
Ansible Study Guide ToCAnsible Study Guide ToC
Ansible Study Guide ToC
 
Microservices with Dockers and Kubernetes
Microservices with Dockers and KubernetesMicroservices with Dockers and Kubernetes
Microservices with Dockers and Kubernetes
 
Unix and Linux Operating Systems
Unix and Linux Operating SystemsUnix and Linux Operating Systems
Unix and Linux Operating Systems
 
Working with Hive Analytics
Working with Hive AnalyticsWorking with Hive Analytics
Working with Hive Analytics
 
Preparing a Dataset for Processing
Preparing a Dataset for ProcessingPreparing a Dataset for Processing
Preparing a Dataset for Processing
 
Organizations with largest hadoop clusters
Organizations with largest hadoop clustersOrganizations with largest hadoop clusters
Organizations with largest hadoop clusters
 
Distributed File Systems
Distributed File SystemsDistributed File Systems
Distributed File Systems
 
Difference between hadoop 2 vs hadoop 3
Difference between hadoop 2 vs hadoop 3Difference between hadoop 2 vs hadoop 3
Difference between hadoop 2 vs hadoop 3
 
Oracle solaris 11 installation
Oracle solaris 11 installationOracle solaris 11 installation
Oracle solaris 11 installation
 
Emergence and Importance of Cloud Computing for the Enterprise
Emergence and Importance of Cloud Computing for the EnterpriseEmergence and Importance of Cloud Computing for the Enterprise
Emergence and Importance of Cloud Computing for the Enterprise
 
Steps to create an RPM package in Linux
Steps to create an RPM package in LinuxSteps to create an RPM package in Linux
Steps to create an RPM package in Linux
 
Setting up a HADOOP 2.2 cluster on CentOS 6
Setting up a HADOOP 2.2 cluster on CentOS 6Setting up a HADOOP 2.2 cluster on CentOS 6
Setting up a HADOOP 2.2 cluster on CentOS 6
 
The Anatomy of GOOGLE Search Engine
The Anatomy of GOOGLE Search EngineThe Anatomy of GOOGLE Search Engine
The Anatomy of GOOGLE Search Engine
 

Recently uploaded

Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 

Recently uploaded (20)

Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 