Big Data & Hadoop
❑ LIVE online classes
❑ Class recordings available for a lifetime
❑ Quizzes and assignments at the end of each chapter
❑ Technical support
❑ Project work
❑ Assessment and certification
❑ Post-training guidance and support
❑ Assistance in finding a relevant job
Course schedule (two classes per week, Day 1 and Day 2)
Week 1: Understanding Big Data; Hadoop Architecture; Hadoop Cluster; Data Loading Techniques
Week 2: Basic MapReduce; Advanced MapReduce; YARN 2.0
Week 3: Pig Latin; Hive
Week 4: NoSQL Databases, HBase and ZooKeeper; Project Work
The NYSE generates about one terabyte of new trade data per day, which it uses to perform stock-trading analytics that identify trends for optimal trades.
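To put the NYSE figure in perspective, the short Python sketch below converts a daily volume into an average sustained ingest rate. It assumes a decimal terabyte (10^12 bytes) and spreads the data evenly over 24 hours; because trading is concentrated in market hours, real peak rates would likely be higher. With those assumptions, one terabyte per day works out to roughly 11.6 MB/s around the clock.

```python
SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_MB = 10**6   # decimal megabyte
BYTES_PER_TB = 10**12  # assumption: decimal terabyte


def average_ingest_mb_per_s(bytes_per_day: float) -> float:
    """Average rate in MB/s if the daily volume were spread evenly over 24 hours."""
    return bytes_per_day / SECONDS_PER_DAY / BYTES_PER_MB


nyse_rate = average_ingest_mb_per_s(1 * BYTES_PER_TB)
print(f"average ingest: {nyse_rate:.1f} MB/s")
```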
Big Data: Volume, Variety, Velocity
Figure: traditional data pipeline. Sources (OLTP/RDBMS, social, logs) flow into storage (backup, read/write), then processing (ETL), then usage/visualization (reports). Problems noted: expensive storage and processing; a lot of data discarded; storage spread across systems and not easily accessible; limited storage capacity.
Figure: the same sources (OLTP/RDBMS, social, logs) with Hadoop added, producing batch reports alongside data warehouse (DW) reports.
Reading 1 TB of data:
❑ 1 machine, 4 I/O channels, 100 MBps per channel: about 45 minutes
❑ 100 machines, 4 I/O channels each, 100 MBps per channel: about 0.45 minutes (roughly 27 seconds)
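The comparison is simple bandwidth arithmetic: read time equals data size divided by aggregate read bandwidth. A minimal Python sketch, assuming perfect parallelism with no coordination or network overhead, a binary terabyte (1024^4 bytes), and 100 MBps = 10^8 bytes per second. Under these assumptions it gives about 45.8 minutes for one machine and about 0.46 minutes (roughly 27 seconds) for 100 machines; with a decimal terabyte the single-machine figure drops to about 41.7 minutes, so the slide's numbers are rounded either way.

```python
def read_time_seconds(data_bytes: float, machines: int,
                      channels_per_machine: int, channel_bytes_per_s: float) -> float:
    """Seconds to read data_bytes if every channel on every machine reads in parallel."""
    aggregate_bytes_per_s = machines * channels_per_machine * channel_bytes_per_s
    return data_bytes / aggregate_bytes_per_s


ONE_TB = 1024 ** 4            # assumption: binary terabyte
CHANNEL_RATE = 100 * 10 ** 6  # assumption: 100 MBps = 10**8 bytes per second

single = read_time_seconds(ONE_TB, machines=1, channels_per_machine=4,
                           channel_bytes_per_s=CHANNEL_RATE)
cluster = read_time_seconds(ONE_TB, machines=100, channels_per_machine=4,
                            channel_bytes_per_s=CHANNEL_RATE)
print(f"1 machine:    {single / 60:.1f} minutes")
print(f"100 machines: {cluster / 60:.2f} minutes ({cluster:.0f} seconds)")
```

The speed-up is exactly the 100x increase in aggregate bandwidth; in a real cluster, scheduling and network costs eat into that ideal.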
Story of Hadoop
Characteristics of Hadoop
❑ Reliable
❑ Economical
❑ Scalable
❑ Fault tolerant
Hadoop Core Components
NameNode: keeps track of the overall file directory structure and the placement of data blocks. The NameNode stores metadata only, for example:
/user/doug/hinfo -> blocks 1, 3, 5
/user/doug/pdetail -> blocks 4, 2
NameNode metadata files: Edit Logs and FSImage
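To make the "metadata only" point concrete, here is a toy Python model of the namespace shown above: a mapping from file path to the ordered block IDs that make up the file. The class `ToyNameNode` is hypothetical and ignores much of what a real NameNode also tracks (block-to-DataNode locations, permissions, replication); file contents never pass through it.

```python
from typing import Dict, List


class ToyNameNode:
    """Hypothetical in-memory namespace: file path -> ordered list of block IDs."""

    def __init__(self) -> None:
        self.namespace: Dict[str, List[int]] = {}

    def add_file(self, path: str, block_ids: List[int]) -> None:
        # Only metadata is recorded; the blocks themselves would live on DataNodes.
        self.namespace[path] = list(block_ids)

    def get_blocks(self, path: str) -> List[int]:
        """Return a file's block IDs; a client would then read those blocks from DataNodes."""
        if path not in self.namespace:
            raise FileNotFoundError(path)
        return self.namespace[path]


nn = ToyNameNode()
nn.add_file("/user/doug/hinfo", [1, 3, 5])   # paths and block IDs from the slide
nn.add_file("/user/doug/pdetail", [4, 2])
print(nn.get_blocks("/user/doug/hinfo"))
```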
Figure: every hour ("It's been an hour?"), the Secondary NameNode fetches the file system metadata from the NameNode.
Quiz
When the NameNode fails, the Secondary NameNode takes over instantly and prevents cluster failure:
❑ TRUE
❑ FALSE
False. The Secondary NameNode is used to create NameNode checkpoints; it is not a standby. The NameNode can be recovered manually using the ‘edits’ and ‘FSImage’ files stored on the Secondary NameNode.
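The quiz answer hinges on what a checkpoint is: the Secondary NameNode merges the edit log into the FSImage to produce a fresh image, after which the edit log can start over. It never serves clients, so it cannot take over. The Python sketch below models that merge on the slide's example paths; the (operation, path, block IDs) edit-record format and its three operations are hypothetical simplifications, not the real HDFS edit-log format. Because a checkpoint reflects only the edits merged so far, recovering from it may lose changes made after the last checkpoint.

```python
from typing import Dict, List, Tuple

Namespace = Dict[str, List[int]]
Edit = Tuple[str, str, List[int]]  # hypothetical record: (operation, path, block IDs)


def apply_edits(fsimage: Namespace, edits: List[Edit]) -> Namespace:
    """Replay edit records on top of an FSImage snapshot, returning a new namespace."""
    namespace = {path: list(blocks) for path, blocks in fsimage.items()}  # leave input untouched
    for op, path, blocks in edits:
        if op == "create":
            namespace[path] = list(blocks)
        elif op == "append":
            namespace.setdefault(path, []).extend(blocks)
        elif op == "delete":
            namespace.pop(path, None)
        else:
            raise ValueError(f"unknown edit operation: {op}")
    return namespace


def checkpoint(fsimage: Namespace, edits: List[Edit]) -> Tuple[Namespace, List[Edit]]:
    """Merge the edits into a new FSImage and start an empty edit log."""
    return apply_edits(fsimage, edits), []


image = {"/user/doug/hinfo": [1, 3, 5]}
log = [("create", "/user/doug/pdetail", [4, 2]), ("delete", "/user/doug/hinfo", [])]
new_image, new_log = checkpoint(image, log)
print(new_image, new_log)
```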
JobTracker
Quiz
Figure: blocks A, B, and C placed across Rack 1, Rack 2, and Rack 3 (rack awareness).
Rack awareness is configured with a topology script, set through the topology.script.file.name property in core-site.xml.
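Hadoop runs the topology script with one or more host names or IP addresses as arguments and reads back one rack path per argument. A minimal Python sketch, assuming a hypothetical static mapping for the three racks in the figure (the addresses in `RACK_MAP` are made up); hosts not in the map fall back to `/default-rack`, the rack name Hadoop uses when no topology is configured.

```python
#!/usr/bin/env python3
"""Minimal rack-topology script sketch for HDFS rack awareness."""
import sys
from typing import List

# Hypothetical host-to-rack layout matching the three racks in the figure.
RACK_MAP = {
    "10.0.1.11": "/rack1",
    "10.0.1.12": "/rack1",
    "10.0.2.11": "/rack2",
    "10.0.3.11": "/rack3",
}
DEFAULT_RACK = "/default-rack"  # returned for hosts the map does not know


def resolve(hosts: List[str]) -> List[str]:
    """Map each host to its rack path, preserving argument order."""
    return [RACK_MAP.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Whitespace-separated output: one rack path per input argument, in order.
    print(" ".join(resolve(sys.argv[1:])))
```

Make the script executable and point topology.script.file.name at its path; in Hadoop 2 and later the property is named net.topology.script.file.name.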
Figure: Hadoop versions (legend: green = GA versions; black = not yet released by Apache; red = commercial distributions)
❏ http://blog.cloudera.com/blog/2012/01/an-update-on-apache-hadoop-1-0/
❏ https://blogs.apache.org/bigtop/entry/all_you_wanted_to_know
❏ https://hadoop.apache.org/releases.html
❏ http://hortonworks.com/blog/apache-hadoop-2-is-ga/
Class 2 Pre-work
❏ Set up the Hadoop environment using the documents provided on Google Drive
❏ CDH3 (recommended) or CDH4
❏ Execute basic Linux commands
❏ Execute the hands-on HDFS commands
❏ Attempt the Class 1 assignment
Thank you!
See you in the next class.

Day 1 big data & hadoop By SoApt