SlideShare a Scribd company logo
1 of 21
Download to read offline
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC.
                         Copyright for all other & referenced work is retained by their respective owners.




Introducing Hadoop
Mastering Hadoop Map-reduce for Data Analysis


Shashank Tiwari
blog: shanky.org | twitter: @tshanky
st@treasuryofideas.com
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                        All other & referenced work is copyrighted to their respective owners




What is Hadoop
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                        All other & referenced work is copyrighted to their respective owners




HDFS Architecture
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                        All other & referenced work is copyrighted to their respective owners




Namenode/Datanode, JobTracker/TaskTracker
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                      All other & referenced work is copyrighted to their respective owners




MapReduce
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                      All other & referenced work is copyrighted to their respective owners




ZK Namespace
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                        All other & referenced work is copyrighted to their respective owners




Essential HBase Schema
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                         All other & referenced work is copyrighted to their respective owners




Multi-dimensional View
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                                  All other & referenced work is copyrighted to their respective owners




A Map/Hash View

•{


• "row_key_1" : { "name" : {


•     "first_name" : "Jolly", "last_name" : "Goodfellow"


•     } } },


•    "location" : { "zip": "94301" },
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                         All other & referenced work is copyrighted to their respective owners




Architectural View (HBase)
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                       All other & referenced work is copyrighted to their respective owners




The Persistence Mechanism
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                         All other & referenced work is copyrighted to their respective owners




The underlying file format
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                                 All other & referenced work is copyrighted to their respective owners




Installing & Setting up Hadoop

• Required software: Java 1.6.x, ssh + sshd


• Download


• Install


• Configure


   • single-node


   • pseudo-distributed


   • cluster
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                                All other & referenced work is copyrighted to their respective owners




Download

• Source: http://hadoop.apache.org/


• Version:


   • 0.20.203.x -- current stable


   • 0.20.x -- previous stable


• Includes


   • Hadoop Common -- common utilities, HDFS, MapReduce
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                                All other & referenced work is copyrighted to their respective owners




Install

• Extract: tar zxvf hadoop-0.20.203.0rc1.tar.gz


• Move & Create Symbolic Link


   • ln -s hadoop-0.20.203.0 hadoop


• On Windows


   • http://developer.yahoo.com/hadoop/tutorial/module3.html
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                               All other & referenced work is copyrighted to their respective owners




Configure -- single-node

• Edit: conf/hadoop-env.sh


  • Set JAVA_HOME


• Default configuration is single-node


• Start bin/hadoop (for command options)


• Reference: http://hadoop.apache.org/common/docs/r0.20.203.0/
  single_node_setup.html
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                                All other & referenced work is copyrighted to their respective owners




Configure -- pseduo-distributed

• Edit: conf/core-site.xml (configure HDFS daemon)


• Edit: conf/hdfs-site.xml (configure HDFS replication factor)


• Edit: conf/mapred-site.xml (configure MapReduce JobTracker daemon)


• Enable ssh to localhost (without passphrase)


• Reference: http://hadoop.apache.org/common/docs/r0.20.203.0/
  single_node_setup.html
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                                 All other & referenced work is copyrighted to their respective owners




Start Hadoop
• Format HDFS: bin/hadoop namenode -format


• Start all daemons: bin/start-all.sh


• Verify logs


• Browse the web interface:


   • Namenode: http://localhost:50070/


   • JobTracker: http://localhost:50030/
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                                All other & referenced work is copyrighted to their respective owners




Take Hadoop for a test-drive
• Run examples (hadoop-examples-0.20.203.0.jar)


• Grep using regular expressions


  • Copy files to HDFS: bin/hadoop fs -put bin input


  • Grep for files which have text beginning with ‘start’


  • Verify output on HDFS: bin/hadoop fs -cat output/*


  • Copy output to local filesystem & verify: bin/hadoop fs -get output output
    && cat output/*
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                               All other & referenced work is copyrighted to their respective owners




Configure -- cluster
• References:


• http://hadoop.apache.org/common/docs/r0.20.203.0/cluster_setup.html
  (official documentation)


• http://developer.yahoo.com/hadoop/tutorial/module7.html (Managing a
  Hadoop Cluster. Source: YDN)


• http://wiki.datameer.com/display/DAS1/Hadoop+Cluster+Configuration+Tips
Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC
                                All other & referenced work is copyrighted to their respective owners




Questions?




• blog: shanky.org | twitter: @tshanky


• st@treasuryofideas.com

More Related Content

What's hot

Elasticsearch - Devoxx France 2012 - English version
Elasticsearch - Devoxx France 2012 - English versionElasticsearch - Devoxx France 2012 - English version
Elasticsearch - Devoxx France 2012 - English versionDavid Pilato
 
An Introduction to Apache Pig
An Introduction to Apache PigAn Introduction to Apache Pig
An Introduction to Apache PigSachin Vakkund
 
Apache Pig: Making data transformation easy
Apache Pig: Making data transformation easyApache Pig: Making data transformation easy
Apache Pig: Making data transformation easyVictor Sanchez Anguix
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystemtfmailru
 
Introduction to pig & pig latin
Introduction to pig & pig latinIntroduction to pig & pig latin
Introduction to pig & pig latinknowbigdata
 
Apache Drill @ PJUG, Jan 15, 2013
Apache Drill @ PJUG, Jan 15, 2013Apache Drill @ PJUG, Jan 15, 2013
Apache Drill @ PJUG, Jan 15, 2013Gera Shegalov
 
Asset Pipeline
Asset PipelineAsset Pipeline
Asset PipelineEric Berry
 
Apache Hive micro guide - ConfusedCoders
Apache Hive micro guide - ConfusedCodersApache Hive micro guide - ConfusedCoders
Apache Hive micro guide - ConfusedCodersYash Sharma
 
Oozie or Easy: Managing Hadoop Workloads the EASY Way
Oozie or Easy: Managing Hadoop Workloads the EASY WayOozie or Easy: Managing Hadoop Workloads the EASY Way
Oozie or Easy: Managing Hadoop Workloads the EASY WayDataWorks Summit
 
API Design Antipatterns - APICon SF
API Design Antipatterns - APICon SFAPI Design Antipatterns - APICon SF
API Design Antipatterns - APICon SFManish Pandit
 
Workshop: Learning Elasticsearch
Workshop: Learning ElasticsearchWorkshop: Learning Elasticsearch
Workshop: Learning ElasticsearchAnurag Patel
 
Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)
Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)
Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)Sematext Group, Inc.
 
Puppet for Everybody: Federated and Hierarchical Puppet Enterprise
Puppet for Everybody: Federated and Hierarchical Puppet EnterprisePuppet for Everybody: Federated and Hierarchical Puppet Enterprise
Puppet for Everybody: Federated and Hierarchical Puppet EnterprisecbowlesUT
 
Battle of the Giants round 2
Battle of the Giants round 2Battle of the Giants round 2
Battle of the Giants round 2Rafał Kuć
 

What's hot (18)

Elasticsearch - Devoxx France 2012 - English version
Elasticsearch - Devoxx France 2012 - English versionElasticsearch - Devoxx France 2012 - English version
Elasticsearch - Devoxx France 2012 - English version
 
An Introduction to Apache Pig
An Introduction to Apache PigAn Introduction to Apache Pig
An Introduction to Apache Pig
 
Beginning hive and_apache_pig
Beginning hive and_apache_pigBeginning hive and_apache_pig
Beginning hive and_apache_pig
 
Apache Pig: Making data transformation easy
Apache Pig: Making data transformation easyApache Pig: Making data transformation easy
Apache Pig: Making data transformation easy
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Introduction to pig & pig latin
Introduction to pig & pig latinIntroduction to pig & pig latin
Introduction to pig & pig latin
 
06 pig-01-intro
06 pig-01-intro06 pig-01-intro
06 pig-01-intro
 
Apache Drill @ PJUG, Jan 15, 2013
Apache Drill @ PJUG, Jan 15, 2013Apache Drill @ PJUG, Jan 15, 2013
Apache Drill @ PJUG, Jan 15, 2013
 
Asset Pipeline
Asset PipelineAsset Pipeline
Asset Pipeline
 
Apache Hive micro guide - ConfusedCoders
Apache Hive micro guide - ConfusedCodersApache Hive micro guide - ConfusedCoders
Apache Hive micro guide - ConfusedCoders
 
Oozie or Easy: Managing Hadoop Workloads the EASY Way
Oozie or Easy: Managing Hadoop Workloads the EASY WayOozie or Easy: Managing Hadoop Workloads the EASY Way
Oozie or Easy: Managing Hadoop Workloads the EASY Way
 
API Design Antipatterns - APICon SF
API Design Antipatterns - APICon SFAPI Design Antipatterns - APICon SF
API Design Antipatterns - APICon SF
 
Apache Drill
Apache DrillApache Drill
Apache Drill
 
API Design
API DesignAPI Design
API Design
 
Workshop: Learning Elasticsearch
Workshop: Learning ElasticsearchWorkshop: Learning Elasticsearch
Workshop: Learning Elasticsearch
 
Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)
Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)
Battle of the Giants - Apache Solr vs. Elasticsearch (ApacheCon)
 
Puppet for Everybody: Federated and Hierarchical Puppet Enterprise
Puppet for Everybody: Federated and Hierarchical Puppet EnterprisePuppet for Everybody: Federated and Hierarchical Puppet Enterprise
Puppet for Everybody: Federated and Hierarchical Puppet Enterprise
 
Battle of the Giants round 2
Battle of the Giants round 2Battle of the Giants round 2
Battle of the Giants round 2
 

Viewers also liked

SDEC2011 Big engineer vs small entreprenuer
SDEC2011 Big engineer vs small entreprenuerSDEC2011 Big engineer vs small entreprenuer
SDEC2011 Big engineer vs small entreprenuerKorea Sdec
 
รับสมัครครูนาฏศิลป์
รับสมัครครูนาฏศิลป์รับสมัครครูนาฏศิลป์
รับสมัครครูนาฏศิลป์Orapan Chamnan
 
Sdec2011 Introducing Hadoop
Sdec2011 Introducing HadoopSdec2011 Introducing Hadoop
Sdec2011 Introducing HadoopKorea Sdec
 
SDEC2011 Arcus NHN memcached cloud
SDEC2011 Arcus NHN memcached cloudSDEC2011 Arcus NHN memcached cloud
SDEC2011 Arcus NHN memcached cloudKorea Sdec
 
SDEC2011 Essentials of Pig
SDEC2011 Essentials of PigSDEC2011 Essentials of Pig
SDEC2011 Essentials of PigKorea Sdec
 
SDEC2011 Implementing me2day friend suggestion
SDEC2011 Implementing me2day friend suggestionSDEC2011 Implementing me2day friend suggestion
SDEC2011 Implementing me2day friend suggestionKorea Sdec
 
SDEC2011 NoSQL concepts and models
SDEC2011 NoSQL concepts and modelsSDEC2011 NoSQL concepts and models
SDEC2011 NoSQL concepts and modelsKorea Sdec
 
SDEC2011 NoSQL Data modelling
SDEC2011 NoSQL Data modellingSDEC2011 NoSQL Data modelling
SDEC2011 NoSQL Data modellingKorea Sdec
 

Viewers also liked (9)

SDEC2011 Big engineer vs small entreprenuer
SDEC2011 Big engineer vs small entreprenuerSDEC2011 Big engineer vs small entreprenuer
SDEC2011 Big engineer vs small entreprenuer
 
รับสมัครครูนาฏศิลป์
รับสมัครครูนาฏศิลป์รับสมัครครูนาฏศิลป์
รับสมัครครูนาฏศิลป์
 
Sdec2011 Introducing Hadoop
Sdec2011 Introducing HadoopSdec2011 Introducing Hadoop
Sdec2011 Introducing Hadoop
 
SDEC2011 Arcus NHN memcached cloud
SDEC2011 Arcus NHN memcached cloudSDEC2011 Arcus NHN memcached cloud
SDEC2011 Arcus NHN memcached cloud
 
SDEC2011 Essentials of Pig
SDEC2011 Essentials of PigSDEC2011 Essentials of Pig
SDEC2011 Essentials of Pig
 
SDEC2011 Implementing me2day friend suggestion
SDEC2011 Implementing me2day friend suggestionSDEC2011 Implementing me2day friend suggestion
SDEC2011 Implementing me2day friend suggestion
 
SDEC2011 NoSQL concepts and models
SDEC2011 NoSQL concepts and modelsSDEC2011 NoSQL concepts and models
SDEC2011 NoSQL concepts and models
 
SDEC2011 NoSQL Data modelling
SDEC2011 NoSQL Data modellingSDEC2011 NoSQL Data modelling
SDEC2011 NoSQL Data modelling
 
Google Protocol Buffers
Google Protocol BuffersGoogle Protocol Buffers
Google Protocol Buffers
 

Similar to SDEC2011 Introducing Hadoop

Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...
Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...
Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...rhatr
 
Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Jonathan Seidman
 
Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014hadooparchbook
 
Plugging the Holes: Security and Compatability in Hadoop
Plugging the Holes: Security and Compatability in HadoopPlugging the Holes: Security and Compatability in Hadoop
Plugging the Holes: Security and Compatability in HadoopOwen O'Malley
 
Hw09 Security And Api Compatibility
Hw09   Security And Api CompatibilityHw09   Security And Api Compatibility
Hw09 Security And Api CompatibilityCloudera, Inc.
 
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Data Con LA
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingDataWorks Summit
 
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter SlidesJuly 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter Slidesryancox
 
Yahoo! Hack Europe Workshop
Yahoo! Hack Europe WorkshopYahoo! Hack Europe Workshop
Yahoo! Hack Europe WorkshopHortonworks
 
Using Hadoop and Hive to Optimize Travel Search , WindyCityDB 2010
Using Hadoop and Hive to Optimize Travel Search, WindyCityDB 2010Using Hadoop and Hive to Optimize Travel Search, WindyCityDB 2010
Using Hadoop and Hive to Optimize Travel Search , WindyCityDB 2010Jonathan Seidman
 
02 Hadoop deployment and configuration
02 Hadoop deployment and configuration02 Hadoop deployment and configuration
02 Hadoop deployment and configurationSubhas Kumar Ghosh
 
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding EdgeCIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding EdgeCloudIDSummit
 
Sept 17 2013 - THUG - HBase a Technical Introduction
Sept 17 2013 - THUG - HBase a Technical IntroductionSept 17 2013 - THUG - HBase a Technical Introduction
Sept 17 2013 - THUG - HBase a Technical IntroductionAdam Muise
 
Implementing Hadoop on a single cluster
Implementing Hadoop on a single clusterImplementing Hadoop on a single cluster
Implementing Hadoop on a single clusterSalil Navgire
 
Hadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun MurthyHadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun Murthyhuguk
 
Web Services Hadoop Summit 2012
Web Services Hadoop Summit 2012Web Services Hadoop Summit 2012
Web Services Hadoop Summit 2012Hortonworks
 
SDEC2011 Essentials of Mahout
SDEC2011 Essentials of MahoutSDEC2011 Essentials of Mahout
SDEC2011 Essentials of MahoutKorea Sdec
 
Php Dependency Management with Composer ZendCon 2016
Php Dependency Management with Composer ZendCon 2016Php Dependency Management with Composer ZendCon 2016
Php Dependency Management with Composer ZendCon 2016Clark Everetts
 
Hadoop online training
Hadoop online training Hadoop online training
Hadoop online training Keylabs
 

Similar to SDEC2011 Introducing Hadoop (20)

Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...
Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...
Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...
 
Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014
 
Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014
 
Plugging the Holes: Security and Compatability in Hadoop
Plugging the Holes: Security and Compatability in HadoopPlugging the Holes: Security and Compatability in Hadoop
Plugging the Holes: Security and Compatability in Hadoop
 
Hw09 Security And Api Compatibility
Hw09   Security And Api CompatibilityHw09   Security And Api Compatibility
Hw09 Security And Api Compatibility
 
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data Processing
 
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter SlidesJuly 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
 
Yahoo! Hack Europe Workshop
Yahoo! Hack Europe WorkshopYahoo! Hack Europe Workshop
Yahoo! Hack Europe Workshop
 
Using Hadoop and Hive to Optimize Travel Search , WindyCityDB 2010
Using Hadoop and Hive to Optimize Travel Search, WindyCityDB 2010Using Hadoop and Hive to Optimize Travel Search, WindyCityDB 2010
Using Hadoop and Hive to Optimize Travel Search , WindyCityDB 2010
 
Presentation
PresentationPresentation
Presentation
 
02 Hadoop deployment and configuration
02 Hadoop deployment and configuration02 Hadoop deployment and configuration
02 Hadoop deployment and configuration
 
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding EdgeCIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
 
Sept 17 2013 - THUG - HBase a Technical Introduction
Sept 17 2013 - THUG - HBase a Technical IntroductionSept 17 2013 - THUG - HBase a Technical Introduction
Sept 17 2013 - THUG - HBase a Technical Introduction
 
Implementing Hadoop on a single cluster
Implementing Hadoop on a single clusterImplementing Hadoop on a single cluster
Implementing Hadoop on a single cluster
 
Hadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun MurthyHadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun Murthy
 
Web Services Hadoop Summit 2012
Web Services Hadoop Summit 2012Web Services Hadoop Summit 2012
Web Services Hadoop Summit 2012
 
SDEC2011 Essentials of Mahout
SDEC2011 Essentials of MahoutSDEC2011 Essentials of Mahout
SDEC2011 Essentials of Mahout
 
Php Dependency Management with Composer ZendCon 2016
Php Dependency Management with Composer ZendCon 2016Php Dependency Management with Composer ZendCon 2016
Php Dependency Management with Composer ZendCon 2016
 
Hadoop online training
Hadoop online training Hadoop online training
Hadoop online training
 

More from Korea Sdec

SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive
SDEC2011 Replacing legacy Telco DB/DW to Hadoop and HiveSDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive
SDEC2011 Replacing legacy Telco DB/DW to Hadoop and HiveKorea Sdec
 
SDEC2011 Rapidant
SDEC2011 RapidantSDEC2011 Rapidant
SDEC2011 RapidantKorea Sdec
 
SDEC2011 Mahout - the what, the how and the why
SDEC2011 Mahout - the what, the how and the whySDEC2011 Mahout - the what, the how and the why
SDEC2011 Mahout - the what, the how and the whyKorea Sdec
 
SDEC2011 Going by TACC
SDEC2011 Going by TACCSDEC2011 Going by TACC
SDEC2011 Going by TACCKorea Sdec
 
SDEC2011 Glory-FS development & Experiences
SDEC2011 Glory-FS development & ExperiencesSDEC2011 Glory-FS development & Experiences
SDEC2011 Glory-FS development & ExperiencesKorea Sdec
 
SDEC2011 Using Couchbase for social game scaling and speed
SDEC2011 Using Couchbase for social game scaling and speedSDEC2011 Using Couchbase for social game scaling and speed
SDEC2011 Using Couchbase for social game scaling and speedKorea Sdec
 

More from Korea Sdec (6)

SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive
SDEC2011 Replacing legacy Telco DB/DW to Hadoop and HiveSDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive
SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive
 
SDEC2011 Rapidant
SDEC2011 RapidantSDEC2011 Rapidant
SDEC2011 Rapidant
 
SDEC2011 Mahout - the what, the how and the why
SDEC2011 Mahout - the what, the how and the whySDEC2011 Mahout - the what, the how and the why
SDEC2011 Mahout - the what, the how and the why
 
SDEC2011 Going by TACC
SDEC2011 Going by TACCSDEC2011 Going by TACC
SDEC2011 Going by TACC
 
SDEC2011 Glory-FS development & Experiences
SDEC2011 Glory-FS development & ExperiencesSDEC2011 Glory-FS development & Experiences
SDEC2011 Glory-FS development & Experiences
 
SDEC2011 Using Couchbase for social game scaling and speed
SDEC2011 Using Couchbase for social game scaling and speedSDEC2011 Using Couchbase for social game scaling and speed
SDEC2011 Using Couchbase for social game scaling and speed
 

Recently uploaded

Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 

Recently uploaded (20)

Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 

SDEC2011 Introducing Hadoop

  • 1. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC. Copyright for all other & referenced work is retained by their respective owners. Introducing Hadoop Mastering Hadoop Map-reduce for Data Analysis Shashank Tiwari blog: shanky.org | twitter: @tshanky st@treasuryofideas.com
  • 2. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners What is Hadoop
  • 3. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners HDFS Architecture
  • 4. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners Namenode/Datanode, JobTracker/TaskTracker
  • 5. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners MapReduce
  • 6. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners ZK Namespace
  • 7. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners Essential HBase Schema
  • 8. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners Multi-dimensional View
  • 9. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners A Map/Hash View •{ • "row_key_1" : { "name" : { • "first_name" : "Jolly", "last_name" : "Goodfellow" • } } }, • "location" : { "zip": "94301" },
  • 10. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners Architectural View (HBase)
  • 11. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners The Persistence Mechanism
  • 12. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners The underlying file format
  • 13. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners Installing & Setting up Hadoop • Required software: Java 1.6.x, ssh + sshd • Download • Install • Configure • single-node • pseudo-distributed • cluster
  • 14. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners Download • Source: http://hadoop.apache.org/ • Version: • 0.20.203.x -- current stable • 0.20.x -- previous stable • Includes • Hadoop Common -- common utilities, HDFS, MapReduce
  • 15. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners Install • Extract: tar zxvf hadoop-0.20.203.0rc1.tar.gz • Move & Create Symbolic Link • ln -s hadoop-0.20.203.0 hadoop • On Windows • http://developer.yahoo.com/hadoop/tutorial/module3.html
  • 16. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners Configure -- single-node • Edit: conf/hadoop-env.sh • Set JAVA_HOME • Default configuration is single-node • Start bin/hadoop (for command options) • Reference: http://hadoop.apache.org/common/docs/r0.20.203.0/ single_node_setup.html
  • 17. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners Configure -- pseduo-distributed • Edit: conf/core-site.xml (configure HDFS daemon) • Edit: conf/hdfs-site.xml (configure HDFS replication factor) • Edit: conf/mapred-site.xml (configure MapReduce JobTracker daemon) • Enable ssh to localhost (without passphrase) • Reference: http://hadoop.apache.org/common/docs/r0.20.203.0/ single_node_setup.html
  • 18. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners Start Hadoop • Format HDFS: bin/hadoop namenode -format • Start all daemons: bin/start-all.sh • Verify logs • Browse the web interface: • Namenode: http://localhost:50070/ • JobTracker: http://localhost:50030/
  • 19. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners Take Hadoop for a test-drive • Run examples (hadoop-examples-0.20.203.0.jar) • Grep using regular expressions • Copy files to HDFS: bin/hadoop fs -put bin input • Grep for files which have text beginning with ‘start’ • Verify output on HDFS: bin/hadoop fs -cat output/* • Copy output to local filesystem & verify: bin/hadoop fs -get output output && cat output/*
  • 20. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners Configure -- cluster • References: • http://hadoop.apache.org/common/docs/r0.20.203.0/cluster_setup.html (official documentation) • http://developer.yahoo.com/hadoop/tutorial/module7.html (Managing a Hadoop Cluster. Source: YDN) • http://wiki.datameer.com/display/DAS1/Hadoop+Cluster+Configuration+Tips
  • 21. Confidential, for personal use only. All original content copyright owned by Treasury of Ideas LLC All other & referenced work is copyrighted to their respective owners Questions? • blog: shanky.org | twitter: @tshanky • st@treasuryofideas.com