SlideShare a Scribd company logo
Building Hadoop Based
Big Data Environment
Evans Ye @ TWHUG

2013/12/14
Who am I
• Evans Ye @

• Dumbo Team
• http://dumbointaiwan.blogspot.tw/
12/14/2013

Copyright 2013 Trend Micro Inc.
Agenda
• Building your own Hadoop version
• Hadoop Deployment

• Hadoop release engineering
• The development environment

• Bigtop puppet

12/14/2013

Copyright 2013 Trend Micro Inc.
Why Build our own version
• Add your own patch at any time
– From community perspective, they need to take care about
backward complicity,
which need much more time and effort on it.

• Fetch official patches in to current adopted version
– You may not upgrade your Hadoop version frequently,
But there’s a specific need for that patch.

• Flexibility, Business needed features
12/14/2013

Copyright 2013 Trend Micro Inc.
As a Beginner

12/14/2013

Copyright 2013 Trend Micro Inc.
Build Hadoop Infrastructure

12/14/2013

Copyright 2013 Trend Micro Inc.

What’s your work?
….

12/14/2013

Copyright 2013 Trend Micro Inc.

I thought you just need to
yum install Hadoop.
Brute force
• git clone
• Make some changes

• Builde binary tarball

How to do version control?

core-site.xml
hdfs-site.xml
mapred-site.xml
…

12/14/2013

Copyright 2013 Trend Micro Inc.
Bigtop

12/14/2013

Copyright 2013 Trend Micro Inc.
How bigtop helps you
• Apache Hadoop App developers:
– Run pseudo-distributed Hadoop cluster to test your code on.

• Vendors:
– Build your own Apache Hadoop distribution, customized from
Apache Bigtop bits.

• Packaging, Deployment, Integration Testing

12/14/2013

Copyright 2013 Trend Micro Inc.
Supported Linux Distro
• Ubuntu 10.10
• CentOS 5/6

• Fedora 18
• Mageia 1

• openSUSE 12.2

12/14/2013

Copyright 2013 Trend Micro Inc.
Build
• Build hadoop-common (see BUILDING.txt)
– hadoop-common$ mvn package –Pdist,docs,src,native -Dtar

• Prepare your src tar in bigtop
• Bigtop$ make hadoop-rpm

12/14/2013

Copyright 2013 Trend Micro Inc.
Hadoop Deployment

12/14/2013

Copyright 2013 Trend Micro Inc.
Configuration files
• Hadoop related config
–
–
–
–
–
–
–
–
–
12/14/2013

core-site.xml
hdfs-site.xml
mapred-site.xml
log4j.properties
hadoop-env.sh
fair-scheduler.xml
rack-topology
hadoop-metrics.properties
taskcontroller.cfg
Copyright 2013 Trend Micro Inc.
Local Directories
• Hadoop related file and directory
– Namenode metadata
• /name/1, /name/2

– Datanode
• /data/1, /data/2 , /data/3 , /data/4

– Tasktracker
• /mapred/1/local, /mapred/2/local

–…

12/14/2013

Copyright 2013 Trend Micro Inc.
More hadoop ecosystem

12/14/2013

Copyright 2013 Trend Micro Inc.
Problems to solve
• Lots of nodes need to be configured
• Less human involved, less mistake made

• Configuration changed quite often
– adjust fair scheduler
– enable/disable short circuit
– try more performance improvement configurations

12/14/2013

Copyright 2013 Trend Micro Inc.
Hadooppet

12/14/2013

Copyright 2013 Trend Micro Inc.
What is puppet ?
• A IT automation tool to help system administrators
automate the many repetitive tasks
• You need to only define the desired state

12/14/2013

Copyright 2013 Trend Micro Inc.
What is Hadooppet ?
• A general hadoop cluster deployment tool based on
puppet
• Kerberos / ldap auto configured
• A set of hadoop / kerberos management tool
• A set of sanity check scripts for trend hadoop related
services
• Manage configuration on puppetmaster

12/14/2013

Copyright 2013 Trend Micro Inc.
Design
• Abstract environment specific configurations in a single
configuration file
• setup.sh
–
–
–
–
–
–

12/14/2013

namenode_fqdns=(“dev1.example.com” “dev2.example.com”)
namenode_dirs=(“/name/1” “/name/2”)
namenode_heap=32g
map_slots=5
reduce_slots=3
…

Copyright 2013 Trend Micro Inc.
Benifits
• Can be used to setup any kind of hadoop cluster
• When doing main version upgarade, minimal the
downtime
– hadoop1  hadoop2
Namenode
Secondarynamenode

12/14/2013

Copyright 2013 Trend Micro Inc.

Active/Standby Namenode
Journalnodes
ZKFC
Release Engineering

12/14/2013

Copyright 2013 Trend Micro Inc.
Manually
• Build src tarball in hadoop-common
• Build rpms in bigtop

• submit build to release yum repo
• yum update on hadoop cluster…

12/14/2013

Copyright 2013 Trend Micro Inc.
Continuous Integration
• Setup hadoop-common daily build
• Setup Bigtop release Build
– should be manually triggered

• Setup Hadooppet daily build
– Run sanity checks on a REAL CLUSTER

12/14/2013

Copyright 2013 Trend Micro Inc.
Virtualization
• Build a Xen Server Cluster

12/14/2013

Copyright 2013 Trend Micro Inc.
12/14/2013

Copyright 2013 Trend Micro Inc.
give-me-vm
• Pycon 2012
– Small Python Tools for Software Release Engineering

• An automation tool to manage
VM lifecycle
• Use Python XenAPI

• Create temporary VM for testing
by self service
• Destroy it when the testing
is finished
12/14/2013

Copyright 2013 Trend Micro Inc.
Build auto deployment on Hadooppet
• ./give_me_vm.py
• setup passphraseless ssh between each VM

• set hostname
• Install Hadooppet on master

• run deployment
• run sanity checks
• ./destroy_vm.py
12/14/2013

Copyright 2013 Trend Micro Inc.
12/14/2013

Copyright 2013 Trend Micro Inc.
Development
Environment

12/14/2013

Copyright 2013 Trend Micro Inc.
For hadoop service developers…
• No enough hadoop client for each developers
• Developer can not reach server side while developing
hadoop related services
• Can not experiment new technology like impala spark
flume

• CI on Hadoop related services

12/14/2013

Copyright 2013 Trend Micro Inc.
give-me-vm + Hadoop all-in-one VM
• Use Hadooppet to setup a peudo-distributed hadoop
VM as Xenserver template

• get a Hadoop all-in-one VM via give-me-vm
• Services integrate its CI test with hadoop all-in-one VM

12/14/2013

Copyright 2013 Trend Micro Inc.
Bigtop
puppet

12/14/2013

Copyright 2013 Trend Micro Inc.
Bigtop puppet
• Bigtop also has a set of puppet scripts to deploy
Hadoop ecosystem

12/14/2013

Copyright 2013 Trend Micro Inc.
Bigtop puppet
• Preparation:
– A VM with jdk, puppet installed
– mkdir –p /data/{1,2}
– git clone https://github.com/apache/bigtop.git

12/14/2013

Copyright 2013 Trend Micro Inc.
Conclusion
• There’re many great deployment tool exist
– Ambari, CM, ETU appliance
– Choose suitable distribution by your business need

• If you want to do it by yourself
– Bigtop can do packaging for you easily
– Leverage bigtop puppet module for your deployment

12/14/2013

Copyright 2013 Trend Micro Inc.
Questions?
Thank you !

More Related Content

What's hot

Python in the Hadoop Ecosystem (Rock Health presentation)
Python in the Hadoop Ecosystem (Rock Health presentation)Python in the Hadoop Ecosystem (Rock Health presentation)
Python in the Hadoop Ecosystem (Rock Health presentation)
Uri Laserson
 
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep LearningApache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
DataWorks Summit
 
Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...
Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...
Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...
DataWorks Summit
 
IMCSummit 2015 - Day 1 Developer Track - Open-Source In-Memory Platforms: Ben...
IMCSummit 2015 - Day 1 Developer Track - Open-Source In-Memory Platforms: Ben...IMCSummit 2015 - Day 1 Developer Track - Open-Source In-Memory Platforms: Ben...
IMCSummit 2015 - Day 1 Developer Track - Open-Source In-Memory Platforms: Ben...
In-Memory Computing Summit
 
Overview of Scientific Workflows - Why Use Them?
Overview of Scientific Workflows - Why Use Them?Overview of Scientific Workflows - Why Use Them?
Overview of Scientific Workflows - Why Use Them?
inside-BigData.com
 
Full Stack Monitoring with Prometheus and Grafana
Full Stack Monitoring with Prometheus and GrafanaFull Stack Monitoring with Prometheus and Grafana
Full Stack Monitoring with Prometheus and Grafana
Jazz Yao-Tsung Wang
 
XSEDE14 SciGaP-Apache Airavata Tutorial
XSEDE14 SciGaP-Apache Airavata TutorialXSEDE14 SciGaP-Apache Airavata Tutorial
XSEDE14 SciGaP-Apache Airavata Tutorial
marpierc
 
Introduction to HCFS
Introduction to HCFSIntroduction to HCFS
Introduction to HCFS
Jazz Yao-Tsung Wang
 
Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...
Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...
Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...
Spark Summit
 
PyData Boston 2013
PyData Boston 2013PyData Boston 2013
PyData Boston 2013
Travis Oliphant
 
PyData Barcelona Keynote
PyData Barcelona KeynotePyData Barcelona Keynote
PyData Barcelona Keynote
Travis Oliphant
 
London level39
London level39London level39
London level39
Travis Oliphant
 
Keep your Hadoop Cluster at its Best
Keep your Hadoop Cluster at its BestKeep your Hadoop Cluster at its Best
Keep your Hadoop Cluster at its Best
DataWorks Summit/Hadoop Summit
 
Apache Spark: Lightning Fast Cluster Computing
Apache Spark: Lightning Fast Cluster ComputingApache Spark: Lightning Fast Cluster Computing
Apache Spark: Lightning Fast Cluster Computing
All Things Open
 
Bids talk 9.18
Bids talk 9.18Bids talk 9.18
Bids talk 9.18
Travis Oliphant
 
Chaos is a ladder !
Chaos is a ladder !Chaos is a ladder !
Chaos is a ladder !
Haggai Philip Zagury
 
Deep Learning with Spark and GPUs
Deep Learning with Spark and GPUsDeep Learning with Spark and GPUs
Deep Learning with Spark and GPUs
DataWorks Summit
 
Flaky tests and bugs in Apache software (e.g. Hadoop)
Flaky tests and bugs in Apache software (e.g. Hadoop)Flaky tests and bugs in Apache software (e.g. Hadoop)
Flaky tests and bugs in Apache software (e.g. Hadoop)
Akihiro Suda
 

What's hot (19)

Python in the Hadoop Ecosystem (Rock Health presentation)
Python in the Hadoop Ecosystem (Rock Health presentation)Python in the Hadoop Ecosystem (Rock Health presentation)
Python in the Hadoop Ecosystem (Rock Health presentation)
 
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep LearningApache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
 
Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...
Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...
Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...
 
IMCSummit 2015 - Day 1 Developer Track - Open-Source In-Memory Platforms: Ben...
IMCSummit 2015 - Day 1 Developer Track - Open-Source In-Memory Platforms: Ben...IMCSummit 2015 - Day 1 Developer Track - Open-Source In-Memory Platforms: Ben...
IMCSummit 2015 - Day 1 Developer Track - Open-Source In-Memory Platforms: Ben...
 
Overview of Scientific Workflows - Why Use Them?
Overview of Scientific Workflows - Why Use Them?Overview of Scientific Workflows - Why Use Them?
Overview of Scientific Workflows - Why Use Them?
 
Full Stack Monitoring with Prometheus and Grafana
Full Stack Monitoring with Prometheus and GrafanaFull Stack Monitoring with Prometheus and Grafana
Full Stack Monitoring with Prometheus and Grafana
 
XSEDE14 SciGaP-Apache Airavata Tutorial
XSEDE14 SciGaP-Apache Airavata TutorialXSEDE14 SciGaP-Apache Airavata Tutorial
XSEDE14 SciGaP-Apache Airavata Tutorial
 
Introduction to HCFS
Introduction to HCFSIntroduction to HCFS
Introduction to HCFS
 
Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...
Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...
Solving Real Problems with Apache Spark: Archiving, E-Discovery, and Supervis...
 
PyData Boston 2013
PyData Boston 2013PyData Boston 2013
PyData Boston 2013
 
PyData Barcelona Keynote
PyData Barcelona KeynotePyData Barcelona Keynote
PyData Barcelona Keynote
 
London level39
London level39London level39
London level39
 
Keep your Hadoop Cluster at its Best
Keep your Hadoop Cluster at its BestKeep your Hadoop Cluster at its Best
Keep your Hadoop Cluster at its Best
 
Apache Spark: Lightning Fast Cluster Computing
Apache Spark: Lightning Fast Cluster ComputingApache Spark: Lightning Fast Cluster Computing
Apache Spark: Lightning Fast Cluster Computing
 
Bids talk 9.18
Bids talk 9.18Bids talk 9.18
Bids talk 9.18
 
Chaos is a ladder !
Chaos is a ladder !Chaos is a ladder !
Chaos is a ladder !
 
Deep Learning with Spark and GPUs
Deep Learning with Spark and GPUsDeep Learning with Spark and GPUs
Deep Learning with Spark and GPUs
 
SparkFramework
SparkFrameworkSparkFramework
SparkFramework
 
Flaky tests and bugs in Apache software (e.g. Hadoop)
Flaky tests and bugs in Apache software (e.g. Hadoop)Flaky tests and bugs in Apache software (e.g. Hadoop)
Flaky tests and bugs in Apache software (e.g. Hadoop)
 

Viewers also liked

Big Data: Opportunities, Strategy and Challenges
Big Data: Opportunities, Strategy and ChallengesBig Data: Opportunities, Strategy and Challenges
Big Data: Opportunities, Strategy and Challenges
Gregg Barrett
 
Hadoop Ecosystem Architecture Overview
Hadoop Ecosystem Architecture Overview Hadoop Ecosystem Architecture Overview
Hadoop Ecosystem Architecture Overview
Senthil Kumar
 
Big Data, Security Intelligence, (And Why I Hate This Title)
Big Data, Security Intelligence, (And Why I Hate This Title) Big Data, Security Intelligence, (And Why I Hate This Title)
Big Data, Security Intelligence, (And Why I Hate This Title)
Coastal Pet Products, Inc.
 
Generating Insight from Big Data in Energy and the Environment
Generating Insight from Big Data in Energy and the EnvironmentGenerating Insight from Big Data in Energy and the Environment
Generating Insight from Big Data in Energy and the Environment
David Wallom
 
Enterprise Approach towards Cost Savings and Enterprise Agility
Enterprise Approach towards Cost Savings and Enterprise AgilityEnterprise Approach towards Cost Savings and Enterprise Agility
Enterprise Approach towards Cost Savings and Enterprise Agility
NUS-ISS
 
Open-BDA Hadoop Summt 2014 - Post Summit Report
Open-BDA Hadoop Summt 2014 - Post Summit ReportOpen-BDA Hadoop Summt 2014 - Post Summit Report
Open-BDA Hadoop Summt 2014 - Post Summit Report
Innovative Management Services
 
Demystify big data data science
Demystify big data  data scienceDemystify big data  data science
Demystify big data data science
Mahesh Kumar CV
 
Big Data Security Intelligence and Analytics for Advanced Threat Protection
Big Data Security Intelligence and Analytics for Advanced Threat ProtectionBig Data Security Intelligence and Analytics for Advanced Threat Protection
Big Data Security Intelligence and Analytics for Advanced Threat Protection
Blue Coat
 
Big Data, Big Content, and Aligning Your Storage Strategy
Big Data, Big Content, and Aligning Your Storage StrategyBig Data, Big Content, and Aligning Your Storage Strategy
Big Data, Big Content, and Aligning Your Storage Strategy
Hitachi Vantara
 
Smart Analytics For The Utility Sector
Smart Analytics For The Utility SectorSmart Analytics For The Utility Sector
Smart Analytics For The Utility Sector
Herman Bosker
 
Open-BDA - Big Data Hadoop Developer Training 10th & 11th June
Open-BDA - Big Data Hadoop Developer Training 10th & 11th JuneOpen-BDA - Big Data Hadoop Developer Training 10th & 11th June
Open-BDA - Big Data Hadoop Developer Training 10th & 11th June
Innovative Management Services
 
Building Hadoop Data Applications with Kite by Tom White
Building Hadoop Data Applications with Kite by Tom WhiteBuilding Hadoop Data Applications with Kite by Tom White
Building Hadoop Data Applications with Kite by Tom White
The Hive
 
To Serve and Protect: Making Sense of Hadoop Security
To Serve and Protect: Making Sense of Hadoop Security To Serve and Protect: Making Sense of Hadoop Security
To Serve and Protect: Making Sense of Hadoop Security
Inside Analysis
 
Kerberos, Token and Hadoop
Kerberos, Token and HadoopKerberos, Token and Hadoop
Kerberos, Token and Hadoop
Kai Zheng
 
"Big Data" in the Energy Industry
"Big Data" in the Energy Industry"Big Data" in the Energy Industry
"Big Data" in the Energy Industry
Paige Bailey
 
Hdp security overview
Hdp security overview Hdp security overview
Hdp security overview
Hortonworks
 
Mr. satish kumar, schnieder electric
Mr. satish kumar, schnieder electricMr. satish kumar, schnieder electric
Mr. satish kumar, schnieder electric
Rohan Pinto
 
Hadoop and Data Access Security
Hadoop and Data Access SecurityHadoop and Data Access Security
Hadoop and Data Access Security
Cloudera, Inc.
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
Shivaji Dutta
 
Big Data Security and Governance
Big Data Security and GovernanceBig Data Security and Governance
Big Data Security and Governance
DataWorks Summit/Hadoop Summit
 

Viewers also liked (20)

Big Data: Opportunities, Strategy and Challenges
Big Data: Opportunities, Strategy and ChallengesBig Data: Opportunities, Strategy and Challenges
Big Data: Opportunities, Strategy and Challenges
 
Hadoop Ecosystem Architecture Overview
Hadoop Ecosystem Architecture Overview Hadoop Ecosystem Architecture Overview
Hadoop Ecosystem Architecture Overview
 
Big Data, Security Intelligence, (And Why I Hate This Title)
Big Data, Security Intelligence, (And Why I Hate This Title) Big Data, Security Intelligence, (And Why I Hate This Title)
Big Data, Security Intelligence, (And Why I Hate This Title)
 
Generating Insight from Big Data in Energy and the Environment
Generating Insight from Big Data in Energy and the EnvironmentGenerating Insight from Big Data in Energy and the Environment
Generating Insight from Big Data in Energy and the Environment
 
Enterprise Approach towards Cost Savings and Enterprise Agility
Enterprise Approach towards Cost Savings and Enterprise AgilityEnterprise Approach towards Cost Savings and Enterprise Agility
Enterprise Approach towards Cost Savings and Enterprise Agility
 
Open-BDA Hadoop Summt 2014 - Post Summit Report
Open-BDA Hadoop Summt 2014 - Post Summit ReportOpen-BDA Hadoop Summt 2014 - Post Summit Report
Open-BDA Hadoop Summt 2014 - Post Summit Report
 
Demystify big data data science
Demystify big data  data scienceDemystify big data  data science
Demystify big data data science
 
Big Data Security Intelligence and Analytics for Advanced Threat Protection
Big Data Security Intelligence and Analytics for Advanced Threat ProtectionBig Data Security Intelligence and Analytics for Advanced Threat Protection
Big Data Security Intelligence and Analytics for Advanced Threat Protection
 
Big Data, Big Content, and Aligning Your Storage Strategy
Big Data, Big Content, and Aligning Your Storage StrategyBig Data, Big Content, and Aligning Your Storage Strategy
Big Data, Big Content, and Aligning Your Storage Strategy
 
Smart Analytics For The Utility Sector
Smart Analytics For The Utility SectorSmart Analytics For The Utility Sector
Smart Analytics For The Utility Sector
 
Open-BDA - Big Data Hadoop Developer Training 10th & 11th June
Open-BDA - Big Data Hadoop Developer Training 10th & 11th JuneOpen-BDA - Big Data Hadoop Developer Training 10th & 11th June
Open-BDA - Big Data Hadoop Developer Training 10th & 11th June
 
Building Hadoop Data Applications with Kite by Tom White
Building Hadoop Data Applications with Kite by Tom WhiteBuilding Hadoop Data Applications with Kite by Tom White
Building Hadoop Data Applications with Kite by Tom White
 
To Serve and Protect: Making Sense of Hadoop Security
To Serve and Protect: Making Sense of Hadoop Security To Serve and Protect: Making Sense of Hadoop Security
To Serve and Protect: Making Sense of Hadoop Security
 
Kerberos, Token and Hadoop
Kerberos, Token and HadoopKerberos, Token and Hadoop
Kerberos, Token and Hadoop
 
"Big Data" in the Energy Industry
"Big Data" in the Energy Industry"Big Data" in the Energy Industry
"Big Data" in the Energy Industry
 
Hdp security overview
Hdp security overview Hdp security overview
Hdp security overview
 
Mr. satish kumar, schnieder electric
Mr. satish kumar, schnieder electricMr. satish kumar, schnieder electric
Mr. satish kumar, schnieder electric
 
Hadoop and Data Access Security
Hadoop and Data Access SecurityHadoop and Data Access Security
Hadoop and Data Access Security
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
 
Big Data Security and Governance
Big Data Security and GovernanceBig Data Security and Governance
Big Data Security and Governance
 

Similar to Building hadoop based big data environment

How we lose etu hadoop competition
How we lose etu hadoop competitionHow we lose etu hadoop competition
How we lose etu hadoop competition
Evans Ye
 
Drupal for Project Managers, Part 3: Launching
Drupal for Project Managers, Part 3: LaunchingDrupal for Project Managers, Part 3: Launching
Drupal for Project Managers, Part 3: LaunchingAcquia
 
HadoopCon- Trend Micro SPN Hadoop Overview
HadoopCon- Trend Micro SPN Hadoop OverviewHadoopCon- Trend Micro SPN Hadoop Overview
HadoopCon- Trend Micro SPN Hadoop Overview
Yafang Chang
 
Spark forspringdevs springone_final
Spark forspringdevs springone_finalSpark forspringdevs springone_final
Spark forspringdevs springone_final
sdeeg
 
DC HUG Hadoop for Windows
DC HUG Hadoop for WindowsDC HUG Hadoop for Windows
DC HUG Hadoop for Windows
Terry Padgett
 
Strata NY 2014 - Architectural considerations for Hadoop applications tutorial
Strata NY 2014 - Architectural considerations for Hadoop applications tutorialStrata NY 2014 - Architectural considerations for Hadoop applications tutorial
Strata NY 2014 - Architectural considerations for Hadoop applications tutorial
hadooparchbook
 
project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2aswini pilli
 
project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2Aswini Ashu
 
Hot Technologies of 2013: Hadoop 2.0
Hot Technologies of 2013: Hadoop 2.0Hot Technologies of 2013: Hadoop 2.0
Hot Technologies of 2013: Hadoop 2.0
Inside Analysis
 
Future of Data Intensive Applicaitons
Future of Data Intensive ApplicaitonsFuture of Data Intensive Applicaitons
Future of Data Intensive Applicaitons
Milind Bhandarkar
 
14.05.2012 Opening the tool box: Development, testing and deployment in the H...
14.05.2012 Opening the tool box: Development, testing and deployment in the H...14.05.2012 Opening the tool box: Development, testing and deployment in the H...
14.05.2012 Opening the tool box: Development, testing and deployment in the H...Swiss Big Data User Group
 
Help! I inherited a Drupal Site! - DrupalCamp Atlanta 2016
Help! I inherited a Drupal Site! - DrupalCamp Atlanta 2016Help! I inherited a Drupal Site! - DrupalCamp Atlanta 2016
Help! I inherited a Drupal Site! - DrupalCamp Atlanta 2016
Paul McKibben
 
Aiming for automatic updates - Drupal Dev Days Lisbon 2018
Aiming for automatic updates - Drupal Dev Days Lisbon 2018Aiming for automatic updates - Drupal Dev Days Lisbon 2018
Aiming for automatic updates - Drupal Dev Days Lisbon 2018
hernanibf
 
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesHadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
Cloudera, Inc.
 
Drupal in-depth
Drupal in-depthDrupal in-depth
Drupal in-depth
Kathryn Carruthers
 
A FUTURE-FOCUSED DIGITAL PLATFORM WITH DRUPAL 8
A FUTURE-FOCUSED DIGITAL PLATFORM WITH DRUPAL 8A FUTURE-FOCUSED DIGITAL PLATFORM WITH DRUPAL 8
A FUTURE-FOCUSED DIGITAL PLATFORM WITH DRUPAL 8
Phase2
 
Postgres & Red Hat Cluster Suite
Postgres & Red Hat Cluster SuitePostgres & Red Hat Cluster Suite
Postgres & Red Hat Cluster Suite
EDB
 
Hadoop 2 - Going beyond MapReduce
Hadoop 2 - Going beyond MapReduceHadoop 2 - Going beyond MapReduce
Hadoop 2 - Going beyond MapReduce
Uwe Printz
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository world
Roberto Pérez Alcolea
 

Similar to Building hadoop based big data environment (20)

How we lose etu hadoop competition
How we lose etu hadoop competitionHow we lose etu hadoop competition
How we lose etu hadoop competition
 
Drupal for Project Managers, Part 3: Launching
Drupal for Project Managers, Part 3: LaunchingDrupal for Project Managers, Part 3: Launching
Drupal for Project Managers, Part 3: Launching
 
HadoopCon- Trend Micro SPN Hadoop Overview
HadoopCon- Trend Micro SPN Hadoop OverviewHadoopCon- Trend Micro SPN Hadoop Overview
HadoopCon- Trend Micro SPN Hadoop Overview
 
Spark forspringdevs springone_final
Spark forspringdevs springone_finalSpark forspringdevs springone_final
Spark forspringdevs springone_final
 
DC HUG Hadoop for Windows
DC HUG Hadoop for WindowsDC HUG Hadoop for Windows
DC HUG Hadoop for Windows
 
Strata NY 2014 - Architectural considerations for Hadoop applications tutorial
Strata NY 2014 - Architectural considerations for Hadoop applications tutorialStrata NY 2014 - Architectural considerations for Hadoop applications tutorial
Strata NY 2014 - Architectural considerations for Hadoop applications tutorial
 
project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2
 
project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2
 
Hot Technologies of 2013: Hadoop 2.0
Hot Technologies of 2013: Hadoop 2.0Hot Technologies of 2013: Hadoop 2.0
Hot Technologies of 2013: Hadoop 2.0
 
Future of Data Intensive Applicaitons
Future of Data Intensive ApplicaitonsFuture of Data Intensive Applicaitons
Future of Data Intensive Applicaitons
 
14.05.2012 Opening the tool box: Development, testing and deployment in the H...
14.05.2012 Opening the tool box: Development, testing and deployment in the H...14.05.2012 Opening the tool box: Development, testing and deployment in the H...
14.05.2012 Opening the tool box: Development, testing and deployment in the H...
 
Help! I inherited a Drupal Site! - DrupalCamp Atlanta 2016
Help! I inherited a Drupal Site! - DrupalCamp Atlanta 2016Help! I inherited a Drupal Site! - DrupalCamp Atlanta 2016
Help! I inherited a Drupal Site! - DrupalCamp Atlanta 2016
 
Aiming for automatic updates - Drupal Dev Days Lisbon 2018
Aiming for automatic updates - Drupal Dev Days Lisbon 2018Aiming for automatic updates - Drupal Dev Days Lisbon 2018
Aiming for automatic updates - Drupal Dev Days Lisbon 2018
 
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesHadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
 
Drupal in-depth
Drupal in-depthDrupal in-depth
Drupal in-depth
 
A FUTURE-FOCUSED DIGITAL PLATFORM WITH DRUPAL 8
A FUTURE-FOCUSED DIGITAL PLATFORM WITH DRUPAL 8A FUTURE-FOCUSED DIGITAL PLATFORM WITH DRUPAL 8
A FUTURE-FOCUSED DIGITAL PLATFORM WITH DRUPAL 8
 
Postgres & Red Hat Cluster Suite
Postgres & Red Hat Cluster SuitePostgres & Red Hat Cluster Suite
Postgres & Red Hat Cluster Suite
 
HugNov14
HugNov14HugNov14
HugNov14
 
Hadoop 2 - Going beyond MapReduce
Hadoop 2 - Going beyond MapReduceHadoop 2 - Going beyond MapReduce
Hadoop 2 - Going beyond MapReduce
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository world
 

More from Evans Ye

Join ASF to Unlock Full Possibilities of Your Professional Career.pdf
Join ASF to Unlock Full Possibilities of Your Professional Career.pdfJoin ASF to Unlock Full Possibilities of Your Professional Career.pdf
Join ASF to Unlock Full Possibilities of Your Professional Career.pdf
Evans Ye
 
非常人走非常路:參與ASF打世界杯比賽
非常人走非常路:參與ASF打世界杯比賽非常人走非常路:參與ASF打世界杯比賽
非常人走非常路:參與ASF打世界杯比賽
Evans Ye
 
2017 big data landscape and cutting edge innovations public
2017 big data landscape and cutting edge innovations public2017 big data landscape and cutting edge innovations public
2017 big data landscape and cutting edge innovations public
Evans Ye
 
ONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smartONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smart
Evans Ye
 
The Apache Way: A Proven Way Toward Success
The Apache Way: A Proven Way Toward SuccessThe Apache Way: A Proven Way Toward Success
The Apache Way: A Proven Way Toward Success
Evans Ye
 
The Apache Way
The Apache WayThe Apache Way
The Apache Way
Evans Ye
 
Using the SDACK Architecture to Build a Big Data Product
Using the SDACK Architecture to Build a Big Data ProductUsing the SDACK Architecture to Build a Big Data Product
Using the SDACK Architecture to Build a Big Data Product
Evans Ye
 
Trend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopTrend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache Bigtop
Evans Ye
 
How bigtop leveraged docker for build automation and one click hadoop provis...
How bigtop leveraged docker for build automation and  one click hadoop provis...How bigtop leveraged docker for build automation and  one click hadoop provis...
How bigtop leveraged docker for build automation and one click hadoop provis...
Evans Ye
 
How bigtop leveraged docker for build automation and one click hadoop provis...
How bigtop leveraged docker for build automation and  one click hadoop provis...How bigtop leveraged docker for build automation and  one click hadoop provis...
How bigtop leveraged docker for build automation and one click hadoop provis...
Evans Ye
 
BigTop vm and docker provisioner
BigTop vm and docker provisionerBigTop vm and docker provisioner
BigTop vm and docker provisioner
Evans Ye
 
Docker workshop
Docker workshopDocker workshop
Docker workshopEvans Ye
 
Fits docker into devops
Fits docker into devopsFits docker into devops
Fits docker into devopsEvans Ye
 
Deep dive into enterprise data lake through Impala
Deep dive into enterprise data lake through ImpalaDeep dive into enterprise data lake through Impala
Deep dive into enterprise data lake through Impala
Evans Ye
 
Network Traffic Search using Apache HBase
Network Traffic Search using Apache HBaseNetwork Traffic Search using Apache HBase
Network Traffic Search using Apache HBaseEvans Ye
 
Hdfs ha using journal nodes
Hdfs ha using journal nodesHdfs ha using journal nodes
Hdfs ha using journal nodesEvans Ye
 
How to be a star engineer
How to be a star engineerHow to be a star engineer
How to be a star engineerEvans Ye
 

More from Evans Ye (18)

Join ASF to Unlock Full Possibilities of Your Professional Career.pdf
Join ASF to Unlock Full Possibilities of Your Professional Career.pdfJoin ASF to Unlock Full Possibilities of Your Professional Career.pdf
Join ASF to Unlock Full Possibilities of Your Professional Career.pdf
 
非常人走非常路:參與ASF打世界杯比賽
非常人走非常路:參與ASF打世界杯比賽非常人走非常路:參與ASF打世界杯比賽
非常人走非常路:參與ASF打世界杯比賽
 
2017 big data landscape and cutting edge innovations public
2017 big data landscape and cutting edge innovations public2017 big data landscape and cutting edge innovations public
2017 big data landscape and cutting edge innovations public
 
ONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smartONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smart
 
The Apache Way: A Proven Way Toward Success
The Apache Way: A Proven Way Toward SuccessThe Apache Way: A Proven Way Toward Success
The Apache Way: A Proven Way Toward Success
 
The Apache Way
The Apache WayThe Apache Way
The Apache Way
 
Using the SDACK Architecture to Build a Big Data Product
Using the SDACK Architecture to Build a Big Data ProductUsing the SDACK Architecture to Build a Big Data Product
Using the SDACK Architecture to Build a Big Data Product
 
Trend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopTrend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache Bigtop
 
How bigtop leveraged docker for build automation and one click hadoop provis...
How bigtop leveraged docker for build automation and  one click hadoop provis...How bigtop leveraged docker for build automation and  one click hadoop provis...
How bigtop leveraged docker for build automation and one click hadoop provis...
 
How bigtop leveraged docker for build automation and one click hadoop provis...
How bigtop leveraged docker for build automation and  one click hadoop provis...How bigtop leveraged docker for build automation and  one click hadoop provis...
How bigtop leveraged docker for build automation and one click hadoop provis...
 
BigTop vm and docker provisioner
BigTop vm and docker provisionerBigTop vm and docker provisioner
BigTop vm and docker provisioner
 
Docker workshop
Docker workshopDocker workshop
Docker workshop
 
Fits docker into devops
Fits docker into devopsFits docker into devops
Fits docker into devops
 
Deep dive into enterprise data lake through Impala
Deep dive into enterprise data lake through ImpalaDeep dive into enterprise data lake through Impala
Deep dive into enterprise data lake through Impala
 
Network Traffic Search using Apache HBase
Network Traffic Search using Apache HBaseNetwork Traffic Search using Apache HBase
Network Traffic Search using Apache HBase
 
Vagrant
VagrantVagrant
Vagrant
 
Hdfs ha using journal nodes
Hdfs ha using journal nodesHdfs ha using journal nodes
Hdfs ha using journal nodes
 
How to be a star engineer
How to be a star engineerHow to be a star engineer
How to be a star engineer
 

Recently uploaded

Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 

Recently uploaded (20)

Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 

Building hadoop based big data environment

  • 1. Building Hadoop Based Big Data Environment Evans Ye @ TWHUG 2013/12/14
  • 2. Who am I • Evans Ye @ • Dumbo Team • http://dumbointaiwan.blogspot.tw/ 12/14/2013 Copyright 2013 Trend Micro Inc.
  • 3. Agenda • Building your own Hadoop version • Hadoop Deployment • Hadoop release engineering • The development environment • Bigtop puppet 12/14/2013 Copyright 2013 Trend Micro Inc.
  • 4. Why Build our own version • Add your own patch at any time – From community perspective, they need to take care about backward complicity, which need much more time and effort on it. • Fetch official patches in to current adopted version – You may not upgrade your Hadoop version frequently, But there’s a specific need for that patch. • Flexibility, Business needed features 12/14/2013 Copyright 2013 Trend Micro Inc.
  • 5. As a Beginner 12/14/2013 Copyright 2013 Trend Micro Inc.
  • 6. Build Hadoop Infrastructure 12/14/2013 Copyright 2013 Trend Micro Inc. What’s your work?
  • 7. …. 12/14/2013 Copyright 2013 Trend Micro Inc. I thought you just need to yum install Hadoop.
  • 8. Brute force • git clone • Make some changes • Builde binary tarball How to do version control? core-site.xml hdfs-site.xml mapred-site.xml … 12/14/2013 Copyright 2013 Trend Micro Inc.
  • 10. How bigtop helps you • Apache Hadoop App developers: – Run pseudo-distributed Hadoop cluster to test your code on. • Vendors: – Build your own Apache Hadoop distribution, customized from Apache Bigtop bits. • Packaging, Deployment, Integration Testing 12/14/2013 Copyright 2013 Trend Micro Inc.
  • 11. Supported Linux Distro • Ubuntu 10.10 • CentOS 5/6 • Fedora 18 • Mageia 1 • openSUSE 12.2 12/14/2013 Copyright 2013 Trend Micro Inc.
  • 12. Build • Build hadoop-common (see BUILDING.txt) – hadoop-common$ mvn package –Pdist,docs,src,native -Dtar • Prepare your src tar in bigtop • Bigtop$ make hadoop-rpm 12/14/2013 Copyright 2013 Trend Micro Inc.
  • 14. Configuration files • Hadoop related config – – – – – – – – – 12/14/2013 core-site.xml hdfs-site.xml mapred-site.xml log4j.properties hadoop-env.sh fair-scheduler.xml rack-topology hadoop-metrics.properties taskcontroller.cfg Copyright 2013 Trend Micro Inc.
  • 15. Local Directories • Hadoop related file and directory – Namenode metadata • /name/1, /name/2 – Datanode • /data/1, /data/2 , /data/3 , /data/4 – Tasktracker • /mapred/1/local, /mapred/2/local –… 12/14/2013 Copyright 2013 Trend Micro Inc.
  • 17. Problems to solve • Lots of nodes need to be configured • Less human involved, less mistake made • Configuration changed quite often – adjust fair scheduler – enable/disable short circuit – try more performance improvement configurations 12/14/2013 Copyright 2013 Trend Micro Inc.
  • 19. What is puppet ? • A IT automation tool to help system administrators automate the many repetitive tasks • You need to only define the desired state 12/14/2013 Copyright 2013 Trend Micro Inc.
  • 20. What is Hadooppet ? • A general hadoop cluster deployment tool based on puppet • Kerberos / ldap auto configured • A set of hadoop / kerberos management tool • A set of sanity check scripts for trend hadoop related services • Manage configuration on puppetmaster 12/14/2013 Copyright 2013 Trend Micro Inc.
  • 21. Design • Abstract environment specific configurations in a single configuration file • setup.sh – – – – – – 12/14/2013 namenode_fqdns=(“dev1.example.com” “dev2.example.com”) namenode_dirs=(“/name/1” “/name/2”) namenode_heap=32g map_slots=5 reduce_slots=3 … Copyright 2013 Trend Micro Inc.
  • 22. Benifits • Can be used to setup any kind of hadoop cluster • When doing main version upgarade, minimal the downtime – hadoop1  hadoop2 Namenode Secondarynamenode 12/14/2013 Copyright 2013 Trend Micro Inc. Active/Standby Namenode Journalnodes ZKFC
  • 24. Manually • Build src tarball in hadoop-common • Build rpms in bigtop • submit build to release yum repo • yum update on hadoop cluster… 12/14/2013 Copyright 2013 Trend Micro Inc.
  • 25. Continuous Integration • Setup hadoop-common daily build • Setup Bigtop release Build – should be manually triggered • Setup Hadooppet daily build – Run sanity checks on a REAL CLUSTER 12/14/2013 Copyright 2013 Trend Micro Inc.
  • 26. Virtualization • Build a Xen Server Cluster 12/14/2013 Copyright 2013 Trend Micro Inc.
  • 28. give-me-vm • Pycon 2012 – Small Python Tools for Software Release Engineering • An automation tool to manage VM lifecycle • Use Python XenAPI • Create temporary VM for testing by self service • Destroy it when the testing is finished 12/14/2013 Copyright 2013 Trend Micro Inc.
  • 29. Build auto deployment on Hadooppet • ./give_me_vm.py • setup passphraseless ssh between each VM • set hostname • Install Hadooppet on master • run deployment • run sanity checks • ./destroy_vm.py 12/14/2013 Copyright 2013 Trend Micro Inc.
  • 32. For hadoop service developers… • No enough hadoop client for each developers • Developer can not reach server side while developing hadoop related services • Can not experiment new technology like impala spark flume • CI on Hadoop related services 12/14/2013 Copyright 2013 Trend Micro Inc.
  • 33. give-me-vm + Hadoop all-in-one VM • Use Hadooppet to setup a peudo-distributed hadoop VM as Xenserver template • get a Hadoop all-in-one VM via give-me-vm • Services integrate its CI test with hadoop all-in-one VM 12/14/2013 Copyright 2013 Trend Micro Inc.
  • 35. Bigtop puppet • Bigtop also has a set of puppet scripts to deploy Hadoop ecosystem 12/14/2013 Copyright 2013 Trend Micro Inc.
  • 36. Bigtop puppet • Preparation: – A VM with jdk, puppet installed – mkdir –p /data/{1,2} – git clone https://github.com/apache/bigtop.git 12/14/2013 Copyright 2013 Trend Micro Inc.
  • 37. Conclusion • There’re many great deployment tool exist – Ambari, CM, ETU appliance – Choose suitable distribution by your business need • If you want to do it by yourself – Bigtop can do packaging for you easily – Leverage bigtop puppet module for your deployment 12/14/2013 Copyright 2013 Trend Micro Inc.