Getting started with Hadoop?
Tips from Hadoop professionals to help kick-start your career
“I would like to share my experience with you.
1. I think practice is more important than theory, so do
a quick start, e.g. with the Cloudera QuickStart VM.
2. Start with the basics of installing and configuring
Hadoop using the command line; once you are familiar
with it, you can use a GUI such as Ambari or Cloudera
Manager.”
Jin Zhan
Square Enix - Senior Engineer
Japan
“Here are some tips. These are based on things which
people should know but I have seen them get wrong.
You probably have them already, and there are more
than two!
1. You must increase ulimits (e.g. the limit on open files).
http://blog.cloudera.com/blog/2009/03/configuration-parameters-what-can-you-just-ignore/
Mark H. Butler
Software Engineer at
Pataniqa Ltd
Preston, United Kingdom
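A quick way to see whether a node's open-file ulimit is still at a low default is to read it from the process itself. The sketch below uses Python's standard `resource` module (Unix only); the threshold of 32768 is an assumption for illustration, not a figure from the tip, so check the value your Hadoop distribution recommends.

```python
import resource

# Assumed target for illustration; consult your distribution's docs
# for the limit it actually recommends for Hadoop daemons.
ASSUMED_MIN_OPEN_FILES = 32768


def open_file_limit_ok(minimum: int = ASSUMED_MIN_OPEN_FILES) -> bool:
    """Return True if this process's soft open-file limit is at least `minimum`."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY means "no limit", which trivially satisfies the check.
    return soft == resource.RLIM_INFINITY or soft >= minimum


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"open files: soft={soft} hard={hard} ok={open_file_limit_ok()}")
```

Run it as the same user the Hadoop daemons run as, since limits are per user and per process.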
2. Installing a NoSQL database? Use the YCSB
benchmark to check it is working correctly
https://github.com/brianfrankcooper/YCSB/wiki
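A YCSB check normally runs a "load" phase followed by a "run" phase against the same workload file. The sketch below assumes it is executed from a YCSB install directory containing the `bin/ycsb` launcher (newer releases also ship `bin/ycsb.sh`) and uses the `basic` binding, which only prints the operations; swap in your database's binding name once the harness itself works.

```python
import shlex
import subprocess


def ycsb_commands(binding, workload="workloads/workloada", launcher="bin/ycsb"):
    """Build the 'load' then 'run' command lines for one YCSB workload."""
    return [[launcher, phase, binding, "-P", workload] for phase in ("load", "run")]


def run_ycsb(binding, **kwargs):
    """Execute both phases in order, stopping if either exits non-zero."""
    for cmd in ycsb_commands(binding, **kwargs):
        print("+", shlex.join(cmd))
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_ycsb("basic")
```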
3. Consider using compression (although there are tradeoffs!)
http://comphadoop.weebly.com/
http://blog.erdemagaoglu.com/post/4605524309/lzo-vs-snappy-vs-lzf-vs-zlib-a-comparison-of
http://www.slideshare.net/Hadoop_Summit/kamat-singh-june27425pmroom210cv2
http://blog.cloudera.com/blog/2011/09/snappy-and-hadoop/
http://www.cloudera.com/blog/2009/11/17/hadoop-at-twitter-part-1-splittable-lzo-compression/
https://github.com/twitter/hadoop-lzo
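The trade-off is compression ratio against CPU time. Snappy and LZO, the codecs discussed in the links, need third-party packages, so this sketch swaps in Python's built-in zlib, bz2 and lzma purely to show how to measure the trade-off on your own data. It does not capture splittability, which the Twitter post points out matters for MapReduce input.

```python
import bz2
import lzma
import time
import zlib

# Stdlib stand-ins for the Hadoop codecs discussed above.
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def compare_codecs(data: bytes) -> dict:
    """Return {codec: (compression_ratio, seconds_to_compress)} for `data`."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        # A codec that does not round-trip is useless however fast it is.
        if decompress(packed) != data:
            raise ValueError(f"{name} failed to round-trip")
        results[name] = (len(data) / len(packed), elapsed)
    return results


if __name__ == "__main__":
    sample = b"2014-01-01\tuser42\tclick\t/home\n" * 20000
    for name, (ratio, secs) in compare_codecs(sample).items():
        print(f"{name:5s} ratio={ratio:7.1f} time={secs * 1000:7.2f} ms")
```

Use a sample of your real records: ratios on synthetic, highly repetitive data will look far better than they are in practice.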
4. Don't install a Hadoop cluster manually; there are many
technologies to automate it, e.g. Puppet, Chef, Ansible, Vagrant
http://blog.godatadriven.com/bare-metal-hadoop-provisioning-ansible-cobbler.html
http://chimpler.wordpress.com/2013/01/20/deploying-hadoop-on-ec2-with-whirr/
http://java.dzone.com/articles/setting-hadoop-virtual-cluster
http://www.diversit.eu/2012/05/setting-up-hadoop-cluster-using-puppet.html
http://www.rpark.com/2013/02/using-chef-to-build-out-hadoop-cluster.html
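Part of what these tools buy you is that every node gets an identical configuration rendered from one source of truth. The sketch below illustrates only that idea: it renders a Hadoop `*-site.xml` file from a dictionary. It is not how Puppet, Chef or Ansible work internally, and the host name used with `fs.defaultFS` is hypothetical.

```python
import xml.etree.ElementTree as ET


def render_hadoop_site(properties: dict) -> str:
    """Render a Hadoop *-site.xml document from a {name: value} mapping."""
    root = ET.Element("configuration")
    # Sorting keys makes the output identical on every run, so re-running
    # the provisioning step produces no spurious diffs.
    for name in sorted(properties):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(properties[name])
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_hadoop_site({
        "fs.defaultFS": "hdfs://namenode.example.com:8020",  # hypothetical host
        "dfs.replication": 3,
    }))
```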
5. Java and Scala are great, but don't overlook Python:
it's handy for prototyping one-off MapReduce jobs, as
you do not need a cluster to test them
http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
Hope that helps!”
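The "no cluster needed" point works because Hadoop Streaming jobs simply read lines on stdin and write tab-separated key/value lines on stdout. Here is a minimal word-count sketch in that style; unlike the linked tutorial's separate `mapper.py` and `reducer.py`, it puts both stages in one hypothetical file, `wc.py`, and `run_locally` simulates the map, sort and reduce steps in a single process.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit 'word\t1' for every word, in Hadoop Streaming's tab-separated format."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum counts per word; input must be sorted by key, as Hadoop guarantees."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Simulate map -> shuffle/sort -> reduce in one process."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    # Local test: cat input.txt | python wc.py map | sort | python wc.py reduce
    stage = {"map": mapper, "reduce": reducer}[sys.argv[1]]
    for out in stage(sys.stdin):
        print(out)
```

The shell pipeline in the comment mirrors what Hadoop does between the two stages, so a job that works there is a reasonable candidate to submit to a real cluster.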
“Technically speaking, MapReduce is the base: Map = SELECT
and Reduce = GROUP BY. So if you know what
you want and how you want to summarize it, then
Hadoop is meant for you.”
Piyush Jindal
Software Engineer at Target
Bengaluru, Karnataka, India
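The SELECT/GROUP BY analogy can be made concrete with a toy in-memory MapReduce. In this sketch the map function plays the role of SELECT (and WHERE, by emitting nothing for filtered rows), the grouping dictionary stands in for the shuffle, and the reduce function is the aggregate in GROUP BY. The employee records are hypothetical sample data.

```python
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Toy MapReduce: map_fn yields (key, value) pairs; reduce_fn folds one key's values."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)  # the "shuffle": collect values by key
    return {key: reduce_fn(key, values) for key, values in sorted(groups.items())}


# Hypothetical sample table.
EMPLOYEES = [
    {"dept": "eng", "salary": 100, "active": True},
    {"dept": "eng", "salary": 50, "active": True},
    {"dept": "ops", "salary": 70, "active": True},
    {"dept": "ops", "salary": 30, "active": False},
]


def select_active_dept_salary(row):
    """Map = SELECT dept, salary ... WHERE active."""
    if row["active"]:
        yield row["dept"], row["salary"]


def sum_values(_key, values):
    """Reduce = the SUM(...) in GROUP BY dept."""
    return sum(values)


if __name__ == "__main__":
    # SELECT dept, SUM(salary) FROM employees WHERE active GROUP BY dept
    print(map_reduce(EMPLOYEES, select_active_dept_salary, sum_values))
```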
“Tips:
1. Good knowledge of data structures and the insight to
analyze the data are a must.
2. Core Java and the Collections framework are a must.
3. SQL and PL/SQL knowledge to solve complex
scenarios will help a lot.
These are the stepping stones to approaching a problem in
big data and to providing a solution as well.”
SOMANATH NANDA
Cloudera Certified Developer for Hadoop
Cognizant Technology Solutions
Bengaluru, Karnataka, India
“1. Audit your data to identify what might be useful but
unexploited. 2. Study new technologies; they are
moving rapidly.”
Merv Adrian
Vice President at Gartner
San Francisco Bay Area
“Some good examples are in this whitepaper (note:
registration required):
http://www.mongodb.com/lp/big-data”
Mat Keep
Principal Product Marketing Manager at MongoDB Inc.
Hawkinge, Kent, United Kingdom
“Here are some tips, in no particular order:
1. The best value from Hadoop comes from combining software
and hardware designed for your specific needs.
2. The hardware configuration of your cluster is very important. If
your workload is I/O-bound, disk specs matter; if it is CPU-bound,
faster CPUs are better; and if the application is memory-bound,
servers with more memory are needed.
Mohit Saxena
Vice President -Technology Founder InMobi - A Global
Mobile Ad Network
Bengaluru Area, India
3. Network connectivity between nodes is extremely important: at
least 1-gigabit NICs are a must in a Hadoop cluster so that
inter-node communication does not become a bottleneck, as it can
be a huge drag.
4. Plan the storage size and disk controller according to the
reads per second you want to achieve from each server.
5. Ganglia is a fairly good monitoring tool for Hadoop, and it can
point out bottlenecks.”
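Tips 3 and 4 come down to simple arithmetic: how many disks a server needs for a target read rate, and whether the NIC can keep up with those disks. The sketch below assumes 100 MB/s sequential read per spinning disk, an illustrative figure rather than one from the tip, and uses decimal units (1 Gbit/s = 125 MB/s) while ignoring protocol overhead.

```python
import math

# Assumed figures for illustration only; measure your own hardware.
ASSUMED_DISK_MB_PER_S = 100.0  # sequential read rate of one spinning disk
MB_PER_GBIT = 125.0            # 1 Gbit/s = 125 MB/s in decimal units


def disks_needed(target_mb_per_s: float,
                 per_disk_mb_per_s: float = ASSUMED_DISK_MB_PER_S) -> int:
    """Smallest number of disks whose combined read rate meets the target."""
    return math.ceil(target_mb_per_s / per_disk_mb_per_s)


def transfer_seconds(data_gb: float, link_gbit_per_s: float) -> float:
    """Time to move data_gb gigabytes over a link, ignoring protocol overhead."""
    return data_gb * 8 / link_gbit_per_s


def nic_is_bottleneck(num_disks: int, link_gbit_per_s: float,
                      per_disk_mb_per_s: float = ASSUMED_DISK_MB_PER_S) -> bool:
    """True if the disks can read faster than the NIC can ship the data."""
    return num_disks * per_disk_mb_per_s > link_gbit_per_s * MB_PER_GBIT


if __name__ == "__main__":
    n = disks_needed(400)
    print(f"disks for 400 MB/s: {n}")
    print(f"NIC bottleneck at 1 Gbit/s: {nic_is_bottleneck(n, 1)}")
    print(f"10 GB over 1 Gbit/s: {transfer_seconds(10, 1):.0f} s")
```

Under these assumptions, even four ordinary disks can read faster than a single 1-gigabit link can carry, which is why the network tip matters as much as the disk tip.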
For more information on the best Hadoop courses for your career,
check out the link below:
http://www.dezyre.com/Big-Data-and-Hadoop/19

Tips from Hadoop experts for beginners
