Developers' Most Frequent Hadoop Headaches & How to Address Them (Hadoop Summit 2010)
Who am I?
Co-founder and CTO at
Architect of Karmasphere’s solutions
Have been working with Hadoop since …
Written a few compilers
Broken a few things:
› computers, security systems, bosses, etc.
Survey of Questions
•How to maintain the cluster?
•Why does Hadoop do …?
•How to know what the cluster is doing?
•How to use Hadoop?
•How to get stuff to/from Hadoop? 20%
•How to set up Hadoop?
Based on user questions and issues
Source: Hadoop users mailing list (March 2009 to June 2010)
Problems Past – Cluster as a Utility
Getting a cluster – it’s a utility (like electricity)
› Amazon EMR, Hadoop, Cloudera, IBM, Yahoo
Cluster versions and protocols
› Easy to switch between clusters
› Staging for faster development
› Easy to migrate data
› Talk to remote clusters
Ensures Hadoop distribution and version independence
Works from Windows (unlike Hadoop Client), Mac and Linux
Supports any Hadoop environment: private, public or cloud
› Job portability
› Operating system portability
› Firewall hopping and tunnelling
› Fault tolerant API
› Synchronous and Asynchronous API
› Clean Object Oriented design
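The "fault tolerant" and "synchronous and asynchronous" bullets above can be illustrated with a small sketch. This is not Karmasphere's or Hadoop's actual API: `ClusterClient`, `TransientClusterError`, `run_job`, `submit_job` and `make_flaky_cluster` are all hypothetical names made up for illustration, and the "cluster" is a stand-in function that fails a fixed number of times. It only shows the general shape such an API might take: a blocking call that retries transient failures, and a non-blocking call that returns a future.

```python
from concurrent.futures import Future, ThreadPoolExecutor
import time


class TransientClusterError(Exception):
    """Hypothetical error for a failure worth retrying (e.g. a dropped connection)."""


class ClusterClient:
    """Hypothetical client sketch; not a real Karmasphere or Hadoop class."""

    def __init__(self, submit_fn, max_retries=3, backoff_seconds=0.0):
        self._submit_fn = submit_fn  # callable that actually talks to a cluster
        self._max_retries = max_retries
        self._backoff_seconds = backoff_seconds
        self._pool = ThreadPoolExecutor(max_workers=4)

    def run_job(self, job_name):
        """Synchronous call: block until done, retrying transient failures."""
        last_error = None
        for attempt in range(1, self._max_retries + 1):
            try:
                return self._submit_fn(job_name)
            except TransientClusterError as err:
                last_error = err
                # Simple linear backoff before the next attempt.
                time.sleep(self._backoff_seconds * attempt)
        raise last_error

    def submit_job(self, job_name) -> Future:
        """Asynchronous call: return a Future immediately; same retry logic."""
        return self._pool.submit(self.run_job, job_name)

    def close(self):
        self._pool.shutdown(wait=True)


def make_flaky_cluster(failures_before_success):
    """Stand-in for a real cluster: fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def submit(job_name):
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise TransientClusterError("connection dropped")
        return f"{job_name}: SUCCEEDED after {state['calls']} attempts"

    return submit
```

A caller would use `client.run_job("wordcount")` when it needs the result before continuing, and `client.submit_job("wordcount").result()` when it wants to start several jobs and collect the results later.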
Making it easy and predictable to maintain a business operation reliant on Hadoop
Why did my job fail?
What do I need to know about my job?
› Valgrind, lint, Coverity, gprof, gdb, FindBugs, sparse, …
Why did my job do …?
Traditional Approach vs. Karmasphere Approach
Rich communications required for Hive
› Supported within Karmasphere
› Debug/optimization information
[Diagram comparing the two approaches. Recoverable labels: Hive JDBC, Thrift Proxy, Thrift Server, Karmasphere, Native, Job Tracker.]
Get Working Efficiently with Hadoop
Karmasphere Studio: Community Edition (free)
Karmasphere Studio: Professional Edition
› ($200 introductory discount for attendees)
Karmasphere Client (Enterprise license)
Karmasphere Studio: Analyst Edition
› Coming sooner than you think!