Developers' Most Frequent Hadoop Headaches & How to Address Them
Hadoop Summit 2010 Developers Track

Shevek Mankin, Karmasphere

Presentation Transcript

  • Got Problems? Developers' Most Frequent Headaches and How to Address Them. Shevek, CTO
  • Session Agenda
    ◦ Introduction
    ◦ Problems Past
    ◦ Problems Present
    ◦ Problems Future
    ◦ Wrap Up
  • Who am I?
    ◦ Co-founder and CTO at Karmasphere
    ◦ Architect of Karmasphere’s solutions
    ◦ Have been working with Hadoop since …
    ◦ Written a few compilers
    ◦ Broken a few things:
      › computers, security systems, bosses, etc.
  • Survey of Questions (chart of 1,164 questions, based on user questions and issues; source: Hadoop users mailing list, March 2009 to June 2010)
    ◦ Others
    ◦ How to maintain the cluster?
    ◦ Why does Hadoop do ….?
    ◦ How to know what the cluster is doing?
    ◦ How to use Hadoop?
    ◦ How to get stuff to/from Hadoop?
    ◦ How to set up Hadoop?
  • Problems Past – Cluster as a Utility
    ◦ Getting a cluster – it’s a utility (like electricity)
      › Amazon EMR, Hadoop, Cloudera, IBM, Yahoo
    ◦ Cluster versions and protocols
      › Easy to switch between clusters
      › Staging for faster development
      › Easy to migrate data
      › Talk to remote clusters
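The "easy to switch between clusters" and "staging for faster development" points can be pictured as one named connection profile per cluster, chosen at submit time, so the same job runs against staging or production without code changes. This is a minimal sketch under stated assumptions, not the Karmasphere Client API: the `ClusterProfiles` class, the profile names, and the host names and ports are hypothetical; only the property keys `fs.default.name` and `mapred.job.tracker` are the Hadoop 0.20-era configuration names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

/**
 * Hypothetical sketch: one named connection profile per cluster, so a job can be
 * pointed at staging or production by name. Host names and ports are made up;
 * the keys are Hadoop 0.20-era property names.
 */
public class ClusterProfiles {
    // Registration order is preserved so profiles can be listed predictably.
    private final Map<String, Properties> profiles = new LinkedHashMap<>();

    public void register(String name, String nameNodeUri, String jobTracker) {
        Properties p = new Properties();
        p.setProperty("fs.default.name", nameNodeUri);    // HDFS NameNode URI
        p.setProperty("mapred.job.tracker", jobTracker);  // JobTracker host:port
        profiles.put(name, p);
    }

    public Properties select(String name) {
        Properties p = profiles.get(name);
        if (p == null) {
            throw new IllegalArgumentException("Unknown cluster profile: " + name);
        }
        // Hand out a copy so callers cannot modify the registered profile.
        Properties copy = new Properties();
        copy.putAll(p);
        return copy;
    }

    public static void main(String[] args) {
        ClusterProfiles clusters = new ClusterProfiles();
        clusters.register("staging", "hdfs://staging-nn:8020", "staging-jt:8021");
        clusters.register("production", "hdfs://prod-nn:8020", "prod-jt:8021");
        String target = args.length > 0 ? args[0] : "staging";
        Properties conf = clusters.select(target);
        System.out.println(target + " -> " + conf.getProperty("fs.default.name"));
    }
}
```

With a real Hadoop `Configuration`, each selected entry could be copied in with `conf.set(key, value)`; that step is left out so the sketch has no Hadoop dependency.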
  • Karmasphere Client
    ◦ Ensures Hadoop distribution and version independence
    ◦ Works from Windows (unlike Hadoop Client), Mac and Linux
    ◦ Supports any Hadoop environment: private, public or cloud service
    ◦ Provides:
      › Job portability
      › Operating system portability
      › Firewall hopping and tunnelling
      › Fault-tolerant API
      › Synchronous and asynchronous API
      › Clean object-oriented design
    ◦ Making it easy and predictable to maintain a business operation reliant on Hadoop
  • Cluster Access
  • Problems Present – Interact with Cluster  Getting data in  Getting data out
  • Problems Present – Interact with Cluster
    ◦ Getting data in
    ◦ Getting data out … this is the problem.
      › You can't just get data out; you have to extract information
  • Writing a MapReduce Job
    ◦ Understanding MapReduce
    ◦ Boilerplate is boring
    ◦ Testing takes time
    ◦ Debugging is difficult (What happened?)
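One way to shorten the test cycle is to keep the map and reduce logic in plain methods and exercise them in memory before anything is submitted to a cluster. The sketch below assumes a word-count job and plain Java with no Hadoop classes. It is not the Hadoop `Mapper`/`Reducer` API and not Karmasphere Job Developer: `LocalWordCount` and its methods are hypothetical names, and the in-memory `TreeMap` grouping stands in for Hadoop's shuffle and sort.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch (no Hadoop dependency): the map -> shuffle/sort -> reduce flow
 * of a word-count job, run in memory so the logic can be unit tested locally.
 */
public class LocalWordCount {

    // Map phase: emit (word, 1) for each whitespace-separated token of one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) { // a leading space yields an empty first token
                out.add(new SimpleEntry<>(token, 1));
            }
        }
        return out;
    }

    // Reduce phase: sum every count emitted for one key.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Driver: group map output by key (the cluster's shuffle/sort), then reduce each group.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("the cat sat", "The dog sat")));
    }
}
```

Running `main` prints `{cat=1, dog=1, sat=2, the=2}`. In a real job, thin `Mapper` and `Reducer` subclasses could delegate to the same `map` and `reduce` methods, keeping the cluster-specific code small.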
  • Karmasphere Job Developer
  • Present Continuous
    ◦ Why did my job fail?
      › Monitoring
      › Diagnostics
      › Debugging
    ◦ What do I need to know about my job?
      › Valgrind, lint, Coverity, gprof, gdb, FindBugs, sparse, JSR 305, ....
    ◦ Why did my job do ….?
  • Karmasphere Studio - Continuous
  • Problems Future
    ◦ Hive
    ◦ Pig
    ◦ Cascading
    ◦ Others ….
  • High-Level Languages – Challenges
    ◦ Accessibility
    ◦ Integration
    ◦ Portability
    ◦ Diagnostics
  • Karmasphere Application Framework
  • Traditional Approach vs. Karmasphere Approach (architecture diagram)
    ◦ Traditional approach: User → Hive JDBC / Thrift proxy (client side) → Thrift Server → Hive Engine → Hadoop Client (server side) → Job Tracker → Cluster (Hadoop). Rich communications are required for Hive, but all communications are ‘hampered’ through the JDBC Thrift proxy.
    ◦ Karmasphere approach: User → Karmasphere Application Framework → native Hadoop protocol → Job Tracker → Cluster (Hadoop). Rich communication, including debug/optimization information, is supported within the Karmasphere Application Framework.
  • Your time costs money (diagram: Theory, Experiment, Results)
  • Get Working Efficiently with Hadoop
    ◦ Karmasphere Studio: Community Edition (free)
    ◦ Karmasphere Studio: Professional Edition
      › ($200 introductory discount for attendees)
    ◦ Karmasphere Client (Enterprise license)
    ◦ Karmasphere Studio: Analyst Edition
      › Coming sooner than you think!
  • Questions? shevek@karmasphere.com