Got Problems?
Developers' Most Frequent Headaches
and How to Address Them
Shevek
CTO

Developer's Most Frequent Hadoop Headaches & How to Address Them (Hadoop Summit 2010)


Hadoop Summit 2010 Developers Track
Developer's Most Frequent Hadoop Headaches & How to Address Them
Shevek Mankin, Karmasphere

Published in: Technology

  1. Got Problems? Developers' Most Frequent Headaches and How to Address Them. Shevek, CTO
  2. Session Agenda
     • Introduction
     • Problems Past
     • Problems Present
     • Problems Future
     • Wrap Up
  3. Who am I?
     • Co-founder and CTO at Karmasphere
     • Architect of Karmasphere’s solutions
     • Have been working with Hadoop since …
     • Written a few compilers
     • Broken a few things:
       › computers, security systems, bosses, etc.
  4. Survey of Questions
     [Chart: 1164 questions, broken down by category on a 0% to 100% scale; based on user questions and issues]
     • Others
     • How to maintain the cluster?
     • Why does Hadoop do …?
     • How to know what the cluster is doing?
     • How to use Hadoop?
     • How to get stuff to/from Hadoop?
     • How to set up Hadoop?
     Source: Hadoop Users mailing list (March 2009 to June 2010)
  5. Problems Past – Cluster as a Utility
     • Getting a cluster: it’s a utility (like electricity)
       › Amazon EMR, Hadoop, Cloudera, IBM, Yahoo
     • Cluster versions and protocols
       › Easy to switch between clusters
       › Staging for faster development
       › Easy to migrate data
       › Talk to remote clusters
  6. Karmasphere Client
     • Ensures Hadoop distribution and version independence
     • Works from Windows (unlike Hadoop Client), Mac and Linux
     • Supports any Hadoop environment: private, public or cloud service
     • Provides:
       › Job portability
       › Operating system portability
       › Firewall hopping and tunnelling
       › Fault-tolerant API
       › Synchronous and asynchronous API
       › Clean object-oriented design
     • Makes it easy and predictable to maintain a business operation reliant on Hadoop
  7. Cluster Access
  8. Problems Present – Interact with Cluster
     • Getting data in
     • Getting data out
  9. Problems Present – Interact with Cluster
     • Getting data in
     • Getting data out
     • …
     This is the problem: can’t get data out; have to extract information.
  10. Writing a MapReduce Job
     • Understanding MapReduce
     • Boilerplate is boring
     • Testing takes time
     • Debugging is difficult
     What happened?
  11. Karmasphere Job Developer
  12. Present Continuous
     • Why did my job fail?
       › Monitoring
       › Diagnostics
       › Debugging
     • What do I need to know about my job?
       › Valgrind, lint, Coverity, gprof, gdb, FindBugs, sparse, JSR305, …
     • Why did my job do …?
  13. Karmasphere Studio - Continuous
  14. Problems Future
     • Hive
     • Pig
     • Cascading
     • Others …
  15. High Level Languages - Challenges
     • Accessibility
     • Integration
     • Portability
     • Diagnostics
  16. Karmasphere Application Framework
  17. Traditional Approach vs. Karmasphere Approach
     [Diagram: two stacks, each running from the user on the client side down to the Hadoop cluster on the server side]
     • Traditional Approach
       › Components: User, Hive JDBC Thrift Proxy, Thrift Server, Hive Engine, Hadoop Client, Job Tracker, Cluster (Hadoop)
       › Rich communications required for Hive
       › All communications ‘hampered’ through JDBC Thrift proxy
     • Karmasphere Approach
       › Components: User, Karmasphere Application Framework, Native Hadoop Protocol, Job Tracker, Cluster (Hadoop)
       › Rich communication supported within Karmasphere Application Framework
       › Debug/optimization information
  18. Your time costs money
     [Diagram: cycle linking Theory, Experiment and Results]
  19. Get Working Efficiently with Hadoop
     • Karmasphere Studio: Community Edition (free)
     • Karmasphere Studio: Professional Edition
       › ($200 introductory discount for attendees)
     • Karmasphere Client (Enterprise license)
     • Karmasphere Studio: Analyst Edition
       › Coming sooner than you think!
  20. Questions? shevek@karmasphere.com
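
The "Writing a MapReduce Job" slide lists understanding the model and slow testing as pain points. As an illustration only, the following is a minimal sketch of the map, shuffle and reduce steps run in a single Python process, so the logic of a job can be checked without a cluster. It is not Hadoop or Karmasphere code: the names `word_count_map`, `word_count_reduce` and `run_local` are hypothetical and were chosen for this example, and word counting is assumed as the sample job.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def word_count_map(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (word, 1) for every whitespace-separated word in a line."""
    for word in line.lower().split():
        yield word, 1


def word_count_reduce(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: sum all the counts emitted for one word."""
    return word, sum(counts)


def run_local(
    lines: Iterable[str],
    mapper: Callable[[str], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], Tuple[str, int]],
) -> Dict[str, int]:
    """Hypothetical helper: run map, shuffle (group by key) and reduce in-process."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)  # shuffle: collect values per key
    # Reduce each key's values; keys are visited in sorted order.
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    sample = ["got problems", "got hadoop problems"]
    print(run_local(sample, word_count_map, word_count_reduce))
    # prints {'got': 2, 'hadoop': 1, 'problems': 2}
```

Keeping the map and reduce functions free of any cluster dependency is one way to make them testable with plain assertions before a job is submitted; a real Hadoop job would still need its own driver and input/output configuration.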
