Developers' Most Frequent Hadoop Headaches & How to Address Them (Hadoop Summit 2010)

  1. Got Problems? Developers’ Most Frequent Headaches and How to Address Them. Shevek, CTO
  2. Session Agenda: Introduction; Problems Past; Problems Present; Problems Future; Wrap Up
  3. Who am I? Co-founder and CTO at [missing]; architect of Karmasphere’s solutions; have been working with Hadoop since …; written a few compilers; broken a few things: computers, security systems, bosses, etc.
  4. Survey of Questions (chart: 1164 questions, categories shown as percentages of the total): Others; How to maintain the cluster?; Why does Hadoop do ….?; How to know what the cluster is doing?; How to use Hadoop?; How to get stuff to/from Hadoop?; How to set up Hadoop? Based on user questions and issues. Source: Hadoop Users mailing list (March 2009-June 2010)
  5. Problems Past: Cluster as a Utility. Getting a cluster: it’s a utility, like electricity (Amazon EMR, Hadoop, Cloudera, IBM, Yahoo). Cluster versions and protocols: easy to switch between clusters; staging for faster development; easy to migrate data; talk to remote clusters.
  6. Karmasphere Client: ensures Hadoop distribution and version independence; works from Windows (unlike the Hadoop client), Mac, and Linux; supports any Hadoop environment, whether private, public, or cloud service. Provides: job portability; operating system portability; firewall hopping and tunnelling; fault-tolerant API; synchronous and asynchronous API; clean object-oriented design. Makes it easy and predictable to maintain a business operation reliant on Hadoop.
  7. Cluster Access
  8. Problems Present: Interact with Cluster. Getting data in; getting data out.
  9. Problems Present: Interact with Cluster. Getting data in; getting data out … this is the problem. Can’t get data out; have to extract information.
  10. Writing a MapReduce Job: understanding MapReduce; boilerplate is boring; testing takes time; debugging is difficult. What happened?
  11. Karmasphere Job Developer
  12. Present Continuous. Why did my job fail? Monitoring; diagnostics; debugging. What do I need to know about my job? Valgrind, lint, Coverity, gprof, gdb, FindBugs, sparse, JSR 305, .... Why did my job do ….?
  13. Karmasphere Studio - Continuous
  14. Problems Future: Hive; Pig; Cascading; others ….
  15. High Level Languages - Challenges: accessibility; integration; portability; diagnostics
  16. Karmasphere Application Framework
  17. Traditional Approach vs. Karmasphere Approach (architecture diagram, client side and server side). Traditional approach components: User; Hive JDBC; Thrift proxy; Thrift server; Hive engine; Hadoop client; Job Tracker; Cluster (Hadoop). Rich communications are required for Hive, but all communications are ‘hampered’ through the JDBC Thrift proxy. Karmasphere approach components: User; Karmasphere Application Framework; native Hadoop protocol; Job Tracker; Cluster (Hadoop). Rich communication, including debug/optimization information, is supported within the Karmasphere Application Framework.
  18. Your time costs money (diagram labels: Theory; Results; Experiment)
  19. Get Working Efficiently with Hadoop: Karmasphere Studio Community Edition (free); Karmasphere Studio Professional Edition ($200 introductory discount for attendees); Karmasphere Client (Enterprise license); Karmasphere Studio Analyst Edition (coming sooner than you think!)
  20. Questions? shevek@karmasphere.com
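
Slide 10 names understanding MapReduce and slow testing as headaches. As an illustration only (it is not taken from the talk and uses no Hadoop or Karmasphere API), the map, shuffle, and reduce phases can be sketched in plain Python and exercised locally without a cluster. The names `map_words`, `reduce_counts`, and `run_local` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (word, 1) for every whitespace-separated word."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: sum all the counts emitted for one word."""
    return word, sum(counts)


def run_local(
    lines: Iterable[str],
    mapper: Callable[[str], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], Tuple[str, int]],
) -> Dict[str, int]:
    """Run map, shuffle (group values by key), and reduce in one process."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)  # shuffle: collect values per key
    # Reduce each key's values; keys are processed in sorted order.
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    print(run_local(["hadoop is big", "big data is big"], map_words, reduce_counts))
```

Keeping the mapper and reducer as plain functions means their logic can be checked with ordinary unit tests before a job is ever submitted to a cluster.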
