Developer's Most Frequent Hadoop Headaches & How to Address Them (Hadoop Summit 2010)
Hadoop Summit 2010 Developers Track
Developer's Most Frequent Hadoop Headaches & How to Address Them
Shevek Mankin, Karmasphere

Published in: Technology

Transcript

  • 1. Got Problems? Developers' Most Frequent Headaches and How to Address Them (Shevek, CTO)
  • 2. Session Agenda
      • Introduction
      • Problems Past
      • Problems Present
      • Problems Future
      • Wrap Up
  • 3. Who am I?
      • Co-founder and CTO at Karmasphere
      • Architect of Karmasphere’s solutions
      • Have been working with Hadoop since …
      • Written a few compilers
      • Broken a few things:
          › computers, security systems, bosses, etc.
  • 4. Survey of Questions
      • 1164 questions, based on user questions and issues
      • Source: Hadoop Users mailing list (March 2009 to June 2010)
      • Chart: share of questions by category (axis 0% to 100%). Categories as labeled, top to bottom:
          › Others
          › How to maintain the cluster?
          › Why does Hadoop do ….?
          › How to know what the cluster is doing?
          › How to use Hadoop?
          › How to get stuff to/from Hadoop?
          › How to set up Hadoop?
  • 5. Problems Past – Cluster as a Utility
      • Getting a cluster – it’s a utility (like electricity)
          › Amazon EMR, Hadoop, Cloudera, IBM, Yahoo
      • Cluster versions and protocols
          › Easy to switch between clusters
          › Staging for faster development
          › Easy to migrate data
          › Talk to remote clusters
  • 6. Karmasphere Client
      • Ensures Hadoop distribution and version independence
      • Works from Windows (unlike Hadoop Client), Mac and Linux
      • Supports any Hadoop environment: private, public or cloud service
      • Provides:
          › Job portability
          › Operating system portability
          › Firewall hopping and tunnelling
          › Fault-tolerant API
          › Synchronous and asynchronous API
          › Clean object-oriented design
      • Making it easy and predictable to maintain a business operation reliant on Hadoop
  • 7. Cluster Access
  • 8. Problems Present – Interact with Cluster
      • Getting data in
      • Getting data out
  • 9. Problems Present – Interact with Cluster
      • Getting data in
      • Getting data out … This is the problem.
          › Can’t get data out
          › Have to extract information
  • 10. Writing a MapReduce Job
      • Understanding MapReduce
      • Boilerplate is boring
      • Testing takes time
      • Debugging is difficult
      • What Happened?
  • 11. Karmasphere Job Developer
  • 12. Present Continuous
      • Why did my job fail?
          › Monitoring
          › Diagnostics
          › Debugging
      • What do I need to know about my job?
          › Valgrind, lint, coverity, gprof, gdb, findbugs, sparse, JSR305, ....
      • Why did my job do ….?
  • 13. Karmasphere Studio - Continuous
  • 14. Problems Future
      • Hive
      • Pig
      • Cascading
      • Others ….
  • 15. High Level Languages - Challenges
      • Accessibility
      • Integration
      • Portability
      • Diagnostics
  • 16. Karmasphere Application Framework
  • 17. Traditional Approach vs. Karmasphere Approach (architecture diagram; labels as recovered from the slide)
      • Traditional Approach
          › Components: User, Hive JDBC Thrift Proxy, Thrift Server, Hive Engine, Hadoop Client, Job Tracker, Cluster (Hadoop)
          › Rich communications required for Hive
          › All communications ‘hampered’ through JDBC Thrift proxy
      • Karmasphere Approach
          › Components: User, Karmasphere Application Framework, Native Hadoop Protocol, Job Tracker, Cluster (Hadoop)
          › Rich communication supported within Karmasphere Application Framework
          › Debug/optimization information
      • Diagram regions: Client Side, Server Side
  • 18. Your time costs money (diagram labels: Theory, Results, Experiment)
  • 19. Get Working Efficiently with Hadoop
      • Karmasphere Studio: Community Edition (free)
      • Karmasphere Studio: Professional Edition
          › ($200 introductory discount for attendees)
      • Karmasphere Client (Enterprise license)
      • Karmasphere Studio: Analyst Edition
          › Coming sooner than you think!
  • 20. Questions? shevek@karmasphere.com
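
Editor's sketch for slide 10 ("Understanding MapReduce", "Testing takes time"): the slides do not include code, so the example below is only an illustration, not anything shown in the talk or part of any Karmasphere product. It assumes a word-count job and simulates the map, shuffle (group by key), and reduce phases in a single Python process, which is one way to check mapper and reducer logic before submitting a job to a cluster. The names `map_words`, `reduce_counts`, and `run_local` are hypothetical and do not belong to the Hadoop API.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (word, 1) for every whitespace-separated word."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: sum all the counts emitted for one word."""
    return word, sum(counts)


def run_local(
    lines: Iterable[str],
    mapper: Callable[[str], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], Tuple[str, int]],
) -> Dict[str, int]:
    """Run map, shuffle, and reduce in-process (hypothetical test helper)."""
    # Shuffle: group every value emitted by the mapper under its key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    # Reduce: call the reducer once per key, in sorted key order.
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    sample = ["Hadoop is a utility", "a cluster is a utility"]
    print(run_local(sample, map_words, reduce_counts))
```

Keeping the mapper and reducer as plain functions means they can be unit-tested in milliseconds; a real job would still need the Hadoop boilerplate (job configuration, input and output formats) that the slide refers to.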
