Hadoop for humans
Published in: Technology, Business

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
323
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
1
Comments
0
Likes
0
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. Hadoop for humans Kien Pham Software Engineer - R&D Anaheim, CA 10/04/2013 Friday, October 4, 13
  • 2. Hadoop?
  • 3. is a framework: HDFS, Map/Reduce (image: http://www.flickr.com/photos/d90nikon/6195610430/sizes/o/in/photostream/)
  • 4. Map / Reduce
  • 5. Mapper: "I like SendGrid and email, you like SendGrid and email too" → 1 1 1 1 1
  • 6. Mapper on three workers (worker 1, worker 2, worker 3), each mapping "I like SendGrid and email, you like SendGrid and email too" → 1 1 1 1 1
  • 7. Reducer: input like → 1, SendGrid → 1, email → 1, SendGrid → 1, email → 1; output like → 1, SendGrid → 2, email → 2
  • 8. key → value: like → 1, SendGrid → 2, email → 2
  • 9. key → value:
      {"d": "2013-09-01", "t": "j"} → 764872
      {"d": "2013-09-02", "t": "j"} → 269661
      {"d": "2013-09-01", "t": "x"} → 190889
      {"d": "2013-09-02", "t": "x"} → 71693
  • 10. HDFS
  • 11. HDFS
  • 12. HDFS @ SG: 138 TB
  • 13. 1 TB = 1,024 GB; 138 TB = 141,312 GB; at 300 GB / day, 141,312 GB / 300 GB ≈ 471 days
  • 14. S3
  • 15. 2015: Hadoop will process 50% of the world’s data (image: http://www.flickr.com/photos/tisdale53/4737492082/)
  • 16. custom jobs?
  • 17. mrgumble
  • 18. abstract Hadoop process
  • 19. start, stop, status, result
  • 20. mrgumble start -j my_cool_job
  • 21. mrgumble stop -j my_cool_job
  • 22. mrgumble status --job_id 1234
  • 23. mrgumble result -j job_name
  • 24. excited?
  • 25. template.py, hadoop-jobs repo, jobs/
  • 26. import mrgumble, import sgstats-hadoop
  • 27. Live Demo
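
The Mapper and Reducer slides (5 to 8) describe a word count: each mapper emits a 1 per word, and the reducer sums the 1s per key. A minimal sketch of that idea in plain Python follows, assuming nothing about Hadoop or mrgumble; the names `mapper`, `reducer`, and `run` are hypothetical and exist only for this illustration. It counts every word in the sentence, so its totals can differ from the subset of words shown on the slides.

```python
from collections import defaultdict

# Sentence taken from the Mapper slides.
SENTENCE = "I like SendGrid and email, you like SendGrid and email too"


def mapper(text):
    """Emit a (word, 1) pair for every word in the text."""
    for raw in text.split():
        word = raw.strip(",.")  # drop trailing punctuation such as "email,"
        if word:
            yield word, 1


def reducer(pairs):
    """Sum the values for each key, giving one total per word."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run(chunks):
    """Map every chunk, then reduce all emitted pairs together.

    Each chunk stands in for the input handed to one worker; a real
    Hadoop job would run the mappers in parallel on separate machines.
    """
    pairs = []
    for chunk in chunks:
        pairs.extend(mapper(chunk))
    return reducer(pairs)


counts = run([SENTENCE])
for word in ("like", "SendGrid", "email"):
    print(word, counts[word])
```

Passing the same sentence three times, as in `run([SENTENCE] * 3)`, mimics the three workers on slide 6: every total is tripled, because the reducer sees the pairs from all mappers.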