Hadoop for humans

Published in: Technology, Business

Hadoop for humans

  1. Hadoop for humans. Kien Pham, Software Engineer - R&D. Anaheim, CA, 10/04/2013. Friday, October 4, 13
  2. Hadoop?
  3. Hadoop is a framework: HDFS and Map/Reduce. (Image: http://www.flickr.com/photos/d90nikon/6195610430/sizes/o/in/photostream/)
  4. Map/Reduce
  5. Mapper: "I like SendGrid and email, you like SendGrid and email too" -> 1 1 1 1 1
  6. Mapper on worker 1, worker 2, and worker 3, each with: "I like SendGrid and email, you like SendGrid and email too" -> 1 1 1 1 1
  7. Reducer: (like, 1) (SendGrid, 1) (email, 1) (SendGrid, 1) (email, 1) -> (like, 1) (SendGrid, 2) (email, 2)
  8. key / value:
       like      1
       SendGrid  2
       email     2
  9. key / value:
       {"d": "2013-09-01", "t": "j"}  764872
       {"d": "2013-09-02", "t": "j"}  269661
       {"d": "2013-09-01", "t": "x"}  190889
       {"d": "2013-09-02", "t": "x"}  71693
  10. HDFS
  11. HDFS
  12. HDFS @ SG: 138 TB
  13. 1 TB = 1,024 GB; 138 TB = 141,312 GB; at 300 GB / day, 141,312 GB / 300 GB = 471 days
  14. S3
  15. 2015: Hadoop will process 50% of the world’s data. (Image: http://www.flickr.com/photos/tisdale53/4737492082/)
  16. custom jobs?
  17. mrgumble
  18. abstract Hadoop process
  19. start, stop, status, result
  20. mrgumble start -j my_cool_job
  21. mrgumble stop -j my_cool_job
  22. mrgumble status --job_id 1234
  23. mrgumble result -j job_name
  24. excited?
  25. template.py, hadoop-jobs repo, jobs/
  26. import mrgumble, import sgstats-hadoop
  27. Live Demo
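
The mapper and reducer steps on slides 5 to 8 can be sketched in plain Python, with no Hadoop involved. This is only an illustration of the idea, not code from the talk: the function names (`mapper`, `shuffle`, `reducer`, `word_count`) are made up for this sketch, and the input is the five words shown on the reducer slide rather than the full sentence.

```python
from collections import defaultdict
from itertools import chain


def mapper(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    for word in line.split():
        yield (word, 1)


def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Collapse all the 1s emitted for a key into a single total."""
    return (key, sum(values))


def word_count(lines):
    # Each line could be handed to a different worker; here they run in sequence.
    pairs = chain.from_iterable(mapper(line) for line in lines)
    return dict(reducer(key, values) for key, values in shuffle(pairs).items())


# Hypothetical input: the five words tracked on the reducer slide.
counts = word_count(["like SendGrid email SendGrid email"])
print(counts)
```

In a real cluster the mapper calls run in parallel on separate workers and the framework does the grouping; the sketch keeps everything in one process so the key/value flow is easy to follow.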

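The capacity estimate on slide 13 can be checked with a few lines of Python. The variable names are chosen for this sketch, and it assumes a constant 300 GB of new data per day, as the slide does.

```python
GB_PER_TB = 1024          # binary convention used on the slide
capacity_tb = 138         # HDFS capacity quoted on slide 12
daily_ingest_gb = 300     # data added per day, per slide 13

capacity_gb = capacity_tb * GB_PER_TB
# Whole days of ingest that fit before the cluster is full.
days_until_full = capacity_gb // daily_ingest_gb

print(capacity_gb, "GB of capacity,", days_until_full, "days of ingest")
```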