Geek camp

Slides of my talk 'Intro to Hadoop' @geekcamp.sg

  1. Intro to Hadoop, Jaideep Dhok
  2. Hi!
     ● I work at
     ● Involved with Hadoop for 2+ years
  3. Outline
  4. Brief History of Hadoop
     ● 2005 -
     ● Inspired by the GFS and MapReduce papers published by Google
     ● Promoted heavily by Yahoo! since 2006
     ● Today, the de facto standard in 'Big Data' computing
  5. The Buzz
  6. Why?
     ● 'Big Data'
     ● How big? Petabyte scale
     ● Scalable
     ● Robust
     ● Secure!
  7. Scalability
  8. When To Use It
     ● Can you use Hadoop to do X?
     ● Is your problem 'embarrassingly' parallel?
     ● Workflow?
       – Dependent/independent tasks
     ● Data- or CPU-intensive?
     ● Can you use Hadoop to do X in the cloud?
     ● Depends on where your data is
  9. Why Use It?
     ● Ad hoc analysis
     ● Structured and semi-structured data
       – Log files
       – Text
       – CSV, XML, anything really
       – RDBMS
       – NoSQL!
  10. Use Cases
      ● Analytics
      ● User behavior
      ● Reporting
      ● Filtering
      ● Machine Learning
      ● Just storing your data
  11. Just From The Logs
      ● Suppose you run a website
      ● User breakdown by browser
      ● Location
      ● Understanding user sessions
        – How long do they use it?
        – Who are the active users?
        – Which parts of my app do they use the most?
        – Which part of my app is user X's favorite?
  12. Tools
      ● Native Hadoop APIs
        – Java
      ● Streaming
        – Perl, Python, Ruby, or any language that can read from 'stdin' and write to 'stdout'
      ● Pig
      ● Hive
      ● Pipes
        – C++
  13. Ecosystem
  14. Don't Wait
      ● Hadoop: hadoop.apache.org
      ● Cloudera tutorials on Hadoop
      ● Books
  15. Questions?
  16. Thank You! jaideep.dhok@gmail.com
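Slide 8 asks whether a problem is 'embarrassingly' parallel. In MapReduce terms that means each input record can be processed without looking at any other record, which is exactly the shape of the map phase. The following sketch simulates the three phases (map, shuffle, reduce) in a single process, using word count as the example. The function names `map_fn`, `shuffle`, `reduce_fn` and `run_job` are hypothetical and nothing here uses the Hadoop API; on a real cluster each input split would go to its own map task and the framework would perform the shuffle.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_fn(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word. Needs no other record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group every value emitted for the same key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_fn(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key."""
    return key, sum(values)


def run_job(splits: list[list[str]]) -> dict[str, int]:
    # Each split stands in for one input split handled by its own map task.
    # The splits are independent, so these map calls could run on different machines.
    mapped = [pair for split in splits for line in split for pair in map_fn(line)]
    grouped = shuffle(mapped)
    return dict(reduce_fn(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    splits = [["the quick brown fox"], ["the lazy dog", "The end"]]
    print(run_job(splits))
    # {'brown': 1, 'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

A rough test for "can Hadoop do X?" is whether X fits this shape: a per-record map, a grouping key, and a per-key reduce. Tasks that depend on each other's output become a workflow of chained jobs.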

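Slide 11's "How long do they use it?" is usually answered by sessionization: group each user's hits, sort them by time, and start a new session whenever the gap between consecutive hits exceeds an inactivity timeout. The following sketch shows the per-user logic a reducer keyed on user ID could run. The 30-minute timeout, the `(user_id, epoch_seconds)` record shape and the helper names `sessions_for_user` and `session_report` are assumptions for illustration, not something taken from the talk.

```python
from collections import defaultdict
from typing import Iterable

SESSION_TIMEOUT = 30 * 60  # assumed inactivity gap, in seconds


def sessions_for_user(timestamps: Iterable[int], timeout: int = SESSION_TIMEOUT) -> list[int]:
    """Split one user's hit times (epoch seconds) into sessions; return each session's length in seconds."""
    ts = sorted(timestamps)
    if not ts:
        return []
    lengths = []
    start = prev = ts[0]
    for t in ts[1:]:
        if t - prev > timeout:
            # Gap too long: close the current session and open a new one.
            lengths.append(prev - start)
            start = t
        prev = t
    lengths.append(prev - start)
    return lengths


def session_report(hits: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group (user_id, timestamp) hits by user, as the shuffle would, then sessionize each user."""
    by_user: dict[str, list[int]] = defaultdict(list)
    for user, ts in hits:
        by_user[user].append(ts)
    return {user: sessions_for_user(times) for user, times in sorted(by_user.items())}


if __name__ == "__main__":
    log = [("alice", 0), ("alice", 600), ("bob", 100), ("alice", 5000), ("alice", 5300)]
    print(session_report(log))
    # {'alice': [600, 300], 'bob': [0]}
```

A session with a single hit has length 0 under this definition. The same per-user grouping also answers "Who are the active users?", for example by counting sessions per user over a time window.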
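Slide 12 notes that Streaming works with any language that can use 'stdin' and 'stdout'. The contract is simple. The mapper reads raw input lines and writes `key<TAB>value` lines. Hadoop then sorts those lines by key before the reducer reads them, so all values for one key arrive consecutively. The following sketch applies that contract to the "user breakdown by browser" question from slide 11. It assumes a simplified, hypothetical log format in which the user agent is the last tab-separated field, and it classifies browsers with a crude substring check. The script name `browsers.py` and the functions `browser_family`, `mapper` and `reducer` are likewise hypothetical.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def browser_family(user_agent: str) -> str:
    """Very rough user-agent classification (an assumption; real UA parsing is harder)."""
    ua = user_agent.lower()
    # Order matters: Chrome's UA string also contains "safari".
    for name in ("firefox", "chrome", "safari", "msie"):
        if name in ua:
            return name
    return "other"


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'browser<TAB>1' per log line; assumes the UA is the last tab-separated field."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[-1]:  # skip blank lines
            yield f"{browser_family(fields[-1])}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Sum counts per key; relies on the input already being sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(v) for _, v in group)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: "python browsers.py map" or "python browsers.py reduce"
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

Locally, `cat access.log | python browsers.py map | sort | python browsers.py reduce` approximates what the framework does between the two phases (`access.log` is a placeholder). On a cluster, the same script is handed to the streaming jar through its `-input`, `-output`, `-mapper` and `-reducer` options. The jar's location varies by Hadoop version and distribution.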