Analyzing Hadoop with Hadoop


A talk I gave at Berlin Buzzwords 2012. It argues that Hive doesn't fit in the Hadoop NoSQL environment and shows examples of the information we were able to extract from the Hadoop user mailing list and git logs.



  1. Analyzing Hadoop with Hadoop (Monday, 4 June 2012)
  2. Data Grows Faster Than Moore's Law! Unstructured: 61.7% growth. Structured: 21.8% growth. http://www.emc.com/about/news/press/2011/20110628-01.htm © sg@datameer.com, confidential - Do not distribute
  3. 30+ Years Workflow. Slow, static, barrier: ETL, Data Warehouse, Business Intelligence. Fast, dynamic, agile: Raw Load, Hadoop, Analytics.
  4. Hadoop + Hive SQL 10+M NO-SQL Hadoop LOC http://thepage.time.com/2009/04/18/why-is-this-elephant-crying/ http://dearcomputer.nl/gir/?q=nerd+&s=4&b=Rip+Google!
  5. Evolution backward 1970’ ANSI SQL ORM JDO NO-SQL Hive SEQUEL Structured English Query Language http://chelseavose.wordpress.com/2012/01/26/is-evolution-real/
  6. Unstructured + Structured
  7. git log --numstat --pretty=format:%H,%ai,%cn,%ce%+B
  8. Data Quality?
  9. Results...
  10. Commits per Year 200
  11. LOC Changes per Year 7,000,000
  12. Most Lines Added 1,500,000
  13. 2006 Emails vs Commits commits emails 72
  14. 2011 Emails vs Commits commits emails 559
  15. Emails per Month 800
  16. Most Discussed, Least Changed
  17. Most Active Emailers 900
  18. We’re hiring!
  19. Emails with Most Replies
  20. Avg Characters per Commit Message 120
  21. Longest Comment 35,000
  22. Email Activity per Timezone
  23. Follow us: @datameer
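
The git log command on slide 7 prints, for each commit, one header line (commit hash, author date, committer name, committer email), then the commit message, then one tab-separated `added  deleted  path` line per changed file. The talk ran its analysis on Hadoop; the sketch below is not that pipeline. It is only a small, single-machine illustration of how such output could be turned into per-year counts like those on the "Commits per Year" and "LOC Changes per Year" slides. The function name `summarize` and both regular expressions are assumptions made for this example, and the line matching is heuristic: a commit message line that happens to look like a header or numstat line would be miscounted.

```python
import re
from collections import defaultdict

# Header line produced by --pretty=format:%H,%ai,%cn,%ce
# (hypothetical pattern: 40-hex hash, ISO-like date, name, email).
# The committer name may itself contain commas, so it is matched greedily
# and the email is taken as the last comma-free field.
HEADER = re.compile(r"^([0-9a-f]{40}),(\d{4})-\d{2}-\d{2} [^,]*,(.*),([^,]*)$")

# One --numstat line: added<TAB>deleted<TAB>path ("-" marks binary files).
NUMSTAT = re.compile(r"^(\d+|-)\t(\d+|-)\t(.+)$")


def summarize(log_text):
    """Return {year: {"commits", "added", "deleted"}} from git log text."""
    commits = defaultdict(int)
    added = defaultdict(int)
    deleted = defaultdict(int)
    year = None  # year of the commit currently being read

    for line in log_text.splitlines():
        header = HEADER.match(line)
        if header:
            year = int(header.group(2))
            commits[year] += 1
            continue

        stat = NUMSTAT.match(line)
        if stat and year is not None:
            plus, minus = stat.group(1), stat.group(2)
            # Binary files report "-" for both counts; skip them.
            if plus != "-" and minus != "-":
                added[year] += int(plus)
                deleted[year] += int(minus)
        # Any other line is part of a commit message and is ignored here.

    return {
        y: {"commits": commits[y], "added": added[y], "deleted": deleted[y]}
        for y in sorted(commits)
    }
```

To try it, capture the output of the slide 7 command from a repository as text and pass it to `summarize`; the year is taken from the author date (`%ai`), as in the command on the slide.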
