Analyzing Hadoop with Hadoop

A talk I gave at Berlin Buzzwords 2012. It explains why Hive doesn't fit in the Hadoop NoSQL environment and shows some examples of the information we were able to extract from the Hadoop user mailing list and git logs.

Published in: Technology

Transcript of "Analyzing Hadoop with Hadoop"

  1. Analyzing Hadoop with Hadoop (Monday, June 4, 2012)
  2. Data Grows Faster Than Moore's Law! Unstructured: 61.7% growth. Structured: 21.8% growth. http://www.emc.com/about/news/press/2011/20110628-01.htm © sg@datameer.com, confidential - Do not distribute
  3. 30+ Years Workflow. Slow, Static, Barrier: ETL, Data Warehouse, Business Intelligence. Fast, Dynamic, Agile: Raw Load, Hadoop, Analytics.
  4. Hadoop + Hive SQL 10+M NO-SQL Hadoop LOC http://thepage.time.com/2009/04/18/why-is-this-elephant-crying/ http://dearcomputer.nl/gir/?q=nerd+&s=4&b=Rip+Google!
  5. Evolution backward 1970’ ANSI SQL ORM JDO NO-SQL Hive SEQUEL Structured English Query Language http://chelseavose.wordpress.com/2012/01/26/is-evolution-real/
  6. Unstructured + Structured
  7. git log --numstat --pretty=format:%H,%ai,%cn,%ce%+B
  8. Data Quality?
  9. Results...
  10. Commits per Year 200
  11. LOC Changes per Year 7,000,000
  12. Most Lines Added 1,500,000
  13. 2006 Emails vs Commits commits emails 72
  14. 2011 Emails vs Commits commits emails 559
  15. Emails per Month 800
  16. Most Discussed, Least Changed
  17. Most Active Emailers 900
  18. We’re hiring!
  19. Emails with Most Replies
  20. Avg Characters per Commit Message 120
  21. Longest Comment 35,000
  22. Email Activity per Timezone
  23. Follow us: @datameer
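
Slide 7 gives the git log command whose output feeds the commit statistics, and slides 10 and 11 chart commits and LOC changes per year. The talk does not show how that output was parsed, so the following is only a minimal sketch in plain Python (not the Hadoop-based tooling used for the talk). It assumes the command's output has been saved as text, and it relies on a heuristic: a line is treated as a commit header if it starts with a 40-character hash followed by an ISO-style date, and as a numstat line if it has the form added<TAB>deleted<TAB>path. The names parse_git_log, per_year, and SAMPLE are hypothetical and exist only for illustration.

```python
import re
from collections import Counter

# Header produced by --pretty=format:%H,%ai,%cn,%ce
# (hash, author date, committer name, committer email).
HEADER = re.compile(
    r"^([0-9a-f]{40}),(\d{4})-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [+-]\d{4},(.*),([^,]*)$"
)
# --numstat line: added<TAB>deleted<TAB>path; binary files report "-".
NUMSTAT = re.compile(r"^(\d+|-)\t(\d+|-)\t(.+)$")


def parse_git_log(text):
    """Heuristically split the log text into one dict per commit."""
    commits = []
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            current = {
                "sha": header.group(1),
                "year": int(header.group(2)),  # year of the author date
                "committer": header.group(3),  # may itself contain commas
                "email": header.group(4),
                "message": [],
                "added": 0,
                "deleted": 0,
            }
            commits.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first header
        stat = NUMSTAT.match(line)
        if stat:
            if stat.group(1) != "-":
                current["added"] += int(stat.group(1))
            if stat.group(2) != "-":
                current["deleted"] += int(stat.group(2))
        elif line.strip():
            # Assumption: any other non-blank line belongs to the message body (%+B).
            current["message"].append(line)
    return commits


def per_year(commits):
    """Return (commits per year, added + deleted lines per year)."""
    commit_counts = Counter()
    loc_changes = Counter()
    for commit in commits:
        commit_counts[commit["year"]] += 1
        loc_changes[commit["year"]] += commit["added"] + commit["deleted"]
    return commit_counts, loc_changes


# Made-up sample in the shape the command on slide 7 would produce.
SAMPLE = "\n".join([
    "a" * 40 + ",2006-02-03 10:00:00 -0800,Jane Doe,jane@example.org",
    "Fix the build",
    "",
    "10\t2\tsrc/Foo.java",
    "-\t-\tlib/foo.jar",
    "",
    "b" * 40 + ",2011-07-01 09:30:00 +0200,Doe, John,john@example.org",
    "Add feature",
    "",
    "5\t0\tsrc/Bar.java",
])

if __name__ == "__main__":
    counts, loc = per_year(parse_git_log(SAMPLE))
    for year in sorted(counts):
        print(year, counts[year], loc[year])
```

A commit message line that happens to look like a numstat line would be miscounted by this heuristic, which is one of the data quality issues slide 8 hints at.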