Welcome to the Age of Data

An introductory presentation on Big Data and Hadoop for bigdata.be, presented 11/Jan/2012 at Accenture (Brussels).

Published in: Technology, Business

Welcome to the Age of Data

  1. Welcome to the age of data! BIGDATA.BE IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
  2. Who am I » Steven Noels » Founder & VP Product » Makers of Lily: Interactive Big Data platform » Open Source / Apache Software Foundation » Co-founder bigdata.be
  3. Houston, we have a problem.
  4. We’re drowning.
  5. Drowning in a Sea of Data.
  6. Mountains of Metadata.
  7. The firehose of UGC (user-generated content).
  8. Still, we can’t make much sense of it.
  9. ... and we throw a lot of it away.
  10. We regard DATA as cost.
  11. But data is an opportunity.
  12. Think about it.
  13. advertisements
  14. recommendations
  15. fraud detection
  16. eyeballs
  17. churn
  18. The future is for data nerds.
  19. This is what Big Data is about: new insights, new business.
  20. 3 issues for BIG DATA
  21. Issue 1: volume. Need: more capacity. [Chart: curves labelled "data" and "moore" plotted against time]
  22. Issue 1, solution: distributed systems
  23. Issue 1 [image-only slide]
  24. Issue 1: distributed systems are hard.
  25. Issue 2: database
  26. Issue 2: database, data warehouse
  27. Issue 2: database, data warehouse, analytics
  28. Issue 2: data shuffling, data duplication (database, data warehouse, analytics)
  29. Issue 3: “Top-performing organizations are twice as likely to apply analytics to activities.” (MIT Sloan Management Review, Winter 2011)
  30. enter
  31. [image-only slide]
  32. HBase
  33. What is Hadoop? 1 server: RAM, CPU, Disk
  34. Many servers: RAM = HBASE, CPU = MAP/REDUCE, DISK = HDFS
  35. map/reduce
  36. map/reduce » Batch-oriented » Data locality (code is shipped around) » Heavy parallelization » Process management » Append-only files
  37. Hadoop ecosystem » Hadoop Common » Subprojects » MapReduce: A software framework for distributed processing of large data sets on compute clusters. » Pig: A high-level data-flow language and execution framework for parallel computation. » ZooKeeper: A high-performance coordination service for distributed applications. » Mahout: machine learning libraries » Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying. » Flume/SQOOP: Data collection systems for large distributed systems. » HBase: A scalable, distributed database that supports structured data storage for large/wide tables. » HDFS: A distributed file system that provides high throughput access to application data.
  38. [Diagram: CDH stack] UI Framework (HUE) » SDK (HUE SDK) » Workflow (OOZIE) » Scheduling (OOZIE) » Metadata (HIVE) » Languages / Compilers (PIG, HIVE) » Data Integration (FLUME, SQOOP) » Fast Read/Write Access (HBASE) » Coordination (ZOOKEEPER). Overlaid labels: High-level data model / easy API » indexes » Search » Dev2Dev tutoring, integrated deployment and enterprise support » usage metrics, analytics & recommendations (PIG, HIVE)
  39. real-time big data architecture » Speed layer (storm): 1. compensate for high latency of updates to serving layer; 2. fast, incremental algorithms; 3. batch layer eventually overrides speed layer » Serving layer: 1. random access to batch views; 2. updated by batch layer » Batch layer: 1. store master dataset (append-only); 2. compute arbitrary views
  40. Hadoop, interactive. [Diagram labels: Analytics, Interactics, (RDBMS); batch, interactive; static files, data management; scale 10^18, 10^15, 10^9-12]
  41. My baby: Lily. [Diagram labels: news & media, commerce, finance, telecom; smart data management, insights, indexing, search, interactive, audience profile, metrics harvesting]
  42. The start of Lily.
  43. Thank you! » for your attention » for your questions » steven.noels@outerthought.com » @stevenn
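
As a rough illustration of the map/reduce model named on slides 35 and 36 (this sketch is not part of the presentation): the job below counts words in three steps, a map step that emits key/value pairs per record, a shuffle step that groups values by key, and a reduce step that aggregates each group. It runs in a single Python process and does not use Hadoop's API; the function names map_phase, shuffle, reduce_phase and run_job are hypothetical and chosen only for this example. On a real cluster the map and reduce steps would run in parallel on the nodes that hold the data.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values of one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence (hypothetical helper)."""
    mapped = chain.from_iterable(map_phase(record) for record in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Two toy input records standing in for lines of a large file.
    print(run_job(["big data is big", "data is an opportunity"]))
```

Because each record is mapped independently and each key is reduced independently, both steps can be spread over many servers, which is what makes the model batch-oriented and heavily parallel.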
