Introducing the Hadoop Ecosystem


Introducing the Hadoop Ecosystem, a presentation I gave at KMO Kennisbeurs on October 25th


  1. Introducing the Hadoop Ecosystem
  2. Context: Performance Gap Trend
  3. Context: Exponential for Decades
     Abundance of
     - computing & storage
     - generated data (estimated 8 ZB in ’15)
     - things
     More data provides greater value
     Traditional data doesn’t scale well
     It’s time for a new approach!
  4. New Hardware Approach
     Traditional
     - Exotic HW: big central servers, SAN, RAID
     - Hardware reliability
     - Limited scalability
     - Expensive
     Big Data
     - Commodity HW: racks of pizza boxes, Ethernet, JBOD
     - Unreliable HW
     - Scales further
     - Cost effective
  5. New Software Approach
     Traditional
     - Monolithic: centralized, RDBMS
     - Schema first
     - Proprietary
     Big Data
     - Distributed: storage & compute nodes
     - Raw data
     - Open source
  6. Hadoop
     De facto big data industry standard (batch)
     Vendor adoption
     - IBM, Microsoft, Oracle, EMC, ...
     A collection of projects at Apache
     - HDFS, MapReduce, Hive, Pig, HBase, Flume, Oozie, ...
     Main components
     - HDFS
     - MapReduce
     Cluster
     - Set of machines running HDFS and MapReduce
  7. HDFS
  8. MapReduce
  9. MapReduce
  10. MapReduce
  11. Typical Adoption Pattern
      An idea that’s impractical without Hadoop
      Build Hadoop-based POC
      Move initial application to production
      Add more datasets and users
      - removing data silos in organizations
      - permitting easy experiments on real data
      Snowballs into institution’s central repository for
      - analysis
      - data processing
      - data service layer
  12. Use Case 1: Truvo
  13. Use Case 2: UZ Brussel
  14. How can you use Hadoop?
      What data are you ignoring?
      - How can you use it?
      How can you combine internal and external data?
      - Business partners
      - Feedback from your customers through social media
      - End your data silos
      - ...
  15. DataCrunchers - Big Data Enablers
  16. Introduction to the Hadoop Ecosystem
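
The MapReduce slides (8 to 10) survive in this transcript only as titles, so the sketch below is an illustration of the general map, shuffle, and reduce model rather than a reconstruction of what the slides showed. It uses the common word-count example, which is an assumption and not taken from the deck, and it runs in a single Python process without any Hadoop API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key.

    On a real cluster the framework does this across machines; here it is
    simulated with an in-memory dictionary.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["Big data", "big cluster"]))
```

In a Hadoop cluster the map and reduce functions would run in parallel on the machines that hold the data in HDFS; this sketch only shows the shape of the computation.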
