Hadoop: the Big Answer to the Big Question of the Big Data

  1. THE BIG ANSWER TO THE BIG QUESTION OF THE BIG DATA, by Victor Haydin, ELEKS, DevTalks #1
  2. Gordon Moore
  3. 1975 → 2012:
     - Cost of 1 TB of storage: $208 000 000 → $110
     - Cost of 1 GFLOPS of computing capacity: $62 000 000 → $1.50
     - Number of network hosts: 57 → more than 1 000 000 000
     - World’s data volume: ~130 GB → ~2.9 ZB
  4. 1 ZB = 1 000 000 000 000 000 000 000 B (10^21 bytes)
  5. Commodity Hardware
  6. Wikipedia: “Apache Hadoop is a software framework that supports data-intensive distributed applications”
  7. Main Contributors
  8. HDFS: Hadoop Distributed File System
     - Hardware Failure
     - Streaming Data Access
     - Large Data Sets
     - Simple Coherency Model (write-once)
     - Portability
  9. Moving Computation Is Cheaper than Moving Data
  10. MapReduce
  11. Map(k1, v1) → list(k2, v2)
      // key: document name, value: document text
      void map(string key, string value):
          for each word w in value:
              emit(w, 1)
      Reduce(k2, list(v2)) → list(v3)
      // key: a word, values: all partial counts for that word
      void reduce(string key, int[] values):
          int sum = 0
          for each v in values:
              sum += v
          emit(key, sum)
  12. Demo
  13. Ecosystem (diagram of Hadoop ecosystem projects, including ZooKeeper)
  14. Large deployments:
      - 45K nodes, 180-200 PB
      - 3K+ nodes, 36+ PB
  15. Powered by
  16. Future
      Core:
      - HDFS: high availability and scalability
      - MapReduce: modularity and alternative ways to perform queries
      Ecosystem development:
      - Apache Bigtop: consolidation project
      - HBase, Hive, Pig, ZooKeeper, Avro, Sqoop: stabilization, interoperability
      - Incubator: Flume, Oozie, Whirr
  17. Demo
  18. Q&A
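
The 1975 and 2012 figures on slides 3 and 4 can be turned into growth rates. The Python sketch below is illustrative only: the dictionary and function names are made up here, it assumes the change between the two data points was steadily exponential, and it takes 1 GB as 10^9 bytes.

```python
import math

# Figures copied from slide 3; the names below are illustrative only.
YEARS = 2012 - 1975  # 37 years between the two data points

COSTS_USD = {
    "storage (1 TB)": (208_000_000, 110),
    "computing (1 GFLOPS)": (62_000_000, 1.50),
}

# World's data volume in bytes, taking 1 GB = 10**9 B and 1 ZB = 10**21 B (slide 4).
DATA_1975 = 130 * 10**9
DATA_2012 = 2.9 * 10**21


def improvement(old: float, new: float) -> float:
    """Ratio of the 1975 cost to the 2012 cost (how many times cheaper)."""
    return old / new


def halving_time_years(factor: float, years: int = YEARS) -> float:
    """Years per 2x change, assuming the change was steadily exponential."""
    return years / math.log2(factor)


for name, (old, new) in COSTS_USD.items():
    f = improvement(old, new)
    print(f"{name}: {f:,.0f}x cheaper, cost halved every ~{halving_time_years(f):.2f} years")

growth = DATA_2012 / DATA_1975  # how many times more data exists in 2012
print(f"world's data: {growth:.2e}x more")
```

By this arithmetic, storage became roughly 1.9 million times cheaper and computing roughly 41 million times cheaper, which corresponds to cost halving about every 1.8 and 1.5 years respectively. Both are in the same range as the doubling period usually quoted for Moore's law (slide 2).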
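
The word-count pseudocode on slide 11 can be run end to end in a single process. The sketch below is a local simulation written for illustration, not Hadoop's Java API: `map_fn`, `reduce_fn` and `run_word_count` are hypothetical names, and the grouping dictionary stands in for the shuffle that Hadoop performs across the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(key: str, value: str) -> Iterator[Tuple[str, int]]:
    """Map(k1, v1) -> list(k2, v2): emit (word, 1) for every word in the text."""
    for word in value.split():
        yield word, 1


def reduce_fn(key: str, values: Iterable[int]) -> Tuple[str, int]:
    """Reduce(k2, list(v2)) -> list(v3): sum the partial counts for one word."""
    return key, sum(values)


def run_word_count(documents: Dict[str, str]) -> Dict[str, int]:
    """Single-process stand-in for the map, shuffle and reduce phases."""
    # Map phase, then shuffle: group every intermediate value by its key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for name, text in documents.items():
        for word, count in map_fn(name, text):
            groups[word].append(count)
    # Reduce phase: one call per distinct key, visiting keys in sorted order.
    return dict(reduce_fn(word, counts) for word, counts in sorted(groups.items()))


if __name__ == "__main__":
    docs = {"a.txt": "big data big answer", "b.txt": "big question"}
    print(run_word_count(docs))
```

Because `reduce_fn` sees only one key and its list of counts, a real cluster can run many reduce calls in parallel on different nodes; the in-memory dictionary here works only while all intermediate pairs fit in one process.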
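
Slide 8 lists the design goals of HDFS. One way to see how the hardware-failure goal is met is to model a file being cut into fixed-size blocks, each copied to several DataNodes. This is a simplified Python sketch: all names are hypothetical, the block size (configurable; 64 MB by default in Hadoop 1.x and 128 MB in later releases) and the replication factor of 3 are parameters rather than fixed values, and the round-robin placement stands in for HDFS's actual rack-aware placement policy.

```python
from typing import Dict, List, Set

# Hypothetical parameter names; both values are configurable in real HDFS.
BLOCK_SIZE = 128 * 1024 * 1024  # bytes per block
REPLICATION = 3                 # copies kept of every block


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the sizes of the blocks a file of `file_size` bytes is cut into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks: int, nodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Put each block on `replication` distinct nodes, round-robin.

    Simplification: real HDFS uses a rack-aware placement policy instead.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }


def all_blocks_readable(placement: Dict[int, List[str]], failed: Set[str]) -> bool:
    """True if every block still has at least one replica on a live node."""
    return all(any(n not in failed for n in replicas) for replicas in placement.values())


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4"]
    blocks = split_into_blocks(300 * 1024 * 1024)  # a 300 MiB file -> 3 blocks
    placement = place_replicas(len(blocks), nodes)
    print(placement)
    print(all_blocks_readable(placement, {"dn1", "dn2"}))         # two nodes down: True
    print(all_blocks_readable(placement, {"dn1", "dn2", "dn3"}))  # three nodes down: False
```

Since every block lives on three distinct nodes, any two DataNodes can fail without making a block unreadable; the last block is usually shorter than the others.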
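
Slide 9's principle shows up in how MapReduce schedules work: it tries to run each map task on a node that already stores a replica of that task's input block. The Python sketch below models only that node-local preference with a simple greedy loop; the function names are hypothetical, and Hadoop's real scheduler also weighs rack locality and other factors.

```python
from typing import Dict, List, Optional


def pick_node(block: int, placement: Dict[int, List[str]],
              free_nodes: List[str]) -> Optional[str]:
    """Prefer a free node holding a replica of `block`; else any free node."""
    for node in placement.get(block, []):
        if node in free_nodes:
            return node  # node-local: the input is read from local disk
    # Not node-local: the chosen node must pull the block over the network.
    return free_nodes[0] if free_nodes else None


def schedule(placement: Dict[int, List[str]],
             free_nodes: List[str]) -> Dict[int, Optional[str]]:
    """Greedily assign one map task per block, at most one task per node."""
    free = list(free_nodes)
    assignment: Dict[int, Optional[str]] = {}
    for block in sorted(placement):
        node = pick_node(block, placement, free)
        assignment[block] = node  # None means the task waits for a free node
        if node is not None:
            free.remove(node)
    return assignment


def remote_tasks(assignment: Dict[int, Optional[str]],
                 placement: Dict[int, List[str]]) -> int:
    """Count tasks placed on a node that holds no local replica of their block."""
    return sum(1 for b, n in assignment.items() if n is not None and n not in placement[b])


if __name__ == "__main__":
    placement = {0: ["dn1", "dn2"], 1: ["dn2", "dn3"], 2: ["dn1", "dn3"]}
    plan = schedule(placement, ["dn3", "dn4", "dn1"])
    print(plan, remote_tasks(plan, placement))
```

A greedy pass is not guaranteed to maximize the number of node-local tasks; it is used here only because it keeps the idea short.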
