Hadoop: the Big Answer to the Big Question of the Big Data


More info: http://www.elekslabs.com/2012/02/devtalks-1-presentations.html
Video: http://www.youtube.com/watch?feature=player_embedded&v=GENRle60Elk

Published in: Technology

  1. THE ANSWER TO THE QUESTION OF THE DATA (eleks DevTalks #1, by Victor Haydin)
  2. Gordon Moore
  3. 1975 vs. 2012
       Cost of 1 TB of storage: $208 000 000 vs. $110
       Cost of a 1 GFLOPS computing facility: $62 000 000 vs. $1.50
       Number of network hosts: 57 vs. > 1 000 000 000
       World’s data volume: ~130 GB vs. ~2.9 ZB
  4. 1 ZB = 1 000 000 000 000 000 000 000 B (10^21 B)
  5. Commodity Hardware
  6. Wikipedia: “Apache Hadoop is a software framework that supports data-intensive distributed applications”
  7. Main Contributors
  8. HDFS: Hadoop Distributed File System
       Hardware Failure
       Streaming Data Access
       Large Data Sets
       Simple Coherency Model (write-once)
       Portability
  9. Moving Computation is cheaper than moving Data
  10. MapReduce
  11. Map(k1, v1) → list(k2, v2)
        void map(string key, string value):
          for each word w in value:
            yield return KeyValuePair(w, 1);
      Reduce(k2, list(v2)) → list(v3)
        void reduce(string key, int[] values):
          int sum = 0;
          for each pc in values:
            sum += pc;
          return KeyValuePair(key, sum);
  12. Demo
  13. Ecosystem: ZooKeeper
  14. 45K nodes, 180-200 PB; 3K+ nodes, 36+ PB
  15. powered by
  16. Future
      Core:
        • HDFS: high availability and scalability
        • MapReduce: modularity and alternative ways to perform queries
      Ecosystem development:
        • Apache BigTop: consolidation project
        • HBase, Hive, Pig, ZooKeeper, Avro, Sqoop: stabilizing, interoperability
        • Incubator: Flume, Oozie, Whirr
  17. Demo
  18. Q&A
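
The word-count pseudocode on slide 11 can be tried without a cluster. The following is a minimal single-process sketch in Python, not Hadoop's actual API: the names `map_words`, `reduce_counts`, and `run_job` are hypothetical, and the in-memory grouping step only stands in for the shuffle phase that a real MapReduce framework performs between map and reduce.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_words(key: str, value: str) -> Iterator[tuple[str, int]]:
    """Map(k1, v1) -> list(k2, v2): emit (word, 1) for every word in the line."""
    for word in value.split():
        yield word, 1


def reduce_counts(key: str, values: Iterable[int]) -> tuple[str, int]:
    """Reduce(k2, list(v2)) -> one (word, total) pair: sum the partial counts."""
    return key, sum(values)


def run_job(records: Iterable[tuple[str, str]]) -> dict[str, int]:
    """Hypothetical driver: map every record, group by key, then reduce."""
    grouped = defaultdict(list)  # stands in for the shuffle/sort phase
    for key, value in records:
        for word, count in map_words(key, value):
            grouped[word].append(count)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    lines = [("line-1", "big data big answer"), ("line-2", "big question")]
    print(run_job(lines))  # {'big': 3, 'data': 1, 'answer': 1, 'question': 1}
```

Because each call to `map_words` depends only on its own record, and each call to `reduce_counts` only on one key's values, a framework is free to run them on whichever nodes already hold the data, which is the point of slide 9.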
