
Hadoop: the Big Answer to the Big Question of the Big Data

More info: http://www.elekslabs.com/2012/02/devtalks-1-presentations.html
Video: http://www.youtube.com/watch?feature=player_embedded&v=GENRle60Elk

Published in: Technology

  1. THE ANSWER TO THE QUESTION OF THE DATA (eleks, by Victor Haydin, DevTalks #1)
  2. Gordon Moore
  3.                                        1975            2012
     Cost of 1 TB storage                  $208 000 000    $110
     Cost of 1 GFLOPS computing facility   $62 000 000     $1.50
     Number of network hosts               57              > 1 000 000 000
     World’s data amount                   ~130 GB         ~2.9 ZB
  4. 1 ZB = 1 000 000 000 000 000 000 000 B (10^21 B)
  5. Commodity Hardware
  6. Wikipedia: “Apache Hadoop is a software framework that supports data-intensive distributed applications”
  7. Main Contributors
  8. HDFS: Hadoop Distributed File System
       • Hardware Failure
       • Streaming Data Access
       • Large Data Sets
       • Simple Coherency Model (write-once)
       • Portability
  9. Moving Computation is cheaper than moving Data
  10. MapReduce
  11. Map(k1, v1) → list(k2, v2)
        void map(string key, string value):
          for each word w in value:
            yield return KeyValuePair(w, 1);
      Reduce(k2, list(v2)) → list(v3)
        void reduce(string key, int[] values):
          int sum = 0;
          for each pc in values:
            sum += pc;
          return KeyValuePair(key, sum);
  12. Demo
  13. Ecosystem: ZooKeeper
  14. 45K nodes, 180-200 PB; 3K+ nodes, 36+ PB
  15. powered by
  16. Future
      Core:
        • HDFS: high availability and scalability
        • MapReduce: modularity and alternative ways to perform queries
      Ecosystem development:
        • Apache Bigtop: consolidation project
        • HBase, Hive, Pig, ZooKeeper, Avro, Sqoop: stabilization, interoperability
        • Incubator: Flume, Oozie, Whirr
  17. Demo
  18. Q&A
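
The word-count pseudocode on the MapReduce slide can be tried out without a cluster. Below is a minimal single-process sketch in Python, assuming the whole input fits in memory. The names `map_words`, `reduce_counts` and `run_word_count` are hypothetical and are not part of any Hadoop API. The grouping step in the middle stands in for the shuffle that a real framework performs between the map and reduce phases.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(key: str, value: str) -> Iterator[Tuple[str, int]]:
    """Map(k1, v1) -> list(k2, v2): emit (word, 1) for each word in the text."""
    for word in value.split():
        yield word, 1


def reduce_counts(key: str, values: Iterable[int]) -> Tuple[str, int]:
    """Reduce(k2, list(v2)) -> list(v3): sum the partial counts for one word."""
    return key, sum(values)


def run_word_count(documents: Dict[str, str]) -> Dict[str, int]:
    """Run map, group-by-key and reduce sequentially in one process."""
    # Map phase plus a stand-in for the shuffle: collect values per key.
    grouped: Dict[str, List[int]] = defaultdict(list)
    for name, text in documents.items():
        for word, count in map_words(name, text):
            grouped[word].append(count)

    # Reduce phase: one reduce call per distinct key.
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    # Illustrative input only; file names and contents are made up.
    docs = {"a.txt": "big data big answer", "b.txt": "big question"}
    print(run_word_count(docs))
```

In this sketch the map and reduce functions never share state. That independence is the property that would let a framework run them on different nodes, close to the data they process.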
