Transcript of "2012 apache hadoop_map_reduce_windows_azure"

  1. Apache Hadoop, MapReduce & Windows Azure: Guðmundur Jón Halldórsson, Five Degrees, July 2012
  2. Web crawler! "No, this isn't about that."
  3. What is Hadoop? A system for processing mind-bogglingly large amounts of data.
  4. Hadoop: Map-Reduce = computation, HDFS = storage
  5. HDFS: Hadoop Distributed File System. Yes, it is a file system written in Java, and you can do normal file system operations like [ls, mkdir, ...]. It works best with large files. HDFS splits each file into blocks of 128 MB (configurable).
  6. HDFS: HDFS keeps 3 copies of each block, and the NameNode (NN) tracks which DataNodes hold each block. [Diagram: a NameNode mapping each block to three DataNodes, e.g. DN1, DN4, DN7; DN3, DN5, DN8; DN3, DN4, DN5.]
  7. Map-Reduce
     • Write a mapper that takes a key and a value and emits zero or more new keys and values.
     • Write a reducer that takes all the values of one key and emits zero or more new keys and values.
  8. Map-Reduce JS example:

        // Mapper: split each line into words and emit (word, 1)
        var map = function ( key, value, context ) {
            var words = value.split(/[^a-zA-Z]/);
            for ( var i = 0; i < words.length; i++ ) {
                if ( words[i] !== "" ) { // skip empty strings between separators
                    context.write( words[i].toLowerCase(), 1 );
                }
            }
        };

        // Reducer: sum the counts emitted for one word
        var reduce = function ( key, values, context ) {
            var sum = 0;
            while ( values.hasNext() ) {
                sum += parseInt( values.next(), 10 );
            }
            context.write( key, sum );
        };
  9. MapReduce
  10. Data Systems and Their Timeframes
  11. Does Hadoop solve all my data problems, or is there something else out there?
  12. • Pig: high-level MapReduce language
      • Hive: SQL-like high-level MapReduce language
      • HBase: real-time, random read/write access to data (based on Google's Bigtable)
      • Accumulo: NSA-developed, Bigtable-style data store similar to HBase
      • Avro: data serialization
      • ZooKeeper: low-level coordination
      • HCatalog: storage management and interoperability across Pig, Hive and MapReduce
      • Oozie: job scheduler
      • Flume: log and data aggregation
      • Whirr: automated cloud cluster setup on EC2, Rackspace, etc.
      • Sqoop: relational data importer
      • MRUnit: unit testing for MapReduce jobs
      • Mahout: machine learning libraries
      • Bigtop: packaging and interoperability testing of the Hadoop stack
      • Crunch: MapReduce pipelines in Java and Scala
      • Giraph: graph processing on huge distributed graphs
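Slide 5 says HDFS splits files into 128 MB blocks. A minimal sketch of that arithmetic, assuming a fixed 128 MB block size: the hypothetical `splitIntoBlocks` helper computes each block's offset and length. It is an illustration only, not HDFS code.

```javascript
// Hypothetical helper: compute the block layout for a file of a given size.
// 128 MB matches slide 5; on a real cluster the size comes from configuration.
const BLOCK_SIZE = 128 * 1024 * 1024; // bytes

function splitIntoBlocks(fileSize, blockSize = BLOCK_SIZE) {
  const blocks = [];
  for (let offset = 0; offset < fileSize; offset += blockSize) {
    // The last block holds only the remaining bytes; it is not padded.
    blocks.push({ offset, length: Math.min(blockSize, fileSize - offset) });
  }
  return blocks;
}

// A 300 MB file becomes two full 128 MB blocks plus one 44 MB block.
const layout = splitIntoBlocks(300 * 1024 * 1024);
console.log(layout.length); // 3
console.log(layout[2].length / (1024 * 1024)); // 44
```

Large files amortize the per-block bookkeeping, which is one reason the slide says HDFS works best with them.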
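Slide 6 describes three copies of each block, with the NameNode tracking which DataNodes hold them. The sketch below models that bookkeeping under loud simplifications: `ToyNameNode`, the `blk_1` style names and the round-robin placement are all hypothetical, and real HDFS uses a rack-aware placement policy instead. It only shows the idea of a block map where every block lives on three distinct DataNodes.

```javascript
// Hypothetical, simplified NameNode bookkeeping: which DataNodes hold each block.
// Real HDFS uses rack-aware placement; this toy rotates through the DataNodes
// so the three replicas of a block always land on distinct nodes.
const REPLICATION = 3;

class ToyNameNode {
  constructor(dataNodes) {
    if (dataNodes.length < REPLICATION) {
      throw new Error("need at least " + REPLICATION + " DataNodes");
    }
    this.dataNodes = dataNodes;
    this.blockMap = new Map(); // blockId -> array of DataNode names
    this.next = 0;             // rotating start position for placement
  }

  // Choose REPLICATION consecutive DataNodes (wrapping around) for a new block.
  addBlock(blockId) {
    const replicas = [];
    for (let i = 0; i < REPLICATION; i++) {
      replicas.push(this.dataNodes[(this.next + i) % this.dataNodes.length]);
    }
    this.next = (this.next + 1) % this.dataNodes.length;
    this.blockMap.set(blockId, replicas);
    return replicas;
  }

  // List the blocks stored on one DataNode, e.g. to see which blocks
  // would lose a replica if that node failed.
  blocksOn(dataNode) {
    const affected = [];
    for (const [blockId, replicas] of this.blockMap) {
      if (replicas.includes(dataNode)) affected.push(blockId);
    }
    return affected;
  }
}

const nn = new ToyNameNode(["DN1", "DN2", "DN3", "DN4", "DN5"]);
console.log(nn.addBlock("blk_1")); // [ 'DN1', 'DN2', 'DN3' ]
console.log(nn.addBlock("blk_2")); // [ 'DN2', 'DN3', 'DN4' ]
console.log(nn.blocksOn("DN1"));   // [ 'blk_1' ]
```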
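Slide 7 states the mapper/reducer contract. A single-process sketch of it, assuming plain callbacks: `runMapReduce`, the `emit` callback and the sample lines are hypothetical, not Hadoop's API. It makes explicit the step between the two phases, where intermediate values are grouped by key (the shuffle) so each reducer call sees all the values for one key.

```javascript
// Toy, in-memory model of the Map-Reduce contract on slide 7.
// mapper(key, value, emit) and reducer(key, values, emit) are plain callbacks.
function runMapReduce(records, mapper, reducer) {
  // Map phase: each input [key, value] pair may emit zero or more pairs,
  // which are grouped by key as they arrive (the shuffle).
  const groups = new Map();
  for (const [key, value] of records) {
    mapper(key, value, (k, v) => {
      if (!groups.has(k)) groups.set(k, []);
      groups.get(k).push(v);
    });
  }

  // Hadoop hands keys to each reducer in sorted order; mimic that here.
  const keys = [...groups.keys()].sort();

  // Reduce phase: one call per key with all of that key's values.
  const output = [];
  for (const k of keys) {
    reducer(k, groups.get(k), (outKey, outValue) => output.push([outKey, outValue]));
  }
  return output;
}

// Example: count words by their first letter. An empty line emits nothing.
const firstLetterMapper = (key, line, emit) => {
  for (const word of line.split(/\s+/)) {
    if (word !== "") emit(word[0], word);
  }
};
const countReducer = (letter, words, emit) => emit(letter, words.length);

const result = runMapReduce(
  [[0, "hadoop hdfs map"], [1, "mapper reducer"], [2, ""]],
  firstLetterMapper,
  countReducer
);
console.log(JSON.stringify(result)); // [["h",2],["m",2],["r",1]]
```

On a real cluster the map calls, the shuffle and the reduce calls run on different machines; the sketch keeps only the data flow.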
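The word count on slide 8 expects a runtime that supplies `context.write(key, value)` and a `values` iterator with `hasNext()` and `next()`. To try those functions outside a cluster, here is a hypothetical local harness, `runLocally`, that provides objects of those shapes. It is an assumption for illustration, not the Hadoop on Azure JavaScript runtime, and line indices stand in for the input keys.

```javascript
// The word-count functions from slide 8 (with plain ASCII quotes).
var map = function (key, value, context) {
  var words = value.split(/[^a-zA-Z]/);
  for (var i = 0; i < words.length; i++) {
    if (words[i] !== "") {
      context.write(words[i].toLowerCase(), 1);
    }
  }
};

var reduce = function (key, values, context) {
  var sum = 0;
  while (values.hasNext()) {
    sum += parseInt(values.next(), 10);
  }
  context.write(key, sum);
};

// Hypothetical local harness: supplies a context with write(key, value)
// and a values iterator with hasNext()/next(), the shapes the slide's code uses.
function runLocally(lines) {
  var grouped = new Map();
  var mapContext = {
    write: function (k, v) {
      if (!grouped.has(k)) grouped.set(k, []);
      grouped.get(k).push(v);
    },
  };
  // The line index stands in for the input key.
  lines.forEach(function (line, index) { map(index, line, mapContext); });

  var results = {};
  var reduceContext = { write: function (k, v) { results[k] = v; } };
  // Reduce each word in sorted order with an iterator over its counts.
  Array.from(grouped.keys()).sort().forEach(function (k) {
    var vals = grouped.get(k);
    var pos = 0;
    var iterator = {
      hasNext: function () { return pos < vals.length; },
      next: function () { return vals[pos++]; },
    };
    reduce(k, iterator, reduceContext);
  });
  return results;
}

var counts = runLocally(["Hello Hadoop", "hello, Azure!"]);
console.log(counts); // { azure: 1, hadoop: 1, hello: 2 }
```

Because the mapper lowercases each word, "Hello" and "hello" are counted together.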