2012 apache hadoop_map_reduce_windows_azure
Published in Technology, Education
  • 1. Apache Hadoop, MapReduce & Windows Azure. Guðmundur Jón Halldórsson, Five Degrees, July 2012
  • 2. Web crawler! "No, this isn't about that"
  • 3. What is Hadoop? A system for processing mind-bogglingly large amounts of data
  • 4. Hadoop: Map-Reduce = Computation, HDFS = Storage
  • 5. HDFS: Hadoop Distributed File System. Yes, it is a file system, written in Java, and you can do normal file system operations such as ls, mkdir, ... It works best with large files. HDFS splits a file into blocks of 128 MB (configurable).
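To make the block arithmetic on this slide concrete, here is a minimal sketch of how a file's size translates into a number of blocks. It assumes the 128 MB block size stated on the slide; `splitIntoBlocks` and `MB` are hypothetical names for illustration and are not part of any Hadoop API.

```javascript
// Hypothetical helper (not a Hadoop API): models how a file of a given size
// would be cut into fixed-size blocks, with a shorter final block.
const MB = 1024 * 1024;

function splitIntoBlocks(fileSizeBytes, blockSizeBytes = 128 * MB) {
  const blocks = [];
  for (let offset = 0; offset < fileSizeBytes; offset += blockSizeBytes) {
    blocks.push({
      offset,
      // The last block only holds whatever is left of the file.
      length: Math.min(blockSizeBytes, fileSizeBytes - offset),
    });
  }
  return blocks;
}

const blocks = splitIntoBlocks(300 * MB);
console.log(blocks.length);          // 3
console.log(blocks[2].length / MB);  // 44
```

A 300 MB file therefore occupies two full blocks and one 44 MB block, which is one reason many small files are a poor fit: each one still costs at least one block entry.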
  • 6. HDFS: HDFS keeps 3 copies of each block. The NameNode (NN) tracks blocks and DataNodes (DN). [Diagram: DataNodes around a NameNode, whose table lists the DataNodes holding each block's replicas: DN1, DN4, DN7; DN3, DN5, DN8; DN3, DN4, DN5.]
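The bookkeeping described on this slide can be pictured as a table from block to the DataNodes holding its copies. The sketch below is a toy model under stated assumptions: `placeReplicas` is a hypothetical function, the block IDs are made up, and the round-robin placement is a simplification chosen for illustration (it is not how HDFS actually picks nodes).

```javascript
// Toy model of the NameNode's block table (not HDFS code).
// Each block is assigned `replication` distinct DataNodes, round-robin.
function placeReplicas(blockIds, dataNodes, replication = 3) {
  if (dataNodes.length < replication) {
    throw new Error("not enough DataNodes for the requested replication");
  }
  const blockMap = new Map();
  blockIds.forEach((blockId, i) => {
    const holders = [];
    for (let r = 0; r < replication; r++) {
      holders.push(dataNodes[(i + r) % dataNodes.length]);
    }
    blockMap.set(blockId, holders);
  });
  return blockMap;
}

const blockMap = placeReplicas(
  ["blk_1", "blk_2"],
  ["DN1", "DN2", "DN3", "DN4", "DN5"]
);
console.log(blockMap.get("blk_1")); // [ 'DN1', 'DN2', 'DN3' ]
console.log(blockMap.get("blk_2")); // [ 'DN2', 'DN3', 'DN4' ]
```

With three holders per block, losing any single DataNode still leaves two copies of every block it stored.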
  • 7. Map-Reduce: • Write a mapper that takes a key and a value and emits zero or more new keys and values • Write a reducer that takes all the values of one key and emits zero or more new keys and values
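The two roles described above can be sketched with a small in-memory model of the map, group-by-key, and reduce steps. This is an illustration only, not Hadoop code: `runMapReduce`, the `emit` callback, and the sensor data are all assumptions made for the example.

```javascript
// Toy in-memory model of map -> group by key -> reduce (not Hadoop code).
function runMapReduce(records, mapper, reducer) {
  // Map phase: each input pair may emit zero or more intermediate pairs,
  // which are grouped by key as they arrive.
  const groups = new Map();
  for (const [key, value] of records) {
    mapper(key, value, (k, v) => {
      if (!groups.has(k)) groups.set(k, []);
      groups.get(k).push(v);
    });
  }
  // Reduce phase: each key is handed over together with all of its values.
  const output = [];
  for (const [key, values] of groups) {
    reducer(key, values, (k, v) => output.push([k, v]));
  }
  return output;
}

// Example: highest reading per sensor; a malformed line emits nothing.
const records = [
  [0, "a,3"],
  [1, "b,7"],
  [2, "a,9"],
  [3, "garbage"],
];
const mapper = (_offset, line, emit) => {
  const [sensor, reading] = line.split(",");
  if (reading !== undefined) emit(sensor, Number(reading));
};
const reducer = (sensor, readings, emit) => emit(sensor, Math.max(...readings));

console.log(runMapReduce(records, mapper, reducer)); // [ [ 'a', 9 ], [ 'b', 7 ] ]
```

The "garbage" record shows the "zero or more" part of the contract: a mapper is free to emit nothing for an input it cannot use.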
  • 8. Map-Reduce JS example
        // Mapper: split each line into words and emit (word, 1) for every word.
        var map = function ( key, value, context ) {
            var words = value.split(/[^a-zA-Z]/);
            for ( var i = 0; i < words.length; i++ ) {
                if ( words[i] !== "" ) {
                    context.write( words[i].toLowerCase(), 1 );
                }
            }
        };

        // Reducer: add up all the counts emitted for one word.
        var reduce = function ( key, values, context ) {
            var sum = 0;
            while ( values.hasNext() ) {
                sum += parseInt( values.next() );
            }
            context.write( key, sum );
        };
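To try the word count from this slide without a cluster, a small local harness can stand in for the framework. Everything outside `map` and `reduce` below is hypothetical: `makeIterator` and `runLocally` are made-up helpers that only imitate the `context.write`, `hasNext`, and `next` calls visible in the slide, and `values.next()` as the argument to `parseInt` is an assumption, since the slide text leaves that argument empty.

```javascript
// The slide's functions; values.next() is assumed where the slide shows parseInt( ).
var map = function (key, value, context) {
  var words = value.split(/[^a-zA-Z]/);
  for (var i = 0; i < words.length; i++) {
    if (words[i] !== "") {
      context.write(words[i].toLowerCase(), 1);
    }
  }
};

var reduce = function (key, values, context) {
  var sum = 0;
  while (values.hasNext()) {
    sum += parseInt(values.next());
  }
  context.write(key, sum);
};

// Hypothetical stand-in for the iterator the framework would pass to reduce.
function makeIterator(items) {
  let i = 0;
  return { hasNext: () => i < items.length, next: () => items[i++] };
}

// Hypothetical local runner: map every line, group by word, reduce each group.
function runLocally(lines) {
  const grouped = new Map();
  lines.forEach((line, n) =>
    map(n, line, {
      write: (k, v) => {
        if (!grouped.has(k)) grouped.set(k, []);
        grouped.get(k).push(v);
      },
    })
  );
  const counts = {};
  [...grouped.keys()].sort().forEach((k) =>
    reduce(k, makeIterator(grouped.get(k)), {
      write: (word, sum) => { counts[word] = sum; },
    })
  );
  return counts;
}

console.log(runLocally(["Hello hadoop", "hello Azure!"]));
// { azure: 1, hadoop: 1, hello: 2 }
```

The harness sorts keys before reducing only to make the printed result stable; it says nothing about how a real job orders or distributes its keys.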
  • 9. MapReduce
  • 10. Data Systems and Their Timeframes
  • 11. Does Hadoop solve all my DATA problems, or is there something else out there?
  • 12. Related projects:
      • Pig: High-level MapReduce language
      • Hive: SQL-like high-level MapReduce language
      • HBase: Realtime processing (based on Google BigTable)
      • Accumulo: BigTable-style store developed by the NSA, similar to HBase
      • Avro: Data serialization
      • ZooKeeper: Low-level coordination
      • HCatalog: Storage management and interoperability between all systems
      • Oozie: Job scheduler
      • Flume: Log and data aggregation
      • Whirr: Automated cloud clusters on EC2, Rackspace, etc.
      • Sqoop: Relational data importer
      • MRUnit: Unit testing of jobs
      • Mahout: Machine learning libraries
      • Bigtop: Interoperability
      • Crunch: MapReduce pipelines in Java and Scala
      • Giraph: Math processing on huge distributed graphs