2012 apache hadoop_map_reduce_windows_azure
Published in: Technology, Education


Transcript

  • 1. Apache Hadoop, MapReduce & Windows Azure. Guðmundur Jón Halldórsson, Five Degrees, July 2012
  • 2. Web crawler! "No, this isn't about that"
  • 3. What is Hadoop? A system for processing mind-bogglingly large amounts of data
  • 4. Hadoop: Map-Reduce = Computation, HDFS = Storage
  • 5. HDFS: Hadoop Distributed File System. Yes, it is a file system written in Java, and you can do normal file system operations like [ls, mkdir, ...]. Works best with large files. HDFS splits a file into blocks of 128 MB (can be configured)
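
The block arithmetic from the slide above can be sketched in a few lines. This is an illustration only, not HDFS code: `splitIntoBlocks` is a hypothetical helper, and the 128 MB default is taken from the slide. It shows that only the last block of a file is smaller than the block size.

```javascript
// Hypothetical helper: how a file of a given size divides into fixed-size blocks.
const MB = 1024 * 1024;

function splitIntoBlocks(fileSizeBytes, blockSizeBytes = 128 * MB) {
  const blocks = [];
  for (let offset = 0; offset < fileSizeBytes; offset += blockSizeBytes) {
    blocks.push({
      index: blocks.length,
      offset,
      // Every block is full-sized except possibly the last one.
      length: Math.min(blockSizeBytes, fileSizeBytes - offset),
    });
  }
  return blocks;
}

// A 300 MB file needs three blocks: 128 MB, 128 MB and a 44 MB remainder.
const example = splitIntoBlocks(300 * MB);
console.log(example.map((b) => b.length / MB)); // [ 128, 128, 44 ]
```

A smaller block size means more blocks per file, which is one reason many small files are a poor fit: each block is one more entry for the NameNode to track.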
  • 6. HDFS: HDFS will keep 3 copies of each block. The NameNode tracks blocks and DataNodes. [Diagram: a NameNode (NN) surrounded by DataNodes DN1 to DN9; the NameNode lists three DataNodes for each block, e.g. (DN1, DN4, DN7), (DN3, DN5, DN8), (DN3, DN4, DN5)]
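
The bookkeeping described on this slide can be modelled with a toy in-memory structure. Everything here is an assumption made for illustration: `ToyNameNode` is a hypothetical class, and the round-robin placement is a stand-in, since real HDFS chooses replica locations with its own rack-aware policy. The sketch only shows the idea of a block-to-DataNode map with three replicas per block.

```javascript
// Toy model of a NameNode's block map (not the HDFS implementation).
class ToyNameNode {
  constructor(dataNodes, replication = 3) {
    if (dataNodes.length < replication) {
      throw new Error("not enough DataNodes for the requested replication");
    }
    this.dataNodes = dataNodes;
    this.replication = replication;
    this.blockMap = new Map(); // blockId -> array of DataNode names
    this.next = 0;             // round-robin cursor (assumed placement policy)
  }

  // Record a new block and pick `replication` distinct DataNodes for it.
  addBlock(blockId) {
    const replicas = [];
    for (let i = 0; i < this.replication; i++) {
      replicas.push(this.dataNodes[(this.next + i) % this.dataNodes.length]);
    }
    this.next = (this.next + 1) % this.dataNodes.length;
    this.blockMap.set(blockId, replicas);
    return replicas;
  }

  // Which DataNodes hold a copy of this block?
  locate(blockId) {
    return this.blockMap.get(blockId) || [];
  }

  // Which copies are still reachable if some DataNodes are down?
  survivingReplicas(blockId, failedNodes) {
    return this.locate(blockId).filter((dn) => !failedNodes.includes(dn));
  }
}

const names = Array.from({ length: 9 }, (_, i) => "DN" + (i + 1));
const nn = new ToyNameNode(names);
nn.addBlock("blk_1");
nn.addBlock("blk_2");
console.log(nn.locate("blk_1"));
console.log(nn.survivingReplicas("blk_1", ["DN1", "DN2"]));
```

With three copies, a block stays readable as long as at least one of its three DataNodes is up, which is what the last call demonstrates.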
  • 7. Map-Reduce
      • Write a mapper that takes a key and value, and emits zero or more new keys and values
      • Write a reducer that takes all the values of one key, and emits zero or more new keys and values
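
What the two bullets leave implicit is the step in between: every value emitted by the mappers is grouped by key before the reducer runs. The following is a minimal single-process sketch of that flow, not the Hadoop API. `runMapReduce`, `maxMap` and `maxReduce` are hypothetical names, and the `context.write` / `values.hasNext()` interface is assumed here purely for illustration.

```javascript
// Toy single-process MapReduce runner (illustrative only).
function runMapReduce(records, map, reduce) {
  // Map phase: collect every (key, value) pair the mapper emits.
  const intermediate = [];
  const mapContext = { write: (k, v) => intermediate.push([k, v]) };
  for (const [key, value] of records) {
    map(key, value, mapContext);
  }

  // Shuffle phase: group all emitted values by key.
  const groups = new Map();
  for (const [k, v] of intermediate) {
    if (!groups.has(k)) groups.set(k, []);
    groups.get(k).push(v);
  }

  // Reduce phase: call the reducer once per key, in sorted key order.
  const output = [];
  const reduceContext = { write: (k, v) => output.push([k, v]) };
  for (const k of [...groups.keys()].sort()) {
    const vals = groups.get(k);
    let i = 0;
    const values = { hasNext: () => i < vals.length, next: () => vals[i++] };
    reduce(k, values, reduceContext);
  }
  return output;
}

// Example job: highest reading per sensor.
// Input records are (lineNumber, "sensor,reading") pairs.
const maxMap = (key, value, context) => {
  const [sensor, reading] = value.split(",");
  // A malformed line emits nothing ("zero or more" outputs).
  if (sensor && reading !== undefined) context.write(sensor, Number(reading));
};

const maxReduce = (key, values, context) => {
  let max = -Infinity;
  while (values.hasNext()) max = Math.max(max, values.next());
  context.write(key, max);
};

const result = runMapReduce(
  [[0, "a,3"], [1, "b,7"], [2, "a,9"], [3, "b,2"], [4, "garbage"]],
  maxMap,
  maxReduce
);
console.log(result); // [ [ 'a', 9 ], [ 'b', 7 ] ]
```

On a real cluster the map calls run in parallel on the nodes that hold the input blocks and the grouping happens over the network, but the contract seen by the mapper and reducer is the same as in this sketch.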
  • 8. Map-Reduce JS example (word count)
      // Mapper: split each input line into words and emit (word, 1).
      var map = function (key, value, context) {
          var words = value.split(/[^a-zA-Z]/);
          for (var i = 0; i < words.length; i++) {
              // Skip the empty strings produced by consecutive separators.
              if (words[i] !== "") {
                  context.write(words[i].toLowerCase(), 1);
              }
          }
      };

      // Reducer: sum the counts emitted for one word.
      var reduce = function (key, values, context) {
          var sum = 0;
          while (values.hasNext()) {
              sum += parseInt(values.next(), 10);
          }
          context.write(key, sum);
      };
  • 9. MapReduce
  • 10. Data Systems and Their Timeframes
  • 11. Does Hadoop solve all my DATA problems, or is there something else out there?
  • 12.
      • Pig: High-level MapReduce language
      • Hive: SQL-like high-level MapReduce language
      • HBase: Real-time processing (based on Google BigTable)
      • Accumulo: BigTable-style store developed by the NSA, similar to HBase
      • Avro: Data serialization
      • ZooKeeper: Low-level coordination
      • HCatalog: Storage management and interoperability between all systems
      • Oozie: Job scheduler
      • Flume: Log and data aggregation
      • Whirr: Automated cloud clusters on EC2, Rackspace, etc.
      • Sqoop: Relational data importer
      • MRUnit: Unit testing for jobs
      • Mahout: Machine learning libraries
      • Bigtop: Interoperability
      • Crunch: MapReduce pipelines in Java and Scala
      • Giraph: Processing on huge distributed graphs