2012 apache hadoop_map_reduce_windows_azure
Uploaded as Microsoft PowerPoint. Usage rights: © All Rights Reserved.

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    Presentation Transcript

    • Apache Hadoop, MapReduce & Windows Azure. Guðmundur Jón Halldórsson, Five Degrees, July 2012
    • Web crawler! "No, this isn't about that."
    • What is Hadoop? A system for processing mind-bogglingly large amounts of data.
    • Hadoop: Map-Reduce = computation, HDFS = storage
    • HDFS: Hadoop Distributed File System. Yes, it is a file system written in Java, and you can do normal file system operations like [ls, mkdir, ...]. It works best with large files. HDFS splits each file into blocks of 128 MB (configurable).
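The block arithmetic can be made concrete with a small sketch. It computes how a file of a given size would be cut into fixed-size blocks, assuming the 128 MB block size from the slide. `splitIntoBlocks` is a hypothetical helper written for illustration. It is not an HDFS API, because HDFS does this splitting internally when a client writes a file.

```javascript
// Block size from the slide: 128 MB (configurable on a real cluster).
const BLOCK_SIZE = 128 * 1024 * 1024;

// Hypothetical helper: describe the blocks a file of `fileSize` bytes would occupy.
// Every block is full-size except possibly the last one.
function splitIntoBlocks(fileSize, blockSize = BLOCK_SIZE) {
  const blocks = [];
  for (let offset = 0; offset < fileSize; offset += blockSize) {
    blocks.push({
      index: blocks.length,
      offset: offset,
      length: Math.min(blockSize, fileSize - offset),
    });
  }
  return blocks;
}

// Example: a 300 MB file becomes two full 128 MB blocks plus one 44 MB block.
const MB = 1024 * 1024;
const blocks = splitIntoBlocks(300 * MB);
for (const b of blocks) {
  console.log(`block ${b.index}: offset ${b.offset / MB} MB, length ${b.length / MB} MB`);
}
```

HDFS does not pad the last block to the full size. However, the NameNode must still track every block and file in memory. This is one reason HDFS favors a few large files over many small ones.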
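    • HDFS: HDFS keeps 3 copies of each block by default (the replication factor is configurable). The NameNode (NN) tracks which DataNodes (DN) hold each block. [Diagram: a NameNode and DataNodes DN1, DN2, ...; the NameNode's table lists, for each block, the three DataNodes holding its copies, e.g. DN1, DN4, DN7; DN3, DN5, DN8; DN3, DN4, DN5.]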
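One way to picture the NameNode's bookkeeping is as a map from block IDs to the DataNodes that hold each replica. The sketch below makes several assumptions. It uses nine DataNodes named DN1 to DN9 (the diagram mentions nodes up to DN9) and a replication factor of 3. It also places replicas with simple round-robin placement, chosen only for illustration. Real HDFS uses a rack-aware placement policy instead. `BlockMap`, `addBlock`, `locate` and `underReplicatedAfterLoss` are hypothetical names, not Hadoop APIs.

```javascript
// Hypothetical, simplified model of the NameNode's block map.
// Round-robin placement is illustrative only; HDFS itself uses a rack-aware policy.
class BlockMap {
  constructor(dataNodes, replication = 3) {
    if (replication > dataNodes.length) {
      throw new Error("Not enough DataNodes for the requested replication factor");
    }
    this.dataNodes = dataNodes;
    this.replication = replication;
    this.next = 0;                // round-robin cursor into dataNodes
    this.locations = new Map();   // blockId -> array of DataNode names
  }

  // Place `replication` copies of a block on distinct DataNodes.
  addBlock(blockId) {
    const replicas = [];
    for (let i = 0; i < this.replication; i++) {
      replicas.push(this.dataNodes[(this.next + i) % this.dataNodes.length]);
    }
    this.next = (this.next + this.replication) % this.dataNodes.length;
    this.locations.set(blockId, replicas);
    return replicas;
  }

  // The DataNodes a client could read this block from.
  locate(blockId) {
    return this.locations.get(blockId) || [];
  }

  // After a DataNode dies, list the blocks left with fewer live copies than required.
  underReplicatedAfterLoss(deadNode) {
    const result = [];
    for (const [blockId, replicas] of this.locations) {
      const live = replicas.filter((dn) => dn !== deadNode);
      if (live.length < this.replication) {
        result.push(blockId);
      }
    }
    return result;
  }
}

const nodes = ["DN1", "DN2", "DN3", "DN4", "DN5", "DN6", "DN7", "DN8", "DN9"];
const blockMap = new BlockMap(nodes);
blockMap.addBlock("blk_1"); // DN1, DN2, DN3
blockMap.addBlock("blk_2"); // DN4, DN5, DN6
blockMap.addBlock("blk_3"); // DN7, DN8, DN9
console.log(blockMap.locate("blk_2"));
console.log(blockMap.underReplicatedAfterLoss("DN5"));
```

In a real cluster, the NameNode would respond to the lost DataNode by scheduling new copies of the under-replicated blocks on other DataNodes. This restores the replication factor.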
    • Map-Reduce
        • Write a mapper that takes a key and a value and emits zero or more new key/value pairs.
        • Write a reducer that takes all the values for one key and emits zero or more new key/value pairs.
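The mapper/reducer contract can be simulated in a few lines of plain JavaScript. This is a single-process sketch, not Hadoop. `runMapReduce` is a hypothetical driver that does three things:

- It calls the mapper on every input record.
- It groups the emitted pairs by key, which is the shuffle step Hadoop performs between map and reduce.
- It calls the reducer once per key.

The example input is made up for illustration: weather readings written as `station,temperature` lines. Its mapper shows the "zero or more" part of the contract by emitting nothing for malformed lines.

```javascript
// Hypothetical single-process driver illustrating the MapReduce contract.
// mapper(key, value, emit) and reducer(key, values, emit) may call emit zero or more times.
function runMapReduce(records, mapper, reducer) {
  // Map + shuffle: run the mapper on each record and group emitted values by key.
  const groups = new Map();
  records.forEach((record, index) => {
    mapper(index, record, (k, v) => {
      if (!groups.has(k)) groups.set(k, []);
      groups.get(k).push(v);
    });
  });

  // Reduce: one reducer call per distinct key, keys visited in sorted order.
  const output = [];
  for (const key of [...groups.keys()].sort()) {
    reducer(key, groups.get(key), (k, v) => output.push([k, v]));
  }
  return output;
}

// Mapper: parse "station,temperature"; emit nothing for malformed lines.
function maxTempMapper(lineNumber, line, emit) {
  const parts = line.split(",");
  if (parts.length !== 2) return;          // wrong shape: zero pairs emitted
  const station = parts[0].trim();
  const temp = parseFloat(parts[1]);
  if (station === "" || Number.isNaN(temp)) return;
  emit(station, temp);                     // otherwise exactly one pair
}

// Reducer: emit the highest temperature seen for each station.
function maxTempReducer(station, temps, emit) {
  emit(station, Math.max(...temps));
}

const input = ["REY,11", "AKU,7", "REY,14", "garbage", "AKU,9"];
console.log(runMapReduce(input, maxTempMapper, maxTempReducer));
```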
    • Map-Reduce JS example
        // Mapper: split the line into words and emit (word, 1) for every non-empty word.
        var map = function (key, value, context) {
            var words = value.split(/[^a-zA-Z]/);
            for (var i = 0; i < words.length; i++) {
                if (words[i] !== "") {
                    context.write(words[i].toLowerCase(), 1);
                }
            }
        };

        // Reducer: sum all the counts emitted for one word.
        var reduce = function (key, values, context) {
            var sum = 0;
            while (values.hasNext()) {
                sum += parseInt(values.next(), 10);
            }
            context.write(key, sum);
        };
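The slide's `map` and `reduce` functions depend on two objects. They expect a `context` object with a `write(key, value)` method. The reducer also expects a `values` iterator with `hasNext()` and `next()`. To try the functions outside a cluster, the sketch below supplies stand-ins for those objects and runs a word count over two lines of text in memory. The helpers `makeContext`, `makeIterator` and `runWordCount` are hypothetical. They are not part of Hadoop or Windows Azure. The two functions are repeated so the block runs on its own.

```javascript
// The mapper and reducer from the slide.
var map = function (key, value, context) {
  var words = value.split(/[^a-zA-Z]/);
  for (var i = 0; i < words.length; i++) {
    if (words[i] !== "") {
      context.write(words[i].toLowerCase(), 1);
    }
  }
};

var reduce = function (key, values, context) {
  var sum = 0;
  while (values.hasNext()) {
    sum += parseInt(values.next(), 10);
  }
  context.write(key, sum);
};

// Hypothetical stand-in for the framework's context: collects written pairs.
function makeContext() {
  const pairs = [];
  return { pairs, write: (k, v) => pairs.push([k, v]) };
}

// Hypothetical stand-in for the reducer's value iterator.
function makeIterator(array) {
  let i = 0;
  return { hasNext: () => i < array.length, next: () => array[i++] };
}

function runWordCount(lines) {
  // Map: one call per input line; the line number serves as the key.
  const mapContext = makeContext();
  lines.forEach((line, n) => map(n, line, mapContext));

  // Shuffle: group the emitted counts by word.
  const groups = new Map();
  for (const [word, count] of mapContext.pairs) {
    if (!groups.has(word)) groups.set(word, []);
    groups.get(word).push(count);
  }

  // Reduce: one call per distinct word, in sorted key order.
  const reduceContext = makeContext();
  for (const word of [...groups.keys()].sort()) {
    reduce(word, makeIterator(groups.get(word)), reduceContext);
  }
  return reduceContext.pairs;
}

console.log(runWordCount(["The cat saw the dog.", "The dog ran!"]));
```

This sketch only imitates the single-process view. On a real cluster, the shuffle moves data across the network, and several reducers run in parallel. Each reducer still sees its own keys in sorted order.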
    • MapReduce
    • Data Systems and Their Timeframes
    • Does Hadoop solve all my DATA problems, or is there something else out there?
    • Related projects:
        • Pig: high-level MapReduce language
        • Hive: SQL-like high-level MapReduce language
        • HBase: real-time read/write access to large tables (based on Google's Bigtable)
        • Accumulo: sorted key/value store developed at the NSA, also based on Google's Bigtable design
        • Avro: data serialization
        • ZooKeeper: low-level coordination service
        • HCatalog: table and storage management, providing interoperability between these systems
        • Oozie: workflow and job scheduler
        • Flume: log and data aggregation
        • Whirr: automated cloud cluster setup on EC2, Rackspace, etc.
        • Sqoop: imports relational data into Hadoop (and exports it back)
        • MRUnit: unit testing for MapReduce jobs
        • Mahout: machine learning libraries
        • Bigtop: interoperability testing and packaging of the Hadoop stack
        • Crunch: MapReduce pipelines in Java and Scala
        • Giraph: iterative processing of huge distributed graphs