Hadoop: Beyond MapReduce

An overview of what lies above and beyond MapReduce, for the HPC/science community. Key point: move up the stack and reuse what is there. But some of these people are capable of writing their own YARN apps, so they should be encouraged to do so if they see a need.


    Presentation Transcript

    • Hadoop: Beyond MapReduce. Steve Loughran, Hortonworks (stevel@hortonworks.com, @steveloughran). Big Data workshop, June 2013. © Hortonworks Inc. 2013
    • Hadoop MapReduce
        1. Map: events → <k,v>* pairs
        2. Reduce: <k,[v1, v2, ..., vn]> → <k,v'>
        • Map is trivially parallelisable on blocks in a file
        • Reduce parallelises on keys
        • The MapReduce engine can execute Map and Reduce sequences against data
        • HDFS provides data location for work placement
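The two-step model on this slide can be sketched on a single machine. This is a minimal illustration, not Hadoop's API: `map_reduce`, `word_mapper` and `count_reducer` are hypothetical names, and the in-memory dictionary stands in for the shuffle that groups values by key.

```python
from collections import defaultdict


def map_reduce(events, mapper, reducer):
    """Run a mapper over every event, group the pairs by key, then reduce each key."""
    groups = defaultdict(list)
    # Map phase: each event yields zero or more (key, value) pairs.
    # Events are independent, which is why this phase parallelises trivially.
    for event in events:
        for key, value in mapper(event):
            groups[key].append(value)  # stand-in for the shuffle: group by key
    # Reduce phase: each key is reduced independently of the others.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    """Hypothetical mapper: emit <word, 1> for every word in a line."""
    for word in line.split():
        yield word.lower(), 1


def count_reducer(key, values):
    """Hypothetical reducer: <word, [1, 1, ...]> -> count."""
    return sum(values)


counts = map_reduce(["the cat", "The dog", "cat"], word_mapper, count_reducer)
print(counts)
```

In a real cluster the map calls would run next to the file blocks and the reduce calls would be spread across machines by key; the sketch only shows the data flow.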
    • MapReduce democratised big data
        • Conceptual model is easy to grasp
        • Can write and test locally; superlinear scaleup
        • Tools and stack
        • You don't need to understand parallel coding to run apps across 1000 machines
    • The stack is key to use [stack diagram; the only surviving label is Kafka]
    • Example: Pig
          generated = LOAD '$src/$srcfile' USING PigStorage(',', '-noschema')
              AS (line: int, gaussian: double, b: boolean, c: chararray);
          sorted = ORDER generated BY c ASC;
          result = FILTER sorted BY gaussian >= 0;
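For readers who do not know Pig Latin, the three statements above can be mirrored one for one in plain Python. This is only a single-machine sketch of what the script computes, not of how Pig executes it; the `SAMPLE` data and the `load` helper are made up for illustration.

```python
import csv
import io

# Made-up sample input with the same four columns as the Pig schema:
# line:int, gaussian:double, b:boolean, c:chararray
SAMPLE = "1,0.5,true,pear\n2,-1.2,false,apple\n3,0.0,true,fig\n"


def load(text):
    """Parse comma-separated rows into dicts (the LOAD ... AS (...) step)."""
    rows = []
    for line, gaussian, b, c in csv.reader(io.StringIO(text)):
        rows.append(
            {"line": int(line), "gaussian": float(gaussian), "b": b == "true", "c": c}
        )
    return rows


generated = load(SAMPLE)
sorted_rows = sorted(generated, key=lambda r: r["c"])    # ORDER generated BY c ASC
result = [r for r in sorted_rows if r["gaussian"] >= 0]  # FILTER sorted BY gaussian >= 0
print([r["c"] for r in result])
```

The point of the slide stands: the Pig version is about as short as this, yet runs across a cluster without the author writing any parallel code.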
    • Example: Apache Giraph
        • Graph nodes are held in RAM
        • Nodes exchange data with peers at barriers
        • Use cases: PageRank, Friend-of-Friend
        • But also: modelling cells in a heart
        • Bulk Synchronous Parallel: read the Pregel paper
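The Bulk Synchronous Parallel pattern named on this slide can be sketched with a toy PageRank. This is not Giraph's API: `pagerank_bsp` is a hypothetical function, the graph is a small in-memory dictionary, and the sketch assumes every vertex has at least one outgoing edge.

```python
def pagerank_bsp(graph, supersteps=20, damping=0.85):
    """Toy BSP PageRank. graph maps each vertex to its list of out-neighbours.

    Assumes every vertex has at least one out-edge and that every
    neighbour is itself a key of graph.
    """
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    for _ in range(supersteps):
        # Compute phase: every vertex sends an equal share of its rank
        # to each out-neighbour. Messages are buffered, not applied yet.
        inbox = {v: [] for v in graph}
        for v, neighbours in graph.items():
            share = rank[v] / len(neighbours)
            for u in neighbours:
                inbox[u].append(share)
        # Barrier: only after all messages are sent do the vertices
        # update their state, all from the same superstep's messages.
        rank = {v: (1 - damping) / n + damping * sum(inbox[v]) for v in graph}
    return rank


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank_bsp(graph)
print(ranks)
```

Each loop iteration is one superstep; in a distributed runtime the vertices would be partitioned across workers and the barrier would be a cluster-wide synchronisation point.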
    • But there is a lot more we can do
    • New algorithms and runtimes
        • Giraph for graph work
        • Stream processing: Storm
        • Iterative and chained processing: Dryad-style
        • Long-lived processes
    • Production-side issues
        • Scale to 10K nodes
        • Eliminate SPOFs and bottlenecks
        • Improve versioning by moving the MR engine user-side
        • Avoid having dedicated servers for other roles
    • YARN: Yet Another Resource Negotiator
        • [Architecture diagram. Components: Client, Resource Manager, Node Managers, Container, App Master. Flows: Job Submission, MapReduce Status, Node Status, Resource Request]
        • The App Master (AM) manages the app
        • The AM can request containers and run code in them
    • YARN vs other resource negotiators
        • MapReduce was the #1 initial use case
        • Failures: the AM handles worker failures, YARN handles AM failures
        • Scheduling locality: sources of data, destinations. The AM provides location requests along with resource requirements (CPU, RAM)
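The idea of a container request that carries both a location preference and a resource requirement can be illustrated with a toy allocator. This is a simplified model and not the YARN API: `Node`, `ContainerRequest` and `allocate` are hypothetical names, and real schedulers apply queues, fairness and delay policies that are omitted here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical worker node with its remaining capacity."""
    name: str
    free_cpu: int
    free_ram_mb: int


@dataclass
class ContainerRequest:
    """Hypothetical request: resources plus an optional locality preference."""
    cpu: int
    ram_mb: int
    preferred_nodes: list = field(default_factory=list)


def allocate(nodes, request):
    """Grant a container on a preferred node if one fits, else on any node.

    Returns the chosen node's name, or None if no node has capacity.
    """
    by_name = {n.name: n for n in nodes}
    preferred = [by_name[p] for p in request.preferred_nodes if p in by_name]
    others = [n for n in nodes if n.name not in request.preferred_nodes]
    for node in preferred + others:
        if node.free_cpu >= request.cpu and node.free_ram_mb >= request.ram_mb:
            node.free_cpu -= request.cpu
            node.free_ram_mb -= request.ram_mb
            return node.name
    return None


nodes = [Node("node1", 4, 8192), Node("node2", 2, 4096)]
first = allocate(nodes, ContainerRequest(2, 4096, ["node2"]))   # data is on node2
second = allocate(nodes, ContainerRequest(2, 4096, ["node2"]))  # node2 is now full
third = allocate(nodes, ContainerRequest(8, 1024))              # too large for any node
print(first, second, third)
```

The second request shows the trade-off behind locality scheduling: when the preferred node is full, the work can still run elsewhere at the cost of reading its data over the network.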
    • Pig/Hive-MR versus Pig/Hive-Tez
        • Pig/Hive on MR: I/O synchronization barrier
        • Pig/Hive on Tez: I/O pipelining
        • Example query: SELECT a.state, COUNT(*) FROM a JOIN b ON (a.id = b.id) GROUP BY a.state
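The example query has two logical stages, a join and a grouped count. The sketch below runs them in memory on made-up rows, with the join written as a generator so that joined rows flow straight into the aggregation. That is only a loose analogy for pipelining: it says nothing about how Tez or MapReduce are implemented, and `join_stage` and `count_by_state` are hypothetical names.

```python
from collections import Counter


def join_stage(a_rows, b_rows):
    """Inner join on id: yield one copy of an a-row per matching b-row."""
    b_ids = Counter(row["id"] for row in b_rows)
    for row in a_rows:
        for _ in range(b_ids[row["id"]]):
            yield row  # handed to the next stage without being stored first


def count_by_state(joined):
    """GROUP BY a.state with COUNT(*)."""
    counts = Counter()
    for row in joined:
        counts[row["state"]] += 1
    return dict(counts)


# Made-up sample tables.
a = [{"id": 1, "state": "CA"}, {"id": 2, "state": "NY"}, {"id": 3, "state": "CA"}]
b = [{"id": 1}, {"id": 1}, {"id": 3}, {"id": 4}]

result = count_by_state(join_stage(a, b))
print(result)
```

If each stage instead wrote its full output to storage before the next one started, the result would be the same but every row would pay for an extra write and read, which is the cost the slide labels an I/O synchronization barrier.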
    • FastQuery: Beyond Batch with YARN
        • Tez generalizes Map-Reduce: simplified execution plans process data more efficiently
        • Always-on Tez service: low-latency processing for all Hadoop data processing
    • You too can write a distributed execution framework, if you need to
    • Start with the work in progress
        • Hamster: MPI
        • Storm-YARN from Yahoo!
        • Hoya: HBase on YARN (me)
      And start with other people's code
        • Continuuity Weave: looks like the best place to start
    • What are the services and algorithms we are going to need?
    • P.S.: we are hiring. http://hortonworks.com/careers/