
Hadoop 2 - Beyond MapReduce

Talk held at a combined meeting of the Web Performance Karlsruhe (http://www.meetup.com/Karlsruhe-Web-Performance-Group/events/153207062) & Big Data Karlsruhe/Stuttgart (http://www.meetup.com/Big-Data-User-Group-Karlsruhe-Stuttgart/events/162836152) user groups.

Agenda:
- Why Hadoop 2?
- HDFS 2
- YARN
- YARN Apps
- Write your own YARN App
- Tez, Hive & Stinger Initiative

  1. uweseiler: Hadoop 2 – Beyond MapReduce (13.02.2014)
  2. About me: Big Data Nerd, Hadoop Trainer, Photography Enthusiast, NoSQL Fanboy, Travelpirate
  3. About us: a bunch of Big Data Nerds, Agile Ninjas, Continuous Delivery Gurus, Enterprise Java Specialists and Performance Geeks. Join us!
  4. Agenda: Why Hadoop 2? • HDFS 2 • YARN • YARN Apps • Write your own YARN App • Tez, Hive & Stinger Initiative
  5. Agenda (section marker: Why Hadoop 2?)
  6. Hadoop 1: Built for web-scale batch apps. [Diagram: several single-purpose batch apps, each sitting on its own HDFS silo]
  7. MapReduce is good for… • Embarrassingly parallel algorithms • Summing, grouping, filtering, joining • Off-line batch jobs on massive data sets • Analyzing an entire large dataset
  8. MapReduce is OK for… • Iterative jobs (i.e., graph algorithms) – Each iteration must read/write data to disk – The I/O and latency cost of an iteration is high
  9. MapReduce is not good for… • Jobs that need shared state/coordination – Tasks are shared-nothing – Shared state requires a scalable state store • Low-latency jobs • Jobs on small datasets • Finding individual records
  10. Hadoop 1 limitations • Scalability – Maximum cluster size ~4,500 nodes – Maximum concurrent tasks ~40,000 – Coarse synchronization in the JobTracker • Availability – A failure kills all queued and running jobs • Hard partition of resources into map & reduce slots – Low resource utilization • Lacks support for alternate paradigms and services – Iterative applications implemented using MapReduce are 10x slower
  11. A brief history of Hadoop 2 • Originally conceived & architected by the team at Yahoo! – Arun Murthy created the original JIRA in 2008 and is now the Hadoop 2 release manager • The community has been working on Hadoop 2 for over 4 years • Hadoop 2-based architecture running at scale at Yahoo! – Deployed on 35,000+ nodes for 6+ months
  12. Hadoop 2: Next-gen platform. From a single-use system (Hadoop 1: batch apps only – MapReduce handles both cluster resource management and data processing on top of HDFS) to a multi-purpose platform (Hadoop 2: batch, interactive, streaming, … – YARN handles cluster resource management, MapReduce and others handle data processing, on top of HDFS 2 as redundant, reliable storage).
  13. Taking Hadoop beyond batch: store all data in one place, interact with data in multiple ways, applications run natively in Hadoop. [Diagram: Batch (MapReduce), Interactive (Tez), Online (HOYA), Streaming (Storm, …), Graph (Giraph), In-Memory (Spark), Other (Search, …) – all on YARN for cluster resource management and HDFS 2 for redundant, reliable storage]
  14. Agenda (section marker: HDFS 2)
  15. HDFS 2: In a nutshell • Removes the tight coupling of block storage and namespace • High availability • Scalability & isolation • Increased performance • Details: https://issues.apache.org/jira/browse/HDFS-1052
  16. HDFS 2: Federation. NameNodes do not talk to each other; each NameNode manages only a slice of the namespace (example namespaces: logs, finance, insights). Block storage becomes a generic storage service organized into block pools; DataNodes provide the common storage, report to all NameNodes and can store blocks managed by any NameNode.
  17. HDFS 2: Architecture (shared NFS storage). Active and standby NameNode share state through a shared directory on NFS: only the active NameNode writes edits, while the standby simultaneously reads and applies them. DataNodes send block reports to both NameNodes but only follow orders from the active one.
  18. HDFS 2: Quorum-based storage. Same picture, but the shared state lives on a quorum of JournalNodes instead of NFS: only the active NameNode writes edits, the standby reads and applies them, and DataNodes report to both NameNodes while obeying only the active one.
  19. HDFS 2: High Availability. A ZKFailoverController process monitors the health of each NameNode and, while healthy, holds a persistent ZooKeeper session; the controller of the active NameNode additionally holds a special lock znode. The NameNode state is shared via JournalNodes, and DataNodes send block reports to both the active and the standby NameNode.
  20. HDFS 2: Snapshots • Admins can create point-in-time snapshots of HDFS, either of the entire file system or of a specific data set (a sub-tree of the file system) • Restore the state of the entire file system or of a data set to a snapshot (like Apple Time Machine) • Protect against user errors • Snapshot diffs identify changes made to a data set • Keep track of how raw or derived/analytical data changes over time (see the Java sketch below)
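The snapshot operations from slide 20 are also exposed through the HDFS Java API. A minimal sketch – the /data/logs path and the snapshot name are made up for illustration:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class SnapshotExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
    FileSystem fs = FileSystem.get(conf);

    Path dataSet = new Path("/data/logs");      // hypothetical data set directory

    // An admin must first mark the directory as snapshottable
    // (equivalent to: hdfs dfsadmin -allowSnapshot /data/logs)
    ((DistributedFileSystem) fs).allowSnapshot(dataSet);

    // Create a point-in-time snapshot of the sub-tree
    Path snapshot = fs.createSnapshot(dataSet, "before-cleanup");
    System.out.println("Snapshot created at: " + snapshot);

    // Snapshots are read-only and live under /data/logs/.snapshot/<name>;
    // they can later be removed with fs.deleteSnapshot(dataSet, "before-cleanup")
  }
}
```

The same can be done from the shell with hdfs dfsadmin -allowSnapshot and hdfs dfs -createSnapshot.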
  21. HDFS 2: NFS Gateway • Supports NFS v3 (NFS v4 is work in progress) • Supports all HDFS commands: list files, copy and move files, create and delete directories • Ingest for large-scale analytical workloads: load immutable files as a source for analytical processing (no random writes) • Stream files into HDFS, e.g. log ingest by applications writing directly to an HDFS client mount
  22. HDFS 2: Performance • Many improvements: write pipeline (e.g. new primitives hflush, hsync), read-path improvements for fewer memory copies, short-circuit local reads for 2-3x faster random reads, I/O improvements using posix_fadvise(), libhdfs improvements for zero-copy reads • Overall: I/O is 2.5-5x faster
  23. Agenda (section marker: YARN)
  24. YARN: Design Goals • Build a new abstraction layer by splitting up the two major functions of the JobTracker: cluster resource management and application life-cycle management • Allow other processing paradigms • Flexible API for implementing YARN apps • MapReduce becomes just another YARN app • Lots of different YARN apps
  25. YARN: Concepts • Application – a job submitted to the YARN framework (example: a MapReduce job, a Storm topology, …) • Container – the basic unit of allocation; fine-grained resource allocation across multiple resource types (RAM, CPU, disk, network, GPU, etc.), e.g. container_0 = 4 GB / 1 CPU, container_1 = 512 MB / 6 CPUs; replaces the fixed map/reduce slots (see the Java sketch below)
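To make the container concept concrete, here is a minimal sketch of how a resource request is described with the YARN records API (the sizes are arbitrary examples):

```java
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class ContainerSpec {
  public static void main(String[] args) {
    // A container is just a bundle of resources on some node
    Resource small = Resource.newInstance(512, 1);    // 512 MB, 1 virtual core
    Resource large = Resource.newInstance(4096, 4);   // 4 GB, 4 virtual cores

    // An ApplicationMaster asks the ResourceManager for containers like this
    // (nodes/racks null = no locality preference)
    ContainerRequest request =
        new ContainerRequest(large, null, null, Priority.newInstance(0));
    System.out.println("Requesting: " + request.getCapability());
    System.out.println("Small spec: " + small);
  }
}
```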
  26. YARN: Architecture • ResourceManager – global resource scheduler, hierarchical queues • NodeManager – per-machine agent, manages the life-cycle of containers, container resource monitoring • ApplicationMaster – per application, manages application scheduling and task execution (e.g. the MapReduce ApplicationMaster)
  27. YARN: Architectural Overview. Split up the two major functions of the JobTracker: cluster resource management & application life-cycle management. [Diagram: ResourceManager with Scheduler; NodeManagers hosting the ApplicationMasters (AM 1, AM 2) and their containers]
  28. YARN: BigDataOS™ [Diagram: one ResourceManager/Scheduler and many NodeManagers running containers of different frameworks side by side – MapReduce map/reduce tasks, HBase Master and RegionServers (via HOYA), Storm nimbus workers, Tez vertices]
  29. YARN: Multi-tenancy [Diagram: the same mixed workload scheduled against a hierarchical queue tree – root split into DWH, Ad Hoc and User queues with capacities of 60%, 30% and 10%, each further split into Dev/Prod sub-queues with their own capacity shares]
  30. YARN: Capacity Scheduler • Queues – hierarchical queues • Economics as queue capacities – SLAs via preemption • Resource isolation • Administration – queue ACLs, runtime reconfiguration, charge-back [Diagram: example queue hierarchy with capacities]
  31. Agenda (section marker: YARN Apps)
  32. YARN Apps: Overview • MapReduce 2 – Batch • HOYA – HBase on YARN • Storm – Stream Processing • Samza – Stream Processing • Apache S4 – Stream Processing • Spark – Iterative/Interactive • Apache Giraph – Graph Processing • Apache Hama – Bulk Synchronous Parallel • Elastic Search – Scalable Search • Cloudera Llama – Impala on YARN
  33. YARN Apps: Overview (repeated, highlighting MapReduce 2)
  34. MapReduce 2: In a nutshell • MapReduce is now a YARN app – no more map and reduce slots, it's containers now; no more JobTracker, it's a YARN ApplicationMaster library now • Multiple versions of MapReduce – the older mapred APIs work without modification or recompilation, the newer mapreduce APIs may need to be recompiled • Still has one master server component: the Job History Server – it stores the execution of jobs, is used to audit prior execution of jobs, and will also be used by the YARN framework to store charge-backs at that level • Better cluster utilization • Increased scalability & availability (the familiar API is sketched below)
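Since MapReduce is "just" a YARN app, existing MapReduce code keeps working. A classic word-count sketch against the newer org.apache.hadoop.mapreduce API, which runs unchanged on a YARN cluster:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context ctx)
        throws IOException, InterruptedException {
      for (String token : value.toString().split("\\s+")) {
        if (!token.isEmpty()) {
          word.set(token);
          ctx.write(word, ONE);                 // emit (word, 1) for every token
        }
      }
    }
  }

  public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      ctx.write(key, new IntWritable(sum));     // emit (word, total count)
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenMapper.class);
    job.setCombinerClass(SumReducer.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```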
  35. MapReduce 2: Shuffle • Faster shuffle – better embedded server: Netty • Encrypted shuffle – secures the shuffle phase as data moves across the cluster; requires two-way HTTPS and certificates on both sides; causes significant CPU overhead (reserve one core for this work); certificates are stored on each node (provisioned with the cluster) and refreshed every 10 seconds • Pluggable shuffle/sort – the shuffle is the first phase in MapReduce that is guaranteed not to be data-local; a pluggable shuffle/sort allows application or hardware developers to intercept the network-heavy workload and optimize it; typical implementations combine hardware components like fast networks with software components like sorting algorithms; the API will change with future versions of Hadoop
  36. MapReduce 2: Performance • Key optimizations: no hard segmentation of resources into map and reduce slots, the YARN scheduler is more efficient, and the MR2 framework itself has become more efficient than MR1 (the shuffle phase is more performant thanks to Netty) • At scale: 40,000+ nodes running YARN across over 365 PB of data, about 400,000 jobs per day for about 10 million hours of compute time, an estimated 60%-150% improvement in node usage per day, and a whole 10,000-node datacenter retired because of the increased utilization
  37. YARN Apps: Overview (repeated, highlighting HOYA)
  38. HOYA: In a nutshell • Create on-demand HBase clusters – a small HBase cluster in a large YARN cluster, dynamic HBase clusters, self-healing HBase clusters, elastic HBase clusters, transient/intermittent clusters for workflows • Configure custom configurations & versions • Better isolation • More efficient utilization/sharing of the cluster
  39. HOYA: Creation of the AppMaster [Diagram: the HOYA client uses YarnClient plus a HOYA-specific API to ask the ResourceManager/Scheduler for a container in which the HOYA ApplicationMaster is started on a NodeManager]
  40. HOYA: Deployment of HBase [Diagram: the HOYA ApplicationMaster requests further containers and launches the HBase Master and RegionServers in them]
  41. HOYA: Bind via ZooKeeper [Diagram: an HBase client discovers the dynamically placed HBase Master and RegionServers through ZooKeeper]
  42. YARN Apps: Overview (repeated, highlighting Storm)
  43. Storm: In a nutshell • Stream processing • Real-time processing • Developed as a standalone application – https://github.com/nathanmarz/storm • Ported to YARN – https://github.com/yahoo/storm-yarn
  44. Storm: Conceptual view • Spout – source of streams • Bolt – consumer of streams, processes tuples, possibly emits new tuples • Stream – unbound sequence of tuples • Tuple – list of name-value pairs • Topology – network of spouts & bolts as the nodes and streams as the edges (see the Java sketch below)
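A minimal sketch of these concepts in code, assuming the pre-Apache backtype.storm API of that era; the word spout and print bolt are made up for illustration:

```java
import java.util.Map;
import java.util.Random;
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class WordTopology {

  // Spout: a source of a stream, emits a random word forever
  public static class WordSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    private final String[] words = {"hadoop", "yarn", "storm", "tez"};
    private final Random random = new Random();

    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
      this.collector = collector;
    }

    public void nextTuple() {
      collector.emit(new Values(words[random.nextInt(words.length)]));
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
      declarer.declare(new Fields("word"));   // a tuple is a list of named values
    }
  }

  // Bolt: consumes the stream and processes each tuple
  public static class PrintBolt extends BaseBasicBolt {
    public void execute(Tuple tuple, BasicOutputCollector collector) {
      System.out.println("got: " + tuple.getStringByField("word"));
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
      // terminal bolt, emits nothing
    }
  }

  public static void main(String[] args) throws Exception {
    // Topology: a graph of spouts and bolts connected by stream groupings
    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("words", new WordSpout(), 1);
    builder.setBolt("printer", new PrintBolt(), 2).shuffleGrouping("words");

    // Run locally; on a cluster (or via storm-yarn) you would use StormSubmitter instead
    new LocalCluster().submitTopology("demo", new Config(), builder.createTopology());
  }
}
```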
  45. YARN Apps: Overview (repeated, highlighting Spark)
  46. Spark: In a nutshell • High-speed in-memory analytics over Hadoop and Hive • Separate MapReduce-like engine – speedups of up to 100x, on-disk queries 5-10x faster • Compatible with Hadoop's storage API • Available as a standalone application – https://github.com/mesos/spark • Experimental support for YARN since 0.6 – http://spark.incubator.apache.org/docs/0.6.0/running-on-yarn.html (see the Java sketch below)
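A small sketch of the cache-and-reuse idea behind Spark's in-memory analytics, using the Java API; the package names follow the later Apache releases and the HDFS path is made up:

```java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class LogAnalysis {
  public static void main(String[] args) {
    // "local" runs in-process; against a cluster you would point this at YARN or a Spark master
    JavaSparkContext sc = new JavaSparkContext("local", "log-analysis");

    JavaRDD<String> lines = sc.textFile("hdfs:///data/logs/app.log");  // hypothetical path

    // Keep only error lines and pin them in memory for repeated queries
    JavaRDD<String> errors = lines.filter(new Function<String, Boolean>() {
      public Boolean call(String line) {
        return line.contains("ERROR");
      }
    }).cache();

    // Both counts reuse the cached data set instead of re-reading HDFS
    System.out.println("errors: " + errors.count());
    System.out.println("timeouts: " + errors.filter(new Function<String, Boolean>() {
      public Boolean call(String line) {
        return line.contains("timeout");
      }
    }).count());

    sc.stop();
  }
}
```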
  47. Spark: Data Sharing [diagram only]
  48. YARN Apps: Overview (repeated, highlighting Giraph)
  49. Giraph: In a nutshell • A framework for processing semi-structured graph data on a massive scale • Loosely based upon Google's Pregel • Performs iterative calculations on top of an existing Hadoop cluster • Available on GitHub – https://github.com/apache/giraph (see the Java sketch below)
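A sketch of the Pregel-style "think like a vertex" model, loosely following Giraph's well-known shortest-paths example and assuming the BasicComputation API of the newer Giraph releases:

```java
import java.io.IOException;
import org.apache.giraph.edge.Edge;
import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;

// Single-source shortest paths: each vertex keeps its best-known distance from the source
public class ShortestPaths
    extends BasicComputation<LongWritable, DoubleWritable, FloatWritable, DoubleWritable> {

  public static final long SOURCE_ID = 1;   // assumed source vertex id

  @Override
  public void compute(Vertex<LongWritable, DoubleWritable, FloatWritable> vertex,
                      Iterable<DoubleWritable> messages) throws IOException {
    if (getSuperstep() == 0) {
      vertex.setValue(new DoubleWritable(Double.MAX_VALUE));   // "infinity"
    }

    // The candidate distance is the minimum over all incoming messages
    double minDist = (vertex.getId().get() == SOURCE_ID) ? 0d : Double.MAX_VALUE;
    for (DoubleWritable message : messages) {
      minDist = Math.min(minDist, message.get());
    }

    // If we improved, update our value and notify neighbours for the next superstep
    if (minDist < vertex.getValue().get()) {
      vertex.setValue(new DoubleWritable(minDist));
      for (Edge<LongWritable, FloatWritable> edge : vertex.getEdges()) {
        sendMessage(edge.getTargetVertexId(),
            new DoubleWritable(minDist + edge.getValue().get()));
      }
    }
    vertex.voteToHalt();   // computation ends when all vertices halt and no messages remain
  }
}
```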
  50. Agenda (section marker: Write your own YARN App)
  51. YARN: API & client libraries • Application Client Protocol – client-to-RM interaction; library: YarnClient; application lifecycle control, access to cluster information • ApplicationMaster Protocol – AM-to-RM interaction; library: AMRMClient / AMRMClientAsync; resource negotiation, heartbeat to the RM • Container Management Protocol – AM-to-NM interaction; library: NMClient / NMClientAsync; launching allocated containers, stopping running containers • Or use external frameworks like Twill, REEF or Spring (a client-side sketch follows below)
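A client-side sketch using YarnClient to submit an application; the ApplicationMaster class in the launch command is hypothetical, and a real client would also set up local resources and the environment:

```java
import org.apache.hadoop.yarn.api.records.*;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class MyAppClient {
  public static void main(String[] args) throws Exception {
    YarnConfiguration conf = new YarnConfiguration();
    YarnClient yarnClient = YarnClient.createYarnClient();   // Application Client Protocol
    yarnClient.init(conf);
    yarnClient.start();

    // Ask the ResourceManager for a new application id
    YarnClientApplication app = yarnClient.createApplication();
    ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
    appContext.setApplicationName("my-yarn-app");

    // Describe the container that will run the ApplicationMaster
    ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
    amContainer.setCommands(java.util.Collections.singletonList(
        "java -Xmx512m my.pkg.MyApplicationMaster"           // hypothetical AM main class
            + " 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr"));
    appContext.setAMContainerSpec(amContainer);
    appContext.setResource(Resource.newInstance(1024, 1));   // 1 GB, 1 vcore for the AM
    appContext.setQueue("default");

    // Submit and poll the application state
    ApplicationId appId = yarnClient.submitApplication(appContext);
    System.out.println("Submitted " + appId + ", state: "
        + yarnClient.getApplicationReport(appId).getYarnApplicationState());
  }
}
```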
  52. YARN: Application flow [Diagram: the app client uses YarnClient (Application Client Protocol) to submit the app to the ResourceManager/Scheduler; the ApplicationMaster ("myAM") talks to the RM via AMRMClient (ApplicationMaster Protocol) and launches containers on NodeManagers via NMClient (Container Management Protocol); an AM-side sketch follows below]
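And the matching ApplicationMaster-side sketch with AMRMClient and NMClient: register with the RM, request a container, launch a command in it, then deregister. This is deliberately simplified; a real AM would track container completions and handle failures:

```java
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.*;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.client.api.NMClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class MyApplicationMaster {
  public static void main(String[] args) throws Exception {
    YarnConfiguration conf = new YarnConfiguration();

    // ApplicationMaster Protocol: talk to the ResourceManager
    AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
    rmClient.init(conf);
    rmClient.start();
    rmClient.registerApplicationMaster("", 0, "");   // host / RPC port / tracking URL

    // Ask for one 512 MB / 1 vcore container, anywhere in the cluster
    Priority priority = Priority.newInstance(0);
    Resource capability = Resource.newInstance(512, 1);
    rmClient.addContainerRequest(new ContainerRequest(capability, null, null, priority));

    // Container Management Protocol: talk to NodeManagers
    NMClient nmClient = NMClient.createNMClient();
    nmClient.init(conf);
    nmClient.start();

    boolean launched = false;
    while (!launched) {
      // Heartbeat to the RM; allocated containers arrive asynchronously
      AllocateResponse response = rmClient.allocate(0.1f);
      for (Container container : response.getAllocatedContainers()) {
        ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);
        ctx.setCommands(java.util.Collections.singletonList("sleep 30"));  // the actual task
        nmClient.startContainer(container, ctx);
        launched = true;
      }
      Thread.sleep(1000);
    }

    rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "done", "");
  }
}
```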
  53. YARN: More things to consider • Container / tasks • Client • UI • Recovery • Container-to-AM communication • Application history
  54. YARN: Testing / Debugging • MiniYARNCluster – a cluster within the process, in threads (HDFS, MR, HBase); useful for regression tests; avoid it for unit tests: too slow • Unmanaged AM – support for running the AM outside of a YARN cluster for manual testing; debugger, heap dumps, etc. • Logs – log aggregation support pushes all logs into HDFS; accessible via CLI and UI; useful to enable all application logs by default (see the MiniYARNCluster sketch below)
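A MiniYARNCluster sketch, assuming the YARN test artifacts (e.g. hadoop-yarn-server-tests) are on the test classpath:

```java
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.MiniYARNCluster;

public class MiniClusterTest {
  public static void main(String[] args) throws Exception {
    // Spin up a 2-NodeManager YARN cluster inside this JVM (threads, not processes)
    MiniYARNCluster cluster = new MiniYARNCluster("test-cluster", 2, 1, 1);
    cluster.init(new YarnConfiguration());
    cluster.start();

    // The mini cluster's configuration points at the in-process ResourceManager,
    // so the normal client libraries work against it unchanged
    YarnClient client = YarnClient.createYarnClient();
    client.init(cluster.getConfig());
    client.start();
    System.out.println("NodeManagers reporting: " + client.getNodeReports().size());

    client.stop();
    cluster.stop();
  }
}
```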
  55. Agenda (section marker: Tez, Hive & Stinger Initiative)
  56. Apache Tez: In a nutshell • A distributed execution framework that works on computations represented as dataflow graphs • Tez is Hindi for "speed" • Naturally maps to execution plans produced by query optimizers • Highly customizable to meet a broad spectrum of use cases and to enable dynamic performance optimizations at runtime • Built on top of YARN
  57. Apache Tez: The new primitive [Diagram: Hadoop 1 uses MapReduce as the base – Pig, Hive and others compile to MR, which handles both resource management and data processing on HDFS; Hadoop 2 uses Tez as the execution engine on YARN – Pig, Hive and others run on Tez, while real-time engines such as Storm run alongside it]
  58. Apache Tez: Architecture • A task consists of pluggable Input, Processor & Output [Diagram: a "classical" map task = HDFS Input → Map Processor → Sorted Output; a "classical" reduce task = Shuffle Input → Reduce Processor → HDFS Output] • A YARN ApplicationMaster runs a DAG of Tez tasks (a rough DAG-API sketch follows below)
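A rough sketch of wiring such a DAG with the Tez API, assuming the 0.5-style factory methods; the two processor classes are hypothetical user code and the edge I/O classes come from the Tez runtime library:

```java
import org.apache.tez.client.TezClient;
import org.apache.tez.dag.api.DAG;
import org.apache.tez.dag.api.Edge;
import org.apache.tez.dag.api.EdgeProperty;
import org.apache.tez.dag.api.EdgeProperty.DataMovementType;
import org.apache.tez.dag.api.EdgeProperty.DataSourceType;
import org.apache.tez.dag.api.EdgeProperty.SchedulingType;
import org.apache.tez.dag.api.InputDescriptor;
import org.apache.tez.dag.api.OutputDescriptor;
import org.apache.tez.dag.api.ProcessorDescriptor;
import org.apache.tez.dag.api.TezConfiguration;
import org.apache.tez.dag.api.Vertex;

public class TwoStageDag {
  public static void main(String[] args) throws Exception {
    TezConfiguration tezConf = new TezConfiguration();

    // Two vertices, each running a (hypothetical) user-supplied processor class
    Vertex tokenizer = Vertex.create("tokenizer",
        ProcessorDescriptor.create("my.pkg.TokenProcessor"), 4);   // 4 parallel tasks
    Vertex summer = Vertex.create("summer",
        ProcessorDescriptor.create("my.pkg.SumProcessor"), 2);

    // A scatter-gather edge is the Tez equivalent of the MapReduce shuffle
    EdgeProperty shuffle = EdgeProperty.create(
        DataMovementType.SCATTER_GATHER,
        DataSourceType.PERSISTED,
        SchedulingType.SEQUENTIAL,
        OutputDescriptor.create(
            "org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput"),
        InputDescriptor.create(
            "org.apache.tez.runtime.library.input.OrderedGroupedKVInput"));

    DAG dag = DAG.create("two-stage-example");
    dag.addVertex(tokenizer).addVertex(summer)
       .addEdge(Edge.create(tokenizer, summer, shuffle));

    // Submit the whole DAG as one YARN application
    TezClient tezClient = TezClient.create("two-stage-example", tezConf);
    tezClient.start();
    tezClient.waitTillReady();
    tezClient.submitDAG(dag).waitForCompletion();
    tezClient.stop();
  }
}
```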
  59. Apache Tez: Tez Service • MapReduce query startup is expensive: job-launch & task-launch latencies are fatal for short queries (in the order of 5s to 30s) • Solution: the Tez Service (a pre-allocated ApplicationMaster) removes the job-launch overhead (ApplicationMaster) and the task-launch overhead (pre-warmed containers) • Hive (or Pig) submits the query plan directly to the Tez Service • A native Hadoop service, not ad-hoc
  60. Apache Tez: Performance – gains over MapReduce • Eliminate the replicated write barrier between successive computations • Eliminate the job-launch overhead of workflow jobs • Eliminate the extra stage of map reads in every workflow job • Eliminate the queue & resource contention suffered by workflow jobs that are started after a predecessor job completes
  61. Apache Tez: Performance – optimal resource management • Reuse YARN containers to launch new tasks • Reuse YARN containers to enable shared objects across tasks [Diagram: the Tez ApplicationMaster runs Tez Task 1 and then Tez Task 2 in the same YARN container, sharing objects between them]
  62. Apache Tez: Performance – execution plan reconfiguration at runtime • Dynamic runtime concurrency control based on data size, user operator resources, available cluster resources and locality • Advanced changes in the dataflow graph structure • Progressive graph construction in concert with the user optimizer [Diagram: 50 mappers producing 100 partitions feed stage 2 with either 10 reducers (< 10 GB of data) or 100 reducers (> 10 GB of data) – the decision is made at runtime]
  63. Apache Tez: Performance – dynamic physical data flow decisions • Decide the type of physical byte movement and storage on the fly • Store intermediate data on a distributed store, a local store or in memory • Transfer bytes via block files, streaming, or anything in between [Diagram: producer → consumer goes through a local file above 32 GB and stays in memory below 32 GB – the decision is made at runtime]
  64. Apache Tez: Performance – example query: SELECT a.state, COUNT(*), AVERAGE(c.price) FROM a JOIN b ON (a.id = b.id) JOIN c ON (a.itemId = c.itemId) GROUP BY a.state • Existing Hive: parse query 0.5s + create plan 0.5s + launch MapReduce 20s + process MapReduce 10s = 31s total • Hive/Tez: parse 0.5s + plan 0.5s + launch MapReduce 20s + process 2s = 23s total • Tez & Hive Service: parse 0.5s + plan 0.5s + submit to Tez Service 0.5s + process 2s = 3.5s total (* no exact numbers, for illustration only)
  65. Hive: Current Focus Area [Chart: latency classes vs. workloads – Real-Time (0-5s): online systems, real-time analytics, CEP; Interactive (5s-1m): parameterized reports, drilldown, visualization, exploration; Non-Interactive (1m-1h): data preparation, incremental batch processing, dashboards/scorecards; Batch (1h+): operational batch processing, enterprise reports, data mining. Hive's current sweet spot is the non-interactive and batch range]
  66. Stinger: Extending the sweet spot [Chart: the same latency classes; the future Hive expansion pushes into the interactive range] • Improve latency & throughput: query engine improvements, the new "Optimized RCFile" column store, a next-gen runtime that eliminates MapReduce latency • Extend deep analytical ability: analytics functions, improved SQL coverage, continued focus on core Hive use cases
  67. Stinger Initiative: In a nutshell [diagram only]
  68. Stinger: Enhancing SQL Semantics [diagram only]
  69. Stinger: Tez revisited – the same example query (SELECT a.state, COUNT(*), AVERAGE(c.price) FROM a JOIN b ON (a.id = b.id) JOIN c ON (a.itemId = c.itemId) GROUP BY a.state) [Diagram: Hive on MapReduce needs three jobs separated by I/O synchronization barriers; Hive on Tez runs it as a single job]
  70. Hive: Persistence Options • Built-in formats: ORCFile, RCFile, Avro, delimited text, regular expression, S3 logfile, typed bytes • 3rd-party add-ons: JSON, XML
  71. Stinger: ORCFile in a nutshell • Optimized Row Columnar file format • Columns are stored separately • Knows the types and uses type-specific encoders • Stores statistics (min, max, sum, count) • Has a light-weight index – skip over blocks of rows that don't matter • Larger blocks – 256 MB by default, with an index for block boundaries
  72. Stinger: ORCFile Layout [Diagram: the columnar format arranges columns adjacent within the file for compression and fast access; the large block size is well suited for HDFS]
  73. Stinger: ORCFile advantages • High compression – many tricks used out of the box to ensure high compression rates (RLE, dictionary encoding, etc.) • High performance – inline indexes record value ranges within blocks of ORCFile data, and filter pushdown allows efficient scanning during precise queries • Flexible data model – all Hive types including maps, structs and unions
  74. Stinger: ORCFile compression [Chart: data set from TPC-DS, measurements by Owen O'Malley, Hortonworks]
  75. Stinger: More optimizations • Join optimizations – new join types added or improved in Hive 0.11, more efficient query plan generation • Vectorization – make the most of the L1 and L2 caches, avoid branching whenever possible • Optimized query planner – automatically determine optimal execution parameters • Buffering – cache hot data in memory • …
  76. Stinger: Overall Performance [Chart: real numbers, but handle with care!]
  77. Hadoop 2: Summary – 1. Scale 2. New programming models & services 3. Beyond Java 4. Improved cluster utilization 5. Enterprise readiness
  78. One more thing… Let's get started with Hadoop 2!
  79. Hortonworks Sandbox 2.0 – http://hortonworks.com/products/hortonworks-sandbox
  80. Dude, where's my Hadoop 2? [Table by Adam Muise, Hortonworks]
  81. The end… or the beginning?