Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Welcome to YARN Meetup
September 2013

©2013 LinkedIn Corporation. All Rights Reserved.
YARN @ LinkedIn
State of the Art
Mohammad Islam

©2013 LinkedIn Corporation. All Rights Reserved.
Early Adopter
 YARN is good fit for many LinkedIn problems
 Many initiatives by multiple teams
 LI Engineers enjoy the ...
Early Adopter
 Samza : Real-time stream processing
system
– Developed by LinkedIn team
– Apache incubator project
– Use Y...
Early Adopter
 Helix – Generic cluster management
system
– Built and used in LinkedIn
– Apache Incubator project
– Incorp...
Early Adopter
 Not yet open sourced
– Few projects are incubating at LI
– Mostly around custom and near-realtime
executio...
Early Adopter
 Administering YARN:
– One of the pioneers of a 2.1.0-beta prod-like
deployment
– Led by our Ops/Dev team
–...
Early Adopter
 Pig on Tez: Actively working with Pig
community
 Hosted a small “Pig on Tez” dev meeting
– Participants i...
Apache Giraph on YARN

©2013 LinkedIn Corporation. All Rights Reserved.
Overview of Giraph
 A distributed graph processing framework
– Master/slave architecture
– In-memory computation
– Vertex...
Quick History
 HortonWorks/LinkedIn intern (Eli) wrote the
early version of Giraph AM
 Based on 2.0.3
 Since then YARN ...
Giraph on YARN
Node
Manager
Worker

Client

Resource
Manager

Worker

Node
Manager
App
Mstr

ZooKeeper

Worker

Node
Manag...
New Giraph AM
 Girpah AM : Nearly a complete rewrite by LinkedIn
Hadoop dev.
– Used new stable API
– Adopt new asynchrono...
Memory Footprint - Page Rank Algorithm

 Iteration 3



Iteration 27
Reachable
1.5 GB

Reachable
1.5 GB
Unreachable
3 GB...
Challenges in Giraph
 Memory intensive Java based system
 Various (GC) knobs to tune the system and
application
 Depend...
Future Direction
 Option 1: “Worker” in C++
– C++provides direct control over memory management
– No need to rewrite the ...
Final Thoughts on Giraph
 LinkedIn is the 1st player of Giraph on YARN
 Successfully executed full LinkedIn graph run
–
...
Challenges in YARN
 Failover of various components (RM/AM etc.)
 APIs stabilization –almost there!
 Representative exam...
Concluding on YARN
 YARN is the way to go forward!
 Reduce the innovation barrier
 Support non-MR execution platform
 ...
Q& A

Thanks for coming!

©2013 LinkedIn Corporation. All Rights Reserved.
Giraph Architecture
 Master / Workers
 Zookeeper

Worker

Worker

Worker

Worker

Worker

Worker

Worker

Master

Worker...
Upcoming SlideShare
Loading in …5
×

Yarn at LinkedIn

996 views

Published on

YARN meetup on September 2013 at LinkedIn
How LinkedIn is using YARN?
What is the future of YARN at LinkedIn?
New Giraph AM on YARN and experiences of running LinkedIn Graph.

Published in: Technology
  • Be the first to comment

Yarn at LinkedIn

  1. 1. Welcome to YARN Meetup September 2013 ©2013 LinkedIn Corporation. All Rights Reserved.
  2. 2. YARN @ LinkedIn State of the Art Mohammad Islam ©2013 LinkedIn Corporation. All Rights Reserved.
  3. 3. Early Adopter  YARN is good fit for many LinkedIn problems  Many initiatives by multiple teams  LI Engineers enjoy the fun of emergent technologies ©2013 LinkedIn Corporation. All Rights Reserved.
  4. 4. Early Adopter  Samza : Real-time stream processing system – Developed by LinkedIn team – Apache incubator project – Use YARN and Kafka – Detailed presentation coming later today ©2013 LinkedIn Corporation. All Rights Reserved.
  5. 5. Early Adopter  Helix – Generic cluster management system – Built and used in LinkedIn – Apache Incubator project – Incorporating YARN resource management – Stay tuned to learn more today ©2013 LinkedIn Corporation. All Rights Reserved.
  6. 6. Early Adopter  Not yet open sourced – Few projects are incubating at LI – Mostly around custom and near-realtime execution engine – Status: Some in POC and some are in design state ©2013 LinkedIn Corporation. All Rights Reserved.
  7. 7. Early Adopter  Administering YARN: – One of the pioneers of a 2.1.0-beta prod-like deployment – Led by our Ops/Dev team – Found a lot of issues  Kerberos auth (YARN -621 & others) – Contributing back to Apache to stabilize YARN  Streamlined operational tools (HADOOP9902) ©2013 LinkedIn Corporation. All Rights Reserved.
  8. 8. Early Adopter  Pig on Tez: Actively working with Pig community  Hosted a small “Pig on Tez” dev meeting – Participants include: Yahoo, HortonWorks, Netflix and LinkedIn  Developed a high-level implementation plan ©2013 LinkedIn Corporation. All Rights Reserved.
  9. 9. Apache Giraph on YARN ©2013 LinkedIn Corporation. All Rights Reserved.
  10. 10. Overview of Giraph  A distributed graph processing framework – Master/slave architecture – In-memory computation – Vertex-centric high-level programming model – Based on Bulk Synchronous Parallel (BSP) ©2013 LinkedIn Corporation. All Rights Reserved. 10
  11. 11. Quick History  HortonWorks/LinkedIn intern (Eli) wrote the early version of Giraph AM  Based on 2.0.3  Since then YARN has evolved a lot!  API overhauled Action: Overhaul Giraph onYARN ©2013 LinkedIn Corporation. All Rights Reserved.
  12. 12. Giraph on YARN Node Manager Worker Client Resource Manager Worker Node Manager App Mstr ZooKeeper Worker Node Manager Master ©2013 LinkedIn Corporation. All Rights Reserved. Worker 12
  13. 13. New Giraph AM  Girpah AM : Nearly a complete rewrite by LinkedIn Hadoop dev. – Used new stable API – Adopt new asynchronous/event based model – Status: Patch ready  Client – Used new API – Status: Patch ready  Security – Added Kerberos support for Giraph YARN client and AM – Status: Testing ©2013 LinkedIn Corporation. All Rights Reserved.
  14. 14. Memory Footprint - Page Rank Algorithm  Iteration 3  Iteration 27 Reachable 1.5 GB Reachable 1.5 GB Unreachable 3 GB Unreachable 6 GB ©2013 LinkedIn Corporation. All Rights Reserved.
  15. 15. Challenges in Giraph  Memory intensive Java based system  Various (GC) knobs to tune the system and application  Depends heavily on skillful application developers  Performance degradation from scaling up  Not a good player for multi-tenant system ©2013 LinkedIn Corporation. All Rights Reserved. 15
  16. 16. Future Direction  Option 1: “Worker” in C++ – C++provides direct control over memory management – No need to rewrite the whole Giraph  Issue : Adoption barrier – Writing C++ application – Possible solution: Giraph scripting language  Like Hive or Pig  Option 2: Off-heap memory usage Option 3: Leave it alone! ©2013 LinkedIn Corporation. All Rights Reserved. 16
  17. 17. Final Thoughts on Giraph  LinkedIn is the 1st player of Giraph on YARN  Successfully executed full LinkedIn graph run – – – – Page Rank algorithm 200M+ vertices and XX Billions edges On 40-node cluster with 650GB memory Total time taken: 28 minutes  Ready to go!  Scope for improvements utilizing YARN’s flexibility ©2013 LinkedIn Corporation. All Rights Reserved. 17
  18. 18. Challenges in YARN  Failover of various components (RM/AM etc.)  APIs stabilization –almost there!  Representative examples for quick dev ramp-up  Better documentation – Book on its way!  Operational friendly – Centralized logging – SLA support – timed resource constraint. ©2013 LinkedIn Corporation. All Rights Reserved.
  19. 19. Concluding on YARN  YARN is the way to go forward!  Reduce the innovation barrier  Support non-MR execution platform  Improved utilization/performance – By removing the split of map/reduce slot – Through distribution of JT responsibility ©2013 LinkedIn Corporation. All Rights Reserved.
  20. 20. Q& A Thanks for coming! ©2013 LinkedIn Corporation. All Rights Reserved.
  21. 21. Giraph Architecture  Master / Workers  Zookeeper Worker Worker Worker Worker Worker Worker Worker Master Worker Worker ©2013 LinkedIn Corporation. All Rights Reserved. 21

×