Welcome to YARN Meetup
September 2013

©2013 LinkedIn Corporation. All Rights Reserved.
YARN @ LinkedIn
State of the Art
Mohammad Islam

Early Adopter
• YARN is a good fit for many LinkedIn problems
• Many initiatives by multiple teams
• LI engineers enjoy the fun of emerging technologies

Early Adopter
• Samza: real-time stream processing system
  – Developed by a LinkedIn team
  – Apache Incubator project
  – Uses YARN and Kafka
  – Detailed presentation coming later today

Early Adopter
• Helix: generic cluster management system
  – Built and used at LinkedIn
  – Apache Incubator project
  – Incorporating YARN resource management
  – Stay tuned to learn more today

Early Adopter
• Not yet open sourced
  – A few projects are incubating at LI
  – Mostly around custom and near-real-time execution engines
  – Status: some in POC, some in the design stage

Early Adopter
• Administering YARN:
  – One of the pioneers of a prod-like 2.1.0-beta deployment
  – Led by our Ops/Dev team
  – Found a lot of issues
    ▪ Kerberos auth (YARN-621 and others)
  – Contributing back to Apache to stabilize YARN
    ▪ Streamlined operational tools (HADOOP-9902)

Early Adopter
• Pig on Tez: actively working with the Pig community
• Hosted a small “Pig on Tez” dev meeting
  – Participants included Yahoo, Hortonworks, Netflix, and LinkedIn
• Developed a high-level implementation plan

Apache Giraph on YARN

Overview of Giraph
• A distributed graph processing framework
  – Master/slave architecture
  – In-memory computation
  – Vertex-centric high-level programming model
  – Based on Bulk Synchronous Parallel (BSP)
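
To make the vertex-centric BSP model concrete, here is a minimal sketch of PageRank written as a sequence of supersteps. It is a single-process toy in Python, not Giraph code: the function name `pagerank_bsp`, the three-vertex graph, the damping factor, and the fixed superstep count are all assumptions chosen for illustration (a real Giraph job would let vertices vote to halt).

```python
from collections import defaultdict


def pagerank_bsp(out_edges, supersteps=30, damping=0.85):
    """Toy vertex-centric PageRank run as a sequence of BSP supersteps.

    out_edges maps each vertex to the list of vertices it points to.
    Hypothetical helper for illustration only; not part of Giraph.
    """
    n = len(out_edges)
    rank = {v: 1.0 / n for v in out_edges}
    inbox = defaultdict(list)  # messages delivered at the start of a superstep

    for step in range(supersteps):
        outbox = defaultdict(list)
        # 1) Concurrent computation: each vertex uses only its own state and inbox.
        for v, targets in out_edges.items():
            if step > 0:
                rank[v] = (1 - damping) / n + damping * sum(inbox[v])
            # 2) Communication: send an equal share of the rank along each out-edge.
            for t in targets:
                outbox[t].append(rank[v] / len(targets))
        # 3) Barrier synchronization: messages become visible in the next superstep.
        inbox = outbox
    return rank


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank_bsp(graph)
```

Because every vertex in this small graph has at least one outgoing edge, the ranks keep summing to 1 across supersteps; handling vertices with no out-edges is left out to keep the sketch short.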

Quick History
• A Hortonworks/LinkedIn intern (Eli) wrote the early version of the Giraph AM
  – Based on 2.0.3
• Since then YARN has evolved a lot!
  – API overhauled

Action: Overhaul Giraph on YARN

Giraph on YARN
[Figure: architecture diagram with the Client, the Resource Manager, ZooKeeper, and three Node Managers; the diagram also shows the App Master, the Giraph Master, and four Workers.]
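
The flow behind this diagram, as the speaker notes describe it (the client only asks the Resource Manager to launch the application master, which then obtains containers for the Giraph master and the workers), can be sketched as a toy simulation. Every class and method name below is hypothetical and none of it is the YARN API; memory sizes are made up, and the Resource Manager launches containers directly to keep the sketch short, whereas in YARN the application master asks the Node Managers to start them.

```python
class NodeManager:
    """Hypothetical stand-in for a node that runs containers."""

    def __init__(self, name, memory_mb):
        self.name = name
        self.free_mb = memory_mb
        self.running = []  # labels of containers launched on this node

    def launch(self, label, memory_mb):
        self.free_mb -= memory_mb
        self.running.append(label)


class ResourceManager:
    """Only schedules: picks a node with enough free memory for each request."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, label, memory_mb):
        for node in self.nodes:
            if node.free_mb >= memory_mb:
                node.launch(label, memory_mb)  # simplification, see lead-in
                return node.name
        return None  # no capacity right now

    def submit_application(self, app_master, am_memory_mb):
        # The client's only request: "please launch my application master".
        placed_on = self.allocate("app-master", am_memory_mb)
        if placed_on is None:
            raise RuntimeError("no capacity for the application master")
        app_master.run(self)
        return placed_on


class GiraphAppMaster:
    """Requests containers for the Giraph master and workers, then tracks them."""

    def __init__(self, num_workers, container_mb):
        self.num_workers = num_workers
        self.container_mb = container_mb
        self.placements = {}

    def run(self, rm):
        tasks = ["giraph-master"] + [f"worker-{i}" for i in range(self.num_workers)]
        for task in tasks:
            self.placements[task] = rm.allocate(task, self.container_mb)


nodes = [NodeManager(f"nm{i}", memory_mb=4096) for i in range(3)]
rm = ResourceManager(nodes)
am = GiraphAppMaster(num_workers=4, container_mb=2048)
rm.submit_application(am, am_memory_mb=1024)
```

Keeping the Giraph master in its own container, separate from the application master, mirrors the diagram; the notes point out that the two could in principle be combined.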
New Giraph AM
• Giraph AM: nearly a complete rewrite by LinkedIn Hadoop dev.
  – Uses the new stable API
  – Adopts the new asynchronous, event-based model
  – Status: patch ready
• Client
  – Uses the new API
  – Status: patch ready
• Security
  – Added Kerberos support for the Giraph YARN client and AM
  – Status: testing
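
As an illustration of what an asynchronous, event-based application master looks like, the sketch below has a handler that reacts to "allocated" and "completed" events instead of polling in a loop. It is a Python toy with hypothetical names (`WorkerLauncher`, `event_loop`, the container ids); Hadoop's Java client library offers a callback-style AM client (`AMRMClientAsync`) in the same spirit, but the slides do not say which classes the rewrite uses.

```python
class WorkerLauncher:
    """Callback handler: reacts to events rather than polling for state."""

    def __init__(self, wanted):
        self.wanted = wanted
        self.launched = []
        self.completed = []

    def on_containers_allocated(self, containers):
        # Launch workers until the requested number is reached; ignore surplus.
        for c in containers:
            if len(self.launched) < self.wanted:
                self.launched.append(c)

    def on_containers_completed(self, containers):
        self.completed.extend(containers)

    def done(self):
        return len(self.completed) == self.wanted


def event_loop(handler, events):
    """Stand-in for a client-library thread that delivers events to the handler."""
    for kind, payload in events:
        if kind == "allocated":
            handler.on_containers_allocated(payload)
        elif kind == "completed":
            handler.on_containers_completed(payload)
        if handler.done():
            break


handler = WorkerLauncher(wanted=3)
event_loop(handler, [
    ("allocated", ["c1", "c2"]),
    ("allocated", ["c3", "c4"]),  # c4 is surplus and is ignored
    ("completed", ["c1"]),
    ("completed", ["c2", "c3"]),
])
```

The design point is that the application logic lives in small callbacks, so the AM does not block while waiting for the scheduler to respond.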

Memory Footprint: PageRank Algorithm
[Figure: memory charts for iteration 3 and iteration 27. Chart labels, in source order: Reachable 1.5 GB, Reachable 1.5 GB, Unreachable 3 GB, Unreachable 6 GB.]

Challenges in Giraph
• Memory-intensive, Java-based system
• Various (GC) knobs to tune for the system and the application
• Depends heavily on skillful application developers
• Performance degrades when scaling up
• Not a good player in a multi-tenant system

Future Direction
• Option 1: “Worker” in C++
  – C++ provides direct control over memory management
  – No need to rewrite the whole of Giraph
• Issue: adoption barrier
  – Writing C++ applications
  – Possible solution: a Giraph scripting language
    ▪ Like Hive or Pig
• Option 2: Off-heap memory usage
• Option 3: Leave it alone!
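
The intuition behind Option 2 (keep bulk vertex data out of the per-object managed heap) can be shown with a loose Python analogy: many small boxed objects versus one packed buffer of primitives. This is only an analogy under stated assumptions; it does not measure a JVM, the vertex count is arbitrary, and Python's `array` module merely stands in for a flat byte buffer.

```python
import sys
from array import array

NUM_VERTICES = 100_000  # arbitrary size chosen for the illustration

# One Python float object per vertex value, referenced from a list
# (loosely analogous to one value object per vertex on a managed heap).
boxed = [float(i) for i in range(NUM_VERTICES)]
boxed_bytes = sys.getsizeof(boxed) + sum(sys.getsizeof(x) for x in boxed)

# All values packed into one contiguous buffer of 8-byte doubles
# (loosely analogous to a flat byte array or an off-heap buffer).
packed = array("d", (float(i) for i in range(NUM_VERTICES)))
packed_bytes = sys.getsizeof(packed)

# How many times larger the boxed representation is than the packed one.
ratio = boxed_bytes / packed_bytes
```

Besides the smaller footprint, a packed buffer gives the garbage collector a single object to track instead of one per value, which is the kind of pressure the "Challenges in Giraph" slide points at.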

Final Thoughts on Giraph
• LinkedIn is the first player of Giraph on YARN
• Successfully executed a run on the full LinkedIn graph
  – PageRank algorithm
  – 200M+ vertices and XX billion edges
  – On a 40-node cluster with 650 GB memory
  – Total time taken: 28 minutes
• Ready to go!
• Scope for improvements utilizing YARN’s flexibility

Challenges in YARN
• Failover of various components (RM, AM, etc.)
• API stabilization: almost there!
• Representative examples for quick dev ramp-up
• Better documentation
  – Book on its way!
• Operational friendliness
  – Centralized logging
  – SLA support: timed resource constraints

Concluding on YARN
• YARN is the way forward!
• Reduces the innovation barrier
• Supports non-MR execution platforms
• Improved utilization/performance
  – By removing the split between map and reduce slots
  – Through distribution of the JobTracker’s (JT) responsibilities
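
Why removing the map/reduce slot split can improve utilization is easy to see with a little arithmetic. The sketch below compares a fixed-slot cluster with a cluster of generic containers during a map-heavy phase. The function names and all numbers are assumptions for illustration, and it treats every task and container as the same size, which real workloads are not.

```python
def slot_utilization(map_slots, reduce_slots, map_tasks, reduce_tasks):
    """Fixed-slot model: a map slot runs only maps, a reduce slot only reduces."""
    busy = min(map_slots, map_tasks) + min(reduce_slots, reduce_tasks)
    return busy / (map_slots + reduce_slots)


def container_utilization(total_containers, map_tasks, reduce_tasks):
    """Generic-container model: any free container can run any pending task."""
    busy = min(total_containers, map_tasks + reduce_tasks)
    return busy / total_containers


# A map-heavy moment: 40 map tasks pending, no reduce tasks yet.
old = slot_utilization(map_slots=12, reduce_slots=8, map_tasks=40, reduce_tasks=0)
new = container_utilization(total_containers=20, map_tasks=40, reduce_tasks=0)
# old is 0.6 (the 8 reduce slots sit idle); new is 1.0
```

With the same hardware, the idle reduce slots in the fixed-slot model are exactly the capacity that generic containers can hand to waiting map tasks.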

Q&A

Thanks for coming!

Giraph Architecture
• Master / Workers
• ZooKeeper
[Figure: one Master and nine Workers.]
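
The speaker notes mention that vertices are partitioned to distribute load and that the partition-to-worker mapping is kept as shared state. A minimal sketch of that idea follows. The hashing scheme, the round-robin assignment, and all names are assumptions for illustration, not Giraph's actual partitioner; a plain dictionary stands in for the state that would live in ZooKeeper.

```python
import zlib


def partition_of(vertex_id, num_partitions):
    """Stable hash partitioning (crc32 is deterministic across runs)."""
    return zlib.crc32(str(vertex_id).encode("utf-8")) % num_partitions


def assign_partitions(num_partitions, workers):
    """Master-side step: spread partitions over workers round-robin.

    The returned mapping is the kind of state a coordination service
    would hold so that every worker can look it up.
    """
    return {p: workers[p % len(workers)] for p in range(num_partitions)}


workers = ["worker-0", "worker-1", "worker-2"]
partition_owner = assign_partitions(num_partitions=6, workers=workers)


def worker_for(vertex_id):
    """Find the worker responsible for a vertex via its partition."""
    return partition_owner[partition_of(vertex_id, len(partition_owner))]


# Count how many of 9,000 synthetic vertex ids land on each worker.
load = {w: 0 for w in workers}
for vid in range(9000):
    load[worker_for(vid)] += 1
```

Using more partitions than workers, as here, leaves room to move a partition to another worker without rehashing every vertex.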

Editor's Notes

  1. So what is Giraph? Giraph is a distributed graph processing framework. It tries to solve a class of iterative problems that Hadoop has trouble with, such as PageRank. Graph processing is very important to LinkedIn. Giraph is designed with a master/slave architecture and does all its computation in memory: it loads the input from HDFS once and writes the output back to HDFS only after finishing its business-logic processing. Giraph provides a vertex-centric programming model: all algorithms are implemented from the point of view of a single vertex in the input graph performing a single iteration of the computation. Giraph makes graph algorithms easy to reason about and implement by following BSP. A BSP computation proceeds in a series of global supersteps. A superstep consists of three components: concurrent computation, communication, and barrier synchronization.
  2. The client does nothing but initiate an application for the user. It just asks the resource manager, “Will you launch my application master?” From there, the application master does everything. The resource manager just schedules your tasks (a JobTracker sort of activity): it asks the node managers for containers with the right heap size and decides where each task can be placed. The application master is a sort of master node for your application: it launches your tasks, manages their life cycle, and handles health communication and anything else to do with them. It is the new brain of your application. You may wonder what the difference is between the master and the application master. The answer is that these two components could be combined into one. However, Giraph is implemented this way, and my focus is on Giraph’s performance.
  3. byte[]: 1.8 GB; Long/DoubleWritable: 2 GB. Meeting notes (9/3/13 19:49): move it up; add GB. Meeting notes (9/4/13 11:45): move up.
  4. Step 2: make them support Java-based applications, with a Java interface for writing Giraph applications that run on C++ Giraph master/workers. Meeting notes (9/3/13 19:49): no need to rewrite the whole of Giraph; more control over memory management. Meeting notes (9/4/13 11:45): from the previous slides, we found out that the JVM is the killer; that's why we are considering whether C++ is a better candidate. we are asking to overhaul the Giraph, we only ask the Giraph to pice
  5. Relatively new things in the community. There are a few bugs we fixed to make it work.
  6. Master: responsible for coordination (load distribution, coordinating synchronization, requesting checkpoints, collecting health status, etc.). Worker: responsible for computation within each iteration, or superstep. ZooKeeper: responsible for computation state (partition-to-worker mapping, global state, checkpoint paths, statistics, etc.). In the next slide, I’m going to show you how these components work together. Meeting notes (9/3/13 19:49): add another slide describing the computational model; partition the vertices to distribute the load.