SlideShare a Scribd company logo
1 of 30
Analyzing Historical Data of
Applications on HadoopYARN:
for Fun and Profit
Mayank Bansal,Zhijie Shen
Agenda
• Who we are ?
• Why we need New History Server?
• Application History Server
• Timeline Server
• Future Work
Who we are
• Hadoop Architect @ ebay
• Apache Hadoop Committer
• Apache Oozie PMC and Committer
• Current
• Leading Hadoop Core Development for
YARN and MapReduce @ ebay
• Past
• Working on Scheduler / Resource
Managers
• Working on Distributed Systems
• Data Pipeline frameworks
Mayank Bansal
Who we are
• Software Engineer @ Hortonworks
• Apache Hadoop Committer
• Apache SAMZA PPMC and Committer
Zhijie Shen
Agenda
• Who we are ?
• Why we need New History Server?
• Application History Server
• Application Timeline Server
• Future Work
MR JobHistory Server
• We already have Job History Server
• It is only for Map Reduce Customized
• Storage is HDFS only
• Storage is very MR specific
• Counters
• Mappers and Reducers
• If you have only Map Reduce you are good.
Hadoop-2
Single Use System
Batch Apps
Multi Purpose Platform
Batch, Interactive, streaming
YARN
Issues with current Job History
• What if I have other Applications
• RM crashes
• Hard Limit on # Apps
• Upgrades / Updates
Agenda
• Who we are ?
• Why we need New History Server?
• Application History Server
• Timeline Server
• Future Work
Application History Server
• Separate Process
• Pluggable Storage
• HDFS
• In-Memory
• Resource Manager directly writes to Storage
• Aggregated Logs
• Separate UI, CLI and Rest End Point
Application History Server
Storage:
• It stores generic Data
• Application level data (queue, user etc…)
• List of ApplicationAttempts
• Information about each ApplicationAttempt
• List of containers for ApplicationAttempt
• Generic information about each container.
Application History Server
Application History Server
• CLI Interface
$ yarn application -status <Application ID>
$ yarn applicationattempt -list <Application ID>
• REST APIs
• http://localhost:8188/ws/v1/applicationhistory/app
s/appid
Application History Server
• Scalability for storage
• One file per application
• File format is protobuff
• Size of HDFS files
• Multiple RM threads writing to History Storage
# of
Containers
100 1K 10 K 100K
Size of the File 19 KB 184 KB 1.8 MB 19 MB
Agenda
• Who we are ?
• Why we need New History Server?
• Application History Server
• Timeline Server
• Future Work
Timeline Service - Motivation
• YARN takes care of it
– Relieving the application from monitoring service
• Application diversity
– Framework specific metadata/metrics
Timeline Service – Data Model
• Entity Type
– An abstract concept of anything
• Entity
– One specific instance of a entity type
– Defining the relationship between entities
• Event
– Something happens to an entity
Timeline Service – Architecture
• LevelDB Store
• Client Library
• REST Interfaces
Timeline Service – Store
• LevelDB based store
– Key-value store
– Lightweight
– License compatible
• Implementing reader/writer interfaces
• Support data retention
Timeline Service – Client
• TimelineClient
– Wrap over REST POST method
– POJO objects
• TimelineEntity
• TimelineEvent
– In Client/AM/Container
Timeline Service – APIs
• Rest APIs, JSON as the media
• Get timeline entities
– http://localhost:8188/ws/v1/timeline/{entityType}
• Get timeline entity
– http://localhost:8188/ws/v1/timeline/{entityType}/{entityId}
• Get timeline events
– http://localhost:8188/ws/v1/timeline/{entityType}/events
Timeline Service – Security
• HTTP SPNEGO
• Kerberos Authentication
• Delegation Token
– Performance
– AM/Container no Kerberos
• Access Control
– Admin/owner
– Timeline entity-level
Browser or
REST clientTimeline Client Timeline Client
Client AM
2. Pass DT via AppSubmissionContext
1. Get DT
Put Data via
Kerberos or
DTs
3. Put Data w/ DT
Timeline Authen ca on Filter
Timeline REST APIs
Timeline
ACLs
Manager
Access Verifica on
Pass
authen cated
user informa on
Get Data via
Kerberos
HadoopsupportforKerberos,SPNEGO,ACLs
Timeline Service security
Timeline Server
Timeline Service – Use Case (1)
Timeline Service – EarlyAdopter (2)
Timeline Service – Early Adopter (3)
Agenda
• Who we are ?
• Why we need New History Server?
• Application History Server
• Timeline Server
• Future Work
To Be Continue…
• Integrating the generic history and timeline data
• Rebasing MR Job history server on the timeline
server
• Making the timeline server rendering the timeline
data
To Be Continue…
Scale
• Leveldb does not handle ebay scale
• We need something which can horizontally scale
• HBASE
Questions
30
Mayank Bansal
mabansal@ebay.com
mayank@apache.org
Zhijie Shen
zshen@hortonworks.com
zjshen@apache.org

More Related Content

What's hot

Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
DataWorks Summit
 

What's hot (20)

Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheUsing Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
 
Thrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased ComparisonThrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased Comparison
 
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DMUpgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
 
Venturing into Large Hadoop Clusters
Venturing into Large Hadoop ClustersVenturing into Large Hadoop Clusters
Venturing into Large Hadoop Clusters
 
Troubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastTroubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the Beast
 
RocksDB Performance and Reliability Practices
RocksDB Performance and Reliability PracticesRocksDB Performance and Reliability Practices
RocksDB Performance and Reliability Practices
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
 
Application Timeline Server - Past, Present and Future
Application Timeline Server - Past, Present and FutureApplication Timeline Server - Past, Present and Future
Application Timeline Server - Past, Present and Future
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
 
MyRocks Deep Dive
MyRocks Deep DiveMyRocks Deep Dive
MyRocks Deep Dive
 
Spark Summit EU talk by Ross Lawley
Spark Summit EU talk by Ross LawleySpark Summit EU talk by Ross Lawley
Spark Summit EU talk by Ross Lawley
 
Ceph issue 해결 사례
Ceph issue 해결 사례Ceph issue 해결 사례
Ceph issue 해결 사례
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
 
MongoDB WiredTiger Internals: Journey To Transactions
MongoDB WiredTiger Internals: Journey To TransactionsMongoDB WiredTiger Internals: Journey To Transactions
MongoDB WiredTiger Internals: Journey To Transactions
 
A Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLA Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQL
 
Kafka Streams State Stores Being Persistent
Kafka Streams State Stores Being PersistentKafka Streams State Stores Being Persistent
Kafka Streams State Stores Being Persistent
 
How Adobe uses Structured Streaming at Scale
How Adobe uses Structured Streaming at ScaleHow Adobe uses Structured Streaming at Scale
How Adobe uses Structured Streaming at Scale
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcached
 
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilities
 

Viewers also liked (6)

Taming YARN @ Hadoop Conference Japan 2014
Taming YARN @ Hadoop Conference Japan 2014Taming YARN @ Hadoop Conference Japan 2014
Taming YARN @ Hadoop Conference Japan 2014
 
Timeline Service v.2 (Hadoop Summit 2016)
Timeline Service v.2 (Hadoop Summit 2016)Timeline Service v.2 (Hadoop Summit 2016)
Timeline Service v.2 (Hadoop Summit 2016)
 
YARN High Availability
YARN High AvailabilityYARN High Availability
YARN High Availability
 
What's new in hadoop 3.0
What's new in hadoop 3.0What's new in hadoop 3.0
What's new in hadoop 3.0
 
1000台規模のHadoopクラスタをHive/Tezアプリケーションにあわせてパフォーマンスチューニングした話
1000台規模のHadoopクラスタをHive/Tezアプリケーションにあわせてパフォーマンスチューニングした話1000台規模のHadoopクラスタをHive/Tezアプリケーションにあわせてパフォーマンスチューニングした話
1000台規模のHadoopクラスタをHive/Tezアプリケーションにあわせてパフォーマンスチューニングした話
 
The Impala Cookbook
The Impala CookbookThe Impala Cookbook
The Impala Cookbook
 

Similar to Analyzing Historical Data of Applications on YARN for Fun and Profit

Similar to Analyzing Historical Data of Applications on YARN for Fun and Profit (20)

Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby Usage
 
Spring-Boot-PQS with Apache Ignite Caching @ HbaseCon PhoenixCon Dataworks su...
Spring-Boot-PQS with Apache Ignite Caching @ HbaseCon PhoenixCon Dataworks su...Spring-Boot-PQS with Apache Ignite Caching @ HbaseCon PhoenixCon Dataworks su...
Spring-Boot-PQS with Apache Ignite Caching @ HbaseCon PhoenixCon Dataworks su...
 
REST APIs
REST APIsREST APIs
REST APIs
 
HBaseConEast2016: How yarn timeline service v.2 unlocks 360 degree platform i...
HBaseConEast2016: How yarn timeline service v.2 unlocks 360 degree platform i...HBaseConEast2016: How yarn timeline service v.2 unlocks 360 degree platform i...
HBaseConEast2016: How yarn timeline service v.2 unlocks 360 degree platform i...
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics Platform
 
JustGiving – Serverless Data Pipelines, API, Messaging and Stream Processing
JustGiving – Serverless Data Pipelines,  API, Messaging and Stream ProcessingJustGiving – Serverless Data Pipelines,  API, Messaging and Stream Processing
JustGiving – Serverless Data Pipelines, API, Messaging and Stream Processing
 
JustGiving | Serverless Data Pipelines, API, Messaging and Stream Processing
JustGiving | Serverless Data Pipelines, API, Messaging and Stream ProcessingJustGiving | Serverless Data Pipelines, API, Messaging and Stream Processing
JustGiving | Serverless Data Pipelines, API, Messaging and Stream Processing
 
Adobe Spark Meetup - 9/19/2018 - San Jose, CA
Adobe Spark Meetup - 9/19/2018 - San Jose, CAAdobe Spark Meetup - 9/19/2018 - San Jose, CA
Adobe Spark Meetup - 9/19/2018 - San Jose, CA
 
WSO2Con USA 2017: Building an Effective API Architecture
WSO2Con USA 2017: Building an Effective API ArchitectureWSO2Con USA 2017: Building an Effective API Architecture
WSO2Con USA 2017: Building an Effective API Architecture
 
Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
 
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
 
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
Ai big dataconference_ml_fastdata_vitalii bondarenko
Ai big dataconference_ml_fastdata_vitalii bondarenkoAi big dataconference_ml_fastdata_vitalii bondarenko
Ai big dataconference_ml_fastdata_vitalii bondarenko
 
Vitalii Bondarenko "Machine Learning on Fast Data"
Vitalii Bondarenko "Machine Learning on Fast Data"Vitalii Bondarenko "Machine Learning on Fast Data"
Vitalii Bondarenko "Machine Learning on Fast Data"
 
SharePoint Saturday San Antonio: SharePoint 2010 Performance
SharePoint Saturday San Antonio: SharePoint 2010 PerformanceSharePoint Saturday San Antonio: SharePoint 2010 Performance
SharePoint Saturday San Antonio: SharePoint 2010 Performance
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
SharePoint Saturday The Conference 2011 - SP2010 Performance
SharePoint Saturday The Conference 2011 - SP2010 PerformanceSharePoint Saturday The Conference 2011 - SP2010 Performance
SharePoint Saturday The Conference 2011 - SP2010 Performance
 

More from DataWorks Summit

HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 

Recently uploaded (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 

Analyzing Historical Data of Applications on YARN for Fun and Profit

  • 1. Analyzing Historical Data of Applications on HadoopYARN: for Fun and Profit Mayank Bansal,Zhijie Shen
  • 2. Agenda • Who we are ? • Why we need New History Server? • Application History Server • Timeline Server • Future Work
  • 3. Who we are • Hadoop Architect @ ebay • Apache Hadoop Committer • Apache Oozie PMC and Committer • Current • Leading Hadoop Core Development for YARN and MapReduce @ ebay • Past • Working on Scheduler / Resource Managers • Working on Distributed Systems • Data Pipeline frameworks Mayank Bansal
  • 4. Who we are • Software Engineer @ Hortonworks • Apache Hadoop Committer • Apache SAMZA PPMC and Committer Zhijie Shen
  • 5. Agenda • Who we are ? • Why we need New History Server? • Application History Server • Application Timeline Server • Future Work
  • 6. MR JobHistory Server • We already have Job History Server • It is only for Map Reduce Customized • Storage is HDFS only • Storage is very MR specific • Counters • Mappers and Reducers • If you have only Map Reduce you are good.
  • 7. Hadoop-2 Single Use System Batch Apps Multi Purpose Platform Batch, Interactive, streaming
  • 9. Issues with current Job History • What if I have other Applications • RM crashes • Hard Limit on # Apps • Upgrades / Updates
  • 10. Agenda • Who we are ? • Why we need New History Server? • Application History Server • Timeline Server • Future Work
  • 11. Application History Server • Separate Process • Pluggable Storage • HDFS • In-Memory • Resource Manager directly writes to Storage • Aggregated Logs • Separate UI, CLI and Rest End Point
  • 12. Application History Server Storage: • It stores generic Data • Application level data (queue, user etc…) • List of ApplicationAttempts • Information about each ApplicationAttempt • List of containers for ApplicationAttempt • Generic information about each container.
  • 14. Application History Server • CLI Interface $ yarn application -status <Application ID> $ yarn applicationattempt -list <Application ID> • REST APIs • http://localhost:8188/ws/v1/applicationhistory/app s/appid
  • 15. Application History Server • Scalability for storage • One file per application • File format is protobuff • Size of HDFS files • Multiple RM threads writing to History Storage # of Containers 100 1K 10 K 100K Size of the File 19 KB 184 KB 1.8 MB 19 MB
  • 16. Agenda • Who we are ? • Why we need New History Server? • Application History Server • Timeline Server • Future Work
  • 17. Timeline Service - Motivation • YARN takes care of it – Relieving the application from monitoring service • Application diversity – Framework specific metadata/metrics
  • 18. Timeline Service – Data Model • Entity Type – An abstract concept of anything • Entity – One specific instance of a entity type – Defining the relationship between entities • Event – Something happens to an entity
  • 19. Timeline Service – Architecture • LevelDB Store • Client Library • REST Interfaces
  • 20. Timeline Service – Store • LevelDB based store – Key-value store – Lightweight – License compatible • Implementing reader/writer interfaces • Support data retention
  • 21. Timeline Service – Client • TimelineClient – Wrap over REST POST method – POJO objects • TimelineEntity • TimelineEvent – In Client/AM/Container
  • 22. Timeline Service – APIs • Rest APIs, JSON as the media • Get timeline entities – http://localhost:8188/ws/v1/timeline/{entityType} • Get timeline entity – http://localhost:8188/ws/v1/timeline/{entityType}/{entityId} • Get timeline events – http://localhost:8188/ws/v1/timeline/{entityType}/events
  • 23. Timeline Service – Security • HTTP SPNEGO • Kerberos Authentication • Delegation Token – Performance – AM/Container no Kerberos • Access Control – Admin/owner – Timeline entity-level Browser or REST clientTimeline Client Timeline Client Client AM 2. Pass DT via AppSubmissionContext 1. Get DT Put Data via Kerberos or DTs 3. Put Data w/ DT Timeline Authen ca on Filter Timeline REST APIs Timeline ACLs Manager Access Verifica on Pass authen cated user informa on Get Data via Kerberos HadoopsupportforKerberos,SPNEGO,ACLs Timeline Service security Timeline Server
  • 24. Timeline Service – Use Case (1)
  • 25. Timeline Service – EarlyAdopter (2)
  • 26. Timeline Service – Early Adopter (3)
  • 27. Agenda • Who we are ? • Why we need New History Server? • Application History Server • Timeline Server • Future Work
  • 28. To Be Continue… • Integrating the generic history and timeline data • Rebasing MR Job history server on the timeline server • Making the timeline server rendering the timeline data
  • 29. To Be Continue… Scale • Leveldb does not handle ebay scale • We need something which can horizontally scale • HBASE

Editor's Notes

  1. This is a topic/content slide. Duplicate as many of these as are needed. Generally, there is one slide per three minutes of talk time.