SlideShare a Scribd company logo
1 of 20
Download to read offline
July 2020
Tao Feng | @feng-tao | Engineer, Lyft Data Platform
Blog: go.lyft.com/airflowblog
Airflow @ Lyft
2
Who
● Engineer at Lyft Data Platform and Tools
● Apache Airflow PMC and Committer
● Working on different data products (Airflow, Amundsen, etc)
● Previously at Linkedin, Oracle
Agenda
• Data Platform @ Lyft
• Airflow Customization @ Lyft
• Current Focus For Airflow @ Lyft
• Summary
3
Data Platform @ Lyft
4
About Lyft
MISSION: Improve people's life
with the world's best
transportation
Lyft’s data analytics platform architecture
Backend Services
Mobile app
PubSub
Events Batch ETL
Presto, Hive Client,
and BI Tools
Airflow main use cases @ Lyft
7
Airflow usage @ Lyft
8
● Two Clusters
● Celery Executors
Airflow Customization @
Lyft
9
Airflow customization @ Lyft
• UI auditing
• DAG dependency graph
10
Airflow customization @ Lyft
• Extra link for task instance UI panel
11
● Hive query log
● Dr elephant report for performance tuning
● Hive job analysis dashboard
Airflow customization @ Lyft
• Amundsen is an open-sourced data discovery portal.
• It is integrated with Airflow to show the task and table lineage.
• It is currently used by 18+ companies.
12
Current Focus For
Airflow @ Lyft
13
ETL Expiration System
14
• Lots of ETLs are not well maintained with no clear ownership.
• Built an ETL Expiration system to:
‒ Disabled DAGs with expired TTLs (DAG owner needs to renew the TTL every six
months).
‒ Disabled DAGs that produced unused datasets
‒ Disabled DAGs that are failing for a long time
PY2 -> PY3
• Built a dashboard to understand PY3
issue.
‒ Most issues are related to string encoding or
string and integer comparison.
• DAG loading time is higher in py3
compared to py2
‒ Cherry pick a few performance improvement
patches from upstream
15
Airflow Upgrade
• Leverage new features:
‒ DAG serialization
‒ RBAC
‒ Data Lineage
‒ Performance Improvements
• Current status:
‒ Built a new multi-tenant cluster to onboard new use cases.
‒ Finishing PY3 upgrade for legacy DAGs.
‒ Converting the existing legacy mono DAG repo as another tenant on the new
cluster.
16
Summary
17
Summary
18
• Covers Lyft data platform in general
• Discusses about Airflow customization at Lyft
• Discusses about Airflow current work at Lyft
Acknowledgement
19
• Members who maintain Airflow at Lyft
‒ Andrew Stahlman
‒ Bhanu Renukuntla
‒ Chao-han Tsai (committer)
‒ Jinhyuk Chang
‒ Junda Yang
‒ Max Payton
‒ Sherry Zhao
‒ Shenghu Yang (EM)
‒ Tao Feng (committer)
• Thanks Maxime for his guidance
Tao Feng | @feng-tao
Blog at go.lyft.com/airflowblog
20

More Related Content

What's hot

Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache KafkaKafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafkaconfluent
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingTill Rohrmann
 
SeaweedFS introduction
SeaweedFS introductionSeaweedFS introduction
SeaweedFS introductionchrislusf
 
Introduction to Kafka connect
Introduction to Kafka connectIntroduction to Kafka connect
Introduction to Kafka connectKnoldus Inc.
 
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021StreamNative
 
OpenTelemetry For Developers
OpenTelemetry For DevelopersOpenTelemetry For Developers
OpenTelemetry For DevelopersKevin Brockhoff
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka StreamsGuozhang Wang
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergFlink Forward
 
Real-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkReal-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkDataWorks Summit
 
Server monitoring using grafana and prometheus
Server monitoring using grafana and prometheusServer monitoring using grafana and prometheus
Server monitoring using grafana and prometheusCeline George
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Flink Forward
 
Data council sf amundsen presentation
Data council sf    amundsen presentationData council sf    amundsen presentation
Data council sf amundsen presentationTao Feng
 
Designing a complete ci cd pipeline using argo events, workflow and cd products
Designing a complete ci cd pipeline using argo events, workflow and cd productsDesigning a complete ci cd pipeline using argo events, workflow and cd products
Designing a complete ci cd pipeline using argo events, workflow and cd productsJulian Mazzitelli
 
The top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scaleThe top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scaleFlink Forward
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroDatabricks
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversScyllaDB
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkFlink Forward
 

What's hot (20)

Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache KafkaKafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
 
SeaweedFS introduction
SeaweedFS introductionSeaweedFS introduction
SeaweedFS introduction
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 
Introduction to Kafka connect
Introduction to Kafka connectIntroduction to Kafka connect
Introduction to Kafka connect
 
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
 
OpenTelemetry For Developers
OpenTelemetry For DevelopersOpenTelemetry For Developers
OpenTelemetry For Developers
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 
Real-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkReal-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache Flink
 
Server monitoring using grafana and prometheus
Server monitoring using grafana and prometheusServer monitoring using grafana and prometheus
Server monitoring using grafana and prometheus
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
Data council sf amundsen presentation
Data council sf    amundsen presentationData council sf    amundsen presentation
Data council sf amundsen presentation
 
Designing a complete ci cd pipeline using argo events, workflow and cd products
Designing a complete ci cd pipeline using argo events, workflow and cd productsDesigning a complete ci cd pipeline using argo events, workflow and cd products
Designing a complete ci cd pipeline using argo events, workflow and cd products
 
The top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scaleThe top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scale
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the Covers
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 

Similar to Lyft's Use of Airflow for Data Pipelines

Airflow at lyft
Airflow at lyftAirflow at lyft
Airflow at lyftTao Feng
 
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...GetInData
 
Using Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformUsing Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformDatabricks
 
Apache Flink: Past, Present and Future
Apache Flink: Past, Present and FutureApache Flink: Past, Present and Future
Apache Flink: Past, Present and FutureGyula Fóra
 
Implementation of the new REST API for Open Source LBS-platform Geo2Tag
Implementation of the new REST API for Open Source LBS-platform Geo2TagImplementation of the new REST API for Open Source LBS-platform Geo2Tag
Implementation of the new REST API for Open Source LBS-platform Geo2TagOSLL
 
Intro to InfluxDB 2.0 and Your First Flux Query by Sonia Gupta
Intro to InfluxDB 2.0 and Your First Flux Query by Sonia GuptaIntro to InfluxDB 2.0 and Your First Flux Query by Sonia Gupta
Intro to InfluxDB 2.0 and Your First Flux Query by Sonia GuptaInfluxData
 
Upcoming features in Airflow 2
Upcoming features in Airflow 2Upcoming features in Airflow 2
Upcoming features in Airflow 2Kaxil Naik
 
The Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futureThe Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futuremarkgrover
 
Lyft data Platform - 2019 slides
Lyft data Platform - 2019 slidesLyft data Platform - 2019 slides
Lyft data Platform - 2019 slidesKarthik Murugesan
 
(ATS3-PLAT08) Optimizing Protocol Performance
(ATS3-PLAT08) Optimizing Protocol Performance(ATS3-PLAT08) Optimizing Protocol Performance
(ATS3-PLAT08) Optimizing Protocol PerformanceBIOVIA
 
IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform
 IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform
IoT Ingestion & Analytics using Apache Apex - A Native Hadoop PlatformApache Apex
 
Apache Flink Adoption at Shopify
Apache Flink Adoption at ShopifyApache Flink Adoption at Shopify
Apache Flink Adoption at ShopifyYaroslav Tkachenko
 
Kettleetltool 090522005630-phpapp01
Kettleetltool 090522005630-phpapp01Kettleetltool 090522005630-phpapp01
Kettleetltool 090522005630-phpapp01jade_22
 
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Yohei Onishi
 
Cloud Foundry Roadmap Update - OSCON - May 2017
Cloud Foundry Roadmap Update - OSCON - May 2017Cloud Foundry Roadmap Update - OSCON - May 2017
Cloud Foundry Roadmap Update - OSCON - May 2017Chip Childers
 
SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration)
SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration) SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration)
SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration) Surendar S
 

Similar to Lyft's Use of Airflow for Data Pipelines (20)

Airflow at lyft
Airflow at lyftAirflow at lyft
Airflow at lyft
 
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
 
Using Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformUsing Databricks as an Analysis Platform
Using Databricks as an Analysis Platform
 
Apache Flink: Past, Present and Future
Apache Flink: Past, Present and FutureApache Flink: Past, Present and Future
Apache Flink: Past, Present and Future
 
Airflow presentation
Airflow presentationAirflow presentation
Airflow presentation
 
Implementation of the new REST API for Open Source LBS-platform Geo2Tag
Implementation of the new REST API for Open Source LBS-platform Geo2TagImplementation of the new REST API for Open Source LBS-platform Geo2Tag
Implementation of the new REST API for Open Source LBS-platform Geo2Tag
 
Intro to InfluxDB 2.0 and Your First Flux Query by Sonia Gupta
Intro to InfluxDB 2.0 and Your First Flux Query by Sonia GuptaIntro to InfluxDB 2.0 and Your First Flux Query by Sonia Gupta
Intro to InfluxDB 2.0 and Your First Flux Query by Sonia Gupta
 
Upcoming features in Airflow 2
Upcoming features in Airflow 2Upcoming features in Airflow 2
Upcoming features in Airflow 2
 
The Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futureThe Lyft data platform: Now and in the future
The Lyft data platform: Now and in the future
 
Lyft data Platform - 2019 slides
Lyft data Platform - 2019 slidesLyft data Platform - 2019 slides
Lyft data Platform - 2019 slides
 
(ATS3-PLAT08) Optimizing Protocol Performance
(ATS3-PLAT08) Optimizing Protocol Performance(ATS3-PLAT08) Optimizing Protocol Performance
(ATS3-PLAT08) Optimizing Protocol Performance
 
IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform
 IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform
IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform
 
Apache Flink Adoption at Shopify
Apache Flink Adoption at ShopifyApache Flink Adoption at Shopify
Apache Flink Adoption at Shopify
 
Kettleetltool 090522005630-phpapp01
Kettleetltool 090522005630-phpapp01Kettleetltool 090522005630-phpapp01
Kettleetltool 090522005630-phpapp01
 
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
 
Apache Airflow
Apache AirflowApache Airflow
Apache Airflow
 
Apache Airflow
Apache AirflowApache Airflow
Apache Airflow
 
Cloud Foundry Roadmap Update - OSCON - May 2017
Cloud Foundry Roadmap Update - OSCON - May 2017Cloud Foundry Roadmap Update - OSCON - May 2017
Cloud Foundry Roadmap Update - OSCON - May 2017
 
Airflow Intro-1.pdf
Airflow Intro-1.pdfAirflow Intro-1.pdf
Airflow Intro-1.pdf
 
SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration)
SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration) SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration)
SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration)
 

More from Tao Feng

Strata sf - Amundsen presentation
Strata sf - Amundsen presentationStrata sf - Amundsen presentation
Strata sf - Amundsen presentationTao Feng
 
Odp - On demand profiler (ICPE 2018)
Odp - On demand profiler (ICPE 2018)Odp - On demand profiler (ICPE 2018)
Odp - On demand profiler (ICPE 2018)Tao Feng
 
Effective Multi-stream Joining in Apache Samza Framework
Effective Multi-stream Joining in Apache Samza FrameworkEffective Multi-stream Joining in Apache Samza Framework
Effective Multi-stream Joining in Apache Samza FrameworkTao Feng
 
A memory capacity model for high performing data-filtering applications in Sa...
A memory capacity model for high performing data-filtering applications in Sa...A memory capacity model for high performing data-filtering applications in Sa...
A memory capacity model for high performing data-filtering applications in Sa...Tao Feng
 
Samza memory capacity_2015_ieee_big_data_data_quality_workshop
Samza memory capacity_2015_ieee_big_data_data_quality_workshopSamza memory capacity_2015_ieee_big_data_data_quality_workshop
Samza memory capacity_2015_ieee_big_data_data_quality_workshopTao Feng
 
Benchmarking Apache Samza: 1.2 million messages per sec per node
Benchmarking Apache Samza: 1.2 million messages per sec per nodeBenchmarking Apache Samza: 1.2 million messages per sec per node
Benchmarking Apache Samza: 1.2 million messages per sec per nodeTao Feng
 

More from Tao Feng (6)

Strata sf - Amundsen presentation
Strata sf - Amundsen presentationStrata sf - Amundsen presentation
Strata sf - Amundsen presentation
 
Odp - On demand profiler (ICPE 2018)
Odp - On demand profiler (ICPE 2018)Odp - On demand profiler (ICPE 2018)
Odp - On demand profiler (ICPE 2018)
 
Effective Multi-stream Joining in Apache Samza Framework
Effective Multi-stream Joining in Apache Samza FrameworkEffective Multi-stream Joining in Apache Samza Framework
Effective Multi-stream Joining in Apache Samza Framework
 
A memory capacity model for high performing data-filtering applications in Sa...
A memory capacity model for high performing data-filtering applications in Sa...A memory capacity model for high performing data-filtering applications in Sa...
A memory capacity model for high performing data-filtering applications in Sa...
 
Samza memory capacity_2015_ieee_big_data_data_quality_workshop
Samza memory capacity_2015_ieee_big_data_data_quality_workshopSamza memory capacity_2015_ieee_big_data_data_quality_workshop
Samza memory capacity_2015_ieee_big_data_data_quality_workshop
 
Benchmarking Apache Samza: 1.2 million messages per sec per node
Benchmarking Apache Samza: 1.2 million messages per sec per nodeBenchmarking Apache Samza: 1.2 million messages per sec per node
Benchmarking Apache Samza: 1.2 million messages per sec per node
 

Recently uploaded

Module-1-(Building Acoustics) Noise Control (Unit-3). pdf
Module-1-(Building Acoustics) Noise Control (Unit-3). pdfModule-1-(Building Acoustics) Noise Control (Unit-3). pdf
Module-1-(Building Acoustics) Noise Control (Unit-3). pdfManish Kumar
 
Comprehensive energy systems.pdf Comprehensive energy systems.pdf
Comprehensive energy systems.pdf Comprehensive energy systems.pdfComprehensive energy systems.pdf Comprehensive energy systems.pdf
Comprehensive energy systems.pdf Comprehensive energy systems.pdfalene1
 
OOP concepts -in-Python programming language
OOP concepts -in-Python programming languageOOP concepts -in-Python programming language
OOP concepts -in-Python programming languageSmritiSharma901052
 
11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdfHafizMudaserAhmad
 
Research Methodology for Engineering pdf
Research Methodology for Engineering pdfResearch Methodology for Engineering pdf
Research Methodology for Engineering pdfCaalaaAbdulkerim
 
Robotics Group 10 (Control Schemes) cse.pdf
Robotics Group 10  (Control Schemes) cse.pdfRobotics Group 10  (Control Schemes) cse.pdf
Robotics Group 10 (Control Schemes) cse.pdfsahilsajad201
 
Gravity concentration_MI20612MI_________
Gravity concentration_MI20612MI_________Gravity concentration_MI20612MI_________
Gravity concentration_MI20612MI_________Romil Mishra
 
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMSHigh Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMSsandhya757531
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catcherssdickerson1
 
Energy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxEnergy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxsiddharthjain2303
 
CS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdfCS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdfBalamuruganV28
 
Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxRomil Mishra
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONjhunlian
 
System Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingSystem Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingBootNeck1
 
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...Stork
 
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...Sumanth A
 
Virtual memory management in Operating System
Virtual memory management in Operating SystemVirtual memory management in Operating System
Virtual memory management in Operating SystemRashmi Bhat
 
"Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ..."Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ...Erbil Polytechnic University
 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substationstephanwindworld
 

Recently uploaded (20)

Module-1-(Building Acoustics) Noise Control (Unit-3). pdf
Module-1-(Building Acoustics) Noise Control (Unit-3). pdfModule-1-(Building Acoustics) Noise Control (Unit-3). pdf
Module-1-(Building Acoustics) Noise Control (Unit-3). pdf
 
Comprehensive energy systems.pdf Comprehensive energy systems.pdf
Comprehensive energy systems.pdf Comprehensive energy systems.pdfComprehensive energy systems.pdf Comprehensive energy systems.pdf
Comprehensive energy systems.pdf Comprehensive energy systems.pdf
 
OOP concepts -in-Python programming language
OOP concepts -in-Python programming languageOOP concepts -in-Python programming language
OOP concepts -in-Python programming language
 
11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf
 
Research Methodology for Engineering pdf
Research Methodology for Engineering pdfResearch Methodology for Engineering pdf
Research Methodology for Engineering pdf
 
Robotics Group 10 (Control Schemes) cse.pdf
Robotics Group 10  (Control Schemes) cse.pdfRobotics Group 10  (Control Schemes) cse.pdf
Robotics Group 10 (Control Schemes) cse.pdf
 
Gravity concentration_MI20612MI_________
Gravity concentration_MI20612MI_________Gravity concentration_MI20612MI_________
Gravity concentration_MI20612MI_________
 
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMSHigh Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
 
Energy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxEnergy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptx
 
CS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdfCS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdf
 
Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptx
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
 
System Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingSystem Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event Scheduling
 
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
 
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
 
Virtual memory management in Operating System
Virtual memory management in Operating SystemVirtual memory management in Operating System
Virtual memory management in Operating System
 
Designing pile caps according to ACI 318-19.pptx
Designing pile caps according to ACI 318-19.pptxDesigning pile caps according to ACI 318-19.pptx
Designing pile caps according to ACI 318-19.pptx
 
"Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ..."Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ...
 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substation
 

Lyft's Use of Airflow for Data Pipelines

  • 1. July 2020 Tao Feng | @feng-tao | Engineer, Lyft Data Platform Blog: go.lyft.com/airflowblog Airflow @ Lyft
  • 2. 2 Who ● Engineer at Lyft Data Platform and Tools ● Apache Airflow PMC and Committer ● Working on different data products (Airflow, Amundsen, etc) ● Previously at Linkedin, Oracle
  • 3. Agenda • Data Platform @ Lyft • Airflow Customization @ Lyft • Current Focus For Airflow @ Lyft • Summary 3
  • 5. About Lyft MISSION: Improve people's life with the world's best transportation
  • 6. Lyft’s data analytics platform architecture Backend Services Mobile app PubSub Events Batch ETL Presto, Hive Client, and BI Tools
  • 7. Airflow main use cases @ Lyft 7
  • 8. Airflow usage @ Lyft 8 ● Two Clusters ● Celery Executors
  • 10. Airflow customization @ Lyft • UI auditing • DAG dependency graph 10
  • 11. Airflow customization @ Lyft • Extra link for task instance UI panel 11 ● Hive query log ● Dr elephant report for performance tuning ● Hive job analysis dashboard
  • 12. Airflow customization @ Lyft • Amundsen is an open-sourced data discovery portal. • It is integrated with Airflow to show the task and table lineage. • It is currently used by 18+ companies. 12
  • 14. ETL Expiration System 14 • Lots of ETLs are not well maintained with no clear ownership. • Built an ETL Expiration system to: ‒ Disabled DAGs with expired TTLs (DAG owner needs to renew the TTL every six months). ‒ Disabled DAGs that produced unused datasets ‒ Disabled DAGs that are failing for a long time
  • 15. PY2 -> PY3 • Built a dashboard to understand PY3 issue. ‒ Most issues are related to string encoding or string and integer comparison. • DAG loading time is higher in py3 compared to py2 ‒ Cherry pick a few performance improvement patches from upstream 15
  • 16. Airflow Upgrade • Leverage new features: ‒ DAG serialization ‒ RBAC ‒ Data Lineage ‒ Performance Improvements • Current status: ‒ Built a new multi-tenant cluster to onboard new use cases. ‒ Finishing PY3 upgrade for legacy DAGs. ‒ Converting the existing legacy mono DAG repo as another tenant on the new cluster. 16
  • 18. Summary 18 • Covers Lyft data platform in general • Discusses about Airflow customization at Lyft • Discusses about Airflow current work at Lyft
  • 19. Acknowledgement 19 • Members who maintain Airflow at Lyft ‒ Andrew Stahlman ‒ Bhanu Renukuntla ‒ Chao-han Tsai (committer) ‒ Jinhyuk Chang ‒ Junda Yang ‒ Max Payton ‒ Sherry Zhao ‒ Shenghu Yang (EM) ‒ Tao Feng (committer) • Thanks Maxime for his guidance
  • 20. Tao Feng | @feng-tao Blog at go.lyft.com/airflowblog 20