SlideShare a Scribd company logo
Airflow based Video Encoding Platform
Hotstar
About Hotstar
Hotstar is an online video streaming platform wholly owned by Fox (Disney).
We offers over 120,000 hours of TV content and movies. In this July we
created a record with 100 Million DAU and 25.3 concurrent viewers.
1. Huge amount of videos and heavy encoding service.
2. Various genres and studios, needs ML to classify & optimize.
3. Long-tail distribution, head contents need ultimate encoding efficiency.
Our Challenges
How to manage and visualize vast encoding jobs?
How to optimize pod scheduling with limited resources?
How to get the statistics of all tasks?
How to create tasks dynamically?
Our Platform
Kubernetes
…
AWS S3
VOD-API
Ceph
Airflow
PostgreSQL
CMS
KubernetePod
KubernetePod
KubernetePod
Basic Concepts
• DAG (Dag run)
• Operator(Task)
Bash, Docker, SSH, KubernetesPod...
How manage global variables?
Airflow Variable
1. Key-Value format
2. Can be configured in Web UI,can
also be modified in DAG
3. Can save token and pass it into DAG,
avoid hardcoding passwords in codes
4. Can be used to set environment
specific variables.
How to exchange info between operators?
XCOM
Xcom: “cross-communication”
How to control Kubernetes?
KubernetesPodOperator
Support most KubernetesPod operations. Can limit pod resource allocation
using resources param.
Using the KubernetesPodOperator (Google)
Resources
request & limit
Command to run in Pod
Docker image to use
Build an Airflow encoding work flow
Download
source and
metadata
Encode and
package
CMS Deploy
Verification and
generate
configuration
Scene analysis
and choose
proper params
Backup and
cleanup
Create
thumbnails
QA
More Codecs
Skip Intro
Check and
upload
⌛️
Build an Airflow encoding work flow
Deployment
CI/CD
Build
Docker
Airflow Kubernetes
CI/CD
Update
DAG File
Encoder,Analysis SDK,
Operator scripts
How to get the statistics of all tasks?
Gantt & Task Duration View
How to optimize pod scheduling
with limited resources?
Time Consuming
Statistics
Task Duration view
Dependency
Analysis
Gantt View
Estimate proper
CPU and mem
limits
Task Level
Resource Map
A1CPU + 2GB
C3CPU + 3GB
D5CPU + 4GB
E1CPU + 2GB
B1CPU + 1GB
How to create tasks dynamically?
Operator Placeholder+BranchOperator
Solution
Create redundant operators at buildtime,and place a BranchOperator before them.
At runtime use BranchOperator to control how many tasks to launch,and set
remaining tasks to skip state.
DAG topology is
built before
workflow start and
can not be
changed during
runtime.
Different contents
have different
output layers. So
we need to
launch different
num of tasks at
runtime.
Traps
1. When Dag file is updated, all old (completed) Dags will be displayed in
latest graph too (this is incorrect).
2. DAG file is executed before DagRun workflow start,only commands /
scripts in Operator works at Runtime.
Traps
3. Scheduler will be quite slow when the amount of DAG Run is large.
4. Developing KubernetesPodOperator Dags takes much more time than
writing python/shell scripts.
Summary
1. Decoupling development work and operation work
2. Communities are maturing
3. Flexible and easy to customize
4. Powerful analysis tool
Quick start
1. Install Docker Desktop and enable Kubernetes
2. brew install kubernetes-helm
3. helm install --namespace "airflow" --name "airflow" stable/airflow
4. export POD_NAME=$(kubectl get pods --namespace airflow -l
"component=web,app=airflow" -o jsonpath="{.items[0].metadata.name}")
echo http://127.0.0.1:8080
kubectl port-forward --namespace airflow $POD_NAME 8080:8080
5. Open http://127.0.0.1:8080
Future Improvements
1. Only one scheduler allowed in airflow
2. Batch analysis (Task Duration、Tree View) is inefficient
3. Xcom、Log fulltext search
4. Airflow scheduling of workers across different hardware platforms
Thank you

More Related Content

What's hot

SharePoint Beginner Training for End Users
SharePoint Beginner Training for End UsersSharePoint Beginner Training for End Users
SharePoint Beginner Training for End Users
Gregory Zelfond
 
AI Builder with Power Platform
AI Builder with Power PlatformAI Builder with Power Platform
AI Builder with Power Platform
Cheah Eng Soon
 
Cloudera - The Modern Platform for Analytics
Cloudera - The Modern Platform for AnalyticsCloudera - The Modern Platform for Analytics
Cloudera - The Modern Platform for Analytics
Cloudera, Inc.
 
Strategies for Training End Users How To Use Salesforce
Strategies for Training End Users How To Use SalesforceStrategies for Training End Users How To Use Salesforce
Strategies for Training End Users How To Use Salesforce
Shell Black
 
The Rise Of Event Streaming – Why Apache Kafka Changes Everything
The Rise Of Event Streaming – Why Apache Kafka Changes EverythingThe Rise Of Event Streaming – Why Apache Kafka Changes Everything
The Rise Of Event Streaming – Why Apache Kafka Changes Everything
Kai Wähner
 
Checklist for successful salesforce implementation
Checklist for successful salesforce implementationChecklist for successful salesforce implementation
Checklist for successful salesforce implementation
Cloud Analogy
 
Data Catalogues - Architecting for Collaboration & Self-Service
Data Catalogues - Architecting for Collaboration & Self-ServiceData Catalogues - Architecting for Collaboration & Self-Service
Data Catalogues - Architecting for Collaboration & Self-Service
DATAVERSITY
 
Developing Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaDeveloping Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache Kafka
Joe Stein
 
The journey toward a self-service data platform at Netflix - sf 2019
The journey toward a self-service data platform at Netflix - sf 2019The journey toward a self-service data platform at Netflix - sf 2019
The journey toward a self-service data platform at Netflix - sf 2019
Karthik Murugesan
 
Prospect Research
Prospect ResearchProspect Research
Apache Kafka in Gaming Industry (Games, Mobile, Betting, Gambling, Bookmaker,...
Apache Kafka in Gaming Industry (Games, Mobile, Betting, Gambling, Bookmaker,...Apache Kafka in Gaming Industry (Games, Mobile, Betting, Gambling, Bookmaker,...
Apache Kafka in Gaming Industry (Games, Mobile, Betting, Gambling, Bookmaker,...
Kai Wähner
 
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
DataWorks Summit
 
(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data
(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data
(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data
Amazon Web Services
 
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
HostedbyConfluent
 
Gestión de proyectos predictiva y evolutiva_evolutiva
Gestión de proyectos predictiva y evolutiva_evolutivaGestión de proyectos predictiva y evolutiva_evolutiva
Gestión de proyectos predictiva y evolutiva_evolutiva
Juan Palacio
 
Salesforce Deck Template
Salesforce Deck TemplateSalesforce Deck Template
Salesforce Deck Template
Phil Weinmeister
 
Data Management and Migration in Salesforce
Data Management and Migration in SalesforceData Management and Migration in Salesforce
Data Management and Migration in Salesforce
Sunil kumar
 
Experiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan Zhang
Experiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan ZhangExperiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan Zhang
Experiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan Zhang
Databricks
 
Data Mesh using Microsoft Fabric
Data Mesh using Microsoft FabricData Mesh using Microsoft Fabric
Data Mesh using Microsoft Fabric
Nathan Bijnens
 
Agile Program and Portfolio Management
Agile Program and Portfolio ManagementAgile Program and Portfolio Management
Agile Program and Portfolio Management
Mike Cottmeyer
 

What's hot (20)

SharePoint Beginner Training for End Users
SharePoint Beginner Training for End UsersSharePoint Beginner Training for End Users
SharePoint Beginner Training for End Users
 
AI Builder with Power Platform
AI Builder with Power PlatformAI Builder with Power Platform
AI Builder with Power Platform
 
Cloudera - The Modern Platform for Analytics
Cloudera - The Modern Platform for AnalyticsCloudera - The Modern Platform for Analytics
Cloudera - The Modern Platform for Analytics
 
Strategies for Training End Users How To Use Salesforce
Strategies for Training End Users How To Use SalesforceStrategies for Training End Users How To Use Salesforce
Strategies for Training End Users How To Use Salesforce
 
The Rise Of Event Streaming – Why Apache Kafka Changes Everything
The Rise Of Event Streaming – Why Apache Kafka Changes EverythingThe Rise Of Event Streaming – Why Apache Kafka Changes Everything
The Rise Of Event Streaming – Why Apache Kafka Changes Everything
 
Checklist for successful salesforce implementation
Checklist for successful salesforce implementationChecklist for successful salesforce implementation
Checklist for successful salesforce implementation
 
Data Catalogues - Architecting for Collaboration & Self-Service
Data Catalogues - Architecting for Collaboration & Self-ServiceData Catalogues - Architecting for Collaboration & Self-Service
Data Catalogues - Architecting for Collaboration & Self-Service
 
Developing Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaDeveloping Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache Kafka
 
The journey toward a self-service data platform at Netflix - sf 2019
The journey toward a self-service data platform at Netflix - sf 2019The journey toward a self-service data platform at Netflix - sf 2019
The journey toward a self-service data platform at Netflix - sf 2019
 
Prospect Research
Prospect ResearchProspect Research
Prospect Research
 
Apache Kafka in Gaming Industry (Games, Mobile, Betting, Gambling, Bookmaker,...
Apache Kafka in Gaming Industry (Games, Mobile, Betting, Gambling, Bookmaker,...Apache Kafka in Gaming Industry (Games, Mobile, Betting, Gambling, Bookmaker,...
Apache Kafka in Gaming Industry (Games, Mobile, Betting, Gambling, Bookmaker,...
 
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
 
(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data
(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data
(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data
 
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
 
Gestión de proyectos predictiva y evolutiva_evolutiva
Gestión de proyectos predictiva y evolutiva_evolutivaGestión de proyectos predictiva y evolutiva_evolutiva
Gestión de proyectos predictiva y evolutiva_evolutiva
 
Salesforce Deck Template
Salesforce Deck TemplateSalesforce Deck Template
Salesforce Deck Template
 
Data Management and Migration in Salesforce
Data Management and Migration in SalesforceData Management and Migration in Salesforce
Data Management and Migration in Salesforce
 
Experiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan Zhang
Experiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan ZhangExperiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan Zhang
Experiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan Zhang
 
Data Mesh using Microsoft Fabric
Data Mesh using Microsoft FabricData Mesh using Microsoft Fabric
Data Mesh using Microsoft Fabric
 
Agile Program and Portfolio Management
Agile Program and Portfolio ManagementAgile Program and Portfolio Management
Agile Program and Portfolio Management
 

Similar to Airflow based Video Encoding Platform

What's coming in Airflow 2.0? - NYC Apache Airflow Meetup
What's coming in Airflow 2.0? - NYC Apache Airflow MeetupWhat's coming in Airflow 2.0? - NYC Apache Airflow Meetup
What's coming in Airflow 2.0? - NYC Apache Airflow Meetup
Kaxil Naik
 
Airflow presentation
Airflow presentationAirflow presentation
Airflow presentation
Anant Corporation
 
Serverless for High Performance Computing
Serverless for High Performance ComputingServerless for High Performance Computing
Serverless for High Performance Computing
Luciano Mammino
 
Kubernetes 101
Kubernetes 101Kubernetes 101
Kubernetes 101
Stanislav Pogrebnyak
 
Serverless ETL and Optimization on ML pipeline
Serverless ETL and Optimization on ML pipelineServerless ETL and Optimization on ML pipeline
Serverless ETL and Optimization on ML pipeline
Shu-Jeng Hsieh
 
Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...
Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...
Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...
HostedbyConfluent
 
DevOps - Interview Question.pdf
DevOps - Interview Question.pdfDevOps - Interview Question.pdf
DevOps - Interview Question.pdf
MinhTrnNht7
 
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DMUpgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
Yahoo!デベロッパーネットワーク
 
Ultimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on KubernetesUltimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on Kubernetes
kloia
 
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
Cloudera, Inc.
 
Fluentd Overview, Now and Then
Fluentd Overview, Now and ThenFluentd Overview, Now and Then
Fluentd Overview, Now and Then
SATOSHI TAGOMORI
 
Airflow Best Practises & Roadmap to Airflow 2.0
Airflow Best Practises & Roadmap to Airflow 2.0Airflow Best Practises & Roadmap to Airflow 2.0
Airflow Best Practises & Roadmap to Airflow 2.0
Kaxil Naik
 
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
Chris Fregly
 
DevOps #EMERSONEDUARDORODRIGUES EMERSON EDUARDO RODRIGUES
DevOps #EMERSONEDUARDORODRIGUES EMERSON EDUARDO RODRIGUESDevOps #EMERSONEDUARDORODRIGUES EMERSON EDUARDO RODRIGUES
DevOps #EMERSONEDUARDORODRIGUES EMERSON EDUARDO RODRIGUES
emersonitaliano110
 
Kubernetes for java developers - Tutorial at Oracle Code One 2018
Kubernetes for java developers - Tutorial at Oracle Code One 2018Kubernetes for java developers - Tutorial at Oracle Code One 2018
Kubernetes for java developers - Tutorial at Oracle Code One 2018
Anthony Dahanne
 
DevOps Days Boston 2017: Real-world Kubernetes for DevOps
DevOps Days Boston 2017: Real-world Kubernetes for DevOpsDevOps Days Boston 2017: Real-world Kubernetes for DevOps
DevOps Days Boston 2017: Real-world Kubernetes for DevOps
Ambassador Labs
 
Deep Dive Azure Functions - Global Azure Bootcamp 2019
Deep Dive Azure Functions - Global Azure Bootcamp 2019Deep Dive Azure Functions - Global Azure Bootcamp 2019
Deep Dive Azure Functions - Global Azure Bootcamp 2019
Andrea Tosato
 
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on KubernetesApache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
DataWorks Summit
 
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with KubernetesKubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
SeungYong Oh
 
[AWS Builders] Effective AWS Glue
[AWS Builders] Effective AWS Glue[AWS Builders] Effective AWS Glue
[AWS Builders] Effective AWS Glue
Amazon Web Services Korea
 

Similar to Airflow based Video Encoding Platform (20)

What's coming in Airflow 2.0? - NYC Apache Airflow Meetup
What's coming in Airflow 2.0? - NYC Apache Airflow MeetupWhat's coming in Airflow 2.0? - NYC Apache Airflow Meetup
What's coming in Airflow 2.0? - NYC Apache Airflow Meetup
 
Airflow presentation
Airflow presentationAirflow presentation
Airflow presentation
 
Serverless for High Performance Computing
Serverless for High Performance ComputingServerless for High Performance Computing
Serverless for High Performance Computing
 
Kubernetes 101
Kubernetes 101Kubernetes 101
Kubernetes 101
 
Serverless ETL and Optimization on ML pipeline
Serverless ETL and Optimization on ML pipelineServerless ETL and Optimization on ML pipeline
Serverless ETL and Optimization on ML pipeline
 
Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...
Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...
Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...
 
DevOps - Interview Question.pdf
DevOps - Interview Question.pdfDevOps - Interview Question.pdf
DevOps - Interview Question.pdf
 
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DMUpgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
 
Ultimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on KubernetesUltimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on Kubernetes
 
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
 
Fluentd Overview, Now and Then
Fluentd Overview, Now and ThenFluentd Overview, Now and Then
Fluentd Overview, Now and Then
 
Airflow Best Practises & Roadmap to Airflow 2.0
Airflow Best Practises & Roadmap to Airflow 2.0Airflow Best Practises & Roadmap to Airflow 2.0
Airflow Best Practises & Roadmap to Airflow 2.0
 
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
 
DevOps #EMERSONEDUARDORODRIGUES EMERSON EDUARDO RODRIGUES
DevOps #EMERSONEDUARDORODRIGUES EMERSON EDUARDO RODRIGUESDevOps #EMERSONEDUARDORODRIGUES EMERSON EDUARDO RODRIGUES
DevOps #EMERSONEDUARDORODRIGUES EMERSON EDUARDO RODRIGUES
 
Kubernetes for java developers - Tutorial at Oracle Code One 2018
Kubernetes for java developers - Tutorial at Oracle Code One 2018Kubernetes for java developers - Tutorial at Oracle Code One 2018
Kubernetes for java developers - Tutorial at Oracle Code One 2018
 
DevOps Days Boston 2017: Real-world Kubernetes for DevOps
DevOps Days Boston 2017: Real-world Kubernetes for DevOpsDevOps Days Boston 2017: Real-world Kubernetes for DevOps
DevOps Days Boston 2017: Real-world Kubernetes for DevOps
 
Deep Dive Azure Functions - Global Azure Bootcamp 2019
Deep Dive Azure Functions - Global Azure Bootcamp 2019Deep Dive Azure Functions - Global Azure Bootcamp 2019
Deep Dive Azure Functions - Global Azure Bootcamp 2019
 
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on KubernetesApache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
 
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with KubernetesKubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
 
[AWS Builders] Effective AWS Glue
[AWS Builders] Effective AWS Glue[AWS Builders] Effective AWS Glue
[AWS Builders] Effective AWS Glue
 

More from Hotstar

How Chaos Engineering is practiced at Hotstar
How Chaos Engineering is practiced at HotstarHow Chaos Engineering is practiced at Hotstar
How Chaos Engineering is practiced at Hotstar
Hotstar
 
WebSDK - Switching between service providers
WebSDK - Switching between service providersWebSDK - Switching between service providers
WebSDK - Switching between service providers
Hotstar
 
Remote Config and Beyond
Remote Config and BeyondRemote Config and Beyond
Remote Config and Beyond
Hotstar
 
Analysing high throughput data in real time
Analysing high throughput data in real timeAnalysing high throughput data in real time
Analysing high throughput data in real time
Hotstar
 
Scaling Hotstar.com for 10Mn concurrency
Scaling Hotstar.com for 10Mn concurrencyScaling Hotstar.com for 10Mn concurrency
Scaling Hotstar.com for 10Mn concurrency
Hotstar
 
Build intelligent, real-time applications using Machine Learning
Build intelligent, real-time applications using Machine LearningBuild intelligent, real-time applications using Machine Learning
Build intelligent, real-time applications using Machine Learning
Hotstar
 
Build real time stream processing applications using Apache Kafka
Build real time stream processing applications using Apache KafkaBuild real time stream processing applications using Apache Kafka
Build real time stream processing applications using Apache Kafka
Hotstar
 
Amazon AI Conclave, Bangalore 2017
Amazon AI Conclave, Bangalore 2017 Amazon AI Conclave, Bangalore 2017
Amazon AI Conclave, Bangalore 2017
Hotstar
 

More from Hotstar (8)

How Chaos Engineering is practiced at Hotstar
How Chaos Engineering is practiced at HotstarHow Chaos Engineering is practiced at Hotstar
How Chaos Engineering is practiced at Hotstar
 
WebSDK - Switching between service providers
WebSDK - Switching between service providersWebSDK - Switching between service providers
WebSDK - Switching between service providers
 
Remote Config and Beyond
Remote Config and BeyondRemote Config and Beyond
Remote Config and Beyond
 
Analysing high throughput data in real time
Analysing high throughput data in real timeAnalysing high throughput data in real time
Analysing high throughput data in real time
 
Scaling Hotstar.com for 10Mn concurrency
Scaling Hotstar.com for 10Mn concurrencyScaling Hotstar.com for 10Mn concurrency
Scaling Hotstar.com for 10Mn concurrency
 
Build intelligent, real-time applications using Machine Learning
Build intelligent, real-time applications using Machine LearningBuild intelligent, real-time applications using Machine Learning
Build intelligent, real-time applications using Machine Learning
 
Build real time stream processing applications using Apache Kafka
Build real time stream processing applications using Apache KafkaBuild real time stream processing applications using Apache Kafka
Build real time stream processing applications using Apache Kafka
 
Amazon AI Conclave, Bangalore 2017
Amazon AI Conclave, Bangalore 2017 Amazon AI Conclave, Bangalore 2017
Amazon AI Conclave, Bangalore 2017
 

Recently uploaded

132/33KV substation case study Presentation
132/33KV substation case study Presentation132/33KV substation case study Presentation
132/33KV substation case study Presentation
kandramariana6
 
Modelagem de um CSTR com reação endotermica.pdf
Modelagem de um CSTR com reação endotermica.pdfModelagem de um CSTR com reação endotermica.pdf
Modelagem de um CSTR com reação endotermica.pdf
camseq
 
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
insn4465
 
5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...
5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...
5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...
ihlasbinance2003
 
Heat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation pptHeat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation ppt
mamunhossenbd75
 
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student MemberIEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
VICTOR MAESTRE RAMIREZ
 
Properties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptxProperties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptx
MDSABBIROJJAMANPAYEL
 
Casting-Defect-inSlab continuous casting.pdf
Casting-Defect-inSlab continuous casting.pdfCasting-Defect-inSlab continuous casting.pdf
Casting-Defect-inSlab continuous casting.pdf
zubairahmad848137
 
官方认证美国密歇根州立大学毕业证学位证书原版一模一样
官方认证美国密歇根州立大学毕业证学位证书原版一模一样官方认证美国密歇根州立大学毕业证学位证书原版一模一样
官方认证美国密歇根州立大学毕业证学位证书原版一模一样
171ticu
 
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsKuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
Victor Morales
 
basic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdfbasic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdf
NidhalKahouli2
 
New techniques for characterising damage in rock slopes.pdf
New techniques for characterising damage in rock slopes.pdfNew techniques for characterising damage in rock slopes.pdf
New techniques for characterising damage in rock slopes.pdf
wisnuprabawa3
 
Manufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptxManufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptx
Madan Karki
 
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesHarnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Christina Lin
 
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdfIron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
RadiNasr
 
Eric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball play
Eric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball playEric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball play
Eric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball play
enizeyimana36
 
Engineering Drawings Lecture Detail Drawings 2014.pdf
Engineering Drawings Lecture Detail Drawings 2014.pdfEngineering Drawings Lecture Detail Drawings 2014.pdf
Engineering Drawings Lecture Detail Drawings 2014.pdf
abbyasa1014
 
CSM Cloud Service Management Presentarion
CSM Cloud Service Management PresentarionCSM Cloud Service Management Presentarion
CSM Cloud Service Management Presentarion
rpskprasana
 
Textile Chemical Processing and Dyeing.pdf
Textile Chemical Processing and Dyeing.pdfTextile Chemical Processing and Dyeing.pdf
Textile Chemical Processing and Dyeing.pdf
NazakatAliKhoso2
 
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
Yasser Mahgoub
 

Recently uploaded (20)

132/33KV substation case study Presentation
132/33KV substation case study Presentation132/33KV substation case study Presentation
132/33KV substation case study Presentation
 
Modelagem de um CSTR com reação endotermica.pdf
Modelagem de um CSTR com reação endotermica.pdfModelagem de um CSTR com reação endotermica.pdf
Modelagem de um CSTR com reação endotermica.pdf
 
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
 
5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...
5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...
5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...
 
Heat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation pptHeat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation ppt
 
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student MemberIEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
 
Properties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptxProperties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptx
 
Casting-Defect-inSlab continuous casting.pdf
Casting-Defect-inSlab continuous casting.pdfCasting-Defect-inSlab continuous casting.pdf
Casting-Defect-inSlab continuous casting.pdf
 
官方认证美国密歇根州立大学毕业证学位证书原版一模一样
官方认证美国密歇根州立大学毕业证学位证书原版一模一样官方认证美国密歇根州立大学毕业证学位证书原版一模一样
官方认证美国密歇根州立大学毕业证学位证书原版一模一样
 
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsKuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
 
basic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdfbasic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdf
 
New techniques for characterising damage in rock slopes.pdf
New techniques for characterising damage in rock slopes.pdfNew techniques for characterising damage in rock slopes.pdf
New techniques for characterising damage in rock slopes.pdf
 
Manufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptxManufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptx
 
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesHarnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
 
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdfIron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
 
Eric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball play
Eric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball playEric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball play
Eric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball play
 
Engineering Drawings Lecture Detail Drawings 2014.pdf
Engineering Drawings Lecture Detail Drawings 2014.pdfEngineering Drawings Lecture Detail Drawings 2014.pdf
Engineering Drawings Lecture Detail Drawings 2014.pdf
 
CSM Cloud Service Management Presentarion
CSM Cloud Service Management PresentarionCSM Cloud Service Management Presentarion
CSM Cloud Service Management Presentarion
 
Textile Chemical Processing and Dyeing.pdf
Textile Chemical Processing and Dyeing.pdfTextile Chemical Processing and Dyeing.pdf
Textile Chemical Processing and Dyeing.pdf
 
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
 

Airflow based Video Encoding Platform

  • 1. Airflow based Video Encoding Platform Hotstar
  • 2. About Hotstar Hotstar is an online video streaming platform wholly owned by Fox (Disney). We offers over 120,000 hours of TV content and movies. In this July we created a record with 100 Million DAU and 25.3 concurrent viewers. 1. Huge amount of videos and heavy encoding service. 2. Various genres and studios, needs ML to classify & optimize. 3. Long-tail distribution, head contents need ultimate encoding efficiency.
  • 3. Our Challenges How to manage and visualize vast encoding jobs? How to optimize pod scheduling with limited resources? How to get the statistics of all tasks? How to create tasks dynamically?
  • 5. Basic Concepts • DAG (Dag run) • Operator(Task) Bash, Docker, SSH, KubernetesPod...
  • 6. How manage global variables? Airflow Variable 1. Key-Value format 2. Can be configured in Web UI,can also be modified in DAG 3. Can save token and pass it into DAG, avoid hardcoding passwords in codes 4. Can be used to set environment specific variables.
  • 7. How to exchange info between operators? XCOM Xcom: “cross-communication”
  • 8. How to control Kubernetes? KubernetesPodOperator Support most KubernetesPod operations. Can limit pod resource allocation using resources param. Using the KubernetesPodOperator (Google) Resources request & limit Command to run in Pod Docker image to use
  • 9. Build an Airflow encoding work flow Download source and metadata Encode and package CMS Deploy Verification and generate configuration Scene analysis and choose proper params Backup and cleanup Create thumbnails QA More Codecs Skip Intro Check and upload ⌛️
  • 10. Build an Airflow encoding work flow
  • 12. How to get the statistics of all tasks? Gantt & Task Duration View
  • 13. How to optimize pod scheduling with limited resources? Time Consuming Statistics Task Duration view Dependency Analysis Gantt View Estimate proper CPU and mem limits Task Level Resource Map A1CPU + 2GB C3CPU + 3GB D5CPU + 4GB E1CPU + 2GB B1CPU + 1GB
  • 14. How to create tasks dynamically? Operator Placeholder+BranchOperator Solution Create redundant operators at buildtime,and place a BranchOperator before them. At runtime use BranchOperator to control how many tasks to launch,and set remaining tasks to skip state. DAG topology is built before workflow start and can not be changed during runtime. Different contents have different output layers. So we need to launch different num of tasks at runtime.
  • 15. Traps 1. When Dag file is updated, all old (completed) Dags will be displayed in latest graph too (this is incorrect). 2. DAG file is executed before DagRun workflow start,only commands / scripts in Operator works at Runtime.
  • 16. Traps 3. Scheduler will be quite slow when the amount of DAG Run is large. 4. Developing KubernetesPodOperator Dags takes much more time than writing python/shell scripts.
  • 17. Summary 1. Decoupling development work and operation work 2. Communities are maturing 3. Flexible and easy to customize 4. Powerful analysis tool
  • 18. Quick start 1. Install Docker Desktop and enable Kubernetes 2. brew install kubernetes-helm 3. helm install --namespace "airflow" --name "airflow" stable/airflow 4. export POD_NAME=$(kubectl get pods --namespace airflow -l "component=web,app=airflow" -o jsonpath="{.items[0].metadata.name}") echo http://127.0.0.1:8080 kubectl port-forward --namespace airflow $POD_NAME 8080:8080 5. Open http://127.0.0.1:8080
  • 19. Future Improvements 1. Only one scheduler allowed in airflow 2. Batch analysis (Task Duration、Tree View) is inefficient 3. Xcom、Log fulltext search 4. Airflow scheduling of workers across different hardware platforms