How to Develop and Operate Cloud Native Data Platforms and Applications
Du Li
Serena Wang, Yen Feng, Tony Ma, Preethi Ganeshan, Tushar Agarwal,
Kaiyu Liu, Nitish Victor, Yu Jin, Sundeep Narravula
Electronic Arts (EA)
Data Orchestration Summit, 11/7/2019
Addressing Big Data Challenges at EA
Tech stack layers: Applications, Platforms, Infrastructure
● Seasonal and daily fluctuations in data traffic
● Need to scale compute resources up and down frequently and quickly
General problems in cloud-based tech stacks
● Reasons: Varying degrees of automation exist at different layers
○ Infrastructure (as Code): fully automated
○ Platforms: manually managed
○ Applications: manually managed
● Consequences: High operational costs in platforms and applications
○ Lack of built-in automation, hard to scale up/down
○ Mismatch of automation, hard to keep different layers in sync
○ Engineers busy with ops, limited time for dev and innovation
● Solutions: Redesign or upgrade to cloud-native architectures
One tale of two stories ...
● Monitoring Systems
● Data Platforms
● Key Takeaways
How orchestration helped solve the problems ...
The life of an ops team with 1,000+ alerts per day ...
What a monitoring system is like ...
● Components: Metrics Sensors, Metrics Aggregator, Alerts Generator
● Each component is driven by its own confs, which specify:
○ Which machine runs what, and how to do health checks
○ From which machines to collect metrics
○ What metrics to collect, and where to send them
Mismatch of automation causing ops nightmare ...
● Monitoring components: manually configured, little or no automation
● Infrastructure: fully automated, computers added and removed rapidly
● The mismatch makes it hard to keep configurations up to date
● Consequence: many false alerts, while many resources go unmonitored
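The mismatch can be made concrete as two set differences between what is actually running and what the monitoring confs cover. A minimal Python sketch, assuming both host lists are already available; how they are fetched is out of scope, and the names `find_drift` and `Drift` are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Drift:
    stale: frozenset        # monitored but no longer running: source of false alerts
    unmonitored: frozenset  # running but absent from the confs: blind spots


def find_drift(live_hosts, monitored_hosts):
    """Compare live infrastructure with monitoring confs and report both mismatches."""
    live = frozenset(live_hosts)
    monitored = frozenset(monitored_hosts)
    return Drift(stale=monitored - live, unmonitored=live - monitored)


if __name__ == "__main__":
    # Illustrative host names only.
    drift = find_drift({"etl-01", "etl-02", "presto-07"}, {"etl-01", "etl-03"})
    print(sorted(drift.stale))        # ['etl-03']
    print(sorted(drift.unmonitored))  # ['etl-02', 'presto-07']
```

With manually edited confs, both sets grow every time the infrastructure layer adds or removes machines, which matches the two symptoms on the slide.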
The Solution: Monitoring As Code
Diagram: a Configuration Orchestration layer sits on top of the Metrics Sensors, Metrics Aggregators, and Alerts Generator and manages all of their confs
● Keep monitoring confs in sync with hardware status
● Without modifying any underlying monitoring components
Monitoring As Code: Configuration Orchestration
Diagram: four inputs feed the Auto Configuration Generators, which run in CI/CD pipelines and produce the confs for the Alerts Generator, Metrics Aggregators, and Metrics Sensors:
○ Which machine runs what (from AWS)
○ How monitoring components are wired
○ Library of methods of how to check
○ What to check for each cluster
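The four inputs on this slide can be illustrated with a minimal Python sketch of an auto configuration generator. Every name below (`INVENTORY`, `CHECK_LIBRARY`, `CLUSTER_CHECKS`, `WIRING`, the output layout, and the health-check endpoints) is a hypothetical assumption; the talk does not show EA's actual configuration format.

```python
import json

# Which machine runs what (in practice, pulled from the cloud inventory, e.g. AWS).
INVENTORY = {
    "etl-01": {"cluster": "ocean", "services": ["yarn-nodemanager"]},
    "etl-02": {"cluster": "ocean", "services": ["yarn-nodemanager"]},
    "presto-01": {"cluster": "pond", "services": ["presto-worker"]},
}

# Library of methods of how to check each service (illustrative endpoints).
CHECK_LIBRARY = {
    "yarn-nodemanager": {"type": "http", "port": 8042, "path": "/ws/v1/node/info"},
    "presto-worker": {"type": "http", "port": 8080, "path": "/v1/info"},
}

# What to check for each cluster.
CLUSTER_CHECKS = {"ocean": ["yarn-nodemanager"], "pond": ["presto-worker"]}

# How monitoring components are wired: which aggregator serves each cluster.
WIRING = {"ocean": "aggregator-a", "pond": "aggregator-b"}


def generate(inventory, library, cluster_checks, wiring):
    """Derive sensor, aggregator, and alert confs from the four inputs."""
    sensors, aggregators, alerts = {}, {}, []
    for host, info in sorted(inventory.items()):
        cluster = info["cluster"]
        wanted = [s for s in info["services"] if s in cluster_checks.get(cluster, [])]
        sensors[host] = {
            "send_to": wiring[cluster],
            "checks": {svc: library[svc] for svc in wanted},
        }
        aggregators.setdefault(wiring[cluster], []).append(host)
        alerts.extend({"host": host, "service": svc} for svc in wanted)
    return {"sensors": sensors, "aggregators": aggregators, "alerts": alerts}


if __name__ == "__main__":
    confs = generate(INVENTORY, CHECK_LIBRARY, CLUSTER_CHECKS, WIRING)
    print(json.dumps(confs, indent=2, sort_keys=True))
```

Because the confs are derived rather than hand-edited, a host that disappears from the inventory drops out of every conf on the next run, which is what keeps monitoring in sync with the infrastructure.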
Monitoring As Code
Workflow: updates => review/merge => deploy by CI/CD pipelines
● All monitoring configurations are managed as one GitLab project
● Standard structures are imposed on all configurations so that they can be generated automatically
● Configurations are treated the same way as software code
○ Updates are reviewed and traceable
○ Auto-deployed by CI/CD
● Essentially an orchestration layer on top of the monitoring components, without modifying any of them
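The review/merge => deploy flow can also gate on automated checks. A minimal sketch of one such CI step, assuming the generated confs are a dict of sensors and aggregators; the shape, the `validate`/`main` names, and the specific rules are hypothetical, not taken from the talk:

```python
import sys


def validate(confs):
    """Return a list of human-readable problems; an empty list means safe to deploy."""
    problems = []
    aggregators = confs.get("aggregators", {})
    for host, sensor in sorted(confs.get("sensors", {}).items()):
        target = sensor.get("send_to")
        if target not in aggregators:
            problems.append(f"{host}: sends to unknown aggregator {target!r}")
        elif host not in aggregators[target]:
            problems.append(f"{host}: not listed by aggregator {target!r}")
        if not sensor.get("checks"):
            problems.append(f"{host}: no checks configured")
    return problems


def main(confs):
    """Print problems to stderr and return a process exit code for the CI job."""
    problems = validate(confs)
    for p in problems:
        print("ERROR:", p, file=sys.stderr)
    # A non-zero exit code fails the CI job and blocks the deploy stage.
    return 1 if problems else 0
```

In a pipeline this would be invoked as `sys.exit(main(load_generated_confs()))`, where `load_generated_confs` is a hypothetical loader. Catching a miswired sensor at merge time is far cheaper than discovering it as a silent gap in production monitoring.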
● 1/1/2019 - 4/25/2019: daily average dropped from ~1,000 alerts to ~100 alerts, despite a growing number of hosts and services
● Since 6/2019: ~10-20 alerts per day on average
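Expressed as relative reductions against the ~1,000 alerts/day baseline (a rough calculation on the slide's approximate figures; it does not normalize for the growth in hosts and services, which would make the per-host improvement larger):

```python
def reduction_pct(before, after):
    """Percentage reduction from `before` to `after` (both per-day averages)."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (before - after) / before


# Approximate figures from the slide.
print(round(reduction_pct(1000, 100)))  # 1/1/2019 - 4/25/2019: 90
print(round(reduction_pct(1000, 20)), round(reduction_pct(1000, 10)))  # since 6/2019: 98 99
```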
One tale of two stories ...
● Monitoring Systems
● Data Platforms
● Key Takeaways
How orchestration helped solve the problems ...
Two clusters
● Ocean: internal ETL jobs
○ Caches only recent data in HDFS
○ ETL jobs read/write on HDFS
○ Backs up data to S3 and updates the analytics metastore with the new data
○ Legacy tech stacks
● Pond: external customer workloads
○ Runs analytic queries directly on S3, possibly over the entire history of data
○ Some queries turn into light ETL jobs
○ Some analytic ETL jobs depend on regular ETL jobs
○ Mixed legacy and modern tech stacks
Main pain points
● Two clusters with duplicates
● Hadoop hard to scale
● HDFS used as cache
○ Complex custom code for data backup, purge, retention, and loading
● Fragmented address spaces
○ HDFS (for ETL) holds recent data; S3 (for analytics) holds all data
○ Two Hive metastores: one for ETL and one for analytic workloads
○ Complex syncing between jobs and between metastores
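To show why "HDFS used as cache" needs so much custom code, here is a minimal sketch of the bookkeeping it implies: deciding what to back up, purge, and reload. It assumes daily partitions keyed by date; `plan`, `RETENTION_DAYS`, and the rules are hypothetical and not EA's implementation, and the matching metastore updates are not shown.

```python
from datetime import date, timedelta

RETENTION_DAYS = 30  # illustrative cache window for "recent data"


def plan(today, hdfs_partitions, s3_partitions, requested):
    """Return (backup, purge, load) sets of partition dates for one maintenance run."""
    cutoff = today - timedelta(days=RETENTION_DAYS)
    hdfs, s3 = set(hdfs_partitions), set(s3_partitions)
    # New data must reach S3 before it is safe to evict from HDFS.
    backup = hdfs - s3
    # Only purge partitions that are outside the window AND already in S3.
    purge = {d for d in hdfs if d < cutoff and d in s3}
    # Reload requested partitions that are not cached but exist in S3.
    load = {d for d in requested if d not in hdfs and d in s3}
    return backup, purge, load
```

An old partition that has not been backed up yet is copied on this run and purged on a later one. Every such rule has to stay consistent with two metastores, which is the complexity a dedicated cache layer is meant to absorb.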
Data Platform 2.0
● Consolidated ETL/Analytics
● CI/CD for auto deployment
● Auto scaling of YARN and Presto
● Data orchestration using Alluxio
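The slide does not say how YARN and Presto are autoscaled. As a hedged illustration only, a common target-utilization rule looks like this; the function name, target, and bounds are assumptions, and a real deployment would read utilization from the cluster and act through the cloud provider's scaling API.

```python
import math


def desired_workers(current, utilization, target=0.7, min_workers=2, max_workers=100):
    """Return the worker count at which the observed load would sit near `target`.

    `utilization` is the fraction (0.0-1.0) of current capacity in use.
    """
    if current <= 0:
        return min_workers
    needed = math.ceil(current * utilization / target)
    # Clamp to the allowed pool size.
    return max(min_workers, min(max_workers, needed))
```

Rounding up biases toward slight over-provisioning; production scalers typically add cooldowns to avoid flapping. Scale-down is the harder direction when nodes also host HDFS, because removing a node removes block replicas, which is one reason moving the storage role out of the compute nodes makes auto scaling easier.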
Benefits of data orchestration
● Analytics Workloads
○ Presto + Alluxio: caching for better performance
● ETL Workloads
○ YARN + Alluxio: replace HDFS with Alluxio
■ Specialized cache service to automatically back up, purge, and load data
■ One unified address space to simplify syncs
■ Easier to scale and auto-manage compute resources
○ Significant architectural benefits, but will we lose any performance?
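"One unified address space" means ETL jobs and Presto refer to the same logical paths while the data layer decides which under-store holds the bytes. Alluxio exposes this idea through mount points; the sketch below is only a conceptual model of longest-prefix mount resolution, not Alluxio's API, and the mount table and bucket names are made up.

```python
# Hypothetical mount table: logical path prefix -> under-store URI.
MOUNTS = {
    "/": "s3://example-warehouse",
    "/raw": "s3://example-raw/ingest",
}


def resolve(path, mounts=MOUNTS):
    """Map a logical path to an under-store URI using the deepest matching mount."""
    candidates = [m for m in mounts if m == "/" or path == m or path.startswith(m + "/")]
    if not candidates:
        raise ValueError(f"no mount point covers {path!r}")
    mount = max(candidates, key=len)
    rest = path.lstrip("/") if mount == "/" else path[len(mount):].lstrip("/")
    base = mounts[mount]
    return f"{base}/{rest}" if rest else base
```

For example, `resolve("/raw/clicks/part-0")` yields `s3://example-raw/ingest/clicks/part-0`. With a single namespace, table locations in one metastore can point at logical paths, instead of two metastores tracking HDFS and S3 copies separately.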
Preliminary benchmarking: Alluxio vs HDFS
● Same configurations
○ 10 h1.8xlarge instances, co-located YARN + HDFS/Alluxio
○ Alluxio with 3 replicas, no memory caching (HDD used for caching)
● Test dataset
○ A single file of varying sizes; a single table of 1,000 files of varying sizes
● Reads (S3 => Alluxio): 3.3-4.7X faster than S3 => HDFS
● Writes (Alluxio => S3): 2.7-4.2X faster than HDFS => S3
● Hive query (Alluxio => Alluxio): 1.3-2.1X faster than HDFS => HDFS
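A range such as "3.3-4.7X faster" is naturally read as the spread of per-case speedups, each being the baseline's elapsed time divided by the candidate's on the same test case. A minimal sketch of that summary, assuming timings are collected per test case; the function name and the inputs in the usage are placeholders, not the talk's measurements:

```python
def speedup_range(baseline_secs, candidate_secs):
    """Pair runs by test case and return (min, max) of baseline/candidate time ratios."""
    if set(baseline_secs) != set(candidate_secs):
        raise ValueError("both systems must be timed on the same test cases")
    ratios = [baseline_secs[k] / candidate_secs[k] for k in baseline_secs]
    return round(min(ratios), 1), round(max(ratios), 1)
```

For example, placeholder timings `{"1GB": 40.0, "10GB": 300.0}` for HDFS and `{"1GB": 10.0, "10GB": 100.0}` for Alluxio give `(3.0, 4.0)`. Reporting the range across dataset sizes, rather than one average, keeps size-dependent behavior visible.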
One tale of two stories ...
● Monitoring Systems
● Data Platforms
● Key Takeaways
How orchestration helped solve the problems ...
Key Takeaways
● In the new cloud age, old ways of devops are breaking
○ Wrong technologies => 20% dev, 80% ops
○ Strong implications for CapEx, OpEx, and HREx
● New emphasis in the cloud: Automation
○ Auto deployment, scaling, recovery => 90% dev, 10% ops
● TIPS: Adopt cloud-native technologies, e.g.,
○ Everything as Code: infrastructure, platforms, applications
○ Configuration Orchestration: auto generate all configurations
○ Data Orchestration: caching and unified address space
Thank you!
We’re Hiring
Du Li <duli@ea.com>
