AWS S3 + Alluxio + Presto = ❤
The Ryte Use Case
● What is Ryte: a platform to optimize your online marketing
● Requirements for Ryte Search Success
● The way to Presto on AWS EMR with S3
● The problems that came up
● How we solved them with Alluxio
Welcome to the age of strategic platforms
● Customer Relationship Management
● Productivity Application
● Website Quality Management
Welcome to the Ryte Suite with its three essential tools!
● Website Success: Make sure the technical elements of your website are flawless.
● Content Success: Create engaging content that users will love.
● Search Success: Score top search rankings with the help of real Google data.
Making sure the digital evolution won’t hurt your business
● Monitor: Monitor vital elements of your website’s success.
● Analyze: Get the best possible data to make informed decisions.
● Optimize: Hands-on advice to make your website the best it can be.
Let’s have a look at Website Success
Make sure the technical elements of your website are flawless.
Let’s have a look at Content Success
Create engaging content that users will love.
Let’s have a look at Search Success
Score top search rankings with the help of real Google data.
Our key asset: real Google data
● Ryte is 100% Google compliant
● Ryte does not use scraped data, only Google Search Console data
● So you monitor your important keywords based on 100% real Google data, and therefore on real search queries
Requirements for Ryte Search Success
From product objectives to the technical implementation
Product objectives that drove our choice of technical solution:
● Daily import of multiple GB of JSON data
● Mainly analytics-based queries
● Our product features require queries on raw data
● Prefer usage of AWS high-level services
● Development team experience
HTTP JSON API
Daily Data Import
Ryte Data-Backend
Ryte Web-Frontend
Ryte Data-Backend: first Edition
AWS Elasticsearch Service
Import Application on AWS EC2
REST API on AWS EC2
● Simple data transformation: JSON to JSON
● Knowledge already exists in the teams
● Performance
● Analytics queries
Elasticsearch as Data-Storage: Downsides
● Costs scale with the amount of data, not with usage
● High-performance setups are difficult
Ryte Data-Backend: second Edition
Parquet on AWS S3
Import Application on AWS ECS
AWS EMR (Hadoop) with Presto as distributed SQL engine
REST API on AWS ECS
● Fully decoupled read / write engines
● Highly reliable, low-cost storage with AWS S3 (99.999999999% durability)
● Cost-intensive scaling is usage-based
Ryte Backend Flow
Diagram components: AWS ECS Container (×2), REST API, AWS EMR Master Instance, Presto Master Node, AWS EMR Task Node, Presto Task Node, AWS S3 (unlimited persistent & reliable object store)
AWS EMR/Presto with S3 as Data-Backend: Downsides
RANDOM HIGH LATENCY PEAKS ON S3 REQUESTS RESULT IN TIMEOUTS!!!
● A single slow S3 request out of 100 kills the whole query
● S3 latency has a direct impact on the user
● AWS tried their best to find a solution, but the issue stayed stuck for days and weeks
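The first downside can be made concrete with a little arithmetic. The sketch below is a minimal illustration, not a measurement from Ryte: it assumes each S3 request is independently slow with probability 1/100 (the "1 of 100" from the slide) and that a query has to wait for every request it issues. The function name and the request counts are hypothetical.

```python
def p_query_hits_slow_request(p_slow: float, n_requests: int) -> float:
    """Probability that at least one of n independent requests is slow.

    Assumption: requests are independent and each is slow with
    probability p_slow. The query waits for all of them, so a single
    slow request delays the whole query.
    """
    return 1.0 - (1.0 - p_slow) ** n_requests


if __name__ == "__main__":
    # Illustrative request counts per query (assumed, not from the talk).
    for n in (1, 10, 100, 500):
        share = p_query_hits_slow_request(0.01, n)
        print(f"{n:>4} S3 requests per query -> {share:.1%} of queries hit a slow one")
```

Under these assumptions, the more S3 requests a query fans out into, the more likely it is to be held up by the slowest one, which is why a rare latency spike shows up directly in user-facing response times.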
API responses took up to 20 s instead of 3 s
Decoupling of our Storage Layer, or: How Alluxio solves all Problems
Ryte Data-Backend: third Edition
Parquet stored on AWS S3
Import on AWS ECS
REST API on AWS ECS
Presto on AWS EMR Cluster
Alluxio cache on AWS EMR Cluster
● Currently no extra hardware costs
● The Alluxio cache can be “warmed up”
● Cache costs scale with usage
● Fits perfectly between Presto and S3
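The idea of a cache that sits between the query engine and the object store, and that can be warmed up ahead of time, can be sketched in a few lines. This is a conceptual toy only: it is not Alluxio's API, and `FakeObjectStore`, `ReadThroughCache` and the file names are hypothetical stand-ins for S3, the cache layer and Parquet files.

```python
from typing import Callable, Dict, Iterable


class FakeObjectStore:
    """Stand-in for a remote object store; counts how often it is hit."""

    def __init__(self, objects: Dict[str, bytes]) -> None:
        self._objects = objects
        self.calls = 0

    def get(self, key: str) -> bytes:
        self.calls += 1  # each call here would be a remote round trip
        return self._objects[key]


class ReadThroughCache:
    """Toy read-through cache: serve from memory, fall back to the store."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch
        self._blocks: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def warm_up(self, keys: Iterable[str]) -> None:
        """Load keys ahead of time so later reads skip the remote store."""
        for key in keys:
            self._blocks[key] = self._fetch(key)

    def read(self, key: str) -> bytes:
        if key in self._blocks:
            self.hits += 1
            return self._blocks[key]
        # Cache miss: pay the remote latency once, then keep the data.
        self.misses += 1
        data = self._fetch(key)
        self._blocks[key] = data
        return data


if __name__ == "__main__":
    store = FakeObjectStore({f"part-{i}.parquet": b"rows" for i in range(3)})
    cache = ReadThroughCache(store.get)
    cache.warm_up(["part-0.parquet", "part-1.parquet"])
    for name in ("part-0.parquet", "part-1.parquet", "part-2.parquet"):
        cache.read(name)
    print(f"hits={cache.hits} misses={cache.misses} store calls={store.calls}")
```

Reads of warmed-up keys never reach the backing store, so a latency spike there only affects cache misses and the warm-up itself, not every user request.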
Ryte Backend Flow
Diagram components: AWS ECS Container (×2), REST API, AWS EMR Master Instance, Presto Master Node, Alluxio Master, AWS EMR Task Node, Presto Task Node, Alluxio Client, Alluxio worker, RAM Cache, AWS S3 (unlimited persistent & reliable object store)
Performance Push 😱😍
Query time reduced by 72% on average!
Summary
● Alluxio helps us decouple S3 latency spikes from user requests
● No need for additional hardware so far
● Easy integration between Presto on Hadoop and S3
● Hardware requirements still scale with the business 👍
Danny Linden @ Ryte
Chapter Lead Engineering
E-Mail: d.linden@ryte.com
linkedin.com/in/danny-linden/
Twitter: @CodingDanny
Questions?
WE ARE HIRING IN MUNICH: jobs.ryte.com
