Jampp's Impressive Real-Time Bidding and Streaming Architecture
Patricio Rocca - August 2016
@patriciorocca
Real-time, scalable architectures
About Jampp
We are a tech company that helps companies grow their mobile business by driving engaged users to their apps.
We are a team of 75 people, 30% of them in engineering.
Located in 6 cities across the US, Latin America, Europe and Africa.
Machine learning
Post-install event optimisation
Dynamic Product Ads and Segments
Data Science
Programmatic Buying
Jampp Architecture Impressive Facts
We process 220,000 bid requests per second
We process each bid request in under 100 ms
We manage 40 TB of data every day
We do real-time machine learning
And… we are just a team of 22 nerds :) or :(
Real-Time Bidding Workflow
[Figure: diagram of the bidding flow between Publisher, Exchange, Jampp Bidder, Jampp Machine Learning and Jampp Engagement Segments Builder, labelled with the steps Bid, Auction Win and Impression.]
Real-Time Tracking Workflow
[Figure: diagram of the tracking flow between Publisher, Client, Client Tracking Platform and the Jampp NodeJS Tracking Platform, labelled with the events Impression, Click and In-App Event.]
Let’s talk about architecture!
Real Time Bidding Architecture
Real Time Bidding Architecture (details)
Bid Price = CPI * eCTR * eCVR * (1 - margin) * 1000
Python + Tornado + Cython + nginx (+ antigravity)
Caching: layers upon layers upon layers
Leaky-bucket-ish feedback loop for pacing
With predictive local projections to account for imperfect and laggy inter-server communication
Selective, aggregate logging
Roughly 25 TB of data generated per day makes naïve logging… unwise
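To make the bid price formula concrete: CPI is the cost per install the advertiser pays, eCTR the expected click-through rate P(click | impression), eCVR the expected conversion rate, read here as P(install | click) to match the Machine Learning slide, and margin presumably the share Jampp keeps. The factor 1000 turns an expected value per impression into a CPM bid. The minimal Python sketch below computes the formula and pairs it with a leaky-bucket pacer of the kind the slide alludes to. The names `bid_price_cpm` and `LeakyBucketPacer`, the example numbers and the pacer's design are illustrative assumptions, not Jampp's code, and the predictive local projections mentioned on the slide are left out.

```python
from typing import Optional


def bid_price_cpm(cpi: float, ectr: float, ecvr: float, margin: float) -> float:
    """Bid Price = CPI * eCTR * eCVR * (1 - margin) * 1000, i.e. a CPM bid."""
    # CPI * P(click | impression) * P(install | click) = expected revenue per impression.
    expected_value_per_impression = cpi * ectr * ecvr
    # Keep the margin, then scale from one impression to a thousand (CPM).
    return expected_value_per_impression * (1.0 - margin) * 1000.0


class LeakyBucketPacer:
    """Hypothetical pacer: each spend pours into a bucket that leaks at the target rate.

    Spending is refused while the bucket would overflow, so the sustained spend
    rate cannot exceed `rate_per_s`, while bursts up to `capacity` are absorbed.
    """

    def __init__(self, rate_per_s: float, capacity: float) -> None:
        self.rate_per_s = rate_per_s
        self.capacity = capacity
        self.level = 0.0
        self.last_ts: Optional[float] = None

    def _leak(self, now: float) -> None:
        # Drain whatever has leaked out since the previous call.
        if self.last_ts is not None:
            elapsed = max(0.0, now - self.last_ts)
            self.level = max(0.0, self.level - elapsed * self.rate_per_s)
        self.last_ts = now

    def try_spend(self, cost: float, now: float) -> bool:
        """Return True (and record the spend) if bidding now keeps us on pace."""
        self._leak(now)
        if self.level + cost > self.capacity:
            return False  # ahead of pace: skip this bid
        self.level += cost
        return True


if __name__ == "__main__":
    # Illustrative numbers: 2.00 CPI, 1% CTR, 5% click-to-install rate, 20% margin.
    print(f"{bid_price_cpm(cpi=2.0, ectr=0.01, ecvr=0.05, margin=0.2):.2f}")  # 0.80
    pacer = LeakyBucketPacer(rate_per_s=1.0, capacity=2.0)
    print([pacer.try_spend(1.0, now=t) for t in (0.0, 0.0, 0.0, 1.0)])  # [True, True, False, True]
```

The pacer takes the current time as an argument rather than reading a clock, which keeps it deterministic and easy to drive from a feedback loop or a test.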
Caching
In-process L1 serves all requests
Microsecond-latency access is a lifesaver for real-time, latency-constrained workloads
Local L2 in each server
Buffers responses from the L3
Saves bandwidth to and from the L3 (3 MB/s x 230 servers x 8 procs ≈ 5.5 GB/s = death)
Decreases promotion latency to L1
Remote L3 provides the main distributed cache storage
Machine Learning
Uses logistic regression to predict P(click | impression) or P(install | click) from context features
An online solution that learns incrementally from Real Time Bidding events, just in time
Uses regularization and the hashing trick to explore a huge feature space and keep only the most statistically informative features
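A minimal sketch of online logistic regression with the hashing trick: each `name=value` context feature is hashed into a fixed-size weight vector, so memory stays bounded however many distinct features appear, and every bidding event triggers one stochastic gradient step. The deck does not say which regularization Jampp used; this sketch assumes L1 applied as a proximal (soft-thresholding) step, which drives weak weights to exactly zero and so keeps only the informative features. CRC32 as the hash, the class name and all hyperparameters are likewise assumptions.

```python
import math
import zlib
from typing import Dict, List


class HashedOnlineLogisticRegression:
    """Online logistic regression over hashed binary features (the "hashing trick")."""

    def __init__(self, n_bits: int = 18, learning_rate: float = 0.1, l1: float = 1e-5) -> None:
        self.mask = (1 << n_bits) - 1
        # Fixed-size weight vector: memory does not grow with the number of raw features.
        self.weights: List[float] = [0.0] * (1 << n_bits)
        self.learning_rate = learning_rate
        self.l1 = l1

    def _indices(self, context: Dict[str, str]) -> List[int]:
        # Each "name=value" pair is one binary feature; "__bias__" acts as the intercept.
        tokens = ["__bias__"] + [f"{name}={value}" for name, value in context.items()]
        return [zlib.crc32(token.encode("utf-8")) & self.mask for token in tokens]

    def _probability(self, indices: List[int]) -> float:
        z = sum(self.weights[i] for i in indices)
        z = max(-35.0, min(35.0, z))  # clamp to keep math.exp in range
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, context: Dict[str, str]) -> float:
        """Estimated probability of the positive event, e.g. P(click | impression)."""
        return self._probability(self._indices(context))

    def update(self, context: Dict[str, str], label: int) -> float:
        """Learn from one event (label 1 or 0); returns the prediction made before learning."""
        indices = self._indices(context)
        p = self._probability(indices)
        gradient = p - label  # derivative of the log loss w.r.t. the logit
        shrink = self.learning_rate * self.l1
        for i in indices:
            w = self.weights[i] - self.learning_rate * gradient
            # L1 proximal step (soft thresholding): small weights become exactly 0.
            self.weights[i] = math.copysign(max(0.0, abs(w) - shrink), w)
        return p


if __name__ == "__main__":
    model = HashedOnlineLogisticRegression()
    # Hypothetical event stream: (context features, clicked?)
    events = [({"app": "game", "os": "android"}, 1), ({"app": "news", "os": "ios"}, 0)] * 500
    for context, clicked in events:
        model.update(context, clicked)
    print(f"{model.predict({'app': 'game', 'os': 'android'}):.2f}")
```

Hash collisions simply make two raw features share a weight; with a large enough table this costs little accuracy in exchange for constant memory.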
Stream Processing Architecture
Stream Processing Architecture (details)
Uses Amazon Kinesis for durable streaming data and AWS Lambda for data processing
DynamoDB as temporary data storage for enrichment and analytics
S3 provides a Single Source of Truth for batch data applications
Decouples data from processing, enabling multiple Big Data engines to run on different clusters/infrastructure
Easy on-demand scaling by AWS™
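When AWS Lambda is triggered by a Kinesis stream, it receives a batch of records whose payloads arrive base64-encoded under `Records[].kinesis.data`. The sketch below decodes such a batch and enriches each event through a lookup function. It assumes JSON payloads for simplicity (the deck's producers used MessagePack), and `make_handler`, the enrichment hook and the dict standing in for a DynamoDB table are all hypothetical.

```python
import base64
import json
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]
# Hypothetical enrichment hook; in the deck's design this role is played by
# lookups against DynamoDB. Here it is any function returning extra fields.
Enricher = Callable[[Event], Event]


def decode_kinesis_records(lambda_event: Event) -> List[Event]:
    """Decode the base64 payloads Lambda receives from a Kinesis trigger."""
    decoded = []
    for record in lambda_event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        # Assumption: JSON payloads; a MessagePack producer would need msgpack here.
        decoded.append(json.loads(payload))
    return decoded


def make_handler(enrich: Enricher) -> Callable[[Event, Any], List[Event]]:
    """Build a Lambda-style handler(event, context) that decodes and enriches a batch."""

    def handler(event: Event, context: Any = None) -> List[Event]:
        enriched = [{**item, **enrich(item)} for item in decode_kinesis_records(event)]
        # A real function would write `enriched` onward (e.g. to DynamoDB or S3);
        # returning it keeps this sketch easy to test.
        return enriched

    return handler


if __name__ == "__main__":
    raw = {"event": "install", "click_id": "c-1"}
    fake_event = {"Records": [{"kinesis": {
        "partitionKey": "c-1",
        "data": base64.b64encode(json.dumps(raw).encode("utf-8")).decode("ascii"),
    }}]}
    clicks_table = {"c-1": {"campaign_id": 42}}  # stands in for a DynamoDB table
    handler = make_handler(lambda item: clicks_table.get(item.get("click_id"), {}))
    print(handler(fake_event))
    # [{'event': 'install', 'click_id': 'c-1', 'campaign_id': 42}]
```

Injecting the lookup as a function keeps the decoding logic independent of the storage layer, in the spirit of the slide's point about decoupling data from processing.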
Data Push
Pick your partition key so that data is distributed evenly across shards
Encoding protocol matters! MessagePack offered the best trade-off between compression ratio and serialization speed
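Kinesis Data Streams hashes each partition key with MD5 into a 128-bit hash key, and each shard owns a contiguous range of that key space. The sketch below simulates that mapping for evenly split shards (treating the digest as a big-endian integer, an assumption) to show why a low-cardinality key such as an event type leaves most shards idle, while a high-cardinality key such as a device ID spreads load almost evenly. `shard_for` and `shard_load` are hypothetical helpers, not part of any AWS SDK.

```python
import hashlib
from collections import Counter
from typing import Iterable

HASH_KEY_SPACE = 2 ** 128  # MD5 yields a 128-bit hash key


def shard_for(partition_key: str, n_shards: int) -> int:
    """Map a partition key to one of n (approximately) evenly split shards."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")  # assumption: big-endian interpretation
    # Shard i owns roughly [i * SPACE / n, (i + 1) * SPACE / n).
    return hash_key * n_shards // HASH_KEY_SPACE


def shard_load(keys: Iterable[str], n_shards: int) -> Counter:
    """Count how many records each shard would receive."""
    return Counter(shard_for(k, n_shards) for k in keys)


if __name__ == "__main__":
    n = 8
    # Low-cardinality key (e.g. event type): at most 3 of the 8 shards get any data.
    skewed = shard_load(["install", "click", "open"] * 10_000, n)
    # High-cardinality key (e.g. a per-device ID): load spreads across all shards.
    spread = shard_load((f"device-{i}" for i in range(30_000)), n)
    print(sorted(skewed.values()))
    print(sorted(spread.values()))
```

Since a single shard has fixed throughput, a skewed key caps the whole stream at the capacity of its hottest shard, no matter how many shards are provisioned.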
Data Processing and Enrichment
Write/read batching to reduce HTTPS protocol overhead and costs
Exponential backoff + jitter to reduce the impact of in-app event bursts sent by the tracking platforms
Increased the data retention period on the raw data streams from 1 day (the default) to 3 days
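The two techniques on this slide can be sketched in a few lines of Python: `batched_records` packs records into request-sized batches so that fewer HTTPS round trips are needed, and `call_with_backoff` retries a throttled call with capped exponential backoff and "full jitter" (a uniformly random delay up to the exponential bound), which spreads retries out after a burst. Both helpers are hypothetical. The default limits of 500 records and 5 MiB per batch are assumed to match the Kinesis `PutRecords` quotas and should be checked against current AWS documentation; the base delay, cap and attempt count are illustrative.

```python
import random
import time
from typing import Callable, Iterable, Iterator, List, Tuple, Type, TypeVar

T = TypeVar("T")


def batched_records(records: Iterable[bytes], max_records: int = 500,
                    max_bytes: int = 5 * 1024 * 1024) -> Iterator[List[bytes]]:
    """Group records into batches bounded by count and total payload size."""
    batch: List[bytes] = []
    batch_bytes = 0
    for record in records:
        size = len(record)
        # Flush before a record would push the batch past either limit.
        if batch and (len(batch) >= max_records or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(record)
        batch_bytes += size
    if batch:
        yield batch


def call_with_backoff(fn: Callable[[], T], attempts: int = 6, base: float = 0.05,
                      cap: float = 5.0,
                      retry_on: Tuple[Type[BaseException], ...] = (Exception,),
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Callable[[], float] = random.random) -> T:
    """Retry `fn` with capped exponential backoff and full jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: sleep a uniform random time in [0, min(cap, base * 2**attempt)),
            # so clients that failed together do not retry in lockstep.
            sleep(rng() * min(cap, base * 2 ** attempt))
    raise ValueError("attempts must be >= 1")


if __name__ == "__main__":
    print([len(b) for b in batched_records([b"x" * 100] * 1_200)])  # [500, 500, 200]
    failures = iter([ConnectionError("throttled"), ConnectionError("throttled")])

    def flaky_put():
        err = next(failures, None)
        if err is not None:
            raise err
        return "ok"

    print(call_with_backoff(flaky_put, sleep=lambda s: None))  # ok
```

Note that `PutRecords` can also fail partially (some records rejected in an otherwise successful response); this sketch only retries calls that raise.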
Spark + Hadoop + PrestoDB = <3
Firehose for real-time data ingestion into S3, with auto-scaling capabilities
An EMR cluster simplifies our data processing
Spark ETL jobs, scheduled by Airflow, enrich and denormalize the data and convert JSON to Parquet
Spark Streaming for real-time anomaly detection and fraud prevention
Lessons Learned
Don't mess with real time! (caching and Cython to the rescue)
Rent first, build later
Development and staging for Big Data projects should involve production traffic, or be prepared for trouble
PrestoDB is really amazing in terms of performance, maturity and feature set
Kinesis, DynamoDB and Firehose use HTTPS as their transport protocol, which is slow and requires aggressive batching and exponential backoff + jitter
Monitoring, logs and alerts managed by AWS CloudWatch oversimplifies production support
Thank you ;-)
geeks.jampp.com

More Related Content

What's hot

Winning the On-Demand Economy with Spark and Predictive Analytics
Winning the On-Demand Economy with Spark and Predictive AnalyticsWinning the On-Demand Economy with Spark and Predictive Analytics
Winning the On-Demand Economy with Spark and Predictive AnalyticsSingleStore
 
AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...Amazon Web Services
 
In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017SingleStore
 
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014Amazon Web Services
 
Introduction to Google Cloud Platform for Big Data - Trusted Conf
Introduction to Google Cloud Platform for Big Data - Trusted ConfIntroduction to Google Cloud Platform for Big Data - Trusted Conf
Introduction to Google Cloud Platform for Big Data - Trusted ConfIn Marketing We Trust
 
Driving the On-Demand Economy with Spark and Predictive Analytics
Driving the On-Demand Economy with Spark and Predictive AnalyticsDriving the On-Demand Economy with Spark and Predictive Analytics
Driving the On-Demand Economy with Spark and Predictive AnalyticsSingleStore
 
Netflix Big Data Paris 2017
Netflix Big Data Paris 2017Netflix Big Data Paris 2017
Netflix Big Data Paris 2017Jason Flittner
 
Spark Summit East Keynote by Anjul Bhambhri
Spark Summit East Keynote by Anjul BhambhriSpark Summit East Keynote by Anjul Bhambhri
Spark Summit East Keynote by Anjul BhambhriJen Aman
 
Real-Time Geospatial Intelligence at Scale
Real-Time Geospatial Intelligence at Scale Real-Time Geospatial Intelligence at Scale
Real-Time Geospatial Intelligence at Scale SingleStore
 
Customer Experience at Disney+ Through Data Perspective
Customer Experience at Disney+ Through Data PerspectiveCustomer Experience at Disney+ Through Data Perspective
Customer Experience at Disney+ Through Data PerspectiveDatabricks
 
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...confluent
 
Enabling Real-Time Analytics for IoT
Enabling Real-Time Analytics for IoTEnabling Real-Time Analytics for IoT
Enabling Real-Time Analytics for IoTSingleStore
 
Amazon Kinesis: Real-time Streaming Big data Processing Applications (BDT311)...
Amazon Kinesis: Real-time Streaming Big data Processing Applications (BDT311)...Amazon Kinesis: Real-time Streaming Big data Processing Applications (BDT311)...
Amazon Kinesis: Real-time Streaming Big data Processing Applications (BDT311)...Amazon Web Services
 
Data to Drive Decision-Making - CaliStream Meetup
Data to Drive Decision-Making - CaliStream MeetupData to Drive Decision-Making - CaliStream Meetup
Data to Drive Decision-Making - CaliStream MeetupJerome Boulon
 
The Fast Path to Building Operational Applications with Spark
The Fast Path to Building Operational Applications with SparkThe Fast Path to Building Operational Applications with Spark
The Fast Path to Building Operational Applications with SparkSingleStore
 
Real-Time Analytics with MemSQL and Spark
Real-Time Analytics with MemSQL and SparkReal-Time Analytics with MemSQL and Spark
Real-Time Analytics with MemSQL and SparkSingleStore
 
The Netflix data platform: Now and in the future by Kurt Brown
The Netflix data platform: Now and in the future by Kurt BrownThe Netflix data platform: Now and in the future by Kurt Brown
The Netflix data platform: Now and in the future by Kurt BrownData Con LA
 

What's hot (20)

Microservices Live
Microservices LiveMicroservices Live
Microservices Live
 
Winning the On-Demand Economy with Spark and Predictive Analytics
Winning the On-Demand Economy with Spark and Predictive AnalyticsWinning the On-Demand Economy with Spark and Predictive Analytics
Winning the On-Demand Economy with Spark and Predictive Analytics
 
AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...
 
In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017
 
Zero Downtime App Deployment using Hadoop
Zero Downtime App Deployment using HadoopZero Downtime App Deployment using Hadoop
Zero Downtime App Deployment using Hadoop
 
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
 
Introduction to Google Cloud Platform for Big Data - Trusted Conf
Introduction to Google Cloud Platform for Big Data - Trusted ConfIntroduction to Google Cloud Platform for Big Data - Trusted Conf
Introduction to Google Cloud Platform for Big Data - Trusted Conf
 
Driving the On-Demand Economy with Spark and Predictive Analytics
Driving the On-Demand Economy with Spark and Predictive AnalyticsDriving the On-Demand Economy with Spark and Predictive Analytics
Driving the On-Demand Economy with Spark and Predictive Analytics
 
Netflix Big Data Paris 2017
Netflix Big Data Paris 2017Netflix Big Data Paris 2017
Netflix Big Data Paris 2017
 
Spark Summit East Keynote by Anjul Bhambhri
Spark Summit East Keynote by Anjul BhambhriSpark Summit East Keynote by Anjul Bhambhri
Spark Summit East Keynote by Anjul Bhambhri
 
Real-Time Geospatial Intelligence at Scale
Real-Time Geospatial Intelligence at Scale Real-Time Geospatial Intelligence at Scale
Real-Time Geospatial Intelligence at Scale
 
Customer Experience at Disney+ Through Data Perspective
Customer Experience at Disney+ Through Data PerspectiveCustomer Experience at Disney+ Through Data Perspective
Customer Experience at Disney+ Through Data Perspective
 
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
 
Enabling Real-Time Analytics for IoT
Enabling Real-Time Analytics for IoTEnabling Real-Time Analytics for IoT
Enabling Real-Time Analytics for IoT
 
Amazon Kinesis: Real-time Streaming Big data Processing Applications (BDT311)...
Amazon Kinesis: Real-time Streaming Big data Processing Applications (BDT311)...Amazon Kinesis: Real-time Streaming Big data Processing Applications (BDT311)...
Amazon Kinesis: Real-time Streaming Big data Processing Applications (BDT311)...
 
Big Data in the Cloud
Big Data in the Cloud Big Data in the Cloud
Big Data in the Cloud
 
Data to Drive Decision-Making - CaliStream Meetup
Data to Drive Decision-Making - CaliStream MeetupData to Drive Decision-Making - CaliStream Meetup
Data to Drive Decision-Making - CaliStream Meetup
 
The Fast Path to Building Operational Applications with Spark
The Fast Path to Building Operational Applications with SparkThe Fast Path to Building Operational Applications with Spark
The Fast Path to Building Operational Applications with Spark
 
Real-Time Analytics with MemSQL and Spark
Real-Time Analytics with MemSQL and SparkReal-Time Analytics with MemSQL and Spark
Real-Time Analytics with MemSQL and Spark
 
The Netflix data platform: Now and in the future by Kurt Brown
The Netflix data platform: Now and in the future by Kurt BrownThe Netflix data platform: Now and in the future by Kurt Brown
The Netflix data platform: Now and in the future by Kurt Brown
 

Similar to Jampp's Impressive Real-Time Bidding and Streaming Architecture

Getting started with Amazon Kinesis
Getting started with Amazon KinesisGetting started with Amazon Kinesis
Getting started with Amazon KinesisAmazon Web Services
 
Real-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQLReal-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQLSingleStore
 
Aws Tools for Alexa Skills
Aws Tools for Alexa SkillsAws Tools for Alexa Skills
Aws Tools for Alexa SkillsBoaz Ziniman
 
The hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at HelixaThe hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at HelixaAlluxio, Inc.
 
AWS re:Invent 2016: Big Data Mini Con State of the Union (BDM205)
AWS re:Invent 2016: Big Data Mini Con State of the Union (BDM205)AWS re:Invent 2016: Big Data Mini Con State of the Union (BDM205)
AWS re:Invent 2016: Big Data Mini Con State of the Union (BDM205)Amazon Web Services
 
Financial Services Analytics on AWS
Financial Services Analytics on AWSFinancial Services Analytics on AWS
Financial Services Analytics on AWSAmazon Web Services
 
Opening Keynote - AWS Summit SG 2017
Opening Keynote - AWS Summit SG 2017Opening Keynote - AWS Summit SG 2017
Opening Keynote - AWS Summit SG 2017Amazon Web Services
 
Opening Keynote - AWS Summit SG 2017
Opening Keynote - AWS Summit SG 2017Opening Keynote - AWS Summit SG 2017
Opening Keynote - AWS Summit SG 2017Amazon Web Services
 
Mai-Lan Tomsen Bukovec- Keynote-AWS Summit Manila
Mai-Lan Tomsen Bukovec- Keynote-AWS Summit ManilaMai-Lan Tomsen Bukovec- Keynote-AWS Summit Manila
Mai-Lan Tomsen Bukovec- Keynote-AWS Summit ManilaAmazon Web Services
 
Big Data Analytics, Machine Learning e Inteligência Artificial
Big Data Analytics, Machine Learning e Inteligência ArtificialBig Data Analytics, Machine Learning e Inteligência Artificial
Big Data Analytics, Machine Learning e Inteligência ArtificialAmazon Web Services LATAM
 
AWS Cloud Experience CA: Data Lakes & Analytics en AWS
AWS Cloud Experience CA: Data Lakes & Analytics en AWSAWS Cloud Experience CA: Data Lakes & Analytics en AWS
AWS Cloud Experience CA: Data Lakes & Analytics en AWSAmazon Web Services LATAM
 
Modern Data Architectures for Real Time Analytics & Engagement
Modern Data Architectures for Real Time Analytics & EngagementModern Data Architectures for Real Time Analytics & Engagement
Modern Data Architectures for Real Time Analytics & EngagementAmazon Web Services
 
Connected IoT and Intelligent Solutions
Connected IoT and Intelligent SolutionsConnected IoT and Intelligent Solutions
Connected IoT and Intelligent SolutionsAmazon Web Services
 
Analyzing Real-time Streaming Data with Amazon Kinesis
Analyzing Real-time Streaming Data with Amazon KinesisAnalyzing Real-time Streaming Data with Amazon Kinesis
Analyzing Real-time Streaming Data with Amazon KinesisAmazon Web Services
 
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...Amazon Web Services
 
Path to the future #4 - Ingestão, processamento e análise de dados em tempo real
Path to the future #4 - Ingestão, processamento e análise de dados em tempo realPath to the future #4 - Ingestão, processamento e análise de dados em tempo real
Path to the future #4 - Ingestão, processamento e análise de dados em tempo realAmazon Web Services LATAM
 
Analyzing Streaming Data in Real-time - AWS Summit Cape Town 2018
Analyzing Streaming Data in Real-time - AWS Summit Cape Town 2018Analyzing Streaming Data in Real-time - AWS Summit Cape Town 2018
Analyzing Streaming Data in Real-time - AWS Summit Cape Town 2018Amazon Web Services
 
Mining Information from Data on Cloud
Mining Information from Data on CloudMining Information from Data on Cloud
Mining Information from Data on CloudAmazon Web Services
 
Emerging Prevalence of Data Streaming in Analytics and it's Business Signific...
Emerging Prevalence of Data Streaming in Analytics and it's Business Signific...Emerging Prevalence of Data Streaming in Analytics and it's Business Signific...
Emerging Prevalence of Data Streaming in Analytics and it's Business Signific...Amazon Web Services
 
Amazon Kinesis - Building Serverless real-time solution - Tel Aviv Summit 2018
Amazon Kinesis - Building Serverless real-time solution - Tel Aviv Summit 2018Amazon Kinesis - Building Serverless real-time solution - Tel Aviv Summit 2018
Amazon Kinesis - Building Serverless real-time solution - Tel Aviv Summit 2018Amazon Web Services
 

Similar to Jampp's Impressive Real-Time Bidding and Streaming Architecture (20)

Getting started with Amazon Kinesis
Getting started with Amazon KinesisGetting started with Amazon Kinesis
Getting started with Amazon Kinesis
 
Real-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQLReal-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQL
 
Aws Tools for Alexa Skills
Aws Tools for Alexa SkillsAws Tools for Alexa Skills
Aws Tools for Alexa Skills
 
The hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at HelixaThe hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at Helixa
 
AWS re:Invent 2016: Big Data Mini Con State of the Union (BDM205)
AWS re:Invent 2016: Big Data Mini Con State of the Union (BDM205)AWS re:Invent 2016: Big Data Mini Con State of the Union (BDM205)
AWS re:Invent 2016: Big Data Mini Con State of the Union (BDM205)
 
Financial Services Analytics on AWS
Financial Services Analytics on AWSFinancial Services Analytics on AWS
Financial Services Analytics on AWS
 
Opening Keynote - AWS Summit SG 2017
Opening Keynote - AWS Summit SG 2017Opening Keynote - AWS Summit SG 2017
Opening Keynote - AWS Summit SG 2017
 
Opening Keynote - AWS Summit SG 2017
Opening Keynote - AWS Summit SG 2017Opening Keynote - AWS Summit SG 2017
Opening Keynote - AWS Summit SG 2017
 
Mai-Lan Tomsen Bukovec- Keynote-AWS Summit Manila
Mai-Lan Tomsen Bukovec- Keynote-AWS Summit ManilaMai-Lan Tomsen Bukovec- Keynote-AWS Summit Manila
Mai-Lan Tomsen Bukovec- Keynote-AWS Summit Manila
 
Big Data Analytics, Machine Learning e Inteligência Artificial
Big Data Analytics, Machine Learning e Inteligência ArtificialBig Data Analytics, Machine Learning e Inteligência Artificial
Big Data Analytics, Machine Learning e Inteligência Artificial
 
AWS Cloud Experience CA: Data Lakes & Analytics en AWS
AWS Cloud Experience CA: Data Lakes & Analytics en AWSAWS Cloud Experience CA: Data Lakes & Analytics en AWS
AWS Cloud Experience CA: Data Lakes & Analytics en AWS
 
Modern Data Architectures for Real Time Analytics & Engagement
Modern Data Architectures for Real Time Analytics & EngagementModern Data Architectures for Real Time Analytics & Engagement
Modern Data Architectures for Real Time Analytics & Engagement
 
Connected IoT and Intelligent Solutions
Connected IoT and Intelligent SolutionsConnected IoT and Intelligent Solutions
Connected IoT and Intelligent Solutions
 
Analyzing Real-time Streaming Data with Amazon Kinesis
Analyzing Real-time Streaming Data with Amazon KinesisAnalyzing Real-time Streaming Data with Amazon Kinesis
Analyzing Real-time Streaming Data with Amazon Kinesis
 
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN...
 
Path to the future #4 - Ingestão, processamento e análise de dados em tempo real
Path to the future #4 - Ingestão, processamento e análise de dados em tempo realPath to the future #4 - Ingestão, processamento e análise de dados em tempo real
Path to the future #4 - Ingestão, processamento e análise de dados em tempo real
 
Analyzing Streaming Data in Real-time - AWS Summit Cape Town 2018
Analyzing Streaming Data in Real-time - AWS Summit Cape Town 2018Analyzing Streaming Data in Real-time - AWS Summit Cape Town 2018
Analyzing Streaming Data in Real-time - AWS Summit Cape Town 2018
 
Mining Information from Data on Cloud
Mining Information from Data on CloudMining Information from Data on Cloud
Mining Information from Data on Cloud
 
Emerging Prevalence of Data Streaming in Analytics and it's Business Signific...
Emerging Prevalence of Data Streaming in Analytics and it's Business Signific...Emerging Prevalence of Data Streaming in Analytics and it's Business Signific...
Emerging Prevalence of Data Streaming in Analytics and it's Business Signific...
 
Amazon Kinesis - Building Serverless real-time solution - Tel Aviv Summit 2018
Amazon Kinesis - Building Serverless real-time solution - Tel Aviv Summit 2018Amazon Kinesis - Building Serverless real-time solution - Tel Aviv Summit 2018
Amazon Kinesis - Building Serverless real-time solution - Tel Aviv Summit 2018
 

Recently uploaded

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 

Recently uploaded (20)

Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 

Jampp's Impressive Real-Time Bidding and Streaming Architecture

  • 1. Patricio Rocca - Agosto 2016 @patriciorocca Arquitecturas de tiempo real y escalables
  • 2. About Jampp We are a tech company that helps companies grow their mobile business by driving engaged users to their apps We are a team of 75 people, 30% in the engineering team. Located in 6 cities across the US, Latin America, Europe and Africa Machine learning Post-install event optimisation Dynamic Product Ads and Segments Data Science Programmatic Buying
  • 3. We process 220,000 bid requests per second We process each bid request in less than 100ms We manage 40Tb of data everyday We do real time machine learning Jampp Architecture Impressive Facts And… we are just a team of 22 nerds :) or :(
  • 4. Bid Real-Time Bidding Workflow Auction Win Exchange Exchange Publisher Publisher Jampp Bidder Jampp Machine Learning Jampp Engagement Segments Builder IMPRESSION ;-) PLACEHOLDER IMPRESSION
  • 5. Real-Time Tracking Workflow In-App Event IMPRESSION ;-) Click Jampp NodeJS Tracking Platform Client Tracking Platform Publisher Client IMPRESSION
  • 6. Let’s talk about architecture!
  • 7. Real Time Bidding Architecture
  • 8. Real Time Bidding Architecture (details): Bid Price = CPI × eCTR × eCVR × (1 - margin) × 1000. Python + Tornado + Cython + nginx (+ antigravity). Caching: layers upon layers upon layers. Leaky-bucket-ish feedback loop for pacing, with predictive local projections to account for imperfect and laggy inter-server communication. Selective, aggregate logging: circa 25 TB of data generated per day makes naïve logging… unwise.
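The bid price formula and the pacing idea on this slide can be sketched in Python. This is a minimal illustration under stated assumptions, not Jampp's bidder: `bid_price` assumes CPI is what the client pays per install, eCTR = P(click | impression) and eCVR = P(install | click), so the product is the expected revenue of one impression and the × 1000 turns it into a bid per thousand impressions (CPM). `LeakyBucketPacer`, its `rate`/`capacity` parameters and its drain rule are hypothetical, and the slide's predictive local projections are left out.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


def bid_price(cpi: float, ectr: float, ecvr: float, margin: float) -> float:
    """Bid Price = CPI * eCTR * eCVR * (1 - margin) * 1000, read here as a CPM bid."""
    if not 0.0 <= margin < 1.0:
        raise ValueError("margin must be in [0, 1)")
    # Expected revenue of one impression, minus our margin, scaled to 1000 impressions.
    return cpi * ectr * ecvr * (1.0 - margin) * 1000.0


@dataclass
class LeakyBucketPacer:
    """Hypothetical pacer: accepted spend fills the bucket, which drains at `rate`."""

    rate: float          # budget units that may be spent per second (assumed)
    capacity: float      # largest burst of spend tolerated (assumed)
    level: float = 0.0   # spend currently sitting in the bucket
    last: float = field(default_factory=time.monotonic)

    def allow(self, spend: float, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Drain in proportion to the time elapsed since the previous call.
        self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now
        if self.level + spend > self.capacity:
            return False  # over pace: skip this auction
        self.level += spend
        return True
```

For example, a $2.00 CPI, 1% eCTR, 5% eCVR and a 20% margin give `bid_price(2.0, 0.01, 0.05, 0.2)` ≈ 0.8, i.e. a bid of about $0.80 CPM.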
  • 9. Caching: An in-process L1 serves all requests; µs-latency access is a lifesaver for real-time, latency-constrained workloads. A local L2 in each server buffers responses from the L3, saves bandwidth to and from the L3 (3 MB/s × 230 servers × 8 procs ≈ 5.5 GB/s = death) and decreases promotion latency to the L1. A remote L3 provides the main distributed cache storage.
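A minimal sketch of the L1 → L2 → L3 lookup with promotion, assuming a plain `dict` as a stand-in for the per-server L2 and an arbitrary callable (the hypothetical `l3_fetch`) for the remote L3 client; the class names and sizes are illustrative, not Jampp's code. In a real deployment the L2 would be shared by all processes on the server, which is what lets one L3 fetch serve all of them.

```python
from collections import OrderedDict
from typing import Any, Callable, Dict, Hashable, Optional


class LRU:
    """Bounded in-process LRU, used here as the L1 tier."""

    def __init__(self, maxsize: int) -> None:
        self.maxsize = maxsize
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: Hashable, value: Any) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict the least recently used entry


class TieredCache:
    """L1 (in-process) -> L2 (per server) -> L3 (remote) lookup with promotion.

    None is treated as "missing", so None values cannot be cached.
    """

    def __init__(self, l1: LRU, l2: Dict[Hashable, Any],
                 l3_fetch: Callable[[Hashable], Optional[Any]]) -> None:
        self.l1, self.l2, self.l3_fetch = l1, l2, l3_fetch
        self.l3_calls = 0  # count of remote round trips, for illustration

    def get(self, key: Hashable) -> Optional[Any]:
        value = self.l1.get(key)
        if value is not None:
            return value                 # L1 hit: no I/O at all
        value = self.l2.get(key)
        if value is None:
            self.l3_calls += 1
            value = self.l3_fetch(key)   # remote round trip to the L3
            if value is None:
                return None
            self.l2[key] = value         # buffer in L2 for other local processes
        self.l1.put(key, value)          # promote into L1
        return value
```

Two `TieredCache` instances sharing the same L2 dict model two worker processes on one server: after the first one fetches a key from the L3, the second finds it in the L2 without another remote call.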
  • 10. Machine Learning: Uses logistic regression to predict P(click | impression) or P(install | click) from context features. An online solution that incrementally learns from the Real Time Bidding events just in time. Uses regularization and the hashing trick to explore a huge feature space and keep only the statistically most informative features.
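An online logistic regression with the hashing trick can be sketched as follows. This is an illustration, not Jampp's model: the class name, the 2^18-slot hash space, crc32 as the hash, the learning rate and the L1 strength are all assumptions. Each `update` call is one stochastic gradient step on log loss for a single event, followed by an L1 proximal (soft-thresholding) step that pushes the weights of uninformative hashed features to exactly zero, which is one way to keep only the most informative features.

```python
import math
import zlib
from typing import Dict, List


class HashedLogisticRegression:
    """Online logistic regression over hashed features, trained one event at a time."""

    def __init__(self, n_bits: int = 18, lr: float = 0.1, l1: float = 1e-4) -> None:
        self.dim = 1 << n_bits               # size of the hashed feature space
        self.w: List[float] = [0.0] * self.dim
        self.lr, self.l1 = lr, l1

    def _indices(self, features: Dict[str, str]) -> List[int]:
        # Hashing trick: "name=value" strings map straight to weight slots, so no
        # feature dictionary is kept. crc32 is stable across processes, unlike hash().
        return [zlib.crc32(f"{k}={v}".encode("utf-8")) % self.dim
                for k, v in features.items()]

    def predict(self, features: Dict[str, str]) -> float:
        """Estimated P(positive | features), e.g. P(click | impression)."""
        z = sum(self.w[i] for i in self._indices(features))
        z = max(-35.0, min(35.0, z))         # keep math.exp in range
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features: Dict[str, str], label: int) -> float:
        """One SGD step on log loss, then an L1 proximal step on the touched weights."""
        p = self.predict(features)
        grad = p - label                     # d(log loss)/dz for the logistic model
        shrink = self.lr * self.l1
        for i in self._indices(features):
            w = self.w[i] - self.lr * grad
            # Soft-thresholding: weights whose magnitude stays below `shrink` become 0.
            self.w[i] = math.copysign(max(abs(w) - shrink, 0.0), w)
        return p
```

Only the weights touched by the current event are regularized (a common lazy approximation), so memory and time per event stay proportional to the number of active features, not to the size of the hash space.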
  • 12. Stream Processing Architecture (details): Uses Amazon Kinesis for durable streaming data and AWS Lambda for data processing. DynamoDB serves as temporary data storage for enrichment and analytics. S3 provides a single source of truth for batch data applications. Decoupling data from processing lets multiple Big Data engines run on different clusters/infrastructure. Easy on-demand scaling by AWS™.
  • 13. Data Push: Pick your partition key so that data is distributed evenly across shards. The encoding protocol matters! MessagePack offered the best trade-off between compression and serialization speed.
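Kinesis assigns a record to a shard by MD5-hashing its partition key into a 128-bit integer and picking the shard whose hash key range contains it. The sketch below assumes shards that split that range evenly and reads the digest as a big-endian unsigned integer; `shard_for` and `distribution` are hypothetical helpers for checking a candidate key, not part of any AWS SDK.

```python
import hashlib
from collections import Counter
from typing import Iterable

HASH_SPACE = 2 ** 128  # MD5 digests are 128-bit integers


def shard_for(partition_key: str, n_shards: int) -> int:
    """Shard index for a key, assuming n_shards evenly split hash key ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    h = int.from_bytes(digest, "big")
    return h * n_shards // HASH_SPACE  # which of the n equal ranges h falls in


def distribution(keys: Iterable[str], n_shards: int) -> Counter:
    """Records per shard for a stream of partition keys."""
    return Counter(shard_for(k, n_shards) for k in keys)
```

With a high-cardinality key (for example, a per-event ID) 10,000 records land close to 2,500 per shard on 4 shards, whereas a single hot key such as one app ID sends every record to the same shard.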
  • 14. Data Processing and Enrichment: Write/read batching to reduce HTTPS protocol overhead and costs. Exponential backoff + jitter to reduce the impact of in-app event bursts sent by the tracking platforms. Increased the data retention period on the raw data streams from 1 day (the default) to 3 days.
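Batching plus exponential backoff with jitter can be sketched as below. The default batch size of 500 matches the per-call record limit of Kinesis `PutRecords`; `send` is a hypothetical stand-in for whatever client call performs one batched HTTPS write, and the delay rule is the "full jitter" variant (a random wait between 0 and the capped exponential delay), one common choice among several.

```python
import random
import time
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def batches(records: Sequence[T], size: int = 500) -> List[Sequence[T]]:
    """Split records so each HTTPS request carries up to `size` of them."""
    return [records[i:i + size] for i in range(0, len(records), size)]


def full_jitter_delay(attempt: int, base: float, cap: float, rng: random.Random) -> float:
    """Random wait in [0, min(cap, base * 2**attempt)]."""
    return rng.uniform(0.0, min(cap, base * 2 ** attempt))


def send_with_retries(send: Callable[[Sequence[T]], bool], batch: Sequence[T],
                      max_attempts: int = 5, base: float = 0.05, cap: float = 2.0,
                      rng: Optional[random.Random] = None,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `send` until it reports success, backing off between failures."""
    if rng is None:
        rng = random.Random()
    for attempt in range(max_attempts):
        if send(batch):
            return True
        if attempt < max_attempts - 1:
            # Exponentially growing, randomized waits keep a burst of throttled
            # writers from retrying in lockstep.
            sleep(full_jitter_delay(attempt, base, cap, rng))
    return False
```

A real Kinesis writer would also need to re-send only the records that `PutRecords` reports as failed, since a batched call can partially succeed.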
  • 15. Spark + Hadoop + PrestoDB = <3. Firehose provides real-time data ingestion to S3 with auto-scaling capabilities. An EMR cluster simplifies our data processing. Spark ETLs are executed by Airflow to enrich data, de-normalize it and convert JSON to Parquet. Spark Streaming for real-time anomaly detection and fraud prevention.
  • 16. Lessons Learned: Don't fuck with real time! (Caching and Cython to the rescue.) Rent first, build later. Development and staging for Big Data projects should involve production traffic, or be prepared for trouble. PrestoDB is really amazing in terms of performance, maturity and feature set. Kinesis, DynamoDB and Firehose use HTTPS as their transport protocol, which is slow and requires aggressive batching and exponential backoff + jitter. Having monitoring, logs and alerts managed by AWS CloudWatch oversimplifies production support.

Editor's Notes

  1. Jampp is an advertising technology company founded in 2013. We do both user acquisition and user engagement through real-time bidding (a.k.a. programmatic media buying).
  2. We built our own Demand-Side Platform (DSP) in Python, which processes 19B auctions per day (roughly 220,000 bid requests per second × 86,400 seconds).
  3. The bidder calculates the bid price based on 1) a machine learning model trained with stochastic gradient descent, which generates a decision tree that predicts the CTR and CVR, and 2) user groups generated from user activity in the app and the probability of generating revenue within the app (user engagement). After the user clicks on the ad and we redirect to the App Store/Google Play/deep link of our clients' apps, we lose context and become completely blind.
  4. All our clients use a tracking platform integrated with their app to track all in-app events (user activity).
  5. An in-process LRU serves all requests; µs-latency access is a lifesaver for real-time, latency-constrained workloads. A remote L3 provides the main cache storage and avoids multiplication of efforts. A local L2 in each server buffers responses from the L3, saves bandwidth to and from the L3 (3 MB/s x 230 servers x 8 procs = death) and decreases promotion latency to the L1. Precomputed, slow-changing bundles in S3 speed up loading of massive near-static data. Inter-process shared memory with mmap.
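The last two ideas in this note, precomputed near-static bundles and inter-process sharing through mmap, can be sketched with Python's standard `mmap` and `struct` modules. Everything specific here is an assumption: the fixed-width `(campaign_id, value)` record layout, the helper names, and the premise that the bundle has already been downloaded from S3 to local disk. Because the file is mapped read-only, every bidder process that maps it shares the same OS page-cache pages instead of holding its own copy.

```python
import mmap
import struct
from typing import Iterable, Optional, Tuple

# Hypothetical record layout: little-endian (campaign_id: uint64, value: float64).
RECORD = struct.Struct("<Qd")


def write_bundle(path: str, rows: Iterable[Tuple[int, float]]) -> None:
    """Write a sorted, fixed-width bundle so readers can binary-search it in place."""
    with open(path, "wb") as f:
        for campaign_id, value in sorted(rows):
            f.write(RECORD.pack(campaign_id, value))


class MappedBundle:
    """Read-only mmap view of a bundle; mapping an empty file raises ValueError."""

    def __init__(self, path: str) -> None:
        self._f = open(path, "rb")
        self._mm = mmap.mmap(self._f.fileno(), 0, access=mmap.ACCESS_READ)
        self.n = len(self._mm) // RECORD.size

    def lookup(self, campaign_id: int) -> Optional[float]:
        lo, hi = 0, self.n  # binary search over the sorted records, no copying
        while lo < hi:
            mid = (lo + hi) // 2
            key, value = RECORD.unpack_from(self._mm, mid * RECORD.size)
            if key == campaign_id:
                return value
            if key < campaign_id:
                lo = mid + 1
            else:
                hi = mid
        return None

    def close(self) -> None:
        self._mm.close()
        self._f.close()
```

Each worker process would open its own `MappedBundle` on the same path; the data is paged in once by the OS and then read at memory speed by all of them.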
  6. The Uber Engineering team did a great analysis comparing JSON, ujson, Protobuf, Thrift and MessagePack, and the winner was MessagePack.
  8. Exponential backoff >> no backoff, plus jitter (adding randomness). Increasing data retention costs $0.020 per shard hour, which is almost nothing compared to losing data.
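As a rough worked example of why the retention cost is "almost nothing", assuming the $0.020 per shard hour quoted in this note and a 30-day month:

$$
0.020\ \tfrac{\$}{\text{shard}\cdot\text{h}} \times 24\ \tfrac{\text{h}}{\text{day}} \times 30\ \text{days} = \$14.40\ \text{per shard per month}
$$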
  9. RDBMSs can fit a lot of use cases initially: unified log, OLAP, near-real-time processing (but they don't scale).