Presented at the Architecture Conference (ArqConf) in Buenos Aires, Argentina. Here is a 10,000-foot view of our Real-Time Bidding and Stream Processing architecture.
Jampp's Impressive Real-Time Bidding and Streaming Architecture
1. Patricio Rocca - August 2016
@patriciorocca
Real-time, scalable architectures
2. About Jampp
We are a tech company that helps companies
grow their mobile business by driving engaged
users to their apps
We are a team of 75 people, 30%
of whom are in engineering.
Located in 6 cities across the
US, Latin America, Europe and
Africa
Machine learning
Post-install event optimisation
Dynamic Product Ads and Segments
Data Science
Programmatic Buying
3. We process 220,000 bid requests per second
We process each bid request in less than 100ms
We manage 40 TB of data every day
We do real time machine learning
Jampp Architecture Impressive Facts
And… we are just a team of 22 nerds :) or :(
8. Bid Price = CPI * eCTR * eCVR * (1 - margin) * 1000 (see the sketch after this slide)
Python + Tornado + Cython + nginx (+ antigravity)
Caching, layers upon layers upon layers
Leaky bucket-ish feedback loop for pacing
With predictive local projections to account for imperfect and laggy
inter-server communication
Selective, aggregate logging
~25 TB of data generated per day makes naïve logging… unwise
Real Time Bidding Architecture (details)
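As a hedged illustration of the bid price formula above, here is a minimal Python sketch; the CPI, eCTR, eCVR and margin values in the example are made-up placeholders, not real campaign numbers.

def bid_price_cpm(cpi, ectr, ecvr, margin):
    """Bid Price = CPI * eCTR * eCVR * (1 - margin) * 1000.

    CPI is the price paid per install, eCTR and eCVR are the predicted
    click-through and conversion rates, and the final * 1000 turns the
    expected value per impression into a CPM (cost per mille) bid.
    """
    return cpi * ectr * ecvr * (1.0 - margin) * 1000.0

# Illustrative numbers only: a $2.00 CPI, 0.5% eCTR, 10% eCVR and a
# 30% margin yield a $0.70 CPM bid.
print(bid_price_cpm(cpi=2.0, ectr=0.005, ecvr=0.10, margin=0.30))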
9. In-process L1 (an LRU) serves all requests
µs-latency access is a lifesaver for real-time,
latency-constrained workloads
Local L2 in each server
Buffers responses from the L3
Saves bandwidth to/from the L3
(3 MB/s × 230 servers × 8 procs = death)
Decreases promotion latency to L1
Remote L3 provides the main distributed cache storage and avoids duplicating work across servers
Caching
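A minimal sketch of the L1/L2/L3 lookup path described above, assuming an in-process LRU for L1 and L2 and any object with a get() method (e.g. a memcached/Redis client wrapper) standing in for the remote L3; all names here are illustrative, not Jampp's actual code.

from collections import OrderedDict

class LRUCache:
    """Tiny in-process LRU cache (the L1 tier)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict least recently used

def cached_get(key, l1, l2, l3):
    """L1 -> L2 -> L3 lookup with promotion on the way back up.

    The local L2 buffers L3 responses so the 8 bidder processes on a
    box don't each hammer the remote cache, and promotion into L1
    keeps the hot path at in-process (microsecond) latency.
    """
    value = l1.get(key)
    if value is not None:
        return value
    value = l2.get(key)
    if value is None:
        value = l3.get(key)                  # remote distributed cache
        l2.put(key, value)                   # buffer the L3 response locally
    l1.put(key, value)                       # promote into the in-process tier
    return value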
10. Uses logistic regression to predict P(click | impression) or P(install | click) using context features
An online solution that incrementally learns from the Real Time Bidding events just in time
Uses regularization and the hashing trick to explore a huge feature space and keep only the statistically most informative features
Machine Learning
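A minimal sketch of what the slide describes: online logistic regression trained by SGD over a hashed feature space. The feature names, learning rate and regularization strength are illustrative assumptions; a production bidder would also hash cross-features and persist/share the weights.

import math

class OnlineLogistic:
    """Online logistic regression with the hashing trick.

    Each "key=value" feature string is hashed into one of 2**bits
    weight slots, so a huge feature space fits in a fixed-size array;
    L2 regularization shrinks uninformative weights toward zero.
    """
    def __init__(self, bits=20, lr=0.05, l2=1e-6):
        self.n = 1 << bits
        self.w = [0.0] * self.n
        self.lr = lr
        self.l2 = l2

    def _indices(self, features):
        # NOTE: hash() is per-process in Python 3; a stable hash (e.g.
        # hashlib) would be needed to share a model across servers.
        return [hash("%s=%s" % (k, v)) % self.n for k, v in features.items()]

    def predict(self, features):
        z = sum(self.w[i] for i in self._indices(features))
        z = max(min(z, 35.0), -35.0)         # clamp to avoid overflow
        return 1.0 / (1.0 + math.exp(-z))    # e.g. P(click | impression)

    def update(self, features, y):
        """One SGD step on a single labeled bid event (y in {0, 1})."""
        g = self.predict(features) - y       # gradient of the log loss
        for i in self._indices(features):
            self.w[i] -= self.lr * (g + self.l2 * self.w[i])

model = OnlineLogistic()
model.update({"os": "android", "hour": "14", "app": "abc"}, y=1)
print(model.predict({"os": "android", "hour": "14", "app": "abc"}))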
12. Stream Processing Architecture (details)
Uses Amazon Kinesis for durable streaming data and AWS Lambda for data processing
DynamoDB as temporary data storage for enrichment and analytics
S3 provides a Single Source of Truth for batch data applications
Decouples data from processing to enable multiple Big Data engines running on different clusters/infrastructure
Easy on-demand scaling via AWS
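A minimal sketch of the Kinesis -> Lambda -> DynamoDB leg of this pipeline, assuming MessagePack-encoded payloads (per the next slide) and a hypothetical events-enrichment table; the schema and names are not Jampp's actual ones.

import base64

import boto3
import msgpack

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("events-enrichment")  # hypothetical table name

def handler(event, context):
    """AWS Lambda entry point for a Kinesis event source.

    Kinesis hands Lambda its records base64-encoded; each decoded
    payload is written to DynamoDB for later enrichment/analytics.
    """
    with table.batch_writer() as batch:      # batches the PutItem calls
        for record in event["Records"]:
            payload = base64.b64decode(record["kinesis"]["data"])
            item = msgpack.unpackb(payload, raw=False)
            # NOTE: DynamoDB rejects Python floats; real code converts
            # numeric fields to Decimal first.
            batch.put_item(Item=item)
    return {"processed": len(event["Records"])}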
13. Data Push
Pick your partition key to distribute data evenly across shards
Encoding protocol matters! MessagePack offered the best trade-off between compression ratio and serialization speed
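A minimal producer-side sketch under those two constraints: MessagePack-encoded records batched into a single put_records call, with a high-cardinality partition key so no shard becomes a hot spot. The stream name and event shape are illustrative.

import uuid

import boto3
import msgpack

kinesis = boto3.client("kinesis")

def push_events(events, stream_name="bid-events"):  # hypothetical stream
    """Batch-push MessagePack-encoded events to Kinesis.

    A random UUID per record spreads data evenly across shards;
    keying on something skewed (e.g. app_id) would hot-spot a shard.
    put_records accepts at most 500 records / 5 MB per call.
    """
    records = [
        {
            "Data": msgpack.packb(event, use_bin_type=True),
            "PartitionKey": str(uuid.uuid4()),
        }
        for event in events
    ]
    response = kinesis.put_records(StreamName=stream_name, Records=records)
    return response["FailedRecordCount"]     # > 0 means retry those records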
14. Data Processing and Enrichment
Write/Read batching to reduce HTTPS protocol overhead and costs
Exponential backoff + jitter to reduce the impact of in-app event bursts sent by the tracking platforms
Increased the data retention period from 1 day (default) to 3 days on the raw data streams
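A minimal sketch of the retry policy the slide describes, using the "full jitter" variant of exponential backoff that AWS recommends; the retryable-error handling is deliberately simplified.

import random
import time

def with_backoff(call, retries=5, base=0.1, cap=10.0):
    """Retry `call` with exponential backoff and full jitter.

    Sleeping a random amount in [0, min(cap, base * 2**attempt)]
    instead of the full backoff value de-synchronizes the retry
    storms that bursts of in-app events would otherwise produce.
    """
    for attempt in range(retries):
        try:
            return call()
        except Exception:                    # real code: only throttling errors
            if attempt == retries - 1:
                raise
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))

# e.g. with_backoff(lambda: kinesis.put_records(StreamName=name, Records=batch))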
15. Spark + Hadoop + PrestoDB = <3
Firehose provides real-time data ingestion to S3 and auto-scaling capabilities
EMR Cluster simplifies our data processing
Spark ETLs, orchestrated by Airflow, enrich data, de-normalize it and convert JSON to Parquet
Spark Streaming for real-time anomaly
detection and fraud prevention
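A minimal sketch of the JSON-to-Parquet ETL step an Airflow task might submit to the EMR cluster; the S3 paths, the enrichment join and the partition column are placeholders, not Jampp's actual job.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("events-json-to-parquet").getOrCreate()

# Raw JSON events as delivered to S3 by Firehose (placeholder paths).
events = spark.read.json("s3://raw-bucket/events/dt=2016-08-01/")
apps = spark.read.parquet("s3://dim-bucket/apps/")

# Illustrative enrichment/de-normalization: derive columns, join dimensions.
enriched = (
    events
    .withColumn("event_hour", F.hour(F.col("timestamp").cast("timestamp")))
    .join(apps, on="app_id", how="left")
)

# Columnar Parquet is what makes the PrestoDB queries over S3 fast.
enriched.write.mode("overwrite").partitionBy("event_hour").parquet(
    "s3://clean-bucket/events/dt=2016-08-01/"
)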
16. Don't fuck with real time! (caching and Cython to the rescue)
Rent first, build later
Development and staging for Big Data projects should involve production traffic, or be prepared for trouble
PrestoDB is really amazing with regard to performance, maturity and feature set
Kinesis, DynamoDB and Firehose use HTTPS as their transport protocol, which is slow and requires aggressive batching plus exponential backoff + jitter
Monitoring, logs and alerts managed by AWS CloudWatch greatly simplify production support
Lessons Learned
Jampp is an advertising technology company founded in 2013. We do both user acquisition and user engagement through real-time bidding (a.k.a. programmatic media buying).
We built our own Demand Side Platform in Python, which processes 19B auctions per day.
The bidder calculates the bid price based on 1) a machine learning model (stochastic gradient descent) that predicts the CTR and CVR, and 2) user groups generated from user activity in the app and the probability of generating revenue within the app (user engagement).
After the user clicks on the ad and we redirect to the App Store/Google Play/a deeplink into our client's app, we lose context and go completely blind.
All our clients use a tracking platform integrated with their app to track all in-app events (user activity).
Precomputed slow-changing bundles in S3
Speeds up load of massive near-static data
Inter-process shared memory with mmap
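A minimal sketch of that last point, assuming a precomputed bundle already downloaded from S3 to local disk; mapping it read-only with mmap lets the 8 bidder processes on a box share one physical copy via the OS page cache. The path is illustrative.

import mmap

BUNDLE = "/var/cache/bidder/segments.bin"    # illustrative local path

def map_bundle(path=BUNDLE):
    """Map the precomputed bundle read-only into this process.

    Every process that maps the same file shares the same physical
    pages, so 8 bidder processes pay for one copy (and no per-process
    deserialization) instead of eight.
    """
    with open(path, "rb") as f:
        # The mapping stays valid after the file object is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

buf = map_bundle()
header = buf[:16]                            # slices read straight from shared pages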
The Uber Engineering Team published a great analysis comparing JSON, ujson, Protocol Buffers, Thrift and MessagePack; MessagePack came out the winner
Exponential backoff beats no backoff at all, and jitter (adding randomness) prevents retries from synchronizing
Increasing the data retention period costs $0.020 per shard-hour, which is almost nothing compared to losing data
RDBMSs can fit a lot of use cases initially: unified log, OLAP, near real-time processing (but they don't scale)