SlideShare a Scribd company logo
Building a real-time, scalable
and intelligent programmatic
ad buying platform
Martín Bonamico
Juan Martín Pampliega
Agenda
1. Jampp
2. Adtech, RTB, clicks, installs, events
3. Initial Architecture
4. Initial Architecture Characteristics
5. Evolution of Data Needs
6. New Data Infrastructure - Stream Processing
7. Key Take Aways
Jampp and AdTech
Jampp is a leading mobile app
marketing and retargeting platform.
Founded in 2013, Jampp has offices in San
Francisco, London, Berlin, Buenos Aires, São
Paulo and Cape Town.
We help companies grow their business by
seamlessly acquiring, engaging & retaining
mobile app users.
Jampp’s platform combines machine
learning with big data for programmatic ad
buying which optimizes towards in-app
activity.
Our platform processes +200,000 RTB ad bid
requests per second (17+ billions per day)
which amounts to about 300 MB/s or 25 TB
of data per day.
How does programmatic ads work?
DOWNLOAD
APP
Source /
Exchange
Jampp
Tracking
Platform
AppStore /
Google Play
App
Install
Postback
Postback
RTB: Real Time Bidding
Jampp Events
1. RTB:
a. Auction: the exchange asks if we want to bid for the
impression.
b. Bid/Non-Bid: bid with price or non-bid (less than 80ms).
c. Impression: the ad is displayed to the user.
2. Non-RTB:
a. Click: event that marks when the user clicks on the ad.
b. Install: install of the app on first app open.
c. Event: in app events like purchase, view, favorited.
Data @ Jampp
● Our platform started using RDBMSs and a
traditional Data Warehouse architecture on Amazon
Web Services.
● Data grew exponentially and data needs became
more complex.
● In the last year alone, 2500%+ in-app events and
500%+ RTB bids.
● This made us evolve our architecture to be able to
effectively handle Big Data.
Initial Data Architecture
C1
C2
Cn
Cupper
Load
Balancer
MySQL
Click
Install
Event
Click
Redirect
PostgreSQL
B1 B2 Bn
Replicator
API
(Pivots)
Auctions Bids Impressions
Initial Jampp Infrastructure
Jampp Initial Systems: Bidder
● OpenRTB bidding system implementation that runs on
200+ virtual machines with 70GB RAM each.
● Strong latency requirements. Less than 80ms to answer a
request.
● Written in Cython and uses ZMQ for communication.
● Heavy use of coherent caching to comply with latency
requirements.
● Data is continually replicated and enriched from MySQL
by the replicator process.
Jampp Initial Systems: Cupper
● Event tracking system written in Node.js.
● Tracks clicks, installs and in-app events. (200+
millions per day)
● Can be scaled horizontally (10 instances) and is
located behind a load balancer (ELB).
● Uses a MySQL database to store attributed events
and Kinesis to store organics.
Jampp Initial Systems: API
● PostgreSQL is used as a Data Warehouse database apart
from the use the bidder does.
● An API exposes the data for querying with a caching
layer.
● Fact tables are maintained with hourly, daily and
monthly granularity and high cardinality dimensions are
removed in large fact tables for data older than 15 days.
● Data is continually aggregated through an aggregation
process written in Python.
Evolution of the Data
Architecture
Emerging Needs
● Log forensics capabilities - as our systems and company
scale and we integrate with more outside systems.
● More historical and granular data for advanced analytics
and model training.
● The need to make the data readily available to other
systems outside from the traditional RDBMS arose. Some
of these systems are too demanding for RDBMS to
handle easily.
C1
C2
Cn
Cupper
Load
Balancer
MySQL
(Ruby)
Click
Install
Event
Click
Redirect
ELB
Logs
C1
C2
Cn
EMR - Hadoop Cluster
AirPal
Initial Evolution
New System Characteristics
● The new system was based on Amazon Elastic Map
Reduce.
● Data imported hourly from RDBMSs with Sqoop.
● Logs are imported every 10 minutes from different
sources to S3 tables.
● Facebook PrestoDB and Apache Spark are used for
interactive log and analytics.
New System Characteristics
● Scalable storage and processing capabilities using
HDFS, YARN and Hive for ETLs and data storage.
● Connectors from different languages like Python,
Julia and Java/Scala.
● Data archiving in S3 for long term storage and
enabling other data processing technologies.
Aspects that needed improvement
● Data still imported in batch mode. Delay was larger
for MySQL data than with Python replicator.
● EMR not great for long running clusters.
● The EMR cluster is not designed with strong multi-
user capabilities. It is better to have multiple
clusters with few users than a big one with many.
● Data still being accumulated in RDBMSs (clicks,
installs, events).
Final stage of the evolution
● Real-time event processing architecture based on
best practices for stream processing in AWS.
● Uses Amazon Kinesis for streaming data storage
and Amazon Lambda for data processing.
● DynamoDB and Redis are used for temporal data
storage for enrichment and analytics.
● S3 gives us a Source of Truth for batch data
applications and Kinesis for stream processing.
Our Real-Time Architecture
Still, it isn’t perfect...
● There is no easy way to manage windows and out or
order data with Amazon Lambda.
● Consistency of DynamoDB and S3.
● Price of AWS managed services for events with large
numbers compared to custom maintained solutions.
● ACID guarantees of RDBMs are not an easy thing to part
with.
● SQL and indexes in RDBMs make forensics easier.
Benefits of the Evolution
● Enables the use of stream processing frameworks to
keep data as fresh as economically possible.
● Decouples data from processing to enable multiple Big
Data engines running on different clusters/
infrastructure.
● Easy on demand scaling given by AWS managed tools
like AWS Lambda, AWS DynamoDB and AWS EMR.
● Monitoring, logs and alerts managed by AWS
Cloudwatch.
Big Data Technologies at Jampp
S3HDFS
Hadoop/YARN
Lambda
DynamoDB
Key Take Aways
● Ad tech is a technologically intensive market which
complies with the three Vs from Big Data.
● As the business’ data needs grows in complexity specialized
data systems need to be put in place.
● Using technologies that are meant to scale easily and are
managed by a third party can bring you peace of mind.
● Stream processing is fundamental in new Big Data Projects.
● There is currently no one tool that clearly fulfills all the
needs for scalable and correct stream processing.
References
http://radar.oreilly.com/2015/08/the-world-beyond-batch-
streaming-101.html
http://radar.oreilly.com/2015/08/the-world-beyond-batch-
streaming-102.html
https://engineering.linkedin.com/distributed-systems/log-what-
every-software-engineer-should-know-about-real-time-datas-
unifying
http://blog.confluent.io/2015/01/29/making-sense-of-stream-
processing/
JAMPP - AGRANDA 2015
http://44jaiio.sadio.org.
ar/sites/default/files/agranda14-30.pdf
Questions?
geeks.jampp.com
We Are Hiring! - jampp.com/jobs.php
martin.bonamico@jampp.com
juan@jampp.com

More Related Content

What's hot

Microservices Live
Microservices LiveMicroservices Live
Microservices Live
Data Driven Innovation
 
Getting It Right Exactly Once: Principles for Streaming Architectures
Getting It Right Exactly Once: Principles for Streaming ArchitecturesGetting It Right Exactly Once: Principles for Streaming Architectures
Getting It Right Exactly Once: Principles for Streaming Architectures
SingleStore
 
Analysing data analytics use cases to understand big data platform
Analysing data analytics use cases  to understand big data platformAnalysing data analytics use cases  to understand big data platform
Analysing data analytics use cases to understand big data platform
dataeaze systems
 
Introduction to Google Cloud Platform for Big Data - Trusted Conf
Introduction to Google Cloud Platform for Big Data - Trusted ConfIntroduction to Google Cloud Platform for Big Data - Trusted Conf
Introduction to Google Cloud Platform for Big Data - Trusted Conf
In Marketing We Trust
 
Netflix Big Data Paris 2017
Netflix Big Data Paris 2017Netflix Big Data Paris 2017
Netflix Big Data Paris 2017
Jason Flittner
 
Customer Experience at Disney+ Through Data Perspective
Customer Experience at Disney+ Through Data PerspectiveCustomer Experience at Disney+ Through Data Perspective
Customer Experience at Disney+ Through Data Perspective
Databricks
 
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan WaiteStructure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
Gigaom
 
The Impact of Always-on Connectivity for Geospatial Applications and Analysis
The Impact of Always-on Connectivity for Geospatial Applications and AnalysisThe Impact of Always-on Connectivity for Geospatial Applications and Analysis
The Impact of Always-on Connectivity for Geospatial Applications and Analysis
SingleStore
 
Zero Downtime App Deployment using Hadoop
Zero Downtime App Deployment using HadoopZero Downtime App Deployment using Hadoop
Zero Downtime App Deployment using Hadoop
DataWorks Summit/Hadoop Summit
 
Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...
Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...
Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...
HostedbyConfluent
 
Use Apache Gradle to Build and Automate KSQL and Kafka Streams (Stewart Bryso...
Use Apache Gradle to Build and Automate KSQL and Kafka Streams (Stewart Bryso...Use Apache Gradle to Build and Automate KSQL and Kafka Streams (Stewart Bryso...
Use Apache Gradle to Build and Automate KSQL and Kafka Streams (Stewart Bryso...
confluent
 
Winning the On-Demand Economy with Spark and Predictive Analytics
Winning the On-Demand Economy with Spark and Predictive AnalyticsWinning the On-Demand Economy with Spark and Predictive Analytics
Winning the On-Demand Economy with Spark and Predictive Analytics
SingleStore
 
Real-Time Geospatial Intelligence at Scale
Real-Time Geospatial Intelligence at Scale Real-Time Geospatial Intelligence at Scale
Real-Time Geospatial Intelligence at Scale
SingleStore
 
Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...
Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...
Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...
confluent
 
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
confluent
 
In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017
SingleStore
 
Stream Processing with Kafka in Uber, Danny Yuan
Stream Processing with Kafka in Uber, Danny Yuan Stream Processing with Kafka in Uber, Danny Yuan
Stream Processing with Kafka in Uber, Danny Yuan
confluent
 
Real Time Data Infrastructure team overview
Real Time Data Infrastructure team overviewReal Time Data Infrastructure team overview
Real Time Data Infrastructure team overview
Monal Daxini
 
The evolution of the big data platform @ Netflix (OSCON 2015)
The evolution of the big data platform @ Netflix (OSCON 2015)The evolution of the big data platform @ Netflix (OSCON 2015)
The evolution of the big data platform @ Netflix (OSCON 2015)
Eva Tse
 
Enabling Real-Time Analytics for IoT
Enabling Real-Time Analytics for IoTEnabling Real-Time Analytics for IoT
Enabling Real-Time Analytics for IoT
SingleStore
 

What's hot (20)

Microservices Live
Microservices LiveMicroservices Live
Microservices Live
 
Getting It Right Exactly Once: Principles for Streaming Architectures
Getting It Right Exactly Once: Principles for Streaming ArchitecturesGetting It Right Exactly Once: Principles for Streaming Architectures
Getting It Right Exactly Once: Principles for Streaming Architectures
 
Analysing data analytics use cases to understand big data platform
Analysing data analytics use cases  to understand big data platformAnalysing data analytics use cases  to understand big data platform
Analysing data analytics use cases to understand big data platform
 
Introduction to Google Cloud Platform for Big Data - Trusted Conf
Introduction to Google Cloud Platform for Big Data - Trusted ConfIntroduction to Google Cloud Platform for Big Data - Trusted Conf
Introduction to Google Cloud Platform for Big Data - Trusted Conf
 
Netflix Big Data Paris 2017
Netflix Big Data Paris 2017Netflix Big Data Paris 2017
Netflix Big Data Paris 2017
 
Customer Experience at Disney+ Through Data Perspective
Customer Experience at Disney+ Through Data PerspectiveCustomer Experience at Disney+ Through Data Perspective
Customer Experience at Disney+ Through Data Perspective
 
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan WaiteStructure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
 
The Impact of Always-on Connectivity for Geospatial Applications and Analysis
The Impact of Always-on Connectivity for Geospatial Applications and AnalysisThe Impact of Always-on Connectivity for Geospatial Applications and Analysis
The Impact of Always-on Connectivity for Geospatial Applications and Analysis
 
Zero Downtime App Deployment using Hadoop
Zero Downtime App Deployment using HadoopZero Downtime App Deployment using Hadoop
Zero Downtime App Deployment using Hadoop
 
Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...
Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...
Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...
 
Use Apache Gradle to Build and Automate KSQL and Kafka Streams (Stewart Bryso...
Use Apache Gradle to Build and Automate KSQL and Kafka Streams (Stewart Bryso...Use Apache Gradle to Build and Automate KSQL and Kafka Streams (Stewart Bryso...
Use Apache Gradle to Build and Automate KSQL and Kafka Streams (Stewart Bryso...
 
Winning the On-Demand Economy with Spark and Predictive Analytics
Winning the On-Demand Economy with Spark and Predictive AnalyticsWinning the On-Demand Economy with Spark and Predictive Analytics
Winning the On-Demand Economy with Spark and Predictive Analytics
 
Real-Time Geospatial Intelligence at Scale
Real-Time Geospatial Intelligence at Scale Real-Time Geospatial Intelligence at Scale
Real-Time Geospatial Intelligence at Scale
 
Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...
Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...
Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...
 
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
 
In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017
 
Stream Processing with Kafka in Uber, Danny Yuan
Stream Processing with Kafka in Uber, Danny Yuan Stream Processing with Kafka in Uber, Danny Yuan
Stream Processing with Kafka in Uber, Danny Yuan
 
Real Time Data Infrastructure team overview
Real Time Data Infrastructure team overviewReal Time Data Infrastructure team overview
Real Time Data Infrastructure team overview
 
The evolution of the big data platform @ Netflix (OSCON 2015)
The evolution of the big data platform @ Netflix (OSCON 2015)The evolution of the big data platform @ Netflix (OSCON 2015)
The evolution of the big data platform @ Netflix (OSCON 2015)
 
Enabling Real-Time Analytics for IoT
Enabling Real-Time Analytics for IoTEnabling Real-Time Analytics for IoT
Enabling Real-Time Analytics for IoT
 

Viewers also liked

Flowics - Jornada en Big Data 2016 - ITBA
Flowics - Jornada en Big Data 2016 - ITBA Flowics - Jornada en Big Data 2016 - ITBA
Flowics - Jornada en Big Data 2016 - ITBA
Andres Moratti
 
Promoting a Data Driven Culture in a Microservices Environment
Promoting a Data Driven Culture in a Microservices EnvironmentPromoting a Data Driven Culture in a Microservices Environment
Promoting a Data Driven Culture in a Microservices Environment
PyData
 
(BDT403) Best Practices for Building Real-time Streaming Applications with Am...
(BDT403) Best Practices for Building Real-time Streaming Applications with Am...(BDT403) Best Practices for Building Real-time Streaming Applications with Am...
(BDT403) Best Practices for Building Real-time Streaming Applications with Am...
Amazon Web Services
 
The role of Big Data and Modern Data Management in Driving a Customer 360 fro...
The role of Big Data and Modern Data Management in Driving a Customer 360 fro...The role of Big Data and Modern Data Management in Driving a Customer 360 fro...
The role of Big Data and Modern Data Management in Driving a Customer 360 fro...
Cloudera, Inc.
 
Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...
Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...
Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...
Helena Edelson
 
CRM@Oracle - Customer 360
CRM@Oracle - Customer 360CRM@Oracle - Customer 360
CRM@Oracle - Customer 360
tbOracleCRM
 
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
Amazon Web Services
 
(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift
(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift
(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift
Amazon Web Services
 
Using Big Data to Drive Customer 360
Using Big Data to Drive Customer 360Using Big Data to Drive Customer 360
Using Big Data to Drive Customer 360
Cloudera, Inc.
 

Viewers also liked (9)

Flowics - Jornada en Big Data 2016 - ITBA
Flowics - Jornada en Big Data 2016 - ITBA Flowics - Jornada en Big Data 2016 - ITBA
Flowics - Jornada en Big Data 2016 - ITBA
 
Promoting a Data Driven Culture in a Microservices Environment
Promoting a Data Driven Culture in a Microservices EnvironmentPromoting a Data Driven Culture in a Microservices Environment
Promoting a Data Driven Culture in a Microservices Environment
 
(BDT403) Best Practices for Building Real-time Streaming Applications with Am...
(BDT403) Best Practices for Building Real-time Streaming Applications with Am...(BDT403) Best Practices for Building Real-time Streaming Applications with Am...
(BDT403) Best Practices for Building Real-time Streaming Applications with Am...
 
The role of Big Data and Modern Data Management in Driving a Customer 360 fro...
The role of Big Data and Modern Data Management in Driving a Customer 360 fro...The role of Big Data and Modern Data Management in Driving a Customer 360 fro...
The role of Big Data and Modern Data Management in Driving a Customer 360 fro...
 
Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...
Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...
Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...
 
CRM@Oracle - Customer 360
CRM@Oracle - Customer 360CRM@Oracle - Customer 360
CRM@Oracle - Customer 360
 
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
 
(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift
(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift
(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift
 
Using Big Data to Drive Customer 360
Using Big Data to Drive Customer 360Using Big Data to Drive Customer 360
Using Big Data to Drive Customer 360
 

Similar to Building a real-time, scalable and intelligent programmatic ad buying platform

Big data Argentina meetup 2020-09: Intro to presto on docker
Big data Argentina meetup 2020-09: Intro to presto on dockerBig data Argentina meetup 2020-09: Intro to presto on docker
Big data Argentina meetup 2020-09: Intro to presto on docker
Federico Palladoro
 
Webinar: Enterprise Trends for Database-as-a-Service
Webinar: Enterprise Trends for Database-as-a-ServiceWebinar: Enterprise Trends for Database-as-a-Service
Webinar: Enterprise Trends for Database-as-a-Service
MongoDB
 
No SQL at The Guardian
No SQL at The GuardianNo SQL at The Guardian
No SQL at The Guardian
Mat Wall
 
NoSql presentation
NoSql presentationNoSql presentation
NoSql presentation
Mat Wall
 
NoSQL meetup July 2011
NoSQL meetup July 2011NoSQL meetup July 2011
NoSQL meetup July 2011
Shay Hassidim
 
Serverlessusecase workshop feb3_v2
Serverlessusecase workshop feb3_v2Serverlessusecase workshop feb3_v2
Serverlessusecase workshop feb3_v2
kartraj
 
November 2013 HUG: Real-time analytics with in-memory grid
November 2013 HUG: Real-time analytics with in-memory gridNovember 2013 HUG: Real-time analytics with in-memory grid
November 2013 HUG: Real-time analytics with in-memory gridYahoo Developer Network
 
Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...
Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...
Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...
Prolifics
 
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...
Amazon Web Services
 
Operational Intelligence Using Hadoop
Operational Intelligence Using HadoopOperational Intelligence Using Hadoop
Operational Intelligence Using HadoopDataWorks Summit
 
Cloud computing infrastructure
Cloud computing infrastructureCloud computing infrastructure
Cloud computing infrastructuresinhhn
 
IBM z/OS Version 2 Release 2 -- Fueling the digital enterprise
IBM z/OS Version 2 Release 2 -- Fueling the digital enterpriseIBM z/OS Version 2 Release 2 -- Fueling the digital enterprise
IBM z/OS Version 2 Release 2 -- Fueling the digital enterprise
Anderson Bassani
 
Serverless Design Patterns for Rethinking Traditional Enterprise Application ...
Serverless Design Patterns for Rethinking Traditional Enterprise Application ...Serverless Design Patterns for Rethinking Traditional Enterprise Application ...
Serverless Design Patterns for Rethinking Traditional Enterprise Application ...
Amazon Web Services
 
Giga Spaces Getting Ready For The Cloud
Giga Spaces   Getting Ready For The CloudGiga Spaces   Getting Ready For The Cloud
Giga Spaces Getting Ready For The Cloudchzesin
 
GigaSpaces - Getting Ready For The Cloud
GigaSpaces - Getting Ready For The CloudGigaSpaces - Getting Ready For The Cloud
GigaSpaces - Getting Ready For The Cloud
gigaspaces
 
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...
In-Memory Computing Summit
 
Using real time big data analytics for competitive advantage
 Using real time big data analytics for competitive advantage Using real time big data analytics for competitive advantage
Using real time big data analytics for competitive advantage
Amazon Web Services
 
Anz cics ts v5 technical update seminar intro (half day event)
Anz cics ts v5 technical update seminar intro (half day event)Anz cics ts v5 technical update seminar intro (half day event)
Anz cics ts v5 technical update seminar intro (half day event)
nick_garrod
 
Automating Big Data with the Automic Hadoop Agent
Automating Big Data with the Automic Hadoop AgentAutomating Big Data with the Automic Hadoop Agent
Automating Big Data with the Automic Hadoop Agent
CA | Automic Software
 
How to Succeed in Hadoop: comScore’s Deceptively Simple Secrets to Deploying ...
How to Succeed in Hadoop: comScore’s Deceptively Simple Secrets to Deploying ...How to Succeed in Hadoop: comScore’s Deceptively Simple Secrets to Deploying ...
How to Succeed in Hadoop: comScore’s Deceptively Simple Secrets to Deploying ...
MapR Technologies
 

Similar to Building a real-time, scalable and intelligent programmatic ad buying platform (20)

Big data Argentina meetup 2020-09: Intro to presto on docker
Big data Argentina meetup 2020-09: Intro to presto on dockerBig data Argentina meetup 2020-09: Intro to presto on docker
Big data Argentina meetup 2020-09: Intro to presto on docker
 
Webinar: Enterprise Trends for Database-as-a-Service
Webinar: Enterprise Trends for Database-as-a-ServiceWebinar: Enterprise Trends for Database-as-a-Service
Webinar: Enterprise Trends for Database-as-a-Service
 
No SQL at The Guardian
No SQL at The GuardianNo SQL at The Guardian
No SQL at The Guardian
 
NoSql presentation
NoSql presentationNoSql presentation
NoSql presentation
 
NoSQL meetup July 2011
NoSQL meetup July 2011NoSQL meetup July 2011
NoSQL meetup July 2011
 
Serverlessusecase workshop feb3_v2
Serverlessusecase workshop feb3_v2Serverlessusecase workshop feb3_v2
Serverlessusecase workshop feb3_v2
 
November 2013 HUG: Real-time analytics with in-memory grid
November 2013 HUG: Real-time analytics with in-memory gridNovember 2013 HUG: Real-time analytics with in-memory grid
November 2013 HUG: Real-time analytics with in-memory grid
 
Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...
Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...
Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...
 
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...
 
Operational Intelligence Using Hadoop
Operational Intelligence Using HadoopOperational Intelligence Using Hadoop
Operational Intelligence Using Hadoop
 
Cloud computing infrastructure
Cloud computing infrastructureCloud computing infrastructure
Cloud computing infrastructure
 
IBM z/OS Version 2 Release 2 -- Fueling the digital enterprise
IBM z/OS Version 2 Release 2 -- Fueling the digital enterpriseIBM z/OS Version 2 Release 2 -- Fueling the digital enterprise
IBM z/OS Version 2 Release 2 -- Fueling the digital enterprise
 
Serverless Design Patterns for Rethinking Traditional Enterprise Application ...
Serverless Design Patterns for Rethinking Traditional Enterprise Application ...Serverless Design Patterns for Rethinking Traditional Enterprise Application ...
Serverless Design Patterns for Rethinking Traditional Enterprise Application ...
 
Giga Spaces Getting Ready For The Cloud
Giga Spaces   Getting Ready For The CloudGiga Spaces   Getting Ready For The Cloud
Giga Spaces Getting Ready For The Cloud
 
GigaSpaces - Getting Ready For The Cloud
GigaSpaces - Getting Ready For The CloudGigaSpaces - Getting Ready For The Cloud
GigaSpaces - Getting Ready For The Cloud
 
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...
IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligenc...
 
Using real time big data analytics for competitive advantage
 Using real time big data analytics for competitive advantage Using real time big data analytics for competitive advantage
Using real time big data analytics for competitive advantage
 
Anz cics ts v5 technical update seminar intro (half day event)
Anz cics ts v5 technical update seminar intro (half day event)Anz cics ts v5 technical update seminar intro (half day event)
Anz cics ts v5 technical update seminar intro (half day event)
 
Automating Big Data with the Automic Hadoop Agent
Automating Big Data with the Automic Hadoop AgentAutomating Big Data with the Automic Hadoop Agent
Automating Big Data with the Automic Hadoop Agent
 
How to Succeed in Hadoop: comScore’s Deceptively Simple Secrets to Deploying ...
How to Succeed in Hadoop: comScore’s Deceptively Simple Secrets to Deploying ...How to Succeed in Hadoop: comScore’s Deceptively Simple Secrets to Deploying ...
How to Succeed in Hadoop: comScore’s Deceptively Simple Secrets to Deploying ...
 

Recently uploaded

Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Globus
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke
 
Nidhi Software Price. Fact , Costs, Tips
Nidhi Software Price. Fact , Costs, TipsNidhi Software Price. Fact , Costs, Tips
Nidhi Software Price. Fact , Costs, Tips
vrstrong314
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
timtebeek1
 
Game Development with Unity3D (Game Development lecture 3)
Game Development  with Unity3D (Game Development lecture 3)Game Development  with Unity3D (Game Development lecture 3)
Game Development with Unity3D (Game Development lecture 3)
abdulrafaychaudhry
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Mind IT Systems
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Neo4j
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
XfilesPro
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
Fermin Galan
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
Google
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
Aftab Hussain
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
Shane Coughlan
 
Enterprise Software Development with No Code Solutions.pptx
Enterprise Software Development with No Code Solutions.pptxEnterprise Software Development with No Code Solutions.pptx
Enterprise Software Development with No Code Solutions.pptx
QuickwayInfoSystems3
 
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptxText-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
ShamsuddeenMuhammadA
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
Juraj Vysvader
 
AI Genie Review: World’s First Open AI WordPress Website Creator
AI Genie Review: World’s First Open AI WordPress Website CreatorAI Genie Review: World’s First Open AI WordPress Website Creator
AI Genie Review: World’s First Open AI WordPress Website Creator
Google
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Globus
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 

Recently uploaded (20)

Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
 
Nidhi Software Price. Fact , Costs, Tips
Nidhi Software Price. Fact , Costs, TipsNidhi Software Price. Fact , Costs, Tips
Nidhi Software Price. Fact , Costs, Tips
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
 
Game Development with Unity3D (Game Development lecture 3)
Game Development  with Unity3D (Game Development lecture 3)Game Development  with Unity3D (Game Development lecture 3)
Game Development with Unity3D (Game Development lecture 3)
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
 
Enterprise Software Development with No Code Solutions.pptx
Enterprise Software Development with No Code Solutions.pptxEnterprise Software Development with No Code Solutions.pptx
Enterprise Software Development with No Code Solutions.pptx
 
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptxText-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
 
AI Genie Review: World’s First Open AI WordPress Website Creator
AI Genie Review: World’s First Open AI WordPress Website CreatorAI Genie Review: World’s First Open AI WordPress Website Creator
AI Genie Review: World’s First Open AI WordPress Website Creator
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 

Building a real-time, scalable and intelligent programmatic ad buying platform

  • 1. Building a real-time, scalable and intelligent programmatic ad buying platform Martín Bonamico Juan Martín Pampliega
  • 2. Agenda 1. Jampp 2. Adtech, RTB, clicks, installs, events 3. Initial Architecture 4. Initial Architecture Characteristics 5. Evolution of Data Needs 6. New Data Infrastructure - Stream Processing 7. Key Take Aways
  • 4. Jampp is a leading mobile app marketing and retargeting platform. Founded in 2013, Jampp has offices in San Francisco, London, Berlin, Buenos Aires, São Paulo and Cape Town. We help companies grow their business by seamlessly acquiring, engaging & retaining mobile app users.
  • 5. Jampp’s platform combines machine learning with big data for programmatic ad buying which optimizes towards in-app activity. Our platform processes +200,000 RTB ad bid requests per second (17+ billions per day) which amounts to about 300 MB/s or 25 TB of data per day.
  • 6. How does programmatic ads work? DOWNLOAD APP Source / Exchange Jampp Tracking Platform AppStore / Google Play App Install Postback Postback
  • 7. RTB: Real Time Bidding
  • 8. Jampp Events 1. RTB: a. Auction: the exchange asks if we want to bid for the impression. b. Bid/Non-Bid: bid with price or non-bid (less than 80ms). c. Impression: the ad is displayed to the user. 2. Non-RTB: a. Click: event that marks when the user clicks on the ad. b. Install: install of the app on first app open. c. Event: in app events like purchase, view, favorited.
  • 9. Data @ Jampp ● Our platform started using RDBMSs and a traditional Data Warehouse architecture on Amazon Web Services. ● Data grew exponentially and data needs became more complex. ● In the last year alone, 2500%+ in-app events and 500%+ RTB bids. ● This made us evolve our architecture to be able to effectively handle Big Data.
  • 12. Jampp Initial Systems: Bidder ● OpenRTB bidding system implementation that runs on 200+ virtual machines with 70GB RAM each. ● Strong latency requirements. Less than 80ms to answer a request. ● Written in Cython and uses ZMQ for communication. ● Heavy use of coherent caching to comply with latency requirements. ● Data is continually replicated and enriched from MySQL by the replicator process.
  • 13. Jampp Initial Systems: Cupper ● Event tracking system written in Node.js. ● Tracks clicks, installs and in-app events. (200+ millions per day) ● Can be scaled horizontally (10 instances) and is located behind a load balancer (ELB). ● Uses a MySQL database to store attributed events and Kinesis to store organics.
  • 14. Jampp Initial Systems: API ● PostgreSQL is used as a Data Warehouse database apart from the use the bidder does. ● An API exposes the data for querying with a caching layer. ● Fact tables are maintained with hourly, daily and monthly granularity and high cardinality dimensions are removed in large fact tables for data older than 15 days. ● Data is continually aggregated through an aggregation process written in Python.
  • 15. Evolution of the Data Architecture
  • 16. Emerging Needs ● Log forensics capabilities - as our systems and company scale and we integrate with more outside systems. ● More historical and granular data for advanced analytics and model training. ● The need to make the data readily available to other systems outside from the traditional RDBMS arose. Some of these systems are too demanding for RDBMS to handle easily.
  • 18. New System Characteristics ● The new system was based on Amazon Elastic Map Reduce. ● Data imported hourly from RDBMSs with Sqoop. ● Logs are imported every 10 minutes from different sources to S3 tables. ● Facebook PrestoDB and Apache Spark are used for interactive log and analytics.
  • 19. New System Characteristics ● Scalable storage and processing capabilities using HDFS, YARN and Hive for ETLs and data storage. ● Connectors from different languages like Python, Julia and Java/Scala. ● Data archiving in S3 for long term storage and enabling other data processing technologies.
  • 20. Aspects that needed improvement ● Data still imported in batch mode. Delay was larger for MySQL data than with Python replicator. ● EMR not great for long running clusters. ● The EMR cluster is not designed with strong multi- user capabilities. It is better to have multiple clusters with few users than a big one with many. ● Data still being accumulated in RDBMSs (clicks, installs, events).
  • 21. Final stage of the evolution ● Real-time event processing architecture based on best practices for stream processing in AWS. ● Uses Amazon Kinesis for streaming data storage and Amazon Lambda for data processing. ● DynamoDB and Redis are used for temporal data storage for enrichment and analytics. ● S3 gives us a Source of Truth for batch data applications and Kinesis for stream processing.
  • 23. Still, it isn’t perfect... ● There is no easy way to manage windows and out or order data with Amazon Lambda. ● Consistency of DynamoDB and S3. ● Price of AWS managed services for events with large numbers compared to custom maintained solutions. ● ACID guarantees of RDBMs are not an easy thing to part with. ● SQL and indexes in RDBMs make forensics easier.
  • 24. Benefits of the Evolution ● Enables the use of stream processing frameworks to keep data as fresh as economically possible. ● Decouples data from processing to enable multiple Big Data engines running on different clusters/ infrastructure. ● Easy on demand scaling given by AWS managed tools like AWS Lambda, AWS DynamoDB and AWS EMR. ● Monitoring, logs and alerts managed by AWS Cloudwatch.
  • 25. Big Data Technologies at Jampp S3HDFS Hadoop/YARN Lambda DynamoDB
  • 26. Key Take Aways ● Ad tech is a technologically intensive market which complies with the three Vs from Big Data. ● As the business’ data needs grows in complexity specialized data systems need to be put in place. ● Using technologies that are meant to scale easily and are managed by a third party can bring you peace of mind. ● Stream processing is fundamental in new Big Data Projects. ● There is currently no one tool that clearly fulfills all the needs for scalable and correct stream processing.
  • 28. Questions? geeks.jampp.com We Are Hiring! - jampp.com/jobs.php martin.bonamico@jampp.com juan@jampp.com