Copyright © 2015 Improve Digital - All Rights Reserved
Example Use of Samza at Improve Digital
Garry Turkington CTO
g.turkington@improvedigital.com
@garryturk
Improve Digital: About Us
• Cloud-based real-time advertising technology
• Focus on the premium publisher / media owner
• Integrations with thousands of demand partners
• Decisions driven by real-time data
• Offices in UK, NL, DE & ES
• 100+ premium publishers
Data Sources
Why Samza Is Useful
• The ad server fleet generates a large volume of data
• We need to get at it as quickly as possible
• Kafka was a natural hub for that data
• Samza was a natural extension for processing its messages
• Used for both single-message notifications and aggregations
Samza Integration
• A Samza job runs within multiple YARN containers
• Each container runs multiple tasks
• Task instances run the job's custom code
• Each task is the dedicated consumer of a single partition in each of its input topics
• Samza checkpoints each task's progress (its input offsets) so the task can recover after a failure or restart
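To make the task/partition relationship concrete, here is a minimal, self-contained sketch (not Samza code) of partition-based grouping: task N is assigned partition N of every input topic, so one task sees, for example, the impression and bid records from the same partition number. The class and method names are hypothetical; Samza performs this grouping internally. The sketch assumes every input topic has the same number of partitions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of partition-based task grouping; not Samza's implementation.
final class PartitionGrouping {

    /** One input partition: a topic name plus a partition number. */
    static final class TopicPartition {
        final String topic;
        final int partition;

        TopicPartition(String topic, int partition) {
            this.topic = topic;
            this.partition = partition;
        }

        @Override
        public String toString() {
            return topic + "-" + partition;
        }
    }

    /**
     * Assigns partition N of every input topic to task N.
     * Assumes all input topics have the same partition count.
     */
    static Map<Integer, List<TopicPartition>> assign(List<String> topics, int partitionsPerTopic) {
        Map<Integer, List<TopicPartition>> tasks = new LinkedHashMap<>();
        for (int p = 0; p < partitionsPerTopic; p++) {
            List<TopicPartition> inputs = new ArrayList<>();
            for (String topic : topics) {
                inputs.add(new TopicPartition(topic, p));
            }
            tasks.put(p, inputs);
        }
        return tasks;
    }

    private PartitionGrouping() {}
}
```

With two hypothetical topics, imp-logs and bid-logs, of three partitions each, this yields three tasks, and task 1 consumes imp-logs-1 and bid-logs-1. This is why keying records consistently at ingest matters: everything with the same key lands in one partition, and therefore in one task.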
Samza API
Every Samza task implements the StreamTask interface, which has one method:
void process(
    IncomingMessageEnvelope envelope,
    MessageCollector collector,
    TaskCoordinator coordinator) throws Exception;
Samza API (continued)
Optionally, the InitableTask and WindowableTask interfaces can also be implemented:
// InitableTask
void init(Config config, TaskContext context) throws Exception;
// WindowableTask
void window(MessageCollector collector, TaskCoordinator coordinator) throws Exception;
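The sketch below mirrors the call pattern of these three interfaces so it can run without Samza on the classpath. The interfaces SimpleStreamTask, SimpleInitableTask, SimpleWindowableTask and SimpleCollector are simplified, hypothetical stand-ins, not the org.apache.samza.task types. The task counts messages per key in process() and emits and resets the counts each time window() fires; in a real Samza job the window interval is set with the task.window.ms configuration property.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for Samza's StreamTask/InitableTask/WindowableTask shapes.
// These are NOT the org.apache.samza.task interfaces; they only mirror the call pattern.
interface SimpleCollector {
    void send(String stream, String key, Object value);
}

interface SimpleStreamTask {
    void process(String key, Object message, SimpleCollector collector);
}

interface SimpleInitableTask {
    void init(Map<String, String> config);
}

interface SimpleWindowableTask {
    void window(SimpleCollector collector);
}

/** Counts messages per key and flushes the counts each time window() fires. */
class KeyCountTask implements SimpleStreamTask, SimpleInitableTask, SimpleWindowableTask {
    private final Map<String, Long> counts = new HashMap<>();
    private String outputStream;

    @Override
    public void init(Map<String, String> config) {
        // Hypothetical config key; falls back to a default output stream name.
        outputStream = config.getOrDefault("output.stream", "key-counts");
    }

    @Override
    public void process(String key, Object message, SimpleCollector collector) {
        // Only update local state here; nothing is emitted per message.
        counts.merge(key, 1L, Long::sum);
    }

    @Override
    public void window(SimpleCollector collector) {
        // Emit one record per key, then start a fresh window.
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            collector.send(outputStream, e.getKey(), e.getValue());
        }
        counts.clear();
    }
}
```

Splitting the work this way (cheap state updates in process(), periodic output in window()) is the general shape of the aggregation jobs described in the steps below.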
Example Problem Statement
For each client, produce 1-minute aggregates of:
• Timestamp
• Client
• Number of impressions (ads won)
• Number of bids received
• Amount of budget spent
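One way to model the output record described above is a small value class keyed by the start of its minute. This is an illustrative sketch: the field names, the use of epoch milliseconds, and representing spend as a long count of micro-units of currency are assumptions, not details from the talk.

```java
// Hypothetical model of one per-client, per-minute aggregate record.
final class MinuteAggregate {
    static final long MINUTE_MS = 60_000L;

    final long minuteStartMs;   // timestamp truncated to the start of its minute
    final String clientId;
    long impressions;           // ads won
    long bidsReceived;
    long spendMicros;           // budget spent, in millionths of a currency unit (assumption)

    MinuteAggregate(long minuteStartMs, String clientId) {
        this.minuteStartMs = minuteStartMs;
        this.clientId = clientId;
    }

    /** Truncates an epoch-millisecond timestamp to the start of its minute. */
    static long minuteOf(long timestampMs) {
        return Math.floorDiv(timestampMs, MINUTE_MS) * MINUTE_MS;
    }

    /** Folds another aggregate for the same client and minute into this one. */
    void add(MinuteAggregate other) {
        if (other.minuteStartMs != minuteStartMs || !other.clientId.equals(clientId)) {
            throw new IllegalArgumentException("different client or minute");
        }
        impressions += other.impressions;
        bidsReceived += other.bidsReceived;
        spendMicros += other.spendMicros;
    }
}
```

Storing spend as an integer number of micro-units keeps the sums exact, which matters when many small amounts are added together.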
Step 1: Getting Data Into Kafka
• Log files are pushed into S3, which generates SQS notifications
• An ingest application reads the notifications and pulls the files
• It then splits the files and pushes the records into Kafka
• It ensures each record gets the correct partition key
• (Hold the obvious question about where the ingest app lives…)
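A minimal sketch of why the ingest app must choose the partition key carefully: if the key is the client ID, every record for a given client lands in the same partition, so a single downstream task sees all of that client's data. The tab-separated log format, the client-ID column position, and the hashing (String.hashCode with Math.floorMod) are illustrative assumptions; Kafka's default partitioner hashes the key bytes with murmur2, so real partition numbers will differ.

```java
// Hypothetical ingest-side routing: choose the Kafka partition key for each log record.
// The log format (tab-separated, client ID in a given column) is an assumption, and the
// hash is NOT Kafka's default murmur2 partitioner; it only illustrates the idea.
final class ClientPartitioner {
    private final int numPartitions;
    private final int clientColumn;

    ClientPartitioner(int numPartitions, int clientColumn) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        this.numPartitions = numPartitions;
        this.clientColumn = clientColumn;
    }

    /** Extracts the client ID from a tab-separated log line to use as the partition key. */
    String keyOf(String logLine) {
        String[] fields = logLine.split("\t", -1);
        if (clientColumn >= fields.length) {
            throw new IllegalArgumentException("missing client column: " + logLine);
        }
        return fields[clientColumn];
    }

    /** Maps a key to a partition; the same key always yields the same partition. */
    int partitionFor(String key) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(key.hashCode(), numPartitions);
    }
}
```

For example, an impression line and a bid line for the same client both yield the key client-42 and therefore the same partition.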
Getting Data Into Kafka
[Diagram: ad server logs are written to S3; SQS notifications prompt the ingest app to GET the logs and write log records to Kafka]
Step 2: Top-Level Aggregation
• The raw ingest rate is very high
• So a first job performs 10-second aggregations on the input
• It also does some column filtering and cleaning
• Not strictly necessary, but the output proved very useful
• It also helps reduce YARN resource requirements
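A minimal sketch of this first reduction, assuming raw records carry a timestamp, a client ID, a spend value and extra columns the job does not need: records without a client are dropped, unneeded columns are discarded, and the remaining records are bucketed into 10-second windows per client with counts and spend summed. RawRecord, BucketAgg and the bucket key format are hypothetical. For clarity the sketch reduces a batch; in a Samza job the same update would happen per message in process(), with completed buckets emitted from window().

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical 10-second reduction: filter columns, then sum per (bucket, client).
final class TenSecondReducer {
    static final long BUCKET_MS = 10_000L;

    /** A raw record; 'userAgent' stands in for columns the job drops. */
    static final class RawRecord {
        final long timestampMs;
        final String clientId;
        final long spendMicros;
        final String userAgent;

        RawRecord(long timestampMs, String clientId, long spendMicros, String userAgent) {
            this.timestampMs = timestampMs;
            this.clientId = clientId;
            this.spendMicros = spendMicros;
            this.userAgent = userAgent;
        }
    }

    /** The reduced output: only key fields plus summed measures. */
    static final class BucketAgg {
        final long bucketStartMs;
        final String clientId;
        long count;
        long spendMicros;

        BucketAgg(long bucketStartMs, String clientId) {
            this.bucketStartMs = bucketStartMs;
            this.clientId = clientId;
        }
    }

    static List<BucketAgg> reduce(List<RawRecord> records) {
        // LinkedHashMap preserves first-seen order, so output order is deterministic.
        Map<String, BucketAgg> byKey = new LinkedHashMap<>();
        for (RawRecord r : records) {
            if (r.clientId == null || r.clientId.isEmpty()) {
                continue; // basic cleaning: skip records without a client
            }
            long bucket = Math.floorDiv(r.timestampMs, BUCKET_MS) * BUCKET_MS;
            String key = bucket + "|" + r.clientId;
            BucketAgg agg = byKey.get(key);
            if (agg == null) {
                agg = new BucketAgg(bucket, r.clientId);
                byKey.put(key, agg);
            }
            agg.count++;
            agg.spendMicros += r.spendMicros;
        }
        return new ArrayList<>(byKey.values());
    }
}
```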
Top Level Aggregation
[Diagram: Imp Logs → Impression Reduction; Bid Logs → Bids Reduction]
Step 3: Main Aggregation
• The first step that is specific to this use case
• Combines the 10-second aggregates into per-minute aggregates
• Importantly, also reduces the data to a much smaller set of key fields
• Builds the aggregates using persistent task state
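A sketch of the per-minute roll-up, assuming the 10-second aggregates arrive as (bucket start, client, count, spend). A plain in-memory map stands in for Samza's persistent key-value state, and the key format and flush rule are assumptions: a minute is treated as complete once data for a later minute has been seen. That rule only works if input is roughly time-ordered; a late record for an already-flushed minute would start a new partial entry.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-minute roll-up of 10-second aggregates.
// 'state' stands in for a Samza persistent key-value store; here it is just an in-memory map.
final class MinuteRollup {
    static final long MINUTE_MS = 60_000L;

    static final class Agg {
        final long minuteStartMs;
        final String clientId;
        long count;
        long spendMicros;

        Agg(long minuteStartMs, String clientId) {
            this.minuteStartMs = minuteStartMs;
            this.clientId = clientId;
        }
    }

    private final Map<String, Agg> state = new LinkedHashMap<>();
    private long latestMinuteMs = Long.MIN_VALUE;

    /** Folds one 10-second aggregate into its minute's running total (like process()). */
    void add(long bucketStartMs, String clientId, long count, long spendMicros) {
        long minute = Math.floorDiv(bucketStartMs, MINUTE_MS) * MINUTE_MS;
        String key = minute + "|" + clientId;
        Agg agg = state.get(key);
        if (agg == null) {
            agg = new Agg(minute, clientId);
            state.put(key, agg);
        }
        agg.count += count;
        agg.spendMicros += spendMicros;
        latestMinuteMs = Math.max(latestMinuteMs, minute);
    }

    /** Emits and removes every minute older than the newest minute seen (like window()). */
    List<Agg> flushCompleted() {
        List<Agg> done = new ArrayList<>();
        Iterator<Agg> it = state.values().iterator();
        while (it.hasNext()) {
            Agg agg = it.next();
            if (agg.minuteStartMs < latestMinuteMs) {
                done.add(agg);
                it.remove();
            }
        }
        return done;
    }
}
```

Keeping the running totals in persistent state rather than a plain field is what lets a restarted task pick up a partially built minute instead of losing it.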
Main Aggregation
[Diagram: Imp Logs → Impression Reduction → Impression Aggregation → 1 min imp aggs; Bid Logs → Bids Reduction → Bids Aggregation → 1 min bid aggs]
Step 4: Crossing the Streams
• The next job in the chain creates composite structures
• The impression and bid aggregates are combined into single records
• Heavily uses Samza persistent state
• A data object is stored for each minute, and its fields are filled from both streams
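A sketch of the combiner, assuming both input streams are keyed by minute and client: each arriving impression or bid aggregate fills its half of a stored composite object, and the composite is released once both halves are present. The Composite class, the HashMap standing in for Samza's persistent store, which stream carries spend, and the "emit when both sides have arrived" rule are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stream combiner; the HashMap stands in for a Samza persistent store.
final class StreamCombiner {

    static final class Composite {
        final long minuteStartMs;
        final String clientId;
        Long impressions;   // null until the impression aggregate arrives
        Long spendMicros;
        Long bidsReceived;  // null until the bid aggregate arrives

        Composite(long minuteStartMs, String clientId) {
            this.minuteStartMs = minuteStartMs;
            this.clientId = clientId;
        }

        boolean isComplete() {
            return impressions != null && bidsReceived != null;
        }
    }

    private final Map<String, Composite> store = new HashMap<>();

    private static String key(long minuteStartMs, String clientId) {
        return minuteStartMs + "|" + clientId;
    }

    private Composite lookup(long minuteStartMs, String clientId) {
        return store.computeIfAbsent(key(minuteStartMs, clientId),
                k -> new Composite(minuteStartMs, clientId));
    }

    /** Handles a 1-minute impression aggregate; returns the composite if now complete, else null. */
    Composite onImpressionAgg(long minuteStartMs, String clientId, long impressions, long spendMicros) {
        Composite c = lookup(minuteStartMs, clientId);
        c.impressions = impressions;
        c.spendMicros = spendMicros;
        return completeOrNull(c);
    }

    /** Handles a 1-minute bid aggregate; returns the composite if now complete, else null. */
    Composite onBidAgg(long minuteStartMs, String clientId, long bidsReceived) {
        Composite c = lookup(minuteStartMs, clientId);
        c.bidsReceived = bidsReceived;
        return completeOrNull(c);
    }

    private Composite completeOrNull(Composite c) {
        if (!c.isComplete()) {
            return null;
        }
        store.remove(key(c.minuteStartMs, c.clientId));
        return c;
    }

    /** Number of composites still waiting for their other half. */
    int pending() {
        return store.size();
    }
}
```

The order in which the two streams deliver a given minute does not matter here: whichever half arrives second completes the record.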
Crossing the Streams
[Diagram: Imp Logs → Impression Reduction → Impression Aggregation → 1 min imp aggs; Bid Logs → Bids Reduction → Bids Aggregation → 1 min bid aggs; both 1 min streams → Combiner → Composite records]
Reference Data
This does not yet give us revenue data converted to each client's currency.
To perform the conversion we need reference data:
• Exchange rate information
• The preferred currency for each client
• We use Samza bootstrap streams to inject this information
• Then update it as new data arrives
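A sketch of how the combiner might apply this reference data. Samza bootstrap streams are consumed to their end before the task processes its other inputs, so the rate and client-currency tables are populated first and then kept current by later updates. The class and method names, the convention that every rate is expressed relative to a single base currency, and the use of BigDecimal with six decimal places are assumptions.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.HashMap;
import java.util.Map;

// Hypothetical reference-data lookup fed by bootstrap streams; the maps stand in for local state.
final class CurrencyConverter {
    // Units of each currency per one unit of an assumed base currency (the base maps to 1).
    private final Map<String, BigDecimal> ratesFromBase = new HashMap<>();
    private final Map<String, String> clientCurrency = new HashMap<>();

    /** Applies an exchange-rate update (from the bootstrap read, then as new data arrives). */
    void onRate(String currency, BigDecimal unitsPerBase) {
        ratesFromBase.put(currency, unitsPerBase);
    }

    /** Applies a client preferred-currency update. */
    void onClientCurrency(String clientId, String currency) {
        clientCurrency.put(clientId, currency);
    }

    /** Converts an amount in 'fromCurrency' into the client's preferred currency. */
    BigDecimal toClientCurrency(String clientId, BigDecimal amount, String fromCurrency) {
        String target = clientCurrency.get(clientId);
        if (target == null) {
            throw new IllegalStateException("no preferred currency for " + clientId);
        }
        BigDecimal fromRate = rate(fromCurrency);
        BigDecimal toRate = rate(target);
        // amount / fromRate gives base units; multiplying by toRate gives target units.
        return amount.multiply(toRate).divide(fromRate, 6, RoundingMode.HALF_EVEN);
    }

    private BigDecimal rate(String currency) {
        BigDecimal r = ratesFromBase.get(currency);
        if (r == null) {
            throw new IllegalStateException("no exchange rate for " + currency);
        }
        return r;
    }
}
```

Because later rate updates simply overwrite the stored value, records combined after an update use the new rate while earlier output is left unchanged.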
Reference Data
[Diagram: Imp Logs → Impression Reduction → Impression Aggregation → 1 min imp aggs; Bid Logs → Bids Reduction → Bids Aggregation → 1 min bid aggs; 1 min aggs and Reference Streams → Combiner → Composite records]
Step 5: Getting Data Out of Kafka
The final job reads the composite records and pushes them downstream.
Where the export lives is flexible:
• Write directly from the final job
• Add a dedicated output job
• Use an external client
• Kafka Copycat may provide a common framework here
Getting Data Out of Kafka
[Diagram: Imp Logs → Impression Reduction → Impression Aggregation → 1 min imp aggs; Bid Logs → Bids Reduction → Bids Aggregation → 1 min bid aggs; 1 min aggs and Reference Streams → Combiner → Composite records → Exporter → Downstream Data]
Final Bill of Materials
To meet our requirements, we use a combination of:
• External ingest client
• Initial data reduction job
• Data aggregation job
• Stream combiner job
• Data exporter job
Summary
• Samza gives great flexibility for small-window aggregation
• Decomposing the work into multiple jobs creates reusable intermediate data
• Remember that the output of each job is available to any Kafka client
• Much thought often goes into getting data into and out of Kafka
• Kafka Copycat may make that a whole lot easier
Questions?

Samza London HUG presentation, Aug 5 2015 - Garry Turkington

  • 1. Copyright © 2015 Improve Digital - All Rights Reserved. Example Use of Samza at Improve Digital. Garry Turkington, CTO, g.turkington@improvedigital.com, @garryturk
  • 2. About Us: Improve Digital • Cloud-based real-time advertising technology • Focus on the premium publisher / media owner • Integrations with thousands of demand partners • Decisions driven by real-time data • Offices in the UK, NL, DE and ES • 100+ premium publishers
  • 3. Data Sources
  • 4. Why Samza is useful • We have lots of data coming from the ad server fleet • We need to get to it as fast as possible • Kafka was a natural hub for that data • Samza was a natural extension for message processing • Used for both single-message notifications and aggregations
  • 5. Samza integration • A Samza job runs within multiple YARN containers • Each container runs multiple tasks • The task instances run our custom processing code • Each task is the dedicated consumer of a single partition of each of its input topics • Samza checkpoints task progress to allow recovery after a task failure or restart
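To make the partition-per-task and checkpointing points on slide 5 concrete, here is a minimal pure-Java sketch that does not use Samza itself. `CheckpointStore` and `PartitionTask` are hypothetical names, and the store is an in-memory map standing in for Samza's checkpoint storage. For simplicity the sketch commits after every message; Samza instead commits periodically (see `task.commit.ms`), so after a real restart any messages processed since the last commit may be seen again.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for checkpoint storage: maps a partition id
// to the next offset a task should read after a restart.
final class CheckpointStore {
    private final Map<Integer, Long> offsets = new HashMap<>();

    void commit(int partition, long nextOffset) {
        offsets.put(partition, nextOffset);
    }

    // Partitions with no checkpoint start from offset 0.
    long restore(int partition) {
        return offsets.getOrDefault(partition, 0L);
    }
}

// One task instance owns exactly one partition of its input topic.
final class PartitionTask {
    private final int partition;
    private final CheckpointStore store;
    final List<String> processed = new ArrayList<>();

    PartitionTask(int partition, CheckpointStore store) {
        this.partition = partition;
        this.store = store;
    }

    // Resumes from the last checkpoint, handles up to maxMessages,
    // committing after each one. Returns the next offset to read.
    long run(List<String> partitionLog, int maxMessages) {
        long offset = store.restore(partition);
        int handled = 0;
        while (offset < partitionLog.size() && handled < maxMessages) {
            processed.add(partitionLog.get((int) offset));
            offset++;
            handled++;
            store.commit(partition, offset);
        }
        return offset;
    }
}
```

Running one task for two messages and then starting a fresh task on the same partition resumes at offset 2, so the restarted task sees only the remaining messages.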
  • 6. Samza API • Every Samza task implements the StreamTask interface, which has one method: void process(IncomingMessageEnvelope envelope, MessageCollector collector, TaskCoordinator coordinator);
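A minimal sketch of a `process` implementation follows. To keep it compilable without the Samza jars, `Envelope`, `Collector`, `Coordinator` and `SimpleStreamTask` are simplified local stand-ins for `IncomingMessageEnvelope`, `MessageCollector`, `TaskCoordinator` and `StreamTask`; with the real API, `MessageCollector.send` takes an `OutgoingMessageEnvelope` built from a `SystemStream`, a key and a message. The comma-separated record format and the `imp-logs` output stream name are assumptions for illustration.

```java
// Simplified stand-ins for Samza's types, defined here only so the sketch
// compiles on its own. The real types live under org.apache.samza.
final class Envelope {                 // stands in for IncomingMessageEnvelope
    private final Object key;
    private final Object message;

    Envelope(Object key, Object message) {
        this.key = key;
        this.message = message;
    }

    Object getKey() { return key; }
    Object getMessage() { return message; }
}

interface Collector {                  // stands in for MessageCollector
    void send(String stream, Object key, Object message);
}

interface Coordinator {}               // stands in for TaskCoordinator

interface SimpleStreamTask {           // same shape as StreamTask.process(...)
    void process(Envelope envelope, Collector collector, Coordinator coordinator);
}

// Hypothetical task: forwards only impression records, re-keyed by client id.
final class ImpressionFilterTask implements SimpleStreamTask {
    @Override
    public void process(Envelope envelope, Collector collector, Coordinator coordinator) {
        String record = (String) envelope.getMessage(); // assumed "type,client,..." format
        String[] fields = record.split(",");
        if (fields.length >= 2 && fields[0].equals("imp")) {
            collector.send("imp-logs", fields[1], record);
        }
    }
}
```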
  • 7. Samza API (continued) • Optionally, the InitableTask and WindowableTask interfaces can also be implemented: void init(Config config, TaskContext context); void window(MessageCollector collector, TaskCoordinator coordinator);
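The three hooks can be sketched together as below. `ClientCountTask` is hypothetical, and the config and collector are modelled with a plain `Map` and `BiConsumer` rather than Samza's `Config` and `MessageCollector`. In Samza, `init` is called once before any messages arrive and `window` is called periodically at the interval set by `task.window.ms`; using `window` to emit and reset per-client counts is one common pattern, not the only one.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Hypothetical task combining the three lifecycle hooks.
final class ClientCountTask {
    private String outputStream;
    private final Map<String, Long> counts = new TreeMap<>(); // sorted for stable output

    // Mirrors InitableTask.init: called once before any messages are processed.
    void init(Map<String, String> config) {
        outputStream = config.getOrDefault("output.stream", "client-counts");
    }

    // Mirrors StreamTask.process: accumulate state for each message.
    void process(String clientId) {
        counts.merge(clientId, 1L, Long::sum);
    }

    // Mirrors WindowableTask.window: emit the current counts, then reset them.
    void window(BiConsumer<String, String> collector) {
        counts.forEach((client, n) -> collector.accept(outputStream, client + "=" + n));
        counts.clear();
    }
}
```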
  • 8. Example Problem Statement • For each client, produce 1-minute aggregates of: • Timestamp • Client • Number of impressions (ads won) • Number of bids received • Amount of budget spent
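One possible in-memory shape for the output rows described on slide 8, one per client per minute. The class name, the `client@minute` key format and the use of integer micro-units for spend (to avoid floating-point drift) are assumptions, not details from the talk.

```java
// Hypothetical shape of one output row: one per client per minute.
final class ClientMinuteAggregate {
    static final long MINUTE_MS = 60_000L;

    final long minuteStartMs;   // timestamp truncated to the start of its minute
    final String client;
    long impressions;           // ads won
    long bids;                  // bids received
    long spendMicros;           // budget spent, in assumed micro-units

    ClientMinuteAggregate(long eventTimeMs, String client) {
        this.minuteStartMs = bucketStart(eventTimeMs);
        this.client = client;
    }

    // Floor division keeps bucketing correct even for negative timestamps.
    static long bucketStart(long timestampMs) {
        return Math.floorDiv(timestampMs, MINUTE_MS) * MINUTE_MS;
    }

    // Key under which this row could be stored or partitioned.
    String key() {
        return client + "@" + minuteStartMs;
    }
}
```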
  • 9. Step 1: Getting Data Into Kafka • Log files are pushed into S3, which creates SQS notifications • An ingest application reads the notifications and pulls the files • It then splits the files and pushes the records into Kafka • It ensures the correct partition key is used • Hold the obvious question about where the ingest lives…
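A sketch of the split-and-route part of the ingest step. `IngestRouter`, the tab-separated line format and the column holding the client id are all hypothetical. The point is that routing on the client id sends all of a client's records to the same partition, and therefore to the same Samza task downstream. In a real producer you would normally set the client id as the Kafka record key and let the producer's partitioner choose; Kafka's default partitioner hashes the key bytes with murmur2 rather than using `String.hashCode()` as this sketch does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class IngestRouter {
    // Illustrative partitioner: same key always maps to the same partition.
    static int partitionFor(String clientId, int numPartitions) {
        return Math.floorMod(clientId.hashCode(), numPartitions);
    }

    // Splits a downloaded log file into lines and pairs each line with its
    // target partition. Assumes tab-separated lines, client id in column 1.
    static List<Map.Entry<Integer, String>> route(String fileContents, int numPartitions) {
        List<Map.Entry<Integer, String>> routed = new ArrayList<>();
        for (String line : fileContents.split("\n")) {
            String[] cols = line.split("\t");
            if (cols.length < 2) continue; // skip blank or malformed lines
            routed.add(Map.entry(partitionFor(cols[1], numPartitions), line));
        }
        return routed;
    }
}
```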
  • 10. Getting Data Into Kafka [diagram: Ad Server Logs → S3 → SQS Notifications → Ingest App, which issues Get Requests to S3 for the logs and writes Log Records into Kafka]
  • 11. Step 2: Top-Level Aggregation • The raw ingest rate is very high • So a first job does 10-second aggregations on the input • This job also does some column filtering/cleaning • Not strictly necessary, but the output proved very useful • Plus it helps reduce YARN resource requirements
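The first-stage reduction might look roughly like this sketch, which keeps only a client id, timestamp and price from each record and sums them into 10-second buckets. `TenSecondReducer` and its fields are hypothetical. Buckets are emitted only once they are closed relative to a supplied `nowMs` (for example, the time of a `window` call); how the real job handles late records is not described in the talk.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical first-stage reducer over 10-second buckets.
final class TenSecondReducer {
    static final long BUCKET_MS = 10_000L;

    // Accumulator for one (client, bucket) pair; only the kept columns.
    static final class Bucket {
        final String client;
        final long startMs;
        long count;
        long priceMicros;

        Bucket(String client, long startMs) {
            this.client = client;
            this.startMs = startMs;
        }
    }

    private final Map<String, Bucket> open = new TreeMap<>();

    void add(String client, long timestampMs, long priceMicros) {
        long start = Math.floorDiv(timestampMs, BUCKET_MS) * BUCKET_MS;
        Bucket b = open.computeIfAbsent(client + "@" + start, k -> new Bucket(client, start));
        b.count++;
        b.priceMicros += priceMicros;
    }

    // Removes and returns every bucket whose 10-second span has ended by nowMs.
    List<Bucket> flushClosed(long nowMs) {
        List<Bucket> closed = new ArrayList<>();
        Iterator<Bucket> it = open.values().iterator();
        while (it.hasNext()) {
            Bucket b = it.next();
            if (b.startMs + BUCKET_MS <= nowMs) {
                closed.add(b);
                it.remove();
            }
        }
        return closed;
    }
}
```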
  • 12. Top-Level Aggregation [diagram: Imp Logs → Impression Reduction; Bid Logs → Bids Reduction]
  • 13. Step 3: Main Aggregation • The first step that is specific to this use case • Combines the 10-second aggregates into per-minute aggregates • Importantly, also reduces to a much smaller set of key fields • Builds the aggregates using persistent task state
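Rolling the 10-second aggregates up into minutes with task state can be sketched as below. `MinuteStore` is a hypothetical stand-in for a Samza `KeyValueStore`, which Samza backs with a local on-disk store and can replicate to a Kafka changelog topic so the state survives container restarts. `MinuteRollup` and the `long[]` value layout are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Samza KeyValueStore<String, long[]>.
interface MinuteStore {
    long[] get(String key);
    void put(String key, long[] value);
}

// In-memory implementation used only to make the sketch runnable.
final class MapMinuteStore implements MinuteStore {
    private final Map<String, long[]> data = new HashMap<>();

    public long[] get(String key) { return data.get(key); }
    public void put(String key, long[] value) { data.put(key, value); }
}

// Rolls 10-second partial aggregates up to one-minute totals per client.
final class MinuteRollup {
    private final MinuteStore store;

    MinuteRollup(MinuteStore store) {
        this.store = store;
    }

    // Assumed value layout: value[0] = count, value[1] = spend in micro-units.
    void addTenSecondAgg(String client, long bucketStartMs, long count, long spendMicros) {
        long minute = Math.floorDiv(bucketStartMs, 60_000L) * 60_000L;
        String key = client + "@" + minute;
        long[] total = store.get(key);
        if (total == null) total = new long[2];
        total[0] += count;
        total[1] += spendMicros;
        store.put(key, total); // write back so the store (and any changelog) sees the update
    }
}
```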
  • 14. Main Aggregation [diagram: Imp Logs → Impression Reduction → Impression Aggregation → 1 min imp aggs; Bid Logs → Bids Reduction → Bids Aggregation → 1 min bid aggs]
  • 15. Step 4: Crossing the Streams • The next job in the chain creates composite structures • The impression and bid aggregates are combined into single records • This job makes heavy use of Samza persistent state • A data object is stored for each minute, and its fields are filled from both streams
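A sketch of the combiner's join logic, assuming both aggregate topics are keyed by the same `client@minute` key and have the same partition count, so that one task sees both halves of each key (by default Samza assigns a task the same partition number from each of its input topics). `StreamCombiner` and `Composite` are hypothetical names, and emitting as soon as both halves are present is just one possible policy; a real job might wait for a `window` call or keep the record to absorb late updates.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical composite record: one per client per minute, filled from two streams.
final class Composite {
    Long impressions;   // null until the impression aggregate arrives
    Long spendMicros;   // filled alongside impressions
    Long bids;          // null until the bid aggregate arrives

    boolean complete() {
        return impressions != null && bids != null;
    }
}

final class StreamCombiner {
    // Stand-in for persistent task state keyed by "client@minute".
    private final Map<String, Composite> state = new HashMap<>();

    // Each handler returns the finished record once both halves are present, else null.
    Composite onImpressionAgg(String key, long impressions, long spendMicros) {
        Composite c = state.computeIfAbsent(key, k -> new Composite());
        c.impressions = impressions;
        c.spendMicros = spendMicros;
        return emitIfComplete(key, c);
    }

    Composite onBidAgg(String key, long bids) {
        Composite c = state.computeIfAbsent(key, k -> new Composite());
        c.bids = bids;
        return emitIfComplete(key, c);
    }

    int pending() {
        return state.size();
    }

    private Composite emitIfComplete(String key, Composite c) {
        if (!c.complete()) return null;
        state.remove(key); // free the slot once the record is emitted
        return c;
    }
}
```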
  • 16. Crossing the Streams [diagram: the slide 14 pipeline, with the 1 min imp aggs and 1 min bid aggs both feeding a Combiner that emits Composite records]
  • 17. Reference data • This doesn't yet give us revenue converted to each client's currency • To know how to convert, we need reference data: • Exchange rate information • The preferred currency for each client • We use Samza bootstrap streams, which are read fully before a job's other inputs, to inject this information • Then update it as new data arrives
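The conversion step can be sketched as follows. `CurrencyConverter`, the EUR base currency and the two-decimal `HALF_EVEN` rounding are assumptions. In the job, `onRate` and `onClientCurrency` would be driven by messages from the reference streams; marking those streams as bootstrap streams (for example `systems.<system>.streams.<stream>.samza.bootstrap=true`) makes Samza read them up to their current head before processing the other inputs, so both maps are populated before the first conversion and then kept current as new messages arrive.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder for the two kinds of reference data on slide 17.
final class CurrencyConverter {
    private final Map<String, BigDecimal> unitsPerEur = new HashMap<>(); // assumed EUR base
    private final Map<String, String> clientCurrency = new HashMap<>();

    // Called for each exchange-rate message; later messages overwrite earlier rates.
    void onRate(String currency, BigDecimal rate) {
        unitsPerEur.put(currency, rate);
    }

    // Called for each client-currency message.
    void onClientCurrency(String client, String currency) {
        clientCurrency.put(client, currency);
    }

    // Converts an EUR amount into the client's preferred currency (EUR if unknown).
    BigDecimal toClientCurrency(String client, BigDecimal amountEur) {
        String currency = clientCurrency.getOrDefault(client, "EUR");
        BigDecimal rate = currency.equals("EUR") ? BigDecimal.ONE : unitsPerEur.get(currency);
        if (rate == null) {
            throw new IllegalStateException("no exchange rate for " + currency);
        }
        return amountEur.multiply(rate).setScale(2, RoundingMode.HALF_EVEN);
    }
}
```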
  • 18. Reference Data [diagram: the slide 16 pipeline, with Reference Streams added]
  • 19. Step 5: Getting Data Out of Kafka • The final job reads the composite records and pushes them downstream • Where the export lives is flexible: • Write directly from the final job • Add a dedicated output job • Use an external client • Kafka Copycat (the proposal that became Kafka Connect) may give a common framework here
  • 20. Getting Data Out of Kafka [diagram: the slide 18 pipeline, with an Exporter that reads the Composite records and writes Downstream Data]
  • 21. Final Bill of Materials • To meet our requirements we use a combination of: • An external ingest client • An initial data reduction job • A data aggregation job • A stream combiner job • A data exporter job
  • 22. Summary • Samza gives great flexibility for small-window aggregation • Decomposing the pipeline into multiple jobs creates reusable intermediate data • Remember that the output of each job is available to any Kafka client • Much thought often goes into getting data into and out of Kafka • Kafka Copycat may make that a whole lot easier
  • 23. Questions?