Copyright © 2015 Improve Digital - All Rights Reserved
Example Use of Samza at Improve Digital
Garry Turkington, CTO
g.turkington@improvedigital.com
@garryturk
About Us: Improve Digital
• Cloud-based real-time advertising technology
• Focus on the premium publisher / media owner
• Integrations with thousands of demand partners
• Decisions driven by real-time data
• Offices in the UK, NL, DE and ES
• 100+ premium publishers
Data Sources
Why Samza Is Useful
• We have lots of data coming from the ad server fleet
• We need to get to it as fast as possible
• Kafka was a natural hub for that data
• And Samza was a natural extension for processing those messages
• Used for both single-message notifications and aggregations
Samza Integration
• A Samza job runs across multiple YARN containers
• Each container runs one or more tasks
• The task instances contain the custom processing code
• Each task is the dedicated consumer of a single partition in each of its input topics
• Samza checkpoints task progress to allow recovery after a task failure or restart
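The partition-to-task mapping described above can be illustrated with a small simulation. This is a minimal sketch that uses no Samza classes: it assumes a group-by-partition layout (task k consumes partition k of every input topic), and the task names, topic names and the round-robin placement of tasks into containers are hypothetical, chosen only to make the idea concrete.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simulation only: mimics a group-by-partition layout, where task k is the
// dedicated consumer of partition k in each of its input topics.
final class TaskAssignmentSketch {

    // Builds one task per partition number, listing the topic partitions it consumes.
    static Map<String, List<String>> tasksFor(List<String> topics, int partitions) {
        Map<String, List<String>> tasks = new LinkedHashMap<>(); // keeps partition order
        for (int p = 0; p < partitions; p++) {
            List<String> inputs = new ArrayList<>();
            for (String topic : topics) {
                inputs.add(topic + "-" + p);
            }
            tasks.put("Partition " + p, inputs);
        }
        return tasks;
    }

    // Hypothetical round-robin placement of tasks into YARN containers.
    static Map<Integer, List<String>> placeInContainers(Map<String, List<String>> tasks, int containers) {
        Map<Integer, List<String>> placement = new TreeMap<>();
        int i = 0;
        for (String task : tasks.keySet()) {
            placement.computeIfAbsent(i % containers, c -> new ArrayList<>()).add(task);
            i++;
        }
        return placement;
    }
}
```

For example, with topics `imp-logs` and `bid-logs` at 4 partitions each, task "Partition 2" consumes `imp-logs-2` and `bid-logs-2`, and with 2 containers, container 0 hosts "Partition 0" and "Partition 2".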
Samza API
Every Samza task implements the StreamTask interface, which has a single method:
void process(
    IncomingMessageEnvelope envelope,
    MessageCollector collector,
    TaskCoordinator coordinator);
Samza API (continued)
Optionally, the InitableTask and WindowableTask interfaces can also be implemented. init is called once before the task processes any messages; window is called at a regular interval (configured with task.window.ms):
void init(Config config, TaskContext context);
void window(MessageCollector collector,
    TaskCoordinator coordinator);
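A common way to combine these three methods is to accumulate state in process and flush it from window. The sketch below shows that lifecycle with plain Java types so it runs on its own; it deliberately does not use the real Samza types (Config, TaskContext, IncomingMessageEnvelope, MessageCollector, TaskCoordinator), and the class name and the "client=count" output format are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Standalone sketch of the init/process/window lifecycle, without Samza types.
final class ImpressionCountingTask {
    private Map<String, Long> countsByClient;

    // Once, before any message (real task: InitableTask.init(Config, TaskContext)).
    void init() {
        countsByClient = new TreeMap<>();
    }

    // Once per incoming message (real task: StreamTask.process(envelope, collector, coordinator)).
    void process(String clientId) {
        countsByClient.merge(clientId, 1L, Long::sum);
    }

    // Periodically (real task: WindowableTask.window(collector, coordinator)).
    // Returns what a real task would send through its MessageCollector, then resets the counts.
    List<String> window() {
        List<String> out = new ArrayList<>();
        countsByClient.forEach((client, count) -> out.add(client + "=" + count));
        countsByClient.clear();
        return out;
    }
}
```

After `init()` and three calls `process("a")`, `process("b")`, `process("a")`, the next `window()` returns `[a=2, b=1]` and a following `window()` returns an empty list.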
Example Problem Statement
For each client, produce 1-minute aggregates containing:
• Timestamp
• Client
• Number of impressions (ads won)
• Number of bids received
• Amount of budget spent
Step 1: Getting Data Into Kafka
• Log files are pushed into S3, which creates SQS notifications
• An ingest application reads the notifications and pulls the files
• It then splits the files and pushes the records into Kafka
• It ensures each record has the correct partition key
• Hold the obvious question about where the ingest lives…
[Diagram: Getting Data Into Kafka. Ad Server logs → S3 → SQS notifications → Ingest App → GET request for the logs from S3 → log records into Kafka]
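The split-and-key part of the ingest app might look roughly like the sketch below. It assumes, hypothetically, that the client ID is the correct partition key and that logs are tab-separated with the client ID in the second column; neither detail comes from the talk. The hash used here is for illustration only: Kafka's default partitioner hashes the serialized key with murmur2, so real partition numbers will differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical ingest helpers: split a downloaded log file into keyed records and
// choose a partition by client ID, so each client's records stay on one partition.
final class IngestSketch {

    // Assumed log layout (illustrative only): tab-separated, client ID in the second column.
    static List<Map.Entry<String, String>> toKeyedRecords(String logFile) {
        List<Map.Entry<String, String>> records = new ArrayList<>();
        for (String line : logFile.split("\n")) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] cols = line.split("\t");
            if (cols.length < 2) {
                continue; // skip malformed lines without a client column
            }
            records.add(Map.entry(cols[1], line)); // key = client ID, value = raw line
        }
        return records;
    }

    // Illustrative hash partitioning, not Kafka's murmur2-based default.
    static int partitionFor(String clientId, int numPartitions) {
        int hash = Arrays.hashCode(clientId.getBytes(StandardCharsets.UTF_8));
        return Math.floorMod(hash, numPartitions);
    }
}
```

The property that matters downstream is determinism: every record for a given client maps to the same partition, so a single Samza task sees all of that client's data.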
Copyright © 2015 Improve Digital - All Rights Reserved
• Raw ingest rate is very high
• So a first job does 10 second aggregations on the
input
• This also does some column filtering/cleaning
• Not strictly necessary but the output proved very
useful
• Plus helps reduce YARN resource requirements
Step 2: Top-Level Aggregation
[Diagram: Top-Level Aggregation. Imp Logs → Impression Reduction; Bid Logs → Bids Reduction]
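The core of a reduction job like the one above is a tumbling-window group-by. This minimal sketch assumes impression records arrive as column maps with hypothetical column names `ts` (epoch millis), `client` and `price_micros`; every other column is dropped, which is the column filtering step. In a running Samza task, the map would be held in the task and flushed from window() rather than built in one batch call.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of a 10-second tumbling aggregation with column filtering.
// Column names "ts", "client" and "price_micros" are hypothetical.
final class TenSecondReducer {
    static final long WINDOW_MS = 10_000L;

    // Start (epoch millis) of the 10-second window containing ts.
    static long windowStart(long ts) {
        return ts - Math.floorMod(ts, WINDOW_MS);
    }

    // Returns "windowStart|client" -> {impressions, spendMicros}; all other columns are ignored.
    static Map<String, long[]> reduce(List<Map<String, String>> rawRecords) {
        Map<String, long[]> aggs = new TreeMap<>();
        for (Map<String, String> r : rawRecords) {
            long ts = Long.parseLong(r.get("ts"));
            String key = windowStart(ts) + "|" + r.get("client");
            long[] agg = aggs.computeIfAbsent(key, k -> new long[2]);
            agg[0] += 1;                                     // one impression per record
            agg[1] += Long.parseLong(r.get("price_micros")); // spend in micro-units
        }
        return aggs;
    }
}
```

Storing money as integer micro-units here is a design assumption that keeps repeated summation exact.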
Step 3: Main Aggregation
• The first step that is specific to this use case
• Combines the 10-second aggregates into per-minute aggregates
• Importantly, also reduces the data to a much smaller set of key fields
• Builds the aggregates using persistent task state
[Diagram: Main Aggregation. Imp Logs → Impression Reduction → Impression Aggregation → 1 min imp aggs; Bid Logs → Bids Reduction → Bids Aggregation → 1 min bid aggs]
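Building aggregates in persistent task state is usually a read-modify-write against a key-value store. In this sketch a HashMap stands in for the task's persistent store (in a real job, a Samza key-value store obtained from the task context); the "minute|client" key and the {impressions, spendMicros} value layout are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: fold 10-second aggregates into per-minute aggregates keyed by (minute, client).
final class MinuteRollup {
    static final long MINUTE_MS = 60_000L;

    // Stand-in for a persistent task store; values are {impressions, spendMicros}.
    private final Map<String, long[]> store = new HashMap<>();

    // Read-modify-write: get the current minute aggregate, update a copy, put it back.
    // Copying mirrors a serialized store, where mutating a fetched value would not persist.
    void add(long tenSecondWindowStart, String client, long impressions, long spendMicros) {
        long minute = tenSecondWindowStart - Math.floorMod(tenSecondWindowStart, MINUTE_MS);
        String key = minute + "|" + client;
        long[] current = store.get(key);
        long[] updated = (current == null) ? new long[2] : current.clone();
        updated[0] += impressions;
        updated[1] += spendMicros;
        store.put(key, updated);
    }

    // Returns the aggregate for one minute and client, or null if none exists.
    long[] get(long minute, String client) {
        return store.get(minute + "|" + client);
    }
}
```

The key is reduced to just minute and client, which is what keeps the per-minute state small compared with the 10-second input.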
Step 4: Crossing the Streams
• The next job in the chain creates composite structures
• The impression and bid aggregates are combined into single records
• Makes heavy use of Samza persistent state
• A data object is stored for each minute, and its fields are filled from both streams
[Diagram: Crossing the Streams. Imp Logs → Impression Reduction → Impression Aggregation → 1 min imp aggs; Bid Logs → Bids Reduction → Bids Aggregation → 1 min bid aggs; both aggregate streams → Combiner → Composite records]
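The combiner pattern above can be sketched as a per-key object that each stream fills in. Assumptions, all hypothetical: the key is (minute, client), a HashMap stands in for the persistent store, and a composite record is emitted as soon as both halves have arrived. A production job would also need a policy for minutes where one side never arrives, for example flushing from window() after a timeout.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Sketch of a two-stream combiner keyed by (minute, client).
final class StreamCombiner {

    // Composite record for one client and minute; fields are filled from both streams.
    static final class Composite {
        final long minute;
        final String client;
        Long impressions; // from the 1-minute impression aggregates
        Long spendMicros; // from the 1-minute impression aggregates
        Long bids;        // from the 1-minute bid aggregates

        Composite(long minute, String client) {
            this.minute = minute;
            this.client = client;
        }

        boolean complete() {
            return impressions != null && bids != null;
        }
    }

    // Stand-in for the persistent store. With a serialized store, updated objects
    // would have to be written back with put() after each change.
    private final Map<String, Composite> store = new HashMap<>();

    private Composite lookup(long minute, String client) {
        return store.computeIfAbsent(minute + "|" + client, k -> new Composite(minute, client));
    }

    Optional<Composite> onImpressionAgg(long minute, String client, long impressions, long spendMicros) {
        Composite c = lookup(minute, client);
        c.impressions = impressions;
        c.spendMicros = spendMicros;
        return emitIfComplete(c);
    }

    Optional<Composite> onBidAgg(long minute, String client, long bids) {
        Composite c = lookup(minute, client);
        c.bids = bids;
        return emitIfComplete(c);
    }

    // Emit once both halves are present, and drop the entry from the store.
    private Optional<Composite> emitIfComplete(Composite c) {
        if (!c.complete()) {
            return Optional.empty();
        }
        store.remove(c.minute + "|" + c.client);
        return Optional.of(c);
    }
}
```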
Reference Data
This doesn’t give us revenue converted to the client’s currency.
To know how to convert it, we need reference data:
• Exchange rate information
• The preferred currency for each client
• We use Samza bootstrap streams to inject this information
• Then update it as new data arrives
[Diagram: Reference Data. Imp Logs → Impression Reduction → Impression Aggregation → 1 min imp aggs; Bid Logs → Bids Reduction → Bids Aggregation → 1 min bid aggs; both aggregate streams, plus the Reference Streams → Combiner → Composite records]
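In Samza, a stream configured as a bootstrap stream (`systems.<system>.streams.<stream>.samza.bootstrap=true`) is read to its end before messages from other inputs are processed, so the reference tables are populated before any composite record arrives. The sketch below models only the task-side handling; the handler names, the choice of a single base currency for spend, and the rate convention (units of target currency per unit of base currency) are hypothetical.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Sketch: reference data held in task-local maps, filled from bootstrap streams
// and then updated in place as new reference messages arrive.
final class CurrencyConverter {
    // Assumption: rate = units of target currency per 1 unit of the base currency.
    private final Map<String, BigDecimal> ratesFromBase = new HashMap<>();
    private final Map<String, String> preferredCurrency = new HashMap<>();

    // Called for each message on the (hypothetical) exchange-rate stream.
    void onExchangeRate(String currency, BigDecimal rate) {
        ratesFromBase.put(currency, rate);
    }

    // Called for each message on the (hypothetical) client-currency stream.
    void onClientCurrency(String clientId, String currency) {
        preferredCurrency.put(clientId, currency);
    }

    // Converts spend from the base currency into the client's preferred currency.
    // Returns empty when reference data is missing, instead of failing the task.
    Optional<BigDecimal> convertSpend(String clientId, BigDecimal spendInBase) {
        String currency = preferredCurrency.get(clientId);
        if (currency == null) {
            return Optional.empty();
        }
        BigDecimal rate = ratesFromBase.get(currency);
        if (rate == null) {
            return Optional.empty();
        }
        return Optional.of(spendInBase.multiply(rate).setScale(2, RoundingMode.HALF_EVEN));
    }
}
```

BigDecimal avoids binary floating-point rounding on money, and a later rate message simply replaces the earlier value, which matches "then update as new data arrives".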
Step 5: Getting Data Out of Kafka
The final job reads the composite records and pushes them downstream.
Where the export logic lives is flexible:
• Write directly from the final job
• Add a dedicated output job
• Use an external client
• Kafka Copycat (since released as Kafka Connect) may provide a common framework here
[Diagram: Getting Data Out of Kafka. The full pipeline: Imp Logs and Bid Logs → Reductions → Aggregations → 1 min imp aggs and 1 min bid aggs → Combiner (with Reference Streams) → Composite records → Exporter → Downstream Data]
Final Bill of Materials
To meet our requirement, we use a combination of:
• External ingest client
• Initial data reduction job
• Data aggregation job
• Stream combiner job
• Data exporter job
Summary
• Samza gives great flexibility for small-window aggregation
• Decomposing the work into multiple jobs creates reusable intermediate data
• Remember that the output of each job is available to any Kafka client
• Much thought often goes into getting data into and out of Kafka
• Kafka Copycat may make that a whole lot easier
Questions?

More Related Content

What's hot

The 5 Stages of Cloud Management for Enterprises
The 5 Stages of Cloud Management for EnterprisesThe 5 Stages of Cloud Management for Enterprises
The 5 Stages of Cloud Management for EnterprisesRightScale
 
Cloud Migration and Portability (with and without Containers)
Cloud Migration and Portability (with and without Containers)Cloud Migration and Portability (with and without Containers)
Cloud Migration and Portability (with and without Containers)RightScale
 
Best Practices for Data Center Migration Planning - August 2016 Monthly Webin...
Best Practices for Data Center Migration Planning - August 2016 Monthly Webin...Best Practices for Data Center Migration Planning - August 2016 Monthly Webin...
Best Practices for Data Center Migration Planning - August 2016 Monthly Webin...Amazon Web Services
 
Should You Move Between AWS, Azure, or Google Clouds? Considerations, Pros an...
Should You Move Between AWS, Azure, or Google Clouds? Considerations, Pros an...Should You Move Between AWS, Azure, or Google Clouds? Considerations, Pros an...
Should You Move Between AWS, Azure, or Google Clouds? Considerations, Pros an...RightScale
 
Multi-Cloud Management with RightScale CMP (Demo)
Multi-Cloud Management with RightScale CMP (Demo)Multi-Cloud Management with RightScale CMP (Demo)
Multi-Cloud Management with RightScale CMP (Demo)RightScale
 
Amazon Redshift with Full 360 Inc.
Amazon Redshift with Full 360 Inc.Amazon Redshift with Full 360 Inc.
Amazon Redshift with Full 360 Inc.Amazon Web Services
 
Cloud Has Become the New Normal: TCS
Cloud Has Become the New Normal: TCS Cloud Has Become the New Normal: TCS
Cloud Has Become the New Normal: TCS Amazon Web Services
 
Manage and Optimize Cloud Spend with RightScale Optima
Manage and Optimize Cloud Spend with RightScale OptimaManage and Optimize Cloud Spend with RightScale Optima
Manage and Optimize Cloud Spend with RightScale OptimaRightScale
 
Tagging Best Practices for Cloud Governance
Tagging Best Practices for Cloud GovernanceTagging Best Practices for Cloud Governance
Tagging Best Practices for Cloud GovernanceRightScale
 
Hybrid Streaming Analytics for Apache Kafka Users | Firat Tekiner, Google
Hybrid Streaming Analytics for Apache Kafka Users | Firat Tekiner, GoogleHybrid Streaming Analytics for Apache Kafka Users | Firat Tekiner, Google
Hybrid Streaming Analytics for Apache Kafka Users | Firat Tekiner, GoogleHostedbyConfluent
 
Getting Started on Your AWS Migration Journey - AWS Summit Sydney 2018
Getting Started on Your AWS Migration Journey - AWS Summit Sydney 2018Getting Started on Your AWS Migration Journey - AWS Summit Sydney 2018
Getting Started on Your AWS Migration Journey - AWS Summit Sydney 2018Amazon Web Services
 
Automating Multi-Cloud Policies for AWS, Azure, Google, and More
Automating Multi-Cloud Policies for AWS, Azure, Google, and MoreAutomating Multi-Cloud Policies for AWS, Azure, Google, and More
Automating Multi-Cloud Policies for AWS, Azure, Google, and MoreRightScale
 
Using RightScale CMP with Cloud Provider Tools
Using RightScale CMP with Cloud Provider ToolsUsing RightScale CMP with Cloud Provider Tools
Using RightScale CMP with Cloud Provider ToolsRightScale
 
12 Ways to Manage Cloud Costs and Optimize Cloud Spend
12 Ways to Manage Cloud Costs and Optimize Cloud Spend12 Ways to Manage Cloud Costs and Optimize Cloud Spend
12 Ways to Manage Cloud Costs and Optimize Cloud SpendRightScale
 
Container Monitoring Best Practices Using AWS and InfluxData by Gunnar Aasen
Container Monitoring Best Practices Using AWS and InfluxData by Gunnar AasenContainer Monitoring Best Practices Using AWS and InfluxData by Gunnar Aasen
Container Monitoring Best Practices Using AWS and InfluxData by Gunnar AasenInfluxData
 
How MSPs Can Be Successful in AWS, Azure, and Google Clouds
How MSPs Can Be Successful in AWS, Azure, and Google CloudsHow MSPs Can Be Successful in AWS, Azure, and Google Clouds
How MSPs Can Be Successful in AWS, Azure, and Google CloudsRightScale
 
Revolutionising Cloud Operations with AWS Config, AWS CloudTrail and AWS Clou...
Revolutionising Cloud Operations with AWS Config, AWS CloudTrail and AWS Clou...Revolutionising Cloud Operations with AWS Config, AWS CloudTrail and AWS Clou...
Revolutionising Cloud Operations with AWS Config, AWS CloudTrail and AWS Clou...Amazon Web Services
 
Automatisierte Kontrolle und Transparenz in der AWS Cloud – Autopilot für Com...
Automatisierte Kontrolle und Transparenz in der AWS Cloud – Autopilot für Com...Automatisierte Kontrolle und Transparenz in der AWS Cloud – Autopilot für Com...
Automatisierte Kontrolle und Transparenz in der AWS Cloud – Autopilot für Com...AWS Germany
 
AWS re:Invent 2016: Saving at Scale with Reserved Instances (ENT307)
AWS re:Invent 2016: Saving at Scale with Reserved Instances (ENT307)AWS re:Invent 2016: Saving at Scale with Reserved Instances (ENT307)
AWS re:Invent 2016: Saving at Scale with Reserved Instances (ENT307)Amazon Web Services
 

What's hot (20)

The 5 Stages of Cloud Management for Enterprises
The 5 Stages of Cloud Management for EnterprisesThe 5 Stages of Cloud Management for Enterprises
The 5 Stages of Cloud Management for Enterprises
 
Cloud Migration and Portability (with and without Containers)
Cloud Migration and Portability (with and without Containers)Cloud Migration and Portability (with and without Containers)
Cloud Migration and Portability (with and without Containers)
 
Best Practices for Data Center Migration Planning - August 2016 Monthly Webin...
Best Practices for Data Center Migration Planning - August 2016 Monthly Webin...Best Practices for Data Center Migration Planning - August 2016 Monthly Webin...
Best Practices for Data Center Migration Planning - August 2016 Monthly Webin...
 
Should You Move Between AWS, Azure, or Google Clouds? Considerations, Pros an...
Should You Move Between AWS, Azure, or Google Clouds? Considerations, Pros an...Should You Move Between AWS, Azure, or Google Clouds? Considerations, Pros an...
Should You Move Between AWS, Azure, or Google Clouds? Considerations, Pros an...
 
Multi-Cloud Management with RightScale CMP (Demo)
Multi-Cloud Management with RightScale CMP (Demo)Multi-Cloud Management with RightScale CMP (Demo)
Multi-Cloud Management with RightScale CMP (Demo)
 
Amazon Redshift with Full 360 Inc.
Amazon Redshift with Full 360 Inc.Amazon Redshift with Full 360 Inc.
Amazon Redshift with Full 360 Inc.
 
Cloud Has Become the New Normal: TCS
Cloud Has Become the New Normal: TCS Cloud Has Become the New Normal: TCS
Cloud Has Become the New Normal: TCS
 
Manage and Optimize Cloud Spend with RightScale Optima
Manage and Optimize Cloud Spend with RightScale OptimaManage and Optimize Cloud Spend with RightScale Optima
Manage and Optimize Cloud Spend with RightScale Optima
 
Tagging Best Practices for Cloud Governance
Tagging Best Practices for Cloud GovernanceTagging Best Practices for Cloud Governance
Tagging Best Practices for Cloud Governance
 
Hybrid Streaming Analytics for Apache Kafka Users | Firat Tekiner, Google
Hybrid Streaming Analytics for Apache Kafka Users | Firat Tekiner, GoogleHybrid Streaming Analytics for Apache Kafka Users | Firat Tekiner, Google
Hybrid Streaming Analytics for Apache Kafka Users | Firat Tekiner, Google
 
Getting Started on Your AWS Migration Journey - AWS Summit Sydney 2018
Getting Started on Your AWS Migration Journey - AWS Summit Sydney 2018Getting Started on Your AWS Migration Journey - AWS Summit Sydney 2018
Getting Started on Your AWS Migration Journey - AWS Summit Sydney 2018
 
Automating Multi-Cloud Policies for AWS, Azure, Google, and More
Automating Multi-Cloud Policies for AWS, Azure, Google, and MoreAutomating Multi-Cloud Policies for AWS, Azure, Google, and More
Automating Multi-Cloud Policies for AWS, Azure, Google, and More
 
Using RightScale CMP with Cloud Provider Tools
Using RightScale CMP with Cloud Provider ToolsUsing RightScale CMP with Cloud Provider Tools
Using RightScale CMP with Cloud Provider Tools
 
12 Ways to Manage Cloud Costs and Optimize Cloud Spend
12 Ways to Manage Cloud Costs and Optimize Cloud Spend12 Ways to Manage Cloud Costs and Optimize Cloud Spend
12 Ways to Manage Cloud Costs and Optimize Cloud Spend
 
Container Monitoring Best Practices Using AWS and InfluxData by Gunnar Aasen
Container Monitoring Best Practices Using AWS and InfluxData by Gunnar AasenContainer Monitoring Best Practices Using AWS and InfluxData by Gunnar Aasen
Container Monitoring Best Practices Using AWS and InfluxData by Gunnar Aasen
 
How MSPs Can Be Successful in AWS, Azure, and Google Clouds
How MSPs Can Be Successful in AWS, Azure, and Google CloudsHow MSPs Can Be Successful in AWS, Azure, and Google Clouds
How MSPs Can Be Successful in AWS, Azure, and Google Clouds
 
Revolutionising Cloud Operations with AWS Config, AWS CloudTrail and AWS Clou...
Revolutionising Cloud Operations with AWS Config, AWS CloudTrail and AWS Clou...Revolutionising Cloud Operations with AWS Config, AWS CloudTrail and AWS Clou...
Revolutionising Cloud Operations with AWS Config, AWS CloudTrail and AWS Clou...
 
Automatisierte Kontrolle und Transparenz in der AWS Cloud – Autopilot für Com...
Automatisierte Kontrolle und Transparenz in der AWS Cloud – Autopilot für Com...Automatisierte Kontrolle und Transparenz in der AWS Cloud – Autopilot für Com...
Automatisierte Kontrolle und Transparenz in der AWS Cloud – Autopilot für Com...
 
AWS re:Invent 2016: Saving at Scale with Reserved Instances (ENT307)
AWS re:Invent 2016: Saving at Scale with Reserved Instances (ENT307)AWS re:Invent 2016: Saving at Scale with Reserved Instances (ENT307)
AWS re:Invent 2016: Saving at Scale with Reserved Instances (ENT307)
 
Management@Scale
Management@ScaleManagement@Scale
Management@Scale
 

Similar to Samza london hug presentation Aug 5 2015 - Garry Turkington

Best Practices for Scaling an InfluxEnterprise Cluster
Best Practices for Scaling an InfluxEnterprise ClusterBest Practices for Scaling an InfluxEnterprise Cluster
Best Practices for Scaling an InfluxEnterprise ClusterInfluxData
 
Praxistaugliche notes strategien 4 cloud
Praxistaugliche notes strategien 4 cloudPraxistaugliche notes strategien 4 cloud
Praxistaugliche notes strategien 4 cloudRoman Weber
 
RTBkit 2.0 Roadmap Preview
RTBkit 2.0 Roadmap PreviewRTBkit 2.0 Roadmap Preview
RTBkit 2.0 Roadmap PreviewDatacratic
 
Moving your social collaboration infrastructure to the Cloud. Stairway to Hea...
Moving your social collaboration infrastructure to the Cloud. Stairway to Hea...Moving your social collaboration infrastructure to the Cloud. Stairway to Hea...
Moving your social collaboration infrastructure to the Cloud. Stairway to Hea...LetsConnect
 
Simply Business - Near Real Time Event Processing
Simply Business - Near Real Time Event ProcessingSimply Business - Near Real Time Event Processing
Simply Business - Near Real Time Event Processingidan_by
 
Become a Serverless Black Belt - Optimizing Your Serverless Applications - AW...
Become a Serverless Black Belt - Optimizing Your Serverless Applications - AW...Become a Serverless Black Belt - Optimizing Your Serverless Applications - AW...
Become a Serverless Black Belt - Optimizing Your Serverless Applications - AW...Amazon Web Services
 
AWS re:Invent 2016: Getting Started with Serverless Architectures (CMP211)
AWS re:Invent 2016: Getting Started with Serverless Architectures (CMP211)AWS re:Invent 2016: Getting Started with Serverless Architectures (CMP211)
AWS re:Invent 2016: Getting Started with Serverless Architectures (CMP211)Amazon Web Services
 
Ten tools for ten big data areas 01 informatica
Ten tools for ten big data areas 01 informatica Ten tools for ten big data areas 01 informatica
Ten tools for ten big data areas 01 informatica Will Du
 
Serverless Design Patterns for Rethinking Traditional Enterprise Application ...
Serverless Design Patterns for Rethinking Traditional Enterprise Application ...Serverless Design Patterns for Rethinking Traditional Enterprise Application ...
Serverless Design Patterns for Rethinking Traditional Enterprise Application ...Amazon Web Services
 
Build High-Throughput, Bursty Data Apps with Amazon SQS, SNS, & Lambda (API30...
Build High-Throughput, Bursty Data Apps with Amazon SQS, SNS, & Lambda (API30...Build High-Throughput, Bursty Data Apps with Amazon SQS, SNS, & Lambda (API30...
Build High-Throughput, Bursty Data Apps with Amazon SQS, SNS, & Lambda (API30...Amazon Web Services
 
Breaking Up the Monolith While Migrating to AWS (GPSTEC320) - AWS re:Invent 2018
Breaking Up the Monolith While Migrating to AWS (GPSTEC320) - AWS re:Invent 2018Breaking Up the Monolith While Migrating to AWS (GPSTEC320) - AWS re:Invent 2018
Breaking Up the Monolith While Migrating to AWS (GPSTEC320) - AWS re:Invent 2018Amazon Web Services
 
RTBkit Introduction & Best Practices
RTBkit Introduction & Best PracticesRTBkit Introduction & Best Practices
RTBkit Introduction & Best PracticesDatacratic
 
Vonage & Aspect: Transform Real-Time Communications & Customer Engagement (TL...
Vonage & Aspect: Transform Real-Time Communications & Customer Engagement (TL...Vonage & Aspect: Transform Real-Time Communications & Customer Engagement (TL...
Vonage & Aspect: Transform Real-Time Communications & Customer Engagement (TL...Amazon Web Services
 
Our trading infrastructure for next generation trading with quant and automat...
Our trading infrastructure for next generation trading with quant and automat...Our trading infrastructure for next generation trading with quant and automat...
Our trading infrastructure for next generation trading with quant and automat...Bryan Downing
 
Building a [micro]services platform on AWS
Building a [micro]services platform on AWSBuilding a [micro]services platform on AWS
Building a [micro]services platform on AWSShaun Pearce
 
Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...
Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...
Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...Amazon Web Services
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaSteven Wu
 
2019 03-13-implementing microservices by ddd
2019 03-13-implementing microservices by ddd2019 03-13-implementing microservices by ddd
2019 03-13-implementing microservices by dddKim Kao
 
Implementing Microservices by DDD
Implementing Microservices by DDDImplementing Microservices by DDD
Implementing Microservices by DDDAmazon Web Services
 

Similar to Samza london hug presentation Aug 5 2015 - Garry Turkington (20)

Best Practices for Scaling an InfluxEnterprise Cluster
Best Practices for Scaling an InfluxEnterprise ClusterBest Practices for Scaling an InfluxEnterprise Cluster
Best Practices for Scaling an InfluxEnterprise Cluster
 
Praxistaugliche notes strategien 4 cloud
Praxistaugliche notes strategien 4 cloudPraxistaugliche notes strategien 4 cloud
Praxistaugliche notes strategien 4 cloud
 
RTBkit 2.0 Roadmap Preview
RTBkit 2.0 Roadmap PreviewRTBkit 2.0 Roadmap Preview
RTBkit 2.0 Roadmap Preview
 
Moving your social collaboration infrastructure to the Cloud. Stairway to Hea...
Moving your social collaboration infrastructure to the Cloud. Stairway to Hea...Moving your social collaboration infrastructure to the Cloud. Stairway to Hea...
Moving your social collaboration infrastructure to the Cloud. Stairway to Hea...
 
Simply Business - Near Real Time Event Processing
Simply Business - Near Real Time Event ProcessingSimply Business - Near Real Time Event Processing
Simply Business - Near Real Time Event Processing
 
Become a Serverless Black Belt - Optimizing Your Serverless Applications - AW...
Become a Serverless Black Belt - Optimizing Your Serverless Applications - AW...Become a Serverless Black Belt - Optimizing Your Serverless Applications - AW...
Become a Serverless Black Belt - Optimizing Your Serverless Applications - AW...
 
AWS re:Invent 2016: Getting Started with Serverless Architectures (CMP211)
AWS re:Invent 2016: Getting Started with Serverless Architectures (CMP211)AWS re:Invent 2016: Getting Started with Serverless Architectures (CMP211)
AWS re:Invent 2016: Getting Started with Serverless Architectures (CMP211)
 
Ten tools for ten big data areas 01 informatica
Ten tools for ten big data areas 01 informatica Ten tools for ten big data areas 01 informatica
Ten tools for ten big data areas 01 informatica
 
Serverless Design Patterns for Rethinking Traditional Enterprise Application ...
Serverless Design Patterns for Rethinking Traditional Enterprise Application ...Serverless Design Patterns for Rethinking Traditional Enterprise Application ...
Serverless Design Patterns for Rethinking Traditional Enterprise Application ...
 
Build High-Throughput, Bursty Data Apps with Amazon SQS, SNS, & Lambda (API30...
Build High-Throughput, Bursty Data Apps with Amazon SQS, SNS, & Lambda (API30...Build High-Throughput, Bursty Data Apps with Amazon SQS, SNS, & Lambda (API30...
Build High-Throughput, Bursty Data Apps with Amazon SQS, SNS, & Lambda (API30...
 
Breaking Up the Monolith While Migrating to AWS (GPSTEC320) - AWS re:Invent 2018
Breaking Up the Monolith While Migrating to AWS (GPSTEC320) - AWS re:Invent 2018Breaking Up the Monolith While Migrating to AWS (GPSTEC320) - AWS re:Invent 2018
Breaking Up the Monolith While Migrating to AWS (GPSTEC320) - AWS re:Invent 2018
 
RTBkit Introduction & Best Practices
RTBkit Introduction & Best PracticesRTBkit Introduction & Best Practices
RTBkit Introduction & Best Practices
 
Vonage & Aspect: Transform Real-Time Communications & Customer Engagement (TL...
Vonage & Aspect: Transform Real-Time Communications & Customer Engagement (TL...Vonage & Aspect: Transform Real-Time Communications & Customer Engagement (TL...
Vonage & Aspect: Transform Real-Time Communications & Customer Engagement (TL...
 
Our trading infrastructure for next generation trading with quant and automat...
Our trading infrastructure for next generation trading with quant and automat...Our trading infrastructure for next generation trading with quant and automat...
Our trading infrastructure for next generation trading with quant and automat...
 
Building a [micro]services platform on AWS
Building a [micro]services platform on AWSBuilding a [micro]services platform on AWS
Building a [micro]services platform on AWS
 
Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...
Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...
Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
 
2019 03-13-implementing microservices by ddd
2019 03-13-implementing microservices by ddd2019 03-13-implementing microservices by ddd
2019 03-13-implementing microservices by ddd
 
Implementing Microservices by DDD
Implementing Microservices by DDDImplementing Microservices by DDD
Implementing Microservices by DDD
 

Recently uploaded

VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
(ISHITA) Call Girls Service Hyderabad Call Now 8617697112 Hyderabad Escorts
(ISHITA) Call Girls Service Hyderabad Call Now 8617697112 Hyderabad Escorts(ISHITA) Call Girls Service Hyderabad Call Now 8617697112 Hyderabad Escorts
(ISHITA) Call Girls Service Hyderabad Call Now 8617697112 Hyderabad EscortsCall girls in Ahmedabad High profile
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 

Recently uploaded (20)

VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
(ISHITA) Call Girls Service Hyderabad Call Now 8617697112 Hyderabad Escorts

Samza London HUG presentation, Aug 5 2015 - Garry Turkington

  • 1. Copyright © 2015 Improve Digital - All Rights Reserved
    Example Use of Samza at Improve Digital
    Garry Turkington, CTO
    g.turkington@improvedigital.com
    @garryturk
  • 2. About Us: Improve Digital
    • Cloud-based real-time advertising technology
    • Focus on the premium publisher / media owner
    • Integrations with thousands of demand partners
    • Decisions driven by real-time data
    • Offices in the UK, NL, DE and ES
    • 100+ premium publishers
  • 3. Data Sources
  • 4. Why Samza is useful
    • We have lots of data coming from the ad server fleet
    • We need to get to it as fast as possible
    • Kafka was a natural hub for that data
    • And Samza was a natural extension for message processing
    • Used for both single-message notifications and aggregations
  • 5. Samza integration
    • A Samza job runs within multiple YARN containers
    • Each container runs multiple tasks
    • The task instances run the custom processing code
    • Each task is the dedicated consumer of a single partition in each of its input topics
    • Samza checkpoints task progress to allow recovery after a task failure or restart
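As a rough illustration of the task model above, the plain-Java sketch below (no Samza dependency) assumes a hypothetical job whose input topics all have the same number of partitions and assigns partition i of every input topic to task i. The round-robin placement of tasks into containers, the class name `TaskLayout` and the topic names are assumptions for illustration, not Samza's actual grouping code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a Samza-style job whose input topics share a partition count.
final class TaskLayout {
    // taskId -> "topic-partition" inputs that the task consumes
    static Map<Integer, List<String>> assignPartitions(List<String> topics, int partitions) {
        Map<Integer, List<String>> tasks = new TreeMap<>();
        for (int p = 0; p < partitions; p++) {
            List<String> inputs = new ArrayList<>();
            for (String topic : topics) {
                // task p is the sole consumer of partition p of every input topic
                inputs.add(topic + "-" + p);
            }
            tasks.put(p, inputs);
        }
        return tasks;
    }

    // containerId -> task ids, using simple round-robin placement (an assumption)
    static Map<Integer, List<Integer>> placeTasks(int taskCount, int containers) {
        Map<Integer, List<Integer>> placement = new TreeMap<>();
        for (int t = 0; t < taskCount; t++) {
            placement.computeIfAbsent(t % containers, c -> new ArrayList<>()).add(t);
        }
        return placement;
    }
}
```

For example, with two 4-partition topics, task 2 consumes `imp-logs-2` and `bid-logs-2`, and two containers host tasks {0, 2} and {1, 3}.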
  • 6. Samza API
    Every Samza task implements the StreamTask interface, which has one method:
      void process(IncomingMessageEnvelope envelope,
                   MessageCollector collector,
                   TaskCoordinator coordinator);
  • 7. Samza API (continued)
    Optionally, the InitableTask and WindowableTask interfaces can also be implemented:
      void init(Config config, TaskContext context);
      void window(MessageCollector collector, TaskCoordinator coordinator);
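The division of labour between the three methods can be sketched in plain Java. This is not the Samza API: the class `CountingTaskSketch` is hypothetical, and a `String` key and a `List<String>` stand in for `IncomingMessageEnvelope` and `MessageCollector`. The point is the lifecycle: `init` runs once, `process` runs per message and only updates state, and `window` runs periodically (in Samza the period is set with the `task.window.ms` property) to emit and reset that state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for a task implementing StreamTask, InitableTask and WindowableTask.
// Real Samza tasks receive IncomingMessageEnvelope / MessageCollector / TaskCoordinator instead.
final class CountingTaskSketch {
    private Map<String, Long> counts;

    // Mirrors init(Config, TaskContext): called once before any messages arrive.
    void init() {
        counts = new HashMap<>();
    }

    // Mirrors process(...): called once per incoming message; here it only updates state.
    void process(String clientKey) {
        counts.merge(clientKey, 1L, Long::sum);
    }

    // Mirrors window(...): called periodically; emits the accumulated counts, then resets them.
    void window(List<String> collector) {
        counts.forEach((client, n) -> collector.add(client + "=" + n));
        counts.clear();
    }
}
```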
  • 8. Example Problem Statement
    For each client, produce one-minute aggregates of:
    • Timestamp
    • Client
    • Number of impressions (ads won)
    • Number of bids received
    • Amount of budget spent
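One way to model the required output is a record keyed by client and minute, with the timestamp floored to the start of its minute. The sketch below is a hypothetical in-memory version: the class names, the event methods and the choice to hold spend as an integer number of micro-units (to avoid floating-point rounding) are assumptions, not details from the talk.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical shape of the per-client, per-minute aggregate from the problem statement.
final class MinuteAggregate {
    final long minuteStartMs;
    final String client;
    long impressions;   // ads won
    long bidsReceived;
    long spendMicros;   // budget spent, as integer micro-units (assumption)

    MinuteAggregate(long minuteStartMs, String client) {
        this.minuteStartMs = minuteStartMs;
        this.client = client;
    }

    // Floors an epoch-millisecond timestamp to the start of its minute.
    static long minuteStart(long timestampMs) {
        return timestampMs - Math.floorMod(timestampMs, 60_000L);
    }
}

final class MinuteAggregator {
    // Keyed by "client|minuteStart"; TreeMap keeps iteration order deterministic.
    final Map<String, MinuteAggregate> aggregates = new TreeMap<>();

    private MinuteAggregate bucket(String client, long timestampMs) {
        long minute = MinuteAggregate.minuteStart(timestampMs);
        return aggregates.computeIfAbsent(client + "|" + minute, k -> new MinuteAggregate(minute, client));
    }

    void onBid(String client, long timestampMs) {
        bucket(client, timestampMs).bidsReceived++;
    }

    void onImpression(String client, long timestampMs, long priceMicros) {
        MinuteAggregate agg = bucket(client, timestampMs);
        agg.impressions++;
        agg.spendMicros += priceMicros;
    }
}
```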
  • 9. Step 1: Getting Data Into Kafka
    • Ad server log files are pushed into S3, which creates SQS notifications
    • An ingest application reads the notifications and pulls the files
    • It then splits the files and pushes the individual records into Kafka
    • The ingest application ensures each record has the correct partition key
    • Hold the obvious question about where the ingest lives…
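The slide only says the ingest app ensures the correct partition key. Assuming a hypothetical tab-separated log line whose second field is the client ID, a splitter might look like this; keying by client sends every record for a client to the same partition and therefore to the same downstream task. The `hashCode`-modulo partition choice is illustrative only: in practice the ingest app would normally just set the record key and let the Kafka producer's default partitioner apply its own murmur2-based hash.

```java
import java.util.ArrayList;
import java.util.List;

final class IngestSplitter {
    // A keyed record ready to be sent to Kafka (hypothetical representation).
    static final class KeyedRecord {
        final String key;
        final int partition;
        final String value;

        KeyedRecord(String key, int partition, String value) {
            this.key = key;
            this.partition = partition;
            this.value = value;
        }
    }

    // Splits a downloaded log file into records keyed by client ID
    // (assumed to be the second tab-separated field).
    static List<KeyedRecord> split(String fileContents, int numPartitions) {
        List<KeyedRecord> records = new ArrayList<>();
        for (String line : fileContents.split("\n")) {
            if (line.isBlank()) continue;           // skip empty lines
            String[] fields = line.split("\t");
            if (fields.length < 2) continue;        // skip malformed lines rather than mis-key them
            String clientId = fields[1];
            int partition = Math.floorMod(clientId.hashCode(), numPartitions);
            records.add(new KeyedRecord(clientId, partition, line));
        }
        return records;
    }
}
```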
  • 10. Getting Data Into Kafka
    [Diagram: Ad Server Logs land in S3; SQS Notifications reach the Ingest App, which issues Get requests to S3 for the logs and writes Log Records to Kafka]
  • 11. Step 2: Top-Level Aggregation
    • The raw ingest rate is very high
    • So a first job performs 10-second aggregations on the input
    • This job also does some column filtering/cleaning
    • Not strictly necessary, but the output proved very useful
    • Plus it helps reduce YARN resource requirements
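A first-stage reducer of this kind can be sketched as follows. The raw field layout (`[timestampMs, client, type, ...]`), the key format and the class name are hypothetical; the sketch shows the two ideas on the slide: keeping only the columns later jobs need, and collapsing many raw records into one count per 10-second bucket.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical first-stage reducer: drops unneeded columns and collapses raw log
// records into 10-second counts per (client, event type).
final class TenSecondReducer {
    static final long BUCKET_MS = 10_000L;

    // key "bucketStart|client|type" -> count; TreeMap gives deterministic order
    final Map<String, Long> counts = new TreeMap<>();

    // rawFields assumed to be [timestampMs, client, type, ...other columns ignored]
    void process(String[] rawFields) {
        long ts = Long.parseLong(rawFields[0]);
        long bucket = ts - Math.floorMod(ts, BUCKET_MS);
        String key = bucket + "|" + rawFields[1] + "|" + rawFields[2];
        counts.merge(key, 1L, Long::sum);
    }
}
```

In a Samza job, counts like these could be emitted to an output topic and cleared from `window()` rather than read directly.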
  • 12. Top-Level Aggregation
    [Diagram: Imp Logs → Impression Reduction; Bid Logs → Bids Reduction]
  • 13. Step 3: Main Aggregation
    • First step that is specific to this use case
    • Combines the 10-second aggregates into per-minute aggregates
    • Importantly, also reduces the data to a much smaller set of key fields
    • Builds the aggregates using persistent task state
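The per-minute rollup can be sketched with a `Map` standing in for persistent task state; in Samza this would be a `KeyValueStore`, usually backed by a Kafka changelog topic so the state can be rebuilt after a restart. The input shape, including the `siteId` key field that this stage drops, is a hypothetical example of reducing to a much smaller set of key fields.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical second-stage job: folds 10-second aggregates into per-minute totals.
// The Map stands in for a persistent task store.
final class MinuteRollup {
    private final Map<String, Long> store;

    MinuteRollup(Map<String, Long> store) {
        this.store = store;
    }

    // Input: a 10-second aggregate that carries an extra key field (siteId) this stage drops.
    // Returns the updated per-minute total so it can be emitted downstream.
    long process(long bucketStartMs, String client, String siteId, String type, long count) {
        long minute = bucketStartMs - Math.floorMod(bucketStartMs, 60_000L);
        String key = minute + "|" + client + "|" + type; // siteId intentionally not part of the key
        return store.merge(key, count, Long::sum);
    }
}
```

Passing the same map into a new `MinuteRollup` instance mimics a restarted task picking up its restored state.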
  • 14. Main Aggregation
    [Diagram: Imp Logs → Impression Reduction → Impression Aggregation → 1 min imp aggs; Bid Logs → Bids Reduction → Bids Aggregation → 1 min bid aggs]
  • 15. Step 4: Crossing the Streams
    • The next job in the chain creates composite structures
    • The impression and bid aggregates are combined into single records
    • Heavily uses Samza persistent state
    • A data object is stored for each minute, and its fields are filled from both streams
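The combining step can be sketched as a join on (client, minute) over stored partial records. The class and field names are hypothetical, and the emit policy (emit and evict as soon as both streams have contributed) is an assumption; a real job might instead wait for a timeout in `window()` so that late or missing halves are handled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical combiner: joins 1-minute impression and bid aggregates into one composite record.
final class StreamCombiner {
    static final class Composite {
        final String client;
        final long minuteStartMs;
        Long impressions; // null until the impression aggregate arrives
        Long spend;
        Long bids;        // null until the bid aggregate arrives

        Composite(String client, long minuteStartMs) {
            this.client = client;
            this.minuteStartMs = minuteStartMs;
        }

        boolean complete() {
            return impressions != null && bids != null;
        }
    }

    // Stand-in for persistent task state keyed by "client|minute".
    private final Map<String, Composite> pending = new HashMap<>();

    Optional<Composite> onImpressionAgg(String client, long minute, long impressions, long spend) {
        Composite c = lookup(client, minute);
        c.impressions = impressions;
        c.spend = spend;
        return emitIfComplete(c);
    }

    Optional<Composite> onBidAgg(String client, long minute, long bids) {
        Composite c = lookup(client, minute);
        c.bids = bids;
        return emitIfComplete(c);
    }

    private Composite lookup(String client, long minute) {
        return pending.computeIfAbsent(client + "|" + minute, k -> new Composite(client, minute));
    }

    // Emits and evicts the record once both streams have contributed (assumed policy).
    private Optional<Composite> emitIfComplete(Composite c) {
        if (!c.complete()) return Optional.empty();
        pending.remove(c.client + "|" + c.minuteStartMs);
        return Optional.of(c);
    }
}
```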
  • 16. Crossing the Streams
    [Diagram: Imp Logs → Impression Reduction → Impression Aggregation → 1 min imp aggs; Bid Logs → Bids Reduction → Bids Aggregation → 1 min bid aggs; both aggregate streams → Combiner → Composite records]
  • 17. Reference Data
    This doesn't yet give us revenue data converted into each client's currency. To do the conversion, we need reference data:
    • Exchange rate information
    • The preferred currency for each client
    • We use Samza bootstrap streams to inject this information
    • Then update it as new data arrives
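Once the bootstrap streams have been consumed (Samza reads a bootstrap stream up to its current head before delivering messages from the task's other inputs), the reference data can be kept in simple lookup tables that later messages overwrite. The sketch assumes every amount arrives in a single base currency and that rates are expressed as target-currency units per base unit; the class and method names are hypothetical.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder for the two reference streams.
final class CurrencyReference {
    private final Map<String, BigDecimal> ratesFromBase = new HashMap<>();
    private final Map<String, String> clientCurrency = new HashMap<>();

    // Called for each exchange-rate message; later messages overwrite earlier ones.
    void onExchangeRate(String currency, BigDecimal rate) {
        ratesFromBase.put(currency, rate);
    }

    // Called for each client-preference message.
    void onClientCurrency(String client, String currency) {
        clientCurrency.put(client, currency);
    }

    // Converts a base-currency amount into the client's preferred currency, rounded to 2 decimals.
    BigDecimal toClientCurrency(String client, BigDecimal baseAmount) {
        String currency = clientCurrency.get(client);
        if (currency == null) throw new IllegalStateException("no currency for client " + client);
        BigDecimal rate = ratesFromBase.get(currency);
        if (rate == null) throw new IllegalStateException("no rate for " + currency);
        return baseAmount.multiply(rate).setScale(2, RoundingMode.HALF_UP);
    }
}
```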
  • 18. Reference Data
    [Diagram: the impression/bid pipeline (Imp Logs and Bid Logs through reduction and aggregation to 1 min imp aggs and 1 min bid aggs, then Combiner → Composite records), with Reference Streams added]
  • 19. Step 5: Getting Data Out of Kafka
    The final job reads the composite records and pushes them downstream. Where the export lives is flexible:
    • Write directly from the final job
    • Add a specific output job
    • Use an external client
    • Kafka Copycat may give a common framework here
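For the external-client option, an exporter often buffers records and writes them downstream in batches. The sketch below is hypothetical: the CSV layout, batch size and the `Consumer` sink (standing in for, say, a bulk database insert) are assumptions. A real exporter reading from Kafka would usually commit its consumer offsets only after a batch has been written, which gives at-least-once delivery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical exporter: buffers composite records and flushes them to a sink in fixed-size batches.
final class BatchExporter {
    private final int batchSize;
    private final Consumer<List<String>> sink;
    private final List<String> buffer = new ArrayList<>();

    BatchExporter(int batchSize, Consumer<List<String>> sink) {
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Formats one composite record as a CSV line and flushes when the batch is full.
    void accept(long minuteStartMs, String client, long impressions, long bids, String spend) {
        buffer.add(minuteStartMs + "," + client + "," + impressions + "," + bids + "," + spend);
        if (buffer.size() >= batchSize) flush();
    }

    // Also call on shutdown so a partial batch is not lost.
    void flush() {
        if (buffer.isEmpty()) return;
        sink.accept(new ArrayList<>(buffer)); // hand the sink a copy, then reuse the buffer
        buffer.clear();
    }
}
```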
  • 20. Getting Data Out of Kafka
    [Diagram: the full pipeline (Imp Logs and Bid Logs through reduction, aggregation and the Combiner to Composite records, with Reference Streams), ending in Exporter → Downstream Data]
  • 21. Final Bill of Materials
    To meet our requirement, we use a combination of:
    • An external ingest client
    • An initial data reduction job
    • A data aggregation job
    • A stream combiner job
    • A data exporter job
  • 22. Summary
    • Samza gives great flexibility for small-window aggregation
    • Decomposing into multiple jobs creates reusable intermediate data
    • Remember that the output of each job is available to any Kafka client
    • Much thought often goes into getting data into and out of Kafka
    • Kafka Copycat may make that a whole lot easier
  • 23. Questions?