Niels Basjes | Principal IT Architect
How to handle
100K events/sec
Measuring 2.0
• Context
• Why Measuring 2.0 (M2)
• Measuring
• Cause and effect
• End-to-end pipeline
• State machines
Agenda
TU-Delft Computer Science (MSc)
Nyenrode Business School (MSc)
Software Developer (USoft)
Research Scientist (NLR)
Infra Architect (NLR)
WebAnalytics Architect (Moniforce)
IT Architect/Inventor (Bol.com)
Contributor for Apache Hadoop,
Pig, HBase, Flink, Beam, Parquet, …
Apache Avro Committer & PMC
Niels Basjes
nbasjes@bol.com
@nielsbasjes
https://github.com/nielsbasjes
Previously at BBuzz
20.074.985
Measuring interaction
• What are we showing (and why)?
• How are our visitors responding?
• Pages
• Products/Offers
• Add to cart
• Purchase
• Advertising
• Inspiration
Use cases
• Dashboards
• Personalization
• Site optimization
• Fraud prevention
• Data Science
• …
Why M2?
We already have
Omniture/GA…
~ 3K-4K pages/sec
~ 30 events/page
~ 100K events/sec
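A quick check of how the headline rate follows from the two figures above. The numbers are the ones on the slide; the function name is only illustrative.

```python
# Figures taken from the slide: page rate (low and high end) and events per page.
PAGES_PER_SEC = (3_000, 4_000)
EVENTS_PER_PAGE = 30


def events_per_sec(pages_per_sec: int, events_per_page: int = EVENTS_PER_PAGE) -> int:
    """Events generated per second for a given page rate."""
    return pages_per_sec * events_per_page


low, high = (events_per_sec(p) for p in PAGES_PER_SEC)
print(f"{low:,} to {high:,} events/sec")  # 90,000 to 120,000 events/sec
```

So "~ 100K events/sec" sits in the middle of the 90K to 120K range.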
Q2 2019: Per day
• 10^9 measurements
• 2TiB data
More details
JavaScript data is …
• Broken … fundamentally broken
• Measure side effect
• Missing & Duplicate orders
• Blockable
• Intelligent Tracking Protection, Adblockers, Spiders, Hackers, …
• “Boxed” SaaS solutions.
• Not enough ‘eVars’.
Loading extra labels
needed for reporting.
JavaScript data …
SRSLY?
Measurements are…
too old.
• Available once every 24 hours.
• So personalization is a ‘day behind’.
Useless inspiration:
I was interested in this YESTERDAY
Data relevance decay
[Chart: value of the data (vertical axis) against age of the data (horizontal axis: minutes, days, weeks).]
Time is not always important
Crowd pattern analysis of website usage
Building a better website for future visitors: Batch
Individual pattern analysis of website usage
Supporting and advising the current visitor: Realtime
Batch processing
Stream processing
M2
Measuring 2.0
It’s really all about…
• Measuring
• Better
• Processing
• Faster
• Applications
• More relevant
Goals of “Measuring 2.0”
• Measure everything of our website
• All interactions (also AJAX)
• All channels (also mobile, email, …)
• All countries
• All details
• All visitors (also Googlebot)
• More reliable data
• Lowest possible load on the client
• Lowest possible latency (< 1 second)
Goals of “Measuring 2.0”
• Developer
• Easy to build
• Easy to validate
• Test automation
• Business
• Always measure everything
• Data is “independent”
• New questions are allowed
Goals of “Measuring 2.0”
• Privacy by design
• General Data Protection Regulation (GDPR)
• Algemene verordening gegevensbescherming (AVG)
• No long term profiling
• Security
• Avoid storing “login” info.
• Business
• Do long term profiling
THE goal of “Measuring 2.0”
Make the best possible
interaction data stream.
Measure
Where to measure?
• Measure where “it” happens.
• “In” the responsible “frontend” service!
• Webshop
• App API
• Basket Service
• Order Service
• …
• Usually NOT in the browser
Measure pages
• Serverside
• What is in the page
• Clientside (JavaScript)
• What part was viewed
• Screen resolution
Measure orders
• Listen to Order events!
• with website/app sessionid.
• The “Order confirmation” page.
• Just a viewing of the order.
Record everything at the start
• Measure what “really” happens.
• Keep all relevant details
• Product: ProductId, Product type, …
• Offer: OfferId, ProductId, Price, Condition, SellerId, …
• Later joining on productid/offerid is “impossible”.
• Webshop caching
• Data volume / Extra latency
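A minimal sketch of what "keep all relevant details" could look like at the measuring point. The record type, its field names and the `measure_offer` helper are hypothetical; only the list of offer details (OfferId, ProductId, Price, Condition, SellerId) comes from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OfferShown:
    """Hypothetical event: an offer exactly as it was shown to the visitor."""
    session_id: str
    timestamp_ms: int
    offer_id: str
    product_id: str
    price_cents: int
    condition: str
    seller_id: str


def measure_offer(session_id: str, timestamp_ms: int, offer: dict) -> OfferShown:
    """Copy every detail from the offer the webshop rendered, at render time."""
    return OfferShown(
        session_id=session_id,
        timestamp_ms=timestamp_ms,
        offer_id=offer["offer_id"],
        product_id=offer["product_id"],
        price_cents=offer["price_cents"],
        condition=offer["condition"],
        seller_id=offer["seller_id"],
    )


rendered = {"offer_id": "o-1", "product_id": "p-9", "price_cents": 1999,
            "condition": "new", "seller_id": "s-3"}
event = measure_offer("sess-42", 1_000, rendered)

# The catalogue changes after the page was served; the event is unaffected.
rendered["price_cents"] = 2499
print(event.price_cents)  # 1999
```

Because the event carries its own copy of the details, a later lookup by offer id (which would see the new price, or a cached one) is not needed.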
Cause
and effect
• All single event entities
• No correlations
• No logical ‘cause and effect’
This is easy
Our use cases
• Banner optimization
• Look / Search → Next page better banner
• A/B testing
• Show feature → Use → Buy product
• Search Suggestions
• Search → Find → Choose → Buy product
• Attribution modeling
• Show ad → Click → Buy product
• …
Behavioral analytics
• Cause and effect
• Action: We show something
• Reaction: To click or not to click
Event ordering matters
• Click banner, Buy product
• Banner WAS (possibly) part of reason to buy.
• Buy product, Click banner
• Banner WAS NOT part of reason to buy.
“WAS” or “WAS NOT” is based on
The ordering of the events
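The rule can be sketched as a small function over the ordered events of one session. The event names `click_banner` and `buy_product` are illustrative assumptions, not names from the actual system.

```python
def banner_may_have_contributed(events):
    """Return True if a banner click happened before the first purchase.

    `events` is the list of event names of ONE session, in the order
    in which they happened.
    """
    clicked = False
    for name in events:
        if name == "click_banner":
            clicked = True
        elif name == "buy_product":
            return clicked
    return False  # no purchase at all


was = banner_may_have_contributed(["click_banner", "buy_product"])
was_not = banner_may_have_contributed(["buy_product", "click_banner"])
print(was, was_not)  # True False
```

The same two events give opposite answers, so the answer is only as good as the ordering of the input.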
Finite State machine
• Simple, low latency, pattern detection
• Ordered events
Pushdown automaton
• State machine
• with a memory stack
• Simple, low latency,
pattern detection
• Ordered events
Event ordering matters
• A fast temperature change is dangerous
• should alert IMMEDIATELY
• Delta stays in bounds
• Expect “Ordered”
prev = null;
while (curr = newEvent()) {
    // Only compare once a previous reading exists.
    if (prev != null && tooBig(curr, prev))
        sendAlert();
    prev = curr;
}
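A runnable version of the loop above, as a sketch. The threshold, the reading labels and the temperature values are made up for illustration.

```python
from typing import Iterable, List, Tuple

MAX_DELTA = 20.0  # assumed threshold for a "too fast" change


def fast_changes(readings: Iterable[Tuple[str, float]],
                 max_delta: float = MAX_DELTA) -> List[str]:
    """Return the labels of readings that differ too much from the previous one."""
    alerts = []
    prev = None
    for label, temp in readings:
        # The first reading has nothing to compare against.
        if prev is not None and abs(temp - prev) > max_delta:
            alerts.append(label)
        prev = temp
    return alerts


ordered = [("T1", 10.0), ("T2", 15.0), ("T3", 22.0), ("T4", 60.0), ("T5", 65.0)]
alerts = fast_changes(ordered)
print(alerts)  # ['T4']
```

The only state kept is the previous reading, which is why the check is cheap and can alert immediately, provided the readings arrive in order.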
[Chart: temperature and temperature delta for readings T1 to T9, received in order; vertical axis from -40 to 100.]
This is a simple
pushdown automaton
Event ordering matters
• A fast temperature change is dangerous
• should alert IMMEDIATELY
Ordering problems
• Many false positives !
• Many false negatives !
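[Chart: temperature and temperature delta for the same readings, received in the order T1 T5 T7 T4 T2 T8 T3 T6 T9; vertical axis from -40 to 100. Several points are marked with "!".]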
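A small sketch of the effect: the same delta check fed with in-order and out-of-order readings. The arrival order is the one on the chart axis (T1 T5 T7 T4 T2 T8 T3 T6 T9); the temperature values and the threshold are invented for illustration.

```python
MAX_DELTA = 10.0  # assumed threshold


def fast_changes(readings, max_delta=MAX_DELTA):
    """Labels of readings whose delta to the previously RECEIVED reading is too big."""
    alerts, prev = [], None
    for label, temp in readings:
        if prev is not None and abs(temp - prev) > max_delta:
            alerts.append(label)
        prev = temp
    return alerts


# A smooth ramp: every true step is 4 degrees, well inside the threshold.
temps = {f"T{i}": 10.0 + 4.0 * (i - 1) for i in range(1, 10)}

in_order = [(f"T{i}", temps[f"T{i}"]) for i in range(1, 10)]
arrival = ["T1", "T5", "T7", "T4", "T2", "T8", "T3", "T6", "T9"]
shuffled = [(label, temps[label]) for label in arrival]

alerts_in_order = fast_changes(in_order)
alerts_shuffled = fast_changes(shuffled)
print(alerts_in_order)  # []
print(alerts_shuffled)  # ['T5', 'T4', 'T8', 'T3', 'T6', 'T9']
```

Every alert in the second run is a false positive: the underlying signal never changed by more than 4 degrees per step.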
Repairing event ordering
• Is hard
• Needless complexity
• Takes time
• Buffer for the maximum ‘out-of-orderness’ period.
• Several minutes
• We want really low latency
[Diagram: events 1 to 9 passing through a sliding time-based sort buffer.]
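A minimal sketch of such a sort buffer, to make the latency cost visible. The class name and the choice of a heap are assumptions; timestamps are plain integers here.

```python
import heapq


class SortBuffer:
    """Hold events until they are older than the allowed out-of-orderness."""

    def __init__(self, max_out_of_order):
        self.max_out_of_order = max_out_of_order
        self._heap = []        # (timestamp, payload), smallest timestamp first
        self._max_seen = None  # highest timestamp received so far

    def push(self, timestamp, payload):
        """Add one event; return the events that are now safe to emit, in order."""
        heapq.heappush(self._heap, (timestamp, payload))
        if self._max_seen is None or timestamp > self._max_seen:
            self._max_seen = timestamp
        watermark = self._max_seen - self.max_out_of_order
        ready = []
        while self._heap and self._heap[0][0] <= watermark:
            ready.append(heapq.heappop(self._heap))
        return ready

    def flush(self):
        """Emit everything that is still buffered, in timestamp order."""
        return [heapq.heappop(self._heap) for _ in range(len(self._heap))]


buf = SortBuffer(max_out_of_order=5)
emitted = []
for ts in [1, 5, 7, 4, 2, 8, 3, 6, 9]:
    emitted.extend(buf.push(ts, f"e{ts}"))
emitted.extend(buf.flush())
print([ts for ts, _ in emitted])  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Note that event 1 is only released once event 7 has arrived: every event waits for the full out-of-orderness period, which is exactly the latency the talk wants to avoid.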
Exactly once please
• At least once
• Need data deduplication
• Is hard
• Large memory buffer
• Idempotent output
• Takes time
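A sketch of why deduplication on top of at-least-once delivery needs memory. The bounded "seen" set below is one possible approach, not the one used in the talk; event ids are assumed to exist.

```python
from collections import OrderedDict


class Deduplicator:
    """Remember the most recent `capacity` event ids and drop repeats."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._seen = OrderedDict()

    def is_new(self, event_id):
        if event_id in self._seen:
            self._seen.move_to_end(event_id)
            return False
        self._seen[event_id] = True
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # forget the oldest id
        return True


big = Deduplicator(capacity=3)
kept = [e for e in ["a", "b", "a", "c", "b"] if big.is_new(e)]
print(kept)  # ['a', 'b', 'c']

# With too little memory a late duplicate slips through.
small = Deduplicator(capacity=2)
leaked = [e for e in ["a", "b", "c", "a"] if small.is_new(e)]
print(leaked)  # ['a', 'b', 'c', 'a']
```

The buffer has to cover the longest possible gap between a message and its duplicate, which is what makes it large.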
We need
ordering guarantees
per session
Achieving end-to-end ordering...
1. The measuring point
2. Measurement transport
3. Measurement processing
The measuring point
• Single entity
• → single measuring instance
• Multiple instances
• Multiple output buffers
• Race conditions
• Ordering problems
The measuring point
In IOT:
• One temperature sensor
• one recording device
At bol.com
• One visitor
• Single webshop instance
• Session routing is a MUST have!!
• Not perfect!
• Impact negligible
• “View” measurements
• Orders
Transport
Message transport
• We need ordering per session: FIFO
• “Queue” or “Partitioned Queue”
• Session pinned to a specific partition
https://en.wikipedia.org/wiki/Queue_(abstract_data_type)
Partitioned Queue
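A sketch of pinning a session to a partition. This is not Kafka's own partitioner: `zlib.crc32` stands in as a stable hash for illustration, and the partition count is an assumption.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # assumed


def partition_for(session_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stable mapping from session id to partition number."""
    return zlib.crc32(session_id.encode("utf-8")) % num_partitions


def route(events):
    """Append each (session_id, payload) to the FIFO of its partition."""
    partitions = defaultdict(list)
    for session_id, payload in events:
        partitions[partition_for(session_id)].append((session_id, payload))
    return partitions


events = [("s1", "search"), ("s2", "search"), ("s1", "pdp"),
          ("s2", "pdp"), ("s1", "buy")]
partitions = route(events)

# All events of one session sit in one partition, in their original order.
s1_events = [p for s, p in partitions[partition_for("s1")] if s == "s1"]
print(s1_events)  # ['search', 'pdp', 'buy']
```

Ordering is only guaranteed within a partition, so keying by session id gives per-session FIFO without requiring a global order.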
Many “Queues” are not a Queue!
https://en.wikipedia.org/wiki/Java_Message_Service
https://stackoverflow.com/questions/16300353/activemq-lifo-ordering
Many “Queues” are not a Queue!
https://cloud.google.com/pubsub/docs/ordering
High volume partitioned queues
• Apache Kafka
• https://kafka.apache.org/
• Production ready
• Apache Pulsar
• https://pulsar.apache.org/
• Connector for Flink very new.
• Pravega
• http://pravega.io/
• Not yet production ready
• Amazon Kinesis
• Sorry, wrong cloud
• Microsoft Azure Event Hubs
• Sorry, wrong cloud
If ordering does not matter
and at-least-once is Ok:
High IO ‘Distributed Set’
Use Google PubSub

If ordering matters
and/or exactly-once is needed:
High IO ‘Partitioned Queue’
Use Apache Kafka
Processing
Measurement processing
Requirements
• Low latency
• Exactly once
• Ordering guarantees
• A pushdown automaton per session
• Keyed Stateful processing
• Where the ‘key’ is the ‘session id’
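The idea of keyed stateful processing, sketched in plain Python rather than with any streaming framework: one small state object per session id, updated as events arrive. Event names and the output label are illustrative assumptions.

```python
from collections import defaultdict


class SessionState:
    """State kept for ONE session (the key)."""

    def __init__(self):
        self.banner_clicked = False


def process(stream):
    """Consume (session_id, event_name) pairs, ordered per session.

    Yields (session_id, "banner_attributed_purchase") when a purchase
    follows a banner click within the same session.
    """
    states = defaultdict(SessionState)
    for session_id, name in stream:
        state = states[session_id]
        if name == "click_banner":
            state.banner_clicked = True
        elif name == "buy_product" and state.banner_clicked:
            yield (session_id, "banner_attributed_purchase")


# Two sessions interleaved on one stream.
stream = [("s1", "click_banner"), ("s2", "buy_product"),
          ("s2", "click_banner"), ("s1", "buy_product")]
results = list(process(stream))
print(results)  # [('s1', 'banner_attributed_purchase')]
```

Interleaving between sessions does not matter; only the order within each session does, which matches the per-session ordering guarantee asked of the transport.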
Choosing a Processing toolkit
Apache Beam
• Low latency … except
• Exactly once by deduplication
• NO ordering guarantees
• NO natural keyed stateful processing
• “Dynamic” scaling
• Abstract Java API
• Runs on
• DataFlow
• Flink
Choosing a Processing toolkit
Apache Flink
• Low latency
• Exactly once
• Ordering guarantees
• Keyed Stateful processing
• “Fixed” scaling
• Easy Java API
• Runs on
• Hadoop
• Kubernetes
Changes
happen
Applications change!
• New business
• New insights
• New wishes
• New scope
• New …
The records will
• → get new fields
• → have obsolete fields
Streaming applications
[Diagram: several data producers write to a streaming interface; several data consumers read from it.]
The real payload is
“byte array”
Multiple Applications
Rolling upgrades
Canary releases
Kafka persists messages
• A message is retained until its TTL has expired.
• So a topic will contain several message versions!
• With different fields
V1 V2
V3 V4
So we need something to
• Serialize records into bytes
• Data types
• Nested records
• Bidirectional Schema evolution
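What "bidirectional" means in practice can be sketched in plain Python: a reader must cope with records written by older code (fields missing) and by newer code (extra fields). This illustrates the idea only and does not use any serialization library; the field names are hypothetical.

```python
REQUIRED = object()  # marker: this field has no default

# The reader's view of the record: field name -> default value.
READER_SCHEMA_V2 = {
    "session_id": REQUIRED,
    "url": REQUIRED,
    "country": "unknown",  # added in v2, so it needs a default
}


def resolve(record: dict, reader_schema: dict) -> dict:
    """Project a record written with ANY version onto the reader's schema."""
    resolved = {}
    for field, default in reader_schema.items():
        if field in record:
            resolved[field] = record[field]
        elif default is not REQUIRED:
            resolved[field] = default      # older writer: fill in the default
        else:
            raise KeyError(f"missing required field: {field}")
    return resolved                        # newer writer: unknown fields dropped


old_record = {"session_id": "s1", "url": "/p/1"}                      # written by v1
new_record = {"session_id": "s2", "url": "/p/2", "country": "NL",
              "ab_variant": "B"}                                      # written by v3

from_old = resolve(old_record, READER_SCHEMA_V2)
from_new = resolve(new_record, READER_SCHEMA_V2)
print(from_old)  # {'session_id': 's1', 'url': '/p/1', 'country': 'unknown'}
print(from_new)  # {'session_id': 's2', 'url': '/p/2', 'country': 'NL'}
```

This is what lets rolling upgrades and canary releases share one topic: producers and consumers can be upgraded in either order.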
Apache Avro
is what we use !
Apache Avro (IDL Schema)
Code generation
Avro Message format
• Single record into bytes encoding
• Designed for evolving streaming applications
• Need schema database:
• Key = 64bit long
• Value = String
The json version of the schema
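A simplified sketch of the schema database idea: each message carries a 64-bit key, and the reader uses it to look up the JSON schema the writer used. This is a stand-in and not Avro's actual wire format: the fingerprint here is the first 8 bytes of SHA-256 and the payload is JSON, both chosen only to keep the example self-contained.

```python
import hashlib
import json
import struct


def fingerprint64(schema_json: str) -> int:
    """Stand-in 64-bit fingerprint of a schema (NOT Avro's own algorithm)."""
    digest = hashlib.sha256(schema_json.encode("utf-8")).digest()
    return struct.unpack("<q", digest[:8])[0]  # signed 64-bit "long"


class SchemaDb:
    """Key = 64-bit long, value = the JSON text of the schema."""

    def __init__(self):
        self._schemas = {}

    def register(self, schema_json: str) -> int:
        fp = fingerprint64(schema_json)
        self._schemas[fp] = schema_json
        return fp

    def lookup(self, fp: int) -> str:
        return self._schemas[fp]


def encode(fp: int, record: dict) -> bytes:
    """Prefix the serialized record with the schema key."""
    return struct.pack("<q", fp) + json.dumps(record).encode("utf-8")


def decode(message: bytes, db: SchemaDb):
    """Return (writer schema, record) for one self-describing message."""
    (fp,) = struct.unpack("<q", message[:8])
    writer_schema = json.loads(db.lookup(fp))
    record = json.loads(message[8:].decode("utf-8"))
    return writer_schema, record


db = SchemaDb()
schema_v1 = json.dumps({"type": "record", "name": "Measurement",
                        "fields": [{"name": "url", "type": "string"}]})
key = db.register(schema_v1)
message = encode(key, {"url": "/p/1"})
schema, record = decode(message, db)
print(schema["name"], record)  # Measurement {'url': '/p/1'}
```

Because every message names its writer schema, a topic holding several message versions can still be read record by record.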
Producing from Flink into Kafka
Consume from Kafka into Flink
Measuring 2.0
[Architecture diagram. Labels shown: Browser (HTML, Measuring 2.0 JavaScript); WebShop (Build HTML, Measuring 2.0); SOR (Orders/…); Measure endpoint; Kafka; Sessionizer; VisitId, GeoIP, Useragent; Anonymize; Personal; Anonymized; Years; Files/BQ.
Consumers listed: Personalization, Campaigning; Search Suggest, RECO, …; Analytics (Web, Price, …); Security; Fraud prevention.]
Using
the data
Search Suggestions
The search funnel
Search
Find
(Product page)
Choose
(Add to cart)
Buy
Deeper = more relevant
(Search)
Find
Choose
Buy
High level
[Pipeline diagram. Labels: M2; Stateful analysis; Relevant events; Event to scored suggestions; Suggestion Delta; Suggestions.]
Is this a valuable event?
Pushdown Automaton
1. Input: “Harry Potter” PDP
2. Score: “Harry Potter” 25
3. Suggest:
“H” → “Harry Potter”: 25
“Ha” → “Harry Potter”: 25
“Har” → “Harry Potter”: 25
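The three steps can be sketched as follows. The score of 25 for a PDP view is the one on the slide; the maximum prefix length, the second search term and its score are assumptions for illustration.

```python
from collections import defaultdict

PDP_SCORE = 25  # from the slide; scores for other event types are not given


def prefixes(term, max_len=3):
    """The prefixes a visitor might type: 'H', 'Ha', 'Har', ..."""
    return [term[:n] for n in range(1, min(max_len, len(term)) + 1)]


def new_index():
    """prefix -> suggested term -> accumulated score."""
    return defaultdict(lambda: defaultdict(int))


def apply_delta(index, term, score):
    """Add one scored event to every prefix of the term."""
    for prefix in prefixes(term):
        index[prefix][term] += score


index = new_index()
apply_delta(index, "Harry Potter", PDP_SCORE)
print(dict(index["Har"]))  # {'Harry Potter': 25}
```

Since the deltas are simply added up, applying them in a different order gives the same index. Ordering matters when deciding whether an event is valuable, not when accumulating the resulting scores.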
Ordering is important | Ordering is NOT important
Search Suggestions
[State machine diagram. States: Default / no search (initial), Searched, Found, Chosen. Transition labels: Search for “X”, To PDP, Within PDP, Add to Cart, Anything else.]
The PDP is really a set of pages about the product, offers, reviews, …
= Send out event that something useful happened:
Searched for “X”
PDP for “X”
ATC for “X”
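One possible reading of the state machine as code. The state names and output events follow the slide; the exact transitions (for example that "anything else" returns to the default state) and the input event names are assumptions.

```python
class SearchFunnel:
    """Per-session state machine: Default -> Searched -> Found -> Chosen."""

    def __init__(self):
        self.state = "DEFAULT"
        self.term = None

    def on_event(self, kind, term=None):
        """Advance the machine; return an output event, or None."""
        if kind == "search":
            self.state, self.term = "SEARCHED", term
            return f"Searched for {term}"
        if kind == "pdp" and self.state == "SEARCHED":
            self.state = "FOUND"
            return f"PDP for {self.term}"
        if kind == "pdp" and self.state == "FOUND":
            return None  # still within the set of PDP pages
        if kind == "add_to_cart" and self.state == "FOUND":
            self.state = "CHOSEN"
            return f"ATC for {self.term}"
        # Anything else: the visitor left the funnel.
        self.state, self.term = "DEFAULT", None
        return None


funnel = SearchFunnel()
outputs = [
    funnel.on_event("search", "Harry Potter"),
    funnel.on_event("pdp"),
    funnel.on_event("pdp"),
    funnel.on_event("add_to_cart"),
]
print(outputs)
# ['Searched for Harry Potter', 'PDP for Harry Potter', None, 'ATC for Harry Potter']
```

An add-to-cart that arrives before the search produces no "ATC" output, which is why this part of the pipeline depends on ordered events.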
Recommended reading
Join us
70
https://careers.bol.com
Thanks
Any questions?

Measuring 2.0 - How to handle 100K events/sec - Berlin Buzzwords 2019

  • 1. Niels Basjes | Principal IT Architect How to handle 100K events/sec Measuring 2.0
  • 2. • Context • Why Measuring 2.0 (M2) • Measuring • Cause and effect • End-to-end pipeline • State machines Agenda
  • 3. TU-Delft Computer Science (MSc) Nyenrode Business School (MSc) Software Developer (USoft) Research Scientist (NLR) Infra Architect (NLR) WebAnalytics Architect (Moniforce) IT Architect/Inventor (Bol.com) Contributor for Apache Hadoop, Pig, HBase, Flink, Beam, Parquet, … Apache Avro Committer & PMC Niels Basjes nbasjes@bol.com @nielsbasjes https://github.com/nielsbasjes
  • 9. Measuring interaction • What are we showing (and why)? • How are our visitors responding? • Pages • Products/Offers • Add to cart • Purchase • Advertising • Inspiration
  • 10. Use cases • Dashboards • Personalization • Site optimization • Fraud prevention • Data Science • …
  • 11. Why M2? We already have Omniture/GA…
  • 12. ~ 3K-4K pages/sec ~ 30 events/page ~ 100K events/sec Q2 2019: Per day • 10^9 measurements • 2 TiB data More details
  • 13. JavaScript data is … • Broken … fundamentally broken • Measure side effect • Missing & Duplicate orders • Blockable • Intelligent Tracking Protection, Adblockers, Spiders, Hackers, … • “Boxed” SaaS solutions. • Not enough ‘eVars’. Loading extra labels needed for reporting.
  • 15. Measurements are… too old. • Available once every 24 hours. • So personalization is a ‘day behind’. Useless inspiration: I was interested in this YESTERDAY
  • 16. Data relevance decay [Chart: value of the data (y-axis) against age of the data (x-axis: minutes, days, weeks)]
  • 17. Time is not always important Crowd pattern analysis of website usage Building a better website for future visitors: Batch Individual pattern analysis of website usage Supporting and advising the current visitor: Realtime
  • 21. It’s really all about… • Measuring • Better • Processing • Faster • Applications • More relevant
  • 22. Goals of “Measuring 2.0” • Measure everything of our website • All interactions (also AJAX) • All channels (also mobile, email, …) • All countries • All details • All visitors (also Googlebot) • More reliable data • Lowest possible load on the client • Lowest possible latency (< 1 second)
  • 23. Goals of “Measuring 2.0” • Developer • Easy to build • Easy to validate • Test automation • Business • Always measure everything • Data is “independent” • New questions are allowed
  • 24. Goals of “Measuring 2.0” • Privacy by design • General Data Protection Regulation (GDPR) • Algemene verordening gegevensbescherming (AVG) • No long term profiling • Security • Avoid storing “login” info. • Business • Do long term profiling
  • 25. THE goal of “Measuring 2.0” Make the best possible interaction data stream.
  • 28. Where to measure? • Measure where “it” happens. • “In” the responsible “frontend” service! • Webshop • App API • Basket Service • Order Service • … • Usually NOT in the browser
  • 29. Measure pages • Serverside • What is in the page • Clientside (Javascript) • What part was viewed • Screen resolution
  • 30. Measure orders • Listen to Order events! • with website/app sessionid. • The “Order confirmation” page. • Just a viewing of the order.
  • 31. Record everything at the start • Measure what “really” happens. • Keep all relevant details • Product: ProductId, Product type, … • Offer: OfferId, ProductId, Price, Condition, SellerId, … • Later joining on productid/offerid is “impossible”. • Webshop caching • Data volume / Extra latency
  • 33. • All single event entities • No correlations • No logical ‘cause and effect’ This is easy
  • 34. Our use cases • Banner optimization • Look / Search → Next page better banner • A/B testing • Show feature → Use → Buy product • Search Suggestions • Search → Find → Choose → Buy product • Attribution modeling • Show ad → Click → Buy product • …
  • 35. Behavioral analytics • Cause and effect • Action: We show something • Reaction: To click or not to click
  • 36. Event ordering matters • Click banner, Buy product • Banner WAS (possibly) part of reason to buy. • Buy product, Click banner • Banner WAS NOT part of reason to buy. “WAS” or “WAS NOT” is based on the ordering of the events
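The attribution rule on this slide can be sketched in a few lines. This is only an illustration: the event names `click_banner` and `buy_product` and the function name are made up here, and a real attribution model would look at far more than one flag.

```python
from typing import Iterable


def banner_may_have_contributed(events: Iterable[str]) -> bool:
    """Return True if a banner click was seen before the first purchase.

    `events` must already be in the order in which they happened;
    the answer depends on nothing but that order.
    """
    clicked = False
    for event in events:
        if event == "click_banner":
            clicked = True
        elif event == "buy_product":
            # The purchase decides the outcome: was there a click before it?
            return clicked
    return False  # no purchase at all


print(banner_may_have_contributed(["click_banner", "buy_product"]))
print(banner_may_have_contributed(["buy_product", "click_banner"]))
```

The same two events give opposite answers depending on their order, which is why the pipeline has to preserve it.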
  • 37. Finite State machine • Simple, low latency, pattern detection • Ordered events
  • 38. Pushdown automaton • State machine • with a memory stack • Simple, low latency, pattern detection • Ordered events
  • 39. Event ordering matters • A fast temperature change is dangerous • should alert IMMEDIATELY • Delta stays in bounds • Expect “Ordered” while (curr = newEvent()) { if (tooBig(curr, prev)) sendAlert(); prev = curr; } This is a simple pushdown automaton [Chart: temperature and delta for readings T1 to T9, arriving in order]
  • 40. Event ordering matters • A fast temperature change is dangerous • should alert IMMEDIATELY Ordering problems • Many false positives! • Many false negatives! [Chart: temperature and delta for the same readings arriving out of order (T1 T5 T7 T4 T2 T8 T3 T6 T9), with five “!” markers]
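The alert loop from the two temperature slides can be written out as runnable code. This is a sketch under assumptions: the threshold `MAX_DELTA`, the function names and the sample readings are invented for illustration and are not the values behind the charts.

```python
from typing import Iterable, List, Optional, Tuple

MAX_DELTA = 30.0  # hypothetical threshold, not taken from the slides


def too_big(curr: float, prev: Optional[float], max_delta: float = MAX_DELTA) -> bool:
    """True if the jump from the previous reading exceeds the threshold."""
    return prev is not None and abs(curr - prev) > max_delta


def find_alerts(readings: Iterable[float],
                max_delta: float = MAX_DELTA) -> List[Tuple[int, float, float]]:
    """Return (index, previous, current) for every reading that would alert."""
    alerts = []
    prev: Optional[float] = None
    for index, curr in enumerate(readings):
        if too_big(curr, prev, max_delta):
            alerts.append((index, prev, curr))
        prev = curr  # the only state the loop keeps
    return alerts


# Made-up readings T1..T9 that rise by 10 degrees per step.
in_order = [0, 10, 20, 30, 40, 50, 60, 70, 80]
# The same readings in the arrival order T1 T5 T7 T4 T2 T8 T3 T6 T9.
arrival = [0, 4, 6, 3, 1, 7, 2, 5, 8]
out_of_order = [in_order[i] for i in arrival]

print(find_alerts(in_order))
print(find_alerts(out_of_order))
```

With the readings in order nothing alerts; with the same readings shuffled the loop reports jumps that never happened, which is the false-positive problem the slide describes.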
  • 41. Repairing event ordering • Is hard • Needless complexity • Takes time • Buffer for the maximum ‘out-of-orderness’ period. • Several minutes • We want really low latency [Diagram: sliding time based sort buffer]
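A sliding time based sort buffer could look roughly like the sketch below. It assumes integer event timestamps and that no event arrives more than `max_delay` time units behind the newest event seen so far; the function name and the sample data are hypothetical.

```python
import heapq
from typing import Iterable, Iterator, List, Optional, Tuple

Event = Tuple[int, str]  # (event timestamp, payload)


def sort_with_buffer(events: Iterable[Event], max_delay: int) -> Iterator[Event]:
    """Re-order events by holding each one back for up to `max_delay`."""
    heap: List[Event] = []
    newest: Optional[int] = None
    for event in events:
        heapq.heappush(heap, event)
        newest = event[0] if newest is None else max(newest, event[0])
        # Under the stated assumption, anything at least max_delay older
        # than the newest timestamp can no longer be overtaken.
        while heap and heap[0][0] <= newest - max_delay:
            yield heapq.heappop(heap)
    # End of input: flush whatever is still buffered, oldest first.
    while heap:
        yield heapq.heappop(heap)


arrivals = [(1, "a"), (3, "c"), (2, "b"), (5, "e"), (4, "d"), (6, "f")]
print(list(sort_with_buffer(arrivals, max_delay=2)))
```

Every event waits in the buffer until the stream has moved `max_delay` past it, so the repair adds exactly the latency the slide wants to avoid.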
  • 42. Exactly once please • At least once • Need data deduplication • Is hard • Large memory buffer • Idempotent output • Takes time
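To make the cost of deduplication concrete, here is a minimal sketch of a bounded "seen ids" buffer on top of an at-least-once stream. It assumes every event carries a unique id as its first element; the class, the function and the capacity values are invented for illustration.

```python
from collections import OrderedDict
from typing import Hashable, Iterable, List, Tuple

Event = Tuple[str, str]  # (event id, payload)


class Deduplicator:
    """Remembers the most recent `capacity` event ids."""

    def __init__(self, capacity: int) -> None:
        self._seen: "OrderedDict[Hashable, None]" = OrderedDict()
        self._capacity = capacity

    def is_new(self, event_id: Hashable) -> bool:
        if event_id in self._seen:
            return False
        self._seen[event_id] = None
        if len(self._seen) > self._capacity:
            self._seen.popitem(last=False)  # forget the oldest id
        return True


def deduplicate(events: Iterable[Event], capacity: int = 1000) -> List[Event]:
    dedup = Deduplicator(capacity)
    return [event for event in events if dedup.is_new(event[0])]


stream = [("e1", "a"), ("e2", "b"), ("e1", "a"), ("e3", "c"), ("e2", "b")]
print(deduplicate(stream))
print(deduplicate(stream, capacity=1))
```

With a buffer that is too small the redelivered events slip through again, which is why the slide lists a large memory buffer as part of the price.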
  • 44. Achieving end-to-end ordering... 1. The measuring point 2. Measurement transport 3. Measurement processing
  • 45. The measuring point • Single entity → single measuring instance • Multiple instances • Multiple output buffers • Race conditions • Ordering problems
  • 46. The measuring point In IOT: • One temperature sensor • one recording device At bol.com • One visitor • Single webshop instance • Session routing is a MUST have!! • Not perfect! • Impact negligible • “View” measurements • Orders
  • 48. Message transport • We need ordering per session: FIFO • “Queue” or “Partitioned Queue” • Session pinned to a specific partition https://en.wikipedia.org/wiki/Queue_(abstract_data_type) Partitioned Queue
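Pinning a session to a partition usually comes down to hashing the session id. The sketch below uses CRC32 purely as a stand-in for a stable hash; it is not claimed to be what any particular queue product uses, and the function names and sample events are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Tuple

Event = Tuple[str, str]  # (session id, payload)


def partition_for(session_id: str, num_partitions: int) -> int:
    """Map a session id to a partition; the same id always gets the same one."""
    return zlib.crc32(session_id.encode("utf-8")) % num_partitions


def route(events: Iterable[Event], num_partitions: int) -> DefaultDict[int, List[Event]]:
    """Append each event to its session's partition, in arrival order (FIFO)."""
    partitions: DefaultDict[int, List[Event]] = defaultdict(list)
    for session_id, payload in events:
        partitions[partition_for(session_id, num_partitions)].append((session_id, payload))
    return partitions


events = [("s1", "a"), ("s2", "b"), ("s1", "c"), ("s3", "d"), ("s2", "e"), ("s1", "f")]
for number, queue in sorted(route(events, 4).items()):
    print(number, queue)
```

Ordering is only guaranteed within a partition, which is enough here because all events of one session share a partition.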
  • 49. Many “Queues” are not a Queue! https://en.wikipedia.org/wiki/Java_Message_Service https://stackoverflow.com/questions/16300353/activemq-lifo-ordering
  • 50. Many “Queues” are not a Queue! https://cloud.google.com/pubsub/docs/ordering
  • 51. High volume partitioned queues • Apache Kafka • https://kafka.apache.org/ • Production ready • Apache Pulsar • https://pulsar.apache.org/ • Connector for Flink very new. • Pravega • http://pravega.io/ • Not yet production ready • Amazon Kinesis • Sorry, wrong cloud • Microsoft Azure Event Hubs • Sorry, wrong cloud
  • 52. If ordering does not matter and at-least-once is OK: High IO ‘Distributed Set’. Use Google PubSub
  • 53. If ordering matters and/or exactly-once is needed: High IO ‘Partitioned Queue’. Use Apache Kafka
  • 55. Measurement processing Requirements • Low latency • Exactly once • Ordering guarantees • A pushdown automaton per session • Keyed Stateful processing • Where the ‘key’ is the ‘session id’
  • 56. Choosing a Processing toolkit Apache Beam • Low latency … except • Exactly once by deduplication • NO ordering guarantees • NO natural keyed stateful processing • “Dynamic” scaling • Abstract Java API • Runs on • DataFlow • Flink
  • 57. Choosing a Processing toolkit Apache Flink • Low latency • Exactly once • Ordering guarantees • Keyed Stateful processing • “Fixed” scaling • Easy Java API • Runs on • Hadoop • Kubernetes
  • 59. Applications change! • New business • New insights • New wishes • New scope • New … The records will • get new fields • have obsolete fields
  • 60. Streaming applications [Diagram: several data producers write to a streaming interface that several data consumers read from; each group is labeled Multiple Applications, Rolling upgrades, Canary releases] The real payload is “byte array”
  • 61. Kafka persists messages • A message is retained until its TTL has expired. • So a topic will contain several message versions! • With different fields V1 V2 V3 V4
  • 62. So we need something to • Serialize records into bytes • Data types • Nested records • Bidirectional Schema evolution
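What bidirectional schema evolution asks for can be shown without any serialization library. The sketch below is not the Avro API; it only mimics the idea that a reader fills in defaults for fields it expects but the writer did not know, and ignores fields it does not know. The field names and defaults are invented.

```python
from typing import Any, Dict

# Hypothetical reader schemas: field name -> default value.
READER_V1: Dict[str, Any] = {"session_id": "", "url": ""}
READER_V2: Dict[str, Any] = {"session_id": "", "url": "", "country": "unknown"}


def resolve(record: Dict[str, Any], reader_fields: Dict[str, Any]) -> Dict[str, Any]:
    """View `record` through the reader's schema.

    Missing fields take the reader's default (old data, new reader);
    unknown fields are dropped (new data, old reader).
    """
    return {name: record.get(name, default) for name, default in reader_fields.items()}


written_by_v1 = {"session_id": "s1", "url": "/p/123"}
written_by_v2 = {"session_id": "s2", "url": "/p/456", "country": "NL"}

print(resolve(written_by_v1, READER_V2))  # new reader, old message
print(resolve(written_by_v2, READER_V1))  # old reader, new message
```

Both directions matter because a topic holds several message versions at once while producers and consumers are upgraded independently.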
  • 64. Apache Avro (IDL Schema) Code generation
  • 65. Avro Message format • Encodes a single record into bytes • Designed for evolving streaming applications • Need schema database: • Key = 64bit long • Value = String: the JSON version of the schema
  • 66. Producing from Flink into Kafka
  • 67. Consume from Kafka into Flink
  • 68. Consume from Kafka into Flink
  • 69. Measuring 2.0 [Architecture diagram. Labels: Browser (Measuring 2.0 JavaScript, HTML); WebShop (Build HTML, Measuring 2.0); SOR (Orders/…); Measure endpoint; Kafka; Sessionizer; VisitId, GeoIP, Useragent; Anonymize; Personal; Anonymized; Years; Files/BQ; Personalization, Campaigning; Search Suggest, RECO, …; Analytics (Web, Price, …); Security; Fraud prevention]
  • 72. The search funnel Search → Find (Product page) → Choose (Add to cart) → Buy
  • 73. Deeper = more relevant (Search) → Find → Choose → Buy
  • 74. High level [Diagram labels: M2, Stateful analysis, Relevant events, Event to scored suggestions, Suggestion Delta, Suggestions; Pushdown Automaton: “Is this a valuable event?”; “Ordering is important”; “Ordering is NOT important”] 1. Input: “Harry Potter” PDP 2. Score: “Harry Potter” 25 3. Suggest: “H” → “Harry Potter”: 25, “Ha” → “Harry Potter”: 25, “Har” → “Harry Potter”: 25
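Step 3 on this slide, turning one scored event into per-prefix suggestions, can be sketched as follows. The function name and the `max_prefix_len` cut-off are assumptions; the slide shows three prefixes and does not say where the real system stops.

```python
from typing import Dict, Tuple


def scored_suggestions(query: str, score: int,
                       max_prefix_len: int = 3) -> Dict[str, Tuple[str, int]]:
    """Map every prefix of `query` (up to max_prefix_len) to (query, score)."""
    longest = min(max_prefix_len, len(query))
    return {query[:n]: (query, score) for n in range(1, longest + 1)}


for prefix, suggestion in scored_suggestions("Harry Potter", 25).items():
    print(repr(prefix), "->", suggestion)
```

Each prefix entry is independent of the others, so this step does not care about the order in which scored events arrive.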
  • 75. Search Suggestions [State machine diagram. States: Initial, Searched, Found, Chosen. Transitions: Search for “X”, To PDP, Within PDP, Add to Cart, Anything else; Default no search. Emitted events: Searched for “X”, PDP for “X”, ATC for “X”. Legend: = Send out event that something useful happened] The PDP is really a set of pages about the product, offers, reviews, …
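The search-suggestion state machine, run once per session, might look like the sketch below. The transition table is one reading of the flattened diagram, not a confirmed copy of it: the event kinds (`search`, `to_pdp`, `within_pdp`, `add_to_cart`), the choice to let a new search restart the funnel from any state, and all class and function names are assumptions.

```python
from typing import Dict, Iterable, List, Optional, Tuple


class SearchFunnel:
    """States: Initial -> Searched -> Found -> Chosen, remembering the term."""

    def __init__(self) -> None:
        self.state = "Initial"
        self.term: Optional[str] = None  # the memory that travels with the state

    def on_event(self, kind: str, value: str = "") -> Optional[str]:
        """Advance the machine; return a message if something useful happened."""
        if kind == "search":
            self.state, self.term = "Searched", value
            return f'Searched for "{value}"'
        if kind == "to_pdp" and self.state == "Searched":
            self.state = "Found"
            return f'PDP for "{self.term}"'
        if kind == "within_pdp" and self.state in ("Found", "Chosen"):
            return None  # still on the product's pages, nothing new
        if kind == "add_to_cart" and self.state == "Found":
            self.state = "Chosen"
            return f'ATC for "{self.term}"'
        # Anything else: back to the start.
        self.state, self.term = "Initial", None
        return None


def process(events: Iterable[Tuple[str, str, str]]) -> List[Tuple[str, str]]:
    """Keyed stateful processing: one machine per session id."""
    machines: Dict[str, SearchFunnel] = {}
    emitted: List[Tuple[str, str]] = []
    for session_id, kind, value in events:
        message = machines.setdefault(session_id, SearchFunnel()).on_event(kind, value)
        if message:
            emitted.append((session_id, message))
    return emitted


ordered = [("s1", "search", "Harry Potter"), ("s1", "to_pdp", ""),
           ("s1", "within_pdp", ""), ("s1", "add_to_cart", "")]
swapped = [("s1", "to_pdp", ""), ("s1", "search", "Harry Potter"),
           ("s1", "add_to_cart", "")]
print(process(ordered))
print(process(swapped))
```

With the events in order the machine reports all three funnel steps; with the page view arriving before the search, the same clicks yield only the search, which is the ordering sensitivity the talk keeps returning to.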

Editor's Notes

  1. Just storing the globalid/offerid and later joining is impossible due to the size of the datasets, the required speed and caching.
  2. https://blog.scottlogic.com/2018/04/17/comparing-big-data-messaging.html