Scaling event tracking can be a challenge, but a smart combination of AWS services makes it achievable with little effort and cost. This talk shows how Blinkist moved Web and Mobile Analytics in-house with a globally distributed, fully managed solution built on AWS.
2. About me
• Sebastian Schleicher, Director of Engineering @Blinkist
https://about.me/sebastian.schleicher
• With Blinkist since the very beginning
3. What’s in it for you?
• How we went from expensive and disconnected event tracking to an affordable and fully connected Business Intelligence solution
• Many buzzwords and product names…
8. – An Anonymous Product Manager
“Mixpanel has it all to run A/B tests, e-mail and push campaigns.
I saw on their website that they also have artificial intelligence!
Engineers only have to add this tiny SDK. It’s sooo easy…”
23. Tracking requirements
• All data integrated inside a central BI database
• Backend data
• Web & Mobile tracking
• External vendor data (payment etc.)
• … whatever comes in the future
29. Self-hosted Web tracking
[Diagram]
• Components: Website (piwik.js), Backend, API, NGINX at tracking.example.com, Postgres, Redshift
• Running on Digital Ocean
• Request: GET https://tracking.example.com/t?action_name=Something…
• Steps: custom access log processing, copy with custom scripts, copy with Data Pipeline
• Logged parameters: action_name=Something, idsite=1, rec=1, r=322284, url=https://mywebsite.com/a_page, … Quite some columns 😎
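The custom access log processing has to turn each logged tracking request into columns. A minimal sketch of that step in Python, assuming the parameters arrive as an ordinary URL query string; the function name `parse_tracking_url` is hypothetical, and the full example URL is assembled from the parameters listed on the slide, not taken from a real log:

```python
from urllib.parse import parse_qs, urlsplit


def parse_tracking_url(url: str) -> dict:
    """Return the query parameters of one tracking request as a flat dict."""
    query = urlsplit(url).query
    # parse_qs URL-decodes values and returns a list per key; tracking
    # parameters are assumed to be single-valued, so keep the first value.
    return {key: values[0] for key, values in parse_qs(query).items()}


# Example URL built from the slide's parameters (an assumption).
event = parse_tracking_url(
    "https://tracking.example.com/t?action_name=Something&idsite=1&rec=1"
    "&r=322284&url=https%3A%2F%2Fmywebsite.com%2Fa_page"
)
print(event)
```

Each key of the resulting dict would map to one column of the tracking table.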
31. Slow network roundtrips
[Diagram: estimated network roundtrip times to the Digital Ocean machine in Frankfurt: 200ms, 400ms, 300ms, 100ms, 400ms, 400ms, 300ms. Estimations based on https://wondernetwork.com/pings]
32. Self-hosted Web tracking
• Everything is under our control
• Data available inside Redshift
• Running outside of AWS on Digital Ocean
• Custom-configured NGINX and Postgres
• Custom log processing and data copy break sometimes
• Poor performance/latency leads to data inconsistencies
34. AWS powered Web tracking
[Diagram]
• Components: Website (piwik.js), Backend, API, CloudFront at tracking.example.com, S3 Bucket for /t (image/gif), S3 Bucket to store access logs, Lambda, Kinesis Firehose, Redshift
• Request: GET https://tracking.example.com/t?action_name=Something…
• Steps: store access logs, notify putObject, parse and putRecords, load data, copy with Data Pipeline
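The "Notify putObject" and "Parse and putRecords" steps can be sketched as a Lambda handler in Python. This is an illustration under assumptions, not the code from the talk: it assumes CloudFront standard logs (gzipped, tab-separated, with a `#Fields:` header line), and the stream name `web-tracking` and the function names are hypothetical. Retrying records that Firehose rejects is left out.

```python
import gzip
import json
from typing import Dict, List
from urllib.parse import parse_qs, unquote_plus

STREAM_NAME = "web-tracking"  # hypothetical Firehose delivery stream name


def parse_cloudfront_log(raw: bytes) -> List[Dict[str, str]]:
    """Turn one gzipped CloudFront standard log file into tracking events."""
    fields: List[str] = []
    events = []
    for line in gzip.decompress(raw).decode("utf-8").splitlines():
        if line.startswith("#Fields:"):
            # The header names the tab-separated columns that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#"):
            continue
        row = dict(zip(fields, line.split("\t")))
        if row.get("cs-uri-stem") != "/t":
            continue  # only the tracking pixel is of interest
        query = row.get("cs-uri-query", "-")  # "-" marks an empty field
        params = {} if query == "-" else {
            key: values[0] for key, values in parse_qs(query).items()
        }
        events.append({"date": row.get("date"), "time": row.get("time"), **params})
    return events


def handler(event, context):
    """Entry point for the S3 notification that fires on each new log file."""
    import boto3  # imported lazily so the parser works without AWS dependencies

    s3 = boto3.client("s3")
    firehose = boto3.client("firehose")
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        rows = [
            {"Data": (json.dumps(item) + "\n").encode("utf-8")}
            for item in parse_cloudfront_log(body)
        ]
        # put_record_batch accepts at most 500 records per call.
        for start in range(0, len(rows), 500):
            firehose.put_record_batch(
                DeliveryStreamName=STREAM_NAME, Records=rows[start:start + 500]
            )
```

Reading the column names from the `#Fields:` header rather than hard-coding positions keeps the parser working if the log format gains columns.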
36. AWS powered Web tracking
• Very cost effective 🤫🤓
• Everything is under our control
• Data available inside Redshift
• Native citizen in our stack
• All managed AWS services
• Low latency leads to better data consistency
39. Mixpanel mobile tracking
• Fully managed service
• Great self-service User Interface
• Amazing SDKs
• Becomes very expensive
• Data extraction is tricky
• Many unused features
43. Pinpoint mobile tracking
• Everything is under our control
• Data available inside Redshift
• Native citizen in our stack
• All managed AWS services, CloudFront backed
• Very affordable (100 million events per month for free!)
• Rather young product (but sufficient for our needs)
• Very basic User Interface
49. Technologies used
• Redshift for central BI database
• Some ETL for production databases
• Piwik.js and CloudFront, Lambda and Firehose for Web tracking
• Pinpoint and Kinesis for Mobile tracking
50. Tracking requirements
• All data integrated inside a central BI database ✅
• Backend data ✅
• Web & Mobile tracking ✅
• External vendor data ✅ (didn’t mention it, but it’s there)
• Ready for the future
51. A few thoughts
• Raw data requires expertise on how to deal with it 😐
• Tools for Redshift aren’t cheap (Matillion, Periscope, etc.) 😕
• Data autonomy is amazing 🤩
• This solution is fully managed and scales for us 🤗
53. More tracking requirements
• Data quality has to improve
• Tracking data should be centrally available in real time:
  • Create real-time monitoring
  • Send events to backend systems or external services
  • Immediate feedback
• Data transformation and enrichment should happen in real time
59. CloudFront Example
[Diagram]
• Pipeline: S3 Input (CF Logs), Unzip Transformation, CloudFront Log Transformation, Quality Control Transformation, Kinesis Output, SQS Output
• Data is split into sane data and faulty data
• Check faulty data and react quickly
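The pipeline in the diagram can be sketched in Python as a chain of small steps with two outputs. This is an illustration under assumptions: sane data is taken to go to the Kinesis output and faulty data to the SQS output, both outputs are plain callables, the transformation is a simplified stand-in that reads one query string per line, and `REQUIRED_FIELDS` is a made-up quality rule.

```python
import gzip
from typing import Callable, Dict, Iterable, List, Tuple
from urllib.parse import parse_qs

Event = Dict[str, str]
Output = Callable[[List[Event]], None]

REQUIRED_FIELDS = ("action_name", "idsite", "url")  # hypothetical quality rule


def unzip(raw: bytes) -> List[str]:
    """Unzip step: gzipped bytes in, text lines out."""
    return gzip.decompress(raw).decode("utf-8").splitlines()


def transform(lines: Iterable[str]) -> List[Event]:
    """Simplified log transformation: one query string per non-empty line."""
    return [
        {key: values[0] for key, values in parse_qs(line).items()}
        for line in lines
        if line
    ]


def is_sane(event: Event) -> bool:
    """Quality control: every required field is present and non-empty."""
    return all(event.get(field) for field in REQUIRED_FIELDS)


def run_pipeline(raw: bytes, sane_output: Output, faulty_output: Output) -> Tuple[int, int]:
    """Route sane events to one output and faulty events to the other."""
    sane: List[Event] = []
    faulty: List[Event] = []
    for event in transform(unzip(raw)):
        (sane if is_sane(event) else faulty).append(event)
    if sane:
        sane_output(sane)      # e.g. a Kinesis writer
    if faulty:
        faulty_output(faulty)  # e.g. an SQS writer feeding an alert
    return len(sane), len(faulty)
```

Keeping the outputs as callables means the same pipeline can be exercised locally with lists before wiring it to real streams and queues.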
63. Key takeaways
• Tracking can be scalable & affordable 💸
• Data autonomy is 🚀
• Start collecting raw data early while it’s manageable
• Get your Business Intelligence homework done. Define and collect metrics that you can trust!