Stream-Based Mobile and Web Event Tracking Backed by AWS Kinesis

In these slides I introduce "Alchemist", our open source ETL framework for stream-based mobile and web event tracking. You'll also learn how to in-house all of your event tracking easily and at little cost, thanks to a few AWS tools.

Published in: Engineering

  1. Stream-Based Mobile and Web Event Tracking: An Introduction to Alchemist
  2. About me
     • Sebastian Schleicher, Director of Engineering @Blinkist
     • https://about.me/sebastian.schleicher
  3. Blinkist – Big Ideas in Small Packages
     • 15-minute book summaries: a book’s key insights distilled into bite-sized, 15-minute reads
     • A library of 2,500+ bite-sized insights
     • Text and audio: for reading and listening
  4. What’s in it for you?
     • Innovative solutions for in-house web and mobile event tracking
     • A stream-based approach to event tracking with AWS
     • An Open Source framework that helps you wire all this together
  5. What is Event Tracking?
     • User behavior in your apps
     • Click behavior / page visits on your websites
     • E-Mail opens / interactions
     • Business events from your backend
     • … many more
  6. Typical setup (diagram: Marketing, Product and CRM each work with their own tool: Google Analytics, an analytics provider and a newsletter system)
  7. These are information silos and dead ends.
  8. Information Silos & Dead Ends (same diagram: Google Analytics, the analytics provider and the newsletter system serve Marketing, Product and CRM separately)
  9. How many daily active users do we have?
  10. Information Silos & Dead Ends (diagram: each silo gives a different answer: 2,000, 7,000 and 1,000 daily active users)
  11. Can’t all events reside in a central place …
  12. … so that reports are based on the same data?
  13. A Central Event Stream (diagram: Marketing, Product and CRM all feed one Event Stream, which feeds the Business Intelligence System)
  14. Could we also route events to other systems?
  15. Event Routing (diagram: besides the Business Intelligence System, the Event Stream also feeds two external systems)
  16. A Reactive System (Fantasy) (diagram: Marketing, Product, CRM, the Business Intelligence System and two external systems, all connected through the Event Stream)
  17. How can we realise this with AWS?
  18. A Central Event Stream (diagram: the Event Stream connecting Marketing, Product, CRM, the Business Intelligence System and two external systems)
  19. A Central Event Stream (diagram: as before, with Redshift in place of the Business Intelligence System)
  20. Streaming System on AWS (diagram: Collector → Kinesis Stream → Kinesis Firehose → Redshift, and Kinesis Stream → Lambda → External System)
  21. What is this Kinesis?
  22. AWS Kinesis
     • Producers write records to the stream with putRecords
     • Records are ordered by time of arrival (within each shard) and retained for up to 7 days
     • Consumers read the records and do something with them
  23. AWS Kinesis Record
     • Data: Base64-encoded binary, e.g. a JSON string
     • Reference: https://docs.aws.amazon.com/kinesis/latest/APIReference/API_Record.html
  24. Streaming System on AWS (same diagram as slide 20)
  25. AWS Kinesis Firehose (diagram: producers call putRecords; Firehose, a managed AWS consumer, copies the records to Redshift)
     • Managed Kinesis application
     • Copies records to Redshift (or other AWS data stores)
     • putRecords: ensure that only JSON is inserted
     • Object keys get mapped to columns in Redshift
  26. Streaming System on AWS (same diagram as slide 20)
  27. What is this Lambda?
  28. AWS Lambda
     • “Serverless” application
     • Simple function executions
     • JavaScript / Java / Python / Go
     • Triggered by S3 events, Kinesis events and many more event sources
  29. Streaming System on AWS (diagram: the Collector is now Alchemist, running on Lambda; Kinesis Stream → Kinesis Firehose → Redshift, and Kinesis Stream → Lambda → External System)
  30. Welcome Alchemist
     • Lightweight E(xtract) T(ransform) L(oad) framework written in JS
     • Well suited for use in an AWS Lambda environment
     • Many built-in adapters to extract data from AWS resources (like S3)
     • Easy to extend
  31. Alchemist Pipeline (diagram: an Input Adapter produces Pipeline Data, which passes through a series of Transformations to Output Adapters; along the way the data is split into sane data and faulty data)
  32. Alchemist use cases (diagram: Mobile Tracking, Web Tracking and Email Tracking, each handled by its own Alchemist instance that writes to Kinesis)
  33. Web Tracking
     > trackEvent("My Event", "some-action", 12);
     > GET https://example.com/t?event=My%20Event&action=some-action&value=12
     (diagram: the request hits CloudFront, whose access log is written to an S3 bucket; the S3 event triggers Alchemist, which calls putRecords on Kinesis) *Now: https://matomo.org/
  34. Alchemist Web Pipeline (diagram)
     • S3 Input: load the file from the S3 bucket
     • Transformations: Unzip → CloudFront Log → Quality Control
     • Kinesis Output: sane data
     • SQS Output: faulty data (check faulty data and react quickly)
  35. Alchemist Use Cases (diagram: log files → Web Tracking Alchemist → Kinesis, alongside the Mobile Tracking and Email Tracking Alchemists)
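  36. Mobile Tracking
     > trackEvent("My Mobile Event", { "some": "parameter" });
     (diagram: the Pinpoint SDK sends the event to Amazon Pinpoint, which puts it into a Kinesis stream; the resulting Kinesis event triggers Alchemist, which calls putRecords on the central Kinesis stream)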
  37. Alchemist Mobile Pipeline (diagram)
     • Kinesis Input: Decode64, Parse JSON
     • Transformations: Map Fields → Quality Control
     • Kinesis Output: sane data
     • SQS Output: faulty data (check faulty data and react quickly)
  38. Alchemist use cases (diagram: Kinesis → Mobile Tracking Alchemist → Kinesis, alongside the Web Tracking and Email Tracking Alchemists)
  39. E-Mail Tracking (diagram: the newsletter provider sends webhooks when a user opens an email or clicks a button in an email; API Gateway turns each call into an HTTP event for Alchemist, which calls putRecords on Kinesis)
  40. Alchemist E-Mail Pipeline (diagram)
     • HTTP Input (via API Gateway)
     • Transformations: Parse Body → Quality Control
     • Kinesis Output: sane data
     • SQS Output: faulty data (check faulty data and react quickly)
  41. Alchemist use cases (diagram: log files, Kinesis and API Gateway feed the Web, Mobile and Email Tracking Alchemists, which all write to Kinesis)
  42. Streaming System on AWS (diagram: three Alchemist instances feed the Kinesis Stream, which feeds Kinesis Firehose → Redshift and Lambda → External System)
  43. BI System Routing (diagram: a Kinesis event triggers Alchemist, which routes web, mobile and e-mail events into the Web Events, Mobile Events and E-Mail Events tables in Redshift)
  44. Alchemist BI Pipeline (diagram)
     • Kinesis Input: Decode64, Parse JSON
     • Transformations: Whitelist Events → Map Fields → Set Route
     • Firehose Routed Output: sane data
     • SQS Output: faulty data (check faulty data and react quickly)
  45. Streaming System on AWS (diagram: as before, now with four Alchemist instances)
  46. External System Routing (diagram: a Kinesis event triggers Alchemist, which sends POST https://external.system/events to the external system)
  47. Alchemist External Pipeline (diagram)
     • Kinesis Input: Decode64, Parse JSON
     • Transformations: Whitelist Events → Prepare Body
     • HTTP Output: sane data
     • SQS Output: faulty data (check faulty data and react quickly)
  48. Streaming System on AWS (diagram: the Kinesis Stream, Redshift and an external system, connected by five Alchemist instances)
  49. We use Alchemist to route our data streams anywhere we need them.
  50. Alchemist is Open Source
     • https://github.com/blinkist/alchemist
     • Currently only available in JavaScript
     • Licensed under MIT
  51. Let’s talk about the costs…
  52. Pricing
     • We currently pay $150 for 200 million events per month* 🤫
     • Everything is fully managed and scaled by AWS 🤗
     • Simple to maintain and operate as a developer 🤓
     • No expensive pre-built solution required ✌
     *Redshift not included (around $0.25/hour per node)
  53. Key takeaways
  54. Key takeaways
     • In-house event tracking can be scalable & affordable 💸
     • A central data stream containing all relevant events is awesome 🚀
     • Having a simple streaming architecture rocks 🤘
     • Embrace the vendor lock-in – it helps you more than it hurts 🤠
     Don’t be afraid to in-house your event tracking!
  55. Thank you for your attention. https://about.me/sebastian.schleicher
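
Slide 22 sums up Kinesis as producers appending records with putRecords, records kept in arrival order for a retention window, and consumers reading them. The following is a minimal in-memory model of those semantics, written purely for illustration: the class and its methods are hypothetical and not the AWS SDK, and the shard assignment (MD5 digest modulo the shard count) is a simplification of how Kinesis maps partition keys onto shard hash-key ranges. Only the `Data`/`PartitionKey` field names mirror the real PutRecords request entries.

```javascript
const crypto = require("node:crypto");

// Hypothetical in-memory model of a Kinesis-like stream (NOT the AWS SDK).
// Simplification: real Kinesis maps the MD5 hash of a partition key onto
// shard hash-key ranges; here we take the digest modulo the shard count.
class InMemoryStream {
  constructor(shardCount, retentionMs) {
    this.shards = Array.from({ length: shardCount }, () => []);
    this.retentionMs = retentionMs;
    this.nextSequence = 0;
  }

  // Pick a shard for a partition key (simplified hashing, see note above).
  shardFor(partitionKey) {
    const digest = crypto.createHash("md5").update(partitionKey).digest();
    return digest.readUInt32BE(0) % this.shards.length;
  }

  // Append records; within a shard they keep their order of arrival.
  putRecords(records, now = Date.now()) {
    return records.map(({ Data, PartitionKey }) => {
      const shardId = this.shardFor(PartitionKey);
      const sequenceNumber = this.nextSequence++;
      this.shards[shardId].push({ data: Data, partitionKey: PartitionKey, sequenceNumber, arrivedAt: now });
      return { shardId, sequenceNumber };
    });
  }

  // Read a shard in arrival order, skipping records older than the retention period.
  getRecords(shardId, now = Date.now()) {
    return this.shards[shardId].filter((record) => now - record.arrivedAt < this.retentionMs);
  }
}

const DAY_MS = 24 * 60 * 60 * 1000;
```

Two records with the same partition key land on the same shard and are read back in the order they arrived; once the retention window (7 days in the slide) has passed, they no longer show up in reads.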
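
Slide 23 notes that a record's payload is Base64-encoded binary, typically a JSON string. A small sketch of both directions, assuming the payload is UTF-8 JSON (the function names are hypothetical):

```javascript
// Serialize an event as a JSON string and Base64-encode it; this is how a
// record's binary Data appears in JSON payloads such as Lambda's Kinesis events.
function encodeRecordData(event) {
  return Buffer.from(JSON.stringify(event), "utf8").toString("base64");
}

// Reverse step: Base64-decode the blob and parse the JSON string.
function decodeRecordData(base64Data) {
  return JSON.parse(Buffer.from(base64Data, "base64").toString("utf8"));
}
```

For example, `encodeRecordData({ a: 1 })` yields `"eyJhIjoxfQ=="`, and `decodeRecordData` turns it back into `{ a: 1 }`.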
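
Slide 25 says Firehose copies records into Redshift and maps object keys onto table columns, which only works if producers write flat JSON objects. The sketch below assumes a hypothetical events table with the columns listed in `EVENT_COLUMNS` and a COPY configured to match JSON keys to column names (as Redshift's `JSON 'auto'` option does); it shows one way a producer could normalize events before calling putRecords.

```javascript
// Hypothetical column list of a Redshift events table (illustrative only).
const EVENT_COLUMNS = ["event", "action", "value", "received_at"];

// Turn an event into a flat JSON object whose top-level keys match the
// table's columns. Keys without a column are dropped; missing columns become null.
function toRedshiftJson(event, columns = EVENT_COLUMNS) {
  if (event === null || typeof event !== "object" || Array.isArray(event)) {
    throw new TypeError("only JSON objects can be mapped to table columns");
  }
  const row = {};
  for (const column of columns) {
    const value = event[column];
    if (value === undefined || value === null) {
      row[column] = null;
    } else if (typeof value === "object") {
      // Nested objects do not fit a single column, so store them as JSON text.
      row[column] = JSON.stringify(value);
    } else {
      row[column] = value;
    }
  }
  return JSON.stringify(row);
}
```

Dropping unknown keys keeps the load from depending on whatever extra fields a client happens to send; a stricter variant could route such events to the faulty-data queue instead.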
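
Slide 31's pipeline (an input adapter, a chain of transformations, output adapters, and a split between sane and faulty data) can be sketched in a few lines. This is not Alchemist's actual API; `runPipeline` and its option names are hypothetical, and a transformation signals faulty data simply by throwing:

```javascript
// Minimal ETL pipeline: every item read from the input either passes all
// transformations and goes to the sane output, or ends up in the faulty output.
function runPipeline({ input, transformations, saneOutput, faultyOutput }) {
  for (const item of input()) {
    let data = item;
    try {
      for (const transform of transformations) {
        data = transform(data); // a transformation throws to flag faulty data
      }
      saneOutput(data);
    } catch (error) {
      // Keep the original item so faulty data can be inspected later.
      faultyOutput({ data: item, error: error.message });
    }
  }
}
```

For example, passing `JSON.parse` followed by a validation function as `transformations` sends unparsable or incomplete items to `faultyOutput`, which in the deck's pipelines is an SQS queue that is checked so the team can react quickly.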
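
On the client side of slide 33, `trackEvent` only has to turn its arguments into the query string of a GET request; CloudFront's access log does the rest. A sketch under these assumptions: the URL and parameter names are taken from the slide's example, and firing the request as an image pixel is one common option, not necessarily what the original tracker did.

```javascript
// Tracking endpoint from the slide's example (served via CloudFront).
const TRACKING_URL = "https://example.com/t";

// encodeURIComponent encodes spaces as %20, matching the URL on the slide.
function buildTrackingUrl(category, action, value) {
  const params = { event: category, action, value: String(value) };
  const query = Object.entries(params)
    .map(([key, val]) => `${encodeURIComponent(key)}=${encodeURIComponent(val)}`)
    .join("&");
  return `${TRACKING_URL}?${query}`;
}

// In a browser, fire the request as an image pixel; the response body is
// irrelevant because the data is recovered from the CloudFront access log.
function trackEvent(category, action, value) {
  const url = buildTrackingUrl(category, action, value);
  if (typeof Image !== "undefined") new Image().src = url;
  return url;
}
```

`trackEvent("My Event", "some-action", 12)` produces exactly the GET URL shown on the slide.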
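
The Unzip, CloudFront Log and Quality Control transformations of slide 34 could look roughly like this. The sketch assumes standard (gzipped, tab-separated) CloudFront access logs, locates the `cs-uri-query` column via the `#Fields:` header instead of a hard-coded index, and uses a made-up quality rule (event name, action and a numeric value are required):

```javascript
const zlib = require("node:zlib");

// Unzip the log file, read the tracking parameters from the cs-uri-query
// column, and split the resulting events into sane and faulty data.
function parseCloudFrontLog(gzippedBuffer) {
  const lines = zlib.gunzipSync(gzippedBuffer).toString("utf8").split("\n");
  let fields = [];
  const sane = [];
  const faulty = [];
  for (const line of lines) {
    if (line.startsWith("#Fields:")) {
      fields = line.slice("#Fields:".length).trim().split(/\s+/);
      continue;
    }
    if (line === "" || line.startsWith("#")) continue; // skip #Version and blank lines
    const columns = line.split("\t");
    const query = columns[fields.indexOf("cs-uri-query")] ?? "";
    const params = Object.fromEntries(new URLSearchParams(query));
    // Hypothetical quality rule: name, action and a numeric value are required.
    if (params.event && params.action && !Number.isNaN(Number(params.value))) {
      sane.push({ event: params.event, action: params.action, value: Number(params.value) });
    } else {
      faulty.push(line);
    }
  }
  return { sane, faulty };
}
```

Reading the column position from the header keeps the parser working even if the log contains more or fewer fields than expected.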
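
For slide 37's mobile pipeline, the Kinesis Input's Decode64 and Parse JSON steps followed by Map Fields and Quality Control might be sketched as below. The Lambda event shape (`Records[].kinesis.data`, `sequenceNumber`) is the standard Kinesis trigger format; the incoming field names (`event_type`, `event_timestamp`, `client.client_id`, `attributes`) are assumed to resemble Amazon Pinpoint's event stream export and should be checked against real data, and the output field names are hypothetical.

```javascript
// Map Fields: assumed Pinpoint-like input fields -> hypothetical output schema.
function mapMobileEvent(raw) {
  return {
    event: raw.event_type,
    timestamp: raw.event_timestamp,
    userId: raw.client?.client_id,
    properties: raw.attributes ?? {},
  };
}

// Process a Lambda Kinesis event and split records into sane and faulty data.
function processKinesisEvent(lambdaEvent) {
  const sane = [];
  const faulty = [];
  for (const record of lambdaEvent.Records) {
    try {
      const json = Buffer.from(record.kinesis.data, "base64").toString("utf8"); // Decode64
      const mapped = mapMobileEvent(JSON.parse(json)); // Parse JSON + Map Fields
      if (!mapped.event || !mapped.userId) {
        throw new Error("missing event or user id"); // Quality Control
      }
      sane.push(mapped);
    } catch (error) {
      faulty.push({ sequenceNumber: record.kinesis.sequenceNumber, reason: error.message });
    }
  }
  return { sane, faulty };
}
```

Recording the sequence number of each faulty record makes it easy to find the original payload again when inspecting the faulty-data queue.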
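
Slide 40's HTTP input sits behind API Gateway. Assuming a Lambda proxy integration (where the request body arrives as a string in `body`, with `isBase64Encoded` set when API Gateway encoded it) and a hypothetical JSON webhook payload of the form `{ type, email, timestamp }`, the Parse Body and Quality Control steps could look like this:

```javascript
// Parse Body + Quality Control for a hypothetical newsletter webhook payload.
function parseWebhook(apiGatewayEvent) {
  const rawBody = apiGatewayEvent.isBase64Encoded
    ? Buffer.from(apiGatewayEvent.body, "base64").toString("utf8")
    : apiGatewayEvent.body;
  const payload = JSON.parse(rawBody ?? ""); // throws on a missing or invalid body
  if (!["open", "click"].includes(payload.type) || !payload.email) {
    throw new Error("unsupported or incomplete webhook payload");
  }
  return { event: `email_${payload.type}`, email: payload.email, timestamp: payload.timestamp };
}

// Lambda proxy handler: answer 200 for sane events and 400 for faulty ones.
async function handler(apiGatewayEvent) {
  try {
    const event = parseWebhook(apiGatewayEvent);
    // The sane event would be forwarded to Kinesis here (omitted in this sketch).
    return { statusCode: 200, body: JSON.stringify({ accepted: event.event }) };
  } catch (error) {
    // Faulty payloads would be sent to the SQS output for inspection (omitted).
    return { statusCode: 400, body: JSON.stringify({ error: error.message }) };
  }
}
```

Answering with a 4xx status for faulty payloads lets the newsletter provider see the rejection, while the sane events continue into the central stream.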
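
The Whitelist Events, Map Fields and Set Route steps of slide 44 boil down to filtering events and grouping them by destination. In this sketch the whitelist, the `source` field and the delivery stream names are hypothetical; it assumes one Firehose delivery stream per Redshift table (a Firehose Redshift destination loads a single table), matching the three tables of slide 43.

```javascript
// Hypothetical whitelist of event names worth loading into the BI system.
const WHITELIST = new Set(["page_view", "book_opened", "email_open"]);

// Hypothetical Firehose delivery streams, one per Redshift table.
const ROUTES = {
  web: "web-events-to-redshift",
  mobile: "mobile-events-to-redshift",
  email: "email-events-to-redshift",
};

function routeEvents(events) {
  const batches = {}; // delivery stream name -> JSON rows
  const faulty = [];
  for (const event of events) {
    if (!WHITELIST.has(event.name)) continue; // Whitelist Events: drop everything else
    const route = ROUTES[event.source]; // Set Route
    if (!route) {
      faulty.push(event); // unknown source: hand over to the SQS output
      continue;
    }
    // Map Fields: keys match the (hypothetical) table columns.
    const row = { event: event.name, user_id: event.userId ?? null, occurred_at: event.timestamp ?? null };
    if (!batches[route]) batches[route] = [];
    batches[route].push(JSON.stringify(row));
  }
  return { batches, faulty };
}
```

Non-whitelisted events are dropped silently, whereas whitelisted events that cannot be routed count as faulty, since they point to a misconfiguration rather than to uninteresting data.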
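
For slide 47, Prepare Body and the HTTP Output amount to building and sending a POST request. The endpoint is the one shown on slide 46; the body format and `Content-Type` header are assumptions, and the sketch relies on the global `fetch` available in Node.js 18 and later (injectable for testing):

```javascript
// Endpoint from slide 46; body format and headers are assumptions.
const EXTERNAL_ENDPOINT = "https://external.system/events";

// Prepare Body: build the request for a batch of whitelisted events.
function prepareRequest(events) {
  return {
    url: EXTERNAL_ENDPOINT,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ events: events.map((e) => ({ name: e.name, user: e.userId })) }),
    },
  };
}

// HTTP Output: a non-2xx answer is treated as a failure, so the caller can
// route the batch to the SQS output instead.
async function sendToExternalSystem(events, fetchImpl = globalThis.fetch) {
  const { url, init } = prepareRequest(events);
  const response = await fetchImpl(url, init);
  if (!response.ok) throw new Error(`external system answered ${response.status}`);
  return response.status;
}
```

Keeping request preparation separate from sending makes the body easy to test without network access.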
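
The figures on slide 52 translate into a per-unit cost. This back-of-the-envelope calculation uses only the slide's numbers and assumes a 30-day month with one Redshift node running continuously:

```javascript
// Streaming pipeline cost from the pricing slide.
const pipelineCostPerMonthUsd = 150;
const eventsPerMonth = 200_000_000;
const costPerMillionEventsUsd = pipelineCostPerMonthUsd / (eventsPerMonth / 1_000_000);

// Redshift is billed separately; assume one node running 24 hours a day for 30 days.
const redshiftNodeUsdPerHour = 0.25;
const redshiftNodeUsdPerMonth = redshiftNodeUsdPerHour * 24 * 30;

console.log({ costPerMillionEventsUsd, redshiftNodeUsdPerMonth });
```

That is about $0.75 per million events for the streaming part, plus roughly $180 per Redshift node and month.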
