λ architecture applied at Exponea.

We will talk about the components of our architecture, how data flows through those components, and the real-life use case it serves. Evolution and vision are critical aspects of building such a platform. The technology stack we will cover includes Kafka, the Hadoop stack, Spark, legacy MongoDB, and IMF.

Published in: Technology

  1. 1. Analytics for the fastest growing companies
  2. 2. λ architecture applied at Exponea Martin Strýček, 10.3.2016
  3. 3. Choose your stack
  4. 4. Intuitive approach •  Collect data into a (No)SQL DB •  Run live queries against the (No)SQL database •  UI acts as an SQL generator •  But may/will eventually result in slow queries •  Batch preprocessing of data •  Continuous change of report definitions •  Delays / overnight results / no more night
  5. 5. Conversion Funnel
  6. 6. Valid solutions - SQL
        FROM events e1
        LEFT JOIN events e2
          ON e1.customer_id = e2.customer_id
         AND e2.type = 'view_item'
         AND e1.timestamp < e2.timestamp
        LEFT JOIN events e3
          ON e2.customer_id = e3.customer_id
         AND e3.type = 'add_to_cart'
         AND e2.timestamp < e3.timestamp
  7. 7. Valid solutions - NoSQL
        var map = function () {
          var steps = ['view_item', 'add_to_cart', 'buy'];
          var counts = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0];
          var i = 0;
          for (var j in this.value.events) {
            var event = this.value.events[j];
            if (event['type'] == steps[i]) {
              counts[i]++;
              i++;
              if (i === steps.length) break;
            }
          }
          if (i > 0) emit('funnel', {'counts': counts});
        };
  8. 8. Valid solutions - NoSQL
        var reduce = function (key, values) {
          var counts = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0];
          for (var i in values) {
            for (var j in values[i].counts) {
              counts[j] += values[i].counts[j];
            }
          }
          return {'counts': counts};
        };
        db.embeded_customers.mapReduce(map, reduce, {
          out: 'customers_matched_funnel_1'
        }).find();
  9. 9. Alternative solutions - custom in memory database
  10. 10. IMF – Customer data structure
  11. 11. IMF – Basic structure project1 customer1 event1 timestamp properties property1, value1 ... event2 ... properties property1, value1 ... …
  12. 12. IMF •  Sharding •  Customer ID as sharding key •  Replication •  IMF-master knows how many shards and replicas are connected •  Loading •  From a stream of data
  13. 13. App architecture
  14. 14. λ architecture
  15. 15. λ architecture
  16. 16. We have speed, we need volume •  Fast layer is solved •  Big data requirements •  Loading old data into fast layer •  No data expiration •  Access to data from BI tools •  Custom queries
  17. 17. λ architecture
  18. 18. MapR •  MapR filesystem •  Direct access to files that are stored within the cluster •  Faster than HDFS •  MapR distribution •  No dependency hell
  19. 19. λ architecture
  20. 20. Data collection API: Realtime vs Async •  Realtime •  Customer segments •  Website customization •  Recommendations •  Personalization •  Async •  Do not lose data •  Event-driven campaigns
  21. 21. Data Collection
  22. 22. Real time - web customization
  23. 23. Event-triggered campaign
  24. 24. λ architecture at Exponea
  25. 25. Takeaways •  Lambda solves two contradictory challenges •  Process data fast •  Process very big data •  Apache Spark is a good choice for both the speed and batch layers, but our IMF is way faster :-)
  26. 26. Thank you.
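
The funnel logic from the NoSQL slides can also be sketched in plain JavaScript over an in-memory, per-customer event list, loosely following the project → customer → event layout from the IMF slides. This is only an illustration under assumptions: the `project` object, the field names, and the helper functions `funnelCounts`, `mergeCounts` and `funnel` are hypothetical and are not taken from IMF or from the talk.

```javascript
// Hypothetical in-memory layout: one project, customers keyed by id,
// each holding a list of events with a type and a timestamp.
const project = {
  customer1: [
    { type: "view_item", timestamp: 1 },
    { type: "add_to_cart", timestamp: 2 },
    { type: "buy", timestamp: 3 },
  ],
  customer2: [
    // add_to_cart happens before view_item, so it does not count as step 2.
    { type: "add_to_cart", timestamp: 1 },
    { type: "view_item", timestamp: 2 },
  ],
  customer3: [{ type: "search", timestamp: 1 }],
};

const STEPS = ["view_item", "add_to_cart", "buy"];

// "Map" step: how far a single customer got through the ordered steps.
function funnelCounts(events, steps) {
  const counts = new Array(steps.length).fill(0);
  const ordered = [...events].sort((a, b) => a.timestamp - b.timestamp);
  let next = 0; // index of the step we are waiting for
  for (const event of ordered) {
    if (event.type === steps[next]) {
      counts[next] += 1;
      next += 1;
      if (next === steps.length) break;
    }
  }
  return counts;
}

// "Reduce" step: element-wise sum of two count arrays.
function mergeCounts(a, b) {
  return a.map((value, i) => value + b[i]);
}

// Whole funnel: map every customer, then reduce to one array of counts.
function funnel(customers, steps) {
  return Object.values(customers)
    .map((events) => funnelCounts(events, steps))
    .reduce(mergeCounts, new Array(steps.length).fill(0));
}

const result = funnel(project, STEPS);
console.log(result); // [ 2, 1, 1 ]
```

Two customers viewed an item, and one of them went on to add it to the cart and buy it. Because each customer is evaluated on its own, the per-customer step can run wherever that customer's events live, and only the small count arrays need to be merged.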

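
The IMF slide on sharding names the customer ID as the sharding key and says data is loaded from a stream. A minimal sketch of what that could look like follows. Everything here is an assumption made for illustration: the hash (a 32-bit FNV-1a-style hash over UTF-16 code units), the shard count, the event shape, and the function names `hashCustomerId`, `shardFor` and `loadStream` are hypothetical, and the slides do not say how IMF actually routes data.

```javascript
// Assumption: a 32-bit FNV-1a-style hash over UTF-16 code units.
// The hash IMF really uses is not stated on the slides.
function hashCustomerId(customerId) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < customerId.length; i++) {
    hash ^= customerId.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep it an unsigned 32-bit value
  }
  return hash;
}

// The customer ID alone decides the shard.
function shardFor(customerId, shardCount) {
  return hashCustomerId(customerId) % shardCount;
}

// Load a stream of events into per-shard maps of customerId -> events.
function loadStream(events, shardCount) {
  const shards = Array.from({ length: shardCount }, () => new Map());
  for (const event of events) {
    const shard = shards[shardFor(event.customerId, shardCount)];
    if (!shard.has(event.customerId)) shard.set(event.customerId, []);
    shard.get(event.customerId).push(event);
  }
  return shards;
}

// Hypothetical event stream.
const stream = [
  { customerId: "customer1", type: "view_item", timestamp: 1 },
  { customerId: "customer2", type: "view_item", timestamp: 2 },
  { customerId: "customer1", type: "add_to_cart", timestamp: 3 },
];

const shards = loadStream(stream, 4);
shards.forEach((shard, index) => {
  console.log(index, [...shard.keys()]);
});
```

With the customer ID as the key, all events of one customer end up on the same shard, so per-customer computations such as a funnel need no cross-shard joins.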