Big and Fast: A quest for relevant and real-time analytics


The retail banking market demands, now more than ever, that we stay close to our customers and understand which services and products are relevant to each customer at any given time.

This sort of marketing research is often beyond the capacity of traditional BI reporting frameworks. In this talk, we illustrate how we pair data scientists with big data engineers to create and scale distributed analyses on a big data platform.

Published in: Marketing

  1. Big & Fast: A quest for relevant and real-time analytics. Natalino Busa, @natalinobusa
  2. Parallelism, Mathematics, Programming Languages, Machine Learning, Statistics, Big Data, Algorithms, Cloud Computing. Natalino Busa, @natalinobusa
  3. Big and Fast: Methodology. Architecture. Roles and organization.
  4. Conversion is the ultimate form of permission marketing. Permission marketing is about the honour of being heard. How to earn it? Provide the right suggestions, at the right time. This is what makes data analysis valuable.
  5. When do you really know your customer? When you know their last 5 unique songs? 100 songs? 10,000 songs?
  6. Old & New stuff. We evolve slowly: our personality, our habits. But events and trends can affect us at short notice. How do you combine old with new?
  7. The customer’s context. Complex on many dimensions. Personal history: number of transactions ever made. Long-term interaction: how the user’s actions correlate with those of others. Real-time events: trends and recent events.
  8. The customer’s context. Context is related to time. Slow changing: the defining characteristics of a person. Fast changing: events which influence our lives, trends. These require very different technology solutions!
  9. Challenges. Millions of …, billions of …. Not much time to react: the window of opportunity is sometimes just a few seconds. Loads of information to process: you want to understand the user history well.
  10. Slow and fast. Ranking and preference analysis; segmentation and clustering; short-term trending topics; rule-based recommendations. Tens of terabytes of data: this can take hours. Hundreds of events per second: this must be fast.
  11. Hadoop: Distributed Data OS. Reliable: distributed, replicated file system. Low cost: ↓ cost vs ↑ performance/storage. Computing powerhouse: all the cluster’s CPUs working in parallel to run queries.
  12. Scala / Akka / Spray: a reactive web API framework. Diagram: Actor A, Actor B, and Actor C exchanging msg 1, msg 2, msg 3, msg 4. It scales horizontally (can run in cluster mode). Maximum use of the available cores/memory: (1) processing is non-blocking, threads are reused; (2) can parallelize computing power across many actors. Very fast: 1000s of messages/sec. Very reliable: auto recovery.
  13. Distributed computing: lambda architecture (batch + streaming). Diagram labels: batch computing, streaming computing, data warehouses, messaging buses, in-memory distributed databases, HTTP RESTful API, low-latency web API services.
  14. Distributed computing: some techs. Hadoop, Cassandra. Millions of …, billions of …. λ = conversions (lambda).
  15. All Things Distributed. Distributing computing and storage: more machines = more storage/computing. Open source software solutions: mature enough for pragmatic adopters. Near real-time + big data technologies: Hadoop, Scala, Akka, Spray, Cassandra.
  16. Science & Engineering. Statistics, data science: Python, R, visualization. IT infra, big data: Java, Scala, SQL. Hadoop: big data infrastructure, data science on large datasets. Big Data and Fast Data require different profiles to achieve the best results.
  17. Parallelism, Mathematics, Programming Languages, Machine Learning, Statistics, Big Data, Algorithms, Cloud Computing. Natalino Busa, @natalinobusa. Thanks! Any questions?
  18. Natalino Busa, @natalinobusa
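
The lambda architecture named on slide 13 combines a slow batch view with a fast streaming view and merges the two when a query arrives. The following is a minimal Python sketch of that idea, not the speaker's implementation. The class `LambdaSketch`, its method names, and the customer/category events are hypothetical and exist only for illustration. Plain in-memory dictionaries stand in for the distributed technologies the slides list, and the sketch ignores events that arrive while a batch run is in progress.

```python
from collections import Counter, defaultdict


class LambdaSketch:
    """Toy lambda architecture: batch view + speed view, merged at query time."""

    def __init__(self):
        # Append-only history of all events (the "slow", complete data set).
        self.master_log = []
        # Batch view: customer -> Counter of categories, rebuilt from scratch.
        self.batch_view = {}
        # Speed view: increments seen since the last batch run.
        self.speed_view = defaultdict(Counter)

    def ingest(self, customer, category):
        """Fast path: record the event and update the real-time view."""
        self.master_log.append((customer, category))
        self.speed_view[customer][category] += 1

    def run_batch(self):
        """Slow path: recompute the batch view from the full history."""
        view = defaultdict(Counter)
        for customer, category in self.master_log:
            view[customer][category] += 1
        self.batch_view = dict(view)
        # The batch view now covers everything, so the speed view restarts.
        self.speed_view.clear()

    def top_categories(self, customer, n=1):
        """Serving layer: merge both views and rank categories by count."""
        merged = Counter(self.batch_view.get(customer, Counter()))
        merged.update(self.speed_view.get(customer, Counter()))
        return [category for category, _ in merged.most_common(n)]


if __name__ == "__main__":
    sketch = LambdaSketch()
    for category in ["travel", "travel", "food"]:
        sketch.ingest("alice", category)
    sketch.run_batch()                      # long-term history is now in the batch view
    sketch.ingest("alice", "food")          # recent events land in the speed view
    sketch.ingest("alice", "food")
    print(sketch.top_categories("alice"))   # ranking uses old and new data together
```

The design choice this illustrates is the one raised on slides 6 to 8: the long-term profile is recomputed rarely over the full history, recent events are applied incrementally, and a recommendation query reads both.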