2017 05 Hadoop User Group Meetup Dublin

An introduction to data analytics and AI at Altocloud using #Spark, #Cassandra, #Kafka. #AI, #ML

Published in: Data & Analytics

  1. Data @Altocloud. Maciej Dabrowski, Chief Data Scientist. HUG 05/2017, Dublin
  2. Modern Customer Engagement
     • Digital Messaging: connect with customers by having live web chat or SMS conversations, sending targeted messages and offers. (SMS; Web Chat; FB Messenger, Twitter DM; Offers & Surveys; Scheduled Callbacks)
     • Customer Journey Analytics: connect the dots with live analytics and AI to discover, analyse and predict customer behaviour patterns. (Customer Context; Behaviour Analytics; Call Attribution to Campaigns; Predictive Models)
     • Real Time Communications: connect in real time using voice, video and screen sharing to engage with exceptional customer service. (Voice Calls; Video; Screen-share)
  3. How Companies Benefit
     • Engage at the best time
     • Accelerate revenue conversion
     • Improve Customer Experience
     • Resolve issues quickly
     • Reduce calls / workload
     • Increase First Call Resolution
     • Reduce bounce and abandons
  4. 25 people, 1 dragon, 2 locations, 8 nationalities, having fun …and growing!
  5. Altocloud Holistic Customer Journey [architecture diagram; labels: event streams, web events, call / IVR / ticket events, queues, event processors, enrichment, aggregation, segmentation, model evaluation, batch model learning, storage, outcome probabilities, real-time customer journey, actions creation, actions (marketing automation, CRM, web hook)]
  6. Holistic view of your customers
  7. Focus on real-time analytics
     Make predictions on live visitors in real time (in seconds) by:
     • Ingesting customer actions (events) and context
     • Building predictive models
     • Offering actions to customers based on real-time predictions
  8. Disclaimer: no lipstick
     • This is not a sales pitch
     • Learn from the mistakes of others
     • Show what works and what does not
  9. Agenda
     • Engineering challenges
     • Data platform
     • AI platform and workloads
  10. Engineering challenges
      • Product complexity: communication platform; data platform
      • Scale: millions of events per day; billions of events overall; typically no stable schemas
      • Real-time aspects: response in second(s); streaming nature
      • Reliability: 24/7 availability; services go down; servers disappear
  11. Altocloud Platform [diagram showing the Altocloud Platform and the Altocloud Data Platform; labels: APIs, message queues, data processors, storage]
  12. Tools that we use: focus on open source (Apache)
  13. Tools that we use - data
  14. Why Spark
      • Fast for iterative algorithms (important for Machine Learning)
      • Good integration with other tools (Kafka and Cassandra)
      • One code base for streaming and batch processing
      • Easy to deploy and maintain
      • Growing ecosystem (SQL, MLlib, GraphX, …)
      • Large open-source community
  15. Data source: Kafka
      • Pub-sub message broker
      • Fast: 100s of MB/s on a single broker
      • Scalable: partitioned data streams
      • Durable: messages persisted and replicated
      • Distributed: strong durability and fault tolerance
      • Downside: requires ZooKeeper
  16. Scalable storage
      • Easy to set up
      • High availability - no master
      • Great performance
      • CQL - SQL-like querying
      • Great support and bug-free drivers from Datastax
      • Key: design your schema around queries
  17. Data
      • Demographic: device, location, organisation, contact details, and more
      • Events: page views, form fills, searches, purchases, IVR / telephony, custom events, …
      • JSON
  18. Altocloud Data Platform [diagram; labels: platform APIs, data APIs, message queues, data ingestion, data processors, query layer, storage layer]
  19. Goals for Analytics platform
      • Easy to scale
      • As real-time as possible
      • Performance vs. flexibility
      • ~80% of queries known upfront
      • Limited resources
      • Low latency
  20. Analytics [diagram with numbered callouts; labels: APIs, message queues, data processors, query layer, storage layer, event storage, events, event metadata, dimensions, views, aggregations]
  21. Summary
      • Materialise views for buckets every minute
      • Hourly roll-ups on raw events
      • Some numbers: 1bn+ events / day on 8 cores (Spark); sub-second query time
      • Lessons learned: know your data partitioning; idempotent design is key!
  22. Outcome Probabilities
  23. AI platform
      • Goal: predict probability of customer X achieving goal Y
      • Train models per outcome and business (1000s)
      • Apply models to each event in real time (5s)
      • Flexibility to add new data features on demand
      • Different dataset sizes forcing different algorithms
  24. Spark ML Pipeline
      • “Decode” Spark ML pipeline & stages
      • Combine feature & model pipelines per outcome
      • “Compose” per-outcome pipeline in streaming
      • Apply different pipelines per event in streaming batch
  25. Key takeaways
      • Streaming over batch - highly reactive, low latency
      • Design for idempotent processing: things will always fail
      • Open source is great (most of the time) and cheap
      • Contact: macdab@altocloud.com
  26. Complex algorithms behind simple UX
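
Slide 14 gives "one code base for streaming and batch processing" as a reason for choosing Spark. The idea can be illustrated without Spark itself. The following is a minimal pure-Python sketch, not the Altocloud code: `count_by_type` and `micro_batches` are hypothetical names, and the event shape is an assumption. The same function serves a full batch and a sequence of micro-batches, and both paths produce the same counts.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def count_by_type(events: Iterable[dict]) -> Counter:
    """Shared business logic: count events per event type."""
    return Counter(e["type"] for e in events)


def micro_batches(events: List[dict], size: int) -> Iterator[List[dict]]:
    """Stand-in for a stream: yield the events in small consecutive chunks."""
    for start in range(0, len(events), size):
        yield events[start:start + size]


# Hypothetical events, for illustration only.
events = [
    {"type": "page_view"}, {"type": "search"}, {"type": "page_view"},
    {"type": "purchase"}, {"type": "page_view"},
]

# Batch mode: one pass over the full data set.
batch_result = count_by_type(events)

# Streaming mode: the same function per micro-batch, merged incrementally.
stream_result = Counter()
for chunk in micro_batches(events, size=2):
    stream_result.update(count_by_type(chunk))

print(batch_result == stream_result)
```

Keeping the logic in one function means a fix or a new metric is written once and picked up by both the batch and the streaming path.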
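
Slide 15 describes Kafka as scalable through partitioned data streams, and slide 21 advises knowing your data partitioning. A minimal sketch of key-based partitioning follows, under stated assumptions: `partition_for` and `route` are hypothetical helpers, `visitor_id` is an assumed key, and `zlib.crc32` is only a stand-in hash (Kafka's own default partitioner uses a different hash function). It shows why keying by visitor keeps each visitor's events together and in order.

```python
import zlib
from typing import List


def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition index (crc32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events: List[dict], num_partitions: int) -> List[List[dict]]:
    """Append each event to the partition chosen by its visitor_id key."""
    partitions: List[List[dict]] = [[] for _ in range(num_partitions)]
    for event in events:
        index = partition_for(event["visitor_id"], num_partitions)
        partitions[index].append(event)
    return partitions


# Hypothetical events: two visitors with interleaved activity.
events = [
    {"visitor_id": "v1", "seq": 1},
    {"visitor_id": "v2", "seq": 1},
    {"visitor_id": "v1", "seq": 2},
    {"visitor_id": "v2", "seq": 2},
    {"visitor_id": "v1", "seq": 3},
]

partitions = route(events, num_partitions=4)
for index, partition in enumerate(partitions):
    print(index, partition)
```

Because the partition depends only on the key, a consumer that owns a partition sees every event for its visitors in arrival order, which is what per-visitor journey processing relies on.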
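
Slide 21 combines per-minute materialised buckets, hourly roll-ups, and the lesson that idempotent design is key. One way to get idempotence, sketched here as an assumption and not as the platform's actual storage design, is to key every write by event id so that a replayed event overwrites itself. `MinuteBuckets` is a hypothetical in-memory class, and timestamps are assumed to be epoch seconds.

```python
from collections import defaultdict
from typing import Dict


class MinuteBuckets:
    """Per-minute event counts that stay correct when events are replayed."""

    def __init__(self) -> None:
        # bucket start (epoch seconds floored to the minute) -> set of event ids
        self._seen = defaultdict(set)

    def add(self, event_id: str, timestamp: int) -> None:
        bucket = timestamp - timestamp % 60
        # Adding to a set is idempotent: a replayed event id is a no-op.
        self._seen[bucket].add(event_id)

    def count(self, bucket: int) -> int:
        return len(self._seen.get(bucket, ()))

    def hourly_rollup(self) -> Dict[int, int]:
        """Sum the deduplicated events of every minute bucket into its hour."""
        hours: Dict[int, int] = defaultdict(int)
        for bucket, event_ids in self._seen.items():
            hours[bucket - bucket % 3600] += len(event_ids)
        return dict(hours)


buckets = MinuteBuckets()
for event_id, ts in [("e1", 0), ("e2", 30), ("e3", 61)]:
    buckets.add(event_id, ts)

# Simulate a failure followed by a retry that redelivers an event.
buckets.add("e1", 0)

print(buckets.count(0), buckets.count(60), buckets.hourly_rollup())
```

With this shape, a processor that crashes mid-batch can simply reprocess the whole batch, since counting the same event twice is impossible by construction.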
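
Slides 23 and 24 describe training models per outcome and business, then composing a per-outcome pipeline and applying it to each event in a streaming batch. The sketch below illustrates that composition in plain Python and does not use the Spark ML API. Every name in it (`compose`, `add_features`, `logistic_model`, the `acme` business, the feature fields, and the weights) is hypothetical and chosen only for illustration.

```python
import math
from typing import Callable, Dict, List, Tuple

Stage = Callable[[dict], dict]


def compose(stages: List[Stage]) -> Stage:
    """Chain stages into one pipeline that is applied left to right."""
    def run(record: dict) -> dict:
        for stage in stages:
            record = stage(record)
        return record
    return run


def add_features(event: dict) -> dict:
    """Feature stage: bias term plus two assumed behavioural counters."""
    features = [1.0, float(event["page_views"]), float(event["form_fills"])]
    return {**event, "features": features}


def logistic_model(weights: List[float]) -> Stage:
    """Model stage: logistic regression with fixed, made-up weights."""
    def score(record: dict) -> dict:
        z = sum(w * x for w, x in zip(weights, record["features"]))
        return {**record, "probability": 1.0 / (1.0 + math.exp(-z))}
    return score


# One composed pipeline per (business, outcome) pair.
pipelines: Dict[Tuple[str, str], Stage] = {
    ("acme", "purchase"): compose([add_features, logistic_model([-2.0, 0.3, 1.0])]),
    ("acme", "callback"): compose([add_features, logistic_model([0.0, 0.0, 0.0])]),
}


def score_batch(events: List[dict]) -> List[dict]:
    """Apply the matching pipeline to each event of a micro-batch."""
    return [pipelines[(e["business"], e["outcome"])](e) for e in events]


batch = [
    {"business": "acme", "outcome": "purchase", "page_views": 10, "form_fills": 1},
    {"business": "acme", "outcome": "callback", "page_views": 3, "form_fills": 0},
]
for scored in score_batch(batch):
    print(scored["outcome"], round(scored["probability"], 3))
```

Keeping feature stages and model stages separate lets a new feature be added to one stage and reused by every outcome's pipeline, which matches the slide's goal of adding data features on demand.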
