AI-Powered Streaming Analytics for Real-Time Customer Experience




Interacting with customers in the moment and in a relevant, meaningful way can be challenging to organizations faced with hundreds of various data sources at the edge, on-premises, and in multiple clouds.

To capitalize on real-time customer data, you need a data management infrastructure that allows you to do three things:
1) Sense: Capture event and streaming data from sources such as social media, web logs, machine logs, and IoT sensors.
2) Reason: Automatically combine and process this data with existing data for context.
3) Act: Respond appropriately in a reliable, timely, and consistent way.

In this session we'll describe and demo an AI-powered streaming solution that handles the entire end-to-end sense-reason-act process at any latency (real-time, streaming, and batch) using Spark Structured Streaming.

The solution uses AI (e.g. A* and NLP for data structure inference, and machine learning algorithms for ETL transform recommendations) and metadata to automate data management processes (e.g. parsing, ingesting, integrating, and cleansing dynamic and complex structured and unstructured data) and to guide user behavior for real-time streaming analytics. It is built on Spark Structured Streaming to take advantage of unified APIs, multi-latency and event-time-based processing, handling of out-of-order data, and other capabilities.
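
As a rough illustration of the event-time processing and out-of-order handling mentioned above, here is a minimal PySpark sketch. It is not the solution's actual implementation: the column names `eventTime` and `logLevel`, the function name, and the 40-second window with a 5-minute watermark are assumptions chosen for illustration.

```python
def windowed_error_counts(events):
    """Count ERROR events per 40-second event-time window.

    `events` is assumed to be a streaming DataFrame with a timestamp column
    `eventTime` and a string column `logLevel` (hypothetical names).
    """
    # Imported lazily so this sketch can be loaded without a Spark install.
    from pyspark.sql import functions as F

    return (
        events
        .filter(F.col("logLevel") == "ERROR")
        # Tolerate events up to 5 minutes behind the latest event time seen.
        .withWatermark("eventTime", "5 minutes")
        # Group by when the event happened, not when Spark received it.
        .groupBy(F.window(F.col("eventTime"), "40 seconds"))
        .count()
    )
```

With a streaming DataFrame `stream`, the result could be started with something like `windowed_error_counts(stream).writeStream.outputMode("update").format("console").start()`.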

You will gain a clear understanding of how to use Spark Structured Streaming for data engineering using an intelligent data streaming solution that unifies fast-lane data streaming and batch lane data processing to deliver in-the-moment next best actions that improve customer experience.


  1. WIFI SSID: Spark+AISummit | Password: UnifiedDataAnalytics
  2. AI-Powered Streaming Analytics for Real-Time Customer Experience. John Haddad & Vishwa Belur. #UnifiedDataAnalytics #SparkAISummit
  3. Sense, Reason, Act
  4. Streaming Data Management (Sense, Reason, Act)
     • Streaming Data Ingestion: Collect streaming data from streaming and IoT endpoints and ingest it onto the lake or messaging hub
     • Streaming Data Enrichment: Enrich and distribute streaming data in real time for business user consumption
     • Streaming Pipeline Operationalization: Operationalize actions based on insights from streaming data
  5. Intelligent Streaming Analytics
     • Predictive Maintenance and Smart Factory: Identify stress signals coming from devices and act on them before it's too late
     • Real-time Offer Management: Combine web searches and camera feeds to identify the customer and roll out real-time offers
     • Clinical Research Optimization: Collect and process bedside monitor data so clinical researchers can more effectively understand and detect disease
     • Real-time Fraud Detection and Alerting: Run a real-time fraud detection machine-learning model on the transaction data
  6. Case Study: Multi-latency data management at a large global Media & Entertainment company
     • Objective: Enhance the customer experience with a chatbot on a mobile app in the cloud
     • How did Informatica help?
       – Real-time ingestion and analytics on chatbot responses
       – Provide real-time feedback to IBM Watson on chatbot response quality
       – Run batch analytics on the data for operational reporting
     • Diagram labels: Mobile phone (Line), Mobile App, Amazon Kinesis, Data Engineering, Data Lake
  7. Case Study: Streaming Analytics at OVO, a leading Indonesian financial services platform
     • Objective: Create better customer engagement with targeted real-time campaigns
     • How did Informatica help?
       – Stream data from millions of customers to analyze various combinations of customer behaviors and segments
       – Integrate customer transaction and interaction data to deliver personalized campaigns in less than 15 seconds
     • Diagram labels: Top up, payments, purchases; Mobile App (notifications, promo codes, deals); Customer analytics and attributes; Data Lake; Data Engineering
  8. Event-centric Data Processing: Methodology (Sense, Reason, Act)
     • The value of most events is multiplied by the context
     • Diagram labels: Event; Dynamic Context (models, rules, history, patterns, states); Enterprise Data; External Data; Action
  9. Real-time Streaming Analytics: Customer Expectations (Sense, Reason, Act)
     • Simple edge processing to handle bad records
     • Add metadata to the message for better analytics
     • Variety of sources and source protocols (MQTT, OPC, HTTP, etc.)
     • Structured and unstructured data
     • Cloud data lakes and messaging hubs
     • Support for "intent-driven ingestion"
     • Support for dynamically evolving schema
     • Real-time enrichment with transformation
     • Serverless streaming
     • Real-time ML model operationalization
     • Support for "event-time-based processing"
     • Support for late arrival of IoT events
     • Categories: Versatile Connectivity, Edge Processing, Schema Drift, Streaming Analytics, Serverless & ML
  10. Informatica Enterprise Streaming and Ingestion (Sense, Reason, Act)
      • Sources: Relational Systems, Machine Data / IoT, Sensor Data, Web Logs, Social Media
      • Capture and Ingest: Change Data Capture & Publish (changes), Real-time Ingestion
      • Message Hub: Amazon Kinesis, Azure Event Hub
      • Real-time/Batch Processing & Analytics (Enrich, Process, and Analyze): Parse, Filter, Transform, Aggregate, Enrich
      • Persist: Data Lake / Data Warehouse (AWS/Azure/Google, Not Only SQL)
      • Outcomes: Real-time offer alert, Real-time dashboard, Trigger business processes
  11. Cloud Streaming Ingestion Service (Sense)
      • Ingest streaming data: logs, clickstream, social media, Kafka, Kinesis, S3, ADLS, Firehose, etc.
      • Orchestrate streaming data ingestion in hybrid/cloud environments as a managed and secure service
      • Real-time monitoring of ingestion jobs, with lifecycle management and alerting in case of issues
      • Provides streaming ingestion capabilities as part of the IICS Data Ingestion service
      • Diagram labels: Machine Data / IoT, Sensor Data, Web Logs, Social Media, Messaging Systems; Real-time analytics; Data Lake; Consumption
  12. Decipher Data With CLAIRE™ (Reason)
      • Automatic Structure Detection
        – Machine learning algorithm recognizes the file structure
        – Relational structure is generated on the fly
      • Automatic Model Development
        – View data in a visual structure
        – Clearly see which elements are connected to real data
        – Refine data: normalize, exclude, rename element
        – Export
      • Deploy on Cloud or On-premises
        – Parser is created automatically
        – Intelligent parsers can be used at run time to transform similar data files for continuous processing
  13. Intelligent Structure Discovery in Action
      • Data fluctuation and data drift
        – Different formats with the same semantics: 01/01/2019, 01-01-2019, and Jan-01-2019
        – Record changes within a file: some records contain 10 fields, others contain 8
        – Some changes in file format, for example additional fields
      • Original log vs. new-version log
        – New fields that are not in the model are mapped to unassigned ports
        – The new date format is handled correctly
        – Added spaces are handled correctly
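
The drift handling described on this slide can be sketched in plain Python. This is only an illustration of the idea, not how CLAIRE or Intelligent Structure Discovery works internally: the model fields, the function names, and the day-first reading of `01/01/2019` are assumptions, and `%b` relies on an English locale.

```python
from datetime import datetime

# Accepted spellings of the same date (day-first is an assumption).
DATE_FORMATS = ("%d/%m/%Y", "%d-%m-%Y", "%b-%d-%Y")

# Hypothetical model: the fields the parser already knows about.
MODEL_FIELDS = ("log_date", "log_level", "message")


def normalize_date(text):
    """Return an ISO date for any known format, ignoring added spaces."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


def map_record(record):
    """Split a parsed record into modeled fields and unassigned extras."""
    mapped = {name: record.get(name) for name in MODEL_FIELDS}
    if mapped["log_date"] is not None:
        mapped["log_date"] = normalize_date(mapped["log_date"])
    # Fields missing from the model are kept aside instead of being dropped.
    unassigned = {k: v for k, v in record.items() if k not in MODEL_FIELDS}
    return mapped, unassigned
```

For example, a new-version record carrying an extra `host` field keeps its modeled fields and returns `host` in the unassigned dictionary.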
  14. Spark Structured Streaming: Overview (Reason)
      • Structured Streaming: a scalable and fault-tolerant stream processing engine
      • What is Structured Streaming?
        – High-level streaming API built on DataFrames/Datasets; treats a stream as an infinite table
        – Unifies streaming, interactive, and batch queries and provides a structured abstraction
      • How does it help?
        – Handles streaming data based on event time instead of processing (Spark) time
        – Uses watermarking to handle out-of-order delivery of streaming data, which is otherwise impractical
        – Supports output modes on the streaming target: append, complete, and update
  15. Motivation for Structured Streaming
      • Streaming and batch needed separate implementations
        – Streaming (DStream) managed by StreamingContext
        – Batch (DataFrame) managed by SQLContext
      • Spark unified the API to leverage the functionality offered by the DataFrame APIs
        – Common implementation for batch and streaming
        – Provides enhanced stateful operations and gives further control to Spark application developers
        – Moving towards operating batch as stream and stream as batch
  16. Spark Structured Streaming Support
      • What are we doing?
        – First vendor to support Spark Structured Streaming with Spark 2.3.1
        – Windowing based on source event time
        – Ability to define a watermark for late event handling
      • How does it help?
        – Aggregate streaming data based on event time instead of processing time
        – Handle "out of order" data from the source and deliver it "in order" to the target
  17. Spark Structured Streaming vs DStream
      • Batch size: 20 seconds
      • Window size: 40 seconds
      • Watermark interval: 5 minutes (ignored in DStream; applicable only to Structured Streaming)

      | Event | Event time | Spark computing time | Actual event |
      |---|---|---|---|
      | Rec1 | 01/01/2019 00:00:00 | 01/01/2019 00:00:01 | LogDateTime (01/01/2019 00:00:00), LogLevel - ERROR |
      | Rec2 | 01/01/2019 00:00:01 | 01/01/2019 00:01:01 | LogDateTime (01/01/2019 00:00:01), LogLevel - ERROR |
      | Rec3 | 01/01/2019 00:00:02 | 01/01/2019 00:00:03 | LogDateTime (01/01/2019 00:00:02), LogLevel - ERROR |
      | Rec4 | 01/01/2019 00:00:03 | 01/01/2019 00:00:21 | LogDateTime (01/01/2019 00:00:03), LogLevel - ERROR |
      | Rec5 | 01/01/2019 00:00:04 | 01/01/2019 00:00:23 | LogDateTime (01/01/2019 00:00:04), LogLevel - ERROR |
  18. Processing with DStream
      • A DStream window is based on processing time, i.e. the time the data arrives in Spark's window computation
      • The user cannot pick any data or metadata field from the incoming records to drive windowing
      • Arrival order across batch-(n+1) to batch-(n+4): Rec1, Rec3, Rec4, Rec5, then Rec2
      • window-1: count(Rec1, Rec3, Rec4, Rec5) = 4
      • window-2: count(Rec2) = 1
      • Late arrival is not addressed, and ordered processing based on source event time is not possible
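
The DStream counts can be reproduced with a small plain-Python simulation of processing-time windowing. This is a sketch of the arithmetic only, not DStream code; the records come from the preceding comparison slide, and the helper name `window_index` is an assumption.

```python
from collections import Counter
from datetime import datetime

FMT = "%d/%m/%Y %H:%M:%S"
WINDOW_SECONDS = 40

# (record, event time, Spark processing time) from the comparison slide.
RECORDS = [
    ("Rec1", "01/01/2019 00:00:00", "01/01/2019 00:00:01"),
    ("Rec2", "01/01/2019 00:00:01", "01/01/2019 00:01:01"),
    ("Rec3", "01/01/2019 00:00:02", "01/01/2019 00:00:03"),
    ("Rec4", "01/01/2019 00:00:03", "01/01/2019 00:00:21"),
    ("Rec5", "01/01/2019 00:00:04", "01/01/2019 00:00:23"),
]

ORIGIN = datetime.strptime("01/01/2019 00:00:00", FMT)


def window_index(timestamp_text):
    """Index of the 40-second window that a timestamp falls into."""
    elapsed = (datetime.strptime(timestamp_text, FMT) - ORIGIN).total_seconds()
    return int(elapsed // WINDOW_SECONDS)


# Window by arrival (processing) time; the event time is never consulted.
counts = Counter(window_index(arrival) for _, _, arrival in RECORDS)
print(dict(counts))  # {0: 4, 1: 1}
```

Rec2 happened one second after Rec1 but arrived a minute later, so it is counted in the second window.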
  19. Processing with Structured Streaming
      • Window [01/01/2019 00:00:00 to 01/01/2019 00:00:40]: count (aggregate) = 5
      • batch-(n+1) receives Rec1 and Rec3
        – The last record of the previous batch, Recx, has timestamp 31/12/2018 23:59:20
        – Watermark = 31/12/2018 23:59:20 minus 5 minutes = 31/12/2018 23:54:20
        – Rec1 and Rec3 fall within the window, and the watermark is not later than the window's end, so Spark waits and keeps aggregating records that arrive for this window
      • batch-(n+2) receives Rec4 and Rec5
        – Last record in batch-(n+1): Rec3
        – Watermark = 01/01/2019 00:00:02 minus 5 minutes = 31/12/2018 23:55:02
        – Rec4 and Rec5 fall within the window, and the watermark is not later than the window's end
      • batch-(n+3) is empty and is ignored
      • batch-(n+4) receives Rec2
        – Last record in batch-(n+2): Rec5
        – Watermark = 01/01/2019 00:00:04 minus 5 minutes = 31/12/2018 23:55:04
        – Rec2 falls within the window, and the watermark is not later than the window's end
      • Structured Streaming aggregates late-arriving events into the same window, using a state store maintained by Spark
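
The same walkthrough can be traced with a small plain-Python simulation of event-time windows and a watermark. It is a simplified sketch rather than Spark's implementation: the watermark is derived from the latest event time seen in earlier batches, the first batch has no watermark because Recx is not modeled, and the record data is copied from the comparison slide.

```python
from collections import defaultdict
from datetime import datetime, timedelta

FMT = "%d/%m/%Y %H:%M:%S"
BATCH = timedelta(seconds=20)
WINDOW = timedelta(seconds=40)
WATERMARK_DELAY = timedelta(minutes=5)
ORIGIN = datetime.strptime("01/01/2019 00:00:00", FMT)

# (record, event time, Spark processing time) from the comparison slide.
RECORDS = [
    ("Rec1", "01/01/2019 00:00:00", "01/01/2019 00:00:01"),
    ("Rec2", "01/01/2019 00:00:01", "01/01/2019 00:01:01"),
    ("Rec3", "01/01/2019 00:00:02", "01/01/2019 00:00:03"),
    ("Rec4", "01/01/2019 00:00:03", "01/01/2019 00:00:21"),
    ("Rec5", "01/01/2019 00:00:04", "01/01/2019 00:00:23"),
]


def parse(text):
    return datetime.strptime(text, FMT)


# Group records into 20-second micro-batches by arrival time.
batches = defaultdict(list)
for name, event_text, arrival_text in RECORDS:
    batch_no = (parse(arrival_text) - ORIGIN) // BATCH
    batches[batch_no].append((name, parse(event_text)))

state = defaultdict(int)  # window start -> running count (the state store)
max_event_time = None     # latest event time seen in earlier batches

for batch_no in sorted(batches):
    watermark = None
    if max_event_time is not None:
        watermark = max_event_time - WATERMARK_DELAY
    for name, event_time in batches[batch_no]:
        # Assign the record to a window by its event time.
        start = ORIGIN + ((event_time - ORIGIN) // WINDOW) * WINDOW
        if watermark is not None and start + WINDOW <= watermark:
            continue  # window already closed; the record is too late
        state[start] += 1
    latest = max(event_time for _, event_time in batches[batch_no])
    if max_event_time is None or latest > max_event_time:
        max_event_time = latest

counts = {start.strftime(FMT): n for start, n in state.items()}
print(counts)  # {'01/01/2019 00:00:00': 5}
```

Because the watermark trails the latest event time by five minutes, the window is still open when Rec2 arrives a minute late, so all five records land in one window.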
  20. How Did Informatica Adopt Structured Streaming?
  21. Data Science & Machine Learning (Act)
      • What are we doing?
        – Python transformation (Tx) in batch and streaming mappings
      • How does it help?
        – Helps operationalize ML models in streaming flows
        – Addresses data science use cases
  22. Enterprise Streaming & Ingestion: Reference Architecture
      • Sources: Databases, CRM, ERP, Things
      • Ingestion: CDC Ingest, Stream & IoT Ingest
      • Streaming & Batch Analytics: Parse, Enrich, Deliver
      • Data management: Integrate, Cleanse, Match, Mask
      • Machine Learning-based Recommendation Engine
      • Targets: Target Topic, Data Lakes & Warehouses, MDM Hub, Data Warehouse
      • Consumption: Real-time Visualization, Alerts, Customer Service Representative
  23. Demo
  24. Innovative Retail Corp.: Use Case
      • Innovative Retail Corp is a big retail corporation with offline and online presence
      • Business challenges
        – Loyal customers visiting the shop are not buying products they browsed online
        – Customers browse products online but do not purchase
      • What do they want to do?
        – Track customer online searches
        – Ingest the camera feed and use image recognition to identify customers visiting the store
        – Ingest beacon feed data to identify the real-time location of the customer in the shop
        – Push real-time offers to customers' mobile phones based on their search history and the store section they are in
  25. Demo Scenario
      • Sense (Capture and Transport): ingest weblogs from www…com (search history), the camera feed, and beacon data as JSON
      • Reason (Enrich and Process): analytics and lookup
      • Act: real-time dashboard and real-time offer alert (SMS)
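
The lookup step of the demo scenario can be sketched as a join between a beacon event and the customer's search history. This is a toy illustration under stated assumptions, not the demo's actual mapping: the customer ID, section names, product lists, and the function `offers_for` are all hypothetical.

```python
# Hypothetical lookup data: what each customer searched for online.
SEARCH_HISTORY = {
    "cust-42": ["running shoes", "headphones"],
}

# Hypothetical lookup data: which products each store section carries.
SECTION_PRODUCTS = {
    "footwear": {"running shoes", "sandals"},
    "electronics": {"headphones"},
}


def offers_for(beacon_event):
    """Return offer messages for searched products sold in the current section."""
    customer = beacon_event["customer_id"]
    searched = SEARCH_HISTORY.get(customer, [])
    stocked = SECTION_PRODUCTS.get(beacon_event["section"], set())
    return [f"Offer for {customer}: {item}" for item in searched if item in stocked]


# A beacon places the (already identified) customer in the footwear section.
print(offers_for({"customer_id": "cust-42", "section": "footwear"}))
# ['Offer for cust-42: running shoes']
```

In a streaming pipeline the two dictionaries would be replaced by lookups against enriched reference data, and the returned messages would feed the SMS alert target.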
  26. AI-Powered Enterprise Streaming Solution: High-Level Roadmap
      • Enterprise-Class Hybrid Streaming
        – Cloud-native streaming data ingestion and stream processing
        – Enterprise readiness: distributed deployment, IoT deployment, dynamic mapping
      • Multi-Cloud, Connectivity
        – Serverless computing
        – Streaming reference architecture with Databricks
        – Extensive streaming and IoT source and target connectivity
      • Streaming Analytics & CLAIRE
        – CLAIRE and Confluent Schema Registry support for parsing and schema drift
        – De-duplication and continuous streaming
  27. Summary
      • Streaming in Multi-Latency Platform
        – Multi-latency data management platform
        – End-to-end streaming for cloud/hybrid ecosystems
      • Streaming Connectivity
        – IoT and streaming source connectivity
        – Cloud data lakes and messaging hub connectivity
      • Simplified UI
        – Easy-to-use experience for data ingestion and integration
        – Unified UI for streaming and batch
        – Integrated deployment and real-time monitoring
      • Complex Data Parsing & Schema Drift
        – AI-driven complex data parsing with CLAIRE™
        – Addresses dynamically evolving schema
  28. Thank You