Processing Real-Time Data at Scale: A streaming platform as a central nervous system in the enterprise

confluent
Oct. 31, 2018

  1. Processing Real-Time Data at Scale: A streaming platform as a central nervous system in the enterprise. October 30th, 2018. Marcus Urbatschek, Confluent, marcus.urbatschek@confluent.io, +49 171 77 433 83
  2. Agenda • Streaming Platform • Use Cases • Apache Kafka and Confluent Platform • Q&A
  3. Survey #1: What is your key requirement for your future modern architecture? (Multiple answers possible) - Digitalization - Speed / Time to market - Innovation and agile projects - Real-time Insights
  4. Agenda • Streaming Platform • Use Cases • Apache Kafka and Confluent Platform • Q&A
  5. Old World: Pre-Streaming. (Diagram labels: App, DB, DWH, Operational Databases, Relational Data Warehouse, Reporting, Analytics)
  6. Old World: Pre-Streaming
  7. Legacy Data Infrastructure Solutions Have Architectural Flaws. These solutions can be: ● Batch-oriented, instead of event-oriented in real time ● Complex to scale at high throughput ● Connected point-to-point, instead of publish / subscribe ● Lacking data persistence and retention ● Incapable of in-flight message processing. (Diagram labels: App, DB, DWH, Transactional Databases, Analytics Databases, Data Flow, MOM, ETL, ESB)
  8. Modern Architectures are Adapting to New Data Requirements. But how do we revolutionize data flow in a world of exploding, distributed and ever changing data? (Diagram labels: App, DB, DWH, NoSQL DBs, Big Data Analytics, Transactional Databases, Analytics Databases, Data Flow, MOM, ETL, ESB)
  9. The Solution is a Streaming Platform for Real-Time Data Processing. A Streaming Platform provides a single source of truth about your data to everyone in your organization. (Diagram labels: App, DB, DWH, NoSQL DBs, Big Data Analytics, Transactional Databases, Analytics Databases, Data Flow, Streaming Platform)
  10. Business Digitization Trends are Revolutionizing your Data Flow. Trends: Mobile, Cloud, Microservices, Internet of Things, Machine Learning. The resulting data: ● Massive volumes of new data generated every day ● Distributed across apps, devices, datacenters, clouds ● Structured, unstructured, polymorphic
  11. What’s Needed? Event Centric Thinking
  12. Events What is an event?
  13. Events
  14. Events A Sale An Invoice A Trade A Customer Experience
  15. What is a company? A business is a series of events and reactions to those events.
  16. All Your Data is a Stream of Events
  17. A Streaming Platform enables event-centric thinking
  18. What is a Streaming Platform? 01 Messaging, Done Right; 02 Foundation for ETL & Data Integration; 03 Hadoop Made Fast. (Diagram labels: Search, Stream Processing, DWH, Hadoop, RDBMS, Apps, Real-Time Analytics, Monitoring, K/V)
  19. The Streaming Platform Technical Capabilities: Store, Process, Publish & Subscribe
  20. Streaming Adoption Journey: Pre-Streaming → Streaming Awareness and Pilot → Early Production Streaming → Mission Critical, Integrated Streaming → Global Streaming (Central Nervous System)
  21. Survey #2: Where are you in your journey to establish a modern streaming platform in your enterprise? (One answer possible) - 1 – Pre-Streaming (Batch, Legacy) - 2 – Interest (first proof-of-concepts or pilots) - 3 – Early Production (some independent projects in production) - 4 – Integrated Streaming (streaming platform with different projects in production) - 5 – Streaming Platform (streaming enterprise with mostly event-based applications). Things to consider: - How many in each stage? - Where do you want to be in 12-24 months? - How big is your jump?
  22. Agenda • Streaming Platform • Use Cases • Apache Kafka and Confluent Platform • Q&A
  23. Example Use Cases: ● IoT sensor ingestion ● Digital replatforming / Mainframe Offload ● Customer 360 ● Faster transactional processing / analysis incl. Machine Learning / AI ● Microservices Architecture ● Online Fraud Detection ● Online Security (syslog, log aggregation, Splunk replacement) ● Middleware replacement ● Website / Core Operations (Central Nervous System) ● Real-time app updates
  24. Value? Key Driver: Digital Transformation. Business Value: ● Increase Revenue (make money) $↑ ● Decrease Costs (save money) $↓ ● Mitigate Risk (protect money) $. Strategic Objective: ● Improve Customer Experience (CX) ● Core Business Platform ● Increase Operational Efficiency ● Migrate to Cloud ● Fraud Detection ● Regulatory. Example Use Case: ● IoT sensor ingestion ● Digital replatforming / Mainframe Offload ● Customer 360 ● Faster transactional processing / analysis incl. Machine Learning / AI ● Microservices Architecture ● Online Fraud Detection ● Online Security (syslog, log aggregation, Splunk replacement) ● Middleware replacement ● Website / Core Operations (Central Nervous System) ● Real-time app updates
  25. Value of Data? Big Data was "The More the Better" (chart: Value of Data vs. Volume of Data). Stream Data is "The Faster the Better" (chart: Value of Data vs. Age of Data).
  26. Streaming is Transforming Customer Experiences in Different Industries. Healthcare & Pharma ● Patient monitoring ● Prescription control ● Lab alerts ● Medication tracking. Banking & Capital Markets ● Risk management ● Trade data capture ● Customer 360 ● Fraud detection ● Mainframe offload. Retail ● Inventory management ● Product catalog ● A/B testing ● Customized experiences. Telecommunications ● Personalized ads ● Customer 360 ● Network integrity. Automotive & Transportation ● Connected car ● Fleet management ● Manufacturing data processing. Travel & Leisure ● Visitor segmentation ● Booking systems ● Pricing services ● Fraud detection
  27. Tech Giants - Streaming Log Analytics https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
  28. Apache Kafka®: Open Source Streaming Platform Battle-Tested at Scale ● More than 1 petabyte of data in Kafka ● Over 4.5 trillion messages per day ● 60,000+ data streams ● Source of all data warehouse & Hadoop data ● Over 300 billion user-related events per day ● The birthplace of Apache Kafka
  29. The Future of the Automotive Industry is a Real Time Data Cluster. Front, rear and top view cameras: Parking assistant, Environment pointer. Ultrasonic Sensors: Parking assistant with front and rear camera plus environment indicator. Crash Sensors: Front protection adaptivity, Side protection, Tail impact protection. Front Camera: Audi Active lane assistant, Speed limit indicator, Adaptive light. Infrared Camera: Rearview assistance with Pedestrian recognition. Front and Rear Radar Sensors: ACC with stop and go function, Side assist.
  30. The Future of the Automotive Industry is a Real Time Data Cluster. Sensors connected via MQTT: Front, rear and top view cameras; Ultrasonic Sensors; Crash Sensors; Front Camera; Infrared Camera; Front and Rear Radar Sensors. Applications: Traffic Alerts, Hazard Alerts, Personalization, Anomaly Detection.
  31. Retail: Hypercompetitive market with a need to respond to customer demand in real-time ● Technology Issue: Base systems in legacy architecture built around Hadoop with Spark & traditional ETL – slow response times not meeting business needs. ● Challenge: synchronizing data and maintaining visibility across systems, including online, supply chain and vendors.
  32. Retail: Real-Time Customer Experience “Wal-Mart is able to take data from your past buying patterns, their internal stock information, your mobile phone location data, social media as well as external weather information and analyse all of this in seconds so it can send you a voucher for a BBQ cleaner to your phone – but only if you own a barbeque, the weather is nice and you currently are within a 3 miles radius of a Wal-Mart store that has the BBQ cleaner in stock.”
  33. Winning in the Digital Era doesn’t have to be hard. Your data infrastructure was built for a different era (slow speed of execution): ● Mainframes ● Proprietary messaging systems ● Monolithic application development ● On-premises data centers ● Batch-oriented, closed systems. Imagine a world… (fast and flexible): ● Scalable machine clusters ● No bottlenecks from message queues ● Agile software development through microservices ● Cloud capable, and even… ● Data systems turned inside out, open & transparent
  34. Survey #3: What use cases do you see for a streaming platform? (Multiple answers possible) - New real-time applications - IoT / Connected XYZ - Decoupling between different legacy applications and modern applications - (New) Legacy Offloading - Microservices Architecture - Fast data processing / Real-time decisioning - Analytics (e.g. for data science, machine learning) - Mission critical applications (e.g. payments, fraud detection, customer experience) - Other?
  35. Agenda • Streaming Platform • Use Cases • Apache Kafka and Confluent Platform • Q&A
  36. Confluent Delivers a Mission-Critical Streaming Platform. Confluent Platform components: Apache Kafka® (Core | Connect API | Streams API); Development & Connectivity (Clients | Connectors | REST Proxy | KSQL); Data Compatibility (Schema Registry); Management & Monitoring (Control Center | Security); Enterprise Operations (Replicator | Auto Data Balancer | Connectors | MQTT Proxy | Operator). Legend: Open Source Features, Commercial Features. Event sources: Database Changes, Log Events, IoT Data, Web Events, other events. Data Integration: Hadoop, Database, Data Warehouse, CRM, other. Real-Time Applications: Transformations, Custom Apps, Analytics, Monitoring, other. Deployment: Datacenter | Public Cloud | Confluent Cloud (Customer Self-Managed or Confluent Fully-Managed).
  37. Connect All Applications and Data Sources and Sinks
  38. KSQL: Streaming SQL Engine for Apache Kafka. Develop real-time stream processing apps writing only SQL! No Java, Python, or other boilerplate to wrap around it. ksql> CREATE STREAM vip_actions AS SELECT userid, page, action FROM clickstream c LEFT JOIN users u ON c.userid = u.user_id WHERE u.level = 'Platinum'; confluent.io/product/ksql
  39. KSQL: the Simplest Way to Do Stream Processing. (1) Streaming ETL: CREATE STREAM vip_actions AS SELECT userid, page, action FROM clickstream c LEFT JOIN users u ON c.userid = u.user_id WHERE u.level = 'Platinum'; (2) Anomaly detection: CREATE TABLE possible_fraud AS SELECT card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 SECONDS) GROUP BY card_number HAVING count(*) > 3; (3) Monitoring: CREATE TABLE error_counts AS SELECT error_code, count(*) FROM monitoring_stream WINDOW TUMBLING (SIZE 1 MINUTE) WHERE type = 'ERROR' GROUP BY error_code;
  40. Survey #4: Do you use or plan to use the Apache Kafka ecosystem in your projects? (One answer possible) • We already use Apache Kafka. • We already use Apache Kafka and also leverage Confluent components. • We do not use Apache Kafka yet but plan to use it soon. • We do not plan to use Apache Kafka soon to build a streaming platform.
  41. Agenda • Streaming Platform • Use Cases • Apache Kafka and Confluent Platform • Q&A
  42. THANK YOU! Learn more: confluent.io/download | confluent.io/product/ksql/ | confluent.io/confluent-cloud/
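
To make the `possible_fraud` query on slide 39 more concrete, here is a minimal Python sketch of what a 5-second tumbling-window count with a "more than 3" threshold computes. It is an illustration only, not how KSQL is implemented: it processes a finite in-memory list instead of a continuous Kafka topic, aligns windows to timestamp zero, and ignores late or out-of-order events. The function name, the `(timestamp, card_number)` tuple layout, and the sample data are all assumptions made for this example.

```python
from collections import Counter

WINDOW_SECONDS = 5  # mirrors WINDOW TUMBLING (SIZE 5 SECONDS)
THRESHOLD = 3       # mirrors HAVING count(*) > 3


def possible_fraud(attempts, window_seconds=WINDOW_SECONDS, threshold=THRESHOLD):
    """Return {(window_start, card_number): count} for counts above the threshold.

    `attempts` is an iterable of (timestamp_seconds, card_number) pairs
    (a hypothetical layout chosen for this sketch).
    """
    counts = Counter()
    for timestamp, card_number in attempts:
        # Tumbling windows are fixed-size and non-overlapping, so every event
        # falls into exactly one window, identified here by its start time.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, card_number)] += 1
    # Keep only the (window, card) groups that exceed the threshold.
    return {key: n for key, n in counts.items() if n > threshold}


# Hypothetical sample data: (seconds, card number).
attempts = [
    (0, "card-A"), (1, "card-A"), (2, "card-A"), (4, "card-A"),  # 4 attempts in [0, 5)
    (3, "card-B"), (6, "card-B"), (7, "card-B"), (9, "card-B"),  # 1 in [0, 5), 3 in [5, 10)
]
flagged = possible_fraud(attempts)
print(flagged)
```

In this sample only `card-A` is flagged: `card-B` also has four attempts in total, but they are split across two windows and neither window exceeds three. That is the practical consequence of tumbling (non-overlapping) windows: a burst that straddles a window boundary can go undetected.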