Real-Time Analytics with Confluent and MemSQL

MemSQL Meetup, August 11, 2016

Published in: Data & Analytics

  1. Enabling Real-Time Analytics for IoT. Hans Jespersen and Steven Camiña, August 11, 2016
  2. The Rise of Real-Time Analytics: On-demand economy, Internet of Things, New technologies
  3. Industries that Need Real Time: Auto and Transportation, Delivery, Energy, Warehousing and Logistics, Manufacturing, Healthcare
  4. Architecting for Real-Time Analytics. Diagram labels: Data Producers (simulating sensor activity), gateway, gateway, ..., gateway, Message Queue, Data Transformation, Database, User Interface
  5. Architecting for Real-Time Analytics: High-Speed Ingest. Diagram labels: Data Producers (simulating sensor activity), gateway, gateway, ..., gateway, Message Queue, Data Transformation, Database, User Interface
  6. About Confluent
  7. About Confluent and Apache Kafka • Founded by the creators of Apache Kafka • Founded September 2014 • Technology developed while at LinkedIn • 73% of active Kafka committers. Leadership: Cheryl Dalrymple (CFO), Jay Kreps (CEO), Neha Narkhede (CTO, VP Engineering), Luanne Dauber (CMO), Todd Barnett (VP WW Sales), Jabari Norton (VP Business Dev)
  8. What is a Stream Data Platform? Diagram labels: KAFKA Stream Data Platform, Search, NoSQL, RDBMS, Monitoring, Stream Processing, Real-time Analytics, Data Warehouse, Apps, Hadoop. Latency tiers: Synchronous Req/Response (0 – 100s ms), Near Real Time (> 100s ms), Offline Batch (> 1 hour). Build streaming applications; deploy streaming applications at scale; monitor and manage streaming applications. Common Kafka Use Cases • Log data • Database changes • Sensors and device data • Monitoring streams • Call data records • Monitoring • Asynchronous applications • Fraud and security
  9. Confluent Platform. Diagram labels: Alerting, Monitoring, Real-time Analytics, Custom Application, Transformations, Real Time Applications, Apache Kafka, Core, Connectors, Control Center, REST Proxy & Schema Registry, Kafka Streams, Source, Sink, Hadoop, ERP, CRM, Data Warehouse, RDBMS, Data Integration, Connectors, Database Changes, Mobile Devices, IoT, Logs, Website Events. Legend: Confluent Platform; Confluent Platform Enterprise; External Product; Support, Services and Consulting
  10. Confluent Control Center: configures Kafka Connect data pipelines; monitors all pipelines end to end
  11. Architecting for IoT: Confluent Streaming Data Platform. Diagram labels: Data Producers (simulating sensor activity), Streaming Data Ingestion, REST, MQTT, WSS, Kafka Cluster, ..., Data Transformation, Database, User Interface
  12. Architecting for Real-Time Analytics: Fast, Performant Data Storage. Diagram labels: Data Producers (simulating sensor activity), gateway, gateway, ..., gateway, Message Queue, Data Transformation, Database, User Interface
  13. About MemSQL
  14. Designed for Modern Operational Workloads. Scalable SQL ▪ Multi-mode: OLTP, OLAP, HTAP ▪ Multi-model: ANSI SQL, Document/JSON, Geospatial. In-Memory and Solid-State ▪ In-Memory rowstore ▪ Solid-state columnstore ▪ Stream directly to rowstore or columnstore. Distributed ▪ Distributed query optimizer and execution ▪ Scale-out on commodity hardware. Datacenter or Cloud ▪ Deploy on-premises ▪ Cloud agnostic: Amazon, Microsoft, Google, Digital Ocean. Labels: Simple, Real-Time, Low Cost, Flexible, SSD
  15. Real-Time Processing Features ▪ Ecosystem Compatibility • MySQL Wire Protocol • Stream processing through Integrated Apache Spark ▪ In-Memory Performance • Code Compilation for SQL queries • Maximum Concurrency with Lock-free components • Full Data Durability and High Availability ▪ Distributed System Processing • Distributed Database Joins • Distributed Query Optimizer ▪ Multi-mode and Multi-model data • In-Memory Rowstore and Flash/SSD Columnstore • SQL, JSON and Geospatial data
  16. Real-Time Data Pipelines with Spark ▪ MemSQL Streamliner is an integrated MemSQL and Apache Spark solution ▪ Deploys Apache Spark with one click ▪ Creates real-time data pipelines through a graphical UI ▪ Open sourced on GitHub at memsql.github.io/spark-streamliner. Diagram labels: Real-Time Inputs, STREAMLINER, Apache Spark, Extract, Transform, Load, Real-Time Application
  17. MemSQL Ecosystem and Architecture. Diagram labels: Orchestration / Containers, Cloud / On-Premises Platform, Messaging, Inputs, Real-Time Applications, Business Intelligence Dashboards, Relational, Key-Value, Document, Geospatial, Existing Data Stores, Rowstore, Columnstore, Real-Time Data Pipelines, Hadoop, Amazon S3, MySQL
  18. Architecting for Real-Time Analytics: MemSQL Platform. Diagram labels: Data Producers (simulating sensor activity), gateway, gateway, ..., gateway, Message Queue, Data Transformation, Database, User Interface
  19. Architecting for Real-Time Analytics: Real-Time Applications. Diagram labels: Data Producers (simulating sensor activity), gateway, gateway, ..., gateway, Message Queue, Data Transformation, Database, User Interface
  20. MemEx
  21. MemEx: IoT Showcase Application - Combines MemSQL, Apache Kafka, and Spark for global supply chain management - Enables enterprises to predict throughput of supply warehouses - Processes 2 million data points, based on 2,000 sensors across 1,000 warehouses
  22. MemEx Architecture. Diagram labels: Data Producers (simulating sensor activity), gateway, gateway, ..., gateway, Data Transformation, Apache Spark, Spark MLlib Predictive Model, Raw Sensor 1 + Predictive Score 1 (S1, P1, 1), MemEx UI
  23. Classification. BLUE: Minor Damage Type 1; BLACK: training data for machine operating normally; ORANGE: Major Damage Type 2
  24. Live Demo
  25. Q/A
  26. Thank You
  27. Appendix
  28. Top Energy Firm: Real-time drilling sensor data to manage the high stakes of producing oil in a depressed market and to maximize productivity.
  29. TECHNICAL BENEFITS - Enabled machine learning scoring of streaming data for real-time Predictive Analytics - Integrated SAS BI PMML for deep analytics - Joined multiple data types and third-party sources including geospatial and weather data
  30. Diagram labels: REAL-TIME INPUTS, Streamliner, Spark MLlib Predictive Model, Raw Sensor 1 + Predictive Score 1 (S1, P1, 1), BUSINESS LOGIC
  31. Continued Rise of IoT. “By 2020, over 20 billion connected things will be in use across a range of industries; the IoT will touch every role across the enterprise.” Source: Gartner. Diagram labels: Sensor Array, PoS Systems, Connected Fleets, Mobile Apps, Security Reporting Systems, Log Systems, Data Lake, Data Warehouse, Databases
  32. Amazon Invests in Drones for 30-Minute Post-Order Deliveries. “These are highly automated drones. They have what is called sense-and-avoid technology. That means, basically, seeing and then avoiding obstacles.” Yahoo, January 2016: https://www.yahoo.com/tech/exclusive-amazon-reveals-details-about-1343951725436982.html
  33. FedEx Breaks Record With 317 Million Packages Shipped Over Christmas 2015. “FedEx Ground continues to advance the industry’s most automated hub network with investments in package sortation systems that enable flexible and reliable operations and six-sided scanning tunnels that boost data and image capture.” FedEx, October 2015: http://about.van.fedex.com/newsroom/global-english/fedex-forecasts-record-volume-this-holiday-season/
  34. The Evolution of Data Analytics: Descriptive Analytics, Predictive Analytics, Real-Time Analytics
  35. High-Speed Ingest. Diagram labels: Data Producers (simulating sensor activity), Message Queue, STREAMLINER, Apache Spark, Real-Time Application
  36. MemEx Architecture: High-Speed Ingest. Diagram labels: Data Producers (simulating sensor activity), STREAMLINER, Apache Spark, Real-Time Application
  37. Top supply chain companies are turning to the adoption of advanced analytics to improve supply chain functions. Source: Gartner
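
The MemEx architecture slides describe a flow from simulated sensor producers, through a message queue and a transformation step that attaches a predictive score, into a database. The sketch below is only an illustration of that shape, not the demo's code: it uses the Python standard library, with `queue.Queue` standing in for the message queue, an in-memory `sqlite3` database standing in for the database, and a toy threshold rule standing in for the trained predictive model. All function names, table columns, thresholds, and label strings are hypothetical.

```python
import queue
import random
import sqlite3

# Hypothetical labels, loosely mirroring the three classes on the Classification slide.
LABELS = ("normal", "minor_damage", "major_damage")


def simulate_readings(n_warehouses, sensors_per_warehouse, seed=0):
    """Yield one fake reading per sensor; stands in for the data producers."""
    rng = random.Random(seed)
    for w in range(n_warehouses):
        for s in range(sensors_per_warehouse):
            yield {"warehouse_id": w, "sensor_id": s, "value": rng.uniform(0.0, 100.0)}


def score(value):
    """Toy threshold rule (assumed values); a stand-in for a trained predictive model."""
    if value < 70.0:
        return LABELS[0]
    if value < 90.0:
        return LABELS[1]
    return LABELS[2]


def run_pipeline(n_warehouses=3, sensors_per_warehouse=4):
    """Producers -> queue -> scoring -> database; returns the open connection."""
    q = queue.Queue()  # stand-in for the message queue
    for reading in simulate_readings(n_warehouses, sensors_per_warehouse):
        q.put(reading)

    db = sqlite3.connect(":memory:")  # stand-in for the database
    db.execute(
        "CREATE TABLE readings "
        "(warehouse_id INTEGER, sensor_id INTEGER, value REAL, prediction TEXT)"
    )
    # Transformation step: store each raw reading together with its score.
    while not q.empty():
        r = q.get()
        db.execute(
            "INSERT INTO readings VALUES (?, ?, ?, ?)",
            (r["warehouse_id"], r["sensor_id"], r["value"], score(r["value"])),
        )
    db.commit()
    return db


if __name__ == "__main__":
    db = run_pipeline()
    # The kind of aggregate a dashboard might query: predictions per warehouse.
    for row in db.execute(
        "SELECT warehouse_id, prediction, COUNT(*) FROM readings "
        "GROUP BY warehouse_id, prediction"
    ):
        print(row)
```

Storing the raw reading and its score in the same row mirrors the "Raw Sensor 1 + Predictive Score 1" label on the architecture slide; a real deployment would replace each stand-in with the corresponding system.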
