IIoT / Industry 4.0 with Apache Kafka, Connect, KSQL, Apache PLC4X

Data integration and processing are a huge challenge in Industrial IoT (IIoT, aka Industry 4.0 or Automation Industry) due to monolithic systems and proprietary protocols. Apache Kafka, its ecosystem (Kafka Connect, KSQL) and Apache PLC4X are a great open source choice for implementing this integration end to end in a scalable, reliable and flexible way.

This blog post gives a high-level overview of the challenges and of a good, flexible architecture. At the end, I share a video recording and the corresponding slide deck, which provide many more details and insights.

Apache Kafka is the de facto standard for real-time event streaming. It provides:

Open Source (Apache 2.0 License)
Global-scale
Real-time
Persistent Storage
Stream Processing
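
To make "persistent storage" a little more concrete: Kafka keeps events in an append-only log, and each consumer tracks its own read position, so reading never removes data. The following is a minimal in-memory sketch of that idea in plain Java. It is not the Kafka client API; the class name `CommitLogSketch`, its methods and the sample records are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative in-memory commit log; NOT the Kafka client API.
class CommitLogSketch {
    // Records are only ever appended, never modified or removed.
    private final List<String> records = new ArrayList<>();
    // Each consumer group tracks its own read position (offset).
    private final Map<String, Integer> offsets = new HashMap<>();

    // Appends a record and returns the offset it was stored at.
    int append(String record) {
        records.add(record);
        return records.size() - 1;
    }

    // Returns the records this group has not read yet and advances its offset.
    List<String> poll(String group) {
        int from = offsets.getOrDefault(group, 0);
        List<String> batch = new ArrayList<>(records.subList(from, records.size()));
        offsets.put(group, records.size());
        return batch;
    }

    public static void main(String[] args) {
        CommitLogSketch log = new CommitLogSketch();
        log.append("temperature=71");
        log.append("temperature=74");
        System.out.println(log.poll("monitoring")); // both records
        log.append("temperature=90");
        System.out.println(log.poll("monitoring")); // only the new record
        System.out.println(log.poll("analytics"));  // all three: reads are non-destructive
    }
}
```

Because the second group still sees every record, a new consumer (for example a batch analytics job) can be added later without touching the producers. A real Kafka cluster additionally partitions and replicates the log across brokers, which this sketch leaves out.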

PLC4X enables vertical integration and lets you write software independent of specific PLCs, using JDBC-like adapters for various protocols such as Siemens S7, Modbus, Allen Bradley, Beckhoff ADS, OPC-UA, Emerson, Profinet, BACnet and Ethernet/IP.
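
What "JDBC-like" means here can be sketched in a few lines: application code asks a registry for a connection by URL, and the URL scheme selects the protocol driver, so the calling code stays the same whichever PLC sits behind it. The sketch below is plain Java and is not the PLC4X API. All names (`PlcConnectionSketch`, `PlcDriverRegistrySketch`), the URL format, the address string and the fake drivers are assumptions made for illustration; the PLC4X documentation describes the real interfaces.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical names throughout; this is NOT the PLC4X API.
interface PlcConnectionSketch {
    // Reads one value from a protocol-specific address.
    String read(String address);
}

class PlcDriverRegistrySketch {
    // A driver knows how to open a connection for one protocol.
    interface Driver {
        PlcConnectionSketch connect(URI uri);
    }

    private final Map<String, Driver> drivers = new HashMap<>();

    // Registers a driver under a URL scheme such as "s7" or "modbus".
    void register(String scheme, Driver driver) {
        drivers.put(scheme, driver);
    }

    // Picks the driver from the URL scheme, the way JDBC picks one from "jdbc:mysql:...".
    PlcConnectionSketch connect(String url) {
        URI uri = URI.create(url);
        Driver driver = drivers.get(uri.getScheme());
        if (driver == null) {
            throw new IllegalArgumentException("No driver for scheme: " + uri.getScheme());
        }
        return driver.connect(uri);
    }

    public static void main(String[] args) {
        PlcDriverRegistrySketch registry = new PlcDriverRegistrySketch();
        // Fake drivers that only echo what they would read; real drivers speak the wire protocol.
        registry.register("s7", uri -> address -> "s7@" + uri.getHost() + ":" + address);
        registry.register("modbus", uri -> address -> "modbus@" + uri.getHost() + ":" + address);

        PlcConnectionSketch plc = registry.connect("s7://192.168.0.10");
        System.out.println(plc.read("DB1.DBW0")); // prints s7@192.168.0.10:DB1.DBW0
    }
}
```

Swapping the URL to a `modbus://` address changes the driver but not the calling code, which is the property that makes a single Kafka Connect connector for many PLC protocols feasible.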

Github example: https://github.com/kaiwaehner/iiot-integration-apache-plc4x-kafka-connect-ksql-opc-ua-modbus-siemens-s7

More details: http://www.kai-waehner.de/blog/2019/09/02/iiot-data-integr…and-apache-plc4x/

Video Recording: https://youtu.be/RWKggid25ds

Published in: Software

  1. 1. 1 Flexible and Scalable Integration in Automation Industry / Industrial IoT: Kafka-Native End-to-End IIoT Data Integration and Processing with Kafka Connect, KSQL and Apache PLC4X. Kai Waehner, Technology Evangelist, contact@kai-waehner.de, LinkedIn, @KaiWaehner, www.confluent.io, www.kai-waehner.de
  2. 2. 2 Agenda 1) Modern IIoT Use Cases around Cloud, Big Data, Machine Learning 2) Automation Industry and its Challenges 3) Architecture for End-to-End Integration from Edge to Data Center / Cloud 4) Apache Kafka as Event Streaming Platform 5) Apache PLC4X for Edge Integration 6) Example: Supply Chain Optimization at Scale in Real Time
  3. 3. 3 Agenda 1) Modern IIoT Use Cases around Cloud, Big Data, Machine Learning 2) Automation Industry and its Challenges 3) Architecture for End-to-End Integration from Edge to Data Center / Cloud 4) Apache Kafka as Event Streaming Platform 5) Apache PLC4X for Edge Integration 6) Example: Supply Chain Optimization at Scale in Real Time
  4. 4. 4 Business Digitalization Trends are Driving the Need to Process Events at a whole new Scale, Speed and Efficiency Mobile Cloud Microservices Internet of Things Machine Learning The world has changed!
  5. 5. 5 Industry 4.0 / Industrial IoT (IIoT)
  6. 6. 6 Some IIoT use cases. Analytics: • Ingest data into cloud for analytics • Reduce cost: Leverage open frameworks instead of paying very expensive licenses per machine • Flexible integration (select data to ingest, flexible changes over time) • Machine Learning / Data Science. Manufacturing: • Collect data from machines → Preprocess + monitoring to optimize assembly line and reduce cost • Aggregate data from different machines / companies → Leverage (and sell?) insights • Sell services on top of machines → Predictive maintenance (remote) • Scale up (add more sites, add more data). Production Robots: • Ingest, process and monitor large volumes of data (where the proprietary monolith does not scale). Smart Factories: • Monitor and manage the whole factory (at scale, in real time, flexible) • Integration with legacy proprietary protocols and modern cloud-native technologies
  7. 7. 7 Agenda 1) Modern IIoT Use Cases around Cloud, Big Data, Machine Learning 2) Automation Industry and its Challenges 3) Architecture for End-to-End Integration from Edge to Data Center / Cloud 4) Apache Kafka as Event Streaming Platform 5) Apache PLC4X for Edge Integration 6) Example: Supply Chain Optimization at Scale in Real Time
  8. 8. 8 History of Automation Industry vs. Big Data and Cloud Christofer Dutz (codecentric) https://foss-backstage.de/sites/foss-backstage.de/files/2018-07/Revolutionizing%20Industrial%20IoT%20with%20Apache%20PLC4X.pdf
  9. 9. 9 Challenges in Automation Industry. IoT != IIoT: • IoT = Connected cars, smart home, … → Large scale, secure, scalable, open, modern technologies • IIoT = Slow, insecure, not scalable, proprietary. Legacy / Proprietary IIoT Technologies: • Usually incompatible protocols, typically proprietary • Usually serial connections (very low latency, nanoseconds) with a TCP / UDP wrapper around them to integrate with the “external world” • Siemens S7, Modbus, Beckhoff, Profinet, Allen Bradley, etc. • OPC-UA (requires machine update + license cost). Product Lifecycles: • Long lifecycle (decades) • Factories cost millions, no simple changes / upgrades • Still using Windows 7 without Service Packs => Usability and security issues • Mantra: “Stay with your well known vendor forever”
  10. 10. 10 Challenges in Automation Industry. Monoliths: • No scalability • No extensibility • No real failover (start your backup machine). Missing Security Capabilities: • Security in software development == Authentication, Authorization, Antivirus, SSL, SASL, Kerberos • Security in automation industry == Safety • “if you press the red button, the machine stops immediately” • Insecure by nature => No Authentication / Authorization / Encryption • Mantra: “Our factory building and network are secure, no access from outside” • Contradicts “move to cloud and big data analytics”
  11. 11. 11 PLC (Programmable Logic Controller) • Started in the early 1970s • Control of manufacturing processes • Small grey box • ~100 messages per second, stored to CSV file, Windows Share • Limited operations: Read (90+%), Write, Subscribe, Call Functions, List Resources • High reliability control, ease of programming and process fault diagnosis • Hardwire → softwire • Has Input / Sensors, Output / Actuators • Firmware (= operating system) • Mechanism to load user programs • Highly fragmented market • S7 (Siemens), Beckhoff ADS, Modbus (Asia), Ethernet/IP, KNX, Emerson DeltaV, Profinet, Allen Bradley, etc. • State of the art in automation industry
  12. 12. 12 Example: Siemens S7 Communication. When communicating with S7 devices, there is a whole family of protocols that can be used. In general you can divide them into Profinet protocols and S7 Comm protocols. The latter are far simpler in structure, but also far less documented. The S7 Comm protocols are generally split up into two flavors: the classic S7 Comm and a newer version unofficially called S7 Comm Plus. https://plc4x.apache.org/protocols/s7/index.html
  13. 13. 13 Trends: ~50% of industrial assets in factories will be connected by 2020 https://iot-analytics.com/5-industrial-connectivity-trends-driving-the-it-ot-convergence
  14. 14. 14 Trends: Evolution of Convergence between IT and Industrial Automation https://iot-analytics.com/5-industrial-connectivity-trends-driving-the-it-ot-convergence
  15. 15. 15 How to get from legacy, proprietary to cloud, big data, machine learning?
  16. 16. 16 Costly and inflexible legacy Integration between IIoT and other Systems [Diagram labels: Modbus, S7, Siemens, Schneider Electric, Monolith, Integration Middleware]
  17. 17. 17 Huge demand to build an open, flexible, scalable platform • Cost reduction • Flexibility • Standards-based • Scalability • Extendibility • Security • Infrastructure-independent
  18. 18. 18 Agenda 1) Modern IIoT Use Cases around Cloud, Big Data, Machine Learning 2) Automation Industry and its Challenges 3) Architecture for End-to-End Integration from Edge to Data Center / Cloud 4) Apache Kafka as Event Streaming Platform 5) Apache PLC4X for Edge Integration 6) Example: Supply Chain Optimization at Scale in Real Time
  19. 19. 19 IIoT Architecture (High Level): How to integrate and process data reliably and at scale? [Diagram labels: Machine, Sensor, Devices, Gateway, Connect w/ MQTT connector, Kafka Brokers (Streaming Platform), Analytics (Real Time), Predictive Maintenance (Near Real Time), Machine Learning (Batch), spanning Edge and Data Center / Cloud]
  20. 20. 20 Vendor-Neutral IoT Architectures across Edge, On Premise and Multi-Cloud On-Premise / Edge Deploy on bare-metal, VMs, containers or Kubernetes in your datacenter with Confluent Platform and Confluent Operator Public Cloud Implement self-managed in the public cloud or adopt a fully managed service with Confluent Cloud Hybrid Cloud Build a persistent bridge between datacenter and cloud with Confluent Replicator Confluent Replicator VM SELF MANAGED FULLY MANAGED
  21. 21. Data Lake Batch Analytics Event Streaming Platform Batch Integration Real Time Pre- processing Machine Sensors Streaming Platform Other Components Real Time Processing (6b) All Data (7) Potential Defect (3) Read Data Optimization / Analytics (5) Deploy Optimization Model (8b) Alert Person (e.g. Mobile App) (2) Preprocess Data (6a) Consume machine data Model Standard based Integration (8a) Stop Machine (1) Ingest Data Real Time Edge Computing Model Lite Real Time App Model Server RPC PLC Proprietary based Integration Standard Interface Proprietary Interface (9) Manual user-based analytics and reporting to find insights and improve real time process
  22. 22. 22 Agenda 1) Modern IIoT Use Cases around Cloud, Big Data, Machine Learning 2) Automation Industry and its Challenges 3) Architecture for End-to-End Integration from Edge to Data Center / Cloud 4) Apache Kafka as Event Streaming Platform 5) Apache PLC4X for Edge Integration 6) Example: Supply Chain Optimization at Scale in Real Time
  23. 23. 23 The beginning of a new Era https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying The first use case. This is why Kafka was created!
  24. 24. 24 The Log ConnectorsConnectors Producer Consumer Streaming Engine Apache Kafka—The Rise of an Event Streaming Platform
  25. 25. 25 Apache Kafka: The De-facto Standard for Real-Time Event Streaming ● Global-scale ● Real-time ● Persistent Storage ● Stream Processing [Diagram labels: Apache Kafka connecting Edge, Cloud, Data Lake, Databases, Datacenter, IoT, SaaS Apps, Mobile, Microservices, Machine Learning]
  26. 26. 26 Apache Kafka at Scale at Tech Giants > 4.5 trillion messages / day > 6 Petabytes / day “You name it” * Kafka Is not just used by tech giants ** Kafka is not just used for big data
  27. 27. 27 Confluent - Business Value per Use Case Improve Customer Experience (CX) Increase Revenue (make money) Business Value Decrease Costs (save money) Core Business Platform Increase Operational Efficiency Migrate to Cloud Mitigate Risk (protect money) Key Drivers Strategic Objectives (sample) Fraud Detection IoT sensor ingestion Digital replatforming/ Mainframe Offload Connected Car: Navigation & improved in-car experience: Audi Customer 360 Simplifying Omni-channel Retail at Scale: Target Faster transactional processing / analysis incl. Machine Learning / AI Mainframe Offload: RBC Microservices Architecture Online Fraud Detection Online Security (syslog, log aggregation, Splunk replacement) Middleware replacement Regulatory Digital Transformation Application Modernization: Multiple Examples Website / Core Operations (Central Nervous System) The [Silicon Valley] Digital Natives; LinkedIn, Netflix, Uber, Yelp... Predictive Maintenance: Audi Streaming Platform in a regulated environment (e.g. Electronic Medical Records): Celmatix Real-time app updates Real Time Streaming Platform for Communications and Beyond: Capital One Developer Velocity - Building Stateful Financial Applications with Kafka Streams: Funding Circle Detect Fraud & Prevent Fraud in Real Time: PayPal Kafka as a Service - A Tale of Security and Multi-Tenancy: Apple Example Use Cases $↑ $↓ $ Example Case Studies (of many)
  28. 28. 28 Apache Kafka - A Distributed Commit Log Writers Kafka cluster Readers
  29. 29. 29 Kafka Topics my-topic my-topic-partition-0 my-topic-partition-1 my-topic-partition-2 broker-1 broker-2 broker-3
  30. 30. 30 P Decoupled Producers and Consumers Time C2 C3C1
  31. 31. 31 Partition Leadership and Replication Broker 1 Topic1 partition1 Broker 2 Broker 3 Broker 4 Topic1 partition1 Topic1 partition1 Leader Follower Topic1 partition2 Topic1 partition2 Topic1 partition2 Topic1 partition3 Topic1 partition4 Topic1 partition3 Topic1 partition3 Topic1 partition4 Topic1 partition4
  32. 32. 32 Confluent Schema Registry
  33. 33. 33 Kafka Streams Your app sinksource KafkaConnect KafkaConnect Kafka Cluster Apache Kafka includes Kafka Connect and Kafka Streams
  34. 34. 34 Kafka Streams ● No separate processing cluster required ● Develop on Mac, Linux, Windows ● Deploy to containers, VMs, bare metal, cloud ● Powered by Kafka: elastic, scalable, distributed, battle-tested ● Perfect for small, medium, large use cases ● Fully integrated with Kafka security ● Exactly-once processing semantics ● Part of Apache Kafka KStream<User, PageViewEvent> pageViews = builder.stream("pageviews-topic"); KTable<Windowed<User>, Long> viewsPerUserSession = pageViews .groupByKey() .count(SessionWindows.with(TimeUnit.MINUTES.toMillis(5)), "session-views"); https://docs.confluent.io/current/streams/ Write standard Java apps and microservices to process your data in real-time
  35. 35. 35 KSQL: Enable Stream Processing using SQL-like Semantics Leverage Kafka Streams API using simple SQL commands KSQL server Engine (runs queries) REST API CLIClients Confluent Control Center GUI Kafka Cluster Use any programming language Connect via Control Center UI, CLI, REST or deploy in headless mode
  36. 36. 36 Event Transformation with Stream Processing. Kafka Streams: Apache Kafka library to write real-time applications and microservices in Java and Scala. Confluent KSQL: The streaming SQL engine for Apache Kafka (confluent.io/product/ksql). CREATE STREAM fraudulent_payments AS SELECT * FROM payments WHERE fraudProbability > 0.8; You write only SQL. No Java, Python, or other boilerplate to wrap around it!
  37. 37. 37 Kafka Connect ● Centralized management and configuration ● Support for hundreds of technologies including RDBMS, Elasticsearch, HDFS, S3 ● Supports CDC ingest of events from RDBMS ● Preserves data schema ● Fault tolerant and automatically load balanced ● Extensible API ● Single Message Transforms ● Part of Apache Kafka { "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector", "connection.url": "jdbc:mysql://localhost:3306/demo?user=rmoff&password=foo", "table.whitelist": "sales,orders,customers" } https://docs.confluent.io/current/connect/ Reliable and scalable integration of Kafka with other systems
  38. 38. 38 Connect External Data Sources and Sinks with Connectors SOURCES SINKS CDC Connectors developed and supported by Confluent, partners and the open source community available on confluent.io/hub
  39. 39. 39 IoT Integration with Kafka Connect, MQTT and REST Proxy Video and Slides: https://www.confluent.io/kafka-summit-sf18/processing-iot-data-from-end-to-end
  40. 40. 40 Native, decoupled Integration between IIoT and other Systems ModbusSiemens S7 Siemens S7 Siemens S7 Modbus Modbus Modbus Kafka Connect Kafka Connect Siemens S7 ?
  41. 41. 41 Agenda 1) Modern IIoT Use Cases around Cloud, Big Data, Machine Learning 2) Automation Industry and its Challenges 3) Architecture for End-to-End Integration from Edge to Data Center / Cloud 4) Apache Kafka as Event Streaming Platform 5) Apache PLC4X for Edge Integration 6) Example: Supply Chain Optimization at Scale in Real Time
  42. 42. 42 Apache PLC4X • Top Level Apache project • PLC 4 (for) X (anything) • Goal: Open up PLC interfaces to outside world • Vertical integration • Write software independent of PLC • JDBC-like Adapters for various protocols https://plc4x.apache.org/
  43. 43. 43 Code Example – Connection to Siemens S7 PLC Feels like JDBC
  44. 44. 44 Native, decoupled Integration between IIoT and other Systems ModbusSiemens S7 Siemens S7 Siemens S7 Modbus Modbus ModbusSiemens S7 Kafka Connect
  45. 45. 45 One more thing → PLC4X vs. OPC-UA. OPC-UA: • Open standard • All the pros and cons of an open standard (works with different vendors; slow adoption; inflexible, etc.) • Often poorly implemented • Requires app server on top of PLC • Every device has to be retrofitted with the ability to speak a new protocol and use a common client to speak with these devices • Often overengineering for just reading the data • Activating OPC-UA support on existing PLCs greatly increases the load on the PLCs • With licensing cost for every machine. PLC4X: • Open source framework (Apache 2.0 license) • Provides unified API by implementing drivers for communicating with most industrial controllers in the protocols they natively understand • No need to modify existing hardware • No increased load on the PLCs • No need to pay for licenses to activate OPC-UA support • Drivers being implemented from the specs or by reverse engineering protocols in order to be fully Apache 2.0 licensed • PLC4X adapter for OPC-UA available -> Both can be used together!
  46. 46. 46 Agenda 1) Modern IIoT Use Cases around Cloud, Big Data, Machine Learning 2) Automation Industry and its Challenges 3) Architecture for End-to-End Integration from Edge to Data Center / Cloud 4) Apache Kafka as Event Streaming Platform 5) Apache PLC4X for Edge Integration 6) Example: Supply Chain Optimization at Scale in Real Time
  47. 47. Spark Notebooks (Jupyter) Kafka Cluster Kafka Connect KSQL Machine Sensors Kafka Ecosystem Other Components Real Time Kafka Streams Application (Java / Scala) (6b) All Data (7) Potential Defect (3) Read Data TensorFlow I/O TensorFlow (5) Deploy Model (2) Preprocess Data (6a) Consume machine data TensorFlow File HTTP MQTT ROS (8a) Stop Machine (1) Ingest Data Real Time Edge Computing (C / librdkafka) TensorFlow Lite Real Time Kafka App TensorFlow Serving HTTP / gRPC (4) Train Model PLC Beckhoff S7 Modbus Allen Bradley OPC-UA PLC4X Connector Kafka Connect Standard Interface Proprietary Interface (8b) Alert Person (e.g. Mobile App) (9) Manual user-based analytics and reporting to find insights and improve real time process
  48. 48. Example Project: Supply Chain Optimization in Real Time at Scale
  49. 49. Planners forecast long term schedule Production begins IOT data from production: inventories, manufacturing machines, yield metrics Production forecast Forecasted production - plan diffs Re optimize plan based on actuals Change orders to supply chain: inventory, manufacturing schedules Change operational characteristics : plant 223 needs new Al extruder Customer delivery SLAs: actuals vs. plan Streaming analytics using Confluent Batch analytics using other frameworks Physical operations UI UI UIUI (Reference use case implemented with our partner Expero)
  50. 50. Planners forecast long term schedule Production begins IOT data from production: inventories, manufacturing machines, yield metrics Production forecast Forecasted production - plan diffs Re optimize plan based on actuals Change orders to supply chain: inventory, manufacturing schedules Change operational characteristics : plant 223 needs new Al extruder Customer delivery SLAs: actuals vs. plan UI UI UIUI Kafka Connect + PLC4X Connector Machine Sensors Kafka Cluster KSQL Tensor Flow Kafka Connect Notebooks (Jupyter) Spark Real Time Kafka App Streaming analytics using Confluent Batch analytics using other frameworks Physical operations TensorFlow Serving (Reference use case implemented with our partner Expero)
  51. 51. 51 Supply Chain Optimization in Real Time at Scale Slides and Video Recording: http://www.kai-waehner.de/blog/2019/08/23/apache-kafka-machine-learning-for-real-time-supply-chain-iiot-opcua-modbus/
  52. 52. Why for IIoT Projects?
  53. 53. 53 Confluent Platform The Event Streaming Platform Built by the Original Creators of Apache Kafka® Operations and Security Development & Stream Processing Apache Kafka Confluent Platform Support,Services, Training,&Partners Mission-Critical Reliability Complete Event Streaming Platform Freedom of Choice Datacenter Public Cloud Confluent Cloud Self-Managed Software Fully Managed Service
  54. 54. 56 Confluent Platform – Benefits for IoT Projects • Based on open source and de facto standards for IoT projects • Low license / subscription costs for Confluent support / services / training (compared to traditional IoT vendors + their products) • Spend budget for consulting to realize the project successfully • Mission critical deployments at large scale in various industries • Automotive, Manufacturing, Logistics, Oil&Gas, Retail, Telco, … • Flexible architecture • Lightweight infrastructure footprint on commodity hardware • Pick what you need • Deploy where you want • Complementary to other frameworks, technologies (e.g. Siemens MindSphere, Cisco Kinetic) and cloud services (e.g. Google Cloud IoT) • Customize and build for the specific customer use case • Battle-tested at large scale • Event Streaming Platform for real time integration and processing (plus integration to batch, file and other communication protocols) • Security and reliability as core concepts • Elastic scalability, start small and grow to extreme scale easily • Partner (open source) technologies for specific integrations (like HiveMQ or PLC4X) • Integration with any legacy and modern technology • IoT standards like MQTT or OPC-UA • Legacy and proprietary IIoT protocols like Modbus, Siemens S7, Beckhoff, Allen Bradley, etc. • Modern technologies like S3, HDFS, MongoDB, etc. • Modern applications (business services like Salesforce and IoT solutions like Siemens MindSphere)
  55. 55. 57 Confluent and IoT Platform Solutions Kafka Cluster Siemens MindSphere KSQL Machine Sensors File HTTP MQTT ROS PLC Beckhoff S7 Modbus OPC-UA “you-name-it” PLC4X Connector Kafka Connect Azure IoT Hub Framework or solution? Or both as complementary technologies? S7 PLC
  56. 56. 58 Kai Waehner Technology Evangelist contact@kai-waehner.de @KaiWaehner www.kai-waehner.de www.confluent.io LinkedIn Questions? Feedback? Let’s connect!
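
The Kafka Streams snippet on slide 34 counts page views per user session with a five-minute session window. What that windowing does can be sketched in plain Java without any Kafka dependency: consecutive events of one key belong to the same session until the gap between them exceeds the inactivity gap. This is only an illustration of the concept, not the Kafka Streams API. The class `SessionWindowSketch`, the sample timestamps and the boundary rule (a gap strictly greater than the limit starts a new session) are assumptions of this sketch. It also expects timestamps sorted in ascending order, whereas Kafka Streams handles out-of-order events by merging sessions.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of session windowing; NOT the Kafka Streams API.
class SessionWindowSketch {
    // Counts events per session for a single key.
    // Timestamps are in milliseconds and must be sorted ascending.
    static List<Integer> countPerSession(long[] timestampsMs, long inactivityGapMs) {
        List<Integer> counts = new ArrayList<>();
        int current = 0;
        for (int i = 0; i < timestampsMs.length; i++) {
            // Assumption: a gap strictly greater than the limit closes the running session.
            boolean startsNewSession =
                    i > 0 && timestampsMs[i] - timestampsMs[i - 1] > inactivityGapMs;
            if (startsNewSession) {
                counts.add(current);
                current = 0;
            }
            current++;
        }
        if (current > 0) {
            counts.add(current); // close the last open session
        }
        return counts;
    }

    public static void main(String[] args) {
        long minute = 60_000L;
        // Page views of one user at minutes 0, 1, 3, 20 and 22.
        long[] pageViews = {0, 1 * minute, 3 * minute, 20 * minute, 22 * minute};
        System.out.println(countPerSession(pageViews, 5 * minute)); // prints [3, 2]
    }
}
```

The 17-minute pause between the third and fourth view exceeds the five-minute gap, so the five views fall into two sessions. The same pattern applies to machine data, for example counting sensor readings per production run.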