CEP - simplified streaming architecture - Strata Singapore 2016

We describe an application of CEP using a microservice-based streaming architecture. We use the Drools business rule engine to apply rules in real time to an event stream of IoT traffic sensor data.

Published in: Technology

  • It is simply not true that ML solves all problems. ML makes predictions, which is very useful. But most business processes don’t need a prediction at every step; they are more like a series of steps with conditionals arranged in a DAG.
  • Rules need to be:
    Independent
    Easily Updated (Add, Change, Delete)
    Rules apply only to the minimum set of relevant data
    Allow business domain experts to contribute
  • Integrate Flink/Spark Streaming with Drools
  • Performance and Scalability Testing
  • Flink brings “for free” lots of benefits:
    State is saved automatically by checkpoints
    Fault-recovery for Drools state is simplified
    Record-at-a-time processing is a good model to add data to KieSession

    1. © 2016 MapR Technologies, MapR Confidential
       CEP - A Simplified Enterprise Architecture for Real-time Stream Processing
       Mathieu Dumoulin, Data Engineer (mdumoulin@mapr.com, @lordxar)
    2. Mathieu Dumoulin
       • Living in Tokyo, Japan for the last 3 years
       • Data Engineer for MapR Professional Services
       • Other jobs: Data Scientist, Search Engineer
       • Connect with me:
         – Read my blog posts: https://www.mapr.com/blog/author/mathieu-dumoulin
         – Twitter: @Lordxar
         – Email: mdumoulin@mapr.com
    3. Content Summary
       1. Complex Event Processing
       2. Streaming Architecture
       3. Rules Engines for CEP
       4. Simplified Hadoop-based CEP Architecture
       5. Live Demo
       6. Does it scale?
       7. Conclusion
    4. Complex Event Processing (CEP)
       Some terminology:
       • Event: Data with a timestamp (a log event, a transaction, ...)
       • Event processing: Tracking and analyzing streaming event data
       • Complex event processing: Identifying meaningful events and responding to them as quickly as possible, usually over a sliding window on the stream of event data
       CEP is just a fancy way to do business rules on streaming data.
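
The sliding-window idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the talk: the class name `SlidingWindowCounter`, the 10-second window, and the threshold of 3 are all hypothetical, and timestamps are assumed to arrive in order.

```python
from collections import deque


class SlidingWindowCounter:
    """Tracks event timestamps seen in the last `window_seconds` (hypothetical helper)."""

    def __init__(self, window_seconds, threshold):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.timestamps = deque()

    def add(self, timestamp):
        """Record an event; return True if the window now holds >= threshold events."""
        self.timestamps.append(timestamp)
        # Evict events that fell out of the window ending at `timestamp`.
        while self.timestamps[0] <= timestamp - self.window_seconds:
            self.timestamps.popleft()
        return len(self.timestamps) >= self.threshold


counter = SlidingWindowCounter(window_seconds=10, threshold=3)
alerts = [counter.add(t) for t in [0, 4, 8, 30, 31, 32]]
print(alerts)
```

The rule fires on the third event of each burst (at t=8 and t=32) and stays quiet once older events have aged out of the window.
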
    5. IoT: Needs some CEP in There Somewhere
    6. CEP in Action
       The power of CEP comes from being able to detect complex situations that could not be detected from any single piece of data on its own.
       Events shown: Window opened, Motion Sensor, Light turned on, Door opened
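
As a rough sketch of what "detecting a complex situation" can mean, the Python below reports the moment at which every event type in a required set has been seen within one time window. The event names mirror the slide; the function `detect_pattern`, the 30-second window, and the idea that this combination is the situation of interest are assumptions made for illustration.

```python
def detect_pattern(events, required, window_seconds):
    """Return the timestamp at which every event type in `required` has
    occurred within `window_seconds`, or None if that never happens.

    `events` is a list of (timestamp, event_type) pairs sorted by time.
    """
    last_seen = {}
    for timestamp, event_type in events:
        if event_type in required:
            last_seen[event_type] = timestamp
        if len(last_seen) == len(required):
            # All types seen at least once: check that the oldest is still recent.
            if timestamp - min(last_seen.values()) <= window_seconds:
                return timestamp
    return None


events = [
    (0, "window_opened"),
    (5, "motion"),
    (7, "light_on"),
    (9, "door_opened"),
]
required = {"window_opened", "motion", "light_on", "door_opened"}
detected_at = detect_pattern(events, required, window_seconds=30)
print(detected_at)
```

No single event is alarming here; only the combination inside the window triggers a detection.
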
    7. Actually, CEP Has Been Around For a While
       Taken from the March 2010 issue of the Dutch Java Magazine (source)
    8. Technology Has Been Holding Rule Engines Back
       • Rule engines are not new
         – First papers from the 90’s, many implementations in the early 2000’s
       • The engine runs in memory on a single node
         – A few GB of memory (or less) was a severe limitation
         – A single-core CPU can only do so much
       • Need modern stream messaging (Kafka, MapR Streams)
         – Need persistence
         – Need speed
       • No standard, no dominant sponsor
         – 90’s and early 2000’s dominated by Microsoft
         – OSS had not come of age in enterprise IT
    9. CEP in a Modern Enterprise Data Pipeline
       Source: Oracle / Rittman Mead Information Management Reference Architecture
    10. Modern Streaming Architecture
        • Build flexible systems
          – More efficient and easier to build
          – Decouples dependencies
        • Better model the way business processes take place
        • More value now
          – Aggregates data from many sources once
          – Serves data to one or many projects immediately
        • More value later
          – Run batch analytics on the data later
          – Reprocess the data with different algorithms later
    11. Kafka-esque Messaging for Rule Engines
        • Stream persistence is a key feature
        • CEP is only one use case
          – Support batch analytics and ad-hoc analysis from the same data stream
        • Compensate for current rule engine limitations
          – Enables hot replacement for fault tolerance
          – Enables simple horizontal scaling by partitioning data and rules
        • Convergence
          – Run this use case on your existing, standard, big data technology
          – Use OSS frameworks and open APIs
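
To make the persistence points concrete, here is a small in-memory stand-in for a persisted topic. It is not the Kafka or MapR Streams API; `PersistentStream`, the group names, and the sample records are hypothetical. It shows why a replacement consumer can pick up where a failed one stopped, and why a second use case can read the same data independently.

```python
class PersistentStream:
    """In-memory stand-in for a persisted topic (hypothetical, not a real client API)."""

    def __init__(self):
        self.log = []        # append-only list of records
        self.committed = {}  # consumer group -> next offset to read

    def publish(self, record):
        self.log.append(record)

    def poll(self, group, max_records=10):
        """Return unread records for `group` and advance its committed offset."""
        start = self.committed.get(group, 0)
        batch = self.log[start:start + max_records]
        self.committed[group] = start + len(batch)
        return batch


stream = PersistentStream()
for i in range(5):
    stream.publish({"sensor": "s1", "speed": 40 + i})

first = stream.poll("cep", max_records=3)  # CEP consumer reads 3 records, then fails
resumed = stream.poll("cep")               # its replacement resumes at offset 3
batch = stream.poll("analytics")           # a separate group reads everything
```

Because records stay in the log after being read, the rule engine process becomes replaceable, and batch or ad-hoc analysis needs no second copy of the data.
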
    12. Roy Schulte, vice president, Gartner:
        “Most CEP in IoT [...] is custom coded [...] rather than [using a] general purpose stream platform.”
        See: Complex Event Processing and The Future Of Business Decisions by David Luckham and W. Roy Schulte
    13. Custom Coded CEP: The Good and The Bad
        The Good:
        • Made to order with a modern framework
        • “No limit” to potential for performance and scalability
        • Fit-to-purpose technology
        The Bad:
        • Engineers aren’t business domain experts
        • Lots of work to build from scratch every time
        • Changes to logic are a pain point (from the business side)
        • Lack of available talent/organizational capability
    14. Declarative Makes Sense For Business
        Manage complex behavior through simple rules working together, executed by a rules engine.
    15. An Open Source Rule Engine:
        Drools is a business rule management system (BRMS) with a forward and backward chaining inference based rules engine.
        • Project homepage: http://www.drools.org/
        • Developer: Red Hat
        • Enterprise supported version available
          – JBoss Enterprise BRMS
        • Enhanced implementation of the Rete algorithm
          – A state of the art algorithm for rules engines
        • Has a GUI Rules Editor: Workbench
    16. An Open Source Rule Engine:
        Diagram labels: Domain Expert, Rules Editor, Production Memory (Rules), Working Memory (Facts), Pattern Matcher, Agenda, Actions
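
The components named in this diagram can be illustrated with a toy forward-chaining loop. This is a deliberately naive sketch and not how Drools works internally (Drools uses an enhanced Rete algorithm, per the previous slide). The class `RuleEngine`, its method names, and the traffic facts are hypothetical.

```python
class RuleEngine:
    """Toy forward-chaining engine (illustrative only)."""

    def __init__(self):
        self.rules = []     # production memory: (name, condition, action)
        self.facts = []     # working memory
        self.fired = set()  # (rule name, fact index) pairs that already fired

    def add_rule(self, name, condition, action):
        self.rules.append((name, condition, action))

    def insert(self, fact):
        self.facts.append(fact)

    def fire_all_rules(self):
        """Match rules against facts, build an agenda, fire it; repeat until quiet."""
        log = []
        while True:
            # Pattern matcher: every (rule, fact) pair that matches and has not fired.
            agenda = [
                (name, i, action)
                for name, condition, action in self.rules
                for i, fact in enumerate(self.facts)
                if (name, i) not in self.fired and condition(fact)
            ]
            if not agenda:
                return log
            for name, i, action in agenda:
                self.fired.add((name, i))
                log.append(name)
                action(self, self.facts[i])  # actions may insert new facts


alerts = []
engine = RuleEngine()
engine.add_rule(
    "detect_congestion",
    lambda f: f["type"] == "reading" and f["speed_kmh"] < 20,
    lambda eng, f: eng.insert({"type": "congestion", "road": f["road"]}),
)
engine.add_rule(
    "notify_operator",
    lambda f: f["type"] == "congestion",
    lambda eng, f: alerts.append(f["road"]),
)
engine.insert({"type": "reading", "road": "A1", "speed_kmh": 12})
engine.insert({"type": "reading", "road": "B2", "speed_kmh": 65})
fired = engine.fire_all_rules()
```

The first rule derives a new fact, which in turn activates the second rule on the next pass. That chaining is what lets simple, independent rules combine into more complex behavior.
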
    17. CEP in Drools: Stateful Session and Sliding Window
        Session shown: STATELESS Session
        Rule: Is the ball red?
        Rule: Are there 2+ red balls in the last 4 balls I’ve seen?
    18. CEP in Drools: Stateful Session + Sliding Window
        Sessions shown: STATEFUL Session, STATELESS Session
        Rule: Is the ball red?
        Rule: Are there 2+ red balls in the last 4 balls I’ve seen?
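
The two ball rules can be written out to show why the second one needs state. This is plain Python, not Drools rule syntax; `is_red` and `RedBallWindow` are hypothetical names, and the ball sequence is made up.

```python
from collections import deque


def is_red(ball):
    """Stateless rule: the answer depends only on the current ball."""
    return ball == "red"


class RedBallWindow:
    """Stateful rule: remembers the last `size` balls it has seen."""

    def __init__(self, size=4, min_red=2):
        self.recent = deque(maxlen=size)  # oldest ball drops off automatically
        self.min_red = min_red

    def observe(self, ball):
        self.recent.append(ball)
        return sum(1 for b in self.recent if b == "red") >= self.min_red


balls = ["red", "blue", "blue", "blue", "red", "green", "red"]
stateless = [is_red(b) for b in balls]
window = RedBallWindow()
stateful = [window.observe(b) for b in balls]
```

The stateless rule answers for each ball in isolation, while the windowed rule only fires on the last ball, when two of the four most recent balls are red.
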
    19. Streaming Architecture for CEP
        Diagram labels: Sensors - Real-time Data Producer Distributed Cluster (Kafka, MapR) Consumer Server (Edge node, cluster node) Integrate with other systems
    20. The Case for CEP on Streaming Architecture
        • Decouple rules maintenance from code and infrastructure
          – Manage the cluster separately
          – The application code may need only minimal maintenance
        • Rules maintenance in the hands of the business domain experts
          – Easily supports multiple projects & teams
        • Data is persisted in the stream (input and output)
          – Open to new use cases
        • Send data back to the stream
          – Integrate with other downstream use cases
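
A minimal sketch of the consumer side of this design follows, with plain Python lists standing in for the input and output topics. The function names, the 90 km/h threshold, and the event fields are assumptions for illustration. The point is that the rules are passed in as data, separate from the consumer loop, and that results go back to a stream rather than to a private store.

```python
def speeding_rule(event):
    """Hypothetical rule: flag any reading above 90 km/h."""
    if event["speed_kmh"] > 90:
        return {"alert": "speeding", "sensor": event["sensor"]}
    return None


def run_consumer(input_topic, output_topic, rules, start_offset=0):
    """Apply every rule to every event from `start_offset`; publish results.

    Returns the next offset to read, so a restarted consumer can resume.
    """
    for event in input_topic[start_offset:]:
        for rule in rules:
            result = rule(event)
            if result is not None:
                output_topic.append(result)  # send derived events back to a stream
    return len(input_topic)


input_topic = [
    {"sensor": "s1", "speed_kmh": 50},
    {"sensor": "s2", "speed_kmh": 120},
]
output_topic = []
next_offset = run_consumer(input_topic, output_topic, [speeding_rule])
```

Changing the logic means changing the list of rules, not the loop, and any downstream system can subscribe to the output topic without the consumer knowing about it.
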
    21. But Does It Scale? Yes, But Only to a Point
        • Drools and other rule engines are in-memory and the memory is not distributed
          – This is only a technical limitation that can be overcome (Ex: Alluxio, Apache Ignite)
        • Streams make it easy to provide reasonable fault tolerance and quick disaster recovery
        • Run multiple servers, split rules logically, fan out data into multiple topics
        • A single session can handle 100K+ events/sec. How much scale is needed?
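
One way to picture the fan-out is stable hash partitioning by key, sketched below. The choice of the sensor ID as the key, the CRC32 hash, and the function names are assumptions, not details from the talk. Keeping all events for one key in one partition means a windowed rule running on that partition's server still sees that key's full history.

```python
import zlib


def partition_for(key, num_partitions):
    """Stable hash partitioning: the same key always maps to the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def fan_out(events, num_partitions):
    """Split one event stream into `num_partitions` lists, keyed by sensor ID."""
    partitions = [[] for _ in range(num_partitions)]
    for event in events:
        index = partition_for(event["sensor"], num_partitions)
        partitions[index].append(event)
    return partitions


events = [{"sensor": f"s{i % 5}", "seq": i} for i in range(20)]
partitions = fan_out(events, num_partitions=3)
```

Each partition can then feed its own rule engine session on its own server, which is the horizontal scaling the slide describes.
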
    22. Live Demo: Smart City Traffic Management
    23. ● Try out integration with Spark Streaming and Flink
        ● Run serious performance benchmarks
        ● Deploy into production
    24. Recap
        • It’s not Rule Engine vs. Spark and Flink stream processing
          – It’s Rules + Stream Processing
          – Spark, Flink, and Java are just implementation choices
        • Focus on business value from applying rules to data
          – Think of the benefits of SQL vs. Java, C++, Scala, …
        • Great use case for a Streaming Architecture and microservices
        An in-depth blog post on this talk topic will be available on the MapR blog: https://www.mapr.com/blog/author/mathieu-dumoulin
    25. Suggested Reading
        ● Get Ted & Ellen’s book and many more for free:
          ○ https://www.mapr.com/ebooks/
        ● More great blog content about CEP and IoT applications
          ○ Eric Bruno on LinkedIn
          ○ Karzel et al. on InfoQ
    26. Q & A
        Engage with us! @mapr, mdumoulin@mapr.com, @lordxar, mapr-technologies
