
Streaming Patterns, Revolutionary Architectures

The stream as the system of record, CQRS, and other streaming patterns and examples

Published in: Software

  1. 1. © 2017 MapR Technologies Streaming Patterns, Revolutionary Architectures Carol McDonald @caroljmcdonald
  2. 2. © 2017 MapR Technologies Agenda Streams Core Components Patterns •  Event Sourcing •  Duality of Streams and Databases •  Command Query Responsibility Separation •  Polyglot Persistence, Multiple Materialized Views •  Turning the Database Upside Down Real World Examples •  Retail Monolith to Microservice •  Healthcare Exchange
  3. 3. © 2017 MapR Technologies What’s a Stream? A stream is an unbounded sequence of events carried from a set of producers to a set of consumers. (Diagram: Producers -> Events -> Stream -> Consumers)
  4. 4. © 2017 MapR Technologies What is Streaming Data? Got Some Examples? Data Collection Devices Smart Machinery Phones and Tablets Home Automation RFID Systems Digital Signage Security Systems Medical Devices
  5. 5. © 2017 MapR Technologies Why Streams? Many Big Data sources are event oriented. Trigger events: •  Stock Prices •  User Activity •  Sensor Data (Diagram: Event Data -> Stream Topics -> Real-Time Analytics)
  6. 6. © 2017 MapR Technologies Analyze Data What if you need to analyze data as it arrives?
  7. 7. © 2017 MapR Technologies Batch Processing Readings: 6:01 P.M.: 72° 6:02 P.M.: 75° 6:03 P.M.: 77° 6:04 P.M.: 85° 6:05 P.M.: 90° 6:06 P.M.: 85° 6:07 P.M.: 77° 6:08 P.M.: 75° Analyze: maximum 90°. "It was hot at 6:05 yesterday!"
  8. 8. © 2017 MapR Technologies Event Processing with Streams Stream topic: Temperature Event: 6:05 P.M.: 90° Turn on the air conditioning!
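The contrast between slides 7 and 8 can be sketched in a few lines. This is a minimal illustration, not MapR or Kafka code: the readings come from slide 7, while the threshold value and the function name `react` are assumptions made for the example.

```python
from typing import Iterable, Iterator, Tuple

THRESHOLD_F = 88  # hypothetical trigger temperature; the slides do not give one


def react(readings: Iterable[Tuple[str, int]],
          threshold: int = THRESHOLD_F) -> Iterator[str]:
    """Emit an action for each reading at or above the threshold, as it arrives."""
    for time, temp in readings:
        if temp >= threshold:
            yield f"{time}: {temp} degrees -> turn on the air conditioning"


# Readings taken from slide 7, consumed one at a time as a stream.
readings = [("6:01 P.M.", 72), ("6:02 P.M.", 75), ("6:03 P.M.", 77),
            ("6:04 P.M.", 85), ("6:05 P.M.", 90), ("6:06 P.M.", 85)]
actions = list(react(readings))
```

Because `react` is a generator, the action is produced when the 6:05 reading is seen, rather than after the whole batch has been collected.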
  9. 9. © 2017 MapR Technologies Organize Data What if you need to organize data as it arrives?
  10. 10. © 2017 MapR Technologies Integrating Many Data Sources and Applications Sources (Producers) Applications (Consumers) Unorganized, Complicated, and Tightly Coupled.
  11. 11. © 2017 MapR Technologies Organize Data into Topics with MapR Streams Topics Organize Events into Categories and Decouple Producers from Consumers Consumers MapR Cluster Topic: Pressure Topic: Temperature Topic: Warnings Consumers Consumers Kafka API Kafka API
  12. 12. © 2017 MapR Technologies Process High Volume of Data What if you need to process a high volume of data as it arrives?
  13. 13. © 2017 MapR Technologies What if BP had detected problems before the oil hit the water ? •  1M samples/sec •  High performance at scale is necessary!
  14. 14. © 2017 MapR Technologies Traditional Message queue Huge performance hit: •  Lots of disk I/O
  15. 15. © 2017 MapR Technologies Scalable Messaging with MapR Streams Server 1 Partition1: Topic - Pressure Partition1: Topic - Temperature Partition1: Topic - Warning Server 2 Partition2: Topic - Pressure Partition2: Topic - Temperature Partition2: Topic - Warning Server 3 Partition3: Topic - Pressure Partition3: Topic - Temperature Partition3: Topic - Warning Topics are partitioned for throughput and scalability
  16. 16. © 2017 MapR Technologies Scalable Messaging with MapR Streams Partition1: Topic - Pressure Partition1: Topic - Temperature Partition1: Topic - Warning Partition2: Topic - Pressure Partition2: Topic - Temperature Partition2: Topic - Warning Partition3: Topic - Pressure Partition3: Topic - Temperature Partition3: Topic - Warning Producers are load balanced between partitions Kafka API
  17. 17. © 2017 MapR Technologies Scalable Messaging with MapR Streams Partition1: Topic - Pressure Partition1: Topic - Temperature Partition1: Topic - Warning Partition2: Topic - Pressure Partition2: Topic - Temperature Partition2: Topic - Warning Partition3: Topic - Pressure Partition3: Topic - Temperature Partition3: Topic - Warning Consumers Consumers Consumers Consumer groups can read in parallel Kafka API
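Slides 15 to 17 describe partitioned topics, producers spread across partitions, and consumer groups reading in parallel. The sketch below shows one way that could work, under stated assumptions: `zlib.crc32` is a stand-in hash (not the partitioner that Kafka or MapR Streams actually uses), and `partition_for` and `assign` are hypothetical helper names.

```python
import zlib
from collections import defaultdict
from typing import Dict, List

NUM_PARTITIONS = 3  # matches the three partitions per topic on the slides


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition with a stable hash (crc32 is a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Spread partitions round-robin over the consumers of one group."""
    assignment = defaultdict(list)
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return dict(assignment)


# Two consumers in one group share three partitions of the same topic.
group_assignment = assign([0, 1, 2], ["consumer-1", "consumer-2"])
```

Messages with the same key always land in the same partition, which is what preserves per-key ordering while the topic as a whole scales out.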
  18. 18. © 2017 MapR Technologies Partition is like a Queue New messages are appended to the end. (Diagram: Producers write to Topic: Admission, Partitions 1 to 3 on Servers 1 to 3 of a MapR Cluster; each partition holds numbered messages from old message 1 up to the newest; Consumers read from the partitions)
  19. 19. © 2017 MapR Technologies Events are delivered in the order they are received, like a queue (Diagram: Producers -> MapR Cluster partition holding messages 6 5 4 3 2 1 -> read cursors for two consumer groups)
  20. 20. © 2017 MapR Technologies Unlike a queue, events are persisted even after they’re delivered. Messages remain on the partition, available to other consumers. Minimizes non-sequential disk reads and writes. (Diagram: a consumer polls through the client library to get unread events 3 2 1 from Topic: Warning, Partition 1, on a one-server MapR Cluster)
  21. 21. © 2017 MapR Technologies When Are Messages Deleted? •  Messages can be persisted forever, or •  Older messages can be deleted automatically based on time to live (Diagram: Partition 1 on a one-server MapR Cluster holding messages 6 5 4 3 2 1, with 1 the older message)
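Slides 18 to 21 can be summarized as an append-only log with one read cursor per consumer group and optional time-to-live expiry. The class below is an in-memory sketch under those assumptions only; the `Partition` class and its method names are hypothetical and are not the Kafka or MapR Streams API.

```python
from typing import Dict, List, Tuple


class Partition:
    """Append-only log; each consumer group keeps its own read cursor."""

    def __init__(self) -> None:
        self._log: List[Tuple[float, str]] = []  # (timestamp, message)
        self._base = 0                           # offset of the oldest retained message
        self._cursors: Dict[str, int] = {}       # group -> next offset to read

    def append(self, message: str, timestamp: float) -> int:
        """Add a message at the end and return its offset."""
        self._log.append((timestamp, message))
        return self._base + len(self._log) - 1

    def poll(self, group: str, max_messages: int = 10) -> List[str]:
        """Return unread messages for this group; messages stay on the log."""
        start = max(self._cursors.get(group, self._base), self._base)
        index = start - self._base
        batch = [msg for _, msg in self._log[index:index + max_messages]]
        self._cursors[group] = start + len(batch)
        return batch

    def expire(self, now: float, ttl: float) -> None:
        """Drop the oldest messages whose age has reached the time to live."""
        while self._log and now - self._log[0][0] >= ttl:
            self._log.pop(0)
            self._base += 1


partition = Partition()
for i in (1, 2, 3):
    partition.append(f"m{i}", timestamp=float(i))
first_read = partition.poll("analytics")   # this group reads everything
second_read = partition.poll("alerts", 2)  # another group still sees m1, m2
```

Reading does not remove anything; only `expire` does, which mirrors the "persisted forever, or deleted by time to live" choice on slide 21.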
  22. 22. © 2017 MapR Technologies Processing Same Message for Different Purposes Consumers Consumers Consumers Producers Producers Producers MapR-FS Kafka API Kafka API
  23. 23. © 2017 MapR Technologies Partition Fault Tolerance
  24. 24. © 2017 MapR Technologies Message Recovery What if you need to recover messages in case of server failure?
  25. 25. © 2017 MapR Technologies Partitions are Replicated for Fault Tolerance Producer Producer Server 2 Partition2: Topic - Warning Producer Server 1 Partition1: Topic - Warning Server 3 Partition3: Topic - Warning Server 2 Server 3 Server 1 Server 3 Server 1 Server 2
  26. 26. © 2017 MapR Technologies Partition1: Warning Partition2: Warning Replica Partition3: Warning Replica Partition1: Warning Replica Partition3: Warning Replica Partition1: Warning Replica Partition2: Warning Replica Partition3: Warning Producer Producer Producer Server 1 Server 2 Server 3 Security Investigation & Event Management Operational Intelligence Real-time Analytics Partition2: Warning Partitions are Replicated for Fault Tolerance
  27. 27. © 2017 MapR Technologies Partitions are Replicated for Fault Tolerance Producer Producer Producer Security Investigation & Event Management Operational Intelligence Real-time Analytics Partition1: Warning Partition2: Warning Replica Partition3: Warning Replica Partition1: Warning Replica Partition3: Warning Replica Partition1: Warning Replica Partition2: Warning Replica Partition3: Warning Server 1 Server 2 Server 3 Partition2: Warning
  28. 28. © 2017 MapR Technologies Partitions are Replicated for Fault tolerance Producer Producer Producer Security Investigation & Event Management Operational Intelligence Real-time Analytics Partition1: Warning Partition2: Warning Replica Partition3: Warning Replica Partition1: Warning Replica Partition3: Warning Replica Partition1: Warning Replica Partition2: Warning Replica Partition3: Warning Server 1 Server 2 Server 3 Partition2: Warning
  29. 29. © 2017 MapR Technologies Streams and High Availability
  30. 30. © 2017 MapR Technologies Real-time Access What if you need real-time access to live data distributed across multiple clusters and multiple data centers?
  31. 31. © 2017 MapR Technologies Streams and Replication Streams: •  can be replicated worldwide Topic: A Topic: B Topic: C Topic: A Topic: B Topic: C Replicating to another cluster
  32. 32. © 2017 MapR Technologies Streams: •  high availability •  disaster recovery Streams and Replication Topic: A Topic: B Topic: C Fail Over
  33. 33. © 2017 MapR Technologies Patterns
  34. 34. © 2017 MapR Technologies Patterns
  35. 35. Batch Architecture mins - hrs Streaming Architecture ms - secs
  36. 36. © 2017 MapR Technologies Event Sourcing Updates Imagine each event as a change to an entry in a database. Account Id Balance WillO 80.00 BradA 20.00 1: WillO : Deposit : 100.00 2: BradA : Deposit : 50.00 3: BradA : Withdraw : 30.00 4: WillO : Withdraw: 20.00 https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying Change log 4 3 2 1 queue of all deposit and withdrawal events current account balances
  37. 37. © 2017 MapR Technologies Replication Change Log https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying 3 2 1 3 2 1 3 2 1 Duality of Streams and Tables Master: Append writes Slave: Apply writes in order
  38. 38. © 2017 MapR Technologies Which Makes a Better System of Record? Which of these can be used to reconstruct the other? 1: WillO : Deposit : 100.00 2: BradA : Deposit : 50.00 3: BradA : Withdraw : 30.00 4: WillO : Withdraw: 20.00 Account Id Balance WillO 80.00 BradA 20.00 Change Log 3 2 1
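The question on slide 38 has a concrete answer: the change log can rebuild the balance table, but not the other way around. A minimal sketch using the four events from slide 36 follows; the function name `replay` and the tuple layout are assumptions made for illustration.

```python
from decimal import Decimal
from typing import Dict, Iterable, Tuple

Event = Tuple[int, str, str, Decimal]  # (sequence, account id, kind, amount)

# The change log from slide 36.
events = [
    (1, "WillO", "Deposit", Decimal("100.00")),
    (2, "BradA", "Deposit", Decimal("50.00")),
    (3, "BradA", "Withdraw", Decimal("30.00")),
    (4, "WillO", "Withdraw", Decimal("20.00")),
]


def replay(log: Iterable[Event]) -> Dict[str, Decimal]:
    """Rebuild the balance table by applying every event in order."""
    balances: Dict[str, Decimal] = {}
    for _seq, account, kind, amount in log:
        if kind not in ("Deposit", "Withdraw"):
            raise ValueError(f"unknown event kind: {kind}")
        delta = amount if kind == "Deposit" else -amount
        balances[account] = balances.get(account, Decimal("0")) + delta
    return balances


balances = replay(events)          # current state
earlier = replay(events[:2])       # state as of event 2
```

Replaying a prefix of the log gives the table as it stood at that point, which the final balances alone cannot provide.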
  39. 39. © 2017 MapR Technologies Rewind: Reprocessing Events MapR Cluster 6 5 4 3 2 1Producers Reprocess from oldest Consumer Create new view, Index, cache
  40. 40. © 2017 MapR Technologies Rewind Reprocessing Events MapR Cluster 6 5 4 3 2 1Producers To Newest Consumer new view Read from new view
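Slides 39 and 40 describe rewinding: a new consumer starts from the oldest event, builds a new view, and then keeps up with the newest events. The sketch below assumes a plain Python list as the log; `ViewBuilder` and `count_withdrawals` are hypothetical names, and the view chosen (withdrawal counts per account) is only an example.

```python
from typing import Callable, Dict, List, Tuple

Event = Tuple[str, str, float]  # (account id, kind, amount)


class ViewBuilder:
    """Builds a view by reading the log from the oldest event onward."""

    def __init__(self, apply: Callable[[Dict[str, int], Event], None]) -> None:
        self.apply = apply
        self.view: Dict[str, int] = {}
        self.offset = 0  # next log position to read; 0 means "from oldest"

    def catch_up(self, log: List[Event]) -> Dict[str, int]:
        """Apply every event not yet seen, then return the view."""
        while self.offset < len(log):
            self.apply(self.view, log[self.offset])
            self.offset += 1
        return self.view


def count_withdrawals(view: Dict[str, int], event: Event) -> None:
    """Example projection: number of withdrawals per account."""
    account, kind, _amount = event
    if kind == "Withdraw":
        view[account] = view.get(account, 0) + 1


log = [("WillO", "Deposit", 100.0), ("BradA", "Deposit", 50.0),
       ("BradA", "Withdraw", 30.0), ("WillO", "Withdraw", 20.0)]
builder = ViewBuilder(count_withdrawals)
initial_view = dict(builder.catch_up(log))   # reprocess from oldest
log.append(("BradA", "Withdraw", 5.0))       # a new event arrives
updated_view = dict(builder.catch_up(log))   # continue to newest
```

The existing consumers and views are untouched while the new one is built, since reading the log has no side effects on it.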
  41. 41. © 2017 MapR Technologies Event Sourcing, Command Query Responsibility Separation: Turning the Database Upside Down Key-Val Document Graph Wide Column Time Series Relational ???Events Updates
  42. 42. © 2017 MapR Technologies What Else Do I Use My Stream For? Lineage - “how did BradA’s balance get so low?” Auditing - “who deposited/withdrew from BradA’s account?” History – to see the status of the accounts last year Integrity - “can I trust this data hasn’t been tampered with?” •  Yup - Streams are immutable 0: WillO : Deposit : 100.00 1: BradA : Deposit : 50.00 2: BradA : Withdraw : 30.00 3: WillO : Withdraw: 20.00
  43. 43. © 2017 MapR Technologies What Do I Need For This to Work? Infinitely persisted events A way to query your persisted stream data An integrated security model across the stream and databases
  44. 44. © 2017 MapR Technologies Examples with Patterns
  45. 45. © 2017 MapR Technologies Breaking up Online shopping rating items into Microservices Concurrency bottleneck
  46. 46. © 2017 MapR Technologies Separate Write from Read using CQRS Command Query Responsibility Separation Separate the Rate Item write “command” from the Get Item Ratings read “query” using event sourcing { "itemid": "sku124", "rating": "4", "userid": "cmcdonald", "comment": "works well" } { "itemid": "sku124", "pname": "bluetooth earbud", "ratings": [ { "rating": "4", "userid": "cmcdonald", "comment": "works well" }, { "rating": "1", "userid": "diego", "comment": "hated it" }] }
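The Rate Item / Get Item Ratings split on slide 46 can be sketched as follows. The JSON shapes follow the slide; the in-memory list standing in for the stream, the `catalog` lookup, and the function names are assumptions for illustration only.

```python
import json
from typing import Dict, List, Optional

event_log: List[str] = []                    # stand-in for a stream topic
catalog = {"sku124": "bluetooth earbud"}     # product names, as on the slide
item_ratings: Dict[str, dict] = {}           # read model (materialized view)


def rate_item(itemid: str, rating: str, userid: str, comment: str) -> None:
    """Command side: append the event and do nothing else."""
    event_log.append(json.dumps({"itemid": itemid, "rating": rating,
                                 "userid": userid, "comment": comment}))


def project(event_json: str) -> None:
    """Consumer side: fold one rating event into the per-item document."""
    event = json.loads(event_json)
    doc = item_ratings.setdefault(event["itemid"], {
        "itemid": event["itemid"],
        "pname": catalog.get(event["itemid"], ""),
        "ratings": [],
    })
    doc["ratings"].append({"rating": event["rating"],
                           "userid": event["userid"],
                           "comment": event["comment"]})


def get_item_ratings(itemid: str) -> Optional[dict]:
    """Query side: read only from the materialized view."""
    return item_ratings.get(itemid)


rate_item("sku124", "4", "cmcdonald", "works well")
rate_item("sku124", "1", "diego", "hated it")
for raw in event_log:
    project(raw)
view = get_item_ratings("sku124")
```

The write path never touches `item_ratings`, and the read path never touches `event_log`, so each side can be scaled or replaced independently.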
  47. 47. © 2017 MapR Technologies NoSQL Scaling Fast Reads and Writes Design your schema so that the data that is read together is stored together
  48. 48. © 2017 MapR Technologies Event Sourcing: New Uses of Data Add new Services like Recommendations
  49. 49. © 2017 MapR Technologies Fraud Detection Point of Sale -> Data Center: is the transaction fraud? •  Lots of requests •  Need an answer within ~50 to 100 milliseconds (Diagram: Point of Sale sends location, time, card# to the Data Center; the Data Center answers fraud yes/no)
  50. 50. © 2017 MapR Technologies Traditional Solution POS 1..n Fraud detector Last card use 1.  Look up last card use 2.  Compute the card velocity: •  Subtract last location, time from current location, time 3.  Update last card use
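The three steps on slide 50 can be sketched as one function. This is an illustration under stated assumptions: locations are (latitude, longitude) pairs, time is in hours, distance uses the haversine formula, and `MAX_KMH` is a hypothetical speed limit that does not come from the slides.

```python
import math
from typing import Dict, Tuple

Location = Tuple[float, float]  # (latitude, longitude) in degrees

MAX_KMH = 900.0  # hypothetical: faster than this between two uses is suspicious
last_use: Dict[str, Tuple[Location, float]] = {}  # card -> (location, time in hours)


def haversine_km(a: Location, b: Location) -> float:
    """Great-circle distance between two points on Earth, in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def check(card: str, location: Location, t_hours: float) -> bool:
    """Return True if the implied card velocity looks fraudulent."""
    previous = last_use.get(card)                 # 1. look up last card use
    suspicious = False
    if previous is not None:                      # 2. compute the card velocity
        distance = haversine_km(previous[0], location)
        elapsed = t_hours - previous[1]
        if elapsed > 0:
            suspicious = distance / elapsed > MAX_KMH
        else:
            suspicious = distance > 0             # two places at the same instant
    last_use[card] = (location, t_hours)          # 3. update last card use
    return suspicious


first = check("card-1", (40.71, -74.00), 0.0)     # New York, first use
second = check("card-1", (51.51, -0.13), 1.0)     # London one hour later
third = check("card-1", (51.52, -0.13), 2.0)      # about 1 km away, an hour on
```

Note that the same function both reads and writes `last_use`, which is the coupling the following slides set out to remove.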
  51. 51. © 2017 MapR Technologies What Happens Next? POS 1..n Fraud detector Last card use POS 1..n Fraud detector POS 1..n Fraud detector 1.  Read last card use 2.  Compute the card velocity 3.  Update last card use
  52. 52. © 2017 MapR Technologies Service Isolation: Separate Read from Write POS 1..n Fraud detector Last card use Updater card activity Read Read last card use
  53. 53. © 2017 MapR Technologies Separate Read Model from the Write Model: Command Query Responsibility Separation POS 1..n Fraud detector Last card use Updater card activity Read Event last card use Write last card use
  54. 54. © 2017 MapR Technologies Event Sourcing: New Uses of Data Processing Same Message for Different Views POS 1..n Fraud detector Last card use Updater Card location history Other card activity
  55. 55. © 2017 MapR Technologies Scaling Through Isolation POS 1..n Last card use Updater POS 1..n Last card use Updater card activity Fraud detector Fraud detector Multiple fraud detectors can use the same message queue
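Slides 52 to 55 separate the read path from the write path: the fraud detector only reads the last-card-use table and publishes card activity events, while an updater consumes those events and is the only writer. The sketch below assumes a `deque` as a stand-in for the card activity topic; the function names and the placeholder decision rule (`MIN_SECONDS`) are hypothetical.

```python
from collections import deque
from typing import Deque, Dict

MIN_SECONDS = 3600  # hypothetical rule: a different location within an hour is suspicious

card_activity: Deque[dict] = deque()   # stand-in for the card activity topic
last_card_use: Dict[str, dict] = {}    # read model; written only by the updater


def fraud_detector(card: str, location: str, time: int) -> bool:
    """Read the last use, publish the new event, never write the table."""
    previous = last_card_use.get(card)
    card_activity.append({"card": card, "location": location, "time": time})
    return (previous is not None
            and previous["location"] != location
            and time - previous["time"] < MIN_SECONDS)


def updater() -> int:
    """Consume pending card activity events and update the read model."""
    applied = 0
    while card_activity:
        event = card_activity.popleft()
        last_card_use[event["card"]] = {"location": event["location"],
                                        "time": event["time"]}
        applied += 1
    return applied


first = fraud_detector("card-1", "NYC", 0)
updater()
second = fraud_detector("card-1", "LON", 60)
```

Any number of detectors can call `fraud_detector` against the same topic, and further consumers (for example a card location history) can read the same events without changing the detector.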
  56. 56. © 2017 MapR Technologies Lessons De-coupling and isolation are key Propagate events, not table updates
  57. 57. © 2017 MapR Technologies Real World Solution
  58. 58. © 2017 MapR Technologies Use Case: Streaming System of Record for Healthcare Objective: •  Build a flexible, secure healthcare exchange Records Analysis Applications Challenges: •  Many different data models •  Security and privacy issues •  HIPAA compliance Records
  59. 59. © 2017 MapR Technologies ALLOY Health: Exchange State HIE Clinical Data Viewer Reporting and Analytics Clinical Data Financial Data Provider Organizations
  60. 60. © 2017 MapR Technologies This is a PAIN! COMPLIANCE SECURITY CONTROLS COMPLIANCE FEATURES PRIVACY PCI DSS 3.0 21 CFR Part 11 SSAE16 / SOC2 HIPAA/HITECH
  61. 61. © 2017 MapR Technologies WHY NOW? 2014 FQ4 profit $ -440 M Total Cost Estimate $ -12 B
  62. 62. © 2017 MapR Technologies Why Now? The relational database is not the only tool. Context and relationships: Patient 1234 (patient_id 1234, Name Jon Smith, Age 50); Patient 999 (patient_id 999, Name Jonathan Smith, DOB Jun 1965); Provider 86 (provider_id 86, Name Dr. Nora Paige, Specialty Diabetes); Prescription 9876 (rx_id 9876, Name Sitagliptin, Dosage 325mg). Relationships: Visited, Prescribed, WasPrescribed
  63. 63. © 2017 MapR Technologies WHY NOW? Mind the Gap
  64. 64. © 2017 MapR Technologies Streaming System of Record for Healthcare Records -> pre-processor -> Stream Topic (Immutable Log, 6 5 4 3 2 1) -> Consumer workflows -> Materialized Views (Search, Graph DB, JSON, HBase) -> Micro Services -> API -> Applications
  65. 65. © 2017 MapR Technologies The Promised Land Raw Data -> pre-processor -> Immutable Log (MapR-FS) -> workflows -> materialized views: Key/Value (MapR-DB), Search Engine, Document Log, Graph (ArangoDB), Time Series (OpenTSDB), CEP -> micro services -> API -> Apps. Data Lineage, Audit Logging, Compliance Auditor (smiley faces)
  66. 66. © 2017 MapR Technologies Solution Design/architecture solved some •  Streams •  Data Lineage/System of Record •  Kappa Architecture (Kreps/Kleppmann) MapR solved others •  Unified Security •  Replication DC to DC •  Converge Kafka/HBase/Hadoop to one cluster •  Multi-tenancy (lots of topics, for lots of tenants)
  67. 67. © 2017 MapR Technologies Real World Solution
  68. 68. © 2017 MapR Technologies Challenge: Major Latency from Batch File Transfer 20-30 Minutes
  69. 69. © 2017 MapR Technologies Regional Datacenter Topic Elasticsearch Kibana File Server Producer (Java) Consumer (Java) Index Filtering config •  Monitoring directory •  Parsing CSV files •  Publishing messages to topic •  Parsing master data •  Subscribing topic •  Join tables •  Aggregation Dashboard
  70. 70. © 2017 MapR Technologies Streams and Replication Streams: Topic: A Topic: B Topic: C Topic: A Topic: B Topic: C Replicating to another cluster
  71. 71. © 2017 MapR Technologies Central Data Center Ad-hoc analysis Other Data Sources Real-time analysis Reporting Streaming Stream Topic Replicating Regional Data Centers Stream Topic Stream Topic Performance and other monitoring related data. Aggregation of data across all regional data centers
  72. 72. © 2017 MapR Technologies Stream Processing Building a Complete Data Architecture MapR File System (MapR-FS) MapR Converged Data Platform MapR Database (MapR-DB) MapR Streams Sources/Apps Bulk Processing
  73. 73. © 2017 MapR Technologies To Learn More: •  Streaming Architecture ebook •  https://mapr.com/streaming-architecture-using-apache-kafka-mapr-streams/
  74. 74. © 2017 MapR Technologies
  75. 75. © 2017 MapR Technologies MapR Blog • https://www.mapr.com/blog/
  76. 76. © 2017 MapR Technologies To Learn More: •  End to End Application for Monitoring Uber Data using Spark ML •  https://mapr.com/blog/monitoring-real-time-uber-data-using-spark-machine-learning-streaming-and-kafka-api-part-1/
  77. 77. © 2017 MapR Technologies …helping you put data technology to work ●  Find answers ●  Ask technical questions ●  Join on-demand training course discussions ●  Follow release announcements ●  Share and vote on product ideas ●  Find Meetup and event listings Connect with fellow Apache Hadoop and Spark professionals community.mapr.com
  78. 78. © 2017 MapR Technologies To Learn More: •  MapR Free ODT http://learn.mapr.com/
  79. 79. © 2017 MapR Technologies Q&A ENGAGE WITH US
