Streaming patterns revolutionary architectures

© 2017 MapR Technologies
Streaming Patterns, Revolutionary Architectures
Carol McDonald
@caroljmcdonald

Agenda
Streams Core Components
Patterns
•  Event Sourcing
•  Duality of Streams and Databases
•  Command Query Responsibility Segregation
•  Polyglot Persistence, Multiple Materialized Views
•  Turning the Database Upside Down
Real World Examples
•  Retail Monolith to Microservice
•  Healthcare Exchange

What’s a Stream?
A stream is an unbounded sequence of events carried
from a set of producers to a set of consumers.
[Diagram: Producers -> Stream of Events -> Consumers]

What is Streaming Data? Got Some Examples?
Data Collection Devices:
•  Smart Machinery
•  Phones and Tablets
•  Home Automation
•  RFID Systems
•  Digital Signage
•  Security Systems
•  Medical Devices

Why Streams?
Many Big Data sources are Event Oriented
Trigger Events:
•  Stock Prices
•  User Activity
•  Sensor Data
[Diagram: Event Data -> Streams (Topics) -> Real-Time Analytics]

Analyze Data
What if you need to analyze data as it arrives?

Batch Processing
6:01 P.M.: 72°
6:02 P.M.: 75°
6:03 P.M.: 77°
6:04 P.M.: 85°
6:05 P.M.: 90°
6:06 P.M.: 85°
6:07 P.M.: 77°
6:08 P.M.: 75°
Analyze: "It was hot at 6:05 yesterday!" (90°)

Event Processing with Streams
Stream
Topic: Temperature
6:05 P.M.: 90°
Turn on the air conditioning!
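The difference from batch processing can be sketched in a few lines of Python. This is a minimal illustration, not MapR Streams code: the `readings` list stands in for the Temperature topic, and the `THRESHOLD` value and the `react` function are assumptions made for the example.

```python
from typing import Iterable, Iterator, Tuple

Reading = Tuple[str, int]  # (time, temperature)

# Readings from the slide; in a real system they would arrive from a topic.
readings = [
    ("6:01 P.M.", 72), ("6:02 P.M.", 75), ("6:03 P.M.", 77), ("6:04 P.M.", 85),
    ("6:05 P.M.", 90), ("6:06 P.M.", 85), ("6:07 P.M.", 77), ("6:08 P.M.", 75),
]

THRESHOLD = 88  # hypothetical trigger temperature


def react(stream: Iterable[Reading]) -> Iterator[str]:
    """Handle each event as it arrives instead of waiting for the full batch."""
    for time, temp in stream:
        if temp >= THRESHOLD:
            yield f"{time}: {temp}° - turn on the air conditioning!"


alerts = list(react(readings))
print(alerts)
```

Because `react` is a generator, the alert for 6:05 is produced as soon as that reading is seen, before the later readings are consumed.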

Organize Data
What if you need to organize data as it arrives?

Integrating Many Data Sources and Applications
Sources
(Producers)
Applications
(Consumers)
Unorganized, Complicated, and Tightly Coupled.

Organize Data into Topics with MapR Streams
Topics Organize Events into Categories and Decouple Producers from Consumers
Consumers
MapR Cluster
Topic: Pressure
Topic: Temperature
Topic: Warnings
Consumers
Consumers
Kafka API Kafka API
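A toy in-memory broker shows how topics decouple producers from consumers. This is only a sketch: the `Broker` class and its push-style handlers are assumptions for illustration, while the Kafka API itself is pull-based (consumers poll).

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class Broker:
    """Toy broker: producers and consumers share nothing but topic names."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[str], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: str) -> None:
        # The producer does not know who, if anyone, is listening.
        for handler in self._subscribers[topic]:
            handler(event)


broker = Broker()
dashboard: List[str] = []
alarms: List[str] = []

broker.subscribe("Temperature", dashboard.append)
broker.subscribe("Warnings", alarms.append)
broker.subscribe("Warnings", dashboard.append)

broker.publish("Temperature", "90°")
broker.publish("Warnings", "pressure high")
broker.publish("Pressure", "310 psi")  # no subscribers; the producer is unaffected
```

Adding a new consumer is a `subscribe` call; no producer has to change.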

Process High Volume of Data
What if you need to process a high volume of data as it arrives?

What if BP had detected problems before the oil hit the water?
•  1M samples/sec
•  High performance at scale is necessary!

Traditional Message Queue
Huge performance hit:
•  Lots of disk I/O

Scalable Messaging with MapR Streams
Server 1
Partition1: Topic - Pressure
Partition1: Topic - Temperature
Partition1: Topic - Warning
Server 2
Server 3
Topics are partitioned for throughput and scalability

Producers are load balanced between partitions
Kafka API
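One common way to balance producers across partitions is to hash the message key, and to rotate when there is no key. The sketch below assumes three partitions and uses CRC32 as a stand-in hash; real Kafka clients use their own hash function and partitioning rules.

```python
import zlib
from itertools import count
from typing import Optional

NUM_PARTITIONS = 3          # assumed partition count for the example
_round_robin = count()      # shared counter for messages without a key


def choose_partition(key: Optional[str]) -> int:
    """Keyed messages hash to a stable partition; keyless ones rotate."""
    if key is None:
        return next(_round_robin) % NUM_PARTITIONS
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


# The same key always lands on the same partition, which preserves its order.
stable = choose_partition("sensor-7")
```

Hashing by key keeps all events for one sensor in one partition, so they stay in order relative to each other.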

Consumers
Consumers
Consumers
Consumer groups can read in parallel
Kafka API
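Reading in parallel works because each partition is owned by one consumer in the group at a time. A minimal sketch of such an assignment, assuming a simple round-robin policy (the `assign` helper and consumer names are hypothetical):

```python
from typing import Dict, List


def assign(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Spread partitions over the group's consumers, one owner per partition."""
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for i, partition in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


group = assign([1, 2, 3], ["consumer-a", "consumer-b"])
print(group)
```

With three partitions and two consumers, one consumer reads two partitions and the other reads one; adding a third consumer would give each exactly one.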

Partition is like a Queue
Consumers
MapR Cluster
Topic: Admission / Server 1
Consumers
Consumers
Partition 1: 6 5 4 3 2 1
Partition 2: 3 2 1
Partition 3: 5 4 3 2 1
New messages are appended to the end
Producers -> [New Message] 6 5 4 3 2 1 [Old Message]

Events are delivered in the order they are received, like a queue
MapR Cluster
Producers -> 6 5 4 3 2 1
Consumer groups, each with its own read cursor

Unlike a queue, events are persisted even after they’re delivered
Messages remain on the partition, available to other consumers
Minimizes non-sequential disk reads and writes
MapR Cluster (1 Server)
Topic: Warning
Partition 1: 3 2 1 (Unread Events)
Consumer -> Poll -> Client Library -> Get Unread -> 3 2 1
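The queue-like ordering, the read cursors, and the fact that messages stay on the partition after delivery can be sketched together. The `Partition` class below is a hypothetical in-memory model, not the MapR or Kafka client API.

```python
from typing import Dict, List


class Partition:
    """Append-only log; each consumer group keeps its own read cursor."""

    def __init__(self) -> None:
        self._log: List[str] = []
        self._cursors: Dict[str, int] = {}

    def append(self, message: str) -> int:
        self._log.append(message)          # new messages go to the end
        return len(self._log) - 1          # offset of the new message

    def poll(self, group: str) -> List[str]:
        start = self._cursors.get(group, 0)
        unread = self._log[start:]         # delivered in arrival order
        self._cursors[group] = len(self._log)
        return unread                      # nothing is removed from the log


partition = Partition()
for message in ["1", "2", "3"]:
    partition.append(message)

first = partition.poll("analytics")   # all three messages
again = partition.poll("analytics")   # nothing new for this group
other = partition.poll("alerting")    # a different group still sees everything
```

Polling only moves a cursor, so a second group can read the same messages later for a different purpose.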

When Are Messages Deleted?
•  Messages can be persisted forever, or
•  Older messages can be deleted automatically based on time to live
MapR Cluster (1 Server)
Partition 1: 6 5 4 3 2 1
Older message
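Both retention policies fit in one small function. This is a sketch under assumptions: the `expire` helper, the timestamps, and the 90-second time to live are invented for illustration and do not reflect how MapR Streams implements retention.

```python
from typing import List, Optional, Tuple

Message = Tuple[float, str]  # (timestamp in seconds, payload)


def expire(log: List[Message], now: float, ttl: Optional[float]) -> List[Message]:
    """Drop messages older than ttl seconds; ttl=None keeps everything."""
    if ttl is None:
        return list(log)
    return [(ts, payload) for ts, payload in log if now - ts <= ttl]


log = [(0.0, "1"), (60.0, "2"), (120.0, "3")]
kept_forever = expire(log, now=180.0, ttl=None)
kept_recent = expire(log, now=180.0, ttl=90.0)
```

Note that deletion depends only on message age, never on whether a consumer has read the message.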

Processing Same Message for Different Purposes
Consumers
Consumers
Consumers
Producers
Producers
Producers
MapR-FS
Kafka API Kafka API

Partition Fault Tolerance

Message Recovery
What if you need to recover messages in case of server failure?

Partitions are Replicated for Fault Tolerance
Producers (x3)
Servers: Server 1, Server 2, Server 3
Partitions: Partition1: Warning, Partition2: Warning, Partition3: Warning
Replica: Partition2: Warning Replica
Consumers: Security Investigation & Event Management, Operational Intelligence, Real-time Analytics

Streams and High Availability

Real-time Access
What if you need real-time access to live data distributed across multiple clusters
and multiple data centers?

Streams and Replication
Streams:
•  can be replicated worldwide
Topic: A
Topic: B
Topic: C
Topic: A
Topic: B
Topic: C
Replicating to
another
cluster

Streams:
•  high availability
•  disaster recovery
Topic: A
Topic: B
Topic: C
Fail Over

Patterns

Batch Architecture: mins - hrs
Streaming Architecture: ms - secs

Event Sourcing
Imagine each event as a change to an entry in a database.
Change log (queue of all deposit and withdrawal events):
1: WillO : Deposit : 100.00
2: BradA : Deposit : 50.00
3: BradA : Withdraw : 30.00
4: WillO : Withdraw : 20.00
Updates -> current account balances:
Account Id   Balance
WillO        80.00
BradA        20.00
https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
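Replaying the change log from the slide reproduces the balance table. The sketch below is a minimal illustration; the tuple layout and the `replay` function are assumptions, and only the four events come from the slide.

```python
from decimal import Decimal
from typing import Dict, List, Tuple

Event = Tuple[int, str, str, Decimal]  # (sequence, account, kind, amount)

change_log: List[Event] = [
    (1, "WillO", "Deposit", Decimal("100.00")),
    (2, "BradA", "Deposit", Decimal("50.00")),
    (3, "BradA", "Withdraw", Decimal("30.00")),
    (4, "WillO", "Withdraw", Decimal("20.00")),
]


def replay(events: List[Event]) -> Dict[str, Decimal]:
    """Fold the events, in sequence order, into current account balances."""
    balances: Dict[str, Decimal] = {}
    for _, account, kind, amount in sorted(events):
        delta = amount if kind == "Deposit" else -amount
        balances[account] = balances.get(account, Decimal("0.00")) + delta
    return balances


balances = replay(change_log)
print(balances)
```

The table can always be rebuilt from the log this way, but the log cannot be rebuilt from the table.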

Replication
Duality of Streams and Tables
Master: Append writes to the change log (3 2 1)
Slave: Apply writes in order (3 2 1)
https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
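Log-based replication can be sketched as a replica that remembers how many log entries it has applied. The `Replica` class and the sample writes are hypothetical, chosen only to show that applying the same writes in the same order yields the same table.

```python
from typing import Dict, List, Tuple

Write = Tuple[str, str]  # (key, value)


class Replica:
    """Applies writes from the master's change log, strictly in order."""

    def __init__(self) -> None:
        self.table: Dict[str, str] = {}
        self.applied = 0  # number of log entries applied so far

    def catch_up(self, log: List[Write]) -> None:
        # Apply only the entries not seen yet.
        for key, value in log[self.applied:]:
            self.table[key] = value
        self.applied = len(log)


master_log: List[Write] = []
replica = Replica()

master_log.append(("color", "red"))    # master appends writes
master_log.append(("color", "blue"))
replica.catch_up(master_log)           # replica applies them in order
master_log.append(("size", "L"))
replica.catch_up(master_log)
```

Order matters: applying the two `color` writes in reverse would leave the replica with `red` instead of `blue`.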

Which Makes a Better System of Record?
Which of these can be used to reconstruct the other?
Account Id Balance
WillO 80.00
BradA 20.00
Change Log
3 2 1

Rewind: Reprocessing Events
MapR Cluster
Producers -> 6 5 4 3 2 1
Consumer: reprocess from oldest
Create new view, index, cache

Rewind: Reprocessing Events
MapR Cluster
Producers -> 6 5 4 3 2 1
Consumer: reprocess to newest, building the new view
Read from new view
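Rewinding amounts to starting a new consumer at the oldest offset and letting it build its own view. In this sketch the `ViewBuilder` class, the event strings, and the per-account index are assumptions made for illustration.

```python
from collections import defaultdict
from typing import DefaultDict, List

# Events already stored in the stream before the new view existed.
log: List[str] = ["WillO:Deposit", "BradA:Deposit", "BradA:Withdraw", "WillO:Withdraw"]


class ViewBuilder:
    """New consumer that starts at the oldest offset and builds an index."""

    def __init__(self) -> None:
        self.offset = 0
        self.by_account: DefaultDict[str, List[str]] = defaultdict(list)

    def poll(self, events: List[str]) -> None:
        for event in events[self.offset:]:
            account, kind = event.split(":")
            self.by_account[account].append(kind)
        self.offset = len(events)


view = ViewBuilder()
view.poll(log)                 # reprocess history from the oldest event
log.append("BradA:Deposit")    # producers keep writing
view.poll(log)                 # the view catches up to the newest event
```

Once the new view has caught up, readers can be pointed at it without touching the producers.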

Event Sourcing, Command Query Responsibility Segregation:
Turning the Database Upside Down
Events -> Updates -> Key-Val, Document, Graph, Wide Column, Time Series, Relational, ???

What Else Do I Use My Stream For?
Lineage - “how did BradA’s balance get so low?”
Auditing - “who deposited/withdrew from BradA’s account?”
History - “what was the status of the accounts last year?”
Integrity - “can I trust this data hasn’t been tampered with?”
•  Yup - Streams are immutable
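Auditing and history questions become simple queries over the persisted events. The sketch below reuses the deposit and withdrawal events from the Event Sourcing slide; the `audit` and `balance_as_of` helpers are hypothetical names.

```python
from typing import List, Tuple

Event = Tuple[int, str, str, float]  # (sequence, account, kind, amount)

change_log: List[Event] = [
    (1, "WillO", "Deposit", 100.00),
    (2, "BradA", "Deposit", 50.00),
    (3, "BradA", "Withdraw", 30.00),
    (4, "WillO", "Withdraw", 20.00),
]


def audit(events: List[Event], account: str) -> List[Event]:
    """Auditing and lineage: every event that touched the account."""
    return [event for event in events if event[1] == account]


def balance_as_of(events: List[Event], account: str, sequence: int) -> float:
    """History: the balance after replaying events up to a given sequence."""
    total = 0.0
    for number, owner, kind, amount in events:
        if owner == account and number <= sequence:
            total += amount if kind == "Deposit" else -amount
    return total


brada_history = audit(change_log, "BradA")
brada_then = balance_as_of(change_log, "BradA", 2)
brada_now = balance_as_of(change_log, "BradA", 4)
```

None of these queries is possible from the balance table alone, which only holds the latest value.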

What Do I Need For This to Work?
Infinitely persisted events
A way to query your persisted stream data
An integrated security model across the stream and databases

Examples with Patterns

Breaking Up Online Shopping Item Ratings into Microservices
Concurrency bottleneck

Separate Write from Read using CQRS
Command Query Responsibility Segregation
Separate the Rate Item write “command” from the Get Item Ratings read “query” using event sourcing
{
  "itemid": "sku124",
  "rating": "4",
  "userid": "cmcdonald",
  "comment": "works well"
}
{
  "itemid": "sku124",
  "pname": "bluetooth earbud",
  "ratings": [
    {
      "rating": "4",
      "userid": "cmcdonald",
      "comment": "works well"
    },
    {
      "rating": "1",
      "userid": "diego",
      "comment": "hated it"
    }
  ]
}
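The first JSON document is a Rate Item event and the second is the Get Item Ratings read model. A consumer could derive one from the other roughly as sketched here; the `project` function and the `catalog` lookup are assumptions, since the slide does not say where `pname` comes from.

```python
from typing import Dict, List

# Rate Item events as they would arrive on the stream.
rating_events: List[Dict[str, str]] = [
    {"itemid": "sku124", "rating": "4", "userid": "cmcdonald", "comment": "works well"},
    {"itemid": "sku124", "rating": "1", "userid": "diego", "comment": "hated it"},
]

# Hypothetical product catalog supplying the product name.
catalog = {"sku124": "bluetooth earbud"}


def project(events: List[Dict[str, str]]) -> Dict[str, dict]:
    """Fold Rate Item events into one Get Item Ratings document per item."""
    view: Dict[str, dict] = {}
    for event in events:
        itemid = event["itemid"]
        doc = view.setdefault(
            itemid, {"itemid": itemid, "pname": catalog[itemid], "ratings": []}
        )
        doc["ratings"].append(
            {key: event[key] for key in ("rating", "userid", "comment")}
        )
    return view


item_ratings = project(rating_events)
```

The write side only appends small events; the read side serves a document already shaped for the query.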

NoSQL Scaling Fast Reads and Writes
Design your schema so that the data that is read together is stored together

Event Sourcing: New Uses of Data
Add new Services like Recommendations

Fraud Detection
Point of Sale -> Data Center: Is the transaction fraudulent?
•  Lots of requests
•  Need answer within ~50-100 milliseconds
Point of Sale -> Data Center: location, time, card#
Data Center -> Point of Sale: fraud yes/no?

Traditional Solution
POS 1..n -> Fraud detector <-> Last card use
1.  Look up last card use
2.  Compute the card velocity:
    •  Subtract last location, time from current location, time
3.  Update last card use
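Step 2 can be made concrete as distance travelled divided by elapsed time. This sketch assumes locations are latitude/longitude pairs and uses the haversine formula; the `MAX_KMH` threshold and the sample coordinates are illustrative assumptions, not values from the talk.

```python
import math
from typing import Tuple

CardUse = Tuple[float, float, float]  # (latitude, longitude, time in seconds)

EARTH_RADIUS_KM = 6371.0
MAX_KMH = 900.0  # hypothetical limit, roughly airliner speed


def distance_km(a: CardUse, b: CardUse) -> float:
    """Great-circle distance between two card uses (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def velocity_kmh(last: CardUse, current: CardUse) -> float:
    """Card velocity: distance between uses divided by the time between them."""
    distance = distance_km(last, current)
    hours = (current[2] - last[2]) / 3600.0
    if hours <= 0:
        return math.inf if distance > 0 else 0.0
    return distance / hours


def is_suspicious(last: CardUse, current: CardUse) -> bool:
    return velocity_kmh(last, current) > MAX_KMH


new_york = (40.7128, -74.0060, 0.0)
los_angeles = (34.0522, -118.2437, 600.0)     # ten minutes later
same_store = (40.7128, -74.0060, 3600.0)      # one hour later, same place
```

A card used in New York and then in Los Angeles ten minutes later implies an impossible speed, while a second use at the same store an hour later does not.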

What Happens Next?
POS 1..n -> Fraud detector (x3) -> one shared Last card use
1.  Read last card use
2.  Compute the card velocity
3.  Update last card use

Service Isolation: Separate Read from Write
POS 1..n -> Fraud detector -> card activity -> Updater -> Last card use
Fraud detector: Read last card use

Separate Read Model from the Write Model:
Command Query Responsibility Segregation
POS 1..n -> Fraud detector -> card activity -> Updater -> Last card use
Fraud detector: Read last card use
Fraud detector: Event to card activity
Updater: Write last card use

Event Sourcing: New Uses of Data
Processing Same Message for Different Views
POS 1..n -> Fraud detector -> card activity
card activity -> Updater -> Last card use
card activity -> Card location history
card activity -> Other

Scaling Through Isolation
POS 1..n (x2) -> Fraud detector (x2) -> card activity -> Updater (x2) -> Last card use (x2)
Multiple fraud detectors can use the same message queue

Lessons
De-coupling and isolation are key
Propagate events, not table updates

Real World Solution

Use Case: Streaming System of Record for Healthcare
Objective:
•  Build a flexible, secure
healthcare exchange
Records Analysis
Applications
Challenges:
•  Many different data models
•  Security and privacy issues
•  HIPAA compliance
Records

ALLOY Health:
Exchange State HIE
Clinical Data Viewer
Reporting and Analytics
Clinical Data
Financial Data
Provider
Organizations

This is a PAIN!
COMPLIANCE
SECURITY CONTROLS
COMPLIANCE FEATURES
PRIVACY
PCI DSS 3.0
21 CFR Part 11
SSAE16 / SOC2
HIPAA/HITECH

WHY NOW?
2014 FQ4 profit
$ -440 M
Total Cost Estimate
$ -12 B

Why Now? The Relational database is not the only tool
Patient 1234
Attribute     Value
patient_id    1234
Name          Jon Smith
Age           50
Patient 999
Attribute     Value
patient_id    999
Name          Jonathan Smith
DOB           Jun 1965
Provider 86
Attribute     Value
provider_id   86
Name          Dr. Nora Paige
Specialty     Diabetes
Prescription 9876
Attribute     Value
rx_id         9876
Name          Sitagliptin
Dosage        325mg
Context and Relationships: Visited, Prescribed, WasPrescribed

WHY NOW? Mind the Gap

Streaming System of Record for Healthcare
Records -> pre-processor -> Stream Topic (Immutable Log: 6 5 4 3 2 1)
Streaming System of Record -> Consumer workflows (x3) -> Materialized Views
Materialized Views: Search, Graph DB, JSON, HBase
Materialized Views -> Micro Services (x6) -> API -> Applications

Raw Data / App -> API -> pre-processor -> Immutable Log: Document Log (MapR-FS)
Immutable Log -> workflows -> materialized views:
•  Key/Value (MapR-DB)
•  Search Engine
•  Graph (ArangoDB)
•  Time Series (OpenTSDB)
CEP
Materialized views -> micro services (x8) -> Apps
The Promised Land: Compliance Auditor smiley faces
•  Data Lineage
•  Audit Logging

Solution
Design/architecture solved some
•  Streams
•  Data Lineage/System of Record
•  Kappa Architecture (Kreps/Kleppmann)
MapR solved others
•  Unified Security
•  Replication DC to DC
•  Converge Kafka/HBase/Hadoop to one cluster
•  Multi-tenancy (lots of topics, for lots of tenants)

Challenge: Major Latency from Batch File Transfer
20-30 Minutes

Regional Datacenter
File Server -> Producer (Java) -> Topic -> Consumer (Java) -> Index -> Elasticsearch -> Kibana Dashboard
Filtering config
•  Monitoring directory
•  Parsing CSV files
•  Publishing messages to topic
•  Parsing master data
•  Subscribing to topic
•  Join tables
•  Aggregation
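The steps listed on this slide can be sketched end to end: parse CSV rows, publish them to a topic, then join against master data and aggregate on the consumer side. The slide describes Java components; this Python sketch is only an illustration, and the CSV columns, device names, and site names are invented sample data.

```python
import csv
import io
from collections import Counter
from typing import Dict, List

# Hypothetical file contents and master data, for illustration only.
csv_text = "device_id,errors\nd1,2\nd2,5\nd1,1\n"
master_data = {"d1": "site-a", "d2": "site-b"}

topic: List[Dict[str, str]] = []  # stands in for the stream topic


def produce(text: str) -> None:
    """Producer: parse CSV rows and publish each one as a message."""
    for row in csv.DictReader(io.StringIO(text)):
        topic.append(row)


def consume(messages: List[Dict[str, str]], master: Dict[str, str]) -> Dict[str, int]:
    """Consumer: join each message with master data, then aggregate."""
    totals: Counter = Counter()
    for message in messages:
        site = master.get(message["device_id"], "unknown")
        totals[site] += int(message["errors"])
    return dict(totals)


produce(csv_text)
errors_by_site = consume(topic, master_data)
```

In the architecture on the slide, the aggregated result would be indexed for the dashboard instead of being returned as a dictionary.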

Streams:
Topic: A
Topic: B
Topic: C
Topic: A
Topic: B
Topic: C
Replicating to
another
cluster

Central Data Center
Ad-hoc
analysis
Other Data
Sources
Real-time
analysis
Reporting
Streaming
Stream
Topic
Replicating
Regional Data Centers
Stream
Topic
Stream
Topic
Performance and other monitoring related data.
Aggregation of data across all regional data centers

Stream Processing
Building a Complete Data Architecture
MapR File System
(MapR-FS)
MapR Converged Data Platform
MapR Database
(MapR-DB)
MapR Streams
Sources/Apps Bulk Processing

To Learn More:
•  Streaming Architecture ebook
•  https://mapr.com/streaming-architecture-using-apache-kafka-mapr-streams/

MapR Blog
• https://www.mapr.com/blog/

To Learn More:
•  End to End Application for Monitoring Uber Data using Spark ML
•  https://mapr.com/blog/monitoring-real-time-uber-data-using-spark-machine-learning-streaming-and-kafka-api-part-1/

…helping you put data technology to work
●  Find answers
●  Ask technical questions
●  Join on-demand training course discussions
●  Follow release announcements
●  Share and vote on product ideas
●  Find Meetup and event listings
Connect with fellow Apache Hadoop and Spark professionals
community.mapr.com

To Learn More:
•  MapR Free ODT http://learn.mapr.com/

Q&A
ENGAGE WITH US
