The Stream is the Database - Revolutionizing Healthcare Data Architecture
1. The Stream is the Database - Revolutionizing Healthcare Data Architecture
Brad Anderson, VP Big Data Informatics, Liaison
Will Ochandarena, Director of Product, MapR
2. About Us
Brad:
Now:
• Head of Data Management @ Liaison
• Board Member @ OnKöl
Before:
• SE @ MapR
• Founder @ Heartbyte
• Founder @ Verdeeco
Will:
Now:
• Product guy for Streams @ MapR
Before:
• OpenStack product guy @ AMD/SeaMicro
• Network product guy @ Cisco
3. Agenda
• “Stream System of Record” - A Techie Concept (Will)
• Applied “Stream System of Record” @ Liaison (Brad)
5. What’s a Stream Again?
[Figure: Producers → Events_Stream → Consumers]
A stream is an unbounded sequence of events carried from a set of producers to a set of consumers.
6. What’s a Stream Again?
Unlike a queue, a stream keeps events persisted even after they are delivered.
Like a queue, it delivers events in the order they were received.
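The difference from a queue can be sketched as a toy, in-memory, single-partition log in which each consumer tracks its own read offset. This is not the MapR Streams or Kafka API; the names `Stream`, `Consumer`, `produce`, and `poll` are hypothetical. Real systems such as Kafka typically guarantee ordering only within a partition.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Stream:
    """Hypothetical append-only log: events stay in the log after delivery."""
    events: list[Any] = field(default_factory=list)

    def produce(self, event: Any) -> int:
        """Append an event and return its offset (position in the log)."""
        self.events.append(event)
        return len(self.events) - 1

    def read_from(self, offset: int) -> list[Any]:
        """Return every event at or after `offset`, in arrival order.
        Reading removes nothing from the log."""
        return self.events[offset:]


@dataclass
class Consumer:
    """Hypothetical consumer: each one keeps its own offset, so reads are independent."""
    stream: Stream
    offset: int = 0

    def poll(self) -> list[Any]:
        batch = self.stream.read_from(self.offset)
        self.offset += len(batch)
        return batch


s = Stream()
for e in ["e0", "e1", "e2"]:
    s.produce(e)

a, b = Consumer(s), Consumer(s)
print(a.poll())       # ['e0', 'e1', 'e2']
print(a.poll())       # [] (a has caught up)
print(b.poll())       # ['e0', 'e1', 'e2'] (still available to b)
print(len(s.events))  # 3 (nothing was deleted on delivery)
```

A queue would hand `e0` to only one of `a` and `b` and then discard it; here delivery just advances a per-consumer offset.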
7. What Does That Have to Do With a Database?
Stream: DMV_Updates
Imagine each event as a change to an entry in a database.
0: { WillO : {City : Mountain View}, ts : 7/5/2009 04:01:01, src : dmv201 }
1: { BradA : {City : Atlanta}, ts : 5/11/2010 05:11:31, src : dmv1341 }
2: { BradA : {Points : +2}, ts : 6/22/2011 03:31:10, src : officer1213 }
3: { WillO : {City : San Jose}, ts : 11/1/2012 04:01:01, src : dmv1661 }
Resulting table (values update as each event is applied):
DL_ID | City | Points
WillO | Mountain View → San Jose | 0
BradA | Atlanta | 0 → 2
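Materializing the table is a fold over the stream: replay the events in order and apply each one to the row it names. The sketch below assumes that `City` overwrites the previous value, that `Points : +2` is an increment, and that a new row starts with `Points = 0` (as in the table above); the helper names `apply` and `materialize` are hypothetical.

```python
from typing import Any

# The DMV_Updates events from the slide; ts and src are kept for later queries.
DMV_UPDATES: list[dict[str, Any]] = [
    {"id": "WillO", "change": {"City": "Mountain View"}, "ts": "7/5/2009 04:01:01", "src": "dmv201"},
    {"id": "BradA", "change": {"City": "Atlanta"}, "ts": "5/11/2010 05:11:31", "src": "dmv1341"},
    {"id": "BradA", "change": {"Points": +2}, "ts": "6/22/2011 03:31:10", "src": "officer1213"},
    {"id": "WillO", "change": {"City": "San Jose"}, "ts": "11/1/2012 04:01:01", "src": "dmv1661"},
]


def apply(table: dict, event: dict) -> dict:
    """Fold one event into the table: City overwrites, Points is a delta (assumed)."""
    row = table.setdefault(event["id"], {"City": None, "Points": 0})
    for column, value in event["change"].items():
        if column == "Points":
            row["Points"] += value
        else:
            row[column] = value
    return table


def materialize(events: list[dict]) -> dict:
    """Replay events in stream order to build the current state."""
    table: dict = {}
    for event in events:
        apply(table, event)
    return table


print(materialize(DMV_UPDATES))
# {'WillO': {'City': 'San Jose', 'Points': 0}, 'BradA': {'City': 'Atlanta', 'Points': 2}}
```

Replaying only a prefix of the stream (for example `DMV_UPDATES[:2]`) gives the table as it stood at that point in time.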
8. Streams and Databases in Harmony
[Figure: Inserts and Updates → ??? → Key-Val, Document, Graph, Wide Column, Time Series, and Relational databases]
9. What Else Do I Use My Stream For?
• Lineage - “how did BradA’s points get so high?”
• Auditing - “who added points to BradA’s license?”
• History - “where did WillO used to live?”
• Integrity - “can I trust this data hasn’t been tampered with?”
• Yup - Streams are immutable
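Because every event keeps its timestamp and source, the lineage, auditing, and history questions above become simple filters over the stream. This is a minimal in-memory sketch; the functions `history` and `who_changed` are hypothetical illustrations, not a query API of any stream product.

```python
# The DMV_Updates events, with their timestamps and sources kept intact.
EVENTS = [
    {"id": "WillO", "change": {"City": "Mountain View"}, "ts": "7/5/2009 04:01:01", "src": "dmv201"},
    {"id": "BradA", "change": {"City": "Atlanta"}, "ts": "5/11/2010 05:11:31", "src": "dmv1341"},
    {"id": "BradA", "change": {"Points": 2}, "ts": "6/22/2011 03:31:10", "src": "officer1213"},
    {"id": "WillO", "change": {"City": "San Jose"}, "ts": "11/1/2012 04:01:01", "src": "dmv1661"},
]


def history(events, driver, column):
    """History/lineage: every (ts, value) change to `column` for `driver`, oldest first."""
    return [(e["ts"], e["change"][column]) for e in events
            if e["id"] == driver and column in e["change"]]


def who_changed(events, driver, column):
    """Auditing: the sources that changed `column` for `driver`."""
    return [e["src"] for e in events
            if e["id"] == driver and column in e["change"]]


print(history(EVENTS, "WillO", "City"))
# [('7/5/2009 04:01:01', 'Mountain View'), ('11/1/2012 04:01:01', 'San Jose')]
print(who_changed(EVENTS, "BradA", "Points"))
# ['officer1213']
```

A current-state table can answer none of these questions, because it keeps only the latest value of each column.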
10. Which Makes a Better System of Record?
Which of these can be used to reconstruct the other?
0: { WillO : {City : Mountain View}, ts : 7/5/2009 04:01:01, src : dmv201 }
1: { BradA : {City : Atlanta}, ts : 5/11/2010 05:11:31, src : dmv1341 }
2: { BradA : {Points : +2}, ts : 6/22/2011 03:31:10, src : officer1213 }
3: { WillO : {City : San Jose}, ts : 11/1/2012 04:01:01, src : dmv1661 }
DL_ID | City | Points
WillO | San Jose | 0
BradA | Atlanta | 2
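The asymmetry can be shown in a few lines: replaying a stream always yields the table, but different histories can collapse to the same table, so the table cannot be turned back into the stream. The sketch is simplified to the `City` column only, and the event lists are hypothetical.

```python
def materialize(events):
    """Replay (driver, city) events; a later event overwrites an earlier one."""
    table = {}
    for driver, city in events:
        table[driver] = city
    return table


stream_a = [("WillO", "Mountain View"), ("WillO", "San Jose")]
stream_b = [("WillO", "San Jose")]  # a different history

# Stream -> table always works:
print(materialize(stream_a))  # {'WillO': 'San Jose'}
# ...but both histories give the same table, so table -> stream is impossible:
print(materialize(stream_a) == materialize(stream_b))  # True
```

This is why the stream, not the table, is the better candidate for the system of record.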
11. What Do I Need For This to Work?
• Events persisted indefinitely
• A way to query your persisted stream data
• An integrated security model across the stream and databases
21. [Architecture diagram: App and API → pre-processor → immutable log of raw data (Document Log, MapR) → workflows → materialized views in Key/Value (MapR), Search Engine (ElasticSearch), Graph (ArangoDB), and Time Series (OpenTSDB) stores; also CEP, microservices, and apps. Labels: The Promised Land, Compliance Auditor.]
22. The Promised Land
• Auditor smiley faces
• Data Lineage
• Audit Logging
• Wire-level encryption
• At Rest encryption
• Replication
• Disaster Recovery
• EU – data can’t leave
• Non-Stream / Non-“Big Data”
• Software Development Lifecycle
• System Hardening
• Separation of Concerns
• Dev vs Ops
• Patch Management
23. Solution
• Design/architecture solved some
  • Streams
  • Data Lineage/System of Record
  • Kappa Architecture (Kreps/Kleppmann)
• MapR solved others
  • Unified Security
  • Replication DC to DC
  • Converge Kafka/HBase/Hadoop to one cluster
  • Multi-tenancy (lots of topics, for lots of tenants)
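In the Kappa Architecture as Jay Kreps described it, changing how a view is computed does not need a separate batch layer: you run the new job over the retained log from the beginning, write to a new view, and switch readers over once it has caught up. The sketch below is a hypothetical, in-memory illustration of that reprocessing step; `view_v1`, `view_v2`, and the `moves` field are invented for the example.

```python
# Hypothetical retained log of (driver, city) events.
LOG = [
    ("WillO", "Mountain View"),
    ("BradA", "Atlanta"),
    ("WillO", "San Jose"),
]


def view_v1(log):
    """Original job: current city per driver."""
    view = {}
    for driver, city in log:
        view[driver] = city
    return view


def view_v2(log):
    """Revised job: current city plus number of moves, from the same log."""
    view = {}
    for driver, city in log:
        prev = view.get(driver)
        moves = 0 if prev is None else prev["moves"] + 1
        view[driver] = {"city": city, "moves": moves}
    return view


# Reprocessing: rebuild the new view by replaying the full log from offset 0,
# then point readers at it. The old view can be dropped afterwards.
views = {"v1": view_v1(LOG)}
views["v2"] = view_v2(LOG)
active = "v2"  # switch readers once v2 has caught up
print(views[active]["WillO"])  # {'city': 'San Jose', 'moves': 1}
```

This only works because the log is retained in full, which is the same requirement as the stream being the system of record.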
While streams are generic constructs, they can be used to model databases. Imagine that each “event” is an incremental update to an entry in a database. In this case, the state of a particular database entry is simply the accumulation of all events pertaining to that entry.
While a stream can stand in for a database, it isn’t a replacement for one: there are lots of databases out there, each optimized for a particular type of lookup. Usually no single database can even replace another, so what do you do when you need to keep multiple systems synchronized?
You add a stream.
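A minimal sketch of "adding a stream": every view consumes the same events in the same order, so views with very different shapes stay consistent. Here a dictionary stands in for a key-value store and a city-to-drivers index stands in for a search engine; both classes and `fan_out` are hypothetical. In a real deployment each view would be a separate consumer with its own offset rather than one synchronous loop.

```python
from collections import defaultdict

# Hypothetical (driver, column, value) change events in the DMV_Updates style.
EVENTS = [
    ("WillO", "City", "Mountain View"),
    ("BradA", "City", "Atlanta"),
    ("WillO", "City", "San Jose"),
]


class KeyValueView:
    """Latest value per driver and column: a stand-in for a key-value store."""
    def __init__(self):
        self.rows = defaultdict(dict)

    def apply(self, driver, column, value):
        self.rows[driver][column] = value


class CityIndexView:
    """Drivers currently in each city: a stand-in for a search index."""
    def __init__(self):
        self.by_city = defaultdict(set)
        self._current = {}

    def apply(self, driver, column, value):
        if column != "City":
            return
        old = self._current.get(driver)
        if old is not None:
            self.by_city[old].discard(driver)  # driver moved away
        self.by_city[value].add(driver)
        self._current[driver] = value


def fan_out(events, views):
    """Feed every event, in order, to every view."""
    for event in events:
        for view in views:
            view.apply(*event)


kv, idx = KeyValueView(), CityIndexView()
fan_out(EVENTS, [kv, idx])
print(dict(kv.rows))                         # {'WillO': {'City': 'San Jose'}, 'BradA': {'City': 'Atlanta'}}
print(sorted(idx.by_city["San Jose"]))       # ['WillO']
print(sorted(idx.by_city["Mountain View"]))  # []
```

Adding another database later means adding another consumer and replaying the stream into it; the existing views are untouched.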
Streams are good for more than keeping your databases in sync. You can think of your stream as yet another database in your architecture; you just query it for different kinds of questions, such as lineage, auditing, and history.