Today's customers demand applications that integrate intelligently with data from mobile, social media and cloud sources. A system of engagement meets these expectations by applying data and analytics drawn from an array of master systems. The enormous scale and performance required overwhelm relational approaches, but we can use MongoDB to meet the challenge. We'll learn to capture and transmit data changes among disparate systems, expose batch data as interactive operational queries and build systems with strong separation of concerns, agility and flexibility.
Part 2 In The Data Management Series
Engaging with your data – Data integration, Capture data changes
• Part 1 – From Relational to MongoDB
• Part 2 – Conquering Data Proliferation
• Part 3 – Bulletproof Data Management
Result
• Data walled off in "silos"
• Can't get a complete picture
• Have to "swivel chair" from system to system
• Hard to find new avenues to add value
• Frustrated ops
• Frustrated customers
Example
• 20+ million Veterans in the US today
• 250,000+ employees at Veterans Affairs
• $3.9 billion for IT in 2015 budget
• What happens when a Veteran has to change their
address with the VA?
• How does a doctor see a single view of a Veteran's
health record?
Big Wave of Change Happening
Today's Systems of Record were
yesterday's Systems of Engagement!
Enterprise IT Transition From
• Systems of Record
To the Next Stage
• Systems of Engagement
Definition
• Incorporate technologies which encourage peer
interactions
• More decentralized
• More options for infrastructure especially cloud
• Enable new / faster interactions
Notional Architecture
• Systems of Engagement layer: Data Services; Data Processing (Integration, Analytics, etc.)
• Central engagement database: Master Data, Raw Data, Integrated Data
• Built on top of the existing Systems of Record
Many Complexities to Tackle
• Data Extraction (ETL)
• Change Data Capture (CDC)
• Data Governance
• Data Lineage
– Versioning
– Merging changes
• Security / Entitlements
Focus for Today
• Data Extraction (ETL)
• Change Data Capture (CDC)
• Data Governance
• Data Lineage
– Versioning
– Merging changes
• Security / Entitlements
Don't Boil the Ocean
• Information is often spread across multiple systems of
record
• Start with a read-only view of that information
• Target high value/impact data – "moments of
engagement"
Example – Single View of a Health Record
• Veteran's view
• Doctor's view
• Case worker's view
Single View Architecture
• Same notional architecture as before: Data Services and Data Processing on top of a central engagement database
• ETL pulls records from each System of Record into the engagement database
Single View – Why MongoDB?
• Dynamic schema
• Rich querying
• Aggregation framework
• High scale/performance
• Auto-sharding
• Map-reduce capability (Native MR or Hadoop Connector)
• Enterprise Security Features
Systems of Record Data Model
• Continuity of Care (CCR) XML docs
• Pulled some examples from
http://googlehealthsamples.googlecode.com/svn/trunk/CCR_samples
...
<Immunizations>
  <Immunization>
    <CCRDataObjectID>BB0022</CCRDataObjectID>
    <DateTime>
      <Type>
        <Text>Start date</Text>
      </Type>
      <ExactDateTime>1998-06-13T05:00:00Z</ExactDateTime>
    </DateTime>
    <Source>
      <Actor>
        <ActorID>Jane Smith</ActorID>
        <ActorRole>
          <Text>Ordering clinician</Text>
        </ActorRole>
      </Actor>
    </Source>
    ...
Systems of Record Data Model
...
<Medications>
  <Medication>
    <CCRDataObjectID>52</CCRDataObjectID>
    <DateTime>
      <Type>
        <Text>Prescription Date</Text>
      </Type>
      <ExactDateTime>2007-03-09T12:00:00Z</ExactDateTime>
    </DateTime>
    <Type>
      <Text>Medication</Text>
    </Type>
    <Source>
      <Actor>
        <ActorID>Rx History Supplier</ActorID>
      </Actor>
    </Source>
    <Product>
      <ProductName>
        <Text>TIZANIDINE HCL 4 MG TABLET TEV</Text>
        <Code>
          <Value>-1</Value>
          <CodingSystem>omi-coding</CodingSystem>
          <Version>2005</Version>
          ...
Engagement Data Model
• Leverage dynamic schema / flexible data model
• Use an envelope/wrapper pattern
– Source documents wrap Metadata, Master Data / Common Data Model fields and the Source Data
– Integrated documents wrap Metadata and the Integrated Data
Data Flow
1. Read the most recent CCRs from each source system
2. Create a source document for each CCR in our system of engagement database
   a. Transform XML to JSON for the source data
   b. Record the system and date in the metadata
   c. Pull out the patient's identifying information to the common data
   d. Generate an Id for the raw file
3. Store the original CCR XML into GridFS
4. After each source document is created, update the integrated document for the patient
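A minimal sketch of step 2 of the data flow — wrapping one parsed CCR in the envelope pattern. This is an illustrative in-memory version: the `make_source_document` helper, the sample CCR dict, and the generated `raw_id` (standing in for a GridFS file id) are all assumptions, not part of the original system.

```python
from datetime import datetime, timezone
import uuid

def make_source_document(ccr, system):
    """Wrap one parsed CCR in the envelope pattern: meta + common + source."""
    return {
        "meta": {
            "system": system,                          # which system of record it came from
            "lastUpdate": datetime.now(timezone.utc),  # ingestion timestamp
        },
        # common: identifying fields shared across all source systems
        "common": {"patient": ccr["Patient"]["ActorID"]},
        # source: the full CCR, roughly transformed from XML into a dict
        "source": ccr,
        # pointer to the original CCR XML stored in GridFS (here just a generated id)
        "raw_id": str(uuid.uuid4()),
    }

doc = make_source_document(
    {"Patient": {"ActorID": "D6E5D510-592D-C613-DB46"}}, system="EHR1")
```

In a real pipeline the envelope would be inserted into the engagement collection and the raw XML written to GridFS before the pointer is recorded.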
Engagement Data Model - Metadata
{
_id : ObjectId("556b92b83f7e775b8e92b30a"),
meta : {
system : "EHR1",
lastUpdate : ISODate(...)
...
},
common : { ... },
source : { ... },
raw_id : "..."
}
Engagement Data Model - Source
{
  _id : ObjectId("556b92b83f7e775b8e92b30a"),
  ...
  source : {
    ...
    Immunizations : {
      Immunization : {
        CCRDataObjectID : "BB0022",
        DateTime : {
          Type : {
            Text : "Start date"
          },
          ExactDateTime : "1998-06-13T05:00:00Z"
        },
        Source : {
          Actor : {
            ActorID : "Jane Smith",
            ActorRole : {
              Text : "Ordering clinician"
            }
          }
        },
        ...
      },
      ...
    }
  },
  ...
}
Engagement Data Model - Common
{
_id : ObjectId("556b92b83f7e775b8e92b30a"),
...
common : {
patient : "D6E5D510-592D-C613-DB46..."
},
...
}
Engagement Data Model - Integrated
{
_id : ObjectId("556b92b83f7e775b8e92b30d"),
...
meta : {
  lastUpdate : ISODate(...),
  integrated : [
    { _id : ObjectId("...a") },
    { _id : ObjectId("...b") }
  ]
},
common : { ... },
...
}
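Step 4 of the data flow — rolling a patient's source documents up into one integrated document — can be sketched like this. The `build_integrated` helper and its merge rules (newest `lastUpdate` wins; `common` taken from the first document) are illustrative assumptions; a real implementation would reconcile fields per the enterprise data model.

```python
from datetime import datetime, timezone

def build_integrated(source_docs):
    """Roll a patient's source documents up into one integrated document."""
    return {
        "meta": {
            # newest source update wins as the integrated lastUpdate
            "lastUpdate": max(d["meta"]["lastUpdate"] for d in source_docs),
            # record which source documents were integrated (lineage)
            "integrated": [{"_id": d["_id"]} for d in source_docs],
        },
        # identifying fields are the same across the patient's documents
        "common": source_docs[0]["common"],
    }

docs = [
    {"_id": "a", "meta": {"lastUpdate": datetime(2015, 6, 1, tzinfo=timezone.utc)},
     "common": {"patient": "D6E5D510-592D-C613-DB46"}},
    {"_id": "b", "meta": {"lastUpdate": datetime(2015, 6, 2, tzinfo=timezone.utc)},
     "common": {"patient": "D6E5D510-592D-C613-DB46"}},
]
integrated = build_integrated(docs)
```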
Single View Enables New Interactions
• Deliver faster
• Deliver to new applications (mobile, etc.)
• Improve services
• New analytics
Changing Data
• Now that data is easy to get to, users will want to make
changes
• With single view, can change data in the source systems
of record
• Remember the change of address scenario?
Example – Change of Address
• Enter in different systems
• Call different parts of the organization
• What if you have dependents that
live with you?
Capture Data Changes
• Same architecture, now with a message bus (Apache Kafka) alongside the ETL
• The bus propagates record changes between the engagement database and the Systems of Record
Engagement Data Model - Metadata
{
_id : ObjectId("556c1122c9c8f48313553be5"),
meta : {
system : "PatientRecords",
lastUpdate : ISODate(...),
version : 2,
lineage : { ... },
...
},
common : { ... },
source : { ... }
}
Engagement Data Model - Source
{
_id : ObjectId("556c1122c9c8f48313553be5"),
...
source : {
patientId : "D6E5D510-592D-C613-DB46...",
address1 : "John Smith",
address2 : null,
city : "New York",
state : "NY",
zip : "10007"
},
...
}
Engagement Data Model - Common
{
_id : ObjectId("556c1122c9c8f48313553be5"),
...
common : {
patient : "D6E5D510-592D-C613-DB46...",
address : {
addr1 : "John Smith",
city : "New York",
state : "NY",
zip : "10007"
}
},
...
}
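The common model above omits the empty addr2 field rather than storing a null. A tiny illustrative helper for that mapping (the function name and the source field names mirror the example above, but are assumptions, not the actual pipeline code):

```python
def to_common_address(source):
    """Map a source-system address to the common model, dropping null fields."""
    mapping = {
        "addr1": source.get("address1"),
        "addr2": source.get("address2"),
        "city": source.get("city"),
        "state": source.get("state"),
        "zip": source.get("zip"),
    }
    # In a document model we can simply omit missing fields
    # instead of carrying explicit nulls from the source system.
    return {k: v for k, v in mapping.items() if v is not None}

addr = to_common_address({"address1": "John Smith", "address2": None,
                          "city": "New York", "state": "NY", "zip": "10007"})
```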
Systems of Record Data Model
• Address records can be in different systems
• Each system can be notified of the change to the record
Update Process
1. User accesses an application to change their address
2. User updates their address in the System of
Engagement
3. The address change is broadcast to any Systems of
Record that have registered
4. An adapter applies the address change to the System
of Record in an application-specific manner
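Steps 3 and 4 of the update process are a publish/subscribe flow. In production the bus would be Apache Kafka; the sketch below uses an in-memory stand-in, and the two adapter registrations are hypothetical systems of record:

```python
class ChangeBus:
    """Minimal in-memory stand-in for a message bus such as Kafka."""
    def __init__(self):
        self.subscribers = []          # registered system-of-record adapters

    def register(self, adapter):
        self.subscribers.append(adapter)

    def broadcast(self, event):
        # Each adapter applies the change to its system in its own,
        # application-specific manner.
        for adapter in self.subscribers:
            adapter(event)

applied = []
bus = ChangeBus()
# Hypothetical adapters for two systems of record
bus.register(lambda e: applied.append(("PatientRecords", e["address"]["zip"])))
bus.register(lambda e: applied.append(("BenefitsSystem", e["address"]["zip"])))

# Step 3: the address change is broadcast to every registered system
bus.broadcast({"event": "update", "patient": "D6E5D510",
               "address": {"addr1": "John Smith", "city": "New York",
                           "state": "NY", "zip": "10007"}})
```

A real deployment would also need delivery guarantees, retries and conflict handling, which the in-memory version glosses over.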
Tracking Changes
• Add basic document versioning to track what changed
when
• Prefer the separate "current" and "history" collections
approach
– current contains the last updated version
– history contains all previous versions
• Can query history to see the lineage
(See http://askasya.com/post/revisitversions)
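A sketch of the current/history versioning update, simulated here with in-memory dicts in place of the two MongoDB collections (the `apply_update` helper and its merge rule are assumptions; a real implementation would use atomic writes against both collections):

```python
def apply_update(current, history, doc_id, changes, source):
    """Move the current version to history, then write the new version."""
    old = current[doc_id]
    # The history key embeds the version number (like { id, v } in _id),
    # so a range scan on _id returns a document's full lineage.
    history[(doc_id, old["meta"]["version"])] = old
    new = dict(old)
    new.update(changes)
    new["meta"] = {**old["meta"],
                   "version": old["meta"]["version"] + 1,
                   "lineage": {"event": "update", "source": source}}
    current[doc_id] = new

current = {"be5": {"meta": {"system": "PatientRecords", "version": 1,
                            "lineage": {"event": "insert",
                                        "source": "PatientRecords"}},
                   "common": {"zip": "10007"}}}
history = {}
apply_update(current, history, "be5", {"common": {"zip": "10013"}},
             source="ProfileApp")
```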
Engagement Data Model – Current
{
_id : ObjectId("556c1122c9c8f48313553be5"),
meta : {
system : "PatientRecords",
lastUpdate : ISODate(...),
version : 2,
lineage : {
event : "update",
source : "ProfileApp",
},
...
},
...
}
Engagement Data Model - History
{
_id : {
id : ObjectId("556c1122c9c8f48313553be5"), v : 1
},
meta : {
system : "PatientRecords",
lastUpdate : ISODate(...),
version : 1,
lineage : {
event : "update",
source : "PatientRecords",
},
...
},
...
}
Result – New Possibilities
• Change address in one place!
• Other value-add processes can be triggered by changes
• Example: Automated outreach
– health and benefits centers in the new location
– help moving
• Extend address change to Veteran’s dependents
Keep going
• Keep adding valuable processes to improve or provide
new services
• Phase out legacy if desired
– Part 1 – From Relational to MongoDB
• Improve data governance
– Part 3 – Bulletproof Data Management
• Reduce costs
• Innovate
Summary
• Systems of Engagement give users new ways to interact with data
• You can start small and add value quickly
• MongoDB enables Systems of Engagement
– Dynamic schema
– Fast, flexible querying, analysis, & aggregation
– High performance
– Scalable
– Secure
References
• Systems of Engagement and the Future of Enterprise IT: A Sea Change in Enterprise IT
  http://www.aiim.org/futurehistory
• Systems of Engagement & the Enterprise
  http://www-01.ibm.com/software/ebusiness/jstart/systemsofengagement/
• Geoffrey Moore – The Future of Enterprise IT
  http://www.slideshare.net/SAPanalytics/geoffrey-moore-the-future-of-enterprise-it
• Ask Asya
  http://askasya.com/post/trackversions
  http://askasya.com/post/revisitversions
Hello and welcome to Conquering Data Proliferation, the second talk in our three-part data management path to production series today.
My name is James Kerr and I'm a Solutions Architect here at MongoDB. I've been with the company about 2 and a half years now and have been in the NoSQL space building large scale distributed databases for the last 9 years or so. I work primarily with US government agencies building things on MongoDB but I do work with our commercial customers now and again as well.
As I said, this is part 2 in our 3-part data management path to production series. Hopefully you caught Jay's talk on migrating from relational to MongoDB.
In this talk I'll cover one type of system you could possibly migrate your relational databases to.
I'll talk about what's being called systems of engagement (as opposed to systems of record) and how to get your data to and from that system.
Be sure to catch the 3rd part of the series where Buzz will talk about some clever ways of tackling some of the data governance and quality issues we face when building these types of systems
The title of this talk is "Conquering Data Proliferation" which is a pretty far reaching topic. But the fact is that this is a problem that most enterprises, big and small, are facing today.
Cannot link data together across Systems of Record
Systems of Record not designed to have end users interact with them
What happens when a Veteran has to change their address with the VA?
Veteran has to update different parts of the organization manually
May be propagated to other internal systems (or not)
What happens if an address is not up to date?
Benefits mailed to the wrong address (a delay, or maybe worse, in service)
How does a doctor see a single view of a Veteran's health record?
Next to impossible right now
VA has efforts underway to address this (political issues are outside the scope of this talk)
They need a way and are in fact making efforts to simplify basic functions such as this
So how do we tackle this?
Systems of Engagement is a concept that was introduced by Geoffrey Moore a few years back. - NEXT
He said that Systems of Engagement are the next big wave of change in Enterprise IT. And we see this starting to happen across many organizations today.
This is essentially the transition from our existing, mostly passive, systems of record to connected systems of engagement that are more active and encourage and support peer interactions. Our customers and employees want to interact with business systems the way they do in their personal and social lives.
By some accounts, industry spent over $1 trillion on systems of record and though we continue to spend on them, we have reached a point of diminishing returns.
Remember though that Today's Systems of Record were yesterday's Systems of Engagement – the engagement model has changed. People need to be able to do things themselves. Can't wait for hours for things to happen.
"systems of engagement refers to the transition from current enterprise systems designed around discrete pieces of information ("records") to systems which are more decentralized, incorporate technologies which encourage peer interactions, and which often leverage cloud technologies to provide the capabilities to enable those interactions."
You have probably heard terms such as
Data Lake
Operational Data Layer
Data hub
These are all moves towards a system of engagement where data is central
We need to be able to do this transition while still leveraging our investment in Systems of Record
Analogy: Retrofitting old building with new “connectivity” and interfaces (maintain existing architecture)
This enables a new class of collaborative applications that interact with the data
Today we'll touch on getting data out of source systems of record, pushing changes back to those systems and tracking the lineage of data through the system
So how do we start to approach this?
You have heard a lot about the 360 degree view, or single view, of a customer, product or anything that is core to a business
This is a popular concept that a lot of enterprises are putting solutions in place for
This is a good starting point for a System of Engagement
You often need a view across your core business before you can start to find new ways of interacting with it
Remember the Retrofitting analogy? We are trying to build on top of our existing investment in systems of record and the Information about our core business objects is typically spread across these systems.
Let's start by identifying "moments of engagement" that are of high value to our business and customers and that just providing a view for would make major improvements.
These "moments" are the things that customers/users either expect the most from the business or feel the most pain about when they interact with the business.
Let's go back to our examples at the VA. One of the more complex issues they face is providing a single view of a Veteran's health record.
Providing this view is critical to a Veteran receiving quality care from doctors as well as other services provided by the VA. Right now health records are spread across systems (and agencies at that) and it is very difficult to see a Veteran's entire history. The VA has efforts under way to improve this and the approach is in line with the systems of engagement concept I have been talking about.
So let's go back to our notional architecture…
We have a central database fed and orchestrated by data ingestion and processing capabilities.
This is fronted by data services that are consumed by new applications
Let's put some technologies in place to try it out:
1) For the central database, we'll use MongoDB. No big surprise here and we'll talk more about why MongoDB is a great fit for this in a second.
2) For the ETL and data processing, we'll use Pentaho. There are other options for this but Pentaho has a good integration with MongoDB and is fairly easy to use (see part 1 in this series for more details on migrating data from relational databases)
3) Lastly, we'll use Node.js to build our RESTful services to sit on top of the database
** Describe the data flow:
Records in SOR
ETL pulls parts from SOR and updates SOE
This is a picture you have seen many times over the years so what's different?
Let's talk about some of the features of MongoDB and how they enable the single view:
Dynamic schema – can handle vastly different data together and lets you keep improving and fixing issues over time easily (schema on read). Our example shows two systems, but think about the complexities of integrating data across 5, 10 or even more systems
Rich querying – supporting end users directly requires multiple ways of accessing the data; key/value alone is not sufficient
Aggregation framework – database-supported roll-ups for analysis
High scale/performance – directly impacts customer & user experience, so every second counts
Auto-sharding – can automatically add processing power as data is added
Map-reduce capability (native MR or Hadoop Connector) – batch analysis looking for patterns and opportunities in the single view
Enterprise security – provides the security controls necessary to protect the data
So let's jump back into our example…
We have a couple of different electronic health record (EHR) systems that we are going to pull Continuity of Care, or CCR, records from
They happen to be in XML and have a lot of fields so I just pulled a couple of snippets out
CCRs are meant to track many different types of medical interactions and events
Here we have some things about immunizations…
CLICK
And here we have some data about medications that were prescribed
So how are we going to put that together in one place so we can better interact with it and get a single view across the different CCRs for each patient?
CLICK
Leveraging the dynamic schema capabilities, we can readily create a document data model to encapsulate our original source data, common fields across systems as well as metadata
CLICK
We can start with the source data
The source data can be a roughly transformed version of the data from the system of record so that we can interact with it as JSON/BSON
CLICK
The source data can be wrapped in an envelope document
CLICK
We can track metadata about the source data – source system, date, etc
CLICK
We can also extract any master data or the data that fits into a common enterprise data model out of the source
If the raw data is required, it can be stored or maybe just cached as binary data stored directly in MongoDB and the wrapper document can contain a pointer to it
CLICK, CLICK, CLICK
Let's add a few more source data documents
Now, in the process of creating these documents, we could also be creating an integrated view across them
CLICK, CLICK
There's a lot of flexibility here
Maybe you want to keep the original source data objects as individual documents and then integrate them or maybe you just want to keep the integrated data objects
** add pentaho screenshot if you get a chance
The actual process we go through isn't really that interesting
You've all seen or written basic ETL processes before so I'll just cover it at a high level
For this example, we'll create source documents for each of the original CCRs as well as an integrated view for each patient
The last step can either be done incrementally or once you have completed loading the full batch of source documents for all patients, at which time you would create an integrated document for each patient that you updated.
Here, the source data is transformed from XML into JSON so we can work with the structure in MongoDB. Otherwise, we have to just store it raw in binary or text form
The topic of converting XML to JSON is a whole separate discussion but it can range from simple to complex depending on how general a solution you need.
Also keep in mind that this is an optional step and can be done at later stages in the process of rolling out your system. It may be more beneficial to focus on the "common" fields and integrate them initially.
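On the simple end of that range, transforming CCR XML into a document can be a recursive walk over the element tree. This sketch uses Python's standard library; note it deliberately ignores attributes and would overwrite repeated sibling elements, which a real converter must handle:

```python
import xml.etree.ElementTree as ET

def xml_to_dict(elem):
    """Recursively convert an XML element into a nested dict."""
    children = list(elem)
    if not children:
        return elem.text  # leaf node: just the text value
    # NOTE: repeated sibling tags would collide here; a production
    # converter needs to collect them into lists instead.
    return {child.tag: xml_to_dict(child) for child in children}

ccr = ET.fromstring("""
<Immunization>
  <CCRDataObjectID>BB0022</CCRDataObjectID>
  <DateTime>
    <Type><Text>Start date</Text></Type>
    <ExactDateTime>1998-06-13T05:00:00Z</ExactDateTime>
  </DateTime>
</Immunization>
""")
doc = xml_to_dict(ccr)
```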
In this case, we only have one common data element and that's the patient Id that ties the records together
Our integrated data model can contain metadata about what source data objects we put together.
This helps us understand the lineage
The integrated data model can pull in the patient Id as well as the list of continuity of care records
We can just pull in the fields that we want from each CCR
Now that we have a single view, what do we do next? – NEXT, NEXT
Improve care in the case of our VA example
Having a view is a great first step, but now that data is easy to engage with, users will want to be able to make changes to it as well
Worst case, they can go back to the systems of record directly and change the data there
Let's switch gears a bit and go back to the change of address example we talked about
So let's add a component that will propagate changes from the system of engagement back to the systems of record
In addition to the previous components we put in place for the single view, we need some sort of message processing component to receive and publish data changes back to the source systems. For this example, we will use Apache Kafka as it is pretty commonly used these days.
We'll show changing the integrated data in the system of engagement database and propagating that back to the systems of record
Let's start with the data model in our system of engagement this time…
The metadata looks the same as before, but let's now add the version and lineage fields to track when changes are made. We'll talk more about this in a second.
The source data contains the patient Id and their address as it came from the Patient Records system of record.
Notice the address2 field is null as that is how it came from the source system of record.
We can then pull the address into our common / master data model
Notice that we have an "addr1" field but no "addr2" field because in a document model, we can just omit fields rather than set them to null
For the sake of this example, we can have two simple but different address formats in our systems of record.
In some cases, the systems will track overlapping sets of addresses
In others, they may track completely separate sets. They may be different lines of business for example.
At a high level, conceptually, this process is quite simple
1, 2, 3, 4
As we drill into it though, many complexities arise:
How do we track who changed what?
How do we deal with failure, retries and conflict resolution?
Unfortunately, I don't have all day to talk about this so let's just focus on tracking the changes for now
To track the changes that were made to the data and when, let's extend our data model
There are a number of possible approaches to this and Asya's "Ask Asya" blog does a great job of summarizing the tradeoffs
I prefer the separate "current" and "history" collections approach though. In this approach, your current versions are always in your current collection so it's easy to query the current state and your history collection contains all the prior versions
You can easily query your history collection to see the full lineage of changes made to the documents
The version field contains a number indicating what version of the document is current
Our lineage field can contain the type of event that changed the document as well as the source of the change.
Our history collection contains all the previous versions of the document. We can add the version number to the _id field so we can easily use a range query on _id to get all of the versions of a document.
We can then examine the lineage and lastUpdate fields to see the list of events and when they occurred to understand the lineage of the document
So what do we end up with? We have re-implemented a "moment of engagement" that used to be complicated for operations and frustrating to end users to now be simple and painless.
We can now think about additional processes that we could launch in response to this change:
Let's look for dependents in the Veteran's benefits and, if they live with them, update their address too
Let's do some automatic outreach to help the Veteran get settled in their new location
Combining this with the benefits of creating single views, we can see how truly powerful this can be.
So what's next? - NEXT
Just keep going…
Systems of engagement are a new wave of change happening in enterprise IT. These new systems are transforming businesses and allowing both employees and end users to interact with systems in new ways.
MongoDB enables Systems of Engagement
Dynamic schema can handle data from numerous different systems all in one place
Fast, flexible querying, analysis, & aggregation gets maximum value from the data
High performance allows Systems of Engagement to handle load from a new class of users