BETTER WITH
BITEMPORAL
MARKLOGIC WHITE PAPER • JUNE 2015
In our age of billion-dollar regulatory fines and time-consuming, costly litigation,
a database must hold up as the main system of record. Unfortunately, traditional
databases do not keep a complete history of the past. Only with a bitemporal
database can you truly maintain a complete and accurate picture of the past,
understanding exactly “what you knew” and “when you knew it.”
ASSESSMENT: DO YOU NEED BITEMPORAL?
Before you go any further, it is probably helpful to first ask whether you might need bitemporal data management
in your organization. If you answer “yes” to any of the following questions, then bitemporal is a solution that you
should consider.
YES NO
1. Is tracking when events or transactions occur critical to your business? ✔
2. Are there ever cases when historical data needs to be updated? ✔
3. Do you run into circumstances in which there is a lag between when something happened in the real world, and when it was recorded in the database? ✔
4. Do you get frequent requests from regulators to review historical data? ✔
5. Do you work in an industry in which the sequence of when you learn about certain information is significant, such as in law and intelligence? ✔
6. Is the cost and complexity of storing and accessing historical data in your organization overwhelming? ✔
7. Does managing and accessing historical data cost significant developer resources, or carry increasing risk over time? ✔
Contents
Introduction
  The Cost of Not Having Bitemporal
Three Types of Temporality
  Non-temporal
  Unitemporal
  Bitemporal
The Benefits of Bitemporal
  Things You Can Do With Bitemporal
  The Increasing Need for Bitemporal
  Bitemporal Across Industries
Why Bitemporal Has Been Difficult
Why the Time for Bitemporal is Now
  Key Features of Bitemporal in MarkLogic
  Get Going Quickly
  More Information
INTRODUCTION
Today, databases are the primary system of record,
not paper. In this new reality, organizations are required
to keep an accurate picture of all the facts, as they
occur. For certain industries such as financial services,
insurance, and healthcare, there are even laws that
mandate how historical data is tracked and managed.
Unfortunately, traditional databases cannot provide
a truly accurate picture of your business at different
points-in-time. The reason is that traditional databases
are unitemporal, and can only track start and end
times along a single timeline. But, what if there is a lag
between when something happened and when you
found out about it? Which time should you record?
Or, what if you realize you need to make a correction
to when something happened, but do not want to
overwrite any historical data? In those cases, a single
timeline is not enough.
With a bitemporal database, you can store and query
data along two timelines with timestamps for both
valid time—when a fact occurred in the real world
(“what you knew”), and also system time—when that
fact was recorded to the database (“when you knew
it”). By tracking events along two timelines with a
bitemporal database, it is possible to keep a complete
and accurate picture of your business at any given time
for internal search and discovery purposes or for when
regulators conduct audits.
Consider some of the new questions that a bitemporal
database allows you to ask:
•	 What were my customer’s credit ratings last year
as I knew them last quarter?
•	 What was our position with that security before the
trade was amended?
•	 What did our intelligence indicate before we
learned that new piece of information?
With a traditional unitemporal database, you can ask
what your customer’s credit ratings look like as you
know them today, but not as you knew them yesterday or last quarter.
Only a bitemporal database allows you to go back
and see an accurate and unaltered picture of historical
data, including past and present changes. A bitemporal
database is necessary for today’s enterprises to be able
to accurately explore historical data, manage that data
across systems, ensure full data integrity, and do more
complex analysis.
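The first question above can be made concrete with a small sketch. It is plain JavaScript over an in-memory array, not MarkLogic's API. The field names (validStart, validEnd, systemStart, systemEnd), the asOf helper, and the sample ratings are all hypothetical and chosen only for illustration.

```javascript
// Hypothetical sample data: each version of a fact carries two intervals.
// Valid time: when the rating applied in the real world.
// System time: when this version was the database's current belief.
const END_OF_TIME = "9999-12-31T23:59:59Z";

const ratings = [
  // Originally recorded: the customer was rated AA throughout 2014.
  { customer: "acme", rating: "AA",
    validStart: "2014-01-01T00:00:00Z", validEnd: "2015-01-01T00:00:00Z",
    systemStart: "2014-01-05T00:00:00Z", systemEnd: "2015-04-10T00:00:00Z" },
  // Correction recorded on 2015-04-10: the 2014 rating was really A.
  { customer: "acme", rating: "A",
    validStart: "2014-01-01T00:00:00Z", validEnd: "2015-01-01T00:00:00Z",
    systemStart: "2015-04-10T00:00:00Z", systemEnd: END_OF_TIME },
];

// Half-open interval test. ISO-8601 strings in one fixed format sort as text.
function contains(start, end, t) {
  return start <= t && t < end;
}

// Versions that describe the moment validAt, as the database stood at systemAt.
function asOf(records, validAt, systemAt) {
  return records.filter(r =>
    contains(r.validStart, r.validEnd, validAt) &&
    contains(r.systemStart, r.systemEnd, systemAt));
}

// "Last year's rating as I knew it last quarter" versus "as I know it today".
const lastQuarterView = asOf(ratings, "2014-06-30T00:00:00Z", "2015-03-31T00:00:00Z");
const todayView = asOf(ratings, "2014-06-30T00:00:00Z", "2015-06-15T00:00:00Z");
console.log(lastQuarterView.map(r => r.rating)); // [ 'AA' ]
console.log(todayView.map(r => r.rating));       // [ 'A' ]
```

Both queries ask about the same real-world date; only the system-time argument differs, which is exactly the distinction a single timeline cannot express.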
MarkLogic® is an Enterprise NoSQL database that is
best suited for storing and managing bitemporal data
for the following reasons:
•	 Flexible Data Model – MarkLogic’s document-
oriented data model is schema-agnostic and able
to manage the complexities of bitemporal data
that relational databases are ill-suited for, such
as integrity constraints, evolving schemas, and
multiple different data models.
•	 Enterprise Reliability – MarkLogic has the
enterprise features that other new generation
databases do not. MarkLogic is a proven database
that runs mission-critical applications at hundreds
of world-leading organizations.
•	 Bitemporal Out-of-the-Box – Bitemporal is
a feature built-in to MarkLogic whereas other
vendors make it an additional software add-on that
increases cost and complexity.1
1	 Hudson Foods recalled one-fifth of their annual output in 1997 due to an
outbreak of E. coli, costing them an estimated $25 million. Their database only
allowed them to see a current view of which beef came from which sources, and not
a view of their data as it existed on the day the supplier processed the small batch
of contaminated meat. This meant the entire product had to be recalled. For more
information, see Richard T. Snodgrass’ book, Developing Time-Oriented Database
Applications in SQL (ch.2, 11).
THE COST OF NOT HAVING BITEMPORAL
The lack of bitemporal data has been blamed directly for
costing one company $25 million.¹ It has cost
(or perhaps saved) many politicians their jobs.
In our age of super-regulation and the need
to maintain provenance, immutability, and
governance with historical data, the potential
cost of not using bitemporal grows much larger.
This is particularly true in industries such as
financial services, where not having an accurate
picture of the past has contributed to multi-billion-dollar
fines and further increases in regulation.
THREE TYPES OF TEMPORALITY
To understand bitemporal, you first have to understand
how databases currently manage time. In relation to
time, there are three basic categories of databases:
non-temporal, unitemporal, and bitemporal. Each
type is discussed below, using the example of when
a patient was diagnosed with an allergy and when the
doctor found out about it as a guide.
NON-TEMPORAL
Non-temporal databases store data with no time
dimension. A fact is just a fact—there is no history
and it is only understood to be true at the current
point in time. Data models that do not support a time
dimension are just called snapshots.
Just imagine the example of when a patient was
diagnosed with an allergy, which is an important piece
of information considering the potential adverse and
even deadly reactions that some patients can have
to common medications like penicillin. With a non-
temporal database, you would just see the current
state, which would be either “patient has no allergy” or
“patient has a positive allergy diagnosis,” as depicted in
Figure 1 in which the shaded area represents when the
fact is true.
In a non-temporal database, you just get a single
view of the data without respect to time. It should not
be surprising that non-temporal databases are very
uncommon, as most applications deal with time-
varying data.
UNITEMPORAL
Unitemporal databases support time across one
dimension: valid time. Most people think of valid
time simply as “time”: it represents when something
happened in the real world. Valid time is tracked along a
single timeline to answer questions such as: When did
that patient get diagnosed with an allergy? How many
patients have that same allergy? How long has the
patient had the allergy? In the example of the patient
with the allergy, it is clear from the graph in Figure 2 that
the patient was diagnosed with an allergy at 9:00am
along the valid timeline.
The problem is that valid time only shows a piece of
the picture. Looking at Figure 2, it would not be
clear to an outside observer when the doctor learned
that the patient was diagnosed with the allergy. What
if it was the lab that first discovered the allergy, but
there was a lag in time before the doctor actually found
out about the lab results? That is valuable information
that is not recorded in a unitemporal database. In
this example, imagine if a drug was administered
to the patient that day that caused an anaphylactic
reaction—didn’t the doctor know not to administer that
drug? Let’s look at how to solve this problem with a
bitemporal database.
BITEMPORAL
A bitemporal database records timestamps for events
along two dimensions of time: valid time and system
time. Valid time tracks when an event occurred in the
real world. System time (sometimes called “transaction
time”) tracks when the event is recorded to the
database. These two time dimensions are depicted
graphically along both axes in Figure 3. In this example,
valid time represents when the lab discovered the
allergy, and system time represents when the doctor
found out about it and recorded it to his chart.
Unitemporal databases make the false assumption
that valid time is always equal to system time, and in
doing so lose valuable information. Sometimes, as
Figure 3 depicts, valid time is equal to system time. But
you would not know that unless you had a bitemporal
database. A bitemporal database records time along
both dimensions independently so you can keep
accurate records.
[Figure 1 image: a single shaded region labeled “Positive allergy diagnosis,” with no time axis.]
FIGURE 1: A nontemporal database does not store any time dimensions.
[Figure 2 image: a timeline labeled TIME with ticks at 9 AM, 10 AM, 11 AM, and 12 PM, showing “No allergy diagnosis” followed by “Positive allergy diagnosis.”]
FIGURE 2: A unitemporal database only tracks valid time.
Using the example of the patient with the allergy,
imagine that the doctor actually found out about the
allergy at 10:30am, an hour and a half after the lab
did their tests and concluded that the patient had an
allergy. The lab noted that the patient had an allergy at
9:00am, but that information did not get to the doctor
until 10:30am. This represents a lag between valid time
and system time, and would look like Figure 4.
Taking this example a bit further, imagine that later
on the same day, at 11:30am, the doctor gets a call
from the lab saying that they just discovered that they
did the tests incorrectly. The lab result was actually
negative—the patient does not have an allergy. This
correction is shown in Figure 5. With a bitemporal
database, it is easy to make corrections to historical
data, and the process does not overwrite any data.
By looking at Figure 5, we can ascertain the
following facts:
•	 Before 10:30am (system time), the doctor did not
know about the allergy
•	 At 10:30am (system time), the doctor recorded
the patient having an allergy, which had been
discovered by the lab at 9:00am (valid time)
•	 At 11:30am (system time), the lab and doctor
discover the mistake and update the records to
show that the patient does not have an allergy
With this timeline tracked across both axes, it is now
possible to go back and see a true picture of events.
This can be extremely helpful in understanding and
avoiding mistakes, as the doctor’s decisions can be
easily matched against what he knew or did not know at any
given point in time. In the setting of a hospital, drug
allergies can be life-threatening, so having an accurate
record of when a patient was diagnosed and when care
providers learned this information is critical.
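The allergy timeline can be replayed in a few lines of code. This is a minimal sketch in plain JavaScript, not MarkLogic's API. The record and believed helpers, the field names, and the calendar date are hypothetical, and the store assumes a single patient and a single fact, so every new entry supersedes the previous one.

```javascript
// Hypothetical append-only store: versions are added and never removed.
// The only change to an old version is closing its open system end time.
const OPEN = "9999-12-31T23:59";
const chart = [];

// Record a belief about a valid-time interval at a given system time.
function record(store, fact, validStart, validEnd, systemTime) {
  for (const v of store) {
    // Close the system interval of whatever was believed until now.
    if (v.systemEnd === OPEN) v.systemEnd = systemTime;
  }
  store.push({ fact, validStart, validEnd, systemStart: systemTime, systemEnd: OPEN });
}

// What did the chart say about the moment validAt, as known at systemAt?
function believed(store, validAt, systemAt) {
  const hit = store.find(v =>
    v.validStart <= validAt && validAt < v.validEnd &&
    v.systemStart <= systemAt && systemAt < v.systemEnd);
  return hit ? hit.fact : "nothing recorded";
}

// 10:30 system time: the doctor records the lab's 9:00 finding (the lag).
record(chart, "positive allergy", "2015-06-01T09:00", OPEN, "2015-06-01T10:30");
// 11:30 system time: the lab reports its mistake (the correction).
record(chart, "no allergy", "2015-06-01T09:00", OPEN, "2015-06-01T11:30");

console.log(believed(chart, "2015-06-01T09:30", "2015-06-01T10:00")); // nothing recorded
console.log(believed(chart, "2015-06-01T09:30", "2015-06-01T11:00")); // positive allergy
console.log(believed(chart, "2015-06-01T09:30", "2015-06-01T12:00")); // no allergy
console.log(chart.length); // 2
```

All three queries ask about the same valid-time moment (9:30am); the answer changes only with the system time, and the superseded "positive allergy" version remains in the store after the correction.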
[Figures 3 to 5 images: each plots SYSTEM TIME (“When the doctor found out about it”) against VALID TIME (“When the lab discovered the allergy”), with both axes marked 9 AM, 10 AM, 11 AM, and 12 PM and regions labeled “Positive allergy diagnosis” and “No allergy diagnosis.” Figure 4 marks the LAG; Figure 5 marks the CORRECTION.]
FIGURE 3: A bitemporal database tracks both valid time and system time.
FIGURE 4: A bitemporal database tracks lags in information.
FIGURE 5: A bitemporal database tracks corrections without overwriting data.
The example of the allergy diagnosis may seem
somewhat simple, but the same concept can be
applied to any piece of data, whether it is when a
financial trade occurred, when someone got insurance,
or when someone owned a house. In all of these cases,
START DATE and END DATE for both valid time and
system time can be tracked in order to preserve the
most accurate picture of reality.
TABLE 1: Comparing Unitemporal to Bitemporal for a variety of examples.
UNITEMPORAL | BITEMPORAL
When did the lab results indicate that the patient had an allergy to penicillin? | When did the lab results indicate that the patient had an allergy to penicillin, and when did the care provider learn about the allergy?
When was the sell order cancelled by the bank’s counter party? | When was the sell order cancelled by the bank’s counter party, and when did the trader learn that it was cancelled?
What reference data existed regarding trade events on December 4th? | What reference data did the trader actually have on December 4th?
When did John become eligible for insurance coverage, as the employment records indicate now? | When did John become eligible for insurance coverage, as the employment records indicated in 2012?
THE BENEFITS OF BITEMPORAL
Bitemporal, simply put, gives you a better way to
manage time. No alternative to bitemporal, even
temporal versioning, can provide a seamless, queryable,
flexible view of historical data. Bitemporal
is a critical capability any organization can take
advantage of, and the need for it is growing
particularly fast in industries that face mounting
regulatory pressures and litigation, such as financial
services, insurance, and healthcare. In these industries,
organizations are having to better account for all of their
past actions with the onset of new laws and litigation,
more frequent and in-depth audits, and increased fines
for non-compliance. Organizations that better manage
their historical data are able to reduce their risk and get
through audits unscathed.
THINGS YOU CAN DO WITH BITEMPORAL
•	 Handle Regulation and Audits – Provide an
accurate picture of the past to meet requirements
for increased transparency and accountability
•	 Manage Risk – Create better risk models and
improve business intelligence by analyzing true
historical data
•	 Reduce Costs – Simplify architecture and reduce
the cost and operational risk of storing redundant
historical data
THE INCREASING NEED FOR BITEMPORAL
The need to better manage regulatory concerns is
growing in general, though it is having a particularly
significant impact in certain industries, such as financial
services. Large banks have been hit with record-
breaking fines in recent years, coupled with an increase
in regulatory pressures. Since 2009, banks in the U.S.
and Europe have paid over $128 billion to regulators,
and 2014 was the biggest year ever, with $65 billion in
penalties and fines, about 40% greater than in 2013.²
Today, regulators are more intrusive and carry out more
vigorous enforcement as they drill into the details.
According to Gerold Grasshoff, the global head of
risk management and regulation at Boston Consulting
Group, regulatory pressures are now a core issue for
banks. “You have to change your operating model,
change your products, change the legal risks now...
Nothing is changing business models as much as
the regulatory issues. That is the biggest strategic
challenge.” To adapt to the changing way in which
business is done, banks are having to change their IT
and data management approaches to
increase transparency.³
Other industries are also facing increased regulatory
pressures. In healthcare, for example, there is the
cost of medical errors, which some reports
estimate at $1 trillion.⁴ Knowing when and how
errors occurred is critical to improving medical decision
making and avoiding medical malpractice. Consider also
the growing cost of fraud and abuse across
the healthcare industry, estimated to be anywhere
between $82 billion and $272 billion in the U.S.⁵ Unfortunately,
the general cost and complexity surrounding patient
safety, malpractice litigation, and fraud and abuse are
only increasing.⁶
2	 James Sterngold, “For Banks, 2014 Was a Year of Big Penalties”, Dec.
30, 2014 <http://www.wsj.com/articles/no-more-regulatory-nice-guy-for-banks-1419957394>
3	 Boston Consulting Group, “Building the Transparent Bank”, Dec. 2014
<https://www.bcgperspectives.com/Images/Building_the_Transparent_Bank_Dec_2014_tcm80-177814.pdf>
4	 Andel, Davidow, Hollander, Moreno. “The economics of health care quality and
medical errors.” Journal of Health Care Finance 39(1):39-50 (2012)
<http://www.ncbi.nlm.nih.gov/pubmed/23155743>
By implementing bitemporal data management,
organizations can take a bold step towards lowering
risk, improving transparency, and gaining a competitive
advantage.
BITEMPORAL ACROSS INDUSTRIES
FINANCIAL SERVICES
Bitemporal helps large banks better manage their data
and adapt to the changes in laws and regulation that
are impacting how business is done. For example,
bitemporal helps by providing an accurate record of
trades as they occur and are amended. After trades are
made, they are later reconciled with counterparties and
updates often occur before the trade is closed. With
a unitemporal database, updates overwrite historical
data, which can put enormous risk on individual traders
and entire companies. Bitemporal provides an accurate
picture of the entire lifecycle of a trade review, including
when changes to counterparty names or transaction IDs,
or price corrections, occurred.
INSURANCE
In the insurance industry, bitemporal helps by providing
a clear determination of coverage over the course
of history, ensuring that even if there are retroactive
changes, data is never overwritten.
5	 Berwick, Hackbarth. “Eliminating waste in US health care.” JAMA
307(14):1513-6 (2012) <http://www.ncbi.nlm.nih.gov/pubmed/22419800>
6 James Sterngold. “For Banks, 2014 Was a Year of Big Penalties.” Wall Street
Journal, 2014.
TABLE 2: Bitemporal in Financial Services
BEFORE BITEMPORAL | AFTER BITEMPORAL
What do we think the trader’s position was, and what information do we think was available to the trader around the time when the trade was executed? | What was the trader’s exact position when the trade was executed, and what exact reference data was available at the time the trade was executed?
What were our customer’s credit ratings last year? | What were our customer’s credit ratings last year, as we knew them last quarter?
What was our market exposure when that trade was made at 11:00am? | What was our market exposure when that trade was made at 11:00am, as we knew it at 11:30am?
What was the company’s profit when we gave guidance? | What did we think the company’s profit was when we gave guidance?
TABLE 3: Bitemporal in Insurance
BEFORE BITEMPORAL | AFTER BITEMPORAL
What was the estimated impact of the disaster on insurance premiums? | What was the estimated impact of the disaster on insurance premiums, before the data was adjusted retroactively?
Did the beneficiary have coverage at the point of diagnosis? | Did the beneficiary have coverage at the point of diagnosis, before the legislation was enacted?
Was the employee with the company when the event occurred? | Was the employee with the company when the event occurred, as indicated by your records at that time?
“We’re in an era of very, very vigorous enforcement, of heightened
super regulation. It’s not a one-off thing.”
Benjamin Lawsky,
Superintendent for Financial Services, New York State⁶
The insurance company can always go back and see a
history of past coverage at any point in time in the past.
An insurer may also want to know employee status, and
may need an accurate picture of when an employee
was actually with a company at any point in time, as
they knew it at any point in time.
HEALTHCARE
Healthcare faces enormous challenges for all
stakeholders, including providers, payers, and
pharmaceutical and biotechnology companies.
Bitemporal is one component of improvements in
health IT that helps lower costs and improve outcomes
by giving providers a more accurate picture of a
patient’s history as varied teams direct the course of
treatment, and an improved investigative tool when
looking at adverse events. And when payers receive
billing codes for procedures, they are able to track the
full history of each patient. Even if changes to insurance
coverage were made retroactively, no part of the history
is lost. There are also benefits to pharmaceutical
and biotechnology companies as they are able to
use bitemporal to enhance decision making in both
research and business.
LAW AND INTELLIGENCE
Bitemporal helps paint a complete picture even when
disparate facts are gathered piecemeal before and
after certain events. With a more complete picture,
government agencies have the ability to better
understand motives and even better predict future
events. During investigations, bitemporal enables law
enforcement officers to go back and ask why they went
down a certain path, which is particularly useful when
investigations are resurrected from cold case files.
TABLE 4: Bitemporal in Healthcare
BEFORE BITEMPORAL | AFTER BITEMPORAL
What did the patient’s chart look like when the medication was prescribed? | What did the patient’s chart look like when the medication was prescribed, before the chart was updated with the lab results?
What was the coverage determination for that patient in June 2010? | What was the coverage determination for that patient in June 2010, as we knew it in August 2010?
What did the clinical trial results indicate when you made the additional investment? | What did the clinical trial results indicate when you made the additional investment, before the research results were updated?
TABLE 5: Bitemporal in Law and Intelligence
BEFORE BITEMPORAL | AFTER BITEMPORAL
What was happening when we made the decision? | What did we think was happening when we made the decision?
When did the event happen? | When did the event happen, and when was that recorded?
Why do we currently think that we pursued that course of action? | What were we thinking when we pursued that course of action?
“MarkLogic’s bitemporal offers the flexibility of correlating and delivering additional
value of data (by providing intraday information, not just end-of-day information)
to a diverse customer group—rapidly—that just hasn’t been fully realized
before... In fact, MarkLogic’s bitemporal will provide an entirely new opportunity
for our customers to perform additional analytics as well as enabling much richer
capabilities in the area of compliance management.”
Paolo Pelizzoli, Global Head of Architecture, Global Technology Operations, Broadridge Financial Solutions
WHY BITEMPORAL HAS BEEN DIFFICULT
At this point you are likely asking, “If bitemporal is that
important, why haven’t I heard about it?” Although
there have been thousands of research papers written
on the topic of temporal data in the past twenty years
and the topic of bitemporal has been discussed by
experts since the early 1990s, bitemporal is still
relatively unknown.
Bitemporal clearly has incredible business value. Yet,
most analysts on the business side do not even know
they can ask for bitemporal data because it is so
seldom put into production. The problem is that with
relational databases the complexities of implementing
and maintaining bitemporal generally outweigh the
benefits. In fact, just handling ordinary temporal data in
a relational database can be a huge challenge.
Unfortunately, despite efforts to make bitemporal data
easier to manage in relational databases, bitemporal
remains out of reach with traditional tools. Probably
only a handful of experts in the world can manage the
complexities inherent in bitemporal implementations
on relational databases. Without going too deeply into
the details of bitemporal data modelling, here are some
of the key reasons why relational databases are
ill-suited for bitemporal data management:
•	 Integrity Constraints – The relational data model
comes with constraints such as referential integrity,
entity integrity, and defined schemas that are not
easily changed. Some constraints are specific to
temporal data, such as child rows within a table
only being able to include valid periods of time
within the valid period of time defined by the
parent row of the table. When bitemporal columns
are added to a relational table, they can wreak
havoc on the relational data model.
•	 Schema Evolution – There is incredible
complexity when adding bitemporal to a relational
model. Architectural and structural changes are
temporal themselves, and when new columns are
added with temporal dimensions or new tables are
created as new data is ingested, the schema will
change. Handling a changing schema and resulting
changes in application code are complex projects
already, even before trying to add bitemporal.
•	 Multiple Data Models – Handling schema
evolution is a difficult challenge, but now imagine
the task of handling multiple evolving schemas
across multiple data models and data silos, and
then aggregating them into a single source of truth.
Data integration is an expensive task, but when
bitemporal data is included, the complexity grows
exponentially.
•	 Decline in Performance – Read and write
performance typically dips because bitemporal
queries must consider the additional axis of time
in every query, and data usually spans multiple
tables and in some cases even multiple servers.
Attempts have been made to simplify queries and
improve performance, but they have not gone
far enough in eliminating the inherent complexity
and performance issues caused by scattering
bitemporal data across tables.
•	 Vendor Lock-in – Some vendors have begun to
implement improvements in bitemporal. However,
as happened in the past with implementing SQL
standards, each vendor will implement them
differently with their own syntax and then tack on
an additional cost of the feature as an add-on.
Oftentimes, the response to the challenges of
implementing bitemporal in a relational database is
to find the next best solution. Here are some of the
common responses.
“Relational bitemporal offerings are not widely adopted because as time changes,
the shape of the data usually changes as well… and RDBMS’ are not able to
capture the evolving schema.”
Global Investment Bank
•	 “But, I can just use Slowly Changing
Dimensions” – Attempts to use dimensional
modelling “type two” Slowly Changing Dimensions
(SCDs) as a way to approximate bitemporal data
have been made in recent years, and the problems
with this approach have been well documented.⁷
Using SCDs only approximates valid temporal data
and results in many inconsistencies that are later
difficult to uncover and fix. And, even if everything
is designed properly, query performance will likely
still be slow and results may not be reproducible.
•	 “But, I can just take frequent snapshots” –
This approach, also referred to as “temporal
versioning,” is a more common argument against
bitemporal, as most organizations are already
taking regular weekly or monthly snapshots of
their data. This approach is stable and predictable.
Unfortunately, this approach results in massive
amounts of redundant data, immense storage
costs, and still lots of lost information because
of the gaps between snapshots. And, even if
frequent snapshots are taken, regulators in most
industries view this as increasingly unacceptable.
Both regulators and data analysts have specific
questions, require fast answers, and do not
appreciate any gaps.
7	 Tom Johnston. Bitemporal Data: Theory and Practice (Waltham, MA: Elsevier,
2014) 311 - 313.
•	 “But, I can just rely on my audit logs” – While
useful for tracking event information, logs are not
sufficient for bitemporal because they cannot
be easily or quickly queried and would not meet
standards for maintaining immutable records.
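The information lost in the gaps between snapshots can be shown with a toy example. This is a sketch in plain JavaScript with made-up data: the day numbers, the prices, and the weekly schedule are hypothetical, and the change log is assumed to be sorted by day.

```javascript
// Hypothetical change log: day numbers are system time, values are a recorded price.
const changes = [
  { day: 1, value: 100 },
  { day: 3, value: 250 }, // erroneous entry
  { day: 5, value: 100 }, // correction
];

// State of the record on a given day: the latest change at or before that day.
function stateOn(log, day) {
  const seen = log.filter(c => c.day <= day);
  return seen.length ? seen[seen.length - 1].value : undefined;
}

// Weekly snapshots only preserve the state on the days they were taken.
const snapshotDays = [0, 7, 14];
const snapshots = snapshotDays.map(day => ({ day, value: stateOn(changes, day) }));

// From snapshots alone, the best answer for a day is the most recent snapshot.
function stateFromSnapshots(snaps, day) {
  const usable = snaps.filter(s => s.day <= day);
  return usable.length ? usable[usable.length - 1].value : undefined;
}

console.log(stateOn(changes, 4));              // 250
console.log(stateFromSnapshots(snapshots, 4)); // undefined
console.log(snapshots.some(s => s.value === 250)); // false
```

The erroneous value was the live record on days 3 and 4, yet no snapshot ever contains it, so a question about what the system showed on day 4 cannot be answered from the snapshots.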
Bitemporal is the only approach to managing time that
provides a quick and seamless way to look back at
historical data, query it on the fly at any point-in-time,
and work with it operationally just as you would with
your most current data.⁸
WHY THE TIME FOR BITEMPORAL IS NOW
As an Enterprise NoSQL database, MarkLogic
provides the flexibility required to make storing and
managing bitemporal data a practical reality, without
sacrificing complex-query performance, data
resiliency, or security. MarkLogic is also unique
in being the only Enterprise NoSQL database that has
bitemporal capability.
MarkLogic is schema-agnostic, and manages data
as documents. This means that you do not have to
maintain a strict schema that must be adhered to
throughout the life of the database. If you have to
integrate a new data source at a later date, you do
not have to do complex ETL before loading that data
into MarkLogic. The frustration of having to add a new
column into a relational database simply disappears,
whether you are adding a DATE column or
anything else.
“Despite the near universality of time and the time-varying nature of the enterprise being
modeled – a static and unmalleable configuration is rare and uninteresting – SQL quite
frankly does a lousy job in capturing those aspects that are changing in time, or in
providing constructs to effectively model, query, or modify such information.”
Richard Snodgrass, Developing Time-Oriented Database Applications in SQL⁸
Advantages of Bitemporal in MarkLogic | MarkLogic | Other DBs
Schema-agnostic to handle schema evolution and multiple varying data models | ✔ | ✖
Simpler coding and operations | ✔ | ✖
Quicker time-to-value | ✔ | ✖
Scalability, elasticity, and reduced storage costs | ✔ | ✖
8	 Richard T. Snodgrass. Developing Time-Oriented Database Applications in
SQL. Morgan Kaufmann Publishers, Inc., San Francisco, July, 1999.
<http://www.cs.arizona.edu/~rts/tdbbook.pdf>
Bitemporal data may have a lifespan of decades, and
organizations need a database that can respond rapidly
to keep pace with schema evolution as new data
sources are added. MarkLogic makes it easy to ingest
new data sources, and if there are conflicts that need
to be resolved (e.g., new data source has the column
name “SRC_DATE” but it should be “CLAIM_DATE”),
MarkLogic makes it easy to perform the necessary
transformations to ensure a standard vocabulary.
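The kind of vocabulary fix described above (renaming “SRC_DATE” to “CLAIM_DATE”) might look like the following sketch. It is plain JavaScript rather than a MarkLogic transform; the FIELD_MAP table, the normalizeFields helper, and the sample document are hypothetical, and only top-level fields are handled.

```javascript
// Hypothetical mapping from source-specific field names to the standard vocabulary.
const FIELD_MAP = { SRC_DATE: "CLAIM_DATE" };

// Return a copy of the document with mapped keys renamed; other keys pass through.
function normalizeFields(doc, fieldMap) {
  const out = {};
  for (const [key, value] of Object.entries(doc)) {
    const mapped = Object.prototype.hasOwnProperty.call(fieldMap, key);
    out[mapped ? fieldMap[key] : key] = value;
  }
  return out;
}

const incoming = { SRC_DATE: "2015-03-14", amount: 1200 };
const normalized = normalizeFields(incoming, FIELD_MAP);
console.log(normalized); // { CLAIM_DATE: '2015-03-14', amount: 1200 }
```

Because the function returns a new object, the document as received from the source is left untouched and can still be stored alongside the normalized version if provenance matters.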
With MarkLogic, you never have to worry about
the constraints found with relational data modelling
such as entity integrity, referential integrity, and
denormalization—even when it comes to bitemporal
data management.
MarkLogic performs orders of magnitude better than
relational databases for large-scale data integration
projects, speeding up project delivery times by
reducing the amount of time spent doing requirements
gathering and data modelling, and improving the quality
of prototypes. At Broadridge, a large financial services
organization, it was remarked that “The first MarkLogic
project took 60 days… It was estimated to take 3,000
days with existing technology.”
HOW BITEMPORAL WORKS IN MARKLOGIC
For those with a relational database background,
working with temporal and bitemporal data in
MarkLogic should be very familiar. The main difference
is that rather than columns of dates in a table, that
information now appears as timestamps within
documents. MarkLogic stores and manages all data as
documents, including bitemporal data.
Whether working with JSON or XML documents, a
document is considered to be bitemporal if it includes
timestamps for valid start and end times, and for
system start and end times. One way to load a
bitemporal document into MarkLogic is with MarkLogic
Content Pump, or mlcp. You can also use the REST
API. Or, you can load a bitemporal document using a
simple JavaScript update query, which is
shown below.
After loading bitemporal documents into MarkLogic,
they are managed as a series of documents with range
indexes for valid and system time axes. The valid
and system time axes each serve as a container for
a named pair of range indexes. And, the bitemporal
documents are stored in temporal collections, which
are logical groupings of temporal documents. You
can create additional temporal collections if you have
documents that require a different schema for the
timestamps.
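declareUpdate();
// Root object of the bitemporal document. The system timestamps are left
// null here; system time is managed by MarkLogic rather than by the caller.
var root =
  { "tempdoc": {
      "systemStart": null,
      "systemEnd": null,
      "validStart": "2014-04-03T11:00:00",
      "validEnd": "2014-04-03T16:00:00",
      "content": "some data, like closing price"
    }
  };
// Arguments: temporal collection name, document URI, document root.
temporal.documentInsert("temporalCollection", "exampledata.json", root);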
FIGURE 6. Updating a bitemporal document
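As a rough illustration of the rule that a bitemporal document carries timestamps for both valid and system time, the sketch below checks a document body for the four fields used in Figure 6. It is plain JavaScript, the function name `missingTemporalFields` is hypothetical, and the field names are simply those from the figure; other temporal collections may use a different schema for the timestamps.

```javascript
// Field names taken from Figure 6; a different collection may use other names.
const TEMPORAL_FIELDS = ["systemStart", "systemEnd", "validStart", "validEnd"];

// Hypothetical helper: list the temporal fields that are absent from `body`.
// A field that is present but null (as the system fields are on insert) counts as present.
function missingTemporalFields(body) {
  return TEMPORAL_FIELDS.filter((name) => !(name in body));
}

const tempdoc = {
  systemStart: null,
  systemEnd: null,
  validStart: "2014-04-03T11:00:00",
  validEnd: "2014-04-03T16:00:00",
  content: "some data, like closing price"
};

console.log(missingTemporalFields(tempdoc));            // []
console.log(missingTemporalFields({ content: "x" }));   // all four names
```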
After initial documents are loaded into MarkLogic,
they are always kept and never changed. Even if a
bitemporal document is “deleted,” MarkLogic still
keeps the document, but its system end time is changed
from infinity to the time of the delete. The same process
works for updates: older versions are still kept and
the “new” version is simply added. MarkLogic also
does not allow updates to system start times. Once the
system time is set for a collection, it continues to roll
forward to further ensure the integrity of the data.
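The keep-everything behaviour described above can be modelled with a small in-memory sketch. This is a toy in plain JavaScript, not MarkLogic code: the class `ToyTemporalStore`, its method names, and the sentinel value used for "infinity" are all assumptions made for illustration.

```javascript
// Stand-in for an open-ended system end time ("infinity").
const OPEN_END = "9999-12-31T23:59:59Z";

// Toy model: every version ever written stays in `versions`.
class ToyTemporalStore {
  constructor() {
    this.versions = [];
  }

  // Close the current version of `uri` (if any) at system time `now`.
  closeCurrent(uri, now) {
    for (const v of this.versions) {
      if (v.uri === uri && v.systemEnd === OPEN_END) {
        v.systemEnd = now;
      }
    }
  }

  // Insert or update: the old version is kept and a new one is added.
  write(uri, content, now) {
    this.closeCurrent(uri, now);
    this.versions.push({ uri, content, systemStart: now, systemEnd: OPEN_END });
  }

  // "Delete": nothing is removed; the current version's system end moves
  // from the open-ended sentinel to the time of the delete.
  remove(uri, now) {
    this.closeCurrent(uri, now);
  }

  // What did the store hold for `uri` at system time `systemTime`?
  asOf(uri, systemTime) {
    return this.versions.find(
      (v) => v.uri === uri && v.systemStart <= systemTime && systemTime < v.systemEnd
    );
  }
}

const store = new ToyTemporalStore();
store.write("price.json", "closing price 10", "2014-04-03T11:00:00Z");
store.write("price.json", "closing price 12", "2014-04-04T11:00:00Z");
store.remove("price.json", "2014-04-05T11:00:00Z");

console.log(store.versions.length);                                    // 2
console.log(store.asOf("price.json", "2014-04-03T12:00:00Z").content); // closing price 10
console.log(store.asOf("price.json", "2014-04-06T00:00:00Z"));         // undefined
```

Timestamps are compared as strings, which works here only because they all share the same ISO 8601 format and time zone suffix.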
Keeping track of the provenance of information with
full governance and immutability is critical, which
is why MarkLogic applies its security model to
bitemporal documents. MarkLogic is certified by the
National Information Assurance Partnership (NIAP)
Common Criteria Evaluation and Validation Scheme,
and uses Role Based Access Control (RBAC) by
default to manage access to documents. This high
level of security ensures that historical records are
not tampered with, and that documents maintain their
permissions over time.
KEY FEATURES OF BITEMPORAL IN MARKLOGIC
• Insert, update (and never delete) – Ingest temporal
JSON or XML documents with references to valid time
using the Temporal API or mlcp, and make changes
without losing any data as new versions are added.
• Complex temporal queries – Query the database
along valid and system time axes using standard Allen
and SQL operators when comparing time periods.
• Adapt to evolving schema – Avoid worrying about the
changing shape of the data over time. Unlike relational
databases, MarkLogic is schema-agnostic and can
easily manage schema changes over time.
• Maintain a Last Stable Query Time – A special
timestamp, called the LSQT (Last Stable Query Time),
can be enabled in order to manage and coordinate
system start times across systems.
• Combine with tiered storage – Use tiered storage to
easily migrate historical data to less expensive storage
tiers, without losing the ability to query the data.
• Combine with semantics – Assign bitemporal
elements to documents, whether they are RDF triples,
or documents that include RDF triples, giving you the
ability to track how relationships change over time.
• Combine with geospatial – Gain the ability to track
your data over time and space. MarkLogic stores
geospatial data, and now you can accurately track how
geospatial data changes over time.
• Take advantage of certified security – Manage
bitemporal documents with the same certified security
as all other documents, using Role Based Access
Control (RBAC) or other security models.
• Scale quickly and easily – Avoid any concerns
of under-provisioning with MarkLogic’s scale-out
architecture, which allows you to easily add nodes to
handle the increased demands of bitemporal data.
FIGURE 7. A bitemporal query as viewed in MarkLogic Query Console
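The feature list above mentions Allen operators for comparing time periods. The following plain JavaScript sketch classifies how one period relates to another using the thirteen relations of Allen's interval algebra. The function name `allenRelation`, the `{start, end}` period shape, and the relation labels are assumptions for illustration; they are not the operator names used by MarkLogic, and each period is assumed to have a start strictly before its end.

```javascript
// Classify the Allen relation of period `a` with respect to period `b`.
// Periods are objects with ISO 8601 `start` and `end` strings (start < end assumed).
function allenRelation(a, b) {
  const aStart = Date.parse(a.start), aEnd = Date.parse(a.end);
  const bStart = Date.parse(b.start), bEnd = Date.parse(b.end);

  // Periods that do not share any time.
  if (aEnd < bStart) return "before";
  if (aEnd === bStart) return "meets";
  if (bEnd < aStart) return "after";
  if (bEnd === aStart) return "met-by";

  // From here on the two periods share some time.
  if (aStart === bStart && aEnd === bEnd) return "equals";
  if (aStart === bStart) return aEnd < bEnd ? "starts" : "started-by";
  if (aEnd === bEnd) return aStart > bStart ? "finishes" : "finished-by";
  if (aStart > bStart && aEnd < bEnd) return "during";
  if (aStart < bStart && aEnd > bEnd) return "contains";
  return aStart < bStart ? "overlaps" : "overlapped-by";
}

// Valid period from the Figure 6 document, with an explicit UTC suffix added.
const validPeriod = { start: "2014-04-03T11:00:00Z", end: "2014-04-03T16:00:00Z" };

console.log(allenRelation({ start: "2014-04-03T09:00:00Z", end: "2014-04-03T12:00:00Z" }, validPeriod)); // overlaps
console.log(allenRelation({ start: "2014-04-03T12:00:00Z", end: "2014-04-03T13:00:00Z" }, validPeriod)); // during
console.log(allenRelation(validPeriod, { start: "2014-04-03T16:00:00Z", end: "2014-04-03T18:00:00Z" })); // meets
```

A query along either time axis can then be thought of as selecting the documents whose valid or system period stands in a chosen relation to the period being asked about.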
HOW TO GET STARTED
Managing time is not easy. If it were, we probably could
have avoided multi-billion dollar problems like Y2K.9
But, managing time is a necessity, and bitemporal is
the future of managing time in databases as we seek to
maintain a better record of “what we knew” and “when
we knew it.” MarkLogic takes away the constraints
that prevent the adoption of bitemporal, and is the best
database for storing and managing bitemporal data.
GET GOING QUICKLY
1.	 Identify the questions your business cannot
currently answer
2.	 Identify the business benefits of adding bitemporal
3.	 Assess the current data management environment
4.	 Engage with MarkLogic to discuss implementation
5.	 Download MarkLogic
6.	 Learn more in MarkLogic’s free training
9	 According to the BBC and ComputerWorld, the estimated cost of the preparation
and remediation for the “Year 2000 problem”, or Y2K, was $608 Billion, and
that’s not taking into account inflation. For more information: Robert L. Mitchell.
“Y2K: The good, the bad and the crazy”. ComputerWorld (28 December 2009)
<http://www.computerworld.com/article/2522197/it-management/y2k--the-good--the-bad-and-the-crazy.html?page=2>
MORE INFORMATION
•	 Read MarkLogic Documentation – Learn how to
work with bitemporal data in MarkLogic at
docs.marklogic.com/guide/temporal
•	 Watch a Presentation – Hear from a MarkLogic
customer about “Why Banks Care About
Bitemporal” at
www.marklogic.com/resources/why-banks-care-about-bitemporality/
•	 Schedule a Meeting – Discuss your particular
use case with a MarkLogic sales representative by
contacting us at sales@marklogic.com
“MarkLogic has a history of bringing advanced data management
technology to market, and many of their customers and partners are
accustomed to managing complex data in an agile manner. As a result,
MarkLogic customers and partners, in general, have a more mature
and creative view of how to manage and use data than do most other
database users.”
Carl Olofson, Research Vice President for Data Management Software Research, IDC
MarkLogic White Paper Better With Bitemporal (Interactive)

  • 1. BETTER WITH BITEMPORAL MARKLOGIC WHITE PAPER • JUNE 2015 In our age of billion-dollar regulatory fines and time-consuming, costly litigation, a database must hold up as the main system of record. Unfortunately, traditional databases do not keep a complete history of the past. Only with a bitemporal database can you truly maintain a complete and accurate picture of the past, understanding exactly “what you knew” and “when you knew it.”
  • 2. ASSESSMENT: DO YOU NEED BITEMPORAL? Before you go any further, it is probably helpful to first ask whether you might need bitemporal data management in your organization. If you answer “yes” to any of the following questions, then bitemporal is a solution that you should consider. YES NO 1. Is tracking when events or transactions occur critical to your business? ✔ 2. Are there ever cases when historical data needs to be updated? ✔ 3. Do you run into circumstances in which there is a lag between when something happened in the real world, and when it was recorded in the database? ✔ 4. Do you get frequent requests from regulators to review historical data? ✔ 5. Do you work in an industry in which the sequence of when you learn about certain information is significant, such as in law and intelligence? ✔ 6. Is the cost and complexity of storing and accessing historical data in your organization overwhelming? ✔ 7. Does managing and accessing historical data cost significant developer resources, or carry increasing risk over time? ✔
  • 3. Contents Introduction................................................................................................................................................................1 The Cost of Not Having Bitemporal Three Types of Temporality......................................................................................................................................2 Non-temporal Unitemporal Bitemporal The Benefits of Bitemporal.......................................................................................................................................4 Things You Can Do With Bitemporal The Increasing Need for Bitemporal Bitemporal Across Industries Why Bitemporal Has Been Difficult.........................................................................................................................7 Why the Time for Bitemporal is Now.......................................................................................................................8 Key Features of Bitemporal in MarkLogic Get Going Quickly More Information
  • 4. INTRODUCTION Today, databases are the primary system of record, not paper. In this new reality, organizations are required to keep an accurate picture of all the facts, as they occur. For certain industries such as financial services, insurance, and healthcare, there are even laws that mandate how historical data is tracked and managed. Unfortunately, traditional databases cannot provide a truly accurate picture of your business at different points-in-time. The reason is that traditional databases are unitemporal, and can only track start and end times along a single timeline. But, what if there is a lag between when something happened and when you found out about it? Which time should you record? Or, what if you realize you need to make a correction to when something happened, but do not want to overwrite any historical data? In those cases, a single timeline is not enough. With a bitemporal database, you can store and query data along two timelines with timestamps for both valid time—when a fact occurred in the real world (“what you knew”), and also system time—when that fact was recorded to the database (“when you knew it”). By tracking events along two timelines with a bitemporal database, it is possible to keep a complete and accurate picture of your business at any given time for internal search and discovery purposes or for when regulators conduct audits. Consider some of the new questions that a bitemporal database allows you to ask: • What were my customer’s credit ratings last year as I knew them last quarter? • What was our position with that security before the trade was amended? • What did our intelligence indicate before we learned that new piece of information? With a traditional unitemporal database, you can ask what your customer’s credit ratings looked like as you knew them today, but not yesterday or last quarter. 
Only a bitemporal database allows you to go back and see an accurate and unaltered picture of historical data, including past and present changes. A bitemporal database is necessary for today’s enterprises to be able to accurately explore historical data, manage that data across systems, ensure full data integrity, and do more complex analysis. MarkLogic® is an Enterprise NoSQL database that is best suited for storing and managing bitemporal data for the following reasons: • Flexible Data Model – MarkLogic’s document- oriented data model is schema-agnostic and able to manage the complexities of bitemporal data that relational databases are ill-suited for, such as integrity constraints, evolving schemas, and multiple different data models. • Enterprise Reliability – MarkLogic has the enterprise features that other new generation databases do not. MarkLogic is a proven database that runs mission-critical applications at hundreds of world-leading organizations. • Bitemporal Out-of-the-Box – Bitemporal is a feature built-in to MarkLogic whereas other vendors make it an additional software add-on that increases cost and complexity.1 1 Hudson Foods recalled one-fifth of their annual output in 1997 due to an outbreak of E. Coli, costing them an estimated $25 Million. Their database only al- lowed them to see a current view of which beef came from which sources, and not a view of their data as it existed on the day the supplier processed the small batch of contaminated meat. This meant the entire product had to be recalled. For more information, see Richard T. Snodgrass’ book, Developing Time-Oriented Database Applications in SQL (ch.2, 11). THE COST OF NOT HAVING BITEMPORAL Not having bitemporal is directly attributed to costing one company $25 Million.1 It has cost (or perhaps saved) many politicians their jobs. 
In our age of super-regulation and the need to maintain provenance, immutability, and governance with historical data, the potential cost of not using bitemporal grows much larger. This is particularly true in industries such as financial services where not having an accurate picture of the past has contributed to multi- billion dollar fines and further increases in regulation. 1
  • 5. THREE TYPES OF TEMPORALITY To understand bitemporal, you first have to understand how databases currently manage time. In relation to time, there are three basic categories of databases: non-temporal, unitemporal, and bitemporal. Each type is discussed below, using the example of when a patient was diagnosed with an allergy and when the doctor found out about it as a guide. NON-TEMPORAL Non-temporal databases store data with no time dimension. A fact is just a fact—there is no history and it is only understood to be true at the current point in time. Data models that do not support a time dimension are just called snapshots. Just imagine the example of when a patient was diagnosed with an allergy, which is an important piece of information considering the potential adverse and even deadly reactions that some patients can have to common medications like penicillin. With a non- temporal database, you would just see the current state, which would be either “patient has no allergy” or “patient has a positive allergy diagnosis,” as depicted in Figure 1 in which the shaded area represents when the fact is true. In a non-temporal database, you just get a single view of the data without respect to time. It should not be surprising that non-temporal databases are very uncommon, as most applications deal with time- varying data. UNITEMPORAL Unitemporal databases support time across one dimension: valid time. Most people just think of valid time as just “time”—it represents when something happened in the real world. Valid time is tracked along a single timeline to answer questions such as: When did that patient get diagnosed with an allergy? How many patients have that same allergy? How long has the patient had the allergy? In the example of the patient with the allergy, it is clear from the graph in Figure 2 that the patient was diagnosed with an allergy at 9:00am along the valid timeline. The problem is that valid time only shows a piece of the picture. 
Looking at the figure above, it would not be clear to an outside observer when the doctor learned that the patient was diagnosed with the allergy. What if it was the lab that first discovered the allergy, but there was a lag in time before the doctor actually found out about the lab results? That is valuable information that is not recorded in a unitemporal database. In this example, imagine if a drug was administered to the patient that day that caused an anaphylactic reaction—didn’t the doctor know not to administer that drug? Let’s look at how to solve this problem with a bitemporal database. BITEMPORAL A bitemporal database records timestamps for events along two dimensions of time: valid time and system time. Valid time tracks when an event occurred in the real world. System time (sometimes called “transaction time”) tracks when the event is recorded to the database. These two time dimensions are depicted graphically along both axes in Figure 3. In this example, valid time represents when the lab discovered the allergy, and system time represents when the doctor found out about it and recorded it to his chart. Unitemporal databases make the false assumption that valid time is always equal to system time, and in doing so loses valuable information. Sometime, as Figure 3 depicts, valid time is equal to system time. But, you would not know that unless you had a bitemporal database. A bitemporal database records time along POSITIVE ALLERGY DIAGNOSIS FIGURE 1: A nontemporal database does not store any time dimensions. TIME: NO ALLERGY DIAGNOSIS POSITIVE ALLERGY DIAGNOSIS 9 AM 10 AM 11 AM 12 AM FIGURE 2: A unitemporal database only tracks valid time. 2
  • 6. both dimensions independently so you can keep accurate records. Using the example of the patient with the allergy, imagine that the doctor actually found out about the allergy at 10:30am, an hour and a half after the lab did their tests and concluded that the patient had an allergy. The lab noted that the patient had an allergy at 9:00am, but that information did not get to the doctor until 10:30am. This represents a lag between valid time and system time, and would look like Figure 4. Taking this example a bit further, imagine that later on the same day, at 11:30am, the doctor gets a call from the lab saying that they just discovered that they did the tests incorrectly. The lab result was actually negative—the patient does not have an allergy. This correction is shown in Figure 5. With a bitemporal database, it is easy to make corrections to historical data, and the process does not overwrite any data. By looking at Figure 5, we can ascertain the following facts: • Before 10:30am (system time), the doctor did not know about the allergy • At 10:30am (system time), the doctor recorded the patient having an allergy, which had been discovered by the lab at 9:00am (valid time) • At 11:30am (system time), the lab and doctor discover the mistake and update the records to show that the patient does not have an allergy With this timeline tracked across both axes, it is now possible to go back and see a true picture of events. This can be extremely helpful in understanding and avoiding mistakes, as the doctor’s decisions can be easily married to what he knew or did not know at any given point in time. In the setting of a hospital, drug allergies can be life threatening, so having an accurate record of when a patient was diagnosed and when care providers learn this information is critical. 
POSITIVE ALLERGY DIAGNOSIS 9 AM 9 AM 10 AM 11 AM 12 AM 10 AM 11 AM 12 AM NO ALLERGY DIAGNOSIS SYSTEM TIME “When the doctor found out about it” VALIDTIME “Whenthelabdiscoveredtheallergy” FIGURE 3: A bitemporal database tracks both valid time and system time. FIGURE 4: A bitemporal database tracks lags in information. POSITIVE ALLERGY DIAGNOSIS 9 AM 9 AM 10 AM 11 AM 12 AM 10 AM 11 AM 12 AM SYSTEM TIME “When the doctor found out about it” VALIDTIME “Whenthelabdiscoveredtheallergy” NO ALLERGY DIAGNOSIS LAG POSITIVE ALLERGY DIAGNOSIS 9 AM 9 AM 10 AM 11 AM 12 AM 10 AM CORRECTION 11 AM 12 AM SYSTEM TIME “When the doctor found out about it” VALIDTIME “Whenthelabdiscoveredtheallergy” NO ALLERGY DIAGNOSIS FIGURE 5: A bitemporal database tracks corrections without overwriting data. 3
  • 7. The example of the allergy diagnosis may seem somewhat simple, but the same concept can be applied to any piece of data, whether it is when a financial trade occurred, when someone got insurance, or when someone owned a house. In all of these cases, START DATE and END DATE for both valid time and system time can be tracked in order to preserve the most accurate picture of reality. TABLE 1: Comparing Unitemporal to Bitemporal for a variety of examples. UNITEMPORAL BITEMPORAL When did the lab results indicate that the patient had an allergy to penicillin? When did the lab results indicate that the patient had an allergy to penicillin, and when did the care provider learn about the allergy? When was the sell order cancelled by the bank’s counter party? When was the sell order cancelled by the bank’s counter party, and when did the trader learn that it was cancelled? What reference data existed regarding trade events on December 4th? What reference data did the trader actually have on December 4th? When did John become eligible for insurance coverage, as the employment records indicate now? When did John become eligible for insurance coverage, as the employment records indicated in 2012? THE BENEFITS OF BITEMPORAL Bitemporal, simply put, gives you a better way to manage time. No alternative to bitemporal, even temporal versioning, can provide a seamless, query- able, flexible view of historical data. Bitemporal is a critical capability any organization can take advantage of, and there is a particularly growing need for bitemporal in industries that face growing regulatory pressures and litigation such as financial services, insurance, and healthcare. In these industries, organizations are having to better account for all of their past actions with the onset of new laws and litigation, more frequent and in-depth audits, and increased fines for non-compliance. 
Organizations that better manage their historical data are able to reduce their risk and get through audits unscathed. THINGS YOU CAN DO WITH BITEMPORAL • Handle Regulation and Audits – Provide an accurate picture of the past to meet requirements for increased transparency and accountability • Manage Risk – Create better risk models and improve business intelligence by analyzing true historical data • Reduce Costs – Simplify architecture and reduce the cost and operational risk of storing redundant historical data THE INCREASING NEED FOR BITEMPORAL The need to better manage regulatory concerns is growing in general, though it is having a particularly significant impact in certain industries, such as financial services. Large banks have been hit with record- breaking fines in recent years, coupled with an increase in regulatory pressures. Since 2009, banks in the U.S. and Europe have paid over $128 billion to regulators, and 2014 was the biggest year ever, with $65 billion in penalties and fines, about 40% greater than in 2013.2 Today, regulators are more intrusive and carry out more vigorous enforcement as they drill into the details. According to Gerold Grasshoff, the global head of risk management and regulation at Boston Consulting Group, regulatory pressures are now a core issue for banks. “You have to change your operating model, change your products, change the legal risks now... Nothing is changing business models as much as the regulatory issues. That is the biggest strategic challenge.” To adopt to the changing way in which business is done, banks are having to change their IT and data management approaches to increase transparency.3 Other industries are also facing increased regulatory pressures. In healthcare, for example, there is the problem wrought by medical errors, which some reports estimate to be $1 Trillion.4 Knowing when and how 2 James Sterngold, “For Banks, 2014 Was a Year of Big Penalties”, Dec. 
30, 2014 <http://www.wsj.com/articles/no-more-regulatory-nice-guy-for- banks-1419957394> 3 Boston Consulting Group, “Building the Transparent Bank”, Dec. 2014 <https:// www.bcgperspectives.com/Images/Building_the_Transparent_Bank_Dec_2014_ tcm80-177814.pdf> 4 Andel, Davidow, Hollander, Moreno. “The economics of health care quality and medical errors.” Journal of Health Care Finance 39(1):39-50 (2012) <http://www. ncbi.nlm.nih.gov/pubmed/23155743> 4
  • 8. errors occurred is critical to improving medical decision making and avoiding medical malpractice. And, consider the growing cost of fraud and abuse across the healthcare industry, estimated to be anywhere between $82 billion and $272 billion in the U.S.5 Unfortunately, the general cost and complexity surrounding patient safety, malpractice litigation, and fraud and abuse is only increasing.6

By implementing bitemporal data management, organizations can take a bold step towards lowering risk, improving transparency, and gaining a competitive advantage.

BITEMPORAL ACROSS INDUSTRIES

FINANCIAL SERVICES
Bitemporal helps large banks better manage their data and adapt to the changes in laws and regulation that are impacting how business is done. For example, bitemporal helps by providing an accurate record of trades as they occur and are amended. After trades are made, they are later reconciled with counterparties, and updates often occur before the trade is closed. With a unitemporal database, updates overwrite historical data, which can put enormous risk on individual traders and entire companies. Bitemporal provides an accurate picture of the entire lifecycle of a trade review, including when changes to counterparty names, transaction IDs, or price corrections occurred.

INSURANCE
In the insurance industry, bitemporal helps by providing a clear determination of coverage over the course of history, ensuring that even if there are retroactive changes, data is never overwritten.

5 Berwick, Hackbarth. “Eliminating waste in US health care.” JAMA 307(14):1513-6 (2012) <http://www.ncbi.nlm.nih.gov/pubmed/22419800>
6 James Sterngold. “For Banks, 2014 Was a Year of Big Penalties.” Wall Street Journal, 2014.

TABLE 2: Bitemporal in Financial Services
BEFORE BITEMPORAL | AFTER BITEMPORAL
What do we think the trader’s position was, and what information do we think was available to the trader around the time when the trade was executed? | What was the trader’s exact position when the trade was executed, and what exact reference data was available at the time the trade was executed?
What were our customer’s credit ratings last year? | What were our customer’s credit ratings last year, as we knew them last quarter?
What was our market exposure when that trade was made at 11:00am? | What was our market exposure when that trade was made at 11:00am, as we knew it at 11:30am?
What was the company’s profit when we gave guidance? | What did we think the company’s profit was when we gave guidance?

TABLE 3: Bitemporal in Insurance
BEFORE BITEMPORAL | AFTER BITEMPORAL
What was the estimated impact of the disaster on insurance premiums? | What was the estimated impact of the disaster on insurance premiums, before the data was adjusted retroactively?
Did the beneficiary have coverage at the point of diagnosis? | Did the beneficiary have coverage at the point of diagnosis, before the legislation was enacted?
Was the employee with the company when the event occurred? | Was the employee with the company when the event occurred, as indicated by your records at that time?

“We’re in an era of very, very vigorous enforcement, of heightened super regulation. It’s not a one-off thing.”
Benjamin Lawsky, Superintendent for Financial Services, New York State6
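The “as we knew it” questions in the tables above combine two time filters: one on valid time (when something was true in the real world) and one on system time (when it was recorded). The following is a minimal sketch of that idea in plain JavaScript over an in-memory array. The record layout, the `asKnownAt` function, the `ACME` customer, and the far-future `FOREVER` marker are all assumptions made for illustration; this is not MarkLogic’s API or storage format.

```javascript
// Hypothetical far-future timestamp standing in for "infinity".
const FOREVER = "9999-12-31T23:59:59";

// Illustrative credit-rating history. Each version carries a valid period
// and a system period; both are treated as half-open: [start, end).
const ratings = [
  // Recorded in January 2014: the customer is rated AA from 2014-01-01 on.
  { customer: "ACME", rating: "AA",
    validStart: "2014-01-01T00:00:00", validEnd: FOREVER,
    systemStart: "2014-01-05T00:00:00", systemEnd: "2015-02-10T00:00:00" },
  // Correction recorded in February 2015: AA only held until 2014-06-01 ...
  { customer: "ACME", rating: "AA",
    validStart: "2014-01-01T00:00:00", validEnd: "2014-06-01T00:00:00",
    systemStart: "2015-02-10T00:00:00", systemEnd: FOREVER },
  // ... and the rating was actually A from 2014-06-01 onward.
  { customer: "ACME", rating: "A",
    validStart: "2014-06-01T00:00:00", validEnd: FOREVER,
    systemStart: "2015-02-10T00:00:00", systemEnd: FOREVER },
];

// Return the versions that were true at validTime, as recorded at systemTime.
// ISO-8601 strings in one fixed format compare correctly as plain strings.
function asKnownAt(records, validTime, systemTime) {
  return records.filter((r) =>
    r.validStart <= validTime && validTime < r.validEnd &&
    r.systemStart <= systemTime && systemTime < r.systemEnd);
}

// "What was the rating in September 2014, as we knew it at the end of 2014?"
const believedThen = asKnownAt(ratings, "2014-09-01T00:00:00", "2014-12-31T00:00:00");
// "What was the rating in September 2014, as we know it in March 2015?"
const knownNow = asKnownAt(ratings, "2014-09-01T00:00:00", "2015-03-01T00:00:00");

console.log(believedThen.map((r) => r.rating)); // [ 'AA' ]
console.log(knownNow.map((r) => r.rating));     // [ 'A' ]
```

Both answers remain available because the correction closed the old version’s system period rather than overwriting it. A unitemporal store would only be able to return the second answer.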
  • 9. The insurance company can always go back and see a history of past coverage at any point in time in the past. An insurer may also want to know employee status, and may need an accurate picture of when an employee was actually with a company at any point in time, as they knew it at any point in time.

HEALTHCARE
Healthcare faces enormous challenges for all stakeholders, including providers, payers, and pharmaceutical and biotechnology companies. Bitemporal is one component of improvements in health IT that helps lower costs and improve outcomes by giving providers a more accurate picture of a patient’s history as varied teams direct the course of treatment, and an improved investigative tool when looking at adverse events. And, when payers receive billing codes for procedures, they are able to track the full history of each patient. Even if changes to insurance coverage were made retroactively, no part of the history is lost. There are also benefits to pharmaceutical and biotechnology companies, as they are able to use bitemporal to enhance decision making in both research and business.

LAW AND INTELLIGENCE
Bitemporal helps paint a complete picture even when disparate facts are gathered piecemeal before and after certain events. With a more complete picture, government agencies can better understand motives and even better predict future events. During investigations, bitemporal enables law enforcement officers to go back and ask why they went down a certain path, which is particularly useful when investigations are resurrected from cold case files.

TABLE 4: Bitemporal in Healthcare
BEFORE BITEMPORAL | AFTER BITEMPORAL
What did the patient’s chart look like when the medication was prescribed? | What did the patient’s chart look like when the medication was prescribed, before the chart was updated with the lab results?
What was the coverage determination for that patient in June 2010? | What was the coverage determination for that patient in June 2010, as we knew it in August 2010?
What did the clinical trial results indicate when you made the additional investment? | What did the clinical trial results indicate when you made the additional investment, before the research results were updated?

TABLE 5: Bitemporal in Law and Intelligence
BEFORE BITEMPORAL | AFTER BITEMPORAL
What was happening when we made the decision? | What did we think was happening when we made the decision?
When did the event happen? | When did the event happen, and when was that recorded?
Why do we currently think that we pursued that course of action? | What were we thinking when we pursued that course of action?

“MarkLogic’s bitemporal offers the flexibility of correlating and delivering additional value of data (by providing intraday information, not just end-of-day information) to a diverse customer group, rapidly, that just hasn’t been fully realized before... In fact, MarkLogic’s bitemporal will provide an entirely new opportunity for our customers to perform additional analytics as well as enabling much richer capabilities in the area of compliance management.”
Paolo Pelizzoli, Global Head of Architecture, Global Technology Operations, Broadridge Financial Solutions
  • 10. WHY BITEMPORAL HAS BEEN DIFFICULT
At this point you are likely asking, “If bitemporal is that important, why haven’t I heard about it?” Although thousands of research papers have been written on temporal data in the past twenty years, and experts have discussed bitemporal since the early 1990s, bitemporal is still relatively unknown. Bitemporal clearly has incredible business value. Yet, most analysts on the business side do not even know they can ask for bitemporal data because it is so seldom put into production.

The problem is that with relational databases the complexities of implementing and maintaining bitemporal generally outweigh the benefits. In fact, just handling ordinary temporal data in a relational database can be a huge challenge. Unfortunately, despite efforts to make bitemporal data easier to manage in relational databases, bitemporal remains an unreachable goal with traditional tools. The number of experts in the world who can manage the complexities inherent in bitemporal implementations using relational databases is probably limited to a small number of individuals.

Without going too deeply into the details of bitemporal data modelling, here are some of the key reasons why relational databases are ill-suited for bitemporal data management:
• Integrity Constraints – The relational data model comes with constraints such as referential integrity, entity integrity, and defined schemas that are not easily changed. Some constraints are specific to temporal data, such as the requirement that a child row’s valid period fall within the valid period defined by its parent row. When bitemporal columns are added to a relational table, they can wreak havoc on the relational data model.
• Schema Evolution – There is incredible complexity when adding bitemporal to a relational model. Architectural and structural changes are temporal themselves, and when new columns are added with temporal dimensions or new tables are created as new data is ingested, the schema will change. Handling a changing schema and the resulting changes in application code are complex projects already, even before trying to add bitemporal.
• Multiple Data Models – Handling schema evolution is a difficult challenge, but now imagine the task of handling multiple evolving schemas across multiple data models and data silos, and then aggregating them into a single source of truth. Data integration is an expensive task, but when bitemporal data is included, the complexity grows exponentially.
• Decline in Performance – Read and write performance typically dips because every bitemporal query must consider an additional axis of time, and data usually spans multiple tables and in some cases even multiple servers. Attempts have been made to simplify queries and improve performance, but they have not gone far enough in eliminating the inherent complexity and performance issues caused by scattering bitemporal data across tables.
• Vendor Lock-in – Some vendors have begun to implement improvements in bitemporal. However, as happened in the past with implementing SQL standards, each vendor will implement them differently with their own syntax and then charge extra for the feature as an add-on.

Oftentimes, the response to the challenges of implementing bitemporal in a relational database is to find the next best solution. Here are some of the common responses.

“Relational bitemporal offerings are not widely adopted because as time changes, the shape of the data usually changes as well… and RDBMS’ are not able to capture the evolving schema.”
Global Investment Bank
  • 11.
• “But, I can just use Slowly Changing Dimensions” – Attempts to use dimensional modelling “type two” Slowly Changing Dimensions (SCDs) as a way to approximate bitemporal data have been made in recent years, and the problems with this approach have been well documented.7 Using SCDs only approximates valid temporal data and results in many inconsistencies that are later difficult to uncover and fix. And, even if everything is designed properly, query performance will likely still be slow and results may not be reproducible.
• “But, I can just take frequent snapshots” – This approach, also referred to as “temporal versioning,” is a more common argument against bitemporal, as most organizations are already taking regular weekly or monthly snapshots of their data. This approach is stable and predictable. Unfortunately, it results in massive amounts of redundant data and immense storage costs, and it still loses information in the gaps between snapshots. And, even if frequent snapshots are taken, regulators in most industries increasingly view this as unacceptable. Both regulators and data analysts have specific questions, require fast answers, and do not appreciate any gaps.
• “But, I can just rely on my audit logs” – While useful for tracking event information, logs are not sufficient for bitemporal because they cannot be easily or quickly queried and would not meet standards for maintaining immutable records.

Bitemporal is the only approach to managing time that provides a quick and seamless way to look back at historical data, query it on the fly at any point in time, and work with it operationally just as you would with your most current data.8

7 Tom Johnston. Bitemporal Data: Theory and Practice (Waltham, MA: Elsevier, 2014) 311-313.

WHY THE TIME FOR BITEMPORAL IS NOW
As an Enterprise NoSQL database, MarkLogic provides the flexibility required to make storing and managing bitemporal data a practical reality, without sacrificing performance on complex queries, data resiliency, or security. MarkLogic is also the only Enterprise NoSQL database that has bitemporal capability.

MarkLogic is schema-agnostic, and manages data as documents. This means that you do not have to maintain a strict schema that must be adhered to throughout the life of the database.

8 Richard T. Snodgrass. Developing Time-Oriented Database Applications in SQL. Morgan Kaufmann Publishers, Inc., San Francisco, July, 1999. <http://www.cs.arizona.edu/~rts/tdbbook.pdf>

“Despite the near universality of time and the time-varying nature of the enterprise being modeled (a static and unmalleable configuration is rare and uninteresting), SQL quite frankly does a lousy job in capturing those aspects that are changing in time, or in providing constructs to effectively model, query, or modify such information.”
Richard Snodgrass, Developing Time-Oriented Database Applications in SQL8

Advantages of Bitemporal in MarkLogic
ADVANTAGE | MarkLogic | Other DBs
Schema-agnostic to handle schema evolution and multiple varying data models | ✔ | ✖
Simpler coding and operations | ✔ | ✖
Quicker time-to-value | ✔ | ✖
Scalability, elasticity, and reduced storage costs | ✔ | ✖

  • 12. If you have to integrate a new data source at a later date, you do not have to do complex ETL before loading that data into MarkLogic. The frustration of having to add a new column into a relational database simply disappears, whether you are adding a DATE column or anything else.

Bitemporal data may have a lifespan of decades, and organizations need a database that can respond rapidly to keep pace with schema evolution as new data sources are added. MarkLogic makes it easy to ingest new data sources, and if there are conflicts that need to be resolved (e.g., a new data source has the column name “SRC_DATE” but it should be “CLAIM_DATE”), MarkLogic makes it easy to perform the necessary transformations to ensure a standard vocabulary.

With MarkLogic, you never have to worry about the constraints found with relational data modelling such as entity integrity, referential integrity, and denormalization, even when it comes to bitemporal data management. MarkLogic performs orders of magnitude better than relational databases for large-scale data integration projects, speeding up project delivery times by reducing the amount of time spent doing requirements gathering and data modelling, and improving the quality of prototypes. At Broadridge, a large financial services organization, it was remarked that “The first MarkLogic project took 60 days… It was estimated to take 3,000 days with existing technology.”

HOW BITEMPORAL WORKS IN MARKLOGIC
For those with a relational database background, working with temporal and bitemporal data in MarkLogic should be very familiar. The main difference is that rather than columns of dates in a table, that information now appears as timestamps within documents. MarkLogic stores and manages all data as documents, including bitemporal data. Whether working with JSON or XML documents, a document is considered to be bitemporal if it includes timestamps for valid start and end times, and for system start and end times.
One way to load a bitemporal document into MarkLogic is with MarkLogic Content Pump, or mlcp. You can also use the REST API. Or, you can load a bitemporal document using a simple JavaScript update query, which is shown below.

After bitemporal documents are loaded into MarkLogic, they are managed as a series of documents with range indexes for the valid and system time axes. The valid and system time axes each serve as a container for a named pair of range indexes. The bitemporal documents are stored in temporal collections, which are logical groupings of temporal documents. You can create additional temporal collections if you have documents that require a different schema for the timestamps.

    declareUpdate();
    // System timestamps are left null here; system time is managed by MarkLogic.
    var root = {
      "tempdoc": {
        "systemStart": null,
        "systemEnd": null,
        "validStart": "2014-04-03T11:00:00",
        "validEnd": "2014-04-03T16:00:00",
        "content": "some data, like closing price"
      }
    };
    // Insert the document into the temporal collection under the given URI.
    temporal.documentInsert("temporalCollection", "exampledata.json", root);

FIGURE 6. Updating a bitemporal document
  • 13. After initial documents are loaded into MarkLogic, they are always kept and never changed. Even if a bitemporal document is “deleted”, MarkLogic still keeps the document, but the system end time is changed from infinity to the time of the delete. The same process works for updates: older versions are still kept and the “new” version is simply added. MarkLogic also does not allow updates to system start times. Once the system time is set for a collection, it continues to roll forward to further ensure the integrity of the data.

Keeping track of the provenance of information with full governance and immutability is critical, which is why MarkLogic applies its security model to bitemporal documents. MarkLogic is certified by the National Information Assurance Partnership (NIAP) Common Criteria Evaluation and Validation Scheme, and uses Role Based Access Control (RBAC) by default to manage access to documents. This high level of security ensures that historical records are not tampered with, and that documents maintain their permissions over time.

KEY FEATURES OF BITEMPORAL IN MARKLOGIC
• Insert, update (and never delete) – Ingest temporal JSON or XML documents with references to valid time using the Temporal API or mlcp, and make changes without losing any data as new versions are added
• Complex temporal queries – Query the database along valid and system time axes using standard Allen and SQL operators when comparing time periods
• Adapt to evolving schema – Avoid worrying about the changing shape of the data over time. Unlike relational databases, MarkLogic is schema-agnostic and can easily manage schema changes over time
• Maintain a Last Stable Query Time – A special timestamp, called the LSQT (Last Stable Query Time), can be enabled in order to manage and coordinate system start times across systems
• Combine with tiered storage – Use tiered storage to easily migrate historical data to less expensive storage tiers, without losing the ability to query the data
• Combine with semantics – Assign bitemporal elements to documents, whether they are RDF triples or documents that include RDF triples, giving you the ability to track how relationships change over time
• Combine with geospatial – Gain the ability to track your data over time and space. MarkLogic stores geospatial data, and now you can accurately track how geospatial data changes over time
• Take advantage of certified security – Manage bitemporal documents with the same certified security as all other documents, using Role Based Access Control (RBAC) or other security models
• Scale quickly and easily – Avoid any concerns of under-provisioning with MarkLogic’s scale-out architecture, which allows you to easily add nodes to handle the increased demands of bitemporal data

FIGURE 7. A bitemporal query as viewed in MarkLogic Query Console
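The “insert, update (and never delete)” behavior described above can be pictured with a small append-only store. The sketch below is plain JavaScript written for illustration only: the `VersionedStore` class, its method names, and the `FOREVER` marker are hypothetical and are not part of MarkLogic’s Temporal API. It also simplifies by versioning whole documents and ignoring how a real bitemporal update may split valid-time periods.

```javascript
// Hypothetical far-future timestamp standing in for "infinity".
const FOREVER = "9999-12-31T23:59:59";

class VersionedStore {
  constructor() {
    this.versions = [];       // every version ever written; nothing is removed
    this.lastSystemTime = ""; // system time may only roll forward
  }

  // Reject any write whose system time does not move forward.
  _advance(now) {
    if (now <= this.lastSystemTime) {
      throw new Error("system time must roll forward");
    }
    this.lastSystemTime = now;
  }

  // The open version of a document is the one whose system period has not ended.
  _current(uri) {
    return this.versions.find((v) => v.uri === uri && v.systemEnd === FOREVER);
  }

  // Insert or update: close the open version (if any) and append a new one.
  insert(uri, validStart, validEnd, content, now) {
    this._advance(now);
    const current = this._current(uri);
    if (current) current.systemEnd = now;
    this.versions.push({ uri, validStart, validEnd, content,
                         systemStart: now, systemEnd: FOREVER });
  }

  // "Delete": only the system end time moves from infinity to the delete time.
  remove(uri, now) {
    this._advance(now);
    const current = this._current(uri);
    if (current) current.systemEnd = now;
  }

  // Return the version that was current in the store at the given system time.
  asOf(uri, systemTime) {
    return this.versions.find((v) => v.uri === uri &&
      v.systemStart <= systemTime && systemTime < v.systemEnd);
  }
}

const store = new VersionedStore();
const from = "2014-04-03T11:00:00";
const to = "2014-04-03T16:00:00";
store.insert("exampledata.json", from, to, { price: 100 }, "2014-04-03T11:05:00");
store.insert("exampledata.json", from, to, { price: 101 }, "2014-04-03T12:00:00");
store.remove("exampledata.json", "2014-04-04T09:00:00");

console.log(store.versions.length);                                            // 2
console.log(store.asOf("exampledata.json", "2014-04-03T11:30:00").content);   // { price: 100 }
console.log(store.asOf("exampledata.json", "2014-04-05T00:00:00"));           // undefined
```

After the delete, both versions are still in the store, so a query “as of” any earlier system time returns what was on record at that moment. Only the current view reports the document as gone.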
  • 14. HOW TO GET STARTED
Managing time is not easy. If it were, we probably could have avoided multi-billion dollar problems like Y2K.9 But managing time is a necessity, and bitemporal is the future of managing time in databases as we seek to maintain a better record of “what we knew” and “when we knew it.” MarkLogic takes away the constraints that prevent the adoption of bitemporal, and is the best database for storing and managing bitemporal data.

GET GOING QUICKLY
1. Identify the questions your business cannot currently answer
2. Identify the business benefits of adding bitemporal
3. Assess the current data management environment
4. Engage with MarkLogic to discuss implementation
5. Download MarkLogic
6. Learn more in MarkLogic’s free training

9 According to the BBC and ComputerWorld, the estimated cost of the preparation and remediation for the “Year 2000 problem”, or Y2K, was $608 billion, not adjusted for inflation. For more information: Robert L. Mitchell. “Y2K: The good, the bad and the crazy”. ComputerWorld (28 December 2009) <http://www.computerworld.com/article/2522197/it-management/y2k--the-good--the-bad-and-the-crazy.html?page=2>

MORE INFORMATION
• Read MarkLogic Documentation – Learn how to work with bitemporal data in MarkLogic at docs.marklogic.com/guide/temporal
• Watch a Presentation – Hear from a MarkLogic customer about “Why Banks Care About Bitemporal” www.marklogic.com/resources/why-banks-care-about-bitemporality/
• Schedule a Meeting – Discuss your particular use case with a MarkLogic sales representative by contacting us at sales@marklogic.com

“MarkLogic has a history of bringing advanced data management technology to market, and many of their customers and partners are accustomed to managing complex data in an agile manner. As a result, MarkLogic customers and partners, in general, have a more mature and creative view of how to manage and use data than do most other database users.”
Carl Olofson, Research Vice President for Data Management Software Research, IDC