1. Historical Data
The Simple Hard Problem Of Time
• Time and time series are fundamental
to the fabric of relationships that exist
in any database that manages historical
data.
• Time and time series are independent of
the applications and entities they help
to model.
Historical Data – A Simple Hard Problem 1
2. Historical Databases, Time, and Time Series
• An introduction to some financial data
and a simple hard problem.
• An introduction to temporal data from a
data base designer’s perspective.
• An introduction to a query processing
architecture that supports this model.
3. A Simple Hard Problem
• Find the average P/E (price / eps12) of the
S&P 500 at each of the last 12 month ends.
^today - 1 monthEnds to: ^today - 12 monthEnds by: 1 monthEnds.
evaluate: [
^date print;
Named Universe SP500 list average: [price / eps12]. printNL
];
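For readers unfamiliar with the query language, the date range at the top of the query can be sketched in Python. This is only an illustrative analogue, not the engine's implementation: `last_month_ends` is a hypothetical helper, and it assumes calendar month ends rather than business-day month ends.

```python
import calendar
from datetime import date

def last_month_ends(today: date, count: int = 12) -> list[date]:
    """Return the month-end dates of the `count` months preceding today's month,
    most recent first (assumption: calendar month ends, not business days)."""
    year, month = today.year, today.month
    ends = []
    for _ in range(count):
        # Step back one calendar month, wrapping across the year boundary.
        month -= 1
        if month == 0:
            year, month = year - 1, 12
        last_day = calendar.monthrange(year, month)[1]
        ends.append(date(year, month, last_day))
    return ends

month_ends = last_month_ends(date(1998, 8, 25))
```

The query then evaluates the average of price / eps12 once per date in this list, with each date acting as the temporal context for the lookup.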
4. A Simple Hard Problem
Complex Data Varies Over Time
Named Universe SP500 list
• S&P updates the membership of the
S&P 500 monthly, so...
• Time Series hold more than simple data
like numbers and strings.
5. A Simple Hard Problem
Comparable Data Is Measured At Different Points In Time
price / eps12
• This simple ratio is based on data
measured and recorded at different
frequencies and at different points in
time.
• price is probably measured and recorded
on a business day basis
• eps12 is a quarterly value based on the
most recent 4 quarters of data.
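One way to picture the alignment problem is an "as of" lookup that returns the latest observation on or before a date, applied to both series. The sketch below is a minimal Python illustration under stated assumptions: the `as_of` helper and all the sample values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from bisect import bisect_right
from datetime import date

def as_of(series, when):
    """Return the latest value observed on or before `when`, or None if there is none.
    `series` is a list of (date, value) pairs sorted by date."""
    dates = [d for d, _ in series]
    i = bisect_right(dates, when)
    return series[i - 1][1] if i else None

# Hypothetical data: prices recorded on business days, eps12 recorded quarterly.
prices = [(date(1998, 7, 30), 48.0), (date(1998, 7, 31), 50.0)]
eps12 = [(date(1998, 3, 31), 2.0), (date(1998, 6, 30), 2.5)]

when = date(1998, 7, 31)
pe = as_of(prices, when) / as_of(eps12, when)  # 50.0 / 2.5
```

The ratio combines a price from July 31 with an earnings figure from June 30; the lookup, not the caller, decides which observation is current.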
6. A Simple Hard Problem
Historical Data is Restated
• Stock splits require adjustments to
historical data:
– At the end of 1997, Microsoft reported
earnings that made eps12 $3.24 per share.
– Effective February 23, 1998, Microsoft
stock split 2 for 1.
– After the split and until a new value is
reported, the value of eps12 should be $1.62
per share.
7. A Simple Hard Problem
Historical Data is Restated
• Split adjustments require the
restatement of all historical per share
data to make it consistent and
comparable.
Date         Value (as reported)   Value (split-adjusted)
03/31/1997   2.27                  1.13
06/30/1997   2.64                  1.32
09/30/1997   3.60                  1.80
12/31/1997   3.24                  1.62
8. A Simple Hard Problem
Restating Historical Data
• There are two ways to restate historical
data:
– Convert a simple fact into a massive,
complex, and error prone update.
– Adjust the affected data on access
using a time-series of adjustment
factors.
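The second approach can be sketched in Python using the Microsoft figures quoted earlier. This is a simplified illustration rather than the engine's actual mechanism: the function names are hypothetical, and the factor series is assumed to hold a cumulative share multiplier that defaults to 1.0 before the first split.

```python
from datetime import date

# Split events as (effective date, cumulative share multiplier).
# The single entry is the 2-for-1 Microsoft split cited in the slides.
SPLIT_FACTORS = [(date(1998, 2, 23), 2.0)]

# eps12 exactly as originally reported; these rows are never rewritten.
REPORTED_EPS12 = [
    (date(1997, 3, 31), 2.27),
    (date(1997, 6, 30), 2.64),
    (date(1997, 9, 30), 3.60),
    (date(1997, 12, 31), 3.24),
]

def factor_as_of(when):
    """Cumulative split multiplier in effect on `when` (1.0 before any split)."""
    factor = 1.0
    for effective, value in SPLIT_FACTORS:
        if effective <= when:
            factor = value
    return factor

def adjusted(raw, recorded, viewed):
    """Restate a per-share value recorded on `recorded` in the share basis of `viewed`."""
    return raw * factor_as_of(recorded) / factor_as_of(viewed)

# Viewed after the split, every 1997 value is halved on access.
restated = [(d, adjusted(v, d, date(1998, 3, 1))) for d, v in REPORTED_EPS12]
```

Recording the split is a single new row in `SPLIT_FACTORS`; no stored earnings value changes. The first restated value comes out as 1.135, which the slide's table shows as 1.13.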
9. A Simple Hard Problem
Restating Historical Data
• Split adjustment data is completely
irregular and has absolutely no
periodicity, so…
• Efficient, irregular, event-oriented time-
series are required to store it with
minimal redundancy and maximal
consistency.
10. A Simple Hard Problem
Seemingly Regular Data Is Not as Regular As It Seems
• Companies report their data on a fiscal,
not a calendar basis:
– the fourth quarter of fiscal 1998 for
Woolworth’s ends in January 1998
– the fourth quarter of fiscal 1998 for
Walgreen’s ends in August 1998
11. A Simple Hard Problem
Seemingly Regular Data Is Not as Regular As It Seems
• Accessing the most recent earnings per
share as of August 25, 1998 means
accessing:
– 2nd Quarter, 1999 fiscal data for
Woolworth’s
– 3rd Quarter, 1998 fiscal data for
Walgreen’s
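The two answers above follow from each company's fiscal year end. The Python sketch below reproduces them under simplifying assumptions that are not stated in the slides: a fiscal year is labeled by the calendar year in which it ends, each quarter ends on the last day of a calendar month, and reporting lag is ignored. `latest_fiscal_quarter` is a hypothetical helper.

```python
import calendar
from datetime import date

def latest_fiscal_quarter(as_of, fiscal_year_end_month):
    """Return (fiscal_year, quarter, quarter_end) for the most recent fiscal
    quarter that ended on or before `as_of`."""
    year, month = as_of.year, as_of.month
    while True:
        end = date(year, month, calendar.monthrange(year, month)[1])
        months_after_fye = (month - fiscal_year_end_month) % 12
        # Quarter ends fall 0, 3, 6, or 9 months after the fiscal year end month.
        if months_after_fye % 3 == 0 and end <= as_of:
            break
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    quarter = 4 if months_after_fye == 0 else months_after_fye // 3
    # Months after the fiscal year end month belong to the next fiscal year.
    fiscal_year = year if month <= fiscal_year_end_month else year + 1
    return fiscal_year, quarter, end

woolworths = latest_fiscal_quarter(date(1998, 8, 25), fiscal_year_end_month=1)
walgreens = latest_fiscal_quarter(date(1998, 8, 25), fiscal_year_end_month=8)
```

The same calendar date therefore maps to different fiscal periods per company, which is why the lookup has to be driven by each security's own time series rather than by a shared quarterly grid.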
12. A Simple Hard Problem
Currency Conversion
• What if this simple hard problem was
based on a universe of international
securities?
• What if different data sources report
data for the same security in different
currencies?
• Currency conversion rates - another
time-series required to correctly use
financial data.
13. A Simple Hard Problem
A Summary of Some of the Issues
• Complex aggregates, not just numbers
and strings, vary over time.
• Comparable data is measured at
different points in time.
• Regularly measured data is adjusted for
the effects of irregularly spaced events.
• Seemingly regular data is often not as
regular as it first appears.
14. A Simple Hard Problem
A Summary of Some of the Needs
• Complex rules are required to correctly
interpret and use the data.
• These rules must be encapsulated in a
reusable form so that every application
does not need to reproduce them.
• These rules must be accessible to the
DBMS if it is to be more than a static
repository.
15. A Simple Hard Problem
A Summary of Some of the Needs
• Simplicity
• Despite the complexity associated with
accessing and using the data, simple
queries must remain simple to state:
^today - 1 monthEnds to: ^today - 12 monthEnds by: 1 monthEnds.
evaluate: [
^date print;
Named Universe SP500 list average: [price / eps12]. printNL
];
16. A Simple Hard Problem
A Summary of Some of the Needs
• The issue is building and using an
historical database, not just storing and
retrieving stand-alone time-series.
17. A Designer’s Perspective
On A Simple Hard Problem
• With time providing the context to
answer it correctly…
^today - 1 monthEnds to: ^today - 12 monthEnds by: 1 monthEnds.
evaluate: [
^date print;
Named Universe SP500 list average: [price / eps12]. printNL
];
• How do we get there?
18. A Designer’s Perspective
Some Underlying Data
• Underneath it all, a simple enough
workhorse…
PriceRecord
defineFixedProperty: 'security'.
defineFixedProperty: 'recordDate'.
defineFixedProperty: 'rawPrice'.
defineFixedProperty: 'rawVolume'.
defineFixedProperty: 'adjustmentDate'
19. A Designer’s Perspective
Temporal Multi-Valued Relationships
• … and a temporal, multi-valued
relationship from Security to
PriceRecord (a.k.a. a TimeSeries).
Security define: 'prices' withDefault: PriceRecord
[Diagram: Security -- prices [1:n] (T) --> PriceRecord]
20. A Designer’s Perspective
Temporal Multi-Valued Relationships
• Temporal multi-valued relationships
can be accessed and used as time-
series…
Named Security IBM :prices count
Named Security IBM :prices minimum: [recordDate]
Named Security IBM :prices mavg30: [price]
Named Security IBM :prices asOf: ^today - 6 monthEnds
21. A Designer’s Perspective
Temporal Multi-Valued Relationships
• But they exhibit their modeling power
when combined with the temporal
context of an operation to yield the
correct single value for that context…
Named Security IBM prices rawPrice
19980731 evaluate: [
Named Security IBM prices rawPrice
]
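The idea that the same expression yields a different single value depending on the surrounding date can be approximated in Python with a context manager. This is a loose analogy under stated assumptions, not a description of how the engine works: `evaluate_as_of`, `TimeSeries`, and the sample prices are all hypothetical.

```python
from bisect import bisect_right
from contextlib import contextmanager
from datetime import date

# Stack of evaluation dates; the top entry is the current temporal context.
_context_dates = [date.today()]

@contextmanager
def evaluate_as_of(when):
    """Run the enclosed block with `when` as the temporal context."""
    _context_dates.append(when)
    try:
        yield
    finally:
        _context_dates.pop()

class TimeSeries:
    def __init__(self, observations):
        self._obs = sorted(observations)  # (date, value) pairs

    def value(self):
        """Project the series to the single value current in this context."""
        dates = [d for d, _ in self._obs]
        i = bisect_right(dates, _context_dates[-1])
        return self._obs[i - 1][1] if i else None

# Hypothetical raw prices for illustration only.
raw_price = TimeSeries([
    (date(1998, 7, 30), 130.0),
    (date(1998, 7, 31), 132.5),
    (date(1998, 8, 3), 131.0),
])

latest = raw_price.value()             # default context: today
with evaluate_as_of(date(1998, 7, 31)):
    month_end = raw_price.value()      # same expression, different context
```

The calling code never passes a date to `value()`; the context supplies it, which mirrors how the two query forms above differ only in the enclosing date.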
22. A Designer’s Perspective
What About Rules Like Split Adjustment?
• Split adjustment requires a time-series of
adjustment factors for each Security:
Security define: 'adjustmentFactor' withDefault: 1.0;
• And a rule to compute a relative
adjustment factor between an arbitrary
date and the present:
Security defineMethod: [ | adjustmentRelativeTo: aDate |
(:adjustmentFactor asOf: ^today) / (:adjustmentFactor asOf: aDate)
];
23. A Designer’s Perspective
What About Rules Like Split Adjustment?
• With the rule in place, PriceRecord and
Security can use it:
» PriceRecord defineMethod: [ | adjustedPrice |
rawPrice / adjustmentFactor
];
» PriceRecord defineMethod: [ | adjustmentFactor |
security adjustmentRelativeTo: (adjustmentDate else: recordDate)
];
» Security defineMethod: [ | price |
prices adjustedPrice
];
24. A Designer’s Perspective
Queries Revisited
• … to enable the simple statement of
complex queries…
» Named Security IBM price
» ^today - 1 monthEnds evaluate: [Named Security IBM price]
» ^today - 1 monthEnds to: ^today - 12 monthEnds by: 1 monthEnds.
evaluate: [
^date print;
Named Universe SP500 list average: [price / eps12]. printNL
];
25. A Designer’s Perspective
What Are Time Series? A Recap
• Time series are date indexed collections.
• Time series support collection level
operations:
select: average: min: max:
• The set of collection level operations is
and must be user extensible:
mavg30: lsgrow:
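As an illustration of a user-supplied collection operation, the Python sketch below defines a trailing moving average over a date-indexed collection. It assumes that `mavg30:` means a moving average over the last 30 observations, which the slides do not spell out; the `mavg` function and its parameters are hypothetical.

```python
from datetime import date
from functools import partial

def mavg(series, window, value_of=lambda item: item):
    """Trailing moving average over the last `window` observations.
    `series` is a list of (date, item) pairs sorted by date; `value_of`
    extracts the number to average, much like the block argument in the slides."""
    values = []
    result = []
    for when, item in series:
        values.append(value_of(item))
        recent = values[-window:]  # fewer than `window` items at the start
        result.append((when, sum(recent) / len(recent)))
    return result

# A user-defined operation built from the general one.
mavg30 = partial(mavg, window=30)

sample = [
    (date(1998, 8, 3), 1.0),
    (date(1998, 8, 4), 2.0),
    (date(1998, 8, 5), 3.0),
    (date(1998, 8, 6), 4.0),
]
smoothed = mavg(sample, window=2)
```

The point of the sketch is that the operation is defined once against the collection interface and then applies to any time series, rather than being rewritten per application.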
26. A Designer’s Perspective
What Are Time Series? A Recap
• Time series have an associated date
type that serves as a ‘calendar’.
• The date type defines a time line along
which observations are stored.
• The events recorded in a time series
divide the time line into intervals.
27. A Designer’s Perspective
What Are Time Series? A Recap
• Time series support the interval queries
needed to project temporal multi-valued
relationships to context dependent
single valued relationships:
– Find the observation on or before a
given time point.
– Find the time point that begins
(ends) the interval containing a
given time point.
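The second interval query can be sketched with a binary search over the event dates. The Python below is a minimal illustration; `interval_containing` is a hypothetical name, and treating each interval as starting on an event date and ending just before the next one is an assumption.

```python
from bisect import bisect_right
from datetime import date

def interval_containing(event_dates, when):
    """Return (start, end) for the interval containing `when`.
    `start` is the latest event date on or before `when`; `end` is the first
    event date after it. Either is None when `when` lies outside the events."""
    i = bisect_right(event_dates, when)
    start = event_dates[i - 1] if i else None
    end = event_dates[i] if i < len(event_dates) else None
    return start, end

# Quarter-end dates taken from the restatement example earlier in the deck.
quarter_ends = [
    date(1997, 3, 31),
    date(1997, 6, 30),
    date(1997, 9, 30),
    date(1997, 12, 31),
]

start, end = interval_containing(quarter_ends, date(1997, 8, 15))
```

Because the search is over the sorted event dates, the cost grows with the logarithm of the number of events, and irregularly spaced events need no special handling.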
28. An Architectural Perspective
(Don’t Try This At Home)
• The engine that powers these examples
employs a model of information that
integrates data base and programming
language principles into a scalable data
base programming language.
29. An Architectural Perspective
(Don’t Try This At Home)
• The examples are data base oriented,
but the architecture and implementation
are not those of a programming language
manipulating data extracted from an
external data base.
30. An Architectural Perspective
(Don’t Try This At Home)
• The examples are object-oriented, but
the architecture and implementation are
not those of a traditional object-oriented
programming language.
31. An Architectural Perspective
Key Features of the Model
• Relationship centric information model
based on category theory.
• Objects are abstract entities that have
no internal state or structure. They are
not records.
• All information is stored in the
functions that connect objects.
32. An Architectural Perspective
Inherently Algebraic
• The following diagram is a simplified
view of the algebraic structure of a
time-series lookup operation:
[Diagram: nodes Query, Result, Series, Dates, and Elements, connected by arrows labeled e, k, epart, csel, esel, and ksel]
33. An Architectural Perspective
Inherently Collection Centric And Parallel
• For example, when processing the price
method in:
^today - 1 monthEnds to: ^today - 12 monthEnds by: 1 monthEnds.
evaluate: [
^date print;
Named Universe SP500 list average: [price / eps12]. printNL
];
• …the engine is operating on a set of
Security objects, not a single Security.
34. An Architectural Perspective
Globally Optimizable
• Optimizations apply to the entire
application, not just the data base or
programming language portions of it:
– query precision
– computation flows tuned to
clustering
– morphism factoring