Historical Data
     The Simple Hard Problem Of Time

• Time and time series are fundamental
  to the fabric of relationships that exist
  in any database that manages historical
  data.
• Time and time series are independent of
  the applications and entities they help
  to model.



             Historical Data – A Simple Hard Problem   1
Historical Databases, Time, and Time Series


• An introduction to some financial data
  and a simple hard problem.
• An introduction to temporal data from a
  data base designer’s perspective.
• An introduction to a query processing
  architecture that supports this model.




A Simple Hard Problem


• Find the average P/E (price / eps12) for the
  S&P 500 at each of the last 12 month ends.

  ^today - 1 monthEnds to: ^today -12 monthEnds by: 1 monthEnds.
  evaluate: [
     ^date print;
     Named Universe SP500 list average: [price / eps12]. printNL
  ];
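
A rough Python sketch of the date range the query iterates over. It assumes "month ends" means calendar month ends and that the range starts with the most recent month end before the current month; the function names are hypothetical and are not part of the query language shown above.

```python
from datetime import date, timedelta

def month_end_before(d):
    """Return the last calendar day of the month preceding d's month."""
    return d.replace(day=1) - timedelta(days=1)

def last_month_ends(today, count=12):
    """Return the `count` most recent prior month ends, newest first."""
    ends = []
    cursor = today
    for _ in range(count):
        cursor = month_end_before(cursor)
        ends.append(cursor)
    return ends

# Twelve evaluation dates as seen from August 25, 1998.
for month_end in last_month_ends(date(1998, 8, 25)):
    print(month_end.isoformat())
```

Each of these dates then becomes the temporal context in which membership, price, and eps12 are looked up.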




A Simple Hard Problem
       Complex Data Varies Over Time

  Named Universe SP500 list
• S&P updates the membership of the
  S&P 500 monthly, so...
• Time Series hold more than simple data
  like numbers and strings.




A Simple Hard Problem
Comparable Data Is Measured At Different Points In Time


price / eps12
• This simple ratio is based on data
   measured and recorded at different
   frequencies and at different points in
   time.
• price is probably measured and recorded
   on a business day basis
• eps12 is a quarterly value based on the
   most recent 4 quarters of data.
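
A minimal sketch of why the ratio needs an "as of" lookup on both sides: a daily price series and a quarterly eps12 series are each asked for their latest observation on or before the same date. The sample values and the names `as_of` and `pe_as_of` are hypothetical.

```python
from bisect import bisect_right
from datetime import date

def as_of(series, when):
    """Latest value dated on or before `when` in a date-sorted list of
    (date, value) pairs, or None if nothing has been recorded yet."""
    dates = [d for d, _ in series]
    i = bisect_right(dates, when)
    return series[i - 1][1] if i else None

# Hypothetical data: prices on business days, eps12 at quarter ends.
prices = [(date(1998, 7, 30), 49.0),
          (date(1998, 7, 31), 50.0),
          (date(1998, 8, 3), 51.0)]
eps12 = [(date(1998, 3, 31), 2.0),
         (date(1998, 6, 30), 2.5)]

def pe_as_of(when):
    """price / eps12, each taken as of `when`; None when undefined."""
    price, eps = as_of(prices, when), as_of(eps12, when)
    if price is None or not eps:
        return None
    return price / eps

# A Saturday: the price comes from Friday, eps12 from the June quarter end.
print(pe_as_of(date(1998, 8, 1)))
```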
A Simple Hard Problem
          Historical Data is Restated

• Stock splits require adjustments to
  historical data:
   – At the end of 1997, Microsoft reported
     earnings that made eps12 $3.24 per share.
   – Effective February 23, 1998, Microsoft
     stock split 2 for 1.
   – After the split and until a new value is
     reported, the value of eps12 should be $1.62
     per share.


A Simple Hard Problem
               Historical Data is Restated

• Split adjustments require the
  restatement of all historical per share
  data to make it consistent and
  comparable.

  As reported (before split)                         Restated (after 2-for-1 split)
  Date          Value                                Date         Value
  03/31/1997    2.27                                 03/31/1997   1.13
  06/30/1997    2.64                                 06/30/1997   1.32
  09/30/1997    3.60                                 09/30/1997   1.80
  12/31/1997    3.24                                 12/31/1997   1.62


A Simple Hard Problem
          Restating Historical Data

• There are two ways to restate historical
  data:
   – Convert a simple fact into a massive,
     complex, and error prone update.
   – Adjust the affected data on access
     using a time-series of adjustment
     factors.
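
The second option can be sketched in Python as follows. The sketch assumes the adjustment factors are stored as a cumulative split factor that changes only on split dates (1.0 before any split); that storage convention and all names are assumptions. The stored values are never rewritten; they are divided by a relative factor when read.

```python
from bisect import bisect_right
from datetime import date

# Cumulative split factors, recorded only when a split takes effect.
# The single entry is the 2-for-1 split of February 23, 1998.
SPLIT_FACTORS = [(date(1998, 2, 23), 2.0)]

def factor_as_of(when, factors=SPLIT_FACTORS, default=1.0):
    """Cumulative factor in effect on `when` (default before any split)."""
    dates = [d for d, _ in factors]
    i = bisect_right(dates, when)
    return factors[i - 1][1] if i else default

def adjust_on_access(raw_value, recorded_on, viewed_on):
    """Restate a per-share value recorded on one date as seen from another."""
    relative = factor_as_of(viewed_on) / factor_as_of(recorded_on)
    return raw_value / relative

# Per-share values as originally reported; these are never updated in place.
REPORTED = [(date(1997, 3, 31), 2.27), (date(1997, 6, 30), 2.64),
            (date(1997, 9, 30), 3.60), (date(1997, 12, 31), 3.24)]

viewed_on = date(1998, 8, 25)
restated = [(d, adjust_on_access(v, d, viewed_on)) for d, v in REPORTED]
for d, v in restated:
    print(d.isoformat(), v)
```

Recording a new split is then a single insertion into the factor series rather than an update of every historical per-share value.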



A Simple Hard Problem
          Restating Historical Data

• Split adjustment data is completely
  irregular and has absolutely no
  periodicity, so…
• Efficient, irregular, event-oriented time-
  series are required to store it with
  minimal redundancy and maximal
  consistency.



A Simple Hard Problem
Seemingly Regular Data Is Not as Regular As It Seems



• Companies report their data on a fiscal,
  not a calendar basis:
   – the fourth quarter of fiscal 1998 for
     Woolworth’s ends in January 1998
   – the fourth quarter of fiscal 1998 for
     Walgreen’s ends in August 1998



A Simple Hard Problem
Seemingly Regular Data Is Not as Regular As It Seems



• Accessing the most recent earnings per
  share as of August 25, 1998 means
  accessing:
   – 2nd Quarter, 1999 fiscal data for
     Woolworth’s
   – 3rd Quarter, 1998 fiscal data for
     Walgreen’s
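
A simplified sketch of the fiscal-to-calendar mapping. It assumes fiscal quarters end on calendar month ends, that a fiscal year is labeled by the calendar year in which it ends, and that data is available as soon as the quarter closes (reporting lags are ignored). The function name is hypothetical.

```python
from datetime import date, timedelta

def latest_fiscal_quarter(as_of, fy_end_month):
    """Return (fiscal_year, quarter) for the most recent fiscal quarter
    that ended on or before `as_of`."""
    year, month = as_of.year, as_of.month
    # If as_of is not the last day of its month, that month is incomplete.
    if (as_of + timedelta(days=1)).month == as_of.month:
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    # Walk back to a month that closes a fiscal quarter.
    while (month - fy_end_month) % 3 != 0:
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    offset = (month - fy_end_month) % 12   # months since fiscal year end
    quarter = 4 if offset == 0 else offset // 3
    fiscal_year = year if month <= fy_end_month else year + 1
    return fiscal_year, quarter

as_of = date(1998, 8, 25)
print("January fiscal year end:", latest_fiscal_quarter(as_of, 1))
print("August fiscal year end: ", latest_fiscal_quarter(as_of, 8))
```

With a January fiscal year end the lookup lands in the 2nd quarter of fiscal 1999; with an August fiscal year end it lands in the 3rd quarter of fiscal 1998.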


A Simple Hard Problem
           Currency Conversion

• What if this simple hard problem were
  based on a universe of international
  securities?
• What if different data sources report
  data for the same security in different
  currencies?
• Currency conversion rates - another
  time-series required to correctly use
  financial data.

A Simple Hard Problem
      A Summary of Some of the Issues

• Complex aggregates, not just numbers
  and strings, vary over time.
• Comparable data is measured at
  different points in time.
• Regularly measured data is adjusted for
  the effects of irregularly spaced events.
• Seemingly regular data is often not as
  regular as it first appears.

A Simple Hard Problem
      A Summary of Some of the Needs

• Complex rules are required to correctly
  interpret and use the data.
• These rules must be encapsulated in a
  reusable form so that every application
  does not need to reproduce them.
• These rules must be accessible to the
  DBMS if it is to be more than a static
  repository.


A Simple Hard Problem
        A Summary of Some of the Needs

• Simplicity
• Despite the complexity associated with
  accessing and using the data, simple
  queries must remain simple to state:

  ^today - 1 monthEnds to: ^today -12 monthEnds by: 1 monthEnds.
  evaluate: [
     ^date print;
     Named Universe SP500 list average: [price / eps12]. printNL
  ];


A Simple Hard Problem
      A Summary of Some of the Needs



• The issue is building and using an
  historical database, not just storing and
  retrieving stand-alone time-series.




A Designer’s Perspective
             On A Simple Hard Problem

• With time providing the context to
  answer it correctly…
  ^today - 1 monthEnds to: ^today -12 monthEnds by: 1 monthEnds.
  evaluate: [
     ^date print;
     Named Universe SP500 list average: [price / eps12]. printNL
  ];


• How do we get there?

A Designer’s Perspective
              Some Underlying Data

• Underneath it all, a simple enough
  workhorse…

  PriceRecord
     defineFixedProperty: 'security'.
     defineFixedProperty: 'recordDate'.
     defineFixedProperty: 'rawPrice'.
     defineFixedProperty: 'rawVolume'.
     defineFixedProperty: 'adjustmentDate'



A Designer’s Perspective
    Temporal Multi-Valued Relationships

• … and a temporal, multi-valued
  relationship from Security to
  PriceRecord (a.k.a. a TimeSeries).

  Security define: 'prices' withDefault: PriceRecord


      Security ---- prices [1:n] (T) ----> PriceRecord



A Designer’s Perspective
    Temporal Multi-Valued Relationships

• Temporal multi-valued relationships
  can be accessed and used as time-
  series…

  Named Security IBM :prices count
  Named Security IBM :prices minimum: [recordDate]
  Named Security IBM :prices mavg30: [price]
  Named Security IBM :prices asOf: ^today - 6 monthEnds



A Designer’s Perspective
    Temporal Multi-Valued Relationships

• But they exhibit their modeling power
  when combined with the temporal
  context of an operation to yield the
  correct single value for that context…

  Named Security IBM prices rawPrice

  19980731 evaluate: [
    Named Security IBM prices rawPrice
  ]
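
One way to picture the temporal context is as an ambient "current date" that every time-series access consults. The Python sketch below imitates this with a context manager; it only illustrates the idea, not how the engine behind these slides works, and the class, function names, and prices are hypothetical.

```python
from bisect import bisect_right
from contextlib import contextmanager
from datetime import date

_context_dates = []   # stack of dates set by enclosing evaluate_as_of blocks

def current_date():
    """The date of the innermost temporal context, or today if none is set."""
    return _context_dates[-1] if _context_dates else date.today()

@contextmanager
def evaluate_as_of(when):
    """Run the enclosed block with `when` as the temporal context."""
    _context_dates.append(when)
    try:
        yield
    finally:
        _context_dates.pop()

class TimeSeries:
    def __init__(self, points):
        self._points = sorted(points, key=lambda p: p[0])
        self._dates = [d for d, _ in self._points]

    def value(self):
        """The single observation that applies in the current context."""
        i = bisect_right(self._dates, current_date())
        return self._points[i - 1][1] if i else None

# Hypothetical raw prices.
ibm_raw_price = TimeSeries([(date(1998, 7, 30), 120.0),
                            (date(1998, 7, 31), 132.5),
                            (date(1998, 8, 3), 128.0)])

print(ibm_raw_price.value())             # context defaults to today
with evaluate_as_of(date(1998, 7, 31)):
    print(ibm_raw_price.value())         # same expression, different context
```

The expression inside and outside the block is identical; only the context changes which observation it yields.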


A Designer’s Perspective
  What About Rules Like Split Adjustment?

• Split adjustment requires a time-series of
  adjustment factors for each Security:
    Security define: 'adjustmentFactor' withDefault: 1.0;

• And a rule to compute a relative
  adjustment factor between an arbitrary
  date and the present:
    Security defineMethod: [ | adjustmentRelativeTo: aDate |
       (:adjustmentFactor asOf: ^today) / (:adjustmentFactor asOf: aDate)
    ];


A Designer’s Perspective
    What About Rules Like Split Adjustment?

• With the rule in place, PriceRecord and
    Security can use it:
»    PriceRecord defineMethod: [ | adjustedPrice |
            rawPrice / adjustmentFactor
     ];

»    PriceRecord defineMethod: [ | adjustmentFactor |
            security adjustmentRelativeTo: (adjustmentDate else: recordDate)
     ];

»    Security defineMethod: [ | price |
            prices adjustedPrice
     ];

A Designer’s Perspective
                     Queries Revisited

• … to enable the simple statement of
    complex queries…

» Named Security IBM price
» ^today - 1 monthEnds evaluate: [Named Security IBM price]
» ^today - 1 monthEnds to: ^today -12 monthEnds by: 1 monthEnds.
    evaluate: [
       ^date print;
       Named Universe SP500 list average: [price / eps12]. printNL
    ];


A Designer’s Perspective
        What Are Time Series? A Recap

• Time series are date-indexed collections.
• Time series support collection-level
  operations:
   select:   average:           min:           max:
• The set of collection-level operations is,
  and must be, user-extensible:
   mavg30:     lsgrow:
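
A small Python sketch of what user extensibility could look like: operations live in a registry, so a new one is added without touching the collection class. The registry design and names are assumptions, and the moving average only stands in for an operation such as mavg30: (presumably a 30-observation moving average).

```python
from statistics import mean

class Series:
    """A collection whose operation set can be extended by users."""
    operations = {}

    def __init__(self, values):
        self.values = list(values)

    @classmethod
    def register(cls, name):
        """Decorator that adds a collection-level operation under `name`."""
        def decorator(fn):
            cls.operations[name] = fn
            return fn
        return decorator

    def apply(self, name, *args):
        return self.operations[name](self.values, *args)

@Series.register("average")
def average(values):
    return mean(values)

# A user-supplied extension, added after the class was defined.
@Series.register("mavg")
def moving_average(values, window):
    """Mean of each run of `window` consecutive observations."""
    return [mean(values[i - window + 1:i + 1])
            for i in range(window - 1, len(values))]

s = Series([1, 2, 3, 4])
print(s.apply("average"), s.apply("mavg", 2))
```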


A Designer’s Perspective
      What Are Time Series? A Recap

• Time series have an associated date
  type that serves as a ‘calendar’.
• The date type defines a time line along
  which observations are stored.
• The events recorded in a time series
  divide the time line into intervals.




A Designer’s Perspective
      What Are Time Series? A Recap

• Time series support the interval queries
  needed to project temporal, multi-valued
  relationships to context-dependent,
  single-valued relationships:
   – Find the observation on or before a
     given time point.
   – Find the time point that begins
     (ends) the interval containing a
     given time point.
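
Both interval queries reduce to a binary search over the sorted event dates. The sketch below assumes half-open intervals (an event date begins the interval it opens) and uses hypothetical function names.

```python
from bisect import bisect_right
from datetime import date

def observation_on_or_before(dates, values, when):
    """Value of the latest event dated on or before `when`, else None."""
    i = bisect_right(dates, when)
    return values[i - 1] if i else None

def interval_containing(dates, when):
    """Return (begin, end) for the interval containing `when`: begin is the
    latest event date <= when, end is the earliest event date > when.
    Either side is None where the time line is open."""
    i = bisect_right(dates, when)
    begin = dates[i - 1] if i else None
    end = dates[i] if i < len(dates) else None
    return begin, end

# Quarterly event dates with hypothetical values.
dates = [date(1998, 3, 31), date(1998, 6, 30), date(1998, 9, 30)]
values = [2.0, 2.5, 2.75]

print(observation_on_or_before(dates, values, date(1998, 8, 25)))
print(interval_containing(dates, date(1998, 8, 25)))
```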

An Architectural Perspective
        (Don’t Try This At Home)



• The engine that powers these examples
  employs a model of information that
  integrates data base and programming
  language principles into a scalable data
  base programming language.




An Architectural Perspective
        (Don’t Try This At Home)




• The examples are data base oriented,
  but the architecture and implementation
  are not those of a programming language
  manipulating data extracted from an
  external data base.



An Architectural Perspective
         (Don’t Try This At Home)




• The examples are object-oriented, but
  the architecture and implementation are
  not those of a traditional object-oriented
  programming language.




An Architectural Perspective
         Key Features of the Model

• Relationship-centric information model
  based on category theory.
• Objects are abstract entities that have
  no internal state or structure. They are
  not records.
• All information is stored in the
  functions that connect objects.



An Architectural Perspective
            Inherently Algebraic

• The following diagram is a simplified
  view of the algebraic structure of a
  time-series lookup operation:
  [Diagram not reproduced. Labels shown: Elements, Result, Series,
   Dates, Query; e, k, epart, csel, esel, ksel.]


An Architectural Perspective
  Inherently Collection Centric And Parallel

• For example, when processing the price
  method in:
    ^today - 1 monthEnds to: ^today -12 monthEnds by: 1 monthEnds.
    evaluate: [
        ^date print;
        Named Universe SP500 list average: [price / eps12]. printNL
    ];

• …the engine is operating on a set of
  Security objects, not a single Security.


An Architectural Perspective
          Globally Optimizable

• Optimizations apply to the entire
  application, not just the data base or
  programming language portions of it:
   – query precision
   – computation flows tuned to
     clustering
   – morphism factoring


