"Three Dimensional Time: Working with Alternative Data" by Kathryn Glowinski, Engineer at Quantopian

From QuantCon 2017: Lookahead bias and stale data when used in an algorithm are generally categorized as "incorrect data". In fact, the issue does not lie with the data itself, but instead is an issue of perspective. This talk will examine how data is typically viewed through the lens of time, and why, on the whole, that approach is wrong.

At Quantopian, we've tried several ways of handling data with regards to time, and we'll talk about lessons learned along the way. We'll also discuss what multidimensionality means for financial data specifically, and how we can apply this to get better results in backtesting.

Additionally, we'll touch on how to apply multidimensionality to more general data, and why it's important for anyone working with applied data to take this approach.
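
The core idea of the talk, separating when a value happened from when it was known, can be illustrated with a minimal sketch. It uses the four daily values that appear on slides 21 and 22 (5, 8, 7, 9, with one value later reading 29). The `Record` class, the `view_as_of` helper, the calendar dates, and the direction of the revision (8 first, 29 later) are assumptions made for illustration; the transcript does not preserve them.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass(frozen=True)
class Record:
    asof_date: date   # when the value happened (event time)
    timestamp: date   # when we learned it (knowledge time)
    value: float


# Hypothetical records: four daily values, with day 2 revised later.
RECORDS = [
    Record(date(2017, 3, 1), date(2017, 3, 1), 5),
    Record(date(2017, 3, 2), date(2017, 3, 2), 8),
    Record(date(2017, 3, 3), date(2017, 3, 3), 7),
    Record(date(2017, 3, 4), date(2017, 3, 4), 9),
    Record(date(2017, 3, 2), date(2017, 3, 6), 29),  # assumed revision to day 2
]


def view_as_of(records, perspective):
    """Return {asof_date: value} using only rows known by `perspective`."""
    latest = {}
    for rec in sorted(records, key=lambda r: r.timestamp):
        if rec.timestamp <= perspective:
            latest[rec.asof_date] = rec.value  # later knowledge overwrites earlier
    return latest


before = view_as_of(RECORDS, date(2017, 3, 5))  # revision not yet known
after = view_as_of(RECORDS, date(2017, 3, 7))   # revision known

print(mean(before.values()))  # 7.25
print(mean(after.values()))   # 12.5
```

Asking the same question from two perspectives gives two different answers, and each one is correct for its moment in time.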

  1. 1. Three-Dimensional Time: Working with Alternative Data Now You’re Thinking with Perspectives! Kathryn Glowinski Engineer, Quantopian
  2. 2. Disclaimer This presentation is for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation for any security; nor does it constitute an offer to provide investment advisory or other services by Quantopian, Inc. ("Quantopian"). Nothing contained herein constitutes investment advice or offers any opinion with respect to the suitability of any security, and any views expressed herein should not be taken as advice to buy, sell, or hold any security or as an endorsement of any security or company. In preparing the information contained herein, Quantopian has not taken into account the investment needs, objectives, and financial circumstances of any particular investor. Additionally, this presentation is being provided on the express basis that it and any related communications (whether written or oral) will not cause Quantopian to become an investment advice fiduciary under ERISA or the Internal Revenue Code with respect to any retirement plan or IRA investor, as the recipients are fully aware that the Quantopian (i) is not undertaking to provide impartial investment advice, make a recommendation regarding the acquisition, holding or disposal of an investment, act as an impartial adviser, or give advice in a fiduciary capacity, and (ii) has a financial interest in the offering and sale of one or more products and services, which may depend on a number of factors relating to Quantopian’s internal business objectives, and which has been disclosed to the recipient. Nothing set forth herein or any information conveyed (in writing or orally) in connection with this presentation is intended to constitute a recommendation that any person take or refrain from taking any course of action within the meaning of U.S. Department of Labor Regulation §2510.3-21(b)(1), including without limitation buying, selling or continuing to hold any security. 
No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. You are advised to contact your own financial advisor or other fiduciary unrelated to Quantopian about whether any given course of action may be appropriate for your circumstances. The information provided herein is intended to be used solely by the recipient in considering the products or services described herein and may not be used for any other reason, personal or otherwise. Any views expressed and data illustrated herein were prepared based upon information, believed to be reliable, available to Quantopian at the time of publication. Quantopian makes no guarantees as to their accuracy or completeness. All information is subject to change and may quickly become unreliable for various reasons, including changes in market conditions or economic circumstances.
  3. 3. What’ll It Be? Let’s chat about how data is typically viewed through the lens of time, because that way is generally some percentage wrong. Let me tell you why. Lessons learned at Q along the way. What does “multidimensional” data mean for MY data? More importantly, what does it mean for ANYONE’S data? Quantopian.com
  4. 4. “Alternative” Data?
  5. 5. “Fundamental” Data?
  6. 6. My Very Special Dataset
  7. 7. Early Attempts
  8. 8. Consider the Following
  9. 9. So, We Can Just Change the Data, Right?
  10. 10. Enter, Fundamentals What if we captured, every day, what we knew the latest value to be for every piece of known information? It should now be the corrected state of the world for any present moment. [Figure: timeline with ticks from 3/1/17 to 3/13/17; labels “First Known” and “Revisions”; values 10, 12, 11; “Seen” row: 10 10 10 10 10 10 11 12 12 12 12 12 12 12.]
  11. 11. Still Not Quite Right But we still have the same problem, for revisions to updated data. And what happens if you have 250GB of sparse data alone, before you even forward fill those values?
  12. 12. Dueling Problems
  13. 13. Lookahead Bias Using data for a backtest that we didn’t know at the time is called lookahead bias. We try VERY hard to avoid this, because it corrupts evaluation of any strategy. “I know that Apple did well in the past, so I’m going to backtest a strategy that just holds Apple after 2005.”
  14. 14. Stale Data This can be equally disastrous! If the data is never updated, you may be stuck with hilariously incorrect values. “My vendor told me that company ABCD announced a split of 1:25, so on that day, I traded 25 times what I normally would. But when it actually happened it was only 1:5!”
  15. 15. So, What Do We Do? This is referred to as point-in-time data. It’s a BIG deal. Personally, I think it’s the BIGGEST deal. (I’m really biased because I do this for a living.)
  16. 16. Let’s Talk Timelines!
  17. 17. Timelines http://www.csusmhistory.org/faulk006/thresholds/
  18. 18. Though Some People Think About This
  19. 19. Or...This.
  20. 20. Perspective Matters It’s not just about when the data happened. It’s also about when you’re observing the data. If you have a dataset that ever has updates, revisions, or corrections, the data for a single date can change as you move through time.
  21. 21. 5 8 7 9
  22. 22. 5 8 7 9: >>> mean(four_days.values()) gives 7.25. 5 29 7 9: >>> mean(four_days.values()) gives 12.5.
  23. 23. Bi-Temporal Data Separates the concepts of when the information HAPPENED from when we KNOW it. Maintains accuracy with regard to data changes through history. Allows questions to be answered with regard to perspective.
  24. 24. Deltas
  25. 25. Base Table: Deltas Table:
  26. 26. 5 29 7 9 5 8 7 9 as-of date 1, timestamp 1 as-of date 1, timestamp 2
  27. 27. But...Do We Care? PROS Reproduces events EXACTLY as they occurred. Allows for accurate modeling of simulations from the past. Easily allows for vendor updates to the most accurate known data. CONS Can force modeling of atypical past events that wouldn’t happen in modern day. System errors, if there are any, can propagate even into past data. Data shown can be “imperfect” from vendor or ingestor error.
  28. 28. Data Analyses Should be Replicable Realistic view of data delivery, instead of the optimized view. The world isn’t perfect, and your data is DEFINITELY not perfect. But, it should at least be consistently imperfect.
  29. 29. Different Users, Different Needs Point in time data is a layer of complexity. In evaluation, we only care: does it have alpha? Users further on have the luxury of checking survivability. 99% of users don’t want to see a platform’s mistakes. (I made that statistic up, but I’m pretty sure it’s accurate.)
  30. 30. What Else Can this Do for Me? TWTR, by actual time and perspective: from the 2006 perspective, 2006 is Tweeter and 2017 is ------- (not yet known); from the 2017 perspective, 2006 is Tweeter and 2017 is Twitter.
  31. 31. Is that All? Verify data model assumptions. “We never change our data after the fact. We wouldn’t do that” ~A Quantopian Data Vendor
  32. 32. Split Adjustments
  33. 33. Why Did You Call this 3D Time at All?
  34. 34. Look Into the Future
  35. 35. Fundamentals, Redux We just deployed a new system. Capture not only the first known value, but also the adjustments to those values. [Figure: timeline with ticks from 3/1/17 to 3/13/17; labels “First Known” and “Revisions”; values 10, 12, 11; two rows labeled “Perspective of 3/6/17” and “Perspective of 3/14/17”. Cell values as extracted: 10 10 10 10 10 10 - - - - - - - - 11 11 11 11 11 11 11 11 12 12 12 12 12 12.]
  36. 36. Point in Timeness as a Service Raw data history for all (ingested) time But we’d like to update the way that users can give us data too!
  37. 37. But Wait, there’s More Point in Time data doesn’t just have to be stock specific data. This should be applicable to any field, any data.
  38. 38. Data is the Future
  39. 39. Thank You! Kathryn Glowinski Engineer, Quantopian
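
Slide 10 describes capturing, every day, the latest known value, and slide 35 describes keeping the first known value together with its later adjustments. The sketch below is one possible way to express that with a first-known value and a small list of deltas. It is not Quantopian's schema: `FIRST_KNOWN`, `DELTAS`, `value_from`, and `daily_seen` are hypothetical names, and the revision dates (3/7/17 and 3/8/17) are assumptions chosen so that the daily capture matches the “Seen” values listed for slide 10.

```python
from datetime import date, timedelta

# Assumed layout for illustration only.
FIRST_KNOWN = (date(2017, 3, 1), 10)   # (date we first learned it, value)
DELTAS = [                             # (date we learned the revision, revised value)
    (date(2017, 3, 7), 11),
    (date(2017, 3, 8), 12),
]


def value_from(perspective):
    """Latest value known on `perspective`, or None if nothing was known yet."""
    known_at, value = FIRST_KNOWN
    if perspective < known_at:
        return None
    for revised_at, revised_value in sorted(DELTAS):
        if revised_at <= perspective:
            value = revised_value  # apply each delta we had learned by then
    return value


def daily_seen(start, end):
    """Capture what was known on each day from `start` through `end`."""
    days = (end - start).days + 1
    return [value_from(start + timedelta(days=i)) for i in range(days)]


seen = daily_seen(date(2017, 3, 1), date(2017, 3, 14))
print(seen)  # [10, 10, 10, 10, 10, 10, 11, 12, 12, 12, 12, 12, 12, 12]
print(value_from(date(2017, 3, 6)))   # 10
print(value_from(date(2017, 3, 14)))  # 12
```

Storing only the first known value and the deltas keeps the raw history small; the forward-filled daily view is computed on demand for whichever perspective a backtest needs.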
