Productive Data Tools for Quants

Published on April 5, 2013

Published in: Technology

  1. Productive Data Tools for Quants • Wes McKinney (@wesmckinn) • Python in Finance 2013, 2013-04-05
  2. Me • Started pandas project at AQR in 2008 • Other Python projects I’ve been involved with: statsmodels, vbench, gpustats • http://blog.wesmckinney.com • Currently: Founder of a stealth SF data startup
  3. Book • In print now! • IPython • NumPy • pandas • matplotlib • Case studies
  4. Finance languages
  5. pandas • Productivity-focused structured data manipulation tools for Python • Fast, intuitive data structures • Filling the gap between Python and more domain-specific languages like R • Huge growth in 2011-2012, continuing in 2013
  6. Productivity, why do we care?
  7. People time = money
  8. Productive is not the same as high performance
  9. Tool bottlenecks impede innovation
  10. Aside: vbench for performance testing
  11. (Some) financial data challenges • Metadata and data alignment • “Missing” data • Group operations • Time series
  12. Data alignment • Stock universes • Timestamps
  13. Let’s talk about...
  14. Let’s talk about... a - b (annotated: Signal 1, Signal 2)
  15. Let’s talk about... sum(a - b) / mean(c)
  16. Data alignment: a - b • Same length? • Same metadata? • Same frequency? • Assumptions can be dangerous
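  17. Data alignment • pandas uses axis indexing to specify default join (“automatic data alignment”) behavior • Example (figure): a series labeled B, C, D, E with values 1, 2, 3, 4, plus a series labeled A, B, C, D with values 0, 1, 2, 3, gives A: NA, B: 2, C: 4, D: 6, E: NA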
  18. Hierarchical indexes • Semantics: a tuple at each tick • Enables easy group selection • Terminology: “multiple levels” • Natural part of GroupBy and reshape operations • (Figure: a two-level index, outer label A over inner labels 1, 2, 3 and outer label B over inner labels 1, 2, 3, 4)
  19. Missing data • Interpolation (esp. time series) • Dropping / filtering • Replacing with value • Excluding from statistical computations
  20. Time series • Data alignment • Frequency conversions • Date arithmetic • Resampling • Time zones • “As of” joins and lookups
  21. GroupBy (figure: split-apply-combine) • Split: rows keyed A, B, C, A, B, C, A, B, C with data 0, 5, 10, 5, 10, 15, 10, 15, 20 are split by key into A: 0, 5, 10; B: 5, 10, 15; C: 10, 15, 20 • Apply: sum each group • Combine: A 15, B 30, C 45
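The automatic data alignment example on slide 17 can be reproduced with pandas Series arithmetic. This is a minimal sketch: the two Series mirror the values in the slide's figure, and `add(..., fill_value=0)` and `align(..., join="inner")` are shown as two of several ways pandas offers to override the default outer-join behavior.

```python
import pandas as pd

# The two Series from the slide's figure: overlapping but unequal labels.
s1 = pd.Series([1, 2, 3, 4], index=["B", "C", "D", "E"])
s2 = pd.Series([0, 1, 2, 3], index=["A", "B", "C", "D"])

# Default: arithmetic aligns on the union of labels (an outer join).
# Labels found in only one operand ("A" and "E") come out as NaN.
total = s1 + s2

# Alternative: treat a label missing from one side as 0 instead of NaN.
filled = s1.add(s2, fill_value=0)

# Alternative: restrict both operands to their shared labels (an inner join)
# before combining them.
a_inner, b_inner = s1.align(s2, join="inner")
inner_total = a_inner + b_inner

print(total)
print(filled)
print(inner_total)
```

Making the join explicit in this way is exactly what slide 16 warns about: with plain arrays, `a - b` silently assumes equal length and matching order.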
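The hierarchical index slide (18) can be illustrated with a `pd.MultiIndex` shaped like the slide's figure: outer label A over inner labels 1 to 3, and B over 1 to 4. The values attached to it are hypothetical and chosen only so that the selection, reshape, and GroupBy results are easy to check.

```python
import pandas as pd

# Index shaped like the slide's figure; the values are hypothetical.
index = pd.MultiIndex.from_tuples(
    [("A", 1), ("A", 2), ("A", 3), ("B", 1), ("B", 2), ("B", 3), ("B", 4)],
    names=["outer", "inner"],
)
s = pd.Series([10, 20, 30, 40, 50, 60, 70], index=index)

# "A tuple at each tick": a single element is addressed by its full tuple.
value_b2 = s[("B", 2)]

# Easy group selection: an outer label returns that group's sub-series,
# indexed by the remaining inner level.
group_b = s.loc["B"]

# Reshape: unstack moves the inner level into columns; the combination
# ("A", 4) does not exist, so that cell becomes NaN.
wide = s.unstack()

# GroupBy on an index level instead of a column.
level_sums = s.groupby(level="outer").sum()

print(group_b)
print(wide)
print(level_sums)
```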
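Each of the four missing-data strategies on slide 19 maps onto a standard pandas method. A minimal sketch on a hypothetical five-day price series with two gaps. Using both a constant and forward fill for "replacing with value" is an illustrative choice; other fill rules are equally valid.

```python
import numpy as np
import pandas as pd

# Hypothetical daily prices; NaN marks the "missing" observations.
prices = pd.Series(
    [100.0, np.nan, 102.0, np.nan, 106.0],
    index=pd.date_range("2013-01-01", periods=5, freq="D"),
)

# Interpolation: method="time" weights by elapsed time on a DatetimeIndex.
interpolated = prices.interpolate(method="time")

# Dropping / filtering: keep only the observed values.
observed = prices.dropna()

# Replacing with a value: a constant, or the last observation carried forward.
filled = prices.fillna(0.0)
carried = prices.ffill()

# Excluding from statistical computations: mean() skips NaN by default.
avg = prices.mean()

print(interpolated)
print(avg)
```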
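Several items on the time series slide (20) can be demonstrated in a short pandas session: resampling, frequency conversion, date arithmetic, time zones, and "as of" lookups and joins. The data here is hypothetical. The business-day calendar, the America/New_York time zone, and the quote and trade tables are assumptions made for illustration and are not taken from the talk.

```python
import pandas as pd

# Hypothetical series: ten business days starting Monday 2013-04-01, values 0..9.
idx = pd.date_range("2013-04-01", periods=10, freq="B")
ts = pd.Series(range(10), index=idx, dtype="float64")

# Resampling: aggregate into weekly bins that end on Fridays.
weekly = ts.resample("W-FRI").sum()

# Frequency conversion: move to a calendar-daily grid; weekends become NaN.
daily = ts.asfreq("D")

# Date arithmetic: one business day after Friday 2013-04-05.
next_bday = pd.Timestamp("2013-04-05") + pd.offsets.BDay(1)

# Time zones: interpret the timestamps as New York local time, then view in UTC.
utc = ts.tz_localize("America/New_York").tz_convert("UTC")

# "As of" lookup: the last observation at or before a Saturday.
last_known = ts.asof(pd.Timestamp("2013-04-06"))

# "As of" join: attach to each hypothetical trade the latest quote at or
# before its timestamp. Both frames must be sorted on the "time" key.
quotes = pd.DataFrame({
    "time": pd.to_datetime(["2013-04-05 09:30:00", "2013-04-05 09:30:05"]),
    "bid": [10.0, 10.1],
})
trades = pd.DataFrame({
    "time": pd.to_datetime(["2013-04-05 09:30:03", "2013-04-05 09:30:07"]),
    "qty": [100, 200],
})
joined = pd.merge_asof(trades, quotes, on="time")

print(weekly)
print(joined)
```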
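The split-apply-combine figure on slide 21 translates directly into a pandas `groupby` call. The table below reproduces the figure's keys and values. The dictionary comprehension spells out the same three steps by hand and is included only for illustration, not as a description of how pandas implements GroupBy internally.

```python
import pandas as pd

# The table from the slide's figure: a key column and a data column.
df = pd.DataFrame({
    "key": ["A", "B", "C", "A", "B", "C", "A", "B", "C"],
    "data": [0, 5, 10, 5, 10, 15, 10, 15, 20],
})

# Split by key, apply sum to each group, combine into one Series.
result = df.groupby("key")["data"].sum()

# The same steps by hand: iterate over (key, sub-frame) pairs (split),
# sum each sub-frame's data (apply), and collect into a dict (combine).
manual = {key: group["data"].sum() for key, group in df.groupby("key")}

print(result)
```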
