Productive Data Tools for Quants

  1. Productive Data Tools for Quants • Wes McKinney (@wesmckinn) • Python in Finance 2013, 2013-04-05
  2. Me • Started pandas project at AQR in 2008 • Other Python projects I’ve been involved with: statsmodels, vbench, gpustats • http://blog.wesmckinney.com • Currently: Founder of stealth SF data startup
  3. Book • In print now! • IPython • NumPy • pandas • matplotlib • Case studies
  4. Finance languages
  5. pandas • Productivity-focused structured data manipulation tools for Python • Fast, intuitive data structures • Filling the gap between Python and more domain-specific languages like R • Huge growth in 2011-2012, continuing in 2013
  6. Productivity, why do we care?
  7. People time = money
  8. Productive not same as high performance
  9. Tool bottlenecks impede innovation
  10. Aside: vbench for performance testing
  11. (Some) financial data challenges • Metadata and data alignment • “Missing” data • Group operations • Time series
  12. Data alignment • Stock universes • Timestamps
  13. Let’s talk about...
  14. Let’s talk about... a - b (Signal 1, Signal 2)
  15. Let’s talk about... sum(a - b) / mean(c)
  16. Data alignment • a - b • Same length? • Same metadata? • Same frequency? • Assumptions can be dangerous
  17. Data alignment • pandas uses axis indexing to specify default join (“automatic data alignment”) behavior • Example: (B 1, C 2, D 3, E 4) + (A 0, B 1, C 2, D 3) = (A NA, B 2, C 4, D 6, E NA)
  18. Hierarchical indexes • Semantics: a tuple at each tick • Enables easy group selection • Terminology: “multiple levels” • Natural part of GroupBy and reshape operations • Diagram: a two-level index with outer labels A and B over inner integer labels
  19. Missing data • Interpolation (esp. time series) • Dropping / filtering • Replacing with value • Excluding from statistical computations
  20. Time series • Data alignment • Frequency conversions • Date arithmetic • Resampling • Time zones • “As of” joins and lookups
  21. GroupBy • Input: keys A, B, C, A, B, C, A, B, C with values 0, 5, 10, 5, 10, 15, 10, 15, 20 • Split by key: A (0, 5, 10), B (5, 10, 15), C (10, 15, 20) • Apply: sum each group • Combine: A 15, B 30, C 45
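
The data alignment and GroupBy slides both carry small numeric examples, and they can be reproduced with a short pandas sketch. This is an illustration, not code from the talk: the variable names (s1, s2, aligned_sum, df, group_sums) and the column names "key" and "value" are assumptions, and only the labels and numbers come from the slides.

```python
import pandas as pd

# Data alignment slide: two Series whose labels only partly overlap.
# Variable names are illustrative; the labels and values follow the slide.
s1 = pd.Series([1, 2, 3, 4], index=["B", "C", "D", "E"])
s2 = pd.Series([0, 1, 2, 3], index=["A", "B", "C", "D"])

# Addition aligns on the index labels (union of both indexes);
# labels present on only one side produce NaN, shown as "NA" on the slide.
aligned_sum = s1 + s2

# GroupBy slide: split rows by key, apply sum to each group, combine results.
# Column names "key" and "value" are assumed for this sketch.
df = pd.DataFrame({
    "key": list("ABC") * 3,
    "value": [0, 5, 10, 5, 10, 15, 10, 15, 20],
})
group_sums = df.groupby("key")["value"].sum()

print(aligned_sum)
print(group_sums)
```

Neither operation needs an explicit join or loop: the index labels drive the alignment, and the grouping key drives the split.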
