2. Me
• Started pandas project at AQR in 2008
• Other Python projects I’ve been involved
with: statsmodels, vbench, gpustats
• http://blog.wesmckinney.com
• Currently: Founder of stealth SF data startup
3. Book
• In print now!
• IPython
• NumPy
• pandas
• matplotlib
• Case studies
5. pandas
• Productivity-focused structured data
manipulation tools for Python
• Fast, intuitive data structures
• Filling the gap between Python and more
domain-specific languages like R
• Huge growth in 2011-2012, continuing in 2013
16. Data alignment
Assumptions can be dangerous
a - b
• Same length?
• Same metadata?
• Same frequency?
17. Data alignment
• pandas uses axis indexing to specify default
join (“automatic data alignment”) behavior
Example: adding two Series with different indexes
  B  1       A  0       A  NA
  C  2   +   B  1   =   B  2
  D  3       C  2       C  4
  E  4       D  3       D  6
                        E  NA
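The alignment example on this slide can be reproduced in a few lines. This is a minimal sketch: the variable names are illustrative, and the `fill_value` variant at the end is an added assumption about how one might avoid the NA results, not something the slide prescribes.

```python
import pandas as pd

# Two Series whose indexes only partly overlap (values taken from the slide).
s1 = pd.Series([1, 2, 3, 4], index=["B", "C", "D", "E"])
s2 = pd.Series([0, 1, 2, 3], index=["A", "B", "C", "D"])

# Arithmetic aligns on index labels first; labels missing from either
# side produce NaN in the result.
total = s1 + s2
print(total)

# Assumption for illustration: treat a missing label as 0 instead of NaN.
filled_total = s1.add(s2, fill_value=0)
print(filled_total)
```

The result index is the union of both indexes, so no label is silently dropped.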
18. Hierarchical indexes
• Semantics: a tuple at each tick
• Enables easy group selection
• Terminology: “multiple levels”
• Natural part of GroupBy and
reshape operations
Example index (outer level, inner level):
  A  1
     2
     3
  B  1
     2
     3
     4
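A hierarchical index like the one on this slide can be sketched as follows. The level names `outer` and `inner` and the integer values are assumptions made for illustration only.

```python
import pandas as pd

# Each tick of the index is a tuple: (outer label, inner label).
index = pd.MultiIndex.from_tuples(
    [("A", 1), ("A", 2), ("A", 3), ("B", 1), ("B", 2), ("B", 3), ("B", 4)],
    names=["outer", "inner"],  # hypothetical level names
)
s = pd.Series(range(7), index=index)

# Group selection: everything under outer label "A".
group_a = s.loc["A"]

# Cross-section on the inner level: inner label 2 for every outer label.
inner_2 = s.xs(2, level="inner")

# Reshape: move the inner level into columns (missing cells become NaN).
wide = s.unstack("inner")

# GroupBy on a level of the index.
by_outer = s.groupby(level="outer").sum()
print(by_outer)
```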
19. Missing data
• Interpolation (esp. time series)
• Dropping / filtering
• Replacing with value
• Excluding from statistical computations
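Each of the four operations above maps onto a short pandas call. A minimal sketch, using a made-up Series with two missing values:

```python
import numpy as np
import pandas as pd

# Illustrative data: NaN marks a missing observation.
s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

interpolated = s.interpolate()  # linear interpolation between neighbors
dropped = s.dropna()            # drop the missing entries
filled = s.fillna(0.0)          # replace missing entries with a value
mean = s.mean()                 # statistics skip NaN by default

print(interpolated.tolist(), dropped.tolist(), filled.tolist(), mean)
```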
20. Time series
• Data alignment
• Frequency conversions
• Date arithmetic
• Resampling
• Time zones
• “As of” joins and lookups
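Several of the time series features listed above can be sketched with a small daily series. The dates, values, and the New York time zone are assumptions chosen for illustration.

```python
import pandas as pd

# Six daily observations starting 2013-01-01 (illustrative data).
idx = pd.date_range("2013-01-01", periods=6, freq="D")
ts = pd.Series(range(6), index=idx)

# Resampling: aggregate into two-day bins.
two_day = ts.resample("2D").sum()

# Frequency conversion: keep only the value at every second day.
every_other = ts.asfreq("2D")

# Date arithmetic on timestamps.
next_week = idx[0] + pd.Timedelta(days=7)

# Time zones: attach UTC, then convert to another zone.
new_york = ts.tz_localize("UTC").tz_convert("America/New_York")

# "As of" lookup: the last known value at or before a timestamp.
last_known = ts.asof(pd.Timestamp("2013-01-03 12:00"))

print(two_day, every_other, next_week, new_york.index[0], last_known, sep="\n")
```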
21. GroupBy
Split-apply-combine diagram:
  Input (key, value):  A 0, B 5, C 10, A 5, B 10, C 15, A 10, B 15, C 20
  Split by key:        A: 0, 5, 10     B: 5, 10, 15     C: 10, 15, 20
  Apply (sum):         A: 15           B: 30            C: 45
  Combine:             A 15, B 30, C 45
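The split-apply-combine steps in this diagram can be sketched with the same keys and values. The column names `key` and `value` are assumptions, and the manual version at the end is only there to spell out the three steps; in practice the one-line `groupby` call is enough.

```python
import pandas as pd

# Input table from the diagram; column names are hypothetical.
df = pd.DataFrame({
    "key": ["A", "B", "C", "A", "B", "C", "A", "B", "C"],
    "value": [0, 5, 10, 5, 10, 15, 10, 15, 20],
})

# One call performs split, apply, and combine.
totals = df.groupby("key")["value"].sum()
print(totals)

# The same three steps written out by hand:
pieces = {key: group["value"] for key, group in df.groupby("key")}  # split
applied = {key: values.sum() for key, values in pieces.items()}     # apply
combined = pd.Series(applied)                                       # combine
print(combined)
```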