This brief lightning talk introduces the issues to consider when deciding whether Python is a viable replacement for a commercial statistics package in a quantitative analysis shop.
Python as a Replacement for Commercial Stats Packages
1. Python as a Replacement for Commercial Stats Packages
Harold Henson - Hensky Consulting
November 23, 2017
Codie’s Café
Shopify
Codie’s Cafe Nov 2017
2. Software Choice a Key Element of Business Intelligence Infrastructure
Many areas in government invest in
current data on an ongoing basis
◦ Software costs are minor relative to total costs
Many feel safer with known entities
◦ Onus is on experts to champion a new option
such as Python
Python is still considered exotic
However, it is a viable choice!
3. Several Core Modules
Statisticians will focus on a few core
modules in the entire ecosystem
Pandas
◦ Can build analytical datasets
◦ Has many rudimentary techniques
NumPy
◦ Matrix Algebra
SciPy
statsmodels
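As a rough illustration of how the first two modules divide the work, the sketch below uses pandas to build a small analytical dataset and NumPy to fit a straight line by matrix algebra. The records, the `respondent`/`wave`/`score` column names, and the values are invented for illustration; only the variable names `A1Y` and `A2Y` echo the talk's sample output.

```python
import numpy as np
import pandas as pd

# Hypothetical raw records (invented for illustration): two measurements
# per respondent, stored one row per measurement.
raw = pd.DataFrame({
    "respondent": [1, 1, 2, 2, 3, 3, 4, 4],
    "wave": ["A1Y", "A2Y"] * 4,
    "score": [2.0, 1.5, 4.0, 2.0, 6.0, 3.5, 8.0, 4.0],
})

# pandas: reshape into an analytical dataset, one row per respondent
# and one column per wave.
wide = raw.pivot(index="respondent", columns="wave", values="score")

# pandas: a rudimentary technique, descriptive statistics for each column.
summary = wide.describe()

# NumPy: ordinary least squares through matrix algebra.
# The design matrix has a column of ones (intercept) and the A1Y values.
X = np.column_stack([np.ones(len(wide)), wide["A1Y"].to_numpy()])
y = wide["A2Y"].to_numpy()
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta

print(summary)
print(f"intercept={intercept:.4f}, slope={slope:.4f}")
```

SciPy and statsmodels build on these same arrays and data frames, so a dataset prepared this way can be passed on to them directly.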
4. Sample Output
Very Similar to a Commercial Package
                            OLS Regression Results
==============================================================================
Dep. Variable:                    A2Y   R-squared:                       0.283
Model:                            OLS   Adj. R-squared:                  0.268
Method:                 Least Squares   F-statistic:                     18.94
Date:                Mon, 04 Sep 2017   Prob (F-statistic):           7.02e-05
Time:                        18:31:07   Log-Likelihood:                -62.546
No. Observations:                  50   AIC:                             129.1
Df Residuals:                      48   BIC:                             132.9
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.0310      0.635      0.049      0.961      -1.246       1.308
A1Y            0.5432      0.125      4.352      0.000       0.292       0.794
==============================================================================
Omnibus:                        1.817   Durbin-Watson:                   2.084
Prob(Omnibus):                  0.403   Jarque-Bera (JB):                1.102
Skew:                          -0.339   Prob(JB):                        0.576
Kurtosis:                       3.263   Cond. No.                         27.5
==============================================================================

Warnings:
[1] Standard Errors assumes that the covariance matrix of the errors is correctly specified.
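A summary in the layout above can be produced with a few lines of statsmodels. The sketch below is not the code behind the slide: the original data are not available, so it generates 50 synthetic observations, and it assumes the formula interface was used because the table labels its constant term `Intercept`. Only the variable names `A1Y` and `A2Y` come from the output; the coefficients it prints will differ from those shown.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data (an assumption; the talk's dataset is not available).
rng = np.random.default_rng(0)
n = 50
a1y = rng.normal(5.0, 1.0, n)
a2y = 0.5 * a1y + rng.normal(0.0, 1.0, n)
df = pd.DataFrame({"A1Y": a1y, "A2Y": a2y})

# Regress A2Y on A1Y; the formula interface adds the intercept automatically.
model = smf.ols("A2Y ~ A1Y", data=df).fit()

# Prints a report in the same layout as the sample output.
print(model.summary())
```

Individual figures are also available as attributes of the fitted model, for example `model.params`, `model.rsquared`, and `model.conf_int()`, which is convenient when the numbers feed a later step rather than a printed report.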
5. Numerous Speciality Modules
Pretty Pandas
◦ Henry Hammond
TensorFlow
◦ Independent big data/neural networks project
scikit-learn
PyMC
◦ Bayesian Statistics
6. Quality of Communications of Results Mixed
Graphics Package Second to None
Tabulation is weak
◦ Export to Spreadsheet is the easiest way to
support professional tables in Documents
Jupyter Notebooks very useful for limited
applications
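One way to follow the spreadsheet route is sketched below: pandas builds a cross-tabulation and writes it out as CSV, which any spreadsheet program can open for final formatting. The survey data and column names are invented for illustration. `DataFrame.to_excel` writes a native workbook instead, but it needs an additional engine such as openpyxl to be installed, so CSV is used here to keep the example dependency-free.

```python
import io
import pandas as pd

# Hypothetical survey responses (invented for illustration).
survey = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West", "North"],
    "response": ["Yes", "No", "Yes", "Yes", "No", "Yes"],
})

# Cross-tabulate region by response, with row and column totals.
table = pd.crosstab(
    survey["region"], survey["response"],
    margins=True, margins_name="Total",
)

# Write the table as CSV text; pass a file path instead of the buffer
# to save it to disk for a spreadsheet program.
buffer = io.StringIO()
table.to_csv(buffer)
csv_text = buffer.getvalue()

print(csv_text)
```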
7. In Summary
Perfectly Viable Option
Increased Power May Come at a Cost of
Training
◦ More research-oriented environments will favour Python
◦ High turnover environments will favour
commercial packages
Other open source projects of note
◦ R – has very long history
◦ Julia – the next generation?
8. References
Python for Data Analysis – Wes McKinney
◦ Core document for project
◦ Crucial details are online
◦ 2nd edition just released
Guide to NumPy – Travis Oliphant
◦ Applied matrix algebra
Learning SciPy for Numerical and
Scientific Computing – Rojas et al.