3.
Book
• Python essentials
• NumPy
• IPython
• matplotlib
• pandas
Published October 2012
4.
Some context
• 2007 to 2013
• NumPy, SciPy mature
• IPython Notebook
• Key libraries/tools developed: scikit-learn,
statsmodels, PyCUDA, ...
• pandas helps make Python a desirable
data preparation language
5.
pandas
• Fast structured data manipulation tools for
Python with nice API
• Goal: make Python a halfway decent language
for data preparation / statistical analysis
• Sometimes described as “R data frames in Python”
• Fast-growing user base / community
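
A minimal sketch of the "data frame" idea, assuming pandas is installed. The column names and values are made up for illustration and are not from the talk.

```python
import pandas as pd

# Hypothetical table: labelled columns, one row per record.
df = pd.DataFrame({"name": ["a", "b", "c"], "score": [1.5, 3.0, 2.5]})

# Boolean filtering on a column, much like subsetting an R data frame.
high = df[df["score"] > 2.0]

# Column-wise aggregation.
total = df["score"].sum()

print(high)
print(total)
```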
9.
Some Trends
• Decline of Desktop, Rise of Web/Cloud
• SVG / HTML5 Canvas / WebGL Tech
• Big Data
• JIT-compile all the things
• Democratize all the things
11.
Data on the Web
• Nirvana: ubiquitous, easy data analysis
• Challenges
  • JavaScript: weak language for implementing
    analytics
  • Computation needs to run “close” to data
  • Maintaining interactivity
13.
Embracing the JavaScript
• Build bridges, not walls
• Some examples
  • IPython Notebook
  • RStudio
  • Rob Story’s pandas integrations
  • Chartkick
14.
In search of the perfect
“data language”
• Minimal syntax overhead
• Domain-specific data types that all support
missing (NA) values
• Rich built-in prep-related operations
  • E.g. set logic, group by, sorting, binning,
    indexing
• Integrate within a larger application
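
The operations listed above can be sketched in pandas as one possible illustration; this is not a claim about what the "perfect" data language looks like. The data and column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data; one temperature reading is missing (NA).
df = pd.DataFrame(
    {
        "city": ["NYC", "NYC", "SF", "SF", "LA"],
        "temp": [21.0, np.nan, 18.0, 16.0, 27.0],
    }
)

# Group by: NA values are skipped by default when aggregating.
mean_by_city = df.groupby("city")["temp"].mean()

# Sorting: rows with NA are placed last by default.
sorted_df = df.sort_values("temp", ascending=False)

# Binning: bucket temperatures into labelled intervals (0, 20] and (20, 30].
df["band"] = pd.cut(df["temp"], bins=[0, 20, 30], labels=["cool", "warm"])

# Set logic on index labels.
a = pd.Index(["NYC", "SF", "LA"])
b = pd.Index(["SF", "LA", "CHI"])
common = a.intersection(b)

# Indexing: label-based row selection.
sf_rows = df.set_index("city").loc["SF"]

print(mean_by_city)
print(df)
print(list(common))
```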
15.
JIT compiler tech
• LLVM: growing in popularity
• Rolling a new, fast compute engine much
easier than it used to be
• But: not sure compiling Python code is the
optimal long-term strategy