SlideShare a Scribd company logo
Scientist meets web dev:
how Python became the language of data
GaĀØel Varoquaux
Scientist meets web dev:
how Python became the language of data
GaĀØel Varoquaux
Very diverse community
This talk: a reļ¬‚ection on what
we have in common, Python
I am talking about
things you donā€™t understand (my science)
and things I donā€™t understand (web dev)
I actually did a PhD
in quantum physics
Hence I think I qualify as
a ā€œscientistā€
G Varoquaux 3
I now do computer science for neuroscience
Try to link neural activity to thoughts and cognition
G Varoquaux 4
I now do computer science for neuroscience
Try to link neural activity to thoughts and cognition
We attack it as a machine learning problem
Python software: nilearn
G Varoquaux 5
On the way, we created
a machine-learning library:
scikit-learn
G Varoquaux 6
Data science with Python is hot
Huge success.
Cool.
Data science is THE thing.
G Varoquaux 7
Data science with Python is hot
Huge success.
Cool.
Data science is THE thing.
Python is the go-to language
How did it happen?
We built scikit-learn, others pandas, etc...,
but these were built on solid foundations
G Varoquaux 7
1 Scientists come from Jupiter
And web devs from Saturn?
And sysadmins from Neptune?
G Varoquaux 8
1 Weā€™re diļ¬€erent
numbers (in arrays)
arrays (of numbers)
arrays
arrays
strings
databases
object-oriented
programming
ļ¬‚ow control
A bit of a culture gap
G Varoquaux 9
1 Letā€™s do something together: sort EuroPython site
205 talks:
How OpenStack makes Python better (and vice-versa)
Introduction to aiohttp
So you think your Python startup is worth $10 million...
SQLAlchemy as the backbone of a Data Science company
Learn Python The Fun Way
Scaling Microservices with Crossbar.io
If you can read this you donā€™t need glasses
Letā€™s ļ¬nd some common topics with data science
G Varoquaux 10
1 Letā€™s do something together: sort EuroPython site
205 talks:
How OpenStack makes Python better (and vice-versa)
Introduction to aiohttp
So you think your Python startup is worth $10 million...
SQLAlchemy as the backbone of a Data Science company
Learn Python The Fun Way
Scaling Microservices with Crossbar.io
If you can read this you donā€™t need glasses
Letā€™s ļ¬nd some common topics with data science
Anyone who has used Python to search text
for substring patterns has at least heard of
the regular expression module. Many of us
use it extensively for parsers and lexers,
The py.test tool presents a rapid and simple
way to write tests for your Python code. This
training gives a quick introduction with exercises
into some distinguishing features.Chat with the core developers about how
to extend django CMS or how to integrate
your own apps seamlessly. Lets talk about
your plugins, apphooks, toolbar extensions
G Varoquaux 10
1 Letā€™s do something together: sort EuroPython site
205 talks:
How OpenStack makes Python better (and vice-versa)
Introduction to aiohttp
So you think your Python startup is worth $10 million...
SQLAlchemy as the backbone of a Data Science company
Learn Python The Fun Way
Scaling Microservices with Crossbar.io
If you can read this you donā€™t need glasses
Letā€™s ļ¬nd some common topics with data science
Anyone who has used Python to search text
for substring patterns has at least heard of
the regular expression module. Many of us
use it extensively for parsers and lexers,
The py.test tool presents a rapid and simple
way to write tests for your Python code. This
training gives a quick introduction with exercises
into some distinguishing features.Chat with the core developers about how
to extend django CMS or how to integrate
your own apps seamlessly. Lets talk about
your plugins, apphooks, toolbar extensions
import urllib2, bs4
import sklearn,
wordcloud
G Varoquaux 10
1 Letā€™s do something together: sort EuroPython site
Crawl
the schedule to get a list of titles and URLs
talk pages to retrieve abstract and tags
bs4: beautiful soup, matchings on the DOM tree
G Varoquaux 11
1 Letā€™s do something together: sort EuroPython site
Crawl
the schedule to get a list of titles and URLs
talk pages to retrieve abstract and tags
bs4: beautiful soup, matchings on the DOM tree
Vectorize
Anyone who has used Python to search text
for substring patterns has at least heard of
the regular expression module. Many of us
use it extensively for parsers and lexers,
The py.test tool presents a rapid and simple
way to write tests for your Python code. This
training gives a quick introduction with exercises
into some distinguishing features.Chat with the core developers about how
to extend django CMS or how to integrate
your own apps seamlessly. Lets talk about
your plugins, apphooks, toolbar extensions
a
can
code
is
module
proļ¬ling
performance
Python
the
20
10
4
14
3
2
1
9
18
Term Freq
G Varoquaux 11
1 Letā€™s do something together: sort EuroPython site
Crawl
the schedule to get a list of titles and URLs
talk pages to retrieve abstract and tags
bs4: beautiful soup, matchings on the DOM tree
Vectorize
Anyone who has used Python to search text
for substring patterns has at least heard of
the regular expression module. Many of us
use it extensively for parsers and lexers,
The py.test tool presents a rapid and simple
way to write tests for your Python code. This
training gives a quick introduction with exercises
into some distinguishing features.Chat with the core developers about how
to extend django CMS or how to integrate
your own apps seamlessly. Lets talk about
your plugins, apphooks, toolbar extensions
a
can
code
is
module
proļ¬ling
performance
Python
the
20
10
4
14
3
2
1
9
18
Term Freq
1321
540
208
964
123
7
6
191
1450
All
docs
G Varoquaux 11
1 Letā€™s do something together: sort EuroPython site
Crawl
the schedule to get a list of titles and URLs
talk pages to retrieve abstract and tags
bs4: beautiful soup, matchings on the DOM tree
Vectorize
Anyone who has used Python to search text
for substring patterns has at least heard of
the regular expression module. Many of us
use it extensively for parsers and lexers,
The py.test tool presents a rapid and simple
way to write tests for your Python code. This
training gives a quick introduction with exercises
into some distinguishing features.Chat with the core developers about how
to extend django CMS or how to integrate
your own apps seamlessly. Lets talk about
your plugins, apphooks, toolbar extensions
a
can
code
is
module
proļ¬ling
performance
Python
the
20
10
4
14
3
2
1
9
18
Term Freq
1321
540
208
964
123
7
6
191
1450
All
docs
.015
.018
.019
.014
.023
.286
.167
.047
.012
Ratio
TF-IDF in scikit-learn
sklearn.feature extraction.text.TfidfVectorizer
G Varoquaux 11
1 Letā€™s do something together: sort EuroPython site
03078090707907
00790752700578
94071006000797
00970008007000
10000400400090
00050205008000
documents
the
Python
performance
profiling
module
is
code
can
a
Term-document matrix
G Varoquaux 12
1 Letā€™s do something together: sort EuroPython site
03078090707907
00790752700578
94071006000797
00970008007000
10000400400090
00050205008000
documents
the
Python
performance
profiling
module
is
code
can
a
Term-document matrix
3 78 9 7 79 7
79 7527 578
94 71 6
797
97
8 7
1
4 4
9
5 2 5 8
Can be a sparse matrix
G Varoquaux 12
1 Letā€™s do something together: sort EuroPython site
03078090707907
00790752700578
94071006000797
00970008007000
10000400400090
00050205008000
documents
the
Python
performance
profiling
module
is
code
can
a
ā†’
03078090707907
00790752700578
94071006000797
topics
the
Python
performance
profiling
module
is
code
can
a
030
007
940
009
100
000
documents
topics
+
What terms
are in a topics
What documents
are in a topics
A matrix factorization
Often with non-negative constraints
sklearn.decompositions.NMF
G Varoquaux 13
1 Letā€™s do something together: sort EuroPython site
EuroPyton abstracts
Topic 1
G Varoquaux 14
1 Letā€™s do something together: sort EuroPython site
EuroPyton abstracts
Topic 2
G Varoquaux 14
1 Letā€™s do something together: sort EuroPython site
EuroPyton abstracts
Topic 3
G Varoquaux 14
1 Letā€™s do something together: sort EuroPython site
EuroPyton abstracts
G Varoquaux 14
1 Letā€™s do something together: sort EuroPython site
EuroPyton abstracts
Add one of Pythonā€™s great templating engine
... get a usable website
https://gaelvaroquaux.github.io/my_topics/ep16
G Varoquaux 14
Want to try it?
$ pip install scikit-learn
...
ImportError: Numerical Python (NumPy) is not installed.
scikit-learn requires NumPy >= 1.6.1
G Varoquaux 15
Want to try it?
$ pip install scikit-learn
...
ImportError: Numerical Python (NumPy) is not installed.
scikit-learn requires NumPy >= 1.6.1
G Varoquaux 15
Want to try it?
$ pip install scikit-learn
...
ImportError: Numerical Python (NumPy) is not installed.
scikit-learn requires NumPy >= 1.6.1
C:> pip install numpy
...
error: Unable to find vcvarsall.bat
G Varoquaux 15
Want to try it?
$ pip install scikit-learn
...
ImportError: Numerical Python (NumPy) is not installed.
scikit-learn requires NumPy >= 1.6.1
C:> pip install numpy
...
error: Unable to find vcvarsall.bat
G Varoquaux 15
1 Weā€™re diļ¬€erent
Well
fast linear algebra
ATLAS (Fortran) 70x faster
libfortran.so.3 ??
youā€™re kidding me
G Varoquaux 16
1 Weā€™re diļ¬€erent
Well
fast linear algebra
ATLAS (Fortran) 70x faster
libfortran.so.3 ??
youā€™re kidding me
Packaging is a major roadblock for scientiļ¬c Python
A lot of compiled code + shared libraries
ā‡’ library + ABI compatibility issues
Progress:
- Manylinux wheels: PEP 513, RT. McGibbon, NJ. Smith
rely on a conservative core set of libs
- Openblas: pure-C, fast linear algebra
G Varoquaux 16
1 Weā€™re diļ¬€erent
But working together gives us awesome things
Text mining ā‡’ intelligent interfaces
G Varoquaux 17
2 The scientistā€™s view of code
Numerics versus control ļ¬‚ow
Numerics versus databases
Numerics versus strings
Numerics versus the world
G Varoquaux 18
2 Why we love numpy
100 000 term frequency vs inverse doc frequency:
In [*]: %timeit [t * i for t, i in izip(tf, idf)]
100 loops, best of 3: 6.2 ms per loop
The numpy style:
In [*]: %timeit tf * idf
1000 loops, best of 3: 74.2 Āµs per loop
102
104
106
number of elements
1Āµs
100ns
10ns
1ns
timeperelement
lists
numpy
G Varoquaux 19
2 Why we love numpy
100 000 term frequency vs inverse doc frequency:
In [*]: %timeit [t * i for t, i in izip(tf, idf)]
100 loops, best of 3: 6.2 ms per loop
The numpy style:
In [*]: %timeit tf * idf
1000 loops, best of 3: 74.2 Āµs per loop
102
104
106
number of elements
1Āµs
100ns
10ns
1ns
timeperelement
lists
numpy
Array computing can be more readable
tf * idf
vs
[t * i for t, i in izip(tf, idf)]
G Varoquaux 19
2 arrays are nothing but pointers
A numpy array =
memory address data type shape strides
03878794797927
01790752701578
94071746124797
54970718717887
13653490495190
74754265358098
48721546349084
90345673245614
78957187745620
stride 2
stride 1
shape 1
shape 2
Represents any regular
data in a structured way:
how to access elements
via pointer arythmetics
(computing oļ¬€sets)
stride 2
stride 1
shape 1
shape 2
03878794797927 01790752701578 ...
G Varoquaux 20
2 arrays are nothing but pointers
A numpy array =
memory address data type shape strides
03878794797927
01790752701578
94071746124797
54970718717887
13653490495190
74754265358098
48721546349084
90345673245614
78957187745620
stride 2
stride 1
shape 1
shape 2
Represents any regular
data in a structured way:
how to access elements
via pointer arythmetics
(computing oļ¬€sets)
stride 2
stride 1
shape 1
shape 2
03878794797927 01790752701578 ...
Matches the memory model of numerical libraries
ā‡’ Enables copyless interactions
Numpy is really a memory model
G Varoquaux 20
2 Array computing is fast
102
104
106
number of elements
1Āµs
100ns
10ns
1ns
timeperelement
lists
numpy
tf idf = tf * idf CPU
03878794797927
01790752701578
*tf_idf=No type checking
Direct sequential memory access
Vector operations (SIMD)
G Varoquaux 21
2 Array computing is limited by CPU starvation
103
104
105
106
number of elements
1ns
2ns
3ns
4ns
5ns
timeperelement
tf idf = tf * idf CPU
03878794797927
01790752701578
*tf_idf=
2x slowdown passed a certain size
G Varoquaux 22
2 Array computing is limited by CPU starvation
103
104
105
106
number of elements
1ns
2ns
3ns
4ns
5ns
timeperelement
105
āˆ¼ size of
the CPU cache
Memory is much slower than CPU
tf idf = tf * idf
03878794797927
01790752701578
*tf_idf=
0387879
017
cpu
cache
2x slowdown passed a certain size
G Varoquaux 22
2 Array computing is limited by CPU starvation
103
104
105
106
number of elements
1ns
2ns
3ns
4ns
5ns
timeperelement
Memory is much slower than CPU
tf idf = tf * idf - 1
It gets worse for complex expressions
G Varoquaux 22
2 Array computing is limited by CPU starvation
103
104
105
106
number of elements
1ns
2ns
3ns
4ns
5ns
timeperelement
Memory is much slower than CPU
tf idf = tf * idf - 1
Whatā€™s going on:
1. tmp ā† tf * idf
2. tf idf ā† tmp - 1
Big temporary:
Moving data in
& out of cache
G Varoquaux 22
2 Array computing is limited by CPU starvation
tf idf = tf * idf - 1
Whatā€™s going on:
1. tmp ā† tf * idf
2. tf idf ā† tmp - 1
Big temporary:
Moving data in
& out of cache
In [*]: %timeit tf * idf
10000 loops, best of 3: 74.2 Āµs per loop
In [*]: %timeit tf * idf - 1
1000 loops, best of 3: 418 Āµs per loop
G Varoquaux 22
2 Array computing is limited by CPU starvation
tf idf = tf * idf - 1
Whatā€™s going on:
1. tmp ā† tf * idf
2. tf idf ā† tmp - 1
Big temporary:
Moving data in
& out of cache
In [*]: %timeit tf * idf
10000 loops, best of 3: 74.2 Āµs per loop
In [*]: %timeit tf * idf - 1
1000 loops, best of 3: 418 Āµs per loop
In-place operations: reuse the allocation
In [*]: %timeit tmp = tf * idf; tmp -= 1
10000 loops, best of 3: 112 Āµs per loop
G Varoquaux 22
2 Array computing is limited by CPU starvation
103
104
105
106
number of elements
1ns
2ns
3ns
4ns
5ns
timeperelement
numpy
np inplace
tmp = tf * idf
tmp -= 1
tf idf = tf * idf - 1
Whatā€™s going on:
1. tmp ā† tf * idf
2. tf idf ā† tmp - 1
Big temporary:
Moving data in
& out of cache
G Varoquaux 22
2 Array computing is limited by CPU starvation
103
104
105
106
number of elements
1ns
2ns
3ns
4ns
5ns
timeperelement
numpy
np inplace
tmp = tf * idf
tmp -= 1
tf idf = tf * idf - 1
Whatā€™s going on:
1. tmp ā† tf * idf
2. tf idf ā† tmp - 1
Big temporary:
Moving data in
& out of cache
A compilation problem:
tf idf = tf * idf - 1
tf idf = tf * idf
tf idf -= 1
G Varoquaux 22
2 Array computing is limited by CPU starvation
103
104
105
106
number of elements
1ns
2ns
3ns
4ns
5ns
timeperelement
numpy
np inplace
numexpr
tf idf = tf * idf - 1
Whatā€™s going on:
1. tmp ā† tf * idf
2. tf idf ā† tmp - 1
Big temporary:
Moving data in
& out of cache
A compilation problem:
ā€¢ Removing/reusing temporaries
ā€¢ Operating on ā€œchunksā€ that ļ¬t in cache
Addressed by numexpr, with string expressions
numexpr.evaluate(ā€™tf * idf - 1ā€™, locals())
G Varoquaux 22
2 Array computing is limited by CPU starvation
103
104
105
106
number of elements
1ns
2ns
3ns
4ns
5ns
timeperelement
numpy
np inplace
numexpr
tf idf = tf * idf - 1
Whatā€™s going on:
1. tmp ā† tf * idf
2. tf idf ā† tmp - 1
Big temporary:
Moving data in
& out of cache
A compilation problem:
ā€¢ Removing/reusing temporaries
ā€¢ Operating on ā€œchunksā€ that ļ¬t in cache
Addressed by numexpr, with string expressions
Addressed by numba, with bytecode inspection
lazyarray
Similar problem to pagination with SQL queries
G Varoquaux 22
2 Array computing is limited by CPU starvation
tf idf = tf * idf
Too small:
overhead
Too BIG:
Out of cache
BIG Data
$$$
G Varoquaux 23
2 Numerics versus control ļ¬‚ow
What if there is an if
tf idf = tf / idf
tf idf[idf == 0] = 0
Suppose the we are looking at ages in a population:
ages[gender == ā€™maleā€™].mean()
- ages[gender == ā€™femaleā€™].mean()
G Varoquaux 24
2 Numerics versus control ļ¬‚ow
What if there is an if
tf idf = tf / idf
tf idf[idf == 0] = 0
Suppose the we are looking at ages in a population:
ages[gender == ā€™maleā€™].mean()
- ages[gender == ā€™femaleā€™].mean()
This is really starting to be looking like databases
pandas: something in between arrays and
an in-memory database
Great for queries, less great for numerics.
G Varoquaux 24
Installation
PROBLEMS
Beautiful
Python
COde
Routines
in Fortran
or C++
ScalaBILiTY
G Varoquaux 25
Installation
PROBLEMS
Beautiful
Python
COde
Routines
in Fortran
or C++
ScalaBILiTY
DeploymentPROBLEMS
Beautiful
Python
COde
DATABASE
in C++, JAVA,
ERLANG...
ScalaBILiTY
Numpy is the scientistā€™s
equivalent to an ORM
Gives speed
with non-Python code
G Varoquaux 25
numerics vs databases
numerics eļ¬ƒcient on regularly spaced data
But numpy creates cache misses for big arrays
ā‡’ Need to remove temporaries and chunk data
G Varoquaux 26
numerics vs databases
numerics eļ¬ƒcient on regularly spaced data
But numpy creates cache misses for big arrays
ā‡’ Need to remove temporaries and chunk data
selection and grouping eļ¬ƒcient with indexes or trees
ā‡’ Need to group queries
Compilation is unpythonic
G Varoquaux 26
numerics vs databases
numerics eļ¬ƒcient on regularly spaced data
But numpy creates cache misses for big arrays
ā‡’ Need to remove temporaries and chunk data
selection and grouping eļ¬ƒcient with indexes or trees
ā‡’ Need to group queries
Compilation is unpythonic
A computation & query language? numexpr
I hate domain-speciļ¬c languages (SQL)
Numpy is very expressive
G Varoquaux 26
numerics vs databases
numerics eļ¬ƒcient on regularly spaced data
But numpy creates cache misses for big arrays
ā‡’ Need to remove temporaries and chunk data
selection and grouping eļ¬ƒcient with indexes or trees
ā‡’ Need to group queries
Compilation is unpythonic
A computation & query language? numexpr
I hate domain-speciļ¬c languages (SQL)
Numpy is very expressive
PonyORM: Compiling Python to optimized SQL
Datascience with SQL: Ibis & Blaze
G Varoquaux 26
numerics vs databases
numerics eļ¬ƒcient on regularly spaced data
But numpy creates cache misses for big arrays
ā‡’ Need to remove temporaries and chunk data
selection and grouping eļ¬ƒcient with indexes or trees
ā‡’ Need to group queries
Spark: java-world ā€œbig dataā€ rising star
combines distributed store
+ computing model
We (scikit-learn) are faster when data ļ¬ts in RAM
G Varoquaux 26
Operations on chunks, or algorithms on chunks
Machine learning, data mining = numerics
03878794797927
03878794797927
G Varoquaux 27
Operations on chunks, or algorithms on chunks
Machine learning, data mining = numerics
03878794797927
03878794797927
03878794797927
03878794797927
ETL (extract, transform, & load) Multivariate statistics
G Varoquaux 27
Operations on chunks, or algorithms on chunks
Machine learning, data mining = numerics
03878794797927
03878794797927
03878794797927
03878794797927
ETL (extract, transform, & load) Multivariate statistics
Out-of-core opera-
tions not eļ¬ƒcient:
no data locality
On-line aglorithms (streaming)
0
3
8
7
8
7
9
4
7
9
7
9
2
7
0
1
7
9
0
7
5
2
7
0
1
5
7
8
9
4
0
7
1
7
4
6
1
2
4
7
9
7
5
4
9
7
0
7
1
8
7
1
7
8
8
7
1
3
6
5
3
4
9
0
4
9
5
1
9
0
7
4
7
5
4
2
6
5
3
5
8
0
9
8
4
8
7
2
1
5
4
6
3
4
9
0
8
4
9
0
3
4
5
6
7
3
2
4
5
6
1
4
7
8
9
5
7
1
8
7
7
4
5
6
2
0
0
3
8
7
8
7
9
4
7
9
7
9
2
7
0
1
7
9
0
7
5
2
7
0
1
5
7
8
9
4
0
7
1
7
4
6
1
2
4
7
9
7
5
4
9
7
0
7
1
8
7
1
7
8
8
7
1
3
6
5
3
4
9
0
4
9
5
1
9
0
7
4
7
5
4
2
6
5
3
5
8
0
9
8
4
8
7
2
1
5
4
6
3
4
9
0
8
4
9
0
3
4
5
6
7
3
2
4
5
6
1
4
7
8
9
5
7
1
8
7
7
4
5
6
2
0
0
3
8
7
8
7
9
4
7
9
7
9
2
7
0
1
7
9
0
7
5
2
7
0
1
5
7
8
9
4
0
7
1
7
4
6
1
2
4
7
9
7
5
4
9
7
0
7
1
8
7
1
7
8
8
7
1
3
6
5
3
4
9
0
4
9
5
1
9
0
7
4
7
5
4
2
6
5
3
5
8
0
9
8
4
8
7
2
1
5
4
6
3
4
9
0
8
4
9
0
3
4
5
6
7
3
2
4
5
6
1
4
7
8
9
5
7
1
8
7
7
4
5
6
2
0
0
3
8
7
8
7
9
4
7
9
7
9
2
7
0
1
7
9
0
7
5
2
7
0
1
5
7
8
9
4
0
7
1
7
4
6
1
2
4
7
9
7
5
4
9
7
0
7
1
8
7
1
7
8
8
7
1
3
6
5
3
4
9
0
4
9
5
1
9
0
7
4
7
5
4
2
6
5
3
5
8
0
9
8
4
8
7
2
1
5
4
6
3
4
9
0
8
4
9
0
3
4
5
6
7
3
2
4
5
6
1
4
7
8
9
5
7
1
8
7
7
4
5
6
2
0
0
3
8
7
8
7
9
4
7
9
7
9
2
7
0
1
7
9
0
7
5
2
7
0
1
5
7
8
9
4
0
7
1
7
4
6
1
2
4
7
9
7
5
4
9
7
0
7
1
8
7
1
7
8
8
7
1
3
6
5
3
4
9
0
4
9
5
1
9
0
7
4
7
5
4
2
6
5
3
5
8
0
9
8
4
8
7
2
1
5
4
6
3
4
9
0
8
4
9
0
3
4
5
6
7
3
2
4
5
6
1
4
7
8
9
5
7
1
8
7
7
4
5
6
2
0
0
3
8
7
8
7
9
4
7
9
7
9
2
7
0
1
7
9
0
7
5
2
7
0
1
5
7
8
9
4
0
7
1
7
4
6
1
2
4
7
9
7
5
4
9
7
0
7
1
8
7
1
7
8
8
7
1
3
6
5
3
4
9
0
4
9
5
1
9
0
7
4
7
5
4
2
6
5
3
5
8
0
9
8
4
8
7
2
1
5
4
6
3
4
9
0
8
4
9
0
3
4
5
6
7
3
2
4
5
6
1
4
7
8
9
5
7
1
8
7
7
4
5
6
2
0
eg stochastic gradient descent
As in deep learning
G Varoquaux 27
Making the data-science magic happens
from sklearn import
03078090707907
00790752700578
94071006000797
00970008007000
10000400400090
00050205008000
documents
the
Python
performance
profiling
module
is
code
can
a
03078090707907
00790752700578
94071006000797
topics
the
Python
performance
profiling
module
is
code
can
a
030
007
940
009
100
000
documents
topics
+
What terms
are in a topics
What documents
are in a topics
G Varoquaux 28
Making the data-science magic happens
from sklearn import
Turning applied maths papers to robust code
High-level, readable, simple syntax reduces cognitive load
Thanks
G Varoquaux 28
3 Beyond numerics
Make #PyData great (again)
G Varoquaux 29
3 Data/computation ļ¬‚ow is crucial
03878794797927
03878794797927
03878794797927
03878794797927
0
3
8
7
8
7
9
4
7
9
7
9
2
7
0
1
7
9
0
7
5
2
7
0
1
5
7
8
9
4
0
7
1
7
4
6
1
2
4
7
9
7
5
4
9
7
0
7
1
8
7
1
7
8
8
7
1
3
6
5
3
4
9
0
4
9
5
1
9
0
7
4
7
5
4
2
6
5
3
5
8
0
9
8
4
8
7
2
1
5
4
6
3
4
9
0
8
4
9
0
3
4
5
6
7
3
2
4
5
6
1
4
7
8
9
5
7
1
8
7
7
4
5
6
2
0
0
3
8
7
8
7
9
4
7
9
7
9
2
7
0
1
7
9
0
7
5
2
7
0
1
5
7
8
9
4
0
7
1
7
4
6
1
2
4
7
9
7
5
4
9
7
0
7
1
8
7
1
7
8
8
7
1
3
6
5
3
4
9
0
4
9
5
1
9
0
7
4
7
5
4
2
6
5
3
5
8
0
9
8
4
8
7
2
1
5
4
6
3
4
9
0
8
4
9
0
3
4
5
6
7
3
2
4
5
6
1
4
7
8
9
5
7
1
8
7
7
4
5
6
2
0
0
3
8
7
8
7
9
4
7
9
7
9
2
7
0
1
7
9
0
7
5
2
7
0
1
5
7
8
9
4
0
7
1
7
4
6
1
2
4
7
9
7
5
4
9
7
0
7
1
8
7
1
7
8
8
7
1
3
6
5
3
4
9
0
4
9
5
1
9
0
7
4
7
5
4
2
6
5
3
5
8
0
9
8
4
8
7
2
1
5
4
6
3
4
9
0
8
4
9
0
3
4
5
6
7
3
2
4
5
6
1
4
7
8
9
5
7
1
8
7
7
4
5
6
2
0
0
3
8
7
8
7
9
4
7
9
7
9
2
7
0
1
7
9
0
7
5
2
7
0
1
5
7
8
9
4
0
7
1
7
4
6
1
2
4
7
9
7
5
4
9
7
0
7
1
8
7
1
7
8
8
7
1
3
6
5
3
4
9
0
4
9
5
1
9
0
7
4
7
5
4
2
6
5
3
5
8
0
9
8
4
8
7
2
1
5
4
6
3
4
9
0
8
4
9
0
3
4
5
6
7
3
2
4
5
6
1
4
7
8
9
5
7
1
8
7
7
4
5
6
2
0
0
3
8
7
8
7
9
4
7
9
7
9
2
7
0
1
7
9
0
7
5
2
7
0
1
5
7
8
9
4
0
7
1
7
4
6
1
2
4
7
9
7
5
4
9
7
0
7
1
8
7
1
7
8
8
7
1
3
6
5
3
4
9
0
4
9
5
1
9
0
7
4
7
5
4
2
6
5
3
5
8
0
9
8
4
8
7
2
1
5
4
6
3
4
9
0
8
4
9
0
3
4
5
6
7
3
2
4
5
6
1
4
7
8
9
5
7
1
8
7
7
4
5
6
2
0
0
3
8
7
8
7
9
4
7
9
7
9
2
7
0
1
7
9
0
7
5
2
7
0
1
5
7
8
9
4
0
7
1
7
4
6
1
2
4
7
9
7
5
4
9
7
0
7
1
8
7
1
7
8
8
7
1
3
6
5
3
4
9
0
4
9
5
1
9
0
7
4
7
5
4
2
6
5
3
5
8
0
9
8
4
8
7
2
1
5
4
6
3
4
9
0
8
4
9
0
3
4
5
6
7
3
2
4
5
6
1
4
7
8
9
5
7
1
8
7
7
4
5
6
2
0
Data-ļ¬‚ow engines are everywhere
dask pure-Python dynamic scheduler
static compiler parallel & distributed
theano expression analysis pure-Python
tensorflow C library distributed
G Varoquaux 30
3 Data/computation ļ¬‚ow is crucial
03878794797927
03878794797927
03878794797927
03878794797927
0
3
8
7
8
7
9
4
7
9
7
9
2
7
0
1
7
9
0
7
5
2
7
0
1
5
7
8
9
4
0
7
1
7
4
6
1
2
4
7
9
7
5
4
9
7
0
7
1
8
7
1
7
8
8
7
1
3
6
5
3
4
9
0
4
9
5
1
9
0
7
4
7
5
4
2
6
5
3
5
8
0
9
8
4
8
7
2
1
5
4
6
3
4
9
0
8
4
9
0
3
4
5
6
7
3
2
4
5
6
1
4
7
8
9
5
7
1
8
7
7
4
5
6
2
0
0
3
8
7
8
7
9
4
7
9
7
9
2
7
0
1
7
9
0
7
5
2
7
0
1
5
7
8
9
4
0
7
1
7
4
6
1
2
4
7
9
7
5
4
9
7
0
7
1
8
7
1
7
8
8
7
1
3
6
5
3
4
9
0
4
9
5
1
9
0
7
4
7
5
4
2
6
5
3
5
8
0
9
8
4
8
7
2
1
5
4
6
3
4
9
0
8
4
9
0
3
4
5
6
7
3
2
4
5
6
1
4
7
8
9
5
7
1
8
7
7
4
5
6
2
0
0
3
8
7
8
7
9
4
7
9
7
9
2
7
0
1
7
9
0
7
5
2
7
0
1
5
7
8
9
4
0
7
1
7
4
6
1
2
4
7
9
7
5
4
9
7
0
7
1
8
7
1
7
8
8
7
1
3
6
5
3
4
9
0
4
9
5
1
9
0
7
4
7
5
4
2
6
5
3
5
8
0
9
8
4
8
7
2
1
5
4
6
3
4
9
0
8
4
9
0
3
4
5
6
7
3
2
4
5
6
1
4
7
8
9
5
7
1
8
7
7
4
5
6
2
0
0
3
8
7
8
7
9
4
7
9
7
9
2
7
0
1
7
9
0
7
5
2
7
0
1
5
7
8
9
4
0
7
1
7
4
6
1
2
4
7
9
7
5
4
9
7
0
7
1
8
7
1
7
8
8
7
1
3
6
5
3
4
9
0
4
9
5
1
9
0
7
4
7
5
4
2
6
5
3
5
8
0
9
8
4
8
7
2
1
5
4
6
3
4
9
0
8
4
9
0
3
4
5
6
7
3
2
4
5
6
1
4
7
8
9
5
7
1
8
7
7
4
5
6
2
0
0
3
8
7
8
7
9
4
7
9
7
9
2
7
0
1
7
9
0
7
5
2
7
0
1
5
7
8
9
4
0
7
1
7
4
6
1
2
4
7
9
7
5
4
9
7
0
7
1
8
7
1
7
8
8
7
1
3
6
5
3
4
9
0
4
9
5
1
9
0
7
4
7
5
4
2
6
5
3
5
8
0
9
8
4
8
7
2
1
5
4
6
3
4
9
0
8
4
9
0
3
4
5
6
7
3
2
4
5
6
1
4
7
8
9
5
7
1
8
7
7
4
5
6
2
0
0
3
8
7
8
7
9
4
7
9
7
9
2
7
0
1
7
9
0
7
5
2
7
0
1
5
7
8
9
4
0
7
1
7
4
6
1
2
4
7
9
7
5
4
9
7
0
7
1
8
7
1
7
8
8
7
1
3
6
5
3
4
9
0
4
9
5
1
9
0
7
4
7
5
4
2
6
5
3
5
8
0
9
8
4
8
7
2
1
5
4
6
3
4
9
0
8
4
9
0
3
4
5
6
7
3
2
4
5
6
1
4
7
8
9
5
7
1
8
7
7
4
5
6
2
0
Data-ļ¬‚ow engines are everywhere
Python should shine there:
reļ¬‚exivity + metaprogramming + async
ā€œPython is the best numerical language out there
because itā€™s not a numerical language.ā€ ā€“ Nathaniel Smith
API challenging:
For algorithm design: no framework / inversion of control
G Varoquaux 30
3 Ingredients for future data ļ¬‚ows
Distributed computation & Run-time analysis
Reļ¬‚exivity is central
Debugging
Interactive work
Code analysis
Persistence
03878794797927
03878794797927
Parallel computing
G Varoquaux 31
3 Ingredients for future data ļ¬‚ows
Distributed computation & Run-time analysis
Reļ¬‚exivity is central
Debugging
Interactive work
Code analysis
Persistence
03878794797927
03878794797927
Parallel computing
Pickle
distribute code and data without data model
serialize intermediate results
deep of hash of any data structure joblib.hash
Very limited (eg no lambda #19272)
ā‡’ variants: dill, cloudpickle
G Varoquaux 31
3 Ingredients for future data ļ¬‚ows
Distributed computation & Run-time analysis
Reļ¬‚exivity is central
Debugging
Interactive work
Code analysis
Persistence
03878794797927
03878794797927
Parallel computing
Pickle
distribute code and data without data model
serialize intermediate results
deep of hash of any data structure joblib.hash
joblib:
Simple parallel syntax:
Parallel(n jobs=2)(delayed(sqrt)(i) for i in range(10))
Fast persistence:
joblib.dump(anything, ā€™filename.pkl.gzā€™)
Primitive for out of core:
pointer = mem.cache(f).call and shelves(big data)
ā€¢Non-invasive syntax / paradigm
ā€¢Fast on big numpy arrays
ā€¢Soon backend system (job broker and persistence)
Gets job managment into algorithms (eg in scikit-learn)
G Varoquaux 31
3 The Python VM is great
The simplicity of the VM is our strength
Software Transactional Memory... would be nice
But, I want to use foreign memory
Java gained jmalloc for foreign memory
Better garbage collection
Yes but, I easily plug into reference counting
A strength of Python is its clear C API
ā‡’ Easy foreign functionality
G Varoquaux 32
3 The Python VM is great
The simplicity of the VM is our strength
Software Transactional Memory... would be nice
But, I want to use foreign memory
Java gained jmalloc for foreign memory
Better garbage collection
Yes but, I easily plug into reference counting
A strength of Python is its clear C API
ā‡’ Easy foreign functionality
Cython: the best of C and Python
Add types for speed (numpy arrays as float*)
Call C to bind external libraries: surprisingly easy
no pointer arithmetics
An adaptation layer between Python VM and C
G Varoquaux 32
4 Working together
G Varoquaux 33
4 Scikit-learn is easy machine learning
As easy as py
from s k l e a r n import svm
c l a s s i f i e r = svm.SVC()
c l a s s i f i e r . f i t ( X t r a i n , Y t r a i n )
Y t e s t = c l a s s i f i e r . p r e d i c t ( X t e s t )
People love the encapsulation
classifier is a semi black box
The power of a simple object-oriented API
Documentation-driven development
G Varoquaux 34
4 Scikit-learn is easy machine learning
As easy as py
from s k l e a r n import svm
c l a s s i f i e r = svm.SVC()
c l a s s i f i e r . f i t ( X t r a i n , Y t r a i n )
Y t e s t = c l a s s i f i e r . p r e d i c t ( X t e s t )
People love the encapsulation
classifier is a semi black box
The power of a simple object-oriented API
Documentation-driven development
High-level, readable, simple API reduces cognitive load
PyData loves Python in return
G Varoquaux 34
4 Diļ¬€erence is richness
We all do diļ¬€erent things
We can all beneļ¬t from others
though we donā€™t know how
G Varoquaux 35
4 Diļ¬€erence is richness, but requires outreach
We all do diļ¬€erent things
We can all beneļ¬t from others
though we donā€™t know how
Being didactic outside oneā€™s community is crucial
Avoiding jargon take that machine learning
Prioritizing information
ā€œSimple is better than complexā€
Students learning numerics donā€™t care about unicode
Build documentation upon very simple examples
Think stackoverļ¬‚ow
Sphinx + Sphinx-gallery
G Varoquaux 35
@GaelVaroquaux
Scientist web dev: Python is the language for data
Python language & VM is perfect to manipulate
low-level constructs
with high-level wordings
Connects to other paradigms, eg C
@GaelVaroquaux
Scientist web dev: Python is the language for data
Python language & VM is perfect to manipulate
low-level constructs
with high-level wordings
Dynamism and reļ¬‚exivity
ā‡’ meta-programming and debugging
@GaelVaroquaux
Scientist web dev: Python is the language for data
Python language & VM is perfect to manipulate
low-level constructs
with high-level wordings
Dynamism and reļ¬‚exivity
ā‡’ meta-programming and debugging
Needs for compilation and dynamism:
a diļ¬ƒcult balance
PEP 509: guards on run-time modiļ¬cation
PEP 510: function specicalization
@GaelVaroquaux
Scientist web dev: Python is the language for data
Python language & VM is perfect to manipulate
low-level constructs
with high-level wordings
Dynamism and reļ¬‚exivity
ā‡’ meta-programming and debugging
Needs for compilation and dynamism
Pydata will use DB and concurrency from web
PyData can give knowledge engineering + AI

More Related Content

What's hot

Python Programming - XIII. GUI Programming
Python Programming - XIII. GUI ProgrammingPython Programming - XIII. GUI Programming
Python Programming - XIII. GUI ProgrammingRanel Padon
Ā 
SWIG Hello World
SWIG Hello WorldSWIG Hello World
SWIG Hello World
e8xu
Ā 
Python Scripting Tutorial for Beginners | Python Tutorial | Python Training |...
Python Scripting Tutorial for Beginners | Python Tutorial | Python Training |...Python Scripting Tutorial for Beginners | Python Tutorial | Python Training |...
Python Scripting Tutorial for Beginners | Python Tutorial | Python Training |...
Edureka!
Ā 
Python programming | Fundamentals of Python programming
Python programming | Fundamentals of Python programming Python programming | Fundamentals of Python programming
Python programming | Fundamentals of Python programming
KrishnaMildain
Ā 
Python Loops Tutorial | Python For Loop | While Loop Python | Python Training...
Python Loops Tutorial | Python For Loop | While Loop Python | Python Training...Python Loops Tutorial | Python For Loop | While Loop Python | Python Training...
Python Loops Tutorial | Python For Loop | While Loop Python | Python Training...
Edureka!
Ā 
Python for Big Data Analytics
Python for Big Data AnalyticsPython for Big Data Analytics
Python for Big Data Analytics
Edureka!
Ā 
Python 101 for the .NET Developer
Python 101 for the .NET DeveloperPython 101 for the .NET Developer
Python 101 for the .NET Developer
Sarah Dutkiewicz
Ā 
What is Range Function? | Range in Python Explained | Edureka
What is Range Function? | Range in Python Explained | EdurekaWhat is Range Function? | Range in Python Explained | Edureka
What is Range Function? | Range in Python Explained | Edureka
Edureka!
Ā 
Introduction to IPython & Jupyter Notebooks
Introduction to IPython & Jupyter NotebooksIntroduction to IPython & Jupyter Notebooks
Introduction to IPython & Jupyter Notebooks
Eueung Mulyana
Ā 
Introduction to python
Introduction to pythonIntroduction to python
Introduction to python
Rajesh Rajamani
Ā 
Python on Science ? Yes, We can.
Python on Science ?   Yes, We can.Python on Science ?   Yes, We can.
Python on Science ? Yes, We can.
Marcel Caraciolo
Ā 
carrow - Go bindings to Apache Arrow via C++-API
carrow - Go bindings to Apache Arrow via C++-APIcarrow - Go bindings to Apache Arrow via C++-API
carrow - Go bindings to Apache Arrow via C++-API
Yoni Davidson
Ā 
Building custom kernels for IPython
Building custom kernels for IPythonBuilding custom kernels for IPython
Building custom kernels for IPython
Narahari (Hari) Allamraju
Ā 
Final presentation on python
Final presentation on pythonFinal presentation on python
Final presentation on python
RaginiJain21
Ā 
Python lec1
Python lec1Python lec1
Python lec1
Swarup Ghosh
Ā 
PyTorch Python Tutorial | Deep Learning Using PyTorch | Image Classifier Usin...
PyTorch Python Tutorial | Deep Learning Using PyTorch | Image Classifier Usin...PyTorch Python Tutorial | Deep Learning Using PyTorch | Image Classifier Usin...
PyTorch Python Tutorial | Deep Learning Using PyTorch | Image Classifier Usin...
Edureka!
Ā 
Introduction to Jupyter notebook and MS Azure Machine Learning Studio
Introduction to Jupyter notebook and MS Azure Machine Learning StudioIntroduction to Jupyter notebook and MS Azure Machine Learning Studio
Introduction to Jupyter notebook and MS Azure Machine Learning Studio
Muralidharan Deenathayalan
Ā 
Jupyter, A Platform for Data Science at Scale
Jupyter, A Platform for Data Science at ScaleJupyter, A Platform for Data Science at Scale
Jupyter, A Platform for Data Science at Scale
Matthias Bussonnier
Ā 
ĪœĪµĻ„Ī±Ļ€ĻĪæĪ³ĻĪ±ĀµĀµĪ±Ļ„Ī¹ĻƒĀµĻŒĻ‚ ĪŗĻŽĪ“Ī¹ĪŗĪ± Python ĻƒĪµ Ī³Ī»ĻŽĻƒĻƒĪ± Ī³ĻĪ±ĀµĀµĪ¹ĪŗĪæĻ Ļ‡ĻĻŒĪ½ĪæĻ… Ī³Ī¹Ī± Ī±Ļ…Ļ„ĻŒĀµĪ±Ļ„Ī· ĪµĻ€Ī±...
ĪœĪµĻ„Ī±Ļ€ĻĪæĪ³ĻĪ±ĀµĀµĪ±Ļ„Ī¹ĻƒĀµĻŒĻ‚ ĪŗĻŽĪ“Ī¹ĪŗĪ± Python ĻƒĪµ Ī³Ī»ĻŽĻƒĻƒĪ± Ī³ĻĪ±ĀµĀµĪ¹ĪŗĪæĻ Ļ‡ĻĻŒĪ½ĪæĻ… Ī³Ī¹Ī± Ī±Ļ…Ļ„ĻŒĀµĪ±Ļ„Ī· ĪµĻ€Ī±...ĪœĪµĻ„Ī±Ļ€ĻĪæĪ³ĻĪ±ĀµĀµĪ±Ļ„Ī¹ĻƒĀµĻŒĻ‚ ĪŗĻŽĪ“Ī¹ĪŗĪ± Python ĻƒĪµ Ī³Ī»ĻŽĻƒĻƒĪ± Ī³ĻĪ±ĀµĀµĪ¹ĪŗĪæĻ Ļ‡ĻĻŒĪ½ĪæĻ… Ī³Ī¹Ī± Ī±Ļ…Ļ„ĻŒĀµĪ±Ļ„Ī· ĪµĻ€Ī±...
ĪœĪµĻ„Ī±Ļ€ĻĪæĪ³ĻĪ±ĀµĀµĪ±Ļ„Ī¹ĻƒĀµĻŒĻ‚ ĪŗĻŽĪ“Ī¹ĪŗĪ± Python ĻƒĪµ Ī³Ī»ĻŽĻƒĻƒĪ± Ī³ĻĪ±ĀµĀµĪ¹ĪŗĪæĻ Ļ‡ĻĻŒĪ½ĪæĻ… Ī³Ī¹Ī± Ī±Ļ…Ļ„ĻŒĀµĪ±Ļ„Ī· ĪµĻ€Ī±...
ISSEL
Ā 
C pythontalk
C pythontalkC pythontalk
C pythontalk
Nicholaus Jackson
Ā 

What's hot (20)

Python Programming - XIII. GUI Programming
Python Programming - XIII. GUI ProgrammingPython Programming - XIII. GUI Programming
Python Programming - XIII. GUI Programming
Ā 
SWIG Hello World
SWIG Hello WorldSWIG Hello World
SWIG Hello World
Ā 
Python Scripting Tutorial for Beginners | Python Tutorial | Python Training |...
Python Scripting Tutorial for Beginners | Python Tutorial | Python Training |...Python Scripting Tutorial for Beginners | Python Tutorial | Python Training |...
Python Scripting Tutorial for Beginners | Python Tutorial | Python Training |...
Ā 
Python programming | Fundamentals of Python programming
Python programming | Fundamentals of Python programming Python programming | Fundamentals of Python programming
Python programming | Fundamentals of Python programming
Ā 
Python Loops Tutorial | Python For Loop | While Loop Python | Python Training...
Python Loops Tutorial | Python For Loop | While Loop Python | Python Training...Python Loops Tutorial | Python For Loop | While Loop Python | Python Training...
Python Loops Tutorial | Python For Loop | While Loop Python | Python Training...
Ā 
Python for Big Data Analytics
Python for Big Data AnalyticsPython for Big Data Analytics
Python for Big Data Analytics
Ā 
Python 101 for the .NET Developer
Python 101 for the .NET DeveloperPython 101 for the .NET Developer
Python 101 for the .NET Developer
Ā 
What is Range Function? | Range in Python Explained | Edureka
What is Range Function? | Range in Python Explained | EdurekaWhat is Range Function? | Range in Python Explained | Edureka
What is Range Function? | Range in Python Explained | Edureka
Ā 
Introduction to IPython & Jupyter Notebooks
Introduction to IPython & Jupyter NotebooksIntroduction to IPython & Jupyter Notebooks
Introduction to IPython & Jupyter Notebooks
Ā 
Introduction to python
Introduction to pythonIntroduction to python
Introduction to python
Ā 
Python on Science ? Yes, We can.
Python on Science ?   Yes, We can.Python on Science ?   Yes, We can.
Python on Science ? Yes, We can.
Ā 
carrow - Go bindings to Apache Arrow via C++-API
carrow - Go bindings to Apache Arrow via C++-APIcarrow - Go bindings to Apache Arrow via C++-API
carrow - Go bindings to Apache Arrow via C++-API
Ā 
Building custom kernels for IPython
Building custom kernels for IPythonBuilding custom kernels for IPython
Building custom kernels for IPython
Ā 
Final presentation on python
Final presentation on pythonFinal presentation on python
Final presentation on python
Ā 
Python lec1
Python lec1Python lec1
Python lec1
Ā 
PyTorch Python Tutorial | Deep Learning Using PyTorch | Image Classifier Usin...
PyTorch Python Tutorial | Deep Learning Using PyTorch | Image Classifier Usin...PyTorch Python Tutorial | Deep Learning Using PyTorch | Image Classifier Usin...
PyTorch Python Tutorial | Deep Learning Using PyTorch | Image Classifier Usin...
Ā 
Introduction to Jupyter notebook and MS Azure Machine Learning Studio
Introduction to Jupyter notebook and MS Azure Machine Learning StudioIntroduction to Jupyter notebook and MS Azure Machine Learning Studio
Introduction to Jupyter notebook and MS Azure Machine Learning Studio
Ā 
Jupyter, A Platform for Data Science at Scale
Jupyter, A Platform for Data Science at ScaleJupyter, A Platform for Data Science at Scale
Jupyter, A Platform for Data Science at Scale
Ā 
ĪœĪµĻ„Ī±Ļ€ĻĪæĪ³ĻĪ±ĀµĀµĪ±Ļ„Ī¹ĻƒĀµĻŒĻ‚ ĪŗĻŽĪ“Ī¹ĪŗĪ± Python ĻƒĪµ Ī³Ī»ĻŽĻƒĻƒĪ± Ī³ĻĪ±ĀµĀµĪ¹ĪŗĪæĻ Ļ‡ĻĻŒĪ½ĪæĻ… Ī³Ī¹Ī± Ī±Ļ…Ļ„ĻŒĀµĪ±Ļ„Ī· ĪµĻ€Ī±...
ĪœĪµĻ„Ī±Ļ€ĻĪæĪ³ĻĪ±ĀµĀµĪ±Ļ„Ī¹ĻƒĀµĻŒĻ‚ ĪŗĻŽĪ“Ī¹ĪŗĪ± Python ĻƒĪµ Ī³Ī»ĻŽĻƒĻƒĪ± Ī³ĻĪ±ĀµĀµĪ¹ĪŗĪæĻ Ļ‡ĻĻŒĪ½ĪæĻ… Ī³Ī¹Ī± Ī±Ļ…Ļ„ĻŒĀµĪ±Ļ„Ī· ĪµĻ€Ī±...ĪœĪµĻ„Ī±Ļ€ĻĪæĪ³ĻĪ±ĀµĀµĪ±Ļ„Ī¹ĻƒĀµĻŒĻ‚ ĪŗĻŽĪ“Ī¹ĪŗĪ± Python ĻƒĪµ Ī³Ī»ĻŽĻƒĻƒĪ± Ī³ĻĪ±ĀµĀµĪ¹ĪŗĪæĻ Ļ‡ĻĻŒĪ½ĪæĻ… Ī³Ī¹Ī± Ī±Ļ…Ļ„ĻŒĀµĪ±Ļ„Ī· ĪµĻ€Ī±...
ĪœĪµĻ„Ī±Ļ€ĻĪæĪ³ĻĪ±ĀµĀµĪ±Ļ„Ī¹ĻƒĀµĻŒĻ‚ ĪŗĻŽĪ“Ī¹ĪŗĪ± Python ĻƒĪµ Ī³Ī»ĻŽĻƒĻƒĪ± Ī³ĻĪ±ĀµĀµĪ¹ĪŗĪæĻ Ļ‡ĻĻŒĪ½ĪæĻ… Ī³Ī¹Ī± Ī±Ļ…Ļ„ĻŒĀµĪ±Ļ„Ī· ĪµĻ€Ī±...
Ā 
C pythontalk
C pythontalkC pythontalk
C pythontalk
Ā 

Viewers also liked

Simple big data, in Python
Simple big data, in PythonSimple big data, in Python
Simple big data, in Python
Gael Varoquaux
Ā 
On the code of data science
On the code of data scienceOn the code of data science
On the code of data science
Gael Varoquaux
Ā 
Scikit-learn: the state of the union 2016
Scikit-learn: the state of the union 2016Scikit-learn: the state of the union 2016
Scikit-learn: the state of the union 2016
Gael Varoquaux
Ā 
Succeeding in academia despite doing good_software
Succeeding in academia despite doing good_softwareSucceeding in academia despite doing good_software
Succeeding in academia despite doing good_software
Gael Varoquaux
Ā 
Machine learning and cognitive neuroimaging: new tools can answer new questions
Machine learning and cognitive neuroimaging: new tools can answer new questionsMachine learning and cognitive neuroimaging: new tools can answer new questions
Machine learning and cognitive neuroimaging: new tools can answer new questions
Gael Varoquaux
Ā 
Brain maps from machine learning? Spatial regularizations
Brain maps from machine learning? Spatial regularizationsBrain maps from machine learning? Spatial regularizations
Brain maps from machine learning? Spatial regularizations
Gael Varoquaux
Ā 
Scikit learn: apprentissage statistique en Python
Scikit learn: apprentissage statistique en PythonScikit learn: apprentissage statistique en Python
Scikit learn: apprentissage statistique en Python
Gael Varoquaux
Ā 
Scikit-learn for easy machine learning: the vision, the tool, and the project
Scikit-learn for easy machine learning: the vision, the tool, and the projectScikit-learn for easy machine learning: the vision, the tool, and the project
Scikit-learn for easy machine learning: the vision, the tool, and the project
Gael Varoquaux
Ā 
Inter-site autism biomarkers from resting state fMRI
Inter-site autism biomarkers from resting state fMRIInter-site autism biomarkers from resting state fMRI
Inter-site autism biomarkers from resting state fMRI
Gael Varoquaux
Ā 
Building a cutting-edge data processing environment on a budget
Building a cutting-edge data processing environment on a budgetBuilding a cutting-edge data processing environment on a budget
Building a cutting-edge data processing environment on a budget
Gael Varoquaux
Ā 
Scikit-learn: apprentissage statistique en Python. CrƩer des machines intelli...
Scikit-learn: apprentissage statistique en Python. CrƩer des machines intelli...Scikit-learn: apprentissage statistique en Python. CrƩer des machines intelli...
Scikit-learn: apprentissage statistique en Python. CrƩer des machines intelli...
Gael Varoquaux
Ā 
A hand-waving introduction to sparsity for compressed tomography reconstruction
A hand-waving introduction to sparsity for compressed tomography reconstructionA hand-waving introduction to sparsity for compressed tomography reconstruction
A hand-waving introduction to sparsity for compressed tomography reconstruction
Gael Varoquaux
Ā 
Advanced network modelling 2: connectivity measures, goup analysis
Advanced network modelling 2: connectivity measures, goup analysisAdvanced network modelling 2: connectivity measures, goup analysis
Advanced network modelling 2: connectivity measures, goup analysis
Gael Varoquaux
Ā 
Processing biggish data on commodity hardware: simple Python patterns
Processing biggish data on commodity hardware: simple Python patternsProcessing biggish data on commodity hardware: simple Python patterns
Processing biggish data on commodity hardware: simple Python patterns
Gael Varoquaux
Ā 
Social-sparsity brain decoders: faster spatial sparsity
Social-sparsity brain decoders: faster spatial sparsitySocial-sparsity brain decoders: faster spatial sparsity
Social-sparsity brain decoders: faster spatial sparsity
Gael Varoquaux
Ā 
Open Source Scientific Software
Open Source Scientific SoftwareOpen Source Scientific Software
Open Source Scientific Software
Gael Varoquaux
Ā 
Connectomics: Parcellations and Network Analysis Methods
Connectomics: Parcellations and Network Analysis MethodsConnectomics: Parcellations and Network Analysis Methods
Connectomics: Parcellations and Network Analysis Methods
Gael Varoquaux
Ā 
Brain reading, compressive sensing, fMRI and statistical learning in Python
Brain reading, compressive sensing, fMRI and statistical learning in PythonBrain reading, compressive sensing, fMRI and statistical learning in Python
Brain reading, compressive sensing, fMRI and statistical learning in Python
Gael Varoquaux
Ā 
Hyperparameter optimization with approximate gradient
Hyperparameter optimization with approximate gradientHyperparameter optimization with approximate gradient
Hyperparameter optimization with approximate gradient
Fabian Pedregosa
Ā 
Mediation analysis
Mediation analysisMediation analysis
Mediation analysis
nelle varoquaux
Ā 

Viewers also liked (20)

Simple big data, in Python
Simple big data, in PythonSimple big data, in Python
Simple big data, in Python
Ā 
On the code of data science
On the code of data scienceOn the code of data science
On the code of data science
Ā 
Scikit-learn: the state of the union 2016
Scikit-learn: the state of the union 2016Scikit-learn: the state of the union 2016
Scikit-learn: the state of the union 2016
Ā 
Succeeding in academia despite doing good_software
Succeeding in academia despite doing good_softwareSucceeding in academia despite doing good_software
Succeeding in academia despite doing good_software
Ā 
Machine learning and cognitive neuroimaging: new tools can answer new questions
Machine learning and cognitive neuroimaging: new tools can answer new questionsMachine learning and cognitive neuroimaging: new tools can answer new questions
Machine learning and cognitive neuroimaging: new tools can answer new questions
Ā 
Brain maps from machine learning? Spatial regularizations
Brain maps from machine learning? Spatial regularizationsBrain maps from machine learning? Spatial regularizations
Brain maps from machine learning? Spatial regularizations
Ā 
Scikit learn: apprentissage statistique en Python
Scikit learn: apprentissage statistique en PythonScikit learn: apprentissage statistique en Python
Scikit learn: apprentissage statistique en Python
Ā 
Scikit-learn for easy machine learning: the vision, the tool, and the project
Scikit-learn for easy machine learning: the vision, the tool, and the projectScikit-learn for easy machine learning: the vision, the tool, and the project
Scikit-learn for easy machine learning: the vision, the tool, and the project
Ā 
Inter-site autism biomarkers from resting state fMRI
Inter-site autism biomarkers from resting state fMRIInter-site autism biomarkers from resting state fMRI
Inter-site autism biomarkers from resting state fMRI
Ā 
Building a cutting-edge data processing environment on a budget
Building a cutting-edge data processing environment on a budgetBuilding a cutting-edge data processing environment on a budget
Building a cutting-edge data processing environment on a budget
Ā 
Scikit-learn: apprentissage statistique en Python. CrƩer des machines intelli...
Scikit-learn: apprentissage statistique en Python. CrƩer des machines intelli...Scikit-learn: apprentissage statistique en Python. CrƩer des machines intelli...
Scikit-learn: apprentissage statistique en Python. CrƩer des machines intelli...
Ā 
A hand-waving introduction to sparsity for compressed tomography reconstruction
A hand-waving introduction to sparsity for compressed tomography reconstructionA hand-waving introduction to sparsity for compressed tomography reconstruction
A hand-waving introduction to sparsity for compressed tomography reconstruction
Ā 
Advanced network modelling 2: connectivity measures, goup analysis
Advanced network modelling 2: connectivity measures, goup analysisAdvanced network modelling 2: connectivity measures, goup analysis
Advanced network modelling 2: connectivity measures, goup analysis
Ā 
Processing biggish data on commodity hardware: simple Python patterns
Processing biggish data on commodity hardware: simple Python patternsProcessing biggish data on commodity hardware: simple Python patterns
Processing biggish data on commodity hardware: simple Python patterns
Ā 
Social-sparsity brain decoders: faster spatial sparsity
Social-sparsity brain decoders: faster spatial sparsitySocial-sparsity brain decoders: faster spatial sparsity
Social-sparsity brain decoders: faster spatial sparsity
Ā 
Open Source Scientific Software
Open Source Scientific SoftwareOpen Source Scientific Software
Open Source Scientific Software
Ā 
Connectomics: Parcellations and Network Analysis Methods
Connectomics: Parcellations and Network Analysis MethodsConnectomics: Parcellations and Network Analysis Methods
Connectomics: Parcellations and Network Analysis Methods
Ā 
Brain reading, compressive sensing, fMRI and statistical learning in Python
Brain reading, compressive sensing, fMRI and statistical learning in PythonBrain reading, compressive sensing, fMRI and statistical learning in Python
Brain reading, compressive sensing, fMRI and statistical learning in Python
Ā 
Hyperparameter optimization with approximate gradient
Hyperparameter optimization with approximate gradientHyperparameter optimization with approximate gradient
Hyperparameter optimization with approximate gradient
Ā 
Mediation analysis
Mediation analysisMediation analysis
Mediation analysis
Ā 

Similar to Scientist meets web dev: how Python became the language of data

A Whirlwind Tour Of Python
A Whirlwind Tour Of PythonA Whirlwind Tour Of Python
A Whirlwind Tour Of Python
Asia Smith
Ā 
Why Python Should Be Your First Programming Language
Why Python Should Be Your First Programming LanguageWhy Python Should Be Your First Programming Language
Why Python Should Be Your First Programming Language
Edureka!
Ā 
Py4 inf 01-intro
Py4 inf 01-introPy4 inf 01-intro
Py4 inf 01-intro
Ishaq Ali
Ā 
05 python.pdf
05 python.pdf05 python.pdf
05 python.pdf
SugumarSarDurai
Ā 
PyCourse - Self driving python course
PyCourse - Self driving python coursePyCourse - Self driving python course
PyCourse - Self driving python course
Eran Shlomo
Ā 
python programming.pptx
python programming.pptxpython programming.pptx
python programming.pptx
Kaviya452563
Ā 
Python Training in Pune - Ethans Tech Pune
Python Training in Pune - Ethans Tech PunePython Training in Pune - Ethans Tech Pune
Python Training in Pune - Ethans Tech Pune
Ethan's Tech
Ā 
Python
Python Python
Python Edureka!
Ā 
Introduction to Python
Introduction to PythonIntroduction to Python
Introduction to Python
Spotle.ai
Ā 
What is Python? An overview of Python for science.
What is Python? An overview of Python for science.What is Python? An overview of Python for science.
What is Python? An overview of Python for science.
Nicholas Pringle
Ā 
Python Projects For Beginners | Python Projects Examples | Python Tutorial | ...
Python Projects For Beginners | Python Projects Examples | Python Tutorial | ...Python Projects For Beginners | Python Projects Examples | Python Tutorial | ...
Python Projects For Beginners | Python Projects Examples | Python Tutorial | ...
Edureka!
Ā 
Introduction to Python
Introduction to PythonIntroduction to Python
Introduction to Python
SudhanshiBakre1
Ā 
5 Effective Tips to Learn Python Fast.pptx
5 Effective Tips to Learn Python Fast.pptx5 Effective Tips to Learn Python Fast.pptx
5 Effective Tips to Learn Python Fast.pptx
Attitude Tally Academy
Ā 
Seminar report on python 3 course
Seminar report on python 3 courseSeminar report on python 3 course
Seminar report on python 3 course
HimanshuPanwar38
Ā 
Django Article V0
Django Article V0Django Article V0
Django Article V0Udi Bauman
Ā 
BarCamb Connotea by Ian Mulvany
BarCamb Connotea by Ian MulvanyBarCamb Connotea by Ian Mulvany
BarCamb Connotea by Ian Mulvany
Ian Mulvany
Ā 
Training report 1923-b.e-eee-batchno--intern-54 (1).pdf
Training report 1923-b.e-eee-batchno--intern-54 (1).pdfTraining report 1923-b.e-eee-batchno--intern-54 (1).pdf
Training report 1923-b.e-eee-batchno--intern-54 (1).pdf
YadavHarshKr
Ā 
pycon-2015-liza-daly
pycon-2015-liza-dalypycon-2015-liza-daly
pycon-2015-liza-dalyLiza Daly
Ā 
Pyhton-1a-Basics.pdf
Pyhton-1a-Basics.pdfPyhton-1a-Basics.pdf
Pyhton-1a-Basics.pdf
Mattupallipardhu
Ā 
Python Full Stack Training in Noida.pptx
Python Full Stack Training in  Noida.pptxPython Full Stack Training in  Noida.pptx
Python Full Stack Training in Noida.pptx
ashishthakur730937
Ā 

Similar to Scientist meets web dev: how Python became the language of data (20)

A Whirlwind Tour Of Python
A Whirlwind Tour Of PythonA Whirlwind Tour Of Python
A Whirlwind Tour Of Python
Ā 
Why Python Should Be Your First Programming Language
Why Python Should Be Your First Programming LanguageWhy Python Should Be Your First Programming Language
Why Python Should Be Your First Programming Language
Ā 
Py4 inf 01-intro
Py4 inf 01-introPy4 inf 01-intro
Py4 inf 01-intro
Ā 
05 python.pdf
05 python.pdf05 python.pdf
05 python.pdf
Ā 
PyCourse - Self driving python course
PyCourse - Self driving python coursePyCourse - Self driving python course
PyCourse - Self driving python course
Ā 
python programming.pptx
python programming.pptxpython programming.pptx
python programming.pptx
Ā 
Python Training in Pune - Ethans Tech Pune
Python Training in Pune - Ethans Tech PunePython Training in Pune - Ethans Tech Pune
Python Training in Pune - Ethans Tech Pune
Ā 
Python
Python Python
Python
Ā 
Introduction to Python
Introduction to PythonIntroduction to Python
Introduction to Python
Ā 
What is Python? An overview of Python for science.
What is Python? An overview of Python for science.What is Python? An overview of Python for science.
What is Python? An overview of Python for science.
Ā 
Python Projects For Beginners | Python Projects Examples | Python Tutorial | ...
Python Projects For Beginners | Python Projects Examples | Python Tutorial | ...Python Projects For Beginners | Python Projects Examples | Python Tutorial | ...
Python Projects For Beginners | Python Projects Examples | Python Tutorial | ...
Ā 
Introduction to Python
Introduction to PythonIntroduction to Python
Introduction to Python
Ā 
5 Effective Tips to Learn Python Fast.pptx
5 Effective Tips to Learn Python Fast.pptx5 Effective Tips to Learn Python Fast.pptx
5 Effective Tips to Learn Python Fast.pptx
Ā 
Seminar report on python 3 course
Seminar report on python 3 courseSeminar report on python 3 course
Seminar report on python 3 course
Ā 
Django Article V0
Django Article V0Django Article V0
Django Article V0
Ā 
BarCamb Connotea by Ian Mulvany
BarCamb Connotea by Ian MulvanyBarCamb Connotea by Ian Mulvany
BarCamb Connotea by Ian Mulvany
Ā 
Training report 1923-b.e-eee-batchno--intern-54 (1).pdf
Training report 1923-b.e-eee-batchno--intern-54 (1).pdfTraining report 1923-b.e-eee-batchno--intern-54 (1).pdf
Training report 1923-b.e-eee-batchno--intern-54 (1).pdf
Ā 
pycon-2015-liza-daly
pycon-2015-liza-dalypycon-2015-liza-daly
pycon-2015-liza-daly
Ā 
Pyhton-1a-Basics.pdf
Pyhton-1a-Basics.pdfPyhton-1a-Basics.pdf
Pyhton-1a-Basics.pdf
Ā 
Python Full Stack Training in Noida.pptx
Python Full Stack Training in  Noida.pptxPython Full Stack Training in  Noida.pptx
Python Full Stack Training in Noida.pptx
Ā 

More from Gael Varoquaux

Evaluating machine learning models and their diagnostic value
Evaluating machine learning models and their diagnostic valueEvaluating machine learning models and their diagnostic value
Evaluating machine learning models and their diagnostic value
Gael Varoquaux
Ā 
Measuring mental health with machine learning and brain imaging
Measuring mental health with machine learning and brain imagingMeasuring mental health with machine learning and brain imaging
Measuring mental health with machine learning and brain imaging
Gael Varoquaux
Ā 
Machine learning with missing values
Machine learning with missing valuesMachine learning with missing values
Machine learning with missing values
Gael Varoquaux
Ā 
Dirty data science machine learning on non-curated data
Dirty data science machine learning on non-curated dataDirty data science machine learning on non-curated data
Dirty data science machine learning on non-curated data
Gael Varoquaux
Ā 
Representation learning in limited-data settings
Representation learning in limited-data settingsRepresentation learning in limited-data settings
Representation learning in limited-data settings
Gael Varoquaux
Ā 
Better neuroimaging data processing: driven by evidence, open communities, an...
Better neuroimaging data processing: driven by evidence, open communities, an...Better neuroimaging data processing: driven by evidence, open communities, an...
Better neuroimaging data processing: driven by evidence, open communities, an...
Gael Varoquaux
Ā 
Functional-connectome biomarkers to meet clinical needs?
Functional-connectome biomarkers to meet clinical needs?Functional-connectome biomarkers to meet clinical needs?
Functional-connectome biomarkers to meet clinical needs?
Gael Varoquaux
Ā 
Atlases of cognition with large-scale human brain mapping
Atlases of cognition with large-scale human brain mappingAtlases of cognition with large-scale human brain mapping
Atlases of cognition with large-scale human brain mapping
Gael Varoquaux
Ā 
Similarity encoding for learning on dirty categorical variables
Similarity encoding for learning on dirty categorical variablesSimilarity encoding for learning on dirty categorical variables
Similarity encoding for learning on dirty categorical variables
Gael Varoquaux
Ā 
Machine learning for functional connectomes
Machine learning for functional connectomesMachine learning for functional connectomes
Machine learning for functional connectomes
Gael Varoquaux
Ā 
Towards psychoinformatics with machine learning and brain imaging
Towards psychoinformatics with machine learning and brain imagingTowards psychoinformatics with machine learning and brain imaging
Towards psychoinformatics with machine learning and brain imaging
Gael Varoquaux
Ā 
Simple representations for learning: factorizations and similarities
Simple representations for learning: factorizations and similarities Simple representations for learning: factorizations and similarities
Simple representations for learning: factorizations and similarities
Gael Varoquaux
Ā 
A tutorial on Machine Learning, with illustrations for MR imaging
A tutorial on Machine Learning, with illustrations for MR imagingA tutorial on Machine Learning, with illustrations for MR imaging
A tutorial on Machine Learning, with illustrations for MR imaging
Gael Varoquaux
Ā 
Scikit-learn and nilearn: Democratisation of machine learning for brain imaging
Scikit-learn and nilearn: Democratisation of machine learning for brain imagingScikit-learn and nilearn: Democratisation of machine learning for brain imaging
Scikit-learn and nilearn: Democratisation of machine learning for brain imaging
Gael Varoquaux
Ā 
Computational practices for reproducible science
Computational practices for reproducible scienceComputational practices for reproducible science
Computational practices for reproducible science
Gael Varoquaux
Ā 
Coding for science and innovation
Coding for science and innovationCoding for science and innovation
Coding for science and innovation
Gael Varoquaux
Ā 
Estimating Functional Connectomes: Sparsityā€™s Strength and Limitations
Estimating Functional Connectomes: Sparsityā€™s Strength and LimitationsEstimating Functional Connectomes: Sparsityā€™s Strength and Limitations
Estimating Functional Connectomes: Sparsityā€™s Strength and Limitations
Gael Varoquaux
Ā 

More from Gael Varoquaux (17)

Evaluating machine learning models and their diagnostic value
Evaluating machine learning models and their diagnostic valueEvaluating machine learning models and their diagnostic value
Evaluating machine learning models and their diagnostic value
Ā 
Measuring mental health with machine learning and brain imaging
Measuring mental health with machine learning and brain imagingMeasuring mental health with machine learning and brain imaging
Measuring mental health with machine learning and brain imaging
Ā 
Machine learning with missing values
Machine learning with missing valuesMachine learning with missing values
Machine learning with missing values
Ā 
Dirty data science machine learning on non-curated data
Dirty data science machine learning on non-curated dataDirty data science machine learning on non-curated data
Dirty data science machine learning on non-curated data
Ā 
Representation learning in limited-data settings
Representation learning in limited-data settingsRepresentation learning in limited-data settings
Representation learning in limited-data settings
Ā 
Better neuroimaging data processing: driven by evidence, open communities, an...
Better neuroimaging data processing: driven by evidence, open communities, an...Better neuroimaging data processing: driven by evidence, open communities, an...
Better neuroimaging data processing: driven by evidence, open communities, an...
Ā 
Functional-connectome biomarkers to meet clinical needs?
Functional-connectome biomarkers to meet clinical needs?Functional-connectome biomarkers to meet clinical needs?
Functional-connectome biomarkers to meet clinical needs?
Ā 
Atlases of cognition with large-scale human brain mapping
Atlases of cognition with large-scale human brain mappingAtlases of cognition with large-scale human brain mapping
Atlases of cognition with large-scale human brain mapping
Ā 
Similarity encoding for learning on dirty categorical variables
Similarity encoding for learning on dirty categorical variablesSimilarity encoding for learning on dirty categorical variables
Similarity encoding for learning on dirty categorical variables
Ā 
Machine learning for functional connectomes
Machine learning for functional connectomesMachine learning for functional connectomes
Machine learning for functional connectomes
Ā 
Towards psychoinformatics with machine learning and brain imaging
Towards psychoinformatics with machine learning and brain imagingTowards psychoinformatics with machine learning and brain imaging
Towards psychoinformatics with machine learning and brain imaging
Ā 
Simple representations for learning: factorizations and similarities
Simple representations for learning: factorizations and similarities Simple representations for learning: factorizations and similarities
Simple representations for learning: factorizations and similarities
Ā 
A tutorial on Machine Learning, with illustrations for MR imaging
A tutorial on Machine Learning, with illustrations for MR imagingA tutorial on Machine Learning, with illustrations for MR imaging
A tutorial on Machine Learning, with illustrations for MR imaging
Ā 
Scikit-learn and nilearn: Democratisation of machine learning for brain imaging
Scikit-learn and nilearn: Democratisation of machine learning for brain imagingScikit-learn and nilearn: Democratisation of machine learning for brain imaging
Scikit-learn and nilearn: Democratisation of machine learning for brain imaging
Ā 
Computational practices for reproducible science
Computational practices for reproducible scienceComputational practices for reproducible science
Computational practices for reproducible science
Ā 
Coding for science and innovation
Coding for science and innovationCoding for science and innovation
Coding for science and innovation
Ā 
Estimating Functional Connectomes: Sparsityā€™s Strength and Limitations
Estimating Functional Connectomes: Sparsityā€™s Strength and LimitationsEstimating Functional Connectomes: Sparsityā€™s Strength and Limitations
Estimating Functional Connectomes: Sparsityā€™s Strength and Limitations
Ā 

Recently uploaded

Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
Ā 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
Ā 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
Ā 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
Ā 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
Ā 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
Abida Shariff
Ā 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
Ā 
Dev Dives: Train smarter, not harder ā€“ active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder ā€“ active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder ā€“ active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder ā€“ active learning and UiPath LLMs for do...
UiPathCommunity
Ā 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
Ā 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
Ā 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
Ā 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
Ā 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
Ā 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
Ā 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
Ā 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
Ā 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
CatarinaPereira64715
Ā 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
Ā 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
Ā 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
Ā 

Recently uploaded (20)

Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Ā 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Ā 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Ā 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Ā 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Ā 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
Ā 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Ā 
Dev Dives: Train smarter, not harder ā€“ active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder ā€“ active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder ā€“ active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder ā€“ active learning and UiPath LLMs for do...
Ā 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ā 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Ā 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
Ā 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Ā 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
Ā 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Ā 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
Ā 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Ā 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
Ā 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Ā 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Ā 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Ā 

Scientist meets web dev: how Python became the language of data

  • 1. Scientist meets web dev: how Python became the language of data GaĀØel Varoquaux
  • 2. Scientist meets web dev: how Python became the language of data GaĀØel Varoquaux Very diverse community This talk: a reļ¬‚ection on what we have in common, Python I am talking about things you donā€™t understand (my science) and things I donā€™t understand (web dev)
  • 3. I actually did a PhD in quantum physics Hence I think I qualify as a ā€œscientistā€ G Varoquaux 3
  • 4. I now do computer science for neuroscience Try to link neural activity to thoughts and cognition G Varoquaux 4
  • 5. I now do computer science for neuroscience Try to link neural activity to thoughts and cognition We attack it as a machine learning problem Python software: nilearn G Varoquaux 5
  • 6. On the way, we created a machine-learning library: scikit-learn G Varoquaux 6
  • 7. Data science with Python is hot Huge success. Cool. Data science is THE thing. G Varoquaux 7
  • 8. Data science with Python is hot Huge success. Cool. Data science is THE thing. Python is the go-to language How did it happen? We built scikit-learn, others pandas, etc..., but these were built on solid foundations G Varoquaux 7
  • 9. 1 Scientists come from Jupiter And web devs from Saturn? And sysadmins from Neptune? G Varoquaux 8
  • 10. 1 Weā€™re diļ¬€erent numbers (in arrays) arrays (of numbers) arrays arrays strings databases object-oriented programming ļ¬‚ow control A bit of a culture gap G Varoquaux 9
  • 11. 1 Letā€™s do something together: sort EuroPython site 205 talks: How OpenStack makes Python better (and vice-versa) Introduction to aiohttp So you think your Python startup is worth $10 million... SQLAlchemy as the backbone of a Data Science company Learn Python The Fun Way Scaling Microservices with Crossbar.io If you can read this you donā€™t need glasses Letā€™s ļ¬nd some common topics with data science G Varoquaux 10
  • 12. 1 Letā€™s do something together: sort EuroPython site 205 talks: How OpenStack makes Python better (and vice-versa) Introduction to aiohttp So you think your Python startup is worth $10 million... SQLAlchemy as the backbone of a Data Science company Learn Python The Fun Way Scaling Microservices with Crossbar.io If you can read this you donā€™t need glasses Letā€™s ļ¬nd some common topics with data science Anyone who has used Python to search text for substring patterns has at least heard of the regular expression module. Many of us use it extensively for parsers and lexers, The py.test tool presents a rapid and simple way to write tests for your Python code. This training gives a quick introduction with exercises into some distinguishing features.Chat with the core developers about how to extend django CMS or how to integrate your own apps seamlessly. Lets talk about your plugins, apphooks, toolbar extensions G Varoquaux 10
  • 13. 1 Letā€™s do something together: sort EuroPython site 205 talks: How OpenStack makes Python better (and vice-versa) Introduction to aiohttp So you think your Python startup is worth $10 million... SQLAlchemy as the backbone of a Data Science company Learn Python The Fun Way Scaling Microservices with Crossbar.io If you can read this you donā€™t need glasses Letā€™s ļ¬nd some common topics with data science Anyone who has used Python to search text for substring patterns has at least heard of the regular expression module. Many of us use it extensively for parsers and lexers, The py.test tool presents a rapid and simple way to write tests for your Python code. This training gives a quick introduction with exercises into some distinguishing features.Chat with the core developers about how to extend django CMS or how to integrate your own apps seamlessly. Lets talk about your plugins, apphooks, toolbar extensions import urllib2, bs4 import sklearn, wordcloud G Varoquaux 10
  • 14. 1 Letā€™s do something together: sort EuroPython site Crawl the schedule to get a list of titles and URLs talk pages to retrieve abstract and tags bs4: beautiful soup, matchings on the DOM tree G Varoquaux 11
  • 15. 1 Letā€™s do something together: sort EuroPython site Crawl the schedule to get a list of titles and URLs talk pages to retrieve abstract and tags bs4: beautiful soup, matchings on the DOM tree Vectorize Anyone who has used Python to search text for substring patterns has at least heard of the regular expression module. Many of us use it extensively for parsers and lexers, The py.test tool presents a rapid and simple way to write tests for your Python code. This training gives a quick introduction with exercises into some distinguishing features.Chat with the core developers about how to extend django CMS or how to integrate your own apps seamlessly. Lets talk about your plugins, apphooks, toolbar extensions a can code is module proļ¬ling performance Python the 20 10 4 14 3 2 1 9 18 Term Freq G Varoquaux 11
  • 16. 1 Letā€™s do something together: sort EuroPython site Crawl the schedule to get a list of titles and URLs talk pages to retrieve abstract and tags bs4: beautiful soup, matchings on the DOM tree Vectorize Anyone who has used Python to search text for substring patterns has at least heard of the regular expression module. Many of us use it extensively for parsers and lexers, The py.test tool presents a rapid and simple way to write tests for your Python code. This training gives a quick introduction with exercises into some distinguishing features.Chat with the core developers about how to extend django CMS or how to integrate your own apps seamlessly. Lets talk about your plugins, apphooks, toolbar extensions a can code is module proļ¬ling performance Python the 20 10 4 14 3 2 1 9 18 Term Freq 1321 540 208 964 123 7 6 191 1450 All docs G Varoquaux 11
  • 17. 1 Letā€™s do something together: sort EuroPython site Crawl the schedule to get a list of titles and URLs talk pages to retrieve abstract and tags bs4: beautiful soup, matchings on the DOM tree Vectorize Anyone who has used Python to search text for substring patterns has at least heard of the regular expression module. Many of us use it extensively for parsers and lexers, The py.test tool presents a rapid and simple way to write tests for your Python code. This training gives a quick introduction with exercises into some distinguishing features.Chat with the core developers about how to extend django CMS or how to integrate your own apps seamlessly. Lets talk about your plugins, apphooks, toolbar extensions a can code is module proļ¬ling performance Python the 20 10 4 14 3 2 1 9 18 Term Freq 1321 540 208 964 123 7 6 191 1450 All docs .015 .018 .019 .014 .023 .286 .167 .047 .012 Ratio TF-IDF in scikit-learn sklearn.feature extraction.text.TfidfVectorizer G Varoquaux 11
  • 18. 1 Letā€™s do something together: sort EuroPython site 03078090707907 00790752700578 94071006000797 00970008007000 10000400400090 00050205008000 documents the Python performance profiling module is code can a Term-document matrix G Varoquaux 12
  • 19. 1 Letā€™s do something together: sort EuroPython site 03078090707907 00790752700578 94071006000797 00970008007000 10000400400090 00050205008000 documents the Python performance profiling module is code can a Term-document matrix 3 78 9 7 79 7 79 7527 578 94 71 6 797 97 8 7 1 4 4 9 5 2 5 8 Can be a sparse matrix G Varoquaux 12
  • 20. 1 Letā€™s do something together: sort EuroPython site 03078090707907 00790752700578 94071006000797 00970008007000 10000400400090 00050205008000 documents the Python performance profiling module is code can a ā†’ 03078090707907 00790752700578 94071006000797 topics the Python performance profiling module is code can a 030 007 940 009 100 000 documents topics + What terms are in a topics What documents are in a topics A matrix factorization Often with non-negative constraints sklearn.decompositions.NMF G Varoquaux 13
  • 21. 1 Letā€™s do something together: sort EuroPython site EuroPyton abstracts Topic 1 G Varoquaux 14
  • 22. 1 Letā€™s do something together: sort EuroPython site EuroPyton abstracts Topic 2 G Varoquaux 14
  • 23. 1 Letā€™s do something together: sort EuroPython site EuroPyton abstracts Topic 3 G Varoquaux 14
  • 24. 1 Letā€™s do something together: sort EuroPython site EuroPyton abstracts G Varoquaux 14
  • 25. 1 Letā€™s do something together: sort EuroPython site EuroPyton abstracts Add one of Pythonā€™s great templating engine ... get a usable website https://gaelvaroquaux.github.io/my_topics/ep16 G Varoquaux 14
  • 26. Want to try it? $ pip install scikit-learn ... ImportError: Numerical Python (NumPy) is not installed. scikit-learn requires NumPy >= 1.6.1 G Varoquaux 15
  • 27. Want to try it? $ pip install scikit-learn ... ImportError: Numerical Python (NumPy) is not installed. scikit-learn requires NumPy >= 1.6.1 G Varoquaux 15
  • 28. Want to try it? $ pip install scikit-learn ... ImportError: Numerical Python (NumPy) is not installed. scikit-learn requires NumPy >= 1.6.1 C:> pip install numpy ... error: Unable to find vcvarsall.bat G Varoquaux 15
  • 29. Want to try it? $ pip install scikit-learn ... ImportError: Numerical Python (NumPy) is not installed. scikit-learn requires NumPy >= 1.6.1 C:> pip install numpy ... error: Unable to find vcvarsall.bat G Varoquaux 15
  • 30. 1 Weā€™re diļ¬€erent Well fast linear algebra ATLAS (Fortran) 70x faster libfortran.so.3 ?? youā€™re kidding me G Varoquaux 16
  • 31. 1 Weā€™re diļ¬€erent Well fast linear algebra ATLAS (Fortran) 70x faster libfortran.so.3 ?? youā€™re kidding me Packaging is a major roadblock for scientiļ¬c Python A lot of compiled code + shared libraries ā‡’ library + ABI compatibility issues Progress: - Manylinux wheels: PEP 513, RT. McGibbon, NJ. Smith rely on a conservative core set of libs - Openblas: pure-C, fast linear algebra G Varoquaux 16
  • 32. 1 Weā€™re diļ¬€erent But working together gives us awesome things Text mining ā‡’ intelligent interfaces G Varoquaux 17
  • 33. 2 The scientistā€™s view of code Numerics versus control ļ¬‚ow Numerics versus databases Numerics versus strings Numerics versus the world G Varoquaux 18
  • 34. 2 Why we love numpy 100 000 term frequency vs inverse doc frequency: In [*]: %timeit [t * i for t, i in izip(tf, idf)] 100 loops, best of 3: 6.2 ms per loop The numpy style: In [*]: %timeit tf * idf 1000 loops, best of 3: 74.2 Āµs per loop 102 104 106 number of elements 1Āµs 100ns 10ns 1ns timeperelement lists numpy G Varoquaux 19
  • 35. 2 Why we love numpy 100 000 term frequency vs inverse doc frequency: In [*]: %timeit [t * i for t, i in izip(tf, idf)] 100 loops, best of 3: 6.2 ms per loop The numpy style: In [*]: %timeit tf * idf 1000 loops, best of 3: 74.2 Āµs per loop 102 104 106 number of elements 1Āµs 100ns 10ns 1ns timeperelement lists numpy Array computing can be more readable tf * idf vs [t * i for t, i in izip(tf, idf)] G Varoquaux 19
  • 36. 2 arrays are nothing but pointers A numpy array = memory address data type shape strides 03878794797927 01790752701578 94071746124797 54970718717887 13653490495190 74754265358098 48721546349084 90345673245614 78957187745620 stride 2 stride 1 shape 1 shape 2 Represents any regular data in a structured way: how to access elements via pointer arythmetics (computing oļ¬€sets) stride 2 stride 1 shape 1 shape 2 03878794797927 01790752701578 ... G Varoquaux 20
  • 37. 2 arrays are nothing but pointers A numpy array = memory address data type shape strides 03878794797927 01790752701578 94071746124797 54970718717887 13653490495190 74754265358098 48721546349084 90345673245614 78957187745620 stride 2 stride 1 shape 1 shape 2 Represents any regular data in a structured way: how to access elements via pointer arythmetics (computing oļ¬€sets) stride 2 stride 1 shape 1 shape 2 03878794797927 01790752701578 ... Matches the memory model of numerical libraries ā‡’ Enables copyless interactions Numpy is really a memory model G Varoquaux 20
  • 38. 2 Array computing is fast 102 104 106 number of elements 1Āµs 100ns 10ns 1ns timeperelement lists numpy tf idf = tf * idf CPU 03878794797927 01790752701578 *tf_idf=No type checking Direct sequential memory access Vector operations (SIMD) G Varoquaux 21
  • 39. 2 Array computing is limited by CPU starvation 103 104 105 106 number of elements 1ns 2ns 3ns 4ns 5ns timeperelement tf idf = tf * idf CPU 03878794797927 01790752701578 *tf_idf= 2x slowdown passed a certain size G Varoquaux 22
  • 40. 2 Array computing is limited by CPU starvation 103 104 105 106 number of elements 1ns 2ns 3ns 4ns 5ns timeperelement 105 āˆ¼ size of the CPU cache Memory is much slower than CPU tf idf = tf * idf 03878794797927 01790752701578 *tf_idf= 0387879 017 cpu cache 2x slowdown passed a certain size G Varoquaux 22
  • 41. 2 Array computing is limited by CPU starvation 103 104 105 106 number of elements 1ns 2ns 3ns 4ns 5ns timeperelement Memory is much slower than CPU tf idf = tf * idf - 1 It gets worse for complex expressions G Varoquaux 22
  • 42. 2 Array computing is limited by CPU starvation 103 104 105 106 number of elements 1ns 2ns 3ns 4ns 5ns timeperelement Memory is much slower than CPU tf idf = tf * idf - 1 Whatā€™s going on: 1. tmp ā† tf * idf 2. tf idf ā† tmp - 1 Big temporary: Moving data in & out of cache G Varoquaux 22
  • 43. 2 Array computing is limited by CPU starvation tf idf = tf * idf - 1 Whatā€™s going on: 1. tmp ā† tf * idf 2. tf idf ā† tmp - 1 Big temporary: Moving data in & out of cache In [*]: %timeit tf * idf 10000 loops, best of 3: 74.2 Āµs per loop In [*]: %timeit tf * idf - 1 1000 loops, best of 3: 418 Āµs per loop G Varoquaux 22
  • 44. 2 Array computing is limited by CPU starvation tf idf = tf * idf - 1 Whatā€™s going on: 1. tmp ā† tf * idf 2. tf idf ā† tmp - 1 Big temporary: Moving data in & out of cache In [*]: %timeit tf * idf 10000 loops, best of 3: 74.2 Āµs per loop In [*]: %timeit tf * idf - 1 1000 loops, best of 3: 418 Āµs per loop In-place operations: reuse the allocation In [*]: %timeit tmp = tf * idf; tmp -= 1 10000 loops, best of 3: 112 Āµs per loop G Varoquaux 22
  • 45. 2 Array computing is limited by CPU starvation 103 104 105 106 number of elements 1ns 2ns 3ns 4ns 5ns timeperelement numpy np inplace tmp = tf * idf tmp -= 1 tf idf = tf * idf - 1 Whatā€™s going on: 1. tmp ā† tf * idf 2. tf idf ā† tmp - 1 Big temporary: Moving data in & out of cache G Varoquaux 22
  • 46. 2 Array computing is limited by CPU starvation 103 104 105 106 number of elements 1ns 2ns 3ns 4ns 5ns timeperelement numpy np inplace tmp = tf * idf tmp -= 1 tf idf = tf * idf - 1 Whatā€™s going on: 1. tmp ā† tf * idf 2. tf idf ā† tmp - 1 Big temporary: Moving data in & out of cache A compilation problem: tf idf = tf * idf - 1 tf idf = tf * idf tf idf -= 1 G Varoquaux 22
  • 47. 2 Array computing is limited by CPU starvation 103 104 105 106 number of elements 1ns 2ns 3ns 4ns 5ns timeperelement numpy np inplace numexpr tf idf = tf * idf - 1 Whatā€™s going on: 1. tmp ā† tf * idf 2. tf idf ā† tmp - 1 Big temporary: Moving data in & out of cache A compilation problem: ā€¢ Removing/reusing temporaries ā€¢ Operating on ā€œchunksā€ that ļ¬t in cache Addressed by numexpr, with string expressions numexpr.evaluate(ā€™tf * idf - 1ā€™, locals()) G Varoquaux 22
  • 48. 2 Array computing is limited by CPU starvation 103 104 105 106 number of elements 1ns 2ns 3ns 4ns 5ns timeperelement numpy np inplace numexpr tf idf = tf * idf - 1 Whatā€™s going on: 1. tmp ā† tf * idf 2. tf idf ā† tmp - 1 Big temporary: Moving data in & out of cache A compilation problem: ā€¢ Removing/reusing temporaries ā€¢ Operating on ā€œchunksā€ that ļ¬t in cache Addressed by numexpr, with string expressions Addressed by numba, with bytecode inspection lazyarray Similar problem to pagination with SQL queries G Varoquaux 22
  • 49. 2 Array computing is limited by CPU starvation tf idf = tf * idf Too small: overhead Too BIG: Out of cache BIG Data $$$ G Varoquaux 23
  • 50. 2 Numerics versus control ļ¬‚ow What if there is an if tf idf = tf / idf tf idf[idf == 0] = 0 Suppose the we are looking at ages in a population: ages[gender == ā€™maleā€™].mean() - ages[gender == ā€™femaleā€™].mean() G Varoquaux 24
  • 51. 2 Numerics versus control ļ¬‚ow What if there is an if tf idf = tf / idf tf idf[idf == 0] = 0 Suppose the we are looking at ages in a population: ages[gender == ā€™maleā€™].mean() - ages[gender == ā€™femaleā€™].mean() This is really starting to be looking like databases pandas: something in between arrays and an in-memory database Great for queries, less great for numerics. G Varoquaux 24
  • 53. Installation PROBLEMS Beautiful Python COde Routines in Fortran or C++ ScalaBILiTY DeploymentPROBLEMS Beautiful Python COde DATABASE in C++, JAVA, ERLANG... ScalaBILiTY Numpy is the scientistā€™s equivalent to an ORM Gives speed with non-Python code G Varoquaux 25
  • 54. numerics vs databases numerics eļ¬ƒcient on regularly spaced data But numpy creates cache misses for big arrays ā‡’ Need to remove temporaries and chunk data G Varoquaux 26
  • 55. numerics vs databases numerics eļ¬ƒcient on regularly spaced data But numpy creates cache misses for big arrays ā‡’ Need to remove temporaries and chunk data selection and grouping eļ¬ƒcient with indexes or trees ā‡’ Need to group queries Compilation is unpythonic G Varoquaux 26
  • 56. numerics vs databases numerics eļ¬ƒcient on regularly spaced data But numpy creates cache misses for big arrays ā‡’ Need to remove temporaries and chunk data selection and grouping eļ¬ƒcient with indexes or trees ā‡’ Need to group queries Compilation is unpythonic A computation & query language? numexpr I hate domain-speciļ¬c languages (SQL) Numpy is very expressive G Varoquaux 26
  • 57. numerics vs databases numerics eļ¬ƒcient on regularly spaced data But numpy creates cache misses for big arrays ā‡’ Need to remove temporaries and chunk data selection and grouping eļ¬ƒcient with indexes or trees ā‡’ Need to group queries Compilation is unpythonic A computation & query language? numexpr I hate domain-speciļ¬c languages (SQL) Numpy is very expressive PonyORM: Compiling Python to optimized SQL Datascience with SQL: Ibis & Blaze G Varoquaux 26
  • 58. numerics vs databases numerics eļ¬ƒcient on regularly spaced data But numpy creates cache misses for big arrays ā‡’ Need to remove temporaries and chunk data selection and grouping eļ¬ƒcient with indexes or trees ā‡’ Need to group queries Spark: java-world ā€œbig dataā€ rising star combines distributed store + computing model We (scikit-learn) are faster when data ļ¬ts in RAM G Varoquaux 26
  • 59. Operations on chunks, or algorithms on chunks Machine learning, data mining = numerics 03878794797927 03878794797927 G Varoquaux 27
  • 60. Operations on chunks, or algorithms on chunks Machine learning, data mining = numerics 03878794797927 03878794797927 03878794797927 03878794797927 ETL (extract, transform, & load) Multivariate statistics G Varoquaux 27
  • 61. Operations on chunks, or algorithms on chunks Machine learning, data mining = numerics 03878794797927 03878794797927 03878794797927 03878794797927 ETL (extract, transform, & load) Multivariate statistics Out-of-core opera- tions not eļ¬ƒcient: no data locality On-line aglorithms (streaming) 0 3 8 7 8 7 9 4 7 9 7 9 2 7 0 1 7 9 0 7 5 2 7 0 1 5 7 8 9 4 0 7 1 7 4 6 1 2 4 7 9 7 5 4 9 7 0 7 1 8 7 1 7 8 8 7 1 3 6 5 3 4 9 0 4 9 5 1 9 0 7 4 7 5 4 2 6 5 3 5 8 0 9 8 4 8 7 2 1 5 4 6 3 4 9 0 8 4 9 0 3 4 5 6 7 3 2 4 5 6 1 4 7 8 9 5 7 1 8 7 7 4 5 6 2 0 0 3 8 7 8 7 9 4 7 9 7 9 2 7 0 1 7 9 0 7 5 2 7 0 1 5 7 8 9 4 0 7 1 7 4 6 1 2 4 7 9 7 5 4 9 7 0 7 1 8 7 1 7 8 8 7 1 3 6 5 3 4 9 0 4 9 5 1 9 0 7 4 7 5 4 2 6 5 3 5 8 0 9 8 4 8 7 2 1 5 4 6 3 4 9 0 8 4 9 0 3 4 5 6 7 3 2 4 5 6 1 4 7 8 9 5 7 1 8 7 7 4 5 6 2 0 0 3 8 7 8 7 9 4 7 9 7 9 2 7 0 1 7 9 0 7 5 2 7 0 1 5 7 8 9 4 0 7 1 7 4 6 1 2 4 7 9 7 5 4 9 7 0 7 1 8 7 1 7 8 8 7 1 3 6 5 3 4 9 0 4 9 5 1 9 0 7 4 7 5 4 2 6 5 3 5 8 0 9 8 4 8 7 2 1 5 4 6 3 4 9 0 8 4 9 0 3 4 5 6 7 3 2 4 5 6 1 4 7 8 9 5 7 1 8 7 7 4 5 6 2 0 0 3 8 7 8 7 9 4 7 9 7 9 2 7 0 1 7 9 0 7 5 2 7 0 1 5 7 8 9 4 0 7 1 7 4 6 1 2 4 7 9 7 5 4 9 7 0 7 1 8 7 1 7 8 8 7 1 3 6 5 3 4 9 0 4 9 5 1 9 0 7 4 7 5 4 2 6 5 3 5 8 0 9 8 4 8 7 2 1 5 4 6 3 4 9 0 8 4 9 0 3 4 5 6 7 3 2 4 5 6 1 4 7 8 9 5 7 1 8 7 7 4 5 6 2 0 0 3 8 7 8 7 9 4 7 9 7 9 2 7 0 1 7 9 0 7 5 2 7 0 1 5 7 8 9 4 0 7 1 7 4 6 1 2 4 7 9 7 5 4 9 7 0 7 1 8 7 1 7 8 8 7 1 3 6 5 3 4 9 0 4 9 5 1 9 0 7 4 7 5 4 2 6 5 3 5 8 0 9 8 4 8 7 2 1 5 4 6 3 4 9 0 8 4 9 0 3 4 5 6 7 3 2 4 5 6 1 4 7 8 9 5 7 1 8 7 7 4 5 6 2 0 0 3 8 7 8 7 9 4 7 9 7 9 2 7 0 1 7 9 0 7 5 2 7 0 1 5 7 8 9 4 0 7 1 7 4 6 1 2 4 7 9 7 5 4 9 7 0 7 1 8 7 1 7 8 8 7 1 3 6 5 3 4 9 0 4 9 5 1 9 0 7 4 7 5 4 2 6 5 3 5 8 0 9 8 4 8 7 2 1 5 4 6 3 4 9 0 8 4 9 0 3 4 5 6 7 3 2 4 5 6 1 4 7 8 9 5 7 1 8 7 7 4 5 6 2 0 eg stochastic gradient descent As in deep learning G Varoquaux 27
  • 62. Making the data-science magic happens from sklearn import 03078090707907 00790752700578 94071006000797 00970008007000 10000400400090 00050205008000 documents the Python performance profiling module is code can a 03078090707907 00790752700578 94071006000797 topics the Python performance profiling module is code can a 030 007 940 009 100 000 documents topics + What terms are in a topics What documents are in a topics G Varoquaux 28
  • 63. Making the data-science magic happens from sklearn import Turning applied maths papers to robust code High-level, readable, simple syntax reduces cognitive load Thanks G Varoquaux 28
  • 64. 3 Beyond numerics Make #PyData great (again) G Varoquaux 29
  • 65. 3 Data/computation ļ¬‚ow is crucial 03878794797927 03878794797927 03878794797927 03878794797927 0 3 8 7 8 7 9 4 7 9 7 9 2 7 0 1 7 9 0 7 5 2 7 0 1 5 7 8 9 4 0 7 1 7 4 6 1 2 4 7 9 7 5 4 9 7 0 7 1 8 7 1 7 8 8 7 1 3 6 5 3 4 9 0 4 9 5 1 9 0 7 4 7 5 4 2 6 5 3 5 8 0 9 8 4 8 7 2 1 5 4 6 3 4 9 0 8 4 9 0 3 4 5 6 7 3 2 4 5 6 1 4 7 8 9 5 7 1 8 7 7 4 5 6 2 0 0 3 8 7 8 7 9 4 7 9 7 9 2 7 0 1 7 9 0 7 5 2 7 0 1 5 7 8 9 4 0 7 1 7 4 6 1 2 4 7 9 7 5 4 9 7 0 7 1 8 7 1 7 8 8 7 1 3 6 5 3 4 9 0 4 9 5 1 9 0 7 4 7 5 4 2 6 5 3 5 8 0 9 8 4 8 7 2 1 5 4 6 3 4 9 0 8 4 9 0 3 4 5 6 7 3 2 4 5 6 1 4 7 8 9 5 7 1 8 7 7 4 5 6 2 0 0 3 8 7 8 7 9 4 7 9 7 9 2 7 0 1 7 9 0 7 5 2 7 0 1 5 7 8 9 4 0 7 1 7 4 6 1 2 4 7 9 7 5 4 9 7 0 7 1 8 7 1 7 8 8 7 1 3 6 5 3 4 9 0 4 9 5 1 9 0 7 4 7 5 4 2 6 5 3 5 8 0 9 8 4 8 7 2 1 5 4 6 3 4 9 0 8 4 9 0 3 4 5 6 7 3 2 4 5 6 1 4 7 8 9 5 7 1 8 7 7 4 5 6 2 0 0 3 8 7 8 7 9 4 7 9 7 9 2 7 0 1 7 9 0 7 5 2 7 0 1 5 7 8 9 4 0 7 1 7 4 6 1 2 4 7 9 7 5 4 9 7 0 7 1 8 7 1 7 8 8 7 1 3 6 5 3 4 9 0 4 9 5 1 9 0 7 4 7 5 4 2 6 5 3 5 8 0 9 8 4 8 7 2 1 5 4 6 3 4 9 0 8 4 9 0 3 4 5 6 7 3 2 4 5 6 1 4 7 8 9 5 7 1 8 7 7 4 5 6 2 0 0 3 8 7 8 7 9 4 7 9 7 9 2 7 0 1 7 9 0 7 5 2 7 0 1 5 7 8 9 4 0 7 1 7 4 6 1 2 4 7 9 7 5 4 9 7 0 7 1 8 7 1 7 8 8 7 1 3 6 5 3 4 9 0 4 9 5 1 9 0 7 4 7 5 4 2 6 5 3 5 8 0 9 8 4 8 7 2 1 5 4 6 3 4 9 0 8 4 9 0 3 4 5 6 7 3 2 4 5 6 1 4 7 8 9 5 7 1 8 7 7 4 5 6 2 0 0 3 8 7 8 7 9 4 7 9 7 9 2 7 0 1 7 9 0 7 5 2 7 0 1 5 7 8 9 4 0 7 1 7 4 6 1 2 4 7 9 7 5 4 9 7 0 7 1 8 7 1 7 8 8 7 1 3 6 5 3 4 9 0 4 9 5 1 9 0 7 4 7 5 4 2 6 5 3 5 8 0 9 8 4 8 7 2 1 5 4 6 3 4 9 0 8 4 9 0 3 4 5 6 7 3 2 4 5 6 1 4 7 8 9 5 7 1 8 7 7 4 5 6 2 0 Data-ļ¬‚ow engines are everywhere dask pure-Python dynamic scheduler static compiler parallel & distributed theano expression analysis pure-Python tensorflow C library distributed G Varoquaux 30
  • 66. 3 Data/computation ļ¬‚ow is crucial 03878794797927 03878794797927 03878794797927 03878794797927 0 3 8 7 8 7 9 4 7 9 7 9 2 7 0 1 7 9 0 7 5 2 7 0 1 5 7 8 9 4 0 7 1 7 4 6 1 2 4 7 9 7 5 4 9 7 0 7 1 8 7 1 7 8 8 7 1 3 6 5 3 4 9 0 4 9 5 1 9 0 7 4 7 5 4 2 6 5 3 5 8 0 9 8 4 8 7 2 1 5 4 6 3 4 9 0 8 4 9 0 3 4 5 6 7 3 2 4 5 6 1 4 7 8 9 5 7 1 8 7 7 4 5 6 2 0 0 3 8 7 8 7 9 4 7 9 7 9 2 7 0 1 7 9 0 7 5 2 7 0 1 5 7 8 9 4 0 7 1 7 4 6 1 2 4 7 9 7 5 4 9 7 0 7 1 8 7 1 7 8 8 7 1 3 6 5 3 4 9 0 4 9 5 1 9 0 7 4 7 5 4 2 6 5 3 5 8 0 9 8 4 8 7 2 1 5 4 6 3 4 9 0 8 4 9 0 3 4 5 6 7 3 2 4 5 6 1 4 7 8 9 5 7 1 8 7 7 4 5 6 2 0 0 3 8 7 8 7 9 4 7 9 7 9 2 7 0 1 7 9 0 7 5 2 7 0 1 5 7 8 9 4 0 7 1 7 4 6 1 2 4 7 9 7 5 4 9 7 0 7 1 8 7 1 7 8 8 7 1 3 6 5 3 4 9 0 4 9 5 1 9 0 7 4 7 5 4 2 6 5 3 5 8 0 9 8 4 8 7 2 1 5 4 6 3 4 9 0 8 4 9 0 3 4 5 6 7 3 2 4 5 6 1 4 7 8 9 5 7 1 8 7 7 4 5 6 2 0 0 3 8 7 8 7 9 4 7 9 7 9 2 7 0 1 7 9 0 7 5 2 7 0 1 5 7 8 9 4 0 7 1 7 4 6 1 2 4 7 9 7 5 4 9 7 0 7 1 8 7 1 7 8 8 7 1 3 6 5 3 4 9 0 4 9 5 1 9 0 7 4 7 5 4 2 6 5 3 5 8 0 9 8 4 8 7 2 1 5 4 6 3 4 9 0 8 4 9 0 3 4 5 6 7 3 2 4 5 6 1 4 7 8 9 5 7 1 8 7 7 4 5 6 2 0 0 3 8 7 8 7 9 4 7 9 7 9 2 7 0 1 7 9 0 7 5 2 7 0 1 5 7 8 9 4 0 7 1 7 4 6 1 2 4 7 9 7 5 4 9 7 0 7 1 8 7 1 7 8 8 7 1 3 6 5 3 4 9 0 4 9 5 1 9 0 7 4 7 5 4 2 6 5 3 5 8 0 9 8 4 8 7 2 1 5 4 6 3 4 9 0 8 4 9 0 3 4 5 6 7 3 2 4 5 6 1 4 7 8 9 5 7 1 8 7 7 4 5 6 2 0 0 3 8 7 8 7 9 4 7 9 7 9 2 7 0 1 7 9 0 7 5 2 7 0 1 5 7 8 9 4 0 7 1 7 4 6 1 2 4 7 9 7 5 4 9 7 0 7 1 8 7 1 7 8 8 7 1 3 6 5 3 4 9 0 4 9 5 1 9 0 7 4 7 5 4 2 6 5 3 5 8 0 9 8 4 8 7 2 1 5 4 6 3 4 9 0 8 4 9 0 3 4 5 6 7 3 2 4 5 6 1 4 7 8 9 5 7 1 8 7 7 4 5 6 2 0 Data-ļ¬‚ow engines are everywhere Python should shine there: reļ¬‚exivity + metaprogramming + async ā€œPython is the best numerical language out there because itā€™s not a numerical language.ā€ ā€“ Nathaniel Smith API challenging: For algorithm design: no framework / inversion of control G Varoquaux 30
  • 67. 3 Ingredients for future data ļ¬‚ows Distributed computation & Run-time analysis Reļ¬‚exivity is central Debugging Interactive work Code analysis Persistence 03878794797927 03878794797927 Parallel computing G Varoquaux 31
  • 68. 3 Ingredients for future data ļ¬‚ows Distributed computation & Run-time analysis Reļ¬‚exivity is central Debugging Interactive work Code analysis Persistence 03878794797927 03878794797927 Parallel computing Pickle distribute code and data without data model serialize intermediate results deep of hash of any data structure joblib.hash Very limited (eg no lambda #19272) ā‡’ variants: dill, cloudpickle G Varoquaux 31
  • 69. 3 Ingredients for future data ļ¬‚ows Distributed computation & Run-time analysis Reļ¬‚exivity is central Debugging Interactive work Code analysis Persistence 03878794797927 03878794797927 Parallel computing Pickle distribute code and data without data model serialize intermediate results deep of hash of any data structure joblib.hash joblib: Simple parallel syntax: Parallel(n jobs=2)(delayed(sqrt)(i) for i in range(10)) Fast persistence: joblib.dump(anything, ā€™filename.pkl.gzā€™) Primitive for out of core: pointer = mem.cache(f).call and shelves(big data) ā€¢Non-invasive syntax / paradigm ā€¢Fast on big numpy arrays ā€¢Soon backend system (job broker and persistence) Gets job managment into algorithms (eg in scikit-learn) G Varoquaux 31
  • 70. 3 The Python VM is great The simplicity of the VM is our strength Software Transactional Memory... would be nice But, I want to use foreign memory Java gained jmalloc for foreign memory Better garbage collection Yes but, I easily plug into reference counting A strength of Python is its clear C API ā‡’ Easy foreign functionality G Varoquaux 32
  • 71. 3 The Python VM is great The simplicity of the VM is our strength Software Transactional Memory... would be nice But, I want to use foreign memory Java gained jmalloc for foreign memory Better garbage collection Yes but, I easily plug into reference counting A strength of Python is its clear C API ā‡’ Easy foreign functionality Cython: the best of C and Python Add types for speed (numpy arrays as float*) Call C to bind external libraries: surprisingly easy no pointer arithmetics An adaptation layer between Python VM and C G Varoquaux 32
  • 72. 4 Working together G Varoquaux 33
  • 73. 4 Scikit-learn is easy machine learning As easy as py from s k l e a r n import svm c l a s s i f i e r = svm.SVC() c l a s s i f i e r . f i t ( X t r a i n , Y t r a i n ) Y t e s t = c l a s s i f i e r . p r e d i c t ( X t e s t ) People love the encapsulation classifier is a semi black box The power of a simple object-oriented API Documentation-driven development G Varoquaux 34
  • 74. 4 Scikit-learn is easy machine learning As easy as py from s k l e a r n import svm c l a s s i f i e r = svm.SVC() c l a s s i f i e r . f i t ( X t r a i n , Y t r a i n ) Y t e s t = c l a s s i f i e r . p r e d i c t ( X t e s t ) People love the encapsulation classifier is a semi black box The power of a simple object-oriented API Documentation-driven development High-level, readable, simple API reduces cognitive load PyData loves Python in return G Varoquaux 34
  • 75. 4 Diļ¬€erence is richness We all do diļ¬€erent things We can all beneļ¬t from others though we donā€™t know how G Varoquaux 35
  • 76. 4 Diļ¬€erence is richness, but requires outreach We all do diļ¬€erent things We can all beneļ¬t from others though we donā€™t know how Being didactic outside oneā€™s community is crucial Avoiding jargon take that machine learning Prioritizing information ā€œSimple is better than complexā€ Students learning numerics donā€™t care about unicode Build documentation upon very simple examples Think stackoverļ¬‚ow Sphinx + Sphinx-gallery G Varoquaux 35
  • 77. @GaelVaroquaux Scientist web dev: Python is the language for data Python language & VM is perfect to manipulate low-level constructs with high-level wordings Connects to other paradigms, eg C
  • 78. @GaelVaroquaux Scientist web dev: Python is the language for data Python language & VM is perfect to manipulate low-level constructs with high-level wordings Dynamism and reļ¬‚exivity ā‡’ meta-programming and debugging
  • 79. @GaelVaroquaux Scientist web dev: Python is the language for data Python language & VM is perfect to manipulate low-level constructs with high-level wordings Dynamism and reļ¬‚exivity ā‡’ meta-programming and debugging Needs for compilation and dynamism: a diļ¬ƒcult balance PEP 509: guards on run-time modiļ¬cation PEP 510: function specicalization
  • 80. @GaelVaroquaux Scientist web dev: Python is the language for data Python language & VM is perfect to manipulate low-level constructs with high-level wordings Dynamism and reļ¬‚exivity ā‡’ meta-programming and debugging Needs for compilation and dynamism Pydata will use DB and concurrency from web PyData can give knowledge engineering + AI