Crushing the Head of the Snake
Robert Brewer
Chief Architect
How to Time
from timeit import Timer
>>> range(5)
[0, 1, 2, 3, 4]
>>> t = Timer("range(a)", "a = 1000000")
>>> t.timeit(1)
>>> t.timeit(100)
>>> t.timeit(1000)
Comparing algorithms
>>> Timer("range(1000)").timeit(1000000)
>>> Timer("range(1000)").timeit()
>>> Timer("xrange(1000)").timeit()
>>> Timer("list(xrange(1000))").timeit()
Caveat: Overhead
>>> Timer().timeit(1000000)
Caveat: Wall time not CPU time
>>> Timer("xrange(1000)").timeit()
>>> Timer("xrange(1000)").repeat(3)
Take the minimum of the repeats.
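Here is a minimal Python 3 sketch of the same pattern. The slides use Python 2, where `range` builds a list and `xrange` is lazy. `timeit` measures wall-clock time, so any single run can be inflated by whatever else the machine is doing. Repeating the measurement and keeping the minimum filters out most of that noise. The helper name `best_time` and the repeat/number defaults are illustrative choices.

```python
import timeit


def best_time(stmt, setup="pass", repeat=3, number=1000):
    """Return the fastest of `repeat` runs, each executing `stmt` `number` times."""
    timer = timeit.Timer(stmt, setup)
    # repeat() returns one total (in seconds) per run; the minimum is the
    # run least disturbed by other activity on the machine.
    return min(timer.repeat(repeat=repeat, number=number))


if __name__ == "__main__":
    # In Python 3, range() is lazy like Python 2's xrange(); list() materializes it.
    lazy = best_time("range(1000)")
    eager = best_time("list(range(1000))")
    print(f"range(1000):       {lazy:.6f} s per 1000 calls")
    print(f"list(range(1000)): {eager:.6f} s per 1000 calls")
```

To measure CPU time instead of wall time, `timeit.Timer` also accepts a `timer=time.process_time` argument.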
How to Profile
>>> import mod
>>> import cProfile
>>> cProfile.run("mod.b()", sort="cumulative")
(make changes to module)
>>> reload(mod)
>>> cProfile.run("mod.b()", sort="cumulative")
How to Profile
>>> cProfile.run("for i in xrange(3000): range(i).sort()", sort="cumulative")
6002 function calls in 0.093 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(func)
1 0.019 0.019 0.093 0.093 <string>:1(<module>)
3000 0.052 0.000 0.052 0.000 {list.sort}
3000 0.022 0.000 0.022 0.000 {range}
1 0.000 0.000 0.000 0.000 {method 'disable' of
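The following is a Python 3 sketch of the same profiling session. `cProfile.Profile.runcall` profiles a single call, and `pstats.Stats` sorts and prints the results. Writing the report into a `StringIO` buffer makes it easy to inspect or log. The `sort_many` workload mirrors the slide's `range(i).sort()` loop; Python 3 needs `list(range(i))` for this. `profile_report` is an illustrative helper name.

```python
import cProfile
import io
import pstats


def sort_many(n):
    """Same workload as the slide: build and sort lists of growing length."""
    for i in range(n):
        list(range(i)).sort()
    return n


def profile_report(func, *args, sort="cumulative", limit=10):
    """Profile func(*args); return (result, report text sorted by `sort`)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats(sort).print_stats(limit)
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile_report(sort_many, 3000)
    print(report)
```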
Example: Standard Deviation
>>> import numpy
>>> n = 100
>>> a = numpy.array(xrange(n), dtype=float)
>>> a.std(ddof=1)
Example: Standard Deviation
>>> n = 4000000000
>>> a = numpy.array(xrange(n), dtype=float)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: setting an array element with a sequence.
Example: Standard Deviation
>>> n = 4000000000
>>> arr = numpy.zeros(n, dtype=float)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
Example: Standard Deviation
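Given array $A$ broken into $n$ parts $a_1, \ldots, a_n$, with part means $\bar{a}_i$, overall mean $\bar{A}$, and local variance (sum of squared deviations)
$$V(a_i) = \sum_j (a_{ij} - \bar{a}_i)^2,$$
the standard deviation of the whole array is
$$\sigma(A) = \sqrt{\frac{\sum_{i=1}^{n} \left[ V(a_i) + 2\Big(\sum_j a_{ij}\Big)(\bar{a}_i - \bar{A}) + |a_i|\left(\bar{A}^2 - \bar{a}_i^2\right) \right]}{|A| - \mathrm{ddof}}}$$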
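The following is a quick numeric check of the identity behind this formula, assuming plain Python lists of floats. For each part, $V(a_i)$ plus the adjustment term simplifies to $|a_i|(\bar{a}_i - \bar{A})^2$ added to $V(a_i)$. That total equals the part's squared deviations measured from the global mean, so summing over the parts gives the numerator for the whole array. The function names are illustrative and do not come from the slides.

```python
def adjusted_varsum(part, global_mean):
    """Return V(a_i) plus the adjustment term for one part.

    Algebraically this equals sum((x - global_mean) ** 2 for x in part).
    """
    T = float(sum(part))          # sum_j a_ij
    L = float(len(part))          # |a_i|
    m = T / L                     # part mean
    V = sum((x - m) ** 2 for x in part)
    return V + 2 * T * (m - global_mean) + (global_mean ** 2 - m ** 2) * L


def direct_varsum(values, global_mean):
    """Squared deviations from the global mean, computed directly."""
    return sum((x - global_mean) ** 2 for x in values)


if __name__ == "__main__":
    parts = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0], [7.5]]
    count = sum(len(p) for p in parts)
    g = sum(sum(p) for p in parts) / count
    combined = sum(adjusted_varsum(p, g) for p in parts)
    direct = direct_varsum([x for p in parts for x in p], g)
    print(combined, direct)
```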
Example: Standard Deviation
def run():
    points = 400000  # 4000000000 (4 billion) for the full-size run
    segments = 100
    part_len = points // segments
    partitions = []
    for p in range(segments):
        part = range(part_len * p,
                     part_len * (p + 1))
        partitions.append(part)
    return stddev(partitions, ddof=1)
Example: Standard Deviation
import math

# total(), length(), and varsum() are the deck's helper functions.
def stddev(partitions, ddof=0):
    final = 0.0
    for part in partitions:
        m = total(part) / length(part)
        # Find the mean of the entire group.
        gtotal = total([total(p) for p in partitions])
        glength = total([length(p) for p in partitions])
        g = gtotal / glength
        adj = ((2 * total(part) * (m - g)) +
               ((g ** 2 - m ** 2) * length(part)))
        final += varsum(part) + adj
    return math.sqrt(final / (glength - ddof))
Example: Standard Deviation
2052106 function calls in 71.025 seconds
ncalls tottime percall cumtime percall filename:lineno(func)
1 0.000 0.000 71.023 71.023
1 0.006 0.006 71.013 71.013
410400 63.406 0.000 70.490 0.000
100 0.341 0.003 69.178 0.692
410601 7.076 0.000 7.076 0.000 {range}
410200 0.151 0.000 0.174 0.000
820700 0.042 0.000 0.042 0.000 {len}
100 0.000 0.000 0.000 0.000 {list.append}
1 0.000 0.000 0.000 0.000 {math.sqrt}
Example: Standard Deviation
400,000 rows in 71.025 seconds.
Assuming no other effects of scale,
it will take 197.3 hours (over 8 days)
to calculate our 4 billion-row array.
Example: Standard Deviation
Can we calculate our 4 billion-row array in 1 minute?
That’s 400,000 rows in 6 ms.
All we need is an 11,837.5x speedup.
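The arithmetic behind these targets is spelled out below. Like the slide, it assumes that runtime scales linearly with row count.

```python
# Figures from the profile above; linear scaling with row count is assumed.
baseline_rows = 400_000
baseline_seconds = 71.025
target_rows = 4_000_000_000
target_seconds = 60.0

scale = target_rows / baseline_rows                   # 10,000x more rows
projected_hours = baseline_seconds * scale / 3600     # ~197.3 hours
budget_seconds = target_seconds / scale               # time allowed per 400,000 rows
required_speedup = baseline_seconds / budget_seconds  # ~11,837.5x

print(f"projected: {projected_hours:.1f} h ({projected_hours / 24:.1f} days)")
print(f"budget: {budget_seconds * 1000:.0f} ms per {baseline_rows:,} rows")
print(f"required speedup: {required_speedup:,.1f}x")
```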
Amongst Our Weaponry
Extracting loop invariants
Extracting Loop Invariants
def varsum(arr):
    vs = 0
    for j in range(len(arr)):
        mean = (total(arr) / length(arr))
        vs += (arr[j] - mean) ** 2
    return vs
Extracting Loop Invariants
def varsum(arr):
    vs = 0
    mean = (total(arr) / length(arr))
    for j in range(len(arr)):
        vs += (arr[j] - mean) ** 2
    return vs
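Here is the same hoisting in a self-contained Python 3 form. It uses `len` in place of the deck's `length` helper and a hand-written `total`, so the repeated work stays visible. Both versions return the same value; only the amount of work differs, O(n²) versus O(n). All names are illustrative.

```python
import timeit


def total(arr):
    """Hand-written sum, as in the deck's original total()."""
    s = 0
    for x in arr:
        s += x
    return s


def varsum_slow(arr):
    """Recomputes the loop-invariant mean on every iteration: O(n^2) work."""
    vs = 0.0
    for x in arr:
        mean = total(arr) / len(arr)
        vs += (x - mean) ** 2
    return vs


def varsum_hoisted(arr):
    """Computes the mean once, before the loop: O(n) work."""
    mean = total(arr) / len(arr)
    vs = 0.0
    for x in arr:
        vs += (x - mean) ** 2
    return vs


if __name__ == "__main__":
    data = list(range(1000))
    for fn in (varsum_slow, varsum_hoisted):
        best = min(timeit.repeat(lambda: fn(data), repeat=3, number=1))
        print(f"{fn.__name__}: {best:.4f} s")
```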
Extracting Loop Invariants
52606 calls in 1.944 seconds (36x)
ncalls tottime percall cumtime percall filename:lineno(func)
1 0.000 0.000 1.942 1.942
1 0.006 0.006 1.932 1.932
10500 1.673 0.000 1.859 0.000
10701 0.196 0.000 0.196 0.000 {range}
100 0.062 0.001 0.081 0.001
10300 0.003 0.000 0.003 0.000
20900 0.001 0.000 0.001 0.000 {len}
100 0.000 0.000 0.000 0.000 {list.append}
1 0.000 0.000 0.000 0.000 {math.sqrt}
still 5.4 hrs
Extracting Loop Invariants
def stddev(partitions, ddof=0):
    final = 0.0
    # Find the mean of the entire group.
    gtotal = total([total(p) for p in partitions])
    glength = total([length(p) for p in partitions])
    g = gtotal / glength
    for part in partitions:
        m = total(part) / length(part)
        adj = ((2 * total(part) * (m - g)) +
               ((g ** 2 - m ** 2) * length(part)))
        final += varsum(part) + adj
    return math.sqrt(final / (glength - ddof))
Extracting Loop Invariants
2512 function calls in 0.142 seconds (13x)
ncalls tottime percall cumtime percall filename:lineno(func)
1 0.000 0.000 0.140 0.140
1 0.000 0.000 0.136 0.136
100 0.063 0.001 0.082 0.001
402 0.064 0.000 0.071 0.000
603 0.013 0.000 0.013 0.000 {range}
400 0.000 0.000 0.000 0.000
902 0.000 0.000 0.000 0.000 {len}
100 0.000 0.000 0.000 0.000 {list.append}
1 0.000 0.000 0.000 0.000 {math.sqrt}
still 23 minutes
Amongst Our Weaponry
Use builtin Python functions
whenever possible
Use Python Builtins
def total(arr):
    s = 0
    for j in range(len(arr)):
        s += arr[j]
    return s
def total(arr):
    return sum(arr)
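The following is a side-by-side Python 3 timing of the two `total` versions. `sum` runs its loop in C, which removes the interpreter's per-element bytecode overhead. Exact ratios depend on the machine and Python version.

```python
import timeit


def total_loop(arr):
    """Hand-written accumulation, as in the original total()."""
    s = 0
    for j in range(len(arr)):
        s += arr[j]
    return s


def total_builtin(arr):
    """Delegates the loop to the C-implemented builtin sum()."""
    return sum(arr)


if __name__ == "__main__":
    data = list(range(4000))
    for fn in (total_loop, total_builtin):
        best = min(timeit.repeat(lambda: fn(data), repeat=3, number=200))
        print(f"{fn.__name__}: {best:.4f} s per 200 calls")
```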
Use Python Builtins
2110 function calls in 0.096 seconds (1.47x)
ncalls tottime percall cumtime percall filename:lineno(func)
1 0.000 0.000 0.093 0.093
1 0.000 0.000 0.083 0.083
100 0.065 0.001 0.070 0.001
402 0.000 0.000 0.015 0.000
402 0.015 0.000 0.015 0.000 {sum}
201 0.012 0.000 0.012 0.000 {range}
400 0.000 0.000 0.000 0.000
500 0.000 0.000 0.000 0.000 {len}
100 0.000 0.000 0.000 0.000 {list.append}
1 0.000 0.000 0.000 0.000 {math.sqrt}
still 16 minutes
Use Python Builtins
def varsum(arr):
    mean = (total(arr) / length(arr))
    return sum((v - mean) ** 2
               for v in arr)
Use Python Builtins
402110 function calls in 0.122 seconds
1.27x slower
ncalls tottime percall cumtime percall filename:lineno(func)
1 0.000 0.000 0.120 0.120
1 0.000 0.000 0.115 0.115
502 0.044 0.000 0.114 0.000 {sum}
100 0.000 0.000 0.106 0.001
400100 0.070 0.000 0.070 0.000
402 0.000 0.000 0.011 0.000
Amongst Our Weaponry
Reduce function calls
Reduce Function Calls
>>> Timer("sum(a)", "a = range(10)").repeat(3)
>>> Timer("total(a)", "a = range(10); total = lambda x: sum(x)").repeat(3)
0.000000059 seconds per call
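This overhead figure can be reproduced in Python 3. Time the same two statements, then divide the difference by the number of executions. The result varies with interpreter version and hardware, so treat any number it prints as machine-specific. `per_call_overhead` is an illustrative name.

```python
import timeit


def per_call_overhead(repeat=3, number=200_000):
    """Estimate the extra cost of calling sum() through one Python-level wrapper."""
    setup = "a = list(range(10)); total = lambda x: sum(x)"
    direct = min(timeit.repeat("sum(a)", setup, repeat=repeat, number=number))
    wrapped = min(timeit.repeat("total(a)", setup, repeat=repeat, number=number))
    # Seconds per call attributable to the lambda frame; it can come out
    # slightly negative on a noisy machine when `number` is small.
    return (wrapped - direct) / number


if __name__ == "__main__":
    print(f"about {per_call_overhead() * 1e9:.0f} ns per extra call")
```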
Reduce Function Calls
def variances_squared(arr):
    mean = (total(arr) / length(arr))
    for v in arr:
        yield (v - mean) ** 2
Reduce Function Calls
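# Before: generator expression (one generator resume per element)
def varsum(arr):
    mean = (total(arr) / length(arr))
    return sum( (v - mean) ** 2
                for v in arr )
# After: list comprehension (builds the list, then a single sum call)
def varsum(arr):
    mean = (total(arr) / length(arr))
    return sum([(v - mean) ** 2
                for v in arr])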
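The list comprehension wins here for a specific reason. The generator expression is resumed once per element, which the earlier profile shows as roughly 400,000 extra calls. The list comprehension, by contrast, runs its loop in a single frame and hands `sum` a finished list. The trade-off is memory, because the list holds every squared deviation at once. Below is a small Python 3 comparison; the relative timings depend on the interpreter version and input size.

```python
import timeit


def varsum_gen(arr):
    """Sum of squared deviations, fed to sum() by a generator expression."""
    mean = sum(arr) / len(arr)
    return sum((v - mean) ** 2 for v in arr)


def varsum_list(arr):
    """Same computation, but builds the full list before summing."""
    mean = sum(arr) / len(arr)
    return sum([(v - mean) ** 2 for v in arr])


if __name__ == "__main__":
    data = [float(x) for x in range(4000)]
    for fn in (varsum_gen, varsum_list):
        best = min(timeit.repeat(lambda: fn(data), repeat=3, number=200))
        print(f"{fn.__name__}: {best:.4f} s per 200 calls")
```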
Reduce Function Calls
2010 function calls in 0.082 seconds (1.17x)
ncalls tottime percall cumtime percall filename:lineno(func)
1 0.000 0.000 0.080 0.080
1 0.000 0.000 0.071 0.071
100 0.050 0.001 0.056 0.001
502 0.020 0.000 0.020 0.000 {sum}
402 0.000 0.000 0.016 0.000
101 0.009 0.000 0.009 0.000 {range}
400 0.000 0.000 0.000 0.000
400 0.000 0.000 0.000 0.000 {len}
100 0.000 0.000 0.000 0.000 {list.append}
1 0.000 0.000 0.000 0.000 {math.sqrt}
still 13+ minutes
Amongst Our Weaponry
Vector operations
with NumPy
Vector Operations
part = numpy.array(
    xrange(...), dtype=float)
def total(arr):
    return arr.sum()
def varsum(arr):
    return (
        (arr - arr.mean()) ** 2).sum()
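Below is a self-contained version of the vectorized helpers, assuming NumPy is installed. It also swaps in `numpy.arange` to build each partition, which the slides do not do. In the following profile, most of the remaining time (0.051 of 0.057 s) goes to `numpy.array` consuming a Python iterator. `arange` fills the array in C instead. `build_partition` is a hypothetical helper.

```python
import numpy as np


def build_partition(p, part_len):
    """Partition p as a float array, filled in C rather than from a Python iterator."""
    return np.arange(part_len * p, part_len * (p + 1), dtype=float)


def total(arr):
    return arr.sum()


def varsum(arr):
    # One vectorized pass: subtract the mean, square, and reduce, all in C.
    return ((arr - arr.mean()) ** 2).sum()


if __name__ == "__main__":
    part = build_partition(3, 4000)
    print(total(part), varsum(part))
```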
Vector Operations
3408 function calls in 0.057 seconds (1.43x)
ncalls tottime percall cumtime percall filename:lineno(func)
1 0.000 0.000 0.057 0.057
200 0.051 0.000 0.051 0.000 {numpy...array}
1 0.001 0.001 0.006 0.006
500 0.003 0.000 0.003 0.000 {numpy.ufunc.reduce}
100 0.001 0.000 0.003 0.000
400 0.000 0.000 0.003 0.000 {numpy.ndarray.sum}
300 0.000 0.000 0.002 0.000
100 0.000 0.000 0.001 0.000 {numpy.ndarray.mean}
still 9.5 minutes
Vector Operations
3408 function calls in 0.006 seconds (13.6x)
ncalls tottime percall cumtime percall filename:lineno(func)
1 0.001 0.001 0.006 0.006
500 0.003 0.000 0.003 0.000 {numpy.ufunc.reduce}
100 0.001 0.000 0.003 0.000
400 0.000 0.000 0.003 0.000 {numpy.ndarray.sum}
300 0.000 0.000 0.002 0.000
100 0.000 0.000 0.001 0.000 {numpy.ndarray.mean}
should be exactly 1 minute
Vector Operations
Let’s try 4 billion!
Bump up that N...
Vector Operations
Oh, yeah...
Amongst Our Weaponry
import math
from multiprocessing import Pool

import numpy

points = 4000000000
segments = 100
part_len = points // segments

def run():
    # Each worker process summarizes one segment; the parent combines them.
    results = Pool().map(
        run_one, range(segments))
    result = stddev(results)
    return result

def run_one(i):
    # Segment i is read from its own pre-written file, stddev.<i>.
    p = numpy.memmap(
        'stddev.%d' % i, dtype=float,
        mode='r', shape=(part_len,))
    T, L = p.sum(), float(len(p))
    m = T / L
    V = ((p - m) ** 2).sum()
    return T, L, V

def stddev(TLVs, ddof=0):
    final = 0.0
    totals = [T for T, L, V in TLVs]
    lengths = [L for T, L, V in TLVs]
    glength = sum(lengths)
    g = sum(totals) / glength
    for T, L, V in TLVs:
        m = T / L
        adj = ((2 * T * (m - g)) + ((g ** 2 - m ** 2) * L))
        final += V + adj
    return math.sqrt(final / (glength - ddof))
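Below is the map/reduce shape of this design, reduced to a self-contained Python 3 sketch. Each worker summarizes its chunk as a `(T, L, V)` triple, which is tiny to send back to the parent, and the parent then applies the combining formula. In this sketch, chunks are generated from hypothetical `(start, stop)` bounds rather than read from memmap files. The function names are illustrative.

```python
import math
from multiprocessing import Pool


def chunk_stats(bounds):
    """Map step: summarize one chunk as (total T, length L, variance sum V)."""
    start, stop = bounds
    chunk = range(start, stop)  # stand-in for reading the chunk from disk
    T, L = float(sum(chunk)), float(len(chunk))
    m = T / L
    V = sum((x - m) ** 2 for x in chunk)
    return T, L, V


def combine(TLVs, ddof=0):
    """Reduce step: merge per-chunk triples into one standard deviation."""
    TLVs = list(TLVs)
    glength = sum(L for _, L, _ in TLVs)
    g = sum(T for T, _, _ in TLVs) / glength
    final = sum(V + 2 * T * (T / L - g) + (g ** 2 - (T / L) ** 2) * L
                for T, L, V in TLVs)
    return math.sqrt(final / (glength - ddof))


def parallel_stddev(points, segments, ddof=0, processes=None):
    """Fan the chunks out to worker processes, then combine in the parent."""
    part_len = points // segments
    bounds = [(part_len * i, part_len * (i + 1)) for i in range(segments)]
    with Pool(processes) as pool:
        TLVs = pool.map(chunk_stats, bounds)
    return combine(TLVs, ddof)
```

Call it as `parallel_stddev(400_000, 100, ddof=1)`. On platforms that start workers by spawning rather than forking (Windows, and macOS by default), put that call under `if __name__ == "__main__":`. `chunk_stats` lives at module level so it can be pickled and sent to the workers.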
3734 function calls in 0.024 seconds
6x slower
ncalls tottime percall cumtime percall filename:lineno(func)
1 0.000 0.000 0.024 0.024
4 0.000 0.000 0.011 0.003
22 0.011 0.000 0.011 0.000 {thread.lock.acquire}
1 0.000 0.000 0.011 0.011
1 0.000 0.000 0.008 0.008
4 0.001 0.000 0.005 0.001
1 0.003 0.003 0.005 0.005
4 0.000 0.000 0.004 0.001
4 0.003 0.001 0.003 0.001 {posix.fork}
Could that waiting be insignificant
when we scale up to 4 billion?
Let’s try it!
3766 function calls in 67.811 seconds
ncalls tottime percall cumtime percall filename:lineno(func)
1 0.000 0.000 67.811 67.811
4 0.000 0.000 67.747 16.930
22 67.747 3.079 67.747 3.079 {thread.lock.acquire}
1 0.000 0.000 67.747 67.747
1 0.000 0.000 0.062 0.060
4 0.000 0.000 0.058 0.014
4 0.057 0.014 0.057 0.014 {posix.fork}
1 0.003 0.003 0.005 0.005
2 0.002 0.001 0.002 0.001 {sum}
SO CLOSE! 1.13 minutes
def run_one(i):
    if i == 50:
        cProfile.runctx(..., "prf.50")
>>> import pstats
>>> s = pstats.Stats("prf.50")
>>> s.sort_stats("cumulative")
<pstats.Stats instance at 0x2bddcb0>
>>> _.print_stats()
57 function calls in 2.804 seconds
ncalls tottime percall cumtime percall filename:lineno(func)
1 0.431 0.431 2.791 2.791
2 0.000 0.000 2.360 1.180 numpy.ndarray.sum
2 2.360 1.180 2.360 1.180 numpy.ufunc.reduce
1 0.000 0.000 0.000 0.000
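Profiling from the parent only shows time spent waiting on worker results (`thread.lock.acquire`). That is why the slide profiles inside one worker and saves the stats to a file. Below is a Python 3 sketch of that technique. It uses `cProfile.runctx`, which executes a statement in explicit global and local namespaces and writes raw stats to a filename. `work`, `profile_dir`, and `profile_index` are hypothetical stand-ins for the real per-chunk computation.

```python
import cProfile
import os
import pstats


def work(n):
    """Stand-in for the per-chunk computation."""
    return sum(x * x for x in range(n))


def run_one(i, profile_dir, profile_index=50, n=10_000):
    """Process chunk i; profile only chunk `profile_index`, saving stats to disk."""
    if i != profile_index:
        return work(n)
    path = os.path.join(profile_dir, "prf.%d" % i)
    namespace = {"work": work, "n": n}
    # The statement runs in `namespace`, so its result can be read back out.
    cProfile.runctx("result = work(n)", namespace, namespace, path)
    return namespace["result"]


def show_profile(path, sort="cumulative", limit=10):
    """Load saved stats (e.g. later, in the parent) and print the top entries."""
    pstats.Stats(path).sort_stats(sort).print_stats(limit)
```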
200 seconds / 4 cores = 50 seconds
Parallelization? Serialization!
67.8 seconds for 4 billion rows, but
~50 of those are loading data!
17.8 seconds to do the actual math.
import bloscpack as bp
bargs = dict(bp.args.DEFAULT_BLOSC_ARGS)  # copy, so the library default is not modified
bargs['clevel'] = 6
part, fname, blosc_args=bargs)
part = bp.unpack_ndarray_file(fname)
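Compression trades CPU for I/O. When reading the data dominates, fewer bytes on disk can mean less total time, provided decompression costs less than the reads it saves. The sketch below illustrates the round trip with the standard library's `zlib` instead of bloscpack. Blosc is designed to be much faster and adds byte shuffling, so its ratios and speeds will differ. `pack_floats`/`unpack_floats` are hypothetical helpers, and level 6 only loosely mirrors `clevel = 6`.

```python
import array
import zlib


def pack_floats(values, level=6):
    """Serialize values as C doubles and compress them."""
    raw = array.array("d", values).tobytes()
    return zlib.compress(raw, level)


def unpack_floats(blob):
    """Decompress bytes back into an array of doubles."""
    out = array.array("d")
    out.frombytes(zlib.decompress(blob))
    return out


if __name__ == "__main__":
    values = [float(x) for x in range(100_000)]
    blob = pack_floats(values)
    print(f"{len(values) * 8:,} bytes raw -> {len(blob):,} bytes compressed")
```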
Let’s try it!
I Crush Your Head!
1153 function calls in 26.166 seconds
ncalls tottime percall cumtime percall filename:lineno(func)
1 0.000 0.000 26.166 26.166
4 0.000 0.000 26.134 6.53
22 26.134 1.188 26.134 1.188 thread.lock.acquire
1 0.000 0.000 26.133 26.133
1 0.000 0.000 26.133 26.133
1 0.000 0.000 26.133 26.133
1 0.003 0.003 0.030 0.030
1 0.000 0.000 0.021 0.021
I Crush Your Head!
With some time-tested general
programming techniques:
Extract loop invariants
Use language builtins
Reduce function calls
I Crush Your Head!
And some Python libraries
for architectural improvements:
Use NumPy for vector ops
Use multiprocessing for parallelization
Use bloscpack for compression
I Crush Your Head!
We sped up our calculation
so that it runs in:
0.003% of the time
or 27317 times faster
4.4 orders of magnitude
Crushing the Head of the Snake
Any questions?

Crushing the Head of the Snake, Robert Brewer, PyData SV 2014

The Ring programming language version 1.5.3 book - Part 25 of 184Mahmoud Samir Fayed
User Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryUser Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryDatabricks
User Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryUser Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryDatabricks
Introduction to Neural Networks and Deep Learning from Scratch
Introduction to Neural Networks and Deep Learning from ScratchIntroduction to Neural Networks and Deep Learning from Scratch
Introduction to Neural Networks and Deep Learning from ScratchAhmed BESBES

Similar to Crushing the Head of the Snake by Robert Brewer PyData SV 2014 (20)

The Ring programming language version 1.3 book - Part 16 of 88
The Ring programming language version 1.3 book - Part 16 of 88The Ring programming language version 1.3 book - Part 16 of 88
The Ring programming language version 1.3 book - Part 16 of 88
The Ring programming language version 1.5.1 book - Part 23 of 180
The Ring programming language version 1.5.1 book - Part 23 of 180The Ring programming language version 1.5.1 book - Part 23 of 180
The Ring programming language version 1.5.1 book - Part 23 of 180
The Ring programming language version 1.5.2 book - Part 24 of 181
The Ring programming language version 1.5.2 book - Part 24 of 181The Ring programming language version 1.5.2 book - Part 24 of 181
The Ring programming language version 1.5.2 book - Part 24 of 181
The Ring programming language version 1.10 book - Part 33 of 212
The Ring programming language version 1.10 book - Part 33 of 212The Ring programming language version 1.10 book - Part 33 of 212
The Ring programming language version 1.10 book - Part 33 of 212
Learn Matlab
Learn MatlabLearn Matlab
Learn Matlab
The Ring programming language version 1.2 book - Part 14 of 84
The Ring programming language version 1.2 book - Part 14 of 84The Ring programming language version 1.2 book - Part 14 of 84
The Ring programming language version 1.2 book - Part 14 of 84
The Ring programming language version 1.7 book - Part 28 of 196
The Ring programming language version 1.7 book - Part 28 of 196The Ring programming language version 1.7 book - Part 28 of 196
The Ring programming language version 1.7 book - Part 28 of 196
The Ring programming language version 1.5.2 book - Part 75 of 181
The Ring programming language version 1.5.2 book - Part 75 of 181The Ring programming language version 1.5.2 book - Part 75 of 181
The Ring programming language version 1.5.2 book - Part 75 of 181
Python profiling
Python profilingPython profiling
Python profiling
mat lab introduction and basics to learn
mat lab introduction and basics to learnmat lab introduction and basics to learn
mat lab introduction and basics to learn
Time Series Analysis and Mining with R
Time Series Analysis and Mining with RTime Series Analysis and Mining with R
Time Series Analysis and Mining with R
The Ring programming language version 1.5.3 book - Part 35 of 184
The Ring programming language version 1.5.3 book - Part 35 of 184The Ring programming language version 1.5.3 book - Part 35 of 184
The Ring programming language version 1.5.3 book - Part 35 of 184
R Language Introduction
R Language IntroductionR Language Introduction
R Language Introduction
Fourier project presentation
Fourier project  presentationFourier project  presentation
Fourier project presentation
The Ring programming language version 1.5.3 book - Part 25 of 184
The Ring programming language version 1.5.3 book - Part 25 of 184The Ring programming language version 1.5.3 book - Part 25 of 184
The Ring programming language version 1.5.3 book - Part 25 of 184
Clojure basics
Clojure basicsClojure basics
Clojure basics
User Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryUser Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryUser Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love Story
Introduction to Neural Networks and Deep Learning from Scratch
Introduction to Neural Networks and Deep Learning from ScratchIntroduction to Neural Networks and Deep Learning from Scratch
Introduction to Neural Networks and Deep Learning from Scratch
MLE Example
MLE ExampleMLE Example
MLE Example


Crushing the Head of the Snake by Robert Brewer PyData SV 2014

  • 1. Crushing the Head of the Snake
    Robert Brewer, Chief Architect
  • 2. How to Time
    (Python 2 session: range() returns a list and xrange() returns a lazy sequence.)
    >>> from timeit import Timer
    >>> range(5)
    [0, 1, 2, 3, 4]
    >>> t = Timer("range(a)", "a = 1000000")
    >>> t.timeit(1)
    0.028472900390625
    >>> t.timeit(100)
    1.8600409030914307
    >>> t.timeit(1000)
    18.056041955947876
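The following is a minimal Python 3 sketch of the slide 2 measurement. Python 3's range() is lazy, so list(range(a)) stands in for the Python 2 range(a) on the slide. The run counts are arbitrary.

```python
from timeit import Timer

# Wrap range() in list() so the list is actually built, as in the Python 2 slide.
# The setup string ("a = 1000000") runs once and is not included in the timing.
t = Timer("list(range(a))", "a = 1000000")

one = t.timeit(1)   # total seconds for 1 execution
ten = t.timeit(10)  # total seconds for 10 executions (a sum, not a per-call figure)

print(f"1 run:   {one:.4f} s")
print(f"10 runs: {ten:.4f} s ({ten / 10:.4f} s per run)")
```

timeit(number) returns the total for all runs, which is why the slide's numbers grow roughly linearly with the argument.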
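  • 3. Comparing algorithms
    >>> Timer("range(1000)").timeit(1000000)   # number defaults to 1000000, so this equals the next call
    >>> Timer("range(1000)").timeit()
    11.392634868621826
    >>> Timer("xrange(1000)").timeit()
    0.20040297508239746
    >>> Timer("list(xrange(1000))").timeit()
    12.207480907440186
    xrange() only creates a lazy object. Once list() materializes it, the cost is about the same as range().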
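Here is the same comparison in Python 3, where range() plays the role of xrange(). As slide 3 shows, a fair comparison should only set statements that produce the same result against each other. The list comprehension is an extra illustrative alternative, not something from the talk. The run count is reduced from the default to keep the example quick.

```python
from timeit import Timer

# Each statement runs N times; timeit() returns the total in seconds.
N = 10_000  # far fewer than the default 1,000,000

lazy = Timer("range(1000)").timeit(N)               # creates a range object only
built = Timer("list(range(1000))").timeit(N)        # materializes 1000 ints
comp = Timer("[i for i in range(1000)]").timeit(N)  # same list via a comprehension

for name, secs in [("range", lazy), ("list(range)", built), ("listcomp", comp)]:
    print(f"{name:12s} {secs:.4f} s for {N} runs")
```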
  • 5. Caveat: Wall time not CPU time
    >>> Timer("xrange(1000)").timeit()
    0.20040297508239746
    >>> Timer("xrange(1000)").repeat(3)
    [0.20735883712768555, 0.1968221664428711, 0.18882489204406738]
    Other processes running at the same time inflate wall-clock measurements, so take the minimum.
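In Python 3 the default timer is time.perf_counter, which still measures wall-clock time. The sketch below shows both ideas from slide 5. First, it repeats the measurement and keeps the minimum. Second, it measures CPU time instead by passing timer=time.process_time. The statement and counts are arbitrary choices.

```python
import time
from timeit import Timer

stmt = "sum(range(10_000))"

# Default timer (time.perf_counter): wall-clock time, so other processes
# running at the same moment can inflate individual measurements.
wall = Timer(stmt).repeat(repeat=3, number=200)

# timer=time.process_time counts only CPU time used by this process.
cpu = Timer(stmt, timer=time.process_time).repeat(repeat=3, number=200)

# The minimum is the run least disturbed by outside noise.
print("wall:", min(wall), wall)
print("cpu: ", min(cpu), cpu)
```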
  • 6. How to Profile
    >>> import mod
    >>> import cProfile
    >>> cProfile.run("mod.b()", sort="cumulative")
  • 7. How to Profile
    >>> import mod
    >>> import cProfile
    >>> cProfile.run("mod.b()", sort="cumulative")
    (make changes to module)
    >>> reload(mod)
    >>> cProfile.run("mod.b()", sort="cumulative")
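The following is a Python 3 sketch of the edit, reload and re-profile loop from slide 7. In Python 3, the builtin reload() moved to importlib.reload(). The sketch uses cProfile.runctx() so the command sees this script's names, because cProfile.run() evaluates its string in __main__. The module name mod, its function b() and both bodies are hypothetical stand-ins, written to a temporary directory so the example runs end to end.

```python
import cProfile
import importlib
import pathlib
import sys
import tempfile

# Avoid a stale .pyc if the file is rewritten within the same second.
sys.dont_write_bytecode = True

# Hypothetical module "mod" with a function b(), created on the fly.
workdir = pathlib.Path(tempfile.mkdtemp())
src = workdir / "mod.py"
src.write_text("def b():\n    return sorted(range(100000), reverse=True)\n")
sys.path.insert(0, str(workdir))

import mod  # noqa: E402  (importable only after the sys.path change)

cProfile.runctx("mod.b()", globals(), locals(), sort="cumulative")

# "Make changes to module": replace the body with a cheaper version.
src.write_text("def b():\n    return list(range(99999, -1, -1))\n")
importlib.invalidate_caches()
mod = importlib.reload(mod)  # Python 3 spelling of Python 2's reload(mod)

cProfile.runctx("mod.b()", globals(), locals(), sort="cumulative")
```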
  • 8. How to Profile
    >>> cProfile.run("for i in xrange(3000): range(i).sort()", sort="cumulative")
    6002 function calls in 0.093 seconds
    Ordered by: cumulative time
       ncalls  tottime  percall  cumtime  percall filename:lineno(func)
            1    0.019    0.019    0.093    0.093 <string>:1(<module>)
         3000    0.052    0.000    0.052    0.000 {list.sort}
         3000    0.022    0.000    0.022    0.000 {range}
            1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
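  • 9. How to Profile
    6002 function calls in 0.093 seconds
       ncalls  tottime  percall  cumtime  percall filename:lineno(func)
         3000    0.052    0.000    0.052    0.000 {list.sort}
         3000    0.022    0.000    0.022    0.000 {range}
    tottime is the time spent inside the function itself. cumtime also includes the functions it calls.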
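This is a Python 3 sketch of the slide 8 profile. It collects the statistics with cProfile.Profile and formats them with pstats, so the report can be captured as a string rather than only printed. The function name work() is illustrative.

```python
import cProfile
import io
import pstats


def work():
    # Same workload as the slide, in Python 3: build and sort lists of growing size.
    for i in range(3000):
        list(range(i)).sort()


profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Sort by cumulative time (a function plus everything it calls) and keep the top rows.
# Use sort_stats("tottime") to rank by time spent in each function body alone.
out = io.StringIO()
stats = pstats.Stats(profiler, stream=out).sort_stats("cumulative")
stats.print_stats(5)
report = out.getvalue()
print(report)
```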
  • 10. Example: Standard Deviation
    >>> import numpy
    >>> n = 100
    >>> a = numpy.array(xrange(n), dtype=float)
    >>> a.std(ddof=1)
    29.011491975882016
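numpy's ddof argument makes the divisor N - ddof, so ddof=1 gives the sample standard deviation shown on slide 10. Below is a dependency-free sketch of that definition, checked against the standard library's statistics module. std() is a hypothetical helper, not numpy's implementation.

```python
import math
import statistics


def std(values, ddof=0):
    """Standard deviation with numpy-style ddof: divide by (N - ddof)."""
    values = list(values)
    mean = sum(values) / len(values)
    sq = sum((x - mean) ** 2 for x in values)  # sum of squared deviations
    return math.sqrt(sq / (len(values) - ddof))


a = [float(x) for x in range(100)]
sample = std(a, ddof=1)  # ddof=1: sample standard deviation
pop = std(a)             # ddof=0: population standard deviation

print(sample, statistics.stdev(a))
print(pop, statistics.pstdev(a))
```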
  • 11. Example: Standard Deviation
    >>> n = 4000000000
    >>> a = numpy.array(xrange(n), dtype=float)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: setting an array element with a sequence.
  • 12. Example: Standard Deviation
    >>> n = 4000000000
    >>> arr = numpy.zeros(n, dtype=float)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    MemoryError
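A likely reason for the MemoryError on slide 12 is the size of the allocation. A float64 array needs 8 bytes per element, so 4 billion elements need about 32 GB of RAM before any computation starts. The arithmetic below assumes an 8-byte C double.

```python
import struct

n = 4_000_000_000
itemsize = struct.calcsize("d")  # bytes in a C double (float64); 8 on common platforms
needed = n * itemsize

print(f"{needed:,} bytes = {needed / 1e9:.0f} GB = {needed / 2**30:.1f} GiB")
```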
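  • 14. Example: Standard Deviation
    Given array $A$ broken into $n$ parts $a_i$, with local means $\bar{a}_i$, global mean $\bar{A}$, and local variance $V(a_i) = \sum_j (a_{ij} - \bar{a}_i)^2$:
    $$\sigma = \sqrt{\frac{\sum_{i=1}^{n} \left[ V(a_i) + 2\left(\sum_j a_{ij}\right)(\bar{a}_i - \bar{A}) + |a_i|\left(\bar{A}^2 - \bar{a}_i^2\right) \right]}{|A| - \mathrm{ddof}}}$$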
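The correction terms in the slide 14 formula come from expanding each part's sum of squares about the global mean $\bar{A}$ around its local mean $\bar{a}_i$. The short derivation below writes $S_i = \sum_j a_{ij}$:

```latex
\begin{aligned}
\sum_j (a_{ij} - \bar{A})^2
  &= \sum_j a_{ij}^2 - 2\bar{A}\,S_i + |a_i|\,\bar{A}^2 \\
V(a_i) = \sum_j (a_{ij} - \bar{a}_i)^2
  &= \sum_j a_{ij}^2 - 2\bar{a}_i\,S_i + |a_i|\,\bar{a}_i^2 \\
\sum_j (a_{ij} - \bar{A})^2 - V(a_i)
  &= 2\,S_i\,(\bar{a}_i - \bar{A}) + |a_i|\left(\bar{A}^2 - \bar{a}_i^2\right)
\end{aligned}
```

Summing $V(a_i)$ plus this correction over all $n$ parts gives the total sum of squares about $\bar{A}$. Dividing by $|A| - \mathrm{ddof}$ and taking the square root gives $\sigma$.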
  • 15. Example: Standard Deviation
    def run():
        points = 400000  # append 0000 for the full 4,000,000,000-row run
        segments = 100
        part_len = points // segments
        partitions = []
        for p in range(segments):
            part = range(part_len * p, part_len * (p + 1))
            partitions.append(part)
        return stddev(partitions, ddof=1)
  • 16. Example: Standard Deviation
    import math
    def stddev(partitions, ddof=0):
        final = 0.0
        for part in partitions:
            m = total(part) / length(part)
            # Find the mean of the entire group.
            gtotal = total([total(p) for p in partitions])
            glength = total([length(p) for p in partitions])
            g = gtotal / glength
            adj = ((2 * total(part) * (m - g)) +
                   ((g ** 2 - m ** 2) * length(part)))
            final += varsum(part) + adj
        return math.sqrt(final / (glength - ddof))
  • 17. Example: Standard Deviation
    2052106 function calls in 71.025 seconds
       ncalls  tottime  percall  cumtime  percall filename:lineno(func)
            1    0.000    0.000   71.023   71.023
            1    0.006    0.006   71.013   71.013
       410400   63.406    0.000   70.490    0.000
          100    0.341    0.003   69.178    0.692
       410601    7.076    0.000    7.076    0.000 {range}
       410200    0.151    0.000    0.174    0.000
       820700    0.042    0.000    0.042    0.000 {len}
          100    0.000    0.000    0.000    0.000 {list.append}
            1    0.000    0.000    0.000    0.000 {math.sqrt}
  • 18. Example: Standard Deviation
    400,000 rows in 71.025 seconds.
    Assuming no other effects of scale, it will take 197.3 hours (over 8 days) to calculate our 4 billion-row array.
  • 19. Example: Standard Deviation
    Can we calculate our 4 billion-row array in 1 minute?
    That’s 400,000 rows in 6 ms.
    All we need is an 11,837.5x speedup.
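The back-of-the-envelope numbers on slides 18 and 19 assume that running time grows linearly with row count. The same arithmetic, using only the figures from the slides:

```python
measured_rows = 400_000
measured_secs = 71.025
target_rows = 4_000_000_000

scale = target_rows / measured_rows             # 10,000x more rows
projected_hours = measured_secs * scale / 3600  # linear extrapolation

budget_secs = 60.0                        # goal: the whole array in one minute
budget_per_chunk = budget_secs / scale    # time allowed per 400,000 rows
speedup = measured_secs / budget_per_chunk

print(f"projected: {projected_hours:.1f} h ({projected_hours / 24:.1f} days)")
print(f"budget per 400,000 rows: {budget_per_chunk * 1000:.0f} ms")
print(f"required speedup: {speedup:,.1f}x")
```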
  • 21. Example: Standard Deviation
    2052106 function calls in 71.025 seconds
       ncalls  tottime  percall  cumtime  percall filename:lineno(func)
            1    0.000    0.000   71.023   71.023
            1    0.006    0.006   71.013   71.013
       410400   63.406    0.000   70.490    0.000
          100    0.341    0.003   69.178    0.692
       410601    7.076    0.000    7.076    0.000 {range}
       410200    0.151    0.000    0.174    0.000
       820700    0.042    0.000    0.042    0.000 {len}
          100    0.000    0.000    0.000    0.000 {list.append}
            1    0.000    0.000    0.000    0.000 {math.sqrt}
  • 23. Extracting Loop Invariants
        def varsum(arr):
            vs = 0
            for j in range(len(arr)):
                mean = (total(arr) / length(arr))
                vs += (arr[j] - mean) ** 2
            return vs
  • 24. Extracting Loop Invariants
        def varsum(arr):
            vs = 0
            mean = (total(arr) / length(arr))
            for j in range(len(arr)):
                vs += (arr[j] - mean) ** 2
            return vs
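A minimal Python 3 sketch for timing the two `varsum` versions side by side. The slides never show `length()`, so the float-returning helper here is an assumption; `total()` is the loop version shown later in the deck. With the mean inside the loop, every iteration re-sums the whole array, so the work grows with the square of the partition length.

```python
import timeit

def total(arr):
    # Loop version of total(), as on the later "Use Python Builtins" slide.
    s = 0
    for j in range(len(arr)):
        s += arr[j]
    return s

def length(arr):
    # Hypothetical helper: length() is not shown in the slides;
    # assumed to return the element count as a float.
    return float(len(arr))

def varsum_before(arr):
    vs = 0
    for j in range(len(arr)):
        mean = total(arr) / length(arr)   # recomputed every iteration: O(n^2)
        vs += (arr[j] - mean) ** 2
    return vs

def varsum_after(arr):
    vs = 0
    mean = total(arr) / length(arr)       # loop invariant, computed once: O(n)
    for j in range(len(arr)):
        vs += (arr[j] - mean) ** 2
    return vs

part = list(range(1000))
for fn in (varsum_before, varsum_after):
    # Best of three single runs for each version.
    best = min(timeit.repeat(lambda: fn(part), number=1, repeat=3))
    print(f"{fn.__name__}: {best:.4f} s")
```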
  • 25. Extracting Loop Invariants
    52606 function calls in 1.944 seconds (36x)

     ncalls  tottime  percall  cumtime  percall filename:lineno(func)
          1    0.000    0.000    1.942    1.942
          1    0.006    0.006    1.932    1.932
      10500    1.673    0.000    1.859    0.000
      10701    0.196    0.000    0.196    0.000 {range}
        100    0.062    0.001    0.081    0.001
      10300    0.003    0.000    0.003    0.000
      20900    0.001    0.000    0.001    0.000 {len}
        100    0.000    0.000    0.000    0.000 {list.append}
          1    0.000    0.000    0.000    0.000 {math.sqrt}

    Still 5.4 hours.
  • 26. Extracting Loop Invariants
        def stddev(partitions, ddof=0):
            final = 0.0
            for part in partitions:
                m = total(part) / length(part)
                # Find the mean of the entire group.
                gtotal = total([total(p) for p in partitions])
                glength = total([length(p) for p in partitions])
                g = gtotal / glength
                adj = ((2 * total(part) * (m - g)) +
                       ((g ** 2 - m ** 2) * length(part)))
                final += varsum(part) + adj
            return math.sqrt(final / (glength - ddof))
  • 27. Extracting Loop Invariants
        def stddev(partitions, ddof=0):
            final = 0.0
            # Find the mean of the entire group.
            gtotal = total([total(p) for p in partitions])
            glength = total([length(p) for p in partitions])
            g = gtotal / glength
            for part in partitions:
                m = total(part) / length(part)
                adj = ((2 * total(part) * (m - g)) +
                       ((g ** 2 - m ** 2) * length(part)))
                final += varsum(part) + adj
            return math.sqrt(final / (glength - ddof))
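The hoisted version can be run end to end under Python 3 with a small sketch like the one below. Assumptions: `length()` is a hypothetical float-returning helper (the slides never show it), `total()` uses the builtin `sum` for brevity, and `//` replaces Python 2's integer `/` when computing `part_len`. The result is printed next to `statistics.stdev` over the same rows as a sanity check.

```python
import math
import statistics

def total(arr):
    return sum(arr)  # builtin sum; the slides' loop version gives the same result

def length(arr):
    # Hypothetical helper: assumed to return the element count as a float.
    return float(len(arr))

def varsum(arr):
    mean = total(arr) / length(arr)
    return sum((v - mean) ** 2 for v in arr)

def stddev(partitions, ddof=0):
    final = 0.0
    # Loop invariant hoisted: the grand mean is computed once, not per partition.
    gtotal = total([total(p) for p in partitions])
    glength = total([length(p) for p in partitions])
    g = gtotal / glength
    for part in partitions:
        m = total(part) / length(part)
        adj = (2 * total(part) * (m - g)) + ((g ** 2 - m ** 2) * length(part))
        final += varsum(part) + adj
    return math.sqrt(final / (glength - ddof))

def run(points=40_000, segments=100):
    part_len = points // segments
    partitions = [range(part_len * p, part_len * (p + 1)) for p in range(segments)]
    return stddev(partitions, ddof=1)

print(run(), statistics.stdev(range(40_000)))
```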
  • 28. Extracting Loop Invariants
    2512 function calls in 0.142 seconds (13x)

     ncalls  tottime  percall  cumtime  percall filename:lineno(func)
          1    0.000    0.000    0.140    0.140
          1    0.000    0.000    0.136    0.136
        100    0.063    0.001    0.082    0.001
        402    0.064    0.000    0.071    0.000
        603    0.013    0.000    0.013    0.000 {range}
        400    0.000    0.000    0.000    0.000
        902    0.000    0.000    0.000    0.000 {len}
        100    0.000    0.000    0.000    0.000 {list.append}
          1    0.000    0.000    0.000    0.000 {math.sqrt}

    Still 23 minutes.
  • 29. Amongst Our Weaponry: Use built-in Python functions whenever possible
  • 30. Use Python Builtins
        def total(arr):
            s = 0
            for j in range(len(arr)):
                s += arr[j]
            return s
  • 31. Use Python Builtins
        # Before:
        def total(arr):
            s = 0
            for j in range(len(arr)):
                s += arr[j]
            return s

        # After:
        def total(arr):
            return sum(arr)
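A quick way to check this swap on your own interpreter is a `timeit` sketch comparing the two `total()` versions. The function names are made up for the comparison; absolute times will vary by machine and Python version.

```python
import timeit

def total_loop(arr):
    # Index-based loop, one bytecode-level addition per element.
    s = 0
    for j in range(len(arr)):
        s += arr[j]
    return s

def total_builtin(arr):
    # The builtin sum runs its loop in C.
    return sum(arr)

part = list(range(4000))
for fn in (total_loop, total_builtin):
    best = min(timeit.repeat(lambda: fn(part), number=200, repeat=3))
    print(f"{fn.__name__}: {best:.4f} s for 200 calls")
```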
  • 32. Use Python Builtins
    2110 function calls in 0.096 seconds (1.47x)

     ncalls  tottime  percall  cumtime  percall filename:lineno(func)
          1    0.000    0.000    0.093    0.093
          1    0.000    0.000    0.083    0.083
        100    0.065    0.001    0.070    0.001
        402    0.000    0.000    0.015    0.000
        402    0.015    0.000    0.015    0.000 {sum}
        201    0.012    0.000    0.012    0.000 {range}
        400    0.000    0.000    0.000    0.000
        500    0.000    0.000    0.000    0.000 {len}
        100    0.000    0.000    0.000    0.000 {list.append}
          1    0.000    0.000    0.000    0.000 {math.sqrt}

    Still 16 minutes.
  • 34. Use Python Builtins
        def varsum(arr):
            vs = 0
            mean = (total(arr) / length(arr))
            for j in range(len(arr)):
                vs += (arr[j] - mean) ** 2
            return vs
  • 35. Use Python Builtins
        def varsum(arr):
            mean = (total(arr) / length(arr))
            return sum((v - mean) ** 2 for v in arr)
  • 36. Use Python Builtins
    402110 function calls in 0.122 seconds (1.27x slower)

     ncalls  tottime  percall  cumtime  percall filename:lineno(func)
          1    0.000    0.000    0.120    0.120
          1    0.000    0.000    0.115    0.115
        502    0.044    0.000    0.114    0.000 {sum}
        100    0.000    0.000    0.106    0.001
     400100    0.070    0.000    0.070    0.000
        402    0.000    0.000    0.011    0.000
        …
  • 38. Amongst Our Weaponry: Reduce function calls
  • 39. Reduce Function Calls
        >>> Timer("sum(a)", "a = range(10)").repeat(3)
        [0.15801000595092773, 0.1406857967376709, 0.14577603340148926]
        >>> Timer("total(a)", "a = range(10); total = lambda x: sum(x)").repeat(3)
        [0.2066800594329834, 0.1998300552368164, 0.21536493301391602]
    Comparing the best times over the default 1,000,000 loops, the extra lambda call costs about 0.000000059 seconds (59 ns) per call.
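A Python 3 version of the same measurement, as a sketch. It uses `list(range(10))` because `range` is no longer a list in Python 3, runs fewer loops than the default to stay quick, and takes the minimum of the repeats. The per-call overhead you see will depend on the interpreter version.

```python
from timeit import Timer

LOOPS = 200_000
setup = "a = list(range(10)); total = lambda x: sum(x)"

# Best of three runs for each statement.
direct = min(Timer("sum(a)", setup).repeat(repeat=3, number=LOOPS))
wrapped = min(Timer("total(a)", setup).repeat(repeat=3, number=LOOPS))

overhead = (wrapped - direct) / LOOPS  # extra seconds per call through the lambda
print(f"sum(a):   {direct:.4f} s for {LOOPS} loops")
print(f"total(a): {wrapped:.4f} s for {LOOPS} loops")
print(f"about {overhead * 1e9:.0f} ns of overhead per extra call")
```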
  • 40. Reduce Function Calls
        def variances_squared(arr):
            mean = (total(arr) / length(arr))
            for v in arr:
                yield (v - mean) ** 2
  • 41. Reduce Function Calls
        # Before: generator expression
        def varsum(arr):
            mean = (total(arr) / length(arr))
            return sum((v - mean) ** 2 for v in arr)

        # After: list comprehension
        def varsum(arr):
            mean = (total(arr) / length(arr))
            return sum([(v - mean) ** 2 for v in arr])
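The generator expression is resumed once per element, and cProfile counts each resumption as a call, which matches the jump to 400100 calls in the earlier profile. A small sketch for comparing the two forms; it uses the builtin `sum`/`len` instead of the slides' `total`/`length` to stay self-contained. On CPython the list comprehension is often faster for lists that fit comfortably in memory, at the cost of building the intermediate list, but results vary by version.

```python
import timeit

def varsum_genexp(arr):
    mean = sum(arr) / len(arr)
    return sum((v - mean) ** 2 for v in arr)    # generator: resumed once per element

def varsum_listcomp(arr):
    mean = sum(arr) / len(arr)
    return sum([(v - mean) ** 2 for v in arr])  # list built first, then summed

part = list(range(4000))
for fn in (varsum_genexp, varsum_listcomp):
    best = min(timeit.repeat(lambda: fn(part), number=200, repeat=3))
    print(f"{fn.__name__}: {best:.4f} s for 200 calls")
```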
  • 42. Reduce Function Calls
    2010 function calls in 0.082 seconds (1.17x)

     ncalls  tottime  percall  cumtime  percall filename:lineno(func)
          1    0.000    0.000    0.080    0.080
          1    0.000    0.000    0.071    0.071
        100    0.050    0.001    0.056    0.001
        502    0.020    0.000    0.020    0.000 {sum}
        402    0.000    0.000    0.016    0.000
        101    0.009    0.000    0.009    0.000 {range}
        400    0.000    0.000    0.000    0.000
        400    0.000    0.000    0.000    0.000 {len}
        100    0.000    0.000    0.000    0.000 {list.append}
          1    0.000    0.000    0.000    0.000 {math.sqrt}

    Still 13+ minutes.
  • 43. Amongst Our Weaponry: Vector operations with NumPy
  • 44. Vector Operations
        part = numpy.array(xrange(...), dtype=float)

        def total(arr):
            return arr.sum()

        def varsum(arr):
            return ((arr - arr.mean()) ** 2).sum()
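A self-contained sketch of the vectorized helpers, assuming NumPy is installed. `numpy.arange` is used here to build the same float values as `numpy.array(xrange(...), dtype=float)`; that substitution is this example's choice, not the slide's.

```python
import numpy

def total(arr):
    return arr.sum()

def varsum(arr):
    # Subtract the mean, square, and sum, each as a whole-array operation.
    return ((arr - arr.mean()) ** 2).sum()

# Stand-in for numpy.array(xrange(...), dtype=float) on one partition.
part = numpy.arange(0, 4000, dtype=float)
print(total(part), varsum(part))
```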
  • 45. Vector Operations
    3408 function calls in 0.057 seconds (1.43x)

     ncalls  tottime  percall  cumtime  percall filename:lineno(func)
          1    0.000    0.000    0.057    0.057
        200    0.051    0.000    0.051    0.000 {numpy...array}
          1    0.001    0.001    0.006    0.006
        500    0.003    0.000    0.003    0.000 {numpy.ufunc.reduce}
        100    0.001    0.000    0.003    0.000
        400    0.000    0.000    0.003    0.000 {numpy.ndarray.sum}
        300    0.000    0.000    0.002    0.000
        100    0.000    0.000    0.001    0.000 {numpy.ndarray.mean}
        …

    Still 9.5 minutes.
  • 47. Vector Operations
    3408 function calls in 0.006 seconds (13.6x), counting only the math: the 0.051 seconds spent building the arrays ({numpy...array}) is left out.

     ncalls  tottime  percall  cumtime  percall filename:lineno(func)
          1    0.001    0.001    0.006    0.006
        500    0.003    0.000    0.003    0.000 {numpy.ufunc.reduce}
        100    0.001    0.000    0.003    0.000
        400    0.000    0.000    0.003    0.000 {numpy.ndarray.sum}
        300    0.000    0.000    0.002    0.000
        100    0.000    0.000    0.001    0.000 {numpy.ndarray.mean}
        …

    Should be exactly 1 minute.
  • 48. Vector Operations Let’s try 4 billion! Bump up that N...
  • 51. Parallelization
        from multiprocessing import Pool

        def run():
            results = Pool().map(run_one, range(segments))
            result = stddev(results)
            return result
  • 52. Parallelization
        def run_one(i):
            p = numpy.memmap('stddev.%d' % i, dtype=float, mode='r',
                             shape=(part_len,))
            T, L = p.sum(), float(len(p))
            m = T / L
            V = ((p - m) ** 2).sum()
            return T, L, V
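A runnable sketch of the memory-mapped worker, assuming NumPy is installed. The slides don't show how the `stddev.%d` files were written, so `write_partitions` (raw float64 via `ndarray.tofile`) and the `directory` parameter are hypothetical additions. Each worker hands back only three floats, so very little data travels back through the pool.

```python
import os
import tempfile
import numpy

def write_partitions(directory, segments, part_len):
    # Hypothetical setup: write each partition as raw float64 values, no header.
    for i in range(segments):
        part = numpy.arange(part_len * i, part_len * (i + 1), dtype=float)
        part.tofile(os.path.join(directory, "stddev.%d" % i))

def run_one(directory, i, part_len):
    # Same idea as the slide's run_one, with the directory passed in.
    p = numpy.memmap(os.path.join(directory, "stddev.%d" % i),
                     dtype=float, mode="r", shape=(part_len,))
    T, L = float(p.sum()), float(len(p))
    m = T / L
    V = float(((p - m) ** 2).sum())
    return T, L, V

with tempfile.TemporaryDirectory() as tmp:
    write_partitions(tmp, segments=4, part_len=1000)
    results = [run_one(tmp, i, 1000) for i in range(4)]

print(results[0])
```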
  • 53. Parallelization
        def stddev(TLVs, ddof=0):
            final = 0.0
            totals = [T for T, L, V in TLVs]
            lengths = [L for T, L, V in TLVs]
            glength = sum(lengths)
            g = sum(totals) / glength
            for T, L, V in TLVs:
                m = T / L
                adj = ((2 * T * (m - g)) + ((g ** 2 - m ** 2) * L))
                final += V + adj
            return math.sqrt(final / (glength - ddof))
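The map/reduce shape of this code can be checked without worker processes. In this sketch `summarize` is a hypothetical in-memory stand-in for `run_one`, and the builtin `map` keeps it single-process; `Pool().map(summarize, partitions)` takes the same arguments. The combined result is printed next to `statistics.pstdev` and `statistics.stdev`.

```python
import math
import statistics

def summarize(part):
    # Map step (hypothetical stand-in for run_one): one partition
    # reduced to its total T, length L, and sum of squared deviations V.
    T, L = float(sum(part)), float(len(part))
    m = T / L
    V = sum((x - m) ** 2 for x in part)
    return T, L, V

def stddev(TLVs, ddof=0):
    # Reduce step, as on the slide: combine per-partition (T, L, V) tuples.
    final = 0.0
    totals = [T for T, L, V in TLVs]
    lengths = [L for T, L, V in TLVs]
    glength = sum(lengths)
    g = sum(totals) / glength
    for T, L, V in TLVs:
        m = T / L
        adj = (2 * T * (m - g)) + ((g ** 2 - m ** 2) * L)
        final += V + adj
    return math.sqrt(final / (glength - ddof))

partitions = [range(i * 1000, (i + 1) * 1000) for i in range(8)]
TLVs = list(map(summarize, partitions))
print(stddev(TLVs), statistics.pstdev(range(8000)))
print(stddev(TLVs, ddof=1), statistics.stdev(range(8000)))
```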
  • 54. Parallelization
    3734 function calls in 0.024 seconds (6x slower)

     ncalls  tottime  percall  cumtime  percall filename:lineno(func)
          1    0.000    0.000    0.024    0.024
          4    0.000    0.000    0.011    0.003
         22    0.011    0.000    0.011    0.000 {thread.lock.acquire}
          1    0.000    0.000    0.011    0.011
          1    0.000    0.000    0.008    0.008
          4    0.001    0.000    0.005    0.001
          1    0.003    0.003    0.005    0.005
          4    0.000    0.000    0.004    0.001
          4    0.003    0.001    0.003    0.001 {posix.fork}
        ...
  • 55. Parallelization Could that waiting be insignificant when we scale up to 4 billion? Let’s try it!
  • 56. Parallelization
    3766 function calls in 67.811 seconds

     ncalls  tottime  percall  cumtime  percall filename:lineno(func)
          1    0.000    0.000   67.811   67.811
          4    0.000    0.000   67.747   16.930
         22   67.747    3.079   67.747    3.079 {thread.lock.acquire}
          1    0.000    0.000   67.747   67.747
          1    0.000    0.000    0.062    0.060
          4    0.000    0.000    0.058    0.014
          4    0.057    0.014    0.057    0.014 {posix.fork}
          1    0.003    0.003    0.005    0.005
          2    0.002    0.001    0.002    0.001 {sum}

    SO CLOSE! 1.13 minutes.
  • 57. Parallelization
        def run_one(i):
            if i == 50:
                cProfile.runctx(..., "prf.50")

        >>> import pstats
        >>> s = pstats.Stats("prf.50")
        >>> s.sort_stats("cumulative")
        <pstats.Stats instance at 0x2bddcb0>
        >>> _.print_stats()
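A Python 3 sketch of the profile-to-file, read-later pattern. The `work()` function is a hypothetical stand-in for one `run_one()` call. Writing stats to a file keeps a single worker's numbers separate, so they can be loaded with `pstats` after the pool has finished.

```python
import cProfile
import io
import os
import pstats
import tempfile

def work():
    # Hypothetical workload standing in for one run_one() call.
    data = [float(x) for x in range(50_000)]
    mean = sum(data) / len(data)
    return sum([(x - mean) ** 2 for x in data])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "prf.50")
    # Save raw profile data to a file instead of printing it.
    cProfile.runctx("work()", globals(), locals(), path)

    # Load, sort by cumulative time, and print the top five entries.
    report = io.StringIO()
    stats = pstats.Stats(path, stream=report)
    stats.sort_stats("cumulative").print_stats(5)

print(report.getvalue())
```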
  • 58. Parallelization
    57 function calls in 2.804 seconds

     ncalls  tottime  percall  cumtime  percall filename:lineno(func)
          1    0.431    0.431    2.791    2.791
          2    0.000    0.000    2.360    1.180 numpy.ndarray.sum
          2    2.360    1.180    2.360    1.180 numpy.ufunc.reduce
          1    0.000    0.000    0.000    0.000
  • 59. Parallelization
        def run_one(i):
            p = numpy.memmap('stddev.%d' % i, dtype=float, mode='r',
                             shape=(part_len,))
            T, L = p.sum(), float(len(p))
            m = T / L
            V = ((p - m) ** 2).sum()
            return T, L, V
    200 seconds / 4 cores = 50 seconds
  • 60. Parallelization? Serialization!
    67.8 seconds for 4 billion rows, but ~50 of those are spent loading data! Only 17.8 seconds go to the actual math.
  • 61. Serialization
        import bloscpack as bp

        bargs = bp.args.DEFAULT_BLOSC_ARGS
        bargs['clevel'] = 6
        bp.pack_ndarray_file(part, fname, blosc_args=bargs)
        part = bp.unpack_ndarray_file(fname)
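To show the pack/compress, unpack/decompress round trip without installing bloscpack, this sketch swaps in the standard library's `zlib` and `array`. This is not Blosc: Blosc is a separate compressor designed for fast compression of numeric binary data, so its ratios and speeds will differ. `pack` and `unpack` are hypothetical names, and `level=6` only mirrors the slide's `clevel` setting by analogy.

```python
import array
import zlib

def pack(values, level=6):
    # Serialize to raw 8-byte doubles, then compress (zlib stands in for Blosc).
    raw = array.array("d", values).tobytes()
    return zlib.compress(raw, level)

def unpack(blob):
    # Decompress and rebuild the array of doubles.
    out = array.array("d")
    out.frombytes(zlib.decompress(blob))
    return out

values = [float(x) for x in range(100_000)]
blob = pack(values)
restored = unpack(blob)
raw_size = len(values) * restored.itemsize
print(f"{raw_size} bytes -> {len(blob)} bytes")
```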
  • 64. I Crush Your Head!
    1153 function calls in 26.166 seconds

     ncalls  tottime  percall  cumtime  percall filename:lineno(func)
          1    0.000    0.000   26.166   26.166
          4    0.000    0.000   26.134    6.53
         22   26.134    1.188   26.134    1.188 thread.lock.acquire
          1    0.000    0.000   26.133   26.133
          1    0.000    0.000   26.133   26.133
          1    0.000    0.000   26.133   26.133
          1    0.003    0.003    0.030    0.030
          1    0.000    0.000    0.021    0.021
  • 65. I Crush Your Head!
    With some time-tested general programming techniques:
      - Extract loop invariants
      - Use language builtins
      - Reduce function calls
  • 66. I Crush Your Head!
    And some Python libraries for architectural improvements:
      - Use NumPy for vector ops
      - Use multiprocessing for parallelization
      - Use bloscpack for compression
  • 67. I Crush Your Head!
    We sped up our calculation so that it runs in:
      - 0.003% of the time, or
      - 27317 times faster, or
      - 4.4 orders of magnitude
  • 68. Crushing the Head of the Snake Any questions? @aminusfu