Crushing the Head of the Snake
Robert Brewer
Chief Architect
Crunch.io
How to Time
>>> from timeit import Timer
>>> range(5)
[0, 1, 2, 3, 4]
>>> t = Timer("range(a)", "a = 1000000")
>>> t.timeit(1)
0.028472900390625
>>> t.timeit(100)
1.8600409030914307
>>> t.timeit(1000)
18.056041955947876
Comparing algorithms
>>> Timer("range(1000)").timeit(1000000)  # 1000000 runs is the default
>>> Timer("range(1000)").timeit()
11.392634868621826
>>> Timer("xrange(1000)").timeit()
0.20040297508239746
>>> Timer("list(xrange(1000))").timeit()
12.207480907440186
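The timings above are from Python 2, where range() builds a full list and xrange() is lazy. A rough sketch of the same comparison for Python 3, where range() is already lazy and xrange() no longer exists. The run count of 20000 is an arbitrary choice to keep it quick, and the absolute numbers will differ by machine:

```python
from timeit import Timer

# Python 3: range() is lazy (like Python 2's xrange);
# list(range()) materializes every element.
lazy_t = Timer("range(1000)").timeit(20000)
eager_t = Timer("list(range(1000))").timeit(20000)

print("lazy %.4fs, eager %.4fs" % (lazy_t, eager_t))
```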
Caveat: Overhead
>>> Timer().timeit(1000000)
0.029289960861206055
Caveat: Wall time not CPU time
>>> Timer("xrange(1000)").timeit()
0.20040297508239746
>>> Timer("xrange(1000)").repeat(3)
[0.20735883712768555,
0.1968221664428711,
0.18882489204406738]
take the minimum
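Both caveats can be put into one small sketch (Python 3, with a made-up statement to time): repeat the measurement, keep the minimum, and compare it against the loop overhead of an empty Timer. The repeat and run counts here are arbitrary:

```python
from timeit import Timer

timer = Timer("sum(a)", "a = list(range(10))")

# Three wall-clock measurements of 100000 runs each.
runs = timer.repeat(3, 100000)

# The minimum is the run least disturbed by other processes.
best = min(runs)

# An empty Timer times only the loop itself (the "overhead" caveat).
overhead = min(Timer().repeat(3, 100000))

print("best %.4fs, loop overhead %.4fs" % (best, overhead))
```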
How to Profile
>>> import mod
>>> import cProfile
>>> cProfile.run("mod.b()", sort="cumulative")
(make changes to module)
>>> reload(mod)
>>> cProfile.run("mod.b()", sort="cumulative")
How to Profile
>>> cProfile.run("for i in xrange(3000): range(i).sort()",
...              sort="cumulative")
6002 function calls in 0.093 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(func)
1 0.019 0.019 0.093 0.093 <string>:1(<module>)
3000 0.052 0.000 0.052 0.000 {list.sort}
3000 0.022 0.000 0.022 0.000 {range}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
How to Profile
6002 function calls in 0.093 seconds
ncalls tottime percall cumtime percall filename:lineno(func)
3000 0.052 0.000 0.052 0.000 {list.sort}
3000 0.022 0.000 0.022 0.000 {range}
Example: Standard Deviation
>>> import numpy
>>> n = 100
>>> a = numpy.array(xrange(n), dtype=float)
>>> a.std(ddof=1)
29.011491975882016
Example: Standard Deviation
>>> n = 4000000000
>>> a = numpy.array(xrange(n), dtype=float)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: setting an array element with a sequence.
Example: Standard Deviation
>>> n = 4000000000
>>> arr = numpy.zeros(n, dtype=float)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
MemoryError
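Why the MemoryError: a back-of-the-envelope sketch. numpy's dtype=float is a 64-bit double, so each row costs 8 bytes. The row count is the one from the slide:

```python
n = 4000000000        # rows, from the slide
bytes_per_float = 8   # numpy dtype=float is a 64-bit double

total_bytes = n * bytes_per_float
gib = total_bytes / 2.0 ** 30

print("%d bytes = %.1f GiB" % (total_bytes, gib))
```

That is far more than fits in RAM on a typical machine, so the array has to be processed in parts.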
Example: Standard Deviation
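Given array $A$ broken into $n$ parts $a_1 \ldots a_n$,
and local variance $V(a_i) = \sum_j (a_{ij} - \bar{a}_i)^2$:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} \left[ V(a_i) + 2 \left( \sum_j a_{ij} \right) (\bar{a}_i - \bar{A}) + |a_i| \left( \bar{A}^2 - \bar{a}_i^2 \right) \right]}{|A| - \mathrm{ddof}}}$$
where $\bar{a}_i$ is the mean of part $a_i$, $\bar{A}$ is the mean of the whole array, and $|a_i|$, $|A|$ are element counts.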
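A short check, not from the slides, of why the per-part correction works. Write $\bar{a}_i$ for the mean of part $a_i$, $\bar{A}$ for the mean of the whole array, $|a_i|$ for the part's length, and $V(a_i) = \sum_j (a_{ij} - \bar{a}_i)^2$. Deviations from the global mean split into a local piece and a shift, and the cross term vanishes because deviations from a part's own mean sum to zero:

```latex
\begin{align*}
\sum_j (a_{ij} - \bar{A})^2
  &= \sum_j \bigl( (a_{ij} - \bar{a}_i) + (\bar{a}_i - \bar{A}) \bigr)^2 \\
  &= V(a_i) + 2 (\bar{a}_i - \bar{A}) \sum_j (a_{ij} - \bar{a}_i)
     + |a_i| (\bar{a}_i - \bar{A})^2 \\
  &= V(a_i) + |a_i| (\bar{a}_i - \bar{A})^2 .
\end{align*}
% The adjustment term used in the deck is the same quantity,
% since \sum_j a_{ij} = |a_i| \bar{a}_i:
\begin{align*}
2 \Bigl( \sum_j a_{ij} \Bigr) (\bar{a}_i - \bar{A}) + |a_i| (\bar{A}^2 - \bar{a}_i^2)
  &= |a_i| \bigl( 2 \bar{a}_i^2 - 2 \bar{a}_i \bar{A} + \bar{A}^2 - \bar{a}_i^2 \bigr) \\
  &= |a_i| (\bar{a}_i - \bar{A})^2 .
\end{align*}
```

Summing over all parts gives the total squared deviation from the global mean, which is then divided by $|A| - \mathrm{ddof}$ before taking the square root.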
Example: Standard Deviation
def run():
    points = 400000  # 4000000000 for the full array
    segments = 100
    part_len = points / segments
    partitions = []
    for p in range(segments):
        part = range(part_len * p,
                     part_len * (p + 1))
        partitions.append(part)
    return stddev(partitions, ddof=1)
Example: Standard Deviation
def stddev(partitions, ddof=0):
    final = 0.0
    for part in partitions:
        m = total(part) / length(part)
        # Find the mean of the entire group.
        gtotal = total([total(p) for p in partitions])
        glength = total([length(p) for p in partitions])
        g = gtotal / glength
        adj = ((2 * total(part) * (m - g)) +
               ((g ** 2 - m ** 2) * length(part)))
        final += varsum(part) + adj
    return math.sqrt(final / (glength - ddof))
Example: Standard Deviation
2052106 function calls in 71.025 seconds
ncalls tottime percall cumtime percall filename:lineno(func)
1 0.000 0.000 71.023 71.023 stddev.py:39(run)
1 0.006 0.006 71.013 71.013 stddev.py:22(stddev)
410400 63.406 0.000 70.490 0.000 stddev.py:4(total)
100 0.341 0.003 69.178 0.692 stddev.py:15(varsum)
410601 7.076 0.000 7.076 0.000 {range}
410200 0.151 0.000 0.174 0.000 stddev.py:11(length)
820700 0.042 0.000 0.042 0.000 {len}
100 0.000 0.000 0.000 0.000 {list.append}
1 0.000 0.000 0.000 0.000 {math.sqrt}
Example: Standard Deviation
400000 in 71.025 seconds
Assuming no other effects of scale,
it will take 197.3 hours (over 8 days)
to calculate our 4 billion-row array.
Example: Standard Deviation
Can we calculate
our 4 billion-row array in
1 minute?
That’s 400,000 in 6ms.
All we need is an 11,837.5x speedup.
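The arithmetic behind those targets, as a small sketch using only the figures on the slides (and the same "no other effects of scale" assumption):

```python
measured_rows = 400000
measured_seconds = 71.025       # from the profile of run()
target_rows = 4000000000

scale = target_rows / measured_rows                 # 10000x more data
projected_hours = measured_seconds * scale / 3600   # if time scales linearly

budget_seconds = 60.0 / scale                       # 1 minute spread over the scale factor
speedup = measured_seconds / budget_seconds

print("projected: %.1f hours" % projected_hours)
print("budget per 400,000 rows: %.3f s" % budget_seconds)
print("required speedup: %.1fx" % speedup)
```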
Optimization
Amongst Our Weaponry
Extracting loop invariants
Extracting Loop Invariants
def varsum(arr):
    vs = 0
    for j in range(len(arr)):
        mean = (total(arr) / length(arr))
        vs += (arr[j] - mean) ** 2
    return vs
Extracting Loop Invariants
def varsum(arr):
    vs = 0
    mean = (total(arr) / length(arr))
    for j in range(len(arr)):
        vs += (arr[j] - mean) ** 2
    return vs
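To see what hoisting the mean buys, one can count calls instead of timing them. A minimal sketch in Python 3: total() here is a stand-in with a call counter added, and varsum_slow / varsum_fast are illustrative names for the two versions:

```python
# Stand-in for total() with a counter, to show how often it runs.
calls = {"total": 0}

def total(arr):
    calls["total"] += 1
    return sum(arr)

def varsum_slow(arr):
    vs = 0.0
    for j in range(len(arr)):
        mean = total(arr) / len(arr)   # recomputed on every iteration
        vs += (arr[j] - mean) ** 2
    return vs

def varsum_fast(arr):
    vs = 0.0
    mean = total(arr) / len(arr)       # loop invariant: computed once
    for j in range(len(arr)):
        vs += (arr[j] - mean) ** 2
    return vs

data = list(range(1000))

calls["total"] = 0
slow = varsum_slow(data)
slow_calls = calls["total"]

calls["total"] = 0
fast = varsum_fast(data)
fast_calls = calls["total"]

print(slow_calls, fast_calls)  # 1000 1
```

Same answer, but the slow version walks the whole array once per element, which turns a linear pass into a quadratic one.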
Extracting Loop Invariants
52606 calls in 1.944 seconds (36x)
ncalls tottime percall cumtime percall filename:lineno(func)
1 0.000 0.000 1.942 1.942 stddev1.py:41(run)
1 0.006 0.006 1.932 1.932 stddev1.py:23(stddev)
10500 1.673 0.000 1.859 0.000 stddev1.py:4(total)
10701 0.196 0.000 0.196 0.000 {range}
100 0.062 0.001 0.081 0.001 stddev1.py:15(varsum)
10300 0.003 0.000 0.003 0.000 stddev1.py:11(length)
20900 0.001 0.000 0.001 0.000 {len}
100 0.000 0.000 0.000 0.000 {list.append}
1 0.000 0.000 0.000 0.000 {math.sqrt}
still 5.4 hrs
Extracting Loop Invariants
def stddev(partitions, ddof=0):
    final = 0.0
    for part in partitions:
        m = total(part) / length(part)
        # Find the mean of the entire group.
        gtotal = total([total(p) for p in partitions])
        glength = total([length(p) for p in partitions])
        g = gtotal / glength
        adj = ((2 * total(part) * (m - g)) +
               ((g ** 2 - m ** 2) * length(part)))
        final += varsum(part) + adj
    return math.sqrt(final / (glength - ddof))
Extracting Loop Invariants
def stddev(partitions, ddof=0):
    final = 0.0
    # Find the mean of the entire group.
    gtotal = total([total(p) for p in partitions])
    glength = total([length(p) for p in partitions])
    g = gtotal / glength
    for part in partitions:
        m = total(part) / length(part)
        adj = ((2 * total(part) * (m - g)) +
               ((g ** 2 - m ** 2) * length(part)))
        final += varsum(part) + adj
    return math.sqrt(final / (glength - ddof))
Extracting Loop Invariants
2512 function calls in 0.142 seconds (13x)
ncalls tottime percall cumtime percall filename:lineno(func)
1 0.000 0.000 0.140 0.140 stddev1.py:42(run)
1 0.000 0.000 0.136 0.136 stddev1.py:23(stddev)
100 0.063 0.001 0.082 0.001 stddev1.py:15(varsum)
402 0.064 0.000 0.071 0.000 stddev1.py:4(total)
603 0.013 0.000 0.013 0.000 {range}
400 0.000 0.000 0.000 0.000 stddev1.py:11(length)
902 0.000 0.000 0.000 0.000 {len}
100 0.000 0.000 0.000 0.000 {list.append}
1 0.000 0.000 0.000 0.000 {math.sqrt}
still 23 minutes
Amongst Our Weaponry
Use builtin Python functions
whenever possible
Use Python Builtins
# Before: explicit Python loop.
def total(arr):
    s = 0
    for j in range(len(arr)):
        s += arr[j]
    return s
# After: the builtin does the loop in C.
def total(arr):
    return sum(arr)
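A quick way to try this swap in isolation, sketched for Python 3. The names total_loop and total_builtin, the array size, and the run counts are illustrative choices, and the ratio between the two timings will vary by interpreter and machine:

```python
from timeit import Timer

def total_loop(arr):
    # Hand-written loop, as in the "before" version.
    s = 0
    for j in range(len(arr)):
        s += arr[j]
    return s

def total_builtin(arr):
    # Builtin sum(), as in the "after" version.
    return sum(arr)

data = list(range(4000))

# Timer accepts a callable; take the minimum of three repeats.
loop_t = min(Timer(lambda: total_loop(data)).repeat(3, 200))
builtin_t = min(Timer(lambda: total_builtin(data)).repeat(3, 200))

print("loop %.4fs, builtin %.4fs" % (loop_t, builtin_t))
```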
Use Python Builtins
2110 function calls in 0.096 seconds (1.47x)
ncalls tottime percall cumtime percall filename:lineno(func)
1 0.000 0.000 0.093 0.093 stddev1.py:39(run)
1 0.000 0.000 0.083 0.083 stddev1.py:20(stddev)
100 0.065 0.001 0.070 0.001 stddev1.py:12(varsum)
402 0.000 0.000 0.015 0.000 stddev1.py:4(total)
402 0.015 0.000 0.015 0.000 {sum}
201 0.012 0.000 0.012 0.000 {range}
400 0.000 0.000 0.000 0.000 stddev1.py:8(length)
500 0.000 0.000 0.000 0.000 {len}
100 0.000 0.000 0.000 0.000 {list.append}
1 0.000 0.000 0.000 0.000 {math.sqrt}
still 16 minutes
Use Python Builtins
def varsum(arr):
    vs = 0
    mean = (total(arr) / length(arr))
    for j in range(len(arr)):
        vs += (arr[j] - mean) ** 2
    return vs
Use Python Builtins
def varsum(arr):
    mean = (total(arr) / length(arr))
    return sum((v - mean) ** 2
               for v in arr)
Use Python Builtins
402110 function calls in 0.122 seconds
1.27x slower
ncalls tottime percall cumtime percall filename:lineno(func)
1 0.000 0.000 0.120 0.120 stddev.py:36(run)
1 0.000 0.000 0.115 0.115 stddev.py:17(stddev)
502 0.044 0.000 0.114 0.000 {sum}
100 0.000 0.000 0.106 0.001 stddev.py:12(varsum)
400100 0.070 0.000 0.070 0.000 stddev.py:14(genexpr)
402 0.000 0.000 0.011 0.000 stddev.py:4(total)
…
Amongst Our Weaponry
Reduce function calls
Reduce Function Calls
>>> Timer("sum(a)", "a = range(10)").repeat(3)
[0.15801000595092773,
0.1406857967376709,
0.14577603340148926]
>>> Timer("total(a)",
...       "a = range(10); total = lambda x: sum(x)"
...       ).repeat(3)
[0.2066800594329834,
0.1998300552368164,
0.21536493301391602]
The extra function call costs about 0.000000059 seconds per call
Reduce Function Calls
def variances_squared(arr):
    mean = (total(arr) / length(arr))
    for v in arr:
        yield (v - mean) ** 2
Reduce Function Calls
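# Generator expression: one generator resume per element.
def varsum(arr):
    mean = (total(arr) / length(arr))
    return sum((v - mean) ** 2
               for v in arr)
# List comprehension: builds the list first, then sums it.
def varsum(arr):
    mean = (total(arr) / length(arr))
    return sum([(v - mean) ** 2
                for v in arr])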
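The two forms can be timed side by side. A sketch for Python 3, with illustrative names varsum_gen and varsum_list. The deck's numbers come from Python 2, and the gap between the two may be smaller or even reversed on other interpreter versions, so measure rather than assume:

```python
from timeit import Timer

def varsum_gen(arr):
    mean = sum(arr) / len(arr)
    # Generator expression: sum() pulls one value at a time.
    return sum((v - mean) ** 2 for v in arr)

def varsum_list(arr):
    mean = sum(arr) / len(arr)
    # List comprehension: the full list is built, then summed.
    return sum([(v - mean) ** 2 for v in arr])

data = [float(v) for v in range(4000)]

gen_t = min(Timer(lambda: varsum_gen(data)).repeat(3, 100))
list_t = min(Timer(lambda: varsum_list(data)).repeat(3, 100))

print("genexpr %.4fs, listcomp %.4fs" % (gen_t, list_t))
```

The list version trades memory for fewer calls, which is fine for one partition at a time but worth keeping in mind for very large inputs.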
Reduce Function Calls
2010 function calls in 0.082 seconds (1.17x)
ncalls tottime percall cumtime percall filename:lineno(func)
1 0.000 0.000 0.080 0.080 stddev.py:36(run)
1 0.000 0.000 0.071 0.071 stddev.py:17(stddev)
100 0.050 0.001 0.056 0.001 stddev.py:12(varsum)
502 0.020 0.000 0.020 0.000 {sum}
402 0.000 0.000 0.016 0.000 stddev.py:4(total)
101 0.009 0.000 0.009 0.000 {range}
400 0.000 0.000 0.000 0.000 stddev.py:8(length)
400 0.000 0.000 0.000 0.000 {len}
100 0.000 0.000 0.000 0.000 {list.append}
1 0.000 0.000 0.000 0.000 {math.sqrt}
still 13+ minutes
Amongst Our Weaponry
Vector operations
with NumPy
Vector Operations
part = numpy.array(
    xrange(...), dtype=float)
def total(arr):
    return arr.sum()
def varsum(arr):
    return (
        (arr - arr.mean()) ** 2).sum()
Vector Operations
3408 function calls in 0.057 seconds (1.43x)
ncalls tottime percall cumtime percall filename:lineno(func)
1 0.000 0.000 0.057 0.057 stddev1.py:37(run)
200 0.051 0.000 0.051 0.000 {numpy...array}
1 0.001 0.001 0.006 0.006 stddev1.py:18(stddev)
500 0.003 0.000 0.003 0.000 {numpy.ufunc.reduce}
100 0.001 0.000 0.003 0.000 stddev1.py:14(varsum)
400 0.000 0.000 0.003 0.000 {numpy.ndarray.sum}
300 0.000 0.000 0.002 0.000 stddev1.py:6(total)
100 0.000 0.000 0.001 0.000 {numpy.ndarray.mean}
…
still 9.5 minutes
Vector Operations
3408 function calls in 0.006 seconds (13.6x)
ncalls tottime percall cumtime percall filename:lineno(func)
1 0.001 0.001 0.006 0.006 stddev1.py:18(stddev)
500 0.003 0.000 0.003 0.000 {numpy.ufunc.reduce}
100 0.001 0.000 0.003 0.000 stddev1.py:14(varsum)
400 0.000 0.000 0.003 0.000 {numpy.ndarray.sum}
300 0.000 0.000 0.002 0.000 stddev1.py:6(total)
100 0.000 0.000 0.001 0.000 {numpy.ndarray.mean}
…
should be exactly 1 minute
Vector Operations
Let’s try 4 billion!
Bump up that N...
Vector Operations
MemoryError
Oh, yeah...
Amongst Our Weaponry
Parallelization
with
multiprocessing
Parallelization
from multiprocessing import Pool
def run():
    results = Pool().map(
        run_one, range(segments))
    result = stddev(results)
    return result
Parallelization
def run_one(i):
    p = numpy.memmap(
        'stddev.%d' % i, dtype=float,
        mode='r', shape=(part_len,))
    T, L = p.sum(), float(len(p))
    m = T / L
    V = ((p - m) ** 2).sum()
    return T, L, V
Parallelization
def stddev(TLVs, ddof=0):
    final = 0.0
    totals = [T for T, L, V in TLVs]
    lengths = [L for T, L, V in TLVs]
    glength = sum(lengths)
    g = sum(totals) / glength
    for T, L, V in TLVs:
        m = T / L
        adj = ((2 * T * (m - g)) + ((g ** 2 - m ** 2) * L))
        final += V + adj
    return math.sqrt(final / (glength - ddof))
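The point of the (T, L, V) triples is that each worker returns three numbers instead of a whole array. A small self-contained check in Python 3 that combining the triples reproduces the ordinary sample standard deviation. It makes several assumptions: summarize() is a hypothetical pure-Python stand-in for run_one(), the builtin map() stands in for Pool().map(), and the stdlib statistics module is used as the reference instead of NumPy:

```python
import math
import statistics

def summarize(part):
    """Return (T, L, V): total, length, and squared deviations from the part mean."""
    T, L = sum(part), float(len(part))
    m = T / L
    V = sum((x - m) ** 2 for x in part)
    return T, L, V

def stddev(TLVs, ddof=0):
    glength = sum(L for T, L, V in TLVs)
    g = sum(T for T, L, V in TLVs) / glength   # global mean
    final = 0.0
    for T, L, V in TLVs:
        m = T / L                              # part mean
        final += V + (2 * T * (m - g)) + ((g ** 2 - m ** 2) * L)
    return math.sqrt(final / (glength - ddof))

data = [float(x) for x in range(1000)]
parts = [data[i:i + 100] for i in range(0, len(data), 100)]

# Each part is summarized independently, so this map can be farmed out.
TLVs = list(map(summarize, parts))

combined = stddev(TLVs, ddof=1)
direct = statistics.stdev(data)
print(combined, direct)
```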
Parallelization
3734 function calls in 0.024 seconds
6x slower
ncalls tottime percall cumtime percall filename:lineno(func)
1 0.000 0.000 0.024 0.024 stddev.py:47(run)
4 0.000 0.000 0.011 0.003 threading.py:234(wait)
22 0.011 0.000 0.011 0.000 {thread.lock.acquire}
1 0.000 0.000 0.011 0.011 pool.py:222(map)
1 0.000 0.000 0.008 0.008 pool.py:113(__init__)
4 0.001 0.000 0.005 0.001 process.py:116(start)
1 0.003 0.003 0.005 0.005 stddev.py:11(stddev)
4 0.000 0.000 0.004 0.001 forking.py:115(init)
4 0.003 0.001 0.003 0.001 {posix.fork}
...
Parallelization
Could that waiting be insignificant
when we scale up to 4 billion?
Let’s try it!
Parallelization
3766 function calls in 67.811 seconds
ncalls tottime percall cumtime percall filename:lineno(func)
1 0.000 0.000 67.811 67.811 stddev.py:47(run)
4 0.000 0.000 67.747 16.930 threading.py:234(wait)
22 67.747 3.079 67.747 3.079 {thread.lock.acquire}
1 0.000 0.000 67.747 67.747 pool.py:222(map)
1 0.000 0.000 0.062 0.060 pool.py:113(__init__)
4 0.000 0.000 0.058 0.014 process.py:116(start)
4 0.057 0.014 0.057 0.014 {posix.fork}
1 0.003 0.003 0.005 0.005 stddev.py:11(stddev)
2 0.002 0.001 0.002 0.001 {sum}
SO CLOSE! 1.13 minutes
Parallelization
def run_one(i):
    if i == 50:
        cProfile.runctx(..., "prf.50")
>>> import pstats
>>> s = pstats.Stats("prf.50")
>>> s.sort_stats("cumulative")
<pstats.Stats instance at 0x2bddcb0>
>>> _.print_stats()
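The same save-then-inspect workflow as a complete sketch for Python 3. The work() function and the "prf.demo" file name are placeholders, not the deck's run_one() or "prf.50". The profile is written to a file by cProfile.runctx and read back with pstats, which is what makes it usable from inside a worker process:

```python
import cProfile
import io
import os
import pstats
import tempfile

def work(n):
    # Placeholder for the function being profiled.
    return sum(i * i for i in range(n))

# Write the profile to a file instead of printing it.
path = os.path.join(tempfile.mkdtemp(), "prf.demo")
cProfile.runctx("work(10000)", globals(), locals(), path)

# Later (even from another process): load, sort, and print the report.
stream = io.StringIO()
stats = pstats.Stats(path, stream=stream)
stats.sort_stats("cumulative").print_stats()
report = stream.getvalue()
print(report)
```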
Parallelization
57 function calls in 2.804 seconds
ncalls tottime percall cumtime percall filename:lineno(func)
1 0.431 0.431 2.791 2.791 stddev.py:43(run_one)
2 0.000 0.000 2.360 1.180 numpy.ndarray.sum
2 2.360 1.180 2.360 1.180 numpy.ufunc.reduce
1 0.000 0.000 0.000 0.000 memmap.py:195(__new__)
Parallelization
def run_one(i):
    p = numpy.memmap(
        'stddev.%d' % i, dtype=float,
        mode='r', shape=(part_len,))
    T, L = p.sum(), float(len(p))
    m = T / L
    V = ((p - m) ** 2).sum()
    return T, L, V
200 seconds / 4 cores = 50 seconds
Parallelization? Serialization!
67.8 seconds for 4 billion rows, but
~50 of those are loading data!
17.8 seconds to do the actual math.
Serialization
import bloscpack as bp
bargs = bp.args.DEFAULT_BLOSC_ARGS
bargs['clevel'] = 6
bp.pack_ndarray_file(
    part, fname, blosc_args=bargs)
part = bp.unpack_ndarray_file(fname)
Serialization
Let’s try it!
I Crush
Your
Head!
I Crush Your Head!
1153 function calls in 26.166 seconds
ncalls tottime percall cumtime percall filename:lineno(func)
1 0.000 0.000 26.166 26.166 stddev_bp.py:56(run)
4 0.000 0.000 26.134 6.53 threading.py:234(wait)
22 26.134 1.188 26.134 1.188 thread.lock.acquire
1 0.000 0.000 26.133 26.133 pool.py:222(map)
1 0.000 0.000 26.133 26.133 pool.py:521(get)
1 0.000 0.000 26.133 26.133 pool.py:513(wait)
1 0.003 0.003 0.030 0.030 __init__.py:227(Pool)
1 0.000 0.000 0.021 0.021 pool.py:113(__init__)
I Crush Your Head!
With some time-tested general
programming techniques:
Extract loop invariants
Use language builtins
Reduce function calls
I Crush Your Head!
And some Python libraries
for architectural improvements:
Use NumPy for vector ops
Use multiprocessing for parallelization
Use bloscpack for compression
I Crush Your Head!
We sped up our calculation
so that it runs in:
0.003% of the time
or 27317 times faster
4.4 orders of magnitude
Crushing the Head of the Snake
Any questions?
@aminusfu
bob@crunch.io

More Related Content

What's hot

The Ring programming language version 1.9 book - Part 32 of 210
The Ring programming language version 1.9 book - Part 32 of 210The Ring programming language version 1.9 book - Part 32 of 210
The Ring programming language version 1.9 book - Part 32 of 210Mahmoud Samir Fayed
 
The Ring programming language version 1.9 book - Part 45 of 210
The Ring programming language version 1.9 book - Part 45 of 210The Ring programming language version 1.9 book - Part 45 of 210
The Ring programming language version 1.9 book - Part 45 of 210Mahmoud Samir Fayed
 
The Ring programming language version 1.8 book - Part 42 of 202
The Ring programming language version 1.8 book - Part 42 of 202The Ring programming language version 1.8 book - Part 42 of 202
The Ring programming language version 1.8 book - Part 42 of 202Mahmoud Samir Fayed
 
The Ring programming language version 1.8 book - Part 30 of 202
The Ring programming language version 1.8 book - Part 30 of 202The Ring programming language version 1.8 book - Part 30 of 202
The Ring programming language version 1.8 book - Part 30 of 202Mahmoud Samir Fayed
 
Numerical Algorithm for a few Special Functions
Numerical Algorithm for a few Special FunctionsNumerical Algorithm for a few Special Functions
Numerical Algorithm for a few Special FunctionsAmos Tsai
 
The Ring programming language version 1.3 book - Part 50 of 88
The Ring programming language version 1.3 book - Part 50 of 88The Ring programming language version 1.3 book - Part 50 of 88
The Ring programming language version 1.3 book - Part 50 of 88Mahmoud Samir Fayed
 
The Ring programming language version 1.10 book - Part 44 of 212
The Ring programming language version 1.10 book - Part 44 of 212The Ring programming language version 1.10 book - Part 44 of 212
The Ring programming language version 1.10 book - Part 44 of 212Mahmoud Samir Fayed
 
Optimization and Mathematical Programming in R and ROI - R Optimization Infra...
Optimization and Mathematical Programming in R and ROI - R Optimization Infra...Optimization and Mathematical Programming in R and ROI - R Optimization Infra...
Optimization and Mathematical Programming in R and ROI - R Optimization Infra...Dr. Volkan OBAN
 
The Ring programming language version 1.5.3 book - Part 77 of 184
The Ring programming language version 1.5.3 book - Part 77 of 184The Ring programming language version 1.5.3 book - Part 77 of 184
The Ring programming language version 1.5.3 book - Part 77 of 184Mahmoud Samir Fayed
 
The Ring programming language version 1.4 book - Part 18 of 30
The Ring programming language version 1.4 book - Part 18 of 30The Ring programming language version 1.4 book - Part 18 of 30
The Ring programming language version 1.4 book - Part 18 of 30Mahmoud Samir Fayed
 
Dive into EXPLAIN - PostgreSql
Dive into EXPLAIN  - PostgreSqlDive into EXPLAIN  - PostgreSql
Dive into EXPLAIN - PostgreSqlDmytro Shylovskyi
 
Functional Reactive Programming with Kotlin on Android - Giorgio Natili - Cod...
Functional Reactive Programming with Kotlin on Android - Giorgio Natili - Cod...Functional Reactive Programming with Kotlin on Android - Giorgio Natili - Cod...
Functional Reactive Programming with Kotlin on Android - Giorgio Natili - Cod...Codemotion
 
Time series-mining-slides
Time series-mining-slidesTime series-mining-slides
Time series-mining-slidesYanchang Zhao
 
제 6회 엑셈 수요 세미나 자료 연구컨텐츠팀
제 6회 엑셈 수요 세미나 자료 연구컨텐츠팀제 6회 엑셈 수요 세미나 자료 연구컨텐츠팀
제 6회 엑셈 수요 세미나 자료 연구컨텐츠팀EXEM
 
Matched filter detection
Matched filter detectionMatched filter detection
Matched filter detectionSURYA DEEPAK
 
Kotlin Coroutines. Flow is coming
Kotlin Coroutines. Flow is comingKotlin Coroutines. Flow is coming
Kotlin Coroutines. Flow is comingKirill Rozov
 
DSP_FOEHU - Lec 03 - Sampling of Continuous Time Signals
DSP_FOEHU - Lec 03 - Sampling of Continuous Time SignalsDSP_FOEHU - Lec 03 - Sampling of Continuous Time Signals
DSP_FOEHU - Lec 03 - Sampling of Continuous Time SignalsAmr E. Mohamed
 

What's hot (20)

The Ring programming language version 1.9 book - Part 32 of 210
The Ring programming language version 1.9 book - Part 32 of 210The Ring programming language version 1.9 book - Part 32 of 210
The Ring programming language version 1.9 book - Part 32 of 210
 
The Ring programming language version 1.9 book - Part 45 of 210
The Ring programming language version 1.9 book - Part 45 of 210The Ring programming language version 1.9 book - Part 45 of 210
The Ring programming language version 1.9 book - Part 45 of 210
 
The Ring programming language version 1.8 book - Part 42 of 202
The Ring programming language version 1.8 book - Part 42 of 202The Ring programming language version 1.8 book - Part 42 of 202
The Ring programming language version 1.8 book - Part 42 of 202
 
The Ring programming language version 1.8 book - Part 30 of 202
The Ring programming language version 1.8 book - Part 30 of 202The Ring programming language version 1.8 book - Part 30 of 202
The Ring programming language version 1.8 book - Part 30 of 202
 
Numerical Algorithm for a few Special Functions
Numerical Algorithm for a few Special FunctionsNumerical Algorithm for a few Special Functions
Numerical Algorithm for a few Special Functions
 
The Ring programming language version 1.3 book - Part 50 of 88
The Ring programming language version 1.3 book - Part 50 of 88The Ring programming language version 1.3 book - Part 50 of 88
The Ring programming language version 1.3 book - Part 50 of 88
 
The Ring programming language version 1.10 book - Part 44 of 212
The Ring programming language version 1.10 book - Part 44 of 212The Ring programming language version 1.10 book - Part 44 of 212
The Ring programming language version 1.10 book - Part 44 of 212
 
Optimization and Mathematical Programming in R and ROI - R Optimization Infra...
Optimization and Mathematical Programming in R and ROI - R Optimization Infra...Optimization and Mathematical Programming in R and ROI - R Optimization Infra...
Optimization and Mathematical Programming in R and ROI - R Optimization Infra...
 
The Ring programming language version 1.5.3 book - Part 77 of 184
The Ring programming language version 1.5.3 book - Part 77 of 184The Ring programming language version 1.5.3 book - Part 77 of 184
The Ring programming language version 1.5.3 book - Part 77 of 184
 
ScalaMeter 2012
ScalaMeter 2012ScalaMeter 2012
ScalaMeter 2012
 
The Ring programming language version 1.4 book - Part 18 of 30
The Ring programming language version 1.4 book - Part 18 of 30The Ring programming language version 1.4 book - Part 18 of 30
The Ring programming language version 1.4 book - Part 18 of 30
 
Dive into EXPLAIN - PostgreSql
Dive into EXPLAIN  - PostgreSqlDive into EXPLAIN  - PostgreSql
Dive into EXPLAIN - PostgreSql
 
Groovy
GroovyGroovy
Groovy
 
Functional Reactive Programming with Kotlin on Android - Giorgio Natili - Cod...
Functional Reactive Programming with Kotlin on Android - Giorgio Natili - Cod...Functional Reactive Programming with Kotlin on Android - Giorgio Natili - Cod...
Functional Reactive Programming with Kotlin on Android - Giorgio Natili - Cod...
 
Time series-mining-slides
Time series-mining-slidesTime series-mining-slides
Time series-mining-slides
 
Lesson10
Lesson10Lesson10
Lesson10
 
제 6회 엑셈 수요 세미나 자료 연구컨텐츠팀
제 6회 엑셈 수요 세미나 자료 연구컨텐츠팀제 6회 엑셈 수요 세미나 자료 연구컨텐츠팀
제 6회 엑셈 수요 세미나 자료 연구컨텐츠팀
 
Matched filter detection
Matched filter detectionMatched filter detection
Matched filter detection
 
Kotlin Coroutines. Flow is coming
Kotlin Coroutines. Flow is comingKotlin Coroutines. Flow is coming
Kotlin Coroutines. Flow is coming
 
DSP_FOEHU - Lec 03 - Sampling of Continuous Time Signals
DSP_FOEHU - Lec 03 - Sampling of Continuous Time SignalsDSP_FOEHU - Lec 03 - Sampling of Continuous Time Signals
DSP_FOEHU - Lec 03 - Sampling of Continuous Time Signals
 

Viewers also liked

Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014
Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014
Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014PyData
 
Robert Meyer- pypet
Robert Meyer- pypetRobert Meyer- pypet
Robert Meyer- pypetPyData
 
Speed Without Drag by Saul Diez-Guerra PyData SV 2014
Speed Without Drag by Saul Diez-Guerra PyData SV 2014Speed Without Drag by Saul Diez-Guerra PyData SV 2014
Speed Without Drag by Saul Diez-Guerra PyData SV 2014PyData
 
Evolutionary Algorithms: Perfecting the Art of "Good Enough"
Evolutionary Algorithms: Perfecting the Art of "Good Enough"Evolutionary Algorithms: Perfecting the Art of "Good Enough"
Evolutionary Algorithms: Perfecting the Art of "Good Enough"PyData
 
Brains & Brawn: the Logic and Implementation of a Redesigned Advertising Mark...
Brains & Brawn: the Logic and Implementation of a Redesigned Advertising Mark...Brains & Brawn: the Logic and Implementation of a Redesigned Advertising Mark...
Brains & Brawn: the Logic and Implementation of a Redesigned Advertising Mark...PyData
 
Nipype
NipypeNipype
NipypePyData
 
Dark Data: A Data Scientists Exploration of the Unknown by Rob Witoff PyData ...
Dark Data: A Data Scientists Exploration of the Unknown by Rob Witoff PyData ...Dark Data: A Data Scientists Exploration of the Unknown by Rob Witoff PyData ...
Dark Data: A Data Scientists Exploration of the Unknown by Rob Witoff PyData ...PyData
 
Numba: Flexible analytics written in Python with machine-code speeds and avo...
Numba:  Flexible analytics written in Python with machine-code speeds and avo...Numba:  Flexible analytics written in Python with machine-code speeds and avo...
Numba: Flexible analytics written in Python with machine-code speeds and avo...PyData
 
Interactive Financial Analytics with Python & Ipython by Dr Yves Hilpisch
Interactive Financial Analytics with Python & Ipython by Dr Yves HilpischInteractive Financial Analytics with Python & Ipython by Dr Yves Hilpisch
Interactive Financial Analytics with Python & Ipython by Dr Yves HilpischPyData
 
How Soon is Now: automatically extracting publication dates of news articles ...
How Soon is Now: automatically extracting publication dates of news articles ...How Soon is Now: automatically extracting publication dates of news articles ...
How Soon is Now: automatically extracting publication dates of news articles ...PyData
 
Faster Python Programs Through Optimization by Dr.-Ing Mike Muller
Faster Python Programs Through Optimization by Dr.-Ing Mike MullerFaster Python Programs Through Optimization by Dr.-Ing Mike Muller
Faster Python Programs Through Optimization by Dr.-Ing Mike MullerPyData
 
Python resampling
Python resamplingPython resampling
Python resamplingPyData
 
Doing frequentist statistics with scipy
Doing frequentist statistics with scipyDoing frequentist statistics with scipy
Doing frequentist statistics with scipyPyData
 
Fang Xu- Enriching content with Knowledge Base by Search Keywords and Wikidata
Fang Xu- Enriching content with Knowledge Base by Search Keywords and WikidataFang Xu- Enriching content with Knowledge Base by Search Keywords and Wikidata
Fang Xu- Enriching content with Knowledge Base by Search Keywords and WikidataPyData
 
Low-rank matrix approximations in Python by Christian Thurau PyData 2014
Low-rank matrix approximations in Python by Christian Thurau PyData 2014Low-rank matrix approximations in Python by Christian Thurau PyData 2014
Low-rank matrix approximations in Python by Christian Thurau PyData 2014PyData
 
Promoting a Data Driven Culture in a Microservices Environment
Promoting a Data Driven Culture in a Microservices EnvironmentPromoting a Data Driven Culture in a Microservices Environment
Promoting a Data Driven Culture in a Microservices EnvironmentPyData
 
Making your code faster cython and parallel processing in the jupyter notebook
Making your code faster   cython and parallel processing in the jupyter notebookMaking your code faster   cython and parallel processing in the jupyter notebook
Making your code faster cython and parallel processing in the jupyter notebookPyData
 
Large scale-ctr-prediction lessons-learned-florian-hartl
Large scale-ctr-prediction lessons-learned-florian-hartlLarge scale-ctr-prediction lessons-learned-florian-hartl
Large scale-ctr-prediction lessons-learned-florian-hartlPyData
 
GraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational DatabasesGraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational DatabasesPyData
 
Embracing the Monolith in Small Teams: Doubling down on python to move fast w...
Embracing the Monolith in Small Teams: Doubling down on python to move fast w...Embracing the Monolith in Small Teams: Doubling down on python to move fast w...
Embracing the Monolith in Small Teams: Doubling down on python to move fast w...PyData
 

Viewers also liked (20)

Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014
Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014
Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014
 
Robert Meyer- pypet
Robert Meyer- pypetRobert Meyer- pypet
Robert Meyer- pypet
 
Speed Without Drag by Saul Diez-Guerra PyData SV 2014
Speed Without Drag by Saul Diez-Guerra PyData SV 2014Speed Without Drag by Saul Diez-Guerra PyData SV 2014
Speed Without Drag by Saul Diez-Guerra PyData SV 2014
 
Evolutionary Algorithms: Perfecting the Art of "Good Enough"
Evolutionary Algorithms: Perfecting the Art of "Good Enough"Evolutionary Algorithms: Perfecting the Art of "Good Enough"
Evolutionary Algorithms: Perfecting the Art of "Good Enough"
 
Brains & Brawn: the Logic and Implementation of a Redesigned Advertising Mark...
Brains & Brawn: the Logic and Implementation of a Redesigned Advertising Mark...Brains & Brawn: the Logic and Implementation of a Redesigned Advertising Mark...
Brains & Brawn: the Logic and Implementation of a Redesigned Advertising Mark...
 
Nipype
NipypeNipype
Nipype
 
Crushing the Head of the Snake by Robert Brewer PyData SV 2014

  • 1. Crushing the Head of the Snake Robert Brewer Chief Architect Crunch.io
  • 2. How to Time
    >>> from timeit import Timer
    >>> range(5)
    [0, 1, 2, 3, 4]
    >>> t = Timer("range(a)", "a = 1000000")
    >>> t.timeit(1)
    0.028472900390625
    >>> t.timeit(100)
    1.8600409030914307
    >>> t.timeit(1000)
    18.056041955947876
  • 3. Comparing algorithms
    >>> Timer("range(1000)").timeit(1000000)  # 1000000 is the default number of runs
    >>> Timer("range(1000)").timeit()
    11.392634868621826
    >>> Timer("xrange(1000)").timeit()
    0.20040297508239746
    >>> Timer("list(xrange(1000))").timeit()
    12.207480907440186
  • 5. Caveat: Wall time not CPU time
    >>> Timer("xrange(1000)").timeit()
    0.20040297508239746
    >>> Timer("xrange(1000)").repeat(3)
    [0.20735883712768555, 0.1968221664428711, 0.18882489204406738]
    Take the minimum.
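The "take the minimum" advice on slide 5 can be sketched as below. This is a Python 3 sketch, whereas the slides use Python 2; in Python 3 `range` is already lazy, so it plays the role of `xrange`. The helper name `best_time` and the loop counts are assumptions chosen for illustration, not part of the talk.

```python
from timeit import Timer


def best_time(stmt, setup="pass", repeat=3, number=100000):
    """Return the smallest total time over `repeat` runs of `number` loops.

    `best_time` is a hypothetical helper, not something defined in the slides.
    """
    # Timer.repeat returns one total wall-clock time per run. The minimum is
    # the run least disturbed by whatever else the machine was doing.
    return min(Timer(stmt, setup).repeat(repeat=repeat, number=number))


lazy = best_time("range(1000)")         # lazy range object, like Python 2 xrange
eager = best_time("list(range(1000))")  # materialises all 1000 integers
print("range(1000):       %.4f s" % lazy)
print("list(range(1000)): %.4f s" % eager)
```

The absolute numbers depend on the machine and interpreter, so only the relative ordering of the two statements is meaningful.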
  • 6. How to Profile >>> import mod >>> import cProfile >>> cProfile.run("mod.b()", sort="cumulative")
  • 7. How to Profile >>> import mod >>> import cProfile >>> cProfile.run("mod.b()", sort="cumulative") (make changes to module) >>> reload(mod) >>> cProfile.run("mod.b()", sort="cumulative")
  • 8. How to Profile
      >>> cProfile.run("for i in xrange(3000): range(i).sort()", sort="cumulative")
      6002 function calls in 0.093 seconds
      Ordered by: cumulative time
      ncalls  tottime  percall  cumtime  percall  filename:lineno(func)
           1    0.019    0.019    0.093    0.093  <string>:1(<module>)
        3000    0.052    0.000    0.052    0.000  {list.sort}
        3000    0.022    0.000    0.022    0.000  {range}
           1    0.000    0.000    0.000    0.000  {method 'disable' of '_lsprof.Profiler' objects}
  • 9. How to Profile
      6002 function calls in 0.093 seconds
      ncalls  tottime  percall  cumtime  percall  filename:lineno(func)
        3000    0.052    0.000    0.052    0.000  {list.sort}
        3000    0.022    0.000    0.022    0.000  {range}
  • 10. Example: Standard Deviation
      >>> import numpy
      >>> n = 100
      >>> a = numpy.array(xrange(n), dtype=float)
      >>> a.std(ddof=1)
      29.011491975882016
  • 11. Example: Standard Deviation
      >>> n = 4000000000
      >>> a = numpy.array(xrange(n), dtype=float)
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      ValueError: setting an array element with a sequence.
  • 12. Example: Standard Deviation
      >>> n = 4000000000
      >>> arr = numpy.zeros(n, dtype=float)
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      MemoryError
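  • 14. Example: Standard Deviation
      Given array $A$ broken into $n$ parts $a_1 \dots a_n$, with local variance $V(a_i) = \sum_j (a_{ij} - \bar{a}_i)^2$:
      $$\sigma = \sqrt{\frac{\sum_{i=1}^{n} \left[ V(a_i) + 2 \left( \sum_j a_{ij} \right) \left( \bar{a}_i - \bar{A} \right) + |a_i| \left( \bar{A}^2 - \bar{a}_i^2 \right) \right]}{|A| - \mathrm{ddof}}}$$
      Here $\bar{a}_i$ is the mean of part $a_i$, $\bar{A}$ is the mean of the whole array, and $|\cdot|$ is the number of elements.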
  • 15. Example: Standard Deviation
      def run():
          points = 400000  # the goal is 4000000000 (4 billion)
          segments = 100
          part_len = points / segments
          partitions = []
          for p in range(segments):
              part = range(part_len * p, part_len * (p + 1))
              partitions.append(part)
          return stddev(partitions, ddof=1)
  • 16. Example: Standard Deviation
      def stddev(partitions, ddof=0):
          final = 0.0
          for part in partitions:
              m = total(part) / length(part)
              # Find the mean of the entire group.
              gtotal = total([total(p) for p in partitions])
              glength = total([length(p) for p in partitions])
              g = gtotal / glength
              adj = ((2 * total(part) * (m - g)) +
                     ((g ** 2 - m ** 2) * length(part)))
              final += varsum(part) + adj
          return math.sqrt(final / (glength - ddof))
  • 17. Example: Standard Deviation
      2052106 function calls in 71.025 seconds
      ncalls  tottime  percall  cumtime  percall  filename:lineno(func)
           1    0.000    0.000   71.023   71.023  stddev.py:39(run)
           1    0.006    0.006   71.013   71.013  stddev.py:22(stddev)
      410400   63.406    0.000   70.490    0.000  stddev.py:4(total)
         100    0.341    0.003   69.178    0.692  stddev.py:15(varsum)
      410601    7.076    0.000    7.076    0.000  {range}
      410200    0.151    0.000    0.174    0.000  stddev.py:11(length)
      820700    0.042    0.000    0.042    0.000  {len}
         100    0.000    0.000    0.000    0.000  {list.append}
           1    0.000    0.000    0.000    0.000  {math.sqrt}
  • 18. Example: Standard Deviation
      400,000 rows in 71.025 seconds.
      Assuming no other effects of scale, it will take 197.3 hours (over 8 days) to calculate our 4 billion-row array.
  • 19. Example: Standard Deviation
      Can we calculate our 4 billion-row array in 1 minute? That’s 400,000 rows in 6 ms.
      All we need is an 11,837.5x speedup.
  • 23. Extracting Loop Invariants
      def varsum(arr):
          vs = 0
          for j in range(len(arr)):
              mean = (total(arr) / length(arr))
              vs += (arr[j] - mean) ** 2
          return vs
  • 24. Extracting Loop Invariants
      def varsum(arr):
          vs = 0
          mean = (total(arr) / length(arr))
          for j in range(len(arr)):
              vs += (arr[j] - mean) ** 2
          return vs
  • 25. Extracting Loop Invariants
      52606 calls in 1.944 seconds (36x)
      ncalls  tottime  percall  cumtime  percall  filename:lineno(func)
           1    0.000    0.000    1.942    1.942  stddev1.py:41(run)
           1    0.006    0.006    1.932    1.932  stddev1.py:23(stddev)
       10500    1.673    0.000    1.859    0.000  stddev1.py:4(total)
       10701    0.196    0.000    0.196    0.000  {range}
         100    0.062    0.001    0.081    0.001  stddev1.py:15(varsum)
       10300    0.003    0.000    0.003    0.000  stddev1.py:11(length)
       20900    0.001    0.000    0.001    0.000  {len}
         100    0.000    0.000    0.000    0.000  {list.append}
           1    0.000    0.000    0.000    0.000  {math.sqrt}
      still 5.4 hrs
  • 26. Extracting Loop Invariants
      def stddev(partitions, ddof=0):
          final = 0.0
          for part in partitions:
              m = total(part) / length(part)
              # Find the mean of the entire group.
              gtotal = total([total(p) for p in partitions])
              glength = total([length(p) for p in partitions])
              g = gtotal / glength
              adj = ((2 * total(part) * (m - g)) +
                     ((g ** 2 - m ** 2) * length(part)))
              final += varsum(part) + adj
          return math.sqrt(final / (glength - ddof))
  • 27. Extracting Loop Invariants
      def stddev(partitions, ddof=0):
          final = 0.0
          # Find the mean of the entire group.
          gtotal = total([total(p) for p in partitions])
          glength = total([length(p) for p in partitions])
          g = gtotal / glength
          for part in partitions:
              m = total(part) / length(part)
              adj = ((2 * total(part) * (m - g)) +
                     ((g ** 2 - m ** 2) * length(part)))
              final += varsum(part) + adj
          return math.sqrt(final / (glength - ddof))
  • 28. Extracting Loop Invariants
      2512 function calls in 0.142 seconds (13x)
      ncalls  tottime  percall  cumtime  percall  filename:lineno(func)
           1    0.000    0.000    0.140    0.140  stddev1.py:42(run)
           1    0.000    0.000    0.136    0.136  stddev1.py:23(stddev)
         100    0.063    0.001    0.082    0.001  stddev1.py:15(varsum)
         402    0.064    0.000    0.071    0.000  stddev1.py:4(total)
         603    0.013    0.000    0.013    0.000  {range}
         400    0.000    0.000    0.000    0.000  stddev1.py:11(length)
         902    0.000    0.000    0.000    0.000  {len}
         100    0.000    0.000    0.000    0.000  {list.append}
           1    0.000    0.000    0.000    0.000  {math.sqrt}
      still 23 minutes
  • 29. Amongst Our Weaponry
      Use builtin Python functions whenever possible
  • 30. Use Python Builtins
      def total(arr):
          s = 0
          for j in range(len(arr)):
              s += arr[j]
          return s
  • 31. Use Python Builtins
      Before:
      def total(arr):
          s = 0
          for j in range(len(arr)):
              s += arr[j]
          return s
      After:
      def total(arr):
          return sum(arr)
  • 32. Use Python Builtins
      2110 function calls in 0.096 seconds (1.47x)
      ncalls  tottime  percall  cumtime  percall  filename:lineno(func)
           1    0.000    0.000    0.093    0.093  stddev1.py:39(run)
           1    0.000    0.000    0.083    0.083  stddev1.py:20(stddev)
         100    0.065    0.001    0.070    0.001  stddev1.py:12(varsum)
         402    0.000    0.000    0.015    0.000  stddev1.py:4(total)
         402    0.015    0.000    0.015    0.000  {sum}
         201    0.012    0.000    0.012    0.000  {range}
         400    0.000    0.000    0.000    0.000  stddev1.py:8(length)
         500    0.000    0.000    0.000    0.000  {len}
         100    0.000    0.000    0.000    0.000  {list.append}
           1    0.000    0.000    0.000    0.000  {math.sqrt}
      still 16 minutes
  • 34. Use Python Builtins
      def varsum(arr):
          vs = 0
          mean = (total(arr) / length(arr))
          for j in range(len(arr)):
              vs += (arr[j] - mean) ** 2
          return vs
  • 35. Use Python Builtins
      def varsum(arr):
          mean = (total(arr) / length(arr))
          return sum((v - mean) ** 2 for v in arr)
  • 36. Use Python Builtins
      402110 function calls in 0.122 seconds (1.27x slower)
      ncalls  tottime  percall  cumtime  percall  filename:lineno(func)
           1    0.000    0.000    0.120    0.120  stddev.py:36(run)
           1    0.000    0.000    0.115    0.115  stddev.py:17(stddev)
         502    0.044    0.000    0.114    0.000  {sum}
         100    0.000    0.000    0.106    0.001  stddev.py:12(varsum)
      400100    0.070    0.000    0.070    0.000  stddev.py:14(genexpr)
         402    0.000    0.000    0.011    0.000  stddev.py:4(total)
      …
  • 38. Amongst Our Weaponry
      Reduce function calls
  • 39. Reduce Function Calls
      >>> Timer("sum(a)", "a = range(10)").repeat(3)
      [0.15801000595092773, 0.1406857967376709, 0.14577603340148926]
      >>> Timer("total(a)", "a = range(10); total = lambda x: sum(x)").repeat(3)
      [0.2066800594329834, 0.1998300552368164, 0.21536493301391602]
      The wrapper costs about 0.000000059 seconds per extra call.
  • 40. Reduce Function Calls
      def variances_squared(arr):
          mean = (total(arr) / length(arr))
          for v in arr:
              yield (v - mean) ** 2
  • 41. Reduce Function Calls
      Before (generator expression):
      def varsum(arr):
          mean = (total(arr) / length(arr))
          return sum((v - mean) ** 2 for v in arr)
      After (list comprehension):
      def varsum(arr):
          mean = (total(arr) / length(arr))
          return sum([(v - mean) ** 2 for v in arr])
  • 42. Reduce Function Calls
      2010 function calls in 0.082 seconds (1.17x)
      ncalls  tottime  percall  cumtime  percall  filename:lineno(func)
           1    0.000    0.000    0.080    0.080  stddev.py:36(run)
           1    0.000    0.000    0.071    0.071  stddev.py:17(stddev)
         100    0.050    0.001    0.056    0.001  stddev.py:12(varsum)
         502    0.020    0.000    0.020    0.000  {sum}
         402    0.000    0.000    0.016    0.000  stddev.py:4(total)
         101    0.009    0.000    0.009    0.000  {range}
         400    0.000    0.000    0.000    0.000  stddev.py:8(length)
         400    0.000    0.000    0.000    0.000  {len}
         100    0.000    0.000    0.000    0.000  {list.append}
           1    0.000    0.000    0.000    0.000  {math.sqrt}
      still 13+ minutes
  • 43. Amongst Our Weaponry
      Vector operations with NumPy
  • 44. Vector Operations
      part = numpy.array(xrange(...), dtype=float)
      def total(arr):
          return arr.sum()
      def varsum(arr):
          return ((arr - arr.mean()) ** 2).sum()
  • 45. Vector Operations
      3408 function calls in 0.057 seconds (1.43x)
      ncalls  tottime  percall  cumtime  percall  filename:lineno(func)
           1    0.000    0.000    0.057    0.057  stddev1.py:37(run)
         200    0.051    0.000    0.051    0.000  {numpy...array}
           1    0.001    0.001    0.006    0.006  stddev1.py:18(stddev)
         500    0.003    0.000    0.003    0.000  {numpy.ufunc.reduce}
         100    0.001    0.000    0.003    0.000  stddev1.py:14(varsum)
         400    0.000    0.000    0.003    0.000  {numpy.ndarray.sum}
         300    0.000    0.000    0.002    0.000  stddev1.py:6(total)
         100    0.000    0.000    0.001    0.000  {numpy.ndarray.mean}
      …
      still 9.5 minutes
  • 47. Vector Operations
      3408 function calls in 0.006 seconds (13.6x)
      ncalls  tottime  percall  cumtime  percall  filename:lineno(func)
           1    0.001    0.001    0.006    0.006  stddev1.py:18(stddev)
         500    0.003    0.000    0.003    0.000  {numpy.ufunc.reduce}
         100    0.001    0.000    0.003    0.000  stddev1.py:14(varsum)
         400    0.000    0.000    0.003    0.000  {numpy.ndarray.sum}
         300    0.000    0.000    0.002    0.000  stddev1.py:6(total)
         100    0.000    0.000    0.001    0.000  {numpy.ndarray.mean}
      …
      should be exactly 1 minute
  • 48. Vector Operations
      Let’s try 4 billion! Bump up that N...
  • 51. Parallelization
      from multiprocessing import Pool
      def run():
          results = Pool().map(run_one, range(segments))
          result = stddev(results)
          return result
  • 52. Parallelization
      def run_one(i):
          p = numpy.memmap('stddev.%d' % i, dtype=float,
                           mode='r', shape=(part_len,))
          T, L = p.sum(), float(len(p))
          m = T / L
          V = ((p - m) ** 2).sum()
          return T, L, V
  • 53. Parallelization
      def stddev(TLVs, ddof=0):
          final = 0.0
          totals = [T for T, L, V in TLVs]
          lengths = [L for T, L, V in TLVs]
          glength = sum(lengths)
          g = sum(totals) / glength
          for T, L, V in TLVs:
              m = T / L
              adj = ((2 * T * (m - g)) + ((g ** 2 - m ** 2) * L))
              final += V + adj
          return math.sqrt(final / (glength - ddof))
  • 54. Parallelization
      3734 function calls in 0.024 seconds (6x slower)
      ncalls  tottime  percall  cumtime  percall  filename:lineno(func)
           1    0.000    0.000    0.024    0.024  stddev.py:47(run)
           4    0.000    0.000    0.011    0.003  threading.py:234(wait)
          22    0.011    0.000    0.011    0.000  {thread.lock.acquire}
           1    0.000    0.000    0.011    0.011  pool.py:222(map)
           1    0.000    0.000    0.008    0.008  pool.py:113(__init__)
           4    0.001    0.000    0.005    0.001  process.py:116(start)
           1    0.003    0.003    0.005    0.005  stddev.py:11(stddev)
           4    0.000    0.000    0.004    0.001  forking.py:115(init)
           4    0.003    0.001    0.003    0.001  {posix.fork}
      ...
  • 55. Parallelization
      Could that waiting be insignificant when we scale up to 4 billion? Let’s try it!
  • 56. Parallelization
      3766 function calls in 67.811 seconds
      ncalls  tottime  percall  cumtime  percall  filename:lineno(func)
           1    0.000    0.000   67.811   67.811  stddev.py:47(run)
           4    0.000    0.000   67.747   16.930  threading.py:234(wait)
          22   67.747    3.079   67.747    3.079  {thread.lock.acquire}
           1    0.000    0.000   67.747   67.747  pool.py:222(map)
           1    0.000    0.000    0.062    0.060  pool.py:113(__init__)
           4    0.000    0.000    0.058    0.014  process.py:116(start)
           4    0.057    0.014    0.057    0.014  {posix.fork}
           1    0.003    0.003    0.005    0.005  stddev.py:11(stddev)
           2    0.002    0.001    0.002    0.001  {sum}
      SO CLOSE! 1.13 minutes
  • 57. Parallelization
      def run_one(i):
          if i == 50:
              cProfile.runctx(..., "prf.50")
      >>> import pstats
      >>> s = pstats.Stats("prf.50")
      >>> s.sort_stats("cumulative")
      <pstats.Stats instance at 0x2bddcb0>
      >>> _.print_stats()
  • 58. Parallelization
      57 function calls in 2.804 seconds
      ncalls  tottime  percall  cumtime  percall  filename:lineno(func)
           1    0.431    0.431    2.791    2.791  stddev.py:43(run_one)
           2    0.000    0.000    2.360    1.180  numpy.ndarray.sum
           2    2.360    1.180    2.360    1.180  numpy.ufunc.reduce
           1    0.000    0.000    0.000    0.000  memmap.py:195(__new__)
  • 59. Parallelization
      def run_one(i):
          p = numpy.memmap('stddev.%d' % i, dtype=float,
                           mode='r', shape=(part_len,))
          T, L = p.sum(), float(len(p))
          m = T / L
          V = ((p - m) ** 2).sum()
          return T, L, V
      200 seconds / 4 cores = 50
  • 60. Parallelization? Serialization!
      67.8 seconds for 4 billion rows, but ~50 of those are loading data!
      17.8 seconds to do the actual math.
  • 61. Serialization
      import bloscpack as bp
      bargs = bp.args.DEFAULT_BLOSC_ARGS
      bargs['clevel'] = 6
      bp.pack_ndarray_file(part, fname, blosc_args=bargs)
      part = bp.unpack_ndarray_file(fname)
  • 64. I Crush Your Head!
      1153 function calls in 26.166 seconds
      ncalls  tottime  percall  cumtime  percall  filename:lineno(func)
           1    0.000    0.000   26.166   26.166  stddev_bp.py:56(run)
           4    0.000    0.000   26.134     6.53  threading.py:234(wait)
          22   26.134    1.188   26.134    1.188  thread.lock.acquire
           1    0.000    0.000   26.133   26.133  pool.py:222(map)
           1    0.000    0.000   26.133   26.133  pool.py:521(get)
           1    0.000    0.000   26.133   26.133  pool.py:513(wait)
           1    0.003    0.003    0.030    0.030  __init__.py:227(Pool)
           1    0.000    0.000    0.021    0.021  pool.py:113(__init__)
  • 65. I Crush Your Head!
      With some time-tested general programming techniques:
      Extract loop invariants
      Use language builtins
      Reduce function calls
  • 66. I Crush Your Head!
      And some Python libraries for architectural improvements:
      Use NumPy for vector ops
      Use multiprocessing for parallelization
      Use bloscpack for compression
  • 67. I Crush Your Head!
      We sped up our calculation so that it runs in 0.003% of the time, or 27317 times faster: 4.4 orders of magnitude.
  • 68. Crushing the Head of the Snake
      Any questions? @aminusfu, bob@crunch.io
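
The partitioned standard deviation from slides 14 and 53 can be checked on a small input. The sketch below is not the speaker's code. It is a minimal illustration that assumes Python 3 and only the standard library, where the slides use Python 2 and NumPy. The helper names `summarize`, `combined_stddev` and `demo` are made up for this example. Each partition is reduced to a (T, L, V) triple as in `run_one`, the triples are combined with the slide-53 adjustment term, and the result is compared with `statistics.stdev` computed over the whole data set.

```python
import math
import statistics


def summarize(part):
    """Reduce one partition to (T, L, V): sum, length, and the sum of
    squared deviations from the partition's own mean (as in run_one)."""
    T = math.fsum(part)
    L = len(part)
    m = T / L
    V = math.fsum((x - m) ** 2 for x in part)
    return T, L, V


def combined_stddev(summaries, ddof=0):
    """Combine per-partition (T, L, V) triples into one standard deviation,
    using the adjustment term from the slide-53 stddev()."""
    glength = sum(L for _, L, _ in summaries)
    g = math.fsum(T for T, _, _ in summaries) / glength  # mean of all data
    final = 0.0
    for T, L, V in summaries:
        m = T / L  # mean of this partition
        final += V + 2 * T * (m - g) + (g ** 2 - m ** 2) * L
    return math.sqrt(final / (glength - ddof))


def demo(points=100, segments=10):
    """Split range(points) into equal partitions and compare both methods."""
    part_len = points // segments  # integer division, as Python 2's "/" on ints
    parts = [range(part_len * p, part_len * (p + 1)) for p in range(segments)]
    partitioned = combined_stddev([summarize(p) for p in parts], ddof=1)
    direct = statistics.stdev([float(x) for x in range(points)])
    return partitioned, direct


if __name__ == "__main__":
    print(demo())
```

With `points=100` both values should agree with the `a.std(ddof=1)` result on slide 10 to within floating-point rounding. Because only the three numbers per partition cross the process boundary, the same combination step works unchanged when `summarize` runs in worker processes.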