SlideShare a Scribd company logo
1 of 43
Download to read offline
New Trends In Storing
And Analyzing Large
Data Silos With Python
Francesc Alted!
Freelancer
(ÜberResearch, University of Oslo)
!
April 3rd, 2015
About ÜberResearch
• Team’s 10+ years experience delivering solutions
and services for funding and research institutions
• Over 20 development partners, and clients
globally, from smallest non-profits to large
government agencies
• Portfolio company of Digital Science (Macmillan
Publishers), the younger sibling of the Nature
Publishing Group
http://www.uberresearch.com/
Data$layer$
Data$
Enrichment$
Func1onal$
layer$
Applica1on$
layer$
Global$grant$
database$$
Publica1ons,$
trials,$patents$
Internal$data,$
databases$
Data$model$
mapping$/$
cleansing$
Ins1tu1on$
disambigua1on$
Person$
disambigua1on$
Search$
Clustering$/$
topic$
modelling$
Research$
classifica1on$
support$
Visualisa1on$
support$
APIs$
NLP$/$noun$
phrase$extrac1on$
Customer$
services$and$APIs$
Customer$$
func1onali1es$
Thesaurus$support$
for$indexing$
Integra1on$in$customer$
applica1on$and$scenarios$
About Me
• Physicist by training
• Computer scientist by passion
• Open Source enthusiast by philosophy
• PyTables (2002 - 2011)
• Blosc (2009 - now)
• bcolz (2010 - now)
–Manuel Oltra, music composer
“The art is in the execution of an idea. Not in the
idea. There is not much left just from an idea.”
“Real artists ship”
–Seth Godin, writer
Why Free/Libre Projects?
• Nice way to realize yourself while helping others
OPSI
Out-of-core
Expressions
Indexed
Queries
+ a Twist
Overview
• The need for speed: fitting and analyzing as much
data as possible with your existing resources
• Recent trends in computer hardware
• bcolz: an example of data container for large
datasets following the principles of newer computer
architectures
The Need For Speed
Don’t Forget Python’s Real
Strengths
• Interactivity
• Data-oriented libraries (NumPy, Pandas, Scikit-
Learn…)
• Interactivity
• Performance (thanks to Cython, SWIG, f2py…)
• Interactivity (did I mentioned that already?)
The Need For Speed
• But interactivity without performance in Big Data is
a no go
• Designing code for data storage performance
depends very much on computer architecture
• IMO, existing Python libraries need more effort in
getting the most out of existing and future
computer architectures
The Daily Python Working
Scenario
Quiz: which computer is best for interactivity?
Although Modern Servers/Laptops
Can Be Very Complex Beasts
We need to know them better so as 

to get the most out of them
Recent Trends In
Computer Hardware
“There's Plenty of Room at the Bottom”

An Invitation to Enter a New Field of Physics
—Talk by Richard Feynman at Caltech, 1959
Memory Access Time
vs CPU Cycle Time
The gap is wide and still opening!
Computer Architecture
Evolution
Up to end 80’s 90’s and 2000’s 2010’s
Figure 1. Evolution of the hierarchical memory model. (a) The primordial (and simplest) model; (b) the most common current
Mechanical disk Mechanical disk Mechanical disk
Speed
Capacity
Solid state disk
Main memory
Level 3 cache
Level 2 cache
Level 1 cache
Level 2 cache
Level 1 cache
Main memoryMain memory
CPUCPU
(a) (b) (c)
Central
processing
unit (CPU)
Latency Numbers Every
Programmer Should Know
Latency Comparison Numbers
--------------------------
L1 cache reference 0.5 ns
Branch mispredict 5 ns
L2 cache reference 7 ns 14x L1 cache
Mutex lock/unlock 25 ns
Main memory reference 100 ns 20x L2 cache, 200x L1 cache
Read 4K randomly from memory 1,000 ns 0.001 ms
Compress 1K bytes with Zippy 3,000 ns
Send 1K bytes over 1 Gbps network 10,000 ns 0.01 ms
Read 4K randomly from SSD* 150,000 ns 0.15 ms
Read 1 MB sequentially from memory 250,000 ns 0.25 ms
Round trip within same datacenter 500,000 ns 0.5 ms
Read 1 MB sequentially from SSD* 1,000,000 ns 1 ms 4X memory
Disk seek 10,000,000 ns 10 ms 20x datacenter roundtrip
Read 1 MB sequentially from disk 20,000,000 ns 20 ms 80x memory, 20X SSD
Send packet CA->Netherlands->CA 150,000,000 ns 150 ms
Source: Jeff Dean and Peter Norvig (Google), with some additions
https://gist.github.com/hellerbarde/2843375
tref
ttrans
CPU cache
CPU cache
Block in storage
to transmit
to CPU
Reference Time vs
Transmission Time
tref ~= ttrans => optimizes memory access
Not All Storage Layers Are
Created Equal
Memory: tref: 100 ns / ttrans (1 KB): ~100 ns
Solid State Disk: tref: 10 us / ttrans (4 KB): ~10 us
Mechanical Disk: tref: 10 ms / ttrans (1 MB): ~10 ms
This has profound implications on how you access storage!
The slower the media, the larger the block 

that is worth to transmit
We Are In A Multicore Age
• This requires special programming measures to
leverage all its potential: threads, multiprocessing
SIMD: Single Instruction,
Multiple Data
More operations in the same CPU clock
The growing gap
between DRAM and
HDD is facilitating
the introduction of

new SDD devices
Forthcoming Trends (I)
Forthcoming Trends (II)
CPU+GPU

Integration
Bcolz: An Example Of Data
Containers Applying The
Principles Of New Hardware
What is bcolz?
• bcolz provides data containers that can be used
in a similar way than the ones in NumPy, Pandas
• The main difference is that data storage is
chunked, not contiguous!
• Two flavors:
• carray: homogenous, n-dim data types
• ctable: heterogeneous types, columnar
Contiguous vs Chunked
NumPy container
Contiguous memory
carray container
chunk 1
chunk 2
Discontiguous memory
chunk N
...
Why Chunking?
• Chunking means more difficulty handling data, so
why bother?
• Efficient enlarging and shrinking
• Compression is possible
• Chunk size can be adapted to the storage
layer (memory, SSD, mechanical disk)
Why Columnar?
• Because it adapts better to newer computer
architectures
String …
String Int32 Float64 Int16
String …
String Int32 Float64 Int16
String …
String Int32 Float64 Int16
String …
String Int32 Float64 Int16
Interesting column
Interesting Data: N * 4 bytes (Int32)	

Actual Data Read: N * 64 bytes (cache line)
}N rows
In-Memory Row-Wise Table
(Structured NumPy array)
String …
String Int32 Float64 Int16
String …
String Int32 Float64 Int16
String …
String Int32 Float64 Int16
String …
String Int32 Float64 Int16
Interesting column
Interesting Data: N * 4 bytes (Int32)	

Actual Data Read: N * 4 bytes (Int32)
In-Memory Column-Wise Table
(bcolz ctable)
}N rows
Less memory travels
to CPU!
Copy!
Array to be

enlarged
Final array

object
Data to append
New memory

allocation
• Both memory areas have to exist simultaneously
Appending Data in
NumPy
Appending Data in bcolz
final carray object
chunk 1
chunk 2
new chunk(s)
carray to be enlarged
chunk 1
chunk 2
data to append
X
compression
Only compression on

new data is required!
Blosc
Less memory travels
to CPU!
Why Compression (I)?
Compressed Dataset
Original Dataset
More data can be packed using the same storage
Why Compression (II)?
Less data needs to be transmitted to the CPU
Disk or Memory Bus
Decompression
Disk or Memory (RAM)
CPU Cache
Original

Dataset
Compressed

Dataset
Transmission + decompression faster than direct transfer?
Blosc: Compressing
Faster Than memcpy()
How Blosc Works
Multithreading & SIMD
at work!
Figure attr: Valentin Haenel
Accelerating I/O With
Blosc
Blosc
}
}
Other	

compressors
–Release Notes for OpenVDB 3.0, maintained by DreamWorks Animation
“Blosc compresses almost as well as ZLIB, but
it is much faster”
Blosc In OpenVDB
And Houdini
Some Projects Using bcolz
• Visualfabriq’s bquery (out-of-core groupby’s):

https://github.com/visualfabriq/bquery
• Continuum’s Blaze:

http://blaze.pydata.org/
• Quantopian: 

http://quantopian.github.io/talks/NeedForSpeed/
slides#/

bquery - On-Disk GroupBy
In-memory (pandas) vs on-disk (bquery+bcolz) groupby
“Switching to bcolz enabled us to have a much better scalable

architecture yet with near in-memory performance”

— Carst Vaartjes, co-founder visualfabriq
Quantopian’s Use Case
“We set up a project to convert Quantopian’s production and
development infrastructure to use bcolz” — Eddie Herbert
Closing Notes
• If you need a data container that fits your needs, look
for already nice libraries out there (NumPy, DyND,
Pandas, PyTables, bcolz…)
• Pay attention to hardware and software trends and
make informed decisions in your current developments
(which, btw, will be deployed in the future :)
• Performance is needed for improving interactivity, so
do not hesitate to optimize the hot spots in C if needed
(via Cython or other means)
“It is change, continuing change, inevitable
change, that is the dominant factor in Computer
Sciences today. No sensible decision can be made
any longer without taking into account not only the
computer as it is, but the computer as it will be.”
— Based on a quote by Isaac Asimov
Thank You!

More Related Content

What's hot

Comparison of deep learning frameworks from a viewpoint of double backpropaga...
Comparison of deep learning frameworks from a viewpoint of double backpropaga...Comparison of deep learning frameworks from a viewpoint of double backpropaga...
Comparison of deep learning frameworks from a viewpoint of double backpropaga...Kenta Oono
 
SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012
SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012
SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012Chris Richardson
 
BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012
BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012
BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012Amazon Web Services
 
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305mjfrankli
 
Apache Hadoop & Friends at Utah Java User's Group
Apache Hadoop & Friends at Utah Java User's GroupApache Hadoop & Friends at Utah Java User's Group
Apache Hadoop & Friends at Utah Java User's GroupCloudera, Inc.
 
Chainer Update v1.8.0 -> v1.10.0+
Chainer Update v1.8.0 -> v1.10.0+Chainer Update v1.8.0 -> v1.10.0+
Chainer Update v1.8.0 -> v1.10.0+Seiya Tokui
 
Process' Virtual Address Space in GNU/Linux
Process' Virtual Address Space in GNU/LinuxProcess' Virtual Address Space in GNU/Linux
Process' Virtual Address Space in GNU/LinuxVarun Mahajan
 
Wuala, P2P Online Storage
Wuala, P2P Online StorageWuala, P2P Online Storage
Wuala, P2P Online Storageadunne
 
Caffe framework tutorial2
Caffe framework tutorial2Caffe framework tutorial2
Caffe framework tutorial2Park Chunduck
 
Optimal Strategies for Large-Scale Batch ETL Jobs
Optimal Strategies for Large-Scale Batch ETL JobsOptimal Strategies for Large-Scale Batch ETL Jobs
Optimal Strategies for Large-Scale Batch ETL JobsEmma Tang
 
Kernel Recipes 2019 - Marvels of Memory Auto-configuration (SPD)
Kernel Recipes 2019 - Marvels of Memory Auto-configuration (SPD)Kernel Recipes 2019 - Marvels of Memory Auto-configuration (SPD)
Kernel Recipes 2019 - Marvels of Memory Auto-configuration (SPD)Anne Nicolas
 
GPU-Accelerated Parallel Computing
GPU-Accelerated Parallel ComputingGPU-Accelerated Parallel Computing
GPU-Accelerated Parallel ComputingJun Young Park
 
Cassandra Compression and Performance Evaluation
Cassandra Compression and Performance EvaluationCassandra Compression and Performance Evaluation
Cassandra Compression and Performance EvaluationSchubert Zhang
 
深層学習ライブラリの環境問題Chainer Meetup2016 07-02
深層学習ライブラリの環境問題Chainer Meetup2016 07-02深層学習ライブラリの環境問題Chainer Meetup2016 07-02
深層学習ライブラリの環境問題Chainer Meetup2016 07-02Yuta Kashino
 
Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...
Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...
Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...Cloudera, Inc.
 
Netmap presentation
Netmap presentationNetmap presentation
Netmap presentationAmir Razmjou
 
Class 1: What is an Operating System?
Class 1: What is an Operating System?Class 1: What is an Operating System?
Class 1: What is an Operating System?David Evans
 

What's hot (20)

lec4_ref.pdf
lec4_ref.pdflec4_ref.pdf
lec4_ref.pdf
 
Comparison of deep learning frameworks from a viewpoint of double backpropaga...
Comparison of deep learning frameworks from a viewpoint of double backpropaga...Comparison of deep learning frameworks from a viewpoint of double backpropaga...
Comparison of deep learning frameworks from a viewpoint of double backpropaga...
 
SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012
SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012
SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012
 
BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012
BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012
BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012
 
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305
 
Apache Hadoop & Friends at Utah Java User's Group
Apache Hadoop & Friends at Utah Java User's GroupApache Hadoop & Friends at Utah Java User's Group
Apache Hadoop & Friends at Utah Java User's Group
 
Chainer Update v1.8.0 -> v1.10.0+
Chainer Update v1.8.0 -> v1.10.0+Chainer Update v1.8.0 -> v1.10.0+
Chainer Update v1.8.0 -> v1.10.0+
 
Process' Virtual Address Space in GNU/Linux
Process' Virtual Address Space in GNU/LinuxProcess' Virtual Address Space in GNU/Linux
Process' Virtual Address Space in GNU/Linux
 
Wuala, P2P Online Storage
Wuala, P2P Online StorageWuala, P2P Online Storage
Wuala, P2P Online Storage
 
Caffe framework tutorial2
Caffe framework tutorial2Caffe framework tutorial2
Caffe framework tutorial2
 
Optimal Strategies for Large-Scale Batch ETL Jobs
Optimal Strategies for Large-Scale Batch ETL JobsOptimal Strategies for Large-Scale Batch ETL Jobs
Optimal Strategies for Large-Scale Batch ETL Jobs
 
Kernel Recipes 2019 - Marvels of Memory Auto-configuration (SPD)
Kernel Recipes 2019 - Marvels of Memory Auto-configuration (SPD)Kernel Recipes 2019 - Marvels of Memory Auto-configuration (SPD)
Kernel Recipes 2019 - Marvels of Memory Auto-configuration (SPD)
 
GPU-Accelerated Parallel Computing
GPU-Accelerated Parallel ComputingGPU-Accelerated Parallel Computing
GPU-Accelerated Parallel Computing
 
Cassandra Compression and Performance Evaluation
Cassandra Compression and Performance EvaluationCassandra Compression and Performance Evaluation
Cassandra Compression and Performance Evaluation
 
深層学習ライブラリの環境問題Chainer Meetup2016 07-02
深層学習ライブラリの環境問題Chainer Meetup2016 07-02深層学習ライブラリの環境問題Chainer Meetup2016 07-02
深層学習ライブラリの環境問題Chainer Meetup2016 07-02
 
Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...
Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...
Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...
 
Netmap presentation
Netmap presentationNetmap presentation
Netmap presentation
 
Class 1: What is an Operating System?
Class 1: What is an Operating System?Class 1: What is an Operating System?
Class 1: What is an Operating System?
 
Handout3o
Handout3oHandout3o
Handout3o
 
Bigdata roundtable-storm
Bigdata roundtable-stormBigdata roundtable-storm
Bigdata roundtable-storm
 

Similar to PyData Paris 2015 - Closing keynote Francesc Alted

Blosc Talk by Francesc Alted from PyData London 2014
Blosc Talk by Francesc Alted from PyData London 2014Blosc Talk by Francesc Alted from PyData London 2014
Blosc Talk by Francesc Alted from PyData London 2014PyData
 
Blosc: Sending Data from Memory to CPU (and back) Faster than Memcpy by Franc...
Blosc: Sending Data from Memory to CPU (and back) Faster than Memcpy by Franc...Blosc: Sending Data from Memory to CPU (and back) Faster than Memcpy by Franc...
Blosc: Sending Data from Memory to CPU (and back) Faster than Memcpy by Franc...PyData
 
It's the memory, stupid! CodeJam 2014
It's the memory, stupid!  CodeJam 2014It's the memory, stupid!  CodeJam 2014
It's the memory, stupid! CodeJam 2014Francesc Alted
 
Five Ways To Do Data Analytics "The Wrong Way"
Five Ways To Do Data Analytics "The Wrong Way"Five Ways To Do Data Analytics "The Wrong Way"
Five Ways To Do Data Analytics "The Wrong Way"Discover Pinterest
 
Open Science Data Cloud (June 21, 2010)
Open Science Data Cloud (June 21, 2010)Open Science Data Cloud (June 21, 2010)
Open Science Data Cloud (June 21, 2010)Robert Grossman
 
CT Brown - Doing next-gen sequencing analysis in the cloud
CT Brown - Doing next-gen sequencing analysis in the cloudCT Brown - Doing next-gen sequencing analysis in the cloud
CT Brown - Doing next-gen sequencing analysis in the cloudJan Aerts
 
Talk at Bioinformatics Open Source Conference, 2012
Talk at Bioinformatics Open Source Conference, 2012Talk at Bioinformatics Open Source Conference, 2012
Talk at Bioinformatics Open Source Conference, 2012c.titus.brown
 
Big data berlin
Big data berlinBig data berlin
Big data berlinkammeyer
 
Parallelism Processor Design
Parallelism Processor DesignParallelism Processor Design
Parallelism Processor DesignSri Prasanna
 
High Performance With Java
High Performance With JavaHigh Performance With Java
High Performance With Javamalduarte
 
Advanced off heap ipc
Advanced off heap ipcAdvanced off heap ipc
Advanced off heap ipcPeter Lawrey
 
IMCSummit 2015 - Day 2 IT Business Track - Drive IMC Efficiency with Flash E...
IMCSummit 2015 - Day 2  IT Business Track - Drive IMC Efficiency with Flash E...IMCSummit 2015 - Day 2  IT Business Track - Drive IMC Efficiency with Flash E...
IMCSummit 2015 - Day 2 IT Business Track - Drive IMC Efficiency with Flash E...In-Memory Computing Summit
 
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...BigDataEverywhere
 
Cluster Filesystems and the next 1000 human genomes
Cluster Filesystems and the next 1000 human genomesCluster Filesystems and the next 1000 human genomes
Cluster Filesystems and the next 1000 human genomesGuy Coates
 
01 introduction fundamentals_of_parallelism_and_code_optimization-www.astek.ir
01 introduction fundamentals_of_parallelism_and_code_optimization-www.astek.ir01 introduction fundamentals_of_parallelism_and_code_optimization-www.astek.ir
01 introduction fundamentals_of_parallelism_and_code_optimization-www.astek.iraminnezarat
 
Afterwork big data et data viz - du lac à votre écran
Afterwork big data et data viz - du lac à votre écranAfterwork big data et data viz - du lac à votre écran
Afterwork big data et data viz - du lac à votre écranJoseph Glorieux
 
Stories About Spark, HPC and Barcelona by Jordi Torres
Stories About Spark, HPC and Barcelona by Jordi TorresStories About Spark, HPC and Barcelona by Jordi Torres
Stories About Spark, HPC and Barcelona by Jordi TorresSpark Summit
 

Similar to PyData Paris 2015 - Closing keynote Francesc Alted (20)

Blosc Talk by Francesc Alted from PyData London 2014
Blosc Talk by Francesc Alted from PyData London 2014Blosc Talk by Francesc Alted from PyData London 2014
Blosc Talk by Francesc Alted from PyData London 2014
 
Blosc: Sending Data from Memory to CPU (and back) Faster than Memcpy by Franc...
Blosc: Sending Data from Memory to CPU (and back) Faster than Memcpy by Franc...Blosc: Sending Data from Memory to CPU (and back) Faster than Memcpy by Franc...
Blosc: Sending Data from Memory to CPU (and back) Faster than Memcpy by Franc...
 
It's the memory, stupid! CodeJam 2014
It's the memory, stupid!  CodeJam 2014It's the memory, stupid!  CodeJam 2014
It's the memory, stupid! CodeJam 2014
 
Massively Parallel Architectures
Massively Parallel ArchitecturesMassively Parallel Architectures
Massively Parallel Architectures
 
Five Ways To Do Data Analytics "The Wrong Way"
Five Ways To Do Data Analytics "The Wrong Way"Five Ways To Do Data Analytics "The Wrong Way"
Five Ways To Do Data Analytics "The Wrong Way"
 
Open Science Data Cloud (June 21, 2010)
Open Science Data Cloud (June 21, 2010)Open Science Data Cloud (June 21, 2010)
Open Science Data Cloud (June 21, 2010)
 
CT Brown - Doing next-gen sequencing analysis in the cloud
CT Brown - Doing next-gen sequencing analysis in the cloudCT Brown - Doing next-gen sequencing analysis in the cloud
CT Brown - Doing next-gen sequencing analysis in the cloud
 
Talk at Bioinformatics Open Source Conference, 2012
Talk at Bioinformatics Open Source Conference, 2012Talk at Bioinformatics Open Source Conference, 2012
Talk at Bioinformatics Open Source Conference, 2012
 
Big data berlin
Big data berlinBig data berlin
Big data berlin
 
Exascale Capabl
Exascale CapablExascale Capabl
Exascale Capabl
 
Parallelism Processor Design
Parallelism Processor DesignParallelism Processor Design
Parallelism Processor Design
 
High Performance With Java
High Performance With JavaHigh Performance With Java
High Performance With Java
 
Advanced off heap ipc
Advanced off heap ipcAdvanced off heap ipc
Advanced off heap ipc
 
IMCSummit 2015 - Day 2 IT Business Track - Drive IMC Efficiency with Flash E...
IMCSummit 2015 - Day 2  IT Business Track - Drive IMC Efficiency with Flash E...IMCSummit 2015 - Day 2  IT Business Track - Drive IMC Efficiency with Flash E...
IMCSummit 2015 - Day 2 IT Business Track - Drive IMC Efficiency with Flash E...
 
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
 
Cluster Filesystems and the next 1000 human genomes
Cluster Filesystems and the next 1000 human genomesCluster Filesystems and the next 1000 human genomes
Cluster Filesystems and the next 1000 human genomes
 
01 introduction fundamentals_of_parallelism_and_code_optimization-www.astek.ir
01 introduction fundamentals_of_parallelism_and_code_optimization-www.astek.ir01 introduction fundamentals_of_parallelism_and_code_optimization-www.astek.ir
01 introduction fundamentals_of_parallelism_and_code_optimization-www.astek.ir
 
Afterwork big data et data viz - du lac à votre écran
Afterwork big data et data viz - du lac à votre écranAfterwork big data et data viz - du lac à votre écran
Afterwork big data et data viz - du lac à votre écran
 
2014 pycon-talk
2014 pycon-talk2014 pycon-talk
2014 pycon-talk
 
Stories About Spark, HPC and Barcelona by Jordi Torres
Stories About Spark, HPC and Barcelona by Jordi TorresStories About Spark, HPC and Barcelona by Jordi Torres
Stories About Spark, HPC and Barcelona by Jordi Torres
 

More from Pôle Systematic Paris-Region

OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...
OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...
OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...Pôle Systematic Paris-Region
 
OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...
OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...
OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...Pôle Systematic Paris-Region
 
OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...
OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...
OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...Pôle Systematic Paris-Region
 
OSIS19_Cloud : Performance and power management in virtualized data centers, ...
OSIS19_Cloud : Performance and power management in virtualized data centers, ...OSIS19_Cloud : Performance and power management in virtualized data centers, ...
OSIS19_Cloud : Performance and power management in virtualized data centers, ...Pôle Systematic Paris-Region
 
OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...
OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...
OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...Pôle Systematic Paris-Region
 
OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...
OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...
OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...Pôle Systematic Paris-Region
 
OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...
OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...
OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...Pôle Systematic Paris-Region
 
Osis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick Moy
Osis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick MoyOsis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick Moy
Osis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick MoyPôle Systematic Paris-Region
 
Osis18_Cloud : Virtualisation efficace d’architectures NUMA
Osis18_Cloud : Virtualisation efficace d’architectures NUMAOsis18_Cloud : Virtualisation efficace d’architectures NUMA
Osis18_Cloud : Virtualisation efficace d’architectures NUMAPôle Systematic Paris-Region
 
Osis18_Cloud : DeepTorrent Stockage distribué perenne basé sur Bittorrent
Osis18_Cloud : DeepTorrent Stockage distribué perenne basé sur BittorrentOsis18_Cloud : DeepTorrent Stockage distribué perenne basé sur Bittorrent
Osis18_Cloud : DeepTorrent Stockage distribué perenne basé sur BittorrentPôle Systematic Paris-Region
 
OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...
OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...
OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...Pôle Systematic Paris-Region
 
OSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riot
OSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riotOSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riot
OSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riotPôle Systematic Paris-Region
 
OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...
OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...
OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...Pôle Systematic Paris-Region
 
OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...
OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...
OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...Pôle Systematic Paris-Region
 
OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...
OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...
OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...Pôle Systematic Paris-Region
 
OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)
OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)
OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)Pôle Systematic Paris-Region
 
PyParis 2017 / Un mooc python, by thierry parmentelat
PyParis 2017 / Un mooc python, by thierry parmentelatPyParis 2017 / Un mooc python, by thierry parmentelat
PyParis 2017 / Un mooc python, by thierry parmentelatPôle Systematic Paris-Region
 

More from Pôle Systematic Paris-Region (20)

OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...
OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...
OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...
 
OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...
OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...
OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...
 
OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...
OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...
OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...
 
OSIS19_Cloud : Performance and power management in virtualized data centers, ...
OSIS19_Cloud : Performance and power management in virtualized data centers, ...OSIS19_Cloud : Performance and power management in virtualized data centers, ...
OSIS19_Cloud : Performance and power management in virtualized data centers, ...
 
OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...
OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...
OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...
 
OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...
OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...
OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...
 
OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...
OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...
OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...
 
Osis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick Moy
Osis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick MoyOsis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick Moy
Osis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick Moy
 
Osis18_Cloud : Pas de commun sans communauté ?
Osis18_Cloud : Pas de commun sans communauté ?Osis18_Cloud : Pas de commun sans communauté ?
Osis18_Cloud : Pas de commun sans communauté ?
 
Osis18_Cloud : Projet Wolphin
Osis18_Cloud : Projet Wolphin Osis18_Cloud : Projet Wolphin
Osis18_Cloud : Projet Wolphin
 
Osis18_Cloud : Virtualisation efficace d’architectures NUMA
Osis18_Cloud : Virtualisation efficace d’architectures NUMAOsis18_Cloud : Virtualisation efficace d’architectures NUMA
Osis18_Cloud : Virtualisation efficace d’architectures NUMA
 
Osis18_Cloud : DeepTorrent Stockage distribué perenne basé sur Bittorrent
Osis18_Cloud : DeepTorrent Stockage distribué perenne basé sur BittorrentOsis18_Cloud : DeepTorrent Stockage distribué perenne basé sur Bittorrent
Osis18_Cloud : DeepTorrent Stockage distribué perenne basé sur Bittorrent
 
Osis18_Cloud : Software-heritage
Osis18_Cloud : Software-heritageOsis18_Cloud : Software-heritage
Osis18_Cloud : Software-heritage
 
OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...
OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...
OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...
 
OSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riot
OSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riotOSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riot
OSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riot
 
OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...
OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...
OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...
 
OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...
OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...
OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...
 
OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...
OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...
OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...
 
OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)
OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)
OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)
 
PyParis 2017 / Un mooc python, by thierry parmentelat
PyParis 2017 / Un mooc python, by thierry parmentelatPyParis 2017 / Un mooc python, by thierry parmentelat
PyParis 2017 / Un mooc python, by thierry parmentelat
 

Recently uploaded

Generative AI for Trailblazers_ Unlock the Future of AI.pdf
Generative AI for Trailblazers_ Unlock the Future of AI.pdfGenerative AI for Trailblazers_ Unlock the Future of AI.pdf
Generative AI for Trailblazers_ Unlock the Future of AI.pdfEmmanuel Dauda
 
2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group Meeting2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group MeetingAlison Pitt
 
AI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdfAI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdfMichaelSenkow
 
社内勉強会資料  Mamba - A new era or ephemeral
社内勉強会資料   Mamba - A new era or ephemeral社内勉強会資料   Mamba - A new era or ephemeral
社内勉強会資料  Mamba - A new era or ephemeralNABLAS株式会社
 
2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Call2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Calllward7
 
一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理pyhepag
 
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...ssuserf63bd7
 
basics of data science with application areas.pdf
basics of data science with application areas.pdfbasics of data science with application areas.pdf
basics of data science with application areas.pdfvyankatesh1
 
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理pyhepag
 
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理pyhepag
 
Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)Jon Hansen
 
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Valters Lauzums
 
一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理cyebo
 
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPsWebinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPsCEPTES Software Inc
 
Easy and simple project file on mp online
Easy and simple project file on mp onlineEasy and simple project file on mp online
Easy and simple project file on mp onlinebalibahu1313
 
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflictSupply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflictJack Cole
 
Artificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdfArtificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdfscitechtalktv
 

Recently uploaded (20)

Generative AI for Trailblazers_ Unlock the Future of AI.pdf
Generative AI for Trailblazers_ Unlock the Future of AI.pdfGenerative AI for Trailblazers_ Unlock the Future of AI.pdf
Generative AI for Trailblazers_ Unlock the Future of AI.pdf
 
Slip-and-fall Injuries: Top Workers' Comp Claims
Slip-and-fall Injuries: Top Workers' Comp ClaimsSlip-and-fall Injuries: Top Workers' Comp Claims
Slip-and-fall Injuries: Top Workers' Comp Claims
 
2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group Meeting2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group Meeting
 
AI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdfAI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdf
 
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotecAbortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
 
社内勉強会資料  Mamba - A new era or ephemeral
社内勉強会資料   Mamba - A new era or ephemeral社内勉強会資料   Mamba - A new era or ephemeral
社内勉強会資料  Mamba - A new era or ephemeral
 
2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Call2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Call
 
一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理
 
Machine Learning for Accident Severity Prediction
Machine Learning for Accident Severity PredictionMachine Learning for Accident Severity Prediction
Machine Learning for Accident Severity Prediction
 
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
 
basics of data science with application areas.pdf
basics of data science with application areas.pdfbasics of data science with application areas.pdf
basics of data science with application areas.pdf
 
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
 
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
 
Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)
 
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
 
一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理
 
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPsWebinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
 
Easy and simple project file on mp online
Easy and simple project file on mp onlineEasy and simple project file on mp online
Easy and simple project file on mp online
 
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflictSupply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
 
Artificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdfArtificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdf
 

PyData Paris 2015 - Closing keynote Francesc Alted

  • 1. New Trends In Storing And Analyzing Large Data Silos With Python Francesc Alted! Freelancer (ÜberResearch, University of Oslo) ! April 3rd, 2015
  • 2. About ÜberResearch • Team’s 10+ years experience delivering solutions and services for funding and research institutions • Over 20 development partners, and clients globally, from smallest non-profits to large government agencies • Portfolio company of Digital Science (Macmillan Publishers), the younger sibling of the Nature Publishing Group http://www.uberresearch.com/
  • 4. About Me • Physicist by training • Computer scientist by passion • Open Source enthusiast by philosophy • PyTables (2002 - 2011) • Blosc (2009 - now) • bcolz (2010 - now)
  • 5. –Manuel Oltra, music composer “The art is in the execution of an idea. Not in the idea. There is not much left just from an idea.” “Real artists ship” –Seth Godin, writer Why Free/Libre Projects? • Nice way to realize yourself while helping others
  • 7. Overview • The need for speed: fitting and analyzing as much data as possible with your existing resources • Recent trends in computer hardware • bcolz: an example of data container for large datasets following the principles of newer computer architectures
  • 8. The Need For Speed
  • 9. Don’t Forget Python’s Real Strengths • Interactivity • Data-oriented libraries (NumPy, Pandas, Scikit- Learn…) • Interactivity • Performance (thanks to Cython, SWIG, f2py…) • Interactivity (did I mentioned that already?)
  • 10. The Need For Speed • But interactivity without performance in Big Data is a no go • Designing code for data storage performance depends very much on computer architecture • IMO, existing Python libraries need more effort in getting the most out of existing and future computer architectures
  • 11. The Daily Python Working Scenario Quiz: which computer is best for interactivity?
  • 12. Although Modern Servers/Laptops Can Be Very Complex Beasts We need to know them better so as 
 to get the most out of them
  • 13. Recent Trends In Computer Hardware “There's Plenty of Room at the Bottom”
 An Invitation to Enter a New Field of Physics —Talk by Richard Feynman at Caltech, 1959
  • 14. Memory Access Time vs CPU Cycle Time The gap is wide and still opening!
  • 15. Computer Architecture Evolution Up to end 80’s 90’s and 2000’s 2010’s Figure 1. Evolution of the hierarchical memory model. (a) The primordial (and simplest) model; (b) the most common current Mechanical disk Mechanical disk Mechanical disk Speed Capacity Solid state disk Main memory Level 3 cache Level 2 cache Level 1 cache Level 2 cache Level 1 cache Main memoryMain memory CPUCPU (a) (b) (c) Central processing unit (CPU)
  • 16. Latency Numbers Every Programmer Should Know Latency Comparison Numbers -------------------------- L1 cache reference 0.5 ns Branch mispredict 5 ns L2 cache reference 7 ns 14x L1 cache Mutex lock/unlock 25 ns Main memory reference 100 ns 20x L2 cache, 200x L1 cache Read 4K randomly from memory 1,000 ns 0.001 ms Compress 1K bytes with Zippy 3,000 ns Send 1K bytes over 1 Gbps network 10,000 ns 0.01 ms Read 4K randomly from SSD* 150,000 ns 0.15 ms Read 1 MB sequentially from memory 250,000 ns 0.25 ms Round trip within same datacenter 500,000 ns 0.5 ms Read 1 MB sequentially from SSD* 1,000,000 ns 1 ms 4X memory Disk seek 10,000,000 ns 10 ms 20x datacenter roundtrip Read 1 MB sequentially from disk 20,000,000 ns 20 ms 80x memory, 20X SSD Send packet CA->Netherlands->CA 150,000,000 ns 150 ms Source: Jeff Dean and Peter Norvig (Google), with some additions https://gist.github.com/hellerbarde/2843375
  • 17. tref ttrans CPU cache CPU cache Block in storage to transmit to CPU Reference Time vs Transmission Time tref ~= ttrans => optimizes memory access
  • 18. Not All Storage Layers Are Created Equal Memory: tref: 100 ns / ttrans (1 KB): ~100 ns Solid State Disk: tref: 10 us / ttrans (4 KB): ~10 us Mechanical Disk: tref: 10 ms / ttrans (1 MB): ~10 ms This has profound implications on how you access storage! The slower the media, the larger the block 
 that is worth to transmit
  • 19. We Are In A Multicore Age • This requires special programming measures to leverage all its potential: threads, multiprocessing
  • 20. SIMD: Single Instruction, Multiple Data More operations in the same CPU clock
  • 21. The growing gap between DRAM and HDD is facilitating the introduction of
 new SDD devices Forthcoming Trends (I)
  • 23. Bcolz: An Example Of Data Containers Applying The Principles Of New Hardware
  • 24. What is bcolz? • bcolz provides data containers that can be used in a similar way than the ones in NumPy, Pandas • The main difference is that data storage is chunked, not contiguous! • Two flavors: • carray: homogenous, n-dim data types • ctable: heterogeneous types, columnar
  • 25. Contiguous vs Chunked NumPy container Contiguous memory carray container chunk 1 chunk 2 Discontiguous memory chunk N ...
  • 26. Why Chunking? • Chunking means more difficulty handling data, so why bother? • Efficient enlarging and shrinking • Compression is possible • Chunk size can be adapted to the storage layer (memory, SSD, mechanical disk)
  • 27. Why Columnar? • Because it adapts better to newer computer architectures
  • 28. String … String Int32 Float64 Int16 String … String Int32 Float64 Int16 String … String Int32 Float64 Int16 String … String Int32 Float64 Int16 Interesting column Interesting Data: N * 4 bytes (Int32) Actual Data Read: N * 64 bytes (cache line) }N rows In-Memory Row-Wise Table (Structured NumPy array)
  • 29. String … String Int32 Float64 Int16 String … String Int32 Float64 Int16 String … String Int32 Float64 Int16 String … String Int32 Float64 Int16 Interesting column Interesting Data: N * 4 bytes (Int32) Actual Data Read: N * 4 bytes (Int32) In-Memory Column-Wise Table (bcolz ctable) }N rows Less memory travels to CPU!
  • 30. Copy! Array to be
 enlarged Final array
 object Data to append New memory
 allocation • Both memory areas have to exist simultaneously Appending Data in NumPy
  • 31. Appending Data in bcolz final carray object chunk 1 chunk 2 new chunk(s) carray to be enlarged chunk 1 chunk 2 data to append X compression Only compression on
 new data is required! Blosc Less memory travels to CPU!
  • 32. Why Compression (I)? Compressed Dataset Original Dataset More data can be packed using the same storage
  • 33. Why Compression (II)? Less data needs to be transmitted to the CPU Disk or Memory Bus Decompression Disk or Memory (RAM) CPU Cache Original
 Dataset Compressed
 Dataset Transmission + decompression faster than direct transfer?
  • 35. How Blosc Works Multithreading & SIMD at work! Figure attr: Valentin Haenel
  • 37. –Release Notes for OpenVDB 3.0, maintained by DreamWorks Animation “Blosc compresses almost as well as ZLIB, but it is much faster” Blosc In OpenVDB And Houdini
  • 38. Some Projects Using bcolz • Visualfabriq’s bquery (out-of-core groupby’s):
 https://github.com/visualfabriq/bquery • Continuum’s Blaze:
 http://blaze.pydata.org/ • Quantopian: 
 http://quantopian.github.io/talks/NeedForSpeed/ slides#/

  • 39. bquery - On-Disk GroupBy In-memory (pandas) vs on-disk (bquery+bcolz) groupby “Switching to bcolz enabled us to have a much better scalable
 architecture yet with near in-memory performance”
 — Carst Vaartjes, co-founder visualfabriq
  • 40. Quantopian’s Use Case “We set up a project to convert Quantopian’s production and development infrastructure to use bcolz” — Eddie Herbert
  • 41. Closing Notes • If you need a data container that fits your needs, look for already nice libraries out there (NumPy, DyND, Pandas, PyTables, bcolz…) • Pay attention to hardware and software trends and make informed decisions in your current developments (which, btw, will be deployed in the future :) • Performance is needed for improving interactivity, so do not hesitate to optimize the hot spots in C if needed (via Cython or other means)
  • 42. “It is change, continuing change, inevitable change, that is the dominant factor in Computer Sciences today. No sensible decision can be made any longer without taking into account not only the computer as it is, but the computer as it will be.” — Based on a quote by Isaac Asimov