Data Engineering, St. Louis Big Data IDEA user group (Adam Doyle)
Modern-day Data Engineering requires creating reliable data pipelines, architecting distributed systems, designing data stores, and preparing data for other teams.
We’ll describe a year in the life of a Data Engineer who is tasked with creating a streaming data pipeline and touch on the skills necessary to set one up using Apache Spark.
Slides from the April 2019 meeting of the St. Louis Big Data IDEA meetup.
Modified version of Chapter 18 of the book Fundamentals of Database Systems, 6th Edition, with review questions, prepared as part of a database management systems course.
The Anatomy of a Large-Scale Hypertextual Web Search Engine (Mehul Boricha)
It explains the basics of how a search engine (Google) works. It covers the following topics:
+ Introduction
+ Challenges
+ Design Goals
+ PageRank
+ Google Architecture Overview
+ Major Data Structures
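Since PageRank is one of the topics covered, here is a minimal power-iteration sketch of the idea. The toy graph, function names, and damping value are illustrative, not Google's production implementation:

```python
# Minimal PageRank power-iteration sketch: rank flows along links
# each iteration until the distribution stabilizes.
def pagerank(links, damping=0.85, iters=50):
    """links: dict mapping page -> list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1.0 - damping) / n for p in pages}
        for p, outs in links.items():
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:  # dangling page: spread its rank over all pages
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
```

Note that the ranks always sum to 1: pages with more (and better-ranked) inbound links end up with higher scores.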
Indexing is used to speed up access to desired data.
E.g., the author catalog in a library.
A search key is an attribute or set of attributes used to look up records in a file; it is unrelated to the keys of the database schema.
An index file consists of records called index entries.
An index entry for key k may consist of
An actual data record (with search key value k)
A pair (k, rid) where rid is a pointer to the actual data record
A pair (k, bid) where bid is a pointer to a bucket of record pointers
Index files are typically much smaller than the original file if the actual data records are in a separate file.
If the index contains the data records, there is a single file with a special organization.
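The index-entry forms above can be sketched in a few lines. This toy in-memory example (data and names are illustrative) builds the bucket form, mapping each search-key value k to a list of record pointers:

```python
# The "data file": records identified by a record id (rid).
data_file = [
    {"rid": 0, "author": "Knuth", "title": "TAOCP"},
    {"rid": 1, "author": "Date", "title": "An Introduction to Database Systems"},
    {"rid": 2, "author": "Knuth", "title": "Concrete Mathematics"},
]

# Bucket-style index: search key k -> bucket of record pointers,
# i.e. the (k, bid) entry form described above. The index is much
# smaller than the data file since it holds only keys and rids.
index = {}
for rec in data_file:
    index.setdefault(rec["author"], []).append(rec["rid"])

def lookup(key):
    """Follow index entries to the actual data records."""
    return [data_file[rid] for rid in index.get(key, [])]
```

The (k, rid) form is the special case where each bucket holds exactly one pointer.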
Presentation I delivered for jsFoo, India's first JavaScript conference, held in Bangalore on Oct 1. I tried to recreate the Mix 11 talk on dataJS, explaining OData and dataJS, the open-source JavaScript library from Microsoft for accessing OData.
C Language Training in Ambala! Batra Computer Centre (Jatin Batra)
Batra Computer Centre is an ISO 9001:2008 certified training centre in Ambala. We provide C training in Ambala; training in C++, SEO, web designing, web development, and many other courses is also available.
MongoDB presentation for NYC Python's June meetup. Brief discussion on non-relational databases in general followed by an example of using MongoDB as a blog's backend
My talk at Barcamp Bangalore Spring 2014 on Redis. It covers what Redis is and its APIs, its architecture, and how to scale it up, as well as how Adnear is taking advantage of this great tool.
Ch 17: Disk Storage, Basic File Structures, and Hashing (Zainab Almugbel)
Modified version of Chapter 17 of the book Fundamentals of Database Systems, 6th Edition, with review questions, prepared as part of a database management systems course.
Other content at http://j.mp/bostonpython-continuum
Abstract:
We're pretty obsessed with Python-centric data analytics, and figuring out how to do this well led to the creation of Continuum Analytics. This talk will tell you about three pieces of that puzzle we have developed over the past year: Wakari, a web-based analytics platform leveraging IPython and IPython Notebook; Blaze, a next-generation NumPy; and Bokeh, providing interactive data visualization. All are available now for free as services or open-source projects.
The SBGrid Science Portal provides multi-modal access to computational infrastructure, data storage, and data analysis tools for the structural biology community. It incorporates features not previously seen in cyberinfrastructure science gateways. It enables researchers to securely share a computational study area, including large volumes of data and active computational workflows. A rich identity management system has been developed that simplifies federated access to US national cyberinfrastructure, distributed data storage, and high performance file transfer tools. It integrates components from the Virtual Data Toolkit, Condor, glideinWMS, the Globus Toolkit and Globus Online, the FreeIPA identity management system, Apache web server, and the Django web framework.
In the mid-1990s, the high-energy physics community (think FermiLab and CERN) started planning for the Large Hadron Collider. Managing the petabytes of data that would be generated by the facility and sharing it with the globally distributed community of over 10,000 researchers would be a major infrastructure and technology problem. This same community that brought us the web has now developed standards, software, and infrastructure for grid computing. In this seminar I'll present some of the exciting science that is being done on the Open Science Grid, the US national cyberinfrastructure linking 60 institutions (Harvard included) into a massive distributed computing and data processing system.
Adapting Federated Cyberinfrastructure for Shared Data Collection Facilities in Structural Biology (Boston Consulting Group)
Early-stage experimental data in structural biology is generally unmaintained and inaccessible to the public. It is increasingly believed that this data, which forms the basis for each macromolecular structure discovered by this field, must be archived and, in due course, published. Furthermore, the widespread use of shared scientific facilities such as synchrotron beamlines complicates the issue of data storage, access and movement, as does the increase of remote users. This work describes a prototype system that adapts existing federated cyberinfrastructure technology and techniques to significantly improve the operational environment for users and administrators of synchrotron data collection facilities used in structural biology. This is achieved through software from the Virtual Data Toolkit and Globus, bringing together federated users and facilities from the Stanford Synchrotron Radiation Lightsource, the Advanced Photon Source, the Open Science Grid, the SBGrid Consortium and Harvard Medical School. The performance and experience with the prototype provide a model for data management at shared scientific facilities.
Stokes-Rees, Ian, Ian Levesque, Frank V. Murphy, Wei Yang, Ashley Deacon, and Piotr Sliz. “Adapting Federated Cyberinfrastructure for Shared Data Collection Facilities in Structural Biology.” Journal of Synchrotron Radiation 19, no. 3 (April 6, 2012). http://scripts.iucr.org/cgi-bin/paper?S0909049512009776.
Working with thousands, millions, or billions of data records in high dimensions is increasingly becoming the reality for scientific research. What are some techniques to make this kind of data volume tractable? How can parallel computing help? In this talk I'll review data management tools and infrastructures, languages, and paradigms that help in this regard. In particular, I'll discuss Hadoop, MapReduce, Python, NumPy, and Globus Online to provide a survey of ways in which researchers can manage their data and process it in parallel.
Apache Arrow Workshop at VLDB 2019 / BOSS Session (Wes McKinney)
A technical deep dive for database system developers on the Arrow columnar format, binary protocol, C++ development platform, and Arrow Flight RPC.
See demo Jupyter notebooks at https://github.com/wesm/vldb-2019-apache-arrow-workshop
AWS July Webinar Series - Getting Started with Amazon DynamoDB (Amazon Web Services)
This webinar provides an overview of Amazon DynamoDB, a fast, flexible, and fully managed NoSQL database service for mobile, web, AdTech, IoT, and gaming applications that need consistent, single-digit-millisecond latency at any scale. The webinar will cover key topics around the general architecture of DynamoDB, data types, throughput provisioning, querying and indexing, and recent features.
The webinar includes a live demo of the basic operations used to read and write data to a DynamoDB table, and how the concept of provisioned IO affects the throughput of these operations.
Learning Objectives:
Enable users to understand how DynamoDB works so that they can evaluate and use DynamoDB as the data store for their application
Speaker: Xiaoyong Han, Solution Architect, AWS
Data collection and storage is a primary challenge for any big data architecture. In this webinar, gain a thorough understanding of AWS solutions for data collection and storage, and learn architectural best practices for applying those solutions to your projects. This session will also include a discussion of popular use cases and reference architectures. In this webinar, you will learn:
• Overview of the different types of data that customers are handling to drive high-scale workloads on AWS, and how to choose the best approach for your workload
• Optimization techniques that improve performance and reduce the cost of data ingestion
• Leveraging Amazon S3, Amazon DynamoDB, and Amazon Kinesis for storage and data collection
InterPlanetary Linked Data (IPLD) is the data layer for content-addressed systems and Web 3.0. It is a suite of technologies for representing and traversing hash-linked data. In this module you will understand:
- Why IPLD exists
- IPLD's fundamental concepts, such as Merkle DAGs and Merkle roots
- The relation of IPLD to IPFS
- How to use IPLD for distributed data structures
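The hash-linked idea behind Merkle DAGs can be sketched as follows. This is a simplified illustration, not the real IPLD API: it uses SHA-256 over canonical JSON as a stand-in for IPLD's CIDs and codecs such as DAG-CBOR:

```python
import hashlib
import json

# Content-addressed store: a node's address is the hash of its bytes.
store = {}

def put(node):
    """Store a node and return its content address (hash)."""
    blob = json.dumps(node, sort_keys=True).encode()  # canonical encoding
    cid = hashlib.sha256(blob).hexdigest()
    store[cid] = blob
    return cid

def get(cid):
    return json.loads(store[cid])

# Build a tiny Merkle DAG: the root links to its children by hash,
# so the root's address (the "Merkle root") commits to the whole tree.
leaf_a = put({"data": "hello"})
leaf_b = put({"data": "world"})
root = put({"links": [leaf_a, leaf_b]})
```

Because addresses are derived from content, identical nodes deduplicate automatically, and changing any leaf changes every hash on the path up to the root.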
Apache Spark's Built-in File Sources in Depth (Databricks)
In the Spark 3.0 release, all the built-in file source connectors (including Parquet, ORC, JSON, Avro, CSV, and Text) are re-implemented using the new Data Source API V2. We give a technical overview of how Spark reads and writes these file formats based on the user-specified data layouts. The talk also explains the differences between Hive SerDe and native connectors, and shares experiences on how to tune the connectors and choose the best data layouts for achieving the best performance.
Organizations often need to quickly analyze large amounts of data, such as logs generated from a wide variety of sources and formats. However, traditional approaches require a lot of time and effort designing complex data transformation and loading processes; and configuring data warehouses. Using AWS, you can start querying your datasets within minutes. In this session you will learn how you can deploy a managed Presto environment in minutes to interactively query log data using standard ANSI SQL. Presto is a popular open source SQL engine for running interactive analytic queries against data sources of all sizes. We will talk about common use cases and best practices for running Presto on Amazon EMR.
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS (Amazon Web Services)
The world is producing an ever increasing volume, velocity, and variety of big data. Consumers and businesses are demanding up-to-the-second (or even millisecond) analytics on their fast-moving data, in addition to classic batch processing. AWS delivers many technologies for solving big data problems. But what services should you use, why, when, and how? In this session, we simplify big data processing as a data bus comprising various stages: ingest, store, process, and visualize. Next, we discuss how to choose the right technology in each stage based on criteria such as data structure, query latency, cost, request rate, item size, data volume, durability, and so on. Finally, we provide reference architecture, design patterns, and best practices for assembling these technologies to solve your big data problems at the right cost.
Why You Should Care about Data Layout in the File System, with Cheng Lian and … (Databricks)
Efficient data access is one of the key factors for having a high performance data processing pipeline. Determining the layout of data values in the filesystem often has fundamental impacts on the performance of data access. In this talk, we will show insights on how data layout affects the performance of data access. We will first explain how modern columnar file formats like Parquet and ORC work and explain how to use them efficiently to store data values. Then, we will present our best practice on how to store datasets, including guidelines on choosing partitioning columns and deciding how to bucket a table.
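As a rough illustration of the two layout choices the talk discusses, here is a plain-Python sketch (not Spark code; column names and data are made up): partitioning groups rows into one "directory" per distinct column value, while bucketing spreads rows across a fixed number of buckets by hashing a column.

```python
from collections import defaultdict

rows = [
    {"country": "US", "user_id": 101},
    {"country": "DE", "user_id": 102},
    {"country": "US", "user_id": 103},
]

def partition_by(rows, col):
    """Hive-style partitioning: one group per distinct column value."""
    parts = defaultdict(list)
    for r in rows:
        parts[f"{col}={r[col]}"].append(r)
    return dict(parts)

def bucket_by(rows, col, n_buckets):
    """Bucketing: a fixed number of groups chosen by hashing the column."""
    buckets = defaultdict(list)
    for r in rows:
        buckets[hash(r[col]) % n_buckets].append(r)
    return dict(buckets)

layout = partition_by(rows, "country")
```

The practical guideline mirrors this shape: partition on low-cardinality columns you filter by (so whole groups can be skipped), and bucket on high-cardinality join keys (so the group count stays bounded).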
Data Con LA 2022 - What's New with MongoDB 6.0 and Atlas (Data Con LA)
Sig Narvaez, Executive Solution Architect at MongoDB
MongoDB is now a developer data platform. Come learn what's new in the 6.0 release and Atlas, following all the recent announcements made at MongoDB World 2022. Topics will include:
- Atlas Search which combines 3 systems into one (database, search engine, and sync mechanisms) letting you focus on your product's differentiation.
- Atlas Data Federation to seamlessly query, transform, and aggregate data from one or more MongoDB Atlas databases, Atlas Data Lake and AWS S3 buckets
- Queryable Encryption lets you run expressive queries on fully randomized encrypted data to meet the most stringent security requirements
- Relational Migrator which analyzes your existing relational schemas and helps you design a new MongoDB schema.
- And more!
Apache Big Data 2016 - Speaking the Language of Big Data (techmaddy)
With the advent of feature-based teams, software architecture styles like microservices and deployment patterns like DevOps are taking over. Each team takes autonomous decisions on the technologies used, but there is always a need to define a common language for the services to communicate with each other. This gives a common wire format and avoids a lot of mappers across the application. The other common scenario is in big data projects, where the cluster nodes need to communicate efficiently and effectively, with an easy API.
This talk highlights Apache Avro and Apache Thrift, which are used in big data solutions as a common language across different services and nodes. These technologies act as a language- and platform-neutral way of serializing structured data. The talk also shows examples and demos highlighting the pain points they solve.
Big Data Adoption Success Using AWS Big Data Services - Pop-up Loft TLV 2017 (Amazon Web Services)
In today's session we will share an overview of the typical challenges when adopting Big Data, and how the AWS Big Data platform allows you to tackle these challenges and leverage the right analytical/Big Data solutions in order to succeed with your strategy (whiteboard presentation).
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Version (Felix Gessert)
The unprecedented scale at which data is consumed and generated today has shown a large demand for scalable data management and given rise to non-relational, distributed "NoSQL" database systems. Two central problems triggered this process: 1) vast amounts of user-generated content in modern applications and the resulting request loads and data volumes; 2) the desire of the developer community to employ problem-specific data models for storage and querying. To address these needs, various data stores have been developed by both industry and research, arguing that the era of one-size-fits-all database systems is over. The heterogeneity and sheer number of these systems - now commonly referred to as NoSQL data stores - make it increasingly difficult to select the most appropriate system for a given application. Therefore, these systems are frequently combined in polyglot persistence architectures to leverage each system in its respective sweet spot. This tutorial gives an in-depth survey of the most relevant NoSQL databases to provide comparative classification and highlight open challenges. To this end, we analyze the approach of each system to derive its scalability, availability, consistency, data modeling and querying characteristics. We present how each system's design is governed by a central set of trade-offs over irreconcilable system properties. We then cover recent research results in distributed data management to illustrate that some shortcomings of NoSQL systems could already be solved in practice, whereas other NoSQL data management problems pose interesting and unsolved research challenges.
If you'd like to use these slides for e.g. teaching, contact us at gessert at informatik.uni-hamburg.de - we'll send you the PowerPoint.
Categorisation of databases based on how they store data. Pictorial depiction of partitioning and replication of data, and a grouping of prominent databases used in the industry.
In this session, you will learn how to easily access your data on S3, and how to visualize and generate insights from Amazon Athena and other data sources through Amazon QuickSight. In addition, we will share some tips and best practices for using Athena and QuickSight.
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL.
Amazon QuickSight is a fast, cloud-powered business analytics service that makes it easy to build visualizations, perform ad-hoc analysis, and quickly get business insights from various data sources (Amazon Redshift, Amazon Athena, Amazon EMR, Amazon RDS, and more).
Leading organizations today all have data scientists and analytics teams. A key challenge is establishing cross-functional teams that can collaboratively derive insights from data and move exploratory interactive analytics into automated production systems. Boston Consulting Group, founded on quantitative decision making, guides global F500 companies in the technical and organizational structures that will provide a foundation for agility, innovation, and competitive advantage. This talk will outline key strategies for building effective cloud-native analytics teams.
Keynote at Gateways 2017 Conference, Ann Arbor MI
Speaker: Ian Stokes-Rees
"Connecting Cyberinfrastructure Back To The Laptop"
Science Gateways today are generally built to provide a web-accessible interface for a particular scientific community to access a combination of software, hardware, and data deployed in an expertly managed computing center. But what happens when the scientist wants to repatriate their data? Or perform some analysis that is not supported by the gateway? Both for the purposes of encouraging innovative workflows and serving an audience with a wide range of computational experience it is important to consider how a gateway can fit into the broader computational ecosystem of a particular researcher or research group. One simple starting point for this is to ask the question "how can the gateway connect back to the laptop?". This talk will consider how this is being done today in science gateways and present some ideas for how this could be expanded in the future.
A talk from AnacondaCON presenting my personal journey from physics to finance to biology and how collaborative team-based data science has been the big enabler. The talk looks at Python, Big Data, Jupyter Notebooks, Anaconda. Discusses CERN LHCb particle physics computing, protein structure determination, and patterns in data science.
Harvard HPC Seminar Series
Theresa Kaltz, PhD, High Performance Technical Computing, FAS, Harvard
Due to the wide availability and low cost of high speed networking, commodity clusters have become the de facto standard for building high performance parallel computing systems. This talk will introduce the leading technology for high speed interconnects called Infiniband and compare its deployment and performance to Ethernet. In addition, some emerging interconnect technologies and trends in cluster networking will be discussed.
UiPath Test Automation using UiPath Test Suite series, part 4 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimizing testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova… (Ramesh Iyer)
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. However, fostering a culture of innovation takes much work: it takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Neuro-symbolic is not enough, we need neuro-*semantic* (Frank van Harmelen)
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Kubernetes & AI - Beauty and the Beast!?! @ KCD Istanbul 2024 (Tobias Schneck)
As AI technology pushes into IT, I was wondering, as an "infrastructure container Kubernetes guy", how does this fancy AI technology get managed from an infrastructure operations point of view? Is it possible to apply our lovely cloud-native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and provide a short journey through existing deployment models and use cases for AI software. With practical examples, we discuss what cloud/on-premise strategy we may need to apply it to our own infrastructure from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could be beneficial for, or limiting, your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I already have working for real.
Transcript: Selling digital books in 2024: Insights from industry leaders - T… (BookNet Canada)
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Key Trends Shaping the Future of Infrastructure (Cheryl Hung)
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud, and open source: exploring how these areas are likely to mature and develop over the short and long term, and how organisations can position themselves to adapt and thrive.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview (Prayukth K V)
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities, spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on countries – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
4. Blaze: Different Kinds of Arrays
(Diagram: Blaze arrays are indexable; an NDTable has a record type and an NDArray a primitive type; each can be either deferred or concrete.)
5. Blaze Deferred Arrays
• Symbolic objects which build a graph
• Represent deferred computation
• Usually what you have when you have a Blaze array
(Diagram: expression tree for A + B*C, with a "+" root whose children are A and a "*" node over B and C.)
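The deferred-graph idea on this slide can be sketched with a toy class. This is not the real Blaze API, just an illustration of how operator overloading builds an expression tree that is only evaluated on demand:

```python
class Deferred:
    """Symbolic node: arithmetic builds a graph instead of computing."""
    def __init__(self, op, args):
        self.op, self.args = op, args

    def __add__(self, other):
        return Deferred("+", [self, other])

    def __mul__(self, other):
        return Deferred("*", [self, other])

    def evaluate(self, env):
        """Walk the graph, substituting concrete values from env."""
        if self.op == "leaf":
            return env[self.args[0]]
        left, right = (a.evaluate(env) for a in self.args)
        return left + right if self.op == "+" else left * right

def array(name):
    return Deferred("leaf", [name])

# A + B*C builds the same tree as on the slide: a "+" root whose
# children are A and a "*" node over B and C.
A, B, C = array("A"), array("B"), array("C")
expr = A + B * C
```

Holding the graph rather than the result is what lets a system like Blaze pick an execution strategy later, e.g. streaming chunks out-of-core as the next slide describes.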
6. Deferred Allows Handling Large Arrays
• Can be handled out-of-core, using chunks to stream through memory
7. Blaze Concrete Array
• Data Descriptor: where are the bytes? (URLs and indexes)
• DataShape: what do the bytes mean? An extensible type system which includes shape
• MetaData: a dictionary of labels, provenance, etc.
15. Advanced Types
Parametrized types:
  type SquareMatrix T = N, N, T
  type Point = { x : int; y : int }
Alias types:
  type IntMatrix N = N, N, int32
  type Space = { a : Point; b : Point }
Example: 5, 10, Space
17. Execution Model
• Graphs dispatch to specialized library code that is "registered with the system" based on the type and metadata of the array (Blaze modules)
• Many operations can be compiled with LLVM to machine code
  - BLIR (simple typed expression syntax)
  - Numba (Python compiler)