Python Blaze Overview

Blaze is a next-generation NumPy library that provides out-of-core access to data that does not fit in memory. This talk summarizes its features.

Published in: Technology

Python Blaze Overview

  1. 1. Next Generation NumPy: blaze.pydata.org
  2. 2. NumPy Blaze Out of Core, Distributed and Optimized NumPy
  3. 3. NumPy Array shape
  4. 4. Blaze: Different kinds of Arrays • Indexable, record type: NDTable (Deferred or Concrete) • Indexable, primitive type: NDArray (Deferred or Concrete)
  5. 5. Blaze Deferred Arrays • Symbolic objects which build a graph • Represent deferred computation • Usually what you have when you have a Blaze Array • Example: A + B*C is held as a graph with + at the root, A as one operand, and * (over B and C) as the other
  6. 6. Deferred allows handling large arrays: they can be handled out-of-core, using chunks to stream through memory.
  7. 7. Blaze Concrete Array • Data Descriptor: where are the bytes? (URLs and indexes) • DataShape: what do the bytes mean? (extensible type system which includes shape) • MetaData: dictionary of labels, provenance, etc.
  8. 8. Multiple URLs comprising an array
  9. 9. URLs Provide Bytes • Memory-like: arbitrarily sliced, random seeks • File-like: dealt with in chunks, random seeks • Stream-like: dealt with in chunks, sequential seeks
  10. 10. Blaze Data Container [diagram labels: Index Operation, Data Buffer, ByteProvider, Data Descriptor Protocol; byte sources: NumPy, Data Stream, BLZ Persistent Format, RDBMS, CSV]
  11. 11. Indexes • Contiguous / Strided: NumPy-like • Chunked / Tiled: special access • Opaque: element-only • Opaque: iterator-access
  12. 12. Indexes allow for many orderings
  13. 13. DataShape Type System: Shape + DType = DataShape • A data description language • A superset of NumPy’s dtype • Provides more flexibility
  14. 14. Allows for all kinds of containers
  15. 15. Advanced Types • Parametrized types: type SquareMatrix T = N, N, T • Alias types: type IntMatrix N = N, N, int32 • type Point = { x : int; y : int } • type Space = { a: Point; b: Point } • 5, 10, Space
  16. 16. Advanced Shapes: {1,2,4,2,1}, int32 could represent [ [1], [1,2], [1,3,2,9], [3,2], [3] ]
  17. 17. Execution Model • Graphs dispatch to specialized library code that is “registered with the system” based on the type and metadata of the array (Blaze modules) • Many operations can be compiled with LLVM to machine code • BLIR (simple typed expression syntax) • Numba (Python compiler)
  18. 18. Blaze Agents [diagram labels: Code (a graph with Blaze Arrays); Data; one Blaze Agent each for a CSV directory, MongoDB, Vertica, and HDFS]
  19. 19. How? “I think you should be more explicit here in step two.”
  20. 20. Out-of-core calculations
  21. 21. NumFOCUS Num(Py) Foundation for Open Code for Usable Science http://www.numfocus.org
  22. 22. BLZ persistence: BLZ layout at a glance [diagram with three levels: Dataset, Super-Chunk, Chunk • Blaze (BLZ) format: root, meta, data, __0__.blp, __1__.blp, ... • Bloscpack (BLP) format: Header, Offset 0, Offset 1, Offset 2, Chunk 0, Chunk 1, Chunk 2 • Blosc format: Header, Block 0, Block 1, Block 2]
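
The deferred-array and out-of-core slides (5 and 6) can be illustrated with a small sketch. This is not the Blaze API: the names Deferred, Leaf, Op, chunks, and evaluate are hypothetical and exist only for this example, and plain Python lists stand in for array storage. It assumes each leaf can hand out a fresh iterator over its data, so that A + B*C only records a graph and the data is later streamed through memory a few elements at a time.

```python
import operator
from itertools import islice


class Deferred:
    """Hypothetical graph node (illustration only, not the Blaze API)."""

    def __add__(self, other):
        return Op(operator.add, self, other)

    def __mul__(self, other):
        return Op(operator.mul, self, other)


class Leaf(Deferred):
    """Wraps a factory that returns a fresh iterator over a data source."""

    def __init__(self, make_iter):
        self.make_iter = make_iter

    def chunks(self, size):
        it = self.make_iter()
        while True:
            chunk = list(islice(it, size))  # read at most `size` elements
            if not chunk:
                return
            yield chunk


class Op(Deferred):
    """Binary operation recorded in the graph; nothing is computed yet."""

    def __init__(self, func, left, right):
        self.func, self.left, self.right = func, left, right

    def chunks(self, size):
        # Pull one chunk from each operand and combine them element-wise.
        for lc, rc in zip(self.left.chunks(size), self.right.chunks(size)):
            yield [self.func(x, y) for x, y in zip(lc, rc)]


def evaluate(expr, size=4):
    """Walk the graph chunk by chunk and collect the output in a list."""
    out = []
    for chunk in expr.chunks(size):
        out.extend(chunk)
    return out


n = 10
A = Leaf(lambda: iter(range(n)))
B = Leaf(lambda: iter(range(n)))
C = Leaf(lambda: (2 for _ in range(n)))

expr = A + B * C         # builds the graph only; no data is read here
result = evaluate(expr)  # data is read now, four elements at a time
total = sum(sum(chunk) for chunk in expr.chunks(4))  # reduction, no full copy
```

Collecting the result in a list is only for display; for data that truly exceeds memory, each output chunk would be reduced (as in total) or written back to storage instead.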