
Python Blaze Overview



Blaze is a next-generation NumPy library that provides out-of-core access to data too large to fit in memory. This talk summarizes its features.

Published in: Technology


  1. Next Generation NumPy!
  2. NumPy Blaze: Out-of-Core, Distributed and Optimized NumPy
  3. NumPy Array shape
  4. Blaze: Different kinds of Arrays • Indexable arrays are either NDTable (Record Type) or NDArray (Primitive Type) • Each kind can be Deferred or Concrete
  5. Blaze Deferred Arrays • Symbolic objects which build a graph • Represent deferred computation • Usually what you have when you have a Blaze Array • Example: A + B*C is a graph whose root "+" has children A and "*", and "*" has children B and C
  6. Deferred allows handling large arrays: they can be handled out-of-core, using chunks to stream through memory.
  7. Blaze Concrete Array • Data Descriptor (URLs and Indexes): Where are the bytes? • DataShape: What do the bytes mean? Extensible type system which includes shape • MetaData: Dictionary of labels, provenance, etc.
  8. Multiple URLs comprising an array
  9. URLs Provide Bytes • Memory-Like: arbitrarily sliced, random seeks • File-Like: deal with in chunks, random seeks • Stream-Like: deal with in chunks, sequential seeks
  10. Blaze Data Container Index Operation Data Buffer ByteProvider Data Descriptor Protocol NumPy Data Stream BLZ Persistent Format RDBMS CSV
  11. Indexes • Contiguous / Strided (NumPy-like) • Chunked / Tiled • Special Access • Opaque (element-only) • Opaque (iterator-access)
  12. Indexes allow for many orderings
  13. DataShape Type System: Shape + DType = DataShape • A data description language • A superset of NumPy's dtype • Provides more flexibility
  14. Allows for all kinds of containers
  15. Advanced Types • Parametrized Types: type SquareMatrix T = N, N, T • Alias Types: type IntMatrix N = N, N, int32 • type Point = { x : int; y : int } • type Space = { a: Point; b: Point } • 5, 10, Space
  16. Advanced Shapes: {1,2,4,2,1}, int32 could represent [ [1], [1,2], [1,3,2,9], [3,2], [3] ]
  17. Execution Model • Graphs dispatch to specialized library code that is "registered with the system" based on the type and metadata of the array (Blaze modules) • Many operations can be compiled with LLVM to machine code • BLIR (simple typed expression syntax) • Numba (Python compiler)
  18. Blaze Agents Code Data Blaze Agent Code Graph with Blaze Arrays CSV Directory Blaze Agent MongoDB Blaze Agent Vertica Blaze Agent HDFS
  19. How? "I think you should be more explicit here in step two."
  20. Out-of-core calculations
  21. NumFOCUS: Num(Py) Foundation for Open Code for Usable Science
  22. BLZ persistence: BLZ layout at a glance • Dataset, Super-Chunk, Chunk • Blaze (BLZ) format: root, meta, data, __0__.blp, __1__.blp, ... • Bloscpack (BLP) format: Header, Offset 0, Offset 1, Offset 2, Chunk 0, Chunk 1, Chunk 2 • Blosc format: Header, Block 0, Block 1, Block 2
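
Slide 5 describes deferred arrays as symbolic objects that build a graph for an expression such as A + B*C. The following is a minimal sketch of that idea in plain Python. The `Deferred` class and its `evaluate` method are hypothetical names chosen for illustration; they are not Blaze's API, and plain lists stand in for arrays.

```python
import operator


class Deferred:
    """Hypothetical symbolic node for illustration; not Blaze's actual class."""

    def __init__(self, name, op=None, args=(), value=None):
        self.name = name    # leaf label or operator symbol
        self.op = op        # binary function, or None for a leaf
        self.args = args    # child nodes
        self.value = value  # concrete data, held by leaves only

    def __add__(self, other):
        return Deferred("+", operator.add, (self, other))

    def __mul__(self, other):
        return Deferred("*", operator.mul, (self, other))

    def __repr__(self):
        if self.op is None:
            return self.name
        left, right = self.args
        return f"({left!r} {self.name} {right!r})"

    def evaluate(self):
        """Walk the graph and compute element-wise results on demand."""
        if self.op is None:
            return list(self.value)
        left, right = (arg.evaluate() for arg in self.args)
        return [self.op(x, y) for x, y in zip(left, right)]


A = Deferred("A", value=[1, 2, 3])
B = Deferred("B", value=[4, 5, 6])
C = Deferred("C", value=[7, 8, 9])

expr = A + B * C          # builds a graph; nothing is computed yet
print(expr)               # (A + (B * C))
print(expr.evaluate())    # [29, 42, 57]
```

Building the graph first leaves room for a later stage to decide how to run it, for example chunk by chunk or through a compiled kernel, which is the role slide 17 assigns to the execution model.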
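
Slides 6 and 20 say that large arrays can be handled out-of-core by streaming chunks through memory. A rough sketch of that pattern, using only the Python standard library, might look like the code below. The function name `chunked_sum`, the raw file of 8-byte floats, and the chunk size are all assumptions made for this example; BLZ and Blaze use their own storage formats.

```python
import array
import os
import tempfile


def chunked_sum(path, chunk_items=1024):
    """Sum a file of raw doubles while holding one chunk in memory at a time."""
    itemsize = array.array("d").itemsize
    total = 0.0
    count = 0
    with open(path, "rb") as f:
        while True:
            raw = f.read(chunk_items * itemsize)
            if not raw:
                break                    # end of file
            chunk = array.array("d")
            chunk.frombytes(raw)         # decode this chunk only
            total += sum(chunk)
            count += len(chunk)
    return total, count


# Demo data: 10,000 doubles written to a temporary file (illustrative only).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    with open(path, "wb") as f:
        array.array("d", (float(i) for i in range(10_000))).tofile(f)
    total, count = chunked_sum(path, chunk_items=1024)

print(count, total)
```

Peak memory here depends on `chunk_items` rather than on the file size, which is the property the slides rely on for out-of-core calculations.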
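
Slide 13 presents a DataShape as a shape combined with a dtype, and slide 15 gives `5, 10, Space` as an example. As a toy illustration only, the sketch below splits such a string into its dimensions and its element type. `split_datashape` is a hypothetical helper, not part of Blaze; it assumes integer dimensions and a single named type at the end, and it does not handle record literals or type variables such as `N`.

```python
import math


def split_datashape(text):
    """Split 'dim, dim, ..., measure' into (shape tuple, measure name)."""
    *dims, measure = [part.strip() for part in text.split(",")]
    return tuple(int(d) for d in dims), measure


shape, measure = split_datashape("5, 10, Space")
print(shape, measure)      # (5, 10) Space
print(math.prod(shape))    # 50
```

Keeping the shape and the element type in one description is what lets a single string cover both what NumPy calls `shape` and what it calls `dtype`.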