Sudheesh Katkam
Simplifying Data Access for Python
Introduction
• Data comes in all shapes, sizes and formats
• Data captured in multiple storage systems
• Data takes a complex path to (Python) applications
• How do we simplify access to data?
Demo
[Figure: memory layout of a table in a traditional memory buffer versus an Arrow memory buffer]
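The contrast between a traditional memory buffer and an Arrow-style memory buffer can be sketched in plain Python. This is only an illustration, not Arrow's actual format or API: it assumes the traditional layout is row-oriented, and the table, its column names (id, price), and the helper functions are hypothetical.

```python
import struct
from array import array

# Hypothetical three-row table with columns (id, price); values are made up.
rows = [(1, 9.5), (2, 3.25), (3, 7.0)]

# Row-oriented buffer: the fields of each record sit next to each other.
row_format = struct.Struct("<qd")  # int64 id followed by float64 price
row_buffer = b"".join(row_format.pack(*r) for r in rows)

# Columnar buffers: all values of one column are stored contiguously.
id_column = array("q", (r[0] for r in rows))
price_column = array("d", (r[1] for r in rows))


def total_price_row_oriented(buf):
    # Walks every record and unpacks the id field even though it is unused.
    return sum(price for _id, price in row_format.iter_unpack(buf))


def total_price_columnar(prices):
    # Scans a single contiguous run of float64 values.
    return sum(prices)


print(total_price_row_oriented(row_buffer))
print(total_price_columnar(price_column))
```

Both functions return the same total; the difference is that the columnar scan touches only the bytes of the column it needs, which is the property the cache-efficiency argument for columnar memory rests on.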
Apache Arrow Goals
• Cache-efficient columnar memory
• Zero-copy messaging / IPC
• Language-agnostic metadata
• Complex/nested schema support
• Main implementations in C++ and Java, with bindings for C, Python, Ruby, JavaScript
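The zero-copy goal can be illustrated with Python's standard library alone. This is a rough analogy, not Arrow's IPC mechanism: a memoryview stands in for a consumer that reads a column's buffer in place, and the column name and values are hypothetical.

```python
from array import array

# Hypothetical float64 column; values are made up.
prices = array("d", [9.5, 3.25, 7.0, 1.0])

# A memoryview exposes the column's underlying buffer without copying it.
view = memoryview(prices)
window = view[1:3]   # shares memory with `prices`

# Slicing the array itself allocates a new array and copies the values.
copied = prices[1:3]

# A later write to the original is visible through the view, not the copy.
prices[1] = 4.0

print(window.tolist())
print(copied.tolist())
```

The view reflects the update because it points at the same bytes, while the copy keeps the old value. Sharing buffers this way is what lets a consumer avoid a serialize-and-copy step.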
Apache Arrow
Apache Arrow Adoption
About Dremio
• Launched in July 2017
• Self-Service Data Platform
• Apache License
• Built entirely on Apache Arrow, Calcite, Parquet
• Narwhal’s name is Gnarly (see me for stickers!)
SQL
Data Virtualization
RDBMS, MongoDB, Elasticsearch, Hadoop, S3, NAS, Excel, JSON
Data Acceleration
OLAP and ad hoc queries at interactive speed, without cubes or BI extracts
Data Curation
Wrangle, prepare, enrich any source without making copies of your data
Data Catalog
Interactive Data Discovery, Enterprise and Personal Data Assets
New Tier in Analytics: Self-Service Data
Demo
Join the Community!
• GitHub:
github.com/dremio/dremio-oss
github.com/apache/arrow
• Dremio Community: community.dremio.com
• Arrow Slack: apachearrowslackin.herokuapp.com
• Twitter: @ApacheArrow, @DremioHQ

Simplifying And Accelerating Data Access for Python With Dremio and Apache Arrow