Simplifying And Accelerating Data Access for Python With Dremio and Apache Arrow

Sudheesh Katkam
Simplifying Data Access for Python

Introduction
• Data comes in all shapes, sizes and formats
• Data captured in multiple storage systems
• Data takes a complex path to (Python) applications
• How do we simplify access to data?

[Figure: Traditional path, memory buffer to memory layout to table]

[Figure: Traditional vs. Arrow, each shown with a memory buffer, memory layout, and table]
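The row-versus-column contrast in the figure can be sketched with only the Python standard library, using `struct` and `array` as stand-ins for real Arrow buffers. The sample records and variable names are hypothetical. Packing rows interleaves the fields, so reading a single column strides across every record. The columnar form keeps each column in one contiguous, typed buffer.

```python
import struct
from array import array

# Hypothetical sample data: (id, price) records.
records = [(1, 9.5), (2, 3.25), (3, 7.0)]

# Row-oriented layout: each record's fields are packed next to each other,
# so the buffer reads id0 price0 id1 price1 id2 price2.
ROW = struct.Struct("<qd")  # one 64-bit int + one 64-bit float, no padding
row_buffer = b"".join(ROW.pack(i, p) for i, p in records)

# Reading one column out of the row buffer means striding over every record.
prices_from_rows = [
    ROW.unpack_from(row_buffer, k * ROW.size)[1] for k in range(len(records))
]

# Columnar layout (the idea Arrow standardizes): one contiguous buffer per
# column, so all prices sit next to each other in memory.
id_column = array("q", (i for i, _ in records))
price_column = array("d", (p for _, p in records))

print(ROW.size)            # 16 bytes per row
print(prices_from_rows)    # [9.5, 3.25, 7.0]
print(list(price_column))  # [9.5, 3.25, 7.0]
print(sum(price_column))   # 19.75
```

A scan such as `sum(price_column)` touches only the bytes it needs, which is the cache-efficiency argument for columnar memory.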

Apache Arrow Goals
• Cache-efficient columnar memory
• Zero-copy messaging / IPC
• Language-agnostic metadata
• Complex/nested schema support
• Main implementations in C++ and Java, with bindings for C, Python, Ruby, and JavaScript
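A toy model of two of these goals, assuming Arrow's fixed-width layout for a nullable integer column: an LSB-first validity bitmap plus a contiguous values buffer. This is not the pyarrow API (pyarrow builds and exposes the real buffers, for example through `Array.buffers()`), and the helper names below are hypothetical. The `memoryview` slice illustrates the zero-copy idea: a reader gets a view onto existing memory instead of a deserialized copy.

```python
from array import array


def build_int64_column(values):
    """Toy Arrow-style nullable int64 column (hypothetical helper).

    Returns (validity, data): validity is a bitmap where bit i (LSB-first
    within each byte) is 1 when slot i holds a value; data is a contiguous
    int64 buffer with 0 written into null slots.
    """
    validity = bytearray((len(values) + 7) // 8)
    data = array("q")
    for i, v in enumerate(values):
        if v is None:
            data.append(0)  # placeholder; the bitmap marks this slot null
        else:
            validity[i // 8] |= 1 << (i % 8)
            data.append(v)
    return validity, data


def is_valid(validity, i):
    """Check bit i of the validity bitmap."""
    return bool(validity[i // 8] & (1 << (i % 8)))


validity, data = build_int64_column([10, None, 30, 40])

# Zero-copy access: the memoryview slice shares data's memory rather than
# copying it, so a later write to the buffer is visible through the view.
view = memoryview(data)[1:3]
data[2] = 99
print(view.tolist())                                # [0, 99]
print([is_valid(validity, i) for i in range(4)])    # [True, False, True, True]
print(bin(validity[0]))                             # 0b1101
```

Writing 0 into null slots is a choice of this sketch; the Arrow format leaves the contents of null slots unspecified, since the bitmap alone decides validity.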

About Dremio
• Launched in July 2017
• Self-Service Data Platform
• Open source under the Apache License
• Built entirely on Apache Arrow, Apache Calcite, and Apache Parquet
• Narwhal’s name is Gnarly (see me for stickers!)

New Tier in Analytics: Self-Service Data
• SQL
• Data Virtualization: RDBMS, MongoDB, Elasticsearch, Hadoop, S3, NAS, Excel, JSON
• Data Acceleration: OLAP and ad hoc queries at interactive speed, without cubes or BI extracts
• Data Curation: wrangle, prepare, and enrich any source without making copies of your data
• Data Catalog: interactive data discovery, enterprise and personal data assets

Join the Community!
• GitHub: github.com/dremio/dremio-oss, github.com/apache/arrow
• Dremio Community: community.dremio.com
• Arrow Slack: apachearrowslackin.herokuapp.com
• Twitter: @ApacheArrow, @DremioHQ
