© Cloudera, Inc. All rights reserved.

Python Data Ecosystem: Thoughts on Building for the Future

Wes McKinney @wesmckinn
PyData Berlin 2016-05-21
  
Me

• Data Science Tools at Cloudera, formerly DataPad CEO/founder
• Serial creator of structured data tools / user interfaces
• Wrote bestseller Python for Data Analysis (2012)
• Open source projects
  • Python {pandas, Ibis, statsmodels}
  • Apache {Arrow, Parquet, Kudu (incubating)}
• Mostly work in Python and Cython/C/C++
  
In process:
Python for Data Analysis: 2nd Edition
Coming early 2017
  
Building open source communities
  
"Social architecture is the conscious design of an environment that encourages a desired range of social behaviors leading towards some goal or set of goals." (Wikipedia)

Step 1: Be open and transparent
  
Step 2: Reach out to others
  
Step 3: Strive for consensus
  
Step 4: Value contributions extending beyond lines of code
  
Step 5: Make things harder for bad actors
  
Handling problems carefully

http://numfocus.org
http://apache.org

Python packaging
  
Packaging is hard

• Reproducible infrastructure
• Reproducible toolchains
• Reproducible build scripts
• Integration testing
• Multiple library version builds
• Multiple Python versions
• Dependency resolution
• Hosting and distribution
• Multiple environment management
  
Reflecting on the past
  
conda-forge

• Community-curated conda package channel (on anaconda.org)
• Reproducible build infrastructure (Docker + Circle CI + Travis CI + Appveyor)
• Automated GitHub helper tools

conda config --add channels conda-forge

What’s important to me right now?
  
Important things

• Building bridges with other data science communities (R, Julia, Scala, etc.)
• Enabling Python to more efficiently talk to other systems (e.g. Hadoop things)
• Building Python tools for new and changing varieties of data
  
RAM as the new disk?

• SSD and DRAM performance convergence
• NVM developments (3D XPoint)

[Diagram: one memory working set shared by three consumers]

Problems

• Memory (data structure) representations
• Metadata representations
• Memory ownership, life-cycle
  
NumPy solved this problem for Python scientists

• Common memory representation
  • ndarray: strided, homogeneous buffer
• Common metadata
  • NumPy dtypes
• No well-defined memory sharing / messaging model: case by case basis
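
The pairing of one memory layout with one set of metadata can be seen even without NumPy installed. The sketch below uses only the standard library's buffer protocol (`array` and `memoryview`), which NumPy arrays also implement; it illustrates the idea and is not NumPy's own API.

```python
from array import array

# A homogeneous buffer of C doubles, analogous to a 1-D float64 ndarray.
values = array("d", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# memoryview exposes the metadata a consumer needs to interpret the bytes.
view = memoryview(values)
layout = {
    "format": view.format,      # element type code ('d' = C double)
    "itemsize": view.itemsize,  # bytes per element
    "shape": view.shape,        # logical dimensions
    "strides": view.strides,    # bytes to step per dimension
}

# Slicing produces a new view over the SAME memory with a different stride.
every_other = view[::2]
every_other_strides = every_other.strides

# Writing through the view is visible in the original container: no copy.
view[0] = 10.0
shared = values[0] == 10.0
```

What the sketch does not provide is exactly the gap named in the last bullet: nothing here says who owns the memory or how another process would receive it.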
  
Problems NumPy doesn’t solve as well

• Nested data types (think JSON)
• Missing / NULL data
• Strings and category types
• Columnar memory representation for tables (think: analytic SQL databases)
  
Apache Arrow

http://arrow.apache.org
Some slides from Strata-HW talk w/ Jacques Nadeau

Arrow in a Slide

• New Top-level Apache Software Foundation project
• Focused on Columnar In-Memory Analytics
  1. 10-100x speedup on many workloads
  2. Common data layer enables companies to choose best of breed systems
  3. Designed to work with any programming language
  4. Support for both relational and complex data as-is
• Developers from 13+ major open source projects involved
• A significant % of the world’s data will be processed through Arrow!

Projects: Calcite, Cassandra, Deeplearning4j, Drill, Hadoop, HBase, Ibis, Impala, Kudu, Pandas, Parquet, Phoenix, Spark, Storm, R

Focus on CPU Efficiency

         session_id   timestamp         source_ip
Row 1    1331246660   3/8/2012 2:44PM   99.155.155.225
Row 2    1331246351   3/8/2012 2:38PM   65.87.165.114
Row 3    1331244570   3/8/2012 2:09PM   71.10.106.181
Row 4    1331261196   3/8/2012 6:46PM   76.102.156.138

[Figure: the same rows shown in a "Traditional Memory Buffer" and an "Arrow Memory Buffer"]

• Cache Locality
• Super-scalar & vectorized operation
• Minimal Structure Overhead
• Constant value access
  • With minimal structure overhead
• Operate directly on columnar compressed data
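
The difference between the two buffers can be sketched with the standard library alone. The example below reuses the slide's four records and contrasts a row-oriented list of tuples with one packed buffer per column. It is only an illustration of the layout idea, not Arrow's implementation, and the variable names are chosen for this sketch.

```python
from array import array

# The four records from the slide: (session_id, timestamp, source_ip).
rows = [
    (1331246660, "3/8/2012 2:44PM", "99.155.155.225"),
    (1331246351, "3/8/2012 2:38PM", "65.87.165.114"),
    (1331244570, "3/8/2012 2:09PM", "71.10.106.181"),
    (1331261196, "3/8/2012 6:46PM", "76.102.156.138"),
]

# Row-oriented: summing one field touches every record object in turn.
row_total = sum(record[0] for record in rows)

# Columnar: each field lives in its own contiguous, homogeneous container.
session_id = array("q", (record[0] for record in rows))  # 64-bit ints
timestamp = [record[1] for record in rows]
source_ip = [record[2] for record in rows]

# The same aggregate now scans a single packed buffer of 4 * 8 bytes.
column_total = sum(session_id)
column_bytes = len(session_id) * session_id.itemsize
```

Scanning `session_id` reads adjacent memory only, which is the cache-locality point in the first bullet.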
  
High Performance Sharing & Interchange

Today
• Each system has its own internal memory format
• 70-80% CPU wasted on serialization and deserialization
• Similar functionality implemented in multiple projects

With Arrow
• All systems utilize the same memory format
• No overhead for cross-system communication
• Projects can share functionality (e.g., Parquet-to-Arrow reader)

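
The contrast between the two columns can be sketched in plain Python. Here `pickle` stands in for any serialize/deserialize hand-off, and a `memoryview` stands in for two components agreeing on one memory format. This is an in-process illustration only: it does not measure the 70-80% figure and is not Arrow's interchange mechanism.

```python
import pickle
from array import array

column = array("q", range(1_000_000))

# "Today": hand data to another component by serializing and deserializing.
# Both steps walk the whole buffer and the receiver ends up with a copy.
payload = pickle.dumps(column)
received_copy = pickle.loads(payload)

# "With Arrow" (idea only): both sides agree on the layout, so the consumer
# can read the producer's bytes in place through a zero-copy view.
received_view = memoryview(column)

column[0] = -1
copy_sees_update = received_copy[0] == -1   # independent copy
view_sees_update = received_view[0] == -1   # same memory
```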
Arrow in action: Feather File Format for Python and R

• Problem: fast, language-agnostic binary data frame file format
• By Wes McKinney (Python) and Hadley Wickham (R)
• Read speeds close to disk IO performance
  
Real World Example: Feather File Format for Python and R

R

    library(feather)

    path <- "my_data.feather"
    write_feather(df, path)

    df <- read_feather(path)

Python

    import feather

    path = 'my_data.feather'

    feather.write_dataframe(df, path)
    df = feather.read_dataframe(path)
  
More on Feather

[Diagram: a Feather file holds array 0, array 1, array 2, ..., array n - 1, followed by METADATA. It is read and written by libfeather, a C++ library, which connects to R data.frame through Rcpp and to pandas DataFrame through Cython.]

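
The "arrays first, metadata last" layout in the diagram can be sketched as a toy container. The example below is not the real Feather encoding: the JSON footer, the 4-byte length trailer, and the names `write_toy` and `read_toy` are all assumptions made for illustration.

```python
import io
import json
import struct
from array import array


def write_toy(columns, stream):
    """Write each column's raw bytes, then a JSON metadata footer."""
    metadata = []
    for name, data in columns.items():
        raw = data.tobytes()
        metadata.append({
            "name": name,
            "typecode": data.typecode,
            "offset": stream.tell(),
            "nbytes": len(raw),
        })
        stream.write(raw)
    footer = json.dumps(metadata).encode("utf-8")
    stream.write(footer)
    # The final 4 bytes record the footer length so a reader can find it.
    stream.write(struct.pack("<I", len(footer)))


def read_toy(stream):
    """Locate the footer from the end of the file, then load each column."""
    stream.seek(-4, io.SEEK_END)
    (footer_len,) = struct.unpack("<I", stream.read(4))
    stream.seek(-4 - footer_len, io.SEEK_END)
    metadata = json.loads(stream.read(footer_len).decode("utf-8"))
    columns = {}
    for entry in metadata:
        stream.seek(entry["offset"])
        column = array(entry["typecode"])
        column.frombytes(stream.read(entry["nbytes"]))
        columns[entry["name"]] = column
    return columns


buffer = io.BytesIO()
original = {"x": array("q", [1, 2, 3]), "y": array("d", [0.5, 1.5, 2.5])}
write_toy(original, buffer)
restored = read_toy(buffer)
```

Putting the metadata at the end lets a writer stream the arrays out before it knows their final offsets, and lets a reader seek directly to any single column.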
Feather: the good and not-so-good

• Good
  • Language-agnostic memory representation
  • Extremely fast
  • New storage features can be added without much difficulty
• Not-so-good
  • Data must be converted between the storage representation (Arrow) and in-memory “proprietary” data structures (R / Python data frames)
  
Apache Parquet: Python support is coming

• Collaborating with Uwe Korn from Blue Yonder

[Diagram: stack of pandas, Arrow (C++ / Python), and Parquet (C++)]

Shared needs for Python, R, Julia, ...

• If PLs can establish a common data frame C/C++-level memory representation, we can share algorithms and libraries much more easily
  • Example: dplyr’s in-memory backend
• Other requirements
  • Permissive licensing (Python / Julia require MIT/Apache-like)
  • Common build/test/packaging for shared C/C++ library components
  
Real World Example: Python With Spark, Drill, Impala

[Diagram: a SQL engine passes input partitions 0 ... n - 1 as input to user-supplied Python functions; each function's output becomes an output partition 0 ... n - 1, which goes back to a SQL engine.]

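
The data flow in the diagram can be sketched in plain Python. The helper name `run_udf` and the example function are hypothetical, and a real engine would move partitions between processes rather than call the function in-process; the sketch only shows the shape of the exchange.

```python
from typing import Callable, Iterable, List

Partition = List[int]


def run_udf(
    partitions: Iterable[Partition],
    func: Callable[[Partition], Partition],
) -> List[Partition]:
    """Apply a user-supplied function to each input partition in turn."""
    return [func(partition) for partition in partitions]


def double_values(partition: Partition) -> Partition:
    """Example user-supplied Python code: operates on a whole partition."""
    return [value * 2 for value in partition]


in_partitions = [[1, 2, 3], [4, 5], [6]]
out_partitions = run_udf(in_partitions, double_values)
```

Each arrow in the diagram is a point where data crosses between the engine and Python, so the cost of that hand-off is paid once per partition in each direction.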
Get Involved in Arrow

• Join the community
  • dev@arrow.apache.org
  • Slack: https://apachearrowslackin.herokuapp.com/
  • http://arrow.apache.org
  • @ApacheArrow
  
Thank you

Wes McKinney @wesmckinn
Views are my own
  
