PyData & Apache Spark
2017 / 2 / 10
Sapporo TechBar #7
▸ facebook : Ryuji Tamagawa
▸ Twitter : tamagawa_ryuji

Slides on PyData and Spark from the talk I gave at Insight Technology's Sapporo TechBar on 2017/2/10.


  1. PyData & Apache Spark, 2017 / 2 / 10, Sapporo TechBar #7
  2. ▸ facebook : Ryuji Tamagawa ▸ Twitter : tamagawa_ryuji ▸ FB techbar ▸ FB ▸ Twitter
  3. 5
  4. Python, PyData, Apache Spark, Jupyter Notebook, 2017 and the future, Pandas
  5. PyData
  6. 1 / 5 : PyData
  7. 1 / 5 : PyData ▸ PyData.org
  8. 1 / 5 : PyData ▸ Anaconda (Python), Blaze ("NumPy and pandas interface to Big Data"), dask, Bokeh, Canopy (Python), IPython, matplotlib, nose, numba (JIT), NumPy, SciPy, Statsmodels, SymPy, pandas (NumPy, SciPy), scikit-image, scikit-learn

  9. pandas
  10. 2 / 5 : pandas ▸ NumPy, SciPy ▸ DataFrame
  11. 2 / 5 : pandas ▸ Wes McKinney
  12. 2 / 5 : pandas ▸ DataFrame
  13. 2 / 5 : pandas
  14. 2 / 5 : pandas ▸ Python ▸ PyData pandas

  15. Jupyter Notebook
  16. 3 / 5 : Jupyter Notebook ▸ IPython Notebook ▸ Jupyter Notebook ▸ Julia, Python, R ▸ JupyterCon
  17. 3 / 5 : Jupyter Notebook
  18. 3 / 5 : Jupyter Notebook
  19. 3 / 5 : Jupyter Notebook ▸ pandas / matplotlib
  20. 3 / 5 : Jupyter Notebook ▸ Interactive Widget
  21. 3 / 5 : Jupyter Notebook ▸ Learning Jupyter
  22. Apache Spark
  23. 4 / 5 : Apache Spark ▸ Hadoop ▸ MapReduce, Spark ▸ 2010: Hadoop = MapReduce + HDFS ▸ Hadoop: OS, HDFS, Hive etc., HBase, MapReduce, YARN, Impala etc. (in-memory SQL engine), Spark (Spark Streaming, MLlib, GraphX, Spark SQL) ▸ Hadoop, HDFS, S3 ▸ YARN, Mesos
  24. 4 / 5 : Apache Spark ▸ Apache Spark, PyData pandas ▸ Apache Spark, pandas ▸ JVM, Python ▸ × dask ▸ I/O ▸ Scala, Java, Python, R ▸ JVM, Python
  25. 4 / 5 : Apache Spark ▸ Spark ▸ 1 PC ▸ Hadoop / MapReduce
  26. 4 / 5 : Apache Spark ▸ DataFrame
  27. 4 / 5 : Apache Spark ▸ SSD ▸ Spark, Parquet ▸ Performance comparison of different file formats and storage engines in the Hadoop ecosystem ▸ Parquet, Python
  28. 4 / 5 : Apache Spark ▸ Apache Spark ▸ Parquet
  29. Machine Learning
  30. Machine Learning ▸ scikit-learn ▸ Spark MLlib / ML ▸ TensorFlow ▸ Python
  31. 2017 and the future
  32. 5 / 5 : 2017 and the future ▸ PyData ▸ Spark - pandas ▸ pandas → Spark …
  33. 5 / 5 : 2017 and the future ▸ Wes blog ▸ pandas, Apache Arrow ▸ Blog ▸ PyData Blog ▸ Wes OK ▸ 2017 : pandas, Arrow, Feather, Parquet, Spark, Ibis ▸ http://qiita.com/tamagawa-ryuji/items/deb3f63ed4c7c8065e81
  34. 5 / 5 : 2017 and the future ▸ High speed Apache Parquet for Python ▸ Parquet ▸ Spark ▸ Python ▸ Fastparquet ▸ pyarrow
  35. 5 / 5 : 2017 and the future ▸ Apache Arrow ▸ PyData / OSS
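
The pandas slides (10 to 12) name NumPy and the DataFrame as the core of pandas, but the surviving text contains no example. The following is a minimal sketch of that idea, not code from the talk: the column names and values are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not from the slides).
df = pd.DataFrame({
    "city": ["Sapporo", "Tokyo", "Sapporo", "Osaka", "Tokyo"],
    "sales": np.array([120, 300, 80, 150, 200]),
})

# Column arithmetic is vectorized over the underlying NumPy arrays,
# so no explicit Python loop is needed.
df["sales_share"] = df["sales"] / df["sales"].sum()

# Group rows by a key column and aggregate each group.
total_by_city = df.groupby("city")["sales"].sum()

print(df)
print(total_by_city)
```

The same "table with named columns, grouped and aggregated" model is what the Spark DataFrame slide (26) refers to on the JVM side, which is why the two libraries are compared in the deck.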
