20170210 sapporotechbar7

Slides on PyData and Spark from my talk at Insight Technology's Sapporo TechBar on 2017/2/10.

Published in: Software

  1. 1. PyData & Apache Spark 2017 / 2 / 10 Sapporo TechBar #7 @
  2. 2. ▸ Facebook : Ryuji Tamagawa ▸ Twitter : tamagawa_ryuji ▸ FB techbar ▸ FB ▸ Twitter
  3. 3. 5
  4. 4. Python PyData Apache Spark Jupyter Notebook 2017 and the future pandas
  5. 5. PyData
  6. 6. 1 / 5 : PyData
  7. 7. 1 / 5 : PyData PyData.org
  8. 8. 1 / 5 : PyData PyData Anaconda Python Blaze "NumPy and pandas interface to Big Data" dask Bokeh Canopy Python IPython matplotlib PyData nose numba JIT NumPy PyData SciPy PyData Statsmodels SymPy pandas NumPy SciPy scikit-image scikit-learn PyData
  9. 9. pandas
  10. 10. 2 / 5 : pandas pandas ▸ NumPy SciPy ▸ DataFrame
  11. 11. 2 / 5 : pandas pandas Wes McKinney
  12. 12. 2 / 5 : pandas DataFrame
  13. 13. 2 / 5 : pandas
  14. 14. 2 / 5 : pandas ▸ Python ▸ PyData pandas
  15. 15. Jupyter Notebook
  16. 16. 3 / 5 : Jupyter Notebook IPython Notebook ▸ Jupyter Notebook ▸ Julia, Python, R ▸ JupyterCon
  17. 17. 3 / 5 : Jupyter Notebook
  18. 18. 3 / 5 : Jupyter Notebook
  19. 19. 3 / 5 : Jupyter Notebook pandas / matplotlib
  20. 20. 3 / 5 : Jupyter Notebook Interactive Widget
  21. 21. 3 / 5 : Jupyter Notebook ▸ Learning Jupyter
  22. 22. Apache Spark
  23. 23. 4 / 5 : Apache Spark Hadoop ▸ MapReduce Spark ▸ 2010 Hadoop = MapReduce + HDFS ▸ Hadoop OS: HDFS, Hive etc., HBase, MapReduce, YARN, Impala etc. (in-memory SQL engine), Spark (Spark Streaming, MLlib, GraphX, Spark SQL) ▸ Hadoop HDFS, S3 ▸ YARN, Mesos
  24. 24. 4 / 5 : Apache Spark Apache Spark PyData pandas Apache Spark pandas JVM Python × dask I/O Scala Java Python R JVM Python
  25. 25. 4 / 5 : Apache Spark Spark ▸ 1 PC Hadoop / MapReduce
  26. 26. 4 / 5 : Apache Spark DataFrame
  27. 27. 4 / 5 : Apache Spark ▸ SSD ▸ Spark Parquet ▸ "Performance comparison of different file formats and storage engines in the Hadoop ecosystem" ▸ Parquet Python
  28. 28. 4 / 5 : Apache Spark Apache Spark ▸ Parquet
  29. 29. Machine Learning
  30. 30. Machine Learning ▸ scikit-learn ▸ Spark MLlib / ML ▸ TensorFlow ▸ Python
  31. 31. 2017 and the future
  32. 32. 5 / 5 : 2017 and the future PyData ▸ Spark - pandas ▸ pandas → Spark …
  33. 33. 5 / 5 : 2017 and the future Wes blog ▸ pandas Apache Arrow ▸ Blog ▸ PyData Blog Wes OK ▸ 2017 : pandas, Arrow, Feather, Parquet, Spark, Ibis http://qiita.com/tamagawa-ryuji/items/deb3f63ed4c7c8065e81
  34. 34. 5 / 5 : 2017 and the future High speed Apache Parquet for Python ▸ Parquet ▸ Spark ▸ Python ▸ Fastparquet ▸ pyarrow
  35. 35. 5 / 5 : 2017 and the future : Apache Arrow ▸ Apache Arrow ▸ PyData / OSS
