SlideShare a Scribd company logo
PyData &
Apache Spark
2017 / 2 / 10
Sapporo TechBar #7
@
▸ facebook : Ryuji Tamagawa
▸ Twitter : tamagawa_ryuji
▸ FB
techbar
▸ FB
▸ Twitter
5


Python
PyData
Apache
Spark
Jupyter
Notebook
2017
and the
future
Pandas
PyData
1 / 5 : PyData
1 / 5 : PyData
PyData.org
1 / 5 : PyData
PyData
Anaconda Python
Blaze NumPy and pandas interface to Big Data'. dask
Bokeh
Canopy Python
IPython
matplotlib PyData
nose
numba JIT
NumPy PyData
Scipy PyData
Statsmodels
SymPy
pandas NumPy SciPy
scikit-image
scikit-learn PyData


pandas
2 / 5 : pandas
pandas
▸ NumPy SciPy 

▸ DataFrame
▸
2 / 5 : pandas
pandas 

Wes McKinney
2 / 5 : pandas
DataFrame
2 / 5 : pandas
2 / 5 : pandas
▸ 

Python
▸
▸ PyData pandas


Jupyter Notebook
3 /5 : Jupyter Notebook
IPython Notebook
▸ Jupyter Notebook
▸ Julia Python R
▸ JupyterCon
3 /5 : Jupyter Notebook
3 /5 : Jupyter Notebook
3 /5 : Jupyter Notebook
pandas / matplotlib
3 /5 : Jupyter Notebook
Interactive Widget
3 /5 : Jupyter Notebook
▸ Learning Jupyter
Apache Spark
4 / 5 : Apache Spark
Hadoop
▸ MapReduce Spark
▸ 2010 Hadoop = MapReduce + HDFS
▸ Hadoop
OS
HDFS
Hive e.t.c.
HBaseMapReduce
YARN
Impala
e.t.c in-
memory SQL
engine
Spark
Spark Streaming, MLlib,
GraphX, Spark SQL)
Hadoop
HDFS S3 

YARN Mesos 

/
4 / 5 : Apache Spark
Apache Spark PyData pandas
Apache Spark pandas
JVM Python
× dask
I/O
Scala Java Python R

JVM
Python
4 / 5 : Apache Spark
Spark
▸
▸
▸ 1 PC 

Hadoop / MapReduce
4 / 5 : Apache Spark
DataFrame
4 / 5 : Apache Spark
▸
▸ SSD
▸ Spark Parquet
▸ Performance comparison of different file formats
and storage engines in the Hadoop ecosystem
▸ Parquet Python
4 / 5 : Apache Spark
Apache Spark
▸
▸ Parquet
▸
▸
Machine Learning
Machine Learning
▸
▸ scikit-learn
▸ Spark MLlib / ML
▸
▸ TensorFlow
▸ Python
2017 and the future
5/5 : 2017 and the future
PyData
▸
▸ Spark - pandas
▸ pandas → Spark …
5/5 : 2017 and the future
Wes blog
▸ pandas Apache Arrow
▸ Blog
▸ PyData Blog


Wes OK
▸ 2017 : pandas, Arrow, Feather, Parquet, Spark, Ibis

http://qiita.com/tamagawa-ryuji/items/deb3f63ed4c7c8065e81
5/5 : 2017 and the future
High speed Apache Parquet for Python
▸ Parquet
▸ Spark
▸ Python
▸ Fastparquet
▸ pyarrow
5/5 : 2017 and the future
: apache arrow
▸ apache arrow
▸ PyData / OSS
▸ /
20170210 sapporotechbar7

More Related Content

What's hot

Cassandra + Hadoop @ApacheCon
Cassandra + Hadoop @ApacheCon Cassandra + Hadoop @ApacheCon
Cassandra + Hadoop @ApacheCon
Jeremy Hanna
 
Watch Your Log!
Watch Your Log!Watch Your Log!
Watch Your Log!
Co-graph Inc.
 
Big Data Programming Using Hadoop Workshop
Big Data Programming Using Hadoop WorkshopBig Data Programming Using Hadoop Workshop
Big Data Programming Using Hadoop Workshop
IMC Institute
 
Hadoop 1 vs hadoop2
Hadoop 1 vs hadoop2Hadoop 1 vs hadoop2
Hadoop 1 vs hadoop2
Sandeep Patil
 
A complete hadoop stack
A complete hadoop stackA complete hadoop stack
A complete hadoop stack
Abhra Pal
 
Mapreduce Tutorial
Mapreduce TutorialMapreduce Tutorial
Mapreduce Tutorial
Sandeep Patil
 
Hadoop 2 cluster architecture
Hadoop 2 cluster architectureHadoop 2 cluster architecture
Hadoop 2 cluster architecture
Sandeep Patil
 
Hadoop 101 - Big Data Technology
Hadoop 101 - Big Data TechnologyHadoop 101 - Big Data Technology
Hadoop 101 - Big Data Technology
Firman Gautama
 
Hadoop-BigData
Hadoop-BigDataHadoop-BigData
Hadoop-BigData
Gigin Krishnan
 
Hadoop
HadoopHadoop
Hadoop 101 v2
Hadoop 101 v2Hadoop 101 v2
Hadoop 101 v2
John Berns
 
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
Michael Stack
 
Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.
elliando dias
 
Alluxio
AlluxioAlluxio
Introduction to Big Data processing (FGRE2016)
Introduction to Big Data processing (FGRE2016)Introduction to Big Data processing (FGRE2016)
Introduction to Big Data processing (FGRE2016)
Thomas Vanhove
 
Papyri.info's Linked Data Story
Papyri.info's Linked Data StoryPapyri.info's Linked Data Story
Papyri.info's Linked Data Story
Hugh Cayless
 
Big Data - Fast Machine Learning at Scale + Couchbase
Big Data - Fast Machine Learning at Scale + CouchbaseBig Data - Fast Machine Learning at Scale + Couchbase
Big Data - Fast Machine Learning at Scale + Couchbase
Fujio Turner
 
An introduction to Big-Data processing applying hadoop
An introduction to Big-Data processing applying hadoopAn introduction to Big-Data processing applying hadoop
An introduction to Big-Data processing applying hadoop
Amir Sedighi
 
SparkR-Advance Analytic for Big Data
SparkR-Advance Analytic for Big DataSparkR-Advance Analytic for Big Data
SparkR-Advance Analytic for Big Data
samuel shamiri
 
HPCC Systems vs Hadoop
HPCC Systems vs HadoopHPCC Systems vs Hadoop
HPCC Systems vs Hadoop
Fujio Turner
 

What's hot (20)

Cassandra + Hadoop @ApacheCon
Cassandra + Hadoop @ApacheCon Cassandra + Hadoop @ApacheCon
Cassandra + Hadoop @ApacheCon
 
Watch Your Log!
Watch Your Log!Watch Your Log!
Watch Your Log!
 
Big Data Programming Using Hadoop Workshop
Big Data Programming Using Hadoop WorkshopBig Data Programming Using Hadoop Workshop
Big Data Programming Using Hadoop Workshop
 
Hadoop 1 vs hadoop2
Hadoop 1 vs hadoop2Hadoop 1 vs hadoop2
Hadoop 1 vs hadoop2
 
A complete hadoop stack
A complete hadoop stackA complete hadoop stack
A complete hadoop stack
 
Mapreduce Tutorial
Mapreduce TutorialMapreduce Tutorial
Mapreduce Tutorial
 
Hadoop 2 cluster architecture
Hadoop 2 cluster architectureHadoop 2 cluster architecture
Hadoop 2 cluster architecture
 
Hadoop 101 - Big Data Technology
Hadoop 101 - Big Data TechnologyHadoop 101 - Big Data Technology
Hadoop 101 - Big Data Technology
 
Hadoop-BigData
Hadoop-BigDataHadoop-BigData
Hadoop-BigData
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop 101 v2
Hadoop 101 v2Hadoop 101 v2
Hadoop 101 v2
 
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
 
Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.
 
Alluxio
AlluxioAlluxio
Alluxio
 
Introduction to Big Data processing (FGRE2016)
Introduction to Big Data processing (FGRE2016)Introduction to Big Data processing (FGRE2016)
Introduction to Big Data processing (FGRE2016)
 
Papyri.info's Linked Data Story
Papyri.info's Linked Data StoryPapyri.info's Linked Data Story
Papyri.info's Linked Data Story
 
Big Data - Fast Machine Learning at Scale + Couchbase
Big Data - Fast Machine Learning at Scale + CouchbaseBig Data - Fast Machine Learning at Scale + Couchbase
Big Data - Fast Machine Learning at Scale + Couchbase
 
An introduction to Big-Data processing applying hadoop
An introduction to Big-Data processing applying hadoopAn introduction to Big-Data processing applying hadoop
An introduction to Big-Data processing applying hadoop
 
SparkR-Advance Analytic for Big Data
SparkR-Advance Analytic for Big DataSparkR-Advance Analytic for Big Data
SparkR-Advance Analytic for Big Data
 
HPCC Systems vs Hadoop
HPCC Systems vs HadoopHPCC Systems vs Hadoop
HPCC Systems vs Hadoop
 

Viewers also liked

Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...
MLconf
 
Deep Learning with Apache Spark and GPUs with Pierce Spitler
Deep Learning with Apache Spark and GPUs with Pierce SpitlerDeep Learning with Apache Spark and GPUs with Pierce Spitler
Deep Learning with Apache Spark and GPUs with Pierce Spitler
Databricks
 
Pynq祭り資料
Pynq祭り資料Pynq祭り資料
Pynq祭り資料
一路 川染
 
[db analytics showcase Sapporo 2017] A15: Pythonでの分散処理再入門 by 株式会社HPCソリューションズ ...
[db analytics showcase Sapporo 2017] A15: Pythonでの分散処理再入門 by 株式会社HPCソリューションズ ...[db analytics showcase Sapporo 2017] A15: Pythonでの分散処理再入門 by 株式会社HPCソリューションズ ...
[db analytics showcase Sapporo 2017] A15: Pythonでの分散処理再入門 by 株式会社HPCソリューションズ ...
Insight Technology, Inc.
 
PYNQ単体でUIを表示してみる(PYNQまつり)
PYNQ単体でUIを表示してみる(PYNQまつり)PYNQ単体でUIを表示してみる(PYNQまつり)
PYNQ単体でUIを表示してみる(PYNQまつり)
Kenta IDA
 
PYNQ祭りLT todotani
PYNQ祭りLT todotaniPYNQ祭りLT todotani
PYNQ祭りLT todotani
Kenshi Kamiya
 
Pynqでカメラ画像をリアルタイムfastx コーナー検出
Pynqでカメラ画像をリアルタイムfastx コーナー検出Pynqでカメラ画像をリアルタイムfastx コーナー検出
Pynqでカメラ画像をリアルタイムfastx コーナー検出
marsee101
 
Presto in my_use_case
Presto in my_use_casePresto in my_use_case
Presto in my_use_case
wyukawa
 
PYNQ 祭り: Pmod のプログラミング
PYNQ 祭り: Pmod のプログラミングPYNQ 祭り: Pmod のプログラミング
PYNQ 祭り: Pmod のプログラミング
ryos36
 
APACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van Niekerk
APACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van NiekerkAPACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van Niekerk
APACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van Niekerk
Spark Summit
 
PYNQで○○してみた!
PYNQで○○してみた!PYNQで○○してみた!
PYNQで○○してみた!
aster_ism
 
PYNQ祭り
PYNQ祭りPYNQ祭り
PYNQ祭り
Mr. Vengineer
 
Build a deep learning pipeline on apache spark for ads optimization
Build a deep learning pipeline on apache spark for ads optimizationBuild a deep learning pipeline on apache spark for ads optimization
Build a deep learning pipeline on apache spark for ads optimization
Craig Chao
 
Spark machine learning & deep learning
Spark machine learning & deep learningSpark machine learning & deep learning
Spark machine learning & deep learning
hoondong kim
 
コンピュータエンジニアへのFPGAのすすめ
コンピュータエンジニアへのFPGAのすすめコンピュータエンジニアへのFPGAのすすめ
コンピュータエンジニアへのFPGAのすすめ
Takeshi HASEGAWA
 

Viewers also liked (15)

Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...
 
Deep Learning with Apache Spark and GPUs with Pierce Spitler
Deep Learning with Apache Spark and GPUs with Pierce SpitlerDeep Learning with Apache Spark and GPUs with Pierce Spitler
Deep Learning with Apache Spark and GPUs with Pierce Spitler
 
Pynq祭り資料
Pynq祭り資料Pynq祭り資料
Pynq祭り資料
 
[db analytics showcase Sapporo 2017] A15: Pythonでの分散処理再入門 by 株式会社HPCソリューションズ ...
[db analytics showcase Sapporo 2017] A15: Pythonでの分散処理再入門 by 株式会社HPCソリューションズ ...[db analytics showcase Sapporo 2017] A15: Pythonでの分散処理再入門 by 株式会社HPCソリューションズ ...
[db analytics showcase Sapporo 2017] A15: Pythonでの分散処理再入門 by 株式会社HPCソリューションズ ...
 
PYNQ単体でUIを表示してみる(PYNQまつり)
PYNQ単体でUIを表示してみる(PYNQまつり)PYNQ単体でUIを表示してみる(PYNQまつり)
PYNQ単体でUIを表示してみる(PYNQまつり)
 
PYNQ祭りLT todotani
PYNQ祭りLT todotaniPYNQ祭りLT todotani
PYNQ祭りLT todotani
 
Pynqでカメラ画像をリアルタイムfastx コーナー検出
Pynqでカメラ画像をリアルタイムfastx コーナー検出Pynqでカメラ画像をリアルタイムfastx コーナー検出
Pynqでカメラ画像をリアルタイムfastx コーナー検出
 
Presto in my_use_case
Presto in my_use_casePresto in my_use_case
Presto in my_use_case
 
PYNQ 祭り: Pmod のプログラミング
PYNQ 祭り: Pmod のプログラミングPYNQ 祭り: Pmod のプログラミング
PYNQ 祭り: Pmod のプログラミング
 
APACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van Niekerk
APACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van NiekerkAPACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van Niekerk
APACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van Niekerk
 
PYNQで○○してみた!
PYNQで○○してみた!PYNQで○○してみた!
PYNQで○○してみた!
 
PYNQ祭り
PYNQ祭りPYNQ祭り
PYNQ祭り
 
Build a deep learning pipeline on apache spark for ads optimization
Build a deep learning pipeline on apache spark for ads optimizationBuild a deep learning pipeline on apache spark for ads optimization
Build a deep learning pipeline on apache spark for ads optimization
 
Spark machine learning & deep learning
Spark machine learning & deep learningSpark machine learning & deep learning
Spark machine learning & deep learning
 
コンピュータエンジニアへのFPGAのすすめ
コンピュータエンジニアへのFPGAのすすめコンピュータエンジニアへのFPGAのすすめ
コンピュータエンジニアへのFPGAのすすめ
 

Similar to 20170210 sapporotechbar7

Improving Python and Spark (PySpark) Performance and Interoperability
Improving Python and Spark (PySpark) Performance and InteroperabilityImproving Python and Spark (PySpark) Performance and Interoperability
Improving Python and Spark (PySpark) Performance and Interoperability
Wes McKinney
 
Improving Python and Spark Performance and Interoperability: Spark Summit Eas...
Improving Python and Spark Performance and Interoperability: Spark Summit Eas...Improving Python and Spark Performance and Interoperability: Spark Summit Eas...
Improving Python and Spark Performance and Interoperability: Spark Summit Eas...
Spark Summit
 
Introduction to Spark: Or how I learned to love 'big data' after all.
Introduction to Spark: Or how I learned to love 'big data' after all.Introduction to Spark: Or how I learned to love 'big data' after all.
Introduction to Spark: Or how I learned to love 'big data' after all.
Peadar Coyle
 
OpenStack in Action 4! Doug hellman - Intersection of OpenStack and python co...
OpenStack in Action 4! Doug hellman - Intersection of OpenStack and python co...OpenStack in Action 4! Doug hellman - Intersection of OpenStack and python co...
OpenStack in Action 4! Doug hellman - Intersection of OpenStack and python co...
eNovance
 
Harry Potter & Apache iceberg format
Harry Potter & Apache iceberg formatHarry Potter & Apache iceberg format
Harry Potter & Apache iceberg format
Taras Fedorov
 
Accelerating Big Data beyond the JVM - Fosdem 2018
Accelerating Big Data beyond the JVM - Fosdem 2018Accelerating Big Data beyond the JVM - Fosdem 2018
Accelerating Big Data beyond the JVM - Fosdem 2018
Holden Karau
 
NYC_2016_slides
NYC_2016_slidesNYC_2016_slides
NYC_2016_slides
Nathan Halko
 
Hadoop at ayasdi
Hadoop at ayasdiHadoop at ayasdi
Hadoop at ayasdi
Mohit Jaggi
 
Scaling Big Data with Hadoop and Mesos
Scaling Big Data with Hadoop and MesosScaling Big Data with Hadoop and Mesos
Scaling Big Data with Hadoop and Mesos
Discover Pinterest
 
Python in big data ecosystem by Nicholas Lu
Python in big data ecosystem by Nicholas LuPython in big data ecosystem by Nicholas Lu
Python in big data ecosystem by Nicholas Lu
PYCON MY PLT
 
Spark: The Good, the Bad, and the Ugly
Spark: The Good, the Bad, and the UglySpark: The Good, the Bad, and the Ugly
Spark: The Good, the Bad, and the Ugly
Sarah Guido
 
APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...
APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...
APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...
Deep Learning Italia
 
Contributing to pandas (Korean)
Contributing to pandas (Korean)Contributing to pandas (Korean)
Contributing to pandas (Korean)
Younggun Kim
 
Making the big data ecosystem work together with Python & Apache Arrow, Apach...
Making the big data ecosystem work together with Python & Apache Arrow, Apach...Making the big data ecosystem work together with Python & Apache Arrow, Apach...
Making the big data ecosystem work together with Python & Apache Arrow, Apach...
Holden Karau
 
Making the big data ecosystem work together with python apache arrow, spark,...
Making the big data ecosystem work together with python  apache arrow, spark,...Making the big data ecosystem work together with python  apache arrow, spark,...
Making the big data ecosystem work together with python apache arrow, spark,...
Holden Karau
 
Distributing Sage / Python Code, The Right Way
Distributing Sage / Python Code, The Right WayDistributing Sage / Python Code, The Right Way
Distributing Sage / Python Code, The Right Way
mmasdeu
 
Started with-apache-spark
Started with-apache-sparkStarted with-apache-spark
Started with-apache-spark
Happiest Minds Technologies
 
Lightning Fast Dataframes with Polars
Lightning Fast Dataframes with PolarsLightning Fast Dataframes with Polars
Lightning Fast Dataframes with Polars
Alberto Danese
 
5 Reasons why Spark is in demand!
5 Reasons why Spark is in demand!5 Reasons why Spark is in demand!
5 Reasons why Spark is in demand!
Edureka!
 
Putting data science to work
Putting data science to workPutting data science to work
Putting data science to work
Alex Breeze
 

Similar to 20170210 sapporotechbar7 (20)

Improving Python and Spark (PySpark) Performance and Interoperability
Improving Python and Spark (PySpark) Performance and InteroperabilityImproving Python and Spark (PySpark) Performance and Interoperability
Improving Python and Spark (PySpark) Performance and Interoperability
 
Improving Python and Spark Performance and Interoperability: Spark Summit Eas...
Improving Python and Spark Performance and Interoperability: Spark Summit Eas...Improving Python and Spark Performance and Interoperability: Spark Summit Eas...
Improving Python and Spark Performance and Interoperability: Spark Summit Eas...
 
Introduction to Spark: Or how I learned to love 'big data' after all.
Introduction to Spark: Or how I learned to love 'big data' after all.Introduction to Spark: Or how I learned to love 'big data' after all.
Introduction to Spark: Or how I learned to love 'big data' after all.
 
OpenStack in Action 4! Doug hellman - Intersection of OpenStack and python co...
OpenStack in Action 4! Doug hellman - Intersection of OpenStack and python co...OpenStack in Action 4! Doug hellman - Intersection of OpenStack and python co...
OpenStack in Action 4! Doug hellman - Intersection of OpenStack and python co...
 
Harry Potter & Apache iceberg format
Harry Potter & Apache iceberg formatHarry Potter & Apache iceberg format
Harry Potter & Apache iceberg format
 
Accelerating Big Data beyond the JVM - Fosdem 2018
Accelerating Big Data beyond the JVM - Fosdem 2018Accelerating Big Data beyond the JVM - Fosdem 2018
Accelerating Big Data beyond the JVM - Fosdem 2018
 
NYC_2016_slides
NYC_2016_slidesNYC_2016_slides
NYC_2016_slides
 
Hadoop at ayasdi
Hadoop at ayasdiHadoop at ayasdi
Hadoop at ayasdi
 
Scaling Big Data with Hadoop and Mesos
Scaling Big Data with Hadoop and MesosScaling Big Data with Hadoop and Mesos
Scaling Big Data with Hadoop and Mesos
 
Python in big data ecosystem by Nicholas Lu
Python in big data ecosystem by Nicholas LuPython in big data ecosystem by Nicholas Lu
Python in big data ecosystem by Nicholas Lu
 
Spark: The Good, the Bad, and the Ugly
Spark: The Good, the Bad, and the UglySpark: The Good, the Bad, and the Ugly
Spark: The Good, the Bad, and the Ugly
 
APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...
APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...
APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...
 
Contributing to pandas (Korean)
Contributing to pandas (Korean)Contributing to pandas (Korean)
Contributing to pandas (Korean)
 
Making the big data ecosystem work together with Python & Apache Arrow, Apach...
Making the big data ecosystem work together with Python & Apache Arrow, Apach...Making the big data ecosystem work together with Python & Apache Arrow, Apach...
Making the big data ecosystem work together with Python & Apache Arrow, Apach...
 
Making the big data ecosystem work together with python apache arrow, spark,...
Making the big data ecosystem work together with python  apache arrow, spark,...Making the big data ecosystem work together with python  apache arrow, spark,...
Making the big data ecosystem work together with python apache arrow, spark,...
 
Distributing Sage / Python Code, The Right Way
Distributing Sage / Python Code, The Right WayDistributing Sage / Python Code, The Right Way
Distributing Sage / Python Code, The Right Way
 
Started with-apache-spark
Started with-apache-sparkStarted with-apache-spark
Started with-apache-spark
 
Lightning Fast Dataframes with Polars
Lightning Fast Dataframes with PolarsLightning Fast Dataframes with Polars
Lightning Fast Dataframes with Polars
 
5 Reasons why Spark is in demand!
5 Reasons why Spark is in demand!5 Reasons why Spark is in demand!
5 Reasons why Spark is in demand!
 
Putting data science to work
Putting data science to workPutting data science to work
Putting data science to work
 

More from Ryuji Tamagawa

hbstudy 74 Site Reliability Engineering
hbstudy 74 Site Reliability Engineeringhbstudy 74 Site Reliability Engineering
hbstudy 74 Site Reliability Engineering
Ryuji Tamagawa
 
20161004 データ処理のプラットフォームとしてのpythonとpandas 東京
20161004 データ処理のプラットフォームとしてのpythonとpandas 東京20161004 データ処理のプラットフォームとしてのpythonとpandas 東京
20161004 データ処理のプラットフォームとしてのpythonとpandas 東京
Ryuji Tamagawa
 
20160708 データ処理のプラットフォームとしてのpython 札幌
20160708 データ処理のプラットフォームとしてのpython 札幌20160708 データ処理のプラットフォームとしてのpython 札幌
20160708 データ処理のプラットフォームとしてのpython 札幌
Ryuji Tamagawa
 
20160127三木会 RDB経験者のためのspark
20160127三木会 RDB経験者のためのspark20160127三木会 RDB経験者のためのspark
20160127三木会 RDB経験者のためのspark
Ryuji Tamagawa
 
20151205 Japan.R SparkRとParquet
20151205 Japan.R SparkRとParquet20151205 Japan.R SparkRとParquet
20151205 Japan.R SparkRとParquet
Ryuji Tamagawa
 
Performant data processing with PySpark, SparkR and DataFrame API
Performant data processing with PySpark, SparkR and DataFrame APIPerformant data processing with PySpark, SparkR and DataFrame API
Performant data processing with PySpark, SparkR and DataFrame API
Ryuji Tamagawa
 
Apache Sparkの紹介
Apache Sparkの紹介Apache Sparkの紹介
Apache Sparkの紹介
Ryuji Tamagawa
 
足を地に着け落ち着いて考える
足を地に着け落ち着いて考える足を地に着け落ち着いて考える
足を地に着け落ち着いて考える
Ryuji Tamagawa
 
ヘルシープログラマ・翻訳と実践
ヘルシープログラマ・翻訳と実践ヘルシープログラマ・翻訳と実践
ヘルシープログラマ・翻訳と実践
Ryuji Tamagawa
 
BigQueryの課金、節約しませんか
BigQueryの課金、節約しませんかBigQueryの課金、節約しませんか
BigQueryの課金、節約しませんか
Ryuji Tamagawa
 
You might be paying too much for BigQuery
You might be paying too much for BigQueryYou might be paying too much for BigQuery
You might be paying too much for BigQuery
Ryuji Tamagawa
 
Google BigQueryについて 紹介と推測
Google BigQueryについて 紹介と推測Google BigQueryについて 紹介と推測
Google BigQueryについて 紹介と推測
Ryuji Tamagawa
 
lessons learned from talking at rakuten technology conference
lessons learned from talking at rakuten technology conferencelessons learned from talking at rakuten technology conference
lessons learned from talking at rakuten technology conference
Ryuji Tamagawa
 
丸の内MongoDB勉強会#20LT 2.8のストレージエンジン動かしてみました
丸の内MongoDB勉強会#20LT 2.8のストレージエンジン動かしてみました丸の内MongoDB勉強会#20LT 2.8のストレージエンジン動かしてみました
丸の内MongoDB勉強会#20LT 2.8のストレージエンジン動かしてみました
Ryuji Tamagawa
 
Mongo dbを知ろう devlove関西
Mongo dbを知ろう   devlove関西Mongo dbを知ろう   devlove関西
Mongo dbを知ろう devlove関西
Ryuji Tamagawa
 
Seleniumをもっと知るための本の話
Seleniumをもっと知るための本の話Seleniumをもっと知るための本の話
Seleniumをもっと知るための本の話
Ryuji Tamagawa
 
データベース勉強会 In 広島 mongodb
データベース勉強会 In 広島  mongodbデータベース勉強会 In 広島  mongodb
データベース勉強会 In 広島 mongodb
Ryuji Tamagawa
 
Invitation to mongo db @ Rakuten TechTalk
Invitation to mongo db @ Rakuten TechTalkInvitation to mongo db @ Rakuten TechTalk
Invitation to mongo db @ Rakuten TechTalk
Ryuji Tamagawa
 
MongoDB tuning on AWS
MongoDB tuning on AWSMongoDB tuning on AWS
MongoDB tuning on AWS
Ryuji Tamagawa
 

More from Ryuji Tamagawa (20)

hbstudy 74 Site Reliability Engineering
hbstudy 74 Site Reliability Engineeringhbstudy 74 Site Reliability Engineering
hbstudy 74 Site Reliability Engineering
 
20161004 データ処理のプラットフォームとしてのpythonとpandas 東京
20161004 データ処理のプラットフォームとしてのpythonとpandas 東京20161004 データ処理のプラットフォームとしてのpythonとpandas 東京
20161004 データ処理のプラットフォームとしてのpythonとpandas 東京
 
20160708 データ処理のプラットフォームとしてのpython 札幌
20160708 データ処理のプラットフォームとしてのpython 札幌20160708 データ処理のプラットフォームとしてのpython 札幌
20160708 データ処理のプラットフォームとしてのpython 札幌
 
20160127三木会 RDB経験者のためのspark
20160127三木会 RDB経験者のためのspark20160127三木会 RDB経験者のためのspark
20160127三木会 RDB経験者のためのspark
 
20151205 Japan.R SparkRとParquet
20151205 Japan.R SparkRとParquet20151205 Japan.R SparkRとParquet
20151205 Japan.R SparkRとParquet
 
Performant data processing with PySpark, SparkR and DataFrame API
Performant data processing with PySpark, SparkR and DataFrame APIPerformant data processing with PySpark, SparkR and DataFrame API
Performant data processing with PySpark, SparkR and DataFrame API
 
Apache Sparkの紹介
Apache Sparkの紹介Apache Sparkの紹介
Apache Sparkの紹介
 
足を地に着け落ち着いて考える
足を地に着け落ち着いて考える足を地に着け落ち着いて考える
足を地に着け落ち着いて考える
 
ヘルシープログラマ・翻訳と実践
ヘルシープログラマ・翻訳と実践ヘルシープログラマ・翻訳と実践
ヘルシープログラマ・翻訳と実践
 
Google Big Query
Google Big QueryGoogle Big Query
Google Big Query
 
BigQueryの課金、節約しませんか
BigQueryの課金、節約しませんかBigQueryの課金、節約しませんか
BigQueryの課金、節約しませんか
 
You might be paying too much for BigQuery
You might be paying too much for BigQueryYou might be paying too much for BigQuery
You might be paying too much for BigQuery
 
Google BigQueryについて 紹介と推測
Google BigQueryについて 紹介と推測Google BigQueryについて 紹介と推測
Google BigQueryについて 紹介と推測
 
lessons learned from talking at rakuten technology conference
lessons learned from talking at rakuten technology conferencelessons learned from talking at rakuten technology conference
lessons learned from talking at rakuten technology conference
 
丸の内MongoDB勉強会#20LT 2.8のストレージエンジン動かしてみました
丸の内MongoDB勉強会#20LT 2.8のストレージエンジン動かしてみました丸の内MongoDB勉強会#20LT 2.8のストレージエンジン動かしてみました
丸の内MongoDB勉強会#20LT 2.8のストレージエンジン動かしてみました
 
Mongo dbを知ろう devlove関西
Mongo dbを知ろう   devlove関西Mongo dbを知ろう   devlove関西
Mongo dbを知ろう devlove関西
 
Seleniumをもっと知るための本の話
Seleniumをもっと知るための本の話Seleniumをもっと知るための本の話
Seleniumをもっと知るための本の話
 
データベース勉強会 In 広島 mongodb
データベース勉強会 In 広島  mongodbデータベース勉強会 In 広島  mongodb
データベース勉強会 In 広島 mongodb
 
Invitation to mongo db @ Rakuten TechTalk
Invitation to mongo db @ Rakuten TechTalkInvitation to mongo db @ Rakuten TechTalk
Invitation to mongo db @ Rakuten TechTalk
 
MongoDB tuning on AWS
MongoDB tuning on AWSMongoDB tuning on AWS
MongoDB tuning on AWS
 

Recently uploaded

Oracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptxOracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptx
Remote DBA Services
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
Ayan Halder
 
316895207-SAP-Oil-and-Gas-Downstream-Training.pptx
316895207-SAP-Oil-and-Gas-Downstream-Training.pptx316895207-SAP-Oil-and-Gas-Downstream-Training.pptx
316895207-SAP-Oil-and-Gas-Downstream-Training.pptx
ssuserad3af4
 
Oracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptxOracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptx
Remote DBA Services
 
一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理
dakas1
 
Using Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query PerformanceUsing Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query Performance
Grant Fritchey
 
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CDKuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
rodomar2
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
Rakesh Kumar R
 
Odoo ERP Vs. Traditional ERP Systems – A Comparative Analysis
Odoo ERP Vs. Traditional ERP Systems – A Comparative AnalysisOdoo ERP Vs. Traditional ERP Systems – A Comparative Analysis
Odoo ERP Vs. Traditional ERP Systems – A Comparative Analysis
Envertis Software Solutions
 
Mobile app Development Services | Drona Infotech
Mobile app Development Services  | Drona InfotechMobile app Development Services  | Drona Infotech
Mobile app Development Services | Drona Infotech
Drona Infotech
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
Octavian Nadolu
 
WWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders AustinWWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders Austin
Patrick Weigel
 
Top 9 Trends in Cybersecurity for 2024.pptx
Top 9 Trends in Cybersecurity for 2024.pptxTop 9 Trends in Cybersecurity for 2024.pptx
Top 9 Trends in Cybersecurity for 2024.pptx
devvsandy
 
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
mz5nrf0n
 
UI5con 2024 - Bring Your Own Design System
UI5con 2024 - Bring Your Own Design SystemUI5con 2024 - Bring Your Own Design System
UI5con 2024 - Bring Your Own Design System
Peter Muessig
 
SQL Accounting Software Brochure Malaysia
SQL Accounting Software Brochure MalaysiaSQL Accounting Software Brochure Malaysia
SQL Accounting Software Brochure Malaysia
GohKiangHock
 
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
kalichargn70th171
 
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Julian Hyde
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Łukasz Chruściel
 
Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !
Marcin Chrost
 

Recently uploaded (20)

Oracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptxOracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptx
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
 
316895207-SAP-Oil-and-Gas-Downstream-Training.pptx
316895207-SAP-Oil-and-Gas-Downstream-Training.pptx316895207-SAP-Oil-and-Gas-Downstream-Training.pptx
316895207-SAP-Oil-and-Gas-Downstream-Training.pptx
 
Oracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptxOracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptx
 
一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理
 
Using Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query PerformanceUsing Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query Performance
 
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CDKuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
 
Odoo ERP Vs. Traditional ERP Systems – A Comparative Analysis
Odoo ERP Vs. Traditional ERP Systems – A Comparative AnalysisOdoo ERP Vs. Traditional ERP Systems – A Comparative Analysis
Odoo ERP Vs. Traditional ERP Systems – A Comparative Analysis
 
Mobile app Development Services | Drona Infotech
Mobile app Development Services  | Drona InfotechMobile app Development Services  | Drona Infotech
Mobile app Development Services | Drona Infotech
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
 
WWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders AustinWWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders Austin
 
Top 9 Trends in Cybersecurity for 2024.pptx
Top 9 Trends in Cybersecurity for 2024.pptxTop 9 Trends in Cybersecurity for 2024.pptx
Top 9 Trends in Cybersecurity for 2024.pptx
 
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
 
UI5con 2024 - Bring Your Own Design System
UI5con 2024 - Bring Your Own Design SystemUI5con 2024 - Bring Your Own Design System
UI5con 2024 - Bring Your Own Design System
 
SQL Accounting Software Brochure Malaysia
SQL Accounting Software Brochure MalaysiaSQL Accounting Software Brochure Malaysia
SQL Accounting Software Brochure Malaysia
 
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
 
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
 
Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !
 

20170210 sapporotechbar7

  • 1. PyData & Apache Spark 2017 / 2 / 10 Sapporo TechBar #7 @
  • 2. ▸ facebook : Ryuji Tamagawa ▸ Twitter : tamagawa_ryuji ▸ FB techbar ▸ FB ▸ Twitter
  • 3.
  • 4. 5
  • 5.
  • 8. 1 / 5 : PyData
  • 9. 1 / 5 : PyData PyData.org
  • 10. 1 / 5 : PyData PyData Anaconda Python Blaze NumPy and pandas interface to Big Data'. dask Bokeh Canopy Python IPython matplotlib PyData nose numba JIT NumPy PyData Scipy PyData Statsmodels SymPy pandas NumPy SciPy scikit-image scikit-learn PyData 

  • 12. 2 / 5 : pandas pandas ▸ NumPy SciPy 
 ▸ DataFrame ▸
  • 13. 2 / 5 : pandas pandas 
 Wes McKinney
  • 14. 2 / 5 : pandas DataFrame
  • 15. 2 / 5 : pandas
  • 16. 2 / 5 : pandas ▸ 
 Python ▸ ▸ PyData pandas 

  • 18. 3 /5 : Jupyter Notebook IPython Notebook ▸ Jupyter Notebook ▸ Julia Python R ▸ JupyterCon
  • 19. 3 /5 : Jupyter Notebook
  • 20. 3 /5 : Jupyter Notebook
  • 21. 3 /5 : Jupyter Notebook pandas / matplotlib
  • 22. 3 /5 : Jupyter Notebook Interactive Widget
  • 23. 3 /5 : Jupyter Notebook ▸ Learning Jupyter
  • 25. 4 / 5 : Apache Spark Hadoop ▸ MapReduce Spark ▸ 2010 Hadoop = MapReduce + HDFS ▸ Hadoop OS HDFS Hive e.t.c. HBaseMapReduce YARN Impala e.t.c in- memory SQL engine Spark Spark Streaming, MLlib, GraphX, Spark SQL) Hadoop HDFS S3 
 YARN Mesos 
 /
  • 26. 4 / 5 : Apache Spark Apache Spark PyData pandas Apache Spark pandas JVM Python × dask I/O Scala Java Python R
 JVM Python
  • 27. 4 / 5 : Apache Spark Spark ▸ ▸ ▸ 1 PC 
 Hadoop / MapReduce
  • 28. 4 / 5 : Apache Spark DataFrame
  • 29. 4 / 5 : Apache Spark ▸ ▸ SSD ▸ Spark Parquet ▸ Performance comparison of different file formats and storage engines in the Hadoop ecosystem ▸ Parquet Python
  • 30. 4 / 5 : Apache Spark Apache Spark ▸ ▸ Parquet ▸ ▸
  • 32. Machine Learning ▸ ▸ scikit-learn ▸ Spark MLlib / ML ▸ ▸ TensorFlow ▸ Python
  • 33. 2017 and the future
  • 34. 5/5 : 2017 and the future PyData ▸ ▸ Spark - pandas ▸ pandas → Spark …
  • 35. 5/5 : 2017 and the future Wes blog ▸ pandas Apache Arrow ▸ Blog ▸ PyData Blog 
 Wes OK ▸ 2017 : pandas, Arrow, Feather, Parquet, Spark, Ibis
 http://qiita.com/tamagawa-ryuji/items/deb3f63ed4c7c8065e81
  • 36. 5/5 : 2017 and the future High speed Apache Parquet for Python ▸ Parquet ▸ Spark ▸ Python ▸ Fastparquet ▸ pyarrow
  • 37. 5/5 : 2017 and the future : apache arrow ▸ apache arrow ▸ PyData / OSS ▸ /