SlideShare a Scribd company logo
Spark + Scikit-Learn -
Performance Tuning
Who am I ?
● Kent (施晨揚)
● 熱愛 Machine Learning & Big
Data
● 兩個孩子的爸
https://www.facebook.com/texib
What ? Key Factor - Influence Performance
• Key Factor - Influence Performance
• Large Raw Data Size (4 Billion Record)
• Large Number of Cookies (40 Million Records )
• Machine Learning Library - Prediction Function Cost
How ?
● Spark
o Parallel Computing
o Scaleable
o Very Powerful Data Processing Tool
o 但 Machine Learning Library ….
● Python Scikit-Learn
o Very Powerful Machine Learning Library
o 但部份都只能用到單核 XD
So We Have a Idea !
Data Size
Training
Data Prediction Data
Major
Problem
<~ 100 times
Use Python Prepare Prediction Data
Prepare Data
Train Model
Prepare Prediction
Data
About 30 Mins
> 1 Weeks
Do Prediction
Aggregation - Where is Slow ?
50%
• Aggregate 4 Billion Rows to 40 Million Cookies is a Very
Consuming Job
Use mapPartitions()
• Instead of Using ReduceByKey() with Yours Aggregation
Logic
• How :
• 1 step : use db(redshift) to prepare prediction data
order by cookie
• 2 step : use local map partitions to do batch prediction
Use ReduceByKey
(A,1)
(B,2)
(A,3)
(A,3)
(B,4)
(C,2)
(B,2)
(C,2)
(D.1)
(C,1)
(D,2)
(D,1)
(A,4)
(B,2)
(A,3)
(B,4)
(C,2)
(B,2)
(C,2)
(D.1)
(C,1)
(D,3)
(A,7)
(B,6)
(C,2)
(B,2)
(C,3)
(D,4)
(A,7)
(B,8)
(C,5)
(D,4)
24 hours!!
Use DB to Pre-Sort and mapPartitions
(A,1)
(A,3)
(A,3)
(B,2)
(B,4)
(B,2)
(C,2)
(C,2)
(C,1)
(D,1)
(D,2)
(D,1)
(A,7)
(B,8)
(D,4)
(C,5)
12 hours!!
Prediction - Atomic Job
Do Prediction
Prediction - Batch Job
Do Prediction
Conclusion
• Use db to do presort data better than do aggregation by
spark
• Use batch better than atomic
Another Case - Spam Article Classifier
• Article Structure Classifier
• Article Content Classifier
• Bag of Word
• High Dimension Feature Space
• Very Sparse Vector
• Large Number of Documents
Original
sc.textfile
RDD
text to terms
RDD
collect to python
list
2 Millions
Docs
Bang!!
dict
vectorize
sparse vectorstd-idf transformbuild classifier
New
sc.textfile
RDD
text to terms
RDD
distinct rdd
tf-idf transformtf-idf sparse matrix
collect terms
and
Build Vectorize
terms to sparse
vector
RDD
collect sparse
vector to list
Use Vstack
list to sparse
martix
build classifier

More Related Content

What's hot

Big data PPT
Big data PPT Big data PPT
Big data PPT
Nitesh Dubey
 
Frequent itemset mining_on_hadoop
Frequent itemset mining_on_hadoopFrequent itemset mining_on_hadoop
Frequent itemset mining_on_hadoop
SWAMI06
 
Big data – An Introduction, July 2013
Big data – An Introduction, July 2013Big data – An Introduction, July 2013
Big data – An Introduction, July 2013
Peter Morgan
 
Data infrastructure architecture for medium size organization: tips for colle...
Data infrastructure architecture for medium size organization: tips for colle...Data infrastructure architecture for medium size organization: tips for colle...
Data infrastructure architecture for medium size organization: tips for colle...
DataWorks Summit/Hadoop Summit
 
Overview of bigdata
Overview of bigdataOverview of bigdata
Overview of bigdata
Abinaya B
 
A Hadoop Primer
A Hadoop PrimerA Hadoop Primer
A Hadoop Primer
sogrady
 
BitBootCamp Evening Classes
BitBootCamp Evening ClassesBitBootCamp Evening Classes
BitBootCamp Evening Classes
Menish Gupta
 
Biq query devfest2017_slides
Biq query devfest2017_slidesBiq query devfest2017_slides
Biq query devfest2017_slides
getdinesh
 
Hudson Data Corp Training
Hudson Data Corp TrainingHudson Data Corp Training
Hudson Data Corp Training
Menish Gupta
 
Big Data Analytics: Finding diamonds in the rough with Azure
Big Data Analytics: Finding diamonds in the rough with AzureBig Data Analytics: Finding diamonds in the rough with Azure
Big Data Analytics: Finding diamonds in the rough with Azure
Christos Charmatzis
 
『繋がり』を見る: Cytoscapeと周辺ツールを使ったグラフデータ可視化入門
『繋がり』を見る: Cytoscapeと周辺ツールを使ったグラフデータ可視化入門『繋がり』を見る: Cytoscapeと周辺ツールを使ったグラフデータ可視化入門
『繋がり』を見る: Cytoscapeと周辺ツールを使ったグラフデータ可視化入門
Keiichiro Ono
 
WeCloudData Toronto Open311 Workshop - Matthew Reyes
WeCloudData Toronto Open311 Workshop - Matthew ReyesWeCloudData Toronto Open311 Workshop - Matthew Reyes
WeCloudData Toronto Open311 Workshop - Matthew Reyes
WeCloudData
 
Big data ecosystem
Big data ecosystemBig data ecosystem
Big data ecosystem
SlideCentral
 
Big Data Day LA 2015 - Applications of the Apriori Algorithm on Open Data by ...
Big Data Day LA 2015 - Applications of the Apriori Algorithm on Open Data by ...Big Data Day LA 2015 - Applications of the Apriori Algorithm on Open Data by ...
Big Data Day LA 2015 - Applications of the Apriori Algorithm on Open Data by ...
Data Con LA
 
ECU SBL Learning Analytics for Assurance of Learning
ECU SBL Learning Analytics for Assurance of LearningECU SBL Learning Analytics for Assurance of Learning
ECU SBL Learning Analytics for Assurance of Learning
Sue Hickton
 

What's hot (15)

Big data PPT
Big data PPT Big data PPT
Big data PPT
 
Frequent itemset mining_on_hadoop
Frequent itemset mining_on_hadoopFrequent itemset mining_on_hadoop
Frequent itemset mining_on_hadoop
 
Big data – An Introduction, July 2013
Big data – An Introduction, July 2013Big data – An Introduction, July 2013
Big data – An Introduction, July 2013
 
Data infrastructure architecture for medium size organization: tips for colle...
Data infrastructure architecture for medium size organization: tips for colle...Data infrastructure architecture for medium size organization: tips for colle...
Data infrastructure architecture for medium size organization: tips for colle...
 
Overview of bigdata
Overview of bigdataOverview of bigdata
Overview of bigdata
 
A Hadoop Primer
A Hadoop PrimerA Hadoop Primer
A Hadoop Primer
 
BitBootCamp Evening Classes
BitBootCamp Evening ClassesBitBootCamp Evening Classes
BitBootCamp Evening Classes
 
Biq query devfest2017_slides
Biq query devfest2017_slidesBiq query devfest2017_slides
Biq query devfest2017_slides
 
Hudson Data Corp Training
Hudson Data Corp TrainingHudson Data Corp Training
Hudson Data Corp Training
 
Big Data Analytics: Finding diamonds in the rough with Azure
Big Data Analytics: Finding diamonds in the rough with AzureBig Data Analytics: Finding diamonds in the rough with Azure
Big Data Analytics: Finding diamonds in the rough with Azure
 
『繋がり』を見る: Cytoscapeと周辺ツールを使ったグラフデータ可視化入門
『繋がり』を見る: Cytoscapeと周辺ツールを使ったグラフデータ可視化入門『繋がり』を見る: Cytoscapeと周辺ツールを使ったグラフデータ可視化入門
『繋がり』を見る: Cytoscapeと周辺ツールを使ったグラフデータ可視化入門
 
WeCloudData Toronto Open311 Workshop - Matthew Reyes
WeCloudData Toronto Open311 Workshop - Matthew ReyesWeCloudData Toronto Open311 Workshop - Matthew Reyes
WeCloudData Toronto Open311 Workshop - Matthew Reyes
 
Big data ecosystem
Big data ecosystemBig data ecosystem
Big data ecosystem
 
Big Data Day LA 2015 - Applications of the Apriori Algorithm on Open Data by ...
Big Data Day LA 2015 - Applications of the Apriori Algorithm on Open Data by ...Big Data Day LA 2015 - Applications of the Apriori Algorithm on Open Data by ...
Big Data Day LA 2015 - Applications of the Apriori Algorithm on Open Data by ...
 
ECU SBL Learning Analytics for Assurance of Learning
ECU SBL Learning Analytics for Assurance of LearningECU SBL Learning Analytics for Assurance of Learning
ECU SBL Learning Analytics for Assurance of Learning
 

Viewers also liked

Building a modern Application with DataFrames
Building a modern Application with DataFramesBuilding a modern Application with DataFrames
Building a modern Application with DataFrames
Spark Summit
 
What’s eating python performance
What’s eating python performanceWhat’s eating python performance
What’s eating python performance
Piotr Przymus
 
Vasiliy Litvinov - Python Profiling
Vasiliy Litvinov - Python ProfilingVasiliy Litvinov - Python Profiling
Vasiliy Litvinov - Python Profiling
Sergey Arkhipov
 
Denis Nagorny - Pumping Python Performance
Denis Nagorny - Pumping Python PerformanceDenis Nagorny - Pumping Python Performance
Denis Nagorny - Pumping Python Performance
Sergey Arkhipov
 
The High Performance Python Landscape by Ian Ozsvald
The High Performance Python Landscape by Ian OzsvaldThe High Performance Python Landscape by Ian Ozsvald
The High Performance Python Landscape by Ian Ozsvald
PyData
 
Boost.Python: C++ and Python Integration
Boost.Python: C++ and Python IntegrationBoost.Python: C++ and Python Integration
Boost.Python: C++ and Python Integration
GlobalLogic Ukraine
 
Python profiling
Python profilingPython profiling
Python profiling
dreampuf
 
Spark sql
Spark sqlSpark sql
Spark sql
Freeman Zhang
 
Exploiting GPUs in Spark
Exploiting GPUs in SparkExploiting GPUs in Spark
Exploiting GPUs in Spark
Kazuaki Ishizaki
 
Building a modern Application with DataFrames
Building a modern Application with DataFramesBuilding a modern Application with DataFrames
Building a modern Application with DataFrames
Databricks
 
Spark performance tuning - Maksud Ibrahimov
Spark performance tuning - Maksud IbrahimovSpark performance tuning - Maksud Ibrahimov
Spark performance tuning - Maksud Ibrahimov
Maksud Ibrahimov
 
The Potential of GPU-driven High Performance Data Analytics in Spark
The Potential of GPU-driven High Performance Data Analytics in SparkThe Potential of GPU-driven High Performance Data Analytics in Spark
The Potential of GPU-driven High Performance Data Analytics in Spark
Spark Summit
 
Hadoop Spark Introduction-20150130
Hadoop Spark Introduction-20150130Hadoop Spark Introduction-20150130
Hadoop Spark Introduction-20150130
Xuan-Chao Huang
 
Python performance profiling
Python performance profilingPython performance profiling
Python performance profiling
Jon Haddad
 
GPU Support in Spark and GPU/CPU Mixed Resource Scheduling at Production Scale
GPU Support in Spark and GPU/CPU Mixed Resource Scheduling at Production ScaleGPU Support in Spark and GPU/CPU Mixed Resource Scheduling at Production Scale
GPU Support in Spark and GPU/CPU Mixed Resource Scheduling at Production Scale
sparktc
 
Cornami Accelerates Performance on SPARK: Spark Summit East talk by Paul Master
Cornami Accelerates Performance on SPARK: Spark Summit East talk by Paul MasterCornami Accelerates Performance on SPARK: Spark Summit East talk by Paul Master
Cornami Accelerates Performance on SPARK: Spark Summit East talk by Paul Master
Spark Summit
 
How to Boost 100x Performance for Real World Application with Apache Spark-(G...
How to Boost 100x Performance for Real World Application with Apache Spark-(G...How to Boost 100x Performance for Real World Application with Apache Spark-(G...
How to Boost 100x Performance for Real World Application with Apache Spark-(G...
Spark Summit
 
Making Sense of Spark Performance-(Kay Ousterhout, UC Berkeley)
Making Sense of Spark Performance-(Kay Ousterhout, UC Berkeley)Making Sense of Spark Performance-(Kay Ousterhout, UC Berkeley)
Making Sense of Spark Performance-(Kay Ousterhout, UC Berkeley)
Spark Summit
 
GPU Support In Spark And GPU/CPU Mixed Resource Scheduling At Production Scale
GPU Support In Spark And GPU/CPU Mixed Resource Scheduling At Production ScaleGPU Support In Spark And GPU/CPU Mixed Resource Scheduling At Production Scale
GPU Support In Spark And GPU/CPU Mixed Resource Scheduling At Production Scale
Spark Summit
 
Getting The Best Performance With PySpark
Getting The Best Performance With PySparkGetting The Best Performance With PySpark
Getting The Best Performance With PySpark
Spark Summit
 

Viewers also liked (20)

Building a modern Application with DataFrames
Building a modern Application with DataFramesBuilding a modern Application with DataFrames
Building a modern Application with DataFrames
 
What’s eating python performance
What’s eating python performanceWhat’s eating python performance
What’s eating python performance
 
Vasiliy Litvinov - Python Profiling
Vasiliy Litvinov - Python ProfilingVasiliy Litvinov - Python Profiling
Vasiliy Litvinov - Python Profiling
 
Denis Nagorny - Pumping Python Performance
Denis Nagorny - Pumping Python PerformanceDenis Nagorny - Pumping Python Performance
Denis Nagorny - Pumping Python Performance
 
The High Performance Python Landscape by Ian Ozsvald
The High Performance Python Landscape by Ian OzsvaldThe High Performance Python Landscape by Ian Ozsvald
The High Performance Python Landscape by Ian Ozsvald
 
Boost.Python: C++ and Python Integration
Boost.Python: C++ and Python IntegrationBoost.Python: C++ and Python Integration
Boost.Python: C++ and Python Integration
 
Python profiling
Python profilingPython profiling
Python profiling
 
Spark sql
Spark sqlSpark sql
Spark sql
 
Exploiting GPUs in Spark
Exploiting GPUs in SparkExploiting GPUs in Spark
Exploiting GPUs in Spark
 
Building a modern Application with DataFrames
Building a modern Application with DataFramesBuilding a modern Application with DataFrames
Building a modern Application with DataFrames
 
Spark performance tuning - Maksud Ibrahimov
Spark performance tuning - Maksud IbrahimovSpark performance tuning - Maksud Ibrahimov
Spark performance tuning - Maksud Ibrahimov
 
The Potential of GPU-driven High Performance Data Analytics in Spark
The Potential of GPU-driven High Performance Data Analytics in SparkThe Potential of GPU-driven High Performance Data Analytics in Spark
The Potential of GPU-driven High Performance Data Analytics in Spark
 
Hadoop Spark Introduction-20150130
Hadoop Spark Introduction-20150130Hadoop Spark Introduction-20150130
Hadoop Spark Introduction-20150130
 
Python performance profiling
Python performance profilingPython performance profiling
Python performance profiling
 
GPU Support in Spark and GPU/CPU Mixed Resource Scheduling at Production Scale
GPU Support in Spark and GPU/CPU Mixed Resource Scheduling at Production ScaleGPU Support in Spark and GPU/CPU Mixed Resource Scheduling at Production Scale
GPU Support in Spark and GPU/CPU Mixed Resource Scheduling at Production Scale
 
Cornami Accelerates Performance on SPARK: Spark Summit East talk by Paul Master
Cornami Accelerates Performance on SPARK: Spark Summit East talk by Paul MasterCornami Accelerates Performance on SPARK: Spark Summit East talk by Paul Master
Cornami Accelerates Performance on SPARK: Spark Summit East talk by Paul Master
 
How to Boost 100x Performance for Real World Application with Apache Spark-(G...
How to Boost 100x Performance for Real World Application with Apache Spark-(G...How to Boost 100x Performance for Real World Application with Apache Spark-(G...
How to Boost 100x Performance for Real World Application with Apache Spark-(G...
 
Making Sense of Spark Performance-(Kay Ousterhout, UC Berkeley)
Making Sense of Spark Performance-(Kay Ousterhout, UC Berkeley)Making Sense of Spark Performance-(Kay Ousterhout, UC Berkeley)
Making Sense of Spark Performance-(Kay Ousterhout, UC Berkeley)
 
GPU Support In Spark And GPU/CPU Mixed Resource Scheduling At Production Scale
GPU Support In Spark And GPU/CPU Mixed Resource Scheduling At Production ScaleGPU Support In Spark And GPU/CPU Mixed Resource Scheduling At Production Scale
GPU Support In Spark And GPU/CPU Mixed Resource Scheduling At Production Scale
 
Getting The Best Performance With PySpark
Getting The Best Performance With PySparkGetting The Best Performance With PySpark
Getting The Best Performance With PySpark
 

Similar to Spark + Scikit Learn- Performance Tuning

Using BigBench to compare Hive and Spark (Long version)
Using BigBench to compare Hive and Spark (Long version)Using BigBench to compare Hive and Spark (Long version)
Using BigBench to compare Hive and Spark (Long version)
Nicolas Poggi
 
Spotify in the Cloud - An evolution of data infrastructure - Strata NYC
Spotify in the Cloud - An evolution of data infrastructure - Strata NYCSpotify in the Cloud - An evolution of data infrastructure - Strata NYC
Spotify in the Cloud - An evolution of data infrastructure - Strata NYC
Josh Baer
 
Ncku csie talk about Spark
Ncku csie talk about SparkNcku csie talk about Spark
Ncku csie talk about Spark
Giivee The
 
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014ALTER WAY
 
HadoopCon 2016 - 用 Jupyter Notebook Hold 住一個上線 Spark Machine Learning 專案實戰
HadoopCon 2016  - 用 Jupyter Notebook Hold 住一個上線 Spark  Machine Learning 專案實戰HadoopCon 2016  - 用 Jupyter Notebook Hold 住一個上線 Spark  Machine Learning 專案實戰
HadoopCon 2016 - 用 Jupyter Notebook Hold 住一個上線 Spark Machine Learning 專案實戰
Wayne Chen
 
There is more to Big Data than data
There is more to Big Data than dataThere is more to Big Data than data
There is more to Big Data than dataCapgemini
 
Data Engineer's Lunch 90: Migrating SQL Data with Arcion
Data Engineer's Lunch 90: Migrating SQL Data with ArcionData Engineer's Lunch 90: Migrating SQL Data with Arcion
Data Engineer's Lunch 90: Migrating SQL Data with Arcion
Anant Corporation
 
Python ml
Python mlPython ml
Python ml
Shubham Sharma
 
How Data Science can boost your SEO ?
How Data Science can boost your SEO ?How Data Science can boost your SEO ?
How Data Science can boost your SEO ?
Vincent Terrasi
 
Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...
Simplilearn
 
Data analytics & its Trends
Data analytics & its TrendsData analytics & its Trends
Data analytics & its Trends
Dr.K.Sreenivas Rao
 
Big Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and RoadmapBig Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and RoadmapSrinath Perera
 
Approximate "Now" is Better Than Accurate "Later"
Approximate "Now" is Better Than Accurate "Later"Approximate "Now" is Better Than Accurate "Later"
Approximate "Now" is Better Than Accurate "Later"
NUS-ISS
 
Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for Experimentation
Gleb Kanterov
 
Etl with apache impala by athemaster
Etl with apache impala by athemasterEtl with apache impala by athemaster
Etl with apache impala by athemaster
Athemaster Co., Ltd.
 
Introduction to hadoop
Introduction to hadoopIntroduction to hadoop
Introduction to hadoop
dhruv_gairola
 
Jeremy Engle's slides from Redshift / Big Data meetup on July 13, 2017
Jeremy Engle's slides from Redshift / Big Data meetup on July 13, 2017Jeremy Engle's slides from Redshift / Big Data meetup on July 13, 2017
Jeremy Engle's slides from Redshift / Big Data meetup on July 13, 2017
AWS Chicago
 
New Capabilities in the PyData Ecosystem
New Capabilities in the PyData EcosystemNew Capabilities in the PyData Ecosystem
New Capabilities in the PyData Ecosystem
Turi, Inc.
 
Faster Faster Faster! Datamarts with Hive at Yahoo
Faster Faster Faster! Datamarts with Hive at YahooFaster Faster Faster! Datamarts with Hive at Yahoo
Faster Faster Faster! Datamarts with Hive at Yahoo
Mithun Radhakrishnan
 
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on HiveFaster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
DataWorks Summit/Hadoop Summit
 

Similar to Spark + Scikit Learn- Performance Tuning (20)

Using BigBench to compare Hive and Spark (Long version)
Using BigBench to compare Hive and Spark (Long version)Using BigBench to compare Hive and Spark (Long version)
Using BigBench to compare Hive and Spark (Long version)
 
Spotify in the Cloud - An evolution of data infrastructure - Strata NYC
Spotify in the Cloud - An evolution of data infrastructure - Strata NYCSpotify in the Cloud - An evolution of data infrastructure - Strata NYC
Spotify in the Cloud - An evolution of data infrastructure - Strata NYC
 
Ncku csie talk about Spark
Ncku csie talk about SparkNcku csie talk about Spark
Ncku csie talk about Spark
 
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
 
HadoopCon 2016 - 用 Jupyter Notebook Hold 住一個上線 Spark Machine Learning 專案實戰
HadoopCon 2016  - 用 Jupyter Notebook Hold 住一個上線 Spark  Machine Learning 專案實戰HadoopCon 2016  - 用 Jupyter Notebook Hold 住一個上線 Spark  Machine Learning 專案實戰
HadoopCon 2016 - 用 Jupyter Notebook Hold 住一個上線 Spark Machine Learning 專案實戰
 
There is more to Big Data than data
There is more to Big Data than dataThere is more to Big Data than data
There is more to Big Data than data
 
Data Engineer's Lunch 90: Migrating SQL Data with Arcion
Data Engineer's Lunch 90: Migrating SQL Data with ArcionData Engineer's Lunch 90: Migrating SQL Data with Arcion
Data Engineer's Lunch 90: Migrating SQL Data with Arcion
 
Python ml
Python mlPython ml
Python ml
 
How Data Science can boost your SEO ?
How Data Science can boost your SEO ?How Data Science can boost your SEO ?
How Data Science can boost your SEO ?
 
Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...
 
Data analytics & its Trends
Data analytics & its TrendsData analytics & its Trends
Data analytics & its Trends
 
Big Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and RoadmapBig Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and Roadmap
 
Approximate "Now" is Better Than Accurate "Later"
Approximate "Now" is Better Than Accurate "Later"Approximate "Now" is Better Than Accurate "Later"
Approximate "Now" is Better Than Accurate "Later"
 
Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for Experimentation
 
Etl with apache impala by athemaster
Etl with apache impala by athemasterEtl with apache impala by athemaster
Etl with apache impala by athemaster
 
Introduction to hadoop
Introduction to hadoopIntroduction to hadoop
Introduction to hadoop
 
Jeremy Engle's slides from Redshift / Big Data meetup on July 13, 2017
Jeremy Engle's slides from Redshift / Big Data meetup on July 13, 2017Jeremy Engle's slides from Redshift / Big Data meetup on July 13, 2017
Jeremy Engle's slides from Redshift / Big Data meetup on July 13, 2017
 
New Capabilities in the PyData Ecosystem
New Capabilities in the PyData EcosystemNew Capabilities in the PyData Ecosystem
New Capabilities in the PyData Ecosystem
 
Faster Faster Faster! Datamarts with Hive at Yahoo
Faster Faster Faster! Datamarts with Hive at YahooFaster Faster Faster! Datamarts with Hive at Yahoo
Faster Faster Faster! Datamarts with Hive at Yahoo
 
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on HiveFaster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
 

More from 晨揚 施

Google Cloud Summit BigQuery PIXNET
Google Cloud Summit BigQuery PIXNETGoogle Cloud Summit BigQuery PIXNET
Google Cloud Summit BigQuery PIXNET
晨揚 施
 
How to run an AI Project @pixnet
How to run an AI Project @pixnetHow to run an AI Project @pixnet
How to run an AI Project @pixnet
晨揚 施
 
2016 Hadoop Conf TW - 如何建置數據精靈
2016 Hadoop Conf TW - 如何建置數據精靈2016 Hadoop Conf TW - 如何建置數據精靈
2016 Hadoop Conf TW - 如何建置數據精靈
晨揚 施
 
痞客邦的關鍵字精靈
痞客邦的關鍵字精靈痞客邦的關鍵字精靈
痞客邦的關鍵字精靈
晨揚 施
 
如何建置關鍵字精靈 How to Build an Keyword Wizard
如何建置關鍵字精靈 How to Build an Keyword Wizard如何建置關鍵字精靈 How to Build an Keyword Wizard
如何建置關鍵字精靈 How to Build an Keyword Wizard
晨揚 施
 
Using browsing behavior history to predict user’s gender presenation
Using browsing behavior history to predict user’s gender   presenationUsing browsing behavior history to predict user’s gender   presenation
Using browsing behavior history to predict user’s gender presenation晨揚 施
 

More from 晨揚 施 (6)

Google Cloud Summit BigQuery PIXNET
Google Cloud Summit BigQuery PIXNETGoogle Cloud Summit BigQuery PIXNET
Google Cloud Summit BigQuery PIXNET
 
How to run an AI Project @pixnet
How to run an AI Project @pixnetHow to run an AI Project @pixnet
How to run an AI Project @pixnet
 
2016 Hadoop Conf TW - 如何建置數據精靈
2016 Hadoop Conf TW - 如何建置數據精靈2016 Hadoop Conf TW - 如何建置數據精靈
2016 Hadoop Conf TW - 如何建置數據精靈
 
痞客邦的關鍵字精靈
痞客邦的關鍵字精靈痞客邦的關鍵字精靈
痞客邦的關鍵字精靈
 
如何建置關鍵字精靈 How to Build an Keyword Wizard
如何建置關鍵字精靈 How to Build an Keyword Wizard如何建置關鍵字精靈 How to Build an Keyword Wizard
如何建置關鍵字精靈 How to Build an Keyword Wizard
 
Using browsing behavior history to predict user’s gender presenation
Using browsing behavior history to predict user’s gender   presenationUsing browsing behavior history to predict user’s gender   presenation
Using browsing behavior history to predict user’s gender presenation
 

Recently uploaded

一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
eddie19851
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 

Recently uploaded (20)

一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 

Spark + Scikit Learn- Performance Tuning