Learn in 30 Minutes:
Hands-On Python Feature Selection
James CC Huang
Disclaimer
• Implementation only
• No math
• No statistics
Source: Internet
Warming Up
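• Rumor has it nobody will ask questions at this talk (and pin the speaker down on stage)
• The original session ran only 40 minutes, but today's talk has been given 2 hours
• A test of my memory and comprehension
• The speaker covered a lot of terminology but no implementation (there was no way he had the time)
• I implemented the examples in Python
• If you are like me and work on neither theory nor math and statistics, I hope you can go home and use scikit-learn for feature selection just by copying and pasting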
Reinventing the Wheel?
Source: P.60 http://www.slideshare.net/tw_dsconf/ss-62245351
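Doing Machine Learning and Deep Learning…
• Do you really need to understand the underlying math, statistics, and theory?
• Promoting and popularizing Machine Learning / Deep Learning
• Ease of use of the tools and rapid development
• There are opinions on both sides
• For: on pursuing the Master Algorithm, "… you might think this requires heavy mathematics and rigorous theoretical work. It does not; what it requires is stepping back from abstruse mathematical theory so that the overall pattern of learning phenomena comes into view." (The Master Algorithm, Chinese edition, P. 40)
• Against: Deep Neural Networks - A Developmental Perspective (slides, video)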
2014 – 2016
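Taiwan Data Science "Enthusiasts" Conference
What I Will Share
1. Three years in a row of eating the conference lunch boxes
2. What I dreamed up after hearing the 2016 talk "Feature Engineering in Machine Learning"
Three Years of Evolution
• More and more attendees
• [Unscientific eyeballing] The average age of attendees keeps rising XD
• More content and more sessions
• The speaker mix has changed: more professors and more people from research institutes
• The term "Deep Learning" shows up far more often
• $$ Tickets keep getting more expensive
• Moving toward a user-pays model
• Some paid courses will also continue to be offered
• The lunch boxes have not evolved (always the same few vendors)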
http://datasci.tw/agenda.php
Feature Engineering in Machine Learning
Session (Speaker: 李俊良)
Source: http://www.slideshare.net/tw_dsconf/feature-engineering-in-machine-learning
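Can Feature Engineering Identify Writing Style?
• Rowling wrote a novel under a pseudonym; sales soared after she was revealed
http://www.bbc.com/zhongwen/trad/uk_study/2013/07/130714_rowling_novel
• "One reviewer called the new book, The Cuckoo's Calling, a 'brilliant debut', and another praised the male author for describing women's clothing so masterfully."
• "… the novel, published (3 months earlier), had sold 1,500 copies. But Amazon reported that after 12 noon on Sunday, sales of the book surged, growing by as much as 500,000%."
• Original slides P. 14 (Source: http://www.slideshare.net/tw_dsconf/feature-engineering-in-machine-learning)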
Find Word / Doc Similarity with
Deep Learning
Using word2vec and Gensim (Python)
Goal (or Problem to Solve)
• Problem: Tech Support (TS) engineers want to "precisely" categorize support cases. They currently do this manually.
• Goal: Automatically categorize support cases.
• What I have:
• 156 classified cases (with "so-called" correct issue categories)
• Support cases in database
• Challenges:
• With the data currently available, supervised classification algorithms can't be applied.
• Clustering may not fully achieve the goal.
• What about Deep Learning?
Gensim (word2vec implementation in Python)
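from os import listdir
import gensim

# Updated for Python 3 and gensim 4.x: TaggedDocument replaces the old
# LabeledSentence class, and its `labels` argument is now `tags`.
TaggedDocument = gensim.models.doc2vec.TaggedDocument

# Each .txt file is one support case; the file name is its label.
docLabels = [f for f in listdir("../corpora/2016/") if f.endswith(".txt")]

# Read every file up front so the corpus can be iterated more than once.
data = []
for doc in docLabels:
    with open("../corpora/2016/" + doc, "r") as f:
        data.append(f.read())

class LabeledLineSentence(object):
    def __init__(self, doc_list, labels_list):
        self.labels_list = labels_list
        self.doc_list = doc_list
    def __iter__(self):
        for idx, doc in enumerate(self.doc_list):
            yield TaggedDocument(words=doc.split(),
                                 tags=[self.labels_list[idx]])
Gensim (Cont’d)
it = LabeledLineSentence(data, docLabels)
model = gensim.models.Doc2Vec(alpha=0.025,
                              min_alpha=0.025)
model.build_vocab(it)
# Ten passes over the corpus, lowering the learning rate by hand each pass.
for epoch in range(10):
    model.train(it, total_examples=model.corpus_count, epochs=1)
    model.alpha -= 0.002
    model.min_alpha = model.alpha
# find most similar support case; the tag is the file name used as its label
print(model.dv.most_similar("00111105.txt"))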
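As a rough illustration of what a similarity lookup does conceptually, the sketch below ranks document vectors by cosine similarity using only the standard library. The three vectors and the case IDs 00111106 and 00111107 are made up for illustration; they are not output from the model on this slide, and Gensim itself works on NumPy arrays rather than Python lists.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional document vectors keyed by case ID.
doc_vectors = {
    "00111105": [0.9, 0.1, 0.3],
    "00111106": [0.8, 0.2, 0.4],   # points in nearly the same direction
    "00111107": [-0.5, 0.9, 0.0],  # points in a different direction
}

def most_similar(query_id, vectors, topn=2):
    """Rank every other document by cosine similarity to the query."""
    query = vectors[query_id]
    scores = [(other_id, cosine_similarity(query, vec))
              for other_id, vec in vectors.items() if other_id != query_id]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

ranking = most_similar("00111105", doc_vectors)
print(ranking)
```

A trained Doc2Vec model plays the role of `doc_vectors` here: it maps each case to a vector, and the nearest vectors are reported as the most similar cases.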
Rumor Has It
• With Deep Learning you don't need to do feature selection, because deep learning decides for you automatically
• From Wikipedia (https://en.wikipedia.org/wiki/Deep_learning):
• “One of the promises of deep learning is replacing handcrafted features with
efficient algorithms for unsupervised or semi-supervised feature learning and
hierarchical feature extraction.”
• Is it really that magical?
Feature selection for Iris Dataset as Example
• Iris dataset attributes
1. sepal length in cm
2. sepal width in cm
3. petal length in cm
4. petal width in cm
5. class:
-- Iris Setosa
-- Iris Versicolour
-- Iris Virginica
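A quick way to confirm these attributes is to load the copy of the dataset bundled with scikit-learn and inspect it. This is a minimal sketch; the variable name `counts` is only for illustration.

```python
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

print(X.shape)                  # (n_samples, n_features)
print(iris.feature_names)       # the four measurements, in cm
print(list(iris.target_names))  # the three classes

# Number of samples per class
counts = {str(name): int((y == i).sum())
          for i, name in enumerate(iris.target_names)}
print(counts)
```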
Feature Selection - LASSO
>>> from sklearn.linear_model import Lasso
>>> from sklearn.datasets import load_iris
>>> from sklearn.feature_selection import SelectFromModel
>>> iris = load_iris()
>>> X, y = iris.data, iris.target
>>> print(X.shape)
(150, 4)
>>> clf = Lasso(alpha=0.01)
>>> sfm = SelectFromModel(clf, threshold=0.25)
>>> sfm.fit(X, y)
>>> n_features = sfm.transform(X).shape[1]
>>> print(n_features)
2
Selected features: petal width and petal length
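The slide reports only how many features survive. One way to see which ones they are is to ask the fitted selector for its boolean mask with `get_support()`. The sketch below reuses the same `alpha` and `threshold` as the slide; the names `mask` and `selected` are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

iris = load_iris()
X, y = iris.data, iris.target

# Same settings as the slide: keep features whose |coefficient| >= 0.25
sfm = SelectFromModel(Lasso(alpha=0.01), threshold=0.25)
sfm.fit(X, y)

# Boolean mask over the original columns: True means the feature is kept
mask = sfm.get_support()
selected = [name for name, keep in zip(iris.feature_names, mask) if keep]

print(selected)
print(sfm.estimator_.coef_)  # coefficients of the fitted Lasso model
```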
Feature Selection - LASSO (Cont’d)
>>> from sklearn.preprocessing import StandardScaler
>>> scaler = StandardScaler()
>>> X = scaler.fit_transform(X)
>>> names = iris["feature_names"]
>>> lasso = Lasso(alpha=0.01, positive=True)
>>> lasso.fit(X, y)
>>> print(sorted(zip(map(lambda x: round(x, 4),
...     lasso.coef_), names), reverse=True))
[(0.47199999999999998, 'petal width (cm)'),
(0.3105, 'petal length (cm)'),
(0.0, 'sepal width (cm)'),
(0.0, 'sepal length (cm)')]
Feature Selection – Random Forest
>>> from sklearn.datasets import load_iris
>>> from sklearn.ensemble import RandomForestRegressor
>>> iris = load_iris()
>>> X, y = iris.data, iris.target
>>> print (X.shape)
(150, 4)
>>> names = iris["feature_names"]
>>> rf = RandomForestRegressor()
>>> rf.fit(X, y)
>>> print(sorted(zip(map(lambda x: round(x, 4),
...     rf.feature_importances_), names), reverse=True))
[(0.50729999999999997, 'petal width (cm)'),
(0.47870000000000001, 'petal length (cm)'),
(0.0091000000000000004, 'sepal width (cm)'),
(0.0048999999999999998, 'sepal length (cm)')]
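Because the Iris target is a class label, a variant worth trying is `RandomForestClassifier` in place of the regressor used on the slide. This is a swapped-in alternative, not the author's code; `n_estimators=100` and `random_state=0` are assumptions added so that repeated runs give the same ranking.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
X, y = iris.data, iris.target

# A fixed random_state makes the importances reproducible between runs
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X, y)

# Pair each importance with its feature name, highest first
ranked = sorted(zip(rf.feature_importances_, iris.feature_names), reverse=True)
for importance, name in ranked:
    print(f"{name}: {importance:.4f}")
```

The importances sum to 1, so each value can be read as that feature's share of the forest's total impurity reduction.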
Dimension Reduction - PCA
>>> from sklearn.datasets import load_iris
>>> from sklearn.decomposition import PCA as pca
>>> from sklearn.preprocessing import StandardScaler
>>> iris = load_iris()
>>> X, y = iris.data, iris.target
>>> X = StandardScaler().fit_transform(X)
>>> sklearn_pca = pca(n_components=2)
>>> sklearn_pca.fit_transform(X)
>>> print (sklearn_pca.components_)
[[ 0.52237162 -0.26335492 0.58125401 0.56561105]
[-0.37231836 -0.92555649 -0.02109478 -0.06541577]]
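Unlike the selection methods on the earlier slides, PCA does not pick original columns; each row of `components_` is a new axis built as a weighted mix of all four features. To judge how much information the two components keep, a minimal sketch is to look at `explained_variance_ratio_` (the names `pca_model` and `X_2d` are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = StandardScaler().fit_transform(iris.data)

pca_model = PCA(n_components=2)
X_2d = pca_model.fit_transform(X)  # data projected onto the two components

print(X_2d.shape)
print(pca_model.explained_variance_ratio_)        # share of variance per component
print(pca_model.explained_variance_ratio_.sum())  # share kept by both together
```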
There are many others…
This talk simply puts into practice the methods the original speaker mentioned
I did the easy ones; the hard ones are left for you to explore~
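As one example of the "many others", the scikit-learn feature selection module also offers univariate selection. The sketch below uses `SelectKBest` with the ANOVA F-test (`f_classif`) to keep two features; this method was not demonstrated in the talk, and `k=2` is chosen only to match the earlier slides.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

iris = load_iris()
X, y = iris.data, iris.target

# Score each feature independently against the class label, keep the top 2
selector = SelectKBest(score_func=f_classif, k=2)
X_new = selector.fit_transform(X, y)

support = selector.get_support()
for name, score, keep in zip(iris.feature_names, selector.scores_, support):
    print(f"{name}: F={score:.1f} selected={keep}")
print(X_new.shape)
```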
Reference
scikit-learn
• Feature selection
http://scikit-learn.org/stable/modules/feature_selection.html
• sklearn.linear_model.Lasso
http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html
• sklearn.decomposition.PCA
http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
Gensim
• https://radimrehurek.com/gensim/index.html
HoG (Histogram of Oriented Gradients)
• Python code example
http://scikit-image.org/docs/dev/auto_examples/plot_hog.html
An Introduction to Variable and Feature Selection
• Authors: Isabelle Guyon and Andre Elisseeff
• PDF download:
http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf

More Related Content

What's hot

High performance GPU computing with Ruby RubyConf 2017
High performance GPU computing with Ruby  RubyConf 2017High performance GPU computing with Ruby  RubyConf 2017
High performance GPU computing with Ruby RubyConf 2017
Prasun Anand
 
Cheat sheet python3
Cheat sheet python3Cheat sheet python3
Cheat sheet python3
sxw2k
 
Python 2.5 reference card (2009)
Python 2.5 reference card (2009)Python 2.5 reference card (2009)
Python 2.5 reference card (2009)
gekiaruj
 
Python bokeh cheat_sheet
Python bokeh cheat_sheet Python bokeh cheat_sheet
Python bokeh cheat_sheet
Nishant Upadhyay
 
Артём Акуляков - F# for Data Analysis
Артём Акуляков - F# for Data AnalysisАртём Акуляков - F# for Data Analysis
Артём Акуляков - F# for Data Analysis
SpbDotNet Community
 
밑바닥부터 시작하는 의료 AI
밑바닥부터 시작하는 의료 AI밑바닥부터 시작하는 의료 AI
밑바닥부터 시작하는 의료 AI
NAVER Engineering
 
Python_ 3 CheatSheet
Python_ 3 CheatSheetPython_ 3 CheatSheet
Python_ 3 CheatSheet
Dr. Volkan OBAN
 
Python data structures
Python data structuresPython data structures
Python data structures
kalyanibedekar
 
Begin with Machine Learning
Begin with Machine LearningBegin with Machine Learning
Begin with Machine Learning
Narong Intiruk
 
Python Cheat Sheet
Python Cheat SheetPython Cheat Sheet
Python Cheat Sheet
GlowTouch
 
Pandas Cheat Sheet
Pandas Cheat SheetPandas Cheat Sheet
Pandas Cheat Sheet
ACASH1011
 
Python Seaborn Data Visualization
Python Seaborn Data Visualization Python Seaborn Data Visualization
Python Seaborn Data Visualization
Sourabh Sahu
 
Palestra sobre Collections com Python
Palestra sobre Collections com PythonPalestra sobre Collections com Python
Palestra sobre Collections com Pythonpugpe
 
Goptuna Distributed Bayesian Optimization Framework at Go Conference 2019 Autumn
Goptuna Distributed Bayesian Optimization Framework at Go Conference 2019 AutumnGoptuna Distributed Bayesian Optimization Framework at Go Conference 2019 Autumn
Goptuna Distributed Bayesian Optimization Framework at Go Conference 2019 Autumn
Masashi Shibata
 
Visualization of Supervised Learning with {arules} + {arulesViz}
Visualization of Supervised Learning with {arules} + {arulesViz}Visualization of Supervised Learning with {arules} + {arulesViz}
Visualization of Supervised Learning with {arules} + {arulesViz}
Takashi J OZAKI
 
Chaco Step-by-Step
Chaco Step-by-StepChaco Step-by-Step
Chaco Step-by-Step
Enthought, Inc.
 
Mementopython3 english
Mementopython3 englishMementopython3 english
Mementopython3 english
ssuser442080
 
Clustering com numpy e cython
Clustering com numpy e cythonClustering com numpy e cython
Clustering com numpy e cythonAnderson Dantas
 
Haskellで学ぶ関数型言語
Haskellで学ぶ関数型言語Haskellで学ぶ関数型言語
Haskellで学ぶ関数型言語
ikdysfm
 
Pybelsberg — Constraint-based Programming in Python
Pybelsberg — Constraint-based Programming in PythonPybelsberg — Constraint-based Programming in Python
Pybelsberg — Constraint-based Programming in Python
Christoph Matthies
 

What's hot (20)

High performance GPU computing with Ruby RubyConf 2017
High performance GPU computing with Ruby  RubyConf 2017High performance GPU computing with Ruby  RubyConf 2017
High performance GPU computing with Ruby RubyConf 2017
 
Cheat sheet python3
Cheat sheet python3Cheat sheet python3
Cheat sheet python3
 
Python 2.5 reference card (2009)
Python 2.5 reference card (2009)Python 2.5 reference card (2009)
Python 2.5 reference card (2009)
 
Python bokeh cheat_sheet
Python bokeh cheat_sheet Python bokeh cheat_sheet
Python bokeh cheat_sheet
 
Артём Акуляков - F# for Data Analysis
Артём Акуляков - F# for Data AnalysisАртём Акуляков - F# for Data Analysis
Артём Акуляков - F# for Data Analysis
 
밑바닥부터 시작하는 의료 AI
밑바닥부터 시작하는 의료 AI밑바닥부터 시작하는 의료 AI
밑바닥부터 시작하는 의료 AI
 
Python_ 3 CheatSheet
Python_ 3 CheatSheetPython_ 3 CheatSheet
Python_ 3 CheatSheet
 
Python data structures
Python data structuresPython data structures
Python data structures
 
Begin with Machine Learning
Begin with Machine LearningBegin with Machine Learning
Begin with Machine Learning
 
Python Cheat Sheet
Python Cheat SheetPython Cheat Sheet
Python Cheat Sheet
 
Pandas Cheat Sheet
Pandas Cheat SheetPandas Cheat Sheet
Pandas Cheat Sheet
 
Python Seaborn Data Visualization
Python Seaborn Data Visualization Python Seaborn Data Visualization
Python Seaborn Data Visualization
 
Palestra sobre Collections com Python
Palestra sobre Collections com PythonPalestra sobre Collections com Python
Palestra sobre Collections com Python
 
Goptuna Distributed Bayesian Optimization Framework at Go Conference 2019 Autumn
Goptuna Distributed Bayesian Optimization Framework at Go Conference 2019 AutumnGoptuna Distributed Bayesian Optimization Framework at Go Conference 2019 Autumn
Goptuna Distributed Bayesian Optimization Framework at Go Conference 2019 Autumn
 
Visualization of Supervised Learning with {arules} + {arulesViz}
Visualization of Supervised Learning with {arules} + {arulesViz}Visualization of Supervised Learning with {arules} + {arulesViz}
Visualization of Supervised Learning with {arules} + {arulesViz}
 
Chaco Step-by-Step
Chaco Step-by-StepChaco Step-by-Step
Chaco Step-by-Step
 
Mementopython3 english
Mementopython3 englishMementopython3 english
Mementopython3 english
 
Clustering com numpy e cython
Clustering com numpy e cythonClustering com numpy e cython
Clustering com numpy e cython
 
Haskellで学ぶ関数型言語
Haskellで学ぶ関数型言語Haskellで学ぶ関数型言語
Haskellで学ぶ関数型言語
 
Pybelsberg — Constraint-based Programming in Python
Pybelsberg — Constraint-based Programming in PythonPybelsberg — Constraint-based Programming in Python
Pybelsberg — Constraint-based Programming in Python
 

Viewers also liked

Multi Layer Perceptron & Back Propagation
Multi Layer Perceptron & Back PropagationMulti Layer Perceptron & Back Propagation
Multi Layer Perceptron & Back Propagation
Sung-ju Kim
 
MPerceptron
MPerceptronMPerceptron
MPerceptronbutest
 
Aprendizaje Redes Neuronales
Aprendizaje Redes NeuronalesAprendizaje Redes Neuronales
Aprendizaje Redes Neuronales
Alex Jhampier Rojas Herrera
 
閒聊Python應用在game server的開發
閒聊Python應用在game server的開發閒聊Python應用在game server的開發
閒聊Python應用在game server的開發
Eric Chen
 
Pengenalan pola sederhana dg perceptron
Pengenalan pola sederhana dg perceptronPengenalan pola sederhana dg perceptron
Pengenalan pola sederhana dg perceptron
Arief Fatchul Huda
 
Technology and AI sharing - From 2016 to Y2017 and Beyond
Technology and AI sharing - From 2016 to Y2017 and BeyondTechnology and AI sharing - From 2016 to Y2017 and Beyond
Technology and AI sharing - From 2016 to Y2017 and Beyond
James Huang
 
Ann chapter-3-single layerperceptron20021031
Ann chapter-3-single layerperceptron20021031Ann chapter-3-single layerperceptron20021031
Ann chapter-3-single layerperceptron20021031frdos
 
Perceptron Slides
Perceptron SlidesPerceptron Slides
Perceptron SlidesESCOM
 
14 Machine Learning Single Layer Perceptron
14 Machine Learning Single Layer Perceptron14 Machine Learning Single Layer Perceptron
14 Machine Learning Single Layer Perceptron
Andres Mendez-Vazquez
 
Perceptron
PerceptronPerceptron
Perceptron
Nagarajan
 
Artificial Neural Network Lect4 : Single Layer Perceptron Classifiers
Artificial Neural Network Lect4 : Single Layer Perceptron ClassifiersArtificial Neural Network Lect4 : Single Layer Perceptron Classifiers
Artificial Neural Network Lect4 : Single Layer Perceptron Classifiers
Mohammed Bennamoun
 
Artificial intelligence NEURAL NETWORKS
Artificial intelligence NEURAL NETWORKSArtificial intelligence NEURAL NETWORKS
Artificial intelligence NEURAL NETWORKS
REHMAT ULLAH
 
Short Term Load Forecasting Using Multi Layer Perceptron
Short Term Load Forecasting Using Multi Layer Perceptron Short Term Load Forecasting Using Multi Layer Perceptron
Short Term Load Forecasting Using Multi Layer Perceptron
IJMER
 
Neural network & its applications
Neural network & its applications Neural network & its applications
Neural network & its applications
Ahmed_hashmi
 
Artificial neural networks
Artificial neural networksArtificial neural networks
Artificial neural networksstellajoseph
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural networkDEEPASHRI HK
 

Viewers also liked (16)

Multi Layer Perceptron & Back Propagation
Multi Layer Perceptron & Back PropagationMulti Layer Perceptron & Back Propagation
Multi Layer Perceptron & Back Propagation
 
MPerceptron
MPerceptronMPerceptron
MPerceptron
 
Aprendizaje Redes Neuronales
Aprendizaje Redes NeuronalesAprendizaje Redes Neuronales
Aprendizaje Redes Neuronales
 
閒聊Python應用在game server的開發
閒聊Python應用在game server的開發閒聊Python應用在game server的開發
閒聊Python應用在game server的開發
 
Pengenalan pola sederhana dg perceptron
Pengenalan pola sederhana dg perceptronPengenalan pola sederhana dg perceptron
Pengenalan pola sederhana dg perceptron
 
Technology and AI sharing - From 2016 to Y2017 and Beyond
Technology and AI sharing - From 2016 to Y2017 and BeyondTechnology and AI sharing - From 2016 to Y2017 and Beyond
Technology and AI sharing - From 2016 to Y2017 and Beyond
 
Ann chapter-3-single layerperceptron20021031
Ann chapter-3-single layerperceptron20021031Ann chapter-3-single layerperceptron20021031
Ann chapter-3-single layerperceptron20021031
 
Perceptron Slides
Perceptron SlidesPerceptron Slides
Perceptron Slides
 
14 Machine Learning Single Layer Perceptron
14 Machine Learning Single Layer Perceptron14 Machine Learning Single Layer Perceptron
14 Machine Learning Single Layer Perceptron
 
Perceptron
PerceptronPerceptron
Perceptron
 
Artificial Neural Network Lect4 : Single Layer Perceptron Classifiers
Artificial Neural Network Lect4 : Single Layer Perceptron ClassifiersArtificial Neural Network Lect4 : Single Layer Perceptron Classifiers
Artificial Neural Network Lect4 : Single Layer Perceptron Classifiers
 
Artificial intelligence NEURAL NETWORKS
Artificial intelligence NEURAL NETWORKSArtificial intelligence NEURAL NETWORKS
Artificial intelligence NEURAL NETWORKS
 
Short Term Load Forecasting Using Multi Layer Perceptron
Short Term Load Forecasting Using Multi Layer Perceptron Short Term Load Forecasting Using Multi Layer Perceptron
Short Term Load Forecasting Using Multi Layer Perceptron
 
Neural network & its applications
Neural network & its applications Neural network & its applications
Neural network & its applications
 
Artificial neural networks
Artificial neural networksArtificial neural networks
Artificial neural networks
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural network
 

Similar to 30 分鐘學會實作 Python Feature Selection

A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...
Databricks
 
Python 표준 라이브러리
Python 표준 라이브러리Python 표준 라이브러리
Python 표준 라이브러리
용 최
 
Effective Numerical Computation in NumPy and SciPy
Effective Numerical Computation in NumPy and SciPyEffective Numerical Computation in NumPy and SciPy
Effective Numerical Computation in NumPy and SciPy
Kimikazu Kato
 
Feature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive modelsFeature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive models
Gabriel Moreira
 
Simple APIs and innovative documentation
Simple APIs and innovative documentationSimple APIs and innovative documentation
Simple APIs and innovative documentation
PyDataParis
 
Python utan-stodhjul-motorsag
Python utan-stodhjul-motorsagPython utan-stodhjul-motorsag
Python utan-stodhjul-motorsagniklal
 
C3 w2
C3 w2C3 w2
Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017
Gabriel Moreira
 
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak   CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak
PROIDEA
 
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
StampedeCon
 
Profiling and optimization
Profiling and optimizationProfiling and optimization
Profiling and optimizationg3_nittala
 
Time Series Analysis:Basic Stochastic Signal Recovery
Time Series Analysis:Basic Stochastic Signal RecoveryTime Series Analysis:Basic Stochastic Signal Recovery
Time Series Analysis:Basic Stochastic Signal Recovery
Daniel Cuneo
 
Python于Web 2.0网站的应用 - QCon Beijing 2010
Python于Web 2.0网站的应用 - QCon Beijing 2010Python于Web 2.0网站的应用 - QCon Beijing 2010
Python于Web 2.0网站的应用 - QCon Beijing 2010
Qiangning Hong
 
Ml5
Ml5Ml5
Python na Infraestrutura 
MySQL do Facebook

Python na Infraestrutura 
MySQL do Facebook
Python na Infraestrutura 
MySQL do Facebook

Python na Infraestrutura 
MySQL do Facebook

Artur Rodrigues
 
Numpy Meetup 07/02/2013
Numpy Meetup 07/02/2013Numpy Meetup 07/02/2013
Numpy Meetup 07/02/2013Francesco
 
Machine Learning Algorithms
Machine Learning AlgorithmsMachine Learning Algorithms
Machine Learning Algorithms
Hichem Felouat
 
Python: The Dynamic!
Python: The Dynamic!Python: The Dynamic!
Python: The Dynamic!
Omid Mogharian
 
Intro to Python
Intro to PythonIntro to Python
Intro to Python
Daniel Greenfeld
 

Similar to 30 分鐘學會實作 Python Feature Selection (20)

A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...
 
Python 표준 라이브러리
Python 표준 라이브러리Python 표준 라이브러리
Python 표준 라이브러리
 
Effective Numerical Computation in NumPy and SciPy
Effective Numerical Computation in NumPy and SciPyEffective Numerical Computation in NumPy and SciPy
Effective Numerical Computation in NumPy and SciPy
 
Feature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive modelsFeature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive models
 
Simple APIs and innovative documentation
Simple APIs and innovative documentationSimple APIs and innovative documentation
Simple APIs and innovative documentation
 
Python utan-stodhjul-motorsag
Python utan-stodhjul-motorsagPython utan-stodhjul-motorsag
Python utan-stodhjul-motorsag
 
C3 w2
C3 w2C3 w2
C3 w2
 
Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017
 
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak   CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak
 
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
 
Profiling and optimization
Profiling and optimizationProfiling and optimization
Profiling and optimization
 
Time Series Analysis:Basic Stochastic Signal Recovery
Time Series Analysis:Basic Stochastic Signal RecoveryTime Series Analysis:Basic Stochastic Signal Recovery
Time Series Analysis:Basic Stochastic Signal Recovery
 
Python于Web 2.0网站的应用 - QCon Beijing 2010
Python于Web 2.0网站的应用 - QCon Beijing 2010Python于Web 2.0网站的应用 - QCon Beijing 2010
Python于Web 2.0网站的应用 - QCon Beijing 2010
 
Intro to Python
Intro to PythonIntro to Python
Intro to Python
 
Ml5
Ml5Ml5
Ml5
 
Python na Infraestrutura 
MySQL do Facebook

Python na Infraestrutura 
MySQL do Facebook
Python na Infraestrutura 
MySQL do Facebook

Python na Infraestrutura 
MySQL do Facebook

 
Numpy Meetup 07/02/2013
Numpy Meetup 07/02/2013Numpy Meetup 07/02/2013
Numpy Meetup 07/02/2013
 
Machine Learning Algorithms
Machine Learning AlgorithmsMachine Learning Algorithms
Machine Learning Algorithms
 
Python: The Dynamic!
Python: The Dynamic!Python: The Dynamic!
Python: The Dynamic!
 
Intro to Python
Intro to PythonIntro to Python
Intro to Python
 

Recently uploaded

Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
WSO2
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Globus
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
Globus
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Shahin Sheidaei
 
Visitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.appVisitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.app
NaapbooksPrivateLimi
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
Ortus Solutions, Corp
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
Globus
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke
 
Why React Native as a Strategic Advantage for Startup Innovation.pdf
Why React Native as a Strategic Advantage for Startup Innovation.pdfWhy React Native as a Strategic Advantage for Startup Innovation.pdf
Why React Native as a Strategic Advantage for Startup Innovation.pdf
ayushiqss
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
Globus
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
wottaspaceseo
 
Designing for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web ServicesDesigning for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web Services
KrzysztofKkol1
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
Globus
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
Paco van Beckhoven
 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
IES VE
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
takuyayamamoto1800
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
Juraj Vysvader
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
AMB-Review
 
A Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdfA Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdf
kalichargn70th171
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus
 

Recently uploaded (20)

Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
 
Visitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.appVisitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.app
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
 

Learn to Implement Python Feature Selection in 30 Minutes

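  • 1. Learn to Implement Python Feature Selection in 30 Minutes James CC Huang
  • 2. Disclaimer • Implementation only • No math • No statistics Source: Internet
  • 3. Warming Up • I heard that nobody asks questions at this sharing session (which leaves the speaker pinned on stage) • The original session ran only 40 minutes, but today's sharing slot is 2 hours • A test of my memory and comprehension • The speaker mentioned a long list of terms but did not cover implementation (there could not have been time) • I implemented the examples in Python • I hope that anyone who, like me, does not work on theory, math, or statistics can go home and do feature selection with scikit-learn by copy and paste
  • 4. Reinventing the Wheel? Source: P.60 http://www.slideshare.net/tw_dsconf/ss-62245351
  • 5. Doing Machine Learning and Deep Learning… • Do you actually need to understand the underlying math, statistics, and theory? • Promoting and popularizing Machine Learning / Deep Learning • Ease of use of the tools and rapid development • There are opinions on both sides • Pro example: on working toward the Master Algorithm, "… you would think this requires heavy mathematics and rigorous theoretical work. It does not; what it requires instead is stepping back from the arcane mathematical theory in order to see the overall pattern of learning phenomena." (大演算 The Master Algorithm, P. 40) • Con example: Deep Neural Networks - A Developmental Perspective (slides, video)
  • 6. 2014 – 2016 Taiwan Data Science "Enthusiasts" Conference: My Sharing 1. Three consecutive years of eating boxed lunches 2. What I dreamed up after hearing the 2016 talk Feature Engineering in Machine Learning
  • 7. Three Years of Evolution • More and more attendees • [Irresponsible eyeball estimate] The average attendee age keeps rising XD • More content and more sessions • A shift in who the speakers are: more professors and people from research institutions • The term Deep Learning shows up far more often • $$ Increasingly expensive • Moving toward a user-pays model • Some paid courses will also continue to be offered • The boxed lunches have not evolved (always the same few vendors)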
  • 15. Feature Engineering in Machine Learning Session (Speaker: 李俊良) Source: http://www.slideshare.net/tw_dsconf/feature-engineering-in-machine-learning
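  • 16. Can Feature Engineering Identify Writing Style? • Rowling wrote a novel under a pseudonym; sales soared after she was revealed http://www.bbc.com/zhongwen/trad/uk_study/2013/07/130714_rowling_novel • "One reviewer described the new book, The Cuckoo's Calling, as a 'brilliant debut', and another praised the male author for describing women's clothing so masterfully." • "… the novel, published (3 months earlier), had sold 1,500 copies. But Amazon reported that after 12 noon on Sunday the book's sales surged, with growth as high as 500,000%." • Original slides P. 14 (Source: http://www.slideshare.net/tw_dsconf/feature-engineering-in-machine-learning)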
  • 17. Find Word / Doc Similarity with Deep Learning Using word2vec and Gensim (Python)
  • 18. Goal (or Problem to Solve) • Problem: Tech Support engineers (TS) want to "precisely" categorize support cases. The task is currently performed manually by TS engineers. • Goal: Automatically categorize support cases. • What I have: • 156 classified cases (with "so-called" correct issue categories) • Support cases in database • Challenges: • Based on the data currently available, supervised classification algorithms can't be applied. • Clustering may not 100% achieve the goal. • What about Deep Learning?
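  • 19. Gensim (word2vec implementation in Python)
      from os import listdir
      import gensim

      # TaggedDocument(words, tags) is the name later gensim releases use for
      # what the slide called LabeledSentence(words, labels)
      TaggedDocument = gensim.models.doc2vec.TaggedDocument

      corpus_dir = "../corpora/2016/"
      docLabels = [f for f in listdir(corpus_dir) if f.endswith('.txt')]

      # Read each file's text up front: the corpus is iterated more than once
      # (vocabulary pass, then training passes), and an open file handle would
      # return nothing on the second read
      data = []
      for doc in docLabels:
          with open(corpus_dir + doc, 'r') as fh:
              data.append(fh.read())

      class LabeledLineSentence(object):
          def __init__(self, doc_list, labels_list):
              self.labels_list = labels_list
              self.doc_list = doc_list

          def __iter__(self):
              # One tagged document per file, tagged with its file name
              for idx, doc in enumerate(self.doc_list):
                  yield TaggedDocument(words=doc.split(), tags=[self.labels_list[idx]])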
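  • 20. Gensim (Cont'd)
      it = LabeledLineSentence(data, docLabels)
      model = gensim.models.Doc2Vec(alpha=0.025, min_alpha=0.025)
      model.build_vocab(it)

      # Ten passes over the corpus, lowering the learning rate by hand each pass
      for epoch in range(10):
          # later gensim releases require total_examples and epochs to be passed explicitly
          model.train(it, total_examples=model.corpus_count, epochs=1)
          model.alpha -= 0.002
          model.min_alpha = model.alpha

      # find most similar support case
      # (model.most_similar is the call used on the slide; newer gensim releases
      # expose lookups as model.wv.most_similar for words and
      # model.dv.most_similar for document tags)
      print(model.most_similar("00111105"))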
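A "most similar" lookup like the one on this slide ranks items by how close their learned vectors are, using cosine similarity. The sketch below is a rough, library-free illustration only: the three-dimensional vectors and the `case_A` / `case_B` / `case_C` IDs are made up for the example, and this is not gensim's actual implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical document vectors; only the ID "00111105" comes from the slide
doc_vectors = {
    "00111105": [0.9, 0.1, 0.0],
    "case_A": [0.8, 0.2, 0.1],
    "case_B": [0.0, 1.0, 0.0],
    "case_C": [-0.9, -0.1, 0.0],
}

def most_similar(query_id, vectors, topn=2):
    """Return the topn (id, similarity) pairs closest to query_id."""
    query = vectors[query_id]
    scores = [(other_id, cosine_similarity(query, vec))
              for other_id, vec in vectors.items() if other_id != query_id]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

ranked = most_similar("00111105", doc_vectors)
print(ranked)
```

With real Doc2Vec vectors the dimensionality would be much higher, but the ranking step is the same idea.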
  • 21. Rumor Has It • With Deep Learning you don't need to do feature selection, because deep learning decides it for you automatically • From Wikipedia (https://en.wikipedia.org/wiki/Deep_learning): • "One of the promises of deep learning is replacing handcrafted features with efficient algorithms for unsupervised or semi-supervised feature learning and hierarchical feature extraction." • Is it really that magical?
  • 22. Feature selection for Iris Dataset as Example • Iris dataset attributes 1. sepal length in cm 2. sepal width in cm 3. petal length in cm 4. petal width in cm 5. class: -- Iris Setosa -- Iris Versicolour -- Iris Virginica
  • 23. Feature Selection - LASSO
      >>> from sklearn.linear_model import Lasso
      >>> from sklearn.datasets import load_iris
      >>> from sklearn.feature_selection import SelectFromModel
      >>> iris = load_iris()
      >>> X, y = iris.data, iris.target
      >>> print(X.shape)
      (150, 4)
      >>> clf = Lasso(alpha=0.01)
      >>> sfm = SelectFromModel(clf, threshold=0.25)
      >>> sfm.fit(X, y)
      >>> n_features = sfm.transform(X).shape[1]
      >>> print(n_features)
      2
    Selected: petal width & petal length
  • 24. Feature Selection - LASSO (Cont'd)
      >>> from sklearn.preprocessing import StandardScaler
      >>> scaler = StandardScaler()
      >>> X = scaler.fit_transform(X)
      >>> names = iris["feature_names"]
      >>> lasso = Lasso(alpha=0.01, positive=True)
      >>> lasso.fit(X, y)
      >>> print(sorted(zip(map(lambda x: round(x, 4), lasso.coef_), names), reverse=True))
      [(0.47199999999999998, 'petal width (cm)'), (0.3105, 'petal length (cm)'), (0.0, 'sepal width (cm)'), (0.0, 'sepal length (cm)')]
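Coefficients of exactly 0.0, like the two sepal features here, are characteristic of the L1 penalty. The sketch below shows why, under a simplifying assumption: standardized, mutually uncorrelated features, where the LASSO solution reduces to soft-thresholding each least-squares coefficient by `alpha`. The feature names and coefficient values are hypothetical and are not taken from the Iris fit.

```python
def soft_threshold(value, alpha):
    """Shrink value toward zero by alpha; values within [-alpha, alpha] become exactly 0."""
    if value > alpha:
        return value - alpha
    if value < -alpha:
        return value + alpha
    return 0.0

# Hypothetical least-squares coefficients for four made-up features
ols_coefs = {"f1": 0.50, "f2": 0.30, "f3": 0.005, "f4": -0.008}
alpha = 0.01

# Under the idealized assumption above, each LASSO coefficient is the
# soft-thresholded least-squares coefficient
lasso_coefs = {name: soft_threshold(c, alpha) for name, c in ols_coefs.items()}

# Features whose coefficient survived the shrinkage
kept = [name for name, c in lasso_coefs.items() if c != 0.0]
print(lasso_coefs)
print(kept)
```

Real data rarely has uncorrelated features, so scikit-learn solves the problem iteratively, but the zeroing behavior comes from the same shrinkage step.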
  • 25. Feature Selection – Random Forest
      >>> from sklearn.datasets import load_iris
      >>> from sklearn.ensemble import RandomForestRegressor
      >>> iris = load_iris()
      >>> X, y = iris.data, iris.target
      >>> print(X.shape)
      (150, 4)
      >>> names = iris["feature_names"]
      >>> rf = RandomForestRegressor()
      >>> rf.fit(X, y)
      >>> print(sorted(zip(map(lambda x: round(x, 4), rf.feature_importances_), names), reverse=True))
      [(0.50729999999999997, 'petal width (cm)'), (0.47870000000000001, 'petal length (cm)'), (0.0091000000000000004, 'sepal width (cm)'), (0.0048999999999999998, 'sepal length (cm)')]
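Once importances like these are available, selecting features comes down to ranking them and applying a cutoff. The sketch below uses the scores printed on this slide; the mean-importance cutoff is an assumption chosen for illustration, not something the slides prescribe, and a random forest will give slightly different scores on each run.

```python
# (importance, name) pairs copied from the Random Forest output on the slide
importances = [
    (0.5073, "petal width (cm)"),
    (0.4787, "petal length (cm)"),
    (0.0091, "sepal width (cm)"),
    (0.0049, "sepal length (cm)"),
]

# Assumed cutoff: keep features whose importance is at least the mean importance
cutoff = sum(score for score, _ in importances) / len(importances)

# Rank from most to least important, then keep those at or above the cutoff
selected = [name for score, name in sorted(importances, reverse=True) if score >= cutoff]
print(round(cutoff, 4))
print(selected)
```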
  • 26. Dimension Reduction - PCA
      >>> from sklearn.datasets import load_iris
      >>> from sklearn.decomposition import PCA as pca
      >>> from sklearn.preprocessing import StandardScaler
      >>> iris = load_iris()
      >>> X, y = iris.data, iris.target
      >>> X = StandardScaler().fit_transform(X)
      >>> sklearn_pca = pca(n_components=2)
      >>> sklearn_pca.fit_transform(X)
      >>> print(sklearn_pca.components_)
      [[ 0.52237162 -0.26335492  0.58125401  0.56561105]
       [-0.37231836 -0.92555649 -0.02109478 -0.06541577]]
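Each row of `components_` is a direction in the standardized feature space, and a sample's reduced coordinates are its dot products with those rows (assuming the data are already centered, as they are after `StandardScaler`). The sketch below checks that the two printed rows are approximately orthonormal and projects one sample onto them; the sample values are hypothetical.

```python
import math

# Rows copied from the components_ output on the slide
components = [
    [0.52237162, -0.26335492, 0.58125401, 0.56561105],
    [-0.37231836, -0.92555649, -0.02109478, -0.06541577],
]

def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

# Each principal direction should have length ~1 and be ~perpendicular to the other
norms = [math.sqrt(dot(row, row)) for row in components]
cross = dot(components[0], components[1])

# Hypothetical standardized sample:
# sepal length, sepal width, petal length, petal width
sample = [-0.9, 1.0, -1.3, -1.3]

# The 2-D coordinates of the sample are its dot products with each component
projected = [dot(sample, row) for row in components]

print(norms, cross)
print(projected)
```

Reading the first row this way also explains its signs: it weights sepal length and both petal measurements positively and sepal width negatively.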
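  • 27. There are many others… This sharing simply implements the methods the original speaker mentioned. I have done the easy ones; the hard ones are left for everyone to explore~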
  • 29. scikit-learn • Feature selection http://scikit-learn.org/stable/modules/feature_selection.html • sklearn.linear_model.Lasso http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html • sklearn.decomposition.PCA http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
  • 31. HoG (Histogram of Oriented Gradients) • Python code example http://scikit-image.org/docs/dev/auto_examples/plot_hog.html
  • 32. An Introduction to Variable and Feature Selection • Author: Isabelle Guyon and Andre Elisseeff • PDF download: http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf