@canard0328
http://nbviewer.ipython.org/gist/canard0328/6f44229365f53b7bd30f/
http://nbviewer.ipython.org/gist/canard0328/a5911ee5b4bf1a07fbcb/
https://gist.github.com/canard0328/07a65584c134a2700725
https://gist.github.com/canard0328/b2f8aec2b9c286f53400
SEMMA
• Sample
• Explore
• Modify
• Model
• Assess
CRISP-DM (CRoss-Industry Standard Process for Data Mining)
• Business Understanding
• Data Understanding
• Data Preparation
• Modeling
• Evaluation
• Deployment
KDD (Knowledge Discovery in Databases)
• Selection
• Preprocessing
• Transformation
• Data Mining
• Interpretation/Evaluation
KKD (Keiken, Kan, and Dokyo: experience, intuition, and guts)
http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic3.csv
(Data obtained from http://biostat.mc.vanderbilt.edu/DataSets)
> data = read.csv("titanic3.csv",
+   stringsAsFactors=F, na.strings=c("","NA"))
>>> import pandas as pd
>>> data = pd.read_csv('titanic3.csv')
1-of-K coding (one-hot encoding)
A categorical variable with N levels is expanded into N (or N−1) binary indicator variables.
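A minimal sketch of 1-of-K coding with pandas, assuming the Titanic data loaded earlier; the column name 'sex' is just an example:

import pandas as pd

data = pd.read_csv('titanic3.csv')
# Expand the categorical column into one binary indicator column per level.
dummies = pd.get_dummies(data['sex'], prefix='sex')
data = pd.concat([data.drop('sex', axis=1), dummies], axis=1)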
Feature hashing / Hashing trick
Feature hashing maps each feature to an index of a fixed-size, N-dimensional vector using a hash function:
x := new vector[N]
for f in features:
    h := hash(f)
    x[h mod N] += 1
http://en.wikipedia.org/wiki/Feature_hashing
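For comparison, a hedged sketch using scikit-learn's FeatureHasher; note that, unlike the pseudocode above, scikit-learn applies a signed hash by default, so entries may be ±1 rather than pure counts:

from sklearn.feature_extraction import FeatureHasher

# Each sample is a list of feature names hashed into an N = 8 dimensional vector.
hasher = FeatureHasher(n_features=8, input_type='string')
X = hasher.transform([['cat', 'brown'], ['dog', 'brown', 'big']])
print(X.toarray())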
Curse of dimensionality
As the number of features (dimensions) grows, the feature space becomes exponentially sparser and far more samples are needed to learn reliably, so adding features is not always a good thing.
Standardization
z = (x − μ) / σ
where μ is the mean of x and σ is its standard deviation. Features measured on different scales are transformed to have mean 0 and standard deviation 1.
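A minimal sketch with scikit-learn's StandardScaler, which estimates μ and σ per column and applies the formula above:

import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])
scaler = StandardScaler()
X_std = scaler.fit_transform(X)   # z = (x - mean) / std, column by column
print(X_std)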
Feature selection
Choose the subset of features that is actually useful for prediction.
• Forward stepwise selection
• Backward stepwise selection
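As an illustration, recent versions of scikit-learn provide SequentialFeatureSelector for forward/backward stepwise selection (a sketch on a toy dataset, not part of the original slides):

from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
# Forward selection: start from no features and greedily add the one that
# most improves the cross-validated score; direction='backward' removes instead.
selector = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                     n_features_to_select=2,
                                     direction='forward')
selector.fit(X, y)
print(selector.get_support())   # boolean mask of the selected features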
Ugly duckling theorem
Without prior assumptions, any two objects are equally similar, so there is no purely formal way to decide which features matter; feature selection ultimately relies on domain knowledge.
"Machine learning is the science of getting computers to act without being explicitly programmed."
Andrew Ng
Supervised learning
Learn from examples whose correct answers (labels) are given.
• Classification
• Regression
Unsupervised learning
Find structure in data without labels.
• Clustering
• Dimensionality reduction
• Outlier detection
Other learning paradigms
• Semi-supervised learning
• Reinforcement learning
• Active learning
• Online learning
• Transfer learning
Supervised learning algorithms
• k-nearest neighbors
Unsupervised learning algorithms
• K-means
• Apriori
• One-class SVM
Linear regression
y = β₀ + β₁x₁ + β₂x₂ + … + βᵢxᵢ + ε
The generalized linear model extends this idea to response variables with other distributions (via a link function).
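A minimal sketch of fitting such a model with scikit-learn (toy data, assumed for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.rand(100, 2)
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1] + 0.1 * rng.randn(100)

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)   # estimates of beta_0 and beta_1, beta_2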
K-means
K-means assigns each sample to the nearest of k cluster centroids and iteratively updates the centroids; the number of clusters k must be specified in advance.
The Gaussian mixture model is a probabilistic counterpart that gives each sample a probability of belonging to each cluster.
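A minimal K-means sketch with scikit-learn (synthetic blobs, assumed for illustration; GaussianMixture from sklearn.mixture could be substituted for soft assignments):

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
km = KMeans(n_clusters=3, random_state=0).fit(X)  # k must be chosen in advance
print(km.labels_[:10])        # cluster assigned to each sample
print(km.cluster_centers_)    # centroid of each cluster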
Mean absolute error (MAE): average of the absolute differences between predicted and true values.
Mean squared error (MSE): average of the squared differences.
Root mean squared error (RMSE): square root of the MSE, in the same units as the target.
R² (coefficient of determination): ranges from 0 (poor fit) to 1 (perfect fit).
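These metrics are all available in scikit-learn; a minimal sketch with made-up numbers:

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 10.0])

print(mean_absolute_error(y_true, y_pred))           # MAE
print(mean_squared_error(y_true, y_pred))            # MSE
print(np.sqrt(mean_squared_error(y_true, y_pred)))   # RMSE
print(r2_score(y_true, y_pred))                      # R^2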
Accuracy: proportion of samples classified correctly.
Error rate: 1 − accuracy.
Caution: if only 1 sample in 100 is positive, a model that predicts everything as negative still reaches 99% accuracy, so accuracy alone can be misleading.
Confusion matrix

                  Predicted positive     Predicted negative
Actual positive   True positive (TP)     False negative (FN)
Actual negative   False positive (FP)    True negative (TN)
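A minimal sketch of computing this matrix with scikit-learn (labels chosen so the layout matches the table above):

from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
# With labels=[1, 0] the result is [[TP, FN], [FP, TN]]
# (rows: actual class, columns: predicted class).
print(confusion_matrix(y_true, y_pred, labels=[1, 0]))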
Precision = TP / (TP + FP)
Of the samples predicted positive, the fraction that are truly positive.
Recall = TP / (TP + FN)
Of the truly positive samples, the fraction that are predicted positive.
F1 score (F-measure) = 2 × Precision × Recall / (Precision + Recall)
The harmonic mean of precision and recall.
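The same toy predictions scored with scikit-learn (a sketch, not from the slides):

from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(precision_score(y_true, y_pred))  # TP / (TP + FP)
print(recall_score(y_true, y_pred))     # TP / (TP + FN)
print(f1_score(y_true, y_pred))         # harmonic mean of precision and recall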
True positive rate (TPR) = TP / (TP + FN)
The fraction of actual positives that are correctly detected.
False positive rate (FPR) = FP / (FP + TN)
The fraction of actual negatives that are wrongly flagged as positive.
Example: 1 sample in 100 is positive (100 positives, 9900 negatives) and the model predicts every sample as negative.

                     Predicted positive   Predicted negative
Actual (Positive)            0                   100
Actual (Negative)            0                  9900

Accuracy = 0.99, Precision = 0, Recall = 0, F1 score = 0
With imbalanced data like this, resampling the training set can help; SMOTE is a well-known technique that oversamples the minority class by synthesizing new examples, though it is no silver bullet.
ROC curve: plots the true positive rate against the false positive rate while the decision threshold is varied.
AUC: the area under the ROC curve; a perfect classifier has an AUC of 1.0.
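A minimal sketch of computing the ROC curve and AUC with scikit-learn (toy scores, assumed for illustration):

from sklearn.metrics import roc_curve, roc_auc_score

y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]             # predicted probabilities/scores
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(roc_auc_score(y_true, y_score))       # area under the ROC curve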
Once the data is ready, fitting a model takes only a line or two:
>>> from sklearn.svm import SVC
>>> clf = SVC().fit(X, y)
Most algorithms have hyperparameters that the user must choose, e.g. the kernel, C, and gamma of an SVM:
>>> clf = SVC(kernel='rbf', C=1.0, gamma=0.1).fit(X, y)
Hyperparameter tuning: try candidate values for each hyperparameter, typically on a logarithmic grid (e.g. 10⁻², 10⁻¹, 10⁰, 10¹, 10²), and keep the combination that scores best (grid search).
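A hedged sketch of this search with scikit-learn's GridSearchCV, tuning C and gamma of the SVC over a logarithmic grid (the iris data stands in for X and y):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {'C': np.logspace(-2, 2, 5),       # 10^-2 ... 10^2
              'gamma': np.logspace(-2, 2, 5)}
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)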
How should the tuned model be evaluated? Evaluating on the same data used for training is misleading: even a training error of 0.0 says little about performance on unseen data.
Overfitting
The model fits the training data too closely, learning its noise as if it were signal, and therefore generalizes poorly to new data. Flexible, complex models are especially prone to this.
Regularization
Penalizing model complexity during training suppresses overfitting; Lasso and SVMs are typical examples of regularized methods. If the penalty is made too strong, the model becomes too simple and underfits (under-fitting).
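A minimal sketch of how the regularization strength changes a Lasso model (synthetic data, assumed for illustration; larger alpha shrinks more coefficients to zero):

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X = rng.randn(100, 10)
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.randn(100)  # 2 informative features

for alpha in [0.01, 0.1, 1.0]:
    model = Lasso(alpha=alpha).fit(X, y)
    # Stronger regularization -> simpler model; too strong -> underfitting.
    print(alpha, model.coef_)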
Cross validation
Split the data into, say, five parts A–E:
1. Train on B–E, evaluate on A
2. Train on A, C–E, evaluate on B
3. Train on A, B, D, E, evaluate on C
4. Train on A–C and E, evaluate on D
5. Train on A–D, evaluate on E
6. Average the five scores
This is 5-fold cross validation.
Variants:
• K-fold cross validation for general K
• Leave-one-out cross validation: K equals the number of samples, so each test fold contains a single sample
• Stratified cross validation: each fold preserves the class proportions of the full data set
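A minimal sketch of 5-fold and stratified cross validation with scikit-learn (iris data assumed for illustration):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

scores = cross_val_score(SVC(), X, y, cv=5)     # 5-fold CV
print(scores, scores.mean())

skf = StratifiedKFold(n_splits=5)                # folds keep the class ratios
print(cross_val_score(SVC(), X, y, cv=skf).mean())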
Decomposing the prediction error (noise ε ~ N(0, σ²)):
Expected error = σ² + Bias² + Variance
Bias: error caused by the model being too simple to capture the true relationship.
Variance: error caused by the model being too sensitive to the particular training sample.
High variance corresponds to overfitting; high bias corresponds to underfitting.
Ensemble learning
• Combines multiple models to obtain better predictions than any single one
• Stacking, Bagging, Boosting
Deep learning
• Based on neural networks with many layers
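A hedged sketch of bagging and boosting with scikit-learn (iris data assumed; the slides do not prescribe these particular estimators):

from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Bagging: many trees trained on bootstrap samples, predictions averaged/voted.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)
# Boosting: trees fitted sequentially, each correcting the previous ones' errors.
boosting = GradientBoostingClassifier(random_state=0)

print(cross_val_score(bagging, X, y, cv=5).mean())
print(cross_val_score(boosting, X, y, cv=5).mean())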
https://www.linkedin.com/pulse/inconvenient-truth-data-science-kamil-bartocha
MALSS (Machine Learning Support System)
A Python library that supports machine-learning data analysis; it runs the analysis, generates a report, and produces sample code.
> pip install -U malss
>>> from malss import MALSS
>>> clf = MALSS('classification', lang='jp')
>>> clf.fit(X, y, 'report_output_dir')
>>> clf.make_sample_code('sample_code.py')
F. Provost
Coursera: Machine Learning (Andrew Ng) https://www.coursera.org/course/ml
scikit-learn Tutorials http://scikit-learn.org/stable/tutorial/
Tutorial: Machine Learning for Astronomy with Scikit-learn http://www.astroml.org/sklearn_tutorial/
MALSS (Machine Learning Support System)
https://pypi.python.org/pypi/malss/
https://github.com/canard0328/malss
MALSS articles on Qiita:
http://qiita.com/canard0328/items/fe1ccd5721d59d76cc77
http://qiita.com/canard0328/items/5da95ff4f2e1611f87e1
http://qiita.com/canard0328/items/3713d6758fe9c045a19d
Summary
1. The data-analysis process: SEMMA, CRISP-DM, KDD, KKD

Data Analysis with Machine Learning: Practical Edition