
Data Analysis with Machine Learning: Practical Edition


The exercise scripts are available below.
Python
http://nbviewer.ipython.org/gist/canard0328/a5911ee5b4bf1a07fbcb/
https://gist.github.com/canard0328/07a65584c134a2700725
R
http://nbviewer.ipython.org/gist/canard0328/6f44229365f53b7bd30f/
https://gist.github.com/canard0328/b2f8aec2b9c286f53400

Published in: Data & Analytics


  1. @canard0328
  3. http://nbviewer.ipython.org/gist/canard0328/6f44229365f53b7bd30f/ http://nbviewer.ipython.org/gist/canard0328/a5911ee5b4bf1a07fbcb/ https://gist.github.com/canard0328/07a65584c134a2700725 https://gist.github.com/canard0328/b2f8aec2b9c286f53400
  4. SEMMA: Sample, Explore, Modify, Model, Assess
  5. CRISP-DM (CRoss-Industry Standard Process for Data Mining): Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment. KDD (Knowledge Discovery in Databases): Selection, Preprocessing, Transformation, Data Mining, Interpretation/Evaluation. KKD: Keiken, Kan and Dokyo (experience, intuition and guts).
  6. http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic3.csv (Data obtained from http://biostat.mc.vanderbilt.edu/DataSets). R: > data = read.csv("titanic3.csv", stringsAsFactors=F, na.strings=c("","NA")) Python: >>> import pandas as pd >>> data = pd.read_csv('titanic3.csv')
  15. 1-of-K
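
As an illustration of 1-of-K encoding, here is a minimal sketch in plain Python. The helper name `one_of_k` and the sample category labels are assumptions made for this example, not taken from the slides; each category value becomes a vector with a single 1.

```python
def one_of_k(values):
    """Encode categorical values as 1-of-K (one-hot) vectors.

    Returns the sorted category list and one 0/1 row per input value.
    """
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one position is set per value
        rows.append(row)
    return categories, rows


# Hypothetical sample data: three distinct categories, four observations.
categories, encoded = one_of_k(["S", "C", "Q", "S"])
```

With K categories this yields K columns; dropping one column (keeping K-1) is a common variant when the model includes an intercept.
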
  16. Feature hashing / Hashing trick: x := new vector[N]; for f in features: h := hash(f); x[h mod N] += 1. http://en.wikipedia.org/wiki/Feature_hashing
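
The pseudocode on this slide can be sketched in runnable Python as follows. The choice of `zlib.crc32` as the hash function, the function name `hash_features`, and the sample feature strings are assumptions for illustration; any stable hash would do.

```python
import zlib


def hash_features(features, n_buckets):
    """Map a list of string features to a fixed-length count vector."""
    x = [0] * n_buckets
    for f in features:
        # crc32 is used here (an assumption) because it is deterministic
        # across runs, unlike Python's built-in hash() for strings.
        h = zlib.crc32(f.encode("utf-8"))
        x[h % n_buckets] += 1
    return x


# Hypothetical features; distinct features may collide in the same bucket.
vec = hash_features(["sex=female", "pclass=1", "embarked=S"], 8)
```

The vector length stays fixed at `n_buckets` no matter how many distinct features appear, at the cost of occasional collisions.
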
  17. Curse of dimensionality
  18. Standardization
  19. Standardization: $z = \frac{x - \mu}{\sigma}$, where $\mu$ is the mean and $\sigma$ is the standard deviation of $x$
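
The standardization formula can be tried out with the standard library alone. This is a minimal sketch; the function name `standardize`, the use of the population standard deviation, and the sample values are assumptions for illustration.

```python
from statistics import mean, pstdev


def standardize(xs):
    """Return z-scores: (x - mean) / standard deviation."""
    mu = mean(xs)
    sigma = pstdev(xs)  # population standard deviation (an assumption)
    if sigma == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(x - mu) / sigma for x in xs]


# Hypothetical feature values with mean 5 and standard deviation 2.
z = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

After the transformation the values have mean 0 and standard deviation 1, which puts features measured on different scales on a comparable footing.
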
  20. Feature selection: forward stepwise selection, backward stepwise selection
  21. Ugly duckling theorem
  23. "Machine learning is the science of getting computers to act without being explicitly programmed." (Andrew Ng)
  24. Supervised learning: classification, regression. Unsupervised learning, including outlier detection.
  25. Semi-supervised learning, reinforcement learning, active learning, online learning, transfer learning
  27. K-means, Apriori, One-class SVM
  29. $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_i x_i + \varepsilon$; generalized linear model
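
To make the linear model concrete, here is a minimal sketch that fits the one-variable case ($y = \beta_0 + \beta_1 x_1 + \varepsilon$) by ordinary least squares and evaluates the general form for given coefficients. The function names and the toy data are assumptions for illustration; the slides do not specify a fitting method.

```python
def fit_simple_ols(xs, ys):
    """Least-squares estimates (beta0, beta1) for y = beta0 + beta1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1


def predict(betas, x_row):
    """Evaluate beta0 + beta1*x1 + ... + betai*xi for one observation."""
    return betas[0] + sum(b * x for b, x in zip(betas[1:], x_row))


# Hypothetical noise-free data generated from y = 1 + 2x.
beta0, beta1 = fit_simple_ols([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
y_hat = predict([beta0, beta1], [4.0])
```
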
  30. K-means; Gaussian mixture model
  34. Mean absolute error; mean square(d) error; root mean square(d) error; R2 (coefficient of determination)
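
The four regression measures named on this slide can be computed as in the following minimal sketch, which uses their standard textbook definitions. The function name `regression_metrics` and the sample values are assumptions for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R2 for paired true/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)    # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Hypothetical predictions: two exact, two off by one.
scores = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
```
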
  35. Accuracy; error rate (1 − accuracy)
  36. Confusion matrix: true positive (TP), false negative (FN), false positive (FP)
  37. Precision: TP / (TP + FP). Recall: TP / (TP + FN). F-measure (F1 score): 2 × (precision × recall) / (precision + recall)
  38. True positive rate: TP / (TP + FN). False positive rate: FP / (FP + TN)
  39. Confusion matrix (predicted positive, predicted negative): actual positive 0, 100; actual negative 0, 9900. Accuracy 0.99; precision 0; recall 0; F-measure 0
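
The counts on this slide (0, 100, 0, 9900) can be checked against the formulas for accuracy, precision, recall and F-measure. The sketch below reads them as TP, FN, FP, TN, which is an assumption based on the confusion-matrix layout of the earlier slide; the function name and the convention of returning 0 for an undefined ratio are also assumptions.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute common classification measures from confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    # Ratios with a zero denominator are reported as 0.0 (an assumption).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # also the true positive rate
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0     # false positive rate
    return {
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "fpr": fpr,
    }


# A classifier that predicts "negative" for every one of 10,000 samples.
m = classification_metrics(tp=0, fn=100, fp=0, tn=9900)
```

Accuracy comes out at 0.99 even though not a single positive case is found, which is why precision, recall and the F-measure are worth reporting on imbalanced data.
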
  40. SMOTE
  41. ROC curve; AUC: area under the ROC curve (1.0 for a perfect classifier)
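
One way to compute AUC without drawing the ROC curve is to use its rank interpretation: the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as one half. The sketch below follows that definition; the function name and the sample labels and scores are assumptions for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores for two negatives and two positives.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise loop is quadratic in the number of samples, which is fine for a small illustration but not for large data sets.
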
  42. ROC, AUC
  43. > clf = SVC().fit(X, y)
  44. > clf = SVC(kernel='rbf', C=1.0, gamma=0.1).fit(X, y)
  46. (10^-2, 10^-1, 10^0, 10^1, 10^2)
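
A candidate list such as (10^-2, ..., 10^2) is typically used to try every combination of parameter values and keep the best one. The sketch below enumerates such a grid for two parameters named `C` and `gamma` (as in the SVC call on the earlier slide). The scoring function `toy_score` is a hypothetical stand-in for a real validation score, chosen only so that the example runs on its own.

```python
import math
from itertools import product

# Candidate values 10^-2 ... 10^2 for each parameter.
values = [10.0 ** k for k in range(-2, 3)]
grid = list(product(values, values))  # every (C, gamma) combination


def toy_score(C, gamma):
    """Hypothetical score that peaks at C = 10 and gamma = 0.1.

    In practice this would be replaced by a validation score of a
    model trained with the given parameters.
    """
    return -abs(math.log10(C) - 1.0) - abs(math.log10(gamma) + 1.0)


# Exhaustive search: evaluate every grid point and keep the best one.
best = max(grid, key=lambda params: toy_score(*params))
```

The number of evaluations grows multiplicatively with each added parameter (here 5 × 5 = 25), which is the main cost of an exhaustive search.
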
  49. Overfitting
  50. Regularization; Lasso; SVM; underfitting
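  51. Cross validation: 1. Train on B-E, evaluate on A. 2. Train on A and C-E, evaluate on B. 3. Train on A, B, D and E, evaluate on C. 4. Train on A-C and E, evaluate on D. 5. Train on A-D, evaluate on E. 6. Average the 5 results. This is 5-fold cross validation.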
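
The splitting scheme on this slide, where each of the five blocks A to E is held out once while the rest is used for training, can be sketched as index bookkeeping. The function name `k_fold_splits` and the sample size are assumptions; shuffling and model fitting are left out to keep the sketch short.

```python
def k_fold_splits(n_samples, k):
    """Return k (train_indices, test_indices) pairs over range(n_samples)."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    splits = []
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, test))
        start += size
    return splits


# Ten samples, five folds: each fold holds out two samples.
splits = k_fold_splits(10, 5)
```

Every sample appears in exactly one held-out fold, so each one is used for evaluation once and for training k-1 times.
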
  52. Leave-one-out cross validation; stratified cross validation
  54. $\varepsilon \sim N(0, \sigma^2)$; $\sigma^2 + \mathrm{Bias}^2 + \mathrm{Variance}$
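
The three terms on this slide correspond to the standard decomposition of expected squared error. It is written out below under the assumption that the data follow $y = f(x) + \varepsilon$ with noise as given on the slide; the symbols $f$ (true function) and $\hat{f}$ (fitted model) are notation introduced here, not taken from the slides.

```latex
\begin{aligned}
\mathbb{E}\!\left[(y - \hat{f}(x))^2\right]
  &= \underbrace{\sigma^2}_{\text{noise}}
   + \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\mathrm{Bias}^2}
   + \underbrace{\mathbb{E}\!\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\mathrm{Variance}}
\end{aligned}
```

The expectation is taken over training sets and noise. The $\sigma^2$ term cannot be reduced by any model, while bias and variance depend on the model and the training data.
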
  58. Overfitting; underfitting
  64. Ensemble learning: stacking, bagging, boosting. Deep learning: neural networks
  65. https://www.linkedin.com/pulse/inconvenient-truth-data-science-kamil-bartocha
  66. MALSS (Machine Learning Support System); Python
  67. MALSS: > pip install -U malss > from malss import MALSS > clf = MALSS('classification', lang='jp') > clf.fit(X, y, 'report_output_dir') > clf.make_sample_code('sample_code.py')
  70. F. Provost. Coursera: Machine Learning, Andrew Ng, https://www.coursera.org/course/ml. scikit-learn Tutorials, http://scikit-learn.org/stable/tutorial/. Tutorial: Machine Learning for Astronomy with Scikit-learn, http://www.astroml.org/sklearn_tutorial/
  71. MALSS (Machine Learning Support System): https://pypi.python.org/pypi/malss/ https://github.com/canard0328/malss. Python MALSS (Qiita): http://qiita.com/canard0328/items/fe1ccd5721d59d76cc77. Python MALSS (Qiita): http://qiita.com/canard0328/items/5da95ff4f2e1611f87e1. Python MALSS (Qiita): http://qiita.com/canard0328/items/3713d6758fe9c045a19d
  72. SEMMA, CRISP-DM, KDD, KKD
