Building Your Own Machine Learning Model with Modular Packages
- Focusing on Regression Tree Models
Shinhan Bank
Digital Innovation Center
Sooheang Eo (어수행), PhD
(eo.sooheang@gmail.com)
Translating statistical ideas into software.
R = Statistical Software
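Contents
• A history of regression tree models from an implementation perspective
• Implementing regression tree models with a modular package
This session is intended for the following audience.
• Audience: researchers and graduate students
• Research fields: social science (psychology) / medicine (pathology or preventive medicine)
• The session will be easier to follow with some background in the following:
-> Experience with statistical hypothesis testing in applied (research) work rather than in textbooks
-> A one-semester regular course in machine learning
-> Experience using a tree model in R (or SAS, Python)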
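(Linear) Regression
• Input variables: $X = (X_1, X_2, \dots, X_p) \in \mathbb{R}^p$
• Target variable: $Y \in \mathbb{R}$
• Model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_p X_{pi} + \varepsilon_i$, with $\varepsilon_i \sim \text{iid } N(0, \sigma^2)$
• Parameters (regression coefficients): $(\beta_0, \beta_1, \beta_2, \dots, \beta_p)$
[Figure: Y plotted against X]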
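(Linear) Regression Tree
[Figure: Y plotted against X; a single root node with constant β0]
(Linear) Regression Tree
[Figure: Y plotted against X with a cut point c; root node β0 split into child nodes β1 (x <= c) and β2 (x > c)]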
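The figure amounts to a one-split regression tree: a constant β0 in the root, replaced by node constants β1 and β2 on either side of a cut point c. A minimal sketch of that idea follows, written in Python rather than R and not taken from the slides; the function name `best_split`, the squared-error criterion, and the toy data are assumptions made for illustration.

```python
from statistics import mean

def sse(values):
    """Sum of squared deviations from the node mean (0 for an empty node)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(x, y):
    """Return (c, beta1, beta2) for the split x <= c / x > c with minimal SSE."""
    pairs = sorted(zip(x, y))
    xs = sorted(set(x))
    best = None
    # Candidate cut points are midpoints between consecutive distinct x values.
    for lo, hi in zip(xs, xs[1:]):
        c = (lo + hi) / 2
        left = [yi for xi, yi in pairs if xi <= c]
        right = [yi for xi, yi in pairs if xi > c]
        total = sse(left) + sse(right)
        if best is None or total < best[0]:
            best = (total, c, mean(left), mean(right))
    if best is None:
        raise ValueError("need at least two distinct x values")
    _, c, beta1, beta2 = best
    return c, beta1, beta2

# Toy data (hypothetical): low responses for small x, high responses for large x.
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.8, 3.0, 3.2, 2.8]
beta0 = mean(y)                      # root node: one constant for all data
c, beta1, beta2 = best_split(x, y)   # child nodes: one constant per side of c
```

Here β1 and β2 are simply the means of Y in the two child nodes, which is the piecewise-constant case; a linear regression tree would fit a regression in each node instead.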
(Linear) Regression Tree
Loh (2014)
Decision Tree
Recursive Partitioning
Tree(-structured) Model
Classification (Regression) Tree
(Linear) Regression Tree
• A tree model is a logical model represented as a tree that shows how the value
of a target variable can be predicted by using the values of a set of predictor
(input) variables.
• A tree model recursively partitions the data and sample space to construct the
predictive model.
• Its name derives from the practice of displaying the partitions as a decision tree,
from which the roles of the predictor variables may be inferred.
• The idea was first implemented by Morgan and Sonquist in 1963. It is called the
AID (Automatic Interaction Detection) algorithm.
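Recursive partitioning as described in these bullets can be sketched in a few lines. The following is a simplified piecewise-constant tree in Python, in the spirit of an exhaustive split search; it is not Morgan and Sonquist's program, and the names `grow`, `predict` and the stopping parameter `min_size` are assumptions made for this illustration.

```python
from statistics import mean

def sse(ys):
    """Sum of squared deviations from the mean of a non-empty list."""
    m = mean(ys)
    return sum((v - m) ** 2 for v in ys)

def grow(rows, min_size=2):
    """rows: list of (x, y) pairs. Returns a nested dict describing the tree."""
    ys = [y for _, y in rows]
    node = {"mean": mean(ys), "n": len(rows)}
    xs = sorted({x for x, _ in rows})
    best = None
    for lo, hi in zip(xs, xs[1:]):
        c = (lo + hi) / 2
        left = [r for r in rows if r[0] <= c]
        right = [r for r in rows if r[0] > c]
        if len(left) < min_size or len(right) < min_size:
            continue
        cost = sse([y for _, y in left]) + sse([y for _, y in right])
        if best is None or cost < best[0]:
            best = (cost, c, left, right)
    # Stop when no admissible split reduces the squared error.
    if best is None or best[0] >= sse(ys):
        return node
    _, c, left, right = best
    node.update(cut=c, left=grow(left, min_size), right=grow(right, min_size))
    return node

def predict(node, x):
    """Follow the splits down to a terminal node and return its mean."""
    while "cut" in node:
        node = node["left"] if x <= node["cut"] else node["right"]
    return node["mean"]

# Toy data (hypothetical) with three plateaus in y.
data = [(1, 5.0), (2, 5.0), (3, 9.0), (4, 9.0),
        (5, 1.0), (6, 1.0), (7, 1.0), (8, 1.0)]
tree = grow(data)
```

The nested dictionary plays the role of the displayed decision tree: each internal node stores its cut point, each terminal node its predicted value.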
AID model
Automatic Interaction Detection (AID) model by Morgan and Sonquist (1963; JASA)
http://home.isr.umich.edu/education/fellowships-awards/james-morgan-fund/
AID model
Code name ‘SEARCH’
Automatic Interaction Detector Program 

(Sonquist and Morgan, 1964)
Enhanced version of the AID program

(Sonquist et al., 1974)
AID Model = model estimation + segmentation
CART model
Classification and Regression Tree (CART) model by BFOS (1976~)
Leo Breiman, Jerome Friedman, Richard Olshen, Charles Stone
CART model
Stone (1977). Consistent Nonparametric Regression (with discussion). The Annals of Statistics, 5, 590-625.
CART model = Pruning
• The faithfulness of any classification tree is measured by a
deviance measure, D(T), which takes its minimum value at
zero if every member of the training sample is uniquely and
correctly classified.
• The size of a tree is the number of terminal nodes.
• A cost-complexity measure of a tree is the deviance
penalized by a multiple of the size:
$D_\alpha(T) = D(T) + \alpha \cdot \mathrm{size}(T)$
where α is a tuning constant. This is eventually minimized.
• Low values of α for this measure imply that accuracy of
prediction (in the training sample) is more important than
simplicity.
• High values of α rate simplicity relatively more highly than
predictive accuracy.
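To see how α trades accuracy against simplicity, the cost-complexity measure can be evaluated over a set of candidate pruned subtrees. The sketch below is in Python; the deviance/size pairs are made-up numbers for illustration and are not output from the tree or rpart packages, and the function names are hypothetical.

```python
def cost_complexity(deviance, size, alpha):
    """Penalized deviance: D(T) + alpha * size(T)."""
    return deviance + alpha * size

def select_subtree(candidates, alpha):
    """candidates: list of (deviance, size) pairs; return the minimizing pair."""
    return min(candidates, key=lambda t: cost_complexity(t[0], t[1], alpha))

# Hypothetical nested subtrees: larger trees fit the training sample better.
candidates = [(100.0, 1), (60.0, 2), (45.0, 3), (40.0, 5), (38.0, 8)]

small_alpha = select_subtree(candidates, alpha=0.1)   # favours accuracy
mid_alpha = select_subtree(candidates, alpha=5.0)     # a compromise
large_alpha = select_subtree(candidates, alpha=20.0)  # favours simplicity
```

With a small α the largest tree wins; as α grows, the selected tree shrinks. In practice α is usually chosen by cross-validation rather than fixed by hand.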
Implementation of CART model
tree package rpart package
Implementation of CART model
• In R there is a native tree library that V&R have some reservations about. It is useful,
though.
• rpart is a library written by Beth Atkinson and Terry Therneau of the Mayo Clinic,
Rochester, MN. It is much closer to the spirit of the original CART algorithm of Breiman
et al. It is now supplied with both S-PLUS and R.
• In R, there is a tree library that is an S-PLUS look-alike, but we think better in some
respects.
• rpart is the more flexible and allows various splitting criteria and different model bases
(survival trees, for example).
• rpart is probably the better package, but tree is acceptable and some things such as
cross-validation are easier with tree.
• In this discussion we (nevertheless) largely use tree!
Implementation of CART model
https://cran.r-project.org/package=rpart
CART Model = Tree Size (= Pruning) + Theoretical Properties
GUIDE model
Generalized, Unbiased, Interaction Detection and Estimation (GUIDE) model
by Loh and others (1986~)
Implementation of GUIDE model
IBM SPSS
GUIDE CORE

(Fortran95)
GUIDE Interface
GUIDE Model = piecewise linear model + segmentation + Unified Framework with statistical testing
CTREE and MOB model
Model-based Recursive Partitioning by Hothorn and Zeileis (2004~)
CTREE and MOB model
Models: Estimation of parametric models with observations $y_i$ (and
regressors $x_i$), parameter vector $\theta$, and additive objective function $\Psi$:
$\hat\theta = \operatorname{argmin}_\theta \sum_i \Psi(y_i, x_i, \theta)$, with score contributions $\Psi'(y_i, x_i, \hat\theta)$.
Recursive partitioning:
1 Fit the model in the current subsample.
2 Assess the stability of $\theta$ across each partitioning variable $z_j$.
3 Split sample along the $z_{j^*}$ with strongest association: Choose breakpoint
with highest improvement of the model fit.
4 Repeat steps 1–3 recursively in the subsamples until some stopping
criterion is met.
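As a concrete instance of the estimation setup above, take least squares for a simple linear model, where the objective contribution of observation i is its squared residual. The Python sketch below is an illustration under that assumption, not code from the slides or from partykit. It computes the parameter estimate in closed form and the per-observation score contributions (the derivative of each objective contribution with respect to the parameters); the function names are hypothetical.

```python
def fit_ols(x, y):
    """Closed-form least-squares estimate (intercept, slope) for y ~ x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def objective(theta, x, y):
    """Additive objective: sum of squared residuals."""
    b0, b1 = theta
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

def scores(theta, x, y):
    """Per-observation derivative of the squared residual w.r.t. (b0, b1)."""
    b0, b1 = theta
    out = []
    for xi, yi in zip(x, y):
        r = yi - b0 - b1 * xi
        out.append((-2 * r, -2 * r * xi))
    return out

# Toy data (hypothetical).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.0]
theta_hat = fit_ols(x, y)
# At the minimizer, the score contributions sum to (numerically) zero.
score_sums = [sum(col) for col in zip(*scores(theta_hat, x, y))]
```

The scores sum to zero over the whole sample by construction. The stability assessment in step 2 is concerned with whether they deviate from zero systematically when the observations are ordered by a partitioning variable.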
CTREE and MOB model
Implementation of MOB model
(Regression) Tree-based Model = Unified Framework with modular system
Summary (from the tree-model perspective)
• 1st generation - Michigan (1964 ~ 199x)
  piecewise constant model with exhaustive (heuristic) search
• 2nd generation - Berkeley & Stanford (1972 ~ 200x)
  Unified tree framework with exhaustive search
• Generation 2.5 - Wisconsin & ISI (1986 ~ 201x)
  Unified tree framework with statistical testing
• 3rd generation - LMU & UPenn & UNC (2005 ~ 201x)
  Unified tree framework with piecewise model-based models
  + extensions (Domain / Bayesian Approaches / Tree-structured Objects)
(100% personal opinion)
Summary (from the implementation perspective)
The CRAN task view on “Machine Learning” at http://CRAN.R-project.org/view=MachineLearning
lists numerous packages for tree-based modeling and recursive partitioning, including
– rpart (CART),
– tree (CART),
– mvpart (multivariate CART),
– RWeka (J4.8, M5’, LMT),
– party (CTree, MOB),
– and many more (C50, quint, stima, ...).
Related: Packages for tree-based ensemble methods such as random forests or
boosting, e.g., randomForest, gbm, mboost, etc.
Building Your Own Regression Tree Model with Modular Packages
How many LEGO bricks are needed to build a tree model?
segmentation
statistical testing
model estimation
pruning
1 Fit a model to the y or y and x variables using the observations in the
current node
2 Assess the stability of the model parameters with respect to each of the
partitioning variables z1, ..., zl. If there is some overall instability, choose
the variable z associated with the smallest p value for partitioning,
otherwise stop.
3 Search for the locally optimal split in z by minimizing the objective
function of the model. Typically, this will be something like deviance or
the negative logLik.
4 Refit the model in both kid subsamples and repeat from step 2.
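Step 3 can be sketched as follows: for each candidate cut in a partitioning variable z, refit a simple linear model of y on x in both child subsamples and keep the cut with the smallest summed objective. This Python sketch assumes squared error as the objective and deliberately skips the instability test of step 2; it is not partykit's implementation, and `fit_line`, `best_model_split` and `min_size` are hypothetical names.

```python
def fit_line(x, y):
    """Least-squares line; returns (intercept, slope, residual sum of squares)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = 0.0 if sxx == 0 else sxy / sxx
    intercept = ybar - slope * xbar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, rss

def best_model_split(x, y, z, min_size=3):
    """Search cuts in z; each child gets its own regression of y on x."""
    rows = sorted(zip(z, x, y))
    zs = sorted(set(z))
    best = None
    for lo, hi in zip(zs, zs[1:]):
        c = (lo + hi) / 2
        left = [(xi, yi) for zi, xi, yi in rows if zi <= c]
        right = [(xi, yi) for zi, xi, yi in rows if zi > c]
        if len(left) < min_size or len(right) < min_size:
            continue
        fl = fit_line(*zip(*left))
        fr = fit_line(*zip(*right))
        total = fl[2] + fr[2]   # summed objective of the two child models
        if best is None or total < best["objfun"]:
            best = {"cut": c, "objfun": total, "left": fl[:2], "right": fr[:2]}
    return best

# Toy data (hypothetical): slope of y on x is +1 when z <= 4 and -1 when z > 4.
z = [1, 2, 3, 4, 5, 6, 7, 8]
x = [0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0]
y = [0.0, 1.0, 2.0, 3.0, 3.0, 2.0, 1.0, 0.0]
split = best_model_split(x, y, z)
```

The difference from a constant-fit tree is that each child holds a fitted model (here an intercept and slope) rather than a single mean, so the split is chosen where the relationship between x and y changes.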
http://partykit.r-forge.r-project.org/partykit/outreach/
modular R package - partykit
Example: Linear Model Tree
Implementation: Models
Input: Basic interface.
fit(y, x = NULL, start = NULL, weights = NULL,
offset = NULL, ...)
y, x, weights, offset are (the subset of) the preprocessed data.
Starting values and further fitting arguments are in start and ....
Output: Fitted model object of class with suitable methods.
coef(): Estimated parameters $\hat\theta$.
logLik(): Maximized log-likelihood function $-\sum_i \Psi(y_i, x_i, \hat\theta)$.
estfun(): Empirical estimating functions $\Psi'(y_i, x_i, \hat\theta)$.
Implementation: Models
Input: Extended interface.
fit(y, x = NULL, start = NULL, weights = NULL,
offset = NULL, ..., estfun = FALSE, object = FALSE)
Output: List object
coefficients: Estimated parameters $\hat\theta$
objfun: Minimized objective function $\sum_i \Psi(y_i, x_i, \hat\theta)$
estfun: Empirical estimating functions $\Psi'(y_i, x_i, \hat\theta)$
object: A model object for which further methods could be
available (e.g., predict(), or fitted(), etc.).
Internally: Extended interface constructed from basic interface if
supplied. Efficiency can be gained through extended approach.
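To make the shape of the extended interface concrete, here is a toy fitter in Python that returns the four list elements named on the slide. It is only an analogue of the R interface, not partykit code: the model is a weighted mean with squared-error objective, the `x` and `start` arguments are accepted but unused, and the flag is spelled `object_` to avoid shadowing Python's built-in `object`.

```python
def mean_fit(y, x=None, start=None, weights=None, offset=None,
             estfun=False, object_=False):
    """Toy fitter: weighted mean of y, returning extended-interface style output."""
    if weights is None:
        weights = [1.0] * len(y)
    if offset is None:
        offset = [0.0] * len(y)
    z = [yi - oi for yi, oi in zip(y, offset)]
    mu = sum(w * zi for w, zi in zip(weights, z)) / sum(weights)
    resid = [zi - mu for zi in z]
    return {
        "coefficients": {"(Intercept)": mu},
        # Minimized objective: weighted residual sum of squares.
        "objfun": sum(w * r * r for w, r in zip(weights, resid)),
        # Per-observation score contributions, computed only on request.
        "estfun": [[-2.0 * w * r] for w, r in zip(weights, resid)] if estfun else None,
        # Stand-in for a richer model object, returned only on request.
        "object": {"fitted": [mu + oi for oi in offset]} if object_ else None,
    }

res = mean_fit([1.0, 2.0, 3.0, 6.0], estfun=True)
```

Computing the scores and the model object only when asked for is the efficiency point made on the slide: the tree algorithm calls the fitter many times and does not always need every component.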
Implementation: Framework
Class: ‘modelparty’ inheriting from ‘party’.
Main addition: Data handling for regressor and partitioning variables.
The Formula package is used for two-part formulas, e.g.,
y ~ x1 + x2 | z1 + z2 + z3.
The corresponding terms are stored for the combined model and
only for the partitioning variables.
Additional information: In info slots of ‘party’ and ‘partynode’.
call, formula, Formula, terms (partitioning variables only),
fit, control, dots, nreg.
coefficients, objfun, object, nobs, p.value, test.
Reusability: Could in principle be used for other model trees as well
(inferred by other algorithms than MOB).
Example: Bradley-Terry Tree
• Task: Preference scaling of attractiveness.
• Data: Paired comparisons of attractiveness.
Germany’s Next Topmodel 2007 finalists:
Barbara, Anni, Hana, Fiona, Mandy, Anja.
Survey with 192 respondents at Universität Tübingen.
Available covariates: Gender, age, familiarity with the TV show.
Familiarity assessed by yes/no questions:
(1) Do you recognize the women?/Do you know the show?
(2) Did you watch it regularly?
(3) Did you watch the final show?/Do you know who won?
Example: Bradley-Terry Tree
Model: Bradley-Terry (or Bradley-Terry-Luce) model.
Standard model for paired comparisons in social sciences.
Parametrizes probability $\pi_{ij}$ for preferring object i over j in terms of
corresponding “ability” or “worth” parameters $\theta_i$:
$\pi_{ij} = \dfrac{\theta_i}{\theta_i + \theta_j}$
Implementation: bttree() in psychotree (Strobl et al. 2011).
Here: Use mob() directly to build model from scratch using
btReg.fit() from psychotools.
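The Bradley-Terry preference probability is simple enough to check by hand. A minimal Python sketch follows; the worth values are made up for illustration and are not estimates from the Topmodel data.

```python
def bt_prob(theta_i, theta_j):
    """Probability that object i is preferred over object j."""
    return theta_i / (theta_i + theta_j)

# Hypothetical worth parameters, chosen only for illustration.
worth = {"A": 2.0, "B": 1.0, "C": 1.0}

p_ab = bt_prob(worth["A"], worth["B"])  # A is twice as "worthy" as B
p_ba = bt_prob(worth["B"], worth["A"])  # complementary probability
p_bc = bt_prob(worth["B"], worth["C"])  # equal worth gives a coin flip
```

Only the ratios of the worth parameters matter, so they are identified only up to a common scale; a Bradley-Terry tree fits a separate set of worth parameters in each terminal node.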
Example: Bradley-Terry Tree
bttree() in psychotree
Applications
Case 1. random effects tree model
Eo and Cho (2014)
Case 2. latent variable tree model
Lee et al. (2012); Eo et al. (2014); Eo (2015)
Case 3. multiway splits tree model
Eo et al. (2014); Kim and Loh (1998)
Case 4. Industry (FDS)
Case 5. Industry (Quantile Tree)
Wrap-up
Extensions to Deep Learning
Summary
Using R without tree-structured models
is like eating a steamed bun with no red-bean filling.
This session is dedicated to the graduate students who agonize every night over graduating.
http://www.phdcomics.com/
Some review papers
• Loh, W.-Y. (2014). Fifty years of classification and regression trees (with discussion). International Statistical Review, vol, pages.
• Loh, W.-Y. (2008). Regression by parts: Fitting visually interpretable models with GUIDE. In Handbook of Data Visualization, C. Chen, W. Härdle, and A. Unwin, Eds. Springer, pp. 447-469.
• Loh, W.-Y. (2008). Classification and regression tree methods. In Encyclopedia of Statistics in Quality and Reliability, F. Ruggeri, R. Kenett, and F. W. Faltin, Eds. Wiley, Chichester, UK, pp. 315-323.
• Loh, W.-Y. (2010). Tree-structured classifiers. Wiley Interdisciplinary Reviews: Computational Statistics, 2, 364-369.
• Loh, W.-Y. (2011). Classification and regression trees. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 14-23.
• Merkle, E. C. and Shaffer, V. A. (2011). Binary recursive partitioning: Background, methods, and application to psychology. British Journal of Mathematical and Statistical Psychology, 64, 161–181.
• Morgan, J. N. and Sonquist, J. A. (1963). Problems in the analysis of survey data, and a proposal. J. Amer. Statist. Assoc., 58, 415–434.
• Strobl, C., Malley, J. and Tutz, G. (2009). An Introduction to Recursive Partitioning: Rationale, Application, and Characteristics of Classification and Regression Trees, Bagging, and Random Forests. Psychological Methods, 14(4), 323–348.

Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...ThinkInnovation
 
Statistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfStatistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfnikeshsingh56
 
Presentation of project of business person who are success
Presentation of project of business person who are successPresentation of project of business person who are success
Presentation of project of business person who are successPratikSingh115843
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelBoston Institute of Analytics
 
Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...
Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...
Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...ThinkInnovation
 
DATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etcDATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etclalithasri22
 

Recently uploaded (16)

Digital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfDigital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdf
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in India
 
Role of Consumer Insights in business transformation
Role of Consumer Insights in business transformationRole of Consumer Insights in business transformation
Role of Consumer Insights in business transformation
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
 
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
 
2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use
 
Statistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfStatistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdf
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
Presentation of project of business person who are success
Presentation of project of business person who are successPresentation of project of business person who are success
Presentation of project of business person who are success
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
 
Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...
Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...
Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...
 
DATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etcDATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etc
 

Building Your Own Machine Learning Model with Modular Packages: Focusing on Regression Tree Models

  • 1. Building Your Own Machine Learning Model with Modular Packages: Focusing on Regression Tree Models. Shinhan Bank, Digital Innovation Center. Sooheang Eo, PhD (eo.sooheang@gmail.com)
  • 2. Translating statistical ideas into software. R = Statistical Software
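  • 3. Contents • A history of regression tree models from the viewpoint of model implementation • Implementing regression tree models with modular packages
  • 4. This session is intended for the following audience. • Audience: researchers and graduate students • Research fields: social sciences (psychology) and medicine (pathology or preventive medicine) • Background in the following areas will make the session easier to follow.
 -> Experience with statistical hypothesis testing in practice (research), not only from textbooks
 -> Having taken a regular one-semester machine learning course
 -> Experience using a tree model in R (or SAS, Python)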
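  • 5. (Linear) Regression • Input variables: $X = (X_1, X_2, \dots, X_p) \in \mathbb{R}^p$ • Target variable: $Y \in \mathbb{R}$ • Model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_p X_{pi} + \varepsilon_i$, with $\varepsilon_i \sim \text{iid } N(0, \sigma^2)$ • Parameters (regression coefficients): $(\beta_0, \beta_1, \beta_2, \dots, \beta_p)$ (Figure: plot of Y against X.)
  • 6. (Linear) Regression • Input variables: $X = (X_1, X_2, \dots, X_p) \in \mathbb{R}^p$ • Target variable: $Y \in \mathbb{R}$ • Model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_p X_{pi} + \varepsilon_i$, with $\varepsilon_i \sim \text{iid } N(0, \sigma^2)$ • Parameters (regression coefficients): $(\beta_0, \beta_1, \beta_2, \dots, \beta_p)$ (Figure: plot of Y against X.)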
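  • 8. (Linear) Regression Tree (Figure: Y plotted against X with a split at X = c. The root node has fitted value β0; the child node x <= c has fitted value β1 and the child node x > c has fitted value β2.)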
  • 10. Decision Tree • Recursive Partitioning • Tree(-structured) Model • Classification (Regression) Tree
  • 11. (Linear) Regression Tree • A tree model is a logical model represented as a tree that shows how the value of a target variable can be predicted by using the values of a set of predictor (input) variables. • A tree model recursively partitions the data and sample space to construct the predictive model. • Its name derives from the practice of displaying the partitions as a decision tree, from which the roles of the predictor variables may be inferred. • The idea was first implemented by Morgan and Sonquist in 1963. It is called the AID (Automatic Interaction Detection) algorithm.
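The single split pictured on the earlier slide (x <= c versus x > c, with node values β1 and β2) can be sketched in base R as an exhaustive search over candidate cut points. This is only a minimal illustration, assuming a piecewise constant fit and a sum-of-squares criterion; the function name `best_split` and the toy data are hypothetical and do not come from any package.

```r
# Exhaustive search for one split point c on a single numeric predictor.
# Each candidate c partitions the data into x <= c and x > c; the fitted value
# in each part is the mean of y, and the criterion is the summed squared error.
best_split <- function(x, y) {
  cuts <- sort(unique(x))
  cuts <- cuts[-length(cuts)]  # the largest value would leave the right node empty
  sse <- vapply(cuts, function(c0) {
    left  <- y[x <= c0]
    right <- y[x > c0]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  }, numeric(1))
  c_hat <- cuts[which.min(sse)]
  list(cut = c_hat,
       beta1 = mean(y[x <= c_hat]),  # fitted value in the node x <= c
       beta2 = mean(y[x > c_hat]))   # fitted value in the node x > c
}

# Toy data with an obvious jump between x = 3 and x = 4 (hypothetical values)
x <- c(1, 2, 3, 4, 5, 6)
y <- c(1.0, 1.2, 0.8, 3.1, 2.9, 3.0)
s <- best_split(x, y)
print(s)
```

Applying the same search again inside each child node is what makes the partitioning recursive.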
  • 12. AID model Automatic Interaction Detection (AID) model by Morgan and Sonquist (1963; JASA) http://home.isr.umich.edu/education/fellowships-awards/james-morgan-fund/
  • 14. Code name ‘SEARCH’ • Automatic Interaction Detector Program (Sonquist and Morgan, 1964) • Enhanced version of the AID program (Sonquist et al., 1974)
  • 16. CART model Classification and Regression Tree (CART) model by BFOS (1976~): Leo Breiman, Jerome Friedman, Richard Olshen, Charles Stone
  • 17. CART model Stone (1977). Consistent Nonparametric Regression (with discussion), The Annals of Statistics, 5, 595-645
  • 18. CART model = Pruning • The faithfulness of any classification tree is measured by a deviance measure, $D(T)$, which takes its minimum value at zero if every member of the training sample is uniquely and correctly classified. • The size of a tree is the number of terminal nodes. • A cost-complexity measure of a tree is the deviance penalized by a multiple of the size: $D_\alpha(T) = D(T) + \alpha \cdot \mathrm{size}(T)$, where $\alpha$ is a tuning constant. This is eventually minimized. • Low values of $\alpha$ for this measure imply that accuracy of prediction (in the training sample) is more important than simplicity. • High values of $\alpha$ rate simplicity relatively more highly than predictive accuracy.
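The trade-off between deviance and tree size can be tried out with `rpart`, which the deck names as a CART implementation. The sketch below is an illustration under stated assumptions, not the speaker's code: it uses the built-in `mtcars` data, and the helpers `n_leaves` and `tree_deviance` are hypothetical names. Note that `rpart` works with `cp`, which is the penalty α rescaled by the deviance of the root node.

```r
library(rpart)  # recommended package shipped with standard R installations

# Grow a deliberately large regression tree (cp = 0 disables the penalty;
# xval = 0 skips cross-validation so the result is fully deterministic).
big <- rpart(mpg ~ ., data = mtcars, method = "anova",
             control = rpart.control(cp = 0, minsplit = 4, xval = 0))

# size(T): number of terminal nodes
n_leaves <- function(tree) sum(tree$frame$var == "<leaf>")

# D(T): deviance summed over the terminal nodes
tree_deviance <- function(tree) {
  sum(tree$frame$dev[tree$frame$var == "<leaf>"])
}

simple   <- prune(big, cp = 0.10)   # high penalty: simplicity matters more
detailed <- prune(big, cp = 0.001)  # low penalty: training accuracy matters more

print(c(simple = n_leaves(simple), detailed = n_leaves(detailed)))
print(c(simple = tree_deviance(simple), detailed = tree_deviance(detailed)))
```

The heavily penalized tree has fewer terminal nodes and a larger training deviance, which is the behaviour the slide describes for high and low values of α.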
  • 19. Implementation of CART model tree package rpart package
  • 20. Implementation of CART model tree package rpart package
  • 21. Implementation of CART model • In S-PLUS there is a native tree library that V&R have some reservations about. It is useful, though. • rpart is a library written by Beth Atkinson and Terry Therneau of the Mayo Clinic, Rochester, MN. It is much closer to the spirit of the original CART algorithm of Breiman et al. It is now supplied with both S-PLUS and R. • In R, there is a tree library that is an S-PLUS look-alike, but we think better in some respects. • rpart is the more flexible and allows various splitting criteria and different model bases (survival trees, for example). • rpart is probably the better package, but tree is acceptable and some things such as cross-validation are easier with tree. • In this discussion we (nevertheless) largely use tree!
  • 22. Implementation of CART model https://cran.r-project.org/package=rpart
  • 23. CART Model Tree Size (=Pruning) + Theoretical Properties
  • 24. GUIDE model Generalized, Unbiased, Interaction Detection and Estimation (GUIDE) model by Loh and others (1986~)
  • 25. Implementation of GUIDE model • IBM SPSS • GUIDE core (Fortran 95) • GUIDE interface
  • 26. GUIDE Model piecewise linear model + segmentation + Unified Framework with statistical testing
  • 27. CTREE and MOB model Model-based Recursive Partitioning by Hothorn and Zeileis (2004~)
  • 28. CTREE and MOB model Models: Estimation of parametric models with observations $y_i$ (and regressors $x_i$), parameter vector $\theta$, and additive objective function $\Psi$: $\hat\theta = \operatorname{argmin}_\theta \sum_i \Psi(y_i, x_i, \theta)$. Recursive partitioning: (1) Fit the model in the current subsample. (2) Assess the stability of $\theta$ across each partitioning variable $z_j$. (3) Split the sample along the $z_{j^*}$ with the strongest association: choose the breakpoint with the highest improvement of the model fit. (4) Repeat steps 1–3 recursively in the subsamples until some stopping criterion is met.
  • 29. CTREE and MOB model
  • 31. (Regression) Tree-based Model Unified Framework with modular system
  • 32. Summary (from the tree-model perspective) • 1st generation, Michigan (1964 ~ 199x): piecewise constant model with exhaustive (heuristic) search • 2nd generation, Berkeley & Stanford (1972 ~ 200x): unified tree framework with exhaustive search • 2.5th generation, Wisconsin & ISI (1986 ~ 201x): unified tree framework with statistical testing • 3rd generation, LMU & UPenn & UNC (2005 ~ 201x): unified tree framework with piecewise model-based model + extensions (Domain / Bayesian Approaches / Tree-structured Objects). This classification is 100% the speaker's personal opinion.
  • 33. Summary (from the implementation perspective) The CRAN task view on “Machine Learning” at http://CRAN.R-project.org/view=MachineLearning lists numerous packages for tree-based modeling and recursive partitioning, including – rpart (CART),
 – tree (CART),
 – mvpart (multivariate CART),
 – RWeka (J4.8, M5’, LMT),
 – party (CTree, MOB),
 – and many more (C50, quint, stima, ...). Related: Packages for tree-based ensemble methods such as random forests or boosting, e.g., randomForest, gbm, mboost, etc.
  • 34. Building your own regression tree model with modular packages
  • 35. How many LEGO bricks are needed to build a tree model? • segmentation • statistical testing • model estimation • pruning
  • 36. How many LEGO bricks are needed to build a tree model? (1) Fit a model to the y, or y and x, variables using the observations in the current node. (2) Assess the stability of the model parameters with respect to each of the partitioning variables z1, ..., zl. If there is some overall instability, choose the variable z associated with the smallest p value for partitioning, otherwise stop. (3) Search for the locally optimal split in z by minimizing the objective function of the model. Typically, this will be something like deviance or the negative logLik. (4) Refit the model in both kid subsamples and repeat from step 2. http://partykit.r-forge.r-project.org/partykit/outreach/
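The four steps can be put together in a short base R sketch. It is a simplified stand-in and not how partykit implements MOB: in place of score-based parameter instability tests, step 2 here uses a Chow-type F test at the median of each partitioning variable, with a Bonferroni adjustment. The node model is fixed to `lm(y ~ x)`, and all function names (`rss`, `chow_p`, `best_cut`, `grow`) and the simulated data are hypothetical.

```r
# Objective function of the node model: residual sum of squares of lm(y ~ x)
rss <- function(d) sum(residuals(lm(y ~ x, data = d))^2)

# Step 2 (simplified): do the coefficients differ below/above the median of z?
chow_p <- function(d, z) {
  d$g <- d[[z]] <= median(d[[z]])
  if (min(sum(d$g), sum(!d$g)) < 5) return(1)
  p <- anova(lm(y ~ x, data = d), lm(y ~ x * g, data = d))[2, "Pr(>F)"]
  if (is.na(p)) 1 else p
}

# Step 3: cut point in z that minimizes the summed objective of both kids
best_cut <- function(d, z, minsize) {
  cuts <- sort(unique(d[[z]]))
  cuts <- cuts[-length(cuts)]
  if (length(cuts) == 0) return(NA_real_)
  obj <- vapply(cuts, function(cut) {
    left  <- d[d[[z]] <= cut, ]
    right <- d[d[[z]] > cut, ]
    if (nrow(left) < minsize || nrow(right) < minsize) return(Inf)
    rss(left) + rss(right)
  }, numeric(1))
  if (all(!is.finite(obj))) NA_real_ else cuts[which.min(obj)]
}

grow <- function(d, zvars, alpha = 0.05, minsize = 10) {
  leaf <- list(n = nrow(d), coef = coef(lm(y ~ x, data = d)))  # step 1
  if (nrow(d) < 2 * minsize) return(leaf)
  p <- vapply(zvars, function(z) chow_p(d, z), numeric(1))     # step 2
  p <- pmin(1, p * length(zvars))                              # Bonferroni
  if (min(p) > alpha) return(leaf)                             # stable: stop
  z <- zvars[which.min(p)]
  cut <- best_cut(d, z, minsize)                               # step 3
  if (is.na(cut)) return(leaf)
  list(split = z, cut = cut,                                   # step 4
       left  = grow(d[d[[z]] <= cut, ], zvars, alpha, minsize),
       right = grow(d[d[[z]] > cut, ], zvars, alpha, minsize))
}

# Simulated data: the regression of y on x changes at z1 = 0.5; z2 is noise
set.seed(1)
n <- 200
d <- data.frame(x = runif(n), z1 = runif(n), z2 = runif(n))
d$y <- ifelse(d$z1 <= 0.5, 1 + 2 * d$x, 3 - 2 * d$x) + rnorm(n, sd = 0.1)
tree <- grow(d, c("z1", "z2"))
print(tree$split)
print(tree$cut)
```

Each brick (node model, stability test, split search, stopping rule) is a separate function, so any one of them can be swapped without touching the others.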
  • 37. modular R package - partykit http://partykit.r-forge.r-project.org/partykit/outreach/
  • 44. Implementation: Models Input: Basic interface. fit(y, x = NULL, start = NULL, weights = NULL, offset = NULL, ...) y, x, weights, offset are (the subset of) the preprocessed data. Starting values and further fitting arguments are in start and .... Output: Fitted model object of a class with suitable methods. coef(): Estimated parameters $\hat\theta$. logLik(): Maximized log-likelihood function $-\sum_i \Psi(y_i, x_i, \hat\theta)$. estfun(): Empirical estimating functions $\Psi'(y_i, x_i, \hat\theta)$. http://partykit.r-forge.r-project.org/partykit/outreach/
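A fitter that follows the basic interface can be as small as a wrapper around `lm()`. The sketch below is an assumption-laden illustration: `lm_fit` is a hypothetical name, and only `coef()` and `logLik()` are exercised because they ship with base R. An `estfun()` method for `lm` objects is provided by the sandwich package, which is not loaded here.

```r
# Hypothetical fitter with the basic interface signature.
# x is expected to be a model matrix (including an intercept column if wanted);
# if it is NULL, an intercept-only model is fitted.
lm_fit <- function(y, x = NULL, start = NULL, weights = NULL, offset = NULL, ...) {
  if (is.null(x)) {
    x <- matrix(1, nrow = NROW(y), ncol = 1,
                dimnames = list(NULL, "(Intercept)"))
  }
  # start is ignored: least squares needs no starting values
  lm(y ~ 0 + x, weights = weights, offset = offset, ...)
}

# Simulated data (hypothetical values): true coefficients are 1 and 2
set.seed(42)
x <- cbind("(Intercept)" = 1, x1 = rnorm(50))
y <- drop(x %*% c(1, 2)) + rnorm(50, sd = 0.1)

m <- lm_fit(y, x)
print(coef(m))    # estimated parameters
print(logLik(m))  # maximized log-likelihood
```

With partykit installed, a function of this shape is what gets passed as the `fit` argument of `mob()`.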
  • 45. Implementation: Models Input: Extended interface. fit(y, x = NULL, start = NULL, weights = NULL, offset = NULL, ..., estfun = FALSE, object = FALSE) Output: List object • coefficients: Estimated parameters $\hat\theta$ • objfun: Minimized objective function $\sum_i \Psi(y_i, x_i, \hat\theta)$ • estfun: Empirical estimating functions $\Psi'(y_i, x_i, \hat\theta)$ • object: A model object for which further methods could be available (e.g., predict(), or fitted(), etc.). Internally: Extended interface constructed from basic interface if supplied. Efficiency can be gained through extended approach. http://partykit.r-forge.r-project.org/partykit/outreach/
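The extended interface returns a plain list, so the fitter can skip the overhead of building a full model object. A minimal base R sketch for weighted least squares follows; `lm_fit_ext` is a hypothetical name, the objective is taken to be the weighted residual sum of squares, and the estimating functions are the per-observation score contributions (weight × residual × regressor), up to a constant factor.

```r
# Hypothetical fitter with the extended interface signature.
lm_fit_ext <- function(y, x = NULL, start = NULL, weights = NULL, offset = NULL,
                       ..., estfun = FALSE, object = FALSE) {
  if (is.null(x)) {
    x <- matrix(1, nrow = NROW(y), ncol = 1,
                dimnames = list(NULL, "(Intercept)"))
  }
  if (is.null(weights)) weights <- rep(1, NROW(y))
  if (!is.null(offset)) y <- y - offset        # fold the offset into the response
  z <- lm.wfit(x, y, w = weights)              # low-level weighted least squares
  res <- z$residuals
  list(
    coefficients = z$coefficients,                       # theta-hat
    objfun = sum(weights * res^2),                       # minimized objective
    estfun = if (estfun) weights * res * x else NULL,    # n x k score matrix
    object = if (object) z else NULL                     # optional fit object
  )
}

# Simulated data (hypothetical values)
set.seed(7)
x <- cbind("(Intercept)" = 1, x1 = rnorm(40))
y <- drop(x %*% c(0.5, -1)) + rnorm(40, sd = 0.2)

out <- lm_fit_ext(y, x, estfun = TRUE)
print(out$coefficients)
print(out$objfun)
print(colSums(out$estfun))  # scores sum to (numerically) zero at the optimum
```

The `estfun` and `object` flags let the caller request the more expensive pieces only when they are needed, which is where the efficiency gain mentioned on the slide comes from.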
  • 46. Implementation: Framework Class: ‘modelparty’ inheriting from ‘party’. Main addition: Data handling for regressor and partitioning variables. The Formula package is used for two-part formulas, e.g., y ~ x1 + x2 | z1 + z2 + z3. The corresponding terms are stored for the combined model and only for the partitioning variables. Additional information: In info slots of ‘party’ and ‘partynode’. call, formula, Formula, terms (partitioning variables only), fit, control, dots, nreg. coefficients, objfun, object, nobs, p.value, test. Reusability: Could in principle be used for other model trees as well (inferred by other algorithms than MOB). http://partykit.r-forge.r-project.org/partykit/outreach/
  • 48. Example: Bradley-Terry Tree • Task: Preference scaling of attractiveness. • Data: Paired comparisons of attractiveness. Germany’s Next Topmodel 2007 finalists: Barbara, Anni, Hana, Fiona, Mandy, Anja. Survey with 192 respondents at Universität Tübingen. Available covariates: Gender, age, familiarity with the TV show. Familiarity assessed by yes/no questions:
 (1) Do you recognize the women?/Do you know the show?
 (2) Did you watch it regularly?
 (3) Did you watch the final show?/Do you know who won? http://partykit.r-forge.r-project.org/partykit/outreach/
  • 49. Example: Bradley-Terry Tree Model: Bradley-Terry (or Bradley-Terry-Luce) model. Standard model for paired comparisons in social sciences. Parametrizes the probability $\pi_{ij}$ for preferring object i over j in terms of corresponding “ability” or “worth” parameters $\theta_i$: $\pi_{ij} = \theta_i / (\theta_i + \theta_j)$. Implementation: bttree() in psychotree (Strobl et al. 2011). Here: Use mob() directly to build model from scratch using btReg.fit() from psychotools. http://partykit.r-forge.r-project.org/partykit/outreach/
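The Bradley-Terry parametrization itself is easy to check numerically in base R. The sketch below only evaluates preference probabilities from given worth parameters; it does not estimate them, and the function name `bt_prob` and the worth values are hypothetical.

```r
# Preference probabilities pi_ij = theta_i / (theta_i + theta_j)
# for every ordered pair of objects, given a named vector of worths.
bt_prob <- function(theta) {
  p <- outer(theta, theta, function(a, b) a / (a + b))
  diag(p) <- NA  # an object is never compared with itself
  p
}

theta <- c(A = 2, B = 1, C = 1)  # hypothetical worth parameters
P <- bt_prob(theta)
print(round(P, 3))
# P["A", "B"] is 2 / (2 + 1); P["A", "B"] + P["B", "A"] equals 1
```

Only the ratios of the worths matter: multiplying every θ by the same constant leaves all probabilities unchanged, which is why an identification constraint is needed when the worths are estimated.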
  • 60. Case 1. random effects tree model Eo and Cho (2014)
  • 61. Case 2. latent variable tree model Lee et al. (2012); Eo et al. (2014); Eo (2015)
  • 62. Case 3. multiway splits tree model Eo et al. (2014); Kim and Loh (1998)
  • 64. Case 5. Industry (Quantile Tree)
  • 66. Extensions to Deep Learning
  • 68. Summary Using R without tree-structured models is like eating a steamed bun with no filling.
  • 69. This session is dedicated to the graduate students who agonize every night over graduating. http://www.phdcomics.com/
  • 70. Some review papers • Loh, W.-Y. (2014). Fifty years of classification and regression trees (with discussion). International Statistical Review, vol, pages. • Loh, W.-Y. (2008). Regression by parts: Fitting visually interpretable models with GUIDE. In Handbook of Data Visualization, C. Chen, W. Härdle, and A. Unwin, Eds. Springer, pp. 447-469. • Loh, W.-Y. (2008). Classification and regression tree methods. In Encyclopedia of Statistics in Quality and Reliability, F. Ruggeri, R. Kenett, and F. W. Faltin, Eds. Wiley, Chichester, UK, pp. 315-323. • Loh, W.-Y. (2010). Tree-structured classifiers. Wiley Interdisciplinary Reviews: Computational Statistics, 2, 364-369. • Loh, W.-Y. (2011). Classification and regression trees. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 14-23. • Merkle, E. C. and Shaffer, V. A. (2011). Binary recursive partitioning: Background, methods, and application to psychology. British Journal of Mathematical and Statistical Psychology, 64, 161-181. • Morgan, J. N. and Sonquist, J. A. (1963). Problems in the analysis of survey data, and a proposal. Journal of the American Statistical Association, 58, 415-434. • Strobl, C., Malley, J. and Tutz, G. (2009). An introduction to recursive partitioning: Rationale, application, and characteristics of classification and regression trees, bagging, and random forests. Psychological Methods, 14(4), 323-348.