Comparison of methods used to predict mass killings
1. Comparison of methods used to predict mass killings
Surui Sun
Chufeng Hu
Yidan Sun
Department of Statistics
UCLA
March 8, 2016
2. Outline
Mass killing has long been a focus of worldwide concern
Source of data and reference:
http://www.earlywarningproject.com/data
Purpose: to predict the probability of occurrence
Approaches to modeling:
Logistic, Lasso, Ridge
Missing data, Random Forest, Random Uniform Forest
ROC curve, Heatmap
4. Data Dictionary
Introduction:
to generate statistical assessments of the risk of onset of state-led mass-killing episodes in countries worldwide
Output variable:
mkl.start.1: yi ∈ {−1, 1}
Input variables:
reg.afr, reg.amr: indicators for different regions
mkl.ongoing: any ongoing episode of state-led mass killing
mkl.ever: any state-led mass killing since WWII
countryage.ln: log of country age
wdi.popsize.ln: log of population size
. . .
Data: 32 variables, 9163 observations
5. Lasso vs. Ridge
Lambda chosen by cv.glmnet in the glmnet package (K-fold CV)
Lasso regularization does not work very well here: the coefficients are very small, so under the lasso penalty all of them shrink to 0
For N >> P with highly correlated variables, ridge does better than lasso
Logistic: fit logistic regression models and predict logistic probabilities
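The lambda selection and lasso/ridge comparison above can be sketched with cv.glmnet. The data below are simulated stand-ins, not the project's variables:

```r
library(glmnet)

# Simulated stand-in data (illustrative only): X is the predictor
# matrix, y the binary outcome.
set.seed(1)
X <- matrix(rnorm(200 * 10), nrow = 200)
y <- rbinom(200, 1, plogis(X[, 1] - X[, 2]))

# alpha = 1 gives the lasso penalty, alpha = 0 gives ridge;
# cv.glmnet picks lambda by K-fold cross-validation (default K = 10).
cv_lasso <- cv.glmnet(X, y, family = "binomial", alpha = 1)
cv_ridge <- cv.glmnet(X, y, family = "binomial", alpha = 0)

# lambda.min is the lambda minimizing the CV error; under lasso,
# small coefficients tend to be shrunk all the way to 0.
coef(cv_lasso, s = "lambda.min")

# Predicted probabilities at the selected lambda
p_ridge <- predict(cv_ridge, newx = X, s = "lambda.min", type = "response")
```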
6. Missing Data
The imputation approach:
impute each missing value with the mean or median of the nonmissing values for that feature
Use the R package imputeMissings
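The median-imputation idea can be sketched in base R (the slides use the imputeMissings package; this stand-alone version only covers numeric columns):

```r
# Median imputation for numeric columns, written out in base R.
impute_median <- function(df) {
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      med <- median(df[[col]], na.rm = TRUE)
      df[[col]][is.na(df[[col]])] <- med
    }
  }
  df
}

# Toy example: NA in column a becomes median(1, 3) = 2,
# NA in column b becomes median(2, 4) = 3.
d <- data.frame(a = c(1, NA, 3), b = c(NA, 2, 4))
impute_median(d)
```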
7. Random Forest and Random Uniform Forest
Random Forest:
R package randomForest
mtry = 3, ntree = 1000
Random Uniform Forest:
R package randomUniformForest
mtry = 3, ntree = 1000
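The random forest fit with the parameters above can be sketched as follows; X and y are simulated stand-ins for the real data (the randomUniformForest fit is analogous):

```r
library(randomForest)

# Simulated stand-in for the mass-killing dataset: rare positive class.
set.seed(1)
X <- data.frame(matrix(rnorm(500 * 5), nrow = 500))
y <- factor(rbinom(500, 1, 0.1))

# Parameters from the slide: mtry = 3 candidate variables per split,
# ntree = 1000 trees.
rf <- randomForest(x = X, y = y, mtry = 3, ntree = 1000)

# Class probabilities rather than hard labels, for the ROC analysis later
p_rf <- predict(rf, X, type = "prob")[, "1"]
```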
8. AdaBoost
Objective solved at each boosting step m:
(βm, Gm) = argmin over β, G of Σ_{i=1}^{N} exp[−yi (f_{m−1}(xi) + β G(xi))]
The individual classifiers Gm(x) ∈ {−1, 1}
R package: adabag
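A minimal adabag sketch of the boosting fit, on simulated stand-in data (mfinal, the number of boosting iterations, is chosen here for illustration only):

```r
library(adabag)

# Simulated stand-in data; boosting() requires a factor response.
set.seed(1)
d <- data.frame(matrix(rnorm(300 * 4), nrow = 300))
d$y <- factor(rbinom(300, 1, plogis(d$X1)))

# mfinal controls the number of boosting iterations M
fit <- boosting(y ~ ., data = d, mfinal = 50)

# predict.boosting returns both hard classes and class probabilities
pred <- predict(fit, newdata = d)
head(pred$prob)
```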
9. ROC
Why ROC?
The event is rare, so the data are mostly 0s and a random forest will predict 0 almost everywhere, which makes raw accuracy uninformative. We therefore use the ROC curve to see the effect of different classification thresholds.
11. ROC curve
Use the ROCR package
We find that RUF (Random Uniform Forest) is the best method
[Figure: ROC curves (true positive rate vs. false positive rate); AUC values:]
RF = 0.813
RUF = 0.821
Ridge = 0.813
Logistic = 0.790
AdaBoost = 0.738
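The ROC/AUC computation with ROCR can be sketched like this; the scores and labels are simulated stand-ins rather than the project's model outputs:

```r
library(ROCR)

# Simulated stand-ins: labels are the true 0/1 outcomes,
# scores the predicted probabilities from some model.
set.seed(1)
labels <- rbinom(200, 1, 0.1)
scores <- plogis(rnorm(200) + 2 * labels)

pred <- prediction(scores, labels)

# ROC curve: true positive rate against false positive rate
# as the threshold varies.
perf <- performance(pred, "tpr", "fpr")
plot(perf)

# Area under the curve (the quantity reported in the figure)
auc <- performance(pred, "auc")@y.values[[1]]
```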
12. Heatmap
Why a heatmap?
It is difficult to predict the rare class 1 directly, so we show predicted probabilities rather than hard classifications to compare the relative risk across countries, and plot the top 30 countries with the largest probabilities.
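The final step can be sketched in base R as a one-column heat image of the top 30 risks; the country names and probabilities here are made up for illustration:

```r
# Simulated stand-in: one predicted probability per country.
set.seed(1)
probs <- setNames(runif(100), paste0("country_", 1:100))

# Rank countries by predicted probability and keep the top 30
top30 <- sort(probs, decreasing = TRUE)[1:30]

# One-column heatmap: each row is a country, colour encodes relative risk
image(x = 1, y = 1:30, z = t(as.matrix(rev(top30))),
      col = heat.colors(30), axes = FALSE, xlab = "", ylab = "")
axis(2, at = 1:30, labels = names(rev(top30)), las = 2, cex.axis = 0.5)
```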