The R User Conference 2014 
useR 2014 
Jinseob Kim
UCLA in LA
6.30 - 7.3
Jinseob Kim (UCLA in LA) The R User Conference 2014 6.30 - 7.3 1 / 33
What is useR? 
Contents 
1 What is useR? 
2 1st day: Tutorial 
Applied Predictive Modeling in R 
Graphical Models and Bayesian Networks with R 
3 2nd day 
4 3rd day 
5 4th day 
What is useR? 
Since 2004...
Main meeting of the R user and developer community. 
Invited keynote lectures 
broad spectrum of topics ranging from technical and R-related 
computing issues to general statistical topics of current interest 
User-contributed presentations 
Figure. Afternoon tutorial: dplyr 
1st day: Tutorial 
List 
http://user2014.stat.ucla.edu/#tutorials 
Applied Predictive Modeling in R 
Max Kuhn, Ph.D., Pfizer Global R&D
http://appliedpredictivemodeling.com 
caret package in R 
Outline 
Conventions in R 
Data Splitting and Estimating Performance 
Data Pre-Processing 
Over-Fitting and Resampling
Training and Tuning Tree Models 
Training and Tuning A Support Vector Machine 
Comparing Models 
Parallel Processing 
1 Trade-off: Logistic regression (Odds ratio), logit
(ex: Log, scale, centering)
R2, AIC, p-value... VS cross validation, bootstrapping, sampling,
ROC curve
2 Supervised machine learning 
Regression: simple, glm, PCA, penalized (Ridge, Lasso, elastic-net)...
Classification: K-Nearest Neighbors, trees
Common: Boosting, Support Vector Machine (SVM) 
3 Parallel Processing 
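The cross validation mentioned in the list above can be sketched as follows. This is a minimal illustration in Python rather than the caret workflow from the tutorial; the function name `k_fold_indices` and its parameters are hypothetical and not taken from the slides.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross validation."""
    indices = list(range(n_samples))
    # Shuffle once so that folds do not depend on the original row order.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    for train, test in k_fold_indices(10, k=3):
        print("train:", sorted(train), "test:", sorted(test))
```

Each observation lands in exactly one test fold, so a model fitted on the matching training indices is always scored on data it has not seen.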
1 Supervised learning
2 Unsupervised learning
Deep learning (DNN: Deep Neural Network)
https://class.coursera.org/neuralnets-2012-001/lecture 
Graphical Models and Bayesian Networks with R 
Probability propagation with Bayesian networks (BNs) and their 
implementation in the gRain (gRaphical independence networks) 
package. 
A look under the hood of BNs to understand mechanisms of 
probability propagation. Dependency graphs and conditional 
independence restrictions. 
Log-linear models, graphical models, decomposable models and their
implementation in the gRim (gRaphical independence models)
package.
Model selection with gRim 
Converting a decomposable graphical model to a Bayesian network.
The chest clinic narrative 
$p(V) = p(a)\,p(t \mid a)\,p(s)\,p(l \mid s)\,p(b \mid s)\,p(e \mid t, l)\,p(d \mid e, b)\,p(x \mid e)$
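The factorization above can be evaluated directly by multiplying one conditional probability per node. The sketch below is in Python rather than gRain, and the conditional probability values are illustrative assumptions, not numbers from the slides. In the usual telling of the chest clinic example, a = visit to Asia, t = tuberculosis, s = smoking, l = lung cancer, b = bronchitis, e = either tuberculosis or lung cancer, d = dyspnoea, and x = positive X-ray.

```python
from itertools import product

# Illustrative conditional probability tables (assumed values, not from the slides).
P_A = 0.01                                   # p(a)
P_T_GIVEN_A = {True: 0.05, False: 0.01}      # p(t | a)
P_S = 0.5                                    # p(s)
P_L_GIVEN_S = {True: 0.10, False: 0.01}      # p(l | s)
P_B_GIVEN_S = {True: 0.60, False: 0.30}      # p(b | s)
# e is modelled as a logical OR of t and l, so p(e | t, l) is 0 or 1.
P_E_GIVEN_TL = {(t, l): 1.0 if (t or l) else 0.0
                for t in (True, False) for l in (True, False)}
P_D_GIVEN_EB = {(True, True): 0.9, (True, False): 0.7,
                (False, True): 0.8, (False, False): 0.1}   # p(d | e, b)
P_X_GIVEN_E = {True: 0.98, False: 0.05}      # p(x | e)


def bern(p_true, value):
    """Probability of a binary value given the probability of True."""
    return p_true if value else 1.0 - p_true


def joint(a, t, s, l, b, e, d, x):
    """p(V) as the product of the eight factors in the formula above."""
    return (bern(P_A, a) * bern(P_T_GIVEN_A[a], t) * bern(P_S, s)
            * bern(P_L_GIVEN_S[s], l) * bern(P_B_GIVEN_S[s], b)
            * bern(P_E_GIVEN_TL[(t, l)], e) * bern(P_D_GIVEN_EB[(e, b)], d)
            * bern(P_X_GIVEN_E[e], x))


def marginal(index):
    """Brute-force marginal p(variable = True); index follows joint()'s argument order."""
    return sum(joint(*state)
               for state in product([True, False], repeat=8) if state[index])


# The joint distribution must sum to one over all 2^8 configurations.
total = sum(joint(*state) for state in product([True, False], repeat=8))

if __name__ == "__main__":
    print("sum of joint:", total)
    print("p(tuberculosis):", marginal(1))
```

Brute-force summation is only feasible for tiny networks like this one; probability propagation exists to avoid enumerating every configuration.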
2nd day 
Opening Keynote 
John Chambers (Stats, Stanford) - Interfaces, Efficiency and Big
Data
Rcpp: cpp function -> R
RLLVM, http://www.omegahat.org/Rllvm: compiles R.
h2o: Java-based machine learning for big data.
1185 Terra Bella Ave, Mountain View, CA 94043 | 650-429-8337 | h2o.ai 
H2O is the world’s fastest in-memory platform for 
machine learning and predictive analytics on big 
data. It is the only alternative to combine the power 
of highly advanced algorithms, the freedom of open 
source, and the capacity of truly scalable in-memory 
processing for big data on one or many nodes. Combined,
these capabilities make it faster, easier, and
more cost effective to harness big data to maximum 
benefit for the business. 
With H2O, you can: 
• Make better predictions. Harness sophisticated, 
ready-to-use algorithms and the processing 
power you need to analyze bigger data sets, 
more models, and more variables. 
• Get started with minimal effort and investment.
H2O is an extensible open source platform
that offers the most pragmatic way to put big 
data to work for your business. With H2O, you 
can work with your existing languages and tools. 
Further, you can extend the platform seamlessly 
into your Hadoop environments. 
Churn Prediction 
• Banking. What are the profiles and usage 
patterns of customers who are most likely to 
defect? 
• Online retail. What are the leading indicators 
and patterns of behavior to predict customer 
churn? Predict the segment of customers most 
likely to churn, and when, 
in order to intercept it and 
change their behavior. 
Fraud Prediction 
• Payment processor. 
Predict fraudulent activity 
using anomaly detection 
methods. 
• Insurance. Stop fraud 
before claims are paid 
using real time scoring. 
Identify repeat offenders 
and score incoming claims 
based on fraudulent history
patterns.
Scoring Engine 
• Score customers based on purchase history and 
analyze the lifetime value of key accounts to 
discover upsell and cross-sell opportunities. 
Pricing Engine 
• Travel. Analyze different cost and promotional 
packages to create the most competitive combination
of services.
• Healthcare. Discover new insights and create 
competitive services and healthcare programs 
by analyzing patient attributes, including environment,
lifestyle and medical history.
Forecast 
• Real estate. Predict property value and forecast 
sales by neighborhoods and regional variables. 
Analyze larger nationwide datasets vs. smaller 
sample sets to realize greater accuracy and find 
previously unnoticed patterns. 
Key Benefits 
BETTER PREDICTIONS 
• Ready-to-use, powerful algorithms for 
regression, classification, clustering, and 
deep learning, along with advanced capabilities
for churn prediction, recommendations,
fraud prediction, and more.
SPEED 
• In-memory processing provides real-time 
responsiveness and enables you to run 
more models. 
• Fine-grain parallel distribution on big 
data—enabling accurate computations 
across one or many nodes by moving the 
code to the data. 
EASE OF USE 
• Easy set up and use, either through an 
intuitive Web interface or your existing 
tools, including R, Java, Scala, and Python. 
• Model export in plain Java code for real-time 
scoring in any environment. 
EXTENSIBILITY 
• Seamless Hadoop integration with distributed
data ingestion from HDFS and S3.
Algorithms 
EXPLORATORY DATA ANALYTICS (EDA) 
• Summary* 
• K-Means* 
• PCA* 
• Data Munging / Transformation* 
* Supported in R 
ADVANCED ALGORITHMS 
• Generalized Linear Model (GLM)— 
Poisson, Gamma Tweedie, binomial (logit), 
Gaussian* 
• Random Forest* 
• Gradient Boosted Regression* 
• Gradient Boosted Classification* 
* Low Latency Java Scoring 
SCORING AND PREDICTION ENGINES 
• GLM 
• Random Forest 
• Gradient Boosted Regression 
• Gradient Boosted Classification 
• K-Means 
DEEP LEARNING 
• Neural Networks 
H2O 
The Open Source In-Memory Prediction Engine 
What can you do with better 
predictions? Expect more 
from your data. 
Customers 
spot a job that should 
be stopped and more 
quickly iterate to find 
the optimal approach. 
Native R and 
Seamless 
Hadoop 
Integration 
H2O can run as a 
standalone platform 
or within an existing 
Hadoop installation, 
bringing in-memory 
performance to 
Hadoop. H2O works 
with data in HDFS and 
supports familiar programming
tools, such
as Hive and Pig. In 
addition, the solution 
can be efficiently run 
in Amazon Web Services
environments.
Fine-Grain Distributed 
Processing on Big Data at 
Speeds Up to 100x Faster 
Faster H2O lets you model interactively using 
in-memory processing, and delivers parallel
distributed scalability required to support
your big data production environments. The 
solution combines the responsiveness of in-memory 
processing with the ability to run fast 
serialization between nodes and clusters—so 
you can support the size requirements of 
your large data sets. Further, H2O does this 
distributed processing with fine-grain parallelism,
which enables optimal efficiency, without
introducing degradation in computational 
accuracy. 
Join the H2O Movement 
H2O brings better algorithms to big data. 
H2O is a fast open source in-memory prediction
engine and machine learning platform.
With H2O enterprises can use all of their data 
(instead of sampling) in real-time for better 
predictions. Users can model data quickly and 
make better data-driven decisions faster by 
running advanced algorithms such as Deep 
Learning, Classification, Regression, Decision 
Trees, Forests, Gradient Boosting, GLM, PCA 
and more. Data Scientists can take both simple
and sophisticated models to production from the
same interactive platform used for modeling 
within R and JSON. 
Our earliest customers have built powerful 
domain specific predictive engines for Recommendations,
Pricing, Outlier Detection and
Fraud Prediction for Insurance and Ad Platforms.
H2O is nurturing a grassroots movement 
of math, systems and data scientists to herald 
the new wave of Discovery with Big Data 
Science. H2O is on CRN’s 10 Coolest Big Data 
Products of 2013. www.h2o.ai 
For latest features and updates, go to H2O 
Open Source Github Repository 
http://0xdata.github.io/h2o/ 
H2O Billion Row Machine Learning Benchmark: GLM Logistic Regression
Configurations shown: Hadoop/Mahout, H2O 16 EC2 nodes, H2O 48 EC2 nodes
Timings shown:
34.9 sec, 3 iterations, numerical and categorical
16.5 sec, 2 iterations, numerical
14.2 sec, 3 iterations, numerical and categorical
5.6 sec, 2 iterations, numerical
Compute Hardware: AWS EC2 c3.2xlarge - 8 cores and 15 GB per node, 1 GbE interconnect
Airline Dataset 1987-2013, 42 GB CSV, 1 billion rows, 12 input columns, 1 outcome column
9 numerical features, 3 categorical features with cardinalities 30, 376 and 380
Work with R, Familiar Tools and 
Intuitive Interfaces 
Through its intuitive Web interface and integration
with common tools, H2O makes it fast
and easy to get started with big data analytics.
The solution works seamlessly with R and 
R Studio. For example, using the R interface, 
you can forward workflows to H2O for big data 
processing, and work in a familiar interface 
while running algorithms on data sets that are 
hundreds of times larger than what would be 
possible on a user machine. H2O also features 
native support for Java, Scala, and Python. 
The solution’s interface is driven by JSON APIs, 
which makes it easy to plug into your organization's
existing tools and processes to train your
data and continuously improve your models 
and predictive accuracy. 
In-Memory Processing 
Responsiveness 
With H2O, your organization can harness the 
responsiveness of highly optimized in-memory 
processing, so you can operationalize many 
more models and gain real-time intelligence in 
business transactions and interactions. With 
model export as plain Java code, you gain lightning
fast real-time scoring in any environment.
In addition, the solution enables data scientists 
to view partial query results while longer processes
are running, so they can immediately
Session 1: Bayesian
approximation
Spatial analysis
Session 2 
R For Improving Consumer Engagement and Health Outcomes 
ActiveHealth Management, Inc 
internal member data + externally-purchased lifestyle & behavioral
data
K-Means Clustering and CART classification trees
Response rate
Shiny: R made interactive 
http://shiny.rstudio.com/gallery/kmeans-example.html 
Fostering the next generation of open science with R 
http://ropensci.org/ 
Invited Talk 
Martin Maechler (Math, Zurich) - Good Practices in R Programming
(ex: <- VS =..)
Session 3 
dplyr, data.table: High performance in data step
PivotalR: A Package for Machine Learning on Big Data 
https://github.com/gopivotal/PivotalR 
Poster Session 1: 28 posters 
Reproducible Research in Public Health 
Jinseob Kim1, Joohon Sung1,*
1. Complex Disease and Genetic Epidemiology Branch, Graduate School of Public Health, Seoul National University 
Objectives 
In this study, we aimed to construct pipelines 
of reproducible statistical analysis in health 
research. The development of pipelines in this 
study consists of 
• Automatic suggestions of a summary table 
describing the general characteristics of the 
study 
• Univariate analysis of both explanatory and 
outcome variables of a study 
• Graphical presentations of summary and 
univariate analyses 
• Automatic analysis and tabulations of main 
results 
based on frequently used analytical methods 
in the health research area (e.g., multiple regression,
logistic regression, survival analysis,
multilevel analysis, genome-wide association
study (GWAS)).
Background 
Most health researchers who have learned and applied
statistical methods properly have spent a long time
learning statistics. Additionally, statistical
methods are ever evolving, and updating
their knowledge and analytic skills requires the
most precious resource: time.
Methods 
With the xtable package in R and tex4ht in LaTeX,
researchers get a PDF or odt (OpenDocument
Text) version of descriptive statistics once they select
data, variables (factor or numeric), and strata variables
(sex, region, etc.) [1, 2]. Next, among various
analyses, we present a GWAS example using the
FASTA method in the GenABEL R package or the fastassoc
method in MERLIN. Once they select lists of
phenotypes, covariates, and a kinship matrix, PDF
or odt files including each phenotype's name, narrow-sense
heritability, top SNPs in the GWAS results,
QQ plot, and Manhattan plot are created
automatically [3, 4]. The example dataset is the TG and LDL
phenotypes and chromosome 21 in the Healthy Twin
Study, Korea [5].
Key Methods 
Researchers can obtain tables and figures once they select the data set and the dependent variables of interest,
and define the nature of each variable (e.g., continuous, binomial, count), the explanatory variables, and
the group variable (e.g., sex, region, unit of random effects or family structure).
Example: GWAS-TG 
Table : PDF file- Descriptive statistics of TG
Variable: Mean (SD) or N (%)   Male             Female           P-value   P-value:NP
Age                            44.6 (13.6)      43.95 (12.73)    0.208     0.294
Smoke                                                            < 0.001   < 0.001
  No                           278 (26.18)      1506 (90.29)
  Past                         290 (27.31)      53 (3.18)
  Current                      494 (46.52)      109 (6.53)
TG                             140.34 (92.34)   98.8 (62.82)     < 0.001   < 0.001
NP: non-parametric
Figure : PDF file- GWAS results of TG 
TG (h2 = 0.48) 
SNP Chromosome Position A1 A2 N MAF B FASTA SE FASTA P FASTA B fassoc SE fassoc P fassoc 
rs12626621 21 23496579 C T 1838 0.167 13.61 3.46 8.41E-05 -12.82 3.68 5.00E-04 
rs1702393 21 30866791 C T 1832 0.353 10.67 2.72 8.76E-05 -11.51 2.96 9.99E-05 
rs12627596 21 30867101 T A 1840 0.352 10.06 2.72 2.10E-04 -11.06 2.95 1.81E-04 
rs128592 21 30878280 C T 1841 0.352 10.04 2.72 2.18E-04 -11.06 2.95 1.80E-04 
rs198935 21 30867998 C T 1799 0.358 10.13 2.74 2.19E-04 -11.42 2.98 1.26E-04 
rs11702393 21 39304254 G A 1832 0.297 10.71 2.91 2.35E-04 -12.55 3.13 6.23E-05 
rs382004 21 19226026 T C 1840 0.035 25.36 7.00 2.92E-04 -22.86 7.63 2.75E-03 
rs1888516 21 41219289 G A 1837 0.014 38.42 10.61 2.93E-04 -34.38 11.09 1.92E-03 
rs9306107 21 44798662 G A 1827 0.008 51.02 14.36 3.82E-04 -45.77 15.16 2.53E-03 
rs426803 21 19225690 T C 1839 0.037 24.02 6.78 3.98E-04 -22.16 7.38 2.67E-03 
rs198936 21 30869966 C T 1837 0.351 9.41 2.73 5.54E-04 -10.69 2.96 3.10E-04 
rs1702405 21 30914735 A G 1841 0.493 -9.05 2.63 5.86E-04 10.97 2.86 1.23E-04 
rs2257149 21 36213059 G A 1806 0.360 -9.39 2.75 6.24E-04 10.86 3.00 2.96E-04 
rs174897 21 30919037 A G 1835 0.492 -8.74 2.63 8.89E-04 10.62 2.85 1.92E-04 
rs198871 21 30932579 G A 1819 0.492 -8.76 2.64 8.91E-04 10.42 2.85 2.59E-04 
Table 1: GWAS: TG 
(a) QQplot-FASTA: TG (b) QQplot-Fassoc: TG 
Figure 1: QQplot: TG 
(a) Manhattan plot-FASTA: TG 
(b) Manhattan plot-Fassoc: TG 
Figure 2: Manhattan plot: TG 
Example: GWAS:LDL 
Table : PDF file- Descriptive statistics of LDL
Variable: Mean (SD) or N (%)   Male             Female           P-value   P-value:NP
Age                            44.6 (13.6)      43.95 (12.73)    0.208     0.294
FBS                            97.12 (19.67)    91.36 (16.54)    < 0.001   < 0.001
tCholesterol                   191.08 (34.99)   188.63 (35.83)   0.077     0.031
HDL                            46.45 (11.24)    52.26 (12.72)    < 0.001   < 0.001
LDL                            112.84 (30.7)    108.76 (30.3)    < 0.001   < 0.001
NP: non-parametric
Figure : PDF file- GWAS results of LDL 
LDL (h2 = 0.47) 
SNP Chromosome Position A1 A2 N MAF B FASTA SE FASTA P FASTA B fassoc SE fassoc P fassoc 
rs4818418 21 18802344 C T 1823 0.276 4.87 1.13 1.53E-05 -4.54 1.22 1.93E-04 
rs1735790 21 18788001 A G 1792 0.274 4.90 1.15 2.10E-05 -5.20 1.24 2.82E-05 
rs2824856 21 18793090 G A 1811 0.278 4.82 1.14 2.21E-05 -4.71 1.22 1.15E-04 
rs2824898 21 18813513 T C 1841 0.280 4.70 1.12 2.50E-05 -4.46 1.21 2.34E-04 
rs2252190 21 18792169 A G 1839 0.277 4.74 1.13 2.54E-05 -4.81 1.22 7.94E-05 
rs2824899 21 18813559 T C 1840 0.280 4.69 1.12 2.68E-05 -4.46 1.21 2.34E-04 
rs914244 21 46340779 C T 1839 0.408 -4.38 1.05 2.88E-05 4.78 1.15 3.07E-05 
rs2824857 21 18793134 G T 1837 0.279 4.67 1.12 3.26E-05 -4.71 1.22 1.09E-04 
rs2026211 21 18806617 C T 1842 0.276 4.64 1.12 3.51E-05 -4.38 1.21 3.10E-04 
rs2824880 21 18807606 A G 1842 0.276 4.64 1.12 3.51E-05 -4.38 1.21 3.10E-04 
rs2838534 21 44498077 T C 1811 0.352 3.94 1.06 2.04E-04 -4.69 1.16 5.22E-05 
rs456164 21 27761178 T C 1842 0.395 -3.82 1.05 2.85E-04 4.46 1.14 9.88E-05 
Table 1: GWAS: LDL 
(a) QQplot-FASTA: LDL (b) QQplot-Fassoc: LDL 
Figure 1: QQplot: LDL 
(a) Manhattan plot-FASTA: LDL 
(b) Manhattan plot-Fassoc: LDL 
Figure 2: Manhattan plot: LDL 
Conclusion 
Using the xtable package in R, LaTeX, and the tex4ht package
in LaTeX together with various statistical packages in
R, we developed a tool that automatically produces text describing the
result tables and figures directly in PDF or OpenDocument
format [1, 2]. Though we presented
only descriptive statistics and GWAS examples,
pipelines for other analyses (e.g., survival analysis,
multilevel analysis, etc.) were also built using
similar packages and some additional
packages. These automated statistical pipeline tools
will help individual researchers in health-related or
broader areas reduce their analytical burden
and conduct appropriate statistical
analyses faster and more reliably.
References 
[1] David B. Dahl. xtable: Export tables to LaTeX or HTML, 
2014. R package version 1.7-3. 
[2] Emma Cliffe. Methods to produce flexible and accessible 
learning resources in mathematics: overview document. 
2012. 
[3] GenABEL project developers. GenABEL: genome-wide 
SNP association analysis, 2013. R package version 1.8-0. 
[4] Wei-Min Chen and Gonçalo R Abecasis. Family-based
association tests for genomewide association scans. The 
American Journal of Human Genetics, 81(5):913–926, 
2007. 
[5] Joohon Sung, Sung-Il Cho, Yun-Mi Song, Kayoung Lee,
Eun-Young Choi, Mina Ha, Jihae Kim, Ho Kim, Yeonju
Kim, Eun-Kyung Shin, et al. Do we need more twin studies?
The Healthy Twin Study, Korea. International Journal
of Epidemiology, 35(2):488–490, 2006.
Contact Information 
• Web: http://snugepi.snu.ac.kr
• Email: kimjinseob@snu.ac.kr 
• Phone: +82-2-880-2743 
Visualization
3rd day 
Invited Talk 
Dirk Eddelbuettel - R, C++ and Rcpp 
Rcpp: R + cpp speed
cpp functions from R (no pointers needed)
R functions in cpp code
Docker: virtual machine
Conventional virtual machine: program + library + OS
Docker: program + library only
Only for Linux
Sponsors Talk 
Revolution Analytics
Oracle: ROracle
R + Oracle database
Google
No SAS.
Session 4 
R in the Midst of Exploding Stars: Distributed, Time-Domain Transient
Classification
NASA uses R too.
machine learning (clustering..) is used.
Imputation of Missing Values with the R Package VIM
Imputation methods
PSAboot: An R Package for Bootstrapping Propensity Score Analysis 
http://github.com/jbryer/psa 
Risk of bias due to unobserved covariates
PSA with bootstrap
Permutation Tests in Multidimensional Scaling
smacof package in R
Permutation test: dissimilarity randomized (null) VS the data
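The logic of a permutation test (compare the observed statistic with its distribution under random relabelling) can be sketched with a generic two-sample example. This is a minimal illustration in Python, not the multidimensional scaling procedure in smacof; the function name and the choice of the mean difference as the statistic are assumptions made for the example.

```python
import random


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(sum(x) / len(x) - sum(y) / len(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        # Under the null hypothesis the group labels are exchangeable,
        # so shuffling the pooled values simulates the null distribution.
        rng.shuffle(pooled)
        px, py = pooled[:len(x)], pooled[len(x):]
        if abs(sum(px) / len(px) - sum(py) / len(py)) >= observed:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    group_a = [10.1, 11.3, 9.8, 10.7, 11.0, 10.4, 9.9, 10.8]
    group_b = [0.2, 1.1, 0.7, 0.4, 1.3, 0.9, 0.5, 0.8]
    print(permutation_p_value(group_a, group_b))
```

The same pattern applies to other statistics: replace the mean difference with the quantity of interest and recompute it on each shuffled data set.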
Invited Talk 
David Diez(Openintro) - Textbooks struggle where software 
succeeds 
http://www.openintro.org 
Open source textbook: paperback  $10 
Labs, videos, for teachers(slides..) 
statsTeachR.org, OpenStaxCollege.org, 
https://www.coursera.org 
Session 5: Biology/Ecology 
Simulating Influenza Transmission with Real Network Data
Network data: High school, chip??, statnet package: graph 
Simulation: various incubation period, infection period, transmission probability...
Enhancing Medical Reporting by Combining Electronic Health Records with 
REDCap: Applications of the REDCap API 
http://www.project-redcap.org 
No need for one-by-one chart review when collecting data.
Simulations for regulatory decision making: How many simulations do we need to 
run? 
Simulation (ex: FDA).
Simulation is needed for complex models (not analytically tractable).
1,000 or 10,000 runs... R
parallel computing...
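The question in the talk title can be made concrete with the standard error of a Monte Carlo estimate. The sketch below is in Python and assumes the quantity of interest is a proportion (for example an estimated error rate) computed from independent simulation runs; the function names are hypothetical and do not come from the talk.

```python
import math


def mc_standard_error(p, n_runs):
    """Standard error of a proportion estimated from n_runs independent runs."""
    return math.sqrt(p * (1.0 - p) / n_runs)


def required_runs(p, target_se):
    """Smallest number of runs whose standard error does not exceed target_se."""
    return math.ceil(p * (1.0 - p) / target_se ** 2)


if __name__ == "__main__":
    # The precision gain from 1,000 to 10,000 runs is only a factor of sqrt(10).
    for n in (1_000, 10_000, 1_000_000):
        print(n, mc_standard_error(0.05, n))
    print(required_runs(0.05, 0.001))
```

Because the error shrinks only with the square root of the number of runs, tight precision targets quickly push the run count into ranges where parallel computing pays off.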
Monitoring Patients with Ongoing Reduced Kidney Function 
 
Batter Up! Advanced Sports Analytics with R and Storm
Batter Up! Advanced Sports Analytics with R and StormBatter Up! Advanced Sports Analytics with R and Storm
Batter Up! Advanced Sports Analytics with R and StormRevolution Analytics
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopRevolution Analytics
 
20160317 - PAZUR - PowerBI & R
20160317  - PAZUR - PowerBI & R20160317  - PAZUR - PowerBI & R
20160317 - PAZUR - PowerBI & RŁukasz Grala
 
An introduction to R is a document useful
An introduction to R is a document usefulAn introduction to R is a document useful
An introduction to R is a document usefulssuser3c3f88
 
Bitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FSBitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FSPhilip Filleul
 
Open source analytics
Open source analyticsOpen source analytics
Open source analyticsAjay Ohri
 
Big data: Descoberta de conhecimento em ambientes de big data e computação na...
Big data: Descoberta de conhecimento em ambientes de big data e computação na...Big data: Descoberta de conhecimento em ambientes de big data e computação na...
Big data: Descoberta de conhecimento em ambientes de big data e computação na...Rio Info
 
Venkata Sateesh_BigData_Latest-Resume
Venkata Sateesh_BigData_Latest-ResumeVenkata Sateesh_BigData_Latest-Resume
Venkata Sateesh_BigData_Latest-Resumevenkata sateeshs
 
DataMass Summit - Machine Learning for Big Data in SQL Server
DataMass Summit - Machine Learning for Big Data  in SQL ServerDataMass Summit - Machine Learning for Big Data  in SQL Server
DataMass Summit - Machine Learning for Big Data in SQL ServerŁukasz Grala
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopRevolution Analytics
 
Performance and Scale Options for R with Hadoop: A comparison of potential ar...
Performance and Scale Options for R with Hadoop: A comparison of potential ar...Performance and Scale Options for R with Hadoop: A comparison of potential ar...
Performance and Scale Options for R with Hadoop: A comparison of potential ar...Revolution Analytics
 
Analytics Beyond RAM Capacity using R
Analytics Beyond RAM Capacity using RAnalytics Beyond RAM Capacity using R
Analytics Beyond RAM Capacity using RAlex Palamides
 
Scientific
Scientific Scientific
Scientific marpierc
 

Similar to useR 2014 jskim (20)

Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
 
microsoft r server for distributed computing
microsoft r server for distributed computingmicrosoft r server for distributed computing
microsoft r server for distributed computing
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and Hadoop
 
BIG DATA and USE CASES
BIG DATA and USE CASESBIG DATA and USE CASES
BIG DATA and USE CASES
 
Batter Up! Advanced Sports Analytics with R and Storm
Batter Up! Advanced Sports Analytics with R and StormBatter Up! Advanced Sports Analytics with R and Storm
Batter Up! Advanced Sports Analytics with R and Storm
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and Hadoop
 
20160317 - PAZUR - PowerBI & R
20160317  - PAZUR - PowerBI & R20160317  - PAZUR - PowerBI & R
20160317 - PAZUR - PowerBI & R
 
An introduction to R is a document useful
An introduction to R is a document usefulAn introduction to R is a document useful
An introduction to R is a document useful
 
Bitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FSBitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FS
 
Open source analytics
Open source analyticsOpen source analytics
Open source analytics
 
Big data: Descoberta de conhecimento em ambientes de big data e computação na...
Big data: Descoberta de conhecimento em ambientes de big data e computação na...Big data: Descoberta de conhecimento em ambientes de big data e computação na...
Big data: Descoberta de conhecimento em ambientes de big data e computação na...
 
Venkata Sateesh_BigData_Latest-Resume
Venkata Sateesh_BigData_Latest-ResumeVenkata Sateesh_BigData_Latest-Resume
Venkata Sateesh_BigData_Latest-Resume
 
DataMass Summit - Machine Learning for Big Data in SQL Server
DataMass Summit - Machine Learning for Big Data  in SQL ServerDataMass Summit - Machine Learning for Big Data  in SQL Server
DataMass Summit - Machine Learning for Big Data in SQL Server
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and Hadoop
 
Building a Scalable Data Science Platform with R
Building a Scalable Data Science Platform with RBuilding a Scalable Data Science Platform with R
Building a Scalable Data Science Platform with R
 
Decision trees in hadoop
Decision trees in hadoopDecision trees in hadoop
Decision trees in hadoop
 
Performance and Scale Options for R with Hadoop: A comparison of potential ar...
Performance and Scale Options for R with Hadoop: A comparison of potential ar...Performance and Scale Options for R with Hadoop: A comparison of potential ar...
Performance and Scale Options for R with Hadoop: A comparison of potential ar...
 
Analytics Beyond RAM Capacity using R
Analytics Beyond RAM Capacity using RAnalytics Beyond RAM Capacity using R
Analytics Beyond RAM Capacity using R
 
Scientific
Scientific Scientific
Scientific
 
Michal Marušan: Scalable R
Michal Marušan: Scalable RMichal Marušan: Scalable R
Michal Marušan: Scalable R
 

More from Jinseob Kim

Unsupervised Deep Learning Applied to Breast Density Segmentation and Mammogr...
Unsupervised Deep Learning Applied to Breast Density Segmentation and Mammogr...Unsupervised Deep Learning Applied to Breast Density Segmentation and Mammogr...
Unsupervised Deep Learning Applied to Breast Density Segmentation and Mammogr...Jinseob Kim
 
Fst, selection index
Fst, selection indexFst, selection index
Fst, selection indexJinseob Kim
 
Why Does Deep and Cheap Learning Work So Well
Why Does Deep and Cheap Learning Work So WellWhy Does Deep and Cheap Learning Work So Well
Why Does Deep and Cheap Learning Work So WellJinseob Kim
 
New Epidemiologic Measures in Multilevel Study: Median Risk Ratio, Median Haz...
New Epidemiologic Measures in Multilevel Study: Median Risk Ratio, Median Haz...New Epidemiologic Measures in Multilevel Study: Median Risk Ratio, Median Haz...
New Epidemiologic Measures in Multilevel Study: Median Risk Ratio, Median Haz...Jinseob Kim
 
Win Above Replacement in Sabermetrics
Win Above Replacement in SabermetricsWin Above Replacement in Sabermetrics
Win Above Replacement in SabermetricsJinseob Kim
 
Regression Basic : MLE
Regression  Basic : MLERegression  Basic : MLE
Regression Basic : MLEJinseob Kim
 
Selection index population_genetics
Selection index population_geneticsSelection index population_genetics
Selection index population_geneticsJinseob Kim
 
질병부담계산: Dismod mr gbd2010
질병부담계산: Dismod mr gbd2010질병부담계산: Dismod mr gbd2010
질병부담계산: Dismod mr gbd2010Jinseob Kim
 
Deep Learning by JSKIM (Korean)
Deep Learning by JSKIM (Korean)Deep Learning by JSKIM (Korean)
Deep Learning by JSKIM (Korean)Jinseob Kim
 
Machine Learning Introduction
Machine Learning IntroductionMachine Learning Introduction
Machine Learning IntroductionJinseob Kim
 

More from Jinseob Kim (11)

Unsupervised Deep Learning Applied to Breast Density Segmentation and Mammogr...
Unsupervised Deep Learning Applied to Breast Density Segmentation and Mammogr...Unsupervised Deep Learning Applied to Breast Density Segmentation and Mammogr...
Unsupervised Deep Learning Applied to Breast Density Segmentation and Mammogr...
 
Fst, selection index
Fst, selection indexFst, selection index
Fst, selection index
 
Why Does Deep and Cheap Learning Work So Well
Why Does Deep and Cheap Learning Work So WellWhy Does Deep and Cheap Learning Work So Well
Why Does Deep and Cheap Learning Work So Well
 
New Epidemiologic Measures in Multilevel Study: Median Risk Ratio, Median Haz...
New Epidemiologic Measures in Multilevel Study: Median Risk Ratio, Median Haz...New Epidemiologic Measures in Multilevel Study: Median Risk Ratio, Median Haz...
New Epidemiologic Measures in Multilevel Study: Median Risk Ratio, Median Haz...
 
Win Above Replacement in Sabermetrics
Win Above Replacement in SabermetricsWin Above Replacement in Sabermetrics
Win Above Replacement in Sabermetrics
 
Regression Basic : MLE
Regression  Basic : MLERegression  Basic : MLE
Regression Basic : MLE
 
Selection index population_genetics
Selection index population_geneticsSelection index population_genetics
Selection index population_genetics
 
질병부담계산: Dismod mr gbd2010
질병부담계산: Dismod mr gbd2010질병부담계산: Dismod mr gbd2010
질병부담계산: Dismod mr gbd2010
 
Deep Learning by JSKIM (Korean)
Deep Learning by JSKIM (Korean)Deep Learning by JSKIM (Korean)
Deep Learning by JSKIM (Korean)
 
Machine Learning Introduction
Machine Learning IntroductionMachine Learning Introduction
Machine Learning Introduction
 
Main result
Main result Main result
Main result
 

Recently uploaded

Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...amitlee9823
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...amitlee9823
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...only4webmaster01
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 

Recently uploaded (20)

Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 

useR 2014 jskim

  • 1. The R User Conference 2014 (useR 2014). Jinseob Kim. UCLA in LA, 6.30 - 7.3
  • 2. What is useR? Contents: 1 What is useR? 2 1st day: Tutorial (Applied Predictive Modeling in R; Graphical Models and Bayesian Networks with R). 3 2nd day. 4 3rd day. 5 4th day.
  • 3. What is useR? Since 2004... Main meeting of the R user and developer community. Invited keynote lectures: a broad spectrum of topics ranging from technical and R-related computing issues to general statistical topics of current interest. User-contributed presentations.
  • 4. What is useR?
  • 5. What is useR? Figure. Afternoon tutorial: dplyr
  • 6. 1st day: Tutorial. Contents: 1 What is useR? 2 1st day: Tutorial (Applied Predictive Modeling in R; Graphical Models and Bayesian Networks with R). 3 2nd day. 4 3rd day. 5 4th day.
  • 7. 1st day: Tutorial. List: http://user2014.stat.ucla.edu/#tutorials
  • 8. 1st day: Tutorial, Applied Predictive Modeling in R. Max Kuhn, Ph.D., Pfizer Global R&D. http://appliedpredictivemodeling.com. caret package in R. Outline: Conventions in R; Data Splitting and Estimating Performance; Data Pre-Processing; Over-Fitting and Resampling; Training and Tuning Tree Models; Training and Tuning a Support Vector Machine; Comparing Models; Parallel Processing.
  • 10. 1st day: Tutorial, Applied Predictive Modeling in R. 1 Interpretation vs. prediction. Trade-off: logistic regression is easy to interpret (odds ratio), but there is no need to insist on it. Transform variables as needed (ex: log transformation, scale, centering). R², AIC, p-value, ... vs. cross validation, bootstrapping, sampling, ROC curve. 2 Supervised machine learning. Regression: simple, glm, PCA, penalized (Ridge, Lasso, elastic-net), ... Classification: K-Nearest Neighbors, trees. Common: Boosting, Support Vector Machine (SVM). 3 Parallel Processing.
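
The slide's contrast between in-sample criteria (R², AIC, p-value) and resampling-based estimates such as cross validation can be illustrated with a minimal sketch. It is written in plain Python rather than with the caret package, and everything in it is an assumption made for illustration: the synthetic data, the straight-line model, and the helper names `fit_line`, `rmse` and `k_fold_rmse` do not come from the tutorial.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(model, xs, ys):
    """Root mean squared error of a fitted (a, b) line on the given points."""
    a, b = model
    squared = [(y - (a + b * x)) ** 2 for x, y in zip(xs, ys)]
    return (sum(squared) / len(squared)) ** 0.5


def k_fold_rmse(xs, ys, k=5, seed=0):
    """Average held-out RMSE over k folds (each point is held out exactly once)."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(rmse(model, [xs[i] for i in held_out], [ys[i] for i in held_out]))
    return sum(scores) / k


# Synthetic data, made up for this sketch: y = 2 + 3x + Gaussian noise.
rng = random.Random(42)
xs = [i / 10 for i in range(100)]
ys = [2 + 3 * x + rng.gauss(0, 1) for x in xs]

in_sample = rmse(fit_line(xs, ys), xs, ys)   # error on the data used for fitting
cv = k_fold_rmse(xs, ys, k=5)                # error estimated on held-out folds
print(f"in-sample RMSE: {in_sample:.3f}, 5-fold CV RMSE: {cv:.3f}")
```

The in-sample number is computed on the same points the model was fitted to, while the cross-validated number only ever scores points the model has not seen, which is the property that makes it a fairer estimate of predictive performance.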
  • 12. 1st day: Tutorial, Applied Predictive Modeling in R. Immediate and longer-term tasks. 1 Supervised learning: can be applied right away. Can R handle our data??? 2 Unsupervised learning. Deep learning: deep neural network (DNN). ex) speech recognition. Online lecture: https://class.coursera.org/neuralnets-2012-001/lecture
  • 13. 1st day: Tutorial, Graphical Models and Bayesian Networks with R. Topics: probability propagation with Bayesian networks (BNs) and their implementation in the gRain (gRaphical independence networks) package; a look under the hood of BNs to understand the mechanisms of probability propagation; dependency graphs and conditional independence restrictions; log-linear models, graphical models, decomposable models and their implementation in the gRim (gRaphical independence models) package; model selection with gRim; converting a decomposable graphical model to a Bayesian network.
  • 14. 1st day: Tutorial, Graphical Models and Bayesian Networks with R. The chest clinic narrative: $p(V) = p(a)\,p(t \mid a)\,p(s)\,p(l \mid s)\,p(b \mid s)\,p(e \mid t, l)\,p(d \mid e, b)\,p(x \mid e)$
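
To make the chest clinic factorization concrete, here is a minimal Python sketch that evaluates the joint distribution as the product of the eight factors and answers queries by brute-force enumeration. Several things are assumptions, not taken from the slides: the reading of the node letters (a: visit to Asia, t: tuberculosis, s: smoker, l: lung cancer, b: bronchitis, e: either t or l, d: dyspnoea, x: positive X-ray), all probability values, and the function names. Enumeration is also a swapped-in technique chosen for brevity; it is not the probability propagation algorithm that the tutorial covers.

```python
from itertools import product

# Probability that each node is "yes" given its parents.
# All numbers are illustrative assumptions, not values from the slides.
P_A = 0.01
P_T_GIVEN_A = {True: 0.05, False: 0.01}
P_S = 0.5
P_L_GIVEN_S = {True: 0.10, False: 0.01}
P_B_GIVEN_S = {True: 0.60, False: 0.30}
P_X_GIVEN_E = {True: 0.98, False: 0.05}
P_D_GIVEN_EB = {
    (True, True): 0.9, (True, False): 0.7,
    (False, True): 0.8, (False, False): 0.1,
}

NAMES = ["a", "t", "s", "l", "b", "e", "d", "x"]


def bern(p_yes, value):
    """Probability of a binary value when P(yes) = p_yes."""
    return p_yes if value else 1.0 - p_yes


def p_e(e, t, l):
    """Assumed deterministic node: e is 'yes' exactly when t or l is 'yes'."""
    return 1.0 if e == (t or l) else 0.0


def joint(a, t, s, l, b, e, d, x):
    """p(V) = p(a) p(t|a) p(s) p(l|s) p(b|s) p(e|t,l) p(d|e,b) p(x|e)."""
    return (bern(P_A, a) * bern(P_T_GIVEN_A[a], t) * bern(P_S, s)
            * bern(P_L_GIVEN_S[s], l) * bern(P_B_GIVEN_S[s], b)
            * p_e(e, t, l) * bern(P_D_GIVEN_EB[(e, b)], d)
            * bern(P_X_GIVEN_E[e], x))


def query(name, evidence=None):
    """P(name = yes | evidence), summing the joint over all 2**8 states."""
    evidence = evidence or {}
    target = NAMES.index(name)
    numerator = denominator = 0.0
    for state in product([True, False], repeat=len(NAMES)):
        if any(state[NAMES.index(k)] != v for k, v in evidence.items()):
            continue
        p = joint(*state)
        denominator += p
        if state[target]:
            numerator += p
    return numerator / denominator


total = sum(joint(*state) for state in product([True, False], repeat=len(NAMES)))
p_l = query("l")                                # prior probability of lung cancer
p_l_given_x = query("l", evidence={"x": True})  # after observing a positive X-ray
print(f"sum of joint: {total:.6f}")
print(f"P(l): {p_l:.4f}, P(l | x): {p_l_given_x:.4f}")
```

Enumeration costs 2^8 terms here and grows exponentially with the number of nodes, which is the practical reason for exploiting the graph structure instead.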
  • 15. 2nd day. Contents: 1 What is useR? 2 1st day: Tutorial (Applied Predictive Modeling in R; Graphical Models and Bayesian Networks with R). 3 2nd day. 4 3rd day. 5 4th day.
  • 16. 2nd day. Opening Keynote: John Chambers (Stats, Stanford), "Interfaces, Efficiency and Big Data". Rcpp: C++ function → R. Rllvm, http://www.omegahat.org/Rllvm: try compiling R. h2o: Java-based machine learning for big data.
  • 17. 2nd day. H2O: The Open Source In-Memory Prediction Engine. 1185 Terra Bella Ave, Mountain View, CA 94043 | 650-429-8337 | h2o.ai
    H2O is the world's fastest in-memory platform for machine learning and predictive analytics on big data. It is the only alternative to combine the power of highly advanced algorithms, the freedom of open source, and the capacity of truly scalable in-memory processing for big data on one or many nodes. Combined, these capabilities make it faster, easier, and more cost effective to harness big data to maximum benefit for the business. With H2O, you can:
      • Make better predictions. Harness sophisticated, ready-to-use algorithms and the processing power you need to analyze bigger data sets, more models, and more variables.
      • Get started with minimal effort and investment. H2O is an extensible open source platform that offers the most pragmatic way to put big data to work for your business. With H2O, you can work with your existing languages and tools. Further, you can extend the platform seamlessly into your Hadoop environments.
    What can you do with better predictions? Expect more from your data.
      Churn Prediction
      • Banking. What are the profiles and usage patterns of customers who are most likely to defect?
      • Online retail. What are the leading indicators and patterns of behavior to predict customer churn? Predict the segment of customers most likely to churn, and when, in order to intercept it and change their behavior.
      Fraud Prediction
      • Payment processor. Predict fraudulent activity using anomaly detection methods.
      • Insurance. Stop fraud before claims are paid using real time scoring. Identify repeat offenders and score incoming claims based on fraudulent history patterns.
      Scoring Engine
      • Score customers based on purchase history and analyze the lifetime value of key accounts to discover upsell and cross-sell opportunities.
      Pricing Engine
      • Travel. Analyze different cost and promotional packages to create the most competitive combination of services.
      • Healthcare. Discover new insights and create competitive services and healthcare programs by analyzing patient attributes, including environment, lifestyle and medical history.
      Forecast
      • Real estate. Predict property value and forecast sales by neighborhoods and regional variables. Analyze larger nationwide datasets vs. smaller sample sets to realize greater accuracy and find previously unnoticed patterns.
    Key Benefits
      BETTER PREDICTIONS
      • Ready-to-use, powerful algorithms for regression, classification, clustering, and deep learning, along with advanced capabilities for churn prediction, recommendations, fraud prediction, and more.
      SPEED
      • In-memory processing provides real-time responsiveness and enables you to run more models.
      • Fine-grain parallel distribution on big data, enabling accurate computations across one or many nodes by moving the code to the data.
      EASE OF USE
      • Easy set up and use, either through an intuitive Web interface or your existing tools, including R, Java, Scala, and Python.
      • Model export in plain Java code for real-time scoring in any environment.
      EXTENSIBILITY
      • Seamless Hadoop integration with distributed data ingestion from HDFS and S3.
    Algorithms
      EXPLORATORY DATA ANALYTICS (EDA)
      • Summary* • K-Means* • PCA* • Data Munging / Transformation*
      * Supported in R
      ADVANCED ALGORITHMS
      • Generalized Linear Model (GLM): Poisson, Gamma, Tweedie, binomial (logit), Gaussian* • Random Forest* • Gradient Boosted Regression* • Gradient Boosted Classification*
      * Low Latency Java Scoring
      SCORING AND PREDICTION ENGINES
      • GLM • Random Forest • Gradient Boosted Regression • Gradient Boosted Classification • K-Means
      DEEP LEARNING
      • Neural Networks
    Native R and Seamless Hadoop Integration
      H2O can run as a standalone platform or within an existing Hadoop installation, bringing in-memory performance to Hadoop. H2O works with data in HDFS and supports familiar programming tools, such as Hive and Pig. In addition, the solution can be efficiently run in Amazon Web Services environments.
    Fine-Grain Distributed Processing on Big Data at Speeds Up to 100x Faster
      H2O lets you model interactively using in-memory processing, and delivers parallel distributed scalability required to support your big data production environments. The solution combines the responsiveness of in-memory processing with the ability to run fast serialization between nodes and clusters, so you can support the size requirements of your large data sets. Further, H2O does this distributed processing with fine-grain parallelism, which enables optimal efficiency, without introducing degradation in computational accuracy.
    Join the H2O Movement
      H2O brings better algorithms to big data. H2O is a fast open source in-memory prediction engine and machine learning platform. With H2O enterprises can use all of their data (instead of sampling) in real-time for better predictions. Users can model data quickly and make better data-driven decisions faster by running advanced algorithms such as Deep Learning, Classification, Regression, Decision Trees, Forests, Gradient Boosting, GLM, PCA and more. Data Scientists can take both simple and sophisticated models to production from the same interactive platform used for modeling within R and JSON. Our earliest customers have built powerful domain specific predictive engines for Recommendations, Pricing, Outlier Detection and Fraud Prediction for Insurance and Ad Platforms. H2O is nurturing a grassroots movement of math, systems and data scientists to herald the new wave of Discovery with Big Data Science. H2O is on CRN's 10 Coolest Big Data Products of 2013.
      www.h2o.ai. For latest features and updates, go to the H2O Open Source Github Repository: http://0xdata.github.io/h2o/
    H2O Billion Row Machine Learning Benchmark: GLM Logistic Regression (chart legend: Hadoop/Mahout, H2O 16 EC2 nodes, H2O 48 EC2 nodes)
      • H2O, 16 EC2 nodes: 34.9 sec (3 iterations, numerical and categorical); 16.5 sec (2 iterations, numerical)
      • H2O, 48 EC2 nodes: 14.2 sec (3 iterations, numerical and categorical); 5.6 sec (2 iterations, numerical)
      Compute hardware: AWS EC2 c3.2xlarge, 8 cores and 15 GB per node, 1 GbE interconnect.
      Airline Dataset 1987-2013: 42 GB CSV, 1 billion rows, 12 input columns, 1 outcome column; 9 numerical features, 3 categorical features with cardinalities 30, 376 and 380.
    Work with R, Familiar Tools and Intuitive Interfaces
      Through its intuitive Web interface and integration with common tools, H2O makes it fast and easy to get started with big data analytics. The solution works seamlessly with R and RStudio. For example, using the R interface, you can forward workflows to H2O for big data processing, and work in a familiar interface while running algorithms on data sets that are hundreds of times larger than what would be possible on a user machine. H2O also features native support for Java, Scala, and Python. The solution's interface is driven by JSON APIs, which makes it easy to plug into your organization's existing tools and processes to train your data and continuously improve your models and predictive accuracy.
    In-Memory Processing Responsiveness
      With H2O, your organization can harness the responsiveness of highly optimized in-memory processing, so you can operationalize many more models and gain real-time intelligence in business transactions and interactions. With model export as plain Java code, you gain lightning fast real-time scoring in any environment. In addition, the solution enables data scientists to view partial query results while longer processes are running, so they can immediately spot a job that should be stopped and more quickly iterate to find the optimal approach.
    Copyright © H2O. All rights reserved. All trademarks referenced herein belong to their respective companies.
  • 18. 2nd day Session 1: Bayesian: approximation. Spatial analysis.
  • 19. 2nd day Session 2
R For Improving Consumer Engagement and Health Outcomes (ActiveHealth Management, Inc): internal member data + externally-purchased lifestyle behavioral data; K-Means Clustering and CART classification trees; response rate.
Shiny: R made interactive: http://shiny.rstudio.com/gallery/kmeans-example.html
Fostering the next generation of open science with R: http://ropensci.org/
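The combination named in this session, K-means clustering followed by CART classification trees, can be sketched in R roughly as follows. This is an illustration on the built-in iris data, not the speakers' analysis; the choice of three clusters and the use of the rpart package for CART are assumptions.

```r
# Sketch only: K-means segments, then a CART tree that describes them.
# The data set (iris), k = 3 and rpart are illustrative assumptions.
library(rpart)  # recommended package that ships with R

set.seed(1)
x <- scale(iris[, 1:4])                      # standardise before clustering
km <- kmeans(x, centers = 3, nstart = 20)    # K-means with 20 random starts

# Use the cluster label as the outcome of a classification tree
seg <- data.frame(iris[, 1:4], segment = factor(km$cluster))
tree <- rpart(segment ~ ., data = seg, method = "class")

# How well the tree reproduces the K-means segments
pred <- predict(tree, seg, type = "class")
agreement <- mean(pred == seg$segment)
print(table(segment = seg$segment, predicted = pred))
```

Fitting a tree on the cluster labels is one way to turn segments into readable rules; whether the talk used the two methods in this order is not stated on the slide.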
  • 21. 2nd day Invited Talk: Martin Maechler (Math, Zurich), Good Practices in R Programming
  • 22. 2nd day Session 3
dplyr, data.table: High performance in data step
PivotalR: A Package for Machine Learning on Big Data: https://github.com/gopivotal/PivotalR
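As a rough illustration of the kind of "data step" these two packages target, the sketch below computes one grouped mean in base R and then, if the packages are installed, again with dplyr and data.table. The mtcars data and the chosen columns are illustrative and do not come from the talks.

```r
# Sketch: one grouped summary written in base R, dplyr and data.table.
# mtcars and the chosen columns are illustrative assumptions.
base_res <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
base_res <- base_res[order(base_res$cyl), ]

if (requireNamespace("dplyr", quietly = TRUE)) {
  dplyr_res <- dplyr::summarise(dplyr::group_by(mtcars, cyl),
                                mpg = mean(mpg))
  dplyr_res <- dplyr::arrange(dplyr_res, cyl)
  stopifnot(isTRUE(all.equal(dplyr_res$mpg, base_res$mpg)))
}

if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)
  dt <- as.data.table(mtcars)
  dt_res <- dt[, .(mpg = mean(mpg)), keyby = cyl]  # keyby sorts by cyl
  stopifnot(isTRUE(all.equal(dt_res$mpg, base_res$mpg)))
}

print(base_res)
```

All three forms return the same numbers; the packages differ in syntax and in how they scale to large tables.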
  • 23. 2nd day Poster Session 1: 28 posters
Reproducible Research in Public Health
Jinseob Kim (1), Joohon Sung (1)
(1) Complex Disease and Genetic Epidemiology Branch, Graduate School of Public Health, Seoul National University
Objectives
In this study, we aimed to construct pipelines of reproducible statistical analysis in health research. The pipelines developed in this study consist of
• Automatic suggestions of a summary table describing the general characteristics of the study
• Univariate analysis of both explanatory and outcome variables of a study
• Graphical presentations of summary and univariate analyses
• Automatic analysis and tabulations of main results based on frequently used analytical methods in the health research area (e.g., multiple regression, logistic regression, survival analysis, multilevel analysis, genome-wide association study (GWAS)).
Background
Most health researchers who have learned and applied statistical methods properly have spent a long time learning statistics. Additionally, statistical methods are ever evolving, and updating knowledge and analytic skills requires the most precious resource: time.
Methods
With the xtable package in R and tex4ht in LaTeX, researchers get a PDF or odt (OpenDocument text) version of descriptive statistics when they select the data, the variables (factor or numeric) and a strata variable (sex, region, etc.) [1, 2]. Next, among the various analyses, we present a GWAS example using the FASTA method in the GenABEL R package or the fastassoc method in MERLIN. If they select lists of phenotypes, covariates and a kinship matrix, PDF or odt files including each phenotype's name, narrow-sense heritability, top SNPs in the GWAS results, a QQ plot and a Manhattan plot are created automatically [3, 4]. The example dataset is the TG and LDL phenotypes and chromosome 21 in the Healthy Twin Study, Korea [5].
Key Methods
Researchers can obtain tables and figures if they select the data set and the dependent variables of interest, and define the nature of each variable (e.g., continuous, binomial, count), the explanatory variables, and the group variable (e.g., sex, region, unit of random effects or family structure).
Example: GWAS-TG
Table: PDF file, Descriptive statistics of TG. Values are Mean (SD) or N (%). NP: non-parametric.
Variable   Male            Female          P-value  P-value:NP
Age        44.6 (13.6)     43.95 (12.73)   0.208    0.294
Smoke                                      0.001    0.001
  No       278 (26.18)     1506 (90.29)
  Past     290 (27.31)     53 (3.18)
  Current  494 (46.52)     109 (6.53)
TG         140.34 (92.34)  98.8 (62.82)    0.001    0.001
Figure: PDF file, GWAS results of TG
TG (h2 = 0.48)
SNP         Chr  Position  A1  A2  N     MAF    B FASTA  SE FASTA  P FASTA   B fassoc  SE fassoc  P fassoc
rs12626621  21   23496579  C   T   1838  0.167  13.61    3.46      8.41E-05  -12.82    3.68       5.00E-04
rs1702393   21   30866791  C   T   1832  0.353  10.67    2.72      8.76E-05  -11.51    2.96       9.99E-05
rs12627596  21   30867101  T   A   1840  0.352  10.06    2.72      2.10E-04  -11.06    2.95       1.81E-04
rs128592    21   30878280  C   T   1841  0.352  10.04    2.72      2.18E-04  -11.06    2.95       1.80E-04
rs198935    21   30867998  C   T   1799  0.358  10.13    2.74      2.19E-04  -11.42    2.98       1.26E-04
rs11702393  21   39304254  G   A   1832  0.297  10.71    2.91      2.35E-04  -12.55    3.13       6.23E-05
rs382004    21   19226026  T   C   1840  0.035  25.36    7.00      2.92E-04  -22.86    7.63       2.75E-03
rs1888516   21   41219289  G   A   1837  0.014  38.42    10.61     2.93E-04  -34.38    11.09      1.92E-03
rs9306107   21   44798662  G   A   1827  0.008  51.02    14.36     3.82E-04  -45.77    15.16      2.53E-03
rs426803    21   19225690  T   C   1839  0.037  24.02    6.78      3.98E-04  -22.16    7.38       2.67E-03
rs198936    21   30869966  C   T   1837  0.351  9.41     2.73      5.54E-04  -10.69    2.96       3.10E-04
rs1702405   21   30914735  A   G   1841  0.493  -9.05    2.63      5.86E-04  10.97     2.86       1.23E-04
rs2257149   21   36213059  G   A   1806  0.360  -9.39    2.75      6.24E-04  10.86     3.00       2.96E-04
rs174897    21   30919037  A   G   1835  0.492  -8.74    2.63      8.89E-04  10.62     2.85       1.92E-04
rs198871    21   30932579  G   A   1819  0.492  -8.76    2.64      8.91E-04  10.42     2.85       2.59E-04
Table 1: GWAS: TG
(a) QQplot-FASTA: TG (b) QQplot-Fassoc: TG
Figure 1: QQplot: TG
(a) Manhattan plot-FASTA: TG (b) Manhattan plot-Fassoc: TG
Figure 2: Manhattan plot: TG
Example: GWAS-LDL
Table: PDF file, Descriptive statistics of LDL. Values are Mean (SD) or N (%). NP: non-parametric.
Variable      Male            Female          P-value  P-value:NP
Age           44.6 (13.6)     43.95 (12.73)   0.208    0.294
FBS           97.12 (19.67)   91.36 (16.54)   0.001    0.001
tCholesterol  191.08 (34.99)  188.63 (35.83)  0.077    0.031
HDL           46.45 (11.24)   52.26 (12.72)   0.001    0.001
LDL           112.84 (30.7)   108.76 (30.3)   0.001    0.001
Figure: PDF file, GWAS results of LDL
LDL (h2 = 0.47)
SNP        Chr  Position  A1  A2  N     MAF    B FASTA  SE FASTA  P FASTA   B fassoc  SE fassoc  P fassoc
rs4818418  21   18802344  C   T   1823  0.276  4.87     1.13      1.53E-05  -4.54     1.22       1.93E-04
rs1735790  21   18788001  A   G   1792  0.274  4.90     1.15      2.10E-05  -5.20     1.24       2.82E-05
rs2824856  21   18793090  G   A   1811  0.278  4.82     1.14      2.21E-05  -4.71     1.22       1.15E-04
rs2824898  21   18813513  T   C   1841  0.280  4.70     1.12      2.50E-05  -4.46     1.21       2.34E-04
rs2252190  21   18792169  A   G   1839  0.277  4.74     1.13      2.54E-05  -4.81     1.22       7.94E-05
rs2824899  21   18813559  T   C   1840  0.280  4.69     1.12      2.68E-05  -4.46     1.21       2.34E-04
rs914244   21   46340779  C   T   1839  0.408  -4.38    1.05      2.88E-05  4.78      1.15       3.07E-05
rs2824857  21   18793134  G   T   1837  0.279  4.67     1.12      3.26E-05  -4.71     1.22       1.09E-04
rs2026211  21   18806617  C   T   1842  0.276  4.64     1.12      3.51E-05  -4.38     1.21       3.10E-04
rs2824880  21   18807606  A   G   1842  0.276  4.64     1.12      3.51E-05  -4.38     1.21       3.10E-04
rs2838534  21   44498077  T   C   1811  0.352  3.94     1.06      2.04E-04  -4.69     1.16       5.22E-05
rs456164   21   27761178  T   C   1842  0.395  -3.82    1.05      2.85E-04  4.46      1.14       9.88E-05
Table 1: GWAS: LDL
(a) QQplot-FASTA: LDL (b) QQplot-Fassoc: LDL
Figure 1: QQplot: LDL
(a) Manhattan plot-FASTA: LDL (b) Manhattan plot-Fassoc: LDL
Figure 2: Manhattan plot: LDL
Conclusion
Using the xtable package in R and the tex4ht package in LaTeX, together with various statistical packages in R, we developed a pipeline that automatically produces text describing the results, along with tables and figures, directly in PDF or OpenDocument format [1, 2].
Though we presented only descriptive statistics and GWAS examples, pipelines for other analyses (e.g., survival analysis, multilevel analysis, etc.) were also built using the packages above and some additional packages. These automated statistical pipeline tools will help individual researchers in health-related or broader fields reduce their analytical burden and conduct appropriate statistical analyses faster and more reliably.
References
[1] David B. Dahl. xtable: Export tables to LaTeX or HTML, 2014. R package version 1.7-3.
[2] Emma Cliffe. Methods to produce flexible and accessible learning resources in mathematics: overview document. 2012.
[3] GenABEL project developers. GenABEL: genome-wide SNP association analysis, 2013. R package version 1.8-0.
[4] Wei-Min Chen and Gonçalo R. Abecasis. Family-based association tests for genomewide association scans. The American Journal of Human Genetics, 81(5):913–926, 2007.
[5] Joohon Sung, Sung-Il Cho, Yun-Mi Song, Kayoung Lee, Eun-Young Choi, Mina Ha, Jihae Kim, Ho Kim, Yeonju Kim, Eun-Kyung Shin, et al. Do we need more twin studies? The Healthy Twin Study, Korea. International Journal of Epidemiology, 35(2):488–490, 2006.
Contact Information
• Web: http://snugepi.snu.ac.kr
• Email: kimjinseob@snu.ac.kr
• Phone: +82-2-880-2743
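The descriptive tables in the poster report "Mean (SD)" by sex with a parametric and a non-parametric P-value. A minimal base R sketch of one such table row is given below. The data are simulated, the helper name describe_by_sex is hypothetical, and t.test and wilcox.test are assumed choices, since the poster does not name the tests it uses.

```r
# Sketch: "Mean (SD)" by sex with parametric and non-parametric P-values.
# Data are simulated; t.test / wilcox.test are assumed choices of test.
set.seed(42)
dat <- data.frame(
  sex = factor(rep(c("Male", "Female"), each = 100)),
  age = rnorm(200, mean = 44, sd = 13),
  tg  = c(rlnorm(100, log(120), 0.5), rlnorm(100, log(90), 0.5))
)

# Hypothetical helper: one table row per continuous variable
describe_by_sex <- function(var, data) {
  x <- data[[var]]
  g <- data$sex
  cell <- function(level) {
    v <- x[g == level]
    sprintf("%.2f (%.2f)", mean(v), sd(v))
  }
  data.frame(
    Variable   = var,
    Male       = cell("Male"),
    Female     = cell("Female"),
    P.value    = signif(t.test(x ~ g)$p.value, 3),      # parametric
    P.value.NP = signif(wilcox.test(x ~ g)$p.value, 3)  # non-parametric
  )
}

tab <- do.call(rbind, lapply(c("age", "tg"), describe_by_sex, data = dat))
print(tab)
# A data frame like `tab` is what xtable::xtable() would convert to LaTeX.
```

Keeping the table as a plain data frame means the same object can feed a LaTeX, HTML or OpenDocument export step.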
  • 24. 2nd day Visualization
  • 25. 3rd day
  • 26. 3rd day Invited Talk: Dirk Eddelbuettel, R, C++ and Rcpp
Rcpp: R + C++ speed; no pointers needed on the R side.
Docker: compared with a setup that includes the OS, Docker ships without one. Only for Linux.
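What "R plus C++ speed" looks like in practice can be sketched with Rcpp::cppFunction, which compiles a C++ snippet and exposes it as an R function. The function name sum_sq_cpp is made up for illustration, and the C++ part only runs when the Rcpp package and a working C++ compiler are available.

```r
# Sketch: the same computation in plain R and in C++ via Rcpp.
# sum_sq_cpp is a hypothetical name; a C++ toolchain must be installed.
sum_sq_r <- function(x) {
  total <- 0
  for (xi in x) total <- total + xi^2   # explicit R loop
  total
}

x <- c(1, 2, 3, 4)
r_result <- sum_sq_r(x)

if (requireNamespace("Rcpp", quietly = TRUE)) {
  # Compile the C++ source and bind it as an R function of the same name
  Rcpp::cppFunction('
    double sum_sq_cpp(NumericVector x) {
      double total = 0;
      for (int i = 0; i < x.size(); ++i) total += x[i] * x[i];
      return total;
    }')
  stopifnot(isTRUE(all.equal(sum_sq_cpp(x), r_result)))
}
```

The C++ version works on an R numeric vector directly through NumericVector, so no manual pointer handling appears in the snippet.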
  • 29. 3rd day Sponsors Talk
Revolution Analytics
Oracle: ROracle (R + Oracle database)
Google
  • 30. 3rd day Session 4
R in the Midst of Exploding Stars: Distributed, Time-Domain Transient Classification. NASA; machine learning (clustering..).
Imputation of Missing Values with the R Package VIM. Imputation methods.
PSAboot: An R Package for Bootstrapping Propensity Score Analysis: http://github.com/jbryer/psa. Risk of bias due to unobserved covariates; bootstrap for PSA.
Permutation Tests in Multidimensional Scaling. smacof package in R. Permutation test: randomized dissimilarities (null) VS the data.
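The idea of a permutation test for an MDS solution, comparing the fit on the observed data with fits on randomized data, can be sketched in base R as follows. This is not the smacof implementation: classical MDS via cmdscale, the stress formula used here, and column-wise shuffling of the data are all assumptions made for illustration.

```r
# Sketch: permutation test for an MDS fit using base R only.
# Classical MDS (cmdscale) and column-wise permutation are assumptions;
# the smacof package provides its own implementation.
mds_stress <- function(X, k = 2) {
  d <- dist(X)
  conf <- cmdscale(d, k = k)                  # k-dimensional configuration
  sqrt(sum((d - dist(conf))^2) / sum(d^2))    # normalised stress
}

set.seed(123)
X <- scale(iris[, 1:4])
obs <- mds_stress(X)                          # fit on the observed data

n_perm <- 199
perm <- replicate(n_perm, {
  Xp <- apply(X, 2, sample)                   # shuffle each column independently
  mds_stress(Xp)                              # fit under the null
})

# Share of permuted fits at least as good as the observed one
p_value <- (1 + sum(perm <= obs)) / (1 + n_perm)
print(c(observed_stress = obs, mean_null_stress = mean(perm), p = p_value))
```

A small p-value here says the low-dimensional structure in the data is stronger than what shuffled columns would produce by chance.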
  • 32. 3rd day Invited Talk: David Diez (OpenIntro), Textbooks struggle where software succeeds
http://www.openintro.org
Open source textbook: paperback $10
Labs, videos, for teachers (slides..)
statsTeachR.org, OpenStaxCollege.org, https://www.coursera.org
  • 33. 3rd day Session 5: Biology/Ecology
Simulating Influenza Transmission with Real Network Data. Network data: high school, chip??; statnet package: graph. Simulation: various incubation period, infection period, transmission probability ...
Enhancing Medical Reporting by Combining Electronic Health Records with REDCap: Applications of the REDCap API: http://www.project-redcap.org (chart review).
Simulations for regulatory decision making: How many simulations do we need to run? (ex: FDA). Simulation is needed for complex models (not analytically tractable). Run counts noted: 1,000, 10,000, 400. R parallel computing ...
Monitoring Patients with Ongoing Reduced Kidney Function
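One common way to reason about "how many simulations", offered here as an assumption rather than as the speaker's method, is the Monte Carlo standard error: for a proportion p estimated from n independent runs it is sqrt(p(1 - p) / n). The sketch below evaluates it for the run counts noted on the slide; the helper names mc_se and runs_needed are hypothetical.

```r
# Sketch: Monte Carlo standard error of an estimated proportion
# (e.g. power) as a function of the number of simulation runs.
mc_se <- function(p, n_sim) sqrt(p * (1 - p) / n_sim)

n_sim <- c(400, 1000, 10000)
se_worst <- mc_se(0.5, n_sim)            # p = 0.5 gives the largest SE
print(data.frame(n_sim,
                 se = round(se_worst, 4),
                 half_width_95 = round(1.96 * se_worst, 4)))

# Runs needed so that the 95% half-width stays below a target
runs_needed <- function(p, half_width) {
  ceiling(p * (1 - p) * (1.96 / half_width)^2)
}
print(runs_needed(0.5, 0.01))            # target: +/- 1 percentage point

# Empirical check: simulated power of a one-sample t-test, 1,000 runs
set.seed(1)
reject <- replicate(1000, t.test(rnorm(30, mean = 0.5))$p.value < 0.05)
power_hat <- mean(reject)
print(c(estimate = power_hat, mc_se = mc_se(power_hat, length(reject))))
```

Because the standard error shrinks with the square root of n, going from 1,000 to 10,000 runs cuts it by a factor of about 3.2, which is one way to judge whether the extra runs are worth the compute.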
  • 34. 3rd day Poster session 2: 28 posters
1 Visualization (ex: shiny)
• Shiny-ing compareGroups
• Data Works: An Interactive Data Visualization Application Built with Shiny
• R graphics in Tidal Wetland Restoration
• Statistics without Numbers: Using Data Visualization to Quantify Trends for Cycling Safety
• Visually Analyzing and Running Multilevel Data in R and BUGS
• Using RGraphviz as a first pass for layout of small structural model graphs
• Developing shiny applications for the classroom
2 Automatic reporting tools
• Multi-center Clinical trials reporting with R
• Teaching data analysis in R through the lens of reproducibility
• Better Data Quality In Clinical Trials
  • 36. 4th day
  • 37. 4th day Invited Talk: Karline Soetaert (Head of the Department of Ecosystem Studies, Royal Netherlands Institute of Sea Research), Solving differential equations in R
Marine science. R package deSolve (2008).
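A minimal sketch of solving an ordinary differential equation with the deSolve package named in this talk is shown below. The logistic growth model and its parameter values are illustrative assumptions, chosen because the model has a closed-form solution to compare against; the solver part only runs if deSolve is installed.

```r
# Sketch: logistic growth dN/dt = r * N * (1 - N / K) solved with deSolve::ode.
# The model and parameter values are illustrative assumptions.
logistic <- function(t, state, parms) {
  with(as.list(c(state, parms)), {
    dN <- r * N * (1 - N / K)
    list(dN)                      # deSolve expects a list of derivatives
  })
}

state <- c(N = 1)
parms <- c(r = 0.5, K = 100)
times <- seq(0, 30, by = 0.5)

# Closed-form solution, used to check the numerical result
N0 <- state[["N"]]
exact <- parms[["K"]] /
  (1 + (parms[["K"]] - N0) / N0 * exp(-parms[["r"]] * times))

if (requireNamespace("deSolve", quietly = TRUE)) {
  out <- deSolve::ode(y = state, times = times, func = logistic, parms = parms)
  max_err <- max(abs(out[, "N"] - exact))
  print(max_err)
  stopifnot(max_err < 0.1)        # numerical and exact solutions agree
}
```

The derivative function takes time, state and parameters and returns a list, which is the calling convention ode uses for every model regardless of size.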
  • 38. 4th day Session 6
mlr: machine learning package in R
rapport: a report templating system in R: http://rapport-package.info/#templates. Report tool (Var1 mean, sd + Table + Histogram).
  • 39. 4th day useR2014
1 Machine Learning: Predictive modelling (H2o, alteryx, Rstudio, TIBC: predictive modelling)
2 Performance: parallel, other languages ...
3 Visualization
4 Reproducible research
  • 40. 4th day http://user2015.math.aau.dk