The document describes the R User Conference 2014 which was held from June 30 to July 3 at UCLA in Los Angeles. The conference included tutorials on the first day covering topics like applied predictive modeling in R and graphical models. Keynote speeches and sessions were held on subsequent days covering various technical and statistical topics as well as best practices in R programming. Tutorials and sessions demonstrated tools and packages in R like dplyr and Shiny for data analysis and interactive visualizations.
useR 2014 jskim
1. The R User Conference 2014
useR 2014
Jinseob Kim
UCLA in LA
June 30 to July 3
Jinseob Kim (UCLA in LA) The R User Conference 2014, 6.30 to 7.3
2. What is useR?
Contents
1 What is useR?
2 1st day: Tutorial
Applied Predictive Modeling in R
Graphical Models and Bayesian Networks with R
3 2nd day
4 3rd day
5 4th day
3. What is useR?
Since 2004...
Main meeting of the R user and developer community.
Invited keynote lectures
broad spectrum of topics ranging from technical and R-related
computing issues to general statistical topics of current interest
User-contributed presentations
5. What is useR?
Figure. Afternoon tutorial: dplyr
6. 1st day: Tutorial
7. 1st day: Tutorial
List
http://user2014.stat.ucla.edu/#tutorials
8. 1st day: Tutorial Applied Predictive Modeling in R
Applied Predictive Modeling in R
Max Kuhn, Ph.D., Pfizer Global R&D
http://appliedpredictivemodeling.com
caret package in R
Outline
Conventions in R
Data Splitting and Estimating Performance
Data Pre-Processing
Over-Fitting and Resampling
Training and Tuning Tree Models
Training and Tuning A Support Vector Machine
Comparing Models
Parallel Processing
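The outline above (splitting, pre-processing, resampling, tuning) can be sketched with caret roughly as follows. This is a minimal illustration, not the tutorial's own code: the iris data, the 80/20 split, the k-nearest-neighbors model, and 5-fold cross-validation are all assumptions chosen for brevity.

```r
library(caret)

set.seed(2014)
data(iris)

# Data splitting: stratified 80/20 split on the outcome (split ratio is an assumption)
in_train <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
training <- iris[in_train, ]
testing  <- iris[-in_train, ]

# Resampling: 5-fold cross-validation used to pick the tuning parameter k
ctrl <- trainControl(method = "cv", number = 5)

# Pre-processing (center and scale) plus tuning of a k-nearest-neighbors classifier
knn_fit <- train(Species ~ ., data = training,
                 method = "knn",
                 preProcess = c("center", "scale"),
                 tuneLength = 5,
                 trControl = ctrl)

# Estimating performance on the held-out test set
pred <- predict(knn_fit, newdata = testing)
accuracy <- mean(pred == testing$Species)
```

Swapping `method = "knn"` for another model code (for example a tree or SVM method) keeps the rest of the workflow unchanged, which is the main convenience of this interface.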
Classification: K-Nearest Neighbors, trees
Common: Boosting, Support Vector Machine (SVM)
3 Parallel Processing
13. 1st day: Tutorial Graphical Models and Bayesian Networks with R
Graphical Models and Bayesian Networks with R
Probability propagation with Bayesian networks (BNs) and their
implementation in the gRain (gRaphical independence networks)
package.
A look under the hood of BNs to understand mechanisms of
probability propagation. Dependency graphs and conditional
independence restrictions.
Log-linear models, graphical models, decomposable models and their
implementation in the gRim (gRaphical independence models)
package.
Model selection with gRim
Converting a decomposable graphical model to a Bayesian network.
14. 1st day: Tutorial Graphical Models and Bayesian Networks with R
The chest clinic narrative
$p(V) = p(a)\,p(t \mid a)\,p(s)\,p(l \mid s)\,p(b \mid s)\,p(e \mid t, l)\,p(d \mid e, b)\,p(x \mid e)$
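To make the chest clinic factorization concrete, here is a brute-force sketch in base R that multiplies the eight factors over all 256 configurations and then conditions on evidence. It follows the usual reading of the letters (a: visit to Asia, t: tuberculosis, s: smoking, l: lung cancer, b: bronchitis, e: either t or l, d: dyspnoea, x: positive x-ray). Every probability below is an illustrative assumption, and the enumeration only stands in for the probability propagation that gRain performs; it is not gRain's algorithm.

```r
# Conditional probability tables; all numbers are illustrative assumptions.
p_a <- function(a) ifelse(a == 1, 0.01, 0.99)
p_t <- function(t, a) { p1 <- ifelse(a == 1, 0.05, 0.01); ifelse(t == 1, p1, 1 - p1) }
p_s <- function(s) rep(0.5, length(s))
p_l <- function(l, s) { p1 <- ifelse(s == 1, 0.10, 0.01); ifelse(l == 1, p1, 1 - p1) }
p_b <- function(b, s) { p1 <- ifelse(s == 1, 0.60, 0.30); ifelse(b == 1, p1, 1 - p1) }
# "either" is a deterministic OR of tuberculosis and lung cancer
p_e <- function(e, t, l) { p1 <- ifelse(t == 1 | l == 1, 1, 0); ifelse(e == 1, p1, 1 - p1) }
p_d <- function(d, e, b) {
  p1 <- ifelse(e == 1, ifelse(b == 1, 0.9, 0.7), ifelse(b == 1, 0.8, 0.1))
  ifelse(d == 1, p1, 1 - p1)
}
p_x <- function(x, e) { p1 <- ifelse(e == 1, 0.98, 0.05); ifelse(x == 1, p1, 1 - p1) }

# Enumerate every configuration of the eight binary variables
grid <- expand.grid(a = 0:1, t = 0:1, s = 0:1, l = 0:1,
                    b = 0:1, e = 0:1, d = 0:1, x = 0:1)

# Joint probability as the product of the factors in the factorization
grid$p <- with(grid, p_a(a) * p_t(t, a) * p_s(s) * p_l(l, s) * p_b(b, s) *
                     p_e(e, t, l) * p_d(d, e, b) * p_x(x, e))

total  <- sum(grid$p)                       # should be 1
p_lung <- sum(grid$p[grid$l == 1])          # marginal p(l = yes)

# Conditioning on evidence: dyspnoea present and a recent visit to Asia
evidence <- grid$d == 1 & grid$a == 1
p_lung_given <- sum(grid$p[evidence & grid$l == 1]) / sum(grid$p[evidence])
```

Enumeration grows exponentially with the number of variables, which is why packages for Bayesian networks work on the graph structure instead.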
15. 2nd day
16. 2nd day
Opening Keynote
John Chambers (Stats, Stanford) - Interfaces, Efficiency and Big Data
Rcpp: C++ function → R
RLLVM, http://www.omegahat.org/Rllvm: compiles R code.
h2o: Java-based machine learning for big data.
19. 2nd day
Session 2
R For Improving Consumer Engagement and Health Outcomes
ActiveHealth Management, Inc
internal member data + externally-purchased lifestyle behavioral
data
K-Means Clustering and CART classification
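One plausible way to combine the two methods named above is to cluster members with k-means and then fit a classification tree that describes the clusters in readable rules. The sketch below is an assumption about that workflow, not the speakers' code; the iris measurements stand in for member and lifestyle data, and three clusters is an arbitrary choice.

```r
library(rpart)

set.seed(2014)

# Stand-in for member data: four numeric features (assumption)
features <- scale(iris[, 1:4])

# K-means clustering into 3 segments (number of segments is an assumption)
km <- kmeans(features, centers = 3, nstart = 20)

# CART: describe the segments with a classification tree on the raw features
segments <- data.frame(iris[, 1:4], segment = factor(km$cluster))
tree <- rpart(segment ~ ., data = segments, method = "class")

# How well do the tree's rules reproduce the k-means segments?
tree_pred <- predict(tree, type = "class")
agreement <- mean(tree_pred == segments$segment)
```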
21. 2nd day
Invited Talk
Martin Maechler(Math, Zurich)- Good Practices in R Programming
Assorted coding practices (ex: <- vs. =, comments, spacing, etc.)
22. 2nd day
Session 3
dplyr, data.table: high performance in the data step
PivotalR: A Package for Machine Learning on Big Data
https://github.com/gopivotal/PivotalR
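As a small illustration of the "data step" that dplyr and data.table speed up, the same grouped summary can be written in both. The built-in mtcars data and the chosen summary columns are assumptions made for the example.

```r
library(dplyr)
library(data.table)

# dplyr: group, then summarise
by_cyl_dplyr <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n = n())

# data.table: the same aggregation with [i, j, by] syntax; keyby sorts by group
dt <- as.data.table(mtcars)
by_cyl_dt <- dt[, .(mean_mpg = mean(mpg), n = .N), keyby = cyl]
```

Both results hold one row per cylinder count; which syntax reads better is largely a matter of taste.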
23. 2nd day
Poster Session 1: 28 posters
Reproducible Research in Public Health
Jinseob Kim1, Joohon Sung1,*
1. Complex Disease and Genetic Epidemiology Branch, Graduate School of Public Health, Seoul National University
Objectives
In this study, we aimed to construct pipelines
of reproducible statistical analysis in health
research. The development of pipelines in this
study consists of
• Automatic suggestions of a summary table
describing the general characteristics of the
study
• Univariate analysis of both explanatory and
outcome variables of a study
• Graphical presentations of summary and
univariate analyses
• Automatic analysis and tabulations of main
results
based on frequently used analytical methods
in the health research area (e.g., multiple regression,
logistic regression, survival analysis,
multilevel analysis, genome-wide association
study (GWAS)).
Background
Most health researchers who have learned and applied statistical methods properly have spent a long time learning statistics. Additionally, statistical methods are ever evolving, and updating one's knowledge and analytic skills requires the most precious resource: time.
Methods
With the xtable package in R and tex4ht in LaTeX, researchers get a PDF or ODT (OpenDocument Text) version of the descriptive statistics when they select the data, the variables (factor or numeric), and the strata variable (sex, region, etc.) [1, 2]. Next, among the various analyses, we present a GWAS example using the FASTA method in the GenABEL R package or the fastassoc method in MERLIN. If they select lists of phenotypes, covariates, and a kinship matrix, PDF or ODT files including each phenotype's name, narrow-sense heritability, top SNPs in the GWAS results, a QQ plot, and a Manhattan plot are created automatically [3, 4]. The example dataset is the TG and LDL phenotypes and chromosome 21 in the Healthy Twin Study, Korea [5].
Key Methods
Researchers can obtain tables and figures if they select the data set and the dependent variables of interest,
and define the nature of each variable (e.g., continuous, binomial, count), the explanatory variables, and
the group variable (e.g., sex, region, unit of random effects or family structure).
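A minimal sketch of the descriptive-table step, assuming a toy data set and a hypothetical helper `describe()`: it summarises each variable by sex, adds a t-test p-value, and lets xtable emit the LaTeX that a PDF build would include. The variable names, the simulated values, and the choice of test are assumptions, not the poster's actual pipeline.

```r
library(xtable)

set.seed(2014)

# Hypothetical toy data standing in for a study data set
study <- data.frame(
  sex = rep(c("Male", "Female"), each = 50),
  age = round(rnorm(100, mean = 44, sd = 13)),
  tg  = round(rlnorm(100, meanlog = 4.6, sdlog = 0.5))
)

# Hypothetical helper: "mean (SD)" as a formatted string
describe <- function(x) sprintf("%.1f (%.1f)", mean(x), sd(x))

# Two-sample t-test p-value for one variable, grouped by sex
p_value <- function(v) {
  x <- study[[v]]
  g <- study$sex
  t.test(x ~ g)$p.value
}

vars <- c("age", "tg")
summary_table <- data.frame(
  Variable = vars,
  Male     = sapply(vars, function(v) describe(study[study$sex == "Male", v])),
  Female   = sapply(vars, function(v) describe(study[study$sex == "Female", v])),
  P.value  = sapply(vars, p_value),
  row.names = NULL
)

# Render the table as LaTeX source lines
latex_lines <- capture.output(
  print(xtable(summary_table, caption = "Descriptive statistics by sex", digits = 3),
        include.rownames = FALSE)
)
```

Writing `latex_lines` to a `.tex` file and compiling it would be the step where LaTeX (or tex4ht for ODT output) takes over.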
Example: GWAS-TG
Table: PDF file, descriptive statistics of TG
Variable      Male             Female           P-value   P-value (NP)
Age           44.6 (13.6)      43.95 (12.73)    0.208     0.294
Smoke                                           0.001     0.001
  No          278 (26.18)      1506 (90.29)
  Past        290 (27.31)      53 (3.18)
  Current     494 (46.52)      109 (6.53)
TG            140.34 (92.34)   98.8 (62.82)     0.001     0.001
Values are mean (SD) or N (%). NP: non-parametric
Figure: PDF file, GWAS results of TG
TG (h2 = 0.48)
SNP Chromosome Position A1 A2 N MAF B_FASTA SE_FASTA P_FASTA B_fassoc SE_fassoc P_fassoc
rs12626621 21 23496579 C T 1838 0.167 13.61 3.46 8.41E-05 -12.82 3.68 5.00E-04
rs1702393 21 30866791 C T 1832 0.353 10.67 2.72 8.76E-05 -11.51 2.96 9.99E-05
rs12627596 21 30867101 T A 1840 0.352 10.06 2.72 2.10E-04 -11.06 2.95 1.81E-04
rs128592 21 30878280 C T 1841 0.352 10.04 2.72 2.18E-04 -11.06 2.95 1.80E-04
rs198935 21 30867998 C T 1799 0.358 10.13 2.74 2.19E-04 -11.42 2.98 1.26E-04
rs11702393 21 39304254 G A 1832 0.297 10.71 2.91 2.35E-04 -12.55 3.13 6.23E-05
rs382004 21 19226026 T C 1840 0.035 25.36 7.00 2.92E-04 -22.86 7.63 2.75E-03
rs1888516 21 41219289 G A 1837 0.014 38.42 10.61 2.93E-04 -34.38 11.09 1.92E-03
rs9306107 21 44798662 G A 1827 0.008 51.02 14.36 3.82E-04 -45.77 15.16 2.53E-03
rs426803 21 19225690 T C 1839 0.037 24.02 6.78 3.98E-04 -22.16 7.38 2.67E-03
rs198936 21 30869966 C T 1837 0.351 9.41 2.73 5.54E-04 -10.69 2.96 3.10E-04
rs1702405 21 30914735 A G 1841 0.493 -9.05 2.63 5.86E-04 10.97 2.86 1.23E-04
rs2257149 21 36213059 G A 1806 0.360 -9.39 2.75 6.24E-04 10.86 3.00 2.96E-04
rs174897 21 30919037 A G 1835 0.492 -8.74 2.63 8.89E-04 10.62 2.85 1.92E-04
rs198871 21 30932579 G A 1819 0.492 -8.76 2.64 8.91E-04 10.42 2.85 2.59E-04
Table 1: GWAS: TG
Figure 1: QQ plots for TG: (a) FASTA, (b) fassoc
Figure 2: Manhattan plots for TG: (a) FASTA, (b) fassoc
Example: GWAS-LDL
Table: PDF file, descriptive statistics of LDL
Variable        Male             Female           P-value   P-value (NP)
Age             44.6 (13.6)      43.95 (12.73)    0.208     0.294
FBS             97.12 (19.67)    91.36 (16.54)    0.001     0.001
tCholesterol    191.08 (34.99)   188.63 (35.83)   0.077     0.031
HDL             46.45 (11.24)    52.26 (12.72)    0.001     0.001
LDL             112.84 (30.7)    108.76 (30.3)    0.001     0.001
Values are mean (SD) or N (%). NP: non-parametric
Figure: PDF file, GWAS results of LDL
LDL (h2 = 0.47)
SNP Chromosome Position A1 A2 N MAF B_FASTA SE_FASTA P_FASTA B_fassoc SE_fassoc P_fassoc
rs4818418 21 18802344 C T 1823 0.276 4.87 1.13 1.53E-05 -4.54 1.22 1.93E-04
rs1735790 21 18788001 A G 1792 0.274 4.90 1.15 2.10E-05 -5.20 1.24 2.82E-05
rs2824856 21 18793090 G A 1811 0.278 4.82 1.14 2.21E-05 -4.71 1.22 1.15E-04
rs2824898 21 18813513 T C 1841 0.280 4.70 1.12 2.50E-05 -4.46 1.21 2.34E-04
rs2252190 21 18792169 A G 1839 0.277 4.74 1.13 2.54E-05 -4.81 1.22 7.94E-05
rs2824899 21 18813559 T C 1840 0.280 4.69 1.12 2.68E-05 -4.46 1.21 2.34E-04
rs914244 21 46340779 C T 1839 0.408 -4.38 1.05 2.88E-05 4.78 1.15 3.07E-05
rs2824857 21 18793134 G T 1837 0.279 4.67 1.12 3.26E-05 -4.71 1.22 1.09E-04
rs2026211 21 18806617 C T 1842 0.276 4.64 1.12 3.51E-05 -4.38 1.21 3.10E-04
rs2824880 21 18807606 A G 1842 0.276 4.64 1.12 3.51E-05 -4.38 1.21 3.10E-04
rs2838534 21 44498077 T C 1811 0.352 3.94 1.06 2.04E-04 -4.69 1.16 5.22E-05
rs456164 21 27761178 T C 1842 0.395 -3.82 1.05 2.85E-04 4.46 1.14 9.88E-05
Table 1: GWAS: LDL
Figure 1: QQ plots for LDL: (a) FASTA, (b) fassoc
Figure 2: Manhattan plots for LDL: (a) FASTA, (b) fassoc
Conclusion
Using the xtable package in R, LaTeX, and the tex4ht package in LaTeX, together with various statistical packages in R, we developed a tool that automatically generates text describing the result tables and figures directly in PDF or OpenDocument format [1, 2]. Although we presented only descriptive statistics and GWAS examples, pipelines for other analyses (e.g., survival analysis, multilevel analysis, etc.) were also built using similar packages and some additional ones. These automated statistical pipeline tools will help individual researchers in health-related and broader fields reduce their analytical burden and conduct appropriate statistical analyses in a much faster and more reliable manner.
References
[1] David B. Dahl. xtable: Export tables to LaTeX or HTML, 2014. R package version 1.7-3.
[2] Emma Cliffe. Methods to produce flexible and accessible learning resources in mathematics: overview document. 2012.
[3] GenABEL project developers. GenABEL: genome-wide SNP association analysis, 2013. R package version 1.8-0.
[4] Wei-Min Chen and Gonçalo R. Abecasis. Family-based association tests for genomewide association scans. The American Journal of Human Genetics, 81(5):913–926, 2007.
[5] Joohon Sung, Sung-Il Cho, Yun-Mi Song, Kayoung Lee, Eun-Young Choi, Mina Ha, Jihae Kim, Ho Kim, Yeonju Kim, Eun-Kyung Shin, et al. Do we need more twin studies? The Healthy Twin Study, Korea. International Journal of Epidemiology, 35(2):488–490, 2006.
Contact Information
• Web: http://snugepi.snu.ac.kr
• Email: kimjinseob@snu.ac.kr
• Phone: +82-2-880-2743
24. 2nd day
Is visualization the trend?
2 Koreans: Yonsei University College of Medicine, Sungkyunkwan University (engineering?)
25. 3rd day
26. 3rd day
Invited Talk
Dirk Eddelbuettel - R, C++ and Rcpp
Rcpp: R's convenience + C++'s speed
C++ functions can be written easily from R (no pointers needed).
R functions can be inserted into C++ code.
R ↔ C++ ...
Docker: a new kind of virtual machine
Conventional virtual machine: program + ...
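The Rcpp notes above, writing a C++ function from R without handling pointers, can be illustrated with `cppFunction()`. This is a minimal sketch; the function name `sum_squares` is made up for the example, and a working C++ toolchain is assumed.

```r
library(Rcpp)

# Compile a small C++ function and expose it to R under the same name.
# NumericVector is Rcpp's wrapper around an R numeric vector.
cppFunction('
double sum_squares(NumericVector x) {
  double total = 0.0;
  for (int i = 0; i < x.size(); ++i) {
    total += x[i] * x[i];
  }
  return total;
}')

values <- c(1, 2, 3)
result <- sum_squares(values)
```

The R vector is passed in and the C++ `double` comes back as an ordinary R numeric, with the conversion handled by Rcpp.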
32. 3rd day
Invited Talk
David Diez(Openintro) - Textbooks struggle where software
succeeds
http://www.openintro.org
Open source textbook: paperback $10
Labs, videos, for teachers(slides..)
statsTeachR.org, OpenStaxCollege.org,
https://www.coursera.org
33. 3rd day
Session 5: Biology/Ecology
Simulating Influenza Transmission with Real Network Data
Network data: High school, chip??, statnet package: graph
Simulation: various incubation period, infection period, transmission probability...
Enhancing Medical Reporting by Combining Electronic Health Records with
REDCap: Applications of the REDCap API
http://www.project-redcap.org
When collecting data, there is no need to go through chart reviews one by one.
Simulations for regulatory decision making: How many simulations do we need to
run?
Simulation is important for regulatory decisions (ex: FDA).
Simulation is essential for complex models (not analytically tractable).
In some cases even 1,000 or 10,000 runs are nowhere near enough; more than 4 million may be needed.. R
parallel computing...
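Why 10,000 runs can fall short is easy to see from the Monte Carlo standard error of an estimated probability, sqrt(p(1 - p)/n). The sketch below is an illustration under stated assumptions, not the speakers' method: the "trial" is a made-up simulation, and `required_runs()` is a hypothetical helper based on the normal approximation.

```r
set.seed(2014)

# Hypothetical "trial": does the mean of 30 noisy observations exceed zero?
trial_succeeds <- function() mean(rnorm(30, mean = 0.3, sd = 1)) > 0

# Estimate the success probability and its Monte Carlo standard error
estimate_power <- function(n_runs) {
  hits  <- replicate(n_runs, trial_succeeds())
  p_hat <- mean(hits)
  c(estimate = p_hat, mc_se = sqrt(p_hat * (1 - p_hat) / n_runs))
}

# Runs needed so that the confidence interval half-width is at most half_width
required_runs <- function(p, half_width, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  ceiling(z^2 * p * (1 - p) / half_width^2)
}

small <- estimate_power(1000)
large <- estimate_power(10000)

runs_loose <- required_runs(0.5, 0.01)    # half-width of one percentage point
runs_tight <- required_runs(0.5, 0.001)   # half-width of a tenth of a point
```

Because the required number of runs grows with the inverse square of the target precision, a tenfold tighter interval costs a hundredfold more simulations, which is where parallel computing becomes attractive.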
Monitoring Patients with Ongoing Reduced Kidney Function
34. 3rd day
Poster session 2: 28 posters
Trends:
1 Visualization(ex: shiny)
Shiny-ing compareGroups
Data Works: An Interactive Data Visualization Application Built
with Shiny
R graphics in Tidal Wetland Restoration
Statistics without Numbers: Using Data Visualization to
Quantify Trends for Cycling Safety
Visually Analyzing and Running Multilevel Data in R and BUGS
Using RGraphviz as a first pass for layout of small structural
model graphs
Developing shiny applications for the classroom
2 Automatic reporting tools
Multi-center Clinical trials reporting with R
Teaching data analysis in R through the lens of reproducibility
Better Data Quality In Clinical Trials
36. 4th day
37. 4th day
Invited Talk
Karline Soetaert (Head of the Department of Ecosystem Studies,
Royal Netherlands Institute of Sea Research) - Solving differential
equations in R
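Marine science: not everything can be measured, and measuring is expensive..
Differential equations are needed.
After using various tools, the deSolve package was made in 2008 to unify the work in R.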
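A minimal sketch of solving a differential equation with deSolve, assuming a logistic growth model chosen only for illustration (the talk's own models are not reproduced here). The parameter values and the time grid are assumptions.

```r
library(deSolve)

# Logistic growth: dN/dt = r * N * (1 - N / K)
# deSolve expects func(time, state, parms) returning a list of derivatives.
logistic <- function(time, state, parms) {
  with(as.list(c(state, parms)), {
    dN <- r * N * (1 - N / K)
    list(c(dN))
  })
}

state <- c(N = 10)               # initial population (assumption)
parms <- c(r = 0.5, K = 1000)    # growth rate and carrying capacity (assumptions)
times <- seq(0, 40, by = 0.5)

# Integrate with the default solver
out <- ode(y = state, times = times, func = logistic, parms = parms)

final_N <- out[nrow(out), "N"]
```

The result is a matrix with a `time` column and one column per state variable, so it can be passed to `plot()` or converted to a data frame for further analysis.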