How learning, using, and teaching R has helped my career in the life sciences
#TokyoR 2018-7-15
Presented at Yahoo Japan, 2:45 pm
Tom Kelly, Postdoctoral Fellow (RIKEN IMS)
Sharing data with lightweight data standards, such as schema.org and Bioschemas: the KnetMiner case, an application for the agrifood domain and molecular biology.
Presented at Open Data Sicilia (#ODS2021)
Being Reproducible: SSBSS Summer School 2017 (Carole Goble)
Lecture 2:
Being Reproducible: Models, Research Objects and R* Brouhaha
Reproducibility is an R* minefield, depending on whether you are testing for robustness (rerun), defence (repeat), certification (replicate), comparison (reproduce) or transferring between researchers (reuse). Different forms of "R" make different demands on the completeness, depth and portability of research. Sharing is another minefield, raising concerns about credit and protection from sharp practices.
In practice the exchange, reuse and reproduction of scientific experiments is dependent on bundling and exchanging the experimental methods, computational codes, data, algorithms, workflows and so on along with the narrative. These "Research Objects" are not fixed, just as research is not “finished”: the codes fork, data is updated, algorithms are revised, workflows break, service updates are released. ResearchObject.org is an effort to systematically support more portable and reproducible research exchange.
In this talk I will explore these issues in more depth using the FAIRDOM Platform and its support for reproducible modelling. The talk will cover initiatives and technical issues, and raise social and cultural challenges.
EKAW 2016 - TechMiner: Extracting Technologies from Academic Publications (Francesco Osborne)
TechMiner is a new approach that combines natural language processing, machine learning, and semantic technologies to extract information about technologies (such as applications, systems, languages, and formats) from research publications. It generates an ontology describing technologies and their relationships to other research entities. The approach was evaluated on a gold standard of manually annotated publications and found to improve precision and recall over alternative natural language processing approaches. Future work includes enriching the approach to identify additional scientific objects and applying it to other research fields.
Findable Accessible Interoperable Reusable < data | models | SOPs | samples | articles | * >. FAIR is a mantra; a meme; a myth; a mystery; a moan. For the past 15 years I have been working on FAIR in a range of Life Science projects and initiatives. Some are top-down, like the Life Science European Research Infrastructures ELIXIR and ISBE, and some are bottom-up, supporting research projects in Systems and Synthetic Biology (FAIRDOM), Biodiversity (BioVel), and Pharmacology (Open PHACTS), for example. Some have become movements, like Bioschemas, the Common Workflow Language and Research Objects. Others focus on cross-cutting approaches in reproducibility, computational workflows, metadata representation and scholarly sharing & publication. In this talk I will relate a series of FAIRy tales. Some of them are Grimm. Some have happy endings. Who are the villains and who are the heroes? What are the morals we can draw from these stories?
Being FAIR: FAIR data and model management, SSBSS 2017 Summer School (Carole Goble)
Lecture 1:
Being FAIR: FAIR data and model management
In recent years we have seen a change in expectations for the management of all the outcomes of research – that is, the "assets" of data, models, codes, SOPs and workflows. The "FAIR" (Findable, Accessible, Interoperable, Reusable) Guiding Principles for scientific data management and stewardship [1] have proved to be an effective rallying-cry. Funding agencies expect data (and, increasingly, software) management, retention and access plans. Journals are raising their expectations of the availability of data and codes both pre- and post-publication. The multi-component, multi-disciplinary nature of Systems and Synthetic Biology demands the interlinking and exchange of assets and the systematic recording of metadata for their interpretation.
Our FAIRDOM project (http://www.fair-dom.org) supports Systems Biology research projects with their research data, methods and model management, with an emphasis on standards smuggled in by stealth and sensitivity to asset sharing and credit anxiety. The FAIRDOM Platform has been installed by over 30 labs or projects. Our public, centrally hosted Asset Commons, the FAIRDOMHub.org, supports the outcomes of 50+ projects.
Now established as a grassroots association, FAIRDOM has over 8 years of experience of practical asset sharing and data infrastructure at the researcher coal-face, ranging across European programmes (the SysMO and ERASysAPP ERA-Nets), national initiatives (Germany's de.NBI and Systems Medicine of the Liver; Norway's Digital Life) and European Research Infrastructures (ISBE), as well as in PIs' labs and centres such as the SynBioChem Centre at Manchester.
In this talk I will explore how FAIRDOM has been designed to support Systems Biology projects and show examples of its configuration and use. I will also explore the technical and social challenges we face.
I will also refer to European efforts to support public archives for the life sciences. ELIXIR (http://www.elixir-europe.org/) is the European Research Infrastructure of 21 national nodes and a hub, funded by national agreements, that coordinates and sustains key data repositories and archives for the Life Science community, improves access to them and to related tools, supports training and creates a platform for dataset interoperability. As Head of the ELIXIR-UK Node and co-lead of the ELIXIR Interoperability Platform I will show how this work relates to your projects.
[1] Wilkinson et al., "The FAIR Guiding Principles for scientific data management and stewardship", Scientific Data 3 (2016), doi:10.1038/sdata.2016.18
Research Objects: more than the sum of the parts (Carole Goble)
Workshop on Managing Digital Research Objects in an Expanding Science Ecosystem, 15 Nov 2017, Bethesda, USA
https://www.rd-alliance.org/managing-digital-research-objects-expanding-science-ecosystem
Research output is more than just the rhetorical narrative. The experimental methods, computational codes, data, algorithms, workflows, Standard Operating Procedures, samples and so on are the objects of research that enable reuse and reproduction of scientific experiments, and they too need to be examined and exchanged as research knowledge.
A first step is to think of Digital Research Objects as a broadening out to embrace these artefacts or assets of research. The next is to recognise that investigations use multiple, interlinked, evolving artefacts. Multiple datasets and multiple models support a study; each model is associated with datasets for construction, validation and prediction; an analytic pipeline has multiple codes and may be made up of nested sub-pipelines, and so on. Research Objects (http://researchobject.org/) is a framework by which the many, nested and contributed components of research can be packaged together in a systematic way, and their context, provenance and relationships richly described.
1) The document describes the SOPHIA project, which aims to build altmetric networks of researchers and institutions to understand how research impacts spread in society.
2) SOPHIA collects data from Scopus and social media sources to build a heterogeneous graph network, and analyzes the network using graph metrics to measure the influence and authority of researchers and institutions.
3) The project has developed visualization and search tools to explore the altmetric networks, annotated documents, and metrics within a software prototype called SOPHIA.
Aspects of Reproducibility in Earth Science (Raul Palma)
The document discusses aspects of reproducibility in earth science research within the European Virtual Environment for Research - Earth Science Themes (EVEREST) project. The key objectives of EVEREST are to establish an e-infrastructure to facilitate collaborative earth science research through shared data, models, and workflows. Research Objects (ROs) will be used to capture and share workflows, processes, and results to help ensure reproducibility and preservation of earth science research. An example RO is described for mapping volcano deformation using satellite imagery and other data sources. Issues around reproducibility related to data access, software dependencies, and manual intervention in workflows are also discussed.
This document summarizes Professor Carole Goble's presentation on making research more reproducible and FAIR (Findable, Accessible, Interoperable, Reusable) through the use of research objects and related standards and infrastructure. It discusses challenges to reproducibility in computational research and proposes bundling datasets, workflows, software and other research products into standardized research objects that can be cited and shared to help address these challenges.
The document discusses the ISA infrastructure, which provides a generic format for experimental description and data exchange. The ISA infrastructure aims to support bio-scientists from experimental design to data publication. It does this through developing community standards, open source software tools, and engaging communities. The infrastructure provides a common framework to describe experiments in a way that allows data to flow between different systems and communities.
Some tools developed at OEG (Ontology Engineering Group) for facilitating ontology engineering activities as evaluation, documentation, releasing and publication.
A keynote given on the FAIR Data Principles at the FAIRplus Innovation and SME Forum, Hinxton Genome Campus, Cambridge, UK, 29 January 2020, covering the history of the principles, issues with the principles, and speculations about the future.
Short talk on Research Objects and their use for reproducibility and publishing in the Systems Biology Commons Platform FAIRDOMHub, and the underlying software SEEK.
Gene Ontology WormBase Workshop, International Worm Meeting 2015 (raymond91105)
This document summarizes how Gene Ontology (GO) annotations are used at WormBase to annotate genes in C. elegans. It describes the three aspects of GO (biological process, molecular function, cellular component) and how GO annotations associate genes to specific terms. It provides details on how to access and browse GO annotations at WormBase, including through gene pages, the Ontology Browser, and download files. It also describes using the PANTHER database to perform GO term enrichment analysis. The document outlines current and future improvements to GO annotations at WormBase.
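The enrichment step described above can be reproduced approximately from R. The following is a minimal, hedged sketch using the Bioconductor packages clusterProfiler and org.Ce.eg.db as stand-ins for the PANTHER web service mentioned in the workshop; the Entrez gene IDs are placeholders rather than a real worm gene list.

```r
# Hedged sketch: GO term enrichment for a C. elegans gene list in R.
# clusterProfiler/org.Ce.eg.db stand in for the PANTHER service described
# in the workshop; the gene IDs below are illustrative placeholders.
library(clusterProfiler)
library(org.Ce.eg.db)

my_genes <- c("172037", "175410", "180399")   # placeholder Entrez IDs

ego <- enrichGO(gene          = my_genes,
                OrgDb         = org.Ce.eg.db,
                keyType       = "ENTREZID",
                ont           = "BP",          # biological process aspect
                pAdjustMethod = "BH",
                qvalueCutoff  = 0.05)
head(as.data.frame(ego))
```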
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine... (Rothamsted Research, UK)
Graph-based modelling is becoming more popular, in the sciences and elsewhere, as a flexible and powerful way to exploit data to power world-changing digital applications. Compared to the initial vision of the Semantic Web, knowledge graphs and graph databases are becoming a practical and computationally less formal way to manage graph data. On the other hand, linked data based on Semantic Web standards are a complementary, rather than alternative, approach to deal with these data, since they still provide a common way to represent and exchange information. In this paper we introduce rdf2neo, a tool to populate Neo4j databases starting from RDF data sets, based on a configurable mapping between the two. By employing agrigenomics-related real use cases, we show how such mapping can allow for a hybrid approach to the management of networked knowledge, based on taking advantage of the best of both RDF and property graphs.
FAIR Workflows and Research Objects get a Workout (Carole Goble)
So, you want to build a pan-national digital space for bioscience data and methods? That works with a bunch of pre-existing data repositories and processing platforms? So you can share FAIR workflows and move them between services? Package them up with data and other stuff (or just package up data for that matter)? How? WorkflowHub (https://workflowhub.eu) and RO-Crate Research Objects (https://www.researchobject.org/ro-crate) that’s how! A step towards FAIR Digital Objects gets a workout.
Presented at DataVerse Community Meeting 2021
Reproducibility Using Semantics: An Overview (dgarijo)
Overview of the different approaches for addressing reproducibility (using semantics) in laboratory protocols, workflow description and publication, and workflow infrastructure. Furthermore, Research Objects are introduced as a means to capture the context and annotations of scientific experiments, together with the privacy and IPR concerns that may arise. This presentation was given at Dagstuhl Seminar 16041: http://www.dagstuhl.de/16041
ACS 248th Paper 67: Eureka Collaboration (Stuart Chalk)
This document discusses the Eureka Research Workbench (ERW), a digital platform for enabling international scientific collaboration. The ERW allows researchers to store all research notes, data, and files in a digital format using the Experiment Markup Language (ExptML) to capture different data types. It also facilitates collaboration between research groups by allowing all users to view shared data. The document describes a case study of international collaboration between research groups in Thailand and the US using the ERW to study endocrine disrupting chemicals. It also provides feedback from users and outlines future plans to improve translation features and data visualization tools to further support global scientific collaboration.
This talk explores how principles derived from experimental design practice, data and computational models can greatly enhance data quality, data generation, data reporting, data publication and data review.
Lei Zheng has over 15 years of experience in areas such as machine learning, data mining, and software development. He currently works as a Senior Software Engineer at Yahoo, where he develops algorithms for spam filtering and detection of abusive behavior. Previously he held research positions at the University of Pittsburgh and JustSystems Evans Research, where he implemented algorithms and systems for information retrieval, natural language processing, and data mining.
ACS 248th Paper 136: JSmol/JSpecView Eureka Integration (Stuart Chalk)
Integration of the combined JSmol/JSpecView molecular viewer/spectral viewer software in the Eureka Research Workbench. It can display molecular structures, spectra, and a linked view in which clicking on a peak shows the corresponding molecular motion (IR).
This document discusses Neo4j and its applications in bioinformatics. It describes Bio4j, an open source bioinformatics graph database built using Neo4j that integrates data from sources like Uniprot, NCBI taxonomy, Gene Ontology, and more. Bio4j models biological data as nodes and relationships in a graph structure rather than tables. This allows for more flexible querying and knowledge integration. The document provides examples of how Bio4j can be accessed through its Java API, Cypher query language, Gremlin traversal language, and REST API. It also describes some tools and visualizations for exploring and analyzing Bio4j data.
Repeatable plant pathology bioinformatic analysis: Not everything is NGS data (Leighton Pritchard)
Presentation on use of Galaxy for plant pathology bioinformatics, presented by Peter Cock, at the Genomics for Non-Model Organisms workshop, ISMB/ECCB, Vienna, Austria, 19 July 2011
The Seven Deadly Sins of Bioinformatics (Duncan Hull)
Keynote talk at Bioinformatics Open Source Conference (BOSC) Special Interest Group at the 15th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB 2007) in Vienna, July 2007 by Carole Goble, University of Manchester.
The document summarizes updates and new features in the latest release (Araport11) of the Arabidopsis Information Portal (Araport). Key points include:
1) Araport assumed responsibility for the Arabidopsis thaliana Col-0 genome sequence and annotation.
2) The Araport11 release incorporates 113 RNA-seq datasets, contributions from NCBI, UniProt, and Arabidopsis researchers. Structural and functional annotation were performed.
3) Araport provides a "one-stop shop" for Arabidopsis data including updated gene models, protein coding genes, transcripts, community curation tools, and over 70 tracks of data in JBrowse.
This document introduces a workbook for analyzing geometric morphometric data using freely available software. It discusses the relationship between the workbook and its accompanying textbook. The workbook is meant to provide practical guidance on running specific analyses in various software packages, updating more frequently than the textbook. It reviews several freely available software options for geometric morphometrics and emphasizes "comprehensive" packages that allow many different analyses within a single program or related suite of programs. However, it notes that no single package can perform all possible analyses, so multiple packages may need to be used. It encourages the use of R as a flexible environment that can handle complex statistical models and analyses not found in specialized morphometrics packages.
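The workbook above recommends R without (in this summary) naming specific packages. Purely as an illustration of the kind of analysis it covers, a minimal sketch with the geomorph package and its bundled plethodon landmark data might look like this; geomorph is one of several free morphometrics packages and not necessarily the one used in the workbook.

```r
# Illustrative sketch only: generalised Procrustes alignment and PCA of shape
# variation in R using the geomorph package (an assumption, not the workbook's
# stated toolchain).
library(geomorph)

data(plethodon)               # 12 2D landmarks for 40 salamander specimens
gpa <- gpagen(plethodon$land) # generalised Procrustes analysis
pca <- gm.prcomp(gpa$coords)  # principal components of the aligned shapes

summary(pca)
plot(pca)                     # PC1 vs PC2 of shape variation
```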
Cool Informatics Tools and Services for Biomedical Research (David Ruau)
This document provides an overview of bioinformatics tools and services for analyzing big data in biomedical research. It discusses traditional bioinformatics tools, analyzing genomic data from microarrays and next-generation sequencing without and with code, interpreting results using protein interaction networks and pathways, tools for data storage, cleaning and visualization, and making research reproducible. Galaxy, R, and programming are presented as useful for automated, reproducible analysis of large genomic datasets.
Spark Summit Europe: Share and analyse genomic data at scale (Andy Petrella)
Share and analyse genomic data at scale with Spark, Adam, Tachyon & the Spark Notebook. The talk covers a sharp intro to genomics data, the challenges involved, distributed machine learning to the rescue, projects with distributed teams, research as a long process, and working towards maximum sharing for efficiency.
Introduction to Biological Network Analysis and Visualization with Cytoscape ... (Keiichiro Ono)
Introduction to biological network analysis and visualization with Cytoscape (using the latest version 3.4).
This is the first half of the Applied Bioinformatics lecture at TSRI.
Sarah Guido gave a presentation on analyzing data with Python. She discussed several Python tools for preprocessing, analysis, and visualization including Pandas for data wrangling, scikit-learn for machine learning, NLTK for natural language processing, MRjob for processing large datasets in parallel, and ggplot for visualization. For each tool, she provided examples and use cases. She emphasized that the best tools depend on the type of data and analysis needs.
This document discusses challenges and opportunities for integrating large, heterogeneous biological data sets. It outlines the types of analysis and discovery that could be enabled, such as comparing data across studies. Technical challenges include incompatible identifiers and schemas between data sources. Common solutions attempt standardization but have limitations. The document examines Amazon's approach as a model, with principles like exposing all data through programmatic interfaces. It argues for a "platform" approach and combining data-driven and model-driven analysis to gain new insights. Developing services with end users in mind could help maximize data reuse.
This document is a resume for Gautam Machiraju. It summarizes his education and research experience. He has a B.A. in Applied Mathematics from UC Berkeley with a concentration in Mathematical Biology and a minor in Bioengineering. He has worked on several research projects involving mathematical modeling and data analysis related to biology and healthcare. These include modeling cancer biomarker shedding kinetics, mining literature for biomarker data, and using deep learning on patient time-series data. He has strong skills in programming, mathematics, and bioinformatics.
This document is a resume for Gautam Machiraju. It summarizes his education and research experience. He has a B.A. in Applied Mathematics from UC Berkeley with a concentration in Mathematical Biology and a minor in Bioengineering. He has worked on several research projects involving mathematical modeling and data analysis related to cancer biomarkers, genomics, and proteomics. His skills include programming, mathematics, data science, and laboratory techniques. He is currently a bioinformatics research assistant at Stanford University School of Medicine.
Towards reproducibility and maximally-open data (Pablo Bernabeu)
Presented at the Open Scholarship Prize Competition 2021, organised by Open Scholarship Community Galway.
Video of the presentation: https://nuigalway.mediaspace.kaltura.com/media/OSW2021A+OSCG+Open+Scholarship+Prize+-+The+Final!/1_d7ekd3d3/121659351#t=56:08
This document provides an overview of cloud bioinformatics and the challenges of analyzing large datasets from next-generation sequencing (NGS). It discusses how bioinformatics uses computational methods to study genes, proteins, and genomes. The advent of NGS has led to huge datasets that require high-performance computing. Cloud computing provides access to pooled computing resources in a cost-effective manner and helps address the bioinformatics challenge of assembling and analyzing NGS data. The document also outlines common bioinformatics software and resources available through WestGrid and Galaxy that can be used for sequence assembly, annotation, and other applications.
The document discusses how computation can accelerate the generation of new knowledge by enabling large-scale collaborative research and extracting insights from vast amounts of data. It provides examples from astronomy, physics simulations, and biomedical research where computation has allowed more data and researchers to be incorporated, advancing various fields more quickly over time. Computation allows for data sharing, analysis, and hypothesis generation at scales not previously possible.
Keynote on software sustainability given at the 2nd Annual Netherlands eScience Symposium, November 2014.
Based on the article:
Carole Goble, "Better Software, Better Research", IEEE Internet Computing, vol. 18, no. 5 (Sept.-Oct. 2014), pp. 4-8, IEEE Computer Society.
http://www.computer.org/csdl/mags/ic/2014/05/mic2014050004.pdf
http://doi.ieeecomputersociety.org/10.1109/MIC.2014.88
http://www.software.ac.uk/resources/publications/better-software-better-research
FAIRification experience: clarifying the semantics of data matrices (Pistoia Alliance)
This webinar presents the Statistics Ontology (STATO), a semantic framework to support the creation of standardized analysis reports and to help with the review of results in the form of data matrices. STATO includes a hierarchy of classes and a vocabulary for annotating statistical methods used in life, natural and biomedical sciences investigations, text mining and statistical analyses.
The advent of social networks has changed research in computer science. Massive volumes of data are now produced in the form of Twitter, Facebook, emails and IoT (Internet of Things) streams, so the storage and analysis of these data has become a great challenge for researchers. Traditional frameworks have failed at processing data of this size. R is an open-source programming framework developed for the analysis of large data that can deliver good accuracy, and it offers the opportunity to implement the approach directly in the R programming language. This paper presents a study on the use of R for the classification of large social network data: the Naïve Bayes algorithm is used to classify a large Twitter data set. The experiments show that enormous amounts of data can be classified efficiently using the R framework, with promising results.
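As a concrete, hedged illustration of the approach summarised above (the paper's Twitter corpus and feature engineering are not reproduced here, and the tweets below are invented), a Naive Bayes text classifier in R can be set up along these lines with the e1071 package:

```r
# Minimal sketch of Naive Bayes classification of short texts in R,
# in the spirit of the study above; tweets and labels are made up.
library(e1071)

tweets <- c("great product love it",   "terrible service never again",
            "love the new update",     "worst experience ever",
            "really happy with this",  "awful very disappointed")
labels <- factor(c("pos", "neg", "pos", "neg", "pos", "neg"))

# Crude bag-of-words features: does each tweet contain each vocabulary term?
vocab <- unique(unlist(strsplit(tweets, " ")))
X <- t(sapply(tweets, function(txt) as.integer(vocab %in% strsplit(txt, " ")[[1]])))
colnames(X) <- vocab
X <- as.data.frame(lapply(as.data.frame(X), factor, levels = c(0, 1)))

model <- naiveBayes(X, labels)
predict(model, X)   # resubstitution predictions on the toy data
```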
The document describes tools developed by the ENCODE project to improve access and reproducibility of ENCODE data and analysis pipelines. A scalable metadata-driven system and REST API have been implemented to provide access to ENCODE data files and metadata. The structured metadata describes analysis pipelines, software, and steps to support reproducibility. The REST API and metadata standards can be used by researchers to further analyze ENCODE data and integrate their own data.
grizzly - informal overview - PyData Boston 2013 (adrianheilbut)
The document summarizes the motivation, goals, and core ideas behind the grizzly statistical analysis framework. It discusses how biological and scientific data is increasingly complex with multidimensional, hierarchical, and temporal structures. It outlines desiderata for reproducible, efficient analysis including correctness, verifiability, and interactivity. The document presents strategies like separating concerns and abstracting data management. It draws inspiration from fields like OLAP and scientific workflows. Core ideas include representing data as multidimensional cubes with semantic types and modeling computation as directed acyclic graphs of typed functions.
1. The document discusses how a biologist, Marco Roos, became interested in e-science through his work in molecular and cellular biology, bioinformatics, and data integration projects.
2. Roos describes how e-science allows for collaboration between different experts and disciplines through technologies like workflows, semantic web, and virtual laboratories.
3. Roos emphasizes that e-science should empower scientists by making tools and resources easy to use, share, and build upon so that scientists can focus on scientific problems rather than technical challenges.
Sharing massive data analysis: from provenance to linked experiment reports (Alban Gaignard)
The document discusses scientific workflows, provenance, and linked data. It covers:
1) Scientific workflows can automate data analysis at scale, abstract complex processes, and capture provenance for transparency.
2) Provenance represents the origin and history of data and can be represented using standards like PROV. It allows reasoning about how results were produced.
3) Capturing and publishing provenance as linked open data can help make scientific results more reusable and queryable, but challenges remain around multi-site studies and producing human-readable reports.
Data Science Provenance: From Drug Discovery to Fake Fans (Jameel Syed)
Knowledge work adds value to raw data; how this activity is performed is critical for how reliably results can be reproduced and scrutinized. With a brief diversion into epistemology, the presentation will outline the challenges for practitioners and consumers of Big Data analysis, and demonstrate how these were tackled at Inforsense (life sciences workflow analytics platform) and Musicmetric (social media analytics for music).
The talk covers the following issues with concrete examples:
- Representations of provenance
- Considerations to allow analysis computation to be recreated
- Reliable collection of noisy data from the internet
- Archiving of data and accommodating retrospective changes
- Using linked data to direct Big Data analytics
This document summarizes computational analysis methods for the expectation values commonly reported by bioinformatics databases. It discusses tools such as BLAST and FASTA, and databases such as NCBI's, that allow sequences to be queried and analyzed. The expectation value estimates how many matches of at least that quality would be expected by chance alone, so lower values indicate more significant matches. These tools and databases facilitate customizable extraction of sequence data to enable further analysis and knowledge discovery in bioinformatics.
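For context, the expectation value reported by BLAST follows the standard Karlin-Altschul formula (a textbook result added here for clarity, not a statement quoted from the summarized document):

```latex
% Expected number of chance alignments scoring at least S:
%   m = effective query length, n = effective database length,
%   K and \lambda = parameters of the scoring system.
E = K \, m \, n \, e^{-\lambda S}
```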
Build it and they will come: An R interface to the Leiden clustering algorithm with reticulate
Presentation at Bio"Pack"athon 2020 #1
Date: 25/02/2020
Venue: RIKEN Yokohama, Japan
Primary language: English
https://sites.google.com/view/biopackathon/biopackathon20201
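The talk above describes wrapping the Python Leiden implementation via reticulate. As a rough, hedged stand-in that avoids the Python dependency (and is not the package interface presented in the talk), recent igraph releases expose a native Leiden implementation:

```r
# Hedged sketch: Leiden community detection on a toy graph in R, using
# igraph's built-in cluster_leiden() (igraph >= 1.2.7) as a stand-in for
# the reticulate-based interface presented at the Bio"Pack"athon talk.
library(igraph)

set.seed(1)
g  <- sample_gnp(100, p = 0.05)                        # random toy graph
cl <- cluster_leiden(g, objective_function = "modularity")

membership(cl)                    # community assignment for each node
length(unique(membership(cl)))    # number of communities found
```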
Learning from and teaching in communities
コミュニティーで学び、そこで教えた事
Can we bring “Software Carpentry” to Japan? 「ソフトウェア・カーペントリー」を日本でやりませんか?
Presentation in English (with slides in English and Japanese)
#TokyoR 73rd Meeting 2018-10-20
Tom Kelly (RIKEN IMS, Yokohama, Japan)
- The document discusses whether administering high-dose antimicrobial chemotherapy prevents the evolution of antibiotic resistance.
- It presents two opposing hypotheses - the "Hit Hard" hypothesis that higher doses eliminate bacteria more quickly, limiting resistance, versus the hypothesis that higher doses indirectly select for resistant strains by removing competition.
- Through mathematical modeling, it finds the risk of highly resistant strains emerging is highest at intermediate doses and lowest at either the maximum safe dose or minimum effective dose. The optimal strategy depends on specific infection parameters.
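The intermediate-dose effect described in the last point can be illustrated with a toy model. The sketch below is emphatically not the study's model: it is a deliberately simple two-strain competition simulation (sensitive and resistant bacteria sharing one carrying capacity, with the drug killing sensitive cells much faster), written with the deSolve package, in which sweeping the dose shows the resistant population ending highest at intermediate doses.

```r
# Toy illustration only (NOT the summarized study's model): two-strain
# competition under antimicrobial treatment, solved with deSolve.
library(deSolve)

two_strain <- function(t, state, parms) {
  with(as.list(c(state, parms)), {
    N  <- S + R
    dS <- r_s * S * (1 - N / K) - dose * kill_s * S   # sensitive strain
    dR <- r_r * R * (1 - N / K) - dose * kill_r * R   # resistant strain
    list(c(dS, dR))
  })
}

resistant_after_treatment <- function(dose) {
  parms <- c(r_s = 1.0, r_r = 0.9,       # resistance carries a small fitness cost
             kill_s = 1.0, kill_r = 0.3, # drug is far less effective on R
             K = 1e9, dose = dose)
  out <- ode(y = c(S = 1e6, R = 1e2), times = seq(0, 28, by = 0.1),
             func = two_strain, parms = parms)
  unname(tail(out[, "R"], 1))            # resistant population at day 28
}

doses  <- seq(0, 4, by = 0.25)
finalR <- sapply(doses, resistant_after_treatment)
plot(doses, finalR, type = "b", log = "y",
     xlab = "dose", ylab = "resistant cells after 28 days")
```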
This document describes a bioinformatic methodology to predict synthetic lethal drug targets for cancers deficient in the tumor suppressor gene E-cadherin (CDH1). The methodology analyzes gene expression data from public databases to identify genes whose expression levels correlate with CDH1. Known synthetic lethal interactions, like between BRCA and PARP1, were correctly predicted. Several candidate synthetic lethal partners of CDH1 were identified and grouped into biological pathways. This bioinformatic approach can efficiently predict synthetic lethal targets to guide experimental validation and help develop targeted therapies for CDH1-deficient cancers.
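A hedged sketch of the core correlation step described above follows: ranking genes by how strongly their expression tracks CDH1 across samples. The expression matrix and gene names are simulated placeholders, and this shows only the simplest flavour of the idea, not the published SLIPT method itself.

```r
# Toy sketch: rank genes by correlation of their expression with CDH1 across
# tumour samples. Data are simulated placeholders; this illustrates the
# correlation-ranking idea only, not the SLIPT method described above.
set.seed(1)
expr <- matrix(rnorm(200 * 50), nrow = 200,
               dimnames = list(paste0("gene", 1:200), paste0("sample", 1:50)))
rownames(expr)[1] <- "CDH1"

cdh1  <- expr["CDH1", ]
stats <- apply(expr[rownames(expr) != "CDH1", ], 1, function(g) {
  test <- cor.test(g, cdh1, method = "spearman")
  c(rho = unname(test$estimate), p = test$p.value)
})
stats <- as.data.frame(t(stats))
stats$fdr <- p.adjust(stats$p, method = "BH")

head(stats[order(stats$rho), ])   # candidate partners to inspect further
```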
Tom Kelly is a PhD candidate in genetics who uses various bioinformatics tools for data analysis and visualization, including R, Python, and Bash Shell. His favorite tool is R due to its vast array of packages supporting data analysis and visualization for computational biology. He is interested in emerging presentation tools like Prezi and Microsoft Sway that offer alternatives to PowerPoint. Kelly's research focuses on using the synthetic lethal concept to indirectly target tumor suppressor genes for personalized cancer therapy through computational analysis of genetic interactions and experimental screening.
eResearch Feb 2016: Sifting the needles in the haystack (Tom Kelly)
This document summarizes a bioinformatics analysis that used resampling techniques to compare predicted synthetic lethal gene interactions to experimental screening data. The analysis predicted synthetic lethal partners for the CDH1 gene in breast cancer using a method called SLIPT. Pathway enrichment analysis found several pathways enriched in both the SLIPT predictions and their intersections with experimental screens, including cell cycle, DNA repair, and WNT signaling pathways. Resampling by permutation was used to generate a null distribution of pathway enrichments and to test whether overlaps between SLIPT predictions and screens were higher than expected by chance. Several pathways, including translation, nonsense-mediated decay, and immune pathways, were significantly enriched in both datasets after multiple testing correction. The analysis provides computational validation of the predicted synthetic lethal interactions.
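The permutation step can be made concrete with a small, self-contained example. The sketch below uses made-up gene identifiers and set sizes; it demonstrates only the resampling logic (is the overlap between predicted partners and screen hits larger than expected by chance?), not the actual pathway-level analysis.

```r
# Toy sketch of the resampling idea: compare the observed overlap between
# SLIPT-predicted genes and screen hits against a permutation null.
# Gene identifiers and set sizes are illustrative only.
set.seed(42)
universe   <- paste0("gene", 1:5000)
predicted  <- sample(universe, 400)    # e.g. SLIPT candidate partners of CDH1
screen_hit <- sample(universe, 300)    # e.g. hits from an RNAi screen

observed <- length(intersect(predicted, screen_hit))

# Null distribution: overlap of equally sized random gene sets with the screen
null_overlap <- replicate(10000,
  length(intersect(sample(universe, length(predicted)), screen_hit)))

p_value <- (sum(null_overlap >= observed) + 1) / (length(null_overlap) + 1)

hist(null_overlap, main = "Permutation null", xlab = "overlap size")
abline(v = observed, col = "red")
p_value
```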
Bioinformatic Analysis of Synthetic Lethality in Breast Cancer (Tom Kelly)
This document summarizes a bioinformatic analysis of synthetic lethal genetic interactions in breast cancer. It describes how the researchers used gene expression data from breast cancer samples to predict potential synthetic lethal gene pairs through statistical testing. Many statistically significant interactions were found, including known synthetic lethal partners. The researchers validated some predictions and discuss applications for targeted cancer therapies and chemoprevention. High performance computing resources were crucial for analyzing large genome-scale datasets.
Hidden in Plain Sight - The Genetics of Zombies (Tom Kelly)
Tom Kelly argues that eugenics programs aimed at eliminating "zombie genes" would be ineffective and unethical. While some view zombies as a genetic disease, others see it as a contagious condition, with different policy implications. Historically, reactions to outbreaks have varied depending on whether the condition was seen as genetic or contagious, with genetic views sometimes leading to misguided and harmful eugenics policies rather than coexistence and understanding between the affected and unaffected.
AI in the Workplace Reskilling, Upskilling, and Future Work.pptxSunil Jagani
Discover how AI is transforming the workplace and learn strategies for reskilling and upskilling employees to stay ahead. This comprehensive guide covers the impact of AI on jobs, essential skills for the future, and successful case studies from industry leaders. Embrace AI-driven changes, foster continuous learning, and build a future-ready workforce.
Read More - https://bit.ly/3VKly70
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor IvaniukFwdays
At this talk we will discuss DDoS protection tools and best practices, discuss network architectures and what AWS has to offer. Also, we will look into one of the largest DDoS attacks on Ukrainian infrastructure that happened in February 2022. We'll see, what techniques helped to keep the web resources available for Ukrainians and how AWS improved DDoS protection for all customers based on Ukraine experience
Getting the Most Out of ScyllaDB Monitoring: ShareChat's TipsScyllaDB
ScyllaDB monitoring provides a lot of useful information. But sometimes it’s not easy to find the root of the problem if something is wrong or even estimate the remaining capacity by the load on the cluster. This talk shares our team's practical tips on: 1) How to find the root of the problem by metrics if ScyllaDB is slow 2) How to interpret the load and plan capacity for the future 3) Compaction strategies and how to choose the right one 4) Important metrics which aren’t available in the default monitoring setup.
How information systems are built or acquired puts information, which is what they should be about, in a secondary place. Our language adapted accordingly, and we no longer talk about information systems but applications. Applications evolved in a way to break data into diverse fragments, tightly coupled with applications and expensive to integrate. The result is technical debt, which is re-paid by taking even bigger "loans", resulting in an ever-increasing technical debt. Software engineering and procurement practices work in sync with market forces to maintain this trend. This talk demonstrates how natural this situation is. The question is: can something be done to reverse the trend?
This talk will cover ScyllaDB Architecture from the cluster-level view and zoom in on data distribution and internal node architecture. In the process, we will learn the secret sauce used to get ScyllaDB's high availability and superior performance. We will also touch on the upcoming changes to ScyllaDB architecture, moving to strongly consistent metadata and tablets.
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...Fwdays
Direct losses from downtime in 1 minute = $5-$10 thousand dollars. Reputation is priceless.
As part of the talk, we will consider the architectural strategies necessary for the development of highly loaded fintech solutions. We will focus on using queues and streaming to efficiently work and manage large amounts of data in real-time and to minimize latency.
We will focus special attention on the architectural patterns used in the design of the fintech system, microservices and event-driven architecture, which ensure scalability, fault tolerance, and consistency of the entire system.
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsDianaGray10
Join us to learn how UiPath Apps can directly and easily interact with prebuilt connectors via Integration Service--including Salesforce, ServiceNow, Open GenAI, and more.
The best part is you can achieve this without building a custom workflow! Say goodbye to the hassle of using separate automations to call APIs. By seamlessly integrating within App Studio, you can now easily streamline your workflow, while gaining direct access to our Connector Catalog of popular applications.
We’ll discuss and demo the benefits of UiPath Apps and connectors including:
Creating a compelling user experience for any software, without the limitations of APIs.
Accelerating the app creation process, saving time and effort
Enjoying high-performance CRUD (create, read, update, delete) operations, for
seamless data management.
Speakers:
Russell Alfeche, Technology Leader, RPA at qBotic and UiPath MVP
Charlie Greenberg, host
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...DanBrown980551
This LF Energy webinar took place June 20, 2024. It featured:
-Alex Thornton, LF Energy
-Hallie Cramer, Google
-Daniel Roesler, UtilityAPI
-Henry Richardson, WattTime
In response to the urgency and scale required to effectively address climate change, open source solutions offer significant potential for driving innovation and progress. Currently, there is a growing demand for standardization and interoperability in energy data and modeling. Open source standards and specifications within the energy sector can also alleviate challenges associated with data fragmentation, transparency, and accessibility. At the same time, it is crucial to consider privacy and security concerns throughout the development of open source platforms.
This webinar will delve into the motivations behind establishing LF Energy’s Carbon Data Specification Consortium. It will provide an overview of the draft specifications and the ongoing progress made by the respective working groups.
Three primary specifications will be discussed:
-Discovery and client registration, emphasizing transparent processes and secure and private access
-Customer data, centering around customer tariffs, bills, energy usage, and full consumption disclosure
-Power systems data, focusing on grid data, inclusive of transmission and distribution networks, generation, intergrid power flows, and market settlement data
The Department of Veteran Affairs (VA) invited Taylor Paschal, Knowledge & Information Management Consultant at Enterprise Knowledge, to speak at a Knowledge Management Lunch and Learn hosted on June 12, 2024. All Office of Administration staff were invited to attend and received professional development credit for participating in the voluntary event.
The objectives of the Lunch and Learn presentation were to:
- Review what KM ‘is’ and ‘isn’t’
- Understand the value of KM and the benefits of engaging
- Define and reflect on your “what’s in it for me?”
- Share actionable ways you can participate in Knowledge Capture & Transfer
As AI pushes into IT, I have been asking myself, as an “infrastructure container Kubernetes guy”, how this fancy AI technology gets managed from an infrastructure operations point of view. Is it possible to apply our beloved cloud-native principles to it as well? What benefits could the two technologies bring to each other?
Let me take these questions and guide you on a short journey through existing deployment models and use cases for AI software. Using practical examples, we will discuss what cloud/on-premise strategy we may need in order to apply AI to our own infrastructure and make it work from an enterprise perspective. I will give an overview of the infrastructure requirements and technologies that could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insight into the approaches I have already got working in practice.
Keywords: AI, Containers, Kubernetes, Cloud Native
Event Link: https://meine.doag.org/events/cloudland/2024/agenda/#agendaId.4211
"NATO Hackathon Winner: AI-Powered Drug Search", Taras KlobaFwdays
This session details how PostgreSQL features and Azure AI Services can be used effectively to significantly enhance the search functionality of any application.
In this session, we'll share insights on how we used PostgreSQL to facilitate precise searches across multiple fields in our mobile application. The techniques include using LIKE and ILIKE operators and integrating a trigram-based search to handle potential misspellings, thereby increasing the search accuracy.
We'll also discuss how the azure_ai extension on PostgreSQL databases in Azure and Azure AI Services were utilized to create vectors from user input, a feature beneficial when users wish to find specific items based on text prompts. While our application's case study involves a drug search, the techniques and principles shared in this session can be adapted to improve search functionality in a wide range of applications. Join us to learn how PostgreSQL and Azure AI can be harnessed to enhance your application's search capability.
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving
What began over 115 years ago as a supplier of precision gauges to the automotive industry has evolved into an industry leader in the manufacture of product branding, automotive cockpit trim, and decorative appliance trim. Value-added services include in-house Design, Engineering, Program Management, Test Lab and Tool Shops.
What is an RPA CoE? Session 2 – CoE RolesDianaGray10
In this session, we will review the players involved in the CoE and how each role impacts opportunities.
Topics covered:
• What roles are essential?
• What place in the automation journey does each role play?
Speaker:
Chris Bolin, Senior Intelligent Automation Architect Anika Systems
What is an RPA CoE? Session 1 – CoE VisionDianaGray10
In the first session, we will review the organization's vision and how it shapes the CoE structure.
Topics covered:
• The role of a steering committee
• How do the organization’s priorities determine CoE Structure?
Speaker:
Chris Bolin, Senior Intelligent Automation Architect Anika Systems
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
My Research Journey with R
1. My Research Journey with R
How learning, using, and teaching R has helped my career in the life sciences
#TokyoR 2018-7-15
Tom Kelly
Postdoctoral Researcher
Epigenome Technology Exploration Unit
RIKEN Centre for Integrative Medical Sciences
Yokohama, Japan
Kelly, Tom
Postdoctoral researcher
Epigenome Technology Exploration Unit
RIKEN Center for Integrative Medical Sciences (National Research and Development Agency)
Yokohama, Japan
2. My Research Journey with R
Why I chose R to do (the vast majority of) my research
What I use R for in my research and what I’ve learned along the way
How my workflow has changed and package recommendations
Future challenges and hot topics
3. My Research Journey with R
Introduction
Studied at the University of Otago, Dunedin, New Zealand
Majored in genetics and mathematics
Focused on “bioinformatics” in postgrad
PhD on gene interactions in breast cancer for “precision
medicine” supervised by A/Prof. Mik Black (a statistician)
Worked at Tohoku University, Sendai, Miyagi Prefecture
Assisted with academic writing and data analysis in
Neuroscience and Bioengineering Laboratories
Taught statistical analysis and programming in R to
international postgraduate students (in English)
Currently a postdoc at RIKEN, Yokohama campus
Part of a Plant Stem Cell Analysis consortium
Focusing on single-cell genomics technologies
Continuing to develop new analysis techniques and
pipelines driven by new technology
Tom Kelly
Twitter: @tomkXY
GitHub: TomKellyGenetics
4. Why I Started With R
My supervisor was a statistician and a good example of how
R could be used in my field
An opportunity to learn new (transferable) computational
skills and work with “Big Data” (rather than theory or
experiments)
Free and Open-Source
A large (and growing) user community to engage with (and
seek help from) online and at events
A huge ecosystem of packages to do statistical analyses and
plotting (especially in the field of genomics/bioinformatics)
CRAN, Bioconductor, GitHub
Mik Black, University of Otago, Dunedin, New Zealand
5. What I Use R For
Pretty much everything . . .
Analysis of gene expression patterns (differential expression,
molecular subtypes, cluster analysis); a minimal sketch follows below
Pathway (functional group) enrichment and network (graph
structure) analysis
Develop and test novel analysis methods for genomics data
Analysis of heterogeneity (variation) at the single-cell level
(classification and markers of cell types)
Integrative “omics” analysis across data from different
techniques (genetic variant, mutation, gene expression,
protein, metabolism, epigenetic regulatory states, chromatin
structure)
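As a minimal illustration of the differential expression analysis above, here is a hedged sketch using limma (one of the packages listed on the next slide); the expression matrix and two-group design are simulated placeholders, not real data.
library(limma)

# Simulated placeholder data: 400 genes x 4 samples, two groups
expr  <- matrix(rnorm(400 * 4), nrow = 400,
                dimnames = list(paste0("gene", 1:400), paste0("sample", 1:4)))
group <- factor(c("tumour", "tumour", "normal", "normal"))

design <- model.matrix(~ group)        # intercept + tumour-vs-normal coefficient
fit    <- eBayes(lmFit(expr, design))  # gene-wise linear models, moderated t-statistics
topTable(fit, coef = 2, number = 10)   # top differentially expressed genes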
6. How I Use R
Data manipulation and statistical analysis
Built-in functions (“base R”, stats) and distributions (mvtnorm, extraDistr)
data.table (fread) and tibble for enhanced “data frames”
igraph for graph theory, pathway structure, and network analysis
Parallel computing with snow and OpenMPI (simulations and permutations)
Accessing genomics annotation and analysis packages
Genomic data (e.g., org.Hs.eg.db, reactome.db)
Statistical analysis (e.g., limma, edgeR)
Plotting and data visualisation
gplots (heatmap.2 and venn diagram), vioplot, and built-in plots (scatterplot,
lineplot, boxplot, histograms, titles, axes, legends, etc)
Dimension reduction techniques: SVD, PCA, tSNE (Rtsne), UMAP (umap)
Many of these are also provided in the “tidyverse”
readr, tidyr and dplyr for data manipulation
ggplot2 for visualisation
More and more and more utilities and packages from GitHub
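A brief, hypothetical sketch of how these pieces fit together in practice, assuming a counts.csv file with gene, sample, and count columns: fast reading with data.table's fread, summarising with dplyr, and plotting with ggplot2.
library(data.table)
library(dplyr)
library(ggplot2)

counts <- fread("counts.csv")              # fast reading of a (hypothetical) counts table

library_sizes <- counts %>%
  group_by(sample) %>%                     # per-sample summaries
  summarise(total    = sum(count),
            detected = sum(count > 0))

ggplot(library_sizes, aes(x = sample, y = total)) +
  geom_col() +
  labs(title = "Library size per sample", y = "Total counts")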
7. How I Use R
Shiny Apps
Build and share interactive apps
Even if you can’t write JavaScript
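To show how little code a Shiny app needs, here is a toy app of my own (an illustration, not one of the apps shown in the slides) with a slider controlling a simulated scatterplot.
library(shiny)

ui <- fluidPage(
  titlePanel("Toy single-cell scatter"),
  sliderInput("n", "Number of cells", min = 10, max = 500, value = 100),
  plotOutput("scatter")
)

server <- function(input, output) {
  output$scatter <- renderPlot({
    plot(rnorm(input$n), rnorm(input$n),
         xlab = "Dimension 1", ylab = "Dimension 2",
         main = paste(input$n, "simulated cells"))
  })
}

shinyApp(ui, server)  # run locally, or deploy (e.g. to shinyapps.io)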
14. How I Use R
Package development and code release with devtools
Develop R packages with devtools and roxygen2 (documentation)
Share functions and release code as a research output
Release: CRAN, Bioconductor, GitHub, ROpenSci
Cite: Zenodo, Journal of Open Source Software, Journal of Statistical Software
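A condensed sketch of that devtools/roxygen2 workflow; the package name and the example function are placeholders.
library(devtools)

create_package("mypackage")  # skeleton: DESCRIPTION, NAMESPACE, R/ (re-exported from usethis)

# In R/hello.R, document functions with roxygen2 comments, e.g.:
#   #' Greet a user
#   #' @param name Character, the name to greet.
#   #' @export
#   hello <- function(name) paste("Hello,", name)

document()   # generate man/ pages and NAMESPACE entries with roxygen2
check()      # run R CMD check before release
install()    # install locally; then release to CRAN/Bioconductor or push to GitHub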
15. How I Use R
Packages I’ve developed
Data visualisation
heatmap.2x for annotated heatmaps, extending heatmap.2 (gplots)
vioplot an enhanced version (proposed as version 0.3)
plot.igraph plotting directional graph structures, including
inhibitory links
Network analysis using igraph
graphsim simulate gene expression from pathway graph structures
pathway.structure.permutation perform permutation analysis
of gene candidates in a pathway structure
info.centrality compute network efficiency and information
centrality
igraph.extensions install all of the above
Gene expression analysis
slipt detect “synthetic lethal” gene interactions in expression data
DoubletDetection R implementation of a tool to detect technical
errors in single-cell RNA-Seq data
Developing packages has become a part of how I analyse data
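To give a feel for the inputs these packages work with, here is a toy example of my own (not code from the packages themselves): a small hypothetical signalling pathway as an igraph object, plus a violin-plot comparison of simulated values with vioplot.
library(igraph)
library(vioplot)

# A hypothetical five-gene signalling pathway as a directed graph
pathway <- graph_from_literal(RTK -+ RAS, RAS -+ RAF, RAF -+ MEK, MEK -+ ERK)
plot(pathway, edge.arrow.size = 0.5)

# Simulated expression values for two groups, compared with violin plots
expr_wt  <- rnorm(100, mean = 5)
expr_mut <- rnorm(100, mean = 7)
vioplot(expr_wt, expr_mut, names = c("wild-type", "mutant"))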
16. How I Use R
How my workflow has changed
Interactive use with the RStudio IDE (which I still use)
Using Projects (especially to develop packages)
Running scripts in the terminal (in the background with nohup) on a
local PC or remote servers
Developing (and documenting) functions and packages that I intend to
reuse and share
17. How I Use R
Biggest challenges
Being an early-adopter is hard
(and sometimes worth it)
Taking a project using different tools to your team is hard
(but there is help online!)
Keeping up with the latest tools in the field
(but there could be worse problems)
18. Engage with the community
Online (beyond the “help” system)
StackOverflow/StackExchange (Q&A)
GitHub (Share code)
Twitter (#Rstats #Rlang)
R blogs
Google (everyone does it!)
Workshops and community events
Software Carpentry / Data Carpentry
(swcarpentry @thecarpentries)
Research Bazaar (ResBaz)
HackyHour
Mozilla “Study Group”
R user groups (Meetup, #TokyoR)
19. It’s not just statistics: it’s a language
Mike Sumner
Australian Antarctic Division, Antarctic Climate and Ecosystems
Hobart, Australia
Twitter: @mdsumner
GitHub: mdsumner
#RLang
21. Learning in a community
Australia
Research Bazaar (2015) Melbourne
ResBaz organisers Software Carpentry Instructors
22. Learning in a community
New Zealand
ResBaz (2016) Dunedin ResBaz (2017) Auckland
ResBaz (Feb 2018) Dunedin
ResBaz (June 2018) Dunedin
23. R is a global community
R user groups (RUGs)
Joseph Rickert (@RStudioJoe)
ResBaz events (2017)
Software Carpentry Instructors
R User Groups (Meetup)
“RLadies” Groups
24. Programming is Learning
Things I want to learn more about or do better
Project management
Tracking package versions (packrat)
Testing functions and packages with Travis CI or AppVeyor
Version control (git) and containers (docker)
Calling other languages (use the best tool for the job)
Python (reticulate), Julia (RJulia), C++ (Rcpp)
The “tidyverse” from Hadley Wickham et al
readr, tidyr, glue, dplyr, purrr, ggplot2 (gganimate, gghighlight)
Analysis techniques
Machine Learning, Statistical Learning, AI
Bayesian modelling and inference
Techniques for “single-cell” analysis (Seurat, monocle, etc)
Plotting to communicate variation and uncertainty
Colour-blind “friendly” palettes (RColorBrewer, viridis); example below
Value-suppressing uncertainty palettes (VSUP)
Interactive plots (plotly, shiny, or D3.js)
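For example, a colour-blind friendly palette is one line with the viridis scales built into ggplot2; this minimal sketch uses a built-in dataset rather than a figure from the slides.
library(ggplot2)

ggplot(faithful, aes(x = eruptions, y = waiting, colour = waiting)) +
  geom_point(size = 2) +
  scale_colour_viridis_c() +   # perceptually uniform, colour-blind friendly
  labs(title = "Old Faithful eruptions", colour = "Waiting (min)")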
27. Advice
You never stop learning R
Everyone uses Google (and that’s ok!)
Seek projects that challenge you to learn more
Code is a means to an end: keep project goals in mind!
Code together; teach together; learn together