My Research Journey with R
How learning, using, and teaching R has helped my career in the life sciences
#TokyoR 2018-7-15
Tom Kelly
Postdoctoral Researcher
Epigenome Technology Exploration Unit
RIKEN Centre for Integrative Medical Sciences
Yokohama, Japan
ケリー・トム
ポスドクで 研究者
エピゲノム技術開発ユニット
国立研究開発法人理化学研究所の生命医科学研究センター
日本の横浜市
My Research Journey with R
Why I chose R to do (the vast majority of) my research
What I use R for in my research and what I’ve learned along the way
How my workflow has changed and package recommendations
Future challenges and hot topics
Introduction
Studied at the University of Otago, Dunedin, New Zealand
Majored in genetics and mathematics
Focused on “bioinformatics” in postgrad
PhD on gene interactions in breast cancer for “precision
medicine” supervised by A/Prof. Mik Black (a statistician)
Worked at Tohoku University, Sendai, Miyagi Prefecture
Assisted with academic writing and data analysis in
Neuroscience and Bioengineering Laboratories
Taught statistical analysis and programming in R to
international postgraduate students (in English)
Currently a postdoc at RIKEN, Yokohama campus
Part of a Plant Stem Cell Analysis consortium
Focusing on single-cell genomics technologies
Continuing to develop new analysis techniques and
pipelines driven by new technology
Tom Kelly
Twitter: @tomkXY
GitHub: TomKellyGenetics
Why I Started With R
My supervisor was a statistician and a good example of how
R could be used in my field
An opportunity to learn new (transferable) computational
skills and work with “Big Data” (rather than theory or
experiments)
Free and Open-Source
A large (and growing) user community to engage with (and
seek help from) online and at events
A huge ecosystem of packages to do statistical analyses and
plotting (especially in the field of genomics/bioinformatics)
CRAN, Bioconductor, GitHub
Mik Black, University of Otago, Dunedin, New Zealand
What I Use R For
Pretty much everything . . .
Analysis of gene expression patterns (differential expression,
molecular subtypes, cluster analysis)
Pathway (functional group) enrichment and network (graph
structure) analysis
Develop and test novel analysis methods for genomics data
Analysis of heterogeneity (variation) at the single-cell level
(classification and markers of cell types)
Integrative “omics” analysis across data from different
techniques (genetic variant, mutation, gene expression,
protein, metabolism, epigenetic regulatory states, chromatin
structure)
How I Use R
Data manipulation and statistical analysis
Built-in functions (“base R”, stats) and distributions (mvtnorm, extraDistr)
data.table (fread) and tibble for enhanced “data frames”
igraph for graph theory, pathway structure, and network analysis
Parallel computing with snow and OpenMPI (simulations and permutations)
Accessing genomics annotation and analysis packages
Genomic data (e.g., org.Hs.eg.db, reactome.db)
Statistical analysis (e.g., limma, edgeR)
Plotting and data visualisation
gplots (heatmap.2 and Venn diagrams), vioplot, and built-in plots (scatterplots,
line plots, boxplots, histograms, titles, axes, legends, etc.)
Dimension reduction techniques: SVD, PCA, tSNE (Rtsne), UMAP (umap)
Many of these are also provided in the “tidyverse”
readr, tidyr and dplyr for data manipulation
ggplot2 for visualisation
More and more and more utilities and packages from GitHub
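A minimal sketch of the dimension-reduction step listed above, using only built-in functions. The simulated matrix and every variable name are assumptions for illustration, not data or code from the talk. It shows that PCA (prcomp) and SVD of the centred matrix give the same decomposition.

```r
# Simulated data (an assumption for illustration): 20 samples x 50 genes,
# with the first 10 samples shifted upwards in the first 5 genes.
set.seed(1)
expr <- matrix(rnorm(20 * 50), nrow = 20, ncol = 50)
expr[1:10, 1:5] <- expr[1:10, 1:5] + 3

# PCA with base R: centre each gene (column), then rotate onto the components.
pca <- prcomp(expr, center = TRUE, scale. = FALSE)

# The same decomposition via SVD of the centred matrix.
centred <- scale(expr, center = TRUE, scale = FALSE)
sv <- svd(centred)

# Singular values relate to component standard deviations by sqrt(n - 1).
pc_sd_from_svd <- sv$d / sqrt(nrow(expr) - 1)

# Proportion of variance explained by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Sample coordinates on the first two components (ready for a scatterplot).
head(pca$x[, 1:2])
```

The Rtsne and umap packages named above take a similar samples-by-features matrix as input, so the same simulated matrix could be reused to compare methods.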
Shiny Apps
Build and share interactive apps
Even if you can’t write JavaScript
Rmarkdown, knitr, bookdown
Package development and code release with devtools
Develop R packages with devtools and roxygen2 (documentation)
Share functions and release code as a research output
Release: CRAN, Bioconductor, GitHub, rOpenSci
Cite: Zenodo, Journal of Open Source Software, Journal of Statistical Software
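A small sketch of what a roxygen2-documented function can look like inside a package. The function row_zscore is a hypothetical helper invented for this example; it is not part of any package mentioned in the talk.

```r
#' Scale the rows of a numeric matrix to z-scores
#'
#' A hypothetical helper used only for illustration.
#'
#' @param x A numeric matrix with features (e.g. genes) in rows.
#' @return A matrix of the same dimensions with each row centred and scaled.
#' @examples
#' m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)
#' row_zscore(m)
#' @export
row_zscore <- function(x) {
  stopifnot(is.matrix(x), is.numeric(x))
  # scale() works on columns, so transpose, scale, and transpose back.
  t(scale(t(x)))
}

# Quick check on a toy matrix.
z <- row_zscore(matrix(c(1, 2, 3, 4, 5, 6), nrow = 2))
```

In a package directory, running devtools::document() turns comments like these into help pages and NAMESPACE entries.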
Packages I’ve developed
Data visualisation
heatmap.2x: annotated version of heatmap.2 (gplots)
vioplot: enhanced version (proposed version 0.3)
plot.igraph: plotting directional graph structures, including
inhibitory links
Network analysis using igraph
graphsim: simulate gene expression from pathway graph structures
pathway.structure.permutation: perform permutation analysis
of gene candidates in a pathway structure
info.centrality: compute network efficiency and information
centrality
igraph.extensions: install all of the above
Gene expression analysis
slipt: detect “synthetic lethal” gene interactions in expression data
DoubletDetection: R implementation of a tool to detect technical
errors in single-cell RNA-Seq data
Developing packages has become a part of how I analyse data
How my workflow has changed
Interactive with RStudio IDE (which I still use)
Using Projects (especially to develop packages)
Running scripts and running in the terminal (background with
nohup) on local PC or remote servers
Developing (and documenting) functions and packages that I intend
to reuse and share
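A hedged sketch of a script written to run in the background from a terminal, as described above. The file name analysis.R, the simulated data, and the permutation test itself are assumptions chosen for illustration.

```r
# analysis.R (hypothetical file name). Run from a terminal, for example:
#   nohup Rscript analysis.R 1000 > analysis.log 2>&1 &

# Read the number of permutations from the command line, defaulting to 100.
args <- commandArgs(trailingOnly = TRUE)
n_perm <- suppressWarnings(as.integer(args[1]))
if (is.na(n_perm) || n_perm < 1) n_perm <- 100L

# Simulated measurements for two groups (an assumption for illustration).
set.seed(42)
x <- rnorm(30)
y <- rnorm(30, mean = 0.5)
observed <- mean(y) - mean(x)

# Permutation test: shuffle group labels and recompute the difference in means.
pooled <- c(x, y)
perm_diffs <- replicate(n_perm, {
  shuffled <- sample(pooled)
  mean(shuffled[31:60]) - mean(shuffled[1:30])
})

# Two-sided p-value with the usual +1 correction.
p_value <- (sum(abs(perm_diffs) >= abs(observed)) + 1) / (n_perm + 1)
message("permutations: ", n_perm, "; p-value: ", signif(p_value, 3))
```

Because progress is reported with message(), it goes to standard error and ends up in the log file when both streams are redirected as in the comment at the top.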
Biggest challenges
Being an early-adopter is hard
(and sometimes worth it)
Taking on a project using different tools from your team is hard
(but there is help online!)
Keeping up with the latest tools in the field
(but there could be worse problems)
Engage with the community
Online (beyond the “help” system)
StackOverflow/StackExchange (Q&A)
GitHub (Share code)
Twitter (#Rstats #Rlang)
R blogs
Google (everyone does it!)
Workshops and community events
Software Carpentry / Data Carpentry
(swcarpentry @thecarpentries)
Research Bazaar (ResBaz)
HackyHour
Mozilla “Study Group”
R user groups (Meetup, #TokyoR)
It’s not just statistics: it’s a language
Mike Sumner, Australian Antarctic Division, Antarctic Climate and Ecosystems, Hobart, Australia
Twitter: @mdsumner
GitHub: mdsumner
#RLang
It’s not just a language: it’s a community
Learning in a community
Australia
Research Bazaar (2015) Melbourne
ResBaz organisers Software Carpentry Instructors
Learning in a community
New Zealand
ResBaz (2016) Dunedin ResBaz (2017) Auckland
ResBaz (Feb 2018) Dunedin
ResBaz (June 2018) Dunedin
R is a global community
R user groups (RUGs)
Joseph Rickert (@RStudioJoe)
ResBaz events (2017)
Software Carpentry Instructors
R User Groups (Meetup)
“RLadies” Groups
Programming is Learning
Things I want to learn more about or do better
Project management
Tracking package versions (packrat)
Testing functions and packages with Travis CI or Appveyor
Version control (git) and containers (docker)
Calling other languages (use the best tool for the job)
Python (reticulate), Julia (RJulia), C++ (Rcpp)
The “tidyverse” from Hadley Wickham et al.
readr, tidyr, glue, dplyr, purrr, ggplot2 (gganimate, gghighlight)
Analysis techniques
Machine Learning, Statistical Learning, AI
Bayesian modelling and inference
Techniques for “single-cell” analysis (Seurat, monocle, etc.)
Plotting to communicate variation and uncertainty
Colour-blind “friendly” palettes (RColorBrewer, viridis)
Value-suppressing uncertainty palettes (VSUP)
Interactive plots (plotly, shiny, or D3.js)
We can plot data
Plotting doubt is harder
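One simple way to plot doubt alongside the data is to draw confidence intervals around group means. The sketch below uses only base R, with simulated values and a viridis-style palette from hcl.colors() (assumed available, R 3.6 or later). The group names, sample sizes, and output file name are assumptions for illustration.

```r
# Simulated measurements for three groups (an assumption for illustration).
set.seed(7)
groups <- c("A", "B", "C")
n_per_group <- 12
values <- lapply(c(5, 6, 8), function(m) rnorm(n_per_group, mean = m, sd = 1.5))
names(values) <- groups

# Group means and 95% confidence intervals from the t distribution.
means <- sapply(values, mean)
sems <- sapply(values, function(v) sd(v) / sqrt(length(v)))
half_width <- qt(0.975, df = n_per_group - 1) * sems

# A colour-blind-friendly palette built into base R.
cols <- hcl.colors(length(groups), palette = "viridis")

# Draw means as points and intervals as error bars, saved to a PDF file.
out_file <- file.path(tempdir(), "uncertainty.pdf")
pdf(out_file, width = 5, height = 4)
plot(seq_along(groups), means, pch = 19, col = cols, xaxt = "n",
     xlim = c(0.5, length(groups) + 0.5),
     ylim = range(c(means - half_width, means + half_width)),
     xlab = "Group", ylab = "Mean with 95% CI")
axis(1, at = seq_along(groups), labels = groups)
arrows(seq_along(groups), means - half_width,
       seq_along(groups), means + half_width,
       angle = 90, code = 3, length = 0.05, col = cols)
invisible(dev.off())
```

Error bars are only one option; the value-suppressing palettes mentioned above encode uncertainty in the colour scale instead.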
Advice
You never stop learning R
Everyone uses Google (and that’s ok!)
Seek projects that challenge you to learn more
Code is a means to an end: keep project goals in mind!
Code together; teach together; learn together

More Related Content

What's hot

Aspects of Reproducibility in Earth Science
Aspects of Reproducibility in Earth ScienceAspects of Reproducibility in Earth Science
Aspects of Reproducibility in Earth Science
Raul Palma
 
FAIRer Research
FAIRer ResearchFAIRer Research
FAIRer Research
Carole Goble
 
4A2B2C-2013
4A2B2C-20134A2B2C-2013
OEG-Tools for supporting Ontology Engineering
OEG-Tools for supporting Ontology EngineeringOEG-Tools for supporting Ontology Engineering
OEG-Tools for supporting Ontology Engineering
María Poveda Villalón
 
Make your data great again - Ver 2
Make your data great again - Ver 2Make your data great again - Ver 2
Make your data great again - Ver 2
Daniel JACOB
 
FAIR History and the Future
FAIR History and the FutureFAIR History and the Future
FAIR History and the Future
Carole Goble
 
Research Objects, SEEK and FAIRDOM
Research Objects, SEEK and FAIRDOMResearch Objects, SEEK and FAIRDOM
Research Objects, SEEK and FAIRDOM
Carole Goble
 
Gene Ontology WormBase Workshop International Worm Meeting 2015
Gene Ontology WormBase Workshop International Worm Meeting 2015Gene Ontology WormBase Workshop International Worm Meeting 2015
Gene Ontology WormBase Workshop International Worm Meeting 2015
raymond91105
 
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...
Rothamsted Research, UK
 
FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout
Carole Goble
 
Reproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An OverviewReproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An Overview
dgarijo
 
ACS 248th Paper 67 Eureka Collaboration
ACS 248th Paper 67 Eureka CollaborationACS 248th Paper 67 Eureka Collaboration
ACS 248th Paper 67 Eureka Collaboration
Stuart Chalk
 
ISMB Workshop 2014
ISMB Workshop 2014ISMB Workshop 2014
ISMB Workshop 2014
Alejandra Gonzalez-Beltran
 
Lei_Resume-it.doc
Lei_Resume-it.docLei_Resume-it.doc
Lei_Resume-it.doc
butest
 
ACS 248th Paper 136 JSmol/JSpecView Eureka Integration
ACS 248th Paper 136 JSmol/JSpecView Eureka IntegrationACS 248th Paper 136 JSmol/JSpecView Eureka Integration
ACS 248th Paper 136 JSmol/JSpecView Eureka Integration
Stuart Chalk
 
Neo4j and bioinformatics
Neo4j and bioinformaticsNeo4j and bioinformatics
Neo4j and bioinformatics
Pablo Pareja Tobes
 
Repeatable plant pathology bioinformatic analysis: Not everything is NGS data
Repeatable plant pathology bioinformatic analysis: Not everything is NGS dataRepeatable plant pathology bioinformatic analysis: Not everything is NGS data
Repeatable plant pathology bioinformatic analysis: Not everything is NGS data
Leighton Pritchard
 
The Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of BioinformaticsThe Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of Bioinformatics
Duncan Hull
 
ICAR 2015 Poster - Araport
ICAR 2015 Poster - AraportICAR 2015 Poster - Araport
ICAR 2015 Poster - Araport
Araport
 
Zelditchetal workbookgeomorphoanalyses
Zelditchetal workbookgeomorphoanalysesZelditchetal workbookgeomorphoanalyses
Zelditchetal workbookgeomorphoanalyses
Wagner M. S. Sampaio
 

What's hot (20)

Aspects of Reproducibility in Earth Science
Aspects of Reproducibility in Earth ScienceAspects of Reproducibility in Earth Science
Aspects of Reproducibility in Earth Science
 
FAIRer Research
FAIRer ResearchFAIRer Research
FAIRer Research
 
4A2B2C-2013
4A2B2C-20134A2B2C-2013
4A2B2C-2013
 
OEG-Tools for supporting Ontology Engineering
OEG-Tools for supporting Ontology EngineeringOEG-Tools for supporting Ontology Engineering
OEG-Tools for supporting Ontology Engineering
 
Make your data great again - Ver 2
Make your data great again - Ver 2Make your data great again - Ver 2
Make your data great again - Ver 2
 
FAIR History and the Future
FAIR History and the FutureFAIR History and the Future
FAIR History and the Future
 
Research Objects, SEEK and FAIRDOM
Research Objects, SEEK and FAIRDOMResearch Objects, SEEK and FAIRDOM
Research Objects, SEEK and FAIRDOM
 
Gene Ontology WormBase Workshop International Worm Meeting 2015
Gene Ontology WormBase Workshop International Worm Meeting 2015Gene Ontology WormBase Workshop International Worm Meeting 2015
Gene Ontology WormBase Workshop International Worm Meeting 2015
 
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...
 
FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout
 
Reproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An OverviewReproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An Overview
 
ACS 248th Paper 67 Eureka Collaboration
ACS 248th Paper 67 Eureka CollaborationACS 248th Paper 67 Eureka Collaboration
ACS 248th Paper 67 Eureka Collaboration
 
ISMB Workshop 2014
ISMB Workshop 2014ISMB Workshop 2014
ISMB Workshop 2014
 
Lei_Resume-it.doc
Lei_Resume-it.docLei_Resume-it.doc
Lei_Resume-it.doc
 
ACS 248th Paper 136 JSmol/JSpecView Eureka Integration
ACS 248th Paper 136 JSmol/JSpecView Eureka IntegrationACS 248th Paper 136 JSmol/JSpecView Eureka Integration
ACS 248th Paper 136 JSmol/JSpecView Eureka Integration
 
Neo4j and bioinformatics
Neo4j and bioinformaticsNeo4j and bioinformatics
Neo4j and bioinformatics
 
Repeatable plant pathology bioinformatic analysis: Not everything is NGS data
Repeatable plant pathology bioinformatic analysis: Not everything is NGS dataRepeatable plant pathology bioinformatic analysis: Not everything is NGS data
Repeatable plant pathology bioinformatic analysis: Not everything is NGS data
 
The Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of BioinformaticsThe Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of Bioinformatics
 
ICAR 2015 Poster - Araport
ICAR 2015 Poster - AraportICAR 2015 Poster - Araport
ICAR 2015 Poster - Araport
 
Zelditchetal workbookgeomorphoanalyses
Zelditchetal workbookgeomorphoanalysesZelditchetal workbookgeomorphoanalyses
Zelditchetal workbookgeomorphoanalyses
 

Similar to My Research Journey with R

R programming for psychometrics
R programming for psychometricsR programming for psychometrics
R programming for psychometrics
Diane Talley
 
Cool Informatics Tools and Services for Biomedical Research
Cool Informatics Tools and Services for Biomedical ResearchCool Informatics Tools and Services for Biomedical Research
Cool Informatics Tools and Services for Biomedical Research
David Ruau
 
Spark Summit Europe: Share and analyse genomic data at scale
Spark Summit Europe: Share and analyse genomic data at scaleSpark Summit Europe: Share and analyse genomic data at scale
Spark Summit Europe: Share and analyse genomic data at scale
Andy Petrella
 
Introduction to Biological Network Analysis and Visualization with Cytoscape ...
Introduction to Biological Network Analysis and Visualization with Cytoscape ...Introduction to Biological Network Analysis and Visualization with Cytoscape ...
Introduction to Biological Network Analysis and Visualization with Cytoscape ...
Keiichiro Ono
 
Analyzing Data With Python
Analyzing Data With PythonAnalyzing Data With Python
Analyzing Data With Python
Sarah Guido
 
2013 nas-ehs-data-integration-dc
2013 nas-ehs-data-integration-dc2013 nas-ehs-data-integration-dc
2013 nas-ehs-data-integration-dc
c.titus.brown
 
Cv long
Cv longCv long
CV_10/17
CV_10/17CV_10/17
Towards reproducibility and maximally-open data
Towards reproducibility and maximally-open dataTowards reproducibility and maximally-open data
Towards reproducibility and maximally-open data
Pablo Bernabeu
 
Cloud bioinformatics 2
Cloud bioinformatics 2Cloud bioinformatics 2
Cloud bioinformatics 2
ARPUTHA SELVARAJ A
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and Knowledge
Ian Foster
 
Software Sustainability: Better Software Better Science
Software Sustainability: Better Software Better ScienceSoftware Sustainability: Better Software Better Science
Software Sustainability: Better Software Better Science
Carole Goble
 
Fairification experience clarifying the semantics of data matrices
Fairification experience clarifying the semantics of data matricesFairification experience clarifying the semantics of data matrices
Fairification experience clarifying the semantics of data matrices
Pistoia Alliance
 
Using R for Classification of Large Social Network Data
Using R for Classification of Large Social Network DataUsing R for Classification of Large Social Network Data
Using R for Classification of Large Social Network Data
IJCSIS Research Publications
 
2014 genome informatics Linked Data
2014 genome informatics Linked Data2014 genome informatics Linked Data
2014 genome informatics Linked Data
ENCODE-DCC
 
grizzly - informal overview - pydata boston 2013
grizzly - informal overview - pydata boston 2013 grizzly - informal overview - pydata boston 2013
grizzly - informal overview - pydata boston 2013
adrianheilbut
 
A biologist in e-Science
A biologist in e-ScienceA biologist in e-Science
A biologist in e-Science
Leiden University Medical Center
 
Sharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsSharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reports
Gaignard Alban
 
Data Science Provenance: From Drug Discovery to Fake Fans
Data Science Provenance: From Drug Discovery to Fake FansData Science Provenance: From Drug Discovery to Fake Fans
Data Science Provenance: From Drug Discovery to Fake Fans
Jameel Syed
 
D1803012022
D1803012022D1803012022
D1803012022
IOSR Journals
 

Similar to My Research Journey with R (20)

R programming for psychometrics
R programming for psychometricsR programming for psychometrics
R programming for psychometrics
 
Cool Informatics Tools and Services for Biomedical Research
Cool Informatics Tools and Services for Biomedical ResearchCool Informatics Tools and Services for Biomedical Research
Cool Informatics Tools and Services for Biomedical Research
 
Spark Summit Europe: Share and analyse genomic data at scale
Spark Summit Europe: Share and analyse genomic data at scaleSpark Summit Europe: Share and analyse genomic data at scale
Spark Summit Europe: Share and analyse genomic data at scale
 
Introduction to Biological Network Analysis and Visualization with Cytoscape ...
Introduction to Biological Network Analysis and Visualization with Cytoscape ...Introduction to Biological Network Analysis and Visualization with Cytoscape ...
Introduction to Biological Network Analysis and Visualization with Cytoscape ...
 
Analyzing Data With Python
Analyzing Data With PythonAnalyzing Data With Python
Analyzing Data With Python
 
2013 nas-ehs-data-integration-dc
2013 nas-ehs-data-integration-dc2013 nas-ehs-data-integration-dc
2013 nas-ehs-data-integration-dc
 
Cv long
Cv longCv long
Cv long
 
CV_10/17
CV_10/17CV_10/17
CV_10/17
 
Towards reproducibility and maximally-open data
Towards reproducibility and maximally-open dataTowards reproducibility and maximally-open data
Towards reproducibility and maximally-open data
 
Cloud bioinformatics 2
Cloud bioinformatics 2Cloud bioinformatics 2
Cloud bioinformatics 2
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and Knowledge
 
Software Sustainability: Better Software Better Science
Software Sustainability: Better Software Better ScienceSoftware Sustainability: Better Software Better Science
Software Sustainability: Better Software Better Science
 
Fairification experience clarifying the semantics of data matrices
Fairification experience clarifying the semantics of data matricesFairification experience clarifying the semantics of data matrices
Fairification experience clarifying the semantics of data matrices
 
Using R for Classification of Large Social Network Data
Using R for Classification of Large Social Network DataUsing R for Classification of Large Social Network Data
Using R for Classification of Large Social Network Data
 
2014 genome informatics Linked Data
2014 genome informatics Linked Data2014 genome informatics Linked Data
2014 genome informatics Linked Data
 
grizzly - informal overview - pydata boston 2013
grizzly - informal overview - pydata boston 2013 grizzly - informal overview - pydata boston 2013
grizzly - informal overview - pydata boston 2013
 
A biologist in e-Science
A biologist in e-ScienceA biologist in e-Science
A biologist in e-Science
 
Sharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsSharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reports
 
Data Science Provenance: From Drug Discovery to Fake Fans
Data Science Provenance: From Drug Discovery to Fake FansData Science Provenance: From Drug Discovery to Fake Fans
Data Science Provenance: From Drug Discovery to Fake Fans
 
D1803012022
D1803012022D1803012022
D1803012022
 

More from Tom Kelly

Presentation feb-2020-hackathon-public
Presentation feb-2020-hackathon-publicPresentation feb-2020-hackathon-public
Presentation feb-2020-hackathon-public
Tom Kelly
 
Presentation oct-2018-tokyo r
Presentation oct-2018-tokyo rPresentation oct-2018-tokyo r
Presentation oct-2018-tokyo r
Tom Kelly
 
Tom kelly genetics journal club 2016
Tom kelly   genetics journal club 2016Tom kelly   genetics journal club 2016
Tom kelly genetics journal club 2016
Tom Kelly
 
QMB_Poster_Tom_Kelly
QMB_Poster_Tom_KellyQMB_Poster_Tom_Kelly
QMB_Poster_Tom_Kelly
Tom Kelly
 
ResBaz poster: Toolkit
ResBaz poster: ToolkitResBaz poster: Toolkit
ResBaz poster: Toolkit
Tom Kelly
 
E research feb2016 sifting the needles in the haystack
E research feb2016 sifting the needles in the haystackE research feb2016 sifting the needles in the haystack
E research feb2016 sifting the needles in the haystack
Tom Kelly
 
Bioinformatic Analysis of Synthetic Lethality in Breast Cancer
Bioinformatic Analysis of Synthetic Lethality in Breast CancerBioinformatic Analysis of Synthetic Lethality in Breast Cancer
Bioinformatic Analysis of Synthetic Lethality in Breast Cancer
Tom Kelly
 
Hidden in Plain Sight - The Genetics of Zombies
Hidden in Plain Sight - The Genetics of ZombiesHidden in Plain Sight - The Genetics of Zombies
Hidden in Plain Sight - The Genetics of Zombies
Tom Kelly
 

More from Tom Kelly (8)

Presentation feb-2020-hackathon-public
Presentation feb-2020-hackathon-publicPresentation feb-2020-hackathon-public
Presentation feb-2020-hackathon-public
 
Presentation oct-2018-tokyo r
Presentation oct-2018-tokyo rPresentation oct-2018-tokyo r
Presentation oct-2018-tokyo r
 
Tom kelly genetics journal club 2016
Tom kelly   genetics journal club 2016Tom kelly   genetics journal club 2016
Tom kelly genetics journal club 2016
 
QMB_Poster_Tom_Kelly
QMB_Poster_Tom_KellyQMB_Poster_Tom_Kelly
QMB_Poster_Tom_Kelly
 
ResBaz poster: Toolkit
ResBaz poster: ToolkitResBaz poster: Toolkit
ResBaz poster: Toolkit
 
E research feb2016 sifting the needles in the haystack
E research feb2016 sifting the needles in the haystackE research feb2016 sifting the needles in the haystack
E research feb2016 sifting the needles in the haystack
 
Bioinformatic Analysis of Synthetic Lethality in Breast Cancer
Bioinformatic Analysis of Synthetic Lethality in Breast CancerBioinformatic Analysis of Synthetic Lethality in Breast Cancer
Bioinformatic Analysis of Synthetic Lethality in Breast Cancer
 
Hidden in Plain Sight - The Genetics of Zombies
Hidden in Plain Sight - The Genetics of ZombiesHidden in Plain Sight - The Genetics of Zombies
Hidden in Plain Sight - The Genetics of Zombies
 

Recently uploaded

Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeckPoznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
FilipTomaszewski5
 
AI in the Workplace Reskilling, Upskilling, and Future Work.pptx
AI in the Workplace Reskilling, Upskilling, and Future Work.pptxAI in the Workplace Reskilling, Upskilling, and Future Work.pptx
AI in the Workplace Reskilling, Upskilling, and Future Work.pptx
Sunil Jagani
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
Fwdays
 
Getting the Most Out of ScyllaDB Monitoring: ShareChat's Tips
Getting the Most Out of ScyllaDB Monitoring: ShareChat's TipsGetting the Most Out of ScyllaDB Monitoring: ShareChat's Tips
Getting the Most Out of ScyllaDB Monitoring: ShareChat's Tips
ScyllaDB
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
Ivo Velitchkov
 
Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
manji sharman06
 
A Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's ArchitectureA Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's Architecture
ScyllaDB
 
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin..."$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
Fwdays
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
DianaGray10
 
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
DanBrown980551
 
Christine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptxChristine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptx
christinelarrosa
 
Demystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through StorytellingDemystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through Storytelling
Enterprise Knowledge
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
c5vrf27qcz
 
Containers & AI - Beauty and the Beast!?!
Containers & AI - Beauty and the Beast!?!Containers & AI - Beauty and the Beast!?!
Containers & AI - Beauty and the Beast!?!
Tobias Schneck
 
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
Fwdays
 
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving
 
What is an RPA CoE? Session 2 – CoE Roles
What is an RPA CoE?  Session 2 – CoE RolesWhat is an RPA CoE?  Session 2 – CoE Roles
What is an RPA CoE? Session 2 – CoE Roles
DianaGray10
 
AWS Certified Solutions Architect Associate (SAA-C03)
AWS Certified Solutions Architect Associate (SAA-C03)AWS Certified Solutions Architect Associate (SAA-C03)
AWS Certified Solutions Architect Associate (SAA-C03)
HarpalGohil4
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
DianaGray10
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Neo4j
 

Recently uploaded (20)

Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeckPoznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
 
AI in the Workplace Reskilling, Upskilling, and Future Work.pptx
AI in the Workplace Reskilling, Upskilling, and Future Work.pptxAI in the Workplace Reskilling, Upskilling, and Future Work.pptx
AI in the Workplace Reskilling, Upskilling, and Future Work.pptx
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
 
Getting the Most Out of ScyllaDB Monitoring: ShareChat's Tips
Getting the Most Out of ScyllaDB Monitoring: ShareChat's TipsGetting the Most Out of ScyllaDB Monitoring: ShareChat's Tips
Getting the Most Out of ScyllaDB Monitoring: ShareChat's Tips
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
 
My Research Journey with R

  • 1. My Research Journey with R How learning, using, and teaching R has helped my career in the life sciences #TokyoR 2018-7-15 Tom Kelly Postdoctoral Researcher Epigenome Technology Exploration Unit RIKEN Centre for Integrative Medical Sciences Yokohama, Japan ケリー・トム ポスドクで 研究者 エピゲノム技術開発ユニット 国立研究開発法人理化学研究所の生命医科学研究センター 日本の横浜市
  • 2. My Research Journey with R Why I chose R to do (the vast majority of) my research What I use R for in my research and what I’ve learned along the way How my workflow has changed and package recommendations Future challenges and hot topics
  • 3. My Research Journey with R Introduction Studied at the University of Otago, Dunedin, New Zealand Majored in genetics and mathematics Focused on “bioinformatics” in postgrad PhD on gene interactions in breast cancer for “precision medicine” supervised by A/Prof. Mik Black (a statistician) Worked at Tohoku University, Sendai, Miyagi Prefecture Assisted with academic writing and data analysis in Neuroscience and Bioengineering Laboratories Taught statistical analysis and programming in R to international postgraduate students (in English) Currently a postdoc at RIKEN, Yokohama campus Part of a Plant Stem Cell Analysis consortium Focusing on single-cell genomics technologies Continuing to develop new analysis techniques and pipelines driven by new technology Tom Kelly Twitter: @tomkXY GitHub: TomKellyGenetics
  • 4. Why I Started With R My supervisor was a statistician and a good example of how R could be used in my field An opportunity to learn new (transferable) computational skills and work with “Big Data” (rather than theory or experiments) Free and Open-Source A large (and growing) user community to engage with (and seek help from) online and at events A huge ecosystem of packages to do statistical analyses and plotting (especially in the field of genomics/bioinformatics) CRAN Bioconductor GitHub Mik Black Otago Uni Dunedin, New Zealand
  • 5. What I Use R For Pretty much everything . . . Analysis of gene expression patterns (differential expression, molecular subtypes, cluster analysis) Pathway (functional group) enrichment and network (graph structure) analysis Develop and test novel analysis methods for genomics data Analysis of heterogeneity (variation) at the single-cell level (classification and markers of cell types) Integrative “omics” analysis across data from different techniques (genetic variant, mutation, gene expression, protein, metabolism, epigenetic regulatory states, chromatin structure)
  • 6. How I Use R Data manipulation and statistical analysis Built-in functions (“base R”, stats) and distributions (mvtnorm, extraDistr) data.table (fread) and tibble for enhanced “data frames” igraph for graph theory, pathway structure, and network analysis Parallel computing with snow and OpenMPI (simulations and permutations) Accessing genomics annotation and analysis packages Genomic data (e.g., org.Hs.eg.db, reactome.db) Statistical analysis (e.g., limma, edgeR) Plotting and data visualisation gplots (heatmap.2 and Venn diagrams), vioplot, and built-in plots (scatterplots, line plots, boxplots, histograms, titles, axes, legends, etc.) Dimension reduction techniques: SVD, PCA, tSNE (Rtsne), UMAP (umap) Many of these are also provided in the “tidyverse” readr, tidyr and dplyr for data manipulation ggplot2 for visualisation More and more utilities and packages from GitHub
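
The dimension reduction step mentioned on this slide can be sketched with base R alone. This is a minimal illustration, not the analysis from the talk: the built-in `iris` data stand in for an expression matrix (samples in rows), and the variable names are chosen only for this example. It computes a PCA with `prcomp` and shows that the same sample coordinates come out of an SVD of the centred matrix.

```r
# Sketch: PCA on a built-in dataset, computed two ways (prcomp and svd).
# iris is only a stand-in for real data; samples are in rows.
x <- as.matrix(iris[, 1:4])

# PCA with the stats package: centre each column, keep the original scale
pca <- prcomp(x, center = TRUE, scale. = FALSE)

# The same decomposition through an SVD of the centred matrix
x_centred <- scale(x, center = TRUE, scale = FALSE)
decomp <- svd(x_centred)
scores_svd <- decomp$u %*% diag(decomp$d)  # sample coordinates per component

# Proportion of variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Base graphics scatterplot of the first two components, coloured by species
plot(pca$x[, 1], pca$x[, 2], col = as.integer(iris$Species), pch = 19,
     xlab = "PC1", ylab = "PC2", main = "PCA of iris measurements")
legend("topright", legend = levels(iris$Species), col = 1:3, pch = 19)
```

For non-linear methods such as tSNE or UMAP, the packages named on the slide would be needed; they are left out here so the sketch runs without extra installs.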
  • 7. How I Use R Shiny Apps Build and share interactive apps Even if you can’t write JavaScript
  • 8. How I Use R Rmarkdown, knitr, bookdown
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14. How I Use R Package development and code release with devtools Develop R packages with devtools and roxygen2 (documentation) Share functions and release code as a research output Release: CRAN, Bioconductor, GitHub, rOpenSci Cite: Zenodo, Journal of Open Source Software, Journal of Statistical Software
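
As a rough illustration of the devtools and roxygen2 workflow, the sketch below documents a single function with roxygen2 comments. The function `coef_var` is hypothetical and invented for this example; it is not from any of the packages in the talk. The commented-out calls at the end are the usual devtools steps and assume they are run from inside a package source directory.

```r
#' Coefficient of variation
#'
#' Hypothetical example function, used only to illustrate roxygen2 comments.
#'
#' @param x A numeric vector.
#' @param na.rm Logical; should missing values be removed before computing?
#' @return A single numeric value: the standard deviation divided by the mean.
#' @examples
#' coef_var(c(2, 4, 6))
#' @export
coef_var <- function(x, na.rm = FALSE) {
  stopifnot(is.numeric(x))
  sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
}

# In a package source directory, the usual devtools steps would be:
# devtools::document()  # build man/ pages and NAMESPACE from the comments above
# devtools::check()     # run R CMD check on the package
# devtools::install()   # install the package locally
```

Keeping the documentation next to the code in this way means the help page is regenerated whenever the function changes.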
  • 15. How I Use R Packages I’ve developed Data visualisation heatmap.2x for annotated heatmap.2 plots (gplots) vioplot enhanced version: proposed version 0.3 plot.igraph plotting directional graph structures, including inhibitory links Network analysis using igraph graphsim simulate gene expression from pathway graph structures pathway.structure.permutation perform permutation analysis of gene candidates in a pathway structure info.centrality compute network efficiency and information centrality igraph.extensions install all of the above Gene expression analysis slipt detect “synthetic lethal” gene interactions in expression data DoubletDetection R implementation of a tool to detect technical errors in single-cell RNA-Seq data Developing packages has become a part of how I analyse data
  • 16. How I Use R How my workflow has changed Interactive with RStudio IDE (which I still use) Using Projects (especially to develop packages) Running scripts in the terminal (in the background with nohup) on a local PC or remote servers Developing (and documenting) functions and packages that I intend to reuse and share
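
Moving from interactive work to scripts run in the terminal might look something like the sketch below. It is a generic permutation test on the built-in `sleep` data, not the author's pathway permutation method; the script name, log file, argument order, and helper functions (`to_int`, `parse_args`, `mean_diff`) are all assumptions made for illustration.

```r
# Sketch of a script meant to be run non-interactively, for example:
#   nohup Rscript permute.R 1000 42 > permute.log 2>&1 &
# (the file name, arguments, and log file are hypothetical)

# Convert one argument to an integer, using a default if missing or invalid
to_int <- function(x, default) {
  v <- suppressWarnings(as.integer(x))
  if (length(v) == 1 && !is.na(v)) v else default
}

# Read the number of permutations and the random seed from the command line
parse_args <- function(args) {
  list(n_perm = to_int(args[1], 100L), seed = to_int(args[2], 1L))
}

# Difference in means between the second and first group
mean_diff <- function(values, groups) {
  m <- tapply(values, groups, mean)
  m[[2]] - m[[1]]
}

opts <- parse_args(commandArgs(trailingOnly = TRUE))
set.seed(opts$seed)

# Observed statistic, then the same statistic under shuffled group labels
observed <- mean_diff(sleep$extra, sleep$group)
permuted <- replicate(opts$n_perm, mean_diff(sleep$extra, sample(sleep$group)))

# Two-sided permutation p-value (adding 1 avoids reporting exactly zero)
p_value <- (sum(abs(permuted) >= abs(observed)) + 1) / (opts$n_perm + 1)
cat("permutations:", opts$n_perm, "p-value:", signif(p_value, 3), "\n")
```

Setting the seed from an argument keeps background runs reproducible, and writing results with `cat` means they end up in the redirected log file.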
  • 17. How I Use R Biggest challenges Being an early adopter is hard (and sometimes worth it) Working on a project with different tools from the rest of your team is hard (but there is help online!) Keeping up with the latest tools in the field (but there could be worse problems)
  • 18. Engage with the community Online (beyond the “help” system) StackOverflow/StackExchange (Q&A) GitHub (Share code) Twitter (#Rstats #Rlang) R blogs Google (everyone does it!) Workshops and community events Software Carpentry / Data Carpentry (swcarpentry @thecarpentries) Research Bazaar (ResBaz) HackyHour Mozilla “Study Group” R user groups (Meetup, #TokyoR)
  • 19. It’s not just statistics: it’s a language Mike Sumner Australian Antarctic Division, Antarctic Climate and Ecosystems Hobart, Australia Twitter: @mdsumner GitHub: mdsumner #RLang
  • 20. It’s not just a language: it’s a community
  • 21. Learning in a community Australia Research Bazaar (2015) Melbourne ResBaz organisers Software Carpentry Instructors
  • 22. Learning in a community New Zealand ResBaz (2016) Dunedin ResBaz (2017) Auckland ResBaz (Feb 2018) Dunedin ResBaz (June 2018) Dunedin
  • 23. R is a global community R user groups (RUGs) Joseph Rickert (@RStudioJoe) ResBaz events (2017) Software Carpentry Instructors R User Groups (Meetup) “RLadies” Groups
  • 24. Programming is Learning Things I want to learn more about or do better Project management Tracking package versions (packrat) Testing functions and packages with Travis CI or Appveyor Version control (git) and containers (docker) Calling other languages (use the best tool for the job) Python (reticulate), Julia (RJulia), C++ (Rcpp) The “tidyverse” from Hadley Wickham et al. readr, tidyr, glue, dplyr, purrr, ggplot2 (gganimate, gghighlight) Analysis techniques Machine Learning, Statistical Learning, AI Bayesian modelling and inference Techniques for “single-cell” analysis (Seurat, monocle, etc.) Plotting to communicate variation and uncertainty Colour-blind “friendly” palettes (RColorBrewer, viridis) Value-suppressing uncertainty palettes (VSUP) Interactive plots (plotly, shiny, or D3.js)
  • 25.
  • 26. We can plot data Plotting doubt is harder
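
One simple way to show both variation and uncertainty is to draw the raw points alongside group means with interval bars. The sketch below does this in base R under stated assumptions: the `iris` data are a placeholder, the intervals use a normal approximation (1.96 standard errors), and the three colours are hex codes from the Okabe-Ito colour-blind-friendly palette written out by hand rather than taken from a palette package.

```r
# Group means with approximate 95% confidence intervals (normal approximation)
groups <- split(iris$Sepal.Length, iris$Species)
means <- sapply(groups, mean)
ses <- sapply(groups, function(v) sd(v) / sqrt(length(v)))
lower <- means - 1.96 * ses
upper <- means + 1.96 * ses

# Three Okabe-Ito colours, written as hex codes so no extra package is needed
cols <- c("#E69F00", "#56B4E9", "#009E73")

# Jittered raw points show the spread of the data within each group
stripchart(groups, vertical = TRUE, method = "jitter", pch = 16,
           col = adjustcolor(cols, alpha.f = 0.4),
           ylab = "Sepal length (cm)")

# Interval bars and filled points show the uncertainty in each mean
arrows(x0 = 1:3, y0 = lower, x1 = 1:3, y1 = upper,
       angle = 90, code = 3, length = 0.08, lwd = 2)
points(1:3, means, pch = 21, bg = cols, cex = 1.5)
```

The spread of the points and the width of the bars answer different questions, which is part of why plotting doubt is harder than plotting data.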
  • 27. Advice You never stop learning R Everyone uses Google (and that’s ok!) Seek projects that challenge you to learn more Code is a means to an end: keep project goals in mind! Code together; teach together; learn together