I spoke on "Big Data in Biology". The talk focuses on how biology has shaped big data and how big data has become a key player in biology. It also covers how DNA storage can address long-term archival storage.
This document discusses the importance of computational tools for biological research. It provides an overview of how computer applications are used in areas like the Human Genome Project, transcriptomics, proteomics, and systems biology. The document also notes challenges for biological research in Thailand, including a lack of background knowledge in computer science and limited access to free and easy-to-use computational tools, especially in the Thai language. It argues that biology students in Thailand should be taught bioinformatics and computational biology skills to better facilitate biological research.
How Can We Make Genomic Epidemiology a Widespread Reality? - William Hsiao
The document discusses genomic epidemiology and the requirements to bring genomic sequencing into routine public health practice. It outlines two parts: (1) what genomic epidemiology is and why it is important; and (2) the requirements for genomic sequencing to be used routinely in public health. Whole genome sequencing is seen as a way to generate high quality pathogen genomes quickly and allow for more detailed tracking of disease spread compared to traditional methods. However, bringing genomic sequencing into public health practice requires overcoming barriers such as the need for user-friendly analysis platforms, training public health personnel in genomics, and improving information sharing between organizations.
OECD Webinar | From Data to Knowledge and Beyond Adverse Outcome Pathways as ... - OECD Environment
On 23 and 30 November 2020, the OECD hosted a webinar on training needs, resources, and opportunities for adverse outcome pathways (AOPs). This interactive webinar discussed opportunities for expanding the AOP community of trainers to meet current needs, considering all available resources.
The objectives of this webinar were to:
- outline past and current training activities
- receive your input on experiences in conducting and/or receiving training; and
- gather your views, needs and ideas for the provision of training in the future.
Watch the webinar video recording at: https://youtu.be/7ObxATifDds.
Metagenomic Data Provenance and Management using the ISA infrastructure --- o... - Alejandra Gonzalez-Beltran
Metagenomic Data Provenance and Management using the ISA infrastructure - overview, implementation patterns & software tools
Slides presented at EBI Metagenomics Bioinformatics course: http://www.ebi.ac.uk/training/course/metagenomics2014
This document provides an introduction to the field of bioinformatics. It defines bioinformatics as the merging of biology, computer science, and information technology into a single discipline. The document outlines key topics in bioinformatics including what is bioinformatics, why it is needed due to the growth of sequencing data, common data types and analysis problems, careers in bioinformatics, and different sequencing technologies such as Illumina and SOLiD sequencing.
Slides for the afternoon session on "Introduction to Bioinformatics", delivered at the James Hutton Institute, 29th, 30th May and 5th June 2014, by Leighton Pritchard and Peter Cock.
Slides cover introductory guidance and links to resources, theory and use of BLAST tools, and a workshop featuring some common tools and tasks.
Professor Carole Goble, University of Manchester, talks at the RIN "Research data: policies & behaviour" event as part of a series on Research Information in Transition.
The document discusses challenges with storing and analyzing bioinformatics data and proposes solutions using NoSQL database technologies. It notes that bioinformatics data is large in volume, unstructured, and distributed globally. Relational databases struggle with this type of data. NoSQL databases offer alternatives as they are not constrained by schemas, are designed for performance and scalability, and can handle distributed unstructured data. The document presents MongoDB and Cassandra as examples of NoSQL databases suitable for bioinformatics and describes a case study using MongoDB for genomic data. It concludes that NoSQL technologies are promising for the future of bioinformatics given the unique data challenges in the field.
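As a rough, hedged illustration of the kind of MongoDB usage such a case study implies (the collection name, fields and values below are hypothetical, not taken from the document):

```python
# Illustrative sketch only: collection name, fields and values are hypothetical,
# not taken from the case study described above.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
variants = client["genomics_demo"]["variants"]

# Documents need not share a schema: annotations can differ per record.
variants.insert_many([
    {"chrom": "chr7", "pos": 55191822, "ref": "T", "alt": "G",
     "gene": "EGFR", "consequence": "missense_variant"},
    {"chrom": "chr17", "pos": 7674220, "ref": "C", "alt": "T",
     "gene": "TP53"},  # fewer annotations, still a valid document
])

# Simple index plus query: the kind of lookup that needs no fixed schema.
variants.create_index([("gene", 1)])
for doc in variants.find({"gene": "EGFR"}):
    print(doc["chrom"], doc["pos"], doc["ref"], ">", doc["alt"])
```

The point of the sketch is that records with different annotation sets can sit in the same collection without schema migrations, which is the flexibility the document attributes to NoSQL stores.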
The document proposes an Oz Mammals Bioinformatics and Data Resource to store, share, and analyze genomic and other data from Australian mammal studies. It would:
1) Capture existing Oz mammal data and resources, provide long-term storage, and integrate new genomic data from the OMG Project.
2) Enable data sharing within the OMG project and provide access to Oz mammal data worldwide.
3) Give access to data processing, analysis, and visualization tools, and integrate with external resources like the Atlas of Living Australia.
Slides contain information about why bioinformatics appeared, who bioinformaticians are, what they do, and what cool applications and challenges exist in bioinformatics.
Slides were prepared for the Bioinformatics seminar 2016, Institute of Computer Science, University of Tartu.
Marco Brandizi and Keywan Hassani-Pak, Rothamsted Research, Invited Presentation at SWAT4HCLS 2022.
FAIR data principles have become a driving force in life sciences and other scientific domains, helping researchers share their data and unlock its full potential for integrating information and making novel discoveries. Knowledge graphs are an increasingly popular paradigm for modelling data according to such principles, and technologies such as graph databases are emerging as complementary to approaches like linked data. All of this includes the agronomy, farming and food domains. How advanced is the adoption of sound data management policies in these domains? How does it compare to other life sciences? In this presentation, we will talk about our practical experience, focusing on KnetMiner, a gene and molecular biology discovery platform, which is based on building and publishing knowledge graphs according to the FAIR principles, as well as using a mix of linked data standards for life sciences and recent graph database and API technologies. We will welcome questions and discussion from the audience about similar experiences.
This document describes an educational research kit that uses CRISPR-Cas10 gene editing technology. The kit allows students to explore gene transfer between phages, which is currently not well understood. It leverages the accessibility of CRISPR to enable high school, college, and graduate students to learn about gene editing and DNA analysis of bacteria. The technology was successfully piloted and demonstrates that CRISPR research can be made accessible through an educational kit. The kit was invented by Dr. Asma Hatoum, an assistant professor of biological sciences.
Bioinformatics is defined as the field that studies biology using computers and information technology. It involves the collection, storage, and analysis of molecular biological data using techniques from computer science and statistics. Some key events in bioinformatics include Watson and Crick proposing the DNA double helix structure in 1953, and the development of sequence alignment and structure prediction algorithms in the 1970s. Bioinformatics aims to better understand living cells at the molecular level by analyzing raw molecular sequence and structure data. It provides globally accessible databases and analysis tools to enable sharing and study of biological data.
This research aimed to develop a tool to group bioassays from PubChem based on experimental parameters extracted from narratives using natural language processing (NLP). The researchers used Latent Semantic Indexing (LSI) to identify topics in over 2,000 bioassay narratives from PubMed abstracts. LSI was able to group assays without supervision but was sensitive to the number of tokens and concepts used, focusing on either species or chemical compounds. While encouraging, additional studies are needed to better control LSI's effectiveness for chemical modeling applications.
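As a hedged sketch of the LSI approach described above (toy narratives and illustrative parameters; scikit-learn's TruncatedSVD stands in for whatever LSI implementation the authors used):

```python
# Minimal LSI sketch with toy bioassay narratives; the real study used >2,000
# PubChem narratives and tuned the number of tokens and concepts.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

narratives = [
    "Luciferase reporter assay measuring inhibition in human HEK293 cells",
    "Binding assay of small-molecule compounds against a kinase target",
    "Cytotoxicity assay in mouse fibroblast cell line after compound treatment",
    "High-throughput screen of a chemical library for enzyme inhibitors",
]

# Term-document matrix; the choice of tokens strongly affects the topics found.
X = TfidfVectorizer(stop_words="english").fit_transform(narratives)

# LSI = truncated SVD of the term-document matrix; n_components = number of concepts.
lsi = TruncatedSVD(n_components=2, random_state=0)
topics = lsi.fit_transform(X)

# Unsupervised grouping of assays in the reduced concept space.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(topics)
for text, label in zip(narratives, labels):
    print(label, text[:50])
```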
KnetMiner provides an easy-to-use web interface to visualisation and data mining tools for the discovery and evaluation of candidate genes from large-scale integrations of public and private data sets. It addresses the needs of scientists who generally lack the time and technical expertise to review all relevant information available in the literature, from key model species and from a potentially wide range of related biological databases. We have previously developed genome-scale knowledge networks (GSKNs) for multiple crop and animal species (Hassani-Pak et al. 2016). The KnetMiner web server searches and evaluates millions of relations and concepts within the GSKNs in real time to determine if direct or indirect links between genes and trait-based keywords can be established. KnetMiner accepts as user inputs search terms in combination with a gene list and/or genomic regions. It produces a table of ranked candidate genes and allows users to explore the output in interactive genome and network map visualisation tools that have been optimised for web use on desktop and mobile devices. The KnetMiner web server and the GSKNs provide a step forward towards systematic and evidence-based gene discovery.
Bioinformatics combines computer science, statistics, mathematics, and biology to study and process biological data on a large scale. The document discusses several applications of bioinformatics including information search and retrieval, sequence comparison for genetics, phylogenetic analysis, genome annotation, proteomics, pharmacogenomics, and drug discovery. Tools are provided for various applications such as linkage analysis, phylogenetic analysis, genome annotation, and protein identification.
This document introduces bioinformatics and discusses some of its key concepts and applications. It defines bioinformatics as an interdisciplinary field that combines computer science, statistics and engineering to study and process biological data. It describes some basic cell components like DNA, RNA and proteins, and how genetics and the genetic code work. It also provides a brief history of bioinformatics, highlighting projects like the Human Genome Project. Finally, it outlines several applications of bioinformatics like phylogenetic analysis, drug design, microarray analysis and protein-protein interaction networks.
This document discusses using AgriSchemas and schema.org to model and share interoperable agricultural data from sources like KnetMiner and DFW for use cases involving molecular biology, gene expression, literature, and experiments. AgriSchemas provides a way to formally represent heterogeneous agricultural data to support exploratory research and data integration/sharing according to FAIR principles. Examples show how gene, publication, experiment and other data from KnetMiner are modeled and made accessible via AgriSchemas and linked data formats. Ongoing work focuses on additional areas like host-pathogen interactions, weather data, and dataset metadata.
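To make the linked-data angle concrete, here is a minimal, hypothetical sketch of a gene record serialised as schema.org-style JSON-LD from Python; the identifiers and property choices are illustrative assumptions, not the actual AgriSchemas model:

```python
# Illustrative only: the IRI, identifier and property choices are assumptions,
# not the AgriSchemas/KnetMiner model described above.
import json

gene_record = {
    "@context": "https://schema.org",
    "@type": "Gene",                      # schema.org Gene type, as profiled by Bioschemas
    "@id": "https://example.org/gene/EXAMPLE_GENE_1",  # hypothetical IRI
    "identifier": "EXAMPLE_GENE_1",
    "name": "example wheat gene",
    "subjectOf": {                        # link from the gene to a publication
        "@type": "ScholarlyArticle",
        "identifier": "doi:10.1000/example",
    },
}

print(json.dumps(gene_record, indent=2))
```

The design point is simply that once such records share common types and properties, data from KnetMiner, DFW or any other source can be crawled and merged by generic tooling, which is what the FAIR interoperability principle asks for.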
The document discusses knowledge management of experimental data through the ISA ecosystem. It describes the ISA-tab format and software suite that allows annotation and curation of experimental metadata. As a use case, it analyzes a dataset on metabolite profiling from a study of fatty acid amide hydrolase knockout mice. The ISA tools can represent investigations and assays, convert data to standardized formats, and facilitate sharing and analysis of experimental data.
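For readers unfamiliar with ISA-Tab, the following is a simplified sketch of the tab-separated study-sample table at the heart of the format; the column headers shown are standard ISA-Tab conventions, but the values are invented and the real FAAH knockout dataset is far richer:

```python
# Writes a tiny, simplified study-sample table in the tab-separated style used by
# ISA-Tab; a real ISA archive also carries investigation and assay files and many
# more columns. Values below are invented for illustration.
import csv

rows = [
    ["Source Name", "Characteristics[organism]", "Characteristics[genotype]",
     "Protocol REF", "Sample Name"],
    ["mouse_01", "Mus musculus", "FAAH knockout", "sample collection", "liver_01"],
    ["mouse_02", "Mus musculus", "wild type", "sample collection", "liver_02"],
]

with open("s_example_study.txt", "w", newline="") as handle:
    csv.writer(handle, delimiter="\t").writerows(rows)
```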
Bioinformatics is an interdisciplinary field that merges biology, computer science, and information technology. It is applied in areas like genomics, proteomics, and systems biology. While some basic analysis can be done through user-friendly tools, truly customized work requires programming skills and an understanding of underlying algorithms. Bioinformatics is not just a service field but rather involves scientific experimentation throughout the entire analysis process from experimental design to evaluation. It is a dedicated field of research in its own right, not a quick or interchangeable task.
This document provides an overview of bioinformatics and highlights several key points:
- Bioinformatics has emerged as a field to help analyze the vast amounts of biological data being generated through high-throughput technologies. It integrates biology, computer science, and information technology.
- The size of the human genome and rate of data generation has grown exponentially, necessitating computational approaches. International efforts like the Human Genome Project helped sequence the entire human genome.
- Bioinformatics tools and databases are used to study genomics, transcriptomics, proteomics and more to better understand living systems at the molecular level and enable applications in medicine, agriculture, forensics and more. This work also raises ethical, legal and social considerations.
B.Sc Biochem I BOBI U-1 Introduction to Bioinformatics - Rai University
This document provides an introduction to the field of bioinformatics. It defines bioinformatics as using computer science and software tools to store, retrieve, organize and analyze biological data. The history of bioinformatics began in the 1970s with early work to create protein sequence databases. Today, bioinformatics has many applications including drug design, DNA analysis, and agricultural biotechnology. It also covers several key areas including genomics, proteomics, and systems biology. Necessary skills for bioinformatics include knowledge of molecular biology, mathematics, programming, and computer proficiency.
Genomic epidemiology uses whole genome sequencing data from pathogens combined with epidemiological investigations to track the spread of infectious diseases. The document discusses making genomic epidemiology a widespread reality in public health. It outlines key requirements including building a user-friendly analysis platform, developing portable analysis pipelines, providing training to public health personnel, and improving information sharing between organizations.
This document provides an introduction to the field of bioinformatics. It defines bioinformatics as a branch of science that uses computer technology to analyze and integrate biological information that can be applied to gene-based drug discoveries. It discusses the emergence of bioinformatics due to the desire to understand how genetic structure affects traits. It also outlines some common applications of bioinformatics like drug design, gene therapy, and microbial genomic analysis. Finally, it provides examples of some bioinformatics tools, databases, and centers in India.
This document is a resume for Gautam Machiraju. It summarizes his education and research experience. He has a B.A. in Applied Mathematics from UC Berkeley with a concentration in Mathematical Biology and a minor in Bioengineering. He has worked on several research projects involving mathematical modeling and data analysis related to biology and healthcare. These include modeling cancer biomarker shedding kinetics, mining literature for biomarker data, and using deep learning on patient time-series data. He has strong skills in programming, mathematics, and bioinformatics.
This document is a resume for Gautam Machiraju. It summarizes his education and research experience. He has a B.A. in Applied Mathematics from UC Berkeley with a concentration in Mathematical Biology and a minor in Bioengineering. He has worked on several research projects involving mathematical modeling and data analysis related to cancer biomarkers, genomics, and proteomics. His skills include programming, mathematics, data science, and laboratory techniques. He is currently a bioinformatics research assistant at Stanford University School of Medicine.
The EMBL-European Bioinformatics Institute (EBI) is a large bioinformatics research and services institute located in Hinxton, UK. It is part of the European Molecular Biology Laboratory and houses massive biological databases and bioinformatics software tools that are freely available to researchers. Key goals of EBI include building and maintaining biological databases, making data widely accessible, and conducting bioinformatics research to advance biology. EBI coordinates data collection and dissemination internationally and houses over 500 staff from diverse backgrounds.
The document discusses a knowledge management platform developed at Genentech to manage pre-clinical animal studies.
The platform, called DIVOS, manages over 12,000 in vivo studies dating back to 1998 across multiple therapeutic areas, conducted both in-house and by CROs. The technical approach involved developing a structured yet flexible platform to capture study details while enabling data reuse. People were key to the change, with various teams guiding strategy, tactics and operations. New capabilities such as improved study logistics, enhanced collaboration and new insights from data analysis provide evidence of the platform's success.
AB3ACBS 2016: EMBL Australia Bioinformatics Resource - Philippa Griffin
The EMBL Australian Bioinformatics Resource (EMBL-ABR) is a distributed national research infrastructure that provides bioinformatics support to life science researchers in Australia. It has a hub-and-nodes structure with the hub hosted at the Victorian Life Science Computation Initiative at the University of Melbourne and 10 nodes located across Australian institutions. EMBL-ABR aims to increase Australia's capacity for bioinformatics research and data science, provide training in bioinformatics, and enable participation in international collaborations.
Free webinar - Introduction to bioinformatics - biologist-1 - Elia Brodsky
The Omics Logic Introduction to Bioinformatics program is a one-month online training program that provides an introduction to the field of bioinformatics for beginners. The program consists of six sessions taught by an international team of experts, covering topics like genomics, transcriptomics, statistical analysis, machine learning, and a final bioinformatics project. Participants will learn data analysis skills in Python and R and how to extract insights from multi-omics datasets with applications in biomedicine. The goal is to prepare students for data-driven research in life sciences through interactive lessons, coding exercises, and independent projects.
CINECA webinar slides: Modular and reproducible workflows for federated molec... - CINECAProject
Genetic analysis of molecular traits such as gene expression, splicing and chromatin accessibility requires a number of complex analysis steps that can easily take weeks or months for an analyst to implement from scratch. In the CINECA project, we have developed a number of modular Nextflow workflows that standardise and automate these steps. In this webinar, we will give an overview of the CINECA workflows for genotype imputation, gene expression and splicing quantification, data normalisation and association testing, and demonstrate how these workflows can be used in a federated setting without transferring identifiable personal data between partners.
The CINECA webinar series aims to discuss ways to address common challenges and share best practices in the field of cohort data analysis, as well as to distribute CINECA project results. All CINECA webinars include an audience Q&A session during which attendees can ask questions and make suggestions. Please note that all webinars are recorded and available for later viewing.
This webinar took place on 10th November 2020 and is part of the CINECA webinar series.
For previous and upcoming CINECA webinars see:
https://www.cineca-project.eu/webinars
1) Big data standards are needed to make data understandable, reusable, and shareable across different databases and domains.
2) Effective standards require reporting sufficient experimental details and context in both human-readable and machine-readable formats (a small example follows this list).
3) Developing standards is a collaborative process involving different stakeholder groups to define requirements, vocabularies, and data models through both formal standards bodies and grassroots organizations.
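A minimal sketch of what point 2 means in practice, with invented field names and values: the same experimental context written once for humans and once for machines:

```python
# The same experimental context twice: once for humans, once for machines.
# Field names and values here are invented for illustration only.
import json

human_readable = (
    "RNA-seq of wheat leaf tissue, drought-stressed for 7 days, "
    "three biological replicates, sequenced on an Illumina instrument."
)

machine_readable = {
    "assay_type": "RNA-seq",
    "organism": "Triticum aestivum",
    "tissue": "leaf",
    "treatment": {"factor": "drought", "duration_days": 7},
    "biological_replicates": 3,
    "platform": "Illumina",
}

print(human_readable)
print(json.dumps(machine_readable, indent=2))
```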
This document summarizes research on non-model ascidian species Molgula occulta and Molgula oculata. An international collaboration generated transcriptome data, sequenced the genomes of three Molgula species, and examined gene expression patterns related to tail development. Analysis revealed heterochronic shifts in developmental timing between tailed and tailless species. The data resources enabled further study of evolutionary shifts in gene regulatory networks underlying conserved developmental processes. The document emphasizes the importance of methods development for large-scale data analysis to enable new biological insights.
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th... - GigaScience, BGI Hong Kong
Scott Edmunds' talk at the HUPO congress in Geneva, 6 September 2011, on GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami.
This document provides an overview of next generation sequencing (NGS) analysis. It discusses various NGS platforms such as Illumina, Roche 454, PacBio, and Ion Torrent. It also covers common file formats for sequencing data like FASTQ, quality control measures to assess data quality, and applications of NGS such as RNA-seq and ChIP-seq. The document aims to introduce researchers to basic concepts in NGS analysis and highlights available resources for storing and analyzing large sequencing datasets.
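As a concrete illustration of the FASTQ format and one simple quality measure mentioned above (the record is made up; real QC would normally use a dedicated tool such as FastQC):

```python
# Parse one FASTQ record and compute its mean Phred quality score.
# The record below is made up; real data would be read from a .fastq file.
fastq_text = """@read_1
ACGTACGTGGCA
+
IIIIHHHGGFFE
"""

lines = fastq_text.strip().splitlines()
header, seq, _, qual = lines[0], lines[1], lines[2], lines[3]

# Phred+33 encoding: quality = ASCII code of the character minus 33.
phred = [ord(ch) - 33 for ch in qual]
print(header, "length:", len(seq), "mean Q:", sum(phred) / len(phred))
```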
EiTESAL eHealth Conference 14 & 15 May 2017 - EITESANGO
This document discusses bioinformatics and some of its key concepts and tools. It begins with definitions of bioinformatics as the intersection of biology, computer science, and information technology. It then discusses some of the data formats, tools, and skills used in bioinformatics, including working with nucleotide sequence data, translating sequences into amino acids, and analyzing large datasets. It also summarizes how ontologies are used to represent concepts and how various data types are organized and stored in databases for analysis.
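For example, translating a nucleotide sequence into amino acids, one of the routine tasks mentioned above, is a short exercise with Biopython (assuming Biopython is installed; the sequence is a made-up example):

```python
# Translate a made-up coding sequence to protein using the standard genetic code.
from Bio.Seq import Seq

cds = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
protein = cds.translate(to_stop=True)  # stop at the first stop codon
print(protein)  # MAIVMGR
```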
Taverna is a free and open-source workflow management system that allows researchers to design and execute scientific workflows. It was developed by the University of Manchester to support in silico experiments in biology. Taverna provides a graphical user interface for designing workflows using a variety of distributed data sources and web services without having to learn complex programming. It has been widely adopted by researchers in fields such as biology, healthcare, astronomy, and cheminformatics to automate analysis pipelines and share workflows.
Will Biomedical Research Fundamentally Change in the Era of Big Data? - Philip Bourne
This document discusses how biomedical research may fundamentally change in the era of big data. It notes that biomedical research has always been data-driven, but the scope, variety, complexity and volume of data is now much greater. It also discusses the need for more open data sharing and new tools and methods for large-scale analysis. The document suggests biomedical research may move towards a more collaborative "platform" model, as seen with companies like Airbnb, with the goal of improving data access, reuse and reproducibility of research. However, overcoming challenges like incentives, trust and work practices will be important for any new platform to succeed.
Big Data Infrastructure for Translational Research discusses challenges in building big data infrastructure for translational research. It defines big data as data so large and complex that it is difficult to process with typical tools. Big data comes from various sources such as mobile devices, sensors and clinical monitors. Scaling data acquisition from the patient bedside to the institution is discussed. Tools used include databases, scripting languages, statistical packages and visualisation software. Challenges include data capture, curation, storage, sharing and analysis. A multidisciplinary team approach is advocated to tackle big data challenges in translational medicine.
Similar to "Supporting researchers in the molecular life sciences" (Jeff Christiansen):
Presentation by Dr Steve McEachern, ADA, to the 'Unlocking value from publicly funded Clinical Research Data' workshop, cohosted by ARDC and CSIRO at ANU on 6 March 2019.
Presentation by Hugo Leroux and Liming Zhu, CSIRO, to the 'Unlocking value from publicly funded Clinical Research Data' workshop, cohosted by ARDC and CSIRO at ANU on 6 March 2019.
The document summarizes plans by the Australian Government to establish new legislation and institutions to streamline access to and use of public sector data. Key points include:
- A new Commonwealth Data Sharing and Release Act will be introduced in 2019 to provide consistent rules for sharing data and establish a National Data Commissioner to oversee the system.
- The National Data Commissioner will ensure transparency, accountability, security, and appropriate risk management in data sharing.
- New rules will focus on enabling data to be shared for purposes like research and policy-making, while protecting privacy and building public trust in data use.
- The government will continue consulting stakeholders on the legislation to address concerns and help the public understand the reforms.
Presentation by Prof Chris Rowe, ADNet, to the 'Unlocking value from publicly funded Clinical Research Data' workshop, cohosted by ARDC and CSIRO at ANU on 6 March 2019.
Investigator-initiated clinical trials: a community perspective - ARDC
Presentation by Miranda Cumpston, ACTA, to the 'Unlocking value from publicly funded Clinical Research Data' workshop, cohosted by ARDC and CSIRO at ANU on 6 March 2019.
Presentation by Dr Merran Smith, PHRN, to the 'Unlocking value from publicly funded Clinical Research Data' workshop, cohosted by ARDC and CSIRO at ANU on 6 March 2019.
International perspective for sharing publicly funded medical research data - ARDC
Presentation by Olivier Salvado, CSIRO, to the 'Unlocking value from publicly funded Clinical Research Data' workshop, cohosted by ARDC and CSIRO at ANU on 6 March 2019.
Presentation by Prof Lisa Askie, ANZCTR, to the 'Unlocking value from publicly funded Clinical Research Data' workshop, cohosted by ARDC and CSIRO at ANU on 6 March 2019.
Presentation by Dr Davina Ghersi, NHMRC, to the 'Unlocking value from publicly funded Clinical Research Data' workshop, cohosted by ARDC and CSIRO at ANU on 6 March 2019.
Presentation by Dr Adrian Burton, ARDC, to the 'Unlocking value from publicly funded Clinical Research Data' workshop, cohosted by ARDC and CSIRO at ANU on 6 March 2019.
FAIR for the future: embracing all things data - ARDC
FAIR for the future: embracing all things data - Natasha Simons, Keith Russell and Liz Stokes, presented at Taylor & Francis Scholarly Summits in Sydney 11 Feb 2019 and Melbourne 14 Feb 2019.
How to make your data count webinar, 26 Nov 2018 - ARDC
This document outlines the Make Data Count (MDC) initiative to standardize and promote the tracking of research data usage metrics. MDC has developed a Code of Practice for data usage logs, built an open hub to aggregate standardized usage data, and implemented tracking and display of usage metrics at their own repositories. They encourage other repositories to follow five simple steps to Make Their Data Count: 1) Read the Code of Practice, 2) Process usage logs, 3) Send logs to the hub, 4) Pull usage metrics from the hub, and 5) Display metrics. Future work includes outreach, iteration on implementations, and expanding metrics beyond DOIs.
Supporting researchers in the molecular life sciences - Jeff Christiansen
1. Supporting Researchers in the Molecular Life Sciences
Jeff Christiansen
UQ RCC Health and Life Sciences Program Manager
QCIF Health and Life Sciences Program Manager
EMBL-ABR Key Areas Coordinator
2.
3. The central dogma of biology
DNA > mRNA > protein (folding; large molecules) > metabolites (small molecules, via enzymatic catalysis)
Cell type 1 vs cell type 2: same genes but different mRNAs, proteins and metabolites (and at different levels)
Traditionally, researchers would focus on a small number of genes/proteins etc. due to technical constraints
4. Global biomolecular profiling: the data explosion
DNA (genomics): 20,005 'protein coding' genes
RNA (transcriptomics): ~200,000(?) transcripts; abundance?
Protein (proteomics): 16,518 identified; abundance?
Metabolites (metabolomics): >24,597 compounds; abundance?
Sources: https://www.ebi.ac.uk/metabolights/reference and https://hupo.org/HPP-Q&A
5. The data explosion: challenges
• Data storage
• non-complex organisms (bacteria): 12 GB raw data / sample (genomic, transcriptomic, proteomic, metabolomic)
• globally, est. 100 PB used by the 20 largest institutions for genomic storage alone [1]
• Tools
• to convert data from raw > processed
• for comparative analyses on processed data (e.g. genome v. genome, transcriptome v. proteome)
• documenting methods (i.e. tool use – versions used, workflows applied; see the sketch after this slide)
• Compute
• resource-intensive (e.g. a single human : mouse genome alignment consumes ~100 CPU hrs)
• Data management
• context surrounding the specimen (e.g. healthy vs diseased) and experiment
• context surrounding the data itself (provenance, state {raw, processed}, formats, etc.)
• managing sharing within research team
• data publishing at project end to international repositories
• Skills development
• enabling biologists to utilise bioinformatics approaches (expert [cmd line] > novice [GUI])
• enabling biologists to use storage, tools, compute and data management effectively
[1] Stephens et al (2015) Astronomical or Genomical? PLOS Biology https://doi.org/10.1371/journal.pbio.1002195
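One small, concrete way to address the 'documenting methods' point above is to record the exact versions of the command-line tools used in each run. A hedged sketch (the tool names are examples; a workflow manager would normally capture this automatically):

```python
# Capture the versions of the command-line tools used in an analysis run,
# so the methods can be documented and reproduced later.
# Tool names are examples; any of them may be absent on a given system.
import json
import shutil
import subprocess

tools = ["samtools", "bcftools", "fastqc"]
versions = {}
for tool in tools:
    if shutil.which(tool) is None:
        versions[tool] = "not installed"
        continue
    result = subprocess.run([tool, "--version"], capture_output=True, text=True)
    # These tools print their version on the first line of stdout (or stderr).
    output = (result.stdout or result.stderr).strip()
    versions[tool] = output.splitlines()[0] if output else "unknown"

with open("tool_versions.json", "w") as handle:
    json.dump(versions, handle, indent=2)
```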
6. Unmet Needs for Analysing Biological Big Data: A Survey of 704 NSF Principal Investigators
Barone L, Williams J, Micklos D; BioRxiv (2017)
90% indicated they are currently or will soon be analysing large digital datasets
[Bar chart: percent responding negatively (318 ≤ n ≤ 510) for each of the following needs]
• Training on integration of multiple data types
• Training on data management and metadata
• Training on scaling analysis to cloud/HPC
• Multi-step analysis workflows or pipelines
• Cloud computing
• Search for data & discover relevant datasets
• Support for bioinformatics and analysis
• Publish data to the community
• Updated analysis software
• Share data with colleagues
• Training on basic computing and scripting
• Sufficient data storage
• High-performance computing
7. Australian needs
[Survey charts: 'Biggest bioinformatics difficulty' and 'The most useful'; 2013, N=210]
https://www.embl-abr.org.au/news/braembl-community-survey-report-2013/
12. Organise training material and events around research-relevant tasks, not the tools themselves
Training in how to perform tasks is required
13. Organise training material and events around research-relevant tasks, not the tools themselves
Training in how to perform tasks is required
Genome Annotation using Apollo
15. Involve a wide variety of users in usability testing
Building more intuitive tools is imperative
16. Involve a wide variety of users in usability testing
Building more intuitive tools is imperative
14 users (novice to expert bioinformaticians, student to CI)
5 tests (representing broad task types)
47 usability issues found – 38 addressed
17. Build/provide functionality that supports users with differing informatics skill levels
Building more intuitive tools is imperative
18. Build/provide functionality that supports users with differing informatics skill levels
Building more intuitive tools is imperative
20. Australia is geographically challenging:
leverage technology, international and local expertise to help
deliver training to a wider audience
Genome Annotation using Apollo
Dr Monica Muñoz-Torres
Project Lead, Apollo Project, Berkeley
21. Australia is geographically challenging:
leverage technology, international and local expertise to help
deliver training to a wider audience
Genome Annotation using Apollo
9 EMBL-ABR Nodes, 92 registrants
QLD: QCIF, JCU (TSV+CNS)
NSW: UNSW, SCU
VIC: Monash, UniMelb
SA: UniAdel
TAS: UTas
22. Australia is geographically challenging:
leverage technology, international and local expertise to help
deliver training to a wider audience
Genome Annotation using Apollo