Plant systematics to cancer biology:
Transferrable skills and evolutionary
thinking in bioinformatics
Kate L Hertweck
The University of Texas at Tyler
Department of Biology
Twitter @k8hert
I am...
...an educator and researcher.
...an evolutionary biologist.
...a data-driven bioinformaticist.
...committed to reproducible science.
Goal: Relate genomic variation to organismal function and evolution to understand complex traits.
Outline:
1. Evolution in monocots
2. Population genomics in Drosophila
3. Biomarkers in cancer
Objectives:
1. Identify associations between genomic and organismal variation
2. Consider opportunities for transferring bioinformatic skills among model systems
Biodiversity Heritage Library
Can we use genomic data to determine relationships among species and identify patterns of genomic evolution across deep time?
Monocots are a delicious and diverse model system
● ca. 60,000 species, many edible and ornamental
● Variation in traits
  ● life history: growth habit, habitat
  ● genome: size, chromosome number, ploidy
● Few genomic resources except in grasses
Darlington 1963
Asparagus from user Evan-Amos
Allium from user Ram-Man
Iris from user Bob Gutowski
Allium, Bozzini 1964
Monocots exhibit varying rates of evolution and shifts in diversification rates
Hertweck et al., 2015 Bot J Linn Soc
● Data: Eight loci from three genomic partitions (mt, cp, nuclear; including one low-copy nuclear gene)
● Analysis: tree-building with RAxML, divergence time analysis with r8s and multidivtime, diversification with MEDUSA and apTreeshape
[Figure: monocot phylogeny. Legend: fossil calibration; species-rich lineage (MEDUSA); species-poor lineage (MEDUSA); A: apTreeshape]
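MEDUSA fits birth-death models across a whole tree, which is beyond a slide-sized example. As a much simpler illustration of what a "diversification rate" is (not MEDUSA's method), the sketch below implements the Magallón and Sanderson (2001) net diversification estimators with extinction set to zero. The clade names, species counts, and ages are hypothetical.

```python
import math


def net_diversification_stem(n_species: int, stem_age: float) -> float:
    """Net diversification rate from stem age, assuming no extinction: r = ln(N) / t."""
    if n_species < 1 or stem_age <= 0:
        raise ValueError("need n_species >= 1 and stem_age > 0")
    return math.log(n_species) / stem_age


def net_diversification_crown(n_species: int, crown_age: float) -> float:
    """Net diversification rate from crown age, assuming no extinction.

    r = (ln(N) - ln(2)) / t, because a crown group starts with two lineages.
    """
    if n_species < 2 or crown_age <= 0:
        raise ValueError("need n_species >= 2 and crown_age > 0")
    return (math.log(n_species) - math.log(2)) / crown_age


# Hypothetical clades: (species richness, crown age in Myr)
clades = {"clade_rich": (800, 20.0), "clade_poor": (10, 20.0)}
for name, (n, age) in clades.items():
    print(name, round(net_diversification_crown(n, age), 3))
```

Two clades of equal age but very different richness get very different rates, which is the kind of contrast that rate-shift methods such as MEDUSA formalize on a full tree.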
Plastid genomes resolve relationships in Asparagales
Steele, Hertweck, Mayfield, McKain, Leebens-Mack, and Pires, 2012 AJB
● Data: Genomic survey sequences (GSS; anonymous, low-coverage NGS data)
● Analysis: plastome and mt/nrDNA assembly, tree building with PAUP and Garli
● Used less than 10% of the data collected!
[Figure: plastome phylogeny of Asparagales with tips Doryanthaceae, Iridaceae, Xeronemataceae, Hemerocallidoideae, Xanthorrhoeoideae, Asphodeloideae, Agapanthoideae, Allioideae, Amaryllidoideae, Aphyllanthoideae, Lomandroideae, Asparagoideae, Nolinoideae, Agavoideae, Scilloideae, and Brodiaeoideae; families labeled Xanthorrhoeaceae, Agapanthaceae, and Asparagaceae. Asterisks (*) mark nodes with increased bootstrap support; one node is marked as problematic.]
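Bootstrap support, marked by the asterisks above, comes from resampling alignment columns. The sketch below shows only the resampling step of a standard nonparametric bootstrap: each pseudo-replicate would then be analyzed with the tree-building program (PAUP or Garli here), and a node's support is the percentage of replicate trees containing it. The toy alignment and taxon names are hypothetical.

```python
import random


def bootstrap_replicate(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Return one nonparametric bootstrap pseudo-replicate of an alignment.

    Sites (columns) are sampled with replacement; every taxon receives the
    same resampled column order, so columns stay intact.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    n_sites = lengths.pop()
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in cols) for taxon, seq in alignment.items()}


# Hypothetical toy alignment; real plastome alignments are far longer
aln = {"taxonA": "ACGTACGT", "taxonB": "ACGTTCGT", "taxonC": "ACCTTCGA"}
rng = random.Random(42)
replicates = [bootstrap_replicate(aln, rng) for _ in range(100)]
```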
Transposable elements are an underappreciated source of genomic variation
● Transposable elements (TEs): mobile genetic elements, or "jumping genes"
  ● Independently replicating
  ● Similar to or derived from viruses
  ● Occur in multiple copies throughout the genome
● TEs are an important driver of genomic evolution
  ● Interactions with genes
  ● Genome-wide modifications
  ● Source of mutation on which natural selection can act
Approach: assembly of TEs from GSS
● Contigs are consensus sequences of the most abundant TEs in the genome
● TEs must be present in high copy number to yield enough reads for detection (assembly)
● The older a TE insertion, the more mutations it is likely to have accumulated, which inhibits detection
● Data are presented as the percentage of each TE type in the nuclear genome (relative abundance)
Heslop-Harrison et al, 1997
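A minimal sketch of the relative-abundance calculation, assuming reads have already been assigned to TE types (for example by mapping them back to annotated consensus contigs) and organellar reads have been removed. The function name, TE categories, and counts are hypothetical. Because only high-copy TEs assemble, these percentages describe only the TEs abundant enough to be detected.

```python
def te_relative_abundance(te_read_counts: dict[str, int], nuclear_reads: int) -> dict[str, float]:
    """Percentage of nuclear reads assigned to each TE type."""
    if nuclear_reads <= 0:
        raise ValueError("nuclear_reads must be positive")
    if sum(te_read_counts.values()) > nuclear_reads:
        raise ValueError("more TE-assigned reads than nuclear reads")
    return {te: 100.0 * n / nuclear_reads for te, n in te_read_counts.items()}


# Hypothetical read counts after organellar reads have been removed
counts = {"copia": 1500, "gypsy": 3000, "DNA transposon": 500}
abundance = te_relative_abundance(counts, nuclear_reads=10000)
total_repeat_pct = sum(abundance.values())
```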
Hertweck, 2013, Genome
TE content does not vary with genome size
[Figure: percentage of sequence reads from the nuclear genome (0 to 70%) versus genome size (Mb/1C, 0 to 25,000) for Aphyllanthes, Lomandra, Sansevieria, Asparagus, Ledebouria, Dichelostemma, Agapanthus, Allium, Haworthia, Hosta, and Scadoxus. Callout: one of the largest genomes in the dataset, but a very small proportion of repeats!]
● Data: Previously published GSS data
● Analysis: assembly with MaSuRCA, BLAST to remove organellar sequences, annotation with RepeatMasker
● Results are inconsistent with the hypothesis that TE proliferation is related to an increase in genome size
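The organellar-removal step can be sketched as a filter on BLAST+ tabular output (`-outfmt 6`), assuming assembled contigs were searched against a database of plastid and mitochondrial sequences. The e-value and alignment-length thresholds below are hypothetical; the slide does not state which cutoffs were used.

```python
import csv
import io

# Default field order of BLAST+ tabular output (-outfmt 6):
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore


def organellar_contig_ids(blast_tab: str, max_evalue: float = 1e-10,
                          min_align_len: int = 100) -> set[str]:
    """IDs of query contigs with at least one qualifying hit to the organellar database."""
    flagged = set()
    for row in csv.reader(io.StringIO(blast_tab), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        qseqid, align_len, evalue = row[0], int(row[3]), float(row[10])
        if evalue <= max_evalue and align_len >= min_align_len:
            flagged.add(qseqid)
    return flagged


def drop_organellar(contigs: dict[str, str], blast_tab: str, **thresholds) -> dict[str, str]:
    """Keep only contigs without a qualifying organellar hit."""
    flagged = organellar_contig_ids(blast_tab, **thresholds)
    return {cid: seq for cid, seq in contigs.items() if cid not in flagged}


# Hypothetical BLAST output and contigs
blast_out = (
    "contig1\tplastome\t99.1\t850\t8\t0\t1\t850\t1200\t2049\t0.0\t1500\n"
    "contig2\tmito\t91.0\t60\t5\t0\t10\t69\t300\t359\t1e-5\t80\n"
)
contigs = {"contig1": "ACGT", "contig2": "GGCC", "contig3": "TTAA"}
kept = drop_organellar(contigs, blast_out)
```

Here contig2's weak, short hit falls below both thresholds, so it is retained as nuclear; tightening or loosening the cutoffs changes how aggressively organellar-like repeats are discarded.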
Agavoideae TEs are difficult to annotate but appear to vary with ploidy
● Data: GSS from Agavoideae (e.g., tequila agave)
● Analysis: additional annotation methods with CDD (Conserved Domain Database)
● Agavoideae TEs are particularly difficult to sequence
● CDD more than doubles the amount of identifiable sequence!
[Figure callout: tetraploid, largest (known) genome in dataset]
Agave tequilana from user Stan Shebs
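One way to quantify "more than doubles identifiable sequence" is to compare the base pairs covered by one annotation source with the base pairs covered by the union of two sources, merging overlaps so nothing is counted twice. This is a sketch assuming hits are half-open intervals on a single contig's coordinates; the coordinates below are hypothetical and are not the study's results.

```python
def merge_intervals(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def annotated_fraction(intervals: list[tuple[int, int]], total_bp: int) -> float:
    """Fraction of total_bp covered by the union of intervals (single contig assumed)."""
    covered = sum(end - start for start, end in merge_intervals(intervals))
    return covered / total_bp


# Hypothetical annotations on 1,000 bp of assembled repeat sequence
repeatmasker_hits = [(0, 100)]
cdd_hits = [(50, 250)]
before = annotated_fraction(repeatmasker_hits, 1000)
after = annotated_fraction(repeatmasker_hits + cdd_hits, 1000)
```

For many contigs, the same merge would be applied per contig and the covered lengths summed.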
Allium has much lower proportions of copia LTR retrotransposons than closely related genera
● Data: GSS from Allioideae (onion, garlic, leek)
● Allium has 800+ species, while related genera have relatively few
● The low proportion of copia runs counter to the expectation that TE expansion drives diversification
[Figure: proportions of copia and gypsy elements in Allium versus other Allioideae]
Allium senescens from user Adamantios
Conclusions: Evolution in monocots
Can we use genomic data to determine relationships among species and identify patterns of genomic evolution across deep time?
● Monocot phylogenetics
  ● Unlinked loci from across the genome provide the framework for diversification analyses
  ● Complete plastomes resolve Asparagales relationships
● Asparagales TEs
  ● GSS can suggest which parts of the genome may be interesting for further investigation
1. Evolution in monocots
2. Population genomics in Drosophila
3. Biomarkers in cancer
Collaborators:
Michael R. Rose (UC Irvine)
Joseph L. Graves (NC A&T, UNCG)
D. melanogaster male from user Aka
Do populations experimentally selected for specific phenotypes yield similar genomic patterns?
Experimental evolution in Drosophila results in parallel responses to selection for development time
Long-term experimental evolution system (established 1980) with the following treatments:
A: short life cycle (9 days)
B: baseline life cycle (14 days)
C: long life cycle (28 days)
● Data: Whole-genome pooled population resequencing; three selection types, six treatments, five populations each
● Analysis: phenotypes, SNPs, structural variants, TEs
[Figure: treatments NCO, BO, AO, CO, ACO, and B]
Populations with accelerated development have higher TE load
● Analysis: Identification of per-population TE load using PoPoolationTE
● Within-treatment TE load is not significantly different (p > 0.05)
● Between-treatment TE load does differ
● Consistent with the expectation that TEs are more tightly controlled in populations with longer life spans
[Figure: TE load in B, C, and A treatments]
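The slide does not say which test produced p > 0.05, so the following is only a generic illustration of comparing TE load between two treatments with five replicate populations each: an exact two-sided permutation test on the difference in means. With 5 vs 5 there are 252 possible splits, so the smallest attainable p-value is 2/252 (about 0.008). The TE-load values are hypothetical.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_test(group_a: list[float], group_b: list[float]) -> float:
    """Two-sided exact permutation test for a difference in group means.

    Enumerates every split of the pooled values into groups of the original
    sizes, so it is only practical for small samples.
    """
    if not group_a or not group_b:
        raise ValueError("both groups need at least one value")
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Count splits at least as extreme as observed (small tolerance for float error)
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical TE-load values for five replicate populations per treatment
a_type = [10.2, 10.8, 11.1, 10.5, 10.9]
c_type = [8.1, 8.4, 7.9, 8.6, 8.0]
p = exact_permutation_test(a_type, c_type)
```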
Heterozygosity of TE insertions is higher in populations with accelerated development
● Analysis: T-lex to identify insertion frequencies of TEs relative to the reference genome
● Within-treatment TE load is not significantly different (p > 0.05)
● Between-treatment TE load does differ
● Consistent with the expectation that A-type selection is more intense
[Figure: B, C, and A treatments]
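Pooled sequencing yields population insertion frequencies rather than individual genotypes, so one plausible summary is expected heterozygosity under Hardy-Weinberg assumptions, 2p(1 - p), averaged over insertions. This is an assumption for illustration; the slide does not state the exact heterozygosity metric, and the frequencies below are hypothetical.

```python
def expected_heterozygosity(freq: float) -> float:
    """Expected heterozygosity 2p(1 - p) for a biallelic presence/absence locus."""
    if not 0.0 <= freq <= 1.0:
        raise ValueError("frequency must be between 0 and 1")
    return 2.0 * freq * (1.0 - freq)


def mean_te_heterozygosity(insertion_freqs: list[float]) -> float:
    """Average expected heterozygosity across TE insertions in one population."""
    if not insertion_freqs:
        raise ValueError("need at least one insertion frequency")
    return sum(expected_heterozygosity(p) for p in insertion_freqs) / len(insertion_freqs)


# Hypothetical T-lex-style insertion frequencies for one population
freqs = [0.5, 0.0, 1.0]
h = mean_te_heterozygosity(freqs)
```

Fixed (p = 1) and absent (p = 0) insertions contribute nothing, so the average rises when more insertions are segregating at intermediate frequencies.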
Between-treatment comparisons have more significantly differentiated TEs
● Analysis: T-lex to identify insertion frequencies of TEs relative to the reference genome, followed by a Cochran-Mantel-Haenszel (CMH) test
● 177 insertions vary in frequency between two or more populations
● 91 insertions were significantly differentiated in at least one treatment comparison
● Within-treatment comparisons have few to no significantly differentiated TEs
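The CMH test combines several 2x2 tables into one chi-square test with 1 degree of freedom. A minimal sketch, assuming each stratum pairs one replicate population from each of two treatments, with rows for the two populations and columns for reads supporting presence versus absence of an insertion. All counts are hypothetical, and in practice an established implementation such as R's `mantelhaen.test` would normally be used.

```python
import math


def cmh_test(tables: list[tuple[tuple[int, int], tuple[int, int]]],
             continuity: bool = True) -> tuple[float, float]:
    """Cochran-Mantel-Haenszel test for a set of 2x2 tables ((a, b), (c, d)).

    Returns (chi-square statistic with 1 df, two-sided p-value).
    """
    sum_a = sum_expected = sum_var = 0.0
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        if n < 2:
            continue  # stratum carries no information
        row1, col1 = a + b, a + c
        sum_a += a
        sum_expected += row1 * col1 / n
        sum_var += row1 * (c + d) * col1 * (b + d) / (n * n * (n - 1))
    if sum_var == 0:
        raise ValueError("variance is zero; tables are uninformative")
    diff = abs(sum_a - sum_expected)
    if continuity:
        diff = max(0.0, diff - 0.5)  # Yates-style continuity correction
    stat = diff * diff / sum_var
    # Chi-square (1 df) survival function: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical read counts for one insertion: ((pop1 present, pop1 absent), (pop2 present, pop2 absent))
strata = [((10, 0), (0, 10))]
stat, p = cmh_test(strata, continuity=False)
```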
Conclusions: Population genomics of Drosophila
Do populations experimentally selected for specific phenotypes yield similar genomic patterns?
● Yes, with evidence from across the genome
● Many types of TEs are responding to selective pressures
● Comparisons of treatment types show parallel responses to selection
● These data are a powerful tool for continuing to assess TE responses to selection at a genomic level
1. Evolution in monocots
2. Population genomics in Drosophila
3. Biomarkers in cancer
Collaborator:
Santanu Dasgupta (UT Health Northeast)
Philley et al, 2015, J Cell Phys
Can we integrate genomic data with experimental studies to identify biomarkers and cancer pathways?
Background
● Both detection and treatment of cancer remain problematic because of complex and heterogeneous genetics
● Integration of NGS analysis with traditional wet-lab work can inform the relevance of particular genetic variants and be used for biomarker development
Philley et al, 2015, J Cell Phys
Philley et al, 2015, J Cell Phys
Haplotype phylogeny identifies variants potentially linked to cancer
[Figure legend: turquoise = heteroplasmy; @ = reversion]
● Data: mitochondrial genome sequencing from prostate cancer patients
● Analysis: variant calling; haplogroups assigned with HaploGrep and PhyloTree
● Distinguishes variants due to common ancestry from variants possibly related to cancer
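Once a sample's haplogroup and its expected markers are known, separating inherited from possibly cancer-related variants is essentially a set comparison. The sketch below is a simplified illustration of that downstream logic, not HaploGrep's algorithm. The variant labels, marker set, and heteroplasmy thresholds (allele fraction between 0.10 and 0.90) are hypothetical.

```python
def classify_mt_variants(sample_variants: dict[str, float],
                         haplogroup_markers: set[str],
                         het_low: float = 0.10,
                         het_high: float = 0.90):
    """Classify mtDNA variant calls against the markers expected for a haplogroup.

    sample_variants maps a variant label (e.g. "A100G") to its allele fraction.
    Returns (classification, possible_reversions), where classification maps
    each variant to (origin, is_heteroplasmic).
    """
    classification = {}
    for variant, fraction in sample_variants.items():
        origin = "haplogroup" if variant in haplogroup_markers else "private"
        is_heteroplasmic = het_low <= fraction <= het_high
        classification[variant] = (origin, is_heteroplasmic)
    # Expected markers missing from the sample may reflect back-mutations (reversions)
    possible_reversions = haplogroup_markers - set(sample_variants)
    return classification, possible_reversions


# Hypothetical marker set and sample calls
markers = {"A100G", "C200T", "T400C"}
sample = {"A100G": 1.0, "C200T": 0.98, "G300A": 0.35}
classified, reversions = classify_mt_variants(sample, markers)
```

Private variants, especially heteroplasmic ones, are the natural candidates for follow-up, while haplogroup-defining variants mostly reflect shared ancestry.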
Somatic mutations inform analyses in genes of interest for HNSCC
● Data: whole-genome NGS data from paired tumor/non-tumor tonsil tissue (HPV-induced head and neck squamous cell carcinoma, HNSCC)
● Analysis: variant calling, filtering to retain only somatic variants, and mining genes of interest
● Provides the genetic context to match with protein expression studies
● Opportunities for data re-use to examine evolutionary questions
Kannan, Hertweck et al., in review
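At its simplest, the somatic filter keeps variants called in the tumor but not in the matched normal, and gene mining then intersects those variants with gene coordinates. This sketch assumes variants are already normalized to (chromosome, position, ref, alt); gene names and coordinates are hypothetical. Real pipelines typically rely on dedicated somatic callers that also account for read depth and tumor purity, which a plain set difference ignores.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def somatic_variants(tumor: set[Variant], normal: set[Variant]) -> set[Variant]:
    """Variants called in the tumor but absent from the matched normal sample."""
    return tumor - normal


def variants_in_genes(variants: set[Variant],
                      genes: dict[str, tuple[str, int, int]]) -> dict[str, list[Variant]]:
    """Map gene name to sorted variants within [start, end] (1-based, inclusive)."""
    hits = {}
    for gene, (chrom, start, end) in genes.items():
        inside = sorted(v for v in variants if v.chrom == chrom and start <= v.pos <= end)
        if inside:
            hits[gene] = inside
    return hits


# Hypothetical calls and gene coordinates
tumor = {Variant("chr1", 100, "A", "G"), Variant("chr1", 500, "C", "T"), Variant("chr2", 50, "G", "A")}
normal = {Variant("chr1", 100, "A", "G")}
genes = {"GENE_X": ("chr1", 400, 600), "GENE_Y": ("chr3", 1, 1000)}
somatic = somatic_variants(tumor, normal)
gene_hits = variants_in_genes(somatic, genes)
```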
Conclusions: Biomarkers in cancer
Can we integrate genomic data with experimental studies to identify biomarkers and cancer pathways?
● Paired tumor/normal samples are a powerful tool for identifying variants related to multiple types of cancer
● The integration of genomic data with wet-lab work contributes to both biomarker development and elucidation of cancer pathways
● Evolutionary thinking is valuable for interpreting integrative studies
General conclusions
● You can answer really interesting questions about evolutionary biology by combining NGS data with other types of biological information
● Skills to assess variation in large datasets are very transferrable and offer great opportunity for novel research approaches
Goal: Relate genomic variation to organismal function and evolution to understand complex traits.
1. Evolution in monocots
2. Population genomics in Drosophila
3. Biomarkers in cancer
Considerations for diversifying your research
● Learning reproducible science skills is well worth your time!
● Find a community.
● Be prepared to spend lots of time managing and organizing data.
● Choose collaborations carefully, but don't be afraid to branch out.
Image by Sugar Research Australia
Hibiscus dasycalyx by user Sesamehoneytart
Hibiscus dasycalyx by user Sesamehoneytart
Clostridium acetobutylicum by user Geoman3
Acknowledgements
Research: https://sites.google.com/site/k8hertweck
Blog: k8hert.blogspot.com
Twitter @k8hert
Google+ k8hertweck@gmail.com
GitHub https://github.com/k8hertweck
For images:
For research support:
For intellectual support and training:
Bulbapedia

Hertweck AB3ACBS presentation
