PhD thesis presentation

Next-generation text-mining
applied to toxicogenomics data
analysis

Kristina Hettne
PhD thesis defense

20 December, 2012

Toxicogenomics: the study of whether a chemical
causes damage to genes

Text mining: teach a computer to “read”
articles and extract explicit information

Next-generation text mining: teach a
computer to find implicit information in
articles

Drug safety is essential!
But… how to minimize animal testing?

Image source: The Independent, July 12, 2012

Toxicogenomics data interpretation using
knowledge from manually
curated databases

Image sources: Verhallen and Piersma, 2011, de Jong et al 2011, http://www.flickr.com/photos/jseita/3764113525/

Not sufficient in coverage

We hypothesize that next-generation text mining
can increase the information coverage

Next-generation text mining = concept profile matching

[Figure: the information cloud for a gene concept and the information cloud for a chemical concept, linked by their shared concepts]

Image source: Herman van Haagen
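
The matching idea on this slide can be illustrated with a minimal sketch. It assumes that a concept profile is a mapping from concept names to association weights and that two profiles are compared by summing the products of the weights of their shared concepts. The function name, the concept names, and the weights are all hypothetical and are not taken from the thesis.

```python
def match_profiles(profile_a, profile_b):
    """Compare two concept profiles (dicts of concept name -> weight).

    Returns the match score (inner product over shared concepts)
    and the sorted list of shared concepts.
    """
    shared = sorted(set(profile_a) & set(profile_b))
    score = sum(profile_a[c] * profile_b[c] for c in shared)
    return score, shared


# Hypothetical toy profiles: names and weights are invented for illustration.
gene_profile = {"apoptosis": 0.5, "liver": 0.25, "oxidative stress": 0.5}
chemical_profile = {"liver": 0.5, "oxidative stress": 0.5, "kidney": 0.25}

score, shared = match_profiles(gene_profile, chemical_profile)
print(shared, score)
```

A gene and a chemical that never co-occur in one article can still receive a non-zero score if their profiles share concepts, which is the sense in which the information is implicit.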

Concepts come from a thesaurus and are identified
in text with concept identification software

A good thesaurus = the basis for good concept identification

Image source: Herman van Haagen
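
Thesaurus-based concept identification can be sketched as simple dictionary matching. This is an illustration only, assuming a thesaurus that maps concept identifiers to lists of terms and case-insensitive whole-word matching. The identifiers, terms, and example sentence are hypothetical, and real concept identification software does considerably more (for example, disambiguation).

```python
import re

# Hypothetical mini-thesaurus: concept ID -> terms (synonyms) for that concept.
THESAURUS = {
    "CHEM:1": ["flusilazole"],
    "CHEM:2": ["triazole", "triazoles"],
    "GENE:1": ["cyp26a1"],
}


def identify_concepts(text, thesaurus):
    """Return sorted (position, concept ID, matched term) tuples found in text."""
    lowered = text.lower()
    found = []
    for concept_id, terms in thesaurus.items():
        for term in terms:
            # Whole-word match so that "triazole" does not fire inside "triazoles".
            pattern = r"\b" + re.escape(term) + r"\b"
            for match in re.finditer(pattern, lowered):
                found.append((match.start(), concept_id, term))
    return sorted(found)


# Invented example sentence, used only to exercise the matcher.
sentence = "Triazoles such as flusilazole were tested alongside Cyp26a1 expression."
hits = identify_concepts(sentence, THESAURUS)
print([concept_id for _, concept_id, _ in hits])
```

The sketch shows why thesaurus quality matters: a concept whose terms are missing from the thesaurus is never found, and an ambiguous term is matched wherever it appears.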

Research objectives:
• Investigate information coverage in public biomedical and chemical thesauri and databases
• Provide methods to improve the quality and coverage
• Give recommendations for use
• Investigate added value of next-generation text mining when interpreting toxicogenomics data

A thesaurus of chemical concepts [1] and
methods [1,2,3] to prepare a thesaurus to be
used with concept identification software

http://www.biosemantics.org/casper http://www.biosemantics.org/jochem

1. Hettne et al. Bioinformatics, 2009
2. Hettne et al. Journal of Biomedical Semantics, 2010
3. Hettne et al. Journal of Cheminformatics, 2010

A next-generation text mining-based method
for interpreting biological data

[Figure: diagram with three elements: biological data, statistical test, next-generation text mining]

This method gives more results, and more specific results,
than other available tools [1]
http://www.biosemantics.org/weightedglobaltest

1. Jelier R, Goeman JJ, Hettne KM, Schuemie MJ, den Dunnen JT, 't Hoen PA. Briefings in Bioinformatics, 2011

Application to toxicogenomics
Hettne et al. (submitted)
http://www.biosemantics.org/index.php?page=chemicalresponse-specific-gene-sets

See developmental defects in stem cells instead of
in animal embryos
[Figure 1: embryonic structure]
[Figure 2: rat embryos, A) control group, B) triazole-exposed; label: posterior neuropore open]
Image sources: 1. Verhallen and Piersma, 2011; 2. De Jong et al 2012

Toxicity class prediction (case study: Triazoles)
The chemical-gene matrix is 25 times larger than the one from
manual work (Comparative Toxicogenomics Database)
[Figure 1; label: Chemical]

Image source 1: Verhallen and Piersma, 2011

Conclusions
Next-generation text mining combined with
statistical tests complements, and is
sometimes superior to, manually curated
databases in:
- Relating chemical information to gene
expression data
- Identifying toxic effects already at the
gene expression stage
- Discriminating between different classes
of chemicals

Future
1. Make the method easier to use
(currently being worked on)

2. Apply the method to new drugs
with unknown toxicity

Early prediction of toxicity ->
less animal testing and safer drugs

Thank you to all who made
this possible!
