Improving Text Mining with Controlled
Natural Language:
A Case Study for Protein Interactions
Tobias Kuhn (speaker)
Loïc R...
Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 2
Cooperation of
University of Zurich
(Norbert E. Fuchs, Tobias Kuhn)
and...
Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 3
Introduction
 Biomedical literature is growing at a
tremendous pace
 ...
Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 4
Today's Solution
NLP, manual
annotation
Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 5
Our Approach
 Let the researchers express their own
results in a forma...
Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 6
Knowledge Representation
Languages
OWL with RDF/XML
Description Logics
...
Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 7
Attempto Controlled English
(ACE)
 Formal language that looks like nat...
Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 8
Formal Summaries
Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 9
Formal Summaries
BubR1 interacts-with a trunk-domain of Beta2-Adaptin.
...
Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 10
Ontology for Protein Interactions
Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 11
Empirical Study
 “How suitable is ACE together with our
ontology to e...
Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 12
Empirical Study
154
57
62
matched perfectly
matched partially
unmatche...
Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 13
Authoring tool
 Helps writing ACE sentences
 Shows step by step the ...
Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 14
Authoring tool:
Prototype demo
http://gopubmed.biotec.tu-dresden.de/Ac...
Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 15
Benefits of our Approach
 Consistency / redundancy checks
 “Is there...
Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 16
Conclusions
 Formal summaries for scientific articles
can make text m...
Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 17
Thank you for your attention!
Questions
&
Discussion
Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 18
Subheadings: Example
Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 19
Degree of Matching: Examples
 Matched perfectly:
 Interaction of Act...
Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 20
Reasons for Non-perfect
Matching: Examples
 Not covered by the model:...
Upcoming SlideShare
Loading in …5
×

Improving Text Mining with Controlled Natural Language: A Case Study for Protein Interactions

323 views
231 views

Published on

(See paper for details: http://attempto.ifi.uzh.ch/site/pubs/papers/dils2006_kuhn.pdf )

Published in: Technology, Education
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
323
On SlideShare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
3
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Improving Text Mining with Controlled Natural Language: A Case Study for Protein Interactions

  1. 1. Improving Text Mining with Controlled Natural Language: A Case Study for Protein Interactions Tobias Kuhn (speaker) Loïc Royer Norbert E. Fuchs Michael Schroeder DILS'06, Hinxton (UK) 21 July 2006
  2. 2. Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 2 Cooperation of University of Zurich (Norbert E. Fuchs, Tobias Kuhn) and TU Dresden (Loïc Royer, Michael Schroeder)
  3. 3. Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 3 Introduction  Biomedical literature is growing at a tremendous pace  PubMed contains 16 million articles and grows by over 600'000 articles per year  Computational support is needed!
  4. 4. Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 4 Today's Solution NLP, manual annotation
  5. 5. Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 5 Our Approach  Let the researchers express their own results in a formal language  Perfect processing of scientific results by computers  This formal language has to be ...  easy to learn and understand  expressive enough to express even complicated scientific results
  6. 6. Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 6 Knowledge Representation Languages OWL with RDF/XML Description Logics first-order logic ACE UML has
  7. 7. Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 7 Attempto Controlled English (ACE)  Formal language that looks like natural English  Unambiguously translatable into first- order logic  Restricted grammar  Unlimited vocabulary  www.ifi.unizh.ch/attempto
  8. 8. Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 8 Formal Summaries
  9. 9. Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 9 Formal Summaries BubR1 interacts-with a trunk-domain of Beta2-Adaptin. [A, B, C, D] named(A, BubR1)-1 object(A, atomic, named_entity, object, cardinality, count_unit, eq, 1)-1 named(B, Beta2-Adaptin)-1 object(B, atomic, named_entity, object, cardinality, count_unit, eq, 1)-1 object(C, atomic, trunk-domain, unspecified, cardinality, count_unit, eq, 1)-1 relation(C, trunk-domain, of, B)-1 predicate(D, unspecified, interact_with, A, C)-1 ACE text Logical representation (DRS)
  10. 10. Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 10 Ontology for Protein Interactions
  11. 11. Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 11 Empirical Study  “How suitable is ACE together with our ontology to express scientific results of protein interactions?”  Manual translation of 273 facts about protein interactions  These facts are subheadings of the “Results”-sections of 89 articles (journals by Elsevier)
  12. 12. Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 12 Empirical Study 154 57 62 matched perfectly matched partially unmatched not covered by the model relations of relations fuzzy 21 56 11 31 not understood Total: Non-perfect:
  13. 13. Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 13 Authoring tool  Helps writing ACE sentences  Shows step by step the possible continuations of the sentence  New words can be created on-the-fly  Awareness of the underlying ontology  The users do not need to know the details of the ACE syntax and of the underlying ontology
  14. 14. Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 14 Authoring tool: Prototype demo http://gopubmed.biotec.tu-dresden.de/AceWiki/
  15. 15. Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 15 Benefits of our Approach  Consistency / redundancy checks  “Is there a paper that contradicts my results?”  “Is there a paper that comes to the same or similar results?”  Answer extraction  “Which proteins interact with a certain domain of protein X?”  Automatically updated knowledge bases  “Give me an overview of the relations of a protein X to other proteins!”
  16. 16. Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 16 Conclusions  Formal summaries for scientific articles can make text mining easier and more powerful  ACE combines the power of ontologies with the convenience of natural language  Let the researchers formalize their own results!
  17. 17. Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 17 Thank you for your attention! Questions & Discussion
  18. 18. Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 18 Subheadings: Example
  19. 19. Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 19 Degree of Matching: Examples  Matched perfectly:  Interaction of Act1 with TRAF6  → Act1 interacts-with TRAF6.  Matched partially:  The mtFabD protein is part of the core of the FAS-II complex  → MtFabD is a subunit of FAS-II.  Unmatched:  Cav1 interacts differentially with distinct Dyn2 forms
  20. 20. Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006 20 Reasons for Non-perfect Matching: Examples  Not covered by the model:  Daxx Potentiates Fas-Mediated Apoptosis  Relations of relations:  Kal-GEF1 activation of Pak does not require GEF activity  Fuzzy:  ANKRD1 contains potential CASQ2 binding sequences located in both its NT- and CT-regions  Not understood:  hSrb7 does not interact with other nuclear receptors

×