Patent Annotation Tool

  1. Chemistry Annotation Tool, Diego Bragato [email_address]
  2. A custom ETL tool and a metadata editor
       - Extracting entities from text
       - Enriching these entities manually
       - Enriching entities through automated lookups
       - Enriching entities with relations to other entities and to ontologies
       - A larger architecture built around the tool for automated extraction with text mining and ETL
  3. The patent chemistry business case
       - The text sources are chemistry patents
       - Entities are compounds, substances, and classes of compounds or substances
       - The ontology/category provider for chemistry is the ChEBI database
       - The database to be built is a large-scale chemistry database
  4. Opening the tool
  5. One pencil for each annotation type
  6. Select an entity to annotate: “a compound”
  7. Enrich the annotation and save
  8. Annotate different entities and enrich them through an external lookup: “CHEBI:16336”
  9. Automated lookups for multiple annotations in the text
  10. Enrich with relation annotations
  11. Relation selection: “Hyaluronic Acid”
  12. Class selection: “Natural Mucopolysaccharide”
  13. Enriched with: “Hyaluronic acid” is a “natural mucopolysaccharide”; “Sodium Hyaluronan” is a “Hyaluronic Acid”
  14. Text-to-ontology selection
  15. Text-to-ontology annotation: “Hyaluronic acid is a molecular entity”
  16. Features
       - Entity annotation (together with attributes)
       - Relation annotation (from one entity to an ontology, a dictionary, or another entity)
       - Localization of each annotation by its position in the text
       - Local dictionaries
       - Dynamic dictionaries (creating a new dictionary of entities)
       - One local ontology (cyclic graph)
       - Annotate once locally or everywhere (automated lookup of the same terms in the text)
  17. To improve
       - Partially supported features:
           - Part-of-word annotation
           - Multiple annotations on the same group of words (e.g. annotating both “vitamin E” and “vitamin E acetate”)
       - To improve:
           - Better definition of entities, properties and relations
           - Multiple ontologies and dictionaries
  18. Must have
       - Remote lookups, and remote ontologies and dictionaries, both static and dynamic
       - Remote loading into a database
       - Disjoint annotations: words separated by other text but logically connected (e.g. a chemical name scattered across several words)
       - Character-level resolution (so the tool can properly support high-recall text mining)
  19. Nice to have
       - Image support
       - PDF support
       - Cross-annotation between images and text (applicable to Markush structures in patents)
       - Annotation of information such as compositions, mixtures and syntheses
       - OWL support
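Slides 9 and 16 mention automated lookup of the same terms in the text ("annotate once locally or everywhere") and localization of each annotation by its position in the text. The deck does not say how this is implemented. The following is a minimal Python sketch that assumes case-insensitive, whole-word matching and uses a hypothetical `Annotation` record holding character offsets; it is an illustration, not the tool's actual code.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Hypothetical entity annotation anchored to character offsets."""
    start: int  # offset of the first character
    end: int    # offset one past the last character
    text: str
    label: str
    attributes: dict = field(default_factory=dict)


def annotate_everywhere(text, term, label, attributes=None):
    """Annotate every whole-word, case-insensitive occurrence of `term`.

    Assumption: "whole word" means the match is not directly preceded or
    followed by a word character, so "hyaluronic acids" is not matched.
    """
    # re.escape keeps characters such as '(' or '-' in chemical names literal;
    # the lookarounds stop matches that are part of a longer word.
    pattern = re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)
    return [
        # Each hit gets its own copy of the attributes entered once by the user.
        Annotation(m.start(), m.end(), m.group(0), label, dict(attributes or {}))
        for m in pattern.finditer(text)
    ]


text = "Hyaluronic acid is used here. The salt of hyaluronic acid is sodium hyaluronan."
hits = annotate_everywhere(text, "hyaluronic acid", "compound",
                           {"class": "natural mucopolysaccharide"})
for hit in hits:
    print(hit.start, hit.end, hit.text)  # prints "0 15 Hyaluronic acid", then "42 57 hyaluronic acid"
```

Storing offsets rather than only the matched string is what lets every hit be located again in the source text, which the localization feature on slide 16 requires.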
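Slides 10 to 15 enrich entities with "is a" relations, both between entities ("Sodium Hyaluronan" is a "Hyaluronic Acid") and from text to ontology classes ("Hyaluronic acid is a molecular entity"), and slide 16 describes the local ontology as a cyclic graph. One way to use such relations is to store them as (subject, predicate, object) triples and collect every class an entity inherits. The sketch below assumes that triple layout and an `is_a` predicate name, both hypothetical; the visited set keeps the traversal finite even when the graph contains cycles.

```python
from collections import defaultdict, deque


def build_is_a(triples):
    """Index (subject, predicate, object) triples by subject for 'is_a' edges."""
    parents = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == "is_a":
            parents[subject].add(obj)
    return parents


def ancestors(term, parents):
    """Return every class reachable from `term` by following 'is_a' edges.

    Breadth-first search with a visited set, so it terminates on cyclic graphs.
    """
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        # .get avoids inserting empty entries into the defaultdict.
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# Relations taken from the examples on slides 13 and 15.
triples = [
    ("Sodium Hyaluronan", "is_a", "Hyaluronic Acid"),
    ("Hyaluronic Acid", "is_a", "Natural Mucopolysaccharide"),
    ("Hyaluronic Acid", "is_a", "molecular entity"),
]
parents = build_is_a(triples)
print(sorted(ancestors("Sodium Hyaluronan", parents)))
# prints ['Hyaluronic Acid', 'Natural Mucopolysaccharide', 'molecular entity']
```

Annotating "Sodium Hyaluronan" once is then enough to classify it under every ancestor class without entering each relation by hand.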

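Slides 17 and 18 list overlapping annotations (“vitamin E” and “vitamin E acetate”), disjoint annotations (a chemical name scattered across words), and character-level resolution as partially supported or required features. A data structure that covers all three is an annotation made of one or more half-open character spans. The Python sketch below assumes that representation; `SpanAnnotation` and the example sentence are hypothetical and are not taken from the tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanAnnotation:
    """Hypothetical annotation made of one or more character spans.

    Each span is a half-open (start, end) pair of character offsets: one span
    covers a contiguous mention, several spans cover a disjoint one, and spans
    may start or end inside a word (part-of-word annotation).
    """
    spans: tuple  # tuple of (start, end) pairs, in text order
    label: str

    def surface(self, text):
        """Join the covered fragments with a space to form the logical name."""
        return " ".join(text[start:end] for start, end in self.spans)

    def overlaps(self, other):
        """True if at least one character is covered by both annotations."""
        return any(
            a_start < b_end and b_start < a_end
            for a_start, a_end in self.spans
            for b_start, b_end in other.spans
        )


text = "vitamin E acetate and sodium or potassium hyaluronate"
short = SpanAnnotation(((0, 9),), "compound")               # "vitamin E"
longer = SpanAnnotation(((0, 17),), "compound")             # "vitamin E acetate"
sodium = SpanAnnotation(((22, 28), (42, 53)), "compound")   # disjoint mention
potassium = SpanAnnotation(((32, 53),), "compound")         # "potassium hyaluronate"

print(sodium.surface(text))       # prints "sodium hyaluronate"
print(short.overlaps(longer))     # prints True
print(sodium.overlaps(short))     # prints False
```

Because each annotation keeps its own spans, nested and overlapping annotations on the same words can coexist without one replacing the other.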