Automatic extraction of genes and diseases

  1. Extraction of Relations from Unstructured Text
     • Jamsandekar Mugdha
     • Prabhu Priyanka
     • Sugandh Neha
  2. Introduction
     • Goal: automatically extract groups of entities that stand in a specific relation from unstructured text.
     • HuGE (Human Genome Epidemiology): a medical database containing about 20,000 abstracts
     • Target relation: (gene, disease)
  3. Approach
     • (gene, disease) seed values
     • MMTX: medical ontology
     • Pattern <order, prefix, gene/disease, middle, disease/gene, suffix>
     • Weights for prefix, middle and suffix; stop words
     • Threshold, match
  4. Patterns
     • <true| polymorphisms of|LTA gene| are associated with risk of|MI| in Japanese>
     • <true| allele of|angiotensin converting enzyme (ACE) gene| is associated with|hypoxemia| in sars p>
     • <false| developing ovarian|cancer| borderline tumours in presence of|BRCA1| mutations>
     • <true| apolipoprotein|epsilon4 allele| with progression in|AD| pkd may not>
     • <true| mutations in|BRCA1| brca2 genes explain at least 10 % of breast|cancer| cases diagnosed>
     • <true| mutation of|BRCA1| contributes little occurrence of breast|cancer| in taiwanese>
     • <true| between|factor V Leiden gene variant| carotid|atherosclerosis| in cross - sectional>
  5. 30 relations obtained from 4% of the abstracts
  6. Statistics
     • 4% of abstracts → 30 pair results
     • All 20,000 abstracts ≈ 750 pair results
     • 30 results → 1 error pair
     • Error ≈ 1/30 = 3.33%
  7. Advantages
     • Generic, domain-independent method for extracting relations
     • Returns the patterns in which the entities occur; these can be used in pattern analysis
     • Results improve over iterative steps
     • More efficient than the traditional ontology-based approach
  8. Extraction of Relations from Unstructured Text
     • Thank You
     • Any questions?
     • Jamsandekar Mugdha
     • Prabhu Priyanka
     • Sugandh Neha
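
The six-field pattern tuple from slide 3 and the examples on slide 4 can be illustrated with a minimal Python sketch. The slides do not give the actual scoring function, weights, stop-word list, or threshold, so everything below is an assumption made for illustration: the names `Pattern`, `parse_pattern`, `gene_disease` and `match_score` are hypothetical, the weights are arbitrary, and token-set (Jaccard) overlap is swapped in as a stand-in similarity measure. Reading `true` as "gene appears before disease" is inferred from the example patterns, not stated in the slides.

```python
from dataclasses import dataclass

# Assumed values for illustration only; the slides do not list them.
STOP_WORDS = frozenset({"of", "in", "the", "a", "with", "is", "are"})
WEIGHTS = {"prefix": 0.2, "middle": 0.6, "suffix": 0.2}


@dataclass(frozen=True)
class Pattern:
    gene_first: bool  # inferred meaning of the <order> field
    prefix: str
    first: str        # first entity in text order
    middle: str
    second: str       # second entity in text order
    suffix: str


def parse_pattern(text: str) -> Pattern:
    """Parse '<order| prefix|entity| middle|entity| suffix>' into a Pattern."""
    fields = [f.strip() for f in text.strip().strip("<>").split("|")]
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    order, prefix, first, middle, second, suffix = fields
    return Pattern(order.lower() == "true", prefix, first, middle, second, suffix)


def gene_disease(p: Pattern) -> tuple[str, str]:
    """Return the (gene, disease) pair regardless of text order."""
    return (p.first, p.second) if p.gene_first else (p.second, p.first)


def _tokens(s: str) -> set[str]:
    return {w for w in s.lower().split() if w not in STOP_WORDS}


def _overlap(a: str, b: str) -> float:
    """Jaccard overlap of non-stop-word tokens (1.0 if both are empty)."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def match_score(p: Pattern, q: Pattern) -> float:
    """Weighted context similarity; 0.0 if the entity order differs."""
    if p.gene_first != q.gene_first:
        return 0.0
    return sum(w * _overlap(getattr(p, k), getattr(q, k))
               for k, w in WEIGHTS.items())


lta = parse_pattern("<true| polymorphisms of|LTA gene| are associated "
                    "with risk of|MI| in Japanese>")
ace = parse_pattern("<true| allele of|angiotensin converting enzyme (ACE) gene|"
                    " is associated with|hypoxemia| in sars p>")
brca = parse_pattern("<false| developing ovarian|cancer| borderline tumours "
                     "in presence of|BRCA1| mutations>")

print(gene_disease(lta))    # ('LTA gene', 'MI')
print(gene_disease(brca))   # ('BRCA1', 'cancer')
print(match_score(lta, ace))
```

In a scheme like this, a candidate context would count as a match when its score against a known pattern exceeds a chosen threshold; the threshold used by the authors is not given in the slides.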

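The arithmetic on slide 6 can be checked with a few lines of Python. This is only a sketch of the extrapolation: it assumes the remaining abstracts yield pairs at the same rate as the 4% sample, which the slides imply but do not justify, and the variable names are illustrative.

```python
TOTAL_ABSTRACTS = 20_000   # size of the HuGE collection (slide 2)
SAMPLE_FRACTION = 0.04     # share of abstracts processed (slide 5)
PAIRS_FOUND = 30           # (gene, disease) pairs extracted from the sample
ERROR_PAIRS = 1            # incorrect pairs among those 30

# Number of abstracts actually processed.
sample_size = round(TOTAL_ABSTRACTS * SAMPLE_FRACTION)

# Linear extrapolation to the full collection (assumes a constant yield).
projected_pairs = round(PAIRS_FOUND / SAMPLE_FRACTION)

# Observed error rate in percent, rounded to two decimals.
error_rate_pct = round(100 * ERROR_PAIRS / PAIRS_FOUND, 2)

print(sample_size, projected_pairs, error_rate_pct)  # 800 750 3.33
```

With a single error among 30 pairs, the 3.33% figure rests on a very small sample and is best read as a rough indication rather than a stable estimate.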