Automatic extraction of genes and diseases

Transcript

  • 1. Extraction of Relations from Unstructured Text
      - Jamsandekar Mugdha, Prabhu Priyanka, Sugandh Neha
  • 2. Introduction
      - Goal: to automatically extract groups of entities that stand in a specific relation from unstructured text.
      - HuGE (Human Gene Expression): a medical database containing about 20,000 abstracts.
      - Relation of interest: (gene, disease) pairs.
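  • 3. Approach
      - Start from (gene, disease) seed pairs.
      - Use MMTx (MetaMap Transfer) as the medical-ontology component.
      - Represent each occurrence as a pattern <order, prefix, gene/disease, middle, disease/gene, suffix>, where order is true when the gene appears before the disease.
      - Assign separate weights to the prefix, middle and suffix contexts; remove stop words.
      - Accept a match between patterns only when its score reaches a threshold.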
  • 4. Patterns
      - <true| polymorphisms of|LTA gene| are associated with risk of|MI| in Japanese>
      - <true| allele of|angiotensin converting enzyme (ACE) gene| is associated with|hypoxemia| in sars p>
      - <false| developing ovarian|cancer| borderline tumours in presence of|BRCA1| mutations>
      - <true| apolipoprotein|epsilon4 allele| with progression in|AD| pkd may not>
      - <true| mutations in|BRCA1| brca2 genes explain at least 10 % of breast|cancer| cases diagnosed>
      - <true| mutation of|BRCA1| contributes little occurrence of breast|cancer| in taiwanese>
      - <true| between|factor V Leiden gene variant| carotid|atherosclerosis| in cross - sectional>
  • 5. 30 Relations Obtained from 4% of the Abstracts
  • 6. Statistics
      - 4% of abstracts → 30 pairs
      - All 20,000 abstracts → about 750 pairs (extrapolated)
      - 30 pairs → 1 incorrect pair
      - Error rate ≈ 1/30 ≈ 3.33%
  • 7. Advantages
      - Generic, domain-independent method for extracting relations
      - Reports the patterns in which the entities occur; these can be reused for pattern analysis
      - Results improve with each iteration
      - More efficient than the traditional ontology-based approach
  • 8. Extraction of Relations from Unstructured Text
      - Thank You
      - Any Questions?
      - Jamsandekar Mugdha
      - Prabhu Priyanka
      - Sugandh Neha
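
The pattern notation on slides 3 and 4 can be illustrated with a small Python sketch that builds an <order, prefix, first entity, middle, second entity, suffix> tuple from one sentence mentioning a seed (gene, disease) pair. This is not the authors' code: the two-word context window (`WINDOW`), the exact-substring entity lookup and the names `PatternTuple` and `make_pattern` are hypothetical, and the two sentences in the usage section were written for illustration rather than taken from the HuGE abstracts. The MMTx ontology step is not modelled.

```python
from typing import NamedTuple, Optional

WINDOW = 2  # hypothetical: context words kept before and after the entity pair


class PatternTuple(NamedTuple):
    order: bool   # True when the gene occurs before the disease
    prefix: str   # up to WINDOW words before the first entity
    first: str    # entity that occurs first in the sentence
    middle: str   # all words between the two entities
    second: str   # entity that occurs second
    suffix: str   # up to WINDOW words after the second entity

    def render(self) -> str:
        """Format the tuple in the <order| prefix|e1| middle|e2| suffix> notation of slide 4."""
        return (f"<{str(self.order).lower()}| {self.prefix}|{self.first}| "
                f"{self.middle}|{self.second}| {self.suffix}>")


def make_pattern(sentence: str, gene: str, disease: str,
                 window: int = WINDOW) -> Optional[PatternTuple]:
    """Build a pattern from a sentence that mentions both members of a (gene, disease) pair."""
    g, d = sentence.find(gene), sentence.find(disease)
    if g < 0 or d < 0:
        return None  # the pair does not co-occur in this sentence
    order = g < d
    first, second = (gene, disease) if order else (disease, gene)
    start1, start2 = min(g, d), max(g, d)
    end1 = start1 + len(first)
    if end1 > start2:
        return None  # overlapping mentions leave no middle context
    before = sentence[:start1].split()
    prefix = " ".join(before[max(0, len(before) - window):])
    middle = " ".join(sentence[end1:start2].split())
    suffix = " ".join(sentence[start2 + len(second):].split()[:window])
    return PatternTuple(order, prefix, first, middle, second, suffix)


if __name__ == "__main__":
    # Illustrative sentences, written to reproduce two of the patterns on slide 4.
    s1 = "Common polymorphisms of LTA gene are associated with risk of MI in Japanese patients."
    s2 = "Risk of developing ovarian cancer borderline tumours in presence of BRCA1 mutations"
    print(make_pattern(s1, "LTA gene", "MI").render())
    # <true| polymorphisms of|LTA gene| are associated with risk of|MI| in Japanese>
    print(make_pattern(s2, "BRCA1", "cancer").render())
    # <false| developing ovarian|cancer| borderline tumours in presence of|BRCA1| mutations>
```

The second call shows how the order flag is set to false when the disease precedes the gene in the sentence, which matches the third pattern on slide 4.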
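
Slide 3 lists per-context weights, stop words, a threshold and a match step without giving a formula. One plausible reading, similar to the weighted-context matching of Snowball-style bootstrapping systems, scores two patterns as a weighted sum of cosine similarities between their prefix, middle and suffix word vectors, and treats patterns with different order as non-matching. The sketch below assumes that scheme; the stop-word list, the weights (0.2, 0.6, 0.2), the threshold (0.4) and the names `Pattern`, `match` and `is_match` are hypothetical choices for illustration, not the values or code used in the project.

```python
import math
from collections import Counter
from dataclasses import dataclass

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "in", "is", "are", "with", "and", "to"}

# Hypothetical context weights (they sum to 1) and match threshold.
W_PREFIX, W_MIDDLE, W_SUFFIX = 0.2, 0.6, 0.2
THRESHOLD = 0.4


@dataclass
class Pattern:
    order: bool   # True when the gene precedes the disease
    prefix: str   # words before the first entity
    middle: str   # words between the two entities
    suffix: str   # words after the second entity


def term_vector(text: str) -> Counter:
    """Bag of lower-cased words with stop words removed."""
    return Counter(w for w in text.lower().split() if w not in STOP_WORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-frequency vectors; 0.0 if either is empty."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match(p: Pattern, q: Pattern) -> float:
    """Weighted sum of per-context cosine similarities; different orders never match."""
    if p.order != q.order:
        return 0.0
    return (W_PREFIX * cosine(term_vector(p.prefix), term_vector(q.prefix))
            + W_MIDDLE * cosine(term_vector(p.middle), term_vector(q.middle))
            + W_SUFFIX * cosine(term_vector(p.suffix), term_vector(q.suffix)))


def is_match(p: Pattern, q: Pattern) -> bool:
    """Accept the match only if the weighted score reaches the threshold."""
    return match(p, q) >= THRESHOLD


if __name__ == "__main__":
    # Contexts of the first two patterns on slide 4.
    p = Pattern(True, "polymorphisms of", "are associated with risk of", "in Japanese")
    q = Pattern(True, "allele of", "is associated with", "in sars p")
    print(round(match(p, q), 3))  # 0.424
    print(is_match(p, q))         # True
```

Here only the shared middle word "associated" contributes to the score, which is why a heavier middle weight lets differently worded patterns match; with a stricter threshold the same pair would be rejected.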
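
The figures on slide 6 follow from simple arithmetic: 4% of 20,000 abstracts is 800 abstracts, and if the 30 pairs found there are representative, the full collection would yield about 30 / 0.04 = 750 pairs. This Python sketch reproduces those numbers; the helper name `summarize` is hypothetical, the extrapolation assumes relations are spread evenly across the abstracts, and with only 30 checked pairs the 1/30 error rate is a rough estimate.

```python
TOTAL_ABSTRACTS = 20_000  # abstracts in the collection (slide 2)
SAMPLE_PERCENT = 4        # percentage of abstracts processed (slide 6)
PAIRS_FOUND = 30          # (gene, disease) pairs extracted from that sample
ERROR_PAIRS = 1           # pairs judged incorrect


def summarize(total: int, sample_percent: int, pairs: int, errors: int) -> dict:
    """Reproduce the slide-6 arithmetic, using integer percentages to avoid float rounding."""
    return {
        "sample_size": total * sample_percent // 100,      # abstracts actually processed
        "projected_pairs": pairs * 100 // sample_percent,  # linear extrapolation, rounded down
        "error_rate": errors / pairs,                      # fraction of incorrect pairs
    }


if __name__ == "__main__":
    stats = summarize(TOTAL_ABSTRACTS, SAMPLE_PERCENT, PAIRS_FOUND, ERROR_PAIRS)
    print(stats["sample_size"])                 # 800
    print(stats["projected_pairs"])             # 750
    print(f"{100 * stats['error_rate']:.2f}%")  # 3.33%
```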
