Automatic extraction of genes and diseases

Transcript

  • 1. Extraction of Relations from Unstructured Text
    • Jamsandekar Mugdha, Prabhu Priyanka, Sugandh Neha
  • 2. Introduction
    • Goal: automatically extract groups of entities that stand in a specific relation from unstructured text
    • HuGE (Human Genome Epidemiology): a medical database containing about 20,000 abstracts
    • (gene, disease) relations
  • 3. Approach
    • (gene, disease) seed values
    • MMTx (MetaMap Transfer): medical ontology
    • Pattern: <order, prefix, gene/disease, middle, disease/gene, suffix>
    • Weights for prefix, middle and suffix; stop words
    • Threshold for declaring a match
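The pattern tuple, field weights and threshold listed on this slide could be combined roughly as follows. This is a minimal sketch, not the authors' implementation: the `Pattern` class, the Jaccard token overlap, the stop-word list, the weight values and the 0.5 threshold are all assumptions, since the slides do not state them.

```python
from dataclasses import dataclass

# Assumed stop-word list; the slides only say that stop words are used.
STOP_WORDS = frozenset({"of", "the", "in", "with", "a", "an", "is", "are"})

# Assumed weights and threshold; the slides give no concrete values.
WEIGHTS = {"prefix": 0.2, "middle": 0.6, "suffix": 0.2}
THRESHOLD = 0.5


@dataclass(frozen=True)
class Pattern:
    gene_first: bool  # the "order" field: True when the gene comes first
    prefix: str
    first: str        # gene if gene_first, otherwise disease
    middle: str
    second: str       # disease if gene_first, otherwise gene
    suffix: str


def tokens(text):
    """Lower-cased words of a context string with stop words removed."""
    return {w for w in text.lower().split() if w not in STOP_WORDS}


def overlap(a, b):
    """Jaccard overlap between the token sets of two context strings."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def similarity(p, q):
    """Weighted context similarity; zero if the entity order differs."""
    if p.gene_first != q.gene_first:
        return 0.0
    return (WEIGHTS["prefix"] * overlap(p.prefix, q.prefix)
            + WEIGHTS["middle"] * overlap(p.middle, q.middle)
            + WEIGHTS["suffix"] * overlap(p.suffix, q.suffix))


def matches(p, q):
    """True when the weighted similarity reaches the threshold."""
    return similarity(p, q) >= THRESHOLD


seed = Pattern(True, "polymorphisms of", "LTA gene",
               "are associated with risk of", "MI", "in Japanese")
candidate = Pattern(True, "allele of", "angiotensin converting enzyme (ACE) gene",
                    "is associated with", "hypoxemia", "in sars p")
print(similarity(seed, candidate), matches(seed, candidate))
```

Giving the middle context the largest weight reflects the intuition that the words between the two entities carry most of the relation; that choice is illustrative only.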
  • 4. Patterns
    • <true| polymorphisms of|LTA gene| are associated with risk of|MI| in Japanese>
    • <true| allele of|angiotensin converting enzyme (ACE) gene| is associated with|hypoxemia| in sars p>
    • <false| developing ovarian|cancer| borderline tumours in presence of|BRCA1| mutations>
    • <true| apolipoprotein|epsilon4 allele| with progression in|AD| pkd may not>
    • <true| mutations in|BRCA1| brca2 genes explain at least 10 % of breast|cancer| cases diagnosed>
    • <true| mutation of|BRCA1| contributes little occurrence of breast|cancer| in taiwanese>
    • <true| between|factor V Leiden gene variant| carotid|atherosclerosis| in cross - sectional>
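The pattern strings above can be split back into their six fields. The sketch below assumes that `|` never occurs inside a field and that an order flag of `true` means the gene comes first, which is how the examples on this slide read; `parse_pattern` is a hypothetical helper, not part of the original system.

```python
def parse_pattern(line):
    """Split '<order|prefix|entity1|middle|entity2|suffix>' into named fields."""
    body = line.strip()
    if body.startswith("<") and body.endswith(">"):
        body = body[1:-1]
    fields = [f.strip() for f in body.split("|")]
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    order, prefix, first, middle, second, suffix = fields
    gene_first = order.lower() == "true"
    # Assumed reading of the order flag: true = gene first, false = disease first.
    gene, disease = (first, second) if gene_first else (second, first)
    return {
        "gene_first": gene_first,
        "gene": gene,
        "disease": disease,
        "prefix": prefix,
        "middle": middle,
        "suffix": suffix,
    }


example = parse_pattern(
    "<false| developing ovarian|cancer| borderline tumours in presence of|BRCA1| mutations>"
)
print(example["gene"], example["disease"])
```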
  • 5. 30 Relations Obtained – 4% of abstracts
  • 6. Statistics
    • 4% of abstracts → 30 pair results
    • All 20,000 abstracts → ≈ 750 pair results
    • 30 results → 1 error pair
    • Error ≈ 1/30 = 3.33%
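The arithmetic behind these figures can be reproduced as below. The extrapolation to the full corpus assumes that relations are spread evenly across abstracts; the variable names are illustrative.

```python
abstracts_total = 20_000   # size of the corpus, from the slides
sample_percent = 4         # share of abstracts processed so far
pairs_in_sample = 30       # (gene, disease) pairs extracted from the sample
errors_in_sample = 1       # incorrect pairs among those 30

# Number of abstracts actually processed.
abstracts_sampled = abstracts_total * sample_percent // 100

# Linear extrapolation: same pairs-per-abstract rate over the whole corpus.
estimated_pairs = pairs_in_sample * 100 / sample_percent

# Share of extracted pairs that were wrong, as a percentage.
error_percent = 100 * errors_in_sample / pairs_in_sample

print(abstracts_sampled, estimated_pairs, round(error_percent, 2))
```

With only one error observed in 30 pairs, the 3.33% figure is a rough estimate rather than a stable rate.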
  • 7. Advantages
    • Generic, domain-independent method for extracting relations
    • Returns the patterns in which the entities occur; these can be used for pattern analysis
    • Results improve over iterative steps
    • More efficient than the traditional ontology-based approach
  • 8. Extraction of Relations from Unstructured Text
    • Thank You
    • Any questions?
    • Jamsandekar Mugdha
    • Prabhu Priyanka
    • Sugandh Neha