Automatic extraction of genes and diseases
 


    Presentation Transcript

    • Extraction of Relations from Unstructured Text (Jamsandekar Mugdha, Prabhu Priyanka, Sugandh Neha)
    • Introduction
      • Goal - to automatically extract groups of entities of a specific relation from unstructured text.
      • HuGE – Human Genome Epidemiology, a medical database that contains about 20,000 abstracts
      • (gene, disease) relations
    • Approach
      • (gene, disease) seed values
      • MMTx (MetaMap Transfer) – maps text to medical ontology (UMLS Metathesaurus) concepts
      • Pattern <order, prefix, gene/disease, middle, disease/gene, suffix>
      • Weights for prefix, middle and suffix; stop words
      • Match decided by a threshold
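
The approach above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the slides name weights, stop words and a threshold but give no values, so the `WEIGHTS`, `STOP_WORDS`, the default `threshold`, and the use of Jaccard overlap as the similarity measure are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical stop-word list; the slides do not say which words were used.
STOP_WORDS = {"of", "the", "a", "an", "in", "with", "is", "are", "to"}

# Hypothetical weights; the slides mention weights but give no values.
WEIGHTS = {"prefix": 0.2, "middle": 0.6, "suffix": 0.2}


@dataclass(frozen=True)
class Pattern:
    gene_first: bool  # the "order" field: True if the gene precedes the disease
    prefix: str
    first: str        # gene if gene_first, else disease
    middle: str
    second: str       # disease if gene_first, else gene
    suffix: str


def tokens(text: str) -> set[str]:
    """Lower-cased word set with stop words removed."""
    return {w for w in text.lower().split() if w not in STOP_WORDS}


def overlap(a: str, b: str) -> float:
    """Jaccard overlap of two context strings (an assumed similarity measure)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def match_score(p: Pattern, q: Pattern) -> float:
    """Weighted similarity of the three contexts; 0.0 if the entity order differs."""
    if p.gene_first != q.gene_first:
        return 0.0
    return (WEIGHTS["prefix"] * overlap(p.prefix, q.prefix)
            + WEIGHTS["middle"] * overlap(p.middle, q.middle)
            + WEIGHTS["suffix"] * overlap(p.suffix, q.suffix))


def is_match(p: Pattern, q: Pattern, threshold: float = 0.5) -> bool:
    """A candidate matches a known pattern when its score reaches the threshold."""
    return match_score(p, q) >= threshold


seed = Pattern(True, "polymorphisms of", "LTA gene",
               "are associated with risk of", "MI", "in Japanese")
candidate = Pattern(True, "allele of", "angiotensin converting enzyme (ACE) gene",
                    "is associated with", "hypoxemia", "in sars p")
print(match_score(seed, candidate), is_match(seed, candidate, threshold=0.25))
```

With these assumed weights, most of the score comes from the middle context, which is where phrases such as "associated with" tend to appear in the example patterns.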
    • Patterns
      • <true| polymorphisms of|LTA gene| are associated with risk of|MI| in Japanese>
      • <true| allele of|angiotensin converting enzyme (ACE) gene| is associated with|hypoxemia| in sars p>
      • <false| developing ovarian|cancer| borderline tumours in presence of|BRCA1| mutations>
      • <true| apolipoprotein|epsilon4 allele| with progression in|AD| pkd may not>
      • <true| mutations in|BRCA1| brca2 genes explain at least 10 % of breast|cancer| cases diagnosed>
      • <true| mutation of|BRCA1| contributes little occurrence of breast|cancer| in taiwanese>
      • <true| between|factor V Leiden gene variant| carotid|atherosclerosis| in cross - sectional>
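
For readers who want to work with pattern strings like those listed above, here is a small parsing sketch. It assumes the six pipe-separated fields follow the order given on the Approach slide, and that `true` in the first field means the gene comes before the disease; that reading is inferred from the examples, not stated in the slides. The function name `parse_pattern` is hypothetical.

```python
def parse_pattern(line: str) -> dict:
    """Split '<order| prefix|e1| middle|e2| suffix>' into named fields."""
    body = line.strip()
    if not (body.startswith("<") and body.endswith(">")):
        raise ValueError("pattern must be wrapped in < >")
    fields = [f.strip() for f in body[1:-1].split("|")]
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    order, prefix, first, middle, second, suffix = fields
    # Assumption: order == "true" means the gene is the first entity.
    gene_first = order == "true"
    gene, disease = (first, second) if gene_first else (second, first)
    return {
        "gene_first": gene_first,
        "prefix": prefix,
        "gene": gene,
        "middle": middle,
        "disease": disease,
        "suffix": suffix,
    }


examples = [
    "<true| polymorphisms of|LTA gene| are associated with risk of|MI| in Japanese>",
    "<false| developing ovarian|cancer| borderline tumours in presence of|BRCA1| mutations>",
]
for ex in examples:
    parsed = parse_pattern(ex)
    print(parsed["gene"], "<->", parsed["disease"])
```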
    • 30 Relations Obtained – 4% of abstracts
    • Statistics
      • 4% of abstracts → 30 pair results
      • All 20,000 abstracts → ≈ 750 pair results (extrapolated)
      • 30 results → 1 error pair
      • Error rate ≈ 1/30 ≈ 3.33%
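
The arithmetic behind these figures can be checked with a few lines. The inputs are the numbers from the slides; the projection to the full corpus assumes the remaining abstracts yield pairs at the same rate as the processed 4%, which the slides do not verify.

```python
total_abstracts = 20_000   # approximate corpus size from the slides
sample_percent = 4         # share of abstracts processed
pairs_in_sample = 30       # (gene, disease) pairs extracted
errors_in_sample = 1       # incorrect pairs among them

# Number of abstracts actually processed.
sample_size = total_abstracts * sample_percent // 100

# Linear extrapolation to the whole corpus (assumes a uniform yield).
projected_pairs = pairs_in_sample * 100 / sample_percent

# Share of extracted pairs that were wrong.
error_rate = errors_in_sample / pairs_in_sample

print(f"abstracts processed: {sample_size}")
print(f"projected pairs:     {projected_pairs:.0f}")
print(f"error rate:          {error_rate:.2%}")
```

With only 30 results and a single error, the 3.33% figure is a rough estimate.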
    • Advantages
      • Generic, domain-independent method for extracting relations
      • Gives the patterns in which the entities occur; these can be used in pattern analysis
      • Results improve over iterative steps
      • More efficient than the traditional ontology-based approach
    • Extraction of Relations from Unstructured Text
      • Thank You
      • Any questions?
      • Jamsandekar Mugdha
      • Prabhu Priyanka
      • Sugandh Neha