Analysing Entity Type Variation across Biomedical Subdomains

Usage Rights: CC Attribution-NonCommercial-ShareAlike License

Presentation Transcript

  • Analysing Entity Type Variation across Biomedical Subdomains
    Claudiu Mihăilă, Riza Theresa Batista-Navarro, Sophia Ananiadou
    Claudiu Mihăilă, National Centre for Text Mining, School of Computer Science, University of Manchester
    26 May 2012
  • BioTxtM 2012: Introduction
    • Named entities
      o Atomic elements, classified into various categories (protein, gene, disease, treatment, metabolite etc.)
    Figure: example sentence annotated with entity and event labels (Pro, Organism, Theme, Transcription, +Reg): "In contrast to the phenotype of the pta ackA double mutant, pbgP transcription was reduced in the pmrD mutant."
  • Introduction
    • Corpora
  • Methodology
    • Full-text open-access journal articles from UKPMC
    • 20 subdomains, 400 single broad-subject-termed articles
      Allergy & Immunology, Biology, Cell Biology, Communicable Diseases, Critical Care,
      Environmental Health, Genetics, Health Services Research, Medical Informatics, Medicine,
      Microbiology, Neoplasms, Neurology, Pharmacology, Physiology,
      Public Health, Pulmonary Medicine, Rheumatology, Tropical Medicine, Virology
  • Methodology
    • NE source: A_Silver = A_UKPMC ∪ A_Oscar ∪ A_NeMine
    Figure: Corpus (the 20 subdomains) and Annotation (UKPMC, OSCAR, NeMine).
  • Methodology
    Silver annotation: entity types by source
    • NeMine: Gene, Protein, Disease, Drug, Metabolite, Bacteria, Diagnostic process, General phenomenon, Indicator, Natural phenomenon, Organ, Pathologic function, Symptom, Therapeutic process
    • UKPMC: Gene, Protein, Disease, Drug, Metabolite, Gene|Protein
    • OSCAR: Chemical molecule, Chemical adjective, Enzyme, Reaction
  • Methodology
    • Feature vectors (document d)
      Entity type          Count    Share
      Enzyme                   2    0.45%
      Chemical molecule       71   14.85%
      Disease                  8    1.67%
      Drug                    12    2.51%
      Gene                    15    3.13%
      Gene|Protein           155   32.43%
      Metabolite               3    0.62%
      Protein                188   39.33%
      Reaction                24    5.02%
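
The feature vector on this slide appears to turn raw counts of each named-entity type into that type's share of all entities in the document. The following is a minimal sketch of that normalisation, assuming the share is simply count divided by the document total; the function name `to_relative_frequencies` is hypothetical and not taken from the presentation.

```python
from typing import Dict


def to_relative_frequencies(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw entity-type counts into percentages of the document total."""
    total = sum(counts.values())
    if total == 0:
        # A document with no annotated entities gets an all-zero vector.
        return {ne_type: 0.0 for ne_type in counts}
    return {ne_type: 100.0 * n / total for ne_type, n in counts.items()}


# Counts for the example document d, copied from the slide.
doc_d = {
    "Enzyme": 2,
    "Chemical molecule": 71,
    "Disease": 8,
    "Drug": 12,
    "Gene": 15,
    "Gene|Protein": 155,
    "Metabolite": 3,
    "Protein": 188,
    "Reaction": 24,
}

features = to_relative_frequencies(doc_d)
for ne_type, share in features.items():
    print(f"{ne_type:<18} {share:6.2f}%")
```

Normalising by the document total makes long and short articles comparable, since only the mix of entity types matters and not the article length.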
  • Methodology
  • Methodology
  • Methodology
    • Chi-squared statistics
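
The transcript does not preserve the formula used on this slide. As an illustration only, the sketch below computes Pearson's chi-squared statistic on a 2 x k contingency table of entity-type counts for a pair of subdomains, returning one contribution per entity type. The function name, the choice of Pearson's form, and the toy counts are all assumptions, not the authors' setup.

```python
from typing import List


def chi_squared_per_feature(counts_a: List[float], counts_b: List[float]) -> List[float]:
    """Per-feature Pearson chi-squared contributions for a 2 x k contingency table.

    counts_a and counts_b hold the counts of each entity type in two subdomains.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both subdomains need the same feature set")
    total_a, total_b = sum(counts_a), sum(counts_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each subdomain needs at least one entity")
    grand_total = total_a + total_b

    contributions = []
    for a, b in zip(counts_a, counts_b):
        column_total = a + b
        if column_total == 0:
            # Feature absent from both subdomains: no evidence either way.
            contributions.append(0.0)
            continue
        expected_a = total_a * column_total / grand_total
        expected_b = total_b * column_total / grand_total
        contributions.append(
            (a - expected_a) ** 2 / expected_a + (b - expected_b) ** 2 / expected_b
        )
    return contributions


# Toy counts for two hypothetical subdomains over two entity types.
per_feature = chi_squared_per_feature([10, 20], [20, 10])
print(per_feature, sum(per_feature))
```

A large contribution for an entity type indicates that its frequency differs strongly between the two subdomains; identical distributions give zero for every type.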
  • Methodology
    • Frobenius norm: 1247.0725
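
The slide reports a single Frobenius norm value, but the transcript does not preserve the matrix it was computed from. As a reminder of the definition, the Frobenius norm is the square root of the sum of squared entries. The sketch below uses a made-up matrix and a hypothetical helper name.

```python
import math
from typing import Sequence


def frobenius_norm(matrix: Sequence[Sequence[float]]) -> float:
    """Square root of the sum of the squared entries of a matrix."""
    return math.sqrt(sum(x * x for row in matrix for x in row))


# Made-up 2 x 2 matrix: 1 + 4 + 4 + 16 = 25, so the norm is 5.
example = [[1.0, 2.0], [2.0, 4.0]]
print(frobenius_norm(example))
```

For a matrix with a single row, this reduces to the ordinary Euclidean length of that vector.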
  • Feature evaluation
    • Good features for
      o Cell Biology
      o Pharmacology
      o Health Sciences
      o Public Health
    • Not-so-good features for
      o Medical Informatics
      o Medicine
      o Microbiology
      o Neoplasms
      o Neurology
    Figure: Frobenius norm of 2 vectors for each pair.
  • Feature evaluation
    • Mean chi-squared for every feature over all pairs
  • Classifier selection
      Classifier                  Top result count   Share
      J48                                0            0%
      JRip                               4            2.10%
      Logistic                           2            1.05%
      Random Tree                        0            0%
      Random Forest                     86           45.26%
      SMO                                0            0%
      AdaBoost + J48                     6            3.15%
      AdaBoost + JRip                    7            3.68%
      AdaBoost + Decision Stump         16            8.42%
      AdaBoost + Logistic                0            0%
      AdaBoost + Random Tree             0            0%
      AdaBoost + Random Forest          68           35.78%
      AdaBoost + SMO                     1            0.526%
    Figure: Random Forest F-score for each pair.
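
The top-result counts on this slide sum to 190, which matches the number of unordered pairs that can be formed from 20 subdomains. The sketch below checks that arithmetic and recomputes each classifier's share, assuming (this is an inference from the totals, not something the slide states) that exactly one classifier is counted as the top result for each pair. The dictionary keys are illustrative labels.

```python
from math import comb

n_subdomains = 20
n_pairs = comb(n_subdomains, 2)  # unordered pairs of subdomains

# Top-result counts copied from the slide; keys are (setting, classifier).
top_counts = {
    ("plain", "J48"): 0,
    ("plain", "JRip"): 4,
    ("plain", "Logistic"): 2,
    ("plain", "Random Tree"): 0,
    ("plain", "Random Forest"): 86,
    ("plain", "SMO"): 0,
    ("AdaBoost", "J48"): 6,
    ("AdaBoost", "JRip"): 7,
    ("AdaBoost", "Decision Stump"): 16,
    ("AdaBoost", "Logistic"): 0,
    ("AdaBoost", "Random Tree"): 0,
    ("AdaBoost", "Random Forest"): 68,
    ("AdaBoost", "SMO"): 1,
}

# Share of subdomain pairs on which each classifier gave the top result.
shares = {key: 100.0 * count / n_pairs for key, count in top_counts.items()}

for (setting, name), share in shares.items():
    print(f"{setting:<9} {name:<15} {share:6.2f}%")
```

Read this way, Random Forest, with or without AdaBoost, gives the top result on 154 of the 190 pairs.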
  • Classifier evaluation
    • Dissimilar subdomains
      o Cell Biology
      o Pharmacology
      o Health Sciences
      o Public Health
    • Similar subdomains
      o Medical Informatics
      o Medicine
      o Microbiology
      o Neoplasms
      o Neurology
    Figure: Random Forest F-score for each pair.
  • Conclusions
    • To remember
      o Significant semantic variation of biomedical sublanguages
      o Distinguishable bio-subdomains using only NE types
      o Caution needed when adapting NLP tools to subdomains
    • To do
      o Extension to bio-events
      o Combination with lexical, syntactic and discourse features
      o Extension to other domains
  • Thank you!
    http://misteringo.deviantart.com/art/Bunnies-Scream-Again-79745974