
Roeder rocky 2011_46

Conference Talk: A Distributed Framework for Computation on the Results of Large Scale NLP

  1. A Distributed Framework for Computation on the Results of Large Scale NLP
     Christophe Roeder, William A. Baumgartner Jr., Kevin Livingston, Lawrence E. Hunter
     (University of Colorado Anschutz Medical Campus)
     Chris.Roeder@ucdenver.edu
     http://compbio.ucdenver.edu
  2. Motivation
     • A vast amount of information is available in journal articles
     • Journal articles are unstructured text
     • Many applications require structured knowledge
       – Curated ontologies (Gene Ontology)
       – Databases (UniProt, EntrezGene)
     • Challenge: extract structured knowledge from unstructured text and integrate it with existing knowledge, at massive scale
  3. Architecture
     [Diagram] Components shown: journal articles (unstructured); scaled NLP pipeline; RDF documents (structured); Sesame/Hadoop; queries; knowledge base (ontologies, databases); distilled NLP output (structured information); applications (visualization, NLP, …).
  4. Example Application
     • Concept annotation trends over time
       [Plot: annotation trends for Insulin and NOS1]
       http://tinyurl.com/bio-trends
  5. Summary
     • NLP pipelines extract structured annotations
     • Our framework provides massively parallel access to these structured document annotations
     • The structured representation is integrated with a knowledge base
     • The framework affords parallelization when possible, and access to the knowledge base when necessary
     • It integrates unstructured document text with structured knowledge, enabling applications such as:
       – Visualization (BioJigsaw, Hanalyzer, …)
       – Natural Language Understanding (OpenDMAP)
       – Leveraging text data for validation and evaluation of other methods
  6. Thank You / Questions
     • http://tinyurl.com/bio-trends
     • Co-authors
       – William A. Baumgartner Jr. for data generation
       – Kevin Livingston for RDF and Clojure help
     • Grants and PIs
       – Lawrence E. Hunter, UCDenver SOM
         • NIH 2R01LM009254-04, NIH 2R01LM008111-04A1, NIH 5R01GM083649-02
       – Karin Verspoor, UCDenver SOM
         • NIH R01 LM010120-01
       – Gully Burns, ISI
         • NSF 0849977
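The architecture and example-application slides name the pieces (an NLP pipeline emitting RDF documents, Sesame/Hadoop, a knowledge base) and show a concept-trend plot, but not how a trend is computed. The following is a minimal sketch, assuming a map/reduce style like the one Hadoop supports: each document is mapped independently to (concept, year) counts, partial counts are merged, and the result is normalized per year. The record layout, the helper names `map_doc`, `reduce_counts`, and `concept_trend`, the per-year normalization, and the toy concept IDs are all hypothetical, not the authors' code; the built-in `map` merely stands in for distributing work over a cluster.

```python
from collections import Counter
from functools import reduce

# Hypothetical record for one RDF-derived document:
# (document_id, publication_year, set of concept IDs annotated by the NLP pipeline)
Doc = tuple[str, int, frozenset[str]]


def map_doc(doc: Doc) -> Counter:
    """Map step: inspects a single document, so it can run in parallel.
    Emits one count per (concept, year) key, plus a per-year document
    total stored under the key (None, year)."""
    _doc_id, year, concepts = doc
    counts = Counter({(c, year): 1 for c in concepts})
    counts[(None, year)] = 1
    return counts


def reduce_counts(a: Counter, b: Counter) -> Counter:
    """Reduce step: merges partial counts; the merge order does not matter."""
    return a + b


def concept_trend(docs: list[Doc], concept: str) -> dict[int, float]:
    """Fraction of documents per year that are annotated with `concept`."""
    # Built-in map() stands in for distributing map_doc across worker nodes.
    totals = reduce(reduce_counts, map(map_doc, docs), Counter())
    years = sorted({year for (_c, year) in totals})
    return {y: totals[(concept, y)] / totals[(None, y)] for y in years}
```

With four toy documents, two from 2009 (one annotated with `CONCEPT_A`) and two from 2010 (both annotated with it), `concept_trend` returns `{2009: 0.5, 2010: 1.0}`. Dividing by documents per year is one design choice among several; it keeps overall growth of the literature from looking like a rising trend for every concept.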

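The summary slide's point, parallelize per-document work when possible and consult the knowledge base only when necessary, can be illustrated as a two-step query: resolve an ontology term to concepts once against the knowledge base, then filter documents independently of one another. This is a hedged sketch only; the dict-based knowledge base, the function `documents_for_term`, and the identifiers `GENE_1` and `TERM_X` are hypothetical stand-ins for the ontology and database sources the slides mention, not part of the described framework.

```python
# Hypothetical knowledge base: concept ID -> ontology terms linked to that concept
KB = dict[str, set[str]]
# Hypothetical NLP output: document ID -> concept IDs annotated in that document
DocAnnotations = dict[str, set[str]]


def documents_for_term(annotations: DocAnnotations, kb: KB, term: str) -> list[str]:
    """Return (sorted) IDs of documents mentioning any concept linked to `term`."""
    # Knowledge-base step: done once, centrally, before touching documents.
    concepts = {c for c, terms in kb.items() if term in terms}
    # Per-document step: each test is independent, so it could be parallelized.
    return sorted(doc for doc, found in annotations.items() if found & concepts)
```

For example, if the knowledge base links `TERM_X` to `GENE_1` and `GENE_3`, only documents annotated with one of those two concepts are returned. Keeping the knowledge-base lookup outside the per-document filter means the filter needs no shared state and can be distributed without each worker querying the knowledge base.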