MEBI 591C/598 – Text Mining/NLP
                   Subproblems
                  Meliha Yetisgen-Yildiz
From last week’s discussion
Presentation
   Schedule:
    http://faculty.washington.edu/melihay/MEBI591C.htm
       50 minutes presentation+discussion+question answering
   Content:
           Research/Project Idea
               Motivation + Problem + Potential Solution
           Survey or literature review
               A general area
                   Text mining: named entity recognition - gene name identification
                   Data Mining: classification, clustering
               Available resources for a given area
                   Open source libraries
                   Data resources
           Paper
               Conference or journal article
       Preparation:
           Email the plan + reading list at least 3 days prior to class
           GoMap Discussion List
System Design
   Team:
        Marcin, Wynona, Karl, Stella, Francisco, Jeffry, Safiyyah
         (not registered)
   Example data released:
        https://www.i2b2.org/NLP/Relations/Documentation.php
   The fourth i2b2 challenge is a three-tiered challenge
    that studies:
    1.    extraction of medical problems, tests, and treatments
    2.    classification of assertions made on medical problems
    3.    relations of medical problems, tests, and treatments
2010 - i2b2 Challenge
   Important Dates:
       March 5th – Registration opens
       April 15th – Commitment to Participate in Challenge &
        Training Data Release
       July 15th – Test Data Release
       September 1st – Short papers due
       October 1st – Invitations to present at the Workshop
       November, 2010 – Workshop
   Preparations
       Linux server + accounts (meliha)
             Accounts
             Dev environment
             Subversion ?
Text Mining/NLP Sub-problems – Part 1
   Sentence Delimiters
   Tokenizers
   Part-of-Speech Tags
   Collocations
Sentence Delimiters
   Document -> Paragraph -> Sentences
   Sentence boundary disambiguation (SBD) is the problem in
    NLP of deciding where sentences begin and end.
   Sentence boundary identification is challenging because
    punctuation marks are often ambiguous.
       A period may denote
           Abbreviation
           Decimal point
           Email address
           About 47% of the periods in the Wall Street Journal corpus denote
            abbreviations.
       Question marks and exclamation marks may appear in
           embedded quotations, emoticons, computer code, and slang
   Tools:
       OpenNLP has a class for sentence detection
       NaCTeM: http://text0.mib.man.ac.uk:8080/scottpiao/sent_detector
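
The ambiguity of the period can be illustrated with a small rule-based splitter. This is only a minimal sketch, not how OpenNLP or the NaCTeM detector work: the abbreviation list and the function name `split_sentences` are assumptions made for the example. A period counts as a boundary only when it is followed by whitespace and an uppercase letter and the preceding word is not a known abbreviation, so decimal points and email addresses are left alone.

```python
import re

# Hypothetical abbreviation list for illustration; a real system needs a larger lexicon.
ABBREVIATIONS = {"dr", "mr", "mrs", "ms", "vs", "e.g", "i.e", "etc"}

def split_sentences(text):
    """Split at '.', '?' or '!' when followed by whitespace and an uppercase letter,
    unless the word before a period is a known abbreviation."""
    sentences = []
    start = 0
    # Periods inside "2.5" or "j.doe@example.org" are not followed by whitespace,
    # so they never become candidate boundaries.
    for match in re.finditer(r"[.?!](?=\s+[A-Z])", text):
        end = match.end()
        if match.group() == ".":
            words = text[start:end - 1].split()
            last = words[-1] if words else ""
            if last.lower() in ABBREVIATIONS:
                continue  # e.g. "Dr." does not end a sentence
        sentences.append(text[start:end].strip())
        start = end
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences

print(split_sentences(
    "Dr. Smith ordered a CBC. The dose was 2.5 mg. Email j.doe@example.org today."
))
```

A fixed rule like this still fails on cases such as an abbreviation that really does end a sentence, which is one reason trained sentence detectors are used in practice.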
Tokenization
   Document -> Paragraph -> Sentence -> Tokens
   Based on white-space characters
       In Unicode (Unicode Character Database) the following
        codepoints are defined as whitespace:
           U+0009–U+000D (control characters, containing Tab, CR and LF)
           U+0020 SPACE
           U+0085 NEL (control character next line)
           U+00A0 NBSP (NO-BREAK SPACE)
           U+1680 OGHAM SPACE MARK
           U+180E MONGOLIAN VOWEL SEPARATOR (no longer classified as whitespace since Unicode 6.3)
           U+2000–U+200A (different sorts of spaces)
           U+2028 LS (LINE SEPARATOR)
           U+2029 PS (PARAGRAPH SEPARATOR)
           U+202F NNBSP (NARROW NO-BREAK SPACE)
           U+205F MMSP (MEDIUM MATHEMATICAL SPACE)
           U+3000 IDEOGRAPHIC SPACE
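
A whitespace tokenizer over exactly the code points listed above can be sketched as follows. The names `SLIDE_WHITESPACE` and `tokenize` are illustrative assumptions, and the character class simply copies the list on this slide; newer Unicode versions may classify some of these code points differently.

```python
import re

# Whitespace code points exactly as listed on the slide (an assumption of this sketch).
SLIDE_WHITESPACE = (
    "\u0009-\u000D"  # control characters, including Tab, LF and CR
    "\u0020"         # SPACE
    "\u0085"         # NEL
    "\u00A0"         # NO-BREAK SPACE
    "\u1680"         # OGHAM SPACE MARK
    "\u180E"         # MONGOLIAN VOWEL SEPARATOR
    "\u2000-\u200A"  # different sorts of spaces
    "\u2028"         # LINE SEPARATOR
    "\u2029"         # PARAGRAPH SEPARATOR
    "\u202F"         # NARROW NO-BREAK SPACE
    "\u205F"         # MEDIUM MATHEMATICAL SPACE
    "\u3000"         # IDEOGRAPHIC SPACE
)
WHITESPACE_RUN = re.compile("[" + SLIDE_WHITESPACE + "]+")

def tokenize(sentence):
    """Split a sentence on runs of whitespace and drop empty strings."""
    return [tok for tok in WHITESPACE_RUN.split(sentence) if tok]

print(tokenize("Blood\u00a0pressure\tis  120/80\u3000mmHg\n"))
print(tokenize("No fever."))
```

Note that purely whitespace-based tokenization leaves punctuation attached to the neighboring word ("fever."), so a practical tokenizer usually adds punctuation handling on top.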
Part-of-Speech Tagging

    “The process of assigning a part-of-speech or
    other lexical class marker to each word in a
    corpus” (Jurafsky and Martin)

           WORDS               TAGS
              the                 DET
              girl                N
              kissed              V
              the                 DET
              boy                 N
              on                  P
              the                 DET
              cheek               N
Penn Treebank POS Tags
1. CC Coordinating conjunction                   19. PRP$ Possessive pronoun
2. CD Cardinal number                            20. RB Adverb
3. DT Determiner                                 21. RBR Adverb, comparative
4. EX Existential there                          22. RBS Adverb, superlative
5. FW Foreign word                               23. RP Particle
6. IN Preposition or subordinating conjunction   24. SYM Symbol
7. JJ Adjective                                  25. TO to
8. JJR Adjective, comparative                    26. UH Interjection
9. JJS Adjective, superlative                    27. VB Verb, base form
10. LS List item marker                          28. VBD Verb, past tense
11. MD Modal                                     29. VBG Verb, gerund or present participle
12. NN Noun, singular or mass                    30. VBN Verb, past participle
13. NNS Noun, plural                             31. VBP Verb, non-3rd person singular present
14. NNP Proper noun, singular                    32. VBZ Verb, 3rd person singular present
15. NNPS Proper noun, plural                     33. WDT Wh-determiner
16. PDT Predeterminer                            34. WP Wh-pronoun
17. POS Possessive ending                        35. WP$ Possessive wh-pronoun
18. PRP Personal pronoun                         36. WRB Wh-adverb
Applications of Tagging
   Partial parsing: syntactic analysis
   Information Extraction: tagging and partial parsing help
    identify useful terms and relationships between them.
   Information Retrieval: noun phrase recognition and
    query-document matching based on meaningful units
    rather than individual terms.
   Question Answering: analyzing a query to understand
    what type of entity the user is looking for and how it is
    related to other noun phrases mentioned in the question.
Information Sources in Tagging
   How do we decide the correct POS for a word?
       Syntagmatic Information: Look at tags of other words in
        the context of the word we are interested in.
       Lexical Information: Predict a tag from the word itself.
        Words that can take several POS usually occur as one
        particular POS.
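
Lexical information alone already gives a simple baseline: tag every word with the tag it carried most often in training data. The sketch below assumes a tiny invented corpus (`TAGGED`) and hypothetical function names; it is meant to illustrate the idea, not any particular tagger.

```python
from collections import Counter, defaultdict

# Tiny hand-made tagged corpus, invented for illustration only (Penn Treebank tags).
TAGGED = [
    ("the", "DT"), ("patient", "NN"), ("reports", "VBZ"), ("pain", "NN"),
    ("the", "DT"), ("reports", "NNS"), ("show", "VBP"), ("pain", "NN"),
    ("the", "DT"), ("lab", "NN"), ("reports", "NNS"), ("arrived", "VBD"),
]

def train_lexical_tagger(tagged_words):
    """Map each word to the tag it received most often in the training data."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag(words, model, default="NN"):
    """Tag each word with its most frequent tag; unseen words get the default."""
    return [(w, model.get(w.lower(), default)) for w in words]

model = train_lexical_tagger(TAGGED)
print(tag(["The", "reports", "arrived"], model))
```

Here "reports" is tagged NNS because that reading is more frequent in the toy data, even in contexts where the verb reading would be correct; resolving such cases is what syntagmatic information is for.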
POS Approaches – Rule-Based
•   Basic Idea:
    –   Assign all possible tags to words
    –   Remove tags according to set of rules of type: if word+1 is an
        adj, adv, or quantifier and the following is a sentence boundary
        and word-1 is not a verb like “consider” then eliminate non-adv
        else eliminate adv.
    –   Typically more than 1000 hand-written rules, but may be
        machine-learned.
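
The two steps above (assign all possible tags, then eliminate by rule) can be sketched for the single example rule. Everything here is an assumption made for illustration: the toy lexicon, the coarse tag names, the list of "consider"-like verbs, and the simplification that the sentence boundary is the end of the word list.

```python
# Toy lexicon of possible tags per word; entries are invented for illustration.
LEXICON = {
    "it": {"PRON"},
    "i": {"PRON"},
    "is": {"V"},
    "not": {"ADV"},
    "that": {"ADV", "DET", "PRON", "CONJ"},
    "odd": {"ADJ"},
    "consider": {"V"},
}
CONSIDER_LIKE_VERBS = {"consider", "believe"}  # hypothetical list

def tag_with_rule(words):
    """Return the set of tags remaining for each word after applying the rule."""
    # Step 1: assign all possible tags to every word (unknown words default to N).
    candidates = [set(LEXICON.get(w.lower(), {"N"})) for w in words]
    # Step 2: apply the elimination rule to words that are ambiguous and may be ADV.
    for i in range(len(words)):
        if "ADV" not in candidates[i] or len(candidates[i]) == 1:
            continue
        next_tags = candidates[i + 1] if i + 1 < len(words) else set()
        next_is_modifier = bool(next_tags & {"ADJ", "ADV", "QUANT"})
        boundary_follows = (i + 2 == len(words))  # end of list stands in for the boundary
        prev_ok = i == 0 or words[i - 1].lower() not in CONSIDER_LIKE_VERBS
        if next_is_modifier and boundary_follows and prev_ok:
            candidates[i] = {"ADV"}        # eliminate non-adv
        else:
            candidates[i] -= {"ADV"}       # eliminate adv
    return candidates

print(tag_with_rule(["It", "is", "not", "that", "odd"]))
print(tag_with_rule(["I", "consider", "that", "odd"]))
```

In the first sentence "that" is reduced to ADV; in the second the preceding verb "consider" blocks the rule, so the ADV reading is removed and the remaining ambiguity is left for other rules.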
POS Approaches – Machine Learning
 •   Based on probability of certain tag occurring given
     various possibilities
 •   Requires a training corpus
 •   Training corpus may be different from test corpus.
 •   Examples
     •   Hidden Markov Model Taggers
     •   Transformation Based Taggers
     •   Maximum Entropy Taggers

     Ling572 (Advanced Statistical Methods in NLP) -
        http://courses.washington.edu/ling572/winter10/teaching_slides/new_syllabus.htm
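
As a rough illustration of the first example, a bigram Hidden Markov Model tagger can be written in a few lines. This is a sketch under stated assumptions: the three training sentences are invented, add-one smoothing is a simplification chosen for brevity, and the function names are hypothetical. Decoding uses the Viterbi algorithm.

```python
import math
from collections import Counter, defaultdict

# Tiny invented training corpus; "saw" occurs once as a verb and once as a noun.
TRAIN = [
    [("the", "DET"), ("girl", "N"), ("kissed", "V"), ("the", "DET"), ("boy", "N")],
    [("the", "DET"), ("boy", "N"), ("saw", "V"), ("the", "DET"), ("girl", "N")],
    [("the", "DET"), ("saw", "N"), ("cut", "V"), ("the", "DET"), ("cheek", "N")],
]

def train_hmm(sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    vocab = set()
    for sent in sentences:
        prev = "<s>"
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, vocab

def viterbi(words, trans, emit, vocab):
    """Return the most probable tag sequence under add-one smoothed estimates."""
    tags = list(emit)
    vocab_size = len(vocab) + 1  # one extra slot for unseen words

    def log_t(prev, tag):
        return math.log((trans[prev][tag] + 1) / (sum(trans[prev].values()) + len(tags)))

    def log_e(tag, word):
        return math.log((emit[tag][word] + 1) / (sum(emit[tag].values()) + vocab_size))

    # best[tag] = (log probability, tag path) of the best path ending in tag
    best = {t: (log_t("<s>", t) + log_e(t, words[0]), [t]) for t in tags}
    for word in words[1:]:
        new_best = {}
        for t in tags:
            score, path = max(
                ((best[p][0] + log_t(p, t), best[p][1]) for p in tags),
                key=lambda item: item[0],
            )
            new_best[t] = (score + log_e(t, word), path + [t])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]

trans, emit, vocab = train_hmm(TRAIN)
print(viterbi("the boy saw the girl".split(), trans, emit, vocab))
print(viterbi("the saw kissed the girl".split(), trans, emit, vocab))
```

The same word "saw" receives V in the first sentence and N in the second, because the transition probabilities (the tags of neighboring words) outweigh the equally likely emissions.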
Tagging Accuracy
   Ranges from 95% to 97%
   Depends on:
       Amount of training data available.
       Difference between training corpus and dictionary and
        the corpus of application.
       Unknown words in the corpus of application.
Tagging Unknown Words

 •   New words are added to (newspaper) language at a rate of
     20+ per month
 •   Plus many proper names …
 •   Increases error rates by 1-2%

 •   Method 1: assume they are nouns
 •   Method 2: assume the unknown words have a
     probability distribution similar to words only occurring
     once in the training set.
 •   Method 3: Use morphological information, e.g., words
     ending with –ed tend to be tagged VBN.
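
Methods 1 and 3 can be combined in a small fallback function. This is a sketch only: the suffix table, the capitalization check for proper names, and the name `guess_tag` are assumptions for illustration and are not taken from any specific tagger.

```python
# Hypothetical suffix-to-tag heuristics (Penn Treebank tags), checked in order.
SUFFIX_RULES = [
    ("ing", "VBG"),
    ("ed", "VBN"),
    ("ly", "RB"),
    ("s", "NNS"),
]

def guess_tag(word, known_tags):
    """Return the known tag, or guess one for an unknown word."""
    if word in known_tags:
        return known_tags[word]
    # Capitalized unknown words are treated as proper names
    # (this misfires on sentence-initial words).
    if word[:1].isupper():
        return "NNP"
    # Method 3: morphological information from the word ending.
    for suffix, tag in SUFFIX_RULES:
        if word.lower().endswith(suffix):
            return tag
    # Method 1: otherwise assume a noun.
    return "NN"

known = {"the": "DT", "patient": "NN"}
for w in ["the", "intubated", "Zolpidem", "stents", "stent"]:
    print(w, guess_tag(w, known))
```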
POS Taggers
Freely downloadable Part of Speech Taggers
 Stanford POS tagger Loglinear tagger in Java (by Kristina Toutanova)
 hunpos An HMM tagger with models available for English and Hungarian. A reimplementation of
   TnT (see below) in OCaml. pre-compiled models. Runs on Linux, Mac OS X, and Windows.
 MBT: Memory-based Tagger Based on TiMBL.
 TreeTagger A decision tree based tagger from the University of Stuttgart (Helmut Schmid). It's
   language independent, but comes complete with parameter files for English, German, Italian,
   Dutch, French, Old French, Spanish, Bulgarian, and Russian. (Linux, Sparc-Solaris, Windows, and
   Mac OS X versions. Binary distribution only.) Page has links to sites where you can run it online.
 SVMTool POS Tagger based on SVMs (uses SVMlight). LGPL.
 ACOPOST (formerly ICOPOST) Open source C taggers originally written by Ingo Schröder.
   Implements maximum entropy, HMM trigram, and transformation-based learning. C source
   available under GNU public license.
 MXPOST: Adwait Ratnaparkhi's Maximum Entropy part of speech tagger Java POS tagger. A
   sentence boundary detector (MXTERMINATOR) is also included. Original version was only
   JDK1.1; later version worked with JDK1.3+. Class files, not source.
 fnTBL A fast and flexible implementation of Transformation-Based Learning in C++. Includes a
   POS tagger, but also NP chunking and general chunking models.
 mu-TBL An implementation of a Transformation-based Learner (a la Brill), usable for POS tagging
   and other things by Torbjörn Lager. Web demo also available. Prolog.
 YamCha SVM-based NP-chunker, also usable for POS tagging, NER, etc. C/C++ open source.
   Won CoNLL 2000 shared task. (Less automatic than a specialized POS tagger for an end user.)
 QTAG Part of speech tagger An HMM-based Java POS tagger from Birmingham U. (Oliver
   Mason). English and German parameter files. [Java class files, not source.]
Collocations
   A collocation is an expression consisting of two or more
    words that correspond to some conventional way of
    saying things
   Methods:
       Simplest solution – counting
           Google 5-gram corpus (2006)
               ceramics collectables fine 130
               ceramics collected by 52
               ceramics collection , 144
               ceramics collection . 247
       Use POS Tags
       Use Noun Phrase Chunking / Parsing
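
The first two methods can be combined: count adjacent word pairs, then keep only pairs whose tags match a noun-phrase-like pattern. The sketch below assumes an invented, already tagged token stream and a hypothetical pattern set (`ALLOWED`); it only shows the counting and filtering idea.

```python
from collections import Counter

# Invented, pre-tagged token stream for illustration (Penn Treebank tags).
TAGGED_TOKENS = [
    ("blood", "NN"), ("pressure", "NN"), ("was", "VBD"), ("high", "JJ"),
    ("and", "CC"), ("blood", "NN"), ("pressure", "NN"), ("medication", "NN"),
    ("was", "VBD"), ("started", "VBN"), ("for", "IN"), ("high", "JJ"),
    ("blood", "NN"), ("pressure", "NN"),
]
# Hypothetical tag patterns that are likely to form phrases.
ALLOWED = {("JJ", "NN"), ("NN", "NN")}

def count_bigrams(tagged, allowed_patterns=None):
    """Count adjacent word pairs, optionally keeping only certain tag patterns."""
    counts = Counter()
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if allowed_patterns is None or (t1, t2) in allowed_patterns:
            counts[(w1, w2)] += 1
    return counts

raw = count_bigrams(TAGGED_TOKENS)
filtered = count_bigrams(TAGGED_TOKENS, ALLOWED)
print(raw.most_common(3))
print(filtered.most_common(3))
```

Plain counting also returns pairs such as "was high"; the tag filter removes those and leaves candidates like "blood pressure".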
NLP/Text Mining POINTERS
   NLP BOOKS:
       Manning and Schütze, Foundations of Statistical Natural
        Language Processing (MIT Press, 1999).
       Jurafsky, Daniel, and James H. Martin. 2009. Speech and
        Language Processing: An Introduction to Natural
        Language Processing, Speech Recognition, and
        Computational Linguistics. 2nd edition. Prentice-Hall.
Books on Regular Expressions
   Jeffrey E.F. Friedl, Mastering Regular Expressions,
    O’Reilly.
   Jan Goyvaerts, Regular Expressions Cookbook,
    O’Reilly
NLP Research Groups
   Stanford NLP Group
       http://nlp.stanford.edu/
   CMU NLP Group
       http://www.cs.cmu.edu/~nasmith/nlp-cl.html
   UPenn NLP Group
       http://nlp.cis.upenn.edu/
   NACTEM – National Center for Text Mining
       http://www.nactem.ac.uk/
   UW – Turing Center
       http://turing.cs.washington.edu/
NLP Libraries
   List of tools from Stanford NLP webpage
       http://nlp.stanford.edu/links/statnlp.html
   Mallet – Machine learning for language toolkit
       MALLET is a Java-based package for statistical natural language
        processing, document classification, clustering, topic modeling, information
        extraction, and other machine learning applications to text.
       UMASS - http://mallet.cs.umass.edu/
   Minorthird
       MinorThird is a collection of Java classes for storing text, annotating text, and
        learning to extract entities and categorize text.
       CMU - http://sourceforge.net/apps/trac/minorthird/wiki
   OpenNLP
       OpenNLP hosts a variety of java-based NLP tools which perform sentence
        detection, tokenization, pos-tagging, chunking and parsing, named-entity
        detection, and coreference using the OpenNLPMaxent machine learning package.
       http://opennlp.sourceforge.net/
   GATE
       General architecture for NLP tasks
       http://gate.ac.uk/
Biomedical NLP and Text Mining Tools
   MetaMap (MMTx) - NLM
       http://mmtx.nlm.nih.gov/
   NegEx, ConText – University of Pittsburgh – BluLab
       http://www.dbmi.pitt.edu/blulab/index.html
   cTAKES – Mayo Clinic
       https://cabig-kc.nci.nih.gov/Vocab/KC/index.php/OHNLP_Documentation_and_Downloads
Biomedical Text Mining Tools
   Chilibot — A tool for finding relationships between genes or gene products.
   EBIMed - EBIMed is a web application that combines Information Retrieval and Extraction from Medline.
   FABLE — A gene-centric text-mining search engine for Medline
   GOAnnotator, an online tool that uses semantic similarity for verification of electronic protein annotations using GO
    terms automatically extracted from literature.
   GoPubMed — retrieves Medline abstracts for your search query, then detects ontology terms from the Gene
    Ontology and Medical Subject Headings in the abstracts and allows the user to browse the search results by
    exploring the ontologies and displaying only papers mentioning selected terms, their synonyms or descendants.
   Information Hyperlinked Over Proteins (iHOP): "A network of concurring genes and proteins extends through the
    scientific literature touching on phenotypes, pathologies and gene function. iHOP provides this network as a natural
    way of accessing millions of Medline abstracts. By using genes and proteins as hyperlinks between sentences and
    abstracts, the information in Medline can be converted into one navigable resource, bringing all advantages of the
    internet to scientific literature research."
   LitInspector — Gene and signal transduction pathway data mining in Medline abstracts.
   NextBio- Life sciences search engine with a text mining functionality that utilizes Medline abstracts and clinical trials
    to return concepts relevant to the query based on a number of heuristics including ontology relationships, journal
    impact, publication date, and authorship.
   PubAnatomy — An interactive visual search engine that provides new ways to explore relationships among Medline
    literature, text mining results, anatomical structures, gene expression and other background information.
   PubGene — Co-occurrence network display of gene and protein symbols as well as MeSH, GO, PubChem and
    interaction terms (such as "binds" or "induces") as these appear in Medline records (that is, PubMed titles and
    abstracts).
   TexFlame, an online tool that renders a single Medline abstract as a Systems Biology Graphical Notation (SBGN)-
    like graph. The graph is a complete syntactic-semantic representation of the abstract.
   Whatizit - Whatizit is great at identifying molecular biology terms and linking them to publicly available databases.
   XTractor — Discovering Newer Scientific Relations Across PubMed Abstracts. A tool to obtain manually
    annotated, expert-curated relationships for Proteins, Diseases, Drugs and Biological Processes as they get
    published in Medline.
Literature-based discovery tools
   Arrowsmith - UIC-based site for searching links
    between two literatures within Medline. Also contains
    the Author-ity tool for disambiguating authors on
    scientific papers, and the Anne O'Tate tool for
    summarizing the results of a PubMed query.
   BITOLA helps biomedical researchers make new
    discoveries by discovering potentially new relations
    between biomedical concepts.
   Manjal: another LBD tool by Padmini Srinivasan

More Related Content

What's hot

DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf EremyanDataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyanrudolf eremyan
 
Introduction to natural language processing
Introduction to natural language processingIntroduction to natural language processing
Introduction to natural language processingMinh Pham
 
Corpus annotation for corpus linguistics (nov2009)
Corpus annotation for corpus linguistics (nov2009)Corpus annotation for corpus linguistics (nov2009)
Corpus annotation for corpus linguistics (nov2009)Jorge Baptista
 
Lean Logic for Lean Times: Entailment and Contradiction Revisited
Lean Logic for Lean Times: Entailment and Contradiction RevisitedLean Logic for Lean Times: Entailment and Contradiction Revisited
Lean Logic for Lean Times: Entailment and Contradiction RevisitedValeria de Paiva
 
A Combined Approach to Part-of-Speech Tagging Using Features Extraction and H...
A Combined Approach to Part-of-Speech Tagging Using Features Extraction and H...A Combined Approach to Part-of-Speech Tagging Using Features Extraction and H...
A Combined Approach to Part-of-Speech Tagging Using Features Extraction and H...Editor IJARCET
 
A general method applicable to the search for anglicisms in russian social ne...
A general method applicable to the search for anglicisms in russian social ne...A general method applicable to the search for anglicisms in russian social ne...
A general method applicable to the search for anglicisms in russian social ne...Ilia Karpov
 
OUTDATED Text Mining 3/5: String Processing
OUTDATED Text Mining 3/5: String ProcessingOUTDATED Text Mining 3/5: String Processing
OUTDATED Text Mining 3/5: String ProcessingFlorian Leitner
 
UCU NLP Summer Workshops 2017 - Part 2
UCU NLP Summer Workshops 2017 - Part 2UCU NLP Summer Workshops 2017 - Part 2
UCU NLP Summer Workshops 2017 - Part 2Yuriy Guts
 
Natural Language Processing with Python
Natural Language Processing with PythonNatural Language Processing with Python
Natural Language Processing with PythonBenjamin Bengfort
 
Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)VenkateshMurugadas
 
Part-of-Speech Tagging of Northern Sotho: Disambiguating Polysemous Function ...
Part-of-Speech Tagging of Northern Sotho: Disambiguating Polysemous Function ...Part-of-Speech Tagging of Northern Sotho: Disambiguating Polysemous Function ...
Part-of-Speech Tagging of Northern Sotho: Disambiguating Polysemous Function ...Guy De Pauw
 
The Role of Natural Language Processing in Information Retrieval
The Role of Natural Language Processing in Information RetrievalThe Role of Natural Language Processing in Information Retrieval
The Role of Natural Language Processing in Information RetrievalTony Russell-Rose
 
referát.doc
referát.docreferát.doc
referát.docbutest
 
7 probability and statistics an introduction
7 probability and statistics an introduction7 probability and statistics an introduction
7 probability and statistics an introductionThennarasuSakkan
 
Tutorial - Introduction to Rule Technologies and Systems
Tutorial - Introduction to Rule Technologies and SystemsTutorial - Introduction to Rule Technologies and Systems
Tutorial - Introduction to Rule Technologies and SystemsAdrian Paschke
 
11 terms in Corpus Linguistics1 (2)
11 terms in Corpus Linguistics1 (2)11 terms in Corpus Linguistics1 (2)
11 terms in Corpus Linguistics1 (2)ThennarasuSakkan
 
Language Variety Identification using Distributed Representations of Words an...
Language Variety Identification using Distributed Representations of Words an...Language Variety Identification using Distributed Representations of Words an...
Language Variety Identification using Distributed Representations of Words an...Francisco Manuel Rangel Pardo
 
OpenWordnet-PT: A Project Report
OpenWordnet-PT: A Project ReportOpenWordnet-PT: A Project Report
OpenWordnet-PT: A Project ReportAlexandre Rademaker
 

What's hot (20)

DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf EremyanDataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
 
Introduction to natural language processing
Introduction to natural language processingIntroduction to natural language processing
Introduction to natural language processing
 
Corpus annotation for corpus linguistics (nov2009)
Corpus annotation for corpus linguistics (nov2009)Corpus annotation for corpus linguistics (nov2009)
Corpus annotation for corpus linguistics (nov2009)
 
Lean Logic for Lean Times: Entailment and Contradiction Revisited
Lean Logic for Lean Times: Entailment and Contradiction RevisitedLean Logic for Lean Times: Entailment and Contradiction Revisited
Lean Logic for Lean Times: Entailment and Contradiction Revisited
 
A Combined Approach to Part-of-Speech Tagging Using Features Extraction and H...
A Combined Approach to Part-of-Speech Tagging Using Features Extraction and H...A Combined Approach to Part-of-Speech Tagging Using Features Extraction and H...
A Combined Approach to Part-of-Speech Tagging Using Features Extraction and H...
 
A general method applicable to the search for anglicisms in russian social ne...
A general method applicable to the search for anglicisms in russian social ne...A general method applicable to the search for anglicisms in russian social ne...
A general method applicable to the search for anglicisms in russian social ne...
 
OUTDATED Text Mining 3/5: String Processing
OUTDATED Text Mining 3/5: String ProcessingOUTDATED Text Mining 3/5: String Processing
OUTDATED Text Mining 3/5: String Processing
 
UCU NLP Summer Workshops 2017 - Part 2
UCU NLP Summer Workshops 2017 - Part 2UCU NLP Summer Workshops 2017 - Part 2
UCU NLP Summer Workshops 2017 - Part 2
 
Natural Language Processing with Python
Natural Language Processing with PythonNatural Language Processing with Python
Natural Language Processing with Python
 
Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)
 
Part-of-Speech Tagging of Northern Sotho: Disambiguating Polysemous Function ...
Part-of-Speech Tagging of Northern Sotho: Disambiguating Polysemous Function ...Part-of-Speech Tagging of Northern Sotho: Disambiguating Polysemous Function ...
Part-of-Speech Tagging of Northern Sotho: Disambiguating Polysemous Function ...
 
The Role of Natural Language Processing in Information Retrieval
The Role of Natural Language Processing in Information RetrievalThe Role of Natural Language Processing in Information Retrieval
The Role of Natural Language Processing in Information Retrieval
 
NLP
NLPNLP
NLP
 
referát.doc
referát.docreferát.doc
referát.doc
 
Intro to NLP. Lecture 2
Intro to NLP.  Lecture 2Intro to NLP.  Lecture 2
Intro to NLP. Lecture 2
 
7 probability and statistics an introduction
7 probability and statistics an introduction7 probability and statistics an introduction
7 probability and statistics an introduction
 
Tutorial - Introduction to Rule Technologies and Systems
Tutorial - Introduction to Rule Technologies and SystemsTutorial - Introduction to Rule Technologies and Systems
Tutorial - Introduction to Rule Technologies and Systems
 
11 terms in Corpus Linguistics1 (2)
11 terms in Corpus Linguistics1 (2)11 terms in Corpus Linguistics1 (2)
11 terms in Corpus Linguistics1 (2)
 
Language Variety Identification using Distributed Representations of Words an...
Language Variety Identification using Distributed Representations of Words an...Language Variety Identification using Distributed Representations of Words an...
Language Variety Identification using Distributed Representations of Words an...
 
OpenWordnet-PT: A Project Report
OpenWordnet-PT: A Project ReportOpenWordnet-PT: A Project Report
OpenWordnet-PT: A Project Report
 

Similar to MEBI 591C/598 – Data and Text Mining in Biomedical Informatics

Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language ProcessingPranav Gupta
 
Statistical Named Entity Recognition for Hungarian – analysis ...
Statistical Named Entity Recognition for Hungarian – analysis ...Statistical Named Entity Recognition for Hungarian – analysis ...
Statistical Named Entity Recognition for Hungarian – analysis ...butest
 
Themes identification techniques in qualitative research
Themes identification techniques in qualitative researchThemes identification techniques in qualitative research
Themes identification techniques in qualitative researchGhulam Qambar
 
kantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptkantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptbutest
 
NLP Deep Learning with Tensorflow
NLP Deep Learning with TensorflowNLP Deep Learning with Tensorflow
NLP Deep Learning with Tensorflowseungwoo kim
 
Open nlp presentationss
Open nlp presentationssOpen nlp presentationss
Open nlp presentationssChandan Deb
 
Modeling Causal Reasoning in Complex Networks through NLP: an Introduction
Modeling Causal Reasoning in Complex Networks through NLP: an IntroductionModeling Causal Reasoning in Complex Networks through NLP: an Introduction
Modeling Causal Reasoning in Complex Networks through NLP: an IntroductionLuca Nannini
 
PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabili...
PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabili...PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabili...
PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabili...Rommel Carvalho
 
Extending models for controlled vocabularies to classification systems: model...
Extending models for controlled vocabularies to classification systems: model...Extending models for controlled vocabularies to classification systems: model...
Extending models for controlled vocabularies to classification systems: model...Marcia Zeng
 
The Triplex Approach for Recognizing Semantic Relations from Noun Phrases, Ap...
The Triplex Approach for Recognizing Semantic Relations from Noun Phrases, Ap...The Triplex Approach for Recognizing Semantic Relations from Noun Phrases, Ap...
The Triplex Approach for Recognizing Semantic Relations from Noun Phrases, Ap...Iman Mirrezaei
 
Natural language processing and transformer models
Natural language processing and transformer modelsNatural language processing and transformer models
Natural language processing and transformer modelsDing Li
 
Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Bhaskar Mitra
 
Bay Area NLP Reading Group - 7.12.16
Bay Area NLP Reading Group - 7.12.16 Bay Area NLP Reading Group - 7.12.16
Bay Area NLP Reading Group - 7.12.16 Katie Bauer
 
Bondec - A Sentence Boundary Detector
Bondec - A Sentence Boundary DetectorBondec - A Sentence Boundary Detector
Bondec - A Sentence Boundary Detectorbutest
 
Lecture 7- Text Statistics and Document Parsing
Lecture 7- Text Statistics and Document ParsingLecture 7- Text Statistics and Document Parsing
Lecture 7- Text Statistics and Document ParsingSean Golliher
 
Natural Language parsing.pptx
Natural Language parsing.pptxNatural Language parsing.pptx
Natural Language parsing.pptxsiddhantroy13
 

Similar to MEBI 591C/598 – Data and Text Mining in Biomedical Informatics (20)

Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 
Lidia Pivovarova
Lidia PivovarovaLidia Pivovarova
Lidia Pivovarova
 
Statistical Named Entity Recognition for Hungarian – analysis ...
Statistical Named Entity Recognition for Hungarian – analysis ...Statistical Named Entity Recognition for Hungarian – analysis ...
Statistical Named Entity Recognition for Hungarian – analysis ...
 
Themes identification techniques in qualitative research
Themes identification techniques in qualitative researchThemes identification techniques in qualitative research
Themes identification techniques in qualitative research
 
kantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptkantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.ppt
 
NLP Deep Learning with Tensorflow
NLP Deep Learning with TensorflowNLP Deep Learning with Tensorflow
NLP Deep Learning with Tensorflow
 
Open nlp presentationss
Open nlp presentationssOpen nlp presentationss
Open nlp presentationss
 
Modeling Causal Reasoning in Complex Networks through NLP: an Introduction
Modeling Causal Reasoning in Complex Networks through NLP: an IntroductionModeling Causal Reasoning in Complex Networks through NLP: an Introduction
Modeling Causal Reasoning in Complex Networks through NLP: an Introduction
 
PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabili...
PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabili...PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabili...
PrOntoLearn: Unsupervised Lexico-Semantic Ontology Generation using Probabili...
 
Class14
Class14Class14
Class14
 
Extending models for controlled vocabularies to classification systems: model...
Extending models for controlled vocabularies to classification systems: model...Extending models for controlled vocabularies to classification systems: model...
Extending models for controlled vocabularies to classification systems: model...
 
The Triplex Approach for Recognizing Semantic Relations from Noun Phrases, Ap...
The Triplex Approach for Recognizing Semantic Relations from Noun Phrases, Ap...The Triplex Approach for Recognizing Semantic Relations from Noun Phrases, Ap...
The Triplex Approach for Recognizing Semantic Relations from Noun Phrases, Ap...
 
Natural language processing and transformer models
Natural language processing and transformer modelsNatural language processing and transformer models
Natural language processing and transformer models
 
Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)
 
Bay Area NLP Reading Group - 7.12.16
Bay Area NLP Reading Group - 7.12.16 Bay Area NLP Reading Group - 7.12.16
Bay Area NLP Reading Group - 7.12.16
 
Bondec - A Sentence Boundary Detector
Bondec - A Sentence Boundary DetectorBondec - A Sentence Boundary Detector
Bondec - A Sentence Boundary Detector
 
Lecture 7- Text Statistics and Document Parsing
Lecture 7- Text Statistics and Document ParsingLecture 7- Text Statistics and Document Parsing
Lecture 7- Text Statistics and Document Parsing
 
1 l5eng
1 l5eng1 l5eng
1 l5eng
 
Natural Language parsing.pptx
Natural Language parsing.pptxNatural Language parsing.pptx
Natural Language parsing.pptx
 
NLP todo
NLP todoNLP todo
NLP todo
 

MEBI 591C/598 – Data and Text Mining in Biomedical Informatics

  • 1. MEBI 591C/598 – Text Mining/NLP Subproblems, Meliha Yetisgen-Yildiz
  • 2. From last week’s discussion
  • 3. Presentation
      Schedule: http://faculty.washington.edu/melihay/MEBI591C.htm
          50 minutes: presentation + discussion + question answering
      Content:
          Research/Project Idea
              Motivation + Problem + Potential Solution
          Survey or literature review
              A general area
                  Text mining: named entity recognition - gene name identification
                  Data mining: classification, clustering
              Available resources for a given area
                  Open source libraries
                  Data resources
          Paper
              Conference or journal article
      Preparation:
          Email the plan + reading list at least 3 days prior to class
          GoMap Discussion List
  • 4. System Design
      Team:
          Marcin, Wynona, Karl, Stella, Francisco, Jeffry, Safiyyah (not registered)
      Example data released:
          https://www.i2b2.org/NLP/Relations/Documentation.php
      The fourth i2b2 challenge is a three-tiered challenge that studies:
          1. extraction of medical problems, tests, and treatments
          2. classification of assertions made on medical problems
          3. relations of medical problems, tests, and treatments
  • 5. 2010 - i2b2 Challenge
      Important Dates:
          March 5th – Registration opens
          April 15th – Commitment to Participate in Challenge & Training Data Release
          July 15th – Test Data Release
          September 1st – Short papers due
          October 1st – Invitations to present at the Workshop
          November, 2010 – Workshop
      Preparations
          Linux server + accounts (meliha)
              Accounts
              Dev environment
              Subversion ?
  • 6. Text Mining/NLP Sub-problems – Part 1
      Sentence Delimiters
      Tokenizers
      Part-of-Speech Tags
      Collocations
  • 7. Sentence Delimiters
      Document -> Paragraph -> Sentences
      Sentence boundary disambiguation (SBD) is the NLP problem of deciding where sentences begin and end.
      Sentence boundary identification is challenging because punctuation marks are often ambiguous.
          A period may denote:
              Abbreviation
              Decimal point
              Email address
          About 47% of the periods in the Wall Street Journal corpus denote abbreviations.
          Question marks and exclamation marks may appear in:
              embedded quotations, emoticons, computer code, and slang
      Tools:
          OpenNLP has a class for sentence detection
          NaCTeM: http://text0.mib.man.ac.uk:8080/scottpiao/sent_detector
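The ambiguity described on this slide can be illustrated with a small rule-based splitter. This is a minimal sketch, not the method used by OpenNLP or the NaCTeM detector: the abbreviation list and the function name `split_sentences` are hypothetical, and the only rule is that a `.`, `!` or `?` followed by whitespace ends a sentence unless the token is a known abbreviation. Decimal points and email addresses are skipped only because their periods are not followed by whitespace.

```python
import re

# Hypothetical abbreviation lexicon for illustration; a real system needs a much larger one.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "ms.", "e.g.", "i.e.", "vs.", "inc.", "no."}

# Candidate boundary: '.', '!' or '?' immediately followed by whitespace.
CANDIDATE = re.compile(r"[.!?](?=\s)")


def split_sentences(text):
    """Split text at candidate boundaries that do not end a known abbreviation."""
    sentences = []
    start = 0
    for match in CANDIDATE.finditer(text):
        end = match.end()
        words = text[start:end].split()
        last_token = words[-1].lower() if words else ""
        if last_token in ABBREVIATIONS:
            continue  # the period belongs to an abbreviation, not a sentence end
        sentences.append(text[start:end].strip())
        start = end
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)  # whatever follows the last accepted boundary
    return sentences


example = "Dr. Smith measured 3.5 mg. Email j.doe@example.org today! Done?"
print(split_sentences(example))
```

A sentence that genuinely ends in an abbreviation would be merged with the next one by this rule, which is one reason statistical detectors are usually preferred.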
  • 8. Tokenization
      Document -> Paragraph -> Sentence -> Tokens
      Based on white-space characters
      In Unicode (Unicode Character Database) the following code points are defined as whitespace:
          U+0009–U+000D (control characters, containing Tab, CR and LF)
          U+0020 SPACE
          U+0085 NEL (control character next line)
          U+00A0 NBSP (NO-BREAK SPACE)
          U+1680 OGHAM SPACE MARK
          U+180E MONGOLIAN VOWEL SEPARATOR (no longer classified as whitespace since Unicode 6.3)
          U+2000–U+200A (different sorts of spaces)
          U+2028 LS (LINE SEPARATOR)
          U+2029 PS (PARAGRAPH SEPARATOR)
          U+202F NNBSP (NARROW NO-BREAK SPACE)
          U+205F MMSP (MEDIUM MATHEMATICAL SPACE)
          U+3000 IDEOGRAPHIC SPACE
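Whitespace tokenization over the code points listed on this slide can be sketched as follows. The character class is built from the slide's list (an assumption: it is not necessarily identical to the White_Space set of the Unicode version bundled with a given Python release), and the function name `tokenize` is hypothetical.

```python
import re

# Whitespace code points as listed on the slide.
SLIDE_WHITESPACE = (
    "\u0009-\u000D"   # Tab, LF, VT, FF, CR
    "\u0020"          # SPACE
    "\u0085"          # NEL
    "\u00A0"          # NO-BREAK SPACE
    "\u1680"          # OGHAM SPACE MARK
    "\u180E"          # MONGOLIAN VOWEL SEPARATOR
    "\u2000-\u200A"   # different sorts of spaces
    "\u2028\u2029"    # LINE SEPARATOR, PARAGRAPH SEPARATOR
    "\u202F\u205F"    # NARROW NO-BREAK SPACE, MEDIUM MATHEMATICAL SPACE
    "\u3000"          # IDEOGRAPHIC SPACE
)
TOKEN_SPLIT = re.compile("[" + SLIDE_WHITESPACE + "]+")


def tokenize(sentence):
    """Split a sentence on runs of whitespace and drop empty strings."""
    return [token for token in TOKEN_SPLIT.split(sentence) if token]


print(tokenize("The\u00A0patient\u3000denies \t chest pain."))
```

Punctuation stays attached to the neighbouring word (`pain.`), so a practical tokenizer would add further rules on top of whitespace splitting.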
  • 9. Part-of-Speech Tagging
      “The process of assigning a part-of-speech or other lexical class marker to each word in a corpus” (Jurafsky and Martin)
      WORDS     TAGS
      the       DET
      girl      N
      kissed    V
      the       DET
      boy       N
      on        P
      the       DET
      cheek     N
  • 10. Penn Treebank POS Tags
       1. CC    Coordinating conjunction
       2. CD    Cardinal number
       3. DT    Determiner
       4. EX    Existential there
       5. FW    Foreign word
       6. IN    Preposition or subordinating conjunction
       7. JJ    Adjective
       8. JJR   Adjective, comparative
       9. JJS   Adjective, superlative
      10. LS    List item marker
      11. MD    Modal
      12. NN    Noun, singular or mass
      13. NNS   Noun, plural
      14. NNP   Proper noun, singular
      15. NNPS  Proper noun, plural
      16. PDT   Predeterminer
      17. POS   Possessive ending
      18. PRP   Personal pronoun
      19. PRP$  Possessive pronoun
      20. RB    Adverb
      21. RBR   Adverb, comparative
      22. RBS   Adverb, superlative
      23. RP    Particle
      24. SYM   Symbol
      25. TO    to
      26. UH    Interjection
      27. VB    Verb, base form
      28. VBD   Verb, past tense
      29. VBG   Verb, gerund or present participle
      30. VBN   Verb, past participle
      31. VBP   Verb, non-3rd person singular present
      32. VBZ   Verb, 3rd person singular present
      33. WDT   Wh-determiner
      34. WP    Wh-pronoun
      35. WP$   Possessive wh-pronoun
      36. WRB   Wh-adverb
  • 11. Applications of Tagging
      Partial parsing: syntactic analysis
      Information Extraction: tagging and partial parsing help identify useful terms and relationships between them.
      Information Retrieval: noun phrase recognition and query-document matching based on meaningful units rather than individual terms.
      Question Answering: analyzing a query to understand what type of entity the user is looking for and how it is related to other noun phrases mentioned in the question.
  • 12. Information Sources in Tagging
      How do we decide the correct POS for a word?
      Syntagmatic information: look at the tags of other words in the context of the word we are interested in.
      Lexical information: predict a tag based on the word concerned. Words with several possible POS usually occur as one particular POS.
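Lexical information alone already gives a usable baseline: tag each word with the tag it carried most often in training. The sketch below assumes a tiny hand-made training set; the function names and the fallback to `NN` for unseen words are illustrative choices, not part of the slides.

```python
from collections import Counter, defaultdict


def train_most_frequent_tag(tagged_sentences):
    """Map each lower-cased word to the tag it was seen with most often."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tag_counts.most_common(1)[0][0] for word, tag_counts in counts.items()}


def tag_words(words, model, default="NN"):
    """Tag words by lookup; unseen words get the default tag."""
    return [(word, model.get(word.lower(), default)) for word in words]


# Toy training data (hypothetical): 'run' is seen once as a noun and twice as a verb.
TRAINING = [
    [("the", "DT"), ("run", "NN")],
    [("they", "PRP"), ("run", "VBP")],
    [("we", "PRP"), ("run", "VBP")],
]
MODEL = train_most_frequent_tag(TRAINING)
print(tag_words(["The", "run", "xyz"], MODEL))
```

This baseline ignores syntagmatic information entirely, so "the run" is tagged with the verb reading here; context-aware taggers exist to fix exactly that kind of error.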
  • 13. POS Approaches – Rule-Based
      Basic idea:
          Assign all possible tags to words
          Remove tags according to a set of rules of the type:
              if word+1 is an adj, adv, or quantifier, and the following is a sentence boundary, and word-1 is not a verb like “consider”, then eliminate non-adv, else eliminate adv.
          Typically more than 1000 hand-written rules, but may be machine-learned.
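The assign-then-eliminate idea can be sketched with a toy lexicon and a single rule. Everything here is hypothetical: the lexicon, the rule (drop verb and modal readings directly after an unambiguous determiner), and the function names. It is far simpler than the example rule on the slide and only shows the control flow.

```python
# Hypothetical toy lexicon: every tag each word may take.
LEXICON = {
    "the": {"DT"},
    "can": {"MD", "NN", "VB"},
    "rusted": {"VBD", "VBN"},
}


def rule_no_verb_after_determiner(tags, i):
    """Eliminate verb and modal readings when the previous word is unambiguously a determiner."""
    if i > 0 and tags[i - 1] == {"DT"}:
        remaining = {t for t in tags[i] if not t.startswith("VB") and t != "MD"}
        if remaining:  # never remove the last remaining reading
            return remaining
    return tags[i]


RULES = [rule_no_verb_after_determiner]


def tag_by_elimination(words):
    """Assign all candidate tags, then let each rule prune them."""
    tags = [set(LEXICON.get(word.lower(), {"NN"})) for word in words]
    for rule in RULES:
        for i in range(len(tags)):
            tags[i] = rule(tags, i)
    return list(zip(words, tags))


print(tag_by_elimination(["The", "can", "rusted"]))
```

With one rule, "rusted" stays ambiguous between VBD and VBN; a real rule set keeps eliminating until, ideally, one reading per word is left.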
  • 14. POS Approaches – Machine Learning
      Based on the probability of a certain tag occurring, given various possibilities
      Requires a training corpus
      Training corpus may be different from test corpus
      Examples:
          Hidden Markov Model Taggers
          Transformation-Based Taggers
          Maximum Entropy Taggers
      Ling572 (Advanced Statistical Methods in NLP): http://courses.washington.edu/ling572/winter10/teaching_slides/new_syllabus.htm
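For the Hidden Markov Model case, decoding is usually done with the Viterbi algorithm. The sketch below is a bigram HMM decoder with hand-set probabilities; the tag set, all probability values, and the small floor used in place of proper smoothing are assumptions made for illustration. In practice the tables would be estimated from a training corpus.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable tag sequence for a non-empty list of words."""
    def log_p(p):
        return math.log(max(p, floor))  # floor avoids log(0) for unseen events

    # best[i][t] = (log-probability of the best path ending in tag t at word i, previous tag)
    best = [{t: (log_p(start_p.get(t, 0)) + log_p(emit_p[t].get(words[0], 0)), None)
             for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            score, prev = max(
                (best[i - 1][p][0] + log_p(trans_p[p].get(t, 0)), p) for p in tags
            )
            column[t] = (score + log_p(emit_p[t].get(words[i], 0)), prev)
        best.append(column)

    # Follow the back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return list(reversed(path))


# Hypothetical toy model.
TAGS = ["DT", "NN", "VB"]
START_P = {"DT": 0.8, "NN": 0.1, "VB": 0.1}
TRANS_P = {
    "DT": {"NN": 0.9, "VB": 0.1},
    "NN": {"VB": 0.6, "NN": 0.3, "DT": 0.1},
    "VB": {"DT": 0.6, "NN": 0.3, "VB": 0.1},
}
EMIT_P = {
    "DT": {"the": 1.0},
    "NN": {"dog": 0.6, "walks": 0.4},
    "VB": {"walks": 0.8, "dog": 0.2},
}
print(viterbi(["the", "dog", "walks"], TAGS, START_P, TRANS_P, EMIT_P))
```

Working in log space keeps long sentences from underflowing, which is why the probabilities are summed rather than multiplied.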
  • 15. Tagging Accuracy
      Ranges from 95% to 97%
      Depends on:
          Amount of training data available.
          Difference between the training corpus and dictionary and the corpus of application.
          Unknown words in the corpus of application.
  • 16. Tagging Unknown Words
      New words are added to (newspaper) language at a rate of 20+ per month
      Plus many proper names …
      Increases error rates by 1-2%
      Method 1: assume they are nouns
      Method 2: assume the unknown words have a probability distribution similar to words occurring only once in the training set.
      Method 3: use morphological information, e.g., words ending with –ed tend to be tagged VBN.
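Methods 1 and 3 combine naturally: try a few morphological cues first and fall back to "noun". The suffix table, the capitalization cue, and the minimum-length check below are hypothetical choices for illustration, not rules taken from the slides (apart from the –ed/VBN example).

```python
# Hypothetical suffix cues, checked in order; longer suffixes come first.
SUFFIX_RULES = [("ing", "VBG"), ("ed", "VBN"), ("ly", "RB"), ("s", "NNS")]


def guess_tag(word, default="NN"):
    """Guess a Penn Treebank tag for a word that was not seen in training."""
    if word[:1].isupper():
        return "NNP"  # capitalized unknown words are often proper names
    lowered = word.lower()
    for suffix, tag in SUFFIX_RULES:
        # Require a stem of at least three characters so that 'red' is not read as '-ed'.
        if lowered.endswith(suffix) and len(lowered) > len(suffix) + 2:
            return tag
    return default  # Method 1: assume a noun


for unknown in ["phosphorylated", "Herceptin", "kinases", "kinase"]:
    print(unknown, guess_tag(unknown))
```

The capitalization cue misfires on sentence-initial words, so a fuller version would take the word's position into account.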
  • 17. POS Taggers
      Freely downloadable part-of-speech taggers:
          Stanford POS tagger: Loglinear tagger in Java (by Kristina Toutanova).
          hunpos: An HMM tagger with models available for English and Hungarian. A reimplementation of TnT in OCaml. Pre-compiled models. Runs on Linux, Mac OS X, and Windows.
          MBT: Memory-based tagger, based on TiMBL.
          TreeTagger: A decision-tree-based tagger from the University of Stuttgart (Helmut Schmid). It is language independent, but comes complete with parameter files for English, German, Italian, Dutch, French, Old French, Spanish, Bulgarian, and Russian. (Linux, Sparc-Solaris, Windows, and Mac OS X versions. Binary distribution only.) Page has links to sites where you can run it online.
          SVMTool: POS tagger based on SVMs (uses SVMlight). LGPL.
          ACOPOST (formerly ICOPOST): Open source C taggers originally written by Ingo Schröder. Implements maximum entropy, HMM trigram, and transformation-based learning. C source available under GNU public license.
          MXPOST: Adwait Ratnaparkhi's maximum entropy part-of-speech tagger. Java POS tagger. A sentence boundary detector (MXTERMINATOR) is also included. Original version was only JDK1.1; later version worked with JDK1.3+. Class files, not source.
          fnTBL: A fast and flexible implementation of Transformation-Based Learning in C++. Includes a POS tagger, but also NP chunking and general chunking models.
          mu-TBL: An implementation of a Transformation-based Learner (a la Brill), usable for POS tagging and other things, by Torbjörn Lager. Web demo also available. Prolog.
          YamCha: SVM-based NP-chunker, also usable for POS tagging, NER, etc. C/C++ open source. Won CoNLL 2000 shared task. (Less automatic than a specialized POS tagger for an end user.)
          QTAG: An HMM-based Java POS tagger from Birmingham U. (Oliver Mason). English and German parameter files. [Java class files, not source.]
  • 18. Collocations
      A collocation is an expression consisting of two or more words that corresponds to some conventional way of saying things
      Methods:
          Simplest solution – counting
              Google 5-gram corpus (2006)
                  ceramics collectables fine    130
                  ceramics collected by          52
                  ceramics collection ,         144
                  ceramics collection .         247
          Use POS Tags
          Use Noun Phrase Chunking / Parsing
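The first two methods on this slide can be sketched together: count n-grams, then keep only those whose POS tags match a pattern. The tag patterns (adjective-noun and noun-noun), the `min_count` threshold, and the function names are assumptions for illustration, and the input is assumed to be already tokenized and tagged.

```python
from collections import Counter


def count_ngrams(tokens, n=2):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def candidate_collocations(tagged_tokens, min_count=2):
    """Keep bigrams whose tag pattern is adjective-noun or noun-noun."""
    patterns = {("JJ", "NN"), ("NN", "NN")}
    counts = Counter()
    for (w1, t1), (w2, t2) in zip(tagged_tokens, tagged_tokens[1:]):
        if (t1, t2) in patterns:
            counts[(w1.lower(), w2.lower())] += 1
    return [(bigram, c) for bigram, c in counts.most_common() if c >= min_count]


TAGGED = [
    ("chest", "NN"), ("pain", "NN"), ("and", "CC"),
    ("severe", "JJ"), ("chest", "NN"), ("pain", "NN"),
]
print(count_ngrams([word for word, _ in TAGGED])[("chest", "pain")])
print(candidate_collocations(TAGGED))
```

Raw counts alone are dominated by function-word pairs in real text; the tag filter is a cheap way to push content-word pairs to the top.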
  • 19. NLP/Text Mining Pointers
      NLP books:
          Manning and Schütze, Foundations of Statistical Natural Language Processing (MIT Press, 1999).
          Jurafsky, Daniel, and James H. Martin. 2009. Speech and Language Processing: An Introduction to Natural Language Processing, Speech Recognition, and Computational Linguistics. 2nd edition. Prentice-Hall.
  • 20. Books on Regular Expressions
      Jeffrey E.F. Friedl, Mastering Regular Expressions, O’Reilly.
      Jan Goyvaerts, Regular Expressions Cookbook, O’Reilly.
  • 21. NLP Research Groups
      Stanford NLP Group
          http://nlp.stanford.edu/
      CMU NLP Group
          http://www.cs.cmu.edu/~nasmith/nlp-cl.html
      UPenn NLP Group
          http://nlp.cis.upenn.edu/
      NaCTeM – National Centre for Text Mining
          http://www.nactem.ac.uk/
      UW – Turing Center
          http://turing.cs.washington.edu/
  • 22. NLP Libraries
      List of tools from the Stanford NLP webpage
          http://nlp.stanford.edu/links/statnlp.html
      MALLET – Machine Learning for Language Toolkit
          MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
          UMass - http://mallet.cs.umass.edu/
      MinorThird
          MinorThird is a collection of Java classes for storing text, annotating text, and learning to extract entities and categorize text.
          CMU - http://sourceforge.net/apps/trac/minorthird/wiki
      OpenNLP
          OpenNLP hosts a variety of Java-based NLP tools which perform sentence detection, tokenization, POS tagging, chunking and parsing, named-entity detection, and coreference using the OpenNLP Maxent machine learning package.
          http://opennlp.sourceforge.net/
      GATE
          General architecture for NLP tasks
          http://gate.ac.uk/
  • 23. Biomedical NLP and Text Mining Tools
      MetaMap (MMTx) - NLM
          http://mmtx.nlm.nih.gov/
      NegEx, ConText – University of Pittsburgh – BluLab
          http://www.dbmi.pitt.edu/blulab/index.html
      cTAKES – Mayo Clinic
          https://cabig-kc.nci.nih.gov/Vocab/KC/index.php/OHNLP_Documentation_and_Downloads
  • 24. Biomedical Text Mining Tools
      Chilibot: A tool for finding relationships between genes or gene products.
      EBIMed: A web application that combines Information Retrieval and Extraction from Medline.
      FABLE: A gene-centric text-mining search engine for Medline.
      GOAnnotator: An online tool that uses semantic similarity for verification of electronic protein annotations using GO terms automatically extracted from literature.
      GoPubMed: Retrieves Medline abstracts for your search query, then detects ontology terms from the Gene Ontology and Medical Subject Headings in the abstracts and allows the user to browse the search results by exploring the ontologies and displaying only papers mentioning selected terms, their synonyms or descendants.
      Information Hyperlinked Over Proteins (iHOP): "A network of concurring genes and proteins extends through the scientific literature touching on phenotypes, pathologies and gene function. iHOP provides this network as a natural way of accessing millions of Medline abstracts. By using genes and proteins as hyperlinks between sentences and abstracts, the information in Medline can be converted into one navigable resource, bringing all advantages of the internet to scientific literature research."
      LitInspector: Gene and signal transduction pathway data mining in Medline abstracts.
      NextBio: Life sciences search engine with a text mining functionality that utilizes Medline abstracts and clinical trials to return concepts relevant to the query based on a number of heuristics including ontology relationships, journal impact, publication date, and authorship.
      PubAnatomy: An interactive visual search engine that provides new ways to explore relationships among Medline literature, text mining results, anatomical structures, gene expression and other background information.
      PubGene: Co-occurrence network display of gene and protein symbols as well as MeSH, GO, PubChem and interaction terms (such as "binds" or "induces") as these appear in Medline records (that is, PubMed titles and abstracts).
      TexFlame: An online tool that renders a single Medline abstract as a Systems Biology Graphical Notation (SBGN)-like graph. The graph is a complete syntactic-semantic representation of the abstract.
      Whatizit: Identifies molecular biology terms and links them to publicly available databases.
      XTractor: Discovering Newer Scientific Relations Across PubMed Abstracts. A tool to obtain manually annotated, expert-curated relationships for Proteins, Diseases, Drugs and Biological Processes as they get published in Medline.
  • 25. Literature-based Discovery Tools
      Arrowsmith: UIC-based site for searching links between two literatures within Medline. Also contains the Author-ity tool for disambiguating authors on scientific papers, and the Anne O'Tate tool for summarizing the results of a PubMed query.
      BITOLA: Helps biomedical researchers make new discoveries by finding potentially new relations between biomedical concepts.
      Manjal: Another LBD tool, by Padmini Srinivasan.

Editor's Notes

  1. People who haven’t gotten back to me with dates: Daniel and Marcin
  2. Stanford: Chris Manning and Daniel Jurafsky; CMU: William Cohen