Identifying genes and proteins in text: a short review
           of available tools and resources

                            Nathan Harmston

                           Theoretical Systems Biology
                            Centre for Bioinformatics
       Centre for Integrative Systems Biology at Imperial College London


                               24/02/2011




   Nathan Harmston                Review of Gene NER                  24/02/2011   1 / 15
Deluge/Flood/Tsunami of publications




The literature contains important knowledge generated by researchers; ideally
it is more than just a way to promote their careers.
Named Entity Recognition
Selection of sup1 and sup2 mutants in the yeast Saccharomyces cerevisiae on
cycloheximide containing media revealed classes of mutants that either are
completely unable to grow on YAPD without cycloheximide or need this drug
under high temperature incubation (30 or 36 degrees C). Some of these mutants
also exhibit the growth dependence on another antibiotic– trichodermin, and, at
the same time, the osmotic dependence. A hypothesis claiming that sup1 and
sup2 mutations cause conformational lability of yeast cytoplasmic ribosomes has
been put forward. It is also proposed that binding of cycloheximide and
trichodermin to the mutant ribosomes cause their conformational shift, which
compensates the functional defects.





     Genes have many different names, e.g. { P53, TP53, Hs.1845, TRP53 }
     Gene names are subject to morphological (transcription factor,
     transcriptional factor), orthographic (NF kappa B, NF kappaB),
     combinatorial (homolog of actin, actin homolog) and inflectional variation
     (antibody, antibodies).
     Some names overlap with ordinary English words: breathless, Not, That
     Deciding whether a term refers to a gene, an RNA or a protein is difficult:
     pspA, PspA
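
A minimal sketch of how orthographic variation might be collapsed before lookup. The `normalise` and `group_variants` helpers are hypothetical names introduced for illustration, not part of any tool in this review, and the rules (lowercasing, spelling out the Greek letter kappa, dropping spaces and hyphens) are assumptions chosen to cover the NF kappa B example only.

```python
import re
from collections import defaultdict


def normalise(name):
    """Collapse simple orthographic variants of a gene/protein name (illustrative rules)."""
    name = name.lower()
    # Spell out the Greek letter so "NF-κB" and "NF-kappaB" agree.
    name = name.replace("κ", "kappa")
    # Drop whitespace, hyphens, underscores and slashes.
    return re.sub(r"[\s\-_/]+", "", name)


def group_variants(names):
    """Group surface forms that share the same normalised key."""
    groups = defaultdict(set)
    for name in names:
        groups[normalise(name)].add(name)
    return dict(groups)


variants = group_variants(["NF kappa B", "NF kappaB", "NF-κB", "pspA", "PspA"])
for key, forms in sorted(variants.items()):
    print(key, sorted(forms))
```

Note the trade-off: lowercasing also merges pspA and PspA, so the gene/protein distinction carried by capitalisation is lost. Morphological and combinatorial variants (transcription factor vs. transcriptional factor, homolog of actin vs. actin homolog) are not handled by rules this simple.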
Problems

HUNK is associated with expression of Frizzled 2
    HUman Natural Killer
    A large piece of something without definite shape
    A well-built, sexually attractive man
    Hormonally Upregulated Neu-associated Kinase




Methods
   Dictionary
          BioThesaurus
          fuzzy matching techniques (Levenshtein, Jaro, Jaro-Winkler)
          BLAST
          Whatizit, Reflect.ws
   Rule/pattern-based matching
          good for regular nomenclatures such as yeast genes, but poor for fruit fly
          ABGENE
   Machine learning
          Classification
               Support Vector Machines - NLProt
               Logistic Regression -
          Sequence labelling
               Conditional Random Fields - ABNER, BANNER, JNET
               Hidden Markov Models - GENIA
   Hybrid methods
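
The dictionary approach with fuzzy matching can be sketched as below. This is a toy illustration, not how BioThesaurus, Whatizit or Reflect work internally: the four-entry dictionary, the `fuzzy_lookup` helper and the `max_distance` threshold are all assumptions made for the example.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def fuzzy_lookup(token, dictionary, max_distance=1):
    """Return dictionary names within max_distance edits of token, ignoring case."""
    token = token.lower()
    return sorted(name for name in dictionary
                  if levenshtein(token, name.lower()) <= max_distance)


# Hypothetical toy dictionary built from names used in these slides.
dictionary = ["sup1", "sup2", "TP53", "HUNK"]
print(fuzzy_lookup("sup-1", dictionary))
print(fuzzy_lookup("hunk", dictionary))
```

The second lookup shows the weakness of a pure dictionary method: the ordinary English word "hunk" matches the gene HUNK, so context is needed to decide whether a gene is meant.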

Corpus
    A corpus is a collection of documents in which named entities (NEs) have
    been manually marked up by a human expert.
    Corpora serve as benchmarks for comparing methods.
    Corpora serve as development/training sets for methods.
    They differ in size, Inter-Annotator Agreement (IAA), scope and evaluation scheme.
    Examples: BioCreative I GM, BioCreative II GM, NLPBA, GENIA

    Example sentence and its annotation (sentence ID | start end | mention):

    P07642544A0868  Conversely, treatment of human protein-tyrosine phosphatase
                    alpha-overexpressing cells with phenylarsine oxide led to a loss
                    of the constitutive NF-kappa B activity.

    P07642544A0868|127 135| NF-kappa B
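
A short sketch of reading the annotation above. It assumes the offsets are zero-based, inclusive, and count only non-whitespace characters, which is consistent with this example (127 and 135 pick out "NF-kappa B"). The helper names `parse_annotation` and `mention_from_offsets` are hypothetical.

```python
def parse_annotation(line):
    """Split 'ID|start end|mention' into (sentence_id, start, end, mention)."""
    sentence_id, offsets, mention = line.split("|", 2)
    start, end = (int(value) for value in offsets.split())
    return sentence_id, start, end, mention.strip()


def mention_from_offsets(sentence, start, end):
    """Return the text spanning non-whitespace characters start..end (inclusive)."""
    # Index in the sentence of every character that is not whitespace.
    positions = [i for i, char in enumerate(sentence) if not char.isspace()]
    return sentence[positions[start]:positions[end] + 1]


sentence = ("Conversely, treatment of human protein-tyrosine phosphatase "
            "alpha-overexpressing cells with phenylarsine oxide led to a loss "
            "of the constitutive NF-kappa B activity.")
sentence_id, start, end, mention = parse_annotation("P07642544A0868|127 135| NF-kappa B")
recovered = mention_from_offsets(sentence, start, end)
print(sentence_id, recovered)
```

Counting only non-whitespace characters makes the annotation independent of how the sentence is tokenised or re-spaced.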


Classification-based approaches
Conversely, treatment of human protein-tyrosine phosphatase alpha-overexpressing
cells with phenylarsine oxide led to a loss of the constitutive NF-kappa B activity.

                                                                           
 x_i = training data, e.g. the feature vector
       (gene after = 0, kappa = 1, constitutive = 1, noun phrase = 1)

 y_i = \begin{cases} 1, & \text{if } x_i \text{ belongs to class 1} \\ -1, & \text{if } x_i \text{ belongs to class 2} \end{cases}

     features: surface clues, syntactic properties of NEs, part of speech
     surrounding words
     matches against a dictionary
     typically a binary decision (SVMs are natively binary classifiers)
     Maximum Entropy, SVM, Naive Bayes
     each token is represented as an order-independent feature vector
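
A rough sketch of turning one token into a binary feature vector of the kind listed above. The feature names and the one-entry dictionary are assumptions for illustration; the noun-phrase feature is left out because it needs a chunker, and the resulting vector would be passed to a classifier such as an SVM, which is not shown.

```python
def token_features(tokens, i, dictionary):
    """Binary features for tokens[i]; each token is classified independently."""
    token = tokens[i]
    previous = tokens[i - 1].lower() if i > 0 else "<start>"
    following = tokens[i + 1].lower() if i + 1 < len(tokens) else "<end>"
    return {
        "gene_after": int(following == "gene"),                    # surrounding word
        "contains_kappa": int("kappa" in token.lower()),           # surface clue
        "previous_is_constitutive": int(previous == "constitutive"),
        "has_uppercase": int(any(char.isupper() for char in token)),
        "has_hyphen": int("-" in token),
        "in_dictionary": int(token.lower() in dictionary),         # dictionary match
    }


tokens = "led to a loss of the constitutive NF-kappa B activity".split()
dictionary = {"nf-kappa"}  # hypothetical one-entry lexicon
features = token_features(tokens, tokens.index("NF-kappa"), dictionary)
print(features)
```

Because the features for each token are computed and classified on their own, the decision for "B" knows nothing about the label given to "NF-kappa".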

Sequence labelling approaches
Conversely, treatment of human protein-tyrosine phosphatase alpha-overexpressing
cells with phenylarsine oxide led to a loss of the constitutive NF-kappa B activity.

     tags:     y1              y2          y3     y4
     tokens:   x1              x2          x3     x4
               constitutive    NF-kappa    B      activity


     consider the complete ordered sequence of tokens in a sentence
     predict the most probable sequence of tags for a given sequence of
     words in a sentence
     using semantic and lexical features
     takes order into account
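
Predicting the most probable tag sequence is usually done with Viterbi decoding, sketched below. The B/I/O tag set and all scores are hand-set assumptions for illustration; an HMM or CRF would learn the emission and transition scores from a corpus.

```python
def viterbi(tokens, tags, emit, trans, start):
    """Return the highest-scoring tag sequence under additive (log-space) scores."""
    # best[i][tag] = (score of best path ending in tag at position i, previous tag)
    best = [{tag: (start[tag] + emit(tag, tokens[0]), None) for tag in tags}]
    for i in range(1, len(tokens)):
        column = {}
        for tag in tags:
            prev, score = max(((p, best[i - 1][p][0] + trans[(p, tag)]) for p in tags),
                              key=lambda item: item[1])
            column[tag] = (score + emit(tag, tokens[i]), prev)
        best.append(column)
    # Follow back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(tokens) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


NEG_INF = float("-inf")
tags = ["B", "I", "O"]  # begin, inside, outside a gene mention


def emit(tag, token):
    """Toy emission score: capitalised or 'kappa' tokens look like gene names."""
    looks_like_gene = any(c.isupper() for c in token) or "kappa" in token
    return 0.0 if (tag != "O") == looks_like_gene else -2.0


trans = {(p, t): 0.0 for p in tags for t in tags}
trans[("O", "I")] = NEG_INF                 # a mention cannot start with I
trans[("B", "B")] = trans[("I", "B")] = -1.0  # prefer continuing a mention
start = {"B": 0.0, "I": NEG_INF, "O": 0.0}

tokens = ["constitutive", "NF-kappa", "B", "activity"]
path = viterbi(tokens, tags, emit, trans, start)
print(list(zip(tokens, path)))
```

The transition scores are what let the model label "B" as a continuation of "NF-kappa" rather than as a separate mention, which a per-token classifier cannot express.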
Performance - strict matching

Precision = \frac{TP}{TP + FP}    Recall = \frac{TP}{TP + FN}    F_1 = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}

    Tagger               Notes                    Precision    Recall        F1
    ABNER                NLPBA corpus              0.4867      0.5584      0.5201
    ABNER                BCI corpus                0.6749      0.5830      0.6256
    BANNER               Hepple POS + BCII         0.7605      0.7068      0.7327
    BANNER               MedPOS + BCII             0.7593      0.7195      0.7388
    GENIA Tagger                                   0.4665      0.5789      0.5166
    JNET                                           0.5074      0.3802      0.4347
    Whatizit             whatizitSwissprot         0.4980      0.3465      0.4087
    Reflect.ws                                      0.4678      0.3734      0.4153




Performance - sloppy matching

Precision = \frac{TP}{TP + FP}    Recall = \frac{TP}{TP + FN}    F_1 = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}

    Tagger               Notes                    Precision    Recall        F1
    ABNER                NLPBA corpus              0.6229      0.7146      0.6656
    ABNER                BCI corpus                0.8641      0.7465      0.8010
    BANNER               Hepple POS + BCII         0.8654      0.8043      0.8337
    BANNER               MedPOS + BCII             0.8596      0.8146      0.8365
    GENIA Tagger                                   0.5909      0.7334      0.6545
    JNET                                           0.5616      0.4208      0.4811
    Whatizit             whatizitSwissprot         0.5061      0.3522      0.4154
    Reflect.ws                                      0.4829      0.3854      0.4287
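
A sketch of how the two matching schemes might be scored. It assumes that strict matching requires identical span boundaries and that sloppy matching accepts any overlap with a gold span; the slides do not define the schemes, so treat both definitions, and the example spans, as assumptions.

```python
def overlaps(a, b):
    """True if inclusive spans a=(start, end) and b=(start, end) share a position."""
    return a[0] <= b[1] and b[0] <= a[1]


def evaluate(gold, predicted, sloppy=False):
    """Return (precision, recall, F1) for predicted spans against gold spans."""
    same = overlaps if sloppy else (lambda a, b: a == b)
    matched_predictions = sum(1 for p in predicted if any(same(p, g) for g in gold))
    matched_gold = sum(1 for g in gold if any(same(p, g) for p in predicted))
    precision = matched_predictions / len(predicted) if predicted else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = [(127, 135)]                # "NF-kappa B"
predicted = [(127, 134), (0, 9)]   # "NF-kappa" (boundary error) and a false positive
strict = evaluate(gold, predicted)
sloppy = evaluate(gold, predicted, sloppy=True)
print("strict:", strict)
print("sloppy:", sloppy)
```

A tagger that finds "NF-kappa" but misses the trailing "B" scores nothing under strict matching and gets credit under sloppy matching, which is one reason every tagger in the tables improves between the two.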




Availability
    Most are easily available and released under open source licenses.
    Variety of languages (primarily Java and C++)
    Most require some hacking to get them working
    OSCAR3 is a beast
    GENIA - very easy to write a SWIG wrapper so you can call it from Python
    JNET - needs a few hacks
    Web services: Reflect.ws (REST/SOAP), Whatizit (SOAP)

ABNER: http://pages.cs.wisc.edu/~bsettles/abner/
BANNER: http://banner.sourceforge.net/
GENIA Tagger: http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/tagger/
Linnaeus: http://linnaeus.sourceforge.net/
NLProt: http://cubic.bioc.columbia.edu/services/nlprot/
Whatizit: http://www.ebi.ac.uk/webservices/whatizit/
OSCAR3: http://sourceforge.net/projects/oscar3-chem/
JNET (JULIE Lab): http://julielab.de/
Literature based discovery - CRPS




[Figure: NF-κB]




Outcome
NF-κB is involved in CRPS (Complex Regional Pain Syndrome)
allows generation of new mechanistic hypotheses
suggests a new drug target

  Hettne et al. (2007) Applied information retrieval and multidisciplinary research:
                 new mechanistic hypotheses in Complex Regional Pain Syndrome
Finally...
     Standalone - BANNER
     Web services - who knows?
     Chemical NER - OSCAR (make sure you use the PubMed models)
     Species NER - Linnaeus
     So now you have the named entities - you need to map them to canonical
     identifiers; this is called gene normalisation (GN).
           ... but that's for another talk
     What are they doing? PPI extraction - is there a physical interaction
     between two genes in an abstract, e.g. binding between Akt2 and APPL
     Text mining is noisy and imperfect - but then so is manual curation (IAA)
     Text mining is a noisy (and biased) way of extracting information from noisy
     (and biased) text which represents the results of noisy (and biased)
     experiments carried out by researchers (who are probably noisy and biased).


Shameless self-promotion.......
Harmston, N., Filsell, W., and Stumpf, M. P. H. (2010) What the papers
say: text mining for genomics and systems biology. Hum Genomics, 5(1),
17-29

                    nathan.harmston07@imperial.ac.uk




                          Questions?

Molecular basis of insecticides resistance in insects with special reference ...
 
Molecular Markers: Major Applications in Insects
Molecular Markers: Major Applications in InsectsMolecular Markers: Major Applications in Insects
Molecular Markers: Major Applications in Insects
 
Genome editing tools article
Genome editing tools   articleGenome editing tools   article
Genome editing tools article
 
Drosophila Leon mutant:Study of Wing Development
Drosophila Leon mutant:Study of Wing DevelopmentDrosophila Leon mutant:Study of Wing Development
Drosophila Leon mutant:Study of Wing Development
 
Essay On Arabidopsis Thaliaa
Essay On Arabidopsis ThaliaaEssay On Arabidopsis Thaliaa
Essay On Arabidopsis Thaliaa
 
Research presentation-wd
Research presentation-wdResearch presentation-wd
Research presentation-wd
 

Recently uploaded

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 

Recently uploaded (20)

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 

Identifying genes and proteins in text: a short review of available tools and resources

  • 1. Identifying genes and proteins in text: a short review of available tools and resources Nathan Harmston Theoretical Systems Biology Centre for Bioinformatics Centre for Integrative Systems Biology at Imperial College London 24/02/2011 Nathan Harmston Review of Gene NER 24/02/2011 1 / 15
  • 2. Deluge/Flood/Tsunami of publications. The literature contains important knowledge generated by researchers, and ideally it is not just something to promote their careers.
  • 3. Named Entity Recognition. Example abstract: "Selection of sup1 and sup2 mutants in the yeast Saccharomyces cerevisiae on cycloheximide containing media revealed classes of mutants that either are completely unable to grow on YAPD without cycloheximide or need this drug under high temperature incubation (30 or 36 degrees C). Some of these mutants also exhibit the growth dependence on another antibiotic, trichodermin, and, at the same time, the osmotic dependence. A hypothesis claiming that sup1 and sup2 mutations cause conformational lability of yeast cytoplasmic ribosomes has been put forward. It is also proposed that binding of cycloheximide and trichodermin to the mutant ribosomes cause their conformational shift, which compensates the functional defects."
  • 7. Named Entity Recognition (same example abstract as slide 3), with the difficulties:
      - Genes have many different names, e.g. { P53, TP53, Hs.1845, TRP53 }.
      - Gene names are subject to morphological (transcription factor, transcriptional factor), orthographic (NF kappa B, NF kappaB), combinatorial (homolog of actin, actin homolog) and inflectional (antibody, antibodies) variation.
      - Some names overlap with normal English words: breathless, Not, That.
      - Deciding whether a term refers to a gene, an RNA or a protein is difficult: pspA, PspA.
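A minimal sketch of how the orthographic variation on this slide might be collapsed before a dictionary lookup. The function name `orthographic_key` is hypothetical, and the choice to strip only spaces and hyphens is an assumption; case is deliberately kept because the slide notes that pspA and PspA may denote different things.

```python
import re


def orthographic_key(name: str) -> str:
    """Collapse whitespace and hyphens but keep case (hypothetical helper)."""
    return re.sub(r"[\s\-]+", "", name)


# Orthographic variants from the slide, plus one hyphenated form.
variants = ["NF kappa B", "NF kappaB", "NF-kappa B"]
keys = {orthographic_key(v) for v in variants}
print(keys)

# Case is preserved, so the gene/protein spellings stay distinct.
print(orthographic_key("pspA") == orthographic_key("PspA"))
```

This only addresses orthographic variation; morphological, combinatorial and inflectional variants would need other handling.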
  • 11. Problems. "HUNK is associated with expression of Frizzled 2": which HUNK?
      - HUman Natural Killer
      - a large piece of something without definite shape
      - a well-built, sexually attractive man
      - Hormonally Upregulated Neu-associated Kinase
  • 12. Methods
      - Dictionary: BioThesaurus; fuzzy matching techniques (Levenshtein, Jaro, Jaro-Winkler); BLAST; Whatizit, Reflect.WS
      - Rule/pattern-based matching: good for things like yeast genes, but rubbish for fruit fly; ABGENE
      - Machine learning
          - Classification: Support Vector Machines - NLProt; Logistic Regression
          - Sequence labelling: Conditional Random Fields - ABNER, BANNER, JNET; Hidden Markov Models - GENIA
      - Hybrid methods
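The dictionary approach with fuzzy matching can be sketched as follows. This is an illustration under stated assumptions, not how any of the listed tools work: the dictionary contents, the `fuzzy_lookup` helper and the distance threshold of 1 are all made up for the example. Only the Levenshtein edit distance itself is the standard algorithm.

```python
def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def fuzzy_lookup(mention, dictionary, max_distance=1):
    """Return dictionary names within max_distance edits of the mention.

    Hypothetical helper; comparison is case-insensitive, which trades
    recall against the gene/protein case distinction.
    """
    return sorted(name for name in dictionary
                  if levenshtein(mention.lower(), name.lower()) <= max_distance)


# Toy dictionary built from names mentioned in the slides.
DICTIONARY = ["TP53", "TRP53", "HUNK", "Frizzled 2"]
print(fuzzy_lookup("TP-53", DICTIONARY))
```

Raising `max_distance` finds more variants but also more false positives, which is one reason dictionary methods tend to need extra filtering.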
  • 13. Corpus. A corpus is a collection of documents in which named entities (NEs) have been marked up manually by a human expert.
      - Corpora serve as benchmarks for comparing methods.
      - They serve as development/training sets for methods.
      - They differ in size, inter-annotator agreement (IAA), scope and evaluation scheme.
      - Examples: BioCreative I GM, BioCreative II GM, NLPBA, GENIA.
      - Example sentence: P07642544A0868 Conversely, treatment of human protein-tyrosine phosphatase alpha-overexpressing cells with phenylarsine oxide led to a loss of the constitutive NF-kappa B activity.
      - Its annotation: P07642544A0868|127 135| NF-kappa B
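The annotation line in the corpus example can be decoded with a few lines of code. The sketch below assumes the two numbers are 0-based, inclusive offsets that count only non-whitespace characters; that reading is an assumption, but it is consistent with the example, where it recovers the annotated text. The function names are hypothetical.

```python
SENTENCE = (
    "Conversely, treatment of human protein-tyrosine phosphatase "
    "alpha-overexpressing cells with phenylarsine oxide led to a loss "
    "of the constitutive NF-kappa B activity."
)
ANNOTATION = "P07642544A0868|127 135| NF-kappa B"


def parse_annotation(line):
    """Split 'id|start end|text' into its parts (hypothetical helper)."""
    sentence_id, offsets, text = line.split("|", 2)
    start, end = (int(x) for x in offsets.split())
    return sentence_id, start, end, text.strip()


def span_from_spacefree_offsets(sentence, start, end):
    """Map offsets that ignore whitespace back to a substring.

    Assumes start and end are 0-based and inclusive, counted over
    non-whitespace characters only.
    """
    positions = [i for i, ch in enumerate(sentence) if not ch.isspace()]
    return sentence[positions[start]:positions[end] + 1]


sentence_id, start, end, text = parse_annotation(ANNOTATION)
span = span_from_spacefree_offsets(SENTENCE, start, end)
print(sentence_id, repr(span), span == text)
```

Counting without whitespace makes the offsets independent of how a tool tokenises or re-spaces the sentence.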
  • 14. Classification-based approaches. Example sentence: "Conversely, treatment of human protein-tyrosine phosphatase alpha-overexpressing cells with phenylarsine oxide led to a loss of the constitutive NF-kappa B activity."
      - Training data: each example has a feature vector $x_i$ and a label $y_i = \begin{cases} 1 & \text{if } x_i \text{ belongs to class 1} \\ -1 & \text{if } x_i \text{ belongs to class 2} \end{cases}$
      - Example feature vector: $x_i = (0, 1, 1, 1)^\top$ for the features (gene after, kappa, constitutive, noun phrase).
      - Features: surface clues, syntactic properties of NEs, part of speech, surrounding words, matches against a dictionary.
      - Typically a binary decision (SVMs only work well for binary problems).
      - Classifiers: Maximum Entropy, SVM, Naive Bayes.
      - The input is an order-independent vector.
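A rough sketch of the kind of per-token features such a classifier might receive. The feature names, the `token_features` helper and the one-entry dictionary are all hypothetical; part-of-speech and noun-phrase features are left out because they would need an external tagger.

```python
def token_features(tokens, i, dictionary):
    """Build an order-independent feature dict for tokens[i] (illustrative only)."""
    tok = tokens[i]
    return {
        # Surface clues
        "has_digit": any(c.isdigit() for c in tok),
        "has_hyphen": "-" in tok,
        "has_upper": any(c.isupper() for c in tok),
        # Surrounding words, with sentence-boundary placeholders
        "prev_word": tokens[i - 1].lower() if i > 0 else "<s>",
        "next_word": tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>",
        # Match against a dictionary of known names
        "in_dictionary": tok.lower() in dictionary,
    }


tokens = ["the", "constitutive", "NF-kappa", "B", "activity"]
toy_dictionary = {"nf-kappa"}  # made-up dictionary for the example
features = token_features(tokens, 2, toy_dictionary)
print(features)
```

Each token is classified from its own feature dict, so the decision for "B" does not see the label chosen for "NF-kappa"; the sequence labelling approaches address that.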
  • 15. Sequence labelling approaches. Example sentence: "Conversely, treatment of human protein-tyrosine phosphatase alpha-overexpressing cells with phenylarsine oxide led to a loss of the constitutive NF-kappa B activity."
      - Diagram: a chain of tags y1, y2, y3, y4 over the tokens x1, x2, x3, x4 = constitutive, NF-kappa, B, activity.
      - Consider the complete ordered sequence of tokens in a sentence.
      - Predict the most probable sequence of tags for a given sequence of words in a sentence, using semantic and lexical features.
      - Takes order into account.
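Finding the most probable tag sequence is commonly done with Viterbi decoding in linear-chain models such as HMMs and CRFs. The sketch below is a toy under stated assumptions: the B/I/O tag set and every emission and transition score are invented for illustration and come from no trained model or from the slides.

```python
def viterbi(emissions, transitions, tags):
    """Return the tag path maximising the sum of emission and transition scores.

    emissions: one dict per token mapping tag -> score.
    transitions: dict mapping (previous_tag, current_tag) -> score.
    """
    best = [{t: emissions[0][t] for t in tags}]
    back = []
    for em in emissions[1:]:
        scores, pointers = {}, {}
        for cur in tags:
            prev_tag = max(tags, key=lambda p: best[-1][p] + transitions[(p, cur)])
            scores[cur] = best[-1][prev_tag] + transitions[(prev_tag, cur)] + em[cur]
            pointers[cur] = prev_tag
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


TAGS = ["O", "B", "I"]  # outside, begin, inside a gene mention
TOKENS = ["constitutive", "NF-kappa", "B", "activity"]
# Invented scores: taken alone, the token "B" slightly prefers tag O.
EMISSIONS = [
    {"O": 0.0, "B": -2.0, "I": -2.0},
    {"O": -2.0, "B": 0.0, "I": -1.0},
    {"O": -0.5, "B": -1.0, "I": -1.0},
    {"O": 0.0, "B": -3.0, "I": -3.0},
]
# Invented transitions: O -> I is heavily penalised.
TRANSITIONS = {
    ("O", "O"): 0.0, ("O", "B"): 0.0, ("O", "I"): -10.0,
    ("B", "O"): -1.0, ("B", "B"): -2.0, ("B", "I"): 0.0,
    ("I", "O"): 0.0, ("I", "B"): -2.0, ("I", "I"): 0.0,
}

path = viterbi(EMISSIONS, TRANSITIONS, TAGS)
independent = [max(TAGS, key=lambda t: em[t]) for em in EMISSIONS]
print(list(zip(TOKENS, path)))         # tags O, B, I, O
print(list(zip(TOKENS, independent)))  # tags O, B, O, O
```

With these toy scores, tagging each token independently splits the mention after "NF-kappa", while decoding the whole sequence keeps "NF-kappa B" together.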
  • 17. Performance - strict matching
      $\mathrm{Precision} = \frac{TP}{TP + FP}$, $\mathrm{Recall} = \frac{TP}{TP + FN}$, $F_1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$

      Tagger         Notes               Precision   Recall   F1
      ABNER          NLPBA corpus        0.4867      0.5584   0.5201
      ABNER          BCI corpus          0.6749      0.5830   0.6256
      BANNER         Hepple POS + BCII   0.7605      0.7068   0.7327
      BANNER         MedPOS + BCII       0.7593      0.7195   0.7388
      GENIA Tagger                       0.4665      0.5789   0.5166
      JNET                               0.5074      0.3802   0.4347
      Whatizit       whatizitSwissprot   0.4980      0.3465   0.4087
      Reflect.ws                         0.4678      0.3734   0.4153
  • 18. Performance - sloppy matching
      $\mathrm{Precision} = \frac{TP}{TP + FP}$, $\mathrm{Recall} = \frac{TP}{TP + FN}$, $F_1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$

      Tagger         Notes               Precision   Recall   F1
      ABNER          NLPBA corpus        0.6229      0.7146   0.6656
      ABNER          BCI corpus          0.8641      0.7465   0.8010
      BANNER         Hepple POS + BCII   0.8654      0.8043   0.8337
      BANNER         MedPOS + BCII       0.8596      0.8146   0.8365
      GENIA Tagger                       0.5909      0.7334   0.6545
      JNET                               0.5616      0.4208   0.4811
      Whatizit       whatizitSwissprot   0.5061      0.3522   0.4154
      Reflect.ws                         0.4829      0.3854   0.4287
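The difference between the strict and sloppy scores can be illustrated with a small evaluation sketch. The slides do not define sloppy matching; here it is assumed to mean that a predicted span counts if it overlaps a gold span at all, while strict matching requires identical boundaries. The spans are made-up inclusive character offsets, not data from any corpus.

```python
def overlaps(a, b):
    """True if two inclusive (start, end) spans share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def evaluate(gold, predicted, sloppy=False):
    """Return (precision, recall, F1) under strict or overlap-based matching."""
    match = overlaps if sloppy else (lambda a, b: a == b)
    matched_pred = sum(1 for p in predicted if any(match(p, g) for g in gold))
    matched_gold = sum(1 for g in gold if any(match(p, g) for p in predicted))
    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical spans: one boundary error, one exact hit, one false positive.
gold = [(127, 135), (20, 26)]
predicted = [(127, 134), (20, 26), (50, 55)]

strict = evaluate(gold, predicted)
sloppy = evaluate(gold, predicted, sloppy=True)
print("strict P/R/F1:", strict)
print("sloppy P/R/F1:", sloppy)
```

A tagger that finds the right mention but misses a boundary by one token is penalised twice under strict matching (a false positive and a false negative), which may explain part of the gap between the two tables.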
  • 19. Availability
      - Most are easily available and released under open source licenses.
      - Variety of languages (primarily Java and C++).
      - Most require hacking to get them working:
          - OSCAR3 is a beast.
          - GENIA: very easy to write a SWIG wrapper so you can call it from Python.
          - JNET: a few hacks.
      - Web services: ReflectWS (REST/SOAP), Whatizit (SOAP).
      - Links:
          http://pages.cs.wisc.edu/~bsettles/abner/
          http://banner.sourceforge.net/
          http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/tagger/
          http://linnaeus.sourceforge.net/
          http://cubic.bioc.columbia.edu/services/nlprot/
          http://www.ebi.ac.uk/webservices/whatizit/
          http://sourceforge.net/projects/oscar3-chem/
          http://julielab.de/
  • 22. Literature based discovery - CRPS, NF-κB
      - Outcome: NF-κB is involved in CRPS.
      - Allows generation of new mechanistic hypotheses.
      - New drug target.
      - Hettne et al. (2007) Applied information retrieval and multidisciplinary research: new mechanistic hypotheses in Complex Regional Pain Syndrome.
  • 26. Finally...
      - For standalone use: BANNER.
      - Web services: who knows?
      - Chemical NER: OSCAR (make sure you use the PubMed models).
      - Species NER: Linnaeus.
      - So now you have the named entities, you need to map them to canonical identifiers; this is called gene normalisation (GN) ... but that's for another talk.
      - What are they doing? PPI extraction: is there a physical interaction between two genes in an abstract? Example: binding between Akt2 and APPL.
      - Text mining is noisy and imperfect, but then so is manual curation (IAA).
      - Text mining is a noisy (and biased) way of extracting information from noisy (and biased) text which represents the results of noisy (and biased) experiments carried out by researchers (who are probably noisy and biased).
  • 28. Shameless self-promotion
      - Harmston, N., Filsell, W., and Stumpf, M. P. H. (2010) What the papers say: text mining for genomics and systems biology. Hum Genomics, 5(1), 17-29.
      - nathan.harmston07@imperial.ac.uk
      - Questions?