Google Project Page: https://code.google.com/p/bionlp-sadi/

 Project Demo Page: https://cbakerlab:8080/p/bionlp-sadi/




   Presenter: Ahmad C. Bukhari




   Motivation and Introduction
   Past Research Work
   Proposed Methodology
       System architecture
       System design
       Ontology Development
       SADI Service development
   Demo and code view
   Experiments and Results
   Conclusion and Future work
   References

    Scientific literature is the most up-to-date source of information

    Scientific literature production is growing explosively

    The Internet is full of biology-related databases and search
    engines

      Text records are provided by PubMed and OMIM.
      Sequence data is provided by GenBank (DNA) and UniProt (protein).
      Protein structures are provided by PDB, SCOP, and CATH.

   Thousands of documents are produced weekly: it is impossible to read
    all the published documents

   Several solutions were developed based on AI techniques

   They lost significance because new terms kept emerging and their
    mechanisms were static

   NLP emerged as a possible solution in the past decade

   NLP was widely adopted by scientists

   Several NLP-based applications are available on the Internet

   We introduced a semantically rich, interoperable suite of BioNLP
    services based on the SADI framework.

   It exploits NLP technologies to extract biologically useful
    information from scientific documents.

   It can present the extracted information so that it is reusable,
    searchable and interoperable.

   It can display the output in an integrated format, which can
    lead to better biological system analysis.

Existing text mining services




Existing text mining services with web services

  •U-Compare
  •Whatizit
  •EBIMed


   The scientific community is looking for a sophisticated solution that
    can handle biological data interoperability, usability and integration
    challenges.

   We coupled useful biological NLP techniques with the SADI
    framework to cope with biological information logistics issues.

   The proposed solution exploits NLP technologies to extract
    biologically relevant information with semantic support.

   The proposed solution provides output in a reusable, searchable
    and interoperable format.
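
One way to picture "reusable, searchable and interoperable" output is as RDF statements rather than tool-specific XML. The sketch below is only an illustration: the `http://example.org/bionlp#` namespace, the property names `surfaceText` and `foundIn`, and the function names are assumptions made here, not the project's actual ontology or service code. It serializes one extracted mention as N-Triples using only the standard library.

```python
# Hypothetical namespace and property names, chosen only for illustration.
EX = "http://example.org/bionlp#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(text):
    """Escape the characters that N-Triples requires inside a string literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def mention_to_ntriples(mention_id, entity_type, surface_text, document_id):
    """Serialize one extracted mention as three N-Triples statements."""
    subject = f"<{EX}{mention_id}>"
    return [
        f"{subject} <{RDF_TYPE}> <{EX}{entity_type}> .",
        f'{subject} <{EX}surfaceText> "{escape_literal(surface_text)}" .',
        f"{subject} <{EX}foundIn> <{EX}{document_id}> .",
    ]


if __name__ == "__main__":
    # Print the statements for one drug mention found in a document.
    for line in mention_to_ntriples("mention1", "DrugMention", "Amoxicillin", "doc42"):
        print(line)
```

Because every statement uses full IRIs, output from different services could be merged into one graph and queried together, which is the kind of integration the slide describes.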

[Figure; labels: User Interaction Layer, SADI services suite]
[Figure; labels:]
   REST, XML, SOAP, or WSDL
   XML, RDF, OWL, RDFS
   SWS + BNLP
   NLP + WS = XML output
   KLEIO, U-Compare, GENIA, FACTA+, etc.
[Figure; labels: Deal with Annotation, All document-related concepts, Feature Modeling]
   mutationFinder
   DrugExtractor (enhanced)
   DrugDrug Interaction (80% complete)
   Drug2Food Interaction (business logic complete)
   Pmid2pdf (enhanced)
   Pdf2ascii (upgraded overall; the existing version had many bugs)
   SADI client-level integration service
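
To give a feel for what a mutation-extraction service does, here is a minimal sketch that finds point mutations written as wild-type residue, position, mutant residue (for example `A123T`). The single regular expression and the function name `find_point_mutations` are simplifying assumptions for illustration; they are not the rule set used by the mutationFinder service listed above.

```python
import re

# One-letter codes of the 20 standard amino acids.
ONE_LETTER = "ACDEFGHIKLMNPQRSTVWY"

# Simplified pattern for <wild-type><position><mutant>, e.g. "A123T".
# This is an illustrative assumption, not the real service's rule set.
POINT_MUTATION = re.compile(rf"\b([{ONE_LETTER}])([1-9][0-9]*)([{ONE_LETTER}])\b")


def find_point_mutations(text):
    """Return (wild_type, position, mutant) tuples for each match in the text."""
    results = []
    for match in POINT_MUTATION.finditer(text):
        wild_type, position, mutant = match.groups()
        if wild_type != mutant:  # skip hits such as "A12A" that change nothing
            results.append((wild_type, int(position), mutant))
    return results


if __name__ == "__main__":
    print(find_point_mutations("The A123T and G45D variants were tested."))
```

A pattern this simple will also match look-alike tokens such as cell-line names, so a real service would need additional context rules or filtering.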
Tools and technologies used
                              •Java
                              •Servlet
                              •RDF
                              •SPARQL
                              •JSP
                              •JSF
                              •JavaScript
                              •XHTML
                              •Several third-party libraries




Demo and Code View


   Show where the drug Amoxicillin (DB01060) has a
    positive effect against higher serum levels
   Give me the sentences in which a mutation and a
    drug name occur together.
   Extract all the drug names from the text and
    show me the interactions (if any) among
    the drugs
   Tell me which foods interact badly
    with the drug Cytarabine
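
The second query above (sentences where a mutation and a drug name co-occur) can be sketched as follows. Everything here is an assumption for illustration: the two-entry drug dictionary, the simplified mutation pattern, the naive sentence splitter and the function names. The actual system answers such queries by combining its SADI services.

```python
import re

# Toy drug dictionary and simplified mutation pattern: both are assumptions.
DRUG_NAMES = {"amoxicillin", "cytarabine"}
MUTATION = re.compile(r"\b[ACDEFGHIKLMNPQRSTVWY][1-9][0-9]*[ACDEFGHIKLMNPQRSTVWY]\b")


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cooccurring_sentences(text):
    """Return sentences that mention both a known drug and a point mutation."""
    hits = []
    for sentence in split_sentences(text):
        words = {w.lower() for w in re.findall(r"[A-Za-z]+", sentence)}
        if words & DRUG_NAMES and MUTATION.search(sentence):
            hits.append(sentence)
    return hits


if __name__ == "__main__":
    sample = ("The A123T variant reduced sensitivity to cytarabine. "
              "Amoxicillin was also given.")
    for sentence in cooccurring_sentences(sample):
        print(sentence)
```

Keeping the drug lookup and the mutation matcher as separate steps mirrors the idea of composing independent services, so either step could be swapped for a stronger extractor without changing the other.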
Consolidated Output Generated by the System
   Proposed a generalized architecture for semantic interoperability
    and integration among BNLP tools

   Performed several experiments by designing different corpora
    and by choosing different combinations of services

   In most cases, the system generated results that met
    our requirements

   As future work, we will try to enhance the performance of the
    system by refining the algorithms

   A registry feature will be added to give users more freedom to work.

   Topic Finding

   Limited availability of tools

   Development challenges (countless)

   Integration with the web

   Finding a case study (still open)
   E. Gatial, Z. Balogh, M. Ciglan, L. Hluchy, Focused web crawling mechanism based on page relevance, In: Proceedings of Information Technologies Applications and Theory (ITAT 2005), 2005, pp. 41–45.
   N.F. Noy, D.L. McGuinness, Ontology development 101: a guide to creating your first ontology. http://protege.stanford.edu/publications/ontology_development/ontology101-noy-mcguinness.htm
   H. Cunningham, Y. Wilks, R.J. Gaizauskas, GATE, a General Architecture for Text Engineering, Computers and the Humanities (2002), 1057-1060.
   R. Subhashini, V.J.S. Kumar, Shallow NLP techniques for noun phrase extraction, In: Proceedings of Trendz in Information Sciences & Computing (TISC), 2010, pp. 73-77.
   S. Nasrolahi, M. Nikdast, M. Boroujerdi, The semantic web: a new approach for future world wide web, In: Proceedings of World Academy of Science, Engineering and Technology, 2009, pp. 1149-1154.
   A.C. Bukhari, Y.G. Kim, Exploiting the Heavyweight Ontology with Multi-Agent System Using Vocal Command System: A Case Study on E-Mall, International Journal of Advancements in Computing Technology 3 (2011) 233-241.
   A.C. Bukhari, Y.G. Kim, Ontology-assisted automatic precise information extractor for visually impaired inhabitants, Artificial Intelligence Review (2005), ISSN: 0269-2821.
   D.H. Fudholi, N. Maneerat, R. Varakulsiripunth, Y. Kato, Application of Protégé, SWRL and SQWRL in fuzzy ontology-based menu recommendation, International Symposium on Intelligent Signal Processing and Communication Systems, 2009, pp. 631-634.
   W.A. Baumgartner, K.B. Cohen, L. Fox, G. Acquaah-Mensah, L. Hunter, Manual annotation is not sufficient for curating genomic databases, Bioinformatics 23 (2007) i41-i48.
   J. Laurila, N. Naderi, R. Witte, A. Riazanov, A. Kouznetsov, C.J.O. Baker, Algorithms and semantic infrastructure for mutation impact extraction and grounding, BMC Genomics 11 (Suppl 4) (2010) S24.
Many Thanks



