Gene Wiki and Wikimedia Foundation SPARQL workshop

Introduction to the Gene Wiki project, the centralized model organism project, and the use of SPARQL.

Published in: Science

  1. 1. CURATING BIOMEDICAL KNOWLEDGE ON WIKIDATA AND WIKIPEDIA GENE WIKI Benjamin Good The Scripps Research Institute, La Jolla, California bgood@scripps.edu Twitter: @bgood
  2. 2. ACKNOWLEDGEMENTS: Gene Wikidata Team • Andrew Su (Scripps) • Andra Waagmeester (Micelio) • Sebastian Burgstaller (Scripps) • Tim Putman (Scripps) – speaking next • Julia Turner (Scripps) • Elvira Mitraka (U Maryland) • Justin Leong (UBC) • Lynn Schriml (U Maryland) • Paul Pavlidis (UBC) • Ginger Tsueng (Scripps)
  3. 3. “knowledge” • A lot • Important • Text
  4. 4. More than 2 articles published/minute
  5. 5. Gene Wiki: Filtering and summarizing PubMed • Documents • Concepts
  6. 6. GENE WIKI • Protein structure • Symbols and identifiers • Tissue expression pattern • Gene Ontology annotations • Links to structured databases • Gene summary • Protein interactions • Linked references • Bot! (Huss, PLoS Biol, 2008)
  7. 7. GENE WIKI TIMELINE Project Starts https://en.wikipedia.org/wiki/Portal:Gene_Wiki
  8. 8. IMPACT OF WIKIDATA ON WIKIPEDIA • Gene Wiki Version 1 (1 of these for every gene): {{GNF_Protein_box | Name = Reelin| image = | image_source = | PDB = {{PDB2|4AD9}} | HGNCid = 18512 | MGIid = | Symbol = LACTB2 | AltSymbols =; CGI-83 | IUPHAR = | ChEMBL = | OMIM = None | ECnumber = | Homologene = 9349 | GeneAtlas_image1 = | GeneAtlas_image2 = | GeneAtlas_image3 = | Protein_domain_image = | Function = {{GNF_GO|id=GO:0005515 |text = protein binding}} {{GNF_GO|id=GO:0016787 |text = hydrolase activity}} {{GNF_GO|id=GO:0046872 |text = metal ion binding}} | Component = {{GNF_GO|id=GO:0005739 |text = mitochondrion}} | Process = {{GNF_GO|id=GO:0008152 |text = metabolic process}} | Hs_EntrezGene = 51110 | Hs_Ensembl = ENSG00000147592 | Hs_RefseqmRNA = NM_016027 | Hs_RefseqProtein = NP_057111 | Hs_GenLoc_db = hg38 | Hs_GenLoc_chr = 8 | Hs_GenLoc_start = 70635318 | Hs_GenLoc_end = 70669174 | Hs_Uniprot = Q53H82 | Mm_EntrezGene = 212442 | Mm_Ensembl = ENSMUSG00000025937 | Mm_RefseqmRNA = NM_145381 | Mm_RefseqProtein = NP_663356 | Mm_GenLoc_db = mm10 | Mm_GenLoc_chr = 1 | Mm_GenLoc_start = 13623330 | Mm_GenLoc_end = 13660546 | Mm_Uniprot = Q99KR3 | path = PBB/51110}} • Gene Wiki Version 2: {{Infobox gene}} • All data in Wikidata • 1 Lua script works for all 11,000+ genes
  9. 9. IMPACT BEYOND WIKIPEDIA = SPARQL
  10. 10. Sample of current biomedical content • All human, mouse genes and proteins • All Gene Ontology terms (describe function) • All Human Disease Ontology terms • All FDA approved drugs • 109+ reference microbial genomes • References: Burgstaller-Muehlbacher et al. (2016) Database; Mitraka et al. (2015) Semantic Web Applications for the Life Sciences; Putman et al. (2016) Database
  11. 11. http://tinyurl.com/biowiki-sparql Sample queries that are currently possible: • “Where in the cell is the Reelin protein expressed?” • “What diseases are treated by Metformin?” • “What diseases might be treated by Metformin?” http://query.wikidata.org
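As a rough illustration of the second sample question on the slide above, the sketch below assembles a SPARQL query that could be pasted into http://query.wikidata.org. It is not the query behind the tinyurl link. It assumes that the drug can be found by its English label and that Wikidata property P2175 ("medical condition treated") carries the drug-to-disease statements; the function name `treated_by_query` is hypothetical.

```python
def treated_by_query(drug_label: str) -> str:
    """Return SPARQL text listing diseases that the named drug treats.

    Assumptions (not from the slides): the drug is matched by its English
    label, and P2175 ("medical condition treated") links drug to disease.
    The label is inserted as-is, so it must not contain double quotes.
    """
    return f'''
SELECT DISTINCT ?disease ?diseaseLabel WHERE {{
  ?drug rdfs:label "{drug_label}"@en ;
        wdt:P2175 ?disease .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
'''.strip()


if __name__ == "__main__":
    # Print the query so it can be copied into the query service.
    print(treated_by_query("metformin"))
```

The `rdfs:`, `wdt:`, `wikibase:` and `bd:` prefixes are predeclared by the Wikidata query service, so the text needs no PREFIX lines there.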
  12. 12. Example question: repurposing Metformin http://tinyurl.com/zem3oxz • Metformin → interacts with → protein (Solute carrier family 22 member 3) → encoded by → gene (SLC22A3) → genetic association → ?disease (prostate cancer) • Metformin → might treat? → ?disease
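The repurposing question on the slide above is a three-hop walk through the graph: drug to protein, protein to gene, gene to disease. A minimal sketch of such a path query follows. The property IDs are assumptions rather than something the slide states: P129 ("physically interacts with"), P702 ("encoded by") and P2293 ("genetic association"). The helper `chain_query` is a hypothetical name, and the query behind the tinyurl link may differ.

```python
def chain_query(start_label, hops):
    """Build SPARQL that follows a chain of (property, variable) hops.

    Each hop links the previous node to a new variable, so the query
    walks a path through the graph starting from a labelled item.
    """
    lines = [f'  ?start rdfs:label "{start_label}"@en .']
    subject = "?start"
    for prop, var in hops:
        lines.append(f"  {subject} wdt:{prop} ?{var} .")
        subject = f"?{var}"  # the next hop starts where this one ended
    select = " ".join(f"?{var}" for _, var in hops)
    return "SELECT DISTINCT " + select + " WHERE {\n" + "\n".join(lines) + "\n}"


# Assumed property IDs for the drug -> protein -> gene -> disease path.
REPURPOSING_HOPS = [
    ("P129", "protein"),   # drug physically interacts with protein
    ("P702", "gene"),      # protein is encoded by gene
    ("P2293", "disease"),  # gene has a genetic association with disease
]

if __name__ == "__main__":
    print(chain_query("metformin", REPURPOSING_HOPS))
```

Diseases returned this way are candidates only: the path shows that the drug touches a protein whose gene is associated with the disease, not that the drug treats it.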
  13. 13. A SPARQL powered user interface for consuming and editing organism data in Wikidata Timothy E. Putman Ph.D. The Scripps Research Institute, La Jolla, California tputman@scripps.edu Twitter: @putmantime
  14. 14. ACKNOWLEDGEMENTS: Gene Wikidata Team • Andrew Su (Scripps) • Benjamin Good – just spoke • Andra Waagmeester (Micelio) • Sebastian Burgstaller (Scripps) • Elvira Mitraka (U Maryland) • Julia Turner (Scripps) • Justin Leong (UBC) • Lynn Schriml (U Maryland) • Paul Pavlidis (UBC) • Ginger Tsueng (Scripps)
  15. 15. Centralizing and Linking the Data • Bacteria (Q10876, domain) • TRPA (Q21153984, protein) • C. trachomatis (Q131065, species) • trpA (Q21153861, gene) • C. trachomatis 434/BU (Q20800254, strain)
  16. 16. C. trachomatis (Q131065, species) • trpA (Q21153861, gene) • TRPA (Q21153984, protein) • C. trachomatis 434/BU (Q20800254, strain)
  17. 17. trpA (Q21153861, gene) • TRPA (Q21153984, protein) • C. trachomatis 434/BU (Q20800254, strain) • C. trachomatis (Q131065, species)
  18. 18. C. trachomatis (Q131065, species) • TRPA (Q21153984, protein) • C. trachomatis 434/BU (Q20800254, strain) • trpA (Q21153861, gene)
  19. 19. C. trachomatis (Q131065, species) • trpA (Q21153861, gene) • C. trachomatis 434/BU (Q20800254, strain) • TRPA (Q21153984, protein)
  20. 20. SPARQL Query • On page load • jQuery execution of SPARQL query as AJAX GET request
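The slide above describes the browser side, where jQuery sends the query as an AJAX GET request. The same request can be sketched outside the browser; the version below uses only the Python standard library and is an illustration, not the application's code. It assumes the public endpoint https://query.wikidata.org/sparql and the standard SPARQL JSON results format; the function names and the User-Agent string are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed endpoint: the public Wikidata query service.
ENDPOINT = "https://query.wikidata.org/sparql"


def build_request(sparql: str) -> Request:
    """Encode the query into the URL, so the request is a plain GET."""
    url = ENDPOINT + "?" + urlencode({"query": sparql})
    headers = {
        "Accept": "application/sparql-results+json",
        "User-Agent": "sparql-workshop-example/0.1",  # hypothetical client name
    }
    return Request(url, headers=headers)


def parse_bindings(payload: str) -> list:
    """Flatten SPARQL JSON results into one {variable: value} dict per row."""
    data = json.loads(payload)
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in data["results"]["bindings"]
    ]


def run_query(sparql: str) -> list:
    """Send the GET request and return the parsed rows (needs network access)."""
    with urlopen(build_request(sparql)) as response:
        return parse_bindings(response.read().decode("utf-8"))
```

Calling `run_query` with any SELECT query returns a list of row dictionaries keyed by variable name, roughly the shape a jQuery success callback would work with after reading `results.bindings`.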
  21. 21. • On organism select • Get all gene and protein data for organism by taxid
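One way the per-organism lookup on the slide above could be phrased is sketched below. It is not the application's actual query. It assumes three Wikidata properties: P685 (NCBI taxonomy ID) to find the organism, P703 ("found in taxon") to find its genes, and P688 ("encodes") to reach each gene's protein. The function name is hypothetical.

```python
def genes_and_proteins_query(taxid: str) -> str:
    """Return SPARQL text for the genes, and their proteins, of one organism.

    Assumptions (not from the slides): P685 holds the NCBI taxonomy ID,
    P703 links a gene to its taxon, and P688 links a gene to its protein.
    """
    if not (taxid.isascii() and taxid.isdigit()):
        # Taxonomy IDs are numeric; rejecting anything else keeps stray
        # text from being spliced into the query.
        raise ValueError(f"taxid must be numeric, got {taxid!r}")
    return f'''
SELECT ?gene ?geneLabel ?protein ?proteinLabel WHERE {{
  ?taxon wdt:P685 "{taxid}" .
  ?gene wdt:P703 ?taxon ;
        wdt:P688 ?protein .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
'''.strip()
```

Because the pattern requires `wdt:P688`, genes with no recorded protein are left out; wrapping that triple in OPTIONAL would keep them.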
  22. 22. QUESTIONS?