CURATING BIOMEDICAL KNOWLEDGE ON WIKIDATA AND WIKIPEDIA
GENE WIKI
Benjamin Good
The Scripps Research Institute,
La Jolla, California
bgood@scripps.edu
Twitter: @bgood
Gene Wikidata Team
Andrew Su (Scripps)
Andra Waagmeester (Micelio)
Sebastian Burgstaller (Scripps)
Tim Putman (Scripps) – speaking next
Julia Turner (Scripps)
Elvira Mitraka (U Maryland)
Justin Leong (UBC)
Lynn Schriml (U Maryland)
Paul Pavlidis (UBC)
Ginger Tsueng (Scripps)
ACKNOWLEDGEMENTS
“knowledge”
• A lot
• Important
• Text
More than 2 articles published/minute
Documents
Concepts
Gene Wiki: Filtering and summarizing PubMed
GENE WIKI
• Protein structure
• Symbols and identifiers
• Tissue expression pattern
• Gene Ontology annotations
• Links to structured databases
• Gene summary
• Protein interactions
• Linked references
Huss, PLoS Biol, 2008
Bot!
GENE WIKI TIMELINE
Project starts
https://en.wikipedia.org/wiki/Portal:Gene_Wiki
Gene Wiki
Version 1.
{{GNF_Protein_box | Name = Reelin| image = |
image_source = | PDB = {{PDB2|4AD9}} | HGNCid = 18512 |
MGIid = | Symbol = LACTB2 | AltSymbols =; CGI-83 |
IUPHAR = | ChEMBL = | OMIM = None | ECnumber = |
Homologene = 9349 | GeneAtlas_image1 = |
GeneAtlas_image2 = | GeneAtlas_image3 = |
Protein_domain_image = | Function =
{{GNF_GO|id=GO:0005515 |text = protein binding}}
{{GNF_GO|id=GO:0016787 |text = hydrolase activity}}
{{GNF_GO|id=GO:0046872 |text = metal ion binding}} |
Component = {{GNF_GO|id=GO:0005739 |text =
mitochondrion}} | Process = {{GNF_GO|id=GO:0008152
|text = metabolic process}} | Hs_EntrezGene = 51110 |
Hs_Ensembl = ENSG00000147592 | Hs_RefseqmRNA =
NM_016027 | Hs_RefseqProtein = NP_057111 |
Hs_GenLoc_db = hg38 | Hs_GenLoc_chr = 8 |
Hs_GenLoc_start = 70635318 | Hs_GenLoc_end = 70669174
| Hs_Uniprot = Q53H82 | Mm_EntrezGene = 212442 |
Mm_Ensembl = ENSMUSG00000025937 |
Mm_RefseqmRNA = NM_145381 | Mm_RefseqProtein =
NP_663356 | Mm_GenLoc_db = mm10 | Mm_GenLoc_chr =
1 | Mm_GenLoc_start = 13623330 | Mm_GenLoc_end =
13660546 | Mm_Uniprot = Q99KR3 | path = PBB/51110}}
Gene Wiki
Version 2.
{{Infobox gene}}
• All data in Wikidata
• 1 Lua script works for all 11,000+ genes
(1 of these for every gene)
IMPACT OF WIKIDATA ON WIKIPEDIA
IMPACT BEYOND WIKIPEDIA
= SPARQL
Sample of current biomedical content
• All human, mouse genes and proteins
• All Gene Ontology terms (describe function)
• All Human Disease Ontology terms
• All FDA approved drugs
• 109+ reference microbial genomes
Burgstaller-Muehlbacher et al (2016) Database
Mitraka et al (2015) Semantic Web Applications for the Life Sciences
Putman et al (2016) Database
http://tinyurl.com/biowiki-sparql
Sample queries that are currently possible:
• “Where in the cell is the Reelin protein expressed?”
• “What diseases are treated by Metformin?”
• “What diseases might be treated by Metformin?”
http://query.wikidata.org
Example question: repurposing Metformin
http://tinyurl.com/zem3oxz
Metformin → interacts with → protein (Solute carrier family 22 member 3)
protein → encoded by → gene (SLC22A3)
gene → genetic association → ?disease (prostate cancer)
Metformin → might treat? → ?disease
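The chain on this slide (drug, interacting protein, encoding gene, genetically associated disease) can be sketched as a query builder. This is a minimal sketch, not the query behind the tinyurl link, which is not reproduced here. The property IDs (P129 "physically interacts with", P702 "encoded by", P2293 "genetic association", P2175 "medical condition treated"), the Metformin item Q19484, and the filter that drops diseases already recorded as treated are assumptions about Wikidata, and `repurposing_query` is a hypothetical name.

```python
import re


def repurposing_query(drug_qid: str, limit: int = 50) -> str:
    """Build a SPARQL query for diseases a drug *might* treat.

    Assumed path: drug -> protein -> gene -> disease.
    """
    if not re.fullmatch(r"Q\d+", drug_qid):
        raise ValueError(f"not a Wikidata item ID: {drug_qid!r}")
    return f"""
SELECT DISTINCT ?protein ?gene ?disease ?diseaseLabel WHERE {{
  wd:{drug_qid} wdt:P129 ?protein .   # drug physically interacts with protein
  ?protein wdt:P702 ?gene .           # protein is encoded by gene
  ?gene wdt:P2293 ?disease .          # gene has a genetic association with disease
  # Assumption: skip diseases already recorded as treated by the drug.
  FILTER NOT EXISTS {{ wd:{drug_qid} wdt:P2175 ?disease }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
LIMIT {limit}
""".strip()


if __name__ == "__main__":
    # Q19484 is assumed to be the Wikidata item for Metformin.
    print(repurposing_query("Q19484"))
```

The printed text can be pasted into http://query.wikidata.org to try it out.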
Gene Wiki and Wikimedia Foundation SPARQL workshop
A SPARQL-powered user interface for consuming and editing organism data in Wikidata
Timothy E. Putman Ph.D.
The Scripps Research Institute,
La Jolla, California
tputman@scripps.edu
Twitter: @putmantime
Gene Wikidata Team
Andrew Su (Scripps)
Benjamin Good – just spoke
Andra Waagmeester (Micelio)
Sebastian Burgstaller (Scripps)
Elvira Mitraka (U Maryland)
Julia Turner (Scripps)
Justin Leong (UBC)
Lynn Schriml (U Maryland)
Paul Pavlidis (UBC)
Ginger Tsueng (Scripps)
ACKNOWLEDGEMENTS
Centralizing and Linking the Data
Data model diagram (repeated over several slides, each highlighting a different link):
• Bacteria (Q10876): domain
• C. trachomatis (Q131065): species
• C. trachomatis 434/BU (Q20800254): strain
• trpA (Q21153861): gene
• TRPA (Q21153984): protein
SPARQL Query
• On page load
• jQuery execution of SPARQL query as AJAX GET request
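The slide describes a jQuery AJAX GET; the same request shape can be sketched in Python for readers who want to try it outside the browser. This is a rough equivalent under stated assumptions, not the application's code: the endpoint is the public Wikidata Query Service, and the `User-Agent` string and the function names `build_request`, `rows`, and `run` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode

ENDPOINT = "https://query.wikidata.org/sparql"


def build_request(query: str) -> urllib.request.Request:
    """Encode the SPARQL text into the URL, as an AJAX GET would."""
    url = ENDPOINT + "?" + urlencode({"query": query})
    headers = {
        "Accept": "application/sparql-results+json",
        # Hypothetical identifier; a descriptive User-Agent is good practice.
        "User-Agent": "gene-wiki-example/0.1",
    }
    return urllib.request.Request(url, headers=headers)


def rows(payload: dict) -> list:
    """Flatten SPARQL JSON results into one plain dict per result row."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in payload["results"]["bindings"]
    ]


def run(query: str) -> list:
    """Send the request (needs network access) and return flattened rows."""
    with urllib.request.urlopen(build_request(query)) as response:
        return rows(json.load(response))
```

Only `run` touches the network; `build_request` and `rows` can be checked offline.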
• On organism select
• Get all gene and protein data for organism by taxid
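A query of the kind this slide describes might look like the sketch below. It is a guess at the shape, not the application's actual query: it assumes the taxid is stored as the NCBI Taxonomy ID (P685), that genes point to their organism with "found in taxon" (P703) and to their product with "encodes" (P688), and `organism_genes_query` is a hypothetical name.

```python
def organism_genes_query(taxid: str) -> str:
    """Build a SPARQL query for the genes and proteins of one organism."""
    if not taxid.isdigit():
        raise ValueError(f"taxid must be numeric: {taxid!r}")
    return f"""
SELECT ?gene ?geneLabel ?protein ?proteinLabel WHERE {{
  ?taxon wdt:P685 "{taxid}" .        # organism item with this NCBI Taxonomy ID
  ?gene wdt:P703 ?taxon ;            # gene is found in that taxon
        wdt:P688 ?protein .          # gene encodes protein
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
""".strip()
```

Validating the taxid before interpolating it keeps user input from altering the query text.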
QUESTIONS?

Editor's Notes

  • #5: Knowledge is either not shared (stuck in your head or your notebook) or shared as text and images in journal articles. More than 1 million articles are added to PubMed each year.
  • #6: Relying on the entire community of scientists to digest the biomedical literature: identification, filtering, extraction, summarization.
  • #9: Now we can use a database instead of wikitext to store data. Great! It also opens up other possibilities.
  • #17: In this linked data model for a bacterial genome, each genetic item is linked to the taxonomic hierarchy, and the gene and protein are distinct entities. The gene carries the genomic annotations, the protein carries the functional annotations, and the two are linked by the "encodes" and "encoded by" properties.
  • #18: Here is the Wikidata item that represents the strain or subspecies taxon in our data model. We can navigate through the graph by following the statements that lead to other Wikidata items. For example, if you click on "parent taxon", you go to the species-level item …
  • #19: Chlamydia trachomatis. If you kept going in that direction, through genus, family, order, etc., you would eventually reach Bacteria (King Phillip Came Over From Great Spain).
  • #20: If you go in the other direction, you get to the genes found in that taxon through the predicate of that name ("found in taxon"). Each gene is linked to its product through "encodes",
  • #21: and its product is linked back to its gene through "encoded by", and to the strain through "found in taxon". The protein is where you would find functional annotations such as GO terms.
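The navigation described in notes #17 to #21 can be sketched as a small in-memory graph. This is an illustration only: the item IDs and labels come from the slides, but the dictionary layout and the function names are hypothetical, and the species is linked straight to Bacteria here because the intermediate ranks (genus, family, order, and so on) are not given in the slides.

```python
# Toy copy of the data model; each key is a Wikidata item ID from the slides.
GRAPH = {
    "Q10876": {"label": "Bacteria"},
    # Assumption: intermediate ranks are skipped in this toy graph.
    "Q131065": {"label": "C. trachomatis", "parent taxon": "Q10876"},
    "Q20800254": {"label": "C. trachomatis 434/BU", "parent taxon": "Q131065"},
    "Q21153861": {"label": "trpA", "found in taxon": "Q20800254",
                  "encodes": "Q21153984"},
    "Q21153984": {"label": "TRPA", "found in taxon": "Q20800254",
                  "encoded by": "Q21153861"},
}


def lineage(qid):
    """Follow 'parent taxon' upward and return the labels along the way."""
    path = []
    while qid is not None:
        path.append(GRAPH[qid]["label"])
        qid = GRAPH[qid].get("parent taxon")
    return path


def genes_in(taxon_qid):
    """Items found in the taxon that encode a product, i.e. its genes."""
    return [qid for qid, item in GRAPH.items()
            if item.get("found in taxon") == taxon_qid and "encodes" in item]


def product_of(gene_qid):
    """Follow 'encodes' from a gene to its protein."""
    return GRAPH[gene_qid]["encodes"]


if __name__ == "__main__":
    print(lineage("Q20800254"))
    for gene in genes_in("Q20800254"):
        protein = product_of(gene)
        print(GRAPH[gene]["label"], "encodes", GRAPH[protein]["label"])
```

Keeping the gene and protein as separate entries mirrors the point in note #17 that they are distinct entities joined by a pair of inverse properties.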