Group Meeting 2016-08-17, Tech
“GPKB: Genomic and Proteomic
Knowledge Base”
by Davide Chicco
davide.chicco@gmail.com
● A data warehouse developed and maintained by my
former colleagues at Politecnico di Milano university
● Integration of several data sources:
● KEGG (Kyoto Encyclopedia of Genes and Genomes)
● OMIM (Online Mendelian Inheritance in Man)
● Gene Ontology Annotations (GOA)
● Gene Ontology (GO)
● Expasy Enzyme
● Entrez Gene
● Reactome
● UniProt
● BioCyc
● IntAct
Genomic and Proteomic Knowledge Base (GPKB)
(c) Flickr Vitlava: database-integration
● Large amounts of biological data are available all
around the world
● In particular, biomolecular annotations (associations
between genes or gene products and biological function
features) can help scientists understand
biology and life science
● The hierarchical structure of the ontologies behind
these datasets can highlight semantic
relationships between data
Motivation
● Implemented in PostgreSQL
● It can be downloaded or used through a web interface
● Dataset quantitative characteristics:
~ 20 million genes
~ 20 million proteins
~ 17 million gene annotations
~ 31 million protein annotations
● Some tables are simply imported from data sources
(GO, Reactome, etc.)
● Other tables are INFERRED from the available datasets
Technical details and quantitative characteristics
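The slides do not say how the inferred tables are built. As a minimal sketch of one plausible approach, the Python below composes two imported association tables (gene to protein, protein to pathway) into an inferred gene to pathway table. All table names, the function name and the IDs are hypothetical placeholders, not GPKB's actual schema or data.

```python
# Hypothetical imported association tables (toy placeholder IDs, not real GPKB data).
gene_to_protein = {
    ("GENE_A", "PROT_1"),
    ("GENE_A", "PROT_2"),
    ("GENE_B", "PROT_3"),
}
protein_to_pathway = {
    ("PROT_1", "PATHWAY_X"),
    ("PROT_3", "PATHWAY_X"),
    ("PROT_3", "PATHWAY_Y"),
}


def infer_gene_to_pathway(gene_protein, protein_pathway):
    """Compose two association tables into an inferred gene-pathway table.

    Assumption: a gene is linked to a pathway whenever one of the proteins
    it encodes is annotated to that pathway.
    """
    # Index pathways by protein so each gene-protein pair is a single lookup.
    pathways_by_protein = {}
    for protein, pathway in protein_pathway:
        pathways_by_protein.setdefault(protein, set()).add(pathway)

    inferred = set()
    for gene, protein in gene_protein:
        for pathway in pathways_by_protein.get(protein, ()):
            inferred.add((gene, pathway))
    return inferred


gene_to_pathway = infer_gene_to_pathway(gene_to_protein, protein_to_pathway)
print(sorted(gene_to_pathway))
```

In a relational database such as PostgreSQL, the same composition would typically be a join between the two association tables.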
● Data tables available:
Technical details and quantitative characteristics
Image from M. Masseroli, et al. "Explorative search of distributed bio-data to answer complex biomedical questions." BMC
Bioinformatics 15.1 (2014): 1.
Green-gray boxes: data tables available in the general data warehouse and publicly
available on the web interface
Gray boxes: data tables available in the general data warehouse (publicly available in the
future)
Two main execution modes:
● Basic search
● Easy search
GPKB
● The Basic search functionality is available for searches
aimed at retrieving all information directly associated
with a single feature instance, either imported from
external sources or inferred based on the integrated
data
● For example, all annotations and interactions of a
specific gene or protein (e.g. the human insulin-like
growth factor 2 (somatomedin A) (IGF2) gene, Entrez
Gene ID 3481), or all genes and proteins annotated to
a particular biomedical feature instance, such as a
specific pathway or genetic disorder (e.g. Alzheimer
disease, OMIM ID 104300).
Basic search
● The authors also implemented an enhanced functionality
and graphical interface for multi-feature search, named
Easy search.
● It supports the simple graphical composition of
complex queries on multiple features just by selecting,
in order, the required features, e.g. gene, pathway,
enzyme, biological function feature, genetic disorder,
clinical synopsis, etc.; if needed, display and filtering
constraints can be defined for any attribute of each
selected feature just by specifying them in the feature
window.
Easy search
● Query example: relationships between genes, biological
function features and pathologies (e.g. Muscular
dystrophy, Duchenne type).
● Using the Easy search functionality, the user can
select, in order, the gene feature, then the gene
associated biological function feature and genetic
disorder features, and then the genetic disorder
associated clinical synopsis feature; finally, before
submitting the query, a user who wants to investigate
only some related pathologies can specify them
as values of the name attribute in the genetic disorder
feature window.
Easy search
Distinct: returns only distinct results
Exact count: runs an exact count of the query results;
otherwise the result count is estimated
Conceptual query (C): the query includes the
conceptually equivalent database items coming
from other data sources
Semantic expansion: when a query is executed with
semantic expansion for a feature, the result contains
not only the items that satisfy the query but also
semantically related, more general items based on the
feature ontologies
Expand query: after obtaining results for an initial
query, expands the query only for the user-selected
rows of the previous query result
Show all: shows all the query results
Only matching: shows only the query results with
matching values between all the selected features
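The semantic expansion option can be illustrated with a small Python sketch. It assumes the feature ontology is stored as a mapping from each term to its direct, more general parents, and it collects every ancestor of the terms that satisfy a query. The term names, the `parents` mapping and the function name are hypothetical, and GPKB's actual implementation may differ.

```python
# Hypothetical toy ontology: each term maps to its direct, more general parents.
parents = {
    "TERM_LEAF": {"TERM_MID_1", "TERM_MID_2"},
    "TERM_MID_1": {"TERM_ROOT"},
    "TERM_MID_2": {"TERM_ROOT"},
    "TERM_ROOT": set(),
}


def expand_with_ancestors(terms, parents):
    """Return the given terms plus all of their more general ancestors."""
    expanded = set(terms)
    stack = list(terms)
    while stack:
        term = stack.pop()
        for parent in parents.get(term, ()):
            # Visit each ancestor once, even when reachable by several paths.
            if parent not in expanded:
                expanded.add(parent)
                stack.append(parent)
    return expanded


matched = {"TERM_LEAF"}  # terms that satisfy the original query
print(sorted(expand_with_ancestors(matched, parents)))
```

Because an ontology such as GO is a directed acyclic graph rather than a tree, a term can reach the same ancestor through several parents, which is why the sketch tracks the terms it has already visited.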
“Find all the genes that are involved both in breast
cancer and in prostate cancer, and then retrieve all the
proteins that are encoded by one of those genes”
http://www.bioinformatics.deib.polimi.it/GPKB
Demo
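The logic of the demo question can be sketched in Python as a set intersection followed by a lookup. This only illustrates the shape of the query and is not how the GPKB web interface runs it. The gene and protein IDs are placeholders rather than real annotations, and the dictionary and function names are hypothetical.

```python
# Placeholder associations (not real annotations).
genes_by_disease = {
    "breast cancer": {"GENE_A", "GENE_B", "GENE_C"},
    "prostate cancer": {"GENE_B", "GENE_C", "GENE_D"},
}
proteins_by_gene = {
    "GENE_A": {"PROT_1"},
    "GENE_B": {"PROT_2", "PROT_3"},
    "GENE_C": set(),
    "GENE_D": {"PROT_4"},
}


def proteins_for_shared_genes(diseases, genes_by_disease, proteins_by_gene):
    """Find the genes shared by all diseases, then the proteins they encode."""
    gene_sets = [genes_by_disease.get(disease, set()) for disease in diseases]
    shared_genes = set.intersection(*gene_sets) if gene_sets else set()

    proteins = set()
    for gene in shared_genes:
        proteins |= proteins_by_gene.get(gene, set())
    return shared_genes, proteins


shared, encoded = proteins_for_shared_genes(
    ["breast cancer", "prostate cancer"], genes_by_disease, proteins_by_gene
)
print(sorted(shared), sorted(encoded))
```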
Main advantages of GPKB compared to other systems
(such as BioWarehouse, Biozon, etc.):
1) flexible data schema and software architecture, to
facilitate data import
2) integration of datasets from different sources, to
highlight semantic relationships between data
elements
3) ability to answer multi-domain biomedical
questions
GPKB advantages
M. Masseroli, A. Canakoglu, and S. Ceri. "Integration
and querying of genomic and proteomic semantic
annotations for biomedical knowledge extraction"
IEEE/ACM Transactions on Computational Biology and
Bioinformatics 13.2 (2016): 209-219.
http://www.bioinformatics.deib.polimi.it/GPKB
Citation and web link

Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 

GPKB: Genomic and Proteomic Knowledge Base

  • 1. Group Meeting 2016-08-17, Tech “GPKB: Genomic and Proteomic Knowledge Base” by Davide Chicco davide.chicco@gmail.com
  • 2. ● A data warehouse developed and maintained by my former colleagues at Politecnico di Milano university ● Integration of several data sources: ● KEGG (Kyoto Encyclopedia of Genes and Genomes) ● OMIM (Online Mendelian Inheritance in Man) ● Gene Ontology Annotations (GOA) ● Gene Ontology (GO) ● Expasy Enzyme ● Entrez Gene ● Reactome ● UniProt ● BioCyc ● IntAct Genomic and Proteomic Knowledge Base (GPKB) (c) Flickr Vitlava: database-integration
  • 3. ● Large amounts of biological data are available all around the world ● In particular, biomolecular annotations (associations between genes or gene products and biological function features) can help scientists understand biology and life science ● The hierarchical ontology structure of these datasets can highlight semantic relationships between data Motivation
  • 4. ● Implemented in PostgreSQL ● It can be downloaded or used through a web interface ● Dataset quantitative characteristics: ~ 20 million genes ~ 20 million proteins ~ 17 million gene annotations ~ 31 million protein annotations ● Some tables are simply imported from data sources (GO, Reactome, etc.) ● Other tables are INFERRED from the available datasets Technical details and quantitative characteristics
  • 5. ● Data tables available: Technical details and quantitative characteristics Image from M. Masseroli, et al. "Explorative search of distributed bio-data to answer complex biomedical questions." BMC Bioinformatics 15.1 (2014): 1. Green-gray boxes: data tables available in the general data warehouse and publicly available on the web interface Gray boxes: data tables available in the general data warehouse (to be made publicly available in the future)
  • 6. Two main execution modes: ● Basic search ● Easy search GPKB
  • 7. ● The Basic search functionality is available for searches aimed at retrieving all information directly associated with a single feature instance, either imported from external sources or inferred from the integrated data ● For example, all annotations and interactions of a specific gene or protein (e.g. the human insulin-like growth factor 2 (somatomedin A) (IGF2) gene, Entrez Gene ID 3481), or all genes and proteins annotated to a particular biomedical feature instance, such as a specific pathway or genetic disorder (e.g. Alzheimer disease, OMIM ID 104300). Basic search
  • 13. ● The authors also implemented an enhanced functionality and graphical interface for multi-feature search, named Easy search. ● It supports the simple graphical composition of complex queries on multiple features just by selecting the required features in order, e.g. gene, pathway, enzyme, biological function feature, genetic disorder, clinical synopsis, etc.; if needed, display and filtering constraints can be defined for any attribute of each selected feature just by specifying them in the feature window. Easy search
  • 14. ● Query example: relationship between genes, biological function features of pathologies (e.g. in Muscular dystrophy, Duchenne type). ● Using the Easy search functionality, the user can select, in order, the gene feature, then the gene-associated biological function feature and genetic disorder features, and then the genetic disorder-associated clinical synopsis feature; finally, before submitting the query, a user who wants to investigate only some related pathologies can specify them as values of the name attribute in the genetic disorder feature window. Easy search
  • 21. Exact count: when enabled, the system runs an exact count of the query results; otherwise it estimates the result count
  • 22. Conceptual query (C): the query includes the conceptually equivalent database items coming from other data sources
  • 23. Semantic expansion: when a query is executed with semantic expansion for a feature, the result contains not only the items that satisfy the query but also semantically related, more general items based on the feature ontologies
  • 24. Expand query: after obtaining results for an initial query, expands the query only for the rows the user selected in the previous query result
  • 25. Show all: shows all the query results. Only matching: shows only the query results with matching values between all the selected features
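The semantic expansion option described in slide 23 can be pictured as walking up an ontology from each matching term to its more general ancestors. The following is only a minimal sketch under that assumption, not GPKB's actual implementation: the `PARENTS` mapping, the term names and the `expand_semantically` helper are all hypothetical placeholders.

```python
from collections import deque

# Toy "is_a" ontology: child term -> list of more general parent terms.
# All term names are hypothetical placeholders, not real ontology identifiers.
PARENTS = {
    "term:specific": ["term:intermediate"],
    "term:intermediate": ["term:general", "term:other_general"],
    "term:general": ["term:root"],
    "term:other_general": ["term:root"],
    "term:root": [],
}


def expand_semantically(terms, parents=PARENTS):
    """Return the given terms plus all of their more general ancestors."""
    seen = set(terms)
    queue = deque(terms)
    while queue:
        term = queue.popleft()
        for parent in parents.get(term, []):
            if parent not in seen:  # visit each ancestor only once
                seen.add(parent)
                queue.append(parent)
    return seen


expanded = expand_semantically(["term:specific"])
print(sorted(expanded))
```

A query restricted to `term:specific` would then also match items annotated to any term in `expanded`, which is the "more general items" behaviour the slide describes.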
  • 26. “Find all the genes that are involved both in breast cancer and in prostate cancer, and then retrieve all the proteins that are encoded by one of those genes” http://www.bioinformatics.deib.polimi.it/GPKB Demo
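The demo question can be read as a two-step relational query: intersect the genes linked to each disorder, then join to the proteins those genes encode. Here is a rough sketch of that logic using Python's built-in `sqlite3` module with invented data. The table names (`gene_disorder`, `gene_protein`), column names and all gene and protein identifiers are assumptions made for illustration; they do not reflect the real GPKB schema or real biology.

```python
import sqlite3

# In-memory toy database; schema and rows are hypothetical, not GPKB's.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gene_disorder (gene TEXT, disorder TEXT);
CREATE TABLE gene_protein (gene TEXT, protein TEXT);
INSERT INTO gene_disorder VALUES
  ('geneA', 'breast cancer'), ('geneA', 'prostate cancer'),
  ('geneB', 'breast cancer'),
  ('geneC', 'prostate cancer'), ('geneC', 'breast cancer');
INSERT INTO gene_protein VALUES
  ('geneA', 'protA1'), ('geneA', 'protA2'),
  ('geneB', 'protB1'), ('geneC', 'protC1');
""")

# Step 1 (subquery): genes associated with BOTH disorders.
# Step 2 (outer query): proteins encoded by those genes.
QUERY = """
SELECT gp.gene, gp.protein
FROM gene_protein AS gp
WHERE gp.gene IN (
    SELECT gene FROM gene_disorder WHERE disorder = ?
    INTERSECT
    SELECT gene FROM gene_disorder WHERE disorder = ?
)
ORDER BY gp.gene, gp.protein
"""
rows = conn.execute(QUERY, ("breast cancer", "prostate cancer")).fetchall()
print(rows)
```

In this toy data only `geneA` and `geneC` are linked to both disorders, so only their proteins are returned.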
  • 27. Main advantages of GPKB compared to other systems (such as BioWarehouse, Biozon, etc.): 1) a flexible data schema and software architecture, which facilitate data import 2) integration of datasets from different sources, which highlights semantic relationships between data elements 3) the ability to answer multi-domain biomedical questions GPKB advantages
  • 28. M. Masseroli, A. Canakoglu, and S. Ceri. "Integration and querying of genomic and proteomic semantic annotations for biomedical knowledge extraction" IEEE/ACM Transactions on Computational Biology and Bioinformatics 13.2 (2016): 209-219. http://www.bioinformatics.deib.polimi.it/GPKB Citation and web link