Group Meeting 2016-08-17, Tech
“GPKB: Genomic and Proteomic
Knowledge Base”
by Davide Chicco
davide.chicco@gmail.com
● A data warehouse developed and maintained by my
former colleagues at Politecnico di Milano
● Integration of several data sources:
● KEGG (Kyoto Encyclopedia of Genes and Genomes)
● OMIM (Online Mendelian Inheritance in Man)
● Gene Ontology Annotations (GOA)
● Gene Ontology (GO)
● Expasy Enzyme
● Entrez Gene
● Reactome
● UniProt
● BioCyc
● IntAct
Genomic and Proteomic Knowledge Base (GPKB)
(c) Flickr Vitlava: database-integration
● Large numbers of biological datasets are available
worldwide
● In particular, biomolecular annotations (associations
between genes or gene products and biological function
features) can help scientists understand
biology and the life sciences
● The hierarchical structure of the ontologies behind
these datasets can highlight semantic
relationships between data
Motivation
● Implemented in PostgreSQL
● It can be downloaded or used through a web interface
● Dataset quantitative characteristics:
~ 20 million genes
~ 20 million proteins
~ 17 million gene annotations
~ 31 million protein annotations
● Some tables are simply imported from data sources
(GO, Reactome, etc.)
● Other tables are INFERRED from the available datasets
Technical details and quantitative characteristics
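To make the difference between imported and inferred tables concrete, here is a minimal sketch. It assumes one plausible inference rule (a gene inherits the annotations of the proteins it encodes); the slides do not say which rules GPKB actually applies. The table names, column names, and IDs are all hypothetical, and sqlite3 is used only as a stand-in so the example runs without a PostgreSQL server.

```python
import sqlite3

# Stand-in database: GPKB itself is implemented in PostgreSQL.
# All table names, column names, and IDs below are hypothetical placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gene2protein (gene_id TEXT, protein_id TEXT);
    CREATE TABLE protein2function (protein_id TEXT, function_id TEXT);

    INSERT INTO gene2protein VALUES
        ('gene_A', 'prot_1'), ('gene_A', 'prot_2'), ('gene_B', 'prot_3');
    INSERT INTO protein2function VALUES
        ('prot_1', 'func_X'), ('prot_2', 'func_X'),
        ('prot_2', 'func_Y'), ('prot_3', 'func_Z');

    CREATE TABLE gene2function_inferred AS
        SELECT DISTINCT g.gene_id, p.function_id
        FROM gene2protein AS g
        JOIN protein2function AS p ON g.protein_id = p.protein_id;
""")

# The first two tables play the role of imported data; the third one is
# derived from them by a join, which is the assumed inference rule.
inferred = conn.execute(
    "SELECT gene_id, function_id FROM gene2function_inferred "
    "ORDER BY gene_id, function_id"
).fetchall()
print(inferred)
```

The inferred table holds no new source data: every row can be regenerated from the imported tables, which is why it can be rebuilt whenever a source is re-imported.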
● Data tables available:
Technical details and quantitative characteristics
Image from M. Masseroli, et al. "Explorative search of distributed bio-data to answer complex biomedical questions." BMC
Bioinformatics 15.1 (2014): 1.
Green-gray boxes: data tables available in the general data warehouse and publicly
available on the web interface
Gray boxes: data tables available in the general data warehouse (to be made publicly
available in the future)
Two main execution modes:
● Basic search
● Easy search
GPKB
● The Basic search functionality is intended for searches
that retrieve all information directly associated
with a single feature instance, whether imported from
external sources or inferred from the integrated
data
● For example, all annotations and interactions of a
specific gene or protein (e.g. the human insulin-like
growth factor 2 (somatomedin A) (IGF2) gene, Entrez
Gene ID 3481), or all genes and proteins annotated to
a particular biomedical feature instance, such as a
specific pathway or genetic disorder (e.g. Alzheimer
disease, OMIM ID 104300).
Basic search
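The Basic search idea, collecting everything directly linked to one feature instance, can be sketched as below. This is only an illustration under stated assumptions: the record layout, the `basic_search` function, and every associated item are hypothetical placeholders, not GPKB's schema or data. Only the Entrez Gene ID 3481 comes from the slide.

```python
from collections import defaultdict

# Hypothetical association records:
# (feature type, instance ID, related feature type, related instance ID).
# Only the gene ID "3481" is taken from the slide; the rest is made up.
ASSOCIATIONS = [
    ("gene", "3481", "pathway", "pathway_P1"),
    ("gene", "3481", "biological_function", "function_F1"),
    ("gene", "3481", "protein", "protein_X1"),
    ("gene", "gene_B", "pathway", "pathway_P1"),
]


def basic_search(feature_type, instance_id):
    """Return everything directly associated with one feature instance."""
    results = defaultdict(list)
    for f_type, f_id, rel_type, rel_id in ASSOCIATIONS:
        if (f_type, f_id) == (feature_type, instance_id):
            results[rel_type].append(rel_id)
        elif (rel_type, rel_id) == (feature_type, instance_id):
            results[f_type].append(f_id)
    return dict(results)


# Start from a gene, or start from a pathway: the same lookup works both ways.
gene_view = basic_search("gene", "3481")
pathway_view = basic_search("pathway", "pathway_P1")
print(gene_view)
print(pathway_view)
```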
● The authors also implemented an enhanced functionality
and graphical interface for multi-feature search, named
Easy search.
● It supports the simple graphical composition of
complex queries on multiple features just by selecting,
in order, the required features, e.g. gene, pathway,
enzyme, biological function feature, genetic disorder,
clinical synopsis, etc.; if needed, display and filtering
constraints can be defined for any attribute of each
selected feature just by specifying them in the feature
window.
Easy search
● Query example: relationships between genes, biological
function features, and pathologies (e.g. in Muscular
dystrophy, Duchenne type).
● Using the Easy search functionality, the user can
select, in order, the gene feature, then the
gene-associated biological function feature and genetic
disorder features, and then the genetic-disorder-associated
clinical synopsis feature; finally, before
submitting the query, a user who wants to investigate
only some related pathologies can specify them
as values of the name attribute in the genetic disorder
feature window.
Easy search
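The query described above chains several features and then filters on one attribute. A rough sketch of that composition follows. The three association lists, the `easy_search` function, and all IDs are hypothetical; only the disorder name is taken from the slide, and the real system presumably runs this as a database query rather than nested loops.

```python
# Hypothetical association tables. Only the disorder name comes from the slide.
DMD_NAME = "Muscular dystrophy, Duchenne type"
GENE_FUNCTION = [
    ("gene_A", "function_F1"),
    ("gene_A", "function_F2"),
    ("gene_B", "function_F3"),
]
GENE_DISORDER = [("gene_A", DMD_NAME), ("gene_B", "disorder_D2")]
DISORDER_SYNOPSIS = [(DMD_NAME, "synopsis_S1"), ("disorder_D2", "synopsis_S2")]


def easy_search(disorder_names=None):
    """Chain gene -> function and gene -> disorder -> clinical synopsis.

    disorder_names mimics the optional filter on the disorder name attribute;
    None means no filtering.
    """
    rows = []
    for gene, function in GENE_FUNCTION:
        for other_gene, disorder in GENE_DISORDER:
            if other_gene != gene:
                continue
            if disorder_names is not None and disorder not in disorder_names:
                continue
            for other_disorder, synopsis in DISORDER_SYNOPSIS:
                if other_disorder == disorder:
                    rows.append((gene, function, disorder, synopsis))
    return rows


filtered_rows = easy_search({DMD_NAME})
all_rows = easy_search()
for row in filtered_rows:
    print(row)
```

Each selected feature adds one join step, and the filter is applied to the feature it belongs to, which mirrors specifying a constraint in that feature's window.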
Distinct: returns only distinct results
Exact count: computes the exact count of the query results;
otherwise the result count is estimated
Conceptual query (C): the query includes the
conceptually equivalent database items coming
from other data sources
Semantic expansion: when a query is executed with
semantic expansion for a feature, the result contains
not only the items that satisfy the query but also
semantically related, more general items based on the
feature ontologies
Expand query: after obtaining the results of an initial
query, expands the query only for the rows of that
result selected by the user
Show all: shows all the query results
Only matching: shows only the query results with
matching values between all the selected features
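Of the options above, semantic expansion is the one that is easiest to show in code: the matched items are extended with their more general ancestors in the ontology. The sketch below assumes the ontology is stored as a child-to-parents mapping; the term names and both function names are hypothetical, and GPKB may compute the expansion differently.

```python
# Hypothetical ontology fragment stored as child -> list of parent terms.
# A term may have several parents, so the structure is a DAG, not a tree.
PARENTS = {
    "term_leaf": ["term_mid_1", "term_mid_2"],
    "term_mid_1": ["term_root"],
    "term_mid_2": ["term_root"],
    "term_root": [],
}


def ancestors(term):
    """Return every more general term reachable from `term`."""
    seen = set()
    stack = list(PARENTS.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def expand(matches):
    """Add the ancestors of each matched term to the result set."""
    expanded = set(matches)
    for term in matches:
        expanded |= ancestors(term)
    return expanded


expanded_leaf = expand({"term_leaf"})
expanded_mid = expand({"term_mid_1"})
print(sorted(expanded_leaf))
print(sorted(expanded_mid))
```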
“Find all the genes that are involved both in breast
cancer and in prostate cancer, and then retrieve all the
proteins that are encoded by one of those genes”
http://www.bioinformatics.deib.polimi.it/GPKB
Demo
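The demo question reduces to a set intersection followed by a lookup. Here is a minimal sketch with placeholder data: the gene and protein IDs are invented for illustration and say nothing about which genes are actually linked to either cancer.

```python
# Placeholder data: these IDs are invented and carry no biological meaning.
DISORDER_GENES = {
    "breast cancer": {"gene_A", "gene_B", "gene_C"},
    "prostate cancer": {"gene_B", "gene_C", "gene_D"},
}
GENE_PROTEINS = {
    "gene_A": {"protein_1"},
    "gene_B": {"protein_2", "protein_3"},
    "gene_C": {"protein_4"},
    "gene_D": {"protein_5"},
}

# Step 1: genes involved in both disorders.
shared_genes = DISORDER_GENES["breast cancer"] & DISORDER_GENES["prostate cancer"]

# Step 2: all proteins encoded by at least one of those genes.
encoded_proteins = set().union(*(GENE_PROTEINS[g] for g in shared_genes))

print(sorted(shared_genes))
print(sorted(encoded_proteins))
```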
Main advantages of GPKB compared to other systems
(such as BioWarehouse, Biozon, etc.):
1) flexible data schema and software architecture, to
facilitate data import
2) integration of datasets from different sources,
which highlights semantic relationships between data
elements
3) ability to answer multi-domain biomedical
questions
GPKB advantages
M. Masseroli, A. Canakoglu, and S. Ceri. "Integration
and querying of genomic and proteomic semantic
annotations for biomedical knowledge extraction."
IEEE/ACM Transactions on Computational Biology and
Bioinformatics 13.2 (2016): 209-219.
http://www.bioinformatics.deib.polimi.it/GPKB
Citation and web link

GPKB Genomic Proteomic Knowledge Base

  • 1. Group Meeting 2016-08-17, Tech “GPKB: Genomic and Proteomic Knowledge Base” by Davide Chicco davide.chicco@gmail.com
  • 2. Genomic and Proteomic Knowledge Base (GPKB)
      ● A data warehouse developed and maintained by my former colleagues at Politecnico di Milano university
      ● Integration of several data sources:
          ● KEGG (Kyoto Encyclopedia of Genes and Genomes)
          ● OMIM (Online Mendelian Inheritance in Man)
          ● Gene Ontology Annotations (GOA)
          ● Gene Ontology (GO)
          ● Expasy Enzyme
          ● Entrez Gene
          ● Reactome
          ● UniProt
          ● BioCyc
          ● IntAct
      Image: (c) Flickr Vitlava: database-integration
  • 3. Motivation
      ● Large amounts of biological data are available all around the world
      ● In particular, biomolecular annotations (associations between genes or gene products and biological function features) can help scientists understand biology and the life sciences
      ● The hierarchical ontology structure of these datasets can highlight semantic relationships between data
  • 4. Technical details and quantitative characteristics
      ● Implemented in PostgreSQL
      ● It can be downloaded or used through a web interface
      ● Dataset quantitative characteristics:
          ~ 20 million genes
          ~ 20 million proteins
          ~ 17 million gene annotations
          ~ 31 million protein annotations
      ● Some tables are simply imported from data sources (GO, Reactome, etc.)
      ● Other tables are INFERRED from the available datasets
  • 5. Technical details and quantitative characteristics
      ● Data tables available:
      Image from M. Masseroli, et al. "Explorative search of distributed bio-data to answer complex biomedical questions." BMC Bioinformatics 15.1 (2014): 1.
      Green-gray boxes: data tables available in the general data warehouse and publicly available on the web interface
      Gray boxes: data tables available in the general data warehouse (publicly available in the future)
  • 6. GPKB
      Two main execution modes:
      ● Basic search
      ● Easy search
  • 7. Basic search
      ● The Basic search functionality is for searches that retrieve all information directly associated with a single feature instance, either imported from external sources or inferred from the integrated data
      ● For example, all annotations and interactions of a specific gene or protein (e.g. the human insulin-like growth factor 2 (somatomedin A) (IGF2) gene, Entrez Gene ID 3481), or all genes and proteins annotated to a particular biomedical feature instance, such as a specific pathway or genetic disorder (e.g. Alzheimer disease, OMIM ID 104300).
  • 13. Easy search
      ● The authors also implemented an enhanced functionality and graphical interface for multi-feature search, named Easy search.
      ● It supports the simple graphical composition of complex queries on multiple features just by selecting the required features in order, e.g. gene, pathway, enzyme, biological function feature, genetic disorder, clinical synopsis, etc.; if needed, display and filtering constraints can be defined for any attribute of each selected feature just by specifying them in the feature window.
  • 14. Easy search
      ● Query example: relationship between genes, biological function features of pathologies (e.g. in Muscular dystrophy, Duchenne type).
      ● Using the Easy search functionality, the user can select, in order, the gene feature, then the gene-associated biological function feature and genetic disorder features, and then the genetic-disorder-associated clinical synopsis feature; finally, before submitting the query, a user who wants to investigate only some related pathologies can specify them as values of the name attribute in the genetic disorder feature window.
  • 21. Exact count: runs an exact count of the query results; otherwise the result count is estimated
  • 22. Conceptual query (C): the query includes the conceptually equivalent database items coming from other data sources
  • 23. Semantic expansion: when a query is executed with semantic expansion for a feature, the result contains not only the items that satisfy the query but also semantically related, more general items based on the feature ontologies
  • 24. Expand query: after obtaining results for an initial query, expands the query only for the rows the user selected in the previous query result
  • 25. Show all: shows all the query results. Only matching: shows only the query results with matching values between all the selected features
  • 26. Demo
      “Find all the genes that are involved both in breast cancer and in prostate cancer, and then retrieve all the proteins that are encoded by one of those genes”
      http://www.bioinformatics.deib.polimi.it/GPKB
  • 27. GPKB advantages
      Main advantages of GPKB compared to other systems (such as BioWarehouse, Biozon, etc.):
      1) flexible data schema and software architecture, to facilitate data import
      2) integration of datasets from different sources, which highlights semantic relationships between data elements
      3) ability to answer multi-domain biomedical questions
  • 28. Citation and web link
      M. Masseroli, A. Canakoglu, and S. Ceri. "Integration and querying of genomic and proteomic semantic annotations for biomedical knowledge extraction." IEEE/ACM Transactions on Computational Biology and Bioinformatics 13.2 (2016): 209-219.
      http://www.bioinformatics.deib.polimi.it/GPKB
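
The demo question on slide 26 (genes involved in both breast cancer and prostate cancer, then the proteins they encode) can be sketched as a two-step relational query. This is only a minimal illustration, not the GPKB schema or its generated SQL: the table names `gene_disease` and `gene_protein`, their columns, and the `GENE_*` / `PROT_*` identifiers are all hypothetical placeholders, and an in-memory SQLite database stands in for the PostgreSQL warehouse.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with placeholder rows (not real annotations)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE gene_disease (gene TEXT, disease TEXT);  -- hypothetical table
        CREATE TABLE gene_protein (gene TEXT, protein TEXT);  -- hypothetical table
        """
    )
    conn.executemany(
        "INSERT INTO gene_disease VALUES (?, ?)",
        [
            ("GENE_A", "breast cancer"),
            ("GENE_A", "prostate cancer"),
            ("GENE_B", "breast cancer"),
            ("GENE_C", "prostate cancer"),
            ("GENE_C", "breast cancer"),
        ],
    )
    conn.executemany(
        "INSERT INTO gene_protein VALUES (?, ?)",
        [
            ("GENE_A", "PROT_1"),
            ("GENE_A", "PROT_2"),
            ("GENE_B", "PROT_3"),
            ("GENE_C", "PROT_4"),
        ],
    )
    return conn


def proteins_for_shared_genes(conn, disease_a, disease_b):
    """Return (gene, protein) pairs for genes annotated to both diseases."""
    sql = """
        SELECT gp.gene, gp.protein
        FROM gene_protein AS gp
        WHERE gp.gene IN (
            -- Step 1: genes annotated to both diseases
            SELECT gene FROM gene_disease WHERE disease = ?
            INTERSECT
            SELECT gene FROM gene_disease WHERE disease = ?
        )
        -- Step 2: proteins encoded by those genes
        ORDER BY gp.gene, gp.protein
    """
    return conn.execute(sql, (disease_a, disease_b)).fetchall()


if __name__ == "__main__":
    conn = build_toy_db()
    for gene, protein in proteins_for_shared_genes(
        conn, "breast cancer", "prostate cancer"
    ):
        print(gene, protein)
```

In the toy data, `GENE_B` is annotated only to breast cancer, so it and its protein are excluded from the result. The real web interface composes such queries graphically; the sketch only shows the shape of the underlying question.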