Comprehensive gene set enrichment analysis web server Enrichr


Enrichr is a web server that provides comprehensive gene set enrichment analysis. It contains over 31,000 gene sets across 35 libraries covering categories like transcription, pathways, cell types, and diseases. These gene sets represent human and mouse genomes and proteomes. Enrichr calculates enrichment using methods like Fisher exact test, z-score, and combined score. It offers advantages of being easy to use with interactive visualization, but lacks some flexibility and features of other tools.

Enrichr
a comprehensive gene set enrichment analysis web server
INFO-703 Biological Data Management
March 27th, 2019
Presented by Thi Nguyen
http://amp.pharm.mssm.edu/Enrichr/

EnrichR gene-set libraries
• 35 gene-set libraries: transcription, pathways, ontologies, diseases/drugs,
cell types and misc.
• total = 31,026 gene-sets that completely cover human and mouse
genome + proteome
• on average, each gene-set has ~350 genes, and there are > 6 million connections between
genes and terms.
• gene frequencies for most gene-set libraries follow a power law.
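
The statistics on this slide (genes per set, gene-term connections, gene frequencies) can be reproduced on any gene-set library. The sketch below is an illustration only: `toy_library` and `library_stats` are hypothetical names, and the four toy terms stand in for a real library, which would be much larger.

```python
from collections import Counter

# Hypothetical toy library: term name -> list of member genes.
# A real Enrichr library would have hundreds of terms; this is only a stand-in.
toy_library = {
    "term_A": ["TP53", "MYC", "EGFR"],
    "term_B": ["TP53", "MYC"],
    "term_C": ["TP53", "BRCA1"],
    "term_D": ["TP53"],
}

def library_stats(library):
    """Return per-gene frequencies, total gene-term connections, mean set size."""
    # Count in how many gene sets each gene appears (duplicates within a set ignored).
    gene_counts = Counter(g for genes in library.values() for g in set(genes))
    # Every (gene, term) membership is one connection.
    connections = sum(gene_counts.values())
    mean_size = connections / len(library)
    return gene_counts, connections, mean_size

gene_counts, connections, mean_size = library_stats(toy_library)

# Histogram of gene frequencies: how many genes appear in exactly k gene sets.
frequency_histogram = Counter(gene_counts.values())
```

In the toy data one gene (TP53) appears in every set while most genes appear once. A long-tailed histogram of this kind is what a power law describes at full scale.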

EnrichR gene-set libraries
I. Transcription category: links differentially expressed genes (DEGs) with transcription factors:
1. ChIP-x Enrichment Analysis (ChEA) database
2. Position weight matrices (PWM) from TRANSFAC database
3. transcription factor target genes inferred from PWM
4. ENCODE transcription factor gene-set library
5. Histone modifications extracted from the
NIH Roadmap Epigenomics Project
6. microRNA gene set library from TargetScan

EnrichR gene-set libraries
II. Pathway Category includes gene-set libraries from well-known databases:
• WikiPathways
• KEGG
• BioCarta
• Reactome
and other libraries are created from their own resources:
• kinase enrichment analysis (KEA)
• PPI hubs
• CORUM
• complexes from IP-MS study
• manually assembled lists of phosphoproteins from SILAC phosphoproteomics
III. Ontology Category: contains gene-set libraries from the 3 Gene Ontology trees
and from the knockout mouse phenotype ontology from the MGI-MP browser (Jackson
Lab)

EnrichR gene-set libraries
IV. Disease/drug category:
• CMAP database
• GeneSigDB
• MSigDB
• OMIM
• VirusMINT
V. Cell type category:
• highly expressed genes from Mouse and Human Gene Atlases
• highly expressed genes from cancer cells from Cancer Cell Line Encyclopedia (CCLE)
• NCI-60 Cell line data set
VI. Misc category:
• chromosome location (MSigDB)
• metabolites (HMDB)
• structural domains (PFAM and InterPro)

3 methods to rank enrichment scores
1. Fisher exact test
2. z-score of the deviation from the expected
rank by Fisher exact test
3. combined score that multiplies the log of the
p-value (Fisher exact test) by the z-score
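
The three scores can be sketched in a few lines. This is a minimal illustration, not Enrichr's implementation. It assumes a one-sided Fisher exact test computed from the hypergeometric distribution, and it assumes the expected rank and its standard deviation come from running many random input lists. All numbers and function names are hypothetical.

```python
from math import comb, log

def fisher_exact_right_tail(overlap, input_size, set_size, background_size):
    """One-sided Fisher exact p-value: P(X >= overlap) under the hypergeometric null."""
    upper = min(input_size, set_size)
    total = comb(background_size, input_size)
    tail = sum(
        comb(set_size, k) * comb(background_size - set_size, input_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

def rank_z_score(observed_rank, expected_rank, rank_std):
    """z-score of the deviation of a term's rank from its expected rank."""
    return (observed_rank - expected_rank) / rank_std

def combined_score(p_value, z_score):
    """Combined score c = log(p) * z, using the natural log."""
    return log(p_value) * z_score

# Hypothetical example: 5 of 100 input genes fall in a 50-gene set,
# against an assumed background of 20,000 genes.
p = fisher_exact_right_tail(overlap=5, input_size=100, set_size=50, background_size=20000)

# Assumed rank statistics for this term (in practice estimated from random lists).
z = rank_z_score(observed_rank=3, expected_rank=40, rank_std=12)

c = combined_score(p, z)
```

A term that ranks better than expected has a negative z-score. Because log(p) is also negative, the combined score comes out positive, and larger values mean stronger enrichment.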

Using EnrichR
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-14-128

2016 NAR update
• 180,184 annotated gene sets from 102 gene set libraries (GEO)
• new features:
  • submit fuzzy sets
  • upload BED files
  • improved API
  • visualization tool: clustergram
  • benchmarking of the different scoring schemes
  • visualize overlap between Enrichr and other gene set libraries

2016 NAR update
comparison of resources
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4987924/

2016 NAR update
comparison of resources
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4987924/
User interface pros and cons
There are many other gene set enrichment analysis tools that could be compared with
Enrichr; for example, some leading tools are Fidea (39), DAVID (13), WebGestalt (12), g:Profiler
(12) and GSEA (40). The advantages of Enrichr over some of these tools are its
comprehensiveness, ease of use and interactive visualization of the results. Enrichr is lacking
some of the flexibility available with those other tools. For example, Enrichr merges human,
mouse and rat genes, which has advantages and disadvantages. Enrichr does not have an ID
conversion tool, which is highly desired by many users. Enrichr also does not have the ability
to upload a background list, and it does not have implementation of parametric tests such as
Gene Set Enrichment Analysis (GSEA) (40), Parametric Analysis of Gene set Enrichment (PAGE)
(9), and our own Principal Angle Enrichment Analysis (PAEA) (41). These features are planned.
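
Why a background list matters can be shown with a small calculation. The sketch below is an illustration under stated assumptions: it uses a hypergeometric (one-sided Fisher exact) p-value, and the two background sizes are hypothetical, standing in for a genome-wide default and a smaller set of genes actually measured in an experiment.

```python
from math import comb

def enrichment_p_value(overlap, input_size, set_size, background_size):
    """P(X >= overlap) when input_size genes are drawn from the background."""
    upper = min(input_size, set_size)
    total = comb(background_size, input_size)
    tail = sum(
        comb(set_size, k) * comb(background_size - set_size, input_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Same overlap (5 of 100 input genes in a 50-gene set), two assumed backgrounds.
p_genome_wide = enrichment_p_value(5, 100, 50, background_size=20000)
p_measured_only = enrichment_p_value(5, 100, 50, background_size=2000)
```

With the larger background the expected overlap is smaller, so the same five shared genes look far more significant. A tool that fixes the background cannot correct for this when only a subset of genes could have been detected.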

Editor's Notes

Enrichr is a database of gene-set libraries and an enrichment analysis tool developed by the Ma'ayan Lab.
Enrichment analysis is a popular method for analyzing gene sets generated by genome-wide experiments. Enrichr includes 35 gene-set libraries totaling 31,026 gene-sets that completely cover the human and mouse genome and proteome (Table 1). On average, each gene-set has ~350 genes and there are over six million connections between terms and genes. Further statistics and information on where the gene-set libraries were derived from can be found in the “Dataset Statistics” tab of the Enrichr main page. Histograms of gene frequencies for most gene-set libraries follow a power law, suggesting that some genes are much more common in gene-set libraries than others (Figure 2a). This has an implication for enrichment computations that we did not consider yet in Enrichr. Some genes are more likely than others to appear in various enrichment analyses; this tendency can stem from various sources, including the fact that some genes are well studied. This research focus bias is present in several of the libraries.
Enrichr contains 35 gene-set libraries where some libraries are borrowed from other tools while many other libraries are newly created and only available in Enrichr. The gene-set libraries provided by Enrichr are divided into six categories: transcription, pathways, ontologies, diseases/drugs, cell types and miscellaneous. The following is a description of each library and how it was created: The transcription category provides six gene-set libraries that attempt to link differentially expressed genes with the transcriptional machinery. These six libraries include the ability to identify transcription factors that are enriched for target genes within the input list using four different options: 1) ChEA [10]; 2) position weight matrices (PWMs) from TRANSFAC [11] and JASPAR [12]; 3) target genes generated from PWMs downloaded from the UCSC genome browser [13]; and 4) transcription factor targets extracted from the ENCODE project [14, 15]. In addition, the two other gene-set libraries in the transcription category are gene sets associated with: 5) histone modifications extracted from the Roadmap Epigenomics Project [16]; and 6) microRNA targets computationally predicted by TargetScan [17].
Enrichr computes three types of enrichment scores to assess the significance of overlap between the input list and the gene sets in each gene-set library for ranking a term’s relevance to the input list. These tests are: 1) the Fisher exact test, a test that is implemented in most gene list enrichment analysis programs; 2) a test statistic that we developed, which is the z-score of the deviation from the expected rank by the Fisher exact test; and 3) a combined score that multiplies the log of the p-value computed with the Fisher exact test by the z-score computed by our correction to the test.
Enrichr workflow. Enrichr receives lists of human or mouse genes as input. It uses 35 gene-set libraries to compute enrichment. The enrichment results are interactively displayed as bar graphs, tables, grids of terms with the enriched terms highlighted, and networks of enriched terms. After submitting the list for analysis, the user is presented with the results page, which is divided into the six different categories: transcription, pathways, ontologies, disease/drugs, cell types, and miscellaneous. Clicking on the name of the gene-set library expands a box that reveals the enrichment analysis results for that gene-set library. Users are first presented with a bar graph that shows the top 10 enriched terms for the selected gene-set library (Figure 1 and Additional file 2: Figure S2). The bar graph provides a visual representation of how significant each term is based on the overlap with the user’s input list. The longer bars and lighter colored bars mean that the term is more significant. It is possible to export the bar graph as a figure for publication or other form of presentation into three formats: JPEG, SVG and PNG. In addition, the color of the bar graph can be customized using a hexagonal color selection wheel populated with colors that provide the best contrast. There are three methods to compute enrichment and the user can toggle between them by clicking on any bar of the bar graph: Fisher exact test based ranking, rank based ranking, and combined score ranking.
Overall, Enrichr is a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries. The new gene set libraries that were added include differentially expressed genes after drug, gene, disease and pathogen perturbations extracted from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) through a crowdsourcing project. Furthermore, we have implemented the ability to submit fuzzy sets, upload BED files, a calendar that shows the number of lists submitted each day, an improved application programming interface (API), enhanced help documentation, an improved Find a Gene feature, and visualization of the results as clustergrams. In this manuscript, we also provide updated benchmarking results of the different scoring schemes implemented in Enrichr and visualize the overlap between the data sets currently within Enrichr compared with other comparable web-server tools and resources that serve gene set libraries.
https://www.youtube.com/watch?v=HfUZdNJ9a3A
