Efficient querying of genomic reference databases with gget
Luomeng Tan
April 5, 2023
gget
• free, open-source
• command-line tool & Python package
• pip install or conda install
Overview
gget.ref(species, which='all', release=None,
ftp=False, save=False, list_species=False)
which = ['gtf', 'cdna', 'dna', 'cds', 'cdrna', 'pep']
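A minimal sketch of how the `which` argument could be assembled before calling `gget.ref`. The helper names `build_which` and `fetch_reference_links` are hypothetical and not part of gget; the allowed values are the ones listed on the slide. The gget call sits inside a function so that nothing touches the network until it is called.

```python
# File types listed on the slide for the 'which' argument of gget.ref.
ALLOWED_WHICH = ("gtf", "cdna", "dna", "cds", "cdrna", "pep")


def build_which(*parts):
    """Validate the requested file types; fall back to 'all' when none are given."""
    if not parts:
        return "all"
    unknown = [part for part in parts if part not in ALLOWED_WHICH]
    if unknown:
        raise ValueError(f"unsupported 'which' entries: {unknown}")
    return list(parts)


def fetch_reference_links(species, *parts, release=None):
    """Return only the FTP links (ftp=True). Needs gget installed and network access."""
    import gget  # imported lazily so build_which works without gget

    return gget.ref(species, which=build_which(*parts), release=release, ftp=True)
```

For example, `fetch_reference_links("homo_sapiens", "gtf", "dna", release=104)` would request the annotation and genome links from Ensembl release 104.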
gget.search(searchwords, species, id_type='gene',
seqtype=None, andor='or', limit=None, wrap_text=False,
json=False, save=False)
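The difference between `andor='or'` and `andor='and'` can be illustrated locally. The function `matches` below is a hypothetical stand-in that mimics the documented matching rule on a single name/description string; it is not how gget queries Ensembl. `search_genes` is an assumed thin wrapper around the documented call.

```python
def matches(description, searchwords, andor="or"):
    """Case-insensitive check of search words against one name/description."""
    if isinstance(searchwords, str):
        searchwords = [searchwords]
    text = description.lower()
    hits = [word.lower() in text for word in searchwords]
    # 'and' requires every word to be present, 'or' requires at least one.
    return all(hits) if andor == "and" else any(hits)


def search_genes(searchwords, species="homo_sapiens", andor="or", limit=None):
    """Query Ensembl through gget. Needs gget installed and network access."""
    import gget  # imported lazily so matches() works without gget

    return gget.search(searchwords, species, id_type="gene", andor=andor, limit=limit)
```

With the search words `["GABA", "gamma-aminobutyric"]`, a description such as "gamma-aminobutyric acid type A receptor" matches under `'or'` but not under `'and'`, because "GABA" itself does not occur in it.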
1. Find the most correlated genes to a gene of interest
Use data from the human and mouse RNA-seq database ARCHS4
gget.archs4(gene, ensembl=False, which='correlation',
gene_count=100, species='human', json=False, save=False)
2. Find the tissue expression atlas of a gene of interest
gget.archs4(gene, ensembl=False, which='correlation',
gene_count=100, species='human', json=False, save=False)
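The two ARCHS4 use cases take different arguments: `gene_count` applies only to the correlation table and `species` only to the tissue atlas. A minimal sketch of how a caller might keep them apart; `archs4_arguments` and `query_archs4` are hypothetical helper names, not gget functions.

```python
def archs4_arguments(which="correlation", gene_count=100, species="human"):
    """Keep only the arguments that apply to the requested ARCHS4 query."""
    if which == "correlation":
        # gene_count is documented as applying to the correlation table only.
        return {"which": "correlation", "gene_count": gene_count}
    if which == "tissue":
        if species not in ("human", "mouse"):
            raise ValueError("species must be 'human' or 'mouse'")
        # species is documented as applying to the tissue atlas only.
        return {"which": "tissue", "species": species}
    raise ValueError("which must be 'correlation' or 'tissue'")


def query_archs4(gene, ensembl=False, **options):
    """Run the query. Needs gget installed and network access."""
    import gget  # imported lazily so archs4_arguments works without gget

    return gget.archs4(gene, ensembl=ensembl, **archs4_arguments(**options))
```

`query_archs4("STAT4", gene_count=10)` would ask for the ten most correlated genes, and `query_archs4("STAT4", which="tissue", species="mouse")` for the mouse tissue expression atlas.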
Database to use as reference:
• 'pathway' (KEGG_2021_Human)
• 'transcription' (ChEA_2016)
• 'ontology' (GO_Biological_Process_2021)
• 'diseases_drugs' (GWAS_Catalog_2019)
• 'celltypes' (PanglaoDB_Augmented_2021)
• 'kinase_interactions' (KEA_2015)
gget.enrichr(genes, database, ensembl=False, plot=False,
figsize=(10, 10), ax=None, json=False, save=False)
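The shortcut table above can be written down as a plain mapping. This is only a sketch that mirrors the slide: gget resolves the shortcuts itself, and `resolve_database` and `run_enrichment` are hypothetical names used here to show which library a shortcut stands for.

```python
# Shortcuts and their default Enrichr libraries, as listed on the slide.
ENRICHR_SHORTCUTS = {
    "pathway": "KEGG_2021_Human",
    "transcription": "ChEA_2016",
    "ontology": "GO_Biological_Process_2021",
    "diseases_drugs": "GWAS_Catalog_2019",
    "celltypes": "PanglaoDB_Augmented_2021",
    "kinase_interactions": "KEA_2015",
}


def resolve_database(database):
    """Return the library behind a shortcut; other names pass through unchanged."""
    return ENRICHR_SHORTCUTS.get(database, database)


def run_enrichment(genes, database="pathway", ensembl=False):
    """Run the enrichment analysis. Needs gget installed and network access."""
    import gget  # imported lazily so resolve_database works without gget

    return gget.enrichr(genes, database=database, ensembl=ensembl)
```

For example, `run_enrichment(['PHF14', 'RBM3', 'MSL1', 'PHF21A'], database="ontology")` would run against the library that `resolve_database("ontology")` reports.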
gget.info(ens_ids, wrap_text=False, pdb=False,
ensembl_only=False, json=False, verbose=True,
save=False, expand=False)
• ensembl_id
• uniprot_id
• pdb_id
• ncbi_gene_id
• species
• assembly_name
• primary_gene_name
• ensembl_gene_name
• synonyms
• parent_gene
• protein_names
• ensembl_description
• uniprot_description
• ncbi_description
• subcellular_localisation
• object_type
• biotype
• canonical_transcript
• seq_region_name
• strand
• start
• end
• all_transcripts
• transcript_biotype
• transcript_names
• transcript_strands
• transcript_starts
• transcript_ends
• all_exons
• exon_starts
• exon_ends
• all_translations
• translation_starts
• translation_ends
All fields in gget info results
gget.seq(ens_ids, translate=False, isoforms=False,
save=False, transcribe=None, seqtype=None)
If translate = False, it returns nucleotide sequences
gget.seq(ens_ids, translate=False, isoforms=False,
save=False, transcribe=None, seqtype=None)
If translate = True, it returns amino acid sequences
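A short sketch of switching between the two return types and tidying the result. It assumes that the list returned by `gget.seq` holds FASTA header lines (starting with ">") followed by their sequence lines; that layout is an assumption to check against your gget version. `fetch_sequences` and `pair_fasta` are hypothetical helper names.

```python
def fetch_sequences(ens_ids, amino_acid=False, isoforms=False):
    """translate=False gives nucleotide sequences, translate=True amino acid sequences.

    Needs gget installed and network access.
    """
    import gget  # imported lazily so pair_fasta works without gget

    return gget.seq(ens_ids, translate=amino_acid, isoforms=isoforms)


def pair_fasta(lines):
    """Group FASTA-style lines into {header: sequence} (assumed list layout)."""
    records = {}
    header = None
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            # Sequence lines are appended to the most recent header.
            records[header] += line.strip()
    return records
```

`pair_fasta(fetch_sequences("ENSG00000138378", amino_acid=True))` would then give one entry per returned record.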
Use the MUSCLE algorithm to align the nucleotide/amino acid sequences of all transcripts
gget.muscle(fasta, super5=False, out=None)
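Because `gget.muscle` takes a path to a FASTA file, the sequences of all transcripts have to be written to disk first. A minimal sketch of that hand-off, assuming `gget.seq` returns a list of FASTA lines; `write_fasta` and `align_transcripts` are hypothetical helper names and the file name is arbitrary.

```python
from pathlib import Path


def write_fasta(lines, path):
    """Write FASTA lines to a file, one per line, and return the path."""
    path = Path(path)
    path.write_text("\n".join(lines) + "\n")
    return path


def align_transcripts(gene_id, fasta_path="transcripts.fa", amino_acid=False):
    """Fetch all isoforms of a gene and align them. Needs gget and network access."""
    import gget  # imported lazily so write_fasta works without gget

    lines = gget.seq(gene_id, translate=amino_acid, isoforms=True)
    fasta = write_fasta(lines, fasta_path)
    # With out=None the alignment is printed in Clustal format.
    return gget.muscle(str(fasta))
```

Passing `save=True` to `gget.seq` is the documented alternative for getting a FASTA file directly.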
BLAST the nucleotide or amino acid sequence of the canonical transcript:
gget.blast(sequence, program='default', database='default',
limit=50, expect=10.0, low_comp_filt=False, megablast=True,
verbose=True, wrap_text=False, json=False, save=False)
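With `program='default'` and `database='default'`, the choice depends on the sequence type: 'blastn' and 'nt' for nucleotide sequences, 'blastp' and 'nr' for amino acid sequences. The sketch below makes that choice explicit with a simple alphabet check. The check is an illustrative assumption, not gget's own detection logic, and `guess_blast_defaults` and `run_blast` are hypothetical names.

```python
NUCLEOTIDE_CODES = set("ACGTUN")


def guess_blast_defaults(sequence):
    """Pick (program, database) from the characters found in the sequence."""
    residues = set("".join(sequence.split()).upper())
    if not residues:
        raise ValueError("sequence is empty")
    if residues <= NUCLEOTIDE_CODES:
        return "blastn", "nt"
    return "blastp", "nr"


def run_blast(sequence, limit=50):
    """Submit the sequence to BLAST. Needs gget installed and network access."""
    import gget  # imported lazily so guess_blast_defaults works without gget

    program, database = guess_blast_defaults(sequence)
    return gget.blast(sequence, program=program, database=database, limit=limit)
```

A peptide made up only of A, C, G, T and N would be misread as nucleotide by this check, which is one reason to pass `program` and `database` explicitly when the type is known.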
BLAT the gene nucleotide/amino acid sequence to find its genomic location:
gget.blat(sequence, seqtype='default',
assembly='human', json=False, save=False)
gget.alphafold(sequence,
out="./[date_time]_gget_alphafold_prediction",
multimer_for_monomer=False, relax=False, multimer_recycles=3,
plot=True, show_sidechains=True)
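The default `out` folder carries a date and time stamp. A sketch of building a name in the same style, assuming the stamp follows the pattern of the example folder name in the notes (2022_12_30-1803, read here as year_month_day-hourminute). `default_out_folder` and `predict_structure` are hypothetical helper names.

```python
from datetime import datetime


def default_out_folder(now=None):
    """Build a folder name like ./2022_12_30-1803_gget_alphafold_prediction."""
    now = now or datetime.now()
    # Assumed stamp format: year_month_day-hourminute.
    return "./" + now.strftime("%Y_%m_%d-%H%M") + "_gget_alphafold_prediction"


def predict_structure(sequence, out=None, relax=False):
    """Run the prediction. Needs gget with its AlphaFold dependencies installed."""
    import gget  # imported lazily so default_out_folder works without gget

    return gget.alphafold(sequence, out=out or default_out_folder(), relax=relax)
```

Naming the folder explicitly makes it easier to find the predicted aligned error (json) and the prediction (PDB) afterwards.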
Overview

Editor's Notes

  1. gget is a free, open-source command-line tool and Python package that enables efficient querying of genomic databases. It consists of a collection of separate but interoperable modules, each designed to facilitate one type of database query in a single line of code. Most researchers currently access genomic reference databases through manual web access to annotate and functionally characterize putative marker genes. Manual web access is time-consuming and potentially error-prone, as it requires manually copying and pasting data such as gene IDs.
  2. ref(species, which='all', release=None, ftp=False, save=False, list_species=False)
     Fetch FTPs for reference genomes and annotations by species from Ensembl.
     Args:
     • species: Defines the species for which the reference should be fetched, in the format "<genus>_<species>", e.g. species = "homo_sapiens".
     • which: Defines which results to return. Default: 'all' -> returns all available results. Possible entries are one or a combination (as a list of strings) of the following:
       'gtf' - Returns the annotation (GTF).
       'cdna' - Returns the transcriptome (cDNA).
       'dna' - Returns the genome (DNA).
       'cds' - Returns the coding sequences corresponding to Ensembl genes. (Does not contain UTR or intronic sequence.)
       'cdrna' - Returns transcript sequences corresponding to non-coding RNA genes (ncRNA).
       'pep' - Returns the protein translations of Ensembl genes.
     • release: Defines the Ensembl release number from which the files are fetched, e.g. release = 104. Default: None -> the latest Ensembl release is used.
     • ftp: Return only the requested FTP links in a list (default: False).
     • save: Save the results in the local directory (default: False).
     • list_species: If True and `species=None`, returns a list of all available species from the Ensembl database for large genomes (not including plants/bacteria) (default: False). (Can be combined with `release` to get the available species from a specific Ensembl release.)
     Returns a dictionary containing the requested URLs with their respective Ensembl version and release date and time. (If ftp=True, returns a list containing only the URLs.)
  3. search(searchwords, species, id_type='gene', seqtype=None, andor='or', limit=None, wrap_text=False, json=False, save=False)
     Query Ensembl for genes based on species and free-form search terms. Automatically fetches results from the latest Ensembl release, unless the user specifies a database (see the 'species' argument).
     Args:
     • searchwords: Free-form search words (not case-sensitive) as a string or list of strings, e.g. searchwords = ["GABA", "gamma-aminobutyric"].
     • species: Species can be passed in the format "genus_species", e.g. "homo_sapiens". To pass a specific database, enter the name of the core database, e.g. 'mus_musculus_dba2j_core_105_1'. All available species databases can be found here: http://ftp.ensembl.org/pub/release-106/mysql/
     • id_type: "gene" (default) or "transcript". Defines whether genes or transcripts matching the searchwords are returned.
     • andor: "or" (default) or "and".
       "or": Returns all genes that INCLUDE AT LEAST ONE of the searchwords in their name/description.
       "and": Returns only genes that INCLUDE ALL of the searchwords in their name/description.
     • limit: (int) Limit the number of search results returned (default: None).
     • wrap_text: If True, displays data frame with wrapped text for easy reading (default: False).
     • json: If True, returns results in json format instead of data frame (default: False).
     • save: If True, the data frame is saved as a csv in the current directory (default: False).
     Returns a data frame with the query results.
     Deprecated arguments: 'seqtype' (renamed to id_type)
  4. archs4(gene, ensembl=False, which='correlation', gene_count=100, species='human', json=False, save=False)
     Find the most correlated genes or the tissue expression atlas of a gene of interest using data from the human and mouse RNA-seq database ARCHS4 (https://maayanlab.cloud/archs4/).
     Args:
     • gene: Short name (Entrez gene symbol) of the gene of interest (str), e.g. 'STAT4'. Set 'ensembl=True' to input an Ensembl gene ID, e.g. ENSG00000138378.
     • ensembl: Define as 'True' if 'gene' is an Ensembl gene ID (default: False).
     • which: 'correlation' (default) or 'tissue'.
       'correlation' returns a gene correlation table that contains the 100 most correlated genes to the gene of interest. The Pearson correlation is calculated over all samples and tissues in ARCHS4.
       'tissue' returns a tissue expression atlas calculated from human or mouse samples (as defined by 'species') in ARCHS4.
     • gene_count: Number of correlated genes to return (default: 100). (Only for gene correlation.)
     • species: 'human' (default) or 'mouse'. (Only for tissue expression atlas.)
     • json: If True, returns results in json format instead of data frame (default: False).
     • save: True/False whether to save the results in the local directory.
     Returns a data frame with the requested results. The Pearson correlation is calculated over all samples and tissues. The gene list can be uploaded to Enrichr for further investigation.
  6. enrichr(genes, database, ensembl=False, plot=False, figsize=(10, 10), ax=None, json=False, save=False)
     Perform an enrichment analysis on a list of genes using Enrichr (https://maayanlab.cloud/Enrichr/).
     Args:
     • genes: List of Entrez gene symbols to perform enrichment analysis on, passed as a list of strings, e.g. ['PHF14', 'RBM3', 'MSL1', 'PHF21A']. Set 'ensembl = True' to input a list of Ensembl gene IDs, e.g. ['ENSG00000106443', 'ENSG00000102317', 'ENSG00000188895'].
     • database: Database to use as reference for the enrichment analysis. Supported shortcuts (and their default database):
       'pathway' (KEGG_2021_Human)
       'transcription' (ChEA_2016)
       'ontology' (GO_Biological_Process_2021)
       'diseases_drugs' (GWAS_Catalog_2019)
       'celltypes' (PanglaoDB_Augmented_2021)
       'kinase_interactions' (KEA_2015)
       or any database listed under Gene-set Library at: https://maayanlab.cloud/Enrichr/#libraries
     • ensembl: Define as 'True' if 'genes' is a list of Ensembl gene IDs (default: False).
     • plot: True/False whether to provide a graphical overview of the first 15 results (default: False).
     • figsize: (width, height) of plot in inches (default: (10, 10)).
     • ax: Pass a matplotlib axes object for further customization of the plot (default: None).
     • json: If True, returns results in json format instead of data frame (default: False).
     • save: True/False whether to save the results in the local directory (default: False).
     Returns a data frame with the Enrichr results.
  8. info(ens_ids, wrap_text=False, pdb=False, ensembl_only=False, json=False, verbose=True, save=False, expand=False)
     Fetch gene and transcript metadata using Ensembl IDs.
     Args:
     • ens_ids: One or more Ensembl IDs to look up (string or list of strings). Also supports WormBase and FlyBase IDs.
     • wrap_text: If True, displays data frame with wrapped text for easy reading (default: False).
     • pdb: If True, also returns PDB IDs (might increase run time) (default: False).
     • ensembl_only: If True, only returns results from Ensembl (excludes PDB, UniProt, and NCBI results) (default: False).
     • json: If True, returns results in json/dictionary format instead of data frame (default: False).
     • verbose: True/False whether to print progress information (default: True).
     • save: True/False whether to save a csv with the query results in the current working directory (default: False).
     Returns a data frame containing the requested information.
  10. seq(ens_ids, translate=False, isoforms=False, save=False, transcribe=None, seqtype=None)
      Fetch nucleotide or amino acid sequence (FASTA) of a gene (and all its isoforms) or transcript by Ensembl, WormBase or FlyBase ID.
      Args:
      • ens_ids: One or more Ensembl IDs (passed as string or list of strings). Also supports WormBase and FlyBase IDs.
      • translate: True/False (default: False -> returns nucleotide sequences). Defines whether nucleotide or amino acid sequences are returned. Nucleotide sequences are fetched from the Ensembl REST API server. Amino acid sequences are fetched from the UniProt REST API server.
      • isoforms: If True, returns the sequences of all known transcripts (default: False). (Only for gene IDs.)
      • save: If True, saves output FASTA to current directory (default: False).
      Returns a list (or FASTA file if 'save=True') containing the requested sequences.
  12. muscle(fasta, super5=False, out=None)
      Align multiple nucleotide or amino acid sequences against each other (using the Muscle v5 algorithm).
      Args:
      • fasta: Path to fasta file containing the sequences to be aligned.
      • super5: True/False (default: False). If True, align input using the Super5 algorithm instead of the PPP algorithm to decrease time and memory. Use for large inputs (a few hundred sequences).
      • out: Path to save an 'aligned FASTA' (.afa) file with the results, e.g. 'path/to/directory/results.afa'. Default: 'None' -> results will be printed in Clustal format.
      Returns alignment results in an "aligned FASTA" (.afa) file.
  13. blast(sequence, program='default', database='default', limit=50, expect=10.0, low_comp_filt=False, megablast=True, verbose=True, wrap_text=False, json=False, save=False)
      BLAST a nucleotide or amino acid sequence against any BLAST DB.
      Args:
      • sequence: Sequence (str) or path to FASTA file. (If there is more than one sequence in the FASTA file, only the first will be submitted to BLAST.)
      • program: 'blastn', 'blastp', 'blastx', 'tblastn', or 'tblastx'. Default: 'blastn' for nucleotide sequences; 'blastp' for amino acid sequences.
      • database: 'nt', 'nr', 'refseq_rna', 'refseq_protein', 'swissprot', 'pdbaa', or 'pdbnt'. Default: 'nt' for nucleotide sequences; 'nr' for amino acid sequences. More info on BLAST databases: https://ncbi.github.io/blast-cloud/blastdb/available-blastdbs.html
      • limit: Limits the number of hits to return (default: 50).
      • expect: float or None. An expect value cutoff (default: 10.0).
      • low_comp_filt: True/False whether to apply the low complexity filter (default: False).
      • megablast: True/False whether to use the MegaBLAST algorithm (blastn only) (default: True).
      • verbose: True/False whether to print progress information (default: True).
      • wrap_text: If True, displays data frame with wrapped text for easy reading (default: False).
      • json: If True, returns results in json/dictionary format instead of data frame (default: False).
      • save: If True, the data frame is saved as a csv in the current directory (default: False).
      Returns a data frame with the BLAST results.
  14. blat(sequence, seqtype='default', assembly='human', json=False, save=False)
      BLAT a nucleotide or amino acid sequence against any BLAT UCSC assembly.
      Args:
      • sequence: Sequence (str) or path to FASTA file containing one sequence.
      • seqtype: 'DNA', 'protein', 'translated%20RNA', or 'translated%20DNA'. Default: 'DNA' for nucleotide sequences; 'protein' for amino acid sequences.
      • assembly: 'human' (hg38) (default), 'mouse' (mm39), 'zebrafinch' (taeGut2), or any of the species assemblies available at https://genome.ucsc.edu/cgi-bin/hgBlat (use the short assembly name as listed after the "/").
      • json: If True, returns results in json format instead of a data frame. Default: False.
      • save: If True, the data frame is saved as a csv in the current directory. Default: False.
      Returns a data frame with the BLAT results.
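Two details of the blat arguments are easy to trip over: the '%20' in the seqtype values is simply a URL-encoded space, and the assembly shortcuts stand for UCSC short assembly names. The following sketch shows both; `ASSEMBLY_SHORTCUTS` and `resolve_assembly` are hypothetical names built from the values on the slide, not gget internals.

```python
from urllib.parse import quote

# Shortcut -> UCSC short assembly name, as listed on the slide.
ASSEMBLY_SHORTCUTS = {
    "human": "hg38",
    "mouse": "mm39",
    "zebrafinch": "taeGut2",
}


def resolve_assembly(assembly):
    """Map a shortcut to its UCSC name; pass any other name through unchanged."""
    return ASSEMBLY_SHORTCUTS.get(assembly, assembly)


def encode_seqtype(label):
    """URL-encode a human-readable seqtype label, e.g. 'translated RNA'."""
    return quote(label)


encoded = encode_seqtype("translated RNA")
human_assembly = resolve_assembly("human")
other_assembly = resolve_assembly("taeGut2")
```

Here `encoded` has the same form as the 'translated%20RNA' value that blat expects for seqtype.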
  15. alphafold(sequence, out='2022_12_30-1803_gget_alphafold_prediction', multimer_for_monomer=False, relax=False, multimer_recycles=3, plot=True, show_sidechains=True)
      Predicts the structure of a protein using a slightly simplified version of AlphaFold v2.3.0 (https://doi.org/10.1038/s41586-021-03819-2) published in the AlphaFold Colab notebook (https://colab.research.google.com/github/deepmind/alphafold/blob/main/notebooks/AlphaFold.ipynb).
      Args:
      • sequence: Amino acid sequence (str), a list of sequences, or path to a FASTA file.
      • out: Path to folder to save prediction results in (str). Default: "./[date_time]_gget_alphafold_prediction"
      • multimer_for_monomer: Use the multimer model for a monomer (default: False).
      • multimer_recycles: The multimer model will continue recycling until the predictions stop changing, up to the limit set here (default: 3). For higher accuracy, at the potential cost of longer inference times, set this to 20.
      • relax: True/False whether to AMBER relax the best model (default: False).
      • plot: True/False whether to provide a graphical overview of the prediction (default: True).
      • show_sidechains: True/False whether to show side chains in the plot (default: True).
      Saves the predicted aligned error (json) and the prediction (PDB) in the defined 'out' folder.
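The default 'out' folder is named after the date and time of the run, which is why the signature shows '2022_12_30-1803_gget_alphafold_prediction'. A minimal sketch of how such a name can be built follows; the `%Y_%m_%d-%H%M` pattern is inferred from that one example and is an assumption, and `default_out_folder` is a hypothetical helper rather than gget's own code.

```python
from datetime import datetime


def default_out_folder(now=None):
    """Build a '[date_time]_gget_alphafold_prediction' style folder name."""
    now = now or datetime.now()
    # Assumed pattern: year_month_day-hourminute, matching the slide's example.
    return f"{now.strftime('%Y_%m_%d-%H%M')}_gget_alphafold_prediction"


# Reproduce the example from the signature: 30 December 2022, 18:03.
example_folder = default_out_folder(datetime(2022, 12, 30, 18, 3))
```

Passing an explicit `out` path instead keeps the results of repeated runs in a predictable location.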