Efficient querying of genomic reference databases with gget

Hoffman Lab

Hoffman Lab Tech Talk

Luomeng Tan
April 5, 2023

gget
• free, open-source
• command-line tool & Python package
• pip install or conda install

gget.ref(species, which='all', release=None,
ftp=False, save=False, list_species=False)
which = ['gtf', 'cdna', 'dna', 'cds', 'cdrna', 'pep']

gget.search(searchwords, species, id_type='gene',
seqtype=None, andor='or', limit=None, wrap_text=False,
json=False, save=False)

1. Find the most correlated genes to a gene of interest
Use data from the human and mouse RNA-seq database ARCHS4
gget.archs4(gene, ensembl=False, which='correlation',
gene_count=100, species='human', json=False, save=False)

2. Find the tissue expression atlas of a gene of interest
gget.archs4(gene, ensembl=False, which='correlation',
gene_count=100, species='human', json=False, save=False)

Database to use as reference:
• 'pathway' (KEGG_2021_Human)
• 'transcription' (ChEA_2016)
• 'ontology' (GO_Biological_Process_2021)
• 'diseases_drugs' (GWAS_Catalog_2019)
• 'celltypes' (PanglaoDB_Augmented_2021)
• 'kinase_interactions' (KEA_2015)
gget.enrichr(genes, database, ensembl=False, plot=False,
figsize=(10, 10), ax=None, json=False, save=False)

gget.info(ens_ids, wrap_text=False, pdb=False,
ensembl_only=False, json=False, verbose=True,
save=False, expand=False)

• ensembl_id
• uniprot_id
• pdb_id
• ncbi_gene_id
• species
• assembly_name
• primary_gene_name
• ensembl_gene_name
• synonyms
• parent_gene
• protein_names
• ensembl_description
• uniprot_description
• ncbi_description
• subcellular_localisation
• object_type
• biotype
• canonical_transcript
• seq_region_name
• strand
• start
• end
• all_transcripts
• transcript_biotype
• transcript_names
• transcript_strands
• transcript_starts
• transcript_ends
• all_exons
• exon_starts
• exon_ends
• all_translations
• translation_starts
• translation_ends
All fields in gget info results

gget.seq(ens_ids, translate=False, isoforms=False,
save=False, transcribe=None, seqtype=None)
If translate = False, it returns nucleotide sequences

gget.seq(ens_ids, translate=False, isoforms=False,
save=False, transcribe=None, seqtype=None)
If translate = True, it returns amino acid sequences

Use the MUSCLE algorithm to align the nucleotide/amino acid sequences of all transcripts:
gget.muscle(fasta, super5=False, out=None)

BLAST the gene's nucleotide sequence or the amino acid sequence of its canonical transcript:
gget.blast(sequence, program='default', database='default',
limit=50, expect=10.0, low_comp_filt=False, megablast=True,
verbose=True, wrap_text=False, json=False, save=False)

BLAT the gene nucleotide/amino acid sequence to find its genomic location:
gget.blat(sequence, seqtype='default',
assembly='human', json=False, save=False)

gget.alphafold(sequence,
out="./[date_time]_gget_alphafold_prediction",
multimer_for_monomer=False, relax=False, multimer_recycles=3,
plot=True, show_sidechains=True)

Editor's Notes

gget is a free, open-source command-line tool and Python package that enables efficient querying of genomic databases. It consists of a collection of separate but interoperable modules, each designed to facilitate one type of database query in a single line of code. A majority of researchers currently access genomic reference databases through manual web access to annotate and functionally characterize putative marker genes. Manual web access is time-consuming and potentially error-prone, as it requires manually copying and pasting data such as gene IDs.

ref(species, which='all', release=None, ftp=False, save=False, list_species=False)
Fetch FTP links for reference genomes and annotations by species from Ensembl.
Args:
• species: Defines the species for which the reference should be fetched, in the format "<genus>_<species>", e.g. species = "homo_sapiens".
• which: Defines which results to return. Default: 'all' -> returns all available results. Possible entries are one or a combination (as a list of strings) of the following:
  'gtf' - Returns the annotation (GTF).
  'cdna' - Returns the transcriptome (cDNA).
  'dna' - Returns the genome (DNA).
  'cds' - Returns the coding sequences corresponding to Ensembl genes. (Does not contain UTR or intronic sequence.)
  'cdrna' - Returns transcript sequences corresponding to non-coding RNA genes (ncRNA).
  'pep' - Returns the protein translations of Ensembl genes.
• release: Defines the Ensembl release number from which the files are fetched, e.g. release = 104. Default: None -> the latest Ensembl release is used.
• ftp: Return only the requested FTP links in a list (default: False).
• save: Save the results in the local directory (default: False).
• list_species: If True and `species=None`, returns a list of all available species from the Ensembl database for large genomes (not including plants/bacteria) (default: False). Can be combined with `release` to get the available species from a specific Ensembl release.
Returns a dictionary containing the requested URLs with their respective Ensembl version and release date and time. (If ftp=True, returns a list containing only the URLs.)
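
A minimal sketch of calling ref from Python, assuming gget is installed and the network is reachable. The helper names `normalize_which` and `fetch_reference_links` are hypothetical, and the allowed values are simply copied from the argument list above; the installed gget version may accept a different set.

```python
# Allowed values for `which`, copied from the argument description above.
ALLOWED_WHICH = ("gtf", "cdna", "dna", "cds", "cdrna", "pep")


def normalize_which(which="all"):
    """Return 'all' or a validated list of `which` entries (hypothetical helper)."""
    if which == "all":
        return "all"
    entries = [which] if isinstance(which, str) else list(which)
    unknown = [w for w in entries if w not in ALLOWED_WHICH]
    if unknown:
        raise ValueError(f"Unsupported 'which' entries: {unknown}")
    return entries


def fetch_reference_links(species, which="all", release=None):
    """Call gget.ref and return only the FTP links (ftp=True)."""
    import gget  # imported lazily; needs `pip install gget` and network access

    return gget.ref(species, which=normalize_which(which), release=release, ftp=True)
```

For example, `fetch_reference_links("homo_sapiens", which=["gtf", "dna"])` would request only the annotation and genome links.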

search(searchwords, species, id_type='gene', seqtype=None, andor='or', limit=None, wrap_text=False, json=False, save=False)
Function to query Ensembl for genes based on species and free-form search terms. Automatically fetches results from the latest Ensembl release, unless the user specifies a database (see the 'species' argument).
Args:
• searchwords: Free-form search words (not case-sensitive) as a string or list of strings, e.g. searchwords = ["GABA", "gamma-aminobutyric"].
• species: Species can be passed in the format "genus_species", e.g. "homo_sapiens". To pass a specific database, enter the name of the core database, e.g. 'mus_musculus_dba2j_core_105_1'. All available species databases can be found here: http://ftp.ensembl.org/pub/release-106/mysql/
• id_type: "gene" (default) or "transcript". Defines whether genes or transcripts matching the searchwords are returned.
• andor: "or" (default) or "and". "or" returns all genes that INCLUDE AT LEAST ONE of the searchwords in their name/description. "and" returns only genes that INCLUDE ALL of the searchwords in their name/description.
• limit: (int) Limit the number of search results returned (default: None).
• wrap_text: If True, displays the data frame with wrapped text for easy reading (default: False).
• json: If True, returns results in json format instead of a data frame (default: False).
• save: If True, the data frame is saved as a csv in the current directory (default: False).
Returns a data frame with the query results.
Deprecated arguments: 'seqtype' (renamed to id_type).
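
The difference between andor="or" and andor="and" can be illustrated locally. This is only a sketch of the matching rule described above, run on made-up descriptions; it is not how gget or Ensembl implement the search, and `matches` is a hypothetical helper.

```python
def matches(text, searchwords, andor="or"):
    """True if `text` contains any ('or') or all ('and') searchwords, ignoring case."""
    if isinstance(searchwords, str):
        searchwords = [searchwords]
    text = text.lower()
    hits = [word.lower() in text for word in searchwords]
    if andor == "or":
        return any(hits)
    if andor == "and":
        return all(hits)
    raise ValueError("andor must be 'or' or 'and'")


# Toy descriptions, invented for illustration (not Ensembl output).
genes = {
    "gene_a": "GABA type A receptor subunit",
    "gene_b": "gamma-aminobutyric acid (GABA) B receptor",
    "gene_c": "unrelated kinase",
}
words = ["GABA", "gamma-aminobutyric"]
either = [g for g, d in genes.items() if matches(d, words, "or")]
both = [g for g, d in genes.items() if matches(d, words, "and")]
```

Here `either` keeps every entry that mentions at least one search word, while `both` keeps only the entry that mentions all of them.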

archs4(gene, ensembl=False, which='correlation', gene_count=100, species='human', json=False, save=False)
Find the most correlated genes or the tissue expression atlas of a gene of interest using data from the human and mouse RNA-seq database ARCHS4 (https://maayanlab.cloud/archs4/).
Args:
• gene: Short name (Entrez gene symbol) of the gene of interest (str), e.g. 'STAT4'. Set 'ensembl=True' to input an Ensembl gene ID, e.g. ENSG00000138378.
• ensembl: Define as 'True' if 'gene' is an Ensembl gene ID (default: False).
• which: 'correlation' (default) or 'tissue'. 'correlation' returns a gene correlation table that contains the 100 most correlated genes to the gene of interest. The Pearson correlation is calculated over all samples and tissues in ARCHS4. 'tissue' returns a tissue expression atlas calculated from human or mouse samples (as defined by 'species') in ARCHS4.
• gene_count: Number of correlated genes to return (default: 100). (Only for gene correlation.)
• species: 'human' (default) or 'mouse'. (Only for tissue expression atlas.)
• json: If True, returns results in json format instead of a data frame (default: False).
• save: True/False whether to save the results in the local directory.
Returns a data frame with the requested results. The gene list can be uploaded to Enrichr for further investigation.
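
To make the 'correlation' mode concrete, the following sketch ranks genes by Pearson correlation with a gene of interest. It runs on a tiny invented expression table; ARCHS4 computes the correlation over its own samples, and the function names here are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def most_correlated(expression, gene, gene_count=100):
    """Return up to `gene_count` (gene, r) pairs, most correlated first."""
    target = expression[gene]
    scores = [
        (other, pearson(target, values))
        for other, values in expression.items()
        if other != gene
    ]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:gene_count]


# Invented expression values across four samples (illustration only).
expression = {
    "GENE1": [1.0, 2.0, 3.0, 4.0],
    "GENE2": [2.0, 4.0, 6.0, 8.5],
    "GENE3": [4.0, 3.0, 2.0, 1.0],
    "GENE4": [1.0, 3.0, 2.0, 4.0],
}
top = most_correlated(expression, "GENE1", gene_count=2)
```

The `gene_count` argument plays the same role as in archs4: it caps how many of the top-ranked genes are returned.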

enrichr(genes, database, ensembl=False, plot=False, figsize=(10, 10), ax=None, json=False, save=False)
Perform an enrichment analysis on a list of genes using Enrichr (https://maayanlab.cloud/Enrichr/).
Args:
• genes: List of Entrez gene symbols to perform enrichment analysis on, passed as a list of strings, e.g. ['PHF14', 'RBM3', 'MSL1', 'PHF21A']. Set 'ensembl = True' to input a list of Ensembl gene IDs, e.g. ['ENSG00000106443', 'ENSG00000102317', 'ENSG00000188895'].
• database: Database to use as reference for the enrichment analysis. Supported shortcuts (and their default database):
  'pathway' (KEGG_2021_Human)
  'transcription' (ChEA_2016)
  'ontology' (GO_Biological_Process_2021)
  'diseases_drugs' (GWAS_Catalog_2019)
  'celltypes' (PanglaoDB_Augmented_2021)
  'kinase_interactions' (KEA_2015)
  or any database listed under Gene-set Library at: https://maayanlab.cloud/Enrichr/#libraries
• ensembl: Define as 'True' if 'genes' is a list of Ensembl gene IDs (default: False).
• plot: True/False whether to provide a graphical overview of the first 15 results (default: False).
• figsize: (width, height) of the plot in inches (default: (10, 10)).
• ax: Pass a matplotlib axes object for further customization of the plot (default: None).
• json: If True, returns results in json format instead of a data frame (default: False).
• save: True/False whether to save the results in the local directory (default: False).
Returns a data frame with the Enrichr results.
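
The shortcut table above can be written down as a small lookup, which makes it explicit which Enrichr library a call would use. This is a sketch: `resolve_database` and `run_enrichment` are hypothetical helpers, and the actual call assumes gget is installed with network access.

```python
# Shortcut -> default Enrichr library, copied from the list above.
ENRICHR_SHORTCUTS = {
    "pathway": "KEGG_2021_Human",
    "transcription": "ChEA_2016",
    "ontology": "GO_Biological_Process_2021",
    "diseases_drugs": "GWAS_Catalog_2019",
    "celltypes": "PanglaoDB_Augmented_2021",
    "kinase_interactions": "KEA_2015",
}


def resolve_database(database):
    """Map a shortcut to its library name; pass other library names through unchanged."""
    return ENRICHR_SHORTCUTS.get(database, database)


def run_enrichment(genes, database="pathway"):
    """Run gget.enrichr on a list of gene symbols and return its result."""
    import gget  # imported lazily; needs `pip install gget` and network access

    return gget.enrichr(genes, database=database)
```

A call such as `run_enrichment(['PHF14', 'RBM3', 'MSL1', 'PHF21A'], database='ontology')` would, per the table, use GO_Biological_Process_2021 as the reference.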

info(ens_ids, wrap_text=False, pdb=False, ensembl_only=False, json=False, verbose=True, save=False, expand=False)
Fetch gene and transcript metadata using Ensembl IDs.
Args:
• ens_ids: One or more Ensembl IDs to look up (string or list of strings). Also supports WormBase and FlyBase IDs.
• wrap_text: If True, displays the data frame with wrapped text for easy reading (default: False).
• pdb: If True, also returns PDB IDs (might increase run time) (default: False).
• ensembl_only: If True, only returns results from Ensembl (excludes PDB, UniProt, and NCBI results) (default: False).
• json: If True, returns results in json/dictionary format instead of a data frame (default: False).
• verbose: True/False whether to print progress information (default: True).
• save: True/False whether to save a csv with the query results in the current working directory (default: False).
Returns a data frame containing the requested information.

seq(ens_ids, translate=False, isoforms=False, save=False, transcribe=None, seqtype=None)
Fetch nucleotide or amino acid sequence (FASTA) of a gene (and all its isoforms) or transcript by Ensembl, WormBase or FlyBase ID.
Args:
• ens_ids: One or more Ensembl IDs (passed as string or list of strings). Also supports WormBase and FlyBase IDs.
• translate: True/False (default: False -> returns nucleotide sequences). Defines whether nucleotide or amino acid sequences are returned. Nucleotide sequences are fetched from the Ensembl REST API server. Amino acid sequences are fetched from the UniProt REST API server.
• isoforms: If True, returns the sequences of all known transcripts (default: False). (Only for gene IDs.)
• save: If True, saves the output FASTA to the current directory (default: False).
Returns a list (or FASTA file if 'save=True') containing the requested sequences.
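
The returned list can be turned into a lookup by ID. The sketch below assumes the list holds FASTA-style items, with header strings starting with ">" followed by their sequence strings; that layout is an assumption and should be checked against the output of the installed gget version. `pair_fasta_records` is a hypothetical helper and the input is invented.

```python
def pair_fasta_records(flat):
    """Group a flat list of FASTA headers and sequence strings into {id: sequence}."""
    records = {}
    header = None
    for item in flat:
        if item.startswith(">"):
            # Use the first whitespace-delimited token after '>' as the record ID.
            header = item[1:].split()[0]
            records[header] = ""
        elif header is not None:
            records[header] += item.strip()
    return records


# Invented example in the assumed layout (not real gget output).
flat = [">TOY_ID_1 some description", "ATGCGT", ">TOY_ID_2", "GGCC", "TTAA"]
records = pair_fasta_records(flat)
```

Sequence strings that follow the same header are concatenated, so the helper also tolerates sequences split over several items.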

muscle(fasta, super5=False, out=None)
Align multiple nucleotide or amino acid sequences against each other (using the Muscle v5 algorithm).
Args:
• fasta: Path to the FASTA file containing the sequences to be aligned.
• super5: True/False (default: False). If True, align the input using the Super5 algorithm instead of the PPP algorithm to decrease time and memory. Use for large inputs (a few hundred sequences).
• out: Path to save an 'aligned FASTA' (.afa) file with the results, e.g. 'path/to/directory/results.afa'. Default: 'None' -> results will be printed in Clustal format.
Returns alignment results in an "aligned FASTA" (.afa) file.
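
Because muscle takes a file path rather than sequences, a small step is needed to write the sequences to disk first. The sketch below shows one way to do that; `write_fasta` and `align` are hypothetical helpers, and the alignment call itself assumes gget is installed.

```python
import os
import tempfile


def write_fasta(records, path):
    """Write {name: sequence} to `path` in FASTA format and return the path."""
    with open(path, "w") as handle:
        for name, sequence in records.items():
            handle.write(f">{name}\n{sequence}\n")
    return path


def align(records, out_path):
    """Write the records to a temporary FASTA file and align them with gget.muscle."""
    import gget  # imported lazily; needs `pip install gget`

    fasta_path = os.path.join(tempfile.mkdtemp(), "input.fa")
    write_fasta(records, fasta_path)
    gget.muscle(fasta_path, out=out_path)
    return out_path
```

Passing `out_path` ending in `.afa` follows the convention in the argument description above; leaving `out` unset would print the alignment instead.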

blast(sequence, program='default', database='default', limit=50, expect=10.0, low_comp_filt=False, megablast=True, verbose=True, wrap_text=False, json=False, save=False)
BLAST a nucleotide or amino acid sequence against any BLAST DB.
Args:
• sequence: Sequence (str) or path to a FASTA file. (If there is more than one sequence in the FASTA file, only the first will be submitted to BLAST.)
• program: 'blastn', 'blastp', 'blastx', 'tblastn', or 'tblastx'. Default: 'blastn' for nucleotide sequences; 'blastp' for amino acid sequences.
• database: 'nt', 'nr', 'refseq_rna', 'refseq_protein', 'swissprot', 'pdbaa', or 'pdbnt'. Default: 'nt' for nucleotide sequences; 'nr' for amino acid sequences. More info on BLAST databases: https://ncbi.github.io/blast-cloud/blastdb/available-blastdbs.html
• limit: Limits the number of hits to return (default: 50).
• expect: float or None. An expect value cutoff (default: 10.0).
• low_comp_filt: True/False whether to apply a low complexity filter (default: False).
• megablast: True/False whether to use the MegaBLAST algorithm (blastn only) (default: True).
• verbose: True/False whether to print progress information (default: True).
• wrap_text: If True, displays the data frame with wrapped text for easy reading (default: False).
• json: If True, returns results in json/dictionary format instead of a data frame (default: False).
• save: If True, the data frame is saved as a csv in the current directory (default: False).
Returns a data frame with the BLAST results.
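
The 'default' program and database depend on whether the input is a nucleotide or an amino acid sequence. The sketch below shows one simple heuristic for that decision, based on the letters in the sequence; it is an assumption for illustration and not necessarily how gget detects the sequence type. `guess_blast_defaults` is a hypothetical helper.

```python
# Letters treated as nucleotide codes in this sketch (an assumption).
NUCLEOTIDES = set("ACGTUN")


def guess_blast_defaults(sequence):
    """Return (program, database) following the documented defaults."""
    letters = set(sequence.upper())
    if not letters:
        raise ValueError("sequence is empty")
    if letters <= NUCLEOTIDES:
        return ("blastn", "nt")  # nucleotide input
    return ("blastp", "nr")  # anything else is treated as amino acids
```

A short peptide made up only of the letters A, C, G, T or N would be misclassified by this heuristic, which is one reason to set `program` and `database` explicitly when the input type is known.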

blat(sequence, seqtype='default', assembly='human', json=False, save=False)
BLAT a nucleotide or amino acid sequence against any BLAT UCSC assembly.
Args:
• sequence: Sequence (str) or path to a FASTA file containing one sequence.
• seqtype: 'DNA', 'protein', 'translated%20RNA', or 'translated%20DNA'. Default: 'DNA' for nucleotide sequences; 'protein' for amino acid sequences.
• assembly: 'human' (hg38) (default), 'mouse' (mm39), 'zebrafinch' (taeGut2), or any of the species assemblies available at https://genome.ucsc.edu/cgi-bin/hgBlat (use the short assembly name as listed after the "/").
• json: If True, returns results in json format instead of a data frame (default: False).
• save: If True, the data frame is saved as a csv in the current directory (default: False).
Returns a data frame with the BLAT results.
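
The seqtype values 'translated%20RNA' and 'translated%20DNA' are URL-encoded forms of labels that contain a space. A small sketch using the Python standard library shows how the readable labels map to the strings listed above; the label list and `to_seqtype` are illustrative names, not part of gget.

```python
from urllib.parse import quote, unquote

# Human-readable labels for the four seqtype options (illustrative).
SEQTYPE_LABELS = ["DNA", "protein", "translated RNA", "translated DNA"]


def to_seqtype(label):
    """Percent-encode a label so that spaces become %20."""
    return quote(label)


encoded = [to_seqtype(label) for label in SEQTYPE_LABELS]
decoded = [unquote(value) for value in encoded]
```

After running this, `encoded` holds the four strings exactly as they appear in the seqtype description, and `decoded` recovers the readable labels.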

alphafold(sequence, out='2022_12_30-1803_gget_alphafold_prediction', multimer_for_monomer=False, relax=False, multimer_recycles=3, plot=True, show_sidechains=True)
Predicts the structure of a protein using a slightly simplified version of AlphaFold v2.3.0 (https://doi.org/10.1038/s41586-021-03819-2) published in the AlphaFold Colab notebook (https://colab.research.google.com/github/deepmind/alphafold/blob/main/notebooks/AlphaFold.ipynb).
Args:
• sequence: Amino acid sequence (str), a list of sequences, or path to a FASTA file.
• out: Path to the folder to save prediction results in (str). Default: "./[date_time]_gget_alphafold_prediction"
• multimer_for_monomer: Use the multimer model for a monomer (default: False).
• multimer_recycles: The multimer model will continue recycling until the predictions stop changing, up to the limit set here (default: 3). For higher accuracy, at the potential cost of longer inference times, set this to 20.
• relax: True/False whether to AMBER relax the best model (default: False).
• plot: True/False whether to provide a graphical overview of the prediction (default: True).
• show_sidechains: True/False whether to show side chains in the plot (default: True).
Saves the predicted aligned error (json) and the prediction (PDB) in the defined 'out' folder.
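
The default output folder carries a timestamp, as in the example name '2022_12_30-1803_gget_alphafold_prediction'. The sketch below builds a folder name in that style; the exact timestamp format is inferred from that one example and is therefore an assumption, and both helper names are hypothetical. The prediction call assumes gget and its AlphaFold dependencies are installed.

```python
from datetime import datetime


def default_out_folder(now=None):
    """Build a '[date_time]_gget_alphafold_prediction' folder name (format inferred)."""
    now = now or datetime.now()
    return now.strftime("%Y_%m_%d-%H%M") + "_gget_alphafold_prediction"


def predict_structure(sequence, out=None):
    """Run gget.alphafold and return the folder the results were written to."""
    import gget  # imported lazily; AlphaFold dependencies must be set up beforehand

    out = out or default_out_folder()
    gget.alphafold(sequence, out=out)
    return out
```

Choosing the folder name explicitly, rather than relying on the default, makes it easier to find the PDB and json outputs afterwards.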
