Go Get Data (GGD)
July 7th 2021
1
Go Get Data
“Conda” for data: a genomics data management system that provides access to processed and curated genomic data files
Installation
The current version works only with Python 3.6 (not 3.8)
conda create --name ggd_env python=3.6
conda activate ggd_env
conda install -c bioconda ggd
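Because the slide pins GGD to Python 3.6, it can help to check a version string before installing. A minimal sketch, assuming a POSIX shell. The helper name `is_supported_python` is hypothetical and is not part of GGD or conda:

```shell
# Hypothetical helper (not part of GGD or conda):
# succeeds only for version strings of the form 3.6 or 3.6.x.
is_supported_python() {
    case "$1" in
        3.6|3.6.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Illustrative version strings; in practice pass the version of the active interpreter.
is_supported_python 3.6.13 && echo "3.6.13: supported"
is_supported_python 3.8.10 || echo "3.8.10: not supported, use the python=3.6 environment"
```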
Data recipe
ggd make-recipe \
    --species Homo_sapiens \
    --genome-build hg19 \
    --authors <your name> \
    --package-version 1 \
    --data-version 27-Apr-2009 \
    --data-provider UCSC \
    --coordinate-base "0-based-inclusive" \
    --summary "cpg islands from UCSC" \
    --dependency htslib \
    --dependency gsort \
    --keyword CpG \
    --keyword region \
    --name cpg-islands \
    cpg.sh
Contents of cpg.sh:
genome=https://raw.githubusercontent.com/gogetdata/ggd-recipes/genomes/hg19/hg19.genome
wget -O - http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/cpgIslandExt.txt.gz \
    | gzip -dc \
    | cut -f 2-5 \
    | gsort /dev/stdin $genome \
    | bgzip -c > cpg.bed.gz
tabix cpg.bed.gz
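The `cut -f 2-5` step in cpg.sh drops the leading bin column and the trailing statistics, leaving chrom, start, end, and name. A small offline sketch of that one step; the sample row is made up for illustration and only follows the column layout of the UCSC cpgIslandExt table:

```shell
# Hypothetical row (values are illustrative, not real data), tab-separated:
# bin  chrom  chromStart  chromEnd  name  length  cpgNum  gcNum  perCpg  perGc  obsExp
row=$(printf '585\tchr1\t28735\t29810\tCpG: 116\t1075\t116\t787\t21.6\t73.2\t0.83')

# Same field selection as the recipe: keep columns 2-5 (chrom, start, end, name).
bed_line=$(printf '%s\n' "$row" | cut -f 2-5)
printf '%s\n' "$bed_line"
```

The remaining steps (gsort, bgzip, tabix) need the real tools and the genome file, so they are not reproduced here.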
Resulting data package: hg19-cpg-islands-ucsc-v1
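The package name appears to be assembled from the recipe arguments: genome build, name, lower-cased data provider, and package version. A minimal sketch of that pattern; it is inferred from the single example on the slide, and the helper `ggd_package_name` is hypothetical, not a GGD command:

```shell
# Hypothetical helper: compose <genome-build>-<name>-<provider>-v<version>.
# The pattern is an assumption inferred from one example, not taken from the GGD source.
ggd_package_name() {
    build=$1 name=$2 provider=$3 version=$4
    # Lower-case the provider, as in UCSC -> ucsc.
    provider_lc=$(printf '%s' "$provider" | tr '[:upper:]' '[:lower:]')
    printf '%s-%s-%s-v%s\n' "$build" "$name" "$provider_lc" "$version"
}

ggd_package_name hg19 cpg-islands UCSC 1
# prints hg19-cpg-islands-ucsc-v1
```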
Data packages
Search for a package
$ ggd search genome
$ ggd search grch38 reference genome
Install a package
$ ggd install grch38-reference-genome-ensembl-v1
Uninstall a package
$ ggd uninstall grch38-reference-genome-ensembl-v1
List installed packages
$ ggd list
Data environment variables
$ ggd install grch38-reference-genome-ensembl-v1
$ ls $ggd_grch38_reference_genome_ensembl_v1_dir
$ cd $ggd_grch38_reference_genome_ensembl_v1_dir
$ bwa mem $ggd_grch38_reference_genome_ensembl_v1_file reads.fq > aln.sam
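The variable names follow from the package name: a `ggd_` prefix, the package name with hyphens turned into underscores, and a `_dir` or `_file` suffix. A minimal sketch of that mapping, inferred from the example on this slide. The helper `ggd_var_name` is hypothetical and not part of GGD:

```shell
# Hypothetical helper: derive the environment variable name for a package.
# Assumed pattern: ggd_<package name with "-" replaced by "_">_<dir|file>.
ggd_var_name() {
    pkg=$1 suffix=$2
    printf 'ggd_%s_%s\n' "$(printf '%s' "$pkg" | sed 's/-/_/g')" "$suffix"
}

ggd_var_name grch38-reference-genome-ensembl-v1 dir
ggd_var_name grch38-reference-genome-ensembl-v1 file
```

This can be handy in scripts that take a package name as input and need to look up the matching directory or file variable.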
Other features
Meta recipes
GGD + workflow (Snakemake)
Available data packages: https://gogetdata.github.io/recipes.html
- No epigenomic data
