• Develop and maintain molecular and bibliographic databases.
• Develop software for searching and analyzing these data.
• Provide a Web access point for data and software.
1/24/2017 2

• Sequences
• Expression
• Genome Maps
• 3D Structures
• Protein Domains
• Homologous Genes, Proteins, Structures
• Pathways
• Genetic Variation

• Biomedical Literature
  - PubMed, PubMed Central, Bookshelf
• Molecular Databases and Metadatabases
  - Sequences, Structures, Variation, Chemicals, etc.
• Clinical / Medical Genetics
  - GTR, ClinVar, MedGen, OMIM, PubMed Health, dbGaP

• Primary Data / Database
  - Results of a particular technique
  - Submitted to NCBI
  - Submitter has editorial control
• Curated Data / Database
  - Based on primary database records
  - A third party (NCBI) maintains and updates the records
  - Often includes additional analyses

• Sequences (DNA)
  - GenBank (International Nucleotide Sequence Database Collaboration): now 2.1 × 10^12 bases
  - Sequence Read Archive (SRA), next-generation sequence reads: now 9.7 × 10^15 bases!
• Other databases with a primary component
  - Expression
    - Gene Expression Omnibus (GEO): RNA-Seq, microarray, and other high-throughput data
  - Variation
    - dbSNP: small-scale variants
    - dbVar: genomic structural variation studies
    - Database of Genotype and Phenotype (dbGaP)
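A quick worked comparison of the two archive sizes quoted above. The figures are the January 2017 numbers from the slide; the script only performs the division.

```python
# Archive sizes as quoted on the slide (January 2017).
GENBANK_BASES = 2.1e12  # GenBank: 2.1 x 10^12 bases
SRA_BASES = 9.7e15      # Sequence Read Archive: 9.7 x 10^15 bases


def fold_difference(larger: float, smaller: float) -> float:
    """Return the ratio larger / smaller."""
    return larger / smaller


ratio = fold_difference(SRA_BASES, GENBANK_BASES)
print(f"SRA holds about {ratio:,.0f} times as many bases as GenBank")
```

The ratio comes out at roughly 4,600, so raw next-generation reads in SRA outnumber GenBank bases by more than three orders of magnitude.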

• Sequences
  - GenPept: translations of CDS regions on INSDC records
  - NCBI Reference Sequences (RefSeq; DNA and protein)
• Variation
  - NCBI Reference SNPs (non-redundant set of variants)
• Structures
  - NCBI's MMDB, based on PDB
• Conserved Domains
  - NCBI Conserved Domain Database

• Entrez: integrated literature and molecular databases
• Graphical Sequence Viewer: annotation viewer and analysis tool
• BLAST: sequence similarity search service
• VAST: structure similarity searches
• Cn3D: 3D structure viewer
• Genome Workbench: standalone sequence analysis and annotation platform
• SRA Utilities
  - SRA Run Browser: web access for viewing, searching, and downloading next-generation reads
  - SRA Toolkit: standalone SRA manipulator and client

www.ncbi.nlm.nih.gov

• Literature
  - PubMed, PMC, Books
• Sequences
  - Protein, Nuccore, GSS, SRA, Assembly
• Expression
  - GEO Profiles
• Variation
  - dbSNP, dbVar
• Protein and Nucleic Acid Structures
  - Structure
• Small Molecules
  - PubChem
• Medical Genetics
  - ClinVar, MedGen, GTR
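The categories above correspond to Entrez databases that can also be reached programmatically through NCBI's E-utilities. This sketch assumes the lowercase database names shown (for example `nuccore`, `geoprofiles`, `dbvar`) and only builds EInfo URLs, which describe each database; it makes no network calls. PubChem is spread over several Entrez databases, of which only two are listed here.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

# Slide categories mapped to Entrez database names (as assumed for E-utilities).
ENTREZ_DBS = {
    "Literature": ["pubmed", "pmc", "books"],
    "Sequences": ["protein", "nuccore", "gss", "sra", "assembly"],
    "Expression": ["geoprofiles"],
    "Variation": ["snp", "dbvar"],
    "Structures": ["structure"],
    "Small molecules": ["pccompound", "pcsubstance"],
    "Medical genetics": ["clinvar", "medgen", "gtr"],
}


def einfo_url(db: str) -> str:
    """Build the EInfo URL that describes one Entrez database."""
    return EUTILS_BASE + "einfo.fcgi?" + urlencode({"db": db})


for category, dbs in ENTREZ_DBS.items():
    for db in dbs:
        print(f"{category:17} {einfo_url(db)}")
```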

Central Resources / Databases
• Taxonomy
• BioProject
• Assembly
• Gene
Follow links to others when needed: Nucleotide, Protein, SRA
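Following links from a central record to Nucleotide, Protein, or SRA can also be scripted with the E-utilities ELink service. This is a minimal sketch that only builds request URLs; the record IDs are hypothetical placeholders, and whether links exist for a given pair of databases depends on NCBI's data.

```python
from typing import List
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def elink_url(dbfrom: str, db: str, ids: List[str]) -> str:
    """Build an ELink URL asking for `db` records linked to `ids` in `dbfrom`."""
    params = {"dbfrom": dbfrom, "db": db, "id": ",".join(ids)}
    # urlencode escapes the comma separating multiple IDs as %2C.
    return EUTILS_BASE + "elink.fcgi?" + urlencode(params)


# Hypothetical record IDs, for illustration only.
examples = [
    ("gene", "nuccore", ["1001"]),
    ("gene", "protein", ["1001"]),
    ("bioproject", "sra", ["2002"]),
]
for dbfrom, db, ids in examples:
    print(elink_url(dbfrom, db, ids))
```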

The Entrez system: 39 (and counting) integrated databases

If your question is about data for ...
• an organism -> Taxonomy
• a gene name -> Gene (common organisms)
• a large-scale project -> BioProject
• a bacterial genome -> Genome
• a genome sequence -> Assembly
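The lookup above can be written as a small table. This is an illustrative sketch: the keys paraphrase the slide's question types, the values are the Entrez database names as they would appear in an E-utilities request, and `starting_database` is a hypothetical helper.

```python
# Question type -> suggested starting Entrez database.
START_DB = {
    "organism": "taxonomy",
    "gene name": "gene",
    "large-scale project": "bioproject",
    "bacterial genome": "genome",
    "genome sequence": "assembly",
}


def starting_database(question_about: str) -> str:
    """Return the suggested starting database; raise KeyError for unknown topics."""
    key = question_about.strip().lower()
    if key not in START_DB:
        raise KeyError(f"unknown topic {question_about!r}; choose from {sorted(START_DB)}")
    return START_DB[key]


print(starting_database("Gene name"))  # gene
```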

Organizes gene-centered data
• Biological role, genomic context, phenotypes, interactions, literature
• Sequences
  - Genomic
  - Transcript
  - Proteins
• Best entry point for many biomolecular searches
• Eukaryotic and microbial genomes
  - 17.3 million records for 13,566 taxa

• Provide a reference standard
• Represent all molecules in the central dogma
  - Selected eukaryotes: genomic, transcripts, proteins
  - All prokaryotes and viruses: genomic and protein only
• Maintained by NCBI staff and outside experts
• Distinct accession series
  - NC_, AC_, NG_, NM_, NR_, XM_, XR_
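Because each RefSeq accession series is identified by its prefix, a record's molecule type can be read directly from the accession. The sketch below covers only the prefixes listed above; the descriptions follow NCBI's RefSeq documentation as we understand it, `refseq_type` is a hypothetical helper, and the accessions in the usage loop are example strings only.

```python
# Molecule types for the RefSeq accession prefixes listed on the slide.
REFSEQ_PREFIXES = {
    "NC_": "complete genomic molecule, usually reference assembly",
    "AC_": "complete genomic molecule, usually alternate assembly",
    "NG_": "incomplete genomic region",
    "NM_": "mRNA, protein-coding (usually curated)",
    "NR_": "non-protein-coding RNA",
    "XM_": "predicted mRNA model",
    "XR_": "predicted non-coding RNA model",
}


def refseq_type(accession: str) -> str:
    """Classify an accession by its three-character prefix."""
    prefix = accession[:3].upper()
    return REFSEQ_PREFIXES.get(prefix, "not one of the listed RefSeq prefixes")


for acc in ("NM_000485.3", "NC_000001.11", "XR_001234.1", "U12345"):
    print(acc, "->", refseq_type(acc))
```

A GenBank-style accession such as `U12345` falls through to the default message, since only RefSeq accessions carry the two-letter-plus-underscore prefix.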

Specific gene:
XXX[Symbol] AND YYY[Organism]
APRT[Symbol] AND human[Organism]
apt[Symbol] AND Escherichia coli[Organism]
All genes:
YYY[Organism] AND current only[Filter]
zebrafish[Organism] AND current only[Filter]
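The same queries can be composed in code and sent to Entrez through the E-utilities ESearch endpoint. This sketch, with hypothetical helper names `gene_query` and `esearch_url`, only builds the query strings and URLs; it does not contact NCBI.

```python
from typing import Optional
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def gene_query(organism: str, symbol: Optional[str] = None) -> str:
    """Build an Entrez Gene query using the field tags shown on the slide."""
    if symbol is not None:
        # Specific gene: XXX[Symbol] AND YYY[Organism]
        return f"{symbol}[Symbol] AND {organism}[Organism]"
    # All genes: YYY[Organism] AND current only[Filter]
    return f"{organism}[Organism] AND current only[Filter]"


def esearch_url(term: str, db: str = "gene") -> str:
    """Build an ESearch URL; urlencode escapes spaces and brackets."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term})


for q in (gene_query("human", "APRT"),
          gene_query("Escherichia coli", "apt"),
          gene_query("zebrafish")):
    print(q)
    print(esearch_url(q))
```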

Protein-Structure Shortcut

[Figure: Gene and its links to related NCBI resources]
• Expression: UniGene, GEO Profiles
• Homologs: HomoloGene
• Literature: PubMed, PMC
• Gene: genomic structure; orthologs via Gpipe
• Structures: Structure
• Variation: SNP, ClinVar, OMIM, dbGaP
• Sequences: Nuccore, Protein (homologs via BLink; proteins with structure via Related Structures)
• SRA

• Learn: <ncbi>/learn.shtml
• Factsheets: <ftp>/pub/factsheets/
• NCBI YouTube Channel: www.youtube.com/ncbinlm
• NCBI Helpdesk: info@ncbi.nlm.nih.gov

Ncbi basic intro_v_pitt_kent_osu
