19. Databases in Bioinformatics. There are three major public DNA databases: GenBank, EMBL, and DDBJ. The underlying raw DNA sequences are identical.
20. There are three major public DNA databases: GenBank, housed at NCBI (National Center for Biotechnology Information); EMBL, housed at EBI (European Bioinformatics Institute); and DDBJ, housed in Japan.
21. >100,000 species are represented in GenBank: all species, 128,941; viruses, 6,137; bacteria, 31,262; archaea, 2,100; eukaryota, 87,147.
24. The most sequenced organisms in GenBank: Homo sapiens (6.9 million entries); Mus musculus (5.0 million); Zea mays (896,000); Rattus norvegicus (819,000); Gallus gallus (567,000); Arabidopsis thaliana (519,000); Danio rerio (492,000); Drosophila melanogaster (350,000); Oryza sativa (221,000).
30. Entrez is a search and retrieval system that integrates NCBI databases
35. Question #1: How can I use PubMed at NCBI to find literature information?
36. PubMed is the NCBI gateway to MEDLINE. MEDLINE contains bibliographic citations and author abstracts from over 4,000 journals published in the United States and in 70 foreign countries. It has 12 million records dating back to 1966.
37. MeSH stands for "Medical Subject Headings." MeSH is the list of vocabulary terms used for subject analysis of biomedical literature at NLM. The MeSH vocabulary is used for indexing journal articles for MEDLINE. This controlled vocabulary imposes uniformity and consistency on the indexing of biomedical literature.
40. PubMed search strategies: Try the tutorial (“education” on the left sidebar). Use Boolean queries, e.g. lipocalin AND disease. Try using “limits”. Try “LinkOut” to find external resources. Obtain articles online via the Welch Medical Library (and download PDF files): http://www.welch.jhu.edu/
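The Boolean query on this slide can also be sent to PubMed programmatically. The sketch below is an illustration rather than part of the original slides: it assumes NCBI's E-utilities ESearch endpoint is the access route, and the function name build_pubmed_search_url is hypothetical. It only builds the request URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Assumption: NCBI E-utilities ESearch is used for programmatic PubMed searches.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(terms, operator="AND", retmax=20):
    """Join search terms with a Boolean operator and build an ESearch URL.

    Hypothetical helper for illustration; it performs no network request.
    """
    query = f" {operator} ".join(terms)  # e.g. "lipocalin AND disease"
    params = {"db": "pubmed", "term": query, "retmax": retmax}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


# The Boolean query from the slide: lipocalin AND disease
url = build_pubmed_search_url(["lipocalin", "disease"])
print(url)
```

Opening the printed URL in a browser should return an XML list of matching PubMed identifiers. Swapping operator for "OR" or "NOT" gives the other Boolean forms.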
DNA sequences of genes are rarely of any functional value alone. It is the proteins that they encode that are important to the organism. The process of reading the code in DNA and converting that code into a functional protein is highly conserved across almost all branches of life. An RNA copy of a gene’s DNA sequence on a chromosome is constructed by an enzyme called RNA polymerase through a process called transcription. This RNA molecule is then read by ribosomes, which link amino acids together in the order specified by the RNA's codons. This latter process is known as translation. To summarize: DNA sequences are transcribed into RNA sequences, which are then translated into proteins.
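The two steps can be mimicked on strings. The following is a minimal sketch, not a model of the real biochemistry: it assumes the input is the coding (sense) strand of the gene, so transcription reduces to replacing T with U, and it uses the standard genetic code. The function names transcribe and translate are illustrative.

```python
from itertools import product

# Standard genetic code, listed with bases in the order U, C, A, G
# for each codon position; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna):
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Translate from the first AUG until a stop codon or the end of the RNA."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCAAGTAA")
protein = translate(mrna)
print(mrna, protein)  # AUGGCCAAGUAA MAK
```

Here the codons AUG, GCC, and AAG give methionine (M), alanine (A), and lysine (K), and UAA ends translation.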
A gene sequence is not simply a series of codons. Instead, there are several key components. Promoter sequences assist the RNA polymerase in attaching itself to the DNA sequence template. Once the DNA sequence is transcribed, the RNA still requires processing. One of the most unexpected findings in the history of molecular genetics was the discovery that genes are split into pieces. Exons composed of codons are often interrupted by intron sequences that do not encode amino acids. Before translation can occur, the intron sequences must be spliced out of the RNA. The remaining exons are then joined together for translation into proteins.
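Splicing can be sketched as cutting the exon segments out of the unprocessed RNA and concatenating them. The example below is illustrative only: the sequences are made up, the exon coordinates are assumed to be known in advance (in a real genome they come from annotation), and the function name splice is hypothetical.

```python
def splice(pre_mrna, exons):
    """Join exon segments of a pre-mRNA, discarding everything in between.

    exons is a list of (start, end) positions, 0-based with end exclusive,
    given in the order the exons appear in the transcript.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


# Toy transcript: two exons interrupted by one intron (made-up sequences).
exon1 = "AUGGCC"
intron = "GUAAGUCAG"
exon2 = "AAGUAA"
pre_mrna = exon1 + intron + exon2

exon_coords = [
    (0, len(exon1)),
    (len(exon1) + len(intron), len(pre_mrna)),
]

mature_mrna = splice(pre_mrna, exon_coords)
print(mature_mrna)  # AUGGCCAAGUAA
```

Only after this step does the sequence read as an uninterrupted series of codons from the start codon to the stop codon.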
Here we see a representation of the steps involved in creating a protein from a DNA sequence.