SOME OTHER TOOLS USED IN BIOINFORMATICS
BY:
V.DHANYALAKSHMI
AGENDA
• Database
• Significance of Database
• Primary biological database
• SEQUENCE DATABASE
1. Nucleic Acid
a) GENBANK
2. Protein
a) TrEMBL
b) SWISS-PROT
• UNIPROT
• DDBJ
• PDB
DATABASES
Biological research generates vast amounts of data, and databases are essential for storing, organizing and retrieving this information efficiently.
PRIMARY BIOLOGICAL DATABASE
• Primary Biological Database: GenBank
• Managed and maintained by: National Center for Biotechnology Information (NCBI), part of the National Library of Medicine (NLM) at the National Institutes of Health (NIH) in the United States.
• Contains: Annotated DNA and RNA sequences from various organisms, including genes and transcripts, together with the protein translations of annotated coding regions.
• Used in: Genetics, genomics, molecular biology, and bioinformatics research.
• Researchers worldwide deposit their sequence data into GenBank.
• Vital resource for scientists in the biological field.
This information is accurate as of September 2021.
SIGNIFICANCE OF DATABASE
• Centralized storage of biological data
• Efficient organization and retrieval of large amounts of information.
• Integration of data from multiple sources.
• Facilitation of data analysis and interpretation.
• Support for complex queries and statistical methods.
• Sharing of data and knowledge within the scientific community.
• Acceleration of scientific progress through collaboration.
• Improvement of healthcare and biotechnology through insights
gained from data analysis.
GENBANK
The GenBank sequence database is an open-access, annotated collection of all publicly available nucleotide sequences and their protein translations.
It is produced and maintained by the National Center for Biotechnology Information (NCBI) as part of the International Nucleotide Sequence Database Collaboration (INSDC).
GenBank 2023 update
ABSTRACT
• GenBank is a comprehensive public database of nucleotide sequences and their annotations.
• It holds a vast amount of data, including 19.6 trillion base pairs.
• The database includes over 2.9 billion nucleotide sequences from 504,000 formally
described species.
• GenBank maintains daily data exchange with the European Nucleotide Archive (ENA)
and the DNA Data Bank of Japan (DDBJ), ensuring worldwide coverage and
collaboration.
• Recent updates include resources for SARS-CoV-2 data, NCBI Datasets, BLAST ClusteredNR, the Submission Portal, table2asn, and a Foreign Contamination Screening tool.
• The update also describes new BioSample packages. BioSample stores descriptions and metadata for the biological samples used in genomic studies.
INTRODUCTION
• GenBank is a public database managed by NCBI, located at the NIH in Bethesda,
MD, USA.
• It contains nucleotide sequences and supporting annotations.
• The paper discusses recent developments in GenBank.
• It also provides brief usage guidelines for data submission and access.
• Readers can refer to https://www.ncbi.nlm.nih.gov/genbank/ for a general overview
of GenBank.
• Data in GenBank are collected in various divisions, with size and growth shown in
Table 1 and Figure 1.
• The VRL division saw a significant increase due to nearly 5 million new SARS-CoV-2 sequences in the past year.
• Multiple complete mouse genomes, like BioProject PRJEB47108, contributed to
about two-thirds of the growth in the ROD division.
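GenBank records can also be retrieved programmatically as well as through the website. The sketch below assumes NCBI's E-utilities efetch service, which the slides do not describe. It only builds the request URL for one nucleotide record, so it runs offline. The accession NC_045512.2 (the SARS-CoV-2 reference genome) is used purely as an example.

```python
from urllib.parse import urlencode

# NCBI E-utilities efetch endpoint (an assumption: not covered in the slides).
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str, rettype: str = "fasta") -> str:
    """Build an efetch URL for one record in NCBI's nucleotide database (db=nuccore).

    rettype="fasta" asks for the sequence only; rettype="gb" asks for the
    GenBank flat file with features and annotation.
    """
    params = {"db": "nuccore", "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(efetch_url("NC_045512.2"))
    # Fetching requires network access. NCBI asks heavy users to identify
    # themselves (tool/email parameters or an API key) and to limit request rates.
    # from urllib.request import urlopen
    # print(urlopen(efetch_url("NC_045512.2")).read().decode()[:200])
```

Switching rettype from "fasta" to "gb" requests the annotated flat file instead of the bare sequence.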
Table 1. GenBank divisions [Base pairs]
Division  Description                        Base pairs*
WGS       Whole genome shotgun data   17 511 809 676 629
TSA       Transcriptome shotgun data     511 680 950 707
PLN       Plants                         484 803 006 831
INV       Invertebrates                  269 338 221 858
VRL       Viruses                        187 366 647 663
BCT       Bacteria                       166 217 792 419
VRT       Other vertebrates               99 921 122 967
ROD       Rodents                         66 092 410 483
TLS       Targeted loci studies           43 852 280 645
EST       Expressed sequence tags         43 330 114 068
MAM       Other mammals                   41 720 029 494
PAT       Patent sequences                30 938 105 095
HTG       High-throughput genomic         27 801 878 633
GSS       Genome survey sequences         26 380 049 011
PRI       Primates                        15 619 743 253
ENV       Environmental samples            8 516 518 905
SYN       Synthetic                        8 030 787 249
PHG       Phages                           1 158 493 277
HTC       High-throughput cDNA               740 853 492
STS       Sequence tagged sites              640 923 137
UNA       Unannotated                          4 436 341
*Release 251 (August 2022).
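The table makes it easy to see how strongly whole-genome shotgun data dominate GenBank. The sketch below copies the Table 1 values and computes each division's percentage of their sum. That sum is only the total of the listed divisions. It is not necessarily the headline figure quoted for GenBank as a whole.

```python
# Base pairs per GenBank division, copied from Table 1 (Release 251, August 2022).
DIVISION_BP = {
    "WGS": 17_511_809_676_629, "TSA": 511_680_950_707, "PLN": 484_803_006_831,
    "INV": 269_338_221_858, "VRL": 187_366_647_663, "BCT": 166_217_792_419,
    "VRT": 99_921_122_967, "ROD": 66_092_410_483, "TLS": 43_852_280_645,
    "EST": 43_330_114_068, "MAM": 41_720_029_494, "PAT": 30_938_105_095,
    "HTG": 27_801_878_633, "GSS": 26_380_049_011, "PRI": 15_619_743_253,
    "ENV": 8_516_518_905, "SYN": 8_030_787_249, "PHG": 1_158_493_277,
    "HTC": 740_853_492, "STS": 640_923_137, "UNA": 4_436_341,
}


def division_shares(sizes: dict) -> dict:
    """Return each division's percentage of the summed total, largest first."""
    total = sum(sizes.values())
    shares = {div: 100 * bp / total for div, bp in sizes.items()}
    return dict(sorted(shares.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    print(f"Sum of listed divisions: {sum(DIVISION_BP.values()):,} bp")
    for div, pct in list(division_shares(DIVISION_BP).items())[:3]:
        print(f"{div}: {pct:.2f}%")
```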
Figure 1
• Annual increase in base pairs (bp) for each GenBank division in release 251 (August 2022) compared with release 245 (August 2021).
• Division abbreviations are described in Table 1.
• The 'TOTAL' bar in the graph represents the overall growth of GenBank during this period.
• GenBank is maintained by the National Center for Biotechnology Information (NCBI), which is part of the National Library of Medicine (NLM), under the US National Institutes of Health (NIH).
• It serves as a repository for nucleotide sequences, including DNA, RNA, and genomic data from various organisms.
• The database includes annotations such as information about the organism, the gene product, and related scientific literature.
• Researchers from around the world submit their genetic data to GenBank for public access and sharing.
• The data in GenBank are freely available to the global scientific community and are widely used in research fields including genetics, genomics, and bioinformatics.
• The paper highlights the remarkable growth of the VRL division due to the influx of approximately 5 million new SARS-CoV-2 sequences during the past year, reflecting intense research on the COVID-19 pandemic.
• The ROD division's substantial expansion is attributed to the inclusion of multiple complete mouse genomes, reflecting the importance of rodent models in genetics and biomedical studies.
• The paper also provides brief guidelines on submitting data to GenBank and on accessing and retrieving data from it.
• GenBank is an invaluable resource for scientists working on diverse biological projects, facilitating the exchange and dissemination of genetic information worldwide.
• The size and diversity of GenBank's data make it an essential tool for comparative genomics, evolutionary studies, and understanding genetic variation among species.
RECENT DEVELOPMENTS
• SARS-CoV-2 resources
• Monkeypox sequences
• NCBI Datasets
• BLAST ClusteredNR
• Submission Portal
• table2asn
• Foreign contamination screening
• BioSample
1. SARS-CoV-2 Resources:
• A submission portal for SARS-CoV-2 sequences is available at https://submit.ncbi.nlm.nih.gov/sarscov2/.
• The portal accepts various data types related to SARS-CoV-2.
• Submitters receive accession numbers in about 2 hours on average.
• Data submitted through the portal are accessible through the INSDC databases, the NCBI Virus resource, RefSeq, and BLAST.
• NCBI Datasets offers downloads of over 1.5 million complete SARS-CoV-2 genomes.
• A single landing page provides the latest SARS-CoV-2 data and resources.
2. Monkeypox Sequences:
• GenBank automatically detects and processes submissions of monkeypox sequences in response to the outbreak.
• The number of monkeypox sequences in GenBank increased by 270% in the past year.
3. NCBI Datasets:
• NCBI Datasets allows easy download of complex genomic datasets through various interfaces.
• The new genome table interface allows filtering, viewing, and downloading data from multiple species or taxonomic nodes.
4. BLAST ClusteredNR:
• The protein BLAST web interface offers the ClusteredNR database for faster searches and for exploring taxonomic diversity.
• BLAST results show a representative sequence for each matching cluster, which indicates the likely function of the proteins in that cluster.
5. Submission Portal:
• Eukaryotic nuclear mRNA sequence submission workflows are moving from BankIt to the Submission Portal.
• An interactive wizard simplifies the process and gives submitters more control over release dates, as well as the ability to edit previous submissions.
6. table2asn:
• The command-line tool table2asn replaces tbl2asn for preparing GenBank submissions.
• It runs faster and can read annotation from GFF3 files that follow GenBank's formatting conventions.
7. Foreign Contamination Screening (FCS) Tool:
• NCBI released a beta version of the FCS tool to improve the quality of submitted data.
• It consists of FCS-adaptor, which detects adaptor and vector contamination, and FCS-GX, which detects contamination from unintended source organisms.
8. BioSample:
• BioSample released new packages, including 'Pathogen' for standardizing samples of pathogenic organisms and two SARS-CoV-2 packages for clinical and wastewater surveillance samples.
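table2asn combines sequences with annotation to build a submission file. Besides GFF3, one annotation input it reads is NCBI's tab-delimited five-column feature table (.tbl). The sketch below writes features in that layout. The sequence ID "seq1" and gene name "exmA" are hypothetical. Check the qualifiers a real submission needs against NCBI's table2asn documentation.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    start: int  # 1-based start coordinate
    stop: int   # 1-based stop coordinate (inclusive)
    key: str    # feature key, e.g. "gene" or "CDS"
    qualifiers: list[tuple[str, str]] = field(default_factory=list)


def feature_table(seq_id: str, features: list[Feature]) -> str:
    """Render features in the five-column feature table layout."""
    lines = [f">Feature {seq_id}"]
    for feat in features:
        # Feature line: start, stop, feature key.
        lines.append(f"{feat.start}\t{feat.stop}\t{feat.key}")
        for qual, value in feat.qualifiers:
            # Qualifier lines leave the first three columns empty.
            lines.append(f"\t\t\t{qual}\t{value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sequence ID and gene name, for illustration only.
    feats = [
        Feature(1, 300, "gene", [("gene", "exmA")]),
        Feature(1, 300, "CDS", [("product", "hypothetical protein")]),
    ]
    print(feature_table("seq1", feats), end="")
```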
[Screenshots: searching GenBank through the NCBI website]
• NCBI resources highlighted include PubMed and PubMed Central.
• Click on Nucleotide (under Genomes).
• Click on Homo sapiens.
• The record shows the GI no. (GenInfo Identifier), an identification number, the Features and the Sequence.
• Scroll down and click Expand Ns. The expanded Ns represent the unknown sequence.
• The record can also be viewed in FASTA format and as a graphical representation.
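The Expand Ns step reveals runs of N, the placeholder for unknown bases. The code below is a sketch of working with the FASTA view offline. It parses FASTA text and reports the 1-based coordinates of every N run. The toy record and its header are made up for illustration.

```python
import re


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}; the header is the text after '>'."""
    records, header, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


def n_runs(seq: str) -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) coordinates of each run of N (unknown bases)."""
    return [(m.start() + 1, m.end()) for m in re.finditer(r"N+", seq)]


if __name__ == "__main__":
    # Toy record with a made-up header, for illustration only.
    demo = ">toy_record example sequence\nACGTNNNNNACGT\nTTNNA\n"
    for header, seq in parse_fasta(demo).items():
        print(header, len(seq), n_runs(seq))
```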
Why is UniProtKB composed of 2 sections,
UniProtKB/Swiss-Prot and UniProtKB/TrEMBL?
• Swiss-Prot: Created in 1986, it's a high-quality manually annotated and non-redundant protein
sequence database.
• UniProtKB/Swiss-Prot: It's the reviewed section of the UniProt Knowledgebase, containing
experimental results, computed features, and scientific conclusions.
• TrEMBL: Introduced in 1996 to handle increased data flow from genome projects, it contains
computationally analyzed records enriched with automatic annotation and classification.
• Purpose of TrEMBL: To accommodate all available protein sequences without overwhelming the
labor-intensive manual curation process of Swiss-Prot.
• UniProtKB/TrEMBL entries: They are unreviewed and kept separate from Swiss-Prot to
maintain the high quality of the latter.
• Automatic processing: Enables quick availability of TrEMBL records to the public.
It was already recognized at that time that the traditional time- and labour-intensive manual curation
process which is the hallmark of Swiss-Prot could not be broadened to encompass all available
protein sequences. UniProtKB/TrEMBL contains high quality computationally analyzed records that
are enriched with automatic annotation and classification.
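The split between the two sections shows up in UniProtKB's FASTA downloads. Headers start with sp| for Swiss-Prot (reviewed) entries and tr| for TrEMBL (unreviewed) entries. The minimal sketch below reads that prefix. The Swiss-Prot header is human hemoglobin alpha (P69905). The TrEMBL header is a made-up placeholder.

```python
# Section codes used at the start of UniProtKB FASTA headers.
SECTIONS = {"sp": "Swiss-Prot (reviewed)", "tr": "TrEMBL (unreviewed)"}


def uniprot_section(header: str) -> tuple[str, str]:
    """Return (section, accession) parsed from a UniProtKB FASTA header line."""
    fields = header.lstrip(">").split("|", 2)
    if len(fields) < 3 or fields[0] not in SECTIONS:
        raise ValueError(f"not a UniProtKB FASTA header: {header!r}")
    return SECTIONS[fields[0]], fields[1]


if __name__ == "__main__":
    headers = [
        ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens OX=9606 GN=HBA1",
        ">tr|X0EXAMPLE|X0EXAMPLE_HUMAN Uncharacterized protein OS=Homo sapiens",  # made-up
    ]
    for h in headers:
        print(uniprot_section(h))
```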
UNIPROT
STRUCTURE
The SWISS‐MODEL Repository of annotated
three‐dimensional protein structure homology models
1. Database - annotated 3D comparative protein structure models.
2. Generated by the fully automated homology‐modelling pipeline SWISS‐MODEL.
3. The Repository contains about 300,000 3D models for sequences from the Swiss‐Prot and TrEMBL
databases.
4. Regular Updates: the Repository is updated regularly to add new sequences and new template structures and to incorporate improvements to the underlying modelling algorithms.
5. Contents of Entries: Each entry includes one or more 3D protein models, superposed template structures,
alignments, a summary of the modelling process, and a force field based quality assessment.
6. Querying: the Repository can be queried through its website at http://swissmodel.expasy.org/repository/.
7. Cross-Linking: entries are cross-linked with other resources, such as Swiss-Prot on the ExPASy server, enabling seamless navigation between protein sequence and structure information.
8. Purpose: The aim of the SWISS‐MODEL Repository is to provide access to an up‐to‐date collection of
annotated three‐dimensional protein models generated by automated homology modelling, bridging the gap
between sequence and structure databases.
OUTLOOK
• Rapid Growth
• Resource Connection
• Functional Annotation
• Widened Biological Information
• Cross-Linking
• Continued Development
• Overall Aim
ExPASy: the proteomics server for in-depth protein
knowledge and analysis
• Service Provider
• Databases
• Analysis Tools
• Integration
• Pioneering Service
• Mirror Sites
EMBL & TrEMBL
EMBL (European Molecular Biology Laboratory):
1. EMBL is a major bioinformatics resource and research institution in Europe.
2. It collects, stores and distributes nucleotide sequences through the EMBL Nucleotide Sequence Database, now part of the European Nucleotide Archive (ENA).
3. Nucleotide sequences in EMBL can be DNA or RNA sequences.
4. These sequences are obtained from various organisms and serve as valuable data for researchers studying genes, evolutionary relationships, and genetic variation.
TrEMBL (Translated EMBL Nucleotide Sequence Database):
1. TrEMBL is a section of the UniProt Knowledgebase (UniProtKB).
2. It contains computationally translated nucleotide sequences.
3. These nucleotide sequences are converted into protein sequences using computational methods.
4. TrEMBL is a resource for protein sequences and functional information derived from nucleotide sequences in EMBL.
5. The translation process in TrEMBL allows researchers to explore potential protein products encoded by the nucleotide sequences in EMBL.
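The sketch below illustrates what "computationally translated" means by translating a coding sequence with the standard genetic code (NCBI translation table 1). It is a toy, not TrEMBL's pipeline. UniProtKB/TrEMBL takes the CDS translations already present in the EMBL/GenBank/DDBJ records. This function also ignores alternative genetic codes and non-ATG start codons.

```python
from itertools import product

BASES = "TCAG"
# Standard code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ... (first base slowest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str, to_stop: bool = True) -> str:
    """Translate a coding sequence in reading frame 1 with the standard genetic code."""
    cds = cds.upper().replace("U", "T")  # accept RNA as well as DNA
    protein = []
    for i in range(0, len(cds) - 2, 3):  # a trailing partial codon is ignored
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X for ambiguous codons such as NNN
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))
    print(translate("ATGGCCTAA", to_stop=False))
```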
TrEMBL (Translation of EMBL)
• UniProtKB/TrEMBL contains translations of coding sequences (CDS) from the
EMBL/GenBank/DDBJ Nucleotide Sequence Databases.
• It also includes protein sequences extracted from the literature or submitted directly to
UniProtKB/Swiss-Prot.
• The database is enriched with automated classification and annotation.
• UniProtKB/TrEMBL serves as a comprehensive resource for protein sequence information,
covering a wide range of species and organisms.
• It complements the manually curated data in UniProtKB/Swiss-Prot, providing a larger set of
protein sequences, including those predicted from genomic sequences.
• Automated annotation in UniProtKB/TrEMBL involves the use of bioinformatics tools and
algorithms to infer functional and structural information for the proteins.
• Researchers and bioinformaticians widely use UniProtKB/TrEMBL to access up-to-date and
high-quality protein sequence data for various biological studies.
• The combination of UniProtKB/Swiss-Prot and UniProtKB/TrEMBL provides a valuable
resource for the scientific community to explore and understand protein function, structure,
and evolution.
Search 7DQA (PDB ID of a SARS-CoV-2 structure)
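Structures such as 7DQA can also be downloaded as files. The minimal sketch below assumes the RCSB file-download URL pattern and the legacy fixed-column PDB format. Very large entries are distributed only as mmCIF, so a .pdb file may not exist for every ID. The ATOM line is a made-up example, not taken from 7DQA.

```python
# A made-up ATOM record laid out in the fixed columns of the legacy PDB format.
EXAMPLE_ATOM = "ATOM      1  N   MET A   1      11.104   6.134  -6.504  1.00  0.00"


def parse_atom_line(line: str) -> dict:
    """Parse an ATOM/HETATM record by column position (0-based slices of 1-based PDB columns)."""
    return {
        "record": line[0:6].strip(),      # columns 1-6
        "serial": int(line[6:11]),        # columns 7-11
        "atom": line[12:16].strip(),      # columns 13-16
        "res_name": line[17:20].strip(),  # columns 18-20
        "chain": line[21],                # column 22
        "res_seq": int(line[22:26]),      # columns 23-26
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),  # columns 31-54
    }


def download_url(pdb_id: str) -> str:
    """RCSB file-download URL for an entry in legacy .pdb format (assumed URL pattern)."""
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


if __name__ == "__main__":
    print(download_url("7dqa"))
    print(parse_atom_line(EXAMPLE_ATOM))
```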
THANK YOU

More Related Content

Similar to SOME OTHER TOOLS USED IN BIOINFORMATICS.pptx

V1_I1_2012_Paper5.doc
V1_I1_2012_Paper5.docV1_I1_2012_Paper5.doc
V1_I1_2012_Paper5.docpraveena06
 
GASCAN: A Novel Database for Gastric Cancer Genes and Primers
GASCAN: A Novel Database for Gastric Cancer Genes and PrimersGASCAN: A Novel Database for Gastric Cancer Genes and Primers
GASCAN: A Novel Database for Gastric Cancer Genes and Primersijdmtaiir
 
Biological Databases | Access to sequence data and related information
Biological Databases | Access to sequence data and related information Biological Databases | Access to sequence data and related information
Biological Databases | Access to sequence data and related information NahalMalik1
 
Submitting DNA sequences to the databases, SEQUIN.pptx
Submitting DNA sequences to the databases, SEQUIN.pptxSubmitting DNA sequences to the databases, SEQUIN.pptx
Submitting DNA sequences to the databases, SEQUIN.pptxVed Gharat
 
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference DatabaseDevelopment of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference DatabaseNathan Olson
 
Bioinformatics
BioinformaticsBioinformatics
BioinformaticsRaj Varun
 
Database in bioinformatics
Database in bioinformaticsDatabase in bioinformatics
Database in bioinformaticsVinaKhan1
 
Databases in Bioinformatics
Databases in BioinformaticsDatabases in Bioinformatics
Databases in BioinformaticsMeghaj Mallick
 
Primary sequencing of nucleic acids
Primary sequencing of nucleic acidsPrimary sequencing of nucleic acids
Primary sequencing of nucleic acidsvibhakumari12
 
HBVR: A Global Repository for Genomics, Phylogenetics, and Therapeutics Resea...
HBVR: A Global Repository for Genomics, Phylogenetics, and Therapeutics Resea...HBVR: A Global Repository for Genomics, Phylogenetics, and Therapeutics Resea...
HBVR: A Global Repository for Genomics, Phylogenetics, and Therapeutics Resea...Vanshika Sharma
 
Nucleic Acid Sequence Databases
Nucleic Acid Sequence DatabasesNucleic Acid Sequence Databases
Nucleic Acid Sequence Databasesfarwa fayaz
 
Group 3 presentation.pptx
Group 3 presentation.pptxGroup 3 presentation.pptx
Group 3 presentation.pptxRoyBisenti
 
Nucleic Acid Databases (NDB ) of bioinformatics pptx
Nucleic Acid Databases (NDB ) of bioinformatics pptxNucleic Acid Databases (NDB ) of bioinformatics pptx
Nucleic Acid Databases (NDB ) of bioinformatics pptxkarmandeepkaur7
 
Role of Bioinformatics in Plant Pathology.pptx
Role of Bioinformatics in Plant Pathology.pptxRole of Bioinformatics in Plant Pathology.pptx
Role of Bioinformatics in Plant Pathology.pptxHasanRiaz18
 

Similar to SOME OTHER TOOLS USED IN BIOINFORMATICS.pptx (20)

V1_I1_2012_Paper5.doc
V1_I1_2012_Paper5.docV1_I1_2012_Paper5.doc
V1_I1_2012_Paper5.doc
 
Ddbj
DdbjDdbj
Ddbj
 
GASCAN: A Novel Database for Gastric Cancer Genes and Primers
GASCAN: A Novel Database for Gastric Cancer Genes and PrimersGASCAN: A Novel Database for Gastric Cancer Genes and Primers
GASCAN: A Novel Database for Gastric Cancer Genes and Primers
 
Introduction to databases.pptx
Introduction to databases.pptxIntroduction to databases.pptx
Introduction to databases.pptx
 
Biological Databases | Access to sequence data and related information
Biological Databases | Access to sequence data and related information Biological Databases | Access to sequence data and related information
Biological Databases | Access to sequence data and related information
 
Submitting DNA sequences to the databases, SEQUIN.pptx
Submitting DNA sequences to the databases, SEQUIN.pptxSubmitting DNA sequences to the databases, SEQUIN.pptx
Submitting DNA sequences to the databases, SEQUIN.pptx
 
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference DatabaseDevelopment of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
 
Biological database
Biological databaseBiological database
Biological database
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Protocols for genomics and proteomics
Protocols for genomics and proteomics Protocols for genomics and proteomics
Protocols for genomics and proteomics
 
Database in bioinformatics
Database in bioinformaticsDatabase in bioinformatics
Database in bioinformatics
 
Databases in Bioinformatics
Databases in BioinformaticsDatabases in Bioinformatics
Databases in Bioinformatics
 
Overview of Next Gen Sequencing Data Analysis
Overview of Next Gen Sequencing Data AnalysisOverview of Next Gen Sequencing Data Analysis
Overview of Next Gen Sequencing Data Analysis
 
Primary sequencing of nucleic acids
Primary sequencing of nucleic acidsPrimary sequencing of nucleic acids
Primary sequencing of nucleic acids
 
HBVR: A Global Repository for Genomics, Phylogenetics, and Therapeutics Resea...
HBVR: A Global Repository for Genomics, Phylogenetics, and Therapeutics Resea...HBVR: A Global Repository for Genomics, Phylogenetics, and Therapeutics Resea...
HBVR: A Global Repository for Genomics, Phylogenetics, and Therapeutics Resea...
 
Data base in detail
Data base in detailData base in detail
Data base in detail
 
Nucleic Acid Sequence Databases
Nucleic Acid Sequence DatabasesNucleic Acid Sequence Databases
Nucleic Acid Sequence Databases
 
Group 3 presentation.pptx
Group 3 presentation.pptxGroup 3 presentation.pptx
Group 3 presentation.pptx
 
Nucleic Acid Databases (NDB ) of bioinformatics pptx
Nucleic Acid Databases (NDB ) of bioinformatics pptxNucleic Acid Databases (NDB ) of bioinformatics pptx
Nucleic Acid Databases (NDB ) of bioinformatics pptx
 
Role of Bioinformatics in Plant Pathology.pptx
Role of Bioinformatics in Plant Pathology.pptxRole of Bioinformatics in Plant Pathology.pptx
Role of Bioinformatics in Plant Pathology.pptx
 

Recently uploaded

Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingTeacherCyreneCayanan
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 

Recently uploaded (20)

Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 

SOME OTHER TOOLS USED IN BIOINFORMATICS.pptx

  • 2. AGENDA • Database • Significance of Database • Primary biological database • SEQUENCE DATABASE 1.Nucleic Acid a)GENBANK 2. Protein a) TrEMBL b)SWISS-PROT • UNIPROT • DDBJ • PDB
  • 3. DATABASES Are essential for storing, organizing and retrieving this information efficiently, since biological research generates vast amounts of data. • Primary Biological Database: GenBank • Managed and maintained by: National Center for Biotechnology Information (NCBI), a division of the National Institutes of Health (NIH) in the United States. • Contains: Annotated DNA and RNA sequences from various organisms, including genes, transcripts, and proteins. • Used in: Genetics, genomics, molecular biology, and bioinformatics research. • Researchers worldwide deposit their sequence data into GenBank. • Vital resource for scientists in the biological field. This information is accurate as of September 2021. PRIMARY BIOLOGICAL DATABASE
  • 4. SIGNIFICANCE OF DATABASE • Centralized storage of biological data • Efficient organization and retrieval of large amounts of information. • Integration of data from multiple sources. • Facilitation of data analysis and explanation. • Support for complex queries and statistical methods. • Sharing of data and knowledge within the scientific community. • Acceleration of scientific progress through collaboration. • Improvement of healthcare and biotechnology through insights gained from data analysis.
  • 5. GENBANK The Genbank sequence database is • An open access • Annotated collection of all publicity available nucleotide sequences and their protein translations. It is produced and maintained by the National Center for Biotechnology Information (NCBI) as part of the International Nucleotide Sequence Database Collaboration(INSDC)
  • 6.
  • 7. GenBank 2023 update ABSTRACT • GenBank is a public and comprehensive database containing genetic information. • It holds a vast amount of data, including 19.6 trillion base pairs. • The database includes over 2.9 billion nucleotide sequences from 504,000 formally described species. • GenBank maintains daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ), ensuring worldwide coverage and collaboration. • Recent updates include resources for data related to the SARS-CoV-2 virus, NCBI Datasets, BLAST ClusteredNR, the Submission Portal, table2asn, and a Foreign Contamination Screening tool. • Additionally, the database includes information on BioSample, which provides descriptions and metadata for biological samples used in genomic studies.
  • 8. INTRODUCTION • GenBank is a public database managed by NCBI, located at the NIH in Bethesda, MD, USA. • It contains nucleotide sequences and supporting annotations. • The paper discusses recent developments in GenBank. • It also provides brief usage guidelines for data submission and access. • Readers can refer to https://www.ncbi.nlm.nih.gov/genbank/ for a general overview of GenBank. • Data in GenBank are collected in various divisions, with size and growth shown in Table 1 and Figure 1. • The VRL division saw a significant increase due to nearly 5 million new SARS- CoV-2 sequences in the past year. • Multiple complete mouse genomes, like BioProject PRJEB47108, contributed to about two-thirds of the growth in the ROD division.
  • 9. Division Description Base pairs* WGS Whole genome shotgun data 17 511 809 676 629 TSA Transcriptome shotgun data 511 680 950 707 PLN Plants 484 803 006 831 INV Invertebrates 269 338 221 858 VRL Viruses 187 366 647 663 BCT Bacteria 166 217 792 419 VRT Other vertebrates 99 921 122 967 ROD Rodents 66 092 410 483 TLS Targeted loci studies 43 852 280 645 EST Expressed sequence tags 43 330 114 068 MAM Other mammals 41 720 029 494 PAT Patent sequences 30 938 105 095 HTG High-throughput genomic 27 801 878 633 GSS Genome survey sequences 26 380 049 011 PRI Primates 15 619 743 253 ENV Environmental samples 8 516 518 905 SYN Synthetic 8 030 787 249 PHG Phages 1 158 493 277 HTC High-throughput cDNA 740 853 492 STS Sequence tagged sites 640 923 137 UNA Unannotated 4 436 341 Table 1 GenBank divisions [Base pairs] *Release 251 (8/2022).
  • 10. • Figure 1 shows the annual increase in base pairs (bp) for each GenBank division between release 245 (August 2021) and release 251 (August 2022). • Table 1 describes the division abbreviations. • The 'TOTAL' bar represents the overall growth of GenBank during this period. Figure 1
  • 11. • GenBank is maintained by the National Center for Biotechnology Information (NCBI), which is part of the National Library of Medicine (NLM), under the US National Institutes of Health (NIH). • It serves as a repository for nucleotide sequences, which include DNA, RNA, and genomic data from various organisms. • The database includes annotations such as information about the organism, the gene product, and related scientific literature. • Researchers from around the world submit their genetic data to GenBank for public access and sharing. • The data in GenBank are freely available to the global scientific community and are widely used in research fields including genetics, genomics, and bioinformatics. • The paper highlights the remarkable growth of the VRL division due to the influx of approximately 5 million new SARS-CoV-2 sequences during the past year, reflecting intense research on the COVID-19 pandemic. • The ROD division's substantial expansion is attributed to the inclusion of multiple complete mouse genomes, indicating the importance of rodent research models in genetics and biomedical studies. • The usage guidelines provided in the paper likely cover topics such as data formatting, submission procedures, quality control measures, and how to access and retrieve data from the database. • GenBank is an invaluable resource for scientists working on diverse biological projects, facilitating the exchange and dissemination of genetic information worldwide. • The size and diversity of GenBank's data make it an essential tool for comparative genomics, evolutionary studies, and understanding genetic variation among species.
  • 12. RECENT DEVELOPMENTS • SARS-CoV-2 resources • Monkeypox sequences • NCBI Datasets • BLAST ClusteredNR • Submission portal • Table2asn • Foreign contamination screening • BioSample
  • 13. 1. SARS-CoV-2 Resources: • A submission portal for SARS-CoV-2 sequences is available at https://submit.ncbi.nlm.nih.gov/sarscov2/. • The portal accepts various types of SARS-CoV-2 data. • Submitters receive accession numbers in about 2 hours on average. • Data submitted through the portal become available through the INSDC databases, the NCBI Virus resource, RefSeq, and BLAST. • NCBI Datasets offers downloads of over 1.5 million complete SARS-CoV-2 genomes. • A single landing page gathers the latest SARS-CoV-2 data and resources. 2. Monkeypox Sequences: • In response to the outbreak, GenBank automatically detects and processes submissions of monkeypox sequences. • The number of monkeypox sequences in GenBank increased by 270% in the past year. 3. NCBI Datasets: • NCBI Datasets allows easy download of complex genomic datasets through several interfaces. • The new genome table interface lets users filter, view, and download data for multiple species or taxonomic nodes. 4. BLAST ClusteredNR: • The protein BLAST web interface offers the ClusteredNR database for faster searches and for exploring taxonomic diversity. • BLAST results show a representative sequence for each cluster, which helps indicate the protein's function.
  • 14. 5. Submission Portal: • Submission workflows for eukaryotic nuclear mRNA sequences are moving from BankIt to the Submission Portal. • An interactive wizard simplifies the process and gives submitters more control over release dates and over editing previous submissions. 6. Table2asn: • The command-line tool table2asn replaces tbl2asn for preparing GenBank submissions. • It is more efficient and accepts annotations in GenBank-format GFF files. 7. Foreign Contamination Screening (FCS) Tool: • NCBI released a beta version of the FCS tool to improve the quality of submitted data. • It consists of FCS-adaptor, which detects adaptor and vector contamination, and FCS-GX, which detects contamination from unintended sources. 8. BioSample: • BioSample released new packages, including 'Pathogen' for standardizing samples of pathogenic organisms and two SARS-CoV-2 packages for clinical and wastewater surveillance samples.
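For context on table2asn, here is a minimal sketch, assuming the five-column feature table (.tbl) input that tbl2asn accepted and that table2asn is meant to replace. It writes a `>Feature` header followed by tab-separated start, stop, and feature-key lines, with qualifiers on lines that begin with three tabs. The `Feature` class, the `feature_table` function, and the names `contig1` and `abcD` are hypothetical placeholders; check NCBI's feature table documentation before preparing a real submission.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """One feature of a five-column feature table (hypothetical helper)."""
    start: int
    stop: int
    key: str  # feature key, e.g. "gene" or "CDS"
    qualifiers: list[tuple[str, str]] = field(default_factory=list)


def feature_table(seq_id: str, features: list[Feature]) -> str:
    """Render features as a tab-separated, five-column feature table."""
    lines = [f">Feature {seq_id}"]
    for feat in features:
        # Columns 1-3: start, stop, feature key.
        lines.append(f"{feat.start}\t{feat.stop}\t{feat.key}")
        for name, value in feat.qualifiers:
            # Columns 4-5: qualifier name and value, after three empty columns.
            lines.append(f"\t\t\t{name}\t{value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "contig1" and "abcD" are placeholders, not real records.
    features = [
        Feature(1, 900, "gene", [("gene", "abcD")]),
        Feature(1, 900, "CDS", [("product", "hypothetical protein")]),
    ]
    print(feature_table("contig1", features), end="")
```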
  • 20. Click on Nucleotide (under Genomes), as shown in slide 17.
  • 21. Click on Homo sapiens (slide 20). The record shows its GI number (GenInfo Identifier).
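The same records can also be retrieved programmatically. Below is a minimal sketch, assuming NCBI's E-utilities EFetch endpoint with `db=nuccore`, which accepts an accession.version or a GI number as `id`. `efetch_url` and `fetch_record` are illustrative helper names, and NC_045512.2 (the SARS-CoV-2 reference genome) is used only as an example ID. NCBI asks clients without an API key to stay under three requests per second.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities EFetch endpoint.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(record_id: str, rettype: str = "fasta") -> str:
    """Build an EFetch URL for a nucleotide record ('fasta' or 'gb' output)."""
    params = {"db": "nuccore", "id": record_id, "rettype": rettype, "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


def fetch_record(record_id: str, rettype: str = "fasta") -> str:
    """Download the record as text (requires network access)."""
    with urlopen(efetch_url(record_id, rettype), timeout=30) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(efetch_url("NC_045512.2"))
```

Passing `rettype="gb"` requests the GenBank flat-file view instead of FASTA.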
  • 26. Scroll down and click "Expand Ns".
  • 27. The expanded view shows runs of N, which represent unknown bases in the sequence.
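Locating the unknown positions can be automated. Here is a small sketch, assuming unknown bases are written as runs of the letter N (upper or lower case). `n_runs` and `total_unknown` are hypothetical helpers; coordinates are reported 1-based and inclusive, the convention GenBank uses for feature locations.

```python
import re


def n_runs(sequence: str) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (start, end) coordinates of each run of N."""
    return [(m.start() + 1, m.end()) for m in re.finditer(r"[Nn]+", sequence)]


def total_unknown(sequence: str) -> int:
    """Total number of N positions across all runs."""
    return sum(end - start + 1 for start, end in n_runs(sequence))


if __name__ == "__main__":
    demo = "ACGTNNNNACGTN"  # made-up sequence for illustration
    print(n_runs(demo), total_unknown(demo))
```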
  • 28. FASTA
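FASTA is a plain-text format: each record starts with a `>` header line, followed by one or more lines of sequence. Below is a minimal Python parser, assuming well-formed input and skipping blank lines; `parse_fasta` is an illustrative name, and for production work a library such as Biopython's `SeqIO` is the usual choice.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Return (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []  # header text without the '>'
        elif header is not None:
            chunks.append(line)  # sequence may be wrapped over several lines
        else:
            raise ValueError("sequence data found before the first '>' header")
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    print(parse_fasta(">seq1 demo\nACGT\nNNAC\n>seq2\nMKV\n"))
```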
  • 31. Why is UniProtKB composed of two sections, UniProtKB/Swiss-Prot and UniProtKB/TrEMBL? • Swiss-Prot: Created in 1986, it is a high-quality, manually annotated, non-redundant protein sequence database. • UniProtKB/Swiss-Prot: The reviewed section of the UniProt Knowledgebase, containing experimental results, computed features, and scientific conclusions. • TrEMBL: Introduced in 1996 to handle the increased data flow from genome projects, it contains high-quality, computationally analyzed records enriched with automatic annotation and classification. • Purpose of TrEMBL: It was already recognized at that time that the traditional time- and labour-intensive manual curation process, the hallmark of Swiss-Prot, could not be broadened to encompass all available protein sequences; TrEMBL accommodates them without overwhelming that process. • UniProtKB/TrEMBL entries: They are unreviewed and kept separate from Swiss-Prot to maintain the high quality of the latter. • Automatic processing: Enables TrEMBL records to be made available to the public quickly.
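The split between the two sections is also visible in UniProtKB's FASTA headers, which start with `sp` for Swiss-Prot entries and `tr` for TrEMBL entries, followed by the accession and entry name (for example `>sp|P69905|HBA_HUMAN ...`). The sketch below, built around the hypothetical helper `uniprot_section`, reads that prefix to tell which section a record comes from.

```python
SECTIONS = {"sp": "Swiss-Prot (reviewed)", "tr": "TrEMBL (unreviewed)"}


def uniprot_section(header: str) -> tuple[str, str, str]:
    """Split a UniProtKB FASTA header into (accession, entry name, section)."""
    # Header layout: >db|accession|ENTRY_NAME description...
    db, accession, rest = header.lstrip(">").split("|", 2)
    entry_name = rest.split(" ", 1)[0]
    if db not in SECTIONS:
        raise ValueError(f"unrecognised UniProtKB database code: {db!r}")
    return accession, entry_name, SECTIONS[db]


if __name__ == "__main__":
    print(uniprot_section(">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens"))
```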
  • 38. The SWISS-MODEL Repository of annotated three-dimensional protein structure homology models 1. Database: annotated 3D comparative protein structure models. 2. The models are generated by the fully automated homology-modelling pipeline SWISS-MODEL. 3. The Repository contains about 300,000 3D models for sequences from the Swiss-Prot and TrEMBL databases. 4. Regular updates: the Repository is updated regularly to include new sequences and new template structures and to take advantage of improvements in the underlying modelling algorithms. 5. Contents of entries: each entry includes one or more 3D protein models, the superposed template structures, the alignments, a summary of the modelling process, and a force-field-based quality assessment. 6. Querying: the Repository can be queried through its website at http://swissmodel.expasy.org/repository/. 7. Cross-linking: entries are cross-linked with other resources, such as Swiss-Prot on the ExPASy server, enabling seamless navigation between protein sequence and structure information. 8. Purpose: to provide access to an up-to-date collection of annotated three-dimensional protein models generated by automated homology modelling, bridging the gap between sequence and structure databases.
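Homology modelling depends on how similar the target sequence is to its template. As a simplified illustration only, and not SWISS-MODEL's actual template-selection method, the sketch below computes percent identity from two already-aligned sequences, counting only columns where neither sequence has a gap. Other common definitions divide by the full alignment length or by the length of the shorter sequence instead; `percent_identity` is an illustrative name.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over aligned columns without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a.upper() == b.upper())
    return 100.0 * identical / len(pairs)


if __name__ == "__main__":
    # Made-up target/template alignment: 4 of 5 gap-free columns match.
    print(percent_identity("MKV-LA", "MKVQLG"))
```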
  • 39. OUTLOOK • Rapid Growth • Resource Connection • Functional Annotation • Widened Biological Information • Cross-Linking • Continued Development • Overall Aim
  • 40. ExPASy: the proteomics server for in-depth protein knowledge and analysis • Service Provider • Databases • Analysis Tools • Integration • Pioneering Service • Mirror Sites
  • 41. EMBL & TrEMBL EMBL (European Molecular Biology Laboratory): 1. EMBL is a major research institution in Europe and, through its European Bioinformatics Institute (EMBL-EBI), a major bioinformatics resource. 2. It is involved in the collection, storage, and distribution of nucleotide sequences. 3. Nucleotide sequences in EMBL can be DNA or RNA sequences. 4. These sequences come from many organisms and are valuable data for researchers studying genes, evolutionary relationships, and genetic variation. TrEMBL (Translated EMBL Nucleotide Sequence Database): 1. TrEMBL is a section of the UniProt Knowledgebase (UniProtKB). 2. It contains computationally translated nucleotide sequences. 3. These nucleotide sequences are converted into protein sequences using computational methods. 4. TrEMBL is a resource for protein sequences and functional information derived from nucleotide sequences in EMBL. 5. The translation process allows researchers to explore the potential protein products encoded by the nucleotide sequences in EMBL.
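To make "translating a nucleotide sequence" concrete, here is a minimal sketch using only the standard genetic code (NCBI translation table 1). It illustrates what a CDS translation is; it is not how UniProt builds TrEMBL entries, which are derived from the CDS annotations in the nucleotide records, and organisms or organelles with other genetic codes would need a different table. `translate_cds` is a hypothetical helper name.

```python
# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# bases ordered T, C, A, G at each position; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate_cds(cds: str, to_stop: bool = True) -> str:
    """Translate a coding sequence codon by codon, starting at its first base.

    Codons with ambiguous bases (such as N) become 'X'. Translation halts at
    the first stop codon unless to_stop is False. A trailing partial codon
    is ignored. RNA input (U) is accepted.
    """
    seq = cds.upper().replace("U", "T")
    protein = []
    for pos in range(0, len(seq) - len(seq) % 3, 3):
        amino_acid = CODON_TABLE.get(seq[pos:pos + 3], "X")
        if amino_acid == "*" and to_stop:
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate_cds("ATGGCCTAA"))  # made-up three-codon CDS
```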
  • 43. TrEMBL (Translation of EMBL) • UniProtKB/TrEMBL contains translations of coding sequences (CDS) from the EMBL/GenBank/DDBJ Nucleotide Sequence Databases. • It also includes protein sequences extracted from the literature or submitted directly to UniProtKB/Swiss-Prot. • The database is enriched with automated classification and annotation. • UniProtKB/TrEMBL serves as a comprehensive resource for protein sequence information, covering a wide range of species and organisms. • It complements the manually curated data in UniProtKB/Swiss-Prot, providing a larger set of protein sequences, including those predicted from genomic sequences. • Automated annotation in UniProtKB/TrEMBL uses bioinformatics tools and algorithms to infer functional and structural information for the proteins. • Researchers and bioinformaticians widely use UniProtKB/TrEMBL to access up-to-date protein sequence data for various biological studies. • The combination of UniProtKB/Swiss-Prot and UniProtKB/TrEMBL provides a valuable resource for the scientific community to explore and understand protein function, structure, and evolution.
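Swiss-Prot and TrEMBL entries can be retrieved separately through UniProt's REST interface, where the `reviewed:true` and `reviewed:false` query terms select the two sections. Below is a minimal sketch, assuming the `https://rest.uniprot.org/uniprotkb/search` endpoint with `query`, `format`, and `size` parameters; `uniprot_search_url` is a hypothetical helper and `gene:HBA1` is only an example query.

```python
from urllib.parse import urlencode

# UniProtKB search endpoint of the UniProt REST API.
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def uniprot_search_url(query: str, reviewed=None, fmt: str = "fasta", size: int = 10) -> str:
    """Build a UniProtKB search URL.

    reviewed=True restricts results to Swiss-Prot, reviewed=False to TrEMBL,
    and None searches both sections.
    """
    if reviewed is not None:
        query = f"({query}) AND reviewed:{'true' if reviewed else 'false'}"
    params = {"query": query, "format": fmt, "size": size}
    return f"{UNIPROT_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(uniprot_search_url("gene:HBA1", reviewed=True))   # Swiss-Prot only
    print(uniprot_search_url("gene:HBA1", reviewed=False))  # TrEMBL only
```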
  • 46. Search for 7DQA (the PDB ID of a SARS-CoV-2 structure).
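Structure files for a PDB entry such as 7DQA can also be downloaded directly. Below is a minimal sketch, assuming RCSB's file-download URL pattern `https://files.rcsb.org/download/<ID>.cif` (or `.pdb`). The helper name is hypothetical, the pattern check covers only the classic four-character PDB IDs, and very large structures are distributed in mmCIF rather than legacy PDB format.

```python
import re

# Classic PDB IDs: a digit 1-9 followed by three letters or digits (e.g. 7DQA).
PDB_ID_PATTERN = re.compile(r"[1-9][A-Za-z0-9]{3}")


def rcsb_download_url(pdb_id: str, fmt: str = "cif") -> str:
    """Build an RCSB download URL for a structure file in mmCIF or PDB format."""
    if not PDB_ID_PATTERN.fullmatch(pdb_id):
        raise ValueError(f"not a classic four-character PDB ID: {pdb_id!r}")
    if fmt not in ("cif", "pdb"):
        raise ValueError("fmt must be 'cif' or 'pdb'")
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.{fmt}"


if __name__ == "__main__":
    print(rcsb_download_url("7DQA"))
    print("Entry page: https://www.rcsb.org/structure/7DQA")
```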