GROUP 3
MEMBERS
1) TAPIWANASHE V MTUNGWAZI R206014X1
2) TERRENCE S SITHOLE R202731Q
3) EDMUND T MAPHOSA R202713Q
4) TINOTENDA DHLIWAYO R193986W
5) MUNASHE LUKE R202758C
6) CLEOPATRA MWARIRA R205306R
7) NHIKA TADIWANASHE R197095Z
8) YEMURAI NENZOU R202732E
9) HONOUR MUSVIBE.T R202714V
10) ANTONY SARANAVO R207297Z
QUESTION: EXPLAIN
#1 :DNA AND PROTEIN DATABASES
PROTEIN DATABASES
A PROTEIN DATABASE IS A COLLECTION OF DATA THAT HAS BEEN CONSTRUCTED FROM
PHYSICAL, CHEMICAL AND BIOLOGICAL INFORMATION ON SEQUENCE, DOMAIN
STRUCTURE, FUNCTION, THREE‐DIMENSIONAL STRUCTURE AND PROTEIN‐PROTEIN
INTERACTIONS.
COLLECTIVELY, PROTEIN DATABASES MAY FORM A PROTEIN SEQUENCE DATABASE.
IT IS THEREFORE IMPORTANT TO USE APPROPRIATE PROTEIN DATABASES WHICH
1) ANALYSE AND STORE DATA PERTAINING TO PROTEIN SCIENCE AND
2) FACILITATE USAGE OF ANALYTICAL SOFTWARE AVAILABLE TO THE SCIENTIFIC
COMMUNITY
CONT.…
• PROTEIN DATABASES CAN GENERALLY BE DIVIDED INTO TWO TYPES.
THE FIRST TYPE
IS A UNIVERSAL DATABASE, WHICH COVERS THE PROTEINS PRESENT IN ALL KNOWN
BIOLOGICAL SPECIES.
THE SECOND TYPE
IS A SPECIALIZED DATABASE, WHICH DEALS WITH THE PROTEINS BELONGING TO A
SPECIFIC GROUP OR FAMILY OF PROTEINS OF CERTAIN SPECIES. EACH
PROTEIN DATABASE CAN BE FURTHER CLASSIFIED INTO MORE SPECIALIZED CATEGORIES
ACCORDING TO THE TYPE OF INFORMATION SOUGHT.
DNA DATABASE
IT IS A DATABASE OF DNA PROFILES WHICH CAN BE USED IN THE ANALYSIS OF
GENETIC DISEASES, GENETIC FINGERPRINTING FOR CRIMINOLOGY, OR GENETIC
GENEALOGY.
ALSO CALLED A DNA DATABANK
DNA DATABASES MAY BE PUBLIC OR PRIVATE, THE LARGEST ONES BEING NATIONAL
DNA DATABASES.
FOR INSTANCE, THE NATIONAL DNA INDEX SYSTEM (NDIS) IS THE NATIONAL LEVEL OF
CODIS (THE COMBINED DNA INDEX SYSTEM) AND CONTAINS THE DNA PROFILES
CONTRIBUTED BY FEDERAL, STATE AND LOCAL PARTICIPATING FORENSIC LABORATORIES.
CODIS IS USED BY THE FBI IN CRIMINOLOGY.
#2 DATA STORAGE, INFORMATION RETRIEVAL AND
FILE FORMATS
DATA STORAGE:
• DATA STORAGE IS THE RETENTION OF INFORMATION USING TECHNOLOGY SPECIFICALLY DEVELOPED TO KEEP THAT DATA
AND HAVE IT AS ACCESSIBLE AS NECESSARY.
• IT REFERS TO THE USE OF RECORDING MEDIA TO RETAIN DATA USING COMPUTERS.
• THE MOST PREVALENT FORMS OF DATA STORAGE ARE FILE STORAGE, BLOCK STORAGE AND OBJECT
STORAGE, EACH BEING IDEAL FOR DIFFERENT PURPOSES.
INFORMATION RETRIEVAL (IR):
• IT IS THE FIELD OF COMPUTER SCIENCE THAT DEALS WITH THE PROCESSING OF DOCUMENTS CONTAINING
FREE TEXT, SO THAT THEY CAN BE RAPIDLY RETRIEVED BASED ON KEYWORDS SPECIFIED IN A USER'S QUERY.
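THE KEYWORD-BASED RETRIEVAL DESCRIBED ABOVE CAN BE SKETCHED WITH A SMALL INVERTED INDEX. THIS IS A MINIMAL ILLUSTRATION ONLY: THE FUNCTION NAMES `build_index` AND `search` AND THE THREE SAMPLE DOCUMENTS ARE ASSUMPTIONS MADE FOR THIS EXAMPLE, AND REAL IR SYSTEMS ADD RANKING, STEMMING AND MUCH MORE.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            # Strip simple punctuation so "sequences." matches "sequences".
            index[word.strip(".,;:()")].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every keyword in the query."""
    keywords = [word.lower() for word in query.split()]
    if not keywords:
        return set()
    results = set(index.get(keywords[0], set()))
    for word in keywords[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical free-text documents, used only for illustration.
docs = {
    1: "BLAST finds regions of local similarity between sequences.",
    2: "GenBank stores DNA sequences.",
    3: "UniProt stores protein sequences and functional information.",
}
index = build_index(docs)
hits = search(index, "stores sequences")
print(sorted(hits))
```

THE INDEX IS BUILT ONCE, SO EACH QUERY ONLY INTERSECTS SMALL SETS INSTEAD OF RE-READING EVERY DOCUMENT.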
FILE FORMATS:
• A FILE FORMAT IS THE STRUCTURE OF A FILE THAT TELLS A PROGRAM HOW TO READ AND DISPLAY ITS CONTENTS.
EXAMPLES USED IN BIOINFORMATICS INCLUDE
THE FASTA FORMAT, FASTQ, THE SAM/BAM FORMAT, THE VCF AND THE GFF FORMAT.
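AS ONE EXAMPLE OF HOW A FILE FORMAT TELLS A PROGRAM WHAT TO EXPECT, THE SKETCH BELOW READS FASTQ RECORDS. IT ASSUMES THE COMMON FOUR-LINE LAYOUT (AN `@` HEADER, THE SEQUENCE, A `+` LINE, AND A QUALITY STRING) AND PHRED+33 QUALITY ENCODING; THE FUNCTION NAME `parse_fastq` AND THE SAMPLE READ ARE HYPOTHETICAL.

```python
def parse_fastq(lines):
    """Yield (identifier, sequence, quality scores) from four-line FASTQ records."""
    lines = [line.rstrip("\n") for line in lines if line.strip()]
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality strings differ in length")
        # Assumption: Phred+33 encoding, so '!' is quality 0 and 'I' is quality 40.
        scores = [ord(ch) - 33 for ch in qual]
        yield header[1:], seq, scores


# Hypothetical read, used only for illustration.
sample = ["@read1", "GATTACA", "+", "II5+!II"]
records = list(parse_fastq(sample))
print(records)
```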
#3 NCBI AND EBI RESOURCES FOR THE MOLECULAR
DOMAIN OF BIOINFORMATICS: GENBANK, UNIPROT,
ENTREZ AND GENE ONTOLOGY
• NCBI DATABASES
• NCBI (NATIONAL CENTER FOR BIOTECHNOLOGY INFORMATION)
• THE NCBI HOUSES A SERIES OF DATABASES RELEVANT TO BIOTECHNOLOGY AND
BIOMEDICINE AND IS AN IMPORTANT RESOURCE FOR BIOINFORMATICS TOOLS
AND SERVICES. MAJOR DATABASES INCLUDE GENBANK FOR DNA SEQUENCES
AND PUBMED, A BIBLIOGRAPHIC DATABASE FOR THE BIOMEDICAL LITERATURE.
EBI
THE EUROPEAN BIOINFORMATICS INSTITUTE (EBI) MAINTAINS AND DISTRIBUTES THE
EMBL NUCLEOTIDE SEQUENCE DATABASE, EUROPE’S PRIMARY NUCLEOTIDE
SEQUENCE DATA RESOURCE.
THE EBI ALSO MAINTAINS AND DISTRIBUTES THE SWISS-PROT PROTEIN
SEQUENCE DATABASE. OVER FIFTY ADDITIONAL SPECIALIST MOLECULAR
BIOLOGY DATABASES, AS WELL AS SOFTWARE AND DOCUMENTATION OF
INTEREST TO MOLECULAR BIOLOGISTS ARE AVAILABLE. THE EBI NETWORK
SERVICES INCLUDE DATABASE SEARCHING AND SEQUENCE SIMILARITY
SEARCHING FACILITIES.
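NOTE THAT THE ABBREVIATION EBI IS ALSO USED IN DAIRY FARMING FOR THE ECONOMIC
BREEDING INDEX, A SINGLE-FIGURE PROFIT INDEX AIMED AT HELPING FARMERS IDENTIFY
THE MOST PROFITABLE BULLS AND COWS FOR BREEDING DAIRY HERD
REPLACEMENTS. IT COMPRISES SEVEN SUB-INDEXES RELATED TO PROFITABLE MILK
PRODUCTION. THAT INDEX IS UNRELATED TO THE EUROPEAN BIOINFORMATICS INSTITUTE.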
WHAT IS GENBANK?
• THE GENBANK DATABASE IS DESIGNED TO PROVIDE AND ENCOURAGE ACCESS WITHIN THE
SCIENTIFIC COMMUNITY TO THE MOST UP-TO-DATE AND COMPREHENSIVE DNA SEQUENCE
INFORMATION. THEREFORE, NCBI PLACES NO RESTRICTIONS ON THE USE OR DISTRIBUTION OF THE
GENBANK DATA. HOWEVER, SOME SUBMITTERS MAY CLAIM PATENT, COPYRIGHT, OR OTHER
INTELLECTUAL PROPERTY RIGHTS IN ALL OR A PORTION OF THE DATA THEY HAVE SUBMITTED. NCBI
IS NOT IN A POSITION TO ASSESS THE VALIDITY OF SUCH CLAIMS, AND THEREFORE CANNOT
PROVIDE COMMENT OR UNRESTRICTED PERMISSION CONCERNING THE USE, COPYING, OR
DISTRIBUTION OF THE INFORMATION CONTAINED IN GENBANK.
• A GENBANK RELEASE OCCURS EVERY TWO MONTHS AND IS AVAILABLE FROM THE FTP SITE. THE
RELEASE NOTES FOR THE CURRENT VERSION OF GENBANK PROVIDE DETAILED INFORMATION ABOUT
THE RELEASE AND NOTIFICATIONS OF UPCOMING CHANGES TO GENBANK. RELEASE NOTES FOR
PREVIOUS GENBANK RELEASES ARE ALSO AVAILABLE. GENBANK GROWTH STATISTICS FOR BOTH THE
TRADITIONAL GENBANK DIVISIONS AND THE WGS DIVISION ARE AVAILABLE FROM EACH RELEASE.
UNIPROT
UNIPROT IS THE UNIVERSAL PROTEIN RESOURCE.
• TO PROVIDE THE SCIENTIFIC COMMUNITY WITH A SINGLE, CENTRALIZED, AUTHORITATIVE RESOURCE FOR PROTEIN
SEQUENCES AND FUNCTIONAL INFORMATION, THE SWISS-PROT, TREMBL AND PIR PROTEIN DATABASE ACTIVITIES
HAVE UNITED TO FORM THE UNIVERSAL PROTEIN KNOWLEDGEBASE (UNIPROT) CONSORTIUM. ITS MISSION IS TO
PROVIDE A COMPREHENSIVE, FULLY CLASSIFIED, RICHLY AND ACCURATELY ANNOTATED PROTEIN SEQUENCE
KNOWLEDGEBASE, WITH EXTENSIVE CROSS-REFERENCES AND QUERY INTERFACES.
IN UNIPROT, ANNOTATION CONSISTS OF THE DESCRIPTION OF THE FOLLOWING ITEMS:
• FUNCTION(S) OF THE PROTEIN;
• ENZYME-SPECIFIC INFORMATION (CATALYTIC ACTIVITY, COFACTORS, METABOLIC PATHWAY, REGULATION
MECHANISMS);
• MOLECULAR WEIGHT DETERMINED BY MASS SPECTROMETRY;
• POLYMORPHISM(S);
• SIMILARITIES TO OTHER PROTEINS;
• USE OF THE PROTEIN IN A BIOTECHNOLOGICAL PROCESS;
• DISEASES ASSOCIATED WITH DEFICIENCIES OR ABNORMALITIES OF THE PROTEIN;
• USE OF THE PROTEIN AS A PHARMACEUTICAL DRUG
ENTREZ
• A SEARCH AND RETRIEVAL TOOL DEVELOPED BY NCBI THAT IS CAPABLE OF SEARCHING
MULTIPLE NCBI DATABASES WITH JUST ONE QUERY. ENTREZ RETURNS SEARCH RESULTS
THAT CAN INCLUDE A COMBINATION OF MANY TYPES OF DATA ON THE QUERY, SUCH AS
NUCLEOTIDE SEQUENCES, PROTEIN SEQUENCES, MACROMOLECULAR STRUCTURES, AND
RELATED ARTICLES IN THE LITERATURE.
• PRIOR TO THE CREATION OF ENTREZ, AN INDIVIDUAL MIGHT HAVE TO PLACE ONE
QUERY TO A NUCLEOTIDE DATABASE TO FIND A NUCLEOTIDE SEQUENCE, SUBMIT
ANOTHER QUERY TO A STRUCTURAL DATABASE TO FIND THE PUBLISHED STRUCTURE OF
THE GENE PRODUCT, AND SUBMIT A FINAL QUERY TO A LITERATURE DATABASE TO FIND
CITATIONS FOR JOURNAL ARTICLES ON THE QUERY TOPIC.
• NCBI RECOGNIZED THE TIME AND EFFORT THAT COULD BE SAVED BY A TOOL THAT
COULD CROSS-LINK THESE DATABASES AND INTEGRATE ALL INFORMATION RELATED TO
A GIVEN QUERY SUBJECT INTO ONE REPORT
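THE IDEA OF SENDING ONE QUERY TO SEVERAL NCBI DATABASES CAN ALSO BE SKETCHED PROGRAMMATICALLY THROUGH THE ENTREZ E-UTILITIES. THE EXAMPLE BELOW ONLY BUILDS `esearch` REQUEST URLS AND DOES NOT CONTACT NCBI; THE HELPER NAME `esearch_url`, THE SEARCH TERM AND THE CHOICE OF THREE DATABASES ARE ASSUMPTIONS MADE FOR ILLUSTRATION.

```python
from urllib.parse import urlencode

# Base address of the NCBI Entrez E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(database, term, max_results=5):
    """Build an esearch URL for one Entrez database (no request is sent)."""
    params = {
        "db": database,
        "term": term,
        "retmax": max_results,
        "retmode": "json",
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Hypothetical query: the same term is reused for sequence and literature databases.
term = "calmodulin AND Homo sapiens[Organism]"
urls = {db: esearch_url(db, term) for db in ("nucleotide", "protein", "pubmed")}
for db, url in urls.items():
    print(db, url)
```

EACH URL CAN BE OPENED IN A BROWSER OR FETCHED WITH AN HTTP CLIENT; NCBI'S USAGE GUIDELINES FOR THE E-UTILITIES SHOULD BE CHECKED BEFORE SENDING MANY REQUESTS.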
GENE ONTOLOGY
• THE GENE ONTOLOGY (GO) KNOWLEDGEBASE IS THE WORLD’S LARGEST SOURCE OF INFORMATION ON THE
FUNCTIONS OF GENES.
• THIS KNOWLEDGE IS BOTH HUMAN-READABLE AND MACHINE-READABLE, AND IS A FOUNDATION FOR
COMPUTATIONAL ANALYSIS OF LARGE-SCALE MOLECULAR BIOLOGY AND GENETICS EXPERIMENTS IN
BIOMEDICAL RESEARCH
• THE GENE ONTOLOGY ALLOWS USERS TO DESCRIBE A GENE/GENE PRODUCT IN DETAIL,
CONSIDERING THREE MAIN ASPECTS:
i. ITS MOLECULAR FUNCTION
ii. THE BIOLOGICAL PROCESS IN WHICH IT PARTICIPATES,
iii. AND ITS CELLULAR LOCATION.
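THE THREE ASPECTS ABOVE CAN BE PICTURED AS A SIMPLE DATA STRUCTURE. THIS IS A SKETCH UNDER STATED ASSUMPTIONS: THE CLASS `GOAnnotation`, THE HELPER `group_by_aspect` AND THE GENE NAME `GENE_X` ARE HYPOTHETICAL, AND THE THREE IDENTIFIERS USED ARE THE ROOT TERMS OF THE THREE GO ASPECTS RATHER THAN REAL ANNOTATIONS OF ANY PARTICULAR GENE.

```python
from dataclasses import dataclass

# The three GO aspects, named as in the ontology's namespaces.
ASPECTS = ("molecular_function", "biological_process", "cellular_component")


@dataclass(frozen=True)
class GOAnnotation:
    gene: str
    go_id: str
    aspect: str


def group_by_aspect(annotations):
    """Collect GO identifiers under the aspect they belong to."""
    grouped = {aspect: [] for aspect in ASPECTS}
    for annotation in annotations:
        if annotation.aspect not in grouped:
            raise ValueError(f"unknown GO aspect: {annotation.aspect}")
        grouped[annotation.aspect].append(annotation.go_id)
    return grouped


# GENE_X is a placeholder; the IDs are the root terms of each aspect.
annotations = [
    GOAnnotation("GENE_X", "GO:0003674", "molecular_function"),
    GOAnnotation("GENE_X", "GO:0008150", "biological_process"),
    GOAnnotation("GENE_X", "GO:0005575", "cellular_component"),
]
grouped = group_by_aspect(annotations)
print(grouped)
```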
GENE ONTOLOGY CONT.….
USES OF THE GENE ONTOLOGY INCLUDE:
• FINDING FUNCTIONAL SIMILARITIES IN GENES THAT ARE OVEREXPRESSED OR
UNDEREXPRESSED IN DISEASES AND AS WE AGE;
• PREDICTING THE LIKELIHOOD THAT A PARTICULAR GENE IS INVOLVED IN
DISEASES THAT HAVEN’T YET BEEN MAPPED TO SPECIFIC GENES;
• ANALYSING GROUPS OF GENES THAT ARE CO-EXPRESSED DURING
DEVELOPMENT;
• DEVELOPING AUTOMATED WAYS OF DERIVING INFORMATION ABOUT GENE
FUNCTION FROM THE LITERATURE.
#4.WHAT IS BLAST? WHAT TYPE OF INFORMATION
DOES A BLAST SEARCH GIVE YOU? BLASTN AND
BLASTP ETC.
BLAST
BASIC LOCAL ALIGNMENT SEARCH TOOL (BLAST)
• BLAST FINDS REGIONS OF LOCAL SIMILARITY BETWEEN BIOLOGICAL SEQUENCES. THE PROGRAM COMPARES
NUCLEOTIDE OR PROTEIN SEQUENCES TO SEQUENCE DATABASES AND CALCULATES THE STATISTICAL
SIGNIFICANCE OF MATCHES.
• BLAST CAN BE USED TO INFER FUNCTIONAL AND EVOLUTIONARY RELATIONSHIPS BETWEEN SEQUENCES
AS WELL AS HELP IDENTIFY MEMBERS OF GENE FAMILIES.
THERE ARE SEVERAL TYPES OF BLAST SEARCHES. NCBI'S WEB BLAST OFFERS FOUR MAIN SEARCH
TYPES: BLASTN, BLASTX, BLASTP AND TBLASTN.
THIS PRESENTATION LOOKS AT THE TWO SEARCHES THAT ARE MOST COMMONLY USED:
i. BLASTN (NUCLEOTIDE BLAST):
COMPARES ONE OR MORE NUCLEOTIDE QUERY SEQUENCES TO A SUBJECT NUCLEOTIDE
SEQUENCE OR A DATABASE OF NUCLEOTIDE SEQUENCES. THIS IS USEFUL WHEN TRYING TO
DETERMINE THE EVOLUTIONARY RELATIONSHIPS AMONG DIFFERENT ORGANISMS.
ii. BLASTP (PROTEIN BLAST):
COMPARES ONE OR MORE PROTEIN QUERY SEQUENCES TO A SUBJECT PROTEIN SEQUENCE OR
A DATABASE OF PROTEIN SEQUENCES. THIS IS USEFUL WHEN TRYING TO IDENTIFY A PROTEIN.
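THE CHOICE BETWEEN THE SEARCH TYPES CAN BE SKETCHED AS A LOOKUP ON THE TYPE OF THE QUERY AND THE TYPE OF THE DATABASE. THE FUNCTION NAMES BELOW ARE HYPOTHETICAL, AND THE SEQUENCE-TYPE GUESS IS A DELIBERATELY CRUDE ASSUMPTION (ONLY THE LETTERS A, C, G, T, U AND N COUNT AS NUCLEOTIDE), SO A SHORT PROTEIN MADE ONLY OF THOSE LETTERS WOULD BE MISCLASSIFIED.

```python
# (query type, database type) -> BLAST program
PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",   # translated query vs protein database
    ("protein", "nucleotide"): "tblastn",  # protein query vs translated database
}


def guess_sequence_type(sequence):
    """Crude guess: nucleotide if only A, C, G, T, U or N occur, else protein."""
    residues = set(sequence.upper()) - set(" \n")
    return "nucleotide" if residues <= set("ACGTUN") else "protein"


def choose_blast_program(query, database_type):
    """Pick a BLAST program for a raw query sequence and a database type."""
    return PROGRAMS[(guess_sequence_type(query), database_type)]


print(choose_blast_program("ATGAATCGATACGATAGC", "nucleotide"))
print(choose_blast_program("ADQLTEEQIAEFKEAF", "protein"))
```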
#5. DETAIL ON HOW TO
CONDUCT SEARCHES AND
ILLUSTRATE 2 SEARCHES AND
EXPLAIN RESULTS
HOW TO CONDUCT A BLAST SEARCH
i) FROM PROTEIN NAME TO A GENE SEQUENCE
GO TO THE GENBANK WEBSITE TO GET A SPECIFIC PROTEIN SEQUENCE FOR YOUR PROTEIN OF
CHOICE.
YOU CAN OBTAIN THE PROTEIN SEQUENCE IN FASTA FORMAT OR NOTE ITS ACCESSION NUMBER.
ii) IDENTIFYING SEQUENCES USING BLAST
1. NAVIGATE TO THE MAIN BLAST PAGE (https://blast.ncbi.nlm.nih.gov/Blast.cgi).
2. SELECT THE APPROPRIATE TYPE OF BLAST FOR YOUR SEQUENCE.
3. PASTE THE FIRST UNKNOWN SEQUENCE INTO THE BOX (FOR THIS ACTIVITY, YOU CAN
IGNORE THE SEARCH OPTIONS).
4. CLICK ON THE “BLAST” BUTTON AND WAIT FOR THE RESULTS. BLAST IS USUALLY
FAIRLY QUICK FOR SHORT SEQUENCES, BUT MAY STILL TAKE A FEW SECONDS.
5. ONCE THE RESULTS ARE DISPLAYED, NOTICE THERE ARE THREE MAIN HEADINGS:
GRAPHIC SUMMARY, DESCRIPTIONS, AND ALIGNMENTS (THESE MAY BE EXPANDED SO
YOU’LL HAVE TO SCROLL DOWN).
FASTA FORMAT
FASTA FORMAT IS USED TO REPRESENT EITHER NUCLEOTIDE OR PEPTIDE SEQUENCES.
THE FIRST LINE IS A COMMENT LINE, BEGINNING WITH “>” AND DESCRIBING THE
SEQUENCE. ALL THE FOLLOWING LINES ARE THE SEQUENCE, IN PLAIN TEXT.
EXAMPLE DNA SEQUENCE IN FASTA FORMAT:
>GI|23423|REF|NM_23542.0| HOMO SAPIENS PROTEIN
ATGAATCGATACGATAGCTAGCTATCGATGCA
GATCAGAGAGGGGCTTTAGCTAGCTAAGCTAG
EXAMPLE PROTEIN SEQUENCE IN FASTA FORMAT:
>MCHU - CALMODULIN - HUMAN, RABBIT, BOVINE, RAT, AND CHICKEN
ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGNGTID
FPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEEVDEMIREA
DIDGDGQVNYEEFVQMMTAK*
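THE RULE ABOVE (A “>” LINE FOLLOWED BY SEQUENCE LINES) IS SIMPLE ENOUGH TO PARSE BY HAND. THE SKETCH BELOW IS ONE POSSIBLE READER, NOT A REFERENCE IMPLEMENTATION: THE FUNCTION NAME `parse_fasta` AND THE SHORTENED SAMPLE RECORDS ARE ASSUMPTIONS, AND A TRAILING “*” IS TREATED AS AN END-OF-SEQUENCE MARKER AND DROPPED.

```python
def parse_fasta(text):
    """Return {description: sequence} for every record in FASTA-formatted text."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; the text after '>' describes the sequence.
            header = line[1:].strip()
            records[header] = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            # Assumption: a trailing '*' only marks the end of the sequence.
            records[header].append(line.rstrip("*"))
    return {name: "".join(parts) for name, parts in records.items()}


# Shortened, hypothetical records used only for illustration.
example = """>seq1 example DNA
ATGAATCGATACGATAGCTAGCTATCGATGCA
GATCAGAGAGGGGCTTTAGCTAGCTAAGCTAG
>seq2 example protein
ADQLTEEQIAEFKEAF*
"""
records = parse_fasta(example)
for name, sequence in records.items():
    print(name, len(sequence))
```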
ACCESSION NUMBER
• EXAMPLE: XM_005537111.1
• AN ACCESSION NUMBER IS A UNIQUE IDENTIFIER ASSIGNED TO A RECORD IN SEQUENCE DATABASES
SUCH AS GENBANK.
• IT HAS AN ALPHABETICAL PREFIX FOLLOWED BY A SERIES OF DIGITS.
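THE “PREFIX PLUS DIGITS” STRUCTURE CAN BE CHECKED WITH A SHORT REGULAR EXPRESSION. THIS IS A SIMPLIFIED SKETCH, NOT AN OFFICIAL VALIDATION RULE: THE PATTERN AND THE FUNCTION NAME `split_accession` ARE ASSUMPTIONS, AND THE PATTERN ALSO ALLOWS AN OPTIONAL UNDERSCORE AFTER THE PREFIX AND AN OPTIONAL “.VERSION” SUFFIX.

```python
import re

# Assumed simplified shape: letters, optional underscore, digits, optional ".version".
ACCESSION_PATTERN = re.compile(r"^([A-Z]+_?)(\d+)(?:\.(\d+))?$")


def split_accession(accession):
    """Split an accession into prefix, digits and version, or return None."""
    match = ACCESSION_PATTERN.match(accession.strip())
    if match is None:
        return None
    prefix, digits, version = match.groups()
    return {
        "prefix": prefix.rstrip("_"),
        "digits": digits,
        "version": int(version) if version else None,
    }


print(split_accession("XM_005537111.1"))
print(split_accession("U49845"))
print(split_accession("XM 005537111.1"))  # a space is not accepted by this pattern
```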
HOW TO INTERPRET RESULTS
BLAST RESULTS HAVE THE FOLLOWING FIELDS:
• E VALUE: THE E VALUE (EXPECTED VALUE) IS A NUMBER THAT DESCRIBES HOW MANY
TIMES YOU WOULD EXPECT A MATCH BY CHANCE IN A DATABASE OF THAT SIZE. THE
LOWER THE E VALUE IS, THE MORE SIGNIFICANT THE MATCH.
• PERCENT IDENTITY: THE PERCENT IDENTITY IS A NUMBER THAT DESCRIBES HOW SIMILAR
THE QUERY SEQUENCE IS TO THE TARGET SEQUENCE (THE PERCENTAGE OF ALIGNED
POSITIONS AT WHICH THE TWO SEQUENCES ARE IDENTICAL). THE HIGHER THE PERCENT
IDENTITY IS, THE CLOSER THE MATCH.
• QUERY COVER: THE QUERY COVER IS A NUMBER THAT DESCRIBES HOW MUCH OF THE
QUERY SEQUENCE IS COVERED BY THE ALIGNMENT TO THE TARGET SEQUENCE. IF THE
ALIGNMENT SPANS THE WHOLE QUERY SEQUENCE, THEN THE QUERY COVER IS 100%.
A LOW QUERY COVER MEANS THAT ONLY PART OF THE QUERY MATCHES THE TARGET.
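THE THREE FIELDS CAN BE MADE CONCRETE WITH A SMALL CALCULATION. THIS IS A SKETCH UNDER STATED ASSUMPTIONS: THE FUNCTION NAMES ARE HYPOTHETICAL, QUERY COVER IS COMPUTED FOR A SINGLE ALIGNED SEGMENT ONLY, AND THE E VALUE USES THE TEXTBOOK BIT-SCORE RELATION E = m × n × 2^(−S') WITH RAW LENGTHS, WHEREAS BLAST ITSELF USES EFFECTIVE LENGTHS.

```python
def percent_identity(aligned_query, aligned_subject):
    """Percentage of alignment columns where both sequences carry the same residue."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(
        1 for q, s in zip(aligned_query, aligned_subject) if q == s and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


def query_cover(query_start, query_end, query_length):
    """Percentage of the query covered by one aligned segment (1-based, inclusive)."""
    return 100.0 * (query_end - query_start + 1) / query_length


def expected_hits(bit_score, query_length, database_length):
    """E value from a bit score: E = m * n * 2**(-S'), using raw lengths."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical alignment: '-' marks a gap in the query.
identity = percent_identity("ATGC-TA", "ATGCGTA")
cover = query_cover(1, 50, 100)
e_value = expected_hits(20, 10, 100)
print(round(identity, 1), cover, e_value)
```

A HIGHER BIT SCORE OR A SMALLER DATABASE LOWERS THE E VALUE, WHICH MATCHES THE STATEMENT THAT LOWER E VALUES INDICATE MORE SIGNIFICANT MATCHES.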
QUESTIONS
1. IN THE DESCRIPTIONS SECTION, LOOK AT THE TOP RESULT, WHICH SHOULD BE
THE RESULT WITH THE HIGHEST SCORE. WRITE DOWN INFORMATION ABOUT THE
BEST MATCH:
• DESCRIPTION (NO NEED TO WRITE THE WHOLE THING)
• E VALUE
• IDENTITY
• QUERY COVER
2. NOW SCROLL DOWN TO THE ALIGNMENTS HEADING. LOOK AT THE TOP RESULT,
WHICH SHOULD BE THE SAME ONE. LOOK AT THE ALIGNMENT BETWEEN YOUR
QUERY AND THE REFERENCE. DO YOU SEE ANY MISMATCHES?
3. HOW CAN YOU JUDGE WHETHER THIS IS A GOOD MATCH?
EXAMPLE 2

Group 3 presentation.pptx

  • 2. MEMBERS 1) TAPIWANASHE V MTUNGWAZI R206014X1 2)TERRENCE S SITHOLE R202731Q 3)EDMUND T MAPHOSA R202713Q 4) TINOTENDA DHLIWAYO R193986W 5) MUNASHE LUKE R202758C 6) CLEOPATRA MWARIRA R205306R 7) NHIKA TADIWANASHE R197095Z 8) YEMURAI NENZOU R202732E 9)HONOUR MUSVIBE.T R202714V 10)ANTONY SARANAVO R207297Z
  • 3. QUESTION: EXPLAIN #1 :DNA AND PROTEIN DATABASES PROTEIN DATABASES A PROTEIN DATABASE IS A COLLECTION OF DATA THAT HAS BEEN CONSTRUCTED FROM PHYSICAL, CHEMICAL AND BIOLOGICAL INFORMATION ON SEQUENCE, DOMAIN STRUCTURE, FUNCTION, THREE‐DIMENSIONAL STRUCTURE AND PROTEIN‐PROTEIN INTERACTIONS. COLLECTIVELY, PROTEIN DATABASES MAY FORM A PROTEIN SEQUENCE DATABASE. IT IS THEREFORE IMPORTANT TO USE APPROPRIATE PROTEIN DATABASES WHICH 1) ANALYSE AND STORE DATA PERTAINING TO PROTEIN SCIENCE AND 2) FACILITATE USAGE OF ANALYTICAL SOFTWARE AVAILABLE TO THE SCIENTIFIC COMMUNITY
  • 4. CONT.… • GENERALLY CAN BE DIVIDED INTO TWO TYPES. THE FIRST TYPE IT IS A UNIVERSAL DATABASE, WHICH COVERS THE PROTEINS PRESENT IN ALL KNOWN BIOLOGICAL SPECIES. THE SECOND TYPE IS A SPECIALIZED DATABASE, AS DESCRIBED HERE, WHICH DEALS WITH THE PROTEINS BELONGING TO A SPECIFIC GROUP OR FAMILY OF PROTEINS OF CERTAIN SPECIES . EACH PROTEIN DATABASE CAN BE FURTHER CLASSIFIED INTO MORE SPECIALIZED CATEGORIES ACCORDING TO THE TYPE OF INFORMATION SOUGHT.
  • 5. DNA DATABASE IT IS A DATABASE OF DNA PROFILES WHICH CAN BE USED IN THE ANALYSIS OF GENETIC DISEASES, GENETIC FINGERPRINTING FOR CRIMINOLOGY, OR GENETIC GENEALOGY. ALSO CALLED A DNA DATABANK DNA DATABASES MAY BE PUBLIC OR PRIVATE, THE LARGEST ONES BEING NATIONAL DNA DATABASES. For instance the National DNA Index System (NDSI) WHICH IS PART OF CODIS THE NATIONAL LEVEL CONTAINING THE DNA PROFILES CONTRIBUTED BY FEDERAL, STATE AND LOCAL PARTICIPATING FORENSIC LABORATORIES. CODIS (COMBINED DNA INDEX SYSTEM)THIS DATABASE IS USED BY THE FBI IN CRIMINOLOGY.
  • 6. #2 DATA STORAGE, INFORMATION RETRIEVAL AND FILE FORMATS DATA STORAGE:  IS THE RETENTION OF INFORMATION USING TECHNOLOGY SPECIFICALLY DEVELOPED TO KEEP THAT DATA AND HAVE IT AS ACCESSIBLE AS NECESSARY  DATA STORAGE REFERS TO THE USE OF RECORDING MEDIA TO RETURN DATA USING COMPUTER  THE MOST PREVALENT FORMS OF DATA ARE STORAGE ARE FILE STORAGE, BLOCK STORAGE, AND OBJECT STORAGE ,WITH EACH BEING IDEAL FOR DIFFERENT PURPOSES INFORMATION RETRIEVAL(IR):  IT THE FIELD OF COMPUTER SCIENCE THAT DEALS WITH THE PROCESSING OF DOCUMENTS CONTAINING FREE TEXT, SO THAT THEY CAN BE RAPIDLY RETRIEVED BASED ON KEYWORDS SPECIFIED IN A USERS QUERY FILE FORMATS:  THE FILE FORMAT IS THE STRUCTURE OF A FILE THAT TELLS A PROGRAM HOW TO DISPLAY ITS CONTENTS AND THE EXAMPLES INCLUDE; THE FASTA FORMAT, FASTQ, THE SAM /BAM FORMAT, THE VCF AND GFF FORMAT
  • 7. #3 NCBI AND EBI RESOURCES FOR THE MOLECULAR DOMAIN OF BIOINFORMATICS, GENBANK UNIPROT, ENTREZ AND GENE ONTOLOGY: • NCBI DATABASES • NCBI (NATIONAL CENTRE FOR BIOTECHNOLOGY INFORMATION) • THE NCBI HOUSES A SERIES OF DATABASES RELEVANT TO BIOTECHNOLOGY AND BIOMEDICINE AND IS AN IMPORTANT RESOURCE FOR BIOINFORMATICS TOOLS AND SERVICES. MAJOR DATABASES INCLUDE GENBANK FOR DNA SEQUENCES AND PUBMED, A BIBLIOGRAPHIC DATABASE FOR THE BIOMEDICAL LITERATURE
  • 8. EBI EUROPEAN BIOINFORMATICS INSTITUTE (EBI) MAINTAINS AND DISTRIBUTES THE EMBL NUCLEOTIDE SEQUENCE DATA-BASE, EUROPE’S PRIMARY NUCLEOTIDE SEQUENCE DATA RESOURCE. THE EBI ALSO MAINTAINS AND DISTRIBUTES THE SWISS-PROT PROTEIN SEQUENCE DATABASE. OVER FIFTY ADDITIONAL SPECIALIST MOLECULAR BIOLOGY DATABASES, AS WELL AS SOFTWARE AND DOCUMENTATION OF INTEREST TO MOLECULAR BIOLOGISTS ARE AVAILABLE. THE EBI NETWORK SERVICES INCLUDE DATABASE SEARCHING AND SEQUENCE SIMILARITY SEARCHING FACILITIES. EBI IS A SINGLE FIGURE PROFIT INDEX AIMED AT HELPING FARMERS IDENTIFY THE MOST PROFITABLE BULLS AND COWS FOR BREEDING DAIRY HERD REPLACEMENTS. IT COMPRISES OF INFORMATION ON SEVEN SUB-INDEXES RELATED TO PROFITABLE MILK PRODUCTION.
  • 9. WHAT IS GENBANK? • THE GENBANK DATABASE IS DESIGNED TO PROVIDE AND ENCOURAGE ACCESS WITHIN THE SCIENTIFIC COMMUNITY TO THE MOST UP-TO-DATE AND COMPREHENSIVE DNA SEQUENCE INFORMATION. THEREFORE, NCBI PLACES NO RESTRICTIONS ON THE USE OR DISTRIBUTION OF THE GENBANK DATA. HOWEVER, SOME SUBMITTERS MAY CLAIM PATENT, COPYRIGHT, OR OTHER INTELLECTUAL PROPERTY RIGHTS IN ALL OR A PORTION OF THE DATA THEY HAVE SUBMITTED. NCBI IS NOT IN A POSITION TO ASSESS THE VALIDITY OF SUCH CLAIMS, AND THEREFORE CANNOT PROVIDE COMMENT OR UNRESTRICTED PERMISSION CONCERNING THE USE, COPYING, OR DISTRIBUTION OF THE INFORMATION CONTAINED IN GENBANK. • A GENBANK RELEASE OCCURS EVERY TWO MONTHS AND IS AVAILABLE FROM THE FTP SITE. THE RELEASE NOTES FOR THE CURRENT VERSION OF GENBANK PROVIDE DETAILED INFORMATION ABOUT THE RELEASE AND NOTIFICATIONS OF UPCOMING CHANGES TO GENBANK. RELEASE NOTES FOR PREVIOUS GENBANK RELEASES ARE ALSO AVAILABLE. GENBANK GROWTH STATISTICS FOR BOTH THE TRADITIONAL GENBANK DIVISIONS AND THE WGS DIVISION ARE AVAILABLE FROM EACH RELEASE.
  • 10. UNIPROT • UNIPROT IS THE UNIVERSAL PROTEIN RESOURCE. • TO PROVIDE THE SCIENTIFIC COMMUNITY WITH A SINGLE, CENTRALIZED, AUTHORITATIVE RESOURCE FOR PROTEIN SEQUENCES AND FUNCTIONAL INFORMATION, THE SWISS-PROT, TREMBL AND PIR PROTEIN DATABASE ACTIVITIES HAVE UNITED TO FORM THE UNIVERSAL PROTEIN KNOWLEDGEBASE (UNIPROT) CONSORTIUM. ITS MISSION IS TO PROVIDE A COMPREHENSIVE, FULLY CLASSIFIED, RICHLY AND ACCURATELY ANNOTATED PROTEIN SEQUENCE KNOWLEDGEBASE, WITH EXTENSIVE CROSS-REFERENCES AND QUERY INTERFACES. IN UNIPROT, ANNOTATION CONSISTS OF THE DESCRIPTION OF THE FOLLOWING ITEMS: • FUNCTION(S) OF THE PROTEIN; • ENZYME-SPECIFIC INFORMATION (CATALYTIC ACTIVITY, COFACTORS, METABOLIC PATHWAY, REGULATION MECHANISMS); • MOLECULAR WEIGHT DETERMINED BY MASS SPECTROMETRY; • POLYMORPHISM(S); • SIMILARITIES TO OTHER PROTEINS; • USE OF THE PROTEIN IN A BIOTECHNOLOGICAL PROCESS; • DISEASES ASSOCIATED WITH DEFICIENCIES OR ABNORMALITIES OF THE PROTEIN; • USE OF THE PROTEIN AS A PHARMACEUTICAL DRUG.
  • 11. ENTREZ • A SEARCH AND RETRIEVAL TOOL DEVELOPED BY NCBI THAT IS CAPABLE OF SEARCHING MULTIPLE NCBI DATABASES WITH JUST ONE QUERY. ENTREZ RETURNS SEARCH RESULTS THAT CAN INCLUDE A COMBINATION OF MANY TYPES OF DATA ON THE QUERY, SUCH AS NUCLEOTIDE SEQUENCES, PROTEIN SEQUENCES, MACROMOLECULAR STRUCTURES, AND RELATED ARTICLES IN THE LITERATURE. • PRIOR TO THE CREATION OF ENTREZ, AN INDIVIDUAL MIGHT HAVE TO PLACE ONE QUERY TO A NUCLEOTIDE DATABASE TO FIND A NUCLEOTIDE SEQUENCE, SUBMIT ANOTHER QUERY TO A STRUCTURAL DATABASE TO FIND THE PUBLISHED STRUCTURE OF THE GENE PRODUCT, AND SUBMIT A FINAL QUERY TO A LITERATURE DATABASE TO FIND CITATIONS FOR JOURNAL ARTICLES ON THE QUERY TOPIC. • NCBI RECOGNIZED THE TIME AND EFFORT THAT COULD BE SAVED BY A TOOL THAT COULD CROSS-LINK THESE DATABASES AND INTEGRATE ALL INFORMATION RELATED TO A GIVEN QUERY SUBJECT INTO ONE REPORT
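The idea of sending one query term to several NCBI databases can be sketched in Python. This is a minimal illustration, not part of the original slides: it assumes NCBI's public E-utilities `esearch.fcgi` endpoint, and the function name `build_esearch_url` and the example search term are hypothetical. The code only builds the request URLs and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities (the programmatic interface to Entrez).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(database: str, term: str, max_results: int = 5) -> str:
    """Return an ESearch URL that looks up `term` in one Entrez database."""
    params = {
        "db": database,         # which Entrez database to search
        "term": term,           # the query text
        "retmax": max_results,  # maximum number of record IDs to return
        "retmode": "json",      # ask for a JSON reply instead of XML
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # The same (hypothetical) query term reused across sequence,
    # structure and literature databases.
    query = "calmodulin AND Homo sapiens[Organism]"
    for db in ("nuccore", "protein", "structure", "pubmed"):
        print(build_esearch_url(db, query))
```

Opening any of the printed URLs in a browser should return the matching record IDs for that database, which is roughly what the Entrez web page does behind a single search box.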
  • 12. GENE ONTOLOGY • THE GENE ONTOLOGY (GO) KNOWLEDGEBASE IS THE WORLD’S LARGEST SOURCE OF INFORMATION ON THE FUNCTIONS OF GENES. • THIS KNOWLEDGE IS BOTH HUMAN-READABLE AND MACHINE-READABLE, AND IS A FOUNDATION FOR COMPUTATIONAL ANALYSIS OF LARGE-SCALE MOLECULAR BIOLOGY AND GENETICS EXPERIMENTS IN BIOMEDICAL RESEARCH • THE GENE ONTOLOGY ALLOWS USERS TO DESCRIBE A GENE/GENE PRODUCT IN DETAIL, CONSIDERING THREE MAIN ASPECTS: i. ITS MOLECULAR FUNCTION ii. THE BIOLOGICAL PROCESS IN WHICH IT PARTICIPATES, iii. AND ITS CELLULAR LOCATION.
  • 13. GENE ONTOLOGY CONT.… THE FUNCTIONS • FINDING FUNCTIONAL SIMILARITIES IN GENES THAT ARE OVEREXPRESSED OR UNDEREXPRESSED IN DISEASES AND AS WE AGE; • PREDICTING THE LIKELIHOOD THAT A PARTICULAR GENE IS INVOLVED IN DISEASES THAT HAVEN’T YET BEEN MAPPED TO SPECIFIC GENES; • ANALYSING GROUPS OF GENES THAT ARE CO-EXPRESSED DURING DEVELOPMENT; • DEVELOPING AUTOMATED WAYS OF DERIVING INFORMATION ABOUT GENE FUNCTION FROM THE LITERATURE.
  • 14. #4.WHAT IS BLAST? WHAT TYPE OF INFORMATION DOES A BLAST SEARCH GIVE YOU? BLASTN AND BLASTP ETC.
  • 15. BLAST BASIC LOCAL ALIGNMENT SEARCH TOOL (BLAST) • BLAST FINDS REGIONS OF LOCAL SIMILARITY BETWEEN BIOLOGICAL SEQUENCES. • THE PROGRAM COMPARES NUCLEOTIDE OR PROTEIN SEQUENCES TO SEQUENCE DATABASES AND CALCULATES THE STATISTICAL SIGNIFICANCE OF MATCHES. • BLAST CAN BE USED TO INFER FUNCTIONAL AND EVOLUTIONARY RELATIONSHIPS BETWEEN SEQUENCES AS WELL AS HELP IDENTIFY MEMBERS OF GENE FAMILIES.
  • 16. THERE ARE SEVERAL TYPES OF BLAST SEARCHES. NCBI'S WEB BLAST OFFERS FOUR MAIN SEARCH TYPES: BLASTN, BLASTX, BLASTP AND TBLASTN. THIS PRESENTATION LOOKS AT THE TWO MOST COMMONLY USED SEARCHES: i. BLASTN (NUCLEOTIDE BLAST): COMPARES ONE OR MORE NUCLEOTIDE QUERY SEQUENCES TO A SUBJECT NUCLEOTIDE SEQUENCE OR A DATABASE OF NUCLEOTIDE SEQUENCES. THIS IS USEFUL WHEN TRYING TO DETERMINE THE EVOLUTIONARY RELATIONSHIPS AMONG DIFFERENT ORGANISMS. ii. BLASTP (PROTEIN BLAST): COMPARES ONE OR MORE PROTEIN QUERY SEQUENCES TO A SUBJECT PROTEIN SEQUENCE OR A DATABASE OF PROTEIN SEQUENCES. THIS IS USEFUL WHEN TRYING TO IDENTIFY A PROTEIN (SEE FROM SEQUENCE TO PROTEIN AND GENE).
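As a small sketch of how the four search types differ, the Python function below picks a program from the type of the query and the type of the database. The function name `choose_blast_program` is hypothetical and only the four programs named on the slide are covered. BLASTX translates a nucleotide query and searches a protein database, and TBLASTN searches a protein query against a translated nucleotide database.

```python
def choose_blast_program(query_type: str, database_type: str) -> str:
    """Pick one of the four main BLAST programs from the sequence types.

    Both arguments must be either "nucleotide" or "protein".
    """
    programs = {
        ("nucleotide", "nucleotide"): "blastn",  # DNA/RNA query vs nucleotide database
        ("protein", "protein"): "blastp",        # protein query vs protein database
        ("nucleotide", "protein"): "blastx",     # translated nucleotide query vs protein database
        ("protein", "nucleotide"): "tblastn",    # protein query vs translated nucleotide database
    }
    key = (query_type.lower(), database_type.lower())
    if key not in programs:
        raise ValueError(f"Unsupported combination: {key}")
    return programs[key]


if __name__ == "__main__":
    print(choose_blast_program("nucleotide", "nucleotide"))
    print(choose_blast_program("protein", "protein"))
```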
  • 17. #5. DETAIL ON HOW TO CONDUCT SEARCHES AND ILLUSTRATE 2 SEARCHES AND EXPLAIN RESULTS
  • 18. HOW TO CONDUCT A BLAST SEARCH i) FROM PROTEIN NAME TO A GENE SEQUENCE: GO TO THE GENBANK WEBSITE TO GET THE SEQUENCE FOR YOUR PROTEIN OF CHOICE. YOU CAN GET THE SEQUENCE IN FASTA FORMAT OR AS AN ACCESSION NUMBER. ii) IDENTIFYING SEQUENCES USING BLAST: 1. NAVIGATE TO THE MAIN BLAST PAGE (https://blast.ncbi.nlm.nih.gov/Blast.cgi). 2. SELECT THE APPROPRIATE TYPE OF BLAST FOR YOUR SEQUENCE. 3. PASTE THE FIRST UNKNOWN SEQUENCE INTO THE QUERY BOX (FOR THIS ACTIVITY, YOU CAN IGNORE THE SEARCH OPTIONS). 4. CLICK ON THE “BLAST” BUTTON AND WAIT FOR THE RESULTS. BLAST IS USUALLY FAIRLY QUICK FOR SHORT SEQUENCES, BUT MAY STILL TAKE A FEW SECONDS. 5. ONCE THE RESULTS ARE DISPLAYED, NOTICE THERE ARE THREE MAIN HEADINGS: GRAPHIC SUMMARY, DESCRIPTIONS, AND ALIGNMENTS (THESE MAY BE EXPANDED SO YOU’LL HAVE TO SCROLL DOWN).
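The web steps above (choose a program, paste a sequence, click BLAST) have a scripted counterpart. The sketch below is an assumption-laden illustration rather than the presenters' method: it assumes the `CMD`, `PROGRAM`, `DATABASE` and `QUERY` parameters of NCBI's BLAST URL API, the function name `build_blast_submission` is hypothetical, and the code only prepares the form data without sending it.

```python
from urllib.parse import urlencode

# Address of the main BLAST page, which also accepts scripted requests.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_blast_submission(query: str, program: str = "blastn",
                           database: str = "nt") -> str:
    """Return form-encoded data for submitting one BLAST search.

    `query` may be a raw sequence, FASTA text, or an accession number.
    """
    params = {
        "CMD": "Put",          # "Put" submits a new search
        "PROGRAM": program,    # step 2: the type of BLAST
        "DATABASE": database,  # e.g. "nt" for nucleotides, "nr" for proteins
        "QUERY": query,        # step 3: the sequence pasted into the box
    }
    return urlencode(params)


if __name__ == "__main__":
    # Short DNA fragment taken from the FASTA example in this presentation.
    body = build_blast_submission("ATGAATCGATACGATAGCTAGCTATCGATGCA")
    print(BLAST_URL)
    print(body)
```

Sending this data to the BLAST address would correspond to step 4. The results then have to be fetched in a second request once the search has finished, which is why the web page asks you to wait.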
  • 19. FASTA FORMAT FASTA FORMAT IS USED TO REPRESENT EITHER NUCLEOTIDE OR PEPTIDE SEQUENCES. THE FIRST LINE IS A COMMENT LINE, BEGINNING WITH “>” AND DESCRIBING THE SEQUENCE. ALL THE FOLLOWING LINES ARE THE SEQUENCE, IN PLAIN TEXT.
    EXAMPLE DNA SEQUENCE IN FASTA FORMAT:
    >GI|23423|REF|NM_23542.0| HOMO SAPIENS PROTEIN
    ATGAATCGATACGATAGCTAGCTATCGATGCA
    GATCAGAGAGGGGCTTTAGCTAGCTAAGCTAG
    EXAMPLE PROTEIN SEQUENCE IN FASTA FORMAT:
    >MCHU - CALMODULIN - HUMAN, RABBIT, BOVINE, RAT, AND CHICKEN
    ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGNGTID
    FPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEEVDEMIREA
    DIDGDGQVNYEEFVQMMTAK*
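Because the format is just a “>” line followed by sequence lines, it is easy to read with a short program. The following Python sketch is an illustration only (the function name `parse_fasta` is hypothetical). It joins the sequence lines under each header and drops a trailing “*”, which some files use to mark the end of a protein.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each FASTA header (without the ">") to its full sequence."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()  # start of a new record
            records[header] = ""
        elif header is not None:
            # Sequence lines are concatenated; a trailing "*" is removed.
            records[header] += line.rstrip("*")
    return records


if __name__ == "__main__":
    example = [
        ">GI|23423|REF|NM_23542.0| HOMO SAPIENS PROTEIN",
        "ATGAATCGATACGATAGCTAGCTATCGATGCA",
        "GATCAGAGAGGGGCTTTAGCTAGCTAAGCTAG",
    ]
    for name, sequence in parse_fasta(example).items():
        print(name, len(sequence), sequence)
```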
  • 20. ACCESSION NUMBER • EXAMPLE: XM_005537111.1 • THIS IS A UNIQUE IDENTIFIER ASSIGNED TO A RECORD IN SEQUENCE DATABASES SUCH AS GENBANK • IT HAS AN ALPHABETICAL PREFIX FOLLOWED BY A SERIES OF DIGITS AND, OPTIONALLY, A VERSION SUFFIX (HERE “.1”).
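The structure of an accession number (letters, then digits, then an optional version) can be checked with a regular expression. This Python sketch is a simplification for illustration: the pattern and the function name `split_accession` are assumptions, and real databases apply stricter rules about prefix and digit lengths.

```python
import re
from typing import Optional, Tuple

# Letters, an optional underscore (used by RefSeq, e.g. XM_), digits,
# and an optional ".version" suffix.
ACCESSION_PATTERN = re.compile(r"^([A-Z]+)_?(\d+)(?:\.(\d+))?$")


def split_accession(accession: str) -> Tuple[str, str, Optional[int]]:
    """Split an accession into (prefix, digits, version or None)."""
    match = ACCESSION_PATTERN.match(accession.strip().upper())
    if match is None:
        raise ValueError(f"Not a recognisable accession number: {accession!r}")
    prefix, digits, version = match.groups()
    return (prefix, digits, int(version) if version else None)


if __name__ == "__main__":
    print(split_accession("XM_005537111.1"))
```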
  • 21. HOW TO INTERPRET RESULTS BLAST RESULTS HAVE THE FOLLOWING FIELDS: • E VALUE: THE E VALUE (EXPECT VALUE) IS A NUMBER THAT DESCRIBES HOW MANY MATCHES WITH AN EQUAL OR BETTER SCORE YOU WOULD EXPECT TO FIND BY CHANCE IN A DATABASE OF THAT SIZE. THE LOWER THE E VALUE IS, THE MORE SIGNIFICANT THE MATCH. • PERCENT IDENTITY: THE PERCENT IDENTITY IS A NUMBER THAT DESCRIBES HOW SIMILAR THE QUERY SEQUENCE IS TO THE TARGET SEQUENCE (THE PERCENTAGE OF ALIGNED POSITIONS AT WHICH THE TWO SEQUENCES HAVE THE SAME CHARACTER). THE HIGHER THE PERCENT IDENTITY IS, THE CLOSER THE MATCH. • QUERY COVER: THE QUERY COVER IS A NUMBER THAT DESCRIBES HOW MUCH OF THE QUERY SEQUENCE IS ALIGNED TO THE TARGET SEQUENCE. IF THE TARGET SEQUENCE IN THE DATABASE SPANS THE WHOLE QUERY SEQUENCE, THEN THE QUERY COVER IS 100%. A LOW QUERY COVER MEANS THAT ONLY PART OF THE QUERY MATCHES THE TARGET.
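The three fields can be made concrete with a little arithmetic. The Python sketch below is a simplified illustration and not BLAST's actual implementation: the function names are hypothetical, query cover is computed for a single aligned region only, and the E value uses the textbook relation E = m × n × 2^(−S′) with bit score S′, query length m and database length n (BLAST itself applies length corrections).

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percentage of alignment columns where both sequences agree."""
    if len(aligned_query) != len(aligned_subject) or not aligned_query:
        raise ValueError("Aligned sequences must be non-empty and equal in length")
    # Gap characters ("-") never count as identical positions.
    matches = sum(1 for q, s in zip(aligned_query, aligned_subject)
                  if q == s and q != "-")
    return 100.0 * matches / len(aligned_query)


def query_cover(query_start: int, query_end: int, query_length: int) -> float:
    """Percentage of the query covered by one aligned region (1-based, inclusive)."""
    return 100.0 * (query_end - query_start + 1) / query_length


def expect_value(bit_score: float, query_length: int, database_length: int) -> float:
    """Approximate E value from a bit score: E = m * n * 2**(-S')."""
    return query_length * database_length * 2.0 ** (-bit_score)


if __name__ == "__main__":
    print(percent_identity("ATGAATCG", "ATGATTCG"))
    print(query_cover(1, 32, 64))
    print(expect_value(50.0, 64, 10**9))
```

A higher bit score or a smaller database gives a smaller E value, which matches the rule on the slide that a lower E value means a more significant match.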
  • 22. QUESTIONS 1. IN THE DESCRIPTIONS SECTION, LOOK AT THE TOP RESULT, WHICH SHOULD BE THE RESULT WITH THE HIGHEST SCORE. WRITE DOWN INFORMATION ABOUT THE BEST MATCH: • DESCRIPTION (NO NEED TO WRITE THE WHOLE THING) • E VALUE • IDENTITY • QUERY COVER 2. NOW SCROLL DOWN TO THE ALIGNMENTS HEADING. LOOK AT THE TOP RESULT, WHICH SHOULD BE THE SAME ONE. LOOK AT THE ALIGNMENT BETWEEN YOUR QUERY AND THE REFERENCE. DO YOU SEE ANY MISMATCHES? 3. HOW CAN YOU JUDGE WHETHER THIS IS A GOOD MATCH?

Editor's Notes

  1. The E value can also be described as the number of hits with a similar score that you would expect to find just by chance. It is not strictly a probability, but the closer it is to zero, the more confident you can be that the match is the correct gene and not a chance hit.
  2. You can also use the XM number (the accession number) in cases where the description does not give a clear name, because not all proteins are named in all species.