An Introduction to "Bioinformatics & Internet"


An introduction to bioinformatics and its relationship with the internet.

Published in: Education, Technology

  1. Introduction to Computers (01/31/14)
  2. Bioinformatics & Internet. By Asar Khan, M.Sc. Zoology, AWKUM Buner Campus
  3. Bioinformatics
     • Information technology applied to biological information in order to receive, analyze and retrieve biological data.
  4. Let's discuss in detail…
  5. What is a Computer?
     • In general, a computer is a machine that accepts data, processes it and returns new information as output: Data (input) → Processing → Information (output).
  6. Software
     • Software is a set of programs (step-by-step instructions) telling the computer how to process data. Firmware is a particular kind of software that is embedded in a hardware device; the two terms are not synonyms.
     • Software needs to be installed on a computer, usually from a CD or USB drive.
     • Examples: digital audio editors, Windows 98, Windows 2000, Windows XP, Windows 7, MS Office.
  7. Advantages of Using Computers
     • Speed: computers can carry out an instruction in less than a millionth of a second.
     • Accuracy: computers perform calculations very accurately, provided the program and the input data are correct.
     • Diligence: computers can perform any task given to them repetitively without tiring.
     • Storage capacity: computers can store large volumes of data and information on magnetic and other storage media.
  8. Computer Languages
     • Commonly used high-level programming languages are Ada, Visual Basic, C, C++, COBOL, Java, Lisp and Pascal.
     • Commonly used scripting languages are the Bourne shell, JavaScript, Python, Ruby, PHP and Perl.
  9. What is Perl?
     • Larry Wall developed Perl in 1987.
     • Perl is an interpreted language optimized for scanning arbitrary text files, extracting information from these files, and printing reports based on that information.
     • It is also a good language for many system management tasks.
     • In addition, Perl 5 is used for graphics programming, system administration, network programming, finance, bioinformatics and other applications.
  10. Advantages of Perl
      • Cost and licensing: Perl has a generous license and is free to use.
      • Availability: Perl is available on most server platforms, including most UNIX variants, MS-DOS, Windows NT, Windows 95 and OS/2.
  11. What is the Internet?
      • The Internet is a global system of interconnected computer networks that use the standard Internet protocol suite (TCP/IP) to serve several billion users worldwide.
      • The Internet provides many services:
        – Email
        – World Wide Web (WWW)
        – Remote login (Telnet)
        – File transfer (FTP)
  12. Computer Network
      • A computer network is an interconnection of computers for sharing resources.
      • Resources can be information, load, devices etc.
  13. Types of Computer Networks (on the basis of size)
      • Local Area Network (LAN): a network of computers in one locality, i.e. in one room, one building or a home.
      • Wide Area Network (WAN): a network of computers spread over a wide geographical area.
  14. Benefits of Computer Networks
      • Information sharing
      • Device sharing
      • Load sharing
      • Mobility
      • Fast communication
      • Anywhere, anytime banking
  15. How to get connected?
      • We can get connected through a modem, which transmits data as signals over twisted-pair copper cables.
  16. Through Wi-Fi
      • Wi-Fi is a popular technology that allows an electronic device to exchange data or connect to the internet wirelessly using radio waves.
  17. Browsers
      • Browsers are clients that communicate with servers using a set of standard protocols and conventions.
      • A browser contains the software we need in order to find, retrieve, view and send information over the internet.
  18. Browsers
      • Lynx: developed at the University of Kansas, USA, to construct a campus-wide information system. It provides only a text-only interface, at low cost.
      • Mosaic: developed in 1993 at NCSA, University of Illinois, USA, and designed for MS Windows. It provides a single user-friendly interface to the diverse protocols, data formats and information servers available throughout the internet.
  19. • Netscape: developed in 1994 by Netscape Communications Corporation (NCC), California, USA. It became the most popular package for browsing information on the internet, e.g. e-mail, audio, video etc.
      • Internet Explorer: developed in 1995 by Microsoft Corp., Redmond, USA, and designed to work with PC-based operating systems. It offers hypermedia browsing, including support for Java and ActiveX. The user can navigate by clicking on specific buttons or pictures, which are known as hyperlinks.
  20. • Hyperlinks are usually highlighted in some way, either by using a different color from the main body of the text or by being boxed etc.
      • Each link has a uniform address known as a URL (Uniform Resource Locator).
      • HTTP (Hypertext Transfer Protocol) is used to exchange information over the internet.
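To make the idea of a URL more concrete, here is a minimal sketch that splits an address into its parts with Python's standard `urllib.parse` module. The address used is a made-up placeholder chosen for illustration; it does not come from the slides.

```python
from urllib.parse import urlsplit

# Hypothetical address, used only for illustration (not from the slides).
url = "http://www.example.org/tools/blast.html#help"

parts = urlsplit(url)

print(parts.scheme)    # the protocol used to fetch the resource
print(parts.netloc)    # the host name of the server
print(parts.path)      # the location of the document on that server
print(parts.fragment)  # a named position inside the document
```

The scheme tells the browser which protocol to speak (here HTTP), while the host name and path together identify the document that a hyperlink points to.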
  21. • HTML (Hypertext Markup Language): hypertext documents are written in a standard markup language known as HTML. HTML code is strictly text-based, and any graphics or sound associated with a document exist as separate files in a common format.
  22. EMBnet
      • EMBnet (European Molecular Biology network) is an international network that aims to enhance bioinformatics services by bringing together bioinformatics service providers.
  23. EMBnet
      • Computers store sequence information as simple rows of sequence characters called strings. Each character is stored in binary code in a unit of memory called a byte (1 byte = 8 bits; the bit is the smallest unit of memory).
      • A DNA sequence is usually stored and read in the computer as a series of 8-bit words in binary format. Each bit has the value 0 or 1, giving 256 possible combinations (the values 0 to 255).
      • A protein sequence appears as a series of 8-bit words comprising the binary form of the amino acid letters.
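The byte-level storage described on this slide can be sketched in a few lines of Python. The sequence `GATTACA` is a made-up example, and ASCII is assumed as the character encoding.

```python
seq = "GATTACA"  # hypothetical example sequence, not from the slides

# Encode the string as ASCII: one byte (8 bits) per character.
raw = seq.encode("ascii")

# Show each base with its numeric code and its 8-bit binary form.
for base, code in zip(seq, raw):
    print(base, code, format(code, "08b"))

# An 8-bit word can take 2**8 distinct values, numbered 0 to 255.
n_values = 2 ** 8
print(n_values)
```

Since DNA has only four bases, one byte per base is more than strictly needed; plain text is used here because it is simple to read and exchange.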
  24. • Normally DNA and protein sequences are presented as ASCII (American Standard Code for Information Interchange) text in FASTA (FAST-All) format, for example:
        >MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken
        ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGNGTID
        FPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEEVDEMIREA
        DIDGDGQVNYEEFVQMMTAK*
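A FASTA record is easy to read programmatically: a header line starting with `>` is followed by one or more lines of sequence. The sketch below is a minimal hand-written parser, not a replacement for an established library. The function name `parse_fasta` is an illustrative choice, and the calmodulin record from the slide is assumed to be laid out with one header line and three sequence lines.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            # Sequence line; drop a trailing '*' terminator if present.
            chunks.append(line.rstrip("*"))
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


fasta_text = """>MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken
ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGNGTID
FPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEEVDEMIREA
DIDGDGQVNYEEFVQMMTAK*
"""

records = parse_fasta(fasta_text)
header, sequence = records[0]
print(header)
print(len(sequence), "residues")
```

Joining the sequence lines removes the line breaks, which are only a display convention and carry no biological meaning.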
  25. Telnet
      • It allows a user to log on to a remote computer and access its facilities. It is useful only for occasional queries.
      • Its disadvantages are the extensive management of user identification that it requires and the risk of overloading the processing power of the remote computer.
  26. Address
      • To facilitate communication between nodes, each computer on the internet is given a unique identifying number (its IP address).
      • It is encoded in dotted decimal format; each address represents a particular machine (PC).
      • The domain name system is also implemented, which makes internet addresses easier for users. In such a name, for example, ncbi = National Center for Biotechnology Information, nlm = National Library of Medicine, nih = National Institutes of Health.
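Dotted decimal notation can be illustrated with Python's standard `ipaddress` module. The address `192.0.2.1` is an assumption made for this sketch: it comes from a range reserved for documentation and is not the address shown on the original slide.

```python
import ipaddress

# 192.0.2.1 belongs to a range reserved for documentation examples;
# it is a placeholder, not the address from the slides.
addr = ipaddress.IPv4Address("192.0.2.1")

# Dotted decimal format: four decimal numbers, each one byte (0 to 255).
octets = [int(part) for part in str(addr).split(".")]
print(octets)

# The same address seen as a single 32-bit number and in binary.
as_int = int(addr)
print(as_int)
print(format(as_int, "032b"))
```

The dotted form is only a human-friendly way of writing a 32-bit number, and domain names add a further, more memorable layer on top of it.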
  27. World Wide Web (WWW)
      • The World Wide Web consists of all the public web sites connected to the Internet worldwide, including the client devices (such as computers and cell phones) that access web content.
  28. • It was developed at CERN (the European Organization for Nuclear Research) in 1989 to allow information to be shared internationally. It led to a medium through which text, images, sounds and videos could be delivered on demand to users.
      • The WWW greatly enhanced the power of cross-referencing, with the guarantee of retrieving the latest information.
      • The first molecular biology web server was ExPASy (Expert Protein Analysis System), developed in 1993 by Geneva University Hospital and the University of Geneva.
  29. Web pages
      • The documents that appear in the web browser window when we surf the WWW are called web pages.
      • Each document displayed on the web is called a "web page", and all of the related pages on a particular server are collectively called a web site.
      • A web site is a collection of related web pages stored on one computer, and each web site has a unique address. The most important feature of a site is the link, which allows a jump to another page anywhere in the current web site.
  30. Web Pages
  31. Nodes
      • In communication networks, a node is a connection point, either a redistribution point or a communication endpoint.
      • EMBnet operates 34 nodes, of which 20 are national nodes. These have a national mandate to provide databases, software and online services, including sequence analysis, protein modeling, genetic mapping etc.
  32. • 8 nodes are designed for user support and training and to undertake research and development.
      • These are academic, industrial or research centers that have knowledge of specific areas of bioinformatics.
      • They are responsible for the maintenance of biological databases and software.
  33. • The remaining 6 sites have been accepted within EMBnet as associate nodes, which are biocomputing centers from non-European countries.
      • They serve their user communities with the same kinds of service as a typical national node.
      • Most of them offer up-to-date access to sequence databases and analysis software for molecular mapping, genome management, genetic mapping and so on.
  34. EMBnet associate nodes
      Abbreviation   Country        Site
      IBBM           Argentina
      ANGIS          Australia
      CBI            China
      CIGB           Cuba
      CDFD           India          http://salarjung.embnet/
      SANBI          South Africa
  35. SRS (Sequence Retrieval System)
      • It is a network browser for databases in molecular biology, developed to help EMBnet users.
      • It allows any flat-file database to be indexed to any other, and it allows the user to retrieve, link and access entries from all the interconnected resources.
      • The resources it links include nucleic acid, protein sequence, structure, pattern and bibliographic databases.
  36. • SRS is an integrated system for retrieving information from many different sequence databases and for feeding the retrieved sequences into analytic tools such as sequence comparison and alignment programs.
      • It can search a total of 141 databases of protein and nucleotide sequences, metabolic pathways, 3D structures and functions, genomes, diseases and phenotype information.
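The idea of indexing one flat-file database against another can be sketched as follows. This is an illustration of the general technique only, not how SRS itself is implemented. The two tiny databases, their entry names, and the function names are all made up; the layout (a two-letter line code, `DR` for a cross-reference, `//` to end an entry) is a simplified assumption modeled on common sequence flat files.

```python
from io import StringIO

# Two tiny made-up flat files. Each line starts with a two-letter code
# and each entry ends with "//" (a simplified, assumed layout).
PROTEIN_DB = """ID   PROT_A
DR   NUC_1
//
ID   PROT_B
DR   NUC_2
//
"""

NUCLEOTIDE_DB = """ID   NUC_1
DE   hypothetical gene one
//
ID   NUC_2
DE   hypothetical gene two
//
"""


def index_flat_file(handle):
    """Build an index mapping each entry ID to its fields."""
    index, entry = {}, {}
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("//"):
            # End of entry: file it under its ID and start a new one.
            if "ID" in entry:
                index[entry["ID"][0]] = entry
            entry = {}
        elif line.strip():
            code, _, value = line.partition(" ")
            entry.setdefault(code, []).append(value.strip())
    return index


proteins = index_flat_file(StringIO(PROTEIN_DB))
nucleotides = index_flat_file(StringIO(NUCLEOTIDE_DB))


def linked_nucleotides(protein_id):
    """Follow DR cross-references from a protein entry to nucleotide entries."""
    refs = proteins[protein_id].get("DR", [])
    return [nucleotides[ref] for ref in refs if ref in nucleotides]


hits = linked_nucleotides("PROT_A")
print(hits[0]["ID"][0], "-", hits[0]["DE"][0])
```

Once every database has an index keyed on entry identifiers, following a cross-reference is a dictionary lookup, which is what makes linking many resources together practical.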
  37. NCBI (The National Center for Biotechnology Information)
      • Established in 1988 in the USA as a division of the National Library of Medicine, located at Bethesda, Maryland.
      • Its role is to develop new information technologies to aid our understanding of the molecular and genetic processes that underlie health and disease.
      • Its specific aims include the creation of automated systems for storing and analyzing biological information.
  38. • The development of advanced methods of computer-based information processing.
      • The facilitation of user access to databases and software, and the coordination of efforts to gather biotechnology information worldwide.
      • It maintains GenBank, the NIH DNA sequence database. These data are exchanged with the international nucleotide databases EMBL and DDBJ.
  39. Entrez
      • Databases of different kinds have been merged together to become global hubs of knowledge.
      • Just like SRS for EMBnet, the Entrez facility evolved at NCBI to allow retrieval of molecular biology data and bibliographic citations from NCBI's databases.
      • It permits related articles in different databases to be linked to each other.
  40. • It provides access to DNA sequences from GenBank, EMBL and DDBJ, and to protein sequences from SWISS-PROT, PIR, PRF, PDB and translated protein sequences from DNA sequence databases.
      • It is the front end to all databases maintained by NCBI and is extremely easy to use. It is linked to a total of 11 databases.
      • It can be accessed through the NCBI website.
  41. Databases covered by Entrez
      No.  Category                 Database
      1    Nucleic acid sequences   Entrez Nucleotides: sequences obtained from GenBank, RefSeq & PDB
      2    Protein sequences        Entrez Protein: sequences obtained from SWISS-PROT, PIR, PRF, PDB & translations from coding regions in GenBank and RefSeq
      3    3D structures            Entrez Molecular Modeling Database (MMDB)
      4    Genomes                  Complete genome assemblies from many sources
      5    PopSet                   From GenBank, sets of DNA sequences collected to analyze the evolutionary relatedness of a population
      6    OMIM                     Online Mendelian Inheritance in Man
      7    Taxonomy                 NCBI taxonomy database
      8    Books                    Bookshelf
      9    ProbeSet                 Gene Expression Omnibus (GEO)
      10   3D domains               Domains from the Entrez Molecular Modeling Database
      11   Literature               PubMed
  42. Retrieval & Application
      • The two main reasons for putting data on a computer are retrieval and discovery.
      • Retrieval is the ability to get back out what was put in. Discovery is more valuable: it means getting back from the system more knowledge than was put in.
      • This helps in making biological discoveries.
      • NCBI uses 4 core data elements: bibliographic citations, DNA sequences, protein sequences and 3D structures.
  43. Bioseq
      • The Bioseq, or biological sequence, is a central element in the NCBI data model. It contains a single, continuous molecule of nucleic acid or protein.
  44. Mirrors & Intranet
      • Different servers providing the same services are called mirrors. To access a particular website, it is necessary to type its URL in the address bar of the browser.
  45. Intranet
      • Many academic institutions have an intranet, which is a local network that can be accessed only from computers within the institution.
  46. • What makes the web so powerful is its network.
      • There are several basic sites for beginners in bioinformatics.
  47. • Apart from these sites, a great number of specialist sites with biological data can be accessed.
      • General-purpose search engines can also be used.
  48. Thank you for your attention. Questions are welcome.