What is a Computer?
• In general, a computer is a machine which
accepts data, processes it and returns new
information as output.
• Software is a set of programs (step-by-step
instructions) telling the computer how to
process data. Software stored permanently in a
device's read-only memory is called “firmware”.
• Software needs to be installed on a computer,
usually from a CD or USB.
• e.g. digital audio editors, Windows 98, Windows 2000,
Windows XP, Windows 7, MS Office.
Advantages of Using Computers
• Speed: Computers can carry out instructions in less
than a millionth of a second.
• Accuracy: Computers carry out calculations with a
very high degree of accuracy.
• Diligence: Computers can perform the same task
repeatedly without fatigue or loss of accuracy.
• Storage Capacity: Computers can store large
volumes of data and information on magnetic media.
• Commonly used high-level programming languages
are Ada, V-BASIC, C , C++ , COBOL , Java , Lisp ,
• Commonly used scripting languages are
What is Perl?
• Larry Wall released Perl in 1987.
• Perl is an interpreted language optimized for
scanning arbitrary text files, extracting information
from these files, and printing reports based on that
• It is also a good language for many system
• In addition, Perl 5 is used for graphics
programming, system administration, network
programming, finance, bioinformatics, and other
Advantages of Perl
• These benefits include its generous licensing (it's
• Cost and Licensing
First, Perl is generally available on most server
platforms, including the following:
• Most UNIX variants, MS-DOS, Windows NT,
Windows 95, OS/2
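The text-processing use case described for Perl above (scan text, extract information, print a report) can be sketched as follows. This is a minimal illustration written in Python rather than Perl, and the log format and names (`LOG`, `count_events`, `report`) are assumptions made up for the example.

```python
import re
from collections import Counter

# Hypothetical input: the "date event user" layout is an assumption.
LOG = """\
2024-01-05 login alice
2024-01-05 login bob
2024-01-06 logout alice
2024-01-06 login alice
"""

# One line = date, event name, user name, separated by whitespace.
LINE = re.compile(r"^(\S+)\s+(\w+)\s+(\w+)$")


def count_events(text, event):
    """Count how often each user appears with the given event."""
    counts = Counter()
    for line in text.splitlines():
        match = LINE.match(line)
        if match and match.group(2) == event:
            counts[match.group(3)] += 1
    return counts


def report(counts):
    """Format the counts as one 'user count' line per user, sorted by name."""
    return "\n".join(f"{user} {n}" for user, n in sorted(counts.items()))


print(report(count_events(LOG, "login")))
```

The same three steps (match each line, accumulate, print) are what a typical Perl report script would do with its built-in regular expressions.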
What is the Internet?
• The Internet is a global system of interconnected
computer networks that use the standard
Internet protocol suite (TCP/IP) to serve several
billion users worldwide.
• The Internet provides many services:
– World Wide Web (www)
– Remote Login (Telnet)
– File Transfer (FTP)
• A computer network is an interconnection of
computers to share resources.
• Resources can be : Information, Load,
Types Of Computer Networks
On the basis of Size:
• Local Area Network (LAN)
It is a network of computers in a local area, i.e. in
one room, one building or a home.
• Wide Area Network (WAN)
It is a network of computers
spread over a wide geographical area.
Benefits of Computer Networks
Information Sharing , Device Sharing
Load Sharing , Mobility
Anywhere Anytime Banking
How to get connected?
• We can get connected through a modem which
uses twisted-pair copper cables carrying signals to
• Wi-Fi is a popular technology that allows an
electronic device to exchange data or connect to
the Internet wirelessly using radio waves.
• Browsers are clients that communicate with servers using a
set of standard protocols and conventions.
• A browser contains the software we need in order to find,
retrieve, view and send information over the Internet.
• Lynx: developed at the University of Kansas, USA, to
construct a campus-wide information system.
It provides a text-only interface at low cost.
• Mosaic: developed in 1993 at NCSA, University of Illinois, USA.
Designed for MS Windows, it provides a single user-friendly
interface to the diverse protocols, data formats and information
servers available throughout the Internet.
• Netscape Navigator: developed in 1994 by Netscape Communications
Corporation (NCC), California, USA. It became the most popular package
of its day for browsing information on the Internet, e.g. e-mail, audio, video, etc.
• Internet Explorer: developed in 1995 by Microsoft Corp., Redmond, USA.
Designed to work with PC-based operating systems, it offers
hypermedia browsing, including support for Java and ActiveX.
• Users can navigate by clicking on specific buttons or
pictures, which are known as hyperlinks.
• Hyperlinks are usually characterized by being highlighted in some
way, either by using a different color from the main
body of the text or by being boxed.
• Each link has a unique address known as a URL
(Uniform Resource Locator).
• HTTP (HyperText Transfer
Protocol) is used to exchange information between browsers and servers.
• HTML (HyperText Markup Language)
Hypertext documents are written in a standard markup
language known as HTML.
HTML code is strictly text-based; any associated
graphics or sound for a document exist as
separate files in a common format.
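To make the relationship between HTML, hyperlinks and URLs concrete, here is a minimal sketch using only the Python standard library. The page text and the two example URLs are assumptions chosen for illustration; `LinkCollector` and `split_url` are hypothetical helper names.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical page: a minimal, text-only HTML document with two hyperlinks.
PAGE = """<html><body>
<p>See <a href="http://www.ncbi.nlm.nih.gov/genbank/">GenBank</a> and
<a href="http://www.expasy.org/">ExPASy</a>.</p>
</body></html>"""


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def split_url(url):
    """Return the (protocol, host, path) parts of a URL."""
    parts = urlparse(url)
    return parts.scheme, parts.netloc, parts.path


collector = LinkCollector()
collector.feed(PAGE)
for link in collector.links:
    print(split_url(link))
```

Each hyperlink in the markup carries a URL, and each URL names the protocol (here HTTP), the server and the document on that server.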
• EMBnet (European Molecular Biology Network)
is an international network that aims to enhance
bioinformatics services by bringing together
bioinformatics service providers.
• Computers store sequence information as simple rows of
sequence characters called strings. Each character is
stored in binary code as a byte. The smallest unit of
memory is the bit; 1 byte = 8 bits.
• A DNA sequence is usually stored and read in a computer as a
series of 8-bit words in binary format. Each bit has the value 0 or 1,
giving 256 possible combinations per word.
• A protein sequence appears as a series of 8-bit words
comprising the binary form of the amino acid letters.
• Normally DNA and protein sequences are presented as ASCII
(American Standard Code for Information Interchange) text,
commonly in FASTA (FAST-All) format.
>MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken
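The storage scheme described above can be sketched in a few lines of Python. This is a minimal illustration: `to_bits` and `parse_fasta` are hypothetical helper names, and the residues in `EXAMPLE` are placeholders, not the real calmodulin sequence.

```python
def to_bits(sequence):
    """Encode each character as its 8-bit ASCII pattern."""
    return [format(ord(ch), "08b") for ch in sequence]


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Toy record: the header comes from the text above; the residue lines
# are placeholders (assumption), not the real protein sequence.
EXAMPLE = """>MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken
ACDEFGHIKL
MNPQRSTVWY
"""

print(2 ** 8)            # number of distinct 8-bit patterns
print(to_bits("ACGT"))   # one 8-bit word per nucleotide letter
print(parse_fasta(EXAMPLE))
```

A FASTA record is just ASCII text: a header line starting with ">" followed by one or more lines of sequence letters, each letter occupying one byte.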
• Telnet allows a user to remotely log onto a computer and
access its facilities. It is useful only for occasional
• Its disadvantages are the extensive management of
user identification it requires and the overloading of the remote
computer's processing power.
• To facilitate communication between nodes, each
computer on the Internet is given a unique identifying number
(its IP address).
• It is encoded in dotted decimal format, e.g.
220.127.116.11, and represents a particular machine.
• The Domain Name System (DNS) is also implemented, which
makes Internet addresses easier for users.
e.g. ncbi.nlm.nih.gov means
ncbi = National Center for Biotechnology Information
nlm = National Library of Medicine
nih = National Institutes of Health
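As a small illustration of the two addressing schemes, the sketch below converts a dotted decimal address into the single 32-bit number it encodes and splits a domain name into its labels. It uses Python's standard `ipaddress` module; `ip_to_int` and `domain_labels` are hypothetical helper names.

```python
import ipaddress


def ip_to_int(dotted):
    """Convert a dotted-decimal IPv4 address to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(dotted))


def domain_labels(name):
    """Split a domain name into labels, most general (top-level) first."""
    return list(reversed(name.split(".")))


print(ip_to_int("220.127.116.11"))
print(domain_labels("ncbi.nlm.nih.gov"))
```

Each of the four decimal fields is one byte of the address, and a domain name is read from right to left, from the most general label to the most specific one.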
World Wide Web (www)
• The World Wide Web consists of all the public Web
sites connected to the Internet worldwide, including
the client devices (such as computers and cell
phones) that access Web content.
• It was developed at CERN (the European Organization for Nuclear Research)
in 1989 to allow international information sharing. It led to
a medium through which text, images, sounds and
videos could be delivered on demand to users.
• WWW greatly enhanced the power of cross
references with the guarantee to retrieve the latest
• The first molecular biology web server was ExPASy
(Expert Protein Analysis System), developed in 1993 by
Geneva University Hospital and the University of Geneva.
• The documents which appear in the web browser
window when we surf the WWW are called web pages.
• Each document displayed on the web is called a “web page”,
and all of the related pages on a particular server are
collectively called a web site.
• A web site is a collection of related web pages
stored on one computer, and each web site has a
unique address. The main feature of a site is the link,
which allows a jump to another page anywhere in the
• In communication networks, a node is a connection
point, either a redistribution point or a communication
• EMBnet operates 34 nodes, of which 20 are national
nodes. National nodes have the mandate to provide databases,
software and online services, including sequence
analysis, protein modeling, genetic mapping, etc.
• 8 nodes are designed for user support and training and to
undertake research and development.
• These are academic, industrial or research
centers that have knowledge of specific areas of bioinformatics.
• They are responsible for the maintenance of
biological databases and software.
• The remaining 6 sites have been accepted within
EMBnet as associate nodes, which are biocomputing
centers from non-European countries
that serve their user communities with the same
kinds of services as a typical national node.
• Most of them offer up-to-date access to sequence
databases and analysis software
for molecular mapping, genome management,
genetic mapping and so on.
EMBnet associate nodes
SRS (Sequence Retrieval System)
• It is a network browser for databases in molecular
biology, which evolved to help EMBnet users.
• It allows any flat-file database to be indexed to any
other, and it allows users to retrieve, link and access
entries from all the interconnected resources.
• The resource links nucleic acid, protein sequence,
structure, pattern and bibliographic databases.
• SRS is an integrated system for information retrieval from many
different sequence databases and for feeding the sequences
retrieved into analytic tools such as sequence
comparison and alignment programs.
• It can search a total of 141 databases of protein &
nucleotide sequences , metabolic pathways , 3D
structures & functions , genomes , diseases and
NCBI (The National Center for Biotechnology Information)
• Established in 1988 in the USA as a division of the National
Library of Medicine, located in Bethesda, Maryland.
• Its role is to develop new information technologies to
aid our understanding of the molecular and genetic
processes that underlie health and disease.
• Its specific aims include the creation of automated
systems for storing and analyzing biological information,
• the development of advanced methods of
computer-based information processing,
• the facilitation of user access to databases and
software, and the coordination of efforts to gather
biotechnology information worldwide.
• It maintains GenBank, the NIH DNA sequence database.
This data is exchanged with the international nucleotide
databases EMBL and DDBJ.
• Databases of different kinds are merged together and become
global hubs of knowledge.
• Just like SRS for EMBnet, the Entrez facility evolved at
NCBI to allow retrieval of molecular biology data and
bibliographic citations from NCBI's databases.
• It permits related articles in different databases to be
linked to each other.
• It provides access to DNA sequences (from GenBank, EMBL
and DDBJ) and to protein sequences (from SWISS-PROT, PIR,
PRF, PDB and translated protein sequences from DNA sequences).
• It is the front end to all databases maintained by NCBI
and is extremely easy to use. It is linked to a total of
• It can be accessed through the NCBI website by
Databases covered by Entrez are listed below
1. Nucleic acid sequences
Entrez Nucleotides: sequences obtained from GenBank, RefSeq & PDB
2. Protein sequences
Entrez Protein: sequences obtained from SWISS-PROT, PIR,
PRF, PDB & translations from coding regions in GenBank
3. 3D structures
Entrez Molecular Modeling Database (MMDB)
– Complete genome assemblies from many sources
– From GenBank, sets of DNA sequences that have been collected to
analyze the evolutionary relatedness of a population
– Online Mendelian Inheritance in Man
– NCBI taxonomy database
– Gene Expression Omnibus (GEO)
10. 3D domains
Domains from the Entrez Molecular Modeling Database
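Besides the website, Entrez databases can be queried from a program through NCBI's E-utilities web service. The sketch below only builds a search address and does not contact the network. The endpoint and the `db`, `term` and `retmax` parameters belong to E-utilities, which is not mentioned in the text above, so treat their use here as an assumption to check against the NCBI documentation. `build_esearch_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Base address of the E-utilities search endpoint (assumption: not taken
# from the text above; verify against the current NCBI documentation).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db, term, retmax=5):
    """Build a search URL for one Entrez database and one query term."""
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{ESEARCH}?{query}"


print(build_esearch_url("protein", "calmodulin"))
```

Changing `db` selects a different Entrez database from the list above, for example a nucleotide or structure database.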
Retrieval & Application
• The two main reasons for putting data on a
computer are retrieval and discovery.
• Retrieval is the ability to get back out what was put in.
Discovery is more valuable: getting back from the system
more knowledge than was put in.
• This will help in biological discoveries.
• NCBI uses 4 core data elements: bibliographic
citations, DNA sequences, protein sequences and 3D structures.
• A Bioseq, or biological sequence, is a central element in the
NCBI data model. It contains a single, continuous
molecule of nucleic acid or protein.
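The idea of a Bioseq as a single, continuous molecule can be sketched as a small data structure. This is a simplified illustration, not the actual NCBI data model: the field names, the `ALPHABETS` table and the validation rule are all assumptions.

```python
from dataclasses import dataclass

# Allowed letters per molecule type (simplified assumption).
ALPHABETS = {
    "dna": set("ACGTN"),
    "protein": set("ACDEFGHIKLMNPQRSTVWY"),
}


@dataclass(frozen=True)
class Bioseq:
    """A single, continuous nucleic acid or protein molecule (simplified)."""

    identifier: str
    molecule: str   # "dna" or "protein"
    residues: str   # one letter per base or amino acid, no gaps

    def __post_init__(self):
        if self.molecule not in ALPHABETS:
            raise ValueError(f"unknown molecule type: {self.molecule}")
        bad = set(self.residues.upper()) - ALPHABETS[self.molecule]
        if bad:
            raise ValueError(f"unexpected characters: {sorted(bad)}")

    def __len__(self):
        return len(self.residues)


# Hypothetical record used only to exercise the class.
example = Bioseq("example-1", "dna", "ACGTTGCA")
print(len(example))
```

Rejecting gap characters and unknown letters at construction time is one simple way to enforce the "single, continuous molecule" requirement.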
Mirrors & Intranet
• Different servers providing the same services are
called mirrors. To access a particular website it is
necessary to type the URL in the address bar of the
• Many academic institutions have an intranet, which
means a local network that can be accessed only
from computers within the institution.
• What makes the web most powerful is its network
• Here are some basic sites for beginners in bioinformatics
• Apart from these sites , there are a great number of
specialist sites with biological data which can be
• General purpose search engines such as