PROF. BALASUBRAMANIAN SATHYAMURTHY 2016 EDITION BTS - 205: BIOINFORMATICS
Contact for your free pdf & job opportunities theimprintbiochemistry@gmail.com or 9980494461 Page 1 of 183
FOR MSC BIOTECHNOLOGY STUDENTS
2014 ONWARDS
Biochemistry scanner
THE IMPRINT
BTS - 205: BIOINFORMATICS
As per Bangalore University (CBCS) Syllabus
2016 Edition
BY: Prof. Balasubramanian Sathyamurthy
Supported By:
Ayesha Siddiqui
Kiran K.S.
THE MATERIALS FROM “THE IMPRINT (BIOCHEMISTRY SCANNER)” ARE NOT
FOR COMMERCIAL OR BRAND BUILDING. HENCE ONLY ACADEMIC CONTENT
WILL BE PRESENT INSIDE. WE THANK ALL THE CONTRIBUTORS FOR
ENCOURAGING THIS.
BE GOOD – DO GOOD & HELP OTHERS
DEDICATION
I dedicate this material to my spiritual guru Shri Raghavendra swamigal,
parents, teachers, well-wishers and students who always increase my morale
and confidence to share my knowledge to reach all beneficiaries.
PREFACE
Biochemistry scanner ‘THE IMPRINT’ consists of solved question papers from the last
ten years of Bangalore University examinations, prepared keeping in mind the syllabus
and examination pattern of the University. The content taken from the reference books
has been presented in simple language for better understanding.
The author, Prof. Balasubramanian Sathyamurthy, has 15 years of teaching
experience and has taught in 5 Indian universities, including Bangalore
University. More than 20 students have secured university ranks under his
guidance.
THE IMPRINT is a genuine effort by the students to help their peers with their
examinations with the strategy that has been successfully utilized by them.
These final year M.Sc students have proven their mettle in university
examinations and are College / University rank holders.
This is truly for the students, by the students. We thank all the contributors for
their valuable suggestions in bringing out this book. We hope this will be
appreciated by students and teachers alike. Suggestions are welcome.
For any comments, queries and suggestions, and to get your free copy, write to us
at theimprintbiochemistry@gmail.com or call 9980494461
CONTRIBUTORS:
CHETAN ABBUR ANJALI TIWARI
AASHITA SINHA ASHWINI BELLATTI
BHARATH K CHAITHRA
GADIPARTHI VAMSEEKRISHNA KALYAN BANERJEE
KAMALA KISHORE
KIRAN KIRAN H.R
KRUTHI PRABAKAR KRUPA S
LATHA M MAMATA
MADHU PRAKASHHA G D MANJUNATH .B.P
NAYAB RASOOL S NAVYA KUCHARLAPATI
NEHA SHARIFF DIVYA DUBEY
NOOR AYESHA M PAYAL BANERJEE
POONAM PANCHAL PRAVEEN
PRAKASH K J M PRADEEP.R
PURSHOTHAM PUPPALA DEEPTHI
RAGHUNATH REDDY V RAMYA S
RAVI RESHMA
RUBY SHA SALMA H.
SHWETHA B S SHILPI CHOUBEY
SOUMOUNDA DAS SURENDRA N
THUMMALA MANOJ UDAYASHRE. B
DEEPIKA SHARMA
EDITION : 2016
PRINT : Bangalore
CONTACT : theimprintbiochemistry@gmail.com or 9980494461
M. SC. BIOTECHNOLOGY – SECOND SEMESTER
BTH - 205: BIOINFORMATICS
26 hrs
UNIT – 1: INTRODUCTION TO COMPUTER
Computer software – operating systems – Windows, UNIX and Linux, Application
software – Word processor, spreadsheet. Introduction to statistical software (SPSS).
(2 hrs)
UNIT – 2: COMPUTER NETWORK AND PROGRAMMING LANGUAGES
Structure, architecture, advantages, types (LAN, MAN & WAN), Network protocols –
Internet protocol (TCP/IP), File transfer protocol (FTP), WWW, HTTP, HTML, URL.
Network Security – Group policies, Firewalls. C Programming and PERL – Algorithm
and flow chart, Structure of C program, Header file, Global declaration. Main function
variable declarations, Control statement – conditional and unconditional – sub
functions. Introduction and application of PERL & Bioperl (6 hrs)
UNIT – 3: DATABASES:
Introduction - Relational Database Management Systems (RDBMS) - Oracle, SQL, Database
generation. (6 hrs)
UNIT – 4: BIOLOGICAL DATABASES:
Data mining and applications, accessing bibliographic databases – PubMed, Nucleic
acid sequence databank – NCBI and EMBL. Protein sequence databank – NBRF – PIR,
SWISS-PROT. Structural databases – Protein Data Bank (PDB). KEGG: Kyoto
Encyclopedia of Genes and Genomes (metabolic pathway data bank), Microbial Genome
Database (MBGD), Cell line database (ATCC), Virus data bank (UICTV db). Sequence
alignment – Global and Local alignment, scoring matrices. Restriction mapping – Webcutter
& NEBcutter, Similarity searching (FASTA and BLAST), Pairwise
comparison of sequences, Multiple Sequence alignment of sequences, Identification of
genes in genomes and Phylogenetic analysis with reference to nucleic acids and protein
sequences, Identification of ORFs. Identification of motifs. (10 hrs)
UNIT - 5 PROTEIN STRUCTURE AND MOLECULAR INTERACTION:
Introduction to protein structure – secondary structure prediction, tertiary structure
prediction, protein modeling – principles of homology and comparative modeling.
Threading, structure evaluation and validation, and ab initio modelling. Applications –
Rational Drug design and Molecular docking – AutoDock. (5 hrs)
References:
1. Dhananjaya (2002). Introduction to Bioinformatics. www.sd-bio.com series.
2. Jan (2001). Nucleic Acids Research, Genome Database issue.
3. Higgins & Taylor (2000). Bioinformatics. OUP.
4. Baxevanis (1998). Bioinformatics.
5. Fry, J.C. (1993). Biological Data Analysis: A Practical Approach. IRL Press, Oxford.
6. Wardlaw, A.C. (1985). Practical Statistics for Experimental Biologists, Joh
UNIT – 1: INTRODUCTION TO COMPUTER
Computer software – operating systems – Windows, UNIX and Linux, Application
software – Word processor, spreadsheet. Introduction to statistical software
(SPSS).
INTRODUCTION
The computer comprises technologically advanced hardware put together to work at
great speed. To accomplish its various tasks, the computer is made of different parts,
each serving a particular purpose in conjunction with the other parts. In other words, a
'computer' is an ensemble of different machines that you will be using to accomplish
your job. A computer is primarily made of the system unit containing the Central
Processing Unit (the system unit is usually referred to as the computer), the monitor,
the keyboard and the mouse. Other pieces of hardware, commonly referred to as
peripherals, can enhance or improve your experience with the computer.
Evolution of computer technology
The origins of computing go back many centuries: people have long wanted a machine
that would carry out mathematical calculations for them. The abacus is considered to
have been the first computing device in the world. It was used to perform simple
measurements and calculations, and it is still used by schoolchildren today. In the
17th century, the scientist Blaise Pascal developed a machine that could perform
mathematical calculations. This machine consisted of a number of gears, and the
movement of the gear mechanism was used to perform calculations. He named the
machine the Pascaline. However, the concept of the modern computer was put forward in
the 19th century by the mathematician Charles Babbage, who first wrote on the use of
logic and loops in process execution. Based on these concepts, Babbage designed two
machines for performing computations: the Difference Engine and the Analytical Engine.
Electronics had not yet been developed, and Babbage's mechanical designs were never
completed in his lifetime; his ideas were implemented only after the invention of
electronics. George Boole developed Boolean algebra, an algebra of logic in which every
variable takes one of only two values, true or false; these two values map directly
onto the binary digits 1 and 0. Augustus De Morgan formulated two laws of logic, now
known as De Morgan's Theorems, which are widely applied in the design of logic gates.
Ada Lovelace, who wrote an algorithm for Babbage's Analytical Engine, is often regarded
as the first computer programmer.
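Boole's two-valued algebra and De Morgan's theorems can be checked mechanically by trying every combination of true/false inputs. The short Python sketch below is an illustration added for this guide (the function name `de_morgan_holds` is hypothetical, not from any library); it verifies both theorems, NOT (A AND B) = (NOT A) OR (NOT B) and NOT (A OR B) = (NOT A) AND (NOT B), over the full truth table and prints the first one using 0 and 1.

```python
from itertools import product


def de_morgan_holds() -> bool:
    """Check both De Morgan theorems for every combination of two Boolean inputs."""
    for a, b in product([False, True], repeat=2):
        # Theorem 1: NOT (A AND B) equals (NOT A) OR (NOT B)
        if (not (a and b)) != ((not a) or (not b)):
            return False
        # Theorem 2: NOT (A OR B) equals (NOT A) AND (NOT B)
        if (not (a or b)) != ((not a) and (not b)):
            return False
    return True


if __name__ == "__main__":
    # Print the truth table for theorem 1 with 0/1, as a logic-gate designer would write it
    print("A B  NOT(A AND B)  (NOT A) OR (NOT B)")
    for a, b in product([0, 1], repeat=2):
        lhs = int(not (a and b))
        rhs = int((not a) or (not b))
        print(f"{a} {b}  {lhs:^12}  {rhs:^18}")
    print("De Morgan's theorems hold:", de_morgan_holds())
```

Because there are only four input combinations, checking all of them is a complete proof for two variables, which is exactly how truth tables are used in logic-gate design.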
Practical applications of computers began in the 1950s, when computers were used in
the United States for various applications such as the census, defence, research and
development (R&D) and universities.
Advantages of computers
Compared to traditional systems, computers offer many noteworthy advantages.
This is one reason that traditional systems are being replaced rapidly by computer-
based systems. The main advantages offered by computers are as follows:
High Accuracy
Superior Speed of Operation
Large Storage Capacity
User-friendly Features
Portability
Platform independence
Economical in the long term
Types of computers
Computers are classified in a variety of ways depending upon the principles of working,
construction, size and applications. Various types of computers are discussed in this
section.
Digital and analog computers
Analog computers
Computers that process analog signals are known as Analog Computers. An analog
signal is a continuous signal; a sine wave, for example, is an analog signal. Analog
quantities vary continuously rather than taking discrete values. An example of an
analog computer is the slide rule.
Operational amplifiers (op amps) are widely used in the construction of analog computers
when an analog electrical signal is to be processed. For example, a differentiator is an
op amp circuit that differentiates its input signal: if the input signal V sin θ is given
to the analog computer, the output is proportional to V cos θ. By combining such
circuits, an analog computer that solves a second-order differential equation can be
built, as given in Fig.
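What the differentiator does can be imitated numerically. The Python sketch below is an illustration added for this guide, assuming an ideal differentiator: it ignores the gain and sign of a real circuit (the classic op-amp differentiator outputs -RC times the rate of change of its input). It approximates the derivative of V sin θ with a central difference and compares it with V cos θ; the amplitude V = 2.0 and the helper names `differentiate` and `input_signal` are arbitrary, hypothetical choices.

```python
import math


def differentiate(f, x: float, h: float = 1e-5) -> float:
    """Approximate f'(x) with a central difference, mimicking an ideal differentiator."""
    return (f(x + h) - f(x - h)) / (2 * h)


def input_signal(theta: float, v: float = 2.0) -> float:
    # Analog input signal V sin(theta); the amplitude V = 2.0 is an arbitrary choice
    return v * math.sin(theta)


if __name__ == "__main__":
    for theta in (0.0, math.pi / 4, math.pi / 2, math.pi):
        approx = differentiate(input_signal, theta)
        exact = 2.0 * math.cos(theta)  # expected differentiator output V cos(theta)
        print(f"theta={theta:.3f}  differentiator={approx:+.6f}  V cos(theta)={exact:+.6f}")
```

An analog computer produces this result continuously and instantly from the circuit's physics, whereas a digital computer, as here, has to approximate it step by step.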
Digital computers
Computers that process digital signals are known as Digital Computers. A digital
signal is a discrete signal with two states, 0 and 1. In practice, digital computers are
far more widely used than analog computers.
Examples of digital computers are personal computers, supercomputers, mainframe
computers etc.
Supercomputers
These are the most powerful computers in terms of speed of execution and storage
capacity. NASA uses supercomputers to track and control space explorations.
Mainframe Computers:
These are next to supercomputers in terms of capacity. Mainframe computers are
multi-terminal computers that can be shared simultaneously by multiple users. Unlike
personal computers, mainframe computers offer time-sharing. For example, insurance
companies use mainframe computers to process information about millions of their
policyholders.
Minicomputers
These computers are also known as midrange computers. These are desk-sized
machines and are used in medium scale applications. For example, production
departments use minicomputers to monitor various manufacturing processes and
assembly-line operations.
Microcomputers
As compared to supercomputers, mainframes and minicomputers, microcomputers are
the least powerful, but these are very widely used and rapidly gaining in popularity.
Personal Computer
PC is the term for a computer that is designed for use by a single person. PCs
are also called microcomputers because they are smaller than mainframes and
minicomputers. The term ‘PC’ is frequently used to refer to desktop computers.
Although PCs are used by individuals, they can also be used in computer
networks.
Desktop Computer
This is the most commonly used personal computer. It comprises a keyboard, mouse,
monitor and system unit. The system unit is also known as cabinet or chassis. It is the
container that houses most of the components such as motherboard, disk drives, ports,
switch mode power supply and add-on cards etc. The desktop computers are available
in two models- horizontal model and tower model.
Laptops
These are also called notebook computers. They are portable computers, typically about
8.5 x 11 inches in size and weighing about one to three kilograms.
Palmtops
Palmtops are also called handheld computers. These are computing devices, which are
small enough to fit into your palm. The size of a palmtop is like an appointment book.
The palmtops are generally kept for personal use such as taking notes, developing a list
of friends, keeping track of dates, agendas etc. The Palmtop can also be connected to a
PC for downloading data. It also provides value-added features such as voice input,
Internet, cell phone, camera, movie player and GPS.
A Personal Digital Assistant (PDA) is a palm-sized computer. It combines pen input,
handwriting recognition, personal organisational tools and communication capabilities
in a small package.
COMPUTER ARCHITECTURE – INTERNAL AND EXTERNAL DEVICES
The basic elements of a computer system are the mouse, keyboard, monitor, memory, CPU
(processor), motherboard, hard disk, speakers, modem and power supply.
Mouse:
The mouse is used to point to and select items on the screen while operating the system.
Nowadays, the optical mouse has largely replaced the older mechanical (ball) mouse.
Keyboard:
The keyboard is used to enter data and commands into the system, which processes them
and gives output to the user. The keyboard is therefore an integral part of the input
system; a computer is essentially incomplete without one.
Monitor:
The monitor, another essential part of the computer system, displays the actions that
the computer performs on our command.
Motherboard:
The motherboard, another necessary element of the computer system, holds components
such as the memory, processor and modem, and provides slots for the graphics card and
LAN card.
Hard Disk:
The hard disk is used to store data permanently on the computer.
Modem:
A modem is used to connect to the Internet. Two types of modems are widely used:
software modems and hardware modems.
Speakers:
Speakers are also included among the basic elements of a computer. They are not
indispensable, because a computer can perform its functions without speakers. However,
we use them for multiple purposes, such as playing audio.
Basic Computer Functioning
A computer can be defined as an electronic device that accepts data from an input
device, processes it, stores it on a disk and finally displays it on an output device such
as a monitor.
To understand the basic functioning of a computer, refer to the block diagram of a
computer as shown in Fig.
This flow of information holds true for all types of computers, such as personal
computers, laptops and palmtops. In other words, the fundamental principle of
working is the same. As shown in Fig., there are four main building blocks in a
computer's functioning: input, processor, output and memory.
Data is entered through input devices such as the keyboard, disks or mouse. These
input devices help convert data and programs into a form that the computer can
process.
The data received from the input devices is processed by the CPU (Central
Processing Unit), which controls and manipulates the data to produce information.
The CPU chip is usually housed in a protective package mounted on the motherboard.
The processed data is either stored in memory or sent to the output device, as per the
command given by the user. The memory unit holds data and program instructions for
processing data.
Output devices translate the processed information from the computer into a form that
we can understand.
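The four building blocks can be mimicked with a toy program in which each block is a function. The Python sketch below is a simplified model added for this guide, not a description of how a real CPU works internally: the function names are hypothetical, a dictionary stands in for the memory unit, and adding up numbers stands in for any processing task.

```python
from typing import Dict, List, Tuple


def input_unit(raw_text: str) -> List[int]:
    """Input: convert the characters typed by the user into numbers the processor can use."""
    return [int(token) for token in raw_text.split()]


def processing_unit(data: List[int], memory: Dict[str, object]) -> int:
    """Processor: keep the data in memory, compute a result and store the result as well."""
    memory["data"] = data
    result = sum(data)
    memory["result"] = result
    return result


def output_unit(result: int) -> str:
    """Output: translate the processed result into a form the user can read."""
    return f"Sum = {result}"


def run_computer(raw_text: str) -> Tuple[str, Dict[str, object]]:
    memory: Dict[str, object] = {}  # Memory: holds data and results during processing
    data = input_unit(raw_text)
    result = processing_unit(data, memory)
    return output_unit(result), memory


if __name__ == "__main__":
    display, memory = run_computer("12 30 8")
    print(display)  # what the output device would show
    print(memory)   # what the memory unit holds
```

The data always flows in the same order (input, processing with memory, output), which is the point of the block diagram: the principle is the same whether the machine is a PC, a laptop or a palmtop.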
Components of Computer System
Motherboard
The motherboard is the main component inside the case. It is a large rectangular board
with integrated circuitry that connects the various parts of the computer, such as the
CPU, RAM and disk drives (CD, DVD, hard disk or others), as well as any other
peripherals connected via the ports or the expansion slots.
Components directly attached to the motherboard include:
The central processing unit (CPU) performs most of the calculations that enable a
computer to function and is sometimes referred to as the "brain" of the computer. It is
usually cooled by a heat sink and fan.
The chip set aids communication between the CPU and the other components of the
system, including main memory.
RAM (Random Access Memory) stores all running processes (applications) and the
currently running OS.
The BIOS (Basic Input Output System) includes boot firmware and power management
firmware. In modern systems, many of the BIOS's input/output tasks are handled by
operating system drivers.
Internal Buses connect the CPU to various internal components and to expansion cards
for graphics and sound.
Current Technology
The north bridge memory controller, for RAM and PCI Express
PCI Express, for expansion cards such as graphics and physics processors, and high-
end network interfaces
PCI, for other expansion cards
SATA, for disk drives
Obsolete Technology
ATA (superseded by SATA)
AGP (superseded by PCI Express)
VLB, the VESA Local Bus (superseded by AGP)
ISA (expansion card slot format obsolete in PCs but still used in industrial computers)
External Bus Controllers support ports for external peripherals. These ports may be
controlled directly by the south bridge I/O controller or based on expansion cards
attached to the motherboard through the PCI bus.
USB
FireWire
SATA
SCSI
POWER SUPPLY
A power supply unit (PSU) converts alternating current (AC) electric power to low-voltage
DC power for the internal components of the computer. Some power supplies have
a switch to change between 230 V and 115 V. Other models have automatic sensors
that switch input voltage automatically or are able to accept any voltage within these
limits.
Power supply units used in computers are generally switch mode power supplies
(SMPS).
The SMPS provides regulated direct current power at several voltages as required by the
motherboard and accessories such as disk drives and cooling fans.
Removable media devices
CD (compact disc): The most common type of removable media, suitable for music and
data
CD-ROM Drive:
A device used for reading data from a CD
CD Writer:
A device used for both reading and writing data to and from a CD
DVD (digital versatile disc):
A popular type of removable media that is the same size as a CD but stores up to 12
times as much information- the most common way of transferring digital video and is
popular for data storage
DVD-ROM Drive:
A device used for reading data from a DVD
DVD Writer:
A device used for both reading and writing data to and from a DVD
DVD-RAM Drive:
A device used for rapid writing and reading of data from a special type of DVD
Blu-ray Disc:
A high density optical disc format for data and high-definition video that can store 70
times as much information as a CD
BD-ROM Drive:
A device used for reading data from a Blu-ray disc.
BD Writer:
A device used for both reading and writing data to and from a Blu-ray disc
HD DVD:
A discontinued competitor to the Blu-ray format
Floppy disk:
An outdated storage device consisting of a thin disk of a flexible magnetic storage
medium used today mainly for loading RAID drivers
Iomega Zip drive:
An outdated medium-capacity removable disk storage system, first introduced by
Iomega in 1994
USB flash drive:
A flash memory data storage device integrated with a USB interface, typically small,
lightweight, removable and rewritable with varying capacities from hundreds of
megabytes (in the same ballpark as CDs) to tens of gigabytes (surpassing, at great
expense, Blu-ray discs)
Tape drive:
A device that reads and writes data on a magnetic tape, used for long term storage and
backups
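The "12 times" and "70 times" comparisons in the list above can be checked with simple arithmetic. The Python sketch below assumes nominal capacities of 700 MB for a CD, 8.5 GB for a dual-layer DVD and 50 GB for a dual-layer Blu-ray disc, with 1 GB taken as 1000 MB for simplicity; these figures are assumptions added for this example. A single-layer DVD (4.7 GB) holds only about 7 times as much as a CD, which is why the DVD figure is an "up to" value.

```python
# Nominal single-disc capacities in megabytes (assumption: 1 GB = 1000 MB;
# CD = 700 MB, dual-layer DVD = 8.5 GB, dual-layer Blu-ray = 50 GB)
CAPACITY_MB = {
    "CD": 700,
    "DVD (dual-layer)": 8_500,
    "Blu-ray (dual-layer)": 50_000,
}


def ratio_to_cd(capacities: dict) -> dict:
    """Return how many times more data each medium holds compared with a CD."""
    cd = capacities["CD"]
    return {name: size / cd for name, size in capacities.items()}


if __name__ == "__main__":
    for name, ratio in ratio_to_cd(CAPACITY_MB).items():
        print(f"{name:22s} {ratio:5.1f} x CD")
```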
Secondary storage
This hardware keeps data inside the computer for later use and retains it even when
the computer has no power.
Hard disk:
A device for medium-term storage of data
Solid-state drive:
A device that serves the same purpose as the hard disk, but contains no moving parts
and stores data in solid-state (typically flash) memory
RAID array controller:
A device to manage several internal or external hard disks and optionally some
peripherals in order to achieve performance or reliability improvement in what is called
a RAID array
Sound card
This device enables the computer to output sound to audio devices, as well as accept
input from a microphone. Most modern computers have sound cards built into the
motherboard, though it is common for a user to install a separate sound card as an
upgrade. Most sound cards, either built-in or added, have surround sound capabilities.
Other peripherals
In addition, hardware devices can include external components of a computer system.
The following are either standard or very common. These peripherals include various
input and output devices, usually external to the computer system, such as the wheel
mouse.
INPUT
Text input devices
Keyboard: A device to input text and characters by pressing buttons (referred to as
keys)
Pointing devices
Mouse:
A pointing device that detects two-dimensional motion relative to its supporting surface
Optical Mouse:
Uses light to determine motion
Trackball:
A pointing device consisting of an exposed protruding ball housed in a socket that
detects rotation about the two axes
Touch screen:
Senses the user pressing directly on the display
Gaming devices
Joystick:
A control device that consists of a handheld stick that pivots around one end, to detect
angles in two or three dimensions
Gamepad:
A handheld game controller that relies on the fingers (especially the thumbs) to
provide input
Game controller:
A specific type of controller specialized for certain gaming purposes
Image, video input devices
Image scanner:
A device that provides input by analysing images, printed text, handwriting or an object
Webcam:
A low resolution video camera used to provide visual input that can be easily
transferred over the Internet
Audio input devices
Microphone:
An acoustic sensor that provides input by converting sound into electrical signals
COMPUTER SOFTWARES
The computer performs its functions based on the instructions given by the user. The
set of such instructions written for a particular task is known as a computer program.
A program tells the computer how to process the data into the form desired by the user.
The language in which a computer program is written is known as a programming
language. Programming languages are classified as Low-level languages and High-level
languages.
Low-level language is further classified as Machine language and Assembly language.
Machine language is expressed in terms of binary numbers i.e. 0 and 1 as the processor
understands binary numbers only. However, it is difficult to read and write the program
in terms of 0s and 1s.
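Because the processor understands only 0s and 1s, every number in a program must ultimately be stored in binary. The Python sketch below shows the classic repeated-division-by-2 method for converting a decimal number to binary; the function name `to_binary` is hypothetical, and Python's built-in `format(value, "b")` is printed alongside only as a cross-check.

```python
def to_binary(n: int) -> str:
    """Convert a non-negative decimal integer to binary by repeated division by 2."""
    if n < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        bits.append(str(n % 2))  # the remainder is the next binary digit, least significant first
        n //= 2
    return "".join(reversed(bits))


if __name__ == "__main__":
    for value in (5, 10, 65, 255):
        # The built-in format(value, "b") should give the same string
        print(value, "->", to_binary(value), format(value, "b"))
```

For example, 10 divided repeatedly by 2 leaves remainders 0, 1, 0, 1; read in reverse, these give 1010, which shows why long binary strings quickly become hard for people to read and write.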
To make machine language easier to read and write, each machine instruction is given a
short symbolic name, called a mnemonic (for example, ADD or MOV), that stands for its
operation code (op code). The op codes depend upon the type of processor. A program
written using these mnemonics is known as Assembly language code. Before the program
runs, the Assembly language code must be converted into machine language so that the
processor can understand and process it.
The program that translates Assembly language into machine code is known as an
Assembler. Examples include the Microsoft Macro Assembler (MASM) and assemblers for
processors such as the Z-80, 8085 and 8086. Because each processor has its own op codes,
the Assembler for each processor is different. Using Assembly language requires
knowledge of both the language and the computer hardware, so it is more convenient to
write a program in a High level language, whose instructions resemble simple English.
Examples of High level languages are BASIC, FORTRAN, COBOL etc. A compiler is a
program that translates a High level language into Machine language.
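The job of an assembler can be illustrated with a toy version. The Python sketch below uses a made-up 8-bit instruction format in which the upper 4 bits hold the op code and the lower 4 bits hold the operand; the mnemonics LOAD, ADD, STORE and HALT and their codes are hypothetical and do not belong to any real processor (a real assembler such as MASM emits the actual op codes of x86 processors). Each assembly line is a mnemonic, the short symbolic name for an op code, optionally followed by a number.

```python
# Hypothetical 8-bit instruction format for illustration only:
# the upper 4 bits hold the op code, the lower 4 bits hold the operand.
OPCODES = {
    "LOAD": 0b0001,   # load a value into the accumulator
    "ADD": 0b0010,    # add a value to the accumulator
    "STORE": 0b0011,  # store the accumulator at an address
    "HALT": 0b1111,   # stop the program
}


def assemble(program):
    """Translate assembly-language lines (mnemonic + optional operand) into 8-bit machine code."""
    machine_code = []
    for line in program:
        parts = line.split()
        mnemonic = parts[0].upper()
        if mnemonic not in OPCODES:
            raise ValueError(f"unknown mnemonic: {mnemonic}")
        operand = int(parts[1]) if len(parts) > 1 else 0
        if not 0 <= operand <= 15:
            raise ValueError(f"operand out of 4-bit range: {operand}")
        # Join the 4-bit op code and the 4-bit operand into one binary word
        machine_code.append(format(OPCODES[mnemonic], "04b") + format(operand, "04b"))
    return machine_code


if __name__ == "__main__":
    source = ["LOAD 5", "ADD 3", "STORE 9", "HALT"]
    for line, word in zip(source, assemble(source)):
        print(f"{line:8s} -> {word}")
```

Because the op-code table is specific to one (imaginary) processor, a different processor would need a different table, which is why each processor has its own assembler.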
'Software' is another name for program. In most cases, the terms ‘software’ and
‘program’ are interchangeable. There are two types of software - system software and
application/ utility software. Application software is the end user software. The
programs written under application software are designed for general purpose and
special purpose applications. An example of application software is Microsoft Internet
Explorer. System Software enables application software to interact with the computer
hardware. System software is the ‘background’ software that helps the computer to
manage its internal resources. The most important system software is the operating
system. The system software performs important tasks such as running the program,
storing data, processing data etc. Windows XP is an example of system software.
OPERATING SYSTEM – WINDOWS, UNIX AND LINUX
Disk Operating System (specifically) and disk operating system (generically), most often
abbreviated as DOS, refers to an operating system software, used in most computers,
which provides the abstraction and management of secondary storage devices and the
information on them (e.g., file systems for organizing files of all sorts). Such software is
referred to as a disk operating system since the storage devices it manages are made of
rotating platters (such as hard disks or floppy disks).
DOS is the medium through which the user and the external devices attached to the
system communicate. DOS translates the command issued by the user into a format that
the computer can understand and instructs the computer to function accordingly. It
also translates the result and any error message into a format the user can
understand.
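This translating role can be illustrated with a tiny command interpreter. The Python sketch below is a simplified imitation added for this guide, not how MS-DOS's command processor is actually implemented: it understands only two real DOS commands, DIR (list files) and TYPE (show a text file), and its error messages merely imitate DOS wording. The function name `interpret` and its `cwd` parameter are hypothetical.

```python
import os


def interpret(command_line: str, cwd: str = ".") -> str:
    """Translate a DOS-style command into an operating-system action and return text for the user."""
    parts = command_line.strip().split(maxsplit=1)
    if not parts:
        return ""
    command = parts[0].upper()  # DOS commands are not case-sensitive
    argument = parts[1] if len(parts) > 1 else ""

    if command == "DIR":  # list the files in a directory
        target = os.path.join(cwd, argument) if argument else cwd
        try:
            return "\n".join(sorted(os.listdir(target)))
        except OSError:
            return "File not found"
    if command == "TYPE":  # show the contents of a text file
        try:
            with open(os.path.join(cwd, argument), encoding="utf-8") as handle:
                return handle.read()
        except OSError:
            return "File not found"
    # Anything else is reported back to the user as an error message
    return "Bad command or file name"


if __name__ == "__main__":
    for cmd in ("dir", "type README.txt", "hello"):
        print(f"C:\\> {cmd}")
        print(interpret(cmd))
```

The user types plain words; the interpreter turns them into operating-system calls and turns the results, including failures, back into readable text, which is the two-way translation described above.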
Features of windows
Microsoft Windows is a series of software operating systems and graphical user
interfaces developed by Microsoft. Some of its important features are listed below:
Faster Operating System:
Windows includes tools that increase the speed of the computer: a set of programs
designed to optimize the efficiency of the computer, especially when used together.
Improved Reliability:
Windows improves computer reliability by introducing new wizards, utilities and
resources that help your system operate smoothly.
Innovative, Easy to use features:
Windows makes your computer easier to use with new and enhanced features.
Overview of different versions of windows
The different versions of Windows are discussed below:
WINDOWS 1.0
Microsoft released the first version of Windows in 1985. It marked a major
breakthrough as it allowed users to move from the character-based (CUI), non-graphical
MS-DOS to a GUI-based operating system. The product incorporated a set of desktop
applications, including the MS-DOS file management program and value additions such
as a calendar, card file, notepad, calculator, clock and telecommunications programs. It
allowed users to work with multiple applications at the same time (multitasking).
WINDOWS 3.0
Microsoft released this version of Windows in 1990. Some of its main features were:
A 16-bit operating environment (running on top of MS-DOS) with support for advanced graphics
Inclusion of Program Manager, File Manager and Print Manager
A completely rewritten application development environment with new capabilities, native
support for applications running in extended memory, and fully pre-emptive MS-DOS
multitasking
Inclusion of the Windows software development kit (SDK), which helped software
developers focus more on writing applications and less on writing device drivers
Improved Windows icons
WINDOWS NT 3.1/3.11
Microsoft released this version of Windows on July 27, 1993. This OS marked an
important milestone for Microsoft. Some of its main features were:
It was the first Windows operating system to merge support for high-end client/server
business applications.
It contained new built-in features for security, operating system power, performance,
desktop scalability and reliability.
It included support for multiprocessor (more than one CPU) architecture.
Windows NT was geared towards business users and had a rich Application
Programming Interface (API), which made it easier to run high-end engineering and
scientific applications.
WINDOWS 95
Microsoft released this version of Windows in 1995. Some of its main features were:
A 32-bit operating system with built-in Internet support
Facilitated easy installation of hardware peripherals and software applications through
plug- and- play capabilities
Enhanced multimedia capabilities, more powerful features for mobile computing and
integrated networking
WINDOWS 98
Microsoft released this version of Windows in 1998. It is often described as an operating
system that ‘Works Better, Plays Better’. Some of its main features were:
New features were added to enable easy access to Internet-related information.
Multiple display support allowed using several Visual Display Units (VDU)
simultaneously to augment the capacity of the desktop and to allow running of different
programs on separate monitors.
USB Support – the Universal Serial Bus made a computer easier to use with advanced
plug-and-play capabilities. It allowed devices to be added to the computer without
restarting the computer each time a device is added.
Accessibility wizard made it easier for physically challenged people to operate a
computer without installing any special software.
An extensive and easy-to-use self-help system was provided.
WINDOWS 2000 PROFESSIONAL
Microsoft released this version of Windows in 2000. It was an upgrade to Windows
NT 4.0, designed to replace Windows 95, Windows 98 and Windows NT 4.0 on desktops
and laptops. It brought major improvements in reliability, ease of use, Internet
compatibility and support for mobile computing. It made hardware installation much
easier by adding support for a wide variety of new Plug and Play hardware, including
advanced networking and wireless products, USB devices and infrared devices.
WINDOWS XP
Windows XP introduced user-friendly screens and simplified menus, among other
improvements, and was a major breakthrough for desktop operating systems. Two main
versions of Windows XP were released: Windows XP Home Edition and Windows XP
Professional Edition. Features of Windows XP are:
Safe and easy personal computing:
Windows XP makes personal computing easy and enjoyable. It combines dependability and
security with power, performance, a bright new appearance and plenty of help tailored
to the user's requirements.
World of Digital Media:
Work with digital media at home, at work and on the Internet: enjoy photography,
music, videos, computer games and more.
Connected Home and Office:
Share files, photos, music, even a printer and an Internet connection, all on a network
that is private and secure.
Best for Business:
With Windows XP, you get the established reliability of Microsoft Windows 2000,
enhanced for high-speed performance and even greater stability.
Text Processing Software
Text processing (word processing) software is one of the most important application
packages for Windows. Word processing software is used for creating documents: drafts,
letters, reports, essays, write-ups, etc. Earlier, WordStar was used extensively for
this purpose. However, the most common word processing package used today is
Microsoft Word.
Microsoft Word is Microsoft's word processing software. It was first released in 1983
under the name Multi-Tool Word for Xenix systems. Later, versions for several other
platforms were written, including IBM PCs running DOS (1983), the Apple Macintosh (1984),
SCO UNIX, OS/2 and Microsoft Windows (1989). It is a component of the Microsoft Office
system; however, it is also sold as a standalone product and was included in Microsoft
Works Suite.
Beginning with the 2003 version, the branding was revised to emphasize Word's identity
as a component within the Office suite: Microsoft began calling it Microsoft Office Word
instead of merely Microsoft Word. Word 2007 for Windows and Word 2008 for Mac OS X
followed. With the 2010 version, the product was again branded simply as Microsoft Word.
The contemporary versions are Microsoft Word 2010 for Windows and Word 2008 for Mac.
The significant features of MS Word are as follows:
It is an easy and simple package for a general user.
Features such as paragraph formatting, fonts, symbols, spell check, tables, drawing,
bullets and numbering, and page numbering help a user produce an error-free,
well-formatted document.
The file format generated by MS Word is .doc (.docx from Word 2007 onward). Such files
can be used with other applications and formats, such as MS Excel, MS Visual Studio 6.0,
MS Visual Studio .NET, web browsers and PDF.
APPLICATION SOFTWARE – WORD PROCESSOR, SPREAD SHEET
Word processing software
Word processing software is used for creating documents. Drafts, letters, reports,
essays, write-ups, etc. can be created using word processing software. Earlier, WordStar
was widely used for this purpose; Sidekick and WordPerfect have also been used for
drafting letters. However, the most commonly used word processing package in the
world is Microsoft Word, which is described above.
Spreadsheets
A spreadsheet is a computer application that simulates a paper worksheet. It displays
multiple cells that together make up a grid of rows and columns, each cell
containing either alphanumeric text or numeric values. Spreadsheets are frequently
used for financial information because of their ability to re-calculate the entire sheet
automatically after a change to a single cell is made.
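The automatic recalculation described above can be illustrated with a toy model in Python. This is a minimal sketch, not how Excel works internally: the `Sheet` class is hypothetical, a formula cell holds a Python function of the sheet, and formulas are simply re-evaluated whenever a cell is read, so changing one input cell immediately changes every total that depends on it.

```python
from typing import Callable, Dict, Union

# A cell holds either a plain value or a formula (a function of the sheet).
Cell = Union[float, str, Callable[["Sheet"], float]]


class Sheet:
    """Hypothetical toy spreadsheet: formulas are re-evaluated on every read."""

    def __init__(self) -> None:
        self.cells: Dict[str, Cell] = {}

    def set(self, ref: str, content: Cell) -> None:
        self.cells[ref] = content

    def get(self, ref: str) -> Union[float, str]:
        content = self.cells.get(ref, 0.0)  # empty cells read as 0 in this toy model
        if callable(content):
            return content(self)  # recalculate from the current input cells
        return content


sheet = Sheet()
sheet.set("A1", 120.0)  # price
sheet.set("A2", 3.0)    # quantity
sheet.set("A3", lambda s: s.get("A1") * s.get("A2"))  # total = A1 * A2
print(sheet.get("A3"))  # 360.0

sheet.set("A2", 5.0)    # change a single cell...
print(sheet.get("A3"))  # ...and the total follows: 600.0
```

Re-evaluating on every read keeps the sketch short; real spreadsheet programs track which cells depend on which, so that only the affected cells are recomputed.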
Microsoft Excel
Microsoft developed Excel on the Macintosh platform for several years, by which point
it had become a powerful system. A port of Excel to Windows 2.0 resulted in a fully
functional Windows spreadsheet. Since the mid-1990s, Microsoft Excel has dominated
the commercial electronic spreadsheet market.
Presentation programs
Microsoft PowerPoint is a presentation program developed by Microsoft. It is part of the
Microsoft Office suite and runs on Microsoft Windows and Apple's Mac OS X computer
operating systems.
PowerPoint is widely used by business people, educators, students and trainers and is
among the most prevalent forms of persuasive technology. Beginning with Microsoft
Office 2003, Microsoft revised the branding to emphasize PowerPoint's place within the
Office suite, calling it Microsoft Office PowerPoint instead of just Microsoft PowerPoint.
The current versions are Microsoft Office PowerPoint 2007 for Windows and 2008 for
Mac.
Graphics programs
Computer graphics are graphics created with the aid of computers; the term also refers
to the representation and manipulation of pictorial data by a computer. The development
of computer graphics has made computers more user-friendly and has made many types of
data easier to understand and interpret. Developments in computer graphics have had
a profound impact on many types of media and have revolutionized the animation and
video game industries. The term computer graphics includes everything on computers that
is not text or sound. Today nearly all computers use some graphics, and users expect to
control their computer through icons and pictures rather than just by typing.
Computer graphics has the following features:
Representation and manipulation of pictorial data by a computer
Development of technologies used to create and manipulate such pictorial data
Digitally synthesizing and manipulating visual content
Today computer-generated images touch many aspects of our daily life. Computer
imagery is found on television, in newspapers, in weather reports and during surgical
procedures. A well-constructed graph can present complex statistics in a way that is
easier to understand and interpret. Such graphs are used to illustrate papers, reports,
theses and other presentation material. A range of tools and facilities is available to
enable users to visualize their data.
After data collection on paper, the next step is data entry into a computer system. While
small data sets can easily be typed directly into a data matrix on the screen, large
amounts of data are better entered using computer forms. These computer forms are the
on-screen analogue of case report forms. Data are typed into the various fields of the
form. Commercial programs allowing the design and use of forms include, e.g., Microsoft
Office Access and SAS. Epi Info™ is a freeware program (http://www.cdc.gov/epiinfo/,
cf. Fig. …) that uses the Access format to save data, but has a graphical user interface
that is easier to handle than that of Access.
After data entry using forms, the data values are saved in data matrices.
One row of the data matrix corresponds to one form, and each column of a row
corresponds to one field of a form.
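The correspondence between forms and the data matrix can be sketched in Python, assuming each completed form is held as a dictionary keyed by field name. The field names PAT, SEX and AGE and the function `forms_to_matrix` are hypothetical; a field left blank on a form becomes an empty (None) cell in the matrix.

```python
from typing import Dict, List, Optional

FIELDS = ["PAT", "SEX", "AGE"]  # hypothetical form fields = columns of the matrix


def forms_to_matrix(forms: List[Dict[str, int]]) -> List[List[Optional[int]]]:
    """One row per form, one column per field; fields left blank stay empty (None)."""
    return [[form.get(field) for field in FIELDS] for form in forms]


forms = [
    {"PAT": 1, "SEX": 1, "AGE": 54},
    {"PAT": 2, "SEX": 2},  # AGE not recorded on this form
]
for row in forms_to_matrix(forms):
    print(row)
# [1, 1, 54]
# [2, 2, None]
```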
Electronic databases can usually be converted from one program to another, e.g., from
a data entry program to a statistical software system like SPSS, which is used
throughout this section. SPSS also offers a commercial program for the design of case
report forms and data entry (“SPSS Data Entry”).
When building a database, no matter which program is used, the first step is to decide
which variables it should consist of. In a second step, we must define the properties of
each variable.
The following rules apply:
The first variable should contain a unique patient identification number, which is also
recorded on the case report forms.
Each property that may vary in individuals can be considered as a variable.
With repeated measurements on individuals (e.g., before and after treatment) there are
several alternatives:
Wide format:
One row (form) per individual; repeated measurements on the same property are
recorded in multiple columns (or fields on the form); e.g., PAT=patient identification
number, VAS1, VAS2, VAS3 = repeated measurements on VAS
Long format:
One row per individual and measurement; repeated measurements on the same
property are recorded in multiple rows of the same column, using a separate column to
define the time of measurement; e.g., PAT=patient identification number, TIME=time of
measurement, VAS=value of VAS at the time of measurement
If the wide format would need so many columns (fields) that the computer forms become
too complex, the long format should be chosen. Note that data can always be
restructured from the wide to the long format and vice versa.
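Restructuring between the two formats can be sketched in plain Python using the variable names from the example above (PAT, VAS1 to VAS3 in the wide format; PAT, TIME, VAS in the long format). The helper functions `wide_to_long` and `long_to_wide` are hypothetical illustrations; statistics packages such as SPSS provide their own restructuring tools.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[int]]


def wide_to_long(rows: List[Row], n_times: int) -> List[Row]:
    """PAT, VAS1..VASn (one row per patient) -> PAT, TIME, VAS (one row per measurement)."""
    long_rows = []
    for row in rows:
        for t in range(1, n_times + 1):
            long_rows.append({"PAT": row["PAT"], "TIME": t, "VAS": row.get(f"VAS{t}")})
    return long_rows


def long_to_wide(rows: List[Row]) -> List[Row]:
    """PAT, TIME, VAS -> PAT, VAS1..VASn: the inverse restructuring."""
    by_patient: Dict[Optional[int], Row] = {}
    for row in rows:
        wide_row = by_patient.setdefault(row["PAT"], {"PAT": row["PAT"]})
        wide_row[f"VAS{row['TIME']}"] = row["VAS"]
    return list(by_patient.values())


wide = [
    {"PAT": 1, "VAS1": 7, "VAS2": 5, "VAS3": 2},
    {"PAT": 2, "VAS1": 6, "VAS2": 6, "VAS3": None},  # third VAS missing
]
long_data = wide_to_long(wide, n_times=3)
print(len(long_data))                   # 6: two patients x three measurements
print(long_data[0])                     # {'PAT': 1, 'TIME': 1, 'VAS': 7}
print(long_to_wide(long_data) == wide)  # True
```

Note that the missing third VAS value of patient 2 survives the round trip as an empty cell rather than disappearing.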
SPSS
The statistical software package SPSS works with several windows:
The data editor is used to build data bases, to enter data by hand, to import data from
other programs, to edit data, and to perform interactive data analyses.
The viewer collects results of data analyses.
The chart editor facilitates the modification of diagrams prepared by SPSS and allows
identifying individuals on a chart.
Using the syntax editor, commands can be entered and collected in program scripts,
which facilitates non-interactive automated data analysis.
Data files are represented in the data editor, which consists of two tables (views). The
data view shows the data matrix with rows and columns corresponding to individuals
and variables, respectively.
The variable view contains the properties of the variables. It can be used to define new
variables or to modify properties of existing variables. These properties are:
Name: unique alphanumeric name of a variable, starting with an alphabetic character.
The name may consist of alphabetic and numeric characters and the underscore (“_”).
Neither spaces nor special characters are allowed.
Width: the maximum number of places.
Decimals: the number of decimal places.
Label: a description of the variable. The label will be used to name variables in all
menus and in the output viewer.
Values: labels assigned to the values of nominal or ordinal variables.
The value labels replace the values in any category listings. Example: value 1
corresponds to value label ‘male’, value 2 to label ‘female’.
Data are entered as 1 and 2, but SPSS displays ‘male’ and ‘female’.
Using the Value Labels button on the toolbar, one can switch between the display of
the values and the value labels in the data view.
Within the value label view, one can directly choose from the defined value labels when
entering data.
Missing: particular values may be defined as missing values. These values are not used
in any analyses. Usually, missing data values are represented as empty fields and if a
data value is missing for an individual, it is left empty in the data matrix.
Columns: defines the width of the data matrix column of a variable as a number of
characters. This value is only relevant for display and changes if the column is
widened using the mouse.
Align: defines the alignment of the data matrix column of a variable (left/right/center).
Measure: nominal, ordinal or scale. The applicability of particular statistical operations
on a variable (e.g., computing the mean) depends on the measure of the variable. The
measure of a variable is called nominal if each observation belongs to one of a set of
categories, and it is called ordinal if these categories have a natural order.
SPSS calls the measure of a variable ‘scale’ if observations are numerical values that
represent different magnitudes of the variable. Usually (outside SPSS), such variables
are called ‘metric’, ‘continuous’ or ‘quantitative’, compared to the ‘qualitative’ and
‘semi-quantitative’ nature of nominal and ordinal variables, respectively. Examples of
nominal variables are sex, type of operation, or type of treatment. Examples of ordinal
variables are treatment dose, tumor stage, response rating, etc. Scale variables are,
e.g., height, weight, blood pressure, age.
Type: the format in which data values are stored. The most important are the numeric,
string, and date formats.
Nominal and ordinal variables: choose the numeric type.
Categories should be coded as 1, 2, 3, … or 0, 1, 2, … Value labels should be used to
paraphrase the codes.
Scale variables: choose the numeric type and pay attention to the correct number of
decimal places, which applies to all computed statistics (e.g., if a variable is defined
with 2 decimal places and you compute the mean of that variable, then 2 decimal
places will be shown). However, the computational accuracy is not affected by this
option.
Date variables: choose the date type; otherwise, SPSS won’t be able to compute the
length of time between two dates correctly.
The string format should only be used for text that will not be analyzed statistically
(e.g., addresses, remarks). For nominal or ordinal variables, the numeric type should be
used throughout, as it requires an exact definition of categories. The list of possible
categories can be extended while entering data, and category codes can be recoded after
data entry.
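Several of the variable-view properties described above (name, value labels, missing values, measure) can be mirrored in a small Python sketch. The `Variable` class and its methods are hypothetical and not part of SPSS. The name check implements only the simplified rule stated in this section (SPSS's own naming rules are somewhat broader), and choosing the mode for nominal, the median for ordinal and the mean for scale variables is a common statistical convention, not something SPSS enforces.

```python
import re
import statistics
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Simplified naming rule from the text: a letter first, then letters, digits or "_"
NAME_RULE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


@dataclass
class Variable:
    """Hypothetical stand-in for one row of the SPSS variable view."""
    name: str
    measure: str  # "nominal", "ordinal" or "scale"
    value_labels: Dict[int, str] = field(default_factory=dict)
    missing_codes: Set[float] = field(default_factory=set)

    def __post_init__(self) -> None:
        if not NAME_RULE.fullmatch(self.name):
            raise ValueError(f"invalid variable name: {self.name!r}")

    def display(self, value: float) -> str:
        """Show the value label instead of the code, if one is defined."""
        return self.value_labels.get(value, str(value))

    def valid(self, values: List[Optional[float]]) -> List[float]:
        """Drop empty cells and user-defined missing codes before analysis."""
        return [v for v in values if v is not None and v not in self.missing_codes]

    def summary(self, values: List[Optional[float]]) -> float:
        """Pick a summary statistic that suits the measure of the variable."""
        data = self.valid(values)
        if self.measure == "nominal":
            return statistics.mode(data)    # most frequent category
        if self.measure == "ordinal":
            return statistics.median(data)  # relies only on the natural order
        return statistics.mean(data)        # meaningful for scale variables


sex = Variable("SEX", "nominal", value_labels={1: "male", 2: "female"})
print([sex.display(v) for v in [1, 2, 2]])        # ['male', 'female', 'female']
print(sex.display(sex.summary([1, 2, 2, None])))  # female

weight = Variable("WEIGHT_KG", "scale", missing_codes={-999})
print(weight.summary([70.0, 80.0, -999, None]))   # 75.0
```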
Example: Consider the variable “location” with the possible outcome categories
“pancreas” and “stomach”. How should this variable be defined?
Proper definition:
The variable is defined properly if these categories are given numeric codes, say 1
and 2.
Value labels paraphrase these codes: in any output produced by SPSS, these value
labels will be used instead of the numeric codes. When entering data, the user
chooses from the predefined outcome categories.
Improper definition:
The variable is defined as a string of length 10. The user types alphabetical
characters instead of choosing from a list (or entering numbers).
This may easily lead to various spellings of the same category, and every distinct
entry in the column “location” will be treated as a separate category.
Thus the program works with six different categories instead of two.
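The effect of the improper definition can be shown in a few lines of Python. The six spellings below are hypothetical examples of such versions; counting distinct strings gives six categories, whereas mapping each spelling to the numeric codes 1 = pancreas and 2 = stomach recovers the two intended ones. The cleaning rule (trim spaces, lower-case, a small table of known misspellings) is an illustrative assumption.

```python
# Hypothetical free-text entries typed into a string variable "location"
entries = ["pancreas", "Pancreas", "pancreas ", "stomach", "Stomach", "stomac"]

print(len(set(entries)))  # 6: every spelling counts as its own category

CODES = {"pancreas": 1, "stomach": 2}   # numeric codes, as in the proper definition
LABELS = {1: "pancreas", 2: "stomach"}  # value labels paraphrasing the codes
TYPOS = {"stomac": "stomach"}           # hypothetical table of known misspellings


def recode(entry: str) -> int:
    """Map a free-text entry to its numeric category code."""
    cleaned = entry.strip().lower()
    cleaned = TYPOS.get(cleaned, cleaned)
    return CODES[cleaned]  # a KeyError flags a category nobody defined


codes = [recode(e) for e in entries]
print(codes)                               # [1, 1, 1, 2, 2, 2]
print(sorted({LABELS[c] for c in codes}))  # ['pancreas', 'stomach']
```

Such after-the-fact cleaning only works for spellings someone anticipated, which is why defining numeric codes before data entry is the safer choice.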
Further remarks applying to data entry with any program:
Numerical variables should only contain numbers and no units of measurement (e.g.,
“kg”, “mm Hg”, “points”) or other alphabetical or special characters. This is of special
importance if a spreadsheet program like Microsoft Excel is used for data entry. Unlike
real database programs, spreadsheet programs allow the user to enter any type of data
in any cell, so the user alone is responsible for the entries.
“True” missing values should be left empty rather than using special codes for them
(e.g., -999, -998, -997). If special codes are used, they must be defined as missing value
codes, and they should be defined as value labels as well. Special codes can be
advantageous for “temporary” missing values (e.g., -999=“ask patient”, -998=“ask
nurse”, -997=“check CRF”).
In SPSS, an empty field stands for a missing value and is excluded from analyses. By
contrast, in Microsoft® Office Excel® an empty cell is sometimes interpreted as zero.
Imprecise values can be characterized by adding a column showing the degree of
certainty associated with such values (e.g., 1=exact value, 0=imprecise value).
This allows the analyst to run two analyses: one with exact values only, and one also
using imprecise values. Imprecisely collected data values should never be tagged with
a question mark! This would turn the column into a string format, and SPSS (or any other
statistics program) would not be able to use it for analyses.
Enter numbers without separators (e.g., enter 1000 as 1000, not as 1,000).
If a variable is defined as numeric in a database or statistics program, it is not
possible to enter anything other than numbers. Therefore, programs that do not
distinguish variable types (e.g., Excel) are error-prone.
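Several of the rules above (no units, no thousands separators, no question marks, empty cells rather than zeros, special codes declared as missing) can be checked automatically when data are imported as text. A minimal Python sketch: the function name `parse_numeric` is hypothetical, the missing codes -999, -998 and -997 are taken from the example in the text, and the checking logic is only an illustration of the rules.

```python
from typing import Optional

MISSING_CODES = {-999.0, -998.0, -997.0}  # special missing-value codes from the example


def parse_numeric(cell: str) -> Optional[float]:
    """Return the number in a cell, None if missing; reject anything non-numeric."""
    text = cell.strip()
    if text == "":
        return None  # an empty cell is missing, never zero
    try:
        value = float(text)  # fails on "70 kg", "1,000", "5?" ...
    except ValueError:
        raise ValueError(f"not a number: {cell!r}") from None
    return None if value in MISSING_CODES else value


print(parse_numeric("1000"))  # 1000.0
print(parse_numeric(""))      # None
print(parse_numeric("-999"))  # None (declared missing code)
for bad in ["1,000", "70 kg", "5?"]:
    try:
        parse_numeric(bad)
    except ValueError as err:
        print(err)  # e.g. not a number: '1,000'
```

Rejecting a bad cell loudly, rather than silently converting it, leaves the decision about how to correct it to the person responsible for the data.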
For a more sophisticated discussion about data management issues the reader is
referred to Appendices A and B.
UNIT – 2: COMPUTER NETWORK AND PROGRAMMING LANGUAGES
Structure, architecture, Advantages, types (LAN, MAN & WAN), Network protocols
– Internet protocol (TCP/IP), File transfer protocols (FTP), WWW, HTTP, HTML,
URL. Network Security – Group policies, Firewalls. C Programming and PERL –
Algorithm and flow chart, Structure of C program, Header file, Global declaration,
Main function, variable declarations, Control statement – conditional and
unconditional – sub functions. Introduction and application of PERL & Bioperl
STRUCTURE, ARCHITECTURE, ADVANTAGES, TYPES (LAN, MAN & WAN)
Local Area Network (LAN) is a data communications network connecting terminals,
computers and printers within a building or other geographically limited areas. These
devices may be connected through wired cables or wireless links. Ethernet, Token Ring
and Wireless LAN using IEEE 802.11 are examples of standard LAN technologies.
Ethernet is by far the most commonly used LAN technology. Token Ring technology is
still used by some companies. FDDI is sometimes used as a backbone LAN
interconnecting Ethernet or Token Ring LANs. WLAN using IEEE 802.11 technologies is
rapidly becoming the new leading LAN technology because of its mobility and ease of
use.
Local Area Networks can be interconnected using Wide Area Network (WAN) or
Metropolitan Area Network (MAN) technologies. The common WAN technologies include
TCP/IP, ATM, and Frame Relay etc.
The common MAN technologies include SMDS and 10 Gigabit Ethernet.
LANs are traditionally used to connect a group of people who are in the same local area.
However, working groups are becoming more geographically distributed in today’s
working environment. In these cases, virtual LAN (VLAN) technologies are defined for
people in different places to share the same networking resource.
Local Area Network protocols are mostly at the data link layer (layer 2). IEEE is the
leading organization defining LAN standards.
Ethernet: IEEE 802.3 Local Area Network protocols
Protocol Description
Ethernet protocols refer to the family of local-area networks (LAN) covered by a group of
IEEE 802.3 standards. In the Ethernet standard, there are two modes of operation:
half-duplex and full-duplex. In the half-duplex mode, data are transmitted using the
popular Carrier-Sense Multiple Access/Collision Detection (CSMA/CD) protocol on a
shared medium. The main disadvantages of half-duplex mode are its efficiency and
distance limitations: the link distance is limited by the minimum MAC frame size.
This restriction reduces the efficiency drastically for high-rate transmission.
Therefore, the carrier extension technique is used to ensure the minimum frame size of
512 bytes in Gigabit Ethernet to achieve a reasonable link distance.
Four data rates are currently defined for operation over optical fiber and twisted-pair
cables:
10 Mbps—10Base-T Ethernet (802.3)
100 Mbps—Fast Ethernet (802.3u)
1000 Mbps—Gigabit Ethernet (802.3z)
10 Gbps—10-Gigabit Ethernet (802.3ae)
The general aspects of the Ethernet
The Ethernet system consists of three basic elements:
The physical medium used to carry Ethernet signals between computers,
A set of medium access control rules embedded in each Ethernet interface that allows
multiple computers to fairly arbitrate access to the shared Ethernet channel, and
An Ethernet frame that consists of a standardized set of bits used to carry data over the
system.
As with all IEEE 802 protocols, the ISO data link layer is divided into two IEEE 802
sublayers, the Media Access Control (MAC) sublayer and the MAC-client sublayer. The
IEEE 802.3 physical layer corresponds to the ISO physical layer.
The MAC sublayer has two primary responsibilities:
Data encapsulation, including frame assembly before transmission, and frame
parsing/error detection during and after reception.
Media access control, including initiation of frame transmission and recovery from
transmission failure
The MAC-client sublayer may be one of the following:
Logical Link Control (LLC), which provides the interface between the Ethernet MAC and
the upper layers in the protocol stack of the end station. The LLC sublayer is defined by
IEEE 802.2 standards.
Bridge entity, which provides LAN-to-LAN interfaces between
LANs that use the same protocol (for example, Ethernet to Ethernet) and also between
different protocols (for example, Ethernet to Token Ring). Bridge entities are defined by
IEEE 802.1 standards.
Each Ethernet-equipped computer operates independently of all other stations on the
network: there is no central controller.
All stations attached to an Ethernet are connected to a shared signaling system, also
called the medium. To send data, a station first listens to the channel and, when the
channel is idle, transmits its data in the form of an Ethernet frame, or packet.
After each frame transmission, all stations on the network must contend equally for the
next frame transmission opportunity. Access to the shared channel is determined by
the medium access control (MAC) mechanism embedded in the Ethernet interface
located in each station. The medium access control mechanism is based on a system
called Carrier Sense Multiple Access with Collision Detection (CSMA/CD).
As each Ethernet frame is sent onto the shared signal channel, all Ethernet interfaces
look at the destination address. If the destination address of the frame matches
the interface address, the frame will be read entirely and be delivered to the networking
software running on that computer. All other network interfaces will stop reading the
frame when they discover that the destination address does not match their own
address.
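The receive-side filtering just described can be sketched in Python. MAC addresses are represented as 6-byte `bytes` objects, the frame is assumed to have had its preamble and SFD removed already, and the station addresses are made up for illustration. The all-ones broadcast address, which every station reads, is included as the standard case of a frame meant for several stations.

```python
BROADCAST = bytes.fromhex("ffffffffffff")  # all-ones address: every station reads it


def accepts(interface_addr: bytes, frame: bytes) -> bool:
    """True if this interface should read the whole frame and pass it upward."""
    destination = frame[:6]  # preamble/SFD already stripped, so DA comes first
    return destination in (interface_addr, BROADCAST)


# Hypothetical stations sharing one Ethernet segment
stations = {
    "A": bytes.fromhex("02000000000a"),
    "B": bytes.fromhex("02000000000b"),
    "C": bytes.fromhex("02000000000c"),
}

# DA = B, SA = A, 2-byte Length/Type, then data
frame = stations["B"] + stations["A"] + (7).to_bytes(2, "big") + b"payload"
print([name for name, addr in stations.items() if accepts(addr, frame)])  # ['B']
```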
When it comes to how signals flow over the set of media segments that make up an
Ethernet system, it helps to understand the topology of the system. The signal topology
of the Ethernet is also known as the logical topology, to distinguish it from the actual
physical layout of the media cables. The logical topology of an Ethernet provides a
single channel (or bus) that carries Ethernet signals to all stations.
Multiple Ethernet segments can be linked together to form a larger Ethernet LAN using
a signal amplifying and retiming device called a repeater. Through the use of repeaters,
a given Ethernet system of multiple segments can grow as a “non-rooted branching
tree.” “Non-rooted” means that the resulting system of linked segments may grow in any
direction, and does not have a specific root segment. Most importantly, segments must
never be connected in a loop. Every segment in the system must have two ends, since
the Ethernet system will not operate correctly in the presence of loop paths.
Even though the media segments may be physically connected in a star pattern, with
multiple segments attached to a repeater, the logical topology is still that of a single
Ethernet channel that carries signals to all stations.
The basic IEEE 802.3 MAC Data Frame for 10/100Mbps Ethernet:
Preamble (PRE)— 7 bytes
The PRE is an alternating pattern of ones and zeros that tells receiving stations
that a frame is coming, and that provides a means to synchronize the frame-reception
portions of receiving physical layers with the incoming bit stream.
Start-of-frame delimiter (SFD)—1 byte
The SFD is an alternating pattern of ones and zeros, ending with two consecutive 1-bits
indicating that the next bit is the left-most bit in the left-most byte of the destination
address.
Destination address (DA)— 6 bytes
The DA field identifies which station(s) should receive the frame.
Source address (SA)— 6 bytes
The SA field identifies the sending station.
Length/Type— 2 bytes
This field indicates either the number of MAC-client data bytes that are contained in the
data field of the frame, or the frame type ID if the frame is assembled using an optional
format.
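Data— 46 to 1500 bytes
A sequence of n bytes (46 ≤ n ≤ 1500) of any value. The minimum total frame size is
64 bytes, counted from the destination address through the FCS (preamble and SFD
excluded); shorter data are padded to 46 bytes.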
Frame check sequence (FCS)— 4 bytes
This sequence contains a 32-bit cyclic redundancy check (CRC) value, which is created
by the sending MAC and is recalculated by the receiving MAC to check for damaged
frames.
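The field sizes above add up to the 64-byte minimum: 6 (DA) + 6 (SA) + 2 (Length/Type) + 46 (minimum data) + 4 (FCS) = 64 bytes. The Python sketch below assembles such a frame, pads short data to 46 bytes, and appends a CRC-32 computed with the standard library's `zlib.crc32`, which uses the same CRC-32 polynomial as the Ethernet FCS. The function names are hypothetical, the byte order used to store the FCS is an assumption of this sketch, and preamble, SFD and bit-level transmission order are ignored. The receiving side simply recomputes the CRC and compares.

```python
import struct
import zlib

MIN_DATA, MAX_DATA = 46, 1500


def build_frame(da: bytes, sa: bytes, data: bytes) -> bytes:
    """DA + SA + Length + data (padded to 46 bytes) + 4-byte FCS."""
    if len(data) > MAX_DATA:
        raise ValueError("data field longer than 1500 bytes")
    length = struct.pack("!H", len(data))       # Length field: real data length
    padded = data.ljust(MIN_DATA, b"\x00")      # pad short data to the 46-byte minimum
    body = da + sa + length + padded
    fcs = struct.pack("<I", zlib.crc32(body))   # CRC-32 over DA..data (byte order assumed)
    return body + fcs


def fcs_ok(frame: bytes) -> bool:
    """Receiver side: recompute the CRC over everything except the FCS and compare."""
    body, fcs = frame[:-4], frame[-4:]
    return struct.pack("<I", zlib.crc32(body)) == fcs


da = bytes.fromhex("02000000000b")
sa = bytes.fromhex("02000000000a")
frame = build_frame(da, sa, b"hello")
print(len(frame))     # 64: the minimum frame size
print(fcs_ok(frame))  # True

damaged = bytearray(frame)
damaged[20] ^= 0x01   # flip one bit in the data field
print(fcs_ok(bytes(damaged)))  # False
```

Because the Length field keeps the real data length (5 here), a receiver can discard the padding after checking the FCS.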
MAC Frame with Gigabit Carrier Extension:
1000Base-X has a minimum frame size of 416 bytes, and 1000Base-T has a minimum
frame size of 520 bytes. An extension field is used to fill frames that are shorter than
the minimum length.
Related protocols
IEEE 802.3, 802.3u, 802.3z, 802.3ab, 802.2, 802.1, 802.3ae, 802.1D, 802.1G, 802.1Q,
802.1p
Sponsor Source
Ethernet standards are defined by IEEE (http://www.ieee.org) in 802.3 specifications
NETWORK PROTOCOLS
A network architecture is a blueprint of the complete computer communication
network, which provides a framework and technology foundation for designing, building
and managing a communication network. It typically has a layered structure. Layering
is a modern network design principle which divides the communication tasks into a
number of smaller parts, each part accomplishing a particular sub-task and interacting
with the other parts in a small number of well-defined ways. Layering allows the parts
of a communication to be designed and tested without a combinatorial explosion of
cases, keeping each design relatively simple.
If a network architecture is open, no single vendor owns the technology and controls its
definition and development. Anyone is free to design hardware and software based on
the network architecture. The TCP/IP network architecture, on which the Internet is based,
is such an open network architecture; it has been adopted as a worldwide network
standard and is widely deployed in local area networks (LANs), wide area networks (WANs),
small and large enterprises and, last but not least, the Internet.
Open Systems Interconnection (OSI) network architecture, developed by the International
Organization for Standardization (ISO), is an open standard for communication in the
network across different equipment and applications by different vendors. Though not
widely deployed, the OSI 7 layer model is considered the primary network architectural
model for inter-computing and inter-networking communications.
In addition to the OSI network architecture model, there exist other network
architecture models by many vendors, such as IBM SNA (Systems Network
Architecture), Digital Equipment Corporation (DEC; now part of HP) DNA (Digital
Network Architecture), Apple Computer’s AppleTalk, and Novell’s NetWare. Actually, the
TCP/IP architecture does not exactly match the OSI model. Unfortunately, there is no
universal agreement regarding how to describe TCP/IP with a layered model. It is
generally agreed that TCP/IP has fewer levels (from three to five layers) than the seven
layers of the OSI model.
Network architecture provides only a conceptual framework for communications
between computers. The model itself does not provide specific methods of
communication. Actual communication is defined by various communication protocols.
INTERNET PROTOCOL (TCP/IP)
TCP/IP architecture does not exactly follow the OSI model. Unfortunately, there is no
universal agreement regarding how to describe TCP/IP with a layered model. It is
generally agreed that TCP/IP has fewer levels (from three to five layers) than the seven
layers of the OSI model. We adopt a four-layer model for the TCP/IP architecture.
TCP/IP architecture omits some features found under the OSI model, combines the
features of some adjacent OSI layers and splits other layers apart. The 4-layer structure
of TCP/IP is built as information is passed down from applications to the physical
network layer. When data is sent, each layer treats all of the information it receives
from the upper layer as data, adds control information (a header) to the front of that
data and then passes it to the lower layer. When data is received, the opposite procedure
takes place: each layer processes and removes its header before passing the data to the
upper layer. The TCP/IP 4-layer model and the key functions of each layer are described
below:
Application Layer
The Application Layer in TCP/IP combines the functions of the OSI Application,
Presentation and Session Layers. Therefore, any process above the transport layer is
called an application in the TCP/IP architecture. In TCP/IP, sockets and ports are used
to describe the path over which applications communicate. Most application-level
protocols are associated with one or more port numbers.
Transport Layer
In TCP/IP architecture, there are two Transport Layer protocols. The Transmission
Control Protocol (TCP) guarantees reliable, ordered delivery of data over a connection.
The User Datagram Protocol (UDP) transports datagrams without end-to-end reliability
checking. Both protocols are useful for different applications.
Network Layer
The Internet Protocol (IP) is the primary protocol in the TCP/IP Network Layer. All upper
and lower layer communications must travel through IP as they are passed through the
TCP/IP protocol stack. In addition, there are many supporting protocols in the Network
Layer, such as ICMP, to facilitate and manage the routing process.
Network Access Layer
In the TCP/IP architecture, the Data Link Layer and Physical Layer are normally
grouped together to become the Network Access layer. TCP/IP makes use of existing
Data Link and Physical Layer standards rather than defining its own. Many RFCs
describe how IP utilizes and interfaces with the existing data link protocols such as
Ethernet, Token Ring, FDDI, HSSI, and ATM.
The physical layer, which defines the hardware communication properties, is not often
directly interfaced with the TCP/IP protocols in the network layer and above.
FILE TRANSFER PROTOCOLS (FTP), WWW, HTTP, HTML, URL
FTP: File Transfer Protocol
Protocol Description
File Transfer Protocol (FTP) enables file sharing between hosts. FTP uses TCP to create a
virtual connection for control information and then creates a separate TCP connection
for data transfers. The control connection uses an image of the TELNET protocol to
exchange commands and messages between hosts.
Sometimes FTP access to files may be restricted. To retrieve files from these computers, you must know the address and have a user ID and a password. However, many computers are set up as anonymous FTP servers, where the user usually logs in as
anonymous and gives his/her e-mail address as a password. Internet browsers such as Netscape Navigator and Internet Explorer support anonymous FTP. Simply change the URL from http:// to ftp:// and follow it with the name of the FTP site you wish to go to. However, you must fill in your identity under the Options menu so that the browser can log you in as an anonymous user. The programs and files on FTP sites are usually organized hierarchically in a series of directories. Those on anonymous FTP sites are often in a directory called pub (i.e., public). It is worth remembering that many FTP sites run on computers with a UNIX operating system, which is case sensitive. Many FTP servers supply text information when you log in, in addition to the readme file. You can get help at the FTP prompt by typing ‘help’ or ‘?’. Before you download or upload an image file, switch the transfer mode from asc (ASCII), used for text, to bin (binary), used for image (graphic) files. Most files are stored in a compressed or zipped format. Some programs for compressing and uncompressing come as one integrated package, while others are two separate programs. The most commonly used compression programs are as follows:
The key functions of FTP are:
To promote sharing of files (computer programs and/or data);
To encourage indirect or implicit (via programs) use of remote computers;
To shield a user from variations in file storage systems among hosts; and
To transfer data reliably and efficiently.
FTP, though usable directly by a user at a terminal, is designed mainly for use by
programs. FTP control frames are TELNET exchanges and can contain TELNET
commands and option negotiation. However, most FTP control frames are simple ASCII
text and can be classified as FTP commands or FTP messages.
FTP messages are responses to FTP commands and consist of a response code followed
by explanatory text.
Protocol Structure
Command Description
ABOR Abort data connection process.
ACCT <account> Account for system privileges.
ALLO <bytes> Allocate bytes for file storage on server.
APPE <filename> Append file to file of same name on server.
CDUP Change to parent directory on server.
CWD <dir path> Change working directory on server.
DELE <filename> Delete specified file on server.
HELP <command> Return information on specified command.
LIST <name> List information if name is a file or list files if name is a directory.
MODE <mode> Transfer mode (S=stream, B=block, C=compressed).
MKD <directory> Create specified directory on server.
NLST <directory> List contents of specified directory.
NOOP Cause no action other than acknowledgement from server.
PASS <password> Password for system log-in.
PASV Request server wait for data connection.
PORT <address> IP address and two-byte system port ID.
PWD Display current working directory.
QUIT Log off from the FTP server.
REIN Reinitialize connection to log-in status.
REST <offset> Restart file transfer from given offset.
RETR <filename> Retrieve (copy) file from server.
RMD <directory> Remove specified directory on server.
RNFR <old path> Rename from old path.
RNTO <new path> Rename to new path.
SITE <params> Site specific parameters provided by server.
SMNT <pathname> Mount the specified file structure.
STAT <directory> Return information on current process or directory.
STOR <filename> Store (copy) file to server.
STOU Store file on server under a unique name.
STRU <type> Data structure (F=file, R=record, P=page).
SYST Return operating system used by server.
TYPE <data type> Data type (A=ASCII, E=EBCDIC, I=binary).
USER <username> User name for system log-in.
Standard FTP messages are as follows:
Response Code Explanatory Text
110 Restart marker at MARK yyyy=mmmm (new file pointers).
120 Service ready in nnn minutes.
125 Data connection open, transfer starting.
150 File status okay; about to open data connection.
200 OK.
202 Command not implemented, superfluous at this site.
211 (System status reply).
212 (Directory status reply).
213 (File status reply).
214 (Help message reply).
215 (System type reply).
220 Service ready.
221 Log off network.
225 Data connection open.
226 Close data connection.
227 Enter passive mode (IP address, port ID).
230 Log on network.
250 File action completed.
257 Path name created.
331 Password required.
332 Account name required.
350 File action pending.
421 Service shutting down.
425 Cannot open data connection.
426 Connection closed.
450 File unavailable.
451 Local error encountered.
452 Insufficient disk space.
500 Invalid command.
501 Bad parameter.
502 Command not implemented.
503 Bad command sequence.
504 Parameter invalid for command.
530 Not logged onto network.
532 Need account for storing files.
550 File unavailable.
551 Page type unknown.
552 Storage allocation exceeded.
553 File name not allowed.
Related protocols: TELNET
Sponsor Source: FTP is defined by IETF (http://www.ietf.org) in RFC 959 and updated by RFCs 2228, 2640 and 2773.
WWW, HTTP, HTML, URL
The World Wide Web (WWW) is the worldwide connection of computer servers and a way of using the vast interconnected network to find and view information from around the world. The Internet uses a set of protocols, TCP/IP, for communication between computers. The TCP part determines how to break a message into small packets that travel across the Internet and then reassemble them at the other end.
The IP part determines how to get to other places on the Internet, that is, how each packet is addressed and routed to its destination.
The WWW uses an additional protocol called the HyperText Transfer Protocol (HTTP).
The main use of the Web is for information retrieval, whereby multimedia documents
are copied for local viewing.
Web documents are written in HyperText Markup Language (HTML), which describes, most importantly, where hypertext links are located within the document. These hyperlinks provide connections between documents, so that a simple click on a hypertext word or picture on a Web page allows your computer to extend across the Internet and bring the document to your computer.
The central repository having information the user wants is called a server, and your
(user’s) computer is a client of the server. Because HTML is predominantly a generic
descriptive language based around text, it does not define any graphical descriptors.
The common solution was to use bit-mapped graphical images in so-called GIF (Graphics Interchange Format) or JPEG (Joint Photographic Experts Group) format.
Every Web document has a descriptor, the Uniform Resource Locator (URL), which describes the address and file name of the page. The key to the Web is the browser program, which is used to retrieve and display Web documents. The browser is an Internet-compatible program and does three things for Web documents:
It uses Internet to retrieve documents from servers.
It displays these documents on your screen, using formatting specified in the
document.
It makes the displayed documents active.
The common browsers are Netscape Navigator from Netscape Communications (http://home.netscape.com/) and Internet Explorer from Microsoft (http://www.microsoft.com/).
To request a Web page on the Internet, you either click your mouse on a hyperlink or
type in the URL. The HTML file for the page is sent to your computer together with each
graphic image, sound sequence, or other special effect file that is mentioned in the
HTML file. Since some of these files may require special programming that has to be
added to your browser, you may have to download the program the first time you
receive one of these special files. These programs are called helper applications, add-
ons, or plug-ins.
A mechanism called MIME (Multipurpose Internet Mail Extensions), which allows a variety of standard file formats to be exchanged over the Internet using electronic mail, has been adopted for use with the WWW. When a user makes a selection through a hyperlink within an HTML document, the client browser sends the request to the designated web server. Assuming that the server accepts the request, it locates the appropriate file(s) and sends each one to the client with a short header at the top that includes the relevant MIME type. When the browser receives the information, it reads the MIME type. Common types such as text/html or image/gif are displayed directly in the browser window, because browsers have been built to handle them. For other MIME types, a local preference file is inspected to determine what (if any) local program (known as a helper application or plug-in) can display the information; this program is then launched with the data file, and the results are displayed in a newly opened application window. The important aspect of this
mechanism is that it achieves the delivery of semantic content to the user, who can
specify the style in which it will be displayed via the choice of an appropriate application
program.
Once connected to the Web, a variety of WWW directories and search engines are
available for those using the Web in a directed fashion. First, there are Web catalogues;
the best known of these is Yahoo, which organizes Web sites by subject classification.
You can either scroll through these subject categories or use the Yahoo search engine.
It also simultaneously forwards your search request to other leading search engines
such as AltaVista, Excite, and others. Alternatively, there are the Web databases, where
the contents of Web pages are indexed and searchable such as Lycos and InfoSeek.
Google is extremely comprehensive. Pages are ranked based on how many times they are linked from other pages; thus a Google search would bring you to the well-traveled
pages that match your search topic. HotBot is relatively comprehensive and regularly
updated. It offers form-based query tools that eliminate the need for you to formulate
query statements. Major search engines on the Web are listed in Table 3.2. Many other
specialized search engines can be found in EasySearcher 2
(http://www.easysearcher.com/ez2.html).
HTTP
Protocol Description
The Hypertext Transfer Protocol (HTTP) is an application-level protocol with the
lightness and speed necessary for distributed, collaborative, hypermedia information
systems. HTTP has been in use by the World-Wide Web global information initiative
since 1990.
HTTP allows an open-ended set of methods to be used to indicate the purpose of a
request. It builds on the discipline of reference provided by the Uniform Resource
Identifier (URI), as a location (URL) or name (URN), for indicating the resource on which
a method is to be applied. Messages are passed in a format similar to that used by
Internet Mail and the Multipurpose Internet Mail Extensions (MIME).
HTTP is also used as a generic protocol for communication between user agents and
proxies/gateways to other Internet protocols, such as SMTP, NNTP, FTP, Gopher and
WAIS, allowing basic hypermedia access to resources available from diverse
applications and simplifying the implementation of user agents.
The HTTP protocol is a request/response protocol. A client sends a request to the server
in the form of a request method, URI, and protocol version, followed by a MIME-like
message containing request modifiers, client information, and possible body content
over a connection with a server. The server responds with a status line, including the
message’s protocol version and a success or error code, followed by a MIME-like
message containing server information, entity meta information, and possible entity-body content.
The first version of HTTP, referred to as HTTP/0.9, was a simple protocol for raw data
transfer across the Internet. HTTP/1.0, as defined by RFC 1945, improved the protocol
by allowing messages to be in the format of MIME-like messages, containing meta
information about the data transferred and modifiers on the request/response
semantics. However, HTTP/1.0 does not sufficiently take into consideration the effects
of hierarchical proxies, caching, the need for persistent connections, or virtual hosts.
“HTTP/1.1” includes more stringent requirements than HTTP/1.0 in order to ensure
reliable implementation of its features.
There is a secure version of HTTP (S-HTTP) specification, which will be discussed in a
separate document.
Protocol Structure
HTTP messages consist of requests from client to server and responses from server to
client.
The request message has the following format:
Request-Line
*(( general-header | request-header | entity-header ) CRLF)
CRLF
[ message-body ]
The Request-Line begins with a method token, followed by the Request-URI and the
protocol version, and ends with CRLF. The elements are separated by SP characters. No
CR or LF is allowed except in the final CRLF sequence. The details of the general
header, request header and entity header can be found in the reference documents.
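The response message has the following format:
Status-Line
*(( general-header | response-header | entity-header ) CRLF)
CRLF
[ message-body ]
where Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF.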
The Status-Code element is a 3-digit integer result code of the attempt to understand
and satisfy the request. The Reason-Phrase is intended to give a short textual
description of the Status-Code. The Status-Code is intended for use by automata and
the Reason-Phrase is intended for the human user. The client is not required to
examine or display the Reason-Phrase. The details of the general header, response
header and entity header could be found in the reference documents.
Related protocols: WWW, FTP, SMTP, NNTP, Gopher, WAIS, DNS, S-HTTP
Sponsor Source
HTTP is defined by IETF (http://www.ietf.org) in RFC 1945 and 2616.
NETWORK SECURITY – GROUP POLICIES, FIREWALLS
Network security is an increasingly important factor in bioinformatics because of the
central role that online databases, applications, and groupware such as e-mail play in
the day-to-day operation of a bioinformatics facility. Opening an intranet to the outside world through username- and password-protected access may be the basis for collaboration as well as a weak point in the security of the organization. In addition, because many biometric laboratories are involved, even if indirectly, with applied genomics, there is a group of politically active opponents to this research. The computer-savvy members of these activist groups represent a potential threat to network security.
Every network presents a variety of security holes through which potential hackers and
disgruntled or simply curious employees can implement random threats, such as
viruses. Many of these threats are network- and operating system–specific. For
example, Microsoft typically announces a service pack within a few weeks after the
introduction of a server-based operating system to patch security holes discovered by
users.
The most secure method—physical isolation from outside networks—isn't usually a
viable option. Even a closed network without dial-in or any other wired access to other
networks can be breached by someone with enough motivation and time. For example,
wireless networks are notorious for their potential to disseminate data to nearby
listeners. A hacker with a high-gain antenna, receiver, and laptop computer can
monitor wireless network activity from a mile or more away. A similar setup, configured
to a slightly different frequency, can be used to reconstruct whatever data is displayed
on a video screen, including username and password information. Every cable,
peripheral, and display device emits a radio frequency signal that can be captured,
amplified, and read. For this reason, computer facilities used by military contractors are
frequently located in shielded, windowless rooms that minimize the chances of the
radiation emitted from a computer reaching someone who is monitoring the building.
Although it may be practically impossible to maintain security from professional
industrial spies, a variety of steps can be taken to minimize the threat posed by
modestly computer-savvy activists and the most common non-directed security threats.
These steps include using antiviral utilities, controlling access through the use of
advanced user-authentication technologies, firewalls, and, most importantly, low-level
encryption technologies.
Antiviral Utilities
In addition to threats from hackers, there is a constant threat of catastrophic loss of
data from viruses attached to documents from outside sources, even those from trusted
collaborators. The risk of virus infection can be minimized by installing virus-scanning
software on servers and locally on workstations. The downside to this often-unavoidable precaution is decreased performance of the computers running antiviral programs, as well as the need to maintain the virus-detection software to ensure that the latest virus definitions are installed.
Authentication
The most often used method of securing access to a network is to verify that users are
who they say they are. However, simple username and password protection at the
firewall and server levels can be defeated by someone who either can guess or otherwise
has access to the username and password information. A more secure option is to use a
synchronized, pseudorandom number generator for passwords. In this scheme, two
identical pseudorandom number generators, one running on a credit card–sized
computer and one running on a secure server, generate identical number sequences
that appear to be random to an observer.
The user carries a credit-card-sized secure ID card that displays the sequence on an LCD screen. When a user logs in to the computer network, she uses the displayed number sequence for her password, which is compared to the current number generated by a program running on the server. If the sequences match, she is allowed access to the server. Otherwise, she is locked out of the network. Because the number displayed on the ID card (and in the server) changes every 30 seconds, the current password doesn't provide a potential intruder with a way into the system. The major
security hole is that a secure ID card can be stolen, which will provide the thief with the
password, but not the username.
More sophisticated methods of user authentication involve biometrics, the automated
recognition of fingerprint, voice, retina, or facial features. Authentication systems based
on these methods aren't completely accurate, however, and there are often false
positives (imposters passing as someone else) and false negatives (an authentic user is
incorrectly rejected by the system) involved in the process. In addition to errors in
recognition, there are often ways of defeating biometric devices by bypassing the
image-processing components of the systems. For example, fingerprints are converted
into a number and letter sequence that serves as the key to gaining access to network
assets; anyone who can intercept that sequence and enter it directly into the system
can gain access to the network.
A researcher employed by a biotech firm to analyze nucleotide sequences probably has
no need to examine the files in a 3D protein visualization system in the laboratory a few
doors down from his office. Similarly, payroll, human resources, and other
administrative data may be of concern to the CFO, but not to the manager of the
microarray laboratory. Authentication provides the information necessary to provide
tiered access to networked resources. This access can be controlled at the workstation,
the server, and firewall levels to limit access to specific databases, applications, or
network databases.
Firewalls
As introduced in the discussion of network hardware, firewalls are stand-alone devices
or programs running on a server that block unauthorized access to a network.
Dedicated hardware firewalls are more secure than a software-only solution, but are
also considerably more expensive. Firewalls are commonly used in conjunction with
proxy servers to mirror servers inside a firewall, thereby intercepting requests and data
originally intended for an internal server. In this way, outside users can access copies of
some subset of the data on the system without ever having direct access to the data.
This practice provides an additional layer of security against hackers.
Encryption
Encryption, the process of making a message unintelligible to all but the intended
recipient, is one of the primary means of ensuring the security of messages sent
through the Internet and even in the same building. It's also one of the greatest
concerns—and limitations—of network professionals.
Many information services professionals are reluctant to install wireless networks
because of security concerns, for example.
Although cryptography—the study of encryption and decryption—predates computers
by several millennia, no one has yet devised a system that can't be defeated, given
enough time and resources.
Every form of encryption has tradeoffs of security versus processing and management
overhead, and different forms of encryption are used in different applications.
Of the encryption standards developed for the Internet, most are based on public key encryption (PKE) technology. One reason PKE is so prominent is that it is supported by the Microsoft Internet Explorer and Netscape Navigator browsers. PKE is a form of asymmetric encryption, in that the keys used for encryption and decryption are different. Aside from the complexity added by the use of different keys on the sending and receiving ends, the two forms of encryption and decryption are virtually identical.
C PROGRAMMING – ALGORITHM AND FLOW CHART
C is descended from B, a language derived from the ‘Basic Combined Programming Language’ (BCPL), which was developed in the 1960s at Cambridge University. B was modified by Dennis Ritchie, and the new language, named C, was implemented at Bell Laboratories in 1972. Since it was developed along with the UNIX operating system, it is strongly associated with UNIX, but C programs can also run under various other operating systems, including MS-DOS.
C is a middle-level language. It combines the features of a high-level language with the functionality of assembly language, reducing the gap between high-level and low-level languages; that is why it is known as a middle-level language. It is well suited for writing both application software and system software.
Importance of C:
It is a robust language.
Rich set of built-in functions and operators are available.
Easy to create complex programs.
Suitable to write both application software as well as system software because it
combines the features of assembly language with high level language features.
It is highly portable: C programs written for one computer can be run on another with little or no modification.
It is well suited for structured programming. This makes the programmer think in terms of modules or blocks, which in turn makes program debugging, testing and maintenance easier.
It can extend itself: a C program is a collection of functions, and programmers can add their own functions to the C library.
Programs written in C are efficient and fast. This is due to the variety of data types and powerful operators.
It is many times faster than BASIC. For example, a program to increment a variable from 0 to 15,000 takes about 1 second in C, whereas it takes more than 50 seconds in a BASIC interpreter (these timings date from much older hardware).
ANSI C has 32 keywords, and its strength lies in the built-in functions available for developing programs.
Algorithm:
An algorithm is a sequence of instructions that one must perform in order to solve a
well-formulated problem. We will specify problems in terms of their inputs and their
outputs, and the algorithm will be the method of translating the inputs into the
outputs. A well-formulated problem is unambiguous and precise, leaving no room for
misinterpretation. In order to solve a problem, some entity needs to carry out the steps
specified by the algorithm. A human with a pen and paper would be able to do this, but
humans are generally slow, make mistakes, and prefer not to perform repetitive work. A
computer is less intelligent but can perform simple steps quickly and reliably. A
computer cannot understand English, so algorithms must be rephrased in a
programming language such as C or Java in order to give specific instructions to the
processor.
Flow chart:
PERL – ALGORITHM AND FLOW CHART
The strength of PERL at character-string handling makes it suitable for sequence
analysis tasks in biology.
Here is a very simple PERL program to translate a nucleotide sequence into an amino
acid sequence according to the standard genetic code. The first line, #!/usr/bin/perl, is
a signal to the UNIX (or LINUX) operating system that what follows is a PERL program.
Within the program, all text commencing with a #, through to the end of the line on
which it appears, is merely comment. The line __END__ signals that the program is
finished and what follows is the input data.
(All material that the reader might find useful to have in computer-readable form, including all programs, appears on the web site associated with this book: http://www.oup.com/uk/lesk/bioinf)
#!/usr/bin/perl
#translate.pl -- translate nucleic acid sequence to protein sequence
# according to standard genetic code

# set up table of standard genetic code
%standardgeneticcode = (
    "ttt"=> "Phe", "tct"=> "Ser", "tat"=> "Tyr", "tgt"=> "Cys",
    "ttc"=> "Phe", "tcc"=> "Ser", "tac"=> "Tyr", "tgc"=> "Cys",
    "tta"=> "Leu", "tca"=> "Ser", "taa"=> "TER", "tga"=> "TER",
    "ttg"=> "Leu", "tcg"=> "Ser", "tag"=> "TER", "tgg"=> "Trp",
    "ctt"=> "Leu", "cct"=> "Pro", "cat"=> "His", "cgt"=> "Arg",
    "ctc"=> "Leu", "ccc"=> "Pro", "cac"=> "His", "cgc"=> "Arg",
    "cta"=> "Leu", "cca"=> "Pro", "caa"=> "Gln", "cga"=> "Arg",
    "ctg"=> "Leu", "ccg"=> "Pro", "cag"=> "Gln", "cgg"=> "Arg",
    "att"=> "Ile", "act"=> "Thr", "aat"=> "Asn", "agt"=> "Ser",
    "atc"=> "Ile", "acc"=> "Thr", "aac"=> "Asn", "agc"=> "Ser",
    "ata"=> "Ile", "aca"=> "Thr", "aaa"=> "Lys", "aga"=> "Arg",
    "atg"=> "Met", "acg"=> "Thr", "aag"=> "Lys", "agg"=> "Arg",
    "gtt"=> "Val", "gct"=> "Ala", "gat"=> "Asp", "ggt"=> "Gly",
    "gtc"=> "Val", "gcc"=> "Ala", "gac"=> "Asp", "ggc"=> "Gly",
    "gta"=> "Val", "gca"=> "Ala", "gaa"=> "Glu", "gga"=> "Gly",
    "gtg"=> "Val", "gcg"=> "Ala", "gag"=> "Glu", "ggg"=> "Gly"
);
# process input data
while ($line = <DATA>) {                                  # read in line of input
    print "$line";                                        # transcribe to output
    chomp($line);                                         # remove end-of-line character
    @triplets = unpack("a3" x (length($line)/3), $line);  # pull out successive triplets
    foreach $codon (@triplets) {                          # loop over triplets
        print "$standardgeneticcode{$codon}";             # print out translation of each
    }                                                     # end loop on triplets
    print "\n\n";                                         # skip line on output
}                                                         # end loop on input lines
# what follows is input data
__END__
atgcatccctttaat
tctgtctga
Running this program on the given input data produces the output:
atgcatccctttaat
MetHisProPheAsn

tctgtctga
SerValTER
Even this simple program displays several features of the PERL language. The file
contains background data (the genetic code translation table), statements that tell the
computer to do something with the input (i.e. the sequence to be translated), and the
input data (appearing after the __END__ line). Comments summarize sections of the
program, and also describe the effect of each statement. The program is structured as
blocks enclosed in curly brackets: {...}, which are useful in controlling the flow of
execution. Within blocks, individual statements (each ending in a ;) are executed in
order of appearance. The outer block is a loop:
while ($line = <DATA>) {
...
}
<DATA> refers to the lines of input data (appearing after the __END__). The block is
executed once for each line of input; that is, while there is any line of input remaining.
Three types of data structures appear in the program. The line of input data, referred to
as $line, is a simple character string. It is split into an array or vector of triplets. An
array stores several items in a linear order, and individual items of data can be
retrieved from their positions in the array. For ease of looking up the amino acid coded
for by any triplet, the genetic code is stored as an associative array. An associative
array, or hash table, is a generalization of a simple or sequential array. Whereas the elements of a simple array are indexed by consecutive integers, the elements of an associative array are indexed by arbitrary character strings, in this case the 64 triplets. We process the
input triplets in order of their appearance in the nucleotide sequence, but we need to
access the elements of the genetic code table in an arbitrary order as dictated by the
succession of triplets. A simple array or vector of character strings is appropriate for
processing successive triplets, and the associative array is appropriate for looking up
the amino acids that correspond to them.
STRUCTURE OF A C PROGRAM
The structure of a C program is a set of rules (a protocol) that the programmer follows while
writing a C program. The general basic structure of a C program is shown in the figure below.
Execution of the program begins in main ( ), whose body starts with a left brace "{" and ends
with a right brace "}". The local variable declarations and executable statements enclosed
within "{" and "}" are called the body of the main function. The main ( ) function can be
preceded by documentation, preprocessor statements and global declarations.
Documentation
The documentation section consists of a set of comment lines giving the name of the
program, the author and other details which the programmer would like to use later.
Preprocessor Statements
A preprocessor statement begins with the # symbol and is also called a preprocessor
directive. These statements instruct the preprocessor to include header files and to define
symbolic constants before the C program is compiled. Some of the preprocessor statements
are listed below.
Bts 205 bioinformatics

  • 1. PROF. BALASUBRAMANIAN SATHYAMURTHY 2016 EDITION BTS - 205: BIOINFORMATICS Contact for your free pdf & job opportunities theimprintbiochemistry@gmail.com or 9980494461 Page 1 of 183 FOR MSC BIOTECHNOLOGY STUDENTS 2014 ONWARDS Biochemistry scanner THE IMPRINT BTS - 205: BIOINFORMATICS As per Bangalore University (CBCS) Syllabus 2016 Edition BY: Prof. Balasubramanian Sathyamurthy Supported By: Ayesha Siddiqui Kiran K.S. THE MATERIALS FROM “THE IMPRINT (BIOCHEMISTRY SCANNER)” ARE NOT FOR COMMERCIAL OR BRAND BUILDING. HENCE ONLY ACADEMIC CONTENT WILL BE PRESENT INSIDE. WE THANK ALL THE CONTRIBUTORS FOR ENCOURAGING THIS. BE GOOD – DO GOOD & HELP OTHERS
  • 2. PROF. BALASUBRAMANIAN SATHYAMURTHY 2016 EDITION BTS - 205: BIOINFORMATICS Contact for your free pdf & job opportunities theimprintbiochemistry@gmail.com or 9980494461 Page 2 of 183 DEDICATIONDEDICATIONDEDICATIONDEDICATION I dedicate this mI dedicate this mI dedicate this mI dedicate this material to my spiritual guru Shri Raghavendra swamigal,aterial to my spiritual guru Shri Raghavendra swamigal,aterial to my spiritual guru Shri Raghavendra swamigal,aterial to my spiritual guru Shri Raghavendra swamigal, parents, teachers, well wishers and students who always increase my moraleparents, teachers, well wishers and students who always increase my moraleparents, teachers, well wishers and students who always increase my moraleparents, teachers, well wishers and students who always increase my morale and confidence to share myand confidence to share myand confidence to share myand confidence to share my knowledgeknowledgeknowledgeknowledge totototo reachreachreachreach all beneficiariesall beneficiariesall beneficiariesall beneficiaries.... PREFACEPREFACEPREFACEPREFACE Biochemistry scanner ‘THE IMPRINT’ consists of last ten years solved question paper of Bangalore University keeping in mind the syllabus and examination pattern of the University. The content taken from the reference books has been presented in a simple language for better understanding. The Author Prof. Balasubramanian Sathyamurthy has 15 years of teaching experience and has taught in 5 Indian Universities including Bangalore University and more than 20 students has got university ranking under his guidance. THE IMPRINT is a genuine effort by the students to help their peers with their examinations with the strategy that has been successfully utilized by them. These final year M.Sc students have proven their mettle in university examinations and are College / University rank holders. This is truly for the students, by the students. 
We thank all the contributors for their valuable suggestion in bringing out this book. We hope this will be appreciated by the students and teachers alike. Suggestions are welcomed. For any comments, queries, and suggestions and to get your free copy write us at theimprintbiochemistry@gmail.com or call 9980494461
  • 3. PROF. BALASUBRAMANIAN SATHYAMURTHY 2016 EDITION BTS - 205: BIOINFORMATICS Contact for your free pdf & job opportunities theimprintbiochemistry@gmail.com or 9980494461 Page 3 of 183 CONTRIBUTORS: CHETAN ABBUR ANJALI TIWARI AASHITA SINHA ASHWINI BELLATTI BHARATH K CHAITHRA GADIPARTHI VAMSEEKRISHNA KALYAN BANERJEE KAMALA KISHORE KIRAN KIRAN H.R KRUTHI PRABAKAR KRUPA S LATHA M MAMATA MADHU PRAKASHHA G D MANJUNATH .B.P NAYAB RASOOL S NAVYA KUCHARLAPATI NEHA SHARIFF DIVYA DUBEY NOOR AYESHA M PAYAL BANERJEE POONAM PANCHAL PRAVEEN PRAKASH K J M PRADEEP.R PURSHOTHAM PUPPALA DEEPTHI RAGHUNATH REDDY V RAMYA S RAVI RESHMA RUBY SHA SALMA H. SHWETHA B S SHILPI CHOUBEY SOUMOUNDA DAS SURENDRA N THUMMALA MANOJ UDAYASHRE. B DEEPIKA SHARMA EDITION : 2016 PRINT : Bangalore CONTACT : theimprintbiochemistry@gmail.com or 9980494461
  • 4. PROF. BALASUBRAMANIAN SATHYAMURTHY 2016 EDITION BTS - 205: BIOINFORMATICS Contact for your free pdf & job opportunities theimprintbiochemistry@gmail.com or 9980494461 Page 4 of 183 M. SC. BIOTECHNOLOGY – SECOND SEMESTER BTH - 205: BIOINFORMATICS 26 hrs UNIT – 1: INTRODUCTION TO COMPUTER Computer softwares – operating system – Windows, UNIX and Linux, Application software – Word processor, spread sheet. Introduction to statistical software (SPSS). (2 hrs) UNIT – 2: COMPUTER NETWORK AND PROGRAMMING LANGUAGES Structure, architecture, Advantages, types (LAN, MAN & WAN), Network protocols – Internal protocol (TCP /IP), File transfer protocols (FTP), WWW, HTTP, HTML, URL. Network Security – Group polices Fire –walls. C Programming and PERL – Algorithm and flow chart, Structure of C program, Header file, Global declaration. Main function variable declarations, Control statement – conditional and unconditional – sub functions. Introduction and application of PERL & Bioperl (6 hrs) UNIT – 3: DATABASES: Introduction - Relational Databases Management (RDMS) - Oracle, SQL, Database generation. (6 hrs) UNIT – 4: BIOLOGICAL DATABASES: Data mining and applications, accessing bibliographic databases – Pubmed, Nucleic acid sequence databank – NCBI and EMBL. Protein sequence databank – NBRF – PIR, SWISSPROT. Structural databases – Protein data Bank (PDB). KEGG: Kyoto Encyclopedia of Genes and Genomes (metabolic pathway data bank), Microbial genomic database (MBGD), Cell line database (ATCC), Virus data bank ( UICTV db). Sequence alignment – Global and Local alignment, scoring matrices. Restriciting mapping – WEB CUTTER & NEB CUTTER, Similarity searching ( FASTA and BLAST), Pair wise
  • 5. PROF. BALASUBRAMANIAN SATHYAMURTHY 2016 EDITION BTS - 205: BIOINFORMATICS Contact for your free pdf & job opportunities theimprintbiochemistry@gmail.com or 9980494461 Page 5 of 183 comparision of sequences, Multiple Sequence alignment of sequences, Identification of genes in genomes and Phylogenetic analysis with reference to nucleic acids and protein sequences, Identification of ORF’s. Identification of motifs. (10 hrs) UNIT - 5 PROTEIN STRUCTURE AND MOLECULAR INTERACTION: Introduction to protein structure – secondary structure prediction, tertiary structure prediction, protein modeling – principles of homology and comparative modeling. Threading structure evaluation and validation and ab intio Modelling, Applications – Rational Drug design and Molecular docking – Autodock. (5 hrs) References: 1. Dhananjaya (2002). Introduction to Bioinformatics, www.sd-bio.com series 2. Jan (2001). Nucleic acid research, Genome Database issue 3. Higgins & Taylor (2000). Bioinformatics, OUP. 4. Baxavanis (1998). Bioinformatics. 5. Fry, J.C. (1993). Biological Data Analysis. A practical Approach. IRL Press, Oxford. 6. Swardlaw, A.C. (1985). Practical Statistics for Experimental Biologists, Joh
  • 6. PROF. BALASUBRAMANIAN SATHYAMURTHY 2016 EDITION BTS - 205: BIOINFORMATICS Contact for your free pdf & job opportunities theimprintbiochemistry@gmail.com or 9980494461 Page 6 of 183 UNIT – 1: INTRODUCTION TO COMPUTER Computer softwares – operating system – Windows, UNIX and Linux, Application software – Word processor, spread sheet. Introduction to statistical software (SPSS). INTRODUCTION The computer comprises of technologically advanced hardware put together to work at great speed. To accomplish its various tasks, the computer is made of different parts, each serving a particular purpose in conjunction with other parts. In other words, a 'computer' is an ensemble of different machines that you will be using to accomplish your job. A computer is primarily made of the Central Processing Unit (usually referred to as the computer), the monitor, the keyboard and the mouse. Other pieces of hardware, commonly referred to as peripherals, can enhance or improve your experience with the computer. Evolution of computer technology The origin of computer technology took place in the 19th century. People desired to have a machine that would carry out mathematical calculations for them. The ABACUS is considered to have been the first computer in the world. It was used to perform simple measurements and calculations. ABACUS is available even today for school going children. In the 17th century, a scientist named Pascal developed a machine that could perform mathematical calculations. This machine comprised of a number of gears. The movement of gear mechanism was used to perform some calculations. He named the machine PASCALINE. However, the concept of the modern computer was propounded by the scientist and mathematician Charles Babbage. He first wrote on the use of logic and loops in process execution. Based on the concept of logic and loops, Babbage envisaged two models for performing computations- Analytical Engine and Difference Engine. 
In those days, electronics was not developed. Therefore, these models proposed by Babbage existed only on paper. However, the ideas given by Babbage were implemented after the invention of electronics. George Boolean developed the famous Boolean algebra based on binary numbers. De Morgan put forward theorems on logic gates. These theorems are known as De Morgan’s Theorems. Lady Ada was the first computer programmer.
  • 7. PROF. BALASUBRAMANIAN SATHYAMURTHY 2016 EDITION BTS - 205: BIOINFORMATICS Contact for your free pdf & job opportunities theimprintbiochemistry@gmail.com or 9980494461 Page 7 of 183 The real application of computers began in the late fifties. The computers were used in the United States for various applications such as census, defence, R&D, universities etc. Advantages of computers Compared to traditional systems, computers offer many noteworthy advantages. This is one reason that traditional systems are being replaced rapidly by computer- based systems. The main advantages offered by computers are as follows: High Accuracy Superior Speed of Operation Large Storage Capacity User-friendly Features Portability Platform independence Economical in the long term Types of computers Computers are classified in a variety of ways depending upon the principles of working, construction, size and applications. Various types of computers are discussed in this section. Digital and analog computers Analog computers The computers that process analog signals are known as Analog Computers. The analog signal is a continuous signal. For example, sine wave is an analog signal. The analog quantities are based on decimal number systems. Examples of Analog computers are the slide rule, ABACUS etc. The operational amplifiers are widely used in the construction of analog computers when the analog electrical signal is to be processed. For example, a differentiator is the op amp circuit that differentiates input signal. If the input signal V sin q is given to analog computer, the output would be V cos q. Accordingly, the analog computer that generates the second order differential equation can be drawn as given in Fig.
Digital computers
Computers that process digital signals are known as digital computers. A digital signal is a discrete signal with two states, 0 and 1. In practice, almost all computers in use today are digital rather than analog. Examples of digital computers are personal computers, supercomputers and mainframe computers.

Supercomputers
These are the most powerful computers in terms of speed of execution and storage capacity. NASA uses supercomputers to track and control space explorations.

Mainframe computers
These are next to supercomputers in terms of capacity. Mainframes are multi-terminal computers that can be shared simultaneously by multiple users; unlike personal computers, they offer time-sharing. For example, insurance companies use mainframe computers to process information about millions of their policyholders.

Minicomputers
These computers are also known as midrange computers. They are desk-sized machines used in medium-scale applications; for example, production departments use minicomputers to monitor various manufacturing processes and assembly-line operations.

Microcomputers
Compared to supercomputers, mainframes and minicomputers, microcomputers are the least powerful, but they are very widely used and rapidly gaining in popularity.
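The difference between an analog and a digital signal can be illustrated by sampling a sine wave and mapping each sample to one of the two states 0 and 1. This is a minimal sketch: the threshold of 0.5 and the 8 sample points are arbitrary choices, and real analog-to-digital converters usually produce a multi-bit number for each sample rather than a single bit.

```python
import math

def to_digital(samples, threshold=0.5):
    """Map each analog sample to one of two states: 1 if above the threshold, else 0."""
    return [1 if s > threshold else 0 for s in samples]

# One period of a sine wave (an analog signal), sampled at 8 evenly spaced points
analog = [math.sin(2 * math.pi * k / 8) for k in range(8)]
digital = to_digital(analog)

for a, d in zip(analog, digital):
    print(f"{a:+.3f} -> {d}")
print(digital)  # [0, 1, 1, 1, 0, 0, 0, 0]
```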
Personal Computer
PC is the term for a computer designed for use by a single person. PCs are also called microcomputers because they are smaller than mainframes and minicomputers. The term 'PC' is frequently used to refer to desktop computers. Although PCs are used by individuals, they can also be used in computer networks.

Desktop Computer
This is the most commonly used personal computer. It comprises a keyboard, mouse, monitor and system unit. The system unit, also known as the cabinet or chassis, is the container that houses most of the components, such as the motherboard, disk drives, ports, switch mode power supply and add-on cards. Desktop computers are available in two models: the horizontal model and the tower model.

Laptops
Laptops, also called notebook computers, are portable computers. They are about 8.5 × 11 inches in size and weigh about three to four kilograms.

Palmtops
Palmtops, also called handheld computers, are computing devices small enough to fit into your palm; a palmtop is about the size of an appointment book. Palmtops are generally used for personal tasks such as taking notes, keeping a list of friends, and keeping track of dates and agendas. A palmtop can also be connected to a PC for downloading data. It may also provide value-added features such as voice input, Internet access, a cell phone, camera, movie player and GPS.

Personal Digital Assistant (PDA)
A PDA is a palm-type computer. It combines pen input, handwriting recognition, personal organisational tools and communication capabilities in a small package.

COMPUTER ARCHITECTURE – INTERNAL AND EXTERNAL DEVICES
The basic elements of a computer system are the mouse, keyboard, monitor, memory, CPU (processor), motherboard, hard disk, speakers, modem and power supply.
Mouse: The mouse is used for operating the system. Nowadays the optical mouse is more popular than the older mechanical (ball) mouse.
Keyboard: The keyboard is used to input data into the system so that the system can give output to the user. The keyboard is therefore an integral part of the input system, and a computer is essentially incomplete without one.
Monitor: The monitor, another essential part of the computer system, displays the actions that the computer performs on our command.
Motherboard: The motherboard, also a necessary element of the computer system, holds components such as the memory, processor and modem, and provides slots for the graphics card and LAN card.
Hard disk: The hard disk is used to store data permanently on the computer.
Modem: The modem is used to connect to the Internet. Two types of modem are widely used: software modems and hardware modems.
Speakers: Speakers are also included among the basic elements of a computer. They are not indispensable, because a computer can perform its functions without them; however, we use them for multiple purposes.
Basic Computer Functioning
A computer can be defined as an electronic device that accepts data from an input device, processes it, stores it (for example, on a disk) and finally displays it on an output device such as a monitor.
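The input, process, store and output cycle in this definition can be illustrated with a toy model. The sketch below is a hypothetical accumulator machine invented for illustration: the operation names `INPUT`, `LOAD`, `ADD`, `STORE` and `OUTPUT` are assumptions, not a real instruction set, and for simplicity the program is kept in a separate Python list rather than inside the memory unit.

```python
def run(program, inputs):
    """Toy model of the input -> processor -> memory -> output flow.

    `program` is a list of (operation, operand) pairs for a hypothetical
    accumulator machine; `inputs` plays the role of the keyboard.
    """
    memory = {}            # memory unit: named storage cells
    outputs = []           # output device (e.g. the monitor)
    pending = list(inputs)
    acc = 0                # accumulator register inside the CPU
    for op, arg in program:
        if op == "INPUT":      # read the next value from the input device into memory
            memory[arg] = pending.pop(0)
        elif op == "LOAD":     # copy a memory cell into the accumulator
            acc = memory[arg]
        elif op == "ADD":      # arithmetic performed by the processor
            acc += memory[arg]
        elif op == "STORE":    # save the accumulator back to memory
            memory[arg] = acc
        elif op == "OUTPUT":   # send a memory cell to the output device
            outputs.append(memory[arg])
        else:
            raise ValueError(f"unknown operation: {op}")
    return outputs

# Add two numbers typed at the "keyboard" and display the sum
program = [
    ("INPUT", "a"), ("INPUT", "b"),
    ("LOAD", "a"), ("ADD", "b"), ("STORE", "sum"),
    ("OUTPUT", "sum"),
]
print(run(program, [7, 5]))  # [12]
```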
To understand the basic functioning of a computer, refer to the basic block diagram of a computer shown in Fig. This flow of information holds true for all types of computers, such as personal computers, laptops and palmtops; in other words, the fundamental principle of working is the same. As shown in Fig., there are four main building blocks in a computer's functioning: input, processor, output and memory. Data is entered through input devices such as the keyboard, disks or mouse, which convert data and programs into a form that the computer can process. The data received from the keyboard is processed by the CPU (Central Processing Unit), which controls and manipulates the data to produce information. The CPU chip is usually housed in a protective package. The processed data is either stored in memory or sent to an output device, as per the command given by the user. The memory unit holds data and the program instructions for processing it. Output devices translate the processed information into a form that we can understand.

Components of Computer System
Motherboard
The motherboard is the main component inside the case. It is a large rectangular board with integrated circuitry that connects the various parts of the computer, such as the CPU, RAM and disk drives (CD, DVD, hard disk or others), as well as any other peripherals connected via the ports or expansion slots. Components directly attached to the motherboard include:
• The central processing unit (CPU), which performs most of the calculations that enable a computer to function and is sometimes referred to as the "brain" of the computer. It is usually cooled by a heat sink and fan.
• The chipset, which aids communication between the CPU and the other components of the system, including main memory.
• RAM (Random Access Memory), which holds all running processes (applications) and the currently running operating system.
• The BIOS (Basic Input Output System), which includes the boot firmware and power-management routines. Once the operating system has loaded, many BIOS tasks are handled by operating system drivers.
• Internal buses, which connect the CPU to various internal components and to expansion cards for graphics and sound.
Current technology:
o The north bridge memory controller, for RAM and PCI Express
o PCI Express, for expansion cards such as graphics and physics processors, and high-end network interfaces
o PCI, for other expansion cards
o SATA, for disk drives
Obsolete technology:
o ATA (superseded by SATA)
o AGP (superseded by PCI Express)
o VLB, the VESA Local Bus (superseded by AGP)
o ISA (expansion card slot format obsolete in PCs but still used in industrial computers)
• External bus controllers, which support ports for external peripherals. These ports may be controlled directly by the south bridge I/O controller or by expansion cards attached to the motherboard through the PCI bus. Examples are USB, FireWire, SATA and SCSI.

POWER SUPPLY
A power supply unit (PSU) converts alternating current (AC) electric power to low-voltage direct current (DC) power for the internal components of the computer. Some power supplies have a switch to select between 230 V and 115 V input; other models have sensors that switch the input voltage automatically or can accept any voltage within these limits.
Power supply units used in computers are generally switch mode power supplies (SMPS). The SMPS provides regulated direct current power at the several voltages required by the motherboard and by accessories such as disk drives and cooling fans.

Removable media devices
• CD (compact disc): The most common type of removable media, suitable for music and data
• CD-ROM drive: A device used for reading data from a CD
• CD writer: A device used for both reading and writing data to and from a CD
• DVD (digital versatile disc): A popular type of removable media that is the same size as a CD but stores up to 12 times as much information; the most common way of transferring digital video, and popular for data storage
• DVD-ROM drive: A device used for reading data from a DVD
• DVD writer: A device used for both reading and writing data to and from a DVD
• DVD-RAM drive: A device used for rapid writing and reading of data from a special type of DVD
• Blu-ray Disc: A high-density optical disc format for data and high-definition video that can store 70 times as much information as a CD
• BD-ROM drive: A device used for reading data from a Blu-ray disc
• BD writer: A device used for both reading and writing data to and from a Blu-ray disc
• HD DVD: A discontinued competitor to the Blu-ray format
• Floppy disk: An outdated storage device consisting of a thin disk of a flexible magnetic storage medium, used today mainly for loading RAID drivers
• Iomega Zip drive: An outdated medium-capacity removable disk storage system, first introduced by Iomega in 1994
• USB flash drive: A flash memory data storage device integrated with a USB interface; typically small, lightweight, removable and rewritable, with capacities ranging from hundreds of megabytes (in the same ballpark as CDs) to tens of gigabytes (surpassing, at great expense, Blu-ray discs)
• Tape drive: A device that reads and writes data on a magnetic tape, used for long-term storage and backups

Secondary storage
This hardware keeps data inside the computer for later use and retains it even when the computer has no power.
• Hard disk: A device for medium-term storage of data
• Solid-state drive: A device similar in function to the hard disk, but with no moving parts; it typically stores data in flash memory
• RAID array controller: A device that manages several internal or external hard disks, and optionally some peripherals, to improve performance or reliability in what is called a RAID array

Sound card
This device enables the computer to output sound to audio devices, as well as to accept input from a microphone. Most modern computers have sound cards built into the motherboard, though it is common for a user to install a separate sound card as an upgrade. Most sound cards, whether built-in or added, have surround-sound capabilities.

Other peripherals
In addition, hardware devices can include external components of a computer system; the following are either standard or very common. They include various input and output devices, usually external to the computer system.
INPUT
Text input devices
• Keyboard: A device to input text and characters by pressing buttons (referred to as keys)
Pointing devices
• Mouse: A pointing device that detects two-dimensional motion relative to its supporting surface
• Optical mouse: Uses light to determine motion
• Trackball: A pointing device consisting of an exposed, protruding ball housed in a socket that detects rotation about two axes
• Touch screen: Senses the user pressing directly on the display
Gaming devices
• Joystick: A control device consisting of a handheld stick that pivots around one end, to detect angles in two or three dimensions
• Gamepad: A handheld game controller that relies on the fingers (especially the thumbs) to provide input
• Game controller: A specific type of controller specialized for certain gaming purposes
Image and video input devices
• Image scanner: A device that provides input by analysing images, printed text, handwriting or an object
• Webcam: A low-resolution video camera used to provide visual input that can be easily transferred over the Internet
Audio input devices
• Microphone: An acoustic sensor that provides input by converting sound into electrical signals
COMPUTER SOFTWARE
The computer performs its functions based on the instructions given by the user. A set of instructions written for a particular task is known as a computer program: it tells the computer how to process the data into the form desired by the user. The language in which a computer program is written is known as a programming language. Programming languages are classified as low-level and high-level languages, and low-level languages are further classified as machine language and assembly language.
Machine language is expressed in binary numbers, i.e. 0s and 1s, because the processor understands only binary numbers. However, it is difficult to read and write programs as 0s and 1s. Assembly language simplifies this by representing each operation code (op code) with a short symbolic name, or mnemonic. The op codes available depend on the type of processor. Before an assembly language program can run, it must be translated into machine language so that the processor can understand and execute it. The program that performs this translation is known as an assembler. Examples include the Microsoft Macro Assembler (MASM) and the assemblers for processors such as the Z-80, 8085 and 8086; each processor family needs its own assembler. Programming in assembly language requires knowledge of both the language and the computer hardware.
It is more convenient to write a program in a high-level language, whose instructions resemble simple English. Examples of high-level languages are BASIC, FORTRAN and COBOL. A compiler is a program that translates a high-level language into machine language.
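To illustrate what an assembler does, here is a minimal sketch for an invented processor. The 8-bit instruction format (a 4-bit op code followed by a 4-bit operand address) and the mnemonics `LOAD`, `ADD`, `STORE` and `HALT` are hypothetical; real assemblers such as MASM also handle labels, directives and many addressing modes.

```python
# Hypothetical 8-bit instruction format: 4-bit op code + 4-bit operand address.
# These mnemonics and op codes are invented for illustration only.
OPCODES = {"LOAD": 0b0001, "ADD": 0b0010, "STORE": 0b0011, "HALT": 0b1111}

def assemble(source: str) -> list[str]:
    """Translate assembly mnemonics into 8-bit machine-code strings."""
    machine_code = []
    for line in source.strip().splitlines():
        parts = line.split()
        mnemonic = parts[0].upper()
        operand = int(parts[1]) if len(parts) > 1 else 0
        if mnemonic not in OPCODES:
            raise ValueError(f"unknown mnemonic: {mnemonic}")
        if not 0 <= operand <= 15:
            raise ValueError(f"operand out of range: {operand}")
        # Op code in the high 4 bits, operand address in the low 4 bits
        word = (OPCODES[mnemonic] << 4) | operand
        machine_code.append(format(word, "08b"))
    return machine_code

program = """
LOAD 4
ADD 5
STORE 6
HALT
"""
for word in assemble(program):
    print(word)
```

Each printed line is one machine instruction; for example `LOAD 4` becomes `00010100`, the op code `0001` followed by the address `0100`.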
'Software' is another name for program; in most cases the terms 'software' and 'program' are interchangeable. There are two types of software: system software and application/utility software. Application software is end-user software; application programs are designed for general-purpose and special-purpose applications. An example of application software is Microsoft Internet Explorer. System software enables application software to interact with the computer hardware. It is the 'background' software that helps the computer manage its internal resources, and the most important system software is the operating system. System software performs important tasks such as running programs, storing data and processing data. Windows XP is an example of system software.

OPERATING SYSTEM – WINDOWS, UNIX AND LINUX
Disk Operating System (specifically) and disk operating system (generically), most often abbreviated as DOS, refer to operating system software, widely used on early personal computers, that provides the abstraction and management of secondary storage devices and the information on them (e.g., file systems for organizing files of all sorts). Such software is called a disk operating system because the storage devices it manages are made of rotating platters (such as hard disks or floppy disks). DOS is the medium through which the user and the external devices attached to the system communicate. It translates the commands issued by the user into a format the computer can process and instructs the computer to act accordingly. It also translates the results and any error messages into a format the user can understand.
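The translating role of DOS can be sketched as a tiny command interpreter: it reads a typed command, carries it out and reports results or error messages in a form the user understands. This is a hypothetical illustration only. The "disk" is an in-memory dictionary and only simplified `DIR` and `TYPE` commands are supported, although the error messages imitate those printed by MS-DOS.

```python
# A toy command interpreter in the spirit of DOS. The "disk" is an in-memory
# dictionary and only two simplified commands are supported.
disk = {"README.TXT": "Welcome to the lab PC.", "DATA.CSV": "id,value\n1,42"}

def interpret(command_line: str) -> str:
    """Translate a typed command into an action and return text for the user."""
    parts = command_line.strip().split(maxsplit=1)
    if not parts:
        return ""
    command = parts[0].upper()
    if command == "DIR":                    # list the files on the disk
        return "\n".join(sorted(disk))
    if command == "TYPE":                   # show the contents of one file
        if len(parts) < 2:
            return "Required parameter missing"
        return disk.get(parts[1].upper(), "File not found")
    return "Bad command or file name"       # error translated for the user

for line in ["dir", "type readme.txt", "type notes.txt", "format c:"]:
    print(f"C:\\> {line}")
    print(interpret(line))
```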
Features of Windows
Microsoft Windows is a series of operating systems with graphical user interfaces developed by Microsoft. Some of its important features are listed below:
• Faster operating system: Windows includes a set of tools designed to optimize the speed and efficiency of the computer, especially when used together.
• Improved reliability: Windows improves computer reliability by introducing new wizards, utilities and resources that help your system run smoothly.
• Innovative, easy-to-use features:
Windows makes your computer easier to use with new and enhanced features.

Overview of different versions of Windows
The different versions of Windows are discussed below.

WINDOWS 1.0
Microsoft released the first version of Windows in 1985. It marked a major breakthrough because it allowed users to move from the character-based (CUI), non-graphical MS-DOS to a GUI-based operating environment. The product incorporated a set of desktop applications, including the MS-DOS file management program, and value additions such as a calendar, card file, notepad, calculator, clock and telecommunications programs. It allowed users to work with multiple applications at the same time (multitasking).

WINDOWS 3.0
Microsoft released this version of Windows in 1990. Some of its main features were:
• A 16-bit operating environment running on top of MS-DOS, with support for advanced graphics
• Inclusion of Program Manager, File Manager and Print Manager
• A completely rewritten application development environment with new capabilities, native support for applications running in extended memory, and fully pre-emptive multitasking of MS-DOS programs
• Inclusion of the Windows software development kit (SDK), which let software developers focus more on writing applications and less on writing device drivers
• Improved Windows icons

WINDOWS NT 3.1
Microsoft released this version of Windows on July 27, 1993. This OS marked an important milestone for Microsoft. Some of its main features were:
• It was the first Windows operating system to combine support for high-end client/server business applications.
• It contained new built-in features for security, operating system power, performance, desktop scalability and reliability.
• It included support for multiprocessor (more than one CPU) architectures.
• Windows NT was geared towards business users and had a rich Application Programming Interface (API), which made it easier to run high-end engineering and scientific applications.

WINDOWS 95
Microsoft released this version of Windows in 1995. Some of its main features were:
• A 32-bit operating system with built-in Internet support
• Easy installation of hardware peripherals and software applications through plug-and-play capabilities
• Enhanced multimedia capabilities, more powerful features for mobile computing, and integrated networking

WINDOWS 98
Microsoft released this version of Windows in 1998. It is often described as an operating system that 'Works Better, Plays Better'. Some of its main features were:
• New features enabled easy access to Internet-related information.
• Multiple display support allowed several visual display units (VDUs) to be used simultaneously, enlarging the desktop and allowing different programs to run on separate monitors.
• USB support: the Universal Serial Bus made a computer easier to use with advanced plug-and-play capabilities, allowing devices to be added without restarting the computer each time.
• An accessibility wizard made it easier for physically challenged people to operate a computer without installing any special software.
• An extensive and easy-to-use self-help system was provided.

WINDOWS 2000 PROFESSIONAL
Microsoft released this version of Windows in 2000. It was an upgrade to Windows NT 4.0, designed to replace Windows 95, Windows 98 and Windows NT 4.0 on desktops and laptops. It brought major improvements in reliability, ease of use, Internet compatibility and support for mobile computing. It made hardware installation much easier by supporting a wide variety of new Plug and Play hardware, including advanced networking and wireless products, USB devices and infrared devices.

WINDOWS XP
Windows XP features user-friendly screens and simplified menus, among other features.
It was a major breakthrough for desktop operating systems. Two main versions of Windows XP were released: Windows XP Home Edition and Windows XP Professional Edition. Features of Windows XP are:
• Safe and easy personal computing:
Windows XP makes personal computing easy and enjoyable. Along with dependability and security, it offers power, performance, a fresh appearance and help tailored to the user's requirements.
• World of digital media: Work with digital media at home, at work and on the Internet; enjoy photography, music, videos, computer games and more.
• Connected home and office: Share files, photos, music, even a printer and Internet connection, all on a network that is private and secure.
• Best for business: Windows XP offers the established reliability of Microsoft Windows 2000, enhanced for high-speed performance and even greater consistency.

Text Processing Software
Text processing (word processing) software is one of the most significant application packages that run on Windows. It is used for creating documents such as drafts, letters, reports, essays and write-ups. Earlier, WordStar was used extensively for this purpose; however, the most commonly used word processing package today is Microsoft Word.
Microsoft Word is Microsoft's word processing software. It was first released in 1983 under the name Multi-Tool Word for Xenix systems. Versions were later written for several other platforms, including IBM PCs running DOS (1983), the Apple Macintosh (1984), SCO UNIX, OS/2 and Microsoft Windows (1989). It is a component of the Microsoft Office system, but it is also sold as a standalone product and included in Microsoft Works Suite. Beginning with the 2003 version, the branding was revised to emphasize Word's identity as a component of the Office suite, and Microsoft began calling it Microsoft Office Word instead of merely Microsoft Word.
Word 2007 for Windows and Word 2008 for Mac OS X were followed by Word 2010 for Windows, which returned to the name Microsoft Word. The significant features of MS Word are as follows:
• It is an easy and simple package for a general user.
• Features such as paragraph formatting, fonts, symbols, spell check, tables, drawing, bullets and numbering, and page numbering enable a user to produce an error-free document.
• MS Word saves documents in the .doc format (.docx from Word 2007 onward). Their content can be used in other applications such as MS Excel, MS Visual Studio 6.0, MS Visual Studio .NET and web browsers, or converted to PDF format.

APPLICATION SOFTWARE – WORD PROCESSOR, SPREAD SHEET
Word processing software
Word processing software is used for creating documents such as drafts, letters, reports, essays and write-ups. Earlier, WordStar was widely used for this purpose, and Sidekick and WordPerfect were also used for drafting letters. However, the most commonly used word processing package in the world is Microsoft Word, described above.

Spreadsheets
A spreadsheet is a computer application that simulates a paper worksheet. It displays multiple cells that together make up a grid of rows and columns, each cell containing either alphanumeric text or a numeric value. Spreadsheets are frequently used for financial information because they can recalculate the entire sheet automatically after a change is made to a single cell.

Microsoft Excel
Microsoft developed Excel on the Macintosh platform for several years, until it had grown into a powerful system. A port of Excel to Windows 2.0 resulted in a fully functional Windows spreadsheet. From the mid-1990s to the present, Microsoft Excel has dominated the commercial electronic spreadsheet market.

Presentation programs
Microsoft PowerPoint is a presentation program developed by Microsoft.
It is part of the Microsoft Office suite and runs on Microsoft Windows and Apple's Mac OS X computer operating systems. PowerPoint is widely used by business people, educators, students and trainers and is among the most prevalent forms of persuasive technology. Beginning with Microsoft Office 2003, Microsoft revised the branding to emphasize PowerPoint's place within the office suite, calling it Microsoft Office PowerPoint instead of just Microsoft PowerPoint.
The versions described here are Microsoft Office PowerPoint 2007 for Windows and PowerPoint 2008 for Mac.

Graphics programs
Computer graphics are graphics created with the aid of computers; the term also covers the representation and manipulation of pictorial data by a computer. The development of computer graphics has made computers more user-friendly and has made many types of data easier to understand and interpret. Developments in computer graphics have had a profound impact on many types of media and have revolutionized the animation and video game industries. The term computer graphics includes everything on computers that is not text or sound. Today nearly all computers use some graphics, and users expect to control their computer through icons and pictures rather than just by typing. Computer graphics has the following features:
• Representation and manipulation of pictorial data by a computer
• Development of technologies used to create and manipulate such pictorial data
• Digital synthesis and manipulation of visual content
Today computer-generated images touch many aspects of our daily life: computer imagery is found on television, in newspapers, in weather reports and during surgical procedures. A well-constructed graph can present complex statistics in a way that is easier to understand and interpret. Such graphs are used to illustrate papers, reports, theses and other presentation material, and a range of tools and facilities is available to help users visualize their data.
After data are collected on paper, the next step is to enter them into a computer system. While small data sets can easily be typed directly into a data matrix on the screen, computer forms can be used to enter large amounts of data. These computer forms are the on-screen analogue of case report forms.
Data are typed into the various fields of the form. Commercial programs that allow forms to be designed and used include, e.g., Microsoft Office Access and SAS. Epi Info™ is a freeware program (http://www.cdc.gov/epiinfo/, cf. Fig.) that uses the Access format to save data but has a graphical user interface that is easier to handle than that of Access. After data entry using forms, the data values are saved in data matrices: one row of the data matrix corresponds to one form, and each column of a row corresponds to one field of the form.
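The mapping from forms to a data matrix (one row per form, one column per field) can be sketched in a few lines of Python. The field names `PAT`, `SEX` and `AGE` and their values are hypothetical, and CSV is used only as an example of a plain-text format that statistics packages such as SPSS can typically import.

```python
import csv
import io

# Each completed form becomes one record of field -> value.
# Field names and values here are hypothetical.
FIELDS = ["PAT", "SEX", "AGE"]
forms = [
    {"PAT": 1, "SEX": 1, "AGE": 54},
    {"PAT": 2, "SEX": 2, "AGE": 61},
    {"PAT": 3, "SEX": 2},            # AGE left blank on this form
]

def to_matrix(records, fields):
    """One row per form, one column per field; blank fields become ''."""
    return [[record.get(field, "") for field in fields] for record in records]

matrix = to_matrix(forms, FIELDS)

# Write the data matrix as CSV, with the field names as the header row
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(FIELDS)
writer.writerows(matrix)
print(buffer.getvalue())
```

The blank field on the third form stays empty in the matrix, which matches the usual convention of leaving missing values empty.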
Electronic databases can usually be converted from one program to another, e.g., from a data entry program to a statistical software system such as SPSS, which will be used throughout the lecture. SPSS also offers a commercial program for designing case report forms and entering data ("SPSS Data Entry"). When building a database, no matter which program is used, the first step is to decide which variables it should contain. The second step is to define the properties of each variable. The following rules apply:
• The first variable should contain a unique patient identification number, which is also recorded on the case report forms.
• Each property that may vary between individuals can be considered a variable.
• With repeated measurements on individuals (e.g., before and after treatment) there are two alternatives:
o Wide format: one row (form) per individual; repeated measurements of the same property are recorded in multiple columns (or fields on the form), e.g., PAT = patient identification number, VAS1, VAS2, VAS3 = repeated measurements of VAS.
o Long format: one row per individual and measurement; repeated measurements of the same property are recorded in multiple rows of the same column, with a separate column defining the time of measurement, e.g., PAT = patient identification number, TIME = time of measurement, VAS = value of VAS at that time.
If, with the first alternative, the number of columns (fields) becomes so large that the computer forms become too complex, the second alternative is chosen. Note that data can always be restructured from the wide to the long format and vice versa.

SPSS
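The wide and long layouts described above can be converted into each other. SPSS has its own Restructure Data Wizard for this; the plain-Python sketch below only illustrates the idea, using made-up patient numbers and VAS values together with the column names PAT, VAS1 to VAS3, TIME and VAS from the example in the text.

```python
# Wide format: one row per patient, repeated VAS measurements in VAS1..VAS3.
# Patient numbers and VAS values are made up for illustration.
wide = [
    {"PAT": 101, "VAS1": 7, "VAS2": 5, "VAS3": 3},
    {"PAT": 102, "VAS1": 6, "VAS2": 6, "VAS3": 4},
]

def wide_to_long(rows, prefix="VAS", times=(1, 2, 3)):
    """One row per patient and measurement; TIME identifies the occasion."""
    return [
        {"PAT": row["PAT"], "TIME": t, prefix: row[f"{prefix}{t}"]}
        for row in rows
        for t in times
    ]

def long_to_wide(rows, prefix="VAS"):
    """Collect all measurements of each patient back into a single row."""
    by_patient = {}
    for row in rows:
        record = by_patient.setdefault(row["PAT"], {"PAT": row["PAT"]})
        record[f"{prefix}{row['TIME']}"] = row[prefix]
    return list(by_patient.values())

long_rows = wide_to_long(wide)
for row in long_rows:
    print(row)
print(long_to_wide(long_rows) == wide)  # True: the restructuring is reversible
```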
The statistical software package SPSS works with several windows:
• The data editor is used to build databases, to enter data by hand, to import data from other programs, to edit data and to perform interactive data analyses.
• The viewer collects the results of data analyses.
• The chart editor facilitates the modification of diagrams prepared by SPSS and allows individuals to be identified on a chart.
• Using the syntax editor, commands can be entered and collected in program scripts, which facilitates non-interactive, automated data analysis.
Data files are represented in the data editor, which consists of two tables (views). The data view shows the data matrix, with rows and columns corresponding to individuals and variables, respectively. The variable view contains the properties of the variables; it can be used to define new variables or to modify the properties of existing ones. These properties are:
• Name: a unique alphanumeric name for the variable, starting with an alphabetic character. The name may consist of alphabetic and numeric characters and the underscore ("_"); neither spaces nor special characters are allowed.
• Width: the maximum number of places.
• Decimals: the number of decimal places.
• Label: a description of the variable. The label is used to name the variable in all menus and in the output viewer.
• Values: labels assigned to the values of nominal or ordinal variables. The value labels replace the values in any category listing. Example: value 1 corresponds to the value label 'male' and value 2 to 'female'; data are entered as 1 and 2, but SPSS displays 'male' and 'female'. Using the Value Labels button on the toolbar, one can switch between displaying the values and the value labels in the data view. While value labels are displayed, one can choose directly from the defined value labels when entering data.
Missing: particular values may be defined as missing values. These values are not used in any analysis. Usually, missing data values are represented as empty fields: if a data value is missing for an individual, it is left empty in the data matrix.
Columns: the width of the data matrix column of a variable, as a number of characters. This value only affects the display, and it changes if a column is widened using the mouse.
Align: the alignment of the data matrix column of a variable (left/right/center).
Measure: nominal, ordinal or scale. Whether particular statistical operations (e.g., computing the mean) can be applied to a variable depends on its measure. The measure of a variable is called
o nominal if each observation belongs to one of a set of categories, and
o ordinal if these categories have a natural order.
SPSS calls the measure of a variable
o 'scale' if observations are numerical values that represent different magnitudes of the variable.
Usually (outside SPSS), such variables are called 'metric', 'continuous' or 'quantitative', in contrast to the 'qualitative' and 'semi-quantitative' nature of nominal and ordinal variables, respectively. Examples of nominal variables are sex, type of operation, and type of treatment. Examples of ordinal variables are treatment dose, tumor stage, and response rating. Examples of scale variables are height, weight, blood pressure, and age.
Type: the format in which data values are stored. The most important are the numeric, string, and date formats.
o Nominal and ordinal variables: choose the numeric type. Categories should be coded as 1, 2, 3, … or 0, 1, 2, … Value labels should be used to paraphrase the codes.
o Scale variables: choose the numeric type and pay attention to the correct number of decimal places, which also applies to all computed statistics (e.g., if a variable is defined with 2 decimal places and you compute its mean, the mean will be shown with 2 decimal places). However, computational accuracy is not affected by this setting.
o Date variables: choose the date type; otherwise SPSS will not be able to compute the time elapsed between two dates correctly.
The string format should only be used for text that will not be analyzed statistically (e.g., addresses, remarks). For nominal or ordinal variables, the numeric type should be used throughout, as it requires an exact definition of categories. The list of possible categories can be extended while entering data, and category codes can be recoded after data entry.
Example: Consider the variable "location" with the possible outcome categories "pancreas" and "stomach". How should this variable be defined?
Proper definition: The variable is defined properly if these categories are given two numeric codes, say 1 and 2. Value labels paraphrase these codes: in any output produced by SPSS, the value labels are used instead of the numeric codes. When entering data, the user may choose between any of the predefined outcome categories.
Improper definition: The variable is defined as a string of length 10. The user types alphabetical characters instead of choosing from a list (or entering numbers). This can easily lead to several versions of the same category: every distinct entry in the column "location" is treated as a separate category. Thus the program works with six different categories instead of two.
Further remarks applying to data entry with any program:
Numerical variables should only contain numbers, with no units of measurement (e.g., "kg", "mm Hg", "points") or other alphabetical or special characters. This is especially important if a spreadsheet program such as Microsoft Excel is used for data entry. Unlike real data base programs, spreadsheet programs allow the user to enter any type of data in any cell, so the user alone is responsible for the entries.
"True" missing values should be left empty rather than coded with special values (e.g., -999, -998, -997). If special codes are used, they must be defined as missing value codes, and they should also be defined as value labels. Special codes can be advantageous for "temporary" missing values (e.g., -999 = "ask patient", -998 = "ask nurse", -997 = "check CRF"). A missing value means that the value is missing. By contrast, in Microsoft® Office Excel® an empty cell is sometimes interpreted as zero.
Imprecise values can be characterized by adding a column showing the degree of certainty associated with each value (e.g., 1 = exact value, 0 = imprecise value). This allows the analyst to run two analyses: one with exact values only, and one that also uses the imprecise values. By no means should imprecisely collected data values be tagged with a question mark! This would turn the column into a string variable, and SPSS (or any other statistics program) would not be able to use it for analyses.
Enter numbers without separators (e.g., enter 1000 as 1000, not as 1,000).
If a variable is defined as numeric in a data base or statistics program, it is not possible to enter anything other than numbers. Programs that do not distinguish variable types (e.g., Excel) are therefore error-prone.
For a more sophisticated discussion of data management issues the reader is referred to Appendices A and B.
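The data entry rules above can be illustrated with a short Python sketch. It only mimics what a statistics program does and is not SPSS code. The spellings for the "location" example, the missing-value codes and the weights are all hypothetical.

```python
# Minimal sketch of three data entry rules:
# (1) free-text categories split into spurious categories, numeric codes do not;
# (2) special codes such as -999 must be declared missing before analysis;
# (3) units, thousands separators or question marks make an entry non-numeric.

# Free-text entry for "location": six different strings for two real categories.
typed = ["pancreas", "Pancreas", "pancreas ", "stomach", "Stomach", "stomac"]
print(len(set(typed)))  # 6 "categories" as far as a program is concerned

# Numeric coding with value labels: data are stored as 1 or 2, labels are only displayed.
VALUE_LABELS = {1: "pancreas", 2: "stomach"}
coded = [1, 1, 1, 2, 2, 2]
print(sorted(set(coded)))                    # [1, 2]: exactly two categories
print([VALUE_LABELS[c] for c in coded[:2]])  # ['pancreas', 'pancreas']

# Declared missing-value codes are excluded from computations.
MISSING_CODES = {-999, -998, -997}  # e.g. -999 = "ask patient"

def mean_ignoring_missing(values, missing_codes=MISSING_CODES):
    """Mean over values that are neither empty (None) nor declared missing codes."""
    valid = [v for v in values if v is not None and v not in missing_codes]
    return sum(valid) / len(valid) if valid else None

weights = [70.0, None, 82.0, -999, 64.0]
print(mean_ignoring_missing(weights))                 # 72.0
print(sum(w for w in weights if w is not None) / 4)   # -195.75 if -999 is not declared missing

def is_numeric_entry(text):
    """True if the entry can be read as a number without manual cleaning."""
    try:
        float(text)
        return True
    except ValueError:
        return False

print([is_numeric_entry(t) for t in ["1000", "1,000", "72 kg", "72?"]])
# [True, False, False, False]
```

The undeclared -999 code produces a negative "mean weight". This is the kind of silent error that declaring missing-value codes prevents.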
UNIT – 2: COMPUTER NETWORK AND PROGRAMMING LANGUAGES
Structure, architecture, advantages, types (LAN, MAN & WAN), Network protocols – Internet protocol (TCP/IP), File transfer protocols (FTP), WWW, HTTP, HTML, URL. Network Security – Group policies, Firewalls. C Programming and PERL – Algorithm and flow chart, Structure of C program, Header file, Global declaration, Main function, variable declarations, Control statements – conditional and unconditional – sub functions. Introduction and application of PERL & BioPerl
STRUCTURE, ARCHITECTURE, ADVANTAGES, TYPES (LAN, MAN & WAN)
A Local Area Network (LAN) is a data communications network connecting terminals, computers and printers within a building or other geographically limited area. These devices may be connected through wired cables or wireless links. Ethernet, Token Ring and Wireless LAN using IEEE 802.11 are examples of standard LAN technologies. Ethernet is by far the most commonly used LAN technology. Token Ring technology is still used by some companies. FDDI is sometimes used as a backbone LAN interconnecting Ethernet or Token Ring LANs. WLAN using IEEE 802.11 technologies is rapidly becoming the new leading LAN technology because of its mobility and ease of use.
Local Area Networks can be interconnected using Wide Area Network (WAN) or Metropolitan Area Network (MAN) technologies. Common WAN technologies include TCP/IP, ATM and Frame Relay. Common MAN technologies include SMDS and 10 Gigabit Ethernet.
LANs are traditionally used to connect a group of people who are in the same local area. However, working groups are becoming more geographically distributed in today's working environment.
In these cases, virtual LAN (VLAN) technologies allow people in different places to share the same networking resources. Local Area Network protocols operate mostly at the data link layer (layer 2). IEEE is the leading organization defining LAN standards.
Ethernet: IEEE 802.3 Local Area Network protocols
Protocol Description
Ethernet protocols refer to the family of local area networks (LANs) covered by the IEEE 802.3 group of standards. The Ethernet standard has two modes of operation: half-duplex and full-duplex. In half-duplex mode, data are transmitted using the popular Carrier Sense Multiple Access with Collision Detection (CSMA/CD) protocol on a shared medium. The main disadvantages of half-duplex operation are its efficiency and distance limitations: the link distance is limited by the minimum MAC frame size. This restriction reduces efficiency drastically at high transmission rates. Therefore, Gigabit Ethernet uses the carrier extension technique to ensure a minimum frame size of 512 bytes and so achieve a reasonable link distance.
Four data rates are currently defined for operation over optical fiber and twisted-pair cables:
10 Mbps: 10Base-T Ethernet (IEEE 802.3)
100 Mbps: Fast Ethernet (IEEE 802.3u)
1000 Mbps: Gigabit Ethernet (IEEE 802.3z)
10 Gbps: 10-Gigabit Ethernet (IEEE 802.3ae)
The general aspects of the Ethernet
The Ethernet system consists of three basic elements:
the physical medium used to carry Ethernet signals between computers;
a set of medium access control rules embedded in each Ethernet interface that allows multiple computers to fairly arbitrate access to the shared Ethernet channel; and
an Ethernet frame, a standardized set of bits used to carry data over the system.
As with all IEEE 802 protocols, the ISO data link layer is divided into two IEEE 802 sublayers: the Media Access Control (MAC) sublayer and the MAC-client sublayer. The IEEE 802.3 physical layer corresponds to the ISO physical layer.
The MAC sublayer has two primary responsibilities:
Data encapsulation, including frame assembly before transmission, and frame parsing/error detection during and after reception.
Media access control, including initiation of frame transmission and recovery from transmission failure.
The MAC-client sublayer may be one of the following:
Logical Link Control (LLC), which provides the interface between the Ethernet MAC and the upper layers in the protocol stack of the end station. The LLC sublayer is defined by the IEEE 802.2 standards.
Bridge entity, which provides LAN-to-LAN interfaces between LANs that use the same protocol (for example, Ethernet to Ethernet) and also between different protocols (for example, Ethernet to Token Ring). Bridge entities are defined by the IEEE 802.1 standards.
Each Ethernet-equipped computer operates independently of all other stations on the network: there is no central controller. All stations attached to an Ethernet are connected to a shared signaling system, also called the medium. To send data, a station first listens to the channel and, when the channel is idle, transmits its data in the form of an Ethernet frame, or packet. After each frame transmission, all stations on the network must contend equally for the next transmission opportunity. Access to the shared channel is determined by the medium access control (MAC) mechanism embedded in the Ethernet interface of each station. This mechanism is based on a system called Carrier Sense Multiple Access with Collision Detection (CSMA/CD).
As each Ethernet frame is sent onto the shared signal channel, all Ethernet interfaces look at the destination address. If the destination address of the frame matches the interface address, the frame is read entirely and delivered to the networking software running on that computer. All other network interfaces stop reading the frame when they discover that the destination address does not match their own address.
When it comes to how signals flow over the set of media segments that make up an Ethernet system, it helps to understand the topology of the system. The signal topology of the Ethernet is also known as the logical topology, to distinguish it from the actual physical layout of the media cables.
The logical topology of an Ethernet provides a single channel (or bus) that carries Ethernet signals to all stations. Multiple Ethernet segments can be linked together to form a larger Ethernet LAN using a signal amplifying and retiming device called a repeater. Through the use of repeaters, a given Ethernet system of multiple segments can grow as a “non-rooted branching tree.” “Non-rooted” means that the resulting system of linked segments may grow in any direction, and does not have a specific root segment. Most importantly, segments must never be connected in a loop. Every segment in the system must have two ends, since the Ethernet system will not operate correctly in the presence of loop paths.
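The rule that segments must never be connected in a loop means the segments and their repeater links must form a tree (the "non-rooted branching tree" above). A minimal sketch checks this with a union-find structure. It assumes that each repeater connection is modeled as a link between two named segments. The segment names are hypothetical.

```python
# Minimal sketch: verify that repeater links between Ethernet segments never close a loop.

def is_loop_free(segments, links):
    """Return True if the links form no loop (union-find over segment groups)."""
    parent = {s: s for s in segments}

    def find(s):
        # Follow parent pointers to the representative of s's connected group.
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving keeps lookups short
            s = parent[s]
        return s

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return False  # a and b are already connected, so this link closes a loop
        parent[root_a] = root_b  # merge the two groups
    return True


segments = ["A", "B", "C", "D"]
print(is_loop_free(segments, [("A", "B"), ("B", "C"), ("B", "D")]))  # True: a branching tree
print(is_loop_free(segments, [("A", "B"), ("B", "C"), ("C", "A")]))  # False: A-B-C-A is a loop
```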
Even though the media segments may be physically connected in a star pattern, with multiple segments attached to a repeater, the logical topology is still that of a single Ethernet channel that carries signals to all stations.
The basic IEEE 802.3 MAC data frame for 10/100 Mbps Ethernet:
Preamble (PRE): 7 bytes. The PRE is an alternating pattern of ones and zeros that tells receiving stations that a frame is coming, and that provides a means to synchronize the frame-reception portions of receiving physical layers with the incoming bit stream.
Start-of-frame delimiter (SFD): 1 byte. The SFD is an alternating pattern of ones and zeros, ending with two consecutive 1-bits indicating that the next bit is the left-most bit in the left-most byte of the destination address.
Destination address (DA): 6 bytes. The DA field identifies which station(s) should receive the frame.
Source address (SA): 6 bytes. The SA field identifies the sending station.
Length/Type: 2 bytes. This field indicates either the number of MAC-client data bytes contained in the data field of the frame, or the frame type ID if the frame is assembled using an optional format.
Data: a sequence of n bytes (46 ≤ n ≤ 1500) of any value. The minimum total frame size is 64 bytes.
Frame check sequence (FCS): 4 bytes. This sequence contains a 32-bit cyclic redundancy check (CRC) value, which is created by the sending MAC and recalculated by the receiving MAC to check for damaged frames.
MAC Frame with Gigabit Carrier Extension: 1000Base-X has a minimum frame size of 416 bytes, and 1000Base-T has a minimum frame size of 520 bytes. An extension field is used to fill frames that are shorter than the minimum length.
Related protocols: IEEE 802.3, 802.3u, 802.3z, 802.3ab, 802.2, 802.1, 802.3ae, 802.1D, 802.1G, 802.1Q, 802.1p
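The frame layout above can be illustrated with a Python sketch that assembles and checks a basic frame (DA, SA, Length/Type, padded Data, FCS). The preamble and SFD are added by the physical layer, so they are not built here. The sketch assumes the usual software convention for the FCS: `zlib.crc32` computes the CRC-32 used by Ethernet, and the result is appended least-significant byte first. The MAC addresses are made up. This is an illustration of the field sizes and checks, not a driver implementation.

```python
import struct
import zlib

MIN_DATA, MAX_DATA = 46, 1500
BROADCAST = b"\xff" * 6

def build_frame(dst: bytes, src: bytes, length_type: int, data: bytes) -> bytes:
    """Assemble DA | SA | Length/Type | Data (padded to 46 bytes) | FCS."""
    if len(dst) != 6 or len(src) != 6:
        raise ValueError("MAC addresses are 6 bytes")
    if len(data) > MAX_DATA:
        raise ValueError("data field exceeds 1500 bytes")
    data = data.ljust(MIN_DATA, b"\x00")                  # pad short payloads to 46 bytes
    body = dst + src + struct.pack("!H", length_type) + data
    return body + struct.pack("<I", zlib.crc32(body))    # 32-bit CRC over DA..Data

def receive(frame: bytes, my_mac: bytes):
    """Return the MAC-client data if the FCS is intact and the frame is for this station."""
    body, fcs = frame[:-4], frame[-4:]
    if struct.pack("<I", zlib.crc32(body)) != fcs:
        return None                                       # damaged frame: discard
    if body[:6] not in (my_mac, BROADCAST):
        return None                                       # addressed to another station
    length_type = struct.unpack("!H", body[12:14])[0]
    data = body[14:]
    # Values up to 1500 are read as a length and used to strip the padding;
    # larger values are treated here as a frame type ID.
    return data[:length_type] if length_type <= MAX_DATA else data

station_a = bytes.fromhex("020000000001")
station_b = bytes.fromhex("020000000002")
frame = build_frame(station_b, station_a, 5, b"hello")
print(len(frame))                      # 64: the minimum frame size
print(receive(frame, station_b))       # b'hello'
print(receive(frame, station_a))       # None: not addressed to station A
corrupted = frame[:20] + b"X" + frame[21:]
print(receive(corrupted, station_b))   # None: FCS mismatch
```

The 64-byte result is simply 6 + 6 + 2 + 46 + 4. This is why payloads shorter than 46 bytes must be padded.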
Sponsor Source: Ethernet standards are defined by IEEE (http://www.ieee.org) in the 802.3 specifications.
NETWORK PROTOCOLS
A network architecture is a blueprint of the complete computer communication network, which provides a framework and technology foundation for designing, building and managing a communication network. It typically has a layered structure. Layering is a modern network design principle that divides the communication tasks into a number of smaller parts, each accomplishing a particular sub-task and interacting with the other parts in a small number of well-defined ways. Layering allows the parts of a communication system to be designed and tested without a combinatorial explosion of cases, keeping each design relatively simple.
If a network architecture is open, no single vendor owns the technology and controls its definition and development. Anyone is free to design hardware and software based on the network architecture. The TCP/IP network architecture, on which the Internet is based, is such an open architecture. It has been adopted as a worldwide network standard and is widely deployed in local area networks (LANs), wide area networks (WANs), small and large enterprises, and, last but not least, the Internet.
The Open Systems Interconnection (OSI) network architecture, developed by the International Organization for Standardization, is an open standard for communication across equipment and applications from different vendors. Though not widely deployed, the OSI 7-layer model is considered the primary architectural model for inter-computing and inter-networking communications.
In addition to the OSI network architecture model, many vendors have their own network architecture models, such as IBM SNA (Systems Network Architecture), Digital Equipment Corporation (DEC; now part of HP) DNA (Digital Network Architecture), Apple Computer's AppleTalk, and Novell's NetWare. A network architecture provides only a conceptual framework for communication between computers. The model itself does not provide specific methods of communication. Actual communication is defined by various communication protocols.
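Layering can be made concrete with a toy sketch of encapsulation in the four-layer TCP/IP model. When sending, each layer treats everything from the layer above as opaque data and prepends its own header. When receiving, each layer strips only its own header. The bracketed text headers such as "[Transport]" are hypothetical placeholders; real TCP/IP headers are binary structures.

```python
# Minimal sketch of layered encapsulation and decapsulation.

LAYERS = ["Application", "Transport", "Network", "Network Access"]

def encapsulate(message: str) -> str:
    data = message
    for layer in LAYERS:                   # top to bottom
        data = f"[{layer}]" + data         # add this layer's header to the front
    return data

def decapsulate(frame: str) -> str:
    data = frame
    for layer in reversed(LAYERS):         # bottom to top
        header = f"[{layer}]"
        if not data.startswith(header):
            raise ValueError(f"missing {layer} header")
        data = data[len(header):]          # remove this layer's header only
    return data

wire = encapsulate("GET /index.html")
print(wire)               # [Network Access][Network][Transport][Application]GET /index.html
print(decapsulate(wire))  # GET /index.html
```

Each layer touches only its own header. That is what lets the layers be designed and tested independently.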
INTERNET PROTOCOL (TCP/IP)
The TCP/IP architecture does not exactly follow the OSI model. Unfortunately, there is no universal agreement on how to describe TCP/IP with a layered model. It is generally agreed that TCP/IP has fewer levels (from three to five layers) than the seven layers of the OSI model. We adopt a four-layer model for the TCP/IP architecture.
The TCP/IP architecture omits some features found in the OSI model, combines the features of some adjacent OSI layers and splits other layers apart. The 4-layer structure of TCP/IP is built as information is passed down from applications to the physical network layer. When data is sent, each layer treats all of the information it receives from the upper layer as data, adds control information (a header) to the front of that data and then passes it to the lower layer. When data is received, the opposite procedure takes place: each layer processes and removes its header before passing the data to the upper layer.
The TCP/IP 4-layer model and the key functions of each layer are described below:
Application Layer
The Application Layer in TCP/IP groups the functions of the OSI Application, Presentation and Session Layers. Therefore any process above the transport layer is called an application in the TCP/IP architecture. In TCP/IP, sockets and ports are used to describe the path over which applications communicate. Most application-level protocols are associated with one or more port numbers.
Transport Layer
In the TCP/IP architecture, there are two Transport Layer protocols. The Transmission Control Protocol (TCP) guarantees reliable information transmission. The User Datagram Protocol (UDP) transports datagrams without end-to-end reliability checking. Both protocols are useful for different applications.
Network Layer
The Internet Protocol (IP) is the primary protocol in the TCP/IP Network Layer. All upper and lower layer communications must travel through IP as they are passed through the TCP/IP protocol stack. In addition, there are many supporting protocols in the Network Layer, such as ICMP, to facilitate and manage the routing process.
Network Access Layer
In the TCP/IP architecture, the Data Link Layer and Physical Layer are normally grouped together to form the Network Access Layer. TCP/IP makes use of existing Data Link and Physical Layer standards rather than defining its own. Many RFCs describe how IP uses and interfaces with existing data link protocols such as Ethernet, Token Ring, FDDI, HSSI, and ATM. The physical layer, which defines the hardware communication properties, is not often directly interfaced with the TCP/IP protocols in the network layer and above.
FILE TRANSFER PROTOCOLS (FTP), WWW, HTTP, HTML, URL
FTP: File Transfer Protocol
Protocol Description
File Transfer Protocol (FTP) enables file sharing between hosts.
FTP uses TCP to create a virtual connection for control information and then creates a separate TCP connection for data transfers. The control connection uses an image of the TELNET protocol to exchange commands and messages between hosts. Access to files on some FTP servers is restricted: to retrieve files from these computers, you must know the address and have a user ID and a password. However, many computers are set up as anonymous FTP servers, where the user logs in as "anonymous" and gives his/her e-mail address as the password. Internet browsers such as Netscape Navigator and Internet Explorer support anonymous FTP: simply change the URL from http:// to ftp:// and follow it with the name of the FTP site you wish to go to. However, you must fill in your identity under the Options menu so that the browser can log you in as an anonymous user.
The programs and files on FTP sites are usually organized hierarchically in a series of directories. Those on anonymous FTP sites are often in a directory called pub (i.e., public). It is worth remembering that many FTP sites run on computers with a UNIX operating system, which is case sensitive. Many FTP servers supply text information when you log in, in addition to the readme file. You can get help at the FTP prompt by typing 'help' or '?'. Before you download or upload an image file, switch the transfer mode from ASCII (asc), used for text files, to binary (bin), used for image (graphic) files. Most files are stored in a compressed or zipped format. Some programs for compressing and uncompressing come as one integrated package, while others are two separate programs. The most commonly used compression programs are as follows:
The key functions of FTP are:
to promote sharing of files (computer programs and/or data);
to encourage indirect or implicit (via programs) use of remote computers;
to shield a user from variations in file storage systems among hosts; and
to transfer data reliably and efficiently.
FTP, though usable directly by a user at a terminal, is designed mainly for use by programs. FTP control frames are TELNET exchanges and can contain TELNET commands and option negotiation. However, most FTP control frames are simple ASCII text and can be classified as FTP commands or FTP messages.
FTP messages are responses to FTP commands and consist of a response code followed by explanatory text.
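The text format of the control connection can be sketched in Python. Commands are ASCII lines ending in CRLF. Each reply starts with a three-digit code, and the first digit gives the reply class defined in RFC 959. The sketch handles only single-line replies, which is a simplifying assumption. Real sessions would normally use a full client such as Python's standard `ftplib` module; this sketch shows only the line format.

```python
# Minimal sketch of FTP command formatting and single-line reply parsing.

REPLY_CLASSES = {
    "1": "positive preliminary",
    "2": "positive completion",
    "3": "positive intermediate",
    "4": "transient negative",
    "5": "permanent negative",
}

def format_command(verb: str, argument: str = "") -> bytes:
    """Build one control-connection command line, terminated by CRLF."""
    line = f"{verb.upper()} {argument}" if argument else verb.upper()
    return (line + "\r\n").encode("ascii")

def parse_reply(line: str):
    """Split a single-line reply into (code, reply class, explanatory text)."""
    code, _, text = line.rstrip("\r\n").partition(" ")
    if len(code) != 3 or not code.isdigit() or code[0] not in REPLY_CLASSES:
        raise ValueError(f"not a single-line FTP reply: {line!r}")
    return int(code), REPLY_CLASSES[code[0]], text

print(format_command("user", "anonymous"))       # b'USER anonymous\r\n'
print(parse_reply("331 Password required.\r\n"))
# (331, 'positive intermediate', 'Password required.')
print(parse_reply("550 File unavailable."))
# (550, 'permanent negative', 'File unavailable.')
```

In an anonymous login, the client sends USER anonymous, receives a 331 reply, then sends PASS with an e-mail address and expects a 230 reply.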
Protocol Structure
Command: Description
ABOR: Abort data connection process.
ACCT <account>: Account for system privileges.
ALLO <bytes>: Allocate bytes for file storage on server.
APPE <filename>: Append file to file of same name on server.
CDUP: Change to parent directory on server.
CWD <dir path>: Change working directory on server.
DELE <filename>: Delete specified file on server.
HELP <command>: Return information on specified command.
LIST <name>: List information if name is a file, or list files if name is a directory.
MODE <mode>: Transfer mode (S=stream, B=block, C=compressed).
MKD <directory>: Create specified directory on server.
NLST <directory>: List contents of specified directory.
NOOP: Cause no action other than acknowledgement from server.
PASS <password>: Password for system log-in.
PASV: Request server to wait for data connection.
PORT <address>: IP address and two-byte system port ID.
PWD: Display current working directory.
QUIT: Log off from the FTP server.
REIN: Reinitialize connection to log-in status.
REST <offset>: Restart file transfer from given offset.
RETR <filename>: Retrieve (copy) file from server.
RMD <directory>: Remove specified directory on server.
RNFR <old path>: Rename from old path.
RNTO <new path>: Rename to new path.
SITE <params>: Site-specific parameters provided by server.
SMNT <pathname>: Mount the specified file structure.
STAT <directory>: Return information on current process or directory.
STOR <filename>: Store (copy) file to server.
STOU: Store file on server under a unique name.
STRU <type>: Data structure (F=file, R=record, P=page).
SYST: Return operating system used by server.
TYPE <data type>: Data type (A=ASCII, E=EBCDIC, I=binary).
USER <username>: User name for system log-in.
Standard FTP messages are as follows:
Response Code: Explanatory Text
110: Restart marker at MARK yyyy=mmmm (new file pointers).
120: Service ready in nnn minutes.
125: Data connection open, transfer starting.
150: Open connection.
200: OK.
202: Command not implemented (superfluous at this site).
211: (System status reply).
212: (Directory status reply).
213: (File status reply).
214: (Help message reply).
215: (System type reply).
220: Service ready.
221: Log off network.
225: Data connection open.
226: Close data connection.
227: Enter passive mode (IP address, port ID).
230: Log on network.
250: File action completed.
257: Path name created.
331: Password required.
332: Account name required.
350: File action pending.
421: Service shutting down.
425: Cannot open data connection.
426: Connection closed.
450: File unavailable.
451: Local error encountered.
452: Insufficient disk space.
500: Invalid command.
501: Bad parameter.
502: Command not implemented.
503: Bad command sequence.
504: Parameter invalid for command.
530: Not logged onto network.
532: Need account for storing files.
550: File unavailable.
551: Page type unknown.
552: Storage allocation exceeded.
553: File name not allowed.
Related protocols: TELNET
Sponsor Source: FTP is defined by IETF (http://www.ietf.org) in RFC 959 and updated by RFCs 2228, 2640 and 2773.
WWW, HTTP, HTML, URL
The World Wide Web (WWW) is the worldwide connection of computer servers and a way of using the vast interconnected network to find and view information from around the world. The Internet uses a protocol suite, TCP/IP, for communication between computers. The TCP part determines how to split a message into small packets that travel over the Internet and how to reassemble them at the other end. The IP part determines how to reach other places on the Internet. The WWW uses an additional protocol called the HyperText Transfer Protocol (HTTP). The main use of the Web is information retrieval, whereby multimedia documents are copied for local viewing. Web documents are written in HyperText Markup Language (HTML), which describes, most importantly, where hypertext links are located within the document. These hyperlinks provide connections between documents, so that a simple click on a hypertext word or picture on a Web page makes your computer reach across the Internet and bring the linked document to your computer. The central repository holding the information the user wants is called a server, and the user's computer is a client of the server. Because HTML is predominantly a generic descriptive language based around text, it does not define any graphical descriptors.
The common solution was to use bit-mapped graphical images in the so-called GIF (Graphics Interchange Format) or JPEG (Joint Photographic Experts Group) formats. Every Web document has a descriptor, the Uniform Resource Locator (URL), which describes the address and file name of the page. The key to the Web is the browser program, which is used to retrieve and display Web documents. The browser is an Internet-compatible program and does three things for Web documents:
It uses the Internet to retrieve documents from servers.
It displays these documents on your screen, using the formatting specified in the document.
It makes the displayed documents active.
The common browsers are Netscape Navigator from Netscape Communications (http://home.netscape.com/) and Internet Explorer from Microsoft (http://www.microsoft.com/). To request a Web page on the Internet, you either click on a hyperlink or type in the URL. The HTML file for the page is sent to your computer together with each graphic image, sound sequence, or other special-effect file that is referenced in the HTML file. Since some of these files may require special programs that have to be added to your browser, you may have to download such a program the first time you receive one of these special files. These programs are called helper applications, add-ons, or plug-ins. A mechanism called MIME (Multipurpose Internet Mail Extensions), which allows a variety of standard file formats to be exchanged over the Internet using electronic mail, has been adopted for use with the WWW. When a user makes a selection through a hyperlink within an HTML document, the client browser sends the request to the designated web server.
Assuming that the server accepts the request, it locates the appropriate file(s) and sends each one to the client with a short header at the top, including the relevant MIME header. When the browser receives the information, it reads the MIME type, such as text/html or image/gif. Browsers are built to display such types directly in the browser window. For other MIME types, a local preference file is inspected to determine which local program, if any (known as a helper application or plug-in), can display the information. This program is then launched with the data file, and the results are displayed in a newly opened application window. The important aspect of this mechanism is that it delivers semantic content to the user, who can specify the style in which it will be displayed by choosing an appropriate application program.
Once connected to the Web, a variety of WWW directories and search engines are available for those using the Web in a directed fashion. First, there are Web catalogues; the best known of these is Yahoo, which organizes Web sites by subject classification. You can either scroll through these subject categories or use the Yahoo search engine, which also forwards your search request to other leading search engines such as AltaVista and Excite. Second, there are Web databases, in which the contents of Web pages are indexed and searchable, such as Lycos and InfoSeek. Google is extremely comprehensive. Pages are ranked largely by the links pointing to them from other pages, so a Google search brings you to the well-traveled pages that match your search topic. HotBot is relatively comprehensive and regularly updated. It offers form-based query tools that eliminate the need for you to formulate query statements. Major search engines on the Web are listed in Table 3.2. Many other specialized search engines can be found in EasySearcher 2 (http://www.easysearcher.com/ez2.html).
HTTP
Protocol Description
The Hypertext Transfer Protocol (HTTP) is an application-level protocol with the lightness and speed necessary for distributed, collaborative, hypermedia information systems. HTTP has been in use by the World-Wide Web global information initiative since 1990. HTTP allows an open-ended set of methods to be used to indicate the purpose of a request.
It builds on the discipline of reference provided by the Uniform Resource Identifier (URI), as a location (URL) or name (URN), for indicating the resource on which a method is to be applied. Messages are passed in a format similar to that used by Internet Mail and the Multipurpose Internet Mail Extensions (MIME). HTTP is also used as a generic protocol for communication between user agents and proxies/gateways to other Internet protocols, such as SMTP, NNTP, FTP, Gopher and WAIS, allowing basic hypermedia access to resources available from diverse applications and simplifying the implementation of user agents.
The HTTP protocol is a request/response protocol. A client sends a request to the server in the form of a request method, URI, and protocol version, followed by a MIME-like message containing request modifiers, client information, and possible body content, over a connection with the server. The server responds with a status line, including the message's protocol version and a success or error code, followed by a MIME-like message containing server information, entity meta-information, and possible entity-body content.

The first version of HTTP, referred to as HTTP/0.9, was a simple protocol for raw data transfer across the Internet. HTTP/1.0, as defined by RFC 1945, improved the protocol by allowing messages to be in the format of MIME-like messages, containing meta-information about the data transferred and modifiers on the request/response semantics. However, HTTP/1.0 does not sufficiently take into consideration the effects of hierarchical proxies, caching, the need for persistent connections, or virtual hosts. HTTP/1.1 includes more stringent requirements than HTTP/1.0 in order to ensure reliable implementation of its features. A secure version of HTTP (S-HTTP) is described in a separate specification.

Protocol Structure
HTTP messages consist of requests from client to server and responses from server to client. The request message has the following format:

Request = Request-Line
          *(( general-header | request-header | entity-header ) CRLF)
          CRLF
          [ message-body ]

The Request-Line begins with a method token, followed by the Request-URI and the protocol version, and ends with CRLF:

Request-Line = Method SP Request-URI SP HTTP-Version CRLF

The elements are separated by SP characters. No CR or LF is allowed except in the final CRLF sequence. The details of the general header, request header and entity header can be found in the reference documents.
The response message has the following format:

Response = Status-Line
           *(( general-header | response-header | entity-header ) CRLF)
           CRLF
           [ message-body ]

Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF

The Status-Code element is a 3-digit integer result code of the attempt to understand and satisfy the request. The Reason-Phrase is intended to give a short textual description of the Status-Code. The Status-Code is intended for use by automata and the Reason-Phrase is intended for the human user. The client is not required to examine or display the Reason-Phrase. The details of the general header, response header and entity header can be found in the reference documents.

Related protocols: WWW, FTP, SMTP, NNTP, Gopher, WAIS, DNS, S-HTTP

Sponsor Source: HTTP is defined by the IETF (http://www.ietf.org) in RFC 1945 and RFC 2616.

NETWORK SECURITY – GROUP POLICIES, FIREWALLS

Network security is an increasingly important factor in bioinformatics because of the central role that online databases, applications, and groupware such as e-mail play in the day-to-day operation of a bioinformatics facility. Opening an intranet to the outside world through username- and password-protected restricted access may be the basis for collaboration, but it is also a weak point in the security of the organization. In addition, because many bioinformatics laboratories are involved, even if indirectly, with applied genomics, there is a group of politically active opponents to this research. The computer-savvy members of these activist groups represent a potential threat to network security.

Every network presents a variety of security holes through which potential hackers and disgruntled or simply curious employees can implement random threats, such as viruses. Many of these threats are specific to a particular network or operating system. For example, Microsoft typically announces a service pack within a few weeks after the introduction of a server-based operating system to patch security holes discovered by users. The most secure method, physical isolation from outside networks, isn't usually a viable option. Even a closed network without dial-in or any other wired access to other networks can be breached by someone with enough motivation and time.
For example, wireless networks are notorious for their potential to disseminate data to nearby listeners. A hacker with a high-gain antenna, receiver, and laptop computer can monitor wireless network activity from a mile or more away. A similar setup, tuned to a slightly different frequency, can be used to reconstruct whatever data is displayed on a video screen, including username and password information. Every cable, peripheral, and display device emits a radio-frequency signal that can be captured, amplified, and read. For this reason, computer facilities used by military contractors are frequently located in shielded, windowless rooms that minimize the chances of the radiation emitted from a computer reaching someone who is monitoring the building.

Although it may be practically impossible to maintain security against professional industrial spies, a variety of steps can be taken to minimize the threat posed by modestly computer-savvy activists and the most common non-directed security threats. These steps include using antiviral utilities, controlling access through advanced user-authentication technologies, firewalls, and, most importantly, low-level encryption technologies.

Antiviral Utilities
In addition to threats from hackers, there is a constant threat of catastrophic loss of data from viruses attached to documents from outside sources, even those from trusted collaborators. The risk of virus infection can be minimized by installing virus-scanning software on servers and locally on workstations. The downside to this often-unavoidable precaution is decreased performance of the computers running antiviral programs, as well as the need to maintain the virus-detection software to ensure that the latest virus definitions are installed.

Authentication
The most often used method of securing access to a network is to verify that users are who they say they are. However, simple username and password protection at the firewall and server levels can be defeated by someone who can guess, or otherwise has access to, the username and password information. A more secure option is to use a synchronized pseudorandom number generator for passwords.
In this scheme, two identical pseudorandom number generators, one running on a credit-card-sized computer and one running on a secure server, generate identical number sequences that appear random to an observer. The user carries a credit-card-sized secure ID card that displays the current number on an LCD screen. When a user logs in to the computer network, she enters the displayed number as her password, and it is compared with the number currently generated by a program running on the server. If the numbers match, she is allowed access to the server; otherwise, she is locked out of the network. Because the number displayed on the ID card, and in the server, changes every 30 seconds, a password observed by an intruder quickly becomes useless and does not provide a way in to the system. The major security hole is that a secure ID card can be stolen, which will provide the thief with the password, but not the username.

More sophisticated methods of user authentication involve biometrics, the automated recognition of fingerprint, voice, retina, or facial features. Authentication systems based on these methods aren't completely accurate, however, and they produce both false positives (impostors accepted as someone else) and false negatives (authentic users incorrectly rejected by the system). In addition to errors in recognition, there are often ways of defeating biometric devices by bypassing the image-processing components of the systems. For example, fingerprints are converted into a number and letter sequence that serves as the key to gaining access to network assets; anyone who can intercept that sequence and enter it directly into the system can gain access to the network.

Not every authenticated user needs access to every resource. A researcher employed by a biotech firm to analyze nucleotide sequences probably has no need to examine the files in a 3D protein visualization system in the laboratory a few doors down from his office. Similarly, payroll, human resources, and other administrative data may be of concern to the CFO, but not to the manager of the microarray laboratory. Authentication provides the information necessary to grant tiered access to networked resources. This access can be controlled at the workstation, server, and firewall levels to limit access to specific databases, applications, or other network resources.

Firewalls
As introduced in the discussion of network hardware, firewalls are stand-alone devices or programs running on a server that block unauthorized access to a network.
Dedicated hardware firewalls are more secure than a software-only solution, but are also considerably more expensive. Firewalls are commonly used in conjunction with proxy servers that mirror servers inside the firewall, thereby intercepting requests and data originally intended for an internal server. In this way, outside users can access copies of some subset of the data on the system without ever having direct access to the data. This practice provides an additional layer of security against hackers.

Encryption
Encryption, the process of making a message unintelligible to all but the intended recipient, is one of the primary means of ensuring the security of messages sent through the Internet and even within the same building. It is also one of the greatest concerns, and limitations, of network professionals. Many information services professionals are reluctant to install wireless networks because of security concerns, for example. Although cryptography, the study of encryption and decryption, predates computers by several millennia, no one has yet devised a system that can't be defeated, given enough time and resources. Every form of encryption trades security against processing and management overhead, and different forms of encryption are used in different applications.

Of the encryption standards developed for the Internet, most are based on public key encryption (PKE) technology. One reason that PKE is so prominent is that it is supported by the Microsoft Internet Explorer and Netscape Navigator browsers. PKE is a form of asymmetric encryption, in that the keys used for encryption and decryption are different. Aside from the complexity added by the use of different keys at the sending and receiving ends, the two forms of encryption and decryption are virtually identical.

C PROGRAMMING – ALGORITHM AND FLOW CHART

C is descended from a language called B, an offspring of the Basic Combined Programming Language (BCPL) developed in the 1960s at Cambridge University. B was modified by Dennis Ritchie and implemented at Bell Laboratories in 1972; the new language was named C. Since it was developed along with the UNIX operating system, it is strongly associated with UNIX, but it can run under various operating systems, including MS-DOS. C is a middle-level language: it combines the features of a high-level language with functionality like that of assembly language.
It reduces the gap between high-level and low-level languages, which is why it is known as a middle-level language. It is well suited for writing both application software and system software.

Importance of C:
It is a robust language, with a rich set of built-in functions and operators.
It makes it easy to create complex programs.
It is suitable for writing both application software and system software, because it combines the features of assembly language with those of a high-level language.
It is highly portable: C programs written for one computer can be run on another with little or no modification.
It is well suited for structured programming. This makes the programmer think in terms of modules or blocks, which in turn makes program debugging, testing and maintenance easier.
It can extend itself: a C program is basically a collection of functions supported by the C library, and programmers can add their own functions to that library.
Programs written in C are efficient and fast, owing to the variety of data types and powerful operators. C is many times faster than BASIC; for example, a program to increment a variable from 0 to 15,000 takes 1 second in C, whereas it takes more than 50 seconds in the BASIC interpreter.
C has 32 keywords, and its strength lies in the built-in functions available for developing programs.

Algorithm: An algorithm is a sequence of instructions that one must perform in order to solve a well-formulated problem. We will specify problems in terms of their inputs and their outputs, and the algorithm will be the method of translating the inputs into the outputs. A well-formulated problem is unambiguous and precise, leaving no room for misinterpretation. In order to solve a problem, some entity needs to carry out the steps specified by the algorithm. A human with a pen and paper would be able to do this, but humans are generally slow, make mistakes, and prefer not to perform repetitive work. A computer is less intelligent but can perform simple steps quickly and reliably. A computer cannot understand English, so algorithms must be rephrased in a programming language such as C or Java in order to give specific instructions to the processor.

Flow chart:
PERL – ALGORITHM AND FLOW CHART

The strength of PERL at character-string handling makes it suitable for sequence analysis tasks in biology. Here is a very simple PERL program to translate a nucleotide sequence into an amino acid sequence according to the standard genetic code. The first line, #!/usr/bin/perl, is a signal to the UNIX (or LINUX) operating system that what follows is a PERL program. Within the program, all text commencing with a #, through to the end of the line on which it appears, is merely comment. The line __END__ signals that the program is finished and what follows is the input data. (All material that the reader might find useful to have in computer-readable form, including all programs, appears in the web site associated with the original textbook: http://www.oup.com/uk/lesk/bioinf)

#!/usr/bin/perl
# translate.pl -- translate nucleic acid sequence to protein sequence
#                 according to standard genetic code

# set up table of standard genetic code
%standardgeneticcode = (
  "ttt"=> "Phe", "tct"=> "Ser", "tat"=> "Tyr", "tgt"=> "Cys",
  "ttc"=> "Phe", "tcc"=> "Ser", "tac"=> "Tyr", "tgc"=> "Cys",
  "tta"=> "Leu", "tca"=> "Ser", "taa"=> "TER", "tga"=> "TER",
  "ttg"=> "Leu", "tcg"=> "Ser", "tag"=> "TER", "tgg"=> "Trp",
  "ctt"=> "Leu", "cct"=> "Pro", "cat"=> "His", "cgt"=> "Arg",
  "ctc"=> "Leu", "ccc"=> "Pro", "cac"=> "His", "cgc"=> "Arg",
  "cta"=> "Leu", "cca"=> "Pro", "caa"=> "Gln", "cga"=> "Arg",
  "ctg"=> "Leu", "ccg"=> "Pro", "cag"=> "Gln", "cgg"=> "Arg",
  "att"=> "Ile", "act"=> "Thr", "aat"=> "Asn", "agt"=> "Ser",
  "atc"=> "Ile", "acc"=> "Thr", "aac"=> "Asn", "agc"=> "Ser",
  "ata"=> "Ile", "aca"=> "Thr", "aaa"=> "Lys", "aga"=> "Arg",
  "atg"=> "Met", "acg"=> "Thr", "aag"=> "Lys", "agg"=> "Arg",
  "gtt"=> "Val", "gct"=> "Ala", "gat"=> "Asp", "ggt"=> "Gly",
  "gtc"=> "Val", "gcc"=> "Ala", "gac"=> "Asp", "ggc"=> "Gly",
  "gta"=> "Val", "gca"=> "Ala", "gaa"=> "Glu", "gga"=> "Gly",
  "gtg"=> "Val", "gcg"=> "Ala", "gag"=> "Glu", "ggg"=> "Gly"
);

# process input data
while ($line = <DATA>) {                     # read in line of input
  chomp($line);                              # remove end-of-line character from $line
  print "$line\n";                           # transcribe to output
  @triplets = unpack("a3" x (length($line)/3), $line);  # pull out successive triplets
  foreach $codon (@triplets) {               # loop over triplets
    print "$standardgeneticcode{$codon}";    # print out translation of each
  }                                          # end loop on triplets
  print "\n\n";                              # skip line on output
}                                            # end loop on input lines

# what follows is input data
__END__
atgcatccctttaat
tctgtctga
Running this program on the given input data produces the output:

atgcatccctttaat
MetHisProPheAsn

tctgtctga
SerValTER

Even this simple program displays several features of the PERL language. The file contains background data (the genetic code translation table), statements that tell the computer to do something with the input (i.e. the sequence to be translated), and the input data (appearing after the __END__ line). Comments summarize sections of the program, and also describe the effect of each statement. The program is structured as blocks enclosed in curly brackets, {...}, which are useful in controlling the flow of execution. Within blocks, individual statements (each ending in a ;) are executed in order of appearance. The outer block is a loop:

while ($line = <DATA>) { ... }

<DATA> refers to the lines of input data (appearing after the __END__ line). The block is executed once for each line of input, that is, while there is any line of input remaining.

Three types of data structures appear in the program. The line of input data, referred to as $line, is a simple character string. It is split into an array, or vector, of triplets. An array stores several items in a linear order, and individual items of data can be retrieved from their positions in the array. For ease of looking up the amino acid coded for by any triplet, the genetic code is stored as an associative array. An associative array, or hash table, is a generalization of a simple or sequential array. Whereas the elements of a simple array are indexed by consecutive integers, the elements of an associative array are indexed by arbitrary character strings, in this case the 64 triplets.
We process the input triplets in order of their appearance in the nucleotide sequence, but we need to access the elements of the genetic code table in an arbitrary order as dictated by the succession of triplets. A simple array or vector of character strings is appropriate for processing successive triplets, and the associative array is appropriate for looking up the amino acids that correspond to them.
STRUCTURE OF C PROGRAM

The structure of a C program is a protocol (set of rules) that the programmer follows while writing a C program. The general basic structure of a C program is shown in the figure below. The whole program is controlled by main( ), whose body is delimited by a left brace “{” and a right brace “}”. The local variable declarations and executable statements enclosed within “{” and “}” are called the body of the main function. The main( ) function can be preceded by documentation, preprocessor statements and global declarations.

Documentation
The documentation section consists of a set of comment lines giving the name of the program, the author and other details which the programmer would like to use later.

Preprocessor Statements
Preprocessor statements begin with the # symbol and are also called preprocessor directives. These statements instruct the preprocessor to include header files and define symbolic constants before the C program is compiled. Some of the preprocessor statements are listed below.