Description of Major Machine Learning Software Packages


            How to use the SNNS for implementing ANN

                                    Sudipto Saha


Introduction
SNNS (Stuttgart Neural Network Simulator) is a software simulator for neural networks
on Unix workstations developed at the Institute for Parallel and Distributed High
Performance Systems (IPVR) at the University of Stuttgart. The goal of the SNNS
project is to create an efficient and flexible simulation environment for research on and
application of neural nets.
The SNNS simulator consists of two main components:
1) simulator kernel written in C
2) graphical user interface under X11R4 or X11R5
The simulator kernel operates on the internal network data structures of the neural nets
and performs all operations of learning and recall. It can also be used without the other
parts as a C program embedded in custom applications. It supports arbitrary network
topologies and, like RCS, supports the concept of sites. SNNS can be extended by the
user with user defined activation functions, output functions, site functions and learning
procedures, which are written as simple C programs and linked to the simulator kernel.
The graphical user interface XGUI (X Graphical User Interface), built on top of the ker-
nel, gives a 2D and a 3D graphical representation of the neural networks and controls the
kernel during the simulation run. In addition, the 2D user interface has an integrated net-
work editor which can be used to directly create, manipulate and visualize neural nets in
various ways.

Building Blocks of Neural Nets
 The following paragraphs describe a generic model for those neural nets that can be gen-
erated by the SNNS simulator. The basic principles and the terminology used in dealing
with the graphical interface are also briefly introduced. A more general and more de-
tailed introduction to connectionism can, e.g., be found in [RM86]. For readers fluent in
German, the most comprehensive and up to date book on neural network learning algo-
rithms, simulation systems and neural hardware is probably [Zel94].




A network consists of units and directed, weighted links (connections) between
them. In analogy to activation passing in biological neurons, each unit receives a net in-
put that is computed from the weighted outputs of prior units with connections leading to
this unit. The figure below shows a small network.




Figure: A small network with three layers of units
The actual information processing within the units is modeled in the SNNS simulator
with the activation function and the output function. The activation function first com-
putes the net input of the unit from the weighted output values of prior units. It then com-
putes the new activation from this net input (and possibly its previous activation). The
output function takes this result to generate the output of the unit. These functions can
be arbitrary C functions linked to the simulator kernel and may be different for each unit.
Our simulator uses a discrete clock. Time is not modeled explicitly (i.e. there is no prop-
agation delay or explicit modeling of activation functions varying over time). Rather, the
net executes in discrete update steps, where a_j(t+1) denotes the activation of unit j one
step after a_j(t).
The SNNS simulator, just like the Rochester Connectionist Simulator (RCS, [God87]),
offers the use of sites as additional network element. Sites are a simple model of the
dendrites of a neuron which allow a grouping and different treatment of the input signals
of a cell. Each site can have a different site function. This selective treatment of incom-
ing information allows more powerful connectionist models. The figure below shows one
unit with sites and one without.




Figure: One unit with sites and one without
In the following all the various network elements are described in detail.




Units
 Depending on their function in the net, one can distinguish three types of units: The
units whose activations are the problem input for the net are called input units; the units
whose output represents the output of the net are called output units. The remaining
units are called hidden units, because they are not visible from the outside (see e.g. the
figure above).
In most neural network models the type correlates with the topological position of the
unit in the net: If a unit does not have input connections but only output connections,
then it is an input unit. If it lacks output connections but has input connections, it is an
output unit. If it has both types of connections it is a hidden unit.
It can, however, be the case that the output of a topologically internal unit is regarded as
part of the output of the network. The IO-type of a unit used in the SNNS simulator has
to be understood in this manner. That is, units can receive input or generate output even
if they are not at the fringe of the network.
Below, all attributes of a unit are listed:
    •   no: For proper identification, every unit has a number attached to it. This
        number defines the order in which the units are stored in the simulator kernel.
    •   name: The name can be selected arbitrarily by the user. It must not, however,
        contain blanks or special characters, and has to start with a letter. It is useful to
        select a short name that describes the task of the unit, since the name can be
        displayed with the network.
    •   io-type or io: The IO-type defines the function of the unit within the net. The
        following alternatives are possible:


            o   input: input unit
            o   output: output unit
            o   dual: both input and output unit
            o   hidden: internal, i.e. hidden unit
            o   special: this type can be used in any way, depending upon the application.
                In the standard version of the SNNS simulator, the weights to such units
                are not adapted by the learning algorithm.
            o   special input, special hidden, special output: sometimes it is necessary
                to know where in the network a special unit is located. These three types
                enable the assignment of such units to the various layers of the network.
    •   activation: The activation value.

    •   initial activation or i_act: This variable contains the initial activation
        value, present after the initial loading of the net. This initial configuration can
        be reproduced by resetting (reset) the net, e.g. to get a defined starting state of
        the net.
   •   output: the output value.



    •   bias: In contrast to other network simulators, where the bias (threshold) of a unit
        is simulated by a link weight from a special 'on'-unit, SNNS represents it as a unit
        parameter. In the standard version of SNNS the bias determines where the
        activation function has its steepest ascent (see e.g. the activation function
        Act_logistic). Learning procedures like backpropagation change the bias of a
        unit like a weight during training.


    •   activation function or actFunc: A new activation is computed from the output
        of preceding units, usually multiplied by the weights connecting these predecessor
        units with the current unit, the old activation of the unit and its bias. When sites
        are being used, the network input is computed from the site values.
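
To make these functions concrete, here is a small C sketch (illustrative only: this is not
SNNS kernel code, and the array layout is invented for the example) of a net input
computation, a logistic activation with its steepest ascent at the bias (compare
Act_logistic above), and an identity output function:

    #include <math.h>

    /* Net input: weighted sum of the outputs of n predecessor units.
       (Illustrative layout, not the SNNS kernel's data structures.) */
    double net_input(const double *out, const double *w, int n)
    {
        double net = 0.0;
        for (int i = 0; i < n; i++)
            net += w[i] * out[i];
        return net;
    }

    /* Logistic activation: steepest ascent where net equals the bias. */
    double act_logistic(double net, double bias)
    {
        return 1.0 / (1.0 + exp(-(net - bias)));
    }

    /* Identity output function: the unit's output equals its activation. */
    double out_identity(double act)
    {
        return act;
    }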
How to obtain SNNS
The SNNS simulator can be obtained from the download area (http://www-
ra.informatik.uni-tuebingen.de/downloads/SNNS/) or via anonymous ftp (deprecated)
from host
      ftp.informatik.uni-tuebingen.de
in the subdirectory
      /pub/SNNS
as file
      SNNSv4.1.tar.Z                             (2.6 MB)
or as gzipped version
      SNNSv4.1.tar.gz               (1.6 MB)
Be sure to set the ftp mode to binary before transmission of the files. Also watch out for
possible higher version numbers, patches or Readme files in the above directory
/pub/SNNS. After successful transmission of the file move it to the directory where you
want to install SNNS, uncompress and extract the file with the Unix commands
      uncompress SNNSv4.1.tar.Z
      tar xvf SNNSv4.1.tar
The SNNS distribution includes full source code, installation procedures for supported
machine architectures and some simple examples of trained networks. The PostScript
version of the user manual can be obtained as file
           SNNSv4.1.Manual.ps.Z                  (1.6 MB)
or in 15 parts as files
      SNNSv4.1.Manual.part01.ps.Z
      ...
      SNNSv4.1.Manual.part15.ps.Z




SNNS 4.2 for MS-Windows is available at http://www-ra.informatik.uni-
tuebingen.de/downloads/SNNS/Windows/


On-line SNNS User Manual (version 4.1)

The on-line manual is available at:

http://www-ra.informatik.uni-tuebingen.de/SNNS/UserManual/UserManual.html



Input file in txt
The file contains 78 sequences in total, each a 10-mer in length.

An example of the sequences in FASTA format:

>IgE_epitope1
AEDEDNQQGQ
>IgE_epitope2
AEEVEEERLK
>IgE_epitope3
AKSSPYQKKT
>IgE_epitope4
APRIVLDVAS
>IgE_epitope5
AVADVTPKQL
>IgE_epitope6
AVITWRALNK
>IgE_epitope7
AVPLYNRFSY
>IgE_epitope8
CDRPPKHSQN
>IgE_epitope9
CSGTKKLSEE
>IgE_epitope10
DGKTGSSTPH



Input file in binary form

This is the input file for SNNS; here the amino acid composition of each sequence is
calculated. Note the first seven lines of the input file: two header lines, followed by two
blank lines, then the number of patterns (78 in this case, since there are 78 sequences in
total), the number of input units (20 in this case, for the amino acid composition) and
the number of outputs (1, one value).



SNNS pattern definition file V4.2
generated at Sat Aug 27 16:40:25 2005


No. of patterns : 78
No. of input units : 20
No. of output units : 1

# Input pattern 1:
0.1 0 0.2 0.2 0 0.1 0 0 0 0 0 0.1 0 0.3 0 0 0 0 0 0
# Output pattern 1:
1
# Input pattern 2:
0.1 0 0 0.5 0 0 0 0 0.1 0.1 0 0 0 0 0.1 0 0 0.1 0 0
# Output pattern 2:
1
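
For illustration, the composition values above can be computed with a few lines of C.
This is a minimal sketch, not part of SNNS; it assumes the 20 amino acids are taken in
alphabetical one-letter order (ACDEFGHIKLMNPQRSTVWY), which reproduces Input pattern 1
above from IgE_epitope1:

    #include <stdio.h>
    #include <string.h>

    /* Assumed ordering of the 20 standard amino acids (alphabetical
       one-letter codes); one input unit per amino acid. */
    static const char *AA = "ACDEFGHIKLMNPQRSTVWY";

    /* Print one SNNS input pattern line: fraction of each amino acid. */
    void print_composition(const char *seq)
    {
        int len = (int)strlen(seq);
        for (int i = 0; i < 20; i++) {
            int count = 0;
            for (int j = 0; j < len; j++)
                if (seq[j] == AA[i])
                    count++;
            printf("%g ", (double)count / len);   /* e.g. 2 of 10 -> 0.2 */
        }
        printf("\n");
    }

    int main(void)
    {
        /* IgE_epitope1: prints 0.1 0 0.2 0.2 0 0.1 0 0 0 0 0 0.1 0 0.3 0 0 0 0 0 0 */
        print_composition("AEDEDNQQGQ");
        return 0;
    }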


Output file of SNNS

The output file of SNNS is shown below; it begins with a summary of run information.

SNNS result file V1.4-3D
generated at Tue Aug 30 08:58:52 2005

No. of patterns : 26
No. of input units : 20
No. of output units : 1
startpattern      : 1
endpattern        : 26
input patterns included
teaching output included
#1.1
0.1 0 0.1 0 0 0 0 0.1 0.1 0.2
0 0 0 0 0 0.1 0.1 0 0 0.2
1          (teaching output)
0.64832    (output of SNNS)
#2.1
0 0 0.1 0.1 0.3 0.1 0 0 0 0
0 0 0.2 0.1 0 0 0.1 0 0 0
1
0.6276

The outputs of SNNS are processed at different thresholds (0.1 to 1.0), and parameters
like sensitivity, specificity, and accuracy are calculated. The artificial neural network
tries to classify positive versus negative examples. Here we take the example of IgE
epitopes and non-epitopes: we need a data set of IgE epitopes (positive set) and a
negative set (non-epitopes). The network is trained on the training set, validated on one
set (to stop overfitting) and then tested on the left-out testing set. Each set contains an
equal number of sequences. In five-fold cross-validation it looks like this:

Training set               Validation set               Testing set
set 1,2,3                  set 4                        set 5
set 1,2,5                  set 3                        set 4
set 1,4,5                  set 2                        set 3
set 3,4,5                  set 1                        set 2
set 2,3,4                  set 5                        set 1
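
A short C sketch of this rotation (illustrative; it generates the rows of the table above,
with the validation set always cyclically preceding the test set):

    #include <stdio.h>

    int main(void)
    {
        /* Five-fold cross-validation: each set serves once as test set and
           once as validation set; the remaining three sets are for training. */
        for (int test = 5; test >= 1; test--) {
            int val = (test == 1) ? 5 : test - 1;
            printf("train: sets");
            for (int s = 1; s <= 5; s++)
                if (s != test && s != val)
                    printf(" %d", s);
            printf("   val: set %d   test: set %d\n", val, test);
        }
        return 0;
    }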


Processing of output data

The output data are processed and interpreted as shown below (Thres = threshold; Sen =
sensitivity; Spe = specificity; Acc = accuracy; PPV = positive predictive value).

Thres Sen Spe Acc PPV
1.0000 0.0000 0.0000 0.0000 0.0000
0.9000 0.0214 0.9929 0.5071 0.7500
0.8000 0.1429 0.9857 0.5643 0.9091
0.7000 0.2571 0.9571 0.6071 0.8571
0.6000 0.5143 0.8357 0.6750 0.7579
0.5000 0.7214 0.7214 0.7214 0.7214
0.4500 0.8071 0.6000 0.7036 0.6686
0.4000 0.8571 0.4714 0.6643 0.6186
0.3000 0.9571 0.3286 0.6429 0.5877
0.2000 1.0000 0.1000 0.5500 0.5263
0.1000 1.0000 0.0071 0.5036 0.5018
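
These columns follow the standard definitions Sen = TP/(TP+FN), Spe = TN/(TN+FP),
Acc = (TP+TN)/(TP+TN+FP+FN) and PPV = TP/(TP+FP). A minimal C sketch of computing
one row of the table (it assumes the predicted scores and true labels have already been
parsed from the result file, and that no denominator is zero):

    #include <stdio.h>

    /* Print one table row: threshold, sensitivity, specificity, accuracy
       and PPV from predicted scores and true labels (1 = positive). */
    void print_row(const double *score, const int *label, int n, double thres)
    {
        int tp = 0, tn = 0, fp = 0, fn = 0;
        for (int i = 0; i < n; i++) {
            int pred = (score[i] >= thres);   /* classify by threshold */
            if (pred && label[i])      tp++;
            else if (pred)             fp++;
            else if (label[i])         fn++;
            else                       tn++;
        }
        printf("%.4f %.4f %.4f %.4f %.4f\n", thres,
               (double)tp / (tp + fn),        /* sensitivity */
               (double)tn / (tn + fp),        /* specificity */
               (double)(tp + tn) / n,         /* accuracy */
               (double)tp / (tp + fp));       /* PPV */
    }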




How to use SVMlight efficiently for implementing SVM

                                 Sneh Lata Pandey


Description
SVMlight is an implementation of Support Vector Machines (SVMs) in C. It implements
Vapnik's Support Vector Machine for the problems of pattern recognition, regression,
and learning a ranking function. The algorithm has scalable memory requirements and
can handle problems with many thousands of support vectors efficiently. The software
also provides methods for assessing the generalization performance efficiently,
including two efficient estimation methods for both error rate and precision/recall.

Source Code and Binaries
The program is free for scientific use. The software must not be further distributed
without prior permission of the author. The implementation was developed on Solaris
2.5 with gcc, but compiles also on SunOS 3.1.4, Solaris 2.7, Linux, IRIX, Windows NT,
and Powermac. The source code is available at the following location:
http://download.joachims.org/svm_light/current/svm_light.tar.gz
If you just want the binaries, you can download them for the following systems:
Solaris: http://download.joachims.org/svm_light/current/svm_light_solaris.tar.gz
Windows: http://download.joachims.org/svm_light/current/svm_light_windows.zip
Cygwin: http://download.joachims.org/svm_light/current/svm_light_cygwin.tar.gz
Linux: http://download.joachims.org/svm_light/current/svm_light_linux.tar.gz

Installation
To install SVMlight you need to download svm_light.tar.gz. Create a new directory:
mkdir svm_light
Move svm_light.tar.gz to this directory and unpack it with
gunzip -c svm_light.tar.gz | tar xvf -
Now execute
make or make all
which compiles the system and creates the two executables
svm_learn (learning module)
svm_classify (classification module)

How to use
This section explains how to use the SVMlight software. SVMlight consists of a learning
module (svm_learn) and a classification module (svm_classify). The classification
module can be used to apply the learned model to new examples. See also the examples
below for how to use svm_learn and svm_classify.
svm_learn is called with the following parameters:
svm_learn [options] example_file model_file


Available options are:
General options:
-?       - this help
-v [0..3] - verbosity level (default 1)
Learning options:
-z {c,r,p} - select between classification (c), regression (r), and
                preference ranking (p) (see [Joachims, 2002c])
                (default classification)
-c float - C: trade-off between training error
              and margin (default [avg. x*x]^-1)
-w [0..] - epsilon width of tube for regression
              (default 0.1)
-j float - Cost: cost-factor, by which training errors on
             positive examples outweigh errors on negative
             examples (default 1) (see [Morik et al., 1999])
-b [0,1] - use biased hyperplane (i.e. x*w+b=0) instead
              of unbiased hyperplane (i.e. x*w=0) (default 1)
-i [0,1] - remove inconsistent training examples
             and retrain (default 0)
Performance estimation options:
-x [0,1] - compute leave-one-out estimates (default 0)
-o ]0..2] - value of rho for XiAlpha-estimator and for pruning
               leave-one-out computation (default 1.0)
               (see [Joachims, 2002a])
-k [0..100] - search depth for extended XiAlpha-estimator
                 (default 0)
Kernel options:
-t int    - type of kernel function:
0: linear (default)
1: polynomial (s a*b+c)^d
2: radial basis function exp(-gamma ||a-b||^2)
3: sigmoid tanh(s a*b + c)
4: user defined kernel from kernel.h
-d int     - parameter d in polynomial kernel
-g float - parameter gamma in rbf kernel
-s float - parameter s in sigmoid/poly kernel
-r float - parameter c in sigmoid/poly kernel
-u string - parameter of user defined kernel
Optimization options (see [Joachims, 1999a], [Joachims, 2002a]):
-q [2..] - maximum size of QP-subproblems (default 10)
-n [2..q] - number of new variables entering the working set
               in each iteration (default n = q). Set n<q to prevent
               zig-zagging.
-m [5..] - size of cache for kernel evaluations in MB (default 40)
               The larger the faster...
-e float - eps: Allow that error for termination criterion
[y [w*x+b] - 1] = eps (default 0.001)


-h [5..]  - number of iterations a variable needs to be
            optimal before considered for shrinking (default 100)
-f [0,1] - do final optimality check for variables removed by
              shrinking. Although this test is usually positive, there
               is no guarantee that the optimum was found if the test
               is omitted. (default 1)
-y string -> if option is given, reads alphas from file with given
                 name and uses them as starting point. (default 'disabled')
-# int    -> terminate optimization, if no progress after this
              number of iterations. (default 100000)
Output options:
-l char - file to write predicted labels of unlabeled examples
             into after transductive learning
-a char - write all alphas to this file after learning (in the
              same order as in the training set)
A more detailed description of the parameters and how they link to the respective
algorithms is given in the appendix of [Joachims, 2002a].
The input file example_file contains the training examples. The first lines may contain
comments and are ignored if they start with #. Each of the following lines represents one
training example and is of the following format:
<line>    .=. <target> <feature>:<value> <feature>:<value> ... <feature>:<value> # <info>
<target>  .=. +1 | -1 | 0 | <float>
<feature> .=. <integer> | "qid"
<value>   .=. <float>
<info>    .=. <string>
A space character separates the target value and each of the feature/value pairs.
Feature/value pairs MUST be ordered by increasing feature number. Features with value
zero can be skipped. The string <info> can be used to pass additional information to the
kernel (e.g. non feature vector data).
In classification mode, the target value denotes the class of the example. +1 as the target
value marks a positive example, -1 a negative example respectively. So, for example, the
line
-1 1:0.43 3:0.12 9284:0.2 # abcdef
specifies a negative example for which feature number 1 has the value 0.43, feature
number 3 has the value 0.12, feature number 9284 has the value 0.2, and all the other
features have value 0. In addition, the string abcdef is stored with the vector, which can
serve as a way of providing additional information for user defined kernels. The order of
the predictions is the same as in the training data.
In regression mode, the <target> contains the real-valued target value.
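
For illustration, a small C sketch that writes one example line in this format from a
dense feature vector (the vector contents are invented; zero-valued features are skipped
and feature numbers increase, as the format requires):

    #include <stdio.h>

    /* Write one SVMlight example line: target, then feature:value pairs
       in increasing feature order, skipping zero-valued features. */
    void write_example(FILE *f, int target, const double *x, int nfeat)
    {
        fprintf(f, "%+d", target);
        for (int i = 0; i < nfeat; i++)
            if (x[i] != 0.0)
                fprintf(f, " %d:%g", i + 1, x[i]);   /* features start at 1 */
        fprintf(f, "\n");
    }

    int main(void)
    {
        double x[4] = { 0.43, 0.0, 0.12, 0.2 };   /* invented feature vector */
        write_example(stdout, -1, x, 4);          /* -1 1:0.43 3:0.12 4:0.2 */
        return 0;
    }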
In all modes, the result of svm_learn is the model which is learned from the training
data in example_file. The model is written to model_file. To make predictions on test
examples, svm_classify reads this file. svm_classify is called with the following
parameters:
svm_classify [options] example_file model_file output_file
Available options are:
-h       Help.
-v [0..3] Verbosity level (default 2).


-f [0,1]  0: old output format of V1.0
           1: output the value of decision function (default)
The test examples in example_file are given in the same format as the training
examples (possibly with 0 as class label). For all test examples in example_file the
predicted values are written to output_file. There is one line per test example in
output_file containing the value of the decision function on that example. For
classification, the sign of this value determines the predicted class. For regression, it is
the predicted value itself, and for ranking the value can be used to order the test
examples. The test example file has the same format as the one for svm_learn. Again,
<class> can have the value zero indicating unknown.
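
Putting the two modules together, a typical run might look like this (train.dat,
test.dat and the other file names are placeholders; -t 2, -g and -c are the kernel and
learning options documented above):
      svm_learn -t 2 -g 0.5 -c 1 train.dat model.dat
      svm_classify test.dat model.dat predictions.dat
The first call trains an SVM with an RBF kernel on train.dat and stores the model in
model.dat; the second applies that model to test.dat and writes one decision value per
line to predictions.dat.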




How to use HMMER

                               Manish Kumar

Installation
Quick installation instructions for configuring, compiling, and installing a source code
distribution.
Download the source tarball (hmmer.tar.gz) from
ftp://ftp.genetics.wustl.edu/pub/eddy/hmmer/ or http://hmmer.wustl.edu/.
    Unpack the software:
       > tar xvf hmmer.tar.gz
Go into the newly created top-level directory (named hmmer-xx, where xx is a release
number):
       > cd hmmer-2.3.2
    Configure for your system, and build the programs:
       > ./configure
       > make
    Run the automated testsuite. This is optional. All these tests should pass:
       > make check
    The programs are now in the src/ subdirectory. The man pages are in the
documentation/man subdirectory. You can manually move or copy all of these to
appropriate locations if you want. You will want the programs to be in your $PATH.
Optionally, you can install the man pages and programs in system-wide directories. If
you are happy with the default (programs in /usr/local/bin/ and man pages in
/usr/local/man/man1), do:
       > make install
(You might need to be root when you install, depending on the permissions on your
/usr/local directories.) That’s all. Each of these steps is documented in more detail
below, including how to change the default installation directories for make install.

Configuring and Installing a precompiled binary distribution
Alternatively, you can obtain a precompiled binary distribution of HMMER from
http://hmmer.wustl.edu/. Thanks to generous hardware support from many
manufacturers, binary distributions are available for most common UNIX and UNIX-like
OS’s. For example, the distribution for Intel x86/GNU Linux machines is
hmmer-2.3.2.bin.intel-linux.tar.gz.
    After you download a binary distribution, unpack it:
       > tar xvf hmmer.bin.intel-linux.tar.gz
HMMER is now in the newly created top-level directory (named hmmer-xx, where xx is a
release number). Go into it:
       > cd hmmer-2.3.2

   You don’t really need to do anything else. The programs are in the binaries/
subdirectory. The man pages are in the documentation/man subdirectory. The PDF
copy of the Userguide is in the top level HMMER directory (Userguide.pdf). You can
manually move or copy all of these to appropriate locations if you want. You will want
the programs to be in your $PATH. However, you’ll often want to install in a more
permanent place. To configure with the default locations (programs in /usr/local/bin/ and
man pages in /usr/local/man/man1) and install everything, do:
       > ./configure
       > make install


   If you want to install in different places than the defaults, keep reading; see the
beginning of the section on running the configure script.

System requirements and portability
HMMER is designed to run on UNIX platforms. The code is POSIX-compliant
ANSI C. You need a UNIX operating system to run it. You need an ANSI C
compiler if you want to build it from source.
    Linux and Apple Macintosh OS/X both count as UNIX. Microsoft operating
systems do not. However, HMMER is known to be easy to port to Microsoft
Windows and other non-UNIX operating systems, provided that the platform
supports ANSI C and some reasonable level of POSIX compliance. Running the
testsuite (make check) requires that you have Perl (specifically, /usr/bin/perl).
However, Perl isn’t necessary to make HMMER work. HMMER has support for two
kinds of parallelization: POSIX multithreading and PVM (Parallel Virtual Machine)
clustering. Both are optional and not compiled by default; they are enabled by passing
the --enable-threads or --enable-pvm options to the ./configure script before
compilation. The pre-compiled binary distributions generally support multithreading
but not PVM.




Tutorial
Here’s a tutorial walk-through of some small projects with HMMER. This section should be
sufficient to get you started on work of your own, and you can (at least temporarily) skip the rest of
the Guide.

The programs in HMMER
There are currently nine programs supported in the HMMER 2 package:

    hmmalign      Align sequences to an existing model.
    hmmbuild      Build a model from a multiple sequence alignment.
    hmmcalibrate  Takes an HMM and empirically determines parameters that are
                  used to make searches more sensitive, by calculating more
                  accurate expectation value scores (E-values).
    hmmconvert    Convert a model file into different formats, including a
                  compact HMMER 2 binary format, and “best effort” emulation
                  of GCG profiles.
    hmmemit       Emit sequences probabilistically from a profile HMM.
    hmmfetch      Get a single model from an HMM database.
    hmmindex      Index an HMM database.
    hmmpfam       Search an HMM database for matches to a query sequence.
    hmmsearch     Search a sequence database for matches to an HMM.

Files used in the tutorial
The subdirectory tutorial/ in the HMMER distribution contains the files used in the
tutorial, as well as a number of examples of various file formats that HMMER reads. The
important files for the tutorial are:

    globins50.msf   An alignment file of 50 aligned globin sequences, in GCG
                    MSF format.
    globins630.fa   A FASTA format file of 630 unaligned globin sequences.
    fn3.sto         An alignment file of fibronectin type III domains, in
                    Stockholm format. (From Pfam 8.0.)
    rrm.sto         An alignment file of RNA recognition motif domains, in
                    Stockholm format. (From Pfam 8.0.)
    rrm.hmm         An example HMM, built from rrm.sto.
    pkinase.sto     An alignment file of protein kinase catalytic domains, in
                    Stockholm format. (From Pfam 8.0.)
    Artemia.fa      A FASTA file of brine shrimp globin, which contains nine
                    tandemly repeated globin domains.
    7LESDROME       A SWISSPROT file of the Drosophila Sevenless sequence, a
                    receptor tyrosine kinase with multiple domains.
    RU1AHUMAN       A SWISSPROT file of the human U1A protein sequence, which
                    contains two RRM domains.

    Create a new directory that you can work in, and copy all the files in tutorial there.
I’ll assume for the following examples that you’ve installed the HMMER programs in your
path; if not, you’ll need to give a complete path name to the HMMER programs (e.g.
something like /usr/people/eddy/hmmer-2.2/binaries/hmmbuild instead of just hmmbuild).

Format of input alignment files
HMMER starts with a multiple sequence alignment file that you provide. HMMER can
read alignments in several common formats, including the output of the CLUSTAL
family of programs, Wisconsin/GCG MSF format, the input format for the PHYLIP
phylogenetic analysis programs, and “aligned FASTA” format (where the sequences in a
FASTA file contain gap symbols, so that they are all the same length). HMMER’s native
alignment format is called Stockholm format, the format of the Pfam protein database
that allows extensive markup and annotation. All these formats are documented in a later
section. The software autodetects the alignment file format, so you don’t have to worry
about it. Most of the example alignments in the tutorial are Stockholm files. rrm.sto is a
simple example (generated by stripping all the extra annotation off of a Pfam RNA
recognition motif seed alignment). pkinase.sto and fn3.sto are original Pfam seed
alignments, with all their annotation.
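
For orientation, a minimal Stockholm file consists of a format line, one row per named
aligned sequence, and a terminating //. The names and residues below are invented; real
files like rrm.sto follow the same pattern, typically with additional #=GF and #=GC
annotation lines:

    # STOCKHOLM 1.0
    seq1/1-10  MQIFVKTLTG
    seq2/1-10  MQIFVKTLSG
    seq3/1-10  MQLFVKTITG
    //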

Searching a sequence database with a single profile HMM
One common use of HMMER is to search a sequence database for homologues
of a protein family of interest. You need a multiple sequence alignment of the
sequence family you’re interested in.
      ● Can I build a model from unaligned sequences? In principle, profile HMMs can be trained
      from unaligned sequences; however, this functionality is temporarily withdrawn from HMMER.
      I recommend CLUSTALW as an excellent, freely available multiple sequence alignment
      program. The original hmmt profile HMM training program from HMMER 1 is also still
      available, from ftp://ftp.genetics.wustl.edu/pub/eddy/hmmer/hmmer-1.8.4.tar.Z.

build a profile HMM with hmmbuild
Let’s assume you have a multiple sequence alignment of a protein domain or
protein sequence family. To use HMMER to search for additional remote
homologues of the family, you want to first build a profile HMM from the
alignment. The following command builds a profile HMM from the alignment
of 50 globin sequences in globins50.msf:
        > hmmbuild globin.hmm globins50.msf

   This gives the following output:
        hmmbuild - build a hidden Markov model from an alignment
        HMMER 2.3 (April 2003)
        Copyright (C) 1992-2003 HHMI/Washington University School of Medicine
        Freely distributed under the GNU General Public License (GPL)

        Alignment file:                  globins50.msf
        File format:                     MSF
        Search algorithm configuration:  Multiple domain (hmmls)
        Model construction strategy:     MAP (gapmax hint: 0.50)
        Null model used:                 (default)
        Prior used:                      (default)
        Sequence weighting method:       G/S/C tree weights
        New HMM file:                    globin.hmm

        Alignment:                       #1
        Number of sequences:             50
        Number of columns:               308

        Determining effective sequence number ... done. [2]
        Weighting sequences heuristically ... done.
        Constructing model architecture ... done.
        Converting counts to probabilities ... done.
        Setting model name, etc. ... done. [globins50]

        Constructed a profile HMM (length 143)
        Average score:     189.04 bits
        Minimum score:     -17.62 bits
        Maximum score:     234.09 bits
        Std. deviation:     53.18 bits

        Finalizing model configuration ... done.
        Saving model to file ... done.
        //

    The process takes a second or two. hmmbuild creates a new HMM file called globin.hmm.
This is a human- and computer-readable ASCII text file, but for now you don’t care. You also don’t
care for now what all the stuff in the output means; I’ll describe it in detail later. The profile HMM
can be treated as a compiled model of your alignment.

calibrate the profile HMM with hmmcalibrate
This step is optional, but doing it will increase the sensitivity of your database search.
     When you search a sequence database, it is useful to get “E-values” (expectation values) in
addition to raw scores. When you see a database hit that scores x, an E-value tells you the number of
hits you would’ve expected to score x or more just by chance in a sequence database of this size.
     HMMER will always estimate an E-value for your hits. However, unless you “calibrate” your
model before a database search, HMMER uses an analytic upper bound calculation that is extremely
conservative. An empirical HMM calibration costs time (about 10% the time of a SWISSPROT
search) but it only has to be done once per model, and can greatly increase the sensitivity of a
database search. To empirically calibrate the E-value calculations for the globin model, type:
        > hmmcalibrate globin.hmm

    which results in:
        hmmcalibrate -- calibrate HMM search statistics
        HMMER 2.3 (April 2003)
        Copyright (C) 1992-2003 HHMI/Washington University School of Medicine
        Freely distributed under the GNU General Public License (GPL)

        HMM file:                  globin.hmm
        Length distribution mean:  325
        Length distribution s.d.:  200
        Number of samples:         5000
        random seed:               1051632537
        histogram(s) saved to:     [not saved]
        POSIX threads:             4

        HMM    : globins50
        mu     : -39.897396
        lambda :   0.226086
        max    :  -9.567000
        //

    This might take several minutes, depending on your machine. Go have a cup of coffee. When it is
complete, the relevant parameters are added to the HMM file. (Note from the “POSIX threads: 4”
line that I’m running on 4 CPUs on a quad-processor box. I’m impatient.)
    Calibrated HMMER E-values tend to be relatively accurate. E-values of 0.1 or less are, in
general, significant hits. Uncalibrated HMMER E-values are also reliable, erring on the cautious side;
uncalibrated models may miss remote homologues.
      ● Why doesn’t hmmcalibrate always give the same output, if I run it on the same HMM?
      It’s fitting a distribution to the scores obtained from a random (Monte Carlo) simulation of a
      small sequence database, and this random sequence database is different each time. You can
      make hmmcalibrate give reproducible results by making it initialize its random number
      generator with the same seed, using the --seed <x> option, where x is any positive integer.
      By default, it chooses a “random” seed, which it reports in the output header. You can
      reproduce an hmmcalibrate run by passing this number as the seed. (Trivia: the default
      seed is the number of seconds that have passed since the UNIX “epoch” - usually January 1,
      1970. hmmcalibrate runs started in the same second will give identical results. Beware, if
      you’re trying to measure the variance of HMMER’s estimated λ̂ and μ̂ parameters...)

search the sequence database with hmmsearch
    As an example of searching for new homologues using a profile HMM, we’ll use
the globin model to search for globin domains in the example Artemia globin
sequence in Artemia.fa:
        > hmmsearch globin.hmm Artemia.fa
    The output comes in several sections, and unlike building and calibrating
the HMM, where we treated the HMM as a black box, now you do care about
what it’s saying. The first section is the header that tells you what program
you ran, on what, and with what options:
        hmmsearch - search a sequence database with a profile HMM
        HMMER 2.3 (April 2003)
        Copyright (C) 1992-2003 HHMI/Washington University School of Medicine
        Freely distributed under the GNU General Public License (GPL)

        HMM file:                   globin.hmm [globins50]
        Sequence database:          Artemia.fa
        per-sequence score cutoff:  [none]
        per-domain score cutoff:    [none]
        per-sequence Eval cutoff:   <= 10
        per-domain Eval cutoff:     [none]



        Query HMM:    globins50
        Accession:    [none]
        Description:  [none]
          [HMM has been calibrated; E-values are empirical estimates]



    The second section is the sequence top hits list. It is a list of ranked top hits (sorted by E-value,
most significant hit first), formatted in a BLAST-like style:

        Scores for complete sequences (score includes all domains):
        Sequence   Description                    Score    E-value    N
        S13421     S13421 GLOBIN - BRINE SHRIMP   474.3    1.7e-143




                                                                                                     7-17

More Related Content

What's hot

Operating System 4 1193308760782240 2
Operating System 4 1193308760782240 2Operating System 4 1193308760782240 2
Operating System 4 1193308760782240 2
mona_hakmy
 
Inter process communication
Inter process communicationInter process communication
Inter process communication
Mohd Tousif
 
unix interprocess communication
unix interprocess communicationunix interprocess communication
unix interprocess communication
guest4c9430
 
Memory allocation for real time operating system
Memory allocation for real time operating systemMemory allocation for real time operating system
Memory allocation for real time operating system
Asma'a Lafi
 

What's hot (19)

Dosass2
Dosass2Dosass2
Dosass2
 
Processes, Threads and Scheduler
Processes, Threads and SchedulerProcesses, Threads and Scheduler
Processes, Threads and Scheduler
 
Operating System 4 1193308760782240 2
Operating System 4 1193308760782240 2Operating System 4 1193308760782240 2
Operating System 4 1193308760782240 2
 
PREGEL a system for large scale graph processing
PREGEL a system for large scale graph processingPREGEL a system for large scale graph processing
PREGEL a system for large scale graph processing
 
Introduction to om ne t++
Introduction to om ne t++Introduction to om ne t++
Introduction to om ne t++
 
Pthreads linux
Pthreads linuxPthreads linux
Pthreads linux
 
Chapter 6 os
Chapter 6 osChapter 6 os
Chapter 6 os
 
EKernel: an object-oriented micro-kernel
EKernel: an object-oriented micro-kernelEKernel: an object-oriented micro-kernel
EKernel: an object-oriented micro-kernel
 
P-Threads
P-ThreadsP-Threads
P-Threads
 
IPC
IPCIPC
IPC
 
Inter process communication
Inter process communicationInter process communication
Inter process communication
 
Simulation using OMNet++
Simulation using OMNet++Simulation using OMNet++
Simulation using OMNet++
 
Porting MPEG-2 files on CerberO, a framework for FPGA based MPSoc
Porting MPEG-2 files on CerberO, a framework for FPGA based MPSocPorting MPEG-2 files on CerberO, a framework for FPGA based MPSoc
Porting MPEG-2 files on CerberO, a framework for FPGA based MPSoc
 
VTU 5TH SEM CSE OPERATING SYSTEMS SOLVED PAPERS
VTU 5TH SEM CSE OPERATING SYSTEMS SOLVED PAPERSVTU 5TH SEM CSE OPERATING SYSTEMS SOLVED PAPERS
VTU 5TH SEM CSE OPERATING SYSTEMS SOLVED PAPERS
 
Switches androuters
Switches androutersSwitches androuters
Switches androuters
 
S peculative multi
S peculative multiS peculative multi
S peculative multi
 
unix interprocess communication
unix interprocess communicationunix interprocess communication
unix interprocess communication
 
Memory allocation for real time operating system
Memory allocation for real time operating systemMemory allocation for real time operating system
Memory allocation for real time operating system
 
04 threads-pbl-2-slots
04 threads-pbl-2-slots04 threads-pbl-2-slots
04 threads-pbl-2-slots
 

Viewers also liked (6)

Learning to Search Henry Kautz
Learning to Search Henry KautzLearning to Search Henry Kautz
Learning to Search Henry Kautz
 
letti
lettiletti
letti
 
Theory Generation for Security Protocols
Theory Generation for Security ProtocolsTheory Generation for Security Protocols
Theory Generation for Security Protocols
 
1
11
1
 
Andrew Shitov Rakudo Jonathan
Andrew Shitov Rakudo JonathanAndrew Shitov Rakudo Jonathan
Andrew Shitov Rakudo Jonathan
 
Pia Vilenius: Tutkimus tekstiiliteollisuusalan tilanteesta 2015
Pia Vilenius: Tutkimus tekstiiliteollisuusalan tilanteesta 2015Pia Vilenius: Tutkimus tekstiiliteollisuusalan tilanteesta 2015
Pia Vilenius: Tutkimus tekstiiliteollisuusalan tilanteesta 2015
 

Similar to CHAP7.DOC.doc

Digital Implementation of Artificial Neural Network for Function Approximatio...
Digital Implementation of Artificial Neural Network for Function Approximatio...Digital Implementation of Artificial Neural Network for Function Approximatio...
Digital Implementation of Artificial Neural Network for Function Approximatio...
IOSR Journals
 
Final Report(Routing_Misbehavior)
Final Report(Routing_Misbehavior)Final Report(Routing_Misbehavior)
Final Report(Routing_Misbehavior)
Ambreen Zafar
 
Artificial Neural Network Implementation on FPGA – a Modular Approach
Artificial Neural Network Implementation on FPGA – a Modular ApproachArtificial Neural Network Implementation on FPGA – a Modular Approach
Artificial Neural Network Implementation on FPGA – a Modular Approach
Roee Levy
 

Similar to CHAP7.DOC.doc (20)

Artificial Neural Network Implementation On FPGA Chip
Artificial Neural Network Implementation On FPGA ChipArtificial Neural Network Implementation On FPGA Chip
Artificial Neural Network Implementation On FPGA Chip
 
Implementation of Feed Forward Neural Network for Classification by Education...
Implementation of Feed Forward Neural Network for Classification by Education...Implementation of Feed Forward Neural Network for Classification by Education...
Implementation of Feed Forward Neural Network for Classification by Education...
 
Digital Implementation of Artificial Neural Network for Function Approximatio...
Digital Implementation of Artificial Neural Network for Function Approximatio...Digital Implementation of Artificial Neural Network for Function Approximatio...
Digital Implementation of Artificial Neural Network for Function Approximatio...
 
Digital Implementation of Artificial Neural Network for Function Approximatio...
Digital Implementation of Artificial Neural Network for Function Approximatio...Digital Implementation of Artificial Neural Network for Function Approximatio...
Digital Implementation of Artificial Neural Network for Function Approximatio...
 
11_Saloni Malhotra_SummerTraining_PPT.pptx
11_Saloni Malhotra_SummerTraining_PPT.pptx11_Saloni Malhotra_SummerTraining_PPT.pptx
11_Saloni Malhotra_SummerTraining_PPT.pptx
 
Neural Network
Neural NetworkNeural Network
Neural Network
 
N ns 1
N ns 1N ns 1
N ns 1
 
Artificial neural network for machine learning
Artificial neural network for machine learningArtificial neural network for machine learning
Artificial neural network for machine learning
 
Final Report(Routing_Misbehavior)
Final Report(Routing_Misbehavior)Final Report(Routing_Misbehavior)
Final Report(Routing_Misbehavior)
 
Network simulator
Network  simulatorNetwork  simulator
Network simulator
 
Acem neuralnetworks
Acem neuralnetworksAcem neuralnetworks
Acem neuralnetworks
 
Nn devs
Nn devsNn devs
Nn devs
 
EXPERT SYSTEMS AND ARTIFICIAL INTELLIGENCE_ Neural Networks.pptx
EXPERT SYSTEMS AND ARTIFICIAL INTELLIGENCE_ Neural Networks.pptxEXPERT SYSTEMS AND ARTIFICIAL INTELLIGENCE_ Neural Networks.pptx
EXPERT SYSTEMS AND ARTIFICIAL INTELLIGENCE_ Neural Networks.pptx
 
Handwritten Digit Recognition using Convolutional Neural Networks
Handwritten Digit Recognition using Convolutional Neural  NetworksHandwritten Digit Recognition using Convolutional Neural  Networks
Handwritten Digit Recognition using Convolutional Neural Networks
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
G010334554
G010334554G010334554
G010334554
 
Cnn
CnnCnn
Cnn
 
Artificial Neural Network Implementation on FPGA – a Modular Approach
Artificial Neural Network Implementation on FPGA – a Modular ApproachArtificial Neural Network Implementation on FPGA – a Modular Approach
Artificial Neural Network Implementation on FPGA – a Modular Approach
 
A simplified design of multiplier for multi layer feed forward hardware neura...
A simplified design of multiplier for multi layer feed forward hardware neura...A simplified design of multiplier for multi layer feed forward hardware neura...
A simplified design of multiplier for multi layer feed forward hardware neura...
 
B.tech_project_ppt.pptx
B.tech_project_ppt.pptxB.tech_project_ppt.pptx
B.tech_project_ppt.pptx
 

More from butest

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
butest
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
butest
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
butest
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
butest
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
butest
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
butest
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
butest
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
butest
 
Facebook
Facebook Facebook
Facebook
butest
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
butest
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
butest
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
butest
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
butest
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
butest
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
butest
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
butest
 

More from butest (20)

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
 
PPT
PPTPPT
PPT
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
 
Facebook
Facebook Facebook
Facebook
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
 
hier
hierhier
hier
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
 

CHAP7.DOC.doc

  • 1. Description of Major Machine Learning Software Packages How to use the SNNS for implementing ANN Sudipto saha Introduction SNNS (Stuttgart Neural Network Simulator) is a software simulator for neural networks on Unix workstations developed at the Institute for Parallel and Distributed High Performance Systems (IPVR) at the University of Stuttgart. The goal of the SNNS project is to create an efficient and flexible simulation environment for research on and application of neural nets. The SNNS simulator consists of two main components: 1) simultor kernel written in C 2) graphical user interface under X11R4 or X11R5 The simulator kernel operates on the internal network data structures of the neural nets and performs all operations of learning and recall. It can also be used without the other parts as a C program embedded in custom applications. It supports arbitrary network topologies and, like RCS, supports the concept of sites. SNNS can be extended by the user with user defined activation functions, output functions, site functions and learning procedures, which are written as simple C programs and linked to the simulator kernel. The graphical user interface XGUI (X Graphical User Interface), built on top of the ker- nel, gives a 2D and a 3D graphical representation of the neural networks and controls the kernel during the simulation run. In addition, the 2D user interface has an integrated net- work editor which can be used to directly create, manipulate and visualize neural nets in various ways. Building Blocks of Neural Nets The following paragraph describes a generic model for those neural nets that can be gen- erated by the SNNS simulator. The basic principles and the terminology used in dealing with the graphical interface are also briefly introduced. A more general and more de- tailed introduction to connectionism can, e.g., be found in [RM86]. For readers fluent in German, the most comprehensive and up to date book on neural network learning algo- rithms, simulation systems and neural hardware is probably [Zel94] 7-1
  • 2. A network consists of units and directed, weighted links (connections) between them. In analogy to activation passing in biological neurons, each unit receives a net in- put that is computed from the weighted outputs of prior units with connections leading to this unit. Picture shows a small network. Figure: A small network with three layers of units The actual information processing within the units is modeled in the SNNS simulator with the activation function and the output function. The activation function first com- putes the net input of the unit from the weighted output values of prior units. It then com- putes the new activation from this net input (and possibly its previous activation). The output function takes this result to generate the output of the unit. These functions can be arbitrary C functions linked to the simulator kernel and may be different for each unit. Our simulator uses a discrete clock. Time is not modeled explicitly (i.e. there is no prop- agation delay or explicit modeling of activation functions varying over time). Rather, the net executes in update steps, where is the activation of a unit one step after . The SNNS simulator, just like the Rochester Connectionist Simulator (RCS, [God87]), offers the use of sites as additional network element. Sites are a simple model of the dendrites of a neuron which allow a grouping and different treatment of the input signals of a cell. Each site can have a different site function. This selective treatment of incom- ing information allows more powerful connectionist models. Figure shows one unit with sites and one without. Figure: One unit with sites and one without In the following all the various network elements are described in detail. 7-2
  • 3. Units Depending on their function in the net, one can distinguish three types of units: The units whose activations are the problem input for the net are called input units; the units whose output represent the output of the net output units. The remaining units are called hidden units, because they are not visible from the outside (see e.g. figure ). In most neural network models the type correlates with the topological position of the unit in the net: If a unit does not have input connections but only output connections, then it is an input unit. If it lacks output connections but has input units, it is an output unit. If it has both types of connections it is a hidden unit. It can, however, be the case that the output of a topologically internal unit is regarded as part of the output of the network. The IO-type of a unit used in the SNNS simulator has to be understood in this manner. That is, units can receive input or generate output even if they are not at the fringe of the network. Below, all attributes of a unit are listed: • no: For proper identification, every unit has a number attached to it. This number defines the order in which the units are stored in the simulator kernel. • name: The name can be selected arbitrarily by the user. It must not, however, contain blanks or special characters, and has to start with a letter. It is useful to select a short name that describes the task of the unit, since the name can be displayed with the network. • io-type or io: The IO-type defines the function of the unit within the net. The following alternatives are possible o input: input unit o output: output unit o dual: both input and output unit o hidden: internal, i.e. hidden unit o special: this type can be used in any way, depending upon the application. In the standard version of the SNNS simulator, the weights to such units are not adapted in the learning algorithm (see paragraph ). o special input, special hidden, special output: sometimes it is necessary to to know where in the network a special unit is located. These three types enable the correlation of the units to the various layers of the network. • activation: The activation value. • initial activation or i_act: This variable contains the initial activation 7-3
  • 4. value, present after the initial loading of the net. This initial configuration can be reproduced by resetting ( reset) the net, e.g. to get a defined starting state of the net. • output: the output value. • bias: In contrast to other network simulators where the bias (threshold) of a unit is simulated by a link weight from a special 'on'-unit, SNNS represents it as a unit parameter. In the standard version of SNNS the bias determines where the activation function has its steepest ascent. (see e.g. the activation function Act_logistic). Learning procedures like backpropagation change the bias of a unit like a weight during training. activation function or actFunc: A new activation is computed from the output of preceding units, usually multiplied by the weights connecting these predecessor units with the current unit, the old activation of the unit and its bias. When sites are being used, the network input is computed from the site values. How to obtain SNNS The SNNS simulator can be obtained from download area (http://www- ra.informatik.uni-tuebingen.de/downloads/SNNS/) or via anonymous ftp (deprecated) from host ftp.informatik.uni-tuebingen.de in the subdirectory /pub/SNNS as file SNNSv4.1.tar.Z (2.6 MB) or in zipped version SNNSv4.1.tar.gz (1.6 MB) Be sure to set the ftp mode to binary before transmission of the files. Also watch out for possible higher version numbers, patches or Readme files in the above directory /pub/SNNS. After successful transmission of the file move it to the directory where you want to install SNNS, uncompress and extract the file with the Unix commands uncompress SNNSv4.1.tar.Z tar xvf SNNSv4.1.tar The SNNS distribution includes full source code, installation procedures for supported machine architectures and some simple examples of trained networks. The PostScript version of the user manual can be obtained as file SNNSv4.1.Manual.ps.Z (1.6 MB) or in 15 parts as files SNNSv4.1.Manual.part01.ps.Z ... SNNSv4.1.Manual.part15.ps.Z 7-4
How to obtain SNNS

The SNNS simulator can be obtained from the download area (http://www-ra.informatik.uni-tuebingen.de/downloads/SNNS/) or via anonymous ftp (deprecated) from host ftp.informatik.uni-tuebingen.de in the subdirectory /pub/SNNS, as file SNNSv4.1.tar.Z (2.6 MB) or in gzipped form as SNNSv4.1.tar.gz (1.6 MB). Be sure to set the ftp mode to binary before transmitting the files, and watch out for possible higher version numbers, patches or Readme files in the directory /pub/SNNS. After successful transmission, move the file to the directory where you want to install SNNS, then uncompress and extract it with the Unix commands

uncompress SNNSv4.1.tar.Z
tar xvf SNNSv4.1.tar

The SNNS distribution includes full source code, installation procedures for supported machine architectures and some simple examples of trained networks. The PostScript version of the user manual can be obtained as file SNNSv4.1.Manual.ps.Z (1.6 MB) or in 15 parts as files SNNSv4.1.Manual.part01.ps.Z ... SNNSv4.1.Manual.part15.ps.Z. SNNS 4.2 for MS-Windows is available separately (http://www-ra.informatik.uni-tuebingen.de/downloads/SNNS/Windows/).

On-line SNNS User Manual (version 4.1)

The on-line manual is available at http://www-ra.informatik.uni-tuebingen.de/SNNS/UserManual/UserManual.html

Input file in txt

This file contains 78 sequences in total, each a 10mer in length. An example of the sequences in FASTA format:

>IgE_epitope1
AEDEDNQQGQ
>IgE_epitope2
AEEVEEERLK
>IgE_epitope3
AKSSPYQKKT
>IgE_epitope4
APRIVLDVAS
>IgE_epitope5
AVADVTPKQL
>IgE_epitope6
AVITWRALNK
>IgE_epitope7
AVPLYNRFSY
>IgE_epitope8
CDRPPKHSQN
>IgE_epitope9
CSGTKKLSEE
>IgE_epitope10
DGKTGSSTPH

Input file in binary form

For the SNNS input file, the amino acid composition of each sequence is calculated. Note the first 7 lines of the input file: two header lines, followed by two blank lines, then the number of patterns (78 in this case, since there are 78 sequences in total), the number of input units (20 in this case, one per amino acid of the composition) and the number of outputs (1, a single value).
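Before looking at the actual file, here is a minimal sketch of how such a pattern file could be generated from the FASTA sequences above. The alphabetical amino-acid ordering ACDEFGHIKLMNPQRSTVWY of the 20 input units is an assumption (though it is consistent with the example patterns below), and the file and function names are purely illustrative:

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 input units

def composition(seq):
    # Fraction of each amino acid in the sequence; for a 10mer these are
    # multiples of 0.1, e.g. AEDEDNQQGQ -> A:0.1, D:0.2, E:0.2, G:0.1, N:0.1, Q:0.3.
    return [round(seq.count(aa) / len(seq), 4) for aa in AMINO_ACIDS]

def write_pattern_file(path, sequences, targets):
    # Header layout follows the example shown below: two header lines,
    # two blank lines, then the three count lines.
    with open(path, "w") as f:
        f.write("SNNS pattern definition file V4.2\n")
        f.write("generated at Sat Aug 27 16:40:25 2005\n\n\n")
        f.write("No. of patterns : %d\n" % len(sequences))
        f.write("No. of input units : 20\n")
        f.write("No. of output units : 1\n\n")
        for i, (seq, target) in enumerate(zip(sequences, targets), 1):
            f.write("# Input pattern %d:\n" % i)
            f.write(" ".join("%g" % v for v in composition(seq)) + "\n")
            f.write("# Output pattern %d:\n%d\n" % (i, target))

# Epitopes form the positive class (target 1):
write_pattern_file("epitopes.pat", ["AEDEDNQQGQ", "AEEVEEERLK"], [1, 1])

The actual input file used here begins as follows: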
SNNS pattern definition file V4.2
generated at Sat Aug 27 16:40:25 2005


No. of patterns : 78
No. of input units : 20
No. of output units : 1

# Input pattern 1:
0.1 0 0.2 0.2 0 0.1 0 0 0 0 0 0.1 0 0.3 0 0 0 0 0 0
# Output pattern 1:
1
# Input pattern 2:
0.1 0 0 0.5 0 0 0 0 0.1 0.1 0 0 0 0 0.1 0 0 0.1 0 0
# Output pattern 2:
1

Output file of SNNS

The output file of SNNS is shown below. The result file starts with a summary of the run; then, for each pattern, the input vector is listed, followed by the teaching output (1) and the output computed by SNNS (e.g. 0.64832 for pattern 1).

SNNS result file V1.4-3D
generated at Tue Aug 30 08:58:52 2005

No. of patterns : 26
No. of input units : 20
No. of output units : 1
startpattern : 1
endpattern : 26
input patterns included
teaching output included

#1.1
0.1 0 0.1 0 0 0 0 0.1 0.1 0.2 0 0 0 0 0 0.1 0.1 0 0 0.2
1
0.64832
#2.1
0 0 0.1 0.1 0.3 0.1 0 0 0 0 0 0 0.2 0.1 0 0 0.1 0 0 0
1
0.6276

The outputs of SNNS are processed at different thresholds (0.1 to 1), and parameters like sensitivity, specificity, and accuracy are calculated. The artificial neural network tries to separate positive from negative examples; here we take the example of IgE epitopes and non-epitopes. We need a data set of IgE epitopes (positive set) and a negative set (non-epitopes).
The network is trained on the training sets, validated with one set (to stop overfitting) and then tested on the left-out testing set. Each set contains an equal number of sequences. In five-fold cross-validation the rotation looks like this, so that each set serves once for validation and once for testing:

Training set    Validation set    Testing set
set 1,2,3       set 4             set 5
set 1,2,5       set 3             set 4
set 1,4,5       set 2             set 3
set 3,4,5       set 1             set 2
set 2,3,4       set 5             set 1

Processing of output data

The output data are processed and interpreted as shown below (Thres = threshold; Sen = sensitivity; Spe = specificity; Acc = accuracy; PPV = positive prediction value):

Thres     Sen       Spe       Acc       PPV
1.0000    0.0000    0.0000    0.0000    0.0000
0.9000    0.0214    0.9929    0.5071    0.7500
0.8000    0.1429    0.9857    0.5643    0.9091
0.7000    0.2571    0.9571    0.6071    0.8571
0.6000    0.5143    0.8357    0.6750    0.7579
0.5000    0.7214    0.7214    0.7214    0.7214
0.4500    0.8071    0.6000    0.7036    0.6686
0.4000    0.8571    0.4714    0.6643    0.6186
0.3000    0.9571    0.3286    0.6429    0.5877
0.2000    1.0000    0.1000    0.5500    0.5263
0.1000    1.0000    0.0071    0.5036    0.5018
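A table like the one above can be produced with a few lines of code once the network outputs and the teaching outputs have been parsed out of the SNNS result file. The following sketch assumes they are already available as two parallel lists; the names and the tiny example data are illustrative:

def metrics(scores, labels, threshold):
    # A sequence is predicted positive when the SNNS output meets the threshold.
    tp = sum(s >= threshold and y == 1 for s, y in zip(scores, labels))
    fp = sum(s >= threshold and y == 0 for s, y in zip(scores, labels))
    fn = sum(s < threshold and y == 1 for s, y in zip(scores, labels))
    tn = sum(s < threshold and y == 0 for s, y in zip(scores, labels))
    sen = tp / (tp + fn) if tp + fn else 0.0   # sensitivity
    spe = tn / (tn + fp) if tn + fp else 0.0   # specificity
    acc = (tp + tn) / len(labels)              # accuracy
    ppv = tp / (tp + fp) if tp + fp else 0.0   # positive prediction value
    return sen, spe, acc, ppv

scores = [0.64832, 0.6276, 0.31, 0.12]  # SNNS outputs (illustrative)
labels = [1, 1, 0, 0]                   # teaching outputs
for thr in (0.9, 0.7, 0.5, 0.3, 0.1):
    row = (thr,) + metrics(scores, labels, thr)
    print("%.1f  Sen=%.4f  Spe=%.4f  Acc=%.4f  PPV=%.4f" % row)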
How to use SVMlight efficiently for Implementing SVM

Sneh Lata Pandey

Description
SVMlight is an implementation of Vapnik's Support Vector Machine (SVM) in C. It handles the problem of pattern recognition, the problem of regression, and the problem of learning a ranking function. The algorithm has scalable memory requirements and can handle problems with many thousands of support vectors efficiently. The software also provides methods for assessing the generalization performance efficiently; it includes two efficient estimation methods for both error rate and precision/recall.

Source Code and Binaries
The program is free for scientific use. The software must not be further distributed without prior permission of the author. The implementation was developed on Solaris 2.5 with gcc, but compiles also on SunOS 3.1.4, Solaris 2.7, Linux, IRIX, Windows NT, and Powermac.
The source code is available at the following location:
http://download.joachims.org/svm_light/current/svm_light.tar.gz
If you just want the binaries, you can download them for the following systems:
Solaris: http://download.joachims.org/svm_light/current/svm_light_solaris.tar.gz
Windows: http://download.joachims.org/svm_light/current/svm_light_windows.zip
Cygwin: http://download.joachims.org/svm_light/current/svm_light_cygwin.tar.gz
Linux: http://download.joachims.org/svm_light/current/svm_light_linux.tar.gz

Installation
To install SVMlight you need to download svm_light.tar.gz. Create a new directory:
mkdir svm_light
Move svm_light.tar.gz to this directory and unpack it with
gunzip -c svm_light.tar.gz | tar xvf -
Now execute
make
or
make all
which compiles the system and creates the two executables
svm_learn (learning module)
svm_classify (classification module)

How to use
This section explains how to use the SVMlight software. SVMlight consists of a learning module (svm_learn) and a classification module (svm_classify). The classification module can be used to apply the learned model to new examples. See also the examples below for how to use svm_learn and svm_classify.
svm_learn is called with the following parameters:
svm_learn [options] example_file model_file
Available options are:

General options:
  -?          - this help
  -v [0..3]   - verbosity level (default 1)
Learning options:
  -z {c,r,p}  - select between classification (c), regression (r), and preference ranking (p) (see [Joachims, 2002c]) (default classification)
  -c float    - C: trade-off between training error and margin (default [avg. x*x]^-1)
  -w [0..]    - epsilon width of tube for regression (default 0.1)
  -j float    - Cost: cost-factor by which training errors on positive examples outweigh errors on negative examples (default 1) (see [Morik et al., 1999])
  -b [0,1]    - use biased hyperplane, i.e. a decision rule based on x*w + b, instead of an unbiased hyperplane based on x*w (default 1)
  -i [0,1]    - remove inconsistent training examples and retrain (default 0)
Performance estimation options:
  -x [0,1]    - compute leave-one-out estimates (default 0)
  -o ]0..2]   - value of rho for XiAlpha-estimator and for pruning leave-one-out computation (default 1.0) (see [Joachims, 2002a])
  -k [0..100] - search depth for extended XiAlpha-estimator (default 0)
Kernel options:
  -t int      - type of kernel function:
                0: linear (default)
                1: polynomial (s a*b + c)^d
                2: radial basis function exp(-gamma ||a-b||^2)
                3: sigmoid tanh(s a*b + c)
                4: user-defined kernel from kernel.h
  -d int      - parameter d in polynomial kernel
  -g float    - parameter gamma in rbf kernel
  -s float    - parameter s in sigmoid/poly kernel
  -r float    - parameter c in sigmoid/poly kernel
  -u string   - parameter of user-defined kernel
Optimization options (see [Joachims, 1999a], [Joachims, 2002a]):
  -q [2..]    - maximum size of QP-subproblems (default 10)
  -n [2..q]   - number of new variables entering the working set in each iteration (default n = q). Set n < q to prevent zig-zagging.
  -m [5..]    - size of cache for kernel evaluations in MB (default 40). The larger the faster...
  -e float    - eps: allow that error for termination criterion [y [w*x+b] - 1] = eps (default 0.001)
  -h [5..]    - number of iterations a variable needs to be optimal before it is considered for shrinking (default 100)
  -f [0,1]    - do a final optimality check for variables removed by shrinking. Although this test is usually positive, there is no guarantee that the optimum was found if the test is omitted. (default 1)
  -y string   - if this option is given, reads alphas from the file with the given name and uses them as the starting point (default 'disabled')
  -# int      - terminate optimization if there is no progress after this number of iterations (default 100000)
Output options:
  -l char     - file to write predicted labels of unlabeled examples into after transductive learning
  -a char     - write all alphas to this file after learning (in the same order as in the training set)

A more detailed description of the parameters and how they link to the respective algorithms is given in the appendix of [Joachims, 2002a].
The input file example_file contains the training examples. The first lines may contain comments; these are ignored if they start with #. Each of the following lines represents one training example and is of the following format:

<line>    .=. <target> <feature>:<value> <feature>:<value> ... <feature>:<value> # <info>
<target>  .=. +1 | -1 | 0 | <float>
<feature> .=. <integer> | "qid"
<value>   .=. <float>
<info>    .=. <string>

A space character separates the target value and each of the feature/value pairs. Feature/value pairs MUST be ordered by increasing feature number. Features with value zero can be skipped. The string <info> can be used to pass additional information to the kernel (e.g. non-feature-vector data).
In classification mode, the target value denotes the class of the example: +1 marks a positive example, -1 a negative example. So, for example, the line

-1 1:0.43 3:0.12 9284:0.2 # abcdef

specifies a negative example for which feature number 1 has the value 0.43, feature number 3 has the value 0.12, feature number 9284 has the value 0.2, and all other features have value 0. In addition, the string abcdef is stored with the vector, which can serve as a way of providing additional information for user-defined kernels. The order of the predictions is the same as in the training data. In regression mode, the <target> contains the real-valued target value.
In all modes, the result of svm_learn is the model which is learned from the training data in example_file. The model is written to model_file. To make predictions on test examples, svm_classify reads this file. svm_classify is called with the following parameters:

svm_classify [options] example_file model_file output_file

Available options are:
  -h          - Help.
  -v [0..3]   - Verbosity level (default 2).
  -f [0,1]    - 0: old output format of V1.0; 1: output the value of the decision function (default)

The test examples in example_file are given in the same format as the training examples (possibly with 0 as class label). For all test examples in example_file the predicted values are written to output_file. There is one line per test example in output_file, containing the value of the decision function on that example. For classification, the sign of this value determines the predicted class. For regression, it is the predicted value itself, and for ranking the value can be used to order the test examples. The test example file has the same format as the one for svm_learn. Again, <class> can have the value zero, indicating unknown.
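Because the input format is easy to get subtly wrong (pairs must be sorted by increasing feature number, zero-valued features should be omitted), a small helper for writing such lines can be useful. The sketch below is not part of SVMlight; the function name and usage are assumptions:

def svmlight_line(target, features, info=None):
    # features: dict mapping feature number -> value. Pairs must be ordered
    # by increasing feature number; zero-valued features can be skipped.
    pairs = " ".join("%d:%g" % (f, v) for f, v in sorted(features.items()) if v != 0)
    line = "%s %s" % (target, pairs)
    if info is not None:
        line += " # " + info  # optional string passed through to user-defined kernels
    return line

# Reproduces the negative example from the text (pass "+1" for positives):
print(svmlight_line(-1, {9284: 0.2, 1: 0.43, 3: 0.12, 7: 0.0}, "abcdef"))
# -> -1 1:0.43 3:0.12 9284:0.2 # abcdef

The same helper serves for test files: pass 0 as the target when the label is unknown.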
How to use HMMER

Manish Kumar

Installation

Quick installation instructions: configuring, compiling, and installing a source code distribution

Download the source tarball (hmmer.tar.gz) from ftp://ftp.genetics.wustl.edu/pub/eddy/hmmer/ or http://hmmer.wustl.edu/.
Unpack the software:
> tar xvf hmmer.tar.gz
Go into the newly created top-level directory (named hmmer-xx, where xx is a release number):
> cd hmmer-2.3.2
Configure for your system, and build the programs:
> ./configure
> make
Run the automated testsuite. This is optional. All these tests should pass:
> make check
The programs are now in the src/ subdirectory. The man pages are in the documentation/man subdirectory. You can manually move or copy all of these to appropriate locations if you want. You will want the programs to be in your $PATH.
Optionally, you can install the man pages and programs in system-wide directories. If you are happy with the default (programs in /usr/local/bin/ and man pages in /usr/local/man/man1), do:
> make install
(You might need to be root when you install, depending on the permissions on your /usr/local directories.)
That's all. Each of these steps is documented in more detail below, including how to change the default installation directories for make install.

Configuring and installing a precompiled binary distribution

Alternatively, you can obtain a precompiled binary distribution of HMMER from http://hmmer.wustl.edu/. Thanks to generous hardware support from many manufacturers, binary distributions are available for most common UNIX and UNIX-like OSs. For example, the distribution for Intel x86/GNU Linux machines is hmmer-2.3.2.bin.intel-linux.tar.gz.
After you download a binary distribution, unpack it:
> tar xvf hmmer.bin.intel-linux.tar.gz
HMMER is now in the newly created top-level directory (named hmmer-xx, where xx is a release number). Go into it:
> cd hmmer-2.3.2
You don't really need to do anything else. The programs are in the binaries/ subdirectory. The man pages are in the documentation/man subdirectory.
The PDF copy of the User Guide is in the top-level HMMER directory (Userguide.pdf). You can manually move or copy all of these to appropriate locations if you want. You will want the programs to be in your $PATH.
However, you'll often want to install in a more permanent place. To configure with the default locations (programs in /usr/local/bin/ and man pages in /usr/local/man/man1) and install everything, do:
> ./configure
> make install
If you want to install in different places than the defaults, keep reading; see the beginning of the section on running the configure script.

System requirements and portability

HMMER is designed to run on UNIX platforms. The code is POSIX-compliant ANSI C. You need a UNIX operating system to run it, and an ANSI C compiler if you want to build it from source. Linux and Apple Macintosh OS X both count as UNIX; Microsoft operating systems do not. However, HMMER is known to be easy to port to Microsoft Windows and other non-UNIX operating systems, provided that the platform supports ANSI C and some reasonable level of POSIX compliance.
Running the testsuite (make check) requires that you have Perl (specifically, /usr/bin/perl). However, Perl isn't necessary to make HMMER work.
HMMER has support for two kinds of parallelization: POSIX multithreading and PVM (Parallel Virtual Machine) clustering. Both are optional and not compiled in by default; they are enabled by passing the --enable-threads or --enable-pvm options to the ./configure script before compilation. The precompiled binary distributions generally support multithreading but not PVM.

Tutorial

Here's a tutorial walk-through of some small projects with HMMER. This section should be sufficient to get you started on work of your own, and you can (at least temporarily) skip the rest of the Guide.

The programs in HMMER

There are currently nine programs supported in the HMMER 2 package:

hmmalign      Align sequences to an existing model.
hmmbuild      Build a model from a multiple sequence alignment.
hmmcalibrate  Takes an HMM and empirically determines parameters that are used to make searches more sensitive, by calculating more accurate expectation value scores (E-values).
hmmconvert    Convert a model file into different formats, including a compact HMMER 2 binary format, and "best effort" emulation of GCG profiles.
hmmemit       Emit sequences probabilistically from a profile HMM.
hmmfetch      Get a single model from an HMM database.
hmmindex      Index an HMM database.
hmmpfam       Search an HMM database for matches to a query sequence.
hmmsearch     Search a sequence database for matches to an HMM.

Files used in the tutorial

The subdirectory tutorial/ in the HMMER distribution contains the files used in the tutorial, as well as a number of examples of various file formats that HMMER reads. The important files for the tutorial are:

globins50.msf   An alignment file of 50 aligned globin sequences, in GCG MSF format.
globins630.fa   A FASTA format file of 630 unaligned globin sequences.
fn3.sto         An alignment file of fibronectin type III domains, in Stockholm format. (From Pfam 8.0.)
rrm.sto         An alignment file of RNA recognition motif domains, in Stockholm format. (From Pfam 8.0.)
rrm.hmm         An example HMM, built from rrm.sto.
pkinase.sto     An alignment file of protein kinase catalytic domains, in Stockholm format. (From Pfam 8.0.)
Artemia.fa      A FASTA file of brine shrimp globin, which contains nine tandemly repeated globin domains.
7LESDROME       A SWISSPROT file of the Drosophila Sevenless sequence, a receptor tyrosine kinase with multiple domains.
RU1AHUMAN       A SWISSPROT file of the human U1A protein sequence, which contains two RRM domains.

Create a new directory that you can work in, and copy all the files in tutorial/ there. I'll assume for the following examples that you've installed the HMMER programs in your path; if not, you'll need to give a complete path name to the HMMER programs (e.g. something like /usr/people/eddy/hmmer-2.2/binaries/hmmbuild instead of just hmmbuild).
  • 15. you’ll need to give a complete path name to the HMMER programs (e.g. something like /usr/people/eddy/hmmer-2.2/binaries/hmmbuild instead of just hmmbuild). Format of input alignment files HMMER starts with a multiple sequence alignment file that you provide. HMMER can read alignments in several common formats, including the output of the CLUSTAL family of programs, Wisconsin/GCG MSF format, the input format for the PHYLIP phylogenetic analysis programs, and “alighed FASTA” format (where the sequences in a FASTA file contain gap symbols, so that they are all the same length). HMMER’s native alignment format is called Stockholm format, the format of the Pfam protein database that allows extensive markup and annotation. All these formats are documented in a later section. The software autodetects the alignment file format, so you don’t have to worry about it. Most of the example alignments in the tutorial are Stockholm files. rrm.sto is a simple example (generated by stripping all the extra annotation off of a Pfam RNA recognition motif seed alignment). pkinase.sto and fn3.sto are original Pfam seed alignments, with all their annotation. Searching a sequence database with a single profile HMM One common use of HMMER is to search a sequence database for homologues of a protein family of interest. You need a multiple sequence alignment of the sequence family you’re interested in. � Can I build a model from unaligned sequences? In principle, profile HMMs can be trainedfrom  unaligned sequences; however, this functionality is temporarily withdrawn from HMMER.  Irecommend CLUSTALW as an excellent, freely available multiple sequence alignment  program.The original hmmt profile HMM training program from HMMER 1 is also still available,  fromftp://ftp.genetics.wustl.edu/pub/eddy/hmmer/hmmer­1.8.4.tar.Z. build a profile HMM with hmmbuild Let’s assume you have a multiple sequence alignment of a protein domain or protein sequence family. To use HMMER to search for additional remote homologues of the family, you want to first build a profile HMM from the alignment. The following command builds a profile HMM from the alignment of 50 globin sequences in globins50.msf: > hmmbuild globin.hmm globins50.msf This gives the following output: hmmbuild -build a hidden Markov model from an alignmentHMMER 2.3 (April 2003)Copyright (C) 1992-2003 HHMI/Washington University School of MedicineFreely distributed under the GNU General Public License (GPL) Alignment file: globins50.msf 20 File format: MSFSearch algorithm configuration: Multiple domain (hmmls)Model construction strategy: MAP (gapmax hint: 0.50)Null model used: (default)Prior 7-15
The process takes a second or two. hmmbuild creates a new HMM file called globin.hmm. This is a human- and computer-readable ASCII text file, but for now you don't care. You also don't care for now what all the stuff in the output means; I'll describe it in detail later. The profile HMM can be treated as a compiled model of your alignment.

Calibrate the profile HMM with hmmcalibrate

This step is optional, but doing it will increase the sensitivity of your database search. When you search a sequence database, it is useful to get "E-values" (expectation values) in addition to raw scores. When you see a database hit that scores x, an E-value tells you the number of hits you would've expected to score x or more just by chance in a sequence database of this size. HMMER will always estimate an E-value for your hits. However, unless you "calibrate" your model before a database search, HMMER uses an analytic upper bound calculation that is extremely conservative. An empirical HMM calibration costs time (about 10% of the time of a SWISSPROT search) but it only has to be done once per model, and can greatly increase the sensitivity of a database search. To empirically calibrate the E-value calculations for the globin model, type:

> hmmcalibrate globin.hmm

which results in:

hmmcalibrate -- calibrate HMM search statistics
HMMER 2.3 (April 2003)
Copyright (C) 1992-2003 HHMI/Washington University School of Medicine
Freely distributed under the GNU General Public License (GPL)

HMM file:                  globin.hmm
Length distribution mean:  325
Length distribution s.d.:  200
Number of samples:         5000
random seed:               1051632537
histogram(s) saved to:     [not saved]
POSIX threads:             4

HMM    : globins50
mu     : -39.897396
lambda :   0.226086
max    :  -9.567000
//

This might take several minutes, depending on your machine. Go have a cup of coffee. When it is complete, the relevant parameters are added to the HMM file. (Note from the "POSIX threads: 4" line that I'm running on 4 CPUs on a quad-processor box. I'm impatient.)
Calibrated HMMER E-values tend to be relatively accurate. E-values of 0.1 or less are, in general, significant hits. Uncalibrated HMMER E-values are also reliable, erring on the cautious side; uncalibrated models may miss remote homologues.
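The mu and lambda values reported above are the parameters of the extreme value (Gumbel) distribution that hmmcalibrate fits to the random-sequence scores. Assuming that parameterization, the relation between a bit score and its P-value and E-value can be sketched as follows; HMMER performs this calculation itself, and the score and database size below are purely illustrative:

import math

def evd_pvalue(score, mu, lam):
    # P(S >= score) for a Gumbel distribution with location mu and scale 1/lambda.
    return 1.0 - math.exp(-math.exp(-lam * (score - mu)))

def evalue(score, mu, lam, db_size):
    # Expected number of chance hits scoring at least this well among db_size sequences.
    return db_size * evd_pvalue(score, mu, lam)

# Parameters from the hmmcalibrate run above; a hypothetical 20-bit hit
# against a database of 100,000 sequences:
print(evalue(20.0, mu=-39.897396, lam=0.226086, db_size=100000))  # ~0.13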
Why doesn't hmmcalibrate always give the same output if I run it on the same HMM? It's fitting a distribution to the scores obtained from a random (Monte Carlo) simulation of a small sequence database, and this random sequence database is different each time. You can make hmmcalibrate give reproducible results by making it initialize its random number generator with the same seed, using the --seed <x> option, where x is any positive integer. By default, it chooses a "random" seed, which it reports in the output header. You can reproduce an hmmcalibrate run by passing this number as the seed. (Trivia: the default seed is the number of seconds that have passed since the UNIX "epoch", usually January 1, 1970. hmmcalibrate runs started in the same second will give identical results. Beware, if you're trying to measure the variance of HMMER's estimated lambda and mu parameters...)

Search the sequence database with hmmsearch

As an example of searching for new homologues using a profile HMM, we'll use the globin model to search for globin domains in the example Artemia globin sequence in Artemia.fa:

> hmmsearch globin.hmm Artemia.fa

The output comes in several sections, and unlike building and calibrating the HMM, where we treated the HMM as a black box, now you do care about what it's saying. The first section is the header that tells you what program you ran, on what, and with what options:

hmmsearch - search a sequence database with a profile HMM
HMMER 2.3 (April 2003)
Copyright (C) 1992-2003 HHMI/Washington University School of Medicine
Freely distributed under the GNU General Public License (GPL)

HMM file:                   globin.hmm [globins50]
Sequence database:          Artemia.fa
per-sequence score cutoff:  [none]
per-domain score cutoff:    [none]
per-sequence Eval cutoff:   <= 10
per-domain Eval cutoff:     [none]

Query HMM:     globins50
Accession:     [none]
Description:   [none]
  [HMM has been calibrated; E-values are empirical estimates]

The second section is the sequence top hits list. It is a list of ranked top hits (sorted by E-value, most significant hit first), formatted in a BLAST-like style:

Scores for complete sequences (score includes all domains):
Sequence   Description                    Score    E-value   N
S13421     S13421 GLOBIN - BRINE SHRIMP   474.3   1.7e-143