3. INTRODUCTION
● The Protein Information Resource (PIR) is an integrated public bioinformatics
resource to support genomic, proteomic and systems biology research and
scientific studies.
● The PIR database evolved from the original NBRF Protein Sequence Database,
developed over a 20-year period by the late Margaret O. Dayhoff and published
as the ‘Atlas of Protein Sequence and Structure’. PIR-International is a
collaboration established in 1988 between the NBRF,
the Munich Information Center for Protein Sequences (MIPS), and the Japan
International Protein Information Database (JIPID) to collect and publish what
is now the oldest database of biomolecular sequence, source, bibliographic and
feature information.
6. Missions of PIR
1. To create and maintain the Protein Sequence Database as a comprehensive,
non-redundant, well-verified collection, organized according to biological
principles, including structural, functional and evolutionary relationships.
2. To provide a research tool that supports the study of protein sequences, their
structural and functional properties, and their biological origins.
3. To freely distribute the database to the public by the most accessible means
including the PIR Web site (see Table 1) and CD-ROM.
4. To collaborate with other databases in organizing and coordinating the
presentation of biomolecular structural information.
9. FEATURES OF THE PROTEIN SEQUENCE DATABASE
1. Non-redundancy: The database is non-redundant; identical and highly similar
sequences from the same species are merged into a single entry. In merged
entries, each separately reported sequence is represented in a manner that
clearly shows any differences with the canonical sequence shown in the entry
and that allows the reported sequence to be reconstructed on the PIR Web site.
2. Classification: PIR sequences are classified by sequence similarity into
superfamilies, families and homology domains. Alignments of these
families are available. Full-scale family classification assists database organization,
improves database integrity and supports database searches by gene families.
3. Standardized annotation: The PIR Database is a value-added database in which
entries are annotated to include important features not found in the original
submission. Full citations are given, including article titles. Genetic information is
provided, including map position, intron positions and start codon (if different from
AUG). Feature annotations and other terminology have been standardized, and
restricted vocabularies are enforced to provide greater accuracy and consistency.
4. Cross references: To optimize information retrieval, PIR entries are
cross-referenced to major molecular and reference databases, including Medline,
GenBank, EMBL, DDBJ, Protein Data Bank, Human Genome Database and others.
Hypertext links to the cross-referenced database entry are available on the PIR
Web site.
5. Comprehensiveness: The Protein Sequence Database, supplemented with other
PIR-maintained databases, comprises the most comprehensive collection of non-
redundant protein sequences available.
6. Public domain with regular releases: The database is freely available to the public
and has been updated and released four times per year since 1988.
Weekly interim updates of the database are available for searching and browsing on
the PIR Web site. All sequence data are available to the public as soon as they are
available to the PIR staff.
7. Information retrieval: The database serves as a major information resource to
support biological research. Retrieval and knowledge discovery are facilitated by a
variety of search options, including searches on database fields (such as superfamily,
authors, features and keywords) and direct sequence similarity searches against
the database. Family classification and multiple sequence alignments, coupled with
extensive hypertext links, make it possible to rapidly find and retrieve information
on related sequences in PIR and other molecular databases.
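
As an illustration of the non-redundancy feature (item 1 above), the following Python sketch merges identical sequences from the same species into one entry and records how a separately reported sequence differs from the canonical one, so that it can be reconstructed later. This is a minimal sketch, not PIR's actual procedure: the function names, the record layout and the sample identifiers are hypothetical, and "highly similar" is simplified here to equal-length sequences that differ only by substitutions.

```python
from collections import defaultdict


def merge_identical(records):
    """Merge (species, sequence, source_id) records.

    Identical sequences from the same species end up in a single entry
    that lists every source report (hypothetical record layout).
    """
    merged = defaultdict(list)
    for species, sequence, source_id in records:
        merged[(species, sequence.upper())].append(source_id)
    return [
        {"species": species, "sequence": sequence, "sources": sorted(ids)}
        for (species, sequence), ids in merged.items()
    ]


def differences(canonical, reported):
    """Return (position, canonical_residue, reported_residue) tuples.

    Positions are 1-based. Assumption: substitutions only, so both
    sequences must have the same length.
    """
    if len(canonical) != len(reported):
        raise ValueError("this sketch handles substitutions only")
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(canonical, reported))
        if a != b
    ]


def reconstruct(canonical, diffs):
    """Rebuild a reported sequence from the canonical one plus its differences."""
    residues = list(canonical)
    for position, _canonical_residue, reported_residue in diffs:
        residues[position - 1] = reported_residue
    return "".join(residues)
```

For example, calling `differences("MKTAYIAK", "MKTVYIAK")` records a single substitution at position 4, and passing that list to `reconstruct` together with the canonical sequence returns the reported sequence again.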
23. INTRODUCTION
MINT is a database designed to store data on functional
interactions between proteins. Beyond cataloguing binary
complexes, MINT was conceived to store other types of
functional interactions, including enzymatic modifications of
one of the partners. Release 1.0 of MINT focuses on
experimentally verified protein-protein interactions. Both
direct and indirect relationships are considered.
Furthermore, MINT aims to be exhaustive in its description
of each interaction: whenever available, information about
kinetic and binding constants and about the domains
participating in the interaction is included in the entry.
● MINT consists of entries extracted from the
scientific literature by expert curators
assisted by ‘MINT Assistant’, a software
tool that identifies abstracts containing
interaction information and presents them
to the curator in a user-friendly format.
RESOURCES
● The interaction data can be easily extracted and viewed
graphically through ‘MINT Viewer’. Presently MINT
contains 4568 interactions, 782 of which are indirect or
genetic interactions.
● MINT is a relational database designed to collect and
integrate experimental protein interaction data in a
single database accessible via a user-friendly web
interface written in PHP (Personal Home Page), an
HTML-embedded scripting language. The MINT core is stored in
an SQL server (PostgreSQL). The entity relationship model
underlying the database structure is shown in
simplified form in the figure that follows.
[Figure: simplified entity relationship model of the MINT database]
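
To make the relational design more concrete, here is a small Python sketch of how proteins and their interactions could be laid out in two tables and queried. It is only a sketch under stated assumptions: the table names, column names and sample rows are hypothetical and are not taken from the MINT schema, and SQLite (from the Python standard library) stands in for the PostgreSQL server so that the example runs without any setup.

```python
import sqlite3

# Hypothetical two-table layout; the real MINT entity relationship model
# is not reproduced in this text.
SCHEMA = """
CREATE TABLE protein (
    accession TEXT PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE interaction (
    id               INTEGER PRIMARY KEY,
    partner_a        TEXT NOT NULL REFERENCES protein(accession),
    partner_b        TEXT NOT NULL REFERENCES protein(accession),
    interaction_type TEXT NOT NULL,
    pmid             INTEGER
);
"""


def build_demo_db():
    """Create an in-memory database filled with placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO protein (accession, name) VALUES (?, ?)",
        [("ACC_A", "protein A"), ("ACC_B", "protein B"), ("ACC_C", "protein C")],
    )
    conn.executemany(
        "INSERT INTO interaction (partner_a, partner_b, interaction_type, pmid) "
        "VALUES (?, ?, ?, ?)",
        [("ACC_A", "ACC_B", "binds", None), ("ACC_C", "ACC_A", "phosphorylates", None)],
    )
    conn.commit()
    return conn


def interactions_for(conn, accession):
    """Return every interaction in which the given protein is a partner."""
    cursor = conn.execute(
        "SELECT id, partner_a, partner_b, interaction_type FROM interaction "
        "WHERE partner_a = ? OR partner_b = ? ORDER BY id",
        (accession, accession),
    )
    return cursor.fetchall()
```

Keeping the two partners as foreign keys into a single protein table means a protein is described once and can take part in any number of interactions, which is the usual reason for choosing a relational layout for this kind of data.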
27. Data submission and MINT Assistant
● MINT entries are curated by expert biologists who carefully screen the
interaction information published in peer-reviewed journals.
● Each entry contains ‘core’ information consisting of the SWISS-PROT/TrEMBL
(Translation of EMBL nucleotide sequence database) accession numbers of the
two proteins and the specification of the functional interaction (binds,
activates, phosphorylates, …). Most of the entries in the database currently
refer to a PubMed identifier (PMID). Unpublished observations, however, can
also be added to the database.
● Furthermore, the curator can enter information about the domains that are
demonstrated to be involved in the interaction, the binding and/or kinetic
constants and the experimental method(s) utilized to characterize the
interaction.
● The MINT Assistant software scans titles and abstracts extracted from the
scientific literature, counting words that are frequent in papers describing
protein-protein interactions, essentially as described previously. When a protein
name is identified, the program also registers the gene name, protein accession
number and any other information that is required to complete a protein-protein
interaction entry in MINT.
● The software output consists of several HTML pages that can be viewed in a
web browser. The front page displays the abstract titles, ranked by their
likelihood of containing information about protein interactions, as assessed
by a statistical algorithm. By clicking a title, the MINT curator opens a
new HTML page that provides the information needed to complete the entry.
When an entry is completed, the information is stored in temporary tables,
where the data are automatically double-checked and then entered in the
MINT database tables.
● The MIPS (Munich Information Center for Protein Sequences) yeast physical
and genetic interactions tables have also been incorporated into MINT.
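
The word-counting and ranking steps described above for MINT Assistant can be sketched in Python as follows. This is a deliberately simple stand-in, not the statistical algorithm used by the tool: the cue-word list, the scoring formula (cue-word hits divided by total words) and the function names are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical cue words; the vocabulary actually used by MINT Assistant
# is not given in the text.
CUE_WORDS = {
    "interacts", "interaction", "binds", "binding",
    "complex", "phosphorylates", "two-hybrid",
}


def score(text):
    """Fraction of words in the text that are interaction cue words."""
    tokens = re.findall(r"[a-z0-9-]+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    hits = sum(counts[word] for word in CUE_WORDS)
    return hits / len(tokens)


def rank(abstracts):
    """Sort (title, abstract) pairs so the most promising titles come first."""
    scored = [(score(title + " " + body), title) for title, body in abstracts]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _score, title in scored]
```

A curator-facing front page would then list the titles in the order returned by `rank`, so that abstracts unlikely to describe an interaction sink to the bottom of the list.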
30. Searching, browsing and visualizing a protein network
● Searches can be performed via protein name, accession number or keywords.
The search returns a list of entries containing the query names or keywords
(only if present in the KEYWORDS line of the SWISS-PROT entries).
● Clicking the corresponding protein ID displays all the interactions described
in MINT that have the selected protein as one of the partners. Each
interaction ID in the output page is hyperlinked to a MINT entry.
● To produce the output, the information about a specific interaction is
retrieved from one of the MINT tables and composed in two frames.
1. The first frame contains information about the interacting proteins.
2. The second shows the features of the interaction itself and the corresponding
experimental procedure.
3. Finally, a third frame graphically displays the network of the
interacting proteins as produced by ‘MINT Viewer’,
as shown in the image on the next slide.
[Figure: network of interacting proteins displayed by MINT Viewer]
● MINT Viewer is based on a Java applet derived from Sun's ‘Graph’ applet
(http://java.sun.com) and adapted for use in this database. Proteins are
represented by ovals whose size is proportional to molecular weight. Protein
interactions are represented by lines (edges) connecting the proteins (nodes).
Both nodes and edges are interactive: clicking one either displays
additional information about the partner proteins and their
interaction or expands the displayed network.
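
The node-and-edge model behind the viewer can be sketched in a few lines of Python. This is an illustrative data structure only, assuming an undirected graph; the class and method names are hypothetical and do not come from the MINT Viewer applet, and the sizing function simply applies an arbitrary scale factor to the molecular weight.

```python
from collections import defaultdict


class InteractionNetwork:
    """Undirected graph: proteins are nodes, interactions are edges."""

    def __init__(self):
        self._partners = defaultdict(set)

    def add_interaction(self, protein_a, protein_b):
        """Record an interaction (edge) between two proteins."""
        self._partners[protein_a].add(protein_b)
        self._partners[protein_b].add(protein_a)

    def partners(self, protein):
        """All proteins that interact directly with the given protein."""
        return sorted(self._partners.get(protein, ()))

    def expand(self, displayed, protein):
        """Mimic a click on a node: add its direct partners to the view."""
        return set(displayed) | {protein} | self._partners.get(protein, set())


def node_size(molecular_weight, scale=0.001):
    """Oval size proportional to molecular weight (scale is an arbitrary choice)."""
    return molecular_weight * scale
```

Starting from a view that shows a single protein, repeated calls to `expand` grow the displayed set one neighbourhood at a time, which is one simple way to model the "expansion of the displayed network" that a click triggers.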
Current status of MINT
● As of 1 November 2001, the MINT database contained interaction information
about 3556 proteins from 64 different organisms. These proteins participate in
3786 pairwise interactions, 3 multimeric complexes, and 782 ‘indirect’
interactions.
● 76% of the interactions rely on a single experimental procedure, mostly yeast
two-hybrid, while as many as 206 interactions are supported by three independent
approaches.
● More than 700 articles have been processed manually by curators, and 569
entries describe interactions between proteins of mammalian organisms.
● It is likely that, as the number of interactions in MINT increases, the smaller
clusters will merge into a single network.
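
The expectation that smaller clusters merge as interactions accumulate can be illustrated by counting connected components in the interaction graph. The Python sketch below uses a union-find structure for this; it is an illustration of the idea rather than an analysis performed on MINT data, and the function name and the toy protein labels are hypothetical.

```python
def count_clusters(proteins, interactions):
    """Count connected clusters of proteins using union-find.

    proteins: iterable of protein identifiers.
    interactions: iterable of (protein_a, protein_b) pairs.
    """
    proteins = list(proteins)
    parent = {protein: protein for protein in proteins}

    def find(protein):
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[protein] != protein:
            parent[protein] = parent[parent[protein]]
            protein = parent[protein]
        return protein

    for protein_a, protein_b in interactions:
        # Each interaction joins the clusters of its two partners.
        parent[find(protein_a)] = find(protein_b)

    return len({find(protein) for protein in proteins})
```

With five toy proteins A to E and the interactions A-B and C-D, the function reports three clusters; adding B-C leaves two, and adding D-E leaves a single network.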