InterPro and InterProScan 5.0

Event: Plant and Animal Genomes conference 2012
Speaker: Sandra Orchard

InterPro is an open-source protein resource used for the automatic annotation of proteins. It scales to the analysis of entire new genomes through a downloadable version of InterProScan, which can be incorporated into an existing local pipeline. InterPro integrates protein signatures from 11 major signature databases (CATH-Gene3D, HAMAP, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY, and TIGRFAMs) into a single resource, taking advantage of each database's area of specialization to provide protein classification on multiple levels: protein families, structural superfamilies and functionally close subfamilies, as well as functional domains, repeats and important sites. The InterPro website has been improved following extensive community consultation, and a new version of InterProScan promises improved speed and ease of implementation, as well as additional functionality.

  • Mention why this needs to be InterPro-specific: we have to cover a lot of different member database definitions.
  • Talk more about how we do GO mapping in InterPro.

    1. EBI is an Outstation of the European Molecular Biology Laboratory.
    2. InterPro (http://www.ebi.ac.uk/interpro) • is a database that groups predictive protein signatures together • 11 member databases • single searchable resource • provides functional analysis of proteins by classifying them into families and predicting domains and important sites • enables whole genome analysis
    3. InterPro Consortium: consortium of 11 major signature databases
    4. Protein signatures • More sensitive homology searches • Each member database creates signatures using different methods and methodologies: manually created sequence alignments; automatic processes with some human input and correction; entirely automatic processes.
    5. Why do we need predictive annotation tools? [Chart: number of sequences in UniProtKB and UniProtKB/Swiss-Prot by date, 5-Jan-04 to 5-Jan-10; sequence axis from 0 to 14,000,000]
    6. What are protein signatures? [Diagram labels: Multiple sequence alignment; Protein family/domain; Build model; Search; Mature model; example sequences ITWKGPVCGLDGKTYRNECALL and AVPRSPVCGSDDVTYANECELK; UniProt; Significant match; Protein analysis]
    7. Member databases [Diagram labels: METHODS: Hidden Markov Models, Fingerprints, Profiles, Patterns, Sequence Clusters, Structural Domains; Functional annotation of families/domains; Prediction of conserved domains; Protein features (active sites…)]
    8. InterPro entry
    9. InterPro entry
    10. The InterPro entry: types • Family: proteins that share a common evolutionary origin, as reflected in their related functions, sequences or structure • Domain: distinct functional, structural or sequence units that may exist in a variety of biological contexts • Repeats: short sequences typically repeated within a protein • Sites: PTM, Active Site, Binding Site, Conserved Site
    11. InterPro Entry • Groups similar signatures together • Adds extensive annotation • Links to other databases • Structural information and viewers • Quality control • Removes redundancy
    12. InterPro Entry • Groups similar signatures together • Adds extensive annotation • Links to other databases • Structural information and viewers • Hierarchical classification
    13. InterPro hierarchies: Families. Families can have parent/child relationships with other families. Parent/child relationships are based on: • Comparison of protein hits: a child should be a subset of its parent; siblings should not have matches in common • Existing hierarchies in member databases • Biological knowledge of curators
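The two protein-hit rules on slide 13 can be illustrated with plain set operations. This is a minimal sketch, not InterPro's curation tooling: the function name `check_hierarchy` and the input shape (sets of protein accessions per entry) are assumptions made for illustration only.

```python
from typing import Dict, List, Set


def check_hierarchy(parent_hits: Set[str],
                    children_hits: Dict[str, Set[str]]) -> List[str]:
    """Report violations of the two protein-hit rules from the slide.

    parent_hits:   proteins matched by the parent entry (hypothetical input)
    children_hits: proteins matched by each child entry, keyed by child name
    """
    problems = []

    # Rule 1: a child should be a subset of its parent.
    for name, hits in children_hits.items():
        extra = hits - parent_hits
        if extra:
            problems.append(f"{name}: {len(extra)} protein(s) not matched by parent")

    # Rule 2: siblings should not have matches in common.
    names = sorted(children_hits)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            shared = children_hits[first] & children_hits[second]
            if shared:
                problems.append(f"{first}/{second}: {len(shared)} shared protein(s)")

    return problems
```

An empty list means both rules hold. According to the slide, curators also weigh existing member-database hierarchies and biological knowledge, which a check like this cannot capture.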
    14. InterPro hierarchies: Domains. Domains can have parent/child relationships with other domains
    15. Domains and Families may be linked through Domain Organisation Hierarchy
    16. InterPro Entry • Groups similar signatures together • Adds extensive annotation • Links to other databases • Structural information and viewers
    17. InterPro Entry • Groups similar signatures together • Adds extensive annotation • Links to other databases • Structural information and viewers • The Gene Ontology project provides a controlled vocabulary of terms for describing gene product characteristics
    18. InterPro Entry • Groups similar signatures together • Adds extensive annotation • Links to other databases • Structural information and viewers • Linked resources: UniProt; KEGG ...; Reactome ...; IntAct ...; UniProt taxonomy; PANDIT ...; MEROPS ...; Pfam clans ...; PubMed
    19. InterPro Entry • Groups similar signatures together • Adds extensive annotation • Links to other databases • Structural information and viewers • PDB: 3-D structures • SCOP: structural domains • CATH: structural domain classification
    20. InterProScan [Diagram: Protein Sequence + Predictive Models → Analysis algorithm → “Raw” Matches → Filtering algorithm → Reported Matches]
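The two-stage flow on slide 20 (analysis produces "raw" matches, a filter decides which are reported) can be sketched generically. Everything named here is hypothetical: `RawMatch`, `report_matches`, and the idea of passing the analysis and filter as callables are illustrative assumptions, and the slides do not say which filtering criteria each member database actually applies.

```python
from typing import Callable, Iterable, List, NamedTuple


class RawMatch(NamedTuple):
    """A hypothetical raw hit of one predictive model against a sequence."""
    model: str
    start: int
    end: int
    evalue: float


def report_matches(sequence: str,
                   analyse: Callable[[str], Iterable[RawMatch]],
                   keep: Callable[[RawMatch], bool]) -> List[RawMatch]:
    """Run the analysis step, then keep only matches that pass the filter."""
    raw_matches = analyse(sequence)                 # "Raw" matches
    return [m for m in raw_matches if keep(m)]      # Reported matches
```

Keeping the analysis and the filter as separate callables mirrors the diagram: either stage can be swapped without touching the other.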
    21. InterProScan access • Interactive: http://www.ebi.ac.uk/Tools/pfa/iprscan/ • Webservice (SOAP and REST): http://www.ebi.ac.uk/Tools/webservices/services/pfa/iprscan_rest and http://www.ebi.ac.uk/Tools/webservices/services/pfa/iprscan_soap • Downloadable: ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan/
    22. Why redesign InterProScan? • InterProScan 4 – complicated installation – complicated update – limited queuing system (only guaranteed with LSF) – limited configurability – reliability
    23. InterProScan 5.0 aims • Easy install and configuration • Modular • Expandable • Easily integrated into existing pipelines • Incorporate new data model / XML exchange format • Easy to port on to different architectures: desktop machine, simple LAN, LSF, PBS, Sun Grid Engine ... cloud? GRID? • Reliability
    24. InterProScan 5 Technology
    25. Architecture [Diagram labels: Oracle; PostgreSQL; HSQLDB; File system; Data Model; Database Access; File I/O; Business Logic: performing analyses; Job Management: scheduling analyses; JMS: monitoring queues; XML; Cluster platform; Web services; Java API; InterPro website] One-way dependencies + replaceable layers = low coupling + maintainable
    26. Java Message Service (JMS) • “Master”: schedules tasks and sub-tasks, and places them on a queue • Broker: manages queues and topics; starts workers on demand • “Worker” (several shown in the diagram): takes tasks off queues, performs a task / sub-task and reports back to the Broker • Monitoring & Management Application: web or stand-alone app to monitor and manage InterProScan • Simple and robust programming model • Mature and stable standard – current JMS version released in 2002 • Guaranteed message delivery to a single worker • Easy to monitor • Flexible – easy to implement on multiple platforms
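The master/broker/worker pattern on slide 26 can be sketched in a few lines. This is an analogy only, not InterProScan code and not JMS: Python's in-process `queue.Queue` stands in for the broker's queues, threads stand in for workers, and the names `run_master` and `work` are hypothetical.

```python
import queue
import threading
from typing import Callable, Dict, Iterable


def run_master(tasks: Iterable[str],
               work: Callable[[str], str],
               n_workers: int = 3) -> Dict[str, str]:
    """Place tasks on a queue, let workers take them off, collect results."""
    task_queue: "queue.Queue" = queue.Queue()     # stands in for the broker's task queue
    result_queue: "queue.Queue" = queue.Queue()   # workers report back here

    def worker() -> None:
        while True:
            task = task_queue.get()   # each task is handed to exactly one worker
            if task is None:          # sentinel: no more work for this worker
                break
            result_queue.put((task, work(task)))

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in threads:
        thread.start()

    for task in tasks:                # the "master" schedules the tasks
        task_queue.put(task)
    for _ in threads:                 # one sentinel per worker
        task_queue.put(None)
    for thread in threads:
        thread.join()

    results = {}
    while not result_queue.empty():
        task, outcome = result_queue.get()
        results[task] = outcome
    return results
```

The point mirrored from the slide is that a task taken off the queue goes to a single worker, so adding workers raises throughput without changing the master's logic.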
    27. Beta release functionality
    28. Installation • Requirements – Java 1.6 – Linux – Perl • Installation process – ready to use:
        wget ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan/i5-dist.tar.gz
        tar -xzf i5-dist.tar.gz
    29. Default tab-separated values output
        ./interproscan.sh -i test_proteins.fasta -o test_proteins.tsv --goterms
        A2YIW7 f927b0d241297dcc9a1c5990b58bf3c4 122 Pfam PF00085 Thioredoxin 9 112 1.3E-28 T 08-07-2011 IPR013766 Thioredoxin domain Biological Process:cell redox homeostasis (GO:0045454)
        A2YIW7 f927b0d241297dcc9a1c5990b58bf3c4 122 ProSitePatterns PS00194 Thioredoxin family active site. 32 50 - T 08-07-2011 IPR017937 Thioredoxin, conserved site Biological Process:cell redox homeostasis (GO:0045454)
        A2YIW7 f927b0d241297dcc9a1c5990b58bf3c4 122 PIRSF PIRSF000077 null 4 113 1.50000307E-27 T 08-07-2011 IPR005746 Thioredoxin Molecular Function:protein disulfide oxidoreductase activity (GO:0015035), Biological Process:glycerol ether metabolic process (GO:0006662), Biological Process:cell redox homeostasis (GO:0045454), Molecular Function:electron carrier activity (GO:0009055)
        A2YIW7 f927b0d241297dcc9a1c5990b58bf3c4 122 PRINTS PR00421 Thioredoxin family signature 39 48 - T 08-07-2011 IPR005746 Thioredoxin Molecular Function:protein disulfide oxidoreductase activity (GO:0015035), Biological Process:glycerol ether metabolic process (GO:0006662), Biological Process:cell redox homeostasis (GO:0045454), Molecular Function:electron carrier activity (GO:0009055)
        A2YIW7 f927b0d241297dcc9a1c5990b58bf3c4 122 PRINTS PR00421 Thioredoxin family signature 78 89 - T 08-07-2011 IPR005746 Thioredoxin Molecular Function:protein disulfide oxidoreductase activity (GO:0015035), Biological Process:glycerol ether metabolic process (GO:0006662), Biological Process:cell redox homeostasis (GO:0045454), Molecular Function:electron carrier activity (GO:0009055)
        A2YIW7 f927b0d241297dcc9a1c5990b58bf3c4 122 PRINTS PR00421 Thioredoxin family signature 31 39 - T 08-07-2011 IPR005746 Thioredoxin Molecular Function:protein disulfide oxidoreductase activity (GO:0015035), Biological Process:glycerol ether metabolic process (GO:0006662), Biological Process:cell redox homeostasis (GO:0045454), Molecular Function:electron carrier activity (GO:0009055)
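For use in a local pipeline, the tab-separated output on slide 29 can be read line by line. The sketch below is an assumption-laden illustration: the field names in `Match` are inferred from the sample rows on the slide rather than taken from a format specification, and `parse_line` is a hypothetical helper, not part of InterProScan.

```python
import re
from typing import List, NamedTuple


class Match(NamedTuple):
    """One row of the TSV output; field names are inferred from the slide."""
    protein: str
    md5: str
    length: int
    analysis: str
    signature: str
    description: str
    start: int
    stop: int
    score: str            # kept as text because some rows show "-"
    status: str
    date: str
    interpro: str
    interpro_description: str
    go_ids: List[str]


def parse_line(line: str) -> Match:
    """Split one tab-separated row and pull the GO identifiers out of the last column."""
    cols = line.rstrip("\n").split("\t")
    cols += [""] * (14 - len(cols))   # pad rows that lack InterPro/GO columns
    return Match(
        cols[0], cols[1], int(cols[2]), cols[3], cols[4], cols[5],
        int(cols[6]), int(cols[7]), cols[8], cols[9], cols[10],
        cols[11], cols[12], re.findall(r"GO:\d{7}", cols[13]),
    )
```

Calling `parse_line` on each line of `test_proteins.tsv` yields records that can be grouped by `interpro` or filtered by `analysis`.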
    30. XML output
        ./interproscan.sh -i test_proteins.fasta -o test_proteins.xml --goterms -F xml
        <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
        <protein-matches xmlns="http://www.ebi.ac.uk/schema/interpro">
          <protein>
            <sequence md5="f927b0d241297dcc9a1c5990b58bf3c4">MAAEEGVVIACHNKDEFDAQMTKAKEAGKVVIIDFTASWCGPCRFIAPVFAEYAKKFPGAVFLKVDVDELKEVAEKYNVEAMPTFLFIKDGAEADKVVGARKDDLQNTIVKHVGATAASASA</sequence>
            <xref id="A2YIW7"/>
            <matches>
              <fingerprints-match graphscan="III" evalue="2.500000864E-7">
                <signature name="THIOREDOXIN" desc="Thioredoxin family signature" ac="PR00421">
                  <models>
                    <model name="THIOREDOXIN" desc="Thioredoxin family signature" ac="PR00421"/>
                  </models>
                  <signature-library-release version="41.1" library="PRINTS"/>
                </signature>
                <locations>
                  <fingerprints-location score="0.0" pvalue="0.0" motifNumber="3" end="48" start="39"/>
                  <fingerprints-location score="0.0" pvalue="0.0" motifNumber="2" end="89" start="78"/>
                  <fingerprints-location score="0.0" pvalue="0.0" motifNumber="1" end="39" start="31"/>
                </locations>
              </fingerprints-match>
              <hmmer2-match score="100.5" evalue="-INF">
                <signature name="Thioredoxin" ac="PIRSF000077">
                  <models>
                    <model name="Thioredoxin" ac="PIRSF000077"/>
                  </models>
                  <signature-library-release version="2.74" library="PIRSF"/>
                </signature>
                <locations>
                  <hmmer2-location hmm-length="0" hmm-end="108" hmm-start="1" evalue="1.50000307E-27" score="0.0" end="113" start="4"/>
                </locations>
              </hmmer2-match>
              ...etc
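The XML output on slide 30 uses a default namespace, which has to be supplied when querying it. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `list_matches` is hypothetical, and `SAMPLE` is a shortened, manually closed copy of the fragment on the slide (the slide's XML is truncated, so the closing tags and the abbreviated sequence are assumptions).

```python
import xml.etree.ElementTree as ET

# Namespace taken from the xmlns attribute shown on the slide.
NS = {"ipr": "http://www.ebi.ac.uk/schema/interpro"}

# Shortened, manually closed version of the slide's fragment (assumption).
SAMPLE = """<protein-matches xmlns="http://www.ebi.ac.uk/schema/interpro">
  <protein>
    <sequence md5="f927b0d241297dcc9a1c5990b58bf3c4">MAAEEG</sequence>
    <xref id="A2YIW7"/>
    <matches>
      <fingerprints-match graphscan="III" evalue="2.500000864E-7">
        <signature name="THIOREDOXIN" ac="PR00421">
          <signature-library-release version="41.1" library="PRINTS"/>
        </signature>
        <locations>
          <fingerprints-location motifNumber="3" end="48" start="39"/>
          <fingerprints-location motifNumber="2" end="89" start="78"/>
          <fingerprints-location motifNumber="1" end="39" start="31"/>
        </locations>
      </fingerprints-match>
      <hmmer2-match score="100.5" evalue="-INF">
        <signature name="Thioredoxin" ac="PIRSF000077">
          <signature-library-release version="2.74" library="PIRSF"/>
        </signature>
        <locations>
          <hmmer2-location evalue="1.50000307E-27" end="113" start="4"/>
        </locations>
      </hmmer2-match>
    </matches>
  </protein>
</protein-matches>"""


def list_matches(xml_text):
    """Return (protein id, library, signature accession, start, end) per location."""
    root = ET.fromstring(xml_text)
    rows = []
    for protein in root.findall("ipr:protein", NS):
        xref = protein.find("ipr:xref", NS)
        protein_id = xref.get("id") if xref is not None else None
        matches = protein.find("ipr:matches", NS)
        if matches is None:
            continue
        for match in matches:                      # any *-match element
            signature = match.find("ipr:signature", NS)
            release = signature.find("ipr:signature-library-release", NS)
            for location in match.find("ipr:locations", NS):
                rows.append((protein_id, release.get("library"), signature.get("ac"),
                             int(location.get("start")), int(location.get("end"))))
    return rows
```

Iterating over the children of `matches` rather than naming each match element keeps the helper indifferent to which member database produced the hit.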
    31. Pre-calculated match lookup • BerkeleyDB-backed REST web service • Includes matches for all of UniParc (27 million sequences) • 250 million matches • Fast response • Integrated into i5 [Chart: response time (ms) per sequence]
    32. Other functionality • Increased reliability • Pre-calculated match lookup • Configuration – simple properties file • Nucleotide sequence – getOrf – map matches to nucleotide coordinates • Pathway mapping – KEGG, Reactome, MetaCyc, UniPathway
    33. Future functionality • Webservice • Interact directly with architecture: – LAN – LSF – PBS – Sun Grid Engine • Database persistence – Oracle – MySQL – Postgres – etc • Graphical output • Other functionality – ask!
    34. InterProScan 5 timeline • Beta release – August 2011 – InterProScan 4 still maintained • Full release – Early 2012 – InterProScan 4 deprecated • Contact: interproscan-5-dev@googlegroups.com
    35. Acknowledgements: Craig McAnulla, Anthony Quinn, Phil Jones, Matthew Fraser, Maxim Scheremetjew, Alex Mitchell, Siew-Yit Yong, Amaia Sangrador, Sebastien Pesseat, Sarah Hunter (roles shown: Team leader, Developers, Bioinformaticians, Curators). Any questions? → Stand 302
    36. EBI is an Outstation of the European Molecular Biology Laboratory. Come and see us at booths 9 and 10! • Job opportunities • PhD and postdoc positions • Training in person and online • Services • Industry programme