InterPro and InterProScan 5.0
EBI is an Outstation of the European Molecular Biology Laboratory.
http://www.ebi.ac.uk/interpro
InterPro
• A database that groups predictive protein signatures together
• 11 member databases
• Single searchable resource
• Provides functional analysis of proteins by classifying them into families and predicting domains and important sites
• Enables whole genome analysis
InterPro Consortium
Consortium of 11 major signature databases
Protein signatures
• Enable more sensitive homology searches
• Each member database creates signatures using a different method:
– manually created sequence alignments
– automatic processes with some human input and correction
– entirely automatic processes
Why do we need predictive annotation tools?
[Figure: number of sequences in UniProtKB and in UniProtKB/Swiss-Prot by date, 5 Jan 2004 to 5 Jan 2010; y-axis from 0 to 14,000,000 sequences]
What are protein signatures?
[Diagram: Protein family/domain → Multiple sequence alignment (e.g. ITWKGPVCGLDGKTYRNECALL, AVPRSPVCGSDDVTYANECELK) → Build model → Search UniProt → Significant match → Mature model → Protein analysis]
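As a rough illustration of the alignment → model → search flow above, the sketch below derives the simplest kind of signature, a pattern, from the two aligned sequences shown on the slide and then scans a query sequence with it. The function names and the query sequence are hypothetical, and real member databases build much richer models (HMMs, profiles, fingerprints) from many more sequences.

```python
import re

# The two aligned sequences shown on the slide (equal length, no gaps).
ALIGNMENT = [
    "ITWKGPVCGLDGKTYRNECALL",
    "AVPRSPVCGSDDVTYANECELK",
]


def build_pattern(alignment):
    """Turn an ungapped alignment into a regular expression.

    Conserved columns become a literal residue; variable columns become
    a character class listing the residues seen in that column.
    """
    parts = []
    for column in zip(*alignment):
        residues = sorted(set(column))
        if len(residues) == 1:
            parts.append(residues[0])
        else:
            parts.append("[" + "".join(residues) + "]")
    return "".join(parts)


def find_matches(pattern, sequence):
    """Return (start, end) positions of hits, 1-based and inclusive."""
    return [(m.start() + 1, m.end()) for m in re.finditer(pattern, sequence)]


pattern = build_pattern(ALIGNMENT)
print(pattern)

# Hypothetical query: the first aligned sequence with extra flanking residues.
query = "MK" + ALIGNMENT[0] + "GG"
print(find_matches(pattern, query))
```

A pattern built from only two sequences is far too permissive or too strict to be useful in practice; the point is only to show how conserved columns end up as the fixed part of a signature.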
Member databases
METHODS: Hidden Markov Models, Fingerprints, Profiles, Patterns, Sequence Clusters, Structural Domains
• Functional annotation of families/domains
• Prediction of conserved domains
• Protein features (active sites…)
InterPro entry
The InterPro entry: types
• Family: Proteins share a common evolutionary origin, as reflected in their related functions, sequences or structure
• Domain: Distinct functional, structural or sequence units that may exist in a variety of biological contexts
• Repeats: Short sequences typically repeated within a protein
• Sites: PTM, Active Site, Binding Site, Conserved Site
InterPro Entry
• Groups similar signatures together
• Adds extensive annotation
• Links to other databases
• Structural information and viewers
• Quality control
• Removes redundancy
InterPro Entry
• Hierarchical classification
InterPro hierarchies: Families
FAMILIES can have parent/child relationships with other families
Parent/child relationships are based on:
• Comparison of protein hits
– child should be a subset of parent
– siblings should not have matches in common
• Existing hierarchies in member databases
• Biological knowledge of curators
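The two protein-hit rules above (a child's hits are a subset of the parent's; siblings share no hits) map naturally onto set operations. The sketch below is only an illustration of those rules: the entry names and protein identifiers are made up, and this is not presented as the curators' actual tooling.

```python
def is_valid_child(parent_hits, child_hits):
    """A child entry should only match proteins its parent also matches."""
    return child_hits <= parent_hits


def sibling_clashes(sibling_hits):
    """Return (name_a, name_b, shared_proteins) for siblings with common hits."""
    names = sorted(sibling_hits)
    clashes = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            shared = sibling_hits[first] & sibling_hits[second]
            if shared:
                clashes.append((first, second, sorted(shared)))
    return clashes


# Hypothetical protein hits for a parent family and three candidate children.
parent = {"P1", "P2", "P3", "P4"}
children = {
    "child_a": {"P1", "P2"},
    "child_b": {"P3"},
    "child_c": {"P2", "P5"},  # P5 is not matched by the parent
}

for name, hits in sorted(children.items()):
    print(name, "is a valid child:", is_valid_child(parent, hits))
print("sibling clashes:", sibling_clashes(children))
```

In this toy data, child_c breaks both rules: it matches a protein outside the parent and shares P2 with child_a.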
InterPro hierarchies: Domains
DOMAINS can have parent/child relationships with other domains
Domains and Families may be linked through Domain Organisation Hierarchy
InterPro Entry
The Gene Ontology project provides a controlled vocabulary of terms for describing gene product characteristics
InterPro Entry
Links to other databases:
• UniProt
• KEGG ... Reactome ... IntAct ...
• UniProt taxonomy
• PANDIT ... MEROPS ... Pfam clans ...
• PubMed
InterPro Entry
Structural information and viewers:
• PDB: 3-D structures
• SCOP: structural domains
• CATH: structural domain classification
InterProScan
Protein sequence + predictive models → analysis algorithm → “raw” matches → filtering algorithm → reported matches
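The raw-matches-then-filter step can be pictured with a very small sketch. The slide does not say what the filtering algorithm does, so the e-value cutoff used here is purely an assumption for illustration, as are the `Match` class, the `filter_matches` function and the weak `SIG_WEAK` hit; only the PF00085 values are taken from the example output later in the deck.

```python
from dataclasses import dataclass


@dataclass
class Match:
    signature: str
    start: int
    end: int
    evalue: float


def filter_matches(raw_matches, max_evalue=1e-5):
    """Keep only matches at or below an e-value cutoff (illustrative rule)."""
    return [m for m in raw_matches if m.evalue <= max_evalue]


# "Raw" matches as an analysis algorithm might emit them.
raw = [
    Match("PF00085", 9, 112, 1.3e-28),  # values from the deck's example output
    Match("SIG_WEAK", 5, 20, 0.5),      # hypothetical weak hit
]

reported = filter_matches(raw)
print([m.signature for m in reported])
```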
InterProScan access
• Interactive: http://www.ebi.ac.uk/Tools/pfa/iprscan/
• Webservice (SOAP and REST):
– http://www.ebi.ac.uk/Tools/webservices/services/pfa/iprscan_rest
– http://www.ebi.ac.uk/Tools/webservices/services/pfa/iprscan_soap
• Downloadable: ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan/
Why redesign InterProScan?
• InterProScan 4
– complicated installation
– complicated update
– limited queuing system (only guaranteed with LSF)
– limited configurability
– reliability
InterProScan 5.0 aims
• Easy install and configuration
• Modular
• Expandable
• Easily integrated into existing pipelines
• Incorporate new data model / XML exchange format
• Easy to port on to different architectures:
– Desktop machine
– Simple LAN
– LSF
– PBS
– Sun Grid Engine ...cloud? GRID?
• Reliability
InterProScan 5 Technology
Architecture
• Data Model
• Database Access (Oracle, PostgreSQL, HSQLDB) and File I/O (file system)
• Business Logic: performing analyses
• Job Management: scheduling analyses
• JMS: monitoring queues
• Other diagram labels: XML, cluster platform, web services, Java API, InterPro website
One-way dependencies + replaceable layers = low coupling + maintainable
Java Message Service (JMS)
• Simple and robust programming model
• Mature and stable standard – current JMS version released in 2002
• Guaranteed message delivery to a single worker
• Easy to monitor
• Flexible – easy to implement on multiple platforms
[Diagram]
• “Master”: schedules tasks & sub-tasks, and places them on queue
• Broker: manages queues & topics; starts workers on demand
• “Worker” (several): takes tasks off queues, performs task / sub-task and reports back to Broker
• Monitoring & Management Application: web or stand-alone app to monitor & manage InterProScan
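The master / broker / worker arrangement can be sketched with an in-process queue. This is not JMS and not InterProScan code: Python's `queue.Queue` stands in for the broker's queue, threads stand in for workers, and the "analysis" is just a sequence-length calculation. All names are hypothetical. What it does show is the property listed above: each task is taken off the queue by exactly one worker, which then reports its result back.

```python
import queue
import threading


def worker(name, tasks, results):
    """Take tasks off the queue until a None sentinel arrives."""
    while True:
        task = tasks.get()
        if task is None:
            tasks.task_done()
            break
        # Stand-in for a real analysis: report the sequence length.
        results.put((name, task, len(task)))
        tasks.task_done()


def run_master(sequences, n_workers=3):
    """Place one task per sequence on the queue and collect the reports."""
    tasks = queue.Queue()
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(f"worker-{i}", tasks, results))
        for i in range(n_workers)
    ]
    for thread in threads:
        thread.start()
    for sequence in sequences:
        tasks.put(sequence)
    for _ in threads:
        tasks.put(None)  # one stop signal per worker
    tasks.join()
    for thread in threads:
        thread.join()
    collected = []
    while not results.empty():
        collected.append(results.get())
    return collected


reports = run_master(["MAAEEG", "ITWKGPVCG", "AV"])
for worker_name, sequence, length in sorted(reports, key=lambda r: r[1]):
    print(worker_name, sequence, length)
```

Which worker handles which task varies from run to run; only the set of completed tasks is fixed.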
Beta release functionality
Installation
• Requirements
– Java 1.6
– Linux
– Perl
• Installation process
– ready to use
wget ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan/i5-dist.tar.gz
tar -xzf i5-dist.tar.gz
Default tab-separated values output
./interproscan.sh -i test_proteins.fasta -o test_proteins.tsv --goterms
A2YIW7 f927b0d241297dcc9a1c5990b58bf3c4 122 Pfam PF00085 Thioredoxin 9 112 1.3E-28 T 08-07-2011 IPR013766 Thioredoxin domain Biological Process:cell redox homeostasis (GO:0045454)
A2YIW7 f927b0d241297dcc9a1c5990b58bf3c4 122 ProSitePatterns PS00194 Thioredoxin family active site. 32 50 - T 08-07-2011 IPR017937 Thioredoxin, conserved site Biological Process:cell redox homeostasis (GO:0045454)
A2YIW7 f927b0d241297dcc9a1c5990b58bf3c4 122 PIRSF PIRSF000077 null 4 113 1.50000307E-27 T 08-07-2011 IPR005746 Thioredoxin Molecular Function:protein disulfide oxidoreductase activity (GO:0015035), Biological Process:glycerol ether metabolic process (GO:0006662), Biological Process:cell redox homeostasis (GO:0045454), Molecular Function:electron carrier activity (GO:0009055)
A2YIW7 f927b0d241297dcc9a1c5990b58bf3c4 122 PRINTS PR00421 Thioredoxin family signature 39 48 - T 08-07-2011 IPR005746 Thioredoxin Molecular Function:protein disulfide oxidoreductase activity (GO:0015035), Biological Process:glycerol ether metabolic process (GO:0006662), Biological Process:cell redox homeostasis (GO:0045454), Molecular Function:electron carrier activity (GO:0009055)
A2YIW7 f927b0d241297dcc9a1c5990b58bf3c4 122 PRINTS PR00421 Thioredoxin family signature 78 89 - T 08-07-2011 IPR005746 Thioredoxin Molecular Function:protein disulfide oxidoreductase activity (GO:0015035), Biological Process:glycerol ether metabolic process (GO:0006662), Biological Process:cell redox homeostasis (GO:0045454), Molecular Function:electron carrier activity (GO:0009055)
A2YIW7 f927b0d241297dcc9a1c5990b58bf3c4 122 PRINTS PR00421 Thioredoxin family signature 31 39 - T 08-07-2011 IPR005746 Thioredoxin Molecular Function:protein disulfide oxidoreductase activity (GO:0015035), Biological Process:glycerol ether metabolic process (GO:0006662), Biological Process:cell redox homeostasis (GO:0045454), Molecular Function:electron carrier activity (GO:0009055)
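A downstream script might read one row of this output as follows. The slide does not name the columns, so the field names in `FIELDS` are assumptions inferred from the example rows, and `parse_row` is a hypothetical helper rather than part of InterProScan. The sample line is the Pfam row from the slide with its tab separators restored.

```python
import re

# Column names are assumptions inferred from the example rows.
FIELDS = [
    "protein", "md5", "length", "analysis", "signature_ac",
    "signature_desc", "start", "end", "score", "status", "date",
    "interpro_ac", "interpro_desc", "go_terms",
]


def parse_row(line):
    """Split one tab-separated output line into a dictionary."""
    values = line.rstrip("\n").split("\t")
    row = dict(zip(FIELDS, values))
    row["length"] = int(row["length"])
    row["start"] = int(row["start"])
    row["end"] = int(row["end"])
    # Some analyses report "-" instead of a score.
    row["score"] = None if row["score"] == "-" else float(row["score"])
    row["go_ids"] = re.findall(r"GO:\d{7}", row.get("go_terms", ""))
    return row


sample = "\t".join([
    "A2YIW7", "f927b0d241297dcc9a1c5990b58bf3c4", "122", "Pfam", "PF00085",
    "Thioredoxin", "9", "112", "1.3E-28", "T", "08-07-2011", "IPR013766",
    "Thioredoxin domain",
    "Biological Process:cell redox homeostasis (GO:0045454)",
])

row = parse_row(sample)
print(row["protein"], row["signature_ac"], row["start"], row["end"], row["go_ids"])
```

Splitting on tabs rather than whitespace matters here because descriptions such as "Thioredoxin domain" contain spaces.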
XML output
./interproscan.sh -i test_proteins.fasta -o test_proteins.xml --goterms -F xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<protein-matches xmlns="http://www.ebi.ac.uk/schema/interpro">
  <protein>
    <sequence md5="f927b0d241297dcc9a1c5990b58bf3c4">MAAEEGVVIACHNKDEFDAQMTKAKEAGKVVIIDFTASWCGPCRFIAPVFAEYAKKFPGAVFLKVDVDELKEVAEKYNVEAMPTFLFIKDGAEADKVVGARKDDLQNTIVKHVGATAASASA</sequence>
    <xref id="A2YIW7"/>
    <matches>
      <fingerprints-match graphscan="III" evalue="2.500000864E-7">
        <signature name="THIOREDOXIN" desc="Thioredoxin family signature" ac="PR00421">
          <models>
            <model name="THIOREDOXIN" desc="Thioredoxin family signature" ac="PR00421"/>
          </models>
          <signature-library-release version="41.1" library="PRINTS"/>
        </signature>
        <locations>
          <fingerprints-location score="0.0" pvalue="0.0" motifNumber="3" end="48" start="39"/>
          <fingerprints-location score="0.0" pvalue="0.0" motifNumber="2" end="89" start="78"/>
          <fingerprints-location score="0.0" pvalue="0.0" motifNumber="1" end="39" start="31"/>
        </locations>
      </fingerprints-match>
      <hmmer2-match score="100.5" evalue="-INF">
        <signature name="Thioredoxin" ac="PIRSF000077">
          <models>
            <model name="Thioredoxin" ac="PIRSF000077"/>
          </models>
          <signature-library-release version="2.74" library="PIRSF"/>
        </signature>
        <locations>
          <hmmer2-location hmm-length="0" hmm-end="108" hmm-start="1" evalue="1.50000307E-27" score="0.0" end="113" start="4"/>
        </locations>
      </hmmer2-match>
...etc
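To show how the XML format might be consumed, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. Because the slide's XML is truncated, the embedded document is a cut-down, closed version of it (one match, shortened sequence), so treat it as an assumption about the overall shape rather than a complete example of the schema. `list_matches` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the slide's root element.
NS = {"ipr": "http://www.ebi.ac.uk/schema/interpro"}

# Cut-down version of the slide's output: sequence shortened, one match kept,
# and the elements truncated on the slide closed by hand.
XML = """<protein-matches xmlns="http://www.ebi.ac.uk/schema/interpro">
  <protein>
    <sequence md5="f927b0d241297dcc9a1c5990b58bf3c4">MAAEEG</sequence>
    <xref id="A2YIW7"/>
    <matches>
      <hmmer2-match score="100.5" evalue="-INF">
        <signature name="Thioredoxin" ac="PIRSF000077">
          <signature-library-release version="2.74" library="PIRSF"/>
        </signature>
        <locations>
          <hmmer2-location evalue="1.50000307E-27" score="0.0" end="113" start="4"/>
        </locations>
      </hmmer2-match>
    </matches>
  </protein>
</protein-matches>"""


def list_matches(xml_text):
    """Return (protein, library, signature, start, end) for every location."""
    root = ET.fromstring(xml_text)
    rows = []
    for protein in root.findall("ipr:protein", NS):
        xref = protein.find("ipr:xref", NS).get("id")
        for match in protein.find("ipr:matches", NS):
            signature = match.find("ipr:signature", NS)
            release = signature.find("ipr:signature-library-release", NS)
            for location in match.find("ipr:locations", NS):
                rows.append((
                    xref,
                    release.get("library"),
                    signature.get("ac"),
                    int(location.get("start")),
                    int(location.get("end")),
                ))
    return rows


for row in list_matches(XML):
    print(row)
```

Iterating over the children of `matches` and `locations`, instead of naming each match type, keeps the sketch independent of which member database produced the hit.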
Pre-calculated match lookup
• BerkeleyDB-backed REST web service
• Includes matches for all of UniParc (27 million sequences)
• 250 million matches
• Fast response
• Integrated into i5
[Chart: response time (ms) per sequence]
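The idea behind a pre-calculated lookup can be sketched as a key-value store consulted before any analysis is run. The slide does not state how sequences are keyed; using the sequence MD5 (as seen in the output examples) is an assumption here, and the in-memory dictionary merely stands in for the BerkeleyDB-backed service. `MatchLookup`, `sequence_md5` and the stored match are all hypothetical.

```python
import hashlib


def sequence_md5(sequence):
    """MD5 checksum of the upper-cased sequence (assumed lookup key)."""
    return hashlib.md5(sequence.upper().encode("ascii")).hexdigest()


class MatchLookup:
    """In-memory stand-in for a pre-calculated match store."""

    def __init__(self):
        self._store = {}

    def add(self, sequence, matches):
        self._store[sequence_md5(sequence)] = list(matches)

    def get(self, sequence):
        """Return stored matches, or None if the sequence is not known."""
        return self._store.get(sequence_md5(sequence))


lookup = MatchLookup()
lookup.add("MAAEEG", [("SIG0001", 1, 6)])  # hypothetical pre-calculated match

for query in ("maaeeg", "AAAA"):
    hit = lookup.get(query)
    if hit is None:
        print(query, "-> not pre-calculated, run the analyses")
    else:
        print(query, "-> reuse", hit)
```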
Other functionality
• Increased reliability
• Precalculated match lookup
• Configuration
– simple properties file
• Nucleotide sequence
– getOrf
– map matches to nucleotide coordinates
• Pathway mapping
– KEGG, Reactome, MetaCyc, UniPathway
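Mapping matches to nucleotide coordinates, mentioned above, comes down to simple arithmetic once an ORF's position is known. The sketch below assumes 1-based inclusive coordinates, an ORF with no introns, and ORF boundaries given on the forward strand; these conventions and the function name are assumptions for illustration, not a description of how InterProScan 5 implements the step.

```python
def protein_to_nucleotide(aa_start, aa_end, orf_start, orf_end, strand="+"):
    """Map a protein match (residue positions) onto nucleotide positions.

    All coordinates are 1-based and inclusive. orf_start and orf_end are
    the ORF boundaries on the forward strand, with orf_start < orf_end.
    """
    if strand == "+":
        nt_start = orf_start + 3 * (aa_start - 1)
        nt_end = orf_start + 3 * aa_end - 1
    elif strand == "-":
        # Translation runs from orf_end downwards on the reverse strand.
        nt_start = orf_end - 3 * aa_end + 1
        nt_end = orf_end - 3 * (aa_start - 1)
    else:
        raise ValueError("strand must be '+' or '-'")
    return nt_start, nt_end


# Hypothetical ORF at nucleotides 101-466 (366 nt, i.e. 122 codons),
# carrying a match at residues 9-112.
print(protein_to_nucleotide(9, 112, 101, 466, "+"))
print(protein_to_nucleotide(9, 112, 101, 466, "-"))
```

In both cases the mapped span covers 312 nucleotides, three per residue for the 104 residues in the match.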
Future functionality
• Webservice
• Interact directly with architecture:
– LAN
– LSF
– PBS
– Sun Grid Engine
• Database persistence
– Oracle
– MySQL
– Postgres
– etc
• Graphical output
• Other functionality
– ask!
InterProScan 5 timeline
• Beta release
– August 2011
– InterProScan 4 still maintained
• Full release
– Early 2012
– InterProScan 4 deprecated
interproscan-5-dev@googlegroups.com
Acknowledgements
Craig McAnulla, Anthony Quinn, Phil Jones, Matthew Fraser, Maxim Scheremetjew, Alex Mitchell, Siew-Yit Yong, Amaia Sangrador, Sebastien Pesseat, Sarah Hunter
Roles: Team leader, Developers, Bioinformaticians, Curators
Any Questions → Stand 302
Come and see us at booths 9 and 10!
• Job opportunities
• PhD and postdoc positions
• Training in person and online
• Services
• Industry programme

More Related Content

What's hot

Analysis of gene expression
Analysis of gene expressionAnalysis of gene expression
Analysis of gene expression
Tapeshwar Yadav
 

What's hot (20)

Techniques in proteomics
Techniques in proteomicsTechniques in proteomics
Techniques in proteomics
 
Protein database
Protein databaseProtein database
Protein database
 
Structural genomics
Structural genomicsStructural genomics
Structural genomics
 
Applications of bioinformatics
Applications of bioinformaticsApplications of bioinformatics
Applications of bioinformatics
 
Protein Structure Prediction
Protein Structure PredictionProtein Structure Prediction
Protein Structure Prediction
 
Tools and database of NCBI
Tools and database of NCBITools and database of NCBI
Tools and database of NCBI
 
Homology modeling
Homology modelingHomology modeling
Homology modeling
 
Protein databases
Protein databasesProtein databases
Protein databases
 
Protein fold recognition and ab_initio modeling
Protein fold recognition and ab_initio modelingProtein fold recognition and ab_initio modeling
Protein fold recognition and ab_initio modeling
 
Protein Databases
Protein DatabasesProtein Databases
Protein Databases
 
Express sequence tags
Express sequence tagsExpress sequence tags
Express sequence tags
 
Secondary Structure Prediction of proteins
Secondary Structure Prediction of proteins Secondary Structure Prediction of proteins
Secondary Structure Prediction of proteins
 
System biology and its tools
System biology and its toolsSystem biology and its tools
System biology and its tools
 
Protein Database
Protein DatabaseProtein Database
Protein Database
 
Databases pathways of genomics and proteomics
Databases pathways of genomics and proteomics Databases pathways of genomics and proteomics
Databases pathways of genomics and proteomics
 
SEQUENCE ANALYSIS
SEQUENCE ANALYSISSEQUENCE ANALYSIS
SEQUENCE ANALYSIS
 
PROTEIN MICROARRAYS
PROTEIN MICROARRAYSPROTEIN MICROARRAYS
PROTEIN MICROARRAYS
 
Blast and fasta
Blast and fastaBlast and fasta
Blast and fasta
 
The ensembl database
The ensembl databaseThe ensembl database
The ensembl database
 
Analysis of gene expression
Analysis of gene expressionAnalysis of gene expression
Analysis of gene expression
 

Viewers also liked

Block culture of nacirema
Block culture of naciremaBlock culture of nacirema
Block culture of nacirema
Travis Klein
 
1 comprensión de oraciones
1 comprensión de oraciones1 comprensión de oraciones
1 comprensión de oraciones
Isabel Abanto
 
Federmanager bo convegno impermanenza_27_03_13
Federmanager bo  convegno impermanenza_27_03_13Federmanager bo  convegno impermanenza_27_03_13
Federmanager bo convegno impermanenza_27_03_13
Marco Frullanti
 
Cơ bản về tủ lạnh
Cơ bản về tủ lạnhCơ bản về tủ lạnh
Cơ bản về tủ lạnh
machupilani
 
Different games in the phillippines
Different  games in the phillippinesDifferent  games in the phillippines
Different games in the phillippines
janerosco
 
Dia de la_democracia
Dia de la_democraciaDia de la_democracia
Dia de la_democracia
Lauma1416
 

Viewers also liked (20)

Bioinformatics t2-databases v2014
Bioinformatics t2-databases v2014Bioinformatics t2-databases v2014
Bioinformatics t2-databases v2014
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Application of bioinformatics in climate smart horticulture
Application of bioinformatics in climate smart horticultureApplication of bioinformatics in climate smart horticulture
Application of bioinformatics in climate smart horticulture
 
Potato genome sequence paper
Potato genome sequence paperPotato genome sequence paper
Potato genome sequence paper
 
Block culture of nacirema
Block culture of naciremaBlock culture of nacirema
Block culture of nacirema
 
Sistema Europa: Istituzioni e politiche dell'UE
Sistema Europa: Istituzioni e politiche dell'UESistema Europa: Istituzioni e politiche dell'UE
Sistema Europa: Istituzioni e politiche dell'UE
 
White Paper: Integrated Computing Platforms - Infrastructure Builds for Tomor...
White Paper: Integrated Computing Platforms - Infrastructure Builds for Tomor...White Paper: Integrated Computing Platforms - Infrastructure Builds for Tomor...
White Paper: Integrated Computing Platforms - Infrastructure Builds for Tomor...
 
1 comprensión de oraciones
1 comprensión de oraciones1 comprensión de oraciones
1 comprensión de oraciones
 
Checklist
ChecklistChecklist
Checklist
 
IDC: Selecting the Optimal Path to Private Cloud
IDC: Selecting the Optimal Path to Private CloudIDC: Selecting the Optimal Path to Private Cloud
IDC: Selecting the Optimal Path to Private Cloud
 
Federmanager bo convegno impermanenza_27_03_13
Federmanager bo  convegno impermanenza_27_03_13Federmanager bo  convegno impermanenza_27_03_13
Federmanager bo convegno impermanenza_27_03_13
 
The Impact of Music & Artists
The Impact of Music & ArtistsThe Impact of Music & Artists
The Impact of Music & Artists
 
SAP HANA in an EMC Private Cloud
SAP HANA in an EMC Private CloudSAP HANA in an EMC Private Cloud
SAP HANA in an EMC Private Cloud
 
Cơ bản về tủ lạnh
Cơ bản về tủ lạnhCơ bản về tủ lạnh
Cơ bản về tủ lạnh
 
Different games in the phillippines
Different  games in the phillippinesDifferent  games in the phillippines
Different games in the phillippines
 
Big Data: A CIO’s Cut Out and Keep Guide
Big Data: A CIO’s Cut Out and Keep Guide Big Data: A CIO’s Cut Out and Keep Guide
Big Data: A CIO’s Cut Out and Keep Guide
 
TechBook: IMS on z/OS Using EMC Symmetrix Storage Systems
TechBook: IMS on z/OS Using EMC Symmetrix Storage SystemsTechBook: IMS on z/OS Using EMC Symmetrix Storage Systems
TechBook: IMS on z/OS Using EMC Symmetrix Storage Systems
 
White Paper: Advanced Cyber Analytics with Greenplum Database
White Paper: Advanced Cyber Analytics with Greenplum DatabaseWhite Paper: Advanced Cyber Analytics with Greenplum Database
White Paper: Advanced Cyber Analytics with Greenplum Database
 
EMC Perspective: Big Data Transforms the Life Science Commercial Model
EMC Perspective: Big Data Transforms the Life Science Commercial ModelEMC Perspective: Big Data Transforms the Life Science Commercial Model
EMC Perspective: Big Data Transforms the Life Science Commercial Model
 
Dia de la_democracia
Dia de la_democraciaDia de la_democracia
Dia de la_democracia
 

Similar to InterPro and InterProScan 5.0

Similar to InterPro and InterProScan 5.0 (20)

ICAR 2015 Workshop - Nick Provart
ICAR 2015 Workshop - Nick ProvartICAR 2015 Workshop - Nick Provart
ICAR 2015 Workshop - Nick Provart
 
Advanced Bioinformatics for Genomics and BioData Driven Research
Advanced Bioinformatics for Genomics and BioData Driven ResearchAdvanced Bioinformatics for Genomics and BioData Driven Research
Advanced Bioinformatics for Genomics and BioData Driven Research
 
2016-04-21 BioExcel Usecase Open PHACTS
2016-04-21 BioExcel Usecase Open PHACTS2016-04-21 BioExcel Usecase Open PHACTS
2016-04-21 BioExcel Usecase Open PHACTS
 
EnCore & EnVision
EnCore & EnVisionEnCore & EnVision
EnCore & EnVision
 
Designing a community resource - Sandra Orchard
Designing a community resource - Sandra OrchardDesigning a community resource - Sandra Orchard
Designing a community resource - Sandra Orchard
 
TheUniProtKBpptx__2022_03_30_13_07_41.pptx
TheUniProtKBpptx__2022_03_30_13_07_41.pptxTheUniProtKBpptx__2022_03_30_13_07_41.pptx
TheUniProtKBpptx__2022_03_30_13_07_41.pptx
 
EMBL-EBI
EMBL-EBIEMBL-EBI
EMBL-EBI
 
Making Use of NGS Data: From Reads to Trees and Annotations
Making Use of NGS Data: From Reads to Trees and AnnotationsMaking Use of NGS Data: From Reads to Trees and Annotations
Making Use of NGS Data: From Reads to Trees and Annotations
 
NetBioSIG2013-Talk Robin Haw
NetBioSIG2013-Talk Robin Haw NetBioSIG2013-Talk Robin Haw
NetBioSIG2013-Talk Robin Haw
 
How to be a bioinformatician
How to be a bioinformaticianHow to be a bioinformatician
How to be a bioinformatician
 
Kino : Making Semantic Annotations Easier
Kino : Making Semantic Annotations EasierKino : Making Semantic Annotations Easier
Kino : Making Semantic Annotations Easier
 
Web Apollo Tutorial for the i5K copepod research community.
Web Apollo Tutorial for the i5K copepod research community.Web Apollo Tutorial for the i5K copepod research community.
Web Apollo Tutorial for the i5K copepod research community.
 
Ontology Web Services for Semantic Applications
Ontology Web Services for Semantic Applications Ontology Web Services for Semantic Applications
Ontology Web Services for Semantic Applications
 
exRNA Data Analysis Tools in the Genboree Workbench
exRNA Data Analysis Tools in the Genboree WorkbenchexRNA Data Analysis Tools in the Genboree Workbench
exRNA Data Analysis Tools in the Genboree Workbench
 
SP Intervets BioInformatics Portal - A customized global Pipeline Pilot Webpo...
SP Intervets BioInformatics Portal - A customized global Pipeline Pilot Webpo...SP Intervets BioInformatics Portal - A customized global Pipeline Pilot Webpo...
SP Intervets BioInformatics Portal - A customized global Pipeline Pilot Webpo...
 
Datasets and tools_from_ncbi_and_elsewhere_for_microbiome_research_v_62817
Datasets and tools_from_ncbi_and_elsewhere_for_microbiome_research_v_62817Datasets and tools_from_ncbi_and_elsewhere_for_microbiome_research_v_62817
Datasets and tools_from_ncbi_and_elsewhere_for_microbiome_research_v_62817
 
Neo4j and bioinformatics
Neo4j and bioinformaticsNeo4j and bioinformatics
Neo4j and bioinformatics
 
Computational Resources In Infectious Disease
Computational Resources In Infectious DiseaseComputational Resources In Infectious Disease
Computational Resources In Infectious Disease
 
How can you access PubChem programmatically?
How can you access PubChem programmatically?How can you access PubChem programmatically?
How can you access PubChem programmatically?
 
Bio4j
Bio4jBio4j
Bio4j
 

More from EBI

The annotation of plant proteins in UniProtKB
The annotation of plant proteins in UniProtKBThe annotation of plant proteins in UniProtKB
The annotation of plant proteins in UniProtKB
EBI
 
UniProt-GOA
UniProt-GOAUniProt-GOA
UniProt-GOA
EBI
 
The European Nucleotide Archive
The European Nucleotide ArchiveThe European Nucleotide Archive
The European Nucleotide Archive
EBI
 
Genome resources at EMBL-EBI: Ensembl and Ensembl Genomes
Genome resources at EMBL-EBI: Ensembl and Ensembl GenomesGenome resources at EMBL-EBI: Ensembl and Ensembl Genomes
Genome resources at EMBL-EBI: Ensembl and Ensembl Genomes
EBI
 
Automatic Annotation in UniProtKB
Automatic Annotation in UniProtKBAutomatic Annotation in UniProtKB
Automatic Annotation in UniProtKB
EBI
 
The Vertebrate Genome Annotation Database
The Vertebrate Genome Annotation DatabaseThe Vertebrate Genome Annotation Database
The Vertebrate Genome Annotation Database
EBI
 

More from EBI (7)

The annotation of plant proteins in UniProtKB
The annotation of plant proteins in UniProtKBThe annotation of plant proteins in UniProtKB
The annotation of plant proteins in UniProtKB
 
UniProt-GOA
UniProt-GOAUniProt-GOA
UniProt-GOA
 
The European Nucleotide Archive
The European Nucleotide ArchiveThe European Nucleotide Archive
The European Nucleotide Archive
 
Genome resources at EMBL-EBI: Ensembl and Ensembl Genomes
Genome resources at EMBL-EBI: Ensembl and Ensembl GenomesGenome resources at EMBL-EBI: Ensembl and Ensembl Genomes
Genome resources at EMBL-EBI: Ensembl and Ensembl Genomes
 
Automatic Annotation in UniProtKB
Automatic Annotation in UniProtKBAutomatic Annotation in UniProtKB
Automatic Annotation in UniProtKB
 
The Vertebrate Genome Annotation Database
The Vertebrate Genome Annotation DatabaseThe Vertebrate Genome Annotation Database
The Vertebrate Genome Annotation Database
 
Train online
Train onlineTrain online
Train online
 

Recently uploaded

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 

Recently uploaded (20)

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 

InterPro and InterProScan 5.0

  • 1. EBI is an Outstation of the European Molecular Biology Laboratory.
  • 2. http://www.ebi.ac.uk/interpro • is a database that groups predictive protein signatures together • 11 member databases • single searchable resource • provides functional analysis of proteins by classifying them into families and predicting domains and important sites • Enables whole genome analysis InterPro
  • 4. http://www.ebi.ac.uk/interpro Protein signatures • More sensitive homology searches • Each member database creates signatures using different methods and methodologies:  manually-created sequence alignments  automatic processes with some human input and correction  entirely automatically.
  • 5. http://www.ebi.ac.uk/interpro Why do we need predictive annotation tools? 0 2,000,000 4,000,000 6,000,000 8,000,000 10,000,000 12,000,000 14,000,000 5-Jan-04 5-Jan-05 5-Jan-06 5-Jan-07 5-Jan-08 5-Jan-09 5-Jan-10 Numberofsequences Date UniProtKB UniProtKB/Swiss-Prot
  • 6. http://www.ebi.ac.uk/interpro What are protein signatures? Multiple sequence alignment Protein family/domain Build model Search Mature model ITWKGPVCGLDGKTYRNECALL AVPRSPVCGSDDVTYANECELK UniProtit. Significant match Protein analysis
  • 7. http://www.ebi.ac.uk/interpro Member databases Hidden Markov Models Finger- Prints Profiles Patterns Sequence Clusters Structural Domains Functional annotation of families/domains Prediction of conserved domains Protein features (active sites…) METHODS
  • 10. http://www.ebi.ac.uk/interpro The InterPro entry: types Proteins share a common evolutionary origin, as reflected in their related functions, sequences or structure Family Distinct functional, structural or sequence units that may exist in a variety of biological contextsDomain Short sequences typically repeated within a proteinRepeats PTM Active Site Binding Site Conserved Site Sites
  • 11. http://www.ebi.ac.uk/interpro InterPro Entry Adds extensive annotation Links to other databases Structural information and viewers Groups similar signatures together Adds extensive annotation Links to other databases Quality control Removes redundancy
  • 12. http://www.ebi.ac.uk/interpro InterPro Entry Adds extensive annotation Links to other databases Structural information and viewers Groups similar signatures together Adds extensive annotation Links to other databases  Hierarchical classification
  • 13. http://www.ebi.ac.uk/interpro Interpro hierarchies: Families FAMILIES can have parent/child relationships with other Families Parent/Child relationships are based on: • Comparison of protein hits  child should be a subset of parent  siblings should not have matches in common • Existing hierarchies in member databases • Biological knowledge of curators
  • 14. http://www.ebi.ac.uk/interpro InterPro hierarchies: Domains DOMAINS can have parent/child relationships with other domains
  • 15. Domains and Families may be linked through the Domain Organisation Hierarchy.
  • 16. InterPro Entry: groups similar signatures together; adds extensive annotation; links to other databases; structural information and viewers.
  • 17. InterPro Entry: groups similar signatures together; adds extensive annotation; links to other databases; structural information and viewers. The Gene Ontology project provides a controlled vocabulary of terms for describing gene product characteristics.
  • 18. InterPro Entry: groups similar signatures together; adds extensive annotation; links to other databases; structural information and viewers. Linked resources: UniProt; KEGG ...; Reactome ...; IntAct ...; UniProt taxonomy; PANDIT ...; MEROPS ...; Pfam clans ...; PubMed.
  • 19. InterPro Entry: groups similar signatures together; adds extensive annotation; links to other databases; structural information and viewers. PDB: 3-D structures. SCOP: structural domains. CATH: structural domain classification.
  • 21. InterProScan access. Interactive: http://www.ebi.ac.uk/Tools/pfa/iprscan/ Webservice (SOAP and REST): http://www.ebi.ac.uk/Tools/webservices/services/pfa/iprscan_rest and http://www.ebi.ac.uk/Tools/webservices/services/pfa/iprscan_soap Downloadable: ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan/
  • 22. Why redesign InterProScan? InterProScan 4 has: complicated installation; complicated update; a limited queuing system (only guaranteed with LSF); limited configurability; reliability issues.
  • 23. InterProScan 5.0 aims: easy install and configuration; modular; expandable; easily integrated into existing pipelines; incorporate new data model / XML exchange format; easy to port on to different architectures (desktop machine; simple LAN; LSF; PBS; Sun Grid Engine ... cloud? GRID?); reliability.
  • 25. Architecture. [Layer diagram labels: Oracle, PostgreSQL, HSQLDB, File system; Data Model; Database Access; File I/O; Business Logic: performing analyses; Job Management: scheduling analyses; JMS: monitoring queues; XML; Cluster platform; Web services; Java API; InterPro website.] One-way dependencies + replaceable layers = low-coupling + maintainable.
  • 26. Java Message Service. “Master”: schedules tasks & sub-tasks, and places them on a queue. Broker: manages queues & topics. “Worker” (several shown): performs task / sub-task and reports back to Broker. Broker starts workers on demand; workers take tasks off queues. Monitoring & Management Application: web or stand-alone app to monitor & manage InterProScan. Benefits: simple and robust programming model; mature and stable standard (current JMS version released in 2002); guaranteed message delivery to a single worker; easy to monitor; flexible (easy to implement on multiple platforms).
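
The master/broker/worker arrangement on slide 26 can be illustrated in miniature. The sketch below is only an analogy: it swaps the JMS broker for Python's in-process `queue.Queue` and uses threads as workers, so it shows the pattern (a master places tasks on a queue, each task is taken by exactly one worker, results are reported back) rather than anything about InterProScan's actual Java implementation. The function name `run_jobs` and the task labels are hypothetical.

```python
import queue
import threading


def run_jobs(tasks, n_workers=3):
    """Run each task on exactly one worker; return (task, worker_id) pairs."""
    task_queue = queue.Queue()   # stands in for the broker's task queue
    reports = queue.Queue()      # stands in for the "report back" channel

    def worker(worker_id):
        while True:
            task = task_queue.get()      # each item is handed to one worker only
            if task is None:             # sentinel: no more work for this worker
                task_queue.task_done()
                break
            reports.put((task, worker_id))   # "perform" the task and report back
            task_queue.task_done()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for thread in threads:
        thread.start()

    # The "master" schedules the tasks, then one stop sentinel per worker.
    for task in tasks:
        task_queue.put(task)
    for _ in threads:
        task_queue.put(None)

    task_queue.join()
    for thread in threads:
        thread.join()

    results = []
    while not reports.empty():
        results.append(reports.get())
    return results


if __name__ == "__main__":
    for task, worker_id in sorted(run_jobs(["pfam", "prints", "pirsf", "prosite"])):
        print(f"{task} handled by worker {worker_id}")
```

Which worker picks up which task is not deterministic; the property the sketch relies on is only that every task appears once in the results.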
  • 28. Installation. Requirements: Java 1.6; Linux; Perl. Installation process (ready to use after unpacking):
        wget ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan/i5-dist.tar.gz
        tar -xzf i5-dist.tar.gz
  • 29. Default tab-separated values output. Command and output (one match per line):
        ./interproscan.sh -i test_proteins.fasta -o test_proteins.tsv --goterms
        A2YIW7 f927b0d241297dcc9a1c5990b58bf3c4 122 Pfam PF00085 Thioredoxin 9 112 1.3E-28 T 08-07-2011 IPR013766 Thioredoxin domain Biological Process:cell redox homeostasis (GO:0045454)
        A2YIW7 f927b0d241297dcc9a1c5990b58bf3c4 122 ProSitePatterns PS00194 Thioredoxin family active site. 32 50 - T 08-07-2011 IPR017937 Thioredoxin, conserved site Biological Process:cell redox homeostasis (GO:0045454)
        A2YIW7 f927b0d241297dcc9a1c5990b58bf3c4 122 PIRSF PIRSF000077 null 4 113 1.50000307E-27 T 08-07-2011 IPR005746 Thioredoxin Molecular Function:protein disulfide oxidoreductase activity (GO:0015035), Biological Process:glycerol ether metabolic process (GO:0006662), Biological Process:cell redox homeostasis (GO:0045454), Molecular Function:electron carrier activity (GO:0009055)
        A2YIW7 f927b0d241297dcc9a1c5990b58bf3c4 122 PRINTS PR00421 Thioredoxin family signature 39 48 - T 08-07-2011 IPR005746 Thioredoxin Molecular Function:protein disulfide oxidoreductase activity (GO:0015035), Biological Process:glycerol ether metabolic process (GO:0006662), Biological Process:cell redox homeostasis (GO:0045454), Molecular Function:electron carrier activity (GO:0009055)
        A2YIW7 f927b0d241297dcc9a1c5990b58bf3c4 122 PRINTS PR00421 Thioredoxin family signature 78 89 - T 08-07-2011 IPR005746 Thioredoxin Molecular Function:protein disulfide oxidoreductase activity (GO:0015035), Biological Process:glycerol ether metabolic process (GO:0006662), Biological Process:cell redox homeostasis (GO:0045454), Molecular Function:electron carrier activity (GO:0009055)
        A2YIW7 f927b0d241297dcc9a1c5990b58bf3c4 122 PRINTS PR00421 Thioredoxin family signature 31 39 - T 08-07-2011 IPR005746 Thioredoxin Molecular Function:protein disulfide oxidoreductase activity (GO:0015035), Biological Process:glycerol ether metabolic process (GO:0006662), Biological Process:cell redox homeostasis (GO:0045454), Molecular Function:electron carrier activity (GO:0009055)
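
The tab-separated output on slide 29 is easy to load into a script. The following Python sketch reads such rows into dictionaries. The column names are labels chosen here by inspecting the example rows (the slide itself does not name the columns), so treat them as assumptions; the sample row is the Pfam line from the slide with its tabs written out explicitly.

```python
import csv
import io

# Assumed column labels, inferred from the example rows on the slide.
COLUMNS = [
    "protein_accession", "sequence_md5", "sequence_length", "analysis",
    "signature_accession", "signature_description", "start", "stop",
    "score", "status", "date", "interpro_accession",
    "interpro_description", "go_terms",
]


def read_matches(handle):
    """Yield one dict per non-empty row of tab-separated match output."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue
        # Pad short rows so that optional trailing columns are always present.
        row = row + [""] * (len(COLUMNS) - len(row))
        yield dict(zip(COLUMNS, row))


SAMPLE = (
    "A2YIW7\tf927b0d241297dcc9a1c5990b58bf3c4\t122\tPfam\tPF00085\t"
    "Thioredoxin\t9\t112\t1.3E-28\tT\t08-07-2011\tIPR013766\t"
    "Thioredoxin domain\t"
    "Biological Process:cell redox homeostasis (GO:0045454)\n"
)

if __name__ == "__main__":
    for match in read_matches(io.StringIO(SAMPLE)):
        print(match["analysis"], match["signature_accession"],
              match["start"], match["stop"], match["interpro_accession"])
```

For a real run, `read_matches(open("test_proteins.tsv"))` would take the place of the in-memory sample.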
  • 30. XML output. Command and (truncated) output:
        ./interproscan.sh -i test_proteins.fasta -o test_proteins.xml --goterms -F xml
        <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
        <protein-matches xmlns="http://www.ebi.ac.uk/schema/interpro">
          <protein>
            <sequence md5="f927b0d241297dcc9a1c5990b58bf3c4">MAAEEGVVIACHNKDEFDAQMTKAKEAGKVVIIDFTASWCGPCRFIAPVFAEYAKKFPGAVFLKVDVDELKEVAEKYNVEAMPTFLFIKDGAEADKVVGARKDDLQNTIVKHVGATAASASA</sequence>
            <xref id="A2YIW7"/>
            <matches>
              <fingerprints-match graphscan="III" evalue="2.500000864E-7">
                <signature name="THIOREDOXIN" desc="Thioredoxin family signature" ac="PR00421">
                  <models>
                    <model name="THIOREDOXIN" desc="Thioredoxin family signature" ac="PR00421"/>
                  </models>
                  <signature-library-release version="41.1" library="PRINTS"/>
                </signature>
                <locations>
                  <fingerprints-location score="0.0" pvalue="0.0" motifNumber="3" end="48" start="39"/>
                  <fingerprints-location score="0.0" pvalue="0.0" motifNumber="2" end="89" start="78"/>
                  <fingerprints-location score="0.0" pvalue="0.0" motifNumber="1" end="39" start="31"/>
                </locations>
              </fingerprints-match>
              <hmmer2-match score="100.5" evalue="-INF">
                <signature name="Thioredoxin" ac="PIRSF000077">
                  <models>
                    <model name="Thioredoxin" ac="PIRSF000077"/>
                  </models>
                  <signature-library-release version="2.74" library="PIRSF"/>
                </signature>
                <locations>
                  <hmmer2-location hmm-length="0" hmm-end="108" hmm-start="1" evalue="1.50000307E-27" score="0.0" end="113" start="4"/>
                </locations>
              </hmmer2-match>
              ...etc
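
The XML on slide 30 uses a default namespace, which is the main thing to get right when reading it. The sketch below uses Python's standard `xml.etree.ElementTree` to list the signatures matched for each protein. The element and attribute names are taken from the slide; the sample document is a trimmed version of that output with the closing tags that the slide elides added back, which is an assumption about how the full file ends.

```python
import xml.etree.ElementTree as ET

# Namespace URI as shown in the xmlns attribute on the slide.
NS = {"ipr": "http://www.ebi.ac.uk/schema/interpro"}


def list_signatures(xml_text):
    """Return (protein id, member database, signature accession) tuples."""
    root = ET.fromstring(xml_text)
    found = []
    for protein in root.findall("ipr:protein", NS):
        xref = protein.find("ipr:xref", NS)
        protein_id = xref.get("id") if xref is not None else None
        for signature in protein.iterfind(".//ipr:signature", NS):
            release = signature.find("ipr:signature-library-release", NS)
            library = release.get("library") if release is not None else None
            found.append((protein_id, library, signature.get("ac")))
    return found


# Trimmed sample; closing tags after the second match are assumed.
SAMPLE = """
<protein-matches xmlns="http://www.ebi.ac.uk/schema/interpro">
  <protein>
    <xref id="A2YIW7"/>
    <matches>
      <fingerprints-match graphscan="III" evalue="2.500000864E-7">
        <signature name="THIOREDOXIN" ac="PR00421">
          <signature-library-release version="41.1" library="PRINTS"/>
        </signature>
      </fingerprints-match>
      <hmmer2-match score="100.5" evalue="-INF">
        <signature name="Thioredoxin" ac="PIRSF000077">
          <signature-library-release version="2.74" library="PIRSF"/>
        </signature>
      </hmmer2-match>
    </matches>
  </protein>
</protein-matches>
"""

if __name__ == "__main__":
    for protein_id, library, accession in list_signatures(SAMPLE):
        print(protein_id, library, accession)
```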
  • 31. Pre-calculated match lookup: BerkeleyDB-backed REST web service; includes matches for all of UniParc (27 million sequences); 250 million matches; fast response; integrated into i5. [Chart: response time (ms) per sequence]
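
The idea behind a pre-calculated match lookup can be sketched as "check a store keyed by sequence digest before running the analyses". The Python below is an illustration only. It assumes the key is an MD5 digest of the upper-cased sequence, which is suggested by the `md5` attribute in the XML output but not spelled out on the slides, and it uses a plain dictionary in place of the BerkeleyDB-backed REST service. The names `sequence_key` and `lookup_or_compute` are hypothetical.

```python
import hashlib


def sequence_key(sequence):
    """Digest used as the lookup key (assumed: MD5 of the upper-cased residues)."""
    cleaned = "".join(sequence.split()).upper()   # drop whitespace and line breaks
    return hashlib.md5(cleaned.encode("ascii")).hexdigest()


def lookup_or_compute(sequence, precalculated, compute):
    """Return (matches, was_precalculated).

    precalculated: dict mapping digest -> matches (stand-in for the remote store).
    compute:       callable that runs the analyses when no stored result exists.
    """
    key = sequence_key(sequence)
    if key in precalculated:
        return precalculated[key], True
    return compute(sequence), False


if __name__ == "__main__":
    store = {sequence_key("MAAEEGVV"): ["stored match"]}
    print(lookup_or_compute("maaeegvv", store, lambda s: ["fresh match"]))
    print(lookup_or_compute("MKTAYIAK", store, lambda s: ["fresh match"]))
```

Keying on a digest of the sequence rather than on an accession means identical sequences share one stored result regardless of what they are called.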
  • 32. Other functionality: increased reliability; precalculated match lookup; configuration (simple properties file); nucleotide sequence (getOrf; map matches to nucleotide coordinates); pathway mapping (KEGG, Reactome, MetaCyc, UniPathway).
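
Mapping a protein match back to nucleotide coordinates, as mentioned on slide 32, is a small piece of arithmetic once the position of the open reading frame is known. The sketch below is not InterProScan's implementation. It assumes 1-based inclusive coordinates, that `orf_start` and `orf_end` are the first and last nucleotide of the ORF on the forward-strand numbering, and that translation starts at the first base of the ORF; the function name and the example ORF position are hypothetical.

```python
def protein_to_nucleotide(aa_start, aa_end, orf_start, orf_end, strand="+"):
    """Convert a protein match (residue positions) to nucleotide coordinates.

    All coordinates are 1-based and inclusive. The result is reported on
    forward-strand numbering as (nt_start, nt_end) with nt_start <= nt_end.
    """
    if not 1 <= aa_start <= aa_end:
        raise ValueError("expected 1 <= aa_start <= aa_end")

    if strand == "+":
        # Residue n occupies bases orf_start + 3(n-1) .. orf_start + 3n - 1.
        nt_start = orf_start + (aa_start - 1) * 3
        nt_end = orf_start + aa_end * 3 - 1
    elif strand == "-":
        # On the reverse strand the first residue sits at the ORF's high end.
        nt_end = orf_end - (aa_start - 1) * 3
        nt_start = orf_end - aa_end * 3 + 1
    else:
        raise ValueError("strand must be '+' or '-'")

    if nt_start < orf_start or nt_end > orf_end:
        raise ValueError("match extends beyond the ORF")
    return nt_start, nt_end


if __name__ == "__main__":
    # Residues 9..112 (the Pfam match on slide 29) in a hypothetical ORF at 101..469.
    print(protein_to_nucleotide(9, 112, 101, 469, "+"))
```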
  • 33. Future functionality: webservice; interact directly with architecture (LAN, LSF, PBS, Sun Grid Engine); database persistence (Oracle, MySQL, Postgres, etc.); graphical output; other functionality (ask!).
  • 34. InterProScan 5 timeline. Beta release: August 2011 (InterProScan 4 still maintained). Full release: early 2012 (InterProScan 4 deprecated). Contact: interproscan-5-dev@googlegroups.com
  • 36. EBI is an Outstation of the European Molecular Biology Laboratory. Come and see us at booths 9 and 10! Job opportunities; PhD and postdoc positions; training in person and online; services; industry programme.

Editor's Notes

  1. Mention why this needs to be InterPro-specific: we have to cover a lot of different member database definitions.
  2. TALK MORE ABOUT HOW WE DO GO MAPPING IN INTERPRO