Using Ontology to Classify Members of a Protein Family

Invited talk at Cambridge Chemistry Department

Published in: Science, Technology, Education
  • All of which helps build better ontologies. But can we actually apply this computational amenability more
    directly to biological knowledge? In this example, which is work by Katy Wolstencroft, we have codified
    community knowledge about protein domains in phosphatases in OWL. We then take unknown protein sequences,
    pass them through InterPro and stick them into the Instance Store, which is basically a database and a reasoner tied together.
    Qualified cardinality!!!
  • Transcript of "Using Ontology to Classify Members of a Protein Family "

    1. Using Ontology to Classify Members of a Protein Family Robert Stevens BioHealth Informatics Group School of Computer Science University of Manchester Robert.stevens@manchester.ac.uk
    2. Introduction • Developing an automated system for extracting and classifying proteins from newly sequenced genomes • Building an OWL ontology that defines class membership • Describing protein instances in OWL • Classifying against the ontology • Describing the protein family complement of a genome • As good as human classification, but added value • Only possible through inter-disciplinary research
    3. Acknowledgements (it takes all sorts) Katy Wolstencroft (Bioinformatics) Daniele Turi (Instance Store) Phil Lord (myGrid) Lydia Tabernero (Protein Scientist) Matt Horridge, Nick Drummond et al (Protégé OWL) Andy Brass and Robert Stevens (Bioinformatics)
    4. Protein Classification • Proteins divided into broad functional classes “Protein Families” • Families sub-divided to give family classifications • Class membership can be determined by “protein features”, such as domains, etc. • Resources exist for feature detection via primary sequence – but not class membership • Current limitation of automated tools • Needs human knowledge to recognise class membership
    5. Finding Domains on a Sequence A search of the linear sequence of protein tyrosine phosphatase type K – identified 9 functional domains >uniprot|Q15262|PTPK_HUMAN Receptor-type protein-tyrosine phosphatase kappa precursor (EC 3.1.3.48) (R-PTP-kappa). MDTTAAAALPAFVALLLLSPWPLLGSAQGQFSAGGCTFDDGPGACDYHQDLYDDFEWVHV SAQEPHYLPPEMPQGSYMIVDSSDHDPGEKARLQLPTMKENDTHCIDFSYLLYSQKGLNP GTLNILVRVNKGPLANPIWNVTGFTGRDWLRAELAVSSFWPNEYQVIFEAEVSGGRSGYI AIDDIQVLSYPCDKSPHFLRLGDVEVNAGQNATFQCIATGRDAVHNKLWLQRRNGEDIPV… ……..
    6. Why Classify? • Classification and curation of a genome is the first step in understanding the processes and functions happening in an organism • Classification enables comparative genomic studies - what is already known in other organisms • The similarities and differences between processes and functions in related organisms often provide the greatest insight into the biology • In silico characterisation is the current bottleneck
    7. The Protein Phosphatases • Large superfamily of proteins – involved in the removal of phosphate groups from molecules • Important proteins in almost all cellular processes • Involved in diseases – diabetes and cancer • Human phosphatases well characterised
    8. Phosphatase Classification • Diagnostic phosphatase domains/motifs – sufficient for membership of the protein phosphatase superfamily • Any protein having a phosphatase domain is a member of the phosphatase superfamily • Other motifs determine a protein’s place within the family • Usually needs a human to recognise that the features detected imply class membership • Can these be captured in an ontology?
    9. Ontologies • Describing and defining the classes of objects represented in information • Defining the characteristics of objects • The characteristics by which an object can be recognised as belonging to a class • In a form understandable by a computer • … and, of course, humans.
    10. Web Ontology Language (OWL) • W3C recommendation for ontologies for the Semantic Web • OWL-DL mapped to a decidable fragment of first-order logic • Classes, properties and instances • Boolean operators, plus existential and universal quantification • Rich class expressions used in restrictions on properties – hasDomain some (ImmunoglobulinDomain or FibronectinDomain)
    11. OWL represents classes of instances A B C
    12. Necessity and Sufficiency • An R2A phosphatase must have a fibronectin domain • Having a fibronectin domain does not a phosphatase make • Necessity – what must a class instance have? • Any protein that has a phosphatase catalytic domain is a phosphatase enzyme • All phosphatase enzymes have a catalytic domain • Sufficiency – how is an instance recognised to be a member of a class?
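
The distinction on this slide can be sketched in a few lines of Python. This is only an illustration, not the OWL model used in the talk: the function names and the `Counter`-based protein description are assumptions, and the two example proteins are hypothetical.

```python
from collections import Counter

def is_phosphatase(domains: Counter) -> bool:
    """Necessary AND sufficient: a phosphatase catalytic domain defines the class."""
    return domains["PhosphataseCatalyticDomain"] >= 1

def meets_r2a_necessary_condition(domains: Counter) -> bool:
    """Necessary only: every R2A has a fibronectin domain, but not vice versa."""
    return domains["FibronectinDomain"] >= 1

# Hypothetical protein descriptions (domain name -> count).
fibronectin_only = Counter({"FibronectinDomain": 3})
catalytic_only = Counter({"PhosphataseCatalyticDomain": 1})

# Passing a necessary condition does not establish membership ...
print(meets_r2a_necessary_condition(fibronectin_only), is_phosphatase(fibronectin_only))
# ... whereas failing a necessary condition rules membership out.
print(is_phosphatase(catalytic_only), meets_r2a_necessary_condition(catalytic_only))
```

A sufficient condition lets a classifier recognise members; a condition that is only necessary can only be used to exclude them.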
    13. Definition of Tyrosine Phosphatase Class TyrosineReceptorProteinPhosphatase EquivalentTo: Protein That - contains atLeast-1 ProteinTyrosinePhosphataseDomain and - contains EXACTLY 1 TransmembraneDomain
    14. …there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns -- the ones we don't know we don't know.
    15. Definition of Tyrosine Phosphatase: What we Know we Know Class TyrosineReceptorProteinPhosphatase EquivalentTo: Protein That - contains atLeast-1 ProteinTyrosinePhosphataseDomain and - contains EXACTLY 1 TransmembraneDomain
    16. Definition for R2A Phosphatase Class: R2A EquivalentTo: Protein That - contains 2 ProteinTyrosinePhosphataseDomain and - (contains 1 TransmembraneDomain) and - (contains 4 FibronectinDomains) and - contains 1 ImmunoglobulinDomain and - contains 1 MAMDomain and - contains 1 Cadherin-LikeDomain and - contains only ProteinTyrosinePhosphataseDomain or TransmembraneDomain or FibronectinDomain or ImmunoglobulinDomain or Cadherin-LikeDomain or MAMDomain
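
The role of the final "contains only …" clause (the closure) can be illustrated with a small Python check. This is a closed-world simplification, not how an OWL reasoner evaluates the definition. It assumes that a bare number on the slide means "exactly", and that the closure ranges over the same six domain types as the count clauses. `ZincFingerDomain` is a hypothetical extra domain used only for the example.

```python
from collections import Counter

# Domain counts taken from the R2A definition on the slide
# (assumption: a bare number is read as "exactly").
R2A_COUNTS = {
    "ProteinTyrosinePhosphataseDomain": 2,
    "TransmembraneDomain": 1,
    "FibronectinDomain": 4,
    "ImmunoglobulinDomain": 1,
    "MAMDomain": 1,
    "Cadherin-LikeDomain": 1,
}

def is_r2a(domains: Counter, closed: bool = True) -> bool:
    """Check the count clauses and, optionally, the closure clause."""
    counts_ok = all(domains[name] == n for name, n in R2A_COUNTS.items())
    if not closed:
        return counts_ok
    # Closure ("contains only ..."): no domain type outside the listed ones.
    no_extras = all(name in R2A_COUNTS for name, n in domains.items() if n > 0)
    return counts_ok and no_extras

canonical = Counter(R2A_COUNTS)
# Hypothetical protein with one additional, unlisted domain.
with_extra = canonical + Counter({"ZincFingerDomain": 1})

print(is_r2a(canonical))                 # satisfies counts and closure
print(is_r2a(with_extra))                # rejected by the closure clause
print(is_r2a(with_extra, closed=False))  # accepted if the closure is dropped
```

Without the closure, a protein carrying an unexpected extra domain would still be recognised as R2A. With it, such a protein falls outside the class and stands out as a candidate for closer inspection.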
    17. Automatic Reasoning • An OWL-DL ontology is mapped to its DL form as a collection of axioms • An automatic reasoner checks for satisfiability – throws out the inconsistent and infers subsumption • Defined classes (where there are necessary and sufficient restrictions) enable a reasoner to infer subclass axioms • Also infer to which class an object belongs • Based on the facts we know about it
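
To give a feel for how a subclass relation can be inferred from two definitions rather than asserted, here is a toy Python sketch. It is not a description-logic reasoner: each class is reduced to a count range per domain type, closure clauses are ignored, and the names `ClassDef`, `within` and `is_subclass` are invented for this example. The two definitions follow the slides, reading a bare number as "exactly".

```python
from typing import Dict, Optional, Tuple

Interval = Tuple[int, Optional[int]]   # (min, max); max=None means unbounded
ClassDef = Dict[str, Interval]         # domain name -> allowed count range

TYROSINE_RECEPTOR_PP: ClassDef = {
    "ProteinTyrosinePhosphataseDomain": (1, None),  # at least 1
    "TransmembraneDomain": (1, 1),                  # exactly 1
}
R2A: ClassDef = {
    "ProteinTyrosinePhosphataseDomain": (2, 2),
    "TransmembraneDomain": (1, 1),
    "FibronectinDomain": (4, 4),
    "ImmunoglobulinDomain": (1, 1),
    "MAMDomain": (1, 1),
    "Cadherin-LikeDomain": (1, 1),
}

def within(inner: Interval, outer: Interval) -> bool:
    """True if every count allowed by `inner` is also allowed by `outer`."""
    lo_ok = inner[0] >= outer[0]
    hi_ok = outer[1] is None or (inner[1] is not None and inner[1] <= outer[1])
    return lo_ok and hi_ok

def is_subclass(sub: ClassDef, sup: ClassDef) -> bool:
    """`sub` is subsumed by `sup` if it satisfies every restriction of `sup`."""
    unconstrained: Interval = (0, None)
    return all(within(sub.get(d, unconstrained), rng) for d, rng in sup.items())

print(is_subclass(R2A, TYROSINE_RECEPTOR_PP))   # R2A is the more specific class
print(is_subclass(TYROSINE_RECEPTOR_PP, R2A))   # the reverse does not hold
```

Nothing in the definitions states that R2A is a kind of receptor tyrosine phosphatase. The relation follows from comparing the restrictions, which is the kind of inference the slide attributes to the reasoner.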
    18. Incremental Addition of Protein Functional Domains • Phosphatase catalytic • Cadherin-like • Immunoglobulin • MAM domain • Cellular retinaldehyde • Adhesion recognition • Transmembrane • Fibronectin III • Glycosylation
    19. Building the Ontology • Classifications already made by biologists – based on protein functionality; • Protein domain composition and other details in the literature; • Some 50 classes of phosphatase, 30 protein domains and one relationship; • “Value partition” of protein domains (covering and disjoint); • Defines range of contains property; • Literature contains knowledge of how to recognise members of each class of phosphatase.
    20. Classification of the Classical Tyrosine Phosphatases
    21. What is the Ontology Telling Us? • Each class of phosphatase defined in terms of domain composition • We know the characteristics by which an individual protein can be recognised to be a member of a particular class of phosphatase • We have this knowledge in a computational form • If we had protein instances described in terms of the ontology, we could classify those individual proteins • A catalogue of phosphatases
    22. Description of an Instance of a Protein • Instance: P21592 TypeOf: Protein That Fact: hasDomain 2 ProteinTyrosinePhosphataseDomain and Fact: hasDomain 1 TransmembraneDomain and Fact: hasDomain 4 FibronectinDomains and Fact: hasDomain 1 ImmunoglobulinDomain and Fact: hasDomain 1 MAMDomain and Fact: hasDomain 1 Cadherin-LikeDomain
    23. Instance: P21592 TypeOf: Protein That Fact: hasDomain 2 ProteinTyrosinePhosphataseDomain and Fact: hasDomain 1 TransmembraneDomain and Fact: hasDomain 4 FibronectinDomains and Fact: hasDomain 1 ImmunoglobulinDomain and Fact: hasDomain 1 MAMDomain and Fact: hasDomain 1 Cadherin-LikeDomain [Overlaid class definitions, partly clipped in the transcript:] Tyrosine Phosphatase (containsDomain some TransmembraneDomain) and (containsDomain at least 1 ProteinTyrosinePhosphataseDomain) …tase …n some MAMDomain) and …n some ProteinTyrosineCatalyticDomain or ImmunoglobulinDomain) and …n some FibronectinDomain or FibronectinTypeIIIFoldDomain) and …n exactly 2 ProteinTyrosinePhosphataseDomain)
    24. Classifying Proteins >uniprot|Q15262|PTPK_HUMAN Receptor-type protein-tyrosine phosphatase kappa precursor (EC 3.1.3.48) (R-PTP-kappa). MDTTAAAALPAFVALLLLSPWPLLGSAQGQFSAGGCTFDDGPGACDYHQDLYDDFEWVHV SAQEPHYLPPEMPQGSYMIVDSSDHDPGEKARLQLPTMKENDTHCIDFSYLLYSQKGLNP GTLNILVRVNKGPLANPIWNVTGFTGRDWLRAELAVSSFWPNEYQVIFEAEVSGGRSGYI AIDDIQVLSYPCDKSPHFLRLGDVEVNAGQNATFQCIATGRDAVHNKLWLQRRNGEDIPV……….. [Diagram labels: InterPro, Instance Store, Reasoner, Translate, Codify]
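
The "translate and codify" step, turning a sequence record plus detected domains into an instance description, might look roughly like the sketch below. It does not call InterPro or the Instance Store. The helper names are invented for illustration, and the list of domain hits is a placeholder rather than real InterPro output for Q15262.

```python
from collections import Counter
from typing import List, Tuple

def parse_fasta_header(header: str) -> Tuple[str, str]:
    """Return (accession, entry name) from a '>db|ACC|NAME description' header."""
    _db, accession, rest = header.lstrip(">").split("|", 2)
    entry_name = rest.split(None, 1)[0]
    return accession, entry_name

def codify(accession: str, hits: List[str]) -> str:
    """Render domain hits as an instance description in the slides' notation."""
    counts = Counter(hits)
    facts = [f"Fact: hasDomain {n} {name}" for name, n in sorted(counts.items())]
    return "\n".join([f"Instance: {accession}", "TypeOf: Protein", *facts])

header = (">uniprot|Q15262|PTPK_HUMAN Receptor-type protein-tyrosine "
          "phosphatase kappa precursor (EC 3.1.3.48) (R-PTP-kappa).")
accession, entry_name = parse_fasta_header(header)

# Placeholder hits for illustration only; NOT real InterPro output for Q15262.
example_hits = [
    "ProteinTyrosinePhosphataseDomain",
    "ProteinTyrosinePhosphataseDomain",
    "TransmembraneDomain",
]
description = codify(accession, example_hits)
print(description)
```

In the system described in the talk, a description of this kind is what gets stored and handed to the reasoner for classification against the ontology.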
    25. So Far… • Human phosphatases have been classified using the system • The ontology classification performed as well as expert classification • The ontology system refined the classification - DUSC contains a zinc finger domain: characterised and conserved, but not in the classification - DUSA contains a disintegrin domain: previously uncharacterised, evolutionarily conserved • A new kind of phosphatase?
    26. Aspergillus fumigatus • Phosphatase complement very different from human: >100 in human, <50 in A. fumigatus • Whole subfamilies ‘missing’: Different fungi-specific phosphorylation pathways? No requirement for tissue-specific variations? • Novel serine/threonine phosphatase with homeobox: Conserved in Aspergillus and closely related species, but not in any other. Again, a new phosphatase?
    27. Scaling • Over 700 protein families • Some 14,000 described sequence features • Hundreds of thousands of types of protein • Mass classification, then what?
    28. Generic Technique • Feature detection • Categories defined in terms of those features • Produce catalogue of what you currently know • Highlight cases that don’t match current knowledge
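
The four steps on this slide are independent of phosphatases and can be sketched generically. The following Python outline is an assumption-laden illustration: categories are plain predicates over feature sets, and the rule names, item names and the `build_catalogue` function are all hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

Features = Set[str]
Rule = Callable[[Features], bool]

def build_catalogue(
    rules: Dict[str, Rule], items: Dict[str, Features]
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Assign each item to every category it matches; collect the rest."""
    catalogue: Dict[str, List[str]] = {name: [] for name in rules}
    unmatched: List[str] = []
    for item_id, feats in items.items():
        matched = [name for name, rule in rules.items() if rule(feats)]
        for name in matched:
            catalogue[name].append(item_id)
        if not matched:
            # Cases that do not fit current knowledge are the interesting ones.
            unmatched.append(item_id)
    return catalogue, unmatched

# Hypothetical categories and items, for illustration only.
rules: Dict[str, Rule] = {
    "HasA": lambda f: "A" in f,
    "HasAandB": lambda f: {"A", "B"} <= f,
}
items: Dict[str, Features] = {"item1": {"A"}, "item2": {"A", "B"}, "item3": {"C"}}

catalogue, unmatched = build_catalogue(rules, items)
print(catalogue)
print(unmatched)
```

The unmatched list corresponds to the last bullet: items whose detected features fit no existing category are flagged rather than silently dropped.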
    29. Conclusions • Using ontology allows automated classification to reach the standard of human expert annotation • Reasoning capabilities allow interpretation of domain organisation • Capturing human knowledge in computational form • Systematic survey produces interesting biological questions • Discovering the unexpected • Allows fast, efficient comparative genomics studies • A combination of CS and bioinformatics to do biology