Usability and Bioinformatics: experience and research challenges

Slides of Davide Bolchini's seminar at the Centre for HCI Design, City University, London (UK), on 2 May 2008. Host: Prof. Neil Maiden.

  1. Usability and Bioinformatics: Experience and Challenges. Davide Bolchini, University College London, Dept. of Computer Science, and University of Lugano, Faculty of Communication Sciences, TEC-Lab. Joint work with Anthony Finkelstein (UCL), Vito Perrone (UCL), Paolo Paolini (POLIMI and USI), Luca Mainetti (UNILE). Seminar at City University, London, Centre for HCI Design, 2 May 2008
  2. Outline <ul><li>The context </li></ul><ul><ul><li>web and bioinformatics applications </li></ul></ul><ul><li>Research goal </li></ul><ul><ul><li>improve usability and disseminate design knowledge </li></ul></ul><ul><li>Ongoing work </li></ul><ul><ul><li>results and challenges </li></ul></ul><ul><li>Roadmap for future work </li></ul>
  3. The context
  4. Web applications in bioinformatics <ul><li>Bioinformatics (or computational biology): applying computer science tools to the analysis, management and integration of biological information (genes, genomes, proteins, cells, clinical information)* </li></ul><ul><li>The aim is to elucidate biological processes </li></ul><ul><li>Huge sets of biological data are made publicly accessible via several databases and repositories </li></ul><ul><li>Web applications are designed on top of them to disseminate, access, share, cross-reference and manipulate those data. </li></ul>* Adapted from Sylvia B. Nagl, "Introduction to Bioinformatics", UCL introductory bioinformatics course 2008
  5. Diagram: actors around bioinformatics web applications. Bioinformatics researchers; biomedical and industrial researchers ("biologists", "wet" scientists); designers and developers; […]. Relations shown: "use, feed" and "design, use, feed".
  6. A world of heterogeneous resources <ul><li>Contents </li></ul><ul><ul><li>Proteins, protein structures, functions, sequences, genes, genomes, experimental data, clinical evidence, … </li></ul></ul><ul><li>Applications </li></ul><ul><ul><li>Hundreds of web repositories being developed, published and updated </li></ul></ul><ul><li>Evolution </li></ul><ul><ul><li>Originally designed by a local team, repositories become relevant to a wider audience and are used for different purposes in different contexts by different people </li></ul></ul><ul><ul><li>Design quality and usability for the end users (biologists) do not always keep pace with this process </li></ul></ul>
  7. Stakeholders' concerns <ul><li>Biologists: research/task support, accessibility, findability, usability </li></ul><ul><li>Bioinformaticians: accuracy of data, availability </li></ul><ul><li>Developers: efficient design/delivery/implementation, maintainability, … </li></ul><ul><li>Financial partners/funding organizations: "effectiveness" and "impact" of the funded applications (usage, satisfaction, "improvement in work"): better science </li></ul><ul><li>… </li></ul>
  8. Motivation for the work <ul><li>Effort, emphasis and primary funding focussed on content production and dissemination of results </li></ul><ul><ul><li>Limited attention so far to ensuring the actual usability of bioinformatics applications for biologists </li></ul></ul><ul><li>Enhanced usability of the resources can </li></ul><ul><ul><li>Enable life science researchers to exploit the full potential of the data </li></ul></ul><ul><ul><li>Generate wider adoption </li></ul></ul><ul><ul><li>Decrease the cost of technical support </li></ul></ul><ul><ul><li>Increase trust placed in the groups/institutions </li></ul></ul><ul><ul><li>Better support biologists in their work and help them gain further insights (better science) </li></ul></ul>
  9. Research Goals
  10. Goals <ul><li>Improve the usability of web bioinformatics resources </li></ul><ul><ul><li>"Making the design right" </li></ul></ul><ul><ul><ul><li>Ensure the usability of existing applications </li></ul></ul></ul><ul><ul><li>"Making the right design" </li></ul></ul><ul><ul><ul><li>Re-examine the requirements and provide enhanced, advanced support for biologists' work </li></ul></ul></ul><ul><li>Generate bottom-up awareness in the bioinformatics community </li></ul><ul><li>Provide (transfer) tools (methods, patterns, guidelines) to designers to develop applications meeting the requirements of all stakeholders </li></ul>
  11. Related Research <ul><li>Effort in building integrated interfaces over repositories (Javaheri) </li></ul><ul><li>Advanced visualization techniques (Hochheiser, Shneiderman) </li></ul><ul><li>Analysis of information-driven activities (Bartlett) </li></ul><ul><li>Classification of tasks (Stevens) </li></ul><ul><li>Is this enough? What about investigating design and usability issues? </li></ul>
  12. Ongoing work & results
  13. Understanding usability <ul><li>Characterizing usability problems in bioinformatics </li></ul><ul><ul><li>Usability analysis of a sample of well-known applications </li></ul></ul><ul><ul><ul><li>Usability inspection of a protein classification web application (CATH) [browsing] </li></ul></ul></ul><ul><ul><ul><li>User testing on three major repositories (NCBI, SwissProt, BioCarta) [search] </li></ul></ul></ul><ul><li>Crafting more usable design solutions </li></ul>
  14. Protein Classification: Advanced Browsing (Concept 4.0, April 2008)
  15. Protein Classification <ul><li>Protein classification based on a hierarchical model </li></ul><ul><ul><li>Each hierarchy level groups proteins with similar characteristics (based on structure, sequence, functional properties) </li></ul></ul><ul><ul><li>E.g., the CATH and SCOP repositories </li></ul></ul>
  16. Protein Classification <ul><li>Hierarchical classifications are typically turned into hierarchical navigation models </li></ul><ul><ul><li>Pure tree navigation structures </li></ul></ul><ul><ul><ul><li>with many levels (7-8) </li></ul></ul></ul><ul><ul><li>Prone to offering rigid navigation mechanisms </li></ul></ul>
  17. Current information architecture and navigation: CATH
  19. Opportunities for improvement <ul><li>Tree-based navigation </li></ul><ul><ul><li>At each level </li></ul></ul><ul><ul><ul><li>only the nodes of the immediately following level are accessible </li></ul></ul></ul><ul><ul><ul><li>nodes further down the hierarchy are not directly accessible </li></ul></ul></ul><ul><ul><li>To reach leaf nodes (protein domains), the user is forced to traverse all the levels of the hierarchy </li></ul></ul><ul><ul><li>There is a fixed access sequence the user must follow </li></ul></ul><ul><li>Effective when the user can specify upfront the values of all (8) parameters of the hierarchy in order to locate a protein domain </li></ul><ul><li>Less effective when users have ill-defined knowledge of the classification parameters and need to explore and iteratively refine the browsing scope </li></ul>
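To make the cost of tree-based navigation concrete, here is a minimal Python sketch, assuming a simplified in-memory tree; the `Node` class, the helper names and the toy labels are hypothetical and not taken from CATH. Only the children of the current node are offered, so reaching a domain takes one step per level, and a leaf cannot be reached directly from the root.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a strict classification tree (hypothetical model)."""
    label: str
    children: dict = field(default_factory=dict)  # child label -> Node


def add_path(root, path):
    """Insert a root-to-leaf path, e.g. class, architecture, ..., domain."""
    node = root
    for label in path:
        node = node.children.setdefault(label, Node(label))


def visible_from(node):
    """Tree navigation: only the immediate children are offered."""
    return sorted(node.children)


def clicks_to_reach(root, path):
    """Count the steps needed to reach the last label of path from the root."""
    node, clicks = root, 0
    for label in path:
        if label not in node.children:
            raise KeyError(f"{label!r} is not reachable from {node.label!r}")
        node = node.children[label]
        clicks += 1
    return clicks


# Toy data: hypothetical labels, not real CATH records (real paths are longer).
root = Node("root")
add_path(root, ["Mainly Alpha", "Alpha Horseshoe", "T1", "H1", "dom1"])
add_path(root, ["Mainly Alpha", "Alpha Horseshoe", "T2", "H2", "dom2"])

print(visible_from(root))  # ['Mainly Alpha']
print(clicks_to_reach(root, ["Mainly Alpha", "Alpha Horseshoe", "T1", "H1", "dom1"]))  # 5
```

Asking for `dom1` straight from the root raises `KeyError`, which is the "necessary access sequence" described on the slide.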
  20. Challenge <ul><li>The challenge is decoupling </li></ul><ul><ul><li>Information architecture </li></ul></ul><ul><ul><ul><li>Hierarchical </li></ul></ul></ul><ul><ul><ul><li>Useful to represent the domain knowledge </li></ul></ul></ul><ul><ul><ul><li>Meeting specific needs of bioinformaticians? </li></ul></ul></ul><ul><ul><li>Navigation/interaction paradigms on top </li></ul></ul><ul><ul><ul><li>Many are possible (including hierarchical ones) </li></ul></ul></ul><ul><ul><ul><li>Supporting a more open-ended set of potential access and exploration tasks </li></ul></ul></ul><ul><ul><ul><li>Useful to browse effectively and efficiently according to various users' needs, especially those of biologists </li></ul></ul></ul>
  21. Preliminary high-level concept
  22. Basic Design Paradigm <ul><li>Each classification criterion (hierarchy level) is modelled as a primary navigation dimension (facet or trail) </li></ul><ul><li>It can be "projected" onto any other sublevel to facilitate the representation and visualization of the information </li></ul><ul><li>Hypermedia remodelling </li></ul>
  23. Mockup (beta): "Navigating the Protein Classification", with four collapsed facets (+): Class, Architecture, Topology, Homologous Superfamily
  24. Basic Design Paradigm <ul><li>Remodel the entire hierarchy into a semi-flat structure </li></ul><ul><li>made of mini-hierarchies of facet values, or groups of trails </li></ul>
  25. Mockup (beta): the four facets expanded (-), each listing its values: Class (4), e.g. Mainly Alpha, Mainly Beta, Mixed Alpha-Beta, Few Secondary Structures; Architecture (40), e.g. Orthogonal Bundle, Up-down Bundle, Alpha Horseshoe, Alpha solenoid, Alpha/alpha barrel, Ribbon, Single Sheet, Roll, Beta Barrel, Sandwich, ...; Topology (1084), e.g. Single alpha-helices, Heat-Stable Enterotoxin B, F1FO ATP Synthase, ...; Homologous Superfamily (2091), e.g. Protein binding, High density lipoproteins, Blood coagulation, ...
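The remodelling into facet values can be sketched as follows, under the assumption that each domain's root-to-leaf path is available as a tuple; the data and function names are hypothetical. This is a fully flat simplification: every level becomes an independent facet with per-value domain counts, while the mini-hierarchies mentioned in the concept (e.g. grouping architectures under their class) are not modelled here.

```python
from collections import Counter

# The four CATH levels used as facets in the concept (order = original hierarchy).
LEVELS = ("class", "architecture", "topology", "superfamily")

# Toy root-to-leaf paths (hypothetical labels, not real CATH data):
# (class, architecture, topology, superfamily, domain id)
PATHS = [
    ("Mainly Alpha", "Alpha Horseshoe", "T1", "H1", "dom1"),
    ("Mainly Alpha", "Alpha Horseshoe", "T1", "H2", "dom2"),
    ("Mainly Alpha", "Alpha solenoid", "T2", "H3", "dom3"),
    ("Mainly Beta", "Beta Barrel", "T3", "H4", "dom4"),
]


def flatten(paths):
    """Turn each tree path into a flat record with one field per level."""
    return [dict(zip(LEVELS + ("domain",), p)) for p in paths]


def facets(records):
    """Each level becomes an independent facet: value -> number of domains."""
    return {level: Counter(r[level] for r in records) for level in LEVELS}


records = flatten(PATHS)
for level, values in facets(records).items():
    print(level, dict(values))
```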
  26. 1. Visualizing the distribution between classification levels
  27. Mockup (beta), "Filter classification by" panel with the facets Class (4), Architecture (40), Topology (1084) and Homologous Superfamily (2091), showing domain counts at each level (e.g. Class 1: Mainly Alpha [19729 domains]; architectures Alpha Horseshoe [443 domains] and Alpha solenoid [6 domains]; topologies such as Leucine-rich Repeat Variant [91 domains] and Serine Threonine Protein Phosphatase 5, Tetratricopeptide repeat [349 domains]), sortable by name | domains | code. Questions the view addresses: <ul><li>How are superfamilies distributed among topologies? </li></ul><ul><li>How are superfamilies distributed among architectures? </li></ul><ul><li>Which homologous superfamilies have architecture "alpha horseshoe"? </li></ul><ul><li>Which topologies have architecture "alpha horseshoe"? </li></ul><ul><li>... </li></ul><ul><li>What are the class "alpha" topologies? </li></ul><ul><li>What are the class "alpha" architectures? </li></ul><ul><li>How many protein domains are there in each class? </li></ul><ul><li>... </li></ul>
  28. Potential Benefits <ul><li>Gaining insights into the cardinality of the protein classes and their relationships </li></ul><ul><li>Skipping levels of the hierarchy (top-down) to visualize relative distribution </li></ul><ul><li>Applicable to further sequence levels (SOLID) from any major level (CATH) </li></ul>
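A minimal sketch of how the level-to-level distribution could be computed, assuming flat records with one field per classification level (toy, hypothetical data and function names). Because any two levels can be paired, intermediate levels are skipped, which is what lets the view answer questions such as which topologies have architecture "alpha horseshoe".

```python
from collections import Counter, defaultdict

# Toy flat records (hypothetical, not real CATH data).
RECORDS = [
    {"class": "Mainly Alpha", "architecture": "Alpha Horseshoe", "topology": "T1", "superfamily": "H1"},
    {"class": "Mainly Alpha", "architecture": "Alpha Horseshoe", "topology": "T1", "superfamily": "H2"},
    {"class": "Mainly Alpha", "architecture": "Alpha Horseshoe", "topology": "T2", "superfamily": "H3"},
    {"class": "Mainly Alpha", "architecture": "Alpha solenoid", "topology": "T3", "superfamily": "H4"},
    {"class": "Mainly Beta", "architecture": "Beta Barrel", "topology": "T4", "superfamily": "H5"},
]


def distribution(records, row_level, col_level):
    """Domain counts for every (row value, column value) pair.

    row_level and col_level can be any two levels (e.g. class and
    superfamily), so the levels in between are skipped.
    """
    table = defaultdict(Counter)
    for r in records:
        table[r[row_level]][r[col_level]] += 1
    return {row: dict(cols) for row, cols in table.items()}


def domains_per(records, level):
    """Total number of domains for each value of one level."""
    return dict(Counter(r[level] for r in records))


# "Which topologies have architecture 'alpha horseshoe'?"
print(distribution(RECORDS, "architecture", "topology")["Alpha Horseshoe"])  # {'T1': 2, 'T2': 1}
# "How many protein domains are there in each class?"
print(domains_per(RECORDS, "class"))  # {'Mainly Alpha': 4, 'Mainly Beta': 1}
```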
  29. 2. Navigating the full protein collection by any criterion
  30. Mockup (beta): the Architecture facet (40) is expanded with Alpha Horseshoe [443 domains] selected; the result panel "Protein Domains, Architecture: Alpha Horseshoe (443 domains)" lists domain codes (e.g. 1mu5A02, 1mu5A03, 1mu5A05, 4mu4A01) with the label "16 out of 443 domains", sortable by name | domains | code; the facets Class (4), Topology (1084) and Homologous Superfamily (2091) remain collapsed (+).
  31. Potential Benefits <ul><li>Faceted navigation </li></ul><ul><ul><li>Any classification dimension can be used independently as a criterion for accessing the protein domains </li></ul></ul><ul><ul><li>Browsing ALL the protein domain instances for any dimension (facet, hypertext trail) </li></ul></ul><ul><ul><li>No need to traverse all the levels to reach the protein domains </li></ul></ul>
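A minimal sketch of faceted access, assuming flat records like those above (hypothetical IDs and labels, hypothetical function names). Any single facet value returns all matching domains directly, with no intermediate levels; the page size of 16 mirrors the "16 out of 443 domains" label in the mockups and is an assumption, not a CATH setting.

```python
# Toy records (hypothetical IDs and labels, not real CATH data).
DOMAINS = [
    {"id": "dom1", "class": "Mainly Alpha", "architecture": "Alpha Horseshoe", "topology": "T1"},
    {"id": "dom2", "class": "Mainly Alpha", "architecture": "Alpha Horseshoe", "topology": "T2"},
    {"id": "dom3", "class": "Mainly Alpha", "architecture": "Alpha solenoid", "topology": "T3"},
    {"id": "dom4", "class": "Mainly Beta", "architecture": "Beta Barrel", "topology": "T4"},
]


def domains_by(facet, value, records=DOMAINS):
    """All domains with the given value on one facet; no other level is needed."""
    return [r["id"] for r in records if r.get(facet) == value]


def page(ids, start=0, size=16):
    """One page of results plus an 'N out of M domains' label, as in the mockups."""
    shown = ids[start:start + size]
    return shown, f"{len(shown)} out of {len(ids)} domains"


print(page(domains_by("architecture", "Alpha Horseshoe")))  # (['dom1', 'dom2'], '2 out of 2 domains')
print(domains_by("topology", "T3"))  # ['dom3']
```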
  32. 3. Superimposing multiple classifications while browsing the protein collection
  33. Mockup (beta): while the domains of Architecture: Alpha Horseshoe (443 domains) are listed ("16 out of 443 domains"), the Topology facet (3) is expanded alongside them: Leucine-rich Repeat Variant [91 domains]; 70-kda Soluble Lytic Transglycosylase, domain 1 [3 domains]; Serine Threonine Protein Phosphatase 5, Tetratricopeptide repeat [349 domains].
  34. Mockup (beta): the domains of Architecture: Alpha Horseshoe (443 domains), "16 out of 443 domains", with both the Topology facet (3) and the Homologous Superfamily facet (2091) expanded (e.g. Leucine-rich Repeat Variant [89 domains]; Lipovitellin, Chain A, domain 2 [1 domain]; IP3 receptor type 1 binding core, domain 2 [1 domain]; 70-kda Soluble Lytic Transglycosylase, domain 1 [3 domains]).
  35. ...with progressive filtering
  36. Mockup (beta): the same view after further selections in the Topology and Homologous Superfamily facets; the domain list narrows to "3 out of 59 domains" (1mu5A05, 4mu4A01, 1mu5A02).
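Superimposing classifications with progressive filtering can be sketched as conjunctive facet selection, assuming toy records and hypothetical helper names: each selection narrows the result set (logical AND), and the counts shown for the remaining facets are recomputed on that narrowed set rather than on the whole collection.

```python
from collections import Counter

LEVELS = ("class", "architecture", "topology", "superfamily")

# Toy records (hypothetical, not real CATH data).
DOMAINS = [
    {"id": "dom1", "class": "Mainly Alpha", "architecture": "Alpha Horseshoe", "topology": "T1", "superfamily": "H1"},
    {"id": "dom2", "class": "Mainly Alpha", "architecture": "Alpha Horseshoe", "topology": "T1", "superfamily": "H2"},
    {"id": "dom3", "class": "Mainly Alpha", "architecture": "Alpha Horseshoe", "topology": "T2", "superfamily": "H3"},
    {"id": "dom4", "class": "Mainly Beta", "architecture": "Beta Barrel", "topology": "T3", "superfamily": "H4"},
]


def apply_filters(records, selection):
    """Keep records matching every selected facet value (logical AND)."""
    return [r for r in records if all(r[f] == v for f, v in selection.items())]


def facet_counts(records, selection):
    """Counts for every unselected facet, computed on the current result set."""
    current = apply_filters(records, selection)
    return {f: dict(Counter(r[f] for r in current)) for f in LEVELS if f not in selection}


selection = {"architecture": "Alpha Horseshoe"}
print(len(apply_filters(DOMAINS, selection)))        # 3
print(facet_counts(DOMAINS, selection)["topology"])  # {'T1': 2, 'T2': 1}

selection["topology"] = "T1"  # refine the browsing scope
print([r["id"] for r in apply_filters(DOMAINS, selection)])  # ['dom1', 'dom2']
```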
  37. 4. Associative navigation from the protein details
  38. Mockup (beta): detail page of protein domain 1eyhA00, showing its ATOM and COMBS sequences, a summary of chain 1eyhA (PDB code 1eyh, sequence length 144, chain chopped 05 Mar 2006) and the PDB status (release date 06 May 2000), next to a "See also domains of the same" panel: Class: alpha (19,729 domains); Architecture: alpha horseshoe (443 domains); Topology: Serine… (349 domains); Homologous Superfamily: cell cycle (46 domains); [by other levels].
  39. Mockup (beta): the domain list for Architecture: Alpha Horseshoe (443 domains), "16 out of 443 domains", with the facets Class (4), Topology (1084) and Homologous Superfamily (2091) collapsed.
  40. Potential Benefits <ul><li>Enhance serendipity in the user experience </li></ul><ul><ul><li>Discover new proteins sharing properties with a known one </li></ul></ul><ul><ul><li>Encourage further exploration </li></ul></ul>
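The "See also domains of the same" panel can be sketched as a lookup of the domain's value at every level plus the number of domains sharing it, assuming a dictionary of toy records keyed by domain ID; all names and data are hypothetical. Following one of those links is then a plain filter on that level.

```python
LEVELS = ("class", "architecture", "topology", "superfamily")

# Toy records keyed by domain id (hypothetical, not real CATH data).
DOMAINS = {
    "dom1": {"class": "Mainly Alpha", "architecture": "Alpha Horseshoe", "topology": "T1", "superfamily": "H1"},
    "dom2": {"class": "Mainly Alpha", "architecture": "Alpha Horseshoe", "topology": "T1", "superfamily": "H2"},
    "dom3": {"class": "Mainly Alpha", "architecture": "Alpha Horseshoe", "topology": "T2", "superfamily": "H3"},
    "dom4": {"class": "Mainly Beta", "architecture": "Beta Barrel", "topology": "T3", "superfamily": "H4"},
}


def see_also(domain_id, domains=DOMAINS):
    """For each level: the domain's value and how many domains share it (itself included)."""
    target = domains[domain_id]
    return {
        level: (target[level], sum(1 for r in domains.values() if r[level] == target[level]))
        for level in LEVELS
    }


def same_as(domain_id, level, domains=DOMAINS):
    """Follow one 'see also' link: the other domains sharing the value at this level."""
    value = domains[domain_id][level]
    return [d for d, r in domains.items() if r[level] == value and d != domain_id]


for level, (value, count) in see_also("dom1").items():
    print(f"{level}: {value} ({count} domains)")
print(same_as("dom1", "topology"))  # ['dom2']
```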
  41. Push communication (notifying local updates)
  42. Mockup (beta): the same protein domain detail page (1eyhA00) with an RSS feed button, annotated "XML populated as updates occur".
  43. Mockup (beta): the domain list for Architecture: Alpha Horseshoe (443 domains) with an RSS feed button, annotated "XML populated as updates occur".
  44. Potential Benefits <ul><li>For biologists </li></ul><ul><ul><li>Focussing long-term research on a limited number of specific protein domain instances </li></ul></ul><ul><ul><li>Following updates and research evolution </li></ul></ul><ul><ul><li>"Localized" proactive notification is important </li></ul></ul><ul><li>For bioinformaticians </li></ul><ul><ul><li>Working on large collections of data for computation purposes </li></ul></ul><ul><ul><li>To be combined with data download facilities </li></ul></ul><ul><li>Input and ideas for the redesign of CATH </li></ul>
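Localized push notification could work roughly as follows, assuming a hypothetical `Update` event per domain record and a per-user set of followed domain IDs: updates are filtered to the followed domains and serialized as a minimal RSS 2.0 feed with Python's standard `xml.etree.ElementTree`. The feed URL is a placeholder, and nothing here reflects how CATH actually publishes changes.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Update:
    """A change to one protein domain record (hypothetical event model)."""
    domain_id: str
    summary: str


def localized(updates, followed):
    """Keep only the updates about domains the user follows."""
    return [u for u in updates if u.domain_id in followed]


def rss_feed(title, link, updates):
    """Build a minimal RSS 2.0 document with one <item> per update."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Updates to followed protein domains"
    for u in updates:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = f"{u.domain_id}: {u.summary}"
    return ET.tostring(rss, encoding="unicode")


# Hypothetical updates; the user follows only dom1.
updates = [Update("dom1", "new structure data"), Update("dom2", "reclassified")]
feed = rss_feed("My followed domains", "https://example.org/feeds/mine", localized(updates, {"dom1"}))
print(feed)
```

For bioinformaticians working on large collections, the same filtered list could instead feed a bulk download, in line with the slide.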
  45. Summary <ul><li>Strategies to exploit the full potential of the information architecture based on protein classification </li></ul><ul><li>Enhance usability and flexibility in accessing the protein collection </li></ul><ul><ul><li>Faceted navigation </li></ul></ul><ul><ul><ul><li>besides purely hierarchical access </li></ul></ul></ul><ul><ul><li>Hypertextual paths </li></ul></ul><ul><ul><ul><li>Setting more favorable conditions for serendipity and insight discovery </li></ul></ul></ul>
  46. Eliciting domain knowledge <ul><li>Ongoing review and walkthrough of the design concept with bioinformaticians and the CATH team at UCL </li></ul><ul><li>Provoking reflection and reaction to design opportunities to gain insight into domain knowledge and requirements </li></ul><ul><ul><ul><li>("the importance of ignorance") </li></ul></ul></ul>
  47. Next steps <ul><li>Capture user requirements of: </li></ul><ul><ul><li>Biologists </li></ul></ul><ul><ul><li>Bioinformaticians </li></ul></ul><ul><li>In terms of: </li></ul><ul><ul><li>Goals in using the current classification </li></ul></ul><ul><ul><li>Access and data manipulation tasks </li></ul></ul><ul><ul><li>Start by recruiting current CATH users </li></ul></ul><ul><li>Refine and validate the design concept </li></ul><ul><li>Implement a generic system architecture and make it available </li></ul>
  48. Contacts <ul><li>Davide Bolchini </li></ul><ul><li>[email_address] </li></ul><ul><li>http://bolchini.blogspot.com </li></ul><ul><li>http://www.cs.ucl.ac.uk/staff/D.Bolchini/ </li></ul>
