Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Analyzing User Interactions with Biomedical Ontologies: A Visual Perspective

101 views

Published on

Biomedical ontologies are large: Several ontologies in the BioPortal repository contain thousands or even hundreds of thousands of entities. The development and maintenance of such large ontologies is difficult. To support ontology authors and repository developers in their work, it is crucial to improve our understanding of how these ontologies are explored, queried, reused, and used in downstream applications by biomedical researchers. We present an exploratory empirical analysis of user activities in the BioPortal ontology repository by analyzing BioPortal interaction logs across different access modes over several years. We investigate how users of BioPortal query and search for ontologies and their classes, how they explore the ontologies, and how they reuse classes from different ontologies. Additionally, through three real-world scenarios, we not only analyze the usage of ontologies for annotation tasks but also compare it to the browsing and querying behaviors of BioPortal users. For our investigation, we use several different visualization techniques. To inspect large amounts of interaction, reuse, and real-world usage data at a glance, we make use of and extend PolygOnto, a visualization method that has been successfully used to analyze reuse of ontologies in previous work. Our results show that exploration, query, reuse, and actual usage behaviors rarely align, suggesting that different users tend to explore, query and use different parts of an ontology. Finally, we highlight and discuss differences and commonalities among users of BioPortal.

Published in: Science
  • Be the first to comment

  • Be the first to like this

Analyzing User Interactions with Biomedical Ontologies: A Visual Perspective

  1. 1. Analyzing User Interactions with Biomedical Ontologies: A Visual Perspective Tuesday Talk, 14th November 2017 M A U L I K KA M D A R , S I M O N WA L K , TA N I A T U D O R A C H E , MA R K MU S E N Stanford Center for Biomedical Informa:cs Research maulikrk@stanford.edu
  2. 2. Uses: Knowledge management, decision support, semantic search, data annotation, data integration, reasoning … …
  3. 3. Uses: Knowledge management, decision support, semantic search, data annotation, data integration, reasoning … …
  4. 4. hGp://bioportal.bioontology.org/
  5. 5. hGp://bioportal.bioontology.org/
  6. 6. hGp://bioportal.bioontology.org/
  7. 7. hGp://bioportal.bioontology.org/
  8. 8. hGp://bioportal.bioontology.org/
  9. 9. What this talk is about … •  BiOnIC - Catalog of User InteracRons with Biomedical Ontologies •  VisIOn applicaRon with embedded visualizaRons. Analysis of BioPortal WebUI exploraRon and API querying strategies, and correlaRon with usage.
  10. 10. What this talk is about … •  BiOnIC - Catalog of User InteracRons with Biomedical Ontologies •  VisIOn applicaRon with embedded visualizaRons. Analysis of BioPortal WebUI exploraRon and API querying strategies, and correlaRon with usage.
  11. 11. Benefits of analyzing user interacRons Ø  Ontology Engineers: v  IdenRfy exploraRon and querying paGerns v  Understand ontology usage and reuse v  Prune unwanted classes and relaRons Ø  Ontology Repository Maintainers: v  Categorize user behaviors v  Develop intelligent interfaces v  Provide targeted recommendaRons Ø  Biomedical Researchers: v  IdenRfy temporal research trends v  IdenRfy frequently accessed classes
  12. 12. BiOnIC: A Catalog of User InteracRons with Biomedical Ontologies hGp://onto-apps.stanford.edu/bionic/datasets
  13. 13. BiOnIC datasets creaRon
  14. 14. NCBO API Usage APIRequestsperMonth 2013−O ct 2014−Jan 2014−Apr 2014−Jul 2014−O ct 2015−Jan 2015−Apr 2015−Jul 2015−O ct 2016−Jan 2016−Apr 2016−Jul 2016−O ct 2M8M32M Filtering Access Logs Filtering Ontologies CompuRng Class Counts CompuRng Sequences Anonymizing Data NCBO Website Traffic OccurrencesperMonth 2009−Jan 2010−Jan 2011−Jan 2012−Jan 2013−Jan 2014−Jan 2015−Jan 2016−Jan 0100K200K Page Requests Unique IP Addresses BiOnIC datasets creaRon •  Removing robot/invalid requests •  Normalizing ontology idenRfiers and class IRIs
  15. 15. Filtering Access Logs Filtering Ontologies CompuRng Class Counts CompuRng Sequences Anonymizing Data BiOnIC datasets creaRon •  January 2015 version. •  Ontologies should have classes that are reused by others OR reuse classes from other ontologies. •  Ontologies should have minimum of 10 unique users via WebUI and API
  16. 16. Filtering Access Logs Filtering Ontologies CompuRng Class Counts CompuRng Sequences Anonymizing Data BiOnIC datasets creaRon Class Sta:s:cs Datasets For each class in each ontology: •  Access ANributes: o  Total IP Requests (WebUI/API) o  Unique IP Requests (WebUI/API) •  Reuse ANributes: o  Number of ontologies reusing a class •  Structural ANributes: o  Number of parent/child/sibling classes o  Depth from ontology root
  17. 17. Filtering Access Logs Filtering Ontologies CompuRng Class Counts CompuRng Sequences Anonymizing Data BiOnIC datasets creaRon User Interac:on Sequences Datasets Ontology 1 Ontology 2 Class Depth -> 2a 1 3a 4a 4b 4c 2b 3b 3c 1’ 2a’ 2b’ 2c’ 3a’ 3b’ 3c’
  18. 18. Filtering Access Logs Filtering Ontologies CompuRng Class Counts CompuRng Sequences Anonymizing Data BiOnIC datasets creaRon User Interac:on Sequences Datasets Ontology 1 Ontology 2 Class Depth -> 2a 1 3a 4a 4b 4c 2b 3b 3c 1’ 2a’ 2b’ 2c’ 3a’ 3b’ 3c’
  19. 19. Filtering Access Logs Filtering Ontologies CompuRng Class Counts CompuRng Sequences Anonymizing Data BiOnIC datasets creaRon User Interac:on Sequences Datasets Ontology 1 Ontology 2 Class Depth -> 2a 1 3a 4a 4b 4c 2b 3b 3c 1’ 2a’ 2b’ 2c’ 3a’ 3b’ 3c’
  20. 20. Filtering Access Logs Filtering Ontologies CompuRng Class Counts CompuRng Sequences Anonymizing Data BiOnIC datasets creaRon User Interac:on Sequences Datasets Ontology 1 Ontology 2 Class Depth -> 2a 1 3a 4a 4b 4c 2b 3b 3c 1’ 2a’ 2b’ 2c’ 3a’ 3b’ 3c’
  21. 21. Filtering Access Logs Filtering Ontologies CompuRng Class Counts CompuRng Sequences Anonymizing Data BiOnIC datasets creaRon User Interac:on Sequences Datasets Ontology 1 Ontology 2 Class Depth -> 2a 1 3a 4a 4b 4c 2b 3b 3c 1’ 2a’ 2b’ 2c’ 3a’ 3b’ 3c’
  22. 22. Filtering Access Logs Filtering Ontologies CompuRng Class Counts CompuRng Sequences Anonymizing Data BiOnIC datasets creaRon User Interac:on Sequences Datasets Ontology 1 Ontology 2 Class Depth -> 2a 1 3a 4a 4b 4c 2b 3b 3c 1’ 2a’ 2b’ 2c’ 3a’ 3b’ 3c’
  23. 23. Filtering Access Logs Filtering Ontologies CompuRng Class Counts CompuRng Sequences Anonymizing Data BiOnIC datasets creaRon User Interac:on Sequences Datasets Ontology 1 Ontology 2 Class Depth -> 2a 1 3a 4a 4b 4c 2b 3b 3c 1’ 2a’ 2b’ 2c’ 3a’ 3b’ 3c’
  24. 24. Filtering Access Logs Filtering Ontologies CompuRng Class Counts CompuRng Sequences Anonymizing Data BiOnIC datasets creaRon User Interac:on Sequences Datasets Ontology 1 Ontology 2 Class Depth -> 2a 1 3a 4a 4b 4c 2b 3b 3c 1’ 2a’ 2b’ 2c’ 3a’ 3b’ 3c’
  25. 25. Filtering Access Logs Filtering Ontologies CompuRng Class Counts CompuRng Sequences Anonymizing Data BiOnIC datasets creaRon User Interac:on Sequences Datasets Ontology 1 Ontology 2 Class Depth -> 2a 1 3a 4a 4b 4c 2b 3b 3c 1’ 2a’ 2b’ 2c’ 3a’ 3b’ 3c’
  26. 26. Filtering Access Logs Filtering Ontologies CompuRng Class Counts CompuRng Sequences Anonymizing Data BiOnIC datasets creaRon User Interac:on Sequences Datasets Ontology 1 Ontology 2 Class Depth -> 2a 1 3a 4a 4b 4c 2b 3b 3c 1’ 2a’ 2b’ 2c’ 3a’ 3b’ 3c’
  27. 27. 2a’ 1’ 2b’ 3b’ 2a 3a 4a 3a 1 Filtering Access Logs Filtering Ontologies CompuRng Class Counts CompuRng Sequences Anonymizing Data BiOnIC datasets creaRon User Interac:on Sequences Datasets Ontology 1 Ontology 2
  28. 28. Filtering Access Logs Filtering Ontologies CompuRng Class Counts CompuRng Sequences Anonymizing Data BiOnIC datasets creaRon Anonymiza:on Steps •  IP addresses anonymized using unique SHA-224 hash-encoded user idenRfiers generated from “user_<Random String>_<Random_Integer>”. •  e.g. 39fd4e6d569a034973g61bb392a694d4eabe1ef98c43ee68ca2fc86 •  Absolute Time-stamps converted to relaRve Rme-stamps, with respect to first interacRon with BioPortal repository. •  e.g. 0, 2757, 2786, 3586, 3618, 3803, 3959, 4047, 5111 (s), …
  29. 29. BiOnIC schema to model staRsRcs and sequences data countStat bionic:CountStat bionic:ReuseCount -  reuseType -  reusingOntologies bionic:RequestCount -  accessType -  year -  totalUsers -  uniqueUsers prov:Agent bionic:Sequence -  accessType -  totalTime -  uniqueClasses bionic:SeqEn:ty -  rela6veTimestamp bionic:Ontology -  skos:prefLabel -  totalClasses -  maxDepth owl:Class - skos:prefLabel skos:Collec:on skos:Concept begin end nextEn6ty class class requests skos:member bionic:SeqDataset -  accessType bionic:StatDataset dcat:Dataset sequence classInfo ontology ontology bionic:ClassInfo -  siblings -  directParents -  directChildren -  classDepth class subClassOf ontology SKOS, PROV and DCAT standards are reused in the BiOnIC schema.
  30. 30. hGp://onto-apps.stanford.edu/bionic/datasets BiOnIC datasets
  31. 31. hGp://onto-apps.stanford.edu/bionic/datasets BiOnIC datasets hGp://www.rdjdt.org/
  32. 32. hGp://onto-apps.stanford.edu/bionic/datasets BiOnIC datasets hGp://www.rdjdt.org/ SPARQL Server
  33. 33. hGp://onto-apps.stanford.edu/bionic/datasets BiOnIC datasets hGp://www.rdjdt.org/ SPARQL Server BioPortal SPARQL Endpoint
  34. 34. hGp://onto-apps.stanford.edu/bionic/datasets BiOnIC datasets hGp://www.rdjdt.org/ SPARQL Server BioPortal SPARQL Endpoint 1.  How many agents click on the subclasses aIer exploring the “Protein Transmembrane Ac6vity” class in Gene Ontology? 2.  What is the average 6me spent by agents on classes similar to “Cell Growth” class in Gene Ontology?
  35. 35. VisIOn (Visualizing Ontology InteracRons) Web ApplicaRon hGp://onto-apps.stanford.edu/vision
  36. 36. Visualizing Ontology Structure with Count Data Force-directed Network Indented Tree
  37. 37. Visualizing Ontology Structure with Count Data Force-directed Network Indented Tree
  38. 38. PolygOnto visualizaRons of interacRon sequences Kamdar, et al. Visualizing Request and Reuse Data across Biomedical Ontologies. Journal of Web Seman:cs, (under review)
  39. 39. PolygOnto visualizaRons of interacRon sequences Kamdar, et al. Visualizing Request and Reuse Data across Biomedical Ontologies. Journal of Web Seman:cs, (under review)
  40. 40. PolygOnto visualizaRons of interacRon sequences Kamdar, et al. Visualizing Request and Reuse Data across Biomedical Ontologies. Journal of Web Seman:cs, (under review)
  41. 41. PolygOnto visualizaRons of interacRon sequences Kamdar, et al. Visualizing Request and Reuse Data across Biomedical Ontologies. Journal of Web Seman:cs, (under review) Ontology Structure
  42. 42. PolygOnto visualizaRons of interacRon sequences Kamdar, et al. Visualizing Request and Reuse Data across Biomedical Ontologies. Journal of Web Seman:cs, (under review) User Sequence Ontology Structure
  43. 43. PolygOnto visualizaRons of interacRon sequences Kamdar, et al. Visualizing Request and Reuse Data across Biomedical Ontologies. Journal of Web Seman:cs, (under review) User Sequence Single Click Ontology Structure
  44. 44. What this talk is about … •  BiOnIC - Catalog of User InteracRons with Biomedical Ontologies •  VisIOn applicaRon with embedded visualizaRons. Analysis of BioPortal WebUI exploraRon and API querying strategies, and correlaRon with usage.
  45. 45. CharacterisRcs of the BiOnIC Catalog •  WebUI Access: 5.4M class requests, 1M unique agents •  API Access: 67.2M class requests, 205K unique agents •  255 biomedical ontologies
  46. 46. •  Do BioPortal WebUI exploraRon and API querying strategies correlate with each other? •  Do BioPortal WebUI exploraRon and API querying strategies inform ontology usage? – Ontology usage, in this context, means reuse in other ontologies, and usage in data annotaRon.
  47. 47. Interface influences in browsing and querying Number of Unique API Users (Log Scale) Number of Unique WebUI Users (Log Scale) 1000 10 100 10 100 1000 1 1 Certain classes browsed or queried significantly more.
  48. 48. Interface influences in browsing and querying Number of Unique API Users (Log Scale) Number of Unique WebUI Users (Log Scale) 1000 10 100 10 100 1000 Female Reproduc:ve System 1 1 Certain classes browsed or queried significantly more. Dermis
  49. 49. Interface influences in browsing and querying Dysmorphic Syndrome Night blindness Number of Unique API Users (Log Scale) Number of Unique WebUI Users (Log Scale) 1000 10 100 10 100 1000 Female Reproduc:ve System 1 1 Certain classes browsed or queried significantly more. Dermis
  50. 50. ExploraRon and Querying behavioral paGerns •  Certain classes in the lower levels of the ontological hierarchy are rarely browsed and queried – this may be an arRfact of the indented tree visualizaRon. •  More triangular polygons (1 parent -> 2 children classes, or 2 parents -> 1 child class) observed in WebUI Access polygon due to indented tree visualizaRon.
  51. 51. ExploraRon and Querying behavioral paGerns 300,000+ classes 200,000+ users
  52. 52. Do BioPortal WebUI exploraRon and API querying strategies correlate with each other? •  Small proporRons of ontological content explored or queried for several ontologies, and minimal Spearman CorrelaRon and Jaccard Similarity observed between consumpRon strategies. •  Varying consumpRon strategies (triangular polygons, blind querying) across the interfaces – these strategies are generally uniform across ontologies, with a few excepRons (ChEBI). •  Classes in the lower layers of the ontological hierarchy are rarely browsed or queried using the WebUI or API.
  53. 53. Ontology usage: Reuse in other ontologies Dis:nct class sets reused in other ontologies
  54. 54. Ontology usage: GWAS Catalog and PubChem annotaRons Classes in the higher layers of the ontological hierarchy are very abstract and are never used for data annotaRons. Common consensus across 12 different data sources in Linked Open Data cloud, where ChEBI ontology is used for data integraRon.
  55. 55. Ontology usage: GWAS Catalog and PubChem annotaRons Classes in the higher layers of the ontological hierarchy are very abstract and are never used for data annotaRons. Common consensus across 12 different data sources in Linked Open Data cloud, where ChEBI ontology is used for data integraRon.
  56. 56. Do BioPortal WebUI exploraRon and API querying strategies inform ontology usage? •  Minimal correlaRon and similariRes between exploraRon and querying strategies, and usage (reuse and data annotaRons) •  Ontology reuse occurs from classes located in the higher layers of the ontological hierarchy for semanRc interoperability between ontologies . •  However, these classes are very abstract and not very relevant for data annotaRon or data integraRon, where classes in the lower layers of hierarchy are highly used. –  These classes are rarely explored or queried, we need beGer interfaces!
  57. 57. Novel research direcRons may be enabled through the BiOnIC and VisIOn resources •  Categorize user browsing behaviors by incorporaRng the structural features of the ontology classes. •  Develop personalized user interfaces for ontology navigaRon, with predicRons of the next class that a user is likely to access, or targeted recommendaRons based on user type (recurrent neural networks, collaboraRve filtering, …). •  Develop advanced methods for ontology summarizaRon and modularizaRon, using BiOnIC datasets as features. …
  58. 58. Acknowledgments Musen Lab, Stanford BMI PhD Program, Stanford US NIH Grants U54-HG004028 GM086587 maulikrk@stanford.edu hGp://onto-apps.stanford.edu/bionic hGp://onto-apps.stanford.edu/vision

×