Open Semantic Annotation: an experiment with BioMoby Web Services
Benjamin Good, Paul Lu, Edward Kawas, Mark Wilkinson
University of British Columbia, Heart + Lung Research Institute, St. Paul's Hospital
The Web contains lots of things
But the Web doesn’t know what they ARE: text/html, video/mpeg, image/jpg, audio/aiff
The Semantic Web It’s A Duck
Semantic Web Reasoning Logically… It’s A Duck Defining the world by its properties helps me find the KINDS of things I am looking for  Add properties to the things we are describing Walks Like a Duck Quacks Like a Duck Looks Like a Duck
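As a toy illustration (not from the talk; a minimal Python sketch with made-up property names), this is what classifying a thing by its asserted properties, rather than by a free-text label, looks like:

```python
# Toy sketch: infer the KIND of a thing from its asserted properties.
DUCK_PROPERTIES = {"walks like a duck", "quacks like a duck", "looks like a duck"}

def is_duck(properties):
    """A thing asserting all of the defining duck properties is classified as a duck."""
    return DUCK_PROPERTIES <= set(properties)

print(is_duck({"walks like a duck", "quacks like a duck", "looks like a duck"}))  # True
print(is_duck({"quacks like a duck"}))                                            # False
```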
Asserted vs. Reasoned Semantic Web
The ontology spectrum, from lightweight to expressive: Catalog/ID; Terms/glossary; Thesauri ("narrower term" relation); Informal is-a; Formal is-a; Formal instance; Frames (Properties); Value Restrictions; Selected Logical Constraints (disjointness, inverse, …); General Logical Constraints.
Originally from the AAAI 1999 Ontologies Panel by Gruninger, Lehmann, McGuinness, Uschold, Welty; updated by McGuinness. Description in: www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-abstract.html
Who assigns these properties? Works ~well … but doesn’t scale
When we say “Web”  we mean “Scale”
Natural Language Processing Scales Well… Works!!  … Sometimes… … Sort of….
Natural Language Processing Problem #1 Requires text to get the process started Problem #2 Low accuracy means it can only support, not replace, manual annotation
Web 2.0 Approach OPEN to all Web users (Scale!) Parallel, Distributed, “ Human Computation”
Human Computation
Getting lots of people to solve problems that are difficult for computers.
(Term introduced by Luis von Ahn, Carnegie Mellon University.)
Example: Image Annotation
ESP Game results
>4 million images labeled; >23,000 players.
Given 5,000 players online simultaneously, it could label all of the images accessible to Google in a month. See the "Google image labeling game"…
Luis von Ahn and Laura Dabbish (2004), "Labeling images with a computer game", ACM Conference on Human Factors in Computing Systems (CHI).
Social Tagging
Accepted, widely applied: passive volunteer annotation.
Del.icio.us surpassed 1 million users in 2006. Connotea, CiteULike, etc. See also our ED2Connotea extension.
This is a picture of Japanese traditional wagashi sweets called "seioubo", which is modeled after a peach.
BUSTED! I just pulled a bunch of Semantics out of my Seioubo!
BUSTED! This is a picture of Japanese traditional wagashi sweets called “seioubo” which is modeled after a peach This is a totally sweet picture of peaches grown in the city of Seioubo, in the Wagashi region of Japan
So tagging isn’t enough… We need properties, but the properties need to be semantically-grounded in order to enable reasoning (and this ain’t gonna happen through NLP because there is even  less  context in tags!)
Social Semantic Tagging Q1:   Can we design interfaces that assist “the masses” to derive their tags from controlled vocabularies (ontologies)? Q2:  How well do “the masses” do when faced with such an interface?  Can this data be used “rigorously” for e.g. logical reasoning? Q3:   “The masses” seem to be good at tagging things like pictures… no brainer!  How do they do at tagging more complex things like bioinformatics Web Services?
Context:  BioMoby Web Services BioMoby is a Semantic Web Services framework in which the data-objects consumed/produced by BioMoby service providers are explicitly grounded (semantically and syntactically) in an ontology A second ontology describes the analytical functions that a Web Service can perform
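A rough sketch of the idea (illustrative Python only, not real BioMoby registry records or API calls; the service and term names are made up): each service advertises its inputs, outputs and operation as terms from shared ontologies, so a client can match on meaning (e.g. is-a relationships) rather than on strings.

```python
# Illustrative only: a pretend registry entry whose types are ontology terms.
service = {
    "name": "getSequenceById",      # hypothetical service
    "input": "Identifier",          # term from a shared datatype ontology
    "output": "DNASequence",        # term from the same ontology
    "operation": "Retrieving",      # term from a service/operation ontology
}

# Hypothetical is-a hierarchy: datatype -> set of its ancestors.
ancestors = {"DNASequence": {"GenericSequence", "Object"}}

def produces(service, wanted, ancestors):
    """Semantic match: accept the service if its output IS-A the wanted datatype."""
    out = service["output"]
    return out == wanted or wanted in ancestors.get(out, set())

print(produces(service, "GenericSequence", ancestors))  # True: DNASequence is-a GenericSequence
```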
Context:  BioMoby Web Services BioMoby ontologies suffer from being  semantically VERY shallow…  thus it is VERY difficult to discover the Web Service that you REALLY want at any given moment… Can we improve discovery by improving the semantic annotation of the services?
Experiment
Implemented the BioMoby Annotator: a Web interface for annotation, with the myGrid ontology + Freebase as the grounding.
Recruited volunteers; volunteers annotated BioMoby Web Services.
Measured: inter-annotator agreement, and agreement with a manually constructed standard (for individuals and aggregates).
BioMoby Annotator Information extracted from  Moby Central Web Service Registry Tagging areas
Tagging Type-ahead tag suggestions drawn from myGrid Web Service Ontology & from Freebase
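A minimal sketch of the type-ahead behaviour (plain Python; the vocabulary entries below are made-up placeholders, not actual myGrid or Freebase terms):

```python
def suggest(prefix, vocab, limit=10):
    """Return controlled-vocabulary terms whose labels start with the typed prefix."""
    prefix = prefix.lower()
    hits = [(label, source) for label, source in vocab
            if label.lower().startswith(prefix)]
    return sorted(hits)[:limit]

# Placeholder vocabulary mixing ontology terms and Freebase topics.
vocab = [("Retrieving", "myGrid"), ("Aligning", "myGrid"), ("Rice", "Freebase")]
print(suggest("re", vocab))  # [('Retrieving', 'myGrid')]
```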
Tagging New simple tags can also be created, as per normal tagging
“Gold-Standard” Dataset
27 BioMoby services were hand-annotated by us.
Typical bioinformatics functions: retrieve database record, perform sequence alignment, identifier-to-identifier mapping.
Volunteers
Recruited friends and posted on mailing lists; offered a small reward ($20 Amazon) for completing the experiment.
19 participants: a mix of BioMoby developers, bioinformaticians, statisticians, and students; the majority had some experience with Web Services.
13 completed annotating all of the selected services.
Measurements
Inter-annotator agreement: a standard approach for estimating annotation quality, usually measured for small groups of professional annotators (typically 2-4).
Agreement with the “gold standard”: measured in the same way, but one “annotator” is considered the standard.
Inter-annotator Agreement Metric: Positive Specific Agreement (PSA)
PSA measures the overlap between all annotations elicited for a particular item, comparing annotators pairwise:
PSA(A, B) = 2|I| / (2|I| + a + b), where I = the intersection of tag sets A and B, a = |A without I|, b = |B without I|
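A minimal sketch of the PSA calculation in Python (the example tag sets are hypothetical):

```python
def psa(a, b):
    """Positive specific agreement between two annotators' tag sets for one item:
    2|I| / (2|I| + |A without I| + |B without I|), where I = A intersect B."""
    a, b = set(a), set(b)
    i = len(a & b)
    denom = 2 * i + len(a - b) + len(b - a)
    return 2 * i / denom if denom else 0.0

# Two annotators tagging the same service operation (made-up tags).
print(psa({"retrieval", "alignment"}, {"retrieval"}))  # 2 / (2 + 1 + 0) = 0.666...
```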
Gold-standard Agreement Metrics: Precision, Recall, F measure
Precision(T) = (true tags by T) / (all tags by T)
Recall(T) = (true tags by T) / (all true tags)
F = harmonic mean of P and R = 2PR / (P + R)
(F = PSA if one set is considered “true”)
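And the corresponding gold-standard comparison, again as a Python sketch with made-up tags:

```python
def precision_recall_f(tags, gold):
    """Precision, recall and F of an annotator's tag set against the gold-standard set."""
    tags, gold = set(tags), set(gold)
    true_tags = tags & gold
    p = len(true_tags) / len(tags) if tags else 0.0
    r = len(true_tags) / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f

# Hypothetical annotator tags vs. gold-standard tags for one service.
print(precision_recall_f({"retrieval", "DNA sequence", "BLAST"},
                         {"retrieval", "DNA sequence"}))  # (0.666..., 1.0, 0.8)
```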
Metrics
Average pairwise agreements reported: across all pairs of annotators; by Service Operation (e.g. retrieval) vs. Object (e.g. DNA sequence); and by semantically-grounded vs. free-text tags.
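How the averages might be computed, as a sketch (assumed data layout: one dict of annotator to tag set per item; applies the PSA formula defined above):

```python
from itertools import combinations
from statistics import mean, median, pstdev

def pairwise_psa(annotations):
    """All pairwise PSA scores for one item; annotations maps annotator -> tag set."""
    scores = []
    for a, b in combinations(annotations.values(), 2):
        i = len(a & b)
        d = 2 * i + len(a - b) + len(b - a)
        scores.append(2 * i / d if d else 0.0)
    return scores

# Made-up tags for one service operation from three annotators.
scores = pairwise_psa({
    "ann1": {"retrieval"},
    "ann2": {"retrieval", "parsing"},
    "ann3": {"alignment"},
})
print(mean(scores), median(scores))
print(pstdev(scores) / mean(scores))  # coefficient of variation
```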
Inter-Annotator Agreement

Type                  N pairs   mean   median   min    max    std. dev.   coeff. of variation
Free, Object          1658      0.09   0.00     0.00   1.00   0.25        2.79
Semantic, Object      3482      0.44   0.40     0.00   1.00   0.43        0.98
Free, Operation       210       0.13   0.00     0.00   1.00   0.33        2.49
Semantic, Operation   2599      0.54   0.67     0.00   1.00   0.32        0.58
Agreement to “Gold” Standard

Subject                        Measure     mean   median   min    max    std. dev.   coeff. of variation
Data-types (input & output)    PSA         0.52   0.51     0.32   0.71   0.11        0.22
                               Precision   0.54   0.53     0.33   0.74   0.13        0.24
                               Recall      0.54   0.54     0.30   0.71   0.12        0.21
Web Service Operations         PSA         0.59   0.60     0.36   0.75   0.10        0.18
                               Precision   0.81   0.79     0.52   1.00   0.13        0.16
                               Recall      0.53   0.50     0.26   0.77   0.15        0.28
Consensus & Correctness:  Datatypes
Consensus and Correctness:  Operations
Open Annotations are  Different
Trust must be earned
It can be decided at runtime: by consensus agreement (as described here), by annotator reputation, by recency, by your favorite algorithm, or by you!
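One way the consensus option could be applied at runtime, sketched in Python (assumed data layout; keep only tags that at least min_votes annotators applied):

```python
from collections import Counter

def consensus_tags(annotations, min_votes=2):
    """Keep the tags that at least `min_votes` annotators applied to an item."""
    votes = Counter(tag for tags in annotations.values() for tag in set(tags))
    return {tag for tag, n in votes.items() if n >= min_votes}

# Made-up annotations for one service.
print(consensus_tags({
    "ann1": {"retrieval", "BLAST"},
    "ann2": {"retrieval"},
    "ann3": {"retrieval", "parsing"},
}))  # {'retrieval'}
```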
IT’S ALL ABOUT CONTEXT!! We can get REALLY good semantic annotations IF we provide context!!
Open Semantic Annotation Works IF we provide CONTEXT IF enough volunteers contribute BUT we do not understand why people do or do not contribute without $$$ incentive SO further research is needed to understand Social Psychology on the Web
Watch for the forthcoming issue of the International Journal of Knowledge Engineering and Data Mining on “Incentives for Semantic Content Creation”.
Acknowledgements
Benjamin Good, Edward Kawas, Paul Lu
MSFHR/CIHR Bioinformatics Training Programme @ UBC
iCAPTURE Centre @ St. Paul’s Hospital
NSERC
Genome Canada / Genome Alberta
