Extracting semantics from crowds

Talk at the IEEE International Summer School on Semantic Computing at UC Berkeley 2011

Published in: Education, Technology

Transcript

  • 1. Extracting Semantics from Crowds. International Summer School on Semantic Computing (SSSC 2011), August 8-12, 2011, Berkeley, California. Markus Strohmaier, Assistant Professor, Graz University of Technology, Austria; Visiting Scientist, PARC (Xerox), USA. With collaborators D. Helic, C. Körner, J. Pöschko, R. Kern, C. Trattner, C. Wagner (Graz University of Technology, Austria), D. Benz, G. Stumme (U. of Kassel, Germany), A. Hotho (U. of Würzburg, Germany), S. Hellmann, J. Lehmann, C. Stadler, J. Unbehauen (U. of Leipzig, Germany).
  • 2. Egypt 2011 Twitter Trends: place, event, person.
  • 3. Semantic Structures: The DMOZ Project.
  • 4. Classification Systems in Information and Library Sciences. Usually produced and maintained by a few (e.g., dozens of) domain experts, but used by many (potentially millions). Can a very large group (a crowd) of users contribute to ontology engineering efforts?
  • 5. Objectives. Provide (some) answers to the following questions: What is the difference between extracting semantics from text vs. extracting semantics from crowds? Why should we study crowds and crowd behavior from a semantic computing perspective? How can semantics be extracted from online crowd behavior, such as social labeling, social tagging, and social navigation? What are the implications for semantic computing research?
  • 6. Extracting Semantics from … (overview): Motivation; Social Labeling: Hashtag Semantics; Social Tagging: Tag Relatedness, Tag Generality, Tag Hierarchies; Social Navigation: Navigational Knowledge Engineering.
  • 7. Ontology engineering vs. learning. Ontology engineering: expert-driven, small-scale; knowledge and concept identification; preliminary informal representation; formalization (RDF, OWL, etc.); evaluation and maintenance. Ontology learning: data-driven, large-scale; source selection and data sampling; data exploration and probing; concept and link learning; evaluation and updating.
  • 8. Distributional Semantics [Hovy 2011]. In recent years, people in NLP and IR have started using a different representation for all kinds of problems: word distributions. [Harris 1954]: "words that occur in the same contexts tend to have similar meanings". Statistical NLP operates at the word level; frequency distributions are used as the (de facto) semantics of a word. Not actual semantics, but captures something of the contents. Not compositional: how to 'add' two distributions? No explicit theory of semantics. [Hovy 2011] Based on slides by Ed Hovy, USC, Reasoning with Text Workshop 2011.
  • 9. Why apple is similar to pear [based on slides by Hovy 2011 / Pantel 02]. Patrick Pantel and Dekang Lin. 2002. Discovering word senses from text. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '02). ACM, New York, NY, USA, 613-619.
  • 10. Why apple is not similar to toothbrush [based on slides by Hovy 2011 / Pantel 02]. Pantel and Lin, KDD '02 (full reference on slide 9).
  • 11. Distributional Semantics [based on slides by Hovy 2011 / Pantel 02]. Pantel and Lin, KDD '02 (full reference on slide 9).
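To make the distributional idea concrete, here is a minimal sketch (not from the slides; the context features and counts are invented) that represents each word by a frequency distribution over its contexts and compares words with cosine similarity, one common way to operationalize Harris's hypothesis:

```python
from collections import Counter
from math import sqrt

# Toy (word, context-feature) observations, e.g. from co-occurrence windows
# or dependency parses. The features and counts are invented.
observations = [
    ("apple", "eat_obj"), ("apple", "peel_obj"), ("apple", "ripe_mod"),
    ("pear", "eat_obj"), ("pear", "ripe_mod"), ("pear", "juicy_mod"),
    ("toothbrush", "electric_mod"), ("toothbrush", "buy_obj"),
]

def context_vector(word):
    """Frequency distribution over the contexts in which a word occurs."""
    return Counter(ctx for w, ctx in observations if w == word)

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u)
    norm = lambda x: sqrt(sum(c * c for c in x.values()))
    return dot / (norm(u) * norm(v)) if u and v else 0.0

print(cosine(context_vector("apple"), context_vector("pear")))        # high: shared contexts
print(cosine(context_vector("apple"), context_vector("toothbrush")))  # low: no shared contexts
```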
  • 12. Hearst Patterns. M. Hearst, Automatic Acquisition of Hyponyms from Large Text Corpora, 1992. (S1) "The bow lute, such as the Bambara ndang, is plucked and has an individual curved neck for each string."
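As an illustration of how such patterns are applied, the following sketch implements only the single "Y such as X" pattern over the example sentence S1, with noun phrases crudely approximated by word spans; Hearst (1992) defines a broader pattern set and uses proper NP chunking:

```python
import re

# Only the single "NP such as NP" pattern; noun phrases are crudely
# approximated by spans of word characters and spaces.
SUCH_AS = re.compile(r"([A-Za-z][\w ]+?),? such as ([A-Za-z][\w ]+?)[,.]")

def hyponym_pairs(sentence):
    """Yield (hyponym, hypernym) candidates matched by the 'such as' pattern."""
    for hypernym, hyponym in SUCH_AS.findall(sentence):
        yield hyponym.strip(), hypernym.strip()

s1 = ("The bow lute, such as the Bambara ndang, is plucked and has an "
      "individual curved neck for each string.")
print(list(hyponym_pairs(s1)))  # [('the Bambara ndang', 'The bow lute')]
```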
  • 13. Ontology Learning from Text.
  • 14. Evaluation. A Survey of Ontology Evaluation Techniques, Janez Brank, Marko Grobelnik, Dunja Mladenić, Proceedings of the Conference on Data Mining and Data Warehouses (SiKDD 2005), 2005.
  • 15. Limitations of Knowledge Extraction from Text (or why we want to study semantics in crowds). Delay: the time it takes to write (high-quality) text on a topic. Population bias and dependence: authors' demographics; study the language of target groups (e.g., software developers). Style: internal monologue vs. conversation with others. Topical bias: general fiction, mystery, science fiction, romance, humor, ...; books, newspapers, magazines [Brown Corpus]. Brown Corpus: http://icame.uib.no/brown/bcm.html
  • 16. Extracting Semantics from Crowds (i.e., Social Media). How can we tap into users' interaction with data and with each other for the extraction of semantic structures?
  • 17. Crowdsourcing. "Crowdsourcing represents the act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people in the form of an open call." [J. Howe 2006]
  • 18. Crowdsourcing Semantics: An Experiment (2008). The Library of Congress (LoC) posted 3,000 photos to Flickr. 24 hours after launch: over 4,000 unique tags; about 19,000 tags added. Output: noisy, unstructured, unqualified, weak semantics. How can this data contribute to the induction of semantic structures?
  • 19. Extracting Semantics from Crowds. Vision: utilizing online behavior of crowds for the construction, maintenance and enrichment of large-scale semantic structures. Mission: to model the behavior of large numbers (millions) of users online; to develop techniques and algorithms that acquire semantic structures from users' interaction with data and with each other; to influence user behavior and emerging semantics; to evaluate results.
  • 20. Activities of Crowds Online. Users engage in labeling, tagging, and navigation. Can we tap into the outcome of these activities to extract and evaluate semantic structures?
  • 21. Extracting Semantics from Crowds (overview): Motivation; Social Labeling: Hashtag Semantics; Social Tagging: Tag Relatedness, Tag Generality, Tag Hierarchies; Social Navigation: Navigational Knowledge Engineering.
  • 22. Social Labeling. Example: Twitter users label short messages with concepts (hashtags). Do hashtags behave as strong identifiers, and could they be mapped to concept identifiers in the Semantic Web (URIs)? [Laniado and Mika 2010] Craig's talk this morning: background knowledge for URLs to explore other URLs.
  • 23. (Figure: latent conceptual structures extracted from Twitter streams; with J. Pöschko and C. Wagner, 2011; ConceptNet developed by C. Wagner.) C. Wagner, M. Strohmaier, The Wisdom in Tweetonomies: Acquiring Latent Conceptual Structures from Social Awareness Streams, Semantic Search 2010 Workshop (SemSearch2010), in conjunction with the 19th International World Wide Web Conference (WWW2010), Raleigh, NC, USA, April 26-30, ACM, 2010.
  • 24. Can we qualify semantic associations in such a network? E.g., X is a subconcept of Y, X is more general than Y. (With J. Pöschko and C. Wagner, 2011; ConceptNet developed by C. Wagner.) Reference as on slide 23.
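A hedged sketch of a first step in this direction (not the Tweetonomies method itself): build a weighted hashtag co-occurrence network from a toy tweet stream, which can then be mined for associations between hashtags. The tweets are invented for illustration:

```python
import re
from collections import defaultdict
from itertools import combinations

# Toy tweets; the texts are invented for illustration.
tweets = [
    "Protests continue in Cairo #egypt #jan25 #tahrir",
    "Live pictures from Tahrir Square #egypt #tahrir",
    "Breaking: new statement expected tonight #egypt #jan25",
]

hashtag = re.compile(r"#(\w+)")
cooccurrence = defaultdict(int)

for text in tweets:
    tags = sorted(set(hashtag.findall(text.lower())))
    for a, b in combinations(tags, 2):       # every pair of tags in one tweet
        cooccurrence[(a, b)] += 1

# Edges weighted by how often two hashtags were used together; such a graph
# is one possible starting point for mining associations between hashtags.
for (a, b), weight in sorted(cooccurrence.items(), key=lambda kv: -kv[1]):
    print(f"{a} -- {b}: {weight}")
```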
  • 25. Extracting Semantics from Crowds (overview): Motivation; Social Labeling: Hashtag Semantics; Social Tagging: Tag Relatedness, Tag Generality, Tag Hierarchies; Social Navigation: Navigational Knowledge Engineering.
  • 26. Social Tagging. Example: Delicious. Users label and categorize resources with concepts (tags). A folksonomy is a tuple F := (U, T, R, Y) where the three disjoint, finite sets U, T, R correspond to a set of persons or users u ∈ U, a set of tags t ∈ T, and a set of resources or objects r ∈ R, and Y ⊆ U × T × R is called the set of tag assignments.
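A minimal sketch of this formalization in code, with invented toy assignments: the folksonomy is a set of (user, tag, resource) triples Y, from which U, T, R and simple per-tag views can be derived:

```python
# Folksonomy F = (U, T, R, Y): Y is a set of (user, tag, resource) triples.
# The assignments below are invented toy data.
Y = {
    ("alice", "python", "http://python.org"),
    ("alice", "programming", "http://python.org"),
    ("bob",   "python", "http://docs.python.org"),
    ("bob",   "reference", "http://docs.python.org"),
}

U = {u for u, _, _ in Y}   # users
T = {t for _, t, _ in Y}   # tags
R = {r for _, _, r in Y}   # resources

def resources_tagged_with(tag):
    """All resources that at least one user annotated with the given tag."""
    return {r for _, t, r in Y if t == tag}

print(sorted(T))
print(resources_tagged_with("python"))
```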
  • 27. Tag Relatedness. (Figure: resource context of tags t1, t2, t3 over resources r1, r2.) Different tag similarity measures applied to del.icio.us. C. Cattuto, D. Benz, A. Hotho, G. Stumme, Semantic Grounding of Tag Relatedness in Social Bookmarking Systems, 7th International Semantic Web Conference (ISWC 2008), LNCS 5318, 615-631 (2008).
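One family of measures studied in this context is distributional: a tag is represented by the resources it annotates (its resource context) and two tags are compared by cosine similarity. A minimal sketch over invented toy assignments, not the exact setup of Cattuto et al.:

```python
from collections import Counter
from math import sqrt

# Toy tag assignments (user, tag, resource); invented for illustration.
Y = [
    ("u1", "python", "r1"), ("u1", "programming", "r1"),
    ("u2", "python", "r2"), ("u2", "programming", "r2"),
    ("u3", "python", "r3"), ("u3", "cooking", "r4"),
]

def resource_context(tag):
    """Vector of resources annotated with the tag (its resource context)."""
    return Counter(r for _, t, r in Y if t == tag)

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u)
    norm = lambda x: sqrt(sum(c * c for c in x.values()))
    return dot / (norm(u) * norm(v)) if u and v else 0.0

print(cosine(resource_context("python"), resource_context("programming")))  # related
print(cosine(resource_context("python"), resource_context("cooking")))      # unrelated
```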
  • 28. Intuition: Latent Hierarchical Structures. High centrality: more abstract; low centrality: more specific [Strohmaier et al. 2011] [Heymann and Garcia-Molina 2006]. Note: betweenness centrality is usually difficult to calculate; computing all shortest paths is usually O(n²) per node and O(n³) for all nodes, so we use an approximation. M. Strohmaier, D. Helic, D. Benz, Evaluation of Folksonomy Induction Algorithms, submitted to ACM Transactions on Intelligent Systems and Technology (2011).
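A sketch of the centrality intuition using networkx, with an invented tag co-occurrence graph; the k parameter samples source nodes to approximate betweenness centrality instead of computing it exactly, which is one common approximation (not necessarily the one used in the cited work):

```python
import networkx as nx

# Toy tag co-occurrence graph; edges are invented for illustration.
G = nx.Graph()
G.add_edges_from([
    ("programming", "python"), ("programming", "java"),
    ("programming", "web"), ("web", "css"), ("web", "html"),
    ("python", "django"), ("java", "spring"),
])

# Exact betweenness centrality is expensive on large tag networks, so a
# sampled approximation (k source nodes instead of all n) is often used.
k = min(5, len(G))
centrality = nx.betweenness_centrality(G, k=k, seed=42)

# Heymann/Garcia-Molina-style intuition: high-centrality tags are treated as
# more abstract, low-centrality tags as more specific.
for tag, c in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{tag:12s} {c:.3f}")
```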
  • 29. Tag Abstractness. (Figure: tag abstractness over tag frequency, del.icio.us data; with D. Benz et al.) D. Benz, C. Körner, A. Hotho, G. Stumme, M. Strohmaier, One Tag to Bind Them All: Measuring Term Abstractness in Social Metadata, submitted to ESWC 2011.
  • 30. Emergent semantics through hierarchical clustering. Approaches: k-means, affinity propagation, tag generality. Applications: user navigation, ontology learning, disambiguation (on a Delicious dataset). Evaluation: semantic grounding against gold standards (e.g., WordNet). Study of different tagging systems: BibSonomy, CiteULike, Delicious, Flickr, LastFM; semantic and pragmatic quality vary greatly. Anon Plangprasopchok, Kristina Lerman, and Lise Getoor (2010), Integrating Structured Metadata via Relational Affinity Propagation, in Proceedings of the AAAI Workshop on Statistical Relational AI.
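As a hedged illustration of clustering tags as a step towards emergent structure (not the specific method of the cited work), tags can be represented as rows of a tag-resource count matrix and clustered, here agglomeratively with scikit-learn on invented data:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Rows: tags, columns: resources; entries count how often a tag was applied
# to a resource. The matrix is invented toy data.
tags = ["python", "programming", "java", "cooking", "recipes"]
X = np.array([
    [3, 2, 0, 0],   # python
    [2, 3, 1, 0],   # programming
    [0, 2, 2, 0],   # java
    [0, 0, 0, 4],   # cooking
    [0, 0, 1, 3],   # recipes
])

# Agglomerative (hierarchical) clustering of the tag vectors into two groups.
labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)
for tag, label in zip(tags, labels):
    print(tag, "-> cluster", label)
```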
  • 31. Semantic Validation of Folksonomies. Emergent semantic networks (obtained, e.g., via hierarchical clustering) are grounded against a gold standard such as WordNet, a lexical database for English: terms (e.g., computers, programming, programming languages, design patterns, java, python) are mapped to the WordNet synset hierarchy, and the distance between two terms in the emergent network (e.g., d1 = 1) is compared with their distance in the synset hierarchy (e.g., d2 = 2). The absolute difference |d1 - d2| is a simple proxy for the quality of the emergent semantics (semantic grounding). Based on slides by A. Hotho.
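A minimal sketch of such grounding with NLTK's WordNet interface (not the authors' evaluation code; the choice of the first noun synset is a crude simplification, and the emergent distance is a toy value):

```python
# Requires: pip install nltk, then nltk.download('wordnet').
from nltk.corpus import wordnet as wn

def taxonomic_distance(word_a, word_b):
    """Shortest path length in WordNet between the first noun synsets of two words.

    Picking the first synset is a crude disambiguation; real evaluations
    handle word senses more carefully.
    """
    a = wn.synsets(word_a, pos=wn.NOUN)[0]
    b = wn.synsets(word_b, pos=wn.NOUN)[0]
    return a.shortest_path_distance(b)

d_emergent = 1                                       # toy distance in the emergent hierarchy
d_wordnet = taxonomic_distance("programming", "java")
print(abs(d_emergent - d_wordnet))                   # |d1 - d2| as a quality proxy
```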
  • 32. Semantically Evaluating Tag Hierarchies [Dellschaft, Staab 2006]: taxonomic precision (TP), taxonomic recall (TR), taxonomic F-measure (TF) and taxonomic overlap (TO), compared across different folksonomy induction algorithms. The results hold with other knowledge bases (YAGO, WikiTaxonomy) and datasets (BibSonomy, CiteULike; LastFM less so). [Dellschaft, Staab 2006] Klaas Dellschaft, Steffen Staab, On How to Perform a Gold Standard Based Evaluation of Ontology Learning, in Proceedings of the 5th International Semantic Web Conference (ISWC 2006). M. Strohmaier, D. Helic, D. Benz, Evaluation of Folksonomy Induction Algorithms, submitted to ACM Transactions on Intelligent Systems and Technology (2011).
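To convey the idea behind these measures, here is a strongly simplified sketch based on semantic cotopies (a concept together with its ancestors and descendants); Dellschaft and Staab define taxonomic precision and recall with additional refinements, so this is only an approximation of the intuition, computed on invented toy taxonomies:

```python
# Compare taxonomies (given as child -> parent maps) via semantic cotopies.

def concepts(parent_of):
    return set(parent_of) | set(parent_of.values())

def ancestors(c, parent_of):
    out = set()
    while c in parent_of:
        c = parent_of[c]
        out.add(c)
    return out

def cotopy(c, parent_of):
    """The concept itself plus its ancestors and descendants."""
    descendants = {x for x in parent_of if c in ancestors(x, parent_of)}
    return {c} | ancestors(c, parent_of) | descendants

def taxonomic_precision(learned, reference):
    common = concepts(learned) & concepts(reference)
    scores = [len(cotopy(c, learned) & cotopy(c, reference) & common) /
              len(cotopy(c, learned) & common) for c in common]
    return sum(scores) / len(scores)

def taxonomic_recall(learned, reference):
    return taxonomic_precision(reference, learned)

# Invented toy taxonomies: the learned hierarchy attaches "java" too high.
reference = {"python": "programming", "java": "programming", "programming": "topic"}
learned   = {"python": "programming", "java": "topic", "programming": "topic"}

tp = taxonomic_precision(learned, reference)
tr = taxonomic_recall(learned, reference)
print(tp, tr, 2 * tp * tr / (tp + tr))  # precision, recall, taxonomic F-measure
```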
  • 33. Limitations and Opportunities. Pragmatics influences semantics: 1. Tagging behavior affects the preciseness of emerging semantics [Körner et al. 2010]. 2. Social networks influence the resulting semantic networks [Wang and Groth 2010]. 3. Stream types affect stream semantics [Wagner and Strohmaier 2010].
  • 34. Different motivations for tagging. User A: "This seems to be a category!" User B: "These seem to be keywords!"
  • 35. How do Semantics Emerge? Are they Influenced by the Pragmatics of Tagging? Different styles of tagging: to categorize or to describe resources [Strohmaier et al. 2010]. Categorizer (C): goal is later browsing; changing the vocabulary is costly; the vocabulary is limited in size; tags are subjective. Describer (D): goal is later search; changing the vocabulary is cheap; the vocabulary is open; tags are objective. (Example tag clouds.) Semantic assumption: categorizers produce more precise emergent semantics than describers. [Strohmaier et al. 2010] M. Strohmaier, C. Körner, R. Kern, Why do Users Tag? Detecting Users' Motivation for Tagging in Social Tagging Systems, 4th International AAAI Conference on Weblogs and Social Media (ICWSM 2010), Washington, DC, USA, May 23-26, 2010.
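One simplified indicator inspired by this distinction (not necessarily one of the exact measures from Strohmaier et al. 2010): how fast a user's tag vocabulary grows relative to the number of resources tagged, which should be low for categorizers (limited vocabulary) and high for describers (open vocabulary). The toy users are invented:

```python
def vocabulary_growth(posts):
    """posts: list of (resource, [tags]) for one user -> distinct tags per resource."""
    vocabulary, resources = set(), set()
    for resource, tags in posts:
        resources.add(resource)
        vocabulary.update(tags)
    return len(vocabulary) / len(resources)

# Invented toy users: a categorizer reuses few tags, a describer keeps adding new ones.
categorizer = [("r1", ["work"]), ("r2", ["work"]), ("r3", ["private"]), ("r4", ["work"])]
describer   = [("r1", ["python", "tutorial"]), ("r2", ["django", "webdev"]),
               ("r3", ["recipe", "pasta"]), ("r4", ["news", "politics"])]

print(vocabulary_growth(categorizer))  # low, e.g. 0.5
print(vocabulary_growth(describer))    # high, e.g. 2.0
```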
  • 36. Example 1: Describers outperform categorizers on the precision of emergent tag semantics. Categorizers perform worse than random users (C); describers perform better than random users (D). C. Körner, D. Benz, A. Hotho, M. Strohmaier, G. Stumme, Stop Thinking, Start Tagging: Tag Semantics Emerge From Collaborative Verbosity, 19th International World Wide Web Conference (WWW2010), Raleigh, NC, USA, April 26-30, ACM, 2010.
  • 37. Example 2: Categorizers outperform describers on social classification accuracy. Describers perform worse than categorizers (supervised multi-class SVM). A. Zubiaga, C. Körner, and M. Strohmaier, Tags vs. Shelves: From Social Tagging to Social Classification, 22nd ACM SIGWEB Conference on Hypertext and Hypermedia (HT 2011), Eindhoven, Netherlands, ACM, 2011.
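A minimal sketch of multi-class classification of resources from bag-of-tags features with a linear SVM (not the experimental setup of Zubiaga et al.; data, labels and features are invented):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Each resource is represented by the tags assigned to it; labels are the
# target categories. All data is invented toy data.
tag_sets = ["python programming tutorial", "java programming spring",
            "pasta recipe cooking", "soup recipe cooking"]
labels = ["technology", "technology", "food", "food"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(tag_sets)          # bag-of-tags features
classifier = LinearSVC().fit(X, labels)         # multi-class linear SVM

new_resource = ["django python webdev"]
print(classifier.predict(vectorizer.transform(new_resource)))  # likely ['technology']
```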
  • 38. Example 3: The type of stream aggregation affects semantics. Hashtag stream aggregations are more robust against external disturbances than user-list streams. (Figure: hashtag stream vs. user list stream.)
  • 39. Extracting Semantics from Crowds (overview): Motivation; Social Labeling: Hashtag Semantics; Social Tagging: Tag Relatedness, Tag Generality, Tag Hierarchies; Social Navigation: Navigational Knowledge Engineering.
  • 40. Social Navigation. Users search for and navigate to certain sets of resources. Can we produce ontological constructs as a byproduct of the navigational activities of users? Navigational Knowledge Engineering: a light-weight methodology for low-cost knowledge engineering by a massive user base.
  • 41. Prototype: HANNE. HANNE: Holistic Application for Navigational Knowledge Engineering. http://aksw.org/Projects/NKE. With S. Hellmann, J. Lehmann, C. Stadler, J. Unbehauen, University of Leipzig.
  • 42. Navigational Knowledge Engineering. http://aksw.org/Projects/NKE. Example: Extending DBpedia with NKE. S. Hellmann, J. Lehmann, C. Stadler, J. Unbehauen, M. Strohmaier, Navigational Knowledge Engineering (in preparation).
  • 43. Navigational Knowledge Engineering. Choose initial positive and negative examples from the search result. Here we are looking for football clubs in Saxony, a region in Germany. http://aksw.org/Projects/NKE
  • 44. The Problem of Inductive Concept Learning. U is a universal set of objects; C is a concept, i.e. a subset of objects in U. To learn a concept C means to learn to recognize objects in C, i.e. to be able to tell whether an object is in C. Example: U may be the set of all patients in a register, and C the set of all patients having a particular disease.
  • 45. Inductive Concept Learning. To test coverage, a coverage function returns the value true if an example e is covered by a hypothesis H, and false otherwise. Nada Lavrac and Saso Dzeroski, Inductive Logic Programming: Techniques and Applications, 1994.
  • 46. HANNE and the Problem of Inductive Concept Learning. Given: background knowledge (an OWL/DL knowledge base) and positive and negative examples of a concept (instances). Find: a hypothesis (expressed as OWL class descriptions) that covers all positive and no negative examples.
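A toy illustration of the underlying coverage test and hypothesis search (not HANNE's OWL/DL learning): hypotheses are simple attribute-value conjunctions over invented instance data, and the search returns one that covers all positives and no negatives, echoing the football-clubs-in-Saxony example from the surrounding slides:

```python
from itertools import combinations

# Invented instance descriptions (a tiny ABox-like table).
instances = {
    "club_a": {"type": "FootballClub", "region": "Saxony"},
    "club_b": {"type": "FootballClub", "region": "Saxony"},
    "club_c": {"type": "FootballClub", "region": "Bavaria"},
    "museum": {"type": "Museum", "region": "Saxony"},
}
positives = ["club_a", "club_b"]
negatives = ["club_c", "museum"]

def covers(hypothesis, instance):
    """True if the instance satisfies every attribute=value pair of the hypothesis."""
    return all(instances[instance].get(attr) == val for attr, val in hypothesis)

# Enumerate conjunctions of observed attribute-value pairs, smallest first,
# and stop at one that covers all positives and no negatives.
pairs = sorted({(a, v) for features in instances.values() for a, v in features.items()})
learned = None
for size in range(1, len(pairs) + 1):
    for hypothesis in combinations(pairs, size):
        if all(covers(hypothesis, e) for e in positives) and \
           not any(covers(hypothesis, e) for e in negatives):
            learned = hypothesis
            break
    if learned:
        break

print(learned)  # (('region', 'Saxony'), ('type', 'FootballClub'))
```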
  • 47. Based on the extension, ICL searches for suitable hypotheses. S. Hellmann, J. Lehmann, C. Stadler, J. Unbehauen, M. Strohmaier, Navigational Knowledge Engineering (in preparation).
  • 48. Concepts Learned. The learned concept is shown in Manchester OWL Syntax. The user can retain the concept for later retrieval. Saved concepts are displayed as social navigation suggestions and can be used to enrich an existing knowledge base. S. Hellmann, J. Lehmann, C. Stadler, J. Unbehauen, M. Strohmaier, Navigational Knowledge Engineering (in preparation).
  • 49. Result. Useful properties: biased towards high recall; scales well (the number of training examples matters more than the size of the background knowledge). With only 2 positives and 4 negatives, it is possible to find 13 more instances, which are football clubs situated close to Saxony, Germany. S. Hellmann, J. Lehmann, C. Stadler, J. Unbehauen, M. Strohmaier, Navigational Knowledge Engineering (in preparation).
  • 50. Mock-up (1). http://aksw.org/Projects/NKE
  • 51. Mock-up (2). http://aksw.org/Projects/NKE
  • 52. Extracting Semantics from … (overview): Motivation; Social Labeling: Hashtag Semantics; Social Tagging: Tag Relatedness, Tag Generality, Tag Hierarchies; Social Navigation: Navigational Knowledge Engineering; Conclusions.
  • 53. Objectives (revisited). Provide (some) answers to the following questions: What is the difference between extracting semantics from text vs. extracting semantics from crowds? Why should we study crowds and crowd behavior from a semantic computing perspective? How can semantics be extracted from online crowd behavior, such as social labeling, social tagging, and social navigation? What are the implications for semantic computing research?
  • 54. Social Computation for the Web of Data. Crowds provide some advantages over semantic extraction from text. It is through the process of social computation, i.e. the combination of social behavior and algorithmic computation, that semantic structures emerge. In order to extract semantics from crowds, understanding users' behavior and its impact on emerging semantic structures is important. Shaping or classifying certain user behaviors are ways of increasing the preciseness of emerging semantic structures.
  • 55. Crowd Semantics, based on [Hovy 2011]. [Hovy 2011] Based on slides by Ed Hovy, USC, Reasoning with Text Workshop 2011.
  • 56. Summary. We can observe that semantic structures can be obtained as a byproduct of online crowd behavior (labeling, tagging, navigation), and that these structures can approximate structures in reference knowledge bases (DBpedia, WordNet, etc.). But: pragmatics influences the resulting semantics, and semantic preciseness remains a challenge.
  • 57. An Agenda for Semantic Computing Research (10 August, 2011). What's the knowledge of SAS? Web resources, natural language constructs, real-world happenings. Utilizing the online behavior of crowds for the construction, maintenance and enrichment of large-scale semantic structures. http://www.flickr.com/photos/waldoj/722508166/ http://www.flickr.com/photos/matthewfield/2306001896/
  • 58. Thank You. Acknowledgements: co-authors and collaborators H.P. Grahsl, D. Helic, C. Körner, R. Kern, C. Trattner (TU Graz); D. Benz, G. Stumme (U. Kassel); A. Hotho (U. Würzburg); S. Hellmann, J. Lehmann, C. Stadler, J. Unbehauen (U. of Leipzig, Germany).
