
Harvesting Knowledge from Social Networks: Extracting Typed Relationships among Entities


Knowledge bases like DBpedia, Yago or Google's Knowledge Graph contain huge amounts of ontological knowledge harvested from (semi-)structured, curated data sources, such as relational databases or XML and HTML documents. Yet the Web is full of knowledge that is not curated and/or structured and, hence, not easily indexed, for example social data. Most work so far in this context has been dedicated to the extraction of entities, i.e., people, things or concepts. This poster describes our work toward the extraction of relationships among entities. The objective is to reconstruct a typed graph of entities and relationships that represents the knowledge contained in social data, without the need for a-priori domain knowledge. Experiments with real datasets show promising performance across a variety of domains.

The key distinguishing feature of the work is its focus on highly unstructured social data (tweets and Facebook posts) without reliable grammar structures. Traditional relation extraction approaches, whether supervised, semi-supervised or unsupervised, commonly assume the availability of grammatically correct language corpora.

Published in: Data & Analytics


  1. 1. Harvesting Knowledge from Social Networks: Extracting Typed Relationships among Entities Andrea Caielli, Marco Brambilla, Stefano Ceri, Florian Daniel marco.brambilla@polimi.it marcobrambi SoWeMine Workshop @ ICWE 2017, Rome, Italy
  2. 2. Agenda (1) Context (2) Objectives (3) Method (4) Experiments and Validation (5) Visualization and Exploration (6) Conclusions
  3. 3. (1) Context
  4. 4. Ontology is the philosophical study of the nature of being, becoming, existence or reality and the basic categories of being and their relations.
  5. 5. Formalizing new knowledge is hard Only high frequency emerges The long tail challenge
  6. 6. Sourcing the Long Tail Famous Emerging …
  7. 7. (2) Objective
  8. 8. Objective Extraction of relationships among entities Reconstruct a typed graph of entities & relationships Represent the knowledge contained in social data No need for a-priori domain knowledge
  9. 9. Knowledge Enrichment Setting [Diagram: high-frequency (HF) and low-frequency (LF) entities as instances of a hierarchy of types (Type1, Type11, Type111, Type2), linked by <<instanceof>>, with unknown relations and properties marked "??". Legend: seed entity, seed type, type of interest, expert inputs.] Enrichment problems: relations HF - LF entities; relations LF - LF entities; typing of LF entities; extraction of new LF entities; finding attribute values
  10. 10. A Practical Example
  11. 11. A Practical Example
  12. 12. Challenge and Innovation Highly unstructured social data (tweets and Facebook posts) No reliable grammar structures
  13. 13. (3) Method
  14. 14. Analysis Pipeline (0) Preprocessing (1) Entity Extraction (2) Relationship Extraction (3) Relationship Aggregation (4) Relationship Typing (1) Evolution of work presented in: M. Brambilla, S. Ceri, E. Della Valle, R. Volonterio, and F. Acero Salazar. “Extracting Emerging Knowledge from Social Media”, WWW 2017.
  15. 15. Pipeline Summary
  16. 16. (0) Preprocessing Text cleaning and enrichment + Traditional text preprocessing (stemming, …)
  17. 17. (1) Entity Extraction Entity identification and semantic typing Exploiting: Stanford CoreNLP NER Dandelion API
  18. 18. (2) Relationship Extraction Baseline with Stanford OpenIE for triple extraction: Several issues: - Meaningless relations - Wrong relations - Multiple relations
  19. 19. (3) Relationship Aggregation Sails fans. Season 2 airs on May 24th on History on D Stv Jag Comms Too many answers for the same question! Empirical rules {"entity1":"Season 2", "relationship":"air on", "entity2":"May 24th"}
  20. 20. (4) Relationship Typing (A): Synonyms Exploiting synsets based on WordNet 3.1
  21. 21. (4) Relationship Typing (B): Matching Types
  22. 22. (4) Relationship Typing (C): Linguistics Based on VerbNet Groupings of verbs based on syntactic and semantic properties
  23. 23. Pipeline Implementation
  24. 24. (4) Validation
  25. 25. Experiments TV Series: Black Sails, Teen Wolf, Vikings Milan Fashion Week Rugby games
  26. 26. Domains and quality of results - summary
  27. 27. Relationships and Verb Classes
  28. 28. Example: Teen Wolf [Chart: Teen Wolf synonym classes; axis: occurrences, 0 to 800]
  29. 29. Example: Teen Wolf [Charts: Teen Wolf synonym classes and Teen Wolf VerbNet classes; axis: occurrences, 0 to 800]
  30. 30. Overall Quality Indexes of Entity and Relationships Extraction
  31. 31. (5) Visualization
  32. 32. Motivation Resulting semantic models extremely large and hard to interpret Example: Black Sails collection, containing 1243 entities and 2025 relations.
  33. 33. Exploration Visualization Filtering Navigation
  34. 34. Exploration Visualization Filtering Navigation
  35. 35. Exploration Visualization RELATIONSHIP Filtering Navigation
  36. 36. Examples Milano Fashion Week Generated graph
  37. 37. Examples Milano Fashion Week Generated graph
  38. 38. Examples Milano Fashion Week Generated graph
  39. 39. Examples Milano Fashion Week Generated graph
  40. 40. (6) Conclusions
  41. 41. Conclusions Extraction of relevant emerging relationships is feasible even for extremely unstructured and informal content (social media). Still a long way to perfect extraction: • N-ary relations • Time-dependency • Poor typing of entities in ontologies
  42. 42. THANKS! QUESTIONS? Andrea Caielli, Marco Brambilla, Stefano Ceri, Florian Daniel Harvesting Knowledge from Social Networks: Extracting Typed Relationships among Entities Marco Brambilla @marcobrambi marco.brambilla@polimi.it http://datascience.deib.polimi.it http://home.deib.polimi.it/marcobrambi
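
The relationship aggregation and synonym-based typing steps (slides 19 and 20) can be illustrated with a minimal Python sketch. The slides only say that "empirical rules" are used and that synonyms come from WordNet synsets, so everything concrete here is an assumption made for illustration: the small `SYNONYM_CLASSES` table is a hypothetical stand-in for a WordNet lookup, and the rule "keep the most frequent object, preferring the shorter one on ties" is not the authors' rule. Only the shape of the triple (`entity1`, `relationship`, `entity2`) is taken from the slides.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for a WordNet-based synonym lookup:
# each relationship label maps to a canonical label for its synonym class.
SYNONYM_CLASSES = {
    "air on": "air on",
    "broadcast on": "air on",
    "premiere on": "air on",
}


def canonical_relationship(label):
    """Return the synonym-class label; unknown labels map to themselves."""
    key = label.lower().strip()
    return SYNONYM_CLASSES.get(key, key)


def aggregate(triples):
    """Collapse candidate triples that answer the same question.

    Triples are grouped by (entity1, canonical relationship). Illustrative
    rule (an assumption): keep the most frequent entity2 in each group and
    break ties by preferring the shorter text.
    """
    groups = defaultdict(Counter)
    for triple in triples:
        key = (triple["entity1"], canonical_relationship(triple["relationship"]))
        groups[key][triple["entity2"]] += 1

    result = []
    for (entity1, relationship), counts in groups.items():
        best = min(counts, key=lambda e: (-counts[e], len(e), e))
        result.append(
            {"entity1": entity1, "relationship": relationship, "entity2": best}
        )
    return result


# Hypothetical candidate triples for one post about "Season 2".
candidates = [
    {"entity1": "Season 2", "relationship": "air on", "entity2": "May 24th"},
    {"entity1": "Season 2", "relationship": "air on", "entity2": "May 24th on History"},
    {"entity1": "Season 2", "relationship": "broadcast on", "entity2": "May 24th"},
]
aggregated = aggregate(candidates)
```

With these hypothetical candidates, `aggregated` holds a single triple for "Season 2", matching the form shown on slide 19. A real pipeline would replace the lookup table with actual synset queries and the tie-breaking rule with whatever rules the data supports.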
