On the Quest for Changing Knowledge. Capturing emerging entities from social media. WebScience 2016 DDI

Massive data integration technologies have recently been used to produce very large ontologies. However, knowledge in the world continuously evolves, and ontologies remain largely incomplete with respect to low-frequency data belonging to the so-called long tail. Socially produced content is an excellent source for discovering emerging knowledge: it is huge, and it immediately reflects the relevant changes that hide emerging entities. Thus, we propose a method for discovering emerging entities by extracting them from social content.

We start from a purely syntactic method as a baseline, and we propose a semantic method built on entity recognition and DBpedia. The method associates candidate entities with feature vectors, built from social content by using co-occurrence, and then extracts the emerging entities by using feature similarity measures. Once set up by experts through a very simple initialization, the method is capable of finding emerging entities and extracting their relevant relationships to given types; the method can be iterated continuously or periodically, using the already identified emerged knowledge as the new starting point.
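
The co-occurrence and similarity steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy posts, the use of a summed seed vector as reference, and the choice of cosine similarity are all assumptions made for the example.

```python
import math
from collections import Counter

def cooccurrence_vector(entity, posts):
    """Count the terms that appear in the same post as `entity`."""
    vec = Counter()
    for tokens in posts:
        if entity in tokens:
            vec.update(t for t in tokens if t != entity)
    return vec

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_candidates(seeds, candidates, posts):
    """Rank candidates by similarity to the summed seed vector (an assumption)."""
    reference = Counter()
    for seed in seeds:
        reference.update(cooccurrence_vector(seed, posts))
    scored = [(c, cosine(cooccurrence_vector(c, posts), reference))
              for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy data: each post is a list of extracted tokens.
posts = [
    ["@brandA", "#fashion", "#milan"],
    ["@brandB", "#fashion", "#runway"],
    ["@newLabel", "#fashion", "#milan"],
    ["@pizzeria", "#food", "#lunch"],
]
ranked = rank_candidates(["@brandA", "@brandB"], ["@newLabel", "@pizzeria"], posts)
```

In this toy setting the candidate that shares hashtags with the seeds ranks first, while the unrelated account gets a similarity of zero.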

We validate our method by applying it to a set of diverse domain-specific application scenarios, such as fashion, literature, and exhibitions. We show the approach at work and demonstrate its effectiveness on datasets that differ in coverage, dynamics, and size.
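
The slides report effectiveness as Precision@5, that is, the fraction of the five top-ranked candidates that are truly relevant. A minimal sketch of that metric is shown here; the function name and the toy ranking are hypothetical, and dividing by `k` even when fewer than `k` results are returned is one common convention, assumed for this example.

```python
def precision_at_k(ranked, relevant, k=5):
    """Fraction of the top-k ranked items that belong to the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k

# Hypothetical ranking: four of the first five candidates are relevant.
ranked = ["w1", "w2", "x1", "w3", "w4", "x2"]
relevant = {"w1", "w2", "w3", "w4"}
score = precision_at_k(ranked, relevant, k=5)
```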

Published in: Social Media

  1. On the Quest for Changing Knowledge. Marco Brambilla, Stefano Ceri, Florian Daniel, Emanuele Della Valle. @marcobrambi
  2. Data-driven innovation and Innovation-driven data
  3. Innovation requires precise, to-the-point, up-to-date, domain-specific information
  4. "There are more things in heaven and earth, Horatio, than are dreamt of in your philosophy." Shakespeare (Hamlet, Act 1, Scene 5)
  5. From Data to Wisdom
  6. Formalizing new knowledge is hard: only high frequency emerges; the long tail challenge
  7. Knowledge Extraction: text mining; Semantic Web; search and recommendation systems. No specific care for emerging knowledge
  8. Heaven and Heart: How to peer through an effective window on the real world? Social media, our blessing and curse. Domain experts matter
  9. Can we use social networks to discover emerging knowledge?
  10. Beware the streetlamp effect: the bias of the source; the bias of the observer
  11. Famous Emerging
  12. Evolving Knowledge (figure: consolidated knowledge vs. social content; factoids a, b, c, ¬c; potentially emerging; potentially decaying)
  13. Overview
  14. Knowledge Enrichment Setting (figure: high-frequency (HF) and low-frequency (LF) entities as instances of types, with properties; legend: seed entity, seed type, type of interest, expert inputs; enrichment problems: relations between HF and LF entities, relations between LF entities, typing of LF entities, extraction of new LF entities, finding attribute values)
  15. Emerging Knowledge Harvesting
  16. Domain Types: types selected by the experts; relevant for the domain
  17. Seed characterization: selected by the expert; belonging to an expert type; thoroughly described (# @ a w)
  18. Social Media Sourcing: content coming from the seeds’ accounts
  19. Candidate Selection: potentially any entity extracted from the social streams, resulting in huge sets of candidates (# @ a w ♥)
  20. Candidate Typing
  21. Candidate Pruning: initial pruning of candidates based on TF-DF (*), where TF-DF := df * tf / (N - df + 1). (*) A variant of TF-IDF that does not discount document frequency, because we are actually happy about frequent appearance (we don’t look for information entropy!)
  22. Candidate Ranking
  23. Candidate Vector Space: purely syntactic; semantic: based on entity extraction / DBpedia, or based on deep learning on images / ClarifAI
  24. Example Analysis
  25. Experiments: fashion brands, writers, painters, exhibitions
  26. 4,400 strategies evaluated: 44 alternative feature vectors (12 basic features and 32 aggregations); 9 different weighting values for aggregations; 5 levels of recall for entity extraction; 3 different distances
  27. Pruning Phase: from 4,400 down to 10 strategies, eliminating the less relevant parameters
  28. Italian Fashion Brands: Precision@5 = 0.2. Increasing the number of seeds reduces precision
  29. Australian Writers (22 seeds): Precision@5 = 0.8
  30. Innovative Painters (21 seeds): Precision@5 = 0.6
  31. Twitter vs. Instagram: P@5 = 1.0 vs. P@5 = 0.8
  32. Fashion: Twitter + Instagram
  33. Writers: Twitter + Instagram. Precision = 1
  34. Conclusion: It’s about time to build innovation based on data and build knowledge based on innovation. Harvesting can be iterative
  35. On the Quest for Changing Knowledge. Contact us: Marco Brambilla, @marcobrambi, marco.brambilla@polimi.it, http://datascience.deib.polimi.it
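
The TF-DF pruning formula from slide 21, df * tf / (N - df + 1), can be sketched as follows. The slides do not define the symbols, so this example assumes that tf is the total number of occurrences of a candidate across all posts, df is the number of posts containing it, and N is the total number of posts. The function names and toy posts are hypothetical.

```python
from collections import Counter

def tf_df_scores(posts):
    """Score each token with df * tf / (N - df + 1), as on slide 21.

    Assumed meanings: tf = total occurrences, df = posts containing
    the token, N = number of posts.
    """
    n_posts = len(posts)
    tf = Counter()
    df = Counter()
    for tokens in posts:
        tf.update(tokens)        # every occurrence counts toward tf
        df.update(set(tokens))   # each post counts once toward df
    return {t: df[t] * tf[t] / (n_posts - df[t] + 1) for t in tf}

def prune(posts, keep):
    """Keep the `keep` highest-scoring candidates."""
    scores = tf_df_scores(posts)
    return sorted(scores, key=scores.get, reverse=True)[:keep]

# Hypothetical toy posts, already tokenized into candidate entities.
posts = [
    ["#milan", "@labelX"],
    ["#milan", "#milan", "@oneOff"],
    ["#milan", "@labelX"],
]
scores = tf_df_scores(posts)
kept = prune(posts, keep=2)
```

Unlike TF-IDF, the score grows with document frequency, so a candidate that shows up in many posts is retained rather than discounted.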
