
Extracting emerging knowledge from social media - WWW2017



These are the slides presenting our full paper titled "Extracting Emerging Knowledge from Social Media" at the WWW 2017 conference.

The work is based on a rather obvious assumption: knowledge in the world continuously evolves, and ontologies are largely incomplete with respect to low-frequency data, which belong to the so-called long tail.

Socially produced content is an excellent source for discovering emerging knowledge: it is huge, and immediately reflects the relevant changes which hide emerging entities.

In the paper we propose a method and a tool for discovering emerging entities by extracting them from social media.

Once instrumented by experts through a very simple initialization, the method is capable of finding emerging entities. We propose a mixed syntactic and semantic method. It uses seeds, i.e. prototypes of emerging entities provided by experts, to generate candidates. It then associates each candidate with a feature vector built from the terms occurring in its social content, ranks the candidates by their distance from the centroid of the seeds, and returns the top candidates as the result.
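The ranking step described above can be illustrated with a minimal sketch. It is not the tool's implementation: the whitespace tokenizer, the plain term counts, the choice of cosine distance, and all function and account names are assumptions made for illustration.

```python
import math
from collections import Counter


def term_vector(posts):
    """Term counts over all posts of one account (whitespace tokens: an assumption)."""
    counts = Counter()
    for post in posts:
        counts.update(post.lower().split())
    return counts


def centroid(vectors):
    """Component-wise mean of a list of sparse term vectors."""
    total = Counter()
    for vector in vectors:
        total.update(vector)
    return {term: count / len(vectors) for term, count in total.items()}


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 when either vector is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def rank_candidates(seed_posts, candidate_posts, top_k):
    """Return the top_k candidate names closest to the seed centroid."""
    seed_centroid = centroid([term_vector(p) for p in seed_posts.values()])
    scored = sorted(
        (cosine_distance(term_vector(posts), seed_centroid), name)
        for name, posts in candidate_posts.items()
    )
    return [name for _, name in scored[:top_k]]


# Hypothetical toy data, not taken from the paper's experiments.
seeds = {
    "seed_a": ["new novel launch melbourne", "poetry reading tonight"],
    "seed_b": ["novel draft finished", "reading at the bookshop"],
}
candidates = {
    "cand_writer": ["my novel reading in melbourne"],
    "cand_sports": ["football match tonight goal"],
    "cand_empty": [],
}
top = rank_candidates(seeds, candidates, top_k=2)
print(top)
```

In this toy data the candidate that shares vocabulary with the seeds ranks first, which is the intuition behind using the distance from the seed centroid.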

The method can be continuously or periodically iterated, using the results as new seeds.
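One way to picture this iteration is the loop below, given as a sketch under stated assumptions. The `discover` callable stands in for one full run of the method, and the toy graph is invented; neither comes from the paper or the tool.

```python
def iterate_discovery(initial_seeds, discover, rounds):
    """Re-run a discovery step, promoting its results to seeds each round.

    `discover` is a hypothetical callable: it takes the current seed set
    and returns a list of entity names found in that round.
    """
    seeds = set(initial_seeds)
    history = []
    for _ in range(rounds):
        found = [c for c in discover(seeds) if c not in seeds]
        if not found:
            break  # nothing new emerged, so stop early
        history.append(found)
        seeds.update(found)
    return seeds, history


# Toy stand-in for the real method: each account "leads to" other accounts.
GRAPH = {"a": ["b"], "b": ["c"], "c": []}


def toy_discover(seeds):
    return sorted({n for s in seeds for n in GRAPH.get(s, [])})


final_seeds, history = iterate_discovery({"a"}, toy_discover, rounds=5)
print(final_seeds, history)
```

The early exit when a round finds nothing new is a design choice of this sketch; a periodic deployment would instead re-run on fresh social content.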

The PDF of the full paper presented at WWW 2017 is available online (open access with Creative Commons license).

You can also check out the slides of my presentation on Slideshare.

A demo version of the tool is available online for free use, thanks also to our partners Dandelion and Microsoft Azure.
You can TRY THE TOOL online if you want.

Published in: Data & Analytics

Extracting emerging knowledge from social media - WWW2017

  1. Extracting Emerging Knowledge from Social Media. Marco Brambilla, Stefano Ceri, Emanuele Della Valle, Riccardo Volonterio, Felix Acero Salazar. @marcobrambi. WWW 2017, Perth, Australia
  2. Humans aim at formalizing knowledge
  3. Ontology is the philosophical study of the nature of being, becoming, existence or reality and the basic categories of being and their relations.
  4. the nature of being, becoming, existence or reality; the basic categories of being and their relations.
  5. the nature of being, becoming, existence or reality; the basic categories of being and their relations.
  6. Formalizing new knowledge is hard. Only high frequency emerges. The long tail challenge
  7. There are more things in heaven and earth, Horatio, than are dreamt of in your philosophy. Shakespeare (Hamlet, Act 1, Scene 5)
  8. The Answer to the Great Question... Of Life, the Universe and Everything. [Figure: Data, Information, Knowledge, Wisdom; labels: Context independence, Understanding, Understanding relations, Understanding patterns, Understanding principles]
  9. Our focus: The Evolving Knowledge. [Figure: regions a, b, c, ¬c, d; labels: known, social factoid, potentially emerging, potentially decaying, actual and solid]
  10. Heaven and Heart. How to peer into the world through an effective window? TWO INGREDIENTS: Social media – the data; Domain experts – the context
  11. Can we use social media to discover and codify emerging knowledge?
  12. Overview
  13. Famous Emerging …
  14. Knowledge Enrichment Setting. [Figure: High Frequency (HF) Entities 1-5 and Low Frequency (LF) Entities 1-4 as instances, linked by <<instanceof>> to the types Type1, Type11, Type111, Type2, with properties Property1 and Property2; unknown elements marked "??". Legend: Seed Entity, Seed Type, Type of interest, Expert inputs. Enrichment problems: Relations HF - LF entities; Relations LF - LF entities; Typing of LF entities; Extraction of new LF entities; Finding attribute values]
  15. Emerging Knowledge Harvesting
  16. Input (1): Domain Specific Types. Types selected by the expert; relevant for the domain
  17. Input (2): Seeds (emerging entities). Known and selected by the domain expert; belonging to an expert type; thoroughly described
  18. Objectives: (1) Discover candidate unknown emerging entities; (2) Determine the relevance of the candidate; (3) Determine the type of the candidate
  19. Step (1): Social Media Sourcing. Collect content produced by the seeds
  20. Step (2): Candidate Extraction. Potentially any entity extracted from the social streams of the seeds, resulting in huge sets of candidates. Our hypothesis: take only social network users as candidates
  21. Step (3): Candidate Pruning. Initial pruning of candidates based on TF-DF := df * ttf / (N - df + 1), where: df = number of seeds with which a candidate co-occurs; ttf = total number of times a candidate occurs in the analyzed content; N = number of seeds. Ranking + threshold. (*) TF-DF is a variant of TF-IDF that does not discount document frequency, because frequent appearance is desirable here (we don't look for information entropy!)
  22. Step (4): Candidate Description. Repeat social media sourcing for candidates. A potentially good candidate is one that behaves similarly to one or more of the seeds. Our hypothesis: it talks about the same things
  23. Step (5): Candidate Ranking. Seed centroid
  24. Step (6): Feature selection. Purely syntactic: only user handles (accounts); handles and hashtags. Semantic: based on entity extraction / DBpedia; based on deep learning on images / ClarifAI
  25. Step (6): Semantic Feature selection for text. 9 basic strategies, generating 18 combinations of T + E strategies
  26. 990 semantic strategies evaluated: 18 alternative feature vectors, 11 different weighting values for aggregations, 5 levels of recall for entity extraction (18 x 11 x 5 = 990), plus 3 different distance functions analyzed
  27. Experiments: Fashion Brands, Writers, Exhibitions
  28. Emerging Australian Writers – 22 seeds in June in Melbourne
  29. Emerging Australian Writers. Weighting parameter; entity extraction recall
  30. Emerging Australian Writers. Precision @ K for two strategies: EHE—AST, CHE—AST
  31. Cross-scenario: 39 strategies always outperform the syntactic one (Writers, Expo, Fashion)
  32. Conclusions. Extraction of relevant emerging entities. Top, Fast and Reliable are the important. Off-the-shelf or as-a-service tools
  33. Challenges ahead: Repeatability in time (years!); Recursion (candidates to seeds); Multi-source data collection; Multiple types; Emerging relations; Emerging types
  34. You can try it yourself!
  35. THANKS! QUESTIONS? Extracting Emerging Knowledge from Social Media. Marco Brambilla, Stefano Ceri, Emanuele Della Valle, Riccardo Volonterio, Felix Acero Salazar. Marco Brambilla @marcobrambi
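
The TF-DF pruning score from Step (3) (slide 21) can be worked through with a short sketch. The formula follows the slide as written; the function names, the threshold value, and the candidate statistics are hypothetical and chosen only to make the arithmetic visible.

```python
def tf_df(df, ttf, n_seeds):
    """TF-DF as given on the slide: df * ttf / (N - df + 1).

    df      : number of seeds the candidate co-occurs with
    ttf     : total occurrences of the candidate in the analyzed content
    n_seeds : N, the number of seeds
    """
    if not 0 <= df <= n_seeds:
        raise ValueError("df must lie between 0 and the number of seeds")
    return df * ttf / (n_seeds - df + 1)


def prune(stats, n_seeds, threshold):
    """Rank candidates by TF-DF and keep those at or above the threshold."""
    scores = {name: tf_df(df, ttf, n_seeds) for name, (df, ttf) in stats.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [name for name in ranked if scores[name] >= threshold]


# Hypothetical candidate statistics: name -> (df, ttf), with N = 22 seeds.
stats = {
    "@cand_a": (11, 40),  # 11 * 40 / (22 - 11 + 1) = 440 / 12
    "@cand_b": (2, 50),   # 2 * 50 / (22 - 2 + 1)   = 100 / 21
    "@cand_c": (22, 30),  # 22 * 30 / (22 - 22 + 1) = 660
}
kept = prune(stats, n_seeds=22, threshold=10.0)
print(kept)
```

Note how the score grows with df instead of being discounted by it: a candidate mentioned by every seed gets the smallest denominator, which matches the slide's remark that frequent appearance is welcome.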