Extracting emerging knowledge from social media - WWW2017

  1. Extracting Emerging Knowledge from Social Media. Marco Brambilla, Stefano Ceri, Emanuele Della Valle, Riccardo Volonterio, Felix Acero Salazar. marco.brambilla@polimi.it, @marcobrambi. WWW 2017, Perth, Australia.
  2. Humans aim at formalizing knowledge
  3. Ontology is the philosophical study of the nature of being, becoming, existence or reality and the basic categories of being and their relations.
  4. The nature of being, becoming, existence or reality; the basic categories of being and their relations.
  5. The nature of being, becoming, existence or reality; the basic categories of being and their relations.
  6. Formalizing new knowledge is hard. Only high frequency emerges. The long tail challenge.
  7. There are more things In heaven and earth, Horatio, Than are dreamt of in your philosophy. Shakespeare (Hamlet Act 1, scene 5)
  8. The Answer to the Great Question... Of Life, the Universe and Everything. [Figure: Data, Information, Knowledge, Wisdom. Axes: context independence, understanding. Transitions: understanding relations, understanding patterns, understanding principles.]
  9. Our focus: The Evolving Knowledge. [Figure: regions labeled known, social factoid, potentially emerging, potentially decaying, actual and solid; points a, b, c, ¬c, d.]
  10. Heaven and Heart. How to peer into the world through an effective window? Two ingredients: social media (the data) and domain experts (the context).
  11. Can we use social media to discover and codify emerging knowledge?
  12. Overview
  13. Famous Emerging …
  14. Knowledge Enrichment Setting. [Figure: high-frequency (HF) and low-frequency (LF) entities as instances (<<instanceof>>) of types Type1, Type11, Type2, Type111, with properties Property1 and Property2. Legend: seed entity, seed type, type of interest, expert inputs. Enrichment problems: relations between HF and LF entities; relations between LF entities; typing of LF entities; extraction of new LF entities; finding attribute values.]
  15. Emerging Knowledge Harvesting
  16. Input (1): Domain Specific Types. Types selected by the expert; relevant for the domain.
  17. Input (2): Seeds (emerging entities). Known and selected by the domain expert; belonging to an expert type; thoroughly described.
  18. Objectives: (1) discover candidate unknown emerging entities; (2) determine the relevance of the candidate; (3) determine the type of the candidate.
  19. Step (1): Social Media Sourcing. Collect content produced by the seeds.
  20. Step (2): Candidate Extraction. Potentially any entity extracted from the social streams of the seeds, resulting in huge sets of candidates. Our hypothesis: take only social network users as candidates.
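The candidate extraction step above could be sketched as follows. This is only an illustration, assuming that social network users appear in seed posts as `@handle` mentions; the regular expression, the function name `extract_candidates`, and the input layout (a dict from seed handle to a list of post texts) are assumptions, not part of the slides.

```python
import re
from collections import defaultdict

# Assumed handle pattern: "@" followed by word characters, not preceded by a
# word character (so e-mail addresses such as a@b.com are skipped).
HANDLE_RE = re.compile(r"(?<!\w)@(\w+)")

def extract_candidates(posts_by_seed):
    """Map each mentioned handle to the set of seeds whose posts mention it."""
    seeds = {seed.lower() for seed in posts_by_seed}
    mentions = defaultdict(set)
    for seed, posts in posts_by_seed.items():
        for text in posts:
            for handle in HANDLE_RE.findall(text):
                handle = handle.lower()
                if handle not in seeds:  # seeds are already known, not candidates
                    mentions[handle].add(seed.lower())
    return dict(mentions)
```

The returned mapping already carries the co-occurrence information (which seeds mention which candidate) that a later pruning step can use.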
  21. Step (3): Candidate Pruning. Initial pruning of candidates based on TF-DF (*) := df * ttf / (N - df + 1), where: df = number of seeds with which a candidate co-occurs; ttf = total number of times a candidate occurs in the analyzed content; N = number of seeds. Ranking + threshold. (*) A variant of TF-IDF that does not discount document frequency, because we are actually happy about frequent appearance (we don't look for information entropy!).
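A minimal sketch of the TF-DF score and the ranking-plus-threshold pruning, using the formula from the slide. The function names, the input layout (candidate mapped to a `(df, ttf)` pair), and the threshold value in the usage note are hypothetical; the slides do not state which threshold was used.

```python
def tf_df(df, ttf, n_seeds):
    """TF-DF as defined on the slide: df * ttf / (N - df + 1)."""
    return df * ttf / (n_seeds - df + 1)

def prune(stats, n_seeds, threshold):
    """Rank candidates by TF-DF (highest first) and keep those at or above threshold.

    stats maps candidate -> (df, ttf); threshold is a hypothetical cut-off.
    """
    scored = {cand: tf_df(df, ttf, n_seeds) for cand, (df, ttf) in stats.items()}
    ranked = sorted(scored.items(), key=lambda item: item[1], reverse=True)
    return [(cand, score) for cand, score in ranked if score >= threshold]
```

For example, with N = 5 seeds, a candidate that co-occurs with 2 seeds and appears 10 times in total scores 2 * 10 / (5 - 2 + 1) = 5.0, while one that co-occurs with all 5 seeds and appears 3 times scores 5 * 3 / 1 = 15.0, so co-occurring with many seeds is rewarded rather than discounted.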
  22. Step (4): Candidate Description. Repeat social media sourcing for candidates. A potentially good candidate is one that behaves similarly to one or more of the seeds. Our hypothesis: it talks about the same things.
  23. Step (5): Candidate Ranking. Seed centroid.
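One way to read "seed centroid" is to average the seeds' feature vectors and rank candidates by their distance from that average. The sketch below does this with sparse vectors stored as dicts and with cosine distance; the choice of cosine distance and all function names are assumptions, since the slides only say that three distance functions were analyzed without naming them here.

```python
import math

def centroid(vectors):
    """Component-wise mean of sparse vectors given as {feature: weight} dicts."""
    keys = set().union(*vectors)
    n = len(vectors)
    return {k: sum(v.get(k, 0.0) for v in vectors) / n for k in keys}

def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(weight * b.get(k, 0.0) for k, weight in a.items())
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

def rank_candidates(seed_vectors, candidate_vectors):
    """Return candidate names sorted from closest to farthest from the seed centroid."""
    center = centroid(seed_vectors)
    return sorted(candidate_vectors,
                  key=lambda name: cosine_distance(candidate_vectors[name], center))
```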
  24. Step (6): Feature selection. Purely syntactic: only user handles (accounts), or handles and hashtags. Semantic: based on entity extraction (DBpedia), or based on deep learning on images (ClarifAI).
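The two purely syntactic options (handles only, or handles plus hashtags) could be built as simple count vectors. This is a sketch under assumptions: the token pattern, the lowercasing, and the name `syntactic_features` are illustrative and not taken from the slides.

```python
import re
from collections import Counter

# Assumed token pattern: "@handle" or "#hashtag", not preceded by a word character.
TOKEN_RE = re.compile(r"(?<!\w)([@#]\w+)")

def syntactic_features(posts, include_hashtags=True):
    """Count handles (and optionally hashtags) across an account's posts."""
    counts = Counter()
    for text in posts:
        for token in TOKEN_RE.findall(text):
            token = token.lower()
            if token.startswith("@") or include_hashtags:
                counts[token] += 1
    return counts
```

Calling it with `include_hashtags=False` gives the handles-only variant; the resulting counters can be used directly as sparse feature vectors for seeds and candidates.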
  25. Step (6): Semantic feature selection for text. 9 basic strategies, generating 18 combinations of T + E strategies.
  26. 990 semantic strategies evaluated: 18 alternative feature vectors × 11 different weighting values for aggregations × 5 levels of recall for entity extraction (+ 3 different distance functions analyzed).
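The count of 990 is consistent with a full grid over the three dimensions listed (18 × 11 × 5 = 990). The enumeration below is only a sketch: the feature-vector and recall labels are placeholders, and the weighting grid 0.0 to 1.0 in steps of 0.1 is an assumption that happens to give 11 values; the slides do not list the actual values.

```python
from itertools import product

# Placeholder names for the 18 alternative feature vectors.
feature_vectors = [f"fv{i}" for i in range(1, 19)]
# Assumed weighting grid: 0.0, 0.1, ..., 1.0 (11 values).
weights = [round(0.1 * i, 1) for i in range(11)]
# Placeholder labels for the 5 entity-extraction recall levels.
recall_levels = [f"recall{i}" for i in range(1, 6)]

# Every combination of feature vector, weight, and recall level is one strategy.
strategies = list(product(feature_vectors, weights, recall_levels))
print(len(strategies))  # 990
```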
  27. Experiments: Fashion Brands, Writers, Exhibitions.
  28. Emerging Australian Writers: 22 seeds. http://www.emergingwritersfestival.org.au/ (in June, in Melbourne).
  29. Emerging Australian Writers. [Figure: results by weighting parameter and entity extraction recall.]
  30. Emerging Australian Writers Precision @ K for two strategies EHE—AST CHE—AST
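For reference, Precision @ K is the fraction of the top K ranked candidates that are judged relevant. A minimal sketch, assuming the ranking is a list of candidate names and the relevant set comes from expert judgment (both names are illustrative):

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the first k ranked candidates that are in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked[:k]
    return sum(1 for cand in top_k if cand in relevant) / k
```

Dividing by K rather than by the number of returned items means a ranking shorter than K is penalized, which is the usual convention.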
  31. Cross-scenario: 39 strategies always outperform the syntactic one (Writers, Expo, Fashion).
  32. Conclusions: Extraction of relevant emerging entities. Top, Fast and Reliable are the important. Off-the-shelf or as-a-service tools.
  33. Challenges ahead: repeatability in time (years!); recursion (candidates to seeds); multi-source data collection; multiple types; emerging relations; emerging types.
  34. You can try it yourself! http://datascience.deib.polimi.it/social-knowledge
  35. THANKS! QUESTIONS? Extracting Emerging Knowledge from Social Media. Marco Brambilla, Stefano Ceri, Emanuele Della Valle, Riccardo Volonterio, Felix Acero Salazar. Contact: Marco Brambilla, @marcobrambi, marco.brambilla@polimi.it, http://datascience.deib.polimi.it, http://home.deib.polimi.it/marcobrambi