
Stretching the Life of Twitter Classifiers with Time-Stamped Semantic Graphs

  1. Stretching the Life of Twitter Classifiers with Time-Stamped Semantic Graphs. A. Elizabeth Cano (@pixarelli), amparo.cano@open.ac.uk; Yulan He, y.he9@aston.ac.uk; Harith Alani, h.alani@open.ac.uk
  2. Introduction: Social Media Streams
  3. Introduction: Representing Topics in Dynamic Environments. Techniques for topic classification of social media are sensitive to the evolution of topics. Example tweets from 2011 to 2014: "3 dead in protest in Egypt. Security official vows to ‘deal firmly..’ #Jan24"; "Egypt Pres Morsi uses his visit to Tehran to praise the Syrian uprising"; "#Boston bombing suspect “pinned down” on boat in Watertown"; "Why Obama needs to rethink his entire ISIS strategy…"
  4. Introduction: Challenges • Keeping the model up to date requires regular retuning. • Manual annotation is expensive. Questions • Which feature types provide a more stable representation of a topic?
  5. Introduction: Previous work. Using local features: • Bag of Words (BoW) [Genc et al., 2011] • BoW + Bag of Entities (BoE) [Vitale et al., 2012] • BoW + BoE + Part-of-Speech (PoS) tagging [Munoz et al., 2011] [Varga et al., 2012]. Exploiting the link structure of a knowledge source: • Exploiting categories containing entities [Michelson et al., 2010] • Relating tweets to Wikipedia resources [Milne et al., 2008] [Xu et al., 2011] • Using semantic features for topic classification [Cano et al., 2013] [Varga et al., 2014]
  6. Introduction: Topic Evolution [figure: a topic in a Twitter corpus tracked from epoch t to epoch t+1 through its lexical and semantic representations]
  7. Introduction: Characterising Topic Changes with DBpedia. Some features remain unchanged across DBpedia versions, while others provide information about past, current or future contexts (e.g. dbp:UnitedStatesPresidentialCandidates). [figure: the neighbourhood of dbp:Barack_Obama in DBpedia 3.6, 3.7 and 3.8, including dbo:birthPlace dbp:Hawaii, dbo:spouse dbp:Michelle_Obama, rdf:type yago:PresidentOfTheUnitedStates (rdfs:subClassOf dbo:Person), dbo:author links to dbp:The_Audacity_of_Hope and dbp:Dreams_from_My_Father, skos:subject links to category:Community_organisers, category:Columbia_University_Alumni and category:United_States_presidential_candidates,_2012, a dbo:leader link, and dbo:wikiPageWikiLink links to dbp:Budget_Control_Act_of_2011 and dbp:Al-Qaeda]
  8. Approach: DBpedia Graph Snapshots. Definition (Time-dependent Resource Meta Graph): a sequence of tuples G := (R, P, C, Y, f_t) where • R, P and C are finite sets whose elements are resources, properties and classes, respectively; • Y is a ternary relation Y ⊆ R × P × C representing a hypergraph with ternary edges; • Y can be represented as a tripartite graph H(Y) = ⟨V, D⟩ whose vertex set is V = R ∪ P ∪ C and whose edge set is D = {{r, p, c} | (r, p, c) ∈ Y}; • f_t assigns a temporal marker to each ternary edge.
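The definition above can be mirrored by a small data structure. What follows is a hypothetical Python sketch, not the authors' implementation. It assumes each ternary edge is stored together with the temporal marker f_t assigns to it (for example a DBpedia release label such as "3.6"), so the same (r, p, c) may recur under different markers. The class name `TimeStampedMetaGraph` and its methods are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TimeStampedMetaGraph:
    """Hypothetical sketch of G := (R, P, C, Y, f_t).

    Each element of `timed_edges` is (r, p, c, t): a ternary edge of Y plus
    the temporal marker t that f_t assigns to it.
    """

    timed_edges: set[tuple[str, str, str, str]] = field(default_factory=set)

    def add_edge(self, r: str, p: str, c: str, t: str) -> None:
        self.timed_edges.add((r, p, c, t))

    @property
    def Y(self) -> set[tuple[str, str, str]]:
        # The ternary relation Y ⊆ R × P × C, with time stamps dropped.
        return {(r, p, c) for r, p, c, _ in self.timed_edges}

    @property
    def R(self) -> set[str]:
        return {r for r, _, _ in self.Y}

    @property
    def P(self) -> set[str]:
        return {p for _, p, _ in self.Y}

    @property
    def C(self) -> set[str]:
        return {c for _, _, c in self.Y}

    def markers(self, r: str, p: str, c: str) -> set[str]:
        # All temporal markers under which the edge (r, p, c) occurs.
        return {t for r2, p2, c2, t in self.timed_edges if (r2, p2, c2) == (r, p, c)}

    def tripartite(self) -> tuple[set[tuple[str, str]], set[frozenset]]:
        # H(Y) = <V, D>. Vertices are tagged with their partition ("R", "P", "C")
        # so the three vertex sets stay disjoint even if two names coincide.
        V = {("R", r) for r in self.R} | {("P", p) for p in self.P} | {("C", c) for c in self.C}
        D = {frozenset({("R", r), ("P", p), ("C", c)}) for r, p, c in self.Y}
        return V, D
```

For instance, adding (dbp:Barack_Obama, rdf:type, yago:PresidentOfTheUnitedStates) under both "3.6" and "3.7" yields one edge in Y whose markers are {"3.6", "3.7"}. Keeping the marker inside the edge tuple is only one possible encoding of f_t.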
  9. Approach: Semantic Representation of a Tweet. Entities linked in the example tweet: <dbp:Hosni_Mubarak>, <dbp:Barack_Obama>, <dbp:Egypt>, <dbp:CNN> (dbp: http://dbpedia.org/resource/)
  10. Approach: Semantic Representation of a Tweet, Class Features (rdf:type): <dbp:Hosni_Mubarak> rdf:type <dbo:OfficeHolder>; <dbp:Barack_Obama> rdf:type <yago:NobelPeacePrizeLaureates>; <dbp:Egypt> rdf:type <dbo:Country>; <dbp:CNN> rdf:type <dbo:Broadcaster> (dbo: http://dbpedia.org/ontology/)
  11. Approach: Semantic Representation of a Tweet, Property Features: <dbp:Hosni_Mubarak> dbprop:title <dbp:Prime_Minister_of_Egypt>; <dbp:Barack_Obama> dbprop:nationality "American"; <dbp:Egypt> dbprop:languages <dbp:Egyptian_Arabic>; <dbp:CNN> dbprop:headquarters <dbp:Atlanta>
  12. Approach: Semantic Representation of a Tweet, Category Features (skos): <dbp:Hosni_Mubarak> dcterms:subject <skos:PresidentsOfEgypt>; <dbp:Barack_Obama> dcterms:subject <skos:Presidents_of_the_United_States>; <dbp:Egypt> dcterms:subject <skos:Arab_republics>; <dbp:CNN> dcterms:subject <skos:English-language_television_stations> (skos: http://dbpedia.org/resource/Category:)
  13. Approach: Semantic Representation of a Tweet, Resource Features: the resources linked to each entity, e.g. <dbp:Prime_Minister_of_Egypt> (via dbprop:title, for <dbp:Hosni_Mubarak>), <dbp:Death_Of_Osama_Bin_Laden> (via dbprop:commander, for <dbp:Barack_Obama>), <dbp:Egyptian_Arabic> (via dbprop:languages, for <dbp:Egypt>) and <dbp:Atlanta> (via dbprop:headquarters, for <dbp:CNN>)
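The four feature types on slides 10 to 13 can be illustrated by grouping the DBpedia triples of a tweet's entities. This is a minimal sketch under stated assumptions, not the authors' extraction code: the function name `semantic_features` is hypothetical, property features are taken to be predicate names, resource features are taken to be objects that are themselves dbp: resources, and only outgoing triples of the tweet's entities are counted (links pointing into an entity are ignored).

```python
from collections import Counter

RDF_TYPE = "rdf:type"
DCT_SUBJECT = "dcterms:subject"


def semantic_features(entities, triples):
    """Group the DBpedia neighbourhood of a tweet's entities into four feature bags.

    entities: DBpedia resources linked in the tweet, e.g. "dbp:Egypt".
    triples:  (subject, predicate, object) tuples from one DBpedia snapshot.
    Assumption: property features are predicate names, and resource features
    are objects that are themselves dbp: resources.
    """
    linked = set(entities)
    bags = {name: Counter() for name in ("class", "category", "property", "resource")}
    for s, p, o in triples:
        if s not in linked:
            continue  # only outgoing triples of the tweet's own entities contribute
        if p == RDF_TYPE:
            bags["class"][o] += 1
        elif p == DCT_SUBJECT:
            bags["category"][o] += 1
        else:
            bags["property"][p] += 1
            if o.startswith("dbp:"):
                bags["resource"][o] += 1  # literal values such as "American" are skipped
    return bags
```

With the example triples from the slides, <dbp:CNN> dbprop:headquarters <dbp:Atlanta> contributes dbprop:headquarters as a property feature and dbp:Atlanta as a resource feature, whereas the literal "American" contributes only the property dbprop:nationality.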
  14. Approach: DBpedia Graph Snapshots. The meta-graph of an entity e at time t is the aggregation of all resources, properties and classes related to e at time t. Example, properties and resources of <dbp:Barack_Obama> across DBpedia 3.6, 3.7, 3.8, …: prop:spouse <MichelleObama> and prop:birthPlace <Hawaii> appear in every version, while later versions add prop:commander and prop:wikiPageWikiLink links to <UnitedStatesPresidentialCandidates>, <dbp:Death_Of_Osama_Bin_Laden> and <Budget_Control_Act_of_2011>.
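A per-entity snapshot, and the way features appear or persist between releases, can be sketched as a set comparison. This is an illustrative Python sketch only: `entity_snapshot` and `snapshot_changes` are hypothetical names, triples are assumed to be stored as (subject, property, object, release) tuples, and the sample data used with it is loosely modelled on the slide rather than taken from the actual DBpedia dumps.

```python
def entity_snapshot(entity, timed_triples, version):
    """Meta-graph of `entity` at `version`: the (property, value) pairs it has in that release."""
    return {(p, o) for s, p, o, t in timed_triples if s == entity and t == version}


def snapshot_changes(entity, timed_triples, old, new):
    """Compare two snapshots of the same entity.

    kept:    features present in both releases (stable features)
    added:   features that appear only in the newer release
    removed: features that disappear in the newer release
    """
    before = entity_snapshot(entity, timed_triples, old)
    after = entity_snapshot(entity, timed_triples, new)
    return {"kept": before & after, "added": after - before, "removed": before - after}
```

With illustrative triples in which <dbp:Barack_Obama> has prop:spouse and prop:birthPlace in both 3.6 and 3.8, plus a prop:wikiPageWikiLink to <dbp:Budget_Control_Act_of_2011> only in 3.8, the two stable pairs land in `kept` and the new link lands in `added`.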
  15. Approach: Semantic Feature Weighting Strategies. Topic Relevance-based Weighting Strategy: characterise the global relevance of a semantic feature to a given topic in DBpedia at a given point in time. [figure: a topic graph within the DBpedia graph]
  16. Approach: Semantic Feature Weighting Strategies. Topic Relevance-based Weighting Strategy: • Class-based Topic Relevance (ClsW) • Property-based Topic Relevance (PropW) • Category-based Topic Relevance (CatW) • Resource Relevance (ResW)
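  17. Approach: Semantic Feature Weighting Strategies. Integrating weights into a tweet representation: the weight of a semantic feature f in a document x combines the frequency of f in x, with Laplace smoothing, and a weight derived from the DB_t graph:
      $$W_x^{DB_t}(f) = \left[ \left( \frac{N_x^{DB_t}(f) + 1}{|F| + \sum_{f' \in F} N_x^{DB_t}(f')} \right) \cdot W^{DB_t}(f) \right]^{1/2}$$
      where N_x^{DB_t}(f) is the frequency of f in x, F is the set of semantic features, and W^{DB_t}(f) is the weight derived from the DB_t graph.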
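The weighting formula can be checked numerically with a direct transcription. This is a sketch, assuming the graph-derived weight W^{DB_t}(f) has already been computed by one of the topic-relevance strategies (ClsW, PropW, CatW or ResW) and is passed in as a dictionary; the function name `feature_weight` and its parameters are hypothetical.

```python
import math


def feature_weight(f, doc_counts, feature_space, graph_weight):
    """Weight W_x^{DB_t}(f) of semantic feature f in tweet x.

    doc_counts:    N_x(f'), occurrences of each feature f' in the tweet x
    feature_space: the feature set F
    graph_weight:  W^{DB_t}(f), assumed precomputed from the DB_t graph
    """
    total = sum(doc_counts.get(g, 0) for g in feature_space)
    # Laplace-smoothed relative frequency of f in x.
    smoothed = (doc_counts.get(f, 0) + 1) / (len(feature_space) + total)
    # Multiply by the graph-derived weight and take the square root.
    return math.sqrt(smoothed * graph_weight[f])
```

As a worked example, with |F| = 4, N_x(a) = 3, a total of 4 feature occurrences in x and W^{DB_t}(a) = 0.5, the weight is sqrt((3 + 1) / (4 + 4) × 0.5) = sqrt(0.25) = 0.5. Smoothing means a feature absent from x still receives a non-zero weight whenever its graph weight is non-zero.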
  18. Experiments: Framework for Twitter Topic Classification with DBpedia • Do semantic features built from DBpedia graphs aid cross-epoch topic classification of tweets? • Which feature type provides a more stable topic representation over time?
  19. Experiments: Framework for Twitter Topic Classification with DBpedia [figure: microposts from 2010, 2011 and 2013 alongside DBpedia dumps 3.6, 3.7, 3.8 and 3.9 used as resources]
  20. Experiments: Datasets. Tweets: 1×10^6 each from 2010 (Nov-Dec), 2011 (Aug) and 2013 (Sep). A topic label is assigned from a pool of over 10 categories. Violence-related topics: Disaster and Accident (D&A), Law and Crime (L&C), War and Conflict (W&C). Manual annotation was performed until 1K tweets per year per topic were reached, plus a negative set of 1K tweets per year on topics other than these three, giving (3 topics + 1 negative set) × 3 years × 1K = 12K annotated tweets.
  21. Experiments: Framework for Twitter Topic Classification with DBpedia. Concept Enrichment: microposts from 2010, 2011 and 2013 are linked to DBpedia resources such as <dbp:Hosni_Mubarak>, <dbp:Barack_Obama>, <dbp:Egypt> and <dbp:CNN>, with DBpedia dumps 3.6 to 3.9 as resources.
  22. Experiments: Framework for Twitter Topic Classification with DBpedia. After Concept Enrichment: Resource Backtrack Mapping, then Deriving Semantic Graph Snapshots for 2010, 2011 and 2013.
  23. Experiments: Framework for Twitter Topic Classification with DBpedia. Concept Enrichment, Resource Backtrack Mapping and Deriving Semantic Graph Snapshots (2010, 2011, 2013) are followed by DBpedia Topic Relevance-based Feature Weighting.
  24. Experiments: Datasets [figure: lexical (LEX: BoW) and semantic (Category, Property, Resource, Class) feature representations for W&C, D&A, L&C and NEG in 2010, 2011 and 2013]
  25. Experiments: Framework for Twitter Topic Classification with DBpedia. Full pipeline: Concept Enrichment, Resource Backtrack Mapping, Deriving Semantic Graph Snapshots (2010, 2011, 2013) and DBpedia Topic Relevance-based Feature Weighting; the topic-labelled microposts from 2010, 2011 and 2013 are then used to build the topic classifier.
  26. Experiments: Understanding the Stability of a Topic Representation. Same-epoch scenario: lexical, semantic and combined representations are trained and tested on data from the same epoch t.
  27. Experiments: Epoch Scenarios. Same-epoch scenario (trained on 2010, tested on 2010). All the experiments reported in our paper were conducted using a 10-fold cross-validation setting.
      Feature   | Disaster_Acc F1 | Law_Crime F1 | War_Conflict F1
      BoW       | 0.831           | 0.765        | 0.844
      Category  | 0.697           | 0.650        | 0.744
      Property  | 0.680           | 0.639        | 0.720
      Resource  | 0.692           | 0.637        | 0.762
      Class     | 0.633           | 0.583        | 0.637
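The 10-fold cross-validation setting used for the same-epoch scenario can be sketched as a plain, unstratified index split. This is an assumed illustration in pure Python; the slides do not say which implementation or stratification the authors used, and `k_fold_indices` is a hypothetical helper.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation over n examples."""
    if not 1 < k <= n:
        raise ValueError("k must satisfy 1 < k <= n")
    order = list(range(n))
    random.Random(seed).shuffle(order)  # fixed seed for a reproducible split
    # Round-robin assignment: fold sizes differ by at most one example.
    folds = [order[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test
```

Each example is used for testing exactly once across the k folds, and the reported F1 would then be averaged over the folds. A stratified split, which keeps the topic/negative ratio equal in every fold, is a common alternative.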
  28. Experiments: Understanding the Stability of a Topic Representation. Same-epoch scenario: lexical, semantic and combined representations are trained and tested on epoch t. Cross-epoch scenario: they are trained on epoch t and tested on epoch t+1.
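The cross-epoch scenario pairs each training epoch with every later test epoch; for the three epochs used here this gives exactly the 2010-2011, 2010-2013 and 2011-2013 columns reported for Disaster_Acc. The sketch below is a hypothetical, pure-Python evaluation loop: `train_fn` stands in for any classifier that returns a prediction function, and binary F1 is computed for the positive (topic) class. It illustrates the protocol, not the authors' code.

```python
from itertools import combinations

EPOCHS = [2010, 2011, 2013]


def cross_epoch_pairs(epochs):
    """(train, test) pairs where the test epoch is later than the training epoch."""
    return list(combinations(sorted(epochs), 2))


def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0  # also avoids division by zero when nothing is predicted positive
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cross_epoch_f1(data, train_fn, epochs=EPOCHS):
    """data maps epoch -> (X, y); train_fn(X, y) returns a predict(X) callable."""
    scores = {}
    for train_ep, test_ep in cross_epoch_pairs(epochs):
        X_tr, y_tr = data[train_ep]
        X_te, y_te = data[test_ep]
        predict = train_fn(X_tr, y_tr)  # fit only on the earlier epoch
        scores[(train_ep, test_ep)] = f1_score(y_te, predict(X_te))
    return scores
```

Training strictly on an earlier epoch and testing on a later one is what exposes feature decay: a representation whose features drift between epochs loses F1 across the pairs even if it scores well in the same-epoch setting.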
  29. Experiments: Epoch Scenarios. Cross-epoch scenario for Disaster_Acc (trained on one epoch, tested on a later one):
      Feature   | 2010-2011 F1 | 2010-2013 F1 | 2011-2013 F1 | Average F1
      BoW       | 0.634        | 0.481        | 0.261        | 0.458
      Category  | 0.683        | 0.539        | 0.524        | 0.582
      Property  | 0.665        | 0.557        | 0.502        | 0.603
      Resource  | 0.774        | 0.544        | 0.445        | 0.587
      Class     | 0.691        | 0.665        | 0.669        | 0.675
  30. Experiments: Epoch Scenarios. Averaged cross-epoch scenarios:
      Feature   | Disaster_Acc F1 | Law_Crime F1 | War_Conflict F1 | Average F1
      BoW       | 0.458           | 0.620        | 0.531           | 0.536
      Category  | 0.582           | 0.537        | 0.453           | 0.55
      Property  | 0.574           | 0.504        | 0.506           | 0.528
      Resource  | 0.587           | 0.578        | 0.466           | 0.544
      Class     | 0.675           | 0.647        | 0.664           | 0.665
  31. Conclusions • Semantic features decay much more slowly than lexical features. • Semantic representations improve performance in cross-epoch scenarios. • Class-based features alone achieve, on average, a gain of 7% over lexical features in cross-epoch scenarios.
  32. Future Work • Concept-drift tracking for transfer learning using Linked Data sources. • Study cross-epoch transfer learning approaches using semantic features.
  33. Questions? ampaeli@gmail.com, @pixarelli
