Steffen Staab
staab@uni-koblenz.de
WeST
Web Science & Technologies
University of Koblenz ▪ Landau, Germany

Modelling the Web: Examples of Modelling Text, Knowledge Networks and Physical-Social Systems


  1. Steffen Staab, staab@uni-koblenz.de. WeST, Web Science & Technologies, University of Koblenz ▪ Landau, Germany. Modelling the Web: Examples of Modelling Text, Knowledge Networks and Physical-Social Systems.
  2. What do people want from the Web? Web as storage: library, memory. Web as tool: search, transaction. Web as social medium: communication, cooperation. Web as mirror of self: identification, outreach.
  3. What are some of the footprints people leave?
  4. My Agenda in the Large. Web Content: discovering patterns, building tools, understanding. Web Interaction: monitoring, exploiting, guiding, understanding. Web Evolution: monitoring, predicting, guiding, understanding.
  5. My Agenda for Today (Web Content, Web Interaction, Web Evolution): 1. Modelling Text; 2. Modelling Network Evolution; 3. Modelling Physical-Social Data.
  6. My Agenda for Today (Web Content, Web Interaction, Web Evolution): 1. Modelling Text; 2. Modelling Network Evolution; 3. Modelling Physical-Social Data.
  7. Autocompletion of queries: "UK is"?
  8. Language Models. What follows "UK is"? The answer is modelled as a conditional probability (formula on slide). Issue: long word sequences can rarely be observed.
  9. Modified Kneser-Ney Smoothing of n-grams. If a sequence is hard to observe, then approximate it recursively by observing marginal frequencies of ...
  10. Modified Kneser-Ney Smoothing of n-grams. If a sequence is hard to observe, then approximate it recursively by observing marginal frequencies of ... First recursion step: (formula on slide). Problem: if the last word in the sequence is rare, the overall sequence will be rare, and the approximation will be of low quality.
  11. Generalized Language Models [ACL14]. If a sequence is too hard to observe, then approximate it based on marginal probabilities of ... recursively. Core idea of the formal solution: recursively applicable, commutative skip operators.
  12. Improvement of GLMs [ACL14]. Evaluation measure: perplexity. Data set: English Wikipedia, different sample sizes. Relative improvement: 2.6% (most training data, smallest model) to 13.9% (least training data, largest model). Chart axis: perplexity (normalized).
  13. Outlook for Generalized Language Models. Correcting mistakes that are made in all tools. Lack of appropriate models. Other operators (example: "the wild black cat"): delete gives "the black cat"; part-of-speech gives "the adj adj cat". Application: e.g. next word prediction. Other data structures: tree-like data, graph data. Slide annotations: proposal for Google; current focus; Semantic Web.
  14. My Agenda for Today (Web Content, Web Interaction, Web Evolution): 1. Modelling Text; 2. Modelling Network Evolution; 3. Modelling Physical-Social Data.
  15. Evolution of Networks [ICWSM 2013]. Diagram labels: Additions, Removals, Training. Link Prediction Problem. Unlink Prediction Problem. Markov assumption: history irrelevant.
  16. Related Work in Brief. A prediction feature f assigns a score to a node pair (i, j). f(i, j) > f(i, k) implies that (i, j) is to be ranked above (i, k). Link prediction: edge likelier to be added. Unlink prediction: edge likelier to be removed.
  17. Related Work in Brief. Static features: degree, common-neighbours, path3, local-clustering-coefficient/embeddedness, ... A prediction feature f assigns a score to a node pair (i, j). f(i, j) > f(i, k) implies that (i, j) is to be ranked above (i, k). Link prediction: edge likelier to be added. Unlink prediction: edge likelier to be removed.
  18. The Snapshot View. Link and unlink prediction (ICWSM 2013). Unlink prediction is much more difficult than link prediction.
  19. Related Work in Brief. Diagram labels: Additions, Removals, Training. Link Prediction Problem. Unlink Prediction Problem. Markov assumption: history irrelevant. Advantage: general model. Disadvantage: general model. Idea: keep generality, improve prediction.
  20. Our Approach - 1. Diagram labels: Additions, Removals, Training. Link Prediction Problem. Unlink Prediction Problem. Markov assumption: history irrelevant. Hypothesis: temporal information generally improves prediction. Idea: 1. nodes concerned; 2. neighbourhood.
  21. Our Approach - 2. Dynamic features: + recency, + longevity. Extrapolation for temporal preferential attachment: (formula on slide).
  22. Evaluation & Discussion (excerpt). Temporal link prediction is significantly better, but only slightly. Temporal unlink prediction is always significantly improved. Temporal preferential attachment is best. Chart labels: AUC; baseline, qualitative, quantitative, extrapolation.
  23. Outlook for Evolution of Networks. Temporal dynamics still underexplored: lack of datasets! Next experiments: Twitter followers, Xing.de. Unlinks lead to link recommendation: new Wikipedia link (reorganization of Wikipedia pages!), new job, new friend.
  24. My Agenda for Today (Web Content, Web Interaction, Web Evolution): 1. Modelling Text; 2. Modelling Network Evolution; 3. Modelling Physical-Social Data.
  25. Tagged photos with geo-coordinates from Flickr. Figure: map of photo tag sets such as "fish, rice", "seafood, fish", "seafood, shrimp", "lobster, wine", "seafood, fish, salmon", "coffee, wine", "pizza, wine", "pasta, wine".
  26. Tasks: discovering topics, finding clusters. Figure: the tag map with topic words seafood, fish, lobster, shrimp, crab, wine, salmon, wine, pizza, coffee, italian, pasta.
  27. Challenge: cultural areas, country borders, geographical features and other geographical observations exhibit complex spatial distributions. Image source: wikipedia.org.
  28. Existing approaches: Gaussian regions. A. Ahmed, L. Hong and A. Smola, 2013 (following Yin et al. 2011; Sizov 2010). Figure: the tag map with topic words.
  29. MGTM 1: Global Topic Clustering. Figure: the tag map with topic words.
  30. MGTM 2: Determining Neighbourhoods. Figure: the tag map with topic words.
  31. MGTM 3: Derived Topic Model. Cluster adjacency. Dependencies of document-specific topic distributions. Exchange of topic information between clusters.
  32. MGTM 4: Exchange of Topic Information. Exchange of topic information between clusters.
  33. MGTM 4: Exchange of Topic Information. Exchange of topic information between clusters.
  34. MGTM 4: Exchange of Topic Information. Exchange of topic information between clusters.
  35. Evaluation: Anecdotal, Perplexity, Gaming. Gaming study: intrusion detection. Precision with 8 topics (avg / median): LGTA 0.60 / 0.58; Basic model 0.64 / 0.58; MGTM 0.78 / 0.75.
  36. Outlook for LDA with structure. Texts + social network structures: scientometrics, xing.de. Web pages + user visits: chefkoch.de.
  37. Future: Knowledge about social aspects needed. Future: CS style models for social sciences.
  38. References. [ACL14] R. Pickhardt, T. Gottron, M. Körner, P. G. Wagner, T. Speicher, S. Staab. A Generalized Language Model as the Combination of Skipped n-grams and Modified Kneser Ney Smoothing. In: Proc. of ACL-2014, The 52nd Annual Meeting of the Association for Computational Linguistics, Baltimore, June 22-27, 2014. [WSDM14] C. Kling, J. Kunegis, S. Sizov, S. Staab. Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections. In: Proc. of the 7th ACM Conference on Web Search and Data Mining (WSDM 2014), New York, US, February 24-28, 2014. [ICWSM13] J. Preusse, J. Kunegis, M. Thimm, T. Gottron, S. Staab. Structural Changes in Collaborative Knowledge Networks. In: Proceedings of the Seventh International AAAI Conference on Weblogs and Social Media (ICWSM 2013), Boston, July 8-10, 2013.
  39. Web Science & Technologies, Team & Research: Semantic Web; Social Web & Web Retrieval; Interactive Web & Human Computing; Web & Economy; Software & Services; Computational Social Science. Thank You!
  40. Maslow's pyramid of needs
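
Slide 12 names perplexity as the evaluation measure and reports relative improvements between 2.6% and 13.9%. As a rough illustration of what those two quantities mean, the following Python sketch computes perplexity from per-word probabilities using its standard definition (the exponential of the average negative log-probability) and a relative improvement between two perplexity values. The function names and the example numbers are assumptions made for illustration; they are not taken from the slides or from [ACL14].

```python
import math


def perplexity(word_probabilities):
    """Standard perplexity: exp of the mean negative log-probability per word.

    `word_probabilities` holds the probability a language model assigned to
    each word of a test sequence (hypothetical input, not data from the talk).
    """
    if not word_probabilities:
        raise ValueError("need at least one probability")
    log_sum = sum(math.log(p) for p in word_probabilities)
    return math.exp(-log_sum / len(word_probabilities))


def relative_improvement(baseline_perplexity, new_perplexity):
    """Fraction by which the new model lowers perplexity versus the baseline."""
    return (baseline_perplexity - new_perplexity) / baseline_perplexity


if __name__ == "__main__":
    # Illustrative probabilities only: a model that gives every word
    # probability 0.25 is as uncertain as a uniform choice among 4 words.
    uniform_like = perplexity([0.25, 0.25, 0.25, 0.25])
    print(f"perplexity: {uniform_like:.3f}")

    # Made-up perplexity values, used only to show the arithmetic.
    print(f"relative improvement: {relative_improvement(100.0, 90.0):.1%}")
```

Lower perplexity means the model was less surprised by the test text, so a positive relative improvement indicates a better model under this measure.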

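Slides 16 and 17 describe a prediction feature f that scores node pairs, with f(i, j) > f(i, k) meaning that (i, j) is ranked above (i, k), and list common-neighbours among the static features. A minimal Python sketch of that idea for link prediction follows, assuming an undirected graph stored as a dictionary of neighbour sets. The graph, the function names and the tie-breaking order are hypothetical choices for illustration, not the setup used in [ICWSM13].

```python
from itertools import combinations


def common_neighbours(adjacency, i, j):
    """Static feature: number of neighbours shared by nodes i and j."""
    return len(adjacency[i] & adjacency[j])


def rank_candidate_links(adjacency, feature=common_neighbours):
    """Rank all currently unlinked node pairs by descending feature score.

    A pair with a higher score is ranked above a pair with a lower score,
    i.e. it is considered likelier to be added as an edge.
    Ties keep their alphabetical order because Python's sort is stable.
    """
    candidates = [
        (i, j)
        for i, j in combinations(sorted(adjacency), 2)
        if j not in adjacency[i]
    ]
    return sorted(candidates, key=lambda pair: feature(adjacency, *pair), reverse=True)


# Hypothetical undirected example graph (each edge is listed in both directions).
example_graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

if __name__ == "__main__":
    for i, j in rank_candidate_links(example_graph):
        print(i, j, common_neighbours(example_graph, i, j))
```

For unlink prediction the same ranking scheme would be applied to existing edges instead of unlinked pairs, with the score read as how likely an edge is to be removed; which features suit that task is a separate question that this sketch does not address.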