1. Power-law Exponent: Number of neighbors is unevenly distributed: Epinions trust net...
1. Power-law Exponent over Time: Epinions trust network (Massa et al. 2005). γ shrinks ⇒ Network bec...
2. Weighted Spectral Distribution • Consider the n×n matrix N defined by Nij = 1 / sqrt(d(i)d(j)) when (i,j...
2. Weighted Spectral Distribution over Time: CiteULike user–tag network (Emamy et al. 2007) • The WSD shifts ...
3. Network Entropy: G = G1 ∪ G2 ∪ . . . ∪ Gr • Write the graph G as a sum of subgraphs Gk. Each Gk has w...
3. Network Entropy over Time: Enron email network (Klimt et al. 2004) ...
4. Network Rank: Decompose network into subcommunities: G = G1 ∪ G2 ∪ . . . ∪ Gr. The rank r is a measu...
4. Network Rank over Time: Network rank (rank∗(G)), Enron email network (Klimt et a...
More Network Rank Plots: Epinions trust network, hep-th ...
Conclusion • Power-law exponent shrinks – Connection diversity shrinking • Weighted spectral distribution shifts to zer...
Watch out! KONECT – Koblenz Network Collection: http://uni-koblenz.de/~kunegis/paper/kunegis-konect.poster.pdf Coming soon!...
Why has the sky the density it has?
Why do tagging systems have so little spam? Administrative Process, Conten...
Agenda • Risks and Opportunities in Social Communities: the ROBUST project • Web Science Methodology: An explanation by an...
Yahoo Answers • Ensure quality of user generated content • Use of administrators and community moderators: How? • ...
SURVEY OF GOVERNANCE MODELS
Communities need Governance: Steering and coordinating actions of community members ...
Motivation: Different types of Web communities • User-generated content (video, photos, comment, article, quest...
Means of Governance: 1. Direct intervention of community owner: Affecting content or users based on apparent properties 2. F...
Method: Selection of 250 most prominent web sites with community functionality according to Alexa Page Rank. Clustering web s...
Key Results: (1) Abuse Reports are a successful means of governance. • 16 occurrences • Restricted to filter out unwante...
Key Results: (3) Creation time is most implemented ranking criterion • 18 occurrences • Others: score: 8, ratings: 6 ...
GOVERNANCE MODEL: DEEP DIVE - SIMULATION
Methodology Principle: 1. Define a Web Community model (Lycos IQ, Yahoo Answers…) 2. Adapt this model to an existing ...
Dataset Lycos IQ: Time period: 909 days; Users: 34...
Observed parameters (input to simulation) [Plot] ...
Example Behaviors and Example Policies: Behaviors of Ordinary Users; Reading Policies for • Crea...
How many administrators are needed? [Plot] ...
Fighting spam with administrators… [Plot] ...
Fighting spam with user moderators… [Plot] ...
Lessons Learned • Strategy of selecting questionable postings is crucial • Reporting by normal users is the most effective...
Agenda • Risks and Opportunities in Social Communities: the ROBUST project • Web Science Methodology: An explanation by an...
Are we satisfied here? No! Not by far! Understand how and why users tag or tweet? -> What are people's limitations that af...
Web Science & Technologies, University of Koblenz ▪ Landau, Germany: Thank You!
References: The Slashdot Zoo: Mining a social network with negative edges. J. Kunegis, A. Lommatzsch and C. Bauckhage. In Proc. ...
References: B. Viswanath, A. Mislove, M. Cha, K. P. Gummadi. On the evolution of user interaction in Facebook. In Proc. Worksh...
References: K. Dellschaft, S. Staab. An Epistemic Dynamic Model for Tagging Systems. HYPERTEXT 2008, Proceedings of the ...
Managing Social Communities

  1. Web Science & Technologies, University of Koblenz ▪ Landau, Germany
     Managing Social Communities
     Steffen Staab
     Acknowledgements to the ROBUST Project team & WeST Team, in particular K. Dellschaft, J. Kunegis, F. Schwagereit
  2. Institute WeST – Web Science & Technologies
     • Semantic Web, Web Retrieval, Interactive Web, Multimedia Web, Software Web
     • eGovernment, eMedia, eScience, eOrganizations, ePerson
     • Institute for Computer Science, Institute for Information Systems, Leibniz Institute for Social Sciences (GESIS)
     Steffen Staab, staab@uni-koblenz.de, Web Science Doctoral Summer School
  3. Plan for this Talk: 1 Web, 2 Science
  4. Social Communities …are everywhere
  5. Risks and Opportunities
     • Risks: Bad content quality, social ill behavior,… → jeopardize business value
     • Opportunities: Open innovation, improved user support,… → increase business value
     • Data Storage and Processing; Content, User & Networks Analysis: scalability, heterogeneity, understanding, response time
     • Business Value: Product support & innovation, CRM, expertise management, marketing, advertising
     • Online Communities: Intranet, Extranet, Internet
  6. Large-scale Testbeds
     • SAP (B2B): Community Network, Business Partner Network, Developer Network. 2009: 1.5M users, 150K accesses/day. 2013: 5M users, 1200K accesses/day
     • Polecat (C2C): Online Marketing, CRM for IT. 2009: … 2013: millions of posts/day, 1TB data/day
     • IBM (E2E): Corporate Knowledge Management. 2009: 99K accounts. 2013: 800K accounts
  7. SAP Business Partner Use Case: SAP Developer Network
                                                2007    2009    2013
     Posts per day                              5000    6000    7000
     Size of user generated content (posts)     1M      4M      10.0M
     Number of users                            1M      1.7M    4.8M
  8. ROBUST: IBM Employee Use Case
                              Created per day          Number of users
     Business Data            2007   2009   2013       2007      2009      2013
     IBM Activities Entry     700    2750   5000       53200     143600    200000
     IBM Blogs Entries        120    30     60         34600     77750     100000
     IBM Communities          3      23     50         3000      181950    250000
     IBM Bookmarks            800    900    1000       8500      22400     50000
     IBM Wikis                NA     40     100        NA        35450     100000
     IBM Files                NA     290    1000       NA        45160     100000
     IBM Overall              1623   4033   7210       500000*   500000*   500000*
  9. Risks in Online Communities
     Definition: Risk
     • Likelihood: probability of an event occurring
     • Impact of the event occurring
     Risk management (cost, benefit)
     • Process for managing costs, benefits and likelihoods
     • Detect high impact risks in time, even if they generate expensive false alarms
     • Ignore very low impact risks, even if they can be reliably detected
     Types of risks
     • Non-compliance with the community policies/polity
     • Scamming or spamming behavior
     • Lower involvement and productivity
     • Decrease of user satisfaction
     • Loss of community dynamics
     Examples
     • SAP: SCN Award Points Scamming. Experts' reputation decreases; business users leave the forum
     • Web: Public communities. Death of TechCrunch forum due to spam and lack of management
     • Loss of 1% experts → loss of high revenue; loss of 10% lurkers → low impact
  10. Communities: dynamics and confidentiality
     ROBUST supports decision making for users, hosts and service providers
     Managing growth & decline
     • Identify, encourage, safeguard core users
     • Social matching
     • Define/maintain etiquette and policies
     • Manage negative behavior and conflicts
     • Content matching
     • Recognize, categorize decline and growth
     • Redirect users to other communities
     Merging communities
     • Cross community topic detection to stimulate inter-community interactions
     Splitting communities
     • Identification of clusters/compartments of members that can be separated
  11. Agenda
     • Risks and Opportunities in Social Communities: the ROBUST project
     • Many related talks in this Summer School
       – ROBUST partners: Alani (Monitoring and analysis of social networks), Karnstedt (User churn)
       – Closely related: Greene (Network Analysis), Bernstein (Scalable infrastructures)
     But here comes the biased account from work in our institute
  12. Plan for this Talk: 1 Web, 2 Science
  13. [Image of a black hole. Flickr cc, Jan 7 2009 by thebadastronomer]
  14. Agenda
     • Risks and Opportunities in Social Communities: the ROBUST project
     • Web Science Methodology: An explanation by analogy with Physics and some initial (!) applications to online communities
       – Modeling dynamic systems at the micro level; understanding collective effects (macro level) arising from individual behavior (micro level)
       – Predicting dynamic system behavior, recognizing behavior deviating from the model
       – Modeling dynamic system behavior at the macro level
       – Controlling dynamic system behavior by collective action
  15. Better understanding of the tagging process
     • Cooperative classification of resources
     • Which factors influence the tagging process?
       – Background knowledge of the user?
       – Tag assignments of other users?
     Hypothesis: Tagging involves imitation of other users AND selection of tags from background knowledge of users.
  16. Methodology
     [Diagram. Observed side: user interface, conceptualization, own knowledge, shared terminology, something else? → tagging behavior. Model side: model of user interface influence, model of own knowledge, model of sharing → joint stochastic model → simulated tagging behavior. The two sides are linked by a comparison of statistics.]
  17. Components of Analysis
     Properties of Tag Streams (observations in the real world)
     • Stream view of folksonomies
     • Co-occurrence streams
     • Resource streams
     Dynamic model for Tagging Systems (stochastic models of influence)
     • Simulating background knowledge
     • Simulating tag imitation
     Simulation Results (which models best fit the reality?)
     • Co-occurrence streams
     • Resource streams
  18. Stream Views of a Folksonomy
     Folksonomies:
     • Vertices: Users, tags, resources
     • Edges: Tag assignments
     • Postings:
       – Tag assignments of a user to a single resource
       – Can be ordered according to their time-stamp
  19. Co-occurrence Streams
     Co-occurrence streams:
     • All tags co-occurring with a given tag in a posting
     • Ordered by posting time
     Co-occurrence stream for apple:
     • {mackz, r1, {apple, tree}, 13:25}
     • {klaasd, r2, {apple, mac, ibook}, 13:26}
     • {mackz, r2, {apple, macintosh, stevejobs}, 13:27}
     → tree, mac, ibook, macintosh, stevejobs
     Tag     |Y|          |U|        |T|        |R|
     ajax    2,949,614    88,526     41,898     71,525
     blog    6,098,471    158,578    186,043    557,017
     xml     974,866      44,326     31,998     61,843
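The definition of a co-occurrence stream can be sketched in a few lines of Python. This is only an illustration: the `Posting` type and the function name `cooccurrence_stream` are hypothetical, and timestamps are assumed to be sortable "HH:MM" strings from a single day, as in the slide's example.

```python
from typing import NamedTuple


class Posting(NamedTuple):
    """One posting: all tags a user assigned to a single resource."""
    user: str
    resource: str
    tags: tuple
    time: str  # assumption: "HH:MM" within one day, so text order equals time order


# The three postings shown on the slide.
POSTINGS = [
    Posting("mackz", "r1", ("apple", "tree"), "13:25"),
    Posting("klaasd", "r2", ("apple", "mac", "ibook"), "13:26"),
    Posting("mackz", "r2", ("apple", "macintosh", "stevejobs"), "13:27"),
]


def cooccurrence_stream(postings, tag):
    """Return all tags that co-occur with `tag`, ordered by posting time."""
    stream = []
    for posting in sorted(postings, key=lambda p: p.time):
        if tag in posting.tags:
            # Keep every other tag of the posting, in the order it was given.
            stream.extend(t for t in posting.tags if t != tag)
    return stream


print(cooccurrence_stream(POSTINGS, "apple"))
```

For the tag `apple` this reproduces the stream given on the slide: tree, mac, ibook, macintosh, stevejobs.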
  20. Properties of Co-occurrence Streams – Tag Growth
     [Plot, annotated "linear growth"]
  21. Properties of Co-occurrence Streams – Tag Frequencies
     [Plot, annotated "power law"]
  22. Resource Streams
     Resource streams:
     • All tags assigned to a resource
     • Ordered by posting time
     Resource stream for r2:
     • {mackz, r1, {apple, tree}, 13:25}
     • {klaasd, r2, {apple, mac, ibook}, 13:26}
     • {mackz, r2, {apple, macintosh, stevejobs}, 13:27}
     → apple, mac, ibook, apple, macintosh, stevejobs
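A resource stream can be derived from the same postings in much the same way. The sketch below is illustrative only: the tuple layout `(user, resource, tags, time)` and the name `resource_stream` are assumptions, not part of the slides. It also counts tag frequencies, which is the property examined for resource streams.

```python
from collections import Counter

# Postings from the slide as (user, resource, tags, time) tuples.
POSTINGS = [
    ("mackz", "r1", ("apple", "tree"), "13:25"),
    ("klaasd", "r2", ("apple", "mac", "ibook"), "13:26"),
    ("mackz", "r2", ("apple", "macintosh", "stevejobs"), "13:27"),
]


def resource_stream(postings, resource):
    """Return all tags assigned to `resource`, ordered by posting time."""
    stream = []
    for _user, res, tags, _time in sorted(postings, key=lambda p: p[3]):
        if res == resource:
            stream.extend(tags)  # repeated tags are kept, unlike in a tag set
    return stream


stream_r2 = resource_stream(POSTINGS, "r2")
print(stream_r2)
# Tag frequencies within the stream, most frequent first.
print(Counter(stream_r2).most_common())
```

Unlike the co-occurrence stream, repeated tags such as `apple` stay in the stream, so frequencies can be read off directly.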
  23. Properties of Resource Streams – Tag Frequencies
     [Plot]
  24. Properties of Resource Streams – Tag Frequencies
     [Plot]
  25. Web Science & Technologies, University of Koblenz ▪ Landau, Germany
     Simulating the Evolution of Tag Streams
  26. Simulating tag streams
     How do I tag this web page? Which of my concepts represent this web page?
     Inspiration for conceptualization from:
     1. Most popular tags
     2. Most recently used tags
     3. Tags used for this resource
     4. Tags co-occurring with similar text documents
     5. Creating completely new tags
     6. …
     Which combination of inspirations produces the same statistics as those observed for Delicious?
  27. The Delicious User Interface
     Imitating previous tag assignments:
     • Recommended tags: Intersection of tags of a user and tags already assigned to the resource.
     • Your tags: Tags of the user.
     • Popular tags: 7 most popular tags assigned to the resource.
  28. Simulating a Tag Stream
     • Start with an empty tag stream
     • Each simulation step appends a new tag assignment
     Simulation of a single tag assignment:
     • p(w|t): Probability of selecting word w for topic t. Modeled by word distributions in a topic-centered text corpus.
     • n: Number of visible previous tags.
     • h: Maximal number of previous tag assignments used for determining the ranking of the n distinct tags.
  29. Modeling Background Knowledge
     [Diagram: text corpora, Del.icio.us]
     P_BK: Probability of selecting from background knowledge
     • p(w|t): Probability of selecting word w for topic t. Modeled by word distributions in a topic-centered text corpus.
     • p(w|r): Probability of selecting word w for resource r.
  30. Modeling Tag Imitation
     [Diagram: tag stream positions t, t−1, t−2, …, t−h; branches P_BK and 1 − P_BK; visible ranks 1, 2, 3, …, n]
     P_I = 1 − P_BK: Probability of imitating a previous tag assignment
     • n: Number of visible top-ranked tags
     • h: Maximal number of previous tag assignments used for determining the ranking of the n distinct tags
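The simulation loop described on slides 28 to 30 might look roughly like the following sketch. Several details are assumptions made only for illustration and are not stated on the slides: the background distribution p(w|t) is replaced by a synthetic Zipf-like distribution over a made-up vocabulary, and an imitated tag is drawn from the n visible tags in proportion to its count among the last h assignments. The function name `simulate_tag_stream` is hypothetical.

```python
import random
from collections import Counter


def simulate_tag_stream(steps, p_bk, n, h, vocabulary, weights, seed=0):
    """Append one tag assignment per step, mixing background knowledge and imitation.

    p_bk: probability of selecting from background knowledge (P_BK)
    n:    number of visible top-ranked tags
    h:    number of previous tag assignments used for the ranking (h >= 1)
    """
    rng = random.Random(seed)
    stream = []
    for _ in range(steps):
        recent = stream[-h:]
        if not recent or rng.random() < p_bk:
            # Background knowledge: draw a word from the stand-in for p(w|t).
            tag = rng.choices(vocabulary, weights=weights, k=1)[0]
        else:
            # Imitation (probability 1 - P_BK): rank the distinct tags of the
            # last h assignments and pick among the n top-ranked ones.
            # Assumption: the choice is proportional to the tag's count.
            visible = Counter(recent).most_common(n)
            tags, counts = zip(*visible)
            tag = rng.choices(tags, weights=counts, k=1)[0]
        stream.append(tag)
    return stream


# Synthetic, hypothetical background knowledge: 1000 words with Zipf-like weights.
vocabulary = [f"w{i}" for i in range(1, 1001)]
weights = [1.0 / i for i in range(1, 1001)]

stream = simulate_tag_stream(5000, p_bk=0.2, n=50, h=500,
                             vocabulary=vocabulary, weights=weights)
print(len(set(stream)), Counter(stream).most_common(5))
```

Varying `p_bk`, `n` and `h` and comparing the resulting tag growth and tag frequency curves with observed streams is the kind of experiment the following slides report.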
  31. Web Science & Technologies, University of Koblenz ▪ Landau, Germany
     Simulation Results
  32. Overall Scheme
     [Diagram: same scheme as on slide 16]
  33. Simulating Co-occurrence Streams
     Tag growth:
     • Influenced by P_BK and p(w|t)
     Tag frequencies:
     • Influenced by P_BK, p(w|t), n, h
     • n: Semantic breadth of a topic (blog: 100 tags, ajax: 50 tags, xml: 50 tags; Cattuto et al. 2007)
     • h: No hint for realistic values. Good guesses may be 500 and 1000.
  34. Co-occ. Streams – Simulated Tag Growth
     [Plot]
  35. Co-occ. Stream – Simulated Tag Frequencies
     [Plot]
  36. Simulating Resource Streams
     • P_I and P_BK: Values comparable to co-occurrence streams
     • p(w|r): Approximated by p(w|t)
     • n: 7 tags are visible (cf. Delicious user interface)
     • h: Smaller value than for co-occurrence streams
  37. Res. Streams – Simulated Tag Frequencies
     [Plot]
  38. Lessons learned [Dellschaft+Staab, ACM Hypertext 2008]
     • Black holes do not only eat mass, they also dissolve by emitting radiation
     • Imitation AND background knowledge are needed for explaining properties of tag streams
     • Probability of imitating previous tag assignments: ~70-90%
                                    Frequency rank       Frequency rank
     Model                          (co-occ. streams)    (resource streams)    Tag growth
     Polya Urn Model                o                    o                     fixed size
     Simon Model                    o                    o                     linear
     YS Model w/ Memory             +                    o                     linear
     Halpin et al. Model            o                    o                     linear
     Epistemic Model (our model)    +                    +                     power-law
  39. Solar System
     [Image with labels Neptune, Uranus, Jupiter, Saturn. Flickr, cc Sep 1 2008 by Image Editor]
  40. Agenda (repeated from slide 14)
  41. Overall Scheme
     [Diagram: same scheme as on slide 16]
  42. What is our Uranus? What is this?
  43. Uranus = Spam [Dellschaft+Staab, WebSci 2010]
     Effect of removing 257 spammers of 12,777 users from the ‘bookmark’ stream
     [Plot]
  44. Why care? The Bibsonomy Example
     • Complete snapshot of the Bibsonomy system
     • Manually labeled ground truth of spammers in the data set
                     Users     Tags       Resources    TAS (tag assignments)
     Spammers        29,248    297,846    1,197,354    13,258,759
     Non-Spammers    2,467     61,154     234,143      816,196
  45. Why care? The Delicious Example
     Crawled during the TAGora Project
     Users      Tags         Resources     TAS
     532,938    2,482,850    18,778,566    140,305,446
     Number of spammers not known exactly. Estimation based on a random sample of 500 users:
     • With 95% probability: Between 1,972 and 12,949 spammers
     • Delicious most likely already applies spam detection
     • Why care about ~1.5% spammers in Delicious?
  46. Filtering Results (Users)
     [Bar chart: number of spammers and non-spammers; x-axis 1 to 20, y-axis 0 to 16000]
  47. Filtering Results (Tag Assignments)
     [Bar chart: filtered and unfiltered number of TAS, spam vs. non-spam; x-axis 1 to 20, y-axis 0 to 450000]
  48. That’s why
     Effect of removing 257 spammers of 12,777 users from the ‘bookmark’ stream
     [Plot]
  49. How statistically significant is the epistemic model for normal users?
     [Plot]
  50. Lessons learned
     • Uranus was discovered because it affected Neptune
     • Pluto was discovered because it affected Uranus!
     • Spammers can be discovered by their behavior, even if you do not know what kind of spam they are producing!
  51. How do constellations in the sky evolve?
     [Image: http://www.flickr.com/photos/furious-angel/2142647358/sizes/o/in/photostream/]
  52. Agenda (repeated from slide 14)
  53. Example: Network
     [Diagram: nodes are persons, edges are friendships]
  54. SUGGESTING WHOM TO LINK TO NEXT
  55. Use Networks for Recommendation
     • Goal: Predict whom a person will add as a friend
     • Facebook's algorithm: find friends-of-friends
     → Problem: Rest of the network is ignored!
  56. Algebraic Graph Theory
     [Graph with nodes 1 to 6]
     Represent a network by an adjacency matrix A:
     • $A_{ij} = 1$ when i and j are connected
     • $A_{ij} = 0$ when i and j are not connected
     $$A = \begin{pmatrix} 0&1&0&0&0&0 \\ 1&0&1&1&0&0 \\ 0&1&0&1&0&0 \\ 0&1&1&0&1&0 \\ 0&0&0&1&0&1 \\ 0&0&0&0&1&0 \end{pmatrix}$$
     A is square and symmetric.
  57. Baseline: Friend of a Friend Model
     Count the number of ways a person can be found as the friend of a friend.
     Consider the matrix product $AA = A^2$:
     $$\begin{pmatrix} 0&1&0&0&0&0 \\ 1&0&1&1&0&0 \\ 0&1&0&1&0&0 \\ 0&1&1&0&1&0 \\ 0&0&0&1&0&1 \\ 0&0&0&0&1&0 \end{pmatrix}^{2} = \begin{pmatrix} 1&0&1&1&0&0 \\ 0&3&1&1&1&0 \\ 1&1&2&1&1&0 \\ 1&1&1&3&0&1 \\ 0&1&1&0&2&0 \\ 0&0&0&1&0&1 \end{pmatrix}$$
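The friend-of-a-friend count can be checked numerically. The sketch below uses NumPy and rebuilds the six-node example from its edges; the edge list is read off the adjacency matrix on slide 56, while the helper names `adjacency` and `friend_of_friend_scores` are hypothetical.

```python
import numpy as np

# Friendship edges of the six-node example graph (1-based node labels).
EDGES = [(1, 2), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6)]


def adjacency(edges, n):
    """Build the symmetric 0/1 adjacency matrix of an undirected graph."""
    A = np.zeros((n, n), dtype=int)
    for i, j in edges:
        A[i - 1, j - 1] = A[j - 1, i - 1] = 1
    return A


def friend_of_friend_scores(A, user):
    """Map each non-friend of `user` to the number of length-2 paths to it."""
    A2 = A @ A
    u = user - 1
    scores = {}
    for v in range(A.shape[0]):
        # Skip the user, existing friends, and unreachable persons.
        if v != u and A[u, v] == 0 and A2[u, v] > 0:
            scores[v + 1] = int(A2[u, v])
    return scores


A = adjacency(EDGES, 6)
A2 = A @ A
print(A2)
print(friend_of_friend_scores(A, 4))
```

For user 4 the candidates are persons 1 and 6, each reachable by exactly one path of length two, so the baseline cannot rank one above the other.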
  58. Eigenvalue Decomposition
     Write the matrix A as a product:
     $$A = U \Lambda U^T$$
     where
     • U are the eigenvectors: $U^T U = I$
     • Λ are the eigenvalues: $\Lambda_{ij} = 0$ when $i \neq j$
  59. Computing $A^2$
     Use the eigenvalue decomposition $A = U \Lambda U^T$:
     $$A^2 = U \Lambda U^T U \Lambda U^T = U \Lambda^2 U^T$$
     Exploit U and Λ:
     • $U^T U = I$ because U contains eigenvectors
     • $(\Lambda^2)_{ii} = \Lambda_{ii}^2$ because Λ contains eigenvalues
     Result: Just square all eigenvalues!
  60. 60. Friend of a Friend of a Friend 3 1 2 4 5 6Compute the number of friends-of-friends-of-friends: 1 2 3 4 5 6 3 0 1 0 0 0 0 0 3 1 1 1 0 1 1 0 1 1 0 0 3 2 4 5 1 1 2 0 1 0 1 0 0 1 4 2 4 1 1 3 0 1 1 0 1 0 = 1 5 4 2 4 0 4 0 0 0 1 0 1 1 1 1 4 0 2 5 0 0 0 0 1 0 0 1 1 0 2 0 6A3 = UΛUT UΛUT UΛUT = UΛ3UT Steffen Staab Web Science Doctoral staab@uni-koblenz.de Summer School 62
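61. Matrix Exponential
[Figure: example graph with nodes 1 to 7; the scores 0.98, 0.76 and 0.22 label the candidate links from node 4 to nodes 1, 6 and 7]
The matrix exponential can be written as a power series with decreasing coefficients:
\[
\exp(A) = I + A + \tfrac{1}{2} A^2 + \tfrac{1}{6} A^3 + \dots
\]
\[
\exp
\begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix}
=
\begin{pmatrix}
1.66 & 1.72 & 0.93 & 0.98 & 0.28 & 0.06 & 0.01 \\
1.72 & 3.57 & 2.70 & 2.93 & 1.04 & 0.29 & 0.06 \\
0.93 & 2.70 & 2.86 & 2.71 & 0.99 & 0.28 & 0.06 \\
0.98 & 2.93 & 2.71 & 3.63 & 1.95 & 0.76 & 0.22 \\
0.28 & 1.04 & 0.99 & 1.95 & 2.35 & 1.59 & 0.64 \\
0.06 & 0.29 & 0.28 & 0.76 & 1.59 & 2.23 & 1.38 \\
0.01 & 0.06 & 0.06 & 0.22 & 0.64 & 1.38 & 1.59
\end{pmatrix}
\]
(rows and columns correspond to nodes 1 to 7)
Recommendations for user ④: ① > ⑥ > ⑦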
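The ranking for user ④ can be reproduced with a short sketch. It computes $\exp(A)$ by exponentiating the eigenvalues; the helper `recommend` and the 1-based ids are assumptions for illustration, not the implementation used in the cited work.

```python
import numpy as np

# Adjacency matrix of the 7-node example graph.
A7 = np.array([
    [0, 1, 0, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 1, 0],
], dtype=float)

# exp(A) = U diag(exp(lam)) U^T for symmetric A.
lam, U = np.linalg.eigh(A7)
expA = (U * np.exp(lam)) @ U.T


def recommend(A, scores, user):
    """Hypothetical helper: rank all non-neighbours of a 1-based user id."""
    i = user - 1
    candidates = [j for j in range(len(A)) if j != i and A[i, j] == 0]
    ranked = sorted(candidates, key=lambda j: -scores[i, j])
    return [j + 1 for j in ranked]


print(recommend(A7, expA, 4))
print(np.round(expA[3], 2))
```

Unlike the friend-of-a-friend count, every non-neighbour receives a score here, because walks of all lengths contribute.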
62. Why the Matrix Exponential
$A^n$ = number of paths of length $n$ between each pair of nodes (counted as walks, so nodes may repeat)
$aA^2 + bA^3 + cA^4 + \dots$ = number of paths, weighted by path length
→ New edges are more likely to appear where many paths already exist
→ When $a > b > c > \dots > 0$, short paths are weighted more

63. Computing Power Series
Let $p(A)$ be a power series:
\[
\begin{aligned}
p(A) &= aA^2 + bA^3 + cA^4 + \dots \\
     &= aU\Lambda^2U^T + bU\Lambda^3U^T + cU\Lambda^4U^T + \dots \\
     &= U(a\Lambda^2 + b\Lambda^3 + c\Lambda^4 + \dots)U^T \\
     &= U\,p(\Lambda)\,U^T
\end{aligned}
\]
Therefore: Power series change only the eigenvalues!
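The statement that a power series changes only the eigenvalues can be checked numerically. In the sketch below, the function `spectral_transform` and the weights `a`, `b`, `c` are hypothetical choices satisfying $a > b > c > 0$; the series is truncated after $A^4$ so that it can be compared with the direct computation.

```python
import numpy as np


def spectral_transform(A, p):
    """Apply a scalar function p to the eigenvalues of a symmetric matrix A."""
    lam, U = np.linalg.eigh(np.asarray(A, dtype=float))
    return (U * p(lam)) @ U.T


# Hypothetical path-length weights with a > b > c > 0.
a, b, c = 0.5, 0.25, 0.125


def p(x):
    return a * x**2 + b * x**3 + c * x**4


A = np.array([
    [0, 1, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0],
], dtype=float)

# Direct evaluation with explicit matrix powers ...
mp = np.linalg.matrix_power
direct = a * mp(A, 2) + b * mp(A, 3) + c * mp(A, 4)

# ... and the same result obtained by transforming eigenvalues only.
via_eigen = spectral_transform(A, p)
print(np.allclose(direct, via_eigen))
```

Passing `np.exp` as `p` would give the matrix exponential in the same way.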
64. TRACKING THE EVOLUTION OF THE NETWORK AS A WHOLE

65. Diversity
• Many, equally-sized subcommunities
• High entropy
• ‘Flat’ structure
Regularity
• Few large subcommunities
• Low entropy
• Many ‘hubs’

66. Network Evolution
• How did a network look at time t?
• Idea: Observe the change of diversity/regularity over time

67. Outline
1. Power-law exponent
2. Weighted spectral distribution
3. Network entropy
4. Network rank
68. 1. Power-law Exponent
The number of neighbors is unevenly distributed.
[Figure: degree distribution of the Epinions trust network (Massa et al. 2005)]
This results in a power law (Newman 2006):
\[
C(n) \sim n^{-\gamma}
\]
A higher exponent $\gamma$ denotes less regularity.

69. 1. Power-law Exponent over Time
[Figure: power-law exponent over time, Epinions trust network (Massa et al. 2005)]
$\gamma$ shrinks ⇒ The network becomes more regular
70. 2. Weighted Spectral Distribution
• Consider the $n \times n$ matrix $N$ defined by
\[
N_{ij} =
\begin{cases}
1 / \sqrt{d(i)\,d(j)} & \text{when } (i,j) \text{ is an edge} \\
0 & \text{otherwise}
\end{cases}
\]
where $d(i)$ is the number of neighbors of node $i$.
• The distribution of the eigenvalues of $N$ is called the weighted spectral distribution (WSD) (Fay et al. 2010)
• Eigenvalues nearer to $\pm 1$: diversity
• Eigenvalues nearer to 0: regularity
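A minimal sketch of how the eigenvalues behind the WSD could be computed for a small graph follows. The function name is an assumption, the graph must not contain isolated nodes (their degree would be zero), and binning the eigenvalues into a histogram is left out.

```python
import numpy as np


def weighted_spectral_distribution(A):
    """Eigenvalues of N with N_ij = A_ij / sqrt(d(i) d(j)), in ascending order.

    Assumes a symmetric 0/1 adjacency matrix without isolated nodes.
    """
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)                    # node degrees d(i)
    inv_sqrt = 1.0 / np.sqrt(d)
    N = A * np.outer(inv_sqrt, inv_sqrt)
    return np.linalg.eigvalsh(N)


A = np.array([
    [0, 1, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0],
])

wsd = weighted_spectral_distribution(A)
print(np.round(wsd, 3))
```

All eigenvalues of $N$ lie in $[-1, 1]$, which is why the interpretation on the slide refers to values near $\pm 1$ and near 0.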
71. 2. Weighted Spectral Distribution over Time
[Figure: WSD over time, CiteULike user–tag network (Emamy et al. 2007)]
• The WSD shifts towards zero ⇒ The network becomes regular

72. 3. Network Entropy
• Write the graph $G$ as a sum of subgraphs $G_k$:
\[
G = G_1 \cup G_2 \cup \dots \cup G_r
\]
• Each $G_k$ has weighted edges, with total weight $\lambda_k$
• When picking an edge from $G$ at random, the probability of it being in community $G_k$ is
\[
\frac{\lambda_k}{\lambda_1 + \lambda_2 + \dots + \lambda_r} = \frac{\lambda_k}{L}
\]
• The entropy of this distribution is (Kunegis et al. 2011)
\[
H(G) = -\sum_k \frac{\lambda_k}{L} \log \frac{\lambda_k}{L}
\]
• Entropy: a measure of the effective number of subcommunities
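The entropy formula translates directly into code. The sketch below takes the subcommunity weights $\lambda_k$ as given (how the decomposition is obtained is outside its scope) and assumes the natural logarithm, since the slide does not fix the base; `network_entropy` is an illustrative name.

```python
import math


def network_entropy(weights):
    """H = -sum_k (w_k / L) * log(w_k / L), with L the total weight.

    `weights` are the total edge weights of the subcommunities.
    Zero weights are skipped (their contribution is taken as 0).
    Assumption: natural logarithm.
    """
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log(p) for p in probs)


# Four equally heavy subcommunities: maximal diversity for r = 4.
h_flat = network_entropy([2, 2, 2, 2])
# One dominant subcommunity: low entropy, more regular.
h_hub = network_entropy([8, 1, 1])

print(h_flat, math.exp(h_flat))
print(h_hub, math.exp(h_hub))
```

With equal weights, $\exp(H)$ equals the number of subcommunities $r$, which is one way to read entropy as an "effective number" of subcommunities.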
73. 3. Network Entropy over Time
[Figure: entropy $H(G)$ over time $t$, Enron email network (Klimt et al. 2004); two panels: absolute scale and zoom]
Entropy is constant ⇒ Constant number of communities

74. 4. Network Rank
• Decompose the network into subcommunities:
\[
G = G_1 \cup G_2 \cup \dots \cup G_r
\]
• The rank $r$ is a measure of diversity: $\mathrm{rank}(G) = r$
• Weighted rank:
\[
\mathrm{rank}^*(G) = \sum_k \frac{|G_k|}{|G_1|}
\]
• Robust measure of diversity (Kunegis et al. 2011)
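A small sketch of the weighted rank, under two assumptions that the slide does not state explicitly: $|G_k|$ is a non-negative size (or weight) of subcommunity $k$, and $G_1$ is the largest subcommunity. The function name is hypothetical.

```python
def weighted_rank(sizes):
    """rank*(G) = sum_k |G_k| / |G_1|.

    Assumption: G_1 is the largest subcommunity, so the sum is
    normalised by max(sizes).
    """
    largest = max(sizes)
    return sum(sizes) / largest


print(weighted_rank([3, 3, 3, 3]))  # equal sizes: equals the plain rank r
print(weighted_rank([4, 2, 2]))     # one dominant subcommunity
print(weighted_rank([10, 1]))       # a tiny extra subcommunity barely counts
```

Under these assumptions, small subcommunities contribute little to the weighted rank, which fits its description as a robust measure of diversity.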
75. 4. Network Rank over Time
[Figure: network rank $\mathrm{rank}^*(G)$ over time $t$, Enron email network (Klimt et al. 2004)]
• Increasing network rank: increasing diversity
• Shrinking network rank: shrinking diversity

76. More Network Rank Plots
[Figure panels: Epinions trust network, hep-th citations, Wikipedia elections, frwikibooks edits, MIT conference contacts, YouTube social network]
(biased towards good examples of convex evolution)

77. Conclusion
• Power-law exponent shrinks
  - Connection diversity is shrinking
• Weighted spectral distribution shifts to zero
  - Emerging main components
• Entropy is constant
  - Effective number of communities is constant
• Network rank increases, then shrinks
  - Two-phase model of expansion
78. Watch out!
KONECT (Koblenz Network Collection)
http://uni-koblenz.de/~kunegis/paper/kunegis-konect.poster.pdf
Coming soon!
Follow #ictrobust or @kunegis or @ststaab

79. Why does the sky have the density it has?
[Photo: Michael Donough, Flickr, cc, 14 Oct 2007]

80. Why do tagging systems have so little spam?
[Diagram labels: Administrative Process, Content Quality, Community Policy, User Roles, Content Process]
81. Agenda
• Risks and Opportunities in Social Communities: the ROBUST project
• Web Science Methodology: An explanation by analogy with Physics and some initial (!) applications to online communities
  • Modeling a dynamic system at the micro level; understanding collective effects (macro level) arising from individual behavior (micro level)
  • Predicting dynamic system behavior; recognizing behavior that deviates from the model
  • Modeling dynamic system behavior at the macro level
  • Controlling dynamic system behavior by collective action
82. Yahoo Answers
• Ensure quality of user-generated content
• Use of administrators and community moderators
How?
• Policy influences community processes

83. SURVEY OF GOVERNANCE MODELS

84. Communities need Governance
Steering and coordinating actions of community members [Benz2004]
Goal: Successful and flourishing community
• High-quality user-generated content
• Active community members
[Images: http://www.flickr.com/photos/61433480@N02/5593890914/, http://www.flickr.com/photos/boojee/3733902852/]

85. Motivation
Different types of
• Web communities
• User-generated content (video, photos, comment, article, questions, answers, posting, review text)
What are the most successful means of governance for user-generated content?
Analyze successful platforms and compare their means of governance!

86. Means of Governance
1. Direct intervention of the community owner
   Affecting content or users based on apparent properties
2. Functionality of the community platform
[Diagram labels: User-generated Content (Text, Reviews, Bookmarks); Assessment (Ratings, Abuse Reports); Community Content Modification; Complex User Roles; Member Selection & Ranking (Ratings, Score, Time, Views, Replies, Hide Low Quality)]
87. Method
• Selection of the 250 most prominent web sites with community functionality according to Alexa Page Rank
• Clustering of the web sites into four groups according to purpose: Social Media, Editorial News, Social Networking, Social Reviewing
• Top-5 web sites of each group analyzed (*)

88. Key Results
(1) Abuse reports are a successful means of governance.
• 16 occurrences
• Restricted to filtering out unwanted content
• Staff needed: expensive but efficient [Schwagereit2010]
(2) Simple ratings are dominant, but there is a battle between “Like” and “Like/Dislike”.
• “Like”: 9 occurrences
• “Like/Dislike”: 7 occurrences
• Tradeoff between simplicity and improved ranking ability

89. Key Results
(3) Creation time is the most frequently implemented ranking criterion.
• 18 occurrences
• Others: score: 8, ratings: 6
• Important content is renewed; unimportant content will be forgotten
(4) Content modification and user roles are rarely implemented.
• 2 occurrences
• Requires a complex role system and users who understand it
90. GOVERNANCE MODEL: DEEP DIVE - SIMULATION

91. Methodology Principle
1. Define a Web community model (Lycos IQ, Yahoo Answers…)
2. Adapt this model to an existing community
3. Estimate parameters
4. Define a quality measure
5. Simulate community behaviour
6. Compare simulation results with real data
7. Analyze quality measures with respect to variations of CoSiMo parameters

92. Dataset Lycos IQ
Time period: 909 days
Users: 34,327
Administrators: 36
Questions: 1,031,982
Answers: 2,996,446
Deleted non-compliant answers: 21,139
93. Observed parameters (input to simulation)
[Figure: 3D histogram. Vertical axis: Number of Users (logarithmic, 1 to 100000). Horizontal axes: Answers per year (bins of 1000, starting at 0-999) and Rate of Compliant Answers (bins from 0.0-0.09 to 0.9-1.0)]
94. Example Behaviors and Example Policies
Behaviors of
Ordinary Users:
• Create new postings
• Read existing postings
• Report non-compliant postings OR give bonus points to poster
Moderator Users:
• Create new postings
• Read existing postings
• Delete non-compliant posting OR give bonus points to poster
Administrators:
• Read existing postings
• Delete non-compliant postings

Reading Policies for Administrators:
PA: random selection of postings
PB: random selection of postings that no other administrator has examined so far
PC: selection of postings that were most often reported by users for being non-compliant

Promotion Policy:
PM-X: ordinary users become moderators (who can delete postings) when having at least X bonus points
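The reading and promotion policies above can be illustrated with a toy sketch. It is not the CoSiMo implementation: the `Posting` record, its fields and all function names are hypothetical and only mirror the verbal descriptions of PA, PB, PC and PM-X.

```python
import random
from dataclasses import dataclass


@dataclass
class Posting:
    # Hypothetical record; field names are assumptions for illustration.
    posting_id: int
    reports: int = 0        # abuse reports filed by ordinary users
    examined: bool = False  # True once an administrator has examined it


def select_pa(postings, rng):
    """PA: random selection of postings."""
    return rng.choice(postings)


def select_pb(postings, rng):
    """PB: random selection among postings no administrator has examined."""
    unexamined = [p for p in postings if not p.examined]
    return rng.choice(unexamined) if unexamined else None


def select_pc(postings):
    """PC: the posting most often reported as non-compliant."""
    return max(postings, key=lambda p: p.reports)


def is_moderator(bonus_points, threshold_x):
    """PM-X: promotion once a user has at least X bonus points."""
    return bonus_points >= threshold_x


rng = random.Random(0)
postings = [
    Posting(1, reports=0, examined=True),
    Posting(2, reports=5),
    Posting(3, reports=2, examined=True),
]
print(select_pa(postings, rng).posting_id)
print(select_pb(postings, rng).posting_id)
print(select_pc(postings).posting_id)
print(is_moderator(25, 25))
```

The sketch makes the difference between the policies visible: PA and PB spend administrator attention blindly, whereas PC relies on reports from ordinary users.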
95. How many administrators are needed?
[Figure: surface plot. Vertical axis: Recent Posting Quality (0.65 to 1.05). Horizontal axes: Additional non-compliant Postings per day (5 to 2560) and Number of Administrators (1 to 1152)]

96. Fighting spam with administrators…
[Figure: surface plot. Vertical axis: Recent Posting Quality (0.99 to 1). Horizontal axes: Number of Administrators (1 to 576) and Applied Policies]
Variation of policies and number of administrators
• Efficient policies result in high-quality content
• A minimum of 18 administrators is needed
• Many moderators are needed to bring the quality to a high level

97. Fighting spam with user moderators…
[Figure: surface plot. Vertical axis: Recent Posting Quality (0.6 to 1). Horizontal axes: Additional non-compliant Postings per day (5 to 2560) and Applied Policies (PA, PA+PB, PA+PB+PC, and PA+PB+PC combined with PM-X for various thresholds X)]
Variation of policies and posting quality
• A limited number of administrators has a limited capacity for filtering a surge of non-compliant postings
• Moderators help to increase quality
98. Lessons Learned
• The strategy for selecting questionable postings is crucial
• Reporting by normal users is the most effective strategy
• Moderators are not as effective as expected if they hunt only incidentally for non-compliant content
• Sufficiently strong requirements regarding moderator profiles lead to high-quality moderators
• Policies for promoting users need to be based on a criterion that is time-dependent
99. Agenda
• Risks and Opportunities in Social Communities: the ROBUST project
• Web Science Methodology: An explanation by analogy with Physics and some initial (!) applications to online communities
  • Modeling a dynamic system at the micro level; understanding collective effects (macro level) arising from individual behavior (micro level)
  • Predicting dynamic system behavior; recognizing behavior that deviates from the model
  • Modeling dynamic system behavior at the macro level
  • Controlling dynamic system behavior by collective action

100. Are we satisfied here? No! Far from it!
Understand how and why users tag or tweet?
→ What are people’s limitations that affect the system?
→ Psychology and Sociology!
What are their legal boundaries?
→ How can you shape the systems?
→ Law!
What are organizations’ incentives?
→ Why and how do organizations participate?
→ Nice example: open source
→ Economics
101. Thank You!
Web Science & Technologies, University of Koblenz ▪ Landau, Germany
102. References
J. Kunegis, A. Lommatzsch and C. Bauckhage. The Slashdot Zoo: Mining a social network with negative edges. In Proc. World Wide Web Conf., pp. 741–750, 2009.
J. Kunegis and A. Lommatzsch. Learning spectral graph transformations for link prediction. In Proc. Int. Conf. on Machine Learning, pp. 561–568, 2009.
J. Kunegis, S. Schmidt, A. Lommatzsch and J. Lerner. Spectral analysis of signed graphs for clustering, prediction and visualization. In Proc. SIAM Int. Conf. on Data Mining, pp. 559–570, 2010.
J. Kunegis, D. Fay and C. Bauckhage. Network growth and the spectral evolution model. In Proc. Conf. on Information and Knowledge Management, pp. 739–748, 2010.

103. References
B. Viswanath, A. Mislove, M. Cha and K. P. Gummadi. On the evolution of user interaction in Facebook. In Proc. Workshop on Online Social Networks, pp. 37–42, 2009.

104. References
K. Dellschaft and S. Staab. An Epistemic Dynamic Model for Tagging Systems. In Proc. of HYPERTEXT 2008, the 19th ACM Conference on Hypertext and Hypermedia, June 19-21, 2008, Pittsburgh, Pennsylvania, USA.
K. Dellschaft and S. Staab. On Differences in the Tagging Behavior of Spammers and Regular Users. In Proc. of WebSci-2010, Raleigh, US, April 2010.
F. Schwagereit, S. Sizov and S. Staab. Finding Optimal Policies for Online Communities with CoSiMo. In Proc. of WebSci-2010, Raleigh, US, April 2010.
