Managing Social Communities

Published in: Technology, Education


Transcript

  • 1. Managing Social Communities. Steffen Staab, Web Science & Technologies (WeST), University of Koblenz ▪ Landau, Germany. Acknowledgements to the ROBUST project team and the WeST team, in particular K. Dellschaft, J. Kunegis and F. Schwagereit.
  • 2. Institut WeST: Web Science & Technologies. Semantic Web, Web Retrieval, Interactive Web, Multimedia Web, Software Web; eGovernment, eMedia, eScience, eOrganizations, ePerson; Institute for Computer Science, Institute for Information Systems, Leibniz Institute for Social Sciences (GESIS). Steffen Staab, staab@uni-koblenz.de, Web Science Doctoral Summer School.
  • 3. Plan for this Talk: (1) Web, (2) Science
  • 4. Social Communities … are everywhere
  • 5. Risks and opportunities of online communities (intranet, extranet, internet)
      • Risks: bad content quality, social ill behavior, … → jeopardize business value
      • Opportunities: open innovation, improved user support, … → increase business value
      • Data storage and processing: scalability, heterogeneity, response time
      • Content, user & network analysis: understanding
      • Business value: product support & innovation, CRM, expertise management, marketing, advertising
  • 6. Large-scale Testbeds
      • SAP (B2B): Community Network, Business Partner Network, Developer Network. 2009: 1.5M users, 150K accesses/day. 2013: 5M users, 1200K accesses/day.
      • Polecat (C2C): online marketing, CRM for IT. 2009: … 2013: millions of posts/day, 1TB data/day.
      • IBM (E2E): corporate knowledge management. 2009: 99K accounts. 2013: 800K accounts.
  • 7. SAP Business Partner Use Case: SAP Developer Network

      Measure                                  2007    2009    2013
      Posts per day                            5000    6000    7000
      Size of user-generated content (posts)   1M      4M      10.0M
      Number of users                          1M      1.7M    4.8M

  • 8. ROBUST: IBM Employee Use Case

                              Created per day         Number of users
      Business data           2007    2009    2013    2007      2009      2013
      IBM Activities Entry    700     2750    5000    53200     143600    200000
      IBM Blogs Entries       120     30      60      34600     77750     100000
      IBM Communities         3       23      50      3000      181950    250000
      IBM Bookmarks           800     900     1000    8500      22400     50000
      IBM Wikis               NA      40      100     NA        35450     100000
      IBM Files               NA      290     1000    NA        45160     100000
      IBM Overall             1623    4033    7210    500000*   500000*   500000*
  • 9. Risks in Online Communities
      • Definition of risk: the likelihood (probability) of an event occurring, and the impact of the event occurring
      • Risk management: a process for managing costs, benefits and likelihoods
          • Detect high-impact risks in time, even if they generate expensive false alarms
          • Ignore very low-impact risks, even if they can be reliably detected
      • Types of risks: non-compliance with the community policies/polity; scamming or spamming behavior; lower involvement and productivity; decrease of user satisfaction; loss of community dynamics
      • Example, SAP: SCN award points scamming. Experts' reputation decreases; business users leave the forum.
      • Example, Web (public communities): death of the TechCrunch forum due to spam and lack of management
      • Loss of 1% of experts → loss of high revenue; loss of 10% of lurkers → low impact
  • 10. Communities: dynamics and confidentiality. ROBUST supports decision making for users, hosts and service providers.
      • Managing growth & decline: identify, encourage and safeguard core users; social matching; content matching; define/maintain etiquette and policies; manage negative behavior and conflicts; recognize and categorize decline and growth; redirect users to other communities
      • Merging communities: cross-community topic detection to stimulate inter-community interactions
      • Splitting communities: identification of clusters/compartments of members that can be separated
  • 11. Agenda
      • Risks and Opportunities in Social Communities: the ROBUST project
      • Many related talks in this Summer School. ROBUST partners: Alani (monitoring and analysis of social networks), Karnstedt (user churn). Closely related: Greene (network analysis), Bernstein (scalable infrastructures).
      • But here comes the biased account from work in our institute
  • 12. Plan for this Talk: (1) Web, (2) Science
  • 13. Image of a black hole (Flickr, cc, Jan 7 2009, by thebadastronomer)
  • 14. Agenda
      • Risks and Opportunities in Social Communities: the ROBUST project
      • Web Science Methodology: an explanation by analogy with physics and some initial (!) applications to online communities
          • Modeling a dynamic system at the micro level; understanding collective effects (macro level) arising from individual behavior (micro level)
          • Predicting dynamic system behavior; recognizing behavior deviating from the model
          • Modeling dynamic system behavior at the macro level
          • Controlling dynamic system behavior by collective action
  • 15. Better understanding of the tagging process
      • Cooperative classification of resources
      • Which factors influence the tagging process? Background knowledge of the user? Tag assignments of other users?
      • Hypothesis: tagging involves imitation of other users AND selection of tags from the background knowledge of users.
  • 16. Methodology (diagram)
      • Observed: user interface, own knowledge, shared terminology (and something else?) shape conceptualization and tagging behavior
      • Modeled: a model of user interface influence, a model of own knowledge and a model of sharing form a joint stochastic model, which produces simulated tagging behavior
      • Comparison of statistics between observed and simulated tagging behavior
  • 17. Components of Analysis
      • Properties of tag streams (observations in the real world): stream view of folksonomies, co-occurrence streams, resource streams
      • Dynamic model for tagging systems (stochastic models of influence): simulating background knowledge, simulating tag imitation
      • Simulation results (which models best fit reality?): co-occurrence streams, resource streams
  • 18. Stream Views of a Folksonomy
      • Folksonomies: vertices are users, tags and resources; edges are tag assignments
      • Postings: the tag assignments of a user to a single resource; they can be ordered according to their time stamp
  • 19. Co-occurrence Streams
      • Co-occurrence stream: all tags co-occurring with a given tag in a posting, ordered by posting time
      • Postings: {mackz, r1, {apple, tree}, 13:25}, {klaasd, r2, {apple, mac, ibook}, 13:26}, {mackz, r2, {apple, macintosh, stevejobs}, 13:27}
      • Co-occurrence stream for apple: tree, mac, ibook, macintosh, stevejobs

      Tag    |Y|         |U|       |T|       |R|
      ajax   2,949,614   88,526    41,898    71,525
      blog   6,098,471   158,578   186,043   557,017
      xml    974,866     44,326    31,998    61,843

  • 20. Properties of Co-occurrence Streams: Tag Growth (plot: linear growth)
  • 21. Properties of Co-occurrence Streams: Tag Frequencies (plot: power law)
  • 22. Resource Streams
      • Resource stream: all tags assigned to a resource, ordered by posting time
      • Postings: {mackz, r1, {apple, tree}, 13:25}, {klaasd, r2, {apple, mac, ibook}, 13:26}, {mackz, r2, {apple, macintosh, stevejobs}, 13:27}
      • Resource stream for r2: apple, mac, ibook, apple, macintosh, stevejobs
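The two stream views defined on slides 19 and 22 can be sketched in a few lines of Python. This is a minimal illustration that uses the three example postings from the slides; the `Posting` type and the function names are hypothetical and not taken from the talk.

```python
from typing import NamedTuple


class Posting(NamedTuple):
    """One posting: the tags a user assigns to a single resource."""
    user: str
    resource: str
    tags: tuple   # tags in the order given in the posting
    time: str     # "HH:MM" time stamp, sortable as a string


# The three example postings from the slides.
POSTINGS = [
    Posting("mackz", "r1", ("apple", "tree"), "13:25"),
    Posting("klaasd", "r2", ("apple", "mac", "ibook"), "13:26"),
    Posting("mackz", "r2", ("apple", "macintosh", "stevejobs"), "13:27"),
]


def cooccurrence_stream(postings, tag):
    """All tags co-occurring with `tag` in a posting, ordered by posting time."""
    stream = []
    for p in sorted(postings, key=lambda p: p.time):
        if tag in p.tags:
            stream.extend(t for t in p.tags if t != tag)
    return stream


def resource_stream(postings, resource):
    """All tags assigned to `resource`, ordered by posting time."""
    stream = []
    for p in sorted(postings, key=lambda p: p.time):
        if p.resource == resource:
            stream.extend(p.tags)
    return stream


print(cooccurrence_stream(POSTINGS, "apple"))
# ['tree', 'mac', 'ibook', 'macintosh', 'stevejobs']
print(resource_stream(POSTINGS, "r2"))
# ['apple', 'mac', 'ibook', 'apple', 'macintosh', 'stevejobs']
```

Note that a resource stream keeps repeated tags (apple appears twice for r2), which is what makes tag frequencies per resource measurable.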
  • 23. Properties of Resource Streams: Tag Frequencies
  • 24. Properties of Resource Streams: Tag Frequencies
  • 25. Simulating the Evolution of Tag Streams
  • 26. Simulating tag streams
      • Questions a user faces: Which of my concepts represent this web page? How do I tag this web page?
      • Inspiration for conceptualization from: (1) most popular tags, (2) most recently used tags, (3) tags used for this resource, (4) tags co-occurring with similar text documents, (5) creating completely new tags, (6) …
      • Which combination of inspirations produces the same statistics as those observed for Delicious?
  • 27. The Delicious User Interface. Imitating previous tag assignments:
      • Recommended tags: intersection of the tags of a user and the tags already assigned to the resource
      • Your tags: tags of the user
      • Popular tags: the 7 most popular tags assigned to the resource
  • 28. Simulating a Tag Stream
      • Start with an empty tag stream; each simulation step appends a new tag assignment
      • Parameters for simulating a single tag assignment:
          • p(w|t): probability of selecting word w for topic t, modeled by word distributions in a topic-centered text corpus
          • n: number of visible previous tags
          • h: maximal number of previous tag assignments used for determining the ranking of the n distinct tags
  • 29. Modeling Background Knowledge (figure labels: text corpora, Del.icio.us)
      • $P_{BK}$: probability of selecting from background knowledge
      • p(w|t): probability of selecting word w for topic t, modeled by word distributions in a topic-centered text corpus
      • p(w|r): probability of selecting word w for resource r
  • 30. Modeling Tag Imitation (figure labels: stream positions t, t-1, …, t-h; ranks 1, …, n; branch probabilities $P_{BK}$ and $1 - P_{BK}$)
      • $P_I = 1 - P_{BK}$: probability of imitating a previous tag assignment
      • n: number of visible top-ranked tags
      • h: maximal number of previous tag assignments used for determining the ranking of the n distinct tags
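The simulation loop described on slides 28 to 30 might look roughly like the following Python sketch. It is not the authors' implementation: the function name, the toy background distribution and, in particular, the uniform choice among the n visible tags are assumptions made here for illustration, since the slides do not say how a user picks among the visible tags.

```python
import random
from collections import Counter


def simulate_tag_stream(steps, background, p_bk, n, h, seed=0):
    """Simulate a tag stream of length `steps`.

    background: dict mapping word -> p(w|t), the background knowledge
    p_bk:       probability of drawing from background knowledge
    n:          number of visible top-ranked tags (n >= 1)
    h:          number of previous tag assignments used for the ranking (h >= 1)
    """
    rng = random.Random(seed)
    words = list(background)
    weights = [background[w] for w in words]
    stream = []
    for _ in range(steps):
        if not stream or rng.random() < p_bk:
            # Select a word from background knowledge according to p(w|t).
            tag = rng.choices(words, weights=weights, k=1)[0]
        else:
            # Imitate: rank the tags of the last h assignments by frequency
            # and show the top n of them.
            visible = [t for t, _ in Counter(stream[-h:]).most_common(n)]
            # ASSUMPTION: uniform choice among the visible tags.
            tag = rng.choice(visible)
        stream.append(tag)
    return stream


# Toy background distribution (hypothetical values, not from the talk).
BACKGROUND = {"ajax": 0.4, "javascript": 0.3, "web": 0.2, "xml": 0.1}
stream = simulate_tag_stream(1000, BACKGROUND, p_bk=0.2, n=7, h=500)
print(Counter(stream).most_common(3))
```

With `p_bk=0.2` the imitation probability is 0.8, which lies inside the ~70-90% range the talk reports later. Comparing the tag-frequency and tag-growth statistics of such simulated streams with observed ones is the comparison step of the methodology.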
  • 31. Simulation Results
  • 32. Overall Scheme (diagram as on slide 16)
  • 33. Simulating Co-occurrence Streams
      • Tag growth: influenced by $P_{BK}$ and p(w|t)
      • Tag frequencies: influenced by $P_{BK}$, p(w|t), n and h
          • n: semantic breadth of a topic (blog: 100 tags, ajax: 50 tags, xml: 50 tags; Cattuto et al. 2007)
          • h: no hint for realistic values; good guesses may be 500 and 1000
  • 34. Co-occurrence Streams: Simulated Tag Growth
  • 35. Co-occurrence Streams: Simulated Tag Frequencies
  • 36. Simulating Resource Streams
      • $P_I$ and $P_{BK}$: values comparable to co-occurrence streams
      • p(w|r): approximated by p(w|t)
      • n: 7 tags are visible (cf. Delicious user interface)
      • h: smaller value than for co-occurrence streams
  • 37. Resource Streams: Simulated Tag Frequencies
  • 38. Lessons learned [Dellschaft+Staab, ACM Hypertext 2008]
      • Black holes do not only eat mass, they also dissolve by emitting radiation
      • Imitation AND background knowledge are needed for explaining the properties of tag streams
      • Probability of imitating previous tag assignments: ~70-90%

      Model                          Frequency rank,       Frequency rank,     Tag growth
                                     co-occur. streams     resource streams
      Polya Urn Model                o                     o                   fixed size
      Simon Model                    o                     o                   linear
      YS Model w/ Memory             +                     o                   linear
      Halpin et al. Model            o                     o                   linear
      Epistemic Model (our model)    +                     +                   power-law
  • 39. Solar System: Neptune, Uranus, Jupiter, Saturn (Flickr, cc, Sep 1 2008, by Image Editor)
  • 40. Agenda (as on slide 14)
  • 41. Overall Scheme (diagram as on slide 16)
  • 42. What is our Uranus? What is this?
  • 43. Uranus = Spam [Dellschaft+Staab, WebSci 2010]. Effect of removing 257 spammers of 12,777 users from the ‘bookmark’ stream.
  • 44. Why care? The Bibsonomy Example
      • Complete snapshot of the Bibsonomy system
      • Manually labeled ground truth of spammers in the data set

                      Users     Tags      Resources   TAS (tag assignments)
      Spammers        29,248    297,846   1,197,354   13,258,759
      Non-Spammers    2,467     61,154    234,143     816,196

  • 45. Why care? The Delicious Example
      • Crawled during the TAGora project

      Users     Tags        Resources    TAS
      532,938   2,482,850   18,778,566   140,305,446

      • Number of spammers not known exactly
      • Estimation based on a random sample of 500 users: with 95% probability, between 1,972 and 12,949 spammers
      • Delicious most likely already applies spam detection
      • Why care about ~1.5% spammers in Delicious?
  • 46. Filtering Results (Users). Bar chart: number of spammers and non-spammers (series: Spammer, Non-Spammer; x-axis 1 to 20; y-axis 0 to 16000)
  • 47. Filtering Results (Tag Assignments). Bar chart: filtered and unfiltered number of TAS (series: Spam, Non-Spam; x-axis 1 to 20; y-axis 0 to 450000)
  • 48. That’s why: effect of removing 257 spammers of 12,777 users from the ‘bookmark’ stream
  • 49. How statistically significant is the epistemic model for normal users?
  • 50. Lessons learned
      • Uranus was discovered because it affected Neptune
      • Pluto was discovered because it affected Uranus!
      • Spammers can be discovered by their behavior, even if you do not know what kind of spam they are producing!
  • 51. How do constellations in the sky evolve? http://www.flickr.com/photos/furious-angel/2142647358/sizes/o/in/photostream/
  • 52. Agenda (as on slide 14)
  • 53. Example: Network (nodes: persons; edges: friendships)
  • 54. SUGGESTING WHOM TO LINK TO NEXT
  • 55. Use Networks for Recommendation
      • Goal: predict who a person will add as friend
      • Facebook's algorithm: find friends-of-friends → Problem: the rest of the network is ignored!
  • 56. Algebraic Graph Theory. Represent a network by an adjacency matrix A: $A_{ij} = 1$ when i and j are connected, $A_{ij} = 0$ when i and j are not connected. A is square and symmetric. Example network with nodes 1 to 6:

      $$A = \begin{pmatrix} 0&1&0&0&0&0 \\ 1&0&1&1&0&0 \\ 0&1&0&1&0&0 \\ 0&1&1&0&1&0 \\ 0&0&0&1&0&1 \\ 0&0&0&0&1&0 \end{pmatrix}$$

  • 57. Baseline: Friend of a Friend Model. Count the number of ways a person can be found as the friend of a friend. Consider the matrix product $AA = A^2$:

      $$A^2 = \begin{pmatrix} 0&1&0&0&0&0 \\ 1&0&1&1&0&0 \\ 0&1&0&1&0&0 \\ 0&1&1&0&1&0 \\ 0&0&0&1&0&1 \\ 0&0&0&0&1&0 \end{pmatrix}^2 = \begin{pmatrix} 1&0&1&1&0&0 \\ 0&3&1&1&1&0 \\ 1&1&2&1&1&0 \\ 1&1&1&3&0&1 \\ 0&1&1&0&2&0 \\ 0&0&0&1&0&1 \end{pmatrix}$$
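The friend-of-a-friend baseline can be checked with a short, dependency-free Python sketch. The edge list is read off the 6-node adjacency matrix on slide 56; the helper names (`adjacency`, `matmul`, `friend_of_friend_candidates`) are hypothetical.

```python
# Edges of the 6-node example network, read off the adjacency matrix.
EDGES = [(1, 2), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6)]


def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix for nodes numbered 1..n."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i - 1][j - 1] = 1
        a[j - 1][i - 1] = 1
    return a


def matmul(x, y):
    """Plain matrix product of two square lists-of-lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def friend_of_friend_candidates(a, user):
    """Non-friends of `user`, scored by the number of common friends.

    Entry (i, j) of A^2 counts the paths of length 2 from i to j.
    Returns (node, score) pairs, best first; ties are broken by node number.
    """
    a2 = matmul(a, a)
    u = user - 1
    scores = [(j + 1, a2[u][j]) for j in range(len(a))
              if j != u and a[u][j] == 0 and a2[u][j] > 0]
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


A = adjacency(6, EDGES)
for row in matmul(A, A):
    print(row)
print(friend_of_friend_candidates(A, 1))   # [(3, 1), (4, 1)]
```

The diagonal of A² holds each node's number of friends, which is why candidates exclude the user itself.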
  • 58. Eigenvalue Decomposition. Write the matrix A as a product $A = U \Lambda U^T$, where U contains the eigenvectors ($U^T U = I$) and Λ contains the eigenvalues ($\Lambda_{ij} = 0$ when $i \neq j$).
  • 59. Computing $A^2$. Use the eigenvalue decomposition $A = U \Lambda U^T$: $A^2 = U \Lambda U^T U \Lambda U^T = U \Lambda^2 U^T$. This exploits $U^T U = I$ (because U contains eigenvectors) and $(\Lambda^2)_{ii} = (\Lambda_{ii})^2$ (because Λ contains eigenvalues). Result: just square all eigenvalues!
  • 60. Friend of a Friend of a Friend. Compute the number of friends-of-friends-of-friends:

      $$A^3 = \begin{pmatrix} 0&1&0&0&0&0 \\ 1&0&1&1&0&0 \\ 0&1&0&1&0&0 \\ 0&1&1&0&1&0 \\ 0&0&0&1&0&1 \\ 0&0&0&0&1&0 \end{pmatrix}^3 = \begin{pmatrix} 0&3&1&1&1&0 \\ 3&2&4&5&1&1 \\ 1&4&2&4&1&1 \\ 1&5&4&2&4&0 \\ 1&1&1&4&0&2 \\ 0&1&1&0&2&0 \end{pmatrix}$$

      $A^3 = U \Lambda U^T U \Lambda U^T U \Lambda U^T = U \Lambda^3 U^T$
  • 61. Matrix Exponential. The matrix exponential can be written as a power sum with decreasing coefficients: $\exp(A) = I + A + \frac{1}{2} A^2 + \frac{1}{6} A^3 + \dots$ Example network with nodes 1 to 7:

      $$\exp \begin{pmatrix} 0&1&0&0&0&0&0 \\ 1&0&1&1&0&0&0 \\ 0&1&0&1&0&0&0 \\ 0&1&1&0&1&0&0 \\ 0&0&0&1&0&1&0 \\ 0&0&0&0&1&0&1 \\ 0&0&0&0&0&1&0 \end{pmatrix} = \begin{pmatrix} 1.66&1.72&0.93&0.98&0.28&0.06&0.01 \\ 1.72&3.57&2.70&2.93&1.04&0.29&0.06 \\ 0.93&2.70&2.86&2.71&0.99&0.28&0.06 \\ 0.98&2.93&2.71&3.63&1.95&0.76&0.22 \\ 0.28&1.04&0.99&1.95&2.35&1.59&0.64 \\ 0.06&0.29&0.28&0.76&1.59&2.23&1.38 \\ 0.01&0.06&0.06&0.22&0.64&1.38&1.59 \end{pmatrix}$$

      Recommendations for user ④: ① > ⑥ > ⑦ (scores 0.98, 0.76, 0.22)
  • 62. Why the Matrix Exponential
      • $A^n$ = number of paths of length n
      • $aA^2 + bA^3 + cA^4 + \dots$ = number of paths, weighted by path length
      • → New edges are more likely to appear when there are many paths already
      • → When a > b > c > … > 0, short paths are weighted more
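The recommendation for user ④ on slide 61 can be reproduced with a truncated power series. The sketch below is dependency-free Python; the edge list is read off the 7-node matrix on the slide, the function names are hypothetical, and cutting the series after 30 terms is a choice made here (the terms shrink like 1/k!, so the remainder is negligible for this small graph).

```python
# Edges of the 7-node example network, read off the adjacency matrix.
EDGES = [(1, 2), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6), (6, 7)]


def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix for nodes numbered 1..n."""
    a = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        a[i - 1][j - 1] = 1.0
        a[j - 1][i - 1] = 1.0
    return a


def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matrix_exponential(a, terms=30):
    """exp(A) = I + A + A^2/2! + A^3/3! + ..., truncated after `terms` terms."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]            # A^0 / 0!
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, a)]  # A^k / k!
        for i in range(n):
            for j in range(n):
                result[i][j] += term[i][j]
    return result


def recommend(a, user):
    """Rank the non-neighbors of `user` by their exp(A) score, best first."""
    e = matrix_exponential(a)
    u = user - 1
    candidates = [j for j in range(len(a)) if j != u and a[u][j] == 0.0]
    return [j + 1 for j in sorted(candidates, key=lambda j: -e[u][j])]


A = adjacency(7, EDGES)
print(recommend(A, 4))   # [1, 6, 7]
```

Unlike the friend-of-a-friend baseline, this score also ranks node 7, which is three steps away from node 4 and therefore invisible to A² alone.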
  • 63. Computing Power Series. Let p(A) be a power series: $p(A) = aA^2 + bA^3 + cA^4 + \dots = aU\Lambda^2U^T + bU\Lambda^3U^T + cU\Lambda^4U^T + \dots = U(a\Lambda^2 + b\Lambda^3 + c\Lambda^4 + \dots)U^T = U\,p(\Lambda)\,U^T$. Therefore: power series change only the eigenvalues!
  • 64. TRACKING THE EVOLUTION OF THE NETWORK AS A WHOLE
  • 65. Diversity vs. regularity
      • Diversity: many, equally-sized subcommunities; high entropy; ‘flat’ structure
      • Regularity: few large subcommunities; low entropy; many ‘hubs’
  • 66. Network Evolution
      • How did a network look at time t?
      • Idea: observe the change of diversity/regularity over time
  • 67. Outline: (1) power-law exponent, (2) weighted spectral distribution, (3) network entropy, (4) network rank
  • 68. 1. Power-law Exponent. The number of neighbors is unevenly distributed, which results in a power law (Newman 2006): $C(n) \sim n^{-\gamma}$. A higher exponent γ denotes less regularity. Example: Epinions trust network (Massa et al. 2005).
  • 69. 1. Power-law Exponent over Time. Epinions trust network (Massa et al. 2005): γ shrinks ⇒ the network becomes more regular.
  • 70. 2. Weighted Spectral Distribution. Consider the n×n matrix N defined by $N_{ij} = 1 / \sqrt{d(i)\,d(j)}$ when (i, j) is an edge and $N_{ij} = 0$ otherwise. The distribution of the eigenvalues of N is called the weighted spectral distribution (WSD) (Fay et al. 2010). Eigenvalues nearer to ±1: diversity. Eigenvalues nearer to 0: regularity.
  • 71. 2. Weighted Spectral Distribution over Time. CiteULike user–tag network (Emamy et al. 2007): the WSD shifts towards zero ⇒ the network becomes regular.
  • 72. 3. Network Entropy
      • Write the graph G as a sum of subgraphs $G_k$: $G = G_1 \cup G_2 \cup \dots \cup G_r$. Each $G_k$ has weighted edges, with total weight $\lambda_k$.
      • When picking an edge from G at random, the probability of it being in community $G_k$ is $\lambda_k / (\lambda_1 + \lambda_2 + \dots + \lambda_r) = \lambda_k / L$
      • The entropy of this distribution is (Kunegis et al. 2011) $H(G) = -\sum_k (\lambda_k / L) \log(\lambda_k / L)$
      • Entropy: effective number of subcommunities
  • 73. 3. Network Entropy over Time. Enron email network (Klimt et al. 2004). Plot of entropy H(G) over time t (absolute and zoomed view): entropy is constant ⇒ constant number of communities.
  • 74. 4. Network Rank
      • Decompose the network into subcommunities: $G = G_1 \cup G_2 \cup \dots \cup G_r$
      • The rank r is a measure of diversity: rank(G) = r
      • Weighted rank: $\mathrm{rank}^*(G) = \sum_k |G_k| / |G_1|$, a robust measure of diversity (Kunegis et al. 2011)
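Both diversity measures from slides 72 and 74 reduce to simple arithmetic once the subcommunity weights are known. The Python sketch below takes those weights as given and does not perform the decomposition itself. Two assumptions are made here: $|G_1|$ is read as the weight of the largest subcommunity, and exp(H) is used as one common way to turn an entropy into an "effective number"; neither is spelled out on the slides.

```python
import math


def network_entropy(weights):
    """H(G) = -sum_k (lambda_k / L) * log(lambda_k / L), with L = sum of weights."""
    total = sum(weights)
    return -sum((w / total) * math.log(w / total) for w in weights if w > 0)


def weighted_rank(weights):
    """rank*(G) = sum_k |G_k| / |G_1|.

    ASSUMPTION: |G_1| is the weight of the largest subcommunity.
    """
    return sum(weights) / max(weights)


# Diversity: four equally sized subcommunities.
flat = [1.0, 1.0, 1.0, 1.0]
# Regularity: one dominant subcommunity plus small ones.
hub = [10.0, 1.0, 1.0, 1.0]

for name, w in (("flat", flat), ("hub", hub)):
    h = network_entropy(w)
    print(name, round(h, 3), round(math.exp(h), 3), round(weighted_rank(w), 3))
```

For equally sized subcommunities both exp(H) and rank∗ equal the plain rank r; a dominant subcommunity pulls both below r, which matches the diversity versus regularity reading on slide 65.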
  • 75. 4. Network Rank over Time. Enron email network (Klimt et al. 2004). Plot of network rank rank∗(G) over time t.
      • Increasing network rank: increasing diversity
      • Shrinking network rank: shrinking diversity
  • 76. More Network Rank Plots: Epinions trust network, hep-th citations, Wikipedia elections, frwikibooks edits, MIT conference contacts, YouTube social network (biased towards good examples of convex evolution)
  • 77. Conclusion
      • Power-law exponent shrinks: connection diversity is shrinking
      • Weighted spectral distribution shifts to zero: emerging main components
      • Entropy is constant: effective number of communities is constant
      • Network rank increases, then shrinks: two-phase model of expansion
  • 78. Watch out! KONECT: Koblenz Network Collection. http://uni-koblenz.de/~kunegis/paper/kunegis- konect.poster.pdf Coming soon! Follow #ictrobust or @kunegis or @ststaab
  • 79. Why has the sky the density it has? (Flickr, cc, Oct 14 2007, Michael Donough)
  • 80. Why do tagging systems have so little spam? (diagram labels: administrative process, content process, community policy, user roles, content quality)
  • 81. Agenda (as on slide 14)
  • 82. Yahoo Answers
      • Ensure quality of user-generated content
      • Use of administrators and community moderators. How?
      • Policy influences community processes
  • 83. SURVEY OF GOVERNANCE MODELS
  • 84. Communities need Governance
      • Governance: steering and coordinating the actions of community members [Benz2004]
      • Goal: a successful and flourishing community, with high-quality user-generated content and active community members
      • Images: http://www.flickr.com/photos/61433480@N02/5593890914/, http://www.flickr.com/photos/boojee/3733902852/
  • 85. Motivation
      • Different types of web communities and of user-generated content (video, photos, comment, article, questions, answers, posting, review text)
      • What are the most successful means of governance for user-generated content?
      • Analyze successful platforms and compare their means of governance!
  • 86. Means of Governance
      • (1) Direct intervention of the community owner: affecting content or users based on apparent properties
      • (2) Functionality of the community platform
      • Diagram labels: text, reviews, bookmarks, ratings, abuse reports, assessment, user-generated content, content modification, community member, complex user roles, selection & ranking (ratings, score, time, views, replies), hide low quality
  • 87. Method
      • Selection of the 250 most prominent web sites with community functionality according to Alexa Page Rank
      • Clustering of the web sites into four groups according to purpose: social media, editorial news, social networking, social reviewing
      • Top-5 web sites of each group analyzed (*)
  • 88. Key Results
      • (1) Abuse reports are a successful means of governance: 16 occurrences; restricted to filtering out unwanted content; staff needed, expensive but efficient [Schwagereit2010]
      • (2) Simple ratings are dominant, but there is a battle between “Like” and “Like/Dislike”: “Like” 9 occurrences; “Like/Dislike” 7 occurrences; tradeoff between simplicity and improved ranking ability
  • 89. Key Results
      • (3) Creation time is the most implemented ranking criterion: 18 occurrences (others: score 8, ratings 6); important content is renewed, unimportant content will be forgotten
      • (4) Content modification and user roles are rarely implemented: 2 occurrences; requires a complex role system and users who understand it
  • 90. GOVERNANCE MODEL: DEEP DIVE - SIMULATION
  • 91. Methodology Principle
      • (1) Define a web community model (Lycos IQ, Yahoo Answers, …)
      • (2) Adapt this model to an existing community
      • (3) Estimate parameters
      • (4) Define a quality measure
      • (5) Simulate community behaviour
      • (6) Compare simulation results with real data
      • (7) Analyze quality measures with respect to variations of the CoSiMo parameters
  • 92. Dataset Lycos IQ: time period 909 days; users 34,327; administrators 36; questions 1,031,982; answers 2,996,446; deleted non-compliant answers 21,139
  • 93. Observed parameters (input to simulation). Plot: number of users (log scale, 1 to 100000) by answers per year (bins from 0-999 to >7000) and rate of compliant answers (bins from 0.0-0.09 to 0.9-1.0)
  • 94. Example Behaviors and Example Policies
      • Ordinary users: create new postings; read existing postings; report non-compliant postings OR give bonus points to the poster
      • Moderator users: create new postings; read existing postings; delete non-compliant postings OR give bonus points to the poster
      • Administrators: read existing postings; delete non-compliant postings
      • Reading policies for administrators:
          • PA: random selection of postings
          • PB: random selection of postings that no other administrator has examined so far
          • PC: selection of postings that were most often reported by users for being non-compliant
      • Promotion policy PM-X: ordinary users become moderators (who can delete postings) when having at least X bonus points
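The reading policies PA, PB and PC and the promotion policy PM-X from slide 94 can be expressed as small selection functions. The Python sketch below is only an illustration of the policy definitions, not the CoSiMo simulator; the posting fields (`id`, `reports`, `examined`) and the function names are hypothetical.

```python
import random


def select_posting(postings, policy, rng=random):
    """Pick the posting an administrator reads next under a reading policy.

    postings: list of dicts with the (hypothetical) keys
              'id', 'reports' (number of user reports) and
              'examined' (True if an administrator has already read it).
    """
    if policy == "PA":
        # PA: random selection of postings.
        return rng.choice(postings)
    if policy == "PB":
        # PB: random selection among postings no administrator has examined.
        unexamined = [p for p in postings if not p["examined"]]
        return rng.choice(unexamined) if unexamined else None
    if policy == "PC":
        # PC: the posting most often reported by users as non-compliant.
        return max(postings, key=lambda p: p["reports"])
    raise ValueError(f"unknown policy: {policy}")


def is_moderator(bonus_points, x):
    """PM-X: a user becomes a moderator with at least x bonus points."""
    return bonus_points >= x


postings = [
    {"id": 1, "reports": 0, "examined": True},
    {"id": 2, "reports": 5, "examined": True},
    {"id": 3, "reports": 1, "examined": False},
]
print(select_posting(postings, "PC")["id"])   # 2
print(select_posting(postings, "PB")["id"])   # 3
print(is_moderator(30, 25))                   # True
```

PC is the only policy that depends on reports by ordinary users, so its usefulness in a simulation depends on how reliably users report.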
  • 95. How many administrators are needed? Plot: recent posting quality (0.65 to 1.05) by additional non-compliant postings per day (5 to 2560) and number of administrators (1 to 1152)
  • 96. Fighting spam with administrators… Plot: recent posting quality (0.99 to 1) by number of administrators (1 to 576) and applied policies. Variation of policies and number of administrators:
      • Efficient policies result in high-quality content
      • A minimum of 18 administrators is needed
      • Many moderators are needed to bring the quality to a high level
  • 97. Fighting spam with user moderators… Plot: recent posting quality (0.6 to 1) by additional non-compliant postings per day (5 to 2560) and applied policies (PA, PA+PB, PA+PB+PC, and PA+PB+PC with PM-X variants from PM12 to PM800). Variation of policies and posting quality:
      • A limited number of administrators has a limited capacity for filtering a surge of non-compliant postings
      • Moderators help to increase quality
  • 98. Lessons Learned
      • The strategy for selecting questionable postings is crucial
      • Reporting by normal users is the most effective strategy
      • Moderators are not as effective as expected if they hunt only incidentally for non-compliant content
      • Sufficiently strong requirements regarding moderator profiles lead to high quality of moderators
      • Policies for promoting users need to be based on a criterion that is time dependent
  • 99. Agenda (as on slide 14)
  • 100. Are we satisfied here? No! Not by far!
      • Understand how and why users tag or tweet? → What are people's limitations that affect the system? → Psychology and sociology!
      • What are their legal boundaries? → How can you shape the systems? → Law!
      • What are organizations' incentives? → Why and how do organizations participate? → Nice example: open source → Economics
  • 101. Thank You!
  • 102. References
      • J. Kunegis, A. Lommatzsch and C. Bauckhage. The Slashdot Zoo: Mining a social network with negative edges. In Proc. World Wide Web Conf., pp. 741–750, 2009.
      • J. Kunegis and A. Lommatzsch. Learning spectral graph transformations for link prediction. In Proc. Int. Conf. on Machine Learning, pp. 561–568, 2009.
      • J. Kunegis, S. Schmidt, A. Lommatzsch and J. Lerner. Spectral analysis of signed graphs for clustering, prediction and visualization. In Proc. SIAM Int. Conf. on Data Mining, pp. 559–570, 2010.
      • J. Kunegis, D. Fay and C. Bauckhage. Network growth and the spectral evolution model. In Proc. Conf. on Information and Knowledge Management, pp. 739–748, 2010.
  • 103. References
      • B. Viswanath, A. Mislove, M. Cha and K. P. Gummadi. On the evolution of user interaction in Facebook. In Proc. Workshop on Online Social Networks, pp. 37–42, 2009.
  • 104. References
      • K. Dellschaft and S. Staab. An Epistemic Dynamic Model for Tagging Systems. In HYPERTEXT 2008, Proc. of the 19th ACM Conference on Hypertext and Hypermedia, June 19-21, 2008, Pittsburgh, Pennsylvania, USA.
      • K. Dellschaft and S. Staab. On Differences in the Tagging Behavior of Spammers and Regular Users. In Proc. of WebSci-2010, Raleigh, US, April 2010.
      • F. Schwagereit, S. Sizov and S. Staab. Finding Optimal Policies for Online Communities with CoSiMo. In Proc. of WebSci-2010, Raleigh, US, April 2010.
