The Troll under the Bridge: Data Management for Huge Web Science Mediabases

Zinayida Petrushyna, Ralf Klamma
RWTH Aachen University
Workshop “Digital Social Networks”, Munich
September 12, 2008

Published in: Technology, Education

  • The Troll under the Bridge: Data Management for Huge Web Science Mediabases

    1. The Troll under the Bridge: Data Management for Huge Web Science Mediabases. Zinayida Petrushyna, Ralf Klamma, RWTH Aachen University. Workshop “Digital Social Networks”, Munich, September 12, 2008.
    2. Agenda
        • Motivation & Problem definition
        • Data Management for Web Science
            • Crawling: Watchers
            • Analysis: Patterns
            • Visualization: Graphs
        • Conclusion
        • Outlook
    3. Data Management issues in Web Science
        • Interoperable formats
            • XML based: Wikis, RSS Feeds, Microformat
            • SQL based: Deep Web
            • Text based: Websites, Forums
        • Non-continuous analysis
            • Crawling vs. Dumps
            • Special purpose vs. General purpose
        • Aggregation level is not possible to achieve
            • Data warehouses
            • Theoretical considerations of agency: Actor network theory
    4. Data Model for the Web 2.0 (Latour: On Recalling ANT, 1999)
    5. Mediabase
        • A Mediabase is a six-tuple graph
    6. Actors in the Mediabase
    7. Crawling Technologies
        • Mix of dumps (Wikis) and special purpose crawlers
    8. Trolls under the Bridge
        • What is a disturbance, e.g. a troll?
            • Sensing an incompatibility between espoused theories and theories-in-use
        • Disturbances are starting points of learning processes
            • Disturbances disturb, prevent … but they also create reflection
        • Disturbances are hard to detect or to forecast
    9. Complex Troll Pattern in Basic Notation
    10. Complex Troll Pattern in Basic Notation
    11. Pattern Language
        • Variables: simple variables (troll, thread), properties (thread.author) and set variables (v1,…,vn)
        • Operations
            • Arithmetic (+, -, *, /)
            • Aggregate (SUM, COUNT, AVERAGE)
            • Logical (&, |, ~, FORALL and EXISTS)
            • Comparison (=, !=, >, <)
        • Rules for variable binding
            • Simple variables: pattern parameters, actors or set variables
            • Properties: actor properties or relations
            • Set variables: actors
        • Interpreted by a finite state automaton
    12. Pattern Language for PALADIN: Example Troll
        • Troll Pattern: This pattern tries to discover the cases when a troll exists in a digital social network. A troll in the network is considered a disturbance.
        • Disturbance:
            (EXISTS [medium | medium.affordance = threadArtefact]) &
            (EXISTS [troll |
                (EXISTS [thread | (thread.author = troll) &
                    (COUNT [message | (message.author = troll) &
                                      (message.posted = thread)]) > minPosts]) &
                (~EXISTS [thread1, message1 | (thread1.author != troll) &
                    (message1.author = troll & message1.posted = thread1)])])
        • Forces: medium; troll; network; member; thread; message; url
        • Force Relations: neighbour(troll, member); own thread(troll, thread)
        • Solution: No attention must be paid to the discussions started by the troll.
        • Rationale: The troll needs attention to continue its activities. If no attention is paid, he/she will stop participating in the discussions.
        • Pattern Relations: Associates Spammer pattern.
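To make the Disturbance expression on slide 12 concrete, here is a minimal Python sketch that evaluates the same two conditions over toy data. It is not PALADIN code: the `Thread` and `Message` classes, the `find_trolls` function and the sample data are hypothetical names introduced for illustration. The sketch also assumes all data come from a single medium that supports threads, so the first clause (`medium.affordance = threadArtefact`) is taken as given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thread:
    id: str
    author: str


@dataclass(frozen=True)
class Message:
    author: str
    posted: str  # id of the thread the message was posted in


def find_trolls(threads, messages, min_posts):
    """Return, sorted, the thread authors that satisfy both troll conditions."""
    thread_author = {t.id: t.author for t in threads}
    trolls = set()
    for candidate in {t.author for t in threads}:
        # EXISTS [thread | thread.author = troll & COUNT[message ...] > minPosts]
        busy_in_own_thread = any(
            t.author == candidate
            and sum(
                1 for m in messages
                if m.author == candidate and m.posted == t.id
            ) > min_posts
            for t in threads
        )
        # ~EXISTS [thread1, message1 | thread1.author != troll
        #          & message1.author = troll & message1.posted = thread1]
        posts_in_foreign_thread = any(
            m.author == candidate
            and m.posted in thread_author
            and thread_author[m.posted] != candidate
            for m in messages
        )
        if busy_in_own_thread and not posts_in_foreign_thread:
            trolls.add(candidate)
    return sorted(trolls)


# Toy data (invented): alice posts only in her own thread, bob also posts in alice's.
threads = [Thread("t1", "alice"), Thread("t2", "bob")]
messages = [
    Message("alice", "t1"),
    Message("alice", "t1"),
    Message("alice", "t1"),
    Message("carol", "t1"),
    Message("bob", "t1"),
    Message("bob", "t2"),
]

print(find_trolls(threads, messages, min_posts=2))
```

With `min_posts=2` only `alice` matches: she has three messages in her own thread and none in anyone else's. Raising `min_posts` to 3 removes her, which shows how strongly the result depends on the pattern parameter.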
    13. Pattern Discovery Process
        • 1. Set pattern parameters
        • 2. Instantiate disturbances
        • 3. Evaluate disturbances
        • 4a. Change Pattern Parameters
        • 4b. Apply Pattern Solution
        • [Diagram labels: Digital Social Network, Pattern, Pattern Template, Pattern Template Instance, Pattern Instance, Disturbance, Disturbance Instances, Variables, Pattern Parameters, Forces, Force Relations, Rationale, Dependencies, Description, Solution, Pattern Relations]
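The discovery process on slide 13 reads as a loop: set parameters, instantiate disturbances, evaluate them, then either change the parameters (4a) or apply the pattern solution (4b). The sketch below is one possible reading of that loop, not the authors' implementation. The function `discover`, its arguments (`detect`, `acceptable`, `adjust`, `max_rounds`) and the post counts are all assumptions made for illustration; the slides do not say how disturbances are evaluated.

```python
def discover(detect, params, acceptable, adjust, max_rounds=10):
    """Repeat steps 2-4a until the evaluation accepts the instances."""
    for _ in range(max_rounds):
        instances = detect(**params)   # 2. instantiate disturbances
        if acceptable(instances):      # 3. evaluate disturbances
            return params, instances   # 4b. caller applies the pattern solution
        params = adjust(params)        # 4a. change pattern parameters
    return params, []                  # no acceptable instances found


# Invented example: number of posts each member made in their own threads.
post_counts = {"alice": 12, "bob": 3, "carol": 7}


def detect(min_posts):
    """Hypothetical detector: members with more than min_posts posts."""
    return sorted(a for a, n in post_counts.items() if n > min_posts)


final_params, found = discover(
    detect,
    params={"min_posts": 1},                                  # 1. set pattern parameters
    acceptable=lambda inst: len(inst) == 1,                   # assumed analyst criterion
    adjust=lambda p: {"min_posts": p["min_posts"] + 3},
)
print(final_params, found)
```

Starting from `min_posts=1`, the loop raises the threshold in steps of 3 until exactly one member remains. In practice the acceptance criterion would come from the analyst's goals, which the outlook slide lists as an open issue.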
    14. Visualization
    15. Conclusions and Outlook
        • Homogeneous data management
        • Pattern language for disturbance analysis
        • Graph-based visualization
        • Data uncertainty and inconsistent data
        • Goals and intentions of analysts
        • Dynamic Mediabase visualization