The MediaBase
 

A Webinar for the TELMAP project
December 16, 2010

Ralf Klamma
RWTH Aachen University
Information Systems & Database Technology

    The MediaBase: Presentation Transcript

    • The MediaBase. Ralf Klamma, Informatik 5 (DBIS), RWTH Aachen University. Webinar, December 16, 2010. TeLLNet, GALA. Lehrstuhl Informatik 5 (Informationssysteme), Prof. Dr. M. Jarke.
    • The Overall Approach
    • What is unique about the MediaBase?
      • Interdisciplinary multidimensional model of digital networks
        - Social network analysis (SNA) defines measures for social relations
        - Actor network theory (ANT) connects human and media agents
        - The i* framework defines strategic goals and dependencies
        - The theory of media transcriptions studies cross-media knowledge
      • [Figure: media networks. Social software: Wiki, Blog, Podcast, IM, Chat, Email, Newsgroup, Chat, … Network of artifacts: Microcontent, Blog entry, Message, Burst, Thread, Comment, Conversation, Feedback (Rating). i*-Dependencies (Structural, Cross-media). Network of members (Social Network Analysis: Centrality, Efficiency). Further labels: Community, Members, Communities of practice.]
    • Modeling Dependencies Using the i* Framework
      • [Figure: i* model. Labels: Coordination, Iterant, Coordinator, Broker, Member, Gatekeeper, Artifact, URL, Hub, connected by isA relations. Legend: Agent, Goal, Communication Network, Resource, Task.]
      • Reference: Eric S. K. Yu, Towards Modeling and Reasoning Support for Early-Phase Requirements Engineering, RE 1997
    • What can you do with the MediaBase?
      • Community interface (Firefox plug-in)
        - Adding media for crawling, searching & viewing
        - Observing social networks over time
        - Retrieving structural patterns of media
        - Applying Web 2.0 operations (tagging, etc.) on media
      • Writing your own crawlers
      • Applying all kinds of social network measures
        - Centrality measures
        - Finding influential & powerful persons
        - Network statistics
        - Understanding networks at large
      • Advanced queries in the RDF store on concepts and relations
        - Who is the owner of company x?
        - Structured input for conceptual mapping tools
    • What is the MediaBase?
      • Collection of social software artifacts:
        - Mailing lists (>200 k)
        - Blogs (>300 k)
        - Websites
        - Newsletters
        - Wikipedias
        - RSS feeds
        - Forums
        - …
      • The MediaBase
        - IBM DB2 data store
        - 24/7 Perl crawlers for media artifacts
        - Community-oriented Commander interface
        - Social network analysis & visualization tools
        - PALADIN: a pattern language for automatic behavior detection
        - Automatic extraction of concepts and relations in RDF
    • The Data Model
    • MediaBase Model
      • A MediaBase is a six-tuple graph M = (A, R, µ, ν, η, L) with
        - R ⊆ A × A
        - µ: A → L
        - ν: R → L
        - η: R → {0, 1}
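The six-tuple can be made concrete with a small data structure. The following is a minimal sketch, not the MediaBase implementation: it assumes A is the set of actors, R the set of relations, L the label set, and it treats η only as a 0/1 flag per relation, because the slide does not say what the flag means. The class name, the field names and the example actors are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MediaBaseGraph:
    """Illustrative container for M = (A, R, mu, nu, eta, L)."""
    actors: set            # A
    relations: set         # R, expected to be a subset of A x A
    actor_label: dict      # mu: A -> L
    relation_label: dict   # nu: R -> L
    relation_flag: dict    # eta: R -> {0, 1} (meaning not given on the slide)
    labels: set            # L

    def violations(self):
        """Return a list of constraints from the definition that do not hold."""
        problems = []
        for (a, b) in self.relations:
            if a not in self.actors or b not in self.actors:
                problems.append(f"relation {(a, b)} is not in A x A")
        for a in self.actors:
            if self.actor_label.get(a) not in self.labels:
                problems.append(f"mu({a!r}) is not in L")
        for r in self.relations:
            if self.relation_label.get(r) not in self.labels:
                problems.append(f"nu({r!r}) is not in L")
            if self.relation_flag.get(r) not in (0, 1):
                problems.append(f"eta({r!r}) is not in {{0, 1}}")
        return problems


# Hypothetical toy instance: a mailing list stores a message created by a member.
toy = MediaBaseGraph(
    actors={"dev-list", "msg-1", "ann"},
    relations={("dev-list", "msg-1"), ("ann", "msg-1")},
    actor_label={"dev-list": "Medium", "msg-1": "Artifact", "ann": "Agent"},
    relation_label={("dev-list", "msg-1"): "stores", ("ann", "msg-1"): "creates"},
    relation_flag={("dev-list", "msg-1"): 1, ("ann", "msg-1"): 1},
    labels={"Medium", "Artifact", "Agent", "stores", "creates"},
)
```

Calling `toy.violations()` returns an empty list; adding a relation that points at an unknown actor makes the check report it.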
    • Simplified Meta Model
      • [Figure: simplified meta model. Labels: Actor has Attribute; Medium, Artifact, Process, Agent, Community (isA). Relation labels: stores, creates, is affected by, belongs to, represents, consumes, performs, ranks. Further labels (isA): Browse, Address, Transcribe, …, Localize.]
      • Reference: Latour, On Recalling ANT, 1999
    • Actors in the MediaBase
      • A ⊆ {Medium, Artifact, Process, Agent, Network}
      • Medium ⊆ {Mailing list, Newsletter, Newsgroup, Feed, Website, Blog, Podcast, Chat room, Wiki, Forum, Social bookmarking site, Folksonomy}
      • Artifact ⊆ {Message, E-mail, Index, Comment, RSS Entry, Transaction, Host, Feedback, Conversation, Burst, Blog entry, Thread, Executions, Tag, Trackback, Review, URL, Rating, Multimedia, Ranking, Reference}
      • Process ⊆ {Acquisition, Search, Monitoring, Retrieval, Transcription, Addressing}
      • Agent ⊆ {Administrator, Member, Lurker, Reviewer, Dead, Answering person, Questioner, Troll, Spammer, Conversationalist, Expert}
    • Medium-Artifact Compatibility (+ compatible, - not compatible)
      Artifact      Email  Mailing List  Blog  Transaction-based Website  Wiki  Chat Room  URL  Forum
      Message       +      +             -     -                          -     -          -    +
      Thread        -      +             -     -                          +     -          -    +
      Burst         +      +             +     +                          +     -          -    +
      Conversation  -      -             -     -                          -     +          -    +
      Blog Entry    -      -             +     -                          -     -          -    -
      Comment       -      -             +     +                          +     -          -    +
      Web Page      -      -             -     -                          +     -          +    -
      Transaction   -      -             -     +                          -     -          -    -
      Feedback      -      -             -     +                          -     -          -    +
    • The Crawlers
    • Crawling Technologies
      • Mix of dumps (wikis) and special-purpose crawlers:
        - W = Media ∪ Artifact
        - I = Media ∪ Artifact ∪ Process ∪ Agent
        - G = Media ∪ Artifact ∪ Process ∪ Agent ∪ Network
        - MW = Mailing list ∪ Message ∪ Thread ∪ Index
        - BW = Blog ∪ Blogroll ∪ Blogentry ∪ Comment ∪ Index
    • Crawler Overview [figure]
    • Website Crawler [figure]
    • Feed Crawler [figure]
    • Mailing List Crawler [figure]
    • News Crawler [figure]
    • Podcast Crawler [figure]
    • The MediaBase Commander
    • MediaBase Web 2.0 Commander
      • Personalization (users annotate resources with tags and have their own page)
      • Community awareness (resources and annotations of others are open)
      • User-friendly interface (Firefox plug-in, easy insertion of resources, tags, tracking of recent changes)
    • Application Programmer Interfaces (under development)
      • GraphService: visualization and PALADIN
        - http://dbis.rwth-aachen.de/~atlas/module_build/JavaDoc//atlas_las_services_graph-service/HEAD/javadoc/index.html
      • TargETLy Service: RDF data generator
        - http://dbis.rwth-aachen.de/~atlas/module_build/JavaDoc/atlas_theses_da_krenge_TargETLy2/HEAD/javadoc/index.html
    • GraphService
      • AbstractDigitalNetwork: representation of the meta model
      • Classes for networks: blogs, mailing lists, etc.
      • Classes for basic SNA
      • Classes for pattern analysis
      • Classes for graph layout
    • TargETLy Service
      • Connection to the RDF store
      • OpenCalais service: RDF generator
      • Pattern analysis
      • Intent analysis
      • Collection of predefined RDF queries
        - e.g. companyCompetitor, companyEmployeeNumber
        - e.g. patentFiling, patentIssuance
        - e.g. personEmailAddress, creditRating
    • PALADIN: Pattern Analysis
    • PALADIN: Disturbances in Cross-media Social Networks
      • What is a disturbance?
        - Sensing an incompatibility between espoused theories and theories-in-use
      • Disturbances are starting points of learning processes
        - Disturbances disturb, prevent … but they also create reflection
      • Disturbances are hard to detect or to forecast
    • Pattern Language for PALADIN: Example Troll
      • Troll Pattern: This pattern tries to discover the cases when a troll exists in a digital social network. A troll in the network is considered a disturbance.
      • Disturbance:
          (EXISTS [medium | medium.affordance = threadArtefact]) &
          (EXISTS [troll |
            (EXISTS [thread | (thread.author = troll) &
              (COUNT [message | (message.author = troll) & (message.posted = thread)]) > minPosts]) &
            (~EXISTS [thread1, message1 | (thread1.author != troll) &
              (message1.author = troll) & (message1.posted = thread1)])])
      • Forces: medium; troll; network; member; thread; message; url
      • Force Relations: neighbour(troll, member); own thread(troll, thread)
      • Solution: No attention must be paid to the discussions started by the troll.
      • Rationale: The troll needs attention to continue its activities. If no attention is paid, he/she will stop participating in the discussions.
      • Pattern Relations: Associates Spammer pattern.
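To make the troll disturbance expression easier to follow, here is a minimal sketch of the same logic over in-memory data. It is not PALADIN's evaluator: the `Thread` and `Message` records and the `find_trolls` function are hypothetical, and the check that the medium affords threads is assumed to hold already. An author is reported when they posted more than `min_posts` messages in at least one thread they started and never posted in a thread started by someone else.

```python
from collections import Counter, namedtuple

# Hypothetical records; the real MediaBase schema is not shown on the slides.
Thread = namedtuple("Thread", ["thread_id", "author"])
Message = namedtuple("Message", ["author", "thread_id"])


def find_trolls(threads, messages, min_posts):
    """Return authors matching the troll disturbance, sorted by name."""
    thread_author = {t.thread_id: t.author for t in threads}
    own_posts = Counter()     # (author, thread_id) -> posts in the author's own thread
    posts_elsewhere = set()   # authors with at least one post in a foreign thread
    for m in messages:
        if thread_author[m.thread_id] == m.author:
            own_posts[(m.author, m.thread_id)] += 1
        else:
            posts_elsewhere.add(m.author)
    # EXISTS thread with COUNT(messages by the author) > minPosts
    heavy = {author for (author, _), count in own_posts.items() if count > min_posts}
    # ~EXISTS message by the author in a thread of another author
    return sorted(heavy - posts_elsewhere)


threads = [Thread("t1", "eve"), Thread("t2", "bob"), Thread("t3", "carol")]
messages = (
    [Message("eve", "t1")] * 3        # eve only posts in her own thread
    + [Message("carol", "t3")] * 3    # carol is active in her own thread ...
    + [Message("carol", "t2")]        # ... but also joins bob's thread
    + [Message("bob", "t2"), Message("alice", "t1")]
)
trolls = find_trolls(threads, messages, min_posts=2)
```

With this toy data only `eve` satisfies both conditions; `carol` is excluded because she also posts in a thread she did not start.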
    • Pattern Discovery Process
      • [Figure: pattern discovery process. Steps: 1. Set pattern parameters; 2. Instantiate disturbances; 3. Evaluate disturbances; 4a. Change pattern parameters; 4b. Apply pattern solution. Elements: Pattern Template, Pattern Instance, Digital Social Network, Disturbance Instances. Pattern fields: Disturbance, Variables, Parameters, Forces, Force Relations, Description, Solution, Rationale, Dependencies, Pattern Relations.]
    • PALADIN Case Study
      • 10 patterns of disturbance over 119 social network instances, 17,359 individuals, 215,345 mails
      • Pattern (occurrences): remarks
        - Burst (22): The pattern identifies topics that were very important for a certain period of time. Scalability is necessary.
        - No Conversationalist (76): The existence implies little communication in the network.
        - No Questioner (67): The existence implies that the network is not popular.
        - No Answering Person (61): Occurs in small networks. The effects of the lack of an answering person must be further checked with content analysis.
        - Troll (2): Trolls occur very rarely in cultural communities. True negatives exist.
        - Spammer (86): Spammers can often be found in discussion groups. False positives exist.
        - Leader (37): The pattern occurs in networks centered around a member.
        - No Leader (40): Occurs in big networks where the members are distributed in different clusters.
        - Structural Hole (67): Occurs for members having neighbors with only one contact.
        - Independent Discussions (13): Occurs in large networks where disconnected subnetworks exist. Scalability is necessary.
    • Visualization & Analysis
    • Social Network Analysis of Open Source Communities
      • Eclipse components network based on analysis of the source code repository (software architecture)
      • Eclipse components network based on analysis of mailing list communication (social structure)
    • Community Reflection about Development Process
      • Social platform: Eclipse forum eclipsezone
      • Forum: Eclipse communication framework (ECF)
      • Measure: degree centrality
      • Statistics: 225 nodes, 283 edges
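Degree centrality, the measure used on this slide, can be sketched in a few lines. This is a minimal illustration, assuming an undirected network given as a list of edges and the common normalization by n - 1; the function name and the toy edges are hypothetical, and the slides do not say which normalization the MediaBase tools apply.

```python
def degree_centrality(edges):
    """Normalized degree centrality for an undirected edge list."""
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops in this sketch
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    # Degree divided by the maximum possible degree (n - 1).
    return {node: len(adjacent) / (n - 1) for node, adjacent in neighbors.items()}


# Hypothetical reply network: ann answered three other forum members.
centrality = degree_centrality([("ann", "bob"), ("ann", "cat"), ("ann", "dan")])
```

In this toy star network `ann` gets the maximum value 1.0 and every other member gets 1/3.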
    • Conversationalist Pattern
      • Social platform: Eclipse mailing list
      • Forum: Device debugging developer discussion
    • Questioner Pattern
      • Social platform: Eclipse mailing list
      • Forum: Device debugging developer discussion
    • Identification of End-Users and Developers in OSS Communities
      • [Figure: community clustering]
    • Textual Analysis of Postings from Community Experts
      • Postings from experts of one of the identified communities
    • Computer Science Knowledge Network: the Visualization [figure]
    • Computer Science Knowledge Network: Clustering [figure]
    • Interdisciplinary Venues: Top Betweenness Centrality [figure]
    • High Prestige Series: Top PageRank [figure]
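Ranking series by PageRank can be illustrated with a plain power iteration. This is a minimal sketch under stated assumptions: the damping factor 0.85, the fixed iteration count and the toy citation graph are illustrative choices, not values taken from the slides, and the function name is hypothetical.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Power-iteration PageRank on a dict: node -> list of cited nodes."""
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread over all nodes.
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for u, targets in out_links.items():
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
        rank = new_rank
    return rank


# Hypothetical venue citation graph: series A and B cite C, and C cites A.
ranks = pagerank({"A": ["C"], "B": ["C"], "C": ["A"]})
top_series = max(ranks, key=ranks.get)
```

In the toy graph the series cited by both others ends up with the highest score, and the scores sum to 1.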
    • Data Sets
      • DBLP (http://www.informatik.uni-trier.de/~ley/db/)
        - 788,259 author names
        - 1,226,412 publications
        - 3,490 venues (conferences, workshops, journals)
      • CiteSeerX (http://citeseerx.ist.psu.edu/)
        - 7,385,652 publications (including publications in reference lists)
        - 22,735,240 citations
        - Over 4 million author names
      • Combination
        - Canopy clustering [McCallum 2000]
        - Result: 864,097 matched pairs
        - On average: venues cite 2306 and are cited 2037 times
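Canopy clustering groups records with a cheap distance before any expensive matching is attempted. The sketch below shows the general technique only; how DBLP and CiteSeerX records were actually compared is not described on the slide. The token-based Jaccard distance, the two thresholds and the sample titles are assumptions made for illustration.

```python
def jaccard_distance(a, b):
    """Cheap distance between two titles based on shared lowercase tokens."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return 1.0 - len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def canopy_clusters(items, loose, tight, distance=jaccard_distance):
    """Build overlapping canopies; requires 0 <= tight <= loose."""
    if not 0 <= tight <= loose:
        raise ValueError("expected 0 <= tight <= loose")
    remaining = list(items)
    canopies = []
    while remaining:
        center = remaining[0]
        # Every item within the loose threshold joins this canopy.
        canopies.append([x for x in items if distance(center, x) <= loose])
        # Items within the tight threshold can no longer become centers.
        remaining = [x for x in remaining if distance(center, x) > tight]
    return canopies


titles = [
    "social network analysis of wikis",
    "Social Network Analysis of Wikis",
    "pattern language for disturbances",
]
canopies = canopy_clusters(titles, loose=0.6, tight=0.3)
```

Only records that share a canopy would then be compared with a more expensive matcher, which keeps the number of pairwise comparisons manageable for data sets of this size.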
    • WikiWatcher: System Design
      • Stage 1: SAX-based parser in Perl
        - Generating XML dump/export files
        - Parsing wiki data / database transfer into an RDB
        - Wiki network data: authors, article pages, URLs, revisions
      • Stage 2: Dynamic analysis and visualization
        - Generating networks, measurement, metadata
        - Visualization, network analysis
      • [Figure: example authors Joe, Liz, Tim and 123.45.67.89; example article links [[Article]], [[requested]], [http://…], [[Article2]], [[never exists]]]
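The first stage is described as a SAX-based parser written in Perl. As a rough illustration of the same streaming idea, here is a minimal sketch in Python using the standard `xml.sax` module, not the WikiWatcher code. It assumes a MediaWiki-style export in which each `page` has a `title` and each revision names its contributor in a `username` or `ip` element; the handler class and the sample document are hypothetical.

```python
import xml.sax


class RevisionHandler(xml.sax.ContentHandler):
    """Collects (article title, contributor) pairs while streaming the dump."""

    def __init__(self):
        super().__init__()
        self.edits = []
        self._title = None
        self._text = []

    def startElement(self, name, attrs):
        self._text = []  # start collecting character data for this element

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        value = "".join(self._text).strip()
        if name == "title":
            self._title = value
        elif name in ("username", "ip"):
            # Registered users carry a username, anonymous users an IP address.
            self.edits.append((self._title, value))
        self._text = []


def parse_dump(xml_bytes):
    handler = RevisionHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.edits


SAMPLE = (
    b"<mediawiki><page><title>Article</title>"
    b"<revision><contributor><username>Joe</username></contributor></revision>"
    b"<revision><contributor><ip>123.45.67.89</ip></contributor></revision>"
    b"</page></mediawiki>"
)
edits = parse_dump(SAMPLE)
```

The resulting pairs are the raw material for author networks: two contributors who edited the same article in a period can be connected by an edge.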
    • Network Heterogeneity
      • Author networks
        - Author nodes (anonymous/registered users)
        - Edges represent collaboration between authors during a period t
      • Article networks
        - Article nodes (incl. wiki namespaces)
        - Directed edges (links) between articles
      • As expected, both kinds of networks stay heterogeneous
    • Importance of Network Actors
      • Articles: high betweenness centrality controls the flow of information within a wiki
      • Betweenness values grow or stay nearly constant during the evolution process
      • Determines
        - Important actors
        - Important articles
        - Vandalism
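Betweenness centrality counts how often a node lies on shortest paths between other nodes. The sketch below uses Brandes' algorithm for unweighted directed graphs, which fits article networks with directed links; it is an illustration, not the MediaBase implementation. The function name and the toy link structure are hypothetical, and values are left unnormalized.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness for a directed graph: node -> list of successors.

    Every node must appear as a key of `adj`, even if it has no outgoing links.
    """
    score = {v: 0.0 for v in adj}
    for source in adj:
        order = []                        # nodes in order of discovery
        preds = {v: [] for v in adj}      # predecessors on shortest paths
        sigma = {v: 0 for v in adj}       # number of shortest paths from source
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back to the source.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                score[w] += delta[w]
    return score


# Hypothetical article links: "Start" links to "Hub", which links to two pages.
links = {"Start": ["Hub"], "Hub": ["PageA", "PageB"], "PageA": [], "PageB": []}
scores = betweenness(links)
```

In the toy graph only `Hub` lies between other articles, so it is the only node with a non-zero score.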
    • Evolution of Shortest Paths
      • Densification power law: complex networks may become denser during their growth
      • Generally, this could not be verified for wiki author networks!
      • The average distances stagnate at nearly 2 for all considered author networks
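The average distance referred to here is the mean shortest-path length over connected pairs of nodes. The following is a minimal sketch using breadth-first search on an unweighted graph; the function name and the toy author network are hypothetical, and unreachable pairs are simply left out of the average, which is one common convention rather than necessarily the one used in the study.

```python
from collections import deque


def average_distance(adj):
    """Mean shortest-path length over all ordered pairs that are connected."""
    total = 0
    pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1   # reachable nodes other than the source
    return total / pairs if pairs else 0.0


# Hypothetical author network: one very active author collaborating with three others.
authors = {
    "hub": ["a", "b", "c"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
}
avg = average_distance(authors)
```

For this star-shaped toy network the result is 1.5, and it approaches 2 as more authors connect only through the hub, which shows how a few central authors can keep average distances near 2.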
    • Evolution of Author Networks
      • Strongly connected components merged by collaboration of two wiki authors
      • [Figures: author network of German Wikia in July 2007; author network of German Wikia in August 2007]
    • Visualization & Analysis
    • What you cannot do with the MediaBase (at the moment)
      • Creating a new MediaBase in a new environment
        - Maintenance with databases, scripts and interfaces is tedious
        - Interfaces integrated into Zope/Plone
      • Not all media are equally supported
        - Very good support for mailing lists, forums, web sites and blogs
        - Less support for wikis, podcasts, social bookmarks
      • Lacking support for
        - Conceptual navigation interface (Conzilla!)
        - Discourse management tools
        - Weak signal analysis tools
        - Topic & sentiment & opinion mining tools
        - Automatic generation of recommendations
    • The Future of the MediaBase: CommunityBase
      • [Figure: Self-modeling, Community experience repository and Self-reflection, with disturbance labels "+ disturbance", "+/- disturbance" and "- disturbance" [PeKl08]. Theories shown: Activity Theory [Enge87], Actor Network Theory [Lato05], Community of Practice [Weng98].]
      • The self-modeling phase contributes to the self-reflection phase and vice versa