Team activity analysis / visualization

Team Activity Analysis/Visualisation
Project members: Judy Kay, Nicolas Maisonneuve, Peter Reimann, Kalina Yacef

Objectives
Goal: try to build "a system that will support groups in learning how to work more effectively as groups" (Reimann et al. 2004).
  Visualisation of the collaboration (for teachers and students).
  Different levels of feedback: mirroring, advice.
  For self-managed learning groups (small teams of students).
  Using logs generated by the collaboration tools.

Case studies
Activity logs from real cases:
  IT project: semester-long software development
  Forum/discussion from the Public Health School
  Chat/whiteboard from the Faculty of Education
Event definition:
  Event = {Author, Time, Event Type, Resource}
IT project: the Trac system
  Ticket: task manager
  SVN: source repository
  Wiki: web page editor

Related Work & Approach
Related work in team activity analysis:
  About 20 recent articles found in the following fields: source repository mining, process mining, visualisation for team software development.
  Very few address our concern, the collaboration aspect. No satisfying result.
Core of our research:
  Visualisation part: build visualisations mirroring the group's activity.
  Data mining part: find relevant/frequent sequences of events characterizing aspects of the teamwork.

Visualisation
Goal: visualise the team's activity in order to:
  Easily compare users/groups.
  Study some teamwork theories (Big Five).
Three types of visualisation:
  The participation of the users within a group.
  An overview of the interaction between users.
  A timeline representing the users' activity.

Activity Radar – Overview
Properties:
  Participation for each data source.
  Participation relative to the group/class.
Calculation:
  The participation W for the source s: W_s = \sum_{e \in E_s} w(e)
    E_s: set of all the events from s (wiki, svn, ...)
    w(e): weight of participation for the event e
      Wiki and SVN event: w(e) = number of added lines
      Ticket: w(e) = 1 (not quantifiable)
Todo: quantify a ticket event by the importance of its action: w(Open) > w(Assign) > w(Comment)
Figure legend: high participation, average participation, low participation
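As a minimal sketch of the participation calculation, the snippet below sums the event weights per author and per source, then expresses each total as a share of the group's total for that source. The event records, the author names and the function names are hypothetical; only the weighting rule (added lines for wiki and SVN events, 1 for a ticket event) comes from the slide, and computing the value per author is an assumption.

```python
from collections import defaultdict

# Hypothetical event records: (author, source, added_lines).
# added_lines is ignored for ticket events.
EVENTS = [
    ("alice", "svn", 120),
    ("alice", "svn", 30),
    ("bob", "svn", 50),
    ("bob", "wiki", 10),
    ("alice", "ticket", 0),
    ("alice", "ticket", 0),
]


def weight(source, added_lines):
    """w(e): added lines for wiki/svn events, 1 for a ticket event."""
    return added_lines if source in ("wiki", "svn") else 1


def participation(events):
    """Return {(author, source): W}, the sum of the event weights."""
    totals = defaultdict(int)
    for author, source, added_lines in events:
        totals[(author, source)] += weight(source, added_lines)
    return dict(totals)


def relative_participation(events):
    """Return each author's share of the group's total for each source."""
    totals = participation(events)
    per_source = defaultdict(int)
    for (_, source), value in totals.items():
        per_source[source] += value
    return {
        key: value / per_source[key[1]]
        for key, value in totals.items()
        if per_source[key[1]]
    }


W = participation(EVENTS)
SHARE = relative_participation(EVENTS)
print(W[("alice", "svn")], SHARE[("alice", "svn")])
```

Each radar axis would then plot one source, with the relative share making users of different groups comparable.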

Activity Radar – Comparison with a classic histogram
Main difference: the reading process.
  Activity radar: radial scan
  Histogram: horizontal scan
→ Different purposes:
  Activity radar: fast global reading, provided the shape is not too deformed (DG1 vs DG2); this becomes a problem with many users.
  Histogram: more precise graph, useful for a user's self-evaluation.
Todo: study further the validity of this graph from an HCI point of view.
Figures: DG1, H1, DG2, H2

Interaction Network
Goal: overview of the interactions in a group.
Undirected graph.
Some possible analyses:
  Group-centric: level of interaction in the group (many/few connections, global shape).
  User-centric: some characteristic behaviours: leader, lonely person.
Analysis based on 2 parameters:
  Number of connections: influence of the user.
  Width of a connection: quantity of interaction between 2 users.
Figures: interaction for the wiki, interaction for the tickets

Interaction Network – Calculation of the interaction
I_s(a,b) = \sum_{r \in R_s} I_r(a,b)
  R_s: the set of all the resources from s
  I_r(a,b): the weight of the interaction between 2 users for the resource r
Calculation for global interaction (e.g. source code): I_r(a,b) = \min(n_r(a), n_r(b)), where n_r(u) is the number of events by user u on r.
TODO: feedback notion: I(a,b) ≠ I(b,a)
  Direct feedback (e.g. chat)
  Indirect feedback (e.g. forum) …
Example of calculation:
  Sequence of events (authors) for a resource: <a,b,d,b,a,b,a,e,d,b,d,b,d>
  Global interaction: I(a,d) = I(d,a) = min(3,4) = 3
  Direct feedback: I(a,d) = 0, I(d,a) = 0, I(b,d) = 3, I(d,b) = 2
  Indirect feedback: A_a = {{b,d,b,a},{b}} → I(a,d) = 1; A_d = {{a,b},{b,a,e},{b},{b}} → I(d,a) = 2
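The worked example can be reproduced with a short sketch. The two function names are hypothetical, and the definitions are a reading inferred from the example rather than the authors' stated formulas: global interaction is taken as the smaller of the two users' event counts on the resource, and indirect feedback I(x,y) as the number of segments preceding an event by x (since x's previous event) in which y appears. Direct feedback is left out because the example does not pin down its counting rule.

```python
# Author sequence for one resource, copied from the slide's example.
SEQUENCE = ["a", "b", "d", "b", "a", "b", "a", "e", "d", "b", "d", "b", "d"]


def global_interaction(sequence, x, y):
    """Symmetric weight: the smaller of the two users' event counts."""
    return min(sequence.count(x), sequence.count(y))


def indirect_feedback(sequence, x, y):
    """Count the events by x that come after at least one event by y
    since x's previous event (assumed reading of the sets A_x)."""
    count = 0
    segment = set()  # authors seen since the previous event by x
    for author in sequence:
        if author == x:
            if y in segment:
                count += 1
            segment = set()
        else:
            segment.add(author)
    return count


print(global_interaction(SEQUENCE, "a", "d"))  # I(a,d) = I(d,a)
print(indirect_feedback(SEQUENCE, "a", "d"))   # I(a,d)
print(indirect_feedback(SEQUENCE, "d", "a"))   # I(d,a)
```

Unlike the global weight, the indirect feedback count is not symmetric, which is the property the TODO asks for.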

Wattle Tree
Goal: overview of the group's activity over the duration of the project. Depends on the domain.
Explanation of the graphic:
  Each vertical tree = one user's activity
  Each tree component = an event
    Red petal: add/modify a wiki page (WIKI)
    Blue petal: add/modify source code (SVN)
    Branch: open/close a ticket (TICKET)

Data mining
Goal: find frequent patterns (sequences of events) characterizing an aspect of the teamwork:
  good/bad practices (performance indicator)
  the user's role (e.g. leader / normal user)
  the group's/user's interaction level
Event definition:
  Event = {Author, Time, Event Type, Resource}

Frequent Sequence Pattern
Customer sequence: an (ordered) list of items.
Item: an element of the alphabet (the item collection, called C).
Subsequence: <e1,e2,…,en> is a subsequence of a sequence S if it matches the pattern <e1,*,e2,*,e3,…,*,en> in S.
  Ex: <a,b,c> is a subsequence of <a,e,b,a,c>.
Support of a sequence S: the number of customer sequences of which S is a subsequence.
Problem: find all the frequent subsequences, i.e. those with a support ≥ Smin.
Example, with the alphabet C = {a,b,c,d,e,f}:

  CustID  Sequence
  1       <a,e,b,a,c>
  2       <a,c,c,b,c>
  3       <b,e,b,f,a,c>
  4       <d,e,b,a,c>

Find all the sequences with a minimum support of 3:
  Sup(<a,b,c>) = 2 → not frequent
  Sup(<a,c>) = 4 → <a,c> is frequent
  Sup(<e,b,a>) = 3 → <e,b,a> is frequent
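The support values of this example can be checked with a small sketch. The function names are illustrative, and each customer sequence is written as a string of one-letter items for brevity.

```python
# Customer sequences from the slide, keyed by CustID.
DATABASE = {1: "aebac", 2: "accbc", 3: "bebfac", 4: "debac"}


def is_subsequence(pattern, sequence):
    """True if the items of pattern occur in sequence in the same order,
    with any number of other items in between."""
    position = 0
    for item in sequence:
        if position < len(pattern) and item == pattern[position]:
            position += 1
    return position == len(pattern)


def support(pattern, database):
    """Number of customer sequences of which pattern is a subsequence."""
    return sum(is_subsequence(pattern, seq) for seq in database.values())


MIN_SUPPORT = 3
FREQUENT = [p for p in ("abc", "ac", "eba")
            if support(p, DATABASE) >= MIN_SUPPORT]
print(FREQUENT)
```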

Two main problems:
  Preprocessing step: transform our sequences of events into a list of subsequences.
  Mining step: build an algorithm to find all the frequent subsequences in this list.

Generation of the list of sequences
Cutting by activity:
  Ideal situation: each event should be linked to an activity. Not yet available.
Timeout cutting:
  Create a new activity if t_{i+1} - t_i > α, with t_i the time of the event i (i.e. events further apart than α are considered independent of each other).
  Example: cutting the sequence at the points where t_3 - t_2 > α, t_7 - t_6 > α and t_9 - t_8 > α.
  Result: 4 activities extracted.

  ActivityID  Sequence
  1           <w,w>
  2           <t,s,s,w>
  3           <t,t>
  4           <s,t>

Cutting by resource: one sequence for each resource.

  ResID      Sequence
  wikiPage1  <w>
  wikiPage2  <w,w,w>
  Ticket1    <t,t>
  Ticket2    <t>
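Both cutting strategies can be sketched in a few lines. The `Event` tuple follows the event definition {Author, Time, Event Type, Resource}; the timestamps, authors, resource names and the value of α below are made up, chosen so that the timeout cut yields the four activities of the example.

```python
from collections import namedtuple

Event = namedtuple("Event", ["author", "time", "type", "resource"])

# Hypothetical log: three gaps exceed alpha (after events 2, 6 and 8).
LOG = [
    Event("u1", 0, "w", "wikiPage1"),
    Event("u2", 1, "w", "wikiPage2"),
    Event("u1", 10, "t", "Ticket1"),
    Event("u1", 11, "s", "src/main.c"),
    Event("u1", 12, "s", "src/main.c"),
    Event("u2", 13, "w", "wikiPage2"),
    Event("u2", 30, "t", "Ticket1"),
    Event("u1", 31, "t", "Ticket2"),
    Event("u3", 50, "s", "src/main.c"),
    Event("u3", 51, "t", "Ticket1"),
]


def cut_by_timeout(events, alpha):
    """Start a new activity whenever two consecutive events are more
    than alpha time units apart."""
    activities = []
    previous_time = None
    for event in sorted(events, key=lambda e: e.time):
        if previous_time is None or event.time - previous_time > alpha:
            activities.append([])
        activities[-1].append(event)
        previous_time = event.time
    return activities


def cut_by_resource(events):
    """Build one time-ordered sequence per resource."""
    sequences = {}
    for event in sorted(events, key=lambda e: e.time):
        sequences.setdefault(event.resource, []).append(event)
    return sequences


ACTIVITIES = [[e.type for e in a] for a in cut_by_timeout(LOG, alpha=5)]
BY_RESOURCE = {r: [e.type for e in s] for r, s in cut_by_resource(LOG).items()}
print(ACTIVITIES)
```

The choice of α drives the result: a small value fragments one working session into several activities, a large one merges unrelated sessions.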

Generation of the list of sequences
Transformation of a sequence of events into a sequence of items.
Step 1: define an alphabet (collection of items).
  Example: an item with 3 properties: type of event, number of occurrences, number of distinct authors. For instance, (3w2) stands for 3 consecutive wiki events by 2 distinct authors.
Step 2: transform each sequence of events into a sequence of items.

  ActivityID  Sequence of events
  1           <w1,w1,w2,t1>
  2           <w1,t1,w2,w1>
  3           <t1,s1,s1,s3,t2>
  4           <w1,w2,t1,t2>

  ActivityID  Sequence of items
  1           <(3w2),(1t1)>
  2           <(1w1),(1t1),(2w2)>
  3           <(1t1),(2s1),(1t1)>
  …           …
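A minimal sketch of step 2, assuming that an event such as w1 means "wiki event by author 1" and that an item groups a run of consecutive events of the same type. The function name and the tuple layout (occurrences, type, distinct authors) are illustrative choices.

```python
from itertools import groupby


def to_items(events):
    """Collapse each run of consecutive same-type events into one item:
    (number of occurrences, event type, number of distinct authors)."""
    items = []
    for event_type, run in groupby(events, key=lambda e: e[0]):
        run = list(run)
        authors = {author for _, author in run}
        items.append((len(run), event_type, len(authors)))
    return items


# Activities 1 and 2 of the example, as (type, author) pairs.
ACTIVITY_1 = [("w", 1), ("w", 1), ("w", 2), ("t", 1)]
ACTIVITY_2 = [("w", 1), ("t", 1), ("w", 2), ("w", 1)]

print(to_items(ACTIVITY_1))
print(to_items(ACTIVITY_2))
```

Grouping runs shortens the sequences and shrinks the alphabet seen by the mining step, at the cost of losing the order of authors inside a run.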

GSP (Generalized Sequential Patterns)
Based on the heuristic: "if a sequence is frequent, its subsequences are frequent too."
Algorithm:
  Step 1: find all the frequent items, given a minimum support.
  Step k, generation phase: generate candidate k-sequences from the frequent (k-1)-sequences.
  Step k, support test: test the support of every candidate and delete the infrequent ones.
  Go to step k+1.
Example, with a minimum support of 3:

  ID  Sequence
  1   <a,b,c,f,d>
  2   <a,b,f,c,d>
  3   <a,b,f,c,e,d>
  4   <a,b,c,d>

  Step 1: Sup(a) = 4, Sup(b) = 4, Sup(e) = 1, …
  Step k, generation phase (join phase):
    Frequent 3-sequences: <a,b,c>, <b,c,d>, <a,b,f>, <b,f,d>
    Candidate 4-sequences: <a,b,c,d>, <a,b,f,d>
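The generation and support-test phases can be sketched as follows. This is a simplified illustration, not the full GSP algorithm: every element is a single item, there are no time constraints or taxonomies, and candidates are only filtered by the support test. The function names are hypothetical; the data comes from the example.

```python
DATABASE = [
    ("a", "b", "c", "f", "d"),
    ("a", "b", "f", "c", "d"),
    ("a", "b", "f", "c", "e", "d"),
    ("a", "b", "c", "d"),
]


def is_subsequence(pattern, sequence):
    """True if pattern's items appear in sequence in the same order."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def support(pattern, database):
    return sum(is_subsequence(pattern, seq) for seq in database)


def join(frequent):
    """Join phase: s1 and s2 combine when s1 without its first item
    equals s2 without its last item."""
    return sorted({s1 + s2[-1:] for s1 in frequent for s2 in frequent
                   if s1[1:] == s2[:-1]})


def gsp(database, min_support):
    """Level-wise search: generate candidates, keep the frequent ones."""
    items = sorted({item for seq in database for item in seq})
    frequent = [(i,) for i in items if support((i,), database) >= min_support]
    result = list(frequent)
    while frequent:
        candidates = join(frequent)
        frequent = [c for c in candidates
                    if support(c, database) >= min_support]
        result.extend(frequent)
    return result


FREQUENT_3 = [("a", "b", "c"), ("b", "c", "d"), ("a", "b", "f"), ("b", "f", "d")]
CANDIDATES_4 = join(FREQUENT_3)
PATTERNS = gsp(DATABASE, min_support=3)
print(CANDIDATES_4)
```

Joining only frequent (k-1)-sequences is what keeps the candidate set small: a candidate with an infrequent prefix or suffix is never generated.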

Results - Resources
Possible analyses on:
  the number of authors who interacted on a resource
  the number of times a resource is modified
  the lifetime of the modifications

Counts per item (five columns, headings not given):

  (2s1,0)    13   95   22   49    0
  (1w1,0)     1    5   10   54   11
  (1s1,0)   208  755  654  160   99
  (9t5,50)    0    0    1    0    0
  (9w5,16)    1    0    0    0    0

  ResID      Sequence of events
  wikipage1  <w1,w1,w2>
  wikipage2  <w1,w2>
  Ticket1    <t1,t2>
  Ticket2    <t1>

  ResID      Sequence of items
  wikipage1  <(3,w,2,12)>
  wikipage2  <(2,w,2,12)>
  Ticket1    <(1,t,1,5)>
  Ticket2    …

Future work
Data:
  Problems due to the data:
    incompleteness of the data: no links among the events
    bad use of the tool (SVN)
  Add metadata to link the events (e.g. the ticket ID in the SVN commit log).
  Analysis with data from professional projects.
Research on an algorithm handling noise in the data:
  Add a notion of similarity: <a,b,c,d,e> is similar to <a,b,c,d,e,f> or <a,b,c,e,e>.
  Find a heuristic to avoid a combinatorial explosion.
Classification of the users/groups:
  Use the participation and interaction weights to cluster the users/groups, e.g. User = {Part(svn)=100, Part(wiki)=1, Part(ticket)=0} = developer role.
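The slides do not say which similarity measure is intended. One candidate, shown here purely as an assumption, is the edit (Levenshtein) distance between two sequences: both similar sequences in the example are one edit away from <a,b,c,d,e>.

```python
def edit_distance(first, second):
    """Levenshtein distance: minimum number of insertions, deletions
    and substitutions turning one sequence into the other."""
    previous = list(range(len(second) + 1))
    for i, item_a in enumerate(first, start=1):
        current = [i]
        for j, item_b in enumerate(second, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (item_a != item_b),  # substitution
            ))
        previous = current
    return previous[-1]


REFERENCE = "abcde"
print(edit_distance(REFERENCE, "abcdef"))
print(edit_distance(REFERENCE, "abcee"))
```

Two sequences could then be treated as the same pattern when their distance stays under a small threshold, though this multiplies the number of comparisons and makes the heuristic against combinatorial explosion more pressing.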
