- 1. Time series analysis of collaborative activities. Irene-Angelica Chounta, Nikolaos Avouris. HCI Group, University of Patras. {houren, avouris}@upatras.gr
- 2. Outline • Objective • Time series and collaborative activities • Methodology of Analysis • Results • Conclusions and future work
- 3. Objective • Use of time series as a tool of analysis • Real-time assessment of activity • Classification of collaborative sessions
- 4. Time series and collaborative activities • Time: important aspect of collaboration • Analysis regarding time can describe/reveal underlying group dynamics • Phenomena that may affect the quality of collaboration can be captured in this way (Vasileiadou, E., 2009)
- 5. Methodology of Analysis (1) • Memory-based learning model • The time series of a new collaborative session X is compared with every reference session in the pool (tsA/CQA_A, tsB/CQA_B, …, tsN/CQA_N), giving the distances Distance_X-A, Distance_X-B, …, Distance_X-N • IF (Distance_X-Y is minimum) THEN { CQA_X ≈ CQA_Y } • where CQA: Collaboration Quality Assessment
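
The rule on this slide can be sketched as a nearest-neighbour lookup. This is a minimal illustration, not the authors' implementation: the function names, the toy pool and the placeholder distance are all assumptions made for the example (the slides use a DTW distance instead).

```python
from typing import Callable, List, Sequence, Tuple

Series = Sequence[float]


def predict_cqa(
    query: Series,
    pool: List[Tuple[Series, float]],
    distance: Callable[[Series, Series], float],
) -> float:
    """Return the CQA of the pool session whose series is closest to `query`."""
    if not pool:
        raise ValueError("the reference pool must not be empty")
    # IF Distance_X-Y is minimum THEN CQA_X ~ CQA_Y
    _, nearest_cqa = min(pool, key=lambda item: distance(query, item[0]))
    return nearest_cqa


def abs_sum_distance(a: Series, b: Series) -> float:
    # Placeholder distance for equal-length series (assumption for the demo).
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical pool of (time series, CQA) pairs; values are made up.
pool = [([1, 2, 3], 1.5), ([5, 5, 5], -0.5), ([0, 0, 1], 0.0)]
predicted = predict_cqa([1, 2, 4], pool, abs_sum_distance)
print(predicted)
```

Passing the distance as a parameter keeps the lookup independent of the measure, so a DTW distance can be dropped in without changing the prediction step.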
- 6. Methodology of Analysis (2) • A data pool of 212 collaborative sessions (collaboration quality assessed by a rating scheme) (Kahrimanis, G., et al., 2009) • Groupware application: shared workspace + chat tool – Task: dyads constructing flow charts – Duration: 1h30’ • Same conditions applied to all clients/collaborators
- 7. Methodology of Analysis (3) • Multivariate time series built from sequences of collaborative-activity events, aggregated per time interval: – Number of chat messages and workspace actions – Role alternations in chat and workspace activity – Their differences between consecutive time intervals • Various time intervals (1, 5, 8 and 10 minutes) • Distance measure: Dynamic Time Warping (DTW) distance (Giorgino, T., 2009) • Two dissimilarity functions (Euclidean and Manhattan)
- 8. Results (1) Model evaluation: • the correlation matrix of CQA (predicted vs. true value) • the root mean squared error (RMSE) • the mean absolute error (MAE)
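
The aggregation step can be sketched as follows. The event format `(minute, user, channel)` is hypothetical, and counting a "role alternation" as a change of acting user within a channel is an assumption; the slides do not spell out either detail.

```python
import math
from typing import List, Tuple

# Hypothetical log record: (minutes since session start, user id, channel).
Event = Tuple[float, str, str]


def build_series(events: List[Event], interval: float, duration: float) -> List[List[int]]:
    """Aggregate events into one row per time interval.

    Each row holds [chat messages, workspace actions,
    chat alternations, workspace alternations].
    """
    n_bins = math.ceil(duration / interval)
    rows = [[0, 0, 0, 0] for _ in range(n_bins)]
    last_user = {"chat": None, "workspace": None}
    for minute, user, channel in sorted(events):
        if channel not in last_user:
            raise ValueError(f"unknown channel: {channel}")
        idx = min(int(minute // interval), n_bins - 1)
        col = 0 if channel == "chat" else 1
        rows[idx][col] += 1
        # Assumed definition: an alternation is a change of the acting user.
        if last_user[channel] is not None and last_user[channel] != user:
            rows[idx][col + 2] += 1
        last_user[channel] = user
    return rows


events = [
    (0.2, "A", "chat"),
    (0.7, "B", "chat"),
    (1.5, "B", "workspace"),
    (4.0, "A", "workspace"),
    (6.1, "A", "chat"),
]
series = build_series(events, interval=5, duration=10)
print(series)
```

Each row is one multivariate observation; changing `interval` to 1, 5, 8 or 10 reproduces the different granularities listed on the slide.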
- 9. Results (2) • The two variables (predicted vs. real CQA value) are significantly and positively correlated (p < 0.05, Rho > 0) for all time intervals

  | Time interval (min) | Manhattan: p value | Manhattan: Spearman’s Rho | Euclidean: p value | Euclidean: Spearman’s Rho |
  |---|---|---|---|---|
  | 1 | 0.000 | 0.296 | 0.029 | 0.150 |
  | 5 | 0.002 | 0.202 | 0.021 | 0.154 |
  | 8 | 0.000 | 0.235 | 0.005 | 0.187 |
  | 10 | 0.011 | 0.168 | 0.010 | 0.170 |
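
For readers unfamiliar with Spearman's Rho, the sketch below computes it as the Pearson correlation of average ranks. It is a plain-Python illustration with made-up inputs; it does not compute p values and is not the tool used to produce the figures above.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's Rho; undefined (division by zero) if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


rho = spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(round(rho, 3))
```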
- 10. Results (3) • MAE and RMSE, for CQA ∈ [-2, 2]

  | Time interval (min) | MAE: Manhattan | MAE: Euclidean | RMSE: Manhattan | RMSE: Euclidean |
  |---|---|---|---|---|
  | 1 | 0.89* | 0.97 | 1.14 | 1.21 |
  | 5 | 1.19 | 1.21 | 1.48 | 1.5 |
  | 8 | 1.18 | 1.16 | 1.5 | 1.48 |
  | 10 | 1.17 | 1.19 | 1.44 | 1.47 |
- 11. Results (4) For time interval = 1 minute and Manhattan distance (CQA ∈ [-2, 2]), share of cases by absolute error |CQA_eval − CQA_pred|: • < 0.5: 41% of cases • < 1: 68.4% of cases • < 2: 92% of cases
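
A breakdown of this kind can be computed as the percentage of cases whose absolute error falls below each threshold. The sketch below uses made-up values, not the study data, and the function name is illustrative.

```python
from typing import Sequence


def share_below(true: Sequence[float], pred: Sequence[float], threshold: float) -> float:
    """Percentage of cases with |true - pred| strictly below `threshold`."""
    errors = [abs(t - p) for t, p in zip(true, pred)]
    return 100.0 * sum(e < threshold for e in errors) / len(errors)


# Made-up evaluative and predicted CQA values.
cqa_eval = [0, 1, -1, 2]
cqa_pred = [0.2, 0.2, 0.8, 1.0]
for limit in (0.5, 1, 2):
    print(limit, share_below(cqa_eval, cqa_pred, limit))
```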
- 12. Conclusions & Future Work • Significant positive correlations between CQA_evaluative and CQA_prediction • Best results occur for the 1-minute time interval and Manhattan distance (Rho: 0.3, MAE: 0.89, RMSE: 1.1, CQA ∈ [-2, 2]) • Advanced classification techniques (k-nearest neighbor) are expected to improve the results • Further explore real-time assessment and the way feedback affects how collaboration unfolds
- 13. Thank you … Questions are welcome!
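
One plausible reading of the k-nearest-neighbor extension is to average the CQA of the k closest reference sessions instead of copying the single closest one. The sketch below is an assumption about how that could look, with a hypothetical pool and a placeholder distance; the slides do not specify the variant intended.

```python
from typing import Callable, List, Sequence, Tuple

Series = Sequence[float]


def predict_cqa_knn(
    query: Series,
    pool: List[Tuple[Series, float]],
    distance: Callable[[Series, Series], float],
    k: int = 3,
) -> float:
    """Average the CQA of the k pool sessions closest to `query`."""
    if k < 1 or not pool:
        raise ValueError("k must be >= 1 and the pool must not be empty")
    nearest = sorted(pool, key=lambda item: distance(query, item[0]))[:k]
    return sum(cqa for _, cqa in nearest) / len(nearest)


def abs_sum_distance(a: Series, b: Series) -> float:
    # Placeholder distance for equal-length series (assumption for the demo).
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical pool of (time series, CQA) pairs; values are made up.
pool = [([1, 2, 3], 1.5), ([1, 2, 5], 0.5), ([5, 5, 5], -0.5), ([0, 0, 1], 0.0)]
predicted = predict_cqa_knn([1, 2, 4], pool, abs_sum_distance, k=2)
print(predicted)
```

With `k=1` this reduces to the minimum-distance rule of the memory-based model on slide 5.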
- 14. Euclidean vs. Manhattan • The best distance depends heavily on the nature of the data • Euclidean distance does not perform well with high-dimensional data • [Formulas on the original slide: Euclidean distance, Manhattan distance]
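
The two distances named on this slide have standard definitions. For two points $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ (the symbols are chosen here for illustration):

```latex
d_{\text{Euclidean}}(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
\qquad
d_{\text{Manhattan}}(x, y) = \sum_{i=1}^{n} \lvert x_i - y_i \rvert
```

Because the Euclidean form squares each coordinate difference, a single large difference dominates the result, whereas the Manhattan form weights all coordinate differences linearly.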
- 15. Dynamic Time Warping • Popular technique for comparing time series • The series are "warped" non-linearly in the time dimension in order to find the best match • Provides a distance measure that can be further used for classification • Applies to both univariate and multivariate time series
- 16. Rating Scheme • Provides quantitative judgments of the quality of collaboration • Proposes the rating of seven collaborative dimensions on a 5-point scale • Collaboration Quality Average (CQA) is defined as the average value of six dimensions (Collaboration Flow, Sustaining Mutual Understanding, Knowledge Exchange, Argumentation, Structuring Problem Solving Process, Cooperative Orientation)
- 17. Time series • Time series: any sequence of observations recorded at successive time intervals (univariate, multivariate) • Examples of use: – Network traffic monitored by a web server per hour – Share prices on a stock market per week – Gene activity in biological processes
- 18. RMSE, MAE • MAE: all the individual differences are weighted equally in the average. • RMSE: gives a relatively high weight to large errors. • The MAE and the RMSE can be used together to diagnose the variation in the errors in a set of forecasts.
- 19. Model evaluation • Best MAE = 0.89, where: – in a previous post-assessment, machine learning techniques scored MAE = 0.74 – and MAE < 1 is acceptable for similar applications (Kahrimanis, 2010) – Simplicity of the model – Real-time results
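
The warping idea can be illustrated with the textbook dynamic-programming recursion. This is a minimal sketch under simple assumptions (unit step weights, no windowing, no normalization); the implementation cited on slide 7 may use a different step pattern, so the numbers it yields are not guaranteed to match.

```python
import math
from typing import Callable, Sequence

Point = Sequence[float]


def manhattan(p: Point, q: Point) -> float:
    return sum(abs(a - b) for a, b in zip(p, q))


def euclidean(p: Point, q: Point) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def dtw_distance(
    s: Sequence[Point],
    t: Sequence[Point],
    local: Callable[[Point, Point], float] = manhattan,
) -> float:
    """Cumulative cost of the cheapest warping path between s and t."""
    n, m = len(s), len(t)
    if n == 0 or m == 0:
        raise ValueError("both series must be non-empty")
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = local(s[i - 1], t[j - 1])
            # Extend the cheapest of: insertion, deletion, match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Same shape, different pacing: warping absorbs the repeated first point.
print(dtw_distance([(0,), (1,), (2,)], [(0,), (0,), (1,), (2,)]))
# Multivariate points with each dissimilarity function.
a, b = [(0, 0), (1, 1)], [(0, 0), (2, 3)]
print(dtw_distance(a, b, manhattan), dtw_distance(a, b, euclidean))
```

The series may have different lengths, which is what allows sessions of unequal duration or pacing to be compared.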
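
As a small worked example, CQA can be computed as the mean of the six listed dimension ratings. The ratings below are made up, and the scale of -2 to 2 is an assumption inferred from the CQA range reported in the results slides.

```python
from statistics import mean
from typing import Dict

CQA_DIMENSIONS = (
    "Collaboration Flow",
    "Sustaining Mutual Understanding",
    "Knowledge Exchange",
    "Argumentation",
    "Structuring Problem Solving Process",
    "Cooperative Orientation",
)


def collaboration_quality_average(ratings: Dict[str, float]) -> float:
    """Mean of the six CQA dimensions; any extra keys are ignored."""
    missing = [d for d in CQA_DIMENSIONS if d not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    return mean(ratings[d] for d in CQA_DIMENSIONS)


# Made-up ratings on an assumed -2..2 scale.
example = {
    "Collaboration Flow": 1,
    "Sustaining Mutual Understanding": 2,
    "Knowledge Exchange": 0,
    "Argumentation": -1,
    "Structuring Problem Solving Process": 1,
    "Cooperative Orientation": 0,
}
cqa = collaboration_quality_average(example)
print(cqa)
```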
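
The difference between the two measures is easiest to see on two error patterns with the same MAE. The sketch below uses made-up values: four errors of 1 versus a single error of 4.

```python
import math
from typing import Sequence


def mae(true: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error: every difference counts equally."""
    return sum(abs(t - p) for t, p in zip(true, pred)) / len(true)


def rmse(true: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean squared error: squaring emphasizes large differences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true))


true = [0, 0, 0, 0]
even_errors = [1, 1, 1, 1]   # four errors of size 1
one_outlier = [0, 0, 0, 4]   # one error of size 4

print(mae(true, even_errors), rmse(true, even_errors))
print(mae(true, one_outlier), rmse(true, one_outlier))
```

Both predictions have an MAE of 1, but the RMSE doubles for the outlier case, which is why a large gap between RMSE and MAE signals high variance in the individual errors.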
- 20. Differences • Chat messages: a_1, a_2, a_3, …, a_{N-1}, a_N • Differences of chat messages: a_2 − a_1, a_3 − a_2, …, a_N − a_{N-1}
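
The differencing on this slide amounts to subtracting each value from its successor, which yields a series one element shorter. A minimal sketch with made-up counts:

```python
from typing import List, Sequence


def first_differences(series: Sequence[float]) -> List[float]:
    """Return [a_2 - a_1, a_3 - a_2, ..., a_N - a_(N-1)]."""
    return [b - a for a, b in zip(series, series[1:])]


chat_messages = [3, 5, 4, 8]  # made-up counts per time interval
diffs = first_differences(chat_messages)
print(diffs)
```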

