to empower existing machine learning techniques and minimize the workload of human evaluators regarding time series characteristics vs. qualitative assessments for providing further feedback to collaborative partners
Time is a fundamental aspect of collaboration, and further analysis regarding time can reveal the underlying group dynamics
a data pool of 212 collaborative sessions associated with quantitative assessments of collaboration quality
Time series constructed from the aggregated events: Number of Chat Messages and Workspace Actions, Roles’ Alternations in chat and workspace activity
distance measure: Dynamic Time Warping (DTW) distance (Giorgino, T., 2009)
for most of the time interval/dissimilarity method combinations
Best results in terms of correlation coefficient, MAE and RMSE occur for the 1-minute time interval and Manhattan distance (0.3, 0.89 and 1.1 respectively, for a value range of [-2, 2]).
For the classification, one optimal match was used for each query time series. Results could be improved if we used more advanced techniques (k-nearest neighbor).
measuring similarity between two sequences which may vary in time or speed
DTW is a method that allows a computer to find an optimal match between two given sequences (e.g. time series) with certain restrictions. The sequences are "warped" non-linearly in the time dimension to determine a measure of their similarity independent of certain non-linear variations in the time dimension.
that stand for the five fundamental aspects of collaboration: communication, joint information processing, coordination, interpersonal relationship and motivation
Collaboration Quality Average (CQA) is defined as the average value of six out of the seven dimensions (leaving out the motivational/Individual Task Orientation aspect)
Mean absolute error (MAE)
The MAE measures accuracy for continuous variables. Expressed in words, the MAE is the average over the verification sample of the absolute values of the differences between each forecast and the corresponding observation. The MAE is a linear score, which means that all the individual differences are weighted equally in the average.

Root mean squared error (RMSE)
The RMSE is a quadratic scoring rule which measures the average magnitude of the error. The equation for the RMSE is given in both of the references. Expressed in words, the differences between forecasts and the corresponding observed values are each squared and then averaged over the sample. Finally, the square root of the average is taken. Since the errors are squared before they are averaged, the RMSE gives a relatively high weight to large errors. This means the RMSE is most useful when large errors are particularly undesirable.

The MAE and the RMSE can be used together to diagnose the variation in the errors in a set of forecasts. The RMSE will always be greater than or equal to the MAE; the greater the difference between them, the greater the variance in the individual errors in the sample. If RMSE = MAE, then all the errors are of the same magnitude.
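The verbal definitions of MAE and RMSE above can be written out as a minimal Python sketch. The function names and the sample values are illustrative assumptions, not data from the study.

```python
import math


def mae(predicted, observed):
    """Mean absolute error: the average of |prediction - observation|."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def rmse(predicted, observed):
    """Root mean squared error: square, average, then take the square root."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical CQA values in [-2, 2]; not taken from the 212 sessions.
observed = [1.0, -0.5, 0.0, 2.0]
predicted = [0.5, -0.5, 1.0, 0.0]

print(mae(predicted, observed))   # 0.875
print(rmse(predicted, observed))  # larger than the MAE, since the errors differ in size
```

When every error has the same magnitude, the two scores coincide; the gap between them grows with the spread of the individual errors.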
Transcript of "Time series analysis of collaborative activities-CRIWG2012"
1.
Time series analysis of collaborative activities
Irene-Angelica Chounta, Nikolaos Avouris
HCI Group, University of Patras
{houren, avouris}@upatras.gr
2.
Outline
• Objective
• Time series and collaborative activities
• Methodology of Analysis
• Results
• Conclusions and future work
3.
Objective
• Use of time series as a tool of analysis
• Real time assessment of activity
• Classification of collaborative sessions
4.
Time series and collaborative activities
• Time: important aspect of collaboration
• Analysis regarding time can describe/reveal underlying group dynamics
• Phenomena that may affect the quality of collaboration can be captured in this way (Vasileiadou, E., 2009)
5.
Methodology of Analysis (1)
Memory-based learning model
• Collaborative session X is compared with every stored session: tsA/CQA_A gives Distance X-A, tsB/CQA_B gives Distance X-B, …, tsN/CQA_N gives Distance X-N
• IF (Distance X-Y is minimum) then { CQA_X ≈ CQA_Y }
• where CQA: Collaboration Quality Assessment
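The memory-based rule on this slide (take the CQA of the stored session at minimum distance) could be sketched as follows. The function names, the toy pool and the stand-in distance are assumptions for illustration; the slides use a DTW distance instead.

```python
def predict_cqa(query_ts, pool, distance):
    """Return the CQA of the stored session closest to query_ts.

    pool: list of (time_series, cqa) pairs for sessions that are already rated.
    distance: any function mapping two time series to a non-negative number.
    """
    _, best_cqa = min(pool, key=lambda item: distance(query_ts, item[0]))
    return best_cqa


def pointwise_manhattan(a, b):
    # Stand-in distance for equal-length univariate series (assumption);
    # a DTW distance would be passed here to follow the slides.
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical pool: (events per interval, CQA of that session).
pool = [
    ([5, 7, 6, 8], 1.2),
    ([1, 0, 2, 1], -1.0),
    ([3, 3, 4, 3], 0.3),
]
query = [4, 6, 6, 7]

print(predict_cqa(query, pool, pointwise_manhattan))  # 1.2
```

Passing the distance as a parameter keeps the rule itself independent of the choice between DTW variants and dissimilarity functions.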
6.
Methodology of Analysis (2)
• a data pool of 212 collaborative sessions (collaboration quality assessed by rating scheme) (Kahrimanis, G., et al., 2009)
• Groupware application: shared workspace + chat tool
– Task: dyads constructing flow charts
– Duration: 1h30’
• same conditions applied for all clients/collaborators
7.
Methodology of Analysis (3)
• time series (multivariate) of aggregated sequences of events of collaborative activities per time interval
– Number of Chat Messages and Workspace actions
– Roles’ Alternations in Chat and Workspace activity
– Their differences between consecutive time intervals
• Various time intervals (1, 5, 8 and 10 minutes)
• distance measure: Dynamic Time Warping (DTW) distance (Giorgino, T., 2009)
• two dissimilarity functions (Euclidean and Manhattan)
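One possible way to build such a multivariate series from a logfile is sketched below. The event format, the function names and, in particular, the definition of a role alternation (a change of acting user between two consecutive events of the same kind) are assumptions; the slides do not spell them out.

```python
def build_time_series(events, interval_s, duration_s):
    """Aggregate events into one feature vector per time interval.

    events: list of (timestamp_s, kind, actor) sorted by timestamp,
            with kind in {"chat", "workspace"} (hypothetical log format).
    Returns one row per interval:
    [chat_count, workspace_count, chat_alternations, workspace_alternations].
    """
    n_intervals = -(-duration_s // interval_s)  # ceiling division
    series = [[0, 0, 0, 0] for _ in range(n_intervals)]
    column = {"chat": 0, "workspace": 1}
    last_actor = {"chat": None, "workspace": None}
    for t, kind, actor in events:
        i = min(int(t // interval_s), n_intervals - 1)
        series[i][column[kind]] += 1
        # Assumed definition of an alternation: the acting user changes
        # between two consecutive events of the same kind.
        if last_actor[kind] is not None and last_actor[kind] != actor:
            series[i][column[kind] + 2] += 1
        last_actor[kind] = actor
    return series


def differences(series):
    """Element-wise differences between consecutive time intervals."""
    return [[b - a for a, b in zip(prev, cur)]
            for prev, cur in zip(series, series[1:])]


# Hypothetical two-minute log of a dyad (users "A" and "B").
events = [
    (5, "chat", "A"), (20, "chat", "B"), (30, "workspace", "A"),
    (70, "workspace", "A"), (80, "chat", "A"), (110, "workspace", "B"),
]
series = build_time_series(events, interval_s=60, duration_s=120)
print(series)               # [[2, 1, 1, 0], [1, 2, 1, 1]]
print(differences(series))  # [[-1, 1, 0, 1]]
```

Changing `interval_s` to 300, 480 or 600 would give the 5, 8 and 10 minute variants listed on the slide.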
8.
Results (1)
Model evaluation:
• the correlation matrix of CQA (predicted vs. true value)
• the root mean squared error (RMSE)
• the mean absolute error (MAE)
9.
Results (2)
• The two variables (predicted vs. real CQA value) are significantly and positively correlated (p<0.05, Rho>0) for all time intervals

                      Manhattan                    Euclidean
Time interval (min)   p value   Spearman’s Rho     p value   Spearman’s Rho
1                     0.000     0.296              0.029     0.150
5                     0.002     0.202              0.021     0.154
8                     0.000     0.235              0.005     0.187
10                    0.011     0.168              0.010     0.170
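For reference, Spearman's Rho is the Pearson correlation of the ranks of the two variables. The sketch below computes only the coefficient (no p value) in plain Python; the function names and sample values are illustrative assumptions, and it expects each variable to take at least two distinct values.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical true vs. predicted CQA values; not data from the study.
true_cqa = [-1.5, -0.5, 0.0, 1.0, 2.0]
predicted_cqa = [-1.0, 0.5, -0.5, 1.5, 1.0]

print(spearman_rho(true_cqa, predicted_cqa))  # approximately 0.8
```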
10.
Results (3)
• MAE and RMSE for CQA ∈ [-2, 2]

                      MAE                      RMSE
Time interval (min)   Manhattan   Euclidean    Manhattan   Euclidean
1                     0.89*       0.97         1.14        1.21
5                     1.19        1.21         1.48        1.5
8                     1.18        1.16         1.5         1.48
10                    1.17        1.19         1.44        1.47
11.
Results (4)
For time interval = 1 minute and Manhattan distance (CQA ∈ [-2, 2]):

|CQA_eval - CQA_pred|   % cases
<0.5                    41
<1                      68.4
<2                      92
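A breakdown like the one on this slide can be derived from paired evaluated and predicted values as in the sketch below. The function name and the sample values are assumptions for illustration, not the study's data.

```python
def share_below(evaluated, predicted, thresholds=(0.5, 1.0, 2.0)):
    """Percentage of cases whose absolute error is below each threshold."""
    errors = [abs(e - p) for e, p in zip(evaluated, predicted)]
    return {t: 100.0 * sum(err < t for err in errors) / len(errors)
            for t in thresholds}


# Hypothetical CQA values in [-2, 2].
evaluated = [1.0, -0.5, 0.0, 2.0]
predicted = [0.8, 0.2, 1.5, -0.5]

print(share_below(evaluated, predicted))  # {0.5: 25.0, 1.0: 50.0, 2.0: 75.0}
```

The percentages are cumulative: every case counted under a threshold is also counted under each larger one.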
12.
Conclusions & Future Work
• Significant positive correlations between CQA_evaluative and CQA_prediction
• Best results occur for 1 minute time interval and Manhattan distance (Rho: 0.3, MAE: 0.89, RMSE: 1.1, CQA ∈ [-2, 2])
• Advanced classification techniques (k-nearest neighbor) are expected to improve the results
• Further explore real time assessment and the way feedback affects the unfolding of collaboration
14.
Euclidean vs. Manhattan
• The best distance depends heavily on the nature of the data
• Euclidean distance performs poorly on high-dimensional data
Euclidean: d(x, y) = sqrt(Σ_i (x_i - y_i)^2)
Manhattan: d(x, y) = Σ_i |x_i - y_i|
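The two dissimilarity functions compared on this slide follow their standard definitions, sketched here in plain Python. The function names and the example vectors are illustrative.

```python
import math


def euclidean(x, y):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Sum of the absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


# Two hypothetical feature vectors for one time interval.
x = [2, 1, 0, 3]
y = [5, 5, 0, 3]

print(euclidean(x, y))  # 5.0
print(manhattan(x, y))  # 7
```

Because the Euclidean distance squares each difference, a single large coordinate gap dominates it more than it dominates the Manhattan distance.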
15.
Dynamic Time Warping
• Popular technique for comparing time series
• The series are "warped" non-linearly in the time dimension in order to find the best match
• Provides a distance measure that can be further used for classification
• Applies to both univariate and multivariate time series
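A minimal sketch of DTW with the basic textbook recursion is given below. It uses no window constraint and equal step weights, which may differ from the step pattern and restrictions applied in the study; the function names and example series are assumptions.

```python
def dtw_distance(a, b, local_dist):
    """Cost of the cheapest warping path between series a and b.

    a, b: lists of observations (each observation is a list of features).
    local_dist: dissimilarity between two single observations.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i items of a
    # with the first j items of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = local_dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def manhattan(x, y):
    return sum(abs(p - q) for p, q in zip(x, y))


# The second series is a time-stretched copy of the first.
a = [[0], [1], [2], [1]]
b = [[0], [1], [1], [2], [1]]

print(dtw_distance(a, b, manhattan))  # 0.0
```

Because the local dissimilarity is a parameter, the same recursion works for univariate and multivariate series and for either Euclidean or Manhattan dissimilarity.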
16.
Rating Scheme
• provides quantitative judgments of the quality of collaboration
• proposes the rating of seven collaborative dimensions on a 5-point scale
• Collaboration Quality Average (CQA) is defined as the average value of six dimensions (Collaboration Flow, Sustaining Mutual Understanding, Knowledge Exchange, Argumentation, Structuring Problem Solving Process, Cooperative Orientation)
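The CQA definition on this slide amounts to a plain average over six of the seven ratings. In the sketch below, the dictionary layout, the function name and the sample ratings are assumptions, as is the encoding of the 5-point scale as integers from -2 to 2.

```python
# The six dimensions that enter the average, as listed on the slide.
CQA_DIMENSIONS = (
    "Collaboration Flow",
    "Sustaining Mutual Understanding",
    "Knowledge Exchange",
    "Argumentation",
    "Structuring Problem Solving Process",
    "Cooperative Orientation",
)


def collaboration_quality_average(ratings):
    """Average of the six CQA dimensions; any other key is ignored."""
    return sum(ratings[d] for d in CQA_DIMENSIONS) / len(CQA_DIMENSIONS)


# Hypothetical ratings for one session (assumed scale: -2 to 2).
ratings = {
    "Collaboration Flow": 1,
    "Sustaining Mutual Understanding": 2,
    "Knowledge Exchange": 0,
    "Argumentation": -1,
    "Structuring Problem Solving Process": 1,
    "Cooperative Orientation": 0,
    "Individual Task Orientation": 2,  # seventh dimension, left out of the CQA
}

print(collaboration_quality_average(ratings))  # 0.5
```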
17.
Time series
• Time series: any sequence of observations recorded at successive time intervals (univariate, multivariate)
• Examples of use:
– Network traffic monitored by a web server per hour
– Share prices in a stock market per week
– Gene activity in biological processes
18.
RMSE, MAE
• MAE: all the individual differences are weighted equally in the average.
• RMSE: the RMSE gives a relatively high weight to large errors.
• The MAE and the RMSE can be used together to diagnose the variation in the errors in a set of forecasts.
19.
Model evaluation
Best MAE=0.89 where:
– previous post-assessment machine learning techniques scored MAE=0.74
– and MAE < 1 is acceptable for similar applications (Kahrimanis, 2010)
– Simplicity of the model
– Real time results