Ruwaard, J. (2009). Computerized Text Analysis in Computer-mediated Therapy; The Interapy Post Traumatic Stress Corpus. Presented at the Fourth Meeting of the International Society for Research on Internet Interventions. Amsterdam, the Netherlands
[Slide 10: "Does LIWC predict outcome?" — a table listing the word categories I, You, Senses, CogMech, NegEmo, PosEmo, Present, Past, Future, and Word Count against effect type (main effect; linear, quadratic, and cubic interactions), with significance marked at p < .01 and p < .05.]
Good afternoon. The past two decades witnessed dramatic advances in computerized text analysis. Today, software for text analysis is readily available, making it much easier to analyze text. As a result, computerized text analysis is applied more widely. 50% of the mail that you receive is spam, but you don’t notice it because the spam is filtered out by Bayesian text classifiers. And the best example of applied computerized text analysis, of course, is Google, which we use every day to search the world wide web. Automated text analysis is likely to impact clinical psychology too, especially e-mental health, in which most of the contents of treatment are available in a digital format. To examine the application of computerized text analysis in a clinical context, we created the Interapy PTS corpus, a database of texts written by traumatized clients. Today, I will present this corpus, along with some preliminary results that we obtained using one of the better-known software programs for automated text analysis: Pennebaker’s Linguistic Inquiry and Word Count. In this explorative study, we focused on two questions: can computerized text analysis be used to measure treatment compliance, and, if so, do the resulting scores predict outcome?
The corpus comprises over 4,000 texts written by 478 traumatized clients who started online CBT. The text data complement an existing database containing demographic variables, descriptors of the trauma profile, and outcome and process data. It is a rich database, well suited to studying the feasibility of different approaches to computerized content analysis.
Clients followed a treatment consisting of structured writing assignments that are delivered over the Internet, without face-to-face contact. The treatment was developed almost ten years ago at the University of Amsterdam by a research team led by Alfred Lange and Paul Emmelkamp. It is a highly standardized treatment in which clients produce 10 texts across 3 distinct sequential interventions. The key intervention in the first phase is self-confrontation through imaginal exposure: clients are instructed to describe the event in the present tense and with as much detail as possible, and are told not to worry about proper chronology or spelling errors. In the second phase, clients are asked to change perspective and write an encouraging letter to a hypothetical friend who experienced a similar event. The aim here is to implement cognitive restructuring by asking the client to develop alternative interpretations of the event, not for themselves, but for someone else. In the third phase, clients are instructed to compile a well-written, coherent account of the event. The aim is to let the client produce a text that symbolically marks the transition from an orientation to the past to an orientation to the present and the future. If you would like to know more about this particular treatment, Alfred Lange will present a more in-depth review of the treatment and its trials in a symposium dedicated to PTSD on Friday.
Overall, outcome was good. Most clients reported a substantial decrease in trauma-related symptoms. At posttest, Cohen’s d, the standardized mean difference, was 1.5 as measured by the Impact of Event Scale, which assesses core PTS symptoms such as intrusion and avoidance. But not everyone improved. Take client 408, for example, shown in the upper-right cell of the graphic on the right: this client did not change, and the growth curve is a flat line. Most clients responded to treatment, but not all.
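To make the effect-size arithmetic concrete, here is a minimal Python sketch of Cohen’s d as the standardized mean difference. The scores below are fabricated for illustration only; they are not the study’s Impact of Event Scale data, and the pooled-SD variant shown here is just one common way of standardizing the difference.

```python
from statistics import mean, stdev

def cohens_d(pre, post):
    """Standardized mean difference between two sets of scores.

    This variant divides by the pooled standard deviation; other
    choices (e.g. the pre-treatment SD) also appear in the literature.
    """
    n1, n2 = len(pre), len(post)
    s1, s2 = stdev(pre), stdev(post)
    pooled_sd = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (mean(pre) - mean(post)) / pooled_sd

# Fabricated pre- and post-treatment symptom scores, for illustration:
pre = [55, 60, 48, 62, 51]
post = [30, 35, 28, 40, 27]
print(round(cohens_d(pre, post), 2))
```

A positive d here means symptoms decreased from pre- to post-treatment; the further from zero, the larger the standardized change.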
To explore the relationship between treatment compliance and outcome, we analyzed the texts using Pennebaker’s Linguistic Inquiry and Word Count (LIWC), one of the better-known programs for automated content analysis. This program counts words. It calculates the degree to which people use different categories of words. It taps standard linguistic dimensions such as pronouns and negations, but it also taps emotion, cognition, time, and many other dimensions. The gray box illustrates what happens when the program scores the first-person singular “I” category, which contains words like me, mine, and, of course, I itself. The program counts the occurrences of these words and expresses that number as a percentage of total word count. This text contains 17 I words and has a total word count of 122 words, resulting in a score of 14%. That’s all there is to it. It is a simple procedure. The value of LIWC lies in its dictionary: the words that are counted and the way in which the words are categorized.
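The counting procedure can be sketched in a few lines of Python. The toy dictionary below is a fabricated stand-in for illustration; the real LIWC dictionary is proprietary, far larger, and covers many more categories than this fragment.

```python
import re

# Fabricated fragment of an "I"-style word list, for illustration only.
I_WORDS = {"i", "me", "my", "mine", "myself"}

def category_score(text, category):
    """Percentage of the words in `text` that belong to `category`."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in category)
    return 100.0 * hits / len(words)

sample = "I told myself that my fear would pass, but it stayed with me."
print(round(category_score(sample, I_WORDS), 1))
```

The sample sentence has 13 words, 4 of which are in the category, so the score is about 30.8% — the same hits-over-total-word-count calculation as in the 17-out-of-122 example above.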
The categories are well-defined. It is relatively easy to relate the categories to treatment compliance. In the exposure phase, for example, clients are instructed to write in the present tense and in the first person singular, which map exactly to the “Present” and “I” word categories of LIWC. In the next phase, clients are instructed to write to a hypothetical friend, and so we would expect to see more You words. Moreover, we would expect more words related to cognitive reflection, as tapped by the “CogMech” category.
This figure provides a good illustration of the data. It is a plot of the percentage of “I” words over time. There are 10 texts: 1 to 4 are written in the first phase, 5 to 8 in the second phase, and 9 and 10 in the third. Each line represents a client, and the darker line represents the average trend. As you can see, the scores clearly mirror the assignment instructions. Clients used more I words in phase 1, as instructed, and fewer in phase 2, in which they changed perspective in the letter to the hypothetical friend.
Other categories clearly mirror the assignment instructions too, as can be seen in this plot, in which we averaged scores within phases. Top left, you see the pattern in I words again, more clearly now, and next to it you see that the pattern in You scores is the almost perfect inverse of I. Words reflecting cognitive processing peak at phase 2; sense-related words, such as see, hear, and feel, peak at phase 1; negative emotions decrease; positive emotions increase; past-tense words increase and peak at phase 3; present-tense words are high in phase 1; and the future tense is used most in the last phase. On average, clients did what they were asked to do. LIWC tells us that treatment compliance was good.
Because of the strong relationship between phase and LIWC scores, phase can be predicted from a LIWC profile. We took a random subset of the data, about two-thirds of the total sample, and used it to train a statistical classifier to predict phase from the category scores. Next, we tested this classifier on the remaining cases. For phase 1 and 2 texts, the classifier correctly inferred phase from LIWC scores in about 80% of cases. Phase 3 texts, however, confused the classifier. This tells us that the LIWC profiles of phase 1 and phase 2 texts follow a distinctive pattern, but that the profile of phase 3 texts is less clear. This appears to be directly related to the specificity of the instructions.
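The train-then-test procedure can be illustrated with a small self-contained sketch. Everything below is fabricated for illustration: the three-category profiles, the cluster centers, and the nearest-centroid rule are stand-ins, not the study’s actual data or classifier; only the train/test split logic mirrors the approach described above.

```python
import random

# Fabricated (profile, phase) pairs. Profiles are toy percentages for
# three categories ("I", "You", "CogMech"); the real study used many more.
def make_text(phase):
    centers = {1: (14.0, 0.5, 4.0),   # exposure: many "I" words
               2: (6.0, 8.0, 8.0),    # letter: many "You"/cognitive words
               3: (9.0, 3.0, 6.0)}    # coherent account: in between
    return tuple(c + random.gauss(0, 1.5) for c in centers[phase]), phase

random.seed(1)
data = [make_text(phase) for phase in (1, 2, 3) for _ in range(60)]
random.shuffle(data)
train, test = data[:120], data[120:]   # roughly two-thirds for training

# Nearest-centroid classification: average the training profiles per
# phase, then assign each test profile to the closest phase centroid.
centroids = {}
for phase in (1, 2, 3):
    rows = [p for p, ph in train if ph == phase]
    centroids[phase] = tuple(sum(col) / len(rows) for col in zip(*rows))

def predict(profile):
    return min(centroids,
               key=lambda ph: sum((a - b) ** 2
                                  for a, b in zip(profile, centroids[ph])))

accuracy = sum(predict(p) == ph for p, ph in test) / len(test)
print(f"held-out accuracy: {accuracy:.2f}")
```

The design choice that matters is holding out the test cases: accuracy is measured only on texts the classifier never saw during training, which is what makes the 80% figure above a genuine generalization estimate rather than a refit.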
Now that we appear to have at least a partial measure of treatment compliance, we can ask whether differences in compliance relate to differences in outcome. The short answer to that question is: “yes, but not very convincingly”. In most cases, we were unable to show statistical relationships between pre/post difference scores and LIWC scores. We found five weak relationships and only two strong effects. Outcome appeared to be better for clients who used more positive emotion words, and better for clients who varied their use of negative emotion words. Still, the effects explained only a very small part of the variance in outcome. Note that we used a liberal critical alpha and that we did not correct p-values for multiple testing. With Bonferroni corrections, we would have to conclude that there was no relationship between the LIWC scores and outcome.
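The Bonferroni point is simple arithmetic: with m tests, each p-value must beat alpha divided by m. The sketch below uses hypothetical p-values, invented purely to illustrate how a set of nominally significant results can vanish after correction; they are not the study’s actual values.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag which p-values survive a Bonferroni correction.

    With m tests, the corrected threshold is alpha / m, which is
    equivalent to multiplying each p-value by m before comparing.
    """
    m = len(p_values)
    return [p < alpha / m for p in p_values]

# Hypothetical p-values from testing 7 LIWC categories against outcome:
p_values = [0.008, 0.03, 0.04, 0.08, 0.12, 0.21, 0.35]
print(bonferroni(p_values))
```

Here three tests pass the uncorrected .05 threshold, but the corrected threshold is .05 / 7 ≈ .007, so none survives — the same pattern of results evaporating under correction that is described above.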
Exploratively, we compared 25 clients with a very good outcome to 25 clients who did not change at all. The average difference score on the Impact of Event Scale was 53 in the first group and zero in the second. The advantage of this approach is that effects, if present, would show up more clearly. Another advantage is that it allows visual inspection of the differences. The differences were tiny, but if we zoom in, we see that the effects appear to confirm that better compliance is associated with better outcome. In the graphs, the black lines represent treatment responders, and the red lines the non-responders. Overall, however, the statistical analyses confirmed the results obtained in the full sample. Again, we found only a few small, marginally significant relationships between LIWC scores and outcome.
To sum up: we found that LIWC mirrors the assignment instructions and the intended therapeutic effects. LIWC scores tell us that treatment compliance was good. Surprisingly, however, compliance as measured by LIWC was unrelated to treatment outcome. As a next step, we will use more probabilistic approaches to computerized text analysis. Recently, Pennebaker used a technique called latent semantic analysis to predict outcome, and with this technique outcome was predicted much better than with LIWC. We will certainly try to replicate his findings in this sample, and we will keep you posted on the results.