A presentation given at the BCcampus Symposium on Scholarly Inquiry into Teaching and Learning, Nov. 2014. I discuss a pilot research project on gauging the impact of peer feedback on writing over the course of multiple peer feedback sessions.
Christina Hendricks, Professor of Teaching at the University of British Columbia-Vancouver
1. TRACKING A DOSE-RESPONSE
CURVE IN PEER FEEDBACK ON
WRITING: A WORK IN PROGRESS
BC Campus Symposium, Nov. 14, 2014
Christina Hendricks
Sr. Instructor, Philosophy
University of British Columbia-Vancouver
Slides licensed CC-BY 4.0
2. My first SoTL project!
• Co-investigator: Dr. Jeremy Biesanz,
Psychology, UBC-Vancouver
• Funding: SoTL Seed Fund, Institute for the
Scholarship of Teaching and Learning, UBC
• This is a work in progress: we are currently analyzing data from the pilot study (2013-2014) and looking for feedback for a larger study (2015-2016)
3. SoTL literature on peer feedback
• Peer feedback improves writing (Paulus, 1999;
Cho & Schunn, 2007; Cho & MacArthur, 2010; Crossman
& Kite, 2012)
• Writing improves from giving peer feedback
(Cho & Cho, 2011; Li, Liu & Steckelberg, 2010)
• Gaps in the literature:
• More peer feedback sessions -> increased
implementation of feedback in writing?
• Do comments on one paper transfer to later papers
(rather than just revisions of same paper)?
4. http://artsone.arts.ubc.ca
Interdisciplinary, team-taught, full-year course for first-year students; 18 credits (6 each in first-year English, History, Philosophy)
Writing intensive: students write 10-12 essays (approx. 1500-2000 words each)
Weekly structure:
• Lecture once per week (100 students)
• Seminars twice per week (20 students)
• Tutorials once per week (4 students plus instructor;
instructor does 5 of these per week)
5. Research questions
1. Do later essays improve on the dimensions addressed in the feedback students receive as well as those in the feedback they give? Does one matter more than the other?
2. Do later essays improve on the dimensions in peer
feedback given and/or received even when instructor
comments don’t agree with these?
3. Are students more likely to implement peer comments in later essays after a few sessions, or do they do so right away?
4. Does the quality of peer comments improve over time (as
compared to instructor comments and/or raters’
evaluation of essays)?
6. Data
• 10 essays by each participant (13 participants in the 2013-2014 pilot study)
• Comments by each student in a small group
(4 students) on peers’ essays (at least 2 per
essay)
• Comments by instructor on each essay
• All essays and comments coded according to
a common rubric
7. Rubric
4 categories:
• Strength of argument
• Organization
• Insight
• Style/mechanics
Subcategories within each, plus a degree rating (1-3)
-- 11 subcategories in total
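To make the coding scheme concrete, here is a minimal sketch of how the rubric and a single coded comment could be represented. The four categories come from the slide above, but the subcategory names and field names are hypothetical placeholders, since the actual 11 subcategories are not listed in the presentation.

```python
# Sketch only: the four categories are from the rubric slide; subcategory
# names below are invented placeholders chosen so the total comes to 11.
from dataclasses import dataclass

RUBRIC = {
    "Strength of argument": ["thesis", "evidence", "counterarguments"],   # hypothetical
    "Organization": ["structure", "transitions"],                          # hypothetical
    "Insight": ["originality", "depth"],                                   # hypothetical
    "Style/mechanics": ["clarity", "grammar", "citation", "word choice"],  # hypothetical
}

@dataclass
class CodedComment:
    essay_id: str        # which essay the comment is about
    coder: str           # who applied the code (peer, instructor, or research assistant)
    category: str        # one of the four rubric categories
    subcategory: str     # one of the 11 subcategories
    degree: int          # 1-3, as on the rubric

# Example coded single-meaning unit (values are illustrative only)
example = CodedComment(
    essay_id="essay_03",
    coder="rater_1",
    category="Organization",
    subcategory="transitions",
    degree=2,
)
assert example.category in RUBRIC and example.subcategory in RUBRIC[example.category]
```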
8. Complications, difficulties
• Gathering written comments by peers
• 2013-2014: wiki
• 2014-2015: piloting sidebar comments on
website
• Tutorial discussions each week
• Written comments sometimes given before the tutorial, sometimes after
• How to incorporate oral discussion of essays?
9. Where we are right now
• Research assistants (UBC undergrads):
• Jessica Wallace (Psychology, author/editor)
• Daniel Munro (Philosophy, former Arts One)
• Kosta Prodanovic (English, former Arts One)
• Refined coding rubric: added, subtracted,
condensed dimensions according to peer comments
• Split student comments into single meaning
units
• Achieved inter-coder reliability on student
comments
10. Inter-coder reliability on student comments (approx. 2000 total)
Agreement among the three coders, for all 242 triple-coded comments vs. the last 70 coded:
• All 3 coders agree on degree (1-3), regardless of category: 90% / 87%
• 3 agree on category & final decision (after meeting) is the same: 56% / 67%
• 2 or 3 agree on category & final decision is the same: 82% / 93%
• 2 agree on category & final decision is different: 12% / 7%
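As an illustration only, here is a rough sketch of how agreement percentages like those in the table above could be computed from the three coders' labels. The DataFrame and column names (cat_1, deg_1, final_cat, etc.) are assumptions, not the project's actual data format.

```python
import pandas as pd

def agreement_summary(df: pd.DataFrame) -> dict:
    """Return the fraction of comments in each agreement pattern from the table."""
    cats = df[["cat_1", "cat_2", "cat_3"]]   # category code from each of the 3 coders
    degs = df[["deg_1", "deg_2", "deg_3"]]   # degree (1-3) from each of the 3 coders

    # All 3 coders gave the same degree, regardless of category
    all3_degree = degs.nunique(axis=1).eq(1)

    # Modal category per comment, and how many coders chose it
    modal_count = cats.apply(lambda row: row.value_counts().iloc[0], axis=1)
    modal_cat = cats.apply(lambda row: row.value_counts().idxmax(), axis=1)
    final_matches_modal = modal_cat.eq(df["final_cat"])  # final decision after meeting

    return {
        "all 3 agree on degree": all3_degree.mean(),
        "3 agree on category & final same": (modal_count.eq(3) & final_matches_modal).mean(),
        "2 or 3 agree on category & final same": (modal_count.ge(2) & final_matches_modal).mean(),
        "2 agree on category & final different": (modal_count.eq(2) & ~final_matches_modal).mean(),
    }

# Usage (hypothetical file of coded comments):
# summary = agreement_summary(pd.read_csv("coded_comments.csv"))
```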
11. Inter-coder reliability on student
comments: Fleiss’ Kappa
• Average for 141 comments: 0.61 (moderate
agreement)
• For the most frequently used categories: 0.8
(substantial agreement)
• Agreement on degree (1-3), measured by intraclass correlation: 0.96
Current procedure: if two raters agree on category & degree, that is the final decision; otherwise the coders meet and discuss
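For readers unfamiliar with the statistic, here is a minimal, generic sketch of how Fleiss' kappa is computed. It assumes the category codes have been tallied into an items-by-categories count matrix; it is not the project's actual analysis code.

```python
import numpy as np

def fleiss_kappa(counts: np.ndarray) -> float:
    """Fleiss' kappa for an N x k matrix: rows = comments, columns = categories,
    each cell = number of raters who assigned that category."""
    counts = np.asarray(counts, dtype=float)
    n_items, _ = counts.shape
    n_raters = counts.sum(axis=1)[0]          # assumes the same number of raters per item

    # Observed agreement: proportion of rater pairs agreeing, per item, then averaged
    p_i = (np.sum(counts ** 2, axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()

    # Agreement expected by chance, from the overall category frequencies
    p_j = counts.sum(axis=0) / (n_items * n_raters)
    p_e = np.sum(p_j ** 2)

    return (p_bar - p_e) / (1 - p_e)

# Tiny illustrative example: 4 comments, 3 raters, 3 categories
example = np.array([
    [3, 0, 0],   # all three raters chose category 1
    [2, 1, 0],
    [0, 3, 0],
    [1, 1, 1],
])
print(round(fleiss_kappa(example), 2))
```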
12. Coding yet to be done
• Instructor comments on all essays
• To isolate comments only given by peers
• To measure improvement in student comment
quality over time
• Coding essays on the categories and
degrees on the rubric used for comments
• To measure improvement in essays over time
13. Analyses to be done
Cross-lagged panel design with autoregressive structure
[Path diagram: essay quality for essays 1-3 (E1, E2, E3) and peer comments at each time point, linked by autoregressive and cross-lagged paths]
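As a rough sketch of the cross-lagged panel logic, the snippet below predicts each essay's quality from the previous essay's quality (autoregressive path) and from the comments on the previous essay (cross-lagged path), and vice versa, using ordinary regressions. The data here are synthetic and the column names hypothetical; the actual analysis would likely be fit as a structural equation model.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic wide-format data standing in for the real scores: one row per
# student, with essay quality (q1-q3) and comment scores (c1-c3).
rng = np.random.default_rng(0)
n = 100
q1 = rng.normal(size=n)
c1 = 0.5 * q1 + rng.normal(size=n)
q2 = 0.6 * q1 + 0.3 * c1 + rng.normal(size=n)
c2 = 0.6 * c1 + 0.2 * q1 + rng.normal(size=n)
q3 = 0.6 * q2 + 0.3 * c2 + rng.normal(size=n)
c3 = 0.6 * c2 + 0.2 * q2 + rng.normal(size=n)
df = pd.DataFrame(dict(q1=q1, q2=q2, q3=q3, c1=c1, c2=c2, c3=c3))

# Autoregressive + cross-lagged paths, time 1 -> 2 and time 2 -> 3
quality_t2 = smf.ols("q2 ~ q1 + c1", data=df).fit()
comments_t2 = smf.ols("c2 ~ c1 + q1", data=df).fit()
quality_t3 = smf.ols("q3 ~ q2 + c2", data=df).fit()
comments_t3 = smf.ols("c3 ~ c2 + q2", data=df).fit()

print(quality_t2.params)  # the c1 coefficient estimates the comments -> quality path
```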
14. Timeline
• April 2015: Finish coding all essays and
comments
• May-June 2015: Do statistical analyses to
address research questions
• July-August 2015: Refine the design for a
larger study to start Sept. 2015, recruit
other Arts One instructors to join the study
15. References
• Cho, K., & MacArthur, C. (2010). Student revision with peer and expert reviewing. Learning and Instruction, 20, 328-338.
• Cho, Y. H., & Cho, K. (2011). Peer reviewers learn from giving comments. Instructional Science, 39, 629-643.
• Cho, K., & Schunn, C. D. (2007). Scaffolded writing and rewriting in the discipline: A web-based reciprocal peer review system. Computers & Education, 48, 409-426.
• Crossman, J. M., & Kite, S. L. (2012). Facilitating improved writing among students through directed peer review. Active Learning in Higher Education, 13, 219-229.
• Li, L., Liu, X., & Steckelberg, A. L. (2010). Assessor or assessee: How student learning improves by giving and receiving peer feedback. British Journal of Educational Technology, 41(3), 525-536.
• Paulus, T. M. (1999). The effect of peer and teacher feedback on student writing. Journal of Second Language Writing, 8, 265-289.
Too much to record and transcribe all tutorials
2013-2014: asked students to pick two things they got out of tutorials that they think are important
2014-2015: not doing this
Refined coding rubric
Added, subtracted, and condensed dimensions according to student comments
Added examples of each sub-dimension
Single meaning units
Had several comments that could be given more than one code; needed to split them up so each comment had one code so as to better do analysis
Inter-coder reliability
How much agreement do we observe relative to how much we would expect to see by chance?
-- takes into account the frequency of each code in the data
-- some codes are more frequent, so you'd expect those to show more apparent agreement just by chance
Kappa ranges from -1 to +1:
-- 0 = the amount of agreement we'd expect to see by chance
-- -1 = complete disagreement
-- 0.6 indicates moderate agreement; 0.8 indicates substantial agreement
-- Kappa covers only the category, not the degree
Many of the most frequently used categories have agreement in the 0.8 range
Reliability on degree: intraclass correlation (ICC) of 0.96
-- measures the extent to which the average across the three raters is reliable: how well the average of their ratings correlates with the average that any possible set of raters would give; there is essentially no benefit to adding more raters
-- the average degree is 2.5
-- 1s are pretty infrequent
-- agreement is mainly about whether a comment is a 2 or a 3 (40% are 2s, 60% are 3s)
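To illustrate the average-measures reliability idea described in these notes, here is a small, generic sketch of an intraclass correlation for the average of k raters (a consistency-type ICC(3,k) computed from a two-way ANOVA decomposition). The exact ICC variant and software used in the study may differ.

```python
import numpy as np

def icc_average_raters(ratings: np.ndarray) -> float:
    """ICC(3,k): reliability of the average of k raters.
    ratings: n_items x k_raters array of degree codes (1-3)."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)
    col_means = ratings.mean(axis=0)

    # Two-way ANOVA decomposition (items x raters, no replication)
    ss_rows = k * np.sum((row_means - grand) ** 2)
    ss_cols = n * np.sum((col_means - grand) ** 2)
    ss_total = np.sum((ratings - grand) ** 2)
    ms_rows = ss_rows / (n - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))

    # Reliability of the mean across the k raters
    return (ms_rows - ms_error) / ms_rows

# Tiny illustrative example: 5 comments rated 1-3 by 3 raters
example = np.array([
    [2, 2, 3],
    [3, 3, 3],
    [2, 3, 2],
    [1, 2, 2],
    [3, 3, 3],
])
print(round(icc_average_raters(example), 2))
```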