Data Archeology - A theory- and context-informed approach to analyzing data traces

Theoretical overview and two examples of Data Archeology - a need to deeply understand context and engage in ground-truthing when analyzing large sets of digital data.

Published in: Education

  1. 1. DATA ARCHEOLOGY: A THEORY-INFORMED APPROACH TO ANALYZING DATA TRACES OF SOCIAL INTERACTION IN LARGE SCALE LEARNING ENVIRONMENTS. ALYSSA WISE, SIMON FRASER UNIVERSITY. Image Credit: Pedro Szekely via Flickr (CC BY 2.0), adapted
  2. 2. OVERVIEW • DATA ARCHEOLOGY – THE BIG IDEA • APPLICATION: THE ROLE OF “LISTENING” IN ONLINE DISCUSSIONS; SOCIAL INTERACTION IN LARGE SCALE LEARNING ENVIRONMENTS • CONCLUSION
  3. 3. DATA ARCHEOLOGY - THE BIG IDEA
  4. 4. DATA MINING Image Credit: Scott Clark via Flickr (CC BY 2.0), adapted
  5. 5. DATA GEOLOGY (SHAFFER, 2013) Image Credit: APS Museum via Flickr (CC BY 2.0), adapted
  6. 6. DATA ARCHEOLOGY (WISE, 2013, 2014) Image Credit: U.S. Army Corps of Engineers Europe District via Flickr (CC BY 2.0), adapted
  7. 7. DATA ARCHEOLOGY (WISE, 2013, 2014) Image Credit: Pedro Szekely via Flickr (CC BY 2.0), adapted
  8. 8. DATA ARCHEOLOGY THEORETICALLY-INFORMED EFFORTS TO MAKE SENSE OF THE DIGITAL ARTIFACTS LEFT BEHIND BY A PRIOR LEARNING “CIVILIZATION”
  9. 9. MOVING BEYOND “MORE IS BETTER” AS A LEARNING MODEL TO PROBE WHAT KINDS OF THINGS ARE BETTER FOR WHAT PURPOSES AND WHY THEORETICALLY-INFORMED
  10. 10. ATTENDING TO THE PEDAGOGICAL CONTEXT AS A CRITICAL FRAME FOR INTERPRETING THE PAST ACTIVITY THAT OCCURRED LEARNING “CIVILIZATION”
  11. 11. THE COMPLETE AGGREGATED DATA RECORD AVAILABLE AT THE END DOESN’T REFLECT THE DYNAMIC ENVIRONMENT IN WHICH THE ACTIVITY OCCURRED TEMPORALITY & TRAJECTORIES
  12. 12. (RE)FRAMING QUESTIONS. FROM: WHAT TOOLS ARE THERE IN THE ONLINE ENVIRONMENT? HOW MUCH DO STUDENTS USE THEM? TO: WHAT IS THE PURPOSE OF THE LEARNING ACTIVITIES CONDUCTED IN THE TOOLS? WHAT ARE THEORETICALLY DESIRABLE PATTERNS OF PARTICIPATION? HOW CAN THESE BEST BE PROXIED BY THE AVAILABLE DATA? FOR MORE ON CONNECTING LEARNING ANALYTICS + LEARNING DESIGN SEE LOCKYER, HEATHCOTE & DAWSON [2013]
  13. 13. THE ROLE OF “LISTENING” IN ONLINE DISCUSSIONS
  14. 14. AN ONLINE DISCUSSION FORUM IS A TOOL; ITS EDUCATIONAL PURPOSE CAN CHANGE: Q & A, Peer Review, Dialogue, Reading Response, Team Decision Making, Argumentation
  15. 15. DIFFERENT PURPOSES FOR A DISCUSSION FORUM IMPLY DIFFERENT EXPECTATIONS FOR DESIRED PATTERNS OF USE
  16. 16. ONLINE DISCUSSION LEARNING PURPOSE Externalizing one’s ideas by contributing posts to an online discussion. Taking in the externalizations of others by accessing existing posts. • Social constructivist perspective - online discussions as a forum for learning through dialogue • Learning occurs as students articulate their ideas, are exposed to the ideas of others, and negotiate differences in perspective • Focus on how students contribute comments (“speak”), attend to others’ messages (“listen”), and the connections between them
  17. 17. UNDERLYING THEORY OF ONLINE “LISTENING” Listening not Lurking: Lurker • Specific person who participates passively • Accesses existing comments but does not contribute • Negative connotation; Listening • Active process conducted by anyone in online discussion • Activity interrelated with contributing • Productive element of discussion participation. Listening not Reading: Listening • Specific term (online discussions) • Dynamic text, distinct sub-units • Multi-authored • Generating a response often involved; Reading • Generic term (all written text) • Static, cohesive text • Single author • Does not require response
  18. 18. ONLINE DISCUSSION LEARNING MODEL Speaking • Mechanism for sharing ideas • Value in speaking that is: – Relevant to the topic at hand – Rationaled with evidence – Recurring and distributed – Moderately portioned – Responsive to the conversation. Listening • Mechanism for becoming aware of ideas • Value in listening that is: – Broad (to consider a diversity of ideas) – Deep (to consider ideas in earnest) – Recursive (to provide context for discussion flow) – Integrated (attending to connected rather than scattered comments)
  19. 19. ONLINE DISCUSSION PEDAGOGICAL CONTEXT • Group and Timing – Small group discussions (~8-12 students) – Random assignment (would be better with differing perspectives) – Discussions run on a weekly schedule with course • Task – Contested real-world challenges (business, edu psychology) – Given two viable contrasting perspectives, come to consensus – Share decision with rationale with whole class • Expectations – Given criteria / guidelines for speaking and listening – Assessment varies (individual/group, student/instructor driven)
  20. 20. ONLINE DISCUSSION TECHNOLOGICAL CONTEXT
  21. 21. ONLINE DISCUSSION TECHNOLOGICAL CONTEXT
  22. 22. ONLINE DISCUSSION LEARNING MODEL Listening • Mechanism for becoming aware of ideas • Value in listening that is: – Broad (to consider a diversity of ideas) – Deep (to consider ideas in earnest) – Recursive (to provide context for discussion flow) – Integrated (attending to connected rather than scattered comments)
  23. 23. ONLINE DISCUSSION LISTENING ANALYTICS
      Criteria | Metric | Definition
      Breadth | % posts viewed | Number of unique posts that a student viewed, divided by the total number of posts in the discussion
      Breadth | % posts read | Number of unique posts that a student read, divided by the total number of posts in the discussion
      Depth | % (real) reads | Number of times a student viewed others’ posts at < 6.5 wps (words per second), divided by the total number of views
      Depth | Av length of real reads (min) | Total time spent reading posts, divided by the number of reads (after scans removed)
      Recursiveness | # of reviews of others’ posts | Number of times a student revisited posts that they had viewed previously in the discussion
      Integration | Posts read connected, not scattered | Concentration of posts viewed by a student in the discussion space* [thread-density, network metrics..]
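The breadth, depth and recursiveness definitions on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the `View` record and the `listening_metrics` function are hypothetical names, "wps" is taken to mean words per second, every view is assumed to be of someone else's post, and the integration metric is left out because the slide does not define it precisely.

```python
from dataclasses import dataclass

# Cutoff from the slide: a view faster than 6.5 words per second counts as a scan.
SCAN_THRESHOLD_WPS = 6.5

@dataclass
class View:
    """One view of another student's post (hypothetical record layout)."""
    post_id: int
    words: int       # length of the viewed post
    seconds: float   # time the post was open; assumed to be > 0

def listening_metrics(views, total_posts):
    """Compute the breadth, depth and recursiveness metrics for one student."""
    is_read = [v.words / v.seconds < SCAN_THRESHOLD_WPS for v in views]
    viewed_ids = {v.post_id for v in views}
    read_ids = {v.post_id for v, r in zip(views, is_read) if r}
    read_seconds = [v.seconds for v, r in zip(views, is_read) if r]

    # A "review" is any view of a post that was already viewed earlier.
    seen, reviews = set(), 0
    for v in views:
        if v.post_id in seen:
            reviews += 1
        seen.add(v.post_id)

    return {
        "pct_posts_viewed": len(viewed_ids) / total_posts,
        "pct_posts_read": len(read_ids) / total_posts,
        "pct_real_reads": sum(is_read) / len(views) if views else 0.0,
        "avg_read_minutes": (sum(read_seconds) / len(read_seconds) / 60
                             if read_seconds else 0.0),
        "reviews": reviews,
    }
```

For example, a student who reads post 1 twice and only scans posts 2 and 3 in a ten-post discussion gets a breadth of 0.3 for posts viewed but only 0.1 for posts read, which is the distinction the two breadth metrics are meant to capture.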
  24. 24. ONLINE DISCUSSION LEARNING MODEL Speaking • Mechanism for sharing ideas • Value in speaking that is: – Relevant to the topic at hand – Rationaled with evidence – Recurring and distributed – Moderately portioned – Responsive to the conversation
  25. 25. ONLINE DISCUSSION SPEAKING ANALYTICS
      Criteria | Metric | Definition
      Recurring | Number of posts | Total number of posts a student contributed to the discussion
      Recurring | Percent of sessions with posts | Number of sessions in which a student made a post, divided by their total number of sessions
      Moderately Portioned | Average post length | Total number of words posted by a student, divided by the number of posts they made to the discussion
      Responsive | Depth of response to existing conversation | 0 None; 1 Acknowledging; 2 Responding to an idea; 3 Responding to multiple ideas
      Rationaled | Degree of argumentation | 0 No argumentation; 1 Unsupported argumentation (Position only); 2 Simple argumentation (Position + Reasoning/Evidence); 3 Complex argumentation (Position + Reasoning/Evidence + Qualifier/Rebuttal)
  26. 26. SOME RESULTS [WISE, HSIAO ET AL. 2012] [WISE, PERERA ET AL., 2012] [WISE, SPEER ET AL. 2013]
      Depth \ Breadth (% of posts viewed) | Low | High
      Low | Disregardful | Coverage
      High | Focused | Thorough
      Un-engaged / Engaged
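The three count-based speaking metrics (number of posts, percent of sessions with posts, average post length) can be illustrated with a short Python sketch. The input layout is an assumption made for the example: each session is a list holding the word count of every post made in that session, and `speaking_metrics` is a hypothetical name. The responsiveness and argumentation scores are hand-coded on the 0 to 3 scales above and are not computed here.

```python
def speaking_metrics(sessions):
    """Compute the count-based speaking metrics for one student.

    `sessions` is a list of sessions; each session is a list with the word
    count of every post made in that session (an empty list means the
    student logged in but did not post).
    """
    post_lengths = [words for session in sessions for words in session]
    sessions_with_posts = sum(1 for session in sessions if session)
    return {
        "number_of_posts": len(post_lengths),
        "pct_sessions_with_posts": (sessions_with_posts / len(sessions)
                                    if sessions else 0.0),
        "average_post_length": (sum(post_lengths) / len(post_lengths)
                                if post_lengths else 0.0),
    }
```

A student with three sessions who posts once in the first (120 words), not at all in the second, and twice in the third (80 and 100 words) has 3 posts, posts in two thirds of sessions, and averages 100 words per post.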
  27. 27. SOME RESULTS
  28. 28. SOME RESULTS
      Depth (% of real reads) \ Breadth (% of posts viewed) | Low | High
      Low | Disregardful | Coverage
      High | Focused | Thorough
      Un-engaged / Engaged
  29. 29. SOME MORE RESULTS [WISE, HAUSKNECHT & ZHAO, 2014] Greater: Listening Depth (% of real reads); Listening Recursiveness (# reviews of others’ posts). Associated with: More Rationaled Speaking; More Responsive Speaking. Listening Breadth not associated with any speaking qualities in the study. Less important for current pedagogical design?
  30. 30. FLESHING OUT TYPOLOGIES
      Pattern | Characteristic Behaviors
      Disregardful | Minimal attention to others’ posts (few posts viewed; short time viewing). Brief and relatively infrequent sessions of activity in discussions.
      Coverage | Views a large proportion of others’ posts, but spends little time attending to them (often only scanning the contents). Short but frequent sessions of activity, focusing primarily on new posts. *May be socially-oriented or content-driven.
      Focused | Views a limited number of others’ posts, but spends substantial time attending to them. Few extended sessions of activity in discussions.
      Thorough | Views a large proportion of others’ posts; spends substantial time attending to many of them. Long overall time spent listening; considerable revisitation of posts already read.
  31. 31. GROUND TRUTHING VIA TEMPORAL MICROANALYTIC CASE STUDIES
      Date | Time | Session | Action | Duration (min) | Length (words) | Message #
      6/3/2011 | 23:46 | 1 | Read | 44.43 | 413 | 447
      6/3/2011 | 23:52 | 1 | Read | 1.73 | 60 | 455
      6/4/2011 | 00:08 | 1 | Scan | 0.23 | 117 | 459
      6/4/2011 | 00:09 | 1 | Read | 12.51 | 413 | 460
      6/4/2011 | 23:49 | 2 | Post | 3.18 | 120 | 477
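The four listening patterns follow from crossing low/high breadth with low/high depth, which can be written as a small lookup. This is a sketch only: the slides do not state where the low/high cut points fall, so the function below takes them as parameters (a median split would be one possible choice), and the names `classify_listener`, `breadth_cut` and `depth_cut` are hypothetical.

```python
def classify_listener(breadth, depth, breadth_cut, depth_cut):
    """Map a student's breadth and depth scores to one of the four patterns.

    breadth: proportion of posts viewed; depth: proportion of real reads.
    The cut points are assumptions supplied by the caller.
    """
    high_breadth = breadth >= breadth_cut
    high_depth = depth >= depth_cut
    patterns = {
        (False, False): "Disregardful",  # few posts viewed, mostly scanned
        (True, False): "Coverage",       # many posts viewed, mostly scanned
        (False, True): "Focused",        # few posts viewed, read in depth
        (True, True): "Thorough",        # many posts viewed, read in depth
    }
    return patterns[(high_breadth, high_depth)]
```

With both cut points at 0.5, a student who views 90% of posts but really reads only 20% of views is labeled Coverage, while one who views 30% and reads 80% is labeled Focused.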
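As a quick check on the log excerpt, the Read/Scan labels can be recomputed from duration and post length using the 6.5 wps cutoff from the listening analytics slide. The sketch below assumes "wps" means words per second and that a view at or above the cutoff is a scan; the tuple layout and function names are illustrative, not the study's actual tooling.

```python
SCAN_THRESHOLD_WPS = 6.5  # views at or above this speed are treated as scans

# (date, time, session, logged action, duration in minutes, length in words, message #)
LOG = [
    ("6/3/2011", "23:46", 1, "Read", 44.43, 413, 447),
    ("6/3/2011", "23:52", 1, "Read", 1.73, 60, 455),
    ("6/4/2011", "00:08", 1, "Scan", 0.23, 117, 459),
    ("6/4/2011", "00:09", 1, "Read", 12.51, 413, 460),
    ("6/4/2011", "23:49", 2, "Post", 3.18, 120, 477),
]

def words_per_second(duration_min, words):
    """Viewing speed for one action."""
    return words / (duration_min * 60)

def classify_view(duration_min, words):
    """Label a view as a real read or a scan based on viewing speed."""
    speed = words_per_second(duration_min, words)
    return "Read" if speed < SCAN_THRESHOLD_WPS else "Scan"

def relabel(log):
    """Recompute the action label for every view; posts are passed through."""
    labels = []
    for _date, _time, _session, action, minutes, words, message in log:
        if action == "Post":
            labels.append((message, action))
        else:
            labels.append((message, classify_view(minutes, words)))
    return labels
```

On these five rows the recomputed labels agree with the logged ones: message 459 was viewed at roughly 8.5 words per second and so counts as a scan, while the 44-minute view of message 447 is a read by a wide margin (and is the kind of outlier a case study would want to inspect).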
  32. 32. TAKEAWAY ATTENDING TO THE PEDAGOGICAL CONTEXT OF DISCUSSION FORUM USE AND CRAFTING THEORETICALLY INFORMED METRICS LET US EXTRACT EXPLANATORY AND ACTIONABLE INFORMATION FROM THE CLICKSTREAM DATA
  33. 33. SOCIAL INTERACTION IN LARGE SCALE LEARNING ENVIRONMENTS WITH THANKS TO MY RESEARCH ASSISTANT YI CUI
  34. 34. CHALLENGES WE SET OURSELVES FOR LOOKING AT THE MOOC DATA LOOK AT SOCIAL INTERACTION ADDRESS ISSUES OF SCALE EMPLOY NATURAL LANGUAGE PROCESSING* ATTEND TO PEDAGOGICAL CONTEXT WORK IN A THEORY-INFORMED WAY
  35. 35. SOCIAL INTERACTION IN MOOC FORUMS • STRONG PREDICTOR OF PERSISTENCE, BUT THIS MAY BE BECAUSE IT INDEXES (RATHER THAN CAUSES) ENGAGEMENT – WHAT ABOUT LEARNING? • CLAIMED TO PROVIDE CRITICAL SOCIAL LEARNING SUPPORT, BUT WITHOUT TIES TO THE ACADEMIC CONTENT, SOCIABILITY MAY NOT IMPACT LEARNING [KUH, 2002; WISE, DEL VALLE, CHANG & DUFFY, 2004] • ONLY A SMALL % PARTICIPATE, BUT THIS IS NOT SURPRISING IF IT IS NOT DESIGNED INTO A COURSE. HOW PEOPLE PARTICIPATE IS AS IMPORTANT AS IF THEY DO SO.
  36. 36. WHY FOCUS ON LEARNING NOT ATTRITION? • STRONG NEED TO RECONCEPTUALISE PERSISTENCE AND ATTRITION IN MOOCS GIVEN THE NUMBER OF PEOPLE WHO REGISTER W/O “AN INFORMED COMMITMENT TO COMPLETE THE COURSE” [DEBOER, HO, STUMP & BRESLOW, 2014] • GREAT VARIETY IN INTENTIONS, WORKING PATTERNS, RESOURCES USED, SEQUENCE AND FREQUENCY OF USE [DEBOER ET AL., 2014; KIZILCEC, PIECH & SCHNEIDER, 2013] • JUST INDEXING LEVEL OF ENGAGEMENT TO PREDICT WHO WILL STOP PARTICIPATING DOESN’T TELL US WHY OR HOW TO INTERVENE • LEARNING MAY BE OCCURRING EVEN FOR THOSE WHO DON’T EVENTUALLY COMPLETE
  37. 37. WHY FOCUS ON LEARNING NOT ATTRITION? • STRONG NEED TO RECONCEPTUALISE PERSISTENCE AND ATTRITION IN MOOCS GIVEN THE NUMBER OF PEOPLE WHO REGISTER W/O “AN INFORMED COMMITMENT TO COMPLETE THE COURSE” [DEBOER, HO, STUMP & BRESLOW, 2014] • GREAT VARIETY IN INTENTIONS, WORKING PATTERNS, RESOURCES USED, SEQUENCE AND FREQUENCY OF USE [DEBOER ET AL., 2014; KIZILCEC, PIECH & SCHNEIDER, 2013] • JUST INDEXING LEVEL OF ENGAGEMENT TO PREDICT WHO WILL STOP PARTICIPATING DOESN’T TELL US WHY OR HOW TO INTERVENE • LEARNING MAY BE OCCURRING EVEN FOR THOSE WHO DON’T EVENTUALLY COMPLETE “I'm very happy to be in this course. I [couldn’t] finish it on time, but I think I have learnt a lot. Thank you Prof X, you are a great teacher, very [professional], excellent in many ways. I will miss you!”
  38. 38. SOCIAL INTERACTION IN MOOC FORUMS • STRONG PREDICTOR OF PERSISTENCE, BUT THIS MAY BE BECAUSE IT INDEXES (RATHER THAN CAUSES) ENGAGEMENT – WHAT ABOUT LEARNING? • CLAIMED TO PROVIDE CRITICAL SOCIAL LEARNING SUPPORT, BUT WITHOUT TIES TO THE ACADEMIC CONTENT, SOCIABILITY MAY NOT IMPACT LEARNING [KUH, 2002; WISE, DEL VALLE, CHANG & DUFFY, 2004] • ONLY A SMALL % PARTICIPATE, BUT THIS IS NOT SURPRISING IF IT IS NOT DESIGNED INTO A COURSE. HOW PEOPLE PARTICIPATE IS AS IMPORTANT AS IF THEY DO SO.
  39. 39. FRAMING QUESTIONS • WHAT WAS THE PEDAGOGICAL PURPOSE / DESIGN OF THE DISCUSSION FORUMS IN THE PSYCH MOOC? • BASED ON THIS, WHAT WERE THEORETICALLY DESIRABLE PATTERNS OF PARTICIPATION? • HOW CAN THESE BEST BE PROXIED BY THE AVAILABLE DATA? • HOW COULD THE DESIRED PATTERNS BE BETTER SUPPORTED?
  40. 40. MOOC PEDAGOGICAL CONTEXT • Course Topic – Introductory Psychology • Level and Expected Background – Designed for college freshmen – Equivalent of high school education expected – No specific prior knowledge indicated • Course Design – Video lectures (8-15 min long) – Readings from OLI (Open Learning Initiative) online textbook – Weekly timed multiple choice quiz – Final exam at the end of the course
  41. 41. WHAT ABOUT THE DISCUSSION FORUMS? • Optional part of the course, main pedagogical design a Q&A forum to ask and answer questions about course material Communication “There will be a Q&A forum where you can post your questions about the course. Students will have the opportunity to "vote up" questions they want answered, and the questions with the most votes will be answered either in a forum post or a video.” …. Expectations “Participants are expected to seek help if needed from your fellow students by using the forums”
  42. 42. RECREATED STUDENT FORUM VIEW. Forums: Welcome to the course discussion forums.
      Sub-forum | Description
      General Discussion | Discuss general aspects of the course.
      Q&A | Ask and answer questions about course material.
      Assignments | Discuss details of the course assignments.
      Technical Issues | Post any issues with, or questions about, technical aspects of the course website (trouble with video playback, broken links, etc.).
      OLI Textbook Questions | Post any issues with, or questions about, technical aspects of the OLI Textbook.
      Student Bios | Introduce yourself and learn about other students.
  43. 43. RECREATED STUDENT FORUM VIEW. Forums: Welcome to the course discussion forums.
      Sub-forum | Description | Activity: Threads (Posts + Comments)
      General Discussion | Discuss general aspects of the course. | 289 (1341+804)
      Q&A | Ask and answer questions about course material. | 158 (525+204)
      Assignments | Discuss details of the course assignments. | 147 (827+775)
      Technical Issues | Post any issues with, or questions about, technical aspects of the course website (trouble with video playback, broken links, etc.). | 108 (318+79)
      OLI Textbook Questions | Post any issues with, or questions about, technical aspects of the OLI Textbook. | 99 (347+106)
      Student Bios | Introduce yourself and learn about other students. | 662 (1614+354)
      Notes: [1] Counts only include non-deleted posts/threads [2] Counts taken prior to data cleaning, may include duplicate or nonsense posts
  44. 44. PROCESS CHECK: DOES IT MAKE SENSE TO USE LISTENING AND SPEAKING THEORY IN THIS CONTEXT? • PEDAGOGICAL CHALLENGE – MOST OF THE DISCUSSION ISN’T CONTENT RELATED, LISTENING ISN’T EXPECTED TO RELATE TO LEARNING • TECHNICAL CHALLENGE – LOW GRANULARITY DATA (“VIEW.FORUM” & “VIEW.THREAD” VS. “VIEW.POST”, THOUGH “VOTE.UP” NOW AVAILABLE) • PRACTICAL CHALLENGE – MANY THREADS IN Q&A FORUM NOT ACTUALLY CONTENT
  45. 45. CHANGING TRACKS: A WHOLE BUNCH OF QUESTIONS WE THOUGHT WE WERE GOING TO ASK WENT OUT THE WINDOW… • NEW (VERY BASIC) FOCUS ON IF THE PATTERNS OF FORUM USE MATCHED THOSE DESIRED FOR THE INTENDED PURPOSE – DID STUDENTS USE THE Q&A FORUM TO ASK QUESTIONS ABOUT THE COURSE MATERIAL? – DID THE INSTRUCTORS REPLY TO (THE HIGHEST VOTED) QUESTIONS ABOUT THE COURSE MATERIAL?
  46. 46. DID STUDENTS USE THE Q&A FORUM TO ASK QUESTIONS ABOUT THE COURSE MATERIAL? • After preliminary inspection we decided to code both the Q&A and General Discussion (GD) forums because no clear functional difference was seen • Two raters coded the starting post in each thread as either – Content [C] (Asking questions about course material, expanding on course content; discussing a resource shared) – Non-Content [X] (Including logistics, social, study group formation and link sharing) • 439 of 447 total threads coded – 8 removed for foreign language or complete nonsense contents – 92% agreement (k=0.81); all differences reconciled, rule of leniency Image: So Many MOOCs by mksmith23, CC by 2.0 license
  47. 47. DID STUDENTS USE THE Q&A FORUM TO ASK QUESTIONS ABOUT THE COURSE MATERIAL?
      Forum | Content Threads | Non-Content Threads
      General Discussion | 55 | 226
      Q&A | 68 | 90
      Total | 123 (28%) | 316 (72%)
      Image: So Many MOOCs by mksmith23, CC by 2.0 license
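The agreement figures on this slide pair raw percent agreement with a chance-corrected kappa. A minimal Python sketch of Cohen's kappa for two raters is given below, assuming the reported "k" is Cohen's kappa (the slide does not name the statistic). The raw codes from the study are not in the deck, so the example uses made-up labels and does not attempt to reproduce the reported 0.81.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Share of items on which the two raters gave the same code."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same code at random,
    # given each rater's own marginal distribution of codes.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)
```

With illustrative codes `["C", "C", "X", "X"]` and `["C", "X", "X", "X"]`, the raters agree on 75% of items but kappa is only 0.5, which shows why the chance-corrected figure is reported alongside raw agreement.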
  48. 48. DID THE INSTRUCTORS REPLY TO (THE HIGHEST VOTED) QUESTIONS ABOUT THE COURSE MATERIAL? • First approach: “Instructor Replied” label [problematic] • 2 “official” Instructor IDs (threads automatically labelled) – Course Professor [2XXXXX4] – Course TA [5XXXX1] • 1 “unofficial” Instructor ID (threads not automatically labelled) – Course Professor [2XXXXX0] “[Yes,] I really am NAME2XXXXX0 (XXXX is my first name) and am the instructor for the course. I've been at UNIVERSITY for 43 years and love teaching. This course was a challenge because there was no feedback from students when the modules were being taped. The lack of student interaction is the real challenge of a MOOC. Just looking at a camera is a very different context than looking at a classroom of bright students. NAME2XXXXX0, Instructor” Image: So Many MOOCs by mksmith23, CC by 2.0 license
  49. 49. DID THE INSTRUCTORS REPLY TO (THE HIGHEST VOTED) QUESTIONS ABOUT THE COURSE MATERIAL?
      Forum | Threads | Instructor Replied (All 3 IDs) | Content Threads | Replied by instructor | % of Instructor Replies Directed at Content
      General Discussion | 289 | 31 (11%) | 55 | 5 (9%) | 16%
      Q&A | 158 | 31 (20%) | 68 | 17 (25%) | 55%
      Total | 447 | 62 (14%) | 123 | 22 (18%) | 35%
      Image: So Many MOOCs by mksmith23, CC by 2.0 license
  50. 50. DID THE INSTRUCTORS REPLY TO (THE HIGHEST VOTED) QUESTIONS ABOUT THE COURSE MATERIAL?
      Measure | Instructor Replied (62 threads) | Non-Replied (377 threads)
      Average # (range) of votes | 2.6 (0 to 30) | 1.7 (-13 to 45)
      Av # (range) of posts+comments | 8.6 (2-110) | 6.2 (1-92)
      Av # (range) views | 110 (25-1185) | 75 (5-1143)
      Measure | Content Threads | Non-Content Threads
      Av # votes | 1.2 | 2.1
      Av # posts+comments | 4.1 | 7.5
      Av # views | 51 | 91
      Image: So Many MOOCs by mksmith23, CC by 2.0 license
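The percentages in this table can be recomputed from the raw counts, which also makes explicit which denominator each column uses. The sketch below is illustrative; the dictionary keys and function names are hypothetical, while the counts are the ones shown on the slide.

```python
# Counts taken from the slide.
COUNTS = {
    "General Discussion": {"threads": 289, "replied": 31, "content": 55, "content_replied": 5},
    "Q&A": {"threads": 158, "replied": 31, "content": 68, "content_replied": 17},
}

def pct(part, whole):
    """Whole-number percentage, rounded as on the slide."""
    return round(100 * part / whole)

def summarize(row):
    """Derive the three percentage columns from one row of counts."""
    return {
        # share of all threads with an instructor reply
        "replied_pct": pct(row["replied"], row["threads"]),
        # share of content threads with an instructor reply
        "content_replied_pct": pct(row["content_replied"], row["content"]),
        # share of instructor replies that went to content threads
        "replies_at_content_pct": pct(row["content_replied"], row["replied"]),
    }

def total(counts):
    """Sum each count across forums to get the Total row."""
    keys = ["threads", "replied", "content", "content_replied"]
    return {k: sum(row[k] for row in counts.values()) for k in keys}
```

Running `summarize` on each forum and on `total(COUNTS)` gives the same rounded figures as the table, for example 17 of 31 Q&A instructor replies (55%) going to content threads versus 5 of 31 (16%) in General Discussion.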
  51. 51. A CORE CHALLENGE FOR SOCIAL INTERACTION AT SCALE • Too much quantity, not enough quality • Students get lost / overwhelmed in the abundance of communication • Instructors too, challenging to find where their input is needed • A need to separate “the wheat from the chaff” Image: So Many MOOCs by mksmith23, CC by 2.0 license
  52. 52. CAN NATURAL LANGUAGE PROCESSING HELP? • Goal to support the instructor in finding content threads more efficiently in the forums to be able to respond and facilitate learning • A modest attempt to build a proof-of-concept model – Feature extraction performed with basic bag-of-words feature set (inc. bigrams, trigrams and parts-of-speech tagging), rare threshold of 5 – Unigrams and bigrams alone most useful to characterize and model posts – Total of 1573 features extracted Image: So Many MOOCs by mksmith23, CC by 2.0 license
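The bag-of-words step described here can be sketched in plain Python. This is an illustration under assumptions rather than the tool actually used: it covers only unigrams and bigrams (the features the slide reports as most useful), adds a beginning-of-line marker `BOL`, and interprets the "rare threshold of 5" as keeping features that occur in at least five posts. The function names are hypothetical.

```python
import re
from collections import Counter

def ngram_features(text):
    """Return the set of unigram and bigram features present in one post."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    tokens = ["BOL"] + words  # marker so that post-initial words are distinguishable
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return set(words) | set(bigrams)

def build_vocabulary(posts, rare_threshold=5):
    """Keep features that appear in at least `rare_threshold` posts."""
    doc_freq = Counter()
    for post in posts:
        doc_freq.update(ngram_features(post))
    return sorted(f for f, n in doc_freq.items() if n >= rare_threshold)

def vectorize(post, vocabulary):
    """Binary presence/absence vector for one post over the vocabulary."""
    present = ngram_features(post)
    return [1 if feature in present else 0 for feature in vocabulary]
```

On a toy corpus such as `["Hi everyone", "Hi all", "Why is memory selective"]` with `rare_threshold=2`, only `BOL_hi` and `hi` survive the threshold, so a post starting with a greeting is flagged on both features while a question post is not.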
  53. 53. CHARACTERISTIC FEATURES (Feature, Kappa)
      Content Threads: but 0.25, more 0.24, by 0.24, in_the 0.24, why 0.24, as 0.23, what 0.22, is 0.22, that 0.21, or 0.20, then 0.20, in 0.19, when 0.18, of_the 0.17, of 0.17, and_the 0.16, question 0.16, between 0.16, age 0.16, correct 0.16, than 0.16, were 0.15, by_the 0.15, an 0.15, answer 0.15, does 0.14, mental 0.14, research 0.14, to_the 0.14, their 0.14
      Non-Content Threads: course 0.12, i 0.09, my 0.08, this_course 0.07, final 0.06, BOL_i 0.06, quiz 0.06, the_course 0.06, exam 0.05, thanks 0.05, i_am 0.05, videos 0.04, grade 0.04, certificate 0.04, BOL_hi 0.04, BOL_hello 0.04, hello 0.04, everyone 0.04, will 0.04, final_exam 0.04, i_just 0.04, hi 0.04, i_can 0.04, find 0.04, coursera 0.03, courses 0.03, i_have 0.03, grades 0.03, material 0.03, quizzes 0.03
      Image: So Many MOOCs by mksmith23, CC by 2.0 license
  54. 54. PREDICTING CONTENT POSTS • Procedure – Algorithm: Support Vector Machines – Setting for Nominal Class Values: LibLINEAR – Cross-validation, 10 randomly generated folds • Results – Best Model Accuracy/Kappa = 0.86/0.64 – Recall = 0.71 (False Neg. rate = 0.29) – Precision = 0.76 Image: So Many MOOCs by mksmith23, CC by 2.0 license
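The four reported figures are all functions of a single confusion matrix, which the following Python sketch makes explicit. The slides do not report the matrix itself; the counts below are back-calculated here from figures given elsewhere in the deck (439 coded threads, 123 of them content, 114 flagged by the model, 76% of those correct) and should be read as an inferred illustration, not as reported data.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, kappa, recall and precision from confusion-matrix counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    recall = tp / (tp + fn)        # share of true content threads that were found
    precision = tp / (tp + fp)     # share of flagged threads that are content
    # Chance agreement between predicted and actual labels, from the marginals.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {"accuracy": accuracy, "kappa": kappa,
            "recall": recall, "precision": precision}

# Inferred counts (an assumption, not reported in the slides):
# 87 content threads found, 27 false alarms, 36 content threads missed,
# 289 non-content threads correctly ignored.
INFERRED = {"tp": 87, "fp": 27, "fn": 36, "tn": 289}
```

Under these inferred counts the metrics round to the reported values (accuracy 0.86, kappa 0.64, recall 0.71, precision 0.76), and the false negative rate is simply one minus recall: about 29% of content threads would still go unflagged.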
  55. 55. STANDARD FORUM INDICATORS DON’T HELP IDENTIFY CONTENT
      Model | Accuracy | Kappa | Recall | Precision
      Base Model | 0.86 | 0.64 | 0.71 | 0.76
      Addition of Standard Forum Indicators:
      Base Model + # votes | 0.84 | 0.60 | 0.68 | 0.74
      Base Model + # posts | 0.85 | 0.62 | 0.69 | 0.75
      Base Model + # views | 0.85 | 0.62 | 0.69 | 0.76
      Image: So Many MOOCs by mksmith23, CC by 2.0 license
  56. 56. INSTRUCTOR PERSPECTIVE
      Measure | W/o Content Model (Default) | With Content Model
      Total Number of Potential Content Threads to Read | 439 (37/wk on av) | 114 (10/wk on av)
      Percent of Threads Actually About Course Content | 28% | 76%
      Percent of Content Threads With Instructor Replies | 18% | >18%?
      Percent of Instructor Replies Addressing Content | 35% | >35%?
      Image: So Many MOOCs by mksmith23, CC by 2.0 license
  57. 57. STRENGTHS, LIMITATIONS & FUTURE OPPORTUNITIES • Design of forums can improve, but unexpected use will still happen. For instructors to facilitate learning, the first step is to locate where learning opportunities are happening – content modeling can help. – Aligns well with Coursera’s development of content / logistics TAs. • Model is simple but useful; more sophisticated modelling can improve these results. • Model built with only 439 starting posts; including all the posts could lead to both better prediction of whether a post is content-related and more nuanced assessment of threads (e.g. “This thread is estimated to have 87% content-related posts”) • Model seemed not to draw heavily on domain-specific vocabulary but may rely on domain-specific discourse types (extensibility to other social sciences but perhaps not humanities / hard sciences) Image: So Many MOOCs by mksmith23, CC by 2.0 license
  58. 58. TAKEAWAY ATTENDING TO THE PEDAGOGICAL CONTEXT OF DISCUSSION FORUM USE AND GETTING CLOSE TO THE DATA LET US DEVELOP A SIMPLE YET APPROPRIATE AND USEFUL MODEL - SUPPORTING CONTENT RELATED LEARNING DISCUSSION MAY BE PRE-REQUISITE TO STUDYING MORE COMPLEX FACETS OF INTERACTION IN LARGE SCALE LEARNING ENVIRONMENTS
  59. 59. A DATA ARCHEOLOGY APPROACH THAT PAYS ATTENTION TO THE LEARNING “CIVILIZATION” THAT CREATED THE DATA AND POSITS THEORY-INFORMED PATTERNS OF BEHAVIOR CAN HELP US BETTER UNDERSTAND AND SUPPORT SOCIAL INTERACTION IN LARGE SCALE LEARNING ENVIRONMENTS CONCLUSION
  60. 60. DATA ARCHEOLOGY: A THEORY-INFORMED APPROACH TO ANALYZING DATA TRACES OF SOCIAL INTERACTION IN LARGE SCALE LEARNING ENVIRONMENTS. ALYSSA WISE, SIMON FRASER UNIVERSITY. Image Credit: Pedro Szekely via Flickr (CC BY 2.0), adapted
