A self training framework for exploratory discourse detection final
  • Here is an example in Elluminate, a web conferencing tool that supports chat alongside video, slides and presentations. Every day, hundreds of sessions are recorded and uploaded.
  • The middle panel shows the chat text for this recording, and the left panel lists all the users in the chat room. A recording can run for hours, so reading all of this content is very time-consuming. Oh, god, would you please tell me which part is critical and worth reading? Just like this! Wouldn't it be wonderful if someone helped you figure out which parts are most important, and which users are worth focusing on? OK, that is what we want to show you.

    1. A self-training framework for exploratory discourse detection
        Zhongyu Wei
        SoLAR symposium, Open University, UK, 26 June 2012
        PhD student, SEEM, The Chinese University of Hong Kong, Hong Kong
        SocialLearn intern, Open University, UK
        zywei@se.cuhk.edu.hk
    2. Outline
        • Exploratory dialogue analysis
        • A self-training framework
        • Datasets and experiments
        • Applications
    3. Online learning resources explosion
        • Learning Forum
        • Online Seminar
        • Online Conference
        • Distant Education Platform
    4. the critical, knowledge-building discourse?...
    5. How many points in the webinar triggered learning/knowledge-building?
        • This person contributes a lot during the chat.
        • This part appears to have very good content that will provoke deeper learning.
        Data in this study taken from a 2 day OU conference in Elluminate & Cloudworks:
    6. Exploratory dialogue analysis
        Exploratory dialogue ... represents a joint, coordinated form of co-reasoning in language, with speakers sharing knowledge, challenging ideas, evaluating evidence and considering options ... (Mercer, 2004)

        Category | Description | Example
        ---------|-------------|--------
        Challenge | Identifies that something may be wrong and in need of correction | I disagree. Freemind is a superb piece of software to use...
        Evaluation | Has a descriptive quality | Thats a really interesting approach
        Extension | Builds on or provides resources that support discussion | Ive embedded helens slide share over in cloudworks http://link.com
        Reasoning | The process of thinking an idea through. | Why intranet only? What meaning CLOSED in

        Mercer, N. (2004). Sociocultural discourse analysis: analysing classroom talk as a social mode of thinking. Journal of Applied Linguistics, 1(2), 137-168.
    7. Low exploratory dialogue

        Time | Contribution
        -----|-------------
        3:12 PM | LOL
        3:12 PM | Its not looking good.
        3:13 PM | Sorry, had to do that.
        3:13 PM | jaaa
        3:13 PM | Ouch!
        3:13 PM | It was a vuvuzela.
        3:13 PM | I though that was you @Alistair
        3:13 PM | Ive taken away the vuvuzela from you now!
        3:13 PM | LOL
    8. Higher exploratory dialogue

        Time | Contribution
        -----|-------------
        2:42 PM | I hate talking. :-P My question was whether "gadgets" were just basically widgets and we could embed them in various web sites, like Netvibes, Google Desktop, etc.
        2:42 PM | Thanks, thats great! I am sure I understood everything, but looks inspiring!
        2:43 PM | Yes why OU tools not generic tools?
        2:43 PM | Issues of interoperability
        2:43 PM | The "new" SocialLearn site looks a lot like a corkboard where you can add various widgets, similar to those existing web start pages.
        2:43 PM | What if we end up with as many apps/gadgets as we have social networks and then we need a recommender for the apps!
        2:43 PM | My question was on the definition of the crowd in the wisdom of crowds we acsess in the service model?
    9. Exploratory dialogue detection
        Problem Statement
        • Given an online chatting session S = {d0, d1, ..., dn}, where dk stands for the kth dialogue, classify dk as exploratory or non-exploratory.
        Solution from learning analytics
        • Sociocultural discourse analysis method
        • Manual
        • High precision and low recall

        Category | Cue phrases
        ---------|------------
        Challenge | But if, have to respond, my view
        Evaluation | Good example, good point
        Extension | More links, for example
        Reasoning | That is why, next step
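To make the cue-phrase idea concrete, here is a minimal Python sketch that flags a post when it contains any of the cue phrases listed on the slide. The function names and the plain case-insensitive substring matching are assumptions for illustration; the slides do not say how the matching was actually done.

```python
# Hypothetical cue-phrase detector. The phrase list is copied from the slide;
# the matching rule (case-insensitive substring search) is an assumption.
CUE_PHRASES = {
    "Challenge": ["but if", "have to respond", "my view"],
    "Evaluation": ["good example", "good point"],
    "Extension": ["more links", "for example"],
    "Reasoning": ["that is why", "next step"],
}


def matched_categories(post):
    """Return the categories whose cue phrases occur in the post."""
    text = post.lower()
    return [category for category, phrases in CUE_PHRASES.items()
            if any(phrase in text for phrase in phrases)]


def is_exploratory(post):
    """Label a post exploratory if at least one cue phrase matches."""
    return bool(matched_categories(post))


print(matched_categories("That is a good point, but if we scale up?"))
print(is_exploratory("LOL"))
```

A fixed phrase list only fires on posts that happen to use those exact phrases, which fits the slide's remark that the manual approach has high precision and low recall.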
    10. Exploratory dialogue classification
        [Diagram: a dialogue is passed to an exploratory discourse classifier, which labels it Exploratory or Non-Exploratory.]
        A dialogue is represented by a feature vector.
        • {I think she is right}
        • {I, think, she, is, right, I-think, think-she, she-is, is-right, I-think-she, think-she-is, she-is-right}
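The feature list on this slide looks like all word unigrams, bigrams and trigrams of the post, joined with hyphens. A minimal Python sketch under that reading follows; whitespace tokenisation and the function name `ngram_features` are assumptions, not something the slides specify.

```python
def ngram_features(text, max_n=3):
    """Return all word n-grams of length 1..max_n, joined with hyphens."""
    tokens = text.split()  # assumption: plain whitespace tokenisation
    features = []
    for n in range(1, max_n + 1):
        for start in range(len(tokens) - n + 1):
            features.append("-".join(tokens[start:start + n]))
    return features


print(ngram_features("I think she is right"))
```

For the slide's example sentence this yields the same twelve features, in the same order, as the slide lists.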
    11. Exploratory dialogue classification
        • Instance-based supervised classifier training
          [Diagram: posts labeled Exploratory / Non-Exploratory feed Classifier Training, which produces the Exploratory Discourse Classifier; the classifier outputs Exploratory / Non-Exploratory.]
        • Feature-based supervised classifier training
          [Diagram: posts labeled Exploratory / Non-Exploratory feed Feature Generation, which produces a Feature List; the Feature List feeds Classifier Training, which produces the Exploratory Discourse Classifier; the classifier outputs Exploratory / Non-Exploratory.]
    12. An example of feature list

        Feature | Exploratory | Non-Exploratory
        --------|-------------|----------------
        what-is | 0.9992 | 0.0008
        good-point | 0.9995 | 0.0005
        your-audio-should | 0.001 | 0.999
        thank-you | 0.004 | 0.996
        my-name | 0.07 | 0.93
    13. A self-training framework
        [Diagram: Annotated data feeds Classifier training, which produces the Classifier.]
        • Step 1: Train the initial classifier on annotated data.
        • Annotated data is time-consuming to obtain.
    14. A self-training framework
        [Diagram: the Classifier labels Unlabeled data; Instance Selection turns the output into Pseudo-annotated data, which is combined with the Annotated data for Classifier training.]
        • Step 2: Classify unlabeled data, select high-confidence instances and combine them with the annotated data.
        • Step 3: Re-train the classifier on the augmented training data.
    15. A self-training framework
        [Diagram: as on the previous slide, with the final Classifier used for Exploratory Discourse Detection on Test data to produce Results.]
        • Step 4: Obtain the final classifier. Stop when there is no improvement on the validation dataset, after a certain number of iterations, or when no class labels change.
        • Step 5: Detect exploratory dialogues on the test data.
    16. A self-training framework
        [Diagram: same as on the previous slide.]
        • Self-training will introduce noisy instances.
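The self-training steps on the preceding slides can be sketched in Python roughly as follows. This is an illustrative sketch only: the slides use Max Entropy classifiers, whereas the toy nearest-centroid classifier on one-dimensional inputs here is a stand-in so the example runs with the standard library. The names `self_train`, `train_centroids` and `predict_centroid` are hypothetical, and the only stopping rules implemented are an iteration cap and "no new confident instances".

```python
def self_train(labeled, unlabeled, train, predict, threshold=0.8, max_iter=10):
    """labeled: list of (x, y); unlabeled: list of x.
    train(pairs) -> model; predict(model, x) -> (label, confidence)."""
    training = list(labeled)
    pool = list(unlabeled)
    model = train(training)                      # Step 1: initial classifier
    for _ in range(max_iter):                    # stop after a fixed number of rounds
        remaining, added = [], 0
        for x in pool:                           # Step 2: classify unlabeled data
            label, confidence = predict(model, x)
            if confidence > threshold:           # keep only high-confidence instances
                training.append((x, label))
                added += 1
            else:
                remaining.append(x)
        if added == 0:                           # nothing new to learn from
            break
        pool = remaining
        model = train(training)                  # Step 3: re-train on augmented data
    return model


# Toy stand-in classifier (assumption): nearest class centroid on 1-D inputs.
def train_centroids(pairs):
    sums, counts = {}, {}
    for x, y in pairs:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict_centroid(model, x):
    distances = {y: abs(x - centre) for y, centre in model.items()}
    label = min(distances, key=distances.get)
    total = sum(distances.values())
    confidence = 1.0 if total == 0 else 1.0 - distances[label] / total
    return label, confidence


model = self_train([(0.0, "non"), (1.0, "exp")], [0.1, 0.9, 0.5],
                   train_centroids, predict_centroid)
print(model)
```

In this toy run the two clear-cut points are absorbed as pseudo-labeled data, while the ambiguous point at 0.5 never passes the confidence threshold and stays out of the training set.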
    17. KNN based Instance Selection approach
        K nearest neighbors classification
        [Figure: labeled points around a query point. Blue stands for "exploratory"; gray stands for "non-exploratory".]
        • 1 nearest neighbor: "exploratory"
        • 2 nearest neighbors: "exploratory"
        • 5 nearest neighbors: "non-exploratory"
    18. KNN based Instance Selection approach
        • Pseudo-annotated instances P = {p1, p2, ..., pn}, where pk = (lk, ck); lk is the pseudo label and ck is the confidence value.
        • Form a candidate list: choose instances with ck > r.
        • For each pk in the candidate list, identify its K nearest neighbors and update the pseudo label of pk by KNN.
        • Obtain the new pseudo-annotated instances P-updated.
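A minimal Python sketch of the selection procedure above, under stated assumptions: instances are numeric feature vectors, distance is squared Euclidean, and the neighbors are drawn from the annotated data. The slide does not say which pool the neighbors come from or which distance is used, so those choices, and the name `knn_select`, are illustrative only.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_select(pseudo, labeled, r=0.8, k=3):
    """pseudo: list of (vector, pseudo_label, confidence).
    labeled: list of (vector, label) used as the neighbor pool (assumption).
    Returns the candidates with confidence > r, relabeled by majority vote."""
    candidates = [p for p in pseudo if p[2] > r]          # form the candidate list
    updated = []
    for vector, _old_label, confidence in candidates:
        nearest = sorted(labeled, key=lambda item: sq_dist(vector, item[0]))[:k]
        votes = Counter(label for _, label in nearest)
        updated.append((vector, votes.most_common(1)[0][0], confidence))
    return updated


labeled = [((0, 0), "non"), ((0, 1), "non"), ((1, 0), "non"),
           ((5, 5), "exp"), ((5, 6), "exp"), ((6, 5), "exp")]
pseudo = [((0.5, 0.5), "exp", 0.9),    # confident, but sits among "non" points
          ((5.5, 5.5), "exp", 0.95),
          ((3, 3), "non", 0.6)]        # below r, so dropped
print(knn_select(pseudo, labeled))
```

The first pseudo-labeled instance is confident but surrounded by "non" examples, so its label is flipped; this is the kind of noisy instance the KNN step is meant to catch.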
    19. Outline
        • Exploratory dialogue analysis
        • A self-training framework
        • Datasets and experiments
        • Applications
    20. Data source: OU online conference
        • 4 sessions including 2634 posts.
        Data in this study taken from a 2 day OU conference in Elluminate & Cloudworks:
    21. Annotation
        • 2 annotators with one morning of training.
        • Four categories are given.
        • Kappa value (binary) is 0.5978 (moderate).
        • Only posts with consistent labels are collected.

        Session | Total # | Exploratory # | Non-Exploratory #
        --------|---------|---------------|------------------
        OU_22AM | 529 | 380 | 149
        OU_22PM | 661 | 508 | 153
        OU_23AM | 456 | 310 | 146
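For readers unfamiliar with the agreement figure, here is a small Python sketch of how a binary kappa can be computed from two annotators' labels. It assumes the slide's "Kappa value" is Cohen's kappa, which the slides do not state, and the label sequences are made-up toy data, not the study's annotations.

```python
def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    # Chance agreement: product of each annotator's marginal rate per category.
    expected = sum((labels_a.count(c) / n) * (labels_b.count(c) / n)
                   for c in categories)
    if expected == 1:          # both annotators always use the same single label
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy labels (invented for illustration): E = exploratory, N = non-exploratory.
annotator_1 = ["E", "E", "N", "N", "E", "N", "E", "E", "N", "N"]
annotator_2 = ["E", "E", "N", "E", "E", "N", "N", "E", "N", "N"]
print(cohen_kappa(annotator_1, annotator_2))
```

Here the annotators agree on 8 of 10 items and chance agreement is 0.5, so kappa comes out at about 0.6, close to the value reported on the slide.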
    22. Experiment Setup
        Baseline:
        • CP: Cue phrase based method
        • MEGE: Supervised Max Entropy GE (Generalized Expectation) approach (feature based)
        • ME: Supervised Max Entropy approach (instance based)
        • SMEGE: Self-training Max Entropy GE approach (feature based)
        • SME: Self-training Max Entropy approach (instance based)
        Experiment Setup
        • Use one session for training, one session for testing and one session for validation.
        • During the self-training process, examples that include cue phrases are added to the training dataset at the first stage.
        • Pseudo samples are added with the same ratio of exploratory to non-exploratory as in the training dataset.
        • Confidence value: 0.8
        • Feature threshold: 0.65
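One way to read "added with the same ratio" is sketched below in Python: each class gets a quota proportional to its share of the training labels, filled with the most confident pseudo samples of that class. The `budget` parameter and the name `select_with_ratio` are hypothetical; the slides do not describe how the ratio was enforced.

```python
def select_with_ratio(pseudo, train_labels, budget):
    """pseudo: list of (item, label, confidence).
    Pick up to `budget` pseudo samples so that each class keeps roughly
    the same share it has in `train_labels`, most confident first."""
    total = len(train_labels)
    selected = []
    for label in sorted(set(train_labels)):
        quota = round(budget * train_labels.count(label) / total)
        ranked = sorted((p for p in pseudo if p[1] == label),
                        key=lambda p: p[2], reverse=True)
        selected.extend(ranked[:quota])
    return selected


train_labels = ["exp", "exp", "exp", "non"]          # 3:1 ratio in the training data
pseudo = [("a", "exp", 0.95), ("b", "exp", 0.90), ("c", "exp", 0.85),
          ("d", "exp", 0.99), ("e", "non", 0.90), ("f", "non", 0.97)]
picked = select_with_ratio(pseudo, train_labels, budget=4)
print([item for item, _, _ in picked])
```

A confidence cut such as the 0.8 value on the slide could be applied to `pseudo` before this step; the sketch leaves that out to keep the ratio logic visible.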
    23. Evaluation Criterion
        [Figure: evaluation diagram over Exp and NonExp labels.]
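The results slide reports accuracy, precision, recall and F1. A minimal Python sketch of those four measures follows, assuming the standard definitions with "Exp" (exploratory) as the positive class; the slide's own figure is not legible here, so this is the conventional reading rather than a confirmed one. The gold and predicted labels are toy data.

```python
def evaluate(gold, predicted, positive="Exp"):
    """Return (accuracy, precision, recall, f1) for one positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    tn = sum(g != positive and p != positive for g, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1


gold = ["Exp", "Exp", "Exp", "NonExp", "NonExp", "Exp"]
predicted = ["Exp", "NonExp", "Exp", "NonExp", "Exp", "Exp"]
print(evaluate(gold, predicted))
```

In the toy data there are 3 true positives, 1 false positive, 1 false negative and 1 true negative, giving precision, recall and F1 of 0.75 and accuracy of about 0.67.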
    24. Experiment Result

        Approach | Accuracy | Precision | Recall | F1
        ---------|----------|-----------|--------|---
        Cue-Phrase | 0.5389 | 0.9523 | 0.4241 | 0.5865
        MaxEnt | 0.8099 | 0.8526 | 0.8675 | 0.8499
        MaxEntGE | 0.7932 | 0.8817 | 0.8078 | 0.8292
        Self-training MaxEnt | 0.8088 | 0.8331 | 0.9011 | 0.8574
        Self-training MaxEntGE | 0.8181 | 0.8818 | 0.8406 | 0.8554

        • The cue-phrase method gives high precision but low accuracy.
        • The feature-based self-training approach (the last row) improves on supervised MaxEntGE on all criteria.
        • The instance-based self-training algorithm (4th row) performs worse than supervised MaxEnt on accuracy and precision.
    25. Experiment Result

        Session | MaxEnt | MaxEnt-Selftrain | MaxEntGE | MaxEntGE-Selftrain
        --------|--------|------------------|----------|-------------------
        OU_22AM | 0.8190 | 0.8467 | 0.7887 | 0.8270
        OU_22PM | 0.8034 | 0.8311 | 0.7738 | 0.8116
        OU_23AM | 0.8268 | 0.8282 | 0.8114 | 0.8297
        OU_23PM | 0.8042 | 0.7906 | 0.7294 | 0.7989

        • The instance-based self-training algorithm (MaxEnt-Selftrain column) is sensitive to the initial classifier's performance.
        • The feature-based self-training approach (MaxEntGE-Selftrain column) gives more stable results.
    26. Outline
        • Exploratory dialogue analysis
        • A self-training framework
        • Datasets and experiments
        • Applications
    27. Transcript level visualization
    28. Time line Visualization
        [Chart: "Average Exploratory …" plotted over session time, 9:28 to 12:05; vertical axis from -60 to 80.]
        Example posts called out on the chart:
        • 1. anybody else with poor audio? 2. is anyone else having difficulty hearing this? 3. background noise makes it difficult to hear
        • 1. Sheffield, UK not as sunny as yesterday - still warm 2. Greetings from Hong Kong 3. Morning from Wiltshire, sunny here!
        • 1. See you! 2. bye for now! 3. bye, and thank you 4. Bye all for now
    29. Time line Visualization
        [Chart: "Average Exploratory …" plotted over session time, 9:28 to 12:05; vertical axis from -60 to 80.]

        Time | User Id | Content
        -----|---------|--------
        11:46 AM | User_2 | added to which 2M often drops to 10% of that in peak times
        11:47 AM | User_3 | I really disagree - ECDL was the starting point for many many first time users
        11:47 AM | User_1 | online basics wont load in final third first
        11:47 AM | User_1 | mobile wont work round her
        11:47 AM | User_1 | and satlellite costs 40 a month for 1 gig data transfer
        11:47 AM | User_4 | I think the issue about the skills needed to really embrace technologies is a huge one and with web 2.0 technologies things are becoming more complicated, as I say often you dont just get this stuff by attending a workshop, you have to participate and appropriate them to your interests and context and network of others.
        11:47 AM | User_5 | We use myguide on mobile broadband for outreach. Works OK, but not great and thats in city centre boardering 3G/GPS.
    30. User Visualization
        [Chart: "Contribution Distribution of Users"; x-axis: Total Message Count (0 to 60); y-axis: Exploratory Message Count (0 to 50).]

        Time | User Id | Content
        -----|---------|--------
        11:42 AM | User_1 | because although some people can get online the feed is so poor that many pages wont load. eg myguide
        11:42 AM | User_1 | how much time and money was spent getting everyone to use a mobile phone?
        11:43 AM | User_1 | nothing. because it was perceived to be useful, therefore there is no need to spend time and money on digitalinclusion, until the access to the internet works
        11:44 AM | User_1 | in order to get a 2meg connection to everyone we need fibre to the final third
    31. User Visualization
        [Chart: "Contribution Distribution of Users"; x-axis: Total Message Count (0 to 60); y-axis: Exploratory Message Count (0 to 50).]

        Time | User Id | Content
        -----|---------|--------
        9:51 AM | User_6 | Hello Im a tutor at Saudi arabia branch
        9:51 AM | Moderator | hello Saudi Arabia!
        9:51 AM | User_6 | hi
        9:52 AM | Moderator | Welcome Ashawa - did we meet in Kuwait a couple of years ago?
        9:52 AM | User_6 | no actually
        9:52 AM | Moderator | @ashawa - maybe next time
        9:52 AM | User_6 | yes I wish
    32. [Three indicator icons, each with a caption:]
        • This step appears to have very good content that will provoke deeper learning
        • This step appears to have some content that will provoke deeper learning
        • This step appears to have little content that will provoke deeper learning
    33. Conclusion
        • We have extended our previously proposed self-training framework for exploratory discourse detection in synchronous text chat (Elluminate conference sessions).
        • Proposed a K Nearest Neighbors based instance selection method.
        • Applied the proposed approach to the SocialLearn platform.
    34. Future Work
        Text analytics:
        • Integrate the KNN instance selection method into the self-training framework.
        • Explore other features for exploratory dialogue classification: inter-dialogue features, global features.
        • Build a more reliable dataset for sub-category classification: challenge, evaluation, reasoning, extension.
    35. Future Work
        Visual analytics:
        • Investigate how these can be rendered most usefully for educators and learners.
        • Investigate user feedback when deployed.
        • Different users will appreciate different levels of detail:
          • Purdue Signals experience suggests that complex underlying analytics should be usefully distilled into very simple feedback.
          • But as analytics literacy grows, will users value more powerful insights?
    36. Acknowledgments
        • Thanks for the guidance and consideration of Dr. He Yulan, Dr. Simon and Dr. Rebecca.
        • Thanks for the consideration from all the other colleagues in Knowledge Media Institute.
    37. Contacts
        • Zhongyu Wei, The Chinese University of Hong Kong, Hong Kong. http://www.se.cuhk.edu.hk/~zywei/
        • Yulan He, The Open University, UK. http://people.kmi.open.ac.uk/yulan/
        • Simon Buckingham Shum, The Open University, UK. http://oro.open.ac.uk/view/person/sjb72.html
        • Rebecca Ferguson, The Open University, UK. http://oro.open.ac.uk/view/person/rf2656.html