Towards Automated Classification of Discussion Transcripts: A Cognitive Prese... – Vitomir Kovanovic
LAK'16 Conference paper presentation:
Abstract:
In this paper, we present the results of an exploratory study that examined the problem of automating content analysis of student online discussion transcripts. We looked at the problem of coding discussion transcripts for the levels of cognitive presence, one of the three main constructs in the Community of Inquiry (CoI) model of distance education. Using Coh-Metrix and LIWC features, together with a set of custom features developed to capture discussion context, we developed a random forest classification system that achieved 70.3% classification accuracy and 0.63 Cohen’s kappa, which is significantly higher than the values reported in previous studies. Besides the improvement in classification accuracy, the developed system is also less sensitive to overfitting, as it uses only 205 classification features, around 100 times fewer than in similar systems based on bag-of-words features. We also provide an overview of the classification features most indicative of the different phases of cognitive presence, which gives additional insight into the nature of the cognitive presence learning cycle. Overall, our results show the great potential of the proposed approach, with the added benefit of providing further characterization of the cognitive presence coding scheme.
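The evaluation pipeline the abstract describes (feature vectors in, a random forest classifier, accuracy and Cohen's kappa out) could be sketched roughly as follows. This is a hypothetical illustration on random stand-in data, not the paper's actual Coh-Metrix/LIWC features; only the feature count (205) and the two metrics come from the abstract, and the five-way label scheme is an assumption.

```python
# Hypothetical sketch: random forest over 205-dimensional feature vectors,
# evaluated with accuracy and Cohen's kappa. The data is random stand-in data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 205))      # 205 features per message, as in the paper
y = rng.integers(0, 5, size=400)     # assumed 5-way coding of cognitive presence
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

clf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)
print(accuracy_score(y_te, pred), cohen_kappa_score(y_te, pred))
```

On random labels both numbers hover near chance; on real coded transcripts the paper reports 70.3% accuracy and κ = 0.63.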
What is the source of social capital? The association between social network ... – Vitomir Kovanovic
Presentation at the Graph-based Educational Data Mining workshop (G-EDM) during the 2014 Educational Data Mining conference (EDM 2014) at the Institute of Education, University of London, London, UK, on July 4th, 2014.
Topic Modeling for Learning Analytics Researchers: LAK15 Tutorial – Vitomir Kovanovic
Slides from the introductory tutorial on topic modeling with R, covering the LSA, pLSA, and LDA algorithms, organized at the LAK15 conference in Poughkeepsie, NY, on March 17, 2015.
Admixture of Poisson MRFs: A New Topic Model with Word Dependencies – David Inouye
Given a large collection of uncategorized text documents such as blogs, news articles, research papers or historical documents, how can we automatically discover major subject areas or topics in the collection? In addition, how should the abstract notion of "topic" be mathematically represented and presented to an end-user? For example, a document describing UTCS intuitively might be a combination of the topic "computer science" and the topic "University of Texas". Most topic models--and in particular the most common model, Latent Dirichlet Allocation (LDA)--attempt to answer these questions by proposing that each topic can be represented as a simple frequency distribution over possible words (i.e. a Multinomial distribution). With this representation of a topic, the ubiquitous presentation of the topic to an end-user is a list of its top 10 or 20 words. While LDA has been useful in many applications, we suggest that a simple frequency distribution is an oversimplified notion of topic that hinders both interpretation and further analysis of these topics. Thus, we propose the new topic model Admixture of Poisson MRFs (APM). Unlike in previous models, the topic representations in APM allow dependencies between words. For example, if a computer science paper contains the word "programming", it is more likely to contain the word "languages" than a random computer science paper. This talk describes the APM model, the optimization algorithm for fitting APM, some preliminary results, and some future directions.
Introduction to Latent Dirichlet Allocation (LDA). We cover the basic ideas necessary to understand LDA and then construct the model from its generative process. Intuitions are emphasized, but little guidance is given for fitting the model, as the fitting procedure itself is not very insightful.
Semantic Web technologies have been increasingly used as a tool for generating, organizing and personalizing e-learning content. In this presentation we will discuss and demonstrate an innovative approach to the automated generation of computer-assisted assessment (CAA) from Semantic Web–based domain ontologies. The primary application domain of this work is automated assessment, in particular the development of intelligent CAA systems and question banks, but the ideas can be generalized further in the context of ontology engineering and evaluation. A prototype is implemented and available online at http://www.opensemcq.org
Preservice Teachers' Dispositions Toward Technology Integration – Joan E. Hughes, Ph.D.
Research study that examined:
What are teacher graduates' knowledge, attitudes, and behaviors related to technology integration as they set off to become novice teachers?
Presentation at the American Educational Research Association Annual Meeting, April 2010.
Keynote at the Chilean Week of Computer Science. I present a brief overview of recommender system algorithms and then present my work on tag-based recommendation, implicit feedback, and visual interactive interfaces.
Commonsense knowledge for Machine Intelligence - part 1 – Niket Tandon
These are the slides of the tutorial on commonsense knowledge for machine intelligence, presented by Dr. Niket Tandon, Dr. Aparna Varde, and Dr. Gerard de Melo at the CIKM conference 2017.
*Part 1/3: Acquiring commonsense knowledge*
Website: http://allenai.org/tutorials/csk/
E-Learning Adoption in a Higher Education Setting: An Empirical Study – Dagmar Monett
Slides of the talk at the Multidisciplinary Academic Conference on Education, Teaching and Learning 2015, MAC-ETL 2015, Prague, Czech Republic, 4-6 December 2015.
A summary of my personal expertise and knowledge, complemented by a description of some of the most relevant research and development engagements carried out so far, illustrated with specific examples.
Evolving Lesson Plans to Assist Educators: From Paper-Based to Adaptive Lesso... – Dagmar Monett
Slides of the talk at the Multidisciplinary Academic Conference on Education, Teaching and Learning 2015, MAC-ETL 2015, Prague, Czech Republic, 4-6 December 2015.
There is no doubt that using computers in language testing, as well as in language learning, has both advantages and disadvantages. Despite the widespread use of computer-based testing (CBT), relatively few studies have been conducted on the equivalency of the two test modes, especially in academic contexts. Nevertheless, some institutes and educational settings are moving towards computerized tests because of their advantages without conducting any comparability investigation beforehand, perhaps because they believe that if the items are identical, the testing mode is irrelevant. As the use of computerized tests rapidly expands, we need appropriate use of technology as a facet of language learning and testing. Given this accelerating development, further investigations are needed to ensure the validity and fairness of this administration mode in comparison with the traditional paper-and-pencil test (PPT). This study provides a brief discussion of the importance of substituting CBT for PPT and the necessity of conducting a comparability study before this transition. It presents the significance of the study, followed by an illustration of the background of comparability studies of CBT and PPT.
Kovanović et al. 2017 – Developing a MOOC Experimentation Platform: Insight... – Vitomir Kovanovic
LAK'17 Conference paper presentation:
Abstract:
In 2011, the phenomenon of MOOCs swept the world of education and put online education at the center of public discourse around the world. Although researchers were excited by the vast amounts of MOOC data being collected, the benefits of this data did not live up to expectations due to several challenges. Analyses of MOOC data are very time-consuming and labor-intensive, and require a highly advanced set of technical skills that education researchers often do not have. Because of this, MOOC data analyses are rarely completed before the courses end, limiting the potential of the data to impact student learning outcomes and experience.
In this paper we introduce MOOCito (MOOC intervention tool), a user-friendly software platform for the analysis of MOOC data that focuses on conducting data-informed instructional interventions and course experimentation. We cover the important design principles behind MOOCito and provide an overview of the trends in MOOC research leading to its development. Although a work in progress, we outline the prototype of MOOCito and present the results of a user evaluation study that focused on the system's perceived usability and ease of use. The results of the study are discussed, as well as their practical implications.
Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov... – Lifeng (Aaron) Han
Starting in the 1950s, Machine Translation (MT) has been approached through a succession of scientific solutions, from rule-based methods, example-based models, and statistical models (SMT), to hybrid models and, in very recent years, neural models (NMT).
While NMT has achieved a huge quality improvement over conventional methodologies, taking advantage of the huge amounts of parallel corpora available from the internet and of recently developed computational power at an acceptable cost, it still struggles to achieve real human parity in many domains and most language pairs, if not all of them.
Along the long road of MT research and development, quality evaluation metrics have played a very important role in MT advancement and evolution.
In this tutorial, we give an overview of traditional human judgement criteria, automatic evaluation metrics, unsupervised quality estimation models, and the meta-evaluation of these evaluation methods. We also cover very recent work in the MT evaluation (MTE) field that takes advantage of large pre-trained language models to customise automatic metrics towards the exact deployed language pairs and domains. In addition, we introduce statistical confidence estimation of the sample size needed for human evaluation in real practice simulation.
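To make the idea of automatic evaluation metrics concrete, here is a small, self-contained sketch of clipped n-gram precision, the quantity at the heart of BLEU-style metrics the tutorial covers. This is a deliberate simplification (single reference, no brevity penalty, no geometric mean over n-gram orders), and the example sentences are invented, not tutorial data.

```python
# Clipped n-gram precision: the fraction of hypothesis n-grams that also
# appear in the reference, with counts clipped to the reference counts.
from collections import Counter

def ngram_precision(hyp, ref, n=2):
    """Fraction of hypothesis n-grams found in the reference (clipped counts)."""
    hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    overlap = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
    total = sum(hyp_ngrams.values())
    return overlap / total if total else 0.0

ref = "the cat sat on the mat".split()
hyp = "the cat is on the mat".split()
print(ngram_precision(hyp, ref, n=1))  # unigram precision
print(ngram_precision(hyp, ref, n=2))  # bigram precision
```

Full BLEU combines such precisions for n = 1..4 with a brevity penalty; the tutorial's later metrics replace n-gram matching with learned representations.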
Critiquing CS Assessment from a CS for All lens: Dagstuhl Seminar Poster – Mark Guzdial
Poster presented at the Dagstuhl Seminar "Assessing Learning in Introductory Computer Science" (http://www.dagstuhl.de/en/program/calendar/semhp/?semnr=16072). I argue that we have to consider what the learner wants to do and wants to be (i.e., their desired Community of Practice) when assessing learning. Different CoP, different outcomes, different assessments.
Keynote presentation at the International Semantic Web Conference in Athens, Greece, on November 9, 2023. The talk addresses the generative AI explosion and its potential impacts on the Semantic Web and Knowledge Graph communities, which may, in fact, spark a research renaissance.
Abstract:
We are living in an age of rapidly advancing technology. History may view this period as one in which generative artificial intelligence reshaped the landscape and narrative of many technology-based fields of research and application. Times of disruption often present both opportunities and challenges. We will discuss some areas that may be ripe for consideration in the field of Semantic Web research and semantically-enabled applications. Semantic Web research has historically focused on representation and reasoning and on enabling interoperability of data and vocabularies. At the core are ontologies, along with ontology-enabled (or ontology-compatible) knowledge stores such as knowledge graphs. Ontologies are often manually constructed using a process that (1) identifies existing best-practice ontologies (and vocabularies) and (2) generates a plan for how to leverage these ontologies by aligning and augmenting them as needed to address requirements. While semi-automated techniques may help, a significant portion of the work is often best done by humans with domain and ontology expertise. This is an opportune time to rethink how the field generates, evolves, maintains, and evaluates ontologies. We consider how hybrid approaches, i.e., those that leverage generative AI components along with more traditional knowledge representation and reasoning approaches, can create improved processes. The effort to build a robust ontology that meets a use case can be large. Ontologies are not static, however; they need to evolve along with knowledge evolution and expanded usage. There is potential for hybrid approaches to help identify gaps in ontologies and/or refine content. Further, ontologies need to be documented with term definitions and their provenance. Opportunities exist to consider semi-automated techniques for some types of documentation, provenance, and decision-rationale capture for annotating ontologies.
The area of human-AI collaboration for population and verification presents a wide range of opportunities for research collaboration and impact. Ontologies need to be populated with class and relationship content. Knowledge graphs and other knowledge stores need to be populated with instance data in order to be used for question answering and reasoning. Populating large knowledge graphs can be time-consuming. Generative AI holds the promise of creating candidate knowledge graphs that are compatible with the ontology schema. The knowledge graph should contain provenance information identifying how the content was populated and its source, and its correctness and currency should be checked. A human-AI assistant approach is presented.
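The human-AI assistant approach described above might be sketched, under heavy simplification, as a review loop over generated candidate triples in which each triple carries provenance and must be approved before it enters the graph. All names, fields, and the gazetteer check below are hypothetical illustrations, not details from the talk.

```python
# Hypothetical human-in-the-loop KG population: a generator proposes candidate
# triples with provenance; a reviewer accepts or rejects them before they
# enter the graph. The "trusted gazetteer" stands in for human review.
from dataclasses import dataclass

@dataclass
class Triple:
    subject: str
    predicate: str
    obj: str
    source: str          # provenance: where this candidate came from
    accepted: bool = False

def review(candidates, approve):
    """Keep only the candidates the reviewer approves."""
    graph = []
    for t in candidates:
        t.accepted = approve(t)
        if t.accepted:
            graph.append(t)
    return graph

candidates = [
    Triple("Austin", "locatedIn", "Texas", source="llm-candidate"),
    Triple("Austin", "locatedIn", "France", source="llm-candidate"),
]
trusted = {("Austin", "locatedIn", "Texas")}
graph = review(candidates, lambda t: (t.subject, t.predicate, t.obj) in trusted)
print(len(graph))  # only the verified triple survives, provenance intact
```

A production system would of course use an RDF store and schema validation; the point here is only the shape of the loop: generate, attach provenance, verify, then commit.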
ICSME 2016 keynote: An ecosystemic and socio-technical view on software maint... – Tom Mens
These are the slides of my ICSME 2016 keynote, presented on 5 October 2016 in Raleigh, North Carolina. I focus on the difficulties of maintaining and evolving software ecosystems, large collections of interacting software components that are maintained by a large and active community of contributors and that evolve together in the same environment. Software ecosystems are becoming ubiquitous due to the omnipresence of open source software. I present several problems that arise during maintenance and evolution of software ecosystems, and I argue how some of these challenges should be addressed by adopting a socio-technical view and by relying on a multidisciplinary and mixed methods research approach. I illustrate this with examples of social network analysis, complex systems research, ecological biodiversity, and survival analysis.
Rubric-based Assessment of Programming Thinking Skills and Comparative Evalua... – Hironori Washizaki
Hironori Washizaki, "Rubric-based Assessment of Programming Thinking Skills and Comparative Evaluation of Introductory Programming Environments," 4th International Annual Meeting on STEM Education (IAMSTEM 2021), Keynote, August 12-14, 2021, Keelung, Taiwan and Online
Cheminformatics Online Chemistry Course (OLCC): A Community Effort to Introdu... – Sunghwan Kim
Presented at the American Chemical Society (ACS) Spring 2021 National Meeting (Virtual, April 16, 2021).
==== Abstract ====
Computer and informatics skills to handle an ever-increasing amount of chemical information are considered important for students pursuing STEM careers in the age of big data. However, many schools do not offer a cheminformatics course or alternative training opportunities. The Cheminformatics Online Chemistry Course (OLCC) is a community effort to introduce cheminformatics content into the undergraduate chemistry curriculum. It is a highly collaborative teaching project involving instructors at multiple schools as well as external cheminformatics experts recruited across sectors, including academia, government, and industry. Three Cheminformatics OLCCs were offered in the Fall 2015, Spring 2017, and Fall 2019 semesters. In each OLCC, the instructors at participating schools would meet face-to-face with the students, while external cheminformatics experts engaged through online discussions across campuses with both the instructors and students. All the material created in the course has been made available at the open education repositories of LibreTexts and CCCE websites for other institutions to adapt to their future needs. This presentation describes the instructional approaches of the Cheminformatics OLCC project and the lessons learned from this community effort. We also discuss future directions for this project as well as cheminformatics education in general, including pedagogy, resources, and course content.
Most work on scholarly document processing assumes that the information processed is trustworthy and factually correct. However, this is not always the case. There are two core challenges that should be addressed: 1) ensuring that scientific publications are credible -- e.g. that claims are not made without supporting evidence, and that all relevant supporting evidence is provided; and 2) ensuring that scientific findings are not misrepresented, distorted, or outright misreported when communicated by journalists or the general public. I will present some first steps towards addressing these problems and outline the remaining challenges.
Big Qualitative Data, Big Team, Little Time - A Path to Publication – QSR International
This webinar will describe the project and research question, as well as the design, data management, and analysis process of using NVivo to analyze data with a large coding team with no NVivo experience.
Deep Learning for Information Retrieval: Models, Progress, & Opportunities – Matthew Lease
Talk given at the 8th Forum for Information Retrieval Evaluation (FIRE, http://fire.irsi.res.in/fire/2016/), December 10, 2016, and at the Qatar Computing Research Institute (QCRI), December 15, 2016.
Attaining the Unattainable? Reassessing Claims of Human Parity in Neural Mach... – Antonio Toral
We reassess a recent study (Hassan et al., 2018) that claimed that machine translation (MT) has reached human parity for the translation of news from Chinese into English, using pairwise ranking and considering three variables that were not taken into account in that previous study: the language in which the source side of the test set was originally written, the translation proficiency of the evaluators, and the provision of inter-sentential context. If we consider only original source text (i.e. not translated from another language, or translationese), then we find evidence showing that human parity has not been achieved. We compare the judgments of professional translators against those of non-experts and discover that those of the experts result in higher inter-annotator agreement and better discrimination between human and machine translations. In addition, we analyse the human translations of the test set and identify important translation issues. Finally, based on these findings, we provide a set of recommendations for future human evaluations of MT.
Introduction to Learning Analytics for High School Teachers and Managers – Vitomir Kovanovic
Presentation at the first Learning Analytics Learning Network (LALN) Event in Adelaide, Australia on Oct 22, 2019.
Abstract:
With the increased adoption of technology, institutions have unprecedented opportunities to continuously improve the quality of their services through data collection and analysis. Schools and universities now have data about learners and their contexts that can provide valuable insight into how they learn. Early attempts were directed towards mining educational data to identify students-at-risk and develop interventions. Recently, more sophisticated approaches are being deployed by researchers and practitioners. These include analysis of learner behaviour that leads to various learning outcomes, social networks and teams, employability, creativity, and critical thinking. Analysing digital traces generated through learning processes requires a broad suite of methods from data science, statistics, psychometrics, social and learning sciences.
This workshop aims to introduce teachers and educators to the fast-growing and promising field of learning analytics and to explore how digital data can be used for the analysis and improvement of student learning. First, we will provide an overview of learning analytics, its key methods and approaches, and the problems for which it can be used. Second, attendees will engage in group learning activities to explore ways in which learning analytics could be used within their institutions. The focus will be on identifying learning-related challenges that are relevant to their particular context and exploring how learning analytics can be used practically and effectively.
Extending video interactions to support self-regulated learning in an online ... – Vitomir Kovanovic
Slides from our presentation at ASCILITE'18 conference in Geelong, Victoria. Full paper is available in ASCILITE conference proceedings at http://ascilite.org/wp-content/uploads/2018/12/ASCILITE-2018-Proceedings.pdf
Analysing social presence in online discussions through network and text anal... – Vitomir Kovanovic
The slides from our presentation at IEEE ICALT'19 conference.
Abstract:
This paper presents an approach to studying relationships between students' social presence and course topics from transcripts of asynchronous discussions in online learning environments. Specifically, the paper uses topic modelling and epistemic network analysis to investigate how students' social presence is expressed across different course topics. Finally, we show how this method can be adopted to examine how students' social presence changed due to an instructional intervention. The results of this study and its implications are further discussed.
Automated Analysis of Cognitive Presence in Online Discussions Written in Por... – Vitomir Kovanovic
Slides from our EC-TEL'18 Paper presentation. Full paper is available at https://dx.doi.org/10.1007/978-3-319-98572-5_19
Abstract:
This paper presents a method for automated content analysis of students’ messages in asynchronous discussions written in Portuguese. In particular, the paper looks at the problem of coding discussion transcripts for the levels of cognitive presence, a key construct in the widely used Community of Inquiry model of online learning. Although there are techniques for coding for cognitive presence in English, the literature is still poor in methods for other languages, such as Portuguese. The proposed method uses a set of 87 different features to create a random forest classifier that automatically extracts the cognitive phases. The model developed reached a Cohen’s κ of .72, which represents “substantial” agreement and is above the Cohen’s κ threshold of .70 commonly used in the literature for determining a reliable quantitative content analysis. This paper also provides some theoretical insights into the nature of cognitive presence by looking at the classification features that were most relevant for distinguishing between the different phases of cognitive presence.
Validating a theorized model of engagement in learning analytics – Vitomir Kovanovic
Slides from our paper presentation at LAK'19 conference in Tempe, AZ. The full paper is available at https://dx.doi.org/10.1145/3303772.3303775
Abstract:
Student engagement is often considered an overarching construct in educational research and practice. Though frequently employed in the learning analytics literature, engagement has been subjected to a variety of interpretations, and there is little consensus regarding the very definition of the construct. This raises grave concerns with regard to construct validity: namely, do these varied metrics measure the same thing? To address such concerns, this paper proposes, quantifies, and validates a model of engagement which is both grounded in the theoretical literature and described by common metrics drawn from the field of learning analytics. To identify a latent variable structure in our data, we used exploratory factor analysis and validated the derived model on a separate sub-sample of our data using confirmatory factor analysis. To analyze the associations between our latent variables and student outcomes, a structural equation model was fitted, and the validity of this model across different course settings was assessed using MIMIC modeling. Across different domains, the broad consistency of our model with the theoretical literature suggests a mechanism that may be used to inform both interventions and course design.
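The exploratory step described above (recovering a latent variable structure behind observed engagement metrics) can be sketched with scikit-learn's FactorAnalysis. The data below is synthetic, with two planted factors behind four indicators; the study's actual indicators, sample, and the confirmatory/SEM/MIMIC stages are not reproduced here.

```python
# Exploratory factor analysis sketch: four noisy indicators generated from
# two latent factors, then a 2-factor model fit to inspect the loadings.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n = 500
f1, f2 = rng.normal(size=(2, n))          # two latent "engagement" factors
X = np.column_stack([
    f1 + 0.3 * rng.normal(size=n),        # indicators loading on factor 1
    f1 + 0.3 * rng.normal(size=n),
    f2 + 0.3 * rng.normal(size=n),        # indicators loading on factor 2
    f2 + 0.3 * rng.normal(size=n),
])

fa = FactorAnalysis(n_components=2, random_state=0).fit(X)
print(np.round(fa.components_, 2))        # loadings matrix: factors x indicators
```

In a real analysis the loadings would be rotated and interpreted, and the derived structure then tested on a held-out sub-sample with confirmatory methods, as the abstract describes.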
Examining the Value of Learning Analytics for Supporting Work-integrated Lear... – Vitomir Kovanovic
Slides from our presentation at the Seventh National Conference
on Work-Integrated Learning (ACEN’18).
The full paper is available at https://www.researchgate.net/publication/328578409_Examining_the_value_of_learning_analytics_for_supporting_work-integrated_learning
Unsupervised Learning for Learning Analytics ResearchersVitomir Kovanovic
Slides from my 2.5-day workshop organised at 2018 Learning Analytics Summer Institute (LASI'18) organised at Teachers College, Columbia University on July 11, 2018.
Introduction to R for Learning Analytics ResearchersVitomir Kovanovic
The slides from my 2hr tutorial organised at 2018 Learning Analytics Summer Institute (LASI) at Teachers College, Columbia University on June 11, 2018.
2024.06.01 Introducing a competency framework for languag learning materials ...Sandy Millin
http://sandymillin.wordpress.com/iateflwebinar2024
Published classroom materials form the basis of syllabuses, drive teacher professional development, and have a potentially huge influence on learners, teachers and education systems. All teachers also create their own materials, whether a few sentences on a blackboard, a highly-structured fully-realised online course, or anything in between. Despite this, the knowledge and skills needed to create effective language learning materials are rarely part of teacher training, and are mostly learnt by trial and error.
Knowledge and skills frameworks, generally called competency frameworks, for ELT teachers, trainers and managers have existed for a few years now. However, until I created one for my MA dissertation, there wasn’t one drawing together what we need to know and do to be able to effectively produce language learning materials.
This webinar will introduce you to my framework, highlighting the key competencies I identified from my research. It will also show how anybody involved in language teaching (any language, not just English!), teacher training, managing schools or developing language learning materials can benefit from using the framework.
Read| The latest issue of The Challenger is here! We are thrilled to announce that our school paper has qualified for the NATIONAL SCHOOLS PRESS CONFERENCE (NSPC) 2024. Thank you for your unwavering support and trust. Dive into the stories that made us stand out!
Acetabularia Information For Class 9 .docxvaibhavrinwa19
Acetabularia acetabulum is a single-celled green alga that in its vegetative state is morphologically differentiated into a basal rhizoid and an axially elongated stalk, which bears whorls of branching hairs. The single diploid nucleus resides in the rhizoid.
Francesca Gottschalk - How can education support child empowerment.pptxEduSkills OECD
Francesca Gottschalk from the OECD’s Centre for Educational Research and Innovation presents at the Ask an Expert Webinar: How can education support child empowerment?
A workshop hosted by the South African Journal of Science aimed at postgraduate students and early career researchers with little or no experience in writing and publishing journal articles.
Normal Labour/ Stages of Labour/ Mechanism of LabourWasim Ak
Normal labor is also termed spontaneous labor, defined as the natural physiological process through which the fetus, placenta, and membranes are expelled from the uterus through the birth canal at term (37 to 42 weeks
Executive Directors Chat Leveraging AI for Diversity, Equity, and InclusionTechSoup
Let’s explore the intersection of technology and equity in the final session of our DEI series. Discover how AI tools, like ChatGPT, can be used to support and enhance your nonprofit's DEI initiatives. Participants will gain insights into practical AI applications and get tips for leveraging technology to advance their DEI goals.
This slide is special for master students (MIBS & MIFB) in UUM. Also useful for readers who are interested in the topic of contemporary Islamic banking.
Synthetic Fiber Construction in lab .pptxPavel ( NSTU)
Synthetic fiber production is a fascinating and complex field that blends chemistry, engineering, and environmental science. By understanding these aspects, students can gain a comprehensive view of synthetic fiber production, its impact on society and the environment, and the potential for future innovations. Synthetic fibers play a crucial role in modern society, impacting various aspects of daily life, industry, and the environment. ynthetic fibers are integral to modern life, offering a range of benefits from cost-effectiveness and versatility to innovative applications and performance characteristics. While they pose environmental challenges, ongoing research and development aim to create more sustainable and eco-friendly alternatives. Understanding the importance of synthetic fibers helps in appreciating their role in the economy, industry, and daily life, while also emphasizing the need for sustainable practices and innovation.
Macroeconomics- Movie Location
This will be used as part of your Personal Professional Portfolio once graded.
Objective:
Prepare a presentation or a paper using research, basic comparative analysis, data organization and application of economic information. You will make an informed assessment of an economic climate outside of the United States to accomplish an entertainment industry objective.
Safalta Digital marketing institute in Noida, provide complete applications that encompass a huge range of virtual advertising and marketing additives, which includes search engine optimization, virtual communication advertising, pay-per-click on marketing, content material advertising, internet analytics, and greater. These university courses are designed for students who possess a comprehensive understanding of virtual marketing strategies and attributes.Safalta Digital Marketing Institute in Noida is a first choice for young individuals or students who are looking to start their careers in the field of digital advertising. The institute gives specialized courses designed and certification.
for beginners, providing thorough training in areas such as SEO, digital communication marketing, and PPC training in Noida. After finishing the program, students receive the certifications recognised by top different universitie, setting a strong foundation for a successful career in digital marketing.
Automated Content Analysis of Discussion Transcripts
1. Automated Content Analysis of Discussion Transcripts
Vitomir Kovanović (v.kovanovic@ed.ac.uk)
Dragan Gašević (dgasevic@acm.org)
School of Informatics,
University of Edinburgh
Edinburgh, United Kingdom
31 Aug 2015,
University of Edinburgh,
United Kingdom
2. Asynchronous online discussions -
“gold mine of information” (Henri, 1992)
• They are frequently used for all types of
education delivery,
• Their use has produced large amounts of data
about learning processes,
• Their use is well supported by the
social-constructivist pedagogies.
V. Kovanović et al. (EDI) Automated Analysis of Discussion Transcripts 31 Aug 2015 1 / 18
3. Asynchronous online discussions - issues and challenges
• The data produced is used mainly for research after courses are over,
• Content analysis techniques are complex and time-consuming,
• Content analysis has had almost no impact on educational practice (Donnelly and Gardner, 2011),
• There is a need for more proactive use of the data through automation:
• Few attempts at automated content analysis,
• Focus mostly on surface-level characteristics, and
• Not based on well-established theories of education.
4. Overall idea
Overall idea
To examine how we can use text mining for automation of content
analysis of discussion transcripts.
More specifically,
We looked at the automation of content analysis of cognitive
presence, one of the three main components of the Community of
Inquiry framework.
5. Community of Inquiry (CoI) model
Community of Inquiry model (Garrison, Anderson, and Archer, 1999)
Conceptual framework outlining the key constructs that define a worthwhile
educational experience in distance education settings.
Three presences:
• Social presence: relationships and social
climate in a community.
• Cognitive presence: phases of cognitive
engagement and knowledge construction.
• Teaching presence: instructional role
during social learning.
CoI model is:
• Extensively researched and validated.
• Adopts Content Analysis for assessment of
presences.
7. Cognitive presence
Cognitive Presence
“the extent to which the participants in any particular configuration of a
community of inquiry are able to construct meaning through sustained
communication.” (Garrison, Anderson, and Archer, 1999, p. 89)
Four phases of cognitive presence:
1 Triggering event: Some issue, dilemma or problem is identified.
2 Exploration: Students move between private world of reflection and shared
world of social knowledge construction.
3 Integration: Students filter irrelevant information and synthesize new
knowledge.
4 Resolution: Students analyze practical applicability, test different
hypotheses, and start a new learning cycle.
8. Cognitive presence coding scheme
• The whole message is used as the unit of analysis,
• Coders look for particular indicators of different sociocognitive processes,
• Coding requires expertise with the instrument and domain knowledge.
9. Community of Inquiry (CoI) model
Issues and challenges:
• Very labor-intensive,
• Crude coding scheme,
• Requires experienced coders,
• Cannot be used for real-time monitoring,
• Does not explain the reasons behind the observed levels of the presences, and
• Does not provide suggestions or guidelines to direct instructors’
pedagogical decisions.
10. Data set
• Six offerings of a graduate-level course in software engineering,
• A total of 1747 messages from 81 students,
• Manually coded by two coders (agreement = 98.1%, Cohen’s κ = 0.974).
ID   Phase              Messages       %
0    Other                   140    8.01%
1    Triggering Event        308   17.63%
2    Exploration             684   39.17%
3    Integration             508   29.08%
4    Resolution              107    6.12%
     All phases             1747     100%

Number of Messages in Different Phases of Cognitive Presence
11. Feature extraction
• Unigrams, Bigrams and Trigrams,
• Part-of-Speech Bigrams and Trigrams,
• Backoff Bigrams and Trigrams:
Example: “John is working.”
Bigrams:
• john is,
• is working.
Backoff Bigrams:
• john verb,
• noun is,
• is verb,
• verb working.
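The backoff construction above can be sketched in a few lines of Python. This is an illustration, not the paper’s Java implementation; the POS tags are placeholders for whatever tagger is used:

```python
def backoff_bigrams(tagged_tokens):
    """Plain and backoff bigrams from a list of (word, pos_tag) pairs.

    For each adjacent word pair, the backoff variants replace one of
    the two words with its part-of-speech tag."""
    plain, backoff = [], []
    for (w1, t1), (w2, t2) in zip(tagged_tokens, tagged_tokens[1:]):
        plain.append((w1, w2))
        backoff.append((w1, t2))  # keep first word, back off the second
        backoff.append((t1, w2))  # back off the first word, keep the second
    return plain, backoff

# "John is working." with illustrative tags
tagged = [("john", "NOUN"), ("is", "VERB"), ("working", "VERB")]
plain, backoff = backoff_bigrams(tagged)
# plain   -> [("john", "is"), ("is", "working")]
# backoff -> [("john", "VERB"), ("NOUN", "is"), ("is", "VERB"), ("VERB", "working")]
```

The same idea extends to trigrams by backing off any subset of the three positions.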
12. Feature extraction
• Dependency triplets: rel, head, modifier
Example: “Bills on ports and immigration were submitted by Senator
Brownback, Republican of Kansas.”
(nsubjpass, submitted, Bills)
(auxpass, submitted, were)
(agent, submitted, Brownback)
(nn, Brownback, Senator)
(appos, Brownback, Republican)
(prep_of, Republican, Kansas)
(prep_on, Bills, ports)
(conj_and, ports, immigration)
(prep_on, Bills, immigration)
13. Feature extraction
• Backoff dependency triplets:
Example: “Bills on ports and immigration were submitted by Senator
Brownback, Republican of Kansas.”
Dependency triplet:
• (conj_and, ports, immigration)
Backoff dependency triplets:
• (conj_and, noun, immigration)
• (conj_and, ports, noun)
• (conj_and, noun, noun)
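In code, generating the backoff variants of a triplet is one line per position. A minimal Python sketch (the lowercase POS placeholder names are illustrative):

```python
def backoff_dependency_triplets(rel, head, modifier, head_tag, mod_tag):
    """The three backoff variants of a (rel, head, modifier) triplet:
    back off the head, the modifier, or both to their POS tags."""
    return [
        (rel, head_tag, modifier),
        (rel, head, mod_tag),
        (rel, head_tag, mod_tag),
    ]

# Slide example: (conj_and, ports, immigration); both words are nouns
variants = backoff_dependency_triplets("conj_and", "ports", "immigration", "noun", "noun")
# -> [("conj_and", "noun", "immigration"),
#     ("conj_and", "ports", "noun"),
#     ("conj_and", "noun", "noun")]
```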
14. Additional features
• Number of named entities in the message
Brainstorming should involve more concepts than posing a question,
• Is message first in the discussion?
Posing a question is more likely to initiate a discussion,
• Is message a reply to the first message in the discussion?
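These contextual features are cheap to compute once the thread structure is known. A sketch with hypothetical message fields (`id`, `parent_id`, and a precomputed `entity_count` from a named-entity recognizer); none of these names come from the original system:

```python
def context_features(message, thread):
    """The three contextual features from the slide, for one message.

    `thread` is a list of message dicts in posting order; the thread
    starter has parent_id None.  Field names are illustrative."""
    first_id = thread[0]["id"]
    return {
        "entity_count": message["entity_count"],        # named entities in message
        "is_first": message["id"] == first_id,          # starts the discussion?
        "is_reply_to_first": message["parent_id"] == first_id,
    }

thread = [
    {"id": 1, "parent_id": None, "entity_count": 2},
    {"id": 2, "parent_id": 1, "entity_count": 5},
    {"id": 3, "parent_id": 2, "entity_count": 1},
]
feats = context_features(thread[1], thread)
# -> {"entity_count": 5, "is_first": False, "is_reply_to_first": True}
```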
15. Classification
Classifier:
• SVM classifier with an RBF kernel,
• Kernel parameters tuned using nested 5-fold cross-validation,
• Only features with a support of 10 or more retained,
• Accuracy evaluated using 10-fold cross-validation,
• Models compared using McNemar’s test.
Implementation:
• Implemented in Java,
• Feature extraction using the Stanford CoreNLP toolkit [1]
(tokenization, part-of-speech tagging, and dependency parsing modules),
• Classification using Weka (Witten, Frank, and Hall, 2011) and
LibSVM (Chang and Lin, 2011), and
• Statistical comparison using Java Statistical Classes (JSC) [2].

[1] http://nlp.stanford.edu/software/corenlp.shtml
[2] http://www.jsc.nildram.co.uk/index.htm
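The nested cross-validation procedure is independent of the actual classifier. A minimal pure-Python sketch of its structure, with a toy one-parameter threshold “classifier” standing in for the RBF-SVM (the real system used Weka and LibSVM):

```python
import random

def k_folds(items, k, seed=0):
    """Shuffle items deterministically and split them into k folds."""
    items = items[:]
    random.Random(seed).shuffle(items)
    return [items[i::k] for i in range(k)]

def cv_score(data, params, fit_score, k):
    """Plain k-fold cross-validation: average score over held-out folds."""
    folds = k_folds(data, k)
    scores = []
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        scores.append(fit_score(train, test, params))
    return sum(scores) / k

def nested_cv(data, param_grid, fit_score, outer_k=5, inner_k=5):
    """Nested CV: the inner loop picks parameters on training data only,
    so the outer loop's accuracy estimate is not biased by tuning."""
    folds = k_folds(data, outer_k)
    outer_scores = []
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        best = max(param_grid, key=lambda p: cv_score(train, p, fit_score, inner_k))
        outer_scores.append(fit_score(train, test, best))
    return sum(outer_scores) / outer_k

# Toy task: the label is (x > 0.5); the "classifier" is just a threshold t.
data = [(i / 100, i / 100 > 0.5) for i in range(100)]

def threshold_score(train, test, t):  # accuracy of threshold t on the test fold
    return sum((x > t) == y for x, y in test) / len(test)

acc = nested_cv(data, [0.3, 0.5, 0.7], threshold_score)  # inner loop picks t = 0.5
```

The same skeleton applies unchanged when `fit_score` trains an SVM over a (C, γ) grid instead of picking a threshold.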
16. Results
• We achieved a Cohen’s κ of 0.42 for our classification problem,
• Better than the existing neural-network system (Cohen’s κ = 0.31),
• A unigram baseline model achieved a Cohen’s κ of 0.33.
Error analysis (confusion matrix; rows = actual, columns = predicted):

              Other  Trigg.   Expl.  Integ.  Resol.
Other            17       4       5       2       0
Triggering        1      42   14(1)       3       1
Exploration       2       9      98      24       4
Integration       1       3 38(1,2)      56       4
Resolution        0       0       3   15(2)       3

(1), (2): misclassifications tied to challenges 1 and 2 on the next slide.

Confusion Matrix
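Cohen’s κ can be recomputed directly from a confusion matrix as a sanity check. Note that the matrix shown sums to 349 messages (about a fifth of the 1747-message data set, so it likely shows a single test fold); on this matrix alone κ works out to roughly 0.46, close to but not identical with the overall 0.42:

```python
def cohens_kappa(matrix):
    """Cohen's kappa from a square confusion matrix
    (rows = actual classes, columns = predicted classes)."""
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / n
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix)
        for i in range(len(matrix))
    ) / n**2
    return (observed - expected) / (1 - expected)

# Confusion matrix from the slide (rows = actual, columns = predicted):
# Other, Triggering, Exploration, Integration, Resolution
cm = [
    [17,  4,  5,  2, 0],
    [ 1, 42, 14,  3, 1],
    [ 2,  9, 98, 24, 4],
    [ 1,  3, 38, 56, 4],
    [ 0,  0,  3, 15, 3],
]
kappa = cohens_kappa(cm)  # ≈ 0.458 on this matrix
```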
17. Challenges
1 Effect of the large relative size of the
exploration class,
2 Effect of the “code-up” rule used in coding,
3 No relative importance of features, and
4 Context is not taken into account.
Code-up rule for coding
18. In progress: making use of thread context
• Discussions (and students’ learning) progress from triggering to resolution,
• The content of a message depends on the content of previous messages,
• The content of a message depends on the learning progress of the given student.
Model for message classification
19. Approach: Hidden Markov models (HMMs) & Conditional
random fields (CRFs)
• Hidden Markov Models:
  • HMMs are used to model system states and their transitions in a variety of contexts,
  • Widely used; Bayesian Knowledge Tracing models are based on HMMs.
  • Challenges with HMMs:
    • Can this be modeled as an HMM (perhaps a 2nd-order HMM)?
    • Dependence only on a single previous state,
    • One manifest variable per state.
• Conditional random fields:
  • Used for structured prediction (e.g., speech recognition),
  • For speech recognition, the classes of all letters in a word are taken into account,
  • Widely used in natural language processing,
  • More flexible than HMMs.
  • Challenges with CRFs:
    • Too many parameters to estimate with little data.
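To make the HMM option concrete, here is a minimal first-order Viterbi decoder over the four phases. Every number below is invented for illustration: in practice the emission scores would come from the per-message feature model and the transitions would be estimated from coded transcripts:

```python
import math

STATES = ["triggering", "exploration", "integration", "resolution"]

# Illustrative probabilities only: phases tend to stay or advance,
# and resolution tends to restart the learning cycle.
START = {"triggering": 0.7, "exploration": 0.2, "integration": 0.08, "resolution": 0.02}
TRANS = {
    "triggering":  {"triggering": 0.20, "exploration": 0.60, "integration": 0.15, "resolution": 0.05},
    "exploration": {"triggering": 0.10, "exploration": 0.50, "integration": 0.35, "resolution": 0.05},
    "integration": {"triggering": 0.05, "exploration": 0.25, "integration": 0.50, "resolution": 0.20},
    "resolution":  {"triggering": 0.40, "exploration": 0.30, "integration": 0.20, "resolution": 0.10},
}

def viterbi(emissions):
    """Most likely phase sequence for a thread, given per-message
    emission scores P(features | phase).  Log-space for stability."""
    best = {s: math.log(START[s]) + math.log(emissions[0][s]) for s in STATES}
    path = {s: [s] for s in STATES}
    for obs in emissions[1:]:
        new_best, new_path = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: best[p] + math.log(TRANS[p][s]))
            new_best[s] = best[prev] + math.log(TRANS[prev][s]) + math.log(obs[s])
            new_path[s] = path[prev] + [s]
        best, path = new_best, new_path
    return path[max(STATES, key=lambda s: best[s])]

# Three messages whose local scores mildly favour the canonical cycle
emissions = [
    {"triggering": 0.6, "exploration": 0.2, "integration": 0.1, "resolution": 0.1},
    {"triggering": 0.1, "exploration": 0.5, "integration": 0.3, "resolution": 0.1},
    {"triggering": 0.1, "exploration": 0.3, "integration": 0.5, "resolution": 0.1},
]
# viterbi(emissions) -> ["triggering", "exploration", "integration"]
```

A CRF would generalize this by conditioning each transition on features of the neighbouring messages rather than a single fixed transition table, which is where the parameter-estimation challenge noted above comes in.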
20. Conclusions and future work
Summary:
• Promising path to explore,
• The use of backoff trigrams, plain and backoff dependency triplets, entity counts,
and the first-message indicator seems useful.
Future work:
• Additional types of features which look at the context of previous messages
(e.g., convergence vs. divergence),
• Moving away from SVMs: explore other classification methods that are
better at explanation:
  • Give associated probabilities for each classification,
  • Give the relative importance of different features.
Challenges:
• Challenges with the message as the unit of analysis and with surface-level features,
• Low frequency of resolution messages.
22. References I
Chang, Chih-Chung and Chih-Jen Lin (2011). “LIBSVM: A library for support vector machines”. In: ACM Transactions on Intelligent Systems and Technology 2.3, 27:1–27:27.
Donnelly, Roisin and John Gardner (2011). “Content analysis of computer conferencing transcripts”. In: Interactive Learning Environments 19.4, pp. 303–315.
Garrison, D. Randy, Terry Anderson, and Walter Archer (1999). “Critical Inquiry in a Text-Based Environment: Computer Conferencing in Higher Education”. In: The Internet and Higher Education 2.2–3, pp. 87–105.
Henri, France (1992). “Computer Conferencing and Content Analysis”. In: Collaborative Learning Through Computer Conferencing, pp. 117–136.
Witten, Ian H., Eibe Frank, and Mark A. Hall (2011). Data Mining: Practical Machine Learning Tools and Techniques. 3rd ed. Morgan Kaufmann.