Final paper presentation for the course Advanced Concepts in Machine Learning.
The paper is "Topic Modeling using Topics from Many Domains, Lifelong Learning and Big Data".
http://jmlr.org/proceedings/papers/v32/chenf14.pdf
This is an introduction to Topic Modeling, including tf-idf, LSA, pLSA, LDA, EM, and some other related material. I know there are definitely some mistakes, and you can correct them with your wisdom. Thank you~
Deep neural methods have recently demonstrated significant performance improvements in several IR tasks. In this lecture, we will present a brief overview of deep models for ranking and retrieval.
This is a follow-up lecture to "Neural Learning to Rank" (https://www.slideshare.net/BhaskarMitra3/neural-learning-to-rank-231759858)
A Simple Introduction to Neural Information Retrieval by Bhaskar Mitra
Neural Information Retrieval (or neural IR) is the application of shallow or deep neural networks to IR tasks. In this lecture, we will cover some of the fundamentals of neural representation learning for text retrieval. We will also discuss some of the recent advances in the applications of deep neural architectures to retrieval tasks.
(These slides were presented at a lecture as part of the Information Retrieval and Data Mining course taught at UCL.)
Neural Models for Information Retrieval by Bhaskar Mitra
In the last few years, neural representation learning approaches have achieved very good performance on many natural language processing (NLP) tasks, such as language modelling and machine translation. This suggests that neural models will also yield significant performance improvements on information retrieval (IR) tasks, such as relevance ranking, addressing the query-document vocabulary mismatch problem by using semantic rather than lexical matching. IR tasks, however, are fundamentally different from NLP tasks leading to new challenges and opportunities for existing neural representation learning approaches for text.
We begin this talk with a discussion on text embedding spaces for modelling different types of relationships between items which makes them suitable for different IR tasks. Next, we present how topic-specific representations can be more effective than learning global embeddings. Finally, we conclude with an emphasis on dealing with rare terms and concepts for IR, and how embedding based approaches can be augmented with neural models for lexical matching for better retrieval performance. While our discussions are grounded in IR tasks, the findings and the insights covered during this talk should be generally applicable to other NLP and machine learning tasks.
Models such as latent semantic analysis and those based on neural embeddings learn distributed representations of text, and match the query against the document in the latent semantic space. In traditional information retrieval models, on the other hand, terms have discrete or local representations, and the relevance of a document is determined by the exact matches of query terms in the body text. We hypothesize that matching with distributed representations complements matching with traditional local representations, and that a combination of the two is favourable. We propose a novel document ranking model composed of two separate deep neural networks, one that matches the query and the document using a local representation, and another that matches the query and the document using learned distributed representations. The two networks are jointly trained as part of a single neural network. We show that this combination or ‘duet’ performs significantly better than either neural network individually on a Web page ranking task, and significantly outperforms traditional baselines and other recently proposed models based on neural networks.
Neural Models for Information Retrieval by Bhaskar Mitra
In the last few years, neural representation learning approaches have achieved very good performance on many natural language processing (NLP) tasks, such as language modelling and machine translation. This suggests that neural models may also yield significant performance improvements on information retrieval (IR) tasks, such as relevance ranking, addressing the query-document vocabulary mismatch problem by using semantic rather than lexical matching. IR tasks, however, are fundamentally different from NLP tasks leading to new challenges and opportunities for existing neural representation learning approaches for text.
In this talk, I will present my recent work on neural IR models. We begin with a discussion on learning good representations of text for retrieval. I will present visual intuitions about how different embeddings spaces capture different relationships between items, and their usefulness to different types of IR tasks. The second part of this talk is focused on the applications of deep neural architectures to the document ranking task.
This report discusses three submissions based on the Duet architecture to the Deep Learning track at TREC 2019. For the document retrieval task, we adapt the Duet model to ingest a "multiple field" view of documents—we refer to the new architecture as Duet with Multiple Fields (DuetMF). A second submission combines the DuetMF model with other neural and traditional relevance estimators in a learning-to-rank framework and achieves improved performance over the DuetMF baseline. For the passage retrieval task, we submit a single run based on an ensemble of eight Duet models.
5 Lessons Learned from Designing Neural Models for Information Retrieval by Bhaskar Mitra
Slides from my keynote talk at the Recherche d'Information SEmantique (RISE) workshop at CORIA-TALN 2018 conference in Rennes, France.
(Abstract)
Neural Information Retrieval (or neural IR) is the application of shallow or deep neural networks to IR tasks. Unlike classical IR models, these machine learning (ML) based approaches are data-hungry, requiring large scale training data before they can be deployed. Traditional learning to rank models employ supervised ML techniques—including neural networks—over hand-crafted IR features. By contrast, more recently proposed neural models learn representations of language from raw text that can bridge the gap between the query and the document vocabulary.
Neural IR is an emerging field and research publications in the area have been increasing in recent years. While the community explores new architectures and training regimes, a new set of challenges, opportunities, and design principles is emerging in the context of these new IR models. In this talk, I will share five lessons learned from my personal research in the area of neural IR. I will present a framework for discussing different unsupervised approaches to learning latent representations of text. I will cover several challenges to learning effective text representations for IR and discuss how latent space models should be combined with observed feature spaces for better retrieval performance. Finally, I will conclude with a few case studies that demonstrate the application of neural approaches to IR that go beyond text matching.
Final presentation of an agent for the 2014 Battlecode competition (www.battlecode.org), the assignment of the master research project of the Master in Artificial Intelligence at Maastricht University. Thanks to Gerrit, Justus and Jeroen.
Topic Modelling: Tutorial on Usage and Applications by Ayush Jain
This is a tutorial on topic modelling techniques that informs the reader about the basic ingredients of all topic models and enables them to develop a new model at the end.
Research project MAI2 - Final Presentation Group 4, by Daniele Di Mitri
The slides here present the results of the second-semester Research Project as part of the Master in Artificial Intelligence at the Department of Knowledge Engineering of Maastricht University. The project took place between February and June 2015 and consisted of the analysis of a big dataset of 200K publications on Nanotechnology. The project team was composed of S. Deckers, J. Hermans, A. Ludermann, D. Di Mitri, J. Rutten and D. Soemers.
Topic modeling of Twitter followers - Paris Machine Learning meetup - Alex Pe... by Alexis Perrier
In this presentation I show how to apply topic modeling techniques to a Twitter feed using gensim and Python, comparing certain algorithms: LSA, LSA ...
Topic Modelling to identify behavioral trends in online communities Conor Duke
We extracted a high volume of discussion from an online help forum and were able to significantly predict behavior based on the type of topics users were discussing. We are currently applying this to reduce churn and increase ARPU at a major telco provider.
This presentation includes an introduction to semantic analysis using Radim Řehůřek's Gensim and D3. We will be discussing the statistical principles of LDA, its application using an IPython notebook, and interrogation of the results using the D3 framework.
BIO: Conor Duke is the Insights manager at Fabrikatyr Analytics. He likes coffee, doing stuff for social good, and the outdoors. @conr / ie.linkedin.com/in/conorduke
Topic Modelling on the Enron Email Corpus @ ODSC 13 Apr 2016 by Jonathan Sedar
Slides from my presentation at ODSC and Python Quants meetup in London on 13 Apr 2016. Very lightly covering a demo project on topic modelling and network analysis of the Enron Email Corpus.
Avito recsys-challenge-2016 - RecSys Challenge 2016: Job Recommendation Based on... by Vasily Leksin
These slides describe our solution for the RecSys Challenge 2016. In the challenge, several datasets were provided by XING, a social network for business. The goal of the competition was to use these data to predict job postings that a user will interact positively with (click, bookmark or reply). Our solution to this problem includes three different types of models: Factorization Machine, item-based collaborative filtering, and a content-based topic model on tags. Thus, we combined collaborative and content-based approaches in our solution.
Our best submission, which was a blend of ten models, achieved 7th place on the challenge's final leaderboard with a score of 1677898.52. The approaches presented in this paper are general and scalable; therefore, they can be applied to other problems of this type.
A TEXT MINING RESEARCH BASED ON LDA TOPIC MODELLING by cscpconf
A large amount of digital text information is generated every day. Effectively searching, managing and exploring the text data has become a main task. In this paper, we first present an introduction to text mining and the probabilistic topic model Latent Dirichlet Allocation. Then two experiments are proposed: Wikipedia articles and users' tweets topic modelling. The former builds up a document topic model, aiming at a topic-perspective solution for searching, exploring and recommending articles. The latter sets up a user topic model, providing a full research and analysis of Twitter users' interests. The experiment process, including data collection, data pre-processing and model training, is fully documented and commented. Furthermore, the conclusions and applications of this paper could be a useful computational tool for social and business research.
Most work on scholarly document processing assumes that the information processed is trustworthy and factually correct. However, this is not always the case. There are two core challenges, which should be addressed: 1) ensuring that scientific publications are credible -- e.g. that claims are not made without supporting evidence, and that all relevant supporting evidence is provided; and 2) that scientific findings are not misrepresented, distorted or outright misreported when communicated by journalists or the general public. I will present some first steps towards addressing these problems and outline remaining challenges.
We present a feature vector formation technique for documents - Sparse Composite Document Vector (SCDV) - which overcomes several shortcomings of the current distributional paragraph vector representations that are widely used for text representation. In SCDV, word embeddings are clustered to capture multiple semantic contexts in which words occur. They are then chained together to form document topic-vectors that can express complex, multi-topic documents. Through extensive experiments on multi-class and multi-label classification tasks, we outperform the previous state-of-the-art method, NTSG. We also show that SCDV embeddings perform well on heterogeneous tasks like Topic Coherence, context-sensitive Learning and Information Retrieval. Moreover, we achieve a significant reduction in training and prediction times compared to other representation methods. SCDV achieves best of both worlds - better performance with lower time and space complexity. You can see my EMNLP presentation here: https://vimeo.com/238235553
Optimisation towards Latent Dirichlet Allocation: Its Topic Number and Collap... by IJECEIAES
Latent Dirichlet Allocation (LDA) is a probability model for grouping hidden topics in documents by a number of predefined topics. If determined incorrectly, the number of topics K results in limited word correlation with topics. Too large or too small a number of topics K causes inaccuracies in grouping topics when forming training models. This study aims to determine the optimal number of corpus topics in the LDA method using the maximum likelihood and Minimum Description Length (MDL) approaches. The experimental process uses Indonesian news articles with the number of documents at 25, 50, 90, and 600; the corresponding numbers of words are 3898, 7760, 13005, and 4365. The results show that the maximum likelihood and MDL approaches yield the same optimal number of topics. The optimal number of topics is influenced by the alpha and beta parameters. In addition, the number of documents does not affect the computation time, but the number of words does. The computation times for these datasets are 2.9721, 6.49637, 13.2967, and 3.7152 seconds. The optimisation model results in a number of LDA topics used as a classification model. This experiment shows that the highest average accuracy is 61%, with alpha 0.1 and beta 0.001.
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc... by Dr. Haxel Consult
Word embeddings, deep learning, transformer models and other pre-trained neural language models (sometimes recently referred to as "foundational models") have fundamentally changed the way state-of-the-art systems for natural language processing and information access are built today. The "Data-to-Value" process methodology (Leidner 2013; Leidner 2022a,b) has been devised to embody best practices for the construction of natural language engineering solutions; it can assist practitioners and has also been used to transfer industrial insights into the university classroom. This talk recaps how the methodology supports engineers in building systems more consistently and then outlines the changes in the methodology to adapt it to the deep learning age. The cost and energy implications will also be discussed.
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS by ijseajournal
ABSTRACT
In this paper we propose a novel method to cluster categorical data while retaining their context. Typically, clustering is performed on numerical data. However, it is often useful to cluster categorical data as well, especially when dealing with data in real-world contexts. Several methods exist which can cluster categorical data, but our approach is unique in that we use recent text-processing and machine learning advancements like GloVe and t-SNE to develop a context-aware clustering approach using pre-trained word embeddings. We encode words or categorical data into numerical, context-aware vectors that we use to cluster the data points using common clustering algorithms like K-means.
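A minimal sketch of the approach described above, assuming pre-trained GloVe vectors are available in word2vec text format and that gensim and scikit-learn are installed; the file name, word list and number of clusters are illustrative, not taken from the paper.

# Context-aware clustering of categorical values: embed with GloVe, cluster with K-means.
# The GloVe file path below is a placeholder for a locally converted word2vec-format file.
import numpy as np
from gensim.models import KeyedVectors
from sklearn.cluster import KMeans

vectors = KeyedVectors.load_word2vec_format("glove.6B.100d.w2v.txt", binary=False)

categories = ["teacher", "nurse", "engineer", "violin", "guitar", "piano"]
words = [w for w in categories if w in vectors]          # keep only in-vocabulary items
X = np.vstack([vectors[w] for w in words])               # one embedding row per category value

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
for word, label in zip(words, kmeans.labels_):
    print(word, "-> cluster", label)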
A Document Exploring System on LDA Topic Model for Wikipedia Articles by ijma
A large amount of digital text information is generated every day. Effectively searching, managing and exploring the text data has become a main task. In this paper, we first present an introduction to text mining and the LDA topic model. Then we explain in depth how to apply the LDA topic model to a text corpus by doing experiments on Simple Wikipedia documents. The experiments include all necessary steps of data retrieval, pre-processing, fitting the model, and an application of a document exploring system. The result of the experiments shows the LDA topic model working effectively on document clustering and finding similar documents. Furthermore, the document exploring system could be a useful research tool for students and researchers.
Similar to Lifelong Topic Modelling presentation (20)
Privacy-Preserving and Scalable Affect Detection in Online Synchronous Learning
written by Felix Böttger, Ufuk Cetinkaya, Daniele Di Mitri, Sebastian Gombert, Krist Shingjergji, Deniz Iren & Roland Klemke was accepted at the Seventeenth European Conference on Technology Enhanced Learning (EC-TEL 2022) Educating for a new future: Making sense of technology-enhanced learning adoption – Toulouse, France, 12-16 September 2022
The paper reports on a research prototype which stems from the cooperation between DIPF and the Open University of the Netherlands.
Abstract
The recent pandemic has forced most educational institutions to shift to distance learning. Teachers can perceive various non-verbal cues in face-to-face classrooms and thus notice when students are distracted, confused, or tired. However, the students’ non-verbal cues are not observable in online classrooms. The lack of these cues poses a challenge for the teachers and hinders them in giving adequate, timely feedback in online educational settings. This can lead to learners not receiving proper guidance and may cause them to be demotivated. This paper proposes a pragmatic approach to detecting student affect in online synchronized learning classrooms. Our approach consists of a method and a privacy-preserving prototype that only collects data that is absolutely necessary to compute action units and is highly scalable by design to run on multiple devices without specialized hardware. We evaluated our prototype using a benchmark for the system performance. Our results confirm the feasibility and the applicability of the proposed approach.
Guest Lecture: Restoring Context in Distance Learning with Artificial Intelli... by Daniele Di Mitri
Presentation given on February 1st, 2022 at the "Brown Bag" presentation series organised by the Faculty of NYU Educational Communication and Technology, which is part of the Steinhardt School of Culture, Education, and Human Development.
https://www.ectstudent.info/news-events/brown-bag-dr-daniele-di-mitri
Presentation Abstract:
The COVID-19 pandemic forced more than 1.6 billion learners out of school, becoming the most challenging disruption ever endured by the global education systems. In many countries, education institutions decided to move their regular activities online, opting for remote teaching as an emergency solution to continue their education. Meanwhile, physical distancing and learning in isolation heavily challenge learners and hinder their study success. There is a compelling need to make education systems more resilient and less vulnerable to future disruptions in such a critical landscape. In particular, we have to reconsider how digital technologies can support online and hybrid teaching. While digital education technologies such as video conferencing tools and learning management systems have improved to make educational resources more available and education more flexible, the modes of interaction they implement remain essentially unnatural for the learner due to a substantial lack of context. Modern sensor-enabled computer systems allow extending the standard human-computer interfaces and facilitate richer multimodal interaction. Furthermore, advances in AI allow interpreting the data collected from multimodal and multi-sensor devices. These insights can be used to support online teaching and learning in isolation with personalised feedback and adaptation through Multimodal Learning Experiences (MLX). This guest lecture elaborates on existing approaches, architectures, and methodologies. I illustrate use cases that employ multimodal learning analytics applications that can shape the online teaching of the future.
I was invited to deliver a keynote at the SITE Interactive Online Conference 2021 on "Restoring Context in Online Teaching with Artificial Intelligence and Multimodal Learning Experiences". The keynote paper is to be published in the proceedings of the conference in the Learning & Technology Library.
Abstract: The COVID-19 pandemic forced more than 1.6 billion learners out of school, becoming the most challenging disruption ever endured by the global education systems. In many countries, education institutions decided to move their regular activities online, opting for remote teaching as an emergency solution to continue their education. Meanwhile, physical distancing and learning in isolation are heavily challenging learners and hindering their study success. There is a compelling need to make education systems more resilient and less vulnerable to future disruptions in such a critical landscape. In particular, we have to reconsider how digital technologies can support online and hybrid teaching. In recent years, video conferencing tools and learning management systems have improved to make educational resources more available and education more flexible. However, the modes of interaction they implement remain essentially unnatural for the learner due to a substantial lack of context. Modern sensor-enabled computer systems allow extending the standard human-computer interface for multimodal communication. The advances in Artificial Intelligence allow interpreting the data collected from multimodal and multi-sensor devices. These insights can be used to support online teaching and learning in isolation with personalised feedback and adaptation through Multimodal Learning Experiences (MLX). In this keynote paper, I analyse the benefits and caveats of MLX and Multimodal Learning Analytics (MMLA) for online teaching. I describe three existing MLX systems to illustrate the possible ways of how these technologies can support online teaching.
MOBIUS: Smart Mobility Tracking with Smartphone Sensors by Daniele Di Mitri
Presentation for the 11th EAI International Conference Sensor System and Software (S-Cube) taking place on the 3rd December 2020.
This paper received the *Best Paper Award* at the S-Cube conference https://www.dimstudio.org/best-paper-award-at-eai-s-cube-conference/
Abstract
In this paper, we introduce MOBIUS, a smartphone-based system for remote tracking of citizens' movements. By collecting the smartphone's sensor data, such as accelerometer and gyroscope readings, along with self-report data, the MOBIUS system makes it possible to classify the users' mode of transportation. With the MOBIUS app the users can also activate GPS tracking to visualise their journeys and travelling speed on a map. The MOBIUS app is an example of a tracing app which can provide more insights into how people move around in an urban area. In this paper, we introduce the motivation, the architectural design and the development of the MOBIUS app. To further test its validity, we ran a user study collecting data from multiple users. The collected data are used to train a deep convolutional neural network architecture which classifies the transportation modes with a mean accuracy of 89%.
Video of the presentation
https://youtu.be/tBXtxcHFyMs
To appear in the Springer proceedings Science and Technologies for Smart Cities, 6th EAI International Summit, SmartCity360, online, December 2-4, 2020.
This is the final presentation of my PhD defence which took place on the 4th September 2020 at the Open University of The Netherlands.
Abstract
This doctoral thesis describes the journey of ideation, prototyping and empirical testing of the Multimodal Tutor, a system designed for providing digital feedback that supports psychomotor skills acquisition using learning and multimodal data capturing. The feedback is given in real-time with a machine-driven assessment of the learner's task execution. The predictions are tailored by supervised machine learning models trained with human-annotated samples. The main contributions of this thesis are a literature survey on multimodal data for learning, a conceptual model (the Multimodal Learning Analytics Model), a technological framework (the Multimodal Pipeline), a data annotation tool (the Visual Inspection Tool) and a case study in Cardiopulmonary Resuscitation training (CPR Tutor). The CPR Tutor generates real-time, adaptive feedback using kinematic and myographic data and neural networks.
Link to youtube presentation
https://youtu.be/b1kDSORpV8A
Link to the PhD Thesis manuscript
https://research.ou.nl/en/publications/the-multimodal-tutor-adaptive-feedback-from-multimodal-experience
Real-time Multimodal Feedback with the CPR Tutor by Daniele Di Mitri
My presentation at the International Conference in Artificial Intelligence in Education (AIED’2020)
8th July 2020
Di Mitri D., Schneider J., Trebing K., Sopka S., Specht M., Drachsler H. (2020) Real-Time Multimodal Feedback with the CPR Tutor. In: Bittencourt I., Cukurova M., Muldner K., Luckin R., Millán E. (eds) Artificial Intelligence in Education. AIED 2020. Lecture Notes in Computer Science, vol 12163. Springer, Cham
https://link.springer.com/chapter/10.1007/978-3-030-52237-7_12
Doctoral consortium presentation at the 17th Conference in Artificial Intelligence in Education in Poznań, Poland 2019
Abstract: In this doctoral consortium paper, we introduce the CPR Tutor, an intelligent tutoring system for cardiopulmonary resuscitation (CPR) training based on the analysis of multimodal data. Using a multi-sensor setup, the CPR Tutor tracks the CPR execution of the trainee and generates automatic adaptive feedback to improve the trainee's performance. This research work is part of a PhD project entitled "Multimodal Tutor: adaptive feedback from multimodal experience capturing", a project which investigates how to use multimodal and multi-sensor data to generate personalised feedback for training psycho-motor skills at the workplace or during medical simulations. In the CPR Tutor, we use the Microsoft Kinect and Myo to track the trainee's body position and the ResusciAnne QCPR manikin to obtain correct CPR performance metrics. We then use a validated approach, the Multimodal Pipeline, for the collection, storage, processing and annotation of multimodal data. This paper describes the preliminary results obtained in the first design of the CPR Tutor.
Presented on the 25th May 2019 at the Artificial Intelligence and Adaptive Education conference (AIAED'19), Beijing, China.
Abstract: We introduce the Multimodal Learning Analytics Pipeline, a generic approach for collecting and exploiting multimodal data to support learning activities across physical and digital spaces. The MMLA Pipeline facilitates researchers in setting up their multimodal experiments, reducing setup and configuration time required for collecting meaningful datasets. Using the MMLA Pipeline, researchers can decide to use a set of custom sensors to track different modalities, including behavioural cues or affective states. Hence, researchers can quickly obtain multimodal sessions consisting of synchronised sensor data and video recordings. They can analyse and annotate the sessions recorded and train machine learning algorithms to classify or predict the patterns investigated.
Read Between The Lines: an Annotation Tool for Multimodal Data by Daniele Di Mitri
This is the presentation of Read Between The Lines, the paper which we published at the Learning Analytics & Knowledge Conference 2019 in Tempe, Arizona (#LAK19).
Link to the paper available in Open Access ACM library https://dl.acm.org/citation.cfm?id=3303776
Abstract:
This paper introduces the Visual Inspection Tool (VIT) which supports researchers in the annotation of multimodal data as well as the processing and exploitation for learning purposes. While most of the existing Multimodal Learning Analytics (MMLA) solutions are tailor-made for specific learning tasks and sensors, the VIT addresses the data annotation for different types of learning tasks that can be captured with a customisable set of sensors in a flexible way. The VIT supports MMLA researchers in 1) triangulating multimodal data with video recordings; 2) segmenting the multimodal data into time-intervals and adding annotations to the time-intervals; 3) downloading the annotated dataset and using it for multimodal data analysis. The VIT is a crucial component that was so far missing in the available tools for MMLA research. By filling this gap we also identified an integrated workflow that characterises current MMLA research. We call this workflow the Multimodal Learning Analytics Pipeline, a toolkit for orchestration, the use and application of various MMLA tools.
The Multimodal Tutor - short pitch presentation at JTELSS 2018 in Durrës, Alb... by Daniele Di Mitri
These two slides were prepared to present the Multimodal Tutor, my PhD research project, which I presented at the JTEL Summer School 2018 http://ea-tel.eu/jtelss/jtelss2018/. The slides concisely summarise the topic of my research project, the main innovation point and the major obstacles that I am facing.
This presentation was awarded the EATEL doctoral academy award 2018 at the EC-TEL conference 2018 in Leeds.
Practical skills training, co-located group interactions, and tasks alternative to the classic desktop-based learning scenario still represent a big set of learning activities taking place in the classroom and at the workplace.
Physical interactions are most of the time offline moments, not captured by the data collection and therefore not included in the datasets used for data analysis.
Taking these moments into account requires extending the data collection from conventional data sources (LMS, MOOC data etc.) to data "beyond user-computer interaction" coming from multiple modalities.
Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Here are the slides of the workshop "Multimodal Machines" presented at the JTEL Summer School 2017 in Aveiro, Portugal. For more info about it, see:
http://www.prolearn-academy.org/Events/summer-school-2017/index_html
Multimodal Tutor - Adaptive feedback from multimodal experience capturing by Daniele Di Mitri
This is my 5-minute presentation at the Doctoral Consortium of the 18th Artificial Intelligence in Education conference, held on the 30th June 2017 in Wuhan, China.
This is the presentation of the paper "Learning pulse: a machine learning approach for predicting performance in self-regulated learning using multimodal data" which I delivered at the Learning Analytics and Knowledge conference 2017 in Vancouver, Canada.
http://dl.acm.org/citation.cfm?id=3027447&CFID=912205331&CFTOKEN=43442860
Visual Learning Pulse - Final Thesis presentation by Daniele Di Mitri
The final presentation of the master thesis project Visual Learning Pulse: Flow Prediction and Feedback in Self-Regulated Learning, a project collaboration between the Department of Data Science and Knowledge Engineering of Maastricht University and the Welten Institute of the Open University in the Netherlands.
TITLE:
Visual Learning Pulse: Flow Prediction and Feedback in Self-regulated Learning
ABSTRACT:
Visual Learning Pulse is a Master thesis research project developed in cooperation with the Welten Institute, the Research Centre for Learning, Teaching and Technology at the Open University of the Netherlands, and partially financed by the European project Learning Analytics Community Exchange (LACE). Visual Learning Pulse explores whether physiological and physical data such as heart rate, step count and weather data, when correlated with learning activity data, can be used to predict learning success in self-regulated learning settings.
To verify this hypothesis, an experiment was purposely designed, consisting of three phases, lasting six weeks and involving nine participants, each of them wearing a Fitbit HR wrist band and having their application usage recorded during their learning and working activities throughout the day. An ad-hoc infrastructure for longitudinal and multi-modal data was designed and implemented. The data from different sources were stored using the Experience API (xAPI) data standard in a cloud distributed database called the Learning Record Store.
The participants (doctoral students at the Open Universiteit) were asked to rate their learning experience through an Activity Rating Tool, indicating their perceived level of productivity, stress, challenge and abilities. These self-reported performance indicators were used as training labels for the two algorithms employed for the prediction of time series data: the Vector Autoregression and the Linear Mixed Effect Model.
A major task of the thesis consisted of developing the software application to pre-process, perform the analysis and generate the predictions in real time, in order to provide timely feedback to the users about their learning performance. Although not showing high overall accuracy, the prediction models were successfully learnt and used in production: in the third phase of the experiment, two visualisation mechanisms were used, the Learner Dashboard and the Feedback Cubes.
In addition, a conceptual paper on Visual Learning Pulse, illustrating the setup and the overall rationale, was presented at the Learning Analytics & Knowledge conference 2016 in Edinburgh, Scotland and was included in the CEUR workshop proceedings.
Lifelong Topic Modelling presentation
1. Lifelong Topic Modelling
Paper Review Presentation
Daniele Di Mitri
Department of Knowledge Engineering
University of Maastricht
22nd May 2015
2. Chosen paper
Chen, Zhiyuan, and Bing Liu.
"Topic Modeling using Topics from Many Domains, Lifelong Learning and Big Data."
Proceedings of the 31st International Conference on Machine Learning (ICML), 2014.
3. Outline
1 Topic modelling
LDA description
LDA limitations
2 Topic modelling using knowledge
Knowledge Based Topic modelling
3 Lifelong Topic modelling
Lifelong learning approach
The proposed algorithm
Incorporation of knowledge
4 Evaluation
5 Summary
4. Latent Dirichlet Allocation
Some useful background
[Figure: Latent Dirichlet allocation (LDA) - example topics as word distributions (gene/dna/genetic, life/evolve/organism, brain/neuron/nerve, data/number/computer), a document, and its topic proportions and assignments]
• Each topic is a distribution over words
• Each document is a mixture of corpus-wide topics
• Each word is drawn from one of those topics
Figure: David Blei, Probabilistic Topic Models, 2012
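As a concrete anchor for the generative picture on this slide, here is a minimal LDA run with gensim; the toy corpus, number of topics and settings are illustrative only and not taken from the paper.

# Minimal LDA example with gensim: each topic is a word distribution,
# each document a mixture of topics (toy corpus, illustrative settings).
from gensim import corpora
from gensim.models import LdaModel

docs = [
    ["gene", "dna", "genetic", "organism"],
    ["brain", "neuron", "nerve", "organism"],
    ["data", "number", "computer", "analysis"],
]
dictionary = corpora.Dictionary(docs)
bow = [dictionary.doc2bow(d) for d in docs]

lda = LdaModel(corpus=bow, id2word=dictionary, num_topics=2, passes=50, random_state=0)
for topic_id in range(lda.num_topics):
    print(topic_id, lda.print_topic(topic_id, topn=4))   # each topic as a word distribution
print(lda.get_document_topics(bow[0]))                   # the first document as a topic mixture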
5. LDA limitations
Unsupervised models can produce incoherent topics
Example
LDA sample topics
D1 = {price, color, cost, life}
D2 = {cost, picture, price, expensive}
D3 = {price, money, customer, expensive}
These topics have incoherent words: color, life, picture, customer
6. Can we use Knowledge?
some related works
SUPERVISED
Topic model in supervised settings
E.g. Blei & McAuliffe (2007)
All prior knowledge is correct
Uses ”regions” and ”labels”
UNSUPERVISED
Knowledge Based Topic Modelling
E.g. GK-LDA (Chen et al. 2013) and DF-LDA (Andrzejewski et al. 2009)
Typically assume that the given knowledge is correct
They do not automatically extract and target prior knowledge
7. Can we do better?
A fully automatic system to mine prior knowledge and deal with inconsistencies
INTUITION
If we find a set of words common in two domains, these can serve as prior knowledge
Example
D1 ∩ D2 = {price, cost}
D2 ∩ D3 = {price, expensive}
These are prior knowledge sets (pk-sets)
Example (D1 improved)
D1 = {price, cost, expensive, color}
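To make the intuition above concrete, here is a toy sketch (not the paper's code) that treats each domain topic as a set of top words and keeps the word pairs shared by topics from at least two domains; the support threshold of 2 is an illustrative choice.

# Toy illustration: mine length-2 prior-knowledge sets (pk-sets) as word pairs
# that appear together in topics from multiple domains (threshold is illustrative).
from itertools import combinations
from collections import Counter

domain_topics = [
    {"price", "color", "cost", "life"},             # D1
    {"cost", "picture", "price", "expensive"},      # D2
    {"price", "money", "customer", "expensive"},    # D3
]

pair_support = Counter()
for topic in domain_topics:
    for pair in combinations(sorted(topic), 2):
        pair_support[pair] += 1

min_support = 2  # a pair must be seen in topics from at least two domains
pk_sets = [pair for pair, count in pair_support.items() if count >= min_support]
print(pk_sets)   # e.g. ('cost', 'price') and ('expensive', 'price')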
8. Lifelong Learning approach
In 4 ”simple” steps
1 Given a set of domains D = {D1, ..., Dn}, it runs standard LDA(Di) on each to generate prior topics (p-topics), whose union forms S
2 Given a test domain Dt, run LTM(Dt) to generate the current topics (c-topics) At
3 For each aj ∈ At, find the matching topics Mj^t ⊆ S (high-level knowledge for aj)
4 Mine Mj^t to generate pk-sets of length 2
Why Lifelong Learning? The knowledge learnt with LTM is retained and added to (or replaces parts of) the initial prior topics S.
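The four steps can be sketched as a loop over domains. The helper functions below are toy stubs standing in for the components named on this slide (plain LDA, LTM's guided sampling, topic matching and pk-set mining); they are not the authors' implementation.

# Structural sketch of the lifelong loop (toy stubs, illustrative data only).
def run_lda(docs):
    # placeholder: run plain LDA on one domain and return topics as sets of top words
    return [set(docs[0])]

def ltm(docs, knowledge):
    # placeholder: knowledge-guided Gibbs sampling (see the LTM algorithm slide below)
    return [set(docs[0])]

def match_and_mine(c_topic, prior_topics):
    # placeholder: keep length-2 overlaps with prior topics as pk-sets
    return [pair for t in prior_topics
            for pair in [tuple(sorted(c_topic & t))] if len(pair) == 2]

past_domains = [[["price", "cost", "color"]], [["price", "expensive", "picture"]]]
S = []
for domain in past_domains:            # step 1: plain LDA on every past domain
    S.extend(run_lda(domain))

D_t = [["price", "cost", "expensive", "money"]]
A_t = ltm(D_t, S)                      # step 2: LTM on the new (test) domain
pk_sets = []
for a_j in A_t:                        # steps 3-4: match against S, mine pk-sets
    pk_sets.extend(match_and_mine(a_j, S))
S.extend(A_t)                          # lifelong learning: retain what was learnt
print(pk_sets)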
9. LTM algorithm
1 Runs GibbsSampling(Dt, ∅) (equivalent to LDA) for N iterations
2 Runs GibbsSampling(Dt, Kt) for N iterations, incorporating the knowledge Kt
3 Kt is updated at each iteration using the minimum symmetrised KL-divergence between sk ∈ S and aj ∈ At, and Frequent Itemset Mining to generate frequent itemsets of length 2 (pk-sets)
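A small sketch of the topic-matching step, assuming each topic is represented as a probability distribution over a shared vocabulary; the vocabulary and probabilities below are toy values, not from the paper.

# Match a current topic to its closest prior topic by symmetrised KL divergence (toy values).
import numpy as np

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

def symmetrised_kl(p, q):
    return kl(p, q) + kl(q, p)

vocab = ["price", "cost", "expensive", "color", "life"]
current_topic = np.array([0.4, 0.3, 0.2, 0.05, 0.05])
prior_topics = {
    "p-topic A": np.array([0.35, 0.35, 0.2, 0.05, 0.05]),
    "p-topic B": np.array([0.05, 0.05, 0.1, 0.4, 0.4]),
}
best = min(prior_topics, key=lambda name: symmetrised_kl(current_topic, prior_topics[name]))
print(best)   # the prior topic with minimum symmetrised KL divergence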
10. How does LTM incorporate knowledge?
NB: d is incremented not by 1 but by a certain proportion, which is stored in a matrix and is determined using Pointwise Mutual Information.
PMI(w1, w2) = log(P(w1, w2)/P(w1)P(w2))
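The PMI above can be estimated from document co-occurrence counts; a toy sketch follows (the corpus is illustrative).

# Estimate PMI(w1, w2) = log( P(w1, w2) / (P(w1) * P(w2)) ) from document co-occurrences.
import math

docs = [{"price", "cost", "expensive"},
        {"price", "cost"},
        {"battery", "life"},
        {"price", "expensive"}]

def pmi(w1, w2, docs):
    n = len(docs)
    p1 = sum(w1 in d for d in docs) / n
    p2 = sum(w2 in d for d in docs) / n
    p12 = sum(w1 in d and w2 in d for d in docs) / n
    return math.log(p12 / (p1 * p2))

print(pmi("price", "cost", docs))   # > 0: the pair co-occurs more often than chance
# pairs that never co-occur give p12 = 0; in practice a small smoothing count avoids log(0)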
11. Evaluation
Tested against 4 baseline algorithms: LDA, DF-LDA, GK-LDA and AKL
Average Topic Coherence as the quality measure
Figure: Results of tests in settings 1 & 2
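Topic coherence can be computed, in one common formulation (Mimno et al., 2011), from document co-occurrence counts of a topic's top words; the sketch below uses a toy corpus and toy topics, so the numbers are only illustrative.

# Topic coherence in the Mimno et al. (2011) form:
#   C(t) = sum_{m=2..M} sum_{l=1..m-1} log( (D(v_m, v_l) + 1) / D(v_l) )
# where D(v) is the document frequency of v and D(v, v') the co-document frequency.
import math

docs = [{"price", "cost", "expensive"},
        {"price", "cost"},
        {"battery", "life", "price"}]

def coherence(top_words, docs):
    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            d_l = sum(top_words[l] in d for d in docs)
            d_ml = sum(top_words[m] in d and top_words[l] in d for d in docs)
            score += math.log((d_ml + 1) / d_l)
    return score

topics = [["price", "cost", "expensive"], ["price", "battery", "expensive"]]
print(sum(coherence(t, docs) for t in topics) / len(topics))   # average topic coherence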
12. In summary
Lifelong Topic Modelling
Learn prior knowledge
Fault tolerance
First Lifelong Learning Topic model
Big Data ready
However...
some points for improvement
Text corpora to be diversified (only Amazon reviews)
Focus on the flow of the algorithm
2nd test setting and test with Big Data not fully reported