Multimodal Interaction - Lecture 05 - Next Generation User Interfaces (4018166FNR)
1. 2 December 2005
Next Generation User Interfaces
Multimodal Interaction
Prof. Beat Signer
Department of Computer Science
Vrije Universiteit Brussel
http://www.beatsigner.com
2. Beat Signer - Department of Computer Science - bsigner@vub.ac.be, October 24, 2016
Human Senses and Modalities
Input as well as output modalities
Sensory Perception   Sense Organ          Modality
sight                eyes                 visual
hearing              ears                 auditory
smell                nose                 olfactory
taste                tongue               gustatory
touch                skin                 tactile
(balance)            vestibular system    vestibular
Modality “refers to the type of communication channel used to convey
or acquire information. It also covers the way an idea is expressed or
perceived, or the manner an action is performed.”
Nigay and Coutaz, 1993
3.
Bolt's "Put-that-there" (1980)
Bolt, 1980
4.
Video: “Put-that-there”
5.
Bolt's "Put-that-there" (1980) ...
Combination of two input modalities
speech is used to issue the semantic part of the command
gestures are used to provide the locations on the screen
Complementary use of both modalities
use of only speech or gestures does not enable the same
commands
The "Put-that-there" interaction is still regarded as a rich
case of multimodal interaction
however, it is based on an old mouse-oriented metaphor for
selecting objects
6.
Multimodal Interaction
Multimodal interaction is about human-machine
interaction involving multiple modalities
input modalities
- speech, gesture, gaze, emotions, …
output modalities
- voice synthesis, visual cues, …
Humans are inherently multimodal!
7.
Example: SpeeG2
[Architecture diagram: the user's input is captured by speech recognition
(Microsoft SAPI 5.4) and skeletal tracking (Microsoft Kinect), both feeding
into the SpeeG2 GUI; diagram by Sven De Kock]
8.
Example: SpeeG2 …
While a user speaks a sentence, the most probable words are shown
on the screen
By using hand gestures the user can correct words (by selection)
if necessary
Improved recognition rate with up to 21 words per minute (WPM)
SpeeG2, WISE Lab
9.
Video: SpeeG2
10.
GUIs vs. Multimodal Interfaces
Graphical User Interfaces                 Multimodal Interfaces
Single event input stream that            Typically continuous and simultaneous input
controls the event loop                   from multiple input streams
Sequential processing                     Parallel processing
Centralised architecture                  Often distributed architectures (e.g.
                                          multi-agent) with high computational and
                                          memory requirements
No temporal constraints                   Requires timestamping and temporal
                                          constraints for multimodal fusion
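The timestamping requirement in the right-hand column can be illustrated with a minimal sketch. The `InputEvent` type, the `fuse_by_time` helper, and the one-second fusion window are all invented for this example; real fusion engines use far more elaborate temporal alignment:

```python
from dataclasses import dataclass

@dataclass
class InputEvent:
    modality: str   # e.g. "speech" or "gesture"
    payload: str    # recognised command fragment or pointing target
    timestamp: float  # seconds since session start

def fuse_by_time(events, window=1.0):
    """Pair speech and gesture events whose timestamps fall within a
    fusion window (hypothetical 1-second threshold)."""
    speech = [e for e in events if e.modality == "speech"]
    gestures = [e for e in events if e.modality == "gesture"]
    pairs = []
    for s in speech:
        for g in gestures:
            if abs(s.timestamp - g.timestamp) <= window:
                pairs.append((s.payload, g.payload))
    return pairs

events = [
    InputEvent("speech", "put that", 0.2),
    InputEvent("gesture", "point@(120,80)", 0.9),  # slightly delayed, still fused
    InputEvent("speech", "there", 2.5),
    InputEvent("gesture", "point@(300,40)", 2.8),
]
print(fuse_by_time(events))
```

Note that the gesture may lag the speech fragment by a fraction of a second, which is why fusion is driven by a time window rather than by strict simultaneity.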
11.
Multimodal Human-Machine Interaction Loop
Dumas et al., 2009
12.
Multimodal Fusion
Fusion in a multimodal system can happen at three
different levels
data level
feature level
decision level
Dumas et al., 2009
13.
Multimodal Fusion …
Data-level fusion
focuses on the fusion of identical or tightly linked types of
multimodal data
- e.g. two video streams capturing the same scene from different angles
- low semantics and highly sensitive to noise (no preprocessing)
Feature-level fusion
fusion of features extracted from the data
- e.g. speech and lip movements
Decision-level fusion
focuses on interpretation based on semantic data
- e.g. Bolt’s "Put-that-there" speech and gesture
fusion of high-level information derived from data- and feature-
level fusion
- highly resistant to noise and failures
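As a toy illustration of decision-level fusion, the sketch below resolves the deictic words of a "Put-that-there"-style command using the targets reported by a pointing recogniser. The function name, its signature, and the placeholder targets are all invented for this example:

```python
def decision_level_fusion(speech_cmd, gesture_targets):
    """Replace deictic references ('that', 'there') in a recognised
    speech command with the corresponding hits from a (hypothetical)
    pointing recogniser, in the spirit of Bolt's "Put-that-there"."""
    hits = iter(gesture_targets)
    resolved = [next(hits) if w.lower() in ("that", "there") else w
                for w in speech_cmd.split()]
    return " ".join(resolved)

print(decision_level_fusion("put that there", ["<blue square>", "<upper left>"]))
```

Both inputs arrive as high-level semantic interpretations (words and targets), which is exactly why this level of fusion is robust to noise in the raw signals.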
14.
Comparison of Fusion Levels
                    Data-Level Fusion       Feature-Level Fusion    Decision-Level Fusion
Used for            raw data of the         closely coupled         loosely coupled
                    same modality           modalities              modalities
Level of detail     highest level of        moderate level of       difficult to recover data
(data)              detail                  detail                  that has been fused on the
                                                                    data and feature level
Sensitivity to      highly sensitive        moderately sensitive    not very sensitive
noise and failures
Usage               not frequently used     used to improve the     most widely used level of
                    for multimodal          recognition rate        fusion for multimodal
                    interfaces                                      interfaces
15.
Advantages of Multimodal Input
Support and accommodate a user's perceptual
and communicative capabilities
natural user interfaces with new ways of engaging interaction
Integration of computational skills of computers in the
real world by offering more natural ways of human-
machine interaction
Enhanced robustness due to the combination of different
(partial) information sources
Flexible personalisation by using different modalities
based on user preferences and context
Helps visually or physically impaired users
16.
Multimodal Fission
Forms a comprehensible answer for the user by making
use of multiple output modalities based on the device
and context
builds an abstract message through a combination of channels
Three main steps
selection and structuring of content
selection of modalities
output coordination
Coordination of the output on each channel is necessary
to form a coherent message
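The three fission steps can be illustrated with a small sketch; the message structure, the eyes_busy context flag and the modality names below are hypothetical assumptions for illustration:

```python
# Hypothetical sketch of the three fission steps; the message structure,
# context flag and modality names are illustrative assumptions.
def fission(message, context):
    # 1. Selection and structuring of content
    parts = {"summary": message["title"], "detail": message["body"]}
    # 2. Selection of modalities based on device and context
    if context["eyes_busy"]:           # e.g. the user is driving
        plan = [("speech", parts["summary"]), ("speech", parts["detail"])]
    else:
        plan = [("display", parts["summary"] + "\n" + parts["detail"])]
    # 3. Output coordination: order the channels into a coherent message
    return plan

msg = {"title": "3 new flights", "body": "Cheapest departs 09:40"}
print(fission(msg, {"eyes_busy": True}))
```

The same abstract message is thus rendered differently depending on context, while step 3 guarantees that whatever channels are chosen are sequenced into one coherent answer.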
Ten Myths of Multimodal Interaction
1. If you build a multimodal system, users
will interact multimodally
users show a strong preference to interact
multimodally but this does not mean that they
always interact multimodally
unimodal and multimodal interactions are mixed
depending on the task to be carried out
2. Speech and pointing is the dominant
multimodal integration pattern
heritage from Bolt's "Put-that-there" where gesture and speech
are used to select an object (mouse-oriented metaphor)
modalities can be used for more than just object selection
- written input, facial expressions, ...
[Oviatt, 1999]
Ten Myths of Multimodal Interaction ...
3. Multimodal input involves simultaneous signals
signals are often not overlapping
users frequently introduce (consciously or not) a small delay
between two modal inputs
developers should not rely on an overlap for the fusion process
4. Speech is the primary input mode in any multimodal
system that includes it
in human-human communication speech is indeed the primary
input mode
in human-machine communication speech is not the exclusive
carrier of important information and does not have temporal
precedence over other modalities
Ten Myths of Multimodal Interaction ...
5. Multimodal language does not differ linguistically from
unimodal language
different modalities complement each other
- e.g. avoid error-prone voice descriptions of spatial locations by pointing
in many respects multimodal language is substantially simplified
- might lead to more robust systems
6. Multimodal integration involves redundancy of content
between modes
early research in multimodal interaction assumed that different
redundant modalities could improve the recognition
complementary modalities are more important and designers
should not rely on duplicated content
Ten Myths of Multimodal Interaction ...
7. Individual error-prone recognition technologies combine
multimodally to produce even greater unreliability
assumes that using two error-prone input modes such as speech
and handwriting recognition results in greater unreliability
however, increased robustness due to mutual disambiguation
between modalities can be observed
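The mutual disambiguation argument can be made concrete with a toy example: two error-prone recognisers each produce an n-best hypothesis list, and multiplying their (assumed independent) confidence scores can promote a hypothesis that neither recogniser ranked first. All names and scores below are illustrative:

```python
# Illustrative sketch: combine the n-best lists of two error-prone
# recognisers by multiplying their (assumed independent) confidence
# scores; the joint best can differ from each unimodal best.
def mutual_disambiguation(speech_nbest, handwriting_nbest):
    joint = {}
    for word_s, p_s in speech_nbest:
        for word_h, p_h in handwriting_nbest:
            if word_s == word_h:   # hypotheses agree on the word
                joint[word_s] = p_s * p_h
    return max(joint, key=joint.get) if joint else None

speech = [("ship", 0.5), ("chip", 0.4)]    # speech recogniser n-best
writing = [("chip", 0.5), ("ship", 0.2)]   # handwriting recogniser n-best
print(mutual_disambiguation(speech, writing))
# prints "chip": 0.4 * 0.5 = 0.20 beats 0.5 * 0.2 = 0.10
```

Here the speech recogniser alone would have chosen "ship", but the handwriting evidence pulls the combined decision towards "chip", illustrating how two unreliable modes can disambiguate each other.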
8. All users’ multimodal commands are integrated in a
uniform way
different users use different integration patterns (e.g. parallel vs.
sequential)
a system that can detect and adapt to a user's dominant
integration pattern might lead to increased recognition rates
Ten Myths of Multimodal Interaction ...
9. Different input modes are capable of transmitting
comparable content
some modalities can be compared (e.g. speech and writing)
however, each modality has its own very distinctive gestures
- e.g. gaze vs. a pointing device such as the Wiimote
10. Enhanced efficiency is the main advantage of
multimodal systems
often there is only a minimal increase in efficiency
however, there are many other advantages
- fewer errors, enhanced usability, flexibility, mutual disambiguation, …
Formal Models for Combining Modalities
Formalisation of multimodal human-machine interaction
Conceptualise the different possible relationships
between input and output modalities
Two conceptual spaces
CASE: use and combination of modalities on the system side
(fusion engine)
CARE: possible combination of modalities at the user level
CASE Model
3 dimensions in the
CASE design space
levels of abstraction
use of modalities
fusion
Nigay and Coutaz, 1993
CASE Model: Levels of Abstraction
Data received from a
device can be processed
at multiple levels of
abstraction
Speech analysis example
signal (data) level
phonetic (feature) level
semantic level
A multimodal system is
always classified under
the meaning category
Nigay and Coutaz, 1993
CASE Model: Use of Modalities
The use of modalities
expresses the temporal
availability of multiple
modalities
parallel use allows the user
to employ multiple modalities
simultaneously
sequential use forces the
user to use the modalities
one after another
Nigay and Coutaz, 1993
CASE Model: Fusion
Possible combination of
different types of data
combined means that there
is fusion
independent means that
there is an absence of fusion
Nigay and Coutaz, 1993
CASE Model: Classification
Concurrent
two distinctive tasks
executed in parallel
e.g. draw a circle and draw
a square next to it
Alternate
task with temporal alternation
of modalities
e.g. "Draw a circle there"
followed by pointing to the
location
Nigay and Coutaz, 1993
CASE Model: Classification
Synergistic
task using several coreferent
modalities in parallel
e.g. "Draw a circle here" with
concurrent pointing to the
location
synergistic multimodal
systems are the ultimate goal
Exclusive
one task after the other with
one modality at a time (no
coreference)
Nigay and Coutaz, 1993
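The four CASE classes can be read as a 2x2 decision over two of the design-space dimensions: parallel vs. sequential use of modalities, and combined (fused) vs. independent data. A minimal sketch; the function and its boolean parameters are an illustrative simplification, not part of the model's formalisation:

```python
# Sketch of the CASE classification as a 2x2 decision over two
# dimensions of the design space of Nigay and Coutaz, 1993:
# parallel vs. sequential use, and combined (fused) vs. independent data.
def classify_case(parallel_use, fused):
    if parallel_use and fused:
        return "Synergistic"  # "Draw a circle here" + concurrent pointing
    if parallel_use:
        return "Concurrent"   # two distinctive tasks executed in parallel
    if fused:
        return "Alternate"    # coreferent modalities, one after the other
    return "Exclusive"        # one modality at a time, no coreference

print(classify_case(parallel_use=True, fused=True))  # Synergistic
```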
CARE Properties
Four properties to characterise and assess aspects of
multimodal interaction in terms of the combination of
modalities at the user level [Coutaz et al., 1995]
Complementarity
Assignment
Redundancy
Equivalence
Complementarity
Multiple modalities are to be used within a temporal
window in order to reach a given state
No single modality on its own is sufficient to reach the
state
Integration (fusion) can happen sequentially or in parallel
Example
“Please show me this list”
and <pointing at ‘list of flights’ label>
Assignment
Only one modality can be used to reach a given state
Absence of choice
Example
Moving the mouse to change the position of a window
(considering that the mouse is the only modality available for that
operation)
Redundancy
If multiple modalities have the same expressive power
(equivalent) and if they are used within the same
temporal window
Repetitive behaviour without increasing the expressive
power
Integration (fusion) can happen sequentially or in parallel
Example
“Could you show me the list of flights?”
and <click on ‘list of flights’ button>
Equivalence
Necessary and sufficient to use any single one of the
available modalities
Choice between multiple modalities
No temporal constraints
Example
“Could you show me the list of flights?”
or <click on ‘list of flights’ button>
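Taken together, the four CARE properties can be sketched as a small decision procedure over three predicates: how many modalities can reach the state, whether they have equivalent expressive power, and whether they are used within the same temporal window. The predicates are an assumption made for illustration, not part of the formal definition in [Coutaz et al., 1995]:

```python
# Illustrative sketch classifying the CARE relation between the
# modalities available to reach a given state; the three predicates
# are a simplification assumed for this example.
def classify_care(n_usable, equivalent_power, used_together):
    if n_usable == 1:
        return "Assignment"        # no choice of modality
    if not used_together:
        return "Equivalence"       # any single modality is sufficient
    if equivalent_power:
        return "Redundancy"        # same content on several modalities
    return "Complementarity"       # no single modality is sufficient

# Speech and pointing combined within one temporal window:
print(classify_care(n_usable=2, equivalent_power=False,
                    used_together=True))  # Complementarity
```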
Example for Equivalence: EdFest 2004
The EdFest 2004 prototype
consisted of a digital pen and
paper interface in combination
with speech input and voice
output [Belotti et al., 2005]
Users had the choice to execute
some commands either via
pen input or through speech
input
Multimodal Interaction Frameworks
Software frameworks for the creation of multimodal
interfaces
Squidy
OpenInterface
HephaisTK
Mudra
Three families of frameworks
stream based
- Squidy, OpenInterface
event based
- HephaisTK
hybrid
- Mudra
Squidy
Integrated tool with a GUI
Interfaces are defined by
pipelining components
and filters
Online knowledge base of
device components and
filters
Semantic zooming
e.g. zooming on a filter
shows a graph of the data
passing through the filter
http://www.squidy-lib.de
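The pipelining idea behind stream-based frameworks such as Squidy or OpenInterface can be sketched in a few lines: device components emit a stream of samples that passes through a chain of filters. The Pipeline class and the two filters below are illustrative assumptions, not the real Squidy API:

```python
# Minimal sketch of a stream-based pipeline in the style of Squidy or
# OpenInterface; class and filter names are illustrative, not the real API.
class Pipeline:
    def __init__(self):
        self.stages = []

    def add(self, stage):
        self.stages.append(stage)
        return self                # allow chaining of .add() calls

    def push(self, sample):
        for stage in self.stages:
            sample = stage(sample)
            if sample is None:     # a filter may drop invalid samples
                return None
        return sample

smooth = lambda p: (round(p[0]), round(p[1]))       # smoothing filter
clamp = lambda p: p if 0 <= p[0] <= 1920 else None  # range filter

pipe = Pipeline().add(smooth).add(clamp)
print(pipe.push((120.4, 80.6)))  # (120, 81)
```

Semantic zooming in Squidy then amounts to inspecting what flows between two such stages, e.g. visualising the samples a filter receives and emits.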
OpenInterface Framework
European research project
(2006-2009)
Component-based
framework
Online component library
with components for
numerous modalities and
devices
Design an application by
pipelining components
http://www.openinterface.org
HephaisTK
Software-agent-based
framework
Focus on events
Different fusion algorithms
Description of the
dialogue via the SMUIML
language
http://www.hephaistk.org
UsiXML
User Interface eXtended
Markup Language
XML-based declarative
User Interface Description
Language (UIDL) to
describe abstract interfaces
can later be mapped to
different physical
characteristics (including
multimodal user interfaces)
Stanciulescu et al., 2005
Mudra
Fusion across different
levels of abstraction (data,
feature and decision)
via fact base
Interactions described via
a declarative rule-based
language
Rapid prototyping
simple integration of new
input devices
integration of external
gesture recognisers
Hoste et al., 2011
References
R.A. Bolt, “Put-that-there”: Voice and Gesture at
the Graphics Interface, In Proceedings of SIGGRAPH 1980,
7th Annual Conference on Computer Graphics and
Interactive Techniques, Seattle, USA, July 1980
http://dx.doi.org/10.1145/965105.807503
L. Nigay and J. Coutaz, A Design Space for Multimodal
Systems: Concurrent Processing and Data Fusion, In
Proceedings of CHI 1993, 3rd International Conference
on Human Factors in Computing Systems, Amsterdam,
The Netherlands, April 1993
http://dx.doi.org/10.1145/169059.169143
References …
J. Coutaz et al., Four Easy Pieces for
Assessing the Usability of Multimodal Interaction: The
CARE Properties, In Proceedings of Interact 1995, 5th
International Conference on Human-Computer
Interaction, Lillehammer, Norway, June 1995
http://dx.doi.org/10.1007/978-1-5041-2896-4_19
B. Dumas, D. Lalanne and S. Oviatt, Multimodal
Interfaces: A Survey of Principles, Models and
Frameworks, Human Machine Interaction, LNCS 5440,
2009
http://dx.doi.org/10.1007/978-3-642-00437-7_1
References …
S. Oviatt, Ten Myths of Multimodal Interaction,
Communications of the ACM 42(11), November 1999
http://dx.doi.org/10.1145/319382.319398
Put-that-there demo
https://www.youtube.com/watch?v=RyBEUyEtxQo
R. Belotti, C. Decurtins, M.C. Norrie, B. Signer and
L. Vukelja, Experimental Platform for Mobile Information
Systems, Proceedings of MobiCom 2005, 11th Annual
International Conference on Mobile Computing and
Networking, Cologne, Germany, August 2005
http://dx.doi.org/10.1145/1080829.1080856
References …
L. Hoste and B. Signer, SpeeG2: A Speech-
and Gesture-based Interface for Efficient Controller-free
Text Entry, Proceedings of ICMI 2013, 15th International
Conference on Multimodal Interaction, Sydney, Australia,
December 2013
http://beatsigner.com/publications/hoste_ICMI2013.pdf
L. Hoste, B. Dumas and B. Signer, Mudra: A Unified
Multimodal Interaction Framework, Proceedings of
ICMI 2011, 13th International Conference on Multimodal
Interaction, Alicante, Spain, November 2011
http://beatsigner.com/publications/hoste_ICMI2011.pdf
References …
L. Hoste, A Declarative Approach for Engineering
Multimodal Interaction, PhD thesis, Vrije Universiteit
Brussel (VUB), Brussels, Belgium, June 2015
http://beatsigner.com/theses/PhdThesisLodeHoste.pdf
A. Stanciulescu, Q. Limbourg, J. Vanderdonckt,
B. Michotte and F. Montero, A Transformational
Approach for Multimodal Web User Interfaces Based on
UsiXML, Proceedings of ICMI 2005, 7th International
Conference on Multimodal Interfaces, Trento, Italy,
October 2005
http://dx.doi.org/10.1145/1088463.1088508
References …
SpeeG2 demo
https://www.youtube.com/watch?v=ItKySNv8l90