NATURAL INTERACTION AT WORK
T. M. Alisi
Univ. of Florence
A. Del Bimbo
Univ. of Florence
This paper presents the media interaction systems implemented at the Mont'Alfonso Fortress,
close to Castelnuovo Garfagnana (Lucca). The stronghold was built at the end of the 16th
century and, after being abandoned for decades, recently underwent a complete
restoration. A multimedia environment based on large projections was developed in one of the
buildings inside the fortress. Users interact through natural body gestures: the multimedia
contents of two tables are driven by the users' hands, while the projections on the walls and
floor are activated by motion detection. All sensing is performed with near-IR cameras.
The fortress of Mont'Alfonso in Castelnuovo Garfagnana is a cultural property of the
Province of Lucca, which purchased it in 1980 in a state of semi-abandonment. It was built
between 1579 and 1586 to a design by Marc'Antonio Pasi from Carpi, as a last defensive
stronghold of the Duchy of Ferrara to guard the border with the territory near Lucca. In 2000
the preliminary studies for the restoration of the Mont'Alfonso complex began. The resulting
plan, named ‘Masterplan’, was approved in 2001 for the urban and cultural revival of the
fortress and the territory of Garfagnana. It included the design of an interactive system to
inform visitors about the history of the fortress, its restoration and the natural
environment surrounding it.
INTERACTIVE SYSTEMS AND SENSING ARCHITECTURE
The natural interactive environment installed at the Mont’Alfonso Fortress is composed of
two systems: a system for interactive walls/floor and a system for interactive tables. Computer
vision modules drive both systems.
Users can interact with these systems through gestures and simple body motion, according to
the Natural Human-Computer Interaction principle.
Two large interactive walls are located in two opposite corners of the multimedia room. A
floor projection welcomes visitors entering the room. Three ceiling-mounted video projectors
display contents on the wall surfaces and on the floor. Ceiling-mounted cameras provide
the video input for each of the motion detection modules.
The computer vision modules trigger the video contents on the video-streaming server, to
which all the video projectors are connected. Each computer vision module analyzes the
motion inside the room and determines which part of the observed zone has been triggered. The
system then sends a play command to the video server, choosing the projection that has to be
The Media Integration and Communication Center unveiled its natural interaction
bookshop inside Palazzo Medici Riccardi in Florence in 2008. The Mont’Alfonso
interactive tables represent a technological and design enhancement of that installation. A
more compact design was developed to fit the multimedia environment of the Mont’Alfonso
Fortress and to be easy to use, especially for children.
The sensing architecture consists of a camera that captures the screen surface from behind the
table: hand gestures are easily detected thanks to diffuse infrared illumination.
This setup is also simple to implement because it uses infrared illumination parallel to
the table’s surface instead of the FTIR setup [HAN2005].
The sensing module inside the table exploits simple computer vision techniques: for each
video frame (captured at 30 fps at a resolution of 320x240 pixels) only simple image
processing operations are performed. The sequence is: background
subtraction, luminance thresholding and noise removal (erode and dilate), after which the
processed image is passed to a connected-components analysis module. The resulting
information is translated into a TCP packet describing blob positions and tracking, then sent
to the interface to drive interaction commands and feedback. All communication between the
sensing module and the interface uses the Open Sound Control protocol.
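The processing sequence described above can be sketched as follows. This is a minimal illustration in pure Python, not the code running in the tables: the function names, the threshold value of 40, the 3x3 structuring element, the `/blob` address and the argument layout of the message are all assumptions made for the example. The message encoding follows the Open Sound Control 1.0 rules for strings and big-endian arguments; the transport layer is left out.

```python
import struct
from collections import deque

def threshold_foreground(frame, background, thresh):
    """Background subtraction followed by a luminance threshold -> binary mask."""
    return [[1 if abs(p - b) > thresh else 0 for p, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]

def _morph(mask, keep):
    """Apply a 3x3 operator; pixels outside the image are simply ignored."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [mask[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = 1 if keep(window) else 0
    return out

def erode(mask):
    return _morph(mask, all)   # keep a pixel only if its whole window is set

def dilate(mask):
    return _morph(mask, any)   # set a pixel if any pixel in its window is set

def connected_components(mask):
    """Label 4-connected regions; return (centroid_x, centroid_y, area) per blob."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                queue, pixels = deque([(x, y)]), []
                seen[y][x] = True
                while queue:
                    cx, cy = queue.popleft()
                    pixels.append((cx, cy))
                    for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                        if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
                area = len(pixels)
                blobs.append((sum(p[0] for p in pixels) / area,
                              sum(p[1] for p in pixels) / area, area))
    return blobs

def detect_blobs(frame, background, thresh=40):
    """Full sequence: subtract, threshold, erode, dilate, label."""
    mask = threshold_foreground(frame, background, thresh)
    mask = dilate(erode(mask))  # erosion removes isolated noise, dilation restores size
    return connected_components(mask)

def _osc_string(text):
    """OSC string: ASCII bytes, null-terminated, padded to a multiple of 4 bytes."""
    data = text.encode("ascii") + b"\x00"
    return data + b"\x00" * (-len(data) % 4)

def osc_blob_message(blob_id, x, y, address="/blob"):
    """Encode one blob as an OSC message with an int32 id and two float32 coordinates."""
    return _osc_string(address) + _osc_string(",iff") + struct.pack(">iff", blob_id, x, y)
```

Calling `detect_blobs(frame, background)` on two equally sized grayscale frames (lists of rows) returns one tuple per detected hand region, and `osc_blob_message(0, x, y)` turns a centroid into bytes ready to be written to a socket. A deployed system running at 30 fps would normally rely on an optimized image processing library rather than nested Python loops.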
NATURAL HUMAN-COMPUTER INTERACTION
As described above, the computer vision modules send information to interfaces and triggers
with the goal of creating an interactive environment that is comfortable for users. This
reflects the aim of Natural Human-Computer Interaction (NHCI) research, which is to create
new interactive frameworks that integrate human language and behavior into technological
applications, focusing on the way people live, work, play and interact with each other. Such
frameworks have to be easy to use, intuitive, entertaining and non-intrusive. The design of
natural interaction systems focuses on recognizing innate and instinctive human expressions
in relation to some object, and on returning to the user a corresponding feedback that is
both expected and inspiring. The techniques proposed to perform such recognition are often
referred to as multi-modal interaction, focusing on how machines can understand commands
coming from different channels of human communication.
A natural interaction system can be modelled as the combination of two modules: the sensing
subsystem, which gathers sensor data about user expressions and behaviour, and the
presentation module, which conducts the dialogue with the user, orchestrating the output of
different kinds of actuators (graphics display, audio, haptics). All of the technology and the
intelligence are built into the digital artefacts, and the user is not asked to use external
devices, wear anything, or learn any commands or procedures. An interesting challenge for
NHCI is therefore to make systems self-explanatory by working on their ‘affordance’ and
introducing simple and intuitive interaction languages. The human expressions that can be
used are those considered innate, meaning that they do not have to be learned. These include
vocal expressions and all the gestures humans use to explore the nearby space or their
immediate surroundings with their bodies, such as touching, pointing and stepping into zones.
These direct actions are a clear sign of interest and require an immediate reaction from
Museums and exhibitions are often just a collection of objects, standing mute in front of
visitors. In many cases, objects are accompanied by textual descriptions that are usually too
short or too long to be useful for the visitor. In the last decade, progress in multimedia has
allowed for new, experimental forms of communication (using computer technologies) in public
spaces. With our system we try to make access to information intuitive and attractive. We
exploit computer vision analysis to recognize and analyze users' behavior: bare-hand
gestures on interactive tabletops and body motion close to interactive walls. Users can interact
with an interactive tabletop just by using their hands for touching, pointing and selecting
digital contents (videos and pictures). Similarly, when users approach an interactive wall,
the system makes it react by activating multimedia contents.
The interfaces were developed to present the contents of a database of information
about the restoration of the fortress and its surroundings. The two systems are an
interface for an interactive table and an interactive video projection application, both
located in the multimedia room of the Villino Liberty within the fortress.
The interface for the interactive table was developed in ActionScript 3.0 using Adobe Flash
CS3. It takes the form of a carousel of multimedia cards that illustrate and summarize the
contents of the database available at the Mediateca of the fortress.
A simple and usable interface design was chosen, so that users can navigate easily through
the content and find the information they need without having to ask too many
questions about how the interface works. The focus is on the user, with his needs and his
On the table there are thirteen content cards and an active area for the language choice
(Italian and English). The user can scroll through the cards and rotate the carousel in either
direction, depending on the area of the screen where the hand is placed: each card rotates
inside the carousel along with the others towards the centre of the table, taking the leading
position. The user can either select the card in the foreground, by holding a hand
above it for a short period of time, or choose a different card to make the rotation start
again. If the user chooses the centre card, it enlarges and comes forward for viewing. The
cards are divided into two main types: cards with audio and video contributions, and cards
containing a picture gallery. The cards close automatically once their content has been
consulted (end of the video or picture gallery), but they still respond to the user, who can
simply touch the table to rearrange the carousel and continue the interaction. The interface
is not designed as a multi-touch interface: it reacts to only one ‘stimulus’ (blob) at a time.
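The selection behaviour described above (holding the hand over the foreground card for a short period) can be sketched as a small dwell timer. This is an illustrative Python sketch, not the ActionScript code of the interface: the class name `DwellSelector`, the card identifiers and the 1.5-second dwell time are assumptions, since the text only speaks of "a short period of time".

```python
class DwellSelector:
    """Report a selection when a single blob rests on the same card long enough."""

    def __init__(self, dwell_seconds=1.5):
        self.dwell_seconds = dwell_seconds  # hypothetical value, not from the paper
        self._target = None                 # card currently under the hand, if any
        self._since = None                  # time at which the hand reached that card
        self._fired = False                 # True once this dwell has been reported

    def update(self, target, now):
        """Feed the card under the hand (or None) at time `now`, in seconds.

        Returns the card identifier once the dwell time has elapsed, else None.
        """
        if target != self._target:
            # The hand moved to another card or left the table: restart the timer.
            self._target, self._since, self._fired = target, now, False
            return None
        if (target is not None and not self._fired
                and now - self._since >= self.dwell_seconds):
            self._fired = True              # report each dwell only once
            return target
        return None
```

Because the interface reacts to one blob at a time, a caller would pass only the card under a single tracked blob on each frame, for example `selector.update("card_3", timestamp)`, and open the card when the call returns its identifier.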
The interactive video projection system projects videos onto two vertical
displays at the corners of the multimedia room and onto the floor. A video server handles
requests for playback and pause of the videos. Whenever a user is captured by the cameras, the
system detects his presence and starts one of the video projections.
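One simple way to decide which projection to start is frame differencing inside predefined zones of the camera image. The sketch below is an assumption about how such a trigger could work, not a description of the installed module: the zone names, the rectangle layout and both thresholds are hypothetical, and the actual command format of the video server is not given in the text.

```python
def motion_ratio(prev, curr, zone, diff_thresh=25):
    """Fraction of pixels in `zone` whose luminance changed by more than diff_thresh.

    `zone` is (x0, y0, x1, y1) with x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = zone
    changed = sum(1
                  for y in range(y0, y1)
                  for x in range(x0, x1)
                  if abs(curr[y][x] - prev[y][x]) > diff_thresh)
    return changed / ((x1 - x0) * (y1 - y0))

def triggered_zones(prev, curr, zones, min_ratio=0.1):
    """Return the names of the zones in which enough motion was observed."""
    return [name for name, zone in zones.items()
            if motion_ratio(prev, curr, zone) >= min_ratio]
```

A caller would compare consecutive frames from a ceiling camera and, for each returned zone name, ask the video server to play the corresponding projection.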
The contents available in the interactive environment were developed by a
communication and marketing agency as part of the recovery and revitalization of the fortress
of Mont'Alfonso undertaken by the Province of Lucca. The materials, mainly video and images,
cover the upgrading and restructuring of the architectural heritage of the
fortress and the development, within it, of teaching activities and services. They also
analyze and explain the aspects that characterize the territory of the Garfagnana: its
history, typical products and traditions. Particular attention is given to archaeology,
mycology, energy resources and cartography.
The systems installed in the fortress were designed with the intention of
maximizing the user experience. User experience design, most often abbreviated UX but
sometimes UE, is a term used to describe the overarching experience a person has as a result
of their interactions with a particular product or service, its delivery, and related artefacts,
according to their design. As with the related term User Interface Design, the prefix "User"
associates it primarily (though not exclusively) with digital media, especially interactive
The system is currently up and running at the fortress, and the inauguration is forthcoming at
the time of writing. Only time and usage will confirm whether the premises of the
project were well founded and whether the system truly offers a natural interactive experience.
As a tangible result, the multimedia room at the fortress is now a working interactive
environment developed through the effort of different partners: MICC, University of Florence,
for the interactive systems; Studi Uniti, Firenze, for the communication project; Provincia di
Lucca for the funding and location; and many others for the content shown in the
multimedia installations. The installed systems reflect the hard work of integrating
different skills and competences with the common goal of realizing a project that is
both a system designed for people interacting with multimedia and an effective tool for
strengthening local governance. The system shows how public investment, through
the realization of a space designed for people and filled with state-of-the-art technology and
contents, can reduce the gap between people and institutions.
REFERENCES
I. Marsic, A. Medl, and J. Flanagan, “Natural communication with information systems”,
Rutgers Univ., Piscataway, NJ, USA, Aug. 2000.
T. M. Alisi, A. Del Bimbo, and A. Valli, “Natural Interfaces to
Enhance Visitors’ Experiences”, IEEE MultiMedia, vol. 12, no. 3, July 2005.
C. H. Bischof and G. M. Shroff, “On Updating Signal Subspaces”, IEEE Trans. Signal
Processing, vol. 40, no. 1, pp. 96-105, Jan. 1992.
R. A. Lincoln and K. Yao, “Efficient Systolic Kalman Filtering Design by Dependence
Graph Mapping”, in VLSI Signal Processing, III, IEEE Press, R. W. Brodersen and H. S.
Moscovitz Eds., 1988, pp. 396-410.
[HAN2005] J. Y. Han, “Low-Cost Multi-Touch Sensing through Frustrated Total
Internal Reflection”, in Proceedings of the 18th Annual ACM Symposium on User Interface
Software and Technology, 2005.