§§بسم ا الرحمن الرحيم
College of Science & Technology
Department of Computer
Mining Multi Media Data
Drear Numan Bader
2010 ,الثنين, 01 مايو
Mining Multi Media Data
What is mining multi media data ?
Is multimedia data mining just a new combination of buzz-words or is it a new interdisciplinary field which not only
incorporates methods and techniques from the relevant disciplines, but is also capable to produce new methodologies and
influence related interdisciplinary fields. Rather than making an overview of existing methods and techniques, or presenting a
particular technique, this paper aims to present some facets of multi media data mining in the context of its potential to
influence some relatively new interdisciplinary domains.
Data Mining, also known as Knowledge Discovery from Multimedia, is not Information Retrieval from multimedia, contrary to
popular belief, but the extraction of interesting patterns and new facts from multimedia objects. Knowledge discovery in
multimedia databases is focused on the synergy between two fields: Knowledge Discovery and Multimedia Databases .
Knowledge discovery and data mining, which consist in extracting valuable and relevant knowledge from large volumes of data,
have received much attention these last years. The approaches used for knowledge discovery are non-trivial and often domain
specific, depending on the canonical mining primitives. The data patterns discovered are typically used in decision-making
whether in business, in scientific research or other.
While significant research has been done on knowledge discovery from large corpora, most of the approaches are related to
numerical transactional data such as market-basket analysis, web activities, etc….. , thus very little has been achieved on mining
multimedia data probably due to the complexity of multimedia and multimedia repositories. The aim of this workshop is to
bring together experts in digital media content analysis, state- of-art data mining and knowledge discovery in multimedia
database systems, knowledge engineers and domain experts from diverse applied disciplines with potential in multimedia data
mining. There is a massive increase of information available on electronic networks. This profusion of resources on the World-
Wide Web gave rise to considerable interest in the research community. Traditional information retrieval techniques have been
applied to the document collection on the Internet, and a panoply of search engines and tools have been proposed and
implemented. However, the electiveness of these tools is not satisfactory. None of them is capable of discovering knowledge
from the Internet. The Web is still an alarming rate.
In a recent report on the future of database research known as the Sylmar Report, it has been predicted that in ten years from
now , the majority of human information will be available on the World-Wide Web, and it has been observed that the database
research community has contributed little to the Web thus far. In this work we propose a structure, called a Virtual Web View,
on top of the existing Web. Through this virtual view.
Data mining and information retrieval from the Web by drilling through the visualized results.
The preliminary experiments demonstrate that multi media data mining may lead to interesting and fruitful knowledge
discoveries in multimedia databases. Multimedia and digital media, and data mining are perhaps among the top ten most
overused terms in the last decade. The field of multimedia and digital media is at the intersection of several major fields,
including computing, telecommunications, desktop publishing, digital arts, the television/movie/game/broadcasting industry, audio-
video electronics. The advent of Internet and low-cost digital audio/video sensors accelerated the development of distributed
multimedia systems and on-line multimedia communication. The list of their application spans from distance-learning , digital
libraries, and home entertainment to fine arts, fundamental and applied science and research. As a result there is some multiplicity of
definitions and fluctuations in terminology. In this paper digital (multi)media denotes computer-mediated and controlled integration
of numeric, text, graphics and other geometry representations (CAD drawings, 3D models, virtual universes), images , animation,
sound , video and any other type of information medium which can be represented, stored, processed and transmitted over the
network in digital form.
This report presents a brief overview of multimedia data mining and the corresponding workshop series at ACM SIGKDD
conference series on data mining and knowledge discovery. It summarizes the presentations , conclusions and directions for
future work that were discussed during the 3 rd edition of the International Workshop on Multimedia Data Mining.
Multimedia data mining, knowledge discovery, multimedia databases, image content mining, digital media, sound analysis
,and the video analysis .
Digital multimedia differs from previous forms of combined media in that the bits that represent text, images, animations, and
audio, video and other signals can be treated as data by computer programs. One facet of this diverse data in terms of
underlying models and formats is that it is synchronized and integrated, hence, can be treated as integral data records. Such
.records can be found in a number of areas of human endeavor
,Modern medicine generates huge amounts of such digital data. For example, a medical data record may include SPECT images
DNA micro array data, ECG signals, clinical measurements, like blood pressure and cholesterol levels, and the description of the
diagnosis given by the physician interpretation of all these data. Another example is architecture design and related Architecture ,
Engineering and Construction (AEC) industry. An architectural design data record may include CAD files with floor plans and
data for generating 3D-models, links to building component data bases and the instances of this components, stored in
corresponding formats , images of the environment, text descriptions and drawings from the initial and evolved design
,requirements, financial and personnel data, and other data related to the other disciplines involved in the AEC industry
.e.g. civil and electrical engineering
Virtual communities (in the broad sense of this word, which includes any communities mediated by digital technologies ) are
another example where generated data constitutes an integral data record. Such data may include data about member profiles, the
content generated by the virtual community, and communication data in different formats, including email, chat records, SMS
.messages, video conferencing records
No t all multimedia data is so diverse. An example of less diverse, but larger in terms of the collected amount, is the data
generated by video surveillance systems, where each integral data record roughly consists of a set of time-stamped images – the
video frames. In any case, the collection of such integral data records constitutes a multimedia data set . The challenge of
extracting meaningful patterns from such data sets has lead to the research and development in the area of multimedia data
This is a challenging field due to the non-structured nature of multimedia data. Such ubiquitous data is required, if not essential, in
many applications. Multimedia databases are widespread and multimedia data sets are extremely large. There are tools for
managing and searching within such collections, but the need for tools to extract hidden useful knowledge embedded within
multimedia data is becoming critical for many decision-making applications. The tools needed today are tools for discovering
relationships between data items or segments within images, classifying images based on their content, extracting patterns from
sound, categorizing speech and music, recognizing and tracking objects in video streams. relations between different multi media
components , result of the rapid progress in these fields is the number of challenges for computer systems research and
:development, and cross including
;enlarged data sets with variety of formats and structures*
;variety of models for integration of media elements and components*
;demands on computational efficiency of media analysis and retrieval algorithms*
Multimedia (or digital media) representations comprise a collection of domain descriptions in "native" for the there variety in•
terms of the degree to which representation is structured and to what degree the specification of this representation complies with
some formal models . The term "hypermedia" is added to stress the presence of links in the digital media under consideration.
. Of knowledge representation and specification
MDM/KDD WORKSHOP SERIES : The workshop series was initiated in early 1998, however, the first successful event was
staged at KDD2000. Since the beginning of the century, the MDM/KDD workshops at the ACM SIGKDD forums
(MDM/KDD2000, MDM/KDD2001, MDM/KDD2002 in conjunction with KDD2000 (in Boston), KDD2001 (in San Francisco) and
KDD2003 (in Edmonton) , respectively) have been bringing together cross-disciplinary experts in analysis of digital multimedia
content, multimedia databases, spatial data analysis, analysis of data in collaborative virtual environments, and knowledge
engineers and domain experts from different applied disciplines related to multimedia data mining. MDD/KDD2000 revealed a
variety of topics that come under the umbrella of multimedia data mining. Accepted papers were presented in three sessions :
.[Mining spatial multimedia data ; Mining audio data and multimedia support ; and Mining image and video data [1
The presentations at the workshop showed that many researchers and developers in the areas of multimedia information systems
and digital media turn to data mining methods for techniques that can improve indexing and retrieval in digital media . The
workshop ended with a discussion on the scope of multimedia data mining and the need to establish an annual forum. When
text, maps, video, sound , and images typically fall into the realm of multimedia, research fields such as spatial data mining
and text mining were already known active disciplines. There was a consensus that multimedia data mining is emerging as its
own distinct area of research and development . The work in the area was expected to focus on algorithms and methods for
from images , sound and video streams. Workshop participants identified that there is a need for
A_ development and application of specific methods, techniques and tools for multimedia data mining;
B_ frameworks that provide consistent methodology for multimedia data analysis and integration of discovered knowledge(iii)
. back system
SIGKDD Explorations .Volume 4, Issue 2 – page 119can be utilized. These conclusions and to some extent decisions influenced the
selection of the papers for presentation at MDM/KDD2001, grouped in the following streams: Frameworks for multimedia mining (2
papers); Multimedia mining for information retrieval (4 papers); and Applications of multimedia mining (6 papers) . The workshop
discussion revised the scope of multimedia data mining outlined during the previous workshop, clearly identifying the need to approach
multimedia data as a “single unit” rather than ignoring some layers in favor of others. The participants acknowledged the high potential of
multimedia data mining methods in medical domains, design and creative industries. There was an agreement that the research and
development in multimedia mining should be extended in the area of collaborative virtual environments, 3D virtual reality systems,
THIS YEAR WORKSHOP :
The 3 rd International Workshop on Multimedia Data mining attempted to address the above-mentioned issues looking at
specific issues in pattern extraction from image data, sound, and video; suitable multimedia representations and formats that can
assist multimedia data mining ; and advanced architectures of multimedia data mining systems. Papers , selected for presentation
Domain and e-business technologies. provided an interesting coverage of these issues and some technical solutions. They were
grouped in the following three streams:
*Frameworks for Multimedia Data Mining.
* Multimedia Data Mining Methods and Algorithms ;
*Applications of Multimedia Data Mining including applications in medical image analysis and applications based on
This mini-symposium will try to address this and related issues concerning large multimedia databases. There will be four
focused presentation by experts in content based retrieval of images, video and speech, followed by a panel discussion on
the multimedia .The hypermedia representations comprise a collection of case descriptions represented as text in free or table
format and other multimedia data, such as CAD drawings, images, video, sound, etc. Another characteristic of hypermedia is the
use of links , where the links can connect information within a case , between different cases , or links to data that lies
outside the case library. Usually , the representation of the cases in the library is organized according to some conceptual model
of the domain .
General Area Of Research
In Multimedia Data Mining Framework for Raw Video Sequences by (Department of Computer Science and Engineering,
University of Texas at Arlington , USA ) presented a general framework for real time video data mining from “raw videos” (e.g.
traffic videos, surveillance videos). In proposed technique the first step for mining raw video data was grouping the input frames
into a set of basic units – segments , that became the organizational blocks of the video database for video data mining . Then ,
based on some features ( motion , object , colors , etc.) extracted from each segment , the segments could be clustered into similar
groups for detecting interesting patterns. The focus within the presented framework was on the motion as a feature, and how to
compute and represent it for further processing . The multi-level hierarchical segment clustering procedure used category and
motion . Preliminary experimental results were promising. Readers may see also the framework for multimedia data mining from
traffic video sequences presented by a joint research team from Florida International University at an earlier workshop  . In
An Innovative Concept for Image Information Mining , (Remote Sensing Technology Institute – IMF , German Aerospace Center –
DLR , Germany ) and Klaus ( Computer Vision Lab , ETH , Switzerland ) introduced the concept of ‘image information mining’
and discussed a system that implements this concept . The approach is based on modeling the causalities , which link the image
–signal contents to the objects and structures within the interest of the users . The basic idea is to split the information
: representation into four steps
;image feature extraction using a library of algorithms to obtain a quasi-complete signal description (1)
;unsupervised grouping in a large number of clusters to be suitable for a large set of tasks (2)
;data reduction by parametric modeling of the clusters , and (3)
supervised learning of user semantics , that is the level where , Instead of being programmed , the system is trained by a set (4)
Through these steps the links from image content to user semantics are created. Step 4 employs advanced visualization tools .
At the time of the workshop the system has been prototyped for inclusion in a new generation of intelligent satellite ground
. segment systems, value - adding tools in the area of geo-information, and several applications in medicine and biometrics
. Proceedings 2nd Int. Workshop on Multimedia Data Mining MDM/KDD2001
Multimedia Data Mining Methods and Algorithms The first presentation in the session , on Multimedia Data Mining Using
P-trees , focused on the specific research work of the Data SURG group at North Dakota State University , USA (co- authored by
A The group came to data mining from the context of evaluation of remotely sensed images for use in agricultural applications.
During that research a spatial data structure was developed that provided an efficient , loss-less , data mining ready
representation of the data. In time the similarity of a sequence of remotely sensed images and some categories of multimedia
data became apparent . The Count Tree (P-tree) technology provided an efficient way to store and mine sets of images and
related data . For most multimedia data mining applications, feature extraction converts the pertinent data to a structured
relational or tabular form , and then the tupelos or rows are analyzed . Proposed (P-tree) data structure design addressed such
data mining setting .
In scale space Exploration for Mining Image Information Content Mariana Ciucu, Patrick Heas, Mihai Datcu (Remote Sensing Technology
Institute – IMF, German Aerospace Center – DLR, Germany) and James C. Tilton (NASA's Goddard Space Flight Center, USA) described
an application of a scale space- clustering algorithm (melting) for exploration of image information content. Clustering by melting considers
the feature space as a thermos-dynamical ensemble and groups the data by minimizing the free energy, having the temperature as a scale
parameter. The implementation of the algorithm has been done through a multi- tree structure. With this structure, they were able to explore
the image content as an information mining function and to obtain a more compact data structure. They had maximum of information in
scale space because they memorized the bifurcation points and the trajectories of the centers points in the scale space. The information
encoded in the tree structure enabled the fast reconstruction and exploration of the data cluster structure and the investigation of
hierarchical sequences of image classifications.
Applications of Multimedia Data Mining
Applications in Medical Image Analysis :
(University of Alberta , Edmonton , Canada) presented in Mammography Classification by an Association Rule-based
Classifier the continuation of their work in classification of medical (mammography) images, discussed at MDM/KDD2001
. They proposed an association rule-based classifier , that was ested on a real dataset of medical images. The proposed approach
included a significant (and important) pre-processing phase, a phase for mining the resulted transactional database, and a final
phase to organize derived association rules in a classification model . Presented experimental results showed that the method
performed well reaching over 80% in accuracy . Authors emphasized the importance of the data-cleaning phase in building
an accurate data mining architecture for image classification . In An Application of Data Mining in Detection of Myocardial
Ischemia Utilizing Pre- and Post-Stress Echo images Pramod K. Singh, Simeon J. Sim off ( University of Technology Sydney,
Australia) and David Feng (University of Sydney , Australia) presented the initial work in the development of a data mining
Approach for computer-assisted detection of myocardial ischemia. Proposed approach is based on automatic identification of
endocrinal and epicedial boundaries of left ventricle (LV) from images generated from echocardiogram data . These images are of
poor quality and similar to the mammography images , discussed in the previous presentation , require significant preprocessing.
The overall algorithm includes LV-wall boundary identification , segmentation and further comparative analysis of wall segments
in pre- and post stress echocardiograms. Applications in Content-Based Multimedia Processing :
In From Data to Insight : The Community of Multimedia AgentsGang Wei, Valery A. Petrushin and Anatole V. Gershman
(Accentor Technology Labs , Chicago , USA) presented a work devoted to creating an open environment for developing , testing,
learning and prototyping multimedia content analysis and annotation methods. The work is part of the multimedia agents project
COMMA. The environment served as a medium for researchers to contribute and share their achievements while protecting their
proprietary techniques . Each method was represented as an agent that could communicate with the other agents registered in the
environment using templates that were based on the Descriptors and Description Schemes in the emerging MPEG-7 standard .
Such approach allowed agents developed by different organizations to operate and communicate with each other seamlessly
regardless of their programming languages and internal architecture . To facilitate the construction of media analysis methods , the
research team provided a development environment. This environment enabled researchers to compare the performance of different
agents and combine them in more powerful and robust system prototypes. The COMMA could also serve as a learning
environment for researchers and students to acquire and test cutting edge multimedia analysis algorithms , and sharing media
agents. Some background to this work can be found in Valerie's previous work on the PERSEUS .
In A Conte t Based Video Description Scheme and Video Database Navigator Sadiye Guler and Ian Pushee (Northrop Grumman
Information Technology , USA ) , introduced a unified framework for a comprehensive video description scheme and presented the
“database navigator” – a browsing and manipulation tool for video data mining. The proposed description scheme was based on the
structure and the semantics of the video, incorporating scene, camera, and object and behavior information pertaining to a large class
of video data . The Database Navigator was designed to exploit both the hierarchical structure of video data , the clips , shots and
objects , as well as the semantic structure , such as scene geometry the object be saviors . The navigator provided means for
visual data mining of multimedia data:
Intuitive presentation , interactive manipulation , ability to visualize the information and data from a number of perspectives , and
to annotate and correlate the data in the video database. The authors demonstrated a working prototype.
In Subjective Interpretation of Complex Data : Requirements for Supporting Kansei Mining Process Nadia Bianchi-Berthouze and
Tomofumi Hayashi (University of Aizu , Aizu Wakamatsu , Japan ) presented the continuation of Nadia’s work on modeling of
visual impression from point of view of multimedia data mining , presented at MDM/KDD2001 . The new work described a
data warehouse for the mining process of multimedia information, where a unique characteristic of the data warehouse was its
ability to store multiple hierarchical descriptions of the multimedia data. Such characteristic is necessary to allow mining
multimedia data , not only at different levels of abstraction , but also according to multiple interpretation of the content . Proposed
framework could be generalized to support the analysis of any type of complex data that relate to subjective cognitive processes,
Would be greatly variable . In User Concept Pattern Discovery Using Relevance Feedback and Multiple Instance Learning for
Content-Based Image Retrieval Xin Huang , Shu-Ching Chen (Florida International
University, USA) , Mei -Ling Shyu (University of Miami , USA) and Chengcui Zhang ( Florida International University , USA)
proposed a multimedia data mining framework that incorporated multiple instance learning into the user relevance feedback to
discover users concept patterns , especially to find where the users most interested region was allocated and how to map the local
feature vector of that region to the high - level concept pattern of users . This underlying mapping could be progressively
discovered through proposed feedback and learning procedure . The role of the user in the retrieval system was in guiding the
mining process according to user’s focus of attention.
Computer Vision : Multimedia Data Mining
In this project , we are interested in the question of finding pairs of near - duplicate images in a massive image database . We
have two primary concerns : robustly determining when two images are near - duplicates of each other ( under a suitable
definition of "near-duplicate" ) and determining very quickly which images in a database are near-duplicates of a query image .
We say that two images are near-duplicates if one is a transformed version of the other . Transformations can be geometric like
rotation, scaling, cropping, or translation or "image-processing" changes like compression and filtering. Note that this problem is
a restricted version of the general image retrieval problem .
The massive image datasets collected by web crawlers contain large number of duplicate and near-duplicate images. The same
image may have been stored at different resolutions or in different formats . It will be very useful to index these datasets and
store either only one copy of the duplicate images or add information about the existence of duplicates to the database . When
the crawler downloads a new image , It can use the index to decide whether to store the image or not . An image
comparison/retrieval scheme like the one we are developing will be very useful in such a situation .
Another scenario where our system can be applied is copyright protection . Watermarking is a commonly used tool to enforce
copyright protection . Watermarks typically add very slight changes to images . When a user wants to determine whether a
particular image is owned by him , he can use our system to quickly find the image in his catalogue ( which may contain a
very large number of images ) that are nearly similar to the suspect image and then apply a sophisticated ( and possibly time-
consuming ) watermark-detection algorithm.
All previous techniques for searching multimedia databases for similar images operate in two stages . They first formulate a
method for comparing two images for similarity . In the second stage , they use the comparison algorithm to detect which
images in a database are similar to the query image.
There are many previous techniques for measuring the similarity between two images . Most of these methods fall into two
classes . One class of techniques computes interesting and useful statistics ( such as the histogram or a filter response ) for the
entire image and then determines the distance between the images as a function of the computed statistics using well-known
metrics such as Euclidean and Manhattan distance. The other class of techniques breaks up the image into regions of various
sizes at different scales and computes statistics for each region. These methods then combine the different pieces of statistics and
compute the distance between the combinations. Alternatively , they treat the statistics as a set or bag and compare the images
based on the overlap between the two sets.
Most methods for detecting images in the database that are similar to a query image check every image in the database for
similarity , a process that is very slow for very large data sets . While techniques do exist that speed up the search by
indexing the database , they fail to restrict the search to a small portion of the database when applied to our problem . The
reason why the entire database or most of the database is searched for many queries is related to the manner in which an
image is represented in terms of its statistics . The standard form Is to combine the statistics into a vector in a high –
dimensional space (tens or hundreds of dimensions ) and to define the “distance” between the images to be an appropriate
measure of distance between the corresponding vectors. The typical measures used for measuring distance are the Euclidean
and Manhattan metrics. Two images are now said to be similar if the distance between the corresponding vectors is small.
Searching for points near a query point in these high-dimensional spaces is an extremely challenging problem. The
performance of all known and popular data structures for this problem degrades considerably for high dimensions . For instance
, it is well known that the K-d tree and its variants perform poorly in dimensions higher than twenty . In such high
dimensions , nearly the entire data structure is searched when it is queried for a near or nearest neighbor . This problem is
compounded when the distance measure used is not a metric (e.g., the K-L divergence) . Standard similarity metrics cannot
take full advantage of recent advances in multidimensional near-neighbor searching such as (LSH) because they do not satisfy
the properties required by LSH.
Our approach has two major strengths . First , we have developed a novel measure of image similarity that is robust to the
common transformations performed on images, e.g., rotation, translation, scaling, cropping, filtering, compression , etc . Second ,
our measure of similarity has the highly desirable property that it satisfies the conditions required by the LSH framework for
near-neighbor searching . The LSH algorithm has the crucial ( mathematically-provable ) property that it finds a near-neighbor by
examining only a very small fraction of the database . Thus , it performs extremely efficiently for databases with millions of
images and even when the entire database does not fit into the memory of the computer . In fact , when the database does not
fit into memory , the LSH algorithm finds near-similar images with a very small number of disk accesses.
Smart Kiosk Project
Human-Computer Interaction Systems Group
Kiosks are free - standing boxes with a display , an interactive user –Interface , and possibly a network connection. They are
designed to provide information, sell products , and entertain . Kiosks are typically located in high - traffic areas like museums
and stores . Existing kiosk applications include information centers at theme parks like Disney's Epcot center , music CD preview
stations , airplane /movie /concert ticket dispensers , and custom greeting card machines . Current kiosk are limited in their
interaction with people to responding to touch screen or keyboard input.
The Smart Kiosk project was initiated at HP to explore how advances in computer technology could be applied to improve
public kiosks . The goal of this project is to develop a "Smart Kiosk" that interacts with people in a natural, intuitive fashion.
The ideal Smart Kiosk will recognize people and track people in its vicinity, communicate with them visually and audibly , and
interact in a friendly , natural manner.
To test kiosk technologies and explore user requirements , we have been building a series of kiosk prototypes . Each prototype
addresses a real-world problem and provides an experimental test vehicle for exploring core kiosk technologies . The first
prototype placed outside of our laboratory was installed in the Cyber smith Café in Cambridge, Massachusetts in September
SPEECH RECOGNITION BASED MULTIMEDIA INDEXING
Is the first Internet search site for Indexing streaming spoken audio on the web . Unlike previous attempts to index spoken audio
on the Web which have relied on either adjacent text, metadata, or hand supplied transcripts and close captions , Speech uses
automatic speech recognition technology to transcribe and index documents that do not have transcripts or other content
information . The use of speech recognition permits the efficient and cost - effective indexing of thousands of hours of audio
content , which were previously inaccessible . Because of this indexing , Speech allows users to quickly search for relevant
content in long audio documents and yields a high precision on first page- retrieved items.
Speech indexes streaming media files based on their content, much as conventional search sites index ordinary Web pages by
their text content . Like conventional search sites , Speech does not store or serve the multimedia files themselves, but rather
provides users with links . The index is continually updated using SpeechBot’s highly scalable architecture.
The Speech site was designed as a demonstration vehicle for HP Corporate Research’s work on multimedia indexing for the
Internet . This technology includes workflow systems for crawling, downloading, Transco ding, and indexing multimedia files ,
several technologies ( including speech recognition ) for analyzing and annotating multimedia content , and search site technology
to handle user queries and serve appropriate Web pages .
Video and Image Processing Group
The Video and Imaging Processing Group at Cambridge Research Lab is developing core enabling technologies to allow motion video, still
image, and audio data to be accommodated in HP's computer systems and information appliances.
The central project is Media to, an exploratory information appliance where a goal is to tie together research efforts in a number of human-
computer interaction areas in the lab. In particular, efforts from the speech and vision Groups.
As humans we are wet-wired to respond to faces. Cells in the brain activate when presented with images of faces and respond directly to
facial orientation and pose. Understanding human response to faces has been the subject of much investigation by the scientific community
over many years. For example Charles Darwinian dealt precisely with these issues.
Tapping into human responses to faces promises to re-shape the human computer experience. The ability to present talking faces provides a
unique opportunity for Multimedia providers to create animated characters to deliver messages, guides to provide information about a site,
or bring Email and Chat avatars to life. These opportunities have resulted in a tremendous surge of interest and growth over the past five
A Grand Challenge for Facial Synthesis
As the fidelity of facial synthesis improves, it becomes increasingly hard to deceive the observer. This is because we are all experts at
interpreting and understanding subtle motions and expressions displayed on the face. Consequently, when we are presented with a synthetic
face of an individual, the illusion of reality is soon lost as we can quickly determine that it is a fake.
It is clear that at some point in the future we will be able to synthesize characters that can be masquerade as real people, but how do we
know how close we are to reality? Is it possible to take a single snapshot of an individual, record some voice samples and then re-animate
the individual in a new sequence to be indistinguishable from the real person? Essentially this is a "Turing Test for Facial Animation".
We are not there yet and it remains a grand challenge for computer synthesis.
The Turing Test for Facial Animation
Alan Turing first described in his 1950 article Computing machinery and intelligence (Mind, Vol. 59, No. 236, pp. 433-460) a test for
machine intelligence as follows:
"The interrogator is connected to one person and one machine via a terminal, therefore can't see her counterparts. Her task is to find out
which of the two candidates is the machine, and which is human only by asking them questions. If the interrogator cannot make a decision
within a certain time (Turing proposed five minutes, but the exact amount of time is generally considered irrelevant), the machine is
The visual equivalent to this test would be approximately the same test with a visual surrogate and could be done in two parts, the first a
passive observation with no interaction, and the second more challenging test with interrogator interaction.
At CRL have been working on real-time facial animation. This has resulted in a Multimedia tool face work capable of generating talking faces from a
.jpeg image and real speech.
Vision-based Interaction Group
Human-computer interaction is on the brink of radical change. The catalyst for this change will be the computer's ability to sense and
recognize its users, to see and reconstruct its environment, and to respond visually and audibly to these stimuli. The Vision and Graphics
Group at Cambridge Research Lab researches vision-based human-computer interaction. Our interest is in building real systems for real
users, for example, a smart kiosk that sees its users and interacts with them in a natural, intelligent way. Our expertise is in computer vision,
3D model building, computer graphics, facial animation, and behavior modeling.
We are developing a Smart Kiosk that is, a kiosk that sees its users and interacts with them in a natural way. The image below shows our
earliest kiosk prototype (circa 1995) which demonstrated the ability to sense a user with a camera and respond in real-time via a talking
synthetic face. More information is available about these efforts.
Discovering implicit knowledge in hypermedia case bases is substantially different to data mining in databases. The disorganization units
in database data mining are the data tables, in particular their columns or rows. Inside the hypermedia case base the organizational unit is
the case or sub case, which merely consists of one or more multimedia pages. The pages comprise variety of data formats. Knowledge
discovery then, in our use of the term, involves finding patterns in primarily unstructured data. More formally the knowledge discovery
process in multimedia case representations can be viewed as machine learning where a case library replaces the training set The approach
is illustrated As illustrated in Figure 8, the information model of the case library constitutes the initial basis for the multimedia data
mining. During the data segmentation multimedia data are divided into logical interconnected segments. For example, in hypermedia
case library each segment can include one or more pages. Within particular media type a segment may have different meaning. For
example, within a text a segment could be paragraph, a sentence. The actual mining and analysis procedures are expected to reveal some
relations in the segments and between the segments at different levels. For instance, the text analysis can identify relations between
concepts presented in the text and the CAD drawings presented on that page (or vice versa, find that the actual CAD drawings are not
connected with the content of the text description). The analysis within text segments can identify word concordances that denote complex
terms, not explicitly defined in the case base. Extracted patterns are incorporated and linked under the framework of the information model.
As a result there can be additional attributes, change in links, revision of identified attributes, changes in some attribute values. Some
paragraphs, images or other media segments could become insignificant. Consequently, the information model should be able to
accommodate changes in the structure and media content. The model is expanded in Figure 9. In case-based reasoning systems the specific
interest can be focused in finding patterns in the cases that can assist with indexing and adapting cases as a way of improving the retrieval of
related previous experience and indication when an adaptation lies outside some reasonable constraints, based on the experience in the case
base. Patterns in the form dynamic thematic paths can assist with the navigation in retrieved cases. The framework combines two
consecutive complementary strategies - data- and hypothesis-driven exploration, discussed in more details in .The above described
data mining schema can be integrated in the learning loop of the cased-based reasoning. The overall enhancedmodel of case-based
reasoning with knowledge discovery back-end, as shown in Figure 10, illustrates the dynamics of case manipulation, analysis, mining,
knowledge formulation and case update. The visual "symmetry" in Figure 10 reflects in some sense the mutual benefit from the
amalgamation of these computing approaches. On the one hand, multimedia data mining has the potential to improve the case-based
system. On the other hand, the case-based paradigm provides mechanism for incorporation and management of discovered knowledge. On
the indexing side potential advantages in the extended case-based reasoning model include:
*generation of term indexing schemes, based on the words used in the text representation ;
*generation of term indexing schemes, based on relating terms to regularities discovered in other media types;
*generation of alternative indexing schemes (for example, a graph structure indexing scheme), based on structural patterns discovered in
graphics, image, audio and video media in the case Thematic path is a set of multimedia pages, each of which is part. of the case library,
that are relevant to the explanation and illustration of particular concept and have to be visited in
*association of multiple indexing schemes to one case library;
*dynamic generation of the above listed indexing schemes.
Multimedia Miner is a prototype of a data mining system for mining high-level multimedia information and knowledge from large
Berthold, M. R., F. Sudweeks, S. Newton and R. Coyne
"Clustering on the Net: Applying an Auto associative Neural
Network to Computer-Mediated Discussions,"
Computer Mediated Communication
, 2 (4),
Fluckiger, F. Understanding Networked Multimedia:
Applications and Technology, Prentice Hall, London (1995).
Ip, H. H. S. and A. W. M. Smeulders, eds.,
Information Analysis and Retrieval
, Springer, Heidelberg,
Jones, S. ed.,
Doing Internet Research
, Sage Publications,
Thousand Oaks, CA, 29-55 (1999).
Knuth, D E.,
The art of computer programming
, Addison-Wesley, Reading, MA,
Maher, M. L., S. J. Simoff and A. Cicognani, "Potentials and
limitations of Virtual Design Studio,"
, January, a1 (1997).
Maher, M. L. and S. J. Simoff, "Knowledge discovery from
multimedia case libraries", in Smith, I., ed.,
Intelligence in Structural Engineering
, Springer, Berlin, 197-
Maher, M. L., S. J. Simoff and A. Cicognani,
Virtual Design Studios
, Springer, Heidelberg (2000).
Sudweeks, F., M. McLaughlin and S. Rafaeli, eds,
and Netplay: Virtual Groups on the Internet
Press, Menlo Park, CA, 191-220 (1998).
Simoff, S. J. and M. L. Maher, "Ontology-based multimedia
data mining for design information retrieval",
the ACSE Computing Congress
, Cambridge, MA, 310 - 320
Simoff, S. J. and M. L. Maher, "Deriving ontology from
International Journal of Design Computing
1, http://www.arch.usyd.edu.au/kcdc/journal/vol1 (1998).
Simoff, S. J. and M. L. Maher, "Knowledge Discovery in
Hypermedia Case Libraries - A Methodological Framework,"
Proceedings of the Knowledge Acquisition Workshop, 12th
Australian Joint Conference on Artificial Intelligence, AI'99
Simoff, S. J. and Maher, M. L. "Analysing Participation in
Collaborative Design Environments,"
Zaïane, O. R., J. Han, Z.-N. Li, S. H. Chee, S. H. and J. Y.
Chiang. "Multimedia Miner: A system Prototype for
Multimedia Data Mining,
Proceedings of ACM SIGMOD
International Conference on Management of Data
, 581 - 583
Zaïane, O. R., J. Han, Z.-N. Li, J. Hou, "Mining Multimedia
Proceedings CASCON'98: Meeting of Minds
Canada, 83-96 (1998).