Multimedia Mining

*Biniam Asnake
*Dawit Mulugeta

Presentation Outline:
• Introduction to MM
• Article Reviews:
1. Visual Mining of Multimedia Data for Social and
Behavioral Studies
2. Multimedia Data Mining for Traffic Video Sequences
3. Tune into the voice of your customer with voice
mining
• Conclusion
• Recommendations

Introduction
• Advances in multimedia acquisition and storage
technology have led to tremendous growth in
very large and detailed multimedia databases.
• A large amount of high-resolution high-quality
multimedia data has been collected in
research laboratories in various scientific
disciplines, especially in social, behavioral and
cognitive studies.
• If these multimedia files are analyzed, useful
information to users can be revealed.

… Introduction
• Multimedia mining deals with the
extraction of implicit knowledge,
multimedia data relationships, or
other patterns not explicitly
stored in multimedia files.
(S. Kotsiantis et al., 2006)
• Multimedia mining is an interdisciplinary
endeavor that draws upon expertise in
computer vision, multimedia processing,
multimedia retrieval, data mining, machine
learning, databases, and artificial intelligence.

… Introduction
• How to automatically and effectively discover
new knowledge from rich multimedia data poses
a compelling challenge.
• Multimedia data mining consists of two stages.
1) Researchers extract some derived data
from raw multimedia data.
• This step can be implemented by human coding or by
using image/speech processing programs.
2) Researchers work on the derived data with the
goal of finding interesting patterns.
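
The two stages above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the threshold rule, the label names, and the sample values are hypothetical and are not taken from the reviewed articles.

```python
from collections import Counter

def extract_derived_data(raw_values, threshold=0.5):
    """Stage 1: turn raw measurements into a derived label stream.
    The threshold rule is a hypothetical stand-in for human coding
    or an image/speech processing program."""
    return ["active" if v >= threshold else "idle" for v in raw_values]

def find_patterns(labels, length=2):
    """Stage 2: count how often each run of `length` consecutive labels occurs."""
    windows = zip(*(labels[i:] for i in range(length)))
    return Counter(windows)

raw = [0.1, 0.7, 0.9, 0.2, 0.8, 0.9]   # made-up per-frame measurements
derived = extract_derived_data(raw)
patterns = find_patterns(derived)
print(patterns.most_common())
```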

Visual Mining of Multimedia Data
for Social and Behavioral Studies
Chen Yu, Yiwen Zhong, Thomas
Smith, Ikhyun Park, Weixia Huang

Visualization approaches for multivariate data
• TimeSearcher
– is a time series exploratory and visualization tool that allows
users to query time series.
• ThemeRiver
– is used to visualize thematic changes in large document
collections.
• VizTree
– is designed to visually mine and monitor massive time series
data.
• Spiral
– is mainly used to compare and analyze periodic structures in
time series data.
• Van Wijk et al.
– designed a cluster and calendar-based approach for the
visualization of calendar-based data.

Identified Problems
• Current methods of visualization deal with
linear time or highly periodic time;
– not designed to handle event-based data which is
typical in multimedia applications.
• Those methods focus on visualization,
navigation, or query only.
Objective
• This new approach provides an interactive
tool to integrate visualization with data
mining.

Multimedia Dataset Used
• Video:
– there were three video streams recorded simultaneously
with the frequency of 10 frames per second, and the
resolution of each frame is 320x240.
• Audio:
– The speech of the participants was recorded at a frequency of
44.1 kHz.
• Motion tracking:
– there were two sensors, one on each participant’s head. Each
sensor provided 6 dimensional (x, y, z, head, pitch, and roll)
data points at a frequency of 120Hz.
• In total, the dataset consists of about 90,000 image
frames, 864,000 position data points, and 50 minutes of
speech.

Visualization of Multimedia Data
There are two major display components in the application:
a multimedia playback window and a visualization window.
The tool lets users visually explore the derived data streams
and discover new patterns and findings.

Data Representation and Visualization
• The time-based (temporal) data can be
categorized into two kinds:
1. CONTINUOUS VARIABLES:
• related to time points (a series of
single measurements at particular
moments in time)
2. EVENT VARIABLES:
• related to time intervals
(e.g. the onset and offset of an event)
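
One possible way to represent the two kinds of temporal data is sketched here. The class and field names are assumptions made for illustration; the article does not prescribe a data model.

```python
from dataclasses import dataclass

@dataclass
class ContinuousVariable:
    name: str
    rate_hz: float   # samples per second
    values: list     # one measurement per time point

    def time_of(self, index):
        """Time in seconds of the sample at `index`."""
        return index / self.rate_hz

@dataclass
class Event:
    onset: float     # seconds
    offset: float    # seconds

    @property
    def duration(self):
        return self.offset - self.onset

@dataclass
class EventVariable:
    name: str
    events: list     # list of Event, ordered by onset

head_x = ContinuousVariable("head_x", rate_hz=120.0, values=[0.0, 0.1, 0.2])
speech = EventVariable("speech", [Event(onset=1.0, offset=2.5)])
```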

(1) Continuous Time Series Data
• 3 ways to visually explore continuous time
series data:
{1} as individual data streams
{2} as a set of multiple data streams
{3} as an arithmetic combination of
multiple data streams

1. Using curves to visualize
individual data streams
• A novel feature was added: the HISTOGRAM DISPLAY.
• The purpose is to allow users to explore individual
data streams and examine both the overall
statistics of a data stream (Global Histogram) and
the statistics within a local window (Local
Histogram).
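
The difference between the global and the local histogram can be illustrated with a minimal sketch. The bin count, the value range, and the sample stream are assumptions; values are expected to lie within the given range.

```python
def histogram(values, bins, lo, hi):
    """Count how many values fall into each of `bins` equal-width bins on [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # put v == hi into the last bin
        counts[idx] += 1
    return counts

stream = [0.1, 0.2, 0.15, 0.8, 0.9, 0.85, 0.5, 0.45]

# Global histogram: statistics over the whole data stream.
global_hist = histogram(stream, bins=2, lo=0.0, hi=1.0)

# Local histogram: statistics inside a window (here samples 3 to 5).
local_hist = histogram(stream[3:6], bins=2, lo=0.0, hi=1.0)
```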

2. Using gray-level representation to
visualize a set of multiple data streams
• Purpose: to visually display and explore two
kinds of information:
(1) possible correlation between multiple data
streams
(2) interesting joint patterns across multiple data
streams.
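
As a rough sketch, each stream can be mapped to gray levels for display, and a possible correlation between two streams can be quantified numerically. Min-max scaling and the Pearson coefficient are assumptions chosen for illustration; the article describes a visual display and does not name a specific formula.

```python
from math import sqrt

def to_gray(values):
    """Map a stream to 0-255 gray levels by min-max scaling."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant stream: render as black
        return [0] * len(values)
    return [round(255 * (v - lo) / (hi - lo)) for v in values]

def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant streams."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

a = [1, 2, 3, 4]
b = [2, 4, 6, 8]
gray_rows = [to_gray(a), to_gray(b)]  # one gray-level row per stream
r = pearson(a, b)
```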

3. Using area graphs to visualize an arithmetic
combination of multiple data streams
• Users can combine multiple temporal variables
together (by + and -) in various ways and then
visually explore the combined distribution.
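
A combination by + and - amounts to a signed, sample-by-sample sum. The sketch below assumes equally long streams sampled at the same rate; the function name and data are hypothetical.

```python
def combine(streams, signs):
    """Add or subtract equal-length streams sample by sample.
    `signs` holds +1 or -1 for each stream."""
    length = len(streams[0])
    return [sum(sign * s[i] for s, sign in zip(streams, signs)) for i in range(length)]

a = [1, 2, 3]
b = [4, 5, 6]
c = [1, 1, 1]
combined = combine([a, b, c], signs=[+1, +1, -1])   # a + b - c
```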

(2) Event Data
• Events are presented as bars of color, with
their size on screen corresponding to their
duration.
• Users can visually explore an event's (1) frequency,
(2) duration, and (3) periodicity.
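
The three properties that users inspect visually can also be computed directly. In this sketch, events are assumed to be (onset, offset) pairs in seconds sorted by onset, and periodicity is approximated by the gaps between successive onsets; both are illustrative choices.

```python
def event_stats(events, total_seconds):
    """Summarize frequency, duration, and onset spacing of one event variable."""
    durations = [offset - onset for onset, offset in events]
    onsets = [onset for onset, _ in events]
    gaps = [b - a for a, b in zip(onsets, onsets[1:])]
    return {
        "frequency_per_min": len(events) / (total_seconds / 60),
        "mean_duration": sum(durations) / len(durations),
        "inter_onset_gaps": gaps,   # near-equal gaps suggest a periodic event
    }

events = [(0.0, 2.0), (10.0, 13.0), (20.0, 21.0)]
stats = event_stats(events, total_seconds=60)
```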

To handle potentially more complex patterns
involving more variables and logic operations,
users can define a new event variable.
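
Defining a new event variable with a logic operation can be sketched as an interval intersection (logical AND). The variable names are hypothetical, and each input is assumed to be a sorted list of non-overlapping (onset, offset) pairs.

```python
def intersect(events_a, events_b):
    """Return the intervals during which both event variables are active."""
    result = []
    i = j = 0
    while i < len(events_a) and j < len(events_b):
        start = max(events_a[i][0], events_b[j][0])
        end = min(events_a[i][1], events_b[j][1])
        if start < end:
            result.append((start, end))
        # Advance whichever interval ends first.
        if events_a[i][1] < events_b[j][1]:
            i += 1
        else:
            j += 1
    return result

looking = [(0, 5), (8, 12)]
speaking = [(3, 9), (11, 15)]
looking_while_speaking = intersect(looking, speaking)
```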

(3) Concurrent visualization of
Continuous and Event variables
The display panel will highlight those
continuous values at the moments when the
selected events happen.
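
Highlighting continuous values while an event is happening amounts to selecting the samples whose timestamps fall inside an event interval. A minimal sketch, with a made-up sampling rate and data:

```python
def samples_during(values, rate_hz, events):
    """Return (time, value) pairs for samples that fall inside any event interval."""
    selected = []
    for index, value in enumerate(values):
        t = index / rate_hz
        if any(onset <= t < offset for onset, offset in events):
            selected.append((t, value))
    return selected

values = [5, 6, 7, 8, 9, 10]          # sampled at 2 Hz: t = 0.0, 0.5, 1.0, ...
highlighted = samples_during(values, rate_hz=2, events=[(1.0, 2.0)])
```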

Event-based Interactive Visual Exploration
By visually exploring the data instance by instance,
users can directly compare those moments to detect the
similarities between them.
Many multimedia data are essentially event-driven.

Event Grouping
• Users can visually examine each instance of an event,
and categorize the instances into groups, which can then be saved.
• The overall grouping results can then be visualized in one
single panel.

Flexible Interfaces between
Visualization and Data Processing
• The media playback panel allows users to play back video
and audio data at various speeds. On top of this,
– the researchers designed and implemented one
critical component to connect multimedia playback with
visual data mining, linking the raw multimedia data
with the exploration of derived data.
• To increase flexibility and compatibility
with data mining,
– the system allows users to use any programming
language (such as MATLAB, R, or C/C++) to obtain new
results.

The researchers' Future Work
• To conduct a systematic evaluation of
the prototype system
– using an experimental paradigm
– to get a better idea of:
• what the advantages and limitations of the
current system are, and
• what will need to be improved.

Conclusion of the Article
• The visualization tool developed allows
users
– not only to easily examine and synthesize
information into new ideas and
hypotheses, but also
– to quickly quantify and test the insights
gained from visualization.

Multimedia Data Mining for
Traffic Video Sequences
Shu-Ching Chen, Mei-Ling Shyu,
Chengcui Zhang, Jeff Strickrott

Introduction and Motivation
• Traffic video analysis can discover and provide
useful information such as:
– queue detection, vehicle classification, traffic flow,
and incident detection at intersections.
• Some municipalities are installing video camera
systems to monitor and extract traffic control
information from their highways in real time.

Identified Problems
• Current transportation applications and research
work either:
– do not connect to databases, or
– have limited capabilities to index and store the
collected data.
• As a result, they cannot provide organized, unsupervised,
conveniently accessible and easy-to-use
multimedia information to traffic planners.
• In order to discover and provide some important but
previously unknown knowledge from the traffic video
sequences to the traffic planners, multimedia data
mining techniques need to be employed.

The Proposed Framework
• Includes:
–Background Subtraction
–Vehicle Object Identification and Tracking
–Multimedia Augmented Transition Network
(MATN) model and
–Multimedia Input Strings

Background Subtraction
• It is a technique to remove non-moving
components from a video sequence.
• This technique was used
to enhance the basic SPCPE algorithm
(Simultaneous Partition and Class Parameter Estimation),
an unsupervised video segmentation method,
to get better segmentation results.

The main assumption is that the camera remains stationary.
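
Because the camera is assumed to be stationary, non-moving components can be removed by comparing each frame against a background reference frame. The sketch below uses a plain thresholded difference on grayscale pixel lists; it is not the SPCPE-based method of the article, and the threshold and pixel values are made up.

```python
def subtract_background(frame, background, threshold=30):
    """Return a binary mask: 1 where a pixel differs clearly from the background."""
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]

background = [[10, 10, 10],
              [10, 10, 10]]
frame = [[10, 200, 10],
         [12, 190, 10]]   # a bright object has entered the middle column
mask = subtract_background(frame, background)
```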

Object Tracking
• The first step is to extract the segments in each class.
• Then the minimal bounding box and the centroid
point for each segment are obtained.
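
For one segment, the minimal bounding box and the centroid follow directly from the pixel coordinates. This sketch assumes a segment is given as a list of (row, column) pairs; the representation is an assumption.

```python
def bounding_box_and_centroid(pixels):
    """pixels: list of (row, col) coordinates belonging to one segment."""
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    box = (min(rows), min(cols), max(rows), max(cols))   # top, left, bottom, right
    centroid = (sum(rows) / len(rows), sum(cols) / len(cols))
    return box, centroid

segment = [(2, 3), (2, 4), (3, 3), (3, 4)]
box, centroid = bounding_box_and_centroid(segment)
```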

Using MATNs & Multimedia Input Strings
to Model Video Key Frames
• A Multimedia Augmented Transition Network
(MATN) model
– can be represented diagrammatically by a labeled
directed graph, called a transition graph.
• A Multimedia Input String is
–accepted by the grammar if there is a path of
transitions which corresponds to the sequence of symbols in
the string and which leads from a specified initial
state to one of a set of specified final states.
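
The acceptance rule can be sketched with a plain transition graph. This is a simplification: the conditions, actions, and subnetworks that a full MATN adds are left out, and the state and symbol names are hypothetical.

```python
def accepts(transitions, initial, finals, symbols):
    """Check whether some path of transitions consumes `symbols`
    and leads from `initial` to one of the `finals`.
    transitions: dict mapping (state, symbol) -> set of next states."""
    current = {initial}
    for symbol in symbols:
        current = {
            nxt
            for state in current
            for nxt in transitions.get((state, symbol), set())
        }
        if not current:          # no transition matches this symbol
            return False
    return bool(current & finals)

transitions = {
    ("S", "K1"): {"A"},   # key frame 1 moves from the initial state S to A
    ("A", "K2"): {"F"},   # key frame 2 moves from A to the final state F
}
ok = accepts(transitions, "S", {"F"}, ["K1", "K2"])
```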

… MATNs and Multimedia Input Strings
• Key frames serve as the indices for a shot.
• In this paper, each frame is divided into nine sub-
regions with the corresponding subscript numbers.
• Each key frame is represented by:
– an input symbol in a multimedia input string
– “&” symbol between two vehicle objects
• is used to denote that the vehicle objects appear in the same
frame.
– subscripted numbers
• are used to distinguish the relative spatial positions of the
vehicle objects relative to the target object “ground”.
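
Building the input symbol for one key frame might look like the sketch below. Several details are assumptions, since the original figure is not reproduced here: the sub-regions are numbered 1 to 9 row by row over a 3x3 grid, vehicle labels such as "C" and "B" are hypothetical, and the target object "ground" is left out.

```python
def sub_region(cx, cy, width, height):
    """Return the 1-9 index of the 3x3 sub-region containing point (cx, cy)."""
    col = min(int(3 * cx / width), 2)
    row = min(int(3 * cy / height), 2)
    return row * 3 + col + 1

def key_frame_symbol(vehicles, width, height):
    """vehicles: list of (label, centroid_x, centroid_y) found in one key frame.
    Objects in the same frame are joined with '&'."""
    parts = [f"{label}{sub_region(cx, cy, width, height)}" for label, cx, cy in vehicles]
    return "&".join(parts)

symbol = key_frame_symbol([("C", 50, 40), ("B", 280, 290)], width=300, height=300)
print(symbol)
```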

[Figure: the nine sub-regions and their corresponding subscript numbers, an example MATN model, and an example multimedia input string that represents two key frames.]

Experiment Setup
• The traffic video sequence was:
– captured with a Sony Handycam CCD TR64 and
– digitized with a Brooktree Bt848-based capture card
on a Windows NT 2000 Celeron-based platform.
• The video sequence consists of about 16 minutes of
video with approximately constant lighting conditions.
• A small portion of the traffic video is used to
illustrate how the proposed framework can be
applied to traffic applications to answer spatio-
temporal queries like:
“Estimate the traffic flow of this road
intersection from 8:00 AM to 8:30 AM.”
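
A query of this kind can be illustrated over a table of tracked vehicles. This is only a sketch: the article answers such queries through the MATN model and multimedia input strings, whereas the record layout, vehicle identifiers, and counting rule here are assumptions.

```python
from datetime import time

def traffic_flow(track_records, start, end):
    """Count vehicles whose first appearance falls in [start, end).
    track_records: list of (vehicle_id, timestamp) observations."""
    first_seen = {}
    for vehicle_id, ts in track_records:
        if vehicle_id not in first_seen or ts < first_seen[vehicle_id]:
            first_seen[vehicle_id] = ts
    return sum(1 for ts in first_seen.values() if start <= ts < end)

records = [
    ("v1", time(7, 59)), ("v1", time(8, 1)),   # first seen before the window
    ("v2", time(8, 5)),
    ("v3", time(8, 29)),
    ("v4", time(8, 30)),                       # window end is exclusive
]
flow = traffic_flow(records, start=time(8, 0), end=time(8, 30))
```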

Experiment Results
• Using the background subtraction technique,
– both the efficiency of the segmentation
process and the accuracy of the segmentation
results are improved, achieving more accurate
video indexing and annotation.
Conclusion
• The proposed framework can model complex
situations such as traffic video for intersection
monitoring.

• The figure shows segmentation results as
well as the multimedia
input strings for frames
4, 9, 15, 16 and 35.
• The leftmost column
gives the original video
frames;
• the second column
shows difference images
obtained by subtracting
the background
reference frame from
the original frames;
• the third column shows
the vehicle segments
extracted from the
video frames, and
• the rightmost column
shows the bounding
boxes of the vehicle
objects.

Tune into the voice of your
customer with voice mining
By Manya Mayes

Introduction
• Understanding customer comments that arrive as text,
audio and video (word-for-word records, e-mail, voice
mail, surveys and the Web, and most recently social
networking sites such as YouTube and Facebook) will determine the
business transactions of an organization.
• Voice mining in particular is growing and helps to
identify the reasons for calls, the effectiveness of
marketing campaigns, the competitors most mentioned by
your clients, and why certain products sell more than others, and to
predict the customer satisfaction level of every interaction.
• Combining voice capture with business intelligence, analytics
and text mining provides valuable customer intelligence for
marketing and competitive intelligence business functions.

Introduction(Cont.)
• In addition to the traditional keyboard-entered comments of customer
feedback, companies may also record the audio of these customer
interactions spoken by both the agent and the customer.
• Manually listening to and interpreting customers’ feedback is often
inaccurate and inconsistent.
• As a result, automated methods are becoming more prevalent.
• An automated phonetic index search is the typical approach to
understanding customer audio information, followed by voice-to-text
transcription of particular segments that are identified through domain expertise.
• Stored audio signals can be transcribed and analyzed to predict what is
most likely to happen next, such as determining the likelihood that the
customer will close his or her account.
• Techniques such as segmentation are used to automatically group or
classify call transcriptions.

The process: analyzing audio data and
Phonetic index search
• Analyzing audio data can help you identify the call reasons,
the effectiveness of campaigns, the competitors
mentioned by clients, and predict the customer
satisfaction level.
• The audio signal itself can be analyzed for a wide variety of
information, together with the call metadata.
– The captured metadata fields include call length,
emotion/stress detection, silence, number of holds,
number of transfers and the like.

The process(Cont.)
• Phonemes are the basic units of sounds in a
language and a phonetic index is a partial
transcription of an audio signal.
• Metadata about calls can be used for reporting
purposes and incorporated into analytical models
for discovery purposes, for example to identify a dissatisfied
customer.
• A phonetic index search automatically transforms
the captured audio signal into a sequence of
phonemes or sounds.
• Phonetic indexing allows fast searching of the
signal.
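
Once a call has been turned into a phoneme sequence, searching it can be sketched as follows. The recognizer that produces the phonemes is assumed to exist and is not shown; the ARPAbet-style symbols (stress markers omitted) and the simple position index are illustrative choices, not the vendors' implementation.

```python
from collections import defaultdict

def build_index(phonemes):
    """Map each phoneme to the positions where it occurs in the call."""
    index = defaultdict(list)
    for pos, ph in enumerate(phonemes):
        index[ph].append(pos)
    return index

def search(phonemes, index, query):
    """Return the start positions where the query phoneme sequence occurs."""
    hits = []
    for start in index.get(query[0], []):      # only try positions of the first phoneme
        if phonemes[start:start + len(query)] == query:
            hits.append(start)
    return hits

cancel = ["K", "AE", "N", "S", "AH", "L"]      # the word "cancel"
call = cancel + ["M", "AY"] + cancel           # a made-up phoneme stream
index = build_index(call)
hits = search(call, index, cancel)
```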

Categorizing calls
• Calls are categorized based on the phonetic index search,
and full text transcription is combined with the results of the search
indexes.
• Transcriptions are usually only performed on certain
calls
– e.g., calls where customers suggest they will close their
accounts, cancel their subscriptions or call with service
problems.
• Providing a full transcription of all customer calls
and combining it with the metadata about the calls can:
– describe the issues that customers are calling about and predict
which customers are most likely to close their accounts, etc.,
– allowing appropriate action to be taken before it is too
late.

Voice mining using SAS Text Miner and
its advantage
• SAS can read the audio outputs that are captured using CallMiner,
NICE Systems, and other similar tools.
• The information provided by the voice capture includes:
– the categories created by the phonetic index search,
– the metadata about the call and the call transcriptions.
• SAS provides industry-leading data integration with the
ability to access a wide variety of data sources and formats,
enabling information to be delivered to users in a way that
they can use it.
– SAS Text Miner provides access to more than 200 document formats
and users are able to gather information from voice vendors of
choice

Voice mining(Cont.)
• Automatically clustering/segmenting the documents
helps to understand the types of issues
customers are calling about.
• Profiling these segments using metadata about the call
and related customer information provides further
information about the segments.
• Predictive modeling is a data-driven and
consistent method to understand what might happen
next, and it enables the call center agent to take preventive
actions.
• The customer’s experience over the phone can help
predict loyalty, churn, satisfaction and more.

Integrating structured data for segment
profiling
• To get an even clearer picture of the results of
text clustering, related structured data (metadata
about the call and related customer information)
was used to further describe the issues.
• The results show that call length and the call hold
indicator provide additional information in the
billing issues cluster.
• Terms that are highly associated with the
selected term are displayed in a hyperbolic tree
structure.

Predicting Cancellation of Subscription
• One instance
• To predict the likelihood of
subscription cancellation, a churn prediction model
is used, which includes:
– the call outcome (result of the call), showing whether or not the
customer cancelled his or her subscription
– the data describing the interaction with the customer such
as the transcriptions of the calls, the metadata about the
calls, demographics, purchasing behavior and
frequency/monetary information.
• The model to predict cancellation of subscription should
use historical data up to, but not including, the call
where the customer actually cancels his or her
subscription.
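
Keeping only the history before the cancellation call avoids training on the outcome itself. A minimal sketch, assuming each customer's calls are sorted by date and carry a boolean `cancelled` flag (both hypothetical field conventions):

```python
def training_history(calls):
    """Return the calls up to, but not including, the first cancellation call."""
    history = []
    for call in calls:
        if call["cancelled"]:
            break
        history.append(call)
    return history

calls = [
    {"id": 1, "cancelled": False},
    {"id": 2, "cancelled": False},
    {"id": 3, "cancelled": True},    # the call where the customer cancels
]
history = training_history(calls)
```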

Predicting (Cont)
• An artificial indicator value of 1 is assigned whenever the term
“cancel” or any of its variations (such as cancels,
cancelled, cancelling, cancellation, etc.) is found, and
a value of 0 otherwise.
• The Text Miner node then takes the call transcriptions
and uses linguistic techniques to identify terms,
multiple-word terms, parts of speech, stems, etc., and
uses statistical techniques to give the customer
feedback text a numeric transformation.
• The data is then passed to the Regression, Neural
Network and Decision Tree nodes to build multiple
competing models using the churn outcome and the
text transformations.
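
The cancel indicator from the first bullet above, together with a very simple numeric transformation of text, can be sketched as follows. The regular expression and the raw term counts are assumptions for illustration; they are not the linguistic and statistical techniques that SAS Text Miner applies.

```python
import re
from collections import Counter

# Matches "cancel" and variations such as cancels, cancelled, cancellation.
CANCEL = re.compile(r"\bcancel\w*", re.IGNORECASE)

def cancel_indicator(text):
    """Return 1 if any variation of "cancel" appears in the text, else 0."""
    return 1 if CANCEL.search(text) else 0

def term_counts(text):
    """Turn text into term frequencies, a basic numeric representation."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

flag = cancel_indicator("I would like a cancellation, please")
counts = term_counts("Cancel it, cancel now")
```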

Predicting(Cont.)
• The metadata about the call and related customer
information also may be used at this time to
improve model lift.
• The Model Comparison node then takes the results
of each of the preceding models and selects the
“best” model based on which model correctly
classifies the text as predicting churn or no churn.
• Once a best model has been selected, the
underlying code is then used to apply the model to
new data. This is known as model scoring or model
deployment.
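
Model comparison and scoring can be illustrated with a small sketch. Plain classification accuracy stands in for the selection criterion, and the two rule-based "models" and field names are hypothetical; in practice the comparison would be run on validation data rather than the training data.

```python
def accuracy(model, rows, labels):
    """Fraction of rows for which the model predicts the true churn label."""
    correct = sum(1 for row, label in zip(rows, labels) if model(row) == label)
    return correct / len(labels)

def select_best(models, rows, labels):
    """Return the name of the most accurate model and all scores."""
    scores = {name: accuracy(m, rows, labels) for name, m in models.items()}
    return max(scores, key=scores.get), scores

models = {
    "flag_rule": lambda r: r["cancel_flag"],
    "hold_rule": lambda r: 1 if r["holds"] >= 2 else 0,
}
rows = [
    {"cancel_flag": 1, "holds": 0},
    {"cancel_flag": 0, "holds": 3},
    {"cancel_flag": 1, "holds": 2},
    {"cancel_flag": 0, "holds": 0},
]
labels = [1, 0, 1, 0]

best_name, scores = select_best(models, rows, labels)

# Model scoring: apply the selected model to new data.
prediction = models[best_name]({"cancel_flag": 1, "holds": 1})
```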

Predicting (Cont.)
• The underlying SAS code behind the predictive model
described above was saved and registered as a SAS
Stored Process via the SAS Management Console.
• Several stored processes are created to highlight
various deployments of the MSNTV transcribed data.
• Since the current voice technology does not allow for
real-time transcription, voice captures cannot be
deployed in real time.
• The results are customized to show the original text
and the corresponding prediction of service
cancellation.

Predicting (Cont)
• The user can manipulate the resulting
spreadsheet to show a graphical
representation of the cancellations of
subscriptions. The SAS tasks available via the
SAS Add-In for Microsoft Office are displayed.
• SAS BI dashboards display additional
information about the MSNTV data. The
dashboard is configured to show several views
of the call center data.

Predicting (Cont)
• The propensity-to-cancel indicator shows about a
38 percent chance of the customer
cancelling the subscription.
• This predictive power can enable
companies to retain key
customers and avoid
the costs associated
with undue churn.

Conclusion
• Using voice mining tools and creating a
stored process can make valuable information and
knowledge available to business analysts and
managers who might not have had access to this
information previously.
• Despite data quality issues, SAS Text Miner did a
remarkable job of finding consistent patterns in the
customer and agent comments.
• By actually hearing and understanding what
customers are already telling you, numerous
indicators can be used to build loyalty, reduce churn
and make your products safer.

Recommendations
• Despite the importance of multimedia mining, there is
no local research on multimedia mining and only a little
research on multimedia retrieval (esp. image).
• Therefore, we recommend conducting research on
multimedia mining for audio, speech, video as well as
advanced image retrieval systems.
• Organizations like libraries, museums and other information
centers (like Television and Radio broadcasters) that have
digital repositories should use the advantages provided by
the application of multimedia mining.
• Other organizations (such as transportation and traffic offices)
are also recommended to digitize information that is
kept in non-computer-readable formats and apply multimedia
mining on top of it.
