Multimedia Mining
*Biniam Asnake
*Dawit Mulugeta
Presentation Outline:
• Introduction to Multimedia Mining (MM)
• Article Reviews:
1. Visual Mining of Multimedia Data for Social and
Behavioral Studies
2. Multimedia Data Mining for Traffic Video Sequences
3. Tune into the voice of your customer with voice
mining
• Conclusion
• Recommendations
Introduction
• Advances in multimedia acquisition and storage
technology have led to tremendous growth in
very large and detailed multimedia databases.
• A large amount of high-resolution, high-quality
multimedia data has been collected in
research laboratories in various scientific
disciplines, especially in social, behavioral and
cognitive studies.
• Analyzing these multimedia files can reveal
information that is useful to users.
… Introduction
• Multimedia mining deals with the
extraction of implicit knowledge,
multimedia data relationships, or
other patterns not explicitly
stored in multimedia files.
(S. Kotsiantis et al., 2006)
• Multimedia mining is an interdisciplinary
endeavor that draws upon expertise in
computer vision, multimedia processing,
multimedia retrieval, data mining, machine
learning, databases and artificial intelligence.
… Introduction
• Automatically and effectively discovering new
knowledge from rich multimedia data poses
a compelling challenge.
• Multimedia data mining consists of two stages:
1) Researchers extract derived data
from raw multimedia data.
• This step can be implemented by human coding or by
using image/speech processing programs.
2) Researchers work on the derived data with the
goal of finding interesting patterns.
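The two stages can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' pipeline. Stage 1 reduces raw grayscale video frames (a NumPy array) to one derived value per frame, using mean brightness as a hypothetical feature. Stage 2 searches that derived stream for runs of frames above a threshold. The function names and the feature are hypothetical.

```python
import numpy as np


def extract_derived(frames: np.ndarray) -> np.ndarray:
    """Stage 1: reduce raw frames (T x H x W grayscale) to one derived value per frame."""
    # Hypothetical feature: mean brightness of each frame.
    return frames.reshape(frames.shape[0], -1).mean(axis=1)


def find_patterns(stream: np.ndarray, threshold: float) -> list[tuple[int, int]]:
    """Stage 2: return [start, end) runs of frames whose derived value exceeds threshold."""
    runs, start = [], None
    for i, above in enumerate(stream > threshold):
        if above and start is None:
            start = i                      # a run begins
        elif not above and start is not None:
            runs.append((start, i))        # a run ends
            start = None
    if start is not None:                  # run still open at the last frame
        runs.append((start, len(stream)))
    return runs


frames = np.zeros((6, 4, 4))
frames[2:4] = 1.0                          # two bright frames in the middle
print(find_patterns(extract_derived(frames), threshold=0.5))  # prints [(2, 4)]
```

In practice, stage 1 would be replaced by human coding or by real image/speech processing. Stage 2 only ever sees the derived stream.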
Visual Mining of Multimedia Data
for Social and Behavioral Studies
Chen Yu, Yiwen Zhong, Thomas
Smith, Ikhyun Park, Weixia Huang
Visualization approaches for multivariate data
• TimeSearcher
– is a time series exploratory and visualization tool that allows
users to query time series.
• ThemeRiver
– is used to visualize thematic changes in large document
collections.
• VizTree
– is designed to visually mine and monitor massive time series
data.
• Spiral
– is mainly used to compare and analyze periodic structures in
time series data.
• Van Wijk et al.
– designed a cluster and calendar-based approach for the
visualization of calendar-based data.
Identified Problems
• Current visualization methods deal with
linear time or highly periodic time;
– they are not designed to handle event-based data, which is
typical in multimedia applications.
• These methods focus only on visualization,
navigation, or querying.
Objective
• This new approach provides an interactive
tool that integrates visualization with data
mining.
Multimedia Dataset Used
• Video:
– three video streams were recorded simultaneously
at 10 frames per second, with a resolution of
320x240 for each frame.
• Audio:
– The participants' speech was recorded at a sampling rate of
44.1 kHz.
• Motion tracking:
– there were two sensors, one on each participant’s head. Each
sensor provided 6-dimensional (x, y, z, head, pitch, and roll)
data points at a frequency of 120 Hz.
• In total, the dataset consists of about 90,000 image
frames, 864,000 position data points, and 50 minutes of
speech.
Visualization of Multimedia Data
There are two major display components in the application:
a multimedia playback window and a visualization window,
which is used to visually explore the derived data
streams and discover new patterns and findings.
Data Representation and Visualization
• Time-based (temporal) data can be
categorized into two kinds:
1. CONTINUOUS VARIABLES:
• related to time points (a series of
single measurements at particular
moments in time)
2. EVENT VARIABLES:
• related to time intervals
(e.g. the onset and offset of an event)
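The two kinds of temporal variables can be sketched as Python data classes. This is a minimal sketch: the class and field names are hypothetical, and times are assumed to be in seconds.

```python
from dataclasses import dataclass


@dataclass
class ContinuousVariable:
    """A series of single measurements, each tied to a time point (seconds)."""
    name: str
    times: list[float]
    values: list[float]


@dataclass
class EventVariable:
    """A set of events, each tied to a time interval given by its onset and offset (seconds)."""
    name: str
    intervals: list[tuple[float, float]]

    def durations(self) -> list[float]:
        # Duration of each event = offset - onset.
        return [offset - onset for onset, offset in self.intervals]


# Hypothetical example variables.
head_x = ContinuousVariable("head_x", [0.0, 0.1, 0.2], [1.2, 1.3, 1.1])
looking = EventVariable("looking_at_object", [(1.0, 2.5), (4.0, 4.5)])
print(looking.durations())  # prints [1.5, 0.5]
```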
(1) Continuous Time Series Data
• 3 ways to visually explore continuous time
series data:
{1} as individual data streams
{2} as a set of multiple data streams
{3} as an arithmetic combination of
multiple data streams
1. Using curves to visualize
individual data streams
• A novel feature is added: a HISTOGRAM DISPLAY.
• The purpose is to allow users to explore individual
data streams and examine both the overall
statistics of a data stream (Global Histogram) and
the statistics within a local window (Local
Histogram).
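The global/local histogram idea can be sketched with NumPy's `np.histogram`. The sketch reuses the global bin edges for the local window so that the two histograms are directly comparable. The function name, bin count and value range are assumptions for illustration.

```python
import numpy as np


def global_and_local_histograms(stream, window_start, window_end,
                                bins=5, value_range=(0.0, 1.0)):
    """Histogram of the whole stream and of the samples in [window_start, window_end)."""
    stream = np.asarray(stream, dtype=float)
    # Global histogram over every sample in the stream.
    global_counts, edges = np.histogram(stream, bins=bins, range=value_range)
    # Local histogram over the window only, using the same bin edges.
    local_counts, _ = np.histogram(stream[window_start:window_end], bins=edges)
    return global_counts, local_counts, edges


g, l, edges = global_and_local_histograms([0.1, 0.1, 0.9, 0.9, 0.9, 0.5], 0, 2)
print(g.tolist(), l.tolist())  # prints [2, 0, 1, 0, 3] [2, 0, 0, 0, 0]
```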
2. Using gray-level representation to
visualize a set of multiple data streams
• Purpose: to visually display and explore two
kinds of information:
(1) possible correlation between multiple data
streams
(2) interesting joint patterns across multiple data
streams.
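The gray-level idea can be sketched in two steps. Each stream (one row) is normalized independently to integer gray levels 0-255, so every row can be drawn as a strip of gray pixels. Pearson correlation (`np.corrcoef`) then gives one simple measure of correlation between streams. Both the per-row normalization and the correlation measure are our assumptions, not details taken from the article.

```python
import numpy as np


def to_gray_levels(streams):
    """Map each row (one data stream) independently to integer gray levels 0-255."""
    data = np.asarray(streams, dtype=float)
    lo = data.min(axis=1, keepdims=True)
    hi = data.max(axis=1, keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero for constant streams
    return np.round(255 * (data - lo) / span).astype(np.uint8)


def stream_correlations(streams):
    """Pearson correlation matrix between the rows (streams)."""
    return np.corrcoef(np.asarray(streams, dtype=float))


streams = [[0, 1, 2], [10, 20, 30], [3, 2, 1]]
print(to_gray_levels(streams).tolist())  # prints [[0, 128, 255], [0, 128, 255], [255, 128, 0]]
```

In the example, the first two streams produce identical gray strips, and the third is their mirror image. Those are the kinds of joint patterns the gray-level view is meant to expose.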
3. Using area graphs to visualize an arithmetic
combination of multiple data streams
• Users can combine multiple temporal variables
together (by + and -) in various ways and then
visually explore the combined distribution.
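Combining streams by + and - can be sketched as below. The sketch assumes equal-length streams sampled at the same time points, and the stream names are hypothetical. Splitting the result into positive and negative parts mirrors how an area graph would shade regions above and below zero.

```python
import numpy as np


def combine_streams(streams, plus=(), minus=()):
    """Add the streams named in `plus` and subtract those named in `minus`, sample by sample."""
    length = len(next(iter(streams.values())))
    total = np.zeros(length)
    for name in plus:
        total += np.asarray(streams[name], dtype=float)
    for name in minus:
        total -= np.asarray(streams[name], dtype=float)
    return total


def area_parts(combined):
    """Split the combined curve into the parts above and below zero."""
    return np.clip(combined, 0, None), np.clip(combined, None, 0)


# Hypothetical streams, e.g. the motion of two participants' heads.
streams = {"a": [1, 2, 3], "b": [0, 3, 1]}
combined = combine_streams(streams, plus=["a"], minus=["b"])
print(combined.tolist())  # prints [1.0, -1.0, 2.0]
```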
(2) Event Data
• Events are presented as bars of color, with
their size on screen corresponding to their
duration.
• Users can visually explore (1) the frequency of an event,
(2) its duration and (3) its periodicity.
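The three properties of an event variable can be computed from its intervals as sketched below. Using inter-onset intervals (the time between successive onsets) as a periodicity indicator is our assumption: a roughly constant gap suggests a periodic event.

```python
def event_statistics(intervals, total_time):
    """intervals: (onset, offset) pairs in seconds, sorted by onset; total_time in seconds."""
    count = len(intervals)
    durations = [offset - onset for onset, offset in intervals]
    onsets = [onset for onset, _ in intervals]
    # Time between consecutive onsets; near-constant values suggest periodicity.
    inter_onset = [b - a for a, b in zip(onsets, onsets[1:])]
    return {
        "events_per_minute": count * 60 / total_time,
        "mean_duration": sum(durations) / count if count else 0.0,
        "inter_onset_intervals": inter_onset,
    }


stats = event_statistics([(0, 1), (10, 12), (20, 21)], total_time=60)
print(stats["events_per_minute"], stats["inter_onset_intervals"])  # prints 3.0 [10, 10]
```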
To handle potentially more complex patterns
involving more variables and logical operations,
users can define a new event variable.
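One way to build a user-defined event variable is to combine existing event variables with logical operations. The sketch below implements AND as interval intersection. The function name and the example event names are hypothetical.

```python
def intersect_events(a, b):
    """AND: intervals where an event from `a` and an event from `b` are both ongoing.

    Both inputs are lists of (onset, offset) pairs, sorted by onset and
    non-overlapping within each list.
    """
    result, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:                 # the two intervals overlap
            result.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


# Hypothetical event variables.
child_looking = [(0, 5), (8, 12)]
parent_naming = [(3, 9), (11, 15)]
print(intersect_events(child_looking, parent_naming))  # prints [(3, 5), (8, 9), (11, 12)]
```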
(3) Concurrent visualization of
Continuous and Event variables
The display panel highlights the
continuous values at the moments when the
selected events happen.
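The highlighting step can be sketched as a boolean mask. This is a minimal sketch assuming the continuous stream has timestamps in seconds and each event is a half-open [onset, offset) interval. The mask marks which samples to highlight.

```python
import numpy as np


def values_during_events(times, values, intervals):
    """Mask of samples whose timestamp falls inside any [onset, offset) interval."""
    times = np.asarray(times, dtype=float)
    mask = np.zeros(times.shape, dtype=bool)
    for onset, offset in intervals:
        mask |= (times >= onset) & (times < offset)
    return mask, np.asarray(values)[mask]


mask, highlighted = values_during_events([0, 1, 2, 3, 4], [10, 11, 12, 13, 14], [(1, 3)])
print(highlighted.tolist())  # prints [11, 12]
```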
Event-based Interactive Visual Exploration
Many multimedia data are essentially event-driven.
By visually exploring the data instance by instance,
users can directly compare those moments and detect
the similarities between them.
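Comparing instances side by side amounts to cutting a same-length window of the continuous stream around each event instance. The sketch assumes a uniformly sampled stream and windows defined in seconds before and after each onset. Instances whose window would run off either end of the recording are skipped.

```python
import numpy as np


def event_locked_segments(values, sample_rate, onsets, before, after):
    """One row per event instance: samples from `before` s ahead of onset to `after` s after it."""
    values = np.asarray(values, dtype=float)
    pre = int(round(before * sample_rate))
    post = int(round(after * sample_rate))
    segments = []
    for onset in onsets:
        center = int(round(onset * sample_rate))
        if center - pre >= 0 and center + post <= len(values):
            segments.append(values[center - pre:center + post])
    return np.array(segments)


segs = event_locked_segments(np.arange(10), sample_rate=1, onsets=[2, 5, 9], before=1, after=2)
print(segs.tolist())  # prints [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
```

Averaging the rows (`segs.mean(axis=0)`) is one simple way to summarize what the instances have in common.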
Event Grouping
• Users can visually examine each instance of an event
and categorize the instances into groups, which are then saved.
• The overall grouping results can then be visualized in one
single panel.
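Grouping and saving can be sketched as follows. The sketch assumes the user's choices arrive as a mapping from instance id to group name, and that JSON is an acceptable save format. Both are assumptions, since the article does not specify the storage format.

```python
import json
from collections import defaultdict


def group_instances(labels):
    """labels: dict mapping event-instance id -> group name chosen by the user."""
    groups = defaultdict(list)
    for instance_id, group in labels.items():
        groups[group].append(instance_id)
    return {group: sorted(ids) for group, ids in groups.items()}


def groups_to_json(groups):
    """Serialize the grouping so it can be saved and reloaded for the overview panel."""
    return json.dumps(groups, indent=2, sort_keys=True)


groups = group_instances({3: "A", 1: "B", 2: "A"})
print(groups)  # prints {'A': [2, 3], 'B': [1]}
```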
Flexible Interfaces between
Visualization and Data Processing
• The media playback panel allows users to play back video
and audio data at various speeds. On top of this,
– the researchers designed and implemented a critical
component that connects multimedia playback with
visual data mining, linking the raw multimedia data
with the exploration of derived data
(raw multimedia data <-> derived data).
• To make the system compatible with external data
mining tools,
– it allows users to use any programming
language (e.g., MATLAB, R, C/C++) to obtain new
results.
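The article does not specify how data passes between the system and external programs. One simple arrangement, assumed here purely for illustration, is a shared CSV file. The tool exports derived streams, an external MATLAB, R or C/C++ program adds a result column, and the tool reads that column back. The function names are hypothetical.

```python
import csv
import io


def export_streams(f, times, streams):
    """Write a time column plus one column per derived stream to a text file object."""
    names = list(streams)
    writer = csv.writer(f)
    writer.writerow(["time", *names])
    for i, t in enumerate(times):
        writer.writerow([t, *(streams[name][i] for name in names)])


def import_stream(f, column):
    """Read one column (e.g. a result written by an external program) back as floats."""
    return [float(row[column]) for row in csv.DictReader(f)]


if __name__ == "__main__":
    buf = io.StringIO()                       # stands in for a file on disk
    export_streams(buf, [0.0, 0.1], {"a": [1.0, 2.0], "b": [3.0, 4.0]})
    buf.seek(0)
    print(import_stream(buf, "b"))            # prints [3.0, 4.0]
```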
The researchers' Future Work
• to conduct a systematic evaluation of
the prototype system
– using an experimental paradigm,
– to get a better idea of:
• the advantages and limitations of the
current system, and
• what needs to be improved.
Conclusion of the Article
• The visualization tool developed allows
users not only
– to easily examine and synthesize
information into new ideas and
hypotheses, but also
– to quickly quantify and test the insights
gained from visualization.
Multimedia Data Mining for
Traffic Video Sequences
Shu-Ching Chen, Mei-Ling Shyu,
Chengcui Zhang, Jeff Strickrott
Introduction and Motivation
• Traffic video analysis can discover and provide
useful information such as:
– queue detection, vehicle classification, traffic flow,
and incident detection at intersections.
• Some municipalities are installing video camera
systems to monitor and extract traffic control
information from their highways in real time.
Identified Problems
• Current transportation applications and research
work:
– either do not connect to databases or
have limited capabilities to index and store the
collected data, and
– cannot provide organized, unsupervised,
conveniently accessible and easy-to-use
multimedia information to traffic planners.
• To discover important but previously unknown
knowledge in traffic video sequences and provide it
to traffic planners, multimedia data mining
techniques need to be employed.
The Proposed Framework
• Includes:
–Background Subtraction
–Vehicle Object Identification and Tracking
–Multimedia Augmented Transition Network
(MATN) model and
–Multimedia Input Strings
Background Subtraction
• It is a technique for removing non-moving
components from a video sequence.
• This technique was used to enhance the basic
SPCPE (Simultaneous Partition and Class Parameter
Estimation) algorithm, an unsupervised video
segmentation method, to get better segmentation results.
The main assumption is that the camera remains stationary.
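The sketch below shows a common, simple form of background subtraction under the same stationary-camera assumption. The background reference frame is estimated as the per-pixel median over frames, and pixels whose difference from it exceeds a threshold are treated as moving. It does not implement SPCPE, and the threshold of 30 gray levels is an arbitrary assumption.

```python
import numpy as np


def background_reference(frames):
    """Per-pixel median over time; moving vehicles occupy any given pixel only briefly."""
    return np.median(np.asarray(frames, dtype=float), axis=0)


def foreground_mask(frame, background, threshold=30.0):
    """Difference image -> True where a pixel changed by more than `threshold` gray levels."""
    diff = np.abs(np.asarray(frame, dtype=float) - background)
    return diff > threshold


# Synthetic 4x4 frames with a bright "vehicle" pixel moving along the top row.
frames = np.zeros((5, 4, 4))
for k in range(5):
    frames[k, 0, k % 4] = 255
background = background_reference(frames)   # all zeros: the vehicle never dominates a pixel
```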
Object Tracking
• The first step is to extract the segments in each class.
• Then the minimal bounding box and the centroid
point for each segment are obtained.
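The bounding boxes and centroids can be computed as follows. The sketch assumes a segmentation in which each vehicle segment carries its own positive integer label and 0 marks the background; that labeling convention is our assumption.

```python
import numpy as np


def boxes_and_centroids(labels):
    """labels: 2-D integer array; 0 is background, each segment has its own positive id."""
    labels = np.asarray(labels)
    results = {}
    for seg_id in np.unique(labels):
        if seg_id == 0:
            continue
        rows, cols = np.nonzero(labels == seg_id)
        # Minimal bounding box as (top, left, bottom, right), inclusive.
        box = (int(rows.min()), int(cols.min()), int(rows.max()), int(cols.max()))
        # Centroid as the mean (row, column) of the segment's pixels.
        centroid = (float(rows.mean()), float(cols.mean()))
        results[int(seg_id)] = (box, centroid)
    return results


labels = np.zeros((5, 5), dtype=int)
labels[0:2, 0:2] = 1
labels[3:5, 3:5] = 2
print(boxes_and_centroids(labels)[1])  # prints ((0, 0, 1, 1), (0.5, 0.5))
```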
Using MATNs & Multimedia Input Strings
to Model Video Key Frames
• A Multimedia Augmented Transition Network
(MATN) model
– can be represented diagrammatically by a labeled
directed graph, called a transition graph.
• A multimedia input string is
– accepted by the grammar if there is a path of
transitions that corresponds to the sequence of symbols in
the string and leads from a specified initial
state to one of a set of specified final states.
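The acceptance rule can be sketched as a search over a transition graph stored as a dictionary from (state, symbol) to the set of next states. This captures only the labeled-directed-graph part of an MATN; its augmentations (conditions, registers, subnetworks) are omitted. The state names and symbols are hypothetical.

```python
def accepts(transitions, start, finals, symbols):
    """True if some path from `start` consumes every symbol and ends in a final state."""
    current = {start}
    for symbol in symbols:
        # Follow every arc labeled with this symbol from every current state.
        current = {nxt for state in current for nxt in transitions.get((state, symbol), ())}
        if not current:
            return False            # no path can continue
    return bool(current & set(finals))


# Hypothetical transition graph for a two-key-frame string.
transitions = {("S0", "G&V1"): {"S1"}, ("S1", "G&V1&V5"): {"S2"}}
print(accepts(transitions, "S0", {"S2"}, ["G&V1", "G&V1&V5"]))  # prints True
```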
… MATNs and Multimedia Input Strings
• Key frames serve as the indices for a shot.
• In this paper, each frame is divided into nine
sub-regions, each with a corresponding subscript number.
• Each key frame is represented by:
– an input symbol in a multimedia input string;
– the “&” symbol between two vehicle objects,
• which denotes that the vehicle objects appear in the same
frame;
– subscripted numbers,
• which distinguish the spatial positions of the
vehicle objects relative to the target object “ground”.
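Turning one key frame into an input symbol can be sketched under several assumptions. The nine sub-regions form a 3x3 grid numbered 1-9 row by row from the top-left. A vehicle's sub-region is taken from its centroid. The ground is written as `G` and each vehicle as `V` followed by its subscript. The numbering order and the symbol names are assumptions; the article's figure defines its own.

```python
def subregion(x, y, width, height):
    """Map a point to sub-region 1-9 of a 3x3 grid, numbered row by row (assumed order)."""
    col = min(int(3 * x / width), 2)
    row = min(int(3 * y / height), 2)
    return 3 * row + col + 1


def key_frame_symbol(centroids, width, height):
    """'G' (ground) followed by each vehicle with its sub-region subscript, joined by '&'."""
    parts = ["G"] + [f"V{subregion(x, y, width, height)}" for x, y in centroids]
    return "&".join(parts)


print(key_frame_symbol([(10, 10), (150, 150), (290, 290)], 300, 300))  # prints G&V1&V5&V9
```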
[Figures: an example multimedia input string that represents two key frames; the nine sub-regions and their corresponding subscript numbers; an example MATN model]
Experiment Setup
• The traffic video sequence was:
– captured with a Sony Handycam CCD TR64 and
– digitized with a Brooktree Bt848-based capture card
on a Windows NT 2000 Celeron-based platform.
• The video sequence consists of about 16 minutes of
video with approximately constant lighting conditions.
• A small portion of the traffic video is used to
illustrate how the proposed framework can be
applied to traffic applications to answer
spatio-temporal queries such as:
“Estimate the traffic flow of this road
intersection from 8:00 AM to 8:30 AM.”
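Once each vehicle is tracked across frames, a query like the one above reduces to counting vehicles in a time window. The sketch assumes a hypothetical mapping from vehicle id to the time that vehicle first appears. It reports both the count and the equivalent hourly rate.

```python
from datetime import datetime


def traffic_flow(first_seen, start, end):
    """Count vehicles first seen in [start, end) and convert to vehicles per hour."""
    count = sum(1 for t in first_seen.values() if start <= t < end)
    hours = (end - start).total_seconds() / 3600
    return count, count / hours


# Hypothetical tracking output: vehicle id -> first appearance.
first_seen = {
    1: datetime(2024, 1, 1, 8, 5),
    2: datetime(2024, 1, 1, 8, 10),
    3: datetime(2024, 1, 1, 8, 29),
    4: datetime(2024, 1, 1, 8, 31),   # outside the 8:00-8:30 window
}
print(traffic_flow(first_seen, datetime(2024, 1, 1, 8, 0), datetime(2024, 1, 1, 8, 30)))  # prints (3, 6.0)
```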
Experiment Results
• Using the background subtraction technique,
– both the efficiency of the segmentation
process and the accuracy of the segmentation
results are improved, leading to more accurate
video indexing and annotation.
Conclusion
• The proposed framework can model complex
situations such as traffic video for intersection
monitoring.
Figure: Segmentation results and the multimedia input strings for frames 4, 9, 15, 16 and 35.
• The leftmost column gives the original video frames;
• the second column shows difference images obtained by subtracting the background reference frame from the original frames;
• the third column shows the vehicle segments extracted from the video frames; and
• the rightmost column shows the bounding boxes of the vehicle objects.
Tune into the voice of your
customer with voice mining
By Manya Mayes
Introduction
• Understanding customer comments, which arrive as text, audio
and video in the form of verbatim records, e-mail, voice mail,
surveys, the Web and, most recently, social networking
sites (YouTube, Facebook, etc.), will shape the business
transactions of an organization.
• Voice mining in particular is growing. It helps to identify
the reasons customers call, the effectiveness of marketing
campaigns, the competitors most mentioned by your clients
and why certain products sell more than others, and to
predict the customer satisfaction level of every interaction.
• Combining voice capture with business intelligence, analytics
and text mining provides valuable customer intelligence for
marketing and competitive intelligence business functions.
Introduction(Cont.)
• In addition to traditional keyboard-entered customer feedback,
companies may also record the audio of customer
interactions, spoken by both the agent and the customer.
• Manually listening to and interpreting customers’ feedback is often
inaccurate and inconsistent.
• As a result, automated methods are becoming more prevalent.
• The typical approach to understanding customer audio is an
automated phonetic index search, combined with voice-to-text
transcription of particular segments identified by domain experts.
• Stored audio signals can be transcribed and analyzed to predict what is
most likely to happen next, such as the likelihood that the
customer will close his or her account.
• Techniques such as segmentation are used to automatically group or
classify call transcriptions.
The process: analyzing audio data and
Phonetic index search
• Analyzing audio data can help you identify call reasons,
the effectiveness of campaigns and the competitors
mentioned by clients, and can help predict the customer
satisfaction level.
• The audio signal itself can be analyzed to capture a wide variety of
information as metadata:
– The captured metadata fields include call length,
emotion/stress detection, silence, number of holds,
number of transfers and the like.
The process(Cont.)
• Phonemes are the basic units of sound in a
language, and a phonetic index is a partial
transcription of an audio signal.
• Metadata about calls can be used for reporting
purposes and incorporated into analytical models
for discovery purposes, for example to identify
dissatisfied customers.
• A phonetic index search automatically transforms
the captured audio signal into a sequence of
phonemes or sounds.
• Phonetic indexing allows fast searching of the
signal.
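Fast searching over a phonetic index can be sketched as an n-gram index over a phoneme sequence. This assumes the phoneme sequence has already been produced from the audio. Real engines use acoustic models and return scored, approximate matches. The ARPAbet-style symbols in the example are illustrative.

```python
from collections import defaultdict


def build_phonetic_index(phonemes, n=3):
    """Map every run of n phonemes to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(phonemes) - n + 1):
        index[tuple(phonemes[i:i + n])].append(i)
    return index


def search(index, phonemes, query, n=3):
    """Start positions where the query occurs (the query needs at least n phonemes)."""
    # Look up candidates by the query's first n-gram, then verify the full match.
    candidates = index.get(tuple(query[:n]), [])
    return [i for i in candidates if phonemes[i:i + len(query)] == list(query)]


# "the cancel my" as an illustrative phoneme sequence.
phonemes = ["DH", "AH", "K", "AE", "N", "S", "AH", "L", "M", "AY"]
index = build_phonetic_index(phonemes)
print(search(index, phonemes, ["K", "AE", "N", "S", "AH", "L"]))  # prints [2]
```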
Categorizing calls
• Calls are categorized based on the phonetic index search
results and on full-text transcriptions.
• Transcriptions are usually only performed on certain
calls
– e.g., calls where customers suggest they will close their
accounts, cancel their subscriptions or call with service
problems.
• Providing a full transcription of all customer calls
and combining it with the metadata about the call makes it possible to:
– describe the issues customers are calling about and predict
which customers are most likely to close their accounts, etc.,
– allowing appropriate action to be taken before it is too
late.
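Call categorization can be sketched as a set of rules applied to transcripts. This is a minimal rule-based sketch: the category names and regular expressions are hypothetical examples, not the categories used in the article.

```python
import re

# Hypothetical category rules: category name -> pattern over the lower-cased transcript.
CATEGORY_RULES = {
    "account_closure": re.compile(r"\b(close|closing)\b.*\baccount\b"),
    "cancellation": re.compile(r"\bcancel\w*"),
    "service_problem": re.compile(r"\b(outage|not working|no signal)\b"),
}


def categorize(transcript):
    """Return every category whose rule matches the transcript, sorted by name."""
    text = transcript.lower()
    return sorted(name for name, rule in CATEGORY_RULES.items() if rule.search(text))


print(categorize("I want to close my account, the service is not working"))
# prints ['account_closure', 'service_problem']
```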
Voice mining using SAS Text Miner and
its advantage
• SAS can read the outputs of voice capture tools such as
Call Miner, NICE Systems and other similar tools.
• The information provided by the voice capture includes:
– the categories created by the phonetic index search,
– the metadata about the call, and the call transcriptions.
• SAS provides industry-leading data integration with the
ability to access a wide variety of data sources and formats,
enabling information to be delivered to users in a form
they can use.
– SAS Text Miner provides access to more than 200 document formats,
so users are able to gather information from the voice vendors of
their choice.
Voice mining(Cont.)
• Automatically clustering (segmenting) the call documents
helps to understand the types of issues
customers are calling about.
• Profiling these segments using metadata about the call
and related customer information provides further
information about the segments.
• Predictive modeling is a data-driven and
consistent method for understanding what might happen
next, and it enables call center agents to take preventive
actions.
• The customer’s experience over the phone can help
predict loyalty, churn, satisfaction and more.
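The clustering step can be sketched with a bag-of-words representation and k-means in NumPy. This stands in for SAS Text Miner's clustering, which the article does not detail. The deterministic farthest-point initialization and the unit-length word-count vectors are our choices, and every transcript is assumed to contain at least one word.

```python
import numpy as np


def bag_of_words(docs):
    """Unit-length word-count vectors, so Euclidean distance tracks cosine similarity."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    position = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for row, doc in enumerate(docs):
        for word in doc.lower().split():
            X[row, position[word]] += 1
    X /= np.linalg.norm(X, axis=1, keepdims=True)   # assumes no empty documents
    return X, vocab


def kmeans(X, k, iterations=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        # Next center: the document farthest from all centers chosen so far.
        dist = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iterations):
        # Assign every document to its nearest center.
        labels = np.argmin(np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2), axis=1)
        # Move each non-empty center to the mean of its documents.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


docs = ["bill charge wrong", "wrong charge on bill", "modem signal lost", "no signal modem"]
X, vocab = bag_of_words(docs)
labels = kmeans(X, k=2)
```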
Integrating structured data for segment
profiling
• To get an even clearer picture of the results of
text clustering, related structured data (metadata
about the call and related customer information)
was used to further describe the issues.
• The results show that call length and the call hold
indicator provide additional information in the
billing issues cluster.
• Terms that are highly associated with the
selected term are displayed in a hyperbolic tree
structure.
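Profiling then summarizes the structured call metadata per cluster. The sketch computes the number of calls, the mean call length and the share of calls placed on hold for each cluster label. The field names and the example values are hypothetical, not results from the article.

```python
from collections import defaultdict


def profile_segments(labels, call_lengths, on_hold):
    """Per cluster: number of calls, mean call length (seconds), share of calls put on hold."""
    groups = defaultdict(list)
    for label, length, hold in zip(labels, call_lengths, on_hold):
        groups[label].append((length, hold))
    return {
        label: {
            "calls": len(rows),
            "mean_length": sum(length for length, _ in rows) / len(rows),
            "hold_rate": sum(1 for _, hold in rows if hold) / len(rows),
        }
        for label, rows in groups.items()
    }


# Hypothetical cluster labels and call metadata.
profile = profile_segments(["billing", "billing", "technical"], [300, 500, 200], [True, False, False])
print(profile["billing"])  # prints {'calls': 2, 'mean_length': 400.0, 'hold_rate': 0.5}
```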
Predicting Cancellation of Subscription
• One instance
• To predict the likelihood that a customer will cancel
a subscription, a churn prediction model was used
whose input data includes:
– the call outcome (the result of the call), showing whether or not the
customer cancelled his or her subscription, and
– the data describing the interaction with the customer, such
as the call transcriptions, the metadata about the
calls, demographics, purchasing behavior and
frequency/monetary information.
• The model to predict cancellation of subscription should
use historical data up to, but not including, the call
where the customer actually cancels his or her
subscription.
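The rule that training data must stop just before the cancellation call can be sketched per customer as below. The (timestamp, transcript, cancelled) tuple layout is an assumption.

```python
def training_history(calls):
    """calls: one customer's calls as (timestamp, transcript, cancelled) tuples.

    Returns (history, label): the calls strictly before the first cancellation
    call, and 1 if the customer cancelled, else 0.
    """
    ordered = sorted(calls, key=lambda call: call[0])
    for i, (_, _, cancelled) in enumerate(ordered):
        if cancelled:
            return ordered[:i], 1       # exclude the cancellation call itself
    return ordered, 0


calls = [(3, "please cancel", True), (1, "billing question", False), (2, "modem issue", False)]
history, label = training_history(calls)
print(label, [t for t, _, _ in history])  # prints 1 [1, 2]
```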
Predicting (Cont)
• An artificial value of 1 is assigned whenever the term
“cancel” or any of its variations (such as cancels,
cancelled, cancelling, cancellation, etc.) is found, and
a value of 0 otherwise.
• The Text Miner node then takes the call transcriptions,
uses linguistic techniques to identify terms,
multiple-word terms, parts of speech, stems, etc., and
uses statistical techniques to transform the customer
feedback text into a numeric representation.
• The data is then passed to the Regression, Neural
Network and Decision Tree nodes to build multiple
competing models using the churn outcome and the
text transformations.
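The cancel indicator and a very simple numeric transformation of text can be sketched as follows. The regular expression covers the listed variants and also the US spelling "canceled". The term-count vectors are a stand-in for Text Miner's linguistic and statistical processing, not a reproduction of it.

```python
import re
from collections import Counter

# Matches cancel, cancels, cancelled, canceled, cancelling, cancellation, ...
CANCEL_PATTERN = re.compile(r"\bcancel\w*", re.IGNORECASE)


def cancel_flag(transcript):
    """1 if any variation of 'cancel' appears in the transcript, else 0."""
    return 1 if CANCEL_PATTERN.search(transcript) else 0


def term_counts(transcripts, vocabulary):
    """One row of term counts per transcript over a fixed vocabulary."""
    rows = []
    for text in transcripts:
        counts = Counter(re.findall(r"[a-z']+", text.lower()))
        rows.append([counts[term] for term in vocabulary])
    return rows


print(cancel_flag("I cancelled yesterday"), term_counts(["bill bill wrong"], ["bill", "wrong", "modem"]))
# prints 1 [[2, 1, 0]]
```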
Predicting(Cont.)
• The metadata about the call and related customer
information also may be used at this time to
improve model lift.
• The Model Comparison node then takes the results
of each of the preceding models and selects the
“best” model based on how accurately each model
classifies the text as predicting churn or no churn.
• Once the best model has been selected, its
underlying code is used to apply the model to
new data. This is known as model scoring or model
deployment.
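Model comparison and scoring can be sketched with models represented as plain Python callables. The selection statistic here is accuracy on labeled validation data, which is an assumption. The model names and the data are hypothetical.

```python
def accuracy(model, rows, labels):
    """Share of rows for which the model's 0/1 prediction matches the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)


def select_best(models, rows, labels):
    """models: dict name -> callable(row) returning 0/1. Returns (best name, all scores)."""
    scores = {name: accuracy(m, rows, labels) for name, m in models.items()}
    return max(scores, key=scores.get), scores


def score(model, new_rows):
    """Model scoring/deployment: apply the selected model to new data."""
    return [model(r) for r in new_rows]


# Hypothetical competing models and validation data (1 = churn).
models = {
    "keyword": lambda r: r["cancel_flag"],
    "long_call": lambda r: int(r["call_length"] > 400),
    "never": lambda r: 0,
}
rows = [{"cancel_flag": 1, "call_length": 300}, {"cancel_flag": 0, "call_length": 500},
        {"cancel_flag": 1, "call_length": 700}, {"cancel_flag": 0, "call_length": 200}]
best, scores = select_best(models, rows, [1, 0, 1, 0])
print(best, scores["long_call"])  # prints keyword 0.5
```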
Predicting (Cont.)
• The underlying SAS code behind the predictive model
described above was saved and registered as a SAS
Stored Process via the SAS Management Console.
• Several stored processes were created to highlight
various deployments of the MSNTV transcribed data.
• Since the current voice technology does not allow for
real-time transcription, voice captures cannot be
deployed in real time.
• The results are customized to show the original text
and the corresponding prediction of service
cancellation.
Predicting (Cont)
• The user can manipulate the resulting
spreadsheet to show a graphical
representation of the cancellations of
subscriptions. The SAS tasks available via the
SAS Add-In for Microsoft Office are displayed.
• SAS BI dashboards display additional
information about the MSNTV data. The
dashboard is configured to show several views
of the call center data.
Predicting (Cont)
• The propensity-to-cancel indicator
shows about a 38 percent
chance that the customers
will cancel their
subscriptions.
• This predictive power can enable
companies to retain key
customers and avoid
the costs associated
with undue churn.
Conclusion
• Using voice mining tools and creating stored processes
can make valuable information and knowledge
available to business analysts and managers who
might not have had access to this information previously.
• Despite data quality issues, SAS Text Miner did a
remarkable job of finding consistent patterns in the
customer and agent comments.
• By actually hearing and understanding what
customers are already telling you, you can use numerous
indicators to build loyalty, reduce churn
and make your products safer.
Recommendations
• Despite the importance of multimedia mining, there is
no local research on multimedia mining and only a little
research on multimedia retrieval (especially image retrieval).
• Therefore, we recommend conducting research on
multimedia mining for audio, speech and video, as well as on
advanced image retrieval systems.
• Organizations such as libraries, museums and other information
centers (like television and radio broadcasters) that have
digital repositories should take advantage of the benefits of
applying multimedia mining.
• Other organizations (such as transportation and traffic offices)
are also advised to digitize the information they keep
in non-computer-readable formats and to apply multimedia
mining to it.
Multimedia Mining

Multimedia Mining

  • 1.
  • 2.
    Presentation Outline: • Introductionto MM • Article Reviews: 1. Visual Mining of Multimedia Data for Social and Behavioral Studies 2. Multimedia Data Mining for Traffic Video Sequences 3. Tune into the voice of your customer with voice mining • Conclusion • Recommendations
  • 3.
    Introduction • Advances inmultimedia acquisition and storage technology have led to tremendous growth in very large and detailed multimedia databases. • A large amount of high-resolution high-quality multimedia data has been collected in research laboratories in various scientific disciplines, especially in social, behavioral and cognitive studies. • If these multimedia files are analyzed, useful information to users can be revealed.
  • 4.
    … Introduction • Multimediamining deals with the extraction of implicit knowledge, multimedia data relationships, or other patterns not explicitly stored in multimedia files. (S. Kotsiantis et. al, 2006) • Multimedia mining is an interdisciplinary endeavor that draws upon expertise in computer vision, multimedia processing, multimedia retrieval, data mining, machine learning, database and artificial intelligence.
  • 5.
    … Introduction • Howto automatically and effectively discover new knowledge from rich multimedia data poses a compelling challenge. • Multimedia data mining consists of two stages. 1) Researchers extract some derived data from raw multimedia data. • This step can be implemented by human coding or by using image/speech processing programs. 1) Researchers work on derived data with the goal to find interesting patterns.
  • 6.
    Visual Mining ofMultimedia Data for Social and Behavioral Studies Chen Yu, Yiwen Zhong, Thomas Smith, Ikhyun Park, Weixia Huang
  • 7.
    Visualization approaches formultivariate data • TimeSearcher – is a time series exploratory and visualization tool that allows users to query time series. • ThemeRiver – is used to visualize thematic changes in large document collections. • VizTree – is designed to visually mine and monitor massive time series data. • Spiral – is mainly used to compare and analyze periodic structures in time series data, • Van Wijk et al – designed a cluster and calendar-based approach for the visualization of calendar-based data.
  • 8.
    Identified Problems • Currentmethods of visualization deal with linear time or highly periodic time; – not designed to handle event-based data which is typical in multimedia applications. • Those methods focus on visualization, navigation, or query only. Objective • This new approach provides an interactive tool to integrate visualization with data mining.
  • 10.
    Multimedia Dataset Used •Video: – there were three video streams recorded simultaneously with the frequency of 10 frames per second, and the resolution of each frame is 320x240. • Audio: – The speech of the participants was recorded at a frequency of 44.1 kHz. • Motion tracking: – there were two sensors, one on each participant’s head. Each sensor provided 6 dimensional (x, y, z, head, pitch, and roll) data points at a frequency of 120Hz. • In total, the dataset consists of about 90,000 image frames, 864,000 position data points, and 50 minutes of speech.
  • 11.
    Visualization of MultimediaData There are two major display components in the application: a multimedia playback window and a visualization window. to visually explore the derived data streams and discover new patterns and findings
  • 12.
    Data Representation andVisualization • The time-based /temporal data can be categorized into two kinds: 1. CONTINUOUS VARIABLES: • related to time points (a series of single measurement at particular moments in time) 2. EVENT VARIABLES: • related to time intervals (e.g. the onset and offset of an event)
  • 13.
    (1) Continuous TimeSeries Data • 3 ways to visually explore continuous time series data: {1} as individual data streams {2} as a set of multiple data streams {3} as an arithmetic combination of multiple data streams
  • 14.
    1. Using curvesto visualize individual data streams • A novel feature added -> HISTOGRAM DISPLAY. • The purpose is to allow users to explore individual data streams and examine both the overall statistics of a data stream (Global Histogram) and the statistics within a local window (Local Histogram).
  • 15.
    2. Using gray-levelrepresentation to visualize a set of multiple data streams • Purpose ->to visually display and explore two kinds of information: (1) possible correlation between multiple data streams (2) interesting joint patterns across multiple data streams.
  • 16.
    3. Using areagraphs to visualize an arithmetic combination of multiple data streams • Users can combine multiple temporal variables together (by + and -) in various ways and then visually explore the combined distribution.
  • 17.
    (2) Event Data •Events are presented as bars of color, with their size on screen corresponding to their duration. • Users can visually explore (1) freq. of event (2) its duration and (3) its periodicity
  • 18.
    To handle potentialmore complex patterns involving more variables and logic operations, users can define a new event variable.
  • 19.
    (3) Concurrent visualizationof Continuous and Event variables The display panel will highlight those continuous values at the moments when the selected events happen.
  • 20.
    Event-based Interactive VisualExploration By visually exploring the data – instance by instance, users can directly compare those moments to detect the similarities between these. many multimedia data are essentially event-driven.
  • 21.
    Event Grouping • Userscan visually examine each instance of an event, and categorize the instances into groups. -> Saved • The overall grouping results can then be visualized in one single panel.
  • 22.
    Flexible Interfaces between Visualizationand Data Processing • The media playback panel allows users to play back video and audio data at various speeds. On the top of this, – The researchers designed and implemented one critical component to connect multimedia playback with visual data mining raw multimedia data <-> exploring derived data • To increase the flexibility to be compatible with data mining, – this system allows users to use any programming language (like: MatLab, R, C/C++) to obtain new results.
  • 23.
    The researchers' FutureWork • to conduct a systematical evaluation of the prototype system –using experimental paradigm –to have a better idea of: • what are advantages and limitations of the current system and • what will need to be improved.
  • 24.
    Conclusion of theArticle • The visualization tool developed allows users –To easily examine and synthesize information into new ideas and hypotheses, but also –quickly quantify and test the insights gained from visualization.
  • 25.
    Multimedia Data Miningfor Traffic Video Sequences Shu-Ching Chen, Mei-Ling Shyu, Chengcui Zhang, Jeff Strickrott
  • 26.
    Introduction and Motivation •Traffic video analysis can discover and provide useful Information such as: – queue detection, vehicle classification, traffic flow, and incident detection at the Intersections. • Some municipalities are installing video camera systems to monitor and extract traffic control information from their highways in real time.
  • 27.
    Identified Problems • Thecurrent transportation applications and research work either: – Do not connect to databases or – have limited capabilities to index and store the collected data – cannot provide organized, unsupervised, conveniently accessible and easy-to-use multimedia information to traffic planners. • In order to discover and provide some important but previously unknown knowledge from the traffic video sequences to the traffic planners, multimedia data mining techniques need to be employed.
  • 28.
    The Proposed Framework •Includes: –Background Subtraction –Vehicle Object Identification and Tracking –Multimedia Augmented Transition Network (MATN) model and –Multimedia Input Strings
  • 29.
    Background Subtraction • Itis a technique to remove non-moving components from a video sequence. • This technique was used: to enhance the basic SPCPE algorithm (Simultaneous Partition and Class Parameter Estimation) (unsupervised video segmentation method) to get better segmentation results.
  • 30.
    The main assumptionis that the camera remains stationary
  • 31.
    Object Tracking • The1st step -> to extract the segments in each class. • Then the minimal bounding box and the centroid point for each segment are obtained.
  • 32.
    Using MATNs &Multimedia Input Strings to Model Video Key Frames • A Multimedia Augmented Transition Network (MATN) model – can be represented diagrammatically by a labeled directed graph, called a transition graph. • A Multimedia Input String is –accepted by the grammar if there is a path of transitions which corresponds to the sequence of symbols in the string and which leads from a specified initial state to one of a set of specified final states.
  • 33.
    … MATNs andMultimedia Input Strings • Key frames play as the indices for a shot. • In this paper, each frame is divided into nine sub- regions with the corresponding subscript numbers. • Each key frame is represented by: – an input symbol in a multimedia input string – “&” symbol between two vehicle objects • is used to denote that the vehicle objects appear in the same frame. – subscripted numbers • are used to distinguish the relative spatial positions of the vehicle objects relative to the target object “ground”.
  • 34.
    Multimedia Input Stringthat represents two key frames Example: the nine sub-regions and their corresponding subscript numbers an example MATN model
  • 35.
    Experiment Setup • Thetraffic video sequence was: – captured with a Sony Handycam CCD TR64 and – digitized with an Brooktree Bt848 based capture card on a Windows NT 2000 Celeron-based platform. • The video sequence consists of about 16 minutes of video with approximately constant lighting conditions. • A small portion of the traffic video is used to illustrate how the proposed framework can be applied to traffic applications to answer spatio- temporal queries like: “Estimate the traffic flow of this road intersection from 8:00 AM to 8:30 AM.”
  • 36.
    Experiment Results • Usingthe background subtraction technique, – both the efficiency of the segmentation process and the accuracy of the segmentation results are improved achieving more accurate video indexing and annotation. Conclusion • The proposed framework can model complex situations such as traffic video for intersection monitoring.
  • 37.
    • Segmentation resultsas well as the multimedia input strings for frames 4, 9, 15, 16 and 35. • The leftmost column gives the original video frames; • the second column shows difference images obtained by subtracting the background reference frame from the original frames; • the third column shows the vehicle segments extracted from the video frames, and • the rightmost column shows the bounding boxes of the vehicle objects
  • 38.
    Tune into thevoice of your customer with voice mining By Manya Mayes
  • 39.
    Introduction • Understanding customercomments coming in the forms text, audio and video that are word for word records, e-mail, voice mail, surveys and the Web, and most recently via social networking sites (YouTube, Facebook, etc.) will determine the business transaction of an organization. • Especially the vice mining is getting growth and helps to identify the reasons for call point, the effectiveness of marketing campaigns, the competitors most mentioned by your clients, why certain products sell more than others, and predict the customer satisfaction level of every interaction. • Combing voice capture with business intelligence, analytics and text mining provides valuable customer intelligence for marketing and competitive intelligence business functions.
  • 40.
    Introduction(Cont.) • In additionto the traditional keyboard-entered comments of customer feedback, companies may also record the audio of these customer interactions spoken by both the agent and the customer. • The manual listening and interpreting customers’ feedback is often inaccurate and inconsistent. • As a result, automated methods are becoming more prevalent. • An automated phonetic index search is the typical approach to understand customer audio information using particular segments voice-to-text transcription that is identified by domain expertise. • Stored audio signals can be transcribed and analyzed to predict what is most likely to happen next such as determining the likelihood that the customer will close his or her account. • Techniques such as segmentation are used to automatically group or classify call transcriptions.
  • 41.
    The process: analyzing audio data and phonetic index search • Analyzing audio data can help you identify the reasons for calls, the effectiveness of campaigns and the competitors mentioned by clients. It can also help predict the customer satisfaction level. • The audio signal itself, together with its metadata, can be analyzed for a wide variety of information – captured metadata fields include call length, emotion/stress detection, silence, number of holds, number of transfers and the like.
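Of the metadata fields listed, silence is one that can be derived directly from the audio signal. One simple way to measure it is to count low-energy frames. The sketch assumes 8 kHz telephone audio with samples normalized to [-1, 1], 20 ms frames and a fixed energy threshold; all of these are our assumptions, not the article's settings. The `call_metadata` record and its field names are also hypothetical.

```python
import math


def silence_ratio(samples: list[float], frame_size: int = 160, threshold: float = 0.01) -> float:
    """Fraction of fixed-size frames whose RMS energy is below `threshold`.

    frame_size=160 samples is 20 ms at an assumed 8 kHz rate; threshold is an assumed value.
    """
    frames = [samples[i:i + frame_size] for i in range(0, len(samples), frame_size)]
    if not frames:
        return 0.0
    silent = 0
    for frame in frames:
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms < threshold:
            silent += 1
    return silent / len(frames)


def call_metadata(samples: list[float], rate: int, holds: int, transfers: int) -> dict:
    """Assemble a hypothetical metadata record; holds/transfers would come from the phone system."""
    return {
        "call_length_s": len(samples) / rate,
        "silence_ratio": silence_ratio(samples),
        "holds": holds,
        "transfers": transfers,
    }


# One second of a 440 Hz tone followed by one second of silence, at 8 kHz.
rate = 8000
audio = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)] + [0.0] * rate
print(call_metadata(audio, rate, holds=1, transfers=0))
# {'call_length_s': 2.0, 'silence_ratio': 0.5, 'holds': 1, 'transfers': 0}
```

Emotion/stress detection needs acoustic models well beyond a simple energy threshold, so the sketch does not attempt it.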
  • 42.
    The process (Cont.) • Phonemes are the basic units of sound in a language, and a phonetic index is a partial transcription of an audio signal. • Metadata about calls can be used for reporting. It can also be incorporated into analytical models for discovery purposes, such as identifying a dissatisfied customer. • A phonetic index search automatically transforms the captured audio signal into a sequence of phonemes (sounds). • Phonetic indexing allows fast searching of the signal.
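The idea of a phonetic index can be sketched as an inverted index over phoneme n-grams. The sketch assumes that a recognition step has already turned each call into a list of phoneme symbols, written here in ARPAbet-style notation. Real phonetic search engines use their own, more error-tolerant matching, so this only illustrates why lookups are fast: the index avoids scanning every call from start to finish.

```python
from collections import defaultdict


def build_phonetic_index(calls: dict[str, list[str]], n: int = 3) -> dict:
    """Map each phoneme n-gram to the set of (call_id, start_position) where it occurs."""
    index = defaultdict(set)
    for call_id, phonemes in calls.items():
        for i in range(len(phonemes) - n + 1):
            index[tuple(phonemes[i:i + n])].add((call_id, i))
    return index


def phonetic_search(index: dict, calls: dict[str, list[str]], query: list[str], n: int = 3) -> set[str]:
    """Return the ids of calls whose phoneme stream contains `query` exactly."""
    if len(query) < n:
        raise ValueError("query must contain at least n phonemes")
    hits = set()
    # The index narrows the search to positions where the query's first n-gram occurs ...
    for call_id, pos in index.get(tuple(query[:n]), set()):
        # ... and each candidate is then verified against the full query.
        if calls[call_id][pos:pos + len(query)] == query:
            hits.add(call_id)
    return hits


calls = {
    "call-1": "HH AY AY W AA N T T UW K AE N S AH L".split(),  # "hi, I want to cancel"
    "call-2": "M AY B IH L IH Z R AO NG".split(),               # "my bill is wrong"
}
index = build_phonetic_index(calls)
print(phonetic_search(index, calls, "K AE N S AH L".split()))  # {'call-1'}
```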
  • 43.
    Categorizing calls • Calls are categorized using the results of the phonetic index search and of full-text transcription. • Transcriptions are usually performed only on certain calls, e.g., calls where customers suggest they will close their accounts or cancel their subscriptions, or calls about service problems. • Providing a full transcription of all customer calls and combining it with the metadata about each call can: – describe the issues customers are calling about and predict which customers are most likely to close their accounts, etc.; – allow appropriate action to be taken before it is too late.
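Rule-based call categorization and selective transcription can be sketched as follows. The category names and trigger phrases are hypothetical stand-ins for rules a domain expert would define. Each call's text is assumed to be whatever the phonetic search recognized for it. The second function flags only the calls that hit a category of interest, since those are the ones worth a full transcription.

```python
# Hypothetical rules: category -> trigger phrases a phonetic search would look for.
CATEGORY_RULES = {
    "account_closure": ["close my account", "close the account"],
    "cancellation": ["cancel my subscription", "cancel the subscription"],
    "service_problem": ["not working", "outage", "no signal"],
}


def categorize(text: str, rules: dict[str, list[str]] = CATEGORY_RULES) -> list[str]:
    """Return every category with at least one trigger phrase in `text` (case-insensitive)."""
    lowered = text.lower()
    return [name for name, phrases in rules.items()
            if any(phrase in lowered for phrase in phrases)]


def select_for_transcription(calls: dict[str, str], wanted: set[str]) -> list[str]:
    """Pick the calls worth a full transcription: those hitting a wanted category."""
    return [call_id for call_id, text in calls.items()
            if wanted.intersection(categorize(text))]


calls = {
    "c1": "My box is not working and I want to cancel my subscription",
    "c2": "Just checking my balance",
    "c3": "Please close my account",
}
print(categorize(calls["c1"]))  # ['cancellation', 'service_problem']
print(select_for_transcription(calls, {"account_closure", "cancellation"}))  # ['c1', 'c3']
```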
  • 44.
    Voice mining using SAS Text Miner and its advantages • SAS can read the audio outputs captured using Call Miner, NICE Systems or other similar tools. • The information provided by the voice capture includes: – the categories created by the phonetic index search, – the metadata about the call, and the call transcriptions. • SAS provides industry-leading data integration. It can access a wide variety of data sources and formats, so that information can be delivered to users in a form they can use. – SAS Text Miner provides access to more than 200 document formats, so users can gather information from the voice vendors of their choice.
  • 45.
    Voice mining (Cont.) • Automatically clustering (segmenting) the documents reveals the main groups of calls. Profiling these segments with the metadata about the call and related customer information provides further information about each segment. – This method is used to understand the types of issues customers are calling about. • Predictive modeling is a data-driven, consistent method for understanding what might happen next, and it enables call center agents to take preventive action. • The customer's experience over the phone can help predict loyalty, churn, satisfaction and more.
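The slides do not say which clustering algorithm SAS Text Miner applies, so the sketch below swaps in a generic one: k-means on bag-of-words vectors with cosine similarity. It assumes simple lower-case word tokens. It also seeds the centroids with the first k transcripts so that the result is deterministic. Both are illustrative choices, not SAS behavior.

```python
import math
import re
from collections import Counter


def tf_vector(text: str) -> Counter:
    """Bag-of-words term counts over lower-cased alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def kmeans_text(texts: list[str], k: int, max_iter: int = 10) -> list[int]:
    """Cluster transcripts into k segments and return one label per transcript."""
    vectors = [tf_vector(t) for t in texts]
    centroids = [Counter(v) for v in vectors[:k]]  # deterministic seeding (assumption)
    labels: list[int] = []
    for _ in range(max_iter):
        # Assignment step: each transcript joins its most similar centroid.
        new_labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # converged: no transcript changed segment
        labels = new_labels
        # Update step: a centroid is the summed counts of its members
        # (cosine similarity ignores scale, so a sum points the same way as a mean).
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = sum(members, Counter())
    return labels


texts = [
    "My bill is wrong, I was charged twice",
    "The internet connection keeps dropping",
    "I was charged a late fee on my bill",
    "Connection dropping again, internet is down",
]
print(kmeans_text(texts, k=2))  # [0, 1, 0, 1]: billing calls vs. connectivity calls
```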
  • 46.
    Integrating structured data for segment profiling • To get an even clearer picture of the text clustering results, related structured data (metadata about the call and related customer information) was used to further describe the issues. • The results show that call length and the call hold indicator provide additional information about the billing issues cluster. • Terms that are highly associated with a selected term are displayed in a hyperbolic tree structure.
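Segment profiling of this kind amounts to comparing each cluster's average metadata with the overall average. In the sketch, the field names (`call_length`, `on_hold`), the cluster labels and all the numbers are made up for illustration. They are not the article's results.

```python
from statistics import mean


def profile_segments(labels: list[str], metadata: list[dict[str, float]]):
    """Return (overall averages, per-segment averages) of the numeric metadata fields."""
    fields = list(metadata[0])
    overall = {f: mean(m[f] for m in metadata) for f in fields}
    profiles = {}
    for seg in sorted(set(labels)):
        members = [m for m, lab in zip(metadata, labels) if lab == seg]
        profiles[seg] = {f: mean(m[f] for m in members) for f in fields}
    return overall, profiles


# Hypothetical cluster labels and call metadata (length in seconds, 1 = customer put on hold).
labels = ["billing", "billing", "connectivity", "connectivity"]
metadata = [
    {"call_length": 600, "on_hold": 1},
    {"call_length": 480, "on_hold": 1},
    {"call_length": 240, "on_hold": 0},
    {"call_length": 300, "on_hold": 0},
]
overall, profiles = profile_segments(labels, metadata)
print(overall)
print(profiles)
```

With these toy numbers, the billing segment averages 540 seconds per call against 405 seconds overall, and every billing call involved a hold. This is the kind of contrast that makes a segment's profile informative.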
  • 47.
    Predicting Cancellation of Subscription • One instance • To predict the likelihood that a customer will cancel a subscription, a churn prediction model was used. It draws on: – the outcome (result) of the call, showing whether or not the customer cancelled his or her subscription; – data describing the interaction with the customer, such as the call transcriptions, the metadata about the calls, demographics, purchasing behavior and frequency/monetary information. • The model should use historical data up to, but not including, the call in which the customer actually cancels his or her subscription.
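The rule about using history only up to, but not including, the cancelling call can be sketched as a small data-preparation step. The `Interaction` record, its fields and the sample calls are hypothetical. The point is that the model input for a churner stops before the call that reveals the outcome, so the outcome cannot leak into the inputs.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Interaction:
    customer_id: str
    when: date
    transcript: str
    cancelled: bool  # True only for the call in which the customer cancelled


def training_rows(interactions: list[Interaction]) -> dict[str, tuple[list[Interaction], int]]:
    """Per customer: (history usable as model input, churn label).

    For churners the history stops *before* the cancelling call;
    customers who never cancelled keep their full history and get label 0.
    """
    by_customer: dict[str, list[Interaction]] = {}
    for item in sorted(interactions, key=lambda i: i.when):
        by_customer.setdefault(item.customer_id, []).append(item)
    rows = {}
    for customer, calls in by_customer.items():
        cutoff = next((i for i, call in enumerate(calls) if call.cancelled), None)
        rows[customer] = (calls, 0) if cutoff is None else (calls[:cutoff], 1)
    return rows


history = [
    Interaction("A", date(2009, 1, 5), "question about my bill", False),
    Interaction("A", date(2009, 2, 9), "still overcharged", False),
    Interaction("A", date(2009, 3, 1), "please cancel my service", True),
    Interaction("B", date(2009, 1, 20), "how do I reset my box", False),
]
rows = training_rows(history)
print([c.transcript for c in rows["A"][0]], rows["A"][1])
# ['question about my bill', 'still overcharged'] 1
print(rows["B"][1])  # 0
```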
  • 48.
  • 49.
    Predicting (Cont.) • An artificial value of 1 is assigned whenever the term “cancel” or any of its variations (such as cancels, cancelled, cancelling, cancellation, etc.) is found, and a value of 0 otherwise. • The Text Miner node then takes the call transcriptions and uses linguistic techniques to identify terms, multiple-word terms, parts of speech, stems, etc. It then uses statistical techniques to transform the customer feedback text into numbers. • The data is then passed to the Regression, Neural Network and Decision Tree nodes, which build multiple competing models using the churn outcome and the text transformations.
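Both steps can be sketched in a few lines: the artificial 0/1 target built from variations of “cancel”, and the turning of text into numbers. SAS Text Miner's own transformation, with its stemming, term weighting and dimension reduction, is not reproduced here. A plain document-term count matrix stands in for it as an assumption.

```python
import re
from collections import Counter

# "cancel" plus any word ending: cancel, cancels, cancelled, canceled, cancelling, cancellation, ...
CANCEL_PATTERN = re.compile(r"\bcancel\w*", re.IGNORECASE)


def churn_flag(transcript: str) -> int:
    """Artificial target: 1 if any variation of "cancel" occurs, otherwise 0."""
    return 1 if CANCEL_PATTERN.search(transcript) else 0


def term_matrix(transcripts: list[str]) -> tuple[list[str], list[list[int]]]:
    """Turn transcripts into numbers: a sorted vocabulary plus a document-term count matrix."""
    docs = [Counter(re.findall(r"[a-z]+", t.lower())) for t in transcripts]
    vocab = sorted(set().union(*docs))
    return vocab, [[doc[term] for term in vocab] for doc in docs]


transcripts = ["I want to cancel my plan", "My bill is wrong", "Cancellation was requested"]
print([churn_flag(t) for t in transcripts])  # [1, 0, 1]
print(term_matrix(["bill is wrong", "bill again"]))
# (['again', 'bill', 'is', 'wrong'], [[0, 1, 1, 1], [1, 1, 0, 0]])
```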
  • 50.
    Predicting (Cont.) • The metadata about the call and related customer information may also be used at this stage to improve model lift. • The Model Comparison node then takes the results of each of the preceding models and selects the “best” model: the one that most accurately classifies the text as predicting churn or no churn. • Once the best model has been selected, its underlying code is used to apply the model to new data. This is known as model scoring or model deployment.
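The comparison-then-scoring flow can be sketched independently of SAS. The three rule functions are trivial, hypothetical stand-ins for the Regression, Neural Network and Decision Tree nodes. The comparison criterion is assumed to be the misclassification rate on held-out data; the slide only says that the best model is the one that classifies correctly.

```python
from typing import Callable, Sequence

# A "model" here is any function mapping a feature vector to 1 (churn) or 0 (no churn).
Model = Callable[[Sequence[float]], int]


def misclassification_rate(model: Model, X: list[Sequence[float]], y: list[int]) -> float:
    """Share of validation rows the model classifies incorrectly."""
    wrong = sum(1 for features, label in zip(X, y) if model(features) != label)
    return wrong / len(y)


def select_best(models: dict[str, Model], X: list[Sequence[float]], y: list[int]) -> str:
    """Model comparison: the name of the model with the lowest misclassification rate."""
    return min(models, key=lambda name: misclassification_rate(models[name], X, y))


def score(model: Model, new_rows: list[Sequence[float]]) -> list[int]:
    """Model scoring (deployment): apply the chosen model to new data."""
    return [model(row) for row in new_rows]


# Hypothetical competing models over features [cancel_mentions, number_of_holds].
models: dict[str, Model] = {
    "cancel_rule": lambda f: 1 if f[0] > 0 else 0,
    "holds_rule": lambda f: 1 if f[1] >= 2 else 0,
    "always_stay": lambda f: 0,
}
X_valid = [[1, 0], [0, 3], [2, 1], [0, 0]]
y_valid = [1, 0, 1, 0]
best = select_best(models, X_valid, y_valid)
print(best, score(models[best], [[0, 1], [3, 0]]))  # cancel_rule [0, 1]
```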
  • 51.
    Predicting (Cont.) • The underlying SAS code behind the predictive model described above was saved and registered as a SAS Stored Process via the SAS Management Console. • Several stored processes were created to highlight various deployments of the transcribed MSNTV data. • Because current voice technology does not allow real-time transcription, voice captures cannot be deployed in real time. • The results are customized to show the original text alongside the corresponding prediction of service cancellation.
  • 52.
    Predicting (Cont.) • The user can manipulate the resulting spreadsheet to show a graphical representation of subscription cancellations. The SAS tasks available via the SAS Add-In for Microsoft Office are displayed. • SAS BI dashboards display additional information about the MSNTV data; the dashboard is configured to show several views of the call center data.
  • 53.
    Predicting (Cont.) • The propensity-to-cancel indicator shows about a 38 percent chance that these customers will cancel their subscriptions. • This predictive power can enable companies to retain key customers and avoid the costs associated with undue churn.
  • 54.
    Conclusion • Using voice mining tools and creating stored processes can make valuable information and knowledge available to business analysts and managers who might not previously have had access to it. • Despite data quality issues, SAS Text Miner did a remarkable job of finding consistent patterns in the customer and agent comments. • By actually hearing and understanding what customers are already telling you, you can use numerous indicators to build loyalty, reduce churn and make your products safer.
  • 55.
    Recommendations • Despite the importance of multimedia mining, there is no local research on it, and there is only a little research on multimedia retrieval (especially image retrieval). • Therefore, we recommend conducting research on multimedia mining for audio, speech and video, as well as on advanced image retrieval systems. • Organizations with digital repositories, such as libraries, museums and other information centers (e.g., television and radio broadcasters), should take advantage of multimedia mining. • Other organizations (such as transportation and traffic offices) are also encouraged to digitize information kept in non-computer-readable formats and apply multimedia mining to it.

Editor's Notes

  • #12 The multimedia playback window is a digital media player that allows users to access video and audio data and play them back in various ways. The visualization window is the main tool that allows users to visually explore the derived data streams and discover new patterns and findings.
  • #15 The local histogram is updated as users move the zoom box while the global histogram is constant.
  • #16 The local histogram is updated as users move the zoom box while the global histogram is constant.
  • #18 Our visualization of multiple event variables allows users to see not only individual events but also joint events.
  • #21 The researchers observed that much multimedia data is essentially event-driven.
  • #23 The tool provides flexible interfaces between visualization and data mining, as long as users write the results into text files with predefined formats. It is important that users can refer to the raw multimedia data while exploring derived data.
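Editor's notes #15 and #16 describe a local histogram that follows the zoom box while the global histogram stays fixed. The sketch below shows that behavior in minimal form. It assumes a derived data stream of (timestamp, value) samples and equal-width bins over the full stream's value range; the notes do not specify the binning, so both are our assumptions.

```python
def histogram(values: list[float], bins: int, lo: float, hi: float) -> list[int]:
    """Counts of `values` in `bins` equal-width bins spanning [lo, hi]."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if lo <= v <= hi:
            counts[min(int((v - lo) / width), bins - 1)] += 1  # v == hi goes in the last bin
    return counts


def global_and_local(stream: list[tuple[float, float]], start: float, end: float, bins: int = 4):
    """Global histogram of a whole data stream plus a local one for the zoom box [start, end).

    Both share the value range of the full stream, so they can be compared directly;
    only the local histogram changes when the zoom box moves.
    """
    values = [v for _, v in stream]
    lo, hi = min(values), max(values)
    global_hist = histogram(values, bins, lo, hi)
    local_hist = histogram([v for t, v in stream if start <= t < end], bins, lo, hi)
    return global_hist, local_hist


# Hypothetical (timestamp, value) samples from one derived data stream.
stream = [(0, 1.0), (1, 2.0), (2, 3.0), (3, 4.0), (4, 1.5), (5, 3.5)]
print(global_and_local(stream, start=0, end=3))  # ([2, 1, 1, 2], [1, 1, 1, 0])
print(global_and_local(stream, start=3, end=6))  # ([2, 1, 1, 2], [1, 0, 0, 2])
```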