Visual and audio data mining


This presentation contains a literature review about visual and audio data mining.

Published in: Education, Technology

Visual and audio data mining

  1. 1. Visual & Audio Data Mining
     By AS2010379 - H.N.Gunasinghe
     CSC 463 2.0 – Data Warehousing, Data Mining and Information Retrieval
     Department of Statistics and Computer Science
  2. 2. Overview
     • Data mining
     • Visual data mining
     • Audio data mining
     • Summary
     • References
     AS2010379 | H.N.Gunasinghe
  3. 3. Data mining
  4. 4. Data mining
     • Data mining (knowledge discovery from data)
       ◦ Automated extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data
     [Figure: Knowledge Discovery (KDD) Process. Labels: Databases, Data Cleaning, Data Integration, Data Warehouse, Selection, Task-relevant Data, Data Mining, Pattern Evaluation]
  5. 5. Multimedia data
     • Visual
       ◦ Still
       ◦ Motion
     • Audio
       ◦ Speech
       ◦ Music
  6. 6. EXAMPLE (1): GIS uses image data to visualize patterns
  7. 7. EXAMPLE (2): A patient record may contain several types of data
     Note: to make a decision about a disease, we need to consider each type of data for a huge number of people.
  8. 8. Other applications
     • Speaker emotion recognition in audio
     • Automatic summarization of TV programs
     • Traffic monitoring systems
     • Detection of faces in images and image sequences
     • Detection of generic sport video documents
     • The CONQUEST system combines satellite data with geophysical data to discover patterns in global climate change.
     • The SKICAT system integrates techniques for image processing and data classification in order to identify "sky objects" captured in a very large satellite picture set.
     • An example of video and audio data mining is the Mining Cinematic Knowledge project, which created a movie mining system by examining the suitability of existing data mining concepts for multimedia.
     • Etc.
  9. 9. Multimedia database system
     • Generally, multimedia database systems store and manage a large collection of multimedia objects, such as image, video, audio and hypertext data.
     • Thus, in multimedia documents, knowledge discovery deals with non-structured information.
     • For this reason, we need tools for discovering relationships between objects or segments within multimedia document components, such as
       ◦ classifying images based on their content
       ◦ extracting patterns in sound
       ◦ categorizing speech and music
       ◦ recognizing and tracking objects in video streams
  10. 10. KDD process of multimedia data
     • Multimedia files from a database must first be preprocessed to improve their quality.
     • The files then undergo various transformations and feature extraction to generate their important features.
     • With the generated features, mining can be carried out using data mining techniques to discover significant patterns.
     • The resulting patterns are then evaluated and interpreted in order to obtain the final application's knowledge.
  11. 11. Mining process (2)
  12. 12. Models for multimedia mining
     • Classification
       ◦ decision trees
       ◦ rule-based methods
       ◦ artificial neural networks
       ◦ support vector machines
     • Clustering
       ◦ partitioning methods
       ◦ model-based methods
       ◦ grid-based methods
       ◦ density-based methods
       ◦ hierarchical methods
  13. 13. Visual Data Mining
  14. 14. IMAGE MINING (1): Function-Driven Frameworks
  15. 15. IMAGE MINING (2): An information-driven image mining framework
  16. 16. Image Mining Techniques
     • Object Recognition
       ◦ Can be regarded as a supervised labeling problem based on models of known objects.
       ◦ Specifically, given a target image containing one or more interesting objects and a set of labels corresponding to a set of models known to the system, object recognition assigns the correct labels to regions, or sets of regions, in the image.
     • Image Retrieval
     • Image Indexing
     • Image Classification and Image Clustering
     • Association Rule Mining
  17. 17. Image Retrieval
     Image mining requires that images be retrieved according to some requirement specifications. The requirement specifications can be classified into three levels of increasing complexity:
     (a) Level 1 comprises image retrieval by primitive features such as color, texture, shape or the spatial location of image elements. An example query is "Retrieve the images with long thin red objects in the top right-hand corner".
     (b) Level 2 comprises image retrieval by derived or logical features, such as objects of a given type or individual objects or persons. An example query is "Retrieve images of a round table".
     (c) Level 3 comprises image retrieval by abstract attributes, involving a significant amount of high-level reasoning about the meaning or purpose of the objects or scenes depicted. An example query is "Retrieve the images of a football match".
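
A Level 1 query (retrieval by a primitive feature such as color) can be sketched as follows. This is only a minimal illustration, not a method taken from the slides: it assumes images are given as flat lists of RGB pixels, and the names `color_histogram`, `histogram_intersection`, `retrieve` and the toy `database` are all hypothetical.

```python
from collections import Counter

def color_histogram(pixels, bins=4):
    """Quantize each (r, g, b) pixel into bins**3 buckets; return normalized counts."""
    step = 256 // bins
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bucket: n / total for bucket, n in counts.items()}

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: sum of the per-bucket minima of two normalized histograms."""
    return sum(min(value, h2.get(bucket, 0.0)) for bucket, value in h1.items())

def retrieve(query_pixels, database, top_k=2):
    """Rank the images in `database` by color similarity to the query image."""
    q = color_histogram(query_pixels)
    scored = [(histogram_intersection(q, color_histogram(px)), name)
              for name, px in database.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]

# Toy "images" as flat pixel lists (made-up data for illustration).
database = {
    "sunset": [(250, 80, 20)] * 6 + [(30, 30, 60)] * 2,
    "forest": [(20, 140, 40)] * 7 + [(90, 60, 30)] * 1,
    "tomato": [(230, 30, 30)] * 8,
}
query = [(240, 40, 40)] * 8  # a mostly red query image
print(retrieve(query, database, top_k=1))
```

A global color histogram ignores shape and spatial location, so a query such as "long thin red objects in the top right-hand corner" would need additional features; the sketch covers the color part only.
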
  18. 18. Image Clustering
     • Image clustering is usually performed in the early stages of the mining process.
     • Feature attributes that have received most attention for clustering are color, texture and shape.
     • There is a wealth of clustering techniques available:
       ◦ hierarchical clustering algorithms
       ◦ partition-based algorithms
       ◦ nearest neighbor clustering
       ◦ fuzzy clustering
       ◦ evolutionary clustering approaches
  19. 19. Association rules
     • Association rule mining has been applied to large image databases.
     • There are two main approaches:
       ◦ mine from large collections of images alone
       ◦ mine from the combined collections of images and associated alphanumeric data
     • Example: If the upper part of the picture is at least 50% blue, it is likely to represent sky.
     • The image can be modeled as a transaction, assigned an ImageID, and the features of the image are the items contained in the transaction.
     • Therefore, mining the frequently occurring patterns among different images becomes mining the frequent patterns in a set of transactions.
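
The image-as-transaction idea can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the feature labels (`top_blue`, `sky`, and so on) and the five toy transactions are invented, and itemsets are enumerated by brute force rather than with an optimized algorithm such as Apriori.

```python
from itertools import combinations

# Each image is a transaction: ImageID -> set of feature items (hypothetical labels).
transactions = {
    "img1": {"top_blue", "bottom_green", "sky"},
    "img2": {"top_blue", "sky"},
    "img3": {"top_blue", "bottom_sand", "sky"},
    "img4": {"top_grey", "bottom_green"},
    "img5": {"top_blue", "bottom_green"},
}

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)

def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force enumeration of itemsets up to `max_size` that meet `min_support`."""
    items = sorted(set().union(*transactions.values()))
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            s = support(combo, transactions)
            if s >= min_support:
                result[combo] = s
    return result

def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) from the transactions."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)

print(frequent_itemsets(transactions, min_support=0.6))
print(confidence({"top_blue"}, {"sky"}, transactions))
```

On this toy data, the rule "top_blue implies sky" holds for three of the four images whose upper part is blue, which mirrors the sky example on the slide.
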
  20. 20. Video mining
     Video mining can be defined as the unsupervised discovery of patterns in audio-visual content. The temporal (motion) and spatial (color, texture, shapes and text regions) features of the video can be used for the mining task.
  21. 21. GENERAL FRAMEWORK FOR VIDEO DATA MINING
  22. 22. Video data mining approaches
     • Video structure mining
     • Video clustering and classification
     • Video association mining
     • Video motion mining
     • Video pattern mining
  23. 23. VIDEO CLUSTERING
  24. 24. Classification
     • Video classification aims to group videos with similar content and to separate videos with dissimilar content, thereby assigning class labels to a pattern set under supervision.
     • Video classification approaches:
       ◦ rule-based approach
       ◦ statistical approach (machine learning)
  25. 25. Motion mining
     • Mining patterns from the movements of moving objects is called motion mining.
     • First, the features (physical, visual, aural and motion features) are extracted using object detection and tracking algorithms.
     • Then the significance of the features, trends of moving object activities and patterns of events are mined by computing association relations and spatial-temporal relations among the features.
  26. 26. Pattern mining
     • Video pattern mining detects special patterns that are modeled in advance and usually characterized as video events, such as dialogue or presentation events in medical video.
     • It can be divided into two categories:
       ◦ mining similar motion patterns
       ◦ mining similar objects
  27. 27. Video data mining applications
     • Produced video data mining
     • Raw video data mining
     • Medical video mining
     • Broadcast or prerecorded video mining
  28. 28. PRODUCED VIDEO MINING
     • News videos, dramas, movies, etc.
     • Classified according to genre
  29. 29. Raw video mining
     • Security video, generally used for property or public areas
     • Monitoring video, used to monitor traffic flow
  30. 30. BROADCAST OR PRERECORDED VIDEO MINING
     • HMM and SVM are the models most often used for these classifications
  31. 31. Audio Data Mining
  32. 32. Introduction
     • The Web, databases, and other digitized information warehouses contain a growing volume of audio content.
     • Examples include newscasts, sporting events, telephone conversations, recordings of meetings, Webcasts, documentary archives, music and songs.
  33. 33. Inside audio mining
     • Audio mining, also called audio searching, takes a text-based query and locates the search term or phrase in an audio file.
     • Audio indexing uses speech recognition to analyze an entire file and produce a searchable index of content-bearing words and their locations.
     • This is critical because audio content is in a binary format that is otherwise not readily searchable.
     • Indexing audio content thus enables searching.
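
The "searchable index of content-bearing words and their locations" can be pictured as an inverted index. The sketch below assumes a speech recognizer has already produced time-stamped words; the `recognized` list, the `STOP_WORDS` set and the function names are hypothetical and no real recognizer API is used.

```python
from collections import defaultdict

# Hypothetical recognizer output: (start_time_in_seconds, word).
recognized = [
    (0.0, "the"), (0.4, "storm"), (0.9, "reached"), (1.5, "the"),
    (1.8, "coast"), (2.4, "storm"), (2.9, "warning"),
]

# Words treated as not content-bearing (an illustrative choice).
STOP_WORDS = {"the", "a", "an", "of"}

def build_index(recognized):
    """Map each content-bearing word to the list of times at which it was spoken."""
    index = defaultdict(list)
    for start_time, word in recognized:
        key = word.lower()
        if key not in STOP_WORDS:
            index[key].append(start_time)
    return dict(index)

def search(index, term):
    """Return the locations (in seconds) of a single search term."""
    return index.get(term.lower(), [])

index = build_index(recognized)
print(search(index, "storm"))
```

Once the index exists, a text query never touches the binary audio itself, which is the point the slide makes about indexing enabling search.
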
  34. 34. Audio mining approaches
     There are two main approaches to audio mining:
     1. Text-based indexing
     • It converts speech to text and then identifies words in a dictionary that can contain several hundred thousand entries. If a word or name is not in the dictionary, the system chooses the most similar word it can find.
     • The system uses language understanding to create a confidence level for its findings. For findings with less than a 100 percent confidence level, the system offers other possible word matches.
  35. 35. Audio mining approaches (2)
     2. Phoneme-based indexing
     • It does not convert speech to text but instead works only with sounds.
     • The system first analyzes and identifies sounds in a piece of audio content to create a phonetic-based index. It then uses a dictionary of several dozen phonemes to convert a user's search term to the correct phoneme string.
     • Phonemes are the smallest units of speech in a language; every word is a sequence of phonemes.
     • Finally, the system looks for the search term in the index.
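
A phoneme-based search can be sketched as sub-sequence matching over a phonetic index. Everything here is an assumption made for illustration: the tiny `LEXICON`, its ARPAbet-like symbols, and the time-stamped `phonetic_index` stand in for what a real system would derive from audio.

```python
# Hypothetical pronunciation lexicon (ARPAbet-like symbols, illustration only).
LEXICON = {
    "can": ["K", "AE", "N"],
    "candy": ["K", "AE", "N", "D", "IY"],
    "sea": ["S", "IY"],
}

# Hypothetical phonetic index of one audio file: (time_in_seconds, phoneme).
phonetic_index = [
    (0.0, "AY"), (0.2, "S"), (0.3, "IY"), (0.5, "K"),
    (0.6, "AE"), (0.7, "N"), (0.8, "D"), (0.9, "IY"),
]

def to_phonemes(term):
    """Convert a search term to its phoneme string using the lexicon."""
    return [p for word in term.lower().split() for p in LEXICON[word]]

def phoneme_search(index, term):
    """Return the start times at which the term's phoneme string occurs in the index."""
    target = to_phonemes(term)
    symbols = [phoneme for _, phoneme in index]
    n = len(target)
    return [index[i][0]
            for i in range(len(symbols) - n + 1)
            if symbols[i:i + n] == target]

print(phoneme_search(phonetic_index, "candy"))
print(phoneme_search(phonetic_index, "can"))
```

Note that the short term "can" matches at the same position as "candy": short queries can match inside longer words, which is one way false matches arise in phoneme-based search.
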
  36. 36. Text-based indexing vs. phoneme-based indexing
     • A phonetic system requires a more efficient search tool because it must phoneticize the query term and then try to match it with the existing phonetic string output. This is considerably more complex than using one of the many existing text-based search tools.
     • Phoneme-based searches can result in more false matches than the text-based approach, particularly for short search terms, because many words sound alike or sound like parts of other words.
     • However, phonetic indexing can still be useful if the analyzed material contains important words that are likely to be missing from a text system's dictionary, such as foreign terms and names of people and places.
  37. 37. How the technology works
     • Text- and phoneme-based systems operate in much the same way, except that the former uses a text-based dictionary and the latter uses a phonetic dictionary.
     • A speech recognizer converts the observed acoustic signal into the corresponding written representation of the spoken words.
     • Speech recognition software contains acoustic models of the way in which all phonemes are represented.
     • There is also a statistical language model that indicates how likely words are to follow each other in a specific language.
     • By using these capabilities, as well as complex probability analysis, the technology can take a speech signal of unknown content and convert it to a series of words.
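
One simple form of a statistical language model that "indicates how likely words are to follow each other" is a bigram model. The sketch below is an assumption for illustration, not the model used by any particular recognizer; the three-sentence `corpus` is made up and no smoothing is applied.

```python
from collections import Counter, defaultdict

# Tiny made-up training corpus.
corpus = [
    "the storm hit the coast",
    "the storm warning was lifted",
    "the coast guard responded",
]

def train_bigrams(sentences):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for first, second in zip(words, words[1:]):
            counts[first][second] += 1
    return counts

def next_word_probability(counts, first, second):
    """Maximum-likelihood estimate of P(second | first); 0.0 if `first` was never seen."""
    total = sum(counts[first].values())
    return counts[first][second] / total if total else 0.0

model = train_bigrams(corpus)
print(next_word_probability(model, "the", "storm"))
```

A recognizer can combine such word-sequence probabilities with acoustic scores to prefer word sequences that are both acoustically plausible and likely in the language.
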
  38. 38. Figure: ScanSoft Audio Mining System
  39. 39. Audio classification example
  40. 40. Audio Classification
     • Since music is often described by genre, we would like to annotate our music data with genre.
     • Classification by genre is useful for music search and retrieval and also for playlist generation.
  41. 41. Models for audio classification
     • Linear and non-linear neural networks
     • Gaussian classification
     • Gaussian mixture models
     • Hidden Markov model
  42. 42. Classification
     • Classification can be either genre-based or artist-based; the training data must contain the correct class for each song so that the algorithm can be trained.
     • Different algorithms can be used based on the number of attributes they consider when classifying data.
     • Although more attributes are helpful for humans when classifying a song, they can have the opposite effect for computer-based classification because measuring similarity becomes more difficult.
  43. 43. Hidden Markov model for phonemes
     • Each phoneme (unit of sound) can be represented by a state of different and varying duration.
     • Accordingly, the transitions between different phonemes to form a word can be represented by the transition probabilities A = {a_ij}.
     • The observations in this case are the sounds produced in each position; because of the variations in the evolution of each sound, they can also be represented by a probabilistic function B = {b_j(w_k)}.
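
To make the roles of A = {a_ij} and B = {b_j(w_k)} concrete, the sketch below scores an observation sequence with the standard forward algorithm. The initial state distribution `pi` is an extra assumption that does not appear on the slide, and all numbers are made up.

```python
def forward(pi, A, B, observations):
    """Return P(observations | model) for a discrete HMM using the forward algorithm.

    pi[i]   : probability of starting in state i (assumed, not on the slide)
    A[i][j] : probability of moving from state i to state j
    B[j][k] : probability that state j emits observation symbol k
    """
    n = len(pi)
    # Initialization with the first observation.
    alpha = [pi[j] * B[j][observations[0]] for j in range(n)]
    # Induction over the remaining observations.
    for obs in observations[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs]
                 for j in range(n)]
    return sum(alpha)

# Two phoneme states and two observation symbols (illustrative values only).
pi = [1.0, 0.0]
A = [[0.6, 0.4],
     [0.0, 1.0]]
B = [[0.9, 0.1],
     [0.2, 0.8]]

print(forward(pi, A, B, [0, 1]))
```

The self-transition a_00 = 0.6 lets the first phoneme state last for a varying number of frames, which is how a state can model a sound of varying duration.
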
  44. 44. Clustering
     • Similarity measures can be applied to the input vectors that represent songs to produce clusters of songs that belong to the same genre.
     • For hierarchical clustering, single linkage is not a good choice because the clusters it produces are too narrow, which works poorly when clustering by genre. Complete linkage is a better choice.
     • K-means can be used if the number of genres is known beforehand.
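
The difference between single and complete linkage can be seen on a small example. This is a naive agglomerative sketch under stated assumptions: each song is reduced to a single made-up feature value, and the function names are hypothetical.

```python
def euclidean(p, q):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5

def linkage_distance(c1, c2, points, method):
    """Single linkage uses the closest pair of members, complete linkage the farthest."""
    distances = [euclidean(points[i], points[j]) for i in c1 for j in c2]
    return min(distances) if method == "single" else max(distances)

def agglomerative(points, k, method="complete"):
    """Merge the two closest clusters repeatedly until only k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage_distance(clusters[a], clusters[b], points, method)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(cluster) for cluster in clusters)

# Six songs, each described by one made-up feature value.
songs = [(0.0,), (1.0,), (2.1,), (3.3,), (4.6,), (6.0,)]

print(agglomerative(songs, k=3, method="single"))
print(agglomerative(songs, k=3, method="complete"))
```

On this data, single linkage chains the first four songs into one long cluster and leaves two singletons, while complete linkage yields three compact pairs. Which behaviour suits genre clustering better still depends on the features chosen.
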
  45. 45. Applications
     • Companies could use audio mining to analyze customer-service and helpdesk conversations or even voice mail.
     • Law enforcement and intelligence organizations could use the technology to analyze intercepted phone conversations.
     • Broadcast companies like CNN and Radio Free Asia are already using audio mining to quickly retrieve important background information from previous broadcasts when new stories break.
     • A US prison is using ScanSoft's audio mining product to analyze recordings of prisoners' phone calls to identify illegal activity.
  46. 46. Musical audio mining
     • Musical audio mining relates to the identification of perceptually important characteristics of a piece of music, such as melodic, harmonic or rhythmic structure.
     • Searches can then be carried out to find pieces of music that are similar in terms of their melodic, harmonic and/or rhythmic characteristics.
     • This type of analysis can also be used to determine characteristics like beats per minute (BPM), musical key, and musical structure, information that is employed to classify music.
     • Music downloading sites that categorize music by genre use audio mining to organize the music.
  47. 47. Applications
     • Music recommendation services
       ◦ E.g. iTunes, Amazon
     • Music Information Retrieval Systems (both query by audio and by symbolic representation)
       ◦ E.g. Shazam
     • Sound effect retrieval
     • Music streaming websites that contain automatic playlist generation
       ◦ E.g. Pandora, Spotify
     • Music copyright resolution
     • Chorus and pattern identification in songs
  48. 48. Summary
     • Visual and audio data mining are important subjects
     • They are new and developing areas
     • Visual data mining includes mining of still and motion images
     • Audio data mining covers speech and music data
     • Mining of visual data is highly subjective, so choosing the correct techniques is essential
     • There are many techniques for improving the accuracy and efficiency of mining algorithms
  49. 49. References
     [1] V. Vijayakumar and R. Nedunchezhian, "A study on video data mining," in Int J Multimed Info Retr. London, England: Springer, 2012, vol. 1, pp. 153-172.
     [2] J. Han and M. Kamber. [Online]. visual-data-mining?qid=ca7dc5ef-6448-4a22-a59d-1a8b0ce8b710&v=default&b=&from_search=5#
     [3] P. Prashant. (2013) [Online]. 15391429?qid=94a6ac4f-3661-46a7-86cd-da63cd8c8c4e&v=default&b=&from_search=7
     [4] T. Kendrick. (2012) Music Data Mining. PDF.
     [5] J. Zhang, W. Hsu, and M. L. Lee, "Image Mining: Issues, Frameworks and Techniques."
  50. 50. THANK YOU !!!