Audio mining
Upcoming SlideShare
Loading in...5
×
 

Audio mining

on

  • 1,709 views

 

Statistics

Views

Total Views
1,709
Views on SlideShare
1,709
Embed Views
0

Actions

Likes
1
Downloads
52
Comments
3

0 Embeds 0

No embeds

Accessibility

Categories

Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel

13 of 3 Post a comment

  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    Audio mining Audio mining Presentation Transcript

    • Presented By:Patel Prashant(CE 137)
    • The Web, databases, and other digitized information storehouses contain agrowing volume of audio content. For example newscasts, sporting events, telephone conversations,recordings of meetings, Webcasts, documentary archives etc. Users want to make the most of this material by searching and indexingthe digitized audio content. In the past, companies had to create and manually analyze writtentranscripts of audio content because using computers to recognize, interpret,and analyze digitized speech was difficult. However, the development of faster microprocessors, larger storagecapacities, and better speech-recognition algorithms has made audio miningeasier.
    • Audio mining, also called audio searching, takes a text-based queryand locates the search term or phrase in an audio file. This helps users by, for example, letting them quickly get to specificplaces in a recorded conversation or determine when a company ismentioned in a newscast. Audio indexing uses speech recognition to analyze an entire file andproduce a searchable index of content bearing words and theirlocations.This is critical because audio content is in a binary format that isotherwise not readily searchable.Indexing audio content thus enables searching.
    •  There are two main approaches to audio mining :1.Text-based indexing :oIt converts speech to text and then identifies words in a dictionarythat can contain several hundred thousand entries. If a word or name isnot in the dictionary, the system will choose the most similar word itcan find.oThe system uses language understanding to create a confidence levelfor its findings. For findings with less than a 100 percent confidencelevel, the system offers other possible word matches.
    • 2. Phoneme-based indexing:o It doesn’t convert speech to text but instead works only with sounds.o The system first analyzes and identifies sounds in a piece of audiocontent to create a phonetic-based index. It then uses a dictionary ofseveral dozen phonemes to convert a user’s search term to the correctphoneme string.o Phonemes are the smallest units of speech in a language, all wordsare set of phonemes.oFinally, the system looks for the search terms in the index.
    •  A phonetic system requires a more efficient search tool because itmust phoneticize the query term, then try to match it with the existingphonetic string output .This is considerably more complex than usingone of the many existing text-based search tools.Phoneme-based searches can result in more false matches than thetext-based approach, particularly for short search terms, because manywords sound alike or sound like parts of other words.However, phonetic indexing can still be useful if the analyzedmaterial contains important words that are likely to be missing from atext system’s dictionary, such as foreign terms and names of people andplaces
    • Text- and phoneme-based systems operate in much the same way, exceptthat the former uses a text-based dictionary and the latter uses a phoneticdictionary.A speech recognizer converts the observed acoustic signal into thecorresponding written representation of the spoken words.Speech recognition software contains acoustic models of the way in whichall phonemes are represented.Also, there is a statistical language model that indicates how likely words areto follow each other in a specific Language.By using these capabilities, as well as complex probability analysis, thetechnology can take a speech signal of unknown content and convert it to aseries of words .
    • Figure: ScanSoft Audio Mining System
    •  Since music is oftendescribed bygenre, we wouldlike to annotateour music data with genre.Classification by genreis useful formusic search andretrieval and also forplaylist generation
    • Linear and non-linear neural networksGaussian ClassificationGaussian mixture modelsHidden Markov model
    • Let us consider a simple weather forecast problem and try toemulate a model that can predict tomorrows weather based ontoday’s condition. In this example we have three stationary all dayweather, which could be sunny (S), cloudy (C), or Rainy (R). Fromthehistory of the weather of the town under investigation we have thefollowing table
    • •We refer to the weather conditions by state q that aresampled at instant t and the problem is to find theprobability of weather condition of tomorrow given todaysconditionP(qt+1 /qt).•An acceptable approximation for n instants history is :•P(qt+1/qt , qt-1 , qt-2 , ….. , qt-n ) » P(qt+1 /qt).•This is the first order Markov chain as the history isconsidered to be one instant only.
    • Let us now ask this question: Given today as sunny (S) what isthe probability that the next following five days are S , C , C , Rand S, having the above model?The answer resides in the following formula using first orderMarkov chain:P(q1 = S, q2=S, q3=C, q4=C, q5=R, q6=S) =P(S).P(q2=S/q1=S). P(q3=C/q2=S). P(q4=C/q3=C).P(q5=R/q4=C). P(q6=S/q5=R)= 1 x 0.7 x 0.2 x 0.8 x 0.15 x 0.15= 0.00252The initial probability P(S) = 1, as it is assumed that today issunny.
    • The model is completely defined by these three sets of parameters a,b, and p and the model of N states and M observations can bereferred to by :λ = (A , B , p )where A = {aij}, B = {bj(wk)} 1 <=i , j <=N and 1< =k <= M.aij represents the probability of state transition (probability of beingin state Sj given state Si )aij = P(qt+1=Sj / qt=Si) .bj(wk) is the probability distribution in a state Sjw is the alphabet and k is the number of symbols in this alphabet.π = {1 0 0 } is the initial state probability distribution.
    • Each phoneme(unit of sound) could be represented by astate of different and varying duration.Accordingly, the transition between different phonemes toform a word can be represented by A = {aij}. The observations in this case are the sounds produced ineach position and due to the variations in the evolution ofeach soundthis can be also represented by a probabilistic functionB = {bj(wk)}.
    • A major challenge for speech recognition tools has been recognizing thespeech of different users in different environments.With this in mind, BBN, IBM, Fast-Talk, and ScanSoft have designed theiraudio mining technology to be largely speaker independent.For example, Fast-Talk’s acoustic models are trained to recognize numerousspeakers via exposure to audio data from speakers representing various ages,dialects, and speaking styles.Some audio mining technology uses acoustic models tuned to understandspeech from different environments— such as telephony, TV, or radio.
    • Designing filters to reduce the background noise that can interferewith accurate speech recognition.Creating efficient data structures for representing content.Developing algorithms that quickly work through the data structuresduring indexing and searching.
    •  Precision is improving but it is still a key issue impeding thetechnology’s widespread adoption, particularly in such accuracy-critical applications as court reporting and medical dictation.Processing conversational speech can be particularly difficultbecause of such factors as overlapping words and background noise.Breakthroughs in natural language understanding will eventuallylead to big improvements, but until then audio mining will getbetter only incrementally.It’s currently seen as a ‘nice to have,’ not quite yet a ‘need to have’technology.
    • Companies could use audio mining to analyze customer-service andhelpdesk conversations or even voice mail. Law enforcement and intelligence organizations could use thetechnology to analyze intercepted phone conversations.Broadcast companies like CNN and Radio Free Asia are alreadyusingaudio mining to quickly retrieve important background informationfrom previous broadcasts when new stories break.A US prison is using ScanSoft’s audio mining product to analyzerecordings of prisoners’ phone calls to identify illegal activity.
    •  Musical audio mining relates to the identification of perceptuallyimportant characteristics of a piece of music such as melodic,harmonic or rhythmic structure. Searches can then be carried out to find pieces of music that aresimilar in terms of their melodic, harmonic and/or rhythmiccharacteristics.This type of analysis can also be used in music to determinecharacteristics like beats per minute (BPM), musical key, and musicalstructure, information that is employed to classify music.Music downloading sites that categorize music by genre uses audiomining to organize the music.
    •  Research paper: “Lets Hear It For Audio Mining” by Neal Leavitt.Research paper: “Tendencies, Perspectives, and Opportunities ofMusical Audio-Mining” by Ghent University, Belgium.wikipedia.org/wiki/Audio_mining
    • Thank You