KBMS – Audio

Frank Nack
ILPS – HCS
nack@uva.nl

Knowledge-based Media Systems – winter 2008
Outline

  Last week
  Audio – a sonic sign system
  Sound and emotions
Video Application – summary

Investigated
  ForkBrowser
  AUTEUR

Findings
  The content determines the application
  Content description is
    application dependent
    complex
    resource demanding
    time critical
    incomplete
  Modular schemata
  Description environment
    production supportive
    archive supportive
Change

Vision is the ability of the brain and eye to detect electromagnetic waves within the visible range (light) and to interpret the image as "sight."

Audition is the sense of sound perception in response to changes in the pressure exerted by atmospheric particles within a range of 20 to 20,000 Hz.
Audio – a sonic sign system




Audio – Listen

Sit for a minute and listen with closed eyes to the sound of the environment.

Observe how you perceive the sound(s).
Audio – Listen

Sound listening experiment results:

  One needs to focus on the sound signal for identification purposes
  Sound identification also determines location (of sound and listener)
  Sound signals trigger thought associations
Audio – Produce

Talk for a minute to the person furthest away from you in the room, without looking at him or her.

Observe how you generate sound and which cues you use to interpret the input of your conversation partner.
Audio – Produce

Sound producing experiment results:

  The higher the interference, the louder the voice.
  At a certain sound level one has to look at the conversation partner to focus on the sound stream.
  The noise level determines the use of additional cues, e.g. eye contact or the use of hands to emphasise importance.
Audio – Sound

  Generating – Hearing – Listening
Audio – Sound

The temporal lobe is involved in auditory perception and is home to the primary auditory cortex. It is also important for the processing of semantics in both speech and vision.

This part of the cortex (the primary auditory cortex) is involved in hearing. Adjacent areas in the superior, posterior and lateral parts of the temporal lobes are involved in high-level auditory processing. In humans this includes speech, for which the left temporal lobe in particular seems to be specialized. The functions of the left temporal lobe are not limited to low-level perception but extend to comprehension, naming, verbal memory and other language functions.

The medial temporal lobes (near the sagittal plane that divides the left and right cerebral hemispheres) are thought to be involved in episodic/declarative memory. Deep inside the medial temporal lobes lie the hippocampi, which are essential for memory function, particularly the transfer from short-term to long-term memory and the control of spatial memory and behavior.
Audio – Sound listening

Listening factors

  Physical entities
    frequency, wavelength, amplitude, intensity, speed, direction
  Context
    danger, navigation, communication
    speech, music, noise
    cues (e.g. facial expressions, gestures)
    personal experience and feelings
  Sound quality
    pitch (melody and harmony)
    rhythm (tempo, meter, and articulation)
    dynamics
    structure
    sonic qualities (timbre and texture)
Audio – Sound listening

Pitch is the perceived fundamental frequency of a sound.

Melody is a series of linear events, not a simultaneity as in a chord.

Harmony is the use of different pitches simultaneously.
Audio – Sound listening

Rhythm is the variation of the length and accentuation of a series of sounds.

Tempo is the speed or pace of a given piece.

Articulation refers to the performance technique which affects the transition or continuity on a single note or between multiple notes or sounds.
Audio – Sound listening

Dynamics normally refers to the softness or loudness of a sound (amplitude).

Timbre is the quality of a sound that distinguishes different types of sound production, such as voices or musical instruments.

Texture is the overall quality of sound of a piece, most often indicated by the number of voices in the music and by the relationship between these voices, using terms such as "thick" and "light", "rough" or "smooth". The perceived texture of a piece can be affected by the timbre of the instruments or voices playing these parts and by the harmony, tempo, and rhythms used.
Audio – Interpretation

Prosody is the rhythm, stress, and intonation of speech or singing.

Prosody may reflect
  the emotional state of a speaker
  whether an utterance is a statement, a question, or a command
  whether the speaker is being ironic or sarcastic
  emphasis, contrast and focus
  – in short, everything in language that is not encoded by grammar.
Audio – Interpretation

The amygdalae are almond-shaped groups of nuclei located deep within the medial temporal lobes of the brain in complex vertebrates, including humans.

Shown in research to perform a primary role in the processing and memory of emotional reactions (i.e. physiological arousal, expressive behaviors, and conscious experience), the amygdalae are considered part of the limbic system.
Audio – Interpretation

Sound in music or human speech is about the what and the how.

The how expresses the emotional value.

Primary emotions (i.e., innate emotions, such as fear) depend on the limbic system circuitry, with the amygdala and anterior cingulate gyrus being "key players".

Secondary emotions (i.e., feelings attached to objects [e.g., dental drills], events, and situations through learning) require additional input, based largely on memory, from the prefrontal and somatosensory cortices. The stimulus may still be processed directly via the amygdala, but it is now also analyzed in the thought process.
Audio – a sonic sign system – summary

  Audition is the sense of sound perception in response to changes in the pressure exerted by atmospheric particles within a range of 20 to 20,000 Hz.
  Audio interpretation of speech, music and noise is context dependent (space, semantics, emotion).
  Individual characteristics are important (thresholds of frequency spaces).
  Audio stimulates primary emotions.
Audio – emotional representation




Audio – emotion model

Robert Plutchik (1980)

  Anger: an aroused state of antagonism toward someone or something perceived to be the source of an aversive event.
  Fear / Anxiety: a response to tangible and realistic dangers.
  Sadness: characterised by feelings of disadvantage and loss.
  Joy: feelings ranging from contentment and satisfaction to bliss.
  Disgust: associated with things that are perceived as unclean, inedible, or infectious.
  Curiosity: causes natural inquisitive behaviour such as exploration, investigation, and learning.
  Surprise: the result of experiencing an unexpected event.
  Acceptance: the experience of a situation without an intention to change that situation.
Audio – emotion in human speech

Russell (1980)

Arousal is the state of being awake (leading to increased heart rate and blood pressure and a condition of sensory alertness, mobility and readiness to respond).

Valence is the intrinsic attractiveness (positive valence) or aversiveness (negative valence) of an event, object, or situation.
Audio – emotion in human speech II

Variations of acoustic variables (Murray & Arnott, 1993):

  Anger:   pitch – high mean, wide range; intensity – increased; speaking rate – increased; voice quality – breathy
  Joy:     pitch – increased mean and range; intensity – increased; speaking rate – slow tempo, increased rate; voice quality – sometimes breathy, moderately blaring timbre
  Sadness: pitch – normal or lower than normal mean, narrow range; intensity – decreased; speaking rate – slow; voice quality – resonant timbre
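Such a qualitative table can be encoded directly as a lookup. The sketch below is purely illustrative: the profiles and the matching rule are simplifications I introduce, not Murray & Arnott's model.

  # Illustrative only: emotion profiles and matching rule are my own simplification
  # of the qualitative table above, not Murray & Arnott's model.
  PROFILES = {
      "anger":   {"pitch_mean": "high",      "pitch_range": "wide",   "intensity": "increased", "rate": "increased"},
      "joy":     {"pitch_mean": "increased", "pitch_range": "wide",   "intensity": "increased", "rate": "increased"},
      "sadness": {"pitch_mean": "low",       "pitch_range": "narrow", "intensity": "decreased", "rate": "slow"},
  }

  def guess_emotion(observed):
      # Count how many observed qualitative cues match each emotion profile.
      scores = {e: sum(observed.get(k) == v for k, v in prof.items())
                for e, prof in PROFILES.items()}
      return max(scores, key=scores.get)

  print(guess_emotion({"pitch_mean": "high", "pitch_range": "wide",
                       "intensity": "increased", "rate": "increased"}))  # -> anger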
Audio – Measurement




Audio – Emotion Recognition System

Training phase:
  labeled training data → signal pre-processing and segmentation → feature extraction → classifier training → trained classifier

Test phase:
  test data → signal pre-processing and segmentation → feature extraction → classification (using the trained classifier) → emotion label
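A minimal end-to-end sketch of such a pipeline, assuming scikit-learn and stubbing out the feature extraction with toy statistics (not the system shown on the slide):

  import numpy as np
  from sklearn.pipeline import make_pipeline
  from sklearn.preprocessing import StandardScaler
  from sklearn.svm import SVC

  def extract_features(segments):
      # Placeholder: a real system would compute pitch/energy/duration statistics
      # per segment (see the feature extraction slides below).
      return np.vstack([[s.mean(), s.std(), s.max() - s.min()] for s in segments])

  # Training phase: labeled segments -> features -> classifier training
  train_segments = [np.random.randn(16000) for _ in range(20)]     # dummy audio segments
  train_labels = np.random.choice(["anger", "joy", "sadness"], 20)
  clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
  clf.fit(extract_features(train_segments), train_labels)

  # Test phase: unlabeled segments -> features -> classification -> emotion
  test_segments = [np.random.randn(16000) for _ in range(5)]
  print(clf.predict(extract_features(test_segments)))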
Audio – speech material problem

Access difficulty, from easy to hard:

  acted emotions → read emotions → elicited emotions → real-life emotions
Audio – Segmentation

Goal: segment a speech signal into units that are representative for emotions.

Target: words, utterances, words in context, fixed time intervals

Requirements:
  • long enough to reliably calculate features by means of statistical functions
  • short enough to guarantee stable acoustic properties with respect to emotions within the segment
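As an illustration only (my own simple approach, assuming NumPy, not the lecture's method), fixed-length frames combined with a crude energy threshold already yield segment boundaries:

  import numpy as np

  def energy_segments(signal, sr=16000, frame_ms=25, threshold=0.01):
      # Split the signal into fixed-length frames and keep runs of frames whose
      # short-term energy exceeds a threshold (a crude voice-activity criterion).
      frame_len = int(sr * frame_ms / 1000)
      n_frames = len(signal) // frame_len
      frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
      active = (frames ** 2).mean(axis=1) > threshold
      segments, start = [], None
      for i, is_active in enumerate(active):
          if is_active and start is None:
              start = i
          elif not is_active and start is not None:
              segments.append((start * frame_len, i * frame_len))
              start = None
      if start is not None:
          segments.append((start * frame_len, n_frames * frame_len))
      return segments                      # list of (start_sample, end_sample) pairs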
Audio – Feature extraction

Goal:      find those properties of the digitised and pre-processed acoustic signal that are
           characteristic for emotions, and represent them in an n-dimensional feature vector.

Features:  pitch and energy related features
           durational and pause related features
           types of voice quality features

Problem:   a high number of features is not beneficial, because most classifiers are
           negatively influenced by redundant, correlated or irrelevant features.

Solution:  principal component analysis (PCA) to reduce multidimensional data sets to
           lower dimensions for analysis, or
           compute a high number of features and then apply a feature selection algorithm
           that chooses the most significant features of the training data for the given task.
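Both options can be sketched in a few lines, assuming scikit-learn and a hypothetical pre-computed feature matrix:

  import numpy as np
  from sklearn.decomposition import PCA
  from sklearn.feature_selection import SelectKBest, f_classif

  X = np.random.randn(200, 60)                    # 200 segments, 60 raw features (dummy)
  y = np.random.randint(0, 3, size=200)           # 3 emotion classes (dummy labels)

  X_pca = PCA(n_components=10).fit_transform(X)              # unsupervised reduction
  X_sel = SelectKBest(f_classif, k=10).fit_transform(X, y)   # supervised feature selection
  print(X_pca.shape, X_sel.shape)                 # both (200, 10)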
Audio – Feature extraction II

The raw pitch, energy, etc. contours can be used as is; they are then called short-term features. More often, the actual features are derived from these acoustic variables by applying (statistical) functions over the sequence of values within an emotion segment; these are called global statistics features.

Examples:
  mean pitch of a word or an utterance
  maximum or minimum of the segment
  regression

Metadata can be used to enhance recognition accuracy.
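For instance, the global statistics of a pitch contour could be computed along these lines (illustrative sketch with a dummy contour; the contour itself would come from a pitch tracker, which is assumed rather than shown):

  import numpy as np

  pitch = 120 + 15 * np.sin(np.linspace(0, 3, 80)) + np.random.randn(80)  # dummy contour in Hz
  t = np.arange(len(pitch))

  features = {
      "pitch_mean":  pitch.mean(),
      "pitch_min":   pitch.min(),
      "pitch_max":   pitch.max(),
      "pitch_range": pitch.max() - pitch.min(),
      "pitch_slope": np.polyfit(t, pitch, 1)[0],   # linear regression coefficient
  }
  print(features)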
Audio – Classification

Each input unit is represented by a feature vector.

The problem of emotion recognition is now a general data mining problem. Any statistical classifier that can deal with high-dimensional data can be used, as long as it implements a decision rule of the following kind (two-class example with feature vector x = (roundness, color) and class-conditional densities p(x | apple) and p(x | pear)):

  if p(pear | x) > p(apple | x) then x → pear, else x → apple
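A hedged sketch of this decision rule, with Gaussian class-conditional densities estimated from hypothetical training samples and equal priors (so comparing likelihoods is equivalent to comparing posteriors):

  import numpy as np
  from scipy.stats import multivariate_normal

  rng = np.random.default_rng(0)
  apples = rng.normal([0.9, 0.2], 0.1, size=(50, 2))   # (roundness, color) samples
  pears  = rng.normal([0.6, 0.7], 0.1, size=(50, 2))

  def fit_gaussian(X):
      return multivariate_normal(mean=X.mean(axis=0), cov=np.cov(X, rowvar=False))

  p_apple, p_pear = fit_gaussian(apples), fit_gaussian(pears)

  x = np.array([0.7, 0.6])                             # new pattern to classify
  print("pear" if p_pear.pdf(x) > p_apple.pdf(x) else "apple")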
Audio – Classification II

  Choose a decision function in feature space.
  Estimate the function from the training set.
  Classify a new pattern on the basis of this decision rule.
Audio – Classification III

[Figure: example decision functions – linear, piecewise linear, polynomial, feed-forward neural network]
Audio – Common statistical methods

Support vector machines are a set of related supervised learning methods that analyze data and recognize patterns.

[Figure: three candidate hyperplanes] H3 (green) does not separate the two classes. H1 (blue) does, but only with a small margin; H2 (red) separates them with the maximum margin.
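A minimal illustration (assumed, using scikit-learn): a linear SVM fitted to two toy classes, yielding the maximum-margin separating hyperplane:

  import numpy as np
  from sklearn.svm import SVC

  X = np.vstack([np.random.randn(30, 2) + [2, 2],
                 np.random.randn(30, 2) - [2, 2]])
  y = np.array([1] * 30 + [0] * 30)

  svm = SVC(kernel="linear", C=1.0).fit(X, y)
  print(svm.coef_, svm.intercept_)            # parameters of the separating hyperplane
  print(svm.predict([[1.5, 2.0], [-2.0, -1.0]]))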
Audio – Common statistical methods II

Neural networks are adaptive systems that change their structure based on
external or internal information that flows through the network during the learning
phase.




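A comparable sketch (assumed, using scikit-learn) of a small feed-forward network trained on hypothetical emotion feature vectors:

  import numpy as np
  from sklearn.neural_network import MLPClassifier

  X = np.random.randn(300, 20)                 # 300 segments, 20 features (dummy)
  y = np.random.randint(0, 4, size=300)        # 4 emotion classes (dummy)

  mlp = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500).fit(X, y)
  print(mlp.predict(X[:5]))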
Audio – Common statistical methods III

A decision tree is a support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility.
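Again as an assumed scikit-learn sketch, a depth-limited decision tree on dummy feature vectors, with a printout of the learned splits:

  import numpy as np
  from sklearn.tree import DecisionTreeClassifier, export_text

  X = np.random.randn(300, 5)
  y = np.random.randint(0, 3, size=300)

  tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
  print(export_text(tree))                     # human-readable view of the learned splits
  print(tree.predict(X[:3]))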
Audio – Common statistical methods IV

An HMM is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states. An HMM can be considered the simplest dynamic Bayesian network.

[Figure: probabilistic parameters of a hidden Markov model (example)]
  x – states
  y – possible observations
  a – state transition probabilities
  b – output probabilities
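To make the parameters above concrete, here is a small NumPy sketch of the HMM forward algorithm (my own illustration, not the lecture's), which computes the likelihood of an observation sequence given the initial distribution, a and b:

  import numpy as np

  pi = np.array([0.6, 0.4])               # initial state distribution
  a = np.array([[0.7, 0.3],               # state transition probabilities
                [0.4, 0.6]])
  b = np.array([[0.5, 0.4, 0.1],          # output probabilities per state
                [0.1, 0.3, 0.6]])
  obs = [0, 1, 2, 1]                      # observed symbol sequence

  alpha = pi * b[:, obs[0]]               # forward variable at t = 0
  for o in obs[1:]:
      alpha = (alpha @ a) * b[:, o]       # recursion: propagate, then weight by emission
  print(alpha.sum())                      # P(observation sequence | model)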
Audio – Measurement – summary

  There is no standard database for benchmarking.
  It is rarely possible to compare features across published work, since conditions vary a lot and even slight changes in the general set-up can make results incomparable.
  Audio segmentation can be performed by voice activity detection, a fast segmentation method that does not require high-level linguistic knowledge.
  Feature extraction may only rely on automatically computable properties of the acoustic signal, but this is not a major limitation.
  For classification, in principle any statistical classifier can be used; sophisticated classifiers are superior in accuracy, but simple and thus fast classifiers are often sufficient.
  The speech database used to train the classifier should be adjusted to the particular application as much as possible.
Audio – summary

  Audition is a sign system, but it is more complicated to identify the smallest sign unit, as the temporal nature of sound is essential.
  The properties of the acoustic signal are enough to identify a sound, BUT individual characteristics are important (thresholds of frequency spaces).
  The interpretation of sound relies on context (other cues, such as facial expression or gestures for spoken language, are of importance).
  Audio stimulates primary emotions, though the identification for each individual depends on their thresholds for arousal and valence.
Audio – References




Audio – References

Murray, I. & Arnott, J. (1993). Toward the simulation of emotion in synthetic speech: A review of the literature on human vocal emotion. Journal of the Acoustical Society of America, 93(2), 1097–1108.

Plutchik, R. (1980). A general psychoevolutionary theory of emotion. In R. Plutchik & H. Kellerman (eds.), Emotion: Theory, research, and experience, Volume 1: Theories of emotion, pp. 3–31. New York: Academic Press.

Russell, J.A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39(6), 1161–1178.
