Speaker Independent Diarization for Child Language
Environment Analysis Using Deep Neural Networks
By Maryam Najafian
presentation_Diarization_MIT

  1. Speaker Independent Diarization for Child Language Environment Analysis Using Deep Neural Networks. By Maryam Najafian. Supervisor: Prof. John Hansen, University of Texas at Dallas, US. 4th October 2016. Email: m.najafian@utdallas.edu
  2. Introduction
     • This study investigates the language environments of young children, combining location tracking with speech processing (child-adult speaker diarization).
     • Audio recordings are gathered using LENA units.
     • Location information is gathered using Ubisense units.
     • Labeled audio is gathered from 32 children, aged 2.5 to 5 years, each wearing a LENA unit over a typical day at three different time points.
  3. Child-Adult speaker diarization
  4. LIUM GMM-HMMs with bottom-up clustering
     • Components shown in the diagram: MFCC extraction, audio segmentation (GLR distance), agglomerative clustering (BIC distance), Viterbi re-segmentation, and a UBM that is MAP-adapted to each class.
     • Classes: 1-primary child, 2-secondary child, 3-adult, 4-music, 5-crowd noise, 6-silence.
     • Abbreviations: GLR: Generalized Likelihood Ratio; BIC: Bayesian Information Criterion; UBM: Universal Background Model; MAP: Maximum A Posteriori.
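The BIC distance named on slide 4 can be illustrated with a small sketch. The slide gives no formula or settings, so everything below is an assumption: each segment is modeled by a single Gaussian with a diagonal covariance (a simplification chosen to keep the example dependency-free), and the names `diag_log_det`, `delta_bic`, and `penalty_weight` are hypothetical. It is not the LIUM toolkit's implementation.

```python
import math


def diag_log_det(frames):
    """Log-determinant of a diagonal covariance: the sum of per-dimension log variances."""
    n = len(frames)
    dim = len(frames[0])
    total = 0.0
    for d in range(dim):
        column = [frame[d] for frame in frames]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / n
        total += math.log(max(var, 1e-10))  # floor the variance to avoid log(0)
    return total


def delta_bic(seg_a, seg_b, penalty_weight=1.0):
    """Delta-BIC between two segments (lists of feature vectors).

    Assumption: one diagonal-covariance Gaussian per segment.
    A positive value favors keeping the segments apart; a negative value favors merging.
    """
    n_a, n_b = len(seg_a), len(seg_b)
    n = n_a + n_b
    dim = len(seg_a[0])
    merged = seg_a + seg_b
    # Likelihood gain of modeling the data with two Gaussians instead of one.
    gain = 0.5 * (
        n * diag_log_det(merged)
        - n_a * diag_log_det(seg_a)
        - n_b * diag_log_det(seg_b)
    )
    # The second Gaussian adds `dim` means and `dim` variances.
    penalty = 0.5 * penalty_weight * (2 * dim) * math.log(n)
    return gain - penalty


# Toy 1-dimensional "features" (made-up numbers, not data from the study).
quiet = [[0.0], [0.1], [-0.1], [0.2], [-0.2]]
loud = [[10.0], [10.1], [9.9], [10.2], [9.8]]
print(delta_bic(quiet, loud) > 0)   # the two segments look like different sources
print(delta_bic(quiet, quiet) < 0)  # identical statistics favor merging
```

In bottom-up (agglomerative) clustering, the pair of clusters with the lowest delta-BIC is typically merged first, and merging stops once every remaining pair scores above zero.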
  5. Diarization with i-Vector SVM: i-Vector based child-adult turn-taking detection system
     • TO-Combo-SAD: threshold-optimized speech activity detection using Combo-SAD features [3,4].
     • Combo features: the mean- and variance-normalized Harmonicity, Clarity, Prediction, Periodicity, and Spectral Flux features are linearly mapped to a 1-dimensional 'COMBO' feature.
     • i-Vector [5] SVM-based classifier.
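The Combo feature description on slide 5 (normalize each feature track, then map the tracks linearly to one dimension and threshold) can be sketched as follows. The slide does not state the mapping weights or the threshold, so the equal weights, the threshold of 0.0, and the function names here are assumptions made purely for illustration; the toy example also uses two tracks rather than the five listed on the slide.

```python
import math


def mean_variance_normalize(track):
    """Shift a per-frame feature track to zero mean and scale it to unit variance."""
    n = len(track)
    mean = sum(track) / n
    var = sum((x - mean) ** 2 for x in track) / n
    std = math.sqrt(var) if var > 0 else 1.0  # constant tracks are only centered
    return [(x - mean) / std for x in track]


def combo_feature(feature_tracks, weights):
    """Linearly map several normalized feature tracks to one value per frame.

    `weights` is a hypothetical projection vector; the slide does not give one.
    """
    normalized = [mean_variance_normalize(track) for track in feature_tracks]
    n_frames = len(normalized[0])
    return [
        sum(w * track[i] for w, track in zip(weights, normalized))
        for i in range(n_frames)
    ]


def threshold_sad(combo, threshold):
    """Mark a frame as speech when its combo value exceeds the threshold."""
    return [value > threshold for value in combo]


# Two made-up feature tracks over three frames.
tracks = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
combo = combo_feature(tracks, weights=[0.5, 0.5])
print(threshold_sad(combo, threshold=0.0))
```

"Threshold optimized" on the slide suggests the decision threshold is tuned on data rather than fixed, which would replace the constant used above.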
  6. System Comparison
     • i-Vector-SVM TO-Combo-SAD system, 1.5 s segments, on 4.5 hrs [1].
     • 27.3% relative error reduction compared to LIUM on 4.5 hrs [1].
     • 4.5 hours: distribution of 4 acoustic classes in our database, from manually labeled data: Adult 22%, Primary Child 10%, Secondary Child 16%, Non-speech 52%.
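The "relative error reduction" figures quoted in the system comparisons follow the usual definition: the drop in error divided by the baseline error. A minimal sketch, with hypothetical error rates because the slides report only the relative reductions and not the underlying absolute values:

```python
def relative_error_reduction(baseline_error, system_error):
    """Relative error reduction in percent: 100 * (baseline - system) / baseline."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - system_error) / baseline_error


# Hypothetical error rates, not values from the slides.
print(relative_error_reduction(baseline_error=40.0, system_error=30.0))
```

Because the measure is relative to the baseline, the same absolute improvement counts for more when the baseline error is small.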
  7. Synchronous DNN-HMMs
  8. System Comparison
     • 28.5% relative error reduction compared to LIUM, on 7.2 hrs.
     • 7.2 hours: distribution of 4 acoustic classes in our database, from manually labeled data: Adult 24%, Primary Child 20%, Secondary Child 23%, Non-speech 33%.
  9. Parallel Asynchronous DNN-HMMs
  10. System Comparison
     • 37.11% relative error reduction compared to LIUM, on 7.2 hrs.
     • 7.2 hours: distribution of 4 acoustic classes in our database, from manually labeled data: Adult 24%, Primary Child 20%, Secondary Child 23%, Non-speech 33%.
  11. System Comparison
  12. Case study
     • 3 classroom time points: compares the level of interaction between the child and other children and adults.
  13. Case study
     • 3 classroom time points: compares % time spent in each of 7 learning/activity areas (art, blocks, books, dramatic play, cubbies, manipulation, science).
  14. Case study
     • The case study aims to collect statistics that enable a wider perspective of child communication with teachers and peers in classrooms across different activity areas, i.e., which areas are hot language spaces?
     • Figure: speech produced by adults, primary and secondary children across 7 activity areas in a 33-minute green window.
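A per-area speech tally like the one plotted on slide 14 can be thought of as intersecting diarization output with location tracks. The sketch below is one possible way to do that under stated assumptions: diarization output is taken to be `(start, end, speaker)` tuples and location data `(start, end, area)` tuples, both in seconds on a shared clock. These formats, the labels, and the function name are hypothetical and are not described in the slides.

```python
from collections import defaultdict


def speech_seconds_by_area(speech_segments, location_intervals):
    """Total overlapping seconds for each (area, speaker) pair.

    speech_segments:    iterable of (start, end, speaker) tuples  [assumed format]
    location_intervals: iterable of (start, end, area) tuples     [assumed format]
    """
    totals = defaultdict(float)
    for s_start, s_end, speaker in speech_segments:
        for l_start, l_end, area in location_intervals:
            # Length of the time span shared by the speech segment and the location interval.
            overlap = min(s_end, l_end) - max(s_start, l_start)
            if overlap > 0:
                totals[(area, speaker)] += overlap
    return dict(totals)


# Made-up toy data, not measurements from the study.
speech = [(0, 10, "adult"), (8, 20, "primary_child")]
location = [(0, 12, "art"), (12, 30, "blocks")]
for key, seconds in sorted(speech_seconds_by_area(speech, location).items()):
    print(key, seconds)
```

The nested loop is quadratic in the number of intervals, which is acceptable for a short window; sorted inputs with a sweep would scale better over a full day of recordings.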
  15. Case study
     • The case study aims to collect statistics that enable a wider perspective of child communication with teachers and peers in classrooms across different activity areas, i.e., which areas are hot language spaces?
     • Figure: heat map of adult word count, vocalizations per minute.
  16. Summary
     • Explored LOCATION & LANGUAGE interactions via diarization.
     • Proposed DNN-HMM and diarization solutions to assess child-adult interaction in naturalistic learning spaces.
     • Using the fused DNN-HMM based system leads to a considerable relative DER reduction on average compared to the LIUM GMM-based system with bottom-up clustering.
     • Analysis plots derived from this work support our ability to:
         • Determine which children are less engaged in voice communication.
         • Determine how much talk teachers direct at each child.
         • Assess how much communication children have with other children in specific learning/activity areas.
         • Determine which learning/activity areas stimulate greater voice communication between child-teacher and child-child.
         • Determine in which activity areas individual children, or all children within a given classroom on average, spend their time.
  17. References
     • [1] M. Najafian, D. Irvin, Y. Luo, B.S. Rous, and J.H.L. Hansen, Automatic measurement and analysis of the child verbal communication using classroom acoustics within a child care center, in WOCCI, 2016.
     • [2] M. Najafian and J.H.L. Hansen, Speaker independent diarization for child language environment analysis using Deep Neural Networks, submitted to IEEE SLT-2016.
     • [3] S. O. Sadjadi, J.H.L. Hansen, Unsupervised speech activity detection using voicing measures and perceptual spectral flux, IEEE Signal Processing Letters, vol. 20, no. 3, pp. 197-200, March 2013.
     • [4] A. Ziaei, L. Kaushik, A. Sangwan, J.H.L. Hansen, D. Oard, Speech activity detection for NASA Apollo space missions: challenges and solutions, ISCA INTERSPEECH-2014, Paper #994, Singapore, Sept. 14-18, 2014.
     • [5] N. Dehak, P. Kenny, R. Dehak, P. Dumouchel, P. Ouellet, Front-end factor analysis for speaker verification, INTERSPEECH, 2011.