KTTO_2015_Vavrek
1. The Development of Excellence of the Telecommunication Research Team in
Relation to International Cooperation - CZ.1.07/2.3.00/20.0217
Multi-level audio classification architecture
Jozef Vavrek, Jozef Juhár
Department of Electronics and Multimedia Communications
Faculty of Electrical Engineering and Informatics
Technical University of Košice
email: {Jozef.Vavrek; Jozef.Juhar}@tuke.sk
3. Content
1. Motivation and aim
2. Proposed classification system
3. Audio data
4. Segmentation, preprocessing, feature extraction, smoothing
4.1 Feature extraction techniques (cepstral)
4.2 Feature extraction techniques (spectral)
5. Basic principles of BN audio data classification via BDT
6. Basic principles of BN audio data classification via BDA
7. Binary discrimination architecture employing Support Vector
Machine classifier (BDASVM)
8. Experimental setup
9. Results
10. Additional experiments – One Against One (OAO) architecture
11. Additional results
12. Conclusions & future work
4. 1. Motivation and aim
We built the classification system with the intention of using it to refine the acoustic models for each particular audio class and to lower the word error rate of the automatic speech recognition (ASR) system.
We proposed a binary discrimination architecture utilizing a support vector machine classifier (BDASVM) in order to surpass the classification accuracy of binary decision trees with SVM (BDTSVM) and to alleviate the misclassification error that propagates from the top of the architecture.
5. 2. Proposed classification system
6. 3. Audio data
Database: Slovak TV broadcast news corpus BNKE1, part of the COST-278 database
Audio: 16 kHz 16 bit mono PCM
Metadata: manually annotated using Transcriber
Duration: 65 hours (188 recordings)
Audio data used for training and testing:
Audio event                    Training set (min)   Testing set (min)
Pure Speech (PS)               10.19                9.16
Speech with env. sound (SES)   9.26                 9.44
Speech with music (MS)         9.41                 9.25
Music (M)                      11.7                 9.04
Env. sound (background, B)     9.06                 9.31
Total                          49                   46.2
7. 4. Segmentation, preprocessing, feature extraction, smoothing
8. 4.1 Feature extraction techniques (cepstral)
Mel-Frequency Cepstral Coefficients (MFCC)
Variance of Acceleration Mel-Frequency Cepstral Coefficients (VAMFCC)
Variance of Mel-Filter Bank Energy (VMFBE)
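The cepstral feature slides themselves are figures in the original deck. As a rough illustration of the quantity behind VMFBE, here is a minimal NumPy sketch of mel filter-bank energies and their per-band variance over a segment; the filter count (24), FFT size (512) and log compression are assumptions, not the deck's actual settings:

```python
import numpy as np

def hz_to_mel(f):
    # Standard mel-scale mapping used for MFCC-style filter banks.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters=24, n_fft=512, sr=16000):
    # Triangular filters spaced evenly on the mel scale up to Nyquist.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)   # rising slope
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)   # falling slope
    return fb

def mel_band_energies(frames, n_fft=512, sr=16000):
    # frames: (n_frames, frame_len) windowed signal frames.
    spec = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    return spec @ mel_filterbank(n_fft=n_fft, sr=sr).T

def vmfbe(frames):
    # VMFBE-style statistic: variance of each band's log energy
    # across the frames of a segment.
    e = np.log(mel_band_energies(frames) + 1e-10)
    return e.var(axis=0)
```

MFCC would continue from the same filter-bank energies with a DCT; VAMFCC would then take the variance of the MFCC acceleration (delta-delta) coefficients.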
9. 4.2 Feature extraction techniques (spectral)
Spectral Centroid (SC)
Spectral Flux (SF)
Spectral Spread (SS)
10. 4.2 Feature extraction techniques (spectral)
Spectral ROLL-OFF (ROLLOFF)
Band Periodicity (BP)
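The spectral descriptors above follow their textbook definitions; a compact NumPy sketch over a magnitude spectrogram (the 85% roll-off threshold and other parameter choices are assumptions, and Band Periodicity is omitted since the deck does not show its band layout):

```python
import numpy as np

def spectral_features(mag, sr=16000, rolloff_pct=0.85):
    """Per-frame spectral descriptors from a magnitude spectrogram
    mag of shape (n_frames, n_bins)."""
    n_bins = mag.shape[1]
    freqs = np.linspace(0.0, sr / 2, n_bins)
    power = mag ** 2
    total = power.sum(axis=1) + 1e-10

    # Spectral Centroid (SC): power-weighted mean frequency.
    sc = (power * freqs).sum(axis=1) / total
    # Spectral Spread (SS): power-weighted std. dev. around the centroid.
    ss = np.sqrt((power * (freqs - sc[:, None]) ** 2).sum(axis=1) / total)
    # Spectral Flux (SF): L2 norm of the frame-to-frame magnitude change.
    sf = np.r_[0.0, np.sqrt((np.diff(mag, axis=0) ** 2).sum(axis=1))]
    # Spectral Roll-off: frequency below which rolloff_pct of power lies.
    cum = np.cumsum(power, axis=1)
    ro = freqs[(cum >= rolloff_pct * cum[:, -1:]).argmax(axis=1)]
    return sc, ss, sf, ro
```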
11. 5. Basic principles of BN audio data classification via BDT
12. 6. Basic principles of BN audio data classification via BDA
13. 7. Binary Discrimination Architecture employing Support Vector Machine classifier (BDASVM)
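The BDASVM topology itself appears only as a figure in the deck, so the following is a generic sketch of binary-discrimination routing over the five broadcast-news classes. The tree layout is an assumption inferred from the discriminator names on the results slide (S-NS, PS-NPS, MS-SES, M-B), and the threshold "discriminators" are toy stand-ins for trained SVMs:

```python
def classify(x, s_ns, ps_nps, ms_ses, m_b):
    """Route a feature vector x through binary discriminators.
    Each discriminator is a callable returning True for its
    first-named class (e.g. s_ns(x) is True for speech)."""
    if s_ns(x):                      # speech vs. non-speech
        if ps_nps(x):                # pure speech vs. non-pure speech
            return "PS"
        return "MS" if ms_ses(x) else "SES"
    return "M" if m_b(x) else "B"    # music vs. background

# Toy usage with threshold "discriminators" on a 2-value feature vector.
label = classify(
    (0.9, 0.2),
    s_ns=lambda x: x[0] > 0.5,
    ps_nps=lambda x: x[1] < 0.3,
    ms_ses=lambda x: x[1] > 0.6,
    m_b=lambda x: x[0] > 0.8,
)
```

A BDT-style cascade like this lets each level use its own parameterization and feature selection, which is the advantage the conclusions claim over the flat OAO architecture.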
14. 8. Experimental setup
Segmentation: rectangular window, length 200 ms, 100 ms overlap
Preprocessing: Hamming window, length 50 ms, 25 ms overlap
Feature extraction: frame-based, segment-based, frame-based with smoothing, segment-based with smoothing
Smoothing: floating window, length 1 s
Classification: support vector machine classifier, RBF kernel function, 5-fold cross-validation
• Evaluation parameters:
– for cross-validation: Area Under the Curve (AUC) = ⟨TPR⟩, AUC ∈ (0.5, 1)
– for classification performance: Accuracy (Acc) = (TP + TN) / (TP + FP + TN + FN)
– Processing Time (PT)
Software: wavex (wav extractor), libsvm-3.17
Hardware: HPC TUKE, 24 nodes; IBM Blade System x HS22 with two six-core Intel Xeon L5640 (2.27 GHz) processors and 48 GB RAM
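The preprocessing and smoothing steps above can be sketched as follows. The 40 frames/s rate (25 ms hop) and majority-vote smoothing are assumptions; the deck does not specify how the 1 s floating window combines the per-frame decisions:

```python
import numpy as np

def frame_signal(x, sr=16000, win_s=0.05, hop_s=0.025):
    # Slice the signal into 50 ms Hamming-windowed frames with a
    # 25 ms hop, matching the deck's preprocessing step.
    win = int(win_s * sr)
    hop = int(hop_s * sr)
    n = 1 + max(0, (len(x) - win) // hop)
    frames = np.stack([x[i * hop:i * hop + win] for i in range(n)])
    return frames * np.hamming(win)

def smooth_decisions(labels, frames_per_s=40, win_s=1.0):
    # 1 s floating-window vote over per-frame class labels: each
    # frame takes the most frequent label in its surrounding window.
    half = int(win_s * frames_per_s) // 2
    out = []
    for i in range(len(labels)):
        window = labels[max(0, i - half):i + half + 1]
        out.append(max(set(window), key=window.count))
    return out
```

Smoothing of this kind is what separates the "framefw"/"segfw" parameterization levels in the results from the raw frame- and segment-based ones.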
15. 9. Results
Tab.: Classification performance of BDTSVM and BDASVM architectures for different parameterization levels (Acc [%] is the average over the S-NS, PS-NPS, MS-SES and M-B discriminators)

Topology   frame    framefw   seg      segfw
BDTSVM     74.82    88.05     78.13    86.81
BDASVM     75.10    89.12     80.43    90.52
Diff       +0.28    +1.07     +2.3     +3.71

Tab.: The overall classification performance of BDTSVM and BDASVM architectures (Acc [%] is the average over all parameterization levels)

Topology   PS      MS      SES     M       B       Avg     PT [min]
BDTSVM     85.69   54.46   48.63   72.75   77.83   67.87   44.13
BDASVM     85.94   53.29   48.94   72.85   80.74   68.35   48.37
Diff                                               +0.48   -4.24
16. 10. Additional experiments – One Against One (OAO) architecture
17. 11. Additional results
Tab.: Classification performance of OAOSVM and BDASVM architectures for different parameterization levels (Acc [%] is the average over the PS, MS, SES, M and B classes)

Topology   frame    framefw   seg      segfw
OAOSVM     63.70    77.38     64.41    74.16
BDASVM     54.95    79.75     62.91    75.79
Diff       -8.75    +2.37     -1.5     +1.63

Tab.: The overall classification performance of OAOSVM and BDASVM architectures (Acc [%] is the average over all parameterization levels)

Topology   PS      MS      SES     M       B       Avg     PT [min]
OAOSVM     86.92   53.22   46.79   76.49   86.13   69.91   24.56
BDASVM     85.94   53.29   48.94   72.85   80.74   68.35   48.37
Diff                                               -1.56   -23.81
18. 12. Conclusions & future work
Advantages of BDASVM
• significant classification error reduction on each individual classification level, regardless of the parameterization level (against BDTSVM)
• higher overall classification accuracy (against BDTSVM)
• possibility of using optimal parameterization and feature selection techniques on each individual classification level (against OAOSVM)
Disadvantages of BDASVM
• higher number of classifiers => higher processing time (against BDTSVM and OAOSVM)
• a need to find an optimal feature selection algorithm for selecting the optimal training and testing sets (against BDTSVM and OAOSVM)
In the near future, we will compare the BDASVM with the One Against All SVM (OAASVM) architecture and extract each audio class using phoneme-based alignment.
Future work will also be directed towards implementing the BDASVM in the BN transcription system.
19. Thank you for your attention