This document summarizes the results of five systems for detecting violent scenes in videos, submitted by Technicolor, INRIA, and Imperial College London to the MediaEval 2012 Violent Scene Detection Task. System 1 used similarity measures between frames, System 2 used bag-of-audio-words modeling, System 3 used Bayesian network structure learning, System 4 used a naive Bayesian classifier, and System 5 fused the outputs of Systems 2, 3, and 4. System 3 performed best with a MAP of 61.82%, while the fusion system was the 4th best run overall. The document concludes with perspectives on improving the different approaches.
Technicolor/INRIA/Imperial College London at the MediaEval 2012 Violent Scene Detection Task
1. Technicolor / INRIA / Imperial College London at the MediaEval 2012 Violent Scene Detection Task
PENET Cédric – Technicolor, INRIA
DEMARTY Claire-Hélène – Technicolor
SOLEYMANI Mohammad – Imperial College London
GRAVIER Guillaume – CNRS, IRISA
GROS Patrick – INRIA
MediaEval 2012 Pisa Workshop
October 4th, 2012
2. Outline
Introduction
Systems description
Results and conclusion
3. Outline
Introduction
Systems description
Results and conclusion
4. Introduction
Joint effort between Technicolor / INRIA / Imperial College London
5 runs, 5 different systems
Re-use of last year's systems with a few differences
Bayesian network structure learning (Technicolor/INRIA)
Naive Bayesian classifier (ICL)
Two new systems from Technicolor/INRIA
Exploiting similarity
Bag-of-Audio words
Fusion of three systems (Technicolor/INRIA – ICL)
5. Outline
Introduction
Systems description
Results and conclusion
6. Run 1: Exploiting Similarity
Idea: can we get the same results as last year using only similarity measures?
Video features for each frame
Motion activity
Three color harmonisation features: harmonisation template, angle and energy
Decision: KNN using only closest neighbour
10 movies used to populate the KNN
Test frames labelled according to their closest neighbour
If one frame of a shot is labelled violent, the whole shot is labelled violent (a sketch of this rule follows)
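A minimal sketch of this decision rule, assuming the per-frame features (motion activity plus the three colour-harmonisation features) have already been extracted into arrays; the function and variable names are hypothetical, not the actual Technicolor/INRIA code:

```python
# 1-NN frame labelling with violent-shot aggregation (illustrative only).
from sklearn.neighbors import KNeighborsClassifier

def label_shots(dev_feats, dev_labels, test_feats, test_shot_ids):
    """dev_feats/test_feats: one feature row per frame;
    test_shot_ids: the shot index of each test frame."""
    knn = KNeighborsClassifier(n_neighbors=1)  # only the closest neighbour
    knn.fit(dev_feats, dev_labels)
    frame_labels = knn.predict(test_feats)
    # A shot is labelled violent as soon as one of its frames is violent.
    shot_labels = {}
    for shot, lab in zip(test_shot_ids, frame_labels):
        shot_labels[shot] = shot_labels.get(shot, 0) or int(lab)
    return shot_labels
```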
7. Run 2: Bag-of-Audio words
Audio feature extraction
Extraction of MFCC audio features (with Δ and ΔΔ) - 20 ms windows, 10 ms overlap
Extraction of silence segments with SPro
Extraction of coherent audio segments - André-Obrecht 1988
K-Means on non-silent audio segments for vocabulary (of size 128)
Each audio segment replaced by closest centroid
Construction of TF-IDF histograms
Each shot is a document
Classification using SVM
χ² and histogram intersection kernels
Weighting applied to the SVM cost parameter (to handle class imbalance)
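A hedged sketch of this bag-of-audio-words pipeline, assuming segment-level MFCC vectors are already extracted; scikit-learn's balanced class weighting stands in for whatever SVM weighting the run actually used, and all names are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC

def boaw_train(segment_feats, shot_of_segment, n_shots, shot_labels):
    # 128-word vocabulary learned on non-silent audio segments.
    vocab = KMeans(n_clusters=128, n_init=10).fit(segment_feats)
    words = vocab.predict(segment_feats)
    # Each shot is a document: count the audio words falling in it.
    counts = np.zeros((n_shots, 128))
    for shot, w in zip(shot_of_segment, words):
        counts[shot, w] += 1
    hists = TfidfTransformer().fit_transform(counts).toarray()
    # Chi-squared kernel SVM (a histogram-intersection kernel is the other
    # option above); at test time use chi2_kernel(test_hists, hists).
    gram = chi2_kernel(hists)
    svm = SVC(kernel="precomputed", class_weight="balanced").fit(gram, shot_labels)
    return vocab, hists, svm
```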
8. Run 3: Bayesian network structure learning
Re-use of Technicolor's system from last year with additional features
Audio features: energy, asymmetry, centroid, ZCR, flatness and roll-off at 90%
Video features: shot length, flashes, blood, activity, color coherence, average luminance, fire and color harmonisation features
Features are averaged over a video shot
Graphical model for modeling conditional probability distributions along with contextual features and temporal smoothing
Naive Bayesian network (NB)
Bayesian network example
Graph structure learning (see the sketch below)
Forest augmented naive Bayesian network (FAN)
K2
Late fusion of modalities using a simple rule
Source: https://controls.engin.umich.edu/wiki/index.php/Bayesian_network_theory
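As an illustration of score-based structure learning of this kind, here is a minimal sketch using the pgmpy library, assuming its HillClimbSearch/K2Score API; hill climbing guided by the K2 score stands in for the K2 algorithm used in the actual run, and the discretised per-shot feature columns (including a "violent" target) are hypothetical:

```python
# Learn a Bayesian network structure over discretised per-shot features,
# then classify shots by inference (a sketch under the stated assumptions).
from pgmpy.estimators import HillClimbSearch, K2Score
from pgmpy.models import BayesianNetwork

def learn_and_classify(train_df, test_df, target="violent"):
    # Score-based structure search: hill climbing with the K2 score.
    dag = HillClimbSearch(train_df).estimate(scoring_method=K2Score(train_df))
    bn = BayesianNetwork(dag.edges())
    bn.fit(train_df)  # maximum-likelihood estimation of the CPDs
    # Predict the missing "violent" column for the test shots.
    return bn.predict(test_df.drop(columns=[target]))
```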
9. Run 4: Naïve Bayesian classifier
Audio modality
Classical low level features extracted from non-silent segments
RMS energy, pitch, MFCC, ZCR, spectral flux, spectral roll-off
Averaged over shots
Video modality
Shot duration, luminance, average activity, motion component
Averaged over shots
Text features
Simple features such as the number of spoken words and the average valence and arousal per shot (from the Dictionary of Affect in Language)
The results were poor, so we decided not to include them in the final submission
A Naïve Bayesian classifier on each modality
Modality fusion using a weighted sum of posterior probabilities: 0.95 × audio score + 0.05 × visual score (see the sketch below)
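A toy version of this per-modality classification and late fusion, assuming shot-averaged audio and video feature matrices; GaussianNB is one plausible naive Bayes variant, not necessarily the one used in the run:

```python
from sklearn.naive_bayes import GaussianNB

def fused_scores(Xa_train, Xv_train, y_train, Xa_test, Xv_test):
    audio_nb = GaussianNB().fit(Xa_train, y_train)
    video_nb = GaussianNB().fit(Xv_train, y_train)
    p_audio = audio_nb.predict_proba(Xa_test)[:, 1]  # P(violent | audio)
    p_video = video_nb.predict_proba(Xv_test)[:, 1]  # P(violent | video)
    # Weighted-sum fusion with the weights from the slide.
    return 0.95 * p_audio + 0.05 * p_video
```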
10. Run 5: Systems fusion
Simple fusion of three systems
Run 2: Bag-of-Audio words
Run 3: Bayesian network structure learning
Run 4: Naive Bayesian classifier
Fusion by multiplication of probabilities (see the sketch below)
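In code, this multiplicative late fusion is just an element-wise product of the three systems' per-shot probabilities (assuming aligned score lists in [0, 1]):

```python
def product_fusion(p_boaw, p_bnsl, p_nbn):
    # Shots only score high when all three systems agree.
    return [a * b * c for a, b, c in zip(p_boaw, p_bnsl, p_nbn)]
```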
11. Outline
Introduction
Systems description
Results and conclusions
12. Results
N°  Technique    MAP@100 (%)   AP-1 (%)   AP-2 (%)   AP-3 (%)   STD (%)   MediaEval Cost
1   Similarity      13.89        0.00      12.91      28.77      14.41        2.29
2   BoAW            40.54       10.85      52.98      57.77      25.82        2.50
3   BN-SL           61.82       60.56      53.15      71.76       9.37        3.57
4   NBN             46.27       40.03      22.97      75.82      26.97        3.64
5   Fusion          57.47       64.52      37.21      70.69      17.82        4.60
Average Precision (AP) for Dead Poets Society (AP-1), Fight Club (AP-2) and Independence Day (AP-3)
STD: standard deviation over the three test movies
High variation between movies
Best results on Independence Day (similar to the development movie Armageddon)
More movies would be needed to compute a reliable MAP (an AP@100 sketch follows)
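For reference, one common way to compute the average precision at 100 underlying the MAP@100 column (the exact MediaEval evaluation protocol may differ in details such as the normalisation):

```python
def ap_at_k(ranked_relevance, k=100):
    # ranked_relevance: 1/0 ground-truth labels of the top-ranked shots.
    hits, precision_sum = 0, 0.0
    for i, rel in enumerate(ranked_relevance[:k], start=1):
        if rel:
            hits += 1
            precision_sum += hits / i
    return precision_sum / max(hits, 1)  # 0.0 if nothing relevant retrieved
```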
13. Conclusion & perspectives
Similarity search
MAP is poor, but the MediaEval cost is one of the best (6th out of 35)
Adding features and merging decisions from different KNNs might improve the results
Fusion
4th best run overall (out of 35)
Results not as good as expected
Improves precision at the cost of recall (false alarms reduced by a factor of two)
Test smarter fusion techniques
Bayesian Networks – Structure Learning
3rd best run overall (out of 35)
Very low standard deviation over three movies
Bayesian networks for intermediate concepts
14. Conclusion & perspectives
Bag-of-Audio words
MAP is not bad (11th out of 35)
False alarms and missed detections are pretty low too
Simple techniques proved effective - more investigation needed
Naive Bayesian classifier
A simple classifier with audio features can achieve moderately good results (10th out of 35)
Text features don’t work
Use a classifier that can learn temporal dynamics