The document describes experiments using cGANs for speech style conversion from habitual to clear speech. It found that:
1) In a speaker-dependent one-to-one mapping experiment, cGANs improved speech intelligibility over DNN mapping for 2 out of 3 speakers based on keyword recall accuracy.
2) A speaker-independent many-to-one mapping experiment showed cGANs improved intelligibility for 1 out of 3 speakers.
3) A speaker-independent many-to-many mapping experiment showed cGANs improved keyword recall for 2 speakers but results were not significant. The modest results were likely due to a small dataset and not transforming additional acoustic features like duration.
This document describes experiments using a variational autoencoder (VAE) to perform spectral voice conversion and style conversion. The VAE was able to produce high quality speech in a vocoding task and achieved good speaker accuracy in a voice conversion experiment. A style conversion from habitual to clear speech using the VAE feature representation and a DNN with skip connections significantly improved speech intelligibility for one speaker from 24% to 46%.
This document presents a robust speech enhancement method using an adaptive Kalman filtering algorithm. It aims to overcome drawbacks of conventional Kalman filtering for speech enhancement. The proposed algorithm only constantly updates the first value of the state vector, eliminating matrix operations and reducing computational complexity. It also includes a forgetting factor to automatically adjust the estimation of environmental noise based on observation data, allowing the algorithm to better estimate real noise. Experimental results show the proposed robust algorithm is more effective for speech enhancement than the conventional Kalman filtering approach.
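The scheme described above (a scalar state update with no matrix operations, plus a forgetting factor on the noise estimate) can be sketched as follows. The paper's exact equations are not reproduced here, so the forgetting-factor update rule and the parameter values are illustrative assumptions:

```python
def enhance(samples, lam=0.98, q=1e-4):
    """Minimal scalar adaptive Kalman filter sketch for speech enhancement.
    Only the first (scalar) state value is updated, and a forgetting factor
    lam < 1 down-weights old observations when re-estimating noise variance."""
    x = 0.0      # state estimate (clean-speech sample)
    p = 1.0      # estimate variance
    r = 1.0      # running environmental-noise variance estimate
    out = []
    for y in samples:
        # predict step: only the variance grows by the process noise q
        p = p + q
        # adapt the noise estimate from observation data via the forgetting factor
        r = lam * r + (1.0 - lam) * (y - x) ** 2
        # scalar Kalman gain and update: no matrix operations required
        k = p / (p + r)
        x = x + k * (y - x)
        p = (1.0 - k) * p
        out.append(x)
    return out
```

With `lam` close to 1 the noise estimate adapts slowly and smooths more; lowering it lets the filter track changing noise conditions faster.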
Summary from the paper:
Ipeirotis, P. G., et al. "Repeated labeling using multiple noisy labelers." Data Mining and Knowledge Discovery, 2014, Vol. 28, No. 2, pp. 402-441.
and short list of related papers for the topic
This document summarizes research on improving speech recognition of regional British English accents with limited accent-diverse training data. The research used acoustic model selection and data selection with Gaussian mixture models and deep neural networks. Key findings include: 1) Deep neural networks achieved a 46.85% relative gain over Gaussian mixture models for recognizing accents. 2) Supplementing training data with a small amount from the most difficult accent (Glaswegian) led to similar gains as using more data from various accents. 3) Analyzing accent properties and difficulty in the training data helped address challenges of multi-accent learning with limited resources.
Spelling correction systems for e-commerce platforms – Anjan Goswami
This is a presentation on building a scalable, machine-learned spell-correction system for an e-commerce site; most of the techniques, however, also apply to any large consumer site.
The document discusses using polymers to functionalize surfaces for applications such as drug delivery. It describes modeling polymers using techniques like self-consistent field theory and numerical methods. The summary concludes that the modeling could help experimentalists design random copolymer brush systems to achieve perpendicular lamellae for high-value semiconductor devices.
The document discusses two experiments on classifying heart sounds using deep learning models. Experiment 1 compares the performance of various pre-trained deep learning models, finding that models pre-trained on audio data performed better than models pre-trained on image data. Experiment 2 evaluates model performance under domain shift conditions by training on one heart sound database and testing on others. It finds that data augmentation techniques like trimming and respiratory scaling can improve robustness, with all augmentation techniques together working best on 3 of 6 test databases.
Neural Semi-supervised Learning under Domain Shift – Sebastian Ruder
This document discusses neural semi-supervised learning under domain shift. It presents three research areas:
1) Learning across domains by selecting relevant source domain data for transfer learning using Bayesian optimization. Experimental results on sentiment analysis, POS tagging, and dependency parsing show this approach outperforms baselines.
2) Revisiting classic semi-supervised learning techniques like self-training, tri-training, and comparing them to recent advances. Experiments on sentiment analysis and POS tagging find tri-training works best.
3) The possibility of leveraging pre-trained language models for semi-supervised learning when the target task differs from the source task.
The document discusses domain-specific languages (DSLs) and methods for evaluating the usability of DSLs. It notes that DSLs aim to raise the level of abstraction by focusing on domain concepts rather than computation concepts. This can provide benefits like productivity gains. However, the document states that software language engineers often do not evaluate how their languages impact the software development process.
It then presents Barišić's work on introducing DSL usability evaluation into the DSL development lifecycle. This includes designing effective experiments to provide qualitative and quantitative feedback to DSL developers. The goal is to produce user-centered DSL design and foresee quality while the language evolves. The document outlines one case study where two
https://telecombcn-dl.github.io/2017-dlsl/
Winter School on Deep Learning for Speech and Language. UPC BarcelonaTech ETSETB TelecomBCN.
The aim of this course is to train students in methods of deep learning for speech and language. Recurrent Neural Networks (RNN) will be presented and analyzed in detail to understand the potential of these state-of-the-art tools for time series processing. Engineering tips and scalability issues will be addressed to solve tasks such as machine translation, speech recognition, speech synthesis, or question answering. Hands-on sessions will provide development skills so that attendees can become competent in contemporary data analytics tools.
The document summarizes a proposed method for reducing over-generation errors in automatic keyphrase extraction using integer linear programming. It formulates keyphrase extraction as a combinatorial optimization problem to find the optimal set of keyphrases. An ILP model is defined that represents the value of a keyphrase set as the sum of unique word weights, outperforming baselines that rank candidates independently. Experiments on scientific documents show the ILP approach substantially improves precision over commonly used methods.
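The objective described above, valuing a keyphrase set by the weights of the unique words it covers, can be illustrated with a small sketch. Exhaustive search stands in for the paper's ILP solver, and the candidate phrases and word weights are invented for illustration:

```python
from itertools import combinations

def best_keyphrase_set(candidates, word_weight, k):
    """Choose up to k candidate phrases maximizing the summed weight of
    the UNIQUE words they cover. A repeated word earns credit only once,
    so overlapping, over-generated candidates add no value.
    (Exhaustive search stands in for an ILP solver here.)"""
    best, best_val = set(), 0.0
    for r in range(1, k + 1):
        for subset in combinations(candidates, r):
            covered = {w for phrase in subset for w in phrase.split()}
            val = sum(word_weight.get(w, 0.0) for w in covered)
            if val > best_val:
                best, best_val = set(subset), val
    return best, best_val

# Invented weights and candidates, purely for illustration.
weights = {"neural": 3.0, "network": 2.0, "speech": 2.5, "deep": 1.0}
cands = ["neural network", "deep neural network", "speech", "neural"]
```

Here the optimum for k=2 is {"deep neural network", "speech"} with value 8.5; picking the overlapping "neural network" instead of "speech" would add nothing, which is exactly the over-generation penalty the ILP objective encodes, in contrast to ranking candidates independently.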
Pascual, Santiago, Antonio Bonafonte, and Joan Serrà. "SEGAN: Speech Enhancement Generative Adversarial Network." INTERSPEECH 2017.
Current speech enhancement techniques operate on the spectral domain and/or exploit some higher-level feature. The majority of them tackle a limited number of noise conditions and rely on first-order statistics. To circumvent these issues, deep networks are being increasingly used, thanks to their ability to learn complex functions from large example sets. In this work, we propose the use of generative adversarial networks for speech enhancement. In contrast to current techniques, we operate at the waveform level, training the model end-to-end, and incorporate 28 speakers and 40 different noise conditions into the same model, such that model parameters are shared across them. We evaluate the proposed model using an independent, unseen test set with two speakers and 20 alternative noise conditions. The enhanced samples confirm the viability of the proposed model, and both objective and subjective evaluations confirm its effectiveness. With that, we open the exploration of generative architectures for speech enhancement, which may progressively incorporate further speech-centric design choices to improve their performance.
The document describes two experiments that investigate how speakers align their referring expressions with dialogue partners. The first experiment found that speakers will align by using dispreferred properties and overspecifying attributes when primed to do so. The second experiment added a secondary memory task and found that speakers aligned less when cognitively taxed, suggesting alignment is an effortful process that primarily benefits listeners rather than being automatic for speakers. A dual-process model of alignment is proposed with preferences prioritized over alignment under pressure.
PR12-179 M3D-GAN: Multi-Modal Multi-Domain Translation with Universal Attention – Taesu Kim
Paper review: "M3D-GAN: Multi-Modal Multi-Domain Translation with Universal Attention"
Presented at Tensorflow-KR paper review forum (#PR12) by Taesu Kim
Paper link: https://arxiv.org/abs/1907.04378
Video link: https://youtu.be/CpRGaFPIZnw (in Korean)
Iterative Hybridization of DE with LS (IHDELS) is a method for large-scale global optimization problems. It alternates between differential evolution (DE) and local search (LS) iterations. DE is used for exploration while LS is used for exploitation. The LS methods used are adapted over time based on their accumulated improvements. IHDELS was tested on 15 benchmark functions up to 1000 dimensions, and achieved good results finding optimal or near-optimal solutions on most test functions within 3 million evaluations.
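The alternation described above might look like this in miniature. The DE variant, the two local-search operators, and the credit-update rule below are simplified assumptions for illustration, not the paper's exact components:

```python
import random

def ihdels(f, dim, iters=30, pop_size=10, seed=0):
    """Toy sketch of iterative DE/LS hybridization: each cycle runs one
    (simplified) DE/rand/1/bin generation for exploration, then applies the
    local-search operator whose accumulated past improvement is largest."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]

    def ls_small(x):   # small-step perturbation (fine exploitation)
        return [xi + rng.gauss(0, 0.1) for xi in x]

    def ls_large(x):   # larger perturbation (coarse exploitation)
        return [xi + rng.gauss(0, 1.0) for xi in x]

    credit = {ls_small: 1e-9, ls_large: 1e-9}
    for _ in range(iters):
        # --- one DE generation (simplified: no forced crossover index) ---
        for i in range(pop_size):
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            trial = [ai + 0.5 * (bi - ci) if rng.random() < 0.9 else xi
                     for xi, ai, bi, ci in zip(pop[i], a, b, c)]
            if f(trial) < f(pop[i]):
                pop[i] = trial
        # --- adaptive local search on the current best ---
        best = min(pop, key=f)
        ls = max(credit, key=credit.get)
        cand = ls(best)
        improvement = f(best) - f(cand)
        credit[ls] = 0.5 * credit[ls] + max(improvement, 0.0)
        if improvement > 0:
            pop[pop.index(best)] = cand
    return min(pop, key=f)
```

The decayed-credit dictionary is one simple way to let the LS choice adapt over time, as the summary describes; the real method's credit assignment is likely more elaborate.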
Noise Pollution in Hospital Readmission Prediction: Long Document Classificat... – Jinho Choi
This document discusses using reinforcement learning for hospital readmission prediction from clinical notes. It presents an approach that uses a bag-of-words encoder with an RL agent to perform automatic noise pruning. The RL agent is able to identify and remove noisy tokens and text segments, improving performance over strong baselines. Experimental results show the RL method achieves better performance than deep learning approaches, while reducing the feature space to alleviate overfitting on this small clinical dataset.
Evolution of specialist vs. generalist strategies in a continuous environment – Florence (Flo) Debarre
The document describes a model of the evolution of specialist versus generalist strategies in a spatially continuous landscape with two habitat types. The model examines the conditions under which a generalist strategy that uses both habitats equally is evolutionarily stable versus specialist strategies that focus on one habitat. It finds that a generalist strategy is always convergence stable, while it is evolutionarily stable under conditions of a concave trade-off between habitats and sufficient migration between habitats.
This document proposes a cluster-based Hausdorff distance (CHD) to measure the similarity between color palettes. It clusters the colors in each palette into groups and then calculates the distance between the clustered groups. Previous methods like Hausdorff distance do not account for the weights of each color. The document tests CHD on sample palettes and finds it performs better than other methods at capturing overall tone and partial matches. It concludes CHD reflects the characteristics of color palettes but notes more research is needed to determine the best way to measure palette similarity.
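A minimal sketch of the idea, assuming a basic k-means clustering step and a cluster-size-weighted directed distance; the paper's exact clustering and weighting schemes are not reproduced here:

```python
import math

def dist(c1, c2):
    """Euclidean distance between two RGB colors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))

def kmeans(colors, k, iters=10):
    """Plain k-means over colors; returns centroids and relative cluster sizes."""
    centers = list(colors[:k])
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for c in colors:
            groups[min(range(k), key=lambda j: dist(c, centers[j]))].append(c)
        centers = [tuple(sum(ch) / len(g) for ch in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers, [len(g) / len(colors) for g in groups]

def chd(palette_a, palette_b, k=2):
    """Cluster-based Hausdorff distance sketch: cluster each palette, then
    take the max of the two size-weighted directed distances between
    cluster representatives (unlike plain Hausdorff, clusters with more
    colors count for more)."""
    ca, wa = kmeans(palette_a, k)
    cb, wb = kmeans(palette_b, k)
    d_ab = sum(w * min(dist(c, c2) for c2 in cb) for c, w in zip(ca, wa))
    d_ba = sum(w * min(dist(c, c2) for c2 in ca) for c, w in zip(cb, wb))
    return max(d_ab, d_ba)
```

Identical palettes score zero, and a mostly-red palette sits far from a mostly-blue one while staying close to another red-leaning palette, which is the partial-match behavior the document tests for.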
1. The document discusses various applications of deep learning algorithms for speaker identification and recognition, including convolutional deep belief networks (CDBN) and deep neural networks (DNN).
2. CDBN was shown to outperform traditional MFCC and raw features for audio classification tasks including speech and music recognition.
3. DNN approaches have demonstrated lower error rates than GMM-HMM models for speech recognition across multiple languages.
4. SIDEKIT is an open source Python toolkit that can implement state-of-the-art methods for speaker identification, including GMM-HMM, and has potential to incorporate DNN approaches.
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill – LizaNolte
HERE IS YOUR WEBINAR CONTENT! 'Mastering Customer Journey Management with Dr. Graham Hill'. We hope you find the webinar recording both insightful and enjoyable.
In this webinar, we explored essential aspects of Customer Journey Management and personalization. Here’s a summary of the key insights and topics discussed:
Key Takeaways:
Understanding the Customer Journey: Dr. Hill emphasized the importance of mapping and understanding the complete customer journey to identify touchpoints and opportunities for improvement.
Personalization Strategies: We discussed how to leverage data and insights to create personalized experiences that resonate with customers.
Technology Integration: Insights were shared on how inQuba’s advanced technology can streamline customer interactions and drive operational efficiency.
How information systems are built or acquired puts information, which is what they should be about, in a secondary place. Our language adapted accordingly, and we no longer talk about information systems but applications. Applications evolved in a way that breaks data into diverse fragments, tightly coupled with the applications and expensive to integrate. The result is technical debt, which is repaid by taking even bigger "loans", resulting in an ever-increasing technical debt. Software engineering and procurement practices work in sync with market forces to maintain this trend. This talk demonstrates how natural this situation is. The question is: can something be done to reverse the trend?
The Microsoft 365 Migration Tutorial For Beginner.pptx – operationspcvita
This presentation will help you understand the power of Microsoft 365. We walk through every productivity app included in Office 365, outline common migration scenarios, and explain how we can help you.
You can also read: https://www.systoolsgroup.com/updates/office-365-tenant-to-tenant-migration-step-by-step-complete-guide/
QA or the Highway - Component Testing: Bridging the gap between frontend appl... – zjhamm304
These are the slides for the presentation, "Component Testing: Bridging the gap between frontend applications" that was presented at QA or the Highway 2024 in Columbus, OH by Zachary Hamm.
High performance Serverless Java on AWS - GoTo Amsterdam 2024 – Vadym Kazulkin
Java has for many years been one of the most popular programming languages, but it used to have a hard time in the Serverless community. Java is known for its high cold-start times and high memory footprint compared to other programming languages like Node.js and Python. In this talk I'll look at the general best practices and techniques we can use to decrease memory consumption and cold-start times for Java Serverless development on AWS, including GraalVM (Native Image) and AWS's own offering SnapStart, based on Firecracker microVM snapshot-and-restore and CRaC (Coordinated Restore at Checkpoint) runtime hooks. I'll also provide a lot of benchmarking on Lambda functions, trying out various deployment package sizes, Lambda memory settings, Java compilation options, and HTTP (a)synchronous clients, and measure their impact on cold and warm start times.
From Natural Language to Structured Solr Queries using LLMs – Sease
This talk draws on experimentation to enable AI applications with Solr. One important use case is to use AI for better accessibility and discoverability of the data: while User eXperience techniques, lexical search improvements, and data harmonization can take organizations to a good level of accessibility, a structural (or "cognitive") gap remains between the data user needs and the data producer constraints.
That is where AI – and most importantly, Natural Language Processing and Large Language Model techniques – could make a difference. This natural language, conversational engine could facilitate access and usage of the data leveraging the semantics of any data source.
The objective of the presentation is to propose a technical approach and a way forward to achieve this goal.
The key concept is to enable users to express their search queries in natural language, which the LLM then enriches, interprets, and translates into structured queries based on the Solr index’s metadata.
This approach leverages the LLM’s ability to understand the nuances of natural language and the structure of documents within Apache Solr.
The LLM acts as an intermediary agent, offering a transparent experience to users automatically and potentially uncovering relevant documents that conventional search methods might overlook. The presentation will include the results of this experimental work, lessons learned, best practices, and the scope of future work that should improve the approach and make it production-ready.
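The final translation step might be sketched as below. The structured fields, the schema, and the example dict (standing in for the LLM's parsed output) are hypothetical illustrations, not the talk's actual design:

```python
def to_solr_query(structured):
    """Assemble Solr query parameters from fields an LLM is assumed to have
    extracted from a natural-language request. The 'structured' dict and
    its field names are hypothetical, not a real schema."""
    clauses = []
    for field, value in structured.get("filters", {}).items():
        if isinstance(value, tuple):            # numeric range filter
            lo, hi = value
            clauses.append(f"{field}:[{lo} TO {hi}]")
        else:                                   # exact-value filter
            clauses.append(f'{field}:"{value}"')
    return {"q": structured.get("text", "*:*"),
            "fq": " AND ".join(clauses)}

# Hypothetical LLM output for "red running shoes under 50 euros".
llm_output = {"text": "running shoes",
              "filters": {"color": "red", "price": (0, 50)}}
```

The point of the split is that the free-text part stays a ranked `q` while the LLM-extracted constraints become filter queries, so the index metadata (not the user) decides which fields are filterable.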
Dandelion Hashtable: beyond billion requests per second on a commodity serverAntonios Katsarakis
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables, that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state-of-the-art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. In a commodity server and a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
This talk will cover ScyllaDB Architecture from the cluster-level view and zoom in on data distribution and internal node architecture. In the process, we will learn the secret sauce used to get ScyllaDB's high availability and superior performance. We will also touch on the upcoming changes to ScyllaDB architecture, moving to strongly consistent metadata and tablets.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/temporal-event-neural-networks-a-more-efficient-alternative-to-the-transformer-a-presentation-from-brainchip/
Chris Jones, Director of Product Management at BrainChip , presents the “Temporal Event Neural Networks: A More Efficient Alternative to the Transformer” tutorial at the May 2024 Embedded Vision Summit.
The expansion of AI services necessitates enhanced computational capabilities on edge devices. Temporal Event Neural Networks (TENNs), developed by BrainChip, represent a novel and highly efficient state-space network. TENNs demonstrate exceptional proficiency in handling multi-dimensional streaming data, facilitating advancements in object detection, action recognition, speech enhancement and language model/sequence generation. Through the utilization of polynomial-based continuous convolutions, TENNs streamline models, expedite training processes and significantly diminish memory requirements, achieving notable reductions of up to 50x in parameters and 5,000x in energy consumption compared to prevailing methodologies like transformers.
Integration with BrainChip’s Akida neuromorphic hardware IP further enhances TENNs’ capabilities, enabling the realization of highly capable, portable and passively cooled edge devices. This presentation delves into the technical innovations underlying TENNs, presents real-world benchmarks, and elucidates how this cutting-edge approach is positioned to revolutionize edge AI across diverse applications.
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsDianaGray10
Join us to learn how UiPath Apps can directly and easily interact with prebuilt connectors via Integration Service--including Salesforce, ServiceNow, Open GenAI, and more.
The best part is you can achieve this without building a custom workflow! Say goodbye to the hassle of using separate automations to call APIs. By seamlessly integrating within App Studio, you can now easily streamline your workflow, while gaining direct access to our Connector Catalog of popular applications.
We’ll discuss and demo the benefits of UiPath Apps and connectors including:
Creating a compelling user experience for any software, without the limitations of APIs.
Accelerating the app creation process, saving time and effort
Enjoying high-performance CRUD (create, read, update, delete) operations, for
seamless data management.
Speakers:
Russell Alfeche, Technology Leader, RPA at qBotic and UiPath MVP
Charlie Greenberg, host
Main news related to the CCS TSI 2023 (2023/1695)Jakub Marek
An English 🇬🇧 translation of a presentation to the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on Communications and signalling systems on Railways, which was held in Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). Attended by around 500 participants and 200 on-line followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
"Choosing proper type of scaling", Olena SyrotaFwdays
Imagine an IoT processing system that is already quite mature and production-ready and for which client coverage is growing and scaling and performance aspects are life and death questions. The system has Redis, MongoDB, and stream processing based on ksqldb. In this talk, firstly, we will analyze scaling approaches and then select the proper ones for our system.
"What does it really mean for your system to be available, or how to define w...Fwdays
We will talk about system monitoring from a few different angles. We will start by covering the basics, then discuss SLOs, how to define them, and why understanding the business well is crucial for success in this exercise.
"NATO Hackathon Winner: AI-Powered Drug Search", Taras KlobaFwdays
This is a session that details how PostgreSQL's features and Azure AI Services can be effectively used to significantly enhance the search functionality in any application.
In this session, we'll share insights on how we used PostgreSQL to facilitate precise searches across multiple fields in our mobile application. The techniques include using LIKE and ILIKE operators and integrating a trigram-based search to handle potential misspellings, thereby increasing the search accuracy.
We'll also discuss how the azure_ai extension on PostgreSQL databases in Azure and Azure AI Services were utilized to create vectors from user input, a feature beneficial when users wish to find specific items based on text prompts. While our application's case study involves a drug search, the techniques and principles shared in this session can be adapted to improve search functionality in a wide range of applications. Join us to learn how PostgreSQL and Azure AI can be harnessed to enhance your application's search capability.
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
Improving Speech Intelligibility through Speaker Dependent and Independent Spectral Style Conversion
1. 1/26
Outline
Introduction
Experiment: One-to-one mapping
Experiment: Many-to-one mapping
Experiment: Many-to-many mapping
Conclusion

Improving Speech Intelligibility through Speaker Dependent and Independent Spectral Style Conversion
Tuan Dinh, Alexander Kain, Kris Tjaden
Oregon Health & Science University, University at Buffalo
October 23, 2020
Tuan Dinh, Alexander Kain, Kris Tjaden: cGANs for Voice and Style Conversion
2. 2/26
Background
Approximately 28 × 10^6 people in the United States have some degree of hearing loss.
Speakers naturally adopt a special clear speaking style when talking to:
- listeners with hearing loss
- normal-hearing listeners in adverse environments
Clear speech features:
- a high degree of articulation
- a slower speaking rate
- more frequent and longer pauses
- an exact strategy that varies from speaker to speaker
Clear speech is more intelligible than habitual speech:
- 14-24% improvement in keyword recall in noise [Kain08]
4. 4/26
Hybridization
Replacing certain acoustic features of habitual speech with those from clear speech improves intelligibility:
- for typical speakers, incorporating [Kain08]:
  - the clear spectrum and duration yielded a 24% improvement
- for dysarthric speakers, incorporating [Tjaden14]:
  - clear energy yielded an 8.7% improvement
  - the clear spectrum yielded an 18% improvement
  - the clear spectrum and duration yielded a 13.4% improvement
5. 5/26
Style Conversion
Style conversion converts the speaking style of an utterance.
Previously:
- mapping habitual (HAB) VAE-12 to clear (CLR) VAE-12 improved intelligibility for one speaker from 24% to 46% [Dinh19]
Parameters generated by DNN mapping can be over-smoothed.
Generative adversarial networks (GANs) are a promising approach to address this over-smoothness.
6. 6/26
Style Conversion: Aim
To further increase intelligibility automatically by style conversion, through the use of conditional GANs (cGANs).
Experiments showing the efficacy of cGANs in terms of speech intelligibility when performing:
1. speaker-dependent one-to-one mapping
2. speaker-independent many-to-one mapping
3. speaker-independent many-to-many mapping
7. 7/26
GANs
A traditional GAN has two components: a generator (G) and a discriminator (D) that play a min-max game [Goodfellow14].
Figure: GANs
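The min-max game can be sketched as the binary cross-entropy losses the two components optimize. This is a generic NumPy illustration of the standard GAN objective (the function name and the epsilon smoothing are our own), not the paper's implementation:

```python
import numpy as np

def gan_losses(d_real, d_fake, eps=1e-8):
    """Binary cross-entropy form of the GAN min-max game [Goodfellow14].

    d_real, d_fake: discriminator outputs (probabilities in (0, 1)) for
    real samples x and generated samples G(z), respectively.
    """
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    # D maximizes log D(x) + log(1 - D(G(z))); we return the loss D minimizes.
    d_loss = -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))
    # G's theoretical objective is to minimize log(1 - D(G(z))); in practice
    # the non-saturating form -log D(G(z)) below is commonly used instead.
    g_loss = -np.mean(np.log(d_fake + eps))
    return d_loss, g_loss
```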
8. 8/26
Proposed cGANs for style conversion
Figure: cGAN framework for style conversion. The generator G maps the current HAB VAE frame, together with left and right context, to a mapped VAE frame; the discriminator D judges whether (HAB VAE, mapped VAE) and (HAB VAE, CLR VAE) pairs are real pairs.
9. 9/26
Proposed Generator
Figure: Generator architecture. The current HAB VAE frame (12) is concatenated with 60-dimensional left and right context, passed through two dense layers (512), concatenated again, passed through two more dense layers (512) and a linear layer (12), and added to the current HAB VAE frame to produce the current CLR VAE frame.
No random noise z is used.
The component G learns the differences between HAB VAE-12 and CLR VAE-12.
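As a rough NumPy sketch of the forward pass implied by the generator figure; we assume the second Concat re-joins the hidden state with the input (the figure does not make the wiring explicit), and the activation and weight initialization follow the training tips on a later slide:

```python
import numpy as np

rng = np.random.default_rng(0)

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def init(shape, std=0.02):
    # zero-centered Normal initialization with std 0.02
    return rng.normal(0.0, std, size=shape)

# Layer sizes from the slide: 12-dim current HAB VAE frame plus
# 60-dim left and right context; hidden layers of 512; linear output of 12.
D_IN, D_CTX, D_H, D_OUT = 12, 60, 512, 12
W1, b1 = init((D_IN + 2 * D_CTX, D_H)), np.zeros(D_H)
W2, b2 = init((D_H, D_H)), np.zeros(D_H)
W3, b3 = init((D_H + D_IN + 2 * D_CTX, D_H)), np.zeros(D_H)  # assumed wiring
W4, b4 = init((D_H, D_H)), np.zeros(D_H)
W5, b5 = init((D_H, D_OUT)), np.zeros(D_OUT)

def generator(hab, left, right):
    x = np.concatenate([hab, left, right])      # Concat
    h = leaky_relu(x @ W1 + b1)                 # Dense 512
    h = leaky_relu(h @ W2 + b2)                 # Dense 512
    h = np.concatenate([h, x])                  # Concat (assumed: hidden + input)
    h = leaky_relu(h @ W3 + b3)                 # Dense 512
    h = leaky_relu(h @ W4 + b4)                 # Dense 512
    delta = h @ W5 + b5                         # Linear, 12-dim
    return hab + delta                          # Add: G learns the HAB-to-CLR difference
```

The additive skip connection means the network only has to model the residual between HAB and CLR features, consistent with the slide's note that G learns their differences.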
10. 10/26
Proposed Discriminator
The discriminator has 2 hidden layers of 256 nodes and an output layer of one node with a sigmoid function.
In addition to the adversarial loss, we use a mean-absolute-difference loss between G(z) and the aligned real data x.
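The combined generator objective can be sketched as follows; the relative weight `lam` between the two terms is our assumption, since the slide does not give its value:

```python
import numpy as np

def generator_loss(d_fake, g_out, target, lam=1.0, eps=1e-8):
    """Adversarial loss plus mean-absolute-difference (L1) loss between the
    generated features G(z) and the DTW-aligned real CLR features x."""
    adv = -np.mean(np.log(np.asarray(d_fake, dtype=float) + eps))
    l1 = np.mean(np.abs(np.asarray(g_out) - np.asarray(target)))
    return adv + lam * l1
```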
11. 11/26
Tips and Tricks to Train cGANs
- a leaky ReLU activation function with a negative slope of 0.2 for both G and D
- a dropout layer following each hidden layer of D, with a dropout rate of 0.5
- the Adam optimizer:
  - learning rate 0.0001, momentum β1 = 0.5, and learning rate decay 0.00001 for D
  - learning rate 0.0002, momentum β1 = 0.5, and learning rate decay 0.00001 for G
- weights initialized from a zero-centered Normal distribution with standard deviation 0.02
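Two of these tricks can be sketched directly; the inverted-dropout scaling below is our own choice (the slide only states the rate), and the Adam settings are collected into plain dictionaries for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, rate=0.5, training=True):
    """Inverted dropout: zero a fraction `rate` of activations and rescale
    the survivors by 1/(1 - rate), so no change is needed at inference."""
    if not training:
        return x
    keep = rng.random(x.shape) >= rate
    return np.where(keep, x / (1.0 - rate), 0.0)

# Adam settings from the slide (momentum is beta_1; decay is the
# per-update learning-rate decay)
ADAM_D = dict(lr=1e-4, beta1=0.5, decay=1e-5)  # discriminator
ADAM_G = dict(lr=2e-4, beta1=0.5, decay=1e-5)  # generator
```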
12. 12/26
Method
Objective Evaluation
Subjective Evaluation

Experiment: One-to-one mapping
Train a speaker-dependent HAB-to-CLR mapping:
- requires parallel data of HAB and CLR speech
Database: a 78-speaker database consisting of:
- control speakers (CS, N = 32)
- speakers with multiple sclerosis (MS, N = 30)
- speakers with Parkinson's disease (PD, N = 16)
Each speaker read 25 Harvard sentences in 2 speaking styles (HAB, CLR).
We selected the three speakers (PDM6, CSM7, PDF7) that showed the most benefit from the CLR spectrum.
13. 13/26
Method
Figure: cGAN-based mapping from HAB VAE-12, through the style mapping, to CLR VAE-12.
We aligned each HAB utterance to its parallel CLR utterance of the same speaker using DTW on 32nd-order log filter-bank features.
We then pre-trained the generator that maps HAB VAE-12 to CLR VAE-12 to minimize a mean-squared-error loss function.
Finally, we trained our proposed cGAN structure.
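The DTW alignment step can be sketched as the standard accumulated-cost recursion; this minimal NumPy version (Euclidean frame cost, the three usual step directions) is an illustration of the technique, not the paper's exact configuration:

```python
import numpy as np

def dtw(a, b):
    """Dynamic time warping between two feature sequences a (n, d) and
    b (m, d), e.g. 32nd-order log filter-bank frames of a HAB and a CLR
    utterance. Returns the accumulated-cost matrix; the frame-to-frame
    alignment path is recovered by backtracking from the final cell."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])  # Euclidean frame cost
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[1:, 1:]
```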
14. 14/26
Objective Evaluation: Log Spectral Distortion

mapping | PD_F7 | PD_M6 | C_M7
DNN     | 16.80 | 16.67 | 16.44
GAN     | 12.85 | 12.58 | 12.67

Table: Average LSD (in dB)
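For reference, a common definition of log-spectral distortion can be sketched as below; the paper's exact variant (frame weighting, spectral representation) may differ:

```python
import numpy as np

def log_spectral_distortion(S_ref, S_map):
    """Average log-spectral distortion in dB between reference and mapped
    magnitude spectra of shape (frames, bins): RMS of the per-bin dB
    difference within each frame, averaged over frames."""
    d = 20.0 * np.log10(S_ref) - 20.0 * np.log10(S_map)
    return float(np.mean(np.sqrt(np.mean(d ** 2, axis=1))))
```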
15. 15/26
Objective Evaluation: LSD
Figure: LSD (in dB) of the 25 test sentences for the 3 speakers (PD_F7, PD_M6, C_M7); GAN vs. DNN.
16. 16/26
Objective Evaluation: Variance ratio
Figure: Variance ratio σ²_CLR / σ²_MAP between CLR VAE-12 (CLR) and mapped VAE-12 (MAP) for each of the 12 VAE-12 components, for the 3 speakers (PD_F7, PD_M6, C_M7); GAN vs. DNN. Smaller is better.
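The variance ratio plotted above can be sketched as a one-liner; a ratio well above 1 means the mapped features are variance-deficient, i.e. over-smoothed:

```python
import numpy as np

def variance_ratio(clr, mapped):
    """Per-component ratio sigma^2_CLR / sigma^2_MAP between the CLR
    VAE-12 features and the mapped VAE-12 features, each of shape
    (frames, 12)."""
    return np.var(clr, axis=0) / np.var(mapped, axis=0)
```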
17. 17/26
Objective Evaluation: Example
Figure: Sentence: "Four hours of steady work faced us."
18. 18/26
Subjective Evaluation
- Loudness differences were minimized using an RMS measure.
- Stimuli were mixed with babble noise at 0 dB SNR.
- The test consisted of 25 sentences × 3 speakers × 5 conditions (2 purely vocoded, 1 hybrid, 2 mappings) = 375 unique trials.
- 60 participants on AMT each listened to 25 sentences and then transcribed them.
- We manually counted the accurately recalled keywords of each sentence.
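The noise-mixing step can be sketched with an RMS-based scaling; this is a generic illustration of mixing at a target SNR (here 0 dB), with the function name and RMS definition being our own:

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db=0.0):
    """Scale `noise` so the speech-to-noise ratio equals `snr_db` dB
    (RMS-based), then add it to the speech. At 0 dB SNR the scaled
    noise has the same RMS as the speech."""
    rms_s = np.sqrt(np.mean(speech ** 2))
    rms_n = np.sqrt(np.mean(noise ** 2))
    target_rms_n = rms_s / (10.0 ** (snr_db / 20.0))
    return speech + noise * (target_rms_n / rms_n)
```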
26. 26/26
Conclusion
We applied cGANs to HAB-to-CLR style conversion:
1. In speaker-dependent one-to-one mapping, cGANs outperformed the DNN in terms of keyword recall accuracy, improving intelligibility for two of three speakers.
2. In speaker-independent many-to-one mapping, cGANs improved speech intelligibility for one of three speakers.
3. In speaker-independent many-to-many mapping, cGANs improved keyword recall accuracy for two speakers, but the results were not significant.
The modest results of speaker-independent style conversion are due to the small dataset and the fact that we did not attempt to transform additional acoustic features, such as phoneme durations.