This document discusses speech coding and the GSM codec. It begins by explaining speech creation in humans and key features like pitch, formants, and the vocal tract filter. It then outlines three main types of speech codecs: waveform codecs which encode the raw waveform; source codecs which model the voice source; and hybrid codecs which combine both approaches. The document focuses on the GSM codec as an example hybrid codec, describing its full-rate and half-rate variations. It aims to provide background on speech coding techniques used in mobile networks like GSM.
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel... (IJERD Editor)
This document summarizes a study on the performance of turbo coded orthogonal frequency division multiplexing (OFDM) over fading channels. It describes how OFDM can mitigate inter-symbol interference caused by frequency selective fading channels by dividing the channel into parallel subchannels. It then provides details on turbo coding, including the encoder and iterative decoder design. The system model studied transmits a turbo coded OFDM signal over a frequency selective Rayleigh fading channel and evaluates the performance for rate 1/3 and 1/2 turbo codes. Simulation results are presented to analyze the bit error rate.
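The parallel-subchannel idea behind OFDM can be shown in a few lines. The sketch below is a toy illustration only (not the paper's turbo-coded system): it maps QPSK symbols onto subcarriers with an IFFT, adds a cyclic prefix, and inverts the steps over an ideal channel; `n_subcarriers` and `cp_len` are assumed example values.

```python
import numpy as np

# Toy OFDM link over an ideal channel: QPSK symbols -> IFFT -> cyclic
# prefix -> (channel) -> strip prefix -> FFT. Over a frequency-selective
# channel the FFT turns convolution into per-subcarrier multiplication,
# which is what mitigates inter-symbol interference.
rng = np.random.default_rng(0)
n_subcarriers = 64   # assumed value
cp_len = 16          # assumed cyclic-prefix length

bits = rng.integers(0, 2, 2 * n_subcarriers)
# QPSK mapping: bit pairs -> {±1 ± 1j}/sqrt(2)
symbols = ((1 - 2 * bits[0::2]) + 1j * (1 - 2 * bits[1::2])) / np.sqrt(2)

tx_time = np.fft.ifft(symbols)                           # parallel subchannels
tx_frame = np.concatenate([tx_time[-cp_len:], tx_time])  # cyclic prefix

rx_time = tx_frame[cp_len:]        # receiver strips the prefix
rx_symbols = np.fft.fft(rx_time)   # back to per-subcarrier symbols

assert np.allclose(rx_symbols, symbols)
```

Over a real fading channel, a one-tap equalizer per subcarrier would sit between the FFT and the symbol decisions.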
1) IP telephony uses RTP/UDP/IP to transport voice over IP networks, tolerating packet loss and variable delay while still allowing the receiver to play voice packets back in order.
2) RTP provides mechanisms for delivering audio and video over IP networks by encapsulating media streams in UDP packets and providing sequence numbering and time-stamping to enable synchronization and loss detection.
3) Key challenges for delivering voice over IP include delay, jitter, packet loss, echo, and the need for voice activity detection and compression to efficiently use available bandwidth.
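The sequence numbering and time-stamping in point 2 live in the fixed 12-byte RTP header defined by RFC 3550. A minimal sketch of packing that header (field layout per the RFC; the helper name is ours):

```python
import struct

def rtp_header(seq, timestamp, ssrc, payload_type=0, marker=0):
    """Pack the fixed 12-byte RTP header (RFC 3550).

    Assumes version=2, no padding, no extension, zero CSRC entries."""
    byte0 = (2 << 6)                      # V=2, P=0, X=0, CC=0
    byte1 = (marker << 7) | payload_type  # M bit + 7-bit payload type
    return struct.pack("!BBHII", byte0, byte1,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)

# Payload type 0 is PCMU (G.711 mu-law); timestamp advances by 160
# samples per 20 ms packet at 8 kHz.
hdr = rtp_header(seq=1, timestamp=160, ssrc=0x1234ABCD)
assert len(hdr) == 12 and hdr[0] == 0x80
```

The receiver uses the sequence number to detect loss and reordering, and the timestamp to schedule playout despite jitter.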
The document compares the performance of single stage and double stage interleavers in communication systems using turbo codes. A single stage interleaver uses one random interleaver between two convolutional encoders, while a double stage interleaver uses two interleavers in series. The document suggests that a double stage interleaver can improve the bit error rate (BER) of the system compared to a single stage interleaver by further scrambling the input bits. It also provides details on the components of a turbo code system such as convolutional encoders, interleavers, puncturing, and iterative decoding.
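A minimal sketch of the interleaving idea: a block interleaver writes bits row-wise and reads them column-wise, and the "double stage" variant is simply two such permutations chained. The sizes below are arbitrary illustrations, not the paper's design.

```python
def block_interleave(bits, rows, cols):
    """Write row-by-row, read column-by-column (simple block interleaver)."""
    assert len(bits) == rows * cols
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]

data = list(range(12))
once = block_interleave(data, 3, 4)
# A double-stage interleaver chains two permutations for deeper scrambling:
twice = block_interleave(once, 4, 3)

assert once[:3] == [0, 4, 8]       # adjacent inputs pushed far apart
assert sorted(twice) == data       # a permutation only, no bits lost
```

Turbo-code interleavers are usually pseudo-random rather than rectangular, but the purpose is the same: make the two constituent encoders see uncorrelated bit orderings.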
This document analyzes speech coding algorithms for Hindi and English languages. It discusses Linear Predictive Coding (LPC), an algorithm that accurately estimates speech parameters and represents speech signals at reduced bit rates while preserving quality. The paper proposes a voice-excited LPC algorithm and implements it on Hindi and English male and female voices. It analyzes tradeoffs between bit rates, delay, signal-to-noise ratio, and complexity. The results show low bit-rates and better signal-to-noise ratio with this algorithm.
This lecture discusses the Public Switched Telephone Network (PSTN) and its evolution since 1876. The PSTN uses both analog and digital signaling to transmit voice signals between users. Analog signaling is natural for human interaction but less robust than digital. Digital signaling converts the analog voice signal into a digital bitstream using Pulse Code Modulation (PCM) sampling. The PSTN infrastructure includes local loops connecting users to central offices, trunks connecting central offices, and various signaling protocols for user-network and network-network communication, such as Dual Tone Multi-Frequency (DTMF) and Integrated Services Digital Network (ISDN) signaling. Numbering plans organize addressing for routing calls both nationally, as in the North American Numbering Plan, and internationally.
The public switched telephone network (PSTN), also known as plain old telephone service (POTS), is the network of the world's public circuit-switched telephone networks, originally analog but now digital in its core, including both fixed and mobile phones. Technical standards created by the ITU-T allow different national networks to interconnect seamlessly through a single global numbering system based on E.163 and E.164, making it possible to connect any phone to any other phone worldwide.
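The E.164 numbering format itself is simple to state: a country code plus subscriber number totalling at most 15 digits, with no leading zero. A common simplified check of that constraint (a sketch, not a full validation of any national plan):

```python
import re

# E.164 international format: "+" then 1-15 digits, first digit nonzero
# (country codes never begin with 0). National dialing rules are ignored.
E164 = re.compile(r"^\+[1-9]\d{1,14}$")

assert E164.match("+14155552671")           # NANP-style number
assert not E164.match("+0123456789")        # leading zero invalid
assert not E164.match("+1234567890123456")  # 16 digits, too long
```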
This lecture discusses the Public Switched Telephone Network (PSTN) and its evolution since 1876. Key points covered include:
- The basic components of a PSTN including local loops, trunks, switches, and signaling.
- How analog voice signals are converted to digital signals through pulse code modulation sampling to transmit over the network.
- The main signaling methods used in PSTN including in-band dual tone multi-frequency (DTMF) signaling and out-of-band ISDN signaling.
- How the network hierarchy is organized using various transmission mediums like copper, fiber, and microwave links operating at different data rates.
- International and national numbering plans used to route calls.
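The PCM conversion in the bullets above can be sketched with the textbook mu-law companding curve (mu = 255, as in North American 64 kbps PCM). Note that deployed G.711 uses a segmented piecewise-linear approximation of this curve rather than the continuous formula shown here.

```python
import numpy as np

def mulaw_compress(x, mu=255.0):
    """Continuous mu-law companding curve for samples in [-1, 1]."""
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

# Sample at 8 kHz, compand, then quantize to 8 bits -> 64 kbps per channel.
x = np.linspace(-1, 1, 9)               # stand-in for analog sample values
codes = np.round((mulaw_compress(x) + 1) / 2 * 255).astype(np.uint8)

assert codes[0] == 0 and codes[-1] == 255
assert all(codes[i] <= codes[i + 1] for i in range(len(codes) - 1))
```

Companding spends more quantizer codes on quiet samples, which is why 8 bits suffice for telephone-quality speech.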
In this paper we present the implementation of a speaker identification system using an artificial neural network with digital signal processing. The system is designed for text-dependent speaker identification of Bangla speech. The utterances of speakers are recorded for specific Bangla words using an audio wave recorder. The speech features are extracted by digital signal processing, and speakers are identified from frequency-domain data using the backpropagation algorithm. The Hamming and Blackman-Harris windows are compared to investigate which gives better speaker identification performance. Endpoint detection of speech is included to achieve high system accuracy.
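The two analysis windows compared above differ mainly in their taper: Blackman-Harris trades a wider main lobe for far lower spectral sidelobes than Hamming. A sketch (numpy ships the Hamming window; the 4-term Blackman-Harris coefficients below are the standard published ones):

```python
import numpy as np

def blackman_harris(n):
    """4-term Blackman-Harris window (standard minimum-sidelobe coefficients)."""
    a = (0.35875, 0.48829, 0.14128, 0.01168)
    k = np.arange(n)
    return (a[0]
            - a[1] * np.cos(2 * np.pi * k / (n - 1))
            + a[2] * np.cos(4 * np.pi * k / (n - 1))
            - a[3] * np.cos(6 * np.pi * k / (n - 1)))

n = 256
w_ham = np.hamming(n)
w_bh = blackman_harris(n)

# Blackman-Harris tapers much harder at the frame edges than Hamming,
# giving lower sidelobe leakage at the cost of a wider main lobe.
assert w_bh[0] < w_ham[0] < 0.1
```

For FFT-based speaker features, lower leakage keeps strong harmonics from masking weaker ones, which is the motivation for comparing the two.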
Utterance Based Speaker Identification Using ANN (IJCSEA Journal)
This document summarizes a research paper on speaker identification using artificial neural networks. The paper presents a speaker identification system that uses digital signal processing and ANN techniques. Speech features are extracted from utterances using FFT and windowing. These features are used to train a multi-layer perceptron network to classify speakers. The system was tested on Bangla speech and achieved accurate identification of speakers from their utterances.
Utterance Based Speaker Identification Using ANN (IJCSEA Journal)
In this paper we present the implementation of speaker identification system using artificial neural network with digital signal processing. The system is designed to work with the text-dependent speaker identification for Bangla Speech. The utterances of speakers are recorded for specific Bangla words using an audio wave recorder. The speech features are acquired by the digital signal processing technique. The identification of speaker using frequency domain data is performed using back propagation algorithm. Hamming window and Blackman-Harris window are used to investigate better speaker identification performance. Endpoint detection of speech is developed in order to achieve high accuracy of the system.
Analysis of Suitable Extraction Methods and Classifiers For Speaker Identific... (IRJET Journal)
This document discusses a speaker identification system that uses Mel Frequency Cepstral Coefficients (MFCC) for feature extraction and a Support Vector Machine (SVM) classifier. MFCC is used to extract statistical features from voice samples that capture unique characteristics. An SVM classifier then classifies voices as original or disguised based on the extracted features. The system was tested on 176 voice samples from various sources with 30 for training and 72 for testing. MFCC extracted mean and correlation coefficients as features. SVM classification accurately identified original vs disguised voices 90% of the time, demonstrating the effectiveness of the proposed system.
This document discusses a fast algorithm for noisy speaker recognition using artificial neural networks. It summarizes the following:
- The algorithm first records voice patterns from speakers via a noisy channel and applies noise removal techniques.
- It then extracts features using Mel Frequency Cepstral Coefficients (MFCC) and reduces features using Principal Component Analysis.
- The reduced feature vectors are classified using an artificial neural network classifier.
- Experimental results showed the proposed algorithm achieved about 99% accuracy on average, which is higher than other methods, and was also faster.
Speech coding is used to efficiently transmit speech through digital channels by retaining only the information useful to listeners. The LPC-10 standard uses linear predictive coding with 10 coefficients to analyze and synthesize speech. During analysis, it extracts parameters like voicing and pitch from the speech signal. The synthesis process uses these parameters to generate noise or periodic excitation, apply an LPC filter, and control gain. LPC-10 transmits speech at 2.4 kbps by coding the 10 LPC coefficients, pitch, voicing, and energy into 54 bits per frame. It produces intelligible but unnatural-sounding speech and is used for secure voice transmission.
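The analysis step, fitting predictor coefficients to a speech frame, is classically done with the autocorrelation method and the Levinson-Durbin recursion. A sketch with order 10 as in LPC-10; the synthetic AR(1) check at the end is our illustration, not part of the standard.

```python
import numpy as np

def lpc(frame, order=10):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.

    Returns predictor polynomial a (a[0] = 1) and final error energy."""
    n = len(frame)
    r = np.array([frame[:n - k] @ frame[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + a[1:i] @ r[1:i][::-1]
        k = -acc / err                      # reflection coefficient
        a[1:i] = a[1:i] + k * a[1:i][::-1]  # update lower coefficients
        a[i] = k
        err *= 1.0 - k * k
    return a, err

# Synthetic check: an AR(1) process x[n] = 0.9 x[n-1] + e[n]
# should yield a[1] close to -0.9.
rng = np.random.default_rng(1)
e = rng.standard_normal(5000)
x = np.zeros(5000)
for n in range(1, 5000):
    x[n] = 0.9 * x[n - 1] + e[n]
a, err = lpc(x, order=2)
assert abs(a[1] + 0.9) < 0.05
```

In LPC-10 these coefficients (as reflection coefficients), plus pitch, voicing, and gain, are what get quantized into the 54-bit frame.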
This document describes a Simulink model for simulating Bluetooth voice transmission. The model includes blocks for speech coding, framing, error checking, modulation, frequency hopping, and a channel model with interference. Tests are performed by running simulations with varying parameters and analyzing output metrics like bit error rate and frame error rate. The model allows testing the effects of noise, interference and other factors on Bluetooth transmission performance.
International Journal of Engineering & Technical Research (Engineering Research Publication), ISSN: 2321-0869 (O), 2454-4698 (P), www.erpublication.org
This document provides a final report on the implementation of a Dual-Tone Multiple Frequency (DTMF) detector algorithm on a TMS320C50 digital signal processor. The author models the DTMF detection algorithm in Ptolemy's C50 domain and generates code, but encounters problems with the large code size and incompatibility with the target hardware. To address these issues, the author develops a new SimC50 target that generates optimized code for the Texas Instruments TMS320C50 Evaluation Module and associated simulator instead of the Starter Kit. The new target is able to generate and compile code, although linking the code to run on the simulator was still in progress at the time of the report.
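DTMF detectors on small DSPs such as the TMS320C50 conventionally use the Goertzel algorithm, which computes signal power at a single frequency far more cheaply than a full FFT. A sketch using the classic 205-sample block at 8 kHz:

```python
import math

def goertzel_power(samples, freq, fs):
    """Squared magnitude of the DFT at one frequency (Goertzel recursion)."""
    w = 2.0 * math.pi * freq / fs
    coeff = 2.0 * math.cos(w)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

fs = 8000
n = 205  # classic DTMF block size at 8 kHz

# Digit "1" is the sum of the 697 Hz row tone and 1209 Hz column tone.
tone = [math.sin(2 * math.pi * 697 * k / fs)
        + math.sin(2 * math.pi * 1209 * k / fs) for k in range(n)]

rows = [697, 770, 852, 941]
cols = [1209, 1336, 1477, 1633]
row = max(rows, key=lambda f: goertzel_power(tone, f, fs))
col = max(cols, key=lambda f: goertzel_power(tone, f, fs))
assert (row, col) == (697, 1209)
```

Only eight Goertzel filters (one per row/column frequency) are needed per block, which is why the algorithm fits comfortably in fixed-point DSP code.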
The document provides an overview of Bluetooth technology, including its history, working, specifications, advantages, disadvantages, and applications. It describes how Bluetooth was developed in the 1990s as a wireless alternative to cables that connects electronic devices like phones, laptops, and printers. The technology uses frequency-hopping spread spectrum in the 2.4 GHz band to enable cable-free connectivity within short ranges. It has become a global standard and is widely used in mobile phones and wireless headphones.
The document discusses speech codecs and wireless local loop (WLL) systems. It explains that speech codecs aim to reduce bit rates while maintaining quality in order to accommodate more speech channels within the same transmission bandwidth. This is possible due to redundancies in human speech and the perceptual limits of the human ear. Lower bit-rate codecs are more complex and introduce more delay, but they use transmission bandwidth more economically. Codec performance is evaluated on metrics such as bit rate, quality, and complexity. WLL systems use cells and sectors to serve subscribers via base stations within a given bandwidth. Codecs play a key role in improving the capacity of these systems.
The document discusses a summer training program at PLC Institute of Electronics in Rohini, Delhi related to embedded systems and robotics. It includes an acknowledgement, preface, and sections on the technology used, electronics parts, mechanical parts, software parts, project snapshots, and conclusion. The training provided the student with an opportunity to learn about embedded systems and gain experience working in an industrial setting as part of a team.
This document discusses Voice over Internet Protocol (VoIP). It begins by introducing VoIP and how it allows phone calls and faxes to be sent over IP-based data networks. It then discusses how VoIP works by digitizing voice, compressing it into packets, transmitting the packets over the internet, and reconstructing the voice signal at the receiving end. The document also covers some key components of a VoIP system such as encoders, decoders, and quality of service mechanisms. Finally, it briefly mentions that most VoIP implementations follow the ITU H.323 standard.
This document summarizes how a fixed landline call is set up and routed through the telecommunications network. The call first goes to the nearest PSTN switching center. If the caller and receiver are served by the same base switching center (BSC), the call is completed there; otherwise it is transferred to the main switching center (MSC) and then on to the receiver's BSC. For mobile calls, the transfer is handled by the mobile telephone switching office (MTSO) and base transceiver stations (BTSs). The telephone exchange checks whether dialed numbers are valid and returns a busy tone if the line is engaged. Calls between telephones in different areas are connected via remote exchanges.
The document provides an overview of techniques for secure speech communication, including speech coding, speaker identification, and encryption/decryption. It discusses various speech coding techniques like waveform coding, parametric coding, and hybrid coding that can compress speech signals while maintaining quality. It also describes speaker identification methods using hashing to authenticate users. For encryption, it outlines symmetric techniques like AES that use a shared key, and asymmetric techniques like RSA that use public/private key pairs. The goal is to integrate these methods to provide a high level of security for speech communication by removing redundancy, authenticating speakers, and strongly encrypting signals.
The document provides an overview of Signaling System 7 (SS7), which is an international standard for exchanging call setup, routing, and control information between telecommunications network elements in the public switched telephone network (PSTN). Key points discussed include:
- SS7 enables enhanced services like caller ID, call forwarding, toll-free numbers, and wireless roaming.
- SS7 uses a separate packet-switched network for signaling data rather than in-band signaling over voice channels.
- The SS7 network consists of switching points like service switching points (SSPs), signal transfer points (STPs), and service control points (SCPs).
- SS7 signaling links connect these switching points.
Interest in speech coding and standardization is driven by:
– Worldwide growth in communication networks
– Emergence of new multimedia applications
– Advances in Very Large-Scale Integration (VLSI) devices
• Standardization bodies:
– International Telecommunication Union (ITU)
– European Telecommunications Standards Institute (ETSI)
– International Organization for Standardization (ISO)
– Telecommunications Industry Association (TIA), North America
– R&D Center for Radio Systems (RCR), Japan
The document summarizes research on extracting speaker-specific information from the residual signal obtained from linear predictive coding (LPC) analysis of speech. It discusses how LPC analysis separates speech into a vocal tract system component and a source excitation component. The residual signal contains information about the excitation source, including prosodic and speaker-specific characteristics. The document proposes extracting mel-frequency cepstral coefficients (MFCC) from the residual signal to capture this speaker-specific information. It describes performing LPC analysis at different orders to vary the amount of vocal tract system information remaining in the residual. The researchers believe the residual signal contains robust speaker characteristics that can be used for automatic text-independent speaker tracking when classified with a feed-forward neural network model.
The document discusses speech compression using GSM RPE-LTP encoding. It begins with an introduction to GSM standards and architecture. It then describes how speech is generated and modeled in the GSM 6.10 vocoder using the RPE-LTP algorithm. The algorithm compresses speech by analyzing the signal to determine if it is voiced or unvoiced, encoding the periodicity of voiced sounds, and transmitting the filter parameters. At the receiver, the decoder uses these parameters to reconstruct the speech signal through linear predictive coding, long term prediction synthesis filtering, and residual pulse decoding.
BTCL is Bangladesh's primary telecommunications provider. It operates landline telephone services across urban areas of Bangladesh and provides international calling. While BTCL previously had a monopoly on phone services in Dhaka, several other operators have entered the market since 2004.
BTCL's network uses digital and analog switching technologies. Exchanges like the ZTE and Shanghai Bell switches route calls between local loops, trunks, and switching offices. Services include landlines, wireless networks, leased lines, fax, and prepaid cards. Calls are established through switching cards like the ASLC cards which provide dial tones.
Maintenance of the network involves equipment like cable cabinets, distribution frames, and distribution points. Cables connect these points to subscribers.
This document provides an overview of key concepts for drive testing, including physical verification, instruments used, antenna adjustments, tilt types, cell identifiers, frequency reuse, and Test Equipment Mobile System (TEMS). It discusses topics like antenna beamwidth, receiver level, logfile contents, neighboring cell information, and parameters to check during drive testing like radio quality, interference levels, and timing advance. The document is intended as an introductory guide for learning the basic concepts needed for effective drive testing.
The document summarizes research on multiple description coding for audio and speech to provide redundancy and improve quality over lossy networks. It describes a multiple description speech coder based on the AMR-WB standard that splits bits between two descriptions. It also discusses applying multiple description correlating transforms to perceptual audio coding to achieve better quality than a single description when packets are lost. The research was evaluated using simulations and subjective listening tests.
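A toy version of the multiple-description idea: split samples into two descriptions so that losing one packet degrades quality gracefully rather than silencing the frame. The even/odd split and linear interpolation below are an illustration only, much simpler than the AMR-WB-based coder described.

```python
import numpy as np

# Two descriptions by even/odd sample split. If one description's packet
# is lost, the missing samples are interpolated from the survivor.
fs = 8000
x = np.sin(2 * np.pi * 200 * np.arange(160) / fs)  # one 20 ms frame
d0, d1 = x[0::2], x[1::2]                          # the two descriptions

# Receiver scenario: description d1 lost; rebuild odd samples from d0.
recon = np.empty_like(x)
recon[0::2] = d0
recon[1::2] = (d0 + np.append(d0[1:], d0[-1])) / 2  # linear interpolation

err = np.mean((recon - x) ** 2)
assert err < 1e-3  # degraded but close, instead of a 10 ms gap
```

Real multiple-description coders add controlled redundancy (e.g. correlating transforms) so each description alone decodes to acceptable quality, which is the behavior the listening tests evaluate.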
Utterance Based Speaker Identification Using ANNIJCSEA Journal
This document summarizes a research paper on speaker identification using artificial neural networks. The paper presents a speaker identification system that uses digital signal processing and ANN techniques. Speech features are extracted from utterances using FFT and windowing. These features are used to train a multi-layer perceptron network to classify speakers. The system was tested on Bangla speech and achieved accurate identification of speakers from their utterances.
Utterance Based Speaker Identification Using ANNIJCSEA Journal
In this paper we present the implementation of speaker identification system using artificial neural network with digital signal processing. The system is designed to work with the text-dependent speaker identification for Bangla Speech. The utterances of speakers are recorded for specific Bangla words using an audio wave recorder. The speech features are acquired by the digital signal processing technique. The identification of speaker using frequency domain data is performed using back propagation algorithm. Hamming window and Blackman-Harris window are used to investigate better speaker identification performance. Endpoint detection of speech is developed in order to achieve high accuracy of the system.
Analysis of Suitable Extraction Methods and Classifiers For Speaker Identific...IRJET Journal
This document discusses a speaker identification system that uses Mel Frequency Cepstral Coefficients (MFCC) for feature extraction and a Support Vector Machine (SVM) classifier. MFCC is used to extract statistical features from voice samples that capture unique characteristics. An SVM classifier then classifies voices as original or disguised based on the extracted features. The system was tested on 176 voice samples from various sources with 30 for training and 72 for testing. MFCC extracted mean and correlation coefficients as features. SVM classification accurately identified original vs disguised voices 90% of the time, demonstrating the effectiveness of the proposed system.
This document discusses a fast algorithm for noisy speaker recognition using artificial neural networks. It summarizes the following:
- The algorithm first records voice patterns from speakers via a noisy channel and applies noise removal techniques.
- It then extracts features using Mel Frequency Cepstral Coefficients (MFCC) and reduces features using Principal Component Analysis.
- The reduced feature vectors are classified using an artificial neural network classifier.
- Experimental results showed the proposed algorithm achieved about 99% accuracy on average, which is higher than other methods, and was also faster.
Speech coding is used to efficiently transmit speech through digital channels by retaining only the information useful to listeners. The LPC-10 standard uses linear predictive coding with 10 coefficients to analyze and synthesize speech. During analysis, it extracts parameters like voicing and pitch from the speech signal. The synthesis process uses these parameters to generate noise or periodic excitation, apply an LPC filter, and control gain. LPC-10 transmits speech at 2.4kbps by coding the 10 LPC coefficients, pitch, voicing, and energy into 54 bits per frame. It enables understandable but unnatural sounding speech and is used for secure voice transmissions.
This document describes a Simulink model for simulating Bluetooth voice transmission. The model includes blocks for speech coding, framing, error checking, modulation, frequency hopping, and a channel model with interference. Tests are performed by running simulations with varying parameters and analyzing output metrics like bit error rate and frame error rate. The model allows testing the effects of noise, interference and other factors on Bluetooth transmission performance.
Engineering Research Publication
Best International Journals, High Impact Journals,
International Journal of Engineering & Technical Research
ISSN : 2321-0869 (O) 2454-4698 (P)
www.erpublication.org
This document provides a final report on the implementation of a Dual-Tone Multiple Frequency (DTMF) detector algorithm on a TMS320C50 digital signal processor. The author models the DTMF detection algorithm in Ptolemy's C50 domain and generates code, but encounters problems with the large code size and incompatibility with the target hardware. To address these issues, the author develops a new SimC50 target that generates optimized code for the Texas Instruments TMS320C50 Evaluation Module and associated simulator instead of the Starter Kit. The new target is able to generate and compile code, although linking the code to run on the simulator was still in progress at the time of the report.
The document provides an overview of Bluetooth technology, including its history, working, specifications, advantages, disadvantages, and applications. It describes how Bluetooth was developed in the 1990s as a wireless alternative to cables that connects electronic devices like phones, laptops, and printers. The technology uses frequency-hopping spread spectrum in the 2.4GHz band to enable cable-free connectivity within short ranges. It has become a global standard and is widely used in mobile phones and wireless headphones.
The document discusses speech codecs and wireless local loop (WLL) systems. It explains that speech codecs aim to reduce bit rates while maintaining quality in order to accommodate more speech channels within the same transmission bandwidth. This is possible due to redundancies in human speech and the perceptual limits of the human ear. Lower bit rate codecs are more complex, introduce more delay but are also cheaper. Codec performance is evaluated based on metrics like bit rate, quality and complexity. WLL systems use cells and sectors to serve subscribers via base stations within a given bandwidth. Codecs play a key role in improving the capacity of these systems.
The document discusses a summer training program at PLC Institute of Electronics in Rohini, Delhi related to embedded systems and robotics. It includes an acknowledgement, preface, and sections on the technology used, electronics parts, mechanical parts, software parts, project snapshots, and conclusion. The training provided the student with an opportunity to learn about embedded systems and gain experience working in an industrial setting as part of a team.
This document discusses Voice over Internet Protocol (VoIP). It begins by introducing VoIP and how it allows phone calls and faxes to be sent over IP-based data networks. It then discusses how VoIP works by digitizing voice, compressing it into packets, transmitting the packets over the internet, and reconstructing the voice signal at the receiving end. The document also covers some key components of a VoIP system such as encoders, decoders, and quality of service mechanisms. Finally, it briefly mentions that most VoIP implementations follow the ITU H.323 standard.
This document summarizes how a fixed landline call is setup and routed through the telecommunications network. It first goes to the nearest switching center (PSTN). If the caller and receiver are in the same base switching center (BSC), the call is completed. Otherwise, it is transferred to the main switching center (MSC) and then to the receiver's prior BSC. For mobile calls, the call transfer is handled by the mobile telephone switching office (MTSO) and base transceiver stations (BTSs). The telephone exchange checks if numbers are valid and sends busy tones if lines are engaged. Calls are connected between telephones via remote exchanges.
The document provides an overview of techniques for secure speech communication, including speech coding, speaker identification, and encryption/decryption. It discusses various speech coding techniques like waveform coding, parametric coding, and hybrid coding that can compress speech signals while maintaining quality. It also describes speaker identification methods using hashing to authenticate users. For encryption, it outlines symmetric techniques like AES that use a shared key, and asymmetric techniques like RSA that use public/private key pairs. The goal is to integrate these methods to provide a high level of security for speech communication by removing redundancy, authenticating speakers, and strongly encrypting signals.
The document provides an overview of Signaling System 7 (SS7), which is an international standard for exchanging call setup, routing, and control information between telecommunications network elements in the public switched telephone network (PSTN). Key points discussed include:
- SS7 enables enhanced services like caller ID, call forwarding, toll-free numbers, and wireless roaming.
- SS7 uses a separate packet-switched network for signaling data rather than in-band signaling over voice channels.
- The SS7 network consists of switching points like service switching points (SSPs), signal transfer points (STPs), and service control points (SCPs).
- SS7 signaling links connect these switching points.
• Interest in speech coding and standardization, driven by:
– Worldwide growth in communication networks
– Emergence of new multimedia applications
– Advances in Very Large-Scale Integration (VLSI) devices
• Standardization bodies:
– International Telecommunication Union (ITU)
– European Telecommunications Standards Institute (ETSI)
– International Organization for Standardization (ISO)
– Telecommunications Industry Association (TIA), North America
– R&D Center for Radio Systems (RCR), Japan
The document summarizes research on extracting speaker-specific information from the residual signal obtained from linear predictive coding (LPC) analysis of speech. It discusses how LPC analysis separates speech into a vocal tract system component and a source excitation component. The residual signal contains information about the excitation source, including prosodic and speaker-specific characteristics. The document proposes extracting mel-frequency cepstral coefficients (MFCC) from the residual signal to capture this speaker-specific information. It describes performing LPC analysis at different orders to vary the amount of vocal tract system information remaining in the residual. The researchers believe the residual signal contains robust speaker characteristics that can be used for automatic text-independent speaker tracking when classified with a feed-forward neural network model.
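The LPC decomposition the researchers build on can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's implementation: coefficients are estimated with the autocorrelation method and the Levinson-Durbin recursion, and the residual is whatever remains after short-term prediction. The function names and the synthetic test tone are my own illustrative choices.

```python
import math

def lpc_coeffs(signal, order):
    """LPC coefficients via the autocorrelation method + Levinson-Durbin."""
    n = len(signal)
    r = [sum(signal[i] * signal[i + k] for i in range(n - k))
         for k in range(order + 1)]
    a = [0.0] * (order + 1)        # a[0] is implicitly 1
    err = r[0]
    for m in range(1, order + 1):
        k = (r[m] - sum(a[j] * r[m - j] for j in range(1, m))) / err
        new_a = a[:]
        new_a[m] = k
        for j in range(1, m):
            new_a[j] = a[j] - k * a[m - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:]                   # predictor taps a_1..a_p

def lpc_residual(signal, taps):
    """e[n] = s[n] - sum_k a_k s[n-k]: the excitation left after prediction."""
    out = []
    for i in range(len(signal)):
        pred = sum(t * signal[i - 1 - j]
                   for j, t in enumerate(taps) if i - 1 - j >= 0)
        out.append(signal[i] - pred)
    return out

# Synthetic "voiced" tone: a decaying sinusoid is essentially a 2nd-order
# AR process, so a low-order predictor removes nearly all of its energy,
# leaving a small residual that carries the excitation information.
s = [math.sin(2 * math.pi * 0.1 * n) * (0.99 ** n) for n in range(200)]
taps = lpc_coeffs(s, order=2)
e = lpc_residual(s, taps)
print(sum(x * x for x in e) < 0.05 * sum(x * x for x in s))  # True
```

Raising `order` transfers more vocal-tract detail into the predictor and out of the residual, which is exactly the knob the paper varies.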
The document discusses speech compression using GSM RPE-LTP encoding. It begins with an introduction to GSM standards and architecture. It then describes how speech is generated and modeled in the GSM 6.10 vocoder using the RPE-LTP algorithm. The algorithm compresses speech by analyzing the signal to determine if it is voiced or unvoiced, encoding the periodicity of voiced sounds, and transmitting the filter parameters. At the receiver, the decoder uses these parameters to reconstruct the speech signal through linear predictive coding, long term prediction synthesis filtering, and residual pulse decoding.
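The long-term-prediction (LTP) step in RPE-LTP can be illustrated with a small lag search. This is a hedged sketch of the idea only, not GSM 06.10 itself (the standard specifies fixed-point arithmetic, 40-sample subframes, and lags of 40-120 samples); the function, signal, and 57-sample pitch period here are illustrative stand-ins.

```python
import math

def ltp_lag_search(history, subframe, lag_min=40, lag_max=120):
    """Find the lag and gain that best predict `subframe` from past samples.

    Each candidate lag is scored by its normalized cross-correlation with
    the current subframe; the winning lag encodes the pitch periodicity.
    """
    n = len(subframe)
    best_lag, best_gain, best_score = lag_min, 0.0, -float("inf")
    for lag in range(lag_min, lag_max + 1):
        start = len(history) - lag
        past = history[start:start + n]            # delayed copy of the signal
        corr = sum(p * s for p, s in zip(past, subframe))
        energy = sum(p * p for p in past) or 1e-12
        score = corr * corr / energy               # prediction gain measure
        if score > best_score:
            best_lag, best_gain, best_score = lag, corr / energy, score
    return best_lag, best_gain

# Periodic "voiced" signal with an exact pitch period of 57 samples
period = 57
sig = [math.sin(2 * math.pi * n / period) for n in range(400)]
history, subframe = sig[:360], sig[360:400]
lag, gain = ltp_lag_search(history, subframe)
print(lag, round(gain, 3))  # 57 1.0: the pitch period is recovered
```

The encoder transmits only the lag and gain; the residual after this predictor is what the regular-pulse-excitation stage then quantizes.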
BTCL is Bangladesh's primary telecommunications provider. It operates landline telephone services across urban areas of Bangladesh and provides international calling. While BTCL previously had a monopoly on phone services in Dhaka, several other operators have entered the market since 2004.
BTCL's network uses digital and analog switching technologies. Exchanges like the ZTE and Shanghai Bell switches route calls between local loops, trunks, and switching offices. Services include landlines, wireless networks, leased lines, fax, and prepaid cards. Calls are established through switching cards like the ASLC cards which provide dial tones.
Maintenance of the network involves equipment like cable cabinets, distribution frames, and distribution points. Cables connect these components.
This document provides an overview of key concepts for drive testing, including physical verification, instruments used, antenna adjustments, tilt types, cell identifiers, frequency reuse, and Test Equipment Mobile System (TEMS). It discusses topics like antenna beamwidth, receiver level, logfile contents, neighboring cell information, and parameters to check during drive testing like radio quality, interference levels, and timing advance. The document is intended as an introductory guide for learning the basic concepts needed for effective drive testing.
The document summarizes research on multiple description coding for audio and speech to provide redundancy and improve quality over lossy networks. It describes a multiple description speech coder based on the AMR-WB standard that splits bits between two descriptions. It also discusses applying multiple description correlating transforms to perceptual audio coding to achieve better quality than a single description when packets are lost. The research was evaluated using simulations and subjective listening tests.
The document discusses GSM authentication, localization, and handover.
For authentication, the document describes how the GSM network authenticates subscribers through a challenge-response mechanism using triplets generated by the AuC containing a random number (RAND), signed response (SRES), and ciphering key (Kc). The MS uses the RAND and its stored Ki to generate the SRES which is sent back to the network for verification.
For localization, the document explains that the GSM network always knows a user's location through the HLR and VLR updating location as the MS moves between MSCs. Localization can occur through network-based, handset-based, SIM-based, or hybrid methods.
The document discusses authentication and ciphering in GSM networks. It describes the objectives of authentication to check authorization and provide parameters to calculate new ciphering keys. It outlines the authentication triplet of RAND, Kc, and SRES. It provides a detailed 7-step process for authentication between the MS, MSC, HLR, and AuC. It also discusses ciphering to prevent information interception and the use of the A5 algorithm with the Kc to encrypt data between the MS and BTS.
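The challenge-response flow above can be sketched in code. One loud caveat: the real A3/A8 algorithms are operator-specific (COMP128 variants) and are not reproduced here; HMAC-SHA256 is used purely as a stand-in so the protocol shape, the triplet (RAND, SRES, Kc), and the comparison step can be demonstrated.

```python
import hmac, hashlib, os

def a3_sres(ki: bytes, rand: bytes) -> bytes:
    """Stand-in A3: derive the 32-bit signed response from Ki and RAND."""
    return hmac.new(ki, b"A3" + rand, hashlib.sha256).digest()[:4]

def a8_kc(ki: bytes, rand: bytes) -> bytes:
    """Stand-in A8: derive the 64-bit ciphering key Kc from Ki and RAND."""
    return hmac.new(ki, b"A8" + rand, hashlib.sha256).digest()[:8]

# --- AuC side: generate an authentication triplet for a subscriber ---
ki = os.urandom(16)        # shared secret; never leaves the SIM or the AuC
rand = os.urandom(16)      # 128-bit random challenge
triplet = (rand, a3_sres(ki, rand), a8_kc(ki, rand))   # (RAND, SRES, Kc)

# --- MS side: the SIM computes SRES from the received RAND and its Ki ---
sres_ms = a3_sres(ki, rand)

# --- Network side: compare responses; on a match both ends now hold the
# --- same Kc, which feeds the A5 stream cipher between MS and BTS.
print(hmac.compare_digest(sres_ms, triplet[1]))  # True: authenticated
```

Note how Ki itself is never transmitted: only RAND goes over the air, and only SRES comes back, which is the point of the challenge-response design.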
- Fixed Mobile Convergence (FMC) allows seamless switching of voice calls between fixed and mobile networks using a single device. When a WiFi signal is available, calls switch to the fixed network like broadband; otherwise the mobile network is used.
- FMC provides both fixed and mobile calling services with one phone that can switch between networks automatically. It requires a WiFi-enabled dual-mode handset supported by an FMC client and server.
- Examples include allowing calls on a Vodafone handset to use BT's broadband network via WiFi when at home, and generic access network (GAN) standards for roaming between wireless local and wide area networks.
Speech coding is the art of creating a minimally redundant representation of the speech signal that can be efficiently transmitted or stored in digital media, and decoding the signal with the best possible perceptual quality.
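A concrete taste of the waveform-coding end of this trade-off is logarithmic companding, the idea behind G.711 telephony codecs. The sketch below uses the continuous mu-law formula rather than the standard's segmented 8-bit encoder, so the constants and helper names are illustrative; the point is that quantizing in the companded domain preserves relative accuracy for quiet samples, which matters perceptually.

```python
import math

MU = 255.0  # standard mu-law constant

def mulaw_compress(x: float) -> float:
    """Continuous mu-law compressor, input x in [-1, 1]."""
    return math.copysign(math.log(1 + MU * abs(x)) / math.log(1 + MU), x)

def mulaw_expand(y: float) -> float:
    """Inverse of the compressor."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)

def quantize(y: float, bits: int = 8) -> float:
    """Uniform quantizer applied in the companded domain."""
    levels = 2 ** (bits - 1)
    return round(y * levels) / levels

# Quiet samples keep far more relative accuracy than with uniform PCM
for x in (0.5, 0.02, 0.001):
    xr = mulaw_expand(quantize(mulaw_compress(x)))
    print(x, round(xr, 5))
```

With uniform 8-bit quantization the 0.001 sample would fall below one quantization step entirely; companding spends the available levels where the ear needs them.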
Topics covered in this presentation:
What is a Base Transceiver Station?
Components of any BTS
BTS transceiver, BTS O&M module, clock module
BTS Transmitter and Receiver Characteristics
BTS configurations
BTS functions and Protocols on Um and Abis Interface
BTS security aspects
The document discusses the Global System for Mobile (GSM) communications, including an overview of GSM concepts, system architecture, identities and channels used, the radio link, mobility and call management, and radio resource management. It provides background on the development of GSM standards and specifications. The document also covers topics like GSM network structures, frequency bands, channel access techniques, and mobility functions like timing advance.
Introduction to GSM - an Overview of Global System for Mobile Communication (iptvmagazine)
This slideshow explains the basic components, technologies, and operation of Global System for Mobile Communication (GSM) systems. You will discover the evolution of mobile systems: 1st generation analog systems, 2nd generation GSM systems (digital voice), 3rd generation multimedia systems, and 4th generation ultra-broadband systems.
You will learn the key system components and basic services that GSM systems can provide. Discover the types of GSM devices which include mobile telephones, wireless PCMCIA cards, embedded radio modules, and external radio modems. The different types of services are described including voice services, data services, and messaging services.
Learn about the physical and logical radio channel structures of the GSM system along with the basic frame and slot structures. The operation of the GSM radio channels is explained, including channel coding, modulation types, speech coding, RF power control, and mobile-assisted handover. GSM radio channels have 8 time slots per frame; some of these are used for signaling (control channels) and others for user traffic (voice and data).
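The "8 time slots per frame" structure maps to concrete timings that follow directly from the GSM gross bit rate. A small arithmetic sketch, using the standard published figures (270.833 kbit/s gross rate, 156.25 bit periods per slot):

```python
# Derive GSM slot and frame durations from the air-interface bit rate.
GROSS_BIT_RATE = 270833.0   # bits per second (approximately 1625/6 kbit/s)
BITS_PER_SLOT = 156.25      # includes tail, training, and guard bit periods
SLOTS_PER_FRAME = 8

slot_us = BITS_PER_SLOT / GROSS_BIT_RATE * 1e6      # slot duration in microseconds
frame_ms = slot_us * SLOTS_PER_FRAME / 1000.0       # TDMA frame in milliseconds
print(round(slot_us, 1), round(frame_ms, 3))        # ~576.9 us, ~4.615 ms
```

The 4.615 ms frame is why a full-rate traffic channel delivers one 20 ms speech frame every 26-frame multiframe cycle.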
Monitoring & Controlling of Devices using GSM (priyanka kini)
This document describes a system to remotely monitor and control devices using GSM technology. The system uses a microcontroller to receive SMS commands from a mobile phone via a GSM modem and control devices connected to relays. It allows devices like fans and lamps to be turned on and off from a distance. The system provides advantages like remote control from anywhere using basic phone operation and potential applications in home, office, and industrial automation.
The document discusses GSM signaling and mobile signaling. GSM signaling defines communications between the mobile and network using different protocols across interfaces. Mobile signaling involves the mobile searching for frequencies, synchronizing, downloading information, selecting a network, and signaling to the network by sending a service request when a call is made.
The document provides an overview of the GSM network architecture, including its three main subsystems: the Mobile Station subsystem, the Base Station Subsystem, and the Network Switching Subsystem. It describes the key elements and interfaces within each subsystem, such as the Mobile Station, Base Transceiver Station, Base Station Controller, Mobile Switching Center, Home Location Register, and Visitor Location Register. The interfaces that connect these elements, such as the A, Abis, and Um interfaces, are also introduced.
In this paper, we discuss LTE system throughput calculation for both TDD and FDD systems.
3GPP LTE supports both TDD and FDD duplexing. The paper describes the factors that affect throughput, such as bandwidth, modulation, UE category, and multiplexing. It also explains how the peak rates of 300 Mbps in the downlink and 75 Mbps in the uplink are obtained, and the assumptions taken to calculate them.
The paper gives the steps and formulae to calculate the throughput for the FDD system and for TDD Config 1 and Config 2.
The throughput figures shown in this paper are theoretical and limited by the assumptions made in the calculations.
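The 300 Mbps / 75 Mbps peak figures can be reproduced with simple resource-grid arithmetic. The assumptions below are mine, the usual quick-calculation ones: 20 MHz FDD (100 resource blocks), normal cyclic prefix, 64QAM, 4x4 MIMO on the downlink, a single uplink layer, and roughly 25% of resource elements consumed by reference signals and control overhead.

```python
def peak_rate_mbps(rbs=100, layers=1, bits_per_symbol=6, overhead=0.25):
    """Back-of-envelope LTE peak rate from the OFDMA resource grid."""
    subcarriers = 12          # per resource block
    symbols_per_ms = 14       # 2 slots x 7 OFDM symbols (normal CP)
    re_per_ms = rbs * subcarriers * symbols_per_ms      # resource elements/ms
    bits_per_ms = re_per_ms * bits_per_symbol * layers  # raw bits per ms
    return bits_per_ms / 1000.0 * (1 - overhead)        # kb/ms == Mbps

dl = peak_rate_mbps(layers=4)   # ~302 Mbps, quoted as 300 (UE Category 5)
ul = peak_rate_mbps(layers=1)   # ~76 Mbps, quoted as 75
print(round(dl, 1), round(ul, 1))
```

Varying `bits_per_symbol` (2 for QPSK, 4 for 16QAM) and `layers` reproduces the lower UE-category rates the same way.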
Voice recognition is the process of automatically recognizing who is speaking on the basis of individual information included in speech waves. This technique makes it possible to use the speaker's voice to verify their identity and control access to services such as voice dialing, banking by telephone, telephone shopping, database access services, information services, voice mail, security control for confidential information areas, and remote access to computers.
This document describes how to build a simple, yet complete and representative automatic speaker recognition system. Such a speaker recognition system has potential in many security applications. For example, users have to speak a PIN (Personal Identification Number) in order to gain access to the laboratory door, or users have to speak their credit card number over the telephone line to verify their identity. By checking the voice characteristics of the input utterance, using an automatic speaker recognition system similar to the one that we will describe, the system is able to add an extra level of security.
Speech compression analysis using MATLAB (eSAT Journals)
This document discusses speech compression analysis using MATLAB. It begins with an introduction to speech compression, noting its importance for efficient storage and transmission of audio data. It then discusses various speech compression techniques, including lossy and lossless compression as well as standards like MPEG. It focuses on using the discrete cosine transform and MATLAB commands to analyze speech signals, including reading wav files, applying windowing functions and the DCT, and playing/viewing the output. The document concludes by discussing current applications of speech compression technologies like MPEG.
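The DCT step at the heart of that analysis can be sketched without MATLAB. A hedged illustration, not the paper's code: a naive orthonormal DCT-II over a toy frame built from two DCT-aligned cosine components, showing why discarding most small coefficients costs almost nothing for smooth signals.

```python
import math

def dct(x):
    """Naive orthonormal DCT-II (O(n^2); fine for a short illustrative frame)."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i in range(n))
            * math.sqrt((1.0 if k == 0 else 2.0) / n)
            for k in range(n)]

def idct(c):
    """Inverse of the orthonormal DCT-II above."""
    n = len(c)
    return [sum(c[k] * math.sqrt((1.0 if k == 0 else 2.0) / n)
                * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for k in range(n))
            for i in range(n)]

# "Compress" by keeping only the strongest 25% of transform coefficients.
n = 64
frame = [math.cos(math.pi * 6 * (2 * i + 1) / (2 * n))
         + 0.3 * math.cos(math.pi * 10 * (2 * i + 1) / (2 * n))
         for i in range(n)]
coeffs = dct(frame)
keep = set(sorted(range(n), key=lambda k: -abs(coeffs[k]))[:16])
sparse = [c if k in keep else 0.0 for k, c in enumerate(coeffs)]
recon = idct(sparse)
err = sum((a - b) ** 2 for a, b in zip(frame, recon)) / sum(a * a for a in frame)
print(err < 1e-9)  # True: 16 of 64 coefficients reconstruct the frame
```

Real speech frames are not this sparse, so practical codecs quantize rather than zero the small coefficients, but the energy-compaction principle is the same.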
Speaker Recognition System using MFCC and Vector Quantization Approach (ijsrd.com)
This paper presents an approach to speaker recognition using frequency spectral information with Mel frequency for the improvement of speech feature representation in a Vector Quantization codebook based recognition approach. The Mel frequency approach extracts the features of the speech signal to get the training and testing vectors. The VQ Codebook approach uses training vectors to form clusters and recognize accurately with the help of LBG algorithm.
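The LBG codebook training mentioned above can be sketched compactly. This is a generic illustration of the split-and-refine idea (start from the global centroid, split each codeword, then iterate nearest-neighbour assignment and centroid update), not the paper's implementation; the toy 2-D data and parameter choices are mine.

```python
import random

def lbg_codebook(vectors, size, eps=0.01, iters=20):
    """Grow a VQ codebook by LBG splitting followed by k-means refinement."""
    dim = len(vectors[0])
    centroid = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    book = [centroid]
    while len(book) < size:
        # Split every codeword into a +eps / -eps pair, doubling the book
        book = ([[c + eps for c in w] for w in book]
                + [[c - eps for c in w] for w in book])
        for _ in range(iters):
            # Assign each training vector to its nearest codeword
            clusters = [[] for _ in book]
            for v in vectors:
                i = min(range(len(book)),
                        key=lambda k: sum((v[d] - book[k][d]) ** 2
                                          for d in range(dim)))
                clusters[i].append(v)
            # Move each codeword to the centroid of its cluster
            for k, cl in enumerate(clusters):
                if cl:
                    book[k] = [sum(v[d] for v in cl) / len(cl)
                               for d in range(dim)]
    return book

# Two well-separated 2-D clusters: a 2-entry codebook should find both
random.seed(0)
data = ([(random.gauss(0, 0.1), random.gauss(0, 0.1)) for _ in range(100)]
        + [(random.gauss(3, 0.1), random.gauss(3, 0.1)) for _ in range(100)])
book = lbg_codebook(data, size=2)
print(sorted(round(w[0]) for w in book))  # [0, 3]: centers recovered
```

In the speaker-recognition setting each training vector would be an MFCC feature vector, and recognition picks the speaker whose codebook yields the lowest quantization distortion.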
Speech Analysis and Synthesis using Vocoder (IJTET Journal)
Abstract— In this paper, a speech analysis and synthesis system using a vocoder is proposed. Voice conversion systems do not create new speech signals but transform existing ones. The proposed speech vocoding differs from conventional speech coding: the speech signal is analyzed and represented with fewer bits so that bandwidth efficiency can be increased, and the speech signal is then synthesized from the received bits of information. Three aspects of the analysis are discussed: pitch refinement, spectral envelope estimation, and maximum voiced frequency estimation. A quasi-harmonic analysis model is used to implement a pitch refinement algorithm that improves the accuracy of the spectral estimation, and a harmonic-plus-noise model reconstructs the speech signal from the parameters. The goal is to achieve the highest possible resynthesis quality using the lowest possible number of bits to transmit the speech signal. Future work aims at incorporating phase information into the analysis and modeling process and at synthesizing these three aspects across different pitch periods.
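The paper's quasi-harmonic pitch refinement is more sophisticated than this, but the baseline such a refinement starts from is an autocorrelation pitch estimate, which is easy to sketch. The sampling rate, search range, and test tone below are illustrative assumptions.

```python
import math

def estimate_pitch(signal, f_min=60, f_max=400, sr=8000):
    """Crude pitch estimate: the lag with the highest autocorrelation
    inside the plausible pitch range wins."""
    lag_min, lag_max = sr // f_max, sr // f_min
    n = len(signal)
    best_lag, best = lag_min, -float("inf")
    for lag in range(lag_min, lag_max + 1):
        r = sum(signal[i] * signal[i + lag] for i in range(n - lag))
        if r > best:
            best, best_lag = r, lag
    return sr / best_lag          # lag in samples -> pitch in Hz

# 200 Hz tone sampled at 8 kHz -> pitch period of exactly 40 samples
sig = [math.sin(2 * math.pi * 200 * i / 8000) for i in range(800)]
print(round(estimate_pitch(sig)))  # 200
```

A refinement stage then searches fractionally around this integer lag, which is what sharpens the spectral envelope estimate.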
Speech Recognition Systems (SRS) have been implemented on various processors, including digital signal processors (DSPs) and field-programmable gate arrays (FPGAs), and their performance has been reported in the literature. The fundamental purpose of speech is communication, i.e., the transmission of messages. In the case of speech, the fundamental analog form of the message is an acoustic waveform, which we call the speech signal. Speech signals can be converted to an electrical waveform by a microphone, manipulated by analog and digital signal processing, and then converted back to acoustic form by a loudspeaker, telephone handset, or headphone, as desired. The recognition of speech requires feature extraction and classification, and systems that use speech as input require a microcontroller to carry out the desired actions. In this paper, the Cypress Programmable System on Chip (PSoC) has been studied and used for the implementation of an SRS. Of the available PSoCs, PSoC 5, containing an ARM Cortex-M3 as its CPU, is used. Noise is first removed from the speech signals using LogMMSE filtering; the filtered signals are then sent to the PSoC 5, where the speech is recognized and the desired actions are performed.
1) The document describes a Cordless Power Controller (CPC) that allows controlling lights, fans, and other appliances in the home using a cordless phone by connecting the CPC between the cordless phone receiver and telephone lines.
2) The CPC works in two modes - a power control mode where codes are dialed to control devices, and a telephone mode. A microcontroller generates firing pulses to control the TRIAC and thereby control appliance brightness and speed.
3) The document also covers telephone signaling systems including touch tone and DTMF dialing, as well as details on how a TRIAC works and can be used to control AC loads like lamps and fans.
Speech Recognized Automation System Using Speaker Identification through Wire... (IOSR Journals)
This document describes a speech recognized automation system using speaker identification through wireless communication. The system uses a speech processor and MATLAB coding with MFCC algorithms to perform speech recognition and speaker identification. It then wirelessly controls electrical devices based on speech commands. Testing showed 80-85% accuracy for the actual speaker and lower (10-20%) accuracy for other speakers. Future work could involve improving speaker recognition accuracy as the number of speakers increases.
This paper discusses the methodology for a project named "Speech Recognized Automation System using Speaker Identification through Wireless Communication". The project covers the design of an automation system using wireless communication and speaker recognition implemented in MATLAB, whose straightforward programming interface makes it an ideal tool for speech analysis. The automation system is useful for home appliances as well as in industry. The speech recognition centers on recognizing speech commands stored in a MATLAB database and matching them against the incoming voice command of the speaker. The Mel Frequency Cepstral Coefficient (MFCC) algorithm is used to recognize the speaker's speech and extract speech features. The system uses low-power RF ZigBee transceiver modules, which are relatively cheap, and is intended to control lights, fans, and other electrical appliances in a home or office using speech commands such as "Light" and "Fan". Further, if security is not a big issue, the speech processor can control the appliances without speaker identification.
Financial Transactions in ATM Machines using Speech Signals (IJERA Editor)
Speech is the natural and simplest way of communication and Speech Recognition is a fascinating application of Digital Signal Processing which has many real-world applications. In this paper, a speech recognition system is developed for Automated Teller Machines (ATMs) using Wavelet Packet Decomposition (WPD) and Artificial Neural Networks (ANN). Speech signals are one-dimensional and are random in nature. ATM machines communicate with the customers using the stored speech samples and the user communicates with the machine using spoken digits. Daubechies wavelets are employed here. A multilayer neural network trained with back propagation training algorithm is used for classification purpose. The proposed method is implemented for 500 speakers uttering 10 spoken digits in English. The experimental results show good recognition accuracy of 87.38% and the efficiency of combining these two techniques
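The wavelet packet decomposition used here builds on a basic wavelet step. As an illustration (not the paper's code), the simplest Daubechies wavelet, Daubechies-1, is the Haar wavelet: one level splits the signal into pairwise averages (approximation) and differences (detail), and WPD applies such steps recursively to both branches.

```python
import math

def haar_step(x):
    """One Haar level: pairwise averages (approximation) and differences
    (detail), each scaled by 1/sqrt(2) so signal energy is preserved."""
    s = math.sqrt(2.0)
    approx = [(a + b) / s for a, b in zip(x[0::2], x[1::2])]
    detail = [(a - b) / s for a, b in zip(x[0::2], x[1::2])]
    return approx, detail

def haar_inverse(approx, detail):
    """Perfect-reconstruction inverse of haar_step."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out += [(a + d) / s, (a - d) / s]
    return out

x = [4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 8.0, 0.0]
approx, detail = haar_step(x)
recon = haar_inverse(approx, detail)
print(max(abs(a - b) for a, b in zip(recon, x)) < 1e-9)  # True
```

For spoken-digit features the subband energies from a few levels of such a decomposition are what feed the neural network classifier.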
Voice Recognition Based Automation System for Medical Applications and for Ph... (IRJET Journal)
This document describes a voice recognition-based automation system for medical applications and physically challenged patients. The system uses a voice recognition model, an Arduino microcontroller, relays, LEDs, buzzers, and a motor to control an adjustable bed. Voice commands are recognized using techniques like MFCC and HMM and used to control devices via the Arduino. The system is intended to allow paralyzed patients to control devices like lights, alarms, and their bed using only voice commands for increased independence. Testing showed the system can accurately recognize commands and control devices with 99% accuracy under suitable conditions.
Audio/Speech Signal Analysis for Depression (ijsrd.com)
The word "depressed" is a common everyday word. People might say "I am depressed" when in fact they mean "I am fed up because I have had a row, or failed an exam, or lost my job", etc. These ups and downs of life are common and normal, and most people recover quite quickly. Depression can be identified by different methods; here it is identified using the MFCC (Mel Frequency Cepstral Coefficient) method. Different parameters can be used to distinguish depressed speech from normal speech, but MFCC-based parameters carry the most applicable information, because depressive speech contains more information in the higher energy bands than normal speech does.
This document discusses feature extraction techniques for isolated word speech recognition. It begins with an introduction to digital speech processing and speech recognition models. The main part of the document compares two common feature extraction techniques: Mel Frequency Cepstral Coefficients (MFCC) and Relative Spectral (RASTA) filtering. MFCC extracts feature vectors from the signal and provides high performance but lacks robustness; RASTA filtering reduces the impact of noise and provides high robustness by band-passing feature coefficients in both the log-spectral and spectral domains. The document details the MFCC extraction process, which involves framing, windowing, a fast Fourier transform, mel filtering, a discrete cosine transform, and calculating the cepstral coefficients.
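The mel-filtering step in that MFCC chain is worth a small sketch. Assuming the standard 2595·log10(1 + f/700) mel formula and an 8 kHz-band signal (both conventional, not taken from this document), the filter centers are spaced evenly in mel, which makes them denser at low frequencies, mimicking the ear's resolution.

```python
import math

def hz_to_mel(f):
    """Standard mel-scale mapping."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse mel-scale mapping."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_centers(n_filters=10, f_low=0.0, f_high=4000.0):
    """Center frequencies of triangular mel filters: uniform in mel,
    hence increasingly spread out in Hz."""
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (n_filters + 1)
    return [mel_to_hz(m_low + step * (i + 1)) for i in range(n_filters)]

centers = mel_filter_centers()
gaps = [b - a for a, b in zip(centers, centers[1:])]
# The Hz spacing between adjacent filters grows monotonically
print(all(g2 > g1 for g1, g2 in zip(gaps, gaps[1:])))  # True
```

Each triangular filter spans from one center to the next; the log of each filter's output energy is what the final DCT turns into cepstral coefficients.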
Audio compression has become one of the basic technologies of the multimedia age. The change in the telecommunication infrastructure in recent years, from circuit-switched to packet-switched systems, is also reflected in the way speech and audio signals are carried in present systems. In many applications, such as the design of multimedia workstations and high-quality audio transmission and storage, the goal is to achieve transparent coding of audio and speech signals at the lowest possible data rates. In other words, bandwidth costs money, so the transmission and storage of information become costly; if we can use less data, both transmission and storage become cheaper. Further reduction in bit rate is an attractive proposition in applications like remote broadcast lines, studio links, satellite transmission of high-quality audio, and voice over the internet.
A survey on Enhancements in Speech Recognition (IRJET Journal)
This document discusses enhancements in speech recognition and provides an overview of the history and basic model of speech recognition. It summarizes key enhancements researchers have made to improve speech recognition, especially in noisy environments. The basic model of speech recognition involves speech input, preprocessing using techniques like MFCCs, classification models like RNNs and HMMs, and output of a transcript. Researchers are working to develop robust speech recognition that can understand speech in any environment.
Voice Controlled Intelligent Remote for Consumer Electronics (NITIN DESAI)
1) The document describes a voice-controlled intelligent remote system for consumer electronics that is designed to help physically challenged individuals.
2) The system uses a Raspberry Pi to convert voice commands to text and an 8051 microcontroller to generate corresponding infrared signals to control devices.
3) The system was implemented and tested successfully to control a DVD player by voice. It has the potential to be expanded to control multiple devices and provide universal remote functionality to benefit physically disabled users.
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T... (IJCSEA Journal)
This document analyzes speech recognition performance based on neural network layers and transfer functions. It describes a system setup that uses MFCC features and ANNs for acoustic modeling. Experiments were conducted by varying the number of neural network layers (1, 2, 3 layers) and transfer functions (linear, sigmoid, tangent sigmoid). Results showed that networks with linear transfer functions in all layers achieved the performance goal, while other configurations reached minimum gradient but not the goal. The best architecture was a 3-layer network with linear transfer functions.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
T-61.246 Digital Signal Processing and Filtering
Kristo Lehtonen 55788E
1. Table of contents

1. Table of contents
2. Introduction
3. Speech creation
4. Different codecs
4.1. Waveform codecs
4.2. Source codecs
4.3. Hybrid codecs
5. GSM codec
5.1. Full-rate codec
5.2. Half-rate codec
6. References
2. Introduction
The conversion of an analogue speech waveform into digital form is usually called speech coding. A major benefit of speech coding is the possibility to compress the signal, that is, to reduce the bit rate of the digital speech. In other words, a speech codec represents speech with as few bits as possible while maintaining an acceptable speech quality.
The efficient digital representation of the speech signal makes it possible to achieve bandwidth efficiency both in transmitting the signal, e.g. over the radio link of a GSM system, and in storing it, e.g. in a GSM telephone's memory. In GSM systems this kind of functionality is of critical importance, because in mobile communication the channel bandwidth is limited.
As for the quality criterion, speech transferred over a GSM link does not need to sound nearly as good as, say, a Mozart symphony played from a CD player. When two people are speaking on the phone, the speech quality does not have to be perfect for mutual understanding. In the near future, however, even Mozart's symphonies might be transferred over some mobile media. Accordingly, the importance of speech coding will probably increase with the arrival of multimedia services.
In the subsequent chapters the speech creation process is explained first, in order to give an understanding of the basic principles of speech coding. After that, the different speech codec types are introduced and explained in more detail. Finally, the codec used in GSM is examined more closely.
Issues concerning the delay and complexity of different codecs are excluded from this paper. The codec used in GSM systems is presented based on the original 13 kbit/s full-rate RPE codec. The newest standardisation is also excluded from this paper.
3. Speech creation
In order to understand speech codecs it is important to know some basic features of human speech, since many of them can be used to design effective codecs. The most important of these features are now explained.

When a person starts talking on his or her GSM mobile phone, what actually happens is depicted in figure 1. By muscle force, air is pushed from the lungs through the vocal tract to the microphone of the mobile device. The vocal tract extends from the glottis to the mouth and includes the three cavities shown in figure 1.
Figure 1 A block diagram of human speech production
Sound is produced when the glottis, an opening between the vocal cords, vibrates open and closed. This interrupts the flow of air and creates a sequence of impulses with a basic frequency called the pitch. For males this frequency is typically 80-160 Hz and for females 180-320 Hz. From figure 2 we can see that a typical speech signal clearly has a periodic nature, which is due to the pitch of the sound. The male voice sample in the figure has a period of roughly 10 ms, giving a pitch of 100 Hz, which fits the male range above. The periodic nature of speech is an important feature that can be exploited when designing codecs.
(Figure 1 blocks: muscle force, lungs, glottis, pharyngeal cavity, mouth cavity, nasal cavity and the GSM phone.)
Figure 2 An example of male speech in the time domain: the vowel 'aaa'
The spectrum of the sound is shaped in the vocal tract. The strongest frequency components of speech are called formants. In figure 3, which depicts the spectrum of the vowel sample of figure 2, it is easy to distinguish the strongest components, located roughly at every frequency n·100 Hz, where n is a positive integer.
The frequencies in the spectrum of the sound are controlled by varying the shape of the tract, for example by moving the tongue. An important part of many speech codecs is the modelling of the vocal tract as a filter. The shape of the vocal tract changes quite slowly, which means that the transfer function of the modelling filter also needs to be updated only infrequently – usually every 10-40 ms or so. Due to the nature of the vocal tract, speech also has short-term correlations on the order of 1 ms, which is another important feature of the human speech system.
There are also features of human hearing that can be exploited. The human ear can only distinguish frequencies between about 16 Hz and 20 000 Hz. Even this bandwidth can be greatly reduced without the intelligibility of the received speech suffering at all: in public telephone networks only the frequencies between 300 Hz and 3400 Hz are transferred.

Another basic property of the human hearing system is that the ear cannot differentiate changes in the speech signal below a certain magnitude. Furthermore, the resolution, that is the capability to differentiate small changes, is logarithmic, so that the ear is more sensitive at low frequencies than at higher ones. These properties can be exploited in clever quantisation schemes for the speech.
Figure 3 Frequency-domain presentation of the sample: the amplitude spectrum
4. Different codecs
Speech coding can be defined as representing analogue waveforms with a sequence of binary digits. The fundamental idea behind coding schemes is to take advantage of the special features of the human speech system explained in the previous chapter: the statistical redundancy and the shortcomings of human hearing.

As explained in the previous chapter, the speech signal varies quite slowly, resulting in a high degree of correlation between consecutive samples. This short-term correlation is due to the nature of the vocal tract. There is also long-term correlation, due to the periodic nature of speech. This statistical redundancy can be exploited with prediction schemes, which quantise the prediction error instead of the speech signal itself.
The shortcomings of human hearing, on the other hand, mean that a lot of the information in the speech signal is perceptually irrelevant: the human ear cannot differentiate changes below a certain magnitude and cannot distinguish frequencies below 16 Hz or above 20 000 Hz. This can be exploited by designing optimum quantisation schemes, where only a finite number of levels is necessary.
Figure 4 Illustration of the fundamental idea behind speech coding

Figure 4 illustrates this fundamental idea of speech coding schemes: irrelevancy is minimised by quantisation and redundancy is removed by prediction. It is worth noting here that quantisation introduces some loss of information, even if of irrelevant magnitude, whereas prediction in general preserves all the information in the signal. Another feature explained in the previous chapter, which can also be used in designing effective codecs, was the modelling of the speech creation system as a filter.
There are three different speech coding methods, which use these features in different ways:
• Waveform coding
• Source coding
• Hybrid coding
(Figure 4: the signal is divided along two axes, relevant/irrelevant and redundant/non-redundant; predictive and transform coding remove the redundancy, amplitude quantisation removes the irrelevancy, leaving the description of the channel signal after efficient coding.)
4.1. Waveform codecs
The basic difference between waveform and source codecs is reflected in their names. Source codecs try to produce a digital signal by modelling the source of the speech, whereas waveform codecs use no knowledge of the source of the signal but instead try to produce a digital signal whose waveform is as close as possible to the original analogue signal.
Pulse Code Modulation (PCM) is the simplest and purest waveform codec. PCM involves merely sampling and quantising the input waveform. Speech in Public Switched Telephone Networks, for example, is band-limited to about 4 kHz and, in accordance with the Nyquist rule, sampled at a frequency of 8 kHz. If linear quantisation were used, around twelve bits per sample would be needed for good-quality speech. This would mean a bit rate of 8·12 kbit/s = 96 kbit/s. The bit rate can be reduced by using non-uniform quantisation. In Europe the A-law is used (see figure 4). With this non-linear A-law, 8 bits per sample are sufficient for good-quality speech.
Figure 4 A rough sketch of the non-uniform quantisation according to the A-law. The axis values are not to scale.
Thus, by using the A-law we get a bit rate of 8·8 kbit/s = 64 kbit/s. In America the µ-law is the standard, which differs somewhat from the European A-law.
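As a sketch of this non-uniform quantisation, the continuous A-law compression curve can be written down directly from the standard formula with A = 87.6; the 8-bit segment encoding used in real PCM systems is omitted here for brevity:

```python
import math

A = 87.6  # compression parameter of the European A-law standard

def a_law_compress(x):
    """Map a sample x in [-1, 1] to the compressed value in [-1, 1]."""
    ax = abs(x)
    if ax < 1.0 / A:
        y = A * ax / (1.0 + math.log(A))
    else:
        y = (1.0 + math.log(A * ax)) / (1.0 + math.log(A))
    return math.copysign(y, x)

# Small amplitudes get a proportionally larger share of the output range
# (finer effective quantisation steps), matching the ear's logarithmic
# amplitude sensitivity.
quiet = a_law_compress(0.01)
loud = a_law_compress(0.5)
```

Quantising the compressed value uniformly then gives fine steps for quiet samples and coarse steps for loud ones.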
A commonly used technique in speech coding is to attempt to predict the value of the next sample from the previous samples. This is possible because of the correlations present in speech signals, as mentioned earlier. The trick is to use the error signal, i.e. the difference between the predicted signal and the actual signal, instead of the actual signal. This can be expressed in the form

r(n) = s(n) − s'(n),

where s(n) is the original signal, s'(n) the predicted signal and r(n) the reconstruction error.
If these predictions are effective, the error signal will have a lower variance than the original speech samples. This is measured in terms of the Signal-to-quantisation-Noise Ratio (SNR), which is defined in dB form as

SNR = 10·log10( σs² / σr² ),

where σs² is the signal variance and σr² the reconstruction error variance.
A low error variance in turn means that the error signal can be quantised with fewer bits than the original speech signal. This is the basis of Differential Pulse Code Modulation (DPCM) schemes. Predictive coding can be made even more effective if the predictor and quantiser are made adaptive, so that they change over time to match the characteristics of the speech signal. This in turn leads to Adaptive Differential PCM (ADPCM).
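A minimal sketch of the prediction idea, using a synthetic correlated signal and the trivial first-order predictor s'(n) = s(n−1); the signal and predictor are chosen only for illustration, not taken from any standard:

```python
import math

# A synthetic, strongly correlated "speech-like" signal: a 100 Hz sine
# sampled at 8 kHz, so consecutive samples are nearly equal.
signal = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(800)]

# Trivial predictor s'(n) = s(n-1); transmit the error
# r(n) = s(n) - s'(n) instead of the signal itself.
residual = [signal[n] - signal[n - 1] for n in range(1, len(signal))]

def variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

# The residual has a far lower variance than the signal, i.e. the
# prediction gain (the SNR of the previous section) is large.
gain_db = 10 * math.log10(variance(signal) / variance(residual))
```

Even this crude predictor yields a prediction gain of roughly 20 dB on such a signal, which is why the residual can be coded with fewer bits.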
All three waveform codecs described above use a time-domain approach to coding the speech signal. A frequency-domain approach can also be used. One example of this is Sub-Band Coding (SBC). In SBC the input speech is split into a number of frequency bands, or sub-bands, and each is coded independently using, for example, an ADPCM coding scheme. At the receiver the sub-band signals are decoded and recombined to give the reconstructed speech signal. The trick with this coding scheme is that not all of the sub-bands are equally important, for example when transferring speech over a mobile link. We can therefore allocate more bits to the perceptually more important sub-bands, so that the noise in these frequency regions is low, while in the perceptually less important sub-bands we may allow higher coding noise.
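A toy sketch of this sub-band idea, assuming the simplest possible two-band split (pairwise sums and differences, a Haar-like pair) and plain uniform quantisers; real SBC codecs use proper filter banks and ADPCM per band:

```python
import math

signal = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(400)]

# Two-band split: the pairwise average carries the low frequencies,
# the pairwise difference the high frequencies.
low = [(signal[2 * i] + signal[2 * i + 1]) / 2 for i in range(len(signal) // 2)]
high = [(signal[2 * i] - signal[2 * i + 1]) / 2 for i in range(len(signal) // 2)]

def quantise(xs, bits):
    """Uniform quantiser over [-1, 1] with the given number of bits."""
    step = 2.0 / (2 ** bits)
    return [round(x / step) * step for x in xs]

low_q = quantise(low, 8)    # perceptually important band: more bits
high_q = quantise(high, 3)  # less important band: fewer bits

# Reconstruction: s(2i) = low + high, s(2i+1) = low - high.
recon = []
for l, h in zip(low_q, high_q):
    recon.extend([l + h, l - h])
```

With 8 bits for the low band and only 3 for the high band, the average bit cost drops while the reconstruction error stays dominated by the coarsely coded, less important band.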
4.2. Source codecs
As was explained above, waveform codecs try to produce a digital presentation of the analogue speech signal whose waveform is as close as possible to the original signal. Source codecs, on the other hand, try to produce a digital signal by using a model of how the source generated it, and attempt to extract the parameters of that model from the signal being coded. It is these model parameters that are transmitted to the decoder. Source codecs for speech are called vocoders, and they work in the following way. The vocal tract is represented as a time-varying filter, excited either by a white noise source for unvoiced speech segments, or by a train of pulses separated by the pitch period for voiced speech. The information that must be sent to the decoder is the filter specification, a voiced/unvoiced flag, the necessary variance of the excitation signal, and the pitch period for voiced speech. This is updated every 10-20 ms in accordance with the nature of normal speech. The procedure is depicted roughly in figure 5.

Figure 5 A presentation of the speech creation process as used in source coding

The model parameters can be determined by the encoder in a number of different ways, using either time- or frequency-domain techniques. The information can also be coded for transmission in various different ways. The main use of vocoders has been in military applications where natural
sounding speech is not as important as a very low bit rate to allow heavy protection and
encryption.
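The two-state excitation model described above can be sketched as follows; the frame length, pitch period and gain are placeholder values chosen purely for illustration:

```python
import random

def make_excitation(voiced, frame_len=160, pitch_period=80, gain=1.0):
    """One frame of vocoder excitation: an impulse train at the pitch
    period for voiced frames, white noise for unvoiced frames."""
    if voiced:
        return [gain if n % pitch_period == 0 else 0.0
                for n in range(frame_len)]
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    return [gain * rng.gauss(0.0, 1.0) for _ in range(frame_len)]

# The decoder would pass this excitation through the transmitted vocal
# tract filter; here we only build the two excitation types.
voiced = make_excitation(True)
unvoiced = make_excitation(False)
```

The voiced/unvoiced flag, pitch period and gain are exactly the parameters the text says must be transmitted every 10-20 ms.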
4.3. Hybrid codecs
Hybrid codecs attempt to fill the gap between waveform and source codecs. Waveform codecs are
capable of providing good quality speech at bit rates down to about 16 kbits/s, but are of limited
use at rates below this. Source codecs on the other hand can provide understandable speech at 2.4
kbits/s and below, but cannot provide natural sounding speech at any bit rate. The quality of
different codecs as a function of bit rate is depicted in figure 5.
Figure 6 Speech quality as a function of bit rate for three different speech coding methods
As can be seen from figure 6, hybrid codecs combine techniques from both source and waveform codecs and as a result give good quality at intermediate bit rates.
The most successful and most commonly used hybrid codecs are Analysis-by-Synthesis (AbS) codecs. Such coders use the same linear prediction filter model of the vocal tract as source codecs. However, instead of applying a simple two-state voiced/unvoiced model to find the necessary input to this filter, the excitation signal is chosen so that the reconstructed speech waveform matches the original speech waveform as closely as possible. Thus AbS codecs combine the techniques of waveform and source codecs.

AbS codecs work by splitting the input speech into frames, typically about 20 ms long. For each frame, parameters are determined for a filter called the synthesis filter (see figure 5). The excitation to this synthesis filter is determined by finding the excitation signal that minimises the error between the input speech and the reconstructed speech. Hence the name Analysis-by-Synthesis: the encoder analyses the input speech by synthesising many different approximations to it. The basic idea is that each speech sample can be approximated by a linear combination of the preceding samples. The synthesis filter is of the form

H(z) = 1 / A(z), where A(z) = 1 + Σ_{k=1}^{p} a_k·z^(-k).

The coefficients a_k are called Linear Predictive Coefficients (LPC). They are determined by minimising the difference between the actual signal and the predicted signal with the least-squares method. The variable p gives the order of the filter. This filter is intended to model the short-term correlations introduced into the speech by the action of the vocal tract. This kind of coding is also called Linear Predictive Coding (LPC).
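As an illustrative sketch of LPC analysis (not the bit-exact GSM procedure), the coefficients a_k can be computed from a frame's autocorrelation with the Levinson-Durbin recursion; the frame, sampling rate and order below are chosen only for illustration:

```python
import math

def autocorr(x, lag):
    """Autocorrelation r[lag] of the frame x."""
    return sum(x[n] * x[n - lag] for n in range(lag, len(x)))

def lpc(frame, order):
    """Levinson-Durbin recursion: find the coefficients a_k of
    A(z) = 1 + sum_k a_k z^(-k) that minimise the prediction error."""
    r = [autocorr(frame, k) for k in range(order + 1)]
    a = []
    err = r[0]
    for i in range(order):
        acc = r[i + 1] + sum(a[j] * r[i - j] for j in range(i))
        k = -acc / err          # reflection coefficient
        a = [a[j] + k * a[i - 1 - j] for j in range(i)] + [k]
        err *= 1 - k * k        # remaining prediction error energy
    return a

# A 30 ms frame (240 samples at 8 kHz) dominated by a single 500 Hz
# component is modelled well by an order-2 predictor; for a pure
# sinusoid the theory gives a_1 = -2*cos(w), a_2 = 1.
frame = [math.sin(2 * math.pi * 500 * n / 8000) for n in range(240)]
coeffs = lpc(frame, 2)
```

The GSM full-rate codec computes a filter of this general form with order 8, as described in chapter 5.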
To make the codec even more effective, not only the short-term correlations but also the quasi-periodic nature of human speech, that is the long-term correlations, must be used. Short-term linear prediction examines the correlations between samples less than 16 samples apart. Long-term prediction (LTP) schemes examine the correlations between samples 20-120 samples apart. The transfer function can be presented in the form

P(z) = 1 + b·z^(-N),

where N is the period of the basic frequency (the pitch) and b is the long-term prediction coefficient. N is chosen so that the correlation of the sampled signal x[n] with x[n+N] is maximised.
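A minimal sketch of this lag search, assuming an 8 kHz sampling rate and a 100 Hz pitch chosen purely for illustration; the lag range 20-120 follows the text:

```python
import math

def best_lag(x, lag_min=20, lag_max=120):
    """Return the LTP lag N in [lag_min, lag_max] that maximises the
    correlation between x[n] and x[n - N] over a common window."""
    def corr(lag):
        # Start at lag_max so every candidate lag is scored on the
        # same set of samples.
        return sum(x[n] * x[n - lag] for n in range(lag_max, len(x)))
    return max(range(lag_min, lag_max + 1), key=corr)

# A signal with a pitch period of 80 samples (100 Hz at 8 kHz):
x = [math.sin(2 * math.pi * n / 80) for n in range(400)]
lag = best_lag(x)  # the search recovers the 80-sample pitch period
```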
The Multi-Pulse Excited (MPE) codec is one family of AbS codecs, in which the excitation signal to the filter is given by a fixed number of non-zero pulses for every frame of speech. A close relative of the MPE codec is the Regular Pulse Excited (RPE) codec, which is also the technique used in the GSM codec. Like the MPE codec, the RPE codec uses a number of non-zero pulses to give the excitation signal; however, in RPE codecs the pulses are regularly spaced at some fixed interval. This means that the encoder only needs to determine the position of the first pulse and the amplitudes of all the pulses, whereas with MPE codecs the positions of all the non-zero pulses within the frame, and their amplitudes, must be determined by the encoder and transmitted to the decoder. With an RPE codec, less information therefore needs to be transmitted about pulse positions, which is critically important in mobile systems like GSM where bandwidth is especially scarce.
Although MPE and RPE codecs can provide good-quality speech at rates of around 10 kbit/s and higher, they are not suitable for rates much below this, due to the large amount of information that must be transmitted about the excitation pulses' positions and amplitudes. Currently the most commonly used algorithm for producing good-quality speech at rates below 10 kbit/s is Code Excited Linear Prediction (CELP). In CELP codecs the excitation is given by an entry from a large vector-quantiser codebook, together with a gain term to control its power.
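A toy illustration of the CELP idea under stated assumptions: a random 64-entry codebook of 40-sample vectors, and no synthesis filter in the error loop, both simplifications chosen only to keep the sketch short:

```python
import random

rng = random.Random(1)

# Toy fixed codebook: 64 random excitation vectors of 40 samples each.
codebook = [[rng.gauss(0.0, 1.0) for _ in range(40)] for _ in range(64)]

def celp_search(target):
    """Exhaustively pick the codebook entry and gain that minimise the
    squared error against the target excitation."""
    best = None
    for idx, c in enumerate(codebook):
        energy = sum(v * v for v in c)
        # Optimal gain for this entry: projection of target onto c.
        gain = sum(t * v for t, v in zip(target, c)) / energy
        err = sum((t - gain * v) ** 2 for t, v in zip(target, c))
        if best is None or err < best[0]:
            best = (err, idx, gain)
    return best[1], best[2]

# Encoding a target that is a scaled codebook entry recovers that entry,
# so only the index (6 bits here) and the gain need to be transmitted.
target = [2.5 * v for v in codebook[17]]
idx, gain = celp_search(target)
```

A real CELP coder filters each candidate through the synthesis filter and weights the error perceptually, but the transmitted quantities are the same: a codebook index and a gain.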
5. GSM codec
5.1. Full-rate codec
The GSM system uses Linear Predictive Coding with Regular Pulse Excitation (an LPC-RPE codec). It is a full-rate speech codec and operates at 13 kbit/s. As a comparison, the older public telephone networks use speech coding at a bit rate of 64 kbit/s, yet there is no significant difference in speech quality.

The encoder processes speech in 20 ms blocks. Each block contains 260 bits, as depicted in figure 7 (188 + 36 + 36 = 260). This matches the bit rate, since 260 bits / 20 ms = 13 000 bit/s = 13 kbit/s.
The more precise distribution of bits can be seen in table 1. The encoder has three major parts:
1) Linear prediction analysis (short-term prediction)
2) Long-term prediction and
3) Excitation analysis
Figure 7 A block diagram of the GSM full-rate LPC-RPE codec
Linear prediction uses a transfer function of order 8:

A(z) = 1 + Σ_{k=1}^{8} a_k·z^(-k)

Altogether, the linear predictor part of the codec uses 36 bits.
The long-term predictor estimates the pitch and gain four times per block, at 5 ms intervals. Each estimate provides a 7-bit lag coefficient and a 2-bit gain coefficient, so together the four estimates require 4·(7+2) bits = 36 bits. The gain factor in the predicted speech sample ensures that the synthesised speech has the same energy level as the original speech signal.
The remaining 188 bits are derived from the regular pulse excitation analysis. After both short- and long-term filtering, the residual signal, that is the difference between the predicted signal and the actual signal, is quantised for each 5 ms sub-frame.
                                       Bits per 5 ms block   Bits per 20 ms block
LPC filter          8 parameters                -                     36
LTP filter          Delay parameter             7                     28
                    Gain parameter              2                      8
Excitation signal   Subsampling phase           2                      8
                    Maximum amplitude           6                     24
                    13 samples                 39                    156
Total                                                                260

Table 1 The distribution of bits used in a GSM full-rate codec.
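The bit budget of table 1 can be checked with a few lines of arithmetic:

```python
# One 20 ms GSM full-rate frame, reproducing the totals of table 1.
lpc_bits = 36                           # short-term LPC filter parameters
ltp_bits = 4 * (7 + 2)                  # four 5 ms sub-frames: 7-bit lag + 2-bit gain
excitation_bits = 4 * (2 + 6 + 13 * 3)  # phase, max amplitude, 13 samples at 3 bits

total_bits = lpc_bits + ltp_bits + excitation_bits
bit_rate = total_bits / 0.020           # bits per frame / frame duration in s
print(total_bits, bit_rate)             # 260 bits per frame, 13 000 bit/s
```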
5.2. Half-rate codec
There is also a half-rate version of the GSM codec: a Vector Sum Excited Linear Prediction (VSELP) codec with a bit rate of 5.6 kbit/s. The VSELP codec is a close relative of the CELP codec family explained in the previous chapter. A slight difference is that VSELP uses more than one excitation codebook, each scaled by its own excitation gain factor.
6. References

Garg, Vijay K. & Wilkes, Joseph E.: Principles & Applications of GSM
Insinöörijärjestön koulutuskeskus: Julkaisu 129-90, Koodaus ja salaus
Penttinen, Jyrki: GSM tekniikka
Perkis, Andrew: Speech codec systems
Deller, John R. & Hansen, John H. L. & Proakis, John G.: Discrete-Time Processing of Speech Signals
Voipio, Kirsi & Uusitupa, Seppo: Tietoliikenneaapinen
Halme, Seppo J.: Televiestintäjärjestelmät
Lathi, B. P.: Modern Digital and Analog Communication Systems
Mansikkaviita, Jari & Talvo, Markku: Johdatus solukkopuhelintekniikkaan
Penttinen, Jyrki: Kännyköinnin lyhyt oppimäärä
Carlson, Bruce A. & Crilly, Paul B. & Rutledge, Janet C.: Communication Systems
Stallings, William: Wireless Communications Systems
Woodard, Jason: Speech coding website, http://www-mobile.ecs.soton.ac.uk/speech_codecs/
European Telecommunication Standard 300 580-2, RE/SMG-110610PR1