This document provides an introduction to audio signal processing. It discusses analog and digital audio signals, the Waveform Audio File Format (WAV) specification including its header structure, and tools for audio processing such as FFmpeg, MATLAB, and Praat. Example C++ code is given for reading header metadata and audio samples from a WAV file; the example solution contains an error for the reader to spot, and FFmpeg is highlighted as a practical library for audio tasks.
Introductory Lecture to Audio Signal Processing
1. Introduction to Audio Signal Processing
Human-Computer Interaction
Angelo Antonio Salatino
aasalatino@gmail.com
http://infernusweb.altervista.org
2. License
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
3. Overview
•Audio Signal Processing;
•Waveform Audio File Format;
•FFmpeg;
•Audio Processing with Matlab;
•Doing phonetics with Praat;
•Last but not least: Homework.
4. Audio Signal Processing
•Audio signal processing is an engineering field that focuses on the computational methods for intentionally altering auditory signals or sounds, in order to achieve a particular goal.
[Diagram] Input Signal → Audio Signal Processing → Output Signal (data with meaning)
5. Audio Processing in HCI
Some HCI applications involving audio signal processing are:
•Speech Emotion Recognition
•Speaker Recognition
▫Speaker Verification
▫Speaker Identification
•Voice Commands
•Speech to Text
•Etc.
6. Audio Signals
You can find audio signals represented in either digital or analog format.
•Digital – the pressure wave-form is a sequence of symbols, usually binary numbers.
•Analog – the pressure wave-form is represented by a continuously varying quantity, such as a voltage.
7. Analog to Digital Converter (ADC)
•Don’t worry, it’s only a fast review!!!
Analog Signal (continuous in time, continuous in amplitude)
→ Sample & Hold (the sampling frequency must be defined): discrete in time, continuous in amplitude
→ Quantization (the number of bits per sample must be defined): discrete in time, discrete in amplitude
→ Encoding → Digital Signal (discrete in time, discrete in amplitude)
•For each measurement a number is assigned according to its amplitude.
•The sampling frequency and the number of bits used to represent a sample can be considered the main features of a digital signal.
•How are these digital signals stored?
8. Waveform Audio File Format (WAV)
Endianness   Byte offset   Field name      Field size      Description
Big          0             ChunkID         4               RIFF Chunk Descriptor
Little       4             ChunkSize       4
Big          8             Format          4
Big          12            SubChunk1ID     4               Format SubChunk
Little       16            SubChunk1Size   4
Little       20            AudioFormat     2
Little       22            NumChannels     2
Little       24            SampleRate      4
Little       28            ByteRate        4
Little       32            BlockAlign      2
Little       34            BitsPerSample   2
Big          36            SubChunk2ID     4               Data SubChunk
Little       40            SubChunk2Size   4
Little       44            Data            SubChunk2Size
The WAV file is an instance of the Resource Interchange File Format (RIFF), defined by IBM and Microsoft. RIFF is a generic file container format that stores data in tagged chunks (its basic building blocks). It is a file structure that defines a class of more specific file formats, such as WAV, AVI, and RMI.
9. Waveform Audio File Format (WAV)
ChunkID — contains the letters «RIFF» in ASCII form (0x52494646 in big-endian form).
ChunkSize — the size of the rest of the chunk following this field: the size of the entire file in bytes minus 8 (the two fields not included, ChunkID and ChunkSize).
Format — contains the letters «WAVE» in ASCII form (0x57415645 in big-endian form).
10. Waveform Audio File Format (WAV)
SubChunk1ID — contains the letters «fmt » in ASCII form (0x666d7420 in big-endian form).
SubChunk1Size — 16 for PCM. This is the size of the rest of this subchunk, which follows this field.
11. Waveform Audio File Format (WAV)
AudioFormat — format code (compression type):
PCM = 0x0001 (linear quantization, uncompressed)
IEEE_FLOAT = 0x0003
Microsoft_ALAW = 0x0006
Microsoft_MULAW = 0x0007
IBM_ADPCM = 0x0103
…
NumChannels — Mono = 1, Stereo = 2, etc.
(Note: channels are interleaved.)
12. Waveform Audio File Format (WAV)
SampleRate — sampling frequency: 8000, 16000, 44100, etc.
ByteRate — average bytes per second; determined by Equation 1.
BlockAlign — the number of bytes for one sample including all channels; determined by Equation 2.
(1) ByteRate = SampleRate × NumChannels × BitsPerSample / 8
(2) BlockAlign = NumChannels × BitsPerSample / 8
13. Waveform Audio File Format (WAV)
BitsPerSample — 8 bits = 8, 16 bits = 16, etc.
SubChunk2ID — contains the letters «data» in ASCII form (0x64617461 in big-endian form).
SubChunk2Size — the number of bytes in the Data field. If AudioFormat = PCM, then you can compute the number of samples with Equation 3.
(3) NumOfSamples = (8 × SubChunk2Size) / (NumChannels × BitsPerSample)
14. Example of wave header
Chunk Descriptor:
    52 49 46 46      "RIFF"                 (ChunkID)
    16 02 01 00      ChunkSize = 66070
    57 41 56 45      "WAVE"                 (Format)
Fmt SubChunk:
    66 6d 74 20      "fmt "                 (SubChunk1ID)
    10 00 00 00      SubChunk1Size = 16
    01 00            AudioFormat = 1 (PCM)
    01 00            NumChannels = 1
    80 3e 00 00      SampleRate = 16000
    00 7d 00 00      ByteRate = 32000
    02 00            BlockAlign = 2
    10 00            BitsPerSample = 16
Data SubChunk:
    64 61 74 61      "data"                 (SubChunk2ID)
    f2 01 01 00      SubChunk2Size = 66034
    …                Data
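All multi-byte numeric fields above are stored little-endian. A sketch of assembling them from raw header bytes (helper names are ours):

```cpp
#include <cstdint>

// Assemble a 32-bit value from four little-endian header bytes.
inline std::uint32_t le32(const unsigned char b[4]) {
    return static_cast<std::uint32_t>(b[0])
         | static_cast<std::uint32_t>(b[1]) << 8
         | static_cast<std::uint32_t>(b[2]) << 16
         | static_cast<std::uint32_t>(b[3]) << 24;
}

// Assemble a 16-bit value from two little-endian header bytes.
inline std::uint16_t le16(const unsigned char b[2]) {
    return static_cast<std::uint16_t>(b[0] | (b[1] << 8));
}
```

Applied to the bytes above, `le32` on `80 3e 00 00` gives 16000 (SampleRate) and on `f2 01 01 00` gives 66034 (SubChunk2Size).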
15. Exercise
For the next 15 min, write a C/C++ program that takes a wav file as input and prints the following values on standard output:
•Header size;
•Sample rate;
•Bits per sample;
•Number of channels;
•Number of samples.
Good work!
16. Solution
#include <fstream>
#include <iostream>
using namespace std;

typedef struct header_file
{
    char chunk_id[4];
    int chunk_size;
    char format[4];
    char subchunk1_id[4];
    int subchunk1_size;
    short int audio_format;
    short int num_channels;
    int sample_rate;
    int byte_rate;
    short int block_align;
    short int bits_per_sample;
    char subchunk2_id[4];
    int subchunk2_size;
} header;

/************** Inside main() **************/
header* meta = new header;
ifstream infile;
infile.exceptions(ifstream::eofbit | ifstream::failbit | ifstream::badbit);
infile.open("foo.wav", ios::in | ios::binary);
infile.read((char*)meta, sizeof(header));
cout << " Header size: " << sizeof(*meta) << " bytes" << endl;
cout << " Sample Rate: " << meta->sample_rate << " Hz" << endl;
cout << " Bits per sample: " << meta->bits_per_sample << " bit" << endl;
cout << " Number of channels: " << meta->num_channels << endl;
long numOfSample = (meta->subchunk2_size / meta->num_channels) / (meta->bits_per_sample / 8);
cout << " Number of samples: " << numOfSample << endl;
However, this solution contains an error. Can you spot it?
17. What about reading samples?
short int* pU = NULL;
unsigned char* pC = NULL;
gWavDataIn = new double*[meta->num_channels]; // data structure storing the samples
for (int i = 0; i < meta->num_channels; i++)
    gWavDataIn[i] = new double[numOfSample];
wBuffer = new char[meta->subchunk2_size]; // data structure storing the raw bytes
infile.read(wBuffer, meta->subchunk2_size); // fill the buffer from the file
/* data conversion: from bytes to samples (channels are interleaved) */
if (meta->bits_per_sample == 16)
{
    pU = (short*) wBuffer;
    for (int i = 0; i < numOfSample; i++)
        for (int j = 0; j < meta->num_channels; j++)
            gWavDataIn[j][i] = (double) pU[i * meta->num_channels + j];
}
else if (meta->bits_per_sample == 8)
{
    pC = (unsigned char*) wBuffer;
    for (int i = 0; i < numOfSample; i++)
        for (int j = 0; j < meta->num_channels; j++)
            gWavDataIn[j][i] = (double) pC[i * meta->num_channels + j];
}
else
{
    printERR("Unhandled case");
}
This solution is available at: https://github.com/angelosalatino/AudioSignalProcessing
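The interleaved layout (L R L R … for stereo) handled above can be sketched in isolation with modern containers (a minimal sketch; the function name is ours):

```cpp
#include <cstdint>
#include <vector>

// Split an interleaved 16-bit PCM buffer into one vector of doubles
// per channel -- the same layout as gWavDataIn in the slide code.
std::vector<std::vector<double>> deinterleave(const std::int16_t* pcm,
                                              std::size_t numSamples,
                                              std::size_t numChannels) {
    std::vector<std::vector<double>> out(numChannels,
                                         std::vector<double>(numSamples));
    for (std::size_t i = 0; i < numSamples; ++i)
        for (std::size_t ch = 0; ch < numChannels; ++ch)
            out[ch][i] = static_cast<double>(pcm[i * numChannels + ch]);
    return out;
}
```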
18. A better solution: FFmpeg
What FFmpeg says about itself:
•FFmpeg is the leading multimedia framework, able to decode, encode, transcode, mux, demux, stream, filter and play pretty much anything that humans and machines have created. It supports the most obscure ancient formats up to the cutting edge. No matter if they were designed by some standards committee, the community or a corporation.
19. Why is FFmpeg better?
•Off-the-shelf;
•Open source;
•We can read samples from many different formats: wav, mp3, aac, flac and so on;
•The code stays the same for all these audio formats;
•It can also decode video formats.
20. A little bit of code …
Step 1
•Create AVFormatContext
▫Format I/O context: nb_streams, filename, start_time, duration, bit_rate, audio_codec_id, video_codec_id and so on.
•Open file
AVFormatContext* formatContext = NULL;
av_open_input_file(&formatContext,"foo.wav",NULL,0,NULL)
21. A little bit of code …
Step 2
•Create AVStream
▫Stream structure; it contains nb_frames, codec_context, duration and so on;
•Association between audio stream inside the context and the new one.
// Find the audio stream (some container files can have multiple streams in them)
AVStream* audioStream = NULL;
for (unsigned int i = 0; i < formatContext->nb_streams; ++i)
    if (formatContext->streams[i]->codec->codec_type == AVMEDIA_TYPE_AUDIO)
    {
        audioStream = formatContext->streams[i];
        break;
    }
22. A little bit of code …
Step 3
•Create AVCodecContext
▫Main external API structure; it contains codec_name, codec_id and so on.
•Create AVCodec
▫Codec structure; it contains deep-level information about the codec.
•Find codec availability
•Open codec
AVCodecContext* codecContext = audioStream->codec;
AVCodec* codec = avcodec_find_decoder(codecContext->codec_id);
avcodec_open(codecContext,codec);
23. A little bit of code …
Step 4
•Create AVPacket
▫This structure stores compressed data.
•Create AVFrame
▫This structure describes decoded (raw) audio or video data.
AVPacket packet;
av_init_packet(&packet);
…
AVFrame* frame = avcodec_alloc_frame();
24. A little bit of code …
Step 5
•Read packets
▫Packets are read from the AVFormatContext
•Decode packets
▫Frames are decoded with the AVCodecContext
// Read the packets in a loop
while (av_read_frame(formatContext, &packet) == 0)
{
…
avcodec_decode_audio4(codecContext, frame, &frameFinished, &packet);
…
src_data = frame->data[0];
}
25. Problems with FFmpeg
•Update issues (with lib update, your previous code might not work)
▫Deprecated methods;
▫Function name or parameters could change.
•Poor documentation (to date)
Example of migration:
•avcodec_open (AVCodecContext *avctx, const AVCodec *codec)
•avcodec_open2 (AVCodecContext *avctx, const AVCodec *codec, AVDictionary **options)
26. Audio Processing with Matlab
•Matlab contains many built-in functions to read, play, manipulate and save audio files.
•It also contains Signal Processing Toolbox and DSP System Toolbox
Advantages
Disadvantages
•Well documented;
•It works at different levels of abstraction;
•Direct access to samples;
•Coding is simple.
•Only wave, flac, mp3, mpeg-4 and ogg formats are recognized by audioread (is it really a disadvantage?);
•License is expensive.
27. Let’s code: Opening files
%% Reading file
% Section ID = 1
filename = './test.wav';
[data,fs] = wavread(filename); % reads only wav file
% data = sample collection, fs = sampling frequency
% or ---> [data,fs] = audioread(filename);
% write an audio file
audiowrite('./testCopy.wav',data,fs)
Formats recognized by audioread()
28. Information and play
%% Information & play
% Section ID = 2
numberOfSamples = length(data);
tempo = numberOfSamples / fs;
disp (sprintf('Length: %f seconds',tempo));
disp (sprintf('Number of Samples %d', numberOfSamples));
disp (sprintf('Sampling Frequency %d Hz',fs));
disp (sprintf('Number of Channels: %d', min(size(data))));
%play file
sound(data,fs);
% PLOT the signal
time = linspace(0,tempo,numberOfSamples);
plot(time,data);
29. Framing
%% Framing
% Section ID = 4
timeWindow = 0.04; % Frame length in term of seconds. Default: timeWindow = 40ms
timeStep = 0.01; % seconds between two frames. Default: timeStep = 10ms (in case of OVERLAPPING)
overlap = 1; % 1 in case of overlap, 0 no overlap
sampleForWindow = timeWindow * fs;
if overlap == 0
Y = buffer(data,sampleForWindow);
else
sampleToJump = sampleForWindow - timeStep * fs;
Y = buffer(data,sampleForWindow,ceil(sampleToJump));
end
[m,n]=size(Y); % m corresponds to sampleForWindow
numFrames = n;
disp(sprintf('Number of Frames: %d',numFrames));
Each frame is the signal multiplied by a shifted rectangular window: s(t) = x(t) · rect(t − τ)
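The frame count produced by this segmentation can be sanity-checked with a small helper (a simplified sketch of ours — unlike MATLAB's buffer(), it counts only complete frames and does not zero-pad a final partial frame):

```cpp
#include <cstddef>

// Number of complete analysis frames for a signal of n samples,
// window of w samples and hop of h samples (overlap = w - h).
inline std::size_t numFullFrames(std::size_t n, std::size_t w, std::size_t h) {
    if (n < w) return 0;
    return 1 + (n - w) / h;
}
```

With the defaults above (fs = 16000, 40 ms window = 640 samples, 10 ms step = 160 samples), one second of audio yields 97 full frames.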
30. Windowing
%% Windowing
% Section ID = 5
num_points = sampleForWindow;
% some available windows; see: help window
w_gauss = gausswin(num_points);
w_hamming = hamming(num_points);
w_hann = hann(num_points);
plot(1:num_points,[w_gauss,w_hamming, w_hann]); axis([1 num_points 0 2]);
legend('Gaussian','Hamming','Hann');
old_Y = Y;
for i=1:numFrames
Y(:,i)=Y(:,i).*w_hann;
end
%see the difference
index_to_plot = 88;
figure
plot (old_Y(:,index_to_plot))
hold on
plot (Y(:,index_to_plot), 'green')
hold off
clear num_points w_gauss w_hamming w_hann
(with n measured from the window centre, −(N−1)/2 ≤ n ≤ (N−1)/2)

w_GAUSS(n)   = exp( −(1/2) · ( n / (σ(N−1)/2) )² ),  σ ≤ 0.5
w_HAMMING(n) = 0.54 + 0.46 · cos(2πn / (N−1))
w_HANN(n)    = 0.5 · (1 + cos(2πn / (N−1)))
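For reference, the Hann curve above can also be generated outside MATLAB (a sketch of ours, using the equivalent 0-based form w(k) = 0.5·(1 − cos(2πk/(N−1))), k = 0..N−1, which is the centred form shifted by (N−1)/2):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Symmetric Hann window of length N (N >= 2): zero at both ends,
// one at the centre for odd N.
std::vector<double> hannWindow(std::size_t N) {
    std::vector<double> w(N);
    const double pi = 3.14159265358979323846;
    for (std::size_t k = 0; k < N; ++k)
        w[k] = 0.5 * (1.0 - std::cos(2.0 * pi * k / (N - 1)));
    return w;
}
```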
31. Energy
%% Energy
% Section ID = 6
% It requires that signal is already framed
% Run Section ID=4
for i=1:numFrames
energy(i)=sum(abs(old_Y(:,i)).^2);
end
figure, plot(energy)
E = Σ_{i=1..N} |x(i)|²
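The same per-frame energy can be sketched outside MATLAB too (the helper name is ours):

```cpp
#include <vector>

// Short-time energy of one frame: E = sum over i of |x(i)|^2.
double frameEnergy(const std::vector<double>& frame) {
    double e = 0.0;
    for (double s : frame) e += s * s;
    return e;
}
```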
32. Fast Fourier Transform (FFT)
%% Fast Fourier Transform (on the whole signal)
% Section ID = 7
NFFT = 2^nextpow2(numberOfSamples); % Next higher power of 2. (in order to optimize FFT computation)
freqSignal = fft(data,NFFT);
f = fs/2*linspace(0,1,NFFT/2+1);
% PLOT
plot(f,abs(freqSignal(1:NFFT/2+1)))
title('Single-Sided Amplitude Spectrum of y(t)')
xlabel('Frequency (Hz)')
ylabel('|Y(f)|')
clear NFFT freqSignal f
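For intuition about what fft() computes, here is a naive O(N²) DFT (a reference sketch of ours — real code should always use an FFT library, which is what the power-of-two NFFT above optimizes for):

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive discrete Fourier transform: X[k] = sum_n x[n] * e^{-j*2*pi*k*n/N}.
std::vector<std::complex<double>> dft(const std::vector<double>& x) {
    const double pi = 3.14159265358979323846;
    const std::size_t N = x.size();
    std::vector<std::complex<double>> X(N);
    for (std::size_t k = 0; k < N; ++k)
        for (std::size_t n = 0; n < N; ++n)
            X[k] += x[n] * std::polar(1.0, -2.0 * pi * k * n / N);
    return X;
}
```

A cosine at bin 3 of a length-16 transform produces magnitude N/2 = 8 at X[3] (and at the mirror bin X[13]), and near zero elsewhere.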
33. Short Term Fourier Transform (STFT)
%% Short Term Fourier Transform
% Section ID = 8
% It requires that signal is already framed. Run Section ID=4
NFFT = 2^nextpow2(sampleForWindow);
STFT = ones(NFFT,numFrames);
for i=1:numFrames
STFT(:,i)=fft(Y(:,i),NFFT);
end
indexToPlot = 80; %frame index to plot
if indexToPlot < numFrames
f = fs/2*linspace(0,1,NFFT/2+1);
plot(f,2*abs(STFT(1:NFFT/2+1,indexToPlot))) % PLOT
title(sprintf('FFT del frame %d', indexToPlot));
xlabel('Frequency (Hz)')
ylabel(sprintf('|STFT_{%d}(f)|',indexToPlot))
else
disp('Unable to create plot');
end
% *********************************************
specgram(data,sampleForWindow,fs) % SPECTROGRAM
title('Spectrogram [dB]')
34. Auto-correlation
%% Auto-correlation per frame
% Section ID = 9
% It requires that signal is already framed
% Run Section ID=4
for i=1:numFrames
autoCorr(:,i)=xcorr(Y(:,i));
end
indexToPlot = 80; %frame index to plot
if indexToPlot < numFrames
% PLOT
plot(autoCorr(sampleForWindow:end,indexToPlot))
else
disp('Unable to create plot');
end
clear indexToPlot
Rx(n) = Σ_{i=1..N} x(i) · x(i+n)
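The non-negative-lag half of what xcorr returns can be sketched directly (a minimal implementation of ours):

```cpp
#include <cstddef>
#include <vector>

// Autocorrelation for lags 0..N-1: r[lag] = sum_i x[i] * x[i+lag].
std::vector<double> autocorr(const std::vector<double>& x) {
    const std::size_t N = x.size();
    std::vector<double> r(N, 0.0);
    for (std::size_t lag = 0; lag < N; ++lag)
        for (std::size_t i = 0; i + lag < N; ++i)
            r[lag] += x[i] * x[i + lag];
    return r;
}
```

By construction r[0] is the frame energy and is the maximum of the sequence — which is why pitch estimation looks for the first large secondary peak.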
35. A system for doing phonetics: Praat
•PRAAT is a comprehensive speech analysis, synthesis, and manipulation package developed by Paul Boersma and David Weenink at the Institute of Phonetic Sciences of the University of Amsterdam, The Netherlands.
38. Other features with Praat
•Intensity
•Mel-Frequency Cepstrum Coefficients (MFCC);
•Linear Predictive Coefficients (LPC);
•Harmonic-to-Noise Ratio (HNR);
•and many others.
39. Scripting in Praat
•Praat can run scripts containing all the different commands available in its environment and perform the operations and functionalities that they represent.
fileName$ = "test.wav"
Read from file... 'fileName$'
name$ = fileName$ - ".wav"
select Sound 'name$'
To Pitch (ac)... 0.0 50.0 15 off 0.1 0.60 0.01 0.35 0.14 500.0
numFrame=Get number of frames
for i to numFrame
time=Get time from frame number... i
value=Get value in frame... i Hertz
if value = undefined
value=0
endif
path$=name$+"_pitch.txt"
fileappend 'path$' 'time' 'value' 'newline$'
endfor
select Pitch 'name$'
Remove
select Sound 'name$'
Remove
Here is an example to perform a pitch listing and save it in a text file.
40. Homework
•Exercise 1) Consider a speech signal containing silence, unvoiced and voiced regions, as shown here, and write a Matlab function (or whatever language you prefer) able to identify these sections.
•Exercise 2) Then, in the voiced regions, identify the fundamental frequency, the so-called pitch.
Please, try this at home!!
(Figure: speech waveform annotated with its voiced, unvoiced and silence regions.)
41. References and further reading
•Signal Processing
▫http://deecom19.poliba.it/dsp/Teoria_dei_Segnali.pdf (Italian)
•WAV
▫https://ccrma.stanford.edu/courses/422/projects/WaveFormat/
▫http://www.onicos.com/staff/iz/formats/wav.html
•MATLAB
▫http://www.mathworks.com/products/signal/
▫http://www.mathworks.com/products/dsp-system/
▫http://homepages.udayton.edu/~hardierc/ece203/sound.htm
▫http://www.utdallas.edu/~assmann/hcs7367/classnotes.html
42. References and further reading
•FFmpeg
▫https://www.ffmpeg.org/
▫https://trac.ffmpeg.org/wiki/CompilationGuide/Ubuntu
•Praat
▫http://www.fon.hum.uva.nl/praat/
▫http://www.fon.hum.uva.nl/david/sspbook/sspbook.pdf
▫http://www.fon.hum.uva.nl/praat/manual/Scripting.html
•Source code
▫https://github.com/angelosalatino/AudioSignalProcessing