This document provides an overview of MPEG-1 audio compression. It describes the key components of the MPEG-1 audio encoder including the polyphase filter bank that transforms audio into frequency subbands, the psychoacoustic model that determines inaudible parts of the signal, and the coding and bit allocation process that assigns bits to subbands. The overview concludes by noting that MPEG-1 audio provides high compression while retaining quality and paved the way for future audio compression standards.
2. Outline
Introduction
Technical Overview
Polyphase Filter Bank
Psychoacoustic Model
Coding and Bit Allocation
Conclusions and Future Work
3. Introduction
What does MPEG-1 Audio provide?
A transparently lossy audio compression system that
exploits the weaknesses of the human ear.
Can compress by a factor of 6 while
retaining sound quality.
One part of a three part standard that includes
audio, video, and audio/video synchronization.
5. MPEG-I Audio Features
PCM sampling rate of 32, 44.1, or 48 kHz
Four channel modes:
Monophonic and Dual-monophonic
Stereo and Joint-stereo
Three modes (layers in MPEG-I speak):
Layer I: Computationally cheapest, bit rates > 128 kbps
Layer II: Bit rate ~ 128 kbps, used in VCD
Layer III: Most complicated encoding/decoding, bit rates ~
64 kbps, originally intended for streaming audio
6. Human Audio System (ear + brain)
Human sensitivity to sound is non-linear
across the audible range (20 Hz – 20 kHz)
The audible range is divided into regions,
called critical bands, within which humans
cannot perceive frequency differences
8. MPEG-I Encoder Architecture
Polyphase Filter Bank: Transforms PCM samples
to frequency domain signals in 32 subbands
Psychoacoustic Model: Calculates acoustically
irrelevant parts of signal
Bit Allocator: Allots bits to subbands according to
input from psychoacoustic calculation.
Frame Creation: Generates an MPEG-I compliant
bit stream.
10. Polyphase Filter Bank
Divides the audio signal into 32 equal-width
subband streams in the frequency domain.
The inverse filter at the decoder cannot recover
the signal without some, albeit inaudible, loss.
Based on work by Rothweiler [2].
The standard specifies a 512-coefficient analysis
window, C[n]
11. Polyphase Filter Bank
Buffer of 512 PCM samples with 32 new
samples, X[n], shifted in every computation cycle
Calculate window samples for i = 0…511:
Z[i] = C[i] · X[i]
Partial calculation for i = 0…63:
Y[i] = Σ_{j=0}^{7} Z[i + 64j]
Calculate 32 subband samples for i = 0…31:
S[i] = Σ_{k=0}^{63} M[i][k] · Y[k]
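These three steps can be sketched in NumPy. The analysis window C[n] below is a random placeholder (the real coefficients come from the MPEG-1 standard's tables), so the demo only illustrates the data flow, not a compliant filter bank:

```python
import numpy as np

def polyphase_analysis(x_buffer, C):
    """One analysis cycle: 512-sample buffer -> 32 subband samples.

    x_buffer: buffer X[0..511] of PCM samples (32 new samples are
              shifted in before each call)
    C: the 512-coefficient analysis window (placeholder in the demo;
       the standard specifies the actual table).
    """
    # Step 1: window the buffered samples, Z[i] = C[i] * X[i]
    Z = C * x_buffer
    # Step 2: partial sums, Y[i] = sum_{j=0..7} Z[i + 64j], i = 0..63
    Y = Z.reshape(8, 64).sum(axis=0)
    # Step 3: matrixing with M[i][k] = cos((2i+1)(k-16)pi/64),
    # S[i] = sum_{k=0..63} M[i][k] * Y[k], i = 0..31
    i = np.arange(32)[:, None]
    k = np.arange(64)[None, :]
    M = np.cos((2 * i + 1) * (k - 16) * np.pi / 64)
    return M @ Y

# Demo with a placeholder window (NOT the standard's C[n] table)
rng = np.random.default_rng(0)
C = rng.standard_normal(512) * 0.01
x = rng.standard_normal(512)
S = polyphase_analysis(x, C)
print(S.shape)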
13. Polyphase Filter Bank
The net effect:
S[i] = Σ_{k=0}^{63} Σ_{j=0}^{7} M[i][k] · C[k + 64j] · X[k + 64j]
Analysis matrix:
M[i][k] = cos( (2i + 1)(k − 16)π / 64 )
Requires 512 + 32×64 = 2560 multiplies.
Each subband has bandwidth π/32T centered at
odd multiples of π/64T
14. Polyphase Filter Bank
Shortcomings:
Equal-width filters do not correspond to the
critical-band model of the auditory system.
Filter bank and its inverse are NOT lossless.
Frequency overlap between subbands.
18. The Weakness of the Human Ear
Frequency dependent resolution:
We do not have the ability to discern minute
differences in frequency within the critical bands.
Auditory masking:
When two signals of very close frequency are
both present, the louder will mask the softer.
A masked signal must be louder than some
threshold to be heard; this gives us room to
introduce inaudible quantization noise.
19. MPEG-I Psychoacoustic Models
MPEG-I standard defines two models:
Psychoacoustic Model 1:
Less computationally expensive
Makes some serious compromises in what it
assumes a listener cannot hear
Psychoacoustic Model 2:
Provides more features suited for Layer III
coding, assuming of course, increased processor
bandwidth.
20. Psychoacoustic Model
Convert samples to frequency domain
Use a Hann weighting and then a DFT
This simply gives a frequency-domain representation
free of edge artifacts (from the finite window size).
Model 1 uses 512 (Layer I) or 1024 (Layers II
and III) sample window.
Model 2 uses a 1024 sample window and two
calculations per frame.
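A minimal NumPy sketch of this windowing step, assuming the 512-sample Model 1 window (the demo tone and 48 kHz sampling rate are made up for illustration):

```python
import numpy as np

def windowed_spectrum(samples):
    """Apply a Hann window, then a DFT, returning the magnitude
    spectrum. The window suppresses the edge artifacts a finite
    analysis window would otherwise introduce."""
    n = len(samples)  # 512 for Model 1 / Layer I, 1024 otherwise
    hann = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)
    return np.abs(np.fft.rfft(samples * hann))

# Demo: a 1 kHz tone sampled at 48 kHz yields one spectral peak
fs, n = 48000, 512
t = np.arange(n) / fs
spectrum = windowed_spectrum(np.sin(2 * np.pi * 1000 * t))
peak_bin = int(np.argmax(spectrum))
print(peak_bin * fs / n)  # the bin nearest 1000 Hz
```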
21. Psychoacoustic Model
Need to separate sound into “tones” and “noise”
components
Model 1:
Local peaks are tones, lump remaining spectrum per
critical band into noise at a representative frequency.
Model 2:
Calculate a “tonality” index to determine the likelihood
that each spectral point is a tone,
based on the previous two analysis windows
22. Psychoacoustic Model
“Smear” each signal within its critical band
Use either a masking (Model 1) or a spreading
function (Model 2).
Adjust calculated threshold by incorporating
a “quiet” mask – masking threshold for
each frequency when no other frequencies
are present.
23. Psychoacoustic Model
Calculate a masking threshold for each subband in the
polyphase filter bank
Model 1:
Selects minima of masking threshold values in range of each
subband
Inaccurate at higher frequencies – recall that subbands are
linearly distributed; critical bands are NOT!
Model 2:
If subband wider than critical band:
Use minimal masking threshold in subband
If critical band wider than subband:
Use average masking threshold in subband
24. Psychoacoustic Model
The hard work is done – now, we just
calculate the signal-to-mask ratio (SMR)
per subband
SMR = signal energy / masking threshold
We pass our result on to the coding unit
which can now produce a compressed
bitstream
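In decibels the ratio becomes a subtraction, which is what the bit allocator exploits later. A minimal sketch, with made-up energy and threshold values:

```python
import math

def smr_db(signal_energy, masking_threshold):
    """Signal-to-mask ratio for one subband, expressed in dB:
    SMR = 10 * log10(signal energy / masking threshold)."""
    return 10 * math.log10(signal_energy / masking_threshold)

# Illustrative values only: a subband whose energy is 100x its
# masking threshold has 20 dB of signal above the mask.
print(smr_db(1.0, 0.01))
```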
33. Layer I Coding
Group 12 samples from each subband and
encode them in each frame (32 × 12 = 384 samples)
Each group is encoded with 0–15 bits/sample
Each group has a 6-bit scale factor
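One way to picture the per-group scaling and quantization (the peak-magnitude scale factor here is a stand-in for the standard's 6-bit scale-factor table, and the uniform quantizer is a simplification):

```python
import numpy as np

def quantize_group(samples, bits):
    """Quantize one 12-sample subband group.

    A single scale factor (here simply the group's peak magnitude,
    standing in for the standard's scale-factor table lookup)
    normalizes the group; each sample is then quantized uniformly
    with the allotted bits (0 bits means the subband is dropped).
    """
    if bits == 0:
        return 1.0, np.zeros(len(samples), dtype=int)
    scale = np.max(np.abs(samples)) or 1.0
    levels = 2 ** bits - 1              # e.g. 15 levels for 4 bits
    q = np.round((samples / scale) * (levels // 2)).astype(int)
    return scale, q

group = np.array([0.5, -0.25, 0.1] * 4)  # one 12-sample group
scale, q = quantize_group(group, 4)
print(scale, q[:3])
```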
34. Layer II Coding
Similar to Layer I except:
Groups are now 3 sets of 12 samples per subband =
1152 samples per frame
Can have up to 3 scale factors per subband to
avoid audible distortion in special cases
Called scale factor selection information (SCFSI)
35. Layer III Coding
Further subdivides subbands using Modified
Discrete Cosine Transform (MDCT) – a lossless
transform
Larger frequency resolution => smaller time
resolution, which introduces the
possibility of pre-echo
Layer III encoder can detect and reduce pre-echo
by “borrowing bits” from future encodings
36. Bit Allocation
Determine number of bits to allot for each
subband given SMR from psychoacoustic model.
Layers I and II:
Calculate mask-to-noise ratio:
MNR = SNR – SMR (in dB)
SNR given by MPEG-I standard (as function of quantization
levels)
Now iterate until no bits are left to allocate:
Allocate bits to subband with lowest MNR.
Re-calculate MNR for subband allocated more bits.
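The loop above can be sketched as a greedy allocator. The 6 dB-per-bit SNR figure is a rough stand-in for the SNR table the standard actually specifies, and the SMR values in the demo are made up:

```python
def allocate_bits(smr_db, total_bits, max_bits=15, snr_per_bit=6.0):
    """Greedy Layer I/II-style bit allocation.

    smr_db: signal-to-mask ratio per subband, in dB
    snr_per_bit: stand-in for the standard's SNR-vs-quantization
                 table (roughly 6 dB of SNR per quantizer bit).
    Repeatedly gives one bit to the subband with the lowest
    mask-to-noise ratio (MNR = SNR - SMR) until bits run out.
    """
    n = len(smr_db)
    bits = [0] * n
    while total_bits > 0:
        mnr = [bits[i] * snr_per_bit - smr_db[i] for i in range(n)]
        # Pick the neediest subband that can still take more bits
        candidates = [i for i in range(n) if bits[i] < max_bits]
        if not candidates:
            break
        worst = min(candidates, key=lambda i: mnr[i])
        bits[worst] += 1
        total_bits -= 1
    return bits

# Illustrative SMRs: the 30 dB subband should get the most bits
print(allocate_bits([30.0, 12.0, 3.0], total_bits=10))
```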
37. Bit Allocation
Layer III:
Employs “noise allocation”
Quantizes each spectral value and employs
Huffman coding
If Huffman encoding results in noise in excess of
allowed distortion for a subband, encoder
increases resolution on that subband
Whole process repeats until one of three
specified stop conditions is met.
39. Conclusions
MPEG-I provides tremendous compression
for relatively cheap computation.
Not suitable for archival or audiophile-grade
music, as very seasoned listeners can
discern distortion.
Modifying or searching MPEG-I content
requires decompression and is not cheap!
40. Future Work
MPEG-1 audio lays the foundation for all modern
audio compression techniques
Lots of progress since then (1994!)
MPEG-2 (1996) extends MPEG audio
compression to support 5.1 channel audio
MPEG-4 (1998) attempts to code based on
perceived audio objects in the stream
Finally, MPEG-7 (2001) operates at an even
higher level of abstraction, focusing on meta-data
coding to make content searchable and
retrievable
41. References
[1] D. Pan, “A Tutorial on MPEG/Audio Compression”, IEEE Multimedia Journal, 1995.
[2] J. H. Rothweiler, “Polyphase Quadrature Filters – A New Subband Coding Technique”, Proc. Int. Conf. IEEE ASSP, 27.2, pp. 1280–1283, Boston, 1983.