Multimedia Services: Audio

989 views

Distinct concepts seen in Communication Engineering / Multimedia services classes

Published in: Technology

  1. Multimedia Services: Audio. Sep-2015. Dani Gutiérrez Porset, Associate Professor, Communications Engineering. Eman ta zabal zazu
  2. Thanks, Licences and Tools
     ● Thanks to the people and organizations who took or take part in free software and free knowledge projects, especially the Wikimedia Foundation and KDE
     ● This presentation is licensed under CC BY-SA 3.0 ES http://creativecommons.org/licenses/by-sa/3.0/es/
     ● This presentation was made with KDE, LibreOffice, Inkscape, Gimp, Chromium and Firefox
  3. Sources and References
     ● Images are from the Wikimedia Foundation unless another source is referenced. Logos and trademarks belong to their respective organizations
     ● Texts:
       – Wikipedia pages and the articles and material they reference
       – “Guide to Voice and Video over IP” - Sun, Mkwawa, Jammeh, Ifeachor
       – “Video over IP” - Wes Simpson
       – “Computer Networking, a top-down approach” - Kurose, Ross
  4. Index
     ● Introduction
     ● Codecs
     ● Speech
     ● Files, Containers and Formats
     ● Audio wires and connectors
  5. Human ear (section: Introduction)
     ● Time domain:
       – Cannot hear very short signals (< 1 ms)
       – Loud signals mask quieter signals that are close in time
     ● Frequency domain:
       – Limited range of audible frequencies
       – Loud signals at one pitch mask quieter signals at a nearby pitch
  6. Audio Applications
     ● Speech, VoIP
     ● CD, DVD sound
     ● Digital Audio/Video Broadcasting (DAB, DVB)
     ● Internet streaming
     ● Studio/transmitter link
     ● Theatrical movie presentation
     ● MIDI (analogous to vector images): technical standard for musical instruments that describes the pitch, duration, velocity, ... of notes. An output hardware device or a software synthesizer produces the actual audio
  7. Audio analog signals
     ● Waveform: time-domain view
     ● Spectrum: frequency-domain view
     [Figure: Audacity screenshots of a waveform and a spectrum. Credit: Dani Gutiérrez]
  8. Modulation families
     ● Analog bandpass channel:
       – Analog baseband signal: analog modulation, e.g. AM, FM
       – Digital signal: digital modulation, e.g. PSK, FSK, ASK, QAM
     ● Analog baseband channel:
       – Analog baseband signal: pulse modulation, analog over analog, e.g. PAM, PWM
       – Digital signal: digital baseband modulation, e.g. Unipolar, NRZ, Manchester
     ● Digital channel:
       – Analog baseband signal: pulse modulation, analog over digital, e.g. PCM
  9. Analog-over-digital modulations = Digitization
     ● Pulse-code modulation (PCM)
       – Differential PCM (DPCM)
       – Adaptive DPCM (ADPCM)
     ● Delta modulation (DM or Δ-modulation)
     ● Adaptive delta modulation (ADM) or Continuously variable slope delta modulation (CVSDM)
     ● Delta-sigma modulation (ΔΣ)
     ● Pulse-density modulation (PDM), e.g. used in Super Audio CD (“DSD” trademark from Sony and Philips)
  10. Pulse Code Modulation
     1. Sampling (rate >= 2 x bandwidth of the analog signal). Errors depend on the frequency cut-off and on clock accuracy
     2. Quantization: uniform (LPCM = Linear PCM) or non-uniform (PCMA = A-law, PCMU = μ-law). Errors: granularity
     3. Coding: number of bits per sample

     Mode                  Bandwidth (Hz)   Sampling (kHz)
     Narrowband (NB)       300–3400         8
     Wideband (WB)         50–7000          16
     Super-wideband (SWB)  50–14000         32
     Fullband (FB)         20–20000         48
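The three PCM steps can be illustrated with a small sketch. It is only an illustration, not a reference implementation: the function names, the 1 kHz test tone and the choice of 8-bit uniform (LPCM-style) quantization over the range [-1, 1) are assumptions made for the example.

```python
import math

def pcm_encode(signal, fs, num_samples, bits):
    """Sample signal(t) at fs Hz and quantize uniformly to `bits` bits per sample."""
    levels = 2 ** bits
    step = 2.0 / levels                        # quantization step over [-1, 1)
    codes = []
    for k in range(num_samples):
        x = signal(k / fs)                     # 1. sampling
        x = max(-1.0, min(x, 1.0 - step))      # clip to the quantizer range
        codes.append(int(math.floor((x + 1.0) / step)))  # 2. quantization, 3. coding
    return codes

def pcm_decode(codes, bits):
    """Map each code back to the midpoint of its quantization interval."""
    step = 2.0 / (2 ** bits)
    return [(c + 0.5) * step - 1.0 for c in codes]

FS = 8000                                      # narrowband sampling rate from the table
BITS = 8
tone = lambda t: 0.8 * math.sin(2 * math.pi * 1000 * t)   # hypothetical 1 kHz test tone

codes = pcm_encode(tone, FS, 80, BITS)         # 80 samples = 10 ms
decoded = pcm_decode(codes, BITS)
max_error = max(abs(tone(k / FS) - d) for k, d in enumerate(decoded))
print(max_error, "<=", 1.0 / 2 ** BITS)        # granularity error is at most half a step
```

With uniform quantization the reconstruction error of an unclipped sample never exceeds half the step size, which is the "granularity" error named on the slide.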
  11. Audio Codecs (section: Codecs)
     ● Aim of a codec: to convert and to compress, for storage and transmission over distinct media, e.g.
       – AMR-NB: lossy, for speech
       – Dolby Digital: lossy, for cinema and HDTV broadcast
       – Dolby TrueHD: lossless, for home entertainment
     ● Conversion types:
       – Analog to Digital (+ Digital to Analog)
       – Digital to Digital
     ● Bitrate (kbit/s) measured at the codec output
     http://en.wikipedia.org/wiki/Analog-to-digital_converter
     http://en.wikipedia.org/wiki/Audio_coding_format
  12. Classifications of Audio Codecs
     ● Nature of source: speech, music, cellular (2G GSM, 3G AMR)...
     ● Source signal bandwidth (NB, WB, SWB, FB) and sampling rate
     ● Resulting bitrate (most between 4.8 and 16 kbps)
     ● One or more bitrates (adaptive)
     ● Lossless or lossy
     ● Latency or delay (inherent to each algorithm)
     ● Quality
     ● Creator (ITU-T, IETF, ETSI, Skype,...)
     ● Licenses
     ● Costs for encoder and player
     ● Compression techniques and algorithms (they depend on the nature and bandwidth of the source signal): frame based or sample based, delay, CBR or VBR, number of channels,...
     ● Complexity (computation time)
     Categories used on the slide to group these criteria: Source, Processing, Result, Legal & Costs
     http://en.wikipedia.org/wiki/Comparison_of_audio_coding_formats
  13. Audio compression
     ● Based on psychoacoustics:
       – Threshold of hearing (frequencies)
       – Simultaneous masking
     ● Algorithm families used for lossy compression:
       – Time domain: Linear predictive coding (LPC), mainly for speech: CELP, ACELP, VSELP, LPC, RPE-LTP,...
       – Frequency domain:
         ● Modified discrete cosine transform (MDCT), e.g. CELT
         ● Applied to the full band or to sub-bands (SBC): break the signal into frequency bands and encode each one independently, e.g. MP3
       – Some codecs combine both, e.g. G.718 uses CELP and MDCT
  14. Compression ratio
     ● Codec: digital input stream → output stream
     ● f = sampling frequency (kHz), bs = bits per sample, b = output bitrate (kbps)
     ● Compression ratio relative to the input: $\dfrac{f \times bs}{b}$
     ● Compression ratio relative to 64 kbps: $\dfrac{64}{b}$
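As a worked example of the two ratios, the sketch below evaluates them for an 8 kbps codec such as G.729. The function names are made up for the illustration, and the 16-bit linear PCM input at 8 kHz is an assumption; the slide does not fix the input format.

```python
def compression_ratio(f_khz, bits_per_sample, bitrate_kbps):
    """Ratio relative to the codec input: (f x bs) / b."""
    return (f_khz * bits_per_sample) / bitrate_kbps

def compression_ratio_vs_64k(bitrate_kbps):
    """Ratio relative to the 64 kbps reference: 64 / b."""
    return 64 / bitrate_kbps

# Assumed input: 8 kHz sampling, 16 bits/sample (128 kbps); output: 8 kbps
ratio_input = compression_ratio(8, 16, 8)
ratio_64k = compression_ratio_vs_64k(8)
print(ratio_input, ratio_64k)
```

The same codec therefore gets a ratio of 16 against its 128 kbps input but only 8 against the 64 kbps reference, so it matters which definition a comparison table uses.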
  15. Frame based vs Sample based
     ● Sample-based: one sample is processed at a time, e.g. PCM and ADPCM
     ● Frame-based: several samples are taken together, to exploit the correlation between nearby samples. Frame length can be fixed or variable, e.g. G.723.1 and G.729
  16. Audio Codecs and Delays
     ● Some delays are more appropriate than others for a given type of transmission:
       – Low latency: less compression, higher bitrate, e.g. for real time in VoIP or satellite communications
       – High latency: higher compression, lower bitrate, e.g. for stored media, broadcasting or recording
     ● Origin of latencies:
       – Processing, which depends on hardware
       – Inherent to each algorithm or codec (buffering is needed):
         ● Frame size (ms): related to the number of samples inside the frame
         ● Look-ahead time: needed when the codec studies the correlation between the current and the next frame
     ● Delay calculations:
       – In the sender: Algorithm delay = Frame length + Look-ahead time
       – In both ends: Codec delay = 2 x Frame length + Look-ahead time
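The two delay formulas can be checked numerically. The sketch below uses the commonly cited frame and look-ahead values for G.729 (10 ms frame, 5 ms look-ahead) and G.723.1 (30 ms frame, 7.5 ms look-ahead); these figures and the function names are not on the slide and are included here only as assumptions for the example.

```python
def algorithm_delay_ms(frame_ms, lookahead_ms):
    """Sender side: Algorithm delay = Frame length + Look-ahead time."""
    return frame_ms + lookahead_ms

def codec_delay_ms(frame_ms, lookahead_ms):
    """Both ends: Codec delay = 2 x Frame length + Look-ahead time."""
    return 2 * frame_ms + lookahead_ms

# (frame length, look-ahead) in ms; assumed typical values, not taken from the slide
codecs = {"G.729": (10, 5), "G.723.1": (30, 7.5)}

delays = {
    name: (algorithm_delay_ms(f, la), codec_delay_ms(f, la))
    for name, (f, la) in codecs.items()
}
for name, (alg, total) in delays.items():
    print(f"{name}: algorithm delay {alg} ms, codec delay {total} ms")
```

Processing time on real hardware and network delay come on top of these values, so they are a lower bound rather than an end-to-end figure.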
  17. Audio Codecs: CBR vs VBR
     ● CBR: Constant bitrate. Older
     ● VBR: Variable bitrate:
       – Frames of a file have distinct bitrates depending on the variability of the information, higher during more complex periods
       – Better quality for a given size, but more complex to encode
       – Typical in lossless compression (e.g. FLAC, Apple Lossless) and in some lossy compressions (e.g. MP3, Opus, Vorbis, AAC)
       – Encoding in a single pass (“on the fly”) or in multiple passes (not for real time or live streaming)
       – Input parameter: fixed quality, max/min bitrate, average bitrate, or file size
  18. Audio Codecs and Channels
     ● Mix two (stereo) or more channels that carry similar information, reducing size while keeping high quality, instead of storing and sending independent channels
     ● Techniques (used in e.g. MP3, AAC, Vorbis) that may be combined for a signal:
       – Simple Stereo (SS): independent channels. No compression
       – Mid-side Stereo (MS):
         ● Middle = (L+R)/2, Side = (L-R)/2
         ● Helps when the signal is more “mono-like”, because most of the information ends up in the new “Middle” channel
       – Intensity Stereo:
         ● Based on psychoacoustics, replaces both channels with a single signal plus directional information
         ● Better at low bitrates, worse at high bitrates
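The mid-side transform above is easy to sketch. The function names and the sample values are made up for the illustration; real codecs apply the transform per frame or per frequency band, which is not shown here.

```python
def ms_encode(left, right):
    """Mid-side transform: Middle = (L+R)/2, Side = (L-R)/2."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Inverse transform: L = Middle + Side, R = Middle - Side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

# Hypothetical, nearly mono signal: the channels differ only in the last sample
left = [0.5, 0.25, -0.5]
right = [0.5, 0.25, -0.25]

mid, side = ms_encode(left, right)
print(mid)    # carries most of the signal
print(side)   # mostly zeros, so it compresses well
restored = ms_decode(mid, side)
```

For a “mono-like” signal the Side channel is close to zero, which is why coding Middle and Side can cost fewer bits than coding L and R independently.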
  19. Audio Codecs comparisons
     [Figure: codec comparison chart. Source: http://www.opus-codec.org/comparison/]
  20. Example of Audio Codec: MP3
     ● Versions: MPEG-1, MPEG-2, MPEG-2.5 Audio Layer III
     ● The specification defines the decoder more precisely than the encoder, so there are distinct encoder implementations, e.g. LAME
     ● Distinct bitrates and sampling rates depending on the version
     ● Channels: 2 in MPEG-1 mode and up to 5.1 in MPEG-2
     ● Algorithms: MDCT, hybrid subband
     ● Supports CBR and VBR
     ● Licensing and patent disputes
  21. Other examples of typical Audio Codecs
     ● AAC (Advanced Audio Coding), from ISO and IEC. Part of MPEG-2 and MPEG-4. Designed to replace MP3. Patents apply to coding, not to streaming or distributing content
     ● Vorbis, from the Xiph.Org Foundation: typically inside Ogg or WebM containers. Based on MDCT. Open, royalty-free
     ● Opus, from the IETF. Suitable for interactive real-time use. Based on CELT and SILK. Open, royalty-free
     http://en.wikipedia.org/wiki/Category:Audio_codecs
  22. Speech case (section: Speech)
     ● Different from music
     ● Interactive
     ● Voiced speech: harmonics (at frequencies that depend on whether the speaker is male or female)
     ● Unvoiced signal: similar to white noise
  23. Speech Codecs
     ● Aim: intelligibility and speaker identification
     ● Specialized codecs, e.g.:
       – Better for music: Vorbis
       – Better for speech: GSM, Speex,...
     ● Distinct times:
       – Speech frame: time to encode a frame of speech
       – RTP packet voice duration: time to packetize and send to the network, e.g. for PCM: 20 ms
     ● Sometimes a VoIP tool provides several codecs, which can be selected manually or automatically and can be changed during the conversation
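To make the 20 ms packet duration concrete, the sketch below computes the voice payload carried by one packet and the number of packets per second for 64 kbps PCM. The function names are invented for the example, and RTP/UDP/IP header overhead is deliberately left out.

```python
def payload_bytes(bitrate_kbps, packet_ms):
    """Voice payload per packet: kbit/s x ms gives bits, divided by 8 gives bytes."""
    return bitrate_kbps * packet_ms / 8

def packets_per_second(packet_ms):
    """Number of packets sent per second for a given packet duration."""
    return 1000 / packet_ms

pcm_payload = payload_bytes(64, 20)    # 64 kbps PCM, 20 ms of voice per packet
pcm_rate = packets_per_second(20)
print(pcm_payload, "bytes per packet,", pcm_rate, "packets per second")
```

A longer packet duration lowers the packet rate, and with it the relative header overhead, but adds packetization delay.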
  24. Speech Codecs: Techniques and Codec Comparison
     ● Compression: remove short-term correlation (~ 1 ms) and long-term correlation (~ 5 to 10 ms)
     ● Techniques: Waveform, Parametric (vocoders for speech), Hybrid
     [Figure: comparison of speech codec classes. Source: http://www-mobile.ecs.soton.ac.uk/speech_codecs/common_classes.html]
  25. G.711 Codec
     ● Reference codec for comparison
     ● G.711 = “PCM of voice frequencies”: 8k samples/sec x 8 bits/sample = 64 kbps
     ● Voice quantisation: non-uniform logarithmic quantisation, because of the nature of voice: low-level speech samples have a higher PDF (Probability Density Function) value than high-level ones
     ● Variations:
       – µ-law (North America, Japan): 14 bits to 8 bits
       – A-law (Europe): 13 bits to 8 bits
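The effect of logarithmic quantisation can be seen with the continuous µ-law companding curve, sketched below with µ = 255. This is an illustration under assumptions: it uses the textbook formula on samples normalised to [-1, 1], not the bit-exact segmented encoder defined in G.711, and the function names are made up.

```python
import math

MU = 255.0   # value used by mu-law in North America and Japan

def mu_law_compress(x):
    """Continuous mu-law curve: sgn(x) * ln(1 + mu*|x|) / ln(1 + mu), for x in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Inverse curve: sgn(y) * ((1 + mu)^|y| - 1) / mu."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

bitrate_bps = 8000 * 8                 # 8k samples/s x 8 bits/sample
quiet = mu_law_compress(0.01)          # a low-level sample
loud = mu_law_compress(0.5)            # a high-level sample
print(bitrate_bps, round(quiet, 3), round(loud, 3))
```

A sample at 1% of full scale is mapped to more than 20% of the compressed range, so the frequent low-level speech samples receive many more quantisation levels than they would with uniform steps.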
  26. Speech compression: Waveform based technique
     ● Method: remove redundancy in the waveform and reconstruct it
     ● Complexity: low
     ● Results: 16 kbps to 64 kbps
     ● Examples of codecs:
       – PCM
       – ADPCM (Adaptive Differential PCM), for NB and WB
  27. Speech compression: Parametric based technique
     ● Method:
       – Take short segments (~20 ms) and classify them as voiced or unvoiced
       – The voice parameters of each segment are obtained via speech analysis, encoded and sent
     ● Complexity: high
     ● Results: higher compression ratios, but poor quality
     ● Examples of codecs:
       – LPC (Linear Prediction Coding): 1.2 to 4.8 kbps. Used for secure wireless communications
  28. Speech compression: Hybrid based technique
     ● “Analysis-by-Synthesis coding”
     ● Method: combines waveform and parametric techniques
     ● Examples of codecs:
       – CELP (Code-Excited Linear Prediction): 4.8 to 16 kbps. Used in mobile/wireless/satellite communications, achieving toll quality (MOS over 4.0)
       – Other modern codecs: G.729, G.723.1, AMR, iLBC, SILK
  29. Speech codec examples
     [Figure: table of speech codecs. Source: Cisco, “Voice Over IP - Per Call Bandwidth Consumption”]
     Other important speech codecs:
     ● SILK, from Skype. Based on LPC. Not royalty-free
     ● iSAC (internet Speech Audio Codec): wideband and super-wideband, open, royalty-free
     ● iLBC (internet Low Bitrate Codec): narrowband, open, royalty-free for WebRTC
     ● AMR (Adaptive Multi-Rate) or AMR-NB (Narrowband)
     ● AMR-WB (Wideband)
     ● Speex
     http://en.wikipedia.org/wiki/Category:Speech_codecs
  30. Audio files (section: Files, Containers and Formats)
     ● Number of channels: one (“mono”), two (“stereo”) or multichannel
     ● Compression and codecs:
       – No compression: raw PCM,...
       – Lossless: FLAC, Apple Lossless (.m4a), WMA Lossless,...
       – Lossy: MP3, Vorbis, AAC,...
     http://en.wikipedia.org/wiki/Audio_file_format
  31. Examples of Audio containers
     ● WAV:
       – An instance of the more general RIFF container format
       – Chunks:
         ● One or more chunks, e.g. 2 channels for stereo
         ● Can contain compressed audio data and non-audio data
       – Metadata for each chunk:
         ● Encoding (typically uncompressed LPCM), number of channels, bits/channel, sample rate
         ● Labels: artist, comments,...
     ● Containers that come from video: Ogg, MPEG-4 Part 14 (MP4)
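The WAV metadata listed above can be inspected with Python's standard `wave` module. The sketch below writes a short mono, 16-bit LPCM tone into an in-memory buffer and reads the header fields back; the 440 Hz tone, the 8 kHz rate and the variable names are arbitrary choices for the example.

```python
import io
import math
import struct
import wave

RATE = 8000          # sample rate in Hz (arbitrary choice)
NUM_FRAMES = 800     # 0.1 s of audio
samples = [int(0.5 * 32767 * math.sin(2 * math.pi * 440 * n / RATE))
           for n in range(NUM_FRAMES)]

buf = io.BytesIO()
with wave.open(buf, "wb") as writer:
    writer.setnchannels(1)       # mono
    writer.setsampwidth(2)       # 2 bytes = 16 bits per sample
    writer.setframerate(RATE)
    writer.writeframes(struct.pack("<%dh" % NUM_FRAMES, *samples))

raw = buf.getvalue()
print(raw[:4], raw[8:12])        # RIFF container tag and WAVE form type

buf.seek(0)
with wave.open(buf, "rb") as reader:
    params = (reader.getnchannels(), reader.getsampwidth(),
              reader.getframerate(), reader.getnframes())
print(params)
```

The first bytes show that the file is a RIFF container of type WAVE, and the values read back are exactly the encoding metadata (channels, bits per sample, sample rate) described on the slide.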
  32. Audio physical formats
     ● CD:
       – Reference: “Red Book”
       – Digital audio encoding:
         ● 2-channel
         ● Signed 16-bit
         ● Linear PCM
         ● 44,100 Hz
       – Similar to WAV but distinct: no headers, and tracks that match the CD's sector sizes
     ● Other media and their associated modulations and lossless codecs:
       – Super Audio CD (SACD): Pulse-density modulation + Direct Stream Transfer
       – DVD-Audio, Blu-ray, (HD DVD): Meridian Lossless Packing
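The CD audio parameters translate directly into a data rate, as the short calculation below shows. The 2352-byte audio sector size is not on the slide; it is the Red Book value, added here as an assumption to illustrate how tracks line up with sectors.

```python
CHANNELS = 2
BITS_PER_SAMPLE = 16
SAMPLE_RATE_HZ = 44100

bitrate_bps = CHANNELS * BITS_PER_SAMPLE * SAMPLE_RATE_HZ
bytes_per_second = bitrate_bps // 8

SECTOR_BYTES = 2352    # audio bytes per CD sector (assumed Red Book value)
sectors_per_second = bytes_per_second / SECTOR_BYTES

print(bitrate_bps / 1000, "kbps")
print(bytes_per_second, "bytes/s,", sectors_per_second, "sectors/s")
```

One second of audio fills a whole number of sectors, which is why CD track lengths are naturally expressed in sectors (frames) rather than in bytes.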
  33. Audio hardware: wires and connectors (section: Wires and Connectors)
     ● Only for audio:
       – Analog: PC audio jacks (colour-coded scheme)
       – Digital: S/PDIF. Supports uncompressed PCM audio and compressed 5.1/7.1 surround sound
     ● Audio with video:
       – Analog: SCART
       – Digital: HDMI
