The document provides an introduction to MPEG-7, which is a standard for describing multimedia content. It discusses the background and need for MPEG-7, as well as the main components of MPEG-7, including the Description Definition Language (DDL) for defining descriptions, Multimedia Description Schemes (MDS) for organizing descriptors, and various audio and video descriptors. Application areas of MPEG-7 include searching, indexing, and retrieving multimedia content across different domains.
MPEG-7 is a standard for describing multimedia content to allow users to more efficiently search, browse and retrieve audiovisual material. It was developed by the Moving Picture Experts Group in 2001. MPEG-7 defines descriptors and description schemes for features of multimedia using XML schema. It also includes tools for generating descriptions, and is used in applications like digital libraries, multimedia directories, broadcast media selection and e-business product searching.
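MPEG-7 descriptions are expressed as XML. The following sketch builds a small MPEG-7-style description with Python's standard library; the element and attribute names here are simplified illustrations, not the exact MPEG-7 schema.

```python
# Sketch: a minimal MPEG-7-style content description built with the
# standard library. Element names are illustrative, not the real schema.
import xml.etree.ElementTree as ET

root = ET.Element("Mpeg7")
desc = ET.SubElement(root, "Description")
video = ET.SubElement(desc, "MultimediaContent", {"type": "Video"})
title = ET.SubElement(video, "Title")
title.text = "Holiday clip"
color = ET.SubElement(video, "DominantColor")  # a visual descriptor
color.text = "128 64 32"

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

A search engine could then index these descriptor values the same way it indexes text.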
This presentation discusses the basics of video compression, such as the DCT, color space conversion, and motion compensation. It also covers standards such as H.264, MPEG-2, and MPEG-4.
MPEG is a video compression standard developed in the late 1980s to enable full-motion video over networks and storage media. It was created by the Moving Picture Experts Group to address the need for high compression ratios to transmit video given the bandwidth limitations of the time. MPEG uses spatial and temporal redundancy reduction techniques like the discrete cosine transform, quantization, and entropy coding to compress video frames and take advantage of similarities between neighboring pixels and successive frames. It defines a group-of-pictures structure and different frame types like I, P, and B frames to enable features like random access while maintaining synchronization and error robustness. MPEG became widely adopted and evolved through standards like MPEG-1, MPEG-2, and MPEG-4.
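The group-of-pictures structure can be sketched as a repeating pattern of frame types. The parameters below (GOP length 12, anchor spacing 3) are a common but not mandated layout, used here purely for illustration.

```python
# Sketch of an MPEG-style group-of-pictures (GOP) layout.
# n = GOP length, m = spacing between anchor (I/P) frames; these are
# illustrative values, not fixed by the standard.
def gop_pattern(n=12, m=3):
    """Return the display-order frame types for one GOP."""
    frames = []
    for i in range(n):
        if i == 0:
            frames.append("I")  # intra-coded: decodable on its own (random access)
        elif i % m == 0:
            frames.append("P")  # predicted from the previous I or P frame
        else:
            frames.append("B")  # bidirectionally predicted from both neighbors
    return "".join(frames)

print(gop_pattern())  # IBBPBBPBBPBB
```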
Audio compression can be either lossless, which reduces file size while retaining all audio information, or lossy, which greatly reduces file size but decreases sound quality by losing some audio information. Common lossless formats are AIFF, WAV, and FLAC, while common lossy formats are MP3, AAC, and Vorbis. The quality and size of compressed audio files depends on factors like sample rate, bit depth, bit rate, and number of channels. Higher values for these factors generally mean higher quality audio but larger file sizes.
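The relationship between those factors and file size is simple arithmetic for uncompressed PCM audio, as this back-of-the-envelope sketch shows:

```python
# Uncompressed PCM audio size:
# bytes = sample_rate * (bit_depth / 8) * channels * seconds
def pcm_size_bytes(sample_rate, bit_depth, channels, seconds):
    return sample_rate * (bit_depth // 8) * channels * seconds

# One minute of CD-quality audio: 44.1 kHz, 16-bit, stereo
size = pcm_size_bytes(44_100, 16, 2, 60)
print(size, "bytes")  # 10584000 bytes, roughly 10 MB per minute
```

Halving any one factor (e.g. dropping to mono) halves the file size, which is why lossy codecs that discard inaudible detail can shrink files so dramatically by comparison.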
Computer Science (A Level) discusses data compression techniques. Compression reduces the number of bits required to represent data to save disk space and increase transfer speeds. There are two main types of compression: lossy compression, which permanently removes non-essential data and can reduce quality, and lossless compression, which identifies patterns to compress data without any loss. Common lossy techniques are JPEG, MPEG, and MP3, while common lossless techniques are run length encoding and dictionary encoding.
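Run-length encoding, the simplest of the lossless techniques mentioned above, can be sketched in a few lines: each run of repeated characters becomes a (character, count) pair.

```python
# Minimal run-length encoding sketch: lossless, since decoding
# reconstructs the input exactly.
from itertools import groupby

def rle_encode(text):
    return [(ch, len(list(run))) for ch, run in groupby(text)]

def rle_decode(pairs):
    return "".join(ch * count for ch, count in pairs)

encoded = rle_encode("AAAABBBCCD")
print(encoded)              # [('A', 4), ('B', 3), ('C', 2), ('D', 1)]
print(rle_decode(encoded))  # AAAABBBCCD
```

RLE only pays off on data with long runs (e.g. simple bitmap images); on typical text it can even expand the data.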
This document discusses audio compression techniques. It begins by defining audio and compression. There are two main types of audio compression: lossy and lossless. Lossy compression reduces file sizes but results in some quality loss, while lossless compression decompresses the file back to its original quality. Common lossy audio compression methods are discussed, including those based on psychoacoustics, i.e. how humans perceive sound. MPEG layers are then introduced as a standard for audio compression, with Layer I offering the highest quality but also the highest bitrate, and Layer III providing greater compression while retaining high quality at lower bitrates such as 64 kbps. Effectiveness is shown to increase with each newer layer.
The document discusses several standard and proprietary streaming media protocols. It introduces Real-Time Transport Protocol (RTP) and Real-Time Control Protocol (RTCP) which transport streaming media and provide quality of service reports. It also describes Real Time Streaming Protocol (RTSP) which provides playback controls. Synchronized Multimedia Integration Language (SMIL) is mentioned as an XML language for multimedia content. Major companies like Real, Microsoft, and Apple are noted to use similar but proprietary protocols instead of the standards.
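RTP frames every media packet with a small fixed header. The sketch below parses the 12-byte fixed header defined in RFC 3550 using only the standard library; the example packet is hand-crafted for illustration.

```python
# Sketch: parsing the 12-byte fixed RTP header (RFC 3550).
# Bit layout: V(2) P(1) X(1) CC(4) | M(1) PT(7) | sequence(16)
#             | timestamp(32) | SSRC(32)
import struct

def parse_rtp_header(packet: bytes):
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,            # always 2 for current RTP
        "marker": (b1 >> 7) & 1,
        "payload_type": b1 & 0x7F,     # identifies the codec in use
        "sequence": seq,               # detects loss and reordering
        "timestamp": ts,               # drives playout timing
        "ssrc": ssrc,                  # identifies the media source
    }

# Hand-crafted example: version 2, payload type 96, sequence 1, SSRC 0x1234
pkt = struct.pack("!BBHII", 0x80, 96, 1, 0, 0x1234)
header = parse_rtp_header(pkt)
print(header)
```

RTCP receiver reports then feed statistics derived from these sequence numbers and timestamps (loss rate, jitter) back to the sender.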
An audio file format is a file format for storing digital audio data on a computer system. There are three main types of audio file formats: uncompressed, lossless compression, and lossy compression (like MP3 and AAC). Examples of common file extensions include .wav, .mp3, .m4a, and .ra.
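The uncompressed .wav format can be produced directly with Python's standard `wave` module. This sketch writes one second of a 440 Hz sine tone as 16-bit mono PCM; the sample rate and filename are arbitrary choices for the example.

```python
# Sketch: writing one second of uncompressed 16-bit mono PCM audio
# (a 440 Hz sine tone) to a .wav file with the standard wave module.
import math
import struct
import wave

SAMPLE_RATE = 8000  # samples per second, chosen for a small example file
samples = [
    int(32767 * math.sin(2 * math.pi * 440 * t / SAMPLE_RATE))
    for t in range(SAMPLE_RATE)
]

with wave.open("tone.wav", "wb") as wav:
    wav.setnchannels(1)    # mono
    wav.setsampwidth(2)    # 16-bit samples
    wav.setframerate(SAMPLE_RATE)
    wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
```

The resulting file is raw PCM plus a small header, which is why uncompressed formats grow linearly with duration while lossy formats like MP3 do not.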
This document provides an overview of voice recognition biometrics. It discusses the history and development of voice recognition technology from early systems in the 1920s through current applications. The document explains how voice recognition works, capturing a voice sample, creating a voiceprint, and verifying a voice during the authentication process. It highlights benefits of voice recognition systems for security and cost savings but also challenges, such as variations in human voices and environmental noises. Current applications discussed include building access security, corrections monitoring, and telephone banking/ATM verification. The document concludes that voice recognition provides strong security when combined with other authentication methods and will likely continue growing as a biometric technology.
This white paper discusses various video compression techniques and standards. It explains that JPEG is used for still images while MPEG is used for video. The two main early standards were JPEG and MPEG-1. Later standards like MPEG-2, MPEG-4, and H.264 provided improved compression ratios and capabilities. Key techniques discussed include lossy compression, comparing adjacent frames to reduce redundant data, and balancing compression ratio with image quality and latency considerations for different applications like surveillance video.
This document provides an overview of optical storage media technologies including CD, CD-ROM, CD-R, DVD, and HD-DVD. It describes the key characteristics of each format such as storage capacity, data encoding, error correction techniques, and compatibility. The core technologies that enable higher storage densities are reduced pit/land sizes, increased track densities, more efficient coding and error correction, and additional data layers. HD-DVD builds on DVD to provide high definition video storage on discs with the same physical dimensions as DVD.
GREEND: An energy consumption dataset of households in Austria and Italy (Andrea Monacchi)
Home energy management systems can be used to monitor and optimize consumption and local production from renewable energy. To assess solutions before their deployment, researchers and designers of such systems need energy consumption datasets. In this paper, we present the GREEND dataset, containing detailed power usage information obtained through a measurement campaign in households in Austria and Italy. We provide a description of consumption scenarios and discuss design choices for the sensing infrastructure. Finally, we benchmark the dataset with state-of-the-art techniques in load disaggregation, occupancy detection, and appliance usage mining.
Steganography and Its Applications in Security (IJMER)
ABSTRACT: Steganography is the dark cousin of cryptography, the use of codes. While cryptography provides privacy, steganography is intended to provide secrecy. Steganography is a method of covert communication: it hides a message in an appropriate carrier, for example an image or an audio file. The carrier can then be sent to a receiver without anyone else knowing that it contains a hidden message. This process can be used, for example, by civil rights organizations in repressive states to communicate with the outside world without their own government being aware of it. In this article we elucidate the different approaches to implementing steganography using multimedia files (text, static images, audio, and video). Steganalysis is a newly emerging branch of data processing that seeks to identify steganographic covers and, if possible, extract the hidden message; it is analogous to cryptanalysis in cryptography. Steganography is an ancient technique that has recently gained renewed attention as it has entered the world of digital communication security. The objective is not only to prevent the message from being read but also to hide its very existence.
Keywords: Carrier, Privacy, Secrecy, Steganalysis, Steganography
This document compares various data compression techniques and clearly differentiates between them. It is precise and focused on the techniques themselves rather than on the topic in general.
Digital images are represented by a matrix of numeric values where each value corresponds to the intensity of a pixel at a specific location. Images can be binary, representing black and white, or they can have multiple intensity levels represented by integers to capture shades of gray. Standard image file formats specify the spatial resolution in pixels and color encoding using a certain number of bits per pixel. When stored, an image is saved as a two-dimensional array of values, each representing intensity data for a pixel. Bitmap images use a one-dimensional matrix for monochrome and greater bit depth for more colors. Popular graphics software programs allow for image editing, painting and drawing.
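The matrix view of an image can be made concrete with a small sketch: a grayscale image is a 2-D array of intensities, and thresholding it produces a binary black-and-white image. The pixel values below are invented for illustration.

```python
# Sketch: a grayscale image as a 2-D list of intensities (0-255),
# thresholded into a binary (black/white) image.
image = [
    [ 12, 200, 199],
    [ 30, 180,  20],
    [250,  40,  60],
]

THRESHOLD = 128
binary = [[1 if px >= THRESHOLD else 0 for px in row] for row in image]
print(binary)  # [[0, 1, 1], [0, 1, 0], [1, 0, 0]]
```

Bit depth follows directly from this representation: 1 bit per pixel suffices for the binary image, while the 0-255 grayscale version needs 8 bits per pixel.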
This document discusses the JPEG image compression standard. It begins with an overview of what JPEG is, including that it is an international standard for compressing color and grayscale images up to 24 bits per pixel. The document then discusses the basic JPEG compression pipeline of encoding and decoding. It also outlines some of the major algorithms used in JPEG compression, including color space transformation, discrete cosine transform (DCT), quantization, zigzag scanning, and entropy coding. A key component discussed is the DCT, which converts image data into frequency domains and is useful for energy compaction in compression. The document concludes with noting implementations of JPEG and DCT in fields like image processing, scientific analysis, and audio processing.
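The 2-D DCT-II applied to 8x8 blocks in JPEG can be written directly from its definition. Real encoders use fast factored algorithms; this pure-Python version is a deliberately slow, direct sketch meant only to show the transform and its energy-compaction property.

```python
# Direct (unoptimized) 2-D DCT-II on an 8x8 block, as used per-block
# in JPEG. Real codecs use fast factored forms.
import math

N = 8

def dct_2d(block):
    def c(k):  # orthonormal scaling factors
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = sum(
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                for x in range(N)
                for y in range(N)
            )
            out[u][v] = c(u) * c(v) * s
    return out

# Energy compaction: a flat block puts all its energy in the DC term.
flat = [[100] * N for _ in range(N)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0]))  # 800 (= N * 100); all other coefficients ~0
```

Quantization then discards most of the near-zero high-frequency coefficients, which is where JPEG's compression actually comes from.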
This document provides an overview of digital audio compression techniques. It discusses how audio compression removes redundant or irrelevant information to reduce required storage space and transmission bandwidth. It describes how psychoacoustic modeling is used to eliminate inaudible components based on principles of masking. Spectral analysis is performed using transforms or filter banks to determine masking thresholds. Noise allocation quantizes frequency components to minimize noise while meeting thresholds. Additional techniques like predictive coding, coupling/delta encoding, and Huffman coding provide further compression. The encoding process involves analyzing, quantizing, and packing audio data into frames for storage or transmission.
The document discusses speech recognition and voice recognition. It covers what voice is, the components of sound, why voices are different, classification of speech sounds, the speech production process, what voice recognition is, automatic speech recognition (ASR), types of ASR systems including speaker-dependent and speaker-independent, approaches to speech recognition including template matching and statistical approaches, and the process of speech recognition.
Introduction to audio formats - Multimedia Students (SEO SKills)
This document discusses audio file formats. It begins by explaining what sound is and how it is digitized through sampling and quantization. It then covers both uncompressed formats like PCM, WAV, and AIFF as well as compressed formats. Lossy formats discussed include MP3, AAC, OGG, and WMA, while lossless formats include FLAC, ALAC, and WMA Lossless. The document recommends using uncompressed formats for raw audio work, lossless compression like FLAC for high-quality music listening, and lossy compression if storage space needs to be conserved or quality is less important.
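Sampling and quantization, the two digitization steps mentioned above, can be sketched together: sample a continuous waveform at discrete times, then map each sample onto a fixed number of integer levels. The rates below are toy values for illustration.

```python
# Sketch of sampling and uniform quantization: continuous samples in
# [-1, 1] are mapped to n-bit signed integer levels.
import math

def quantize(sample, bits):
    levels = 2 ** (bits - 1) - 1  # e.g. 127 for 8-bit signed audio
    return round(max(-1.0, min(1.0, sample)) * levels)

# Sample one cycle of a sine wave at 8 samples per cycle, quantize to 8 bits
signal = [math.sin(2 * math.pi * t / 8) for t in range(8)]
digital = [quantize(s, 8) for s in signal]
print(digital)  # [0, 90, 127, 90, 0, -90, -127, -90]
```

Higher sample rates capture higher frequencies, and more bits per sample reduce quantization noise, which is exactly the quality/size trade-off the formats below manage.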
Multimedia is the presentation of information using a combination of text, audio, graphics, video and animation. There are two types of interactivity in multimedia - linear interactivity where the user passively receives content, and non-linear interactivity where the user can control the sequence. Non-linear multimedia allows two-way communication while linear does not. Hardware and software tools can be used to produce multimedia content. The production process involves analysis, design, implementation, testing and publishing phases.
MPEG-7 is a standard for describing multimedia content to enable search and retrieval of audiovisual information. It provides tools for describing multimedia content such as descriptors, description schemes, and a description definition language. The goal of MPEG-7 is to make multimedia content as searchable as text by providing metadata about features, structure, and semantics of audiovisual data.
The document provides an overview of the Internet by defining it as a network of networks that connects computers worldwide and allows for communication through services like email, file transfers, and the World Wide Web. It discusses the early history and development of the Internet from the 1960s onward. It also defines important Internet technologies like TCP/IP, IP addresses, domain names, browsers, search engines, and common online services available to Internet users.
This document explains basic concepts of digital images, including what a pixel is, different file formats such as JPG, GIF, and PNG, color depth, resolution, and factors to consider when optimizing images for screen and print. It notes that screen designs should use JPG, PNG, or GIF formats at 72 dpi to achieve attractive, fast-loading presentations with the smallest possible file size, while printing requires formats such as TIFF, PSD, or PDF at 150-300 dpi.
This document discusses data compression techniques. It begins by defining data compression as encoding information in a file to take up less space. It then covers the need for compression to save storage and transmission time. The main types of compression discussed are lossless, which allows exact reconstruction of data, and lossy, which allows approximate reconstruction for better compression. Specific lossless techniques covered include Huffman coding, which assigns variable length codes based on frequency. Lossy techniques like JPEG are also discussed. The document concludes by listing applications of compression techniques in files, multimedia, and communication.
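The variable-length codes mentioned above can be demonstrated with a minimal Huffman sketch: build a tree by repeatedly merging the two least frequent subtrees, then read codes off the tree paths.

```python
# Minimal Huffman coding sketch: symbols that occur more often
# receive shorter codes.
import heapq
from collections import Counter

def huffman_codes(text):
    # Heap entries are (frequency, tiebreak, tree); a tree is either
    # a symbol or a (left, right) pair.
    heap = [(f, i, ch) for i, (ch, f) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next_id, (left, right)))
        next_id += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"
    walk(heap[0][2], "")
    return codes

codes = huffman_codes("abracadabra")
print(codes)  # 'a' (5 occurrences) gets a 1-bit code; rare 'c', 'd' get 3 bits
```

Because no code is a prefix of another, the encoded bitstream can be decoded unambiguously without separators.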
Data compression reduces the size of data files by removing redundant information while preserving the essential content. It aims to reduce storage space and transmission times. There are two main types of compression: lossless, which preserves all original data, and lossy, which sacrifices some quality for higher compression ratios. Common lossless methods are run-length encoding, Huffman coding, and Lempel-Ziv encoding, while lossy methods include JPEG, MPEG, and MP3.
MPEG-7 is an international standard for describing multimedia content to allow for fast and efficient searching. It was created by the Moving Picture Experts Group to address the need to efficiently manage and search the large amount of multimedia data available online. MPEG-7 uses description schemes and tools like color, texture, shape, and motion descriptors to provide standardized descriptions of audiovisual information and facilitate searching, indexing, filtering and accessing multimedia content. It has applications in education, journalism, tourism and other areas where multimedia data needs to be organized and retrieved.
Slides of a talk I gave in June 2018 at Google, giving an overview of various JPEG standardisation activities in compression, with a short introduction covering past projects.
This document provides an overview of MPEG-7, a standard for describing multimedia content. MPEG-7 allows for searching, indexing, and accessing multimedia in an interoperable way. It defines tools like descriptors, description schemes, and a description definition language to represent features of audiovisual content. MPEG-7 provides standardized ways of describing visual and audio features as well as generic multimedia descriptions. It aims to make multimedia as searchable as text.
MPEG-7 is an international standard for describing multimedia content to enable searching, filtering, and browsing of audiovisual data. It provides descriptors for features like color, texture, shape, and motion to support content-based retrieval of images and video. MPEG-7 also defines description schemes and a description definition language to create structured descriptions of multimedia content. The standard aims to facilitate searching, identifying, filtering and browsing of multimedia content through both text-based and content-based retrieval methods.
This document provides an introduction and overview of MPEG-7. MPEG-7 is a multimedia content description standard that provides metadata for multimedia content to enable content to be found, retrieved, accessed, filtered and managed. It was created due to the growing amount of digital multimedia and need for better identification and management of content. The standard includes description tools like descriptors, descriptions and applications for content production and consumption.
The document discusses MPEG-7, a standard for multimedia metadata. It provides an overview of MPEG-7 descriptors for describing visual and audio content, as well as multimedia communities that promote MPEG-7 adoption and interoperability. The document also outlines various applications of MPEG-7 for multimedia content management, description, navigation and retrieval.
A Personalized Audio Web Service using MPEG-7 and MPEG-21 standardsUniversity of Piraeus
This document summarizes a paper that presents a personalized audio web service using MPEG-7 and MPEG-21 standards for metadata description and querying. The proposed system delivers personalized audio content to users based on their preferences stored using MPEG standards. It uses a decentralized architecture where user preferences are stored locally on clients, while audio resources and adaptation metadata are stored on the web service. The service utilizes standards like MPEG-7, MPEG-21, OWL, SPARQL and MPQF to manage metadata, ontologies and queries for personalizing audio content delivery.
MPEG-7 is a multimedia content description standard that allows for fast and efficient searching of multimedia data. It provides standardized descriptions for various types of multimedia information like audio and video. MPEG-7 consists of parts for systems, description definition language, visual, audio, multimedia description schemes, reference software, conformance testing, and profiles and levels. Its goal is to make multimedia information as easy to find, retrieve, access, filter and manage as text-based information on the web.
C14 fiatifta dubai 2013, the mpeg-7 audiovisual description profile standar...FIAT/IFTA
This document discusses the MPEG-7 Audiovisual Description Profile (AVDP) standard for describing results of automatic annotation services. It was motivated by the need for a common format to represent and exchange metadata generated by various audiovisual analysis tools. The AVDP profile simplifies and constrains the MPEG-7 standard to define a schema for annotation data focused on content structure, features and semantic information. Examples of applications that use AVDP include automatic video quality analysis and validation of metadata descriptions.
Mpeg 7 video signature tools for content recognitionParag Tamhane
The document discusses the MPEG-7 Video Signature standard for content recognition. It aims to efficiently search large databases of videos by extracting compact and robust signatures that can detect duplicate, edited, or embedded video clips. The standard includes algorithms for signature extraction and matching, and enables interoperability across systems for content identification and management. It achieves high detection accuracy for common editing operations like embedding clips in longer videos.
The document discusses building an ontology for MPEG-7 multimedia content descriptions using the Resource Description Framework (RDF) Schema language. It describes the process of reverse engineering an RDF class hierarchy and property relationships from the MPEG-7 XML Schema definitions. Challenges included expressing multiple range constraints and dealing with the lack of an existing data model. The results include RDF Schema definitions for the top-level multimedia entities, segments, decomposition relationships, and basic non-multimedia entities defined in MPEG-7. Expressing the MPEG-7 semantics in an RDF Schema ontology enables greater interoperability and integration with descriptions from other domains on the Semantic Web.
The document discusses building an ontology for MPEG-7 multimedia content descriptions using the Resource Description Framework (RDF) Schema language. It describes the process of reverse engineering an RDF class hierarchy and property relationships from the MPEG-7 XML Schema definitions. Challenges included expressing multiple range constraints and defining unions of classes. The resulting RDF Schema ontology defines classes for different types of multimedia content, segments, and descriptors, along with their relationships. This will enable semantic interoperability between MPEG-7 and other domains by providing a common understanding of multimedia descriptions.
This document discusses validation of preservation methodologies using Representation Information (RepInfo). It presents RepInfo as a way to validate that future users can understand and reuse preserved data. It also describes creating formal, machine-readable RepInfo that defines data structures and semantics to enable validation and reuse of preserved data by new software over time. A variety of tools are presented for creating RepInfo, including formal description languages for defining data structures and semantics.
This document describes a project to create virtual vision glasses to help blind people. The glasses will use optical character recognition, computer vision techniques, text-to-speech, and translation to assist users with daily tasks like reading text, navigating surroundings, and understanding foreign languages. The proposed system will be built using a Raspberry Pi single board computer with a camera, and will include applications for text recognition, translation, and assistance from Google Assistant. It aims to make an affordable assistive device for the blind and help with issues like reading signs, books, and instructions in different languages.
The document provides information about MPEG compression standards. It discusses the history of MPEG and how it was established in 1988 as a joint effort between ISO and IEC to set standards for audio and video compression. It describes several MPEG standards including MPEG-1, MPEG-2, MPEG-4, MPEG-7, and MPEG-21. MPEG-4 is discussed in more detail, explaining that it offers greater efficiency than MPEG-2, allows encoding of mixed data types, and enables interaction of audio-visual scenes at the receiver end. The document contains diagrams and tables to illustrate key points about the different MPEG standards and compression techniques.
This paper presents an audio personalization framework for mobile devices. The multimedia
models MPEG-21 and MPEG-7 are used to describe metadata information. The metadata which support personalization are stored into each device. The Web Ontology Language (OWL) language is used to produce and manipulate the relative ontological descriptions. The process is distributed according to the MapReduce framework and implemented over the Android platform. It determines a hierarchical system structure consisted of Master and Worker devices. The Master retrieves a list of audio tracks matching specific criteria using SPARQL queries.
Technologies For Appraising and Managing Electronic Recordspbajcsy
This document summarizes technologies for appraising and managing electronic records, including discovering relationships among digital file collections and comparing document versions. It presents three technologies: file2learn to discover relationships between files based on metadata extraction and analysis; doc2learn for comprehensive document comparisons; and Polyglot for automated file format conversion and quality assessment.
A Semantic Web enabled System for Résumé Composition and Publication - SWIM 09Roku
This document describes a system for creating and publishing resumes using semantic web technologies. The system allows users to semantically tag skills and work experience, receives content-based and collaborative recommendations on additional tags, and publishes resumes with RDFa annotations so they can be crawled and understood by semantic agents. Future work includes linking to external data sources like DBpedia and improving methods for ranking nodes in the semantic graph.
Video indexing involves segmenting, analyzing, and abstracting video content into various levels including sequence, scene, shot, frame, and object. It can involve both low-level indexing based on visual features and high-level indexing focusing on semantic content. However, fully automated semantic indexing of large amounts of video data remains a challenge due to issues like the dynamic and interpretive nature of video versus text. Standards like MPEG-7 and Dublin Core along with metadata are used to aid in cataloging and retrieving video content for various applications and user needs.
Lecture given on January 28, 2019 to post-graduate students of the Computer Engineering and Media program, at the School of Journalism and Media, Aristotle University of Thessaloniki.
This document provides an introduction and overview of MPEG-21. MPEG-21 is an open framework for multimedia delivery and consumption that focuses on content creators and consumers. It aims to define the technology needed to support users in efficiently exchanging, accessing, consuming, trading, and manipulating digital items in an interoperable way. MPEG-21 is structured into multiple parts that cover areas like digital item declaration, identification, intellectual property management and protection, and rights expression.
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
Things to Consider When Choosing a Website Developer for your Website | FODUUFODUU
Choosing the right website developer is crucial for your business. This article covers essential factors to consider, including experience, portfolio, technical skills, communication, pricing, reputation & reviews, cost and budget considerations and post-launch support. Make an informed decision to ensure your website meets your business goals.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications.He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Speck&Tech
ABSTRACT: A prima vista, un mattoncino Lego e la backdoor XZ potrebbero avere in comune il fatto di essere entrambi blocchi di costruzione, o dipendenze di progetti creativi e software. La realtà è che un mattoncino Lego e il caso della backdoor XZ hanno molto di più di tutto ciò in comune.
Partecipate alla presentazione per immergervi in una storia di interoperabilità, standard e formati aperti, per poi discutere del ruolo importante che i contributori hanno in una comunità open source sostenibile.
BIO: Sostenitrice del software libero e dei formati standard e aperti. È stata un membro attivo dei progetti Fedora e openSUSE e ha co-fondato l'Associazione LibreItalia dove è stata coinvolta in diversi eventi, migrazioni e formazione relativi a LibreOffice. In precedenza ha lavorato a migrazioni e corsi di formazione su LibreOffice per diverse amministrazioni pubbliche e privati. Da gennaio 2020 lavora in SUSE come Software Release Engineer per Uyuni e SUSE Manager e quando non segue la sua passione per i computer e per Geeko coltiva la sua curiosità per l'astronomia (da cui deriva il suo nickname deneb_alpha).
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
1. Introduction to MPEG-7
Guest lecture for ECE417 TSH
Charlie Dagli
[dagli@illinois.edu]
April 7, 2009
2. Contents
This lecture: a general idea of MPEG-7
MPEG-7
–Background
–Introduction
–Components of MPEG-7
Description Definition Language (DDL)
Multimedia Description Scheme (MDS)
Video Descriptors
Audio Descriptors
–References
3. Background
Search and retrieval of multimedia data
– In recent years, a rapidly increasing amount of audiovisual data has become
available
– Applications
Large-scale multimedia search engines on the Web
Media asset management systems in corporations
AV broadcast servers
Personal media servers…
– Need: retrieval, search, and storage of AV data at a higher conceptual level
– A solution:
Efficient processing tools to create descriptions of AV material and to support the
identification or retrieval of AV documents.
– Alongside the research activity on processing tools, the need for interoperability
between devices has been recognized, and standardization activities have
been launched.
MPEG-7, “MULTIMEDIA CONTENT DESCRIPTION INTERFACE”,
standardizes the description of multimedia content, supporting a wide range of
applications.
MPEG stands for Moving Picture Experts Group (1988)
4. Introduction : What is MPEG-7?
“Multimedia Content Description Interface”
–Intuition:
Do NOT focus so much on processing tools
Concentrate instead on the selection of features that have to be described
Find a way to structure and instantiate the selected features with a
common language
–Efficient representation of audio-visual (AV) meta-data
–Goal: allow interoperable searching, indexing, filtering and
access of multimedia content by enabling interoperability
among devices that deal with multimedia content description.
5. MPEG-7 Main Elements
Descriptor (D) – standardized “audio
only” and “visual only” descriptors.
Ex.: a time code for duration, color
histograms for color.
Multimedia Description Scheme (MDS)
– standardized description schemes built
from audio and visual descriptors. Ex. for
video: temporally structured scenes and
shots, with textual descriptors at the scene
level and color, motion, and audio amplitude
descriptors at the shot level.
Description Definition Language
(DDL) – provides a standardized
language to express description schemes.
Based on XML (eXtensible Markup
Language), it allows the creation of new
description schemes and, possibly,
descriptors, and also allows the extension
and modification of existing description
schemes.
6. What can MPEG-7 do?
Increasing availability of potentially interesting audiovisual
materials makes search more difficult.
A search system in which any type of AV material may be
retrieved by means of any type of query material, such as video,
music, speech, etc.
– Some query examples
Music : Play a few notes on a keyboard and get in return a list of musical pieces
containing the required tune or images somehow matching the notes.
Image : Define objects, including color patches or textures and get in return
examples among which you select the interesting objects to compose your image
Voice : Using an excerpt of Pavarotti’s voice, and getting a list of Pavarotti’s
records, video clips where Pavarotti is singing or video clips where Pavarotti is
present.
Sports video analysis: can be done in a much easier way, with better results
7. Application Areas
Application domains listed in the MPEG-7 Applications document:
– Education
– Journalism (e.g. searching speeches of person using his name, his voice or
his face)
– Tourist information
– Cultural services (museum, art gallery, digital library)
– Entertainment (searching a game, karaoke)
– Investigation services (human characteristics recognition)
– Geographical information systems
– Remote sensing
– Surveillance (traffic control, surface transportation)
– Shopping
– Architecture, real estate, and interior design
– Social (Dating Service)
– Film, Video and Radio archives, …
– Audiovisual content production
8. MPEG-7 vs. previous MPEG activities
MPEG-1, -2, and -4 are designed to represent the information itself,
while MPEG-7 is meant to represent information about the
information.
MPEG-1, -2, and -4 make content available; MPEG-7 allows you to
find the content you need.
Also, MPEG-7 can be used independently of the other MPEG
standards – the description might even be attached to an analog
movie.
9. MPEG-7 Parts
ISO/IEC 15938-1 (Systems)
– The binary format for encoding MPEG-7 descriptions and the terminal architecture.
ISO/IEC 15938-2 (Description Definition Language)
– The language for defining the syntax of the MPEG-7 Description Tools and for
defining new Description Schemes.
ISO/IEC 15938-3 (Visual)
– The Description Tools dealing with Visual descriptions.
ISO/IEC 15938-4 (Audio)
– The Description Tools dealing with Audio descriptions.
ISO/IEC 15938-5 (Multimedia Description Schemes)
– The Description Tools dealing with generic features and multimedia descriptions.
ISO/IEC 15938-6 (Reference Software)
– A software implementation of relevant parts of the MPEG-7 Standard with
normative status.
ISO/IEC 15938-7 (Conformance Testing)
– Guidelines and procedures for testing conformance of MPEG-7 implementations.
ISO/IEC TR 15938-8 (Extraction and use of descriptions)
– Informative material (in the form of a Technical Report) about the extraction and use
of some of the Description Tools.
11. Description Definition Language (DDL)
The foundation of the MPEG-7 standard; provides the language for
defining the structure and content of multimedia descriptions
A schema language to represent the results of modeling
audiovisual data (i.e. descriptors and description schemes) as
a set of syntactic, structural and value constraints to which
valid MPEG-7 descriptors, description schemes, and
descriptions must conform.
Also provides the rules by which users can combine, extend, and
refine existing description schemes and descriptors.
XML example:
<PersonName>
<Title> Prof. </Title>
<Firstname>Thomas </Firstname>
<Lastname>Huang</Lastname>
<Nickname>Tom</Nickname>
</PersonName>
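Because MPEG-7 descriptions are plain XML, any standard XML toolchain can process them. As a loose illustration (the fragment mirrors the slide's example only; the normative schema adds namespaces and datatypes), a few lines of Python can parse the PersonName description:

```python
import xml.etree.ElementTree as ET

# The PersonName fragment from the slide, as a plain XML string
# (illustrative only; not the full normative MPEG-7 schema).
description = """
<PersonName>
  <Title>Prof.</Title>
  <Firstname>Thomas</Firstname>
  <Lastname>Huang</Lastname>
  <Nickname>Tom</Nickname>
</PersonName>
"""

root = ET.fromstring(description)
# Each child element is one field of the description.
fields = {child.tag: child.text for child in root}
print(fields["Title"], fields["Firstname"], fields["Lastname"])  # Prof. Thomas Huang
```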
13. Multimedia Description Schemes (MDS)
An overview of the organization of MPEG-7 MDS: organized
into six areas: Basic Elements, Content Description, Content
Management, Content Organization, Navigation and Access, and
User Interaction
14. MDS: Basic Elements
Basic Elements – fundamental constructs of the
definition of MPEG-7 description schemes
–Schema Tools :
facilitate the creation and packaging of valid MPEG-7 descriptions.
–Basic Data types :
Integer & Real – represent constrained integer and real value
Vectors & Matrix – represent arbitrary sized vectors and matrices of
integer or real values
Probability Vectors & Matrices – represent probability distribution
described using vectors/matrices
String – represents codes identifying content type, countries, regions,
currencies, and character sets
–Linking, Identification and Localization Tools :
tools for referencing MPEG-7 descriptions, for linking descriptions to
multimedia content and for describing time in multimedia content
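The constraint behind the Probability Vectors data type above can be sketched as follows (the function name and tolerance are my own, not part of the standard): entries must lie in [0, 1] and sum to 1.

```python
# Hypothetical validity check for a probability vector as described above;
# the name and tolerance are illustrative, not taken from MPEG-7.
def is_probability_vector(values, tol=1e-9):
    return (all(0.0 <= v <= 1.0 for v in values)
            and abs(sum(values) - 1.0) <= tol)

print(is_probability_vector([0.2, 0.5, 0.3]))  # True
print(is_probability_vector([0.7, 0.7]))       # False
```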
15. MDS: Basic Elements
–Example: Three kinds of media time representation:
(timeline figure: time points t1 and t2, annotated with Duration, TimeBase, and RelTimePoint)
A) Simple time: Specify a time point and a duration
B) Relative time: Specify a media time point relative to a time base, and a
duration
C) Incremental time: Specification of time using a predefined interval
called Time Unit and counting the number of intervals (efficient for
periodic signals)
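The three styles can be sketched numerically (a loose illustration with invented helper names; real MPEG-7 encodes these as MediaTime elements in XML):

```python
from fractions import Fraction

def simple_time(t1, duration):
    # A) a time point plus a duration -> (start, end)
    return (t1, t1 + duration)

def relative_time(time_base, rel_point, duration):
    # B) a time point given relative to a TimeBase, plus a duration
    start = time_base + rel_point
    return (start, start + duration)

def incremental_time(time_unit, start_units, n_units):
    # C) counting predefined TimeUnit intervals (efficient for
    # periodic signals such as video frames)
    start = start_units * time_unit
    return (start, start + n_units * time_unit)

# One TimeUnit = one frame at 25 fps; 125 frames starting at frame 50
# span seconds 2..7.
frame = Fraction(1, 25)
print(incremental_time(frame, 50, 125))  # (Fraction(2, 1), Fraction(7, 1))
```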
16. MDS: Basic Elements
– Basic Description Tools : A library of description schemes and data types, which
are used as primitive components for building more complex and functionality-
specific description tools found in the rest of MPEG-7.
Graph and relation tools: weave together complex multimedia description
structures
– Ex. (nodes A–E connected by typed relations r1–r4, as in the slide's figure):
<Graph>
<Node id="A"/> <Node id="B"/> <Node id="C"/> <Node id="D"/> <Node id="E"/>
<Relation type="#r1" source="#A" target="#B"/>
<Relation type="#r2" source="#A" target="#C"/>
…
</Graph>
Textual annotations: represent textual descriptions
– Free text annotation : Spain scores a goal against Sweden.
– Keyword annotation : score, Sweden, Spain
Classification schemes and terms: define and reference vocabularies for
multimedia content descriptors.
– Ex. Part of a ClassificationScheme for sports:
sports → soccer, basketball, baseball, tennis
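In memory, such a graph is just nodes plus typed edges. A hedged sketch follows; the exact relations are hard to read from the slide's garbled figure, so the edge list below is assumed for illustration:

```python
# Assumed edge list mirroring the <Graph> example: each Relation is
# (type, source, target). The slide's figure is garbled, so these
# particular edges are guesses, not taken from the original.
relations = [
    ("r1", "A", "B"),
    ("r2", "A", "C"),
    ("r3", "C", "D"),
    ("r4", "A", "E"),
]

def related(source):
    """All (type, target) pairs whose Relation starts at `source`."""
    return [(t, dst) for (t, src, dst) in relations if src == source]

print(related("A"))  # [('r1', 'B'), ('r2', 'C'), ('r4', 'E')]
```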
17. MDS: Basic Elements
People and locations: represent people and places related to
multimedia content
– Agent: persons, organizations, groups of persons,…
Ex. <PersonGroup>
<Name>Spanish National Soccer Team </Name>
<Kind><Name>Soccer Team </Name></Kind>
<Member>
<Name> Fernando </Name>
</Member>
<Member>
….
</PersonGroup>
– Places: existing, historical, and fictional places.
Affective description: describe emotional response to
multimedia content
– Ex. Recording an audience’s excitement while watching an action movie
Ordering tools:
– Provides a hint for ordering descriptions for presentation based on
information contained in those descriptions
– Ex. Order a set of video segments in a soccer game by the amount of
camera zoom within each segment.
19. MDS: Content Management
Content management: the description of the life cycle of the
content, from creation to consumption
– Creation and Production Description,
Including title, textual annotation, creators, creation locations, dates, how the data
is classified, review and guidance information, and related multimedia material.
– Usage Description
Describes information related to the usage rights, usage record, and financial
information.
Rights information is not explicitly included in the description, but links are
provided to the rights holders or to rights-management information.
Usage record description provides information related to the use of the content,
such as broadcasting or on-demand delivery.
Financial information provides information related to the cost of production and
the income resulting from content use.
Usage description is dynamic and subject to change during the lifetime of the
multimedia content.
– Media Description
Describes the storage media, in particular the compression, coding, and storage
format of the multimedia content. It describes the master media, that is, the
original source from which different instances of the multimedia content are
produced.
21. MDS: Structural Content Description
Content Description: structural and conceptual aspects
– Structure Description: describes the structure of multimedia content, built
around the notion of the Segment Description Scheme, which represents a spatial,
temporal, or spatiotemporal portion of the multimedia content
Segment DSs (the core element)
– Example: Mosaic DS – a panoramic view of a video segment, constructed by
aligning and warping the frames of a Video Segment onto each other
22. MDS: Structural Content Description
Specific features for structural data description
Feature         | Video Segment | Still Region | Moving Region | Audio Segment
----------------+---------------+--------------+---------------+--------------
Time            |       X       |              |       X       |       X
Shape           |               |      X       |       X       |
Color           |       X       |      X       |       X       |
Texture         |               |      X       |               |
Motion          |       X       |              |       X       |
Camera motion   |       X       |              |               |
Audio features  |       X       |              |       X       |       X
24. MDS: Conceptual Content Description
Conceptual aspects: describe the multimedia content from
the viewpoint of real-world semantics and conceptual
notions.
– Involve entities such as objects, events, abstract concepts and relationships.
– Segment description schemes and semantic description schemes are related
by a set of links that allows the multimedia content to be described on the
basis of both content structure and semantics together.
25. MDS: Conceptual Content Description
(Example of video segments and regions, and the corresponding segment
relationship graph)
27. MDS: Navigation and Access
Facilitating browsing and retrieval by defining summaries,
views, and variations of the multimedia content.
Summaries: provide compact highlights of the multimedia
content to enable discovering, browsing, navigation, and
visualization of multimedia content.
– Hierarchical navigation mode
– Sequential navigation mode
28. MDS: Navigation and Access
Views: based on partitions and decompositions, which
describe different decompositions of the multimedia signals
in space, time, and frequency. The partitions and
decompositions can be used as different views of the
multimedia content, important for multi-resolution access
and progressive retrieval.
Variations: provide different variations of multimedia
programs, such as summaries and abstracts; scaled,
compressed, and low-resolution versions; and versions with
different languages and modalities (audio, video, image, text,
and so forth), allowing selection of the most suitable
variation of a multimedia program.
30. MDS: Content Organization
Content Organization – tools describe collections and models
– Collection: unordered sets of multimedia content, segments, descriptor
instances, concepts, or mixed sets of the above
(Example: collections of AV content, including the relationships (i.e.,
RAB, RBC, RAC) within and across Collection Clusters)
Collection (abstract) is specialized by: content collection, segment
collection, descriptor collection, concept collection, mixed collection,
and collection structure.
31. MDS: Content Organization
– Model tools: parameterized representation of an instance or class of
multimedia content, descriptors, or collections, as follows:
Probability model : Associates statistics or probabilities with the attributes of
multimedia content, descriptors or collections
Analytic model: Associates labels or semantics with multimedia content or
collections
Cluster model: Associates labels or semantics and statistics or probabilities with
multimedia content collections
Classification model: Describes information about known collections of
multimedia content in terms of labels, semantics, and models that can be used to
classify unknown multimedia content
(Model class hierarchy) Model (abstract) is specialized by:
– Probability Model: discrete distribution, continuous distribution, and
  Finite State Model
– Analytic Model: Collection Model and Cluster Model
– Classification Model: ClusterClassification Model and
  ProbabilityClassification Model
32. MDS: Content Organization
– Clusters of positive and negative examples of images are described using
the Cluster Model tool.
– A soccer video sequence is modeled using the State Transition Model tool.
34. MDS: User Interaction
User interaction describes user preferences and usage history
Matching user preferences against MPEG-7 content descriptions
facilitates personalization of multimedia content access,
presentation, and consumption.
35. Introduction to MPEG-7
Guest lecture for ECE417 TSH
Charlie Dagli
[dagli@illinois.edu]
April 7, 2009
36. Introduction : What is MPEG-7?
“Multimedia Content Description Interface”
–Intuition:
NOT focus so much on processing tools
Concentrate more on the selection of features that have to be described
Find a way to structure and instantiate the selected features with a
common language
–Provide a way to get information about the audiovisual (AV)
data without the need of performing the actual decoding of these
data.
–Goal: allow interoperable searching, indexing, filtering and
access of multimedia content by enabling interoperability
among devices that deal with multimedia content description.
37. MPEG-7 Main Elements
Descriptor (D) – provides standardized “audio only” and “visual only”
descriptors. <ex> a time code for duration, color histograms for color.
Multimedia Description Scheme (MDS) – provides standardized description
schemes involving both audio and visual descriptors. <ex> a movie,
temporally structured as scenes and shots, including textual descriptors at the
scene level and color, motion, audio amplitude descriptors at the shot level.
Description Definition Language (DDL) – provides a standardized language
to express description schemes,
– based on XML (eXtensible Markup Language) – a language that allows the creation
of new description schemes, and possibly, descriptors. Also allows the extension and
modification of existing description schemes.
Coding Schemes – compress MPEG-7 textual XML descriptions into
a binary format (BiM) to satisfy application requirements for compression
efficiency, error resilience, etc. (part of the MPEG-7 Systems layer)
38. Visual Descriptors
Cover six basic visual features:
–Color
–Texture
–Shape
–Motion
–Localization
–Face Recognition
39. Color descriptors
Color Descriptors
– Color Space : defines the color components as continuous-valued entities
R, G, B
Y, Cr, Cb
– Y = 0.299R + 0.587G + 0.114B
– Cb = -0.169R - 0.331G + 0.500B
– Cr = 0.500R - 0.419G - 0.081B
H, S, V (Hue, Saturation, Value)
– A nonlinear transform of the RGB space
– Quantized into 16, 32, 64, 128, or 256 bins for the scalable color
descriptor and the GoF/GoP histogram descriptor
HMMD (Hue, Max, Min, Diff, Sum)
– Max = max(R, G, B)
– Min = min(R, G, B)
– Diff = Max - Min
– Sum = (Max + Min) / 2
(In the HMMD double-cone diagram, Min corresponds to whiteness and Max
to blackness.)
Linear transformation matrix with reference to R, G, B
– Any 3 x 3 color transform matrix that specifies the linear
transformation between RGB and the respective color space.
Monochrome: the Y component of YCrCb alone is used
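The two conversions above can be sketched directly. This is a minimal illustration for a single RGB pixel (components in [0, 255]), not the normative MPEG-7 extraction; the HMMD hue component is omitted here.

```python
# Sketch of the YCbCr and HMMD conversions defined above,
# applied to one RGB pixel with components in [0, 255].

def rgb_to_ycbcr(r, g, b):
    """Luma/chroma transform matching the coefficients on the slide."""
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.169 * r - 0.331 * g + 0.500 * b
    cr =  0.500 * r - 0.419 * g - 0.081 * b
    return y, cb, cr

def rgb_to_hmmd(r, g, b):
    """Max/Min/Diff/Sum components of HMMD (hue omitted in this sketch)."""
    mx, mn = max(r, g, b), min(r, g, b)
    return mx, mn, mx - mn, (mx + mn) / 2

print(rgb_to_ycbcr(255, 0, 0))
print(rgb_to_hmmd(200, 100, 50))   # (200, 50, 150, 125.0)
```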
40. Color Descriptors
–Color Quantization Descriptor : specifies the partitioning of the
given color space into discrete bins.
–Dominant Color Descriptor (DCD): allows specification of a small
number of dominant color values as well as their statistical properties, such as
distribution and variance; provides an effective and compact representation
of the colors present in a region or an image.
DCD is defined to be
F = {(ci, pi, vi), s}, (i = 1, 2, .. N), N is the number of dominant colors
ci dominant color value, a vector of corresponding color space component
values
pi the fraction of pixels in the image corresponding to ci
vi the variation of the color values of the pixels in a cluster around the
corresponding representative color
s the spatial coherency, represents the overall spatial homogeneity
(Examples of low and high spatial coherency of color)
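A toy, non-normative extraction along these lines: pixels are grouped by coarse RGB quantization, and the N largest groups yield the triples (ci, pi, vi) of the definition above. The spatial coherency s and the quantization step are simplifications of this sketch.

```python
# Toy dominant-color extraction: coarse RGB clustering yields
# (c_i, p_i, v_i); the spatial coherency s is omitted in this sketch.
from collections import defaultdict

def dominant_colors(pixels, n_colors=3, step=64):
    clusters = defaultdict(list)
    for px in pixels:
        clusters[tuple(c // step for c in px)].append(px)   # coarse bin
    top = sorted(clusters.values(), key=len, reverse=True)[:n_colors]
    out = []
    for members in top:
        k = len(members)
        c = tuple(sum(p[i] for p in members) / k for i in range(3))  # c_i
        p_frac = k / len(pixels)                                     # p_i
        v = tuple(sum((p[i] - c[i]) ** 2 for p in members) / k       # v_i
                  for i in range(3))
        out.append((c, p_frac, v))
    return out

pixels = [(250, 10, 10)] * 6 + [(10, 240, 10)] * 3 + [(5, 5, 250)]
for c, p, v in dominant_colors(pixels, n_colors=2):
    print(c, p, v)
```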
41. Color Descriptors
–Scalable Color Descriptor : a Haar transform-based encoding
scheme applied across values of a color histogram in the HSV
color space
– Useful for image-to-image matching and retrieval based on color features. Its
binary representation is scalable in terms of bin numbers and bit
representation accuracy over a broad range of data rates.
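The scalability mechanism can be illustrated with the Haar transform itself: one Haar step turns a 2N-bin histogram into N sums (a valid coarser histogram) and N differences (detail that can be dropped or quantized coarsely). This sketch shows the idea, not the normative bit allocation.

```python
# One Haar step: sums give a coarser histogram, diffs give droppable detail.

def haar_step(hist):
    n = len(hist) // 2
    sums  = [hist[2 * i] + hist[2 * i + 1] for i in range(n)]
    diffs = [hist[2 * i] - hist[2 * i + 1] for i in range(n)]
    return sums, diffs

def haar_forward(hist):
    """Repeated Haar steps; returns the DC value (total histogram mass)
    and the detail coefficients, finest level first."""
    details = []
    while len(hist) > 1:
        hist, d = haar_step(hist)
        details.append(d)
    return hist[0], details

dc, details = haar_forward([1, 3, 2, 2, 4, 0, 1, 1])
print(dc)        # 14, the total histogram mass
print(details)   # [[-2, 0, 4, 0], [0, 2], [2]]
```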
–Group-of-Frame or Group-of-Picture Descriptor :
For joint representation of color-based features for multiple images or multiple
frames in a video segment
Traditionally, for a group of frames or pictures, a key frame or image is
selected and the color-related features of the entire collection are represented
by that chosen sample, which is unreliable.
GoF and GoP histogram-based descriptors instead reliably capture the color
content of multiple images or video frames.
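The usual GoF/GoP aggregation modes over per-frame histograms can be sketched as below; "intersection" (per-bin minimum) is the mode that is robust to a few outlier frames.

```python
# Sketch of three GoF/GoP aggregation modes over per-frame histograms.
from statistics import median

def gof_histogram(frame_hists, mode="average"):
    per_bin = list(zip(*frame_hists))        # values of each bin across frames
    if mode == "average":
        return [sum(b) / len(b) for b in per_bin]
    if mode == "median":
        return [median(b) for b in per_bin]
    if mode == "intersection":
        return [min(b) for b in per_bin]
    raise ValueError(f"unknown mode: {mode}")

hists = [[4, 0, 2], [2, 2, 2], [0, 4, 2]]
print(gof_histogram(hists, "average"))        # [2.0, 2.0, 2.0]
print(gof_histogram(hists, "intersection"))   # [0, 0, 2]
```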
42. Color Descriptors
– Color Layout Descriptor (CLD) : represents the spatial distribution of
representative colors on a grid superimposed on a region or image. Representation is
based on coefficients of Discrete Cosine Transform. This is a very compact
descriptor being highly efficient in fast browsing and search applications.
– Color Structure Descriptor (CSD): based on a color histogram, but aims at
identifying localized color distributions using a small structuring window. To
guarantee interoperability, the CSD is bound to the HMMD color space.
– The CSD expresses the degree to which the pixels of a color are clumped
together, relative to the scale of an associated structuring element.
Examples of structured and unstructured color.
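The Color Layout pipeline above can be sketched for one channel: an 8x8 grid of representative values goes through a 2-D DCT, and only the first few low-frequency coefficients are kept. The diagonal ordering here is a simplification of the true zigzag scan, and the coefficient count is arbitrary.

```python
# Sketch of the CLD for one channel: 8x8 grid -> 2-D DCT -> few coefficients.
import math

def dct_1d(v):
    n = len(v)
    return [(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)) *
            sum(v[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]

def dct_2d(grid):
    rows = [dct_1d(r) for r in grid]                     # DCT along rows
    cols = [dct_1d(list(c)) for c in zip(*rows)]         # then along columns
    return [list(r) for r in zip(*cols)]

def cld_channel(grid, n_coeffs=6):
    """Keep the first n_coeffs coefficients in low-frequency-first order."""
    coeffs = dct_2d(grid)
    order = sorted(((i, j) for i in range(8) for j in range(8)),
                   key=lambda ij: (ij[0] + ij[1], ij))
    return [coeffs[i][j] for i, j in order[:n_coeffs]]

flat = [[10.0] * 8 for _ in range(8)]   # a uniform-color grid
print(cld_channel(flat))                # only the DC term is nonzero
```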
43. Texture Descriptors
Homogeneous Texture Descriptor (HTD):
– provides a quantitative representation using 62 numbers, consisting of the
mean energy and energy deviation computed over a set of frequency channels
– Useful for similarity retrieval
– Effective in characterizing homogeneous texture regions
Texture Browsing Descriptor (TBD):
– Defined for coarse level texture browsing
– Provides a perceptual characterization of texture, similar to human
characterization, in terms of regularity, coarseness and directionality of the
texture pattern.
Edge Histogram Descriptor (EHD):
– Capture spatial distribution of edges in an image
– Useful in matching regions with partially varying, non-uniform texture.
44. Homogeneous Texture Descriptor
• Texture Descriptor
– Homogeneous Texture Descriptor (HTD): characterizes the region
texture using the mean energy and the energy deviation from a set of
frequency channels. The 2-D frequency plane is partitioned into 30
channels as follows:
(Figure: frequency layout for feature extraction)
The syntax of the HTD is:
HTD = [fDC, fSD, e1, e2, ..., e30, d1, d2, ..., d30]
where fDC and fSD are the mean and standard deviation of the input image, and
ei and di are the nonlinearly scaled and quantized mean energy and energy
deviation of the i-th channel.
45. Texture Browsing Descriptor
– Texture Browsing : Perceptual characterization of a texture, similar to a human
characterization, in terms of regularity, coarseness and directionality
– TBD = [v1,v2,v3,v4,v5]
v1 ∈ {1, 2, 3, 4} or {00,01,10,11}: represents the regularity
v2,v3 ∈ {1, 2, 3, 4, 5, 6} : capture the directionality of the texture
v4, v5 ∈ {1, 2, 3, 4}: capture the coarseness of the texture
Semantics of Regularity:
00 – irregular
01 – slightly regular
10 – regular
11 – highly regular
(Figure: example textures ordered by increasing regularity, from 00 to 11)
46. Edge Histogram Descriptor
– Edge Histogram: represents the local edge distribution in the image
Five types of edges: five histogram bins per sub-image (4 x 4 grid of
sub-images, hence 80 bins)
BinCounts[k] Semantics
BinCounts[0] Vertical edges in sub-image (0,0)
BinCounts[1] Horizontal edges in sub-image (0,0)
BinCounts[2] 45 degree edges in sub-image (0,0)
BinCounts[3] 135 degree edges in sub-image (0,0)
BinCounts[4] Non-directional edges in sub-image (0,0)
BinCounts[5] Vertical edges in sub-image (0,1)
…
BinCounts[74] Non-directional edges in sub-image (3,2)
BinCounts[75] Vertical edges in sub-image (3,3)
BinCounts[76] Horizontal edges in sub-image (3,3)
BinCounts[77] 45 degree edges in sub-image (3,3)
BinCounts[78] 135 degree edges in sub-image (3,3)
BinCounts[79] Non-directional edges in sub-image (3,3)
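The table's 80-bin layout follows a simple indexing rule: sub-image (i, j) of the 4x4 grid and edge type e map to bin 5 * (4 * i + j) + e, which this sketch verifies against the table's entries.

```python
# Bin indexing implied by the EHD table: 5 bins per sub-image, row-major grid.
EDGE_TYPES = ["vertical", "horizontal", "deg45", "deg135", "nondirectional"]

def ehd_bin(i, j, edge_type):
    """Bin index for edge type `edge_type` in sub-image (i, j) of the 4x4 grid."""
    return 5 * (4 * i + j) + EDGE_TYPES.index(edge_type)

print(ehd_bin(0, 0, "vertical"))          # 0
print(ehd_bin(3, 2, "nondirectional"))    # 74
print(ehd_bin(3, 3, "nondirectional"))    # 79
```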
47. Shape Descriptors
Shape Descriptors
– Region-based Shape Descriptor
Expresses pixel distribution within a 2-D object or region.
Based on both boundary and internal pixels and can describe complex objects
consisting of multiple disconnected regions as well as simple objects with or
without holes.
– Contour-based Shape Descriptor
Based on CSS representation of the contour
– 3-D Spectrum Descriptor
Expresses characteristic features of objects represented as discrete polygonal 3-D
meshes.
Based on the histogram of local geometrical properties of the 3-D surfaces of
the object.
48. Shape Descriptors
– The Region-based shape descriptor utilizes a set of ART (Angular Radial
Transform) coefficients. Twelve angular and three radial functions are used
(n < 3, m < 12).
Fnm is the ART coefficient of order n and m, defined as the inner product of
the ART basis function V and the image function f over the unit disk:
Fnm = ∫∫ V*nm(ρ, θ) f(ρ, θ) ρ dρ dθ
V (the ART basis function) is separable along the angular and radial directions:
Vnm(ρ, θ) = Am(θ) Rn(ρ), where Am(θ) = (1/2π) exp(jmθ) and
Rn(ρ) = 1 for n = 0, 2cos(πnρ) for n ≠ 0
(Figure: real part of the ART basis functions)
ART coefficients are divided by the magnitude of the ART coefficient of order
n = 0, m = 0, which is itself not used as a descriptor element.
Quantization is applied to each coefficient using 4 bits per coefficient to
minimize the size of the descriptor.
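A numeric sketch of the ART inner product F_nm = <V_nm, f> with the separable basis A_m(θ) = exp(jmθ)/(2π) and R_n(ρ) = 1 for n = 0 (else 2cos(πnρ)); the midpoint-rule integration and resolutions here are illustrative choices, not part of the standard.

```python
# Midpoint-rule approximation of the ART coefficient F_nm over the unit disk.
import cmath
import math

def art_coefficient(f, n, m, n_rho=64, n_theta=128):
    """Approximate F_nm for an image function f(rho, theta)."""
    total = 0j
    for i in range(n_rho):
        rho = (i + 0.5) / n_rho
        r_n = 1.0 if n == 0 else 2.0 * math.cos(math.pi * n * rho)
        for k in range(n_theta):
            theta = 2.0 * math.pi * (k + 0.5) / n_theta
            basis = cmath.exp(1j * m * theta) / (2.0 * math.pi) * r_n
            total += basis.conjugate() * f(rho, theta) * rho  # rho = Jacobian
    return total * (1.0 / n_rho) * (2.0 * math.pi / n_theta)

disk = lambda rho, theta: 1.0          # a uniform unit disk as the shape
f00 = art_coefficient(disk, 0, 0)      # the normalizing coefficient
print(abs(f00))                        # ~0.5 for the full disk
print(abs(art_coefficient(disk, 0, 1)))  # ~0: rotationally symmetric shape
```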
49. Shape Descriptors
– Contour-based Shape Descriptor : describes a closed contour of a 2-D object
or region in an image or video sequence, based on the Curvature Scale Space
(CSS) representation of the contour.
(A 2-D visual object (region) and its corresponding shape)
Field              | No. of bits | Meaning
-------------------+-------------+--------------------------------------------------
NumberOfPeaks      | 6           | No. of peaks in the CSS image
GlobalCurvature    | 2 x 6       | Circularity and eccentricity of the contour
PrototypeCurvature | 2 x 6       | Circularity and eccentricity of the smoothed contour
HighestPeakY       | 7           | Absolute height of the highest peak (quantized)
PeakX[]            | 6           | X-position on the contour of a peak (quantized)
PeakY[]            | 3           | Height of the peak (quantized)
(Figure: CSS image formation – smoothing evolution of the zero-crossings)
50. Shape Descriptors
Contour-based Shape Descriptor has the following properties
• It can distinguish between shapes that have similar region-shape properties
but different contour-shape properties.
• It supports search for shapes that are semantically similar for humans.
• It is robust to significant non-rigid deformations.
• It is robust to distortions in the contour due to perspective transformations,
which are common in images and video.
• It is robust to noise present on the contour.
• It is very compact (14 bytes per contour on average).
• The descriptor is easy to implement and offers fast extraction and matching.
51. Shape Descriptors
(3-Dimensional Class)
– 3-D Shape Spectrum Descriptor : specifies an intrinsic shape description for
3-D mesh models, exploiting local attributes of the 3-D surface.
The shape index, introduced by Koenderink, is defined as a function of the two
principal curvatures k1(p) ≥ k2(p) associated with a point p on the 3-D surface S:
Ip = 1/2 - (1/π) arctan[(k1(p) + k2(p)) / (k1(p) - k2(p))]
By definition, the shape index value is in the interval [0, 1].
The shape spectrum of the 3-D mesh (3D-SSD) is the histogram of the shape
indices (Ip's) calculated over the entire mesh.
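The per-point index and its histogram can be sketched as follows; note that the umbilic-point convention and the sign convention of the shape index vary between references, so this is one consistent reading rather than the normative formula.

```python
# Shape index from two principal curvatures (k1 >= k2), histogrammed
# over the mesh (represented here as a list of curvature pairs).
import math

def shape_index(k1, k2):
    if k1 == k2:                       # umbilic: the arctan argument diverges
        return 0.0 if k1 > 0 else 1.0 if k1 < 0 else 0.5
    return 0.5 - (1.0 / math.pi) * math.atan((k1 + k2) / (k1 - k2))

def shape_spectrum(curvature_pairs, n_bins=100):
    hist = [0] * n_bins
    for k1, k2 in curvature_pairs:
        b = min(int(shape_index(k1, k2) * n_bins), n_bins - 1)
        hist[b] += 1
    total = sum(hist)
    return [h / total for h in hist]   # normalized histogram over [0, 1]

print(shape_index(1.0, -1.0))          # 0.5 – a perfect saddle sits mid-scale
```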
52. Motion Descriptors
Camera Motion Descriptor
Motion Trajectory Descriptor
Parametric Motion Descriptor
Motion Activity Descriptor
(Figure: descriptor attachment – camera motion, motion activity, mosaic, and
warping parameters describe a video segment; motion trajectory and parametric
motion describe a moving region)
53. Motion Descriptors
Motion Descriptors
– Camera motions: pan, track, tilt, boom, zoom, dolly, roll, and fixed
(absence of motion)
(Figure: perspective projection and camera motion parameters)
54. Motion Descriptors
– Motion Trajectory : describes the displacements of objects over time. A high-
level feature associated with a moving region, defined as the spatiotemporal
localization of one of its representative points (such as its center) as a list of
key points (x, y, z, t).
– Parametric Motion : describes the motion of objects in video sequences as a
2-D parametric model.
Affine models (6 parameters): translations, rotations, scaling, and combinations of these
Planar perspective models (8 parameters): global deformations with perspective projections
Quadratic models (12 parameters): describe more complex movements
– Motion Activity : Intuitive notion of ‘intensity of action’ or ‘pace of action’ in a
video segment.
Example of high “activity”: Goal scoring in a soccer match
Can be used in diverse applications such as content repurposing, video summarization,
surveillance, content-based querying, etc.
Four attributes:
– Intensity of activity: high or low activity indicated by an integer in [1, 5]
– Direction of activity: expresses the dominant direction of the activity if any
– Spatial distribution of activity: the number and size of active regions in a frame
– Temporal distribution of activity: expresses the variation of activity over the duration
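One plausible mapping from raw motion vectors to the 1-to-5 intensity attribute is sketched below; the standard derives the level from the standard deviation of motion-vector magnitudes, but the thresholds here are invented for illustration.

```python
# Illustrative intensity-of-activity level from motion-vector magnitudes.
import math

def activity_intensity(motion_vectors, thresholds=(0.5, 2.0, 5.0, 10.0)):
    mags = [math.hypot(dx, dy) for dx, dy in motion_vectors]
    mean = sum(mags) / len(mags)
    sigma = math.sqrt(sum((m - mean) ** 2 for m in mags) / len(mags))
    return 1 + sum(1 for t in thresholds if sigma > t)   # integer in [1, 5]

print(activity_intensity([(0, 0)] * 10))            # 1: a static shot
print(activity_intensity([(0, 0), (20, 0)] * 5))    # 4: high activity
```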
55. Localization Descriptors
Localization Descriptors
– Region Locator : localization of regions within images or frames, specifying
them with a brief and scalable representation of a Box or a Polygon. The
procedure consists of the following two steps:
Extraction of the vertices of the region to be localized
Localization of the region within the image or frame
(Localization using the polygonal and Box elements of the RegionLocator)
– Spatio Temporal Locator: describes spatial-temporal regions in a video
sequence, such as moving object regions, and provides localization
functionality.
56. Face Recognition Descriptor
FaceRecognition Descriptor : Used to retrieve face images which match a query
face image.
–Face Recognition : the projection of a face vector onto a set of 48 basis
eigenvectors U ('eigenfaces') which span the space of possible face vectors.
–Feature Extraction : the FaceRecognition feature set is extracted from a
normalized face image. This normalized face image contains 56 lines with 46
intensity values in each line. The centers of the two eyes in each face image
are located on the 24th row, and on the 16th and 31st columns for the right
and left eye respectively.
The features are given by the vector W = Uᵀ(Λ - Ψ), where Λ is the flattened
face-image vector and Ψ is the mean face vector.
The features are then normalized and clipped using Z = 2048.
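The projection step can be sketched as below; the clip-and-scale rule shown is one plausible reading of "normalized and clipped using Z = 2048", not the normative formula, and the tiny 2-pixel "faces" are only for illustration.

```python
# Eigenface projection sketch: w_k = u_k . (x - mean), then clip/scale.
Z = 2048  # clipping bound from the slide

def face_features(x, mean_face, eigenfaces):
    """Project a flattened face image onto basis vectors, clip to [-Z, Z],
    and scale to [-1, 1] (assumed normalization rule)."""
    diff = [xi - mi for xi, mi in zip(x, mean_face)]
    w = [sum(ui * di for ui, di in zip(u, diff)) for u in eigenfaces]
    return [max(-Z, min(Z, wk)) / Z for wk in w]

# Tiny 2-pixel "faces" with two orthonormal basis vectors, for illustration.
print(face_features([3.0, 4.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))
print(face_features([5000.0, 0.0], [0.0, 0.0], [[1.0, 0.0]]))   # clipped to 1.0
```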
57. Face descriptor
– Automatic Face Image Localization
(Block Diagram of the Automatic face Image Localization algorithm)
Color Segmentation
(A color segmentation example: a) the skin color region in the Cb-Cr plane
b) original image c) results of the color segmentation algorithm)
59. Audio Descriptors
Basic Descriptors: temporally sampled scalar values for general use,
applicable to all kinds of signals
– AudioWaveform Descriptor : Audio waveform envelope (minimum and
maximum), typically for display purposes
– AudioPower Descriptor : the temporally smoothed instantaneous power,
which is useful as a quick summary of a signal, and in conjunction with the
power spectrum.
Basic Spectral Descriptors: all deriving from a single time-frequency
analysis of an audio signal
– AudioSpectrumEnvelope Descriptor : a logarithmic-frequency spectrum,
spaced by a power-of-two divider (multiple of an octave)
– AudioSpectrumCentroid Descriptor : the center of gravity of the log-
frequency power spectrum, which describes the shape of the power
spectrum
60. Audio Descriptors
– AudioSpectrumSpread Descriptor : complements the previous descriptor
by describing the second moment of the log-frequency power spectrum. This
may help distinguish between pure-tone and noise-like sounds.
– AudioSpectrumFlatness Descriptor : the flatness properties of the spectrum of
an audio signal for each of a number of frequency bands. When this indicates a high
deviation from a flat spectral shape for a given band, it may signal the presence of
tonal components
(Example of AudioSpectrumEnvelope description of a pop song)
Visualized using a spectrogram.
Required data storage is NM values
where N is the no. of spectrum bins
and M is the no. of time points
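The two log-frequency moments behind AudioSpectrumCentroid and AudioSpectrumSpread can be sketched for a single spectral frame; the log2 axis and the square-rooted second central moment are the essential ideas, while bin handling in the standard is more involved.

```python
# Centroid and spread of a power spectrum on a log2 frequency axis.
import math

def log_freq_centroid_spread(freqs_hz, powers):
    logf = [math.log2(f) for f in freqs_hz]
    total = sum(powers)
    centroid = sum(lf * p for lf, p in zip(logf, powers)) / total
    spread = math.sqrt(sum((lf - centroid) ** 2 * p
                           for lf, p in zip(logf, powers)) / total)
    return centroid, spread

# A pure tone has zero spread; an octave pair of equal power spreads by 0.5.
print(log_freq_centroid_spread([440.0, 880.0], [1.0, 0.0]))
print(log_freq_centroid_spread([440.0, 880.0], [1.0, 1.0]))
```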
61. Audio Descriptors
Spectral Basis Descriptor: low-dimensional projections of a high-
dimensional spectral space to aid compactness and recognition, which are
used primarily with the Sound Classification and Indexing Description Tools
– AudioSpectrumBasis : a series of basis functions that are derived from the
singular value decomposition of a normalized power spectrum
– AudioSpectrumProjection : Used with above descriptor, and represents low-
dimensional features of a spectrum after projection upon a reduced rank basis.
(Example: A 10-basis component reconstruction showing most of the detail of the
original spectrogram including guitar, bass guitar, etc.)
The left vectors are an AudioSpectrumBasis
Descriptor and the top vectors are the
corresponding AudioSpectrumProjection
Descriptor. The required data storage is
10(M+N) values
62. Audio Descriptors
Signal Parameters : apply chiefly to periodic or quasi-periodic
signals
– AudioFundamentalFrequency Descriptor : the fundamental frequency of an
audio signal, together with a confidence measure, in recognition of the fact
that the various extraction methods, commonly called "pitch tracking", are
not perfectly accurate.
– AudioHarmonicity Descriptor : the harmonicity of a signal, allowing
distinction between sounds with a harmonic spectrum (e.g., musical tones
or voiced speech [vowels like ‘a’]), sounds with an inharmonic spectrum
(e.g., metallic or bell-like sounds) and sounds with a non-harmonic
spectrum (e.g., noise, unvoiced speech [fricatives like ‘f’], or dense
mixtures of instruments).
63. Audio Descriptors
Timbral Temporal Descriptors : temporal characteristics of segments
of sounds, useful for the description of musical timbre (the characteristic
tone quality independent of pitch and loudness).
– LogAttackTime Descriptor : the ‘attack’ of a sound, the time it takes for the signal
to rise from silence to the maximum amplitude. It tells the difference between a
sudden and a smooth sound
– TemporalCentroid Descriptor : the time-weighted centroid of the signal
envelope, representing where in time the energy of a signal is focused. It can
distinguish a decaying piano note from a sustained organ note when the
lengths and the attacks of the two notes are identical.
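The two timbral-temporal quantities can be sketched on a sampled amplitude envelope; the 2% start threshold and the lower bound on the rise time are assumptions of this sketch, not values from the standard.

```python
# LogAttackTime and TemporalCentroid on a sampled amplitude envelope.
import math

def log_attack_time(envelope, sr, start_frac=0.02):
    """log10 of the rise time from a small start threshold to the peak."""
    peak = max(envelope)
    i_peak = envelope.index(peak)
    i_start = next(i for i, v in enumerate(envelope) if v >= start_frac * peak)
    return math.log10(max((i_peak - i_start) / sr, 1.0 / sr))  # avoid log10(0)

def temporal_centroid(envelope, sr):
    """Envelope-weighted mean time, in seconds."""
    return sum((i / sr) * v for i, v in enumerate(envelope)) / sum(envelope)

# A percussive (fast-attack) vs. a sustained envelope, sampled at 100 Hz.
perc = [1.0] + [0.9 ** i for i in range(1, 100)]
sust = [i / 9 for i in range(10)] + [1.0] * 90
print(log_attack_time(perc, 100), log_attack_time(sust, 100))
print(temporal_centroid(perc, 100), temporal_centroid(sust, 100))
```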
Timbral Spectral Descriptor : spectral features in a linear-frequency
space especially applicable to the perception of musical timbre.
– SpectralCentroid Descriptor : the power-weighted average of the frequency of the
bins in the linear power spectrum. Very similar to the AudioSpectrumCentroid, but
specialized for use in distinguishing musical instrument timbres. It tells the
“sharpness” of a sound.
64. Audio Descriptors
– HarmonicSpectralCentroid Descriptor : the amplitude-weighted mean of the
harmonic peaks of the spectrum. It has a similar semantic to the other centroid
descriptors, but applies only to the harmonic parts of the musical tone.
– HarmonicSpectralDeviation Descriptor : the spectral deviation of log-amplitude
components from a global spectral envelope.
– HarmonicSpectralSpread Descriptor : the amplitude-weighted standard deviation
of the harmonic peaks of the spectrum, normalized by the instantaneous
HarmonicSpectralCentroid.
– HarmonicSpectralVariation Descriptor : the normalized correlation between the
amplitude of the harmonic peaks between two subsequent time-slices of the signal.
Silence Segment : attaches the simple semantic of “silence” (i.e. no
significant signal) to an Audio Segment. It may be used to aid further
segmentation of the audio stream, or as a hint not to process a segment.
65. Audio Descriptors
High-level Audio Description Tools (Ds and DSs)
– Audio Signature DS : a condensed representation of an audio signal, designed
to provide a unique content-based identifier for robust automatic identification
of audio signals. Applications include audio fingerprinting and identification
of audio based on a database of known works.
– Musical Instrument Timbre Description Tools
HarmonicInstrumentTimbre Descriptor : Four harmonic timbral spectral
Descriptors with the LogAttackTime Descriptor
PercussiveInstrumentTimbre Descriptor : The timbral temporal Descriptors
with a SpectralCentroid Descriptor
– Melody Description Tools
Include a rich representation for monophonic melodic information to
facilitate efficient, robust, and expressive melodic similarity matching.
MelodyContour DS: terse, efficient melody contour representation
MelodySequence DS: a more verbose, complete, expressive melody
representation
66. Audio Descriptors
– General Sound Recognition and Indexing Description Tools
A collection of tools for indexing and categorization of sound (effects) in
general
SoundModelStatePath Descriptor: states generated by a sound model
SoundModelStateHistogram Descriptor: normalized histogram of the state
sequence generated by a sound model
– Spoken Content Description Tools
Consist of combined word and phone lattices for each speaker in an audio
stream. Phone lattices are used to alleviate the out-of-vocabulary (OOV)
problem.
SpokenContentLattice Description Scheme : the actual decoding produced by
an ASR(Automatic Speech Recognition) engine.
SpokenContentHeader : information about the speakers being recognized and
the recognizer itself.
67. References
Book – Introduction to MPEG-7: Multimedia Content
Description Interface
B. S. Manjunath (Editor), Philippe Salembier (Editor), Thomas
Sikora (Editor)
ISBN: 0-471-48678-7
http://www.wiley.com/WileyCDA/WileyTitle/productCd-0471486787.html
MPEG-7
http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm
MPEG-7 DDL Homepage
http://archive.dstc.edu.au/mpeg7-ddl/