BRINGING IT ALL TOGETHER: A COMPARISON OF TWO MODERN MULTIMEDIA CONTAINER FORMATS

Michael Streatfield
University of Southampton

ABSTRACT
Container formats are used in multimedia applications to provide storage and synchronisation of different media types, and there are many to choose from. This paper provides a general overview of what a container format comprises before comparing two widely used modern container formats: Xiph.Org’s free and open Ogg format and the Moving Picture Experts Group’s (MPEG) MPEG-4 part 12 container. It concludes that Ogg is a more lightweight format, suitable for simple synchronisation and streaming of multimedia, whereas MPEG-4 is a more expansive and extensible format in terms of features, especially those involving metadata related to audio-visual items.

Keywords
Container files, MPEG-4, Ogg, Metadata, Multimedia Streaming and Multimedia Storage

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.
6th Annual Multimedia Systems, Electronics and Computer Science, University of Southampton
© 2006 Electronics and Computer Science, University of Southampton

1. INTRODUCTION
This paper aims to provide the reader with an idea of what multimedia container file formats are before providing a comparison of two widely used modern multimedia container formats, Xiph.Org’s Ogg format [7] and the Moving Picture Experts Group’s (MPEG) MPEG-4 format [2]. Multimedia container formats are useful in multimedia systems for the interchange of data during content creation and for the synchronisation of different types of multimedia stream into single streams for network distribution or local storage.

The container formats described in detail in this paper were chosen for comparison since they represent an industry and academic standard format (MPEG-4) and an established open and free-for-use format (Ogg), although other options exist. Microsoft’s ASF (Advanced Systems Format) [21] was considered to be narrower in scope than the Moving Picture Experts Group’s (MPEG) MPEG-4 format because the MPEG working group contains many partners, which means that the standards developed are likely to be more widely applicable. Additionally, much more information is readily available regarding MPEG-4, both in terms of specification and in academia. Another open standard container format is Matroska [22]; however, it is as yet immature and not in as widespread use as Ogg.

Section 2 of this paper provides an overview of what container formats are, giving a brief history and motivating their existence in general before introducing MPEG-4 and Ogg. Section 3 describes and motivates the criteria that will be used to compare the two formats and sections 4 through 7 give the actual comparison. Section 8 draws the paper to a conclusion, summarising the conclusions for each criterion. The criteria themselves are the formats’ suitability for distribution and streaming content, their level of support for decoding multiple formats of multimedia, their support for metadata and a brief consideration of the philosophy behind their licensing and usage. This paper’s focus is deliberately the technical aspects of these formats.

2. BACKGROUND
Multimedia information can be found in many different formats and types and it is often useful to be able to synchronise two different types of data, for example to play audio simultaneously with video and perhaps add subtitles to a video. This is where the container format comes in; container formats bring together different types of multimedia into a single file and have the capacity to synchronise playback of disparate types of media. They provide the information that multimedia storage and playback systems require in order to, for example, play back a presentation from a file in permanent storage locally or transmit it sequentially across a network. Container formats also generally provide metadata facilities that give additional information to users or help with common tasks such as seeking within a file for particular points in time or specific events.

2.1 History of Container Formats
The first widely used portable multimedia container format was IFF (Interchange File Format), which was a standard released by Electronic Arts (EA) in 1985 [23]. This introduced the basic idea of “chunks” within a file, each with its own type identifier in order to provide hints on how to decode it, such that complex files
containing different types of information could be seamlessly moved between machines and applications in a standard manner during development [24]. The IFF type solved the problems that require a standardised container format so well that it has served as the basis for most container formats since [24]. In particular, the standard gave as one of its main motivations the ability to add new types to the format without having to ask a central administration group. Extensibility and modularity are important if a format is to gain widespread acceptance.

Other formats based upon the original IFF specification include the RIFF (Resource Interchange File Format) released by IBM and Microsoft in 1991 [25]. The principal difference between IFF and RIFF is that multi-byte values are stored in x86 little-endian byte order rather than the big-endian order used by the Motorola 68000 that the original IFF format was designed to accommodate [23]. The commonly used AVI (Audio Video Interleave) audio-visual format and the WAV (short for wave) audio format are both derived from RIFF [25].

2.2 Multimedia Metadata
Given that the idea of chunks and chunk type identification is still a core idea in multimedia container formats, what is it which makes a format modern? The differentiating factor is generally in the amount and type of metadata each format can store. Metadata can be described as “data about data”, which may describe fundamental properties of that data such as the screen resolution of a video stream or the codec (coder/decoder) required to decode a particular data stream. This is in contrast with “content essence”, which is the multimedia material itself [26]. Important aspects of metadata implementations include the ability to add more metadata at a future date and, especially in a container format, to be able to decode a file even if the application does not understand particular parts of the associated metadata.

2.3 MPEG-4 Parts 12 and 14
The Moving Picture Experts Group (MPEG) is an ISO (International Organisation for Standardisation) working group put together to work on standards regarding video and audio [1]. They have previously overseen specification of the widely-used MPEG-1 and MPEG-2 standards regarding video and audio compression and decompression. The most recent standard in this series is the MPEG-4 specification [2]. In particular, this paper is interested in part 14 of the specification, which defines a container format based upon Apple Computer, Inc.’s QuickTime container format. Unfortunately, part 14 of the MPEG-4 standard is not freely available, so this paper will instead refer to part 12 [6], the ISO base media file format which forms the basis for part 14.

Also of interest is the MPEG-7 standard [3], which specifies a standard method for storing metadata for both generic and multimedia specific content. The MPEG-4 standard permits the inclusion of MPEG-7 metadata [4] instead of the OCI (Object Content Information) metadata type specified in the original standard [5].

2.4 Xiph.Org Ogg
Xiph.Org is an organisation formed to create and maintain freely available open standards for internet multimedia applications, including the encoding and distribution of video and audio data [7]. This includes the Ogg container format, which this paper will focus on. The Ogg Vorbis (audio) and Theora (video) codec standards provide information on packaging of data into multiplexed Ogg files [8, 9]; however, they do not provide metadata for features such as chapter indexing or subtitles. This is instead provided by Tobias Waldvogel’s Ogg Media (OGM) extension, which is provided as part of the Ogg software repository on the Xiph.Org website [7].

3. METHODOLOGY
The container formats will be compared on both technical merit and philosophical suitability for purpose on several criteria. They will be compared on their theoretical ability to be used as both streaming and distributable file formats, which will include a discussion on both their internal structure and support for different audio and video codecs. There will then be a discussion of the metadata each format supports natively. Philosophically, this paper will briefly discuss the licensing options for each format and the implications for their use.

3.1 Distribution and Streaming
The container formats aid in streaming of audio and visual data by providing a standard method of serial transportation that can be demultiplexed by the receiver in order to play back many different streams simultaneously. For example, a stream may consist of a video channel and several selectable audio channels, each in a different language, as well as subtitles. A similar principle is used to store the data as a file on physical storage media and reconstruct the streams for local playback, aiding distribution of multimedia. The comparison will be made on how the formats support this storage. This includes error correction and synchronisation support.

3.2 Support for Codecs
Different container files will have support for different audio and video codecs depending on their internal structure and available implementations. This section will compare the formats’ ability to carry different codecs and thus how easily they may be extended to take advantage of, for example, variable quality video and audio or future formats which have not yet been created. If an application can easily use a relatively codec-agnostic container format then the transportation and distribution layer can be easily decoupled from the multiplexing and decoding layer.

3.3 Metadata
Direct metadata can be used in container files for multimedia applications in order to provide the user
with information on details such as the track name and artist, one example of this being the ID3 tags which can be packaged with MPEG-1 Audio Layer III (MP3) audio files [11]. Metadata may also be used within the file format itself in order to index the contents and allow faster seeking of content or to provide information to applications so that they know which codecs to use to decode the data. Modern container formats allow advanced metadata that lets users quickly search for specific multimedia, for example melody information that can be used in a “query by humming” system [12]. The file formats’ methods of storing metadata will be discussed as well as the implications for applications and users.

3.4 Philosophical Considerations
The licensing options available to each format can dictate how they fare in different areas of the market. This paper will briefly discuss the philosophical issues surrounding the licensing of each of the formats.

4. DISTRIBUTION AND STREAMING

4.1 Xiph.Org Ogg
The technical evidence presented here comes from the official Xiph.Org Ogg bit stream container format specification, which is described in [13] and [14]. Ogg has been designed both for file storage on a local system and for streaming over, for example, a TCP connection. One of the major design goals was to be able to construct a complete stream without any seeking. This means that the files can be read or written in a single pass and makes it ideal for streaming applications such as internet radio.

Each instance of a decoding codec is responsible for a single logical bit stream. The logical bit streams consist of consecutively numbered pages (which contain the data within the stream). Each page must be uniquely numbered within the context of the physical bit stream. The physical bit stream is constructed from interleaved logical bit stream pages (Figure 1); it is this which forms the file or streamed data. Logical bit streams may be concatenated (also known as being chained) sequentially, so for example one audio stream may end and another begin immediately after. There are initial and terminating pages for each logical bit stream; a terminating page must be immediately followed by an initial page for the next stream if any more data is to be sent on that logical stream. Logical bit streams may also be multiplexed in parallel (known as grouping). Pages within a group must follow one another sequentially within the logical bit stream but may be interleaved in any order within the group itself. A group of logical bit streams must all begin simultaneously (send their initial page before any data is sent) and a new group may not begin until all logical bit streams within the previous group have ended.

FIGURE 1: Xiph.Org Ogg physical bit stream featuring multiplexed logical bit streams, with beginning and terminating pages [13].

The pages within the group contain the codec data in packets, each of which is split into consecutive segments of up to 255 bytes. This allows variable sized pages and packets with minimal processing to discover the beginning and end of each, since the total packet size may be deduced from the first segment within the page whose size is not 255 (a size of 0 must be given when the last segment in a packet is 255 bytes long).

The page header contains the sizes of each of the packet segments contained within. A maximum of 255 segments is placed on each page in order to prevent runaway streams being produced by corrupt data. Pages have a mechanism whereby a flag is set to say when a packet has been split over several pages. This means that error recovery in the case of corrupt packets may be easily performed, as the codec can easily pick up the start of the next packet and ignore the partial data within a page. The page header also contains CRC (Cyclic Redundancy Check) checksum data for individual pages in order to verify data integrity.

Ogg bit streams are thus built well for streaming purposes; since both the packet size for the codecs and the page size may vary, it is easy to store and send variable bit rate data, as the Ogg bit stream does not mandate a packet size (though the codec may). Logical bit streams containing audio and video data may be sent simultaneously at differing bit rates, since there is no obligation for the pages to appear at the same frequency within a group in the physical bit stream.

An Ogg stream provides information about what data it is sending when it provides the initial logical stream packets, meaning that an application can act accordingly. For example, if a subtitle logical stream has begun the application might optionally provide subtitles; however, if none is sent initially then it knows that no such support is required for this grouping of logical bit streams.

The overhead of page information is kept to a minimum, since the total size of a page may be deduced from its “lacing values” (the list of packet segment sizes).

4.2 MPEG-4 Part 12
All technical data in this subsection is from [6] unless otherwise specified.
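As a rough illustration of the box-structured layout that [6] defines, the following Python sketch walks a buffer of boxes. It assumes the basic header layout of the ISO base media file format (a 32-bit big-endian size followed by a four-character type, with a size of 1 signalling a 64-bit size field and a size of 0 meaning "to the end of the enclosing container"). The helper names `iter_boxes` and `make_box` and the sample data are hypothetical and exist only for this example; this is not a complete parser.

```python
import struct

def iter_boxes(data, start=0, end=None):
    """Yield (type, payload_start, payload_end) for each box in data[start:end]."""
    end = len(data) if end is None else end
    offset = start
    while offset + 8 <= end:
        # Every box starts with a 32-bit big-endian size and a four-character type.
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if offset + 16 > end:
                raise ValueError("truncated 64-bit box header at offset %d" % offset)
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of its container.
            size = end - offset
        if size < header or offset + size > end:
            raise ValueError("malformed box at offset %d" % offset)
        yield box_type.decode("latin-1"), offset + header, offset + size
        offset += size

def make_box(box_type, payload=b""):
    """Build a box with a 32-bit size field (hypothetical helper for the demo)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload

# Hypothetical sample: a file-type box followed by a movie box holding two tracks.
sample = make_box(b"ftyp", b"isom") + make_box(b"moov", make_box(b"trak") + make_box(b"trak"))

for name, body_start, body_end in iter_boxes(sample):
    print(name, body_end - body_start)
    if name == "moov":
        # Container boxes are parsed by recursing into their payload.
        for child, _, _ in iter_boxes(sample, body_start, body_end):
            print("  ", child)
```

Because a reader only needs the size and type to step over a box, an application can skip box types it does not understand, which is the same property the original IFF chunk design aimed for.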
The MPEG-4 part 12 ISO base media file format is hierarchical in nature and can be much more complex than an Ogg bit stream. It consists of objects known as “boxes” (or “atoms”), whose structure is inferred from their type (given by a four character code, Figure 2).

FIGURE 2: MPEG-4 audio visual file with hint track ready for streaming [6].

Boxes may contain other boxes, for example the Movie Box (“moov”), which contains metadata boxes for playback of the tracks (“trak” boxes). Whereas in an Ogg bit stream the logical stream types are defined by the Ogg derived format in use and the initial pages given by the stream group, in an MPEG-4 file the types are given by the “moov” box (of which there can be only one in a file) and its children.

Network streaming support is provided indirectly by another track format known as the hint track. This may be contained as a track with data within the MPEG-4 file itself or added using a “hinter” tool before transmission. MPEG-4 may be transmitted over RTP (the Real-time Transport Protocol) using a variety of different encoding techniques based on the flexibility of the box structure [15]. However, the variety of types of media that may be present within the MPEG-4 file means that it is more complex to partition and order the data in order to stream across a network [15]. Ogg is specifically built such that the overhead provided by the headers scales with the size of the packets, whereas the size data provided for a raw MPEG-4 box is constant.

In general the MPEG-4 Part 12 format is a lot more flexible than the Ogg container. It specifies edit lists that allow the data within the file to be out of temporal order, whereas the Ogg standard specifies that the logical bit stream pages must be time-ordered [14]. This facilitates easier editing in place of MPEG-4 file contents than Ogg provides and means it is potentially much more useful than Ogg as an interchange format during development due to the larger variety of data it may hold as standard. This data may also be much richer, the implications of which are described later on. In addition, MPEG-4 may reference data outside of the file itself, which Ogg has no native support for. This aids content creation, as this media data need not be embedded in the file until distribution.

5. SUPPORT FOR CODECS

5.1 Xiph.Org Ogg
The Ogg native audio format is Vorbis. Vorbis itself is specified as a container-agnostic format and may be encoded at a variable bit rate. It is designed such that the least important information is contained at the end of the packets, meaning that they may be truncated on demand to make more efficient use of bandwidth at the expense of audio quality [8]. Given the complexity of this native format and the flexibility of the Ogg container format, it is reasonable to assume that the Ogg format is flexible enough to contain any audio or video codec which contains data in continuous temporal order. This is backed up by the Ogg Theora codec, the native Ogg format for video, whose specification suggests interleaving various kinds of audio and video data as a part of the standard [9].

5.2 MPEG-4 Part 12
The MPEG-4 suite of standards contains specifications for the H.264/AVC (Advanced Video Coding) video standard (part 10) [16] and natural audio coding which covers the range from “16 kbit/s per channel up to bit-rates higher than 64 kbit/s per channel” [17]. This provides the quality for encoding many different kinds of audio from speech quality to CD quality and variable bit rate video to HDTV quality, given sufficient processing power to decode the large data rate in real-time [18]. Thus it could be argued that the MPEG-4 standards for video and audio data rates are sufficient by themselves to account for the data rate required by new and interesting formats, and that there is not as much need for extensibility within MPEG-4 as there may be in Ogg. Ogg’s Theora is based upon On2’s VP3 format and is more suited to low bit rate streaming video, whereas MPEG-4 native video is well suited to variable bit rate distribution [19].

MPEG-4 provides the MSDL [20] (MPEG-4 Systems and Description Languages) in order to define new objects. The extensibility of the format means that these could be easily included in the format itself. However, compared with Ogg this would be a relatively lengthy process given an existing container-independent implementation: the Ogg format wrapper would almost write itself, due to the flexibility of Ogg’s packet and page mechanism.
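To make the packet and page mechanism referred to above more concrete, the following Python sketch shows how the lacing values described in Section 4.1 encode packet sizes. It is a minimal illustration assuming only the rule stated there (segments of 255 bytes, terminated by a segment shorter than 255, with a 0 added when the packet length is an exact multiple of 255); the function names are hypothetical and page headers, the 255-segment page limit and CRCs are ignored.

```python
def lacing_values(packet_len):
    """Segment sizes for one packet: full 255-byte segments, then a final one below 255."""
    full, last = divmod(packet_len, 255)
    # The final value is 0 when the packet length is an exact multiple of 255.
    return [255] * full + [last]

def packet_lengths(lacing):
    """Recover packet lengths from a list of lacing values.

    Returns (complete_lengths, pending), where pending is the number of bytes
    belonging to a packet that has not yet been terminated (it would continue
    on a following page).
    """
    lengths, current = [], 0
    for value in lacing:
        current += value
        if value < 255:
            # A value below 255 marks the end of a packet.
            lengths.append(current)
            current = 0
    return lengths, current

# Hypothetical packets of 100, 510 and 0 bytes laid out back to back.
table = lacing_values(100) + lacing_values(510) + lacing_values(0)
print(table)
print(packet_lengths(table))
```

A codec wrapper therefore only has to hand the container a sequence of packet lengths; the segment table carries everything a reader needs to find the packet boundaries again.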
However, in MPEG-4 the new format would have to have an object specifically written for it in order to be usable within the MPEG-4 file itself.

6. METADATA

6.1 Xiph.Org Ogg
The generic Ogg container format does not itself declare any form of metadata beyond that provided by the initial stream tags in the Ogg stream (and this is implied information based on file type rather than concrete metadata) [27]. However, the Ogg Vorbis audio codec format specifies a header for the Ogg stream which contains simple metadata such as artist name and track title. It also suggests that arbitrary metadata associated with the streams within the Ogg file should be given its own logical bit stream based on XML or a similar information declaration technology [8].

6.2 MPEG-4 Part 12
The MPEG-4 part 12 specification gives two possible choices for attaching arbitrary metadata to a stream. The first is the original Object Content Information descriptors, which may be attached to specific objects within the file or carried as a stream attached to an object, much like a track, for information which changes over time (subtitles, for example) [5].

A much better method of attaching metadata to files is to use the MPEG-7 framework, provision for which has already been included in the MPEG-4 standard. MPEG-7 can provide all of the functionality of OCI and much more [6]. MPEG-7 uses XML based data structures to store information about audio/visual data within scenes. It is designed to be wide-ranging and extensible and provides standards for, for example, visual objects’ colour, shape and location within a scene or audio data’s melodic signature, instrumental timbre or seeking data provided by generic indexing [3].

Applications of MPEG-7 data include search within MPEG-4 data files [28], representation of metadata regarding internet streams [29] and storing the trajectory of objects within scenes such as sports videos [30]. This shows that the MPEG-4 format is ready to use MPEG-7 descriptor data in many different applications and research areas and, while there is technically nothing preventing someone from using MPEG-7 data or XML streams within an Ogg file, there is little reason to do so if access to MPEG-4 technology is available. The availability of papers on MPEG-4 and MPEG-7 research topics also shows that they are widespread within the academic community as a standard for research purposes.

7. PHILOSOPHICAL CONSIDERATIONS
The Ogg container format and other Xiph.Org software specifications and products are completely free for use and free from patents [7]. This provides a major advantage over MPEG-4 in terms of cost, since using the MPEG-4 suite of tools requires permission from the patent holders and possibly payment of royalties. This is a tricky area, since there are many patent holders who have contributed to MPEG-4 and no central point of contact [31]. All of this means that, although MPEG-4 will become (and likely already has become) a research and marketplace standard format for audio/visual data, open formats such as Ogg will always be around owing to pressure from the community for open formats.

8. CONCLUSIONS
This paper has described the technical merits of the open Xiph.Org Ogg container format standard and the industry standard MPEG-4 (in particular parts 12 and 14, which describe its container format). It has shown how both are able to store their data on disk and process it sequentially for network streaming. Ogg seems to have a slight edge on that front, being more lightweight and streamlined, whereas MPEG-4 is less flexible when it comes to storage and streaming. Ogg uses a simple sequential stream whilst MPEG-4 utilises a hierarchy of objects. This makes MPEG-4 better for editing than Ogg, since the component objects may be edited in place rather than having to rewrite the whole file.

Likewise, Ogg’s lightweight, flexible specification seems to lend itself well to extension to new media formats. However, MPEG-4 provides the tools for formally specifying new objects in a standard fashion, which would have to be performed externally to Ogg to achieve the same effect. MPEG-4 has much more built-in support for metadata than Ogg, since it is able to easily take advantage of MPEG-7.

The biggest advantage of Ogg appears to be that it is a free and open standard as well as flexible and lightweight, whereas MPEG-4 is designed by industry experts in order to tackle many tasks in addition to simple streaming and synchronisation.

9. REFERENCES
[1] Motion Picture Expert Group (MPEG) Achievements. htm last accessed 6th November 2006.
[2] Koenen, R. (2002). Overview of the MPEG-4 Standard, N4668, ISO/IEC JTC1/SC29/WG11.
[3] Martínez, J. M. (2004) MPEG-7 Overview, N6828, ISO/IEC JTC1/SC29/WG11.
[4] MPEG-4 File Format, Version 2. dd000155.shtml last accessed 6th November 2006.
[5] Motion Pictures Expert Group (2006) Introduction to MPEG-4 Object Content Information, N8148, ISO/IEC JTC1/SC29/WG11.
[6] ISO/IEC 14496-12:2005(E); Information technology - Coding of audio-visual objects Part 12: ISO base media file format. ndards/c041828_ISO_IEC_14496-12_2005(E).zip last accessed 7th November 2006.
[7] About Xiph. last accessed 17th November 2006.
[8] Vorbis I specification. ml last accessed 17th November 2006.
[9] Theora I Specification. df last accessed 17th November 2006.
[10] Supported codecs and format of their CodecPrivate blocks. last accessed 17th November 2006.
[11] Diepold, K., Pereira, F., Chang W. (2005) MPEG-A: multimedia application formats. Multimedia, IEEE; Vol. 12, no. 4; pp. 34-41.
[12] Quackenbush, S., Lindsay, A. (2001) Overview of MPEG-7 audio. IEEE Transactions on Circuits and Systems for Video Technology; Vol. 11, issue 6; pp. 725-729.
[13] Ogg logical and physical bit stream overview. last accessed on 17th November 2006.
[14] Ogg logical bit stream framing. last accessed on 17th November 2006.
[15] Basso, A., Varakliotis, S. (2000) Transport of MPEG-4 over IP/RTP. IEEE International Conference on Multimedia and Expo, 2000; Vol. 2; pp. 1067-1070.
[16] Wiegand, T., Sullivan, G.J., Bjøntegaard, G., Luthra, A. (2003) IEEE Transactions on Circuits and Systems for Video Technology; Vol. 13, issue 7; pp. 560-576.
[17] Brandenburg, K., Kunz, O., Sugiyama, A. (2000) MPEG-4 natural audio coding. Signal Processing: Image Communication; Vol. 15, Issues 4-5; pp. 423-444.
[18] Moseler, K., Fang, J. (2000) Real-time Performance Analysis of MPEG-4 Systems. Proceedings of the 43rd IEEE Midwest Symposium on Circuits and Systems; Vol. 3; pp. 1274-1277.
[19] Radha, H., Chen, Y., Parthasarathy, K., Cohen, R. (1999) Scalable Internet Video Using MPEG-4. Signal Processing: Image Communication; Vol. 15; pp. 95-126.
[20] Eleftheriadis, A. (1997) The MPEG-4 system and description languages: from practice to theory. Proceedings of 1997 IEEE International Symposium on Circuits and Systems; Vol. 2; pp. 1480-1483.
[21] Microsoft’s Advanced Systems Format (ASF) Specification. dia/forpros/format/asfspec.aspx last accessed on November 17th 2006.
[22] Matroska File Format. technical/specs/matroska.pdf last accessed on November 17th 2006.
[23] Morrison, J. (1985), Standard for Interchange Format Files. Electronic Arts. last accessed 17th November 2006.
[24] Seebach, P. (2006) Standards and specs: The Interchange File Format (IFF). library/pa-spec16/?ca=dgr-lnxw07IFF last accessed 17th November 2006.
[25] Resource Interchange File Format. last accessed 17th November 2006.
[26] Wilkinson, J. H., Morgan, O. F. (1997) International Broadcasting Convention; Vol. 1, Issue 447; pp. 374-379.
[27] RFC 3533: The Ogg Encapsulation Format Version 0. notes/rfc3533.txt last viewed on 17th November 2006.
[28] Ki, M., Kim, K. (2006) MPEG-7 over MPEG-4 Systems Decoder for Using Metadata. International Conference on Consumer Electronics, 2006. 2006 Digest of Technical Papers; pp. 245-246.
[29] Rehm, E. (2000) Representing internet streaming media metadata using MPEG-7 multimedia description schemes. Proceedings of the 2000 ACM workshops on Multimedia; pp. 93-98.
[30] Haoran, Y., Rajan, D., Liang-Tien, C. (2003) Automatic Generation of MPEG-7 Compliant XML Document for Motion Trajectory Descriptor in Sports Video. Proceedings of the 1st ACM international workshop on Multimedia databases; pp. 10-17.
[31] MPEG Licensing Information (2005). last accessed 17th November 2006.