containing different types of information could be seamlessly moved between machines and applications in a standard manner during development. The IFF format solved the problems that call for a standardised container format so well that it has served as the basis for most container formats since. In particular, the standard gave as one of its main motivations the ability to add new types to the format without having to ask a central administration group. Extensibility and modularity are important if a format is to gain widespread acceptance.

Other formats based upon the original IFF specification include RIFF (Resource Interchange File Format), released by IBM and Microsoft in 1991. The only difference between IFF and RIFF is that RIFF stores its bytes in x86 little-endian order rather than the big-endian order used by the Motorola 68000 that the original IFF format was designed to accommodate. The commonly used AVI (Audio Video Interleave) audio-visual format and the WAV (short for wave) audio format are both derived from RIFF.

2.2 Multimedia Metadata

Given that the idea of chunks and chunk type identification is still a core idea in multimedia container formats, what is it that makes a format modern? The differentiating factor is generally the amount and type of metadata each format can store. Metadata can be described as “data about data”, which may describe fundamental properties of that data such as the screen resolution of a video stream or the codec (coder/decoder) required to decode a particular data stream. This is in contrast with “content essence”, which is the multimedia material itself. Important aspects of metadata implementations include the ability to add more metadata at a future date and, especially in a container format, the ability to decode a file even if the application does not understand particular parts of the associated metadata.

2.3 MPEG-4 Parts 12 and 14

The Moving Picture Experts Group (MPEG) is an ISO (International Organisation for Standardisation) working group put together to work on standards regarding video and audio. They have previously overseen specification of the widely-used MPEG-1 and MPEG-2 standards regarding video and audio compression and decompression. The most recent standard in this series is the MPEG-4 specification. In particular, this paper is interested in part 14 of the specification, which defines a container format based upon Apple Computer, Inc.’s QuickTime container format. Unfortunately, part 14 of the MPEG-4 standard is not freely available, so this paper will instead refer to part 12, the ISO base media format which forms the basis for part 14.

Also of interest is the MPEG-7 standard, which specifies a standard method for storing metadata for both generic and multimedia specific content. The MPEG-4 standard permits the inclusion of MPEG-7 metadata instead of the OCI (Object Content Information) metadata type specified in the original standard.

2.4 Xiph.Org Ogg

Xiph.Org is an organisation formed to create and maintain freely available open standards for internet multimedia applications, including the encoding and distribution of video and audio data. This includes the Ogg container format, which this paper will focus on. The Ogg Vorbis (audio) and Theora (video) codec standards provide information on packaging of data into multiplexed Ogg files [8, 9]; however, they do not provide metadata for features such as chapter indexing or subtitles. This is instead provided by Tobias Waldvogel’s Ogg Media (OGM) extension, which is provided as part of the Ogg software repository on the Xiph.Org website.

3. METHODOLOGY

The container formats will be compared on both technical merit and philosophical suitability for purpose on several criteria. They will be compared on their theoretical ability to be used as both streaming and distributable file formats, which will include a discussion of both their internal structure and their support for different audio and video codecs. There will then be a discussion of the metadata each format supports natively. Philosophically, this paper will briefly discuss the licensing options for each format and the implications for their use.

3.1 Distribution and Streaming

The container formats aid in streaming of audio and visual data by providing a standard method of serial transportation that can be demultiplexed by the receiver in order to play back many different streams simultaneously. For example, a stream may consist of a video channel and several selectable audio channels, each in a different language, as well as subtitles. A similar principle is used to store the data as a file on physical storage media and reconstruct the streams for local playback, aiding distribution of multimedia. The comparison will be made on how the formats support this storage. This includes error correction and synchronisation support.

3.2 Support for Codecs

Different container files will have support for different audio and video codecs depending on their internal structure and available implementations. The comparison will cover the formats’ ability to carry different codecs and thus how easily they may be extended to take advantage of, for example, variable quality video and audio or future formats which have not yet been created. If an application may easily use a relatively codec-agnostic container format then the transportation and distribution layer can be easily decoupled from the multiplexing and decoding layer.

3.3 Metadata

Direct metadata can be used in container files for multimedia applications in order to provide the user
FIGURE 1: Xiph.Org Ogg physical bit stream featuring multiplexed logical bit streams, with beginning and terminating pages.
with information on details such as the track name and artist, one example of this being the ID3 tags which can be packaged with MPEG layer III (MP3) audio files. Metadata may also be used within the file format itself in order to index the contents and allow faster seeking of content, or to provide information to applications so that they know which codecs to use to decode the data. Modern container formats allow advanced metadata that lets users search quickly for specific multimedia, for example melody information that can be used in a “query by humming” system. The file formats’ methods of storing metadata will be discussed as well as the implications for applications and users.

3.4 Philosophical Considerations

The licensing options available to each format can dictate how they fare in different areas of the market. This paper will briefly discuss the philosophical issues surrounding the licensing of each of the formats.

4. DISTRIBUTION AND STREAMING

4.1 Xiph.Org Ogg

The technical evidence presented here comes from the official Xiph.Org Ogg bit stream container format specification. Ogg has been designed both for file storage on a local system and for streaming over, for example, a TCP connection. One of the major design goals was to be able to construct a complete stream without any seeking. This means that the files can be read or written in a single pass and makes the format ideal for streaming applications such as internet radio.

Each instance of a decoding codec is responsible for a single logical bit stream. The logical bit streams consist of consecutively numbered pages (which contain the data within the stream). Each page must be uniquely numbered within the context of the physical bit stream. The physical bit stream is constructed from interleaved logical bit stream pages (Figure 1); it is this which forms the file or streamed data. Logical bit streams may be concatenated (also known as being chained) sequentially, so for example one audio stream may end and another begin immediately after. There are initial and terminating pages for each logical bit stream; a terminating page must be immediately followed by an initial page for the next stream if any more data is to be sent on that logical stream.

Logical bit streams may also be multiplexed in parallel (known as grouping). Pages within a group must follow one another sequentially within the logical bit stream but may be interleaved in any order within the group itself. A group of logical bit streams must all begin simultaneously (send their initial page before any data is sent) and a new group may not begin until all logical bit streams within the previous group have ended.

The pages within the group contain the codec data in packets, each of which is split into consecutive segments of at most 255 bytes. This allows variable sized pages and packets with minimal processing to discover the beginning and end of each, since the end of a packet is marked by the first segment within the page that is shorter than 255 bytes (a segment of size 0 must be given when the last segment in a packet is exactly 255 bytes).

The page header contains the sizes of each of the packet segments contained within. A maximum of 255 segments per page is imposed in order to prevent corrupt data from producing runaway streams. Pages have a mechanism whereby a flag is set to say when a packet has been split over several pages. This means that error recovery in the case of corrupt packets may be easily performed, as the codec can easily pick up the start of the next packet and ignore the partial data within a page. The page header also contains CRC (Cyclic Redundancy Check) checksum data for individual pages in order to verify data integrity.

Ogg bit streams are thus built well for streaming purposes. Since the packet size for the codecs and the page size may both vary, it is easy to store and send variable bit rate data, as the Ogg bit stream does not mandate a packet size (though the codec may). Logical bit streams containing audio and video data may be sent simultaneously at differing bit rates, since there is no obligation for the pages to appear at the same frequency within a group in the physical bit stream.

An Ogg stream provides information about what data it is sending when it provides the initial logical stream packets, meaning that an application can act accordingly. For example, if a subtitle logical stream has begun the application might optionally provide subtitles; however, if none is sent initially then it knows that no such support is required for this grouping of logical bit streams.

The overhead of page information is kept to a minimum, since the total size of a page may be deduced from its “lacing values” (the list of packet segment sizes).

4.2 MPEG-4 Part 12

All technical data in this subsection is from the MPEG-4 part 12 specification unless otherwise specified.
FIGURE 2: MPEG-4 audio visual file with hint track ready for streaming.
The MPEG-4 part 12 ISO base file format is hierarchical in nature and can be much more complex than an Ogg bit stream. It consists of objects known as “boxes” (or “atoms”), whose structure is inferred from their type (given by a four character code, Figure 2).

Boxes may contain other boxes, for example the Movie Box (“moov”), which contains metadata boxes for playback of the tracks (“trak” boxes). Whereas in an Ogg bit stream the logical stream types are defined by the Ogg derived format in use and the initial pages given by the stream group, in an MPEG-4 file the types are given by the “moov” box (of which there can be only one in a file) and its children.

Network streaming support is provided indirectly by another track format known as the hint track. This may be contained as a track with data within the MPEG-4 file itself or added using a “hinter” tool before transmission. MPEG-4 may be transmitted over RTP (the Real-time Transport Protocol) using a variety of different encoding techniques based on the flexibility of the box structure. However, the variety of types of media that may be present within the MPEG-4 file means that it is more complex to partition and order the data in order to stream across a network. Ogg is specifically built such that the overhead provided by the headers scales with the size of the packets, whereas the size data provided for a raw MPEG-4 box is constant.

In general the MPEG-4 Part 12 format is a lot more flexible than the Ogg container. It specifies edit lists that allow the data within the file to be out of temporal order, whereas the Ogg standard specifies that the logical bit stream pages must be time-ordered. This facilitates easier editing in place of MPEG-4 file contents than Ogg provides and means it is potentially much more useful than Ogg as an interchange format during development, due to the larger variety of data it may hold as standard. This data may also be much richer, the implications of which are described later on. In addition, MPEG-4 may reference data outside of the file itself, which Ogg has no native support for. This aids content creation, as this media data need not be embedded in the file until distribution.

5. SUPPORT FOR CODECS

5.1 Xiph.Org Ogg

The Ogg native audio format is Vorbis. Vorbis itself is specified as a container-agnostic format and may be encoded at a variable bit rate. It is designed such that the least important information is contained at the end of the packets, meaning that they may be truncated on demand to make more efficient use of bandwidth at the expense of audio quality. Given the complexity of this native format and the flexibility of the Ogg container format, it is reasonable to assume that the Ogg format is flexible enough to contain any audio or video codec which contains data in continuous temporal order. This is backed up by the Ogg Theora codec, the native Ogg format for video, whose specification suggests interleaving various kinds of audio and video data as a part of the standard.

5.2 MPEG-4 Part 12

The MPEG-4 suite of standards contains specifications for the H.264/AVC (Advanced Video Coding) video standard (part 10) and natural audio coding which covers the range from “16 kbit/s per channel up to bit-rates higher than 64 kbit/s per channel”. This is sufficient to encode many different kinds of audio, from speech quality to CD quality, and variable bit rate video up to HDTV quality, given sufficient processing power to decode the large data rate in real-time. Thus it could be argued that the MPEG-4 standards for video and audio data rates are sufficient by themselves to account for the data rate required by new and interesting formats, and that there is not as much need for extensibility within MPEG-4 as there may be in Ogg. Ogg’s Theora is based upon On2’s VP3 format and is more suited to low bit rate streaming video, whereas MPEG-4 native video is well suited to variable bit rate distribution.

MPEG-4 provides the MSDL (MPEG-4 Systems and Description Languages) in order to define new objects. The extensibility of the format means that these could be easily included in the format itself. However, compared with Ogg this would be a relatively lengthy process: given an existing container-independent implementation, the Ogg format wrapper would almost write itself, due to the flexibility of Ogg’s packet and page mechanism.
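As an illustration of why wrapping a codec's packets in Ogg is comparatively simple, the following is a minimal sketch of the segment-size ("lacing value") rule described in section 4: a packet is recorded as a run of 255-byte segments ended by one shorter segment, with an explicit 0 when the packet length is an exact multiple of 255. The function names lacing_values and packet_lengths are hypothetical and are not taken from any Ogg library. The sketch assumes all segments fit in one page and ignores the 255-segment page limit, the continuation flag and the CRC.

```python
from __future__ import annotations


def lacing_values(packet_len: int) -> list[int]:
    """Return the segment sizes (lacing values) describing one packet."""
    if packet_len < 0:
        raise ValueError("packet length must be non-negative")
    full, rest = divmod(packet_len, 255)
    # Each full segment is recorded as 255. The last value is always below
    # 255, so a packet whose length is a multiple of 255 ends with a 0.
    return [255] * full + [rest]


def packet_lengths(lacing: list[int]) -> list[int]:
    """Recover the packet lengths from a list of lacing values."""
    lengths: list[int] = []
    current = 0
    for value in lacing:
        current += value
        if value < 255:  # a segment shorter than 255 bytes ends the packet
            lengths.append(current)
            current = 0
    # A trailing run of 255s (packet continued on a later page) is not
    # reported here; handling it is outside the scope of this sketch.
    return lengths
```

Under these assumptions a wrapper only needs the length of each codec packet, which is consistent with the claim that the container places few demands on the codec.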
However, in MPEG-4 the new format would have to have an object specifically written for it in order to be usable within the MPEG-4 file itself.

6. METADATA

6.1 Xiph.Org Ogg

The generic Ogg container format does not itself declare any form of metadata beyond that provided by the initial stream tags in the Ogg stream (and this is implied information based on file type rather than concrete metadata). However, the Ogg audio Vorbis codec format specifies a header for the Ogg stream which contains simple metadata such as artist name and track title. It also suggests that arbitrary metadata associated with the streams within the Ogg file should be given its own logical bit stream based on XML or a similar information declaration technology.

6.2 MPEG-4 Part 12

The MPEG-4 part 12 specification gives two possible choices for attaching arbitrary metadata to a stream using the original Object Content Information: as descriptors attached to specific objects within the file, or as a stream attached to an object, much like a track, for information which changes over time (subtitles, for example).

A much better method of attaching metadata to files is to use the MPEG-7 framework, provision for which has already been included in the MPEG-4 standard. MPEG-7 can provide all of the functionality of OCI and much more. MPEG-7 uses XML based data structures to store information about audio/visual data within scenes. It is designed to be wide-ranging and extensible and provides standards for, for example, visual objects’ colour, shape and location within a scene, or audio data’s melodic signature, instrumental timbre or seeking data provided by generic indexing.

Applications of MPEG-7 data include search within MPEG-4 data files, representation of metadata regarding internet streams and storing the trajectory of objects within scenes such as sports videos. This shows that the MPEG-4 format is ready to use MPEG-7 descriptor data in many different applications and research areas. While there is technically nothing preventing someone from using MPEG-7 data or XML streams within an Ogg file, there is little reason to do so if access to MPEG-4 technology is available. The availability of papers on MPEG-4 and MPEG-7 research topics also shows that they are widespread within the academic community as a standard for research purposes.

7. PHILOSOPHICAL CONSIDERATIONS

The Ogg container format and other Xiph.Org software specifications and products are completely free for use and free from patents. This provides a major advantage over MPEG-4 in terms of cost, since using the MPEG-4 suite of tools requires permission from the patent holders and possibly payment of royalties. This is a tricky area, since there are many patent holders who have contributed to MPEG-4 and no central point of contact. All of this means that, although MPEG-4 will become (and likely already has become) a research and marketplace standard format for audio/visual data, open formats such as Ogg will always be around owing to pressure from the community for open formats.

8. CONCLUSIONS

This paper has described the technical merits of the open Xiph.Org Ogg container format standard and the industry standard MPEG-4 (in particular parts 12 and 14, which describe its container format). It has shown how both are able to store their data on disk and process it sequentially for network streaming. Ogg seems to have a slight edge on that front, being more lightweight and streamlined, whereas MPEG-4 is less flexible when it comes to storage and streaming. Ogg uses a simple sequential stream whilst MPEG-4 utilises a hierarchy of objects. This makes MPEG-4 better for editing than Ogg, since the component objects may be edited in place rather than having to rewrite the whole file.

Likewise, Ogg’s lightweight, flexible specification seems to lend itself well to extension to new media formats. However, MPEG-4 provides the tools for formally specifying new objects in a standard fashion, which would have to be performed externally to Ogg to achieve the same effect. MPEG-4 has much more built-in support for metadata than Ogg, since it is able to easily take advantage of MPEG-7.

The biggest advantage of Ogg appears to be that it is a free and open standard as well as flexible and lightweight, whereas MPEG-4 is designed by industry experts in order to tackle many tasks in addition to simple streaming and synchronisation.

9. REFERENCES

[1] Motion Picture Expert Group (MPEG) Achievements. http://www.chiariglione.org/mpeg/achievements.htm last accessed 6th November 2006.

[2] Koenen, R. (2002) Overview of the MPEG-4 Standard, N4668, ISO/IEC JTC1/SC29/WG11.

[3] Martínez, J. M. (2004) MPEG-7 Overview, N6828, ISO/IEC JTC1/SC29/WG11.

[4] MPEG-4 File Format, Version 2. http://www.digitalpreservation.gov/formats/fdd/fdd000155.shtml last accessed 6th November 2006.

[5] Motion Pictures Expert Group (2006) Introduction to MPEG-4 Object Content Information, N8148, ISO/IEC JTC1/SC29/WG11.

[6] ISO/IEC 14496-12:2005(E); Information technology - Coding of audio-visual objects Part 12: ISO base media file format. http://standards.iso.org/ittf/PubliclyAvailableStandards/c041828_ISO_IEC_14496-12_2005(E).zip last accessed 7th November 2006.

[7] About Xiph. http://www.xiph.org/about/ last accessed 17th November 2006.
[8] Vorbis I specification. http://www.xiph.org/vorbis/doc/Vorbis_I_spec.html last accessed 17th November 2006.

[9] Theora I Specification. df last accessed 17th November 2006.

[10] Supported codecs and format of their CodecPrivate blocks. http://haali.cs.msu.ru/mkv/codecs.pdf last accessed 17th November 2006.

[11] Diepold, K., Pereira, F., Chang W. (2005) MPEG-A: multimedia application formats. Multimedia, IEEE; Vol. 12, no. 4; pp. 34-41.

[12] Quackenbush, S., Lindsay, A. (2001) Overview of MPEG-7 audio. IEEE Transactions on Circuits and Systems for Video Technology; Vol. 11, issue 6; pp. 725-729.

[13] Ogg logical and physical bit stream overview. last accessed on 17th November 2006.

[14] Ogg logical bit stream framing. accessed on 17th November 2006.

[15] Basso, A., Varakliotis, S. (2000) Transport of MPEG-4 over IP/RTP. IEEE International Conference on Multimedia and Expo, 2000; Vol. 2; pp. 1067-1070.

[16] Wiegand, T., Sullivan, G.J., Bjøntegaard, G., Luthra, A. (2003) IEEE Transactions on Circuits and Systems for Video Technology; Vol. 13, issue 7; pp. 560-576.

[17] Brandenburg, K., Kunz, O., Sugiyama, A. (2000) MPEG-4 natural audio coding. Signal Processing: Image Communication; Vol. 15, Issues 4-5; pp. 423-444.

[18] Moseler, K., Fang, J. (2000) Real-time Performance Analysis of MPEG-4 Systems. Proceedings of the 43rd IEEE Midwest Symposium on Circuits and Systems; Vol. 3; pp. 1274-1277.

[19] Radha, H., Chen, Y., Parthasarathy, K., Cohen, R. (1999) Scalable Internet Video Using MPEG-4. Signal Processing: Image Communication; Vol. 15; pp. 95-126.

[20] Eleftheriadis, A. (1997) The MPEG-4 system and description languages: from practice to theory. Proceedings of 1997 IEEE International Symposium on Circuits and Systems; Vol. 2; pp. 1480-1483.

[21] Microsoft’s Advanced Systems Format (ASF) Specification. dia/forpros/format/asfspec.aspx last accessed 17th November 2006.

[22] Matroska File Format. http://www.matroska.org/technical/specs/matroska.pdf last accessed 17th November 2006.

[23] Morrison, J. (1985) Standard for Interchange Format Files. Electronic Arts. http://www.szonye.com/bradd/iff.html last accessed 17th November 2006.

[24] Seebach, P. (2006) Standards and specs: The Interchange File Format (IFF). http://www-128.ibm.com/developerworks/power/ accessed 17th November 2006.

[25] Resource Interchange File Format. http://en.wikipedia.org/wiki/RIFF last accessed 17th November 2006.

[26] Wilkinson, J. H., Morgan, O. F. (1997) International Broadcasting Convention; Vol. 1, Issue 447; pp. 374-379.

[27] RFC 3533: The Ogg Encapsulation Format Version 0. ftp://ftp.rfc-editor.org/in-notes/rfc3533.txt last viewed on 17th November 2006.

[28] Ki, M., Kim, K. (2006) MPEG-7 over MPEG-4 Systems Decoder for Using Metadata. International Conference on Consumer Electronics, 2006. 2006 Digest of Technical Papers; pp. 245-246.

[29] Rehm, E. (2000) Representing internet streaming media metadata using MPEG-7 multimedia description schemes. Proceedings of the 2000 ACM workshops on Multimedia; pp. 93-98.

[30] Haoran, Y., Rajan, D., Liang-Tien, C. (2003) Automatic Generation of MPEG-7 Compliant XML Document for Motion Trajectory Descriptor in Sports Video. Proceedings of the 1st ACM international workshop on Multimedia databases; pp. 10-17.

[31] MPEG Licensing Information (2005). http://www.mpegif.org/patents/index.php last accessed 17th November 2006.