CMS White Paper

Orange Labs
research & developement white paper
France Telecom Group confidential
CMS Ecosystem & Evolutions
____________________________________________________________________
version 14/10/2010
Research Object "Entertainment and Content Distribution”
authors
Antonio González (Orange iLabs Spain) antonio.gonzalezm@orange-ftgroup.com
Pablo Paredes (Orange iLabs Spain) pablo.paredes@orange-ftgroup.com
contributors
Yves Scotto (Orange RD SIRP) yves.scotto@orange-ftgroup.com
Thomas Labbe (Orange RD MAPS) thomas.labbe@orange-ftgroup.com
Vincent Mahe (Orange RD TECH) v.mahe@orange-ftgroup.com

white paper Page 2/52
France Telecom Group
TABLE OF CONTENTS
1 EXECUTIVE SUMMARY ....................................................................................................................3
2 AIM OF THIS DOCUMENT.................................................................................................................7
3 PRESENTATION OF CMS .................................................................................................................8
3.1 DEFINITIONS..........................................................................................................................................................8
3.2 THE CONTENT LIFE CYCLE..................................................................................................................................... 10
3.3 THE PROCESS AND ROLES .................................................................................................................................... 19
3.4 OPERATIONAL ISSUES .......................................................................................................................................... 20
3.5 CMS WEB VS CMS VIDEO ................................................................................................................................... 22
3.6 WEB 2.0.............................................................................................................................................................. 25
3.7 PRINCIPAL FEATURES OF A CMS........................................................................................................................... 29
4 CMS PROVIDERS ............................................................................................................................33
4.1 EZ PUBLISH......................................................................................................................................................... 34
4.2 DRUPAL .............................................................................................................................................................. 35
4.3 JOOMLA .............................................................................................................................................................. 36
4.4 TYPO3 ................................................................................................................................................................ 36
4.5 BESTV................................................................................................................................................................ 37
4.6 COMPARISON ...................................................................................................................................................... 38
5 EVOLUTION & TRENDS ..................................................................................................................43
5.1 TOOL IMPROVEMENTS AND PRODUCTIVITY ............................................................................................................. 43
5.2 WEB 3.0.............................................................................................................................................................. 44
5.3 MULTICHANNEL SERVICES .................................................................................................................................... 49
6 APPENDIX 1: LIST OF ACRONYMS...............................................................................................52

1 EXECUTIVE SUMMARY
It is difficult to provide a straight, global definition of a CMS, because CMS can serve different purposes, and
the expectations of the reader may be influenced by a familiar example or a known implementation. CMS can
be used as simple content repositories, or as enabling tools in order to produce portals, or as service
platforms for high-scale deployment of multimedia content. In order to be generalistic, we suggest the support
of the content life cycle as the main mission of a CMS, and, using this storyline, we explain which are the
capabilities required and expected depending on different scenarios and usages.
In most cases, such as simple portal enablers, the complete functionalities of CMS can be found in
standalone packages. This is the case of Joomla and eZ Publish, as examples of Open Source products, or
SED and Magic, as examples of internal FT Group solutions. But it is particularly challenging to identify and
isolate the perimeter of CMS and their native functions in wide, complex, multidisciplinar arrangements, as it is
the case of IPTV service platforms, for instance. If we look at them from 10.000m, we can account for the
whole catalog of CMS functions that support the content life cycle: content ingestion, content catalog, content
transformation and transcoding, content enrichment and metadata, portal generation and publication,
animation, content delivery, etc. But looking closely, we see that some critical functions are distributed to
specialized components, which are not really CMS by themselves, but all of which contribute to a global CMS
entity. So, a DAM (Digital Asset Manager) such as Netia is used as content vault and content reference, while
Orca/RighTV is providing the middleware/mediation for managing business functions, plus publication and
delivery, and SED is providing animation.
Technically considered, CMS are not located at the top-notch of complexity of computer systems. There are
however specific considerations that do make complex the decision for selection, replacement and/or
deployment of a CMS within an organization:
→ The frontiers of the CMS expected functionality are wide, sometimes fuzzy, from a simple database
aimed to just catalog content up to a complete user interface for a TV interactive service. In the
requirements process, this may lead to an overestimation of features, in some cases, or to try to push
the CMS beyond their natural limits, in other.
→ More often than not, some critical functionality needed in a service based on a CMS does not come
out-of-the-shelf, but it is implemented as ad-hoc complement on top of, or aside, the CMS. As
examples: a semantic search engine based on a content repository, or an IPTV interactive service.
However, we tend to include these extra functionalities in the feature reviews and benchmarking
exercises, as differential elements to decision.
→ CMS are usually embedded in the workflow of the organizations that use them. With time, both CMS
and organizations have adapted reciprocally. Therefore, there is always a reactance to change,
because the staff would think that the new CMS will change their right way of doing things.
→ In this adaptation, CMS have surely suffered over time the consequences of tight budgets and
business models, which are especially aggressive in the Portal business. The organizations involved
can hardly afford the costs of a renewal, which by the way should include some hidden cost elements
such as training, migration, etc.
→ CMS are not closed systems, but they are complemented with extra software, either in the pages or
templates themselves, or as side developments. Furthermore, data models are pushed to their limits
over time. These actions are not typically well documented. Therefore, when dealing with a
replacement and migration of a CMS, most of the times we would face a rebuild of the service from
scratch.
We consider that there are three main success factors that need to be considered when selecting a CMS for
a practical implementation or deployment:
→ First, the suitability for the task. Suitability does not mean an aseptic, one by one comparison of
needed against provided features, but it means understanding the business processes of the
organization and the local technical ecosystem, and figuring out how to implement them in the new
system, as well as the optimization possibilities of both.
→ Second, the costs, both for deployment and operation. As mentioned, the business models of portal
services are weak, so the cost of the CMS cannot be a negative obstacle in order to reach P&Ls
goals. Two main considerations:

o “Keep it simple”: In paragraph 3.4 we have presented some architecture recommendations,
where we can see how sensitive the right CMS architecture is to scalability, and therefore to
cost.
o “One size does not fit all”: Features that are considered as critical for one scenario might be
seen as irrelevant for another. Small organizations cannot be recharged for features not used.
Therefore, a global CMS should be modular (and simple) enough to adapt to all needs, or on
the contrary more than one alternative must be considered.
→ And third, the continuous support. Support must be understood along two axis:
o Technical support and supervision: Technical teams cannot be left alone with a CMS once
deployed, but an open support relationhip must be established to recommend, supervise and
optimize the use of the CMS in the ways it was planned for.
o Flexibility: Typical time of portal evolutions are in the range of weeks or few months, while the
typical time of CMS evolutions is 1 year. The launch of new products, new sites or specific
campaigns cannot wait for new releases of the CMS, therefore the technical team in charge of
the tool needs to have enough training and enough close support to work-around problems
with the tools available today. Otherwise, webmasters would look for immediate alternatives
in the Open source world, and so separating from corporate directions.
Innovative evolutions: CMS vendors and development communities continuosly claim for providing
innovative solutions in their new releases. But the existence of too many vendors (Wikipedia lists 100+) is a
symptom of poor innovation, which manifests almost exclusively around technical improvements: better form
management, better administrative tool usability, more flexible data models, improved scripting, etc. As a
more deep innovation, we see that the foreseen activities lay in the following fields:
→ Of course, internal improvements: adoption of new RIA technologies in order to make the internal
tools more user-friendly and productive. But also the integration of IDEs for template programming
such as GWT, and WYSIWYG edition (not to be confused with WYSIWYG design)
→ We have seen how multimedia has crossed the line of content management. In the past, multimedia
was just a link to a binary file associated to textual content; now it is metatagged. In the future, it will
be considered more naturally as content: metainformation will extend to the timeline and to scene-
level; it will be possible to navigate through hyperlinks embedded in scene-leve objects; scenes and
objects will be extracted from videos as semantic story units; contextual extra info and ads will be
shown along with the video; clips and story units will participate in SEO strategies… CMS must be
ready to automatically enrich the video assets with the required metainformation.
→ Globalization, or the ability to produce a portal in different languages, in order to allow the presence in
different markets, will extend the internal CMS workflows to translating partners, not yet machine-
based but humans.
→ Software-as-a-service (SaaS) management and usage models for CMS will expand, especially for
organizations looking for basic CMS support, since the model allows them to better predict and control
costs.
→ Multichannel: The “edit once, publish many” strategy becomes more critical, as the chanels for
Smartphones, tablets and TVs expand and mature.
→ User interaction: CMS will distribute their frontiers towards the clients, as Smartphones, Tablets and
TVs will host functionalities traditionally reserved for server-side, and the final users will be put in the
loop. In the form of RIA and downloadable apps, users and devices will take active part in the content
life cycle. In this scenario, CMS and front-ends will deliver simple, metatagged content through web
services, and the apps will allow the user to aggregate or filter the content, and to choose the final
rendering out of flexible templates. So, we can consider that part of the CMS will reside in the clients.
The web is always evolving as new emerging technologies arise and are adopted. From the Web 1.0
conception as a Read-or-Write web, we have witnessed a change of paradigm in Web 2.0 as a Read/Write
web, which is again evolving in the so called Web 3.0, or Semantic web, towards a Read/Write/Request web,
which offers better semantic approaches to content, richer client-side applications which includes new
opportunities in the management of multimedia, the exploitation of data residing in the cloud, and an
exponential participation of the communities and social networks. There are exciting technologies to keep an
eye on, and CMS will play a bigger or lesser role in their benefit:

→ The introduction of 3D effects will have a major impact on CMS, considered not only as rendering
engines, where they must make use of the appropriate graphic libraries such as WebGL, but also in
the conception of the design and animation effects, where traditional 2D canvas based on headers,
footers, menues and frames need to be substituted by 3D scenarios such as perspective walls, stacks
or cylinder-rotating coverflows, to name a few. Additionally, in some cases the CMS must take into
account the z-dimension of content objects; for instance, if we want to superimpose a protruding
menu box on top of a 3D scene of a car, we need to know which object is closer to the user’s eye, the
box or the car.
→ On the contrary, CMS will not be significatively affected by the adoption of HTML5 standards. Of
course, services already implemented in CMS could gradually incorporate the recommendations of
HTML5, but this can be done with existing facilities, basically by adapting their content models in
order to incorporate the extended metainformation possibilities, and by re-engineering the templates
in order to make that metainformation visible to the rendered output, and to incorporate the use of the
new APIs. This will not generally require new versions of CMS, but a new way of using them.
→ Multimedia consumption will keep increasing, with enriched user experiences favored by several
enablers: the push of 3D and their new immersive possibilities, the facilities of video delivery and
interaction provided by HTML5, the enrichment of video metainformation given by timeline plus scene
level tagging, and the interesting possibilities of non-linear video browsing. CMS need to provide room
for managing this extra metainformation, as well as the use of automatic techniques to extract
semantic stories, such as image/audio processing, object recognition and scene lookup, closed
caption text indexing, etc.
→ Alternatives to actual Man-Machine interfaces, especially in the case of the IR remote for TV and
STB, are desperately needed. However, CMS are not expected to impact, or been impacted by, any
new solution.
→ New devices are expected to challenge CMS. Especially smartphones, tablets and connected TVs will
modify the way we consume content, since their processing power and user experience goodies may
transfer the aggregation, composition and rendering of content from the servers to these client
devices as supercharged browsers. Besides, they may unbalance the flow of content generation by
the community, because they are optimized for consumption better than for generation of content. We
may also expect other devices to challenge content delivery and consumption, such as media centers
as universal aggregators of home entertainment, and those oriented to connected cars, or carputers.
→ As a consequence of new devices, the multipublication of the same content through different
platforms and devices will gain importance. However, CMS are already prepared for multiplatform
publication, the challenge being more structural and cultural than technical. We would realize that
automatic content adaptation may not the best approach in order to fully exploit the native capabilities
of the devices, which, by the way, in some cases are the main reason for buying.
→ CMS must play an important role in customer knowledge on behavioural and social network analysis.
We must understood, since the stages of analysis previous to deployment, that the CMS must act as
a collector of all users’ activity, and therefore is one major input to CRM and data mining systems. At
the end of the chain, it must be able to segmentate the content based on personalization or
recommendation engines.
→ Privacy and parental controls are needing of definitive solutions, and CMS should be a part of them.
New technologies and uses applied to CMS can help the development of new products and services:
→ Cloud content management: From a CMS point of view, storing the content in the cloud is just a
change of platform, but not a change in solution. The added value comes when it’s complemented by
a CMS SaaS (Software as a Service) service, providing remotely the functionalities described in
previous chapters, and leveraging cloud-computing capacity with templated applications.
→ SaaS: The industry is making moves to the cloud. Open-source CMS like Alfresco and Nuxeo are
making their software available as a package that is ready to deploy to the Amazon, RightScale or
JumpBox clouds. As another exampe, WordPress is powered by Amazon CloudFront, and it uses
Amazon's S3 for storage.
→ eCommerce: eCommerce will keep its increase of activity in the following years. When catalogs are
big and heterogeneus, CMS offer their functionality as a mutualized repository of assets.

→ Digital asset management: Similarly to eCommerce platforms, CMS can act as repository referentials
when the number of assets is big, when flexibility in the assignment and use of properties is needed,
and when several channels are used to access and exploit the asset repository. This is especially
clear in video services.
→ Search: Search services will keep extending through new areas: general, vertical, suggested, based
on proximity, temporal, semantic… Three new applications will gain momentum: instant search, with
implications in SEO and advertising; real-time search, where on-line community tweets are used as
sources of temporal immediacy; and subjective search, where a user’s history is profiled to filter out
the results.
→ Social TV: New ways to explore the socialization of TV watching will be explored. However, CMS are
not expected to act as differential in these services.
→ The geo-spatial web is based on geo-spatial metainformation associated to content elements, and
taking the actual location of the user as reference. Both are possibly with the technologies existing
today. But there are also services based on geo-spatial browsers, where the browser simulates the
movement of the user over a spatial world.

2 AIM OF THIS DOCUMENT
The purpose of this document is to provide a common vision of what Orange consider as a Content
Management System (CMS), defining which are the generic functions for the different platforms and contents,
analyzing the main usages, describing what are the main existing solutions, including the major CMS
providers and the list of internal CMS, and exposing which are the future evolutions and trends.
This documment is not a market benchmark and pretends to be not too technical but functional with some
architecture elements, in order to be useful not only for technical areas, but for marketing teams or anyone
else within the Group interested in dig deeper into the exciting world of Content Management Systems.
This paper tries to place CMS systems in the context of the business needs and functional scenarios that they
should support. It is noted that the concept and functional perimeter for the perfect CMS is very wide, and
therefore it is a crucial key for success the adequate selection of features depending on the functional needs.
In order to backup the understanding of CMS, a detailed presentation of their atomic capabilities is presented
in Chapter 2, following the content life cycle as rationale.
Chapter 3 provides a brief description of a variety of available CMS packages, taken from the Open Source
community.
In Chapter 4, a snapshot of CMS and organizations using them in the different FT Group subsidiaries is
presented.
Chapter 5 moves into the anticipation arena, suggesting some technical areas of exploration according with
the expected evolution of content-related product and services. This analysis is kept inbounds the CMS
perimeter, the focus not being how would the content business look like in 5 years from now, but which might
be the key technologies to invest on.

3 PRESENTATION OF CMS
From a general perspective, a Content Management System (CMS) allows website administrators to manage
and maintain a website in a simple manner by providing the internal users with easy to use interfaces. A
content management system can be very helpful and save time especially when managing a website with
many web pages and content that needs to be updated constantly.
A content management system supports the creation, transformation, management, distribution, publishing
and discovery of information. It covers the complete lifecycle of the pages on a site, from providing simple
tools to creating the content, through to publishing, and finally to archiving and searching. It also provides the
ability to manage the structure of the site, the appearance of the published pages, and the navigation offered
to the users.
There is a wide range of business benefits to using a content management system, including:
→ Streamlined authoring process: A CMS allows to add and modify content quickly, while having little
to no technical experience. A CMS, though, usually needs to be installed by experienced
administrators or website development organizations who know HTML, CSS and PHP code, but once
set up, it can be maintained by non-technical personnel. This means a person editing the site does
not have to depend on a webmaster or an IT group to perform updates, providing faster turnaround
time for new pages and changes.
→ Separation of design and content: The design templates and content are separated from one
another, which means that the consistency of the design can not be changed when editing content.
The design will stay the same on all pages and will not be altered when making changes or updating
content. This capability ensures the coherence of the graphic style guidelines. At the same time, it
allows the reuse of content in different formats and platforms.
→ Decentralized and multiple authoring: CMS offer typically web-based administration and edition
interfaces, allowing to edit content anytime and from any location. Also, multiple authors can be set-
up to access and edit content, and even to specify different sections that a user can access and be
allowed to change. This prevents a person from editing a page he is not allowed to change.
→ Increased site flexibility: Websites must quickly adapt to match new products, services, or corporate
strategies. The CMS supports easy and streamined re-structures and interface re-designs, such as for
instance updating all pages to reflect a new corporate brand or image. Without the use of a CMS, ad-
hoc publishing processes prevent effective management and accountability.
→ Improved information accuracy: Duplication of information across business units and platforms
increases maintenance costs and error rates. Wherever possible, information should be stored once,
and reused multiple times. Without a CMS, a manual process for updating website information is slow
and inefficient.
→ Greater capacity for growth: The use of a CMS implements a methodology for managing a site,
both from the publication and system administration point of view, which prevents the chaotic
explosion of ad-hoc processes. This way, the site becomes manageable, with greater consistency,
and reduced duplication of information, which can keep growth up while maintaining the costs.
→ Search engine friendly: Content management systems enforce the optimization of the published
pages from search engine perspective, providing customizable title tags, meta tags and URL. This
way, the pages are keyword rich and more easily found by search engines.
3.1 Definitions
Before entering into details, it is important to provide a definition of a CMS itself, and to clarify the two most
important concepts in CMS: content and metadata.

3.1.1 Definition of content
We often look at content as any information appearing on a web page. Sometimes we refer to it as media.
Content is in essence, any type or "unit" of digital information. It can be text, images, graphics, video, sound,
documents, records, etc, or, in other words, any piece of information that is likely to be managed in an
electronic format.
What differentiates content from just information or raw data is its structure and metadata. The structure of the
content is a set of tags or attributes where the content is placed, and which allow to categorize and index the
content. For instance, an encyclopedia written in a single, raw, continuous text file is completely useless and
valueless from the content point of view, because we need to browse through the whole file in order to find the
information we are looking for. However, we may model this encyclopedia out of content templates: one for
articles, another for images, etc. We may define tags within the article template such as Title, Heading,
Body…, and we may force the editors to write the articles following this template. This way, we have just
created a structure for the content.
We will then use the term content as any kind of structured information.
3.1.2 Definition of metadata
Metadata, often called data about data or information about information, is structured information that
describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource.
It is difficult to set the frontier between data and metadata. When referring to textual data, the difference
becomes clearer: data is information which is intrinsical to the content, while metadata is extra information
aimed to catalog the content. Data is typically invariant, and metadata can change depending on the use
scenario. For example, in our encyclopedic article the attributes Title, Heading, Highlight, Summary, Body,
etc, are data, while attributes such as Subject, Keywords, etc, are metadata. However, other attributes not
directly visible in the output, but intimately related with the content, may also be considered as data, such as
ID, Source, Author, Date, Number of words, etc.
When talking about multimedia, the difference might not be so clear. In most cases, the added information is
stored outside the content file, but not all of it should be considered as metadata. For instance, the codec,
codec parameters, resolution, duration of a movie, etc, are intrinsic to the file, and should be looked at as
data. Other attributes such as director, genre, actors, rating, production date, language, etc, are metadata. But
what happens with attributes such as price or broadcast date? They are not really data nor metadata, but
associated parameters. But apart from this [academic] discussion, all this kind of information is often
considered as metadata.
Metadata schemas are sets of metadata elements designed for a specific purpose, such as describing a
particular type of information resource. The definition or meaning of the elements themselves is known as the
semantics of the schema. The values given to metadata elements are the content. Metadata schemas
generally specify names of elements and their semantics. Optionally, they may specify content rules for how
content must be formulated.
3.1.3 Definition of content management
Content Management is the management of content.
OK, this definition is a tautology. But it’s intended to mean the application of management principles to
content, as the goal in implementing content technologies and related processes, which span from document
inventory to semantic search.
3.1.4 Definition of CMS
There are many uses of the term CMS as Content Management System. We will focus on three.
→ The first definition is: "A CMS is a system capable of managing content". Although it seems
obvious (and therefore not very useful), this is really the concept managed by people working on pure

content management, such as the domains of content providers, or those of semantic web. This
generalistic approach becomes more clear when we translate the term "to manage" as the ability to
model, capture, structure, store, modify, cross-link and transcode content, as well as its associated
metadata.
→ The first definition is excluding an important feature pursued in CMS, which is the content
representation as consumable output. On the other extreme to the first definition, and putting the
emphasis on this functionality, we may define CMS as "a system that allows non-technical users
to manage portals". It is more precise than the first one, but this is quite a restrictive definition,
because it assumes a particular business scenario and a particular CMS usage.
→ We prefer this other definition: "A CMS is a system that provides support to the content life
cycle". It seems that, this way, we are postponing the definition, because we are linking one concept
(CMS) to another one (content life cycle), but doing so it will help us in the understanding of important
concepts about content and CMS.
It is necessary therefore to dig on the concepts behind the content life cycle.
3.2 The content life cycle
We will differentiate 5 main stages in the content life cycle:
→ Content sourcing
→ Content authoring
→ Content management
→ Content publication
→ Content delivery
We will go through these stages, and at the same time we will be providing the rationale for the functionalities
requested to a CMS. For the sake of illustration, we will start from the basic and central one: content
management.
3.2.1 Content management
This step basically fits the first definition of CMS we presented in 3.1.4. This is where both content and
metadata are modeled, created (if our CMS creates content) or captured, stored, associated, modified, cross-
linked and transcoded.
Sourcing / Authoring Publication Delivery
Management
Templates
Content
adaptation
Animation
Precharged
Dinamimic /
static
Online
Contents origin
Manual ingestion
Agregation:
adquisition +
transcoding
Syndication
Storing
Data modeller
Representation +
metainformation
Relation of Contens
& Metadata
Animation

Content usually needs to be stored in a database. Database models for content differ from those found in
typical information systems. The stored objects are basically:
→ The pure content objects, with its internal structure and metadata
→ Links to binary files, in the case of multimedia. Due to their size, it is not a good idea to store them in a
database table, which besides does not add value.
If we consider news content, for instance, it is usually represented in a database as a serialized object in the
form attribute=value:
Object {
ID = 123456;
Source = "Sky News";
Date = 19 April 2010;
Title = "UK Flights Grounded Till At Least 7pm";
Heading = "Flight crisis continues";
Short = "Volcanic ash cloud UK flights ban enters fifth day";
Body = "The volcanic ash cloud crisis entered its fifth day today with the UK remaining a no-fly
zone until at least 7pm. Traffic control company NATS issued its latest statement at just
after 3am this morning. It read: 'Based on the latest information from the Met Office,
NATS advises that the current restrictions across UK controlled airspace due to the
volcanic ash cloud will remain in place until at least 1900 on Monday 19 April. Anyone
hoping to travel should contact their airline before travelling to the airport.'
Conditions around the movement of the layers of the volcanic ash cloud over the UK remain
dynamic.";
Highlight = "What we are looking at doing is flying people from further afield into Spain and
using alternative means to transport them from Spain.";
Photo = "http://www.orange.co.uk/images/editorial/volcano330x143pa.jpg";
[...]
}
Around the bare content object, we can start building the metadata, by defining and populating the desired
metadata attributes:
object {
ID = 123456;
[...]
metadata {
Type = "News article";
Subtype = "travel, natural disaster, politics, economy";
Lang = "English";
Keywords = "flight, crisis, volcano, ashes, UK, airport, Iceland, ban, airport, chaos";
[...]
}
}
When dealing with textual information, metadata accompanies data in the database representation of the
object. It is modeled and stored inside the content object itself, and not in related tables, in a schema that
separates from a relational model. However, it is impossible to embed metadata in some types of objects (for
example, digital videos), or doing so (for example, EXIF) we will not facilitate search and retrieval. In these
cases, metadata is commonly stored in the database system and linked to the objects described. Therefore,
when dealing with multimedia, it is preferable to create specific multimedia classes, and reference their
objects by ID. In our example, we can build the Photo class:
object {
ID = Photo_987654;
Title = "Eyjafjallajoekull volcano";
original {
Format = "JPEG";
Dimension = 330x143;
Size = 13330;
Path = "http://www.orange.co.uk/images/editorial/volcano330x143pa.jpg";
}

wap {
Format = "GIF";
Dimension = 220x82;
Size = 4571;
Path = "http://www.orange.co.uk/images/editorial/volcano220x82pa.gif";
}
Keywords = "volcano, ashes, Iceland, cloud";
[...]
}
Therefore, in our article representation we may refer to the photo by its ID:
object {
ID = 123456;
[...]
Photo = Photo.ID(Photo_987654);
}
So, let's review the functionalities we should expect from a CMS regarding its primary function of content
management:
→ Metamodel: The CMS should provide powerfull tools to create ad-hoc, complex metamodels. The
system must allow the definition of basic data types (string, number, date...) as well as complex data
types. A complex data type may be, for instance, a Person, which is composed of Name (string), Last
name (String), Date of birth (date), Phone number (Number), Photo (binary). Complex data types
should be reusable within more complex data types, in the same way a Person data type is made part
of an Organization data type. Furthermore, the system should allow the dynamic extension of a given
model with more attributes, without needing to rebuild the whole model.
→ Content repository and database schemas: In CMS language, databases are referred to as content
repositories. They typically have a hyerarchical, tree-like, attribute=value object-oriented schema, with
a native-XML internal representation. For example, the weather for a particular place may be
represented in a tree such as Continent -> Area -> Country -> Region -> Province -> Village. Although
relational models are not often found as content repositories, it is important that the CMS provide
connectors to these kind of databases. For example, theater programming or flight schedules typically
obey to relational schemas.
→ Raw representation: Content should be stored in raw mode, typically XML, completely independent
from its representation and late usage. If we like to highlight part of the text, it is not a good idea to
include mark-up within the content object itself, because its representation may invalid the
multipublication. If we insert mark-up tags such as <b>-</b>, first we are assuming a HTML
representation, and second we are inserting formatting characters that may not be recognized for
other uses. Particular attention should be paid to Wysiwyg capabilities: if these capabilities introduce
garbage into the internal content representation, Wysiwyg (What you see is what you get) may turn
into "but it`s the only thing you get".
→ APIs and Massive loading: Content repositories should provide efficient APIs to deal with content, in
order to isolate its internal location and representation from GUIs and management tools. APIs should
typically implement content access functions such as: "Give me a list of max 10 content objects from
the Article class, whose Keyword attribute contains 'volcano', sorted by insertion date", or "Update the
attribute Editable with value True to the Article object with ID=123456". The repository should also
provide means to do a massive update, like for instance "Add the new attribute Lang to the Article
class, and update all its objects with value English". Additionally, means to perform a complete rebuild
of the repository is also necessary: for instance, we may need to replicate a production repository into
a development one, while keeping the internal links and relationships within data types, classes and
objects.
→ Multi-language: Multilanguage creates a real challenge for CMS. And not only because we need to
provide 'clones' of articles, in the different languages, but also because metadata, which should be
shared between copies of the objects, do also have a language, and because there typically exists
separate editorial organizations that maintain each language version, which makes difficult the task of

keeping the consistency of the repository along time. There are cases where the same content object
must be perfectly replicated in two or more languages (for instance in corporate CMS, or in Web CMS
for countries like Netherlands), but in other cases the preferences of the audience differ with culture.
3.2.2 Content authoring
Content authoring is the process of manually interacting with the content, in order to create, edit, search, filter
and catalog it.
In our prototypical CMS we already have a model and a database. Now, we need at the front some basic tools
to interact with it, in an easy-to-use authoring environment. The very first need is a set of forms that would
allow us to search, populate and edit our content objects, that provides a non-technical way of creating new
pages or updating content, without having to know any HTML. The forms should be as dynamic as the
metamodel itself, so, if we add an attribute to our classes, we would like to see our forms to automatically
include the new attribute. Auto-form generation is therefore a basic requirement.
Editing tools should provide the capability of:
→ Text insertion
→ Linking
→ Image, video and audio uploading
→ Managing multimedia libraries
→ Browser–based image editing
→ Searches through content repository
The CMS also allows to manage the structure of the site, that is, where the pages go, and how they are linked
together. It is this authoring tool that is the key to the success of the CMS. By providing a simple mechanism
for maintaining the site, authoring can be returned back into the business itself, so marketing managers and
product managers can modify and update themselves the content of the site.
Around these two basic functionalities, content edition and sitemap management, let's review the basic
requirements for content authoring:
→ Auto-form generation: In the most basic usage, content is interacted with by means of forms.
Content is defined by its metamodel, which is flexible and ever changing, and this flexibility must be
extended to the form application. Therefore, the CMS must provide the capability of auto-generating
the forms needed by content editors. Form fields do not have to match only datamodel fields, they
may specify content rules for how content must be formulated (for example, how to identify the main
title), representation rules for content (for example, capitalization rules), and allowable content values
(for example, terms must be used from a specified controlled vocabulary). Specific requirements for
edition forms are: search by any content field, multi-selection, multi-edition.
→ Search: As noted above, Search is a mandatory function in editing forms. Search would enable to
find content on the content repository by specifying one or more search terms. Notice that in this
context we look for pure content matching, sintactic search, as opposite to semantic: we want to find
and edit exactly what is in the content database.
→ APIs: If APIs for accessing content are not provided, content insertion would be resticted to manual
forms.
→ UGC: User generated content may be incorporated to the CMS, typically by use of the above
commented APIs, and the most common examples are blogs or videos. External users do not only
create content, but also metadata. For instance, assigning comments or ratings to existing content.
Moderation could be necessary, which would increase the need of human intervention.
→ Multiple selection and edition: Sometimes, an amount of similar content pieces need to be modified
simultaneously. For instance, when we add a new metadata tag and we want that tag to be populated
(i.e. Lang="English"). Our content edition tools and APIs should provide for the capability of managing
multiple selections and adding or modifying several attributes in one action.
→ User rights: Sometimes, content articles or certain attributes of these articles need to be protected
from edition by particular user profiles. However, complex profile arrangements may unnecesarily
complicate the internal CMS data model, as well as its administration.

→ Multimedia edition and transcoding: Content edition forms should not only be resticted to textual
data, but also applicable to binary data such as multimedia. It is generally very useful and handy to
have simple tools to quickly transcode a video to smaller formats, or quickly resize or crop images.
These functionalities might be also accessible through APIs, by defining transcoding rules, which is
very convenient if they are also attached to automatic feeds.
→ Versioning: The need for content versioning has to be thoroughly thought of. If content is not edited,
there is no clear need for versioning. If, on the contrary, content is created or often edited, and each
edition needs to keep a copy of the changes, a database size study must be done in order to assess
the growth and possible performance issues.
3.2.3 Content sourcing
Content sourcing is the process of filling our system with externally generated content. Content may obtained
by several forms:
→ Creation/Edition: The CMS provides tools to editors in order to create, or author, content from scratch.
→ UGC: Another way of generating content is to make the final users of consumers insert their own
content into the system, or UGC User Generated Content, as it happens with blogs.
→ Feeds/Syndication: Content is acquired from external content providers, an transferred in electronic
format through the network. It can be continuous, like in stock exchange.
→ Aggregation: Content is divided into parts, and each part is taken from a different source.
These are the main functionalities for content sourcing:
→ Feed integration: The most typical way of inserting content into our CMS is by connecting it to an
external feed, especially for highly automated CMS in charge of actuality portals. Content may come
from external content providers in different forms. In our feed integration analysis we need to
consider:
o Feed transport: data may be accessed by network connections, typically by network data
transfer. The transport mostly depends on what the providers do offer, and which are their
publication facilities. It is therefore important that our feed system could be configured with
transports such as HTTP, FTP, even eMail. In certain situations, such as continuous stock
exchange feeds, we may need to integrate a provider-delivered client. Other cases may
require a direct connection to the provider's database, better through a layer of Web services
APIs.
o Feed exchange protocol: Defining the transport (HTTP, FTP, ...) does not completely solve
the data transfer problem. Along with the transport there is always a protocol. So, an FTP
transfer could be originated by the source (PUT) or by the destination (GET). Our feed system
should provide configuration capabilities to manage the different protocols. For instance:
We should be able to schedule an FTP GET, every day at 06:00, to get weather data.
If data is not available, we shoud be able to define several additional tries. After
succesful retrieval, we should be able to trigger a publication, or an alert in case we
couldn't find any data.
Sometimes, the provider deliver us data, in an asynchronous basis, in an input tray.
Our feed system needs to be configured to poll this tray for the existence of new data,
and to trigger subsequent actions (one of them to move already processed data to
another directory).
o Data format: Data may come packaged and formatted in different ways.
Packages: Sometimes we receive a zip-packaged file, which includes textual data,
metadata and multimedia binaries. Data is typically XML, and they reference
internally the associated binaries. Our feed system should be able to recognize these
packages, to extract them to temporary folders, and to execute a consistency check:
data formats are acknowledged as previously agreed, there are no references to
missing files, and all files are referenced.

Formats: Most textual data come in XML format. If it does not match the internal data
and metadata models, we need our feed system to be configured with the required
tag matching processes.
→ Scheduling: Except for editorial creation (i.e. an editor creates manually an article), some event-
driven conditions (i.e. breaking news) or some continuous feeds (i.e. stock exchange), the ingestion of
content is not typically manual. In most situations, content is fed from external providers, and this
happens at specific moments. For instance, weather feeds are taken at 6AM and 6PM, or news feeds
are updated every 30 minutes. Therefore, it is necessary to have the capability of programming
actions at particular moments in time. We will see later that this capability is also required at
publication time.
→ Filtering (or moderation): Special care should be taken with content sources that provide a
continuous flow of feeds, such as some news agencies. We should consider to store in our database
only that content that is going to be published and that requires persistence. If the flow generates a
high number of updates per day, say 1.000, it may saturate our database in a few months. There are
two approaches to this scenario, either to publish the content directly, without inserting it into the
database, or to filter it prior to updating the database.
→ Multimedia transcoding: Content providers usually provide neutral formats for multimedia, that are
not suitable for the publication channel. For instance, video clips in DVD quality, or big photos, that
cannot be shown in WAP. Therefore, when we link a source (i.e. BBC videos) to a destination (i.e.
WAP), and along with the feed ingestion, we need automatic rules for transcoding to the different
formats, codecs and quality.
→ End-to-end control: An important functionality is to have and end-to-end view of publication
processes originated on external providers, from feed to publication, and driven by a schedule. Let's
suppose the weather feed has not been updated. We need to diagnose and fix the problem: maybe
the provider had a problem, or the data transfer failed, or the data was corrupted, or an unexpected
change in the format, or the filesystem is full, or the database had a problem, or the publication
failed...
3.2.4 Content adaptation and publication
Once the final content is in the repository, it can then be published out to the website. Content management
systems boast powerful publishing engines which allow the appearance and page layout of the site to be
applied automatically during publishing. It may also allow the same content to be published to multiple sites.
Of course, every site looks different, so the CMS lets the graphic designers and web developers specify the
appearance that is applied by the system. These publishing capabilities ensure that the pages are consistent
across the entire site, and enable a very high standard of appearance. This also allows the authors to
concentrate on writing the content, by leaving the look of the site entirely to the CMS.
Publication is the act of putting together the content and the content representation (design) into a rendered
output (web page).
For publication, CMS use templates. Templates are instructions to generate the rendered output, which in
most of the cases are web pages. We can see templates as empty skeletons of the final page design and
representation, and having references to content queries and to content data attributes. For example,
consider a template for football news pages. First, the template refers to a content query: show only content
articles with the tag “Football news”. Second, the template defines how to represent the different content
parts: summary in bold, header in a boxed rectangle, text in Arial 12, photo in a 300x200 side box, etc.
Web pages are not only content. They also include, inside a HTML container modified with CSS, several static
and common components (header, footer, menus,…), code (JS, Ajax, Java), applets (i.e. a player),
advertising, images and multimedia. Some of these components are incorporated as server-side includes
(which by the way might have been rendered separately by other templates of our CMS).
Adaptation is the action of configuring the content and/or the content format according with the platform or
device.
At publication time, content may be adapted:

→ depending on the publication platform or browser capabilities, identified by User Agent: in this case
we may have different templates, one per publication platform (Web, WAP, TV...), and create different
pages accordingly
→ depending on the content itself: we may select different content categories according with the user
personalization
Adaptation can be done at publication time, at delivery time, or both:
→ At publication time, having different templates (i.e. Web, WAP...) or pre-publishing all available
content (i.e. horoscope signs)
→ At delivery time, inserting dynamic content or generating the output in a neutral format and passing it
through an adaptation engine, as it is the case of OML and CADAP
Personalization is the Web page accommodation based on the characteristics (interests, social category,
context ...) of individual users. Personalization implies that the changes are based on implicit data:
→ Based on something that the user knows, such as gender or a password.
→ Based on something that the network knows, such as the User agent or the radio network throughtput
→ Based on something that a server knows, such as billing credit, items purchased or pages viewed.
The term customization is used instead when it is the user himself who configures the appearance of the
page.
Publication can be made statically or dynamically:
→ Static or pre-published: renderization is done in batch mode, generating all outputs and all content
combinations. At delivery, the right format and the right content is chosen from all possibilities. For
instance, browser versions can be chosen from the User agent; a particular horoscope sign is chosen
from user data present in a cookie.
→ Dynamic: Pre-published pages can be seen as "baked pages": they are made available before the
user access them. On the opposite, dynamic pages can be seen as "fried pages", because they are
created on demand. Although it may seem that all personalization requires dynamic publication, some
remarks can be made:
o Pre-publication needs far less resources than dynamic publication.
o For capacity planning purposes, it must be analyzed what kind of HW and SW infrastructure
is needed if every user access to a page requires a transaction in our CMS database.
o Consider the response time, the possibility of a failure.
o Pre-publication only needs a web server and a hard disk, which is a very simple, cheap and
fast arrangement.
o Not all personalization is necessarily based on dynamic generation. Any piece of content
viewed by more than one user is suitable of being cached. A clear example is the horoscope.
Let's now review the desired functionalities for content adaptation and publication:
→ Templates: All CMS use templates. They may be based on a mark-up language (HTML, OML, ...) or
based on scripting (PHP). The first provide a more interactive control on the final output, while the
second provides more versatility.
→ Design: HTML, CSS, WYSIWYG: Templates may be defined in the same mark-up language as the
final output, for instance HTML+CSS. In this case, we need to provide for special tags in order to
reference the content attributes (at publication time, the publication engine will substitute these
references by the actual content). Some CMS include HTML editors for this purpose, and they are
called WYSIWYG editors. But no CMS editors would beat the edition capabilities of specific tools such
as DreamWeaver. Powerful reasons should backup the decision of using in-CMS edition against
specialized market tools (i.e. simplicity, tool integration, inmediacy, avoid dispersion of code...).
→ Site definition: A collection of web pages do not define a portal. We need at minimum a sitemap,
which defines the navigation tree where all pages fit, and the cross links between pages. It is a must

to have a tool to define and manage the sitemap, independently from the pages. It will allow us, for
instance, to move a branch with all its pages from one place to another in an easy way.
→ Editorial rules: All portal sites make use of editorial rules, at least implicitly. This way, the editor
knows that if he adds a new article in the sports section, he may need to add it to the index, and
possibly to the home page. CMS could make these rules explicit, and so help the automation, if it
allows us to program the rules: show the 10 most recent news in the index, and show the most rated
one in a small box at the home page.
→ Cascade publication: There are situations where we want to apply a style change to a section of the
portal, to modify an article that appears in several positions. The cascade publication facility allow us
to trigger all the updates of all the related entries, forced by the sitemap.
→ Scheduling: A portal is always changing, and CMS use schedulers to control the changes. We may
define clock events, associated to editorial rules, in order to produce the morning, evening, night and
weekend versions of our portal, in an unattended way. We can also attach schedule events to feed
ingestions, and update the weather datat several times a day.
→ Publication APIs: Publication triggers should be wrapped by publication APIs. This way, we can
manage the publication not only from interactive tools, but also by schedulers and other external
events. APIs should accept parameters, used by the templates in order to categorize the query to
content. For instance, if I'm looking for the weather in Madrid, I will write Madrid in a text box, and it
will be passed to the template as a parameter in order to retreive the desired information.
→ Preview: When a site is updated manually, it is very convenient to have a preview on how the
changes look like before final publication. That will allow us, for instance, to insert a line break at a
news heading, for better fitting.
→ Workflow: If a portal is very sensitive on its content, in the sense of the message that it transmit, as a
branch image or an editorial line, the CMS may need to provide a publication workflow. This way, the
articles may be reviewed and approved by adequate responsibles before publication. If, on the
contrary, the portal is more generalistic, the workflow may impact on the dynamicity, since
responsibles may not be available when needed.
→ Un-publication: We use to concentrate on publication: the content fills the database more and more,
and pages are constantly created. We may lose track of expired pages, that are no more published,
or worse, of outdated content wasting database space. Sometimes we need to republish or rebrand a
complete portal, and we don't know which data and pages we need to modify. Therefore it is a good
feature to have a back link from pages to its originating data, and some attributes in the data that tell
us if and where the content is actually accessible. That will allow us to un-publish, or to make a
massive deletion.
→ SEO: A critical feature of CMS is to provide for Search Engine Optimizations. CMS will help us to
improve the algorithmic page rank of search engines if it support powerful SEO rules, such as: include
keywords in the URL name, in the title tag, in the Description metatag, keyword location and order,
etc.
→ Advertising: Content Web sites make their money primarily from advertising. Advertising comes in
the form of banners inserted into the web pages. There are 2 basic functionalities about advertising
we need to ask our CMS:
o Ad insertion: banners are inserted into web pages by placing a small Javascript code which
will request the banner from banner ad networks (i.e. Doubleclick)
o Clickthrough measurement: When clicking on a banner, or when induced traffic needs to be
measured, the HREF links are wrapped around a small JS code which creates a log entry.
o In both cases, the template engine of our CMS has to allow the insertion of this Javascript
→ RSS and Sydication: RSS, or Really Simple Syndication, is a XML format to share content. Our CMS
may publish RSS, and users get updates by subscribing to the source, using a SW designed to poll
and read these content. Content providers use to publish their information in RSS which is fed into our
CMS, or our CMS can publish RSS both final users or for partners, what is called web redifussion or
web syndication. RSS publishing needs a specific set of XML templates, following a protocol called
ATOM.

→ Portlets: Portlets are modular components of a web GUI, which implement specific functionality very
useful in our web sites to generate dynamism and user interaction, such as photo galleries, micro
polls, user comments, etc. Portlets are usually available in Javascript, included in the web pages via
the templates, and are usually managed in CMS as a library of reusable code for template
programmers. Raw javascript components adapt to the site look&feel using CSS.
3.2.5 Content delivery
Content delivery is the transport of content, renderized in its final output (HTML) from the web server to the
final customer SW, usually a web browser.
→ Caching: Caching is critical for server performance. For operational purposes, the ideal situation is a
transparent cache, but sometimes the CMS must have the capability of purging the caches, for
instance when updating static content. Web site design must take into consideration that there may
be a number of non-controlled transparent caches or proxy-caches.
→ Content distribution: The CMS uses to end at a web server, but this may not the server accessed by
the final users. Users may be geographically distributed, therefore is more efficient to distribute the
content through a number of mirror servers, closer to the user from a network or DNS point of view.
The net of replicated servers managing the same content is a Content Distribution Network. Although
CDN provide optimum network usage, load balancing and fault tolerance, CMS are requested to
manage the CDN by pushing the content updates through the CDN nodes. Notice that this schema
works well for static content, where the content shown to users are not retrieved on-line from a
database, but from a network distributed disk.
→ Pull vs. push: Most usually, content is pulled from the web server by the browser, following user
actions. This is how the basic HTTP protocols works, following a request/response pattern. But
sometimes the server needs to push content: for instance, in B2C SMS and MMS services, or when
updating a CDN.
→ Geotargetting: To geotarget means use the location of a user in order to deliver different content
based on that location. In web sites, geolocation is often obtained from the IP address, and it is often
used for advertising and even for restricting content to those geolocated in specific countries.
→ Tracking and statistics, audience measurement: A critical requirement for any web site is to
produce page and visitor tracking. If the front-end web servers are under control, and their number is
small, this can be achieved by generating access logs, which are periodically collected for later
process. But if the front-ends are distributed, or their number is high, it is highly unlikely that all web
servers are available for over-the-night collection and process. A common solution for scalability is to
include the call to a 1x1 pixel hidden image in all web pages, which are served from a reduced set of
dedicated servers. This call can be enriched with data contextual to the page, such as domain, site,
section, page, etc, which is used later for reports and business intelligence.

3.3 The process and roles
Design (templates) + Content + Animation = Portals
In the process of any CMS we can distinguish two different main blocks:
→ Desing, which includes the translation of business requirement into the final specifications needed to
build the sitemap and templates of the Portal
→ Opertations, which includes the ingestion, management and publising of content in the different
platforms.
The distinction between structuring and displaying content is one of the key features of any CMS, avoiding the
need for content writers to be concerned with technical details.
If we talk about roles, we can describe the following players:
→ Designers who design a static representation of the site, extract the final data model from this
representation, and convert the static representation to the required template.
→ Programmers who add dynamic behaviour to the portal and integrate feeds.
→ Editors who create or modify contents and publish the content to end users manually or automatically
through the scheduler.
The separation of content and design provides to editors and designers a easy way of working separately
without conflicts, allowing also that content can be published easily in multiple formats.

3.4 Operational issues
A CMS is a computer system. As any computer system, it must be properly planned, dimensioned according
to usage, operated and maintained. There are however some special characteristics that need to be
considered in its operational lifetime.
3.4.1 Service levels
Service levels of a SW system are usually measured in terms of availability and response time, which tend to
be closely related to operating costs. For a web site, availability and response time are key for success. But
web sites do not usually have strong business models supporting them, therefore it is critical to keep the
operational costs as low as possible.
A CMS needs to be both performant and cheap. The solution for this paradox is to 'keep it simple': a web
server delivering cached, static HTML pages is about the simplest SW system we can find. Of course the
CMS will be pushed to support dynamic, personalized, interactive portals, but the ideas of simplicity need to
be always in mind when designing or parametrizing them.

3.4.2 Staging
When managing a CMS we need not to think only about the productive environment. A portal is a dynamic
entity, and there are continuous changes: new feeds to integrate, new data models to accomodate, new
sections in the portal, new designs in the pages... At any single moment in time, there will be editors
managing the live portal, but there will also be programmers in the background making changes to templates
and data models. These changes have to be made in a separate environment, and once finished and tested,
updated to the productive environment.
The CMS therefore needs to provide the capability of managing several environments as an intrinsic
functionality, because the staging is not a simple issue of mirroring databases and SW repositories
(templates, scripts, etc), as we provide some illustrations:
→ When working in test environments, we need to make a copy of the content database, or a portion of
it, from the production environment. The copy has to be coherent: it must include selection of content,
but also the internal references of classes and content objects (photos, videos, etc). Semantic,
attribute-value database schemas are not so easy to replicate as traditional relational schemas.
→ Sometimes, the mirrored system needs to be fed with the productive content feeds. This means the
feeds need to be separated in two streams, one that goes to the testing platform and the other to the
productive platform. And that includes text, metadata and multimedia.
→ Testing environments are not dimensioned at the same scale than productive ones. If we plan a
simple mirror from production, the environment could not be big and powerful enough.
→ A web page is not an auto-contained object, but it includes or references other entities, such as
headers/footers, images, banners, links, Javascript/Ajax libraries, #includes, etc, and it is served
through a specific network that includes DNS, caches, etc. When testing a modified template of a
page, we need to make sure that all these external entities do work as well.
3.4.3 Tracking and statistics
The capability to provide tracking and statistics is mandatorily requested to CMS. But to provide data for
statistics is a completely different matter than to produce statistics. Producing reports and statistics is indeed a
convenient feature for small size portals, where a complete, closed, integrated solution is valued. But for big
portals it may not be so convenient, because the reporting, data mining and business intelligence
requirements will grow accordingly, and they may architecturarly shift the primary goal of a CMS from
producing portals to producing reports.
The CMS must of course provide means to allow the production of several statistics, such as: categorization
of pages, click-troughs, etc, but it is recommended that the collection, aggregation, reporting and analysis of
these data are made outside the CMS. There are several reasons to argument this:
→ A CMS should not be seen as a Data Warehouse. For producing daily audience measurements, log
data may be generated by millions of hits per day. These data have to be inserted into a database for
aggregation, reporting and analysis. The CMS database, its data model and its database base SW,
which is specialized for storing content, it is surely not the most efficient system for this kind of huge
data management.
→ Typically, audience reports are managed by corporate data warehouses. DWH have their own system
management and user administration, providing support to business intelligence analysts. Apart from
sinergies, the service levels of such systems would be far better than the ones that a CMS would
provide.
3.4.4 Migrations
One portal can be migrated from one CMS to another. But when planning such an action, several issues must
be taken into consideration:
→ The internal data model of content and metadata is tailored to the design of the site and to the
processes that are integrated into it, such as feeds, edition, publication, syndication, etc. Migration is

often related with redesign, rebranding, renegotiation with providers and/or transfer between
organizations, so there are high chances that the data model needs to be revisited.
→ The site pages are generated by templates, which in essence are programs. It is very unlikely that
templates could be ported and reused from one CMS to another, even if sharing the same language
(PHP for instance). And we should not generally expect that a clear documentation exists around
these templates.
→ The portal usually stores a historic of content, that needs to be migrated, along with its metadata, from
one repository to another. The repository includes databases (with classes, objects and internal
references), files in filesystems, symbolic links, CGIs, Apache redirections, and all sort of
configurations.
→ One part of the repository is the set of historic, published pages, which may be a huge number of web
content (HTML, images, videos, icons, frames, etc) and their corresponding URLs. These pages need
to be accessible from the new portal, as search results for instance, but they will look with the same
design and styles as when they were created. Sometimes this is solved by a massive republication
with the new templates; in other cases these files are batch-processed to make the modifications.
One way or another, it is not a trivial task.
→ We may find that the content in the old repository is not as clean as it needs to be. After several years
of content entered into the site by people who usually have a high rotation rate, it means that a lot of
pages will have old font tags, some won’t be structured correctly, and every page will be a different
adventure. One of the great things about content management systems is that anyone can edit the
content, but one of the worst things about content management systems is … that anyone can edit
the content.
All in all, the experience says that when a portal is migrated, it needs to be completely rebuilt from scratch.
Furthermore, just because the new system has been picked to fill all business needs, it will not fit all the
needs. CMS are picked for all kinds of reasons other than the right reasons. Sometimes it’s budget,
sometimes it’s for personal preference, sometimes it’s a political issue. When presenting the new CMS to the
existing organization, most of the things they do in the beginning will take a lot longer to do than the previous
system because the previous system was custom to their needs. No out-of-the-box CMS will completely fit the
gap, so we need to figure out how to alter the workflow and business processes to fit the new system.
3.5 CMS Web vs CMS Video
This paragraph will try to set the differences between a CMS oriented to manage web portals and one
specialized in video. First, let's try to categorize a video portal. What is a video portal, or a portal specialized in
video?
From a CMS point of view, a web CMS deals with text content as principal subject of management. Content is
ingested without major transformations; content databases and classes model and store the data plus the
metadata; articles (content objects) are linked based on the metadata; articles are indexed based on the
textual data for search engines; and templates provide the graphic representation or renderization. From the
output perspective, users find articles basically using search engines, the heading of the article being the
decision point to click and dig into the details, articles are read first diagonally to see if they are interesting,
and then read in a “lean forward” position. Eventually, web articles can include multimedia content, either
photo or video, but only as a complement to the textual part. Text articles are cross referenced based on their
metadata, but accompanying multimedia is not.
On the other hand, a video CMS must deal with multimedia (audio, photo, video, streaming) as the basic
subject of management. Usually multimedia needs to be transcoded when ingested. Files are not usually
stored inside the database tables because of its size; instead, the file is stored in a file system, and the
database has a pointer to the file. Since there is no information inside the binary asset that may help its
categorization, metadata becomes more crucial. Since a video clip cannot be read diagonally, metadata is the
main decision point for a user to decide if the video is interesting or not. Therefore metadata must not only
refer to the description and categorization of the content, but also to some attributes that present the video
attractive, and these attributes are usually incorporated by other users, in terms of ranks, comments, number
of downloads, etc. Therefore, in video CMS metadata is live and dynamic, and comes from a feedback loop

from the very same consumers. In video CMS, multimedia content objects are cross-related based on the
metadata, although they can include textual descriptions as complements.
One final consideration about video portals published in web is whether the CMS should include the player or
not. The player provides control on the service and on the business model. The player guarantees the user
experience is good and coherent, it allows to insert advertising, it allows to show related content and to
retrieve community feedback. But on the other side it limits the experience to the web; if the service and
business model has to be opened and extended to other scenarios, such as connected TV, multimedia
players, media centers, etc, the lack of player must be replaced by APIs in the CMS side, providing similar
functionality.
We may find some degrees on video portals, or on video CMS, from Youtube and 24/24 Actu to Orange TV.
Let's try to analyze which are the main features requested to the supporting CMS in these examples.
3.5.1 Ergonomics of PC vs. TV
Web users take an active role in their content consumption experience, while TV viewers are passive. There
are some reasons for this difference in behavior:
→ The ergonomics of the environment for web users is different than for TV viewers. Web users sit
leaning forward, close from the monitor, with one hand managing a mouse and the other on a
keyboard. These input devices are designed to allow users to control their interaction, and therefore
their experience. Their 10 fingers are seized for control.
→ Leaning forward and being close to the monitor also facilitates the population of the web page with
numerous content elements, which are not only reachable with the mouse, but just readable!
→ Leaning forward also places the input device, such as the keyboard, close to our view, so we don’t
need to make eye focus efforts to change our vision from the keyboard to the monitor, even very
frequently. The same activity in front of a TV can make us sick.
→ Web users are offered numerous chances inside the browser to click away from any given web page,
such as a back button, a search box, links inside the page, banners, etc. This makes web users to
spend a few seconds per visited page.
→ Web users have distractions to events and applications outside their web browser, which frequently
interrupt their consumption of web content.
→ Even the web navigation is layered, because the user can control several instances or tabs of his
browser, so multiplying the chances of non-linear navigation.
→ PC is basically for individual use, while TVs are shared between the elements of the family. That gives
more freedom to a web user to navigate erratically in a non-linear manner.
→ A PC browser is easily upgradeable to include support for new libraries and SW architectures. This is
obviously not the case when dealing with TVs.
There are a lot of more differences and human factors between the PC platform and the TV platform that need
to be considered before asking a multiplatform CMS to solve all of them. It’s not just about adapting the
content, but about adapting the service.
3.5.2 Video-only CMS
For video-only CMS, including VoD and Catch-up TV, CMS are usually referred to as DAM (Digital Asset
Managers).
In VoD, the digital library is the core of the business. DAM refers to the management, organization, and
distribution of digital assets from a central repository, which involves controlling the entry of new digital assets,
assigning and editing metadata associated with the asset, and providing means for indexing and search. A
digital asset is any form of media that has been turned into a binary source. MAM, for Multimedia Asset
Management, refers to assets before being digitalized.
DAMs offer the following functionality:

→ Video and audio ingestion (provisioning), which does not come in a continuous feed, but in batch
packages, except for Catch-up TV.
→ Encoding and upload: Powerful tools for transcoding and checking the quality of one or more outputs,
and this includes managing several languages and subtitles, and most important, adding encryption
for DRM.
→ Indexing: Video producers usually provide basic metadata describing the asset content (what is in the
package?), which needs to be enriched with more specific metadata (title, director, cast, CSA rating,
etc), the encoding caracteristics (MPEG 2, MPEG-4, framerate, bitrate, …), ownership, rights of
access, as well as many others.
→ Storing: because of its size, digital assets are not usually stored in databases, but in filesystems,
having the database a pointer to the file location.
→ Animation: In the animation process, metadata dealing with the product and the presentation is
added, such as genre or offer group, price, highlights, effectivity dates, etc.
→ Distribution and delivery: Once ready for broadcast, the assets are distributed through CDNs, pushing
them closer to the final consumers, and the video portal is updated. The video portal includes all the
sections navigable by the customers, with its embedded logic (authentication, payment, parental
control, etc), and links to the media streamers.
3.5.3 Video aggregator CMS
If we consider a video aggregator portal such as 24/24 Actu we may find several differencial features we need
to be present in the supporting CMS.
→ The video information has an intrinsic actuality component, and the number of files increase by the
hundreds everyday, therefore the process from ingestion to publication has to be more direct and
therefore automatic than for example the VoD scenario.
→ The transcoding can be requested to the external providers, so they can provide the videos in an
specified format and codec. Furthermore, they can provide several instances in different formats if the
portal is multiplatform.
→ There are some features that lose wheight, such as DRM, content delivery through CDN and QoS,
because in this case users may admit small pixelations and some streaming interruptions.
→ On the other hand, some features gain importance, such as metadata generation and cross-
referencing. Depending on the product objectives, the analysis of data an metadata, indexing and
cross-referencing algorithms, frame or scene analysis, closed-caption analysis, etc, might be
preferrable to be requested to ad-hoc external modules.
3.5.4 Video sharing CMS
In a Youtube or DailyMotion-like portal, the differential features are:
→ Uploading and transcoding: For UGC, users need to be provided video uploading capabilities, and
uploaded videos have to be transcoded to common formats, even in real time for immediate
publication.
→ Storage: Videos may be uploaded by the thousands everyday, and may last forever. Storage capacity
and massive indexation must be carefully considered.
→ Moderation: Content is produced and published by the users, but moderation is necessary in order to
avoid unproper, abusive or copyrighted content. Moderation organizations need to be carefully
planned and accounted for in the business cases.
→ Tagging: Video sharing portals are fundamentally auto-configured. We don’t need editors tagging
videos and adding metadata, but this is done by the own community. The CMS metamodel needs to
admit “User Generated Metadata”, in the form of rating, comments, related videos, similar navigation,

etc, as well as audience metrics for promoting “the most viewed”, “the most popular”, “the most
commented”, etc.
3.6 Web 2.0
3.6.1 Introduction
The Web 2.0 is the evolution from a content-centric architecture to a user-centric framework. Since the hype
of Web 2.0 explosion, materialized in sites based in blogs, wikis, podcasts, RSS syndication, etc, there is a
permanent question mark made to all CMS: is it Web 2.0 compliant? The same question is often formulated
this way: does it support Ajax?
But, what is Web 2.0?. What is Ajax?. If a CMS does support Ajax, is it therefore Web 2.0 compatible?
The term Web 2.0 was coined in 2004 by Dale Dougherty, a vice-president of O’Reilly Media Inc, in an
internal brainstorming, to pinpoint the transformation of the web from read only to a read/write tool, and based
on collaboration, contribution and community, which facilitates a more socially connected Web.
We may say that Web 2.0 is about enabling and encouraging participation through open applications and
services. By this definition, ‘participation’ means the ability to contribute, share, mix and publish both data
(technology) and ideas (content) through web applications and services. Practical examples include the ability
to post a photo from a smartphone in response to a news article on a web site. Or taking the Google Maps
API and mixing this up with some of user’s own code to produce a site that allows us to find where a
restaurant is.
In this sense, Britannica Online, page views, or publishing are to Web 1.0 as Wikipedia, cost per click and
participation are to Web 2.0, respectively. And following these comparisons, CMS are to Web 1.0 as wikis and
blogs are to Web 2.0.
3.6.2 The Web 2.0 key ideas
Tim O'Reilly describes Web 2.0 around six key ideas:
→ Individual production and User Generated Content: In the past, content production and
publication, in the form of text or media, had a one-way flow, where content editors would produce the
content, and the users would silently consume it. Web 2.0 changes dramatically this situation, where
end-users take an active role in the content chain, and become both content producers and
consumers. Besides sharing the content, end users are also looking for new ways to participate in
networked or social relationships. Web applications are not then static frameworks, but they
dynamically evolve. In other words, the end-users are becoming applications developers themselves
together with other end-users.

→ Harnessing the power of the croud, through collaborative production or crowdsourcing. We can
translate crowdsourcing as “asking our users to work for us” (and probably without them knowing). For
instance, Google produces its pagerank based on how the users hit and link the pages, and so, the
users of Google are not really their customers, but their producers. As another example, if an airplane
crashes somewhere in the world, chances are that somebody around takes a picture and uploads it to
Twitter, building a new concept in the content chain: the real-time web. Finally, we can take a look at
Foursquare, where the whole business model has been planned and built from the principles of user
production and collaboration.
Figure 1: Information creation and circulation before and after Twitter
→ Data on an epic scale: In Web 2.0 applications, the value of the software is proportional to the scale
and dynamism of the data it manages. The winners in the playground are companies that have
developed the ability to collect and manage this data on an epic scale. Much of the data is collected
indirectly from users and aggregated. This data can be recombined in multiple ways, and also made

available, via APIs, to developers, who can create mash-ups. Mash-ups can be again collected,
aggregated and the new elaborated data can be made available through new open APIs…
→ Architecture of Participation: The architecture of participation occurs when, through normal use of
an application or service, the service itself gets better. To the user, this appears to be a side effect of
using the service, but in fact, the system has been designed to take the user interactions and utilize
them to improve itself. An example is Google Search. Another example is Bittorrent: the service gets
better the more people use it.
→ Network Effects: There are two key concepts around the size of internet as a network: the Network
effect and the Long-tail. The Network effect has to do with the economic and social implications of
adding new users to a service based on the Internet, and not from its physical point of view, but from
the increase in value to the existing users of a service in which there is some form of interaction with
others, as more and more people start to use it. The Long-tail reveals niches of content that can be
exploited due to the fact that the use of internet does not impose physical barriers. For instance, the
frequency of hits to hypertext links follow a power rule or asymptotic law: there are few links accessed
very frequently, and a long tail of links whose number of hits tend to zero. But since there are not
barriers, such as self space in a book store, those links are left available for ever. Music stores that
have an online service observe this phenomenon: while most of the on-store sales (60-70%) account
for new albums, that percentage is reversed (30-40%) on behalf of the online sales. Amazon is a
notorious master in taking benefit of the long-tail effect.
→ Openness: The development of the Web has seen a wide range of legal, regulatory, political and
cultural developments surrounding the control, access and rights of digital content. However, the Web
has also always had a strong tradition of working in an open fashion and this is also a powerful force
in Web 2.0: working with open standards, using open source software, making use of free data, re-
using data and working in a spirit of open innovation.
3.6.3 Technology and standards
One of the key drivers of the development of Web 2.0 is the emergence of a new generation of Web-related
technologies and standards. This has been underpinned by the powerful idea of the Web as platform.
3.6.3.1 Ajax
On traditional HTML-based websites, when the user chooses an option or clicks on a hypertext link, he has to
wait for pages to reload and refresh. Several attempts have been made over the years to improve the
dynamism of web pages through individual. However, but it is really only with the introduction of Ajax
(Asynchronous Javascript + XML) that this has come together successfully. Using Ajax, only small amounts of
information pass to and from the server once the page has first been loaded. This allows a portion of a
webpage to be dynamically reloaded in real-time and creates the impression of richer, more 'natural'
applications.
Although Ajax is a group of technologies, the core is the Ajax engine, which acts as an intermediary, sitting
within the client’s browser and facilitating asynchronous communication with the server of smaller items of
information. So, if a webpage contains a lot of text, plus, as a side-bar, and a graph of the current stock
prices, this graph can be asynchronously updated in real-time without the whole page being reloaded every
few seconds. The Ajax engine processes every action that would normally result in a trip back to the server for
a page reload, before making any really necessary referrals back to the server. The traffic between the client
and the server is reduced, because no update to the mark-up is necessary (HTML), only pure data.
3.6.3.2 SOAP and REST
In order to communicate over networks we need standardized data formats and protocols. How will we
standardize the protocols used to transport the XML documents? We need the use of simplified programming
models, which facilitate the creation of loosely coupled systems, or WebServices. A WebService is a set of
standards and protocols to interchange data between applications.

However, there is a source of debate about which is the best programming practice, whether to use
'heavyweight' and rather formal techniques such as WebServices, or, on the ‘lightwheight’ way of using
scripting languages such as Perl, Python, PHP and Ruby, along with technologies such as RSS, Atom and
JSON.
The discussions about style within the Web development community are materializing around two main
approaches: REST and SOAP. Both REST and SOAP are often termed "Web services," and one is often
used in place of the other, but they are totally different approaches: REST is an architectural style for building
client-server applications, whether SOAP is a protocol specification for exchanging data between two
endpoints.
• REST stands for Representational State Transfer, an architectural idea and set of architectural
principles. It is not a standard, but describes an approach for a client/server, stateless architecture
which provides a simple communications interface using XML and HTTP. REST is mainly pushed by
Yahoo.
• SOAP and WebServices, on the other hand, are more formal and use messaging, complex protocols
and Web Services Description Language (WSDL). Google is the pusher for SOAP.
3.6.4 CMS readiness to Web 2.0
In order to assess the readyness to Web 2.0 for a CMS, we need to consider its beneficiaries:
• The organization of internal content producers and managers
• The human audience: those people who use, consume and potentially enrich the content
• The machine or software audience: those devices or applications that will consume machine readable
views of the content
3.6.4.1 Internal staff
CMS offer web-based tools to editorial staff and content managers, so the system needs to address the new
demands of this audience. Here are a few things to pay attention to:
• Toolkits: The CMS tools that manage Web 2.0 sites should be as direct and productive as the Web
2.0 sites themselves, and so, posting a document in the intranet or creating a new article in our news
portal should be as easy as creating a wiki or a blog entry. Otherwise, staff can avoid using the CMS.
• Rich Experience: CMS administrative interfaces based on a web browser need to be modernized,
with Rich Internet Applications (RIA) technicques, in order to provide complex operations without the
wait for page refreshes between operations. It’s a matter of applying consumer-oriented design values
to enterprise-oriented software.
• Immediacy: Users like to be able to perform an action and see a result, and any delay discourages
participation. The CMS must provide flexibility in the workflows and move toward lighter processes
and quick publishing operations. Complex security policies and workflow models that put several
approval layers between a contributor and publishing do not fit well with user expectation of
immediacy.
3.6.4.2 External human audience
As we have seen, consumers have become participants. A modern Web CMS must acknowledge this and
demonstrate the ability to both integrate the public as participants and deliver the tools visitors expect.
• Flexible data models: We don’t want our website to be a static brochure, nor we want our
information architecture to be impenetrable to the people we want to welcome. CMS with weak data
taxonomy support are limiting. On the contrary, we will be pushing for multiple paths to the same

content and provide a richer Web experience by creating dynamic views of content based on what the
user is looking for.
• Social Bookmarking: Page structure and metadata should allow out site to be considered by social
bookmarking or tagging sites such as Del.icio.us. These services allow users to identify useful assets
and categorize them in a way that the external world may have a better chance of understanding.
• Community Generated Content Support: There are two forms of UGC: primary content and content
metadata. Depending on our goals, the CMS should support one or both of them.
o If we plan to allow our audience to publish primary content assets, the CMS needs to have a
flexible content data model and the ability to support public participation in the publishing
workflow.
o The CMS must also support the many forms of metadata, including voting, rating, comments
and tagging, which help users promote and associate content that they have found to be
valuable.
• Integration of Community Generated Content: CMS must address the issue of storing community-
generated content. Many CMS are designed around a multi-tier architecture with a strongly secured
delivery tier that pushes content to the read-only presentation tier for display. Some systems have the
management tier pre-render (bake) content into formatted HTML, while others have a presentation tier
that dynamically renders (fries) pages with each page request. Content presented to the customer is
essentially read-only. However, when we want our external visitors to contribute to the website, the
issue of where to store this content raises.
o For baking style presentation systems, the strategy is to manage community-generated
content separately from the editorial content, that can be done on a separate, dynamic
section of the site as in user forum area.
o Frying style presentation systems have the option of dynamically sending visitor-generated
content to the presentation tier. Of course, read-write presentation tier will require clustering
configurations to be more complex.
• Multi-device Support: Publishing to different formats, such as wireless devices, requires better
separation of content and layout and more manageable presentation systems, so CMS that tightly
bind content structure and layout will be in a poor positioning.
3.6.4.3 External consumer applications
In Web 2.0, our content should live beyond the boundaries of our website, and therefore focus must be put in
supporting machine readable content formats such as RSS, which allows the community to easily access our
services, and potentially use it to create new value.
• Syndication with Web Feeds: Syndicating the content in standard format feeds such as RSS or
Atom is both strategic and well mannered in the Web 2.0 world, and it allows our content to be
aggregated into high traffic resources.
• Public APIs: Open API's create the potential of spontaneous partnerships through “mashups.”
• Use of Microdata: Using Microdata in the rendered Web pages allows machines to scan a page for
information, and extract structured content out of it (see paragraph 5.2.1).
3.7 Principal features of a CMS
Although in previous paragraphs we have presented a thorough list of features desired in a generic CMS,
some of them are more important than others. In this chapter we will filter this global list, providing
recommendations depending on the scenario.

CMS White Paper

Recommended

Recommended

More Related Content

Similar to CMS White Paper

Similar to CMS White Paper (20)

Recently uploaded

Recently uploaded (20)

CMS White Paper