
John G. Breslin · Alexandre Passant · Stefan Decker

The Social
Semantic Web


John G. Breslin
Electrical and Electronic Engineering
School of Engineering and Informatics
National University of Ireland, Galway
Nuns Island
Galway
Ireland
john.breslin@nuigalway.ie

Alexandre Passant
Digital Enterprise Research
Institute (DERI)
National University of Ireland, Galway
IDA Business Park
Lower Dangan
Galway
Ireland
alexandre.passant@deri.org

Stefan Decker
Digital Enterprise Research
Institute (DERI)
National University of Ireland, Galway
IDA Business Park
Lower Dangan
Galway
Ireland
stefan.decker@deri.org

ISBN 978-3-642-01171-9 e-ISBN 978-3-642-01172-6
DOI 10.1007/978-3-642-01172-6
Springer Heidelberg Dordrecht London New York

Library of Congress Control Number: 2009936149

ACM Computing Classification (1998): H.3.5, H.4.3, I.2, K.4

© Springer-Verlag Berlin Heidelberg 2009
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting,
reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9,
1965, in its current version, and permission for use must always be obtained from Springer. Violations
are liable to prosecution under the German Copyright Law.
The use of general descriptive names, registered names, trademarks, etc. in this publication does not
imply, even in the absence of a specific statement, that such names are exempt from the relevant protective
laws and regulations and therefore free for general use.

Cover design: KuenkelLopka GmbH

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Contents

1 Introduction to the book
1.1 Overview
1.2 Aims of the book, and who will benefit from it?
1.3 Structure of the book
1.3.1 Motivation for applying Semantic Web technologies to the Social Web
1.3.2 Introduction to the Social Web (Web 2.0, social media, social software)
1.3.3 Adding semantics to the Web
1.3.4 Discussions
1.3.5 Knowledge and information sharing
1.3.6 Multimedia sharing
1.3.7 Social tagging
1.3.8 Social sharing of software
1.3.9 Social networks
1.3.10 Interlinking online communities
1.3.11 Social Web applications in enterprise
1.3.12 Towards the Social Semantic Web

2 Motivation for applying Semantic Web technologies to the Social Web
2.1 Web 2.0 and the Social Web
2.2 Addressing limitations in the Social Web with semantics
2.3 The Social Semantic Web: more than the sum of its parts
2.4 A food chain of applications for the Social Semantic Web
2.5 A practical Social Semantic Web

3 Introduction to the Social Web (Web 2.0, social media, social software)
3.1 From the Web to a Social Web
3.2 Common technologies and trends
3.2.1 RSS
3.2.2 AJAX
3.2.3 Mashups
3.2.4 Advertising
3.2.5 The Web on any device
3.2.6 Content delivery
3.2.7 Cloud computing
3.2.8 Folksonomies
3.3 Object-centred sociality
3.4 Licensing content
3.5 Be careful before you post
3.6 Disconnects in the Social Web

4 Adding semantics to the Web
4.1 A brief history
4.2 The need for semantics
4.3 Metadata
4.3.1 Resource Description Framework (RDF)
4.3.2 The RDF syntax
4.4 Ontologies
4.4.1 RDF Schema
4.4.2 Web Ontology Language (OWL)
4.5 SPARQL
4.6 The ‘lowercase’ semantic web, including microformats
4.7 Semantic search
4.8 Linking Open Data
4.9 Semantic mashups
4.10 Addressing the Semantic Web ‘chicken-and-egg’ problem

5 Discussions
5.1 The world of boards, blogs and now microblogs
5.2 Blogging
5.2.1 The growth of blogs
5.2.2 Structured blogging
5.2.3 Semantic blogging
5.3 Microblogging
5.3.1 The Twitter phenomenon
5.3.2 Semantic microblogging
5.4 Message boards
5.4.1 Categories and tags on message boards
5.4.2 Characteristics of forums
5.4.3 Social networks on message boards
5.5 Mailing lists and IRC

6 Knowledge and information sharing
6.1 Wikis
6.1.1 The Wikipedia
6.1.2 Semantic wikis
6.1.3 DBpedia
6.1.4 Semantics-based reputation in the Wikipedia
6.2 Other knowledge services leveraging semantics
6.2.1 Twine
6.2.2 The Internet Archive
6.2.3 Powerset
6.2.4 OpenLink Data Spaces
6.2.5 Freebase

7 Multimedia sharing
7.1 Multimedia management
7.2 Photo-sharing services
7.2.1 Modelling RDF data from Flickr
7.2.3 Annotating images using Semantic Web technologies
7.3 Podcasts
7.3.1 Audio podcasts
7.3.2 Video podcasts
7.3.3 Adding semantics to podcasts
7.4 Music-related content
7.4.1 DBTune and the Music Ontology
7.4.2 Combining social music and the Semantic Web

8 Social tagging
8.1 Tags, tagging and folksonomies
8.1.1 Overview of tagging
8.1.2 Issues with free-form tagging systems
8.2 Tags and the Semantic Web
8.2.1 Mining taxonomies and ontologies from folksonomies
8.2.2 Modelling folksonomies using Semantic Web technologies
8.3 Tagging applications using Semantic Web technologies
8.3.1 Annotea
8.3.2 Revyu.com
8.3.3 SweetWiki
8.3.4 int.ere.st
8.3.5 LODr
8.3.6 Atom Interface
8.3.7 Faviki
8.4 Advanced querying capabilities thanks to semantic tagging
8.4.1 Show items with the tag ‘semanticweb’ on any platform
8.4.2 List the ten latest items tagged by Alexandre on SlideShare
8.4.3 List the tags used by Alex on SlideShare and by John on Flickr
8.4.4 Retrieve any content tagged with something relevant to the Semantic Web field

9 Social sharing of software
9.1 Software widgets, applications and projects
9.2 Description of a Project (DOAP)
9.2.1 Examples of DOAP use
9.3 Crawling and browsing software descriptions
9.4 Querying project descriptions and related data
9.4.1 Locating software projects from people you trust
9.4.2 Locating a software project related to a particular topic

10 Social networks
10.1 Overview of social networks
10.2 Online social networking services
10.3 Some psychology behind SNS usage
10.4 Niche social networks
10.5 Addressing some limitations of social networks
10.6 Friend-of-a-Friend (FOAF)
10.6.1 Consolidation of people objects
10.6.2 Aggregating a person’s web contributions
10.6.3 Inferring relationships from aggregated data
10.7 hCard and XFN
10.8 The Social Graph API and OpenSocial
10.8.1 The Social Graph API
10.8.2 OpenSocial
10.9 The Facebook Platform
10.10 Some social networking initiatives from the W3C
10.11 A social networking stack

11 Interlinking online communities
11.1 The need for semantics in online communities
11.2 Semantically-Interlinked Online Communities (SIOC)
11.2.1 The SIOC ontology
11.2.2 SIOC metadata format
11.2.3 SIOC modules
11.3 Expert finding in online communities
11.3.1 FOAF for expert finding
11.3.2 SIOC for expert finding
11.4 Connections between community description formats
11.5 Distributed conversations and channels
11.6 SIOC applications
11.7 A food chain for SIOC data
11.7.1 SIOC producers
11.7.2 SIOC collectors
11.7.3 SIOC consumers
11.8 RDFa for interlinking online communities
11.9 Argumentative discussions in online communities
11.10 Object-centred sociality in online communities
11.11 Data portability in online communities
11.11.1 The DataPortability working group
11.11.2 Data portability with FOAF and SIOC
11.11.3 Connections between portability efforts
11.12 Online communities for health care and life sciences
11.12.1 Semantic Web Applications in Neuromedicine
11.12.2 Science Collaboration Framework
11.12.3 bio-zen and the art of scientific community maintenance
11.13 Online presence
11.14 Online attention
11.15 The SIOC data competition

12 Social Web applications in enterprise
12.1 Overview of Enterprise 2.0
12.2 Issues with Enterprise 2.0
12.2.1 Social and philosophical issues with Enterprise 2.0
12.2.2 Technical issues with Enterprise 2.0
12.3 Improving Enterprise 2.0 ecosystems with semantic technologies
12.3.1 Introducing SemSLATES
12.3.2 Implementing semantics in Enterprise 2.0 ecosystems
12.3.3 SIOC for collaborative work environments

13 Towards the Social Semantic Web
13.1 Possibilities for the Social Semantic Web
13.2 A community-guided Social Semantic Web
13.2.1 Wisdom of the crowds and the Semantic Web
13.2.2 A grassroots approach
13.2.3 The vocabulary onion
13.3 Integrating with the Social Semantic Desktop
13.4 Privacy and identity on the Social Semantic Web
13.4.1 Keeping privacy in mind
13.4.2 Identity fragmentation
13.5 The vision of a Social Semantic Web

Acknowledgments

Dedication from John

Biographies

References

1 Introduction to the book

1.1 Overview

The Social Web - encompassing social networking services such as MySpace,
Facebook and orkut, as well as content-sharing sites (that also offer social
networking functionality) like Flickr, Last.fm and del.icio.us - has captured the
attention of millions of users as well as billions of dollars in investment and
acquisition. As more social websites form around the connections between people
and their objects of interest (to avoid these sites becoming boring), and as these
‘object-centred networks’ (where people connect via these objects of interest) grow
bigger and more diverse, more intuitive methods are needed for representing and
navigating the content items in these sites: both within and across social websites.
Also, to better enable user access to multiple sites and ultimately to
content-creation facilities on the Web, interoperability among social websites is
required in terms of both the content objects and the person-to-person networks
expressed on each site. This requires representation mechanisms to interconnect
people and objects on the Web in an interoperable and extensible way (Breslin and
Decker 2007).
Semantic Web representation mechanisms are ideally suited to describing people
and the objects that link them together in such object-centred networks, by
recording and representing the heterogeneous ties that bind each to the other. By
using agreed-upon Semantic Web formats to describe people, content objects, and
the connections that bind them together, social networks can also interoperate by
appealing to common semantics. Developers are already using Semantic Web
technologies to augment the ways in which they create, reuse, and link content on
social networking and social websites. These efforts include the Friend-of-a-Friend
(FOAF) project¹ for describing people and relationships, the Nepomuk social
semantic desktop², a framework for extending the desktop to a collaborative
environment for information management and sharing, and the
Semantically-Interlinked Online Communities (SIOC) initiative³ for representing
online discussions (Breslin et al. 2005). Some social networking services (SNSs),
such as FriendFeed, are also starting to provide query interfaces to their data,
which others can reuse and link to via the Semantic Web.
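To illustrate, the following minimal Python sketch describes a person, her account on a forum, and one of her posts using FOAF and SIOC terms, serialised as N-Triples. The example.org URIs, the data and the helper functions are hypothetical; the namespace URIs and the terms used (foaf:Person, foaf:name, foaf:knows, sioc:UserAccount, sioc:account_of, sioc:Post, sioc:has_creator, sioc:content) are taken from the FOAF and SIOC vocabularies. A production exporter would normally rely on an RDF library rather than hand-built strings.

```python
# Minimal sketch: describe a person, her forum account and a post with FOAF
# and SIOC terms, serialised as N-Triples. All example.org URIs and data are
# hypothetical.
FOAF = "http://xmlns.com/foaf/0.1/"
SIOC = "http://rdfs.org/sioc/ns#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text: str) -> str:
    """Quote a plain literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(triples) -> str:
    """Emit one 'subject predicate object .' statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


alice = iri("http://example.org/people/alice#me")
bob = iri("http://example.org/people/bob#me")
account = iri("http://forum.example.org/user/alice")
post = iri("http://forum.example.org/post/42")

triples = [
    # The person and one of her social ties (FOAF)
    (alice, iri(RDF_TYPE), iri(FOAF + "Person")),
    (alice, iri(FOAF + "name"), literal("Alice")),
    (alice, iri(FOAF + "knows"), bob),
    # Her account on one site, and content created with that account (SIOC)
    (account, iri(RDF_TYPE), iri(SIOC + "UserAccount")),
    (account, iri(SIOC + "account_of"), alice),
    (post, iri(RDF_TYPE), iri(SIOC + "Post")),
    (post, iri(SIOC + "has_creator"), account),
    (post, iri(SIOC + "content"), literal('She said "hello"')),
]

print(to_ntriples(triples))
```

Keeping the person (foaf:Person) separate from each site account (sioc:UserAccount) is what allows several accounts on different sites to be attributed to the same individual.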
The Semantic Web is a useful platform for linking and for performing operations
on diverse person- and object-related data (as shown in Figure 1.1) gathered
from heterogeneous social websites (in what is termed ‘Web 2.0’).
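One such operation is consolidating the descriptions of a single person gathered from different sites. The sketch below assumes that each site publishes a foaf:mbox_sha1sum value (the SHA-1 hash of a mailto: URI) for its users: because FOAF declares this property inverse functional, two resources sharing the same value can be treated as the same person and linked with owl:sameAs. The site URIs, e-mail addresses and function names are hypothetical.

```python
import hashlib
from collections import defaultdict

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def mbox_sha1sum(email: str) -> str:
    """foaf:mbox_sha1sum value: hex SHA-1 of the full 'mailto:' URI."""
    return hashlib.sha1(f"mailto:{email}".encode("utf-8")).hexdigest()


def consolidate(*profiles):
    """Group person URIs from several sites by their mbox_sha1sum and
    link all URIs in a group to the first one with owl:sameAs."""
    uris_by_hash = defaultdict(set)
    for profile in profiles:
        for person_uri, sha1 in profile.items():
            uris_by_hash[sha1].add(person_uri)
    links = []
    for uris in uris_by_hash.values():
        first, *rest = sorted(uris)
        links.extend((first, OWL_SAME_AS, other) for other in rest)
    return links


# Hypothetical exports: each site publishes only the hashed mailbox
site_a = {
    "http://a.example/user/john": mbox_sha1sum("john@example.org"),
    "http://a.example/user/mary": mbox_sha1sum("mary@example.org"),
}
site_b = {
    "http://b.example/people/jb": mbox_sha1sum("john@example.org"),
}

for link in consolidate(site_a, site_b):
    print(link)
```

Publishing the hash rather than the address lets sites support this kind of matching without exposing users' e-mail addresses in plain text.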

1 http://www.foaf-project.org/ (URL last accessed 2009-06-09)
2 http://nepomuk.semanticdesktop.org/ (URL last accessed 2009-06-09)
3 http://sioc-project.org/ (URL last accessed 2009-06-09)

J.G. Breslin et al., The Social Semantic Web, DOI 10.1007/978-3-642-01172-6_1,
© Springer-Verlag Berlin Heidelberg 2009


[Figure: diagram with the labels ‘Unified representation’, ‘Metadata and vocabularies’, ‘Ontology languages’, ‘Unified queries’ and ‘Web 2.0’]

Fig. 1.1. Interconnecting and reusing distributed Web 2.0 data with semantic technologies

In the other direction, object-centred networks and user-centric services for
generating collaborative content can serve as rich data sources for Semantic Web
applications (Figure 1.2).

[Figure: diagram with the labels ‘Collaboration’, ‘Architecture of participation’, ‘Authoring’, ‘Browsing interfaces’, ‘Mash-ups’ and ‘Semantic Web’]

Fig. 1.2. Powering semantic applications with rich community-created content and Web 2.0 paradigms

This linked data can provide an enhanced view of individual or community
activity in localised or distributed object-centred social networks. In fact, since
all this data can be semantically interlinked using well-defined semantics (e.g.
using the FOAF and SIOC ontologies), in theory it makes no difference whether the
content is distributed or localised. All of this data can be considered as a single
interlinked, machine-understandable graph layer (with nodes as users or related
data and arcs as relationships) over the existing Web of documents and hyperlinks,
i.e. a ‘Giant Global Graph’, a term recently coined by Tim Berners-Lee⁴. Moreover,
such interlinked data enables advanced querying capabilities, for example, ‘show
me all the content that Alice has acted on in the past three months in any SNS’.
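The query quoted above can be sketched over a small, hypothetical set of aggregated triples. For brevity, ‘acted on’ is simplified to ‘created’, prefixed names such as sioc:account_of stand in for full URIs, and plain Python loops stand in for what would normally be a SPARQL query over a triple store; all data and function names are illustrative assumptions.

```python
from datetime import date, timedelta

# Hypothetical triples aggregated from several sites. Items point to a site
# account (sioc:has_creator); accounts point to a person (sioc:account_of),
# so one person can own accounts on many sites.
triples = [
    ("flickr:acct/alice", "sioc:account_of", "person:alice"),
    ("blog:acct/alice", "sioc:account_of", "person:alice"),
    ("blog:acct/bob", "sioc:account_of", "person:bob"),
    ("flickr:photo/1", "sioc:has_creator", "flickr:acct/alice"),
    ("flickr:photo/1", "dcterms:created", date(2009, 5, 20)),
    ("blog:post/7", "sioc:has_creator", "blog:acct/alice"),
    ("blog:post/7", "dcterms:created", date(2008, 11, 2)),
    ("blog:post/8", "sioc:has_creator", "blog:acct/bob"),
    ("blog:post/8", "dcterms:created", date(2009, 6, 1)),
]


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the graph."""
    return [s for (s, p, o) in triples if p == predicate and o == obj]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [o for (s, p, o) in triples if s == subject and p == predicate]


def recent_content(person, today, days=90):
    """Items created by any of the person's accounts within `days` of today."""
    cutoff = today - timedelta(days=days)
    results = []
    for account in subjects("sioc:account_of", person):
        for item in subjects("sioc:has_creator", account):
            if any(created >= cutoff for created in objects(item, "dcterms:created")):
                results.append(item)
    return sorted(results)


print(recent_content("person:alice", today=date(2009, 6, 9)))
```

The join from person to accounts to items is what makes the query site-independent: adding data from a new site only requires linking its accounts to the person.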
In this book, we will begin with our motivations followed by overviews of both
the Social Web and the Semantic Web. Then we will describe some popular social
media and social networking applications, list some of their strengths and
limitations, and describe some applications of Semantic Web technologies to address
current issues with social websites by enhancing them with semantics.
Across these heterogeneous social websites, we will demonstrate a twofold
approach towards integrating the Social Web and the Semantic Web: in particular,
(1) by demonstrating how the Semantic Web can serve as a useful platform for
linking and for performing operations on diverse person- and object-related data
gathered from these websites, and (2) by showing that in the other direction, social
websites can themselves serve as rich data sources for Semantic Web applications.
We shall conclude with some observations on how the application of Semantic
Web technologies to the Social Web is leading towards the ‘Social Semantic
Web’, forming a network of interlinked and semantically-rich content and
knowledge.

1.2 Aims of the book, and who will benefit from it?

Initially, we aim to educate readers on evolving areas from the world of
collaboration and communication systems, social software and the Social Web. We
shall also show connections with parallel developments in the Semantic Web effort.
Then, we will illustrate how social software applications can be enhanced and
interconnected with semantic technologies, including semantic and structured
blogging, interconnecting community sites, semantic wikis, and distributed social
networks. The goal of this book is that readers will be able to apply Semantic Web
technologies to these and to other application areas in what is termed the Social
Semantic Web.
This book is intended for computer science professionals, researchers, academics
and graduate students interested in understanding the technologies and research
issues involved in applying Semantic Web technologies to social software.
Applications such as blogs, social networks and wikis require more automated ways
for information distribution. Practitioners and developers interested in such
application areas will also learn about methods for increasing the levels of
automation in these forms of web communication.

4 http://dig.csail.mit.edu/breadcrumbs/node/215 (URL last accessed 2009-06-09)

For those who have background knowledge in the area of the Semantic Web,
we envisage that this book will help you to develop application knowledge in
relation to social software and other widely-used Social Web technologies. For
those who already have application knowledge in web engineering or in the
development of systems such as wikis, social networks and blogs, we hope this book
will inspire ideas on how to increase the usability of social software and other
web systems using Semantic Web technologies.

1.3 Structure of the book

We shall now give an introduction to the chapters in this book and explain the
logical chapter layout and flow (Figure 1.3).
Following an overview of the motivation for combining the Social Web and the
Semantic Web, we will proceed with an introduction to various technologies and
trends in both the Social Web and the Semantic Web domains.

Fig. 1.3. Chapter flow for the book


This will be followed by a series of chapters in which various Social Web
application areas will be introduced, and semantic enhancements to these areas will
be described. The areas we focus on are: online discussion systems such as forums,
blogs and mailing lists; knowledge sharing services such as wikis and other sites
for (mainly textual) information storage and retrieval; multimedia services for
sharing images, audio and video files; bookmarking sites and similar services
organised around tagging functionality; sites for publishing and sharing community
software projects; online social networking services; interlinked online
communities; and enterprise applications. The ratio of semantic to non-semantic
implementations varies between these chapters, since state-of-the-art semantic
techniques have gained more traction in some application areas than in others.
Finally, in the last chapter we will describe approaches for integrating these
social semantic applications in what we have termed ‘Social Semantic Information
Spaces’.

1.3.1 Motivation for applying Semantic Web technologies to the Social Web

This chapter will focus on the motivation for applying Semantic Web technologies
to the Social Web, as summarised in the introductory description just given.

1.3.2 Introduction to the Social Web (Web 2.0, social media, social software)

We shall begin with an overview of social websites, looking at common Social
Web technologies and methods for collaboration, content sharing, data exchange
and representation (enhancing interaction and exchange with AJAX and mashups,
how content is being categorised via tagging and folksonomies, etc.). We shall
also discuss existing structured content that is available from social websites,
mainly via content syndication whereby people can keep up to date with published
material using RSS, Atom and other subscription methods. Then we will introduce
the notion of object-centred sociality (referencing the observations of Jyri
Engeström and Karen Knorr-Cetina), where social websites are organised around
the objects of interest that connect people together.
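Object-centred sociality can be made concrete by projecting a network of people and objects onto people alone: two people become linked through every object they have both interacted with. The following Python sketch does this over a hypothetical list of (person, object) interactions; the data and the function name are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (person, object) interactions, e.g. who bookmarked,
# tagged or commented on which item
interactions = [
    ("alice", "photo:eiffel"),
    ("bob", "photo:eiffel"),
    ("bob", "song:track9"),
    ("carol", "song:track9"),
    ("alice", "song:track9"),
    ("dave", "video:cats"),
]


def object_centred_ties(pairs):
    """Project a person-object network onto persons: each pair of people is
    tied by the set of objects they have both interacted with."""
    people_by_object = defaultdict(set)
    for person, obj in pairs:
        people_by_object[obj].add(person)
    ties = defaultdict(set)
    for obj, people in people_by_object.items():
        for a, b in combinations(sorted(people), 2):
            ties[(a, b)].add(obj)
    return dict(ties)


for (a, b), objs in sorted(object_centred_ties(interactions).items()):
    print(a, "<->", b, "via", sorted(objs))
```

Keeping the set of shared objects on each tie, rather than a bare count, preserves the reason two people are connected, which is exactly the information object-centred sites are organised around.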


1.3.3 Adding semantics to the Web

In this chapter, we will examine the state of the art in the Semantic Web, such as
metadata and ontology standards and mashups, as well as some efforts aimed at
providing semantic search and leveraging linked data. We shall explain why
object-centred sociality provides a rationale for representing Social Web content
using semantics. The chapter will focus not only on the ‘uppercase’ Semantic Web
(where formal specifications such as OWL and RDF are used to represent ontologies
and associated metadata), but will also look at the ‘lowercase’ semantic web
(where developer-led efforts in the microformats community are creating simple
semantic structures for use by ‘people first, machines second’).
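To give a flavour of how the ‘lowercase’ and ‘uppercase’ approaches can meet, the sketch below uses Python's standard html.parser module to read a simple hCard (using the microformat's vcard, fn, nickname and url class names) and emit FOAF-style triples. The mapping to foaf:name, foaf:nick and foaf:homepage, the blank-node label and the function names are our own illustrative choices, and the parser is deliberately simplified: it handles a single card and no nested markup inside a property.

```python
from html.parser import HTMLParser


class HCardParser(HTMLParser):
    """Simplified hCard reader: collects fn/nickname text and url hrefs
    found inside an element with class 'vcard'."""

    # Illustrative mapping from hCard class names to FOAF properties
    TEXT_PROPS = {"fn": "foaf:name", "nickname": "foaf:nick"}

    def __init__(self):
        super().__init__()
        self.in_vcard = False
        self.current = None  # FOAF property awaiting its text value
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "vcard" in classes:
            self.in_vcard = True
        if not self.in_vcard:
            return
        if "url" in classes and attrs.get("href"):
            self.triples.append(("_:card", "foaf:homepage", attrs["href"]))
        for cls in classes:
            if cls in self.TEXT_PROPS:
                self.current = self.TEXT_PROPS[cls]

    def handle_data(self, data):
        if self.current and data.strip():
            self.triples.append(("_:card", self.current, data.strip()))
            self.current = None

    def handle_endtag(self, tag):
        self.current = None


def hcard_to_triples(markup: str):
    parser = HCardParser()
    parser.feed(markup)
    parser.close()
    return parser.triples


html = """<div class="vcard">
  <a class="url fn" href="http://alice.example.org/">Alice Smith</a>
  <span class="nickname">ali</span>
</div>"""

for triple in hcard_to_triples(html):
    print(triple)
```

The markup stays readable to ‘people first’, while a few lines of code are enough to lift it into a form that ‘uppercase’ Semantic Web tools can consume.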

1.3.4 Discussions

We shall describe the area of blogging, one of the most popular Social Web
activities. Blogs are online journals or sets of chronological news entries that are
maintained by individuals, communities or commercial entities, and can be used to
publish personal opinions, diary-like articles or news stories relating to a particular
interest or product. We shall begin by describing current approaches to blogging,
and detail how semantic technologies improve both the processes of creating and
editing blog posts, and of browsing and querying the data created by blogs (via
structured blogging and semantic blogging). We shall also discuss forums, mailing
lists, and other web-based discussion systems such as microblogging, a recent
trend towards lightweight and agile communication on the Web.

1.3.5 Knowledge and information sharing

Wikis are collaboratively-edited websites that can be updated or added to by any-
one with an interest in the topic covered by the wiki site, and have been used to
create online encyclopaedias, photo galleries and literature collections. We shall
describe the Social Web application area of wikis, and explain how adding
semantics to wikis can offer distinct benefits: augmenting the natural-language text
in wiki articles with structured data and typed links enables advanced querying and
browsing. We shall examine popular semantic wikis in usage today (e.g. Semantic
MediaWiki), and we will look at semantic services that leverage structured infor-
mation from wikis (such as the DBpedia). We shall briefly detail how a reputation
system with embedded semantics could be deployed in a large-scale community
site like the Wikipedia. We shall also look at the latest wave of knowledge
networking and information sharing services (including Twine, Freebase, and
OpenLink Data Spaces).

1.3.6 Multimedia sharing

We shall begin by looking at Social Web applications for storing and sharing pho-
tographs and other images (Flickr, Zooomr, etc.), and describe an application
called FlickRDF that exports semantic data from the Flickr service. We shall then
describe both audio and video podcasting, and give some ideas for the application
of semantics to this area (e.g. through metadata descriptions and applications like
ZemPod). We shall finish the chapter with a description of how semantic tech-
nologies can be applied to social music services and websites like Last.fm,
through projects such as DBTune and the Music Ontology.

1.3.7 Social tagging

This chapter will discuss social tagging and bookmarking services on the Web.
We shall look at tagging and how semantics can assist the tagging process as well
as enhancing related aspects such as tag clouds. We shall look at annotated social
bookmarks, where sites like del.icio.us are allowing people to publicly publish
textual descriptions of their favourite links along with associated annotations of
use to others, and we will describe different issues related to tagging behaviours.
We shall describe how semantics can be added to tagging systems, both by defin-
ing models to represent tagging activities or particular behaviours and by extract-
ing a hierarchy of concepts or vocabularies from tags. Semantic social bookmark-
ing and tagging applications (e.g. int.ere.st, Revyu, LODr) will also be described
to emphasise how different aspects of tagging applications can be augmented
thanks to Semantic Web technologies.

1.3.8 Social sharing of software

The Social Web allows us to share not only data or multimedia content, but also
applications, especially free-software applications or lightweight add-ons to web
pages such as widgets. We shall look at how interoperability among social
websites is possible not just in terms of the expressed content but also in terms of the
social applications in use (e.g. widgets) on each site. We shall give an overview of
existing ways to share software on the Web, focussing on how a social aspect can
be added to data such as software projects or widget descriptions. We shall follow
this with a description of methods for describing software projects using
semantics, and we will see how applications can be identified and discovered on the
Web thanks to these semantics. We shall also discuss how trust mechanisms for
consuming applications can be leveraged via the distributed social graph so that
users can decide whom to accept any new data or applications from.

1.3.9 Social networks

We shall begin with an overview of social networks, and look at current
developments regarding the ‘social graph’. We shall further describe the idea of
object-centred sociality as introduced in Chapter 3. We shall then discuss initiatives from major
Web companies to provide interoperability between social networking applica-
tions such as Facebook Connect and Google’s OpenSocial and Social Graph APIs.
We shall finish the chapter with a description of how open and distributed seman-
tic social networks can be created through definitions such as Friend-of-a-Friend
(FOAF) or XHTML Friends Network (XFN), enabling interoperability between
different SNSs.

1.3.10 Interlinking online communities

We shall describe the usage of Semantic Web technologies for enhancing commu-
nity portals and for connecting heterogeneous social websites - SIOC is currently
being used for information structuring as well as for export and information dis-
semination. We shall describe current standardisation activities as well as research
prototype applications and commercial implementations. We shall also show how
SIOC can be combined with other ontologies (including FOAF, SKOS, and Dub-
lin Core) in architectures for community site interoperability. We will look at cur-
rent projects that enable one to query for topics or to browse distributed discussion
content across various types of social websites (e.g. the SIOC Explorer, Sindice
SIOC Widget).

1.3.11 Social Web applications in enterprise

We shall begin with an overview of Enterprise 2.0, looking at how Social Web
applications are being used internally and externally by companies. We shall then
examine the application of Semantic Web technologies to Enterprise 2.0 ecosys-
tems. In particular, we will look at the usage of semantics in integrated enterprise
social software suites as well as how the Semantic Web can help us to integrate
the various components that are being used in Enterprise 2.0 ecosystems. For
example, we will show how collaborative work environments can be enhanced
through the application of semantics (e.g. SIOC4CWE).

1.3.12 Towards the Social Semantic Web

Finally, we will discuss and present current approaches to realise the ideas of
Vannevar Bush (Bush 1945) and Doug Engelbart (Engelbart 1962) on distributed
collaboration infrastructures, towards both the Social Semantic Web and the
Social Semantic Desktop (together, we term these Social Semantic Information
Spaces). We can combine the semantically-enhanced social software applications
described in previous chapters into a Social Semantic Information Space. In the
spirit of seminal visions such as Bush’s Memex and Engelbart’s open hyperdocu-
ment system (OHS), this chapter will detail how previous perspectives on group
forming, network modelling and algorithms, and innovative IT-based interaction
with feedback are driving new initiatives for creating semantic connections within
and between people’s information spaces.

2 Motivation for applying Semantic Web technologies to the Social Web

Many will have become familiar with popular Social Web applications such
as blogging, social networks and wikis, and will be aware that we are heading
towards an interconnected information space (through the blogosphere,
inter-wiki links, mashups, etc.). At the same time, these applications are
encountering limitations in terms of information integration, dissemination, reuse,
portability, searchability, automation and more demanding tasks like
querying. The Semantic Web is increasingly targeting these application areas:
quite a number of Semantic Web approaches have appeared in recent years
to overcome these limitations, e.g. semantic wikis
(Semantic MediaWiki), knowledge networking (Twine), embedded microcon-
tent detection and reuse (Operator, Headup, Semantic Radar), social graph
and data portability APIs (from Google and Facebook), etc. In an effort to
consolidate and combine knowledge about existing efforts, we aim to educate
readers about Social Web application areas and new avenues open to com-
mercial exploitation in the Semantic Web. We shall give an overview of how
the Social Web and Semantic Web can be meshed together.

2.1 Web 2.0 and the Social Web

One of the most visible trends on the Web is the emergence of the Web 2.0 tech-
nology platform. The term Web 2.0 refers to a perceived second-generation of
Web-based communities and hosted services. Although the term suggests a new
version of the Web, it does not refer to an update of the World Wide Web techni-
cal specifications, but rather to new structures and abstractions that have emerged
on top of the ordinary Web. While it is difficult to define the exact boundaries of
what structures or abstractions belong to Web 2.0, there seems to be an agreement
that services and technologies like blogs, wikis, folksonomies, podcasts, RSS
feeds (and other forms of many-to-many publishing), social software and social
networking sites, web APIs, web standards1 and online web services are part of
Web 2.0. Web 2.0 has not only been a technological but also a business trend:
according to Tim O’Reilly2: ‘Web 2.0 is the business revolution in the computer
industry caused by the move to the Internet as platform, and an attempt to
understand the rules for success on that new platform’.

1 http://www.webstandards.org/ (URL last accessed 2009-06-09)
2 http://radar.oreilly.com/archives/2006/12/web-20-compact.html (URL last accessed 2009-06-09)

Social networking sites such as Facebook (one of the world’s most popular
SNSs), Friendster (an early SNS previously popular in the US, now widely used in
Asia), orkut (Google’s SNS), LinkedIn (an SNS for professional relationships) and
MySpace (a music and youth-oriented service) - where explicitly-stated networks
of friendship form a core part of the website - have become part of the daily lives
of millions of users, and have generated huge amounts of investment since they
began to appear around 2002. Since then, the popularity of these sites has grown
hugely and continues to do so. Boyd and Ellison (2007) have described the history
of social networking sites, and suggested that in the early days of SNSs, when
only the SixDegrees service existed, there simply were not enough users: ‘While
people were already flocking to the Internet, most did not have extended networks
of friends who were online’. A graph from Internet World Stats3 shows the growth
in the number of Internet users over time. Between 2000 (when SixDegrees shut
down) and 2003 (when Friendster became the first successful SNS), the number of
Internet users had doubled.
Web 2.0 content-sharing sites with social networking functionality such as
YouTube (a video-sharing site), Flickr (for sharing images) and Last.fm (a music
community site) have enjoyed similar popularity. The basic features of a social
networking site are profiles, friend listings and commenting, often along with
other features such as private messaging, discussion forums, blogging, and media
uploading and sharing. In addition to SNSs, other forms of social websites include
wikis, forums and blogs. Some of these publish content in structured formats ena-
bling them to be aggregated together.
A common property of Web 2.0 technologies is that they facilitate collabora-
tion and sharing between users with low technical barriers – although usually on
single sites (e.g. Technorati) or with a limited range of information (e.g. RSS,
which we will describe later). In this book we will refer to this collaborative and
sharing aspect as the ‘Social Web’, a term that can be used to describe a subset of
Web interactions that are highly social, conversational and participatory. The term
‘Social Web’ may also be used instead of ‘Web 2.0’, as it makes clearer which
feature of the Web is being referred to4.
The Social Web has applications on intranets as well as on the Internet. On the
Internet, the Social Web enables participation through the simplification of user
contributions via blogs and tagging, and has unleashed the power of
community-based knowledge acquisition with efforts like Wikipedia demonstrating the
collective ‘wisdom of the crowds’ in creating the largest encyclopaedia. One outcome of
such websites, especially wikis, is that they can collectively produce more valuable
knowledge than individuals working separately. In this sense, the Social Web can
be seen as a way to create collective intelligence at a Web-scale level, following
the ‘we are smarter than me’ principles5 (Libert and Spector 2008).

3 http://www.internetworldstats.com/emarketing.htm (URL last accessed 2009-06-09)
4 http://en.wikipedia.org/wiki/Social_web (URL last accessed 2009-06-09)

Similar technologies are also being used in company intranets as effective
knowledge management, collaboration and communication tools between employ-
ees. Companies are also aiming to make social website users part of their IT
‘team’, e.g. by allowing users to have access to some of their data and by bringing
the results into their business processes (Tapscott and Williams 2007).

2.2 Addressing limitations in the Social Web with semantics

A limitation of current social websites is that they are isolated from one another
like islands in the sea (Figure 2.1). For example, different online discussions may
contain complementary knowledge and topics, segmented parts of an answer that a
person may be looking for, but people participating in one discussion do not have
ready access to information about related discussions elsewhere. As more and
more social websites, communities and services come online, the lack of
interoperability between them becomes obvious. The Social Web creates a set of isolated
data silos or ‘stovepipes’, i.e. there are many sites, communities and services that
cannot interoperate with each other, where synergies are expensive to exploit, and
where reuse and interlinking of data is difficult and cumbersome.
The main reason for this lack of interoperation is that for most Social Web ap-
plications, communities, and domains, there are still no common standards for
knowledge and information exchange or interoperation available. RSS (Really
Simple Syndication), a format for publishing recently-updated Web content such
as blog entries, was the first step towards interoperability among social websites,
but it has various limitations that make it difficult to be used efficiently in such an
interoperability context, as we will see later.
Another extension of the Web, the Semantic Web, aims to provide the tools that are
necessary to define extensible and flexible standards for information exchange and
interoperability. The Scientific American article by Berners-Lee, Hendler and Lassila
(Berners-Lee et al. 2001) defined the Semantic Web as ‘an extension of the current
Web in which information is given well-defined meaning, better enabling
computers and people to work in cooperation’. The last couple of years have seen
large efforts going into the definition of the foundational standards supporting data
interchange and interoperation, and a fairly well-defined Semantic Web
technology stack now exists, enabling the definition of metadata and associated
vocabularies.

5 http://www.wearesmarter.org/ (URL last accessed 2009-06-09)



Fig. 2.1. Creating bridges between isolated communities of users and their data6

A number of Semantic Web vocabularies have achieved wide deployment –
successful examples include RSS 1.0 for the syndication of information, FOAF,
for expressing personal profile and social networking information, and SIOC, for
interlinking communities and distributed conversations. These vocabularies share
a common property: they are small, but at the same time horizontal – i.e. they are part
of many different domains. Each vertical domain (e.g. e-health) would typically
reuse a number of these horizontal vocabularies, and when deployed the vocabularies
would be able to interact with each other.
The Semantic Web effort is in an ideal position to make social websites inter-
operable by providing standards to support data interchange and interoperation be-
tween applications, enabling individuals and communities to participate in the
creation of distributed interoperable information. The application of the Semantic
Web to the Social Web is leading to the ‘Social Semantic Web’ (Figure 2.2), cre-
ating a network of interlinked and semantically-rich knowledge. This vision of the
Web will consist of interlinked documents, data, and even applications created by
the end users themselves as the result of various social interactions, and it is
modelled using machine-readable formats so that it can be used for purposes that the
current state of the Social Web cannot achieve without difficulty. As Tim
Berners-Lee said in a 2005 podcast7, Semantic Web technologies can support online
communities even as ‘online communities [...] support Semantic Web data by being
the sources of people voluntarily connecting things together’. For example, social
website users are already creating extensive vocabularies and semantically-rich
annotations through folksonomies (Mika 2005a).

6 Images courtesy of Pidgin Technologies at http://www.pidgintech.com/

Fig. 2.2. The Social Semantic Web

Because a consensus of community users is defining the meaning, these terms
are serving as the objects around which those users form more tightly-connected
social networks. This goes hand-in-hand with solving the chicken-and-egg prob-
lem of the Semantic Web (i.e. you cannot create useful Semantic Web applications
without the data to power them, and you cannot produce semantically-rich data
without the interesting applications themselves): since the Social Web contains
such semantically-rich content, interesting applications powered by Semantic Web
technologies can be created immediately.
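Folksonomies are commonly modelled as a set of (user, tag, resource) assignments. The Python sketch below assumes that model and a hypothetical threshold `min_users`: a tag is treated as community consensus for a resource once enough distinct users have applied it. The example.org URIs are placeholders, and lower-casing tags is an assumed normalisation step rather than something the text prescribes.

```python
from collections import defaultdict

# One tag assignment: (user, tag, resource), the usual tripartite folksonomy model.
Assignment = tuple[str, str, str]


def consensus_tags(assignments: list[Assignment], min_users: int = 2) -> dict[str, list[str]]:
    """Per resource, the tags applied by at least `min_users` distinct users."""
    users_per_pair: dict[tuple[str, str], set[str]] = defaultdict(set)
    for user, tag, resource in assignments:
        # Assumed normalisation: 'SemWeb' and 'semweb' count as the same term.
        users_per_pair[(resource, tag.lower())].add(user)
    result: dict[str, list[str]] = defaultdict(list)
    for (resource, tag), users in sorted(users_per_pair.items()):
        if len(users) >= min_users:
            result[resource].append(tag)
    return dict(result)


# Hypothetical assignments from three users.
tags = [
    ("alice", "semweb", "http://example.org/sioc"),
    ("bob", "SemWeb", "http://example.org/sioc"),
    ("carol", "rdf", "http://example.org/sioc"),
    ("bob", "rdf", "http://example.org/foaf"),
]
print(consensus_tags(tags))  # {'http://example.org/sioc': ['semweb']}
```

Only ‘semweb’ survives, because it is the one term that two different users agreed on for the same resource.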

2.3 The Social Semantic Web: more than the sum of its parts

The combination of the Social Web and Semantic Web can lead to something
greater than the sum of its parts: a Social Semantic Web (Auer et al. 2007,
Blumauer and Pellegrini 2008) where the islands of the Social Web can be
interconnected with semantic technologies, and Semantic Web applications are enhanced
with the wealth of knowledge inherent in user-generated content.

7 http://esw.w3.org/topic/IswcPodcast (URL last accessed 2009-06-09)

In this book, we will describe various solutions that aim to make social web-
sites interoperable, and which will take them beyond their current limitations to
enable what we have termed Social Semantic Information Spaces8. Social Seman-
tic Information Spaces are a platform for both personal and professional collabora-
tive exchange with reusable community contributions. Through the use of Seman-
tic Web data, searchable and interpretable content is added to existing Web-based
collaborative infrastructures and social spaces, and intelligent use of this content
can be made within these spaces - bringing the vision of semantics on the Web to
its most usable and exploitable level.
Some typical application areas for social spaces are wikis, blogs and social
networks, but they can include any spaces where content is being created, anno-
tated and shared amongst a community of users. Each of these can be enhanced
with machine-readable data to not only provide more functionality internally, but
also to create an overall interconnected set of Social Semantic Information Spaces.
These spaces offer a number of possibilities in terms of increased automation and
information dissemination that are not easily realisable with current social soft-
ware applications:
• By providing better interconnection of data, relevant information can be
obtained from related social spaces (e.g. through social connections, inferred
links, and other references).
• Social Semantic Information Spaces allow you to gather all your contributions
and profiles across various sites (‘subscribe to my brain’), or to gather content
from your friend / colleague connections.
• These spaces allow the Web to be used as a clipboard for exchange between
various collaborative applications (for example, by allowing readers to
drag structured information from wiki pages into other applications, geographic
data about locations on a wiki page could be used to annotate information on an
event or a travel review in a blog post one is writing).
• Such spaces can help users to avoid having to express the same information
repeatedly when they belong to several different social spaces.
• Due to the wealth of semantic information available about users, their interests and
relationships to other entities, personalisation of content and interface input
mechanisms can be performed, and innovative ways for presenting related
information can be created.
• These semantic spaces will also allow the creation of social semantic mashups,
combining information from distributed data sources together that can also be
enhanced with semantic information, for example, to provide the geolocations
of friends in your social network who share similar interests with you.
• Fine-grained questions can be answered through such semantic social spaces,
such as ‘show me all content by people both geographically and socially near to
me on the topic of movies’.
• Social Semantic Information Spaces can make use of emergent semantics to
extract more information from both the content and any other embedded
metadata.

8 http://www2006.org/tutorials/#T13 (URL last accessed 2009-06-09)
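The ‘movies near me’ question can be read as two filters combined: social distance (hops along friend links) and geographic distance. The Python sketch below is one hypothetical way to answer it over in-memory data; the people, coordinates, post titles and the thresholds `max_hops=2` and `max_km=50` are all illustrative assumptions, and the great-circle distance uses the standard haversine formula with a mean Earth radius of 6371 km.

```python
import math
from collections import deque
from dataclasses import dataclass


@dataclass
class Post:
    author: str
    topic: str
    title: str


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def hops_from(knows: dict[str, list[str]], start: str) -> dict[str, int]:
    """Breadth-first search: number of 'knows' steps from start to each reachable person."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for friend in knows.get(person, []):
            if friend not in dist:
                dist[friend] = dist[person] + 1
                queue.append(friend)
    return dist


def nearby_content(me, knows, locations, posts, topic, max_hops=2, max_km=50.0):
    """Titles of posts on `topic` by people within max_hops socially and max_km geographically."""
    hops = hops_from(knows, me)
    return [
        p.title for p in posts
        if p.topic == topic
        and p.author != me
        and hops.get(p.author, math.inf) <= max_hops
        and haversine_km(locations[me], locations[p.author]) <= max_km
    ]


# Hypothetical data.
knows = {"me": ["friend"], "friend": ["friend_of_friend"]}
locations = {
    "me": (53.27, -9.05),                # Galway
    "friend": (53.28, -9.06),            # Galway
    "friend_of_friend": (53.35, -6.26),  # Dublin
    "stranger": (53.27, -9.05),          # Galway, but outside my network
}
posts = [
    Post("friend", "movies", "Review A"),
    Post("friend_of_friend", "movies", "Review B"),
    Post("stranger", "movies", "Review C"),
    Post("friend", "music", "Gig report"),
]
print(nearby_content("me", knows, locations, posts, "movies"))  # ['Review A']
```

The stranger is geographically close but socially unconnected, and the friend-of-a-friend is socially close but about 185 km away, so only one review passes both filters.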
There have been initial approaches in collaborative application areas to incor-
porate semantics in these applications with the aim of adding more functionality
and enhancing data exchange - semantic wikis, semantic blogs and semantic social
networks. These approaches require closer linkages and cross-application demon-
strators to create further semantic integration both between and across application
areas (e.g. not just blog-to-blog connections, but also blog-to-wiki exchanges). A
combination of such semantic functionality with existing grassroots efforts such as
OpenID9 (a single sign-on mechanism) or OAuth10 (a delegated authorisation protocol) can
bring the Social Web to another level. Not only will this lead to an increased num-
ber of enhanced applications, but an overall interconnected set of Social Semantic
Information Spaces can be created.

2.4 A food chain of applications for the Social Semantic Web

A semantic data ‘food chain’, as shown in Figure 2.3, consists of various produc-
ers, collectors and consumers of semantic data from social networks and social
websites. Applying semantic technologies to social websites can greatly enhance
the value and functionality of these sites.
The information within these sites is forming vast and diverse networks which
can benefit from Semantic Web technologies for representation and navigation.
Additionally, in order to easily enable navigation and data exchange across sites,
mechanisms are required to represent the data in an interoperable and extensible
way. These are termed semantic data producers.
An intermediary step, which may or may not be required, is the collection of
semantic data. In very large sites, this may not be an issue as the information in
the site may be sufficiently linked internally to warrant direct consumption after
production, but in general, many users make small contributions across a range of
services which can benefit from an aggregate view through some collection ser-
vice. Collection services can include aggregation and consolidation systems, se-
mantic search engines or data lookup indexes.
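The three roles in the food chain can be illustrated with a toy pipeline. This is a minimal sketch, not the architecture of any particular system: the two producer functions, the example.org URIs and the in-memory set acting as a collector are hypothetical. It does use two real SIOC properties, `sioc:has_creator` (linking an item to the user account that created it) and `sioc:account_of` (linking that account to the person who holds it).

```python
Triple = tuple[str, str, str]
SIOC = "http://rdfs.org/sioc/ns#"
ANN = "http://example.org/people/ann"  # hypothetical person URI


def blog_producer() -> list[Triple]:
    """Producer: a hypothetical blog exporting one post with SIOC terms."""
    return [
        ("http://blog.example.org/post/1", SIOC + "has_creator", "http://blog.example.org/user/ann"),
        ("http://blog.example.org/user/ann", SIOC + "account_of", ANN),
    ]


def wiki_producer() -> list[Triple]:
    """Producer: a hypothetical wiki doing the same for one of its pages."""
    return [
        ("http://wiki.example.org/Page_A", SIOC + "has_creator", "http://wiki.example.org/user/ann"),
        ("http://wiki.example.org/user/ann", SIOC + "account_of", ANN),
    ]


def collect(batches: list[list[Triple]]) -> set[Triple]:
    """Collector: merge triples from many producers; the set drops re-crawled duplicates."""
    store: set[Triple] = set()
    for batch in batches:
        store.update(batch)
    return store


def contributions_of(store: set[Triple], person: str) -> list[str]:
    """Consumer: every item a person created on any site, found via their accounts."""
    accounts = {s for s, p, o in store if p == SIOC + "account_of" and o == person}
    return sorted(s for s, p, o in store if p == SIOC + "has_creator" and o in accounts)


# The blog is crawled twice, as an aggregator revisiting a site might do.
store = collect([blog_producer(), wiki_producer(), blog_producer()])
print(len(store), contributions_of(store, ANN))
```

Neither site alone knows about both items; the aggregate view only appears once the collector has merged their output.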

9 http://openid.net/ (URL last accessed 2009-06-09)
10 http://oauth.net/ (URL last accessed 2009-06-09)


Fig. 2.3. A food chain for semantic data on the Social Web

The final step involves consumers of semantic data. Social networking tech-
nologies enable people to articulate their social network via friend connections. A
social network can be viewed as a graph where the nodes represent individuals
and the edges represent relations. Methods from graph theory can be used to study
these networks, and we refer to initial work by Ereteo et al. (2008) on how social
network analysis can consume semantic data from the food chain.
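As a small illustration of applying graph-theoretic methods to such a network, the sketch below builds an adjacency structure from friend connections and computes normalised degree centrality, one of the simplest social network analysis measures. It assumes friendship is symmetric (an undirected graph), which is a simplification: `foaf:knows`, for instance, is directional. The names are hypothetical.

```python
from collections import defaultdict


def build_graph(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Adjacency sets: nodes are people, each edge is a (symmetric) friend relation."""
    graph: dict[str, set[str]] = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def degree_centrality(graph: dict[str, set[str]]) -> dict[str, float]:
    """Share of the other n - 1 people that each person is directly connected to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


friends = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"), ("bob", "cat")]
centrality = degree_centrality(build_graph(friends))
print(max(centrality, key=centrality.get))  # ann
```

Here ann is connected to everyone else (centrality 1.0), while dan, with a single connection, scores 1/3.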
Also, representing social data in RDF (Resource Description Framework), a
language for describing web resources in a structured way, enables us to perform
queries on a network to locate information relating to people and to the content
that they create. RDF can be used to structure and expose information from the
Social Web allowing the simple generation of semantic mashups for both proprie-
tary and public information. HTML content can also be made compatible with
RDF through RDFa (RDF annotations embedded in XHTML attributes), thereby
enabling effective semantic search without requiring one to crawl a new set of
pages (e.g. the Common Tag11 effort allows metadata and URIs for tags to be ex-
posed using RDFa and shared with other applications). Interlinking social data
from multiple sources may give an enhanced view of information in distributed
communities, and we will describe applications to consume and exchange this
interlinked data in later chapters.
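To give a feel for querying social data held as RDF, the sketch below stores a few triples as plain Python tuples and answers triple patterns in which `None` acts as a variable, roughly what a SPARQL basic graph pattern does. The FOAF and SIOC namespace URIs and the properties `foaf:name`, `foaf:knows`, `sioc:account_of` and `sioc:has_creator` are real; the people, accounts and post under example.org are hypothetical, and a real application would use an RDF store and SPARQL rather than this hand-rolled matcher.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
SIOC = "http://rdfs.org/sioc/ns#"
EX = "http://example.org/"  # hypothetical namespace for the sample data

TRIPLES = {
    (EX + "ann", FOAF + "name", "Ann"),
    (EX + "ann", FOAF + "knows", EX + "bob"),
    (EX + "bob", FOAF + "name", "Bob"),
    (EX + "forum/user/bob", SIOC + "account_of", EX + "bob"),
    (EX + "forum/post/7", SIOC + "has_creator", EX + "forum/user/bob"),
}


def match(s=None, p=None, o=None):
    """Triple-pattern lookup; None plays the role of a SPARQL variable."""
    return [(ts, tp, to) for ts, tp, to in TRIPLES
            if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)]


def posts_by_friends(person):
    """Roughly: ?post where <person> foaf:knows ?f . ?acct sioc:account_of ?f . ?post sioc:has_creator ?acct"""
    posts = []
    for _, _, friend in match(person, FOAF + "knows"):
        for account, _, _ in match(None, SIOC + "account_of", friend):
            posts += [post for post, _, _ in match(None, SIOC + "has_creator", account)]
    return sorted(posts)


print(posts_by_friends(EX + "ann"))  # ['http://example.org/forum/post/7']
```

The query only works because the post, the forum account and the person are connected by explicit, typed links; that is the property RDF adds over plain HTML pages.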

11 http://commontag.org/ (URL last accessed 2009-07-07)


2.5 A practical Social Semantic Web

Applying Semantic Web technologies to social websites allows us to express dif-
ferent types of relationships between people, objects and concepts. By using
common, machine-readable ways for expressing data about individuals, profiles,
social connections and content, these technologies provide a way to interconnect
people and objects on a Social Semantic Web in an interoperable, extensible way.
On the conventional Web, navigation of data across social websites can be a
major challenge. Communities are often dispersed across numerous different sites
and platforms. For example, a group of people interested in a particular topic may
share photos on Flickr, bookmarks on del.icio.us and hold conversations on a dis-
cussion forum. Additionally, a single person may hold several separate online ac-
counts, and have a different network of friends on each. The information existing
on each of these websites is generally disconnected, lacking in semantics, and is
centrally controlled by a single organisation. Individuals generally lack control or
ownership of their own data.
Social websites are becoming more prevalent and content is more distributed.
This presents new challenges for navigating such data. Machine-readable descrip-
tions of people and objects, and the use of common identifiers, can allow for link-
ing diverse information from heterogeneous social networking sites. This creates a
starting point for easy navigation across the information in these networks.
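One common identifier used for this in FOAF is `foaf:mbox_sha1sum`, the SHA-1 hash of a person’s `mailto:` URI, which lets sites agree that two accounts belong to the same person without publishing the address itself. The sketch below, assuming two hypothetical sites that each export a list of profiles, groups accounts by that hash. The profile dictionary layout is an illustrative assumption, and for brevity the hash is computed here from the raw address, whereas a site following this approach would publish only the hash.

```python
import hashlib
from collections import defaultdict


def mbox_sha1sum(email: str) -> str:
    """Hex SHA-1 of the mailto: URI, in the manner of foaf:mbox_sha1sum."""
    return hashlib.sha1(f"mailto:{email}".encode("utf-8")).hexdigest()


# Hypothetical profile exports from two unrelated sites.
photo_site = [
    {"account": "http://photos.example.org/u/ann", "email": "ann@example.org",
     "interests": ["photography"]},
]
forum_site = [
    {"account": "http://forum.example.net/member/42", "email": "ann@example.org",
     "interests": ["semantic web"]},
    {"account": "http://forum.example.net/member/43", "email": "bob@example.org",
     "interests": ["music"]},
]


def link_profiles(*sites: list[dict]) -> dict[str, dict[str, list[str]]]:
    """Merge profiles from several sites, keyed by the shared hashed identifier."""
    people: dict[str, dict[str, list[str]]] = defaultdict(
        lambda: {"accounts": [], "interests": []})
    for site in sites:
        for profile in site:
            person = people[mbox_sha1sum(profile["email"])]
            person["accounts"].append(profile["account"])
            person["interests"].extend(profile["interests"])
    return dict(people)


linked = link_profiles(photo_site, forum_site)
ann = linked[mbox_sha1sum("ann@example.org")]
print(ann["accounts"])
```

Ann’s photo account and forum account are now recognised as one person, with interests drawn from both sites.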
The use of common formats allows interoperability across sites, enabling users
to reuse and link to content across different platforms. This also provides a basis
for data portability, where users can have ownership and control over their own
data and can move profile and content information between services as they wish.
Recently there has been a push within the web community to make data portability
(i.e. the ability for users to port their own data wherever they wish) a reality12.
Additionally, the Social Web and social networking sites can contribute to the
Semantic Web effort. Users of these sites often provide metadata in the form of
annotations and tags on photos, ratings, blogroll links, etc. In this way, social net-
works and semantics can complement each other. Already within online commu-
nities, common vocabularies or folksonomies for tagging are emerging through a
consensus of community members.
In this book we will describe a variety of practical Social Semantic Web appli-
cations that have been enhanced with extra features due to the rich content being
created in social software tools by users, including the following:
• The Twine application from Radar Networks is an example of a system that
leverages both the explicit (tags and metadata) and implicit semantics
(automatic tagging of text) associated with content items. The underlying semantic
data can also be exposed as RDF by appending ‘?rdf’ to any Twine URL.
• The SIOC vocabulary is powering an ecosystem of Social Semantic Web
applications producing and consuming community data, ranging from individual
blog exporters to interoperability mechanisms for collaborative work
environments.
• The DBpedia represents structured content from the collaboratively-edited
Wikipedia in semantic form, leveraging the semantics from many social media
contributions by multiple users. DBpedia allows you to perform semantic
queries on this data, and enables the linking of this socially-created data to other
data sets on the Web by exposing it via RDF.
• Revyu.com combines Web 2.0-type interfaces and principles such as tagging
with Semantic Web modelling principles to provide a reviews website that
follows the principles of the Linking Open Data initiative (a set of best practice
guidelines for publishing and interlinking pieces of data on the Semantic Web).
Anyone can review objects defined on other services (such as a movie from
DBpedia), and the whole content of the website is available in RDF, therefore it
is available for reuse by other Social Semantic Web applications.

12 http://www.dataportability.org/ (URL last accessed 2009-07-21)
As Metcalfe’s law states, the value of a network is proportional to the square
of the number of nodes in the network. Metcalfe’s law is strongly related to the network effect of
the Web itself: by providing various links between people, social websites can
benefit from that network effect, while at the same time the Semantic Web also
provides links between various objects on the Web thereby obeying this law
(Hendler and Golbeck 2008).
Therefore, by combining Web 2.0 and Semantic Web technologies, we can
envisage better interaction between people and communities, as the global number of
users grows, and with it the value of the network. This will be achieved by (1)
taking into account social interactions in the production of Semantic Web data,
and (2) using Semantic Web technologies to interlink people and communities.
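The quadratic growth behind Metcalfe’s law is usually motivated by counting the possible pairwise connections among n nodes. The worked equation below follows that common reading; treating every possible link as equally valuable, and the unspecified proportionality constant, are simplifying assumptions of the law rather than measured facts about any social website.

```latex
V(n) \;\propto\; \binom{n}{2} \;=\; \frac{n(n-1)}{2} \;\approx\; \frac{n^{2}}{2}
\qquad\text{e.g.}\qquad
\binom{10}{2} = 45, \quad \binom{20}{2} = 190, \quad \binom{100}{2} = 4950
```

Doubling a community from 10 to 20 members thus multiplies the number of possible links by about 4.2, which is why interlinking several small communities can be worth more than the communities kept apart.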

3 Introduction to the Social Web (Web 2.0, social media, social software)

Web 2.0 is a widely-used and wide-ranging term (in terms of interpretations),
made popular by Tim O’Reilly who wrote an article on the seven features or
principles of Web 2.0. To many people, Web 2.0 can mean many different
things. Most agree that it can be thought of as the second phase of architec-
ture and application development for the Web, and that the related term ‘So-
cial Web’ describes a Web where users can meet, collaborate, and share con-
tent on social spaces via tagged items, activity streams, social networking
functionality, etc. There are many popular examples that work along this col-
laboration and sharing meme: MySpace, del.icio.us, Digg, Flickr, Upcom-
ing.org, Technorati, orkut, 43 Things, and the Wikipedia.

3.1 From the Web to a Social Web

Since it was founded, the Internet has been used to facilitate communication not
only between computers but also between people. Usenet, mailing lists and bulletin
boards allowed people to connect with each other and enabled communities to
form, often around topics of interest. The social networks formed via these tech-
nologies were not explicitly stated, but were implicitly defined by the interactions
of the people involved. Later, technologies such as IRC (Internet Relay Chat), web
forums, instant messaging, blogging, social networking services, and even
MMOGs or MMORPGs (massively multiplayer online [role playing] games) have
continued the trend of using the Internet (and the Web) to build communities.
The structural and syntactic web put in place in the early 90s is still much the same as what we use today: resources (web pages, files, etc.) connected by untyped hyperlinks. By untyped, we mean that there is no easy way for a computer to figure out what a link between two pages means. Beyond links, the nature of the objects described in those pages (e.g. people, places, etc.) cannot be understood by software agents. In fact, the Web was envisaged to be much more (Figure 3.1). In Tim Berners-Lee’s original outline for the Web in 1989, entitled ‘Information Management: A Proposal’1, resources are connected by links describing the type of relationship between them, e.g. ‘wrote’, ‘describes’, ‘refers to’, etc. This is a precursor to the Semantic Web, which we will come back to in the next chapter.

1 http://www.w3.org/History/1989/proposal.html (URL last accessed 2009-06-09)



Fig. 3.1. Adapted from ‘Information Management: A Proposal’ by Tim Berners-Lee

Over the last decade and a half, there has been a shift from just ‘existing’ or publishing on the Web to participating in a ‘read-write’ Web. The role of a web user has changed from that of a mere consumer of content to that of an active participant in its creation. For example, Wikipedia articles are written and edited by volunteers, Amazon.com uses information about what users view and purchase to recommend products to other users, and Slashdot moderation is performed by the readers.
Web 2.0² is a widely-used and wide-ranging term (certainly in terms of interpretations) made popular by Tim O’Reilly. O’Reilly defined Web 2.0 as ‘a set of principles and practices that ties together a veritable solar system of sites that demonstrate some or all of those principles, at a varying distance from that core’. While this definition is quite vague, he defined seven features or principles of Web 2.0, to which some have added an eighth: the long tail phenomenon (i.e. many small contributors and sites outweighing the main players). Among these features, two points seem particularly important: ‘the Web as a platform’ and ‘an architecture of participation’. In spite of the 2.0 numbering, this vision is actually close to Berners-Lee’s original idea for the Web, i.e. that it should be a participative medium. For example, the first Web browser, called WorldWideWeb3, was already a read-write browser, while current ones are generally read-only.

2 http://tinyurl.com/7tcjz (URL last accessed 2009-06-09)

The first idea from O’Reilly, ‘the Web as a platform’, considers the Web and its principles as a way to provide services and value-added applications in addition to generally static content. In some cases, the Web can even be seen as a transit layer for information to the desktop or mobile devices, for example, using RSS. We can also consider that ‘the Web as a platform’ refers to the migration of traditional desktop services such as e-mail and word processing to web-based applications, for example, as provided by Google with Gmail and Google Docs. In that context, the vision of ‘an architecture of participation’ emphasises how applications, thanks to the way they were designed, can help to produce value-added content and synergies simply by being used. As people begin to use Web 2.0 applications for their own needs (uploading pictures, writing blog posts, tagging content), they enhance the global activity of the system, and this can benefit everyone. O’Reilly hence draws a comparison with open-source development principles and peer-to-peer architectures, which provide the same kind of architecture of participation.
The evolution of the Web is, in our opinion, mostly a sociological and economic one, as described in the book ‘Wikinomics’ (Tapscott and Williams 2007). However, thanks to the strong interactions between services and users, it has led to interesting practices in terms of software development. O’Reilly in particular encourages application developers to go further than they would in traditional development processes and to constantly deliver new features, leading to ‘the perpetual beta’, considering that ‘users must be treated as co-developers’. Agile development methods are therefore becoming popular on the Web, as are languages and frameworks that support such software development principles (e.g. Ruby on Rails).

Fig. 3.2. The Social Web in simple terms: users, content, tags and comments

3 http://www.w3.org/People/Berners-Lee/WorldWideWeb.html (URL last accessed 2009-06-09)


While some describe Web 2.0 simply as a second phase of architecture and application development for the World Wide Web, others mainly think of it as a place where ‘ordinary’ users can meet, collaborate, and share content using social software applications on the Web (via tagged items, social bookmarking, AJAX functionality, etc.), hence the term ‘the Social Web’. The Social Web is a platform for social and collaborative exchange with reusable community contributions, where anyone can mass publish using web-based social software, and others can subscribe to desired information, news, data flows, or other services via syndication formats such as RSS.
There are many popular examples that work along this collaboration and sharing meme: Twitter, del.icio.us, Digg, Flickr, Technorati, orkut, 43 Things, Wikipedia, etc. It is ‘social software’ that is being used for this communication and collaboration, software that ‘lets people rendezvous, connect or collaborate by use of a computer network. It results in the creation of shared, interactive spaces.’4 With the Social Web, all of us have become participants, often without realising the part we play on the Web: clicking on a search result, uploading a video or a social network page, all of this contributes to and changes this Social Web infrastructure. There may be different motivations for leveraging social websites, from personal expression to political campaigning (e.g. Barack Obama’s presidential campaign raised 87% of his funds through social websites5).
Social websites provide access to community-contributed content that is posted by one user and may be tagged and commented upon by others (Figure 3.2). That content (termed ‘social media’) can be virtually anything: blog entries, message board posts, videos, audio, images, wiki pages, user profiles, bookmarks, events, etc. Users post and share content items with others; they can annotate content with tags and browse related content via those tags; they often discuss content via comments; and they may connect to each other directly or via posted content.
Social websites that share content are covered in part by the US Digital Millennium Copyright Act (DMCA)6. It provides a safe harbour for a service that cannot reasonably prevent any and all infringing material from being uploaded (and is unaware of it). The user agreements of most social websites request that users do not add other people’s copyrighted material. Fair use is usually permitted, such that if one shares something copyrighted, one should use an extract with a link to the main content.
There is a variety of figures available for the ratio of social media contributors to casual browsers or lurkers7 on social websites. CNET’s News.com8 site says: ‘A recent Hitwise study indicates that as few as 4 percent of Internet users actually contribute to sites like YouTube and Flickr, and more than 55 percent are men. [...] To be the mainstream trend (that it [Web 2.0] deserves to be), it must evolve from the currently small group of people who are creating and filtering our content to a position where the ‘everyman’ is embraced.’

4 http://en.wikipedia.org/w/index.php?title=Social_software&oldid=26231487 (accessed 2009-06-09)
5 http://tinyurl.com/4t2r6h (URL last accessed 2009-06-09)
6 http://www.copyright.gov/legislation/dmca.pdf (URL last accessed 2009-06-09)
7 http://www.tiara.org/blog/?p=272 (URL last accessed 2009-06-09)
8 http://tinyurl.com/lrw3l9 (URL last accessed 2009-06-09)

The UK technology site vnunet.com9 mentions: ‘Bill Tancer, general manager of Hitwise, said that the company’s data showed that only a tiny fraction of users contributed content to community media sites. Just 0.16% of YouTube users upload videos, and only 0.2% of Flickr users upload photos. Wikipedia returned a more reasonable percentage, with 4.6% of visitors actually editing and adding information.’ Given the gap between these figures and the 4 percent quoted above, we can infer that the higher percentage of contributors may include those who comment on content as well as those who upload content items (videos, images, etc.).

3.2 Common technologies and trends

We shall now describe some of the common technologies used and other trends in
social websites, including RSS, AJAX, mashups, content delivery, and advertising
models. Future chapters will describe typical usages of these features, including
blogging, wiki-based collaborations, and social networking.

3.2.1 RSS

As we will see in this book, Social Web principles allow people to publish information more often and more easily. Consequently, there is a need for readers to know where to get new and pertinent information and how to consume it. Content syndication aims to solve that issue, by providing a website with the means to automatically deliver the latest content from blogs, wikis, forums or news services in computer-readable feeds that can be reused and subscribed to by other people and systems. For example, news content from newspapers is often syndicated so that headlines can be read by people in their own feed reader programs or integrated into their own websites. Rather than mass spamming via e-mail, interested parties can subscribe to feeds to be notified about changes or updates to information (self-service). A common syndication format can have many uses, including connecting services together, ‘mashing’ together data, etc.
Before syndication, monitoring information meant semi-regular visits to bookmarked sites, which made it hard to keep track of updates accurately. Now, feed aggregators or readers allow you to check multiple blog or news feeds on a regular basis, and you can choose to view only new or updated posts since your last access. You can pull information from sites and put it directly into your desktop (Thunderbird) or browser application (Google Reader, Bloglines), allowing you to quickly scan a human-readable view of multiple feeds for relevant content. Intelligent pushing of feeds (e.g. with ‘pingback’) can also be facilitated to update content immediately on aggregator sites (e.g. PlanetPlanet) or other search and navigation applications.

9 http://tinyurl.com/ksdwkw (URL last accessed 2009-06-09)

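The ‘only new or updated posts since your last access’ behaviour of an aggregator can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular reader works: the `FeedItem` type, the feed names, links and dates are hypothetical, and the feeds are assumed to have been fetched and parsed already.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class FeedItem:
    feed: str          # hypothetical name of the feed the item came from
    title: str
    link: str
    updated: datetime  # publication or last-update time of the item


def new_since(feeds: dict[str, list[FeedItem]], last_access: datetime) -> list[FeedItem]:
    """Merge items from several feeds, keep only those updated after last_access,
    skip items whose link was already seen in another feed, and sort newest first."""
    seen: set[str] = set()
    fresh: list[FeedItem] = []
    for items in feeds.values():
        for item in items:
            if item.updated > last_access and item.link not in seen:
                seen.add(item.link)
                fresh.append(item)
    return sorted(fresh, key=lambda i: i.updated, reverse=True)


if __name__ == "__main__":
    utc = timezone.utc
    # Hypothetical subscriptions: one blog feed and one news feed.
    feeds = {
        "blog": [
            FeedItem("blog", "Old post", "http://example.org/1", datetime(2009, 6, 1, tzinfo=utc)),
            FeedItem("blog", "New post", "http://example.org/2", datetime(2009, 6, 9, tzinfo=utc)),
        ],
        "news": [
            FeedItem("news", "Headline", "http://example.com/a", datetime(2009, 6, 8, tzinfo=utc)),
        ],
    }
    for item in new_since(feeds, last_access=datetime(2009, 6, 5, tzinfo=utc)):
        print(item.updated.date(), item.feed, item.title)
```

With a last access on 5 June, only the two later items are shown, newest first; the older blog post is hidden.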
Content syndication is thus a first step towards the Semantic Web as it provides
interoperability between applications. We shall see later on that it is somewhat
limited in terms of achieving the complete goal.
In order to define a standard for modelling such information feeds, various formats such as NewsML10 were proposed in the late 90s. The most commonly adopted syndication format is ‘RSS’, which has various meanings (Really Simple Syndication, Rich Site Summary and RDF Site Summary) and comes in different flavours (currently there are eight variations). Some of the variations are from private organisations (0.9 by Netscape), some of them are closed (2.0), and some of them are from open consortiums (1.0). However, they all share the same basic principles: the latest articles, with hyperlinks, titles and summaries, are syndicated using a computer-readable format (XML or RDF). In general, one does not have to worry about which feed format a blog or website provides, because practically any aggregator or news reader will be able to read it anyway. From the Semantic Web perspective, the RSS 1.0 variant (in RDF) allows us to combine syndicated articles with metadata from other vocabularies such as FOAF or Dublin Core.

Fig. 3.3. Content on a blog being published as RSS

The RSS feed structure (as shown in Figure 3.3) is as follows:
• Class ‘channel’:
  – Properties ‘title’, ‘link’, ‘description’
  – Contains ‘items’
• Class ‘item’:
  – Properties ‘title’, ‘link’, ‘description’, ‘date’, ‘creator’, etc.
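To show how this channel/item structure looks in practice, and how RSS 1.0 borrows Dublin Core for the ‘date’ and ‘creator’ properties, here is a minimal Python sketch that reads an RSS 1.0 feed with the standard library’s `xml.etree.ElementTree`. The sample feed, its URLs and the `parse_rss1` helper are hypothetical; a production reader would also handle the other RSS flavours.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespaces used by RSS 1.0 (RDF Site Summary) and its Dublin Core module.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# A hypothetical one-item feed following the channel/item structure above.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel rdf:about="http://example.org/blog">
    <title>Example blog</title>
    <link>http://example.org/blog</link>
    <description>Posts about the Social Web</description>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://example.org/blog/first-post"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://example.org/blog/first-post">
    <title>First post</title>
    <link>http://example.org/blog/first-post</link>
    <description>Hello, world.</description>
    <dc:date>2009-06-09T10:00:00Z</dc:date>
    <dc:creator>Alice</dc:creator>
  </item>
</rdf:RDF>"""


def text(elem: ET.Element, path: str) -> Optional[str]:
    """Return the text of the first child matching a prefixed path, or None."""
    return elem.findtext(path, namespaces=NS)


def parse_rss1(xml_text: str) -> dict:
    root = ET.fromstring(xml_text)
    channel = root.find("rss:channel", NS)
    feed = {
        "title": text(channel, "rss:title"),
        "link": text(channel, "rss:link"),
        "description": text(channel, "rss:description"),
        "items": [],
    }
    # In RSS 1.0 the items are siblings of <channel>, which only lists their URIs
    # in an rdf:Seq; this sketch simply reads them in document order.
    for item in root.findall("rss:item", NS):
        feed["items"].append({
            "title": text(item, "rss:title"),
            "link": text(item, "rss:link"),
            "description": text(item, "rss:description"),
            "date": text(item, "dc:date"),        # Dublin Core date
            "creator": text(item, "dc:creator"),  # Dublin Core creator
        })
    return feed


if __name__ == "__main__":
    feed = parse_rss1(SAMPLE)
    print(feed["title"], [i["creator"] for i in feed["items"]])
```

Because every element is namespace-qualified, the same document can carry further vocabularies (FOAF, for instance) without clashing with the core RSS terms.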

10 http://newsml.org/ (URL last accessed 2009-06-09)


The strength of RSS is in its generality, but therein lies its weakness: when one is subscribed to multiple channels or items, there is no easy way to group different types of content based on the available metadata. RSS is used for more than just blog headlines and news syndication, having applications in libraries (e.g. to announce new book acquisitions), shared calendars (RSSCalendar.com), recipe clubs, etc. Executives in many corporations are also starting to mandate which RSS feeds they want their companies to provide.
Similar to RSS is the Atom Syndication Format11, an XML format and recent IETF standard that is also commonly used for syndicating web feeds (e.g. from Blogger.com). The lack of unification between RSS formats is one of the reasons that led to Atom being created. The Atom Publishing Protocol12 (APP or AtomPub for short) is related to this: it is a simple HTTP-based protocol for creating and modifying web resources, and its specification was edited by Joe Gregorio and Bill de hÓra.
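As a rough sketch of how Atom and AtomPub fit together, the Python code below builds a minimal Atom entry and prepares (without sending) the HTTP POST that would ask an AtomPub server to create it in a collection. The collection URI, entry values and helper names are hypothetical; the required elements (id, title, updated, plus an author when the feed does not supply one) follow the Atom format specification.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import Optional
from urllib.request import Request

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialise Atom as the default namespace


def q(tag: str) -> str:
    """Qualify a tag name with the Atom namespace, e.g. 'title' -> '{...Atom}title'."""
    return f"{{{ATOM_NS}}}{tag}"


def atom_entry(entry_id: str, title: str, author: str, content: str,
               updated: Optional[datetime] = None) -> bytes:
    """Build a minimal Atom entry (id, title, updated, author and text content)."""
    updated = updated or datetime.now(timezone.utc)
    entry = ET.Element(q("entry"))
    ET.SubElement(entry, q("id")).text = entry_id
    ET.SubElement(entry, q("title")).text = title
    ET.SubElement(entry, q("updated")).text = updated.isoformat()
    ET.SubElement(ET.SubElement(entry, q("author")), q("name")).text = author
    ET.SubElement(entry, q("content"), type="text").text = content
    return ET.tostring(entry, encoding="utf-8", xml_declaration=True)


def post_request(collection_uri: str, entry_xml: bytes) -> Request:
    """Prepare (but do not send) an AtomPub POST that creates a new member entry."""
    return Request(
        collection_uri,
        data=entry_xml,
        method="POST",
        headers={"Content-Type": "application/atom+xml;type=entry"},
    )


if __name__ == "__main__":
    body = atom_entry("urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a",
                      "Hello AtomPub", "Alice", "My first entry.")
    req = post_request("http://example.org/blog/entries", body)  # hypothetical URI
    print(req.get_method(), req.full_url)
    print(body.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would send it; a real server would then reply with the URI of the newly created resource.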
One important aspect of content syndication is that it enables users not only to control their consumption of information but also, through Web 2.0-type services, its production (i.e. they can control when and where it is delivered, unlike with traditional mailing list subscriptions, for example).

3.2.2 AJAX

AJAX, standing for Asynchronous JavaScript and XML, is a method for creating interactive web applications whereby data is retrieved from a web server asynchronously without interrupting the display of a currently-viewed web page. AJAX has won over much of the Web due to the seamless interaction it provides, and website developers have voted with their feet by deploying it on their sites. As an example of AJAX in use, Google Maps retrieves surrounding map image tiles for a map being displayed on screen, so that when one moves in any direction, the new map can be displayed without reloading the browser window.
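The tile-prefetching idea can be sketched outside the browser. In the Python sketch below, `asyncio` stands in for the browser’s asynchronous requests: while the visible tile stays on screen, the eight surrounding tiles are fetched concurrently in the background and cached. The `(zoom, x, y)` tile addressing and the `fetch_tile` stub are hypothetical simplifications, not Google Maps’ actual API.

```python
import asyncio

Tile = tuple[int, int, int]  # (zoom, x, y): a hypothetical tile addressing scheme


def neighbours(centre: Tile, radius: int = 1) -> list[Tile]:
    """Tiles in a square ring of the given radius around the visible tile."""
    zoom, cx, cy = centre
    return [
        (zoom, cx + dx, cy + dy)
        for dy in range(-radius, radius + 1)
        for dx in range(-radius, radius + 1)
        if (dx, dy) != (0, 0)
    ]


async def fetch_tile(tile: Tile) -> bytes:
    """Stand-in for an HTTP request to a tile server; only simulates latency."""
    await asyncio.sleep(0.01)
    return f"image for {tile}".encode()


async def prefetch(centre: Tile, cache: dict[Tile, bytes]) -> int:
    """Fetch all uncached neighbouring tiles concurrently; return how many were fetched."""
    missing = [t for t in neighbours(centre) if t not in cache]
    images = await asyncio.gather(*(fetch_tile(t) for t in missing))
    cache.update(zip(missing, images))
    return len(missing)


async def main() -> None:
    cache: dict[Tile, bytes] = {}
    visible: Tile = (10, 500, 300)
    task = asyncio.create_task(prefetch(visible, cache))  # runs in the background
    print("displaying tile", visible)  # the 'page' is not blocked by the requests
    print("prefetched", await task, "tiles")


if __name__ == "__main__":
    asyncio.run(main())
```

When the user pans, the neighbouring tiles are already in the cache, which is what lets the new view appear without a page reload.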
One of the challenges with AJAX is that the source code is often available to anyone with a web browser. It is therefore crucial to prevent any JavaScript code that is external to an AJAX application from ever being executed within it. Since browsers were not initially designed for AJAX-type methods, it has also taken some years for browsers to become solid, productive AJAX containers. With the emergence of JIT (just-in-time) compilation technology, browsers running AJAX code will soon be able to operate at least two to three times faster.

11 http://tools.ietf.org/rfc/rfc4287.txt (URL last accessed 2009-06-09)
12 http://tinyurl.com/yscdv9 (URL last accessed 2009-06-09)


There are currently 50 or 60 AJAX development toolkits, but many believe that the Web industry should rally around a smaller number, especially open-source technologies which offer long-term portability across all the leading platforms. According to Scott Dietzen, president and chief technical officer of Zimbra, Zimbra’s web-based e-mail application is one of the largest AJAX-based web applications (with thousands of lines of JavaScript code)13, and there are more than 11,000 participants in the Zimbra open-source community.
There are some common techniques for speeding up AJAX applications. Firstly, code should be combined wherever possible. Then, pages are compressed to reduce the bandwidth required for smaller pipes. The next method is caching, which avoids browsers having to re-fetch the JavaScript and re-interpret it (e.g. by including dates for when the JavaScript files were last updated). The last and most useful technique is ‘lazy loading’: when a very large JavaScript application is on a single page, it can be broken up into several modules that are loaded on demand, reducing the time between when one first sees an application and when one can start using it.
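The four techniques can be illustrated with a small server-side sketch in Python. The module names and script contents are hypothetical, and for caching the sketch swaps in a content hash as the version marker instead of the last-updated date mentioned above; both give a URL that changes only when the script changes, so browsers can keep the cached copy otherwise.

```python
import gzip
import hashlib

# Hypothetical JavaScript modules of a single-page application.
MODULES = {
    "core": "function init(){ /* ... */ }\n",
    "editor": "function edit(){ /* ... */ }\n",
    "calendar": "function showCalendar(){ /* ... */ }\n",
}


def bundle(names: list[str]) -> str:
    """Technique 1: combine several scripts into one file (one request, not many)."""
    return "".join(MODULES[n] for n in names)


def compress(js: str) -> bytes:
    """Technique 2: gzip the script to reduce the bytes sent over the wire."""
    return gzip.compress(js.encode("utf-8"))


def versioned_url(path: str, js: str) -> str:
    """Technique 3: cache-friendly URL that changes only when the content changes."""
    digest = hashlib.sha1(js.encode("utf-8")).hexdigest()[:8]
    return f"{path}?v={digest}"


class LazyLoader:
    """Technique 4: ship only the core up front; load other modules on first use."""

    def __init__(self) -> None:
        self.loaded: dict[str, str] = {}

    def require(self, name: str) -> str:
        if name not in self.loaded:
            # In a browser this would be a separate request for the module's script.
            self.loaded[name] = MODULES[name]
        return self.loaded[name]


if __name__ == "__main__":
    core = bundle(["core"])
    print(versioned_url("/static/app.js", core), len(compress(core)), "bytes gzipped")
    loader = LazyLoader()
    loader.require("editor")  # fetched only when the user opens the editor
    print(sorted(loader.loaded))
```

For tiny scripts like these, gzip may not actually shrink anything; the savings appear with realistic file sizes.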
However, while AJAX generally aims to provide user-friendly and intuitive interfaces, it can also lead to some usability issues. For example, pages rendered via AJAX cannot generally be bookmarked, as they do not have a proper URL of their own but often reuse another one, such as the homepage URL.

3.2.3 Mashups

‘Mashups’ (services that combine content from more than one source into an integrated experience, often with new browsing and visualisation capabilities such as geolocation) are also becoming more common in social websites, and the recent Pipes service14 from Yahoo! illustrates just some of the possibilities offered by combining RSS feeds with data and functionality from other sources.
A mashup is a web application that combines data from multiple sources into a single integrated tool. The term mashup can apply to composite applications, gadgets, management dashboards, ad-hoc reporting mechanisms, spreadsheets, data migration services, social software applications and content aggregation systems. In the mashup space, companies operate either as mashup builders or as mashup infrastructure players. According to ProgrammableWeb15, there are now around 400 to 500 mashup APIs available, whereas there are 140 million websites according to Netcraft, so there is a mismatch between the number of services available and the number of sites.

13 http://tinyurl.com/scottdietzen (URL last accessed 2009-06-09)
14 http://pipes.yahoo.com/ (URL last accessed 2009-06-09)
15 http://www.programmableweb.com/ (URL last accessed 2009-06-09)


The main value of mashups is in combining data. For example, HousingMaps, a mashup of Google Maps and data from craigslist (Figure 3.4), was one of the first really useful mashups. One of the challenges with mashups is that they are normally applied to all items in a data set, but if you are looking for a house, you may want a mashup that allows you to filter by things like school district ratings, fault lines, places of worship, or even by proximity to members of your Facebook or MySpace social network.
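The kind of filter described above (‘only show houses near my friends’) can be sketched as a join between two data sources. The Python code below is a minimal illustration under stated assumptions: the listings and friend locations are hypothetical and already geocoded to latitude/longitude, and distance is approximated with the haversine great-circle formula on a spherical Earth.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the sphere is an approximation


@dataclass
class Place:
    name: str
    lat: float  # degrees
    lon: float  # degrees


def haversine_km(a: Place, b: Place) -> float:
    """Great-circle distance between two points on a spherical Earth."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def near_friends(listings: list[Place], friends: list[Place], max_km: float) -> list[str]:
    """Keep only the listings that lie within max_km of at least one friend."""
    return [
        home.name
        for home in listings
        if any(haversine_km(home, friend) <= max_km for friend in friends)
    ]


if __name__ == "__main__":
    # Hypothetical data: listings from a classifieds feed, friends from a social network.
    listings = [
        Place("Flat near Eyre Square", 53.274, -9.049),
        Place("House in Dublin", 53.349, -6.260),
    ]
    friends = [Place("Alice", 53.280, -9.060)]
    print(near_friends(listings, friends, max_km=5.0))
```

Only the Galway flat is kept, since the Dublin house is well over 100 km from the only friend in the list.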

Fig. 3.4. The HousingMaps website integrates online accommodation data with a geographical
mapping service

Mashups are also being used in business automation to automate internal processes, e.g. to counteract the time wasted by ‘swivel-chair integration’, where someone switches from a browser on one computer to another window and back again to do something manually. Content migration via mashups has been found to be more useful than static migration scripts, since mashups can be customised and controlled through a web interface.
Rod Smith says16 that mashups allow content to be generated from a combination of rich interactive applications, do-it-yourself applications plus the current ‘scripting renaissance’ (e.g. as described in the previous section on AJAX). According to Joe Keller, marketing officer with Kapow17, the three components of a mashup are the presentation layer, the logic layer, and the data layer, i.e. access to fundamental or value-added data.
Fundamental data includes structured data, standard feeds and other data that can be subscribed to; in short, data that is open to everyone. Value-added data is more niche: unstructured data, individualised data, vertical data, etc. The appetite for data collection is growing, especially around the area of automation to help organisations with this task. The amount of user-generated content available on the Social Web is a goldmine of potential mashup data, enabling one to create more meaningful time series that can be mashed up quickly into applications. However, Keller claims that the primary obstacle to benefiting from value-added data is the lack of standard feeds or APIs for this data. We shall discuss in Chapter 12 how the Semantic Web can help with this problem and enhance mashup development.

16 http://2006.blogtalk.net/Main/RodSmith (URL last accessed 2009-06-09)
17 http://www.slideshare.net/schee/s18 (URL last accessed 2009-06-09)

3.2.4 Advertising

With the advent of Web 2.0, web-based advertising is often classified into three categories: banners and rich media, list-type advertisements, and mobile advertising (i.e. a combination of banner and ad lists grouped together on a mobile platform), according to Rie Yamanaka, director with Yahoo! Japan’s commercial search subsidiary Overture KK18. Ad lists are usually quite accurate in terms of targeting, since they are shown and ranked based on their relevance to what a user is looking at or for. The focus has also shifted from TV and radio advertising towards Internet-based advertising, which is growing exponentially, primarily driven by ad lists and mobile ads.
In terms of metrics, Internet-based ads have traditionally been classified in terms of what one wants to achieve. For banner ads (which many think of as being very ‘Web 1.0’-like), the number of impressions is key (e.g. if one is advertising a film, the volume of graphic ads displayed is most important) and charges are based on what is termed the CPM (cost per mille, i.e. per thousand impressions). However, ad lists (as shown in search results, where the aim is to get a full web page on the screen) are focussed more on rankings and the CPC (cost per click), and are often associated with Web 2.0, where the fields of SEO (search engine optimisation) and SEM (search engine marketing) come into play. Another term that is now becoming more important is the CPA (cost per acquisition), i.e. how much it costs to acquire a customer.
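The three pricing metrics are simple ratios of campaign spend. The short Python sketch below computes them for a hypothetical campaign; all figures are invented for illustration.

```python
def cpm(cost: float, impressions: int) -> float:
    """Cost per mille: what 1,000 ad impressions cost."""
    return cost * 1000 / impressions


def cpc(cost: float, clicks: int) -> float:
    """Cost per click."""
    return cost / clicks


def cpa(cost: float, acquisitions: int) -> float:
    """Cost per acquisition: what it costs to win one customer."""
    return cost / acquisitions


if __name__ == "__main__":
    # Hypothetical campaign figures, for illustration only.
    spend, impressions, clicks, customers = 500.0, 200_000, 1_000, 25
    print(f"CPM = {cpm(spend, impressions):.2f}")  # CPM = 2.50
    print(f"CPC = {cpc(spend, clicks):.2f}")       # CPC = 0.50
    print(f"CPA = {cpa(spend, customers):.2f}")    # CPA = 20.00
```

For the same spend, the CPA also equals the CPC divided by the conversion rate (here 0.50 / (25 / 1000) = 20), which is why tracing clicks through to purchases matters for advertisers.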
Four trends (with associated challenges) are quite important in the field of web-based advertising: the first is increased traceability (i.e. how one can track and keep a log of who did what); the second is behavioural or attribute-based targeting (over one-third of websites are now capable of behavioural targeting according to Advertising.com19); the third is APIs for advertising (interfacing with traditional business workflows); and the fourth is the integration between offline and online media (where searching for information online is becoming prevalent).

18 http://tinyurl.com/rieyamanaka (URL last accessed 2009-06-09)
19 http://www.docstoc.com/docs/1748110/publisher-survey-07 (last accessed 2009-06-09)


1. With traceability, one can get a list of important keywords in searches that result in subsequent clicks, with the ultimate aim of increasing revenues. Search engine marketing (SEM) can also be used to help eliminate the loss of opportunities that may occur through missed clicks. The greatest challenge in the world of advertising is figuring out how much in total or how much extra a company makes as a result of advertising (based on what form of campaign is used). If one can figure out a way to link sales to ads, e.g. through internet conversion, where one can trace when a person moves onwards from an ad and makes a purchase, then one can get a measure of the CPA. On the Web, one can get a traceable link from an ad impression to an eventual deal or transaction (through clicking on something, browsing, getting a lead, and finding a prospect). One can also compare targeted results and what a customer did depending on whether they came from an offline reference (e.g. through custom URLs for offline ads) or directly online. For companies that are not doing business on the Web, it is harder to link a sale to an ad (e.g. if someone wants to buy a Lexus and reads reference material on the Web, they may then go off and buy a BMW without any traceable link).
2. Behavioural targeted advertising, based on a user’s search history, can give advertisers a lot of useful information. One can use, for example, information on gender (i.e. static details) or location (i.e. dynamic details, perhaps from an IP address) for attribute-based targeting. This can also be used to provide personalised communication methods with users, so that very flexible products can be deployed as a result. Spending on behavioural targeted advertising is continuing to grow at a significant rate due to a combination of greater advertiser acceptance and greater publisher support. By 2011, ‘very large publishers will be selling 30% to 50% of their ad inventory using this [behavioural targeting] technique’, according to Bill Gossman, CEO of Revenue Science20.
3. APIs for advertising can be combined with core business flows, especially when a company provides many products, e.g. Amazon.com or travel services. For a large online retailer, there can be logic that matches a keyword with the current inventory, so that the system hides certain keywords if the associated items are not in stock. This is also important in the hospitality sector, where for example the price of a product should change when it goes past a best-before time or date (e.g. hotel rooms normally drop in price after 9 PM). With an API, one can provide highly optimised ads that could not be created on the fly by people. Advertisers can therefore take a scientific approach to dynamically improving their offerings in terms of cost and sales.
4. Matching online information to offline ads, while not directly related to Web 2.0, is important too. Web 2.0 is about personalisation, and targeting internet-based ads towards segmented user groups is of interest, so there is a need to find the best format and media to achieve this. If one looks at TV campaigns, one can analyse information about how advertising the URL for a particular brand can lead to people visiting the associated website. Some people may only visit a site after seeing an offline advertisement, so a distinct message can be sent to these types of users. If a TV ad shows a web address, it can result in nearly 2.5 times more accesses than could be directly obtained via the Internet (depending on the type of products being advertised), so one can attract a lot more people to a website in this way. There is a lot of research being carried out into how to effectively guide people from offline ads to the Web, e.g. by combining campaigns in magazines with TV slots. It depends on what service or product a customer should get from a company, as this will determine the type of information to be sent over the Web and whether giving a good user experience is important (since you may not want to betray the expectations of users and what they are looking for). Those in charge of brands for websites need to understand how people are getting to a particular web page when there are many different entry points to a site. It is also important to understand why customers who watch TV are being invited onto the Web: whether it is for government information, selling products, etc. The purpose of a 30-second advert may actually be to guide someone to a website where they will read material online for more than five minutes. In the reverse direction (i.e. using online information to guide offline choices), there are some interesting statistics. According to comScore21, pre-shoppers on the Web will spend 41% more in a real store if they have seen internet-based ads for a product (and for every dollar that pre-shoppers spend online, they may spend an incremental $6 in-store22).

20 http://www.emarketer.com/Article.aspx?R=1004989 (URL last accessed 2009-06-09)

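The business logic in point 3 (hide keywords for out-of-stock items; lower a hotel room’s advertised price after a cut-off time) can be sketched as follows. This is a hypothetical illustration, not any advertising platform’s API: the `Product` type, the catalogue and the 20% late-evening discount are invented, and only the 9 PM cut-off comes from the example above.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class Product:
    keyword: str    # search keyword the retailer bids on
    in_stock: int   # units currently available
    price: float


def active_keywords(catalogue: list[Product]) -> list[str]:
    """Keywords to keep advertising: those whose associated items are in stock."""
    return [p.keyword for p in catalogue if p.in_stock > 0]


LATE_DEAL_START = time(21, 0)  # 9 PM, as in the hotel example in the text
LATE_DEAL_DISCOUNT = 0.20      # hypothetical 20% reduction


def room_price(base_price: float, now: time) -> float:
    """Advertised price for tonight's room, reduced once the cut-off time passes."""
    if now >= LATE_DEAL_START:
        return round(base_price * (1 - LATE_DEAL_DISCOUNT), 2)
    return base_price


if __name__ == "__main__":
    catalogue = [
        Product("digital camera", in_stock=3, price=199.0),
        Product("mp3 player", in_stock=0, price=49.0),
    ]
    print(active_keywords(catalogue))          # the out-of-stock keyword is hidden
    print(room_price(120.0, time(20, 30)))     # before 9 PM: full price
    print(room_price(120.0, time(21, 30)))     # after 9 PM: discounted
```

Run periodically against live inventory and the clock, rules like these keep ads consistent with what can actually be sold.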
Since much of the information in social websites contains inherent semantic structures and links, advertising campaigns can be created that focus on certain topics or profile information. The semantic graph categorises people, places, organisations, products, companies, events, and other objects, and defines the relationships among them. Users can define new profile categories and add metadata to these categories that can help improve the relevance of advertising engines. That metadata can then be used to personalise advertising content and provide targeted solutions to advertisers.
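A toy version of such graph-based targeting is sketched below in Python. Plain `(subject, predicate, object)` tuples stand in for an RDF graph, and the predicate names `interest`, `livesIn` and `broader` are hypothetical (loosely modelled on FOAF’s interest and SKOS’s broader properties), as are the users, topics and ads. The sketch follows `broader` links only one level up.

```python
Triple = tuple[str, str, str]

# Hypothetical profile and topic data: (subject, predicate, object).
TRIPLES: set[Triple] = {
    ("alice", "interest", "photography"),
    ("alice", "livesIn", "Galway"),
    ("bob", "interest", "cycling"),
    ("photography", "broader", "hobbies"),
    ("cycling", "broader", "sport"),
}

# Hypothetical ads, each tagged with the topics it targets.
ADS: dict[str, set[str]] = {
    "camera-sale": {"photography"},
    "bike-shop": {"cycling", "sport"},
    "galway-hotel": {"Galway"},
}


def objects(subject: str, predicate: str, triples: set[Triple]) -> set[str]:
    """All objects linked to subject via predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def profile_topics(user: str, triples: set[Triple]) -> set[str]:
    """A user's interests and location, plus one level of broader categories."""
    topics = objects(user, "interest", triples) | objects(user, "livesIn", triples)
    for topic in list(topics):
        topics |= objects(topic, "broader", triples)
    return topics


def matching_ads(user: str, triples: set[Triple], ads: dict[str, set[str]]) -> list[str]:
    """Ads sharing at least one topic with the user's profile, in name order."""
    topics = profile_topics(user, triples)
    return sorted(ad for ad, tags in ads.items() if tags & topics)


if __name__ == "__main__":
    for user in ("alice", "bob"):
        print(user, matching_ads(user, TRIPLES, ADS))
```

Because topics are linked rather than stored as flat keywords, adding a single `broader` statement lets an ad aimed at a general category reach users who only listed a specific interest.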

3.2.5 The Web on any device

There has been a gradual move from the Web running as an application on various
operating systems and hardware to the Web itself acting as a kind of operating
system (e.g. Google Chrome OS23), where a variety of applications can now run
within web browsers across a range of hardware platforms. Due to the range of

21 http://tinyurl.com/n2wbvp (URL last accessed 2009-06-09)
22 http://us.i1.yimg.com/us.yimg.com/i/adv/research/robo5_final.pdf (accessed 2009-06-09)
23 http://tinyurl.com/mkt6lv (accessed 2009-07-21)
