Seo2india the social semantic web


  • 1. The Social Semantic Web
  • 2. John G. Breslin · Alexandre Passant · Stefan Decker: The Social Semantic Web
  • 3. John G. Breslin, Electrical and Electronic Engineering, School of Engineering and Informatics, National University of Ireland, Galway, Nuns Island, Galway, Ireland
    Alexandre Passant, Digital Enterprise Research Institute (DERI), National University of Ireland, Galway, IDA Business Park, Lower Dangan, Ireland, alexandre.passant@deri.org
    Stefan Decker, Digital Enterprise Research Institute (DERI), National University of Ireland, Galway, IDA Business Park, Lower Dangan, Galway, Ireland, stefan.decker@deri.org
    ISBN 978-3-642-01171-9
    e-ISBN 978-3-642-01172-6
    DOI 10.1007/978-3-642-01172-6
    Springer Heidelberg Dordrecht London New York
    Library of Congress Control Number: 2009936149
    ACM Computing Classification (1998): H.3.5, H.4.3, I.2, K.4
    © Springer-Verlag Berlin Heidelberg 2009
    This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.
    The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
    Cover design: KuenkelLopka GmbH
    Printed on acid-free paper
    Springer is part of Springer Science+Business Media
  • 4. Contents
    1 Introduction to the book
      1.1 Overview
      1.2 Aims of the book, and who will benefit from it?
      1.3 Structure of the book
        1.3.1 Motivation for applying Semantic Web technologies to the Social Web
        1.3.2 Introduction to the Social Web (Web 2.0, social media, social software)
        1.3.3 Adding semantics to the Web
        1.3.4 Discussions
        1.3.5 Knowledge and information sharing
        1.3.6 Multimedia sharing
        1.3.7 Social tagging
        1.3.8 Social sharing of software
        1.3.9 Social networks
        1.3.10 Interlinking online communities
        1.3.11 Social Web applications in enterprise
        1.3.12 Towards the Social Semantic Web
    2 Motivation for applying Semantic Web technologies to the Social Web
      2.1 Web 2.0 and the Social Web
      2.2 Addressing limitations in the Social Web with semantics
      2.3 The Social Semantic Web: more than the sum of its parts
      2.4 A food chain of applications for the Social Semantic Web
      2.5 A practical Social Semantic Web
    3 Introduction to the Social Web (Web 2.0, social media, social software)
      3.1 From the Web to a Social Web
      3.2 Common technologies and trends
        3.2.1 RSS
        3.2.2 AJAX
        3.2.3 Mashups
        3.2.4 Advertising
        3.2.5 The Web on any device
        3.2.6 Content delivery
        3.2.7 Cloud computing
        3.2.8 Folksonomies
      3.3 Object-centred sociality
  • 5. Contents (continued)
      3.4 Licensing content
      3.5 Be careful before you post
      3.6 Disconnects in the Social Web
    4 Adding semantics to the Web
      4.1 A brief history
      4.2 The need for semantics
      4.3 Metadata
        4.3.1 Resource Description Framework (RDF)
        4.3.2 The RDF syntax
      4.4 Ontologies
        4.4.1 RDF Schema
        4.4.2 Web Ontology Language (OWL)
      4.5 SPARQL
      4.6 The ‘lowercase’ semantic web, including microformats
      4.7 Semantic search
      4.8 Linking Open Data
      4.9 Semantic mashups
      4.10 Addressing the Semantic Web ‘chicken-and-egg’ problem
    5 Discussions
      5.1 The world of boards, blogs and now microblogs
      5.2 Blogging
        5.2.1 The growth of blogs
        5.2.2 Structured blogging
        5.2.3 Semantic blogging
      5.3 Microblogging
        5.3.1 The Twitter phenomenon
        5.3.2 Semantic microblogging
      5.4 Message boards
        5.4.1 Categories and tags on message boards
        5.4.2 Characteristics of forums
        5.4.3 Social networks on message boards
      5.5 Mailing lists and IRC
    6 Knowledge and information sharing
      6.1 Wikis
        6.1.1 The Wikipedia
        6.1.2 Semantic wikis
        6.1.3 DBpedia
        6.1.4 Semantics-based reputation in the Wikipedia
  • 6. Contents (continued)
      6.2 Other knowledge services leveraging semantics
        6.2.1 Twine
        6.2.2 The Internet Archive
        6.2.3 Powerset
        6.2.4 OpenLink Data Spaces
        6.2.5 Freebase
    7 Multimedia sharing
      7.1 Multimedia management
      7.2 Photo-sharing services
        7.2.1 Modelling RDF data from Flickr
        7.2.3 Annotating images using Semantic Web technologies
      7.3 Podcasts
        7.3.1 Audio podcasts
        7.3.2 Video podcasts
        7.3.3 Adding semantics to podcasts
      7.4 Music-related content
        7.4.1 DBTune and the Music Ontology
        7.4.2 Combining social music and the Semantic Web
    8 Social tagging
      8.1 Tags, tagging and folksonomies
        8.1.1 Overview of tagging
        8.1.2 Issues with free-form tagging systems
      8.2 Tags and the Semantic Web
        8.2.1 Mining taxonomies and ontologies from folksonomies
        8.2.2 Modelling folksonomies using Semantic Web technologies
      8.3 Tagging applications using Semantic Web technologies
        8.3.1 Annotea
        8.3.2
        8.3.3 SweetWiki
        8.3.4
        8.3.5 LODr
        8.3.6 Atom Interface
        8.3.7 Faviki
      8.4 Advanced querying capabilities thanks to semantic tagging
        8.4.1 Show items with the tag ‘semanticweb’ on any platform
        8.4.2 List the ten latest items tagged by Alexandre on SlideShare
        8.4.3 List the tags used by Alex on SlideShare and by John on Flickr
        8.4.4 Retrieve any content tagged with something relevant to the Semantic Web field
  • 7. Contents (continued)
    9 Social sharing of software
      9.1 Software widgets, applications and projects
      9.2 Description of a Project (DOAP)
        9.2.1 Examples of DOAP use
      9.3 Crawling and browsing software descriptions
      9.4 Querying project descriptions and related data
        9.4.1 Locating software projects from people you trust
        9.4.2 Locating a software project related to a particular topic
    10 Social networks
      10.1 Overview of social networks
      10.2 Online social networking services
      10.3 Some psychology behind SNS usage
      10.4 Niche social networks
      10.5 Addressing some limitations of social networks
      10.6 Friend-of-a-Friend (FOAF)
        10.6.1 Consolidation of people objects
        10.6.2 Aggregating a person’s web contributions
        10.6.3 Inferring relationships from aggregated data
      10.7 hCard and XFN
      10.8 The Social Graph API and OpenSocial
        10.8.1 The Social Graph API
        10.8.2 OpenSocial
      10.9 The Facebook Platform
      10.10 Some social networking initiatives from the W3C
      10.11 A social networking stack
    11 Interlinking online communities
      11.1 The need for semantics in online communities
      11.2 Semantically-Interlinked Online Communities (SIOC)
        11.2.1 The SIOC ontology
        11.2.2 SIOC metadata format
        11.2.3 SIOC modules
      11.3 Expert finding in online communities
        11.3.1 FOAF for expert finding
        11.3.2 SIOC for expert finding
      11.4 Connections between community description formats
      11.5 Distributed conversations and channels
      11.6 SIOC applications
      11.7 A food chain for SIOC data
        11.7.1 SIOC producers
        11.7.2 SIOC collectors
        11.7.3 SIOC consumers
      11.8 RDFa for interlinking online communities
  • 8. Contents (continued)
      11.9 Argumentative discussions in online communities
      11.10 Object-centred sociality in online communities
      11.11 Data portability in online communities
        11.11.1 The DataPortability working group
        11.11.2 Data portability with FOAF and SIOC
        11.11.3 Connections between portability efforts
      11.12 Online communities for health care and life sciences
        11.12.1 Semantic Web Applications in Neuromedicine
        11.12.2 Science Collaboration Framework
        11.12.3 bio-zen and the art of scientific community maintenance
      11.13 Online presence
      11.14 Online attention
      11.15 The SIOC data competition
    12 Social Web applications in enterprise
      12.1 Overview of Enterprise 2.0
      12.2 Issues with Enterprise 2.0
        12.2.1 Social and philosophical issues with Enterprise 2.0
        12.2.2 Technical issues with Enterprise 2.0
      12.3 Improving Enterprise 2.0 ecosystems with semantic technologies
        12.3.1 Introducing SemSLATES
        12.3.2 Implementing semantics in Enterprise 2.0 ecosystems
        12.3.3 SIOC for collaborative work environments
    13 Towards the Social Semantic Web
      13.1 Possibilities for the Social Semantic Web
      13.2 A community-guided Social Semantic Web
        13.2.1 Wisdom of the crowds and the Semantic Web
        13.2.2 A grassroots approach
        13.2.3 The vocabulary onion
      13.3 Integrating with the Social Semantic Desktop
      13.4 Privacy and identity on the Social Semantic Web
        13.4.1 Keeping privacy in mind
        13.4.2 Identity fragmentation
      13.5 The vision of a Social Semantic Web
    Acknowledgments
    Dedication from John
    Biographies
    References
  • 9. 1 Introduction to the book

    1.1 Overview

    The Social Web - encompassing social networking services such as MySpace, Facebook and orkut, as well as content-sharing sites (that also offer social networking functionality) like Flickr - has captured the attention of millions of users as well as billions of dollars in investment and acquisition. As more social websites form around the connections between people and their objects of interest (to avoid these sites becoming boring), and as these ‘object-centred networks’ (where people connect via these objects of interest) grow bigger and more diverse, more intuitive methods are needed for representing and navigating the content items in these sites: both within and across social websites. Also, to better enable user access to multiple sites and ultimately to content-creation facilities on the Web, interoperability among social websites is required in terms of both the content objects and the person-to-person networks expressed on each site. This requires representation mechanisms to interconnect people and objects on the Web in an interoperable and extensible way (Breslin and Decker 2007).

    Semantic Web representation mechanisms are ideally suited to describing people and the objects that link them together in such object-centred networks, by recording and representing the heterogeneous ties that bind each to the other. By using agreed-upon Semantic Web formats to describe people, content objects, and the connections that bind them together, social networks can also interoperate by appealing to common semantics. Developers are already using Semantic Web technologies to augment the ways in which they create, reuse, and link content on social networking and social websites. These efforts include the Friend-of-a-Friend (FOAF) project for describing people and relationships, the Nepomuk social semantic desktop, which is a framework for extending the desktop to a collaborative environment for information management and sharing, and the Semantically-Interlinked Online Communities (SIOC) initiative for representing online discussions (Breslin et al. 2005). Some social networking services (SNSs), such as FriendFeed, are also starting to provide query interfaces to their data, which others can reuse and link to via the Semantic Web.

    The Semantic Web is a useful platform for linking and for performing operations on diverse person- and object-related data (as shown in Figure 1.1) gathered from heterogeneous social websites (in what is termed ‘Web 2.0’).

    J.G. Breslin et al., The Social Semantic Web, DOI 10.1007/978-3-642-01172-6_1, © Springer-Verlag Berlin Heidelberg 2009
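As a rough illustration of how a shared vocabulary lets data from separate sites be linked and queried together, the sketch below stores statements as plain (subject, predicate, object) tuples. It is a minimal sketch rather than how any particular service works: the property names are modelled on FOAF, SIOC and Dublin Core terms, while the sites, accounts, items and dates are all invented for the example.

```python
from datetime import date

# Each statement is a (subject, predicate, object) triple.
# All identifiers and data below are hypothetical.
site_a = [
    ("ex:alice", "foaf:name", "Alice"),
    ("siteA:user/alice", "sioc:account_of", "ex:alice"),
    ("siteA:post/1", "sioc:has_creator", "siteA:user/alice"),
    ("siteA:post/1", "dcterms:created", date(2009, 5, 20)),
]
site_b = [
    ("siteB:user/al1ce", "sioc:account_of", "ex:alice"),
    ("siteB:photo/7", "sioc:has_creator", "siteB:user/al1ce"),
    ("siteB:photo/7", "dcterms:created", date(2009, 1, 3)),
    ("siteB:photo/9", "sioc:has_creator", "siteB:user/bob"),
    ("siteB:photo/9", "dcterms:created", date(2009, 6, 1)),
]


def match(graph, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def items_by_person(graph, person, since):
    """Items created on or after `since` by any account of `person`."""
    accounts = {s for s, _, _ in match(graph, p="sioc:account_of", o=person)}
    found = []
    for item, _, creator in match(graph, p="sioc:has_creator"):
        if creator not in accounts:
            continue
        for _, _, created in match(graph, s=item, p="dcterms:created"):
            if created >= since:
                found.append(item)
    return sorted(found)


# Because both sites use the same property names, merging the two
# exports is simply a matter of concatenating their statements.
merged = site_a + site_b
recent = items_by_person(merged, "ex:alice", date(2009, 3, 1))
print(recent)
```

The point of the design is that neither site needs to know about the other: the two accounts are connected only because both point at the same person identifier, which is the role that agreed-upon semantics play in the passage above.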
  • 10. Fig. 1.1. Interconnecting and reusing distributed Web 2.0 data with semantic technologies

    In the other direction, object-centred networks and user-centric services for generating collaborative content can serve as rich data sources for Semantic Web applications (Figure 1.2).

    Fig. 1.2. Powering semantic applications with rich community-created content and Web 2.0 paradigms

    This linked data can provide an enhanced view of individual or community activity in localised or distributed object-centred social networks. In fact, since all this data can be semantically interlinked using well-defined semantics (e.g. using the FOAF and SIOC ontologies), in theory it makes no difference whether the content is distributed or localised.
  • 11. All of this data can be considered as a unique interlinked machine-understandable graph layer (with nodes as users or related data and arcs as relationships) over the existing Web of documents and hyperlinks, i.e. a Giant Global Graph, a term recently coined by Tim Berners-Lee. Moreover, such interlinked data allows advanced querying capabilities, for example, ‘show me all the content that Alice has acted on in the past three months in any SNS’.

    In this book, we will begin with our motivations followed by overviews of both the Social Web and the Semantic Web. Then we will describe some popular social media and social networking applications, list some of their strengths and limitations, and describe some applications of Semantic Web technologies to address current issues with social websites by enhancing them with semantics.

    Across these heterogeneous social websites, we will demonstrate a twofold approach towards integrating the Social Web and the Semantic Web: in particular, (1) by demonstrating how the Semantic Web can serve as a useful platform for linking and for performing operations on diverse person- and object-related data gathered from these websites, and (2) by showing that in the other direction, social websites can themselves serve as rich data sources for Semantic Web applications.

    We shall conclude with some observations on how the application of Semantic Web technologies to the Social Web is leading towards the ‘Social Semantic Web’, forming a network of interlinked and semantically-rich content and knowledge.

    1.2 Aims of the book, and who will benefit from it?

    Initially, we aim to educate readers on evolving areas from the world of collaboration and communication systems, social software and the Social Web.
    We shall also show connections with parallel developments in the Semantic Web effort. Then, we will illustrate how social software applications can be enhanced and interconnected with semantic technologies, including semantic and structured blogging, interconnecting community sites, semantic wikis, and distributed social networks. The goal of this book is that readers will be able to apply Semantic Web technologies to these and to other application areas in what is termed the Social Semantic Web.

    This book is intended for computer science professionals, researchers, academics and graduate students interested in understanding the technologies and research issues involved in applying Semantic Web technologies to social software. Applications such as blogs, social networks and wikis require more automated ways for information distribution. Practitioners and developers interested in such application areas will also learn about methods for increasing the levels of automation in these forms of web communication.
  • 12. For those who have background knowledge in the area of the Semantic Web, we envisage that this book will help you to develop application knowledge in relation to social software and other widely-used related Social Web technologies. For those who already have application knowledge in web engineering or in the development of systems such as wikis, social networks and blogs, we hope this book will inspire you to develop and create ideas on how to increase the usability of social software and other web systems using Semantic Web technologies.

    1.3 Structure of the book

    We shall now give an introduction to the chapters in this book and explain the logical chapter layout and flow (Figure 1.3).

    Following an overview of the motivation for combining the Social Web and the Semantic Web, we will proceed with an introduction to various technologies and trends in both the Social Web and the Semantic Web domains.

    Fig. 1.3. Chapter flow for the book
  • 13. This will be followed by a series of chapters in which various Social Web application areas will be introduced, and semantic enhancements to these areas will be described. The areas we focus on are: online discussion systems such as forums, blogs and mailing lists; knowledge sharing services such as wikis and other sites for (mainly textual) information storage and recovery; multimedia services for sharing images, audio and video files; bookmarking sites and similar services organised around tagging functionality; sites for publishing and sharing community software projects; online social networking services; interlinked online communities; and enterprise applications. These chapters will have varying ratios of semantic implementations to non-semantic ones, since state-of-the-art semantic techniques may have achieved more traction in some application areas than in others.

    Finally, in the last chapter we will describe approaches to integrate these social semantic applications in what we have termed ‘Social Semantic Information Spaces’.

    1.3.1 Motivation for applying Semantic Web technologies to the Social Web

    This part will focus on the motivation for applying Semantic Web technologies to the Social Web, as summarised in the introductory description just given.

    1.3.2 Introduction to the Social Web (Web 2.0, social media, social software)

    We shall begin with an overview of social websites, looking at common Social Web technologies and methods for collaboration, content sharing, data exchange and representation (enhancing interaction and exchange with AJAX and mashups, how content is being categorised via tagging and folksonomies, etc.). We shall also discuss existing structured content that is available from social websites, mainly via content syndication whereby people can keep up to date with published material using RSS, Atom and other subscription methods. Then we will introduce the notion of object-centred sociality (referencing the observations of Jyri Engeström and Karen Knorr-Cetina), where social websites are organised around the objects of interest that connect people together.
  • 14. 6 The Social Semantic Web1.3.3 Adding semantics to the WebIn this chapter, we will examine state of the art in the Semantic Web such asmetadata and ontology standards and mashups, as well as some efforts aimed atproviding semantic search and leveraging linked data. We shall talk about why ob-ject-centred sociality provides a meaning for representing Social Web content us-ing semantics. The chapter will focus not only on the ‘uppercase’ Semantic Web(where formal specifications such as OWL and RDF are used to represent ontolo-gies and associated metadata), but will also look at the ‘lowercase’ semantic web(where developer-led efforts in the microformats community are creating simplesemantic structures for use by ‘people first, machines second’).1.3.4 DiscussionsWe shall describe the area of blogging, one of the most popular Social Web activi-ties. Blogs are online journals or sets of chronological news entries that are main-tained by individuals, communities or commercial entities, and can be used topublish personal opinions, diary-like articles or news stories relating to a particularinterest or product. We shall begin by describing current approaches to blogging,and detail how semantic technologies improve both the processes of creating andediting blog posts, and of browsing and querying the data created by blogs (viastructured blogging and semantic blogging). We shall also discuss forums, mailinglists, and other web-based discussion systems such as microblogging, a recenttrend regarding lightweight and agile communication on the Web.1.3.5 Knowledge and information sharingWikis are collaboratively-edited websites that can be updated or added to by any-one with an interest in the topic covered by the wiki site, and have been used tocreate online encyclopaedias, photo galleries and literature collections. 
We shall describe the Social Web application area of wikis, and describe how adding semantics to wikis can offer distinct benefits: augmenting the language text in wiki articles with structured data and typed links enables advanced querying and browsing. We shall examine popular semantic wikis in usage today (e.g. Semantic MediaWiki), and we will look at semantic services that leverage structured information from wikis (such as the DBpedia). We shall briefly detail how a reputation system with embedded semantics could be deployed in a large-scale community site like the Wikipedia. We shall also look at the latest wave of knowledge networking and information sharing services (including Twine, Freebase, and OpenLink Data Spaces).
  • 15. 1.3.6 Multimedia sharing
We shall begin by looking at Social Web applications for storing and sharing photographs and other images (Flickr, Zooomr, etc.), and describe an application called FlickRDF that exports semantic data from the Flickr service. We shall then describe both audio and video podcasting, and give some ideas for the application of semantics to this area (e.g. through metadata descriptions and applications like ZemPod). We shall finish the chapter with a description of how semantic technologies can be applied to social music services and websites like, through projects such as DBTune and the Music Ontology.
1.3.7 Social tagging
This chapter will discuss social tagging and bookmarking services on the Web. We shall look at tagging and how semantics can assist the tagging process as well as enhancing related aspects such as tag clouds. We shall look at annotated social bookmarks, where sites like are allowing people to publicly publish textual descriptions of their favourite links along with associated annotations of use to others, and we will describe different issues related to tagging behaviours. We shall describe how semantics can be added to tagging systems, both by defining models to represent tagging activities or particular behaviours and by extracting a hierarchy of concepts or vocabularies from tags. Semantic social bookmarking and tagging applications (e.g., Revyu, LODr) will also be described to emphasise how different aspects of tagging applications can be augmented thanks to Semantic Web technologies.
1.3.8 Social sharing of software
The Social Web allows us to not only share data or multimedia content, but also applications, especially free-software applications or lightweight add-ons to web pages such as widgets.
We shall look at how interoperability among social websites is possible not just in terms of the expressed content but also in terms of the social applications in use (e.g. widgets) on each site. We shall give an overview of existing ways to share software on the Web, focussing on how a social aspect can be added to data such as software projects or widget descriptions. We shall follow
  • 16. this with a description of methods for describing software projects using semantics, and we will see how applications can be identified and discovered on the Web thanks to these semantics. We shall also discuss how trust mechanisms for consuming applications can be leveraged via the distributed social graph so that users can decide who to accept any new data or applications from.
1.3.9 Social networks
We shall begin with an overview of social networks, and look at current developments regarding the ‘social graph’. We further describe the idea of object-centred sociality as introduced in Chapter 3. We shall then discuss initiatives from major Web companies to provide interoperability between social networking applications such as Facebook Connect and Google’s OpenSocial and Social Graph APIs. We shall finish the chapter with a description of how open and distributed semantic social networks can be created through definitions such as Friend-of-a-Friend (FOAF) or XHTML Friends Network (XFN), enabling interoperability between different SNSs.
1.3.10 Interlinking online communities
We shall describe the usage of Semantic Web technologies for enhancing community portals and for connecting heterogeneous social websites - SIOC is currently being used for information structuring as well as for export and information dissemination. We shall describe current standardisation activities as well as research prototype applications and commercial implementations. We shall also show how SIOC can be combined with other ontologies (including FOAF, SKOS, and Dublin Core) in architectures for community site interoperability. We will look at current projects that enable one to query for topics or to browse distributed discussion content across various types of social websites (e.g.
the SIOC Explorer, Sindice SIOC Widget).
1.3.11 Social Web applications in enterprise
We shall begin with an overview of Enterprise 2.0, looking at how Social Web applications are being used internally and externally by companies. We shall then examine the application of Semantic Web technologies to Enterprise 2.0 ecosystems. In particular, we will look at the usage of semantics in integrated enterprise social software suites as well as how the Semantic Web can help us to integrate
  • 17. the various components that are being used in Enterprise 2.0 ecosystems. For example, we will show how collaborative work environments can be enhanced through the application of semantics (e.g. SIOC4CWE).
1.3.12 Towards the Social Semantic Web
Finally, we will discuss and present current approaches to realize the ideas of Vannevar Bush (Bush 1945) and Doug Engelbart (Engelbart 1962) on distributed collaboration infrastructures, towards both the Social Semantic Web and the Social Semantic Desktop (together, we term these as Social Semantic Information Spaces). We can combine the semantically-enhanced social software applications described in previous chapters into a Social Semantic Information Space. In the spirit of seminal visions such as Bush’s Memex and Engelbart’s open hyperdocument system (OHS), this chapter will detail how previous perspectives on group forming, network modelling and algorithms, and innovative IT-based interaction with feedback are driving new initiatives for creating semantic connections within and between people’s information spaces.
  • 18. 2 Motivation for applying Semantic Web technologies to the Social Web
Many will have become familiar with popular Social Web applications such as blogging, social networks and wikis, and will be aware that we are heading towards an interconnected information space (through the blogosphere, inter-wiki links, mashups, etc.). At the same time, these applications are experiencing boundaries in terms of information integration, dissemination, reuse, portability, searchability, automation and more demanding tasks like querying. The Semantic Web is increasingly aiming at these application areas - quite a number of Semantic Web approaches have appeared in recent years to overcome the boundaries in these application areas, e.g. semantic wikis (Semantic MediaWiki), knowledge networking (Twine), embedded microcontent detection and reuse (Operator, Headup, Semantic Radar), social graph and data portability APIs (from Google and Facebook), etc. In an effort to consolidate and combine knowledge about existing efforts, we aim to educate readers about Social Web application areas and new avenues open to commercial exploitation in the Semantic Web. We shall give an overview of how the Social Web and Semantic Web can be meshed together.
2.1 Web 2.0 and the Social Web
One of the most visible trends on the Web is the emergence of the Web 2.0 technology platform. The term Web 2.0 refers to a perceived second-generation of Web-based communities and hosted services. Although the term suggests a new version of the Web, it does not refer to an update of the World Wide Web technical specifications, but rather to new structures and abstractions that have emerged on top of the ordinary Web.
While it is difficult to define the exact boundaries of what structures or abstractions belong to Web 2.0, there seems to be an agreement that services and technologies like blogs, wikis, folksonomies, podcasts, RSS feeds (and other forms of many-to-many publishing), social software and social networking sites, web APIs, web standards and online web services are part of Web 2.0. Web 2.0 has not only been a technological but also a business trend; according to Tim O’Reilly, ‘Web 2.0 is the business revolution in the computer industry caused by the move to the Internet as platform, and an attempt to understand the rules for success on that new platform’.
J.G. Breslin et al., The Social Semantic Web, DOI 10.1007/978-3-642-01172-6_2, © Springer-Verlag Berlin Heidelberg 2009
  • 19. Social networking sites such as Facebook (one of the world’s most popular SNSs), Friendster (an early SNS previously popular in the US, now widely used in Asia), orkut (Google’s SNS), LinkedIn (an SNS for professional relationships) and MySpace (a music and youth-oriented service) - where explicitly-stated networks of friendship form a core part of the website - have become part of the daily lives of millions of users, and have generated huge amounts of investment since they began to appear around 2002. Since then, the popularity of these sites has grown hugely and continues to do so. Boyd and Ellison (2007) have described the history of social networking sites, and suggested that in the early days of SNSs, when only the SixDegrees service existed, there simply were not enough users: ‘While people were already flocking to the Internet, most did not have extended networks of friends who were online’. A graph from Internet World Stats shows the growth in the number of Internet users over time. Between 2000 (when SixDegrees shut down) and 2003 (when Friendster became the first successful SNS), the number of Internet users had doubled.
Web 2.0 content-sharing sites with social networking functionality such as YouTube (a video-sharing site), Flickr (for sharing images) and (a music community site) have enjoyed similar popularity. The basic features of a social networking site are profiles, friend listings and commenting, often along with other features such as private messaging, discussion forums, blogging, and media uploading and sharing. In addition to SNSs, other forms of social websites include wikis, forums and blogs. Some of these publish content in structured formats enabling them to be aggregated together.
A common property of Web 2.0 technologies is that they facilitate collaboration and sharing between users with low technical barriers - although usually on single sites (e.g. Technorati) or with a limited range of information (e.g. RSS, which we will describe later). In this book we will refer to this collaborative and sharing aspect as the ‘Social Web’, a term that can be used to describe a subset of Web interactions that are highly social, conversational and participatory. The term Social Web may also be used instead of Web 2.0, as it makes clearer which feature of the Web is being referred to.
The Social Web has applications on intranets as well as on the Internet. On the Internet, the Social Web enables participation through the simplification of user contributions via blogs and tagging, and has unleashed the power of community-based knowledge acquisition with efforts like Wikipedia demonstrating the collective ‘wisdom of the crowds’ in creating the largest encyclopaedia. One outcome of such websites, especially wikis, is that they can produce more valuable knowledge collectively than that created by separate individuals. In this sense, the Social Web can be seen as a way to create collective intelligence at a Web-scale
  • 20. level, following the ‘we are smarter than me’ principles (Libert and Spector 2008).
Similar technologies are also being used in company intranets as effective knowledge management, collaboration and communication tools between employees. Companies are also aiming to make social website users part of their IT ‘team’, e.g. by allowing users to have access to some of their data and by bringing the results into their business processes (Tapscott and Williams 2007).
2.2 Addressing limitations in the Social Web with semantics
A limitation of current social websites is that they are isolated from one another like islands in the sea (Figure 2.1). For example, different online discussions may contain complementary knowledge and topics, segmented parts of an answer that a person may be looking for, but people participating in one discussion do not have ready access to information about related discussions elsewhere. As more and more social websites, communities and services come online, the lack of interoperability between them becomes obvious. The Social Web creates a set of single data silos or ‘stovepipes’, i.e. there are many sites, communities and services that cannot interoperate with each other, where synergies are expensive to exploit, and where reuse and interlinking of data is difficult and cumbersome.
The main reason for this lack of interoperation is that for most Social Web applications, communities, and domains, there are still no common standards for knowledge and information exchange or interoperation available. RSS (Really Simple Syndication), a format for publishing recently-updated Web content such as blog entries, was the first step towards interoperability among social websites, but it has various limitations that make it difficult to be used efficiently in such an interoperability context, as we will see later.
Another extension of the Web aims to provide the tools that are necessary to define extensible and flexible standards for information exchange and interoperability. The Scientific American article (Berners-Lee et al. 2001) from Berners-Lee, Hendler and Lassila defined the Semantic Web as ‘an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation’. The last couple of years have seen large efforts going into the definition of the foundational standards supporting data interchange and interoperation, and currently a quite well-defined Semantic Web technology stack exists, enabling the creation of defining metadata and associated vocabularies.
  • 21. Fig. 2.1. Creating bridges between isolated communities of users and their data (images courtesy of Pidgin Technologies)
A number of Semantic Web vocabularies have achieved wide deployment - successful examples include RSS 1.0 for the syndication of information, FOAF, for expressing personal profile and social networking information, and SIOC, for interlinking communities and distributed conversations. These vocabularies share a joint property: they are small, but at the same time vertical - i.e. they are a part of many different domains. Each horizontal domain (e.g. e-health) would typically reuse a number of these vertical vocabularies, and when deployed the vocabularies would be able to interact with each other.
The Semantic Web effort is in an ideal position to make social websites interoperable by providing standards to support data interchange and interoperation between applications, enabling individuals and communities to participate in the creation of distributed interoperable information. The application of the Semantic Web to the Social Web is leading to the ‘Social Semantic Web’ (Figure 2.2), creating a network of interlinked and semantically-rich knowledge. This vision of the Web will consist of interlinked documents, data, and even applications created by the end users themselves as the result of various social interactions, and it is modelled using machine-readable formats so that it can be used for purposes that the
  • 22. current state of the Social Web cannot achieve without difficulty. As Tim Berners-Lee said in a 2005 podcast, Semantic Web technologies can support online communities even as ‘online communities [...] support Semantic Web data by being the sources of people voluntarily connecting things together’. For example, social website users are already creating extensive vocabularies and semantically-rich annotations through folksonomies (Mika 2005a).
Fig. 2.2. The Social Semantic Web
Because a consensus of community users is defining the meaning, these terms are serving as the objects around which those users form more tightly-connected social networks. This goes hand-in-hand with solving the chicken-and-egg problem of the Semantic Web (i.e. you cannot create useful Semantic Web applications without the data to power them, and you cannot produce semantically-rich data without the interesting applications themselves): since the Social Web contains such semantically-rich content, interesting applications powered by Semantic Web technologies can be created immediately.
2.3 The Social Semantic Web: more than the sum of its parts
The combination of the Social Web and Semantic Web can lead to something greater than the sum of its parts: a Social Semantic Web (Auer et al. 2007, Blumauer and Pellegrini 2008) where the islands of the Social Web can be interconnected with semantic technologies, and Semantic Web applications are enhanced with the wealth of knowledge inherent in user-generated content.
  • 23. In this book, we will describe various solutions that aim to make social websites interoperable, and which will take them beyond their current limitations to enable what we have termed Social Semantic Information Spaces. Social Semantic Information Spaces are a platform for both personal and professional collaborative exchange with reusable community contributions. Through the use of Semantic Web data, searchable and interpretable content is added to existing Web-based collaborative infrastructures and social spaces, and intelligent use of this content can be made within these spaces - bringing the vision of semantics on the Web to its most usable and exploitable level.
Some typical application areas for social spaces are wikis, blogs and social networks, but they can include any spaces where content is being created, annotated and shared amongst a community of users. Each of these can be enhanced with machine-readable data to not only provide more functionality internally, but also to create an overall interconnected set of Social Semantic Information Spaces. These spaces offer a number of possibilities in terms of increased automation and information dissemination that are not easily realisable with current social software applications:
      • By providing better interconnection of data, relevant information can be obtained from related social spaces (e.g.
through social connections, inferred links, and other references).
      • Social Semantic Information Spaces allow you to gather all your contributions and profiles across various sites (‘subscribe to my brain’), or to gather content from your friend / colleague connections.
      • These spaces allow the use of the Web as a clipboard to allow exchange between various collaborative applications (for example, by allowing readers to drag structured information from wiki pages into other applications, geographic data about locations on a wiki page could be used to annotate information on an event or a travel review in a blog post one is writing).
      • Such spaces can help users to avoid having to express the same information several times over if they belong to different social spaces.
      • Due to the high semantic information available about users, their interests and relationships to other entities, personalisation of content and interface input mechanisms can be performed, and innovative ways for presenting related information can be created.
      • These semantic spaces will also allow the creation of social semantic mashups, combining information from distributed data sources together that can also be enhanced with semantic information, for example, to provide the geolocations of friends in your social network who share similar interests with you.
  • 24.
      • Fine-grained questions can be answered through such semantic social spaces, such as ‘show me all content by people both geographically and socially near to me on the topic of movies’.
      • Social Semantic Information Spaces can make use of emergent semantics to extract more information from both the content and any other embedded metadata.
There have been initial approaches in collaborative application areas to incorporate semantics in these applications with the aim of adding more functionality and enhancing data exchange - semantic wikis, semantic blogs and semantic social networks. These approaches require closer linkages and cross-application demonstrators to create further semantic integration both between and across application areas (e.g. not just blog-to-blog connections, but also blog-to-wiki exchanges). A combination of such semantic functionality with existing grassroots efforts such as OpenID (a single sign-on mechanism) or OAuth (an authorisation scheme for delegating access) can bring the Social Web to another level. Not only will this lead to an increased number of enhanced applications, but an overall interconnected set of Social Semantic Information Spaces can be created.
2.4 A food chain of applications for the Social Semantic Web
A semantic data ‘food chain’, as shown in Figure 2.3, consists of various producers, collectors and consumers of semantic data from social networks and social websites. Applying semantic technologies to social websites can greatly enhance the value and functionality of these sites.
The information within these sites is forming vast and diverse networks which can benefit from Semantic Web technologies for representation and navigation. Additionally, in order to easily enable navigation and data exchange across sites, mechanisms are required to represent the data in an interoperable and extensible way. These are termed semantic data producers.
An intermediary step, which may or may not be required, is the collection of semantic data. In very large sites, this may not be an issue as the information in the site may be sufficiently linked internally to warrant direct consumption after production, but in general, many users make small contributions across a range of services which can benefit from an aggregate view through some collection service. Collection services can include aggregation and consolidation systems, semantic search engines or data lookup indexes.
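As a rough illustration of the aggregate view that a collection service might provide, the following sketch groups contributions scattered across several services under a single person identifier. It is only a toy under stated assumptions: the service names, account names, items, and the `same_person` mapping are all invented for the example and do not correspond to any real API or data set.

```python
from collections import defaultdict

# Hypothetical sample records: (service, account, item).
contributions = [
    ("photos", "alice_f", "photo of Galway Bay"),
    ("bookmarks", "alice.b", "link about RDF"),
    ("forum", "alice77", "post on semantic wikis"),
    ("forum", "bob", "post on tagging"),
]

# Hypothetical mapping from per-service accounts to one person identifier.
same_person = {
    ("photos", "alice_f"): "alice",
    ("bookmarks", "alice.b"): "alice",
    ("forum", "alice77"): "alice",
}


def aggregate(records, identity_map):
    """Group (service, item) pairs by person.

    Accounts missing from identity_map are kept apart under a
    'service:account' key instead of being guessed at.
    """
    view = defaultdict(list)
    for service, account, item in records:
        person = identity_map.get((service, account), f"{service}:{account}")
        view[person].append((service, item))
    return dict(view)


if __name__ == "__main__":
    for person, items in aggregate(contributions, same_person).items():
        print(person, items)
```

The design choice worth noting is that unknown accounts are never merged: consolidation only happens where an explicit identity link exists, which is the role that common identifiers play in the text.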
  • 25. Fig. 2.3. A food chain for semantic data on the Social Web
The final step involves consumers of semantic data. Social networking technologies enable people to articulate their social network via friend connections. A social network can be viewed as a graph where the nodes represent individuals and the edges represent relations. Methods from graph theory can be used to study these networks, and we refer to initial work by Ereteo et al. (2008) on how social network analysis can consume semantic data from the food chain.
Also, representing social data in RDF (Resource Description Framework), a language for describing web resources in a structured way, enables us to perform queries on a network to locate information relating to people and to the content that they create. RDF can be used to structure and expose information from the Social Web allowing the simple generation of semantic mashups for both proprietary and public information. HTML content can also be made compatible with RDF through RDFa (RDF annotations embedded in XHTML attributes), thereby enabling effective semantic search without requiring one to crawl a new set of pages (e.g. the Common Tag effort allows metadata and URIs for tags to be exposed using RDFa and shared with other applications). Interlinking social data from multiple sources may give an enhanced view of information in distributed communities, and we will describe applications to consume and exchange this interlinked data in future chapters.
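To make the idea of querying a network of people and content more concrete, here is a minimal sketch that stores social data as subject-predicate-object triples, the basic shape of RDF statements, and answers a question similar to ‘show me all content by people both geographically and socially near to me on the topic of movies’. It uses plain Python tuples and not an RDF library; the predicate names only loosely echo FOAF and SIOC terms, and every identifier and data value is an assumption made up for the example.

```python
# Each statement is a (subject, predicate, object) triple.
# All names and values below are hypothetical sample data.
TRIPLES = {
    ("me", "knows", "bob"),
    ("me", "knows", "carol"),
    ("me", "based_near", "Galway"),
    ("bob", "based_near", "Galway"),
    ("carol", "based_near", "Paris"),
    ("post1", "has_creator", "bob"),
    ("post1", "topic", "movies"),
    ("post2", "has_creator", "carol"),
    ("post2", "topic", "movies"),
    ("post3", "has_creator", "bob"),
    ("post3", "topic", "music"),
}


def match(pattern, triples=TRIPLES):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if all(p is None or p == v for p, v in zip(pattern, t))
    ]


def nearby_friend_content(person, topic):
    """Posts on a topic created by friends who share a location with person."""
    places = {o for _, _, o in match((person, "based_near", None))}
    friends = {o for _, _, o in match((person, "knows", None))}
    near = {
        f for f in friends
        if any(match((f, "based_near", place)) for place in places)
    }
    posts = {s for s, _, _ in match((None, "topic", topic))}
    return sorted(
        p for p in posts
        if any(match((p, "has_creator", f)) for f in near)
    )


if __name__ == "__main__":
    print(nearby_friend_content("me", "movies"))  # prints ['post1']
```

In a real deployment the same question would more likely be expressed as a query over RDF data gathered from several sites; the sketch only shows why uniform, typed statements make such cross-cutting questions straightforward to ask.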
  • 26. 2.5 A practical Social Semantic Web
Applying Semantic Web technologies to social websites allows us to express different types of relationships between people, objects and concepts. By using common, machine-readable ways for expressing data about individuals, profiles, social connections and content, these technologies provide a way to interconnect people and objects on a Social Semantic Web in an interoperable, extensible way.
On the conventional Web, navigation of data across social websites can be a major challenge. Communities are often dispersed across numerous different sites and platforms. For example, a group of people interested in a particular topic may share photos on Flickr, bookmarks on and hold conversations on a discussion forum. Additionally, a single person may hold several separate online accounts, and have a different network of friends on each. The information existing on each of these websites is generally disconnected, lacking in semantics, and is centrally controlled by a single organisation. Individuals generally lack control or ownership of their own data.
Social websites are becoming more prevalent and content is more distributed. This presents new challenges for navigating such data. Machine-readable descriptions of people and objects, and the use of common identifiers, can allow for linking diverse information from heterogeneous social networking sites. This creates a starting point for easy navigation across the information in these networks.
The use of common formats allows interoperability across sites, enabling users to reuse and link to content across different platforms. This also provides a basis for data portability, where users can have ownership and control over their own data and can move profile and content information between services as they wish. Recently there has been a push within the web community to make data portability (i.e.
the ability for users to port their own data wherever they wish) a reality.
Additionally, the Social Web and social networking sites can contribute to the Semantic Web effort. Users of these sites often provide metadata in the form of annotations and tags on photos, ratings, blogroll links, etc. In this way, social networks and semantics can complement each other. Already within online communities, common vocabularies or folksonomies for tagging are emerging through a consensus of community members.
In this book we will describe a variety of practical Social Semantic Web applications that have been enhanced with extra features due to the rich content being created in social software tools by users, including the following:
      • The Twine application from Radar Networks is an example of a system that leverages both the explicit (tags and metadata) and implicit semantics (automatic tagging of text) associated with content items. The underlying semantic data can also be exposed as RDF by appending ‘?rdf’ to any Twine URL.
  • 27. 20 The Social Semantic Web The SIOC vocabulary is powering an ecosystem of Social Semantic Web appli- cations producing and consuming community data, ranging from individual blog exporters to interoperability mechanisms for collaborative work environ- ments. The DBpedia represents structured content from the collaboratively-edited Wikipedia in semantic form, leveraging the semantics from many social media contributions by multiple users. DBpedia allows you to perform semantic que- ries on this data, and enables the linking of this socially-created data to other data sets on the Web by exposing it via RDF. combines Web 2.0-type interfaces and principles such as tagging with Semantic Web modelling principles to provide a reviews website that fol- lows the principles of the Linking Open Data initiative (a set of best practice guidelines for publishing and interlinking pieces of data on the Semantic Web). Anyone can review objects defined on other services (such as a movie from DBpedia), and the whole content of the website is available in RDF, therefore it is available for reuse by other Social Semantic Web applications. As Metcalfe’s law defines, the value of a network is proportional to the numberof nodes in the network. Metcalfe’s law is strongly related to the network effect ofthe Web itself: by providing various links between people, social websites canbenefit from that network effect, while at the same time the Semantic Web alsoprovides links between various objects on the Web thereby obeying this law(Hendler and Golbeck 2008). Therefore, by combining Web 2.0 and Semantic Web technologies, we can en-visage better interaction between people and communities, as the global number ofusers will grow, and hence the value of the network. This will be achieved by (1)taking into account social interactions in the production of Semantic Web data,and (2) using Semantic Web technologies to interlink people and communities.
  • 28. 3 Introduction to the Social Web (Web 2.0, social media, social software)
Web 2.0 is a widely-used and wide-ranging term (in terms of interpretations), made popular by Tim O’Reilly who wrote an article on the seven features or principles of Web 2.0. To many people, Web 2.0 can mean many different things. Most agree that it can be thought of as the second phase of architecture and application development for the Web, and that the related term ‘Social Web’ describes a Web where users can meet, collaborate, and share content on social spaces via tagged items, activity streams, social networking functionality, etc. There are many popular examples that work along this collaboration and sharing meme: MySpace,, Digg, Flickr,, Technorati, orkut, 43 Things, and the Wikipedia.
3.1 From the Web to a Social Web
Since it was founded, the Internet has been used to facilitate communication not only between computers but also between people. Usenet mailing lists and bulletin boards allowed people to connect with each other and enabled communities to form, often around topics of interest. The social networks formed via these technologies were not explicitly stated, but were implicitly defined by the interactions of the people involved. Later, technologies such as IRC (Internet Relay Chat), web forums, instant messaging, blogging, social networking services, and even MMOGs or MMORPGs (massively multiplayer online [role playing] games) have continued the trend of using the Internet (and the Web) to build communities.
The structural and syntactic web put in place in the early 90s is still much the same as what we use today: resources (web pages, files, etc.) connected by untyped hyperlinks. By untyped, we mean that there is no easy way for a computer to figure out what a link between two pages means. Beyond links, the nature of the objects described in those pages (e.g. people, places, etc.) cannot be understood by software agents.
In fact, the Web was envisaged to be much more (Figure 3.1). In Tim Berners-Lee’s original outline for the Web in 1989, entitled ‘Information Management: A Proposal’, resources are connected by links describing the type of relationships between them, e.g. ‘wrote’, ‘describes’, ‘refers to’, etc. This is a precursor to the Semantic Web which we will come back to in the next chapter.
J.G. Breslin et al., The Social Semantic Web, DOI 10.1007/978-3-642-01172-6_3, © Springer-Verlag Berlin Heidelberg 2009
  • 29. Fig. 3.1. Adapted from ‘Information Management: A Proposal’ by Tim Berners-Lee

Over the last decade and a half, there has been a shift from just ‘existing’ or publishing on the Web to participating in a ‘read-write’ Web. There has been a change in the role of a web user from just a consumer of content to an active participant in the creation of content. For example, Wikipedia articles are written and edited by volunteers, online retailers use information about what users view and purchase to recommend products to other users, and Slashdot moderation is performed by the readers.

Web 2.0 is a widely-used and wide-ranging term (certainly in terms of interpretations) made popular by Tim O’Reilly. O’Reilly defined Web 2.0 as ‘a set of principles and practices that ties together a veritable solar system of sites that demonstrate some or all of those principles, at a varying distance from that core’. While this definition is quite vague, he defined seven features or principles of Web 2.0, to which some have added an eighth: the long tail phenomenon (i.e. many small contributors and sites outweighing the main players). Among these features, two points seem particularly important: ‘the Web as a platform’ and ‘an architecture of participation’. Actually, in spite of the 2.0 numbering, this vision is close to the original idea of Berners-Lee for the Web, i.e. that it should be a participative medium.
  • 30. For example, the first Web browser, called WorldWideWeb, was already a read-write browser, while current ones are generally read-only.

The first idea from O’Reilly of ‘the Web as a platform’ considers the Web and its principles as a way to provide services and value-added applications in addition to generally static content. In some cases, the Web can even be seen as a transit layer for information to the desktop or mobile devices, for example, using RSS. We can also consider that ‘the Web as a platform’ refers to the migration of traditional desktop services such as e-mail and word processing to web-based applications, for example, as provided by Google with Gmail and Google Docs. In that context, the vision of ‘an architecture of participation’ emphasises how applications can help to produce value-added content and synergies by simply using them, thanks to the way they were designed. As people begin to use Web 2.0 applications for their own needs (uploading pictures, writing blog posts, tagging content), they enhance the global activity of the system and this can be a benefit for everyone. O’Reilly hence makes a comparison with open-source development principles and peer-to-peer architectures in relation to how they are providing the same kind of architecture of participation.

The evolution of the Web is, in our opinion, mostly a sociological and economic one, as referred to in the book ‘Wikinomics’ (Tapscott and Williams 2007). However, thanks to the strong interactions between services and users, it has led to interesting practices in terms of software development. O’Reilly in particular urges application developers to go further than they would in traditional development processes and to constantly deliver new features, leading to ‘the perpetual beta’, considering that ‘users must be treated as co-developers’.
Agile development methods are therefore becoming popular on the Web, as well as languages and frameworks that adhere to such software development principles (e.g. Ruby on Rails).

Fig. 3.2. The Social Web in simple terms: users, content, tags and comments
  • 31. While some describe Web 2.0 simply as a second phase of architecture and application development for the World Wide Web, others mainly think of it as a place where ‘ordinary’ users can meet, collaborate, and share content using social software applications on the Web - via tagged items, social bookmarking, AJAX functionality, etc. - hence the term ‘the Social Web’. The Social Web is a platform for social and collaborative exchange with reusable community contributions, where anyone can mass publish using web-based social software, and others can subscribe to desired information, news, data flows, or other services via syndication formats such as RSS.

There are many popular examples that work along this collaboration and sharing meme: Twitter, Digg, Flickr, Technorati, orkut, 43 Things, Wikipedia, etc. It is ‘social software’ that is being used for this communication and collaboration, software that ‘lets people rendezvous, connect or collaborate by use of a computer network. It results in the creation of shared, interactive spaces.’ With the Social Web, all of us have become participants, often without realising the part we play on the Web - clicking on a search result, uploading a video or social network page - all of this contributes to and changes this Social Web infrastructure. There may be different motivations for leveraging social websites, from personal expression to political campaigning (e.g. Barack Obama’s presidential campaign raised 87% of his funds through social websites).

Social websites provide access to community-contributed content that is posted by some user and may be tagged and can be commented upon by others (Figure 3.2). That content (termed ‘social media’) can be virtually anything: blog entries, message board posts, videos, audio, images, wiki pages, user profiles, bookmarks, events, etc.
Users post and share content items with others; they can annotate content with tags; can browse related content via tags; they may often discuss content via comments; and they may connect to each other directly or via posted content.

Social websites that are sharing content are covered in some part by what is called the Digital Millennium Copyright Act (DMCA). It provides a safe harbour if a service cannot reasonably guard against anything and everything being uploaded (and is unaware of it). The user agreements of most social websites usually request that users do not add other people’s copyrighted material. Fair use is usually permitted, such that if one shares something copyrighted they should use an extract with a link to the main content.

There are a variety of figures available for the ratio of social media contributors versus casual browsers or lurkers on social websites. CNET’s News.com site says: ‘A recent Hitwise study indicates that as few as 4 percent of Internet users actually contribute to sites like YouTube and Flickr, and more than 55 percent are
  • 32. men. [...] To be the mainstream trend (that it [Web 2.0] deserves to be), it must evolve from the currently small group of people who are creating and filtering our content to a position where the ‘everyman’ is embraced.’

The UK technology site vnunet.com mentions: ‘Bill Tancer, general manager of Hitwise, said that the company’s data showed that only a tiny fraction of users contributed content to community media sites. Just 0.16% of YouTube users upload videos, and only 0.2% of Flickr users upload photos. Wikipedia returned a more reasonable percentage, with 4.6% of visitors actually editing and adding information.’ We can deduce that the percentage of contributors may include those who upload content items (videos, images, etc.) as well as those who comment on that content.

3.2 Common technologies and trends

We shall now describe some of the common technologies used and other trends in social websites, including RSS, AJAX, mashups, content delivery, and advertising models. Future chapters will describe typical usages of these features, including blogging, wiki-based collaborations, and social networking.

3.2.1 RSS

As we will see in this book, Social Web principles allow people to publish information more often and more easily. Consequently, there is a need for readers to know where to get new and pertinent information and how to consume it. Content syndication aims to solve that issue, by providing a website with the means to automatically deliver the latest content from blogs, wikis, forums or news services in computer-readable feeds that can be reused and subscribed to by other people and systems. For example, news content from newspapers is often syndicated so that headlines can be read by people in their own feed reader programs or integrated into their own websites. Rather than mass spamming via e-mail, interested parties can subscribe to feeds to be notified about changes or updates to information (self service).
A common syndication format can have many uses, including connecting services together, ‘mashing’ together of data, etc.

Before syndication, readers had to make semi-regular visits to bookmarked sites, which made it hard to monitor information accurately. Now, feed aggregators or readers allow you to check multiple blog or news feeds on a regular basis, and you can choose to view only new or updated posts since your last access. You can pull information from sites and put it directly into your desktop (Thunderbird) or
  • 33. browser application (Google Reader, Bloglines), allowing you to quickly scan a human-readable view of multiple feeds for relevant content. Intelligent pushing of feeds (e.g. with ‘pingback’) can also be facilitated to update content immediately on aggregator sites (e.g. PlanetPlanet) or other search and navigation applications. Content syndication is thus a first step towards the Semantic Web as it provides interoperability between applications. We shall see later on that it is somewhat limited in terms of achieving the complete goal.

In order to define a standard for modelling such information feeds, various formats such as NewsML were proposed in the late 90s. The most commonly adopted syndication format is ‘RSS’, which has various meanings (Really Simple Syndication, Rich Site Summary and RDF Site Summary) and comes in different flavours (currently there are eight variations). Some of the variations are from private organisations (0.9 by Netscape), some of them are closed (2.0), and some of them are from open consortiums (1.0). However, they all share the same basic principles: the latest articles, with hyperlinks, titles and summaries, are syndicated using a computer-readable format (XML or RDF). In general, one does not have to worry about which feed format a blog or website provides, because practically any aggregator or news reader will be able to read it anyway. From the Semantic Web perspective, the RSS 1.0 variant (in RDF) allows us to combine syndicated articles with metadata from other vocabularies such as FOAF or Dublin Core.

Fig. 3.3. Content on a blog being published as RSS

The RSS feed structure (as shown in Figure 3.3) is as follows:
  Class ‘channel’:
    – Properties ‘title’, ‘link’, ‘description’
    – Contains ‘items’
  Class ‘item’:
    – Properties ‘title’, ‘link’, ‘description’, ‘date’, ‘creator’, etc.
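The channel/item structure above, and the aggregator behaviour of showing only posts that are new since the last access, can be sketched with Python's standard library. This is a minimal illustration, not a full feed reader: the sample feed, its URLs and the helper names `parse_feed` and `new_items` are made up, and the sketch assumes an RSS 2.0-style feed where the item date is carried in a `pubDate` element.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# A made-up RSS 2.0-style feed: one channel containing two items.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://example.org/</link>
    <description>A made-up feed for illustration</description>
    <item>
      <title>First post</title>
      <link>http://example.org/1</link>
      <description>Hello</description>
      <pubDate>Mon, 01 Jun 2009 10:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.org/2</link>
      <description>More news</description>
      <pubDate>Mon, 08 Jun 2009 10:00:00 +0000</pubDate>
    </item>
  </channel>
</rss>"""


def parse_feed(xml_text):
    """Return (channel properties, list of item dicts) for an RSS 2.0-style feed."""
    channel = ET.fromstring(xml_text).find("channel")
    info = {tag: channel.findtext(tag) for tag in ("title", "link", "description")}
    items = []
    for item in channel.findall("item"):
        items.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "description": item.findtext("description"),
            # pubDate uses the e-mail date format, so the e-mail parser handles it.
            "date": parsedate_to_datetime(item.findtext("pubDate")),
        })
    return info, items


def new_items(items, last_access):
    """Keep only the items published after the reader's last visit."""
    return [item for item in items if item["date"] > last_access]
```

For example, a reader whose last access was on 5 June 2009 would be shown only the second post. An RSS 1.0 (RDF) feed would need namespace-aware lookups instead of the bare tag names used here.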
  • 34. The strength of RSS is in its generality, but therein lies its weakness: when one is subscribed to multiple channels or items, there is no way to easily group by different types of content based on the available metadata. RSS is used for more than just blog headlines and news syndication, having applications in libraries (e.g. to announce new book acquisitions), shared calendars, recipe clubs, etc. Executives in many corporations are also starting to mandate what RSS feeds they want their companies to provide.

Similar to RSS is the Atom Syndication Format, an XML format and recent IETF standard that is also commonly used for syndicating web feeds. The lack of unification between RSS formats is one of the reasons that led to Atom being created. The Atom Publishing Protocol (APP or AtomPub for short) is related to this, being a simple HTTP-based protocol for creating and modifying web resources, and the specification was edited by Joe Gregorio and Bill de hÓra.

One important thing regarding content syndication is the way in which it can not only enable a user to control the consumption of information, but through Web 2.0-type services, the user can also control its production (i.e. they can control when and where it must be delivered, contrary to traditional mailing list subscriptions, for example).

3.2.2 AJAX

AJAX, standing for Asynchronous JavaScript and XML, is a method for creating interactive web applications whereby data is retrieved from a web server asynchronously without interrupting the display of a currently-viewed web page. AJAX has won over much of the Web due to the seamless interaction it provides, and website developers have voted with their feet by deploying it on their sites. As an example of AJAX in use, Google Maps retrieves surrounding map image tiles for a map being displayed on screen, so that when one moves in any direction, the new map can be displayed without reloading the browser window.
One of the challenges with AJAX is that the source code is often available to anyone with a web browser. It is therefore crucial to prevent anyone from executing JavaScript code that is external to an AJAX application. Since browsers were not initially designed for AJAX-type methods, it has also taken some years for browsers to become solid, productive AJAX containers. With the emergence of JIT (just-in-time) compilation technology, browsers running AJAX code will soon be able to operate at least two to three times faster.
  • 35. There are currently 50 or 60 AJAX development toolkits, but many believe that the Web industry should rally around a smaller number, especially open-source technologies which offer long-term portability across all the leading platforms.

According to Scott Dietzen, president and chief technical officer of Zimbra, their web-based e-mail application is one of the largest AJAX-based web applications (with thousands of lines of JavaScript code), and there are more than 11,000 participants in the Zimbra open-source community.

There are some common techniques for speeding up AJAX applications. Firstly, code should be combined wherever possible. Then, pages are compressed to shrink the required bandwidth for smaller pipes. The next method is caching, which avoids browsers having to re-fetch the JavaScript and re-interpret it (e.g. by including dates for when the JavaScript files were last updated). The last and most useful technique is ‘lazy loading’. For cases where a very large JavaScript application is on a single page, it can be broken up into several modules that can be loaded on demand, reducing the time from when one can first see an application to when they can start using it.

However, while AJAX generally aims to provide user-friendly and intuitive interfaces, it can also lead to some usability issues. For example, pages rendered via AJAX cannot generally be bookmarked as they will not have a proper URL but often use, for example, the homepage URL.

3.2.3 Mashups

‘Mashups’ (services that combine content from more than one source into an integrated experience, often with new browsing and visualisation capabilities such as geolocation) are also becoming more common in social websites, and the recent Pipes service from Yahoo! illustrates just some of the possibilities offered by combining RSS feeds with data and functionality from other sources.
A mashup is a web application that combines data from multiple sources into a single integrated tool. The term mashup can apply to composite applications, gadgets, management dashboards, ad-hoc reporting mechanisms, spreadsheets, data migration services, social software applications and content aggregation systems. In the mashup space, companies are either operating as mashup builders or mashup infrastructure players. According to ProgrammableWeb, there are now around 400 to 500 mashup APIs available, but there are 140 million websites according to Netcraft, so there is a mismatch in terms of the number of services available to sites.
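As a minimal sketch of what ‘combining data from multiple sources’ means in practice, the following joins a list of accommodation listings with coordinates from a separate geocoding source, and optionally filters the result. Everything here is hypothetical (the data, the field names and the `mash_up` helper); a real mashup would fetch both sources over web APIs rather than use in-memory literals.

```python
# Source 1 (hypothetical): classified listings, e.g. from a feed or API.
LISTINGS = [
    {"id": 1, "address": "1 Main St", "rent": 900},
    {"id": 2, "address": "7 Quay Rd", "rent": 1400},
    {"id": 3, "address": "22 Hill Ave", "rent": 1100},
]

# Source 2 (hypothetical): address -> (latitude, longitude) from a mapping service.
GEOCODES = {
    "1 Main St": (53.27, -9.05),
    "7 Quay Rd": (53.28, -9.06),
}


def mash_up(listings, geocodes, max_rent=None):
    """Join listings with coordinates so they can be drawn as map pins.

    Listings without a known location are skipped; `max_rent` is an
    optional filter applied on top of the combined data.
    """
    pins = []
    for listing in listings:
        coords = geocodes.get(listing["address"])
        if coords is None:
            continue  # cannot place this listing on the map
        if max_rent is not None and listing["rent"] > max_rent:
            continue
        pins.append({**listing, "lat": coords[0], "lon": coords[1]})
    return pins
```

The join itself is trivial; the practical difficulty lies in obtaining each source in a structured, machine-readable form in the first place.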
  • 36. The main value of mashups is in combining data. For example, HousingMaps, a mashup of Google Maps and data from craigslist (Figure 3.4), was one of the first really useful mashups. One of the challenges with mashups is that they are normally applied to all items in a data set, but if you are looking for a house, you may want a mashup that allows you to filter by things like school district ratings, fault lines, places of worship, or even by proximity to members of your Facebook or MySpace social network.

Fig. 3.4. The HousingMaps website integrates online accommodation data with a geographical mapping service

Mashups are also being used in business automation to automate internal processes, e.g. to counteract the time wasted by ‘swivel-chair integration’ where someone is moving from one browser on one computer to another window and back again to do something manually. Content migration via mashups has been found to be more useful than static migration scripts since they can be customised and controlled through a web interface.

Rod Smith says that mashups allow content to be generated from a combination of rich interactive applications, do-it-yourself applications plus the current ‘scripting renaissance’ (e.g. as described in the previous section on AJAX). According to Joe Keller, marketing officer with Kapow, the three components of a mashup are the presentation layer, logic layer, and the data layer - i.e. access to fundamental or value-added data.

Fundamental data includes structured data, standard feeds and other data that can be subscribed to, basically, data that is open to everyone. The value-added
  • 37. data is more niche: unstructured data, individualised data, vertical data, etc. The appetite for data collection is growing, especially around the area of automation to help organisations with this task. The amount of user-generated content available on the Social Web is a goldmine of potential mashup data, enabling one to create more meaningful time series that can be mashed up quickly into applications. However, Keller claims that the primary obstacle to the benefit of value-added data is the lack of standard feeds or APIs for this data. We shall discuss in Chapter 12 how the Semantic Web can help with this problem and can help to enhance mashup development.

3.2.4 Advertising

With the advent of Web 2.0, web-based advertising is often classified into three categories: banners and rich media, list-type advertisements, and mobile advertising (i.e. a combination of banner and ad lists grouped together on a mobile platform), according to Rie Yamanaka, director with Yahoo! Japan’s commercial search subsidiary Overture KK. Ad lists are usually quite accurate in terms of targeting since they are shown and ranked based on a degree of relevance to what a user is looking at or for. The focus has also shifted from TV and radio advertising towards Internet-based advertising and it is growing exponentially, primarily driven by ad lists and mobile ads.

In terms of metrics, traditionally Internet-based ads have been classified in terms of what one wants to achieve. For banner ads (which many think of as being very ‘Web 1.0’-like), the number of impressions is key (e.g. if one is advertising a film, the volume of graphic ads displayed is most important) and charges are based on what is termed the CPM (cost per mille or thousand).
However, ad lists (as shown in search results where the aim is to get a full web page on the screen) are focussed more on rankings and the CPC (cost per click), and are often associated with Web 2.0 where the fields of SEO (search engine optimisation) and SEM (search engine marketing) come into play. Another term that is now becoming more important is the CPA (cost per acquisition), i.e. how much it costs to acquire a customer.

Four trends (with associated challenges) are quite important in the field of web-based advertising: the first is increased traceability (i.e. how one can track and keep a log of who did what); the next is behavioural or attribute-based targeting (over one-third of websites are now capable of behavioural targeting according to Advertising.com); the third is APIs for advertising (interfacing with traditional business workflows); and the last is the integration between offline and online media (where the move to search for information online is becoming prevalent).
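The three metrics introduced earlier (CPM, CPC and CPA) are simple ratios of campaign spend to impressions, clicks and acquired customers respectively. A short worked sketch follows; the campaign figures and function names are invented for illustration only.

```python
def cpm(cost, impressions):
    """Cost per mille: spend per thousand ad impressions."""
    return cost * 1000 / impressions


def cpc(cost, clicks):
    """Cost per click: spend divided by the number of clicks."""
    return cost / clicks


def cpa(cost, acquisitions):
    """Cost per acquisition: spend divided by customers actually won."""
    return cost / acquisitions


# A made-up campaign: 500 currency units buy 250,000 impressions,
# which lead to 1,000 clicks and finally 25 new customers.
campaign = {"cost": 500, "impressions": 250_000, "clicks": 1_000, "acquisitions": 25}

campaign_cpm = cpm(campaign["cost"], campaign["impressions"])   # 2.0 per thousand
campaign_cpc = cpc(campaign["cost"], campaign["clicks"])        # 0.5 per click
campaign_cpa = cpa(campaign["cost"], campaign["acquisitions"])  # 20.0 per customer
```

The same spend looks very different depending on the metric chosen, which is why computing a CPA requires a traceable link from an ad to a sale.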
  • 38.
  1. With traceability, one can get a list of important keywords in searches that result in subsequent clicks, with the ultimate aim of increasing revenues. Search engine marketing (SEM) can also be used to help eliminate the loss of opportunities that may occur through missed clicks. The greatest challenge in the world of advertising is figuring out how much in total or how much extra a company makes as a result of advertising (based on what form of campaign is used). If one can figure out a way to link sales to ads, e.g. through internet conversion where one can trace when a person moves onwards from an ad and makes a purchase, then one can get a measure of the CPA. On the Web, one can get a traceable link from an ad impression to an eventual deal or transaction (through clicking on something, browsing, getting a lead, and finding a prospect). One can also compare targeted results and what a customer did depending on whether they came from an offline reference (e.g. through custom URLs for offline ads) or directly online. For companies who are not doing business on the Web, it is harder to link a sale to an ad (e.g. if someone wants to buy a Lexus, and reads reference material on the Web, they may then go off and buy a BMW without any traceable link).
  2. Behavioural targeted advertising, based on a user’s search history, can give advertisers a lot of useful information. One can use, for example, information on gender (i.e. static details) or location (i.e. dynamic details, perhaps from an IP address) for attribute-based targeting. This can also be used to provide personalised communication methods with users, so that very flexible products can be deployed as a result. Spend on behavioural targeted advertising is continuing to grow at a significant rate due to a combination of greater advertiser acceptance and greater publisher support.
By 2011, ‘very large publishers will be selling 30% to 50% of their ad inventory using this [behavioural targeting] technique’, according to Bill Gossman, CEO of Revenue Science.
  3. APIs for advertising can be combined with core business flows, especially when a company provides many products, e.g. online retail or travel services. For a large online retailer, there can be logic that will match a keyword with the current inventory, and the system will hide certain keywords if associated items are not in stock. This is also important in the hospitality sector, where for example there should be a change in the price of a product when it goes past a best-before time or date (e.g. hotel rooms normally drop in price after 9 PM). With an API, one can provide highly optimised ads that cannot be created on-the-fly by people. Advertisers can therefore take a scientific approach to dynamically improving their offerings in terms of cost and sales.
  4. Matching online information to offline ads, while not directly related to Web 2.0, is important too. Web 2.0 is about personalisation, and targeting internet-based ads towards segmented user groups is of interest, and so there is a need to find the best format and media to achieve this. If one looks at TV campaigns, one can analyse information about how advertising the URL for a particular
  • 39. brand can lead to people visiting the associated website. Some people may only visit a site after seeing an offline advertisement, so there can be a distinct message sent to these types of users. If a TV ad shows a web address, it can result in nearly 2.5 times more accesses than could be directly obtained via the Internet (depending on the type of products being advertised), so one can attract a lot more people to a website in this way. There is a lot of research being carried out into how to effectively guide people from offline ads to the Web, e.g. by combining campaigns in magazines with TV slots. It depends on what service or product a customer should get from a company, as this will determine the type of information to be sent over the Web and whether giving a good user experience is important (since you may not want to betray the expectations of users and what they are looking for). Those in charge of brands for websites need to understand how people are getting to a particular web page when there are many different entry points to a site. It is also important to understand why customers who watch TV are being invited onto the Web: if it is for government information, selling products, etc. The purpose of a 30-second advert may actually be to guide someone to a website where they will read material online for more than five minutes. In the reverse direction (i.e. using online information to guide offline choices), there are some interesting statistics. According to comScore, pre-shoppers on the Web will spend 41% more in a real store if they have seen internet-based ads for a product (and for every dollar that pre-shoppers spend online, they may spend an incremental $6 in-store).

Since much of the information in social websites contains inherent semantic structures and links, advertising campaigns can be created that will focus on certain topics or profile information.
The semantic graph categorises people, places, organisations, products, companies, events, and other objects, and defines the relationships among them. Users can define new profile categories and add metadata to these categories that can help improve the relevance of advertising engines. That metadata can then be used to personalise advertising content and provide targeted solutions to advertisers.

3.2.5 The Web on any device

There has been a gradual move from the Web running as an application on various operating systems and hardware to the Web itself acting as a kind of operating system (e.g. Google Chrome OS), where a variety of applications can now run within web browsers across a range of hardware platforms. Due to the range of
  • 40. web-based applications available, browser software is essential for a range of computing systems including the desktop, mobile phones and for other devices (e.g. the Nintendo Wii, the iPhone and the One Laptop Per Child $100 laptop). For example, the Opera Mini browser for mobiles is a small (100 kB) Java-based browser, where processing of pages takes place via proxy on a fixed network machine, and then a compressed page is sent to the browser.

As the Web has moved from text pages towards multimedia files that can be accessed via portable devices, there is a corresponding need to be able to handle these new media types on the Web. According to Håkon Wium Lie, one of the prime architects of CSS (Cascading Style Sheets) and the chief technical officer with Opera Software, video needs to be made into a ‘first-class citizen’ of the Web. At the moment, it takes a lot of ‘black magic’ and third-party plugins and object tags before you can get video to work in a browser for most users.

There are two problems that need to be solved. The first is how videos are represented in markup. Some have proposed that a <video> element be added to the forthcoming HTML5 specification. The second problem is in relation to a common video format. Håkon Wium Lie says that the Web needs a baseline format based on an open standard (e.g. Ogg Theora, which is free of licensing fees), and in HTML5 there could be a soft requirement or recommendation to use this format. SVG effects (overlays, reflections, filters, etc. created in the Scalable Vector Graphics format) can also be combined with these video elements, and some browsers can access the 3-D engines of graphics card hardware to render bitmaps onto vector surfaces for rotations or other graphical animations. HTML5 will include new parsing rules, new media elements, some semantic elements (section, article, nav, aside), and also some presentational elements will be removed (center, font).
CSS can also be used to control how the Web is displayed on different devices, where screen real estate may be more limited. The CSS Zen Garden allows people to take a boring document and to test how their stylesheets will alter its look. CSS has a number of properties for handling fonts and text on the Web, and different styles may be more appropriate for different devices. Browsers have around ten fonts that can be viewed on most platforms (usually Microsoft’s core free fonts), but there are many more fonts out there (e.g. there are 2500 font families available on Font Freak). In CSS2, you can import a library of fonts, so that fonts residing on the Web can be used more across a range of devices in the future.

There is also the Acid2 test which can be used to test how well web pages display on a variety of browsers and devices. Acid2 consists of a single web page where every element is positioned using some CSS or HTML code with some PNGs, and if a browser renders it correctly, it should show a smiley face (but rarely does).
  • 41. A variety of convergences are taking place between other software applications and hardware systems. E-mail has made great progress in becoming part of the web experience (Hotmail, Gmail, Yahoo! Mail, etc.). The same thing is now happening to IM (instant messaging), to VoIP (voice over IP), to calendars, etc. For example, a presence indicator next to an e-mail inbox shows if each user is available for an IM or a phone call. In the reverse direction, if someone tries to call or IM you, you can push back and say that you just want them to e-mail you because you are not available right now. Being able to prioritise communications based on who your boss is, who your friends are, etc., is a crucial aspect of harnessing the power of ubiquitous computing. On voice, you often want to be able to see your call logs, using these to click on a person and call them again, but you may also want to forward segments from that voice call over e-mail or IM.

Internet-enabled mobile phones and mobile applications of Web services, especially microblogging services such as Twitter (which we will describe later on in this book), also augment this process of ubiquitous computing, or more specifically ubiquitous social networking. More and more people tend to share their locations or their activities as a live stream of their life (‘lifestream’), and all in real time.

3.2.6 Content delivery

The Web has moved beyond a means for distributing just textual data and still images. As video becomes that aforementioned first-class citizen on the Web, new paradigms for sharing and accessing videos are required due to the expectations of users in terms of social website usage conventions and also due to the sheer volume of data involved.
The market for IP video via the Web and the Internet is huge, and a recent Cisco report called the ‘Exabyte Era’ shows that P2P (peer to peer), which currently accounts for 1,014 PB of traffic each month, will continue to rise with a 35% year-over-year growth rate. User-contributed computing has taken off, and is delivering over half of all Internet traffic today. P2P video is now being accessed via the Web, with companies like Joost (a multichannel online TV service) moving from standalone applications to browser-embedded solutions.

According to Eric Klinker, chief technical officer for content-delivery network BitTorrent Inc., a new order of magnitude has arrived, the exabyte (EB). One exabyte is 2^60 bytes, which is 1 billion gigabytes. If you wanted to build a website that would deliver 1 EB per month, you would need to be able to transfer at a rate of 3.5 Tb/s (assuming 100% network utilisation). 1 EB corresponds to 292,000 years of online TV (stream encoded at 1 Mb/s), 5,412 years of Blu-ray video (maximum standard 54 Mb/s), 29 years of all online radio traffic, 1.7 years of YouTube traffic, or just one month of P2P traffic. If you had a central service
  • 42. and wanted to deliver 1 EB of data, you would need about 6.5 Tb/s peak bandwidth, and 70,000 servers requiring about 60-70 megawatts in total. At a price of $20 per Mb/s, it would cost about $130 million to run per month!
The ‘Web 2.0’ way is to use peers to deliver that exabyte of data. However, not every business is ready to be governed entirely by their userbase and sometimes a hybrid-model approach is required (e.g. 55 major studios have made 10,000 titles available via ). By leveraging the Web 2.0 nature of distributed computing, we can enable many things that would not or could not be achieved otherwise. For example, Electric Sheep is a distributed computing application that renders a single frame on your machine for a 30-second long screensaver, which you can then use.
Social networks also require many machines, but the best example of distributed computing is web search. Google has an estimated 500,000 to 1 million servers, corresponding to $4.5B in cumulative capex (capital expenditure) or 21% of their Q2 net earnings (according to Morgan Stanley). And yet, search is still not a great experience today, since you often have a hard time finding what you want. Search engines are not contextual, they do not see the whole Internet (e.g. the ‘dark web’ consists of content that is not indexed by search engines), they are not particularly well personalised or localised, and they sometimes are not dynamic enough (i.e. they cannot keep up with the content from social websites).
The best Web 2.0 applications have involved user participation, with users contributing to all aspects of the application (including infrastructure). ‘Harness the power of participation, and multiply your ability to deliver a rich and powerful application’, says Eric Klinker.
Developers need to consider how users can do this (not only through contributed content or code, but also through computing power).
3.2.7 Cloud computing
According to Kai-Fu Lee, Vice President of Engineering at Google and President of Google Greater China [28], cloud computing can provide many of the features that users of the Web now expect: accessibility, shareability, freedom (i.e. their data wherever they are), simplicity, and security.
Data is stored in the ‘Cloud’, on some server somewhere that is not necessarily known by the user, but it is ‘just there’ and accessible. As an analogy, banks too have become ‘Clouds’, allowing people to go to any ATM and remove money from their bank wherever they are. Electricity can be thought of in a similar fashion, as it can come from various places, and you do not have to know where it comes from: it just works. Software and services are also moving to the Cloud, usually accessible via a fully-featured web browser on a client device. Finally, the Cloud should be accessible from any device, especially from phones. When the
[28] (URL last accessed 2009-06-09)
  • 43. Apple iPhone was released, Google found that web usage from that device was 50 times greater than that from other web-capable phones.
Where the PC era was hardware centric and the client-server era was more software centric (making it suitable for enterprise computing), cloud computing is more service centric: abstracting the server, making it very scalable, hiding complexities, and allowing the server to be anywhere. Three key requirements for making cloud-based computing a reality are now in place: the falling cost of storage, ubiquitous broadband, and the democratisation of the tools of production. These forces are also allowing cloud-based computing to become more like a utility, and much of this is due to IBM and DEC’s pioneering work in the 1990s on making computing itself a utility.
Kai-Fu Lee says that there are six important properties of cloud computing: user centric; task centric; powerful; accessible; intelligent; and programmable.
1. User centric means that both the data and the application move with you. People do not want to reinstall their address books or their applications on different machines as it is painful to do so. If someone drops or breaks their laptop, they will be anxious about potentially losing data, and we also know how difficult it is to do something as simple as switching mobile phones. It is hard because synchronising data is usually complicated. For example, the infrared (IR) functionality on a mobile phone is not easy to use or user centric: how often do people use IR to back up data to their laptops?
   If the data is stored in the Cloud instead - images, messages, whatever - once you are connected to the Cloud, any new PC or mobile device that can access your data or that allows you to create data becomes yours, even if the device itself cannot store all of your data. Not only is the data yours, but you can share it with others: you do not have to worry about where the data is.
   PCs are normally our window to the world, but mobile devices can do more. There are around three billion mobile phone users worldwide, dwarfing the number of PCs that are Internet-accessible. Since mobile services know who you are and often where you are, they can give you more targeted content. Intelligent mobile search is extremely useful, giving local listings and results relevant to context. The most powerful and popular application is the aforementioned mapping with mashups - especially when people get lost or if they spontaneously want to go somewhere - with mapping applications allowing users to search for relevant attractions nearby or even see real-time traffic flows, etc. As there is a move from e-mail usage towards photo sharing or mapping applications, these are moving into the Cloud as well.
2. There is a move towards task-centric computing where the applications of the past - spreadsheets, e-mail, calendars - are becoming modules, and they can be composed and laid out in a task-specific manner. For example, the task may be multiple teachers collaborating on the creation of a departmental curriculum, where one can see a list of the users who are currently viewing the curriculum spreadsheet, and those teachers can have real-time debates via chat in parallel
  • 44. with the curriculum development. Spreadsheet editing allows collaboration and publishing between a selected group of people, with associated version control mechanisms.
3. Having many computers in the Cloud means that powerful computing tasks can be carried out that a single personal computer cannot perform. For example, search engines work faster than searching in desktop applications such as Thunderbird or Word. Of course, web search has to be much faster even though there are many more documents: for the storage, if there are 100 billion pages at 10 kB per page, this corresponds to about 1000 TB of disk space. Cloud computing should therefore have an infinite amount of disks and computation at its disposal. When a query is issued to a web search engine like Google, it queries at least 1000 machines (potentially accessing thousands of terabytes of data).
4. Accessibility to diverse data types in different clouds can be achieved through universal search. For example, if you want to do a specific type of search, for restaurants, images, etc. in a particular location, PageRank-type web search may not necessarily be the best option. It is difficult for most people to get to the right vertical search page in the first place since they usually cannot remember where to go. Universal search is basically a single search system that will access all of these vertical searches. This search requires simultaneously querying and searching over all of the specific databases: news, images, videos, tens of such sources today, with potentially hundreds or thousands of them in the future. These multiple simultaneous searches then get ranked, so it will be even more computing intensive than current web search methods.
5. Intelligent data mining and massive data analysis are required to generate some intelligence for the masses of data available in cloud computing applications.
   But this needs to be combined with people - via their collaboration and contributions - to change a mass (or mess!) of photos or facts or whatever into a very powerful combination. People and computing tools working together can create intelligent knowledge. Applications like Google Earth are much more useful when people can contribute to them, e.g. as National Geographic showed when they added many high-resolution photos to it. Reviews, 3-D buildings, etc. can turn a tool from a collection of pictures into something special. Creativity adds connections to data-centric applications, enabling intelligent combinations of content. But with all of this data comes the issue of server costs. In a choice between high-end servers or cheap PC-class servers, better cost efficiency and improved performance can be achieved by going for very many PC-class servers (as long as appropriate reliability mechanisms are put in place to deal with the higher failure rates).
6. With the many servers used for storing and accessing masses of data in cloud computing setups, there are associated new programming solutions required for fault tolerance, distributed shared memory, and other programming paradigms. For fault tolerance, Google uses the ‘Google File System’ which is a distributed disk storage architecture. Every piece of data is replicated three times. If one
  • 45. machine dies, a master redistributes the data to a new server. There are around 200 clusters (some with over 5 PB of disk space on 500 machines). Google uses what they call the ‘Big Table’ for distributed memory. For example, if one is storing every web page from, no single machine can store that so multiples are required. The largest cells in the Big Table are 700 TB, spread over 2000 machines. MapReduce is an example of a new programming paradigm for cloud computing, where a trillion records are cut into a thousand parts on a thousand machines. Each machine will then load a billion records and will run the same program over these records, and then the results are recombined. While in 2005, there were some 72,000 jobs being run on Google’s MapReduce setup, in 2007, there were two million jobs (the usage seems to be increasing exponentially).
One criticism of cloud computing is that it makes many of the same claims as networked computing and networked desktops did in the 1990s (i.e. the theory that users would move to using computers with no hard drives, where all data would be stored elsewhere and would be available from any networked desktop that a user would choose to use). However, users still wanted control over their own desktops (in particular, having offline access to the data contained therein), and hence local storage is still a primary consideration when purchasing a computer.
3.2.8 Folksonomies
As mentioned, a key feature of social websites is that contributed content may be tagged with a keyword by the content creator or sometimes by community members. Tagging is common to many social websites - a tag is a keyword annotation that acts like a subject or category for the associated content.
Folksonomies are constructed from these tags: they are collaboratively generated, open-ended labelling systems that enable users of social websites to categorise their content using the tags system, and to thereby visualise popular tag usages via ‘tag clouds’ (visual depictions of the tags used on a particular website, like a weighted list in visual design) as shown in Figure 3.5. Examples of systems that use tags are blogs, social bookmarking sites, photo or video sharing services and wikis.
Folksonomies are a step in the same direction as the Semantic Web (Specia and Motta 2007). The Semantic Web often uses top-down controlled vocabularies to describe various domains, but it can also utilise folksonomies and therefore develop more quickly since folksonomies are a great big distributed classification system with low entry costs. We shall discuss social tagging and the connections between folksonomies, tagging and the Semantic Web in more detail in Chapter 8.
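As a rough illustration of how a tag cloud (a weighted list of tags) can be derived from a folksonomy, the following Python sketch counts how often each tag is used across a few content items and maps each count onto a font size. It is only a sketch under stated assumptions: the item identifiers, the tags, the function names and the 10 to 30 point size range are all hypothetical and do not come from any particular social website.

```python
from collections import Counter

# Hypothetical tagged items: item identifier -> tags chosen freely by users.
tagged_items = {
    "photo-1": ["galway", "sunset", "ireland"],
    "photo-2": ["galway", "music"],
    "bookmark-1": ["semanticweb", "rdf"],
    "blogpost-1": ["semanticweb", "galway", "music"],
}


def tag_counts(items):
    """Count how many items each tag has been applied to."""
    counts = Counter()
    for tags in items.values():
        counts.update(set(tags))  # count a tag at most once per item
    return counts


def tag_cloud(counts, min_size=10, max_size=30):
    """Map each tag's usage count linearly onto a font size."""
    lowest, highest = min(counts.values()), max(counts.values())
    span = (highest - lowest) or 1  # avoid dividing by zero if all counts match
    return {
        tag: round(min_size + (n - lowest) * (max_size - min_size) / span)
        for tag, n in counts.items()
    }


counts = tag_counts(tagged_items)
cloud = tag_cloud(counts)
for tag in sorted(cloud):
    print(f"{tag}: used on {counts[tag]} item(s), font size {cloud[tag]}")
```

Linear scaling is the simplest choice; a site with a few very heavily used tags might prefer a logarithmic scale so that rarely used tags remain legible.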
  • 46. Fig. 3.5. A typical tag cloud
3.3 Object-centred sociality
Social networks exist all around us - at workplaces as well as within families and social groups. They are designed to help us work together over common activities or interests, but anecdotal evidence suggests that many online social networking services (SNSs) lack such common objectives (Irvine 2006). Instead, users often connect to others for no other reason than to boost the number of friends they have in their profiles [29]. Many more browse other users’ profiles simply for curiosity’s sake. These explicitly-established connections become increasingly meaningless because they are not backed up by common objects or activities.
The act of connecting sometimes becomes a site’s primary (only) activity. In fact, some sites act simply as enhanced address books: although potentially useful for locating or contacting someone, they provide little attraction for repeat visits. This is a flaw with the current theory. As Jyri Engeström, co-founder of microblogging site (microblogging is a lightweight form of blogging that consists of short message updates), puts it [30], ‘social network theory is good at representing links between people, but it doesn’t explain what connects those particular people and not others.’ Indeed, many are finding that social networking sites are becoming increasingly boring and meaningless.
Another problem is that the various SNSs do not usually work together. You thus have to re-enter your profile and redefine your connections from scratch when you register for each new site. Some of the most popular SNSs probably would not exist without this sort of ‘walled garden’ approach [31], but some flexibility
[29] (URL last accessed 2009-06-09)
[30] (accessed 2009-06-09)
[31] (URL last accessed 2009-06-09)
  • 47. would be useful. Users often have many identities on different social networks. Reusable or distributed social networking profiles would let them import existing identities and connections (from their own homepage or another site they are registered on), thereby forming a single global identity with different views.
Engeström has theorised [32] that the longevity of social websites is proportional to the ‘object-centred sociality’ (Bouman et al. 2008) occurring in these networks, i.e. the degree to which people are connecting via items of interest related to their jobs, workplaces, favourite hobbies, geographic locations, etc. Similarly, Ken Jordan and colleagues advocate augmented social networks, in which citizens form relationships and self-organise into communities around shared interests (Jordan 2003).
Fig. 3.6. Users form object-centred social networks (using their possibly multiple online accounts) around the content items they act on via social websites
[32] (accessed 2009-06-09)
  • 48. On the Web, social connections are formed through the actions of people, via the content they create together, comment on, link to, or for which they use similar annotations. Adding annotations to items in social networks (using topic tags, geographical pinpointing, etc.) is particularly useful for browsing and locating interesting items and people with similar interests. Content items such as blog entries, videos, and bookmarks serve as the lodestones for social networks, drawing people back to check for new items and for updates from others in their network. On Flickr, people can look for photos categorised using an interesting ‘tag’, or connect to photographers in a specific community of interest. On Upcoming, events are also tagged by interest, and people can connect to friends or like-minded others who are attending social or professional events in their own locality.
Figure 3.6 is illustrative of an object-centred social network for three people, showing their various user accounts on different websites and the things that they create and do using these accounts. Rather than being connected simply through online social network relationships (i.e. by explicitly-defined friends-type contacts), these people are bound together (via their user profiles) through ‘social objects’ of common interest: the content they create together, co-annotate, or for which they use similar annotations. For example, Bob and Carol are connected through bookmarked websites that they both have annotated on musical keyboards and also through music-related events that they are both attending. Similarly, Alice and Bob are using matching tags on media items about pets and are subscribed to the same blog on birds.
For many social websites, success has come from enabling communities formed around common interests, where the users are active participants who as well as consuming information also provide content and metadata.
In this way, it is probable that people’s SNS methods will continue to move closer towards simulating their real-life social interaction, so that people will meet others via something they have in common, not by randomly approaching each other - eventually leading towards more realistic interaction methods with friends as online connections become intertwined with their real-world interests.
Multiplayer online gaming has had groups (‘clans’) of people working towards common purposes for more than a decade, even though they may never have met each other in real life. We may start to see gaming social websites where real-time multiplayer online games will appear in browser-embedded windows just as YouTube does for videos, with user-to-user conversations and running commentaries being carried out in parallel to these games. Web interactions have not reached the level of real-world interaction just yet, but real-time microblogging and features like being able to respond to people’s videos with video comments of your own (e.g. as on YouTube) are going in this direction. These online activities can often lead to real-world group meetups (e.g. localised Tweetup events for microbloggers arising from updates on Twitter), with online activities reinforcing offline ones and vice versa.
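The object-centred network described for Figure 3.6 can be thought of as a two-mode structure: people on one side, social objects on the other, with person-to-person links arising only where two people act on the same object. The Python sketch below illustrates that idea under stated assumptions: the object labels (such as `tag:pets`) and the `shared_objects` helper are hypothetical, and the data is only loosely modelled on the Alice, Bob and Carol example in the text.

```python
from itertools import combinations

# Hypothetical data: person -> social objects they create, annotate or attend.
acts_on = {
    "Alice": {"tag:pets", "blog:birds"},
    "Bob": {"tag:pets", "blog:birds", "bookmark:keyboards", "event:music"},
    "Carol": {"bookmark:keyboards", "event:music"},
}


def shared_objects(acts_on):
    """For each pair of people, collect the objects that both have acted on."""
    links = {}
    for a, b in combinations(sorted(acts_on), 2):
        common = acts_on[a] & acts_on[b]
        if common:  # no shared object means no object-centred link
            links[(a, b)] = common
    return links


links = shared_objects(acts_on)
for (a, b), objects in links.items():
    print(f"{a} and {b} are connected via {sorted(objects)}")
```

In this toy data, Alice and Carol share no objects, so no link is derived between them even though both are linked to Bob; an explicit friends list would not capture that distinction.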
  • 49. Virtual worlds such as Second Life have already begun to provide a user experience which is more faithful to reality and where networks of friends are interacting in much more realistic ways. Users interact via avatars in a three-dimensional environment where they can move between different areas and socialise with other residents. An important aspect of Second Life is that the world is largely user-created. Residents can buy land, construct houses and create objects. It is also possible to trade with other users, as well as buy or sell using the world’s internal currency, the Linden Dollar. Second Life’s world encourages residents to meet and stay in touch with other users with similar interests via themed areas and events - a prime example of object-centred sociality.
3.4 Licensing content
As we previously mentioned, an important feature of social websites is the way they let users add content to the Web, no matter what format it is in. When publishing content on the Web, it is important to know how people will be able to access and reuse it. Should they be allowed to copy and paste someone else’s blog post on a wiki? Can they use a picture someone else has taken for their next book cover? Or can they reuse some songs written by someone else that are only meant for non-commercial use? Contrary to common perception, if something is freely accessible on the Web, it does not mean that it is free to reuse.
The Creative Commons [33] (CC) project provides a legal framework that defines rights regarding how one can reuse content that users publish. It provides six different contracts for defining if and how people can reuse content that has been published, if they can modify it and if it may be used for commercial purposes.
In this way, content providers can decide exactly how people can use their content. For example, on Flickr, different licences are offered when adding a picture, and various bands publish their content using CC licences on Jamendo [34]. ccMixter [35] is a Creative Commons-sponsored service that provides ways to create and publish mashups of CC-based content.
3.5 Be careful before you post
When filling out profiles on social websites or posting content, users are often oblivious to the fact that the content they post is not just available to their friends on the site, but by default, content is usually visible to everyone. Some are quite
[33] (URL last accessed 2009-07-16)
[34] (URL last accessed 2009-06-09)
[35] (URL last accessed 2009-06-09)
  • 50. happy to post personal content online if their online presence is a significant component of their real life (e.g. for social media experts, online or offline celebrities, etc.). Others may be more private and therefore should be aware of the public nature of content they post online. There are some basic guidelines that should be followed when adding information to either personal or public areas on social websites:
      • Common sense should prevail, and you should not post anything that you would not give to a stranger in the street. That includes your phone number, your address, your birth date, etc.
      • Try not to use your real name or your e-mail address in your online nickname or posting account.
      • Keep your work e-mail details separate from accounts used for message boards or blogs where you post informally: get a Hotmail or Gmail account for such activities. Also, do not give any account password to your friends unless there is a very good reason to do so.
      • Be careful about posting potentially damaging information about your relationships with professional colleagues and friends or family, or personal specifics about yourself (because even though you may be posting anonymously, it can be very easy for someone to put two and two together and figure out who you are).
      • If you post inflammatory statements about something or somebody, be aware that doing so under your own name may lead to a campaign of retaliation against you. If you post defamatory statements, be prepared for legal action.
      • There is effectively a permanent record of what you contribute to the (Social) Web (e.g. if you let slip something you should not about your workplace or family, sometimes even if the original site disappears).
        It may be on the original site you posted on, in Google’s cache, in the Wayback Machine (a periodic caching of website content by the Internet Archive that stretches back to the beginnings of the Web), or someone may just save it to their own site or computer. Remember that when you post something sensitive: it could well be there forever, for your parents, your kids, your boss, your future employer to see (even after you have logged off from this mortal coil).
      • Blogging is a powerful medium due to its open nature and public contributions, but it is this openness that means that whatever you say can be read by all and people can build up a picture of who you are and what you are doing (even if you do not realise that they are reading or actively following your blog). As with social networking sites, some people mistakenly think that their blog is only being read by a closed circle of friends, but of course if it is publicly accessible, anyone or any search engine can get access to it and forward content to others.
      • Finally, if you have only ever talked to someone online, you should not arrange to meet them alone in the real world.
  • 51. The above guidelines are not an attempt to make Social Web users paranoid, but it is prudent to be careful about what you contribute. There is already a huge amount of publicly-available information about individuals ranging from phone book entries to local government planning applications and objections, and it is becoming easier to link this to less formal information such as blog posts or photos taken (of you, by others) at parties or other events.
3.6 Disconnects in the Social Web
The Social Web is allowing people to connect and communicate via the Internet, resulting in the creation of shared, interactive spaces for communities and collaboration. There is currently a large disconnect in the online social space. Blogs, forums, wikis and social networking sites all can contain vibrant active communities, but it is difficult to reuse and to identify common data across these sites. For example, Wikipedia contains a huge body of publicly-accessible knowledge, but reuse of this knowledge outside of Wikipedia and incorporating it into other applications poses a significant challenge.
As another example, a user may create content on several blogs, wikis and forums, but one cannot identify this user’s contributions across all the different types of social software sites. We shall address this in future chapters by describing methods for connecting these sites. We shall identify core vocabularies for describing interlinked social spaces, and provide guidelines and tools for describing content in social software. Our use cases will provide examples of how adding semantic information to social websites will enable richer applications to be built.
  • 52. 4 Adding semantics to the Web
The ‘Semantic Web’ can be thought of as the next generation of the Web where computers can aid humans with their daily web-related tasks as more meaningful structured information is added to the Web (manually and automatically). For example, using a combination of facts like ‘John works_at NUI Galway’, ‘Mary knows John’, ‘John is a Person’, ‘Mary is a Person’, ‘NUI Galway is an Organisation’, ‘A Person works_at an Organisation’, and ‘A Person knows a Person’, you can allow computers to answer relatively straightforward questions like ‘Find me all the people who know others who work at NUI Galway’ which at the moment is quite difficult for us to do without some manual processing of information returned from search results.
4.1 A brief history
During the evolution of human civilisation, new technologies enabled the creation of more and more recordings of knowledge in various media forms. The invention of the Web and the evolution towards the Semantic Web can be explained by the need to cope with the ever-increasing amount of information and knowledge.
Indeed, the creation and recording of knowledge began with cave drawings as early as 32000 BC. One of the earliest written expressions was the Sumerian cuneiform scripts written on clay tablets about 3000 BC. Rapid progress in the creation and distribution of text and pictures was made by the invention of the mechanical printing press by Johannes Gutenberg in 1440. The invention of photography in 1839 by Louis Daguerre added another principal form of media, followed by the invention of the phonograph for sound recordings by Thomas Alva Edison in 1877 and the capability to effectively create movies developed by the Lumière brothers around 1895.
The proliferation of media created the need to collect and organise these media items.
As early as 700 BC, media items were organised in libraries, and the Ancient Library of Alexandria, assumed to have up to 1 million scrolls, is one of the most well-known examples of an early large collection of media. However, those libraries were (and still are) centralised collections of media with strict organisation and indexing principles.
In 1945, Vannevar Bush, a science administrator for the US government, was one of the first people to realise that the proliferation of knowledge in various media forms had opened up new challenges that central repositories and the indexing mechanisms of conventional centralised libraries could not meet. That led Bush to
J.G. Breslin et al., The Social Semantic Web, DOI 10.1007/978-3-642-01172-6_4, © Springer-Verlag Berlin Heidelberg 2009
  • 53. postulate the ‘Memex’ (Bush 1945), a device ‘in which an individual stores all his books, records, and communications, and which is mechanised so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.’
The idea of the Memex was picked up by Douglas Engelbart, a computer engineer at Stanford Research Institute, who was aiming towards augmenting human intellect (Engelbart 1962) to increase the capability of a person ‘to approach a complex problem situation, to gain comprehension to suit his particular needs, and to derive solutions to problems.’ In the course of working on augmenting human intellect, Engelbart invented many of the core technologies we are still using today, such as hypertext, display editing, and the mouse. However, the core idea of augmenting human intellect on a societal scale as formulated in (Engelbart 1962) remains largely unrealised. Ted Nelson, another information technology pioneer, founded the Xanadu project in 1960 with the goal of creating a computer network with a simple user interface.
Tim Berners-Lee, a programmer at the European Organisation for Nuclear Research (CERN) who was inspired by the visions of Engelbart and Nelson, was able to realise part of the idea: the hypertext system we know today as the World Wide Web, or the Web for short, which made global information access feasible. Computers on the Web use the HTTP (Hypertext Transfer Protocol) communications standard to transfer web pages containing display instructions in HTML (Hypertext Markup Language). However, Berners-Lee (1999) conceded that the original goal and motivation remained largely unfulfilled:
    There was a second part of the dream. [...] We could then use computers to help us analyse it, make sense of what we’re doing, where we individually fit in, and how we can better work together. [...] The second part has yet to happen.
Realising the original idea of the Memex, the ‘augmentation of human intellect’, and the ‘second part of the dream’ has turned out to be a difficult problem. The Web has created a global common information space, but it has amplified the problem of information and knowledge overload by causing the creation of more and more hyperlinked web documents. The central problem is that computers are perfectly capable of rendering web documents. However, computers provide little support in helping people to understand, organise and manage the knowledge contained in these documents. To be able to augment the human intellect, something more than the HTML Web we use today is required: a ‘Semantic’ Web that machines can process for us, where computers help us to make sense of the information, enabling us to work better together.
  • 54. 4.2 The need for semantics
Searching for information today is based on finding words within web pages and matching them. For example, if a person was searching for information on the former English rugby captain Martin Johnson, they would visit a site such as Google and type ‘Martin Johnson’ into the search box. The search engine will not only return web pages for the rugby player, but primarily those relating to his more famous artist namesake Martin Johnson Heade (and many other Martin Johnsons besides). One way to improve the situation would be for a web page author to add some extra meaning to the information, for example, by telling the computer that Martin Johnson is indeed a rugby player and that every rugby player is a person. This is a simple example of annotation, where semantic meaning can be added to the Web. Now a computer can determine that this Martin Johnson is a rugby player, and that he may be the one that you are looking for. However, the HTML Web cannot express these annotations.
The principles of the HTML Web since its invention in the 1990s have essentially remained the same: resources (web pages, files, etc.) are connected by semantically-untyped hyperlinks. By untyped, we mean that there is no easy way for a computer to figure out what a link between two pages means, i.e. what are the semantics of the relationship between the pages. For example, on the UEFA football website, there are hundreds of links to the various organisations that are registered members of the association, but there is nothing explicitly saying that the link is to an organisation that is a ‘member of’ UEFA or what type of organisation is represented by the link. On a professor’s work page, she may link to many papers that she has authored, but the page may not say that she is the author of those papers or that she wrote such-and-such when she was visiting at a particular university.
Moreover, while anyone can guess that she is a professor, which is a person, a computer cannot extract any information about the nature of the objects (or in fact, any objects at all) described in the pages, as the computer is only aware of web pages and hyperlinks. While a reader interprets these pages and hyperlinks as representing real-world concepts and properties, a computer cannot. There is hence a knowledge gap between the interpretation that we can make of what is on the Web and what a computer can deduce from it. To close the knowledge gap, knowledge representation mechanisms on the Web beyond HTML are needed. Since the Web has as its basis a number of unique properties (such as distribution, diversity and heterogeneity), these properties serve as requirements for knowledge representation mechanisms. These requirements are:
    Entity identity. Entities on the Web have to be uniquely identifiable: not just documents, but all possible entities (e.g. persons). Object identity is a necessary prerequisite to enable computers to process and understand information.
    Relationships. Representing entities alone is not sufficient. Relationships between entities capture relevant knowledge and need to be expressed.
  • 55.
    Extensibility. Given the wide variety of communities and topics on the Web and the need for evolution and adaptability, a fixed schema would not be adequate for the rapid evolution of topics.
    Vocabularies (ontologies). In order to exchange and represent data on a specific topic, an agreement on which vocabulary to use is necessary.
The Semantic Web, put forward by the inventor of the current Web, Sir Tim Berners-Lee, addresses these issues and requirements by allowing one to provide metadata that is associated with web resources, and behind this metadata there are associated vocabularies or ‘ontologies’ (we will not make any distinction between these terms in the book, hence using both) that describe what this metadata is and how it is all related to each other. The Semantic Web, or as some have termed it ‘Web 3.0’[1], is ‘an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation’ (Berners-Lee et al. 2001). The word ‘semantic’ stands for ‘the meaning of’, and therefore the Semantic Web is one that is able to describe things in a way that computers can better understand: basically, adding more meaning to the Web. Computers can only do so much with the ‘natural language’ information that is on the Web at the moment: they are not evolved enough to understand what pages of text are about. The idea of a Semantic Web involves a move from unstructured pages of text to structured information that can not only be understood by people but can be interpreted by computers to present the information to people in new ways.
Stefan Marti of MIT (Massachusetts Institute of Technology[2]) summarised ‘the Semantic Web for dummies’ as[3]:
    XML customised tags, like: <dog>Nena</dog>
    + RDF relations, in triples, like: (Nena) (is_dog_of) (Kimiko/Stefan)
    + Ontologies / hierarchies of concepts, like: mammal -> canine -> Cotton de Tulear -> Nena
    + Inference rules, like: If (person) (owns) (dog), then (person) (cares_for) (dog)
    = Semantic Web!
The next step is the development of various ontologies. Ontologies, providing a vocabulary of terms in a certain area (for example, there would be separate ontologies for sports or soaps or science), are used to specify the meanings of the annotations added to web pages. For the rugby example, there may be a definition in an ontology that a rugby player is a member of a team, or that each team has 15
[1] (URL last accessed 2009-06-09)
[2] (URL last accessed 2009-07-16)
[3] (accessed 2009-06-09)
  • 56. players. These ontologies are designed to be understandable by computers as part of the Semantic Web (using the Resource Description Framework, or RDF).
Some of the more popular Semantic Web vocabularies include FOAF (Friend-of-a-Friend, for social networks), Dublin Core (for resources online or in libraries), SIOC (for online communities and content), the W3C Basic Geo Vocabulary[4] (for the coordinates of geographic locations), and the Gene Ontology (for genes in organisms). Bizer et al. (2007) also provide a list of popular and core vocabularies that people should use when publishing data on the Semantic Web[5]. People can also create custom vocabularies for their own information representation requirements (Figure 4.1).
Fig. 4.1. We can now describe lots of things semantically![6]
Figure 4.2 shows the node types which exist in a typical Semantic Web data model, i.e. a vocabulary (similar to FOAF), and the relationship types which connect them together. The relation or predicate ‘maker’ is considered to be the inverse of the relation ‘made’; in other words, they represent the same relationship, but in opposite directions.
While it is known that adding metadata to websites can often improve the percentage of relevant document hits in search engine results, it is difficult to persuade Web authors to add metadata to their pages in a consistent, reliable manner (either due to perceived high entry costs or because it is too time consuming). For example, few web authors make use of the simple Dublin Core metadata system, e.g. by indicating the creator or creation date of their pages, even though DC metadata tags can increase a page’s prominence in search results.
[4] (URL last accessed 2009-07-16)
[5] (URL last accessed 2009-06-09)
[6] (URL last accessed 2009-06-09)
  • 57. Fig. 4.2. An example Semantic Web data model
The main power of the Semantic Web lies in interoperability and in combinations of vocabulary terms. Interoperability and increased connectivity are possible through a commonality of expression, and vocabularies can be combined and used together: e.g. a description of a book using Dublin Core metadata can be augmented with specifics about the book author using the FOAF vocabulary. Vocabularies can also be easily extended (using modules) in a distributed manner. A person can adapt an ontology published on the Web to their own needs and then republish the changes so that anyone can benefit from it. Of course, to be successful, the Semantic Web should rely on a set of core ontologies that are agreed upon and used by most people for adding semantics to their content. Through this, true intelligent search with more granularity and relevance is possible: e.g. a search can be personalised to an individual by making use of their identity profile and relationship information.
In later sections we will see that the Semantic Web also provides useful mechanisms for describing and leveraging the social objects that bind us together in social websites. Since more interesting social networks are being formed around the connections between people and their objects of interest, and as these object-centred social networks grow bigger and more diverse, more intuitive methods of navigating the information contained in these networks have become necessary, both within and across social networking sites. We have mentioned how individuals are connecting through these shared objects, but this can apply to whole communities as well (e.g. a community of interest for mountaineering may consist of both people and content distributed across photo-, bookmark- and event-centred social networks, Figure 4.3).
Person- and object-related data can also be gathered from various social networks and linked together using a common representation format. This linked data can provide an enhanced view of individual or community activity in a localised or distributed object-centred social network(s) (‘show me all the content that Alice has acted on in the past three months’).
  • 58. Fig. 4.3. Community groups are also connected through objects of interest
In the following sections we describe the necessary building blocks of the Semantic Web in more detail.
4.3 Metadata
Metadata has been with us since the first librarian made a list of the items on a shelf of handwritten scrolls. The term ‘meta’ comes from a Greek word that denotes ‘alongside, with, after, next’. Metadata can be thought of as ‘data about data’, and it commonly refers to descriptive structured data about web resources that can be used to help support a wide range of operations.
Metadata can be used for many purposes: to provide a structured description of characteristics such as the meaning (semantics), content, structure and purpose of a web resource; to facilitate information sharing; to enable more sophisticated search engines on the Web; to support intelligent agents and the pushing of data (e.g. from blog feeds); to minimise data loss or repetition; and to help with the discovery of resources by enabling field-based searches.
As an analogy, a Web without metadata is like a library where every word in every page in every book must be indexed. Because such indexing will lag behind the growth and change in the Web, it often yields poor search results. With even some basic metadata, the library in our analogy has books with categories, titles, descriptions and ratings, yielding better retrieval. However, this also results in some extra work classifying things and assigning properties.
Many kinds of resources, objects, or things on the Web can be annotated with metadata: HTML documents, digital images, databases, books, museum objects, archival records, collections, services, physical places, people (e.g. using FOAF), abstract ‘works’, concepts, events and even metadata records themselves.
This metadata can be used by people, for example, a collection owner managing or controlling access to resources, or a researcher seeking or interpreting resources, or by computerised services or agents, for example, aggregators (e.g. blog collections), web portals presenting a ‘landscape’ of data to users, or brokers performing query tasks on behalf of users.
  • 59. Metadata can be created by software tools (e.g. by indexing robots or web crawlers accessing resource content), and it can be created by people through descriptions added by a resource owner or by third parties (e.g. specialist cataloguers or resource users). However, creating (and maintaining) high-quality metadata is not always cheap, and there may be rights or copyright issues for metadata as well as for the underlying resources.
Depending on which approach offers the most flexibility, metadata can be embedded within a resource and extracted from the resource itself (depending on its format), or it may simply be linked to a resource via an external file or a database of resource descriptions. One may also need to present different subsets of metadata in different contexts.
To exchange metadata, metadata standards are required. These are agreed-on criteria for describing metadata for purposes of interoperability. As a simple example, a date (e.g. attached as metadata to a file) could be expressed as January 31, 2009, 31 janvier 2009, 2009-01-31, 01-31-2009, or 31012009 (amongst other variations), and it is obvious that we need some consistent forms for exchanging such metadata. There are already many metadata standards for different domains[7], and sometimes mappings are required between these standards. For the Semantic Web, the common standard to describe metadata is RDF, and we will now describe in more detail what RDF is and how it can be used[8].
4.3.1 Resource Description Framework (RDF)
The Resource Description Framework (RDF) is used to represent entities, referred to by their unique identifiers or URIs (Uniform Resource Identifiers), and binary relationships between those entities.
RDF consists of two parts: the RDF data model specification and a serialisation syntax (RDF/XML is often used, but it can take other forms including Notation 3 and Turtle). The data model definition is the core of the specification, and the syntax is necessary to transport RDF data in a network.
In RDF, two entities and a binary relationship between these entities are together called a statement, or a triple. Represented graphically, the source of the relationship is called the subject of the statement, the labelled arc itself is the predicate (also called the property) of that statement, and the destination of the relationship is called the object of that statement. The data model of RDF distinguishes between entities (also called resources), which have a URI identifier, and literals, which are just strings. The subject and the predicate of a statement are always resources,
[7] (URL last accessed 2009-07-17)
[8] Some of the aforementioned formats, such as Dublin Core, can also be expressed using RDF
  • 60. while the object can be a resource or a literal. In RDF diagrams, resources are typically drawn as ovals, and literals are drawn as boxes. An example of a statement is given in Figure 4.4: the resource is a subject and has a property and the value of the property (the object) is the resource.
Fig. 4.4. The simplest possible RDF graph: two nodes and an arc
The statement can be read as: the resource has a homepage, which is the resource. At first glance it might look strange that predicates are also resources and thus have a URI as a label. However, to avoid confusion it is necessary to give the predicate a unique identifier. Simply ‘hasHomepage’ would not be sufficient, because different vocabulary providers might define different versions of the predicate hasHomepage with possibly different meanings.
A set of statements forms a graph. Figure 4.5 shows an extension of Figure 4.4: the property with value John Breslin (a literal) has been added to the graph.
Fig. 4.5. An extension of the previous example
This is the ‘core’ of RDF. To allow for a more convenient data representation, additional vocabularies and conventions need to be introduced. For example, predicate URIs are often abbreviated by using the XML-namespace syntax. Instead of writing the full URI form of the predicate, the namespace form sw:hasHomepage is used with the assumption that the substitution of the namespace prefix ‘sw’ with ‘’ is defined. The namespace prefix ‘rdf’ is commonly used to refer to the specification explaining how metadata should be produced according to the RDF model and syntax (Lassila and
  • 61. Swick 1999). In this case, the ‘rdf’ prefix would be expanded to the URL of the RDF-specific vocabulary.
Blank nodes
Sometimes it is not convenient to provide an explicit URI for a resource in RDF, if the URI is not really visible to the outside world. In these cases ‘blank nodes’ (also called anonymous resources or bnodes) are used. In RDF, a blank node is a resource, or a node in an RDF graph, which is not identified by a URI. A blank node can be used as the subject or the object in an RDF triple. Figure 4.6 shows an example of a blank node.
Fig. 4.6. A metadata instance often may not have a full URI but rather a blank node identifier
4.3.2 The RDF syntax
In order to facilitate interchange of the data that is represented in RDF, a concrete serialisation syntax is needed. RDF/XML (Beckett 2004) is an obvious choice, but it is worth noting that the RDF data model is not tied to any particular syntax and can be expressed in any syntactic representation (or vice versa, extracted from other forms of data, e.g. Topic Maps (ISO/IEC 13250)). Furthermore, because of the XML serialisation, the RDF syntax definition is rather complicated. RDF APIs are used to shield developers from the details of any particular serialisation syntax, and can handle RDF data as graphs.
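To make the data model of Section 4.3.1 concrete, the following is a minimal Python sketch of a graph as a set of (subject, predicate, object) triples. It is an illustration only, not an RDF API: the class names, the `add_statement` helper and every `http://example.org/` URI are hypothetical choices made for this example. The sketch enforces the rules given in the text: subjects are resources or blank nodes, predicates are resources, and literals may appear only as objects.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Resource:
    """An entity identified by a URI."""
    uri: str


@dataclass(frozen=True)
class Literal:
    """A plain string value; only allowed in the object position."""
    value: str


@dataclass(frozen=True)
class BlankNode:
    """A resource without a URI, known only by a local label."""
    label: str


_labels = count(1)


def new_blank_node():
    """Create a blank node with a fresh local label (b1, b2, ...)."""
    return BlankNode(f"b{next(_labels)}")


def add_statement(graph, subject, predicate, obj):
    """Add one triple to the graph after checking the position rules."""
    if not isinstance(subject, (Resource, BlankNode)):
        raise TypeError("the subject must be a resource or a blank node")
    if not isinstance(predicate, Resource):
        raise TypeError("the predicate must be a resource")
    if not isinstance(obj, (Resource, BlankNode, Literal)):
        raise TypeError("the object must be a resource, blank node or literal")
    graph.add((subject, predicate, obj))


# Hypothetical URIs, used only for illustration.
EX = "http://example.org/"
graph = set()
project = Resource(EX + "project")
homepage = Resource(EX + "homepage")

add_statement(graph, project, Resource(EX + "vocab#hasHomepage"), homepage)
add_statement(graph, homepage, Resource(EX + "vocab#creator"), Literal("John Breslin"))

# A blank node stands for something we describe without naming it by URI.
maintainer = new_blank_node()
add_statement(graph, project, Resource(EX + "vocab#maintainer"), maintainer)
add_statement(graph, maintainer, Resource(EX + "vocab#name"), Literal("An unnamed maintainer"))
```

Storing the graph as a set mirrors the fact that an RDF graph is a set of statements: adding the same triple twice has no effect.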
  • 62. The RDF specification suggests two standard ways to serialise RDF data in XML: an abbreviated syntax and a standard syntax. Both serialisation possibilities use the XML namespace mechanisms (Bray et al. 1999) to abbreviate URIs as already described. An example is given below for the abbreviated syntax. The abbreviated syntax is very close to how one would intuitively model data in XML.
The following XML is the serialisation of the RDF graph given in Figure 4.5.
    <rdf:RDF xmlns=""
             xmlns:rdf=""
             xmlns:rdfs=""
             xmlns:dcterms="">
      <Project rdf:about="">
        <hasHomepage>
          <rdfs:Resource rdf:about="">
            <dcterms:creator>John Breslin</dcterms:creator>
          </rdfs:Resource>
        </hasHomepage>
      </Project>
    </rdf:RDF>
XML documents can carry RDF code, which is mapped into an RDF data-model instance. The start and end of the RDF code is indicated by the tags <rdf:RDF> and </rdf:RDF>.
Some other common serialisation syntaxes for RDF are N3 (Notation 3) (Berners-Lee 1998) and Turtle (Beckett and Berners-Lee 2008), the second one being a subset of the first one. The same RDF graph from Figure 4.5 is given below in Turtle[9]:
    @prefix : <> .
    @prefix rdf: <> .
    @prefix rdfs: <> .
    @prefix dcterms: <> .
    <> a :Project ;
       :hasHomepage <> .
    <> a rdfs:Resource ;
       dcterms:creator "John Breslin" .
[9] For future examples in this book, we will not list the namespaces and prefixes in our RDF code snippets. In this example, we also point out the shortcut “a”, which is equivalent to rdf:type
  • 63. The semicolon at the end of a line means that subsequent statements (until a period is encountered at the end of a line) will use the same subject as in the previous line. Therefore, in the above example, the semicolon indicates that the predicate ‘dcterms:creator’ and associated literal object ‘John Breslin’ apply to the subject ‘’.
Another approach for serialising RDF is through the annotation of XHTML documents with RDFa (Resource Description Framework in Attributes) (Adida et al. 2008)[10], which makes it possible to embed semantics in XHTML attributes in such a way that the data can be mapped to RDF and objects can be identified by URIs. This approach bridges the gap between the Semantic Web for humans and for machines since a single document with RDFa can contain information for both. This also prevents the repetition of information between an HTML document and an RDF/XML one. The same RDF graph from Figure 4.5 is given below in RDFa:
    <div xmlns:sw=""
         xmlns:rdf=""
         xmlns:rdfs=""
         xmlns:dcterms="">
      <p about="" typeof="sw:Project">
        The SIOC Project is hosted at
        <a rel="sw:hasHomepage" href="">http://</a>.
      </p>
      <p about="" typeof="rdfs:Resource">
        This page has been created by
        <span property="dcterms:creator">John Breslin</span>
      </p>
    </div>
4.4 Ontologies
Metadata elements are used to provide structure to the description of a resource, e.g. for a distance-learning course this could be title, description, keywords, author, educational level, version, location, language, date created, etc. The basic data model for RDF, introduced above, allows us to express metadata about a particular resource. For practical purposes it is necessary to define schema information for this metadata, since a common vocabulary (also called an ontology) needs to be agreed on in order to facilitate information and knowledge exchange.
[10] (URL last accessed 2009-06-09)
  • 64. Figure 4.7 shows a representation of how metadata about people, their social connections and their interests is produced according to some ontology specifications.
Fig. 4.7. Metadata and ontologies
As another example, if there is metadata about a soccer team, an underlying ontology will say that a soccer team always has a goalkeeper and always has a manager, so each metadata entry for a soccer team should have that information.
The term ‘ontology’ (from the Greek words on = being and logos = to reason) was originally coined in philosophy to denote the theory or study of being as such. The use of the term ontology in computer science has a more practical meaning than its use in philosophy. The study of metaphysics is not in the foreground in computer science, but rather what properties a machine must have to enable it to process data that is being questioned within a certain domain of discourse. Here ontology is used as the term for a certain artefact.
Tom Gruber’s widely-cited answer to the question ‘what is an ontology?’ is: ‘An ontology is a specification of a conceptualization.’ In (Gruber 1993), this statement is elaborated on:
    A body of formally represented knowledge is based on a conceptualization: the objects, concepts, and other entities that are assumed to exist in some area of interest and the relationships that hold among them (Genesereth and Nilsson 1987). A conceptualization is an abstract, simplified view of the world that we wish to represent for some purpose. Every knowledge base, knowledge-based system, or knowledge-level agent is committed to some conceptualization, explicitly or implicitly.
Specification in this context means an explicit representation by some syntactic means. Most approaches to ontology modelling agree on the following primitives for representation purposes:
  • 65.
    Firstly, there must be a distinction between classes and instances, where classes are interpreted as a set of instances. Classes may be partially ordered using the binary relationship ‘subClassOf’, which can be interpreted as a subset relationship between two classes. The fact that an object is an element of a certain class is usually denoted with a binary relationship such as ‘type’.
    Secondly, a set of properties (also called attributes or slots) is required. Slots are binary relationships defined by classes, which usually have a certain domain and a range. Slots might be used to check if a certain set of instances with slots is valid with respect to a certain ontology.
These are the modelling primitives of RDF Schema (RDFS), which we will introduce shortly. For example, classes can be thought of as general things or concepts in a domain and may be things like ‘Person’, ‘Document’, ‘Book’, or ‘WebPage’. There may also be relationships among these classes such as ‘Book’ and ‘WebPage’ have a ‘subClassOf’ relationship to ‘Document’, or a ‘Page’ is ‘containedIn’ a ‘Book’. Typical properties or attributes would be that a ‘Person’ has an ‘age’, or that a ‘WebPage’ has a ‘creationDate’.
Fig. 4.8. From taxonomy to ontology to knowledge base
While the words vocabulary and ontology are often used interchangeably, a more strict definition is that a vocabulary is a collection of terms being used in a particular domain, that can be structured (e.g. hierarchically) as a taxonomy and combined with some relationships, constraints and rules to form an ontology. Typical constraints would be ‘the cardinality must be at least 1’ or ‘the maximum value is 300’ and some sample rules or axioms would be that ‘cows are larger than dogs’ or ‘cats cannot eat only vegetables’. A combination of an ontology together with a set of instances of classes constitutes a knowledge base (Figure 4.8).
Implementing or creating ontologies consists of defining all the ontology components through an ontology-definition language. It is generally carried out in two
  • 66. stages: an informal stage, where the ontology is sketched out using either natural language descriptions or some diagram technique, and a formal stage, where the ontology is encoded in a formal knowledge-representation language that is machine computable (e.g. RDF Schema or OWL, the Web Ontology Language). Different tools (e.g. Protégé[11]) may also be used in the implementation of an ontology. However, even more important than implementing the ontology is documenting it. There is a need to produce clear informal and formal documentation so that the ontology can be understandable by others. An ontology that cannot be understood will not be reused.
4.4.1 RDF Schema
The purpose of the RDF Schema (RDFS) specification (Brickley and Guha 2004) is to define the primitives required to describe classes, instances and relationships. RDF Schema is an RDF application, in that it is defined in RDF itself. The defined vocabulary is very similar to the usual modelling primitives available in frame-based languages[12] (where entities in a domain are modelled as frames that have a set of associated slots or properties). In this section, the vocabulary used in the aforementioned examples is defined using RDF Schema. The namespace prefix ‘rdfs’ is used as an abbreviation for the RDFS namespace identifier.
Fig. 4.9. An ontology represented in RDF Schema
[11] (URL last accessed 2009-06-09)
[12] (URL last accessed 2009-07-14)
  • 67. Figure 4.9 depicts an RDF Schema-based ontology, defining the class sw:Project and two properties sw:hasHomepage and sw:hasMember. The class node is defined by typing the node with the resource rdfs:Class, which represents a meta-class in RDFS. sw:Project is also defined as a subclass of rdfs:Resource, which is the most general class in the class hierarchy defined by RDF Schema. The rdfs:subClassOf property is defined as transitive.
Properties (predicates) are defined by typing them with the resource rdf:Property, which is the class of all properties. Furthermore, the domain and range of a property can be restricted by using the properties rdfs:range and rdfs:domain to define value restrictions on properties. For example, the property sw:hasHomepage has the domain sw:Project and a range rdfs:Resource (which is compliant with the use of sw:hasHomepage in Figure 4.4). Using these definitions, RDF data can be tested for compliance with a particular RDF Schema specification.
The RDF Schema defines more modelling primitives:
    The property rdfs:label allows one to define a human-readable form of a name.
    The property rdfs:comment enables comments.
    The property rdfs:subPropertyOf indicates that a property is subsumed by another property. For example the fatherOf property is subsumed by the parentOf property, since every father is also a parent.
    The properties rdfs:seeAlso and rdfs:isDefinedBy are used to indicate related resources.
As a convention, an application can expect to find an RDF Schema document declaring the vocabulary associated with a particular namespace at the namespace URI. As another example, the following ontology snippet defines the fact that a research institute is a specific kind of organisation:
    <> rdf:type rdfs:Class ;
       rdfs:subClassOf <> .
Therefore, thanks to inference principles related to this property, one can identify all Organisation(s) even if they are defined as instances of ResearchInstitute. Moreover, subclasses and subproperties can be defined for classes and properties appearing in existing external ontologies, so that one can extend any ontology available on the Semantic Web for his or her own needs in a distributed way. This also requires some agreement between people, as we will discuss in Section 13.2.2.
As mentioned, RDF Schema allows us to define the domain and range for each property. The following code identifies that the property hasMember links an Organisation to a Person:
    <> a rdf:Property ;
       rdfs:domain <> ;
       rdfs:range <> .
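The inferences licensed by rdfs:subClassOf, rdfs:domain and rdfs:range can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a full RDFS reasoner: triples are plain string tuples, the prefixed names such as `:ResearchInstitute`, `:DERI` and `:John` are hypothetical stand-ins for full URIs, and only three entailment rules are applied (subclass transitivity, type propagation along subclasses, and typing via domain and range).

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def rdfs_closure(triples):
    """Apply three simple RDFS rules repeatedly until nothing new is inferred."""
    inferred = set(triples)
    while True:
        subclass = {(s, o) for s, p, o in inferred if p == SUBCLASS_OF}
        domains = {(s, o) for s, p, o in inferred if p == DOMAIN}
        ranges = {(s, o) for s, p, o in inferred if p == RANGE}
        new = set()

        # Rule 1: rdfs:subClassOf is transitive.
        for a, b in subclass:
            for c, d in subclass:
                if b == c:
                    new.add((a, SUBCLASS_OF, d))

        for s, p, o in inferred:
            # Rule 2: an instance of a class is an instance of its superclasses.
            if p == RDF_TYPE:
                for sub, sup in subclass:
                    if o == sub:
                        new.add((s, RDF_TYPE, sup))
            # Rule 3: using a property types its subject (domain) and object (range).
            for prop, cls in domains:
                if p == prop:
                    new.add((s, RDF_TYPE, cls))
            for prop, cls in ranges:
                if p == prop:
                    new.add((o, RDF_TYPE, cls))

        if new <= inferred:
            return inferred
        inferred |= new


# Hypothetical schema and data, written with prefixed names for brevity.
triples = {
    (":ResearchInstitute", SUBCLASS_OF, ":Organisation"),
    (":hasMember", DOMAIN, ":Organisation"),
    (":hasMember", RANGE, ":Person"),
    (":DERI", RDF_TYPE, ":ResearchInstitute"),
    (":DERI", ":hasMember", ":John"),
}

closure = rdfs_closure(triples)
```

After running the closure, `:DERI` is also typed as an `:Organisation` and `:John` as a `:Person`, although neither fact was stated explicitly in the input.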
  • 68. An interesting aspect of domains and ranges is that each instance linked to using a hasMember property does not have to be explicitly defined as an instance of a Person, but it rather becomes such an instance by inference as soon as this property exists. Hence, one statement might be enough to represent several facts about a single resource.
4.4.2 Web Ontology Language (OWL)
The expressiveness of RDF Schema is somewhat limited. For example, it cannot be used to define that a property is symmetric (e.g. isNeighbourOf) or transitive (e.g. locatedIn). In order to model such advanced axioms within ontologies, the W3C started a working group on OWL (Web Ontology Language) in 2001, based on the work performed within the DAML+OIL project, itself based on OIL from Europe and DAML-ONT from the USA. OWL became a W3C Recommendation in 2004 for defining ontologies[13], and goes beyond RDF Schema in terms of expressivity as its semantics are based on Description Logics. Since an introduction to Description Logics is beyond the scope of the book, we mostly rely on RDF Schema with some OWL extensions, most notably owl:sameAs.
The built-in OWL property owl:sameAs indicates that two URI references actually refer to the same thing: the individuals have the same ‘identity’. This is useful to indicate when two entities are actually identical (the same) even when they have different identifiers[14].
OWL extends the notion of classes and properties defined in RDF Schema, and it provides new axioms to define advanced characteristics and constraints regarding classes and properties.
OWL actually provides three sublanguages with different degrees of expressivity:
    OWL-Lite extends RDFS and provides new axioms such as symmetry and cardinality constraints (however, cardinality can be only 0 or 1 in OWL-Lite).
    OWL-DL (DL being inherited from Description Logics) adds new axioms (and provides these axioms in OWL) including union, intersection and disjointness between classes, as well as extended OWL-Lite cardinality constraints.
    OWL-Full does not add new axioms but interprets them differently and thus becomes more powerful (for example, a URI can represent at the same time a class and an instance).
[13] Here, we only refer to OWL 1 since OWL 2 is currently undergoing a standardisation process
[14] Identity on the Semantic Web is a complex issue, and this topic has been discussed during the 1st International Workshop on Identity and Reference on the Semantic Web (IRSW 2008), with more information available at
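Because owl:sameAs is reflexive, symmetric and transitive, a set of owl:sameAs statements partitions identifiers into groups that each denote one individual. The Python sketch below illustrates this with a small union-find structure. It is only an illustration of the idea, not how any particular OWL reasoner is implemented, and the function name and all URIs in the example are hypothetical.

```python
def same_as_groups(pairs):
    """Group identifiers that are linked, directly or indirectly, by owl:sameAs."""
    parent = {}

    def find(x):
        # Follow parent links to the group representative, shortening the path.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two groups

    groups = {}
    for identifier in list(parent):
        groups.setdefault(find(identifier), set()).add(identifier)
    return list(groups.values())


# Hypothetical owl:sameAs statements about two people.
statements = [
    ("http://example.org/people/john", "http://example.net/users/jb"),
    ("http://example.net/users/jb", "http://example.com/id/42"),
    ("http://example.org/people/alex", "http://example.net/users/ap"),
]

groups = same_as_groups(statements)
```

Here the three identifiers for the first person end up in one group even though the first and third were never directly linked, which is the transitive part of owl:sameAs at work.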
  • 69. An important thing to keep in mind regarding these languages and the Semantic Web in general is that they refer to what is termed an ‘open-world assumption’. Therefore, if a fact is not defined, nothing can be assumed about it. For example, if no triples mention that ‘:John :worksWith :Alex’ and if someone asks ‘is John working with Alex’, the answer will not be ‘no’ but rather ‘there is no answer’ as there are not enough facts to answer that query.
4.5 SPARQL
As we have seen, RDF(S) and OWL are useful languages for representing ontologies and metadata on the Semantic Web. However, once this metadata has been published, query languages are required to make full use of it. SPARQL (SPARQL Protocol And RDF Query Language) aims to satisfy this goal and provides, as the name says, both a query language and a protocol for RDF data on the Semantic Web. SPARQL can be thought of as the SQL of the Semantic Web, and offers a powerful means to query RDF triples and graphs. As Tim Berners-Lee said[15]:
    Trying to use the Semantic Web without SPARQL is like trying to use a relational database without SQL. SPARQL makes it possible to query information from databases and other diverse sources in the wild, across the Web.
As RDF data is represented as a graph, SPARQL is therefore a graph-querying language, which means that the approach is different from SQL, where people deal with tables and rows. Moreover, it provides extensibility within the query patterns (based on the RDF graph model itself) and therefore advanced querying capabilities based on this graph representation, such as ‘find every person who knows someone interested in Semantic Web technologies’.
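The idea of querying a graph by pattern, rather than by table and row, can be illustrated with a small Python sketch. It is a toy matcher written for this example, not a SPARQL engine: triples are string tuples, terms beginning with `?` are treated as variables, and the `match` function, the prefixed names and the sample data are all hypothetical. The pattern expresses the question ‘find every person who knows someone interested in Semantic Web technologies’.

```python
def match(patterns, triples, binding=None):
    """Yield variable bindings that satisfy every triple pattern in order."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        matched = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # A variable must keep the same value everywhere it appears.
                if candidate.setdefault(term, value) != value:
                    matched = False
                    break
            elif term != value:
                matched = False
                break
        if matched:
            yield from match(rest, triples, candidate)


# Hypothetical data, written with prefixed names for brevity.
triples = [
    (":alice", "foaf:name", "Alice"),
    (":alice", "foaf:knows", ":bob"),
    (":bob", "foaf:topic_interest", "dbpedia:Semantic_Web"),
    (":carol", "foaf:knows", ":dave"),
    (":dave", "foaf:topic_interest", "dbpedia:Rugby"),
]

# Two patterns joined on the shared variable ?friend.
patterns = [
    ("?who", "foaf:knows", "?friend"),
    ("?friend", "foaf:topic_interest", "dbpedia:Semantic_Web"),
]

results = sorted({b["?who"] for b in match(patterns, triples)})
```

The shared variable `?friend` is what turns two independent patterns into a path through the graph; a real engine would use indexes rather than scanning every triple for every pattern.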
SPARQL can be used to query independent RDF files as well as sets of RDF files, either loaded in memory by the SPARQL query engine or through the use of a SPARQL-compliant triple store (a triple store is a storage system for RDF data). Therefore, there is currently a need to know which files must be queried before running a query, which can be an issue in some cases and can be considered as a hurdle to overcome.
However, approaches such as voiD, the Vocabulary of Interlinked Datasets[16] (Alexander et al. 2009), can be used in addition to distributed SPARQL query engines in order to dynamically identify which RDF sources should be considered when querying information.
SPARQL offers four query forms that can be used to run different types of queries:
[15] (last accessed on 2009-06-09)
[16] (URL last accessed 2009-07-16)
    SELECT, used to retrieve information based on a particular pattern,
    CONSTRUCT, used to create an RDF graph based on RDF input and that can be used as a translation service for RDF data (between different ontologies),
    ASK, used to identify if a particular query pattern can be matched on the queried RDF graph, and
    DESCRIBE, used to identify all triples related to the particular object that must be described.

For example, the following SPARQL query represents the question we posed earlier (identifying people that know someone interested in the Semantic Web), using the foaf:knows relationship to identify a relationship between two people, the foaf:name property to link a person to his or her name and the foaf:topic_interest property to model an interest of some person. In this query, we identify the notion of ‘Semantic Web technologies’ using the DBpedia URI for the Semantic Web. As we mentioned earlier, to get results for the following query, one must apply it to a set of RDF files or a triple store that contains relevant information. (We have omitted the vocabulary prefix definitions in these examples.)

    SELECT DISTINCT ?who ?name {
      ?who foaf:name ?name ;
           foaf:knows [ foaf:topic_interest <> ]
    }

Moreover, SPARQL provides different solution modifiers (such as ORDER BY or LIMIT) to organise the various results. The following query therefore extends the previous one by ordering the people by name, and limiting the output to only two results.

    SELECT DISTINCT ?who ?name {
      ?who foaf:name ?name ;
           foaf:knows [ foaf:topic_interest <> ]
    } ORDER BY asc(?name) LIMIT 2

While SPARQL is obviously a key component of the Semantic Web, it has some limits. At the time of writing, SPARQL does not provide any aggregate function, hence implying a need to use external languages (such as a Python or PHP script) to run aggregations, which can make the adoption of RDF technologies complicated in some cases. Various SPARQL engines have implemented this
functionality, however: for example, OpenLink Virtuoso (a hybrid triple store and RDBMS-based middleware platform) and ARC2 (an RDF solution for PHP and MySQL-based applications). Other relevant but specific extensions, such as path-based or imprecise queries, have been provided by other engines. Furthermore, SPARQL is a read-only language, in that it does not allow one to add or modify RDF statements. The SPARQL Update W3C Member Submission (Seaborne et al. 2008) provides some ways to update, add and delete RDF triples. To overcome these and other issues, the W3C SPARQL Working Group is currently working on an update of SPARQL, taking into account a variety of desired features including updates, aggregates and negation.

Also, as one may notice, there are some relationships between SPARQL and XQuery, and between the RDF and XML worlds in general. XSPARQL (Akhtar et al. 2008) aims to provide a way to bridge the gap between those two worlds by extending SPARQL and XQuery, thereby offering a way to query both XML and RDF data using the same query language.

Finally, as we mentioned, SPARQL is both a query language and a protocol. By providing HTTP bindings for it, as well as normalised serialisation of the results (in XML or JSON), it can be efficiently used to provide open access to RDF databases. In this book, we will therefore present various applications that offer a SPARQL endpoint to their users, i.e. a way to run HTTP-based queries on RDF stores publicly available on the Web, thereby delivering open and structured data to customers. An example of such a SPARQL endpoint is the one provided by DBpedia.

4.6 The ‘lowercase’ semantic web, including microformats

Microformats allow specific pieces of structured information to be embedded within the HTML markup code that makes up web pages. This information can then be discovered and reused by various applications.
Microformats have been successful in bringing semantic metadata to the current Web through a vibrant developer community centred around a wiki-based website and a set of mailing lists. Through this community, several microformats have been created and are currently in widespread use by large companies such as Yahoo! and Automattic, in particular on social websites.

Fig. 4.10. The microformats logo

The range of available microformats includes hCard, XFN, hCalendar, hReview, rel-tag, etc. The hCard microformat can be used to describe information about a person such as their name and contact details (e.g. on social networking sites). The hReview microformat is used for describing information about reviews, and the hAtom microformat allows one to express information about content items available for syndication, such as blog posts and comments (derived from the Atom syndication format).

Microformats have adopted an approach to ‘solve problems’ for particular scenarios, rather than providing arbitrary Semantic Web data structures that can be used for any purpose. However, despite various arguments, there is no reason that both the Semantic Web and microformats communities cannot work together. Both communities are trying to add semantics to the Web, and using mechanisms like GRDDL (Gleaning Resource Descriptions from Dialects of Languages) and poshRDF, the existing work on both sides can be combined and reused.

As with the Semantic Web, microformats have many applications beyond social websites. In the enterprise, typical usage scenarios relate to saving companies time in keeping third parties (such as customers or price comparison sites) updated. An example is the use of microformats to power systems which show a customer’s loan with the interest calculated daily on the outstanding amount based on an interest rate (taken from a microformat-enabled site) and the fixed amount. There are also discussions on how microformats can be used to represent financial data in documents ranging from online statements to e-commerce receipts, e.g. debit or credit figures, a total of any kind, or an interest figure; on how currencies should be represented; and on the use of hCalendar for investor relations event entries.
Finally, microformats can be added to Excel spreadsheets as a means to embed some ‘reusable, stable semantics’.
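As an illustration of how an application might discover hCard data embedded in a page, the sketch below uses Python's standard `html.parser` module to collect the text of elements whose `class` attribute names an hCard property. It is a deliberately simplified sketch: the `HCardParser` class, the chosen subset of properties and the sample markup are assumptions made for this example, and it ignores the many special cases a complete microformats parser has to handle (void elements, nested cards, attribute-based values and so on).

```python
from html.parser import HTMLParser

# hCard property class names handled by this sketch (a small subset only).
HCARD_PROPS = {"fn", "org", "email", "tel", "url"}


class HCardParser(HTMLParser):
    """Collect the text of elements whose class names an hCard property."""

    def __init__(self):
        super().__init__()
        self.cards = []    # one dict per element carrying the class 'vcard'
        self._stack = []   # for each open element: the hCard properties it carries

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "vcard" in classes:
            self.cards.append({})
        self._stack.append([c for c in classes if c in HCARD_PROPS])

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or not self.cards:
            return
        # Attach the text to every hCard property of the enclosing elements.
        for props in self._stack:
            for prop in props:
                self.cards[-1][prop] = self.cards[-1].get(prop, "") + text


# Hypothetical page fragment marked up with hCard class names.
SAMPLE = """
<div class="vcard">
  <span class="fn">John Example</span>,
  <span class="org">Example University</span>,
  <a class="email" href="mailto:john@example.org">john@example.org</a>
</div>
"""

parser = HCardParser()
parser.feed(SAMPLE)
print(parser.cards)
```

For the sample fragment this yields a single dictionary with the `fn`, `org` and `email` values, which an application could then store or convert into another representation.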

There are some limitations with microformats in terms of representing the relationships between individual fragments of data, which limits the ability to properly describe the linked, Web nature of data (e.g. hAtom is sometimes used to represent blog comments, but it does not have a property to indicate which blog post the comment is a reply to). Parsing of microformats can sometimes be difficult as a significant number of exceptions and special cases have to be taken into account. References to objects (such as people, content items, etc.) can also be ambiguous. In addition, microformats cannot be extended as easily as RDF vocabularies, which are more flexible in terms of reusability and integration for different needs.

A generic approach for storing the information contained within microformats is needed if we are to store and query information about all different kinds of Social Web objects in a uniform way. One option is to store microformats in their native HTML format, but these would be difficult to process and query. Alternatively, domain-specific data stores and applications could be used for each particular kind of microformat object, but they may lack flexibility and limit the ability to perform universal search queries over links between different object types. The third option is to use RDF, which has advantages over the first two options as it is more generic and allows one to store and process information about various types of resources and the relations between them. GRDDL can serve as a means of moving from microformats to RDF, bridging the gap between the semantic web and the Semantic Web.

4.7 Semantic search

Machine-readable Semantic Web data is now being crawled by Semantic Web search engines like Sindice, SWSE or Swoogle. These search engines can usually match keywords in any data that has been crawled or integrated into a semantic store.
This data could be structured information about people, places, dates, library documents, blog items or topics. In fact, there is no limit to the types of things that can be indexed and searched, since RDF (an open data model that can be adapted to describe pretty much anything) is used as the data format. Anyone can reuse existing RDF vocabularies like SIOC to publish data; they can publish data using their own custom vocabularies (e.g. to describe stamp collecting or Bollywood movie genres); or they can combine public and custom vocabularies (e.g. take FOAF and one’s own vocabulary about soccer to describe players and managers on a soccer team).

Sindice (Tummarello et al. 2007) can be thought of as a big semantic index of the Web. It allows you to find pointers to relevant pages or URIs where particular
keywords are mentioned, where certain property values are used (e.g. pages where a person states their e-mail address), or where certain facts or semantic triples appear. Sindice gives you pointers to where stuff is, whereas many other engines give you the stuff as well (without you having to go to the source page). Sindice also has an API that can provide results in a reusable (semantic) format that can be leveraged by other applications.

Alternatively, SWSE (Semantic Web Search Engine) shows you semantic information about the object of interest (e.g. a person’s phone number, their friends, etc.) which may be derived from multiple sources (i.e. the information on an object comes from tens of sources consolidated together via unique identifiers for that object or through what is called ‘object consolidation’). Both SWSE and Swoogle allow query capabilities over the collections of all Semantic Web statements, so if you search for Galway, they can show you the relevant statements as well as pointing you to the pages they were obtained from.

Geotemporal information is particularly useful for searching across a range of domains, and provides nice semantic linkages between things. For example, having geographic information and time information is useful for describing where people have been and when, for detailing historical events or TV shows, for timetabling and scheduling of events, etc., and for connecting all of these things together (‘I’m travelling to Edinburgh next week: show me all the TV shows of relevance and any upcoming events I should be aware of according to my interests…’).

A social search engine that makes use of semantic information is Tusavvy, allowing users to search ‘community knowledge without navigating the entire web’. Tusavvy reveals ‘not easily linked-to pages’ that are often buried in conventional search results.
It was built by aligning human factors with search: using socially-annotated web data, leveraging a lexicon built via semantically-related tags, and utilising rankings selected through a user’s accumulated interests.

4.8 Linking Open Data

In spite of various standardisation efforts during the last few years regarding languages to model and query data on the Semantic Web, a critical mass of RDF data did not exist on the Web until recently. While native exports of FOAF data from some social websites (e.g. LiveJournal) gave a glimpse of mainstream adoption of the Semantic Web, the provision of RDF data was still a domain restricted to a few early adopters. In parallel, a large amount of rich (semi-structured) data became publicly available on the (Social) Web, for example, using Creative Commons licences or GNU FDL (the GNU Free Documentation Licence) as in the
Wikipedia. Based on these observations, the Linking Open Data (LOD) community effort started in mid-2007, supported by the W3C Semantic Web Education and Outreach group. The aim of this initiative is to expose the data already publicly available on the Web (in non-RDF forms) using RDF and to interlink it so as to emphasise the value of the Giant Global Graph.

In order to achieve this pragmatic vision of the Semantic Web (pragmatic in that it is more focused on exposing large data sets in RDF rather than performing advanced reasoning), the project is based on the four tenets of Linked Data, as defined by Berners-Lee (2006):

1. Use URIs as the names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information.
4. Include links to other URIs so that they can discover more things.

Fig. 4.11. The Linking Open Data dataset cloud from March 2009

Thanks to this effort, lots of RDF data is now available on the Web and can be used in various applications, from advanced data visualisations and querying systems to complex mashups. More importantly, this data is all linked together,
which means that one can easily navigate from one information source to another thanks to Semantic Web browsers such as the Tabulator (thereby breaking through the barriers that exist between traditional websites). Various strategies can be used to provide those links, from manual interlinking (Hausenblas et al. 2008) to advanced heuristics for addressing ambiguity and heterogeneity problems (Raimond et al. 2008). As Figure 4.11 shows, the nature of currently-available data sets is quite varied. For example, DBpedia provides a complete Wikipedia export in RDF and hence acts as the nucleus for this ‘Web of Data’ (a re-branding of the Semantic Web), while GeoNames provides RDF information about millions of geographic entities. Social data can also benefit from this initiative. For instance, the Flickr profile exporter interlinks user information with GeoNames entities (we will describe this later on).

Another outcome from the Linking Open Data project is that thanks to the billions of triples and links available on the Web, making this data available in RDF has bootstrapped the Semantic Web in conjunction with projects like SIOC (more later). Companies are now gaining interest in the Semantic Web, some of them even becoming part of this Web of Data effort. Zemanta recently released an API that provides named-entity extraction for textual data based on entities from the LOD cloud, and Freebase’s RDF data exports are now linked to DBpedia URIs. Moreover, we believe that this huge amount of data can raise interesting research challenges, such as distributed querying and reasoning on large-scale distributed data, as well as the trustworthiness of information sources on the Web.

4.9 Semantic mashups

Although it is distributed and sometimes completely disconnected, data from the Semantic Web is represented using a common language, i.e. RDF.
Therefore, it enables the production of semantic mashups in an easy way by allowing the combination of RDF data from different data sources. For example, geolocation data from GeoNames and information about personalities from DBpedia could be used for celebrity geolocation mashups. Social data can also be taken into account. For example, the FOAFMap application (Passant 2006) provides one of the first geolocation mashups for social semantic data, displaying a complete social network on a Google Map from a single entry point, i.e. the starter person’s FOAF file, with further people queried on-the-fly. It is not just on the Social Web that these mashups are useful. In an organisational context, semantic mashups can make it easier for employees of an organisation to get at relevant data, to integrate it and to share it.
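The reason such mashups are easy to produce is that combining RDF sources amounts to merging sets of triples that share identifiers. The following sketch illustrates this with two invented sources held as Python sets; the URIs, property names, coordinates and the `geolocate_people` helper are all hypothetical, and the sketch ignores details such as blank nodes that a real RDF library would handle.

```python
# Two hypothetical RDF sources, reduced to sets of (subject, predicate, object).
# The identifiers, property names and values are invented for illustration.
PEOPLE_SOURCE = {
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "ex:birthPlace", "place:galway"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:bob", "ex:birthPlace", "place:paris"),
}
PLACES_SOURCE = {
    ("place:galway", "geo:lat", 53.27),
    ("place:galway", "geo:long", -9.05),
    ("place:paris", "geo:lat", 48.86),
    ("place:paris", "geo:long", 2.35),
}


def value(graph, subject, predicate):
    """Return one object for (subject, predicate), or None if nothing is stated."""
    for s, p, o in graph:
        if s == subject and p == predicate:
            return o
    return None


def geolocate_people(*sources):
    """Merge the sources into one graph and attach coordinates to each person."""
    graph = set().union(*sources)  # merging graphs: a set union of their triples
    markers = {}
    for s, p, o in graph:
        if p == "ex:birthPlace":
            # The place identifier is the shared key that links the two sources.
            markers[value(graph, s, "foaf:name")] = (
                value(graph, o, "geo:lat"),
                value(graph, o, "geo:long"),
            )
    return markers


markers = geolocate_people(PEOPLE_SOURCE, PLACES_SOURCE)
print(sorted(markers.items()))
# [('Alice', (53.27, -9.05)), ('Bob', (48.86, 2.35))]
```

Neither source had to know about the other: because both use the same identifiers for places, the merged graph can be traversed from a person to a location without any mapping code.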

With the Semantic Web, it is possible to reduce the costs for people who are interested in mixing together or mashing up data from many different sources while hiding much of the complexity that makes it happen. Most of the component pieces for these mashups exist: the parts just have to be combined together. According to Eric Miller from Zepheira, the main actions required by such mashup components are: create, publish, and analyse. Open-source tools that can be used include Remix (create), Exhibit (publish) and Studio (analyse) from MIT’s SIMILE project (Semantic Interoperability of Metadata and Information in unLike Environments), and DERI Pipes from NUI Galway.

Exhibit is a software service for rendering data. Data is fed into the system and a faceted-navigation interface is returned, without the need for a database or a business logic tier. Exhibit can style the data in different ways, and the data can then be viewed through different ‘lenses’. Remix is a tool that can provide semantically-mashed data for Exhibit. It combines visual interfaces, data transformation interfaces and data storage components. Remix also leverages persistent identifiers (for people, places, concepts, network objects, etc.).

For example, Remix can be used to ‘stitch’ together two related spreadsheets from different sources (organisations, groups, people, etc.). Fields can be mapped from one spreadsheet to the other and then you can see if it makes sense from a data perspective. Remix has tools for ‘simultaneous editing’ which allow editing over patterns of data, so by editing one entry you can edit all of them. This acts like a script which can change ‘last name, first name’ to ‘first name last name’ without any complicated programming. During each step, every piece of data has an identifier and therefore becomes a web resource in a framework that enables people to mash data together as part of a resource-oriented architecture.
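The effect of such a ‘simultaneous edit’ can be imitated with a single rule applied to every entry in a column. The sketch below is only an analogy written in Python, not a description of how Remix works internally; the regular expression and the `first_last` helper are assumptions made for illustration.

```python
import re

# 'Last, First' -> 'First Last'; entries that do not match are left untouched.
LAST_FIRST = re.compile(r"^\s*([^,]+?)\s*,\s*(.+?)\s*$")


def first_last(name):
    """Rewrite one 'last name, first name' entry as 'first name last name'."""
    match = LAST_FIRST.match(name)
    return f"{match.group(2)} {match.group(1)}" if match else name


column = ["Breslin, John", "Passant, Alexandre", "Stefan Decker"]
print([first_last(n) for n in column])
# ['John Breslin', 'Alexandre Passant', 'Stefan Decker']
```

The point of a tool like Remix is that the user demonstrates the change on one entry and never has to write such a rule by hand.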

You can connect any fields together, but this may not necessarily make sense, so there is a need for interfaces to show users whether it does or not. In Exhibit, you can then take the stitched-together data and create an interface to it by customising facets and views, applying different themes, etc. This combination of tools enables a non-technical expert to not just produce a user interface to interact with semantic mashups of data, but to publish the information on the Web so that other people can benefit from it.

As a final component in a semantic mashup, Studio can be used to analyse data, e.g. as reports with pattern analyses, which is particularly useful for organisational data. Because it is based on RDF and SPARQL, queries can be created that are relevant to a particular organisation, e.g. ‘show me the most popular or least popular reports’, or ‘show me any reports that used some of my data’. This can bring organisations into a ‘Linked Enterprise Data’ (LED) framework, a parallel idea to the Linking Open Data initiative described earlier. Miller says that LED is all
about exposing and linking enterprise data, while showing that there are benefits in terms of solutions that can be made available immediately.

From NUI Galway, the DERI Pipes application allows a variety of input data types (RDF, XML, JSON, microformats, etc.) to be ‘mashed up’ through a graphical interface (inspired by Yahoo! Pipes) or by using a command line tool. Pipes are basically simple commands (taking an input and transforming it) that can be combined together to create a certain desired output. Since it is also available as open source, DERI Pipes can be easily extended or customised and applied in use cases where a local deployment is required. The DERI Pipes GUI allows pipes to be graphically edited, debugged and invoked. The execution engine is also available as a standalone JAR file which is ideal for embedded use.

4.10 Addressing the Semantic Web ‘chicken-and-egg’ problem

The challenge for the Semantic Web is related to the chicken-and-egg problem: it is difficult to produce data without interesting applications, and vice versa. The Semantic Web cannot work all by itself, because if it did it would be called the ‘Magic Web’. For example, it is not very likely that you will be able to sell your car just by putting your Semantic Web file on the Web. Society-scale applications are required, i.e.
consumers and processors of Semantic Web data, Semantic Web agents or services, and more advanced collaborative applications that make real use of shared data and annotations.

The Web | The Social Web | The Social Semantic Web
Personal Websites | Blogs | Semantic Blogs: semiBlog, Haystack, Structured Blogging, Zemanta
Content Management Systems, Britannica Online | Wikis, Wikipedia | Semantic Wikis: Semantic MediaWiki, SemperWiki, Platypus, DBpedia, Rhizome
AltaVista, Google | Google Personalised, Searchles | Semantic Search: SWSE, Swoogle, Intellidimension, Powerset, Hakia
CiteSeer, Project Gutenberg | Google Scholar, Book Search | Semantic Digital Libraries: JeromeDL, BRICKS, Longwell
Message Boards | Community Portals | Semantic Forums and Community Portals: SIOC, OpenLink Data Spaces, Talis Engage
Buddy Lists, Address Books | Online Social Networks | Semantic Social Networks: FOAF, PeopleAggregator, Social Graph API

Table 4.1. From Web ‘1.0’ to the Social Semantic Web

The Semantic Web effort is mainly towards producing standards and recommendations that will interlink applications, and the primary Social Web meme as
already discussed is about providing user applications. These are not mutually exclusive: with a little effort, many Social Web applications can and do use Semantic Web technologies to great benefit. Table 4.1 shows some evolving areas where these two streams have come and will come together: semantic blogging, semantic wikis, semantic social networks (Mika 2007) and in parallel with these the Semantic Desktop. These all fall in the realm of what Nova Spivack (CEO of Semantic Web company Radar Networks) has termed the ‘Metaweb’, or Social Semantic Information Spaces. Semantic MediaWiki, for example, has already been commercially adopted by Centiare (now MyWikiBiz).

There are also great opportunities for mashing together both Social Web data or applications and Semantic Web technologies, which just require the use of some imagination. Dermod Moore wrote of one such Social Web application mashup for a hobby project: a Scuttle + Gregarius + Feedburner + Grazr hybrid (these are, respectively, a web-based social bookmarking application, a web-based feed aggregator system, an RSS feed management service, and a provider of web-based aggregation widgets). This mashup allows one to aggregate one’s favourite blogs or other content on a particular topic and then to annotate bookmarks to the most interesting content found. Bringing this a step further, we could have a ‘semantic social collaborative resource aggregator’.
In this hypothetical system:

    Social network members specify their favourite content sources,
    You and your friends specify any topics of interest,
    You specify friends whose topic lists you value,
    A metadata aggregator collects content from sites you and friends like (which may be human tagged, or could be automatically tagged),
    It highlights content that may be of interest to you or your friends,
    If nothing of interest is currently available, content sources may have semantically-related sources in other communities for secondary content acquisition and highlighting,
    You bookmark and tag the interesting content, and share!

In fact, the recent Twine application from Radar Networks (which will be discussed later) offers much of this functionality.

There have been many announcements relating to commercial Semantic Web applications recently, with much attention being given to the startup companies in this space: Powerset (acquired by Microsoft), Metaweb (creators of Freebase) and Radar Networks (Twine), and also to the big companies who have announced what they are doing with semantic data: Reuters (Calais API), Yahoo! (semantically-enhanced search) and Google (Social Graph API and Rich Snippets).

According to Marta Strickland on the Three Minds blog, the Bintro social networking service (short for ‘business introduction’) uses semantic technologies to match profiles together for business opportunities. Unlike LinkedIn, it is less based on who you know and more based on what you know. Another related service, from BanyanLink, allows college students to collaborate with career centres, and uses semantic-matching technologies to connect students to internship opportunities.

In the next few chapters, we will discuss some of the most popular Social Web application areas, and describe how each of these can be enhanced with semantics to not only provide more functionality but also to create an overall interconnected set of social information spaces.

5 Discussions

Discussions on different topics can take a variety of forms online, from bulletin boards to blogs to mailing lists. With the move from text content to multimedia content in the Social Web, discussions are now being attached to content items as well (e.g. lists of comments appear for many videos in YouTube or photos in Flickr). More recently, the phenomenon of microblogging has taken root, where people create short text entries about what they are doing that are normally limited to 140 characters, and people can reply within their own activity streams to messages posted by others using a simple reply syntax. However, all of these discussion methods share a common format: someone begins a conversation (either with a text post or a multimedia item), and others weigh in with their views on the topic or item under discussion.

5.1 The world of boards, blogs and now microblogs

Although it is difficult to calculate the size of discussion spaces like the ‘blogosphere’ (the world of blogs) or the ‘boardscape’ (the world of message boards), we can make some estimates based on the State of the Blogosphere (now called the State of the Live Web) from Technorati’s Dave Sifry and also based on some statistics from BoardTracker (Breslin et al. 2007a). According to Dave Sifry, there are some 230 million posts that use tags. This accounts for 35% of all posts, which leads one to a figure of 657 million posts on the blogosphere (or at least those that Technorati tracks). However, this does not include comments.

BoardTracker, the largest message board search engine, estimates that there are over 6 billion discussions and about 100 billion posts on message boards. A discussion ‘thread’ has on average 16 replies. Accounting for a similar comment ratio on blogs, this could bring the number of blog discussion entries (starter posts and comments) to about 10 billion posts and comments.
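The arithmetic behind these estimates can be reproduced in a few lines. The sketch below uses only the figures quoted above and assumes one plausible reading of the comment ratio, namely that each blog post attracts 16 comments in the same way that a board thread attracts 16 replies; the variable names are ours.

```python
tagged_posts = 230e6       # blog posts that use tags (Sifry)
tagged_share = 0.35        # fraction of all blog posts that use tags
replies_per_thread = 16    # BoardTracker average, assumed to carry over to blogs
board_posts = 100e9        # BoardTracker estimate of message board posts

blog_posts = tagged_posts / tagged_share              # all blog posts
blog_entries = blog_posts * (1 + replies_per_thread)  # starter posts plus comments

print(round(blog_posts / 1e6))            # 657  (million blog posts)
print(round(blog_entries / 1e9, 1))       # 11.2 (billion posts and comments)
print(round(board_posts / blog_entries))  # 9    (boards relative to blogs)
```

Under this reading the total comes to a little over 11 billion entries, which is consistent with the order-of-magnitude figure of about 10 billion given in the text.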
Based on this, the boardscape is roughly 10 times bigger than the blogosphere, but of course message boards have been around for longer than blogs.

J.G. Breslin et al., The Social Semantic Web, DOI 10.1007/978-3-642-01172-6_5, © Springer-Verlag Berlin Heidelberg 2009

Where blogging allowed people to send their thoughts online to an open audience, audioblogging or podcasting allowed people to record them, and videoblogging or ‘vlogging’ allowed them to deliver their messages via video. Now, microblogging enables anyone to exchange short text messages within their community or simply to write in brief to the general public. Twitter, the world’s largest microblogging
site, celebrated its one billionth ‘tweet’ (microblog post) in November 2008, and is nearing three billion posts as of July 2009. The total number of microblog posts in the ‘tweetosphere’ could be double that when the contributions of other microblogging sites like Jaiku and Pownce (now closed) are taken into account.

We will now give an overview of some popular discussion methods and detail how semantic technologies have been brought to bear on applications in these areas. While discussion systems also encompass Q&A sites, instant messaging and other services, we will not cover them here. However, we note that the methods we now describe can easily be deployed on other types of discussion systems, and we refer to the SIOC Types module (to be detailed later) that defines the required terms for such systems.

5.2 Blogging

A blog, or weblog, is a user-created website consisting of journal-style entries displayed in reverse-chronological order. Entries may contain text, links to other websites, and images or other media. Often there is a facility for readers to leave comments on individual entries, which makes blogs an interactive medium. Thanks to the use of trackbacks, bloggers can also state that a blog post is a reply to another one, introducing a distributed system for conversations in the blogosphere. Bloggers often link to their favourite blogs or friends through ‘blogroll’ links on the sidebar (forming social networks of bloggers, as shown in Figure 5.1). Also, the latest headlines, with hyperlinks and summaries, are syndicated using RSS or Atom formats (e.g. for reading one’s favourite blogs with a feed reader) as described earlier.

Blogs may be written by individuals or by groups of contributors (ranging from the arms of political campaigns to corporations, e.g. the Google Blog). A blog may function as a personal journal, or it may provide news or opinions on a particular subject.
They are also starting to cross the generation gap: teenagers might have a blog via a social networking service, their parents may blog themselves and even their grandparents can be posting, reading or commenting on blogs! Compared to the web trend of the 90s where people set up homepages on hosting services like Geocities, blogs are now one of the most popular methods by which people can acquire and maintain an online presence. However, because of the blogging trend towards spontaneous and more regular updates, as well as the reverse chronological ordering of posts on blogs, these online presences have a completely different dynamic.

Fig. 5.1. A graph of a community of bloggers connected to each other via blogroll links (the mid-grey names are those linked to a single highlighted blog shown in dark-grey, ‘Holy Shmoly’)

5.2.1 The growth of blogs

The growth and take-up of blogs over the past four years has been dramatic, with a doubling in the size of the blogosphere every six or so months (according to statistics from Technorati). Over 120,000 blogs are created every day, working out at more than one a second. Nearly a million blog posts are being made each day, with over half of bloggers contributing to their sites three months after the blog’s creation. Technorati counted 70 million blogs at the beginning of 2007 and now estimates that there are 133 million of them available.

The nature of blogs is quite varied, from teenagers’ weblogs to blogs by technology experts or blogs of political opinion. Many political or opinion-type blogs are considered to
  • 84. be a form of ‘grassroots journalism’ (Gilmor 2004), and they often have audiences larger than, or at least comparable to, those of the websites of mainstream media, as observed in some studies by Technorati⁷. Many bloggers also use Google AdSense advertising on their blogs to get extra revenue (therefore search engine optimisation for blogs becomes important for getting visitors). It is also interesting to study the temporality of information flow between the blogosphere and traditional media services (Cointet et al. 2007). As we will see when we introduce microblogging, bloggers are often at the forefront of information, where traditional media cannot act as fast as the online ‘wisdom of crowds’.

Similar to accidentally wandering onto message boards and web-enabled mailing lists, when searching for something on the Web, one often happens across a relevant entry on someone’s blog. RSS feeds are also a useful way of accessing information from your favourite blogs, but they are usually limited to the last 15 or 20 entries, and do not provide much information on exactly who wrote or commented on a particular post, or what the post is talking about. Some approaches like SIOC (Semantically-Interlinked Online Communities, more later) aim to enhance the semantic metadata provided about blogs, comments and posts, but there is also a need for more information about what exactly a person is writing about. Blog entries often refer to resources on the Web and these resources will usually have a context in which they are being used, and in terms of which they could be described. For example, a post which critiques a particular resource could incorporate a rating, or a post announcing an event could include start and end times.
When searching for particular information in or across blogs, it is often not thateasy to get it because of ‘splogs’ (spam blogs) and also because of the fact that thevirtue of blogs so far has been their simplicity - apart from the subject field, every-thing and anything is stored in one big text field for content. Keyword searchesmay give some relevant results, but useful questions such as ‘find me all the Chi-nese restaurants that bloggers reviewed in Dublin with a rating of at least 5 out of10’ cannot be posed, and you cannot easily drag-and-drop events or people or any-thing (apart from Uniform Resource Locators - URLs) mentioned in blog postsinto your own applications. Blog posts are sometimes categorised (e.g. ‘Scotland’, ‘Movies’) by the postcreator using pre-defined categories or tags, such that those on similar topics canbe grouped together using free-form tags / keywords or hierarchical tree catego-ries. Posts can also be tagged by others using social bookmarking services or personal aggregators like Gregarius. Other services like Technoratican then use these tags or keywords as category names for linking together blogposts, photos, links, etc. in order to build what they call a ‘tagged web’. UtilisingSemantic Web technology, both tags and hierarchical categorisations of blog posts7 (URL last accessed 2009-06-09)
  • 85. 5 Discussions 79can be further enriched and exposed in RDF8 via the SKOS (Simple KnowledgeOrganisation Systems) framework9. There have been some approaches to tackle the issue of adding more informa-tion to blog posts, so that advanced queries can be made regarding the posts’ con-tent, and the things that people talk about can be reused in other posts or applica-tions (because not everyone is being served well by the lowest commondenominator that we currently have in blogs). One approach is called ‘structuredblogging’10 (mainly using microformats to annotate blog content), and the other is‘semantic blogging’ (using RDF to represent both blog structures and blog con-tent): both approaches can also be combined together.5.2.2 Structured bloggingStructured blogging is an open-source community effort that has created tools toprovide microcontent from popular blogging platforms such as WordPress andMovable Type. The term microcontent indicates a unit of data and associatedmetadata communicating one main idea. Sources of microcontent include micro-formats11, which enable semantic markup to be embedded directly withinXHTML. Microformats therefore provide a simple method of expressing contentin a machine-readable way, facilitating re-use and aggregation. An example of amicroformat is hReview, which allows for the structured description of reviewswithin web pages. Although the original effort has tapered off, structured blogging is continuingthrough services like LouderVoice12, a review site which integrates reviews writ-ten on blogs and other websites. In structured blogging, packages of structureddata are becoming post components (Figure 5.2). 
Sometimes (not all of the time) aperson will have a need for more structure in their posts - if they know a subjectdeeply, or if their observations or analyses recur in a similar manner throughouttheir blog - then they may best be served by filling in a form (which has its ownmetadata and model) during the post creation process. For example, someone maybe writing a review of a film they went to see, or reporting on a sports game theyattended, or creating a guide to tourist attractions they saw on their travels. Notonly do people get to express themselves more clearly, but blogs can start to inter-operate with enterprise applications through the microcontent that is being createdin the background.8 (URL last accessed 2009-06-09)9 (URL last accessed 2009-06-09)10 (URL last accessed 2009-06-09)11 (URL last accessed 2009-06-09)12 (URL last accessed 2009-06-09)
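To make the idea of microcontent more concrete, the following is a minimal sketch of how a reader application might pull a few hReview fields out of a blog post. The class names `hreview`, `fn`, `rating` and `summary` come from the hReview microformat; the sample markup, the restaurant name and the helper names `HReviewParser` and `extract_review` are illustrative assumptions, not output from any particular structured blogging tool.

```python
from html.parser import HTMLParser

# Hypothetical blog post fragment marked up with hReview class names.
SAMPLE_POST = """
<div class="hreview">
  <span class="item"><span class="fn">Golden Dragon</span></span>
  <span class="rating">7</span>
  <span class="summary">Good dim sum, slow service</span>
</div>
"""


class HReviewParser(HTMLParser):
    """Collects the text of elements whose class is one of FIELDS.

    A sketch only: it assumes well-nested markup without void elements
    such as <br> inside the review.
    """

    FIELDS = {"fn", "rating", "summary"}

    def __init__(self):
        super().__init__()
        self.review = {}
        self._stack = []  # one entry per open element: a field name or None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        field = next((c for c in classes if c in self.FIELDS), None)
        self._stack.append(field)

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        field = self._stack[-1] if self._stack else None
        if field and data.strip():
            self.review[field] = data.strip()


def extract_review(html):
    """Return a dict of the hReview fields found in an HTML fragment."""
    parser = HReviewParser()
    parser.feed(html)
    return parser.review


review = extract_review(SAMPLE_POST)
```

Once reviews are available as dictionaries like this, an aggregator could filter them by rating or item name rather than by keyword alone.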
  • 86. 80 The Social Semantic WebFig. 5.2. A typical structured blogging entry form, in this case for a restaurant review
  • 87. 5 Discussions 81 Take the scenario where someone (or a group of people) is reviewing somesoccer games that they watched. Their after-game soccer reports will typically in-clude information on which teams played, where the game was held and when,who were the officials, what were the significant game events (who scored, whenand how, or who received penalties and why, etc.) - it would be easier for theseblog posters if they could use a tool that would understand this structure, present-ing an editing form with the relevant fields, and automatically create both HTMLand RSS with this structure embedded in it. Then, others reading these posts couldchoose to reuse this structure in their own posts, and their blog reading / writingapplication could make this structure available when the blogger is ready to write.As well as this, reader applications could begin to answer questions based on theform fields available – ‘show me all the matches from South Africa with morethan two goals scored’, etc. At the moment, structured blogging tools (such as those from LouderVoice)provide a fixed set of forms that bloggers can fill in for things like reviews, events,audio, video and people - but there is no reason that people could not create cus-tom structures, and news aggregators or readers could auto-discover an unknownstructure, notify a user that a new structure is available, and learn the structure forreuse in the user’s future posts. Semantic Web technologies could also be used toontologise any available post structures for more linkage and reuse. Some other past attempts at structured blogging include Qlogger13, the Lafay-ette Project14, and JemBlog15. To date, structured blogging tools have been pro-vided for single-user blogging platforms16, but would be more suited for deploy-ment in multi-user blogging communities powered by WordPress Multi-User orDrupal where they could better achieve critical mass. 
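The soccer scenario above can be sketched in a few lines. This is only an illustration of why form fields make such questions answerable: the `MatchReport` fields, the team names and the `matches_in` helper are assumptions made for the example, not part of any existing structured blogging tool.

```python
from dataclasses import dataclass


@dataclass
class MatchReport:
    """Structured fields a match-report form might capture (hypothetical)."""
    home_team: str
    away_team: str
    country: str      # where the game was held
    home_goals: int
    away_goals: int

    @property
    def total_goals(self):
        return self.home_goals + self.away_goals


# Made-up reports standing in for structured blog posts.
REPORTS = [
    MatchReport("Team A", "Team B", "South Africa", 2, 1),
    MatchReport("Team C", "Team D", "South Africa", 1, 0),
    MatchReport("Team E", "Team F", "Ireland", 3, 2),
]


def matches_in(reports, country, min_goals):
    """Reports from `country` with more than `min_goals` goals in total."""
    return [r for r in reports
            if r.country == country and r.total_goals > min_goals]


# 'Show me all the matches from South Africa with more than two goals scored'
result = matches_in(REPORTS, "South Africa", 2)
```

The same question asked of free-text posts would require guessing at scores and venues from prose, which is exactly what keyword search cannot do reliably.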
It would also be interestingif structured blogging tools could be integrated with Social Web reading lists ormedia consumption sites, e.g. All Consuming or Most of these produceRSS, which could be used as the basis for getting potential review items in a drop-down list.5.2.3 Semantic bloggingBlog posts are usually only tagged using free-form keywords by the blog owner(i.e. the blog post creator). However, there is often much more to say about a blogpost than simply what category it belongs in or what topics it relates to. SemanticWeb technologies can also be used to enhance any available post structures in a13 (URL last accessed 2009-06-09)14 (URL last accessed 2009-06-09)15 (URL last accessed 2009-06-09)16 (URL last accessed 2009-06-09)
  • 88. 82 The Social Semantic Webmachine-readable way for more linkage and reuse. This is where semantic blog-ging comes in. (Cayzer 2004) envisioned an initial idea for semantic blogging with two mainaspects that could improve blogging platforms: a richer structure both for blogpost metadata and their topics - using shared ontologies - and richer queries interms of subscription, discovery and navigation. He later defined a Snippet Man-ager service implementing some of these features. (Karger and Quan 2004) gavesome other ideas about ‘what would it mean to blog on the Semantic Web’. Theyargued that such tools should be able to produce structured and machine-understandable content in an autonomous way, without any additional input fromthe users. They also provided a first prototype based on the Haystack platform(Quan et al. 2003) that showed new ways to navigate between content thanks tothese techniques. Traditional blogging is aimed at what can be called the ‘eyeball Web’ - i.e. text,images or video content that is targeted mainly at people (Möller et al. 2006). Se-mantic blogging aims to enrich traditional blogging with metadata about the struc-ture (what relates to what and how) and the content (what is this post about - aperson, event, book, etc.). Already RSS and Atom (a format for syndicating webcontent) are used to describe blog entries in a machine-readable way and enablethem to be aggregated together. However by augmenting this data with additionalstructural and content-related metadata, new ways of querying and navigating blogdata become possible. In structured blogging, microcontent such as microformats or RDFa is posi-tioned inline in the (X)HTML (and subsequent syndication feeds) and can be ren-dered via CSS. Structured blogging and semantic blogging do not compete, butrather offer metadata in slightly different ways (using microcontent and RDF re-spectively). 
There are already mechanisms such as GRDDL which can be used tomove from one to the other and that allows one to provide RDF data from embed-ded RDFa or microformats. Extracted RDF data can then be reused as one wouldany native RDF data, and as such it may be processed using common SemanticWeb tools and services. The question remains as to why one would choose to enhance their blogs andposts with semantics. Current blogging offers poor query possibilities (except forsearching by keyword or seeing all posts labelled with a particular tag). There islittle or no reuse of data offered (apart from copying URLs or text from posts).Some linking of posts is possible via direct HTML links or trackbacks, but again,nothing can be said about the nature of those links (are you agreeing with some-one, linking to an interesting post, or are you quoting someone whose blog post isdirectly in contradiction with your own opinions?). Semantic blogging aims totackle some of these issues, by facilitating better (i.e. more precise) querying whencompared with keyword matching, by providing more reuse possibilities, and bycreating ‘richer’ links between blog posts.
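The notion of ‘richer’ links can be illustrated with a handful of subject-predicate-object triples. In this sketch, `sioc:reply_of` and `sioc:links_to` are terms from the SIOC vocabulary, while the `ex:agreesWith` and `ex:disagreesWith` predicates, the `post:` identifiers and the `subjects` helper are hypothetical names introduced only for illustration.

```python
# Each link between posts is a (subject, predicate, object) triple.
TRIPLES = {
    ("post:2", "sioc:reply_of", "post:1"),
    ("post:2", "ex:agreesWith", "post:1"),      # hypothetical predicate
    ("post:3", "sioc:reply_of", "post:1"),
    ("post:3", "ex:disagreesWith", "post:1"),   # hypothetical predicate
    ("post:4", "sioc:links_to", "post:1"),
}


def subjects(triples, predicate, obj):
    """All subjects that are connected to `obj` by `predicate`."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


replies = subjects(TRIPLES, "sioc:reply_of", "post:1")
critics = subjects(TRIPLES, "ex:disagreesWith", "post:1")
```

With plain HTML links, all four posts would simply ‘link to’ the first one; typing the link is what lets a reader ask only for the posts that disagree.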
  • 89. 5 Discussions 83Fig. 5.3. Annotating a blog entry with an address book entry (Möller et al. 2006)Fig. 5.4. Integrating a semantic blogging application with desktop data It is not simply a matter of adding semantics for the sake of creating extrametadata, but rather a case of being able to reuse what data a person already has intheir desktop or web space and making the resulting metadata available to others.People are already (sometimes unknowingly) collecting and creating largeamounts of structured data on their computers, but this data is often tied into spe-cific applications and locked within a user’s desktop (e.g. contacts in a person’saddress book as in Figure 5.3, events in a calendaring application, author and titleinformation in documents, audio metadata in MP3 files). Semantic blogging can
  • 90. 84 The Social Semantic Webbe used to ‘lift’ or release this data onto the Web, as in the semiBlog17 application(now called Shift) which allows users to reuse metadata from Apple Mac desktopsin blog posts (see Figure 5.4). For example, looking at the picture in Figure 5.5 (Möller and Decker 2005), Inawrites a blog post which she annotates using content from her desktop calendaringand address book applications. She publishes this post onto the Web, and John,reading this post, can reuse the embedded metadata in his own desktop applica-tions. In this picture, the semantic blog post is being created by annotating a partof the post text about a person with an address book entry that has extra metadatadescribing that person. Once a blog has semantic metadata, it can be used to per-form queries such as ‘which blog posts talk about papers by Stefan Decker?’ It canalso be used for browsing not only across blogs but also other kinds of discussionmethods; or it can be used by blog readers for importing metadata into desktopapplications (or using the Web as a clipboard).Fig. 5.5. Lifting semantic data from the desktop to the Web and back again Conversations can also span multiple blog sites in blog posts and their com-ments, and bloggers often respond to the entries of other users in their own blogs.The use of semantic technologies can enable the tracking of these distributed con-versations. Links between units of conversation could even be enhanced to includesentiment information, e.g. who agrees or disagrees with the initial opinion. SparqlPress18 is another prototype that leverages Semantic Web technologies inblogs. It is not a separate blogging system but rather an open-source plugin for thepopular WordPress platform, and it aims to produce, integrate and reuse RDF datafor an enhanced user experience. SparqlPress mainly relies on the FOAF, SIOCand SKOS Semantic Web vocabularies.17 (URL last accessed 2009-06-09)18 (URL last accessed 2009-06-08)
  • 91. One interesting feature that SparqlPress provides is the way that it combines FOAF and OpenID. OpenID is a decentralised login system that allows people to register on different websites using the same login ID and password, with the login being a URL. From that URL, a person can link to their FOAF profile, an RDF representation of their persona, by adding a link in the web page header or by simply using RDFa. Via this link, SparqlPress can retrieve the FOAF file of a user when they are logging in, and it is then able to display extra information about the user on the blog. This could include their homepage and other accounts or blogs they may have on the Web, if this information is provided in the FOAF file. Connecting OpenID to one’s FOAF social networking profile¹⁹ can also be useful for the blocking of blog comment spam.

Zemanta provides client-side and server-side tools that enrich the content being created by bloggers or publishers, allowing them to automatically add hyperlinks, choose appropriate tags, and insert images based on an analysis of the content being posted. Zemanta also automatically suggests Common Tags²⁰ to publishers (more in Chapter 8), and allows them to embed these tags in their content.

As well as the aforementioned semantic blogging systems, others have been developed by HP²¹ (Cayzer 2004), the National Institute of Informatics, Japan²² (Ohmukai and Takeda 2004), and MIT (Karger and Quan 2004).

5.3 Microblogging

Microblogging is a recent social phenomenon on the Social Web, with similar usage motivations (i.e. personal expression and social connection) to other applications like blogging. It can be seen as a hybrid of blogging, instant messaging and status notifications, allowing people to publish short messages (usually fewer than 140 characters) on the Web about what they are currently doing. These short messages, or microblog posts, are often called ‘tweets’ and have a focus on real-time information.
As a simple and agile form of communication in a fluid network ofsubscriptions, it offers new possibilities regarding lightweight information updatesand exchange. Twitter is now one of the largest microblogging services, and thevalue of microblogging is demonstrated by its popularity and by Google’s acquisi-tion of Jaiku, another leading microblogging service. Individuals can publish their brief text updates using various communicationschannels such as text messages from mobile phones, instant messaging, e-mail andthe Web. The simplicity of publishing such short updates in various situations orlocations and the creation of a more flexible social network based on subscriptions19 (URL last accessed 2009-06-09)20 (URL last accessed 2009-07-13)21 (URL last accessed 2009-06-09)22 (URL last accessed 2009-06-09)
  • 92. and response posts make microblogging an interesting communications method that has been studied from a social point of view (Java et al. 2007). Moreover, this means of publishing can be extended with multimedia in the form of short video recordings, e.g. as in Seesmic, which is considered to be the first video microblogging service.

Those who are interested in what someone is doing can also receive updates through various means (Web, e-mail, IM, SMS). Some people call microblogging ‘lifestreaming’, while others think it is just a lot of mundane, trivial stuff (e.g. ‘having toast for breakfast’). Microblogging is addictive not because the content is interesting but rather because you may want to find out what someone is doing right now. Through microblogging services such as Twitter, you can know very minute things about someone’s life: what they are thinking, that they are tired, etc. Historically, we have only known that kind of information for a very few people that we are close to (or celebrities).

Microblogging is quite useful for getting a snapshot of what is going on in, and for interacting with, your community or communities of interest. Similar to using a blog aggregator and scanning the titles and summaries of many blogs at once, thereby getting a feel for what is going on at a particular point in time, microblogging allows one to view status updates from many people in a compact (screen) space. Some microblogging services also have SMS integration, allowing one to send updates and receive microblog posts from friends via a mobile phone (although Twitter dropped outbound SMS notifications for non-US residents during 2008).

One of the advantages of microblogs is that people can talk about a greater range of subjects due to the ease of writing a short post, since they are more likely to talk about a variety of diverse topics in multiple microblog posts that are limited to 140 characters as opposed to writing a longer, single blog post.
In fact, this constraint also makes microblogging somewhat more interactive due to the back-and-forth conversations that result when someone looks for clarification of what is being said in those shorter status updates. Microblogging is also more conversational because everyone is using the same service and there are fewer delays from logging on or filling in profile fields than when posting a comment on someone else’s blog.

A disadvantage is that the momentum of microblogging sites like Twitter is now such that you have to keep checking back much more regularly to be kept up-to-date with everything that is going on or to find those hidden gems of information or knowledge. If you are subscribed to a few hundred people, it becomes difficult (impossible even) to see all that is relevant, since even the most interesting microbloggers will not be talking about things that interest you all the time. However, Twitter clients like TweetDeck²³ do allow various searches to be set up in separate columns, such that updates relevant to a certain keyword or combination of keywords (e.g. ‘galway OR ireland’, ‘semantic web’) can be monitored quite easily, irrespective of whom one is following.
23 (URL last accessed 2009-07-21)
  • 93. This communication method is also promising for corporate environments in facilitating informal communication, learning and knowledge exchange (e.g. Yammer²⁴ is an enterprise microblogging platform). Its so far untapped potential can be compared to that of company-internal wikis some years ago. Microblogging can be characterised by rapid (almost real-time) knowledge exchange and fast propagation of new information. For a company, this can mean real-time Q&A and improved informal learning and communication, as well as status notifications, e.g. about upcoming meetings and deliveries. However, the potential for microblogging in corporate environments still has to be demonstrated with real use cases (e.g. IBM has recently deployed an internal beta microblogging service called Blue Twit²⁵). We expect that a trend of corporate microblogging will emerge in the next few years similar to what happened with blogging, wikis and other Enterprise 2.0 services as we will describe in Chapter 12.

Traditionally, microblogging has been mainly used by technically-minded Web users and bloggers, but this is beginning to change, with newspapers and celebrities getting in on the phenomenon. Microblog-type publishing can also be set up on personal services: for example, the WordPress platform offers a dedicated template interface (Prologue) that lets people publish short and real-time updates, and SixApart have recently developed their own installable microblogging platform called Motion following their acquisition of Pownce. However, there is no aggregation for personal microblogs that would take into account the special characteristics of this new medium.

In the section on blogging, we discussed how it has led to ‘grassroots journalism’.
To that extent, microblogging is an interesting phenomenon, especiallyTwitter, as updates can be posted in many ways and from different devices (e.g.via text message from mobile phones). Hence, it was one of the first media to re-port the major earthquake in China and the terrorist attacks in Mumbai26. Of course, one must be careful about choosing who to trust or not, as in any so-cial media service. By sharing their personal lifestreams, people sometimes exposethemselves to privacy issues, voluntary or not. Some services allow users to blockpublic access to their tweets, but most publish them for all to see. For example, itappears that more than 10 posts every two minutes on Twitter are about meetings,many of them being professional or corporate ones. These may contain some factsthat competitors can take into account.24 (URL last accessed 2009-07-21)25 (URL last accessed 2009-06-09)26 (URL last accessed 2009-06-09)
  • 94. 88 The Social Semantic Web5.3.1 The Twitter phenomenonTwitter was established by Evan Williams, formerly of Pyra Labs and Blogger.While working with Jack Dorsey and Biz Stone at Obvious Corp. on the podcast-ing directory service Odeo, they began developing the Twitter microblogging ser-vice in Ruby on Rails, and it took two weeks to build the first functional prototypeof Twitter27. It has become hugely popular - now ranked at website number 15 inthe world - with millions of users worldwide (20% of Twitter users are in Japan).As a result, there are many challenges to scaling since Twitter is one of the largestRails applications, and there are various scaling problems that have yet to besolved according to Williams. Like many social websites, Twitter has evolved as users of the application de-manded it. For example, Twitter initially did not have a system for allowing peo-ple to comment on each others’ tweets, so the users invented a convention by us-ing the ‘at’ sign and a username (e.g. @johnbreslin) to comment on other people’stweets. Twitter also has an API that has enabled third-party services like Twittervision(a map of the world showing various random tweets taking place). Twitter’s APIhas been quite successful, with dozens of desktop applications, others that extractdata and present it in different ways, various bots that post information to Twitter(URLs, news, weather, etc.), and more recently a timer application that will send amessage at a certain time period in the future for reminders (e.g. via the SMS gate-way). The API allows a simple service like Twitter to become more powerful andreusable in other applications. Although there is no official business model for Twitter beyond ads on theirJapanese service, an example of third-party commercial usage of Twitter is Woot,a single special offer item per day site that has a lot of followers on Twitter. 
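The user-invented ‘@username’ convention described above is simple enough for third-party tools to detect. The sketch below is an assumption-laden illustration, not Twitter's own parsing rules: the regular expression, the sample tweet and the helper names `mentions`, `is_reply` and `fits` are made up for this example, and the 140-character limit is the one mentioned earlier in this chapter.

```python
import re

MAX_LENGTH = 140  # length limit for a microblog post, as discussed in the text

# '@' followed by word characters, but not when it sits inside a word
# (which would match e-mail addresses such as name@example.org).
MENTION = re.compile(r"(?<!\w)@(\w+)")


def mentions(tweet):
    """Usernames addressed with the '@' convention, in order of appearance."""
    return MENTION.findall(tweet)


def is_reply(tweet):
    """A tweet that starts with '@' is treated here as a reply."""
    return tweet.startswith("@")


def fits(tweet):
    """True if the tweet respects the length limit."""
    return len(tweet) <= MAX_LENGTH


tweet = "@johnbreslin nice talk on SIOC, cc @wilw"
names = mentions(tweet)
```

A client built on such helpers could group tweets into conversations by following who is addressing whom.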
Twitter has evolved beyond being a haven for social media gurus like Robert Scoble or technology experts like Ward Cunningham (the creator of the wiki), as it now has its fair share of celebrity ‘twitterers’ or ‘tweeple’ with many followers (in fact, the first result from Google when you search for ‘twitterer’ is actor and writer Wil Wheaton, or ‘@wilw’ on Twitter). Some celebrities are twittering by proxy or via their ‘social media directors’, but many celebrities take the time out to engage with the public and with their fans by posting tweets themselves. Politicians have been using Twitter as part of their election campaigning, e.g. Barack Obama (with over 1.5 million followers in July 2009), John McCain and John Edwards. Actors have also embraced Twitter: from the popular science fiction drama Heroes, Greg Grunberg, Brea Grant and David H. Lawrence XVII are regular Twitter users. David Hewlett (Stargate Atlantis), movie star Luke Wilson, director Kevin Smith, and comedians John Cleese and Stephen Fry also frequent Twitter, as do sportspeople Shaquille O’Neal, Lance Armstrong and Andy Murray. From the music world,
27 (URL last accessed 2009-06-09)
  • 95. there are Britney Spears (or at least her social media staff), MC Hammer and Dave Matthews. Other famous twitterers include Virgin founder Richard Branson, illusionist Penn Jillette, and former US vice-president Al Gore.

5.3.2 Semantic microblogging

Michael Arrington wrote a post²⁸ on the technology blog TechCrunch about the need for a ‘decentralised Twitter’ and for open alternative microblogging platforms, which was picked up by technologists Dave Winer, Marc Canter and Chris Saad amongst others. The SMOB²⁹ or semantic microblogging prototype developed in DERI (a Semantic Web research institute in NUI Galway), and available as an open-source framework, is an example of how Semantic Web technologies can provide an open platform for decentralised / distributed publishing of microblogging content (see Figure 5.6), mainly using the FOAF and SIOC vocabularies.

Fig. 5.6. Global architecture of distributed semantic microblogging

An aim of SMOB was also to demonstrate how such technologies can provide users with a way to control, share and remix their own data as they want, not solely dependent on the facilities provided by a third-party service. In this way, SMOB-published data belongs to the user who created it. As soon as someone writes some microblog content using a SMOB client, the content is spread through
28 (URL last accessed 2009-06-09)
29 (URL last accessed 2009-07-20)
  • 96. various microblogging servers, or aggregators, but remains available locally to the user who created it as depicted in Figure 5.6. Hence, if one aggregator closes for some reason, the user can still use their data as it really belongs to him or her and not to any third-party aggregation service. This goal is more globally shared by the DataPortability project that will be described later.

Fig. 5.7. Latest SMOB updates rendered in Exhibit

In order to represent microblogging data, SMOB uses FOAF and SIOC to model microbloggers, their properties, account and service information, and the microblog updates that users create. A multitude of publishing services can ping one or a set of aggregating servers as selected by each user, and it is important to note that users retain control of their own data through self hosting as we detailed previously. The aggregate view of microblogs uses ARC2³⁰ for storage and querying, and MIT’s Exhibit³¹ faceted browser for the user interface as shown in Figure 5.7. It therefore offers a user-friendly interface to display complex RDF data, aggregated from distributed sources. Moreover, in order to further benefit from Semantic Web technologies, microblog posts can also embed semantic tags, e.g. geographical tags which can leverage the GeoNames database to power new visualisations such as the map view in Figure 5.8.
30 (URL last accessed 2009-06-09)
31 (URL last accessed 2009-06-09)
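As a rough illustration of what modelling a microblog update with FOAF and SIOC can look like, the sketch below serialises one post as Turtle. It is not SMOB's actual schema or code: the choice of properties (`sioc:has_creator`, `foaf:maker`, `dcterms:created`, `sioc:content`), the `example.org` URIs and the function name `microblog_post_turtle` are assumptions made for this example, while the namespaces and the `sioct:MicroblogPost` class come from the published vocabularies.

```python
# Namespace prefixes for the vocabularies used in this sketch.
PREFIXES = {
    "sioc": "http://rdfs.org/sioc/ns#",
    "sioct": "http://rdfs.org/sioc/types#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def microblog_post_turtle(post_uri, account_uri, maker_uri, content, created):
    """Serialise one microblog post as a small Turtle document."""
    # Escape characters that are not allowed raw in a Turtle string literal.
    escaped = (content.replace("\\", "\\\\")
                      .replace('"', '\\"')
                      .replace("\n", "\\n"))
    lines = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in PREFIXES.items()]
    lines += [
        "",
        f"<{post_uri}> a sioct:MicroblogPost ;",
        f"    sioc:has_creator <{account_uri}> ;",   # the user account
        f"    foaf:maker <{maker_uri}> ;",           # the person behind it
        f'    dcterms:created "{created}" ;',
        f'    sioc:content "{escaped}" .',
    ]
    return "\n".join(lines)


# Hypothetical URIs on example.org, used purely for illustration.
turtle = microblog_post_turtle(
    "http://example.org/smob/post/1",
    "http://example.org/smob/user/ina",
    "http://example.org/people/ina#me",
    'Say "hi" to the Social Semantic Web',
    "2009-07-20T10:00:00Z",
)
```

Because the description lives at URIs the author controls, any aggregator or RDF browser can fetch and merge it without depending on one central service.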
  • 97. 5 Discussions 91 At the moment, the complete data set of updates is publicly available and canbe browsed with any RDF browser such as Tabulator, but in the future privacyconcerns can be addressed by requiring OpenID authentication. The SMOB clienthas also been adapted to allow cross posting to StatusNet (another distributed andopen microblogging platform) and to Twitter (via CURL and HTTP authentica-tion).Fig. 5.8. Map view of latest microblog updates with Exhibit5.4 Message boardsThe message board has been a popular feature of internet-based communicationsince the early days of mailing lists and Usenet newsgroups. Web-based messageboards or ‘forums’, which originated around 1995, are online discussion areas thatoperate in a similar manner to the dial-up bulletin board services and Usenetnewsgroups from the 1980s and 1990s. Since the late 1990s, message board sys-tems have become more sophisticated, allowing multiple message boards, catego-rised by common parent headings, to be hosted on a single site. One of the easiest ways of creating an online community is via the creation of amessage board. Message boards allow discussions to be held by many Internet us-ers in a community on a variety of subjects. Most forums on community sites em-
  • 98. 92 The Social Semantic Webploy some threaded display methods, where users post threads on a particulartopic, and other users can then reply by posting on these threads. Those who sharea common interest will discuss related topics of interest on the message board,forming a virtual sub-community. A message board normally contains a set of fo-rums classified into categories, and may also integrate event meeting calendars. Message boards have evolved beyond the traditional admin-maintained struc-ture into one where forums and categories can be created once a critical mass ofuser support has been received. The moderator of a forum has the responsibilityfor pruning undesirable threads and for banning unwanted users from the forum. Afeedback forum can be used to raise useful suggestions or bug reports that can in-crease the usability of the underlying software. These message boards are a thriving part of the current HTML Web. Posts on amessage board can be referenced via a URI, which is required for Semantic Webapplications. Some popular message board systems include vBulletin32, phpBB33,Invision Board, and the ezboard forum hosting service34. Websites of open-sourceprojects such as those hosted on include forum functionality toenable discussions between project members and software users. However, single message boards and multi-forum sites primarily exist as is-lands that are not connected together. Apart from some multi-function contentmanagement systems such as Drupal that offer a unified login or services usingOpenID, there have been few efforts towards connecting various message boardstogether (Zoints35, Klostu36) despite the potential benefits that this may offer (e.g.linking complementary topics across forum sites, enabling distributed conversa-tions, linking a user’s distributed posts). By interconnecting these message boardstogether and viewing them as part of an overall ‘boardscape’37, we can enablethese potential uses. 
The concept of a boardscape can be thought of as the world of message boards: millions of users creating billions of posts across thousands of multi-forum sites on the Internet. It is the collection of all message boards and the potential aggregated power of all board communities and their member collectives.
5.4.1 Categories and tags on message boards
Forum sites tend to be relatively narrow in scope, being dedicated to communities in niche areas (e.g. anime, sports, TV shows, etc.), and therefore are normally
32 (URL last accessed 2009-07-16)
33 (URL last accessed 2009-07-16)
34 (URL last accessed 2009-07-16)
35 (URL last accessed 2009-06-09)
36 (URL last accessed 2009-06-09)
37 (URL last accessed 2009-07-21)
  • 99. 5 Discussions 93categorised according to specific aspects of that niche interest. Other more gen-eral-purpose message boards may have more wide-ranging categories, for exam-ple, the Irish community website uses categorisations similar to thosefound in the Open Directory Project or the Yahoo! Directory, with top-level cate-gories ranging from Arts and Business to Sports and Technology. Many social networking sites have also incorporated community discussionfeatures such as message boards. Unfortunately, social networks also suffer frompoor categorisation of community areas. For example on the orkut site, hundredsof thousands of community message boards are classified in just 28 top-level cate-gories. This makes it difficult for users themselves to find communities matchingtheir interests, or for machines to match users to communities. As regards tagging content on message boards, a number of modules have re-cently been developed for board administrators that allow them to provide taggingfunctionality for users of their sites. There are also some newer message boardsystems (e.g. bbPress, vBulletin 3.7) that offer integrated tagging features. Thesetagging solutions usually provide a tag cloud of the most popular tags being usedby message board posters on a particular site, and via the boardscape they can leadto an overall tag cloud view of tags used across all message board sites that par-ticipate in the tagging process. BoardTracker38, a message board search engine, provides an aggregate searchacross 40,000 message boards from around the world. This search can also be in-tegrated by board administrators into their own discussion sites, providing resultsfrom the local site or from many other searchable sites on the boardscape. 
Threads retrieved through a BoardTracker search may already have been tagged by the original content creator, but they can also be tagged by the third party who performed the search (thereby creating new connections between content in the boardscape).
Some sites and services use a hierarchical structured categorisation for classifying message boards, so that a particular message board may be linked to a sub-category in a taxonomy (e.g. Sports & Recreation > Football > American). Such a structured categorisation can be used to match user-defined interests to other possible message boards of interest, either existing message boards on registration (in networks of forums such as CrowdGather39) or newly-created boards through a periodic matching system. New message boards to discuss a common interest could also be proposed or automatically created based on the level of demand for the topic of interest as expressed in user profiles.
Using the tagging system already mentioned, and by analysing the tags used in message boards that have already been categorised, a simple way of matching users (via their most commonly used tags) to message boards (either those with the highest occurrences of a tag or those in a category where that tag is prevalent) becomes apparent.
38 (URL last accessed 2009-06-09)
39 (URL last accessed 2009-07-18)
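The tag-based matching of users to message boards described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the scoring rule (summing how often a user's top tags occur on each board) and the sample data are all hypothetical, and are not taken from BoardTracker or any other service mentioned in the text.

```python
from collections import Counter


def top_tags(tags, n=3):
    """Return the n tags a user has applied most often, most common first."""
    return [tag for tag, _ in Counter(tags).most_common(n)]


def rank_boards(user_tags, board_tags, n=3):
    """Rank boards by how often the user's top tags occur on each board.

    Boards that share no tag with the user are left out of the ranking.
    """
    favourites = top_tags(user_tags, n)
    scores = {}
    for board, tags in board_tags.items():
        counts = Counter(tags)
        score = sum(counts[tag] for tag in favourites)
        if score > 0:
            scores[board] = score
    # Highest score first; ties are broken alphabetically by board name.
    return sorted(scores, key=lambda board: (-scores[board], board))


# Hypothetical sample data, for illustration only.
user_tags = ["anime", "manga", "anime", "cosplay", "anime", "manga"]
board_tags = {
    "anime-board": ["anime", "anime", "manga", "figures"],
    "sports-board": ["football", "rugby", "football"],
    "comics-board": ["manga", "comics", "comics"],
}

print(rank_boards(user_tags, board_tags))
```

The same scoring could be applied per category rather than per board, by pooling the tags of all boards in a category before counting.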
  • 100. 5.4.2 Characteristics of forums
Message boards have long been recognised as a place where the majority of discussions between people on the Web take place. However, how much discussion actually takes place in the boardscape, and what its characteristics are, has not been described in detail. Today’s boards are descendants of the dial-up bulletin board systems (BBS) that developed in parallel to newsgroups during the 1980s. After making their appearance on the Web, boards diverged, evolved and were enhanced with some new concepts and technologies. However, boards retained their basic conceptual design as a communal discussion platform, and many of their psychological and motivational aspects also remained intact. Much like their predecessors, where different dial-up BBS systems were physically disconnected from one another and therefore had little or no relationship to each other, today’s message boards are mostly isolated and tend to have no connections with other message boards on the Web, despite the fact that the technical barriers have been removed.
Message board owners tend to retain a prominent level of moderation and authentication for members and content. Almost all boards require that you register as a user before you can take part in discussions. A great degree of isolation and control over the flow of outbound traffic from the board is commonplace. Linking to and quoting discussions (or parts of them) from other boards is uncommon. Communication with members of other boards is impossible. This aspect of the boardscape is what Ron Kass of BoardTracker refers to as the ‘island mentality’.
A term which is often used in the boardscape is ‘lurkers’, which refers to individuals who follow a message board and its discussions as viewers, but do not post or get directly involved in the discussion or community. It is estimated that the number of lurkers in a specific community is much greater than the actual number of registered and active members.
While message boards vary greatly in most of their characteristics - be it the total number of members on the board site, their age, gender, language, the theme or main topics of the board, the site design, or even the board software - there are interesting aspects of the discussions taking place on these boards that arise from an aggregated view of all of the discussions generated on them.
As evident from classifications performed by BoardTracker on message boards40, there are boards on virtually every subject in every niche. While some bias exists towards technology and towards recreation, there are board communities on even the most esoteric subjects. Another characteristic of message boards is their truly global nature, in terms of international coverage, languages, age and other demographic characteristics. Coverage of boards in different countries is of course influenced by the degree of penetration of the Internet in those countries. In some cases however, cultural characteristics influence the popularity of using
40 (URL last accessed 2009-06-09)
  • 101. 5 Discussions 95message boards in some countries. As an example, the usage of message boards inthe Chinese market is relatively higher than the average European counterpart. Estimates of the total number of message board members show the true multi-national dimension of message boards. Altogether, it is estimated that over 450million people are registered members to message boards in the boardscape. Fig-ure 5.9 shows a summary of some geographical statistics gathered by Board-Tracker. The total number of discussions accumulated in the boardscape is esti-mated to be over 6 billion discussions and over 100 billion posts. As of January2009, BoardTracker had indexed over 60 million threads (approximately a billionposts) across about 40,000 forums and sub-forums worldwide. In a single day in2009, the number of posts indexed by BoardTracker was 4 million, correspondingto a yearly total of nearly 1.5 billion posts (note that the current coverage ofBoardTracker is estimated at about 4% of the total boardscape).Fig. 5.9. Board members distribution worldwide Big-Boards.com41, a statistics site for large message boards that covers about1,800 sites, reports over 158 million registered accounts with 6.7 billion posts onthose board sites. It is again important to note that Big Boards only lists forumswith over 500,000 posts, and as well as many more medium to small-sized forumsin the ‘long tail’, there are also other large message board sites not listed in thisservice including Yahoo! forums, MySpace forums, the ezboard forum networkand others. One key factor for a discussion is its length, i.e. how many exchanges are madein a thread. A single person starts a thread in a forum, and others in the forum willreply to that thread (in most cases). 
Some threads get no replies, for example because there is no call or need for one (as with board announcements), because the thread asks for information that others cannot provide, or because the topic is locked. Some threads, however, generate hundreds or thousands of replies. There are even extreme cases where the number of replies made by members of the forum exceeds 100,000.
41 (URL last accessed 2009-06-09)
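The yearly total quoted above for BoardTracker follows directly from the daily figure, and can be checked with a short calculation. The sketch below also extrapolates to the whole boardscape from the 4% coverage estimate. That extrapolation is our own illustration rather than a figure from the text: it assumes that the single day is representative of the year and that the indexed boards are typical of the rest.

```python
POSTS_PER_DAY = 4_000_000   # posts indexed in a single day (figure from the text)
COVERAGE_PERCENT = 4        # estimated share of the boardscape that is indexed


def yearly_posts(posts_per_day, days=365):
    """Scale a daily post count up to a year."""
    return posts_per_day * days


def extrapolate(indexed_posts, coverage_percent):
    """Estimate the total from an indexed sample (integer arithmetic).

    Assumes the indexed sample is representative of the whole.
    """
    return indexed_posts * 100 // coverage_percent


indexed_per_year = yearly_posts(POSTS_PER_DAY)
boardscape_per_year = extrapolate(indexed_per_year, COVERAGE_PERCENT)

print(f"{indexed_per_year:,} posts indexed per year")
print(f"{boardscape_per_year:,} posts per year across the boardscape (rough estimate)")
```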
  • 102. Figure 5.10 shows a histogram of the number of threads with specific numbers of replies. As can be seen, the biggest group in this histogram is the set of threads without replies. However, it still only accounts for 13% of the total number of threads. This means that 87% of threads on message boards get at least one reply. Message boards are therefore not a lonely place, unlike many other Internet media. To emphasise this, we observe that a thread in the boardscape receives about 16 replies on average. About 2.5% of threads get between 51 and 100 replies, less than 1% receive between 101 and 200 replies, and about 0.5% receive more than 200 replies. While not a very high number, it does mean that on an average message board, one in two hundred threads receives more than 200 replies.
Fig. 5.10. Percentage of threads ordered by number of replies
Fig. 5.11. Search results from Google showing number of posts and authors from boards
  • 103. 5 Discussions 97 Google have implemented some message board parsing algorithms to deter-mine how many posts are on a thread, how many users posted on that thread andwhen the last post was made. This can be seen in the search result for ‘irish’ shown in Figure 5.11. It is not complete, and probably relies on identi-fying certain HTML structures for non-Google discussion sites, e.g. there is a blogdiscussion and a forum thread in the middle of the results that do not display thetotal posts or commenters. However, this is moving towards the Semantic Web vi-sion of providing more metadata about discussions on the Web to help you in find-ing more relevant information. Google’s ‘Rich Snippets’ effort (allowing webmas-ters to mark up their content with microformats and RDFa) is another move in thisdirection.5.4.3 Social networks on message boardsSocial networks have grown in popularity recently (more later), networking bothsocial acquaintances and professional associates. Many social networking siteshave incorporated community discussion features such as message boards. Ratherthan add a message board to a social network, we can also take advantage of thelarge number of message boards available to create parallel social networks. Some social websites are using FOAF to export social networking data fromuser profiles. However, for existing message board communities, incentives forusers to provide this semantically-rich content are necessary to aid in the integra-tion of these communities on the Semantic Web. One approach that can be used incombination with FOAF and other RDF export functionality is the development ofa social networking component that provides a graphical view of the bidirectionallinks between user profiles (and parallel FOAF representations) to create a ‘friend’connection. 
In a message board-based community, profile information on each user ismainly gathered at registration through the use of required fields that must becompleted by a user before an account can be fully registered. This can include in-terests, work details, and so on. Such information can form the basis of a FOAFprofile for a particular user, assuming that they have made the information pub-licly available (an option can be added that would allow the automatic creation ofa FOAF file from their profile if enabled by a user). A useful feature of some community-based message board systems is thefriends list or ‘buddy list’. This allows users to see when their friends are online,or to send all of their friends a private message at once. Each friends list is nor-mally private to a particular user, but by allowing users to make a portion or theentire list publicly accessible, the public friends can be used as part of a FOAF ex-port system. A FOAF exporter for the vBulletin message board system called
  • 104. 98 The Social Semantic WebvBFOAF42 has been developed, and a similar module has also been developed forthe phpBB system43. The user ID or hashed e-mail address can be used to create aunique URI for a FOAF profile, and the interest keywords can be mapped to URIsfor corresponding categories in the Open Directory Project44 (ODP) taxonomy orappropriate resources in DBpedia. To encourage the installation of the FOAF exporter, corresponding social net-work visualisation modules were created for both vBulletin45 and phpBB46. Thesocial networking module for vBulletin (called vBFriends) is shown in Figure5.12. The friends display mode for a particular user ID shows who is linked to andfrom that user by analysing the user IDs harvested from all friends lists (similar tohow the now-defunct Plink service dealt with foaf:knows entries from FOAFfiles). In 2007, Jelsoft Inc. provided integrated social networking functionality intheir flagship message board product vBulletin.Fig. 5.12. Original social networking module for vBulletin The implicit social networking information that can be derived from messageboard interactions and explicitly-defined buddy lists can be used to create a socialnetwork in reverse: a boardscape with semantic elements that spans across allmessage boards, linking users, forums and posts. Having such a structure enables42 (URL last accessed 2009-06-09)43 (URL last accessed 2009-06-09)44 (URL last accessed 2009-06-09)45 (URL last accessed 2009-06-09)46 (accessed 2009-06-09)
  • 105. 5 Discussions 99us to provide solutions to the limitations mentioned earlier with message board is-lands. A unified login, or links between various user accounts and the person whoowns them, can be provided via the boardscape - this has the associated advantageof being able to connect all of a person’s post content together (and that of theirfriends). Related topics can be also linked by topic tag, hierarchical category or di-rectly by distributed reply-type hyperlinks (similar to trackbacks). By leveragingall of this content, as well as social networks with FOAF, we can envision a firstdefinition of distributed social networking and online communities that we willlater detail in Chapter 11.Fig. 5.13. Sample of the boardscape and the interconnected nature of its users and posts Figure 5.13 shows a partial view of the boardscape, and some of the connec-tions that are contained therein. A person holds many user accounts on differentsites, and content is created about similar topics across various message boards.Users will know other users in their social network, either on the same site oracross a number of sites. A conversation may begin on one message board site, buteventually lead to and end up on a different message board elsewhere. By creatingthese connections between the users and posts on boards, we enable many interest-ing possibilities. We shall later discuss how SIOC can be used to represent thecontent on message boards and other social websites, and how this can be com-bined with social networking and personal profile information expressed in FOAF.
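The FOAF export described earlier in this section (a profile URI derived from the user ID, a hashed e-mail address, and a public friends list) can be sketched as follows. This is a minimal illustration and not the output format of vBFOAF or the phpBB module: the URI layout under http://forum.example.org/foaf is a hypothetical choice, and the nickname is assumed to contain no characters that need escaping in Turtle. The hash follows the FOAF convention for foaf:mbox_sha1sum, i.e. the SHA-1 of the mailbox URI including the mailto: prefix.

```python
import hashlib

FOAF = "http://xmlns.com/foaf/0.1/"


def mbox_sha1sum(email):
    """SHA-1 hex digest of the mailto: URI, as used for foaf:mbox_sha1sum."""
    return hashlib.sha1(f"mailto:{email}".encode("utf-8")).hexdigest()


def foaf_profile(base_uri, user_id, nick, email, friend_ids):
    """Build a small Turtle document describing one board user.

    base_uri and the '<base>/<id>#me' layout are hypothetical.
    """
    subject = f"<{base_uri}/{user_id}#me>"
    statements = [
        "a foaf:Person",
        f'foaf:nick "{nick}"',
        f'foaf:mbox_sha1sum "{mbox_sha1sum(email)}"',
    ]
    # One foaf:knows statement per entry in the user's public friends list.
    statements += [f"foaf:knows <{base_uri}/{fid}#me>" for fid in friend_ids]
    body = " ;\n    ".join(statements)
    return f"@prefix foaf: <{FOAF}> .\n\n{subject} {body} .\n"


profile = foaf_profile(
    "http://forum.example.org/foaf", 42, "alice", "alice@example.org", [7, 9]
)
print(profile)
```

Publishing the hash rather than the address lets two profiles on different boards be recognised as belonging to the same person without exposing the e-mail address itself.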
  • 106. 100 The Social Semantic Web Pidgin Technologies are the developers of a service called Klostu47, a site thatconnects message boards from around the world through a central access portaland unified login system. Klostu allows one to find thousands of message boardsin one place: people can make friends via their social networking functionality orbrowse and search through millions of forum topics. Klostu are also releasingmodules for various message board systems, allowing board owners to integratethe Klostu single sign-on system into their own sites.5.5 Mailing lists and IRCA large number of systems preceding the current Web are still deployed andwidely used on the Internet. E-mail is used for exchanging messages and files inan asynchronous way, Usenet is still used to exchange messages, and IRC is usedfor synchronous chat. E-mail is still the most prevalent asynchronous one-to-manycommunication medium on the Internet. Mailing lists provide a quick method toset up communications features for an online community. They were also one ofthe first methods used to set up and support a closed-group online community. Un-fortunately, e-mail and mailing lists can still be subject to abuse (e.g. through mailbombs, spam, or other unsolicited mail). Mailing lists still occupy a huge segmentof online discussions, and along with the growth of the Web, mailing lists havemoved towards web-based mechanisms and online archives, making them acces-sible to a wider audience. Although e-mail’s main transport protocols are SMTP, POP3, and IMAP4, andthe format is text-based (RFC 82248), the contents of mailing lists are also beingmade available on the Web in HTML format. For example, Yahoo! Groups (for-merly eGroups) allows the creation of private or public community mailing lists,with messages either browsable via the Web or sent via individual or digest-typee-mails. 
Archives of mailing lists hosted on individual servers are often madeavailable online in HTML, using tools such as GNU Mailman or MHonArc. Somemailing lists, such as DBWorld, already have message headers defined to includeannotations in semi-structured format, e.g. metadata descriptions about calls forpapers. To capture this large amount of legacy data exchanged in online communitiesin a semantic form, these systems and protocols need to be considered for transla-tion to the Semantic Web. In contrast to web-based systems, where we just need totranslate the data, we may need to employ protocol wrappers to move from legacyprotocols to the Semantic Web. For example, for e-mail, we may need to translatethe data representation format from RFC 822 to RDF.47 (URL last accessed 2009-06-09)48 (URL last accessed 2009-06-09)
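To make the idea of translating from RFC 822 to RDF more concrete, the sketch below parses a single message with Python's standard email package and prints a handful of N-Triples using SIOC and Dublin Core terms. It is an illustration under several assumptions and not how any particular exporter works: the message is identified with a mid: URI built from its Message-ID (without percent-encoding), the sender's mailto: URI stands in for a proper user account resource, and the sample message is invented.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

SIOC = "http://rdfs.org/sioc/ns#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def mid_uri(message_id):
    """Turn '<id@host>' into a 'mid:' URI (simplified: no percent-encoding)."""
    return "mid:" + message_id.strip().strip("<>")


def literal(text):
    """Quote a string as a plain N-Triples literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def message_to_ntriples(raw):
    """Map the main headers and body of one message to N-Triples lines."""
    msg = message_from_string(raw)
    subject = f"<{mid_uri(msg['Message-ID'])}>"
    _, address = parseaddr(msg["From"])
    created = parsedate_to_datetime(msg["Date"]).isoformat()
    triples = [
        (subject, f"<{RDF_TYPE}>", f"<{SIOC}Post>"),
        (subject, f"<{DCT}title>", literal(msg["Subject"])),
        (subject, f"<{SIOC}has_creator>", f"<mailto:{address}>"),
        (subject, f"<{DCT}created>", literal(created)),
        (subject, f"<{SIOC}content>", literal(msg.get_payload().strip())),
    ]
    if msg["In-Reply-To"]:
        parent = f"<{mid_uri(msg['In-Reply-To'])}>"
        triples.append((subject, f"<{SIOC}reply_of>", parent))
    return "\n".join(" ".join(triple) + " ." for triple in triples)


# Invented sample message, for illustration only.
RAW = """\
From: Alice <alice@example.org>
To: list@example.org
Subject: Re: SIOC export
Date: Tue, 09 Jun 2009 10:00:00 +0000
Message-ID: <2@example.org>
In-Reply-To: <1@example.org>

Thanks, that works.
"""

print(message_to_ntriples(RAW))
```

The In-Reply-To header is what allows threads to be reconstructed: each reply points at its parent message, so a whole archive becomes a graph of linked posts.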
  • 107. The SWAML (Semantic Web Archive of Mailing Lists) project from (Fernandez et al. 2007) is an exporter for mailing list content in Semantic Web format. SWAML reads a collection of e-mail messages stored in a Unix-type mailbox (from a mailing list compatible with RFC 4155) and generates an RDF description of it. It is written in Python, using SIOC as the main ontology to represent a mailing list in RDF, and is also available as a Debian package.
SWAML fulfils a much-needed requirement for the Semantic Web: to be able to refer to semantic versions of e-mail messages and their properties using resource URIs. By reusing the SIOC vocabulary for describing online discussions, SWAML allows users of SIOC to refer to e-mail messages from other discussions taking place on SIOC-enabled forums, blogs, etc., such that distributed conversations can eventually occur across these discussion media. Also, by providing e-mail messages in SIOC format, SWAML provides a rich source of data from mailing lists for use in SIOC applications.
Fig. 5.14. The Buxon browser for viewing semantic mailing list data
The SWAML creators have also developed their own applications that work with SIOC mailing list data. The Buxon browser (see Figure 5.14), developed in PyGTK for browsing SIOC forums (in this case mainly mailing lists), is an interesting example of a program using SIOC message data that can come from one or many sources (e.g. from a ‘virtual forum’ or container of posts from multiple sites and systems). For example, the ‘’ example script packaged with
  • 108. python-libgmail can be used to download and convert an inbox from a Google Gmail account to Unix mailbox format, and then ‘’ can be run on that mailbox to convert it to SIOC RDF. The resulting RDF can then be browsed with the Buxon application.
Another interesting SIOC-enabled application is the Mailing List Explorer49. MLE allows the exploration of mailing lists via query, timeline view, etc. It provides RDF representations (including SIOC metadata) for any valid W3C public mailing list archive. A Java-based application for generating SIOC data from mailing list archives has also been developed50, leveraging RSS and Atom feeds from web-based message archives. The application uses the RDFReactor51 library for creating RDF APIs.
Finally, IRC (Internet Relay Chat) can also benefit from Semantic Web technologies. The sioclog52 project aims to record IRC conversations in a machine-readable way using RDF (in particular, by using the SIOC vocabulary). Hence, IRC conversations can be browsed using the same tools as for mailing lists, for example, the Buxon browser described before. Through the MicroTurtle IRC bot53, users can also define links to their FOAF profiles from IRC, thereby identifying IRC content as theirs and lifting this content into the world of Linked Data.
49 (URL last accessed 2009-06-09)
50 (URL last accessed 2009-06-09)
51 (URL last accessed 2009-06-09)
52 (URL last accessed 2009-07-07)
53 (URL last accessed 2009-07-07)
  • 109. 6 Knowledge and information sharing
‘Universal access to all knowledge can be one of our greatest achievements’, according to Brewster Kahle. There have been various efforts to categorise world knowledge and to leverage this using semantic technologies, e.g. through Wikipedia and the DBpedia, and with Cycorp (developers of the Cyc knowledge base of common-sense knowledge) joining the Linking Open Data initiative via their OpenCyc project. There have also been some successes in various question-and-answer systems with pieces of knowledge that can be mined and found. These may be two extreme cases, but the popularity of social websites for organising knowledge shows that the answer lies somewhere in the middle at a sweet spot: some organisation, leveraged via semantics, but not too much. Wikipedia and the DBpedia are a positive step in this direction, and the question-and-answer approach can still be brought closer to the Wikipedia community-created knowledge approach.
6.1 Wikis
Many people are familiar with the Wikipedia1, but fewer know exactly what a wiki is. A wiki is a website which allows users to edit content through the same interface they use to browse it, usually a web browser, although some desktop-based wikis also exist. This facilitates collaborative authoring in a community, especially since editing a wiki does not require advanced technical skills. A wiki consists of a set of web pages which can be connected together by links. Users can create new pages (e.g. if one for a certain topic does not exist), and they can also change (or sometimes delete) existing ones, even those created by other members. The WikiWikiWeb was the first wiki, established by Ward Cunningham in March 1995, and the name is based on the Hawaiian term wiki, meaning ‘quick’, ‘fast’, or ‘to hasten’.
Wikis often act as informational resources, like a reference manual, encyclopaedia, or handbook. 
They amount to a group of web pages where users can add content and others can edit it, relying on cooperation, checks and balances among members, and a belief in the sharing of ideas. This creates a community effort in resource and information management, spreading the ‘voice’ amongst many instead of concentrating it in a few people. Therefore, in contrast to blogs, which reflect the opinions of a pre-defined set of writers (or a single author), wikis
1 (URL last accessed 2009-06-09)
J.G. Breslin et al., The Social Semantic Web, DOI 10.1007/978-3-642-01172-6_6, © Springer-Verlag Berlin Heidelberg 2009
  • 110. 104 The Social Semantic Webuse an open approach whereby anyone can contribute to the value of the commu-nity. Changing a wiki page is quite straightforward. For example, on the MediaWikisystem employed by the Wikipedia, you simply click on the ‘Edit’ tab in an arti-cle, type in the new text, and then click on ‘Save Page’. Wikis employ a simplemarkup system for linking to other articles or for formatting text, e.g. [[Ireland]]will provide a link to the article on Ireland, and putting three single quotes arounda word (i.e. ‘‘‘important’’’) will format the word in bold type. One of the most well-known and highly-used wikis is the Wikipedia free-accessonline encyclopaedia. Wikis are also being used for free dictionaries, book reposi-tories, event organisation, writing research papers, project proposals and evensoftware development or documentation. In this way, the openness of wiki-basedwriting can be seen as a natural follow-up to the openness of source-code modifi-cation. Wikis have become increasingly used in enterprise environments for col-laborative purposes: research projects, papers and proposals, coordinating meet-ings, etc. SocialText2 produced the first commercial open-source wiki solution,and many companies now use wikis as one of their main intranet collaborationtools. However, wikis may break some existing hierarchical barriers in organisa-tions (due to a lack of workflow mechanisms, open editing by anyone with access,etc.) which means that new approaches towards information sharing must be takeninto account when implementing wiki solutions. We shall discuss this more in thechapter dedicated to social software in enterprise environments. 
There are hun-dreds of wiki software systems now available, ranging from MediaWiki3, the sof-tware used on the Wikimedia family of sites, and Eugene Eric Kim’s PurpleWiki4,where fine-grained elements on a wiki page are referenced by purple numbers (aconcept of Doug Engelbart), to Alex Schröder’s OddMuse5, a single Perl scriptwiki install, and WikidPad6, a single-user desktop-based wiki for notes. Many areopen source, free, and will often run on multiple operating systems. The differ-ences between wikis are usually quite small but can include the development lan-guage used (Java, PHP, Python, Perl, Ruby, etc.), the database required (MySQL,flat files, etc.), whether attachment file uploading is allowed or not, spam preven-tion mechanisms, page access controls, RSS feeds, etc. Personal wikis are often used on desktop systems for personal informationmanagement. Such wikis need to be very simple, very fast, and very usable (for‘note-taking on steroids’). As well as WikidPad, popular personal wikis includeTomboy and VoodooPad. Typical uses are for organising notes, links, categorisingto do lists, and appointments.2 (URL last accessed 2009-06-09)3 (URL last accessed 2009-06-09)4 (URL last accessed 2009-06-09)5 (URL last accessed 2009-06-09)6 (URL last accessed 2009-06-09)
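The simple markup mentioned on this slide (double square brackets for a link to another article, three single quotes around a word for bold type) can be illustrated with a toy renderer. This is a sketch for illustration only and is not how MediaWiki parses pages: it handles just these two constructs, uses straight single quotes, and assumes a hypothetical /wiki/ URL prefix with spaces in titles replaced by underscores.

```python
import re
from html import escape


def _link(match):
    """Render one [[Title]] match as an HTML link (hypothetical URL layout)."""
    title = match.group(1)
    return f'<a href="/wiki/{title.replace(" ", "_")}">{title}</a>'


def render(text):
    """Convert '''bold''' and [[Article]] markup to HTML."""
    html = escape(text, quote=False)                   # neutralise raw <, > and &
    html = re.sub(r"'''(.+?)'''", r"<b>\1</b>", html)  # three quotes -> bold
    html = re.sub(r"\[\[([^\]|]+)\]\]", _link, html)   # [[Title]] -> link
    return html


print(render("[[Ireland]] is '''important'''"))
```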
  • 111. 6.1.1 The Wikipedia
The Wikipedia project consists of over 250 different wikis, corresponding to a variety of languages. The English-language one is currently the biggest, with nearly three million pages, but there are wikis in languages ranging from Gaelic to Chinese. A typical wiki page will have two buttons of interest: ‘Edit’ and ‘History’. Normally, anyone can edit an existing wiki article, and if an article does not exist on a particular topic, anyone can create it. If someone messes up an article (either deliberately or erroneously), there is a revision history - as in most wiki engines - so that the contents can be reverted or fixed by the community. Thus, while there is no pre-defined hierarchy in most wikis, content is self-regulated thanks to an emergent consensus within the community, ideally in a democratic way (for example, most wikis include discussion pages where people can discuss sensitive topics).
There is a certain amount of ego-related motivation in contributing to a wiki like the Wikipedia. People like to show that they know things, to fix mistakes and fill in gaps in underdeveloped articles (stubs), and to have a permanent record of what they have contributed via their registered account. By providing a template structure to input facts about certain things (towns, people, etc.), wikis also facilitate this user drive to populate wikis with information.
As well as the Wikipedia, the Wikimedia Foundation has a family of sites including the Wiktionary and Wikibooks. Wikibooks also features some annotated texts; indeed, there is much public domain material available from free book sites such as Project Gutenberg7 that is ripe for annotation through such efforts.
6.1.2 Semantic wikis
We discussed semantic blogging in the previous chapter, but it is not just blog posts that are being enhanced by structured metadata and semantics - this is happening in many other Social Web application areas. 
Wikis such as the Wikipediahave contained structured metadata in the form of templates for some time now,and at least twenty ‘semantic wikis’8 have also appeared to address a growingneed for more structure in wikis. There has been a move from wikis as editors forweb pages to semantic wikis that can act as sophisticated annotation systems (Fig-ure 6.1). The sweet spot lies somewhere in the middle: some structures and anno-tations, but not too much that would discourage users from providing these seman-tics. In his presentation on ‘The Relationship Between Web 2.0 and the SemanticWeb’9, Mark Greaves (from Vulcan Inc. and formerly with DARPA) said that se-7 (URL last accessed 2009-06-09)8 (accessed 2009-06-09)9 (URL last accessed 2009-06-09)
  • 112. 106 The Social Semantic Webmantic wikis are a promising answer to various issues associated with semanticauthoring, by reducing the investment of time required for training on an annota-tion tool and by providing incentives required for users to contribute semanticmarkup (attribution, visibility and reuse by others).Fig. 6.1. The move from wikis to semantic wikis Typical wikis usually enable the description of resources in natural language.By additionally allowing the expression of knowledge in a structured way, wikiscan provide advantages in querying, managing and reusing information. Wikissuch as the Wikipedia have contained structured metadata in the form of templatesfor some time now (to provide a consistent look to the content placed within arti-cle texts), but there is still a growing need for more structure in wikis (e.g. theWikipedia page about Ross Mayfield links to about 25 pages, but it is not possibleto answer a simple question such as ‘find me all the organisations that Ross hasworked with or for’). Templates can also be used to provide a structure for enter-ing data, so that it is easy to extract metadata about the topic of an article (e.g.from a template field called ‘population’ in an article about London). Semantic wikis bring this to the next level by allowing users to create semanticannotations anywhere within a wiki article’s text for the purposes of structured ac-cess and finer-grained searches, inline querying, and external information reuse.Generally, those annotations are designed to create instances of domain ontologiesand their related properties (either explicit ontologies or ontologies that willemerge from the usage of the wiki itself), whereas other wikis use semantic anno-tations to provide advanced metadata regarding wiki pages. Obviously, both layersof annotation can be combined to provide advanced representation capabilities, asshown in Figure 6.2. 
For example, the Semantic MediaWiki system allows peopleto add structured data into pages, such as typed links and attributes (relationshipsand number / text properties respectively). By allowing people to add such extrametadata, the system can then show related pages (either through common rela-tionships or properties, or by embedding search queries in pages). These en-hancements are powered by the metadata that the people enter (aided by semanticwiki engines). A semantic wiki should have an underlying model of the knowledge describedin its pages, allowing one to capture or identify further information about thepages (metadata) and their relations. The knowledge model should be available ina formal language as RDFS or OWL, so that machines can (at least partially)
  • 113. process and reason on it. For example, a semantic wiki would be able to capture that an ‘apple’ article is a ‘fruit’ (through an inheritance relationship) and present you with further fruits when you look at the apple article. Articles will have a combination of semantic data about the page itself (the structure) and the object it is talking about (the content), as shown in Figure 6.2.
Fig. 6.2. The connection between structural (page) metadata and content (relating to the described concept) metadata
Some semantic wikis also provide what is called inline querying. For example, in SemperWiki10, questions such as ‘?page dc:creator EyalOren’ (find me all pages where the creator is Eyal Oren) or ‘?s dc:subject ‘todo’’ (show me all my to do items, as shown in Figure 6.3) are processed as a query when the page is viewed and the results are shown in the wiki page itself (Oren et al. 2006). Also, when defining some relationships and attributes for a particular article (e.g. ‘foaf:gender male’), other articles with matching properties can be displayed along with the article. Moreover, some wikis such as IkeWiki (Schaffert 2006) feature reasoning capabilities, for example, retrieving all instances of foaf:Person when querying for a list of all foaf:Agent(s), since foaf:Person is a subclass of foaf:Agent in the FOAF ontology.
10 (URL last accessed 2009-06-09)
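The inline queries and the subclass-based reasoning described above can be illustrated with a small sketch. It is not how SemperWiki or IkeWiki are implemented: the in-memory triple list, the page names and the helpers `match`, `subclasses_of` and `instances_of` are all hypothetical, chosen only to show how a pattern such as ‘?page dc:creator EyalOren’ can be matched against stored statements, and how a query for a class can also return instances of its subclasses.

```python
# Toy in-memory triple store; all data and helper names are illustrative assumptions.
TRIPLES = [
    ("HomePage", "dc:creator", "EyalOren"),
    ("Roadmap", "dc:creator", "EyalOren"),
    ("Roadmap", "dc:subject", "todo"),
    ("Notes", "dc:creator", "SomeoneElse"),
    ("EyalOren", "rdf:type", "foaf:Person"),
    ("AcmeBot", "rdf:type", "foaf:Agent"),
    ("foaf:Person", "rdfs:subClassOf", "foaf:Agent"),
]


def match(pattern, triples=TRIPLES):
    """Return variable bindings for one triple pattern.

    Terms starting with '?' are variables; anything else must match exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


def subclasses_of(cls, triples=TRIPLES):
    """Return cls plus every class that reaches it through rdfs:subClassOf."""
    found, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for b in match(("?sub", "rdfs:subClassOf", current), triples):
            if b["?sub"] not in found:
                found.add(b["?sub"])
                frontier.append(b["?sub"])
    return found


def instances_of(cls, triples=TRIPLES):
    """Return instances of cls, including instances of its subclasses."""
    return sorted({
        b["?x"]
        for c in subclasses_of(cls, triples)
        for b in match(("?x", "rdf:type", c), triples)
    })


pages_by_eyal = sorted(b["?page"] for b in match(("?page", "dc:creator", "EyalOren")))
todo_items = [b["?s"] for b in match(("?s", "dc:subject", "todo"))]
agents = instances_of("foaf:Agent")
```

Here `agents` contains both the resource typed directly as foaf:Agent and the one typed as foaf:Person, which is the behaviour the reasoning example describes. A real semantic wiki would delegate this to an RDF store and a reasoner, not a list scan.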
  • 114. Fig. 6.3. Performing inline queries for to do items using the SemperWiki semantic wiki system
Finally, just as in the semantic blogging scenario, wikis can enable the Web to be used as a clipboard, by allowing readers to drag structured information from wiki pages into other applications (for example, geographic data about locations on a wiki page could be used to annotate information on an event or a person in one’s calendar application or address book software respectively).
Semantic MediaWiki
One of the most popular semantic wikis is Semantic MediaWiki (Krötzsch et al. 2006), an extension to the popular MediaWiki system. Semantic MediaWiki allows for the expression of semantic data describing the connection from one page to another, and attributes or data relating to a particular page.
Let us take an example of providing structured access to information via a semantic wiki. There is a Wikipedia page about JK Rowling that has a link to ‘Harry Potter and the Deathly Hallows’ (and to other books that she has written), to Edinburgh because she lives there, and to Scholastic Press, her publisher. In a traditional wiki, you cannot perform fine-grained searches on the Wikipedia data set such as ‘show me all the books written by JK Rowling’, or ‘show me all authors that live in the UK’, or ‘what authors are signed to Scholastic’, because the type of each link (i.e. the relationship type) between wiki pages is not defined. In Semantic MediaWiki, you can do this by linking with [[author of::Harry Potter and the Deathly Hallows]] rather than just the name of the novel. There may also be some
  • 115. attribute such as [[birthdate:=1965-07-31]] which is defined in the JK Rowling article.
Such attributes can be used for answering questions like ‘show me authors over the age of 40’ or for sorting articles, since this wiki syntax is translated into RDF annotations when saving the wiki page. Moreover, page categories are used to model the related class for the created instance. Indeed, in this tool, as in most semantic wikis that aim to model ontology instances, not only do the annotations make the link types between pages explicit, but they also make explicit the relationships between the concepts referred to in these wiki pages, thus bridging the gap from documents plus hyperlinks to concepts plus relationships. For instance, in the previous example, the annotation will not model that ‘the page about JK Rowling is the author of the page about Harry Potter and the Deathly Hallows’ but rather that ‘the person JK Rowling is the author of the novel Harry Potter and the Deathly Hallows’.
Since Semantic MediaWiki is completely open in the terms used for annotating content, the underlying data model, i.e. the different ontologies used to model the instances, evolves according to user behaviour. For example, each ‘Category’ page leads to a new class in the ontology. However, extracted data may be subject to heterogeneity problems. For instance, some users will use [[author of::somebookname]] while others will prefer [[has written::somebookname]], leading to problems when querying data since the two relationships are formally distinct despite the common idea underlying them. Other wikis such as OntoWiki (Auer et al. 2006), IkeWiki or UfoWiki (Passant and Laublet 2008a) can assist the user when modelling semantic annotations, in order to avoid those heterogeneity issues and to provide data that is based on pre-defined ontologies so that it can be more easily and efficiently re-used for querying and navigating the wiki.
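To make the annotation syntax concrete, the following is a minimal sketch of how the two forms quoted above, a typed link written with ‘::’ and an attribute written with ‘:=’, could be pulled out of wiki text as subject-property-value statements. It is only an illustration under stated assumptions: the regular expression is a simplification, `extract_statements` is a hypothetical helper, and Semantic MediaWiki’s own parser and RDF export are more involved.

```python
import re

# Matches [[property::target]] (typed link) and [[property:=value]] (attribute),
# the two forms quoted in the text. Plain links such as [[Edinburgh]] are skipped.
ANNOTATION = re.compile(r"\[\[([^\[\]:|]+)(::|:=)([^\[\]|]+)\]\]")


def extract_statements(page_title, wikitext):
    """Return (subject, property, value, kind) tuples found in wikitext."""
    statements = []
    for prop, sep, value in ANNOTATION.findall(wikitext):
        kind = "relation" if sep == "::" else "attribute"
        statements.append((page_title, prop.strip(), value.strip(), kind))
    return statements


text = (
    "JK Rowling is the [[author of::Harry Potter and the Deathly Hallows]] "
    "and has the attribute [[birthdate:=1965-07-31]]. See also [[Edinburgh]]."
)
statements = extract_statements("JK Rowling", text)
```

The heterogeneity problem mentioned above shows up directly in such output: a page annotated with ‘has written’ instead of ‘author of’ would yield a statement with a different property name, and a query over ‘author of’ alone would miss it.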
OntoWiki
OntoWiki11 is a semantic wiki developed by the AKSW research group at the University of Leipzig that also acts as an agile ontology editor and distributed knowledge engineering application. Unlike other semantic wikis, OntoWiki relies more on form-based mechanisms for the input of structured data rather than using syntax-based or markup-based inputs. One of the advantages of such an approach is that complicated syntaxes for representing structured knowledge can be hidden from wiki users and therefore syntax errors can be avoided.
OntoWiki visually presents a knowledge base as an information map, with different views on available instance data. It aims to enable intuitive authoring of semantic content, and also features an inline editing mode for editing RDF content, similar to WYSIWYG for text documents. As with most wikis, it fosters social
11 (URL last accessed 2009-07-16)
  • 116. collaboration aspects by keeping track of changes and allowing users to discuss any part of the knowledge base, but OntoWiki also enables users to rate and measure the popularity of content, thereby honouring the activities of users.
OntoWiki enhances the browsing and retrieval experience by offering semantically-enhanced search mechanisms. Such techniques can decrease the entrance barrier for domain experts and project members to collaborate using semantic technologies. OntoWiki is open source and is based on PHP and MySQL.
6.1.3 DBpedia
While it is not a wiki per se, it is worth mentioning DBpedia in relation to wikis and semantic wikis. DBpedia12 provides an RDF export of the Wikipedia and can be seen as one of the core components of the Linking Open Data project. It has been created by exporting the ‘infoboxes’ (i.e. metadata entered on various articles for pre-defined template structures) from various language versions of Wikipedia and linking them together. By weaving Wikipedia articles and related objects into the Semantic Web, DBpedia defines URIs for many concepts so that people can use them in their semantic annotations. For example, one can define that they are interested in the Semantic Web and located in France by writing these two triples:
:me foaf:topic_interest <>
:me geonames:locatedIn <>
The DBpedia data set is freely available for download and it also provides a public SPARQL endpoint so that anyone can interact with it for advanced querying capabilities. For example, to identify all actors born in New York that starred in a movie directed by Quentin Tarantino, one would usually have to browse tens of web pages. With DBpedia, a single SPARQL query can be used to find the answer to that question. The following query can be posed at the DBpedia SPARQL endpoint13 to identify the relevant movies and actors, leading to the answers shown in Figure 6.4.
SELECT DISTINCT ?movie ?person WHERE {
  ?movie dbpedia-owl:director <> .
  ?movie dbpedia-owl:starring ?person .
  ?person dbpedia-owl:birthplace <> .
}
12 (URL last accessed 2009-06-09)
13 (URL last accessed 2009-07-16)
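As a hedged illustration of how such a query might be prepared for an endpoint, the sketch below fills two resource URIs into the same three-pattern query and encodes it as an HTTP GET request. Everything concrete in it is an assumption and not taken from the text: the namespace bound to the dbpedia-owl prefix, the placeholder resource URIs under example.org, the placeholder endpoint address, and the helper names `build_query` and `build_request_url`. The sketch only builds strings and does not contact any service.

```python
from urllib.parse import urlencode

# Assumption: the namespace bound to the dbpedia-owl prefix.
PREFIXES = "PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>\n"


def build_query(director_uri, birthplace_uri):
    """Fill the two resource URIs into the query pattern from the text."""
    return (
        PREFIXES
        + "SELECT DISTINCT ?movie ?person WHERE {\n"
        + f"  ?movie dbpedia-owl:director <{director_uri}> .\n"
        + "  ?movie dbpedia-owl:starring ?person .\n"
        + f"  ?person dbpedia-owl:birthplace <{birthplace_uri}> .\n"
        + "}"
    )


def build_request_url(endpoint, query):
    """Encode the query as the 'query' parameter of an HTTP GET request."""
    return endpoint + "?" + urlencode({"query": query})


query = build_query(
    "http://example.org/resource/SomeDirector",  # placeholder URI
    "http://example.org/resource/SomeCity",      # placeholder URI
)
url = build_request_url("http://example.org/sparql", query)  # placeholder endpoint
```

The three triple patterns share the variables ?movie and ?person, so the endpoint only returns pairs for which all three conditions hold at once; this join is what replaces browsing many separate web pages.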
  • 117. Fig. 6.4. Results of a SPARQL query for movie information from DBpedia
An interesting application related to DBpedia is DBpedia Mobile14, a ‘location-centric DBpedia client application for mobile devices’ that consists of a map interface, the Marbles Linked Data Browser15 and a GPS-enabled launcher application. The application displays nearby DBpedia resources (from a set of 300,000) based on a user’s geolocation as recorded through his or her mobile device. Efforts are also ongoing towards allowing DBpedia to feed new content back into the Wikipedia16 (e.g. by suggesting new values for infoboxes, or by contributing back new maps created via DBpedia Mobile). Other applications can make use of DBpedia to provide third-party services, for example, Faviki17 allows users to bookmark web pages using common terms from the Wikipedia (as extracted by DBpedia).
6.1.4 Semantics-based reputation in the Wikipedia
As a global, independent and neutral framework to which we can all contribute content, Wikipedia could serve as the basis for a de-facto global and open reputation system. At the moment, Wikipedia does not provide much information on people’s reputations, i.e. those who make changes to articles are not very visible on Wikipedia and are not treated as experts as such. On the Wikipedia website, it is often the case that the contributor who may know the most about an article is not clearly identified in the Wikipedia article as being the foremost expert.
There have been various attempts to establish reputation sites on the Web, e.g. Naymz18, which may help a person to improve their visibility in search engines. However, there is a problem with these sites in that a person’s reputation can only be truly reflected online if they regularly contribute to the site and maintain an up-to-date version of their profile with all of their achievements.
Another issue is that people who already have a good reputation will most probably not join these sites,
14 (URL last accessed 2009-06-09)
15 (URL last accessed 2009-07-16)
16 (URL last accessed 2009-07-07)
17 (URL last accessed 2009-07-14)
18 (URL last accessed 2009-06-09)
  • 118. perhaps due to time constraints, or if reputation is related to the number of connections or endorsements one has (which may be by invitation).
Wikipedia can be improved by the addition of a global reputation system with embedded semantics. This could be achieved by placing larger emphasis on the discussion pages in the Wikipedia, and by introducing threaded structures in these pages from which expertise would emerge. For example, experts could emerge from their actions in discussion pages when their suggested changes have been accepted, highlighting those who made the best changes on the article page itself.
If we include microcontent such as microformats or RDFa in these pages, we solve two problems at one stroke: (1) Wikipedia benefits from a richer reputation framework where people can be motivated to add contextual semantic information to make their content more searchable (directly benefiting their own reputations), and (2) this can also move forward the Semantic Web, by solving the issue of who will be motivated to add the semantics to the Semantic Web and why. This information can also be used to power services like Garlik’s QDOS19 that aim to measure people’s ‘digital status’ or estimated online rating. Related work on Wikipedia and trust or reputation measures has been described by (McGuinness et al. 2006) and (Adler and de Alfaro 2007).
6.2 Other knowledge services leveraging semantics
We shall now discuss some other knowledge services that are benefiting from their usage of semantic technologies, including the Twine service from Radar Networks, the Internet Archive, Freebase, and OpenLink Data Spaces.
6.2.1 Twine
Radar Networks is one of a number of startup companies that are practically applying Semantic Web technologies to social software applications. Radar’s flagship product is called Twine, and the company is led by CEO Nova Spivack20.
In 2003, Radar developed a desktop-based semantic tool called ‘Personal Radar’, a personal assistant for knowledge sharing. It was effectively a Java-based P2P version of ‘Twine’ powered by RDF and with some appealing visualisations. At the time, most venture capitalists were not interested, but Radar received angel funding from Vulcan Capital (the founder Paul Allen is said to believe that adding structure to the Web is inevitable).
19 (URL last accessed 2009-06-09)
20 (URL last accessed 2009-06-09)
  • 119. The Twine service allows people to share what they know and can be thought of as a knowledge networking application that allows users to share, organise, and find information with people they trust. People create and join ‘twines’ (community containers) around certain topics of interest, and items (documents, bookmarks, media files, etc., that can be commented on) are posted to these twines through a variety of methods. Twine has a number of novel and useful functions that elevate it beyond the social bookmarking sites to which it has been compared, including an extensive choice of twineable item types, twined item customisation (‘add detail’ allows user-chosen metadata fields to be attached to an item) and the ‘e-mail to a twine’ feature (enabling twines to be populated through messages sent to a custom e-mail address).
The focus of Twine is these interests. Where Facebook is often used for managing one’s social relationships and LinkedIn is used for connections that are related to one’s career, Twine can be used for organising one’s interests. Spivack also calls this ‘interest networking’ as opposed to social networking.
With Twine, one can share knowledge, track interests with feeds, carry out information management in groups or communities, build or participate in communities around one’s interests, and collaborate with others. The key activities are ‘organise, share and discover’. Twine allows people to find things that might be of interest to them based on what they are doing. The key ‘secret sauce’ according to Spivack is that everything in Twine is generated from an ontology. Even the site itself - user interface elements, sidebars, navigation bar, buttons, etc. - comes from an application-definition ontology.
Similarly, the Twine data is modelled on a custom ontology.
However, Twine is not just limited to these internal ontologies, and Radar is beginning the process of bringing in other external ontologies and using them within Twine. At a later stage, they hope to allow people to make their own ontologies (e.g. to express domain-specific content) resulting in the Twine community having a more extensible infrastructure.
Twine performs natural language processing on text, mainly providing automatic tagging with semantic capabilities. It has an underlying ontology with a million instances of thousands of concepts to generate these tags (at present, Twine is exposing just some of these). Radar are also working on statistical-analysis and machine-learning approaches for clustering of related content to show people, items and interests that are related to each other (for example, to give information to users such as ‘here are a selection of things that are all about movies you like’).
Twine search also has semantic capabilities. For example, bookmarks can be filtered by the companies they are related to, or people can be filtered by the places they are from. Underneath Twine, a lot of research work on scaling has been carried out, but it is not trying to index the entire Web. However, Twine does pull in related objects (e.g. from links in an e-mail), thereby capturing information around the information that you bring in and that you think is important.
Twine wants to bring semantics to the masses, and therefore it is not just aiming at Semantic Web enthusiasts but rather at mainstream users. The interface has
  • 120. to be simple so that someone who knows nothing about structured data or automatic tagging should be able to figure out in a few minutes or even seconds how to use it.
Individuals are Twine’s first target market, allowing them to author and develop rich semantic content. For example, this could be a professional who has a particular interest in some technical subject that is outside the scope of what they are doing at the moment. However, such a service becomes more valuable when users are connected to other people, if they join groups, thereby giving a richer network effect. The main value proposition for these users is that they can keep track of things they like and people they know, and capture knowledge that they think is important.
When groups start using Twine, collective intelligence begins to take place (by leveraging other people who are researching material, finding items, testing, commenting, etc.). It is a type of communal knowledge base similar to other services like Wikia or Freebase. However, unlike many public communal sites, in Twine more than half of the data and activities are private (60%). Therefore privacy and permission control are very important, and they are deeply integrated into the Twine data structures. Since Twine left beta, public twines have become visible to search engines and SEO has been applied to increase the visibility of this content.
Twine is powered by Java, PostgreSQL and WebDAV. Since relational databases are not optimised for the ‘shape’ of semantic data that is being stored in Twine, the data store had to be tweaked. Twine uses an eight-element tuple store (subject-predicate-object, provenance, time stamp, confidence value, and other statistics about the triple or item itself). Predicate inferencing can be performed across statements for access control, etc.
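To make the idea of an eight-element tuple more tangible, here is a minimal sketch of what one such record might look like. It is not Twine’s actual schema. The first six fields follow the list given above; the text only mentions ‘other statistics’ for the remainder, so the last two fields (`view_count` and `rating`), the class name `Statement`, the helper `confident` and all sample values are hypothetical.

```python
from dataclasses import dataclass, astuple
from datetime import datetime, timezone


@dataclass(frozen=True)
class Statement:
    """One row of a hypothetical eight-element tuple store."""

    subject: str
    predicate: str
    object: str
    provenance: str       # who or what asserted the statement
    timestamp: datetime   # when it was asserted
    confidence: float     # 0.0 to 1.0
    # The text only mentions "other statistics"; the next two fields are guesses.
    view_count: int = 0
    rating: float = 0.0


def confident(statements, threshold=0.8):
    """Keep only statements whose confidence meets the threshold."""
    return [s for s in statements if s.confidence >= threshold]


now = datetime(2009, 6, 9, tzinfo=timezone.utc)
store = [
    Statement("item:42", "dc:subject", "movies", "user:alice", now, 1.0),
    Statement("item:42", "dc:subject", "cinema", "auto-tagger", now, 0.6),
]
trusted = confident(store)
width = len(astuple(store[0]))  # number of elements per record
```

Keeping provenance and confidence next to each triple is what allows, for example, automatically generated tags to be treated differently from statements a user entered by hand.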
Some of the feature requests for Twine include import capabilities, interoperability with other applications, and the aforementioned ability to use other ontologies. At the moment, Twine works with e-mail (sending notifications out and allowing the twining of e-mails sent in), RSS (pushing feeds out), and browsers (e.g. for bookmarking). There have been various requests for interoperability with mind maps, various databases, and enterprise applications. Twine currently has a REST API which allows people to make their own add-ons.
In terms of data interoperability, semantic data can be obtained from Twine in RDF for reuse elsewhere (by appending ‘?rdf’ to the end of any Twine URL). Having already hardcoded some interoperability with services like Amazon.com and provided import functionality from, Radar are also looking at potential adaptors to other services including Digg, desktop bookmark files, Outlook contacts, Lotus Notes, Exchange and Freebase.
With such a service, there is a requirement for duplication detection. Most people submit similar bookmarks and it is reasonably straightforward to identify these, e.g. when the same item is arrived at through different paths on a website and has different URLs. However some advanced techniques are required when the content is similar but comes from different locations on the Web.
Referring to our earlier discussion on object-centred sociality in Chapter 3, there is great potential in the community aspects of twines. These twines can act
  • 121. as ‘social objects’ that will draw people back to the service in a much stronger manner than other social bookmarking sites currently do (in part, this is due to there being a more identifiable home for these objects and also due to the improved commenting facilities that Twine provides).
Radar is focussing on advertising as the first revenue stream for Twine. Since Twine has semantic profiles for both users and groups, it can understand and leverage their interests quite effectively. Radar will be pilot testing sponsored content or advertisements in Twine based on these interests. According to Spivack, if something is extremely relevant to your interests, then it is almost as valuable as content (even if it is sponsored). When Radar began work on the Twine application, they also started working on a commercial version of the underlying platform. One of their aims is to allow non-Semantic Web savvy people to build applications that use the Semantic Web without having to do any programming.
6.2.2 The Internet Archive
The Internet Archive21 is home to various types of media archived on the Web, from books and web pages to audio and video content. They also host legally-downloadable software titles (e.g. old software that can be reused or replayed via virtual machines or emulators). Media can be reviewed, rated and bookmarked for sharing with other users. Due to the huge amount of both textual and multimedia content on the site, there are many advantages to leveraging semantics and interlinking the media in such a huge data source.
According to Brewster Kahle, co-founder of the Internet Archive, one book is approximately 1 MB in size, so all of the books in the US Library of Congress (26 million of them) would correspond to about 26 TB (with images, that figure would be somewhat larger)22. At present, it costs about $30 to scan a book in the US.
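The storage estimate attributed to Kahle is simple arithmetic and can be checked in a few lines. The sketch assumes decimal units (1 TB = 1,000,000 MB). The total scanning cost it computes is not a figure from the text; it is only the product of the two quoted numbers, shown to give a sense of scale.

```python
BOOKS = 26_000_000    # books in the US Library of Congress, as quoted
MB_PER_BOOK = 1       # approximate size of one book, as quoted
USD_PER_BOOK = 30     # quoted cost of scanning one book in the US

MB_PER_TB = 1_000_000  # assumption: decimal units

total_tb = BOOKS * MB_PER_BOOK / MB_PER_TB
total_scan_cost_usd = BOOKS * USD_PER_BOOK  # derived here, not stated in the text
```

Under these assumptions the collection comes to 26 TB, matching the quoted figure, while scanning every book at the quoted rate would cost several hundred million dollars.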
For about 10 cents a page, books or microfilm can now be scanned at various centres around the United States and put online. 250,000 books have been scanned in so far and are held in eight online collections. Such books can also be made available to recipients of laptops through the OLPC project23. However, most people like having printed books, so book mobiles for print-on-demand books are beginning to appear. Such a book mobile charges just $1 to print and bind a short book.
There are a number of issues related to putting audio or recorded sound works online. At best, there are two to three million discs that have been commercially distributed, but the issue with putting these online is in relation to rights. The Internet Archive has 100,000 items in 100 collections. Audio costs about $10 per
21 (URL last accessed 2009-06-09)
22 (URL last accessed 2009-06-09)
23 (URL last accessed 2009-06-09)
  • 122. disk (roughly one hour) to digitise, so about a third of the price of a book. Rock ‘n’ roll concerts are the most popular category of the Internet Archive audio files (with 40,000 concerts so far); for ‘unlimited storage, unlimited bandwidth, forever, for free’, the Internet Archive offers bands their hosting service if they waive any issues with rights. There are various cultural materials that do not work well in terms of record sales, but there are many people who are very interested in having these published online via services such as the Internet Archive.
Video makes up another large portion of the Internet Archive with 55,000 videos in 100 collections. Most people think of Hollywood films in relation to video, but at most there are 150,000 to 200,000 video items that are designed for movie theatres, and almost half of these are from India. Many films are locked up in copyright, and are therefore problematic. The Internet Archive has about 1,000 of these films (out of copyright or otherwise permitted). However, there are many other types of video materials that people want to see: thousands of archival films, advertisements, training films and government films that have been downloaded from the website millions of times. Academics can also put copies of their video lectures online at the Internet Archive.
Video costs about $15 per hour of content for digitisation services. There are an estimated 400 channels of ‘original’ television content (ignoring duplicate rebroadcasts), but if you were to record a television channel for one year, it would require about 10 TB of data with a cost of $20,000 for that year. The Television Archive24 team from the Internet Archive have been recording 20 channels from around the world since 2000 (it is currently about a petabyte in size).
This corresponds to about 1.5 million hours of TV, but little has been made publicly available due to copyright reasons (apart from video recorded during the week of the 9/11 attacks).
The Internet Archive is probably best known for archiving web pages. Their ‘Wayback Machine’ archive25 started in 1996, by taking a snapshot of every accessible page on a website. It is now about 2 PB in size, with over 100 billion pages. Most people use this service to find their old materials again, since most people ‘don’t keep their own materials very well’ according to Kahle (e.g. Yahoo! came to the Internet Archive to get a 10-year-old version of their own homepage).
Preservation or how to keep all of these materials available to the public is an important task for the Internet Archive. The Internet Archive in San Francisco has four employees and 1 PB of storage: including the power bill, bandwidth and people costs, their total costs are about $3 million per year; 6 GB bandwidth is used per second; and their storage hardware costs $700,000 for 1 PB. They have a backup of their book and web materials in Alexandria (somewhat unfortunately known for its ancient destroyed library), and also store audio material at the European Archive in Amsterdam. Also, their Open Content Alliance initiative allows
24 (URL last accessed 2009-06-09)
25 (URL last accessed 2009-06-09)
  • 123. various people and organisations to come together to create joint collections for all to use.
Search is now beginning to make inroads in terms of time-based search, and this is particularly relevant to archives of content with strong temporal aspects like the Internet Archive. For example, one can examine how words and their usage change over time (e.g. ‘marine life’). Semantic Web applications for accessing and searching information in the Internet Archive can help people to deal with the huge onslaught of information on the site. There is a need to take large related subsets of the Internet Archive collections and to help them make sense for people.
Much work has been carried out on both wikis and search (and even on combining these services, e.g. Google SearchWiki26), but according to Brewster Kahle there is a need to ‘add something more to the mix’ to bring more structure to the Internet Archive project. This may involve combining the ease of access and authoring from the wiki world with computer-aided ways to incorporate the structures that we all know are in there. Such methods should be flexible enough so that people can add structure one item at a time or so that computers can be employed to help with this task.
For example, in the recent joint initiative27, the idea is to build one web page for every book ever published (not just ones still for sale) to include content, metadata, reviews, etc. The relevant concepts in this project include: creating Semantic Web concepts for authors, works and entities; having wiki-editable data and templates; using a tuple-based database with history; and making it all available in open source (both the data and the Python code).
has over 10 million book records, with 250,000 of them containing the full texts.
6.2.3 Powerset
Natural language search company Powerset is another in a generation of startups that are employing semantic technologies to augment the way that people can access knowledge and information. The company’s first product was a semantic search and discovery tool for the Wikipedia social website, and Powerset was acquired by Microsoft in 2008.
Barney Pell, chief technical officer of Powerset, believes that natural language can help with the realisation of the Semantic Web28, especially both sides of the chicken-and-egg problem (the chicken and the egg). On one side, annotations can be created from unstructured text, and ontologies can be generated, mapped and linked. On the other side, natural language search can consume Semantic Web information, and can expose Semantic Web services in response to natural language queries.
26 (URL last accessed 2009-06-09)
27 (URL last accessed 2009-06-09)
28 (URL last accessed 2009-06-09)
  • 124. The self-stated goal of Powerset is to enable people to interact with information and services as naturally and effectively as possible, by combining natural language and scalable search technology. Natural language search interprets the Web, indexes it, interprets queries, searches and matches. Historically, search has matched query intents with document intents, and a change in the document model has driven the latest innovations. The first is proximity: there has been a shift from documents being a ‘bag of keywords’ to becoming a ‘vector of keywords’. The second is in relation to anchor text: adding off-page text to search is next.
Documents are loaded with linguistic structure that is mostly discarded and ignored (due to cost and complexity), but it has immense value. A document’s intent is actually encoded in this linguistic structure, from which Powerset’s semantic indexer extracts meaning. Converging trends that are enabling this natural language search are emerging language technologies themselves, lexical and ontological knowledge resources, Moore’s law, open-source software, and commodity or cloud computing. Powerset integrates not just text from websites but diverse types of resources, e.g. newsfeeds, blogs, archives, metadata, video, and podcasts. It can also do real-time queries on databases, where a natural language query is converted into a database query to give results that can drive further engagement.
As an example of how Powerset works, when the query ‘Sir Edward Heath died from what’ is entered, the system parses each sentence; extracts entities and semantic relationships; identifies and expands these to similar entities, relationships and abstractions; and then indexes multiple facts for each sentence. The first fact returned from the Wikipedia says ‘Heath died from pneumonia’.
Multiple queries on the same topic to Powerset will retrieve the same ‘facts’ (e.g. the query ‘what killed Edward Heath’ returns the same fact). The information on the various entities or relationships can also come from multiple sources, e.g. information on Edward Heath or Deng Xiaoping may be from Freebase and details on pneumonia can come from WordNet. Powerset can also handle more abstract queries that would be difficult to express or perform in conventional keyword search, such as ‘who said something about WMDs?’
Powerset have stated that they will provide various APIs to the developer community and will give access to their technologies to build mashups and other applications. Powerset’s other community contributions will be in the form of data sets, annotations, and open-source software.
Powerset’s language technologies are the result of commercialising the XLE work from PARC, leveraging their ‘multidimensional, multilingual architecture produced from long-term research’. Some of the main challenges for Powerset have been in the areas of scalability, systems integration, incorporating various data and knowledge resources, and enriching the user experience. According to chief operating officer Steve Newcomb29, it takes more computing power to parse
29 (URL last accessed 2009-06-09)
  • 125. semantics than to simply index, and nearly 20 percent of Powerset’s ongoing budget is spent on computing resources. Pre-acquisition, Powerset’s commercial model was based on advertising (like most search engines) and on licensing their technologies to other companies or search engines.
6.2.4 OpenLink Data Spaces
OpenLink Data Spaces (ODS)30 is a commercial semantically-powered collaboration platform that leverages popular Semantic Web vocabularies including FOAF, SIOC, SKOS and MOAT. ODS SPARQL endpoints provide access to semantic instance data from a range of ODS application instances, including blogs, wikis, aggregated feeds (RSS 1.0, 2.0 and Atom), shared bookmarks, discussions (i.e. comment threads), photo galleries, briefcases (e.g. WebDAV file servers), etc. The associated MyOpenLink.net31 service is an example of an ODS-based service that can expose semantic instance data to SPARQL query service clients.
ODS exposes all its data in the form of real or virtual RDF graphs via its Virtuoso32-based quad store. There are a number of modules for the OpenLink Data Spaces (ODS) platform that each export semantic metadata (e.g. using the SIOC vocabulary), including ODS-Blog, ODS-Wiki, ODS-Bookmarks, ODS-AddressBook, ODS-Calendar, ODS-Polls, ODS-Gallery (for photos), ODS-Feeds (for feed aggregation and exposure via SIOC), and ODS-Discussion (for comments across blogs, wikis or any other data space that supports some form of commenting). OpenLink have also released an EC2 / S3 Amazon Image-version of their Virtuoso product, which includes semantic data support: ‘your blogs, wikis, bookmarks, etc. are based on the SIOC ontology (think open social graph++)’. We will introduce SIOC in more detail later on.
6.2.5 Freebase
The open collaborative knowledge database Freebase was launched by San Francisco-based Metaweb Technologies in 2007.
Founded by Danny Hillis, co-founder of Thinking Machines and Applied Minds, and Robert Cook, a former video game developer, Metaweb has received nearly $60 million in funding.

Freebase has been described by Metaweb as a ‘massive collaboratively-edited database of cross-linked data’, and aims to become ‘the world’s database, with all of the world’s information.’ At present, Freebase mainly incorporates community-created
  • 126. data combined with data imported from open access repositories including Wikipedia and MusicBrainz (an open-content music database and associated set of tools for analysing patterns in music). However, the company have also said that Freebase could be used for proprietary or commercial data, thereby potentially providing an additional revenue stream from such a service (and mirroring similar intentions from Radar Networks, whose public Twine service may later be repackaged for organisational use).

Freebase organises its data and categories of data in ontology-like structures called ‘Freebase Types’, based on a graph model. Any user can create and modify their own types and associated properties, and these can be promoted for adoption by administrators of the relevant domains that the type belongs to.

Freebase data is licensed under the Creative Commons Attribution license. Data can be accessed via a JSON-based API such that third parties can develop remote applications to leverage Freebase data. Data can be queried using the Metaweb Query Language (MQL).

Recently, Freebase announced the availability of all of its data in RDF, thereby joining efforts in the Semantic Web community on the Linking Open Data project. Various projects, such as DBpedia, now provide links to Freebase concepts, since each Freebase concept has its own URI that can be referenced by external applications.
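To make the idea of a JSON-based query language more concrete, the following is a minimal sketch of how an MQL-style read query might be assembled in Python. It rests on several assumptions that are not taken from the text above: the type identifier `/music/artist`, the property names `id` and `album`, and the outer `query` envelope are illustrative, and the function name `build_mql_read` is hypothetical. The sketch only builds and prints the JSON; it does not contact any service.

```python
import json


def build_mql_read(artist_name):
    """Return a JSON string holding an MQL-style read query for one artist."""
    query = {
        "type": "/music/artist",  # assumed type identifier
        "name": artist_name,      # a filled-in value acts as a constraint
        "id": None,               # null asks for a single value to be filled in
        "album": [],              # an empty list asks for all matching values
    }
    # The outer "query" key is an assumed request envelope.
    return json.dumps({"query": query}, sort_keys=True)


if __name__ == "__main__":
    print(build_mql_read("The Clash"))
```

The design point worth noting is that the query is itself a template of the desired answer: known values constrain the match, while empty slots mark what should be returned.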
  • 127. 7 Multimedia sharing

As we have seen so far in this book, a key feature of the Social Web is the change in the role of a user from simply being a consumer of content to also being a producer of it. Furthermore, it is not just textual content that can be shared, annotated or discussed, but also any multimedia content such as pictures, videos, or even presentation slides. Moreover, this content can also benefit from Semantic Web technologies. In this chapter, we will describe various trends regarding multimedia sharing on the Social Web and we will focus on how Semantic Web technologies can help to provide better interlinking between multimedia content from different services.

7.1 Multimedia management

There is an ever-increasing amount of multimedia of various formats becoming available on the Social Web. Current techniques to retrieve, integrate and present these media items to users are deficient and would benefit from improvement. Semantic technologies make it possible to give rich descriptions to media, facilitating the process of locating and combining diverse media from various sources. Making use of online communities can give additional benefits. Two main areas in which social networks and semantic technologies can assist in multimedia management are annotation and recommendation. Some efforts such as DBTune already provide musical content exported to the Semantic Web for music-based recommendations. We shall describe these efforts in more detail later on in this chapter.

Social tagging systems allow users to assign shared free-form tags to resources, thus generating annotations for objects with a minimum amount of effort. The informal nature of tagging means that semantic information cannot be directly inferred from an annotation, as any user can tag any resource with whatever strings they wish. However, studying the collective tagging behaviour of a large number of users allows emergent semantics to be derived (Wu et al.
2006). Through a combination of such mass collaborative ‘structural’ semantics (via tags, geo-temporal information, ratings, etc.) and extracted multimedia ‘content’ semantics (which can be used for clustering purposes, e.g. image similarities or musical patterns), relevant annotations can be suggested to users when they contribute

J.G. Breslin et al., The Social Semantic Web, DOI 10.1007/978-3-642-01172-6_7, © Springer-Verlag Berlin Heidelberg 2009

  • 128. multimedia content to a community site by comparing new items with related semantic items in one’s implicit and explicit networks.

Another way in which the wisdom of crowds can be harnessed in semantic multimedia management is in providing personalised social network-based recommender systems. (Liu et al. 2006) presents an approach for semantic mining of personal tastes and a model for taste-based recommendation. (Ghita et al. 2005) explores how a group of people with similar interests can share documents and metadata, and can provide each other with semantically-rich recommendations. The same principles can be applied to multimedia recommendation, and these recommendations can be augmented with the semantics derived from the multimedia content itself (e.g. the information on those people depicted or carrying out actions in multimedia objects).

7.2 Photo-sharing services

As soon as people began to use digital cameras for taking pictures, they tended to publish them on the Web. However, installing dedicated applications such as Gallery or having one’s own storage space on the Web requires some technical expertise, thereby limiting the picture-sharing experience to only a few users. Similar to blogging platforms that provide simple mechanisms for people who want to publish their thoughts online without technical requirements, Social Web applications that let people easily publish, tag and share pictures began to appear, with Flickr being one of the most popular. Flickr, now owned by Yahoo!, allows you to upload pictures by selecting some images from your hard drive, to add text descriptions and tags, and to mark regions of interest on a photo by annotating them (‘add note’).

As well as offering tagging and commenting mechanisms, Flickr allows users to organise their pictures into browsable sets. Pictures can be searched by date (by upload date or by the real ‘taken on’ date using EXIF metadata), tag, description, etc.
Flickr offers control mechanisms for deciding who can access photos, and one can define each picture’s visibility (private, public, only friends, only family). As well as the web interface, pictures can be uploaded to Flickr by e-mail or using desktop utilities, and users can display thumbnails of pictures on their blog or website using ‘badges’.

Millions of pictures are now available on Flickr, and upload statistics on the Flickr homepage show thousands of pictures being uploaded each minute. Thanks to camera phones and custom uploading applications, many of these incorporate automatic geolocation metadata, such that people can publish pictures as soon as
  • 129. they take them on the street, underground or anywhere and these are then automatically linked to a particular place on a map.

Apart from the uploading and storage facilities that Flickr offers, an important feature of the service is its social aspect. Flickr offers social networking functionality in the form of adding friends and exchanging messages with them. Pictures can not only be seen by anyone but they can also be subject to conversation. Groups can even be created, to foster a community around a particular topic, following the idea of object-centred communities that we mentioned earlier on in this book. For example, the ‘Squared Circle’ group, dedicated to pictures of circular things, has nearly 6,500 members and 83,000 pictures, with related discussion threads.

There are some limitations with Flickr. You cannot export your data easily, and you cannot modify or edit your pictures (apart from rotation). You have to pay if you want: to allow higher resolution viewing of your images; to create more than three photo sets; to be able to post to more than 10 groups; or to upload many (large) pictures, since the free version is limited to 100 MB of data transfer per month. As a result, some other feature-rich services have become quite popular, including Zooomr.

7.2.1 Modelling RDF data from Flickr

While Flickr does not natively expose any data in RDF, various exporters have been written to provide semantically-enhanced data from this popular photo sharing service. As with many other social websites, Flickr provides an API for developers, and RDFizers (tools for converting from various data formats to RDF) can be written based on this API.

For example, the FlickRDF exporter (Passant 2008a) provides a representation of Flickr social networks and related user-generated content in RDF, mainly using the FOAF and SIOC ontologies.
Therefore, it allows one to export their Flickr connections in FOAF so that they can be related to connections from their personal FOAF profile or from other social websites providing data in RDF (such as Twitter and its related FOAF exporter), enabling the construction of a distributed social graph as detailed in Chapter 10. The exporter relies on FOAF and SIOC as follows:

It uses FOAF to model people as instances of foaf:Person, as well as the various relationships between people using the foaf:knows relationship. Depending
  • 130. on how much information is publicly available, it can provide more information, such as the person’s name using the foaf:name property.

It uses SIOC to model the related user account (sioc:User) as well as the various user galleries that belong to it, using sioc:owner_of and sioc_t:ImageGallery. SIOC is also used to model the various groups a user belongs to, using the sioc:member_of property and the sioc:Usergroup class.

Some sample metadata from the FlickRDF exporter is given below:

    <> a foaf:Person ;
        foaf:name "Alexandre Passant" ;
        foaf:mbox_sha1sum "80248cbb1109104d97aae884138a6afcda688bd2" ;
        foaf:holdsAccount <> ;
        foaf:knows <> ;
        foaf:knows <> ;
        foaf:knows <> ;
        foaf:knows <> ;
        foaf:knows <> ;
        foaf:knows <> ;
        sioc:member_of <> ;
        sioc:member_of <> ;
        sioc:member_of <> .

Moreover, in order to provide global interlinking with other RDF data, the exporter also relies on other ontologies and data sources such as GeoNames to model the geolocation of a user (based on the Flickr information from the user profile). By providing such a complete export, one can for example identify all Flickr galleries owned by a friend-of-a-friend who lives in France.

Other ways of obtaining semantically-enhanced data from Flickr also exist. The Flickr Wrappr provides information regarding pictures related to any
  • 131. DBpedia URI. In this way, one can identify all pictures related to a particular monument, city, person, etc. This exporter combines the multilingual capacities of DBpedia and its geolocation features with the Flickr API so that it can identify all pictures related to a particular concept. The export is available both in HTML and RDF (thanks to content negotiation), so that human readers as well as software agents can benefit from it.

Another service called Flickr2RDF also provides a method for extracting RDF information from any Flickr picture. This API mainly uses FOAF and Dublin Core to represent such information in RDF, and also provides a way to export Flickr notes so that notes applied to particular image regions can also be represented in RDF (using the Image Region vocabulary). We shall describe more ways to annotate image regions in the next section.

(Maala et al. 2007) have presented a conversion process with linguistic rules for producing RDF descriptions of Flickr tags that helps users to understand picture tags and to find various relationships between them. Finally, as we will mention in Chapter 8, machine tags from Flickr can be translated into RDF using the Flickcurl API. This API also allows other information about Flickr pictures to be translated into RDF using Dublin Core (and the WGS84 Geo vocabulary if the picture has been geotagged).

7.2.3 Annotating images using Semantic Web technologies

While annotating Flickr pictures and extracting some RDF information from them requires the use of a service like Flickr2RDF, there are generic ways to add semantic information to images that can be applied to any picture. The ‘Image Annotation on the Semantic Web’ document from the W3C Multimedia Semantics Incubator Group references various vocabularies, applications and use cases that can be used for such tasks.
A simple way to do this is to represent metadata related to a particular picture (such as the title, author, image data, etc.) using common Semantic Web vocabularies such as FOAF or Dublin Core (as performed by Flickr2RDF). This then provides a means to query for metadata about pictures in a unified way.

Going further, the MPEG-7 (Moving Picture Experts Group) standard and its associated RDF(S)/OWL mappings can also be used to represent image regions and add particular annotations about them. These annotations can be combined with other metadata, for example, modelling that a region depicts a person (identified
  • 132. using the FOAF vocabulary), a place (referring to DBpedia or GeoNames information), etc. Vocabularies such as Digital Media or the Image Region vocabulary can be used for a similar task. Applications such as M-OntoMat-Annotizer or PhotoStuff can be used to provide such annotations and to create the corresponding RDF files that can then be exchanged or shared on the Web.

While many of these techniques usually require a separate RDF file for storing metadata information, annotations can sometimes be directly embedded into the image itself, for example, in SVG (Scalable Vector Graphics) images as described by the SVG Tiny specification.

At the moment, there is little agreement on what media vocabularies should be used across the board. One useful task would be to define a set of mappings between these various models, allowing us to efficiently combine the best parts of different ontologies for annotating multimedia content. This is one of the current tasks of the W3C Media Annotation Working Group. As defined by the charter of the group, its goal ‘is to provide an ontology designed to facilitate cross-community data integration of information related to media objects in the Web, such as video, audio and images’. A first draft of this ontology was published in June 2009.

7.3 Podcasts

Podcasts are to radio what blogs are to newspapers or magazines: people can create and distribute audio content using podcasts for public consumption and playback on personal / portable media players, computers or other MP3-enabled devices. Video podcasts, also known as ‘vlogs’ from video blogs or ‘vodcasts’ from video podcasts, are a variation on audio podcasts where people can produce and publish video content on the Web for consumption on media-playing devices, and this content can range from individuals publishing home movies or their own news ‘interviews’, to studios releasing TV episodes or movies for a fee.
We shall now describe these two areas in more detail, along with some ideas on how semantic metadata can be leveraged for this application area.
  • 133. 7.3.1 Audio podcasts

Audio podcasting has become quite popular in the past few years, with podcast recordings ranging from interviews and music shows to comedies and radio broadcasts. One of the most popular podcasts is by comedian Ricky Gervais for the Guardian Unlimited website.

Although the concept of podcasting was suggested in 2000, the technical roots started to evolve in 2001, with the influence of blogs being a key aspect. The word ‘podcast’ itself is a portmanteau of ‘pod’ from iPod and ‘broadcast’, and the term came into popular use around 2004 with one of the first-known podcasts being produced by Adam Curry. Several technologies had to be in place for podcasting to take off: high-speed access to the Internet, MP3 technology, RSS, podcatching software, and digital media players. In 2005, the word ‘podcast’ already yielded over 100 million Google hits, and in 2006, the number of podcasts surpassed the number of radio stations worldwide.

From simple origins, podcasting has become a major force for multimedia syndication and distribution. Much of the strength of podcasting lies in its relative simplicity, whereby casual users can create and publish what is effectively an online radio show and can distribute these shows to a wide audience via the Web. All a user needs to create a podcast is some recording equipment (e.g. a PC and microphone), an understanding of subscription mechanisms like RSS, and some hosting space.

It is also easy for a consumer to listen to podcasts, either by using traditional feed-catching methods to subscribe to a podcast feed and thereby receive automatic intermittent updates, or by subscribing to a podcast discovered through the categorised podcast directories of Odeo or the iTunes music store on a desktop computer, iPod Touch or iPhone.

However, it is not only individuals who are publishing podcasts, since larger organisations have leveraged the positive aspects of such technologies.
Many companies now have regular podcasts, ranging from Oracle and NASA to General Motors and Disney. Also, many radio stations have begun making podcasts of their programmes available online (e.g. NPR’s Science Friday), although these are usually devoid of music or other copyrighted content.

Many sites have offered downloads of audio files or streaming audio content (in MP3 or other format) for some time. Podcasts differ in that they can be downloaded automatically via ‘push’ technologies using syndication processes such as RSS described earlier. When a new audio file is added to a podcast channel, the associated syndication feed (usually RSS or Atom) is updated. The consumer’s podcasting application (e.g. iTunes) will periodically check for new audio files in the channels that a consumer is subscribed to, and will automatically
  • 134. download them. Podcasts can also be accompanied by show notes, usually in PDF format.

After recording a podcast using a computer with a line-in or USB microphone, editing can be performed using open-source utilities like Audacity. The podcast can then be self-hosted using services like LoudBlog or WordPress.org with the PodPress extension, or hosted on other third-party services such as WordPress.com, Blast or Blogger (e.g. by uploading a file to the Internet Archive and linking to a post on Blogger using their ‘Show Link Field’ option). As well as the iTunes application from Apple, a popular open-source tool for downloading podcasts is Juice.

There is also a legal aspect to podcasting. Copyright, the branch of law that protects creative expression, covers texts displayed or read aloud, music played during podcasts (even show intros or outros), and audio content performed or displayed (e.g. in video podcasts, more later); even the interviews of others may be protected under copyright. The solution is to try and use what is termed ‘podsafe’ content, i.e. Creative Commons-licensed works, works in the public domain (e.g. from the Internet Archive), or at the very least, material that adheres to fair use principles.

Universities are also publishing lectures or other educational content through podcasts, allowing students to listen to or view their lectures on demand. Teachers can publish podcasts of their lectures and assignments for an entire class or for the public, e.g. to supplement physical lectures or to fully serve the needs of distance-learning students. Conversely, students can create and publish content and deliver it to their teachers or other students. Some popular educational podcasts are provided by Stanford and MIT.

Some more podcasting technologies and derivatives include: ‘autocasting’, the automatic generation of podcasts from text-only sources (e.g.
from free books at Project Gutenberg); multimedia messaging service-based podcasts and ‘mobilecasting’, i.e. mobile podcasting and listening or viewing through mobile phones;
  • 135. ‘voicecasting’, or podcast delivery through a telephone call; and ‘Skypecasting’ or phonecasting where podcasts are created by recording a Skype conference call or regular phone call.

At the SDForum / SoftTECH Event on Architecting Community Solutions in 2005, Zack Rosen of CivicSpace Labs posed the idea for an evolutionary step in web-based discussions, whereby phone conversations could be recorded (via Asterisk, an open source Linux-based PBX application) and then streamed or downloaded as audio discussions that would augment the traditional text discussions on message board sites. We may also see mailing lists being linked to PBX phone numbers that you could ring up to leave audio comments for members of the list. Podcasting is moving in this direction: you can not only have text comments as replies to podcast postings but you can also add audio comments (this is a feature of the LoudBlog podcasting platform).

7.3.2 Video podcasts

Video podcasts (Felix and Stolarz 2006) are similar to audio podcasts, and can be downloaded to PCs or personal media players using many of the same tools and mechanisms. Known by a variety of terms (video blogging, vidblogging, vlogging, vodcasting from ‘video on demand’, video casting or vidcasting), video podcasting ranges from interviews and news to tutorials and behind-the-scenes documentaries. Some television stations are also making episodes of their series downloadable for free (e.g. via Channel 4’s 4OD player in the UK) or for a fee. With video podcasts, anyone can have their ‘own’ internet TV station: all they need is a camera and some effort.
Some of the most popular video podcasts (from the Podcast Alley directory) include one offering woodworking advice, a gadget news show, digital video camera tutorials, discussions on real-life issues, and a Big Brother-type series. Video podcasters can make money from their podcasts through various means: by using Google Adsense for display ads or by having a PayPal ‘tip jar’ at the podcast download site, by manually inserting video advertisements or by using Revver ‘RevTags’ (a clickable advert at the end of each video).

According to a story from the Guardian, despite the relatively modest number of users who are watching online video, research indicates that video downloads are responsible for more than 50% of all internet traffic, and this may in the future cause gridlock on the Internet. Premium Internet video services will reach $2.6
  • 136. billion in 2009, and according to Forrester Research, with more than half of adults (53% of consumers 18 and older) stating that they view online video, mainstream adoption of Internet video has arrived.

Similar to the differences between audio downloads and podcasting, there are some distinctions that can be made between video downloads and podcasts. Both involve a content creation process, use codecs (coder-decoders) for media compression, may be transferred via multiple file formats, and can possibly leverage some streaming services. Like audio podcasting, video podcasting differs in that it includes some method for automated download of video files, e.g. using an RSS subscription mechanism or possibly some blogging or CMS (content management system) software. DRM (digital rights management) or restrictive transfer protocols are not usually a feature of video podcasts, otherwise nobody would bother downloading them.

Video podcasts are normally created through a digital camera or camcorder, webcam, mobile phone, etc. Video files are then transferred from the recording device, or may be captured live via USB, TV card, etc. After conversion, editing and compression using processing tools like VirtualDub or Adobe Premiere, the videos are uploaded to the Web, including popular video sharing services like YouTube, etc. Video podcasts need to be fairly short: less than 5 minutes is good, 15 minutes is okay, but 30 minutes is too long. Since a lot of video podcasts are similar to ‘talk radio’, there can be a bit of a learning curve. As with audio podcasting, you should use ‘podsafe’ audio from sources like GarageBand or Magnatune in your videos.

Seesmic is a ‘microvlogging’ application in the style of services like Twitter (such that it is being referred to as ‘the video Twitter’).
However, if a picture is worth a thousand words (and a video contains many thousands of pictures), then Seesmic is quite different to Twitter in terms of expressivity and what can be conveyed through even a short video message (when compared to 140 characters). Seesmic has a simple but intuitive interface for creating content and viewing videos (from the public or from friends). The emphasis in Seesmic is mainly towards using one’s webcam for creating microvlogs, but it also encourages the uploading of short video files (e.g. in Flash video format).

Another recent trend is that of ‘lifecasting’ or live video streaming, as exemplified by services such as Ustream (allowing video to be broadcast live from computers and mobiles) and Qik (for sharing live video from mobiles only).
  • 137. 7.3.3 Adding semantics to podcasts

Semantic metadata can be associated with both the overall structure and audio content of podcasts. Such metadata for podcasts can be attached to the channel and item descriptions in RSS 1.0 format, and may simply involve a reorganisation of pre-existing structured data (see Figure 7.1).

For example, Apple has written a specification document describing their iTunes namespace (an extension for RSS 2.0) that details podcast metadata for use in iTunes listings and iPod displays. Yahoo! has also created a namespace for syndicating media items, intended as a replacement for the RSS enclosure element.

In fact, it may also be possible to explicitly define metadata in an RSS 1.0 extension for multimedia data where such metadata does not already exist (Hogan et al. 2005). Podcast content can also be annotated, more so through automatic speech recognition, but people could also add annotations (e.g. URL references) or tags to parts of a recording as they listen to it. This could also be combined with the Music Ontology (more later).

Fig. 7.1. Some sources of metadata for a semantic representation of a podcast file

One possibility would be to extract and convert the metadata that is often embedded in multimedia files, and this could be extracted when songs are played during the recording of a podcast. An example of such embedded metadata would be the ID3 / ID4 / APE tags often found in MP3 files and annotated via tools like the
  • 138. ID3 Tag Editor. Such tags provide information relating to the file name, song or piece name, creator or artist, album, genre and year. Other multimedia metadata standards include the MPEG series of standards (e.g. MPEG-7, a means of expressing audio-visual metadata in XML). Upon parsing of such information, a pre-templated RSS 1.0 file can be filled in with the available supplemental information for further interpretation by podcasting tools. This metadata can then be used by tools such as the Podcast Pinpointer described by (Hogan et al. 2005), a prototype application for the intelligent location and retrieval of podcasts.

Many sites have begun using word recognition technologies in the indexing of multimedia files, with one such popular site being the video site blinkx. Word recognition software has seen many advances in recent years, and is becoming more and more accurate. Services can use these technologies to create a transcript of spoken words contained in the audio of podcast files. This would be quite useful in keyword searches.

Others are employing human transcription services to convert the content of audio podcasts to text files, especially since ‘content is king’ on the Web and podcasts can be a valuable source of new text content that may not be available elsewhere. As well as these transcripts, HLT (Human Language Technology) could be implemented to derive a structure from the prose. These structures could also be attached to RSS 1.0 documents, thereby complementing existing metadata.

An example of a semantically-enhanced podcast service is the ZemPod application described by (Celma and Raimond 2008). It uses both speech and music recognition algorithms in order to automatically split a podcast into different parts and then adds RDF metadata to each part of it in order to ease the way in which podcast files can be consumed and browsed. Metadata can be related to extracted keywords as well as to the recognised songs.
Regarding the latter, additional information can be retrieved or interlinked from existing sources for a better user experience. For example, one could identify all podcasts containing a song that lasts less than two minutes and was written by an American band that played at least twice in the CBGB music club.

We shall now describe in detail other initiatives related to adding semantics to music-related content on the Web, many of which can be used to semantically describe the content in both audio and video podcasts.
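The CBGB example above can be read as a conjunction of three conditions on interlinked data: song duration, the band's country, and its performance history. The following is a minimal in-memory sketch of that reading in Python. Everything in it is hypothetical: the `Band` and `Song` classes, the field names, and the sample episodes are made up for illustration and do not reflect the actual ZemPod data model, which works over RDF.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Band:
    name: str
    country: str    # hypothetical country code for the band's origin
    cbgb_gigs: int  # hypothetical count of performances at CBGB


@dataclass(frozen=True)
class Song:
    title: str
    seconds: int
    band: Band


def matching_podcasts(podcasts):
    """Return names of podcasts holding a song under two minutes,
    by an American band that played CBGB at least twice."""
    hits = []
    for name, songs in podcasts.items():
        if any(
            s.seconds < 120 and s.band.country == "US" and s.band.cbgb_gigs >= 2
            for s in songs
        ):
            hits.append(name)
    return sorted(hits)


# Made-up sample data, for illustration only.
band_a = Band("Band A", "US", 3)
band_b = Band("Band B", "UK", 5)
podcasts = {
    "episode-1": [Song("Short song", 110, band_a)],
    "episode-2": [Song("Long song", 300, band_a), Song("Short import", 90, band_b)],
}

if __name__ == "__main__":
    print(matching_podcasts(podcasts))
```

In a real deployment each condition would typically come from a different source (the podcast segmentation, a music database, and an encyclopaedic data set), which is why interlinking the sources matters.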
  • 139. 7.4 Music-related content
7.4.1 DBTune and the Music Ontology
A wide range of music-related data sources have been interlinked within the Linking Open Data initiative (Raimond et al. 2008). Some efforts such as DBTune [50] already provide musical content exported to the Semantic Web, and recent work has been performed in order to reuse that interlinked musical content for music-based recommendations (Passant and Raimond 2008).
Fig. 7.2. Sources of music-related data interlinked with the Linked Open Data cloud
For example, the DBTune project exports the data sets depicted in Figure 7.2 in RDF, interlinked with other data. These data sets encompass detailed editorial information, geolocations of artists, social networking information amongst artists and listeners, listening habits, Creative Commons content, public broadcasting information, and content-based data (e.g. features extracted from the audio signal characterising structure, harmony, melody, rhythm or timbre, and content-based similarity measures derived from these). These data sets are linked to other ones. For example, Jamendo (a music platform and community for free downloadable music) is linked to GeoNames, therefore providing an easy-to-build geolocation-based mashup for music data.
[50] (URL last accessed 2009-06-05)
  • 140. Artists within MusicBrainz are linked to DBpedia artists, MySpace artists, and artists within the BBC’s playcount data.
In order to represent assorted types of information from these music data sets, such as differentiating between bands or solo artists, as well as various kinds of artists, the Music Ontology (MO) provides a complete vocabulary for music-related information modelling which ties in with well-known vocabularies such as FOAF. For example, the ‘Artist’ class in MO is a subclass of the ‘Agent’ class from FOAF.
7.4.2 Combining social music and the Semantic Web
Information from DBpedia, music-related services and data sets described in the previous section can be efficiently combined with social information such as social networks, tagged blog posts, etc. to provide advanced services for end users to browse and find music-related information.
To this end, (Passant and Raimond 2008) have detailed various ways of using Semantic Web technologies to enable the navigation of music-related data. For example, by modelling social network information from various platforms (MySpace, etc.) using FOAF (as we will describe later), information can be suggested to a user not just from his or her friends on a particular network but from friends-of-friends on any network. This is shown in Figure 7.3 and goes further than some generic collaborative filtering algorithms provided in most social music applications. A related project is ‘Foafing the Music’ (Celma et al. 2005) which uses FOAF-based distributed social networks as well as content-based data available in RDF to suggest related information in recommender systems.
Fig. 7.3. Combining social networks and musical interests across social websites (figure labels: :alex foaf:knows :yves foaf:knows :tom; foaf:topic_interest dbpedia:Ramones; foaf:topic_interest dbpedia:Rancid)
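The friends-of-friends suggestion described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the method of (Passant and Raimond 2008): it assumes the FOAF data has already been flattened into (subject, predicate, object) tuples, it treats foaf:knows as a directed link, and both the function name suggest_interests and the assignment of interests to particular people are hypothetical choices made for illustration.

```python
from collections import defaultdict

# Hypothetical triples loosely echoing the labels of Fig. 7.3. Which person
# holds which interest is an assumption made for this sketch.
TRIPLES = [
    (":alex", "foaf:knows", ":yves"),
    (":yves", "foaf:knows", ":tom"),
    (":yves", "foaf:topic_interest", "dbpedia:Ramones"),
    (":tom", "foaf:topic_interest", "dbpedia:Rancid"),
]


def suggest_interests(triples, person, max_hops=2):
    """Map each suggested topic to the smallest foaf:knows distance at which
    somebody reachable from `person` declares it as an interest."""
    knows = defaultdict(set)
    interests = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == "foaf:knows":
            knows[subject].add(obj)
        elif predicate == "foaf:topic_interest":
            interests[subject].add(obj)

    seen = {person}
    frontier = {person}
    suggestions = {}
    for hop in range(1, max_hops + 1):
        # People first reached at exactly `hop` links from the start.
        frontier = {f for member in frontier for f in knows[member]} - seen
        seen |= frontier
        for friend in sorted(frontier):
            for topic in sorted(interests[friend]):
                # Skip topics the person already declares; keep the first
                # (shortest) distance at which a topic is found.
                if topic not in interests[person]:
                    suggestions.setdefault(topic, hop)
    return suggestions
```

With max_hops=1 only direct contacts are consulted, which corresponds to suggestions from friends on a single network; raising it to 2 adds friends-of-friends, regardless of which platform each foaf:knows link originally came from.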
  • 141. Another way to benefit from user-generated tagged audio content is to leverage advanced semantic tagging capabilities such as MOAT (described in Chapter 8). For example, pictures of Joe Strummer or other former band members from the Clash could be displayed when browsing blog posts about the band, as depicted in Figure 7.4 (e.g. by leveraging relationships existing between both in DBpedia).
Fig. 7.4. Interlinking related music information from a content management system and photo-sharing service
Fig. 7.5. Browsing similar artists using information from DBpedia
  • 142. As we explained earlier, an aspect of Semantic Web data modelling is the presence of typed links between concepts rather than simple hypertext links between documents. These links can then be used when browsing content, so that one can decide to visit an artist page from another one because they are in the same musical genre or are signed to the same label. A first experiment based on artist information available in DBpedia is depicted in Figure 7.5.
  • 143. 8 Social tagging
Tagging has rapidly become a common and popular practice on social websites. It allows people to easily annotate the content they publish or share with free-form keywords in order to make the content more easily browsable and discoverable by others, leading to a social component of tagging. While tagging is a lightweight, agile and evolving way to annotate content, we believe it can be efficiently combined with formal modelling schemes such as ontologies to make it more powerful and to be part of the Semantic Web as a whole. In this chapter, we hope to give a comprehensive overview of the benefits of Semantic Web technologies for tagging activities both from a theoretical and practical point of view, as we describe both models and applications that can bridge the gap between social tagging and the Semantic Web.
8.1 Tags, tagging and folksonomies
8.1.1 Overview of tagging
Apart from providing a means to create discussions and to define or manage social networks, one of the most important features of social websites is the ability to upload and share content with one’s peers. That particular feature also reinforces the object-centred sociality aspect that was described earlier in this book: people share, interact and meet thanks to common interests related to particular objects. For example, a community may form around a particular movie, a technology or a place. On many social websites, this data can be shared either with whoever is subscribed to (or just browsing) the website or else within a restricted community. Furthermore, not only textual content can be shared, but also various media types such as videos and audio, as we saw in Chapter 7.
In order to make this content more easily discoverable, users can add free-form keywords, or tags, that act like subjects or categories for anything that they upload or wish to share.
For example, this book could be tagged with the keywords ‘semanticweb’ and ‘socialweb’ on a scientific bibliography management system such as Bibsonomy [1]. This is depicted in Figure 8.1 showing how a related journal paper has been tagged. A tag is normally a single-word descriptor so punctuation marks
[1] (URL last accessed 2009-06-05)
J.G. Breslin et al., The Social Semantic Web, DOI 10.1007/978-3-642-01172-6_8, © Springer-Verlag Berlin Heidelberg 2009
  • 144. are usually avoided, but some systems support phrases in quotation marks like ‘global warming’ and others use camelCase to distinguish between words.
One of the most popular tagging systems is a social bookmarking service, which allows one to store favourite bookmarks on the Web via quick buttons in a browser (instead of locking them into a single desktop browser installation). Bookmarks saved there become accessible from anywhere and are normally public. After bookmarking a favourite URL, you can then add tags, e.g. ‘university cool nuigalway courses students’. Users can subscribe to other users’ bookmarks, and bookmarks can be forwarded to other registered users using the custom ‘for:username’ tag syntax.
Fig. 8.1. A journal article tagged with ‘semanticweb’ and ‘socialweb’ in Bibsonomy
On the microblogging service Twitter, people have been using what are called ‘hashtags’ [2] (i.e. tag keywords prefixed with the ‘#’ or hash symbol) to annotate their microblog posts. While the use of hashtags began in late 2007, Twitter only added hyperlink support for these tags in July 2009, such that clicking on a hashtag brought one to a search service where related microblog posts using the same tag were shown.
While tags can be generally considered as a type of metadata, it is important to keep in mind that they are user-driven metadata. Indeed, while a blog engine may automatically assign a creation date to any blog post, or a photo sharing service such as Flickr will use embedded EXIF information to display the aperture of the camera with which a photo was taken, tags are added voluntarily by users themselves and a tag reflects the needs and the will of the user who assigns it. In this way, tags focus on what a user considers important regarding the way he or she wants to share information.
The main advantage of tagging for end users is that one does not have to learn a pre-defined vocabulary scheme (such as a hierarchy or taxonomy) and one can use the keywords that fit exactly with his or her needs
[2] (URL last accessed 2009-07-07)
  • 145. or ‘desire lines’ [3]. Moreover, tags can be used for various purposes, and (Golder and Huberman 2006) have identified seven different functions that tags can play for end users, from topic definition to opinion forming and even self-reference. (Marlow et al. 2006) also identified that in some cases, tags can be social elements that a user wants to emphasise, e.g. ‘seen_in_concert’.
As tags are useful only when used in combination with the resource they are related to, they are generally associated with tagging actions. A tagging action represents the act of assigning one or more keywords to an online resource. Many tags can be assigned to the same resource, and on some services, different users can assign (the same or different) tags to the same resource, leading to a social feature in those tagging systems. For example, on a social bookmarking service, a bookmark can be saved by several users, each of whom can assign his or her own tags to the item. In order to simplify the tagging process, websites generally provide auto-completion features or automatically suggest tags, typically by analysing tags already assigned by other users to the same resource.
From a theoretical point of view, a tagging action is often represented as a tripartite model between a User, a Resource and a Tag as proposed amongst others by (Mika 2005a). Figure 8.2 represents three different tagging actions (T1, T2, T3) made by two different users (U1, U2) on a particular picture.
Fig. 8.2. Representing different tagging actions related to the same content
Emerging from the use of tagging on a given platform, these actions lead to what is generally called a folksonomy, a term coined by (Vander Wal 2007) as a portmanteau of the words ‘folks’ and ‘taxonomy’. A folksonomy is hence a social,
[3] (last accessed 2009-06-05)
  • 146. collaboratively-generated, open-ended, evolving and user-driven categorisation scheme. Contrary to pre-defined classification schemes, users can use their own terms, which makes the folksonomy evolve quickly, based on the user’s needs and benefiting from the ‘architecture of participation’ effect. Websites that support tagging therefore benefit from the ‘wisdom of the crowds’ effect.
Information retrieval from tags and folksonomies is simply carried out using tag-based search engines, which leads to some issues that we will describe in the following sections. Folksonomies also provide a way to fluently navigate between various related tags and content, leading to serendipitous discovery of items. For example, users can generally navigate from one tagged item to the list of all items tagged with a similar tag, and so on. A popular visualisation scheme for these tagging ecosystems is the use of tag clouds, where the highly-used tags are bigger (or bolder) than the other ones (similar to a weighted list in visual design). These tag clouds also give an overview of the main categories or topics discussed in the related community website as seen in Figure 8.3.
Fig. 8.3. Tag cloud of popular tags from del.icio.us
8.1.2 Issues with free-form tagging systems
In spite of its advantages when annotating content items, tagging leads to various issues regarding information retrieval. It makes the task of retrieving tagged content sometimes quite costly, especially when looking for information tagged by other people. (Mathes 2004) says that the ‘folksonomy represents simultaneously some of the best and worst in the organisation of information’.
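To make the tripartite model and the tag cloud weighting of Section 8.1.1 concrete, here is a minimal Python sketch. It assumes tagging actions are stored as plain (user, resource, tag) triples and that font size grows linearly with usage count; the function name tag_cloud, the sample data and the size range are hypothetical, and real services may well use other scaling schemes (e.g. logarithmic).

```python
from collections import Counter

# Hypothetical tagging actions as (user, resource, tag) triples, loosely
# echoing the three actions by two users on one picture in Fig. 8.2.
TAGGINGS = [
    ("U1", "picture", "paris"),
    ("U1", "picture", "eiffeltower"),
    ("U2", "picture", "paris"),
]


def tag_cloud(taggings, min_size=10, max_size=30):
    """Map each tag to a font size that grows linearly with its usage count."""
    counts = Counter(tag for _user, _resource, tag in taggings)
    if not counts:
        return {}
    lowest, highest = min(counts.values()), max(counts.values())
    span = (highest - lowest) or 1  # avoid dividing by zero when all counts match
    return {
        tag: round(min_size + (count - lowest) * (max_size - min_size) / span)
        for tag, count in sorted(counts.items())
    }
```

Because each triple keeps the user as well as the resource and the tag, the same list could also be grouped by user or by resource, which is what distinguishes a folksonomy from a simple keyword index.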
  • 147. As we mentioned previously, tag-dedicated search engines are simply based on plain-text strings, i.e. a user types a tag and gets all the content that has been tagged with that particular keyword. It therefore leads to various issues that we will now describe. (Interestingly, the issues below have parallels in the world of libraries, and are one reason why librarians now use classification schemes like thesauri or taxonomies to classify items, such as Dewey Decimal Classification or the ACM Taxonomy.)
8.1.2.1 Tag ambiguity
Since tags are text-strings only, without any semantics or obvious interpretation (other than as a set of characters) for a software program that reads them, ambiguity is an important issue. While a person knows that the tag ‘apple’ means something different when it is used in relation to content about a laptop or on a picture of a bag of fruit, a search engine will not be able to distinguish between them. It will retrieve both items for a search on ‘apple’ even if the user had the computer brand in mind.
Consequently, the user will have to sort out what is relevant and what is not regarding his or her expectations, which can be a costly step depending on the number of retrieved items. For example, Figure 8.4 shows the result of a search for the most relevant items tagged ‘apple’ on Flickr, which mixes pictures of fruit and Apple devices.
Fig. 8.4. Tag ambiguity in a Flickr search for pictures tagged ‘apple’
  • 148. 8.1.2.2 Tag heterogeneity
Tag ambiguity refers to the same tag being used to refer to different things, but a parallel issue is that different tags can also be used to refer to the same thing. In this case, a user must run various queries to get the content related to a particular concept or object. Such heterogeneity is mainly caused by the multilingual nature of tags (e.g. ‘semanticweb’ and ‘websemantique’) but also by the fact that people will use acronyms or shortened versions (‘sw’ and ‘semweb’), as well as linguistic and morpho-syntactic variations (synonyms, plurals, case variations, etc.). For example, we observed that at least ten variations are used on del.icio.us to identify the term ‘semantic web’, not taking into account narrower tags, as we will now explain.
8.1.2.3 Lack of organisation between tags
Since a folksonomy is essentially a flat bundle of tags, the lack of relationships between them makes it difficult to find information if one is not directly looking at the right tag. This is clearly a problem in the practice of tagging, especially if, as noted by (Golder and Huberman 2006), users use different tags depending on their level of expertise, or if they search for broader or narrower ones. For example, while we mentioned the tags ‘semanticweb’ or ‘socialweb’ regarding this book, an expert may avoid those terms as too broad and instead prefer terms such as ‘sioc’, ‘rdfa’ or ‘sparql’, which will help him or her to better classify the data. Then, if someone is simply looking at items tagged ‘semanticweb’, they will not be able to retrieve the book even though there is a clear relationship between both in terms of the technological domain.
To overcome this issue, clustering algorithms can be used to identify related tags as introduced by (Begelman et al. 2006). However, their success depends on the tagging distribution, i.e.
if there is a strong co-occurrence between tags or not, which may not be the case in some folksonomies, even for tags that identify related concepts. In some cases, these algorithms can also be combined with other approaches to identify related tags and communities around particular topics (Hayes and Avesani 2007).
8.2 Tags and the Semantic Web
In the past, folksonomies and ontologies have been regularly cited as opposite and exclusive means for managing and organising information. A frequent point of view was to consider folksonomies as a bottom-up classification, while ontologies were seen as a centralised top-down approach. This way of thinking was also part of a larger set of opposing views between Web 2.0 and the Semantic Web.
  • 149. However, as we have described in this book, we believe that this opposition is unjustified and should not exist since these two fields are in fact complementary (and synergistic) paths towards enhancing the Web. This opposition has often been stated in posts on the blogosphere, and one of the main reasons may be a misunderstanding of the original Semantic Web article by (Berners-Lee et al. 2001). The common interpretation is that this initial vision will lead to a unique universal ontology for the Semantic Web, which was not the case, as Semantic Web pioneer Jim Hendler states on his blog [4] when answering various comments.
However, numerous works related to the links between tags, related objects (tagging actions, folksonomies, tag clouds, etc.) and the Semantic Web have been published during the last couple of years. We can divide these works into two main areas: (1) the ones aiming to define, mine or automatically link to taxonomies or ontologies from existing folksonomies, and (2) works based on defining Semantic Web models for tags and related objects. Again, the border between both is sometimes fuzzy since both approaches can be combined.
8.2.1 Mining taxonomies and ontologies from folksonomies
This first set of approaches is mainly based on the idea that emergent semantics naturally appear through the use of tags. As (Golder and Huberman 2006) report, there is generally a stable set of tags used for a given resource (or in a given tag space) after a certain amount of time. For example, the tags for an item have been observed to stabilise after it has been tagged about a hundred times. Therefore, emergent semantics can be used to mine taxonomies or ontologies from folksonomies and this research area has led to several works during the past few years.
Fig. 8.5. Mining hierarchical relationships from co-occurrence of tags, adapted from (Halpin et al. 2006)
[4] (accessed 2009-06-02)
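The idea of mining broader and narrower relationships from tag co-occurrence can be illustrated with a short Python sketch. It is not the algorithm of (Halpin et al. 2006): as a stand-in it uses a simple subsumption heuristic, under which tag A is taken to be broader than tag B when almost every resource tagged B is also tagged A, but not the other way round. The function name broader_pairs, the 0.8 threshold and the sample data are assumptions made for illustration.

```python
from itertools import permutations

# Hypothetical folksonomy: each resource mapped to the set of tags it carries.
RESOURCE_TAGS = {
    "r1": {"semanticweb", "rdfa"},
    "r2": {"semanticweb", "sparql"},
    "r3": {"semanticweb", "rdfa"},
    "r4": {"semanticweb"},
    "r5": {"cooking"},
}


def broader_pairs(resource_tags, threshold=0.8):
    """Return (broader, narrower) tag pairs under a subsumption heuristic."""
    # Invert the mapping: tag -> set of resources carrying that tag.
    tag_resources = {}
    for resource, tags in resource_tags.items():
        for tag in tags:
            tag_resources.setdefault(tag, set()).add(resource)

    pairs = []
    for a, b in permutations(sorted(tag_resources), 2):
        both = len(tag_resources[a] & tag_resources[b])
        share_of_b_with_a = both / len(tag_resources[b])  # estimate of P(a | b)
        share_of_a_with_b = both / len(tag_resources[a])  # estimate of P(b | a)
        # a subsumes b: b rarely appears without a, while a often appears alone.
        if share_of_b_with_a >= threshold and share_of_a_with_b < threshold:
            pairs.append((a, b))
    return pairs
```

On the sample data this yields ‘semanticweb’ as broader than both ‘rdfa’ and ‘sparql’, and each such pair could then be published, for instance, as an rdfs:subClassOf statement. As noted in Section 8.1.2, the quality of the result depends on how strongly related tags actually co-occur in the folksonomy.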
  • 150. Among others, (Halpin et al. 2006) used an approach based on related co-occurrences of tags to extract hierarchical relationships between concepts, as depicted in Figure 8.5. Based on the reflexive co-occurrence of tags, they extract broader and narrower relationships between concepts that they model as an RDFS vocabulary using the rdfs:subClassOf property. (Mika 2005a) defined a socially-aware approach for automatically building ontologies by combining social network analysis and clustering algorithms based on folksonomies. One outcome of his work is that sub-communities of users can also be mapped to a hierarchy of tags: communities of experts use narrower tags than the broader communities they are included in. More recently, the FoLksonomy Ontology enRichment (FLOR) [5] technique provides a completely automated approach to semantically enrich tag spaces by mapping tags to Semantic Web entities (Angeletou 2008). By enriching tag spaces with semantic information about the meaning of each tag, some issues with tagging regarding information retrieval (such as tag ambiguity as mentioned earlier) can be solved.
8.2.2 Modelling folksonomies using Semantic Web technologies
While the previous section described work on extracting and linking structured models based on tags and tagging activities, another approach to bridge folksonomies and the Semantic Web is to use RDF(S)/OWL modelling principles to represent tags, tagging actions and other related objects such as tag clouds. While tag-based search is the only way to retrieve tagged content at the moment (leading to the aforementioned problems), these new models allow advanced querying capabilities such as ‘which items are tagged “semanticweb” on any platform’, ‘what are the latest ten tags used by Stefan’, ‘list all the tags commonly used by Alex on SlideShare and by John on Flickr’ or ‘retrieve any content tagged with something relevant to the Semantic Web field’.
Having tags and tagged content published in RDF also allows one to easily link this to or from other Semantic Web data, and to reuse it across applications in order to achieve the goal of a global graph of knowledge.
Although it has not been implemented, (Gruber 2007) defined one of the first approaches to model folksonomies and tagging actions using a dedicated ontology. This work considers the tripartite model of tagging and extends it with (1) a space attribute, aimed at modelling the website in which the tagging action occurred, and (2) a polarity value in order to deal with spam issues. His proposal provides a complete model to represent tagging actions, but also considers the idea of a tag identity, such that various tags can refer to the same concept while being written differently, introducing the need to identify some common semantics in the tags themselves.
[5] (URL last accessed 2009-06-05)
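As a rough illustration of the extended model described above, the following Python sketch adds a space and a polarity to the basic (user, resource, tag) triple. It is only a sketch under stated assumptions: the field names, the encoding of polarity as +1 or -1, and the rule that a tag is accepted when its summed polarity is positive are all choices made for this example, not details taken from (Gruber 2007).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagging:
    tagger: str
    resource: str
    tag: str
    source: str        # the "space": the website where the tagging happened
    polarity: int = 1  # assumed encoding: +1 asserts the tag, -1 disputes it


# Hypothetical data: one spam tag that two other users vote against.
EXAMPLE = [
    Tagging("alice", "doc1", "semanticweb", "siteA"),
    Tagging("bob", "doc1", "semanticweb", "siteB"),
    Tagging("spammer", "doc1", "cheap-pills", "siteA"),
    Tagging("alice", "doc1", "cheap-pills", "siteA", -1),
    Tagging("bob", "doc1", "cheap-pills", "siteA", -1),
]


def accepted_tags(taggings, resource, source=None):
    """Return the tags whose summed polarity on a resource is positive,
    optionally restricted to tagging actions made in one space."""
    scores = defaultdict(int)
    for tagging in taggings:
        if tagging.resource != resource:
            continue
        if source is not None and tagging.source != source:
            continue
        scores[tagging.tag] += tagging.polarity
    return sorted(tag for tag, score in scores.items() if score > 0)
```

Keeping the space on each tagging action is what makes cross-platform queries possible, while the polarity lets a community outvote a tag without deleting the original assertion.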
  • 151. The Tag Ontology [6] was the first RDF-based model for representing tags and tagging actions, based on the initial ideas of Gruber and on the common theoretical model of tagging that we mentioned earlier. This ontology defines the ‘Tag’ and ‘Tagging’ classes with related properties to create the tripartite relationship of tagging. In order to represent the user involved in a tagging action, this ontology relies on the FOAF vocabulary that we will describe in more detail later.
An important feature of this model is that it defines a Tag class, hence implying that each tag will have a proper URI so that tags can be used both as the subject and object of RDF triples. Moreover, this class is defined as a subclass of skos:Concept and the ontology introduces a ‘relatedTag’ property. The Simple Knowledge Organisation System (SKOS) [7] is a lightweight RDFS vocabulary allowing people to define controlled vocabularies such as taxonomies and thesauri. In this way, tags can be linked together, for example, to model that the ‘rdfa’ tag is more specific than ‘semanticweb’. However, the proposed property does not differentiate between two tags that are related because they represent the same concept but are spelled differently (‘websemantique’ and ‘semanticweb’) and two tags where one identifies a concept that is broader than the other (‘rdfa’ and ‘semanticweb’). Finally, while it does not specifically consider the tagging space, it introduces a way to temporally define the tagging action thanks to a taggedOn property.
The Social Semantic Cloud of Tags (SCOT) ontology (Kim et al. 2007) is focused on representing tag clouds and defines ways to describe the use and co-occurrence of tags on a given social platform, allowing one to move his or her tags from one service to another and to share tag clouds with others.
While we will introduce the ideas of data portability later on in this book, it is important to mention that SCOT envisions this portability not for the content itself but for the tagging actions and the tags of a particular user. SCOT reuses the Tag Ontology as well as SIOC and models tags, tagging actions and tag clouds. An important aspect of the SCOT model is that it considers the space where the tagging action happened (i.e. the social platform, e.g. Flickr), as suggested by Gruber’s initial proposal. SCOT also provides various properties to define spelling variants between tags, using a main spellingVariant property and various subproperties such as acronym, plural, etc.
Another ontology related to tagging is Meaning Of A Tag (MOAT) [9], which aims to represent the meaning of tags using URIs of existing domain ontology instances or resources from existing public knowledge bases (Passant and Laublet 2008b), such as those from the Linking Open Data project introduced in Chapter 4. The goal of MOAT is thus to create a bridge between folksonomies and existing ontologies or knowledge bases so that the issues of free-form tagging regarding information retrieval can be solved. For example, it allows us to model facts such as
[6] (URL last accessed 2009-06-05)
[7] (URL last accessed 2009-06-05)
[8] (URL last accessed 2009-06-05)
[9] (URL last accessed 2009-06-05)
  • 152. ‘In this blog post, I use the tag “apple” and by that I mean the computer brand identified by dbpedia:Apple_Inc., while the “apple” tag on that other picture means the fruit identified by dbpedia:Apple’, as depicted in Figure 8.6.
To achieve this goal, it provides a lightweight OWL-DL ontology that reuses and extends the Tag Ontology. MOAT also relies on SIOC and FOAF to model the tagged resource and the user that assigned the tag to it respectively. MOAT is more than a single model, as it also provides a framework [10] based on the ontology, the goal of which is to let people easily bridge the gap between simple free-form tagging and semantic indexing. The latter is more powerful than the former in terms of information retrieval but is certainly more complex in terms of annotating content. The proposed framework aims to reduce this gap by helping users to annotate their content with URIs of Semantic Web resources from the tags that they have already used for annotated content.
Furthermore, while it mainly consists of a model and a framework for augmented tagging, the MOAT approach can be automated as applied by (Abel 2008) in the GroupMe! [11] system. It therefore provides a nice bridge between approaches that extract tag meanings from folksonomies, as we described in the previous section, and those that aim to model tags with Semantic Web technologies.
Fig. 8.6. Modelling the meaning of the ‘apple’ tag in a tagging action using MOAT
More recently, the Common Tag initiative [12] (involving AdaptiveBlue, Faviki, Freebase, Yahoo!, Zemanta, Zigtag and DERI, NUI Galway) developed a lightweight vocabulary [13] with a similar goal of linking tags to well-defined concepts (represented with their URIs) in order to make tagging more efficient and interconnected.
In particular, it focuses on a simple approach allowing site owners to publish RDFa tag annotations, as well as providing a complete ecosystem of producers and consumers of Common Tag data that can help end users to deploy applications based on this format, as depicted in Figure 8.7.
[10] (URL last accessed 2009-06-05)
[11] (URL last accessed 2009-06-08)
[12] (URL last accessed 2009-07-16)
[13] (URL last accessed 2009-07-16)
  • 153. In addition, other models that can be used to represent tags include the Nepomuk Annotation Ontology (NAO) [14], SIOC, and the Annotea annotation [15] and bookmark [16] schemas. Both NAO and SIOC define a new ‘Tag’ class, with sioc:Tag defined as a subclass of skos:Concept. SIOC also defines a topic property to link a resource to some of its topics. While not explicitly using the ‘tag’ word in its definition, the Annotea bookmark model provides a ‘Topic’ class and a ‘hasTopic’ property to link an item to some related keywords. This model also defines a ‘subTopicOf’ property in order to model hierarchies of topics. However, in contrast to the main ontologies defined previously, these three vocabularies do not provide any way to model the tagging action itself (i.e. the tripartite relation between a resource, a tag and a user). Hence, they cannot capture the complete representation of folksonomies but simply focus on the relationship between a tagged resource and its related tags.
Fig. 8.7. The initial Common Tag ecosystem [17]
[14] (URL last accessed 2009-06-05)
[15] (URL last accessed 2009-06-05)
[16] (accessed 2009-06-05)
[17] From
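The difference between linking a resource to bare tags and modelling the full tagging action with a meaning, as MOAT proposes, can be sketched in Python as follows. This is an illustrative sketch only: the record layout, the function names and the sample users and resources are hypothetical, and the two DBpedia identifiers are the ones used in the ‘apple’ example above rather than the result of any real lookup.

```python
from typing import NamedTuple


class MeaningfulTagging(NamedTuple):
    user: str
    resource: str
    tag: str
    meaning: str  # URI giving the meaning of the tag in this tagging action


# Hypothetical tagging actions; the same string 'apple' carries two meanings,
# and two different strings ('apple', 'mac') share one meaning.
TAGGINGS = [
    MeaningfulTagging("alice", "blog/post-1", "apple", "dbpedia:Apple_Inc."),
    MeaningfulTagging("bob", "photos/42", "apple", "dbpedia:Apple"),
    MeaningfulTagging("carol", "blog/post-7", "mac", "dbpedia:Apple_Inc."),
]


def resources_by_tag(taggings, tag):
    """Plain string search: what a tag-based search engine can do."""
    return sorted({t.resource for t in taggings if t.tag == tag})


def resources_by_meaning(taggings, uri):
    """Search by meaning URI: what becomes possible once tags carry meanings."""
    return sorted({t.resource for t in taggings if t.meaning == uri})
```

Searching for the string ‘apple’ returns both the blog post and the photo of fruit, whereas searching for dbpedia:Apple_Inc. returns the two blog posts about the computer brand, including the one tagged ‘mac’. A vocabulary that only records which tags a resource has cannot hold the per-action meaning that this second query relies on.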
  • 154. Finally, machine tags [18] from Flickr can be used as a way to provide augmented tagging to end users. By defining tags in the form of ‘prefix:property=value’ (such as ‘geo:lat=43.22’ or ‘lastfm:event=3544’), they allow users to add machine-readable metadata to annotated pictures. For example, the tag ‘geo:lat=43.22’ can be used to define the location where a picture was taken, especially as it can be automatically generated by camera phones and dedicated upload applications, while the ‘lastfm:event=3544’ tag can be used to automatically aggregate Flickr pictures related to a particular event. While they are not directly represented using Semantic Web technologies, machine tags can be mapped to RDF using the Flickcurl API [19].
8.3 Tagging applications using Semantic Web technologies
Various tools already provide advanced tagging features using Semantic Web technologies or RDF exports of tagged data based on the techniques introduced in the previous sections. We shall now describe some of them.
8.3.1 Annotea
As mentioned before, while not strictly defined as a ‘tagging’ system, Annotea [20] was almost certainly the first web-based social application that used Semantic Web technologies. Begun in 2001, it allowed people to simply add notes and comments to web pages that they browsed, and to bookmark and then share them through a community of users that were subscribed to a dedicated Annotea server. An important feature of Annotea is its openness and compliance with W3C standards. In particular, all data is available in RDF using the dedicated Annotea annotation and bookmark vocabularies mentioned before. Furthermore, Annotea relies on a simple user interface so that the use of these technologies was completely transparent for the end user. As it was launched some years before the Web 2.0 meme and many of the works described in this book, Annotea can be seen as a precursor of many Social Semantic Web applications.
Moreover, various clients can be used to interact with an Annotea server, from the W3C browser Amaya [21] (as shown in Figure 8.8) to Firefox plugins.
[18] (URL last accessed 2009-06-05)
[19] (URL last accessed 2009-06-05)
[20] (URL last accessed 2009-06-05)
[21] (URL last accessed 2009-06-05)
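The machine tag form ‘prefix:property=value’ described at the end of Section 8.2.2 lends itself to a simple parser. The Python sketch below is an assumption-laden illustration: the regular expression is a guess at a reasonable grammar rather than Flickr's official one, the triple layout is not the mapping performed by Flickcurl, and the function names and the photo identifier are hypothetical.

```python
import re

# Assumed grammar: an alphanumeric prefix, a colon, an alphanumeric property,
# an equals sign, then any non-empty value.
MACHINE_TAG = re.compile(
    r"^(?P<prefix>[A-Za-z][A-Za-z0-9_]*):"
    r"(?P<property>[A-Za-z][A-Za-z0-9_]*)="
    r"(?P<value>.+)$"
)


def parse_machine_tag(tag):
    """Split a machine tag into (prefix, property, value), or return None
    when the tag is an ordinary free-form keyword."""
    match = MACHINE_TAG.match(tag)
    if match is None:
        return None
    return match.group("prefix"), match.group("property"), match.group("value")


def to_triple(resource, tag):
    """Turn a machine tag on a resource into a (subject, predicate, object)
    tuple, keeping 'prefix:property' as a prefixed predicate name."""
    parsed = parse_machine_tag(tag)
    if parsed is None:
        return None
    prefix, prop, value = parsed
    return (resource, f"{prefix}:{prop}", value)
```

For instance, to_triple("photo/1", "geo:lat=43.22") gives ("photo/1", "geo:lat", "43.22"), while a plain keyword such as ‘apple’ is left alone. A real mapping to RDF would additionally need to bind each prefix to a namespace URI and to type the value.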
  • 155. Fig. 8.8. Adding an annotation on Annotea using the Amaya browser [22]
8.3.2 Revyu.com
Revyu.com [23] is an online service dedicated to creating reviews for all sorts of things: from conference papers to pubs or restaurants. It reuses some well-known principles and features of Social Web applications such as tags, tag clouds and star ratings, and it provides a JavaScript bookmarklet to ease the publication of new reviews for end users when browsing the Web. Most importantly, Revyu.com, winner of the Semantic Web challenge in 2006, is completely RDF-based.
Each review is modelled using the RDF Review vocabulary [24] (compatible with the hReview microformat) and tags as well as tagging actions are represented using the Tag Ontology (Figure 8.9). As Revyu.com also provides a SPARQL endpoint to allow one to query its data, it efficiently allows one to reuse tagged data
[22] From (URL last accessed 2009-06-05)
[24] (URL last accessed 2009-06-09)
  • 156. from the website in any other application, as well as enabling mashups with existing content. Two important features of Revyu.com regarding the use of Semantic Web technologies are:
Integration and interlinking with other data sets. Thanks to different heuristics, Revyu.com integrates identity links (using owl:sameAs properties) to resources already defined on the Semantic Web, especially resources being described in data sets from the Linking Open Data cloud that we described earlier. For example, most reviews regarding research papers are linked to the paper definition from the Semantic Web Dogfood project [25], while reviews about movies can be automatically linked to their DBpedia URI. Thus, it provides global interlinking of Semantic Web resources rather than defining new URIs for existing concepts.
The ability to consume FOAF-based user profiles. While many Social Web applications require the user to fill in their personal details when subscribing, with those details having already been filled in on other platforms, Revyu.com allows one to simply give his or her FOAF URI so that the information contained therein is automatically reused. As we will describe in more detail later on in this book, consuming FOAF profiles in web-based applications provides a first step towards solving data portability issues between applications on the Social Web.
Fig. 8.9. Tagged data and related RDF annotations from Revyu.com
[25] (URL last accessed 2009-06-05)
  • 157. 8 Social tagging 1518.3.3 SweetWikiSweetWiki (Semantic WEb Enabled Technology Wiki)26 from (Buffa et al. 2007)is a semantic wiki prototype featuring augmented-tagging features for end users.In contrast to the other wikis we mentioned in Chapter 6 of this book, it is not de-signed for creating and maintaining ontology instances, but rather uses SemanticWeb technologies to augment the user experience and navigation between pages.One relevant feature of SweetWiki regarding work described in this chapter is theability to organise tags as a hierarchy of concepts. This hierarchy is then modelledin RDFS so that it can be reused in other applications, while the wiki model itselfis defined using a particular OWL ontology. Most importantly, this hierarchy oftags is not a personal one but is built and shared amongst all the users of the wiki.In this way, SweetWiki provides a social and collaborative approach to maintain-ing hierarchies of concepts that can be seen as lightweight ontologies. Moreover,users can define two tags as synonyms in order to solve heterogeneity issues.From a tagging point of view, tags can be not only assigned to web pages but alsoto pictures and embedded videos, and these are then used to retrieve or browsecontent, while similar and related tags are used to augment the navigation processby suggesting related pages. Finally, SweetWiki models all of its data using RDFa. Hence, an applicationthat wants to reuse it is only required to extract and parse an XHTML page, sinceall the required RDF annotations are embedded in it and can be extracted usingGRDDL.8.3.4 int.ere.stBased on the SCOT ontology, is a web application dedicated to tag port-ability between applications. The main objective of (Kim et al. 2008) isto demonstrate how Semantic Web and Social Web technologies can be combinedto support better tag sharing and creation across various online communities. 
    Using int.ere.st, people can save, tag and bookmark in their own as well as other people’s tag clouds, as represented using the SCOT ontology. The tag meta-search also allows one to look for similar patterns of tagging from other people based on their interests (as expressed using tags). Tag clouds can be imported into int.ere.st from various services, and a related SCOT exporter is available for the popular WordPress blogging platform.

    [26] (URL last accessed 2009-06-05)

    Some of the major functionalities provided by the application include: various options for tag searching, such as and (&), or (space), co-occurrence (+), broader (>), and narrower (<); user searching; resource searching; integrating tag data
  • 158. across communities; meta tagging; ontology bookmarking; and sharing metadata produced in int.ere.st using the FOAF, SIOC, and SCOT ontologies.

    8.3.5 LODr

    LODr [27] is a personal application providing semantic-enrichment features for existing tagged content from various popular social websites, such as Flickr, del.icio.us and Twitter. By allowing people to re-tag their content with URIs rather than simple keywords, i.e. to give meaning to their tags using the MOAT principles defined earlier, users’ social data can be woven into the Semantic Web. Its main objective is to provide a simple way to create RDF and interlinked content from existing social websites, so that queries like ‘please list all SlideShare items tagged with a topic related to the Semantic Web’ can be answered. One other important motivation is that LODr is not another tagging service, but is rather a system that gives users a way to semantically enrich existing tag data that has been created in their favourite tools, since it is important to avoid social network fatigue and to let users keep their existing tagging habits.

    Since LODr is based on MOAT, people manually link their tags to URIs to define their meanings. However, some suggestions are made based on the relationships that are already defined within the community or by querying some public SPARQL endpoints. Once these links have been provided, users can benefit from advanced navigation capabilities. The screenshot in Figure 8.10 emphasises how tagged content can be browsed. When a concept is selected, as well as a list of items, the system also displays: (1) a description of the concept, using its rdfs:comment property (or subproperties); (2) a list of related concepts by co-occurrence; (3) a list of related concepts that share a direct relationship with the current one; and (4) a list of related concepts that share a common property.
    While all of these steps involve SPARQL queries and RDF graph browsing, the user is not faced with the complexity of the model when retrieving information. For example, the figure shows the XSL Transformation concept being suggested when browsing items tagged with SPARQL, since the two are related by sharing the same value for their skos:subject property in DBpedia [28].

    Finally, it is worth noting that while LODr is an independent application, all data provided by LODr users is also stored publicly on the LODr server [29], and this provides a method for finding tagged data from various people. In the next section of this chapter, we will describe various SPARQL queries that can be performed on top of this data via the LODr SPARQL endpoint.

    [27] (URL last accessed 2009-06-05)
    [28] (URL last accessed 2009-06-05)
    [29] (URL last accessed 2009-07-16)
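    The fourth kind of suggestion described above (related concepts that share a common property value) can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions: LODr itself performs this step with SPARQL queries, and the triples and `ex:` identifiers below are hypothetical placeholders rather than actual LODr or DBpedia data.

```python
from collections import defaultdict

# Hypothetical (subject, property, value) triples. The ex: identifiers are
# illustrative placeholders, not real DBpedia URIs.
TRIPLES = [
    ("ex:SPARQL", "skos:subject", "ex:Category:W3C_standards"),
    ("ex:XSL_Transformation", "skos:subject", "ex:Category:W3C_standards"),
    ("ex:Flickr", "skos:subject", "ex:Category:Photo_sharing"),
    ("ex:SPARQL", "rdfs:comment", "An RDF query language"),
]


def related_by_shared_value(triples, concept, prop):
    """Return the concepts sharing at least one value of `prop` with `concept`."""
    # Group subjects by the value they hold for the chosen property.
    subjects_by_value = defaultdict(set)
    for subject, predicate, value in triples:
        if predicate == prop:
            subjects_by_value[value].add(subject)

    # Collect every other subject found in a group that contains `concept`.
    related = set()
    for subjects in subjects_by_value.values():
        if concept in subjects:
            related |= subjects - {concept}
    return sorted(related)


if __name__ == "__main__":
    print(related_by_shared_value(TRIPLES, "ex:SPARQL", "skos:subject"))
```

    With this sample data, browsing the placeholder SPARQL concept suggests the XSL Transformation concept because both carry the same skos:subject value, mirroring the situation in the screenshot.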
  • 159. Fig. 8.10. Browsing semantically-enhanced data using LODr

    8.3.6 Atom Interface

    The Atom Interface [30] is an interesting approach for visualising and navigating through tree structures and graphs. The name for the Atom Interface was chosen as it is based on the metaphor of electrons, atoms and molecules. It uses a ‘compact radial layout’ to organise items around their parents in a circular fashion within a single atom. Although radial layouts [31] and menus are not a new idea and have been around for some years (e.g. Maya’s marking menus, Mozilla’s RadialContext pie menus, Neverwinter Nights’ radial menu), the Atom Interface [30] is novel in that: ‘(i) it is focused more towards exploring and browsing small and large trees by collapsing/uncollapsing paradigm (as opposite to visualising the overview of the entire tree or graph), (ii) is more compact in order to emphasise relationships and ease learning and understanding the structure, (iii) preserves full context’.

    As mentioned earlier, the methods described by (Halpin et al. 2006) can be leveraged to create hierarchies of broader and narrower concepts in tag sets. The Atom Interface can therefore be quite a useful means for visualising one’s bookmarks and tag hierarchies, and for navigating the connections to other users’ content (via their atoms) as shown in Figure 8.11.

    [30] (URL last accessed 2009-06-05)
    [31] (URL last accessed 2009-06-05)
  • 160. Fig. 8.11. Atom Interface for browsing hierarchies of bookmarks

    8.3.7 Faviki

    Faviki is a social bookmarking service that uses a controlled vocabulary for its tags, namely the resources defined in Wikipedia. Hence, it provides features such as multilingual tagging (with various tags being automatically linked to the same concept) and a related tags suggestion service (based on the relationships between these concepts), and it can also display tag descriptions, as shown in Figure 8.12. Faviki relies on the DBpedia, Zemanta and Google Language APIs to provide its service. In addition, Faviki exposes its data in RDFa using the Common Tag format introduced previously, with some architectural details described in (Miličić 2008).

    Fig. 8.12. Browsing tagged content in Faviki
  • 161. 8.4 Advanced querying capabilities thanks to semantic tagging

    For the remainder of this chapter, we would like to demonstrate some advanced querying capabilities that can be performed on semantically-enhanced tag data. In particular, we will detail the SPARQL queries corresponding to the examples that we described in the introduction to the chapter. As we mentioned earlier, these queries can be tested on the SPARQL endpoint associated with the LODr framework, which contains data from various LODr instances [32] (unless they are identified as example queries).

    8.4.1 Show items with the tag ‘semanticweb’ on any platform

    This simple query shows the benefits of having a general semantic model for tags and tagging actions, as it can be used to retrieve items tagged with ‘semanticweb’ wherever the tagging action has been carried out. The main requirements are (1) to have the data modelled using common semantics and (2) to index available tagging platforms using a broad-coverage Semantic Web search engine. We note that the Common Tag [33] effort is moving in this direction. In the case of this example, the Tag Ontology is a sufficient model as we only need to identify a particular tag. To match the tripartite tagging model that the Tag Ontology provides, the query can be thought of as ‘list items involved in any tagging action using the “semanticweb” tag’ and it is translated into the SPARQL query language as follows:

        PREFIX tags: <>
        SELECT ?item
        WHERE {
          [] a tags:RestrictedTagging ;
             tags:associatedTag ?tag ;
             tags:taggedResource ?item .
          ?tag tags:name "semanticweb" .
        }

    8.4.2 List the ten latest items tagged by Alexandre on SlideShare

    In this query, we introduce the concept of a tag space, i.e. the service in which the tagging action occurred. As we saw previously, the Tag Ontology does not

    [32] (URL last accessed 2009-07-16)
    [33] (URL last accessed 2009-07-07)

  • 162. provide a way to model this space, but the ‘has_space’ property from SIOC can be used in this context to identify the space associated with a tagged item. The following query retrieves the latest items tagged by one user (identified with his FOAF URI) on SlideShare as well as the related tag labels, with the items being ordered by date, most recent first. The result of such a query is displayed in Figure 8.13.

        PREFIX tags: <>
        PREFIX dct: <>
        PREFIX sioc: <>
        SELECT DISTINCT ?item ?tag
        WHERE {
          [] a tags:RestrictedTagging ;
             tags:associatedTag [ tags:name ?tag ] ;
             tags:taggedResource ?item ;
             tags:taggedBy <> .
          ?item sioc:has_space <> ;
                dct:created ?date .
        }
        ORDER BY DESC(?date)
        LIMIT 10

    Fig. 8.13. Identifying content tagging in a particular platform

    Another option is to identify the tagging space not via the ‘has_space’ property but by relying on the user account (rather than on the physical person). Normally, a user account is related to a particular service which acts as a tag space. This can be achieved thanks to SCOT’s ‘taggingAccount’ property, which is used to identify a particular instance of any ‘User’ (from SIOC) involved in a tagging action,

  • 163. leading to the following example SPARQL query. (We shall describe this distinction between the physical person and associated user accounts in more detail when talking about the SIOC model in Chapter 11.)

        PREFIX tags: <>
        PREFIX scot: <>
        SELECT ?tag
        WHERE {
          [] a tags:RestrictedTagging ;
             tags:associatedTag ?tag ;
             scot:taggingAccount <> .
        }

    8.4.3 List the tags used by Alex on SlideShare and by John on Flickr

    This query is somewhat similar to the previous one, as it requires the use of either SCOT or SIOC to model the particular websites used for the tagging actions. However, while the previous example focused on a single user within a single tag space, this one can be used to identify similar tagging behaviours between different users and different applications.

    It thus emphasises one of the benefits of a common semantic model for representing tags and related objects between Social Web applications (in this case, running a sort of cross-folksonomy analysis). The following example SPARQL query retrieves similarly-used tags from two different users in two different spaces, identified by their user accounts modelled in SIOC.

        PREFIX tags: <>
        PREFIX scot: <>
        SELECT ?tag
        WHERE {
          [] a tags:RestrictedTagging ;
             tags:associatedTag ?tag ;
             scot:taggingAccount <> .
          [] a tags:Tagging ;
             tags:associatedTag ?tag ;
             scot:taggingAccount <> .
        }
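    To make the logic of this cross-folksonomy query concrete, the following Python sketch computes the same kind of result over an in-memory list of tagging actions. It is a simplified illustration only: the account identifiers, tags and items are invented sample data, and a real deployment would run the SPARQL query above against an RDF store instead.

```python
from typing import NamedTuple


class Tagging(NamedTuple):
    """One tagging action: who tagged (account), with which tag, on which item."""
    account: str  # user account on one service, which acts as the tag space
    tag: str
    item: str


# Invented sample data; the account names are hypothetical.
TAGGINGS = [
    Tagging("slideshare:alex", "semanticweb", "slides/1"),
    Tagging("slideshare:alex", "sioc", "slides/2"),
    Tagging("flickr:john", "semanticweb", "photo/7"),
    Tagging("flickr:john", "galway", "photo/8"),
]


def tags_of(taggings, account):
    """Set of tags used by one account."""
    return {t.tag for t in taggings if t.account == account}


def shared_tags(taggings, account_a, account_b):
    """Tags used by both accounts, i.e. the join on ?tag in the query."""
    return sorted(tags_of(taggings, account_a) & tags_of(taggings, account_b))


if __name__ == "__main__":
    print(shared_tags(TAGGINGS, "slideshare:alex", "flickr:john"))
```

    The set intersection plays the role of the shared ?tag variable: only tags that appear in a tagging action for each of the two accounts are returned.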
  • 164. 8.4.4 Retrieve any content tagged with something relevant to the Semantic Web field

    Finally, we will describe a query that can be used to identify content tagged with concepts related to the Semantic Web. While the query in Section 8.4.1 does something similar, it will not retrieve posts written in French, for example, that use the tag ‘websemantique’ instead of ‘semanticweb’, as that query is based on a text-string matching approach. However, if people were encouraged to use a precise URI as well as a simple tag to define the meaning of a tag (modelling this relationship using the MOAT or Common Tag vocabularies), e.g. using the corresponding DBpedia category, we would then be able to retrieve all related posts independent of the language used.

    Moreover, thanks to these URIs, one can run even more advanced queries. As in the example of retrieving all posts related to the Semantic Web, we could also show those for which the topic is directly related to this URI (e.g. RDFa, SKOS, etc.) as the following query does, emphasising the benefits of combining data from various data sets, interlinked together in the entire Semantic Web graph. This query can be achieved thanks to different stacks of Semantic Web models for tagging, especially the Tag Ontology and MOAT.

    We actually check for all tagging actions in which the meaning is directly related to the URI of the Semantic Web category as defined in DBpedia, i.e. if something is related to the Semantic Web. As we can see in Figure 8.14, the query retrieves information tagged with ‘folksonomies’, ‘grddl’ and ‘linkeddata’ since the related URIs are linked to the chosen identifier for the Semantic Web in this query.

        PREFIX tags: <>
        PREFIX moat: <>
        SELECT ?item ?tag
        WHERE {
          [] a tags:RestrictedTagging ;
             tags:associatedTag [ tags:name ?tag ] ;
             tags:taggedResource ?item ;
             moat:tagMeaning ?meaning .
          ?meaning ?p <> .
        }

    Fig. 8.14. Identifying content tagged with a tag related to a particular concept with MOAT
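    The difference between matching on tag labels and matching on tag meanings can be sketched in a few lines of Python. This is a toy model under stated assumptions: the items, labels and `ex:` URIs are hypothetical, and the `LINKS` list stands in for the relationships that the query above finds in DBpedia.

```python
# Each tagging action: (item, tag label, URI giving the tag's meaning).
# All identifiers are hypothetical placeholders.
TAGGINGS = [
    ("post/1", "semanticweb", "ex:Semantic_Web"),
    ("post/2", "websemantique", "ex:Semantic_Web"),
    ("post/3", "grddl", "ex:GRDDL"),
    ("post/4", "apple", "ex:Apple_Inc"),
]

# Links between concept URIs: (subject, property, object).
LINKS = [("ex:GRDDL", "skos:subject", "ex:Semantic_Web")]


def items_by_label(taggings, label):
    """Text-string matching on the tag label, as in Section 8.4.1."""
    return sorted(item for item, tag, _meaning in taggings if tag == label)


def items_by_meaning(taggings, uri):
    """Language-independent matching on the URI that gives the tag's meaning."""
    return sorted(item for item, _tag, meaning in taggings if meaning == uri)


def items_related_to(taggings, links, uri):
    """Items whose tag meaning is linked to `uri` by any property."""
    related = {subject for subject, _prop, obj in links if obj == uri}
    return sorted(item for item, _tag, meaning in taggings if meaning in related)


if __name__ == "__main__":
    print(items_by_label(TAGGINGS, "semanticweb"))
    print(items_by_meaning(TAGGINGS, "ex:Semantic_Web"))
    print(items_related_to(TAGGINGS, LINKS, "ex:Semantic_Web"))
```

    Label matching misses the French-tagged post, meaning-based matching finds both posts, and following links between concept URIs additionally surfaces content about related topics.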
  • 165. 9 Social sharing of software

    As well as content, many third parties are producing application widgets that can be added by users to their social website profiles, but mechanisms for trusting the source of these widgets can be improved or augmented with information derived from social network connections. We shall give an overview of how lightweight semantics can be added to software descriptions and how these semantics can be linked to the other efforts that we describe in this book. We shall hence describe how social networking properties can be taken into account when retrieving project descriptions or when deciding whether or not a widget or particular project should be trusted for use on the Web.

    9.1 Software widgets, applications and projects

    Software descriptions are required for the embeddable applications or ‘widgets’ that are now proliferating on many of the big social networking websites. Third-party developers are now creating their own applications that can be added by users to their own social networking profiles. For example, a user may choose to add a widget to their profile showing a map of places they have visited in the world, or enabling some other functionality which may not be natively offered by the social website. Soon after Facebook added a developer’s interface to their site, 4,000 third-party applications had been made available and 70,000 developers had signed up to the developer community. Facebook’s active user count also jumped 70% in the four months after this contributable application layer was added. In parallel, Google has initiated the OpenSocial project [1], which allows developers to create application widgets that can be deployed across a range of OpenSocial-enabled social networking sites. The Universal Widget API (UWA) from Netvibes also allows one to write widgets that can be used on Netvibes, Google, the Mac OS X Dashboard and other applications with the same source code.
    It aims to ease the spread of such lightweight applications on the Web but also on desktops. However, there is an important question in relation to these widgets: how does one know whether to trust the source of an application? For example, does a user have to browse the complete source code (as a developer would) to avoid malware, or can they, at a first glance, just rely on some social networking aspect, i.e. trusting applications from people they know?

    [1] (URL last accessed 2009-06-05)

    J.G. Breslin et al., The Social Semantic Web, DOI 10.1007/978-3-642-01172-6_9, © Springer-Verlag Berlin Heidelberg 2009
  • 166. Before widgets, many applications were already produced and shared on the Web, mainly from open-source developer communities. In these communities, the social aspects of software project hosting and directory services are present but may not be immediately obvious. Websites like SourceForge.net [2], Savannah [3] or BerliOS Developer [4] offer tools for developers to manage their projects (source-code repositories, versioning, FTP space, etc.); Freshmeat [5] or Ohloh [6] allow them to reference and give visibility to their projects; Slashdot [7] provides the latest ‘hot’ news from the developer community and information on some projects. Yet, as with many social websites, one problem is that developers must subscribe to each hosting website independently, filling in their personal details on each one, and entering their project description again and again on each directory-like website.

    Beyond project hosting, these websites generally offer various social interaction tools for project tracking (such as blogs, wikis and mailing lists) which can provide a social aspect to a software project. Thus, while the software development itself does not necessarily involve a social aspect (for example, source code write access might be delegated to only a restricted set of users), users can be part of the process, by reporting bugs and participating on the mailing list, answering blog posts, or editing a project wiki page to suggest new functionalities. Software development can thus benefit from the participation of online communities in the development process, even if users are not directly ‘in touch’ with the source code itself. Moreover, if those tools are not provided by the project hosting service itself, developers can easily set them up using freely-available tools on the Web.

    9.2 Description of a Project (DOAP)

    As for blogs and wikis, a project description that describes a software application usually depends on the website it has been created on.
    There is thus a need for a common metadata modelling scheme for describing applications, in order to provide a unified way to represent an application wherever it comes from.

    DOAP [8] (Description of a Project) is an RDF vocabulary that aims to achieve this goal. It defines a ‘Project’ class with various properties, such as its maintainers, its licence, subversion access, etc. Moreover, since it is RDF-based, DOAP can be reused with existing vocabularies. In particular, from a social networking point of view, DOAP can be linked to FOAF to specify the developers of a project

    [2] (URL last accessed 2009-06-05)
    [3] (URL last accessed 2009-06-05)
    [4] (URL last accessed 2009-06-05)
    [5] (URL last accessed 2009-06-05)
    [6] (URL last accessed 2009-06-05)
    [7] (URL last accessed 2009-06-05)
    [8] (URL last accessed 2009-06-05)
  • 167. (with their associated identifying URIs) rather than just having a plain-text name, which can often raise ambiguity or heterogeneity problems.

    If a user decides to install a widget or application on their social networking service, they usually have to trust some third-party service that may provide them with a certificate which they can decide whether to trust or not. An alternate approach is to leverage the social graphs of publishers and consumers of application widgets. Let us suppose someone writes a Facebook or OpenSocial widget and they want to distribute it, using this new approach. A user may choose to trust applications written by people connected to them in their (distributed) social graph by no more than two degrees of separation, hence providing a simple way to decide whether or not to trust an application.

    It is possible to use semantics to represent the various parts required in this scenario: FOAF can be used to describe people and their (distributed) social graph; while DOAP can be used to describe software projects, with the widget or application as a component of these software projects. It is then possible to connect the application project and the person together using FOAF-DOAP relationships.

    By using such representations, the social graph (that is used here to determine whether to install a widget or not) does not have to be locked into one site, but rather can be distributed across any site that can be part of the larger interconnected social graph. As long as a publisher is part of the FOAF network, they do not even have to be on the particular social networking service where you install the application.
    This means that one can trust an OpenSocial widget on one social networking site if its author is someone he or she knows on another social website, where both sites have representations on the Semantic Web.

    9.2.1 Examples of DOAP use

    As we have introduced, DOAP provides an RDFS vocabulary for defining metadata related to software projects. As with FOAF and SIOC, it is a lightweight vocabulary, and this makes it easy for software developers who want to provide open and common descriptions of their projects using Semantic Web technologies. For example, the next snippet of code identifies metadata about the SIOC PHP API, defined as an instance of a ‘doap:Project’, and assigned a specific URI. As we can see in this example, and as explained previously, maintainers of projects are represented by their own URIs rather than by simple text strings.

        <> a doap:Project ;
          doap:name "SIOC PHP Export API" ;
          doap:shortname "sioc-export-api" ;
          doap:shortdesc "PHP API to create SIOC exporters" ;
          doap:description "SIOC PHP Export API provides an easy to write SIOC exporters for any PHP application" ;
          doap:homepage <> ;
          doap:download-page <> ;
          doap:programming-language "PHP" ;
          doap:license <> ;
          doap:maintainer <> ;
          doap:maintainer < foaf-captsolo.rdf#Uldis_Bojārs> ;
          doap:developer <> ;
          doap:developer < foaf-captsolo.rdf#Uldis_Bojārs> ;
          doap:repository [
            a doap:SVNRepository ;
            doap:location <>
          ] .

  • 168. While DOAP descriptions can be created by hand, various DOAP exporters for major free software development websites have been written by developers (see for example the RDF exporter for Ohloh [9]). Some websites natively expose DOAP files for the projects they host, such as the Python Package Index [10]. These exporters allow software metadata to be available on the Web, described in a uniform way using the DOAP vocabulary (rather than just being embedded in web pages, which makes automatic reuse by software agents difficult).

    As one can see in the above example, there are various ties between FOAF and DOAP. Since any project can have various developers or maintainers, DOAP offers the ability to use not only a name to define an author, but their URI, i.e. his or her identifier on the Semantic Web, generally associated with a FOAF profile. Thanks to URI identification, and in spite of the fact that these profiles are distributed on the network, the software graph (DOAP), the identity graph (FOAF) and even the content graph (SIOC) can be connected together, providing a complete overview of the online activity and identity of people working on a given project.

    For example, Figure 9.1 shows how different graphs, related mainly to FOAF, SIOC and DOAP, can interact together to provide a complete Semantic Web description of a network, a widget description and a related blog post by various people, in a distributed but interlinked way that can then be used for project identification purposes.

    Moreover, projects can be related to various topics (e.g. social networking, security, PHP programming, etc.).
    Here, once again, instead of relying on text strings, people can use URIs to define project topics in a machine-understandable way.

    [9] (URL last accessed 2009-06-05)
    [10] (URL last accessed 2009-07-17)
  • 169. Fig. 9.1. Linking people, content and social widgets across websites

    A good practice would be to use URIs of topics as defined in DBpedia, or other data sets from the Linking Open Data movement. The link between the project and a topic URI can be defined directly by the project’s author, may be extracted from the project’s textual description using an NLP (natural language processing) algorithm, or can be added by the author via free-form keyword tagging using MOAT as explained earlier in Chapter 8. For example, since our example project is related to Semantic Web technologies, and particularly to the SIOC vocabulary, the following code states the links between the project and those topics, uniquely identified with their DBpedia URIs.

        <> dc:subject <> ;
           dc:subject <> .

    Once again, and with reference to the earlier chapter on social tagging, expressing these URIs offers new capabilities regarding information exchange and modelling (we will also demonstrate this later).
  • 170. 9.3 Crawling and browsing software descriptions

    As with FOAF profiles or any RDF data, DOAP files may be distributed over the network, which can make it difficult for end users or developers to discover them. In order to solve this problem, an architecture was proposed by (Bojārs et al. 2007a) involving various components acting together: (1) Semantic Radar [11], a Firefox plugin whose goal is to discover RDF documents from HTML pages (either using auto-discovery links or thanks to embedded RDFa); (2) PingTheSemanticWeb [12] (PTSW), a ping service for Semantic Web documents (from Zitgist LLC [13]) which stores a fresh listing of RDF files it has received pings about; and (3) doap:store [14], a collaborative and open directory of DOAP projects.

    Fig. 9.2. A food chain to discover and reuse RDF data distributed over the Web

    [11] (URL last accessed 2009-06-05)
    [12] (URL last accessed 2009-06-05)
    [13] (URL last accessed 2009-07-04)
    [14] (URL last accessed 2009-06-05)
  • 171. In fact, while all of these components were developed separately, they all act with each other to provide the complete Semantic Web food chain depicted in Figure 9.2, which can be used to help discover any kind of RDF document (not only DOAP ones) and can be used in other third-party applications (such as the SWSE search engine).

    When people browse the Web using Semantic Radar, the plugin sends a ping to PTSW each time an RDF file is found. PTSW then stores a link to this RDF file in its database, and provides a list of pinged documents to developers (which may then be organised by type). In this system, discovering documents and storing pings is not only dedicated to DOAP, but can be useful for people who are looking for FOAF or SIOC files.

    Fig. 9.3. Screenshot of doap:store in action

    Finally, in this architecture, the doap:store service fetches the list of new DOAP files on a regular basis to provide a directory of DOAP projects that can then be queried and browsed. doap:store (Figure 9.3) was one of the first tools to use this architecture, but anyone can benefit from it, by focusing on creating the application rather than finding and crawling the data. doap:store currently hosts more than 9700 projects [15] and new ones are constantly being added to the directory thanks to the architecture described previously. An interesting point in this workflow is the social process it involves. Since anyone can contribute just by browsing the Web, this means that any user can be a part of the Semantic Web

    [15] (URL last accessed 2009-07-17)
  • 172. document discovery process, weaving the ‘architecture of participation’ principle from Web 2.0 into the Semantic Web. The Semantic Radar plugin has also proved to be popular. In the time period from November 2006 to September 2008, this tool was downloaded 8068 times. On one particular day (27th August 2008), it had 1767 active daily users.

    9.4 Querying project descriptions and related data

    As in the previous chapter on tagging, we will now detail some SPARQL query examples that show the benefits of Semantic Web technologies for modelling such software project-related data. The following queries can be tried out via the LOD SPARQL endpoint [16] that hosts a replica of the Linking Open Data cloud for querying, and it also contains both FOAF and DOAP data suitable for such queries.

    9.4.1 Locating software projects from people you trust

    If we consider that a user will only trust software applications written by people that they have added as personal connections (represented on the Semantic Web using FOAF), the following query will retrieve projects in which one of the maintainers of a project is in their network, where the original user is identified with <$uri>:

        PREFIX foaf: <>
        PREFIX doap: <>
        SELECT DISTINCT ?project ?friend
        WHERE {
          <$uri> foaf:knows ?friend .
          ?project a doap:Project ;
                   doap:maintainer ?friend .
        }

    For example, the following results are retrieved when the above query is performed using the <> URI.

    Fig. 9.4. Identifying projects via social networking relationships

    [16] (URL last accessed 2009-07-17)
  • 173. Moreover, as we will explain in Chapter 10 when describing FOAF, instead of giving a URI, one can use an inverse functional property (IFP) to identify themselves, such as an e-mail address or an OpenID URL. A similar query can be used if one decides to trust not only their direct friends, but also their friends-of-friends as shown below, retrieving the project, its maintainer, and the person that acted as an intermediary connection:

        SELECT DISTINCT ?project ?friend ?friendofafriend
        WHERE {
          <$uri> foaf:knows ?friend .
          ?friend foaf:knows ?friendofafriend .
          ?project a doap:Project ;
                   doap:maintainer ?friendofafriend .
        }

    Also, the query could be extended to express various degrees of connectivity. The current SPARQL specification only allows node-arc-node queries, which means that for each desired path length, the query must be adapted. However, a SPARQL ‘path’ extension like SPARQLer (Kochut and Kanik 2007) can be used with appropriate SPARQL engines, allowing us to write queries like ‘find all projects from people I’m connected to via a path of between one and three (inclusive) foaf:knows relationships’.

    9.4.2 Locating a software project related to a particular topic

    Similar to the earlier example of blog posts and associated topics, where projects are related to topics using URIs rather than keywords, projects around a particular topic can easily be found. Once again, we show how various data sets interlinked with URIs in this ‘Giant Global Graph’ enable us to perform advanced queries.

    Moreover, this can be combined with a social networking aspect. The following example query could be constructed to retrieve all projects with a topic related to the Semantic Web created by people known to a user with the identifier <$uri>, hence using the social networking aspect to make the query even more relevant:

        SELECT DISTINCT ?project ?friend
        WHERE {
          ?project rdf:type doap:Project ;
                   doap:maintainer ?friend ;
                   dc:subject ?topic .
          ?topic ?rel <> .
          <$uri> foaf:knows ?friend .
        }
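    The bounded-path idea from Section 9.4.1 (‘projects from people I’m connected to via between one and three foaf:knows relationships’) can also be sketched outside SPARQL. The following Python fragment is a minimal illustration using a breadth-first search; the people, projects and the `KNOWS` and `MAINTAINS` dictionaries are invented sample data, and the sketch treats foaf:knows as a directed relationship.

```python
from collections import deque

# Invented sample data: who knows whom (directed), and who maintains what.
KNOWS = {"me": ["alice"], "alice": ["bob"], "bob": ["carol"], "carol": ["dave"]}
MAINTAINS = {"alice": ["proj-c"], "bob": ["proj-a"], "dave": ["proj-b"]}


def people_within(knows, start, max_hops):
    """Map each person reachable in 1..max_hops steps to their hop distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distance[person] == max_hops:
            continue  # do not expand beyond the allowed path length
        for friend in knows.get(person, []):
            if friend not in distance:
                distance[friend] = distance[person] + 1
                queue.append(friend)
    del distance[start]  # the starting user is not their own connection
    return distance


def trusted_projects(knows, maintains, start, max_hops):
    """(project, maintainer, hops) for maintainers within max_hops of start."""
    reachable = people_within(knows, start, max_hops)
    return sorted(
        (project, person, hops)
        for person, hops in reachable.items()
        for project in maintains.get(person, [])
    )


if __name__ == "__main__":
    for row in trusted_projects(KNOWS, MAINTAINS, "me", 3):
        print(row)
```

    Setting `max_hops` to 1 or 2 corresponds to the friend and friend-of-a-friend queries shown earlier, while larger values give the behaviour that would otherwise require a path extension or one hand-written query per path length.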
  • 174. 10 Social networks

    Social networking services (SNS) allow a user to create and maintain an online network of close friends or business associates for social and professional reasons. There has been an explosion in the number of online social networking services in the past five years, so much so that the terms YASN and YASNS (Yet Another Social Network[ing Service]) have become commonplace. However, these sites do not usually work together and therefore require you to re-enter your profile and redefine your connections when you register for each new site. There are also deficiencies in the traditional idea of a social network as a set of individuals and relationships, in that it does not consider the shared objects which bring people together, and the semantics that can be used to represent these interlinked people and objects.

    10.1 Overview of social networks

    The ‘friend-of-a-friend effect’ often occurs when someone tells someone something and they then tell you [1] - linked to the theory that anybody is connected to everybody else (on average) by no more than six degrees of separation. This number of six is often attributed to sociologist Stanley Milgram who conducted an experiment in the late 1960s (Travers and Milgram 1969). Random people from Nebraska and Kansas were told to send a letter (via intermediaries) to a stock broker in Boston. However, they could only give the letter to someone that they knew on a first-name basis. Amongst the letters that found their target (around 20%), the average number of links was around 5.5 (rounded up to 6). While this experiment did not sufficiently verify the six degrees number, it does demonstrate that most individuals are separated by just a few hops. According to a report in Nature News [2], users of the Microsoft Messenger instant messaging service are 6.6 degrees away from each other.
    The six degrees idea is nicely summed up by this quote from a play called ‘Six Degrees of Separation’ written by John Guare:

        I read somewhere that everybody on this planet is separated by only six other people. Six degrees of separation between us and everyone else on this planet. The President of the United States, a gondolier in Venice, just fill in the names. [...] It’s not just big names – it’s anyone. A native in a rain forest, a Tierra del Fuegan, an Eskimo. I am bound – you are bound – to everyone on this planet by a trail of six people.

    [1] As found in the old Irish expression “Dúirt bean liom go ndúirt bean léi”, translated as “A woman told me that a woman told her”
    [2] (accessed 2009-06-08)

    J.G. Breslin et al., The Social Semantic Web, DOI 10.1007/978-3-642-01172-6_10, © Springer-Verlag Berlin Heidelberg 2009
  • 175. Some other related ideas include the Erdős number: the number of links required to connect scholars to mathematician Paul Erdős (1913-1996) via co-authorship of papers. Erdős was a prolific writer who co-authored over 1500 papers with more than 500 authors. Jerry Grossman’s site allows mathematicians to compute their Erdős numbers: the average connecting path length (among mathematicians only) is 4.65 while the maximum is 13.

There is also the Kevin Bacon game, where the goal is to connect any actor to Kevin Bacon, by linking actors who have acted in the same movie. It was invented by three Albright College students (Craig Fass, Brian Turtle, and Mike Ginelly) in 1994, and the ‘Oracle of Bacon’ website progressed from this by leveraging the Internet Movie Database to find the shortest link between any two actors. According to this service, most actors are within three links of each other, and the most central actor is Rod Steiger (Kevin Bacon is ranked 1049th in terms of closeness to the centre). The theory is that this is due to the huge range of movie genres that Steiger acted in. The game was also parodied by the Dilbert comic strip:

    Ratbert: ‘Name me a famous celebrity.’
    Dilbert: ‘Sandra Bullock.’
    Ratbert: ‘…Sandra Bullock was in a movie with Kevin Spacey, and Kevin Spacey eats bacon. See? Everyone on Earth is only one degree from a person named Kevin who eats bacon!’
    Dilbert: ‘That is so close to being interesting…’

Fig. 10.1. Two people interconnected through the small-world network theory

It is often found that even though one route is followed to get in contact with a particular person, after talking to them there is another obvious connection that was not previously known about. An example of this is shown in Figure 10.1 where the original link between John B. and Marc (via Uldis and Valdis) was reinforced by further links after John B. and Marc first met and had a chat.
This is part
  • 176. of the small-world network theory (Watts and Strogatz 1998), which says that most nodes in a network exhibiting small-world characteristics (such as a social network) can be reached from every other node by a small number of hops or steps. As this graph illustrates, different paths can be selected to go from one person to another, for example, a path related to close friends, another one related to family members and another through business networks.

Fig. 10.2. Radial view of those connected to a person by two degrees of separation

Even in a small-sized SNS, there can be a lot of links available for analysis, and this data is usually meaningless when viewed as a whole, so one usually needs to apply some social network analysis (SNA) techniques. Apart from comprehensive textbooks in this area (Wasserman and Faust 1994), there are many academic tools for examining social networks and performing common SNA routines. For example, UCINET (Borgatti et al. 2002) and Pajek (Batagelj and Mrvar 1998) can be used to drill down into various social networks. A common method is to reduce the amount of relevant social network data by clustering. One can choose to cluster people by common friends, by shared interests, by geographic location, by tags, etc. The JUNG framework (O’Madadhain et al. 2003) for network and graph visualisation can also be used to develop custom analytics and visual tools for social networks.
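The notion of reaching one person from another in a small number of hops can be made concrete with a breadth-first search. The following is only a minimal sketch: the function name and the toy network (which loosely borrows the names from Figure 10.1) are assumptions made for illustration, not data from the book.

```python
from collections import deque


def degrees_of_separation(friends, start, target):
    """Return the minimum number of hops from start to target, or None."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        for friend in friends.get(person, ()):
            if friend == target:
                return hops + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, hops + 1))
    return None  # no chain of acquaintances links the two people


# Hypothetical adjacency lists; each link is listed in both directions.
friends = {
    "John B.": ["Uldis"],
    "Uldis": ["John B.", "Valdis"],
    "Valdis": ["Uldis", "Marc"],
    "Marc": ["Valdis"],
}

print(degrees_of_separation(friends, "John B.", "Marc"))  # 3
```

Because breadth-first search explores the network one hop at a time, the first time the target is seen is guaranteed to be along a shortest chain.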
  • 177. In social network analysis, people are modelled as nodes or ‘actors’. Relationships (such as acquaintanceship, co-authorship, friendship, etc.) between actors are represented by lines or edges. This model allows analysis using existing tools from mathematical graph theory and mapping, with target domains such as movie actors, scientists and mathematicians (as already mentioned), sexual interaction, phone call patterns or terrorist activity.

There are some useful tools for visualising these models, such as Vizster (Heer and Boyd 2005), based on the Prefuse open-source information visualisation toolkit. Prefuse (Heer et al. 2005) allows one to visualise networks in a variety of forms, including a radial layout (showing layers of friends and friends-of-friends and friends-of-friends-of-friends as in Figure 10.2) or an animated force-directed view (the same data is displayed in Figure 10.3, but in this case a physics simulation of interacting forces is used to draw the graph where the nodes repel each other and the edges act as springs).

Fig. 10.3. Force-directed view of those connected to a person by two degrees of separation
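To give a feel for what a force-directed view involves, here is a rough sketch of a layout loop in which every pair of nodes repels and every edge pulls its endpoints together. It is not Prefuse's implementation; the force formulas, the constants `k` and `step`, and the example names are all assumptions chosen for illustration.

```python
import math
import random


def force_directed_layout(nodes, edges, iterations=200, seed=0):
    """Return {node: [x, y]} after a simple repel/attract simulation."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    k = 1.0      # preferred edge length (arbitrary choice)
    step = 0.05  # maximum distance a node moves per iteration

    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}

        # Every pair of nodes repels, more strongly when close together.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = k * k / dist
                force[a][0] += push * dx / dist
                force[a][1] += push * dy / dist
                force[b][0] -= push * dx / dist
                force[b][1] -= push * dy / dist

        # Every edge acts as a spring pulling its two endpoints together.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = dist * dist / k
            force[a][0] -= pull * dx / dist
            force[a][1] -= pull * dy / dist
            force[b][0] += pull * dx / dist
            force[b][1] += pull * dy / dist

        # Move each node along its net force, capped at `step`.
        for n in nodes:
            fx, fy = force[n]
            mag = math.hypot(fx, fy)
            if mag > 0:
                scale = min(mag, step) / mag
                pos[n][0] += fx * scale
                pos[n][1] += fy * scale
    return pos


# Hypothetical four-person network.
nodes = ["Alice", "Bob", "Caroline", "Damien"]
edges = [("Alice", "Bob"), ("Bob", "Caroline"), ("Caroline", "Damien")]
layout = force_directed_layout(nodes, edges)
```

An animated view would simply redraw the graph after each iteration instead of returning only the final positions.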
  • 178. Others have combined SNA with Semantic Web technologies to determine social behaviour patterns, and MIT Media Lab are conducting mobile SNA research via their ‘reality mining’ project. On the security front, the NSA is using social network analysis technologies for homeland security, and there have been reports from the New Scientist of ‘automated intelligence profiling’ from social networking services like MySpace.

10.2 Online social networking services

There has been a proliferation of social networking sites or SNSs, which Boyd and Ellison (2007) define as a category of websites consisting of user profiles, which other users can comment on, and a traversable social network originating from publicly-articulated lists of friends. The idea behind such services is to make people’s real-world relationships explicitly defined online - whether they be close friends, business colleagues or just people with common interests. Most SNSs allow one to surf from a list of friends to find friends-of-friends, or friends-of-friends-of-friends for various purposes. While the majority of these sites are for purely social reasons, others have additional purposes such as LinkedIn which is targeted towards professionals. As already mentioned in relation to object-centred sociality, there has been some debate on what people consider as being useful as opposed to ‘useless’ social networking services.

Before 2002, most people networked using online services such as OneList (a mailing list service), ICQ (‘I seek you’, an instant messaging program) or eVite (a site for sending invitations). The first big SNS in 2002 was Friendster; in 2003, LinkedIn and MySpace appeared; then in 2004, orkut and Facebook (by a college student for college students) were founded; these were followed by Bebo (targeting both high school and college students) in 2005.
Social networking services usually offer the same basic functionalities: network of friends listings (showing a person’s ‘inner circle’), person surfing, private messaging, discussion forums or communities, events management, blogging, commenting (sometimes as endorsements on people’s profiles), and media uploading. In general, these sites do not work together and therefore require you to re-enter your profile and redefine your connections when you register for each new site.

Some motivations for SNS usage include building friendships and relationships, arranging offline meetings, curiosity about others, arranging business opportunities, or job hunting. People may want to meet with local professionals, create a network for parents, network for social (dating) purposes, get in touch with a venture capitalist, or find out if they can link to any famous people via their
  • 179. friends. SNSs can enable communities of like-minded people to group together and talk about items and interests in a way that they would never have been able to do as effectively or as regularly via meetups, phone conversations or e-mail. It is a lot easier to get involved in a community of interest through SNSs, and having these social networks centred around objects of common interest means that you can get the news and answers about things you are actually interested in very quickly.

SNSs are also useful for networks of geographically-disconnected friends, who may have been in school or college together but are now spread all over the world. They are very valuable for getting in touch with someone whose advice or skills you need, through a friend-of-a-friend or a friend-of-a-friend-of-a-friend. The negative aspects of SNSs include uses that they were never intended for: bullying, stalking and other types of abuse.

Many of these social networking services have the attractor of ‘how many friends do I have?’ which becomes almost like a viral phenomenon. When visiting a profile on one of these sites, a small number is often displayed beside someone’s picture that says they have 20 friends. Of course, the natural instinct is to become as popular as that person, and the visitor will try to gather at least 20 friends as well.

In addition to relationship management, social networks are sometimes used for viral marketing (Leskovec et al. 2006), although there are differing opinions as to how effective this is. For example, Karin Knorr-Cetina (Knorr-Cetina 1997) reports that ‘the additional purchases that resulted from recommendations are just a drop in the bucket of sales’ and that ‘marketers should take heed that even if viral marketing works initially, providing excessive incentives for customers to recommend products could backfire by weakening the credibility of the very same links they are trying to take advantage of’.
In contrast, Nielsen reported in July 2009 that personal recommendations and consumer opinions posted online are the most trusted forms of advertising globally.

A key feature of social networking sites is community-contributed content that can be commented upon by others. That content can be virtually anything: blog entries, board posts, videos, audio, images, wiki pages, user profiles, bookmarks, events, etc. Tagging is also common to many social networking websites.

Many SNSs are also opening up to web crawlers, making users’ profiles (or at least part of them) and content searchable and accessible without being logged onto the site. Eric Schmidt, speaking at Google’s Zeitgeist conference in 2007, said: ‘People don’t appreciate how many page views on the Internet are in social networks. It is very real. It’s a very real phenomenon.’

While it is difficult to calculate the exact proportion of all page views that do go to SNSs, we can get some indications of their popularity through rankings from
  • 180. Alexa and other sources. Facebook is at #5 in Alexa’s top 10, with MySpace ranked #7. In November 2006, Compete said that the 10 most popular domains accounted for about 40% of all page views on the Web, and nearly half of those views were from the social networking services MySpace and Facebook. That is nearly 20% of all page views coming from social networking services (just in those top 10 domains). Other popular SNSs according to Alexa include Hi5 at #17, V Kontakte at #32, Orkut Brazil at #38, Skyrock at #42 and Friendster at #43.

Interestingly, according to comparison figures from August 2007 by Nielsen//NetRatings, US teenagers aged 12 to 17 who visit both MySpace and Facebook spend more time at each site than those who visit just one site or the other. During that month, teens who visited both sites spent on average 20% more time on MySpace (than visitors to MySpace alone), and on average 26% longer at Facebook (than exclusive Facebook visitors). 80% of Facebook’s visitors in August 2007 also visited MySpace.

Between 2006 and 2008, Friendster received four patents that cover social networking services. One of Friendster’s 2006 patents applied to networks that limit relationships to a certain number of degrees of separation (i.e. where you cannot connect to someone who does not know someone who knows someone you know). The patent seems to be general enough to cover the activities of many other sites, especially those like LinkedIn that only allow people to connect within a certain number of degrees of separation. Reid Hoffman, founder, said: ‘Some of our folks have reviewed the claims, and think that it’s fairly obvious that none of them apply to us... So, in short, not worried.’ However, the patent might also have broader application, to networks without such limits, since the patent says that you can define that maximum number of degrees as ‘any number’.
To set up your own social network, there are a variety of options including elgg, the AROUNDMe social networking software system by Barnraiser, the Ning hosted social networking service, or the Drupal content management system installed with associated social networking modules.

10.3 Some psychology behind SNS usage

Social networking services are serving as places where people can define their offline relationships to others – family, friends, schoolmates, etc. – online, and as
  • 181. well as defining existing real-world relationships, they can begin to form new ones or strengthen others with people they may have had less contact with previously. However, few users realise the very public nature of content on these SNSs, especially through new social network search engines like Spock, Bigulo, Wink and even traditional search engines like Yahoo! or Google.

It is quite easy to deliberately or accidentally happen across someone’s profile, with all sorts of information there that they would not expect the public to be able to view. Few also grasp the persistent nature of information on the Web. Even after one’s SNS page has been edited to remove any unwanted information or compromising photographs, services like the Internet Archive or Google may well have an archived or cached copy for future generations or potential employers to examine.

Some SNSs give users very primitive control over what other people can see: just allowing users to state if their profiles should be public or private. Other SNSs offer more sophisticated access control, based on the number of degrees away that a contact is. In terms of how a person is perceived on an SNS, there is also a drive towards making profiles more interesting, and sometimes people go to extremes to do so by posting pictures on their site that they certainly would not hand out to a stranger on the street. This is fine as long as users remember that possibly anyone can see their profile and if they are assured that the benefits of being in a particular community outweigh the potential side effects.

Often due to the social norms in SNS-based communities, i.e. the perceived closed nature of the community, individuals may appear to be acting in a more ego-centric manner than they would in other situations (because ‘everyone is doing it’). This is dependent on the personality of the individual, as not everyone will be comfortable uploading photographs taken while out drinking with friends the night before (whereas some will). People will often say things online that they would not say to another person face-to-face, but the reverse is also true and others will not get the same enjoyment from interacting through a web browser as they would in real life.

Virtual reputation is also becoming an obsession with SNS users, e.g. through hit counters on profile pages or facilities like ‘Share the Luv’ on Bebo. Fortunately, most SNSs only allow positive increases in reputation, avoiding the problems encountered in eBay or with bulletin board karma systems where negative points can be awarded. As already mentioned, one of the most common ego boosters is the number of friends one has along with the number of comments or ‘scraps’ that people are leaving on one’s SNS profile. To increase one’s total number of friends, one common approach is to examine the friends lists of direct contacts to look for one’s friends that may have not been connected to yet.
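The approach of scanning the friends lists of direct contacts for people one has not yet connected to can be sketched in a few lines. This is an illustrative assumption about how such a feature might work, not a description of any particular SNS; the function name and sample data are hypothetical.

```python
from collections import Counter


def suggest_friends(friends, person):
    """Rank friends-of-friends of `person` by number of mutual friends."""
    direct = set(friends.get(person, ()))
    counts = Counter()
    for contact in direct:
        for candidate in friends.get(contact, ()):
            # Skip the person themselves and existing direct contacts.
            if candidate != person and candidate not in direct:
                counts[candidate] += 1
    # Most mutual friends first; ties broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical friends lists.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave", "erin"},
    "carol": {"alice", "dave"},
    "dave": {"bob", "carol"},
    "erin": {"bob"},
}

print(suggest_friends(friends, "alice"))  # [('dave', 2), ('erin', 1)]
```

The same traversal, limited to a fixed number of hops, is also the basis of the degree-based access control mentioned above: a profile owner's content is shown only if the viewer is found within the permitted distance.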
  • 182. Others are bypassing the establishment of a trust model on social networking services in favour of self promotion in order to attract new people to their profiles (e.g. through attractive photographs), thereby increasing the chances of someone serendipitously happening across their profile via a friend or search. However, one could probably attract better matches to one’s profile by expanding on interests, the events one liked and films one went to, or even just by blogging rather than following this image-centred approach.

There is also a ritualistic element to SNSs with people entering their daily thoughts on a blog linked to their profile and checking in for comments on a regular basis, thereby creating a strong tie between someone and their online presence. For those who are endeavouring to make themselves more popular, it is not enough to just contribute to one’s own page: users must also comment on others’ pages to create more backlinks and drive people back to their own profile. Fortunately, this means interacting with other people online (and offline), balancing the ritualistic maintenance of one’s own profile somewhat with feedback from others. For some, SNS usage can border on addiction, where people have to regularly log in to check what has changed on their own pages and on those of their friends. Facebook’s news feed allows users to see at a glance who (from their friends list) has updated their profiles.

Many social networking services also allow people to express their individuality through widgets, skins and other ‘pieces of flair’. From the individual’s perspective, being able to customise their page like this means that they can attract like-minded or random curious people to visit and keep on visiting their pages, making themselves more popular in the process. From the service provider’s perspective, it is driving up the revenues from the display of advertising on the site.
SNSs are causing young people to watch less television, and this also means that they are physically disconnected or away from their families while they are logged on. In the opposite direction, more conversations are being formed offline between teenagers discussing what so-and-so put on their profile page last night, but it is unlikely that the extra time spent online is less than that being spent offline as a result.

10.4 Niche social networks

Although some have argued against the need for niche social networking services due to the widespread usage of large sites like Facebook and MySpace, niche SNSs can provide a breath of fresh air when one wants to escape from the bigger overcrowded SNS cities. As long as a niche SNS or community site provides
  • 183. regularly-updated content to a steady or growing set of users, there is no reason that such sites should not persist or flourish on the Web.

As pointed out by Paul Gibler in his article ‘The Expanding World of Social Networking’, it is the fine-grained and targeted communities such as CafeMom, BOOMj and PEERtrainer that are experiencing recent growth. Niche SNSs have been set up to cater for age demographics (e.g. Multiply for seniors), home countries (SiliconIndia), charities (Ammado), gender (MothersClick for moms), occupations (The Financial Executives Networking Group) and interests (StreetCred for hip-hop). Gibler says: ‘If you have a clearly defined and unserved target market, there could be a place for a niche SNS to connect and serve your audience.’ And even if your target users are already using a large SNS, then a niche group can be housed therein.

Many niche SNSs already exist, and if the niche site is more relevant and kept updated for its users, it will survive. This also ties into the idea of object-centred sociality introduced earlier. Blogger Mark O’Neill sums it up nicely: ‘[...] by organising networks centrifugally around objects, social networking sites have meaning, even when they do not have 200 million users and even when they are centred around minority interests (like Thomas Kinkade paintings!). The point is that they are centred on objects which are in common.’

As an interesting example of a niche social network, Sun launched a social (and organisational) networking service in 2007 called OpenEco.org, which aims to help companies and organisations to improve their environmental footprint through best practice sharing. Normally, organisations must calculate their greenhouse gas (GHG) emissions with custom or proprietary tools, requiring significant internal resources or external consultancy.
Now, GHG data can be created and shared using the site’s ‘free, open and secure online tools for visualising and managing carbon footprints’, enabling organisations to benchmark against each other, to set footprint reduction goals, and to share best practices to reach these goals.

Similarly, Yokohama Tyre Corporation recently implemented a white-label SNS solution for their ‘online presence and green marketing initiative’ called Eco Treadsetters. The aim of the site is to enable Yokohama consumers across the US to communicate with each other, to create and navigate custom profiles, to form sub-communities, and to submit their environmental projects via the Eco Treadsetters site.

If the rise in universal SNSs such as Facebook and MySpace is mirrored by an explosion in the growth of niche SNSs, this will accelerate the demand for semantic-type applications that will allow people to travel seamlessly through various social networking services finding objects related to their interests along the
  • 184. way. This is where projects like OpenID, FOAF and SIOC can help (more on these later). For example, you may have a single login (via an OpenID-based service) that is tied to your interests (as defined in your FOAF profile) which can then be semantically matched to content items created across many social networking communities (represented in SIOC).

10.5 Addressing some limitations of social networks

A problem exists with most social networking services (SNSs) in that they usually do not work together and therefore you are required to re-enter your profile and redefine your social connections when you register for each new site. There have been a lot of complaints in recent years about the walled gardens that are social network sites. Users may have many identities on different social networks, where each identity was created from scratch. People are tired of repeating the same information in multiple places, and through standard sign-on systems like OpenID and profile representation mechanisms like FOAF, you can allow someone to define their identity and to reuse it wherever they choose to use it.

A reusable profile would allow a user to import their existing identity and connections (from their own homepage or from another site they are registered on), thereby forming a single global identity with different views. If data about a person is aggregated from various social networks and linked together using a common representation format, a single global identity can be formed with different views and associated reputations / histories.

Semantic Web vocabularies such as FOAF and microformats like hCard and XFN (XHTML Friends Network) can serve as useful platforms for linking or reusing the diverse information about a person from heterogeneous social networking sites and for performing operations on such reusable and linked data.
This can then be used to provide an enhanced view of an individual’s activity in a distributed social network (e.g. ‘show me all the content that Alice has acted on in the past three months’).

‘Social network portability’ is a related term that has been used to describe the ability to reuse one’s own profile across various social networking sites and applications. The founder of the LiveJournal blogging community, Brad Fitzpatrick, wrote an article in August 2007 from a developer’s point of view about forming a ‘decentralised social graph’, which discusses some ideas for social network portability and aggregating one’s friends across sites. Dan Brickley, the co-creator of the FOAF vocabulary, wrote a related article entitled ‘The World is Now Closed’ which talked about how SNSs should not define one’s relationships in absolute
  • 185. terms and that even an aggregate social graph cannot be so clearly defined. In parallel with this, a social network portability mailing list was established discussing many interesting topics including centralisation versus decentralisation, FOAF, XFN, hCard, OpenID, Bloom filters, ownership of your published content, categorising friends and personas, the OpenFriendFormat, SNAP (Social Network Application Platform), aggregation and privacy, and XMPP (Extensible Messaging and Presence Protocol).

We are also beginning to see the emergence of distributed social networking platforms, where social connections can be formed across sites. The Appleseed Project is an effort to create open-source social networking software that is based on a distributed model. For example, a profile on one Appleseed website could ‘friend’ a profile on another Appleseed website, and the two profiles could interact with each other. The open microblogging platform StatusNet also allows users to create friend connections across installations. Efforts are also underway to allow people to collect and manage their identities across various SNSs (PeopleAggregator, 30boxes, etc.), and solutions like OpenID are enabling people to have a single sign-on to any of the SNSs that they are a member of. DiSo (Distributed Social Networking applications) is a project which aims to implement open-source distributed social networks. NoseRub is a PHP and MySQL-based prototype framework for a distributed social network that allows profiles to be synchronised between various services running NoseRub, and it reuses open standards like OpenID, RSS and FOAF.

The evolving need for reusable profiles has been highlighted by several recent notable efforts. DataPortability is a group whose aim is to advance standards enabling data-sharing between services.
The Open Data Definition from Curverider (developers of the elgg SNS) aims to provide data portability of users, sites and objects, and the relationships between them. Google’s Social Graph API indexes publicly-articulated social connections and allows users to view their social network across multiple services. There is also an IETF working group aiming to standardise personal address books using the vCard format. These initiatives also make use of existing and open standards like FOAF, microformats and OpenID.
  • 186. Some sites like Dopplr have begun to offer features whereby you can bring your friends with you from another service by specifying something like your Gmail account details (matching e-mail addresses you use) or your Twitter account details (retrieving a list of those whose microblogs you follow). It would be even more useful to have a smaller set of standard, reusable contact formats that could make such services more widespread (thereby extending the number of services that you could import from). The Google Social Graph API is a nice example of something that can enable this, as it allows applications to reuse social graph information extracted from sources all over the Web and represented using the open formats XFN and FOAF.

‘The Bill of Rights for Users of the Social Web’ was authored in September 2007 for social websites that wish to guarantee users ownership of and control over their own personal information. As well as being able to port your personal profile and friends, it would be useful to be able to port your content items as well. There are some issues related to both transporting your friends (you may need their permission to do so) and also comments attached to your content (as you may need the permission of those commenters too), but you should at the very least be able to bring what belongs to you, i.e. your own SNS profile and the content that you yourself created (aligning with the aims of this ‘bill’).

10.6 Friend-of-a-Friend (FOAF)

Semantic Web technologies allow for a more expressive description of a social network, enabling the use of heterogeneous nodes and links denoting different types of objects and different types of relationships respectively. This enables us to express a model for an object-centred network where content and other items of interest can be described along with people in a decentralised manner.

Fig. 10.4. The FOAF logo
  • 187. The Friend-of-a-Friend (FOAF) project was started by Dan Brickley and Libby Miller in 2000 and defines a widely-used vocabulary for describing people and the relationships between them, as well as the things that they create and do. It enables people to create machine-readable web pages for people, groups, organisations and other related concepts. The main classes and properties in the FOAF vocabulary are shown in Figure 10.5. These include the commonly-used classes foaf:Person (for describing people), foaf:OnlineAccount (for detailing the online user accounts that they hold), and foaf:Document (for the documents that people create). Some of the most important properties are foaf:knows (used to create a friend link), foaf:mbox_sha1sum (often used as an identifier for a person) and foaf:topic_interest (used to point to resources representing an interest that a person may have).

foaf:knows is one of the most used FOAF properties: it acts as a simple way to create social networks through the addition of knows relationships for each individual that a person knows. For example, Bob may specify knows relationships for Alice and Caroline, and Damien may specify a knows relationship for Caroline and Eric; therefore Damien and Bob are connected indirectly via Caroline.

Fig. 10.5. Friend-of-a-Friend terms as illustrated by Dan Brickley
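The Bob and Damien example can be worked through with a few lines of code. In this sketch the knows statements are held as plain (subject, object) pairs rather than parsed from real FOAF files, and each statement is treated as linking the two people in both directions; both are simplifying assumptions, and the function names are hypothetical.

```python
# Each pair stands for one "X foaf:knows Y" statement from the example.
knows = [
    ("Bob", "Alice"),
    ("Bob", "Caroline"),
    ("Damien", "Caroline"),
    ("Damien", "Eric"),
]


def acquaintances(statements, person):
    """People linked to `person` by a knows statement in either direction."""
    linked = set()
    for subject, obj in statements:
        if subject == person:
            linked.add(obj)
        elif obj == person:
            linked.add(subject)
    return linked


def shared_contacts(statements, a, b):
    """Intermediaries through whom `a` and `b` are indirectly connected."""
    return acquaintances(statements, a) & acquaintances(statements, b)


print(shared_contacts(knows, "Bob", "Damien"))  # {'Caroline'}
```

Note that neither Bob nor Damien needs to publish anything about the other: the indirect link emerges only when their separately published statements are combined.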
  • 188. Anyone can create their own FOAF file describing themselves and their social network, using tools such as FOAF-a-matic or FOAF Builder from QDOS. In addition, the information from multiple FOAF files can easily be combined to obtain a higher-level view of the network across various sources, as shown in Figure 10.6. This means that a group of people can articulate their social network without the need for a single centralised database, following the distributed principles used in the architecture of the Web.

FOAF can be integrated with any other Semantic Web vocabularies, such as SIOC, SKOS, etc. Some prominent social networking services that expose data using FOAF include Hi5 (a social networking site), LiveJournal (a social networking and blogging community site), Vox (a social networking and blogging service), a microblogging site, and MyBlogLog (an application which adds community features to blogs). People can also create their own FOAF document and link to it from their homepage. Aggregations of FOAF data from many individual homepages are creating distributed social networks; this can in turn be connected to FOAF data from larger online social networking sites.

Fig. 10.6. Integrating social networks by using FOAF as a common representation format and having unique URIs for people (Kinsella et al. 2007)

Third-party exporters are available for major social websites including Flickr, Twitter (a microblogging service) and Facebook. For example, there is a MySpace RDF translation service that provides FOAF documents for MySpace users and Music Ontology metadata about the user’s audio tracks if they are also a musician, while the Flickr exporter (Passant 2008a) reuses the GeoNames vocabulary (and knowledge base) for geolocation information and the SIOC vocabulary for
  • 189. 184 The Social Semantic Webrepresenting user galleries. FOAF data can be easily produced from social soft-ware systems that feature some user profiles and friends lists, e.g. the vBFOAFexporter50 for the popular message board system vBulletin. Stephen Flinter’s se-mantictweet service51 allows one to retrieve one’s Twitter friends and followers asa FOAF document. The structure of the social network formed by relations expressed in FOAFdocuments on the Web has been studied in (Ding et al. 2005), particularly thesmall-world characteristics of the graph. FOAF documents usually contain per-sonal information, links to friends, and other related resources. The knowledgerepresentation of a person and their friends would be achieved through a FOAFfragment similar to that below.<> a foaf:Person ; foaf:name “John Breslin“ ; foaf:mbox <> ; foaf:homepage <> ; foaf:nick “Cloud“ ; foaf:depiction <> ; foaf:topic_interest <> ; foaf:knows [ a foaf:Person ; foaf:name “Sheila Kinsella” ; foaf:mbox <> ] ; foaf:knows [ a foaf:Person ; foaf:name “Hak-Lae Kim” ; foaf:mbox <> ] . We shall now give some examples of tasks that can be performed using socialnetwork data expressed in FOAF that would be difficult to achieve otherwise.10.6.1 Consolidation of people objectsAn important task in extracting social data from the Web is merging identifiers ofequivalent instances occurring across different sources. This involves identifyinginstances representing the same object, and unifying them into one entity. Objectconsolidation (or ‘smushing’) can be performed for instances which share the50 (URL last accessed 2009-06-09)51 (URL last accessed 2009-07-10)
  • 190. 10 Social networks 185same value for inverse functional properties or IFPs (Mika 2005b), for exampleusing foaf:mbox52. Another option is to provide explicit identification using in-stances of the OWL (Web Ontology Language) sameAs property between variousresources that identify the same person or data, despite different URIs being used.Fig. 10.7. Identity consolidation and social network browsing using data exported from varioussocial websites This best practice allows one to unify all of their identities from various export-ers (e.g. Flickr, Twitter, Facebook, etc.) and to then query their complete socialnetwork via a single entry point. Figure 10.7 shows a prototype application calledFOAFGear53 that queries the main FOAF URI for a person and then retrievesother URIs for that person (Flickr, Twitter, etc.) and related RDF files. SPARQLqueries are performed on each of them to retrieve relationships and an XML file ispassed to Graph Gear54 for rendering. Since it relies on a single representationformat, i.e. FOAF, and not on various distinct APIs, the complete application isonly about 100 lines of code. Finally, it can also be determined by consideringvarious alternative criteria and if a certain threshold is reached in similarity be-tween two instances that they can be considered equal (Aleman-Meza et al. 2006).However, while one can define such rules within his or her own restricted socialgraph, it may lead to unexpected results on the complete Web (for example, wheredifferent people sometimes have the same name), and identity management on theSemantic Web is a vast research topic (Choi et al. 2006).52 Defining a property as inverse functional (owl:InverseFunctionalProperty) implies that if tworesources share the same value for that property, they are the same even if they have differentURIs. FOAF defines various IFPs (foaf:mbox, foaf:openid, etc.)53 (URL last accessed 2009-06-01)54 (URL last accessed 2009-06-09)
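Object consolidation over inverse functional properties, as described in Section 10.6.1, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the URIs and property values in `descriptions` are hypothetical, the function name `smush` is ours, and a union-find structure is one possible way to group URIs, not necessarily the method used by any of the tools cited above.

```python
# Hypothetical person descriptions exported by three different sites.
# Each person URI maps to the inverse functional property values stated for it.
descriptions = {
    "http://photos.example/people/alice": {("foaf:mbox", "mailto:alice@example.org")},
    "http://blog.example/user/al": {
        ("foaf:mbox", "mailto:alice@example.org"),
        ("foaf:openid", "http://alice.example/"),
    },
    "http://micro.example/alice99": {("foaf:openid", "http://alice.example/")},
    "http://photos.example/people/bob": {("foaf:mbox", "mailto:bob@example.org")},
}


def smush(descriptions):
    """Group URIs that share a value for any inverse functional property."""
    parent = {uri: uri for uri in descriptions}

    def find(uri):
        # Follow parent links to the representative URI of the group.
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving keeps chains short
            uri = parent[uri]
        return uri

    first_seen = {}  # (property, value) -> first URI seen with that pair
    for uri, pairs in descriptions.items():
        for pair in pairs:
            if pair in first_seen:
                parent[find(uri)] = find(first_seen[pair])
            else:
                first_seen[pair] = uri

    groups = {}
    for uri in descriptions:
        groups.setdefault(find(uri), set()).add(uri)
    return sorted(sorted(group) for group in groups.values())


for group in smush(descriptions):
    print(group)
```

Here the three Alice URIs end up in one group because the blog account shares a mailbox with the photo account and an OpenID with the microblogging account, while Bob stays separate. The same grouping could also be driven by explicit owl:sameAs statements instead of shared property values.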
10.6.2 Aggregating a person’s web contributions

We may wish to retrieve content that a person has contributed to various sources on the Web, for example, all documents, images, chat events, etc. This is a difficult task to perform with a normal search engine as people may share their name with other people, or people may use different account names on different sites. A sample query over some FOAF data is shown below to get all the documents created by a particular person [55]:

PREFIX foaf: <>
SELECT DISTINCT ?doc
WHERE {
  ?doc a foaf:Document .
  ?doc foaf:maker <http://apassant.net/alex> .
}

However, since this query is based on a precise URI, it will not retrieve documents created by the same user while using another URI. One option to retrieve this content is to define owl:sameAs statements between this URI and others for the same person, such as:

<> owl:sameAs <> .

Then, by adding these statements into the triple store that holds the data, and assuming that it supports reasoning based on owl:sameAs, the query will also retrieve documents that have the other URI as a foaf:maker. Another way to retrieve a person’s contributions is to run the query not based on the URI, but rather based on an IFP (inverse functional property) such as the person’s foaf:mbox or foaf:openid. Since OpenID aims to become a standard for authentication on the Web, this can be a useful way to retrieve all contributions by a given person no matter which social website they use - providing the person signs in using the same OpenID URL. This method is shown in the following query:

PREFIX foaf: <>
SELECT DISTINCT ?doc
WHERE {
  ?doc a foaf:Document ;
    foaf:maker ?person .
  ?person foaf:openid <> .
}

[55] All the queries described in this section can be experimented with via the aforementioned LOD SPARQL endpoint, which contains FOAF data from various sources
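The difference between the two queries in Section 10.6.2 can be imitated without a triple store. The sketch below is a plain-Python stand-in for the SPARQL patterns, not a replacement for them: the triples, the example.* URIs and the helper names are all hypothetical, and no owl:sameAs reasoning is performed.

```python
# Hypothetical (subject, predicate, object) triples aggregated from two sites.
triples = [
    ("http://site-a.example/doc/1", "rdf:type", "foaf:Document"),
    ("http://site-a.example/doc/1", "foaf:maker", "http://site-a.example/user/alex"),
    ("http://site-b.example/post/7", "rdf:type", "foaf:Document"),
    ("http://site-b.example/post/7", "foaf:maker", "http://site-b.example/people/ap"),
    ("http://site-b.example/post/9", "rdf:type", "foaf:Document"),
    ("http://site-b.example/post/9", "foaf:maker", "http://site-b.example/people/zoe"),
    ("http://site-a.example/user/alex", "foaf:openid", "http://openid.example/alex"),
    ("http://site-b.example/people/ap", "foaf:openid", "http://openid.example/alex"),
    ("http://site-b.example/people/zoe", "foaf:openid", "http://openid.example/zoe"),
]


def subjects(triples, predicate, obj):
    """All subjects that have the given predicate and object."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def documents_by_maker_uri(triples, maker_uri):
    """Mirrors the first query: match documents on one precise maker URI."""
    docs = subjects(triples, "rdf:type", "foaf:Document")
    return sorted(docs & subjects(triples, "foaf:maker", maker_uri))


def documents_by_openid(triples, openid):
    """Mirrors the second query: match makers through a shared foaf:openid value."""
    docs = subjects(triples, "rdf:type", "foaf:Document")
    found = set()
    for person in subjects(triples, "foaf:openid", openid):
        found |= docs & subjects(triples, "foaf:maker", person)
    return sorted(found)


print(documents_by_maker_uri(triples, "http://site-a.example/user/alex"))
print(documents_by_openid(triples, "http://openid.example/alex"))
```

The URI-based lookup finds only the document from the first site, whereas the OpenID-based lookup also finds the post made under the second account.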
10.6.3 Inferring relationships from aggregated data

The simplest way of extracting a social network from the Web is to look at explicitly stated connections. Social networking sites and other types of social software allow users to express lists of friends. Blogging platforms may allow users to add a blogroll which is a list of favourite blogs. Depending on the platform, these connections may indicate a directed or undirected link between users. For example, blogroll links are frequently unreciprocated, and are therefore directed, but many social networking sites require both users to consent to the link, creating undirected ties. A sample query for extracting the social network formed by explicit foaf:knows relationships follows using the SPARQL query language.

PREFIX foaf: <>
SELECT DISTINCT ?s ?o
WHERE {
  ?s a foaf:Person .
  ?o a foaf:Person .
  ?s foaf:knows ?o .
}

In addition to explicitly stated person-to-person links, there are many implicit social connections present on the Web. For example, we may decide to suggest to Alice and Bob that they connect to each other directly if they both share an interest in surfing as shown in Figure 10.8. Links between people may be inferred due to links through some common objects, for example, people appearing in the same pictures, tagging the same documents, or replying to each other’s blog posts.

Fig. 10.8. Two people who get connected after a common foaf:interest is discovered

These connections indicate relationships of varying strengths - for example, e-mail communication may be interpreted as stronger evidence of a real tie than the case of one person replying to another’s blog post. Co-occurrence of names in documents would be an even weaker sign of a relation. A sample query for extracting the implicit social network formed by replies to posts follows (using the has_reply property from SIOC which we will describe later):

PREFIX foaf: <>
PREFIX sioc: <>
SELECT ?author1 ?author2
WHERE {
  ?post1 a sioc:Post ;
    foaf:maker ?author1 ;
    sioc:has_reply ?post2 .
  ?post2 a sioc:Post ;
    foaf:maker ?author2 .
}

Instead of running queries to retrieve those implicit relationships, we can define rules to make them explicit and to state the acquaintance of users on a weblog. For example, we can consider that there is a formal agreement relationship between two users (modelled with an arg:agreedWith relationship) as soon as one replies to a post from another one using ‘I agree’ in his or her answer [56]. To model this rule, we rely here on the SPARQL CONSTRUCT pattern which can be used to produce new statements from existing ones. Thus, we can send the following query to our triple store, and then insert the resulting RDF graph into the triple store so that the relationship will become explicit. The produced statements may then be used to extract a more precise social network within a blogging community when querying data.

PREFIX foaf: <>
PREFIX sioc: <>
PREFIX arg: <http://>
CONSTRUCT {
  ?author2 arg:agreedWith ?author1 .
}
WHERE {
  ?post1 a sioc:Post ;
    foaf:maker ?author1 ;
    sioc:has_reply ?post2 .
  ?post2 a sioc:Post ;
    foaf:maker ?author2 ;
    sioc:content ?content .
  FILTER REGEX(?content, "I agree", "i") .
}

The above examples can be applied to people and content both in and across sites. Traditional, non-semantic queries performed in SQL would be limited to one site and would require some kind of join on a user or content table. However, assuming that the required data has been crawled and is available in a store with a SPARQL endpoint, the use of shared semantically-rich vocabularies makes it possible to perform operations like these on data originating from many different sources. While the above examples result in simple networks of people and untyped ties, more complex social networks consisting of multiple node and link types can also be studied.

[56] Ideally, more advanced pattern matching and NLP methods should be used to define agreement between two users on a weblog, or perhaps a ‘thumbs up’ icon URI could be referenced

10.7 hCard and XFN

hCard [57] is a microformat used to describe people, organisations, and contact details for both. It was devised by Tantek Çelik and Brian Suda based on the vCard IETF format [58] for describing electronic business cards. Like FOAF, hCard can be used to define various properties relating to people, including ‘bday’ (a person’s birth date), ‘email’, ‘nickname’, and ‘photo’, where these properties are embedded within XHTML attributes. The specification for hCard also incorporates the Geo microformat which is used to identify the coordinates for a location or ‘adr’ (address) described within an hCard. For example, the hCard for John Breslin is:

<div class="vcard">
  <div class="fn">John Breslin</div>
  <div class="nickname">Cloud</div>
  <div class="org">National University of Ireland, Galway</div>
  <div class="tel">+35391492622</div>
  <a class="url" href=""></a>
</div>

XFN (XHTML Friends Network) [59] is another social network-oriented microformat, developed by Tantek Çelik, Eric Meyer and Matthew Mullenweg in 2003 just before the creation of the microformats community. XFN allows one to define relationships and relationship types between people, for example, ‘friend’, ‘neighbor’, ‘parent’, ‘met’, etc. XFN is also supported through the WordPress blogging platform: when adding a new blogroll link, one can use a form with checkboxes to specify additional metadata regarding the relationship between the blog owner and the person who is being linked to (which is then exposed as metadata embedded in the blog’s resulting XHTML).
For example, an XFN ‘colleague’-type link to Uldis Bojārs would be written as:

<a href="" rel="colleague">Uldis Bojārs</a>

When combined with XFN, hCard provides similar functionality to FOAF in terms of describing people and their social networks. The different types of person-to-person relationships available in XFN allow richer descriptions of social networks to be created as FOAF only has ‘knows’ relationships. However, FOAF can also be extended with richer relationship types via the XFN in RDF vocabulary [60] (developed in 2008 by Richard Cyganiak) or the Relationship vocabulary [61] (which includes a variety of terms including ‘siblingOf’, ‘wouldLikeToKnow’ and ‘employerOf’).

10.8 The Social Graph API and OpenSocial

10.8.1 The Social Graph API

The idea of the social graph and ‘social network portability’ is mainly about being able to bring your social network connections from one site to another. If implemented, and if you were on Facebook, you could then move to LinkedIn and bring your profile and connections with you. The global social graph is composed of all the social network connections that are distributed across a multitude of sites.

Google’s Social Graph API [62] is a step in this direction, and it ‘returns web addresses of public pages and publicly-declared connections between them’. This data is obtained from FOAF and XFN information embedded in other crawled pages, combined with specialist knowledge about the structure of certain large social websites. The API provides one with an easy method to find their social graph (both ‘me’ and ‘knows’ connections), and from this graph one can browse and query associated social objects through other services [63]. Because the API indexes semantic data from many social networking sites like Hi5, MySpace, LiveJournal, Twitter, etc., users can bring their social graph with them from those services when they sign up for a new site that supports the API.

An example of a tool that makes use of the Social Graph API is Dan Brickley’s FOAF extension for Mozilla Ubiquity [64]. This tool provides more context about the currently-viewed page, by consulting the Social Graph API for a list of other accounts that are claimed by the current page.

[57] (URL last accessed 2009-07-07)
[58] (URL last accessed 2009-07-07)
[59] (URL last accessed 2009-07-07)
[60] (URL last accessed 2009-07-07)
[61] (URL last accessed 2009-07-07)
[62] (URL last accessed 2009-06-09)
[63] (URL last accessed 2009-06-09)
[64] (URL last accessed 2009-06-08)
Those unhappy with the Social Graph API [65] have raised objections to the API being operated by a for-profit as opposed to a non-profit organisation, and there is some opposition to the idea of a single point of control rather than having a set of distributed indexes. Again, an option to disable crawling of one’s social graph (similar to ‘nofollow’ links for the Web) could be useful for public social graphs. On the other hand, such an API probably needs the backing and resources of a company like Google so that people will actually use it. Previous FOAF aggregator efforts like Plink and FoaFSpace could have provided such a service, but they did not achieve critical mass. As an example of usage for the Social Graph API, the following lookup [66] query will retrieve both inbound and outbound social graph links (the query parameters ‘edi=1’ and ‘edo=1’ will yield ‘nodes_referenced_by’ and ‘nodes_referenced’ in the output respectively) for the URI (‘q=’), and this query will return the output in pretty-print JSON format (‘pretty=1’) with no JavaScript callback function required (‘callback=’). The query yields:

{
  "canonical_mapping": {
    "": ""
  },
  "nodes": {
    "": {
      "attributes": { },
      "nodes_referenced": {
        "": {
          "types": [ "me" ]
        }
      },
      "nodes_referenced_by": {
        "": {
          "types": [ "contact" ]
        },
        " ontoeval/oe4/john-foaf.rdf": {
          "types": [ "me" ]
        },
        "": {
          "types": [ "me" ]
        },
        . . .

10.8.2 OpenSocial

The OpenSocial API [67] is an initiative from Google that enables ‘gadget’ portability, where social applications can be deployed across a variety of social networking sites. Google, Yahoo! and MySpace also formed the OpenSocial Foundation [68] as a non-profit entity to support such social application portability [69]. As summarised by Julian Bond [70], OpenSocial consists of a gadget API (for gadget programmers) and a standard for site owners to implement these gadgets on their own sites. Some data portability is possible through an OpenSocial REST API.

In terms of being able to transfer your social network profile and contacts across networks, the OpenSocial documentation for hosting applications states [71]: ‘Usually your SPI [Service Provider Interface] will connect to your own social network, so that an OpenSocial app added to your website automatically uses your site’s data. However, it is possible to use data from another social network as well, should you prefer.’ Of course, this will require social networks to enable such functionality, but if so, it could also be a step in the direction of cross-network portability.

[65] (URL last accessed 2009-06-09)
[66] (last accessed 2009-07-07)
[67] (URL last accessed 2009-06-09)
[68] (URL last accessed 2009-06-09)
[69] (URL last accessed 2009-06-09)
[70] (URL last accessed 2009-06-09)
[71] (URL last accessed 2009-06-09)
10.9 The Facebook Platform

Although it started off as a college-oriented SNS, Facebook has turned the corner with over 50% of Facebook users being non-students, and people over 24 are its fastest-growing demographic. Facebook’s success has also been demonstrated by the amount of acquisition interest from various parties including Yahoo! and Google. Microsoft bought a 1.5% stake in Facebook during 2007 [72].

However, Facebook’s rise in popularity has most certainly been due in some part to their Facebook Platform. The framework allows developers to create applications that can interact with Facebook’s core features and these applications can give custom functionality to users of Facebook (virtual gifts, classified ads, etc.). In mid-2008, 33,000 applications had been created using the developer interface, with 400,000 developers signed up to the Facebook developer community.

Some of the Facebook Platform technologies available for developers to work with include FBML (Facebook Markup Language), FQL (Facebook Query Language), a Facebook API and FBJS (Facebook JavaScript). All of these have been licensed in more generic forms as SNML (Social Network Markup Language), SNQL (Social Network Query Language), SNAPI (Social Network API), and SNJS (Social Network JavaScript) to the Bebo SNS.

Another service called Facebook Connect allows users to connect their Facebook identity, friends and privacy to any site, enabling third parties to implement and offer features from the Facebook Platform on non-Facebook sites. This service, combined with the aforementioned offerings to SNSs like Bebo, can be seen as directly competing with OpenSocial [73]. According to the Washington Post [74], one of the aims of the OWF [75] (Open Web Foundation, established in 2008) is to act as a venue where Google and Facebook can resolve differences between their OpenSocial and Facebook platforms, as well as work on a standard way to have their users interact with each other.
FBML is an adapted form of HTML which can be used by developers to customise the look-and-feel of their applications. FQL allows developers to easily query Facebook data from their applications in an SQL-like syntax. The Facebook API allows Facebook social data to be integrated into applications, e.g. data on friends, groups or events. FBJS allows developers to ‘sandbox’ JavaScript in their FBML code such that the JavaScript of one application does not have access to anything outside of its own scope.

The APIs for accessing Facebook data have specific terms of use which restrict the caching of their data. You have to access the data ‘on the fly’ within clearly defined constraints: you can interact with data on the social network via the Facebook APIs, but the data cannot be exported as a dump. However, OpenLink have developed an RDFizer middleware layer that talks to FQL and converts the requested Facebook profiles or associated data (e.g. photo albums) returned in XML form into semantic linked data on the fly and without caching [76]. (Rowe and Ciravegna 2008) have also described their ‘Facebook FOAF Generator’ [77] which returns FOAF data for a user who has authenticated with the service.

[72] (URL last accessed 2009-06-09)
[73] (URL last accessed 2009-06-09)
[74] (URL last accessed 2009-07-07)
[75] (URL last accessed 2009-06-09)

10.10 Some social networking initiatives from the W3C

In January 2009, the W3C co-sponsored a very successful workshop [78] on ‘The Future of Social Networking’ for which 72 position papers were submitted. 57 organisations were represented at the event, which aimed to bring together industry partners in order to discuss future challenges and opportunities for the social networking industry. The topics under discussion included decentralised and distributed social networks; the use and abuse of contextual information; policies, privacy and trust in social networks; accessibility; and networking via mobile devices.

One outcome of this workshop was the creation of a W3C Incubator Group on the topic of the ‘Social Web’ to progress the aforementioned workshop topics. The mission of this group [79] is ‘to understand the systems and technologies that permit the description and identification of people, groups, organisations, and user-generated content in extensible and privacy-respecting ways’. The group interacts with various other initiatives in the areas of policy, privacy, data portability, identity, and microblogging, usually through invited experts who participate in the group’s weekly teleconferences.

10.11 A social networking stack

So far, SNSs use explicit representations of social networks primarily for visualisation and browsing purposes. Yet, some research prototypes show that social networks are actually useful for more than just ego surfing to discover unexpected links in networks of friends.
For example, some efforts are under way to examine e-mail filtering and ranking based on social networks (Golbeck and Hendler 2004, Fisher et al. 2006). Richard Soderberg’s Filster [80] filter for procmail checks incoming mail against valid users from various networks such as orkut, CPAN (Comprehensive Perl Archive Network), and distributed FOAF profiles. Explicitly-represented social networking information can also provide a means for assessing a piece of information’s importance and relevance for many other kinds of information filtering - for example, in semantic attention management (Petrie 2006) - and routing, in general.

Another system that leverages social networking (in the instant messaging application space) is the xOperator [81] tool from the University of Leipzig (Dietzold et al. 2008). xOperator is a ‘semantic agent’ for XMPP (formerly Jabber)-based networks which finds and shares content about resources (using RDF and SPARQL) amongst a person and his or her friends, by creating an overlay of collaborative information agents on top of existing IM networks that are based on XMPP.

[76] (URL last accessed 2009-06-09)
[77] (URL last accessed 2009-06-09)
[78] (URL last accessed 2009-07-17)
[79] (URL last accessed 2009-07-17)
[80] (URL last accessed 2009-06-06)

Rather than building a separate social networking layer into tools (with all the associated maintenance problems), we propose that information space and application architects need to fold it into various technology stacks (see Figure 10.9). The Nepomuk project (that we shall cover in Chapter 13) does this for the desktop, but given the evolution toward ubiquitous computing and the so-called ‘Internet of things,’ which will deliver much more information, the Internet infrastructure itself might need to be augmented to include a social networking infrastructure to keep users from drowning in an ocean of unconnected and meaningless information. Just as the social semantic desktop Nepomuk provides an operating system layer for representing and exchanging information on the desktop, we propose that information creation on the Web and the Internet should take existing connections between content objects and people into account to provide meaning for this information.
For example, SNSs might include mechanisms to automate the creation of connections among information items or to route information based on existing relationships between people and content items.

Fig. 10.9. Proposal for making social networking a shared component across various desktop and Web applications

[81] (URL last accessed 2009-06-09)

A social networking stack needs to take into account a person’s relevant objects of interest and provide some limited data portability (at the very least, for their most highly-used or rated items). Through this, the actions and interactions of a person with other users and objects (exhibiting relevant properties) in existing systems can be used to create new user or group connections when a person registers for a new social networking site or application.

Also, instead of having a fragmented view of one’s network in each application, the social networking stack would let a user use all of their person-to-person connections in any application. To enable the sharing of existing contacts and to aid with the creation of new ones, the cross-application social networking stack will require a number of layers:

1. Personal authentication and authorisation layer. This layer would use a single sign-on mechanism (e.g. OpenID, Sxip [82]) to authenticate that an individual is who they claim to be, and would in turn ensure that they are authorised to make use of their social network connections (layer 2) and/or leverage previously created content items (layer 3).

2. Social network access layer. This layer would utilise the social networking contacts created by an individual across various platforms, for example, by collecting FOAF ‘knows’ relationships from multiple sites. However, access control is required as social connections may not always be bi-directional: i.e. there has to be some consent from both sides for certain transactions. For example, Alice may create a connection to Bob in order to view Bob’s public content, but Bob may have to approve the connection in the reverse direction if Alice ever wants to send him a direct message. This layer would not only ensure that the required directional links exist for various interactions, but would also verify that the source of this social network information is valid.

3. Content object access layer. This layer would collect a person’s relevant content objects, and verify that they are allowed to reuse data and metadata from these objects in the current application.
This could be achieved using SIOC (more later) as a representation format, aggregating a person’s created items (through their user accounts) from various site containers. For reputation purposes, this layer would also verify that these items were in fact created by the authenticated individual on whatever sites they reference. This may require provenance of information as well as signing of RDF graphs (Carroll et al. 2005) and possibly advanced policies for dealing with identity theft.

For the implementation of a social networking stack, various architectural alternatives exist: the existing Domain Name System (DNS) is an example of a possible architecture, but creates a central point of control. A peer-to-peer approach is another possibility which would be worthwhile to explore, especially since it preserves the distributed aspect.

The availability of a social networking stack would also have an effect on existing networking layers: social routing algorithms are able to deliver information directly to people for whom the information is relevant, with e-mail filtering and routing via social networks being just a simple example.

[82] (URL last accessed 2009-07-16)
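The consent check described for the social network access layer (layer 2) can be illustrated with a small Python sketch. It is a toy model under explicit assumptions: the class and method names are hypothetical, a one-way link is assumed to be enough for viewing public content, and a direct message is assumed to need links in both directions. Authentication (layer 1) and verification of where the links came from are left out.

```python
class SocialNetworkAccessLayer:
    """Toy model of layer 2: directed connections plus consent checks."""

    def __init__(self):
        # (source, target) pairs: source has declared a connection to target.
        self._links = set()

    def connect(self, source, target):
        """Record a one-way connection, e.g. collected from a foaf:knows statement."""
        self._links.add((source, target))

    def can_view_public_content(self, viewer, owner):
        # Assumption: a link declared by the viewer is sufficient for public content.
        return (viewer, owner) in self._links

    def can_send_direct_message(self, sender, recipient):
        # Assumption: a direct message needs consent from both sides.
        return (sender, recipient) in self._links and (recipient, sender) in self._links


layer = SocialNetworkAccessLayer()
layer.connect("Alice", "Bob")
print(layer.can_view_public_content("Alice", "Bob"))  # True
print(layer.can_send_direct_message("Alice", "Bob"))  # False

layer.connect("Bob", "Alice")  # Bob approves the reverse direction
print(layer.can_send_direct_message("Alice", "Bob"))  # True
```

Keeping the links directed, rather than storing a single symmetric friendship flag, is what lets the same data answer both kinds of question.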
11 Interlinking online communities

As relevant information from blogs, forums and other discussion primitives increasingly turns up in searches, solutions are required to access, leverage and interlink this information across community platforms in a standard way. The SIOC initiative (Semantically-Interlinked Online Communities) aims to enable the integration of online community information. SIOC provides a Semantic Web ontology for representing rich data from social websites in RDF. It has recently achieved significant adoption through its usage in a variety of commercial and open-source software applications, and is commonly used in conjunction with the FOAF vocabulary described earlier. By becoming a standard way for expressing user-generated content from such sites, SIOC enables new kinds of usage scenarios for online community site data, and allows innovative semantic applications to be built on top of the existing Social Web. We shall describe some of these applications, ranging from healthcare to business use.

11.1 The need for semantics in online communities

With the proliferation of social websites, there is growing need for providing methods to link these sites together in a meaningful way. When you use a search engine to find information on the Web to answer a specific question, you often find that the parts of your answer come from different community sites (forums, blogs, etc.). You also have to trawl across many of these sites before you can get a complete answer. It would be useful to have methods for expressing the information from these communities in a standard form that could then be linked together allowing people to say, for example, that something was written by the same person who wrote something else, or that it is related to something else on the same topic.
Due to this lack of interconnection, online communities suffer from a number of limitations as detailed below:

A major limitation with online communities at present is the repeated requirement to register an account on each new site that you wish to post content on. This involves filling out various text boxes, choosing a password, performing an e-mail verification step, etc. for each site.

Another issue is in relation to the content that a user creates across a variety of sites. At the moment, without some links between the various user accounts that a person owns, it is almost impossible to obtain a complete set of content items that a person has created on all the sites that they are registered on. It is also not feasible to find all the content created by a group of like-minded users in a distributed set of sites.

As mentioned, there are many sites discussing complementary topics, where the same information is being repeated on different sites, and/or relevant parts of information are distributed across a number of sites. Therefore, when searching for information on a particular topic using a traditional search engine, one will often have to traverse quite a few sites to find a complete solution to a certain query.

Finally, there is at present no way to create and view ‘distributed conversations’ across a number of community sites. Blogs use the trackback system to create links between blog posts, but there is no standard method of doing the same on message boards or other services.

J.G. Breslin et al., The Social Semantic Web, DOI 10.1007/978-3-642-01172-6_11, © Springer-Verlag Berlin Heidelberg 2009

To overcome these limitations, we investigated how an ontology could be created to describe the domain of online communities and what they consist of (e.g. users and posts and other simple terms that occur in these communities). Also of importance is how we could encourage adoption of such an ontology through compelling use cases. There are a lot of structures and inherent connections present in online communities, in that people tag content, make replies or create trackbacks between posts. The structure that is created in online communities is often hidden in some database behind the scenes, and semantics can be used to expose that structure.

The structures of most online discussions are also quite similar, whether they be on blogs, Usenet newsgroups or forums - they consist of discussion starters and replies or comments to the initial post. With a common ontology in place, we can use this to represent and connect the data from various online communities (e.g. if we wanted to connect discussions on the web archive of the Irish Linux Users’ Group mailing list to those ongoing in ’s Unix forum).
After creating thisontology, the next step towards connecting all of these discussion primitives(through links such as similar topics of interest, social networks, related forums,etc.) is encouraging people to install semantic data exporters developed for a vari-ety of open-source and commercial discussion systems. We will now describethese efforts.11.2 Semantically-Interlinked Online Communities (SIOC)SIOC (the Irish word for frost, pronounced ‘shock’) stands for ‘Semantically-Interlinked Online Communities’. The SIOC initiative is aimed at interlinking re-lated online community content from platforms such as blogs, message boards,and other social websites, by providing a lightweight ontology to describe thestructure of and activities in online communities, as well as providing a completefood chain for such data. In combination with the FOAF vocabulary for describing
  • 204. people and their friends, and the SKOS model for organising knowledge, SIOC lets developers link discussion posts and content items to other related discussions and items, people (via their associated user accounts), and topics (using specific ‘tags’, hierarchical categories, or concepts represented with URIs as described in Chapter 8).
Fig. 11.1. The SIOC logo
As discussions begin to move beyond simple text-based conversations to include audio and video content, SIOC is evolving to describe not only conventional discussion platforms but also new Web-based communication and content-sharing mechanisms. At the moment, a lot of the content being created on social websites (events, bookmarks, videos, etc.) is being commented on and annotated by others. If you consider such content items to be the starting point for a discussion about the content (similar to a text-based thread starter in a forum or blog), and if the content item is created in a container linked to a user or topic, then SIOC is quite suitable for describing metadata about these content items as well.
Since disconnected social websites require ontologies for interoperation, and due to the fact that there is a lot of social data with inherent semantics contained in these sites, there is potential for high impact through the successful deployment of a SIOC ontology. The development of SIOC was also an interesting process to explore how an ontology can be developed for and bootstrapped on the Semantic Web. Feedback from the research and development community to the ontology development process was increased through the development of a W3C Member Submission for SIOC.
Partners in this space were gathered - a combination of academic and industry partners - and after a year or so of putting this submission in place, the vocabulary evolved and was revised by community consensus involving discussions on the open-access SIOC Developers mailing list, a SIOC IRC channel, and a project wiki, resulting in the full submission being published at the end of July 2007 [1]. The submission included the ontology definition and related documents, including one on applications using SIOC and another describing the ties between SIOC and other related lightweight vocabularies, such as FOAF, Dublin Core and SKOS.
[1] (URL last accessed 2009-06-12)
  • 205. Many online communities still use mailing lists and message boards as their main communication mechanisms, and the SIOC initiative has also created a number of data producers for such systems in order to lift these communities to the Semantic Web. So far, SIOC has been adopted in a framework of about 60 applications or modules [2] ranging from exporters for major Social Web platforms to applications in neuromedicine research, and has been deployed on hundreds of sites. Since the W3C Member Submission, SIOC has gained even more success and attention from interested parties. For example, SIOC was recently chosen by Yahoo! SearchMonkey as a suitable reference vocabulary to describe the activities of online communities, along with FOAF to describe the social networking stack [3].
An interesting aspect of SIOC is that it goes beyond pure Social Web systems and can be used in other use cases involving the need to model social interaction within communities, either in corporate environments (where there is a parallel lack of integration between social software and other systems in enterprise intranets) [4], or for argumentative discussions and scientific discourse representation (as illustrated by recent efforts [5] to align SIOC and Semantic Web Applications in Neuromedicine [6]). Some are even considering how SIOC and FOAF can be applied to the application domain of real estate [7][8]. According to Eric Miller from Zepheira [9], FOAF and SIOC are useful not only for getting at historical content, but for providing fine-tuned content-delivery systems connecting people interested in various aspects of a particular domain.
SIOC aims to represent the information in online community sites and human communication. However, in contrast to other representation approaches, e.g. OWL Time (Hobbs and Pan 2004), SIOC is not aiming at an axiomatisation of a domain.
The SIOC ontology was deliberately designed to capture existing information that is mostly present in current Social Web systems and to provide a low entry barrier for users and developers. The design of SIOC allows one to export information from many sources by relatively simple means, and thus to generate a critical mass of information, leading quickly to the Web community adopting SIOC, providing additional exporters and writing applications. SIOC is aimed at a minimal consensus of a given domain, rather than a complete specification.
[2] (URL last accessed 2009-06-10)
[3] (accessed 2009-07-17)
[4] (URL last accessed 2009-06-09)
[5] (URL last accessed 2009-06-09)
[6] (URL last accessed 2009-06-09)
[7] (URL last accessed 2009-06-09)
[8] (URL last accessed 2009-06-09)
[9] (URL last accessed 2009-06-09)
  • 206. 11.2.1 The SIOC ontology
The ontology consists of the SIOC Core ontology [10], an RDF-based schema consisting of 11 classes and 53 properties, and four ontology modules: SIOC Access, SIOC Argumentation, SIOC Services and SIOC Types. The SIOC Core ontology defines the main concepts and properties required to describe information from online communities on the Semantic Web. The main terms in the SIOC Core ontology are shown in Figure 11.2. The basic concepts in SIOC have been chosen to be as generic as possible, thereby allowing us to describe many different kinds of user-generated content.
Fig. 11.2. The main classes and properties in SIOC
The SIOC ontology definitions are written using the RDF/OWL language that makes it easy for software to process some basic facts about the terms in the ontology, and consequently about the things described in SIOC documents. A SIOC document, unlike a traditional Web page, can be combined with other Semantic Web documents to create a unified database of information.
The SIOC Core ontology was originally created with the terms used to describe web-based discussion areas such as blogs and message boards: namely Site, Forum and Post. sioc:Users create Posts organised in Forums which are hosted on Sites. sioc:Posts have reply Posts, and Forums can be parents of other Forums.
[10] (URL last accessed 2009-06-09)
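The Site / Forum / Post structure just described can be illustrated with a small sketch. This is not taken from the SIOC specification or from any exporter: it simply stores statements as plain (subject, property, object) tuples in Python and walks the reply links. All identifiers (`site:example`, `post:1`, etc.) are hypothetical, and `sioc:host_of` is assumed here as the property expressing 'hosted on'.

```python
from collections import defaultdict

# Hypothetical identifiers; real SIOC data would use full URIs.
TRIPLES = [
    ("site:example", "rdf:type", "sioc:Site"),
    ("site:example", "sioc:host_of", "forum:linux"),   # assumed property name
    ("forum:linux", "rdf:type", "sioc:Forum"),
    ("forum:linux", "sioc:container_of", "post:1"),
    ("post:1", "sioc:has_creator", "user:alice"),
    ("post:1", "sioc:has_reply", "post:2"),
    ("post:1", "sioc:has_reply", "post:3"),
    ("post:2", "sioc:has_reply", "post:4"),
]


def index_by_property(triples, prop):
    """Map each subject to the list of objects it has for one property."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        if predicate == prop:
            index[subject].append(obj)
    return index


def thread(root, replies, depth=0):
    """Yield (depth, post) pairs for a post and its replies, depth-first."""
    yield depth, root
    for child in replies.get(root, []):
        yield from thread(child, replies, depth + 1)


replies = index_by_property(TRIPLES, "sioc:has_reply")
outline = list(thread("post:1", replies))

for depth, post in outline:
    print("  " * depth + post)
```

Because every statement has the same three-part shape, data exported from several sites could in principle be concatenated into one list and traversed with the same two functions.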
  • 207. In parallel with the evolution of new types of social websites, these concepts became subclasses of higher-level concepts - data spaces (sioc:Space, a place where data resides), containers (sioc:Container, used for grouping items together) and content items (sioc:Item) - which were added to SIOC as it evolved. These classes allow us to structure the information in online community sites and distinguish between different kinds of objects. Properties defined in SIOC allow us to describe relations between objects and attributes of these objects. For example:
    • The sioc:has_reply property links content to the reply posts made in response to it;
    • sioc:has_creator and foaf:maker link user-generated content to additional information about its authors; and
    • The sioc:topic property points to a resource describing the topic of content items, e.g. their categories and tags.
The high-level concepts sioc:Space, sioc:Container and sioc:Item are at the top of the SIOC class hierarchy, and most of the other SIOC classes are subclasses of these. A data space (sioc:Space) is a place where data resides, such as a website, personal desktop, shared file space, etc. It can be the location for a set of Container(s) of content Item(s). Subclasses of Container can be used to further specify typed groupings of Item(s) in online communities. The class sioc:Item is a high-level concept for content items and is used for describing user-created content.
Fig. 11.3. Integrating content from heterogeneous social websites via semantic representations
Usually these high-level concepts are used as abstract classes which other SIOC classes can be derived from. They are needed to ensure that SIOC can evolve and be applied to specific domain areas where definitions of the original SIOC classes such as sioc:Post or sioc:Forum can be too narrow. For example, an address book, which describes a collection of social and professional contacts, is a
  • 208. type of sioc:Container but it is not the same as a discussion forum. We shall shortly describe some subclasses of these high-level SIOC concepts from the ‘SIOC Types’ ontology module.
Referring back to our discussions on the ‘boardscape’ earlier, Figure 11.3 shows a scenario of how semantic content from a set of message boards could be combined with data from other social websites, all described using a combination of SIOC and FOAF. Using metadata about users’ social networks on message board sites and the posts created by members of those social networks, this can be combined with other content from those users’ blogs, image galleries and extended social networks outside the boardscape.
11.2.2 SIOC metadata format
SIOC metadata is normally produced in the RDF format, but it can be expressed in any of the available RDF syntaxes including RDF/XML, Turtle or RDFa. A sample fragment of SIOC RDF/XML metadata is shown below, representing a blog post, its metadata and associated follow-up comments. Using properties such as the sioc:topic or the sioc:creator_of an item / post, content in a common form can be collected across many heterogeneous discussion platforms based on what is being talked about and who is saying it.
    <sioc:Post rdf:about="">
      <dc:title>Creating connections between discussion clouds with SIOC</dc:title>
      <dcterms:created>2006-09-07T09:33:30Z</dcterms:created>
      <sioc:has_container rdf:resource=" index.php?sioc_type=site#weblog" />
      <sioc:has_creator>
        <sioc:User rdf:about=" cloud/" rdfs:label="Cloud">
          <rdfs:seeAlso rdf:resource=" index.php?sioc_type=user&amp;sioc_id=1" />
        </sioc:User>
      </sioc:has_creator>
      <sioc:content>SIOC provides a unified vocabulary for content and interaction description: a semantic layer that can co-exist with existing discussion platforms.</sioc:content>
      <sioc:topic rdfs:label="Semantic Web" rdf:resource="http://" />
      <sioc:topic rdfs:label="Blogs" rdf:resource="" />
      <sioc:has_reply>
        <sioc:Post rdf:about=" creating-connections-between-discussion-clouds-with-sioc/ #comment-123928">
          <rdfs:seeAlso rdf:resource=" index.php?sioc_type=comment&amp;sioc_id=123928" />
        </sioc:Post>
      </sioc:has_reply>
    </sioc:Post>
  • 209. Another sample instance of SIOC metadata from a forum (message board) in Drupal is shown below in Turtle. This forum has a title, a taxonomy topic in Drupal, a description, and is the container for one or more posts. More information on the posts can be obtained from the referenced URI (e.g. if it has replies, related posts, who wrote it, etc.).
    <> a sioc:Forum ;
      dc:title "Developers Forum" ;
      dc:description "Developers Forum at" ;
      sioc:link <> ;
      sioc:topic <> ;
      sioc:container_of <> .
    <> a sioc:Post ;
      rdfs:label "Microformats and SIOC" ;
      rdfs:seeAlso <> .
Fig. 11.4. Some ways of describing threaded discussions in SIOC
Threaded discussions can be described by two different methods in SIOC. In the first and most-used method, a Forum is the container_of a Post, which then has a number of reply posts as shown by the has_reply links in Figure 11.4. Alternatively, one may wish to use an intermediary Thread container to contain various Posts as
  • 210. shown by the container_of links, and either deduce the threaded structure from the dates of the Posts or also include information about the reply structure. In the second case, these Threads would be contained within parent Forums.
11.2.3 SIOC modules
A separate SIOC Types module defines more specific subclasses of the SIOC Core concepts which can be used to describe the structure and various types of content of social websites. This module defines subtypes of SIOC objects needed for more precise representation of the various elements in online community sites (e.g. sioc_t:MessageBoard is a subclass of sioc:Forum), and introduces new subclasses for describing different kinds of Social Web objects in SIOC. The module also points to existing ontologies suitable for describing full details of these objects (e.g. a sioc_t:ReviewArea may contain Review(s), described in detail using the Review Vocabulary). Examples of SIOC Core ontology classes and the corresponding SIOC Types module subclasses include: sioc:Container (AddressBook, AnnotationSet, AudioChannel, BookmarkFolder, Briefcase, EventCalendar, etc.); sioc:Forum (ChatChannel, MailingList, MessageBoard, Weblog); and sioc:Post (BlogPost, BoardPost, Comment, InstantMessage, MailMessage, WikiArticle). Some additional terms (Answer, BestAnswer, Question) were also added for Q&A-type sites like Yahoo! Answers, whereby content from such sites can also be lifted onto the Semantic Web.
Community sites typically publish web service interfaces for programmatic search and content management services (typically SOAP and/or REST). These services may be generic in nature (with standardised signatures covering input and output message formats) or service specific (where service signatures are unique to specific functions performed, as can be seen in current Web 2.0 API usage patterns).
The SIOC Services ontology module allows one to indicate that a web service is associated with (located on) a sioc:Site or a part of it. This module provides a simple way to tell others about a web service, and should not be confused with web service definitions that define the details of a web service. A sioc_s:service_definition property is used to relate a sioc_s:Service to its full web service definition (e.g. in WSDL, the Web Services Description Language).
Finally, a SIOC Access module was created to define basic information about users’ permissions, access rights and the status of content items in online communities. User access rights are modelled using Roles assigned to a user and Permissions on content items associated with these Roles. This module includes terms like sioc_a:Status that can be assigned to content items to indicate their publication status (e.g. public, draft, etc.), and sioc_a:Permission which describes a type of action that can be performed on an object (e.g. a sioc:Forum, sioc:Site) that is within the scope of a sioc:Role.
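As a rough illustration of the Role / Permission idea in the SIOC Access module, the sketch below models a role as a set of permitted actions within a scope, and checks whether any of a user's roles allows an action on a target. This is an assumption-laden toy in Python and not the module's actual data model: the class `Role`, the function `allowed`, the action names and the identifiers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    """A role grants a set of actions on one object (its scope)."""
    name: str
    scope: str                       # e.g. a forum or site identifier
    permissions: frozenset = frozenset()


def allowed(user_roles, action, target):
    """True if any role scoped to `target` includes `action`."""
    return any(
        role.scope == target and action in role.permissions
        for role in user_roles
    )


# Hypothetical roles, users and content statuses.
moderator = Role("moderator", "forum:linux", frozenset({"read", "post", "delete"}))
member = Role("member", "forum:linux", frozenset({"read", "post"}))

roles_of = {"user:mary": [moderator], "user:mike": [member]}
status_of = {"post:1": "public", "post:2": "draft"}   # cf. sioc_a:Status

print(allowed(roles_of["user:mary"], "delete", "forum:linux"))
print(allowed(roles_of["user:mike"], "delete", "forum:linux"))
```

Keeping the scope on the role, instead of on the user, mirrors the description above: the same person can hold different roles in different forums without any change to their account.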
  • 211. 11.3 Expert finding in online communities
Even with the advent of the Social Web, a common problem still occurs where two or more people are struggling to solve a particular problem in a specialised area when their combined skills could easily solve it. The process of finding the right expert for a given problem (e.g. matching people’s skills, interests, tasks or responsibilities) has long been of interest to computer scientists, and with renewed interest it is now being tackled in domains ranging from language modelling (Balog 2006) to the Semantic Web (through the ExpertFinder initiative [11]). By describing exactly who says what in a semantic form, common problems with finding relevant expertise in organisations can be addressed as referenced in this quote from (Berners-Lee 1999), where he describes what was actually one of the main motivations for developing the Web:
    A larger company fails to be intuitive when the person with the answer isn’t talking with the person who has the question.
Fig. 11.5. Typical expert finding scenario
For example, in the scenario in Figure 11.5, technician Elaine has been assigned as an advisor contact for new employee Mike, who will be installing broadband using Linux-based routers. Elaine herself would like Mike to have a more suitable contact for two reasons: (1) she is already heavily loaded, and (2) she thinks there may be others with more relevant expertise in the work areas that Mike has been assigned.
[11] (URL last accessed 2009-06-09)
  • 212. Through a semantic search, she sees that her contact Mary has written comprehensive answers in many discussions about Linux networking, and she realises that Mary’s expertise would be a good match for Mike’s skills requirements. She can then communicate with Mary and arrange a reassignment of Mike to Mary as advisor.
Using metadata described using the FOAF and SIOC vocabularies, the knowledge that is created in online communities can be made explicit and accessible to all members, and this can be used to find particular expertise through one’s social connections and memberships of community sites. By using the profiles of users from FOAF-enabled social networks and SIOC-exporting online communities, various connections become possible, including connecting people with people by tailoring available information and linking users based on their profiles and social networks (and in turn enhancing the connections between and within communities of interest). The properties of FOAF make it particularly suited for matching people and their skills in social networks, and SIOC helps with identifying relevant topics and the individuals who have discussed them.
So far, a lot of the work in this area has focussed on producing the appropriate data for expert finding, but the augmentation of this data with rules to aid with reasoning is the next step. For example, this is discussed in (Aleman-Meza 2007) by members of the ExpertFinder project that aims to improve the publication of metadata on web pages in order to help with the automated identification of experts on particular topics. Through combining information from one’s explicitly defined social network and from implicit connections that may be derived through common activities (e.g.
commenting on each other’s content, participating in the same community areas), the suggestion of experts can be enhanced.
When finding experts, it is also important to calculate a trust level between you and that person by analysing the explicit connections available through social networks expressed in FOAF and the implicit connections described through common activities in SIOC. As Tim Berners-Lee said in an interview in 2008 [12]:
    Friend-of-a-Friend (FOAF) represents relationships between people, as well as basic contact details. SIOC does this for groups: it extends the FOAF idea to being able to talk about whole groups of people. Groups are a very important part of the Web because online communities are where we form our trust. It is very useful to build tools that apply to all the online communities so they can all generate this social trust information. I am excited about SIOC because you can use that information to determine trust, to let people in. If someone is in a group that a friend of mine is part of, I can create another relationship based on that.
[12] (URL last accessed 2009-06-09)
  • 213. 11.3.1 FOAF for expert finding
In terms of definitions of expertise by an individual, the FOAF ontology has a number of properties of note. Firstly, the foaf:topic_interest property defines topics of interest to a person, and can be used directly to find those with an interest in a particular domain when it is represented by its URI (e.g. from DBpedia), while the foaf:interest property links to a foaf:Document representing this interest. Secondly, people can create foaf:publications or other foaf:Document(s) (via foaf:made / maker) which may have an associated foaf:topic or foaf:primaryTopic that can again be used to determine a person’s domains of interest. Thirdly, foaf:currentProject / pastProject gives information on ‘some collaborative or individual undertaking’ that a person may be involved in.
Fig. 11.6. Mock-up of a semantic expert finding tool for skills matching in a social network
There have been a number of extensions or modules for the FOAF ontology that are of interest to the expert finding scenarios previously mentioned. There is the ‘Resume-RDF’ schema for extending FOAF profiles with curriculum vitae-type information (Bojārs and Breslin 2007). This schema includes terms for work and academic experience, skills, courses and certifications, publications, references, etc.
  • 214. Another extension to FOAF, DOAC [13] (Description of a Career) can also be used to describe the professional capabilities of a worker. These definitions could easily be implemented in a social network to display available matches for a desired skill set (Figure 11.6).
FOAFRealm (Kruk and Decker 2005) is a user profile management system based on FOAF that provides authentication, access control and social networking features such as ‘semantic social collaborative filtering’. The system allows users to share and annotate their personal taxonomies across a social network using WordNet, Dewey Decimal Classification, and the Open Directory Project as base classifications. When implemented in document exchange systems such as JeromeDL, a semantic digital library, users can classify their documents or bookmarks and allow others to access these resources using FOAFRealm’s ACL-based (Access Control List) social networking functionality (Breslin et al. 2007b). Each user’s collection is assigned an expertise value that reflects the quality of the information that they provide; this value is calculated based on a PageRank calculation of their social network. Users are then also aware of the expertise level of others on given topics.
11.3.2 SIOC for expert finding
With respect to finding experts in an online community, the main SIOC properties of interest are sioc:topic and dc:subject. sioc:topic defines a category resource that a particular discussion post is related to; by aggregating all the sioc:topic(s) that are associated with a particular user’s posts across a number of sites, a picture emerges as to where their topics of interest and related expertise lie. sioc:Forum(s) or sioc:Site(s) may also have associated sioc:topic(s), and again a user with an interest in a particular topic may be a sioc:subscriber_of a certain discussion channel.
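The aggregation of sioc:topic values per user can be sketched in a few lines. The example below is only an illustration under stated assumptions: statements are held as plain tuples in Python, all post, user and topic identifiers are hypothetical, and the same account identifier is reused across two sites for brevity (in practice, accounts on different sites would first have to be linked to one person, e.g. via FOAF).

```python
from collections import Counter, defaultdict

# Hypothetical data from two sites, "siteA" and "siteB".
TRIPLES = [
    ("siteA:post1", "sioc:has_creator", "user:caroline"),
    ("siteA:post1", "sioc:topic", "topic:rain_clouds"),
    ("siteB:post7", "sioc:has_creator", "user:caroline"),
    ("siteB:post7", "sioc:topic", "topic:clouds"),
    ("siteB:post8", "sioc:has_creator", "user:caroline"),
    ("siteB:post8", "sioc:topic", "topic:clouds"),
    ("siteA:post2", "sioc:has_creator", "user:eric"),
    ("siteA:post2", "sioc:topic", "topic:rain_clouds"),
]


def topic_profiles(triples):
    """Count, for each creator, how many of their posts carry each topic."""
    creator = {s: o for s, p, o in triples if p == "sioc:has_creator"}
    profiles = defaultdict(Counter)
    for s, p, o in triples:
        if p == "sioc:topic" and s in creator:
            profiles[creator[s]][o] += 1
    return profiles


def candidates(profiles, topic):
    """Rank users by number of posts on `topic` (ties broken by name)."""
    ranked = [(user, counts[topic]) for user, counts in profiles.items()
              if counts[topic] > 0]
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


profiles = topic_profiles(TRIPLES)
print(candidates(profiles, "topic:clouds"))
print(candidates(profiles, "topic:rain_clouds"))
```

A raw post count is a deliberately naive measure of expertise; it is used here only to show how a per-user topic picture emerges from the metadata.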
SIOC concepts are loosely aligned with FOAF and SKOS so as to avoid any unnecessary duplication or term conflicts. The concept of sioc:User has been defined to be a sub-type of foaf:OnlineAccount, so that existing properties from FOAF can be reused and so that new properties for users can be defined in SIOC without directly impacting on the FOAF ontology. Using SKOS to define topics under discussion and of interest leads to many possibilities when the relationships between the various taxonomy terms are formalised using the SKOS vocabulary.
By combining the properties of the FOAF, SIOC and SKOS ontologies, we can construct some interesting scenarios that can be used when metadata is provided from semantically-enabled social networks and community sites. We shall use a particular hypothetical scenario illustrated in Figure 11.7, where Alice (top right corner) is looking for expertise in the area of rain clouds. Direct ‘knows’ links between people represent bidirectional social relationships (e.g. Bob knows Caroline and vice versa).
[13] (URL last accessed 2009-06-09)
  • 215. Using the proposed framework, Alice has expressed an interest (using the foaf:topic_interest property) in rain clouds, and wishes to find others with expertise in this domain. At the moment, she is only connected to Bob, who she knows is also interested in clouds (through his image gallery and comments that he has attached to his photos), but she does not know anyone directly who is specifically interested in rain clouds. She performs a search through her extended social network to find out if anyone else has defined a topic of interest in rain clouds, but does not receive any matches.
Fig. 11.7. Expert finding using the concepts in FOAF, SIOC and SKOS
She then looks to see if anyone in her extended social network has created content relating to rain clouds (thereby indirectly expressing an interest in this topic), and via SIOC instance data, Alice finds a match in an individual named Caroline who has created a message board post that is on the topic of rain clouds and blogs about clouds from time to time. She is also shown a breadcrumb trail or shortest path between herself and Caroline, which happens to be through their common acquaintance Bob, and Alice then asks Bob if he can introduce her to Caroline.
While she is waiting for this introduction, Alice decides to further explore the message board where rain clouds were being discussed, and finds another individual called Eric who has made some posts about this topic. By browsing other posts by Eric, she finds a Usenet newsgroup that Eric frequents where this topic is being discussed in some more detail, and subscribes herself to it.
As shown in Figure 11.7, content that a person creates on a particular discussion site (e.g. a weblog, mailing list, message board, etc.) can be linked using
  • 216. sioc:topic to a skos:Concept (e.g. in Figure 11.7, one post is talking about clouds and another post is referring to a narrower concept, that of rain clouds).
11.4 Connections between community description formats
RSS and Atom are commonly-used generic syndication formats whose power lies in their simplicity: any website can be described as a channel, and the content of pages can be described as items. At present, the syndicated content of a social website can be retrieved using RSS syndication modules, but these formats can be somewhat limited in that they typically only display the most recent 10 or 15 posts created on the entire website or on a particular channel (e.g. gallery, user, topic), with no information on reply structures, so much semantic information is lost. Also, the full nature of a social website’s structure and content cannot be adequately expressed using RSS, and this is one of the motivations for the use of SIOC: it can be used to describe all of the past content contained within a social website and its relationships to other sites.
SIOC is ideally suited for such representation as it is more semantically constrained (being designed specifically to describe Social Web content as opposed to any content) and also all historical content can now be made available. One of the key ideas in the Semantic Web is that data from different sources and vocabularies can be combined together in ways that were previously not possible. A SIOC document can also be used in conjunction with terms from semantically-enabled syndication formats such as RSS 1.0 and AtomOWL [14], as well as other Semantic Web vocabularies such as DOAP, FOAF, Dublin Core, etc.
One of the best practices of the Semantic Web is the reuse of existing ontologies and vocabularies, leading to improved data interoperability.
The SIOC ontology follows this practice by reusing the FOAF vocabulary to describe person-centric data and the Dublin Core vocabulary to describe properties of SIOC content items. Figure 11.8 gives an example of some of the relations between SIOC and other vocabularies.
A person (described by foaf:Person) will usually have a number of online accounts (sioc:User) on different online community sites. They use these accounts to create content (sioc:Item or sioc:Post). The class sioc:User is a subclass of foaf:OnlineAccount and the foaf:holdsAccount property links a person to his or her online accounts. SIOC content items (shown as Post objects on Figure 11.8) are described using properties from SIOC, FOAF and Dublin Core. By using FOAF to point to multiple social website accounts registered to a user and using SIOC to express user-generated content on these sites, we can aggregate the content created by a person all across the Social Web.
[14] (URL last accessed 2009-06-09)
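The aggregation described in the paragraph on foaf:holdsAccount amounts to a two-step lookup: first collect the accounts a person holds, then collect the content created by those accounts. A minimal Python sketch follows, assuming statements are stored as plain tuples; the person, account and post identifiers are all hypothetical.

```python
# Hypothetical data: one person with accounts on a blog and a message board.
TRIPLES = [
    ("person:bob", "foaf:holdsAccount", "blog:bob"),
    ("person:bob", "foaf:holdsAccount", "board:bobby42"),
    ("blog:post1", "sioc:has_creator", "blog:bob"),
    ("board:post9", "sioc:has_creator", "board:bobby42"),
    ("board:post10", "sioc:has_creator", "board:someone_else"),
]


def content_by_person(triples, person):
    """Return, sorted, every item created by any account the person holds."""
    accounts = {o for s, p, o in triples
                if s == person and p == "foaf:holdsAccount"}
    return sorted(s for s, p, o in triples
                  if p == "sioc:has_creator" and o in accounts)


print(content_by_person(TRIPLES, "person:bob"))
```

The function never inspects which site an account belongs to, which is the point of the approach: once the accounts are linked to one foaf:Person, content from any SIOC-exporting site can be gathered in the same way.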
  • 217. 212 The Social Semantic WebFig. 11.8. Relations between the SIOC, FOAF and SKOS ontologies Topics are usually present on the Social Web as categories and tags assigned tocontent items, and are represented in SIOC using a sioc:topic property. SIOC doesnot enforce the values of sioc:topic to be in a particular ontology (apart from thevalue being a URI) and leaves it up to information system architects to choose themost appropriate ontology to represent topics in each case. One of the approaches,illustrated in Figure 11.8, is to use the SKOS schema to represent topic hierarchiesand the relationships between them.11.5 Distributed conversations and channelsSIOC provides a unified vocabulary for content and interaction descriptions: asemantic layer that can co-exist with existing discussion platforms. Using SIOC,various linkages are created between the aforementioned concepts, which allownew methods of accessing this linked data (Figure 11.9), including: Virtual Forums. These may be a gathering of posts or threads which are dis- tributed across discussion platforms, for example, where a user has found posts from a number of blogs that can be associated with a particular category of in- terest, or an agent identifies relevant posts across a certain timeframe. If a per-
  • 218. 11 Interlinking online communities 213 son wanted to create a virtual forum for all of their own posts, instead of going around to each of their favourite sites and linking to their posts from a page, another option would be to do it in reverse - when their post was created, they would link back to their own post aggregation resource. From the user side, this could be achieved by creating an RDFa link for sioc:has_container (or by using the equivalent rel-directory microformat) to allow linking to such virtual fo- rums from the post creator side, i.e. from the post content. For example, this could be done through the use of a BBCode15 (a lightweight markup language used to format posts in many message boards) that would insert such a typed link. Also, virtual forums could be created that list all the posts by people in a restricted social network, or constructed for posts and threads that refer to a certain resource or topic.Fig. 11.9. SIOC-enabled semantic connections between existing discussion systems Distributed Conversations. Trackbacks are commonly used to link blog posts to previous posts on a related topic. By creating links in both directions, not only across blogs but across all types of internet discussions, conversations can be followed regardless of what point or URI fragment a browser enters at. Since the data to create these connections may be absent in current discussion15 (URL last accessed 2009-07-20)
  • 219. 214 The Social Semantic Web platforms, these may also be created through the use of supplemental RDFa tags (sioc:has_reply, reply_of) or microformats (rel-reply, rev-reply) in the HTML of post content. Unified Communities. Apart from creating a web page with a number of rele- vant links to the blogs or forums or people involved in a particular community, there is no standard way to define what makes up an online community, unless one groups the people who are members of that community using FOAF or OPML16 (Outline Processor Markup Language). We allow one to simply define what objects are constituent parts of a community, or to say to what community an object belongs (using Dublin Core’s hasPart and isPartOf relationships): us- ers, groups, forums, blogs, etc.Fig. 11.10. SIOC acting as a middle layer between existing discussion methods and new possi-bilities Shared Topics. Technorati (a search engine for blogs) and BoardTracker (for bulletin boards) have been leveraging the free-form tags that people associate with their posts for some time now. We allow the definition of such tags (using the dc:subject property), but also enable hierarchical or non-hierarchical topic definition of posts using sioc:topic when a topic is ambiguous or more informa- tion on a topic is required. Combining with other Semantic Web vocabularies,16 (URL last accessed 2009-06-09)
  • 220. tags and topics can be further described using the SKOS organisation system. Whether people define their interests explicitly (using something like a foaf:interest / foaf:topic_interest field) or implicitly (by replying to thread topics or clicking on tags of interest), relevant content can be easily returned from heterogeneous social websites with appropriate dc:subject or sioc:topic metadata.
    • One Person, Many User Accounts. SIOC also aims to help with the issue of multiple identities by allowing users to define that they hold other accounts or that their accounts belong to a particular personal identity (via foaf:holdsAccount or sioc:account_of). Therefore, all the posts or comments made by a particular person using their associated user accounts across various platforms could be identified.
    These are some ideas for what can be achieved by having some common representation formats for online community content. SIOC can be thought of as a semantic layer that sits on top of existing discussion systems (Figure 11.10), and one that enables some of the scenarios and possibilities described here, but of course there are other novel ideas that can be enabled by such a semantic layer.
    11.6 SIOC applications
    SIOC is currently being used by companies in a variety of application domains (Figure 11.11). For example, OpenLink’s Data Spaces [17] and Virtuoso AMI products provide access to SIOC instance data from a range of applications including blogs, wikis, aggregated feeds, shared bookmarks, discussions, photo galleries, briefcases (e.g. WebDAV file servers), etc. Engage is a community information application from Talis that combines SIOC in its schematics with SKOS for knowledge organisation and FOAF for person description. Talk Digger [18], a web service from Zitgist LLC that helps people to find, follow and enter conversations on the Web (in order to see who is linking to a specific web page), exports all of its data using SIOC. 
ImageMatters LLC have recently announced a new open-source social bookmarking and mash-up application called gnizr that exports saved bookmarks using SIOC in combination with a tag ontology. Finally, Yahoo! SearchMonkey has published a list of recommended vocabularies for developers of SearchMonkey applications, with SIOC being recommended for ‘blogs, discussion forums, Q&A sites’ [19].
    [17] (URL last accessed 2009-06-09)
    [18] (URL last accessed 2009-06-09)
    [19] (URL accessed 2009-06-01)
  • 221. Fig. 11.11. Adoption of SIOC in open-source and commercial applications
    There are also many other open-source applications of SIOC coming from the web developer community. OpenQabal, an open-source social networking and collaboration platform, supports SIOC, allowing Roller, JavaBB and other component applications to become part of the SIOC-o-sphere (the world of SIOC data). SIOC descriptions of forums are also being used for teaching and learning, for example, in the Fishtank project for Faculty Academy, which leverages the structure and searching power of RDF. IkeWiki, a wiki for knowledge engineering, allows discussions represented using the SIOC ontology (following a forum style with threaded views) to be attached to wiki pages, thereby allowing one to use semantic queries to investigate the structure of any discussion. SWAML, the Semantic Web Archive of Mailing Lists, uses SIOC as its base ontology (the SWAML domain hosts the third largest collection of SIOC data files according to statistics from PTSW). The project also includes Buxon, a sioc:Forum browser written in PyGTK. Finally, a third-party RDF exporter for the Twitter microblog service [20] has been created that uses SIOC for representing all microblog entries and FOAF for describing the people (this is the second largest collection of SIOC data according to PTSW).
    [20] (URL last accessed 2009-07-10)
    11.7 A food chain for SIOC data
    SIOC gives different online community sites a common format for expressing their data in a rich, interlinked form. This interconnection of Social Web content
  • 222. 11 Interlinking online communities 217using Semantic Web technologies can lead to many interesting possibilities on theindividual and community level. With the data represented in SIOC, variousbrowsers and applications taking advantage of this information can be built on topof SIOC. We shall now describe a food chain of such applications, starting withtools used to generate SIOC data from online community sites and concludingwith applications for browsing and reusing this information. Refining the picturein Figure 2.3, Figure 11.12 shows some application types in the SIOC food chain,illustrating where SIOC data is being produced, collected and consumed. This isnot exhaustive but it does cover the majority of SIOC application types availableat the moment. Rather than adding references for each of the applications listed in this section,we refer readers to the document ‘SIOC Ontology: Applications and Implementa-tion Status’21, especially as more applications are being added every few months.Fig. 11.12. The food chain of applications producing, collecting and consuming SIOC Producers of SIOC data include: various add-ons and modules for applicationsthat reuse and build on existing internal functions to export SIOC data in RDFformat (e.g. the SIOC exporters for WordPress or Drupal); sources of semi-structured data or queryable APIs that output data in formats that can be convertedto SIOC and augmented with other RDF data (e.g. SWAML for mailing list mail-boxes, the FlickRDF tool, the SIOC MediaWiki exporter22 and Sioku which lever-ages the Jaiku microblogging API); applications that natively store semantic data21 (URL last accessed 2009-07-09)22 (URL last accessed 2009-07-17)
  • 223. 218 The Social Semantic Weband use SIOC as one of their representation formats (e.g. Talis Engage); andschema mapping tools that directly access MySQL or other relational databasestores - bypassing default application access methods - to generate SIOC metadata(e.g. OpenLink Data Spaces). To aid with the production of SIOC data from newapplications, reusable data export APIs have also been created for PHP, Ruby onRails and Java. The SIOC data produced by these applications may either be discovered, col-lected, stored and/or indexed in an intermediary step, or may be consumed directlyby end-user applications (Bojārs et al. 2007a). Collectors can include: SemanticWeb spiders (‘scutters’) or crawlers specifically tailored towards gathering SIOCdata, which may store SIOC instances from multiple sources in a single data store(e.g. SWSE, Swoogle, Zitgist, and the SIOC Crawler); and indexers which willstore lists of where SIOC data instances are available from (e.g. Sindice and Consumers of SIOC data include: generic browsers for RDF data (such asDisco, the OpenLink RDF Browser, Tabulator, Timeline or Zitgist) or those thatare customised towards the display of SIOC data; extensions for web browsersthat detect the presence of SIOC and other RDF data, for pinging data indexingservices or for reusing / clipping of community contributions elsewhere (e.g. theSemantic Radar); applications that can import SIOC data, for example, for port-ability of data contributions between user accounts on different systems or for mi-grating a community from one site to another (e.g. the SIOC importer for Word-Press); and applications with visualisations oriented towards SIOC linked datagraphs (e.g. the ‘SIOCal network’ visualisation23 that displays SIOC commentstructures in terms of a network of their creators). 
We will now describe some applications falling under these three headings.11.7.1 SIOC producersSIOC export tools are a class of application which produce SIOC RDF data fromonline community sites (blogs, wikis, forums, etc.), often implemented as pluginsto the specific content management system used on a site. By enriching socialcommunity sites with SIOC RDF exporters, high-quality data can be automaticallycreated without the need for screen scraping and having to reconstruct the rela-tions from the information visible on web pages. An important property of these SIOC exporters is that information about everycontent item on a site is represented in RDF, making all the main information con-tained within a site available in RDF and ready for reuse. Most SIOC export tools also use RDF auto-discovery links (i.e. a link to anRDF document which is inserted inside the HTML HEAD element of web pages23 (accessed 2009-06-09)
  • 224. on the site) that enable automatic discovery of this content by tools such as the Semantic Radar plugin for Firefox [24].
    There are many potential uses for SIOC exporters. Firstly, SIOC data can be used in publish and subscribe mechanisms, without loss of important associated metadata about authors, replies and related links. Secondly, SIOC can provide a transport mechanism between various discussion methods such as blogs, mailing lists and boards (since these share many of the same concepts, but differ slightly in their sense of ownership). Thirdly, SIOC metadata can be detected on sites using auto-discovery mechanisms, either by metadata crawlers or using browser plugins (e.g. Semantic Radar). Fourthly, SIOC can be used to get the most active user or other current states of activity for a set of discussion sites.
    SIOC APIs
    SIOC APIs were developed as a part of the SIOC initiative in order to lower the barrier to entry and to help people who are not familiar with the Semantic Web to write SIOC exporters for their community sites.
    Since many SIOC applications are available as open source, it is usually possible for a developer to look at another SIOC data producer in order to find out how the API should be used.
    • SIOC Export API for PHP. The SIOC API for PHP [25] provides an easy way to manipulate SIOC data through PHP objects and methods, and renders the data into RDF/XML. The API creates and exports SIOC concepts about communities of authors (sioc:User and foaf:Person), the thread starters and replies they create (sioc:Post and sioc_t:Comment), and the structure of the website (sioc:Site and sioc:Forum).
    • SIOC API for Java. A SIOC API for Java [26] has been created, based on semweb4j. For each object in the SIOC ontology, this API generates classes with links between the objects realised as Java properties.
    • SIOC API for Perl. 
Version 1.0 of a SIOC API for Perl [27] has been released on CPAN, thanks to Jochen Lillich and Thomas Burg.
    • RDFa on Rails. RDFa on Rails [28] is a library of helper methods that help Ruby on Rails developers to produce RDFa data. SIOC terms are used to describe blog posts in this library.
    [24] (URL last accessed 2009-06-09)
    [25] (URL last accessed 2009-07-17)
    [26] (accessed 2009-06-09)
    [27] (URL last accessed 2009-06-09)
    [28] (URL last accessed 2009-06-09)
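To make the idea of an export API more concrete, the following is a minimal sketch in Python of what such a library does: it takes plain objects describing a forum and its posts and renders them as RDF/XML using SIOC terms. It is not the SIOC API for PHP, Java or Perl described above. The function name `export_forum`, the dictionary keys and the use of dcterms:title for post titles are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of the vocabularies used in this sketch.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SIOC = "http://rdfs.org/sioc/ns#"
DCTERMS = "http://purl.org/dc/terms/"

# Ask ElementTree to serialise with readable prefixes (rdf:, sioc:, dcterms:).
for prefix, uri in (("rdf", RDF), ("sioc", SIOC), ("dcterms", DCTERMS)):
    ET.register_namespace(prefix, uri)


def export_forum(forum_uri, posts):
    """Render a forum and its posts as an RDF/XML string.

    `posts` is a list of dicts with the hypothetical keys
    'uri', 'title', 'creator' and, optionally, 'reply_of'.
    """
    root = ET.Element(f"{{{RDF}}}RDF")
    forum = ET.SubElement(root, f"{{{SIOC}}}Forum", {f"{{{RDF}}}about": forum_uri})
    for post in posts:
        # The forum points at every post it contains ...
        ET.SubElement(forum, f"{{{SIOC}}}container_of", {f"{{{RDF}}}resource": post["uri"]})
        # ... and every post is described as a sioc:Post of its own.
        node = ET.SubElement(root, f"{{{SIOC}}}Post", {f"{{{RDF}}}about": post["uri"]})
        ET.SubElement(node, f"{{{DCTERMS}}}title").text = post["title"]
        ET.SubElement(node, f"{{{SIOC}}}has_creator", {f"{{{RDF}}}resource": post["creator"]})
        ET.SubElement(node, f"{{{SIOC}}}has_container", {f"{{{RDF}}}resource": forum_uri})
        if post.get("reply_of"):
            ET.SubElement(node, f"{{{SIOC}}}reply_of", {f"{{{RDF}}}resource": post["reply_of"]})
    return ET.tostring(root, encoding="unicode")
```

A site plugin built along these lines would call `export_forum` with data read from its own database and serve the resulting string at the URL referenced by the auto-discovery link. A production exporter would normally use a full RDF library rather than building XML by hand.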
  • 225. 220 The Social Semantic Web11.7.1.2 Weblog, forum and CMS exportersSIOC export plugins are available for common online community site engines, in-cluding Drupal (message boards and blogs), WordPress (blogs), MediaWiki(wikis), and the PHP-based bulletin board systems vBulletin and phpBB.Fig. 11.12. The main functionality of the WordPress SIOC Exporter WordPress SIOC Exporter. Interlinking blogs is currently limited to static links between posts and users, lacking the semantics needed for interpretation by computers. The SIOC ontology has particular applicability to blog sites, by enabling the semantic linking of blogs and posts through terms such as topic, creator, has_owner, subscriber_of, links_to and related_to. WordPress is a popular blogging platform based on PHP and MySQL. The WordPress SIOC
  • 226. 11 Interlinking online communities 221 Exporter29 allows the production of SIOC metadata from WordPress-based blogs, by simply installing two plugin files in the plugins folder and enabling the SIOC plugin from the WordPress control panel. This plugin is the most widely-used SIOC exporter, and has acted as a reference example for many other SIOC data export applications. It uses a combination of native WordPress function calls and the SIOC PHP API to generate RDF data representing the complete blog, and the main components are shown in Figure 11.12. An ex- tended version30 of this plugin was created that also exports any semantic metadata embedded within the content of a blog post. Dotclear and b2evolution SIOC Exporter. Dotclear is a widely-used French blogging platform. The Dotclear SIOC Exporter31 produces SIOC metadata us- ing the SIOC export API for PHP, and exports information about the blog it- self, the blog users, posts and comments. b2evolution is a multi-blog platform that evolved from the same roots as WordPress (from b2/cafelog). An early version of a b2evolution SIOC Exporter32 has been built that also uses the SIOC export API for PHP. Drupal SIOC Exporter. There is also a Drupal SIOC Exporter33, which can be used to export SIOC data from Drupal sites, including blogs and forums. As Drupal can be used as a multi-user blogging platform, the plugin will export all blogs and all user accounts, so that each post can be clearly identified by its us- ers. For example, the SIOC module for Drupal exports site metadata (as a sioc:Site), blogs and forums (as sioc:Forums), nodes and comments from blogs and forums (as sioc:Posts), users (as sioc:Users), and also roles (as sioc:Roles). The Views Datasource module34 provides related functionality, rendering node content in a number of shareable, reusable formats including XML, JSON, XHTML and RDF (FOAF, SIOC, and DOAP). Drupal RDFa. 
Drupal creator Dries Buytaert wrote a very interesting post35 in October 2008 entitled ‘Drupal, the semantic web and search’ in which he said: On a social networking site built with Drupal, [semantic technology] opens up the possibility to do all sorts of deep social searches - searching by types and levels of relationships while simultaneously filtering by other criteria. I was talking with David Peterson the other day about this, and if Drupal core supported FOAF and SIOC out of the box, you could search within your network of friends or colleagues. This would be a fundamentally new way to take advantage of your network or significantly increase the relevance of certain searches.29 (URL last accessed 2009-06-09)30 (last accessed 2009-07-20)31 (URL last accessed 2009-06-09)32 (URL last accessed 2009-06-09)33 (URL last accessed 2009-04-02)34 (URL last accessed 2009-07-20)35 (URL last accessed 2009-06-09)
  • 227. 222 The Social Semantic Web The structured data that is available in many Drupal deployments (but is diffi- cult to leverage due to HTML representations) can be exposed and leveraged via RDFa using modules created for Drupal 636 and through ongoing work to add RDFa to the core of Drupal 7 (based on a roadmap by Stéphane Cor- losquet37). A video was created38 to demonstrate some deep searches of Drupal RDFa data using Yahoo! SearchMonkey, and it also showed some visual navi- gations of this linked data. The possibilities are exciting, as Dries says39: Google and Yahoo! are getting increasingly hungry for structured data. It is no surprise, because if they could [build] a global, vertical search engine that, say, searches all products online, or one that searches all job applications online, they could disintermediate many existing companies. [...] Hundreds of thousands of Drupal sites contain vast amounts of structured data, covering an enormous range of topics [and these structures] can be associated with rich, semantic meta-data that Drupal could output in its XHTML as RDFa. For example, say we have an HTML textfield that captures a number, and that we assign it an RDF property of ‘price’. Semantic search engines then recognize it as a ‘price’ field. Add fields for ’shipping cost’, ‘weight’, ‘color’ (and/or any number of others) and the possibilities become very exciting.• phpBB and vBulletin SIOC Exporters. phpBB is one of the most-used open- source message board platforms, and vBulletin is a commercial message board solution that powers some of the world’s largest discussion sites. vBulletin40 and phpBB SIOC Exporters41 have been written that produce SIOC metadata about forums, posts and the users that created them. Wikis. A SIOC data exporter for MediaWiki-based wikis42 was recently cre- ated that takes a MediaWiki page URL as input and provides RDF as the out- put. 
Also, the ‘DokuSIOC’ plugin43 for DokuWiki creates wiki page metadata with auto-discovery links and content negotiation, pings PingTheSemantic- on updates, and provides linked data. Other producers BlogEngine.NET and SemanticEngine.NET. A data portability pack has been announced for blogging platform BlogEngine.NET44 that produces SIOC,36 (URL last accessed 2009-07-09)37 (URL last accessed 2009-07-09)38 (URL last accessed 2009-07-09)39 (URL last accessed 2009-07-09)40 (URL last accessed 2009-06-09)41 (URL last accessed 2009-06-09)42 (URL last accessed 2009-07-20)43 (URL last accessed 2009-07-20)44 (URL last accessed 2009-06-09)
  • 228. 11 Interlinking online communities 223 APML (Attention Profiling Markup Language)45 and FOAF. From the same team, the SemanticEngine.NET class library46 also supports various different formats ready to use in .NET including APML, FOAF, XFN, SIOC and micro- formats. Microblogging. An RDF exporter for Twitter47 microblogs is available, and a ‘Sioku’ Jaiku to RDF service was created for the Jaiku microblogging platform. SIOC and FOAF are used as the main vocabularies for representing streams of microblog entries and for describing people and their contacts respectively. StatusNet, the open-source microblogging platform that powers, also publishes both FOAF (describing people48) and SIOC data (as SIOC- augmented RSS feeds for users49 and groups50). BAETLE. BAETLE51 (Bug And Enhancement Tracking LanguagE) aims to create a software bug ontology that can be used by various repositories to en- able people to query for bugs across these repositories. SIOC is being used to define some of the required terms. TagCrumbs. TagCrumbs52, a social placemarking service, have announced53 that they are now making placemark data available in RDF, including SIOC.11.7.2 SIOC collectorsWith the current proliferation of social websites that we are now all members of,people are beginning to understand the motivations for being able to retrieve user-generated content items collected from all or from specific types of social websites(blogs, forums, mailing lists, photo albums) using mechanisms like SIOC andFOAF. After SIOC data has been exported from various social software systems,we can collect and perform queries on the aggregate data set from a collection ofcommunities (Figure 11.13). 
This could be a set of public discussion services, or a group of internal collaboration systems from an organisational intranet.
    [45] (URL last accessed 2009-06-09)
    [46] (accessed 2009-06-09)
    [47] (accessed 2009-06-09)
    [48] (URL last accessed 2009-07-20)
    [49] (URL last accessed 2009-07-20)
    [50] (URL last accessed 2009-07-20)
    [51] (URL last accessed 2009-06-09)
    [52] (URL last accessed 2009-07-20)
    [53] (URL last accessed 2009-07-20)
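As a rough illustration of querying an aggregate data set, the sketch below merges SIOC triples exported by two sites and finds everything written by one person across all of their accounts. Triples are modelled as plain Python tuples rather than with an RDF library, and all example.org URIs, as well as the function names `merge` and `posts_by_person`, are hypothetical.

```python
SIOC = "http://rdfs.org/sioc/ns#"
HAS_CREATOR = SIOC + "has_creator"
ACCOUNT_OF = SIOC + "account_of"


def merge(*graphs):
    """Union several sets of (subject, predicate, object) triples."""
    merged = set()
    for graph in graphs:
        merged.update(graph)
    return merged


def posts_by_person(graph, person):
    """Return the sorted URIs of items created via any account of `person`."""
    # Step 1: find every user account that declares sioc:account_of the person.
    accounts = {s for (s, p, o) in graph if p == ACCOUNT_OF and o == person}
    # Step 2: find every item whose sioc:has_creator is one of those accounts.
    return sorted(s for (s, p, o) in graph if p == HAS_CREATOR and o in accounts)


# Hypothetical data, as it might be exported by a blog and by a forum.
PERSON = "http://example.org/people/alice#me"
blog = {
    ("http://blog.example.org/user/alice", ACCOUNT_OF, PERSON),
    ("http://blog.example.org/post/1", HAS_CREATOR, "http://blog.example.org/user/alice"),
}
forum = {
    ("http://forum.example.org/member/42", ACCOUNT_OF, PERSON),
    ("http://forum.example.org/thread/7#2", HAS_CREATOR, "http://forum.example.org/member/42"),
    ("http://forum.example.org/thread/7#3", HAS_CREATOR, "http://forum.example.org/member/99"),
}

print(posts_by_person(merge(blog, forum), PERSON))
```

In practice the merged data would live in a triple store and the same question would be asked in SPARQL; the two-step lookup above only mirrors the shape of such a query.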
  • 229. Fig. 11.13. Collecting SIOC data from multiple sources for aggregated search and retrieval
    11.7.2.1 The SIOC Crawler
    SIOC data can be collected by a crawler that traverses the Web and retrieves any SIOC data it finds. The crawler starts with a list of ‘seed’ SIOC URLs and follows rdfs:seeAlso links used to point to more SIOC and RDF data. This is a generic principle for crawling RDF documents, so a generic RDF crawler could be used. The SIOC Crawler [54], however, has additional knowledge about the structure of SIOC data, which has allowed it to be enhanced with advanced functionality, e.g. incremental retrieval of new SIOC data in threads.
    11.7.3 SIOC consumers
    As the Social Web begins to generate more SIOC data, this information can be reused by consumers of SIOC data, e.g. to provide better tools for finding related information across community sites or to transfer rich information about content items between online community sites. One example of a SIOC consumer is Buxon [55], a sioc:Forum browser originally released as a part of the SWAML package and now available as an independent package. Written in PyGTK, it reads sioc:Forum information from RDF files and shows it as a tree of message threads.
    [54] (URL last accessed 2009-06-09)
    [55] (URL last accessed 2009-06-09)
  • 230. 11 Interlinking online communities 225Another is the SIOC Reader Universal Widget56 for iGoogle or Netvibesdashboards which accesses SIOC data from any SPARQL endpoint that provides aJSON output for SELECT queries. We shall now describe some more consumersof SIOC data. The SIOC ExplorerOne of the collection tools for SIOC is called the SIOC Explorer57, a web proto-type which can aggregate posts from community web sites publishing SIOC dataand allows users to browse and explore all of this disparate information in an inte-grated manner (Heitmann and Oren 2007). The SIOC Explorer allows you to viewand navigate based on all exported RDF data, not just SIOC, by utilising a do-main-independent faceted-browsing approach. The SIOC Explorer acts in a simi-lar fashion to a feed reader: it allows people to subscribe to SIOC feeds and toread their content. Because SIOC data is semantically rich, people can filter thecontent based on e.g. authors, topics, creation date, or any other properties. The application aggregates SIOC content into a local RDF store and providesvarious ways to view the content and associated data. The start screen allows oneto choose from a list of available sioc:Forum(s), with data coming from SIOC-enabled systems such as online community forums, blogs, mailing lists, etc. By crawling SIOC comment posts, and getting data from forums, blogs, mail-ing lists, etc., tools like the SIOC Explorer can provide a nice overall view of aperson’s contributions to various ongoing or past discussions, or can provide veryspecific views of what a group of people have been saying during a certain time-frame (something that is not easily done with current feed readers or aggregators). When viewing posts from an individual forum or a group of forums, the user ispresented with the list of posts in a reverse chronological order. Each post issummarised and can be expanded in order to read the full content. 
Clicking on thecreator of a post shows all posts (including comments and replies) written by thisperson, across all forums; clicking on a topic shows all posts tagged with thistopic, again across all forums. In contrast to ordinary feed readers, such lateralbrowsing works across all different types of community forums that can be de-scribed in SIOC: clicking on the user ‘Elias Torres’ will not only show his blogposts, but also his e-mails and contributions to IRC. Finally, a generic faceted navigation interface is offered on the left-hand side,displaying relevant facets that are not already shown as a part of the defaultbrowsing interface. Facets are built dynamically at view time and will show theproperties and values derived from the actual data, also displaying propertieswhich may not be known at the system design time. Some facets (like a tag or theyear) contain only ‘simple’ values while complex facets, such as maker or topic,56 (accessed 2009-06-09)57 (URL last accessed 2009-06-09)
  • 231. 226 The Social Semantic Webcan be further expanded to see subsequent sub-facets (as shown formaker.homepage on the bottom left of Figure 11.14). This indirect faceted browsing is quite powerful, since one can filter by proper-ties of the things that are related to the posts through intermediary objects. We canthus create views like ‘show me all the posts by people who live in London andwho have an interest in the Semantic Web’ by leveraging properties such asfoaf:based_near or foaf:interest. Application developers can customise the facetednavigation to their needs and may choose to exclude or include certain facets.Fig. 11.14. Faceted exploration of SIOC RDF data in the SIOC Explorer The SIOC Explorer is built on the Ruby on Rails framework for web applica-tion development and uses several components for consuming and processing Se-mantic Web data: ActiveRDF58 for mapping RDF data onto programmatic objectsand the associated Semantic Web on Rails Development (SWORD) plugin;BrowseRDF59, a faceted browsing engine that enables navigation of large Seman-tic Web data sets without domain-specific navigation knowledge and provides ageneric view of all RDF data associated with SIOC concepts (e.g. the foaf:makerrelation between authors and posts); and the SIOC RDF crawler that crawls, ex-tracts, normalises, and integrates SIOC data from various community sites.58 (URL last accessed 2009-06-09)59 (URL last accessed 2009-06-09)
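The indirect faceted browsing described above can be sketched in a few lines of Python: a post is kept if following a chain of properties (for example post, then foaf:maker, then foaf:based_near) reaches the wanted value. This is only an illustration of the principle, not how BrowseRDF is implemented; triples are plain tuples, and the function names and example.org URIs are hypothetical.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
MAKER = FOAF + "maker"
BASED_NEAR = FOAF + "based_near"
INTEREST = FOAF + "interest"


def values(graph, subject, predicate):
    """All objects of triples matching (subject, predicate, ?)."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


def facet_filter(graph, posts, path, wanted):
    """Keep the posts for which following `path` leads to `wanted`.

    `path` is a list of predicates, so [MAKER, BASED_NEAR] filters posts
    by a property of their author rather than of the post itself.
    """
    kept = []
    for post in posts:
        frontier = {post}
        for predicate in path:
            frontier = {o for node in frontier for o in values(graph, node, predicate)}
        if wanted in frontier:
            kept.append(post)
    return kept


# Hypothetical data: two posts, two authors, both based near London.
LONDON = "http://example.org/places/London"
SEMWEB = "http://example.org/topics/SemanticWeb"
ALICE = "http://example.org/people/alice"
BOB = "http://example.org/people/bob"
graph = {
    ("http://example.org/post/1", MAKER, ALICE),
    ("http://example.org/post/2", MAKER, BOB),
    (ALICE, BASED_NEAR, LONDON),
    (ALICE, INTEREST, SEMWEB),
    (BOB, BASED_NEAR, LONDON),
    (BOB, INTEREST, "http://example.org/topics/Cooking"),
}
posts = ["http://example.org/post/1", "http://example.org/post/2"]

# 'Posts by people who live in London and have an interest in the Semantic Web'
in_london = facet_filter(graph, posts, [MAKER, BASED_NEAR], LONDON)
print(facet_filter(graph, in_london, [MAKER, INTEREST], SEMWEB))
```

Chaining two calls corresponds to selecting two facets one after the other in the user interface; each call narrows the current result set.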
  • 232. Fig. 11.15. Aggregate view of information about a person in the Social SIOC Explorer
    The Social SIOC Explorer (Bojārs et al. 2007b) is an extension of the SIOC Explorer which allows us to see and explore social relations on the Social Web manifested via user-generated content. The rich data structure, including links between an original post and its replies and links between a post and additional information about its author, is collected from producers of SIOC data. The Social SIOC Explorer can use this information to mine relations between people on the Web, for example, to find a set of people who have participated in a discussion with a certain user or who have commented upon the content created by that user (see Figure 11.15).
    In addition, this application contains a component that performs social network analysis using information about the content created by users, extracting social context information from the SIOC data and allowing visualisation of this information in the user interface.
    The SIOC Browser
    The SIOC Browser is simply a way to view RDF information in a more human-friendly form, and it allows people to browse and receive additional information from SIOC data sources or data stores. One of the motivations for creating this was to enable people to view semantic information easily, because it may have different aspects that can be of interest: it may be the same information you see on a normal web page, but it may also contain extra information that is not normally displayed on a web page but is rather hidden or locked into a database. That information may prove useful for some third-party applications (e.g. a modification
  • 233. date, incoming links), or perhaps some extra information can be calculated or inferred for a semantic page (related content on the same topic, tag usage frequencies, etc.).
    SIOC Browsers can work in two modes, on-the-fly mode and crawler mode, or can use a combination of both (Bojārs et al., 2006). The on-the-fly or live browser [60] is a simple and effective way to explore community information available in SIOC. It gives a user-friendly look at the internal structure of the data without requiring the viewers to dive into a more complex RDF/XML syntax. A triple-store interface, which can be plugged onto any triple store that offers a SPARQL endpoint, has also been written [61] for browsing crawled SIOC data, providing methods to visualise this data in both textual and graphical ways.
    WordPress SIOC Importer
    The pilot implementation of a SIOC import plugin for WordPress [62] is an example of an application for reusing SIOC data between social media sites. This plugin can be added to the WordPress blog engine and demonstrates how SIOC data may be reused by regular blog users (e.g. via an admin user interface). A user begins by supplying the URL of some SIOC data (Figure 11.16). The plugin then retrieves SIOC data from this URL, extracts all the content items and posts them to the blog. While the pilot implementation only processes one SIOC data file at a time, it can be easily extended to mirror all content from a blog or forum site.
    Fig. 11.16. Importing SIOC data into the WordPress blogging platform
    [60] (URL last accessed 2009-06-09)
    [61] (URL last accessed 2009-06-09)
    [62] (URL last accessed 2009-06-09)
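The import steps described for the WordPress SIOC Importer (retrieve SIOC data, extract the content items, post them to the blog) can be sketched as follows. This is not the plugin's code, which is written for WordPress. It is a simplified Python illustration that assumes the SIOC data is RDF/XML with each post written as a `sioc:Post` element carrying `dcterms:title` and `sioc:content` children. The names `extract_posts`, `import_posts` and the `publish` callback are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces in ElementTree's {uri}localname notation.
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
SIOC = "{http://rdfs.org/sioc/ns#}"
DCTERMS = "{http://purl.org/dc/terms/}"


def extract_posts(rdf_xml):
    """Return one dict per sioc:Post element found in an RDF/XML string."""
    root = ET.fromstring(rdf_xml)
    items = []
    for node in root.iter(SIOC + "Post"):
        items.append({
            "uri": node.get(RDF + "about"),
            "title": node.findtext(DCTERMS + "title", default=""),
            "content": node.findtext(SIOC + "content", default=""),
        })
    return items


def import_posts(rdf_xml, publish):
    """Hand every extracted item to `publish` and return how many were sent.

    `publish` stands in for whatever call creates a post on the target blog.
    """
    items = extract_posts(rdf_xml)
    for item in items:
        publish(item)
    return len(items)


# Hypothetical SIOC document, as a string instead of a fetched URL.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:sioc="http://rdfs.org/sioc/ns#"
    xmlns:dcterms="http://purl.org/dc/terms/">
  <sioc:Post rdf:about="http://example.org/post/1">
    <dcterms:title>First post</dcterms:title>
    <sioc:content>Hello, SIOC-o-sphere.</sioc:content>
  </sioc:Post>
  <sioc:Post rdf:about="http://example.org/post/2">
    <dcterms:title>Second post</dcterms:title>
  </sioc:Post>
</rdf:RDF>"""

imported = []
count = import_posts(SAMPLE, imported.append)
```

To work from a URL, the string could be obtained with `urllib.request.urlopen(url).read()` before calling `import_posts`. Because RDF/XML allows the same data to be written in several equivalent shapes, a robust importer would parse the document with an RDF library instead of matching element names.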
  • 234. 11 Interlinking online communities 229 Tools like this SIOC import plugin enable us not only to transfer content itemsbetween the same type of sites (e.g. blogs), but also between different types of so-cial media systems (e.g. between a mailing list, expressed in SIOC usingSWAML, and a blog). By combining data export and data import tools for SIOC,we enable various scenarios for data portability of social media contributions (de-scribed later), allowing people to save their contributions and move them betweendifferent online community sites. Sindice SIOC WidgetThe Sindice search engine can be used to find pointers to where related content is,and via their API these results can be used by third-party applications. For exam-ple, the Sindice SIOC Widget63 for WordPress is powered through a combinationof distributed SIOC documents and the Sindice index.Fig. 11.17. Surfing to posts by the same user in remote heterogeneous discussion systems When you are browsing a blog that has this widget installed, you will seespeech bubble icons appearing beside authors’ and commenters’ names. Clickingon any of these icons will show a pop-up with a list of content (posts, comments,topics) that that commenter has created not just on the blog site you are viewingbut across a range of SIOC-enabled websites (blogs, forums, mailing lists, what-ever) as indexed in Sindice, as shown in Figure 11.17.63 (URL last accessed 2009-06-09)
  • 235. 230 The Social Semantic Web Therefore, you can see a person’s activity and navigate to the content that theperson has created across a range of sites from just one place that they post to (i.e.all posts, comments and topics created by that user across the SIOC-o-sphere).You can also click on any arrow icon beside a link in a blog post to see where elseit has been referenced (Figure 11.18).Fig. 11.18. Finding references to a link on other blog systems There is also a Sindice SIOC API64 available which serves as a gateway toSIOC data via the Sindice discovery and search services, enabling the verificationof the presence of a user or a link on the SIOC-o-sphere as indexed within Sindice. Other consumers ZYB. As part of their data portability drive, the mobile phonebook service ZYB (recently acquired by Vodafone) announced on their blog65 that their June 2008 release of ZYB introduced FOAF and SIOC exports from user’s profile pages (with the full export only available to the users themselves at and respectively). SIOC is being used to describe actions performed by ZYB users such as shouting, writ- ing comments or updating status messages on Facebook. IKHarvester. IKHarvester66, a component for the Didaskon67 curriculum as- sembly framework, collects data from semantic social spaces (wikis, blogs,64 (URL last accessed 2009-06-09)65 (URL last accessed 2009-06-09)66 (URL last accessed 2009-06-09)67 (URL last accessed 2009-06-09)
  • 236. 11 Interlinking online communities 231 etc.) and provides it to Didaskon as informal learning objects (LOs). SIOC data exported from blogs and wikis is gathered and mapped to learning object meta- data (LOM) with the IKHarvester prototype. and JeromeDL. notitio.us68, a social bookmarking and knowledge harvesting prototype, provides SIOC metadata support through SSCF (social semantic collaborative filtering). The SSCF functionality can also display the associated SIOC data from bookmarked sites, forums and posts. This function- ality is also implemented in the JeromeDL69 semantic digital library system.11.8 RDFa for interlinking online communitiesSIOC expressed in RDFa can also be used to help create closer interlinks betweenthe objects that make up online communities, i.e. posts, forums or blogs, commu-nities and user profiles. In an article at, Mark Wahl described how toembed SIOC RDFa in XHTML70. Digital Bazaar CEO Manu Sporny has alsowritten a three-part guide71, 72, 73 about how more semantic information could beincorporated into the Digg community pages using SIOC RDFa. We have alsomentioned efforts to include RDFa (including SIOC) in the core of Drupal 774.Referencing the earlier section on distributed conversations and channels, we willnow describe how RDFa can also be used for ‘connecting discussion clouds’. Weassume that namespace prefix definitions likexmlns:sioc=“” are contained in the <head> element of thecorresponding pages. One Person, Many User Accounts. As mentioned previously, it is quite diffi- cult to identify all the posts made by a particular person through their various user personas or profiles on different sites, e.g. ‘find all the blog posts and comments and forum threads I’ve created in the past year’. In the SIOC and FOAF vocabularies, there are two properties linking people to user profiles: a Person is linked to a User using a ‘foaf:holdsAccount’ link, and a User is linked to a Person using an ‘sioc:account_of’ relationship. 
Using RDFa for sioc:account_of / foaf:holdsAccount can help with the ‘One Person, Many User Accounts’ issue, e.g. these links could be created from a user signature or other public profile field. For example, let us say that you have an account on
68 (URL last accessed 2009-05-06)
69 (URL last accessed 2009-06-09)
70 (accessed 2009-06-09)
71 (accessed 2009-06-09)
72 (accessed 2009-06-09)
73 (accessed 2009-06-09)
74 (URL last accessed 2009-07-13)
  • 237.
, and you want to say that you are the same person described by a FOAF profile on your own site, or that you hold another account on. Links can also be required in both directions to confirm that you really are who you claim to be; this is tedious, but it can at least be done. Forums could also offer BBCodes (a markup syntax for message boards) that translate to the appropriate HTML. In the following example, we define that the account (sioc:User) refers to the persona (foaf:Person), which is itself linked to the account (sioc:User). In this way, we can interlink different online identities through a unique FOAF representation, solving the issue presented by having various accounts on different services.

<div about="" typeof="sioc:User">I am <a rel="sioc:account_of" href="">John Breslin</a>,</div>
and
<div about="">I also have another account, <a rel="foaf:holdsAccount" href="">Cloud on microfor-</a>.</div>

Virtual Forums. As already mentioned, it would be useful if you could have a virtual forum of all your posts on blogs, forums, mailing lists, etc. Apart from having to go around to all of your favourite sites and linking to your posts from a page, another option would be to do it in reverse: when you create your post, link back to your own post aggregation resource. On the user side, this could be achieved with RDFa for sioc:has_container, which allows the post creator to link to a virtual forum, e.g. from the post content. Also, virtual forums could be created that list all the posts by people in a restricted social network, or constructed for posts or threads that refer to a certain resource or topic. It is also possible that a discussion post (or threaded set of posts) will need to be in more than one forum.
<div about="#post" typeof="sioc:Post">Another post that is related to my love of <a rel="sioc:has_container" href="">electronic music</a>, Tangerine Dream are releasing five previously unreleased albums next week...</div>
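A consumer of such markup could assemble a virtual forum by collecting every sioc:has_container link and grouping posts by the container they point to. The following Python sketch illustrates the idea under simplifying assumptions: the example.org URLs, the class name and the function name are hypothetical, and the parser is a deliberately minimal stand-in for a real RDFa processor (it only tracks the nearest enclosing about attribute and expects well-nested markup).

```python
from collections import defaultdict
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ContainerLinkParser(HTMLParser):
    """Collects (post, container) pairs from sioc:has_container links.

    Simplification: not a conforming RDFa processor. The subject of a link
    is taken from the nearest enclosing `about` attribute.
    """

    def __init__(self):
        super().__init__()
        self.open_subjects = []  # one entry per open element: its `about` or None
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Nearest subject: this element's own `about`, else the closest ancestor's.
        subject = attrs.get("about") or next(
            (s for s in reversed(self.open_subjects) if s), None
        )
        rels = (attrs.get("rel") or "").split()
        href = attrs.get("href")
        if "sioc:has_container" in rels and subject and href:
            self.pairs.append((subject, href))
        if tag not in VOID_TAGS:
            self.open_subjects.append(attrs.get("about"))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.open_subjects:
            self.open_subjects.pop()


def virtual_forums(pages):
    """Group post identifiers by the virtual forum (container) they link to."""
    forums = defaultdict(list)
    for html in pages:
        parser = ContainerLinkParser()  # fresh parser per page
        parser.feed(html)
        parser.close()
        for post, container in parser.pairs:
            forums[container].append(post)
    return dict(forums)


# Hypothetical pages from two different sites, both linking to one virtual forum.
PAGES = [
    '<div about="http://blog.example.org/post/1" typeof="sioc:Post">A post about '
    '<a rel="sioc:has_container" href="http://me.example.org/forums/music">'
    "electronic music</a>.</div>",
    '<div about="http://forum.example.org/thread/9#p3" typeof="sioc:Post"><p>More<br>on '
    '<a rel="sioc:has_container" href="http://me.example.org/forums/music">'
    "synths</a></p></div>",
]

if __name__ == "__main__":
    for forum, posts in virtual_forums(PAGES).items():
        print(forum, posts)
```

Because the link lives in the post itself, the aggregation resource does not need to know in advance which sites its owner posts on; it only needs to see the pages that point back to it.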
  • 238. 11 Interlinking online communities 233 Unified Communities. Similar to Eran Globen’s idea of ‘distributed social anything’75, in SIOC we reuse dcterms:hasPart and dcterms:isPartOf from Dub- lin Core to link any object to a particular community. The concept of Commu- nity in SIOC is quite generic, but the idea is that you should have a structured way to link different things (forums, people, etc.) to a community object. If there is a community talking about the TV series ‘CSI’, and they have a blog, a mailing list, an aggregation of CSI-related blogs, or just a news website, these could be linked to an identified central community resource through hasPart or isPartOf relations. Similarly, you could identify yourself or your user profile on a particular site as being part of that CSI community. From forums, an RDFa snippet for hasPart or isPartOf would allow one to link a forum or blog or any discussion channel to a larger community, e.g. this could be done from a forum description. Also, this could help to link a user to a community, e.g. by creating a typed link from a user signature. <div about=“” typeof=“sioc:Forum”>Welcome to the SIOC forums, where we talk about the <a rel=“dcterms:partOf”><div typeof=“sioc:Community” href=“”>Semantic Web</div></a> and internet discussions.</div> Distributed Conversations. The SIOC idea for distributed threads is closely related to the cite-rel draft76 by Ryan King and Eran Globen from the micro- formats community. Their rel-reply roughly corresponds to sioc:has_reply, and rev-reply corresponds to sioc:reply_of; rev-update and rel-update may corre- spond to sioc:previous_version and sioc:next_version respectively; and via may be compared with sioc:related_to. Using the rev or reply links would allow dis- tributed threads to form, e.g. if used from the post content. cite-rel brings the reply idea a step further by introducing rel-forward and rev-forward (basically, sioc:has_reply and sioc:reply_of with quoted content). 
Ultimately, we may need ways to say that a post is in agreement or disagreement with a previous post, or even with specific parts of a previous post (see Figure 11.20), by creating other reply types.

<div about="#post" typeof="sioc:Post" xmlns:sioc="">As a follow up to my previous post about <a rel="sioc:reply_of" href=" 09/07/connections-between-discussion-clouds-with-sioc/">connecting discussion clouds</a>, I have realised that I need to say more about people and topics, especially as discussed <a rel="sioc:related_to" href=" iswc-2005-over-won-an-ipod-aligning-sioc-with-foaf-and-skos/">on this page</a>. Edit: I’ve since posted about how <a rel="sioc:has_reply" href=" sioc-foaf-skos/">FOAF and SKOS can be used to describe people and topics</a>.</div>

75 (URL last accessed 2009-06-09)
76 (URL last accessed 2009-06-09)
  • 239.
11.9 Argumentative discussions in online communities

Despite the longevity of online communities, it is not possible to view or leverage any benefits from the argumentative structures implicit in the conversations that are taking place in the many millions of discussions contained in various social websites. While some forum sites allow the use of icons to identify the type of replies that occur in a threaded discussion (see Figure 11.19), very few make use of these to help users when they are searching for a particular type of response.

Fig. 11.19. An argumentative discussion on a message board
  • 240.
Fig. 11.20. Agreement and disagreements in discussion threads

Even in just a single thread there can be many challenges in terms of identifying agreement and disagreement. Figure 11.20 illustrates some of the complexity relating to implementing such reply types, especially when user revisions of content are brought into play. There are two types of agreement or disagreement shown in this example: the vertical lines show those between responses to the discussion topic, and the horizontal lines show those between revised versions of a content item. (Berners-Lee 1999) talked about his frustration with the loss of argumentative context in online discussions:

  People are already experimenting with new social machines for online peer review, while other tools such as chat rooms developed quite independently and before the Web. [...] By experimenting with these structures, we may find a way to organise new social models that not only scale well, but can be combined to form larger structures. [...] I’d always been frustrated that the essential role of a message in an argument was often lost information. [...] We created a sub-directory called Discussion [... that] allowed people to post questions on a given subject, read and respond. A person couldn’t just ‘reply’. He had to say whether he was agreeing, disagreeing or asking for clarification of a point. The idea was that the state of the discussion would be visible to everyone involved.

He further described how such argumentative context could be expressed in an open form to support ongoing discussions:

  Imagine having servers for comments in different forums, perhaps family, school and company. Each point and rebuttal is linked, so everyone can see at a glance the direct agreements and contradictions and the supporting evidence for each view, such that anything could be contested by the people involved. If there was some sort of judicial, democratic process for resolving issues, the discussion could be done in a very clear and open fashion, with a computer keeping track of the arguments.
  • 241. 236 The Social Semantic Web One example of a tool that can aid with such discussions is the argumentationvisualisation site Debategraph77. The goal of Debategraph is ‘to make the best ar-guments on all sides of any debate freely available to all and continuously open tochallenge and improvement by all’. It is an impressive tool that has evolved fromthe work of just two people over the past few years, and was tested78 by DowningStreet on their website following a speech by Tony Blair in 2008. The debatemaps themselves can also be embedded into blog posts. Related work has beencarried out by Douglas Walton79 on argumentative schemes and also by Stanford’sRobert Horn80. An argumentation module extension to SIOC81 has been provided to allow oneto formulate agreement and disagreement between SIOC content items (Lange etal. 2008). The properties and classes defined in this module are related to other ar-gumentation models (Groza et al. 2008) such as SALT82 and IBIS83. Some relatedwork has been performed by aligning SIOC with the SWAN ontology (morelater). It may also be necessary to extend these terms for use cases where more de-tailed discourse representation is required. One may want to define exactly what itis that parts of a discussion will be in agreement or disagreement with: for exam-ple, a statement (an opinion or a well-known fact), a question, a topic, etc. Simi-larly, there may be a need for more fine-grained argumentation: rather than agree-ing or disagreeing with an entire post, someone may refer to a knowledge chunk.11.10 Object-centred sociality in online communitiesWe earlier introduced the term ‘object-centred sociality’, referring to the hypothe-sis that people on the Social Web are connected through the user-generated con-tent that they create, annotate and collaborate on. Figure 11.21 shows a conceptualillustration of this idea using a model of content ‘circles’ created by a person viamultiple online accounts. 
Connections are formed between these circles by people creating similar content or using similar annotations.

The semantic representation of people’s decentralised content can be related to the networks formed between people by social objects (i.e. common content items or annotations) (Kinsella et al. 2007). SIOC and FOAF can be used together84 to
77 (URL last accessed 2009-01-30)
78 (URL last accessed 2009-01-30)
79 (URL last accessed 2009-06-09)
80 (URL last accessed 2009-06-09)
81 (URL last accessed 2009-07-01)
82 (URL last accessed 2009-06-09)
83 (URL last accessed 2009-06-09)
84 (URL last accessed 2009-06-09)
  • 242. 11 Interlinking online communities 237describe the objects in this social network of users. SIOC can be used to provide arepresentation of all content items created by a person (via their user accounts) onvarious social media sites, and this can be nicely combined with the FOAF profileof that person who holds the associated user accounts. All this information, inte-grated across the Social Web, allows us to build up a picture of all the objects thata user has created, interacted with and commented upon across different socialmedia sites, from which the links between the users themselves emerge. Theremay be different requirements for doing this, including: You may want to centralise your content on your own service85. You may want to see your content on a third-party service providing an aggre- gate view, similar to the FriendFeed86 service. You may want to move all of your content and profiles from multiple services to one third-party service, e.g. as provided by the chi.mp87 website. You may just want to move the content you have on one service to another (e.g. move all your blog posts, comments, friends, etc. from to Acme Blog Service).Fig. 11.21. Subscribe to my brain and the things that I have made on various social websites Figure 11.21 shows an example of content that one person has created onFlickr, YouTube, etc. through their various user identities on those sites (thesecorrespond to content containers and items listed in the SIOC Types88 module).We can also say that each content item is a user-contributed post, with some at-85 (accessed 2009-06-09)86 (URL last accessed 2009-06-09)87 (URL last accessed 2009-06-09)88 (URL last accessed 2009-06-09)
  • 243. 238 The Social Semantic Webtached or embedded content (e.g. a file or some other metadata). The inner layer isa person (semantically described in FOAF), the next layer is their user accounts(described in FOAF, SIOC) and the outer layer is the posted content (i.e. text,files, associated metadata, etc.) on community sites (again described using SIOC).11.11 Data portability in online communitiesData portability, referring to methods whereby people can port the data they ownfrom one place to another, is important to people for a number of reasons. Firstly,people are tired of constantly having to repeat their personal profile definitionsacross a range of social websites. Secondly, having to search for your contacts(colleagues or friends) is becoming increasingly tiring as new sites appear.Thirdly, if you decide you want to change services from one platform to another,there are very few easy-to-use mechanisms for bringing your content items withyou (photos, blog posts, whatever). But most importantly, users like to think thatthey have full control over their own data. That means having the freedom to bringtheir data with them if they choose to use it elsewhere. There is a growing feeling that companies need to support the wishes of theirusers in this direction, as expressed in ‘The Bill of Rights for Users of the SocialWeb’89 written by Joseph Smarr, Marc Canter, Robert Scoble and Michael Arring-ton. However, companies should also realise that providing mechanisms for dataportability does not necessarily mean that users will leave their sites en masse. Byproviding open methods to access data on sites, via APIs or query mechanisms orembedded markup (e.g. Facebook’s FQL, the Twitter API and the Flickr API), thebig players are allowing others to build new and interesting applications on top oftheir sites which encourages users to stay on board. 
Also, the users then feel happy in knowing that they have access to their data if they need it, building loyalty as opposed to anger against restrictive user data agreements. Lastly, these companies can open up avenues for an influx of new users who can easily bring their data over via data portability mechanisms.

11.11.1 The DataPortability working group

DataPortability is an independent advocacy group promoting interoperability through open standards and user control in an attempt at standardising various aspects of social networking, and has a common agenda with other groups referred to earlier including DiSo and the OWF. (‘DataPortability’ is used here to refer to the working group and ‘data portability’ refers to the ability to port data.) The
89 (URL last accessed 2009-06-09)
  • 244.
DataPortability working group was established to look at ways in which data (personal profiles, content, etc.) could be ported from one service (e.g. a social website) to another. DataPortability aims to document the best practices for integrating existing open standards and protocols to enable end-to-end data portability between online tools, vendors, and services. A goal of the working group is to enable users to move, share, and control their identity, photos, videos and all other forms of personal data. Some potential technologies that fulfil data portability requirements are listed by developer John Dyer90 and include OpenID, FOAF, SIOC and APML.

DataPortability is acting as a rallying point for solutions to a common set of scenarios that are currently being encountered on the Social Web: migrating social network profiles, locating friends, transporting content items. It is building on the vast amount of work that has been done in related efforts such as the Semantic Web (FOAF, SIOC, etc.), the microformats community, OpenID, RSS, AtomPub, OPML, etc. Rather than inventing new standards, DataPortability leverages existing published formats, ‘de-facto’ standards and related APIs that have emerged (‘standing on the shoulders of giants’). Their focus is more on showing how certain typical scenarios or use cases can be solved using a combination of pieces that exist already.

For users to have true data portability, there still needs to be some consensus on both the APIs and the formats needed to transfer and represent this portable data. A challenge for DataPortability is therefore the mapping of different data standards. As this initiative may use related but different data representation methods for various scenarios (e.g. XFN and hCard in one case and FOAF in another), the issue of mapping between these formats needs to be addressed.
Existing work from the Semantic Web domain (ontology mapping, GRDDL) could provide a significant contribution to addressing this question.

A large number of companies have already signed up to the DataPortability initiative, including Microsoft, Google, Yahoo! and Facebook. This may be partially because, as with all such initiatives, it is important to be aware of the implementation strategies, technical blueprints and policy guidelines while they are being drafted, so that opinions can be expressed at a formative stage (rather than joining later on and finding something untenable). There is also a publicity opportunity for the big players in this area, since those who do not support such an initiative may be seen as opposing the views of many users who want to have transparency when joining new services or sites.
90 (URL last accessed 2009-06-09)
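To make the format-mapping challenge more concrete, the following Python sketch converts a handful of hCard fields and XFN relationship values into FOAF-style triples. It is an illustration only: the mapping tables are our own assumption rather than a published alignment, the function names and the example.org identifiers are hypothetical, and the XFN conversion treats the linked page as if it identified the person, which a careful mapping would not do.

```python
# Illustrative (non-normative) correspondences between hCard fields and FOAF properties.
HCARD_TO_FOAF = {
    "fn": "foaf:name",
    "nickname": "foaf:nick",
    "url": "foaf:homepage",
    "email": "foaf:mbox",
}

# XFN relationship values that we choose to flatten into foaf:knows.
XFN_KNOWS = {"friend", "acquaintance", "contact", "met", "co-worker", "colleague"}


def hcard_to_foaf(subject, hcard):
    """Map a dict of hCard fields to (subject, property, value) triples.

    Fields without an entry in HCARD_TO_FOAF are skipped rather than guessed.
    """
    triples = []
    for field, value in hcard.items():
        prop = HCARD_TO_FOAF.get(field)
        if prop is None:
            continue
        if field == "email" and not value.startswith("mailto:"):
            value = "mailto:" + value  # foaf:mbox values are mailto: URIs
        triples.append((subject, prop, value))
    return triples


def xfn_to_foaf(subject, links):
    """Map (rel, href) pairs to foaf:knows triples.

    Simplification: the fine-grained XFN values (friend, colleague, ...) are
    all collapsed into foaf:knows, so the mapping loses information.
    """
    return [
        (subject, "foaf:knows", href)
        for rel, href in links
        if XFN_KNOWS & set(rel.split())
    ]


if __name__ == "__main__":
    card = {"fn": "Bob", "email": "bob@example.org", "tel": "123"}
    links = [("friend met", "http://alice.example.org/"), ("me", "http://bob.example.org/")]
    for triple in hcard_to_foaf("ex:bob", card) + xfn_to_foaf("ex:bob", links):
        print(triple)
```

Even this small example shows why the mapping is not trivial: some source fields have no obvious target, and some distinctions in one format have no counterpart in the other.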
  • 245.
11.11.2 Data portability with FOAF and SIOC

One of the problems with combining social media data is in knowing exactly what accounts the user holds on different social websites so that one can access information about the content created by the user on each of these sites. Josh Patterson has described how a combination of YADIS91, XRDS (eXtensible Resource Descriptor Sequence)92, SIOC and FOAF can be used to discover and collect one’s data from a variety of WRFS-enabled (Web Relational File System) sources93. YADIS, XRDS and WRFS are used for discovery purposes, and the FOAF and SIOC vocabularies are used to describe a person’s social network profile, their user accounts and the content items created using those accounts in various containers on different social websites.

Firstly, the YADIS communications protocol is used to discover an identity for a particular person, which then returns a YADIS/XRDS document indicating which identities that person prefers to use (e.g. referencing an OpenID account and associated FOAF profile), and what services those identities are held on. Then, the WRFS abstraction model can be used to find out what containers the returned identities hold on those services. SIOC is an ideal representation method for describing the content of those containers and the items, and the structure and connections therein.

In Figure 11.22, Bob holds user accounts on various social websites (two shown for clarity, but there could be many more), and via those accounts he creates content items on those sites (usually within containers of some sort, e.g. in a bookmark folder, personal blog, message board or image gallery). He should be able to port not only his social graph (in this case, his connections to Alice and Carol), but also his personal containers and sets of content items, and perhaps even associated comment replies. The associated vocabulary terms are shown in dark grey: foaf:knows, sioc:User, etc.
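The scenario in Figure 11.22 can be sketched as a small set of triples plus a traversal that selects what Bob would take with him. This is a minimal Python illustration, not part of any of the tools described here: the ex:, siteA: and siteB: identifiers are hypothetical, the choice of which properties to follow is our own assumption, and plain tuples stand in for a real RDF store.

```python
from collections import deque

# Properties we follow outwards from the person: accounts, created items,
# their containers and their replies. foaf:knows edges are exported but not
# followed, so friends' own data stays behind.
FOLLOW = {"foaf:holdsAccount", "sioc:creator_of", "sioc:has_container", "sioc:has_reply"}

TRIPLES = [
    ("ex:bob", "rdf:type", "foaf:Person"),
    ("ex:bob", "foaf:knows", "ex:alice"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:bob", "foaf:holdsAccount", "siteA:bob"),
    ("ex:bob", "foaf:holdsAccount", "siteB:bobby"),
    ("siteA:bob", "rdf:type", "sioc:User"),
    ("siteA:bob", "sioc:creator_of", "siteA:post1"),
    ("siteA:post1", "sioc:has_container", "siteA:blog"),
    ("siteA:post1", "sioc:has_reply", "siteA:comment7"),
    ("siteB:bobby", "sioc:creator_of", "siteB:photo3"),
    ("siteB:photo3", "sioc:has_container", "siteB:gallery"),
    # Alice's own account and content: must not end up in Bob's export.
    ("ex:alice", "foaf:holdsAccount", "siteA:alice"),
    ("siteA:alice", "sioc:creator_of", "siteA:comment7"),
]


def portable_export(triples, person):
    """Return the triples reachable from `person` via the FOLLOW properties."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((p, o))

    seen = {person}
    queue = deque([person])
    export = []
    while queue:
        subject = queue.popleft()
        for prop, obj in by_subject.get(subject, []):
            export.append((subject, prop, obj))
            if prop in FOLLOW and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return export


if __name__ == "__main__":
    for triple in portable_export(TRIPLES, "ex:bob"):
        print(triple)
```

The export contains Bob's profile, his links to Alice and Carol, both accounts, and the items and containers reached through them, while Alice's account data is left out.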
SIOC was initially created to provide a way to describe the content from online communities (mailing lists, message boards, etc.). It was soon used to describe blogs, since the post plus reply structure is very similar to community discussions (except that in blogs the first poster is usually just one person), and more recently it has been used for other types of Social Web content.

However, SIOC is more than just a way to represent personal containers of data. Methods are required for porting not just small user-centric sets of data but whole sets of community data, especially for niche groups who may want to move from one service to another.

SIOC includes the main concepts needed to describe the structure and contents of a community site as a whole. If someone runs a community site, and they decide
91 (URL last accessed 2009-07-07)
92 (URL last accessed 2009-07-07)
93 (URL last accessed 2009-06-09)
  • 246. 11 Interlinking online communities 241that they want to port their group from one place to another, SIOC can be used todescribe the structure and content of the existing community site in order to recre-ate it on a different information system.Fig. 11.22. Representing the data created by users on various social websites in a portable form11.11.3 Connections between portability effortsDataPortability is mainly about users being able to have portable data (profiles,identities, content like photos, videos, discussion posts) that they can move be-tween the services and sites that they trust and choose to use. Microsoft has alsobegun an initiative in the data portability space with their Contacts APIs. They an-nounced94 in 2008 that they were ‘working with Facebook, Bebo, Hi5, Tagged andLinkedIn to exchange functionally-similar Contacts APIs, allowing [them] to cre-ate a safe, secure two-way street for users to move their relationships between[their] respective services’. Uno de Waal has since described how the related Mi-crosoft Invite2Messenger service can be used to gain access to your Facebook94 (accessed 2009-06-09)
  • 247. 242 The Social Semantic Webfriends’ e-mail addresses in plain text95. OpenSocial on the other hand is moreabout ‘gadget’ portability, where social applications can be deployed across a va-riety of social networking sites (although some data portability tasks may be pos-sible). Figure 11.23 shows some of the main companies who have signed up to theDataPortability, Contacts APIs and OpenSocial initiatives.Fig. 11.23. The overlap between companies involved in related portability efforts Marc Canter96 and others have pointed out97 that although the Contact APIsfrom Microsoft are not open in themselves, at least the APIs seem to export asmuch data as they can import. He also says that Microsoft (and other big compa-nies) may not be explicitly following the actions (e.g. the technical recommenda-tions) of the DataPortability initiative, but rather claims that it would hurt them ifthey did not open up and go along with some portable data efforts given the cur-rent climate and the tide of users in favour of this.11.12 Online communities for health care and life sciencesWe will now describe some efforts towards implementing Semantic Web tech-nologies in online communities that focus on health care and life sciences. We in-95 (URL last accessed 2009-06-09)96 (URL last accessed 2009-06-09)97 (URL last accessed 2009-06-09)
  • 248. 11 Interlinking online communities 243clude efforts such as SWAN, SCF and bio-zen, but we note that this is an emerg-ing area being considered from a variety of perspectives (e.g. some are discussinghow health care management can be advanced with the [Social] Semantic Web98).11.12.1 Semantic Web Applications in NeuromedicineAnother interesting use case for SIOC is its use in life science applications. Recentefforts in this direction include the SWANSIOC99 initiative to provide models andtools for scientific discussions on neuromedicine in both formal publications andthrough social software. SWAN (Semantic Web Applications in Neuromedicine)100 is a framework usedfor representing discourse on neurodegenerative disorders. SWAN consists of: aformal ontology to record and present scientific discourse; a knowledge base ofhypotheses, claims, evidence, concepts, entities, citations genes and proteins inAlzheimer’s disease research; a community process built upon Alzforum101 (theAlzheimer Research Forum); a discovery tool for conflicts, gaps, and missing evi-dence; and an information bridge to promote collaboration. One of the main ele-ments in the SWAN ontology is the research statement, which may be a claim,hypothesis, comment or research question. Instances of these research statementsmay be extracted from a publication or can stand on their own (Ciccarese et al.2008). SWAN has an active beta participation of 144 Alzheimer researchers, and ispart of an ongoing integration into SCF (Science Collaboration Framework), a col-laboration toolkit based on Drupal (more in the next section). More generally, thiswork is leading towards a Semantic Web platform for scientific discourse in bio-medicine that is linked to the key concepts, entities and knowledge specified byappropriate ontologies and that can be integrated with existing software tools thatare useful to Web communities of working scientists. 
The SIOC and SWAN teams are aligning their ontologies to provide a common framework for modelling scientific discourse and argumentative discussions based on the experiences of both teams. New classes are being added to SIOC, and SIOC properties and classes are being aligned with SWAN ones102 (see Figure 11.24).
98 (URL last accessed 2009-07-20)
99 (URL last accessed 2009-06-09)
100 (URL last accessed 2009-06-09)
101 (URL last accessed 2009-07-10)
102 (URL last accessed 2009-06-09)
  • 249. 244 The Social Semantic WebFig. 11.24. Alignments between the SIOC, FOAF, SWAN Citations and SWAN Scientific Dis-course ontologies11.12.2 Science Collaboration FrameworkThe Science Collaboration Framework (SCF) (Das et al. 2008b) is a reusable,open-source platform for structured online collaboration in biomedical researchthat leverages existing biomedical ontologies and RDF resources on the SemanticWeb. SCF supports structured Social Web-type community discourse amongstbiomedical research scientists that is centred on a variety of interlinked heteroge-neous data resources available to them (both formal and informal content, includ-ing scientific articles, news items, interviews, and various other perspectives). The first instance of the SCF framework is being used to create an open-accessonline community for stem cell research called StemBook103. StemBook was de-veloped based on requirements from the Harvard Stem Cell Institute. A secondcommunity is being planned for PD Online, a web community for Parkinson’sdisease researchers sponsored by the Michael J. Fox Foundation (MJFF). The de-velopers of SCF have cited significant overlaps between PD Online requirementsand existing features built for StemBook, suggesting that the framework willachieve feature convergence through successive community implementations. Al-103 (URL last accessed 2009-06-09)
  • 250. 11 Interlinking online communities 245zforum is currently redesigning their website to incorporate the SCF platform, andSCF is also being evaluated by several other scientific communities. While each of these communities has a different focus, there is an obvious ad-vantage to basing them on a common infrastructure with shared ontologies andsharable modules. These implementations primarily act as forums for discoursewithin a particular community of members, but information from these communi-ties can be of significant interest and importance across related specialities andshould be shared. Discourse from each of these communities could be fed into oneor several SCF-powered knowledge bases, allowing for a community of interoper-able scientific communities. The SCF GPL software104 consists of the Drupal core content management sys-tem and customised modules. A community administrator has the option of install-ing selections of these modules if there is an existing Drupal site or of installingthe entire framework, with configuration being performed through the normalDrupal administrative control panel. SCF provides the ability to publish articles,interviews and news; to annotate these with biological resources such as genes,animal models and antibodies; and to create informal discourse around these con-tent items as well as current scientific articles. The information about a contenttype (e.g. gene, article, etc.) is contained within a Drupal node (content item); asite can create instances of gene nodes, article nodes, etc. using these modules.Bioinformatics data tends to be heterogeneous, and therefore there was a need forSCF to be able to instantiate nodes from heterogeneous data sources such as XMLand RDF. The knowledge representation of biological resources is based on an ex-tension of the SWAN ontology described in the previous section. 
SCF incorporates a ‘node proxy’ architecture for populating gene nodes or publication article nodes via resources available from the Semantic Web or other web services. Gene information is imported from the Entrez Gene RDF repository (Ruttenberg et al. 2007, Sahoo 2006) via a SPARQL interface105. Articles formatted in XML using the National Library of Medicine Document Type Definition can also be uploaded into SCF. Articles can be annotated with biological processes, molecular functions or cellular components from the Gene Ontology (this data can be loaded into Drupal taxonomies).

The architecture (Das et al. 2008a) makes it possible to define common schemas in OWL for a set of web communities and to enable interoperability across biological resources, SWAN research statements or other objects of interest defined in the shared schemas. It is planned to make these graphs available via RDFa embedded within the HTML, and this work will be carried out in parallel with efforts to integrate RDFa into Drupal core106 that we discussed previously.
104 (URL last accessed 2009-06-09)
105 (URL last accessed 2009-06-09)
106 (URL last accessed 2009-06-09)
  • 251.
11.12.3 bio-zen and the art of scientific community maintenance

The bio-zen initiative107 is an attempt to represent data, information and knowledge from research in all facets of life sciences on the Semantic Web. The goal of this project is the unification of information that is now scattered through a multitude of different data structures, exchange formats and databases. As part of this, the Semantically-Interlinked Scientific Communities108 (SISC) effort aims to improve how scientific data and knowledge are currently being represented and communicated.

bio-zen and SISC use SIOC, FOAF, DC, Creative and Science Commons, OBO and HCLS ontologies and technologies as their foundation. The bio-zen ontology framework also has a proposed bio-zen DOLCE use case for the W3C Semantic Web Health Care and Life Sciences Interest Group (HCLSIG). SIOC has been adopted by the authors of this initiative for the representation of basic scientific discourse in scientific publications or on the Web. According to Matthias Samwald, creator of these initiatives, SIOC was chosen as one of the base ontologies for this effort since it provides ‘an excellent tool to describe scientific discourse in a practical, web-centric manner’109.

Two interesting properties used in bio-zen (that are along the lines of argumentative discussion and IBIS) are ‘supported-by’ and ‘in-conflict-with’, allowing bio-zen ‘to represent the basics of scientific discourse (e.g. one can make the statement that a certain posting / document / data set is supported or in conflict with some other posting / document / data set)’109.

Related to this is work by the myExperiment team to reuse terms from SIOC in the publishing and sharing of experimental data by scientists.
In David Newman’s myExperiment ontology110, concepts from Dublin Core, FOAF and SIOC are reused since they are closely related to the two main functions of myExperiment: a social networking framework for researchers, and a metadata registry for experiments.

11.13 Online presence

In 2008, a project for modelling online presence, called the Online Presence Ontology (OPO)111, was initiated by Milan Stankovic. Whereas FOAF is mainly focused on static user profiles and SIOC has been somewhat oriented towards threaded
107 (URL last accessed 2009-06-09)
108 (URL last accessed 2009-06-09)
109 (URL last accessed 2009-06-09)
110 (URL last accessed 2009-07-20)
111 (URL last accessed 2009-06-09)
  • 252. discussions, OPO can be used to model dynamic aspects of a user’s presence in the online world (e.g. custom messages, IM statuses etc.).
    By expressing data using OPO, online presence data can be exchanged between services (chat platforms, social networks, and microblogging services). The ontology can also be used for exchanging IM statuses between IM platforms that use different status scales, since it enables very precise descriptions of IM statuses.
    The maintainers of OPO and SIOC are also working together to define alignments such that semantic descriptions of online presence and community-created content can be effectively leveraged on the Social Semantic Web.
    11.14 Online attention
    APML, or Attention Profiling Markup Language, is an XML-based format that allows people to share their own personal ‘attention profile’, similar to how OPML (Outline Processor Markup Language) allows the exchange of reading lists between sites and news readers. APML compresses all forms of attention-related data into a portable file format to provide a complete description of a person’s rated interests (and dislikes).
    Efforts are also underway to link APML into the Semantic Web by creating an APML-RDF schema [112], [113]. An example Social Semantic Web application that uses APML is ‘WebThere: Semantic APML Profiles’ [114] by Brian MacKay, a service for creating and maintaining profiles of user interests and attention preferences in social websites (powered by SIOC and FOAF).
    11.15 The SIOC data competition
    The Digital Enterprise Research Institute (DERI) at the National University of Ireland, Galway ran a competition during late 2008 in conjunction with an Irish message board site. The competition was an open contest in which entrants were asked to submit an interesting creation based on a SIOC data set of discussion posts created on the site between 1998 and 2008 (approximately 9 million documents including SIOC content and FOAF profiles).
This data reflects 10 years of online life from one of Ireland’s busiest websites, with over 1.5 million unique visitors a month.
    Up until now, this data was publicly viewable but it was difficult to leverage it without any added semantics because it was embedded in heavily-styled HTML pages. The aim was to encourage entrants to generate interesting applications or creations that make use of community data represented in the SIOC semantic format. The competition had about sixty registrants with eight final submissions.
    [112] (URL last accessed 2009-06-09)
    [113] (URL last accessed 2009-06-09)
    [114] (URL last accessed 2009-06-09)
  • 253. Fig. 11.25. SIOC.ME, winner of the SIOC data competition
    The top winning submission was entitled ‘SIOC.ME: A Real-Time Interactive Visualisation of Semantic Data within a 3-D Space’ [115]. The entry illustrates how 3-D visualisations may be harnessed to not only provide an interactive means of presenting or browsing data but also to create useful data analysis tools, especially for manipulating the ‘semantic’ (meaningful) data from online communities and social networking sites. The entry (shown in Figure 11.25) was submitted by Darren Geraghty, a user interface and interaction designer.
    [115] (URL last accessed 2009-06-09)
  • 254. Fig. 11.26. boardsview, the runner-up in the SIOC data competition
    In second place was a visualisation application called ‘boardsview’ by Stephen Dolan of Trinity College Dublin (Figure 11.26). This is an interactive, real-time animation where one can watch the historical content from many discussion forums changing in real or compressed time. In this application, you can zoom into a particular forum to see individual users posting messages or to see threads being created and destroyed.
    Third prize was awarded to the ‘Forum Activity Graph’ [116] by Drew Perttula from California (Figure 11.27). This entry was a visualisation showing the popularity of forums on the site as represented by coloured rivers of information, which were drawn as SVG graphics and then rendered and displayed using Google Maps.
    [116] (URL last accessed 2009-06-09)
  • 255. Fig. 11.27. The ‘Forum Activity Graph’ visualisation of thread growths per forum (topic area)
    Other final submissions included: ‘Forum Map Demonstration’ [117] by Tristan Webb and Ian Dickinson of HP Labs Bristol, a demonstration of self-organising maps applied to an information navigation problem in a big community site; ‘Find Something Interesting’ by ITT Dublin’s Alexandra Roshchina and Aleksey Kharkov, an application to provide recommendations of the most interesting posts and threads based on interest-matching and graph-mining techniques; ‘ChartBoards’ [118] by Martin Harrigan of TCD, a tool for examining community trends via term frequencies; and ‘Visualising the Community Culture with Charts’ [119] by Eoin McLoughlin of TCD, where various graph types were used to simplify the huge amount of available community data to something that could allow someone to easily grasp its size and depth.
    [117] (URL last accessed 2009-06-09)
    [118] (URL last accessed 2009-05-20)
    [119] (URL last accessed 2009-06-09)
  • 256. 12 Social Web applications in enterprise
    While we have mainly described applications for social media on the Web in previous chapters of this book, it is important to also consider the aspects of Social Web applications in enterprise environments. As on the Web, corporate ecosystems can benefit from Semantic Web technologies to provide advanced services to knowledge workers and to solve some of the issues with traditional collaborative information systems. This chapter will therefore describe the ideas and limits of Enterprise 2.0 ecosystems and will give an overview of how Semantic Web technologies can be used to solve these issues and make these systems more powerful.
    12.1 Overview of Enterprise 2.0
    So far, we have mainly described the usages, applications and impacts of semantic technologies on social media at web scale. However, while people generally use social media for their own personal reasons, one important trend in the last few years is the use of social software in corporate contexts. This recent trend, often termed ‘Enterprise 2.0’ (McAfee 2006), describes the next generation of information management and collaboration tools being used in organisations, similar to how Web 2.0 is being used to describe a second-generation of web-based communities and hosted media services. Whereas Web 2.0 is the use of applications such as blogs, wikis, RSS feeds and social networking on the Web, Enterprise 2.0 is the packaging of those technologies in both corporate IT and workplace environments.
    ‘Enterprise 2.0 is the use of a freeform social software platform inside an organisation that allows them to do things that are important’, says McAfee. ‘There are direct enterprise equivalents [to Facebook].
You can ask people the status of their projects, what they’re working on, are they travelling, things they’ve learned. All of these things would be very valuable inside an enterprise.’
    An article entitled ‘Social Media Will Change Your Business’ in BusinessWeek [1] by Stephen Baker and Heather Green described the latest movements in the social media space, the business applications of social media-based technologies and how things have changed since their 2005 article [2] entitled ‘Blogs Will Change Your Business’. (In fact, the authors revised, annotated and re-published their earlier article to account for recent advances in social media [3].) Some sites and services covered included TechCrunch, Engadget, the Huffington Post, YouTube, Twitter, Wikipedia, Facebook and MySpace. Research for the article was also carried out via their Twitter accounts stevebaker and heatherlgreen.
    [1] (URL last accessed 2009-06-09)
    [2] (URL last accessed 2009-06-09)
    J.G. Breslin et al., The Social Semantic Web, DOI 10.1007/978-3-642-01172-6_12, © Springer-Verlag Berlin Heidelberg 2009
  • 257. Fig. 12.1. A wiki used for project management (the JIRA issue tracking system integrated with Confluence wiki)
    Enterprise 2.0 is also considered as ‘the use of emergent social software platforms within companies, or between companies and their partners or customers’ [4]. It places emphasis on how tools generally meant for personal or collaborative use on the Web, such as blogs or wikis, can be part of corporate information systems. When defining Enterprise 2.0, McAfee introduced the SLATES acronym, identifying six main features that such information systems should cover: Search, Links, Authoring, Tags, Extension and Signals. For example, blogs and wikis can ease the Authoring process and the definition of Links without the need for technical requirements thanks to intuitive interfaces. Folksonomies can be used to add Tags to any published content and consequently leveraged for navigation Extensions. Moreover, tools such as microblogging and RSS feeds can be used to provide Signals features, especially as the former can be an interesting medium to get real-time answers to any question asked in a company, providing an improved dynamic when compared to instant messaging since anyone can provide an answer. In addition, most of these tools provide Search functionalities, via plain-text search engines or using tag-based information retrieval capabilities.
    [3] (URL last accessed 2009-07-13)
    [4] (URL last accessed 2009-06-09)
  • 258. More generally, an important feature of these services is the collaborative aspect, since most of the tools provide live information spaces and documents that lead to writable, user-driven and evolving information systems, as opposed to traditional information management architectures with complex workflows and publishing procedures. Enterprise 2.0 is more than just a publishing mechanism, as its tools are also beneficial for fostering collaboration between employees and between various departments. For example, blogs can be used by various clusters to let all employees know about their activities, while transverse projects involving several departments can use a wiki to monitor their activities, to create meeting minutes and to write deliverables. This virtual collaboration and editing process can then allow people in the company to know each other better and ideally to foster new collaborations in real life.
    However, it is not only within companies that Enterprise 2.0 becomes useful, but also regarding the way companies can benefit from public social software systems in order to (1) gather knowledge internally from the Web, and (2) be more visible and communicate in a more dynamic way with one’s customers. Regarding the first point, the use of social media and RSS feeds is particularly relevant. Figures estimate that 75-80% of learning is done informally [5], and with 40-50% of employees accessing information and knowledge from social websites [6], the Social Web is potentially responsible for a large proportion of this informal learning (up to 30-40%). RSS feeds can be used to gather this information internally and most importantly, for free.
Companies can benefit from the knowledge of domain experts by simply subscribing to their blogs, similar to how they can benefit from advanced software by using open-source solutions.
    In some cases, knowledge acquisition from the Web can be carried out in a proactive manner, as in the online ideas marketplace ‘Ideagoras’ described in Wikinomics (Tapscott and Williams 2006). Another outcome from the use of Social Web technologies in enterprise is the choice that must be made regarding the use of public services or internal ones, especially for social networking aspects. As observed by J. Nicholas Hoover in InformationWeek [7]: ‘companies must decide whether to take the build-it-yourself approach or simply hitch on to social networks like Facebook and LinkedIn. Private networks offer greater control and protection, while the Web approach makes it possible to reach more people.’ A similar decision is required in terms of SaaS (Software as a Service), since many companies often have to decide whether or not to outsource some of their services (such as e-mail or data storage) to web-based solutions, e.g. Google Enterprise [8].
    [5] (URL last accessed 2009-06-09)
    [6] (accessed 2009-06-09)
    [7] (accessed 2009-06-09)
    [8] (URL last accessed 2009-06-09)
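    The ‘up to 30-40%’ estimate for informal learning given earlier in this section appears to follow from multiplying the two quoted ranges (75-80% of learning being informal, and 40-50% of employees using social websites for information). The short sketch below reproduces that arithmetic; treating the two figures as a simple product is an assumption made here for illustration, and the function name is hypothetical.

```python
def informal_social_share(informal_learning, social_access):
    """Multiply two fractions given as (low, high) ranges, bound by bound."""
    low = informal_learning[0] * social_access[0]
    high = informal_learning[1] * social_access[1]
    return low, high


# 75-80% of learning is informal; 40-50% of employees use social websites for it.
low, high = informal_social_share((0.75, 0.80), (0.40, 0.50))
print(f"{low:.0%} to {high:.0%}")  # prints "30% to 40%"
```

    Read this way, the figure is an upper bound rather than a measurement, which matches the ‘up to’ wording of the estimate.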
  • 259. Fig. 12.2. JetBlue interacting with its customers on Twitter
    In terms of uptake of these technologies and business impact, a survey by SocialText showed that about 13% of the Fortune 500 companies have a public blog [9]. Many companies are also considering using other social media platforms to communicate with their customers such as Second Life or Twitter. The former can even be used for job interviews [10] while the latter can be an efficient way to publish short updates regarding the overall company or about some services and product releases [11]. It can even be used to interact with customers in an agile way, as JetBlue are doing with nearly a million followers in Figure 12.2. Meanwhile, Gartner has identified that many more ‘social computing platforms’ will be considered for adoption by companies during the next 10 years [12], while a report by Forrester Research [13] predicts that the Enterprise 2.0 solutions market will be $4.6 billion by 2013. The three biggest areas identified by Forrester are social networking, RSS and mashups. This is consistent with the drive towards social objects: combining and pushing the right information of interest to you from a variety of sources via your social contacts.
    [9] (URL last accessed 2009-06-09)
    [10] (URL last accessed 2009-06-09)
    [11] (URL last accessed 2009-06-09)
    [12] (URL last accessed 2009-06-09)
    [13] (URL last accessed 2009-06-09)
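    As an illustration of how RSS feeds can be combined in a mashup-like way, the sketch below merges items from two feeds into a single stream ordered by publication date. It uses only the Python standard library; the feed contents, titles and links are made-up examples, and a real deployment would fetch the feeds over HTTP rather than embed them as strings.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Two tiny, made-up RSS 2.0 documents standing in for fetched feeds.
FEED_A = """<rss version="2.0"><channel><title>Expert blog</title>
<item><title>Post A1</title><link>http://example.org/a1</link>
<pubDate>Mon, 01 Jun 2009 10:00:00 GMT</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Team wiki changes</title>
<item><title>Page B1</title><link>http://example.org/b1</link>
<pubDate>Tue, 02 Jun 2009 09:30:00 GMT</pubDate></item>
<item><title>Page B2</title><link>http://example.org/b2</link>
<pubDate>Sun, 31 May 2009 18:00:00 GMT</pubDate></item>
</channel></rss>"""


def parse_items(rss_text):
    """Extract title, link and publication date from each item of one feed."""
    channel = ET.fromstring(rss_text).find("channel")
    source = channel.findtext("title")
    return [
        {
            "source": source,
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            # RSS 2.0 dates use the RFC 822 format, which this helper parses.
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        }
        for item in channel.findall("item")
    ]


def merge_feeds(*feeds):
    """Combine the items of several feeds, newest first."""
    items = [entry for feed in feeds for entry in parse_items(feed)]
    return sorted(items, key=lambda entry: entry["published"], reverse=True)


if __name__ == "__main__":
    for entry in merge_feeds(FEED_A, FEED_B):
        print(entry["published"].date(), entry["source"], "-", entry["title"])
```

    Keeping the source title on each merged item preserves where a piece of information came from, which matters once content from several services is shown in one place.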
  • 260. 12.2 Issues with Enterprise 2.0
    12.2.1 Social and philosophical issues with Enterprise 2.0
    Some argue that Enterprise 2.0 (McAfee 2006), i.e. the use of social software within companies and in their communications with customers, raises more philosophical issues than technical ones. We shall now describe some of these in more detail, while making a distinction between internal and external usage of social software.
    Advantages and disadvantages of using external social websites
    With the increased attention being directed towards social media in the public relations sector, companies are wondering if they should be represented on social websites (Facebook, Twitter, etc.). Social websites can help organisations to investigate and find out more about their target audiences, and these sites allow them to participate in or contribute to conversations about specific products or services (e.g. commercial interaction discussion forums). They can receive and respond to feedback about their offerings from customers (either directly or indirectly), but also from influential hubs and connectors in social websites. As well as being a useful means for self-promotion and for marketing products online (through targeted advertising and viral marketing), social websites can be used for discovering new recruits and for networking with peers.
    If a company’s customers are already using a particular site, then the company can leverage it as opposed to building its own. It is quite difficult to build a community from scratch (i.e. to gain critical mass), unless there is a very strong attractor to the site. Building a sub-community on an existing site being used by a company’s customers (whether it belongs to the company or is external to the company) would be a good idea if there is high usage of that site, allowing companies to potentially reach many users and get more valuable feedback than they could otherwise do.
There are natural worries for companies if their employees are using external social websites, both in terms of sapping employee productivity and unwittingly disclosing private information. Many chief information officers from large companies (e.g. financial institutions) block employee access to public social websites, not just for time-wasting or security reasons, but because of a fear of losing control of information in response to the ‘open’ ethos of the Internet. Accounting firms often need to ensure that their employees do not provide tax or financial advice online to comply with regulatory guidelines and disclosure legislation. As such, companies often require safeguards in terms of tracking documents and discussions, in an effort to comply with company, legal or state regulations. For example, Awareness Inc. [14] provides a solution that tracks posts and sends potentially inflammatory posts into moderation boxes for manager review. To avoid potential violations or breaches of company protocol, 50% of companies (including Citigroup, Goldman Sachs, JPMorgan, UBS, and Lehman Brothers) block access to the Facebook service [15].
  • 261. If worries about time wasting can be overcome, companies can benefit enormously from employees using social networks and social websites externally, especially in terms of informal learning from peers as mentioned earlier. One of the biggest impediments to companies encouraging usage of some of these sites is trust, but in comparison to e-mail, SNSs have many mechanisms for verifying that the user you are about to connect to is trustworthy (friends lists, recommendations and endorsements, content creation histories).
    In a blog article by Iona co-founder Chris Horn entitled ‘The Long Tail Wags’ [16], he describes how commercial organisations can benefit from and add value to the global Web 2.0-using community. There are also some interesting observations on risks and opportunities associated with the large quantities of user-generated data (their activities, profiles, click histories, etc.) that are in the possession of various Internet-based and telecommunications-focused companies.
    External content having a negative impact on a company’s image
    When a company’s image is reflected on the Web, another issue is related to employees with behaviours that are not supported by the various policies of a company. Forrester Research recently found that 14% of companies have disciplined employees and 5% fired them for offences related to social networking. A poll by Sophos found that 66% of workers think their colleagues share too much information on Facebook [17].
    In a case in Japan, six civil servants in the agriculture ministry of the Japanese government were reprimanded for their combined 408 edits to the Wikipedia. The most prolific of the six made 260 changes to pages related to ‘Mobile Suit Gundam’, a Japanese animation show (or ‘anime’) about giant robots. A representative for the ministry, Tsutomu Shimomura, said: ‘The agriculture ministry is not in charge of Gundam.’ He also confirmed that a ministry-wide order and site ban was in place to prohibit access to the Wikipedia from their offices.
    Jessica Zenner, an employee of Parker Services (a technical recruitment contractor for Nintendo) was fired in August 2007 by her employer following concerns expressed by Nintendo about posts on her blog ‘Inexcusable Behavior’. On
    [14] (URL last accessed 2009-06-09)
    [15] (URL last accessed 2009-06-09)
    [16] (URL last accessed 2009-06-09)
    [17] (accessed 2009-06-09)
  • 262. the blog, where she talks about herself, her friends and co-workers (none of whom are mentioned by name), Zenner used the identity ‘Jessica Carr’ but did not explicitly refer to her employers. Another blog-related dismissal was that of Paris-based Catherine Sanderson, who blogged under the moniker ‘La Petite Anglaise’ and was fired from accountancy firm Dixon Wilson for ‘gross misconduct’. Bookseller Waterstone’s also sacked an employee for writing a blog. Joe Gordon from Edinburgh, who worked for the company for 11 years, said he was dismissed because he ‘brought the company into disrepute [in 2005]’.
    Reasons for using social software internally in companies
    From an enterprise usage point of view (networking with colleagues, knowledge sharing, etc.), social networking has been recognised as a tool that can drive corporate innovation. For a major enterprise, there are great opportunities for private corporate social networks as an internal critical mass can be gained more easily. For example, Starcom MediaVest Group has a 33% usage of their internal SNS.
    Internal social networks can be extremely useful for sharing information and expertise. You can share information within a business’ own walls, and use these services to mine for in-house expertise (‘expert finding’). Also, companies can reduce the time spent mailing documents and e-mailing comments by publishing to a common area. More importantly, usage of internal social networks can encourage employees, alumni, interns, new hires, retired staff, and other stakeholders to interact with each other both online and offline.
    According to Rachel Dappe, research manager with IDC, corporate adoption of social networking tools has been considerable due to their effectiveness in cutting across barriers in large corporations.
Social networking providers and platforms include Awareness, Contact Networks, IBM Lotus Connections, introNetworks, Jive Software, Leverage Software, Mentor Scout, Microsoft SharePoint, SelectMinds, Tacit Illumio, Visible Path and Web Crossing. For example, Visible Path powers ‘Hoover’s Connect’ for business research company Hoover’s, which lets users know how they are connected to companies and people in the Hoover’s database.
    Adoption and policies for social software internally
    One of the most popular systems that enables companies to harness collective wisdom and thereby helps individuals to make decisions in an organisation is the corporate wiki (e.g. SocialText, Confluence, MoinMoin). For example, SocialText, which combines wikis and blogs, allows the creation of wiki pages or team blogs that can be commented upon and this can aid in the documentation of decisions required to progress, for example, designs, projects, proposals, etc.
  • 263. However, an issue to consider with Enterprise 2.0 is the adoption of such services. Indeed, in most organisations, knowledge equals power. Consequently, users are not always ready to share information with their peers, even in the same company. For example, when using a wiki, personal knowledge is distributed amongst that created by the masses and personal improvements are not as visible as in, e.g. a personal note that goes to the head of department.
    Therefore, a combination of top-down and bottom-up strategies can be the solution. Social media expert Suw Charman says [18] that early adopters can become evangelists of social software solutions while management can encourage employees to use these technologies such that the global benefits of using these tools can be seen and understood by users. A company’s culture can also have an effect on the acceptance of such open knowledge-sharing architectures. For example, a study by AIIM [19] showed that 41% of companies do not have a clear understanding of what Enterprise 2.0 is, while this percentage goes down to 15% in KM (knowledge management)-oriented companies. The same survey also showed that corporate culture is the biggest impediment to implementing Enterprise 2.0 (in 49% of non-KM companies and 33% of KM-inclined ones).
    Many companies also worry about whether social websites will open the door to problems with misinformation when user-generated content is imported from the Web into an Enterprise 2.0 ecosystem. However, one has to determine whether to trust the source of the information or not, and to balance that with some common sense as to how frequently that source is updated or how many contributions have been made to it (if it is on a wiki, for example). You would expect that a Wikipedia article about Barack Obama would be fairly reliable (being under constant scrutiny) whereas an article about Obama Station in Japan may be slightly less so.
As online identities become more closely integrated, it is becoming easier to see if a source is trustworthy based on their contacts or previously-created content.
    12.2.2 Technical issues with Enterprise 2.0
    While there is no doubt that the use of social software within organisations eases the process of publishing and sharing knowledge internally, we will describe in this section some of the technical issues with Enterprise 2.0 information systems.
    Information fragmentation between applications
    As on the Web, information sharing and social networking in organisations is generally object-centric, the main difference being that in this case, the objects are
    [18] (URL last accessed 2009-06-09)
    [19] (URL last accessed 2009-06-09)