Lecture 3: Social Web Data Formats (2012)

Social Web
Lecture III
What does DATA look like on the Social Web?
Lora Aroyo
The Network Institute
VU University Amsterdam

Monday, February 27, 2012

What do people
contribute on
the SW?


History & Nature
of Blogs
• Blog = weB LOG = we blog
• evolved from online diary (in the 1980’s)
• the term blog coined in late 1990’s
• one of the first ways people could contribute
content on the Web themselves
• Nature: political, technical, art, journalistic,
cultural, personal
• Software: WordPress, Blogger, LiveJournal


Types of Blogs
• Single- or Multi-authored
• Photo-blog, Video-blog, Audio-blog
• Life (b)log, now: micro-lifeblog (Twitter)
• lifecasting: in 2007 by Justin Kan: webcam on a cap
• Gordon Bell MyLifeBits: Microsoft SenseCam

http://www.justin.tv/
http://research.microsoft.com/en-us/projects/mylifebits/

Question?
Why has microblogging (e.g. Twitter) overtaken
more traditional blogs in popularity?


Wikis

• Wiki means fast/quick in Hawaiian
• "the simplest online database that could
possibly work" (Ward Cunningham), 1995
• first wiki software: WikiWikiWeb (the
QuickWeb)

http://en.wikipedia.org/wiki/Ward_Cunningham
http://en.wikipedia.org/wiki/WikiWikiWeb

Wiki Features
• a website powered by wiki software
• created and maintained collaboratively by multiple users
= an ongoing process that constantly changes the site
• not a carefully crafted site for casual visitors
• users can add, modify or delete content
• to obtain meaningful topic associations between
different pages, page link creation is easy
• Examples: community websites, corporate intranets,
knowledge management systems, and note taking


Wiki Implementation
• as an application server that runs on one or more web servers
• content is stored in a file system, and changes to the content
are stored in a relational database management system
• a commonly used software package is MediaWiki
(known from Wikipedia)
• page structure & formatting: simplified markup language
(wikitext)
• style & syntax of wikitext vary among wiki implementations
(some also allow HTML tags or use WYSIWYG editing)
• Issues: control of editing & changes, trust & security


http://www.wikimedia.org/
http://en.wikipedia.org/wiki/List_of_wikis

Question?
Blogging and wikis are examples of '(lay) users
publishing content'.

What are requirements to make this publishing effective?


User-generated
data


Exploiting the crowd

• in wiki applications the crowd
contributes collective
intelligence (textual)
• later other media & resources
emerged, e.g. photo, video, music
• crowdsourcing

Why crowdsourcing?
• many tedious and time-consuming tasks
• professional results not always complete
• professionals (experts) are few & expensive
• professionals do not always know the needs, the
language and the perspectives of the users
• people have a wide range of hobbies and detailed
knowledge
• people have time


Example
• in 1770 Wolfgang von Kempelen built The Turk
• in 2005 Amazon introduced the Amazon Mechanical Turk
• marketplace for work; people perform tasks computers are
lousy at, e.g. identifying items in a photo/video, writing
product descriptions, transcribing podcasts
• organized work
• HITs = human intelligence tasks
• require very little time & offer very little compensation
• workers & requesters


5 Rules of the New
Labor Pool
• The crowd is dispersed and can perform a range
of tasks – from the most rote to the highly specialized

• The crowd has a short attention span, so jobs
need to be broken into “micro-chunks”

• The crowd is full of specialists

• The crowd produces mostly crap - no increase in
the amount of talent – the challenge is to find and
leverage that talent

• The crowd finds the best stuff - finds the best
material and corrects errors

By Jeff Howe

Question?
Was the $1 million Netflix prize a victory for crowdsourcing?


Question?
Crowdsourcing is about exploiting collective effort or
collective intelligence.

What are aspects that make it now much more applicable
than before?


Folksonomies


Structure on the Web
• In the evolution of the Web, Semantic Web
refers to an approach to add ‘semantics’ to
the web, by naming terms in a domain
• A specification of such terms is called an
‘ontology’
• For software: ontologies help to effectively
use content on the Web (like DB schemas)


Folksonomy
• On the social web the user-generated content is
organized in light-weight ontologies, i.e. folksonomies
• Community-based semantics = a relationship between
Users, Tags & Resources
• user-created, bottom-up classification/categorization
of (domain) terms / user-labels, e.g. tags
• tagging = the social process where lay users attach
labels to resources (as opposed to annotation by
professional experts)
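
A folksonomy can be treated as a set of (user, tag, resource) assignments. The sketch below is only an illustration with made-up users, tags and URLs; it shows two simple aggregations: collapsing the user dimension into tag counts per resource, and counting which tags co-occur on the same resource.

```python
from collections import Counter, defaultdict

# Hypothetical tag assignments: (user, tag, resource) triples.
assignments = [
    ("alice", "semanticweb", "http://example.org/foaf-intro"),
    ("bob", "semanticweb", "http://example.org/foaf-intro"),
    ("bob", "rdf", "http://example.org/foaf-intro"),
    ("carol", "wiki", "http://example.org/wikiwikiweb"),
    ("alice", "wiki", "http://example.org/wikiwikiweb"),
    ("alice", "rdf", "http://example.org/sioc-intro"),
]


def tags_per_resource(triples):
    """Collapse the user dimension: count how many users applied each tag to each resource."""
    counts = defaultdict(Counter)
    for _user, tag, resource in set(triples):  # set() drops duplicate assignments
        counts[resource][tag] += 1
    return counts


def tag_cooccurrence(triples):
    """Count how often two tags are attached to the same resource (by any user)."""
    tags_by_resource = defaultdict(set)
    for _user, tag, resource in triples:
        tags_by_resource[resource].add(tag)
    pairs = Counter()
    for tags in tags_by_resource.values():
        ordered = sorted(tags)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                pairs[(first, second)] += 1
    return pairs


resource_tags = tags_per_resource(assignments)
cooccurrence = tag_cooccurrence(assignments)
print(resource_tags["http://example.org/foaf-intro"].most_common())
print(cooccurrence.most_common())
```

Both aggregations discard information (who tagged what), which is the trade-off behind any more concise folksonomy view.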


• cleaning messy data
• transforming data from one format to another
• fetching missing data


Question?
Folksonomies typically show the relationships between users,
tags and resources.

Can you think of ways to aggregate user-tag-resource combinations
to get more concise and therefore more meaningful folksonomies?


What DATA
formats do we have?


Vocabularies on the
(Social) Web
• to create interfaces or exchange data
between applications the software needs to
know the terms in the data
• vocabularies define a set of terms in a certain
domain, e.g. describing people, relationships,
content of different type


FOAF
• FOAF = Friend of a Friend
• a machine-readable ontology describing persons, their
activities & their relations to other people and objects
• an open, decentralized technology for connecting social Web
sites, & the people they describe
• http://www.foaf-project.org/

• Create your own FOAF file:
http://www.ldodds.com/foaf/foaf-a-matic


FOAF Vocabulary
• Gradual evolution since mid-2000
• Stable core of classes and properties that will
not be changed
• New terms may be added at any time
• FOAF RDF namespace URI is fixed
• http://xmlns.com/foaf/spec/

FOAF Files
• Text documents, that adopt the conventions of RDF and
may be written in XML, RDFa or N3
• Contain FOAF vocabulary and other RDF vocabularies
• FOAF defines classes, e.g. foaf:Person,
foaf:Document, foaf:Image
• FOAF defines properties of those things, e.g.
foaf:name, foaf:mbox (i.e. an internet mailbox),
foaf:homepage
• FOAF defines relationships that hold between
members of these categories, e.g. foaf:depiction relates
something (e.g. a foaf:Person) to a foaf:Image


Linked Data & FOAF
• model for publishing simple factual data
via a network of linked RDF
documents
• FOAF is an attempt to use the Web to:
• integrate factual information with
information in human-oriented
documents (e.g. videos, books,
spreadsheets, 3d models)
• and info that is still in people's
heads
• linking networks of information with
networks of people


FOAF Example

• there is a foaf:Person
• with a foaf:name property of 'Dan Brickley'
• in foaf:homepage and foaf:openid relationships to a thing called http://danbri.org/
• in foaf:img relationship to a thing referenced by a relative URI of /images/me.jpg


FOAF Auto-Discovery

• If you publish a FOAF self-description (e.g. using
foaf-a-matic) you can make it easier for tools to
find your FOAF by putting markup in the head of
your HTML homepage
• The filename foaf.rdf is a common choice
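
The slide does not show the markup itself. A commonly cited convention is a link element with rel="meta" and type="application/rdf+xml" in the page head; treat that convention, the example URL and the class name below as assumptions. Under those assumptions, a tool could discover the FOAF file roughly like this:

```python
from html.parser import HTMLParser


class FoafLinkFinder(HTMLParser):
    """Collects href values of <link> elements that look like FOAF auto-discovery links."""

    def __init__(self):
        super().__init__()
        self.foaf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        is_rdf = attributes.get("type") == "application/rdf+xml"
        if "meta" in rel_values and is_rdf and attributes.get("href"):
            self.foaf_urls.append(attributes["href"])


# Hypothetical homepage; only the first link matches the assumed convention.
homepage_html = """
<html><head><title>Home</title>
<link rel="meta" type="application/rdf+xml" title="FOAF" href="http://example.org/foaf.rdf" />
<link rel="stylesheet" type="text/css" href="style.css" />
</head><body></body></html>
"""

finder = FoafLinkFinder()
finder.feed(homepage_html)
print(finder.foaf_urls)
```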

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:admin="http://webns.net/mvcb/">
  <foaf:PersonalProfileDocument rdf:about="">
    <foaf:maker rdf:resource="#me"/>
    <foaf:primaryTopic rdf:resource="#me"/>
    <admin:generatorAgent rdf:resource="http://www.ldodds.com/foaf/foaf-a-matic"/>
    <admin:errorReportsTo rdf:resource="mailto:leigh@ldodds.com"/>
  </foaf:PersonalProfileDocument>

  <foaf:Person rdf:ID="me">
    <foaf:name>Lora Aroyo</foaf:name>
    <foaf:title>Ms</foaf:title>
    <foaf:givenname>Lora</foaf:givenname>
    <foaf:family_name>Aroyo</foaf:family_name>
    <foaf:nick>laroyo</foaf:nick>
    <foaf:mbox_sha1sum>d21e8b414a0533e5b4b23411fd76aabbf63ad232</foaf:mbox_sha1sum>
    <foaf:homepage rdf:resource="http://lora-aroyo.org"/>
    <foaf:depiction rdf:resource="lora.jpg"/>
    <foaf:phone rdf:resource="tel:123456789"/>
    <foaf:workplaceHomepage rdf:resource="http://www.cs.vu.nl/~laroyo"/>

    <foaf:knows>
      <foaf:Person>
        <foaf:name>Marieke van Erp</foaf:name>
        <foaf:mbox_sha1sum>f4e16d18528b83fd8b91b603583cbfd8d15f30f2</foaf:mbox_sha1sum>
      </foaf:Person>
    </foaf:knows>

    <foaf:knows>
      <foaf:Person>
        <foaf:name>Dan Brickley</foaf:name>
        <foaf:mbox_sha1sum>748934f32135cfcf6f8c06e253c53442721e15e7</foaf:mbox_sha1sum>
        <rdfs:seeAlso rdf:resource="http://danbri.org/foaf.rdf"/>
      </foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>
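
To see what a consumer can get out of such a file, the sketch below reads a shortened copy of the FOAF document above with Python's standard XML parser and lists the people its owner knows. This is plain XML processing, not a full RDF parser, so it only works for this particular nesting; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"

# Shortened version of the FOAF file shown above.
foaf_xml = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:ID="me">
    <foaf:name>Lora Aroyo</foaf:name>
    <foaf:homepage rdf:resource="http://lora-aroyo.org"/>
    <foaf:knows>
      <foaf:Person>
        <foaf:name>Marieke van Erp</foaf:name>
      </foaf:Person>
    </foaf:knows>
    <foaf:knows>
      <foaf:Person>
        <foaf:name>Dan Brickley</foaf:name>
        <rdfs:seeAlso rdf:resource="http://danbri.org/foaf.rdf"/>
      </foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>"""

root = ET.fromstring(foaf_xml)

# ElementTree addresses namespaced tags as "{namespace-uri}localname".
me = root.find(f"{{{FOAF}}}Person")
my_name = me.findtext(f"{{{FOAF}}}name")
homepage = me.find(f"{{{FOAF}}}homepage").get(f"{{{RDF}}}resource")
known_names = [
    person.findtext(f"{{{FOAF}}}name")
    for person in me.findall(f"{{{FOAF}}}knows/{{{FOAF}}}Person")
]

print(my_name, homepage, known_names)
```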



SIOC
• Semantically-Interlinked Online Communities
• a standard way for expressing user-generated content, i.e.
enable the integration of online community information
• methods for interconnecting discussions, e.g. blogs, forums &
mailing lists
• Semantic Web ontology for representing rich data from the
Social Web in RDF
• commonly used in conjunction with the FOAF vocabulary
for expressing personal profile and social networking
information
• http://sioc-project.org/


<sioc:Post rdf:about="http://jbreslin.com/blog/2006/09/07/creating-connections"> <!-- (1) -->
  <dc:title>Creating connections between discussion clouds with SIOC</dc:title>
  <dcterms:created>2006-09-07T09:33:30Z</dcterms:created> <!-- (2) -->
  <sioc:has_container rdf:resource="http://jbreslin.com/blog/index.php?sioc_type=site#weblog"/>
  <sioc:has_creator>
    <sioc:UserAccount rdf:about="http://jbreslin.com/blog/author/cloud/" rdfs:label="Cloud"> <!-- (3) -->
      <rdfs:seeAlso rdf:resource="http://jbreslin.com/blog/index.php?sioc_type=user&amp;sioc_id=1"/> <!-- (6) -->
    </sioc:UserAccount>
  </sioc:has_creator>
  <foaf:maker rdf:resource="http://jbreslin.com/blog/author/cloud/#foaf"/>
  <sioc:content>SIOC provides a unified vocabulary for content and interaction description: a semantic layer
    that can co-exist with existing discussion platforms.
  </sioc:content> <!-- (5) -->
  <sioc:topic rdfs:label="Semantic Web" rdf:resource="http://jbreslin.com/blog/category/semantic-web/"/> <!-- (4) -->
  <sioc:topic rdfs:label="Blogs" rdf:resource="http://jbreslin.com/blog/category/blogs/"/>
  <sioc:has_reply> <!-- (7) -->
    <sioc:Post rdf:about="http://jbreslin.com/blog/2006/09/07/creating-connections/#comment-123928">
      <rdfs:seeAlso rdf:resource="http://johnbreslin.com/blog/index.php?sioc_type=comment&amp;sioc_id=123928"/> <!-- (8) -->
    </sioc:Post>
  </sioc:has_reply>
</sioc:Post>

• A post (1) titled "Creating connections between discussion clouds with SIOC" (2)
created at 09:33:30 on 2006-09-07 (3) written by user "Cloud" (4) on topics
"Blogs" and "Semantic Web" (5) with contents described in sioc:content.
• (6) More information about its author at
http://johnbreslin.com/blog/index.php?sioc_type=user&sioc_id=1
• The post has a (7) reply and (8) detailed SIOC information about this reply can be
found at http://johnbreslin.com/blog/index.php?sioc_type=comment&sioc_id=123928

SIOC

• http://rdfs.org/sioc/ns# - SIOC Core Ontology Namespace
• http://rdfs.org/sioc/access# - SIOC Access Ontology Module Namespace
• http://rdfs.org/sioc/types# - SIOC Types Ontology Module Namespace
• http://rdfs.org/sioc/services# - SIOC Services Ontology Module Namespace


Activity Streams
• A list of recent activities performed by someone on a
website
• Example: Facebook News Feed
• The Activity Streams project aims to develop an activity
stream protocol to syndicate activities across social Web
applications
• Major websites with activity stream implementations have
already opened up their activity streams to developers to use,
e.g. Facebook and MySpace
• http://activitystrea.ms/


Activity Streams
Specification
• an actor, a verb, an object and a target
• person performing an action on/with an object
• Geraldine posted a photo to her album
• John shared a video
• activity metadata to present to a user in a rich human-friendly
format, e.g. constructing readable sentences about the activity
that occurred, visual representations of the activity, or
combining similar activities for display
• Activities are serialized using the JSON format
• There is also an ATOM-oriented specification
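
As a rough illustration of the actor / verb / object / target structure, the sketch below builds the "Geraldine posted a photo to her album" activity as a JSON document and turns it back into a readable sentence. The field names (actor, verb, object, target, published, objectType, displayName, id, url) follow the JSON Activity Streams 1.0 specification; the identifiers, URLs, timestamp and the helper names are made up for the example.

```python
import json

# Illustrative past-tense forms for two verbs; not part of the specification.
PAST_TENSE = {"post": "posted", "share": "shared"}

activity = {
    "published": "2012-02-27T10:00:00Z",
    "actor": {
        "objectType": "person",
        "id": "tag:example.org,2012:geraldine",
        "displayName": "Geraldine",
    },
    "verb": "post",
    "object": {
        "objectType": "photo",
        "url": "http://example.org/photos/1.jpg",
    },
    "target": {
        "objectType": "collection",
        "displayName": "her album",
    },
}


def describe(activity):
    """Build a human-readable sentence from the activity metadata."""
    actor = activity["actor"]["displayName"]
    verb = PAST_TENSE.get(activity["verb"], activity["verb"])
    sentence = f"{actor} {verb} a {activity['object']['objectType']}"
    if "target" in activity:
        sentence += f" to {activity['target']['displayName']}"
    return sentence


serialized = json.dumps(activity, indent=2)  # what would be syndicated
restored = json.loads(serialized)            # what a consumer would read
print(serialized)
print(describe(restored))
```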


Activity Streams
Example

http://activitystrea.ms/specs/json/1.0/

Verbs, Objects, Mapping
Verbs Objects

http://wiki.activitystrea.ms/w/page/1359319/Verb%20Mapping

XFN
• XHTML Friends Network
• relationships between individuals: by defining a small set of values that
describe personal relationships
• In HTML and XHTML documents, these are given as values for the
rel attribute on a hyperlink. XFN allows authors to indicate which of
the weblogs they read belong to friends, whom they've physically
met, and other personal relationships. Using XFN values, which can
be listed in any order, people can humanize their blogrolls and links
pages, both of which have become a common feature of weblogs.
• with XFN, authors can easily style all links of a particular type; thus, friends
could be boldfaced, co-workers italicized, etc.
• http://gmpg.org/xfn/


XFN Example

• Joe has a set of five links in his blogroll: his girlfriend
Jane; his friends Dave and Darryl; industry expert James,
whom Joe briefly met once at a conference; and
MetaFilter.
• MetaFilter gets no value since it is not an actual person
http://gmpg.org/xfn/intro
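
Joe's blogroll could be marked up and read back as sketched below. The rel values chosen for each person are one plausible reading of the description, and the URLs are invented; the set of allowed values is the XFN 1.1 vocabulary. The parser keeps only links that carry at least one XFN value, so MetaFilter drops out.

```python
from html.parser import HTMLParser

# The XFN 1.1 relationship values.
XFN_VALUES = {
    "friend", "acquaintance", "contact", "met", "co-worker", "colleague",
    "co-resident", "neighbor", "child", "parent", "sibling", "spouse", "kin",
    "muse", "crush", "date", "sweetheart", "me",
}


class XfnParser(HTMLParser):
    """Maps each linked URL to the set of XFN values found in its rel attribute."""

    def __init__(self):
        super().__init__()
        self.relationships = {}

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        values = set((attributes.get("rel") or "").split()) & XFN_VALUES
        if href and values:
            self.relationships[href] = values


# Hypothetical blogroll for Joe (URLs and rel values are assumptions).
blogroll_html = """
<ul>
  <li><a href="http://jane.example.org/" rel="sweetheart date met">Jane</a></li>
  <li><a href="http://dave.example.org/" rel="friend met">Dave</a></li>
  <li><a href="http://darryl.example.org/" rel="friend met">Darryl</a></li>
  <li><a href="http://james.example.org/" rel="met">James</a></li>
  <li><a href="http://www.metafilter.com/">MetaFilter</a></li>
</ul>
"""

parser = XfnParser()
parser.feed(blogroll_html)
for url, values in sorted(parser.relationships.items()):
    print(url, sorted(values))
```

The resulting URL-to-relationship mapping is already an edge list, which is the kind of input a graph visualisation of a person's network would need.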

5 people who’ve met
friends vs. acquaintances

colleagues vs. co-workers love vs. family

http://gmpg.org/xfn/intro

Open Graph
• protocol originally developed at Facebook
• enables any web page to become a rich object in a social graph, i.e.
to have the same functionality as any other object on
Facebook
• Basic Metadata: to turn your web pages into graph objects
• og:title = title of your object e.g., "The Rock"
• og:type = type of your object e.g.,
"video.movie"
• og:image = image URL to represent your object
within the graph
• og:url = canonical URL of your object that will
be used as its permanent ID in the graph, e.g.,
"http://www.imdb.com/title/tt0117500/"


OGP: Explained
• “Like” button on each of your posts
• Open Graph Protocol (OGP) to mark up content

• prefix="og: http://ogp.me/ns#" specifies the OGP
vocabulary
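
The basic Open Graph metadata (og:title, og:type, og:image, og:url) is usually carried in meta elements with a property attribute in the page head. The sketch below assumes that form and reads the properties back from a made-up page; the image URL and the class name are illustrative.

```python
from html.parser import HTMLParser

BASIC_PROPERTIES = {"og:title", "og:type", "og:image", "og:url"}


class OpenGraphParser(HTMLParser):
    """Collects og:* properties from <meta property="..." content="..."> elements."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("property") or ""
        if name.startswith("og:") and "content" in attributes:
            self.properties[name] = attributes["content"]


# Hypothetical page describing the movie used as the example in the list above.
page_html = """
<html prefix="og: http://ogp.me/ns#">
<head>
  <title>The Rock (1996)</title>
  <meta property="og:title" content="The Rock" />
  <meta property="og:type" content="video.movie" />
  <meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
  <meta property="og:image" content="http://example.org/rock.jpg" />
</head>
<body></body>
</html>
"""

og = OpenGraphParser()
og.feed(page_html)
missing = BASIC_PROPERTIES - og.properties.keys()
print(og.properties)
print("missing basic properties:", sorted(missing))
```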

OGP Explained

1. import the Dublin Core & Open Graph
Protocol vocabularies using the
prefix attribute
2. associate the prefixes dc and og with the
URL of each vocabulary
3. use dc:creator and og:title,
which are short-hand for the full
vocabulary term URLs
http://purl.org/dc/terms/creator
and http://ogp.me/ns#title,
respectively
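
The three steps amount to a simple string expansion, sketched below. The function names are illustrative, and the Dublin Core namespace URL used here (http://purl.org/dc/terms/) is an assumption about which Dublin Core vocabulary is meant; the og namespace is the one given for OGP.

```python
def parse_prefixes(prefix_attribute):
    """Turn 'dc: http://... og: http://...' into a {prefix: namespace URL} mapping."""
    tokens = prefix_attribute.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating 'prefix:' and URL tokens")
    mapping = {}
    for name, url in zip(tokens[0::2], tokens[1::2]):
        if not name.endswith(":"):
            raise ValueError(f"prefix {name!r} must end with ':'")
        mapping[name[:-1]] = url
    return mapping


def expand(term, prefixes):
    """Expand a short-hand term such as 'og:title' into its full vocabulary URL."""
    prefix, _, local_name = term.partition(":")
    return prefixes[prefix] + local_name


# Value of a prefix attribute importing both vocabularies (dc URL is an assumption).
prefixes = parse_prefixes("dc: http://purl.org/dc/terms/ og: http://ogp.me/ns#")

print(expand("dc:creator", prefixes))
print(expand("og:title", prefixes))
```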


RDFa
• another syntax for RDF
• embedded in HTML, e.g. specify that a text is the name of a
product = “adding semantic markup”.
• initially specified only for XHTML
• RDFa 1.1 = specified for XHTML and HTML5 (and for any
XML-based language, e.g. SVG)
• RDFa Lite = “a small subset of RDFa consisting of a few attributes
that may be applied to most simple to moderate structured data
markup tasks.”
• Publish your data as Linked Data through RDFa --> link to other
URIs (others can link to your HTML+RDFa)


Why RDFa?
• data can be easily shared & reused (no need of maintaining the raw
structured data in a separate ﬁle in a separate format)
• RDFa processors can easily extract all the structured data from a
webpage
• search engines
• Yahoo was a pioneer in this area, starting with SearchMonkey
• Google started with Rich Snippets
• Recently, Google, Yahoo, Bing --> Schema.org
• recommendation for publishers on how to semantically
mark up their webpages
• Google Recipe = what can be done with structured data on the web


Microformats
• a set of simple, open data formats built upon
existing and widely adopted standards
• Designed for humans first and machines second
• Design principles for formats
• Highly correlated with semantic XHTML (aka
the real world semantics, lowercase semantic web,
lossless XHTML)
• “An evolutionary revolution”


Microformats


Your first microformat
• You can put a microformat on your website in less than 5 mins
• Example: putting an hCard (online business card) on your site
1. Find your name somewhere on your website:

Jamie Jones

2. Wrap your name in an fn (formatted name):

<span class="fn">Jamie Jones</span>

3. Wrap it all in a vcard (declares that everything inside is the hCard microformat):

<address class="vcard"><span class="fn">Jamie Jones</span></address>

The address element indicates that the person in the hCard is the contact for the page

<p class="vcard">My name is <span class="fn">Jamie Jones</span> and I dig microformats!</p>
http://microformats.org/get-started
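
On the consuming side, a tool looks for elements with class vcard and reads the fn inside them. The sketch below does this for well-formed markup only (every non-self-closing start tag has a matching end tag); it is not a complete hCard parser, and the class and variable names are illustrative.

```python
from html.parser import HTMLParser


class HCardNameParser(HTMLParser):
    """Extracts the text of class="fn" elements nested inside class="vcard" elements."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._vcard_depth = 0  # > 0 while inside a vcard element
        self._fn_depth = 0     # > 0 while inside an fn element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._vcard_depth or "vcard" in classes:
            self._vcard_depth += 1
        if self._vcard_depth and (self._fn_depth or "fn" in classes):
            self._fn_depth += 1

    def handle_endtag(self, tag):
        if self._fn_depth:
            self._fn_depth -= 1
            if self._fn_depth == 0:
                self.names.append("".join(self._buffer).strip())
                self._buffer = []
        if self._vcard_depth:
            self._vcard_depth -= 1

    def handle_data(self, data):
        if self._fn_depth:
            self._buffer.append(data)


page_html = """
<p>Written by <span class="fn">Someone Else</span> (no vcard, so ignored).</p>
<address class="vcard"><span class="fn">Jamie Jones</span></address>
"""

hcard = HCardNameParser()
hcard.feed(page_html)
print(hcard.names)
```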

Further microformats

• Add more information to your hCard
• Link to your friends and contacts with XFN
• Add events to your site with hCalendar
• Review movies, books, and more with hReview

http://microformats.org/get-started

HTML Microdata
• HTML Microdata allows machine-readable
data to be embedded in HTML documents in an
easy-to-write manner, with an unambiguous
parsing model
• It is compatible with numerous other data
formats including RDF and JSON
• Microdata DOM API
• http://www.w3.org/TR/microdata/

Microdata Syntax

• Microdata consists of a group of name-value pairs.
The groups are called items, and each name-value
pair is a property

• itemscope is used to create an item
• itemprop is used to add a property to an item
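
The item / property model can be made concrete with a small extractor. The sketch below handles only flat, top-level items in well-formed markup (no nested items, no itemref), and takes a property's value from href, src, datetime or content where the element has one, otherwise from its text. The sample markup, including the property names, is made up.

```python
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Elements whose property value comes from an attribute instead of the text.
VALUE_ATTRIBUTE = {"a": "href", "link": "href", "img": "src",
                   "time": "datetime", "meta": "content"}


class MicrodataParser(HTMLParser):
    """Collects each itemscope element's itemprop name-value pairs into a dict."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._item_depth = 0        # > 0 while inside an item
        self._text_property = None  # property whose text is being collected
        self._text_depth = 0
        self._text = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        is_void = tag in VOID_TAGS
        if self._item_depth == 0:
            if "itemscope" in attributes:  # itemscope creates a new item
                self.items.append({})
                self._item_depth = 1
            return
        if not is_void:
            self._item_depth += 1
        if self._text_property is not None:
            if not is_void:
                self._text_depth += 1
            return
        name = attributes.get("itemprop")  # itemprop adds a property to the item
        if not name:
            return
        if tag in VALUE_ATTRIBUTE:
            self.items[-1][name] = attributes.get(VALUE_ATTRIBUTE[tag]) or ""
        elif not is_void:
            self._text_property, self._text_depth, self._text = name, 1, []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._item_depth == 0:
            return
        if self._text_property is not None:
            self._text_depth -= 1
            if self._text_depth == 0:
                self.items[-1][self._text_property] = "".join(self._text).strip()
                self._text_property = None
        self._item_depth -= 1

    def handle_data(self, data):
        if self._text_property is not None:
            self._text.append(data)


sample_html = """
<div itemscope>
  <p>My name is <span itemprop="name">Neil</span>.</p>
  <p>My homepage is <a itemprop="url" href="http://example.org/neil">here</a>.</p>
  <p>Last updated <time itemprop="updated" datetime="2012-02-27">today</time>.</p>
</div>
"""

microdata = MicrodataParser()
microdata.feed(sample_html)
print(microdata.items)
```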


Microdata Example
3 properties

URL

Time

top-level

Question?
We have seen many approaches to 'organizing' embedded
semantics, e.g. RDFa, Microformats, schema.org.

All these are driven by different parties and motives. How do
you think this is best organized?


Question?
For which things on the social web would more vocabularies
for embedded semantics be needed (besides what we have
already seen)?


Hands-on Teaser
• mining data in various social web
formats
• see the differences in what each of the
formats can contain & what purpose
they serve
• start: simple search where we pull in
some XFN data and visualise a graph of
people that we ﬁnd on a website
• check: software you will be working
with on the website
image source: http://www.flickr.com/photos/bionicteaching/1375254387/

