A presentation for a group of PhD students from the Leibniz Institutes (section B, social sciences) to discuss how they could use the Web, and, even better, the Web of Data, as an instrument in their research.
These slides accompanied the first part of the workshop that Vinayak Das Gupta and I gave at the Data Visualization for the Arts and Humanities event, held at Queen's University, Belfast on 5-6 March 2015. The workshop, entitled 'Data-mining the Semantic Web and spatially visualising the results', introduced the participants to the concepts and technologies of Open Data, the Semantic Web, RDF, SPARQL, GeoJSON and Leaflet.js. These slides cover the data-mining of online cultural heritage resources.
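By way of illustration, here is a minimal Python sketch of the pipeline the workshop describes: query a public SPARQL endpoint for cultural heritage resources with coordinates and emit GeoJSON that a Leaflet.js map could render. The endpoint, class, and properties (DBpedia, dbo:Museum, geo:lat/geo:long) are illustrative choices, not necessarily the workshop's own examples.

```python
import json
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbo:  <http://dbpedia.org/ontology/>
    PREFIX geo:  <http://www.w3.org/2003/01/geo/wgs84_pos#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?museum ?name ?lat ?long WHERE {
        ?museum a dbo:Museum ;
                rdfs:label ?name ;
                geo:lat ?lat ;
                geo:long ?long .
        FILTER (lang(?name) = "en")
    } LIMIT 25
""")
sparql.setReturnFormat(JSON)
bindings = sparql.query().convert()["results"]["bindings"]

# GeoJSON wants [longitude, latitude]; Leaflet.js can render this directly.
features = [{
    "type": "Feature",
    "geometry": {"type": "Point",
                 "coordinates": [float(b["long"]["value"]),
                                 float(b["lat"]["value"])]},
    "properties": {"name": b["name"]["value"], "uri": b["museum"]["value"]},
} for b in bindings]

print(json.dumps({"type": "FeatureCollection", "features": features}, indent=2))
```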
What Are Links in Linked Open Data? A Characterization and Evaluation of Link... by Armin Haller
Linked Open Data promises guiding principles for publishing interlinked knowledge graphs on the Web as findable, accessible, interoperable, and reusable datasets. In this talk I argue that while Linked Data may thus be viewed as a basis for instantiating the FAIR principles, a number of open issues still cause significant data quality problems even when knowledge graphs are published as Linked Data. I will first define the boundaries of what constitutes a single coherent knowledge graph within Linked Data, i.e., present a principled notion of what a dataset is and what links within and between datasets are. I will then define different link types for data in Linked datasets and present the results of our empirical analysis of linkage among the datasets of the Linked Open Data cloud. Recent results from our analysis of Wikidata, which has not been part of the Linked Open Data Cloud, will also be presented.
Exposing Humanities Data for Reuse and Linking - RED, linked data and the sem... by Mathieu d'Aquin
Presented at the workshop of the "Reading Experience Database" (RED) project - London - 25/02/2011.
Discussion on how linked data can benefit research in humanities, using RED and data.open.ac.uk as early examples.
The document describes the SFX framework for context-sensitive reference linking, which redirects a user who follows a citation to an appropriate full text or service for their context. The framework uses the OpenURL standard to pass citation metadata from a link source to a parsing server, which then sends the metadata to a linking server; the linking server determines the most relevant services and creates dynamic links to them based on the user's access rights and the available library collections and resources. The goal is to provide context-sensitive services driven by the user's context and the cited item's metadata rather than relying on pre-computed static links.
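A minimal sketch of the linking step described above, assuming a hypothetical institutional resolver URL: the link source serializes the citation as OpenURL key/value pairs (here in the Z39.88-2004 KEV journal format) and leaves the choice of target service to the resolver. The metadata values are illustrative.

```python
from urllib.parse import urlencode

# Hypothetical institutional resolver; real deployments each have their own.
RESOLVER = "https://resolver.example.edu/sfx"

# Citation metadata in OpenURL 1.0 (Z39.88-2004) KEV journal format.
citation = {
    "url_ver": "Z39.88-2004",
    "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
    "rft.atitle": "Context-sensitive reference linking",
    "rft.jtitle": "D-Lib Magazine",
    "rft.date": "2001",
    "rft.volume": "7",
    "rft.spage": "1",
    "rft.aulast": "Van de Sompel",
}

# The source only describes the citation; the resolver decides, per user
# context and collection holdings, which full-text copy or service to offer.
print(f"{RESOLVER}?{urlencode(citation)}")
```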
Research Data Sharing: A Basic Framework by Paul Groth
Some thoughts on thinking about data sharing. Prepared for the 2016 LERU Doctoral Summer School - Data Stewardship for Scientific Discovery and Innovation.
http://www.dtls.nl/fair-data/fair-data-training/leru-summer-school/
The Web of Data: do we actually understand what we built? by Frank van Harmelen
Despite its obvious success (largest knowledge base ever built, used in practice by companies and governments alike), we actually understand very little of the structure of the Web of Data. Its formal meaning is specified in logic, but with its scale, context dependency and dynamics, the Web of Data has outgrown its traditional model-theoretic semantics.
Is the meaning of a logical statement (an edge in the graph) dependent on the cluster ("context") in which it appears? Does a more densely connected concept (node) contain more information? Is the path length between two nodes related to their semantic distance?
Properties such as clustering, connectivity and path length are not described, much less explained by model-theoretic semantics. Do such properties contribute to the meaning of a knowledge graph?
To properly understand the structure and meaning of knowledge graphs, we should no longer treat knowledge graphs as (only) a set of logical statements, but treat them properly as a graph. But how to do this is far from clear.
In this talk, I report on some of our early results on some of these questions, but I ask many more questions for which we don't have answers yet.
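To make the graph-theoretic reading concrete, here is a small sketch that loads RDF triples into an ordinary undirected graph and computes the structural properties mentioned above. The Turtle data is a toy example, not one of the talk's datasets.

```python
import networkx as nx
from rdflib import Graph

# Toy RDF data; real experiments would load a sizeable knowledge graph.
turtle = """
@prefix ex: <http://example.org/> .
ex:a ex:knows ex:b . ex:b ex:knows ex:c .
ex:c ex:knows ex:a . ex:c ex:knows ex:d .
"""
rdf = Graph().parse(data=turtle, format="turtle")

# Forget the logic for a moment and treat the triples as plain edges.
g = nx.Graph()
for s, p, o in rdf:
    g.add_edge(str(s), str(o), predicate=str(p))

print("average clustering:", nx.average_clustering(g))
print("connected:", nx.is_connected(g))
print("path length a->d:", nx.shortest_path_length(
    g, "http://example.org/a", "http://example.org/d"))
```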
Crowdsourcing the Quality of Knowledge Graphs: A DBpedia Study by Maribel Acosta Deibe
Summary of crowdsourcing studies to assess the quality of knowledge graphs and complete missing values. Results focus on findings over the DBpedia knowledge graph (https://wiki.dbpedia.org/); a minimal sketch of the microtask idea follows the publication list below.
Related publications:
Acosta, M., Zaveri, A., Simperl, E., Kontokostas, D., Auer, S., & Lehmann, J. Crowdsourcing Linked Data Quality Assessment. In International Semantic Web Conference (pp. 260-276), 2013.
Acosta, M., Zaveri, A., Simperl, E., Kontokostas, D., Flöck, F., & Lehmann, J. Detecting Linked Data Quality issues via Crowdsourcing: A DBpedia Study. Semantic Web Journal, 9(3), 303-335, 2018.
Acosta, M., Simperl, E., Flöck, F., & Vidal, M. E. HARE: A hybrid SPARQL engine to enhance query answers via crowdsourcing. In Proceedings of the 8th International Conference on Knowledge Capture (p. 11). 2015. Best Student Paper Award.
Acosta, M., Simperl, E., Flöck, F., & Vidal, M. E. Enhancing answer completeness of SPARQL queries via crowdsourcing. Journal of Web Semantics, 45, 41-62, 2017.
Acosta, M., Simperl, E., Flöck, F., & Vidal, M. E. HARE: An engine for enhancing answer completeness of SPARQL queries via crowdsourcing. Companion Volume of the Web Conference (pp. 501-505). 2018.
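The sketch promised above, assuming the simplest possible workflow in which each DBpedia triple becomes a yes/no verification question; the studies themselves use dedicated crowdsourcing platforms and richer task designs.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    SELECT ?p ?o WHERE { <http://dbpedia.org/resource/Berlin> ?p ?o } LIMIT 5
""")
sparql.setReturnFormat(JSON)

# One yes/no verification microtask per triple.
for b in sparql.query().convert()["results"]["bindings"]:
    print(f"Is it correct that Berlin --{b['p']['value']}--> "
          f"{b['o']['value']}? [yes / no / cannot tell]")
```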
The document discusses using the Semantic Web as a knowledge base for artificial intelligence applications. It describes how the Semantic Web publishes data on the web in a standardized, linked format. This vast amount of distributed knowledge could be mined by AI in various ways, such as linking data mining to find patterns, using reasoning to analyze and understand raw data, and assessing agreement between ontologies. The Semantic Web represents a large, collaborative base of formally represented knowledge that provides many opportunities for future AI research and applications.
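One concrete flavour of the "reasoning over published data" idea, as a hedged sketch: even without a full reasoner, SPARQL 1.1 property paths let an application derive facts that are only implicit in the raw triples. The tiny ontology below is invented for illustration.

```python
from rdflib import Graph

# Invented toy ontology with an implicit fact: the Louvre is an Organisation.
g = Graph().parse(data="""
@prefix ex:   <http://example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
ex:Museum rdfs:subClassOf ex:CulturalInstitution .
ex:CulturalInstitution rdfs:subClassOf ex:Organisation .
ex:Louvre a ex:Museum .
""", format="turtle")

# The property path rdfs:subClassOf* walks the class hierarchy at query time.
q = """
PREFIX ex:   <http://example.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?x WHERE { ?x a/rdfs:subClassOf* ex:Organisation . }
"""
for row in g.query(q):
    print(row.x)  # ex:Louvre, although no triple states it directly
```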
Analysing & Improving Learning Resources Markup on the Web by Stefan Dietze
Talk at WWW2017 on LRMI adoption, quality and usage. Full paper here: http://papers.www2017.com.au.s3-website-ap-southeast-2.amazonaws.com/companion/p283.pdf.
This document summarizes recent approaches to web data management including Fusion Tables, XML, and Linked Open Data (LOD). It discusses properties of web data like lack of schema, volatility, and scale. LOD uses RDF, global identifiers (URIs), and data links to query and integrate data from multiple sources while maintaining source autonomy. The LOD cloud has grown rapidly, currently consisting of over 3000 datasets with more than 84 billion triples.
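The "global identifiers plus data links" mechanism in miniature: dereferencing an RDF URI yields more triples about the resource, including owl:sameAs links into other datasets. This sketch assumes network access and relies on the endpoint's content negotiation.

```python
from rdflib import Graph, URIRef

OWL_SAMEAS = URIRef("http://www.w3.org/2002/07/owl#sameAs")
berlin = URIRef("http://dbpedia.org/resource/Berlin")

# Dereference the identifier: rdflib negotiates an RDF representation.
g = Graph()
g.parse(berlin)

# owl:sameAs links lead to descriptions of the same city in other datasets.
for _, _, other in g.triples((berlin, OWL_SAMEAS, None)):
    print(other)
```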
Experience from 10 months of University Linked Data by Mathieu d'Aquin
Experience from 10 months of University Linked Data at the Open University:
1. The Open University exposed its public data as linked open data to make the data more discoverable, reusable, and integrated with other datasets.
2. Exposing data as linked data provides benefits like increased transparency, data reuse internally and externally, and reduced costs of managing the university's public data.
3. Other UK universities have since followed the Open University's example in exposing their data as linked data.
The document discusses using linked open data and linked data principles for libraries. It covers key concepts like URIs, RDF triples, ontologies and vocabularies. It then outlines options for libraries to both consume and publish linked data, such as enriching existing catalog data by linking to external sources, creating new information aggregates, and publishing library holdings and metadata as linked open data. Challenges include a lack of common identifiers, FRBRization of existing data, and the need for content curation and new technical systems to fully realize the benefits of linked open data for libraries.
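A minimal sketch of the "publish catalogue data as linked open data" option: one holding modelled with Dublin Core terms and linked to an external authority record instead of a free-text author string. All URIs and the identifier are illustrative.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

EX = Namespace("http://library.example.org/item/")  # illustrative base URI

g = Graph()
g.bind("dcterms", DCTERMS)

book = EX["b1024"]
g.add((book, RDF.type, DCTERMS.BibliographicResource))
g.add((book, DCTERMS.title, Literal("Ulysses")))
# Link to an external authority record rather than a local free-text string
# (the VIAF-style identifier below is illustrative, not looked up).
g.add((book, DCTERMS.creator, URIRef("http://viaf.org/viaf/0000000000")))

print(g.serialize(format="turtle"))
```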
Presentation about - Semantic Web - Overview - by Semantic Web
Web of Data, Giant Global Graph, Data Web, Web 3.0, Linked Data Web, Semantic Data Web, Enterprise Information Web, HTML, CSS
LibraryThing is a social networking site and cataloging tool for readers that has recently implemented work-to-work relationships based on FRBR (Functional Requirements for Bibliographic Records). This allows users to define relationships between works such as "contains" or "parodies". LibraryThing displays these relationships through work pages and relationship manipulation tools. While most integrated library systems have not fully implemented FRBR, LibraryThing's work-to-work implementation is currently the most comprehensive and may inspire further library adoption of FRBR standards.
The slides discuss the research agenda for search of the semantic web and currently available search tools. The slides were prepared for an audience of information…
This document discusses the growth of RNA sequencing from 2008-2013, with sample sizes increasing from around 2 to 900. It also notes the lack of statisticians represented in big data initiatives, with few statisticians among speakers at several conferences and workshops. Finally, it promotes the author's teaching blog and monthly online statistics courses aimed at teaching data analysis skills.
Exploration, visualization and querying of linked open data sources by Laura Po
Afternoon hands-on session talk at the second Keystone Training School, "Keyword search in Big Linked Data", held in Santiago de Compostela.
https://eventos.citius.usc.es/keystone.school/
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval
The presentation provides an overview of what an ontology is and how it can be used for representing information and retrieving data, with a particular focus on the linguistic resources available to support this kind of task. It also surveys semantic-based retrieval approaches, highlighting the pros and cons of semantic approaches with respect to classic ones. Use cases are presented and discussed.
This document summarizes the development of Coursera courses at Johns Hopkins University from 2012-2014. It notes key events including the initial announcement of a partnership with Coursera in July 2012, the first courses being run by Brian Caffo and Roger Peng starting in September 2012, and Jeff Leek running his "Data Analysis" course starting in January 2013. It discusses scaling of enrollment, approaches to building out a data science specialization, and financial details. The document reflects on reasons for their early success in MOOCs including speed, leveraging existing infrastructure, and attracting an orthogonal student population.
The document introduces the principles of Linked Data, which aims to share data rather than documents on the web. It describes the four rules of Linked Data and provides examples of existing Linked Data datasets as well as tools for publishing and using Linked Data. The document also discusses extending Linked Data to include geospatial and sensor data by linking web resources, structured geospatial databases, and unstructured geographic information.
The document discusses semantic web mediation, which involves two main steps: 1) providing semantic access to data through the use of ontologies, and 2) the mediation process. It describes applying these concepts to the Personae project, which uses a mediator to provide unified access and querying of distributed semantic web data sources described by different local ontologies. The mediator aligns the local ontologies to a global reference ontology to facilitate query answering across sources.
Web Science Synergies: Exploring Web Knowledge through the Semantic Web by Stefan Dietze
The document discusses exploring web data and knowledge through the semantic web. It describes how the semantic web adds meaning to data through shared vocabularies and schemas. It also discusses challenges with the large number and diversity of linked open datasets, including issues with accessibility, heterogeneity of schemas, and data quality. It proposes approaches to address these challenges, such as dataset profiling, metadata catalogs, and infrastructure for federated querying.
Beyond Linked Data - Exploiting Entity-Centric Knowledge on the Web by Stefan Dietze
This document discusses enabling discovery and search of linked data and knowledge graphs. It presents approaches for dataset recommendation including using vocabulary overlap and existing links between datasets. It also discusses profiling datasets to create topic profiles using entity extraction and ranking techniques. These recommendation and profiling approaches aim to help with discovering relevant datasets and entities for a given topic or task.
This document discusses ethics considerations for conducting social research in virtual worlds. It outlines some potential uses of virtual worlds for research, including as a tool for coordination, observing behavior, and studying community formation. However, it also notes challenges like participants not being accustomed to formal research and difficulties with identity verification and informed consent. The document presents a case study of a job interview study in Second Life and discusses ethics issues that could arise, like protecting social groups. It proposes a "Virtual World Subject's Bill of Rights" to help ensure subjects understand the research, risks/benefits, their rights to participate as their avatar and withdraw from studies.
The document summarizes key events surrounding the drafting of the US Constitution, including the economic difficulties following the Revolutionary War, Shays' Rebellion, and weaknesses of the Articles of Confederation. It describes the Constitutional Convention in Philadelphia in 1787, with delegates including George Washington, Benjamin Franklin, James Madison, and Alexander Hamilton. The Virginia and New Jersey Plans were debated, with compromises including the Great Compromise and the Three-Fifths Compromise.
Reconciling Humanities and Social Science Research With Data Protection by David Erdos
Humanities and social science research contribute enormously to collective public knowledge and discussion. Such activity will almost invariably involve the processing of personal information and will, therefore, trigger the application of EU data protection law, including the forthcoming General Data Protection Regulation (GDPR). This presentation argues that the GDPR's default provisions – especially as regards the presumption of consent for sensitive data, data subject notification rules and strict discipline provisions – pose an acute threat to such activity. Moreover, whilst the research derogations (Art. 89) ameliorate a few of the issues, they are principally designed for work based on a highly structured, predetermined and largely fiduciary model such as is common in bio-medicine. As recognised by a wide variety of research organisations during debate on the GDPR (including the Wellcome Trust and the UK Economic and Social Research Council), given that social/humanities scholarship is intrinsically linked to public knowledge and discussion, it should in fact benefit not just from these research derogations but also from the more permissive (but not absolute) derogations for free speech. The GDPR now recognises this by granting free speech protection for "academic expression" alongside that of journalism, literature and art (Art. 85(2)). (N.B. These slides are based on a talk given at the University of Hong Kong, "Positioning Privacy and Transparency in Data-intensive Research and Data-driven Regulation", on 8 November 2016.)
Research is defined as a systematic investigation designed to develop or contribute to generalizable knowledge. It involves carefully defining problems, formulating hypotheses, collecting and organizing data, making deductions, reaching conclusions, and testing conclusions. The main objectives of research are to gain familiarity with phenomena, accurately portray characteristics, determine frequencies of occurrences, and test hypotheses of causal relationships between variables. In conclusion, research is a systematic and logical process that follows specified steps in a specified sequence according to a set of rules.
The document provides an overview of the Indian legal system, including its history and sources of law. Some key points:
- Indian law is largely based on English common law and retains many Acts introduced during British rule.
- The primary sources of law are enactments passed by Parliament and state legislatures. Secondary sources include Supreme Court and High Court judgments.
- The Indian Constitution establishes a democratic republic and guarantees fundamental rights and duties. It contains 395 articles and is the world's longest written constitution.
- The legal system includes criminal and civil codes. It has a three-tiered structure of Supreme Court, High Courts, and subordinate courts. Recent trends focus on alternative dispute resolution and improving judicial efficiency.
The document summarizes the hierarchy of courts in India. At the national level is the Supreme Court, which is the highest court of appeal. At the state level are the High Courts, which have appellate and original jurisdiction over subordinate courts. Subordinate courts exist at the district and lower levels, and include civil courts like district courts, and criminal courts like sessions courts and magistrate courts. The document outlines the jurisdiction and sentencing powers of the different courts in India's judicial system. It also discusses the separation of judicial and executive powers between different types of magistrates.
The document discusses various types of research including applied research, basic research, correlational research, descriptive research, ethnographic research, experimental research, and exploratory research. Applied research seeks practical solutions to problems, while basic research expands knowledge without a direct application. Correlational research examines relationships between variables without determining cause and effect. Descriptive research provides accurate portrayals of characteristics, and ethnographic research involves in-depth study of cultures. Experimental research establishes cause-and-effect through controlled manipulation of variables.
Connections that work: Linked Open Data demystified by Jakob .
Keynote given 2014-10-22 at the National Library of Finland at Kirjastoverkkopäivät 2014 (https://www.kiwi.fi/pages/viewpage.action?pageId=16767828) #kivepa2014
The document provides an overview of the work done at DERI Galway, including developing technologies like SIOC, ActiveRDF, and BrowseRDF to interconnect online communities and enable semantic applications. It also describes JeromeDL, a digital library system that uses semantic metadata and services to allow users to collaboratively browse and share knowledge.
EgoSystem: Presentation to LITA, American Library Association, Nov 8 2014 by James Powell
The Internet represents the connections among computers and devices, the World Wide Web is a network of interconnected documents, and the Semantic Web is the closest thing we have today to a network of interconnected facts. Noticeably absent from these global networks is any sort of open, formal representation of an online global social network. Each user's online presence, and its immediate social network, is isolated and typically only available within the confines of the social networking site that hosts it. Discovery across explicit online social networks and implicit social networks, such as those that can be inferred from co-authorship relationships and affiliations, is, for all practical purposes, impossible. And yet there are practical and non-nefarious reasons why an organization might be interested in exploring portions of such a network. Outreach is one such interest. Los Alamos National Laboratory (LANL) prototyped EgoSystem to harvest and explore the professional social networks of postdoctoral students. The project's goal is to enlist past students and other Lab alumni as ambassadors and advocates for LANL's ongoing mission. During this talk we will discuss the various technologies that support EgoSystem and demonstrate some of its capabilities.
Talk at the Semantic Technology Conference, 23 June 2010, San Francisco.
The LOD cloud has potential applicability in many AI-related tasks, such as open-domain question answering, knowledge discovery, and the Semantic Web. An important prerequisite before the LOD cloud can enable these goals is allowing its users (and applications) to effectively pose queries to and retrieve answers from it. However, this prerequisite is still an open problem for the LOD cloud and has restricted it to "merely more data." To transform the LOD cloud from "merely more data" to "semantically linked data" there are plenty of open issues which should be addressed. We believe this transformation of the LOD cloud can be performed by addressing the shortcomings we identified: lack of conceptual description of datasets, lack of expressivity, and difficulties with respect to querying.
This document discusses how adding formal semantics to linked open data can make it more useful and powerful. It describes how existing linked data lacks formal semantics, limiting its capabilities. The document proposes two approaches: 1) Enriching linked data schemas using ontology matching techniques to capture relationships between datasets. 2) Developing a system called LOQUS that can perform federated queries across multiple linked datasets by decomposing queries and merging results. This would allow queries without needing intimate knowledge of each dataset's structure.
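LOQUS itself is not publicly available, but SPARQL 1.1 federation conveys what "decomposing a query across datasets and merging results" means: the SERVICE keyword ships a sub-pattern to a remote endpoint. The query below is illustrative, and whether a given public endpoint permits outgoing SERVICE calls varies.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# The SERVICE block is evaluated at Wikidata; the owl:sameAs links published
# by DBpedia supply the join keys between the two datasets.
query = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?city ?population WHERE {
    ?city a dbo:City ;
          dbo:country dbr:Ireland ;
          owl:sameAs ?wd .
    FILTER (STRSTARTS(STR(?wd), "http://www.wikidata.org/entity/"))
    SERVICE <https://query.wikidata.org/sparql> {
        ?wd wdt:P1082 ?population .   # population, fetched remotely
    }
} LIMIT 10
"""

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery(query)
sparql.setReturnFormat(JSON)
for b in sparql.query().convert()["results"]["bindings"]:
    print(b["city"]["value"], b["population"]["value"])
```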
All Things Open 2014 - Day 1
Wednesday, October 22nd, 2014
Arfon Smith
Chief Scientist for GitHub
Open Government/Open Data
What Academia Can Learn from Open Source
Find more by Arfon here: https://speakerdeck.com/arfon
Bridging formal semantics and social semantics on the web by Fabien Gandon
The document summarizes research on bridging formal semantics and social semantics on the web. It discusses:
1) The Wimmics research team which studies web-instrumented machine interactions, communities, and semantics using a multidisciplinary approach and typed graphs.
2) The challenge of analyzing, modeling, and formalizing social semantic web applications for communities by combining formal semantics and social semantics.
3) Examples of past work that have structured folksonomies, combined metric spaces for tags, and analyzed sociograms and social networks.
Talk at the 3rd Keystone Training School - Keyword Search in Big Linked Data - Institute for Software Technology and Interactive Systems, TU Wien, Austria, 2017.
In search of lost knowledge: joining the dots with Linked Data by jonblower
These slides are from my seminar to the University of Reading Department of Meteorology, November 2013. They contain a (hopefully not very technical) introduction to the concepts of Linked Data and how we are applying them in the CHARMe project (http://www.charme.org.uk). In CHARMe we are using Open Annotation to connect users of climate data with community-generated "commentary information" that helps them to understand a dataset's strengths and weaknesses.
The slide notes contain some helpful context, so you might like to download the PPT file!
The slides are licensed as "Creative Commons Attribution 3.0", meaning that you can do what you like with these slides provided that you credit the University of Reading for their creation. See http://creativecommons.org/licenses/by/3.0/.
Information Extraction and Linked Data Cloud by Dhaval Thakker
The document discusses Press Association's semantic technology project which aims to generate a knowledge base using information extraction and the Linked Data Cloud. It outlines Press Association's operations and workflow, and how semantic technologies can be used to develop taxonomies, annotate images, and extract entities from captions into an ontology-based knowledge base. The knowledge base can then be populated and interlinked with external datasets from the Linked Data Cloud like DBpedia to provide a comprehensive, semantically-structured source of information.
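Entity extraction in the spirit described above, sketched against the public DBpedia Spotlight API (the Press Association pipeline itself is proprietary and not shown): each recognised phrase in a caption comes back linked to a DBpedia URI.

```python
import requests

caption = "David Beckham at Wembley Stadium in London"

resp = requests.get(
    "https://api.dbpedia-spotlight.org/en/annotate",
    params={"text": caption, "confidence": 0.5},
    headers={"Accept": "application/json"},
    timeout=30,
)
resp.raise_for_status()

# Each recognised surface form is linked to a DBpedia entity URI.
for res in resp.json().get("Resources", []):
    print(res["@surfaceForm"], "->", res["@URI"])
```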
Neno/Fhat: Semantic Network Programming Language and Virtual Machine Specific... by Marko Rodriguez
• The Semantic Web is a distributed, flexible modeling framework.
• The Semantic Web is primarily descriptive in nature. The Semantic Web is used to describe web-pages, services, systems, etc.
• Neno is an object-oriented language that was designed specifically for the Semantic Web.
• Fhat is a virtual machine represented in the Semantic Web.
• With Neno/Fhat the Semantic Web now has a procedural component. The Semantic Web now includes object methods, algorithms, and computing machines.
• The Semantic Web can be made to behave like a distributed, general-purpose computer. Not just an information repository.
eScience: A Transformed Scientific Method by Duncan Hull
The document discusses the concept of eScience, which involves synthesizing information technology and science. It explains how science is becoming more data-driven and computational, requiring new tools to manage large amounts of data. It recommends that organizations foster the development of tools to help with data capture, analysis, publication, and access across various scientific disciplines.
Eureka Research Workbench: A Semantic Approach to an Open Source Electroni... by Stuart Chalk
Scientists are looking for ways to leverage web 2.0 technologies in the research laboratory and as a consequence a number of approaches to web-based electronic notebooks are being evaluated. In this presentation I discuss the Eureka Research Workbench, an electronic laboratory notebook built on semantic technology and XML. Using this approach the context of the information recorded in the laboratory can be captured and searched along with the data itself. A discussion of the current system is presented along with the next planned development of the framework and long-term plans relative to linked open data. Presented at the 246th American Chemical Society Meeting in Indianapolis, IN, USA on September 12th, 2013.
The document discusses electronic laboratory notebooks and blogs as a way to record scientific experiments and share data. It proposes using blogs to document experiments in a more collaborative way, while also capturing metadata and linking data to provide context. Challenges addressed include capturing the full context around experiments, facilitating collaboration and discussion, and improving access to data over time.
The document discusses the emergence of the semantic web, which aims to make data on the web more interconnected and machine-readable. It describes Tim Berners-Lee's vision of a "Giant Global Graph" that connects all web documents based on what they are about rather than just linking documents. This would allow user data and profiles to be seamlessly shared across different sites without having to re-enter the same information. The semantic web uses standards like RDF, RDFS and OWL to represent relationships between data in a graph structure and enable automated reasoning. Several companies are working to build applications that take advantage of this interconnected semantic data.
The document discusses the concepts of semantic technology and the semantic web. It defines key concepts like tabula rasa, the network effect, and intelligence embedded in data through relationships. It also outlines technologies used in the semantic web like RDF, OWL, SPARQL, FOAF, and DBpedia and how search engines and companies are using these technologies for applications like sentiment analysis, natural language processing, and information extraction.
A Practical Ontology for the Large-Scale Modeling of Scholarly Artifacts and ... by Marko Rodriguez
The large-scale analysis of scholarly artifact usage is constrained primarily by current practices in usage data archiving, privacy issues concerned with the dissemination of usage data, and the lack of a practical ontology for modeling the usage domain. As a remedy to the third constraint, this article presents a scholarly ontology that was engineered to represent those classes for which large-scale bibliographic and usage data exists, supports usage research, and whose instantiation is scalable to the order of 50 million articles along with their associated artifacts (e.g. authors and journals) and an accompanying 1 billion usage events. The real world instantiation of the presented abstract ontology is a semantic network model of the scholarly community which lends the scholarly process to statistical analysis and computational support. We present the ontology, discuss its instantiation, and provide some example inference rules for calculating various scholarly artifact metrics.
This document discusses visualizing activity data through various visualization types and tools. It provides examples of different types of activity data that can be visualized, including library usage data and virtual learning environment usage data. The document discusses visualization types like treemaps, cycle plots, and network graphs that may be suitable for different types of time-series or dimensional activity data. It also discusses tools for visualization like R, Gephi, and Graphviz and how data format and structure influence visualization choices. The overall goal is to help choose effective visualizations to discover stories or insights from activity data.
Similar to "How the Web can change social science research (including yours)":
Neuro-symbolic is not enough, we need neuro-*semantic* by Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply doing machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. Those gains come only when the symbolic structures have an actual semantics. I give an operational definition of semantics as "predictable inference".
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
I claim that none of the commonly used embedding methods capture any semantics.
It's fine if you want to move from a symbolic to a numeric or geometric representation, but when you do, don't throw the semantic baby out with the symbolic bathwater.
I argue that a useful definition of semantics is "predictable inference". This makes it possible to have semantics outside a logical framework.
A methodological warning from 1976: don't fool yourself that wishful mnemonics in your knowledge graph are "semantics". A knowledge graph without a schema/ontology is therefore just a data graph, without much semantics.
Finally, a discussion of some embedding methods that do manage to take semantics into account (TransOWL, ball embeddings like ELEm and EmEL++, and box embeddings like BoxEL and Box^2EL).
So: even if you do move to a non-symbolic representation (numerical, geometric), make sure you keep the semantics: don't throw the semantic baby out with the symbolic bathwater.
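Since link prediction is the talk's running example, here is a minimal TransE-style scorer (Bordes et al., 2013), representative of the embedding methods the talk argues capture little semantics; random vectors stand in for trained embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50

# In a trained model these vectors come from optimisation; random here.
entities = {e: rng.normal(size=dim) for e in ("paris", "france", "berlin")}
relations = {"capital_of": rng.normal(size=dim)}

def score(h, r, t):
    """TransE: a triple (h, r, t) is plausible when h + r lies close to t."""
    return -np.linalg.norm(entities[h] + relations[r] - entities[t])

# Rank candidate tails for the query (paris, capital_of, ?).
for tail in ("france", "berlin"):
    print(tail, score("paris", "capital_of", tail))
```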
Logos of companies for which I have verified evidence that they use knowledge graphs and ontologies in production. This list is (of course) incomplete.
Modular design patterns for systems that learn and reason: a boxology by Frank van Harmelen
A set of modular design patterns that can describe a large number of neuro-symbolic architectures from the literature. Corresponding paper is at https://arxiv.org/abs/2102.11965
This document discusses empirical semantics and insights that can be gained from observing large knowledge bases. It notes that formal semantics often does not accurately model real-world knowledge and proposes some challenges for developing alternative semantic models. Specifically, it suggests empirical study has shown that identity, meaningful names, and different predicate patterns are not adequately captured. The goal is to develop descriptive theories of knowledge based on observation rather than prescriptive theories.
We now have larger Knowledge Bases than ever before. (10 billion facts is now a small number).
We now have the instruments to observe and analyse these very large Knowledge Bases.
We can use these insights for better tools for querying, inferencing, publishing, maintaining, visualising and explaining.
The end of the scientific paper as we know it (or not...) by Frank van Harmelen
Two talks in one: the first talk expanding on the great promises of nanopublications, the second talk pointing out why much of that is too difficult (and some of it wrong).
On the nature of AI, and the relation between symbolic and statistical approa... by Frank van Harmelen
The document discusses the differences between symbolic and statistical approaches to artificial intelligence (AI). It notes that while modern AI is dominated by machine learning, the two approaches have different strengths and weaknesses. Symbolic AI is better for reasoning, planning, and explanation, while statistical AI excels at pattern recognition, motor skills, and tasks using large datasets. The semantic web allowed symbolic knowledge representation to scale up significantly using web technologies, but introduced challenges that require machine learning techniques to address.
The end of the scientific paper as we know it (in 4 easy steps) by Frank van Harmelen
Scientific publishing hasn't really changed in over 300 years. By changing papers from a single narrative text (readable by people only) into a rich network of snippets of knowledge ("nano-publications") we would allow computers to become our colleagues instead of just our tools
An increasing number of patients suffer from multiple diseases at the same time. This makes their treatment much more complex, and the standard medical treatment guidelines no longer apply (they are typically written for patients with just a single disease). We present computer-based techniques for analysing medical guidelines to detect how multiple guidelines may interact in unexpected ways, and how Linked Open Data can be used to recognise and avoid such adverse effects.
Talk given at the SSSW 2013 Semantic Web Summerschool.
Part 1: What is "Semantic Web" (in 4 principles and 1 movie)
Part 2: What question can we ask now that we couldn't ask 10 years ago
Part 3: Treat Computer Science as a *science*, not just as engineering!
(this part is a short version of http://slidesha.re/SaUhS4)
Knowledge Engineering rediscovered, Towards Reasoning Patterns for the Semant... by Frank van Harmelen
We show how the problem-solving patterns from knowledge engineering can be applied to systems developed on the semantic web. This gives us re-usable problem-solving patterns for the semantic web and would greatly help us to build and understand such systems.
I argue why I think that Computer Science (or better: Informatics) is a "natural science", in the same sense that physics, astronomy, biology, psychology and sociology are a natural science: they study a part of the world around us. In that same sense, I think Informatics studies a part of the world around us.
For a similar talk (including script), but more aimed at a Semantic Web audience in particular, see http://www.cs.vu.nl/~frankh/spool/ISWC2011Keynote/
(or http://videolectures.net/iswc2011_van_harmelen_universal/ for a video registration)
The document discusses and refutes four popular fallacies about the Semantic Web. It clarifies that the Semantic Web enforces languages but not meanings, does not require a single predefined meaning for terms but allows for different vocabularies to be bridged, and does not require users to understand formalized knowledge representation as this is done automatically behind the scenes. It also notes the Semantic Web does not require manually marking up all existing web pages as techniques are being developed to automatically add semantic markup.
The document discusses open data for open government and the benefits of publishing government data in a semantic, linked, and open format on the web. It provides examples of open data initiatives in the US, UK, and other countries that have led to the development of many applications by third parties using publicly available government data. The speaker advocates that governments publish not just documents but the underlying data to allow others to build new sites and applications to make use of the information.
A non-technical explanation of the main ideas and notions in OWL.This talk was also recorded on video, and is available on-line at http://videolectures.net/koml04_harmelen_o/
The document discusses the W3C stack for representing metadata, with XML providing syntax but no semantics, RDF and RDF Schema defining a data model for relations between resources and a vocabulary definition language, and OWL adding more expressivity with concepts such as classes, properties, and cardinality restrictions. It also covers RDF syntaxes like Turtle and XML, and how RDF can represent implied claims from XML and facilitate interoperability between systems through its abstract model.
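The syntax-versus-data-model point, made concrete with a small sketch: the same RDF graph round-trips between Turtle and RDF/XML, and the triples survive because the model, not the serialization, carries the meaning.

```python
from rdflib import Graph

turtle = """
@prefix ex: <http://example.org/> .
ex:alice ex:knows ex:bob .
"""

g = Graph().parse(data=turtle, format="turtle")
xml = g.serialize(format="xml")        # same graph, RDF/XML syntax

g2 = Graph().parse(data=xml, format="xml")
print(len(g2), "triple(s) survive the round trip")  # -> 1
```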
"Choosing proper type of scaling", Olena SyrotaFwdays
Imagine an IoT processing system that is already quite mature and production-ready, whose client coverage keeps growing, and for which scaling and performance are life-and-death questions. The system has Redis, MongoDB, and stream processing based on ksqlDB. In this talk, we will first analyze scaling approaches and then select the proper ones for our system.
Driving Business Innovation: Latest Generative AI Advancements & Success Story by Safe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor IvaniukFwdays
In this talk we will discuss DDoS protection tools and best practices, discuss network architectures, and see what AWS has to offer. We will also look into one of the largest DDoS attacks on Ukrainian infrastructure, which happened in February 2022. We'll see what techniques helped keep web resources available for Ukrainians and how AWS improved DDoS protection for all customers based on the Ukraine experience.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/how-axelera-ai-uses-digital-compute-in-memory-to-deliver-fast-and-energy-efficient-computer-vision-a-presentation-from-axelera-ai/
Bram Verhoef, Head of Machine Learning at Axelera AI, presents the “How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-efficient Computer Vision” tutorial at the May 2024 Embedded Vision Summit.
As artificial intelligence inference transitions from cloud environments to edge locations, computer vision applications achieve heightened responsiveness, reliability and privacy. This migration, however, introduces the challenge of operating within the stringent confines of resource constraints typical at the edge, including small form factors, low energy budgets and diminished memory and computational capacities. Axelera AI addresses these challenges through an innovative approach of performing digital computations within memory itself. This technique facilitates the realization of high-performance, energy-efficient and cost-effective computer vision capabilities at the thin and thick edge, extending the frontier of what is achievable with current technologies.
In this presentation, Verhoef unveils his company’s pioneering chip technology and demonstrates its capacity to deliver exceptional frames-per-second performance across a range of standard computer vision networks typical of applications in security, surveillance and the industrial sector. This shows that advanced computer vision can be accessible and efficient, even at the very edge of our technological ecosystem.
Conversational agents, or chatbots, are increasingly used to access all sorts of services using natural language. While open-domain chatbots - like ChatGPT - can converse on any topic, task-oriented chatbots - the focus of this paper - are designed for specific tasks, like booking a flight, obtaining customer support, or setting an appointment. Like any other software, task-oriented chatbots need to be properly tested, usually by defining and executing test scenarios (i.e., sequences of user-chatbot interactions). However, there is currently a lack of methods to quantify the completeness and strength of such test scenarios, which can lead to low-quality tests, and hence to buggy chatbots.
To fill this gap, we propose adapting mutation testing (MuT) for task-oriented chatbots. To this end, we introduce a set of mutation operators that emulate faults in chatbot designs, an architecture that enables MuT on chatbots built using heterogeneous technologies, and a practical realisation as an Eclipse plugin. Moreover, we evaluate the applicability, effectiveness and efficiency of our approach on open-source chatbots, with promising results.
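A toy illustration of what a mutation operator for a task-oriented chatbot might look like, assuming intents stored as plain dictionaries; the paper's actual operators target real chatbot platforms through an Eclipse plugin, so this only shows the shape of the idea.

```python
import copy
import random

# Hypothetical intent format; real chatbot platforms use richer models.
intent = {
    "name": "book_flight",
    "training_phrases": ["book a flight", "I need a plane ticket"],
    "required_params": ["destination", "date"],
}

def delete_training_phrase(intent, rng=random.Random(0)):
    """Mutant: drop a training phrase; tests should notice weaker matching."""
    mutant = copy.deepcopy(intent)
    mutant["training_phrases"].pop(rng.randrange(len(mutant["training_phrases"])))
    return mutant

def drop_required_param(intent, rng=random.Random(0)):
    """Mutant: stop requiring a slot; tests should catch incomplete bookings."""
    mutant = copy.deepcopy(intent)
    mutant["required_params"].pop(rng.randrange(len(mutant["required_params"])))
    return mutant

for mutate in (delete_training_phrase, drop_required_param):
    print(mutate.__name__, "->", mutate(intent))
```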
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyScyllaDB
Freshworks creates AI-boosted business software that helps employees work more efficiently and effectively. Managing data across multiple RDBMS and NoSQL databases was already a challenge at their current scale. To prepare for 10X growth, they knew it was time to rethink their database strategy. Learn how they architected a solution that would simplify scaling while keeping costs under control.
How information systems are built or acquired puts information, which is what they should be about, in a secondary place. Our language adapted accordingly: we no longer talk about information systems but about applications. Applications evolved in a way that breaks data into diverse fragments, tightly coupled to the applications and expensive to integrate. The result is technical debt, which is repaid by taking out even bigger "loans", so the debt keeps growing. Software engineering and procurement practices work in sync with market forces to maintain this trend. This talk demonstrates how natural this situation is. The question is: can something be done to reverse the trend?
Dandelion Hashtable: beyond billion requests per second on a commodity serverAntonios Katsarakis
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite optimization efforts that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state of the art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design (1) offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. On a commodity server with a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfChart Kalyan
A Mix Chart displays historical number sequences in graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
Monitoring and Managing Anomaly Detection on OpenShift.pdfTosin Akinosho
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
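As a flavour of the kind of code the tutorial's notebooks contain (a minimal sketch only, combining topics 1 and 8; the metric name, window size and threshold are illustrative assumptions, not taken from the tutorial):

from collections import deque
from statistics import mean, stdev

from prometheus_client import Gauge, start_http_server

# Illustrative metric name; Prometheus (topic 8) scrapes it from /metrics.
anomaly_score = Gauge("sensor_anomaly_zscore", "Rolling z-score of the latest reading")

def detect(readings, window=50, threshold=3.0):
    # Flag a reading as anomalous when it lies more than `threshold`
    # standard deviations away from the rolling window's mean.
    history = deque(maxlen=window)
    for value in readings:
        z = 0.0
        if len(history) >= 2 and stdev(history) > 0:
            z = abs(value - mean(history)) / stdev(history)
        anomaly_score.set(z)
        yield value, z > threshold
        history.append(value)

start_http_server(8000)  # exposes http://localhost:8000/metrics for Prometheus
for value, is_anomaly in detect([10, 11, 10, 12, 11, 95, 10]):  # toy stream; 95 should flag
    print(value, "ANOMALY" if is_anomaly else "ok")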
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving
Manufacturing custom quality metal nameplates and badges involves several standard operations. Processes include sheet prep, lithography, screening, coating, punch press and inspection. All decoration is completed in the flat sheet with adhesive and tooling operations following. The possibilities for creating unique durable nameplates are endless. How will you create your brand identity? We can help!
Generating privacy-protected synthetic data using Secludy and MilvusZilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
Fueling AI with Great Data with Airbyte WebinarZilliz
This talk will focus on how to collect data from a variety of sources, leverage that data for RAG and other GenAI use cases, and finally chart your course to production.
How the Web can change social science research (including yours)
1. How the Web
can change
social science research
(including yours)
Frank van Harmelen
Computer Science Department
VU University Amsterdam
Creative Commons License:
allowed to share & remix,
but must attribute & non-commercial
2. Using the web (of data)
for e-science
in Social Sciences
Frank van Harmelen
Computer Science Department
VU University Amsterdam
Creative Commons License:
allowed to share & remix,
but must attribute & non-commercial
Health Warning:
Computer
Scientist!
3. This talk is about
using the web
as an observational instrument
using the web of data
as an even better observational instrument
using the web of data
as a data-sharing platform
4. This talk is not about
it's NOT social-science studies of e-science
(e.g. the Oxford research centre)
it's NOT about high-performance computing
(that's just boring infrastructure;
let the computer scientists deal with that)
it's NOT about online social experiments
(crowdsourcing, social games, Mechanical Turk, etc.)
5. Who are you?
who is using large computerised data-sets ?
who is using data extracted from the web ?
who is using semantic web data ?
6. This talk is about
using the web & the web of data
as an observational instrument &
as a sharing platform
Through:
A whole bunch of realistic examples
A sketch of the technology
Message = yes, you can do this too!
15. Question: Is the content of party-political
programmes and election speeches predictive
of government coalition attempts?
Data
• All party manifestos,
• half a year of all Dutch newspapers
17. Question: Can we predict the social network
at T(n) from the content at T(n−1)?
Data
• Discussions from online forum nl.politiek
• 21.000 participants talking about 19 Dutch
political parties over 259 weeks
23. General idea of Web of Data
(a.k.a. “Semantic Web”)
1. Make data available on the Web
in machine-understandable form
(formalised)
2. Structure the data
and meta-data
in ontologies
25. Bluffer’s Guide to RDF
• Express relations between things:
• Results in labelled network (“graph”)
• All labels are actually web-addresses (URIs)
• You can “ping” any label and find out more
• Bits of the graph can live at physically different
locations & have different owners
[Diagram: a small RDF graph with nodes Frank, x, y and MIT, connected by edges AuthorOf and publishedBy; each edge reads Subject –Predicate→ Object]
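A minimal sketch of what such a graph looks like in code, using Python's rdflib (the http://example.org/ namespace and the resource names are made up for illustration):

from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")

g = Graph()
g.add((EX.Frank, EX.AuthorOf, EX.x))   # subject, predicate, object
g.add((EX.x, EX.publishedBy, EX.MIT))

# Every label is a URI, so it can be dereferenced to find out more,
# and the two triples could just as well live on different servers.
print(g.serialize(format="turtle"))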
26. Bluffer’s Guide to RDF Schema
• types for subjects & objects & predicates
• Types organised in a hierarchy
• Inheritance of properties
[Diagram: the same graph, now typed — Frank is an author (a man, hence a person), x and y are books (artifacts), MIT is a publisher; the hierarchy lets instances inherit properties]
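Continuing the sketch above, the type hierarchy itself is just more triples (again with an illustrative example namespace); an RDFS reasoner, e.g. the owlrl package, could then infer that Frank is also a person:

from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/")

g = Graph()
g.add((EX.Frank, RDF.type, EX.man))
g.add((EX.man, RDFS.subClassOf, EX.person))
g.add((EX.x, RDF.type, EX.book))
g.add((EX.book, RDFS.subClassOf, EX.artifact))

# rdflib stores the hierarchy but does not apply inference by itself;
# a reasoner would add the entailed triple (EX.Frank, RDF.type, EX.person).
for s, o in g.subject_objects(RDFS.subClassOf):
    print(s, "is a subclass of", o)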
27. Ontologies (= hierarchical
conceptual vocabularies)
Identify the key concepts in a domain
Identify a vocabulary for these concepts
Identify relations between these concepts
Make these precise enough
so that they can be shared between
• humans and humans
• humans and machines
• machines and machines
28. Biomedical ontologies (a few..)
Mesh
• Medical Subject Headings, National Library of Medicine
• 22.000 descriptions
EMTREE
• Commercial Elsevier, Drugs and diseases
• 45.000 terms, 190.000 synonyms
UMLS
• Integrates 100 different vocabularies
SNOMED
• 200.000 concepts, College of American Pathologists
Gene Ontology
• 15.000 terms in molecular biology
NCBI Cancer Ontology:
• 17,000 classes (about 1M definitions)
29. On the Web of Data, anyone
can link anything to anything
[Diagram: a triple <x> IsOfType <T> whose parts have different owners and live at different locations (e.g. at <institute>)]
40. The World Bank is also doing it!
http://data.worldbank.org/
7,000 indicators from World Bank data sets.
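A minimal sketch of pulling one such indicator programmatically, assuming the World Bank's public v2 REST API (SP.POP.TOTL is its code for total population; the country code and page size are arbitrary choices here):

import requests

url = "https://api.worldbank.org/v2/country/NL/indicator/SP.POP.TOTL"
resp = requests.get(url, params={"format": "json", "per_page": 5})
resp.raise_for_status()

metadata, observations = resp.json()  # the API returns [metadata, data]
for obs in observations:
    print(obs["date"], obs["value"])  # most recent years first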
41. The US gov is also doing it!
http://data.gov/ : 390.000 data sets
Compare foreign aid budgets
Does tax influence smokers?
Compare campaign money
42. already many billions of facts & rules
Everybody’s doing it!
May ‘09 estimate > 4.2 billion triples +
140 million interlinks
It gets bigger every month
44. And many more
• Reuters
• New York Times
• EU (EUROSTAT, others)
• BBC
• Facebook
• ….
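To make "mining the web of data" concrete: a minimal sketch of querying DBpedia's public SPARQL endpoint for Dutch political parties, tying back to the examples earlier in this deck (the dbo:/dbr: prefixes are predeclared on that endpoint; the query shape is illustrative):

from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    SELECT ?party ?name WHERE {
      ?party a dbo:PoliticalParty ;
             dbo:country dbr:Netherlands ;
             rdfs:label ?name .
      FILTER (lang(?name) = "en")
    } LIMIT 10
""")

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["name"]["value"])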
45. So how good is this
observational instrument ?
Studies on validity (e.g. in science dynamics)
methods for provenance & trust
methods for attribution & citation
46. For real ?
“ use the power of information to
explore social and economic life on
Earth ”
1bn€ over 10 years
48. Take home message
use the web & the web-of-data
to obtain your data
use the web-of-data to share your data
yes, you can do this too!
Collaborate with computer scientists
reflect on the deeper consequences
for the social sciences
(methodological, theoretical, etc)
49. Acknowledgements
I’ve freely used material from the work of
Shenghui Wang
Paul Groth
Julie Birkholz
Wouter van Atteveldt
Laurens van Rietveld
Rinke Hoekstra
and many in the Semantic Web community
Editor's Notes
Talk about citation data; difficult to get (2 weeks to gather a couple of hundred citation scores)