How the Web can change social science research (including yours)



A presentation for a group of PhD students from the Leibniz Institutes (section B, social sciences) to discuss how they could use the Web, and even better the Web of Data, as an instrument in their research.

Speaker notes:
  • Talk about citation data: difficult to get; 2 weeks to gather a couple of hundred citation scores
  • Open data to the rescue… (
  • Faster, easier to experiment, access to more data

    1. How the Web can change social science research (including yours). Frank van Harmelen, Computer Science Department, VU University Amsterdam. Creative Commons License: allowed to share & remix, but must attribute & non-commercial.
    2. Using the Web (of Data) for e-science in the social sciences. Health warning: computer scientist!
    3. This talk is about: using the Web as an observational instrument; using the Web of Data as an even better observational instrument; using the Web of Data as a data-sharing platform.
    4. This talk is NOT about: the social science of e-science (e.g. the Oxford research center); high-performance computing (that's just boring infrastructure; let the computer scientists deal with that); online social experiments (crowdsourcing, social games, Mechanical Turk, etc.).
    5. Who are you? Who is using large computerised data sets? Who is using data extracted from the Web? Who is using Semantic Web data?
    6. This talk is about using the Web & the Web of Data as an observational instrument & as a sharing platform, through a whole bunch of realistic examples and a sketch of the technology. Message: yes, you can do this too!
    7. Philosophical confession: I take a strongly positivistic stance.
    8. Revolution ahead?
    9–13. Effects of observation instruments
    14. Example: Political science
    15. Question: Is the content of party-political programmes and election speeches predictive of government coalition attempts? Data: all party manifestos; half a year of all Dutch newspapers.
    16. Example: Communication science
    17. Question: Can we predict the social network at time T(n) from the content at T(n-1)? Data: discussions from the online forum nl.politiek; 21,000 participants talking about 19 Dutch political parties over 259 weeks.
    18. Example: Science dynamics
    19. Question: Is thematic co-occurrence in year Y(n) predictive of co-authoring in year Y(n+1)? Data: a 5-year conference series, 1,000 papers/year, 3,000 authors/year.
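One way to make the slide-19 question operational, offered only as a hypothetical sketch and not as the method used in the talk: measure each author pair's topic overlap in year n with Jaccard similarity, predict a co-authoring link in year n+1 when the overlap reaches a threshold, and score those predictions against the co-author pairs actually observed. The author names, topics, threshold and data below are all invented for illustration.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Topic overlap of two authors: 0.0 = disjoint, 1.0 = identical."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_coauthorship(topics_year_n: dict, threshold: float) -> set:
    """Predict a co-authoring link in year n+1 for every author pair whose
    topic overlap in year n reaches the (hypothetical) threshold."""
    predicted = set()
    for x, y in combinations(sorted(topics_year_n), 2):
        if jaccard(topics_year_n[x], topics_year_n[y]) >= threshold:
            predicted.add(frozenset((x, y)))
    return predicted


def precision_recall(predicted: set, actual: set) -> tuple:
    """Compare predicted author pairs with the pairs actually observed."""
    hits = len(predicted & actual)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall


# Invented toy data: topics per author in year n ...
topics_n = {
    "alice": {"ontologies", "sparql"},
    "bob": {"ontologies", "sparql", "provenance"},
    "carol": {"social networks"},
    "dave": {"social networks", "citation analysis"},
}
# ... and the co-author pairs observed in year n+1.
coauthors_n_plus_1 = {frozenset(("alice", "bob")), frozenset(("bob", "carol"))}

predicted = predict_coauthorship(topics_n, threshold=0.5)
print(sorted(tuple(sorted(pair)) for pair in predicted))
# → [('alice', 'bob'), ('carol', 'dave')]
print(precision_recall(predicted, coauthors_n_plus_1))
# → (0.5, 0.5)
```

A real study would of course derive topics from the papers themselves and vary the threshold; the point here is only the shape of the test: content at Y(n), network at Y(n+1).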
    20. AmCAT3: keyword search
    21. This works… sort of… Methods: web scraping; natural-language analysis (parsing, stemming, synonyms, homonyms); identity resolution. Required: physical interoperability, syntactic interoperability, semantic interoperability.
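Identity resolution, the last of the methods listed above, is often the fiddliest step. Here is a deliberately crude sketch in Python. It assumes that two name strings refer to the same person when they share a first initial and a final surname token. That rule is hypothetical, chosen for illustration, and is not the one used by AmCAT or in the work cited here.

```python
import re
import unicodedata
from collections import defaultdict


def name_key(raw: str) -> str:
    """Hypothetical matching key: first initial + last surname token."""
    # Strip accents (é -> e) and lower-case everything.
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(c for c in text if not unicodedata.combining(c)).lower()
    # Turn "Surname, First" into "First Surname".
    if "," in text:
        surname, first = (part.strip() for part in text.split(",", 1))
        text = f"{first} {surname}"
    tokens = re.findall(r"[a-z]+", text)
    if not tokens:
        return ""
    # Particles such as "van" fall away because only the last token is kept.
    return f"{tokens[0][0]} {tokens[-1]}"


def resolve(mentions: list) -> dict:
    """Group name strings that share a key, i.e. are assumed to be one person."""
    clusters = defaultdict(set)
    for mention in mentions:
        clusters[name_key(mention)].add(mention)
    return dict(clusters)


mentions = [
    "Frank van Harmelen",
    "F. van Harmelen",
    "van Harmelen, Frank",
    "Paul Groth",
    "P. Groth",
]
for key, names in sorted(resolve(mentions).items()):
    print(key, sorted(names))
```

A key this coarse will also merge distinct people who happen to share a surname and initial. In practice it only serves as a first blocking step, after which candidates are compared on further evidence.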
    22. Web of Data to the rescue
    23. General idea of the Web of Data (a.k.a. "Semantic Web"): (1) make data available on the Web in machine-understandable (formalised) form; (2) structure the data and metadata in ontologies.
    24. Warning: technical content coming up
    25. Bluffer's Guide to RDF
        • Express relations between things as subject–predicate–object triples
        • The result is a labelled network ("graph")
        • All labels are actually web addresses (URIs)
        • You can "ping" any label and find out more
        • Bits of the graph can live at physically different locations & have different owners
        [Figure: a small example graph with the labels Frank, x, y and MIT and the edges AuthorOf and publishedBy; its parts are marked Subject, Predicate and Object.]
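To make the triple idea concrete, here is a minimal sketch in plain Python (no RDF library). It stores a graph as a set of subject–predicate–object triples and answers the "ping a label" question by listing every triple that mentions a given URI. The names Frank, MIT, authorOf and publishedBy echo the slide; the example.org URIs and the book1 node are made up.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


EX = "http://example.org/"  # hypothetical namespace for this sketch

# A two-edge graph: Frank authorOf book1, book1 publishedBy MIT.
graph = {
    Triple(EX + "Frank", EX + "authorOf", EX + "book1"),
    Triple(EX + "book1", EX + "publishedBy", EX + "MIT"),
}


def describe(graph: set, uri: str) -> list:
    """'Ping' a label: every triple in which the URI is subject or object."""
    return sorted(t for t in graph if uri in (t.subject, t.obj))


for t in describe(graph, EX + "book1"):
    print(t.subject, t.predicate, t.obj)
```

On the real Web of Data, "pinging" means dereferencing the URI over HTTP rather than scanning a local set, but the answer has the same shape: a bundle of triples about that label.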
    26. Bluffer's Guide to RDF Schema
        • Types for subjects & objects & predicates
        • Types organised in a hierarchy
        • Inheritance of properties
        [Figure: the same example graph (Frank, x, y, MIT, AuthorOf, publishedBy) annotated with the types author, book, publisher, person, artifact and man.]
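The type hierarchy can be illustrated with the RDFS rule that propagates types up rdfs:subClassOf. If x has type C and C is a subclass of D, then x also has type D, applied transitively. The sketch below covers only this class inheritance, not property hierarchies or domain/range typing. The hierarchy author → person → agent, book → artifact is a hypothetical one chosen for illustration.

```python
from collections import defaultdict

# Hypothetical rdfs:subClassOf facts as (subclass, superclass) pairs.
SUBCLASS_OF = {("author", "person"), ("person", "agent"), ("book", "artifact")}
# rdf:type facts as (instance, class) pairs.
TYPES = {("Frank", "author"), ("x", "book")}


def superclasses(cls: str, subclass_of: set) -> set:
    """All classes reachable from cls via subClassOf (transitive closure)."""
    parents = defaultdict(set)
    for sub, sup in subclass_of:
        parents[sub].add(sup)
    seen, stack = set(), [cls]
    while stack:
        for sup in parents[stack.pop()]:
            if sup not in seen:  # 'seen' also protects against cycles
                seen.add(sup)
                stack.append(sup)
    return seen


def infer_types(type_facts: set, subclass_of: set) -> set:
    """If x has type C and C is a subclass of D, then x also has type D."""
    inferred = set(type_facts)
    for instance, cls in type_facts:
        inferred |= {(instance, sup) for sup in superclasses(cls, subclass_of)}
    return inferred


for fact in sorted(infer_types(TYPES, SUBCLASS_OF)):
    print(fact)
```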
    27. Ontologies (= hierarchical conceptual vocabularies)
        • Identify the key concepts in a domain
        • Identify a vocabulary for these concepts
        • Identify relations between these concepts
        • Make these precise enough that they can be shared between humans and humans, humans and machines, and machines and machines
    28. Biomedical ontologies (a few…)
        • MeSH: Medical Subject Headings, National Library of Medicine; 22,000 descriptors
        • EMTREE: commercial (Elsevier), drugs and diseases; 45,000 terms, 190,000 synonyms
        • UMLS: integrates 100 different vocabularies
        • SNOMED: 200,000 concepts, College of American Pathologists
        • Gene Ontology: 15,000 terms in molecular biology
        • NCBI Cancer Ontology: 17,000 classes (about 1M definitions)
    29. On the Web of Data, anyone can link anything to anything. [Figure: resources x and T, with different owners & locations (e.g. an <institute>), linked by the statement [<x> IsOfType <T>].]
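Because every label is a URI, merging graphs published by different owners is just a set union, and any URI the two graphs share acts as the join point. Here is a minimal sketch in plain Python with two hypothetical publishers, a library and an institute. Apart from the Dublin Core creator property, every URI here is invented.

```python
DCT_CREATOR = "http://purl.org/dc/terms/creator"     # Dublin Core term
AFFILIATION = "http://example.org/vocab/affiliation"  # hypothetical property

# Triples published by a (hypothetical) library ...
library_graph = {
    ("http://example.org/lib/paper42", DCT_CREATOR, "http://example.org/people/r1"),
}
# ... and by a (hypothetical) institute, reusing the same person URI.
institute_graph = {
    ("http://example.org/people/r1", AFFILIATION, "http://example.org/inst/i1"),
}


def creator_affiliations(graph: set, paper: str) -> set:
    """Follow paper -> creator -> affiliation, wherever the triples came from."""
    creators = {o for s, p, o in graph if s == paper and p == DCT_CREATOR}
    return {o for s, p, o in graph if s in creators and p == AFFILIATION}


paper = "http://example.org/lib/paper42"
print(creator_affiliations(library_graph, paper))                    # set(): no affiliation data
print(creator_affiliations(library_graph | institute_graph, paper))  # the shared URI joins both
```

Neither owner had to coordinate with the other beyond reusing the same URI for the person.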
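    30. SPARQL: Bluffer's Guide
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX type: <http://dbpedia.org/class/yago/>
        PREFIX prop: <http://dbpedia.org/property/>
        SELECT ?country_name ?population
        WHERE {
          ?country a type:LandlockedCountries ;
                   rdfs:label ?country_name ;
                   prop:populationEstimate ?population .
          FILTER (?population > 15000000)
        }

        PREFIX mo: <http://purl.org/ontology/mo/>
        PREFIX foaf: <http://xmlns.com/foaf/0.1/>
        SELECT ?name ?img ?hp ?loc
        WHERE {
          ?a a mo:MusicArtist ;
             foaf:name ?name .
          OPTIONAL { ?a foaf:homepage ?hp }
        }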
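The first query on slide 30 can be mimicked in a few lines of Python to show roughly what a SPARQL engine does. It matches each triple pattern against the graph, joins the variable bindings, then applies the FILTER; in SPARQL, the keyword a abbreviates rdf:type. This is a toy evaluator over an invented three-country graph, not a real SPARQL implementation and not DBpedia data.

```python
def match(pattern: tuple, triple: tuple, binding: dict):
    """Extend binding so pattern matches triple; None if it cannot.
    Pattern terms starting with '?' are variables."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if isinstance(term, str) and term.startswith("?"):
            if new.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None  # constant does not match
    return new


def evaluate(patterns: list, graph: set) -> list:
    """Join triple patterns left to right, as in a basic graph pattern."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in graph
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return bindings


# Invented toy graph (not DBpedia data).
graph = {
    ("ex:CountryA", "rdf:type", "type:LandlockedCountries"),
    ("ex:CountryA", "rdfs:label", "Country A"),
    ("ex:CountryA", "prop:populationEstimate", 20_000_000),
    ("ex:CountryB", "rdf:type", "type:LandlockedCountries"),
    ("ex:CountryB", "rdfs:label", "Country B"),
    ("ex:CountryB", "prop:populationEstimate", 5_000_000),
    ("ex:CountryC", "rdfs:label", "Country C"),  # not landlocked
    ("ex:CountryC", "prop:populationEstimate", 50_000_000),
}

patterns = [
    ("?country", "rdf:type", "type:LandlockedCountries"),  # ?country a type:...
    ("?country", "rdfs:label", "?country_name"),
    ("?country", "prop:populationEstimate", "?population"),
]
rows = [
    (b["?country_name"], b["?population"])
    for b in evaluate(patterns, graph)
    if b["?population"] > 15_000_000  # FILTER (?population > 15000000)
]
print(rows)  # → [('Country A', 20000000)]
```

Real engines reorder patterns and use indexes instead of scanning every triple, but the join-then-filter semantics is the same idea.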
    31. Example: science dynamics
    32. Faculteit der Exacte Wetenschappen. Meet Julie, PhD student: "Institutional influences on collaboration patterns in interdisciplinary research"
    33. Julie needs data
    35. DBLP: RDF & RDF Schema
    36. DBLP query: 2 weeks → 15 mins.
        SELECT ?author ?affiliation ?uriAffiliation
        WHERE {
          GRAPH <$graph> {
            { <$article> swrc:author ?author .
              OPTIONAL { ?author swrc:affiliation ?uriAffiliation . }
              OPTIONAL { ?author swc:affiliation ?affiliation . } }
            UNION
            { <$article> foaf:maker ?author .
              OPTIONAL { ?author swrc:affiliation ?uriAffiliation . }
              OPTIONAL { ?author swc:affiliation ?affiliation . } }
            UNION
            { <$article> dc:creator ?author .
              OPTIONAL { ?author swrc:affiliation ?uriAffiliation . }
              OPTIONAL { ?author swc:affiliation ?affiliation . } }
          }
        }
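The UNION/OPTIONAL structure of the slide-36 query lets it cope with authors attached to an article through any of three predicates (swrc:author, foaf:maker or dc:creator), with or without an affiliation. The sketch below reproduces the same logic in plain Python over an invented two-author graph. The ex: names and affiliation values are hypothetical, and the GRAPH clause is ignored.

```python
# The three alternative author predicates, one per UNION branch.
AUTHOR_PREDICATES = ("swrc:author", "foaf:maker", "dc:creator")
# The two OPTIONAL affiliation patterns: predicate -> result variable.
OPTIONAL_AFFILIATIONS = {
    "swrc:affiliation": "uriAffiliation",
    "swc:affiliation": "affiliation",
}


def objects(graph: set, subject: str, predicate: str) -> list:
    """All objects of (subject, predicate, ?o) triples, in sorted order."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def authors_with_affiliations(graph: set, article: str) -> list:
    results = []
    for author_pred in AUTHOR_PREDICATES:  # UNION: rows from every branch
        for author in objects(graph, article, author_pred):
            rows = [{"author": author}]
            for aff_pred, var in OPTIONAL_AFFILIATIONS.items():
                values = objects(graph, author, aff_pred)
                if values:  # OPTIONAL: extend the rows when a value is found ...
                    rows = [{**row, var: v} for row in rows for v in values]
                # ... otherwise keep the rows with the variable left unbound
            results.extend(rows)
    return results


# Invented toy graph: the ex: names and affiliation values are hypothetical.
graph = {
    ("ex:paper1", "swrc:author", "ex:alice"),
    ("ex:paper1", "dc:creator", "ex:bob"),
    ("ex:alice", "swrc:affiliation", "ex:uni1"),
    ("ex:bob", "swc:affiliation", "Example University"),
}
for row in authors_with_affiliations(graph, "ex:paper1"):
    print(row)
```

As with SPARQL's bag semantics, an author linked to the article through two of the predicates would appear twice; SELECT DISTINCT removes such duplicate solutions.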
    37. Example: Dutch census data (1795–1971)
    38. 40,745,554,078 triples; semantically rich
    39. Who's doing it?
    40. The World Bank is also doing it! 7,000 indicators from World Bank data sets.
    41. The US government is also doing it! 390,000 data sets. Examples: compare foreign-aid budgets; does tax influence smokers?; compare campaign money.
    42. Everybody's doing it! Already many billions of facts & rules. May '09 estimate: > 4.2 billion triples + 140 million interlinks. It gets bigger every month.
    43. It gets bigger every month
    44. And many more: Reuters, New York Times, EU (EUROSTAT, others), BBC, Facebook, …
    45. So how good is this observational instrument? Studies on validity (e.g. in science dynamics); methods for provenance & trust; methods for attribution & citation.
    46. For real? "Use the power of information to explore social and economic life on Earth": €1bn over 10 years.
    47. Phew…
    48. Take-home message: use the Web & the Web of Data to obtain your data; use the Web of Data to share your data; yes, you can do this too! Collaborate with computer scientists; reflect on the deeper consequences for the social sciences (methodological, theoretical, etc.).
    49. Acknowledgements: I've freely used material from the work of Shenghui Wang, Paul Groth, Julie Birkholz, Wouter van Atteveldt, Laurens van Rietveld, Rinke Hoekstra and many in the Semantic Web community.