How the Web can change social science research (including yours)

A presentation for a group of PhD students from the Leibniz Institutes (section B, social sciences) to discuss how they could use the Web, and even better the Web of Data, as an instrument in their research.

Published in: Technology, Education

Notes
  • Talk about citation data: difficult to get; 2 weeks to gather a couple of hundred citation scores
  • Open data to the rescue…
  • Faster; easier to experiment; access to more data
  • Transcript

    • 1. How the Web can change social science research (including yours). Frank van Harmelen, Computer Science Department, VU University Amsterdam. Creative Commons License: allowed to share & remix, but must attribute & non-commercial
    • 2. Using the web (of data) for e-science in Social Sciences. Health Warning: Computer Scientist!
    • 3. This talk is about: using the web as an observational instrument; using the web of data as an even better observational instrument; using the web of data as a data-sharing platform
    • 4. This talk is not about: social science about e-science (e.g. Oxford research center); high-performance computing (that's just boring infrastructure, let the computer scientists deal with that); online social experiments (crowdsourcing, social games, Mechanical Turk, etc.)
    • 5. Who are you? Who is using large computerised data-sets? Who is using data extracted from the web? Who is using semantic web data?
    • 6. This talk is about using the web & the web of data as an observational instrument & as a sharing platform. Through: a whole bunch of realistic examples; a sketch of the technology. Message = yes, you can do this too!
    • 7. Philosophical confession: I take a strongly positivistic stance
    • 8. Revolution ahead?
    • 9-13. Effects of observation instruments
    • 14. Example: Political science
    • 15. Question: Is the content of party-political programmes and election speeches predictive of government coalition attempts? Data: all party manifestos; half a year of all Dutch newspapers
    • 16. Example: Communication science
    • 17. Question: Can we predict the social network at T_n from the content at T_(n-1)? Data: discussions from online forum nl.politiek; 21,000 participants talking about 19 Dutch political parties during 259 weeks
    • 18. Example: Science dynamics
    • 19. Question: Is thematic co-occurrence at Y_n predictive of co-authoring at Y_(n+1)? Data: 5-year conference series, 1,000 papers/year, 3,000 authors/year
    • 20. AmCAT3: Keyword search
    • 21. This works… sort of… Methods: web scraping; natural language analysis (parsing, stemming, synonyms, homonyms); identity resolution. Required: physical interoperability; syntactic interoperability; semantic interoperability
    • 22. Web of Data to the rescue
    • 23. General idea of Web of Data (a.k.a. “Semantic Web”): 1. Make data available on the Web in machine-understandable form (formalised). 2. Structure the data and meta-data in ontologies
    • 24. Warning: technical content coming up
    • 25. Bluffer’s Guide to RDF: express relations between things; results in a labelled network (“graph”); all labels are actually web addresses (URIs); you can “ping” any label and find out more; bits of the graph can live at physically different locations & have different owners. [Diagram labels: Frank, x, y, AuthorOf, publishedBy, MIT; Subject, Predicate, Object]
    • 26. Bluffer’s Guide to RDF Schema: types for subjects & objects & predicates; types organised in a hierarchy; inheritance of properties. [Diagram labels: Frank, x, y, AuthorOf, publishedBy, MIT; types: author, book, publisher, person, artifact, man]
    • 27. Ontologies (= hierarchical conceptual vocabularies): identify the key concepts in a domain; identify a vocabulary for these concepts; identify relations between these concepts; make these precise enough so that they can be shared between humans and humans, humans and machines, machines and machines
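The two Bluffer's Guide slides can be illustrated with a minimal Python sketch: facts are stored as (subject, predicate, object) triples whose labels are URIs, and types are inherited along a class hierarchy. Everything under `http://example.org/` is hypothetical, and the hierarchy used here (author is a kind of person, book is a kind of artifact) is only an assumed reading of the labels on the slide diagram.

```python
# Minimal sketch: RDF-style triples and RDF Schema-style type inheritance.
# All http://example.org/ URIs are hypothetical illustrations.

EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# Each fact is a (subject, predicate, object) triple; every label is a URI.
triples = {
    (EX + "Frank", EX + "authorOf", EX + "book1"),
    (EX + "book1", EX + "publishedBy", EX + "MIT"),
    (EX + "Frank", RDF_TYPE, EX + "Author"),
    (EX + "book1", RDF_TYPE, EX + "Book"),
    # Assumed hierarchy: every author is a person, every book an artifact.
    (EX + "Author", RDFS_SUBCLASS, EX + "Person"),
    (EX + "Book", RDFS_SUBCLASS, EX + "Artifact"),
}


def superclasses(cls, graph):
    """Return cls plus every class reachable via subClassOf links."""
    seen, todo = set(), [cls]
    while todo:
        current = todo.pop()
        if current in seen:
            continue
        seen.add(current)
        todo.extend(o for s, p, o in graph
                    if s == current and p == RDFS_SUBCLASS)
    return seen


def types_of(resource, graph):
    """Return the asserted types of a resource plus all inherited ones."""
    result = set()
    for s, p, o in graph:
        if s == resource and p == RDF_TYPE:
            result |= superclasses(o, graph)
    return result


frank_types = types_of(EX + "Frank", triples)
book_types = types_of(EX + "book1", triples)
print(sorted(t[len(EX):] for t in frank_types))
```

Because the graph is just a set of triples, facts held by different owners at different locations can be combined by taking the union of their sets.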
    • 28. Biomedical ontologies (a few…): MeSH (Medical Subject Headings, National Library of Medicine; 22,000 descriptions); EMTREE (commercial, Elsevier; drugs and diseases; 45,000 terms, 190,000 synonyms); UMLS (integrates 100 different vocabularies); SNOMED (200,000 concepts, College of American Pathologists); Gene Ontology (15,000 terms in molecular biology); NCBI Cancer Ontology (17,000 classes, about 1M definitions)
    • 29. On the Web of Data, anyone can link anything to anything. [Diagram: <x> IsOfType <T>; different owners & locations; <institute>]
    • 30. SPARQL: Bluffer’s Guide. Query 1: SELECT ?country_name ?population WHERE { ?country a type:LandlockedCountries ; rdfs:label ?country_name ; prop:populationEstimate ?population . FILTER (?population > 15000000) } Query 2: SELECT ?name ?img ?hp ?loc WHERE { ?a a mo:MusicArtist ; foaf:name ?name . OPTIONAL { ?a foaf:homepage ?hp } }
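To make the landlocked-countries query less magical, here is a toy Python sketch of what a SPARQL engine does with it: each triple pattern is matched against the graph, variables (names starting with `?`) are bound consistently across patterns, and the FILTER is applied to the resulting bindings. This is not a SPARQL implementation; the `match` helper, the `ex:` resources and the population figures are all invented for illustration.

```python
# Toy illustration of SPARQL-style pattern matching (not a real engine).
# The data below is made up; ex:CountryA etc. are hypothetical resources.

graph = {
    ("ex:CountryA", "rdf:type", "type:LandlockedCountries"),
    ("ex:CountryA", "rdfs:label", "Country A"),
    ("ex:CountryA", "prop:populationEstimate", 20000000),
    ("ex:CountryB", "rdf:type", "type:LandlockedCountries"),
    ("ex:CountryB", "rdfs:label", "Country B"),
    ("ex:CountryB", "prop:populationEstimate", 5000000),
    ("ex:CountryC", "rdf:type", "type:IslandCountries"),
    ("ex:CountryC", "rdfs:label", "Country C"),
    ("ex:CountryC", "prop:populationEstimate", 80000000),
}


def is_var(term):
    """A term is a variable if it is a string starting with '?'."""
    return isinstance(term, str) and term.startswith("?")


def match(patterns, graph, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        new = dict(binding)
        ok = True
        for pat, val in zip(first, triple):
            if is_var(pat):
                # Bind the variable, or check it against its earlier value.
                if new.setdefault(pat, val) != val:
                    ok = False
                    break
            elif pat != val:
                ok = False
                break
        if ok:
            yield from match(rest, graph, new)


# The WHERE clause of the first query, written as three triple patterns.
patterns = [
    ("?country", "rdf:type", "type:LandlockedCountries"),
    ("?country", "rdfs:label", "?country_name"),
    ("?country", "prop:populationEstimate", "?population"),
]

# FILTER (?population > 15000000), then SELECT ?country_name ?population.
rows = sorted(
    (b["?country_name"], b["?population"])
    for b in match(patterns, graph)
    if b["?population"] > 15000000
)
print(rows)
```

In practice you would send the query text to a SPARQL endpoint rather than match patterns by hand; the sketch only shows why a query is "just" a graph with some labels left open.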
    • 31. Example: science dynamics
    • 32. Meet Julie, PhD student: “institutional influences on collaboration patterns in interdisciplinary research” (Faculteit der Exacte Wetenschappen)
    • 33. Julie needs data
    • 35. DBLP: RDF & RDF Schema
    • 36. DBLP Query: 2 weeks → 15 mins. SELECT ?author ?affiliation ?uriAffiliation WHERE { GRAPH <$graph> { { <$article> swrc:author ?author . OPTIONAL { ?author swrc:affiliation ?uriAffiliation . } OPTIONAL { ?author swc:affiliation ?affiliation . } } UNION { <$article> foaf:maker ?author . OPTIONAL { ?author swrc:affiliation ?uriAffiliation . } OPTIONAL { ?author swc:affiliation ?affiliation . } } UNION { <$article> dc:creator ?author . OPTIONAL { ?author swrc:affiliation ?uriAffiliation . } OPTIONAL { ?author swc:affiliation ?affiliation . } } } }
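The DBLP query uses UNION because different data sources record authorship with different predicates (`swrc:author`, `foaf:maker`, `dc:creator`), and OPTIONAL because an affiliation may be missing. A minimal Python sketch of the same idea follows. The `authors_of` helper and the `ex:` data are hypothetical, and the sketch simplifies the slide's query in two ways: it looks at a single affiliation predicate, and it returns each author once rather than one row per matching UNION branch.

```python
# Sketch of the UNION/OPTIONAL idea behind the DBLP query.
# Data and the authors_of helper are hypothetical illustrations.

AUTHOR_PREDICATES = ("swrc:author", "foaf:maker", "dc:creator")

graph = [
    ("ex:paper1", "swrc:author", "ex:alice"),
    ("ex:paper1", "foaf:maker", "ex:alice"),   # same fact, other vocabulary
    ("ex:paper1", "dc:creator", "ex:bob"),
    ("ex:paper2", "dc:creator", "ex:carol"),
    ("ex:alice", "swrc:affiliation", "ex:universityX"),
]


def authors_of(article, graph):
    """Return {author: affiliation or None} for one article."""
    found = {}
    # UNION: accept any of the alternative authorship predicates.
    for s, p, o in graph:
        if s == article and p in AUTHOR_PREDICATES:
            found.setdefault(o, None)
    # OPTIONAL: fill in an affiliation where one is recorded.
    for s, p, o in graph:
        if s in found and p == "swrc:affiliation":
            found[s] = o
    return found


paper1_authors = authors_of("ex:paper1", graph)
print(paper1_authors)
```

Authors without a recorded affiliation stay in the result with `None`, which mirrors how OPTIONAL leaves a variable unbound instead of dropping the row.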
    • 37. Example: Dutch census data (1795 – 1971)
    • 38. 40,745,554,078 triples; semantically rich
    • 39. Who’s doing it?
    • 40. The World Bank is also doing it! http://data.worldbank.org/ 7,000 indicators from World Bank data sets.
    • 41. The US gov is also doing it! http://data.gov/ : 390,000 data sets. Compare foreign aid budgets. Does tax influence smokers? Compare campaign money.
    • 42. Everybody’s doing it! Already many billions of facts & rules. May ’09 estimate: > 4.2 billion triples + 140 million interlinks. It gets bigger every month
    • 43. It gets bigger every month
    • 44. And many more: Reuters; New York Times; EU (EUROSTAT, others); BBC; Facebook; …
    • 45. So how good is this observational instrument? Studies on validity (e.g. in science dynamics); methods for provenance & trust; methods for attribution & citation
    • 46. For real? “use the power of information to explore social and economic life on Earth”; €1bn over 10 years
    • 47. Phew…
    • 48. Take home message: use the web & the web-of-data to obtain your data; use the web-of-data to share your data; yes, you can do this too! Collaborate with computer scientists; reflect on deeper consequences for the social sciences (methodological, theoretical, etc.)
    • 49. Acknowledgements: I’ve freely used material from the work of Shenghui Wang, Paul Groth, Julie Birkholz, Wouter van Atteveldt, Laurens van Rietveld, Rinke Hoekstra and many in the Semantic Web community