Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
In search of lost knowledge: joining
the dots with Linked Data
Jon Blower

j.d.blower@reading.ac.uk
Department of Meteorol...
instrument
algorithm

platform

dataset
scientists

publication
How do you fill in the blanks?
• Hope the documentation contains pointers
• … to the right versions of things
• … and usin...
Problems
1. Crossing communities is very hard
www.fooddeserts.org
Problems
1. Crossing communities is very hard
2. Lots of useful stuff isn’t formally documented
Lots of useful negative results, which aren’t published
Nearly 1/3 of positive results are false

1000 hypotheses

100 are...
Problems
1. Crossing communities is very hard
2. Lots of useful stuff isn’t formally documented
3. Unknown unknowns…
“There is a spurious decline in low-cloud fraction in
the ISCCP cloud database due to the viewing angle.
It’s in the liter...
Consequences
• We use the data that are most easily
available, not necessarily the best
• Constant re-discovery of the sam...
The “Web of Data”
instrument
algorithm

platform

dataset
scientists

publication
The Web is the
“Web of Documents”
What is being cited?
Why?
Fingers crossed!
• We hope that everything has a document
written about it, or we have nothing to cite
• … and we hope som...
“If … the Web made all the
online documents look like
one huge book, [the

Sir Tim Berners-Lee

Semantic Web] will
make al...
Why is the Semantic Web different?
• Focuses on things, not documents-aboutthings
• Links things together in a meaningful ...
1. Give things unique identifiers
Must be globally unique and persistent
http://www.reading.ac.uk/people/jon.blower
might ...
2. Express relationships between
things
The key concept is the triple
subject predicate object
E.g. “MODIS is carried by t...
Examples of realistic triples
(identifiers are illustrative, not necessarily correct!)

http://www.reading.ac.uk/people/jo...
Aside: how to use the Semantic Web
to ruin jokes
“A hundred kilopascals go into a bar”
“100000”^^unit:Pascal
owl:sameAs
“1...
3. Publish your triples
• Can be as simple as putting a document in the
right format on your website
– (triples can be exp...
4. Profit!
• Everyone can publish their own “view of the world”
– Don’t need a single data structure “to rule them all”

•...
Example: SemantEco

Wildlife observations
Ecosystem impacts
Administrative areas
Pollution data and policies

“Find me all...
So what’s “Linked Data” then?
• The Web is the “web of documents”
• The Semantic Web is the “web of data”
• “Linked Data” ...
carries

produced

authored

produced

describes /
uses /
queries …

The “Web of Data”
Linking Open Data cloud diagram, by Richard
Cyganiak and Anja Jentzsch. http://lod-cloud.net/
CHARMe:

Sharing climate knowledge through
Linked Data
(Jan 2013 – Dec 2014)
From observations to decisions
Observations

Satellites

(obs. sometimes
used for model
initialization)

Climate
Record
Cr...
Where can users go for help?
• Scientific literature

– Huge, verbose and inaccessible to some communities
– Not well link...
How can climate data users decide whether a
dataset is fit for their purpose?
(N.B. We consider that “data quality” and “f...
“Commentary metadata”

37
Examples of commentary metadata
• Post-fact annotations, e.g. citations, ad-hoc comments and notes;
• Results of assessmen...
How will this be done?
• CHARMe will create
connected repositories of
commentary information

Data provider
website

rd
33...
Open Annotation
•

We are using Open Annotation for
recording commentary

– World Wide Web Consortium standard

•

Associa...
Where does CHARMe fit in?
Observations
Satellites

(obs. sometimes
used for model
initialization)

Climate
Record
Creation...
www.fooddeserts.org
Challenges
• Ensuring community adoption:

– We are “injecting” CHARMe capabilities into existing
websites used in the com...
What CHARMe will enable
(some examples)

Users:
- “Find me all the documents that have been written about this dataset”
- ...
What CHARMe will not enable
• “Give me the best dataset on sea surface temperature”
– The “best” dataset depends on the ap...
Beware of 5-star
ratings…

(words of wisdom from xkcd.com)

TERRIBLE
How will you use CHARMe?

• Search for datasets on existing data provider website
• Use the “CHARMe button” to view or rec...
“Significant events” viewer

•
•
•
•

Google Finance (above) matches stock prices with news events
CHARMe tool will match ...
Fine-grained commentary

• Will allow creation and discovery about specific parts of
datasets
• E.g. variables, geographic...
Jon Blower
Debbie Clifford
Tristan Quaife
Sam Williams

•
•

Using Linked Data to combine EO and socioeconomic data in 8 a...
Summary
•

Linked Data can join up information
from all over the Web

•

Makes disparate data sources
discoverable and pro...
Thank you!
j.d.blower@reading.ac.uk
@Jon_Blower
http://www.charme.org.uk
@MelodiesProject
In search of lost knowledge: joining the dots with Linked Data
In search of lost knowledge: joining the dots with Linked Data
In search of lost knowledge: joining the dots with Linked Data
In search of lost knowledge: joining the dots with Linked Data
In search of lost knowledge: joining the dots with Linked Data
Upcoming SlideShare
Loading in …5
×

In search of lost knowledge: joining the dots with Linked Data

889 views

Published on

These slides are from my seminar to the University of Reading Department of Meteorology, November 2013. They contain a (hopefully not very technical) introduction to the concepts of Linked Data and how we are applying them in the CHARMe project (http://www.charme.org.uk). In CHARMe we are using Open Annotation to connect users of climate data with community-generated "commentary information" that helps them to understand a dataset's strengths and weaknesses.

The slide notes contain some helpful context, so you might like to download the PPT file!

The slides are licensed as "Creative Commons Attribution 3.0", meaning that you can do what you like with these slides provided that you credit the University of Reading for their creation. See http://creativecommons.org/licenses/by/3.0/.

Published in: Technology, Education
  • Be the first to comment

  • Be the first to like this

In search of lost knowledge: joining the dots with Linked Data

  1. 1. In search of lost knowledge: joining the dots with Linked Data Jon Blower j.d.blower@reading.ac.uk Department of Meteorology Reading e-Science Centre National Centre for Earth Observation University of Reading With thanks to all CHARMe project partners!
  2. 2. instrument algorithm platform dataset scientists publication
  3. 3. How do you fill in the blanks? • Hope the documentation contains pointers • … to the right versions of things • … and using consistent terminology • In restricted communities, we can gather all information into a central location and harmonize it • OR we hope that a Google search throws up the right results
  4. 4. Problems 1. Crossing communities is very hard
  5. 5. www.fooddeserts.org
  6. 6. Problems 1. Crossing communities is very hard 2. Lots of useful stuff isn’t formally documented
  7. 7. Lots of useful negative results, which aren’t published Nearly 1/3 of positive results are false 1000 hypotheses 100 are true 900 are false Test accuracy 95% 95 true positives 5 false negatives 45 false positives 855 true negatives
  8. 8. Problems 1. Crossing communities is very hard 2. Lots of useful stuff isn’t formally documented 3. Unknown unknowns…
  9. 9. “There is a spurious decline in low-cloud fraction in the ISCCP cloud database due to the viewing angle. It’s in the literature, but you might not find it if you’re not specifically looking for it. For example if you’re using the data for model validation you may not spot the problem.” (Claire Barber, paraphrased)
  10. 10. Consequences • We use the data that are most easily available, not necessarily the best • Constant re-discovery of the same issues • Very hard to share information outside communities
  11. 11. The “Web of Data”
  12. 12. instrument algorithm platform dataset scientists publication
  13. 13. The Web is the “Web of Documents”
  14. 14. What is being cited? Why?
  15. 15. Fingers crossed! • We hope that everything has a document written about it, or we have nothing to cite • … and we hope someone writes a new document when “it” changes • We hope that we are citing the right document (i.e. the most authoritative etc)
  16. 16. “If … the Web made all the online documents look like one huge book, [the Sir Tim Berners-Lee Semantic Web] will make all the data in the world look like one huge database.” Photo by Susan Lesch, from Wikipedia
  17. 17. Why is the Semantic Web different? • Focuses on things, not documents-aboutthings • Links things together in a meaningful way • Information is readable by computers • Here’s how it works…
  18. 18. 1. Give things unique identifiers Must be globally unique and persistent http://www.reading.ac.uk/people/jon.blower might represent me http://nasa.gov/instruments/MODIS might represent the MODIS instrument –(these are illustrative, not accurate!)
  19. 19. 2. Express relationships between things The key concept is the triple subject predicate object E.g. “MODIS is carried by the Terra satellite” all three things need to be unique identifiers (we don’t use plain English!)
  20. 20. Examples of realistic triples (identifiers are illustrative, not necessarily correct!) http://www.reading.ac.uk/people/jon.blower http://xmlns.com/foaf/0.1/knows http://www.durham.ac.uk/staff/ed.llewellin http://dx.doi.org/12345.678910 http://purl.org/spar/cito/citesAsDataSource http://www.badc.ac.uk/datasets/sst
  21. 21. Aside: how to use the Semantic Web to ruin jokes “A hundred kilopascals go into a bar” “100000”^^unit:Pascal owl:sameAs “1”^^unit:Bar (Using OWL and QUDT ontologies)
  22. 22. 3. Publish your triples • Can be as simple as putting a document in the right format on your website – (triples can be expressed in many formats) • … or as complex as publishing a multi-billiontriple database – E.g. Ordnance Survey • Either way, your data become part of the Web, not just on the web
  23. 23. 4. Profit! • Everyone can publish their own “view of the world” – Don’t need a single data structure “to rule them all” • Use common identifiers and common vocabularies to link between communities • Different vocabularies can be mapped to each other – E.g. map CF standard names to GRIB codes • Gives a means to record provenance of information – “Direct line of sight from decisions to data” - NASA • Reasoning engines can traverse the graph of links and discover new information
  24. 24. Example: SemantEco Wildlife observations Ecosystem impacts Administrative areas Pollution data and policies “Find me all the sites where chloride pollution exceeds the local policy limits” “Show me the species over time in this region”
  25. 25. So what’s “Linked Data” then? • The Web is the “web of documents” • The Semantic Web is the “web of data” • “Linked Data” is really a set of principles for using the Semantic Web – http://www.w3.org/DesignIssues/LinkedData.html • (Many people use “Linked Data” and “Semantic Web” somewhat interchangeably)
  26. 26. carries produced authored produced describes / uses / queries … The “Web of Data”
  27. 27. Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
  28. 28. CHARMe: Sharing climate knowledge through Linked Data (Jan 2013 – Dec 2014)
  29. 29. From observations to decisions Observations Satellites (obs. sometimes used for model initialization) Climate Record Creation and Curation Climate Data Records Reports Analysis and applications Decision making Decisions, Policies Model results Climate Modelling (Adapted from Dowell et al., 2013), “Strategy Towards an Architecture for Climate Monitoring from Space”)
  30. 30. Where can users go for help? • Scientific literature – Huge, verbose and inaccessible to some communities – Not well linked to source data • Technical reports and conference proceedings – Hard to find, scattered or inaccessible • Data centres – increasingly strong at providing some important metadata, but don’t usually include community feedback – Not all countries and communities have data centres! • Websites and blogs – From CEOS Handbook to a scientist’s blog – Increasingly useful, but scattered
  31. 31. How can climate data users decide whether a dataset is fit for their purpose? (N.B. We consider that “data quality” and “fitness for purpose” are the same thing) Not specific to climate data!
  32. 32. “Commentary metadata” 37
  33. 33. Examples of commentary metadata • Post-fact annotations, e.g. citations, ad-hoc comments and notes; • Results of assessments, e.g. validation campaigns, intercomparisons with models or other observations, reanalysis; • Provenance, e.g. dependencies on other datasets, processing algorithms and chain, data source; • Properties of data distribution, e.g. data policy and licensing, timeliness (is the data delivered in real time?), reliability; • External events that may affect the data, e.g. volcanic eruptions, ElNino index, satellite or instrument failure, operational changes to the orbit calculations. General rule: information originates from users or external entities, not original data providers – However, sometimes information is not available from the data provider!
  34. 34. How will this be done? • CHARMe will create connected repositories of commentary information Data provider website rd 33rdparty party system system – Stored as triples in “CHARMe nodes”, • Information can be read and entered through websites or Web Services • Using principles of Open Linked Data and the Semantic Web CHARMe node CHARMe node CHARMe node
  35. 35. Open Annotation • We are using Open Annotation for recording commentary – World Wide Web Consortium standard • Associates a body with a target • E.g. a publication could be the body, an EO dataset could be the target • Can record the motivation behind an annotation – Bookmarking, classifying, commenting, describing, editing, highlighting, questioning, replying… (lots more) – Covers a lot of CHARMe use cases! • An annotation can have multiple targets – A key requirement from users
  36. 36. Where does CHARMe fit in? Observations Satellites (obs. sometimes used for model initialization) Climate Record Creation and Curation Climate Modelling (e.g. CMIP5) Climate Data Records Reports Analysis Analysis and and applications applications Decision making Decisions, Policies Model results Supports analysts and scientists in production of information for decision-makers
  37. 37. www.fooddeserts.org
  38. 38. Challenges • Ensuring community adoption: – We are “injecting” CHARMe capabilities into existing websites used in the community – We will “seed” the CHARMe system with information to attract users (e.g. links between publications and datasets) • Ensuring quality of commentary metadata – Moderation is a strong user requirement – We will provide guides to creators of commentary – what makes a comment helpful?
  39. 39. What CHARMe will enable (some examples) Users: - “Find me all the documents that have been written about this dataset” - “… in both peer-reviewed journals and the grey literature” - “… and specifically about precipitation in Africa” - “… in both NEODC’s and Astrium’s archives” - “What factors might affect the quality of this dataset?” e.g. upstream datasets, external events - “What have other users already discovered about this dataset?” - “I want to find information related to the dataset I’m looking at” Data providers: - “Who is using my dataset and what are they saying about it?” - “Let me subscribe to new user comments and reply to them”
  40. 40. What CHARMe will not enable • “Give me the best dataset on sea surface temperature” – The “best” dataset depends on the application • CHARMe will not provide a new “quality stamp” for datasets – But will be able to link to such things if other people publish them, e.g. Bates maturity index, QA4EO certification • CHARMe will not provide access to actual data – (but will enable discovery of data) • Not planning to create (another) “one-stop shop” for information – We want the information to appear where users are already looking
  41. 41. Beware of 5-star ratings… (words of wisdom from xkcd.com) TERRIBLE
  42. 42. How will you use CHARMe? • Search for datasets on existing data provider website • Use the “CHARMe button” to view or record commentary on a particular dataset
  43. 43. “Significant events” viewer • • • • Google Finance (above) matches stock prices with news events CHARMe tool will match climate reanalysis data with “significant events” – Algorithm changes, instrument failures, new data sources Will allow user annotations on the data and events Will be available on the Web
  44. 44. Fine-grained commentary • Will allow creation and discovery about specific parts of datasets • E.g. variables, geographic locations, time ranges • cf http://maphub.github.io
  45. 45. Jon Blower Debbie Clifford Tristan Quaife Sam Williams • • Using Linked Data to combine EO and socioeconomic data in 8 application areas 16-partner EU FP7 project, just started! – Coordinated from Reading
  46. 46. Summary • Linked Data can join up information from all over the Web • Makes disparate data sources discoverable and processable • CHARMe is using Linked Data techniques to help users of climate data to connect with all the experience in the community – Techniques are not specific to climate data • MELODIES project will combine environmental data and socioeconomic data to develop real-world services using Linked Data
  47. 47. Thank you! j.d.blower@reading.ac.uk @Jon_Blower http://www.charme.org.uk @MelodiesProject

×