Dealing with Data Diversity in a Smart City Data Hub

Keynote speech at the Semantics for Smarter Cities workshop SSC 2014 at ISWC 2014

  1. 1. Dealing with Data Diversity in a Smart City Data Hub Mathieu d'Aquin - @mdaquin slideshare.net/mdaquin Knowledge Media Institute, The Open University
  2. 2. Diversity where a penguin is a dataset
  3. 3. Why should we care about diversity? Because diversity is good, and what makes data diverse is not the same as what makes it more or less relevant
  4. 4. Why should we care about diversity? Because it is hard to manage. How many species of penguins/animals/things? How many biologists to classify them? And that's purely static... unlike species, new data appear all the time...
  5. 5. Why should we care about diversity? The Eskimo language has 255 different words for "visiting linguist" Because we might have a lot of it, or what we need to manage is very granular
  6. 6. Data diversity in a Smart City Example of the MK:Smart project in Milton Keynes, UK (mksmart.org)
  7. 7. Data diversity in a Smart City Partners in the MK:Smart project
  8. 8. Data diversity in a Smart City Areas of the MK:Smart project
  9. 9. Data diversity in a Smart City MK Data Hub - Where diversity is handled
  10. 10. A concrete example Wifi-based presence sensors
  11. 11. A concrete example Wifi-based presence sensors 10-12 sensors can cover a reasonably large enclosed area (here, the refectory of the Open University).
  12. 12. A concrete example Wifi-based presence sensors Use triangulation to find the location of wifi-enabled devices.
  13. 13. A concrete example Wifi-based presence sensors Basic statistical analysis to extract patterns of usage of the facility
  14. 14. A concrete example Wifi-based presence sensors Basic statistical analysis to extract patterns of usage of the facility
  15. 15. A concrete example: Diversity
  16. 16. A concrete example: Diversity
  17. 17. A concrete example: Diversity
  18. 18. A concrete example: Diversity
  19. 19. A concrete example: Diversity
  20. 20. A concrete example: Diversity
  21. 21. How do we usually deal with this data heterogeneity? We use alignments, mappings, links, etc. Example: The LinkedUp Catalogue of datasets for education includes mappings between the vocabularies of different datasets data.linkededucation.org/linkedup/catalogue/
  22. 22. What about diversity at the policy level?
  23. 23. What about diversity at the policy level?
  24. 24. What about diversity at the policy level?
  25. 25. What about diversity at the policy level?
  26. 26. More structured representation VoID and DC to represent datasets, PROV-O for basic provenance.
  27. 27. More structured representation ODRL for the structured representation of policies and rights
  28. 28. More structured representation With the tools to deal with it
  29. 29. More structured representation And the processes
  30. 30. Reasoning on the way policy-information propagates Requires an appropriate representation of dataflows
  31. 31. DataNode http://purl.org/datanode/ns/ An ontology of relationships between data artifacts (DataNodes).
  32. 32. DataNode Captures the essence of dataflows rather than the process, as a basis for meta-information propagation.
  33. 33. Propagating meta-information across dataflows Examples of rules: Duties such as attribution propagate over relations of derivation, but not necessarily over others. Permissions such as the right to redistribute, however, do not propagate over relations of derivation, except in specific cases (e.g. copies). Prohibitions such as preventing commercial exploitation propagate over derivations.
  34. 34. Discussion/future A lot of the semantics for Smart Cities work focuses on data heterogeneity. There is a need to look at data diversity at the meta-information level (here we focus on policy-related information). How to manage, catalogue, keep track of and manipulate a large number of datasets with diverse rights, access, validity, scope? How do we help users/developers in exploring and exploiting this diversity...
  35. 35. Discussion/future Master of Datasets
  36. 36. Discussion/future Need for a clear, semantic (i.e. ontological) foundation for describing and defining data artefacts. DataNode is a step towards defining their relationships. Vocabularies such as ODRL and VoID focus on specific aspects. More is needed to formally represent the fundamental descriptors of data (scope, validity, policy, ...)
  37. 37. Thanks! Mathieu d'Aquin Alessandro Adamou Enrico Daga Shuangyan Liu Keerthi Thomas Enrico Motta
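
The example propagation rules from slide 33 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MK Data Hub implementation: the relation labels (`derivation`, `copy`, `other`), the `Policy` class and both functions are hypothetical names and are not terms from the DataNode ontology or ODRL. The sketch also simplifies "not necessarily others" to "never over other relations", and treats a copy as a special case of derivation.

```python
from dataclasses import dataclass

# Hypothetical relation labels (assumptions, NOT DataNode ontology terms).
DERIVATION = "derivation"
COPY = "copy"    # treated here as a special case of derivation
OTHER = "other"  # any relation that is not a derivation


@dataclass(frozen=True)
class Policy:
    kind: str    # "duty", "permission" or "prohibition"
    action: str  # e.g. "attribute", "redistribute", "commercial-use"


def propagates(policy, relation):
    """Return True if the policy carries over the given relation."""
    is_derivation = relation in (DERIVATION, COPY)
    if policy.kind in ("duty", "prohibition"):
        # Duties and prohibitions follow derivations (simplified: only those).
        return is_derivation
    if policy.kind == "permission":
        # Permissions do not follow derivations, except for copies.
        return relation == COPY
    return False


def propagate(policies, edges, source):
    """Collect inherited policies for every node reachable from source.

    edges is a list of (source_node, relation, target_node) triples.
    """
    result = {source: set(policies)}
    frontier = [source]
    while frontier:
        node = frontier.pop()
        for src, relation, dst in edges:
            if src != node:
                continue
            inherited = {p for p in result[node] if propagates(p, relation)}
            merged = result.get(dst, set()) | inherited
            if dst not in result or merged != result[dst]:
                result[dst] = merged
                frontier.append(dst)
    return result


# Made-up dataflow: a raw dataset, a mirror copy, derived statistics,
# and a catalogue entry that merely describes the statistics.
attribution = Policy("duty", "attribute")
redistribute = Policy("permission", "redistribute")
non_commercial = Policy("prohibition", "commercial-use")

edges = [
    ("raw", COPY, "mirror"),
    ("raw", DERIVATION, "stats"),
    ("stats", OTHER, "catalogue-entry"),
]
result = propagate({attribution, redistribute, non_commercial}, edges, "raw")
```

In this made-up dataflow the mirror keeps all three policies, the derived statistics keep the attribution duty and the non-commercial prohibition but lose the permission to redistribute, and nothing reaches the catalogue entry.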
