BETTER WITH DATA:
A CASE STUDY IN SOURCING LINKED DATA
INTO A BUSINESS INTELLIGENCE ANALYSIS
Amin Chowdhury
Charles Boisvert
Matthew Love
Ian Ibbotson
TLAD 2015
13th International Workshop on Teaching, Learning and
Assessment of Databases (TLAD) Conference,
Birmingham City University
Sourcing Linked Data
into a Business Intelligence analysis
Can students apply more than one
technology at a time?
• Early barriers prevent access to later work
• Limited time
• Need to measure performance
• Cocktail effect
We need carefully worked case studies
We use Open data to look into the relationship
between weather conditions and levels of air
pollution.
This is a case using a range of practices:
• Finding and accessing Open Data
• Exploring Linked Data
• Sections of the Extract-Transform-Load
processes of data warehousing
• Building an analytic cube
• Application of data mining tools
Links provided for the data sources and tools.
Our case study: Air pollution kills
Estimated 29,000 early deaths each year in the UK (PHE).
Government targets for reducing the quantities and/or
frequencies of the main pollutants (some figures given
below).
Local Authorities monitor and publish pollution levels in
their areas.
Sheffield City Council monitoring devices:
• Diffusion tubes
• Fully automated processing units.
Measuring pollution
Nitrogen Dioxide diffusion tube
Around 160 diffusion tube devices
Diffusion tubes:
• Are spread throughout the city area
• Have to be sent in for analysis
• Yield data every six to eight weeks per tube
• Results are published as an aggregated annual level
Measuring pollution
6 automated stations
• A.k.a. Groundhogs
• Fixed spots (sort of)
• Measure a variety of pollutants
• Plus temperature and air pressure (from ‘groundhog 1’)
• Frequent readings (several per hour) when the station is working
• Log is publicly available
• 15-year archive, with gaps
• Some post-editing: deletions, correction of outliers.
Data is available
Sheffield City Council web sites:
• Air Quality:
https://www.sheffield.gov.uk/environment/air-quality/monitoring.html
• Air Pollution Monitoring:
http://sheffieldairquality.gen2training.co.uk/sheffield/index.html
Good things:
• Automated station results
• We can select a range, choose a format (PostScript, CSV, Excel) and download.
• Data is human-readable (ish)
Is it open?
Like so much data sourced from the Internet…
• Textual descriptions
• No obvious way of automatically deriving further information.
Open data: the idea that certain data should be freely available to everyone to
use and republish as they wish (wikipedia.org/Open_data).
• e.g. Groundhog1 is at “Orphanage Road, Firhill”: where is that? What is it like?
Is it open?
• Navigation not designed for automation.
• URL does not reflect the name of the Groundhog
• On Sir Tim Berners-Lee’s 5-star open data scale, this rates 3 / 5.
• We want automated discovery by data harvesting tools.
• Plus: how flexibly can users contribute to the data?
• How good is the metadata (licensing, quality…)?
Figure (image: 5stardata.info): available, downloadable, open format; no API, no automatic discovery.
Wanted: automated discovery and consumption.
Diagram: C Boisvert has office 9327, tel 1234, position Senior Lecturer.
Linked Data:
• Store everything as triples
• Rather than primary keys, use URIs: PKs are unique in one table of one system; URIs are unique worldwide.
• Form ‘chains’ from point to point through the graph database.
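A minimal sketch of the triple idea in plain Python, not the AQ+ implementation. The `http://example.org/` URIs, the predicate names and the "Example Building" node are hypothetical, invented only to show how a chain is followed from point to point.

```python
# Illustrative only: every URI below is a hypothetical placeholder.
EX = "http://example.org/"
STAFF = EX + "staff/c-boisvert"
ROOM = EX + "room/9327"
BUILDING = EX + "building/example"  # invented node, used to show a two-step chain

# Each fact is one (subject, predicate, object) triple.
triples = [
    (STAFF, EX + "name", "C Boisvert"),
    (STAFF, EX + "office", ROOM),
    (STAFF, EX + "tel", "1234"),
    (STAFF, EX + "position", "Senior Lecturer"),
    (ROOM, EX + "inBuilding", BUILDING),
    (BUILDING, EX + "label", "Example Building"),
]


def match(store, s=None, p=None, o=None):
    """Return the triples that fit a partial pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def follow(store, start, *predicates):
    """Follow a chain of predicates from `start`; return the set of end nodes."""
    nodes = {start}
    for predicate in predicates:
        nodes = {t[2] for n in nodes for t in match(store, s=n, p=predicate)}
    return nodes


# Chain: person -> office -> building
end_nodes = follow(triples, STAFF, EX + "office", EX + "inBuilding")
```

Because the office is itself a URI rather than a literal, it can be the subject of further triples, which is what makes the chain possible.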
Air Quality+:
Linked Data for Sheffield Pollution
https://github.com/BetterWithDataSociety
• A database of Sheffield pollution
measurements as linked data.
• Each Groundhog has its own URI
• Diverse measures, e.g. NO2, SO2, micro-particles (e.g. diesel
fumes), air pressure, air temperature.
• Measurements are archived in the database as triples.
• The ontology allows all but literal values to be further
investigated, for instance to find out more about the NO2
compound.
• Allows machine discovery to add context to data, e.g. the
type of neighbourhood of each of the Groundhog sites.
AQ+ linked data
SPARQL
To query the Subject / Predicate / Value triples in the database, we
use the SPARQL query language.
Specify a partial triple to return all records that fit that context.
Filter – e.g. return values within a selected date range.
Discover programmatically:
• What Groundhogs there are
• What pollutants each monitors
• The readings of those pollutants
The AQ+ endpoint offers multiple result formats, e.g. CSV, JSON, XML.
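A hedged sketch of a partial-triple query with a date filter, built as a string in Python. The endpoint URL and the `aq:` vocabulary (`aq:station`, `aq:pollutant`, `aq:time`, `aq:value`) are assumptions for illustration; the real AQ+ names are not given in these slides. The code only builds the query and the request URL, it does not contact any server.

```python
from urllib.parse import urlencode

# Assumptions, not taken from the slides: endpoint URL and aq: vocabulary.
ENDPOINT = "http://example.org/aqplus/sparql"
PREFIXES = (
    "PREFIX aq: <http://example.org/aqplus/ont#>\n"
    "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n"
)


def readings_query(start, end, limit=100):
    """Build a SELECT query for readings with start <= time < end."""
    return PREFIXES + f"""SELECT ?station ?pollutant ?time ?value
WHERE {{
  ?reading aq:station ?station ;
           aq:pollutant ?pollutant ;
           aq:time ?time ;
           aq:value ?value .
  FILTER (?time >= "{start}"^^xsd:dateTime && ?time < "{end}"^^xsd:dateTime)
}}
ORDER BY ?time
LIMIT {limit}"""


def request_url(query):
    """URL for a SPARQL-protocol GET request (query sent as the 'query' parameter)."""
    return ENDPOINT + "?" + urlencode({"query": query})


query = readings_query("2014-01-01T00:00:00", "2014-02-01T00:00:00")
url = request_url(query)
```

The variables (`?station`, `?pollutant`, ...) are the unspecified parts of each triple pattern; the `FILTER` line restricts results to the selected date range. Under the SPARQL protocol a result format is commonly requested through the HTTP `Accept` header, though how AQ+ selects formats is not stated here.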
SPARQL Editor
boisvert.me.uk/opendata/sparql_aq+.html
Hourly readings from all available
Groundhogs between selected dates
• Editing
• SPARQL syntax highlighted
• interpreted on AQ+ endpoint
Further data sources
A lucky strike:
• Local enthusiast
• Weather station readings at five-minute intervals
• In PDF format - 200 pages per month!
• Converted from PDF to CSV with Bytescout
Giving added context to facts, through Dimension descriptors added
from other sources.
• From Groundhog1 – temperature and air pressure
• But no data on other factors - wind strength & direction, humidity
Surely these influence pollution formation and/or dispersal? We need
detailed historic weather data; not cheap.
Licensing rights to this data have not been decided in general. Ask permission before using the
data for study purposes (any commercial use of the data could cause the site to be closed).
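Before five-minute weather readings can add context to hourly pollution readings, the two series need a common time grain. The following is a minimal sketch under assumed inputs: the `(timestamp, value)` layout and all sample numbers are invented for illustration and are not real measurements.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def hourly_means(readings):
    """Average (ISO timestamp, value) readings into {hour_start: mean value}."""
    buckets = defaultdict(list)
    for timestamp, value in readings:
        # Truncate each timestamp to the start of its hour.
        hour = datetime.fromisoformat(timestamp).replace(
            minute=0, second=0, microsecond=0
        )
        buckets[hour].append(value)
    return {hour: mean(values) for hour, values in buckets.items()}


def join_on_hour(pollution, weather):
    """Inner join of two {hour: value} dicts; hours missing on either side are dropped."""
    shared = sorted(pollution.keys() & weather.keys())
    return {hour: (pollution[hour], weather[hour]) for hour in shared}


# Invented sample values, for illustration only.
wind_speed = [
    ("2014-03-01T10:00:00", 4.0),
    ("2014-03-01T10:05:00", 6.0),
    ("2014-03-01T11:00:00", 2.0),
]
no2 = [
    ("2014-03-01T10:00:00", 40.0),
    ("2014-03-01T12:00:00", 30.0),
]

joined = join_on_hour(hourly_means(no2), hourly_means(wind_speed))
```

An inner join is one possible choice; it silently drops hours where either source has a gap, which matters for an archive known to have gaps.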
INTEGRATION OF FURTHER DATA SOURCES
• Microsoft SQL Server Data Warehouse
• ETL processes
• Data Cube from Data Star (star schema)
• Business Intelligence with MS Analysis Services
• Data Mining
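The star-to-cube step can be sketched as one fact table surrounded by dimension tables, with a roll-up query over the dimensions. This sketch uses Python's built-in `sqlite3` as a stand-in, not Microsoft SQL Server or Analysis Services; the table names, column names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical star schema: one fact table keyed to three dimensions.
conn.executescript("""
    CREATE TABLE dim_station   (station_id   INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_pollutant (pollutant_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_time      (time_id      INTEGER PRIMARY KEY, day TEXT, hour INTEGER);
    CREATE TABLE fact_reading (
        station_id   INTEGER REFERENCES dim_station(station_id),
        pollutant_id INTEGER REFERENCES dim_pollutant(pollutant_id),
        time_id      INTEGER REFERENCES dim_time(time_id),
        value        REAL
    );
""")

# Invented sample rows, for illustration only.
conn.execute("INSERT INTO dim_station VALUES (1, 'Groundhog1')")
conn.execute("INSERT INTO dim_pollutant VALUES (1, 'NO2')")
conn.executemany(
    "INSERT INTO dim_time VALUES (?, ?, ?)",
    [(1, "2014-03-01", 10), (2, "2014-03-01", 11)],
)
conn.executemany(
    "INSERT INTO fact_reading VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 40.0), (1, 1, 2, 50.0)],
)


def daily_average(connection):
    """Roll hourly facts up to one average per station, pollutant and day."""
    return connection.execute("""
        SELECT s.name, p.name, t.day, AVG(f.value)
        FROM fact_reading AS f
        JOIN dim_station   AS s ON s.station_id   = f.station_id
        JOIN dim_pollutant AS p ON p.pollutant_id = f.pollutant_id
        JOIN dim_time      AS t ON t.time_id      = f.time_id
        GROUP BY s.name, p.name, t.day
        ORDER BY s.name, p.name, t.day
    """).fetchall()


rows = daily_average(conn)
```

Grouping by a coarser column of a dimension (day rather than hour) is the same roll-up operation that a cube browser performs interactively.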
Data Warehouse
Creation of Data Cube from Data Star
Analysis and PowerPivot Exporting
Self Service Data Exploration
Data Mining
Cluster Data Mining Tool
Ranked by probability
Comparison of properties of cluster 9
Decision Trees
http://aces.shu.ac.uk/teaching.cmsrml/AirQuality
Teaching resources
Questions?
