Beyond the Query: Transforming Air Quality Data Discovery with AI
David Topping
david.topping@manchester.ac.uk
User needs mapping approach
• Build a set of user stories describing the needs of users throughout their journey of using environmental data.
• Build a set of user archetypes. These describe what different users are setting out to achieve, what steps they go through to achieve these, and important challenges faced by these groups.
Air quality within wider environmental science
https://www.digital-solutions.uk/wp-content/uploads/2023/09/NERC-DSH-Report-ODM-FINAL.pdf
Stakeholders - Who are we talking to?
Several common challenges emerged, including:
• a lack of suitable data for the task in hand, with many datasets inaccessible because of poor discoverability or paywalls;
• the proliferation and confusing nature of different platforms requiring bespoke access approaches, inchoate data formats and methods of retrieval, and unclear provenance;
• having to work with locked-down systems due to security or data protection requirements;
• the lack of a coherent, centralised data management infrastructure, the prevalence of legacy datasets, and difficulty in getting people to share data openly.
A Unique Property Reference Number (UPRN) exists for every addressable location across the UK. The location may be any kind of building, or an object that might not have a 'normal' address, such as a bus shelter.
UPRN tagging for geospatial data
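To illustrate why a shared location identifier such as the UPRN eases the integration of health and air-quality data, the Python below joins a hypothetical air-quality table to a hypothetical table of pseudonymised health records on UPRN. This is a minimal sketch, not a described pipeline: every UPRN, pseudonym, field name and concentration is invented, and real linkage of this kind would run inside a secure, access-controlled environment.

```python
from dataclasses import dataclass
from typing import Optional

# All UPRNs, pseudonyms and concentrations below are invented for illustration.
@dataclass
class AirQualityEstimate:
    uprn: int
    no2_annual_mean_ugm3: float  # hypothetical modelled NO2 at the address

@dataclass
class HealthRecord:
    pseudonym: str  # pseudonymised ID; real linkage would run in a secure environment
    uprn: int

def link_on_uprn(
    aq: list[AirQualityEstimate], health: list[HealthRecord]
) -> list[tuple[str, int, Optional[float]]]:
    """Attach the air-quality estimate for the same UPRN to each health record.

    Unmatched UPRNs get None, so gaps in coverage stay visible instead of
    being silently dropped.
    """
    by_uprn = {row.uprn: row.no2_annual_mean_ugm3 for row in aq}
    return [(rec.pseudonym, rec.uprn, by_uprn.get(rec.uprn)) for rec in health]

aq = [AirQualityEstimate(100000000001, 21.5), AirQualityEstimate(100000000002, 34.0)]
health = [HealthRecord("P1", 100000000002), HealthRecord("P2", 100000000003)]

for row in link_on_uprn(aq, health):
    print(row)
```

Because both tables carry the same identifier, the join is a single dictionary lookup per record; without a shared key, the same task would need fuzzy address matching.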
https://satre-specification.readthedocs.io/en/stable/
Data data everywhere - Technologies and standards to enable health and air-quality data integration
The Current AQ Data Landscape: “Rich Data, Poor Access”?
• Multiple, uncoordinated sources: AURN (Automatic Urban and Rural Network), local authority (LA) monitoring, research networks, private sensors.
• No single point of discovery and no shared metadata standards.
• Responsibility for coordination is unclear: DEFRA, devolved administrations, UKRI, LAs and research centres all play roles.
• Some datasets are now geotagged, e.g. in BioBank, and used 'as standard'.
1. Filing Cabinet Era
2. Shared Drive Era
3. Cloud Era
4. Federated Era
We’re between stages 2 and 3: rich data and some shared systems, but not yet discoverable or interoperable.
“Information is often held in people’s heads, not systems.” – NERC DSH user research
The biggest friction isn’t just technical; it’s cultural and organisational.
Key Challenges (table columns: Theme, Issue, Possible Solutions)
AI placing renewed emphasis on data access and potential solutions
Taxonomy of AI regulatory approaches, on a spectrum from flexibility to stricter controls:
• Principles based
• Standards based
• Agile and experimentalist
• Facilitating and enabling
• Adapting existing laws
• Access to information and transparency mandates
• Risk based
• Rights based
• Liability
Examples on this spectrum: DSIT's “A pro-innovation approach to AI regulation” and the EU’s AI Act.
"....due to digital divides within countries,
the development and use of specific AI systems may likely produce enormous returns for a few powerful people and simultaneously generate significant adverse effects for the general population and marginalized populations....legislators should discuss and explore regulatory instruments..including:
....
i. Access to data.
....
"
Data data everywhere - focus is on ensuring access to data
Data data everywhere - Leveraging AI to improve search and discovery
[Figure: metadata catalogue]
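The idea of using AI to improve search over a metadata catalogue can be sketched, very roughly, in Python. The ranking here is a deliberately simple, swapped-in technique: bag-of-words cosine similarity between a free-text query and each record's title and keywords. An AI-based system would more likely compare learned embeddings, but the pipeline has the same shape: represent the query and records, score them, rank. The catalogue entries and the `search` helper are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical catalogue entries; ids, titles and keywords are invented.
CATALOGUE = [
    {"id": "aq-001", "title": "Hourly NO2 from urban roadside monitors",
     "keywords": ["nitrogen dioxide", "roadside", "hourly"]},
    {"id": "aq-002", "title": "Low-cost PM2.5 sensor network, city centre",
     "keywords": ["particulate matter", "pm2.5", "sensors"]},
    {"id": "aq-003", "title": "Modelled annual mean ozone at 1 km resolution",
     "keywords": ["ozone", "model", "annual"]},
]

def tokens(text: str) -> Counter:
    """Lower-case word counts (keeps tokens like 'pm2.5' whole).

    A crude stand-in for the learned text representation an AI system would use.
    """
    return Counter(re.findall(r"[a-z0-9]+(?:\.[a-z0-9]+)*", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def search(query: str, catalogue=CATALOGUE, top_k: int = 2) -> list[str]:
    """Return ids of the best-matching records, dropping records with no overlap."""
    q = tokens(query)
    scored = []
    for rec in catalogue:
        doc = tokens(rec["title"] + " " + " ".join(rec["keywords"]))
        scored.append((cosine(q, doc), rec["id"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [rec_id for score, rec_id in scored[:top_k] if score > 0]

print(search("roadside nitrogen dioxide"))
```

The quality of any such ranking depends on the metadata being there in the first place: a record with an empty title and no keywords cannot be found, however capable the matching model is.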
Opportunities: Towards a Federated, Discoverable System
[Figure: metadata catalogue]
We don’t need a single owner; we need a consolidated way to find and use air quality data.
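One way to picture “no single owner, but a consolidated way to find data” is a thin federation layer: each provider keeps its own data and native format, and small adapters map records into a shared, minimal metadata schema used only for discovery. The Python below is a minimal sketch under that assumption. The source layouts, field names and the `CommonRecord` schema are all hypothetical and do not describe the real AURN, local authority or research-network systems.

```python
from dataclasses import dataclass
from typing import Iterator

@dataclass(frozen=True)
class CommonRecord:
    """Hypothetical minimal shared metadata schema used across all sources."""
    source: str
    dataset_id: str
    pollutant: str
    provenance: str  # who produced the data; carried through so provenance is not lost

# Each source keeps its own native layout. These rows and field names are invented.
AURN_LIKE = [{"site_code": "S1", "species": "NO2", "operator": "network operator"}]
LA_LIKE = [{"ref": "LA-7", "param": "no2", "council": "example council"}]
RESEARCH_LIKE = [{"doi": "example-doi", "measured": ["PM2.5", "NO2"], "group": "research group"}]

def from_aurn_like(rows) -> Iterator[CommonRecord]:
    for r in rows:
        yield CommonRecord("AURN", r["site_code"], r["species"].upper(), r["operator"])

def from_la_like(rows) -> Iterator[CommonRecord]:
    for r in rows:
        yield CommonRecord("LA", r["ref"], r["param"].upper(), r["council"])

def from_research_like(rows) -> Iterator[CommonRecord]:
    for r in rows:
        for pollutant in r["measured"]:  # one dataset may cover several pollutants
            yield CommonRecord("Research", r["doi"], pollutant.upper(), r["group"])

# The federation layer holds adapters, not a central copy of the data.
ADAPTERS = [
    (from_aurn_like, AURN_LIKE),
    (from_la_like, LA_LIKE),
    (from_research_like, RESEARCH_LIKE),
]

def federated_search(pollutant: str) -> list[CommonRecord]:
    """Query every source through its adapter and merge the matches."""
    wanted = pollutant.upper()
    results: list[CommonRecord] = []
    for adapter, rows in ADAPTERS:
        results.extend(rec for rec in adapter(rows) if rec.pollutant == wanted)
    return results

for rec in federated_search("no2"):
    print(rec.source, rec.dataset_id, rec.provenance)
```

Adding a new provider means writing one adapter rather than migrating data, which is the design choice that lets ownership stay distributed.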
Summary: lots of positive work, but do we need to do better in a number of areas?
Continuing demonstration of data science technologies in enabling data discovery. This is likely to continue and move towards automated workflows.
There are existing barriers to data access: a fundamental problem.
• Cultural change is needed.
• AI will help with search and discovery, but does not negate the need for data provenance with respect to regulations and standards.
• Are roles and responsibilities clear? No.
No longer an isolated academic area of work: service provision is evolving.
• Partnerships with technology providers will be essential.

11:20 Beyond the Query: Transforming Air Quality Data Discovery with AI (D Topping)
