Semantic Integration at large scale

Semantic Integration at large scale: benefits and challenges
Maria-Cristina Marinescu, Barcelona

Published in: Technology

  1. 1. Semantic integration at large scale: benefits and challenges. Maria-Cristina Marinescu, Barcelona Supercomputing Center
  2. 2. What is a Smart City?
  3. 3. Smart Cities: SUSTAINABLE, GREEN, SECURE, LIVEABLE, SELF-SUFFICIENT, RESILIENT, EQUITABLE
  4. 4. Smart Cities: SUSTAINABLE, GREEN, SECURE, LIVEABLE, SELF-SUFFICIENT, RESILIENT, EQUITABLE. Right?
  5. 5. Smart Cities?
       • BETTER, FASTER MAIN AVENUE
         - … may split neighbourhoods
       • NEW HOUSING IN PREVIOUSLY RUN-DOWN NEIGHBOURHOOD
         - … housing may become unaffordable for previous residents
         - … neighbourhoods lose identity, friends/family move apart
         - … existing local issues spread globally (crime, STDs – Baltimore housing projects, ...)
       • CITY FOCUSES ON LOCAL PRODUCE / RURAL REGION SPECIALIZES IN FEW HIGHLY DEMANDED PRODUCTS
         - … drought or pests may hit a crop → there's no backup plan for the rural region, and a delayed response in the city
       • TRAM TO AVOID TRAFFIC JAMS AND RUN DIRECTLY
         - … may gridlock car/public transport traffic that runs in a different direction
       • WALKABLE CITIES
         - … commercial traffic worsens, no place to stop
         - … less parking space for residents
         - … possibly in places where people don't usually walk! (e.g. steep hills)
       • CHANGING / DAMMING WATER COURSE FOR CITIES
         - … ecosystem changes
         - … displaced people
  6. 6. Smart(er) Cities
       • BETTER, FASTER MAIN AVENUE
         - … may split neighbourhoods
       • NEW HOUSING IN PREVIOUSLY RUN-DOWN NEIGHBOURHOOD
         - … housing may become unaffordable for previous residents
         - … neighbourhoods lose identity, friends/family move apart
         - … existing local issues spread globally (crime, STDs, ...)
       • CITY FOCUSES ON LOCAL PRODUCE / RURAL REGION SPECIALIZES IN FEW HIGHLY DEMANDED PRODUCTS
         - … drought or pests may hit a crop → there's no backup plan for the rural region, and a delayed response in the city
       • TRAM TO AVOID TRAFFIC JAMS AND RUN DIRECTLY
         - … may gridlock car/public transport traffic that runs in a different direction
       • WALKABLE CITIES
         - … commercial traffic worsens, no place to stop
         - … less parking space for residents
         - … possibly in places where people don't usually walk! (e.g. steep hills)
       • CHANGING / DAMMING WATER COURSE FOR CITIES
         - … ecosystem changes
         - … displaced people
       What to do:
       • Define goals
       • Define a model that allows
         - integration of data
         - measuring the current status
       • Capture interconnections between domains (and goals)
  7. 7. Massive city data integration
       • Not only in volume but, more importantly, in the number of heterogeneous data sources!
       • Structured, semi-structured, raw data… in different formats
       • Traditional approach:
         1. Manually identify relevant data and map it to a schema
            - Problem: schema may not support actual data, needs redesign!
         2. Consult, analyze, display data
            - Problem: if schema changes, apps don't work any longer!
       [Diagram, schema tables: Green&Sport Area (Number of Trees, Coordinates (format1), Status); Area (Id, Adjacent to, Coordinates (format2), Surface); Green Area (Id, Number of Trees); Sport Area (Id, Sport Material)]
  8. 8. Alternatives?
       • ETL (Extract, Transform, Load) has some problems:
         - Doesn't scale as well as technologies that access data without moving it around
         - Too rigid for ad-hoc data exploration, sparse data, implicit information
       • Semantic technologies
  9. 9. Semantic integration
       • Explicit relationships
       • Richer, more general model (domain rather than application specific)
       [Diagram: Green Area and Sport Area each "is a" Area; relationships shown: Has Coordinates, Has Surface, Adjacent to, Has Sport Material, Has Trees; other node labels: Association, Numeric]
  10. 10. Semantic integration
       • Explicit relationships
       • Richer, more general model (domain rather than application specific)
       • Easier to understand, browse, and search
       • Easier to integrate cross-domain information
       Examples:
       • 'Adjacent to' is not only a field name in table 'Area', but a concept in itself, with labels, relationships, synonyms, superconcepts, subconcepts, etc.
       • 'Adjacent to' is transitive
       • Newly inferred relationships: Sport Area has Surface
       [Diagram: Green Area and Sport Area each "is a" Area; relationships shown: Has Coordinates, Has Surface, Adjacent to (marked Reflexive, Transitive), Has Sport Material, Has Trees; additional nodes Road, Traffic volume, Contamination with links Adjacent to, Has, Measured by; other node labels: Association, Numeric]
  11. 11. Semantic Approach. Part III: Semantic integration
       • Green & Sport Area inherits from both Sport Area and Green Area:
         - No data duplication
         - If there is no data for some attribute (e.g. sport material), no need to leave an empty slot (i.e. no assumptions are made about non-existing data)
         - Allows different formats at the same time (e.g. coordinates)
       • Semi-automatic mapping is easier because at least one of the models is richer; can be done (collaboratively) via Web
       • Extend rather than change the model
       • Supports Open World: incomplete and evolving data
       [Diagram: Green & Sport Area "is a" Green Area and "is a" Sport Area; other node labels: Area, Coordinates, Surface, Numeric]
  12. 12. Semantic Approach. Part V: Semantic integration
       • Consult, analyze, display
       • Display areas on map, show attributes and related concepts, query data and model
       • Reduce app re-engineering (e.g. Green & Sport Areas are implicitly included as types of Areas)
       • Semantic access without modifying the apps: mapping to different data via the same model
       [Diagram: layers Software, Schema, Data; Barcelona schema and Quito schema each MAP to the shared model; Green Area and Sport Area "is a" Area; Green & Sport Area "is a" Green Area and "is a" Sport Area, and "is a" Area by inference]
  13. 13. Semantic Approach. Part VI: Semantic integration
       • Data accessible unambiguously (each concept = URI) and directly via Web
       • Open Linked Data: publish, share, reuse, import (same format!)
       • Link with concepts from other semantic models!
       [Diagram: Area (with Coordinates, Surface) linked to other models: Geo-spatial, Measures, Indicators, Time]
  14. 14. Challenges: Semantic Models Issues & Solutions
       • Bad design of the semantic model could affect its extensibility
       • A good ontology is domain specific, but depends on HOW it will be used (application goals), based on standards when possible
       • Complex models to browse and search
       • The more domain specific, the easier to use (but less reusable)
       • Integration of open data, evolving ontologies, population at large scale
       • Entities and relationships may evolve over time or between cities
       • Data of new types: help from mapping tool and exploration / navigation tools
       • Data not precise, inconsistent, or uncertain
       • Data curation and consistency
       • Probabilistic data
  15. 15. Challenges (cont'd): Semantic Models Issues & Solutions
       • Lack of support for geo-spatial information
       • Needs huge storage capacity
       • Integration with maps
       • Querying easier than in general purpose GIS or general purpose semantic platforms
       • Specific platforms/algorithms to speed up the queries
       • Data not available at requested spatial granularity: domain experts apply different approximation methods depending on data type
       • Query scalability
       • Trade-off between expressiveness and computational costs
       • Reasons: reasoning at runtime, no generally applicable efficient indexing schemes, graph-based querying, etc.
  16. 16. Questions?
  17. 17. Discover, predict, plan, react
       Discover
       • Analyzing integrated data may uncover hidden correlations
       • Usually requires extensive monitoring
       • Neighbourhoods with insufficient green areas, etc.
       • Conflictive neighbourhoods
       • Areas where assaults mostly happen
  18. 18. Discover, predict, plan, react
       Discover
       • Analyzing integrated data may uncover hidden correlations
       • Usually requires extensive monitoring
       • Neighbourhoods with insufficient green areas, etc.
       • Conflictive neighbourhoods
       • Areas where assaults mostly happen
       Plan, react
       • Planning green areas
  19. 19. Discover, predict, plan, react
       Discover
       • Analyzing integrated data may uncover hidden correlations
       • Usually requires extensive monitoring
       • Neighbourhoods with insufficient green areas, etc.
       • Conflictive neighbourhoods
       • Areas where assaults mostly happen
       Predict
       • … less racially mixed
       • … have low density of population in the streets
       • … at night when (1) single person waiting (2) bus stop hidden from camera view (3) in tourist neighbourhood, etc.
       • … different behavior may be suspicious
       Plan, react
       • Planning green areas
  20. 20. Discover, predict, plan, react
       Discover
       • Analyzing integrated data may uncover hidden correlations
       • Usually requires extensive monitoring
       • Neighbourhoods with insufficient green areas, etc.
       • Conflictive neighbourhoods
       • Areas where assaults mostly happen
       Predict
       • … less racially mixed
       • … have low density of population in the streets
       • … at night when (1) single person waiting (2) bus stop hidden from camera view (3) in tourist neighbourhood, etc.
       • … different behavior may be suspicious
       Plan, react
       • Planning green areas
       • Long-term demographic mix, preventive security
  21. 21. Discover, predict, plan, react
       Discover
       • Analyzing integrated data may uncover hidden correlations
       • Usually requires extensive monitoring
       • Neighbourhoods with insufficient green areas, etc.
       • Conflictive neighbourhoods
       • Areas where assaults mostly happen
       Predict
       • … less racially mixed
       • … have low density of population in the streets
       • … at night when (1) single person waiting (2) bus stop hidden from camera view (3) in tourist neighbourhood, etc.
       • … different behavior may be suspicious
       Plan, react
       • Planning green areas
       • Long-term demographic mix, preventive security
       • Ensure safety, more business, better social gathering places
  22. 22. Discover, predict, plan, react
       Discover
       • Analyzing integrated data may uncover hidden correlations
       • Usually requires extensive monitoring
       • Neighbourhoods with insufficient green areas, etc.
       • Conflictive neighbourhoods
       • Areas where assaults mostly happen
       Predict
       • … less racially mixed
       • … have low density of population in the streets
       • … at night when (1) single person waiting (2) bus stop hidden from camera view (3) in tourist neighbourhood, etc.
       • … different behavior may be suspicious
       Plan, react
       • Planning green areas
       • Long-term demographic mix, preventive security
       • Ensure safety, more business, better social gathering places
       • Potential safety alarm
  23. 23. Discover, predict, plan, react
       Discover
       • Analyzing integrated data may uncover hidden correlations
       • Usually requires extensive monitoring
       • Neighbourhoods with insufficient green areas, etc.
       • Conflictive neighbourhoods
       • Areas where assaults mostly happen
       Predict
       • … less racially mixed
       • … have low density of population in the streets
       • … at night when (1) single person waiting (2) bus stop hidden from camera view (3) in tourist neighbourhood, etc.
       • … different behavior may be suspicious
       Plan, react
       • Planning green areas
       • Long-term demographic mix, preventive security
       • Ensure safety, more business, better social gathering places
       • Potential safety alarm (appears twice on the slide)
  24. 24. Open questions
       • How should smart infrastructure be integrated to make a Smart City truly smart?
       • What does this mean for the organization of our city governments?
       • How do we find guidelines to understand the architecture of integrated infrastructures?
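
Slides 9 to 12 describe a model in which relationships are explicit, 'Adjacent to' is declared transitive, Sport Area inherits "has Surface" from Area, and Green & Sport Areas count as Areas without any change to the apps. The following is a minimal sketch of that idea in plain Python, not the platform used in the talk. It stores facts as (subject, predicate, object) tuples instead of using a real RDF store. The class names come from the slides; the predicate names and the instances park1, road7 and plaza3 are hypothetical. 'Adjacent to' is followed in one direction only, to keep the example short.

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) triples loosely following slides 9-11.
# park1, road7 and plaza3 are invented instances used only for illustration.
TRIPLES = {
    ("GreenArea", "is_a", "Area"),
    ("SportArea", "is_a", "Area"),
    ("GreenSportArea", "is_a", "GreenArea"),
    ("GreenSportArea", "is_a", "SportArea"),
    ("Area", "has", "Coordinates"),
    ("Area", "has", "Surface"),
    ("GreenArea", "has", "Trees"),
    ("SportArea", "has", "SportMaterial"),
    ("park1", "type", "GreenSportArea"),
    ("park1", "adjacent_to", "road7"),
    ("road7", "adjacent_to", "plaza3"),
}


def transitive_closure(triples, predicate):
    """Return every (a, b) pair linked by one or more `predicate` steps."""
    edges = defaultdict(set)
    for s, p, o in triples:
        if p == predicate:
            edges[s].add(o)
    closure = set()
    for start in list(edges):
        stack = list(edges[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(edges.get(node, ()))
    return closure


def superclasses(triples, cls):
    """All direct and inferred superclasses of `cls`."""
    return {b for a, b in transitive_closure(triples, "is_a") if a == cls}


def attributes(triples, cls):
    """Attributes declared on `cls` plus those inherited from its superclasses."""
    classes = {cls} | superclasses(triples, cls)
    return {o for s, p, o in triples if p == "has" and s in classes}


def instances_of(triples, cls):
    """Instances typed as `cls` directly or through a subclass."""
    return {
        s
        for s, p, o in triples
        if p == "type" and (o == cls or cls in superclasses(triples, o))
    }


if __name__ == "__main__":
    print(sorted(attributes(TRIPLES, "SportArea")))  # includes the inherited Surface
    print(sorted(instances_of(TRIPLES, "Area")))     # park1 is found by inference
    print(("park1", "plaza3") in transitive_closure(TRIPLES, "adjacent_to"))
```

Nothing is stored for attributes that have no data, which mirrors the "no empty slot" point on slide 11: park1 has no Sport Material triple and none is assumed. Adding a new class means adding triples, not changing the functions that query them.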
