Alessandro Cimbelli - Assessment of the refugee camps
extension from optical satellite images
Workshop 16 giugno 2016
Auditorium Antonianum – Sala San Francesco
Viale Manzoni, 1 Roma
Marina Arcasenza - A web storymap of inbound Italian
migration flows
Workshop 16 June 2016
Auditorium Antonianum – Sala San Francesco
Viale Manzoni, 1 Roma
One of the most promising areas in the world of NoSQL - graphs based storage and processing system which based on the theory of graphs. Neo4J - is, perhaps, the most popular Graph database at the moment. It provides high performance data storage and working with graphs, using various Java APIs and declarative query language Cypher.
Adobe, Cisco, classmates.com, Deutsche telecom and many others are using Neo4J.
Alessandro Cimbelli - Assessment of the refugee camps
extension from optical satellite images
Workshop 16 giugno 2016
Auditorium Antonianum – Sala San Francesco
Viale Manzoni, 1 Roma
Marina Arcasenza - A web storymap of inbound Italian
migration flows
Workshop 16 June 2016
Auditorium Antonianum – Sala San Francesco
Viale Manzoni, 1 Roma
One of the most promising areas in the world of NoSQL - graphs based storage and processing system which based on the theory of graphs. Neo4J - is, perhaps, the most popular Graph database at the moment. It provides high performance data storage and working with graphs, using various Java APIs and declarative query language Cypher.
Adobe, Cisco, classmates.com, Deutsche telecom and many others are using Neo4J.
A Data Scientist Exploration in the World of Heterogeneous Open Geospatial DataGloria Re Calegari
We present the challenges faced by a Data Scientist in exploring and analyzing heterogeneous Open Geospatial Data. This work is aimed at explaining the initial steps of a data exploration process, specifically aimed at discovering similarities and differences conveyed by diverse sources and resulting from their correlation analysis; we also explore the influence of spatial resolution on the dependence strength between heterogeneous urban sources, to pave the way to a meaningful information fusion.
Using R to Visualize Spatial Data: R as GIS - Guy LansleyGuy Lansley
This talk demonstrates some of the benefits of using R to visualize spatial data efficiently and clearly.
It was originally presented by Guy Lansley (UCL and the Consumer Data Research Centre) to the GIS for Social Data and Crisis Mapping Workshop at the University of Kent.
Shows the use of Excel and Esri ArcGIS Desktop 10.1 to make statistics reports from Innovative Interfaces circulation data. Originally presented at IUG 2014 as part of panel: Slinging statistics and dicing data in the public library.
City Data Dating: emerging affinities between diverse urban datasetsGloria Re Calegari
Cities are complex environments in which digital technologies are more and more pervasive; this digitization of the urban space has led to a rich ecosystem of data producers and data consumers. Moreover, heterogeneous sources differ in terms of data complexity, spatio-temporal resolution and curation/maintenance costs. Do those diverse urban sources reflect the same picture of the city? Do distinct perspectives share some commonalities?
We present our data analytics empirical experiments on a set of urban sources related to the city of Milano; our investigation is aimed at discovering “affinities” between datasets by means of different quantitative and qualitative correlation analyses. We also explore the influence of spatial resolution and data complexity on the dependence strength between heterogeneous urban sources, to pave the way to a meaningful information fusion.
2013 GISCO Track, Quality Assessment and Improvement for Addressed Locations ...GIS in the Rockies
ISO 19157 Geographic information - Data quality provides a structure for organizing comprehensive data quality assessment measures. What it doesn't provide is a priority of data quality elements for a specific dataset and jurisdiction. Over the past year, the Colorado Address Data Quality subgroup has developed a prioritized list of data quality measures for addressed locations, in an effort to establish common criteria and a scorecard. These will provide a means to describe the data compiled from multiple jurisdictions with varying origins in an objective manner so users of the data can determine their fitness for use. It also provides feedback for local jurisdictions to increase their level of quality according to their need and discretion.
In addition, the State of Colorado in coordination with the US Postal Service, the US Census Bureau, and state and local agencies will begin to provide feedback to local jurisdictions on possible discrepancies in comparison to Master Street Address Guides (MSAGs), the Coding Accuracy Support System (CASS), Statewide Colorado Voter Registration and Election System (SCORE), the Colorado Motorist Insurance Identification Database MIDB, and other datasets that contain addresses. These comparisons are particularly helpful in identifying possible omissions but also in confirming and completing georeferenced address data content. This presentation will describe the value of these comparisons and progress in developing and measuring data quality using common criteria and objective measures.
Lemmens kessler-agile-linked data v3-slideshareRob Lemmens
Geo-Information Visualizations of Linked Data. Linked Data provides an ever-growing source of geographically referenced data for application development. In this paper, we analyse the workflow behind the development of such an application. Using two examples based on worldwide development aid and refugee data, we discuss the steps from locating data for use and data integration, up to the actual visualization in a web-based application. At each step, we discuss the skill set required for completion and point to potential challenges. This includes RDF, SPARQL, HTTP requests, HTML, and JavaScript. We conclude the paper by putting our case study in the context of GIScience curriculum development.
Webinar - INSPIRE 2020 Virtual Conference
----
How Spatial Data Infrastructures (SDIs) can evolve into Data Ecosystems?
This is the main question that the ongoing study addressing “Data ecosystems for geospatial data - Evolution of Spatial Data Infrastructures” (JRC/IPR/2019/MVP/2781) is addressing. It is performed by the Luxembourg Institute of Science and Technology (LIST) in close collaboration with the Joint Research Centre of the European Commission.
The purpose of this study is to identify and analyse a set of successful data ecosystems and to address recommendations in support of the implementation of data-driven innovation in line with the recently published European Strategy for Data. It investigates factors such as relevant actors, their responsibilities and data value chains, emerging data sources (e.g. the Internet of Things) and technical/architectural approaches (e.g. digital platforms, mobile-by-default, Application Programming Interfaces). It also addresses the interoperability between data ecosystems in different sectors and/or different countries and crosscutting requirements for geospatial data.
This session is intended to share with the audience the study approach, methodological approach and first identified Data Ecosystems, and to learn from their experiences with Data Ecosystems: emergence, barriers, opportunities, sustainability, interoperability between ecosystems, etc.
AGENDA
14.00 - Welcome, Introduction to the context of the study (JRC)
14.10 - Study approach and methodological framework (LIST)
14.20 - Identified data ecosystems and selection criteria (LIST)
14.25 - Illustration of data ecosystem analysis (LIST)
Ghislain Delabie, Simon Saint-Georges, Urban Rennes Data Interface
Sean Wiid, UP42
Charles Moszkowicz, ENEO • Interactive session (All, 20)
Next activities, Goodbye.
Approaches to representing and delivering geospatial data in the semantic Web...Paul Box
To address many real world challenges, multi-disciplinary data is required. In the earth sciences, EO together with feature based geospatial representations of spatial objects, such as administrative reporting geographies, topographic features and in-situ sensor assets, typically need to be accessed and integrated to support analysis. Currently, coordinates are the primary linga franca for operating across both types of data. However, human users refer to spatial objects using spatial identifiers such as place names or well know codes such as the postcodes, Local Government Area (LGA) codes or hydrological catchment codes. These identifiers provide a critical data integration mechanism as they are used to reference and access observations and measurement data about spatial objects, such as time series water levels for a catchment or demographic data for a Local Government Area (LGA) held in numerous systems.
With the emergence of the semantic Web, current geometry and coordinate oriented approaches to geospatial data need to change. More non-GIS users need to access reliable identifiers for places (spatial identifiers), rather than their geometries. Consequently, there is a need to develop approaches that enable us to manage and share the identity of spatial object e.g. Lake Arthur and link spatial identities to available geometric representations for use when required.
Emerging approaches to representing and using geospatial data in the semantic Web have some implications for earth observation data. These include the potential use of spatial identifier sets as vocabularies for the spatial dimension of EO data cubes, enabling users to reliably query EO data using these identifiers. If identifiers sets are applied across data cubes, consistent identifier based queries are possible across distributed information sources.
This presentation will highlight some emerging approaches to delivering geospatial data into the semantic web, together with implications for improved integration of EO and feature based representations of spatial objects.
An evaluation of SimRank and Personalized PageRank to build a recommender sys...Paolo Tomeo
The Web of Data is the natural evolution of the World Wide Web from a set of interlinked documents to a set of interlinked entities. It is a graph of information resources interconnected by semantic relations, thereby yielding the name Linked Data. The proliferation of Linked Data is for sure an opportunity to create a new family of data-intensive applications such as recommender systems. In particular, since content-based recommender systems base on the notion of similarity between items, the selection of the right graph-based similarity metric is of paramount importance to build an effective recommendation engine. In this paper, we review two existing metrics, SimRank and PageRank, and investigate their suitability and performance for computing similarity between resources in RDF graphs and investigate their usage to feed a content-based recommender system. Finally, we conduct experimental evaluations on a dataset for musical artists and bands recommendations thus comparing our results with two other content-based baselines
measuring their performance with precision and recall, catalog coverage, items distribution and novelty metrics.
Towards emergency vehicle routing using Geolinked Open Data: the case study o...Sergio Consoli
Linked Open Data (LOD) has gained significant momentum over the past years as a best practice of promoting the sharing and publi- cation of structured data on the semantic Web. Currently LOD is reach- ing significant adoption also in Public Administrations (PAs), where it is often required to be connected to existing platforms, such as GIS-based data management systems. Bearing on previous experience with the pi- oneering data.cnr.it, through Semantic Scout, as well as the Agency for Digital Italy recommendations for LOD in Italian PA, we are working on the extraction, publication, and exploitation of data from the Geographic Information System of the Municipality of Catania, referred to as SIT (“Sistema Informativo Territoriale”). The goal is to boost the metropolis towards the route of a modern Smart City by providing prototype inte- grated solutions supporting transport, public health, urban decor, and social services, to improve urban life. In particular a mobile application focused on real-time road traffic and public transport management is currently under development to support sustainable mobility and, espe- cially, to aid the response to urban emergencies, from small accidents to more serious disasters. This paper describes the results and lessons learnt from the first work campaign, aiming at analyzing, reengineering, linking, and formalizing the Shape-based geo-data from the SIT.
Big Data Day LA 2016/ Data Science Track - The Evolving Data Science Landscap...Data Con LA
The impact of data science on business is undeniable, and the value it provides is growing without signs of slowing. To keep up with this rapidly evolving technology landscape, data scientists must adapt and specialize through continuous learning. This talk focuses on how they can do that in a way that maximizes the positive impact data science will have on their organization.
A Data Scientist Exploration in the World of Heterogeneous Open Geospatial DataGloria Re Calegari
We present the challenges faced by a Data Scientist in exploring and analyzing heterogeneous Open Geospatial Data. This work is aimed at explaining the initial steps of a data exploration process, specifically aimed at discovering similarities and differences conveyed by diverse sources and resulting from their correlation analysis; we also explore the influence of spatial resolution on the dependence strength between heterogeneous urban sources, to pave the way to a meaningful information fusion.
Using R to Visualize Spatial Data: R as GIS - Guy LansleyGuy Lansley
This talk demonstrates some of the benefits of using R to visualize spatial data efficiently and clearly.
It was originally presented by Guy Lansley (UCL and the Consumer Data Research Centre) to the GIS for Social Data and Crisis Mapping Workshop at the University of Kent.
Shows the use of Excel and Esri ArcGIS Desktop 10.1 to make statistics reports from Innovative Interfaces circulation data. Originally presented at IUG 2014 as part of panel: Slinging statistics and dicing data in the public library.
City Data Dating: emerging affinities between diverse urban datasetsGloria Re Calegari
Cities are complex environments in which digital technologies are more and more pervasive; this digitization of the urban space has led to a rich ecosystem of data producers and data consumers. Moreover, heterogeneous sources differ in terms of data complexity, spatio-temporal resolution and curation/maintenance costs. Do those diverse urban sources reflect the same picture of the city? Do distinct perspectives share some commonalities?
We present our data analytics empirical experiments on a set of urban sources related to the city of Milano; our investigation is aimed at discovering “affinities” between datasets by means of different quantitative and qualitative correlation analyses. We also explore the influence of spatial resolution and data complexity on the dependence strength between heterogeneous urban sources, to pave the way to a meaningful information fusion.
2013 GISCO Track, Quality Assessment and Improvement for Addressed Locations ...GIS in the Rockies
ISO 19157 Geographic information - Data quality provides a structure for organizing comprehensive data quality assessment measures. What it doesn't provide is a priority of data quality elements for a specific dataset and jurisdiction. Over the past year, the Colorado Address Data Quality subgroup has developed a prioritized list of data quality measures for addressed locations, in an effort to establish common criteria and a scorecard. These will provide a means to describe the data compiled from multiple jurisdictions with varying origins in an objective manner so users of the data can determine their fitness for use. It also provides feedback for local jurisdictions to increase their level of quality according to their need and discretion.
In addition, the State of Colorado in coordination with the US Postal Service, the US Census Bureau, and state and local agencies will begin to provide feedback to local jurisdictions on possible discrepancies in comparison to Master Street Address Guides (MSAGs), the Coding Accuracy Support System (CASS), Statewide Colorado Voter Registration and Election System (SCORE), the Colorado Motorist Insurance Identification Database MIDB, and other datasets that contain addresses. These comparisons are particularly helpful in identifying possible omissions but also in confirming and completing georeferenced address data content. This presentation will describe the value of these comparisons and progress in developing and measuring data quality using common criteria and objective measures.
Lemmens kessler-agile-linked data v3-slideshareRob Lemmens
Geo-Information Visualizations of Linked Data. Linked Data provides an ever-growing source of geographically referenced data for application development. In this paper, we analyse the workflow behind the development of such an application. Using two examples based on worldwide development aid and refugee data, we discuss the steps from locating data for use and data integration, up to the actual visualization in a web-based application. At each step, we discuss the skill set required for completion and point to potential challenges. This includes RDF, SPARQL, HTTP requests, HTML, and JavaScript. We conclude the paper by putting our case study in the context of GIScience curriculum development.
Webinar - INSPIRE 2020 Virtual Conference
----
How Spatial Data Infrastructures (SDIs) can evolve into Data Ecosystems?
This is the main question that the ongoing study addressing “Data ecosystems for geospatial data - Evolution of Spatial Data Infrastructures” (JRC/IPR/2019/MVP/2781) is addressing. It is performed by the Luxembourg Institute of Science and Technology (LIST) in close collaboration with the Joint Research Centre of the European Commission.
The purpose of this study is to identify and analyse a set of successful data ecosystems and to address recommendations in support of the implementation of data-driven innovation in line with the recently published European Strategy for Data. It investigates factors such as relevant actors, their responsibilities and data value chains, emerging data sources (e.g. the Internet of Things) and technical/architectural approaches (e.g. digital platforms, mobile-by-default, Application Programming Interfaces). It also addresses the interoperability between data ecosystems in different sectors and/or different countries and crosscutting requirements for geospatial data.
This session is intended to share with the audience the study approach, methodological approach and first identified Data Ecosystems, and to learn from their experiences with Data Ecosystems: emergence, barriers, opportunities, sustainability, interoperability between ecosystems, etc.
AGENDA
14.00 - Welcome, Introduction to the context of the study (JRC)
14.10 - Study approach and methodological framework (LIST)
14.20 - Identified data ecosystems and selection criteria (LIST)
14.25 - Illustration of data ecosystem analysis (LIST)
Ghislain Delabie, Simon Saint-Georges, Urban Rennes Data Interface
Sean Wiid, UP42
Charles Moszkowicz, ENEO • Interactive session (All, 20)
Next activities, Goodbye.
Approaches to representing and delivering geospatial data in the semantic Web...Paul Box
To address many real world challenges, multi-disciplinary data is required. In the earth sciences, EO together with feature based geospatial representations of spatial objects, such as administrative reporting geographies, topographic features and in-situ sensor assets, typically need to be accessed and integrated to support analysis. Currently, coordinates are the primary linga franca for operating across both types of data. However, human users refer to spatial objects using spatial identifiers such as place names or well know codes such as the postcodes, Local Government Area (LGA) codes or hydrological catchment codes. These identifiers provide a critical data integration mechanism as they are used to reference and access observations and measurement data about spatial objects, such as time series water levels for a catchment or demographic data for a Local Government Area (LGA) held in numerous systems.
With the emergence of the semantic Web, current geometry and coordinate oriented approaches to geospatial data need to change. More non-GIS users need to access reliable identifiers for places (spatial identifiers), rather than their geometries. Consequently, there is a need to develop approaches that enable us to manage and share the identity of spatial object e.g. Lake Arthur and link spatial identities to available geometric representations for use when required.
Emerging approaches to representing and using geospatial data in the semantic Web have some implications for earth observation data. These include the potential use of spatial identifier sets as vocabularies for the spatial dimension of EO data cubes, enabling users to reliably query EO data using these identifiers. If identifiers sets are applied across data cubes, consistent identifier based queries are possible across distributed information sources.
This presentation will highlight some emerging approaches to delivering geospatial data into the semantic web, together with implications for improved integration of EO and feature based representations of spatial objects.
An evaluation of SimRank and Personalized PageRank to build a recommender sys...Paolo Tomeo
The Web of Data is the natural evolution of the World Wide Web from a set of interlinked documents to a set of interlinked entities. It is a graph of information resources interconnected by semantic relations, thereby yielding the name Linked Data. The proliferation of Linked Data is for sure an opportunity to create a new family of data-intensive applications such as recommender systems. In particular, since content-based recommender systems base on the notion of similarity between items, the selection of the right graph-based similarity metric is of paramount importance to build an effective recommendation engine. In this paper, we review two existing metrics, SimRank and PageRank, and investigate their suitability and performance for computing similarity between resources in RDF graphs and investigate their usage to feed a content-based recommender system. Finally, we conduct experimental evaluations on a dataset for musical artists and bands recommendations thus comparing our results with two other content-based baselines
measuring their performance with precision and recall, catalog coverage, items distribution and novelty metrics.
Towards emergency vehicle routing using Geolinked Open Data: the case study o...Sergio Consoli
Linked Open Data (LOD) has gained significant momentum over the past years as a best practice of promoting the sharing and publi- cation of structured data on the semantic Web. Currently LOD is reach- ing significant adoption also in Public Administrations (PAs), where it is often required to be connected to existing platforms, such as GIS-based data management systems. Bearing on previous experience with the pi- oneering data.cnr.it, through Semantic Scout, as well as the Agency for Digital Italy recommendations for LOD in Italian PA, we are working on the extraction, publication, and exploitation of data from the Geographic Information System of the Municipality of Catania, referred to as SIT (“Sistema Informativo Territoriale”). The goal is to boost the metropolis towards the route of a modern Smart City by providing prototype inte- grated solutions supporting transport, public health, urban decor, and social services, to improve urban life. In particular a mobile application focused on real-time road traffic and public transport management is currently under development to support sustainable mobility and, espe- cially, to aid the response to urban emergencies, from small accidents to more serious disasters. This paper describes the results and lessons learnt from the first work campaign, aiming at analyzing, reengineering, linking, and formalizing the Shape-based geo-data from the SIT.
Big Data Day LA 2016/ Data Science Track - The Evolving Data Science Landscap...Data Con LA
The impact of data science on business is undeniable, and the value it provides is growing without signs of slowing. To keep up with this rapidly evolving technology landscape, data scientists must adapt and specialize through continuous learning. This talk focuses on how they can do that in a way that maximizes the positive impact data science will have on their organization.
Individual movements and geographical data mining. Clustering algorithms for ...Beniamino Murgante
Individual movements and geographical data mining. Clustering algorithms for highlighting hotspots in personal navigation routes.
Giuseppe Borruso, Gabriella Schoier - University of Trieste
Online Index Extraction from Linked Open Data SourcesFabio Benedetti
This presentation has been held by me at the Workshop titled Linked Data for Information Extraction 2014 (LD4IE) held at the International Semantic Web Conference 2014. The related paper is titled "Online Index Extraction from Linked Open Data Sources" and here is the link: http://ceur-ws.org/Vol-1267/LD4IE2014_Benedetti.pdf
Location Intelligence for Italian and UK Justice and Public Safety - BIWASumm...Iconsulting
An Italian Government Institution and a UK County Police have developed crime analysis and visibility reporting systems to fight petty crime and improve officers’ deployment.These major organizations, belonging to the Justice and Public Safety sector, implemented Location Intelligence solutions based on a Geo-Data Warehouse. Here they integrate their analytical data with location-based datasets like crimes coordinates, officers’ positions and open data layers about demographics.
Oracle technology stack (Oracle Spatial, Oracle MapViewer, OBIEE), extended with the Iconsulting Location Intelligence Library, are the foundations of these advanced analytics solutions which support both data exploration needs and decision-making processes.
On one hand, the synergy between Spatial capabilities and OBIEE has helped the Italian Government Institution not only to produce fast and reliable statistics and analyses, but also to improve the investigation process, by discovering recurrent crime patterns, considering both analytical and spatial data.
On the other hand, the UK County Police exploits data collected from officers’ handheld radio with GPS (they collect the position of each officer every few seconds). The solution provides the Chief Constable and the senior command team an overview of officers’ availability and visibility. It allows benchmarking these figures, understanding where officers should be and improving their deployment.
Location Intelligence solutions allow performance analysts to spend more time analyzing their performance without worrying about data manipulation or presentation issues.
The Geo-Data Warehouse unlocks the power of Spatial data, enabling advanced features like Geo What-if simulations, and lays the foundations for effective predictive crime analysis algorithms.
Show drafts
volume_up
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Opendatabay - Open Data Marketplace.pptxOpendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay AI-driven features streamline the data workflow. Finding the data you need shouldn't be a complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with a dedicated, AI-generated, synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
tapal brand analysis PPT slide for comptetive data
Geo-statistical Exploration of Milano Datasets
1. Geo-statistical Exploration of
Milano Datasets
Irene Celino and Gloria Re Calegari
CEFRIEL
1
The 13th International Semantic Web Conference 2014
Riva del Garda, Italy
19 – 23 October 2014
Semantic Statistics workshop
19th October 2014
2. Research goal
Long term goal
• Compare data related to the same city but obtained from heterogeneous
sources. Do they provide the same ‘picture’ of the city?
• Can we update a dataset (expensive and time consuming, updated once every
5/10 years) with another dataset (cheaper and always up to date)?
Presentation target
• Comparison of two heterogeneous datasets referring to the same city (Milan) to
discover if they have any intrinsic correlation
Semantic Statistics - ISWC 2014 - Riva del Garda, Italy 2
3. Datasets available
• Telecom (phone activity data) provided for their “Big Data Challenge”
• Milan + surroundings
• Nov-dic 2013
• Grid of 10.000 cells
• Activity recorded every ten minutes
• A footprint for each cell
• ISTAT - Italian National Statistical Institute
• demographic data of 2011 and 2001
• divided by sex, age and nationality
Analysis performed using data in their original format (GeoJSON, csv) and serialization in RDF format
at the end of the analysis.
Semantic Statistics - ISWC 2014 - Riva del Garda, Italy 3
4. Datasets available – different spatial granularity
ISTAT – 50 surrounding towns + 9
internal Milano districts
Telecom – grid of 10.000 cells (250 x
250 m each)
Pre-processing of
data required.
Mapping of
Telecom data into
municipality
granularity.
Semantic Statistics - ISWC 2014 - Riva del Garda, Italy 4
5. Methodology of analysis
Goal of analysis: do the two datasets have the same intrinsic meaning?
Unsupervised clustering to group data in each dataset:
• k-means (euclidean distance)
• Adaptive k-means (cosine distance)
Comparison of the two clustering results to find correlation between the two datasets.
Validation of the comparison:
• Rand Index
• Kappa Index
(the closer to 1 the index is, the stronger the correlation between the data clusterings)
Semantic Statistics - ISWC 2014 - Riva del Garda, Italy 5
6. Experiments- Telecom vs ISTAT (range on total poulation)
Telecom - Kmeans 6 classes
Rand
Index
Kappa
Index
0,23 0,23
ISTAT 2011 - range 6 classes
Only partial
correlation
Try to add
information to total
population (feature
vector with
distribution of
populatin divided by
sex, age and
nationality)
Semantic Statistics - ISWC 2014 - Riva del Garda, Italy 6
7. Experiments - Telecom vs ISTAT (2011 fine-grained population)
Telecom - Kmeans 4 classes
Rand
Index
Kappa
Index
0,77 0,81
ISTAT 2011 – Kmeans 4 classes
Good correlation
Try to add historical
information (2001
datasets).
Does adding
temporal dynamic
improve corrrelation?
Semantic Statistics - ISWC 2014 - Riva del Garda, Italy 7
8. Telecom - Kmeans 4 classes
Rand
Index
Kappa
Index
0,77 0,81
ISTAT 2011 + 2001 – Kmeans 4 classes
Good correlation.
The same as the
previous test.
Experiments- Telecom vs ISTAT (2011+2001 fine-grained population)
Semantic Statistics - ISWC 2014 - Riva del Garda, Italy 8
9. Clustering results serialization
Results of
clustering
analysis
Geographic format
RDF format
Geo + RDF format
GeoJSON file
RDF Data Cube vocabulary (adding in the
Observation definition concepts of clustering
algo and cluster number)
Enrich GeoJSON with power of
JSON-LD -> ‘’GeoJSON-LD’’
Semantic Statistics - ISWC 2014 - Riva del Garda, Italy 9
10. ‘’GeoJSON-LD’’
‘’GeoJSON-LD’’ is obtained by adding the ‘@context’ prefix to the GeoJSON file.
The prefix specify how to interpret GeoJSON tags as RDF resources
@context prefix GeoJSON file
Semantic Statistics - ISWC 2014 - Riva del Garda, Italy 10
11. ‘’GeoJSON-LD’’ pros and cons
Correct interpretation of the geographic information
in the GeoJSON-LD file
Problems in the interpretation of the RDF information.
‘’GeoJSON-LD’’ does not interpret correctly a lists of lists
(polygon coordinates representation)
GeoJSON-LD is correctly interpreted as GeoJSON by
GIS
GeoJSON list of lists
RDF serialization
Semantic Statistics - ISWC 2014 - Riva del Garda, Italy 11
12. Conclusion and future works
• Identification of a correlation between phone activity data and
demographic information
• Further tests using different datasets (land use, Point of interests)
• Find efficient method for handling the different granularity levels of the
datasets
• Method for handling temporal aspect of phone activity data (we lost temporal
information during clustering process)
• Find a method to overcome ‘’GeoJSON-LD’’ serialization problem
Semantic Statistics - ISWC 2014 - Riva del Garda, Italy 12
13. Thank you! Any question?
Further details at: http://swa.cefriel.it/geo/
Irene Celino and Gloria Re Calegari
CEFRIEL
Semantic Statistics - ISWC 2014 - Riva del Garda, Italy 13