Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Better Session, Semantic Analytics Exercise @ EO Joint Big Data Hackathon

22 views

Published on

The 'Semantic Analytics' exercise, introducing the SANSA Correlation Library, during the BETTER Session at the EO Joint Big data Hackathon in Frascati, Italy, 7-8 November 2019. Read more here: https://www.ec-better.eu/pages/h2020-eo-big-data-hackathon

Published in: Technology
  • Be the first to comment

  • Be the first to like this

Better Session, Semantic Analytics Exercise @ EO Joint Big Data Hackathon

  1. 1. Semantic Analytics In BETTER Dr. Simon Scerri (Fraunhofer IAIS) Mohamed Nadjib Mami (Fraunhofer IAIS) BETTER Hackathon 08th November 2019 Frascati, Rome http://ec-better.eu This project has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement no 776280 Download these slides here https://www.slideshare.net/PRBETTER/better- exercise3semantic-analytics Download material here https://github.com/ec-better/eohackathon- better-semantics
  2. 2. Data Heterogeneity: Limiting the Potential! Closed Data Silos Project Copernicus Data Hub Public Gov. Data Linked Data … 2 Project Weather Data Shared Data Spaces Data Providers ?? ?
  3. 3. What is needed to Enable Linking and Maximise Re-use? ● Highly-structured Data Format ● Using Common Domain Models ● Porting existing data onto highly-structure format ● Using Universal Identifiers for Things 3 Closed Data Silos Project Copernicus Data Hub Public Gov. Data Linked Data … Project Weather Data Shared Data Spaces Data Providers Data Standard Data Models Domain Experts Schema
  4. 4. subject Encoding Knowledge with RDF 4 An instance is an entity that is a member of a class Bulawayo Province monthlyRainfall 86 A concept / entity that represents things in the real and/or information world. type Relationship between two concepts / entity. predicate object Target of a predicate: concept (Province), other entity (Zimbabwe) or literal (86) A class represents a group of instances that have one or more properties in common Literals are encoded in forms of data types, e.g.: ● String: “Value” ● Number: 180, 5.85 ● Date: 2018-02-21 A property refers to attributes of the instance or relations to other entities Zimbabwe country
  5. 5. Ontologies: Enabling a common ‘Language’ ● Raw data / ground truth ■ People, Places, Organisations, Sensor data, Production data, etc. ● Metadata ■ License information, Provenance, Versioning, Documentation, etc. ● Vocabularies ■ Domain Models: Definitions of Class and Property(-hierarchies) ■ Define Metadata that describes Raw Data (entities) using ■ T-box enabling Knowledge Representation Meta- data Description of the Data Vocabularies Structure of the Data Data Ground Truth 5
  6. 6. Transformation tools: Un/Semi/Structured data to RDF Relational data models Graph based data model Un/Structured Text, Media, etc. Complexity ? R2RML SILK Framework ….
  7. 7. SANSA: Innovative KG-based Inference & ML ● Machine Learning Distributed algorithms ○ KG embeddings for KB completion, link prediction ○ Graph Clustering ○ Association Rule Mining ○ Semantic Decision Trees Inference ● Inference ○ In-memory via rule-based forward chaining ○ Dynamically build Rule dependency graphs ○ Based on RDF/OWL fragments
  8. 8. Today’s Exercise! Rainfall to Market Price Correlation Detection CHIRPS Rainfall aggregates ? Crop Market Prices
  9. 9. CHIRPS Knowledge Graph Linking & Pattern Discovery ● Ingestion of CHIRPS-derived Knowledge Graphs/RDF ○ Ontology Mapping for semi-automatic RDF transformation ● Interlinking with relevant open data ○ Crop Market Prices from WFP ● Implicit Discovery of Patterns ○ Identify correlations between Rainfall and Crop (Maize) Market prices
  10. 10. Required Libraries • Python 3+ • Particular packages requirement: GDAL, Shapely • Java 7+ • Maven • Apache Spark 2+ • Linux OS
  11. 11. github.com/ ec-better/eohackathon-better-semantics Download Material Here
  12. 12. Task 1: From CHIRPS TIF to RDF Rainfall values extraction Rainfall values numerical array 01/2015 Geographical intersection Country Polygon Extract the geographical coordinate Rainfall values geo-coordinates Rainfall values of Zimbabwe Transformatio n to RDFRainfall values in RDF format TIF CHIRPS Southern Africa Monthly Zimbabwe
  13. 13. Task 1: From CHIRPS TIF to RDF • Required Libraries: Python 3 + GDAL and Shapely packages • Material Location: App > 1_Rainfall_TIF_to_RDF • Application Execution: run the command python3 Main.py ../../Data/CHIRPSv2_SouthernAfrica_N30_daystotal_2015-01- 01_2015-01-31.tif tif-rainfall-output.ttl
  14. 14. Task 1: From CHIRPS TIF to RDF • Output: RDF multidimensional data (aka Data Cube Vocabulary) @prefix qb: <http://purl.org/linked-data/cube#> . @prefix eg: <http://example.org/ns#> . @prefix dbpedia: <http://dbpedia.org/ontology/> . @prefix cf-feature: <http://purl.oclc.org/NET/ssnx/cf/cf-feature#> . eg:obs1 a qb:Observation ; qb:dataSet eg:dataset-prices ; dbpedia:month "1" ; dbpedia:year "2015" ; cf-feature:rainfall_amount "789587" ; . Namespaces (domains) An Observation A DataSet Dimensions The Measure
  15. 15. Task 2: From CHIRPS CSV to RDF Filtering CSV CHIRPS Zimbabwe Monthly Rainfall Rainfall Monthly Transformation to RDF Rainfall in RDF format
  16. 16. Task 2: From CHIRPS CSV to RDF • Required libraries: Python 3 • Material location: App > 2_Rainfall_CSV_to_RDF • Application Execution: run the command python3 Main.py ../../Data/Zimbabwe_Rainfall.csv rainfall-output.ttl
  17. 17. Task 2: From CHIRPS CSV to RDF • Output: RDF multidimensional data (aka Data Cube Vocabulary) @prefix qb: <http://purl.org/linked-data/cube#> . @prefix eg: <http://example.org/ns#> . @prefix dbpedia: <http://dbpedia.org/ontology/> . @prefix cf-feature: <http://purl.oclc.org/NET/ssnx/cf/cf-feature#> . eg:obs1 a qb:Observation ; qb:dataSet eg:dataset-prices ; dbpedia:month "1" ; dbpedia:year "2015" ; cf-feature:rainfall_amount "789587" ; . Namespaces (domains) An Observation A DataSet Dimensions The Measure
  18. 18. Task 3: From WFP Maize prices to RDF Filtering Maize Prices CSV Maize Prices Monthly year 2015 Transformation to RDF Maize Prices in RDF format
  19. 19. Task 3: From WFP Maize prices to RDF • Required Libraries: Python 3 • Material Location: App > 3_Prices_CSV_to_RDF • Application Execution: run the command python3 Main.py ../../Data/Zimbabwe_Maize_Prices-2015.csv prices- output.ttl
  20. 20. Task 3: From WFP Maize prices to RDF • Output: RDF multidimensional data (aka Data Cube Vocabulary) @prefix qb: <http://purl.org/linked-data/cube#> . @prefix eg: <http://example.org/ns#> . @prefix dbpedia: <http://dbpedia.org/ontology/> . @prefix cbo: <http://comicmeta.org/cbo/> . eg:obs1 a qb:Observation ; qb:dataSet eg:dataset-prices ; dbpedia:month "1" ; dbpedia:year "2015" ; cbo:price "789587" ; . Namespaces (domains) An Observation A DataSet Dimensions The Measure
  21. 21. Task 4: Correlation Detection Extract Values Rainfall values in RDF format Crop Prices in RDF format Rainfall Vector Prices Vector Correlation Detection Correlation Factor Correlation Specification
  22. 22. Task 4: Correlation Detection { "properties": [ { "source": "/home/user/.../Apps/2_Rainfall_CSV_to_RDF/rainfall-output.ttl", "target": "<http://purl.oclc.org/NET/ssnx/cf/cf-feature#rainfall_amount>", "filters": { "<http://dbpedia.org/ontology/year>": "2015", "<http://dbpedia.org/ontology/month>": "1,2,3,4" } }, { "source": "/home/user/.../Apps/3_Prices_CSV_to_RDF/prices-output.ttl", "target": "<http://comicmeta.org/cbo/price>", "filters": { "<http://dbpedia.org/ontology/month>": "{02,03,04}{03,04,05}{04,05,06}{05,06,07};sum", "<http://dbpedia.org/ontology/year>": "2015" } } ] } The two properties to correlated Filter based on other properties
  23. 23. • Required Libraries: Python 3, Maven, Apache Spark • Material Location: App > 4_SANSA_Correlation_Detection • Application Execution: • Run the command to package the App • Run the command (if SPARK_HOME not set, navigate to Spark folder) mvn package spark-submit --class org.ml.test.App --master local[*] --executor-memory 5G target/correlation-1.0-SNAPSHOT.jar ../../Data/correlation.conf local[*] Task 4: Correlation Detection
  24. 24. Task 4: Correlation Detection • Output: Correlation Value using Pearson method • Value range between -1 and 1 • The closer to 1 = positive correlation • When rainfalls change, prices change in the same direction (up/down) • The closer to -1 = negative correlation • When rainfalls change, prices change in the opposite direction (up/down)

×