Better Hackathon 2020 - Fraunhofer IAIS - Semantic geo-clustering with SANSA


As part of the final BETTER Hackathon, project partners prepared 4 hackathon exercises. Fraunhofer IAIS organised this exercise in conjunction with external partner MKLab ITI-CERTH (EOPEN project). This step-by-step exercise featured the setup of local Docker images on Linux OS featuring Docker Compose and (pre-installed) Python, SANSA, Hadoop, Apache Spark and Apache Zeppelin. It featured semantic transformation and the use of SANSA (Scalable Semantic Analytics Stack - http://sansa-stack.net/) libraries on a sample of tweets ahead of geo-clustering.

Project website (Hackathon information): https://www.ec-better.eu/pages/2nd-hackathon
Github repository: https://github.com/ec-better/hackathon-2020-semanticgeoclustering


  1. 1. Semantic Geo-clustering using SANSA Dr. Simon Scerri (Fraunhofer IAIS) Afshin Sadeghi (Fraunhofer IAIS) Exercise 4, Final BETTER Hackathon 29th October 2020 Virtual Hackathon https://www.ec-better.eu/pages/2nd-hackathon http://ec-better.eu This project has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement no 776280 VIEW these slides (and others) here https://www.slideshare.net/PRBETTER/ GitHub Repository https://github.com/ec-better/hackathon-2020-semanticgeoclustering
  2. 2. Final BETTER Hackathon Results of the Better Project
  3. 3. BETTER | Delivering EO Based Data Pipelines to Address Key Societal Challenges • 3 year H2020 project selected in the EO2-2017 call “Big Data shift” (2017 - 2020) • 3 million euros funding • 6 partners: • Technical Development • Deimos group (Deimos Engenharia, Deimos Space, Deimos UK), Terradue and Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), • Challenge promoters • World Food Programme (WFP), European Union Satellite Centre (SatCen), Swiss Federal Institute of Technology in Zurich (ETH Zurich)
  4. 4. BETTER | General Objective Implementing an EO Big Data intermediate service layer devoted to harnessing the potential of the Copernicus and Sentinel European EO data directly from the needs of the users, gathered through Data Challenges, and delivering to them customized solutions denominated Data Pipelines. http://ec-better.eu
  5. 5. BETTER | Extending those communities through challenges • Data challenges set by user communities to drive the development of the BETTER service layer • 3 challenge yearly cycles • challenge promoters working in key societal areas • consists of a well defined problem that requires systematic processing of EO and/or non-EO data • should involve high data volume and/or diversity • centered in the use of Copernicus and Sentinel data
  6. 6. BETTER | Extending those communities through challenges • Addresses the needs of a diverse set of user communities to promote a wider range of solutions • Increases the scope, robustness & resilience of the developed system • Data challenges set by user communities to drive the development of the BETTER service layer • 3 challenge yearly cycles • challenge promoters working in key societal areas • consists of a well defined problem that requires systematic processing of EO and/or non-EO data • should involve high data volume and/or diversity • centered in the use of Copernicus and Sentinel data
  7. 7. • More about the Project • Website: https://www.ec-better.eu/ • Twitter: https://twitter.com/BETTER_H2020 • LinkedIn: https://www.linkedin.com/company/better_h2020/ • Facebook: https://www.facebook.com/BETTER.H2020 • GitHub: https://github.com/ec-better • SlideShare: https://www.slideshare.net/PRBETTER/ • Youtube: https://www.youtube.com/channel/UCc6NJ1v3sGoKoJuHQlACi8Q BETTER | Delivering EO Based Data Pipelines to Address Key Societal Challenges
  8. 8. Final BETTER Hackathon Value of Semantic Interoperability
  9. 9. Data Heterogeneity: Limiting the Potential! Closed Data Silos Project Copernicus Data Hub Public Gov. Data Linked Data … Project Weather Data Shared Data Spaces Data Providers ?? ?
  10. 10. What is needed to Enable Linking and Maximise Re-use? ● Highly-structured Data Format ● Using Common Domain Models ● Porting existing data onto a highly-structured format ● Using Universal Identifiers for Things Closed Data Silos Project Copernicus Data Hub Public Gov. Data Linked Data … Project Weather Data Shared Data Spaces Data Providers Data Standard Data Models Domain Experts Schema
  11. 11. Encoding Knowledge with RDF Example triples (subject, predicate, object): Bulawayo type Province; Bulawayo monthlyRainfall 86; Bulawayo country Zimbabwe. ● Subject: a concept / entity that represents things in the real and/or information world. ● Predicate: relationship between two concepts / entities. ● Object: target of a predicate: a concept (Province), another entity (Zimbabwe) or a literal (86). ● An instance is an entity that is a member of a class. ● A class represents a group of instances that have one or more properties in common. ● A property refers to attributes of the instance or relations to other entities. ● Literals are encoded in forms of data types, e.g.: String: “Value”; Number: 180, 5.85; Date: 2018-02-21.
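The triple structure on this slide can be sketched in a few lines of Python. This is only an illustration: the `http://example.org/` namespace and the helper names `iri`, `typed_literal` and `to_ntriples` are assumptions made here, since the slide gives no URIs. Only the `rdf:type` and `xsd:integer` IRIs are standard.

```python
# Hypothetical namespace: the slide does not give real URIs for its example.
EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def typed_literal(value, datatype):
    """Encode a literal together with its datatype IRI."""
    return f'"{value}"^^<{datatype}>'


# (subject, predicate, object) triples for the Bulawayo example.
triples = [
    (iri(EX + "Bulawayo"), iri(RDF_TYPE), iri(EX + "Province")),
    (iri(EX + "Bulawayo"), iri(EX + "country"), iri(EX + "Zimbabwe")),
    (iri(EX + "Bulawayo"), iri(EX + "monthlyRainfall"), typed_literal(86, XSD_INTEGER)),
]


def to_ntriples(rows):
    """Serialise each triple as one N-Triples line ending in ' .'."""
    return [f"{s} {p} {o} ." for s, p, o in rows]


for line in to_ntriples(triples):
    print(line)
```

Each printed line has the same subject, predicate, object shape as the N-triple data shown later in the exercise.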
  12. 12. Ontologies: Enabling a common ‘Language’ ● Raw data / ground truth ■ People, Places, Organisations, Sensor data, Production data, etc. ● Metadata ■ License information, Provenance, Versioning, Documentation, etc. ● Vocabularies ■ Domain Models: Definitions of Class and Property(-hierarchies) ■ Define Metadata that describes Raw Data (entities) using ■ T-box enabling Knowledge Representation [Diagram layers: Data (Ground Truth); Metadata (Description of the Data); Vocabularies (Structure of the Data)]
  13. 13. Transformation tools: Un/Semi/Structured data to RDF Relational data models Graph based data model Un/Structured Text, Media, etc. Complexity ? R2RML SILK Framework ….
  14. 14. SANSA: Innovative KG-based Inference & ML ● Machine Learning Distributed algorithms ○ KG embeddings for KB completion, link prediction ○ Graph Clustering ○ Association Rule Mining ○ Semantic Decision Trees ● Inference ○ In-memory via rule-based forward chaining ○ Dynamically build Rule dependency graphs ○ Based on RDF/OWL fragments
  15. 15. Final BETTER Hackathon Today’s Exercise!
  16. 16. Exercise Format & Support • Step-by-step exercise in this main channel • Participants can Raise Hands and Ask Questions through the GoToWebinar tool options. • We will switch to ‘Meeting mode’ from ‘Webinar mode’, meaning all participants can also unmute and speak. • We will conduct checkpoints to ensure progress • Issues that need more 1-1 attention can be resolved in our background support channel: https://meet.google.com/hsn-hojt-zhi • Please don’t be shy :) Ideally, this Hackathon was meant to be a face-to-face event, so do speak up, share opinions and ask questions - we are here to help make this event successful and of value for you!
  17. 17. Timeplan • 13:30 Background (BETTER Project) • 13:45 Today’s exercise (Introduction) • 14:00 - 15:00 Task 1, 2 preparations & RDFization, Task 3 SANSA node upload • Coffee Break 20 mins • 15:20 - 16:10 Task 4 RDFization, Clustering • Coffee Break 20 mins • 16:30 - 17:15 Task 4, 5, 6 Clustering, Visualisation • 17:15 Wrap-up, Result Reports & Conclusion Stuck? Background Support Channel: https://meet.google.com/hsn-hojt-zhi
  18. 18. Semantic Geo-Clustering of Tweets with SANSA Images in Twitter posts ? Twitter posts geo locations on the globe
  19. 19. The Pipeline ● Data transformation and RDFization of the Twitter dataset. ● Step by Step clustering based on geo-tags and other associated tags. ○ Distributed clustering using SANSA stack
  20. 20. Twitter Dataset (MK-Lab) Kindly provided by MKLab ITI-CERTH (https://mklab.iti.gr/), consortium member of the EOPEN project. ● Dataset used in the exercise consists of social media metadata derived from collected and analysed Twitter posts. ● Tweet Collection: using Twitter Streaming API, posted publicly 2017-20 ● Tweets include flood-related keywords (English) and an attached image ● Tweet Analysis: Comprises i) automatic geotagging based on location mentions and ii) extraction of visual concepts from the images
  21. 21. Clustering Geo Data extracted from Twitter ● Twitter data.
  22. 22. Requirements • Docker Tutorial: https://docs.docker.com/engine/install/ • Docker Compose Tutorial: https://docs.docker.com/compose/install/ • Linux OS
  23. 23. https://github.com/ec-better/hackathon- 2020-semanticgeoclustering Clone Project from Here Stuck? Background Support Channel: https://meet.google.com/hsn-hojt-zhi
  24. 24. Task 1: Loading data to HDFS for distributed processing • If you have not done it yet, load the data with the command: make load-data • You should be able to see the Twitter JSON file in here: BETTER-hacktron/examples/data
  25. 25. Clustering Geo Data extracted from Twitter Today we perform: ● Transform and RDFization Twitter-tagged JSON → RDF ● Step by Step clustering based on geo-tags and other associated tags.
  26. 26. Task 2: Transform Twitter-tagged JSON → RDF [Pipeline diagram: Twitter data → Extract image tags, locations (output: image tags, locations, tweet IDs) → Intersect tags and tweets (output: intersected values) → Create concepts and instances for each of these (output: concept list, concept ids, tag concepts, tag instances, etc.) → Transformation to RDF (output: values in RDF format)]
  27. 27. Task 2: From JSON to RDF We have prepared a Python script that performs these tasks. Zeppelin can run Python! • Open localhost • Create a new notebook. • Check the code in workbook: workbook/zeppelin_step2_1.txt • Copy and paste it into Zeppelin • Click run in the right corner.
  28. 28. Task 2: From JSON to RDF Let’s check inside the RDF file: Open a new paragraph in Zeppelin by hovering the mouse below the first paragraph, then add the command (workbook/zeppelin_step2_2.txt): %sh head ../sansa-examples/data/EOPEN_POIs_100.nt -n 100
  29. 29. Task 2: Content of Triple dataset • Output: N-triple data <...> <../22-rdf-syntax-ns#type> <http://slipo.eu/def#POI> . <...> <...l#hasGeometry> <.../geometry> . <..> <http://slipo.eu/def#termValue> "Professional_Video" . <..> <..> "POINT(71.748324 33.973913)"^^<http://www.opengis.net/ont/geosparql#wktLiteral> . <http://slipo.eu/id/poi/1306902128451923968> <http://slipo.eu/def#termValue> "Text" . • Annotations on the slide: defining concepts (classes); an image tag; a location point; tweet ID instance
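To make the shape of this output concrete, here is a minimal Python sketch of how one tweet record could be turned into such triples. It is not the workshop's script (that lives in workbook/zeppelin_step2_1.txt). The JSON field names (`id`, `lon`, `lat`, `tags`), the function name `tweet_to_ntriples`, the `/geometry` suffix on the POI IRI and the use of the GeoSPARQL `hasGeometry` and `asWKT` predicates are all assumptions, because the slide elides those URIs. The `slipo.eu` IRIs and the sample values are taken from the slides.

```python
import json

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SLIPO = "http://slipo.eu/def#"             # class and tag predicate shown on the slide
POI_BASE = "http://slipo.eu/id/poi/"       # subject prefix shown on the slide
GEO = "http://www.opengis.net/ont/geosparql#"  # assumed namespace for geometry predicates


def tweet_to_ntriples(record):
    """Turn one tweet record (hypothetical field names) into N-Triples lines."""
    poi = f"<{POI_BASE}{record['id']}>"
    geometry = f"<{POI_BASE}{record['id']}/geometry>"  # assumed geometry node IRI
    lon, lat = record["lon"], record["lat"]
    wkt = f'"POINT({lon} {lat})"^^<{GEO}wktLiteral>'
    lines = [
        f"{poi} <{RDF_TYPE}> <{SLIPO}POI> .",
        f"{poi} <{GEO}hasGeometry> {geometry} .",
        f"{geometry} <{GEO}asWKT> {wkt} .",
    ]
    # One termValue triple per image tag.
    for tag in record["tags"]:
        lines.append(f'{poi} <{SLIPO}termValue> "{tag}" .')
    return lines


# Sample values taken from the exercise output; the JSON layout itself is assumed.
sample = json.loads(
    '{"id": "1305770203016556544", "lon": 7.417893, "lat": 48.042307,'
    ' "tags": ["Graphic", "Maps"]}'
)
ntriples = tweet_to_ntriples(sample)
print("\n".join(ntriples))
```

Tag values are written as plain literals here; a real converter would also need to escape quotes and backslashes inside tag strings.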
  30. 30. Task 3: Upload triple data to SANSA Hadoop node Let’s upload the new file to the Hadoop node so that SANSA can process it. Check in workbook: workbook/zeppelin_step3_1.txt To see the content of Hadoop, add a new block in Zeppelin and write these two lines: %sh hdfs dfs -ls /data Check in workbook: workbook/zeppelin_step3_2.txt You should see the newly generated file in the list of HDFS files. Stuck? Background Support Channel: https://meet.google.com/hsn-hojt-zhi
  31. 31. Task 4: Extract POIs and their Geo data [Pipeline diagram: values in RDF format → Extract Values (SANSA Query) → POI TAGs, POI Geo → Clustering (with similarity definition) → Clustered POI]
  32. 32. Task 4: Use SANSA to read the N-triple file Let’s use SANSA to read the N-triple data on the Hadoop node. Check in workbook: workbook/zeppelin_step4_1.txt You will see an output similar to the opening of the file in Task 2. What was the difference? - It was a distributed read using Scala Spark: the task is distributed between workers. - Slower on very small files. - Faster on large files. Stuck? Background Support Channel: https://meet.google.com/hsn-hojt-zhi
  33. 33. Task 4: Extract POIs and their Geo data Use SANSA Query to: - extract entities that have TAGS - extract entities that have Geo-data Example output: (1306905122056867840,spatial(71.748324,33.973913)), i.e. (id, spatial(longitude, latitude)) workbook/zeppelin_step4_2.txt [Pipeline diagram: values in RDF format → Extract Values (SANSA Query) → POI TAGs, POI Geo]
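The exercise performs this extraction with SANSA Query on Spark (see workbook/zeppelin_step4_2.txt). As a single-machine stand-in that only shows what is being extracted, the sketch below pulls the same two kinds of values out of N-Triples lines with regular expressions. The pattern names, the function names and the `asWKT` predicate in the sample line are assumptions; the IDs, coordinates and `termValue` predicate come from the slides.

```python
import re

# Matches a WKT point literal on a line whose subject starts with the POI prefix.
WKT_PATTERN = re.compile(
    r'^<http://slipo\.eu/id/poi/(\d+)[^>]*>\s+<[^>]+>\s+'
    r'"POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)"'
)
# Matches an image-tag triple (termValue) on a POI subject.
TAG_PATTERN = re.compile(
    r'^<http://slipo\.eu/id/poi/(\d+)>\s+<http://slipo\.eu/def#termValue>\s+"([^"]*)"'
)


def extract_geo(lines):
    """Return {poi_id: (longitude, latitude)} for every line carrying a WKT point."""
    geo = {}
    for line in lines:
        match = WKT_PATTERN.match(line)
        if match:
            geo[match.group(1)] = (float(match.group(2)), float(match.group(3)))
    return geo


def extract_tags(lines):
    """Return {poi_id: [tag, ...]} for every termValue triple."""
    tags = {}
    for line in lines:
        match = TAG_PATTERN.match(line)
        if match:
            tags.setdefault(match.group(1), []).append(match.group(2))
    return tags


# Sample lines: values from the slides, geometry IRI and asWKT predicate assumed.
sample_lines = [
    '<http://slipo.eu/id/poi/1306905122056867840/geometry> '
    '<http://www.opengis.net/ont/geosparql#asWKT> '
    '"POINT(71.748324 33.973913)"^^<http://www.opengis.net/ont/geosparql#wktLiteral> .',
    '<http://slipo.eu/id/poi/1306902128451923968> '
    '<http://slipo.eu/def#termValue> "Text" .',
]
print(extract_geo(sample_lines))
print(extract_tags(sample_lines))
```

A regex pass like this does not scale or handle every N-Triples escape; it is meant only to mirror the (id, spatial(longitude, latitude)) pairs shown above.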
  34. 34. Task 4: Extract POIs and their Geo data Associate the Tag category information with the Geo data for each entity. Example output: PoiOSM(1305770203016556544,spatial(7.417893,48.042307),"Graphic"), where "Graphic" is the tag category workbook/zeppelin_step4_3.txt [Pipeline diagram: values in RDF format → Extract Values (SANSA Query) → POI TAGs, POI Geo]
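Associating tags with geo data amounts to a join on the POI id. The following Python sketch shows that join on in-memory dictionaries. The `Poi` tuple and the `associate` function are hypothetical names loosely modelled on the `PoiOSM` records in the output above, and the workshop's Scala code (workbook/zeppelin_step4_3.txt) may be structured differently.

```python
from typing import NamedTuple


class Poi(NamedTuple):
    """Illustrative counterpart of the PoiOSM records shown in the output."""
    poi_id: str
    lon: float
    lat: float
    tag: str


def associate(geo, tags):
    """Inner join on POI id: one Poi per (location, tag) pair."""
    pois = []
    for poi_id, (lon, lat) in geo.items():
        for tag in tags.get(poi_id, []):
            pois.append(Poi(poi_id, lon, lat, tag))
    return pois


# Values taken from the exercise output.
geo = {"1305770203016556544": (7.417893, 48.042307)}
tags = {"1305770203016556544": ["Graphic", "Maps"]}

pois = associate(geo, tags)
for poi in pois:
    print(poi)
```

POIs that have a location but no tag (or the reverse) are dropped by this inner join, which seems to match the goal of clustering on both properties.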
  35. 35. Task 4: Clustering POIs Based on their Geo Data and Image Tags - Clustering based on the Geo data and Tags workbook/zeppelin_step4_4.txt [Pipeline diagram: POI TAGs, POI Geo → Clustering based on Geo and Tags (with similarity definition) → Clustered POI]
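The slide mentions a similarity definition without spelling it out. Purely as an illustration of what combining geo data and tags can look like, the sketch below mixes a Jaccard similarity over tag sets with a proximity score derived from the haversine distance. This is not the similarity used by SANSA: the weighting, the 100 km scale and all function names are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two (longitude, latitude) points in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    # Clamp to guard against rounding slightly above 1.0.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def jaccard(tags_a, tags_b):
    """Share of tags two POIs have in common (1.0 if both have none)."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity(poi_a, poi_b, scale_km=100.0, tag_weight=0.5):
    """Weighted mix of tag overlap and geographic proximity, in [0, 1].

    Each POI is a (lon, lat, tags) tuple; scale_km and tag_weight are
    hypothetical tuning parameters.
    """
    lon_a, lat_a, tags_a = poi_a
    lon_b, lat_b, tags_b = poi_b
    geo_sim = math.exp(-haversine_km(lon_a, lat_a, lon_b, lat_b) / scale_km)
    return tag_weight * jaccard(tags_a, tags_b) + (1 - tag_weight) * geo_sim


a = (7.417893, 48.042307, ["Graphic", "Maps"])
b = (71.748324, 33.973913, ["Maps"])
print(similarity(a, a), similarity(a, b))
```

A score like this could feed any clustering method that works from pairwise similarities; which method the exercise uses is defined in workbook/zeppelin_step4_4.txt.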
  36. 36. Task 4: Clustering POIs Array[ClusterOSM] = Array(ClusterOSM(0,[LPoiOSM;@22a56393), ClusterOSM(1,[LPoiOSM;@1d8c8c79), ClusterOSM(2,[LPoiOSM;@44a1cfc9), ClusterOSM(3,[LPoiOSM;@5a1b8824), ClusterOSM(4,[LPoiOSM;@53abd10f), ClusterOSM(5,[LPoiOSM;@2a3d6047), ClusterOSM(6,[LPoiOSM;@60a82d4e), ClusterOSM(7,[LPoiOSM;@66f4c3d7), ClusterOSM(8,[LPoiOSM;@6b5a55af), ClusterOSM(9,[LPoiOSM;@2a243fdb), ClusterOSM(10,[LPoiOSM;@b47a58c), ClusterOSM(11,[LPoiOSM;@4bed2548), ClusterOSM(12,[LPoiOSM;@72a3e6b7), ClusterOSM(13,[LPoiOSM;@7972d5df), ClusterOSM(14,[LPoiOSM;@4faf74b8), ClusterOSM(15,[LPoiOSM;@6dcacf60), ClusterOSM(16,[LPoiOSM;@310a30), ClusterOSM(17,[LPoiOSM;@6f02fbc4), ClusterOSM(18,[LPoiOSM;@5bbc5828), ClusterOSM(19,[LPoiOSM;@60362df0), ClusterOSM(20,[LPoiOSM;@68decb4c), ClusterOSM(21,[LPoiOSM;@10fad148) finaloutputforsaving: ClsutersOSM = ClsutersOSM(135,[I@3f96ea9d,[LClusterOSM;@4ca3750f) ("Graphic",(spatial(7.417893,48.042307),1305770203016556544)) ("Maps",(spatial(7.417893,48.042307),1305770203016556544)) ("Synthetic_Images",(spatial(7.417893,48.042307),1305770203016556544)) ("Animation_Cartoon",(spatial(7.417893,48.042307),1305770203016556544)) ("Background_Static",(spatial(7.417893,48.042307),1305770203016556544)) Tags and location of POIs in the first cluster Array of Clusters Cluster id Number of items in the first cluster
  37. 37. Task 5: Further clustering and Visualisation of Image tags using JavaScript - Zeppelin supports JavaScript and AngularJS - We process the data into a format that can be presented on a map. - Each image tag will be shown with a color workbook/zeppelin_step5_1.txt and 5_2
  38. 38. Task 5: K-Means clustering Array[List[com.vividsolutions.jts.geom.Point]] = Array(List(POINT (-74.203214 4.562966) 0.1305715467571679233, POINT (-9.807217 6.849202) 0.1306860053920505857, POINT (-9.807217 6.849202) 0.1306860116235284480), List(POINT (-81.463983 27.756767) 1.1305714614278270976, POINT (139.504805 35.723257) 1.1306891291540807682), List(POINT (-88.043054 30.694357) 2.1305788814829461504),.... Array[(String, Array[Int])] = Array((66p0,Array(5)), (112p0,Array(5)), (67p0,Array(5)), (60p0,Array(5)), (68p0,Array(5)), (23p0,Array(4)), (107p0,Array(5)), (162p0,Array(4)), (25p0,Array(4)), (110p0,Array(5)), (118p0,Array(5)), (70p0c71p0c72p0c73p0c74p0c75p0c76p0c77p0c78p0c79p0c80p0c81p0c82p0c84p0,Array(3)), (122p0,Array(4)), (125p0,Array(4)), (40p0,Array(5)),..... Cluster2 newdata: Array[Int] = Array(5, 5, 5, 5, 5, 4, 5, 4, 4, 5, 5, 3, 4, 4, 5, 4, 4, 5, 5, 5, 4, 3, 5, 4, 5, 4, 5, 4, 5, 4, 5, 4, 5, 5, 4, 4, 4, 5, 4, 5, 5, 5, 4, 5, 5, 4, 5, 4, 4, 4, 5, 4, 5, 5, 5, 5, 4, 5, 4, 3, 4, 4, 5, 4, 5, 3, 3, 5, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 4, 4, 3, 4, 4, 4, 5, 5, 4, 4, 5, 3, 4) workbook/zeppelin_step5_3.txt Combination of cluster and Id List of clusters - Step 5-3 : k-means clustering based on the distance between POIs Number of POIs in each of clusters.
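Step 5-3 clusters POIs with k-means based on the distance between them. The sketch below is a plain-Python version of Lloyd's k-means on (longitude, latitude) pairs, intended only to show the assign-and-update loop. It is not the distributed implementation used in the exercise: the deterministic initialisation from the first k points, the plain Euclidean distance on degrees and the choice of k = 2 are simplifying assumptions. The sample points are taken from the output above.

```python
import math


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm on 2-D points; returns (assignment, centroids).

    Centroids start at the first k points so the result is reproducible.
    """
    centroids = list(points[:k])
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return assignment, centroids


# (longitude, latitude) points taken from the exercise output.
pois = [
    (-74.203214, 4.562966),
    (-9.807217, 6.849202),
    (-81.463983, 27.756767),
    (139.504805, 35.723257),
    (-88.043054, 30.694357),
]
labels, centres = kmeans(pois, k=2)
for point, label in zip(pois, labels):
    print(label, point)
```

Euclidean distance on raw degrees distorts distances away from the equator and across the date line, so a production version would use a geodesic distance or project the coordinates first.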
  39. 39. Results: Visualisation using Javascript - Run and Zoom out the Map - Each POI image tag is shown with a color workbook/zeppelin_step5_4.txt to 5_6
  40. 40. Final BETTER Hackathon Concluding remarks
  41. 41. Discussion ● Share your results & experience ● Open Questions?
  42. 42. CALLISTO Project ● Continuing effort showcased in today’s Hackathon exercise ● Starting 2021 ● Consortium includes Fraunhofer IAIS & MKLab ITI-CERTH ● Features the use of SANSA ● Keep an eye out for the project (no Website yet)!
  43. 43. Thank You We truly hope you have found value in today’s exercise: Semantic Geo-clustering using SANSA Please help us by filling out a 2-minute questionnaire: https://forms.gle/MdZQ1n3ZqJoNUfzF9
