LODStats
Agenda
● Introduction
● Description and System Architecture
● Dataset Model
● Data Web Statistics (Summary)
● Use Cases
● Conclusions
Introduction
How to comprehend this data?
● Data portals
● Big nucleus datasets
● SPARQL endpoints
9960+ RDF datasets on the data portals
LODStats: Web Application
● Aggregates datasets from the largest data portals
● Calculates statistical metrics
● User interface
● SPARQL interface
“LODStats – An Extensible Framework for High-performance Dataset Analytics” (EKAW’2012) [1]
LODStats: System Architecture
CKAN Aggregator
● Scans the largest CKAN repositories
● Selects the RDF datasets
“Linked Open Data Statistics: Collection and Exploitation” (KESW’2013) [2]
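The aggregation step on this slide could be sketched roughly as follows. This is a minimal illustration, not the LODStats implementation. It assumes dataset metadata shaped like CKAN package records (a `resources` list whose entries carry `format` and `url` fields). The `RDF_FORMATS` set and both function names are hypothetical choices made for this example.

```python
# Hypothetical set of format labels treated as RDF; real portals use many more spellings.
RDF_FORMATS = {
    "rdf", "rdf/xml", "application/rdf+xml",
    "turtle", "ttl", "n-triples", "nt", "n3",
}


def is_rdf_resource(resource):
    """Return True if a CKAN-style resource dict declares an RDF format."""
    fmt = (resource.get("format") or "").strip().lower()
    return fmt in RDF_FORMATS


def filter_rdf_datasets(packages):
    """Keep only the packages that have at least one RDF resource.

    Returns a list of (package name, [RDF resource URLs]) tuples.
    """
    selected = []
    for package in packages:
        urls = [
            resource["url"]
            for resource in package.get("resources", [])
            if is_rdf_resource(resource) and resource.get("url")
        ]
        if urls:
            selected.append((package["name"], urls))
    return selected
```

In practice the `packages` list would come from the catalog API of each portal; here it is left as plain input so the filtering logic can be read on its own.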
LODStats: System Architecture (cont.)
LODStats core application
● Queues RDF datasets
● Calculates statistics
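A rough sketch of the queue-then-compute flow is given below. It is illustrative only: an in-memory `deque` stands in for a real message queue, datasets are assumed to be already parsed into (subject, predicate, object) string tuples, and the four statistics shown are simple examples chosen for this sketch rather than the metrics LODStats actually computes. All function names are hypothetical.

```python
from collections import Counter, deque


def namespace_of(iri):
    """Approximate the vocabulary namespace by cutting at the last '#' or '/'."""
    cut = max(iri.rfind("#"), iri.rfind("/"))
    return iri[: cut + 1] if cut >= 0 else iri


def compute_statistics(triples):
    """Compute a few simple statistics over an iterable of (s, p, o) tuples."""
    triples = list(triples)
    return {
        "triples": len(triples),
        "distinct_subjects": len({s for s, _, _ in triples}),
        "property_usage": Counter(p for _, p, _ in triples),
        "vocabulary_usage": Counter(namespace_of(p) for _, p, _ in triples),
    }


def process_queue(queue):
    """Drain a queue of (dataset name, triples) jobs and collect the results."""
    results = {}
    while queue:
        name, triples = queue.popleft()
        results[name] = compute_statistics(triples)
    return results
```

Counting predicate namespaces per dataset is one way such statistics could feed the vocabulary reuse use case listed later in the deck.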
LODStats: Provisioning
● Docker image per component
● docker-compose.yml for the whole project
● Sustainable and platform-independent deployment
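LODStats: Provisioning (cont.)
# LODStats web application; reaches the database and the broker by service name
web:
  restart: always
  build: ./web
  links:
    - db
    - rabbitmq
  environment:
    - LODSTATS_DB=db
    - RABBITMQ=rabbitmq
# RabbitMQ message broker (stock image)
rabbitmq:
  restart: always
  image: rabbitmq:3.6.1
# Database used by the web service
db:
  restart: always
  build: ./db
# Virtuoso store with SPARQL updates disabled
virtuoso:
  restart: always
  build: ./virtuoso
  environment:
    - DBA_PASSWORD=dba
    - SPARQL_UPDATE=false
    - DEFAULT_GRAPH=http://lodstats.aksw.org/
# nginx, linked to both the web application and Virtuoso
nginx:
  build: ./nginx
  restart: always
  links:
    - web
    - virtuoso
  environment:
    - VIRTUAL_HOST=lodstats.aksw.org,stats.lod2.eu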
LODStats: Provisioning (cont.)
$ git clone https://github.com/AKSW/lodstats.docker
$ cd lodstats.docker
$ docker-compose build
$ docker-compose up -d
Data Model
Data Web Statistics Summary
              2011          2016
Datasets      422           9,644
Links         3%            40%
Data Portals  datahub.io    publicdata.eu, data.gov, datahub.io
More statistics are available from the SPARQL endpoint.
Use Cases
How can you use LODStats data?
● Privacy Analysis: Does the dataset contain sensitive information?
● Coverage Analysis: Does the dataset contain the necessary information?
● Quality Analysis: Define quality metrics using statistical data.
● Vocabulary Reuse: Find a suitable vocabulary for your dataset.
● Link Target Identification: Which datasets are good candidates for interlinking?
“Detecting Similar Linked Datasets Using Topic Modelling” (ESWC’2016) [3]
SPARQL Endpoint
http://lodstats.aksw.org/sparql
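As a hedged illustration of how the endpoint could be queried over HTTP, the sketch below encodes a query as a GET request in the style of the SPARQL 1.1 Protocol. The probe query is deliberately generic because the slides do not spell out the LODStats vocabulary. The function names are hypothetical, and `run_query` assumes the endpoint is still reachable and returns SPARQL JSON results.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://lodstats.aksw.org/sparql"  # endpoint URL given on the slide

# Generic probe query; it assumes nothing about the LODStats data model.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_query_url(endpoint, query):
    """Encode a SPARQL query as an HTTP GET request URL."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_query(endpoint, query, timeout=30):
    """Send the query and return the parsed JSON result document.

    Needs network access; not called automatically in this sketch.
    """
    request = urllib.request.Request(
        build_query_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With a reachable endpoint, `run_query(ENDPOINT, QUERY)["results"]["bindings"]` would hold the result rows in the standard SPARQL JSON results layout.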
Availability
● Application
○ Online at: http://lodstats.aksw.org
○ LODStats processing module: https://github.com/aksw/lodstats
○ LODStats frontend including SPARQLify mappings:
https://github.com/aksw/lodstats_www
○ Deployment setup (docker): https://github.com/AKSW/lodstats.docker
● Dataset
○ Online at: http://lodstats.aksw.org/sparql
○ Datahub.io: https://datahub.io/dataset/lodstats
○ Can be deployed in Virtuoso using docker-compose from deployment repo
Conclusions & Future Work
● LODStats is easily replicable using Docker technology
● Processing of very large datasets (Spark/Hadoop)
● Improving usability of the frontend
● Extending data collection to crawling
Contact Information
Address: Augustusplatz 10, Room P905, 04109 Leipzig, Germany
Phone: +49-341-97-32260
Email: iermilov@informatik.uni-leipzig.de
twitter.com/akswgroup
http://aksw.com/IvanErmilov
Thank You
Ivan Ermilov <iermilov@informatik.uni-leipzig.de>
References
[1] Jan Demter, Sören Auer, Michael Martin, and Jens Lehmann. LODStats – An Extensible Framework for High-performance Dataset Analytics. In Proceedings of the EKAW 2012.
[2] Ivan Ermilov, Michael Martin, Jens Lehmann, and Sören Auer. Linked Open Data Statistics: Collection and Exploitation. In Proceedings of the 4th Conference on Knowledge Engineering and Semantic Web.
[3] Michael Röder, Axel-Cyrille Ngonga Ngomo, Ivan Ermilov, and Andreas Both. Detecting Similar Linked Datasets Using Topic Modelling. In The Semantic Web. Latest Advances and New Domains: 13th International Conference, ESWC 2016, Heraklion, Crete, Greece, May 29 – June 2, 2016, Proceedings.

Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016
