Data Licensing on the Cloud - Empirical Insights and Implications for Linked Data
1. Data Licensing on the Cloud
Empirical Insights and Implications for Linked Data
Ivan Ermilov
Tassilo Pellegrini
2. Outline
● Open Source Licensing and Data Licensing
● Licensing for the Web of Data
○ Requirements
○ Possible Solutions
● Licensing Empirical Study
○ Methodology
○ Results
● License Composition
○ Problems
○ License Compositor
● Limitations and Future Work
3. Open Source Licensing
● Idealistic View
○ Open Source Package: License
● Realistic View
○ Open Source Package: Package License
○ Dependency 1: License 1
○ ...
○ Dependency N: License N
5. Open Data Licensing (Use Case)
Dataset A: License A
...
Dataset N: License N
● The user selects N datasets for her particular usage scenario
● Can the user commercialize the application she’s building?
● Which actions are legally acceptable for this combination of datasets?
6. Open Source & Data Licensing: Summary
● Multilayer licenses
● Licenses are created for humans by humans
● Metadata is provided
○ For open source software, the license is described in pom.xml files (Java projects)
○ Open data maintained in data catalogs has machine-readable metadata
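As a hedged illustration of the second point: CKAN-based catalogs expose each dataset as a JSON record, and the licensing fields can be read directly from it. The sketch below assumes the usual CKAN package fields `license_id`, `license_title`, and `license_url`; the sample record and the helper name `extract_license` are made up for illustration.

```python
import json

# Hypothetical CKAN-style dataset record; the values are illustrative only.
SAMPLE_RECORD = """
{
  "name": "example-dataset",
  "license_id": "cc-by",
  "license_title": "Creative Commons Attribution",
  "license_url": "http://www.opendefinition.org/licenses/cc-by"
}
"""


def extract_license(record: dict) -> dict:
    """Return the licensing fields of a record, using None for missing or empty ones."""
    fields = ("license_id", "license_title", "license_url")
    return {field: record.get(field) or None for field in fields}


license_info = extract_license(json.loads(SAMPLE_RECORD))
print(license_info["license_id"])
```

A record without any licensing fields yields `None` for all three keys, which corresponds to the "not specified" case.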
7. Licensing for the Web of Data: Requirements
● Machine Readable
○ License should be easily accessible by machine agents
○ Machines should be able to reason about licenses
● Human Readable
○ License should be easy to read and understand by a human
● Applicable in real lawsuits
○ Easy to read, yet sophisticated enough to be used in the real world
8. Licensing for the Web of Data: Solution
● Creative Commons
○ Open content licenses
○ Machine-readable (RDF)
○ Provides a simple vocabulary to describe data
● Open Data Commons
○ Open data/database licenses
○ Only plain text, PDF, ODT, RTF
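To make the machine-readable point concrete, here is a minimal sketch of how a license could be held in memory using the three properties of the Creative Commons Rights Expression Language (ccREL): `cc:permits`, `cc:requires`, and `cc:prohibits`. The `License` class and the two example descriptions are assumptions for illustration, not an official encoding; the RDF published with each license remains the authoritative source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class License:
    """Hypothetical in-memory form of a ccREL license description."""
    name: str
    permits: frozenset = frozenset()
    requires: frozenset = frozenset()
    prohibits: frozenset = frozenset()

    def allows_commercial_use(self) -> bool:
        # ccREL expresses a non-commercial clause as cc:prohibits cc:CommercialUse.
        return "CommercialUse" not in self.prohibits


# Illustrative descriptions, entered by hand.
CC_BY = License(
    name="CC BY",
    permits=frozenset({"Reproduction", "Distribution", "DerivativeWorks"}),
    requires=frozenset({"Notice", "Attribution"}),
)
CC_BY_NC = License(
    name="CC BY-NC",
    permits=frozenset({"Reproduction", "Distribution", "DerivativeWorks"}),
    requires=frozenset({"Notice", "Attribution"}),
    prohibits=frozenset({"CommercialUse"}),
)

print(CC_BY.allows_commercial_use(), CC_BY_NC.allows_commercial_use())
```

Licenses distributed only as plain text, PDF, ODT, or RTF have no such description, so it would have to be written by hand.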
10. Licensing Empirical Study: Methodology
● We chose the most popular data catalogs
○ Europe: Publicdata.eu
○ USA: Data.gov
○ Canada: Open Data Canada
○ Open Knowledge Foundation: Datahub.io
● Aggregated 441 315 individual datasets
○ CKAN aggregator: https://github.com/AKSW/ckan-aggregator-py
● Analyzed the licensing information
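The analysis step can be sketched as a simple tally over license identifiers. The grouping rules below (prefix matching on the identifier) and the sample identifiers are assumptions made for illustration; they are not the rules used in the study or in ckan-aggregator-py.

```python
from collections import Counter


def classify(license_id):
    """Map a raw license identifier to one of four coarse groups."""
    normalized = (license_id or "").strip().lower()
    if normalized in {"", "notspecified", "none"}:
        return "Not Specified"
    if normalized.startswith("cc"):
        return "CC"
    if normalized.startswith("odc"):
        return "ODC"
    return "Other"


def license_shares(license_ids):
    """Return the percentage of datasets in each group, rounded to one decimal."""
    counts = Counter(classify(license_id) for license_id in license_ids)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {group: round(100 * count / total, 1) for group, count in counts.items()}


# Hypothetical identifiers for eight datasets.
sample = ["cc-by", "odc-pddl", None, "notspecified", "uk-ogl", "cc-zero", "", "cc-by-sa"]
print(license_shares(sample))
```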
11. Results
Data Licenses on the Cloud
                     | Datagov | Open Canada | Public Data | Datahub
Datasets             | 132 206 | 244 257     | 55 481      | 9 371
License Types        | 10      | 3           | 50          | 33
Not Specified        | 99.6%   | 0.0%        | 24.3%       | 59.1%
CC                   | 0.4%    | 0.0%        | 35.3%       | 17.1%
ODC                  | 0.0%    | 0.0%        | 0.5%        | 4.8%
Other                | 0.0%    | 100.0%      | 39.9%       | 19.0%
Dereferenceable Link | 0.4%    | 100.0%      | 43.2%       | 23.1%
Machine-Readable     | 0.0%    | 0.0%        | 2.6%        | 2.2%
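As a quick arithmetic check on these results, the per-catalog dataset counts should add up to the 441 315 datasets reported in the methodology, and the four license groups (Not Specified, CC, ODC, Other) should cover each catalog completely. The sketch below re-enters the published figures by hand; the variable names are arbitrary.

```python
# Dataset counts per catalog, copied from the results.
datasets = {
    "Datagov": 132_206,
    "Open Canada": 244_257,
    "Public Data": 55_481,
    "Datahub": 9_371,
}

# Percentage shares per catalog, in the order: Not Specified, CC, ODC, Other.
shares = {
    "Datagov": (99.6, 0.4, 0.0, 0.0),
    "Open Canada": (0.0, 0.0, 0.0, 100.0),
    "Public Data": (24.3, 35.3, 0.5, 39.9),
    "Datahub": (59.1, 17.1, 4.8, 19.0),
}

total = sum(datasets.values())
print(total)  # 441315

for catalog, row in shares.items():
    # Small tolerance because the published shares are rounded to one decimal.
    assert abs(sum(row) - 100.0) < 0.2, catalog
```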
12. Problems
● Most of the licenses are not machine-readable
○ TLDR Legal: https://tldrlegal.com/
● How can those licenses be composed together?
● Can the combined license be matched to legal text?
13. License Composition
● Given human-readable deeds, machine-readable metadata, and lawyer-readable licenses
● Combine machine-readable metadata from a set of licenses
● Provide a recommendation on data usage to the user
○ Which license can be used for the derivative dataset?
○ Can it be used commercially?
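One conservative way to combine the metadata, sketched below, is to keep only the permissions that every license grants and to carry over every requirement and prohibition that any license imposes. This rule is an assumption chosen for illustration, not necessarily what the License Compositor does, and it is not legal advice. The dictionaries and the `compose` function are hypothetical.

```python
def compose(licenses):
    """Combine license descriptions conservatively.

    permits   -> intersection (an action must be allowed by every license)
    requires  -> union (every obligation carries over)
    prohibits -> union (every restriction carries over)
    """
    if not licenses:
        raise ValueError("at least one license is required")
    permits = set.intersection(*(set(lic["permits"]) for lic in licenses))
    requires = set().union(*(lic["requires"] for lic in licenses))
    prohibits = set().union(*(lic["prohibits"] for lic in licenses))
    # A prohibited action is never reported as permitted.
    return {"permits": permits - prohibits, "requires": requires, "prohibits": prohibits}


# Hypothetical metadata for two datasets (attribution-only vs. non-commercial share-alike).
dataset_a = {
    "permits": {"Reproduction", "Distribution", "DerivativeWorks"},
    "requires": {"Attribution"},
    "prohibits": set(),
}
dataset_b = {
    "permits": {"Reproduction", "Distribution", "DerivativeWorks"},
    "requires": {"Attribution", "ShareAlike"},
    "prohibits": {"CommercialUse"},
}

combined = compose([dataset_a, dataset_b])
commercial_ok = "CommercialUse" not in combined["prohibits"]
print(sorted(combined["requires"]), commercial_ok)
```

Under this rule a single non-commercial dataset makes the whole combination non-commercial, which answers the second question above for this toy input. Whether such a combination is legally valid is a separate question.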
15. Limitations and Future Work
● The application is limited to a simple prototype
○ A more complex application with license extractors and analyzers is needed
● Further analysis of existing licenses is required
● There is no ontology for reasoning over the license data
● Evaluation is required for the license compositor
○ Is it valid to make such a combination of licenses?
● Creating custom licenses
○ Automation of rights clearance processes
16. Thank you for your attention!
iermilov@informatik.uni-leipzig.de Ivan Ermilov
tassilo.pellegrini@fhstp.ac.at Tassilo Pellegrini
This work has been carried out within the project NoLDE (Network of Linked
Data Excellence) funded by the Austrian Research Promotion Agency under
grant number 3592880. We acknowledge support from the GeoKnow project, GA
no. 318159, as well as the BMBF project SAKE.