2. Introduction
The next slides show what level of metadata
handling to expect from a second-generation
open data system.
3. Imagine you are in front of the ENGAGE system
with the URI of a dataset
somewhere in the cloud
(copied as a string to the clipboard).
And you begin …
5. (Then, for 30 seconds, you see this
screen updating)
Progress of ENGAGE Resource Prescreening:
( 45% ) of jobs completed
Managed to :
Identify xls file
Autofill, provisionally: Title
Autofill, provisionally: Creator
Create unique ENGAGE URI
Calculate keywords
Autofill, provisionally: keywords
…
…
6. (When the import finishes, the report)
Report
ENGAGE managed to automatically, provisionally fill in ( 21 ) of 43 metadata
attributes for your dataset.
Your current validity is at ( 45% )
For your dataset to be inserted in the database, you need to continue filling
in ( 5 ) mandatory attributes.
Your dataset will then be inserted with validity ( 55% )
If all ( 17 ) non-mandatory attributes are filled in, validity will reach its maximum of
70%, the limit of the insertion phase.
Please select next action: Continue Park Cancel
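The counts in this report follow from three sets: all attributes, the mandatory ones, and those already filled in. The sketch below is a minimal illustration under assumptions: the function name, the attribute names and the mandatory/optional split are hypothetical, and the validity percentages are not computed because the slides do not give their formula.

```python
def prescreening_counts(all_attrs, mandatory, filled):
    """Return the counts shown in the report (all names are hypothetical)."""
    all_attrs, mandatory, filled = set(all_attrs), set(mandatory), set(filled)
    missing = all_attrs - filled
    return {
        "total": len(all_attrs),
        "filled": len(filled & all_attrs),
        "mandatory_missing": len(missing & mandatory),
        "optional_missing": len(missing - mandatory),
    }

# Illustrative data shaped to match the report: 43 attributes, 21 pre-filled.
attrs = [f"attr{i}" for i in range(43)]
mandatory = attrs[:10]                 # assumed split, not stated in the slides
filled = attrs[:5] + attrs[10:26]      # 5 mandatory + 16 non-mandatory = 21
report = prescreening_counts(attrs, mandatory, filled)
print(report)
```

With this sample data the sketch reproduces the 21 of 43 filled, 5 mandatory and 17 non-mandatory figures of the report.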
7. After import …
… we then enter the metadata insertion
page with pre-filled data, etc.
When we finish, we get a similar final report.
AND NOW, THE ENGAGE METADATA SET that
makes all of this possible:
8. But first, some semantics:
Attribute characteristics – notation:
(M) : attribute is Mandatory (cannot be empty)
(*) : attribute takes values from a controlled list of terms (codelist), or tree (DAG of terms), or table
(+) : takes values from an extendible list or tree. User may extend the list during insertion
(a) : an auto-filling list (as suggestion) or otherwise automatically calculated attribute
(m) : attribute accepts multiple values
(v) : attribute entry can be verified through a type-checking algorithm
(( x )) : x is possible, but as an option
no tag : attribute is a simple string entry
---------- for the future -------------
(c0), (c1), (c2), (c3) : the importance of the attribute in the completeness calculation (c3 is the highest, i.e. most important)
(q0), (q1), (q2), (q3) : the importance of the attribute in the data quality calculation (q3 is the highest, i.e. most important)
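The notation above can be read mechanically. The following is a minimal sketch, not part of ENGAGE: the `AttributeFlags` class and `parse_flags` function are hypothetical names, and tags not defined on this slide (for example the future (c…) and (q…) tags) are simply ignored.

```python
import re
from dataclasses import dataclass

@dataclass
class AttributeFlags:
    mandatory: bool = False      # (M)
    controlled: bool = False     # (*)
    extendible: bool = False     # (+)
    auto: bool = False           # (a)
    multiple: bool = False       # (m)
    verifiable: bool = False     # (v)
    optional_tags: tuple = ()    # tags written as (( x ))

_FIELD_BY_TAG = {"M": "mandatory", "*": "controlled", "+": "extendible",
                 "a": "auto", "m": "multiple", "v": "verifiable"}
_OPTIONAL = re.compile(r"\(\(\s*([^()\s]+)\s*\)\)")
_PLAIN = re.compile(r"\(\s*([^()\s]+)\s*\)")

def parse_flags(notation: str) -> AttributeFlags:
    """Turn a tag string such as '(M)(*) ((a)) (m)' into flags."""
    flags = AttributeFlags()
    # Collect the optional (( x )) tags first, then remove them from the string.
    flags.optional_tags = tuple(_OPTIONAL.findall(notation))
    remainder = _OPTIONAL.sub(" ", notation)
    for tag in _PLAIN.findall(remainder):
        field = _FIELD_BY_TAG.get(tag)
        if field:                 # unknown tags are skipped
            setattr(flags, field, True)
    return flags

language = parse_flags("(M)(*) ((a)) (m)")
print(language)
```

An empty tag string yields all-false flags, which corresponds to the "no tag" case of a simple string entry.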
9. A. The core attributes
Metadata Attribute | Description | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists
Title | Automatic: extracted from the dataset headline of the URI/dataset provided | (M) ((a)) String | - | - | -
Publisher | PUB admin tree (100 per country, extendible) | (M)(*)(+) Pointer to Tree | Tree of Strings | 100 X country | Greece (ENG)
Creator | PUB admin tree (100 per country, extendible). Prompt: same as the publisher | (M)(*)(+) Pointer to Tree | Tree of PS entities | 100 X country | Greece (ENG)
Code | Automatic: ENGAGE automatic classification system (date, country, PSector, type, etc) or ENGAGE URI | (M)(*)(a) String | - | - | -
User | The user who uploads the resource. Automatic filling from table of users / login | (*)(a) Pointer to Table | Table of Users | - | -
10. B. The outer core attributes
Metadata Attribute | Description | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists
Subject | Text describing the resource in one sentence. It can be stored in a list and reused | (M)(*)(+) Pointer to List | List of strings | All resource subjects | NO
Type | List of types: dataset, linkable dataset, visualization, textual information, executable binary, unknown | (M)(*)(m) Pointer to list | List of strings | 10 | ENG
Format | xls xml odata … jpd pdf … (appr. 50 format types) | (M)(*)(+) Pointer to list | List of strings | 50 | ENG
Language | ISO simplified (5 < 20 (EU) < ISO (3000)). Automatic: extract from language settings (when XLS / ISO) | (M)(*) ((a)) (m) Pointer to List | List of strings | 200 | ISO List (ENG)
Country | 5 ENGAGE countries < rest of 27 EU < other countries ISO country list | (M)(*)(m) Pointer to List | List of strings | 200 | ISO List (ENG)
11. C. The Public Sector Context
Metadata Attribute | Description | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists
Public Sector Domain | Tree of sectors (20: finance, health, social security, etc). Automatic: can be calculated from Creator, if all public sector entities have a domain | (*)(m)(+) Pointer to Tree | Tree of strings | 20 | ENG, GR
Relative Public Service | List of public services (i2010 20 basic services, plus “other-reward service”, “other permission service”, “Other registry entry service”, “Other personal documents service”) | (*)(m)(+) Pointer to List | List of strings | 24 | ENG, GR
Relative Information System | List of EU and national main information systems (50+50*country) | (*)(m)(+) Pointer to List | List of strings | 200 | GR
Legal Framework | Main EU directives on open data (10), main national laws and decrees on open data (10 X country) | (*)(m)(+) | Table of Legal Elements | 100 | GR
12. D. The Scientific Context
Metadata Attribute | Description | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists
Scientific Sector | ENGAGE Tree of Scientific Domains | (*)(m) Pointer to Tree | Tree of strings | 100 | Science
Scientific Usage of Resource | ENGAGE tree of scientific types/usages: events data (nature or man-made), financial data, health data, etc (20) | (*)(m)(+) Pointer to Tree | Tree of strings | 20 | Science
Intended Audience | List of possible audiences: citizens, enterprises, researchers, public sector managers, public sector officers, policy makers, members of National Parliament, MEP’s, NGO’s etc | (*)(m)(+) Pointer to List | Tree of strings | 20 | ENGAGE
Keywords | Initial list made / proposed by ENGAGE System with countries, Psector Domain, Science Domain, Usage. Also get from linked areas / domains / types etc | (*)(m)(+)(a) Pointer to List | List of strings | 200 | -
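The Keywords row says the initial list is proposed from the countries, public sector domain, science domain and usage already entered. One way this could look is sketched below; the function name and the case-insensitive de-duplication are assumptions, and the sample values are purely illustrative.

```python
def propose_keywords(countries, ps_domains, science_domains, usages):
    """Ordered, de-duplicated union of values already entered (hypothetical helper)."""
    seen, proposal = set(), []
    for value in [*countries, *ps_domains, *science_domains, *usages]:
        key = value.strip().lower()      # compare case-insensitively
        if key and key not in seen:
            seen.add(key)
            proposal.append(value.strip())
    return proposal

suggested = propose_keywords(["Greece"], ["health"], ["Health"], ["health data"])
print(suggested)
```

Since the attribute is tagged (+), the user could still extend the proposed list during insertion.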
13. E. URL’s – URI’s - Links
Metadata Attribute | Description | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists
Type of Source Link | URL / URI / DOI / WS / RSS / ENGAGE / other | (*)(+) Pointer to List | List of Strings | 10 | ENG
Source Link (URL) | String or ENGAGE URL (*)(a). Automatic: put the URL of ENGAGE site | (*) (+) ((a)) Pointer to List | List of Strings | Codelist is the full list of URI’s in ENGAGE | Yes
Type of Resource link | URL / URI / DOI / WS / RSS / ENGAGE / other | (*)(+) Pointer to List | List of Strings | 10 | ENG
Resource Link | String or ENGAGE (a). Automatic: lists the link it already has. | (*) (+) ((a)) Pointer to List | List of Strings | Codelist is the full list of URI’s in ENGAGE | Yes
Relevant Resources | List of existing URI’s in the system. Automatic: calculates from matching domain+type+ | (*)(m)(+)(a) | List of Strings | Codelist is the full list of URI’s in ENGAGE | Yes
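Relevant Resources is described as calculated automatically from matching domain and type. A minimal sketch of such a match follows; the record layout, the `relevant_resources` name and the example URIs are hypothetical, and only the two criteria named in the table are used.

```python
def relevant_resources(current, catalogue):
    """URIs of other resources sharing at least one domain and one type."""
    matches = []
    for other in catalogue:
        if other["uri"] == current["uri"]:
            continue                      # skip the resource itself
        same_domain = set(other["domains"]) & set(current["domains"])
        same_type = set(other["types"]) & set(current["types"])
        if same_domain and same_type:
            matches.append(other["uri"])
    return matches

# Purely illustrative catalogue entries.
catalogue = [
    {"uri": "urn:example:1", "domains": ["health"], "types": ["dataset"]},
    {"uri": "urn:example:2", "domains": ["health", "finance"], "types": ["dataset"]},
    {"uri": "urn:example:3", "domains": ["finance"], "types": ["visualization"]},
]
related = relevant_resources(catalogue[0], catalogue)
print(related)
```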
14. F. Linked Data
Metadata Attribute | Description | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists
Linking status | Linkable, linked, non-linked, non-linkable, unknown | (*) Pointer to List | List of Strings | 5 | YES
Linked Data Set | URI of a linked dataset. | (*)(m)(+)(a)(d) Pointer to List | List of URI’s | No limit | -
Details of link:
Linking Type (PK match) | - | Pointer to List | List of Strings | 1 | -
Matching column of this resource | - | String | - | - | -
Matching column of linked resource | - | String | - | - | -
Columns of this resource, to be included | - | (m) String | - | - | -
Columns of linked resource, to be included | - | (m) String | - | - | -
Visualisations | Links to visualisations of current resource | (*)(m)(+)(a)(d) Pointer to List | List of URI’s | No limit | -
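The link details (a matching column on each side plus the columns to include from each resource) describe what amounts to a key-match join. The sketch below shows one possible reading of the PK match linking type; the function name, the column names and the sample rows are illustrative assumptions.

```python
def link_resources(this_rows, linked_rows, this_key, linked_key,
                   this_columns, linked_columns):
    """Inner join on a key match, keeping only the selected columns."""
    # Index the linked resource by its matching column.
    index = {}
    for row in linked_rows:
        index.setdefault(row[linked_key], []).append(row)
    joined = []
    for row in this_rows:
        for match in index.get(row[this_key], []):
            out = {c: row[c] for c in this_columns}
            # On a name clash the linked resource's value wins.
            out.update({c: match[c] for c in linked_columns})
            joined.append(out)
    return joined

# Illustrative rows only.
this_resource = [{"region_id": "R1", "population": 100},
                 {"region_id": "R2", "population": 50}]
linked_resource = [{"code": "R1", "hospitals": 3},
                   {"code": "R3", "hospitals": 1}]
linked = link_resources(this_resource, linked_resource, "region_id", "code",
                        ["region_id", "population"], ["hospitals"])
print(linked)
```

Rows without a match on either side are dropped in this sketch; other join behaviours would be equally compatible with the table.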
15. G. Dates and Status
Metadata Attribute | Description | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists
Consideration Started on | - | (v) DATE | - | - | -
Initial Approval / Planning Started on | - | (v) DATE | - | - | -
Planned to be valid on | - | (v) DATE | - | - | -
Validity Started on | - | (v) DATE | - | - | -
Validity to finish on | - | (v) DATE | - | - | -
Rejected on | - | (v) DATE | - | - | -
Substituted on | - | (v) DATE | - | - | -
Status | Considered, planned, valid, valid and linked, rejected, outdated, substituted. Automatic: calculation through DATES | (*) (a) Pointer to List | List of Strings | 8 | ENG
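Status is described as calculated automatically from the dates. The slides do not give the rules, so the following is only a sketch under assumptions: the precedence order (substituted, rejected, outdated, valid, planned, considered), the parameter names and the `is_linked` flag are hypothetical, and the "Planned to be valid on" date is not used.

```python
from datetime import date

def compute_status(today, consideration_started=None, planning_started=None,
                   validity_started=None, validity_ends=None,
                   rejected_on=None, substituted_on=None, is_linked=False):
    """Derive Status from the date attributes (assumed precedence order)."""
    def reached(d):
        return d is not None and d <= today

    if reached(substituted_on):
        return "substituted"
    if reached(rejected_on):
        return "rejected"
    if validity_ends is not None and validity_ends < today:
        return "outdated"
    if reached(validity_started):
        return "valid and linked" if is_linked else "valid"
    if reached(planning_started):
        return "planned"
    if reached(consideration_started):
        return "considered"
    return None  # no date reached yet: status cannot be calculated

today = date(2012, 1, 1)
print(compute_status(today, consideration_started=date(2011, 3, 1)))
```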
16. H. Rating
Metadata Attribute | Description | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists
Metadata Completeness | Automatic: calculated by filled / empty non mandatory items | Number (1-100) | - | - | -
Metadata Quality | Automatic: calculated by specific filled / empty non mandatory items | Number (1-100) | - | - | -
Citizen Rating | As reported / calculated by relative users | Number (1-100) | - | - | -
Researcher Rating | As reported / calculated by relative users | Number (1-100) | - | - | -
Business Rating | As reported / calculated by relative users | Number (1-100) | - | - | -
Number of Downloads | As reported by the ENGAGE System | Number | - | - | -
Density of Downloads | As number per total period of validity to date | Number % | - | - | -
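Two of these ratings are simple ratios. The sketch below assumes an unweighted completeness (share of non-mandatory attributes filled in, without the future (c0) to (c3) weights) and a download density expressed as downloads per day of validity; both formulas and both function names are assumptions, since the slides only describe them in words.

```python
from datetime import date

def completeness(optional_attrs, filled):
    """Percentage of non-mandatory attributes that are filled in (unweighted)."""
    optional = set(optional_attrs)
    if not optional:
        return 100
    return round(100 * len(optional & set(filled)) / len(optional))

def download_density(downloads, validity_started, today):
    """Downloads per day over the period of validity to date (assumed unit)."""
    days = (today - validity_started).days
    if days <= 0:
        return 0.0
    return downloads / days

score = completeness(["a", "b", "c", "d"], ["a", "b", "x"])
density = download_density(30, date(2012, 1, 1), date(2012, 1, 31))
print(score, density)
```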
17. Not to forget: Metadata codelists
were there, since the Hearing … !
An Infrastructure for Open, Linked
Governmental Data Provision towards
Research Communities and Citizens
Proposal Evaluation Hearing
Brussels 23/2/2011
18. Q6: Which types of metadata will you select?
• Exploit work already done by the consortium (DELFT, NTUA, AEGEAN, STFC) in public
sector metadata schemas
• Multi-facet design: take into consideration the fact that the data may be used in
different contexts, such as research, policy making or by citizens
• Take into consideration the fact that data sources may provide wildly differing
metadata; move towards metadata standardisation for Open Data, a major
contribution of ENGAGE
• Two-phase metadata design within the ENGAGE workplan (Task C1.2: Data and knowledge
representation annotation and linking methods). The initial proposal, based on Dublin Core,
the UK eGovernment Metadata Schema and eGMS+, is as follows:
ENGAGE Metadata Set
Identifier | Title | Creator
Publisher | Country | Source
Type (*) | Format (*) | Language (*)
Sector (*) | Subject (*) | Keywords (*)
Relative Public Service (*) | Relative Information System | URL / URI / DOI
Validity Date (from – to) | Audience (*) | Legal Framework
Status (*) | Relevant Resources | Linked Data Sets (*)
(*) Indicates Controlled Lists / Taxonomies