On metadata for Open Data
On an enlarged metadata set for open data classification, allowing for automated processing and linking

Published in: Technology

Transcript

  • 1. On Metadata for Open Data. Yannis Charalabidis, 25.04.2012
  • 2. Introduction. In the next slides we will try to show the level of metadata handling expected from a 2nd-generation open data system.
  • 3. Imagine you are in front of the ENGAGE system and you have the URI of a dataset somewhere in the cloud (copied as a string to the clipboard). And begin …
  • 4. Prescreening: the user gives only the URI of the dataset. Prompt: Enter (paste) the URI of your dataset _
  • 5. (Then, for 30 seconds, you see this screen, changing.) Progress of ENGAGE Resource Prescreening: ( 45% ) of jobs completed. Managed to:
      • Identify xls file
      • Autofill, provisionally: Title
      • Autofill, provisionally: Creator
      • Create unique ENGAGE URI
      • Calculate keywords
      • Autofill, provisionally: keywords
      • … …
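The prescreening step on slide 5 could be approximated along the following lines. This is a minimal sketch, not the ENGAGE implementation: the function name `prescreen`, the returned field names, and the keyword heuristic (words longer than three characters taken from the file name) are assumptions made for illustration. Nothing is downloaded; only the URI string is inspected, and the creation of the ENGAGE URI is left out.

```python
import posixpath
import re
from urllib.parse import unquote, urlparse


def prescreen(uri):
    """Provisionally fill a few attributes from the URI string alone (hypothetical helper)."""
    path = unquote(urlparse(uri).path)
    filename = posixpath.basename(path)
    stem, ext = posixpath.splitext(filename)
    # Provisional title: the file name with separators turned into spaces.
    title = re.sub(r"[_\-\s]+", " ", stem).strip()
    # Provisional keywords: assumed heuristic, words longer than three characters.
    keywords = [word.lower() for word in title.split() if len(word) > 3]
    return {
        "format": ext.lstrip(".").lower() or "unknown",
        "title": title,
        "keywords": keywords,
    }


result = prescreen("http://example.org/data/Budget_Greece_2011.xls")
print(result["format"], "|", result["title"])
```

A real prescreening job would presumably also open the file to read its headline and language settings, as the later attribute tables suggest; that part is not sketched here.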
  • 6. (When the import finishes, the report.) Report: ENGAGE managed to automatically, provisionally fill in ( 21 ) of 43 metadata attributes for your dataset. Your current validity is at ( 45% ). For your dataset to be inserted in the database, you need to continue filling in ( 5 ) mandatory attributes. Your dataset will then be inserted with validity ( 55% ). If all ( 17 ) non-mandatory attributes are filled in, validity will be at its maximum, 70%, the limit of the insertion phase. Please select next action: Continue / Park / Cancel
  • 7. After import … and then we enter the metadata insertion page with pre-filled data, etc. When we finish, we get a similar final report. And now the ENGAGE metadata set, which makes all that possible:
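The counts in the import report on slide 6 add up as 21 filled + 5 missing mandatory + 17 missing non-mandatory = 43 attributes. A minimal sketch of how such a report could be assembled is given below. The `Attribute` class and `import_report` function are hypothetical names, and the validity percentages (45%, 55%, 70%) are deliberately not computed because the slides do not give their formula.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Attribute:
    name: str
    mandatory: bool
    value: Optional[str] = None  # None or "" means "not filled in yet"


def import_report(attributes: List[Attribute]) -> dict:
    """Count filled and missing attributes (hypothetical report structure)."""
    filled = [a for a in attributes if a.value]
    missing_mandatory = [a.name for a in attributes if a.mandatory and not a.value]
    missing_optional = [a.name for a in attributes if not a.mandatory and not a.value]
    return {
        "total": len(attributes),
        "filled": len(filled),
        "missing_mandatory": missing_mandatory,
        "missing_optional": missing_optional,
        # Assumption: insertion is allowed once no mandatory attribute is empty.
        "can_insert": not missing_mandatory,
    }


# Reproduce the counts shown on the slide with placeholder attributes.
attrs = (
    [Attribute(f"filled_{i}", mandatory=False, value="x") for i in range(21)]
    + [Attribute(f"mandatory_{i}", mandatory=True) for i in range(5)]
    + [Attribute(f"optional_{i}", mandatory=False) for i in range(17)]
)
report = import_report(attrs)
print(report["filled"], "of", report["total"])  # prints: 21 of 43
```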
  • 8. But before that, some semantics. Attribute characteristics, notation:
      • (M): attribute is mandatory (cannot be empty)
      • (*): attribute takes values from a controlled list of terms (codelist), or tree (DAG of terms), or table
      • (+): takes values from an extendible list or tree; the user may extend the list during insertion
      • (a): an auto-filling list (as suggestion) or otherwise automatically calculated attribute
      • (m): attribute accepts multiple values
      • (v): attribute entry can be verified through a type-checking algorithm
      • (( x )): x is possible, but as an option
      • no tag: attribute is a simple string entry
    For the future:
      • (c0), (c1), (c2), (c3): the importance of the attribute in the completeness calculation (c3 is highest, most important)
      • (q0), (q1), (q2), (q3): the importance of the attribute in the data quality calculation (q3 is highest, most important)
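The tag notation of slide 8 is regular enough to be read by a program. The sketch below parses a tag string such as `(M)(*) ((a)) (m)` into named flags, marking double-parenthesised tags as optional. The names `TAG_MEANINGS` and `parse_tags` and the output format are assumptions for illustration; tags not defined on slide 8 are passed through under their own letter.

```python
import re

# Meanings taken from the notation on slide 8; the dictionary name is hypothetical.
TAG_MEANINGS = {
    "M": "mandatory",
    "*": "controlled list",
    "+": "extendible list",
    "a": "auto-filled",
    "m": "multiple values",
    "v": "verifiable",
}

# "((x))" (optional) must be tried before "(x)" so the double form is not split.
TAG_PATTERN = re.compile(r"\(\(\s*([^()\s]+)\s*\)\)|\(\s*([^()\s]+)\s*\)")


def parse_tags(spec):
    """Return {meaning: "yes" | "optional"} for every tag found in spec."""
    flags = {}
    for optional, plain in TAG_PATTERN.findall(spec):
        tag = optional or plain
        meaning = TAG_MEANINGS.get(tag, tag)  # unknown tags keep their letter
        flags[meaning] = "optional" if optional else "yes"
    return flags


print(parse_tags("(M)(*) ((a)) (m)"))
```

Note that the tags are case-sensitive: `(M)` (mandatory) and `(m)` (multiple values) are different flags.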
  • 9. A. The core attributes
    Metadata Attribute | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists
    Title | (M) ((a)), String | - | - | -
        Automatic: extracted from the dataset headline of the URI/dataset provided
    Publisher | (M)(*)(+), Pointer to Tree | Tree of Strings | 100 X country | Greece (ENG)
        PUB admin tree (100 per country, extendible)
    Creator | (M)(*)(+), Pointer to Tree | Tree of PS entities | 100 X country | Greece (ENG)
        PUB admin tree (100 per country, extendible). Prompt: same as the publisher
    Code | (M)(*)(a), String | - | - | -
        Automatic: ENGAGE automatic classification system (date, country, PSector, type, etc) or ENGAGE URI
    User | (*)(a), Pointer to Table | Table of Users | - | -
        The user who uploads that. Automatic filling from table of users / login
  • 10. B. The outer core attributes
    Metadata Attribute | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists
    Subject | (M)(*)(+), Pointer to List | List of strings | All resource subjects | NO
        Text describing the resource in one sentence. It can be stored in a list and reused
    Type | (M)(*)(m), Pointer to list | List of strings | 10 | ENG
        List of types: dataset, linkable dataset, visualization, textual information, executable binary, unknown
    Format | (M)(*)(+), Pointer to list | List of strings | 50 | ENG
        xls xml odata … jpd pdf … (appr. 50 format types)
    Language | (M)(*) ((a)) (m), Pointer to List | List of strings | 200 | ISO List (ENG)
        ISO simplified (5 < 20 (EU) < ISO (3000)). Automatic: extract from language settings (when XLS / ISO)
    Country | (M)(*)(m), Pointer to List | List of strings | 200 | ISO List (ENG)
        5 ENGAGE countries < rest of 27 EU < other countries. ISO country list
  • 11. C. The Public Sector Context
    Metadata Attribute | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists
    Public Sector Domain | (*)(m)(+), Pointer to Tree | Tree of strings | 20 | ENG, GR
        Tree of sectors (20: finance, health, social security, etc). Automatic: can be calculated from Creator, if all public sector entities have a domain
    Relative Public Service | (*)(m)(+), Pointer to List | List of strings | 24 | ENG, GR
        List of public services (i2010 20 basic services, plus “other-reward service”, “other permission service”, “Other registry entry service”, “Other personal documents service”)
    Relative Information System | (*)(m)(+), Pointer to List | List of strings | 200 | GR
        List of EU and national main information systems (50+50*country)
    Legal Framework | (*)(m)(+) | Table of Legal Elements | 100 | GR
        Main EU directives on open data (10), main national laws and decrees on open data (10 X country)
  • 12. D. The Scientific Context
    Metadata Attribute | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists
    Scientific Sector | (*)(m), Pointer to Tree | Tree of strings | 100 | Science
        ENGAGE Tree of Scientific Domains
    Scientific Usage of Resource | (*)(m)(+), Pointer to Tree | Tree of strings | 20 | Science
        ENGAGE tree of scientific types/usages: events data (nature or man-made), financial data, health data, etc (20)
    Intended Audience | (*)(m)(+), Pointer to List | Tree of strings | 20 | ENGAGE
        List of possible audiences: citizens, enterprises, researchers, public sector managers, public sector officers, policy makers, members of National Parliament, MEP’s, NGO’s etc
    Keywords | (*)(m)(+)(a), Pointer to List | List of strings | 200 | -
        Initial list made / proposed by ENGAGE System with countries, Psector Domain, Science Domain, Usage. Also get from linked areas / domains / types etc
  • 13. E. URL’s – URI’s - Links
    Metadata Attribute | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists
    Type of Source Link | (*)(+), Pointer to List | List of Strings | 10 | ENG
        URL / URI / DOI / WS / RSS / ENGAGE / other
    Source Link (URL) | (*) (+) ((a)), Pointer to List | List of Strings | Codelist is the full list of URI’s in ENGAGE | Yes
        String or ENGAGE URL (*)(a). Automatic: put the URL of ENGAGE site
    Type of Resource link | (*)(+), Pointer to List | List of Strings | 10 | ENG
        URL / URI / DOI / WS / RSS / ENGAGE other
    Resource Link | (*) (+) ((a)), Pointer to List | List of Strings | Codelist is the full list of URI’s in ENGAGE | Yes
        String or ENGAGE (a). Automatic lists the link it already has.
    Relevant Resources | (*)(m)(+)(a) | List of Strings | Codelist is the full list of URI’s in ENGAGE | Yes
        List of existing URI’s in the system. Automatic: calculates from matching domain+type+
  • 14. F. Linked Data
    Metadata Attribute | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists
    Linking status | (*), Pointer to List | List of Strings | 5 | YES
        Linkable, linked, non-linked, non-linkable, unknown
    Linked Data Set | (*)(m)(+)(a)(d), Pointer to List | List of URI’s | No limit | -
        URI of a linked dataset.
        Details of link:
        Linking Type (PK match) | Pointer to List | List of Strings | 1 | -
        Matching column of this resource | String | - | - | -
        Matching column of linked resource | String | - | - | -
        Columns of this resource, to be included | (m), String | - | - | -
        Columns of linked resource, to be included | (m), String | - | - | -
    Visualisations | (*)(m)(+)(a)(d), Pointer to List | List of URI’s | No limit | -
        Links to visualisations of current resource
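The "Details of link" attributes on slide 14 (a PK match, one matching column per resource, and the columns to include from each side) describe what is in effect a key-based join. A minimal sketch under that reading follows; the function name `link_datasets`, its parameters, and the in-memory list-of-dicts representation are assumptions, and the sample data is invented for illustration.

```python
def link_datasets(this_rows, linked_rows, this_key, linked_key, this_cols, linked_cols):
    """Join two datasets on a primary-key match and keep only the listed columns.

    Assumes the matching column of the linked resource holds unique values.
    """
    index = {row[linked_key]: row for row in linked_rows}
    joined = []
    for row in this_rows:
        match = index.get(row[this_key])
        if match is None:
            continue  # no counterpart in the linked resource
        merged = {col: row[col] for col in this_cols}
        merged.update({col: match[col] for col in linked_cols})
        joined.append(merged)
    return joined


# Invented sample data: budgets per municipality code, linked to municipality names.
budgets = [{"code": "A1", "budget": 120}, {"code": "B2", "budget": 75}]
names = [{"id": "A1", "name": "Athens"}, {"id": "C3", "name": "Chania"}]
print(link_datasets(budgets, names, "code", "id", ["code", "budget"], ["name"]))
```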
  • 15. G. Dates and Status
    Metadata Attribute | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists
    Consideration Started on | (v), DATE | - | - | -
    Initial Approval / Planning Started on | (v), DATE | - | - | -
    Planned to be valid on | (v), DATE | - | - | -
    Validity Started on | (v), DATE | - | - | -
    Validity to finish on | (v), DATE | - | - | -
    Rejected on | (v), DATE | - | - | -
    Substituted on | (v), DATE | - | - | -
    Status | (*) (a), Pointer to List | List of Strings | 8 | ENG
        Considered, planned, valid, valid and linked, rejected, outdated, substituted. Automatic: calculation through DATES
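Slide 15 says that Status is calculated automatically from the dates but does not give the rule. One possible rule is sketched below, purely as an assumption: the most "final" date that has already been reached wins, in the order substituted, rejected, outdated, valid, planned, considered. The function name `derive_status`, the dictionary keys, the precedence order, and the fallback value "unknown" are all hypothetical.

```python
from datetime import date


def derive_status(dates, today, linked=False):
    """Derive a status string from the date attributes (assumed precedence order)."""

    def reached(key):
        value = dates.get(key)
        return value is not None and value <= today

    if reached("substituted_on"):
        return "substituted"
    if reached("rejected_on"):
        return "rejected"
    if reached("validity_finish_on"):
        return "outdated"
    if reached("validity_started_on"):
        return "valid and linked" if linked else "valid"
    if reached("planning_started_on"):
        return "planned"
    if reached("consideration_started_on"):
        return "considered"
    return "unknown"  # fallback, not one of the statuses listed on the slide


dates = {
    "consideration_started_on": date(2012, 1, 10),
    "validity_started_on": date(2012, 3, 1),
    "validity_finish_on": date(2013, 3, 1),
}
print(derive_status(dates, today=date(2012, 4, 25)))  # prints: valid
```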
  • 16. H. Rating
    Metadata Attribute | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists
    Metadata Completeness | Number (1-100) | - | - | -
        Automatic: calculated by filled / empty non mandatory items
    Metadata Quality | Number (1-100) | - | - | -
        Automatic: calculated by specific filled / empty non mandatory items
    Citizen Rating | Number (1-100) | - | - | -
        As reported / calculated by relative users
    Researcher Rating | Number (1-100) | - | - | -
        As reported / calculated by relative users
    Business Rating | Number (1-100) | - | - | -
        As reported / calculated by relative users
    Number of Downloads | Number | - | - | -
        As reported by the ENGAGE System
    Density of Downloads | Number % | - | - | -
        As number per total period of validity to date
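Two of the automatic ratings on slide 16 can be illustrated with simple arithmetic. The sketch below assumes that Metadata Completeness is the share of filled non-mandatory attributes expressed as a percentage, and that Density of Downloads is downloads per day of validity so far. Both formulas, the function names, and the treatment of empty values are assumptions; the slides only describe these ratings in words, and this sketch can return 0 although the slide gives the range as 1-100.

```python
from datetime import date


def completeness(non_mandatory_values):
    """Percentage of non-mandatory attributes that are filled in (assumed formula)."""
    total = len(non_mandatory_values)
    if total == 0:
        return 100
    filled = sum(1 for v in non_mandatory_values.values() if v not in (None, "", []))
    return round(100 * filled / total)


def download_density(downloads, validity_started_on, today):
    """Downloads per day of validity to date (assumed unit: per day)."""
    days = (today - validity_started_on).days
    return downloads / days if days > 0 else 0.0


optional = {"Keywords": ["budget"], "Legal Framework": "", "Intended Audience": None, "Scientific Sector": "Economics"}
print(completeness(optional))  # prints: 50
print(download_density(4, date(2012, 4, 1), date(2012, 4, 25)))
```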
  • 17. Not to forget: metadata codelists were there, since the Hearing … ! An Infrastructure for Open, Linked Governmental Data Provision towards Research Communities and Citizens. Proposal Evaluation Hearing, Brussels, 23/2/2011
  • 18. Q6: Which types of metadata will you select?
      • Exploit work already done by the consortium (DELFT, NTUA, AEGEAN, STFC) in public sector metadata schemas
      • Multi-facet design: take into consideration the fact that the data may be used in different contexts, such as research, policy making or by citizens
      • Take into consideration the fact that data sources may provide wildly differing metadata; go towards metadata standardisation for Open Data, a major contribution of ENGAGE
      • Two-phase metadata design within the ENGAGE workplan (Task C1.2: Data and knowledge representation annotation and linking methods). The initial proposal, based on Dublin Core, UK eGovernment Metadata Schema and eGMS+, is as follows:
    ENGAGE Metadata Set: Identifier; Title; Creator; Publisher; Country; Source; Type (*); Format (*); Language (*); Sector (*); Subject (*); Keywords (*); Relative Public Service (*); Relative Information System; URL / URI / DOI; Validity Date (from – to); Audience (*); Legal Framework; Status (*); Relevant Resources; Linked Data Sets (*)
    (*) indicates Controlled Lists / Taxonomies