Hello, my name is Hans Overbeek. I am a civil servant working for the Dutch government. My main responsibility is to develop and manage a metadata standard for Dutch government information on the Internet. The standard we develop is called OWMS. OWMS is based on Dublin Core. We try to stay as close as possible to the specifications of the DCMI.
Let's take a look at the business case to see what we are talking about. The Netherlands is a small but densely populated country in Western Europe. We estimate the number of organizations within the Dutch government at 1200. There are about 1600 governmental websites that contain public information for a population of about 16 million people and companies.
Today we see that, as the supply of information grows rapidly, many people lose their way.
In an attempt to reach different target audiences, new websites are built with new editorial boards, but with reused information. To make this reuse easier, we use metadata to give structure to information that is not structured in itself. We distinguish between the supply of information by governmental organizations (here on your right) and the demand for information by citizens and companies (here on your left).
On the demand side we need a structure to create different views on the information and, for instance, to make mash-ups of information from different sources. This structure is what we call the structure of demand. You can think of postal-code areas to specify location, standard navigation structures like themes or life events, or pre-cooked queries on mash-up pages.
On the supply side we need a stable standard for metadata. Some information, if not all, can stay published for a long time, and you don't want to change the metadata whenever your presentation needs change.
In order to bridge the gap between the need for stability in the so-called back end and the need for flexibility in the front end, we need to build a model of our world in Linked Data. We call this layer the knowledge models. You can think of semantic networks, geographic models, decision trees, inference rules, thesauri, synonyms, translations, etc. This is the 'Bigger Picture'. Let's have a look at the 'Smallest Parts'.
First, the metadata that forms the stable structure of the information supply. OWMS, the metadata standard for the Dutch government, is a Dublin Core Application Profile. To be attractive enough for implementors of government information systems (such as Content Management, Record Management and Document Information Systems), we decided to make only a minimal set of properties mandatory. So we arrived at these eight core elements, the OWMS-core:
- Identifier, title and type: to identify and recognize the described information object.
- Language: to filter objects by language, and because it is a property that is relatively easy to provide.
- Creator: to know “who-says-so”. It gives the described resource authenticity.
- Modified: also contains the date created, so you can tell how old the described information object is.
- Temporal: the period that the information object is about.
- Spatial: the area or point that the information object is about.
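As a rough illustration of what "mandatory" could mean in practice, the sketch below checks a record for the eight core elements. The dict-based record format, the function name and the sample identifier are assumptions made for this example; the talk itself only shows the `dcterms:` prefix for title and spatial, so the prefix on the other six names is assumed as well.

```python
# Hypothetical sketch: the eight names follow the core elements listed in the
# talk; the dict-based record layout is an assumption for illustration only.
OWMS_CORE = (
    "dcterms:identifier",
    "dcterms:title",
    "dcterms:type",
    "dcterms:language",
    "dcterms:creator",
    "dcterms:modified",
    "dcterms:temporal",
    "dcterms:spatial",
)


def missing_core_properties(record):
    """Return the core properties that are absent or empty in `record`."""
    return [name for name in OWMS_CORE if not record.get(name)]


# Illustrative record; the identifier value is made up for the example.
record = {
    "dcterms:identifier": "http://www.amsterdam.nl/",
    "dcterms:title": "Amsterdam.nl Homepage",
    "dcterms:spatial": "Amsterdam",
}
print(missing_core_properties(record))
```

A check like this only tests for presence; whether a value comes from an accepted encoding scheme is a separate question.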
What does OWMS metadata look like? Our aim is to make it look as simple as possible. And that's not easy! ;-) In its simplest form it is just plain text, for example in the case of dcterms:title.
For other properties we provide controlled lists of values, or Vocabulary Encoding Schemes in Dublin Core terms. In this example we use the value “Amsterdam” from the Vocabulary Encoding Scheme called “overheid:Gemeente”. (“Overheid” means “Government”, “Gemeente” means “Municipality”.)
At this very moment we are developing a framework to define URIs, or pointers, for generally used concepts, also known as non-literal values. They can be used stand-alone like this pointer to the municipality of Amsterdam, ...
...or optionally in combination with a scheme and a label. These concepts will be defined as Linked Data and have relationships with other concepts.
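The four forms just described (plain text, a value from an encoding scheme, a bare URI, and a URI with scheme and label) can be read with a few lines of standard-library Python. This is only a sketch: the `scheme` and `resourceIdentifier` attributes and the sample values come from the slides, while the wrapping `meta` element, the namespace declaration on it and the function name are assumptions added so the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

DCTERMS = "http://purl.org/dc/terms/"

# The <meta> wrapper and its namespace declaration are assumptions; the four
# child elements mirror the examples shown on the slides.
SAMPLE = """
<meta xmlns:dcterms="http://purl.org/dc/terms/">
  <dcterms:title>Amsterdam.nl Homepage</dcterms:title>
  <dcterms:spatial scheme="overheid:Gemeente">Amsterdam</dcterms:spatial>
  <dcterms:spatial resourceIdentifier="http://standaarden.overheid.nl/owms/terms/Amsterdam"/>
  <dcterms:spatial scheme="overheid:Gemeente"
      resourceIdentifier="http://standaarden.overheid.nl/owms/terms/Amsterdam">Amsterdam</dcterms:spatial>
</meta>
"""


def read_statements(xml_text):
    """Turn each child element into a dict with property, uri, scheme, text."""
    root = ET.fromstring(xml_text)
    statements = []
    for element in root:
        # ElementTree expands prefixes to "{namespace}local"; fold it back.
        prop = element.tag.replace("{%s}" % DCTERMS, "dcterms:")
        statements.append({
            "property": prop,
            "uri": element.get("resourceIdentifier"),
            "scheme": element.get("scheme"),
            "text": (element.text or "").strip() or None,
        })
    return statements


for statement in read_statements(SAMPLE):
    print(statement)
```

Missing attributes and empty element content come back as `None`, so a consumer can tell the four forms apart by which of the three fields are filled.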
An overview of the properties and ranges that we use in our standard can be found via the tinyurl I provided. The site is written in our beautiful language, Dutch, but this translation table should be enough to get an idea.
So our Dublin Core Application Profile follows an abstract model which is a little bit simpler than the Abstract Model of Dublin Core.
From the top of the model going down, we see that a described resource is described by a description. A description consists of statements. A statement is a property-value pair. A value can be:
- a URI, which is a pointer to a concept;
- a plain value string;
- a Vocabulary Encoding Scheme and a string.
As said before, this model is a simplification of the full Dublin Core Abstract Model (DCAM). It is a bit more pragmatic and easier to understand, though it is not academically correct and not as flexible as the DCAM. There are two reasons for that:
1. We want to keep the metadata itself as easy to understand as possible.
2. We want to validate the metadata provided by information suppliers, which means that we want organizations to use known encoding schemes and, for instance, not to use xsi:type constructions or their own datatypes.
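The simplified model above can be written down as three small data types. This is a sketch under assumptions, not the OWMS schema: the class and field names are invented for illustration, and the only rule enforced is that a value carries at least a URI or a string.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Value:
    """A value: a URI, a plain string, or an encoding scheme plus a string."""
    uri: Optional[str] = None
    scheme: Optional[str] = None
    text: Optional[str] = None

    def __post_init__(self):
        # Assumed rule: a scheme on its own says nothing, so require
        # at least a URI or a string.
        if self.uri is None and self.text is None:
            raise ValueError("a value needs a URI or a string")


@dataclass
class Statement:
    """A property-value pair."""
    prop: str
    value: Value


@dataclass
class Description:
    """A set of statements about one described resource."""
    resource: str
    statements: List[Statement] = field(default_factory=list)


# Illustrative description; the resource URL is made up for the example.
description = Description(
    resource="http://www.amsterdam.nl/",
    statements=[
        Statement("dcterms:title", Value(text="Amsterdam.nl Homepage")),
        Statement("dcterms:spatial",
                  Value(scheme="overheid:Gemeente", text="Amsterdam")),
    ],
)
print(len(description.statements))
```

Keeping the value to three optional fields mirrors the pragmatic choice described in the talk: less expressive than the DCAM, but easy to validate.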
In case you are interested in the XML Schema definition that we use to validate content metadata against the standard, you will find it at this tinyurl. I cannot go through the schema within the time frame of this presentation, but if you are interested you are welcome to investigate it. If you have any questions, please don't hesitate to contact me.
So if information suppliers provide URIs to concepts in their content metadata, they use URIs to Linked Data. Unfortunately we are not ready to publish our concepts yet, so these URIs are not dereferenceable yet. We define all knowledge objects in the same namespace. We define a number of reference classes. Every knowledge object belongs to at least one of these classes. Some knowledge objects are skos:Concepts and belong to a skos:ConceptScheme. Our concepts will be published following the very useful tutorial about Linked Data by the Free University of Berlin. I provided a tinyurl again. So we will publish a human-readable HTML page and a machine-readable RDF snippet for each concept. These concepts will have relations, represented in what we have called knowledge models.
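One common way to serve both an HTML page and an RDF snippet for the same concept is content negotiation on the HTTP Accept header. The talk does not say how OWMS will do this, so the sketch below is an assumption: the function name is hypothetical and the parsing is deliberately simple, comparing only `application/rdf+xml` against `text/html` and the `*/*` wildcard.

```python
def pick_representation(accept_header):
    """Return 'rdf' if the client prefers RDF/XML over HTML, else 'html'."""
    preferences = {}
    for part in accept_header.split(","):
        pieces = [piece.strip() for piece in part.split(";")]
        media_type = pieces[0].lower()
        quality = 1.0  # default weight when no q parameter is given
        for param in pieces[1:]:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0
        preferences[media_type] = quality
    rdf_quality = preferences.get("application/rdf+xml", 0.0)
    html_quality = max(preferences.get("text/html", 0.0),
                       preferences.get("*/*", 0.0))
    # HTML is the fallback, so ties and unknown headers go to the web page.
    return "rdf" if rdf_quality > html_quality else "html"


print(pick_representation("application/rdf+xml"))
print(pick_representation("text/html,application/xhtml+xml"))
```

Falling back to HTML means a browser, or a client that sends no Accept header, always lands on the human-readable page.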
We distinguish roughly five different knowledge domains, each with its own characteristics.
The concepts within the geographical domain are the most tangible ones. Geographical entities can be projected on a map and are therefore quite easy to define and to recognize. They are very powerful when used for filtering or navigating to information. There are many initiatives around the world to standardize geo-concepts.
These knowledge models can be purchased on the market.
Organizations are also quite formally defined in many cases, though not always, unfortunately.
Therefore we categorize organizations and geo-concepts using the OWL vocabulary. Many organizations correspond to a geo-concept, being the mandate area of that organization.
Then we distinguish a domain of government information types, products and services, like laws, permits, announcements and so on.
They relate to the organizations which produce these kinds of information.
The most interesting and the most challenging domain is that of subjects. That is because there seem to be infinitely many ways to name and classify subjects. Mostly the lemmas in these classifications are only defined by a label and lack a proper definition. Subjects relate to information types. The knowledge models for subjects are probably the holy grail of the semantic web. And the fact that we have only just begun to discover how we should handle this made us decide not to include dcterms:subject in our 'Dutch Core' so far.
And finally there is the language domain, with knowledge about synonyms and homonyms and stemming rules like plurals and conjugations.
These knowledge models are available as thesauri, often for free.
So we aim to model geo-objects and organizations in OWL, and information types and subjects in SKOS. We realize that we are just starting this exercise and that a lot of the theory behind it is still developing. So we will not be able to make definite decisions. We have been thinking about the choice between RDF and Topic Maps, and we decided to start developing our knowledge models in RDF, since that seems to be the more popular framework today. But we try to keep the models so simple that we can convert to or integrate with other frameworks if that is feasible.
There are still a lot of issues to be resolved. Our main concern is to develop a framework which is not only correct, but also simple enough to understand and to use. Therefore we need to present the classes and concept schemes in a way that is immediately clear to the user. We have to put business rules in place to enforce uniform application of the standard, but on the other hand the standard should be flexible enough to serve a broad scope of applications and domains. There are some practical issues, such as how to cope with change. Jeni Tennison gave a very useful definition of the problem. And finally we need organizations to understand the need for metadata, to understand the standard and to start using it!
Thank you for your attention and thank you, Liddy, for giving me the opportunity to present our case to you all. If there are any questions, please feel free to send me an e-mail. I wish you all a good conference in Seoul and I hope I will be able to join you next year. Thank you.
Dutch Government Business Case
Dutch Government Metadata
Overheid.nl Web Metadata Standaard: OWMS
Hans Overbeek [email_address] DC2009
Interoperability of government information
- 1200+ organisations
- 1600+ websites
Information Architecture: Findability
- Any question
- Information (Content)
- Structure of demand (Standard question): Location (e.g. map), Life event, Theme, Search term
- Metadata (“Dutch Core”): identifier, title, type, language, creator, modified, spatial, temporal
- Knowledge model (Linked Data): Geo, Subject, Language
- Any website: amsterdam.nl, government.nl, government.nl, xyz.nl
Metadata: Dutch Government Core
- dcterms:identifier
- dcterms:title
Aim: Simple content metadata
Text:
<dcterms:title>Amsterdam.nl Homepage</dcterms:title>
Value from a VES:
<dcterms:spatial scheme="overheid:Gemeente">Amsterdam</dcterms:spatial>
URI:
<dcterms:spatial resourceIdentifier="http://standaarden.overheid.nl/owms/terms/Amsterdam"></dcterms:spatial>
URI with optional VES and Text:
<dcterms:spatial scheme="overheid:Gemeente" resourceIdentifier="http://standaarden.overheid.nl/owms/terms/Amsterdam">Amsterdam</dcterms:spatial>
Documentation of the standard
Human readable documentation of the standard: http://tinyurl.com/nlgov-doc
- Eigenschappen = Properties
Ideas, rules of thumb
- Use OWL:Class for more tangible entities (e.g. organisations)
- Use SKOS:ConceptScheme for more abstract entities (e.g. information type, subject)
- Start modeling from RDF/XML and try keeping models so simple that connection and conversion to alternate models (e.g. Topic Maps) remains possible.
Work in progress
Technical R&D issues:
- How to determine classes and concept schemes?
- How liberal or concise should our business rules be in order to achieve quality without discouraging contributors?
- How to cope with change? e.g. Date in URI? See: http://www.jenitennison.com/blog/node/108 and http://www.jenitennison.com/blog/node/112
Management issues:
- many organisations and collections