The MetaLex Document Server - Legal Documents as Versioned Linked Data

  1. The MetaLex Document Server. Rinke Hoekstra, Universiteit van Amsterdam
  2. The Problem
     • Knowledge
     • Provenance
     • [Figure: Regulation A, with Art 12 and Art 14, lid 3, 2e volzin (shown twice); dates shown: (01-01-2011), (04-02-2011), (11-06-2008), (01-07-2011)]
     • Open Data: public service falls short
     • Large scale validation of CEN MetaLex
     • “Linked Open Government Data”
  3. Current Situation
     • Public content services hosted at
  4. XML Service
     • Only available format is BWB XML
     • Only current version
     • Content at document level
     • Identification at document level
     • Identifiers are not dereferenceable
     • Hardly any metadata (e.g. version date)
     • Only available context is position in text
  5. BWBId Web Service
     • The problem with the XML processing instruction was reported and fixed, but it returned sometime last week
  6. Identifiers & Juriconnect
     • 1.0:c:BWBR0005416&artikel=6 vs vs geldigheidsdatum_14-01-2005
     • Juriconnect?
        • URN-based... but no naming server
        • (cf. Digital Object Identifiers)
     • Named elements do not carry an identifier
     • No explicit version information, only contextual
  7. Sources used...
     • List of all regulations in “XML”
     • XML Service
     • Metadata in HTML table on (the “info page”)
     • ... so let’s get started already
  8. Step 1. Requirements
  9. Our Goals
     • “Deserialize” regulation content (e.g. topic-based browsing)
     • Extract and reconstruct implicit information (identifiers, metadata)
     • Annotate regulations (reconstructed metadata, third-party metadata)
     • Annotate using regulations (knowledge based systems, services, business processes ...)
     • Accessible and reusable for any other party (shared vocabularies, standard access)
  10. Requirements
     • Unique, persistent identification
     • Generic XML structure of documents
     • Extensible metadata framework
     • Flexible web services
  11. Technology Choices
     • URL-like URIs
     • CEN MetaLex XML documents
     • Linked Data / RDF metadata (extensibility to OWL, RIF)
     • Transparent REST-services
  12. Step 2. Come up with persistent identifiers at element level and a solid versioning scheme
  13. Identification
     • Web-enabled “URL-like” URIs
        • e.g.
     • “Cool” URIs (
        • “Accept”-header based dereferencing
        • Different types of content at same URI
  14. Levels of Identification
     • IFLA FRBR levels
        • Work
        • Expression
        • Manifestation
     • [Figure: Bibliographic Entity hierarchy of Work, Expression, Manifestation and Item, connected by “realizes”, “embodies” and “exemplifies”. Examples: Work = regulation; Expression = version of regulation; Manifestation = XML version of regulation; Item = XML version of regulation on my harddisk]
  15. Transparent Identifiers
     • Hierarchical information (work)
     • Version and language (expression)
     • Format information (manifestation)
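The three levels above can be sketched as nested URI builders. This is only an illustration: the base `http://example.org/id`, the path layout, and the `data.<format>` suffix are hypothetical and are not taken from the slides, which do not show the actual URI pattern.

```python
# Hypothetical base and layout, chosen only to illustrate the three levels.
BASE = "http://example.org/id"


def work_uri(bwbid, *path):
    """Work level: regulation identifier plus hierarchical path to an element."""
    return "/".join([BASE, bwbid, *path])


def expression_uri(work, language, version_date):
    """Expression level: adds language and version date to a work URI."""
    return f"{work}/{language}/{version_date}"


def manifestation_uri(expression, fmt):
    """Manifestation level: adds the format to an expression URI."""
    return f"{expression}/data.{fmt}"


work = work_uri("BWBR0020420", "artikel", "44", "lid", "4")
expression = expression_uri(work, "nl", "2011-04-05")
print(manifestation_uri(expression, "xml"))
```

Because each level extends the previous one, a client can strip path segments to move from a manifestation back to its expression or work.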
  16. Problem
     • URIs don’t carry semantics...
     • Detect changes:
        • which element versions are the same
        • ... and which versions are different?
     • Example: Art. 44, lid 4 (2011-03-26) and Art. 44, lid 4 (2011-04-05), from: Besluit prudentiële regels Wft, BWBR0020420
  17. Opaque Identifiers
     • Content information
     • Unique SHA1 Hash of text
     • [Figure: works s1 and s2; versions s1t1, s1t2, s1t3, s2t1, s2t2, s2t3, ... linked by frbr:realizes; versions linked by owl:sameAs to hash-based identifiers AE6, B9C, 3F5]
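A minimal sketch of the hash-based idea, using Python's standard `hashlib`. The whitespace normalisation step and the function names are assumptions for illustration; the slides only state that a SHA1 hash of the text is used.

```python
import hashlib


def opaque_id(text):
    """Return the hex SHA-1 digest of the element text.

    Collapsing whitespace before hashing is an assumption, made so that
    layout-only differences do not produce a new identifier.
    """
    normalised = " ".join(text.split())
    return hashlib.sha1(normalised.encode("utf-8")).hexdigest()


def same_content(version_a, version_b):
    """Two element versions count as the same if their hashes match."""
    return opaque_id(version_a) == opaque_id(version_b)


print(same_content("Lid 4 text", "Lid  4   text"))
print(same_content("Lid 4 text", "Lid 4 amended text"))
```

Versions whose hashes match are candidates for an `owl:sameAs` link; a differing hash signals that the text of the element changed between versions.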
  18. Step 3. Generic conversion of BWB XML to a generic XML format (CEN MetaLex) and appropriate metadata
  19. Procedure
     For each BWB XML file listed, if an update has occurred since the latest run, download the latest version, scrape metadata, and produce:
     • Persistent URIs
     • CEN MetaLex + Citations
     • Inline RDFa (optional) or RDF graph (optional)
     • Pajek “.net” files (optional)
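The "update since latest run" check in the procedure above could look roughly like this. The shape of the listing (pairs of identifier and last-modified date), the function name, and the dates are assumptions for illustration; the slides do not show the actual script.

```python
from datetime import date


def select_updated(listing, last_run):
    """Return identifiers whose last-modified date is later than last_run."""
    return [bwbid for bwbid, modified in listing if modified > last_run]


# Illustrative listing; the dates are made up for the example.
listing = [
    ("BWBR0020420", date(2011, 4, 5)),
    ("BWBR0005416", date(2005, 1, 14)),
]
print(select_updated(listing, date(2011, 1, 1)))
```

Only the selected identifiers would then go through download, metadata scraping, and conversion.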
  20. CEN MetaLex
     • Straightforward 1:1 mapping
        • ... some minor fixes
     • Mint URIs on the fly
     • Convert citations on the fly
     • Generate metadata on the fly
        • “inline” inside mcontainer elements
  21. Results (full description in draft ISWC 2011 paper)

     Table 1. Conversion performance for 300 randomly selected regulations.

     Substitutions    Number     %     Corrections    Number     %
     container         22312   29 %    artikel          2525   72 %
     hcontainer         3730    5 %    divisie           519   15 %
     htitle             3730    5 %    colspec           289    8 %
     block             34325   44 %    illustratie        54    2 %
     inline            13527   17 %    others             99    3 %
     Total             77624           Total            3486

     Total no. of regulations     300
     Revoked regulations          109   30 %
     Correction %                        4 %

     Lastly, the MDS offers a simple search interface for finding regulations based on the title and version date.

     6 Conclusion and Results
     We ran the MetaLex conversion script on all regulations available through portal, resulting in a total of 27,687 versions of regulations being con-
  22. Citations
     • Juriconnect citations:
        • 1.0:v:BWBR0020486&artikel=6
        • 1.0:c:BWBR0020486&artikel=6
     • MetaLex identifiers:
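As a sketch of the first step in converting citations, the Juriconnect strings shown above can be split into their parts with the standard library. The field names (`version`, `kind`, `bwbid`, `params`) are labels chosen here for illustration, not Juriconnect terminology, and the mapping from these parts to a MetaLex identifier is not shown because the slides do not give the target pattern.

```python
from urllib.parse import parse_qsl


def parse_juriconnect(citation):
    """Split a citation such as '1.0:c:BWBR0020486&artikel=6' into parts."""
    version, kind, rest = citation.split(":", 2)
    bwbid, _, query = rest.partition("&")
    return {
        "version": version,
        "kind": kind,
        "bwbid": bwbid,
        "params": dict(parse_qsl(query)),
    }


print(parse_juriconnect("1.0:c:BWBR0020486&artikel=6"))
```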
  23. Metadata Vocabularies
     • “RDFized” BWB elements
     • MetaLex ontology
        • FRBR type, modification events, structure
     • Dublin Core
        • title, alternativeTitle, version
     • FOAF
        • page, homepage
     • Simple Event Model (SEM)
     • Open Provenance Model vocabulary (OPMV)
     • W3C Time Ontology
  24. Events & Provenance
     [Figure: RDF graph relating an expression to the process, event and date of its creation]
     • Node captions: “The expression (version) URI of a regulation”; “The process that generated the expression”; “The creation event of the regulation”; “The date at which the expression was created”
     • Classes (via rdf:type): ml:BibliographicExpression, opmv:Artifact, opmv:Process, sem:Event, ml:LegislativeModification, ml:Date, sem:Time, time:Instant
     • Properties: ml:resultOf, opmv:wasGeneratedBy, opmv:wasGeneratedAt, ml:date, sem:hasTime, sem:eventType, sem:timeType, sem:hasTimeStamp, time:hasEnd, time:inXSDDateTime, rdf:value
     • Literal: "2009-10-23"^^xsd:date
  25. Step 4. Publish: The MetaLex Document Server (MDS)
  26. Document Serving
     • RESTful API
        • Implement Cool URIs (Dereference to XML, RDF, .net)
        • Shorthands (‘/latest’)
        • SPARQL endpoint
        • Citation graphs
     • Rudimentary (and unpredictable) search
     • CSS Stylesheet for CEN MetaLex XML
  27. Dereferencing (RDF)
     1. Client requests URI (Accept: application/x-turtle)
     2. Server redirects to manifestation URI (HTTP 303)
     3. Server queries triplestore for Symmetric Concise Bounded Description (SCBD) (SPARQL query)
     4. Triplestore returns SCBD (JSON serialisation of SCBD)
     5. MDS returns Turtle (file containing Turtle serialisation of SCBD)
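To give a feel for what the description query returns, here is a deliberately simplified, in-memory sketch: it collects every triple in which the resource appears as subject or as object. This is an assumption-laden approximation. It ignores the blank-node closure that a full SCBD includes, it does not use SPARQL or a triplestore, and the URIs and property names are hypothetical.

```python
def simple_scbd(triples, resource):
    """Return triples that mention the resource as subject or as object.

    Simplification: a full SCBD also follows blank nodes; this does not.
    """
    return [t for t in triples if t[0] == resource or t[2] == resource]


# Hypothetical data for illustration only.
triples = [
    ("ex:expr1", "rdf:type", "ml:BibliographicExpression"),
    ("ex:expr1", "ml:resultOf", "ex:event1"),
    ("ex:expr2", "ex:cites", "ex:expr1"),
    ("ex:expr2", "rdf:type", "ml:BibliographicExpression"),
]
for triple in simple_scbd(triples, "ex:expr1"):
    print(triple)
```

Including incoming triples is what makes the description "symmetric": a client sees not only what the expression says about itself, but also what cites it.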
  28. Dereferencing (XML)
     1. Client requests URI (Accept: text/xml)
     2. Server redirects to manifestation URI (HTTP 303)
     3. Server queries file store for XML manifestation (glob)
     4. If no manifestation exists, extract from parent (extract)
     5. Triplestore returns URI of manifestation
     6. MDS redirects to manifestation URI (HTTP 302), the location of the manifestation
     (Clients may render XML using CSS stylesheet)
  29. Dereferencing (...)
     • Other RDF syntaxes: application/rdf+xml, text/rdf+n3
     • HTML clients: application/xml, application/xhtml+xml, text/html
        • Redirect (303) to Marbles browser
     • Pajek clients: text/plain
        • Download .net file
        • View using Gephi Toolkit
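The Accept-header dispatch described on the dereferencing slides can be sketched as a small lookup. The media types come from the slides; the format labels, the `q`-value handling, and the HTML fallback for unknown types are assumptions for illustration, not the server's actual behaviour.

```python
# Media types from the slides, mapped to illustrative format labels.
FORMATS = {
    "application/x-turtle": "turtle",
    "application/rdf+xml": "rdfxml",
    "text/rdf+n3": "n3",
    "text/xml": "xml",
    "application/xml": "html",
    "application/xhtml+xml": "html",
    "text/html": "html",
    "text/plain": "net",
}


def negotiate(accept, default="html"):
    """Pick a format label for an Accept header, honouring q-values."""
    candidates = []
    for position, part in enumerate(accept.split(",")):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        quality = 1.0
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    quality = float(field[2:])
                except ValueError:
                    quality = 0.0
        candidates.append((-quality, position, media_type))
    # Highest quality first; ties keep the order given by the client.
    for _, _, media_type in sorted(candidates):
        if media_type in FORMATS:
            return FORMATS[media_type]
    return default


print(negotiate("text/html;q=0.5, application/x-turtle"))
```

The returned label would then decide which manifestation URI the server redirects to with HTTP 303.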
  30. Technical Details
     • Current situation
        • +/- 27 thousand regulations
        • 87.9 million triples (1.9 billion)
        • Updated daily
     • Technical details
        • Dell PowerEdge II T110, 32GB RAM
        • Garlik 4Store triplestore (
        • Python Django web applications
        • Tomcat servlet + Gephi Toolkit API
     • See
  31. Step 5. Use: social network analysis and concept extraction (ongoing work)
  32. Network Analysis
     • Impact of regulation on other regulations (combine with work on court rulings)
     • Connectedness
     • “Importance” of articles
     • Analysis tools
        • Pajek, Gephi
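A small sketch of how a citation graph might be written as a Pajek “.net” file, together with in-degree as one crude proxy for the “importance” of an article. The edge list is hypothetical, and using in-degree as the importance measure is an assumption; the slides do not say which measure is used.

```python
from collections import Counter


def to_pajek(edges):
    """Serialise directed (source, target) edges in Pajek .net format."""
    labels = sorted({node for edge in edges for node in edge})
    index = {label: number for number, label in enumerate(labels, start=1)}
    lines = [f"*Vertices {len(labels)}"]
    lines += [f'{index[label]} "{label}"' for label in labels]
    lines.append("*Arcs")
    lines += [f"{index[source]} {index[target]}" for source, target in edges]
    return "\n".join(lines) + "\n"


def in_degree(edges):
    """Count how often each node is cited."""
    return Counter(target for _, target in edges)


# Hypothetical citations between articles, for illustration only.
edges = [("art-6", "art-44"), ("art-12", "art-44"), ("art-44", "art-14")]
print(to_pajek(edges))
print(in_degree(edges).most_common(1))
```

The resulting text can be saved with a `.net` extension and opened in Pajek or Gephi for further analysis.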
