
Linked Data for Czech Legislation


The slides explain what Linked Data is and how we experiment with Linked Data for legislative documents in the Czech Republic.

Download the slides for detailed embedded comments.



  1. Linked Data for Czech Legislation
     Martin Nečaský, Ph.D. (necasky@xrg.cz)
     Matematicko-fyzikální fakulta Univerzity Karlovy (Faculty of Mathematics and Physics, Charles University)
     http://www.xrg.cz, http://www.opendata.cz
  2. Our Projects in a Nutshell
     The goal of our effort is to enable intelligent browsing and querying of a set of semi-structured documents from a given domain, for example:
     • legislative documents
     • project documentation
     • medical documentation
     Basic prerequisite: the documents share some common characteristics.
     The project consists of the following steps:
     • extract useful structured data from the semi-structured documents with NLP techniques;
     • transform the extracted data to Linked Data so that it can be easily (= quickly and cheaply) interconnected with other related data and with the original documents;
     • provide tools for browsing and querying the resulting space of data and documents.
  3. Roles
     ufal.mff.cuni.cz, ksi.mff.cuni.cz
  4. Outline
     • What is Linked Data?
       - the current Web
       - publishing data on the Web
       - Linked Data principles
     • Linked Data for legislative documents
       - basic ideas
       - what we have done and what we want to do
       - sample data and queries
  5. What is Linked Data?
  6. Web of Documents
     The current Web (of Documents) provides a lot of data about Prague.
     Problems:
     • Data about Prague is encoded in documents distributed across the Web.
     • The documents are intended for humans, not for computers.
     • Documents about Prague or related things are not linked.
     As a result, computers are not able to process the data about Prague published on the Web.
     Sources:
     • http://monitor.statnipokladna.cz: Prague budget
     • http://registry.czso.cz: basic info about Prague
     • http://www.praha.eu: Prague public contracts
     • http://www.czso.cz: demography of Prague
     • http://www.risy.cz: EU-funded projects in Prague
  7. Web of Documents
     Try to search for the following information on the current Web:
     • the top 100 suppliers of Prague with headquarters outside the Prague region;
     • the money spent in Prague on new public playgrounds in the last 5 years, per child;
     • public playgrounds in Prague funded by the EU.
     Sources: the same five websites as on slide 6 (Prague budget, basic info, public contracts, demography, EU-funded projects).
  8. Architecture of the Web of Documents
     A unified global space of documents, built on top of several simple principles:
     1. HTML as a format for publishing documents
     2. URLs as unique global identifiers of documents
     3. HTTP for locating and accessing documents by their URLs
     4. hyperlinks between documents
     There are two kinds of applications working in this space of documents:
     • web browsers (locating and browsing documents through hyperlinks)
     • search engines (indexing and full-text searching of documents)
     Diagram: databases A, B, C and D publish HTML; a web browser and a search engine access them over HTTP.
  9. What About Publishing Data?
     The next step should be publishing data instead of documents:
     • raw (open) data about things, published on the Web so that it can be processed by machines (applications, domain-specific search engines);
     • see public administration efforts in publishing open data: http://data.gov.uk, http://1.usa.gov/193lKN6
     We can publish data on the current Web!
     • basic way: data files with their own URLs in different formats (CSV, XLS, DBF, XML, etc.)
     • advanced way: Application Programming Interfaces (APIs)
  10. The Web Can Publish Data! APIs
     Different APIs provide machine-readable data for further processing in so-called mash-up applications.
     They are also built on several simple principles:
     • XML/JSON as formats for publishing data
     • HTTP URIs as globally unique identifiers of APIs and their operations
     • the HTTP protocol for transferring data between APIs and applications
     Diagram: databases A, B, C and D, each behind its own proprietary data API; mash-up apps access the APIs over HTTP.
  11. Problems with Data on the Current Web
     Current principles and technologies do not lead to a Web of Data: publishing data about things is not based on the principles that have already been invented for documents.
     Web of Documents | Current Web (IS NOT a Web of Data!)
     HTML as a format for publishing documents | many formats for publishing data (XML, JSON, CSV, XLS, ...)
     URLs as unique global identifiers of documents | no unique global identifiers of things
     HTTP for locating and accessing documents by their URLs | HTTP for locating and accessing APIs (REST), but not for locating things and accessing their data
     hyperlinks between documents | none of the current formats makes it possible to link related things
  12. 12. Linked Data data published on the Web according to 4simple principles (introduced by sir T. B. Lee)1. Use URIs as names for things2. Use HTTP URIs so that people can look up thosenames.3. When someone looks up a URI, provide usefulinformation, using the standards (RDF, SPARQL)4. Include links to other URIs so that they candiscover more things.
  13. Linked Data vs. Documents
     Web of Documents | Linked Data = Web of Data!
     HTML as a format for publishing documents | RDF as a format for publishing data about things
     URLs as unique global identifiers of documents | HTTP URIs (URLs) as unique global identifiers of things
     HTTP for locating and accessing documents by their URLs | HTTP for locating and accessing things by their HTTP URIs
     hyperlinks between documents | links between related entities
  14. Things as First-Class Citizens
     Things shown: the City of Prague, the Prague council, the Prague budget, Prague demography, public contracts OSM/MZ/044/09, MAN/23/07/007316/2010 and DIL/23/07/007302/2010, and EU-funded project CZ.2.16/2.1.00/22189.
  15. HTTP URIs for Things
     • czso.cz (Czech Statistical Office): http://registry.czso.cz/prague, http://www.czso.cz/prague, http://www.czso.cz/prague/stats/demog
     • mfcr.cz (Ministry of Finance of CZ): http://www.mfcr.cz/prague, http://www.mfcr.cz/prague/budget
     • praha.eu (Prague): http://www.praha.eu/city, http://www.praha.eu/council, http://www.praha.eu/contract/006870, http://www.praha.eu/contract/007302, http://www.praha.eu/contract/007316
     • risy.cz (Regional Information Service in CZ): http://www.risy.cz/location/prague, http://www.risy.cz/project/412457, http://www.risy.cz/contract/007302
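  16. Data about Things in RDF
     A client sends an HTTP request for http://www.praha.eu/contract/007302.
     Human-readable view: Playground Revitalization; Authority: Prague; Delivery date: 31.8.2011; Price: 28 444 000 CZK; ...
     The same data as an RDF graph:
     • http://www.praha.eu/contract/007302 dcterms:title "Playground Revitalization"
     • http://www.praha.eu/contract/007302 pc:contractingAuthority http://www.praha.eu/council
     • http://www.praha.eu/contract/007302 pc:estimatedEndDate 31.8.2011
     • http://www.praha.eu/contract/007302 pc:agreedPrice http://www.praha.eu/contract/007302/price
     • http://www.praha.eu/contract/007302/price gr:hasCurrency CZK
     • http://www.praha.eu/contract/007302/price gr:hasCurrencyValue 28444000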
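  17. Data about Things in RDF
     A client sends an HTTP request for http://www.praha.eu/contract/007302.
     Human-readable view: Playground Revitalization; Supplier: PKS INPOS; Delivery date: 31.8.2011; Price: 28 444 000 CZK; ...
     The same data in RDF (Turtle syntax):
     ```turtle
     @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
     @prefix dcterms: <http://purl.org/dc/terms/> .
     @prefix pc: <http://purl.org/procurement/public-contracts#> .
     @prefix gr: <http://purl.org/goodrelations/v1#> .

     <http://www.praha.eu/contract/007302>
         rdf:type pc:Contract ;
         dcterms:title "Playground Revitalization" ;
         pc:estimatedEndDate "31.8.2011" ;
         pc:agreedPrice <http://www.praha.eu/contract/007302/price> ;
         pc:contractingAuthority <http://www.praha.eu/council> .

     <http://www.praha.eu/contract/007302/price>
         rdf:type gr:PriceSpecification ;
         gr:hasCurrency "CZK" ;
         gr:hasCurrencyValue "28444000" .
     ```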
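To see what the RDF on this slide means as plain triples, the sketch below writes the same eight statements in N-Triples, the one-triple-per-line RDF syntax, with every prefix expanded. The `pc:` and `gr:` namespace URIs are assumptions (the commonly used Public Contracts Ontology and GoodRelations namespaces), and `Literal`, `term` and `to_ntriples` are hypothetical helpers that handle only plain string literals.

```python
# Namespace URIs behind the prefixes used on the slide.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS = "http://purl.org/dc/terms/"
PC = "http://purl.org/procurement/public-contracts#"  # assumed namespace for pc:
GR = "http://purl.org/goodrelations/v1#"             # assumed namespace for gr:


class Literal(str):
    """A plain string literal; any other str is treated as a URI."""


def term(value: str) -> str:
    """Serialize one RDF term in N-Triples syntax."""
    if isinstance(value, Literal):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        return f'"{escaped}"'
    return f"<{value}>"


def to_ntriples(triples) -> str:
    """One 'subject predicate object .' line per triple."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


CONTRACT = "http://www.praha.eu/contract/007302"
PRICE = CONTRACT + "/price"
TRIPLES = [
    (CONTRACT, RDF + "type", PC + "Contract"),
    (CONTRACT, DCTERMS + "title", Literal("Playground Revitalization")),
    (CONTRACT, PC + "estimatedEndDate", Literal("31.8.2011")),
    (CONTRACT, PC + "agreedPrice", PRICE),
    (CONTRACT, PC + "contractingAuthority", "http://www.praha.eu/council"),
    (PRICE, RDF + "type", GR + "PriceSpecification"),
    (PRICE, GR + "hasCurrency", Literal("CZK")),
    (PRICE, GR + "hasCurrencyValue", Literal("28444000")),
]

if __name__ == "__main__":
    print(to_ntriples(TRIPLES))
```

Typed literals (for example an `xsd:date` for the delivery date) are left out to keep the sketch close to the slide.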
  18. Vocabularies
     • Published RDF data would be hard to interpret if each publisher used proprietary types of properties (= predicates) and types of things (= classes).
     • Therefore, standardized (or at least widely used) predicates should take priority over proprietary ones, e.g. Dublin Core, GoodRelations, FOAF, schema.org, ...
     • Predicates are defined in so-called vocabularies (or ontologies).
     • Note: an ontology is a special case of a vocabulary; it contains more detailed reasoning rules, which are out of the scope of this lecture.
  19. Vocabularies
     • Vocabularies define classes and predicates, plus semantic relationships between classes and predicates within one vocabulary or across different vocabularies:
       - subtyping (sub-class of, sub-property of)
       - semantic equivalence (equivalent class, equivalent property), used when two different vocabularies define classes/properties with the same semantics
     • Vocabularies are expressed in RDF using the RDF Schema and OWL vocabularies.
     • Each class and predicate has its own HTTP URI; the mechanism of XML namespaces and prefixes is usually used to abbreviate them.
     • A class URI denotes the type of a thing:
       <http://www.praha.eu/contract/007302> rdf:type pc:Contract .
     • A predicate URI denotes the predicate in a triple:
       <http://www.praha.eu/contract/007302> dcterms:title "..." .
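The prefix mechanism is purely an abbreviation: `pc:Contract` stands for a full HTTP URI. A minimal sketch of expanding and compacting such prefixed names; the prefix table is hand-written for illustration, its `pc:` entry is an assumed namespace, and `expand`/`compact` are hypothetical helper names.

```python
# Hand-written prefix table for the vocabularies used in these slides.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "dcterms": "http://purl.org/dc/terms/",
    "gr": "http://purl.org/goodrelations/v1#",
    "pc": "http://purl.org/procurement/public-contracts#",  # assumed namespace
}


def expand(name: str, prefixes=PREFIXES) -> str:
    """Turn a prefixed name such as 'pc:Contract' into a full URI."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {name!r}")
    return prefixes[prefix] + local


def compact(uri: str, prefixes=PREFIXES) -> str:
    """Inverse of expand: use the longest matching namespace, else keep the full URI."""
    matches = [(len(ns), p) for p, ns in prefixes.items() if uri.startswith(ns)]
    if not matches:
        return uri
    _, prefix = max(matches)
    return f"{prefix}:{uri[len(prefixes[prefix]):]}"
```

For example, `expand("dcterms:title")` gives `http://purl.org/dc/terms/title`, the same URI every publisher using Dublin Core refers to.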
  20. Linking URIs of Related Things
     The URIs of the four publishers from slide 15 (czso.cz, mfcr.cz, praha.eu, risy.cz) are connected by typed links between related things, such as n1:budget, n2:demography, n3:beneficiary and n3:realizedBy.
  21. Linking URIs of Same Things
     The diagram shows the same URIs as slide 15 (czso.cz, mfcr.cz, praha.eu, risy.cz); URIs minted by different publishers that denote the same thing are linked with owl:sameAs.
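Because `owl:sameAs` is symmetric and transitive, a consumer merging data from several publishers typically groups URIs into equivalence classes and treats each class as one thing. The sketch below does this with a union-find structure; `same_as_groups` is a hypothetical helper, and the sameAs pairs are made-up examples built from the slide's URIs, not links these sites actually publish.

```python
def same_as_groups(pairs):
    """Group URIs into equivalence classes under owl:sameAs (symmetric + transitive).

    pairs: iterable of (uri_a, uri_b), one per owl:sameAs triple.
    Returns a list of sets, sorted for stable output.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two classes

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return sorted(groups.values(), key=sorted)


# Hypothetical owl:sameAs links between URIs from the slides.
PAIRS = [
    ("http://www.praha.eu/city", "http://www.czso.cz/prague"),
    ("http://www.czso.cz/prague", "http://www.risy.cz/location/prague"),
    ("http://www.praha.eu/contract/007302", "http://www.risy.cz/contract/007302"),
]
```

Here the three Prague URIs end up in one class even though praha.eu and risy.cz never link to each other directly.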
  22. Related vs. Same Things
     Situation: publisher A publishes some data about a thing T under URI U.
     • If you want to publish something new about T: create your own URI V for T, publish the new data under V, and link V to U with owl:sameAs.
     • If you want to say that your things are related to T but you do not publish anything new about T: do not create your own HTTP URI for T and do not copy the data about T from A; only link your things to U.
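The two cases on this slide can be written down as a small decision rule. In this sketch `publish_about` is a hypothetical helper, triples are plain Python tuples, and the example.org URIs used with it are placeholders.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def publish_about(u, new_facts=(), my_uri=None, related=()):
    """Return the triples we publish when referring to thing T, described by A under U.

    new_facts: (predicate, object) pairs about T that A does not publish.
    my_uri:    our own URI V for T, required only when new_facts is non-empty.
    related:   (our_thing, predicate) pairs saying our things are related to T.
    """
    triples = []
    if new_facts:
        # Case 1: something new about T -> own URI V, new data under V, V owl:sameAs U.
        if my_uri is None:
            raise ValueError("new data about T needs our own URI V")
        triples.extend((my_uri, p, o) for p, o in new_facts)
        triples.append((my_uri, OWL_SAME_AS, u))
    # Case 2: our things simply point at A's URI U; A's data is never copied.
    triples.extend((thing, p, u) for thing, p in related)
    return triples
```

In neither case does the output contain a triple with U as its subject: statements about U stay with publisher A.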
  23. Primary Data vs. Secondary Data
     Diagram over the same publishers and URIs as on slide 15: czso.cz, mfcr.cz, praha.eu and risy.cz.
  24. Linked Data for (Czech) Legislation
  25. Linked Data in Czech Legislation
     Entities in the diagram: acts and regulations, acts and regulations proposals, court decisions, public authorities, agendas of public authorities, rights and obligations, life situations.
     Relationships shown: define, determine, regulate, execute, results from.
  26. Structural Layer of Legislative Documents
     • structural parts of acts and regulations
     • references between court decisions and parts of acts and regulations
     • court decisions (legal case retrospection)
     • amendments
     What we have done:
     • a vocabulary of legislative documents
     • metadata and structure of acts, regulations and decrees represented as Linked Data:
       - metadata about each version of each act, regulation and decree since 1945
       - structured content of the versions of all acts, regulations and decrees valid in 2011 and 2012
     • extraction of references and retrospection (using NLP)
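The slides extract references with NLP techniques. As a much simpler stand-in (a swapped-in technique, not the authors' method), the sketch below finds citations of Czech acts written in the standard form "č. 137/2006 Sb." (No. 137/2006 Coll.) with a regular expression. `extract_act_refs` is a hypothetical name; the pattern ignores the inflected word before "č." and does not handle references to individual sections (§).

```python
import re

# "č. <number>/<year> Sb." = "No. <number>/<year> Coll." (Collection of Laws).
ACT_REF = re.compile(r"č\.\s*(\d+)\s*/\s*(\d{4})\s*Sb\.")


def extract_act_refs(text: str) -> list[tuple[int, int]]:
    """Return (number, year) pairs for every act citation found in the text."""
    return [(int(num), int(year)) for num, year in ACT_REF.findall(text)]


if __name__ == "__main__":
    # "Proceed according to Act No. 137/2006 Coll., on public contracts,
    #  and Act No. 424/1991 Coll."
    sample = ("Postupuje se podle zákona č. 137/2006 Sb., o veřejných zakázkách, "
              "a zákona č. 424/1991 Sb.")
    print(extract_act_refs(sample))  # [(137, 2006), (424, 1991)]
```

Real court decisions cite acts in many more forms (abbreviations, section-level references), which is where the NLP approach is needed.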
  27. Structural Layer of Legislative Documents
     Diagram: the Public Contracts Act with its versions 07/2006, 06/2007, 07/2012 and 01/2015; court decisions XYZ and ABC refer to it.
     Similarly, we represent the paragraphs, sections, etc. of each version of each law. However, we have a problem obtaining consolidated documents.
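Representing every version separately turns "which version was in force on a given date" into a simple lookup. A minimal sketch, assuming each version is identified only by the date it became valid; the four versions on this slide serve as sample data, the day of month (the 1st) is an assumption since the slide gives only month and year, and `version_in_force` is a hypothetical helper.

```python
from bisect import bisect_right
from datetime import date

# Versions of the Public Contracts Act shown on the slide (day of month assumed).
VERSIONS = [
    (date(2006, 7, 1), "Public Contracts Act, version 07/2006"),
    (date(2007, 6, 1), "Public Contracts Act, version 06/2007"),
    (date(2012, 7, 1), "Public Contracts Act, version 07/2012"),
    (date(2015, 1, 1), "Public Contracts Act, version 01/2015"),
]


def version_in_force(versions, when: date):
    """Return the label of the latest version valid on or before `when`, else None."""
    versions = sorted(versions)
    starts = [start for start, _ in versions]
    i = bisect_right(starts, when)  # number of versions that started on or before `when`
    return versions[i - 1][1] if i else None
```

A court decision dated in 2010 would, under these assumptions, be linked to the 06/2007 version rather than to the act in general.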
  28. Structural Layer of Legislative Documents
     Diagram of one legal case across the District Court Prague 9, the Metropolitan Court in Prague and the Supreme Court:
     • cases 6 C 135/2007, 21 Co 472/2008, 21 Co 458/2011 and 26 Cdo 2523/2012;
     • decisions 6 C 135/2007-44, 6 C 135/2007-141, 21 Co 472/2008-62, 21 Co 458/2011-173 and 26 Cdo 2523/2012, each made for its case;
     • case 26 Cdo 2523/2012 is based on an extraordinary appeal against an earlier decision;
     • the decisions are also linked to acts and to other decisions.
  29. Structural Layer of Legislative Documents
     Browsing data: http://linked.opendata.cz/resource/legislation/cz/act/2006/137-2006
     • an instance of lex:Act representing the Public Procurement Act
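Judging from the two act URIs in these slides (http://linked.opendata.cz/resource/legislation/cz/act/2006/137-2006 for Act No. 137/2006 and http://linked.opendata.cz/resource/legislation/cz/act/1991/424-1991 for Act No. 424/1991), act URIs appear to follow the pattern `act/<year>/<number>-<year>`. The hypothetical helper below assumes that this pattern holds in general, which the slides do not state.

```python
BASE = "http://linked.opendata.cz/resource/legislation/cz/act"


def act_uri(citation: str) -> str:
    """Map a citation such as '137/2006' (Act No. 137/2006 Coll.) to its assumed URI."""
    number, sep, year = citation.strip().partition("/")
    if not (sep and number.isdigit() and year.isdigit() and len(year) == 4):
        raise ValueError(f"expected '<number>/<year>', got {citation!r}")
    return f"{BASE}/{year}/{number}-{year}"
```

Predictable URIs like these let a reference extracted from a court decision be turned into a link without looking the act up first.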
  30. Structural Layer of Legislative Documents
  31. Structural Layer of Legislative Documents
     Querying data (SPARQL): http://linked.opendata.cz/sparql
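For programmatic access, the SPARQL 1.1 Protocol lets a client send a query as the `query` parameter of an HTTP GET request and ask for JSON results. A minimal standard-library sketch; the endpoint URL is the one on this slide and may no longer be online, and `sparql_request`, `rows` and `select` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint from the slide; it may no longer be online.
ENDPOINT = "http://linked.opendata.cz/sparql"


def sparql_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Build a SPARQL 1.1 Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows(results: dict) -> list[dict]:
    """Flatten SPARQL JSON results into {variable: value} dicts."""
    return [{var: cell["value"] for var, cell in binding.items()}
            for binding in results["results"]["bindings"]]


def select(query: str, endpoint: str = ENDPOINT, timeout: float = 30.0) -> list[dict]:
    """Run a SELECT query over HTTP (needs network access and a live endpoint)."""
    with urlopen(sparql_request(query, endpoint), timeout=timeout) as response:
        return rows(json.load(response))
```

Unbound optional variables are simply absent from a binding, so `rows` may return dicts with different keys.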
  32. Structural Layer of Legislative Documents
     Which acts amended the Act on political parties of the Czech Republic?
     ```sparql
     PREFIX lex: <http://purl.org/lex#>
     PREFIX frbr: <http://purl.org/vocab/frbr/core#>
     PREFIX dcterms: <http://purl.org/dc/terms/>

     # DISTINCT: one amendment may define several changes of the act
     SELECT DISTINCT ?amendmentTitle ?amendmentValidity
     WHERE {
       ?version frbr:realizationOf <http://linked.opendata.cz/resource/legislation/cz/act/1991/424-1991> .
       ?change lex:changedOriginal ?version .
       ?amendment lex:definesChange ?change ;
                  dcterms:title ?amendmentTitle ;
                  dcterms:valid ?amendmentValidity .
     }
     ```
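A SPARQL basic graph pattern is answered by finding variable bindings that make every triple pattern match the data. The sketch below does that naively in Python over a handful of triples; the data, the shortened `frbr:`/`lex:` predicate names and the amendment titles are invented for illustration, `unify` and `match` are hypothetical helpers, and a set plays the role of `DISTINCT`.

```python
ACT = "http://linked.opendata.cz/resource/legislation/cz/act/1991/424-1991"

# Hypothetical sample data: two versions of the act, changed by two amendments.
TRIPLES = [
    ("v1", "frbr:realizationOf", ACT),
    ("v2", "frbr:realizationOf", ACT),
    ("c1", "lex:changedOriginal", "v1"),
    ("c2", "lex:changedOriginal", "v2"),
    ("c3", "lex:changedOriginal", "v2"),
    ("a1", "lex:definesChange", "c1"),
    ("a2", "lex:definesChange", "c2"),
    ("a2", "lex:definesChange", "c3"),
    ("a1", "dcterms:title", "Amendment 1"),
    ("a1", "dcterms:valid", "2001-01-01"),
    ("a2", "dcterms:title", "Amendment 2"),
    ("a2", "dcterms:valid", "2010-07-01"),
]


def unify(pattern, triple, binding):
    """Extend binding so that pattern matches triple; return None if impossible."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # variable: bind it, or check an existing binding
            if result.setdefault(term, value) != value:
                return None
        elif term != value:  # constant: must match exactly
            return None
    return result


def match(patterns, triples, binding=None):
    """Yield every binding under which all triple patterns occur in triples."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    for triple in triples:
        extended = unify(patterns[0], triple, binding)
        if extended is not None:
            yield from match(patterns[1:], triples, extended)


# The WHERE clause of the query above, one tuple per triple pattern.
QUERY = [
    ("?version", "frbr:realizationOf", ACT),
    ("?change", "lex:changedOriginal", "?version"),
    ("?amendment", "lex:definesChange", "?change"),
    ("?amendment", "dcterms:title", "?amendmentTitle"),
    ("?amendment", "dcterms:valid", "?amendmentValidity"),
]
```

On this data the pattern yields three bindings, because amendment a2 changes the act twice; the projected set `{(b["?amendmentTitle"], b["?amendmentValidity"]) for b in match(QUERY, TRIPLES)}` leaves one row per amendment, as `DISTINCT` does.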
  33. Structural Layer of Legislative Documents
     One of the well-known hidden amendments: it increased the state's payments to political parties from 500k to 900k per member of parliament.
  34. Structural Layer of Legislative Documents
     Which other acts were amended together with the Act on political parties of the Czech Republic?
     ```sparql
     PREFIX lex: <http://purl.org/lex#>
     PREFIX frbr: <http://purl.org/vocab/frbr/core#>
     PREFIX dcterms: <http://purl.org/dc/terms/>

     SELECT DISTINCT ?anotherActTitle ?anotherVersionValidity
     WHERE {
       ?version frbr:realizationOf <http://linked.opendata.cz/resource/legislation/cz/act/1991/424-1991> .
       ?change lex:changedOriginal ?version .
       ?amendment lex:definesChange ?change ;
                  lex:definesChange ?anotherChange .
       FILTER (?change != ?anotherChange)
       ?anotherChange lex:changeResult ?anotherVersion .
       ?anotherVersion frbr:realizationOf ?anotherAct ;
                       dcterms:valid ?anotherVersionValidity .
       ?anotherAct dcterms:title ?anotherActTitle .
     }
     ```
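Written as explicit nested loops, the query becomes one loop per triple pattern, walking the graph from the political parties act to the acts changed by the same amendments. This is a hand-written evaluation order over invented sample data (identifiers and titles are hypothetical), with the `FILTER` as an explicit skip and a set standing in for `DISTINCT`.

```python
from collections import defaultdict

ACT = "http://linked.opendata.cz/resource/legislation/cz/act/1991/424-1991"

# Hypothetical data: one amendment changes the act and also another act.
SAMPLE = [
    ("pp_v1", "frbr:realizationOf", ACT),
    ("c1", "lex:changedOriginal", "pp_v1"),
    ("amendment", "lex:definesChange", "c1"),
    ("amendment", "lex:definesChange", "c2"),
    ("c2", "lex:changeResult", "other_v2"),
    ("other_v2", "frbr:realizationOf", "other_act"),
    ("other_v2", "dcterms:valid", "2011-01-01"),
    ("other_act", "dcterms:title", "Another act"),
]


def build_index(triples):
    """Index triples by (subject, predicate) and by (object, predicate)."""
    forward, backward = defaultdict(list), defaultdict(list)
    for s, p, o in triples:
        forward[(s, p)].append(o)
        backward[(o, p)].append(s)
    return forward, backward


def co_amended(triples, act=ACT):
    """(title, validity) of act versions produced by the same amendments as `act`'s changes."""
    fwd, bwd = build_index(triples)
    rows = set()  # a set gives DISTINCT semantics
    for version in bwd[(act, "frbr:realizationOf")]:
        for change in bwd[(version, "lex:changedOriginal")]:
            for amendment in bwd[(change, "lex:definesChange")]:
                for another_change in fwd[(amendment, "lex:definesChange")]:
                    if another_change == change:  # FILTER (?change != ?anotherChange)
                        continue
                    for another_version in fwd[(another_change, "lex:changeResult")]:
                        validities = fwd[(another_version, "dcterms:valid")]
                        for another_act in fwd[(another_version, "frbr:realizationOf")]:
                            for title in fwd[(another_act, "dcterms:title")]:
                                rows.update((title, valid) for valid in validities)
    return sorted(rows)
```

A SPARQL engine chooses such a loop order itself; writing it by hand mainly shows why the indexes on both subjects and objects matter.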
  35. Structural Layer of Legislative Documents
  36. Structural Layer of Legislative Documents
     How many changes have been made in Czech legislation per year?
     ```sparql
     PREFIX lex: <http://purl.org/lex#>
     PREFIX frbr: <http://purl.org/vocab/frbr/core#>
     PREFIX dcterms: <http://purl.org/dc/terms/>

     # The year is bound in GROUP BY so that it may be projected in SELECT.
     SELECT ?year (COUNT(?amendment) AS ?changeCnt)
     WHERE {
       ?amendment lex:definesChange ?change ;
                  dcterms:valid ?validity .
     }
     GROUP BY (YEAR(?validity) AS ?year)
     ORDER BY DESC(?year)
     ```
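Assuming each amendment has a single `dcterms:valid` date, the pattern yields one row per (amendment, change) pair, so `COUNT(?amendment)` counts changes; `COUNT(DISTINCT ?amendment)` would count amending acts instead. The same two aggregations in plain Python over hypothetical rows; `changes_per_year` and `amendments_per_year` are illustrative helper names.

```python
from collections import Counter
from datetime import date


def changes_per_year(defines_change, valid):
    """Count changes per year of validity, newest year first.

    defines_change: (amendment, change) pairs, i.e. lex:definesChange triples.
    valid:          amendment -> date, i.e. dcterms:valid values.
    """
    counts = Counter()
    for amendment, _change in defines_change:
        if amendment in valid:  # like the SPARQL join, skip amendments without a date
            counts[valid[amendment].year] += 1
    return sorted(counts.items(), reverse=True)  # ORDER BY DESC(?year)


def amendments_per_year(defines_change, valid):
    """The COUNT(DISTINCT ?amendment) variant: each amending act counted once."""
    amendments = {a for a, _ in defines_change if a in valid}
    counts = Counter(valid[a].year for a in amendments)
    return sorted(counts.items(), reverse=True)
```

With an amendment that defines two changes, the first function counts it twice and the second once, which is the difference to keep in mind when reading the chart on the next slide.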
  37. Structural Layer of Legislative Documents
  38. Semantic Layer of Legislative Documents
     • rights, obligations and subjects defined by legislation
     • their occurrences in court decisions
     • we are currently starting experiments with extracting these concepts, and the relationships between them, from the texts of acts (NLP based on syntactic parsing)
     • we do not have an RDF representation yet
  39. Semantic Layer of Legislative Documents
  40. Semantic Layer of Legislative Documents
