Linked Open Data for Public Contracts
Martin Nečaský
Faculty of Mathematics and Physics, Charles University in Prague
Faculty of Informatics and Statistics, University of Economics in Prague
13.6.2013 – Publications Office of the European Union, Luxembourg
Outline
• Introduction to Linked Data
• What benefits does Linked Data bring to TED and to public procurement in the EU?
• What does it mean for TED and others to publish their data as Linked Data?
• What have we already done in the LOD2 project?
Web Applications Eco-system
Linked Data helps to create an eco-system of web applications which publish, enrich and consume data about things in one shared global data space.
(Figure: several web applications (App 1, App 2, ...) connected to the Shared Global Data Space on the Web, i.e., the Web of Data.)
Architecture of Web of Documents
Shared global space of documents, built on top of several simple principles:
1. HTML as a format for publishing documents
2. URLs as unique global identifiers of documents
3. HTTP for locating and accessing documents by their URLs
4. hyperlinks between documents
There are two kinds of applications working in this space of documents:
• web browsers (locating documents and browsing them through hyperlinks)
• search engines (indexing documents and searching them by full text)
(Figure: HTML documents accessed over HTTP by a web browser and a search engine.)
Web of Documents
The current Web (of Documents) provides a lot of data about Prague.
Problems:
• Data about Prague is encoded in documents distributed across the Web.
• The documents are intended for humans, not computers.
• Documents about Prague or related things are not linked.
• Therefore, computers are not able to process the data about Prague published on the Web.
Sources of data about Prague shown on the slide:
• http://monitor.statnipokladna.cz (Prague budget)
• http://registry.czso.cz (basic info about Prague)
• http://www.praha.eu (Prague public contracts)
• http://www.czso.cz (demography of Prague)
• http://www.risy.cz (EU funded projects in Prague)
Web of Documents
Try to search for the following information on the current Web:
• Top 100 suppliers of Prague with headquarters outside of the Prague region.
• Money spent in Prague on new children's playgrounds in the last 5 years, per child.
• Organizations in Prague funded by EU structural funds and their top 100 suppliers.
(Figure: the same five Prague data sources as on the previous slide.)
Linked Data
Data published on the Web according to four simple principles (introduced by Sir Tim Berners-Lee):
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs so that they can discover more things.
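Principles 2 and 3 mean that a client looks up a thing simply by issuing an HTTP GET on its URI. A minimal Python sketch of preparing such a lookup, assuming the publisher supports HTTP content negotiation and serves Turtle (`text/turtle`) when asked for it; the function name `build_lookup_request` is hypothetical, and the praha.eu URI is the illustrative one from these slides, so it may not actually resolve.

```python
import urllib.request


def build_lookup_request(uri: str, media_type: str = "text/turtle") -> urllib.request.Request:
    """Prepare an HTTP GET that asks for an RDF description of the thing named by `uri`."""
    # Principle 2: only HTTP(S) URIs can be looked up with a plain HTTP client.
    if not uri.startswith(("http://", "https://")):
        raise ValueError(f"not an HTTP URI: {uri!r}")
    # Principle 3 via content negotiation: the Accept header asks the server
    # for an RDF serialization instead of an HTML page.
    return urllib.request.Request(uri, headers={"Accept": media_type}, method="GET")


req = build_lookup_request("http://praha.eu/contract/7302")
print(req.get_method(), req.full_url, req.get_header("Accept"))
# Actually sending it needs network access and a server that supports
# content negotiation: urllib.request.urlopen(req)
```

The same URI can then serve HTML to browsers and RDF to applications; only the `Accept` header differs.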
Things as first-class citizens
(Figure: things such as Prague City, Prague Council, Prague Demography, Prague Budget, Contract DIL/23/07/007302/2010 and Project CZ.2.16/2.1.00/22189.)
HTTP URIs for Things
(Figure: each publisher assigns HTTP URIs to the things it publishes data about, including Project CZ.2.16/2.1.00/22189.)
• praha.eu (Prague): http://praha.eu/contract/7302, http://praha.eu/council, http://praha.eu/city
• mfcr.cz (Ministry of Finance): http://mfcr.cz/prague/budget, http://mfcr.cz/prague
• risy.cz (Regional Information Service): http://risy.cz/location/prague, http://risy.cz/contract/22189-01, http://risy.cz/project/22189
• czso.cz (Czech Statistical Office): http://registry.czso.cz/prague, http://czso.cz/prague, http://czso.cz/prague/demogstat
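The URIs listed above follow a simple pattern: the publisher's domain, a kind of thing, and a local identifier. A minimal sketch of minting URIs that way, where `mint_uri` and the pattern itself are assumptions read off the examples rather than a rule stated by the publishers; if a publisher reused a full registration number such as CZ.2.16/2.1.00/22189 as the local part, percent-encoding keeps its slashes from being read as path separators.

```python
from urllib.parse import quote, urlsplit


def mint_uri(base: str, kind: str, local_id: str) -> str:
    """Build an HTTP URI of the form <base>/<kind>/<local_id> for a thing."""
    parts = urlsplit(base)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"base must be an absolute HTTP(S) URI: {base!r}")
    # Percent-encode each path segment so that characters such as '/' inside
    # an identifier cannot change the structure of the path.
    return f"{base.rstrip('/')}/{quote(kind, safe='')}/{quote(local_id, safe='')}"


print(mint_uri("http://praha.eu", "contract", "7302"))
print(mint_uri("http://risy.cz", "project", "22189"))
```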
Data about Things in RDF
(Figure: a human-readable contract record (Playground Revitalization; Authority: Prague; Delivery date: 31.8.2011; Price: 28 444 000 CZK; ...) turned into an RDF graph. http://praha.eu/contract/7302 has dcterms:title "Playground Revitalization", pc:estimatedEndDate 31.8.2011, pc:contractingAuthority http://praha.eu/council and pc:agreedPrice http://praha.eu/contract/7302/price; the price node has gr:hasCurrency CZK and gr:hasCurrencyValue 28444000.)
Data about Things in RDF
In Turtle syntax:

    @prefix dcterms: <http://purl.org/dc/terms/> .
    @prefix pc:      <http://purl.org/procurement/public-contracts#> .
    @prefix gr:      <http://purl.org/goodrelations/v1#> .
    @prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .

    <http://www.praha.eu/contract/7302>
        dcterms:title "Playground Revitalization" ;
        pc:estimatedEndDate "2011-08-31"^^xsd:date ;
        pc:agreedPrice <http://www.praha.eu/contract/7302/price> ;
        pc:contractingAuthority <http://www.praha.eu/council> .

    <http://www.praha.eu/contract/7302/price>
        gr:hasCurrency "CZK" ;
        gr:hasCurrencyValue "28444000"^^xsd:float .
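To make the graph structure explicit, the contract description above can be written as plain (subject, predicate, object) triples and serialized as N-Triples, the line-based RDF format with one full-URI triple per line. A minimal stdlib-only sketch; the helper names are hypothetical, and, as an assumption, the end date and the amount are given XSD datatypes.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple, Union

DCTERMS = "http://purl.org/dc/terms/"
PC = "http://purl.org/procurement/public-contracts#"
GR = "http://purl.org/goodrelations/v1#"
XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class Lit:
    """An RDF literal with an optional datatype URI."""
    value: str
    datatype: Optional[str] = None


Node = Union[str, Lit]  # a plain str is treated as a URI
Triple = Tuple[str, str, Node]


def term(node: Node) -> str:
    """Render one node in N-Triples syntax."""
    if isinstance(node, Lit):
        escaped = (node.value.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
        suffix = f"^^<{node.datatype}>" if node.datatype else ""
        return f'"{escaped}"{suffix}'
    return f"<{node}>"


def to_ntriples(triples: Iterable[Triple]) -> str:
    """Serialize triples one per line, each terminated by ' .'."""
    return "".join(f"{term(s)} {term(p)} {term(o)} .\n" for s, p, o in triples)


contract = "http://www.praha.eu/contract/7302"
price = contract + "/price"
graph = [
    (contract, DCTERMS + "title", Lit("Playground Revitalization")),
    (contract, PC + "estimatedEndDate", Lit("2011-08-31", XSD + "date")),
    (contract, PC + "agreedPrice", price),
    (contract, PC + "contractingAuthority", "http://www.praha.eu/council"),
    (price, GR + "hasCurrency", Lit("CZK")),
    (price, GR + "hasCurrencyValue", Lit("28444000", XSD + "float")),
]
print(to_ntriples(graph), end="")
```

In practice an RDF library would do the serialization; the point here is only that every statement is a triple and that URIs, not local names, tie the statements together.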
Vocabularies
• Published RDF data would be hard to interpret if each publisher used its own proprietary predicates.
• Therefore, standardized (or at least widely used) predicates should take priority over proprietary ones:
  • e.g., Dublin Core, GoodRelations, FOAF, schema.org, ...
  • or more specific ones for public procurement, e.g., the Public Contracts Ontology (http://purl.org/procurement/public-contracts).
• Predicates are defined in so-called vocabularies (or ontologies).
  • Note: an ontology is a special case of a vocabulary; it contains more detailed reasoning rules, which are out of the scope of this lecture.
  • Note: not only predicates but also classes (= types of things) are defined in vocabularies/ontologies.
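Shared vocabularies pay off because a prefixed name such as `dcterms:title` expands to one global predicate URI, whoever uses it. A minimal sketch of that expansion; the prefix table uses the commonly published namespace URIs, while the second publisher and its "Road Repair" contract are hypothetical.

```python
PREFIXES = {
    "dcterms": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "gr": "http://purl.org/goodrelations/v1#",
    "pc": "http://purl.org/procurement/public-contracts#",
    "schema": "http://schema.org/",
}


def expand(curie: str, prefixes: dict = PREFIXES) -> str:
    """Expand a prefixed name such as 'dcterms:title' into a full predicate URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local


# Two publishers (the second is hypothetical) use the same shared predicate,
# so a consumer finds all titles with one lookup instead of learning two schemas.
publisher_a = [("http://praha.eu/contract/7302", expand("dcterms:title"), "Playground Revitalization")]
publisher_b = [("http://example.org/contract/1", expand("dcterms:title"), "Road Repair")]
titles = [o for s, p, o in publisher_a + publisher_b if p == expand("dcterms:title")]
print(titles)
```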
Linking URIs of Related Things
(Figure: the URIs of the four publishers (praha.eu, mfcr.cz, risy.cz, czso.cz) connected across publishers by links such as a:fundedBy, b:hasBudget, c:hasBeneficiary and d:hasDemography.)
Linking URIs of Related Things
(Figure: as on the previous slide, but the different URIs identifying the city of Prague (http://praha.eu/city, http://risy.cz/location/prague, http://registry.czso.cz/prague, http://czso.cz/prague) are additionally connected with owl:sameAs links, alongside a:fundedBy, b:hasBudget, c:hasBeneficiary and d:hasDemography.)
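Because owl:sameAs is symmetric and transitive, a consumer that collects such links can group all URIs denoting the same thing, for example the several URIs for Prague, with a union-find (disjoint-set) structure. A minimal sketch; the class name is hypothetical, and which exact URI pairs are linked is an assumption, since the slide only shows that owl:sameAs links exist among them.

```python
class SameAsIndex:
    """Groups URIs that owl:sameAs links declare to denote the same thing.

    owl:sameAs is symmetric and transitive, so its closure can be computed
    with a union-find (disjoint-set) structure.
    """

    def __init__(self) -> None:
        self.parent: dict = {}

    def find(self, uri: str) -> str:
        """Return the representative URI of the group containing `uri`."""
        self.parent.setdefault(uri, uri)
        root = uri
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited URI directly at the root.
        while self.parent[uri] != root:
            nxt = self.parent[uri]
            self.parent[uri] = root
            uri = nxt
        return root

    def add_same_as(self, a: str, b: str) -> None:
        """Record the triple <a> owl:sameAs <b>."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def same(self, a: str, b: str) -> bool:
        return self.find(a) == self.find(b)

    def aliases(self, uri: str) -> list:
        """All known URIs that denote the same thing as `uri`, sorted."""
        root = self.find(uri)
        return sorted(u for u in list(self.parent) if self.find(u) == root)


index = SameAsIndex()
# Assumed pairs: the chain links all four Prague URIs into one group.
index.add_same_as("http://praha.eu/city", "http://risy.cz/location/prague")
index.add_same_as("http://risy.cz/location/prague", "http://registry.czso.cz/prague")
index.add_same_as("http://registry.czso.cz/prague", "http://czso.cz/prague")
index.find("http://praha.eu/council")  # a different thing: stays in its own group
print(index.aliases("http://czso.cz/prague"))
```

Once grouped, data published under any of the aliases (budget, demography, EU projects) can be attached to one node for Prague.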
Benefits of Publishing TED as LD
• Problem: It is hard to get a unified view of a chosen thing (e.g., a contracting authority, supplier, contract, contract notice, tender, ...) from TED.
  • The data about the thing is distributed across several contract notices.
• LD solution: Each thing has a unique TED HTTP URI which third-party applications can use to get all TED data about the thing.
  • The data is represented as an RDF graph using openly defined vocabularies shared across developers and communities.
  • The data includes links to the URIs of other things on TED.
  • TED can flexibly and continuously extend the data provided for the thing.
Benefits of Publishing TED as LD
(Figure: a user asks a web application for ?detail=http://ted.eu/contract/CZ/54782145; the application requests http://ted.eu/contract/CZ/54782145 from the TED LD Service, which returns a graph including http://praha.eu/contract/7302, http://praha.eu/contract/7302/price and http://praha.eu/council.)
TED easily assembles data related to the requested contract and returns it as an interconnected graph to the requesting web application.
Benefits of Publishing TED as LD
(Figure: a user clicks a link ?detail=http://ted.eu/org/CZ/00064581 in a web application; the application requests http://ted.eu/org/CZ/00064581 from the TED LD Service, which returns a graph including http://praha.eu/contract/7302, http://praha.eu/contract/7302/price and http://praha.eu/council.)
TED easily assembles data related to the requested authority and returns it as an interconnected graph to the requesting web application.
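The "assemble and return an interconnected graph" step can be pictured as a breadth-first walk from the requested URI along links to other described things. A minimal sketch, not the actual TED service: the function names, the `max_depth` cut-off, the `http://webapp.example/` host and the sample triples are hypothetical; predicates are abbreviated as plain strings, and the `rdfs:seeAlso` link from the TED contract to the Prague contract is an assumption.

```python
from collections import deque
from urllib.parse import parse_qs, urlsplit


def requested_uri(request_url: str) -> str:
    """Extract the thing URI from a request such as '...?detail=<uri>'."""
    return parse_qs(urlsplit(request_url).query)["detail"][0]


def assemble(graph, start, max_depth=2):
    """Collect the triples reachable from `start` by following links breadth-first.

    Objects that are themselves described in `graph` (i.e. appear as subjects)
    are treated as links to follow; everything else is a leaf value.
    """
    by_subject = {}
    for s, p, o in graph:
        by_subject.setdefault(s, []).append((s, p, o))
    result, seen = [], {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        for triple in by_subject.get(node, []):
            result.append(triple)
            obj = triple[2]
            if depth < max_depth and obj in by_subject and obj not in seen:
                seen.add(obj)
                queue.append((obj, depth + 1))
    return result


TED_CONTRACT = "http://ted.eu/contract/CZ/54782145"
graph = [
    (TED_CONTRACT, "rdfs:seeAlso", "http://praha.eu/contract/7302"),
    ("http://praha.eu/contract/7302", "dcterms:title", "Playground Revitalization"),
    ("http://praha.eu/contract/7302", "pc:agreedPrice", "http://praha.eu/contract/7302/price"),
    ("http://praha.eu/contract/7302", "pc:contractingAuthority", "http://praha.eu/council"),
    ("http://praha.eu/contract/7302/price", "gr:hasCurrencyValue", "28444000"),
    ("http://czso.cz/prague", "d:hasDemography", "http://czso.cz/prague/demogstat"),
]
uri = requested_uri("http://webapp.example/?detail=" + TED_CONTRACT)
for triple in assemble(graph, uri):
    print(triple)
```

The unrelated demography triple is not returned, because it is not reachable from the requested contract.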
Problems with HTTP URIs
• Today, public procurement data are collected from contracting authorities in the form of contract notices (calls for tender, contract award notices, etc.).
• Notices usually do not contain explicit identifiers of contracting authorities and suppliers.
  • These organizations are usually identified in the notices only by names and addresses, which are often misspelled or incorrect.
  • Therefore, if we create an HTTP URI for an organization from one notice, it is often very hard to recognize whether an organization in another notice is the same one or not.
• Serious questions therefore arise: what should the HTTP URI of an organization (contracting authority/supplier) look like? How should an organization be identified in a notice so that we are able to recognize it unambiguously?
Problems with HTTP URIs
• There are two possible solutions to this question. Both are very simple from the technical point of view but very complex from the political point of view (enforcement in all EU countries).
• 1st solution:
  • Some countries define unique mandatory identifiers for organizations (both private companies and public institutions).
  • These identifiers should be present in the notices to identify contracting authorities and suppliers.
  • We can then use them to recognize organizations and associate them with the corresponding HTTP URIs.
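With the 1st solution, recognizing an organization no longer depends on how its name is spelled: the identifier alone determines the URI. A minimal sketch assuming the `http://ted.eu/org/<country>/<id>` pattern seen in the TED LD Service example (http://ted.eu/org/CZ/00064581); the function name and the notice records are hypothetical, with notice numbers borrowed from the public-profile example.

```python
from collections import defaultdict
from typing import Optional


def organization_uri(country: str, national_id: Optional[str]) -> Optional[str]:
    """Mint an organization URI from a mandatory national identifier.

    Returns None when the notice carries no identifier, because a name and
    address alone cannot identify the organization reliably.
    """
    if not national_id or not national_id.strip():
        return None
    return f"http://ted.eu/org/{country.strip().upper()}/{national_id.strip()}"


# Hypothetical notices: two spell the authority's name differently but carry the same identifier.
notices = [
    {"notice": "574832", "country": "cz", "name": "Hlavní město Praha", "id": "00064581"},
    {"notice": "575833", "country": "CZ", "name": "Hl. m. Praha", "id": "00064581 "},
    {"notice": "575900", "country": "CZ", "name": "Praha", "id": None},
]

by_org = defaultdict(list)
unresolved = []
for n in notices:
    uri = organization_uri(n["country"], n["id"])
    if uri:
        by_org[uri].append(n["notice"])
    else:
        unresolved.append(n["notice"])

print(dict(by_org))
print(unresolved)
```

The notice without an identifier stays unresolved, which is exactly the situation the slides describe for today's notices.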
Problems with HTTP URIs
• 2nd solution:
  • Each organization involved in public procurement should have its own public profile on the Web with its own HTTP URI.
    • The public profile can be a simple HTML web page which also contains a few data items encoded in RDF (technically, this is very simple).
    • The public profile can be part of the official web site of the organization, e.g., http://praha.eu/public-profile.
    • Or the organization can use a service which manages public web profiles of organizations. Such services already exist, e.g., http://opencorporates.org; it already contains profiles of many organizations, associates them with HTTP URIs and provides basic RDF data about them (name, address, etc.).
  • The HTTP URI of the profile should become part of the notice.
  • This solution also saves time and money, because details about the organization do not have to be repeated in each notice: each notice links to the HTTP URI where the information is available.
  • One may object that a profile shows only current information, which can differ from the information that was valid when earlier notices were published. This is true, but it can be solved technically (e.g., TED and other authorities responsible for collecting public procurement data can archive this information).
Problems with HTTP URIs
(Figure: 2nd solution. Public profiles are published by the organizations themselves, e.g., http://praha.eu/public-profile (Prague) and http://company-a.cz/public-profile (Company A), or by a service such as opencorporates.org, e.g., http://opencorporates.org/company-b/public-profile and http://opencorporates.org/company-c/public-profile. Notices such as http://ted.europa.eu/notice/574832 and http://ted.europa.eu/notice/575833 point to these profiles with pc:contractingAuthority.)
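The objection about profiles showing only current information can be handled by archiving dated copies of each profile and reading the copy that was valid when a notice was published. A minimal sketch of such an archive; the class, the dates and the address values are hypothetical, and only the notice and profile URIs come from the example above.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass
class ProfileArchive:
    """Dated copies of public-profile data, so that an older notice can be
    interpreted with the profile values that were valid when it was published."""
    snapshots: Dict[str, List[Tuple[date, dict]]] = field(default_factory=dict)

    def record(self, profile_uri: str, taken_on: date, data: dict) -> None:
        history = self.snapshots.setdefault(profile_uri, [])
        history.append((taken_on, dict(data)))
        history.sort(key=lambda snapshot: snapshot[0])

    def as_of(self, profile_uri: str, on: date) -> Optional[dict]:
        """Latest snapshot taken on or before `on`, or None if there is none."""
        valid = [data for taken, data in self.snapshots.get(profile_uri, []) if taken <= on]
        return valid[-1] if valid else None


PROFILE = "http://praha.eu/public-profile"
# The notice stores only a link to the profile instead of repeating its details.
notice = {
    "uri": "http://ted.europa.eu/notice/574832",
    "pc:contractingAuthority": PROFILE,
    "published": date(2012, 3, 1),  # hypothetical date
}

archive = ProfileArchive()
archive.record(PROFILE, date(2011, 1, 1), {"address": "old address"})
archive.record(PROFILE, date(2012, 6, 1), {"address": "new address"})
print(archive.as_of(notice["pc:contractingAuthority"], notice["published"]))
```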
Benefits of Publishing TED as LD
• Problem: It is hard to find information related to public contracts, contracting authorities and suppliers which is published outside of TED, somewhere else on the Web, e.g.:
  • data from the post-award phase,
  • public contracts not published on TED,
  • profiles of contracting authorities and suppliers.
• LD solution: TED publishes the basic data infrastructure of HTTP URIs of public contracts, contracting authorities, suppliers, etc.
  • Others can enrich this basic infrastructure with their own data.
  • The enriched TED datasets can be consumed by third-party applications and even by TED itself.
Benefits of Publishing TED as LD
(Figure: in the shared global data space (Web), TED Linked Data provides the basic infrastructure, which is enriched by other publishers, e.g., a publisher of profiles of CZ suppliers and a publisher of post-award data of GE contracts.)
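Because every publisher describes things under the same TED URIs, combining their datasets is just a union of triples; no schema mapping or record matching is needed. A minimal sketch with three tiny datasets standing in for TED, a post-award publisher and a supplier-profile publisher; all URIs marked hypothetical and all `ex:` predicates are made up for illustration.

```python
def merge(*datasets):
    """Union of triple sets; shared URIs make the datasets join automatically."""
    merged = set()
    for dataset in datasets:
        merged |= set(dataset)
    return merged


def describe(graph, uri):
    """All (predicate, object) pairs known about `uri`, sorted."""
    return sorted((p, o) for s, p, o in graph if s == uri)


CONTRACT = "http://ted.eu/contract/DE/0001"   # hypothetical TED contract URI
SUPPLIER = "http://ted.eu/org/CZ/12345678"    # hypothetical TED supplier URI

ted = [(CONTRACT, "ex:awardedTo", SUPPLIER)]
post_award = [(CONTRACT, "ex:finalPaymentDate", "2013-01-31")]
supplier_profiles = [(SUPPLIER, "ex:employees", "42")]

web_of_data = merge(ted, post_award, supplier_profiles)
print(describe(web_of_data, CONTRACT))
```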
Benefits of Publishing TED as LD
(Figure: example applications built on the enriched data: suitable suppliers for a contract, contracts similar to a contract, public spending per inhabitant in 2010, public spending in the Czech Republic, the PC Filing Application and a "HeatMap" application.)
Benefits of Publishing TED as LD
• Problem: Other authorities must copy TED data into their databases if they want to use it (this also includes republishing TED data).
  • The repeated work of building and maintaining such databases is paid from public budgets (!)
• LD solution: Other public authorities link their primary data (represented as Linked Data, not necessarily published) to TED, without the need to copy, integrate and maintain TED data in their databases.
  • Anyone who works with the data of such a public authority can get the related data directly from TED if necessary.
Benefits of Publishing TED as LD
• Our planned experiment in the Czech Republic, in cooperation with the Czech Ministry of Finance (MoF), with data about public contracts:
  • aim: to show that institutions can share data by linking it instead of copying it;
  • example question: public contracts in Prague together with the Prague budget and demography statistics.
(Figure: linked datasets: CZ Public Budgets (MoF), NUTS & LAU CZ regions, CZ Public Contracts, Demography (Czech Statistical Office).)
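A question such as "public spending per inhabitant" then becomes a join of two independently published datasets on a shared region URI rather than a copy-and-integrate job. A minimal sketch; the region URIs, contract URIs, prices and populations are all made up for illustration and are not real Czech figures.

```python
from collections import defaultdict


def spending_per_inhabitant(contracts, population):
    """Sum contract prices per region and divide by that region's population.

    The two datasets are joined only through the shared region URI; regions
    without demography data are skipped.
    """
    totals = defaultdict(float)
    for contract in contracts:
        totals[contract["region"]] += contract["price"]
    return {region: totals[region] / population[region]
            for region in totals if region in population}


REGION_A = "http://regions.example/A"  # hypothetical NUTS/LAU region URIs
REGION_B = "http://regions.example/B"
REGION_C = "http://regions.example/C"

# Made-up figures, for illustration only.
contracts = [
    {"uri": "http://contracts.example/1", "region": REGION_A, "price": 28_444_000},
    {"uri": "http://contracts.example/2", "region": REGION_A, "price": 1_556_000},
    {"uri": "http://contracts.example/3", "region": REGION_B, "price": 5_000_000},
    {"uri": "http://contracts.example/4", "region": REGION_C, "price": 700_000},
]
population = {REGION_A: 1_000_000, REGION_B: 250_000}

print(spending_per_inhabitant(contracts, population))
```

Region C is left out because the demography dataset says nothing about it; with linked data that gap is visible instead of being hidden in a copied table.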
Benefits for Stakeholders
Contracting Authorities and Suppliers
• A unified global data space covering various aspects of public procurement across all EU countries.
• Contracting authorities:
  • can find contracts similar to their own;
  • can group their calls with other authorities to achieve better offers from suppliers;
  • can verify their requirements against the requirements of other buyers to increase the quality and completeness of their requirements and ask for better prices;
  • can search for suitable suppliers who successfully realized similar contracts in the past.
• Suppliers:
  • can get the necessary information about open calls for tenders;
  • can better inform potential customers about their offers;
  • can analyze previous contracts in their market to better target their tenders and improve the quality of the services they offer;
  • can group with other suppliers with complementary offers for joint tendering.
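Finding similar contracts and suitable suppliers needs some similarity measure. One simple option, named here as an assumption and not necessarily what LOD2 uses, is the Jaccard overlap of the contracts' classification codes (such as CPV, the Common Procurement Vocabulary); suppliers of sufficiently similar past contracts are then candidates. The codes, URIs and the 0.5 threshold below are hypothetical.

```python
def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| for two collections of classification codes."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_contracts(target_codes, contracts, threshold=0.5):
    """URIs of contracts whose codes overlap the target's by at least `threshold`, best first."""
    scored = [(jaccard(target_codes, c["codes"]), c["uri"]) for c in contracts]
    return [uri for score, uri in sorted(scored, reverse=True) if score >= threshold]


def suitable_suppliers(target_codes, contracts, threshold=0.5):
    """Suppliers who realized at least one similar contract."""
    similar = set(similar_contracts(target_codes, contracts, threshold))
    return sorted({c["supplier"] for c in contracts if c["uri"] in similar})


# Hypothetical past contracts; the codes stand in for CPV codes.
past = [
    {"uri": "ex:contract-1", "codes": {"C1", "C2"}, "supplier": "ex:supplier-1"},
    {"uri": "ex:contract-2", "codes": {"C1", "C3"}, "supplier": "ex:supplier-2"},
    {"uri": "ex:contract-3", "codes": {"C1", "C2", "C4"}, "supplier": "ex:supplier-3"},
]
print(similar_contracts({"C1", "C2"}, past))
print(suitable_suppliers({"C1", "C2"}, past))
```

A real filing application would likely also weigh price, region and date; the set overlap just shows how shared code lists make contracts from different publishers comparable.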
Benefits for Stakeholders
EU and Citizens
• The EU saves money:
  • Only the basic infrastructure is built and the primary data is published.
    • Related data is published and linked by third parties.
  • There is no need to build and pay for complex applications and services.
    • These will be built by third parties, not only for citizens but also for contracting authorities and suppliers, solely on the basis of demand.
  • There is no need to duplicate data in different public administration services and applications.
    • Data is linked instead of copied.
• The EU supports building a common market and interoperability (ISA).
• The EU supports transparency.
• Citizens can more easily monitor what public administrations in their city/country buy, from whom and for how much.
  • They can also more easily compare the purchases of their city/country with those of other cities/countries.
Linked Data for TED – What needs to be done to adopt LD principles?
Public Procurement and LOD2 Project
• Vocabulary for publishing public contracts as Linked Data
  • a combination of existing broadly adopted vocabularies and their extension for public procurement (GoodRelations, Payments Ontology, schema.org, Dublin Core, SKOS)
• Public Contracts filing application
  • a web application for contracting authorities and suppliers
  • It enables publishing data about public contracts as Linked Data.
  • Contracting authorities can search for similar contracts and suitable suppliers.
• Experimental Linked Data from the Czech Republic, Great Britain and TED
Experimental Linked Data from the Czech Republic, Great Britain and TED
• created as part of the LOD2 project
(Figure: interlinked datasets: CZ Public Contracts, Common Procurement Vocabulary, CZ Business Entities, CZ Demography Stats, CZ Public Budgets, DBpedia, TED Public Contracts and Organizations, SDMX, CZ LAU Regions, NUTS Regions (RAMON), GB Public Contracts and Organizations, Products Ontology.)