Where does EU money go? Availability and quality of Open Data on the recipients of EU Structural Funds
1 2nd International EIBURS-TAIPS conference on: “Innovation in the public sector and the development of e-services”
Where does EU money go? Availability and quality of Open Data on the recipients of EU Structural Funds
Marco Biagetti, Luigi Reggi
EIBURS-TAIPS team and Italian Ministry of Economic Development *
email@example.com
University of Urbino, April 18th, 2013
* The views expressed here are those of the authors and, in particular, do not necessarily reflect those of the Ministry of Economic Development
2 Outline
• Open Government Data and the development of public eServices
• Open Data on EU Regional Policy
• Relevant literature and research objectives
• Methodology and results
• Data collection
• Nonlinear PCA & cluster analysis: identifying Open Data strategies
• mlogit and logit models: the determinants of strategic choices
• Conclusions
3 Open Govn’t Data and public eServices provision
Increased openness of government datasets is emerging as a desirable feature across Europe (Davies, 2010). Open data is seen as having significant economic potential, generating user-driven innovation (Von Hippel, 2005) based on the availability of previously restricted information and the creation of new firms. This can lead to the creation of new public eServices that are both effective (user-centred) and efficient (harnessing capacity and knowledge outside government).
In particular, Open Government Data (OGD):
(a) fosters transparency and accountability of policy choices;
(b) enables the creation of new public eServices by government, civil society and individual citizens;
(c) increases collaboration across government bodies and with citizens and enterprises;
(d) enables substantial improvements in the quality of policy making, in terms, e.g., of quality of spending and public value delivered;
(e) may contribute to the creation of social capital by enhancing information flows to and from the citizen (e.g. participation in public debates, crowdsourcing of relevant information).
4 Open Government Data Definition: The 8 Principles
1. Data Must Be Complete
All public data are made available. Data are electronically stored information or recordings, including but not limited to documents, databases, transcripts, and audio/visual recordings. Public data are data that are not subject to valid privacy, security or privilege limitations, as governed by other statutes.
2. Data Must Be Primary
Data are published as collected at the source, with the finest possible level of granularity, not in aggregate or modified forms.
3. Data Must Be Timely
Data are made available as quickly as necessary to preserve the value of the data.
4. Data Must Be Accessible
Data are available to the widest range of users for the widest range of purposes.
5. Data Must Be Machine processable
Data are reasonably structured to allow automated processing of it.
6. Access Must Be Non-Discriminatory
Data are available to anyone, with no requirement of registration.
7. Data Formats Must Be Non-Proprietary
Data are available in a format over which no entity has exclusive control.
8. Data Must Be License-free
Data are not subject to any copyright, patent, trademark or trade secret regulation. Reasonable privacy, security and privilege restrictions may be allowed as governed by other statutes.
5 EU Open Data policy
E-government action plan 2011-2015
• Improvement of Transparency
• Access to information on government laws and regulations, policies and finance
• Re-use of Public Sector Information
The Digital Agenda for Europe: “Turning government data into gold”
Re-use of Public Sector Information Directive (2003)
A common legislative framework regulating how public sector bodies should make their information available for re-use in order to remove barriers such as discriminatory practices, monopoly markets and a lack of transparency.
In December 2011, the Commission presented an Open Data Package:
1. A Communication on Open Data
2. A proposal for a revision of the Directive, which aims at opening up the market for services based on public-sector information, by
• including new bodies in the scope of application of the Directive, such as libraries (including university libraries), museums and archives;
• limiting, as a rule, the fees that public authorities can charge to marginal costs;
• introducing independent oversight over re-use rules in the Member States;
• making machine-readable formats for information held by public authorities the norm.
3. New Commission rules on re-use of the documents it holds
6 Relevant literature on open data policy
Current emerging practice focuses on the publication of open government data in machine-readable format, possibly through open standards, so that the data can be easily re-used by citizens, enterprises and civil society. How to measure this effort?
Open data and the “invisible hand” (Brito, 2007; Robinson et al., 2009)
• A first stream of literature focuses on the “invisible hand” of private sector or civil society organizations, which are able to reuse PSI and to mash up this information with other sources to create new innovative services
• Government should only publish data in open, machine-readable formats
Public Value & Data divide (Dawes and Helbig, 2010; Gurstein, 2011; Harrison et al., 2011)
• Other scholars think that government should consider different users’ needs (public value) and also provide easy-to-access data in processed form (data divide)
7 Relevant literature on open data policy
Theoretical framework. Source: Dawes (2010)
Two complementary principles that need to be balanced:
Stewardship
1. Metadata provision
2. Data management
3. Data standards and formats
4. Information quality and classification
Usefulness
1. Easy-to-use basic features
2. Searching and display
3. Use social media to enhance description and use
Examples of Stewardship (STEW) and Usefulness (USEF) variables: most voted proposals from the “Evolving Data.gov with You” online dialogue (as of April 21, 2010)
8 Research objectives
• To explore the information-based strategies that European public agencies are pursuing when publishing their data on the web
• To analyze the evolution of such strategies from 2010 to 2012
9 Open Government Data and EU Regional Policy
EU Cohesion Policy represents an ideal opportunity for measuring the levels of transparency, trustworthiness and interactivity of available open government data
• Beneficiaries of public funding are widely recognized as the open data #1 priority (Osimo, 2008)
• Cohesion Policy is the second item of the EU budget: 347 billion Euros for the 2007-13 period. The purpose of cohesion policy is to reduce disparities between the levels of development of the EU’s various regions.
• Transparency of EU Structural Funds has been questioned
• On the one hand, all Member States and EU regions are involved and share common rules and regulations, which makes data perfectly comparable.
• On the other hand, the regulations focus only on a minimum set of requirements for publishing data on the web, which leaves room for improvement in terms of detail, quality, access and visualization.
“the managing authority shall be responsible for organising the publication, electronically or otherwise, of 1. the names of the beneficiaries, 2. the names of the operations and 3. the amount of public funding allocated to the operations”
Structural Funds Regulation 2007-13, Art. 7 Reg. 1828, 8 Dec 2006
10 Open Government Data and EU Regional Policy
The new regulations for the 2014-2020 programming period, currently under negotiation, stress the need for more transparency and openness.
Art. 105 General Regulation (EC proposal)
• Machine-readable format: CSV, XML
• Single national website or portal
Mandatory data fields now include:
• Beneficiary name (only legal entities; no natural persons shall be named);
• Operation name; Operation summary;
• Operation start date & Operation end date (expected date for physical completion or full implementation of the operation);
• Total eligible expenditure allocated to the operation;
• EU co-financing rate (as per priority axis);
• Operation postcode;
• Name of category of intervention for the operation;
• Date of last update of the list of operations.
• The headings of the data fields and the names of the operations shall also be provided in at least one other official language of the European Union.
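To make the mandatory fields concrete, the following is a minimal sketch of how a list of operations could be serialised as CSV. The column identifiers and the example record are assumptions made for illustration only; the proposed regulation lists the fields but does not prescribe these names.

```python
import csv
import io

# Hypothetical column names derived from the mandatory fields listed above;
# the proposed regulation does not prescribe these exact identifiers.
FIELDS = [
    "beneficiary_name",
    "operation_name",
    "operation_summary",
    "operation_start_date",
    "operation_end_date",
    "total_eligible_expenditure",
    "eu_cofinancing_rate",
    "operation_postcode",
    "category_of_intervention",
    "last_update",
]


def write_operations(rows):
    """Serialise a list of operation dicts to CSV text with a fixed header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented record, for illustration only (not real beneficiary data).
example = {
    "beneficiary_name": "Example Municipality",
    "operation_name": "Broadband for rural schools",
    "operation_summary": "Fibre connections for 12 schools",
    "operation_start_date": "2014-03-01",
    "operation_end_date": "2015-09-30",
    "total_eligible_expenditure": "1250000.00",
    "eu_cofinancing_rate": "0.50",
    "operation_postcode": "00100",
    "category_of_intervention": "ICT infrastructure",
    "last_update": "2014-06-30",
}

csv_text = write_operations([example])
print(csv_text)
```

A fixed header row of this kind is what makes the list machine-processable: a re-user can load it with any CSV reader without scraping a PDF.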
11 Empirical method
Web-based analysis of the lists of beneficiaries of 434 EU27 Operational Programmes co-funded by Structural Funds
Empirical analysis:
1. Aggregating the 33 initial variables
2. Nonlinear Principal Component Analysis: reducing 33 variables to 2 main dimensions
3. Identifying and analysing the evolution of open data strategies from 2010 to 2012
4. Exploring the determinants of the different strategies
12 Data collection
An ad-hoc web-based survey was carried out on the universe of all EU OPs co-funded by the European Regional Development Fund (ERDF) and the European Social Fund (ESF), aiming to ascertain the presence or absence of 33 specific quality features
• All EU Countries and Regions included
• 434 Operational Programmes reviewed [European Commission - DG Regional Policy database]
• Starting point: EC DG Regional Policy and DG Employment dedicated portals
• Three waves: Oct 2010, Oct 2011, Oct 2012
The methodology stems from the following studies and guidelines:
• Technopolis Group: Study on the quality of websites containing lists of beneficiaries of EU Structural Funds (2010)
• UK Central Office of Information: Underlying data publication: guidance for public sector communicators, website managers and policy teams (2010)
• Open Government Working Group: 8 Principles of Open Government Data (2007)
• Open Knowledge Foundation: The Open Data Manual, http://opendatamanual.org
• W3C: Improving Access to Government through Better Use of the Web (2009)
• Preliminary survey on prevailing characteristics (August-Sept 2010)
13 From 33 basic dichotomous variables to 8 indices
For each of the categories composing Stewardship and Usefulness in terms of access to and dissemination of data on Structural Funds’ beneficiaries, the results attained by EU Operational Programmes are summarised by a simple index, expressed as a percentage: the number of characteristics already active divided by the total number of characteristics that could theoretically be activated.
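The index described above can be sketched in a few lines. This is an illustrative reading of the definition, not the authors' code; the function name and the example values are assumptions.

```python
def quality_index(features):
    """Percentage of dichotomous features that are active.

    `features` is a list of booleans, one per characteristic of a category
    (e.g. the four financial-data items of one Operational Programme).
    """
    if not features:
        raise ValueError("a category needs at least one feature")
    active = sum(1 for present in features if present)
    return 100.0 * active / len(features)


# Hypothetical OP publishing three of four financial-data items.
fin_index = quality_index([True, True, True, False])
print(fin_index)
```

Because every underlying variable is a yes/no observation, each index can only take a handful of values (multiples of 1/k for a category with k items), which is why the resulting variables are metric but not continuous.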
14 From 33 basic dichotomous variables to 8 indices
STEWARDSHIP VARIABLES (aggregated variable: underlying variables)
• Content (CONT): Final Beneficiary; Project; Axis; Specific/Operat. Objectives; Intervention Line; Project description; Award and payment dates; Project start/end dates; Status (active/completed)
• Financial Data (FIN): Financial value allocated to the project; Payments; EU co-financing; National co-financing (or other)
• PDF: Format = PDF
• HTML: Format = HTML
• XLSCSV: Format = XLS or CSV
• Information Quality (QUAL): Last update date; Update frequency; Data description; Fields description in another language; Number of clicks from home page < 3; robots.txt does not prevent search engine search
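One of the Information Quality items, whether robots.txt prevents search engines from reaching the list, can be checked programmatically. The sketch below uses Python's standard `urllib.robotparser` on robots.txt text supplied as a string; the URLs and rules are invented, and this is only one possible way of operationalising the variable, not the procedure used in the survey.

```python
from urllib.robotparser import RobotFileParser


def crawlable(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text lets `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Invented robots.txt content and URLs, for illustration only.
robots = "User-agent: *\nDisallow: /private/\n"
open_list = crawlable(robots, "http://example.org/data/beneficiaries.csv")
hidden_list = crawlable(robots, "http://example.org/private/beneficiaries.csv")
print(open_list, hidden_list)
```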
15 From 33 basic dichotomous variables to 8 indices
USEFULNESS VARIABLES (aggregated variable: underlying variables)
• DB consultation through masks (RIC): Search by Fund type; Search by Project; Search by OP; Search by Axis/Object./Action; Search by Beneficiary; Search by Resources; Search by Territory/Area; Search by Project status
• Advanced Functions (GEO): Georeferencing through maps; Visualisation through graphs and other elaborations; Data with sub-regional detail
16 Descriptive stats
All variables have increased during the short period of time considered, except (of course) PDF
17 Dimension reduction: Nonlinear PCA
The eight constructed variables are categorical and metric but in no way continuous.
We want to reduce the number of dimensions to a few artificial summary dimensions while still preserving the basic (bi)linearity of a traditional multivariate technique such as Principal Component Analysis.
Bilinearity means that the data matrix is approximated by inner products of scores and loadings.
WE ALSO WANT TO ALLOW FOR POSSIBLE NONLINEAR TRANSFORMATIONS OF THE VARIABLES => We use NONLINEAR PCA (NLPCA)
Indeed, NLPCA should be used whenever there are rank orders expressed as numerical values but the possibility that nonlinear transformations fit the bilinear model better cannot be discarded. In other cases NLPCA can be performed together with Multiple Correspondence Analysis (De Leeuw, 2005).
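The bilinear approximation mentioned above can be illustrated with a small pure-Python sketch: each cell of the data matrix is approximated by the inner product of a row of scores and a row of loadings. The function names and the toy numbers are assumptions; no fitting is performed here, the sketch only evaluates a given solution.

```python
def bilinear_approximation(scores, loadings):
    """Return x_hat[i][j] = sum over s of scores[i][s] * loadings[j][s]."""
    return [
        [sum(a * b for a, b in zip(score_row, loading_row)) for loading_row in loadings]
        for score_row in scores
    ]


def least_squares_loss(data, scores, loadings):
    """Sum of squared differences between the data and its bilinear approximation."""
    approx = bilinear_approximation(scores, loadings)
    return sum(
        (x - x_hat) ** 2
        for row, approx_row in zip(data, approx)
        for x, x_hat in zip(row, approx_row)
    )


# Toy example: 2 observations, 2 variables, 1 dimension.
scores = [[1.0], [2.0]]      # one score per observation
loadings = [[1.0], [3.0]]    # one loading per variable
data = [[1.0, 3.0], [2.0, 7.0]]
print(bilinear_approximation(scores, loadings))
print(least_squares_loss(data, scores, loadings))
```

PCA searches for the scores and loadings that make this loss as small as possible for a chosen number of dimensions.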
18 Dimension reduction: Nonlinear PCA
In other words, to assess the fit of, say, p dimensions we do not want to minimize the loss only over scores and loadings, as is done in PCA, but also over the admissible transformations of the columns of X (our data matrix).
• Least squares loss function of PCA, to be minimized, where a = component scores and b = component loadings
• Least squares loss function of NLPCA, to be minimized, where a and b are the same as above
• Admissible transformations of variable j. NLPCA of this kind has been proposed for monotone transformations by Lingoes & Guttman (1968) and Kruskal & Shepard (1974). Young et al. (1978) and Gifi (1990) extended NLPCA to wider classes of admissible transformations beyond monotone ones
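The slide's formulas did not survive extraction. As a hedged reconstruction, the two loss functions can be written as follows, using the notation of De Leeuw (2005); the symbols n (observations), m (variables), p (dimensions), K_j (the set of admissible transformations of variable j) and S (a normalisation that rules out the trivial all-zero solution) are assumptions taken from that reference rather than from the slide itself.

```latex
% PCA: minimise over component scores a_{is} and loadings b_{js}
\sigma(A, B) = \sum_{i=1}^{n} \sum_{j=1}^{m}
  \Bigl( x_{ij} - \sum_{s=1}^{p} a_{is} b_{js} \Bigr)^{2}

% NLPCA: the same loss, minimised additionally over the admissible
% transformations of each column x_j of the data matrix X
\sigma(A, B, X) = \sum_{i=1}^{n} \sum_{j=1}^{m}
  \Bigl( x_{ij} - \sum_{s=1}^{p} a_{is} b_{js} \Bigr)^{2},
  \qquad x_{j} \in K_{j} \cap S
```

The only difference between the two problems is the extra constraint set: in PCA the columns of X are fixed, while in NLPCA each column may be replaced by any admissible (for example monotone) transformation of the observed values.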
19 Identifying EU regional open data strategies
The following figures help us analyze graphically the first two underlying dimensions of the 8 indices (variables) considered together.
We plot the coordinates of the variables’ loadings (black arrows), which are very important for analyzing the relations between the variables, and the coordinates of each observation (small blue circles), that is, each Operational Programme (OP) considered.
Fewer than 434 points are shown because OPs that share a common portal have the same coordinates.
We are looking for meaningful clusters of variables (loadings) that are consistent with the current literature on open data strategies
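20 Identifying EU regional open data strategies
[Figure: NLPCA plots of loadings and OPs for 2010, 2011 and 2012. Variance accounted for by dimension 1 and dimension 2: 35% and 23% (2010), 38% and 21% (2011), 47% and 13% (2012)]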
21 Identifying EU regional open data strategies: 2010 & 2011
The first dimension (variance accounted for = 35 to 38%) helps differentiate a “regulation-centred” approach from a proactive strategy
The second dimension (variance accounted for = 23 to 21%) is useful to distinguish between the stewardship and the usefulness approach
3 different strategies
1. where DIM1 > 0 & DIM2 > 0
STEWARDSHIP STRATEGY (STEW): it implies the release of high-quality data in machine-readable format
2. where DIM1 > 0 & DIM2 < 0
USEFULNESS STRATEGY (USEF): focused on data visualization and interactive search in order to include non-technically oriented citizens in open data re-use and understanding
3. where DIM1 < 0
REGULATION-CENTRED STRATEGY (PDF): this strategy is about NOT being open. Little detail, little quality, PDF format prevailing
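The sign rule above translates directly into a small classifier. This is a minimal sketch; the function name is hypothetical, and returning None for scores lying exactly on an axis is an assumption, since the rule as stated does not cover those boundary cases.

```python
def classify_strategy(dim1, dim2):
    """Assign an OP to a 2010/2011 strategy from its first two NLPCA scores."""
    if dim1 > 0 and dim2 > 0:
        return "STEW"   # stewardship: high-quality, machine-readable data
    if dim1 > 0 and dim2 < 0:
        return "USEF"   # usefulness: visualisation and interactive search
    if dim1 < 0:
        return "PDF"    # regulation-centred: minimal, PDF-based publication
    return None         # boundary case not covered by the stated rule


# Invented object scores, for illustration only.
for scores in [(0.8, 0.4), (0.6, -0.9), (-1.2, 0.3)]:
    print(scores, classify_strategy(*scores))
```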
22 Identifying EU regional open data strategies: 2012
The first dimension (variance accounted for increases to 47%) helps differentiate a “regulation-centred” approach from a proactive strategy
The second dimension accounts for a much smaller share of total variance (13%, while the third and fourth dimensions account for 12% and 11% respectively) and is hardly interpretable.
Some variables that previously belonged to alternative proactive strategies are now highly correlated.
For example, in 2010 a machine-readable format was associated with highly detailed financial data on project implementation or with proper metadata and project descriptions, while the presence of a map or of advanced search capabilities was likely where data were presented directly in an HTML page. Now the two formats are highly correlated.
So we take into account only the first dimension to interpret the results.
We can identify only two alternative strategies, based on the 1st DIM:
1. where DIM1 > 0
MIXED PROACTIVE STRATEGY
2. where DIM1 < 0
REGULATION-CENTRED STRATEGY (PDF)
23 Strategies identified: descriptive tables by year
No. of OPs by strategy adopted

Strategy                     2010         2011         2012
                             n     %      n     %      n     %
Regulation-centred [PDF]     255   59     235   54     233   54
Usefulness                   106   24     120   28     -     -
Stewardship                  73    17     79    18     -     -
Mixed proactive              -     -      -     -      201   46
Total                        434   100    434   100    434   100
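As a quick consistency check, the percentages in the table can be recomputed from the counts. The sketch below only re-does the arithmetic on the figures reported above; the dictionary layout and function name are illustrative choices.

```python
# Counts of Operational Programmes by strategy, copied from the table above.
COUNTS = {
    2010: {"PDF": 255, "Usefulness": 106, "Stewardship": 73},
    2011: {"PDF": 235, "Usefulness": 120, "Stewardship": 79},
    2012: {"PDF": 233, "Mixed proactive": 201},
}


def shares(counts):
    """Return each strategy's share of the yearly total as a rounded percentage."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


for year, counts in COUNTS.items():
    print(year, sum(counts.values()), shares(counts))
```

Each year's counts sum to the 434 OPs surveyed, and the rounded shares match the percentages reported in the table.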
24 How do they evolve over time? Transition matrices
The majority of PDF-centered OPs confirm their strategy. PDFs and “closed data” are die-hard features of EU OPs!
However, from 2010 to 2012, the number of OPs adopting the “regulation-centered” strategy (PDF) decreases slightly. From 2010 to 2011, most of the OPs that left it switched to the Usefulness strategy (17.5% of OPs adopting the Usefulness strategy in 2011 had chosen the PDF strategy back in 2010).
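A transition matrix of this kind can be built by cross-tabulating each OP's strategy in two consecutive waves. The following is a minimal sketch on invented labels, not the survey data; the function name is hypothetical. It returns row shares (of the OPs in strategy r before, the fraction found in strategy c after).

```python
from collections import Counter


def transition_matrix(before, after):
    """Row-normalised transition shares between two waves.

    `before` and `after` are equally long lists of strategy labels,
    one entry per Operational Programme, in the same order.
    """
    pair_counts = Counter(zip(before, after))
    row_totals = Counter(before)
    return {
        (origin, destination): count / row_totals[origin]
        for (origin, destination), count in pair_counts.items()
    }


# Invented labels for five OPs, for illustration only.
wave_1 = ["PDF", "PDF", "PDF", "PDF", "USEF"]
wave_2 = ["PDF", "PDF", "PDF", "USEF", "USEF"]
matrix = transition_matrix(wave_1, wave_2)
print(matrix)
```

Note that the 17.5% quoted above is a column share (the fraction of 2011 Usefulness OPs that came from PDF); under this sketch it would be obtained by swapping the two arguments, i.e. `transition_matrix(wave_2, wave_1)`, and reading the key `("USEF", "PDF")`.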
28 Explaining strategies: the independent variables
What are the determinants of the strategic choices made by EU public authorities?
We employ the following variables as regressors:
1) centralization = presence of a centralized national website or portal, i.e. one site for all OPs active in the Country (it changes over the 3 years: no = 0 goes from 234 to 225 and, conversely, yes = 1 goes from 225 to 234)
2) fund = European Regional Development Fund (ERDF) or European Social Fund (ESF) (317 ERDF and 117 ESF)
3) financial endowment = total financial resources allocated to the OP (the only continuous independent variable)
4) objective = 1 for Convergence objective, 2 for Competitiveness and Employment objective, 3 for Cooperation objective, U for OPs that belong to both the Convergence and Competitiveness objectives (161 OPs for 1, 173 for 2, 71 for 3, 29 for U)
5) naz_reg = territorial scope of the OP (71 cb = cross-border, 12 m = multiregional, 92 n = national, 258 r = regional)
6) new_entries = YES if new Member State, NO if EU15 (71 missing = cross-border OPs, which have no nationality; 268 from old Member States; 95 from new Member States)
29 Explaining strategies: the technique
The cluster analysis showed that 3 strategies are present in 2010 and 2011. In 2012 the story is quite different: there are only 2 strategies.
Furthermore, the variables used hardly change over the years. That is why nonlinear panel data techniques are not very informative in our case.
WE PREFER TO USE MULTINOMIAL LOGIT (ML) FOR THE FIRST TWO YEARS AND LOGIT (L) FOR THE LAST ONE TO CHECK HOW THE INDEPENDENT VARIABLES SHAPE THE PROBABILITY OF CHOOSING A STRATEGY.
ML => 3 STRATEGIES; L => 2 STRATEGIES
Two specifications are proposed: Model A with all of the OPs; Model B with Convergence and Competitiveness OPs but without Cross-border OPs. Model B allows us to add the variable “new entry”, which cannot be attributed to Cross-border OPs.
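To show how a multinomial logit turns regressors into strategy probabilities, here is a minimal sketch of the probability formula with PDF as the base category. The linear-predictor values are invented for illustration and are not the estimated coefficients; the function name is hypothetical.

```python
import math


def mlogit_probabilities(linear_predictors, base="PDF"):
    """Multinomial logit probabilities with a base category.

    `linear_predictors` maps each non-base outcome to its linear predictor
    (x'beta for that outcome); the base outcome has predictor 0 by construction.
    """
    scores = {base: 0.0, **linear_predictors}
    shift = max(scores.values())            # subtract the max for numerical stability
    exp_scores = {k: math.exp(v - shift) for k, v in scores.items()}
    total = sum(exp_scores.values())
    return {k: v / total for k, v in exp_scores.items()}


# Invented linear predictors for one OP (not estimates from the models above).
probs_3 = mlogit_probabilities({"Usefulness": 0.4, "Stewardship": -0.6})
print(probs_3)

# With a single non-base outcome the formula reduces to the binary logit used for 2012.
probs_2 = mlogit_probabilities({"Proactive": 1.0})
print(probs_2)
```

A positive coefficient on a regressor raises the linear predictor of that outcome and therefore its probability relative to the base category, which is how the results on the following slides should be read.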
31 Explaining strategies: basic results - 2010
Centralization positively affects both proactive strategies in both specifications. So does being a new Member State in model B.
ESF has a negative effect on proactive strategies in model A.
Financial endowment has a positive effect on proactive strategies, except for stewardship in model B. So do objective 2 programmes, except for stewardship in model A.
Multiregional programmes have a positive effect on proactive strategies only in model B.
Regional programmes negatively affect the shift from PDF to usefulness in model A and positively affect the shift from PDF to stewardship in model B (so do national programmes, as far as model A is concerned).
32 Explaining strategies: results (from PDF to other) 2010
[Table: estimation results for models A and B]
• These categories are important, as the LR test shows, confirming the Pseudo R2 when the variable new entry has been taken out
• This means that model B is better specified, even though we lose the CB OPs there
33 Explaining strategies: some predicted probs 2010
In model B, if an OP were centralized there would be a 42% probability of adopting the PDF strategy, a 44% probability of adopting the usefulness strategy and a 14% probability of adopting the stewardship strategy. But if centralization were adopted by a new Member State, the probability of the PDF strategy would decrease to 5%, that of usefulness would go down to 15% and that of stewardship would increase to 80%!
In model A, if an OP were centralized there would be a 32% probability of adopting the PDF strategy, a 41% probability of adopting the usefulness strategy and a 27% probability of adopting the stewardship strategy.
34 Explaining strategies: results (from PDF to others) 2011
[Table: estimation results for the Usefulness and Stewardship outcomes; base category = PDF]
Base categories: centralization = 0, fund = ERDF, objective = 1, naz_reg (model A) = cb, naz_reg (model B) = n, new_entries (model B) = 0
35 Explaining strategies: basic results - 2011
The specification of the model loses explanatory power in 2011 (Pseudo R2 decreases for both specifications).
Even centralization, though strongly and positively correlated with the probability of adopting proactive strategies, is a bit less so for the shift from PDF to stewardship in model B. New membership continues to matter a lot.
National, regional and multiregional programmes remain not very informative in model A for the shift to stewardship, while national and regional ones negatively affect the path from PDF to usefulness.
By contrast, in model B multiregional OPs are positively correlated with the shifts towards proactive strategies. Again, model B should be preferred even though an analysis of CB OPs cannot be performed (CB OPs by definition lack the membership variable).
36 Explaining strategies: results (from PDF to other) 2011
[Table: estimation results for models A and B]
• Not much changes in 2011, except for a decrease in the strong significance of objective 2, multiregional and regional programmes
• Again, model B has fewer observations but shows better specification performance
37 Explaining strategies: some predicted probs 2011
In model B, if an OP were centralized the probabilities would not change much with respect to 2010. But if centralization were carried out by a new Member State, the probability of adopting a passive (PDF) strategy would be 6%, that of usefulness 19% and that of stewardship 75%.
In model A, if an OP were centralized there would be a 31% probability of adopting the PDF strategy, a 43% probability of adopting the usefulness strategy and a 26% probability of adopting the stewardship strategy (they hardly change).
38 Explaining strategies: results (from PDF to proactive) 2012
[Table: estimation results for the Proactive strategy outcome; base category = PDF (recall that this is a binary logit)]
Base categories: centralization = 0, fund = ERDF, objective = 1, naz_reg (model A) = cb, naz_reg (model B) = n, new_entries (model B) = 0
39 Explaining strategies: basic results - 2012
Centralization and new membership are confirmed as the most important determinants of the mixed strategy as well.
ESF affects the proactive strategy negatively, more so in model A than in model B, while financial endowment affects it positively, again more so in model A than in model B.
Objective 2 programmes have a positive effect in the better specified model B, while objective U programmes have a negative effect on proactive strategies in model A.
Multiregional programmes have a positive effect in model B, while regional programmes have a negative effect on proactive strategies in model A.
40 Explaining strategies: some predicted probs 2012
In model A, if an OP were centralized it would have a 69% probability of adopting a proactive mixed strategy. In model B this probability would be 62%, but it would increase to 92% (!) if centralization were adopted by a new Member State.
TO SUM UP: LENGTH OF EU MEMBERSHIP (NEW VS. OLD MEMBER STATES) AND CENTRALIZATION ARE FOUND TO BE THE MOST IMPORTANT DETERMINANTS OF THE ADOPTION OF PROACTIVE STRATEGIES
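The 2012 figures quoted above are predicted probabilities from a binary logit, which are not the same thing as odds. The short sketch below shows the relation between the two for the reported values; the function names are illustrative, and no model coefficients are assumed.

```python
import math


def odds(probability):
    """Odds in favour of an outcome with the given probability."""
    return probability / (1.0 - probability)


def logistic(log_odds):
    """Inverse of the logit: map log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Probabilities of a proactive strategy as reported on the slide.
for label, p in [("model A", 0.69), ("model B", 0.62), ("model B, new Member State", 0.92)]:
    print(label, p, round(odds(p), 2))
```

A logit coefficient shifts the log-odds additively, so a large jump in probability (such as from 62% to 92%) corresponds to an even larger multiplicative change in the odds.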
41 Conclusions
1. There is still a long way to go to ensure that data on EU Regional Policy are truly transparent and re-usable for the creation of new public eServices.
A nonlinear multivariate analysis of 8 indices on the openness and transparency of 434 Operational Programmes in Europe shows that a strategy that we called “Regulation-centered” (PDF) is prevailing (54% of total OPs adopted it in October 2012). This strategy implies little information detail, difficult accessibility and non-machine-readable formats. Available information is limited to basic information on projects, funding and beneficiaries.
2. In 2010 and 2011 we can also identify 2 different proactive strategies:
a. a first strategy focuses on the characteristics of data quality and reusability (content, financial data, downloadable XLS format, ease of search, update and description), which then appear strongly inter-connected. This strategy is therefore consistent with the Stewardship principle developed in the literature by Dawes (2010).
b. a second strategy focuses on the characteristics that enable users to access data published on administrations’ websites more effectively. The variables characterising this cluster are: presence of a search mask, data geo-referencing, and use of "pop-up" or other HTML views to display data detail on projects and beneficiaries. This strategy is consistent with the Usefulness principle.
42 Conclusions
3. From October 2010 to October 2012 the strategies have evolved, leaving room for more speculation about what kind of supply of policy data we can expect in the future.
More precisely, the data suggest that the two proactive strategies have become one. In fact, it is impossible to clearly distinguish a strategy based on re-usable formats and detailed information from a strategy focused on letting users browse through data and diagrams.
For example, some national or regional portals now let users both download the data in bulk and surf through the data right on the website. Obviously, this is good news for researchers, data journalists and ordinary citizens. Data providers seem to be more aware that the usefulness and stewardship principles are complementary.
4. The characteristic of the OPs that most influences the choice of a pro-active strategy is the presence of a centralized, national portal containing all data from the OPs managed within the Country. This is consistent with the provisions of the proposed new 2014-2020 General Regulation of Structural Funds.
New EU Member States tend to be more open and transparent in managing EU funds. This choice could be explained by the greater influence that the EU Commission can exert on local Managing Authorities.
Conclusions: the path to a balanced approach
[Diagram: three approaches positioned along the Stewardship and Usefulness axes]
• Closed data: regulation centered
• Data quality approach (Stewardship): re-user centered, FOCUSED ON raw data, advanced users, mash-up apps
• Data visualization approach (Usefulness): user centered, FOCUSED ON processed data, non-technically-oriented citizens
• Target: open, hi-quality, useful and accessible data