An introduction to open data: context and practice
Fco open data in half day th-v2

Annotated slides on Open Data for Transparency

Published in: Education

  1. 1. An introduction to open data: context and practice
  2. 2. This presentation brought to you by: Open Knowledge
  3. 3. Open Knowledge is an international advocacy network promoting open data and open knowledge.
  4. 4. Open Knowledge develops the CKAN open data portal that is used by national, regional and local governments worldwide for publishing open data. CKAN is used to deliver the UK’s data.gov.uk website.
  5. 5. Tony Hirst is a part-time trainer for Open Knowledge and a Senior Lecturer at the Open University. He promotes the use of open data in education, such as the OU’s “Learn to Code for Data Analysis” free online FutureLearn course, which uses open data from the World Bank and UN Comtrade (import/export data) as part of the practical exercises used in the course.
  6. 6. Open data may be contextualised by the wider notion of an open system. Open systems accept inputs from, and produce outputs into, an external environment, and may also observe and learn from processes happening in that external environment. Their internal processes may also be observable from (transparent to observers in) the external environment.
  7. 7. The FCO Open Data Strategy 2012 provides an overview of the role of information within the department. The base principles (info a valued and managed asset) are realised by information that is fit for purpose, structured and cross-referenceable, which supports reuse and openness.
  8. 8. The FCO operates in local, national and international environmental contexts, as well as topical data/information contexts (for example, cross-governmental spending). ACTIVITY: how does openness/transparency feature in YOUR current work?
  9. 9. The FCO’s open data and transparency related activities are identified in the FCO’s original Open Data Strategy document of 2012 and the later refresh of 2013.
  10. 10. Notes from the FCO Open Data Strategy and Open Data Strategy Refresh. Everyone knows these documents backwards, I expect(?!;-) Laudable as they are, how practical and “actionable” are these? We’ll explore that later…
  11. 11. More notes… This belief is one that has been “pushed” in open government circles at a global level. It’s taken as axiomatic, though it perhaps needs more critiquing as evidence collects about open data publication and use.
  12. 12. Open data for transparency is pushed at a global level and may help drive inter- as well as intra-governmental operations, and act as a common resource that can be used to build working relationships with and between NGOs and civil society, as well as supporting journalistic oversight.
  13. 13. As an example of how open data can be used to support real-time actions from a standing start, within days, if not hours, of an earthquake hitting a province in Taiwan, open data to support relief co-ordination efforts was being published by local government using a local civic open data portal.
  14. 14. Government policies around open data and transparency reflect public expectations about open access to data and information. (It is interesting to note that historically in the UK, one of the drivers for open data was the Guardian newspaper’s “Free Our Data” campaign that in part called on “we’ve paid for it (through taxes) so we want access to it” arguments for public access to things like Ordnance Survey data.)
  15. 15. One indicator of public interest in gaining access to otherwise unpublished public/government information is the steady rate at which FOI requests have been made to public bodies since the advent of FOI legislation. One argument for the pre-emptive release of public data as open data is that it can reduce the burden on servicing FOI requests. If frequently requested information is made available openly as part of standard working practice, it can be directly referred to by the public rather than them having to request it. http://www.instituteforgovernment.org.uk/blog/12942/how-whitehall-responds-to-freedom-of-information-requests/
  16. 16. Some public bodies maintain an “FOI disclosure log” that summarises FOI requests (and responses) on a periodic basis; whilst the ICO might recommend such a publication, it is not mandatory. Services such as WhatDoTheyKnow, operated by MySociety, provide a public gateway to making FOI requests to bodies covered by UK FOI legislation. ACTIVITY: •  What sort of requests are made through WhatDoTheyKnow to the FCO? •  How do these compare to typical requests received directly? •  Do any of the requests relate to information that could be (or already is) published routinely as open data by the FCO?
  17. 17. As well as being a producer of requested and open data, the FCO may also be a consumer of such data, or have a desire to work with other nations to publish data of a common form in a common format. Surveying FOI style sites similar to WhatDoTheyKnow in other jurisdictions provides an informal way of monitoring/tracking such activity elsewhere. (In the same way, reviewing open data portals from FCO equivalents in other countries may help normalise data releases through the informal adoption of a convention relating to data publication.)
  18. 18. Releasing information for transparency purposes means opening yourself up to account on that basis at least. So how might open data be used?
  19. 19. Services like “Where Does My Money Go?” help interpret budget allocations in terms of hypothetical “hypothecated” spends. Services like “The Daily Bread” further humanise the data, showing how the money is spent in terms of tax collected from a salary level similar to your own. Comparing such allocations of money to expenses gathered from other public services, such as local government spending data (£x on a care placement, for example) or health spending (cost of a particular prescription or treatment, for example), helps provide further context for the benefits of government. At a more detailed, transactional level, mismatches in budget and spend, or hard to explain allocations, may provide a forum for debate and oversight.
  20. 20. The Great Rip Off Map was a Global Witness report that used a database containing reports of fraudulent corporate activity to associate companies registered in secret jurisdictions with the countries in which their fraudulent activity took place. Information about company directorships from services like OpenCorporates supports this sort of investigation using open data published by – or scraped from – company registration documents.
  21. 21. An important consideration when releasing data is that it often leads to requests for further data or information – analysis tends to be an iterative process. Analysis may also produce additional (derived, or summary) datasets, as well as richer linked datasets that allow alternative forms of segmentation.
  22. 22. Just making data available doesn’t always support transparency. On the one hand, people don’t necessarily know how a set of data was used to inform or help make a particular decision or frame a particular policy. On the other, when a data set is released that supports a report containing charts and tables derived from the data, for example, or the outcomes of an analysis applied to the data, it may be difficult, if not impossible, for a third party to try to replicate the charts, tables or analyses from the data. In some parts of the world of Open Science, there is a move to using open computational notebooks that blend text, computational analyses and the outputs of those analyses applied to linked data sets to provide reproducible analyses that can be rerun, and checked over, by a third party. This screenshot shows an analysis published by Buzzfeed that shows how they analysed betting data to reveal match fixing trends in professional tennis (the story was broken by Buzzfeed working with the BBC). This openness of process and analysis is far more helpful than just releasing the betting data they found.
  23. 23. So what do we really mean by open?
  24. 24. When open data isn’t, really: physical, limited access (every Tuesday), costs money, voluminous and inconsistent, difficult to read, all calculations using data must be performed by hand. Still the case for some datasets like the Electoral Register. ACTIVITY: what characteristics or features do you think define “open”?
  25. 25. According to a summary of the Open Definition (opendefinition.org), ‘A piece of data or content is open if anyone is free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and/or share-alike.’ While it is commendable that the Pagwell ledger can be consulted, it fails to meet the Open Definition in a number of ways: it is not open to anyone; it is not free to re-use either financially (there is a charge), legally (a disclaimer) or technically (the handwritten format makes re-use almost impossible). The Open Definition, published and maintained by the Open Knowledge Foundation, is widely accepted as defining what constitutes ‘open’ data.
  26. 26. According to a summary of the Open Definition (opendefinition.org), ‘A piece of data or content is open if anyone is free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and/or share-alike.’ While it is commendable that the Pagwell ledger can be consulted, it fails to meet the Open Definition in a number of ways: it is not open to anyone; it is not free to re-use either financially (there is a charge), legally (a disclaimer) or technically (the handwritten format makes re-use almost impossible). The Open Definition, published and maintained by the Open Knowledge Foundation, is widely accepted as defining what constitutes ‘open’ data. Again - ACTIVITY: what characteristics or features do you think define “open”?
  27. 27. 1. Completeness Datasets released by the government should be as complete as possible, reflecting the entirety of what is recorded about a particular subject. All raw information from a dataset should be released to the public, except to the extent necessary to comply with federal law regarding the release of personally identifiable information. Metadata that defines and explains the raw data should be included as well, along with formulas and explanations for how derived data was calculated. Doing so will permit users to understand the scope of information available and examine each data item at the greatest possible level of detail. 2. Primacy Datasets released by the government should be primary source data. This includes the original information collected by the government, details on how the data was collected and the original source documents recording the collection of the data. Public dissemination will allow users to verify that information was collected properly and recorded accurately. 3. Timeliness
  28. 28. Data.gov.uk was established as a one stop directory for looking up open data published by UK public bodies. The idea was that as and when data is published WHEREVER, a catalogue record would also be added on data.gov.uk. So how does data released as open data by the FCO fare when ranked against the Sunlight Principles? Check out some of the open data published by the FCO on data.gov.uk…
  29. 29. There are also FCO “spin-offs” separately listed… It should be noted that there are many workflow related “issues” associated with publishing records on data.gov.uk – it’s hard enough publishing the data in the first place without forgetting to add – or update – a corresponding record on data.gov.uk.
  30. 30. The www.gov.uk website is the primary home for departmental online publications. As such, data.gov.uk records for departments like FCO might naturally point to the possibly more comprehensive (if dispersed) data related publication on the departmental site.
  31. 31. Sometimes, it can just be easier to do a web search…
  32. 32. Sometimes, it can be easier to refine your search even more…
  33. 33. Sometimes, it can be easier to refine your search even more…
  34. 34. Although not an obvious source of national statistics (and national statistics offices are a great place around the world to find open data), the FCO does publish some stats… QUESTION: on the data.gov.uk and www.gov.uk sites, what sort of data does the FCO tend to publish?
  35. 35. Quickly skimming FCO open data publications, one segment or cluster of publications I spotted related to “cash” transparency - how money flowed. One thing that’s often interesting to note around spending data is that data releases tend to be one sided. Spend data is published, but receipts aren’t.
  36. 36. When spend publication is siloed, eg by department, or locale, it’s often hard to see the wider picture. This diagram, known as a Sankey diagram, is a useful chart type for visualising the forward flow of a conserved quantity.
  37. 37. Another cluster of FCO pages I found relates to positions of “influence” or activities that might be associated with influence.
  38. 38. Simple relationships may be used to identify large scale patterns, structures or distributions across a wider set of relationships. This map was constructed several years ago from the starting point of a single company name. By finding the directors of the company (using OpenCorporates), looking up other companies they were associated with, finding the directors of those companies, and so on, then drawing lines between companies connected by two or more of the same directors, it is possible to identify a large number of companies within a corporate group, and perhaps even reveal something about the overall corporate structure. The lesson is that when things are connected in networks – which can be simply represented by a row in a two column dataset – THIS CONNECTS-TO THAT – lots of things can fall out when you look at the network as a whole…
  39. 39. Another grouping in the FCO transparency releases seems to relate to operational matters…
  40. 40. Some of the data released by the FCO – or other government departments – may end up being aggregated on a topical basis by other parties or agencies. IATI is a voluntary, multi-stakeholder initiative of governments and civil society that attempts to make information on aid flows more accessible. It enables recipient countries to plan and allocate more effectively and enables citizens of recipient countries to hold their governments to account for how they spend those resources.
  41. 41. Civil Society organisations also collect and publish open data of their own collection, data that is sometimes used in turn by government departments (I think BIS refer to Transparency International’s Corruption Perceptions Index, for example).
  42. 42. Government departments may also link out to open datasets published by other public or arms length bodies.
  43. 43. Initiatives such as EITI – the Extractive Industries Transparency Initiative – open up information about extractive industry contracts to support transparency and fight corruption.
  44. 44. OpenOil is even more focused, publishing contract and corporate structure information relating specifically to the oil industry. Note that when I tried to grab the screenshot, the site appeared to be down. So instead I went to the Internet Archive’s Wayback Machine to see an archived version of the site. ACTIVITY: visit the Wayback Machine to find old government pages. In the UK, the National Archives also maintains a web archive of notable UK sites.
  45. 45. So what does it mean to start engaging with open data practice in the FCO?
  46. 46. Here’s an example of some information that could be released as data. Unfortunately, the information is locked up in a PDF document.
  47. 47. Fortunately, tools such as Tabula allow us to extract data tables as tables from a PDF document.
  48. 48. One of the features of open data is that we want it to be machine readable so we can process it with machines…
  49. 49. Here’s what a PDF looks like to us…
  50. 50. Here’s what it looks like to a machine. There are ways to get structured data out of a PDF but they can be difficult and time consuming and it may result in errors in the extracted data or information. For those interested in doing this, try Tabula or ScraperWiki’s PDF extract tool.
  51. 51. Tabular spreadsheet data. Limitation is that it is proprietary.
  52. 52. CSV = Comma Separated Values. Can also have different separators, commonly | and \t (tab). Great for simple tabular data.
  53. 53. Even if data is published as data, it still may not be easy to use or reconcile. ACTIVITY: look up data about Ministers Gifts and Hospitality. Are the columns all the same…?
  54. 54. Here’s some more information I found – the structure of the record suggests this may have been pulled from a database, but it’s locked up in a PDF and the semantics - this line is the name of the establishment, these are the address, these are the services offered, are hard for a machine to extract reliably (the cues are visual and rely on common sense understanding).
  55. 55. JSON = JavaScript Object Notation. Common format for web developers. For more information on XML and JSON: http://schoolofdata.org/2013/11/21/xml-and-json/
  56. 56. Somewhat paradoxically, legal notions of openness are based on (closed) models of ownership.
  57. 57. A discussion of licensing starts from copyright: if there were no copyright, there would be no need for (or possibility of) licences. Copyright in a work rests with the creator of the work and effectively constitutes ownership over the work. Other people cannot use the work without the permission of the copyright holder. (In the case of data there is also, within the EU, a ‘database right’ in the ‘arrangement’ of data, separate from copyright but similar to it.) When publishing data it is essential to publish a statement making clear how people can use the data; such a statement is called a licence. Without a clear licence, people will be wary of re-using the data in case they are breaching copyright. In general a licence might grant certain permissions, subject to specified conditions.
  58. 58. Permissions in data and content include things like reading it (probably this is usually already implicit), copying it, distributing the data either as it is or in modified form, perhaps mixed in with other data, etc. Copyright is inherent in the data; it is not a contract. If someone republishes your data (with or without your permission), you still have copyright in it even if they have mixed it or modified it. Your licence potentially therefore binds all downstream users. (However, you can contract with someone to give them more permissions than the general licence you have applied.) [Symbols from the Noun Project. Modify symbol: Piotrek Chuchla; others public domain]
  59. 59. Permissions can be mixed when a work is licensed…
  60. 60. The Creative Commons licensing scheme promotes a range of license types.
  61. 61. Databases are covered by a database right and open licenses have been developed specifically in that context.
  62. 62. In the UK, the Open Government Licence, requiring only attribution, can be used for releasing data by any local or national government body. (Also Crown Copyright.) Based on Creative Commons license. Wikipedia: http://en.wikipedia.org/wiki/Open_Government_Licence
  63. 63. A brief look at the international context…
  64. 64. OGP was launched in 2011 to provide an international platform for domestic reformers committed to making their governments more open, accountable, and responsive to citizens. Since then, OGP has grown from 8 countries to the 65 participating countries indicated on the map below. In all of these countries, government and civil society are working together to develop and implement ambitious open government reforms.
  65. 65. Members: Canada, France, Germany (2015 Chair), Italy, Japan, Russia, United Kingdom, United States. Principle 1: Open Data by Default. Principle 2: Quality and Quantity. Principle 3: Usable by All. Principle 4: Releasing Data for Improved Governance. Principle 5: Releasing Data for Innovation. Signed in June 2013.
  66. 66. Globally accepted principles
  67. 67. EXAMPLES: Company Business register; Crime Statistics; Meteorological data, land use data, agriculture/fishing/forestry; Schools performance; Pollution monitoring; Spending data, budgets, tenders; Maps; Aid, food security, extractives; Electoral Results; Prescription data, disease prevalence, mortality rates; Research data
  68. 68. Is open data publication part of the workflow? How are data formats selected? How is redaction implemented? How is the data actually published…? And is it cross-referenced from data.gov.uk?
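
Slide 52 notes that CSV files can use separators other than the comma. As a minimal sketch of what that means in practice, the Python below reads the same small table three times using the standard library csv module, once per separator. The column names and the single data row are invented for illustration and do not come from any FCO release.

```python
import csv
import io


def read_rows(text, delimiter=","):
    """Parse delimited text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


# Invented sample data: the same table written with three different separators.
comma_text = "minister,gift,value\nA. Example,Book,25\n"
pipe_text = "minister|gift|value\nA. Example|Book|25\n"
tab_text = "minister\tgift\tvalue\nA. Example\tBook\t25\n"

rows_comma = read_rows(comma_text)
rows_pipe = read_rows(pipe_text, delimiter="|")
rows_tab = read_rows(tab_text, delimiter="\t")

# Once the right separator is supplied, all three parse to the same records.
print(rows_comma == rows_pipe == rows_tab)
```

The point of the sketch is that the separator has to be known (or guessed) before the file can be read reliably, which is one reason to document it when publishing.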
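
Slide 38 describes linking companies that share two or more directors, starting from a two column THIS CONNECTS-TO THAT dataset. The following is a rough sketch of that idea only, not the method used to draw the original map. The company and director names are placeholders, and the helper name linked_companies is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Placeholder two-column data: (company, director) pairs, one row per appointment.
appointments = [
    ("Alpha Ltd", "Director 1"),
    ("Alpha Ltd", "Director 2"),
    ("Beta Ltd", "Director 1"),
    ("Beta Ltd", "Director 2"),
    ("Gamma Ltd", "Director 2"),
    ("Gamma Ltd", "Director 3"),
]


def linked_companies(pairs, min_shared=2):
    """Return (company_a, company_b, shared_count) for every pair of
    companies with at least min_shared directors in common."""
    directors_by_company = defaultdict(set)
    for company, director in pairs:
        directors_by_company[company].add(director)

    links = []
    for a, b in combinations(sorted(directors_by_company), 2):
        shared = directors_by_company[a] & directors_by_company[b]
        if len(shared) >= min_shared:
            links.append((a, b, len(shared)))
    return links


links = linked_companies(appointments)
print(links)
```

With real data, the resulting links could be passed to a network drawing tool; the threshold of two shared directors follows the slide, and lowering it to one would connect many more companies.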
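
Slides 54 and 55 contrast a record locked up in a PDF, where the meaning of each line is only a visual cue, with JSON, where each value is labelled. As an illustrative sketch, the record below uses invented field names and values loosely modelled on the establishment/address/services example, and round-trips it through the standard library json module.

```python
import json

# Invented record: the field names make the semantics explicit for a machine.
record = {
    "establishment": "Example Clinic",
    "address": ["1 Example Street", "Exampletown"],
    "services": ["General practice", "Dentistry"],
}

# Serialise to JSON text, as it might be published.
text = json.dumps(record, indent=2)

# A consumer can parse it back without guessing which line means what.
restored = json.loads(text)
print(restored["services"])
```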
