5. Tony Hirst is a part-time trainer for Open Knowledge and a Senior Lecturer at the
Open University. He promotes the use of open data in education, such as the OU’s
“Learn to Code for Data Analysis” free online FutureLearn course, which uses open
data from the World Bank and UN Comtrade (import/export data) as part of the
practical exercises used in the course.
6. Open data may be contextualised by the wider notion of an open system.
Open systems accept inputs from, and produce outputs into, an external environment,
and may also observe and learn from processes happening in that external
environment. Their internal processes may also be observable from (transparent to
observers in) the external environment.
7. The FCO Open Data Strategy 2012 provides an overview of the role of information
within the department. The base principles (information as a valued and managed asset) are
realised by information that is fit for purpose, structured and cross-referenceable,
which supports reuse and openness.
8. The FCO operates in local, national and international environmental contexts, as well
as topical data/information contexts (for example, cross-governmental spending).
ACTIVITY: how does openness/transparency feature in YOUR current work?
9. The FCO’s open data and transparency related activities are identified in the FCO’s
original Open Data Strategy document of 2012 and the later refresh of 2013.
10. Notes from the FCO Open Data Strategy and Open Data Strategy Refresh. Everyone
knows these documents backwards, I expect(?!;-)
Laudable as they are, how practical and “actionable” are these? We’ll explore that
later…
11. More notes… This belief is one that has been “pushed” in open government circles at
a global level. It’s taken as axiomatic, though perhaps needs more critiquing as
evidence collects about open data publication and use.
12.
13. Open data for transparency is pushed at a global level and may help drive inter- as
well as intra-governmental operations, and act as a common resource that can be
used to build working relationships with and between NGOs and civil society, as well
as supporting journalistic oversight.
14. As an example of how open data can be used to support real-time actions from a
standing start, within days, if not hours, of an earthquake hitting a province in Taiwan,
open data to support relief co-ordination efforts was being published by local
government using a local civic open data portal.
15. Government policies around open data and transparency reflect public expectations
about open access to data and information. (It is interesting to note that historically in
the UK, one of the drivers for open data was the Guardian newspaper’s “Free Our
Data” campaign, which in part called on “we’ve paid for it (through taxes) so we want
access to it” arguments for public access to things like Ordnance Survey data.)
16. One indicator of public interest in gaining access to otherwise unpublished public/
government information is the steady rate at which FOI requests have been made to
public bodies since the advent of FOI legislation.
One argument for the pre-emptive release of public data as open data is that it can
reduce the burden on servicing FOI requests. If frequently requested information is
made available openly as part of standard working practice, it can be directly referred
to by the public rather than them having to request it.
http://www.instituteforgovernment.org.uk/blog/12942/how-whitehall-responds-to-freedom-of-information-requests/
17. Some public bodies maintain an “FOI disclosure log” that summarises FOI requests
(and responses) on a periodic basis; whilst the ICO might recommend such a
publication, it is not mandatory.
Services such as WhatDoTheyKnow, operated by MySociety, provide a public
gateway to making FOI requests to bodies covered by UK FOI legislation.
ACTIVITY:
• What sort of requests are made through WhatDoTheyKnow to the FCO?
• How do these compare to typical requests received directly?
• Do any of the requests relate to information that could be (or is already) published
routinely as open data by the FCO?
18. As well as being a producer of requested and open data, the FCO may also be a
consumer of such data, or have a desire to work with other nations to publish data of
a common form in a common format.
Surveying FOI style sites similar to WhatDoTheyKnow in other jurisdictions provides
an informal way of monitoring/tracking such activity elsewhere.
(In the same way, reviewing open data portals from FCO equivalents in other
countries may help normalise data releases through the informal adoption of a
convention relating to data publication.)
19. Releasing information for transparency purposes means opening yourself up to
account on that basis at least.
So how might open data be used?
20. Services like “Where Does My Money Go?” help interpret budget allocations in terms
of hypothetical “hypothecated” spends. Services like “The Daily Bread” further
humanise the data, showing how the money is spent in terms of tax collected from a
salary level similar to your own.
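The arithmetic behind this sort of “your share” view is simple proportional allocation. A minimal sketch follows; the tax bill and budget shares are made-up numbers for illustration only, and the function name share_of_tax is hypothetical rather than anything taken from the services themselves.

```python
def share_of_tax(tax_paid, budget_shares):
    """Split one person's tax bill across spending areas in proportion to budget share."""
    total = sum(budget_shares.values())
    return {area: round(tax_paid * share / total, 2)
            for area, share in budget_shares.items()}

# Hypothetical figures for illustration only (not real budget data).
example_shares = {"health": 20, "education": 12, "defence": 6, "other": 62}
example_split = share_of_tax(5000.0, example_shares)
print(example_split)
```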
Comparing such allocations of money to expenses gathered from other public
services, such as local government spending data (£x on a care placement, for
example) or health spending (cost of a particular prescription or treatment, for
example), helps provide further context for the benefits of government.
At a more detailed, transactional level, mismatches in budget and spend, or hard to
explain allocations, may provide a forum for debate and oversight.
21. The Great Rip off map was a Global Witness report that used a database containing
reports of fraudulent corporate activity to associate companies registered in secret
jurisdictions with the countries in which their fraudulent activity took place.
Information about company directorships from services like OpenCorporates supports
this sort of investigation using open data published by – or scraped from – company
registration documents.
22. An important consideration when releasing data is that it often leads to requests for
further data or information – analysis tends to be an iterative process. Analysis may
also produce additional (derived, or summary) datasets, as well as richer linked
datasets that allow alternative forms of segmentation.
23. Just making data available doesn’t always support transparency. On the one hand,
people don’t necessarily know how a set of data was used to inform or help make a
particular decision or frame a particular policy. On the other, when a data set is
released that supports a report containing charts and tables derived from the data, for
example, or the outcomes of an analysis applied to the data, it may be difficult, if not
impossible, for a third party to try to replicate those charts, tables or analyses from the
data.
In some parts of the world of Open Science, there is a move to using open
computational notebooks that blend text, computational analyses and the outputs of
those analyses applied to linked data sets to provide reproducible analyses that
can be rerun, and checked over, by a third party.
This screenshot shows an analysis published by Buzzfeed that shows how they
analysed betting data to reveal match fixing trends in professional tennis (the story
was broken by Buzzfeed working with the BBC). This openness of process and
analysis is far more helpful than just releasing the betting data they found.
25. When open data isn’t, really: physical, limited access (every Tuesday), costs money,
voluminous and inconsistent, difficult to read, and all calculations using the data must be
performed by hand. This is still the case for some datasets, like the Electoral Register.
ACTIVITY: what characteristics or features do you think define “open”?
29. Data.gov.uk was established as a one stop directory for looking up open data
published by UK public bodies. The idea was that as and when data is published
WHEREVER, a catalogue record would also be added on data.gov.uk.
So how does data released as open data by the FCO fare when ranked against the
Sunlight Principles?
Check out some of the open data published by the FCO on data.gov.uk…
30. There are also FCO “spin-offs” separately listed…
It should be noted that there are many workflow related “issues” associated with
publishing records on data.gov.uk – it’s hard enough publishing the data in the first
place without forgetting to add – or update – a corresponding record on data.gov.uk.
31. The www.gov.uk website is the primary home for departmental online publications. As
such, data.gov.uk records for departments like FCO might naturally point to the
possibly more comprehensive (if dispersed) data related publication on the
departmental site.
35. Although not an obvious source of national statistics (and national statistics offices
are a great place around the world to find open data), the FCO does publish some
stats…
QUESTION: on the data.gov.uk and www.gov.uk sites, what sort of data does the
FCO tend to publish?
36. Quickly skimming FCO open data publications, one segment or cluster of publications
I spotted related to “cash” transparency - how money flowed.
One thing that’s often interesting to note around spending data is that data releases
tend to be one sided. Spend data is published, but receipts aren’t.
37. When spend publication is siloed, eg by department, or locale, it’s often hard to see
the wider picture. This diagram, known as a Sankey diagram, is a useful chart
type for visualising the forward flow of a conserved quantity.
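The data behind a Sankey diagram is just a list of (source, target, amount) flows. A minimal sketch, using invented node names and amounts, that totals what flows into and out of each node so you can check whether the quantity really is conserved at the intermediate steps:

```python
from collections import defaultdict

def node_balances(flows):
    """Return {node: (total_in, total_out)} for a list of (source, target, amount) flows."""
    totals = defaultdict(lambda: [0, 0])
    for source, target, amount in flows:
        totals[source][1] += amount  # money leaving the source
        totals[target][0] += amount  # money arriving at the target
    return {node: tuple(in_out) for node, in_out in totals.items()}

# Hypothetical flows for illustration only.
example_flows = [
    ("Treasury", "Department A", 100),
    ("Department A", "Supplier X", 60),
    ("Department A", "Supplier Y", 40),
]
balances = node_balances(example_flows)
```

A node whose in and out totals differ (other than the very first and last nodes in the chain) is a hint that part of the picture is missing, for example spend that is published without the matching receipts.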
38. Another cluster of FCO pages I found relate to positions of “influence” or activities that
might be associated with influence.
39. Simple relationships may be used to identify large scale patterns, structures or
distributions across a wider set of relationships.
This map was constructed several years ago from the starting point of a single
company name. By finding the directors of the company (using OpenCorporates),
looking up other companies they were associated with, finding the directors of those
companies, and so on, then drawing lines between companies connected by two or
more of the same directors, it is possible to identify a large number of companies
within a corporate group, and perhaps even reveal something about the overall
corporate structure.
The lesson is that when things are connected in networks – which can be simply
represented by a row in a two column dataset – THIS CONNECTS-TO THAT – lots of
things can fall out when you look at the network as a whole…
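The “two or more shared directors” rule can be sketched from a two-column dataset of director/company pairs. This is only an illustration: the rows are invented, the function name company_links is hypothetical, and the step of looking directors up via OpenCorporates is not shown.

```python
from collections import defaultdict
from itertools import combinations

def company_links(rows, min_shared=2):
    """Link companies that share at least `min_shared` directors.

    `rows` is a two-column dataset of (director, company) pairs.
    """
    directors_of = defaultdict(set)
    for director, company in rows:
        directors_of[company].add(director)
    links = []
    # Compare every pair of companies and keep those with enough directors in common.
    for a, b in combinations(sorted(directors_of), 2):
        shared = directors_of[a] & directors_of[b]
        if len(shared) >= min_shared:
            links.append((a, b, sorted(shared)))
    return links

# Hypothetical rows for illustration only.
example_rows = [
    ("Director 1", "Company A"),
    ("Director 2", "Company A"),
    ("Director 1", "Company B"),
    ("Director 2", "Company B"),
    ("Director 2", "Company C"),
]
example_links = company_links(example_rows)
```

Here only Company A and Company B end up linked; Company C shares just one director with each of them, so it falls below the threshold.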
40. Another grouping in the FCO transparency releases seems to relate to operational
matters…
41. Some of the data released by the FCO – or other government departments – may
end up being aggregated on a topical basis by other parties or agencies.
IATI is a voluntary, multi-stakeholder initiative of governments and civil society that
attempts to make information on aid flows more accessible. It enables recipient
countries to plan and allocate more effectively and enables citizens of recipient
countries to hold their governments to account for how they spend those resources.
42. Civil Society organisations also collect and publish open data of their own collection,
data that is sometimes used in turn by government departments (I think BIS refer to
Transparency International’s Corruption Perceptions Index, for example).
43. Government departments may also link out to open datasets published by other
public or arm’s-length bodies.
44. Initiatives such as EITI – the Extractive Industries Transparency Initiative – open up
information about extractive industry contracts to support transparency and fight
corruption.
45. OpenOil is even more focused, publishing contract and corporate structure information
relating specifically to the oil industry.
Note that when I tried to grab the screenshot, the site appeared to be down. So
instead I went to the Internet Archive’s Wayback Machine to see an archived version
of the site.
ACTIVITY: visit the Wayback Machine to find old government pages.
In the UK, the National Archives also maintains a web archive of notable UK sites.
46. So what does it mean to start engaging with open data practice in the FCO?
47. Here’s an example of some information that could be released as data. Unfortunately,
the information is locked up in a PDF document.
48. Fortunately, tools such as Tabula allow us to extract data tables as tables from a PDF
document.
49. One of the features of open data is that we want it to be machine readable so we can
process it with machines…
54. Even if data is published as data, it still may not be easy to use or reconcile.
ACTIVITY: look up data about Ministers Gifts and Hospitality. Are the columns all the
same…?
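One quick way to check whether the columns match across releases is to normalise the header rows and compare them. A minimal sketch, assuming the releases are available as CSV text; the column names below are invented for illustration and are not taken from the actual FCO files.

```python
import csv
import io

def normalise_header(name):
    """Lower-case a column name and collapse spaces/punctuation to underscores."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return "_".join(cleaned.split())

def compare_columns(csv_texts):
    """Return (columns common to all files, columns missing from at least one)."""
    header_sets = []
    for text in csv_texts:
        header = next(csv.reader(io.StringIO(text)))
        header_sets.append({normalise_header(h) for h in header})
    common = set.intersection(*header_sets)
    everything = set.union(*header_sets)
    return common, everything - common

# Hypothetical header rows for illustration only.
release_1 = "Minister,Date,Gift,From,Value\n"
release_2 = "minister,date,gift ,Given by,Value,Outcome\n"
common, differing = compare_columns([release_1, release_2])
```

Normalising catches trivial differences (capitalisation, stray spaces), but columns such as “From” and “Given by” that mean the same thing still need a human to reconcile them.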
55. Here’s some more information I found – the structure of the record suggests this may
have been pulled from a database, but it’s locked up in a PDF and the semantics -
this line is the name of the establishment, these are the address, these are the
services offered, are hard for a machine to extract reliably (the cues are visual and
rely on common sense understanding).
68. A discussion of licensing starts from copyright: if there were no copyright, there would
be no need for (or possibility of) licences. Copyright in a work rests with the creator of
the work and effectively constitutes ownership over the work. Other people cannot
use the work without the permission of the copyright holder. (In the case of data there
is also, within the EU, a ‘database right’ in the ‘arrangement’ of data, separate from
copyright but similar to it.)
When publishing data it is essential to publish a statement making clear how people
can use the data; such a statement is called a licence. Without a clear licence, people
will be wary of re-using the data in case they are breaching copyright. In general a
licence might grant certain permissions, subject to specified conditions.
69. Permissions in data and content include things like reading it (probably this is usually
already implicit), copying it, distributing the data either as it is or in modified form,
perhaps mixed in with other data, etc.
Copyright is inherent in the data; it is not a contract. If someone republishes your data
(with or without your permission), you still have copyright in it even if they have mixed
it or modified it. Your licence potentially therefore binds all downstream users.
(However, you can contract with someone to give them more permissions than the
general licence you have applied.)
[Symbols from the Noun Project. Modify symbol: Piotrek Chuchla; others public
domain]
77. OGP was launched in 2011 to provide an international platform for domestic
reformers committed to making their governments more open, accountable, and
responsive to citizens. Since then, OGP has grown from 8 countries to the 65
participating countries indicated on the map below. In all of these countries,
government and civil society are working together to develop and implement
ambitious open government reforms.
80. EXAMPLES:
Company Business register
Crime Statistics
Meteorological data, land use data, agriculture/fishing/forestry
Schools performance
Pollution monitoring
Spending data, budgets, tenders
Maps
Aid, food security, extractives
Electoral Results
Prescription data, disease prevalence, mortality rates
Research data
81. Is open data publication part of the workflow?
How are data formats selected?
How is redaction implemented?
How is the data actually published…? And is it cross-referenced from data.gov.uk?