4. “Data is the Next Intel Inside®.”
Photography: Jason Madara, WIRED UK 02:13, 2013.
Tim O’Reilly
Founder and CEO, O’Reilly Media
“Every significant Internet application to date has been backed by a specialized database.
[…] Much as the rise of proprietary software led to the Free Software movement, we
expect the rise of proprietary databases to result in a Free Data movement within the
next decade.” — “What is Web 2.0,” Sep. 2005.
5. Tim O’Reilly
Founder and CEO, O’Reilly Media
John Battelle
CEO, Co-founder, and Chairman,
NewCo
Photography: James Duncan Davidson, Web 2.0 Summit, 2010.
6. Data is the “Intel Inside®” of the Next Generation of Applications
Tim O’Reilly
Founder and CEO, O’Reilly Media
John Battelle
CEO, Co-founder, and Chairman,
NewCo
Photography: James Duncan Davidson, Web 2.0 Summit, 2010.
“Collective intelligence applications depend on managing, understanding, and
responding to massive amounts of user-generated data in real-time. The “subsystems” of
the emerging Internet operating system are increasingly data subsystems: location,
identity (of people, products, and places), and the skeins of meaning that tie them
together. This leads to new levers of competitive advantage: Data is the “Intel Inside®” of
the next generation of computer applications.” — “Web Squared: Web 2.0 Five Years On,” Oct. 2009.
9. Open Government
Barack Obama
44th President of the United States
Photography: Kevin S. O’Brien, U.S. Navy, 2009.
“My administration is committed to creating an unprecedented level of openness in
Government. We will work together to ensure the public trust and establish a system of
transparency, public participation, and collaboration. Openness will strengthen our
democracy and promote efficiency and effectiveness in Government.” — Memorandum for
the Heads of Executive Departments and Agencies, “Transparency and Open Government,” Jan. 2009.
11. “Government as a Platform”
Tim O’Reilly
Founder and CEO, O’Reilly Media
Photography: Eric Laycock, Esri, 2011.
“This is the right way to frame the question of “Government 2.0.” How does government
itself become an open platform that allows people inside and outside government to
innovate? How do you design a system in which all of the outcomes aren’t specified
beforehand, but instead evolve through interactions between the technology provider
and its user community? […] That’s Government 2.0: technology helping build the kind of
government the nation’s founders intended: of, for and by the people.”
— “Gov 2.0: The Promise of Innovation,” Forbes, Aug. 2009.
12. Todd Park
2nd United States Chief Technology Officer
Photography: U.S. Department of Labor, 2012.
13. Open Data Policy
Todd Park
2nd United States Chief Technology Officer
Photography: U.S. Department of Labor, 2012.
“Making information resources accessible, discoverable, and usable by the public can
help fuel entrepreneurship, innovation, and scientific discovery — all of which improve
Americans’ lives and contribute significantly to job creation.” — Sylvia M. Burwell, Steven
VanRoekel, Todd Park, and Dominic J. Mancini, Memorandum for the Heads of Executive Departments and
Agencies, “Open Data Policy — Managing Information as an Asset,” May 2013.
15. Open Data Movement
Joel Gurin
President and Founder,
Center for Open Data Enterprise
Photography: Techonomy, 2014.
“The Open Data movement began with democratic goals, fuelled by the idea
that governments should make the data they collect available to the taxpayers
who’ve paid to collect it. But in addition to its social benefits, Open Data has
created tremendous new business opportunities.”
— “Open Data Now,” McGraw-Hill Education, 2014.
16. Why Open Data? — Open Data and Social Impact
Sketchnote: Open Government Partnership, 2013.
17. Why Open Data? — Driving Growth, Ingenuity, and Innovation
Sketchnote: Open Government Partnership, 2013.
“Data is the new capital of the global economy, and as organisations seek renewed
growth, stronger performance and more meaningful customer engagement, the pressure
to exploit data is immense. […] As a result, we foresee that open data, and not simply big
data, will be a vital driver for growth, ingenuity and innovation in the UK economy. There
are four key aspects to our vision:
1. Every business will have a strategy to exploit the rapidly growing estate of open data.
2. Businesses will increasingly open up their data to revolutionise the way they compete.
3. Businesses will use open data to inspire customer engagement.
4. Businesses will work with the Government to establish a new paradigm in data
responsibility and privacy.”
— “Open data: Driving growth, ingenuity and innovation,” Deloitte, 2012.
18. Why Open Data? — Large Amount of Economic Value
“Making data more “liquid” (open, widely available, and in shareable formats)
has the potential to unlock a large amount of economic value (approx. $3 trillion
annually), by improving the efficiency and effectiveness of existing processes;
making possible new products, services, and markets; and creating value for
individual consumers and citizens.” — “Open data: Unlocking innovation and
performance with liquid information,” McKinsey Global Institute, Oct. 2013.
McKinsey & Company
McKinsey Global Institute
More open data for more users ...
• 40+: Number of countries with government open data platforms*
• 90,000+: Data sets on data.gov (US site)*
• 1.4 million: Page views for the UK open data site in the summer of 2013
• 102: Cities that participated in 2013 International Open Data Hackathon Day
• 1 million+: Data sets made open by governments worldwide
* As of 2013
19. Why Open Data? — Large Amount of Economic Value
“While sources differ in their precise estimates of the economic potential of Open Data,
all are agreed that it is potentially very large. In countries which were early movers in
Open Data, there is already evidence of significant businesses having developed to
exploit that potential. Leading governments have recognised that their role is not simply
to publish data — they are supporting the whole value chain of the use of data […].”—
“Open Data for Economic Growth,” The World Bank, Jun. 2014.
The World Bank
IBRD· IDA
23. Joel Gurin
President and Founder,
Center for Open Data Enterprise
Photography: The GovLab, 2013.
24. Joel Gurin
President and Founder,
Center for Open Data Enterprise
Defining Data Categories
[Diagram: three overlapping sets labelled BIG DATA, OPEN GOV, and OPEN DATA, with these regions:]
• Business Reporting and Other Business Data (e.g., ESG data and consumer complaints)
• Non-Public Data for marketing, business analysis, national security
• Citizen Engagement Programs not based on data (e.g., petition websites)
• Large Datasets from scientific research, social media, or other non-government sources
• Public Data from state, local, federal government (e.g., budget data)
• Large Public Government Datasets (e.g., weather, GPS, Census, SEC, healthcare)
Photography: The GovLab, 2013.
Diagram: From Joel Gurin, “Open Data Now,” McGraw-Hill Education, 2014.
26. Rufus Pollock
President and Co-Founder
Open Knowledge (Foundation)
Photography: Sebastiaan ter Burg, http://www.flickr.com/photos/31013861@N00/14860905785/, 2014.
30. Open Definition
Rufus Pollock
President and Co-Founder
Open Knowledge (Foundation)
Photography: Sebastiaan ter Burg, http://www.flickr.com/photos/31013861@N00/14860905785/, 2014.
Source: “Open Definition 2.1,” http://opendefinition.org/od/2.1/, 2015.
“Knowledge is OPEN if ANYONE is FREE to ACCESS, USE, MODIFY, and SHARE it
— subject, at most, to measures that preserve PROVENANCE and OPENNESS.”
31. Open Definition
Rufus Pollock
President and Co-Founder
Open Knowledge (Foundation)
Photography: Sebastiaan ter Burg, http://www.flickr.com/photos/31013861@N00/14860905785/, 2014.
Source: “Open Definition 2.1,” http://opendefinition.org/od/2.1/, 2015.
“Data and Content are OPEN if ANYONE is FREE to ACCESS, USE, MODIFY, and SHARE it
— subject, at most, to measures that preserve PROVENANCE and OPENNESS.”
32. How Data are Open or Closed, based on four characteristics
Source: From “Open data: Unlocking innovation and performance with liquid information,” McKinsey Global Institute, Oct. 2013.
Each characteristic ranges from completely open (more “liquid”) to completely closed:
• Degree of Access. Completely open: everyone has access. Completely closed: access to data is to a subset of individuals or organizations.
• Machine-Readability. Completely open: available in formats that can be easily retrieved and processed by computers. Completely closed: data in formats not easily retrieved and processed by computers.
• Cost. Completely open: no cost to obtain. Completely closed: offered only at a significant fee.
• Rights. Completely open: unlimited rights to reuse and redistribute data. Completely closed: re-use, republishing, or distribution of data is forbidden.
35. 5-Star Deployment Scheme for Open Data
Tim Berners-Lee
The Inventor of the World Wide Web
Photography: Bret Hartman, TED, 2014.
“In order to encourage people — especially government data owners — along the road
to good linked data, I have developed this star rating system. Linked Open Data (LOD) is
Linked Data which is released under an open license, which does not impede its reuse
for free. […] Linked Data does not of course in general have to be open. […]
However, if it claims to be Linked Open Data then it does have to be
open, to get any star at all.”— “Linked Data,” Design Issues, 2010.
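As a rough illustration of the star rating, the sketch below scores a dataset description. The five criteria follow the scheme as published in “Linked Data” (available on the Web under an open license; machine-readable structured data; a non-proprietary format; open W3C standards such as RDF and SPARQL to identify things; links to other people's data). The `Dataset` class and its field names are hypothetical, introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset (field names are assumptions)."""
    on_web: bool
    open_license: bool
    structured: bool        # machine-readable structured data, e.g. a spreadsheet
    non_proprietary: bool   # e.g. CSV instead of Excel
    uses_uris: bool         # things identified with URIs via RDF/SPARQL
    linked: bool            # links to other people's data


def star_rating(ds: Dataset) -> int:
    """Return 0 to 5 stars; each star requires all the previous ones."""
    # Without an open license the data earns no star at all.
    if not (ds.on_web and ds.open_license):
        return 0
    stars = 1
    for criterion in (ds.structured, ds.non_proprietary, ds.uses_uris, ds.linked):
        if not criterion:
            break
        stars += 1
    return stars


csv_release = Dataset(True, True, True, True, False, False)
closed_linked = Dataset(True, False, True, True, True, True)
print(star_rating(csv_release))    # 3
print(star_rating(closed_linked))  # 0
```

The second case mirrors the quote: linked data without an open license gets no star, however well it is structured.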
38. The Tip of the Iceberg
Image: Science for all, 2015.
“All those pages on websites are only tips of icebergs:
• The real data is hidden in databases, XML files, Excel sheets, …
• You only have access to what the Web page designers allow you to see.
[…]
Various data sources expose their data via Web Services or APIs, each with a different API,
a different logic, a different structure. Mashups are forced to reinvent the wheel many
times because there is no standard way getting to the data.”
— Ivan Herman, “High Level Intro to Semantic Web,” Feb. 2012.
42. Licenses for the “Database” and its “Contents”
Screenshot: http://opendatacommons.org/
“The database and its contents may have separate rights. […] Different types of subject
matter (e.g., code, content, or data) necessitate differences in licensing. Licenses designed
for one type of subject matter — as CC licenses (lower than 4.0) were designed for
content, and F/OSS licenses for code — aren’t always best suited to licensing another
type of subject matter.”— “Licenses FAQ,” Open Data Commons, 2010.
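One way to picture the split between a database and its contents is a rights record with one license slot for each. The sketch below is only an illustration: the `RightsStatement` class and its field names are hypothetical, and pairing the Open Database License (ODbL) for the database with the Database Contents License (DbCL) for the contents is just one combination among the licenses published by Open Data Commons.

```python
from dataclasses import dataclass

# License URLs as published by Open Data Commons.
ODBL = "http://opendatacommons.org/licenses/odbl/1.0/"
DBCL = "http://opendatacommons.org/licenses/dbcl/1.0/"


@dataclass(frozen=True)
class RightsStatement:
    """Hypothetical record: one license for the database, one for its contents."""
    database_license: str
    content_license: str

    def has_separate_licenses(self) -> bool:
        # True when the database and its contents are licensed differently.
        return self.database_license != self.content_license


stmt = RightsStatement(database_license=ODBL, content_license=DBCL)
print(stmt.has_separate_licenses())  # True
```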
43. Creative Commons Rights Expression Language (CC REL)
Source: https://www.w3.org/Submission/ccREL/
44. Open Data Rights Statement Vocabulary
Screenshot: https://alpha.openaddressesuk.org/about/terms/
Screenshot: http://schema.theodi.org/odrs/
47. Screenshot: The Next Web, TED, 2009.
Tim Berners-Lee
The Inventor of the World Wide Web
48. Tim Berners-Lee
The Inventor of the World Wide Web
Raw (Structured) Data, Now!
Screenshot: The Next Web, TED, 2009.
49. No Ontological Commitment, No Machine-Understandability.
“The term ontological commitment is
used as a general term in both
philosophy and in information systems to
refer to the essential elements of an
ontology. An ontological commitment in
describing ontological comparisons is
taken to refer to a subset of elements of
an ontology that it shares with all other
ontologies based upon the same theory
or conceptualization.”— Citizendium, 2013.
Diagram: John R. Brews, Citizendium, 2013.
53. Frictionless Data
Diagram: Open Knowledge, http://blog.okfn.org/2013/04/24/frictionless-data-making-it-radically-easier-to-get-stuff-done-with-data/, 2013.
55. Data Package Standards & Tools
Diagram: Open Knowledge, http://blog.okfn.org/2013/04/24/frictionless-data-making-it-radically-easier-to-get-stuff-done-with-data/, 2013.
Screenshot: http://data.okfn.org/tools
56. CSV on the Web
Screenshot: https://www.w3.org/TR/tabular-metadata/
Screenshot: http://www.w3.org/TR/tabular-data-model/
Screenshot: http://www.w3.org/TR/csv2json/
Screenshot: https://www.w3.org/TR/csv2rdf/
58. Core & Community Datasets
Screenshot: http://data.okfn.org/data
Screenshot: https://github.com/datasets
63. Linked Data
Tim Berners-Lee
The Inventor of the World Wide Web
Photography: Paul Clarke, 2014.
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
4. Include links to other URIs so that they can discover more things
“I’ll refer to the steps above as rules, but they are expectations of behavior. Breaking them
does not destroy anything, but misses an opportunity to make data interconnected. This
in turn limits the ways it can later be reused in unexpected ways. It is the unexpected re-
use of information which is the value added by the Web.”
— “Linked Data,” Design Issues, 2010.
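The four rules above can be sketched with a toy, in-memory stand-in for the Web. Everything here is an assumption made for illustration: the `example.org` URIs are made up, the `WEB` dictionary replaces real HTTP servers, and `look_up` and `discover` are hypothetical helper names. Only the two FOAF property URIs refer to an existing vocabulary.

```python
from urllib.parse import urlsplit

# A toy "web of data": each HTTP URI names a thing (rules 1 and 2) and maps to
# the description a server would return for it (rule 3). All URIs are made up.
WEB = {
    "http://example.org/id/alice": {
        "http://xmlns.com/foaf/0.1/name": "Alice",
        "http://xmlns.com/foaf/0.1/knows": "http://example.org/id/bob",
    },
    "http://example.org/id/bob": {
        "http://xmlns.com/foaf/0.1/name": "Bob",
    },
}


def look_up(uri: str) -> dict:
    """Stand-in for an HTTP GET on the URI; returns its description."""
    if urlsplit(uri).scheme not in ("http", "https"):
        raise ValueError(f"not an HTTP URI: {uri}")
    return WEB.get(uri, {})


def discover(start: str) -> list[str]:
    """Follow links between descriptions (rule 4) and list every URI reached."""
    seen, queue = [], [start]
    while queue:
        uri = queue.pop(0)
        if uri in seen:
            continue
        seen.append(uri)
        for value in look_up(uri).values():
            if value in WEB:  # the value is itself a name we can look up
                queue.append(value)
    return seen


print(discover("http://example.org/id/alice"))
```

Starting from one URI, the link in Alice's description is enough to reach Bob's; removing that link would leave both descriptions intact but disconnected, which is the missed opportunity the quote describes.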
64. Hypertext Transfer Protocol Uniform Resource Identifiers
Tim Berners-Lee
The Inventor of the World Wide Web
Photography: Paul Clarke, 2014.
“The first rule, to identify things with URIs, is pretty much understood by most people
doing semantic Web technology. […] The second rule, to use HTTP URIs, is also widely
understood. The only deviation has been a constant tendency for people to invent new
URI schemes such as XRIs, DOIs, and so on for various reasons. Typically, these involve not
wanting to commit to the established Domain Name System (DNS) for delegation of
authority but to construct something under separate control. Sometimes it has to do
with not understanding that HTTP URIs are names […].”— “Linked Data,” Design Issues, 2010.
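To make the second rule concrete, the sketch below checks whether an identifier is an HTTP URI by inspecting its scheme with the standard library. The helper name `is_http_uri` and the sample identifiers are assumptions chosen for illustration; the last sample shows the same DOI written as an HTTP URI through the doi.org resolver.

```python
from urllib.parse import urlsplit


def is_http_uri(uri: str) -> bool:
    """True when the URI uses the http or https scheme and names a host."""
    parts = urlsplit(uri)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


for uri in (
    "http://example.org/id/car",    # hypothetical HTTP URI
    "doi:10.1000/182",              # a separate URI scheme
    "https://doi.org/10.1000/182",  # the same DOI as an HTTP URI
):
    print(uri, is_http_uri(uri))
```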
65. Hypertext Transfer Protocol Uniform Resource Identifiers
“HTTP URIs, in the Web architecture, have been used to
denote documents. However, with the growth of the
Semantic Web, which uses URIs to denote anything at
all, the urge to use and practice of using HTTP URIs for
arbitrary things grew steadily.”— “What HTTP URIs Identify,”
Design Issues, 2010.
Diagram: “What do HTTP URIs Identify?,” Design Issues, 2002.
[Diagram: a table with rows KEY 1 to KEY 5, one cell holding the value “Car,” alongside identifiers URI 0 to URI 6.]
66. Resource Description Framework
[Diagram: a triple (Subject, Predicate, Object) over a table with columns COL 1 to COL 6 and rows KEY 1 to KEY 4, one cell holding the value “Car.” Subject: URI 1. Predicate: URI 2. Object: URI 3 / Value.]
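The triple on this slide can be read as one table cell: the row key gives the subject, the column gives the predicate, and the cell gives the object. A minimal sketch of that reading follows, assuming a made-up `http://example.org/` namespace and a hypothetical helper `row_to_triples`; it is not taken from any RDF library.

```python
BASE = "http://example.org/"  # hypothetical namespace


def row_to_triples(key: str, row: dict) -> list[tuple[str, str, str]]:
    """Turn one table row into (subject, predicate, object) triples."""
    subject = f"{BASE}row/{key}"
    return [(subject, f"{BASE}col/{column}", value) for column, value in row.items()]


triples = row_to_triples("KEY1", {"COL6": "Car"})
print(triples)
```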
67. Resource Description Framework
[Diagram: a graph of two triples over two tables. Triple 1: Subject 1 (URI 1), Predicate 1 (URI 2), Object 1 (URI 3). Triple 2: Subject 2, Predicate 2 (URI 4), Object 2 (URI 5 / Value). The first table has columns COL 1 to COL 6 and rows KEY 1 to KEY 4, one cell holding “Car”; the second table adds COL 7, pairing “Car” with “Tire.”]
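Two triples form a graph when the object of one is the subject of the other. The sketch below assumes that this is what the slide depicts (Object 1 and Subject 2 being the same node), and it uses made-up `example.org` URIs plus two hypothetical helpers, `objects` and `follow`, to walk from a row to “Tire” across both tables.

```python
EX = "http://example.org/"  # hypothetical namespace

# Two triples: the object of the first is the subject of the second.
graph = [
    (EX + "row/KEY1", EX + "col/COL6", EX + "thing/Car"),
    (EX + "thing/Car", EX + "col/COL7", "Tire"),
]


def objects(graph, subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def follow(graph, start, path):
    """Walk a chain of predicates from a start node, returning the end nodes."""
    nodes = [start]
    for predicate in path:
        nodes = [o for node in nodes for o in objects(graph, node, predicate)]
    return nodes


print(follow(graph, EX + "row/KEY1", [EX + "col/COL6", EX + "col/COL7"]))  # ['Tire']
```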
69. Dereferenceable Uniform Resource Identifiers
Tim Berners-Lee
The Inventor of the World Wide Web
Photography: Paul Clarke, 2014.
“The third rule, that one should serve information on the Web against a URI, is, in 2006,
well followed for most ontologies, but, for some reason, not for some major datasets. […]
Large datasets provide a SPARQL query service, but the basic linked data should be
provided as well. Many research and evaluation projects in the few years of the Semantic
Web technologies produced ontologies, and significant data stores, but the data, if
available at all, is buried in a zip archive somewhere, rather than being accessible on the
Web as linked data.”— “Linked Data,” Design Issues, 2010.
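Serving information against a URI usually means a client can ask that URI for data instead of a web page. As a hedged sketch, the code below builds (without sending) an HTTP request whose `Accept` header asks for RDF serializations. The URI is hypothetical, `linked_data_request` is an illustrative helper name, and whether a given server honors the header is an assumption.

```python
import urllib.request


def linked_data_request(uri: str) -> urllib.request.Request:
    """Build (but do not send) a GET request asking for RDF rather than HTML."""
    return urllib.request.Request(
        uri,
        headers={"Accept": "text/turtle, application/rdf+xml;q=0.9"},
        method="GET",
    )


req = linked_data_request("http://example.org/id/car")  # hypothetical URI
print(req.get_method(), req.full_url)
print(req.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the actual lookup against a real server.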
72. Why Linked Data?
Source: Tom Heath, “How to Publish Linked Data on the Web,” 2008.
• Ease of Discovery
• Ease of Consumption
- standards-based data sharing
• Reduced Redundancy
- avoid duplication
• Added Value
- build ecosystems around your data/content
74. Open Data Ecosystem
Figure 1. The open data ecosystem
[Diagram: government, business, and citizen each supply data (government data, business data, citizen data) to one another. Legend: “Supplies data to” and “Uses data to deliver to.”]
Source: Deloitte LLP
There are three principal constituencies in any
successful open data ecosystem: government, business
and citizen. Each constituency supplies data to itself
and to others. In turn, businesses and government
use the data to deliver services demanded by all
constituencies. The three classes of open data supplied
by the constituencies and used to deliver services are:
Open government data – data produced, collected
or paid for by the public sector, subject to restrictions
relating to sub judice, national security, commercial
sensitivity and privacy. In addition, special commercial
arrangements are also being made for certain trading
funds, including Companies House, the Ordnance
Survey, the Meteorological Office and HM Land
Registry, which together form the newly created Public
Data Group.
Open business data – data produced or collected
by the private sector and published freely and openly,
subject to restrictions that individual businesses decide
to put in place.
Open citizen data – the personal and non-personal
data of individual citizens published into the open
domain.
Diagram: From “Open data: Driving growth, ingenuity and innovation,” Deloitte, 2012.
79. Diagram: Florence Nightingale, “Notes on Matters Affecting the Health, Efficiency, and Hospital Administration of the British Army,” 1858.
80. Photography: “Don’t Panic – the Truth about Population,” BBC, 2013.
Hans Rosling
Co-founder and Chairman,
Gapminder Foundation
81. Photography: “Don’t Panic – the Truth about Population,” BBC, 2013.
Hans Rosling
Co-founder and Chairman,
Gapminder Foundation
Animation: “Mesmerizing Animation Shows How Much Healthier The World Has Become,” Business Insider, 2014.
82. Reality Mining: Serendipitous Reuse
Figure 1. Normalized data from Fluwatch (influenza cases, lab tests, ILI reports from sentinel
physicians) and Google (number of clicks on a keyword-triggered influenza link).
Results
Over the flu-season period, the Google campaign
received a total of 54,507 impressions and 4,582
clicks (Figure 1). Among all the ad campaign
measures, the number of clicks on the ad was
found to have the best correlation with traditional
[…] query sampling week could be predicted with
100% specificity and sensitivity.
The costs of the Google sentinel method were
negligible compared to traditional methods:
Google charges $0.08 per click-through, thus the
campaign cost only Can$365.64 for the entire
flu-season.
Screenshot: https://www.google.com/adsense/start/
Diagram: Gunther Eysenbach, "Infodemiology: Tracking Flu-Related Searches on the Web for Syndromic Surveillance," 2006.
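The campaign figures quoted on this slide allow two quick derived measures: the click-through rate and the average cost per click. The sketch below only rearranges the numbers from the excerpt; the variable names are illustrative, and the figures themselves are taken as given.

```python
# Figures quoted in the excerpt above.
impressions = 54_507
clicks = 4_582
total_cost_cad = 365.64

# Share of ad impressions that led to a click.
click_through_rate = clicks / impressions
# Total campaign cost spread over the clicks received.
average_cost_per_click = total_cost_cad / clicks

print(f"click-through rate: {click_through_rate:.1%}")
print(f"average cost per click: Can${average_cost_per_click:.3f}")
```

Roughly one impression in twelve led to a click, and the average cost per click comes out close to the quoted $0.08 rate.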