1. The scope and application of public “big data”
Randy Goebel
Alberta Machine Intelligence Institute
University of Alberta
Edmonton, Alberta
Canada
rgoebel@ualberta.ca
2. JST CREST International Symposium on Big Data Application, January 11, 2017, Tokyo
AMII Principal Investigators
Michael Bowling
Randy Goebel
Russ Greiner
Patrick M. Pilarski
Csaba Szepesvári
Dale Schuurmans
Yutaka Yasui
Or Sheffet
Osmar Zaïane
Robert Holte
Richard S. Sutton
www.amii.ca/researchers
3. AMII at a glance …
• Founded 2002
• Resides at University of Alberta
• One of the top Machine Learning research organizations in the world
• Since inception >110 technologies created
• ≈134 people
• 11 Professors (PIs)
• 32 Staff & 1 Admin
• 91 M.Sc., Ph.D. & PDFs (postdoctoral fellows)
4. Outline
• The “Big Data” value chain
• Public/private/secret data
• Illustrative examples
• health
• transportation
• Summary
5. Outline
• The “Big Data” value chain: data-information-knowledge-action
• e-Government assessments and shift from process to data
• “Data IS infrastructure” – the acknowledgement of data-driven decision making
• “Data is the new oil” – Clive Humby, www.humbyanddunn.com
• Public/private/secret
• digital democracy
• data silos, data sharing, public-private collaborations
• Illustrative examples
• health
• transportation
• Summary
6. The “Big Data” value chain
• The “Big Data” value chain: data-information-knowledge-action
• e-Government assessments and shift from process to data
• “Data IS infrastructure” – the acknowledgement of data-driven decision making
• “Data is the new oil” – Clive Humby, www.humbyanddunn.com
7. The “Big Data” value chain
(Diagram: data → information → knowledge → action, feeding decision support. Labels attached to the chain: IoT, sensor networks, …; security, access, federation, …, integration; domain knowledge, machine learning, modeling, …; resource-bounded choice, action sets, ….)
8. Accenture’s Digital Government 2014
What is a digital government?
We see digital government as the optimal use of
electronic channels of communication and
engagement to improve citizen satisfaction in service
delivery, enhance economic competitiveness, forge
new levels of engagement and trust, and increase
productivity of public services.
https://www.accenture.com/ca-en/~/media/Accenture/Conversion-Assets/DotCom/Documents/Global/PDF/Industries_7/Accenture-Digital-Government-Pathways-to-Delivering-Public-Services-for-the-Future.pdf, Page 9.
9. Accenture’s Digital Government 2004
“Since 2000, Accenture has been
studying and reporting on trends in
the international e-Government
landscape. During that time we
have seen countries around the
globe rush to build an online mirror
of the offline world—and then step
back to reflect on what value that
strategy had brought to them. For
most, it brought a realization of the
need for change.”
http://grandsorganismes.gouv.qc.ca/fileadmin/Fichiers/Veilles%20stratégiques/Prestation%20de%20services%20en%20personne/2004-egovernment_leadership.pdf, Page 2
“Governments need to integrate
services seamlessly across horizontal
and vertical levels of government.
The technology challenges and the
complexities of governance mean the
task will not be easy, but only then will
they be able to provide the truly
seamless service that will drive broad
take-up of services.”
10. E-Government “maturity” 2004
http://grandsorganismes.gouv.qc.ca/fileadmin/Fichiers/Veilles%20stratégiques/Prestation%20de%20services%20en%20personne/2004-egovernment_leadership.pdf, Page 7
“maturity” is an Accenture compound
measure based on % of government
services accessible by Internet, the
growth in their use by citizens, and
whether the e-processes result in
improvements in process.
(Canada, Japan are highlighted)
11. E-Government Ranking 2014
http://grandsorganismes.gouv.qc.ca/fileadmin/Fichiers/Veilles%20stratégiques/Prestation%20de%20services%20en%20personne/2004-egovernment_leadership.pdf, Page 12
“We ranked the 10 countries on a
scale of 1 to 10 based on the scores
from the Citizen Satisfaction Survey,
Service Maturity and Citizen Service
Delivery Experience.”
(Canada, Japan were not included; it
seems that UAE paid Accenture for
most of the study?)
12. Outline
• The “Big Data” value chain: data-information-knowledge-action
• e-Government assessments and shift from process to data
• “Data IS infrastructure” – the acknowledgement of data-driven decision making
• “Data is the new oil” – Clive Humby, www.humbyanddunn.com
• Public/private/secret data generation and curation
• digital democracy
• data silos, data sharing, public-private collaborations
• Illustrative examples
• health
• transportation
• Summary
13. USA/NSF: evidence of a shift
“Data IS infrastructure.”
Peter Arzberger (Division Director, Computer and Network Systems (CNS) in the Directorate for Computer and Information Science and Engineering at NSF)
Chaitan Baru (Senior Advisor for Data Science in the Directorate for Computer and Information Science and Engineering at NSF)
“Big Data and Smart and Connected Communities: Opportunities and Challenges”
JST Big Data Application Symposium, August 5, 2016
14. USA/NSF: evidence of a shift
"Data are motivating a profound transformation in the
culture and conduct of scientific research in every
field of science and engineering. American scientists
must rise to the challenges and seize the
opportunities afforded by this new, data-driven
revolution. The work we do today will lay the
groundwork for new enterprises and fortify the
foundations for U.S. competitiveness for decades to
come."
Subra Suresh
USA National Science Foundation (NSF) Director
15. USA public investment
“Federal agencies spent about US$4.9 billion
on Big Data resources in fiscal 2012,
according to estimates from Deltek, an IT
consultancy. The annual amount of such
spending will grow to $5.7 billion in 2014 and
then to $7.2 billion by 2017 with a compound
annual growth rate of 8.2 percent.”
http://www.ecommercetimes.com/story/77690.html
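The quoted Deltek figures can be cross-checked with the standard compound-growth formula. The following is a minimal sketch, assuming fiscal 2012 is the base year and the 8.2 percent rate applies uniformly to every following year; the helper names `project` and `implied_cagr` are illustrative and do not come from the source.

```python
def project(base, rate, years):
    """Compound a base amount forward by `years` at annual growth `rate`."""
    return base * (1.0 + rate) ** years


def implied_cagr(start, end, years):
    """Compound annual growth rate implied by a start and end value."""
    return (end / start) ** (1.0 / years) - 1.0


BASE_2012 = 4.9  # US$ billions, fiscal 2012 (from the quote)
CAGR = 0.082     # 8.2 percent (from the quote)

# Assumption: growth compounds once per fiscal year starting from 2012.
spend_2014 = project(BASE_2012, CAGR, 2)
spend_2017 = project(BASE_2012, CAGR, 5)

# Rate implied by the two quoted endpoints, 4.9 (2012) and 7.2 (2017).
rate_2012_2017 = implied_cagr(4.9, 7.2, 5)

print(f"2014: {spend_2014:.2f}  2017: {spend_2017:.2f}  "
      f"implied CAGR: {rate_2012_2017:.3f}")
```

Under these assumptions the projections come out at about 5.74 and 7.27, close to the quoted $5.7 billion and $7.2 billion, and the endpoints imply a rate of roughly 8.0 percent, so the quoted numbers are mutually consistent up to rounding.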
16. USA/NSF/NIH
• NIH BD2K (Big Data to Knowledge)
• https://datascience.nih.gov/bd2k/faqs
• NSF
• Big data regional innovation hubs
• https://www.nsf.gov/pubs/2015/nsf15562/nsf15562.htm
• Core Techniques
• NSF 12-499, Core Techniques and Technologies for Advancing Big Data Science and Engineering (BIGDATA)
17. Canada: evidence of a shift
• CANARIE (Canadian Network for the Advancement of Research, Industry and Education)
• Compute/network/cloud infrastructure for research, industry, and education
• Investments in
• Storage and preservation
• Compute and analysis
• Discovery and dissemination
• Support and training
18. Canada: evidence of a shift
• RDC (Research Data Canada)
• Consortium of public organizations, significant leadership from research librarians to create policy, standards for public data capture, access, use
• Tri-council policy development
• Natural Sciences & Engineering Research Council
• Canadian Institutes of Health Research
• Social Sciences & Humanities Research Council
• Genome Canada
• Big data in omics (genomics, metabolomics, etc.)
19. Canada: evidence of a shift
https://smith.queensu.ca/ConversionDocs/MMA/big-data-gap.pdf
http://www.ictc-ctic.ca/wp-content/uploads/2015/12/BIG-DATA-2015.pdf
20. Europe: evidence of a shift
“Economic and social activities have long relied on
data. But today the increased volume, velocity,
variety, and social and economic value of data
signals a paradigm shift towards a data-driven
socioeconomic model.”
European Big Data Value Strategic Research & Innovation Agenda,
http://www.nessi-europe.com/Files/Private/EuropeanBigDataValuePartnership_SRIA__v099%20v4.pdf
21. Europe: evidence of a shift
• E.g., Horizon 2020 funding of EuDEco, “Modeling the European Data Economy”
“EuDEco assists European science and industry in
understanding and exploiting the potentials of data reuse in
the context of big and open data. The aim is to establish a
self-sustaining data market and thereby increase the
competitiveness of Europe. To be able to extract the
benefits of data reuse, it is crucial to understand the
underlying economic, societal, legal, and technological
framework conditions and challenges to build useful
applications and services.”
http://data-reuse.eu
http://data-reuse.eu/wp-content/uploads/2016/09/D2.1_InitialHeuristicModel-v1_2015-11-06.pdf
22. Public, Private, Secret Data
• public (e.g., open data, open science, open government)
• private (e.g., Google, Baidu, Facebook)
• secret (e.g., USA NSA, Israel Shabak, UK MI5)
23. Public, private, secret data
http://webfoundation.org/2016/04/open-data-barometer/
“Over half of countries
studied now have
open data initiatives,
but still less than 10%
of the government
data vital for
sustainable
development is open.”
24. Public, private, secret data
https://nsa.gov1.info/data/
25. Public, private, secret data
http://www.economist.com/news/leaders/21711904-worrying-experiments-new-form-social-control-chinas-digital-dictatorship
“Officials talk of creating a
system that by 2020 will
‘allow the trustworthy to
roam everywhere under
heaven while making it
hard for the discredited to
take a single step.’ ”
26. Public, Private, Secret Data
• public (e.g., open government)
• private (e.g., Google, Baidu, Facebook)
• secret (e.g., USA NSA, Israel Shabak, UK MI5)
27. Public-Private sharing
• NIH, Google
• https://www.autismspeaks.org/science/science-news/nih-announces-first-46m-brain-initiative-research
• NIH and DNA Data Bank of Japan
• http://www.ddbj.nig.ac.jp
• NIH, Institute for Systems Biology (ISB), SRA International: “ISB Awarded $6.5M (2014), $3.4M (2016) NIH Contract to Develop ‘Cancer Genomics Cloud’ with Google and SRA International”
• https://www.systemsbiology.org/news/2014/10/10/isb-gets-6-5-million-from-nci-to-create-cancer-genomics-cloud-with-partners-google-and-sra-international/
28. Private precision health
• 23andMe
• “23andMe to Share DNA Data with Researchers Using Apple iPhone”
https://www.technologyreview.com/s/601082/23andme-to-share-dna-data-with-researchers-using-apple-iphone/
• PatientsLikeMe
• Community-based health support
https://www.patientslikeme.com
• Molecular You
• Integrative omics (genomics, metabolomics, biomics)
http://molecularyou.com/#overview
29. Puzzle break
30. Puzzle break
31. Illustrative examples
• health
• transportation
32. Data and Personalized Medicine
Towards personalized medicine …
33. Data and Personalized Medicine
Towards precision medicine …
34. Image-based Glioma biopsy
Background: Some patients with low-grade glioma have extraordinarily long survival times; current, early treatment does not prolong their lives. For this reason, therapies that sometimes have neurologic side effects are often deferred intentionally.
Methods: In a study of oligodendrogliomas, we used a quantitative method of MR analysis based on the S-transform to investigate whether codeletion of chromosomes 1p and 19q, a marker of good prognosis, could be predicted accurately by measuring image texture.
Results: Differences in texture were seen between tumors with codeletion of chromosomes 1p and 19q and those with intact 1p and 19q alleles on contrast-enhanced T1-weighted and T2-weighted MR images. Quantitative MR texture on T2 images predicted codeletion of chromosomes 1p and 19q with high sensitivity and specificity.
Conclusions: This new method of MR image interpretation may have the potential to augment the diagnostic assessment of patients with suspected low-grade glioma.
The Use of Magnetic Resonance Imaging to Noninvasively Detect Genetic Signatures in Oligodendroglioma, Clin Cancer Res 2008;14:2357-2362. Published online April 15, 2008
35. Cycle of Health Information
• closing the loop on public system health data
Figure 2: The goal of an interoperable life cycle of health information.
(Diagram labels: Point of Care [EMR, EHR]; NetCare (shared provincial record, complete EHR); PHR; Patient (Proxy); Health Care Team [EMR, EHR]: labs/diagnostic imaging, clinics, hospitals, specialists, pharmacists; Secondary Use: system maintenance, research, audit/budgeting, health system planning, performance analysis; masking option; de-identification; notification function; improve efficiency.)
Draft of Health Information Executive Committee, Working Group “white paper,” Alberta Department of Health
36. Allowing disruptive technology
“Fifty-nine percent of respondents to a survey by
Health Catalyst said precision medicine will not play
a significant role in their organizations in the next five
years. Among respondents from non-academic
hospitals and health systems, the number rises to 68
percent who say precision medicine will play an
average, small or non-existent role in their
organizations between now and 2020.”
https://www.healthcatalyst.com/news/survey-healthcare-organizations-unprepared-precision-medicine/
37. Example capacity: BGI
- About 50% of GenBank contributed by BGI
- About 50% of world’s sequencing capacity
- At full capacity, 24x7, sequencing 1 million genomes would take 20 years
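The throughput implied by the last bullet can be worked out directly. This is a back-of-the-envelope sketch, assuming a 365-day year of continuous operation; the variable names are illustrative, and only the figures of 1 million genomes and 20 years come from the slide.

```python
GENOMES = 1_000_000   # from the slide
YEARS = 20            # from the slide
DAYS_PER_YEAR = 365   # assumption: 24x7 operation, leap days ignored

# Implied sustained throughput at full capacity.
genomes_per_year = GENOMES / YEARS
genomes_per_day = genomes_per_year / DAYS_PER_YEAR

# Average wall-clock time available per genome, in minutes.
minutes_per_genome = 24 * 60 / genomes_per_day

print(f"{genomes_per_year:,.0f} genomes/year, "
      f"{genomes_per_day:.0f} genomes/day, "
      f"{minutes_per_genome:.1f} minutes per genome")
```

Read this way, the slide's estimate corresponds to 50,000 genomes per year, or roughly 137 per day.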
38. Sequencing Human Microbiome
Figure 1b, Arumugam et al., “Enterotypes of the human gut microbiome,” Nature 473, May 2011.
“Microbes in the human gut
undergo selective pressure
from the host as well as
from microbial competitors.
This typically leads to
homeostasis of the
ecosystem in which some
species occur in high and
many in low abundance (the
‘long-tail’ effect, as seen in
Fig. 1b), with some low-
abundance species,
like methanogens,
performing specialized
functions beneficial to the
host.” (p. 175)
39. Enterotypes
“By combining 22 newly
sequenced faecal
metagenomes of individuals
from four countries with
previously published data sets,
here we identify three robust
clusters (referred to as
enterotypes hereafter) that are
not nation or continent
specific.” (Fig. 2a, p. 176)
Arumugam et al., “Enterotypes of the human gut microbiome,” Nature 473, May 2011.
40. Illustrative examples
• health
• transportation
41. Intelligence: infrastructure vs. vehicle?
• What is the trade-off between intelligent vehicles and intelligent infrastructure?
42. Levels of Autonomy
http://cyberlaw.stanford.edu/files/blogimages/LevelsofDrivingAutomation.pdf
43. Infrastructure costs
• Cost of conventional intersection signaling: ≈ $350K
• In Edmonton, 1200 × $350K = $420M
https://www.fastcompany.com/3025722/will-you-ever-be-able-to-afford-a-self-driving-car
44. Intelligent vehicle costs?
• Instrumented Prius ≈ $320K (not counting software)
https://www.fastcompany.com/3025722/will-you-ever-be-able-to-afford-a-self-driving-car
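The two cost figures can be placed side by side to make the infrastructure-versus-vehicle trade-off concrete. This is only an illustrative sketch: it reuses the slide figures as given and assumes both are in the same currency and exclude operating costs; the variable names are not from the source.

```python
SIGNAL_COST = 350_000      # dollars per conventional signalized intersection (from the slides)
EDMONTON_COUNT = 1_200     # multiplier used for Edmonton (from the slides)
VEHICLE_COST = 320_000     # dollars per instrumented Prius, software excluded (from the slides)

# Total cost of conventional signaling across the city.
infrastructure_total = SIGNAL_COST * EDMONTON_COUNT

# Number of instrumented vehicles the same budget would buy.
equivalent_vehicles = infrastructure_total / VEHICLE_COST

print(f"infrastructure: ${infrastructure_total:,}")
print(f"equivalent instrumented vehicles: {equivalent_vehicles:,.1f}")
```

On these numbers, the $420M signaling budget equals the hardware cost of about 1,312 instrumented vehicles, which frames the question of where the intelligence is better placed.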
45. Intelligent Vehicles
http://robotik.dfki-bremen.de/en/research/robot-systems/eo-smart-connecting-1.html
46. Driver-Car-Environment Interaction
http://robotik.dfki-bremen.de/en/research/robot-systems/eo-smart-connecting-1.html
47. Driver-Car-Environment Interaction
http://robotik.dfki-bremen.de/en/research/robot-systems/eo-smart-connecting-1.html
48. Moving or Standing Still?
https://www.youtube.com/watch?v=VKTj2jsGG3k
49. Moving or Standing Still?
http://open.canada.ca/data/en/dataset/1eb9eba7-71d1-4b30-9fb1-30cbdab7e63a
no longitude/latitude
50. Moving or Standing Still?
http://www.edmonton.ca/transportation/RoadsTraffic/OTS_2015_MVC_AnnualReport.pdf
51. Moving or Standing Still?
https://www.edmonton.ca/programs_services/apps_mobile/smart-travel-app.aspx
52. Capturing existing data
Courtesy of Meme Media Lab, Hokkaido University
53. Traffic & Snow Removal
54. Traffic & Snow Removal
55. Snow equipment tracking
56. Challenges
• Can government development of open data policy help accelerate big data impact?
• Data IS infrastructure
• Focus on balanced portfolio in the data value chain
• Data exploitation is crucial in the trade-offs of multi-criteria optimization
• Platforms like literature-based discovery will exploit foundations and help focus discipline/domain knowledge capture
57. Consumer Trajectories
• Extract “Consumer Trajectories” from the targeted search records constrained by the search type, region, user group, and time span
• Extracted consumer trajectories are visualized on 2-dimensional geographical maps.
Figure 3: A consumer trajectory example visualized on 2-D map (points numbered 1 to 4)
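The extraction step described above can be sketched as a filter followed by a per-user sort by time. Everything in this sketch is an assumption made for illustration: the `SearchRecord` type, the integer timestamp, the bounding-box notion of "region", and the sample rows are hypothetical and are not the system presented in the talk.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class SearchRecord:
    """Hypothetical search record; field names are illustrative."""
    user_id: str
    timestamp: int      # assumed: seconds on some common clock
    lon: float
    lat: float
    search_type: str


# Assumed region encoding: (min_lon, min_lat, max_lon, max_lat).
BBox = Tuple[float, float, float, float]


def extract_trajectories(
    records: Iterable[SearchRecord],
    search_type: str,
    region: BBox,
    users: Set[str],
    start: int,
    end: int,
) -> Dict[str, List[Tuple[float, float]]]:
    """Return, per user, the time-ordered (lon, lat) points that pass all filters."""
    min_lon, min_lat, max_lon, max_lat = region
    by_user: Dict[str, List[SearchRecord]] = defaultdict(list)
    for r in records:
        if (
            r.search_type == search_type
            and r.user_id in users
            and start <= r.timestamp < end
            and min_lon <= r.lon <= max_lon
            and min_lat <= r.lat <= max_lat
        ):
            by_user[r.user_id].append(r)
    # A trajectory is the user's matching points ordered by time.
    return {
        user: [(r.lon, r.lat) for r in sorted(rs, key=lambda r: r.timestamp)]
        for user, rs in by_user.items()
    }


# Made-up sample rows, for illustration only.
records = [
    SearchRecord("u1", 30, -113.49, 53.54, "restaurant"),
    SearchRecord("u1", 10, -113.50, 53.55, "restaurant"),
    SearchRecord("u2", 20, -113.48, 53.53, "restaurant"),
    SearchRecord("u1", 20, -113.51, 53.52, "movie"),        # wrong search type
    SearchRecord("u1", 40, -75.70, 45.42, "restaurant"),    # outside the region
]
region = (-114.0, 53.0, -113.0, 54.0)
trajectories = extract_trajectories(records, "restaurant", region, {"u1", "u2"}, 0, 100)
print(trajectories)
```

Each resulting list of points can then be drawn as a polyline on a map, which corresponds to the 2-D visualization the slide describes.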
58. Poynt classified geo-search
www.poynt.com
59. Poynt classified geo-search
60. Poynt classified geo-search
61. Poynt classified geo-search
62. Poynt records
• 531,512,270 records:
• 5 weeks from May 29/2011: 178,354,260
• 5 weeks from Oct 30/2011: 188,986,457
• 5 weeks from Apr 29/2012: 164,171,553
user_id | trans_date | year | week | second | lon | lat | source_device | search type
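The record counts and the field list above lend themselves to a quick consistency check. In this sketch the three batch counts, the total, and the header line are copied from the slide; the assumption that data rows are pipe-delimited like the header, the `parse_row` helper, and the sample row are hypothetical.

```python
BATCHES = {
    "5 weeks from May 29/2011": 178_354_260,
    "5 weeks from Oct 30/2011": 188_986_457,
    "5 weeks from Apr 29/2012": 164_171_553,
}
REPORTED_TOTAL = 531_512_270

# The three five-week batches should add up to the reported total.
total = sum(BATCHES.values())

HEADER = ("user_id | trans_date | year | week | second | lon | lat | "
          "source_device | search type")
# Normalize header names into identifiers ("search type" -> "search_type").
FIELDS = [name.strip().replace(" ", "_") for name in HEADER.split("|")]


def parse_row(line):
    """Split one row into a field dict (assumes the header's delimiter)."""
    values = [v.strip() for v in line.split("|")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


# Made-up row, for illustration only; real value formats are not given in the slide.
sample = parse_row(
    "u42 | 2011-05-29 | 2011 | 22 | 3600 | -113.49 | 53.54 | BlackBerry | restaurant"
)
print(total == REPORTED_TOTAL, FIELDS, sample["search_type"])
```

The batch counts do sum to the reported 531,512,270 records, and the header yields nine fields per record.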
63. Cartesian Geography Map
64. Storyline Visualization
• Storygraph
(Diagram: a point (lon, lat) shown in Cartesian coordinates and in the Storygraph; both panels are labeled with lon and lat axes.)
65. Cartesian and Storyline
66. BlackBerry vs Android
(Figure panels: BlackBerry, Android)