ELIS – Multimedia Lab
What if we need Open Data, Linked Data, and Big Data together?
dr. Erik Mannens
@erikmannens
Open Data
Way of … Thinking
Silos of Data
“Stop Hugging your Data”
e.g. … Open Learning
Linked Open Data
Way of … Publishing
Semantic Web
Connect your Silos
5-stars (Technical Perspective)
Open Linked Data (Tim Berners-Lee)
Make your Stuff available on the Web
Make it available as Structured Data
In a non-proprietary Format
Use URLs to identify Things, so one can point at your Stuff
Link your Data to other People’s Data to provide Context
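As a rough illustration of stars 2 to 5, the following Python sketch describes one hypothetical railway station with its own URL, serializes it as N-Triples (a W3C-standardized, non-proprietary format), and links it to another party's data with owl:sameAs. The example.org base URI, the station identifier, and the DBpedia IRI are assumptions chosen for illustration only, and the literal escaping is deliberately minimal. Star 1 (publishing under an open licence) is a legal matter and is not shown.

```python
# Minimal sketch: publish one record as Linked Data in N-Triples.
# BASE and the example IRIs are illustrative assumptions, not real endpoints.
BASE = "http://example.org/id/station/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def iri(value: str) -> str:
    """Wrap an absolute IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Quote a plain literal; escapes only backslashes and double quotes
    (a full serializer would also escape newlines and other control chars)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def station_to_ntriples(station_id: str, name: str, same_as: str) -> list[str]:
    # Star 4: a URL identifies the thing, so others can point at it.
    subject = iri(BASE + station_id)
    return [
        # Stars 2-3: structured data in an open, non-proprietary format.
        f"{subject} {iri(RDFS_LABEL)} {literal(name)} .",
        # Star 5: link to other people's data to provide context.
        f"{subject} {iri(OWL_SAME_AS)} {iri(same_as)} .",
    ]


if __name__ == "__main__":
    for line in station_to_ntriples(
        "gent-sint-pieters",
        "Gent-Sint-Pieters",
        # Assumed DBpedia IRI, used only to show an outbound link.
        "http://dbpedia.org/resource/Gent-Sint-Pieters_railway_station",
    ):
        print(line)
```

Each printed line is one self-contained statement, which is what makes N-Triples easy to stream, diff, and merge with other datasets.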
5-stars (Organisational Perspective)
Open Data Engagement (Tim Davies)
Be Demand-driven
Provide Context
Support Conversation
Build Skills & Capacity
Collaborate with the Community
5-stars (Functional Perspective)
Open Data Portal Functionalities (iMinds)
Dataset Registry
Metadata Provider
Co-creation Platform
Data Publishing Platform
Common Data Hub
Data as Commodity
Sidenote: R&Wbase
15’ Open Data Publishing Framework
e.g. data.gent.be, opendata.antwerpen.be
Publishes 2- to 5-Star Data
[Diagram: tdt/core, tdt/input, triple store]
RESTful API for Developers
[Diagram: core and triple store; a RESTful data adapter over CSV, XLS, JSON, XML, SPARQL endpoint, ...]
e.g. datatank.gent.be/Grondgebied/Straten
or data.irail.be/NMBS/Stations
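The data-adapter idea can be sketched in a few lines of Python: one reader per source format normalizes rows into a common shape, and a single function serves them as JSON. This is a hypothetical sketch, not the tdt/core code; the function names, the flat XML layout, and the sample rows are assumptions, and XLS and SPARQL sources are left out because they need a third-party library or a remote endpoint.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def from_csv(text: str) -> list[dict]:
    """Each CSV row becomes a dict keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def from_json(text: str) -> list[dict]:
    """Assume the JSON source is already an array of objects."""
    return json.loads(text)


def from_xml(text: str) -> list[dict]:
    """Assume a flat layout: <rows><row><field>value</field>...</row></rows>."""
    root = ET.fromstring(text)
    return [{child.tag: (child.text or "") for child in row} for row in root]


# One adapter per source format (hypothetical registry).
ADAPTERS = {"csv": from_csv, "json": from_json, "xml": from_xml}


def serve(source_format: str, raw: str) -> str:
    """Normalize any supported source and return it as a JSON response body."""
    try:
        adapter = ADAPTERS[source_format]
    except KeyError:
        raise ValueError(f"unsupported source format: {source_format}") from None
    return json.dumps(adapter(raw), ensure_ascii=False)


if __name__ == "__main__":
    csv_source = "name,city\nGent-Sint-Pieters,Gent\nAntwerpen-Centraal,Antwerpen\n"
    print(serve("csv", csv_source))
```

The point of the design is that a developer only ever talks to one uniform interface, whatever format the publisher happened to upload.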
R&Wbase
git for triples
Read/Write Linked Data
Triple Stores: are they up for the challenge?
Distributed Triple Version Control
[Diagram: commits, deltas, virtual graphs, and versions, connected by the labels store, describe, identify, resolve]
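One possible reading of the diagram, sketched in Python under stated assumptions: each commit describes a delta (the triples it adds and deletes) and points to its parent, and a version is resolved on demand into a virtual graph by replaying deltas from the root. This is a single-branch, in-memory illustration; it does not model distribution, branching, or merging, and `Commit` and `resolve` are hypothetical names, not R&Wbase's API.

```python
from dataclasses import dataclass
from typing import Optional

# A triple as (subject, predicate, object) strings; this encoding is an assumption.
Triple = tuple[str, str, str]


@dataclass(frozen=True)
class Commit:
    """Hypothetical commit: describes one delta and points to its parent."""
    commit_id: str
    parent: Optional["Commit"]
    added: frozenset[Triple] = frozenset()
    deleted: frozenset[Triple] = frozenset()


def resolve(version: Optional[Commit]) -> set[Triple]:
    """Materialize a version as a (virtual) graph by replaying deltas."""
    chain: list[Commit] = []
    while version is not None:  # walk back from the requested version to the root
        chain.append(version)
        version = version.parent
    graph: set[Triple] = set()
    for commit in reversed(chain):  # replay the oldest delta first
        graph -= commit.deleted
        graph |= commit.added
    return graph


if __name__ == "__main__":
    old = ("ex:s1", "ex:label", '"old name"')
    new = ("ex:s1", "ex:label", '"new name"')
    c1 = Commit("c1", None, added=frozenset({old}))
    c2 = Commit("c2", c1, added=frozenset({new}), deleted=frozenset({old}))
    print(resolve(c1))  # {('ex:s1', 'ex:label', '"old name"')}
    print(resolve(c2))  # {('ex:s1', 'ex:label', '"new name"')}
```

Replaying the whole chain costs time proportional to the history length; caching materialized snapshots is one common way to speed up retrieval, though the slides do not say which approach R&Wbase takes.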
Live triples require fast version retrieval through a lightweight algorithm
Store triples using quads: <subject> <predicate> <object> <context>
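To make the quad layout concrete, this hypothetical Python sketch stores a delta's added and deleted triples as quads whose fourth element (the context) names the delta, serializes them as N-Quads lines, and groups them back into graphs by context. The example.org context IRIs and the split into separate /added and /deleted graphs are assumptions for illustration; the slides do not specify R&Wbase's actual storage layout. Terms are assumed to be in N-Triples syntax already.

```python
from collections import defaultdict

Quad = tuple[str, str, str, str]


def delta_to_quads(commit_id: str, added: set, deleted: set) -> list[Quad]:
    """Hypothetical layout: additions and deletions each get their own context."""
    base = f"http://example.org/delta/{commit_id}"
    quads = [(s, p, o, f"{base}/added") for (s, p, o) in sorted(added)]
    quads += [(s, p, o, f"{base}/deleted") for (s, p, o) in sorted(deleted)]
    return quads


def to_nquads(quads: list[Quad]) -> str:
    """One N-Quads line per quad: <subject> <predicate> <object> <context> ."""
    return "\n".join(f"{s} {p} {o} <{c}> ." for (s, p, o, c) in quads)


def group_by_context(quads: list[Quad]) -> dict[str, set]:
    """Read quads back as named graphs keyed by their context IRI."""
    graphs: dict[str, set] = defaultdict(set)
    for s, p, o, c in quads:
        graphs[c].add((s, p, o))
    return dict(graphs)


if __name__ == "__main__":
    s = "<http://example.org/id/s1>"
    p = "<http://www.w3.org/2000/01/rdf-schema#label>"
    quads = delta_to_quads("c2", added={(s, p, '"new name"')}, deleted={(s, p, '"old name"')})
    print(to_nquads(quads))
```

Because the context travels with every statement, a standard quad store can hold all deltas side by side without any custom storage format.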
R&Wbase provides version control and provenance for triple stores, with direct graph access
Big Data
Way of … Analyzing
How Difficult Can It Be?
Collaborative Effort Found the Higgs Boson
Deep understanding of some key Big Data markets
•  Banking Industry
•  Healthcare Industry
•  Marketing Industry
•  Smart Cities
Big (Data) Bang in Banking
•  The US Securities and Exchange Commission has estimated that it would need to collect 20 terabytes of data per month to monitor all US capital market activity
•  Unstructured data comprises some 80% of the total data held by the average financial institution
•  The total number of non-cash payments in the EU amounted to 90.6 billion in 2011
•  The total number of automatic teller machines (ATMs) in the EU in 2011 was 0.44 million
•  The number of point-of-sale (POS) terminals in the EU was 8.8 million in 2011
What if it were OPEN & LINKED?
e.g. … OpenSpending
e.g. … OpenBank
e.g. … OpenCorporates
e.g. … OpenCorporates - Belgium
Big (Data) Bang in Healthcare
•  Medical images are increasing by 20-40% annually
•  Electronic medical records (EMRs): in 2009, 99% of primary care physicians in the Netherlands used EMRs, compared to 46% in the United States and 36% in Canada
•  Medical research in which 100,000 participants are genotyped (ca. 1.5 GB/person) could result in a staggering 150 terabytes of data
•  As of July 2012, PatientsLikeMe members had shared 4,029,661 symptom reports about 7,338 symptoms and 548,650 treatment histories about 12,838 treatments
What if it were OPEN & LINKED?
e.g. … PatientsLikeMe
e.g. … 23AndMe
e.g. … PlayStation 3
e.g. … OpenPhacts
e.g. … DisQover (iMinds – Ontoforce)
Big (Data) Bang in Marketing
•  Data use is expected to grow by as much as 44 times by 2020, amounting to some 35.2 ZB (zettabytes; 1 ZB is a billion terabytes) globally
•  Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes of data
•  Twitter users post 200 million tweets per day, or approximately 46 MB/sec of data (August 2011)
•  25% of search results for the world’s 20 largest brands are links to user-generated content
•  YouTube has 3 billion views per day, and 48 hours of video are uploaded every minute (May 2011)
•  There are over 200,000,000 blogs: 34% of their posts are opinions about products & brands
What if it were OPEN & LINKED?
e.g. … Consumers in 1990
e.g. … Consumers in 2000
e.g. … Consumers since 2010
The Tyranny of the Empowered ConsYOUmers
e.g. … GoodRelations
e.g. … Nike
Big (Data) Bang in Smart Cities
•  Data use is expected to grow by as much as 44 times by 2020, amounting to some 35.2 ZB (zettabytes; 1 ZB is a billion terabytes) globally
•  Sensors, social media feeds, photos, video, and cellphone GPS signals account for 2.5 quintillion bytes of data per day
•  More than 50% of the global population lives in cities, and this share is forecast to rise to 69% by 2050
•  The number of city residents is expected to grow from 3.5 billion to 5 billion in the next 20 years
•  The ‘Internet of Things’ age is approaching: 25 billion devices connected to the Internet by 2015 and 50 billion by 2020
•  Access to public data is estimated to be worth €27 billion in the EU
•  ICT-enabled energy efficiency could translate into over €600 billion worth of cost savings for the public and private sectors
What if it were OPEN & LINKED?
e.g. … OpenTransport
e.g. … OpenEnergyMonitor
e.g. … Big Data … in Iceland?
e.g. … a Trillion Sensors … in Iceland!
QUESTIONS?
dr. Erik Mannens
erik.mannens@ugent.be
@erikmannens
Thoughts?
Credits
•  EMC - Greenplum
•  Peter Hinssen
•  Scott Brinker
•  Jim Lecinski
•  David Armano
•  Did not have time to check all licenses of the Flickr
photos – in my defense, I did not kill anyone nor did I in
any way insult and/or infringe the CIA, NSA, NDA, or
any other JAA (Just Another Acronym)

Flanders Open Data Day II - Keynote - Erik Mannens