BYTE: Big data roadmap and cross-disciplinary community for addressing societal Externalities
Setting the scene for Big Data in Europe, looking ahead to the case studies
Guillermo Vega-Gorgojo – Universitetet i Oslo
So far, what have we learned in BYTE?
◦ Big data, more than “the 3Vs”
◦ Definition, dimensions, activities, applications, data flows, policies
◦ Big data initiatives
◦ Technologies and infrastructures for big data
◦ Positive and negative societal externalities
◦ Economic, legal, social, ethical, political…
@BYTE_EU www.byte-project.eu
What do we expect to learn through the case studies?
1. Which positive and negative societal externalities do organizations create through the use of big data?
2. How have they worked to amplify positive externalities?
3. How have they addressed the negative externalities they have encountered?
A template for the case studies in BYTE
CASE STUDY OVERVIEW
1. Organization
2. Sector
3. Case study motto
4. Executive summary
5. Business processes
6. Relation to big data initiatives
7. Illustrative user stories
SOURCES OF INFORMATION
◦ Semi-structured interviews
◦ Organization documents
TECHNICAL PERSPECTIVE
8. Data sources
9. Data flows
10. Relevant big data policies
11. Main technical challenges
12. Big data dimensions
SOCIETAL EXTERNALITIES
13. Positive societal externalities
14. Negative societal externalities
15. Amplifying positive externalities
16. Addressing negative externalities
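The 16-item template above could be captured as a simple record, which makes it easy to see which sections of a case study are still unfilled. The following is only a minimal sketch: the class name `CaseStudy`, the field names and the helper `missing_sections` are hypothetical and not part of the BYTE project; only the section titles come from the template.

```python
from dataclasses import dataclass, field, fields


@dataclass
class CaseStudy:
    """Hypothetical record mirroring the 16 template items (names are assumptions)."""
    # Case study overview (items 1-7)
    organization: str = ""
    sector: str = ""
    motto: str = ""
    executive_summary: str = ""
    business_processes: list = field(default_factory=list)
    big_data_initiatives: list = field(default_factory=list)
    user_stories: list = field(default_factory=list)
    # Technical perspective (items 8-12)
    data_sources: list = field(default_factory=list)
    data_flows: list = field(default_factory=list)
    policies: list = field(default_factory=list)
    technical_challenges: list = field(default_factory=list)
    big_data_dimensions: dict = field(default_factory=dict)
    # Societal externalities (items 13-16)
    positive_externalities: list = field(default_factory=list)
    negative_externalities: list = field(default_factory=list)
    amplifying_positive: list = field(default_factory=list)
    addressing_negative: list = field(default_factory=list)


def missing_sections(case_study: CaseStudy) -> list:
    """Return the names of template sections that are still empty."""
    return [f.name for f in fields(case_study) if not getattr(case_study, f.name)]


# Partial example; the values are taken from the Statoil slides of this deck.
statoil = CaseStudy(
    organization="Statoil",
    sector="Energy",
    business_processes=["Oil & gas exploration decision-making"],
    big_data_initiatives=["OPTIQUE"],
    big_data_dimensions={"volume": True, "velocity": False,
                         "variety": True, "veracity": True},
)
print(missing_sections(statoil))
```

A flat record keeps the one-to-one correspondence with the numbered template items; a real implementation would likely nest the technical and societal parts.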
A model for the societal externalities
[Diagram relating three groups of actors: Citizens, Public Sector, Private Sector]
Examples of positive and negative societal externalities
[Diagram over the actors Citizens, Public Sector and Private Sector, annotated with the following externalities]
+ support communities
- continuous and invisible surveillance
+ commercialization of new goods and services
+ data-driven employment offerings
+ innovative business models
- inequalities to data access
- need to reconcile different laws and agreements
+ economic growth through community building
- competitive disadvantage of newer businesses and SMEs
- private data misuse
- invasive use of information
+ accelerate scientific progress
+ transparency and accountability
- distrust of government data-based activities
The case studies
Case study     Organization                    Contact partner
Environment    ESA and others                  CNR
Crime          XXX                             TRI
Smart cities   Siemens                         Siemens
Culture        Europeana                       TRI
Energy         Statoil                         UiO
Health         Institute of Child Health       TRI
Transport      Rolls Royce/Farstad shipping    DNV
Preliminary case study analysis for Statoil
Case study overview
1. Organization: Statoil
2. Sector: ENERGY
3. Case study motto: Improve decision making in oil & gas exploration in the presence of partial information and limited time.
4. Executive summary
In the early phases of the oil and gas exploration process, many prospects, i.e. potential projects, are under evaluation at any time in order to select just a few of them for further investigation. These decisions are often of critical importance for Statoil. However, in most cases prospects have to be selected at short notice and on the basis of only partial information. Typically, exploration experts in these very early phases of an exploration project spend just a few days collecting relevant information before they embark on further analyses; data that is not found within this time frame is simply ignored, and hence will not influence the important selection of prospects. If the geophysics and geology (G&G) experts utilize all the data available, this will reduce the risk factor in the selection process, and hence also increase the chances that the ‘right’ prospects are selected. In the end this will in all likelihood increase the number of successful exploration projects for Statoil.
5. Business processes: Oil & gas exploration decision-making
6. Relation to big data initiatives: Research projects: OPTIQUE
Preliminary case study analysis for Statoil
Technical description
8. Data sources
Name: Subsurface
Short description:
◦ Seismic survey
◦ Seismic & geophysical data
◦ Well and wellbore data
◦ Acquisition reports
Domain: geophysics and geology
How it is collected:
◦ Seismic shots
◦ Well data from drilling operations
◦ Reports from value-adding analysis
Size: ~8 PB
…
11. Main technical challenges
Data storage and access: VERY CHALLENGING
◦ G&G experts in exploration spend 16% of their time on finding the relevant data sets and documents (internal Statoil survey, 2005)
◦ There is a plethora of tools to access and process the different kinds of data, amplified by the segregation into silos
Data integration: CHALLENGING
◦ There is a clear need to integrate the data scattered across different repositories and databases from multiple vendors. For instance, the provided user story reflects that the Subsurface database was not up to date due to limited integration with the OpenWorks project databases
…
12. Big data dimensions
Volume: YES
◦ Some datasets are at a scale of PBs
◦ Extremely complex queries that can involve more than 30 joins
Velocity: NO
◦ No streaming data processing
Variety: YES
◦ Need for different data models to reflect the views of Drilling Engineers, Petrophysicists, Geophysicists, Geologists and Reservoir Engineers
◦ Very complex data models: ~K of tables and ~10K columns
Veracity: YES
◦ Some of the employed data sources are more trustworthy than others
Preliminary case study analysis for Statoil
Societal externalities
Statoil – Citizens
+ Reduced risk for environment
+ Demand for hiring big data analysts
Statoil – Other corporations
+ New work processes and vendor ecosystems
- Data lock-in, contracts prohibit access to data for third parties
- Increased risk of exposing confidential data
Statoil – Public sector
+ Better informed decisions for drilling operations based on open government data (FactPages)
- Competitive advantage of the private sector w.r.t. open data (Statoil doesn’t have to open their data, while it has access to public data)
Societal externalities (1-3)
Public sector – Citizens
+ Gather public insight by identifying social trends and statistics
+ Accelerate scientific progress
+ Tracking environmental challenges
+ Transparency and accountability of the public sector
+ Increased citizen participation
+ Foster innovation, e.g. new applications, from government data
+ Better services, e.g. health care and education, through data sharing and analysis
+ More targeted services for citizens, through profiling populations
+ Cost-effectiveness of services
+ Crime prevention and detection, including fraud
- Distrust of government data-based activities
- Unnecessary surveillance
- Compromise to government security and privacy
- Private data misuse, especially sharing with third parties without consent
- Threats to data protection and personal privacy
- Threats to intellectual property rights (including scholars' rights and contributions)
- Public reluctance to provide information (especially personal data)
Societal externalities (2-3)
Private sector – Citizens
+ Rapid commercialization of new goods and services
+ Free use of services, e.g. email, search engines
+ Enhancements in data-driven R&D
+ Making society energy efficient
+ Optimization of utilities through data analytics
+ Data-driven employment offerings
+ Marketing improvement
+ Increased insight into goods (more transparency)
+ Increased transparency in commercial decision making
+ Fostering innovation from opening data
+ Increased awareness about privacy violations and ethical issues of big data
+ Time-saving in transactions if personal data were already held
- Employment losses for certain job categories
- Invasive use of information
- Risk of informational rent-seeking
- Discriminatory practices and targeted advertising
- Distrust of commercial data-based activities
- Unethical exploitation of data
- Reduced market competition
- Consumer manipulation
- Creation of data-based monopolies (platforms and services)
- Private data accumulation and ownership
- Private data leakage
- Private data misuse, especially sharing with third parties without consent
- Privacy threats even with anonymized data and with data mining
- Threats to intellectual property rights
- Public reluctance to provide information (especially personal data)
- "Sabotaged" data practices
Societal externalities (3-3)
Citizens – Citizens
+ Support communities
- Continuous and invisible surveillance
Private sector – Private sector
+ Opportunities for economic growth
+ Innovative business models
- Barriers to market entry
- Inequalities in data access
- Market manipulation
- Challenge of traditional non-digital services
- Dependency on external data sources, platforms and services
- Competitive disadvantage of newer businesses and SMEs
- Reduced growth and profit among all businesses
- Threats to commercially valuable information
Public sector – Private sector
+ Opportunities for economic growth
+ Innovative business models
+ Support communities
- Open data puts the private sector at a competitive advantage
- Inequalities in data access, especially in research
- Taxation leakages
- Lack of norms for data storage and processing
Public sector – Public sector
- Geopolitical tensions due to surveillance beyond the boundaries of states
- Need to reconcile different laws and agreements, e.g. "right to be forgotten"
Barriers to market entry
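The externality lists in this deck all follow the same pattern: a pair of actors, a sign (+ or -) and a short description. A minimal sketch of that model is shown below. The names `Actor`, `Externality` and `tally` are hypothetical, and treating each actor pair as unordered is an assumption; the sample entries are copied from the slides.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Actor(Enum):
    CITIZENS = "Citizens"
    PUBLIC = "Public sector"
    PRIVATE = "Private sector"


@dataclass(frozen=True)
class Externality:
    a: Actor           # first actor of the pair
    b: Actor           # second actor (may equal the first)
    positive: bool     # True for "+", False for "-"
    description: str

    @property
    def pair(self) -> frozenset:
        # Assumption: "Public sector – Private sector" is the same pair
        # regardless of the order in which the actors are written.
        return frozenset({self.a, self.b})


def tally(items) -> Counter:
    """Count externalities per (actor pair, sign)."""
    counts = Counter()
    for item in items:
        counts[(item.pair, item.positive)] += 1
    return counts


examples = [
    Externality(Actor.CITIZENS, Actor.CITIZENS, True, "Support communities"),
    Externality(Actor.CITIZENS, Actor.CITIZENS, False,
                "Continuous and invisible surveillance"),
    Externality(Actor.PUBLIC, Actor.PRIVATE, True, "Innovative business models"),
    Externality(Actor.PRIVATE, Actor.PUBLIC, False, "Taxation leakages"),
]

for (pair, positive), n in tally(examples).items():
    names = " – ".join(sorted(actor.value for actor in pair))
    print(names, "+" if positive else "-", n)
```

Keeping the sign as a separate field makes it straightforward to compare positive and negative externalities per actor pair across case studies.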
