www.bsc.es 
Volume, Variety, Velocity … 
and Sharing 
Anna Queralt 
Storage System Research Group 
anna.queralt@bsc.es
Looking at things from a different perspective 
“Creativity is just 
connecting things” 
Steve Jobs 
“True originality consists 
not in a new manner, but 
in a new vision” 
Edith Wharton 
“Changing the answer 
is evolution. Changing the 
question is revolution” 
Jorge Wagensberg 
“We cannot solve our 
problems with the same 
thinking we used when 
we created them” 
Albert Einstein
Big Data 
sharing
Open Data 
Open data are the building blocks of open knowledge. 
Open knowledge is what open data becomes when it’s 
useful, usable and used. 
Open data is data that can be freely used, reused and 
redistributed by anyone - subject only, at most, to the 
requirement to attribute and sharealike.
Importance of Open Data in Europe 
“Towards a thriving data driven economy” 
European strategy on data, with Open Data as 
a prominent element 
– Infrastructure 
– Analysis 
– Privacy 
– ...
Why? 
Makes public administration more efficient and more effective 
– Thanks to Open Data, the US government has reduced the annual cost of 
serving citizens from $500M to $34M 
Open data portals stimulate innovation and economic growth 
– Applications that can help to improve society, tackle economic problems, 
generate employment and drive economic growth 
– Research suggests that seven sectors alone could generate 
more than $3 trillion a year in additional value as a result of 
open data 
Open Data: Unlocking Innovation And Performance With Liquid Information 
(McKinsey Global Institute) 
– Big Data and open data will contribute more than €200 billion 
to the European economy by 2020 
Big&Open Data in Europe: a growth engine or a missed opportunity? 
(demosEuropa, WISE, Microsoft)
How is data shared today? 
Most open data is available as downloadable files (2509 sources)
How is data shared today? 
Only 27% of sources are provided in a processable format (2132 are PDF)
How is data shared today? 
Downloadable files: owner decides what can be copied 
Unnecessary data movements and copies 
Stale data 
Owner loses control over data 
Flexible 
Data services: owner decides what and how data is shared 
Very restrictive 
Changes imply data provider involvement 
No data movements or copies 
Owner keeps full control
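
The trade-off between the two sharing styles above can be illustrated with a minimal sketch. The `Provider` class, its method names and the sample values are hypothetical, chosen only for illustration: a downloaded copy goes stale and escapes the owner's control, while a service call always sees the current value but exposes only what the owner implemented.

```python
import copy


class Provider:
    """Hypothetical data owner holding a small dataset in memory."""

    def __init__(self):
        self._records = {"station-1": 21.5}

    def download(self):
        # Downloadable file: the consumer receives a full, independent copy.
        return copy.deepcopy(self._records)

    def get_reading(self, station):
        # Data service: the consumer only gets what this method exposes.
        return self._records[station]

    def update(self, station, value):
        # The owner publishes a new value.
        self._records[station] = value


provider = Provider()
snapshot = provider.download()        # the copy leaves the owner's control
provider.update("station-1", 23.0)    # the original changes afterwards

stale = snapshot["station-1"]         # the copy still holds the old value
fresh = provider.get_reading("station-1")  # the service sees the new value
```

The copy is flexible (the consumer can do anything with `snapshot`), whereas any new kind of query on the service would require the provider to add a method.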
A new way of sharing data 
dataClay
The pillars of dataClay 
Data sharing 
Control 
Avoiding data transfers 
A single data model
Why is persistent data different from volatile data? 
Today 
We have a data model for volatile data 
Objects and data structures 
We have a different model for persistent data 
Relational database, NoSQL database, files 
Future 
Store data in the same way as when volatile 
Store objects and relations
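
As a rough illustration of the contrast above, the sketch below keeps the same information once as flattened rows (the "today" model) and once as an object graph handed to a store unchanged (the "future" model). `ObjectStore`, `save` and `load` are hypothetical names for an in-memory stand-in; they are not the API of any real platform.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Sensor:
    name: str
    readings: list = field(default_factory=list)


@dataclass
class Station:
    city: str
    sensors: list = field(default_factory=list)


class ObjectStore:
    """Hypothetical in-memory stand-in for a persistent object store."""

    def __init__(self):
        self._objects = {}

    def save(self, alias, obj):
        # Keep the whole object graph, relations included, under an alias.
        self._objects[alias] = copy.deepcopy(obj)

    def load(self, alias):
        return self._objects[alias]


station = Station("Barcelona", [Sensor("temp", [21.5, 23.0])])

# Today: the object graph is flattened into rows for a relational table.
rows = [(station.city, s.name, r) for s in station.sensors for r in s.readings]

# Future: the same objects and relations are stored as they are.
store = ObjectStore()
store.save("bcn", station)
restored = store.load("bcn")
```

With the object model, the application navigates `restored.sensors[0].readings` exactly as it would with volatile data, with no mapping layer in between.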
Our vision 
Create a platform that 
Enables applications to easily make objects persistent 
Enables users to add more data or “change” the data model 
Enables users to add new computations to be shared 
& 
The data owner does not lose control over the data 
Key idea: self-contained objects and 
data enrichment by 3rd parties
Push the idea of data services to the limit 
Key technology: self-contained objects 
[Figure: a client app using a data service (functions; security, integrity, …) in front of a data store, compared with a client app using a data store of self-contained objects (data; functions; security, ...)]
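
One way to picture a self-contained object is a class that carries its data, the functions that operate on it, and its own access checks. The sketch below is an assumption-laden illustration: `TemperatureSeries`, `AccessDenied` and the user names are hypothetical, and the access rule is deliberately simplistic.

```python
class AccessDenied(Exception):
    """Raised when a user is not allowed to use the object."""


class TemperatureSeries:
    """Hypothetical self-contained object: data, functions and access rules."""

    def __init__(self, owner, readings):
        self._owner = owner
        self._readings = list(readings)
        self._allowed = {owner}

    def grant(self, requester, user):
        # Only the owner decides who may use the object.
        if requester != self._owner:
            raise AccessDenied(f"{requester} is not the owner")
        self._allowed.add(user)

    def mean(self, user):
        # The function travels with the data and enforces the check itself.
        self._check(user)
        return sum(self._readings) / len(self._readings)

    def _check(self, user):
        if user not in self._allowed:
            raise AccessDenied(f"{user} may not access this object")


series = TemperatureSeries("provider", [20.0, 22.0, 24.0])
series.grant("provider", "alice")
alice_mean = series.mean("alice")
```

Because the checks live inside the object, a user who was never granted access (for example a hypothetical `"bob"`) gets `AccessDenied` wherever the object is stored.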
Key technology: 3rd-party enrichments 
Self-contained objects 
seem to be a new technology to offer 
data services in a different way 
Then… 
… we need something else … 
… something to make it really flexible!
3rd-party enrichment 
By enrichment we understand: 
Adding new information to existing datasets 
Adding new code to existing datasets 
This enrichment should 
Be possible during the life of data 
Not be limited to the data owner 
Enable different views of the data to different users/clients 
Be shareable again
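
The requirements above can be sketched as a layered view: a third party attaches new data and new code to an existing dataset without touching the owner's data, and the enriched view can itself be enriched by someone else. `Dataset`, `Enrichment`, `call` and the user names are hypothetical and only illustrate the idea.

```python
class Dataset:
    """Hypothetical owner-controlled dataset."""

    def __init__(self, readings_celsius):
        # A tuple keeps the owner's data immutable for everyone else.
        self._readings = tuple(readings_celsius)

    def readings(self):
        return self._readings


class Enrichment:
    """Hypothetical third-party layer adding data and code on top of a base."""

    def __init__(self, base, data=None, functions=None):
        self.base = base
        self.data = dict(data or {})
        self.functions = dict(functions or {})

    def readings(self):
        # The owner's data is read through the base and never copied.
        return self.base.readings()

    def call(self, name, *args):
        # Run code that this layer added to the dataset.
        return self.functions[name](self, *args)


def to_fahrenheit(view):
    return [c * 9 / 5 + 32 for c in view.readings()]


def max_fahrenheit(view):
    # Builds on the enrichment below it: enrichments are shareable again.
    return max(view.base.call("fahrenheit"))


owner_data = Dataset([0.0, 100.0])
alice_view = Enrichment(
    owner_data,
    data={"note": "converted by alice"},
    functions={"fahrenheit": to_fahrenheit},
)
bob_view = Enrichment(alice_view, functions={"max_fahrenheit": max_fahrenheit})

alice_result = alice_view.call("fahrenheit")
bob_result = bob_view.call("max_fahrenheit")
```

Each user sees a different view (`owner_data`, `alice_view`, `bob_view`) of the same underlying readings, which remain unchanged.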
Data can be enriched with both data and code in the provider infrastructure 
Code can be executed in the provider infrastructure 
Then… 
[Figure: Client App, Enrichment, Data provider infrastructure]
Efficient usage of resources 
Data and code can be offloaded to resources not accessible to the data provider 
Moreover… 
[Figure: self-contained objects (data; functions; security, ...) across Provider Infrastructure, Client Infrastructure and Cloud]
CONCLUSIONS
Sharing (big) data is key to innovation 
Build new knowledge on top, and share it 
See data produced by others from 
a different perspective
Credits 
dataClay team 
– Toni Cortés (Team leader) 
– Anna Queralt (PhD) 
– Jonathan Martí 
– Daniel Gasull 
– Juanjo Costa (PhD) 
– Alex Barceló 
Former team members 
– Ernest Artiaga (PhD)
