Empowering Communities with
Data Technologies
Year #1 Series of Workshops
General
Overview
BIG DATA EUROPE
The Motivation – Big Data
Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the
world today has been created in the last two years alone.
This data comes from everywhere: sensors used to gather climate information, posts to
social media sites, digital pictures and videos, purchase transaction records, and cell phone
GPS signals to name a few.
This data is big data. Source:
Big Data
21-mai-15www.big-data-europe.eu
BIG
DATA
Big Data: Dimensions
www.big-data-europe.eu
Volume
Velocity
Variety
100010101010101010101010
010100101010101010010100
101010010100101010010100
101001010100101010100101
010001010101010010101010
101010101001010101010101
001010101110001010101010
101010101001010010101010
101001010010101001010010
101001010010100101010010
101010010101000101010101
00101010101
1000101010
1010101010
1010010100
1010101010
1001010010
…….
………….
……
………..…….
.
……………
1 0
1000101010
1010101010
1010010100
1010101010
100101oo11
Veracity!
Big Data Dimensions
Health
Climate
Energy
Transport
Food
Societies
Security
Big Data in Europe: Challenges,
Opportunities
www.big-data-europe.eu
Loremi
psumd
olors
KSDJ
OPSC
KKSD
KAB
LKASJL
LAWWD
S
wpweppe
pwpisio
we
101010
011010
100101
010
Regional Data Repositories
#1: Compile, Harmonise, Publish
101010
011010
100101
010
101010
011010
100101
010
#2: Interlink, Centralise Access, Explore
101010100101010101001
011010001010101010010
101010100001011010001
010101010010101010100
100101010100101010101
001011010001010
Data
Eleme
nt
#3: Analyse, Discover, Visualise
#4: Mashup, Cross-domain Exploitation
Journalists Authorities
Big Data in Europe:
Obstacles
21-mai-15www.big-data-europe.eu
#1 Big Data “Variety“ problem
 Multiple Data Sources
 Required: Integration, Harmonisation
#2 Opening-up Data concerns
 Loss of control, lack of tracking
 Reservations about large corporations
#3 Limited Skills, Training,
Technology
 Lack of Data Scientists
 Lack of Generic Architectures, components
Data Value Chain Evolution
21-mai-15www.big-data-europe.eu
Extraction, Curation Quality, Linking,
Integration
Publication,
Visualization, Analysis
Extraction, Curation, Quality,
Linking, Integration, Publication,
Visualization, Analysis
Health
Transport
Security
Extraction Curation Quality Linking Integration Publication Visualization Analysis
Data Repositories Linked Open Data
Cloud
Stage 1
Stage 2
Stage 3
Food SocietiesClimate Energy
BigDataEurop
e – The
Project
21-mai-15www.big-data-europe.eu
Rationale
 Show societal value of Big Data
 Lower barrrier for using big data technologies
o Required effort and resources
o Limited data science skills
 Help establishing cross-
lingual/organizational/domain Data Value
Chains
21-mai-15
BigDataEurope: Objectives
21-mai-15www.big-data-europe.eu
COORDINATION
Stakeholder Engagement
(Requirements Elicitation)
SUPPORT
Design, Realise, Evaluate
Big Data Aggregator
Platform
Create and Manage Societal
Big Data Interest Groups
Cloud-deployment ready
Big Data Aggregator
Platform
CSA
Measures
Results
BigDataEurope: Consortium
Big Data Ecosystems: Orthogonal
Dimensions
21-mai-15www.big-data-europe.eu
Generic Big Data Enabling Technologies
Data Value Chain
Data Generation
& Acquisition
Data Analysis &
Processing
Data Storage &
Curation
Data
Visualization &
Usage
Data-driven
Services
SocietalChallenges
DomainSpecificDataAssets&Technology
Healthcare
Food Security
Energy
Intelligent Transport
Climate & Environment
Inclusive & Reflective Societies
Secure Societies
Work Packages & Phases
Community
Building
M1-M12 M13-M24 M25-M36
Enabling
Technologies
Component
Integration
Uptake
Integrator
Deployment
Community
Assessment
WP3 – Big Data Generic Enabling
Technologies & Architecture
WP5 – Big Data Integrator Instances
WP7 – Dissemination & Communication
WP2 – Community Building & Requirements
WP4 – Big Data Integrator Platform
WP6 – Real-life Deployment & User Evaluation
Societal Domains, Focus Areas, Data
assetts
Societal Domain Preliminary Big Data Focus area Selected Key Data assets
Life Sciences &
Health
Heterogeneous data Linking & integration
Biomedical Semantic Indexing & QA
ACD Labs / ChemSpider, ChEBI, ChEMBL, Con-ceptWiki, DrugBank, EN-
ZYME, Gene Ontology, GO Annotation, Swis-sProt, UniProt, Wik-
iPathways, PubMed, MeSH, Disease Ontology (DO), Joint Chemical Dic-
tionary (Jochem), Bio-ASQ datasets
Food & Agriculture Large-scale distributed data integration
INFOODS, AQUASTAT Green Learning Network (GLN), Agricultural
Bibliography Network (ABN), AGRIS, AquaMaps, Fishbase
Energy
Real-time monitoring, stream processing, data
analytics, and decision support
European Energy Exchange Data, smart meter measurement data,
gas/fuels/energy market/price data, consumption statistics, equipment
condition monitoring data)
Transport
Streaming sensor network & geo-spatial data
integration
GTFS data, OSM/ LinkedGeoData, MobilityMaps, Transport sensor data,
ROSATTE Road safety attributes, European Road Data Infrastructure -
EuroRoadS
Climate
Real-time monitoring, stream processing, and
data analytics.
European Grid Infrastructure (EGI), Databases hosting atmospheric data.
Several software frameworks for simulation, calibration and
reconstruction.
Social Sciences
Statistical and research data linking &
integration
Federated social sciences data catalogs, statistical data from public data
portals and statistical offices (e.g. EuroStats, UNESCO, WorldBank)
Security
Real-time monitoring, stream processing, and
data analytics.
Earth Observation data (e.g. Very High Resolution Satellite Imagery
acquired from commercial providers and governmental systems) and
collateral data for supporting CFSP/CSDP missions and operations,
Stakeholder Engagement Cycle
Data Aggregator Platform:
Blueprint
Batch Layer
Speed Layer
Data Storage
Real-time data &
Transactions …
Batch View
Real-time
View
messagepassing
message passing
Applications & Showcases
Real-time dashboards
Domain-specific BDE apps
Big Data Analytics
In-stream Mining
BDEPlatform&
Intelligence
Input data
Stream
Spatial
Social
Statistical
Temporal
Transaction
al
Imagery
+ Semantic Layer (Retaining Semantics using LD
Lambda Architecture
Current Activities – Year#1
 2015 BDE Societal Workshops (7) Planned
o Schedule on Website
 7 W3C Interest Groups set up: Please Join!
o SC1: HEALTH https://www.w3.org/community/bde-health/join
o SC2: FOOD & AGRICULTURE https://www.w3.org/community/bde-food/
o SC3: ENERGY https://www.w3.org/community/bde-energy/
o SC4: TRANSPORT https://www.w3.org/community/bde-transport/
o SC5: CLIMATE & ENVIRONMENT https://www.w3.org/community/bde-climate/
o SC6: SOCIETIES https://www.w3.org/community/bde-societies/
o SC7: SECURITY https://www.w3.org/community/bde-secure-societies/
www.big-data-europe.eu

BigDataEurope: Project Introduction @ Year #1 Workshops

  • 1.
    Empowering Communities with DataTechnologies Year #1 Series of Workshops General Overview BIG DATA EUROPE
  • 2.
    The Motivation –Big Data Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals to name a few. This data is big data. Source:
  • 3.
  • 4.
  • 5.
  • 6.
    Health Climate Energy Transport Food Societies Security Big Data inEurope: Challenges, Opportunities www.big-data-europe.eu Loremi psumd olors KSDJ OPSC KKSD KAB LKASJL LAWWD S wpweppe pwpisio we 101010 011010 100101 010 Regional Data Repositories #1: Compile, Harmonise, Publish 101010 011010 100101 010 101010 011010 100101 010 #2: Interlink, Centralise Access, Explore 101010100101010101001 011010001010101010010 101010100001011010001 010101010010101010100 100101010100101010101 001011010001010 Data Eleme nt #3: Analyse, Discover, Visualise #4: Mashup, Cross-domain Exploitation Journalists Authorities
  • 7.
    Big Data inEurope: Obstacles 21-mai-15www.big-data-europe.eu #1 Big Data “Variety“ problem  Multiple Data Sources  Required: Integration, Harmonisation #2 Opening-up Data concerns  Loss of control, lack of tracking  Reservations about large corporations #3 Limited Skills, Training, Technology  Lack of Data Scientists  Lack of Generic Architectures, components
  • 8.
    Data Value ChainEvolution 21-mai-15www.big-data-europe.eu Extraction, Curation Quality, Linking, Integration Publication, Visualization, Analysis Extraction, Curation, Quality, Linking, Integration, Publication, Visualization, Analysis Health Transport Security Extraction Curation Quality Linking Integration Publication Visualization Analysis Data Repositories Linked Open Data Cloud Stage 1 Stage 2 Stage 3 Food SocietiesClimate Energy
  • 9.
  • 10.
    Rationale  Show societalvalue of Big Data  Lower barrrier for using big data technologies o Required effort and resources o Limited data science skills  Help establishing cross- lingual/organizational/domain Data Value Chains 21-mai-15
  • 11.
    BigDataEurope: Objectives 21-mai-15www.big-data-europe.eu COORDINATION Stakeholder Engagement (RequirementsElicitation) SUPPORT Design, Realise, Evaluate Big Data Aggregator Platform Create and Manage Societal Big Data Interest Groups Cloud-deployment ready Big Data Aggregator Platform CSA Measures Results
  • 12.
  • 13.
    Big Data Ecosystems:Orthogonal Dimensions 21-mai-15www.big-data-europe.eu Generic Big Data Enabling Technologies Data Value Chain Data Generation & Acquisition Data Analysis & Processing Data Storage & Curation Data Visualization & Usage Data-driven Services SocietalChallenges DomainSpecificDataAssets&Technology Healthcare Food Security Energy Intelligent Transport Climate & Environment Inclusive & Reflective Societies Secure Societies
  • 14.
    Work Packages &Phases Community Building M1-M12 M13-M24 M25-M36 Enabling Technologies Component Integration Uptake Integrator Deployment Community Assessment WP3 – Big Data Generic Enabling Technologies & Architecture WP5 – Big Data Integrator Instances WP7 – Dissemination & Communication WP2 – Community Building & Requirements WP4 – Big Data Integrator Platform WP6 – Real-life Deployment & User Evaluation
  • 15.
    Societal Domains, FocusAreas, Data assetts Societal Domain Preliminary Big Data Focus area Selected Key Data assets Life Sciences & Health Heterogeneous data Linking & integration Biomedical Semantic Indexing & QA ACD Labs / ChemSpider, ChEBI, ChEMBL, Con-ceptWiki, DrugBank, EN- ZYME, Gene Ontology, GO Annotation, Swis-sProt, UniProt, Wik- iPathways, PubMed, MeSH, Disease Ontology (DO), Joint Chemical Dic- tionary (Jochem), Bio-ASQ datasets Food & Agriculture Large-scale distributed data integration INFOODS, AQUASTAT Green Learning Network (GLN), Agricultural Bibliography Network (ABN), AGRIS, AquaMaps, Fishbase Energy Real-time monitoring, stream processing, data analytics, and decision support European Energy Exchange Data, smart meter measurement data, gas/fuels/energy market/price data, consumption statistics, equipment condition monitoring data) Transport Streaming sensor network & geo-spatial data integration GTFS data, OSM/ LinkedGeoData, MobilityMaps, Transport sensor data, ROSATTE Road safety attributes, European Road Data Infrastructure - EuroRoadS Climate Real-time monitoring, stream processing, and data analytics. European Grid Infrastructure (EGI), Databases hosting atmospheric data. Several software frameworks for simulation, calibration and reconstruction. Social Sciences Statistical and research data linking & integration Federated social sciences data catalogs, statistical data from public data portals and statistical offices (e.g. EuroStats, UNESCO, WorldBank) Security Real-time monitoring, stream processing, and data analytics. Earth Observation data (e.g. Very High Resolution Satellite Imagery acquired from commercial providers and governmental systems) and collateral data for supporting CFSP/CSDP missions and operations,
  • 16.
  • 17.
    Data Aggregator Platform: Blueprint BatchLayer Speed Layer Data Storage Real-time data & Transactions … Batch View Real-time View messagepassing message passing Applications & Showcases Real-time dashboards Domain-specific BDE apps Big Data Analytics In-stream Mining BDEPlatform& Intelligence Input data Stream Spatial Social Statistical Temporal Transaction al Imagery + Semantic Layer (Retaining Semantics using LD Lambda Architecture
  • 18.
    Current Activities –Year#1  2015 BDE Societal Workshops (7) Planned o Schedule on Website  7 W3C Interest Groups set up: Please Join! o SC1: HEALTH https://www.w3.org/community/bde-health/join o SC2: FOOD & AGRICULTURE https://www.w3.org/community/bde-food/ o SC3: ENERGY https://www.w3.org/community/bde-energy/ o SC4: TRANSPORT https://www.w3.org/community/bde-transport/ o SC5: CLIMATE & ENVIRONMENT https://www.w3.org/community/bde-climate/ o SC6: SOCIETIES https://www.w3.org/community/bde-societies/ o SC7: SECURITY https://www.w3.org/community/bde-secure-societies/ www.big-data-europe.eu

Editor's Notes

  • #12 Two clearly defined coordination and support measures: Coordination: Engaging with a diverse range of stakeholder groups from SCs Collecting requirements for the ICT infrastructure needed by data-intensive science practitioners covering all aspects of publishing and consuming semantically interoperable, large-scale data and knowledge assets Support: meets requirements minimises disruption to current workflows, maximises the opportunities to take advantage of the latest European RTD developments BigDataEurope will implement and apply two main instruments to successfully realize these measures: Build Societal Big Data Interest/Community Groups in the W3C interest group scheme & involving a large number of stakeholders from the Horizon 2020 societal challenges as well as technical Big Data experts; Design, integrate and deploy a cloud-deployment-ready Big Data aggregator platform comprising key open-source Big Data technologies for real-time and batch processing, such as Hadoop, Cassandra and Storm.