
Big data Europe: concept, platform and pilots


Presented by Simon Scerri (University of Bonn, Fraunhofer IAIS) during the 2nd BDE SC5 workshop, 11 October 2016, in Brussels, Belgium

  1. BIG DATA EUROPE: CONCEPT, PLATFORM, AND PILOTS
     BDE SC5 Workshop, Brussels, 11 October 2016
  2. Talk outline
     • The BigDataEurope action
     • The Big Data Integrator platform
     • Pilots across all seven H2020 societal challenges
     • Upcoming BDE activities
  3. BigDataEurope Action
  4. Big Data Europe (Coordination and Support Action, 2015-2017)
     • Show the societal value of Big Data
       - Across all societal challenges addressed by Horizon 2020
     • Lower the barrier to using Big Data technologies
       - Effort and resources needed to convert tools and workflows
       - Skills and expertise
     • Help establish data value chains
       - Across languages, organizations, and domains
  5. Consortium: NCSR DEMOKRITOS
  6. Stakeholder Engagement
     • Present the action and showcase deployments
     • Raise awareness of BDE results and what they mean for stakeholders
     • Collect requirements to drive further development
     Timeline markers: M6, M12, M18, M24, M30
     18-oct.-16
  7. Data Value Chain Evolution
     [Figure: Stages 1 to 3 of the data value chain (Extraction, Curation, Quality, Linking, Integration, Publication, Visualization, Analysis) across the Health, Transport, Security, Food, Societies, Climate, and Energy domains, drawing on Data Repositories and the Linked Open Data Cloud]
  8. Big Data Integrator
  9. Architecture
     • Big Data Integrator (BDI):
       - The prototype developed by BDE
     • Main points of the architecture:
       - Dockerization
       - Support layer, including an integrated UI
       - Semantification layer
  10. Big Data Integrator
      • Plug-and-play Big Data platform
      • Cloud-deployment ready
      • Domain-independent and customisable
      • Bundles open-source solutions
      • First version released!
  11. Docker containers
      • Docker offers lightweight virtualization
        - Docker containers can be shared and provisioned on different Linux distributions and versions
      • An identical base system is not required
      • All BDI components are Docker containers
  12. BDI components
      • Processing and storage components
        - Existing Docker containers re-used where available
        - Dockerized by BDE otherwise
        - All ensured to be provisionable through Docker Swarm
      • Components developed by BDE:
        - Support Layer
        - Semantic Layer
  13. Support Layer
      • BDE defines uniform UI stylesheets
        - Web UIs of BDE Docker images (including those for third-party components) follow these stylesheets
      • BDE-developed tools for:
        - Starting containers and their dependencies
        - Monitoring execution
  14. Semantic data lake
      • Minimal pre-processing at ingestion
      • The semantic layer maintains metadata
      • Meaning is added when data is retrieved or processed
      [Figure: a Data Lake (scalable unstructured data store) linked to relationship definitions and metadata, expressed via JSON-LD, CSVW, R2RML, and XML2RDF]
  15. BDE Docker Containers
      • Data serving: HDFS, Cassandra, 4store, PostGIS, Strabon, Elasticsearch, Hive, Semagrow
      • Processing: Spark, Flink, SANSA
      • Stream ingestion middleware: Flume, Kafka
  16. Semantic layer tools
      • BDE tooling for the Semantic Data Lake:
        - Swagger: semantics of RESTful APIs
        - Semantic Analytics Stack (SANSA): distributed processing of large-scale RDF data
        - Semagrow: a unified SPARQL view over Big Data stores
  17. BigDataEurope Pilots
  18. SC1: Pharmacology research (Life Sciences & Health)
      • Extensive toolset developed by the OpenPHACTS Foundation (OPF) and others
      • Queries span a large number of datasets, some of them large
      • Builds on the existing, elaborate ingestion and homogenization performed by the OpenPHACTS Foundation
  19. SC2: Viticulture resources (Food and Agriculture)
      • AgInfra is a major infrastructure for agricultural researchers, serving cross-linked bibliography, data, and processing services
      • The pilot automates the ingestion and thematic classification of publications
  20. SC3: Predictive maintenance (Energy)
      • Wind turbine monitoring applies computational models to sensor data streams
      • The models are re-parameterized weekly, using that week's data from multiple turbines
  21. SC4: Traffic conditions estimation (Transport)
      • Real-time estimation of traffic conditions in Thessaloniki
      • Combines:
        - Traffic modelling from historical data
        - Current measurements from a fleet of 1,200 taxis
  22. SC5: Climate modelling (Climate)
      • Discovering and re-using previously computed derivatives
        - Lineage annotation: the datasets and model parameters used to compute derivative datasets
        - Finding suitable past runs avoids repeating weeks-long modelling runs
      • Preparing modelling experiments
        - Slicing, transforming, and combining datasets into new datasets
        - Submission to, and retrieval from, the modelling infrastructure
  23. SC5 Pilot: Points Demonstrated (Climate)
      • Existing infrastructure and stable, reliable software for the parallel computation of models
      • BDI is deployed as an external infrastructure for preparing and managing datasets
      • BDI offers:
        - Hive, for managing data as records that can be retrieved and manipulated rather than as file blocks
        - Cassandra, which stores structured and textual metadata for searching headers and lineage
  24. SC6: Municipality budgets (Social Sciences)
      • Ingestion of budget and budget execution data
      • From multiple municipalities, in varied formats and data models
      • Homogenized data is made available for analysis and comparison
  25. SC7: Change detection & verification (Secure Societies)
      • Events are extracted from text published by news agencies and on social networking sites
      • Events are geo-located, and relevant changes are detected by comparing current and previous satellite images
  26. UPCOMING BDE ACTIVITIES
      Integrating Big Data, Software & Communities for Addressing Europe’s Societal Challenges
  27. 2nd round of Societal Workshops
      • Transport: 22 September 2016, Brussels; collocated with Big Data for Transport, TISA workshop
      • Food & Agriculture: 30 September 2016, Brussels; collocated with the DG AGRI WP2018-20 stakeholder consultation
      • Energy: 4 October 2016, Brussels; collocated with the EC H2020 Info Day on "Smart Grids and Storage"
      • Climate: 11 October 2016, Brussels; collocated with the Melodies project event "Exploiting Open Data"
      • Health: 19 October 2016, Brussels; standalone workshop
      • Security: 18 October 2016, Brussels; standalone workshop
      • Societies: 5 December 2016, Cologne; collocated with EDDI16, the 8th Annual European DDI User Conference
  28. Other Activities
      • Hands-on BDE pilots workshop
        - Apache Big Data Europe, Seville, 14-16 November 2016
        - Enables Big Data technology practitioners to try out BDI and its components
        - Helps fine-tune the technical BDI requirements
      • Various SC-focused and general hangouts; follow us!
        - Apache Flink & BDE (20 October): free webinar
  29. Questions & Contacts
      • Project coordination: Prof. Sören Auer, auer © cs.uni-bonn · de (Fraunhofer IAIS)
      • Dr. Simon Scerri, scerri © cs.uni-bonn · de (Fraunhofer IAIS)
      • EIS Department/Group, Fraunhofer IAIS & CS Department, University of Bonn, Bonn, Germany
      • Fraunhofer IAIS leads the Fraunhofer Big Data Alliance
      #BigDataEurope
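Slide 13 (Support Layer) lists starting containers and their dependencies as one job of the BDE-developed tools. The slide does not say how that tool orders start-up. A common approach is a topological sort of the dependency graph, sketched below with Python's standard `graphlib` module (Python 3.9+). The service names and dependency edges are hypothetical, and this is not the BDE tool's actual implementation.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical stack: each service maps to the set of services it needs.
# Names and edges are illustrative only, not the real BDI component list.
DEPENDENCIES: dict[str, set[str]] = {
    "namenode": set(),
    "datanode": {"namenode"},
    "spark-master": {"namenode"},
    "spark-worker": {"spark-master", "datanode"},
    "hive-server": {"namenode", "datanode"},
}


def start_order(deps: dict[str, set[str]]) -> list[str]:
    """Return a start-up order in which each service follows its dependencies.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(deps).static_order())


def respects_dependencies(order: list[str], deps: dict[str, set[str]]) -> bool:
    """Check that every dependency appears before the service that needs it."""
    position = {name: i for i, name in enumerate(order)}
    return all(
        position[dep] < position[svc] for svc, needed in deps.items() for dep in needed
    )


order = start_order(DEPENDENCIES)
print(order)
```

Services that do not depend on each other (here `datanode` and `spark-master`) may legitimately start in either order. For that reason, only the dependency constraint is worth checking, not one fixed sequence.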
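Slide 14 describes the semantic data lake: data is stored with minimal pre-processing at ingestion, and meaning is added only when it is retrieved. The sketch below shows a minimal version of that schema-on-read idea. It assumes a metadata record, loosely modelled on CSVW column descriptions, that maps each raw CSV column to a property IRI and a type. The `example.org` vocabulary, the column names, and the metadata layout are all hypothetical. BDE's semantic layer relies on its own tooling (JSON-LD, CSVW, R2RML, XML2RDF), not on this code.

```python
import csv
import io

# Raw data as ingested: stored untouched, with no upfront transformation.
RAW_CSV = """station,temp_c,date
BRU-01,12.5,2016-10-11
BRU-02,11.0,2016-10-11
"""

# Hypothetical metadata kept by the semantic layer, loosely modelled on CSVW:
# column name -> property IRI (compact form) and a Python type for parsing.
METADATA = {
    "@context": {"ex": "http://example.org/vocab#"},
    "columns": {
        "station": {"property": "ex:station", "datatype": str},
        "temp_c": {"property": "ex:airTemperatureCelsius", "datatype": float},
        "date": {"property": "ex:observationDate", "datatype": str},
    },
}


def read_with_meaning(raw: str, metadata: dict) -> list[dict]:
    """Apply the metadata at read time, producing JSON-LD-style records."""
    records = []
    for row in csv.DictReader(io.StringIO(raw)):
        record = {"@context": metadata["@context"]}
        for column, value in row.items():
            spec = metadata["columns"][column]
            record[spec["property"]] = spec["datatype"](value)
        records.append(record)
    return records
```

The raw text is never rewritten. Changing `METADATA` changes how the same stored data is interpreted the next time it is read.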
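The SC4 pilot (slide 21) combines traffic modelling from historical data with current measurements from a taxi fleet. The slide does not give the combination rule. One simple, hypothetical approach is to blend a road segment's historical speed with the mean of current taxi speeds, trusting the live data more as more taxis report. The `full_trust_at` parameter and the linear weighting are assumptions made for illustration only.

```python
from statistics import mean


def estimate_speed(historical_kmh: float, taxi_speeds_kmh: list[float],
                   full_trust_at: int = 5) -> float:
    """Blend a road segment's historical speed with current taxi probe speeds.

    Hypothetical rule: the weight on live data grows linearly with the number
    of taxi samples and reaches 1.0 at `full_trust_at` samples.
    """
    if not taxi_speeds_kmh:
        return historical_kmh  # no live data: fall back to the historical model
    weight = min(len(taxi_speeds_kmh) / full_trust_at, 1.0)
    return weight * mean(taxi_speeds_kmh) + (1.0 - weight) * historical_kmh
```

For example, with a historical speed of 40 km/h and a single taxi reporting 20 km/h, the estimate is 0.2 × 20 + 0.8 × 40 = 36 km/h.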
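The SC6 pilot (slide 24) ingests budget data from several municipalities in varied formats and data models, then homogenizes it for comparison. The sketch below shows the basic mapping step under hypothetical assumptions. Two invented sources, which differ in field names and in decimal notation, are mapped onto one common record layout. Source names, field names, and the target schema are illustrative only. `Decimal` keeps monetary amounts free of binary floating-point rounding.

```python
from decimal import Decimal


def parse_dot_decimal(text: str) -> Decimal:
    """Parse amounts like '1,234.50' (comma thousands separator, dot decimal)."""
    return Decimal(text.replace(",", ""))


def parse_comma_decimal(text: str) -> Decimal:
    """Parse amounts like '1.234,50' (dot thousands separator, comma decimal)."""
    return Decimal(text.replace(".", "").replace(",", "."))


# Hypothetical sources: each maps its own field names onto the common fields
# (year, category, amount_eur) and names the parser for its number format.
SOURCES = {
    "town-a": {
        "fields": {"year": "year", "item": "category", "amount": "amount_eur"},
        "parse_amount": parse_dot_decimal,
    },
    "town-b": {
        "fields": {"fiscal_year": "year", "budget_line": "category", "value": "amount_eur"},
        "parse_amount": parse_comma_decimal,
    },
}


def homogenize(source: str, record: dict[str, str]) -> dict:
    """Convert one raw record from `source` into the common layout."""
    spec = SOURCES[source]
    common = {"municipality": source}
    for source_field, common_field in spec["fields"].items():
        common[common_field] = record[source_field]
    common["year"] = int(common["year"])
    common["amount_eur"] = spec["parse_amount"](common["amount_eur"])
    return common
```

Keeping the per-source differences in the `SOURCES` table means adding another municipality requires a new mapping entry, not new conversion code.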
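In the SC7 pilot (slide 25), changes are detected by comparing current and previous satellite images of a geo-located event. The simplest form of that comparison is per-pixel differencing against a threshold, sketched below on tiny grayscale grids. The sketch assumes the two images are already co-registered and the same size. The threshold value is hypothetical, and the slide does not describe the pilot's actual change-detection method.

```python
# Grayscale pixel values as rows of equal length (an assumed representation).
Image = list[list[int]]


def change_mask(previous: Image, current: Image, threshold: int) -> list[list[bool]]:
    """Mark pixels whose brightness changed by more than `threshold`."""
    if len(previous) != len(current) or any(
        len(p) != len(c) for p, c in zip(previous, current)
    ):
        raise ValueError("images must have the same shape")
    return [
        [abs(c - p) > threshold for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(previous, current)
    ]


def changed_fraction(mask: list[list[bool]]) -> float:
    """Share of pixels flagged as changed."""
    cells = [flag for row in mask for flag in row]
    return sum(cells) / len(cells)
```

The threshold suppresses small brightness differences, such as a change from 10 to 12, that are more likely noise than real change on the ground.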