
BDE SC6-ws-05/12/2016 technology part - SWC


Presentation given by Semantic Web Company in Cologne during the BDE SC6 Workshop.

Published in: Data & Analytics

  2. Big Data Europe (CSA: 2015-17)
     • Show societal value of Big Data: 7 Domains
     • Lower barrier for using big data technologies
       ◦ Required effort and resources
       ◦ Limited data science skills
     • Help establish cross-lingual/organizational/domain Data Value Chains
     Date: 16 Dec 2016
  3. Big Data Europe (CSA)
     • Measures
       ◦ Coordination: Stakeholder Engagement (Requirements Elicitation)
       ◦ Support: Design, Realise, Evaluate Big Data Aggregator Platform
     • Results
       ◦ Create and Manage Societal Big Data Interest Groups
       ◦ Cloud-deployment ready Big Data Aggregator Platform
  4. THE BDE PLATFORM ARCHITECTURE & COMPONENTS: Integrating Big Data, Software & Communities for Addressing Europe’s Societal Challenges
  5. The three Big Data "V"s: Variety is often neglected
  6. Current State of Platform Architecture
  7. Adding a Semantic Layer to Data Lakes
     Semantic Data Lake
     • Central place for model, schema and data historization
     • Combination of scale-out (cost reduction) and semantics (increased control & flexibility)
     • Grows incrementally (pay-as-you-go)
     Diagram labels
     • Manufacturing, Marketing, Sales, Support, Accounting
     • Inbound Data Sources
     • Outbound and Consumption
     • Inbound Raw Data Store
     • Data Lake (order of magnitude cheaper scalable data store)
     • Knowledge Graph for Relationship Definition and Meta Data
     • Frontend to Access Relationship and KPI Definition / Documentation
     • Frontend to Access (ad hoc) Reports
     • Outbound Data Delivery to Target Systems
     • JSON-LD, CSVW, R2RML, XML2RDF
  8. Why use BDE Technology?

     | | Hortonworks | Cloudera | MapR | Bigtop | BDE |
     |---|---|---|---|---|---|
     | File system | HDFS | HDFS | NFS | HDFS | HDFS |
     | Installation | Native | Native | Native | Native | Lightweight virtualization |
     | Plug & play components (no rigid schema) | No | No | No | No | Yes |
     | High availability | Single failure recovery (YARN) | Single failure recovery (YARN) | Self-healing, multiple failure recovery | Single failure recovery (YARN) | Multiple failure recovery |
     | Cost | Commercial | Commercial | Commercial | Free | Free |
     | Scaling | Freemium | Freemium | Freemium | Free | Free |
     | Addition of custom components | Not easy | No | No | No | Yes |
     | Integration testing | Yes | Yes | Yes | Yes | -- |
     | Operating systems | Linux | Linux | Linux | Linux | All |
     | Management tool | Ambari | Cloudera Manager | MapR Control System | - | Docker Swarm UI + custom |
  9. SC6 PILOT: CITIZENS BUDGET ON MUNICIPAL LEVEL, ARCHITECTURE & COMPONENTS: Integrating Big Data, Software & Communities for Addressing Europe’s Societal Challenges
  10. SC6 in Big Data Europe: what is included
      • Europe in a changing world - inclusive, innovative and reflective societies
      • Social Sciences
      • Smart Statistics
      • (Digital) Humanities
      • Digital (Research) Archives
  11. SC6: Social Sciences
      • Pilot focus area: Citizens budget spending on municipal level
      • Big Data focus area: Statistical and research data linking & integration
      • Selected key data assets: Detailed budget execution data at city level, statistical data from public data portals and statistical offices, federated social sciences data
  12. SC6 Pilot: Idea & Objectives
      State of the Art:
      ◦ Budget: the most important document of public policy
      ◦ Budget execution affects everyday lives
      ◦ Citizens are more involved in city level activities
      Objective: Can we make budgets more useful for citizens, researchers and decision makers?
  13. SC6 Pilot: Idea & Objectives
      • Create an online Dashboard on Economic Data
        ◦ Harvest data from several sources in different formats
        ◦ Normalise the data (RDF)
        ◦ Link & map the data (attributes, structure, languages)
        ◦ Analyse the data: financial ratios (comparisons, predictions etc.)
        ◦ Visualise the analysis on an online dashboard, including help & information to understand data & analysis
        ◦ Provide raw data (for further use as open data)
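The harvest, normalise and analyse steps on this slide can be sketched roughly as follows. This is only an illustration, not the pilot's code: the CSV columns, the `http://example.org/budget/` namespace, the sample figures and the `execution_ratio` helper are all assumptions made up for the example.

```python
import csv
import io

# Hypothetical namespace and sample rows; not the pilot's real data or vocabulary.
BASE = "http://example.org/budget/"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"

SAMPLE_CSV = """municipality,year,category,budgeted,executed
Athens,2016,Education,1000.0,750.0
Thessaloniki,2016,Education,800.0,800.0
"""


def row_to_ntriples(row):
    """Normalise step: turn one CSV row into a list of N-Triples lines."""
    subject = f"<{BASE}{row['municipality']}/{row['year']}/{row['category']}>"
    triples = []
    # Descriptive fields become plain string literals.
    for key in ("municipality", "year", "category"):
        triples.append(f'{subject} <{BASE}{key}> "{row[key]}" .')
    # Amounts become typed decimal literals.
    for key in ("budgeted", "executed"):
        triples.append(f'{subject} <{BASE}{key}> "{row[key]}"^^<{XSD_DECIMAL}> .')
    return triples


def execution_ratio(row):
    """Analyse step: executed amount divided by budgeted amount (one possible ratio)."""
    return float(row["executed"]) / float(row["budgeted"])


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
triples = [line for row in rows for line in row_to_ntriples(row)]
ratios = {row["municipality"]: execution_ratio(row) for row in rows}
```

In a real deployment the triples would be loaded into a triple store and the ratios computed over the linked data rather than over the raw rows.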
  14. Two H2020 projects working together on the SC6 Pilot
      • Big Data Europe
      • Your Data Stories
      SC6 Pilot core team: Ivana Versic (Cessda), Michalis Vafopoulos (NCSR-D), Martin Kaltenböck (SWC), Jürgen Jakobitsch (SWC), Hossein Abroshan (Cessda)
      SC6 Pilot Partners
  15. Data used / produced in Pilot: Budget Data and Budget Execution Data
      • Municipality of Athens, Greece
        ◦ Description: budget execution data in detail
        ◦ Frequency: daily
        ◦ Ownership: open
        ◦ Format: API
      • Municipality of Thessaloniki, Greece
        ◦ Description: budget execution data in detail
        ◦ Frequency: daily
        ◦ Ownership: open
        ◦ Format: csv, xls (files for download provided)
      • Municipality of Kalamaria, Greece
        ◦ Description: budget execution data in detail
        ◦ Frequency: weekly
        ◦ Ownership: open
        ◦ Format: csv, xls (files for download provided)
      • Additional Open Data
        ◦ Description: economic taxonomies etc.
        ◦ Ownership: open
        ◦ Format: RDF (skos, owl), other
        ◦ E.g. COFOG (UN Classification)
      • Size of Data
        ◦ ~30 million triples (statements) for 1 year
  16. 4 Vs of Big Data in SC6 Pilot
      • Variety: budget data and budget execution data are harvested from several sources and arrive in different structures and formats.
      • Volume: the amount of open budget data and budget execution data available keeps growing.
      • Velocity: budget execution data is provided by the publishers on a continuous basis (daily, weekly, monthly).
      • Veracity: refers to the biases, noise and abnormality in data. Even within the same country, published data differ because they often come from different systems, or because public accounting standards are not enforced uniformly (e.g. across different municipal departments).
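The variety and veracity points above could be handled by mapping each source onto one common record shape before any analysis. The sketch below assumes two made-up source layouts (an API-style record and a CSV-style record) with different field names, number formats and date formats; none of these shapes come from the slides.

```python
from datetime import date

# Hypothetical record shapes for two sources; real feeds will differ.
api_record = {"amount": "1.234,50", "date": "05/12/2016", "unit": "Athens"}
csv_record = {"Amount": "1234.50", "Date": "2016-12-05", "Municipality": "Thessaloniki"}


def parse_amount(text):
    """Accept both '1.234,50' (comma decimal) and '1234.50' (dot decimal)."""
    if "," in text:
        # Drop thousands separators, then switch the decimal comma to a dot.
        text = text.replace(".", "").replace(",", ".")
    return float(text)


def parse_date(text):
    """Accept 'DD/MM/YYYY' and ISO 'YYYY-MM-DD'."""
    if "/" in text:
        day, month, year = text.split("/")
        return date(int(year), int(month), int(day))
    return date.fromisoformat(text)


def harmonise(record):
    """Map either source shape onto one common schema."""
    lowered = {key.lower(): value for key, value in record.items()}
    return {
        "municipality": lowered.get("municipality") or lowered.get("unit"),
        "amount": parse_amount(lowered["amount"]),
        "date": parse_date(lowered["date"]),
    }


records = [harmonise(api_record), harmonise(csv_record)]
```

Keeping the parsing rules in small functions makes it easier to add a new source later without touching the downstream analysis.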
  17. SC6: Social Sciences Pilot Architecture & Components
  18. SC6 Pilot - Architecture
  19. SC6 Pilot: Technical Components
      • Apache Flume (data ingestion)
      • Apache Kafka (messaging service)
      • Apache Spark (distributed analysis, transformation)
      • Apache HDFS (raw data storage)
      • SWC's PoolParty Semantic Suite (data consolidation, curation, mapping)
      • OpenLink's Virtuoso (triple store, data storage)
      • Apache HTTP Server (linked data serving)
      • Apache Avro (intermediate data schema)
      • D3 JS Library (visualisation of RDF data using SPARQL queries)
      • SWC's PoolParty GraphSearch (SPARQL-based interface component for filter & faceted search)
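Since the visualisation layer reads from the triple store through SPARQL queries, a dashboard request might look roughly like the sketch below. The endpoint URL, the `http://example.org/budget/` vocabulary and both helper functions are assumptions for illustration; the slides do not give the pilot's actual graph, predicates or endpoint. The code only builds the request URL and does not contact any server.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint and vocabulary; the pilot's real values are not in the slides.
ENDPOINT = "http://localhost:8890/sparql"
VOCAB = "http://example.org/budget/"


def build_query(year):
    """SPARQL query summing executed amounts per municipality for one year."""
    return (
        f"PREFIX b: <{VOCAB}>\n"
        "SELECT ?municipality (SUM(?executed) AS ?total)\n"
        "WHERE {\n"
        "  ?entry b:municipality ?municipality ;\n"
        f'         b:year "{int(year)}" ;\n'
        "         b:executed ?executed .\n"
        "}\n"
        "GROUP BY ?municipality\n"
        "ORDER BY DESC(?total)"
    )


def build_request_url(query):
    """GET URL in the SPARQL 1.1 Protocol style: the query goes in the 'query' parameter."""
    return f"{ENDPOINT}?{urlencode({'query': query})}"


query = build_query(2016)
url = build_request_url(query)

# Round-trip check: decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

A client such as a D3-based page would send this request with an `Accept` header for a JSON results format and bind the returned rows to chart elements.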
  20. SC6 Pilot: 1st version implemented: https://bde.poolparty.bizGraphSearchSC6
  21. SC6 Pilot: Pilot Evaluation
      Evaluation Approach SC6 Pilot (starts 01/2017):
      • Invite municipalities to evaluate and use the system
      • Invite community (open data, data community, BDE community, W3C)
      • Evaluate within the participating projects (BDE, DataStories, invite: OpenBudget)
      • BDE SC6 workshop in Cologne, 5.12.2016
      Additional evaluation: tests over time with
      • a growing amount of data
      • a growing number of different sources & formats docked onto the system
      • additional analytics in place
  22. How to benefit best from BDE
      • BDE Workshops & Webinars
      • Use & expand the BDE Platform (BDE github)
      • Visit Website: news, events, community, …
      • Big Data Europe W3C Community Group
      • 7+1x Mailing Lists: stay tuned!
      • BDE Platform website coming soon!
      • Related EC Call on Big Data, open until 02 Feb 2017: Policy-development in the age of big data: data-driven policy-making, policy-modelling
  23. Contacts:
      • CESSDA: Ivana Ilijasic Versic, Hossein Abroshan
      • NCSR-D: Michalis Vafopoulos
      • Semantic Web Company (SWC): Martin Kaltenböck, Jürgen Jakobitsch
  24. Questions & Contacts: #BigDataEurope. Martin Kaltenböck, CFO, Semantic Web Company