
Societal Challenge 6: Social Sciences - Spending Comparison


Jürgen Jakobitsch describes the BDE project pilot for Societal Challenge 6 (Social Sciences). The platform is being used to ingest, analyse and visualise spending data from multiple sources.

Published in: Data & Analytics


  1. BDE Platform Release Webinar. Jürgen Jakobitsch, SWC
  2. SC Network Social Sciences. Jürgen Jakobitsch, j.jakobitsch@semantic-web.com, https://www.linkedin.com/in/turnguard, https://www.poolparty.biz/semantic-web-company-gmbh/
  3. SC Network Social Sciences. SC6: create an online dashboard on economic data, making budget data comparable: harvest heterogeneous formats, normalize to RDF, link and map data, analyze the data by calculating financial ratios, visualize the data.
  4. SC6 Architecture
  5. SC6 – Data Acquisition (I): Apache Flume collects and moves large amounts of data through sources, channels and sinks. In SC6: Spooling Directory Source, Memory Channel, HDFS Sink and Kafka Sink.
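The source/channel/sink wiring described on this slide can be sketched as a Flume agent properties file. This is a minimal illustration only: the agent name, directory paths, HDFS URL and topic name are assumptions, not the pilot's actual configuration.

```properties
# One agent with a spooling-directory source, a memory channel,
# and two sinks (HDFS for archiving, Kafka for downstream Spark).
agent.sources  = spoolSrc
agent.channels = memCh
agent.sinks    = hdfsSink kafkaSink

# Spooling Directory Source: watches a directory for new files.
agent.sources.spoolSrc.type     = spooldir
agent.sources.spoolSrc.spoolDir = /data/incoming
agent.sources.spoolSrc.channels = memCh

# Memory Channel: in-memory buffer between source and sinks.
agent.channels.memCh.type     = memory
agent.channels.memCh.capacity = 10000

# HDFS Sink: persists the raw files.
agent.sinks.hdfsSink.type      = hdfs
agent.sinks.hdfsSink.hdfs.path = hdfs://namenode:8020/sc6/raw
agent.sinks.hdfsSink.channel   = memCh

# Kafka Sink: forwards events to the topic Spark consumes.
agent.sinks.kafkaSink.type                    = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.kafkaSink.kafka.bootstrap.servers = kafka:9092
agent.sinks.kafkaSink.kafka.topic             = sc6
agent.sinks.kafkaSink.channel                 = memCh
```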
  6. SC6 – Data Acquisition (II): extend the bde2020/flume Docker image and add startup.json.
  7. SC6 – Data Acquisition (III): agent setup (source).
  8. SC6 – Data Acquisition (IV): agent setup (channel); agent setup (sink).
  9. SC6 – Messaging (I): Apache Kafka follows a producer/consumer paradigm, with messages grouped by topic. In SC6, Apache Flume creates the topic, messages contain the filename and the file contents (as a byte array), and Spark is the consumer.
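The message shape described here (filename plus raw file contents) can be modeled as a simple key/value pair; the helper below is an illustrative sketch of that shape, not the pilot's actual code.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

public class MessagePair {

    /** Build a Kafka-style message: the filename is the key, the raw
     *  file contents (as a byte array) are the value. Illustrative only. */
    static Map.Entry<String, byte[]> message(String filename, String contents) {
        return new SimpleImmutableEntry<>(filename,
                contents.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        Map.Entry<String, byte[]> msg =
                message("budget-2016.csv", "dept,amount\nhealth,100");
        // A consumer recovers the filename from the key and decodes the value.
        System.out.println(msg.getKey() + " (" + msg.getValue().length + " bytes)");
    }
}
```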
  10. SC6 – Messaging (II): extend the bde2020/kafka Docker image and add startup.json.
  11. SC6 – Processing (I): Apache Spark provides parallel computing. In SC6 it acts as a Kafka consumer, working on message pairs from Kafka: filename (String) and file content (byte[]). Triplification happens via a parser for each data source, discovered through SPI.
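The per-source triplification step can be sketched as a small Java SPI-style contract: each parser declares which filenames it accepts and turns the file bytes into RDF triples. The interface, class names, and example vocabulary URIs below are hypothetical; the pilot's actual parsers live in the pilot-sc6-cycle1 repository.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Triplify {

    /** Hypothetical SPI contract for per-source parsers. In a real SPI
     *  setup, implementations would be found via java.util.ServiceLoader. */
    interface SourceParser {
        boolean accepts(String filename);
        List<String> parse(byte[] content);   // returns N-Triples lines
    }

    /** Toy CSV parser: emits one triple per data row (illustrative vocabulary). */
    static class CsvBudgetParser implements SourceParser {
        public boolean accepts(String filename) { return filename.endsWith(".csv"); }
        public List<String> parse(byte[] content) {
            List<String> triples = new ArrayList<>();
            String[] lines = new String(content, StandardCharsets.UTF_8).split("\n");
            for (int i = 1; i < lines.length; i++) {          // skip CSV header row
                String[] cols = lines[i].split(",");
                triples.add("<http://example.org/budget/" + cols[0] + "> "
                          + "<http://example.org/amount> \"" + cols[1] + "\" .");
            }
            return triples;
        }
    }

    /** Dispatch a (filename, content) message pair to the first matching parser. */
    static List<String> triplify(String filename, byte[] content,
                                 List<SourceParser> parsers) {
        for (SourceParser p : parsers) {
            if (p.accepts(filename)) return p.parse(content);
        }
        throw new IllegalArgumentException("no parser for " + filename);
    }

    public static void main(String[] args) {
        byte[] csv = "dept,amount\nhealth,100".getBytes(StandardCharsets.UTF_8);
        triplify("budget.csv", csv, List.of(new CsvBudgetParser()))
                .forEach(System.out::println);
    }
}
```

In the pilot this dispatch would run inside the Spark job, applied to each message pair pulled from Kafka.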
  12. SC6 – Processing (II): Spark job (I). https://github.com/big-data-europe/pilot-sc6-cycle1/
  13. SC6 – Processing (III): Spark job (II).
  14. SC6 – Processing (IV): Spark job (III).
  15. SC6 – Processing (V): Spark job (IV) – parsers sample.
  16. SC6 – Processing (VI): Spark job (V) – Virtuoso loader. https://github.com/big-data-europe/virtuoso-utils
  17. SC6 – Processing (VII): Spark job (VI) – Docker image.
  18. SC6 – Stack (I): putting things together – Zookeeper.
  19. SC6 – Stack (II): putting things together – Kafka.
  20. SC6 – Stack (III): putting things together – Flume.
  21. SC6 – Stack (IV): putting things together – Spark (I).
  22. SC6 – Stack (V): putting things together – Spark (II).
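The "putting things together" slides compose these services into one stack. A minimal docker-compose sketch of that wiring is shown below; the service names, volumes and the zookeeper image choice are assumptions, and the pilot's actual compose file is in the pilot-sc6-cycle1 repository.

```yaml
# Illustrative SC6 stack wiring, not the pilot's actual compose file.
version: "2"
services:
  zookeeper:
    image: zookeeper              # coordination for Kafka
  kafka:
    image: bde2020/kafka          # extended with startup.json (slide 10)
    depends_on:
      - zookeeper
  flume:
    image: bde2020/flume          # extended with startup.json (slide 6)
    volumes:
      - ./incoming:/data/incoming # spooling directory watched by the source
    depends_on:
      - kafka
  spark-master:
    image: bde2020/spark-master
  spark-worker:
    image: bde2020/spark-worker   # runs the Kafka-consuming triplification job
    depends_on:
      - spark-master
```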
  23. SC6 – Data Representation: PoolParty GraphSearch. https://www.poolparty.biz/poolparty-semantic-
  24. Big Data Europe Integrator Platform Launch: Wednesday 3 May @ 15:00 CEST. Please type your questions at any time; Q&A will follow the presentations.
