•BDE-Platform Release Webinar
•Jakobitsch Jürgen, SWC
•SC Network Social Sciences
Jürgen Jakobitsch
j.jakobitsch@semantic-web.com
https://www.linkedin.com/in/turnguard
https://www.poolparty.biz/semantic-web-company-gmbh/
•SC Network Social Sciences
SC6: Create an online Dashboard on Economic
Data
Making budget data comparable
Harvesting heterogeneous formats
Normalize to RDF
Link & map data
Analyze data by calculating Financial Ratios
Visualize data
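
To make the "Financial Ratios" step concrete, here is a minimal sketch in Java. The slides do not name the ratios used in SC6, so the ratio below (executed spending divided by approved budget), the class name `FinancialRatioSketch`, and all figures are assumptions made purely for illustration. The point is only that a ratio makes budgets of very different sizes comparable.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class FinancialRatioSketch {

    /** Hypothetical ratio (not taken from the slides): executed spending / approved budget. */
    public static BigDecimal executionRatio(BigDecimal executed, BigDecimal approved) {
        if (approved.signum() == 0) {
            throw new IllegalArgumentException("approved budget must be non-zero");
        }
        // Round to four decimal places, which is plenty for a dashboard percentage.
        return executed.divide(approved, 4, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        // Made-up figures for two municipalities with very different budget sizes.
        BigDecimal small = executionRatio(new BigDecimal("450000"), new BigDecimal("500000"));
        BigDecimal large = executionRatio(new BigDecimal("7200000"), new BigDecimal("9000000"));
        System.out.println("small municipality: " + small);
        System.out.println("large municipality: " + large);
    }
}
```

`BigDecimal` is used instead of `double` so that monetary amounts are not subject to binary floating-point rounding.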
•SC6 Architecture
•SC6 – Data Acquisition (I)
Apache Flume
Collecting and moving large amounts of data
Sources, Channels, Sinks
In SC6
Spooling Directory Source
Memory Channel
HDFSSink, KafkaSink
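
The source, channel, sink flow can be pictured with a toy model in plain Java. This is not Flume's API. The class `FlumeFlowSketch`, the `Event` type, and the file names are hypothetical, and the sketch only mimics the roles: a source turns spooled files into events, a bounded in-memory queue stands in for the Memory Channel, and a sink drains the queue.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class FlumeFlowSketch {

    /** One event: the file name travels as metadata, the file content as the body. */
    public static final class Event {
        public final String fileName;
        public final byte[] body;

        Event(String fileName, byte[] body) {
            this.fileName = fileName;
            this.body = body;
        }
    }

    /** Source role: turn every spooled file into an event and offer it to the channel. */
    public static void source(Map<String, byte[]> spooledFiles, BlockingQueue<Event> channel) {
        for (Map.Entry<String, byte[]> file : spooledFiles.entrySet()) {
            // A bounded channel can reject events once its capacity is reached.
            if (!channel.offer(new Event(file.getKey(), file.getValue()))) {
                throw new IllegalStateException("channel full, cannot accept " + file.getKey());
            }
        }
    }

    /** Sink role: drain the channel and deliver each event (here: into a list of strings). */
    public static List<String> sink(BlockingQueue<Event> channel) {
        List<String> delivered = new ArrayList<>();
        Event event;
        while ((event = channel.poll()) != null) {
            delivered.add(event.fileName + ":" + event.body.length);
        }
        return delivered;
    }

    public static void main(String[] args) {
        Map<String, byte[]> spooledFiles = new LinkedHashMap<>();
        spooledFiles.put("budget-2016.csv", "Roads;100.50".getBytes(StandardCharsets.UTF_8));
        spooledFiles.put("budget-2017.csv", "Schools;250".getBytes(StandardCharsets.UTF_8));

        // Bounded in-memory buffer standing in for the Memory Channel.
        BlockingQueue<Event> memoryChannel = new ArrayBlockingQueue<>(10);
        source(spooledFiles, memoryChannel);
        System.out.println(sink(memoryChannel));
    }
}
```

In a real agent the source and sink run concurrently and the channel decouples their speeds. The sketch runs them one after the other to stay single-threaded.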
•SC6 – Data Acquisition (II)
Extend bde2020/flume Docker Image
Add startup.json
•SC6 – Data Acquisition (III)
Agent Setup (Source)
•SC6 – Data Acquisition (IV)
Agent Setup (Channel)
Agent Setup (Sink)
•SC6 – Messaging (I)
Apache Kafka
Producer/Consumer paradigm
Messages are grouped by topic
In SC6
Apache Flume creates the topic
Messages contain the filename and the file contents (as a byte array)
Spark is the consumer
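
The producer/consumer idea with topic-grouped messages can be sketched as an in-memory log per topic. This is a toy model in plain Java, not the Kafka client API. The class `TopicLogSketch`, its `send` and `poll` methods, and the topic name are hypothetical. The sketch assumes that a topic comes into existence on the first write, which is one way to read "Flume creates the topic".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TopicLogSketch {

    /** A message pair as described on the slide: filename plus file contents as bytes. */
    public static final class Message {
        public final String fileName;
        public final byte[] content;

        Message(String fileName, byte[] content) {
            this.fileName = fileName;
            this.content = content;
        }
    }

    // One append-only list per topic; the list index plays the role of an offset.
    private final Map<String, List<Message>> topics = new HashMap<>();

    /** Producer side: append to the topic's log (creating the topic if needed), return the offset. */
    public long send(String topic, String fileName, byte[] content) {
        List<Message> log = topics.computeIfAbsent(topic, t -> new ArrayList<>());
        log.add(new Message(fileName, content));
        return log.size() - 1;
    }

    /** Consumer side: read every message of a topic starting at the given offset. */
    public List<Message> poll(String topic, long fromOffset) {
        List<Message> log = topics.getOrDefault(topic, Collections.<Message>emptyList());
        int start = (int) Math.max(0, Math.min(fromOffset, log.size()));
        return new ArrayList<>(log.subList(start, log.size()));
    }

    public static void main(String[] args) {
        TopicLogSketch broker = new TopicLogSketch();
        broker.send("budget-files", "budget-2016.csv", new byte[] {1, 2, 3});
        broker.send("budget-files", "budget-2017.csv", new byte[] {4, 5});

        // The consumer keeps its own offset and only sees messages of the topic it asks for.
        for (Message m : broker.poll("budget-files", 0)) {
            System.out.println(m.fileName + " (" + m.content.length + " bytes)");
        }
    }
}
```

Because the consumer passes its own offset, it can re-read a topic from the start or resume where it left off without the producer being involved.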
•SC6 – Messaging (II)
Extend bde2020/kafka
Add startup.json
•SC6 – Processing (I)
Apache Spark
Parallel computing
In SC6
Acts as a Kafka Consumer
Working on message pairs from Kafka
Filename (String)
File content (byte[])
Triplification via a parser for each data source
SPI
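
A minimal sketch of the parser-per-data-source idea follows. It does not reproduce the interfaces of the pilot repository. The names `BudgetParser` and `CsvBudgetParser`, the `label;amount` input layout, and the `http://example.org/` IRIs are all assumptions for illustration. In an actual SPI setup the implementations would be listed under `META-INF/services` and discovered with `java.util.ServiceLoader`. The sketch passes them in explicitly so that it stays self-contained.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class TriplifierSketch {

    /** Hypothetical service interface: one implementation per data source. */
    public interface BudgetParser {
        boolean canHandle(String fileName);
        List<String> parse(String fileName, byte[] content);
    }

    /** Hypothetical parser for files whose lines look like "label;amount". */
    public static final class CsvBudgetParser implements BudgetParser {
        @Override
        public boolean canHandle(String fileName) {
            return fileName.endsWith(".csv");
        }

        @Override
        public List<String> parse(String fileName, byte[] content) {
            List<String> triples = new ArrayList<>();
            String text = new String(content, StandardCharsets.UTF_8);
            int row = 0;
            for (String line : text.split("\\R")) {
                String[] cols = line.split(";");
                if (cols.length < 2) {
                    continue; // skip blank or malformed lines
                }
                // Simplification: labels are not escaped, so they must not contain quotes.
                String subject = "<http://example.org/budget/" + fileName + "#row" + row++ + ">";
                triples.add(subject + " <http://example.org/vocab#label> \"" + cols[0].trim() + "\" .");
                triples.add(subject + " <http://example.org/vocab#amount> \"" + cols[1].trim()
                        + "\"^^<http://www.w3.org/2001/XMLSchema#decimal> .");
            }
            return triples;
        }
    }

    /** Pick the first parser that accepts the file and let it emit N-Triples lines. */
    public static List<String> triplify(Iterable<? extends BudgetParser> parsers,
                                        String fileName, byte[] content) {
        for (BudgetParser parser : parsers) {
            if (parser.canHandle(fileName)) {
                return parser.parse(fileName, content);
            }
        }
        throw new IllegalArgumentException("no parser registered for " + fileName);
    }

    public static void main(String[] args) {
        // With SPI, this list could be replaced by ServiceLoader.load(BudgetParser.class).
        List<BudgetParser> parsers = Collections.<BudgetParser>singletonList(new CsvBudgetParser());
        byte[] content = "Roads;100.50\nSchools;250\n".getBytes(StandardCharsets.UTF_8);
        for (String triple : triplify(parsers, "budget-2016.csv", content)) {
            System.out.println(triple);
        }
    }
}
```

`triplify` accepts any `Iterable`, so a `ServiceLoader` instance can be passed in directly, and supporting a new data source then means adding one more implementation rather than changing the job.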
•SC6 – Processing (II)
Spark Job (I)
https://github.com/big-data-europe/pilot-sc6-cycle1/
•SC6 – Processing (III)
Spark Job (II)
•SC6 – Processing (IV)
Spark Job (III)
•SC6 – Processing (V)
Spark Job (IV) – Parsers
Sample
•SC6 – Processing (VI)
Spark Job (V) – Virtuoso Loader
https://github.com/big-data-europe/virtuoso-utils
•SC6 – Processing (VII)
Spark Job (VI) – Docker Image
•SC6 – Stack (I)
Putting things together – ZooKeeper
•SC6 – Stack (II)
Putting things together – Kafka
•SC6 – Stack (III)
Putting things together – Flume
•SC6 – Stack (IV)
Putting things together – Spark (I)
•SC6 – Stack (V)
Putting things together – Spark (II)
•SC6 – Data Representation
PoolParty GraphSearch
https://www.poolparty.biz/poolparty-semantic-
Big Data Europe Integrator Platform Launch
Wednesday 3 May @ 15:00 CEST
Please type your questions at any time. Q&A will follow the presentations

Societal Challenge 6: Social Sciences - Spending Comparison
