Societal Challenge 6: Social Sciences - Spending Comparison

•BDE-Platform Release Webinar
•Jakobitsch Jürgen, SWC

•SC Network Social Sciences
Jürgen Jakobitsch
j.jakobitsch@semantic-web.com
https://www.linkedin.com/in/turnguard
https://www.poolparty.biz/semantic-web-company-gmbh/

•SC Network Social Sciences
SC6: Create an online Dashboard on Economic Data
Making budget data comparable
Harvest heterogeneous formats
Normalize to RDF
Link & map data
Analyze data by calculating Financial Ratios
Visualize data

•SC6 – Data Acquisition (I)
Apache Flume
Collecting and moving large amounts of data
Sources, Channels, Sinks
In SC6
Spooling Directory Source
Memory Channel
HDFSSink, KafkaSink
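
The source, channel, sink flow listed above can be illustrated with a small in-memory model. This is a sketch in plain Python, not Flume itself and not the SC6 agent configuration: the function names (`spool_source`, `drain_to_sinks`), the file names, and the choice to hand every event to both sinks are assumptions made for illustration only.

```python
import queue
import tempfile
from pathlib import Path


def spool_source(spool_dir, channel):
    """Read each file in the spooling directory and put one event per file on the channel."""
    for path in sorted(Path(spool_dir).iterdir()):
        if path.is_file():
            channel.put((path.name, path.read_bytes()))


def drain_to_sinks(channel, sinks):
    """Take events off the channel and hand each one to every sink."""
    while not channel.empty():
        event = channel.get()
        for sink in sinks:
            sink.append(event)


# Hypothetical spool directory with two small budget files.
spool_dir = tempfile.mkdtemp()
Path(spool_dir, "budget_a.csv").write_bytes(b"year,amount\n2016,100\n")
Path(spool_dir, "budget_b.csv").write_bytes(b"year,amount\n2016,250\n")

memory_channel = queue.Queue(maxsize=100)  # stands in for the Memory Channel
hdfs_sink, kafka_sink = [], []             # stand-ins for HDFSSink and KafkaSink

spool_source(spool_dir, memory_channel)
drain_to_sinks(memory_channel, [hdfs_sink, kafka_sink])
print([name for name, _ in kafka_sink])
```

Each event here is a (filename, file content) pair, which matches the message shape described later for Kafka and Spark.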

•SC6 – Data Acquisition (II)
Extend bde2020/flume Docker Image
Add startup.json

•SC6 – Data Acquisition (III)
Agent Setup (Source)

•SC6 – Data Acquisition (IV)
Agent Setup (Channel)
Agent Setup (Sink)

•SC6 – Messaging (I)
Apache Kafka
Producer/Consumer paradigm
Messages are grouped by topic
In SC6
Apache Flume creates the topic
Messages contain the filename and the file contents (as a byte array)
Spark will be the consumer
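
The producer/consumer paradigm and the message shape described above can be sketched with a toy in-memory broker. This is not the Kafka client API: the `InMemoryBroker` class, the topic name `sc6-budgets`, and the use of the filename as message key are all hypothetical and only illustrate how messages grouped by topic carry a filename together with raw file bytes.

```python
from collections import defaultdict


class InMemoryBroker:
    """Toy stand-in for a Kafka broker: messages are appended to per-topic logs."""

    def __init__(self):
        self.topics = defaultdict(list)

    def produce(self, topic, key, value):
        """Append a (key, value) message to the log of the given topic."""
        self.topics[topic].append((key, value))

    def consume(self, topic, offset=0):
        """Return all messages of a topic starting at the given offset."""
        return self.topics[topic][offset:]


broker = InMemoryBroker()
TOPIC = "sc6-budgets"  # hypothetical topic name

# Producer side (Flume in SC6): filename as key, file contents as bytes.
broker.produce(TOPIC, "budget_a.csv", b"year,amount\n2016,100\n")
broker.produce(TOPIC, "budget_b.csv", b"year,amount\n2016,250\n")

# Consumer side (Spark in SC6): read the pairs back in order.
for filename, content in broker.consume(TOPIC):
    print(filename, len(content))
```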

•SC6 – Messaging (II)
Extend bde2020/kafka
Add startup.json

•SC6 – Processing (I)
Apache Spark
Parallel computing
In SC6
Acts as a Kafka Consumer
Works on message pairs from Kafka
Filename (String)
File content (byte[])
Triplification via Parser for each datasource
SPI
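
The idea of one parser per data source, selected at runtime, can be sketched as follows. The SC6 Spark job presumably relies on a Java service provider mechanism; the Python registry below is a substitute for illustration, and the names `PARSERS`, `register`, `triplify`, the `budget_` filename prefix, and the `http://example.org/` namespace are assumptions, not taken from the pilot code.

```python
import csv
import io

PARSERS = {}  # hypothetical registry: filename prefix -> parser function


def register(prefix):
    """Decorator that registers a parser for filenames starting with prefix."""
    def decorator(func):
        PARSERS[prefix] = func
        return func
    return decorator


@register("budget_")
def parse_budget_csv(filename, content):
    """Turn each CSV cell into a (subject, predicate, object) triple."""
    base = "http://example.org/"  # placeholder namespace
    rows = csv.DictReader(io.StringIO(content.decode("utf-8")))
    triples = []
    for i, row in enumerate(rows):
        subject = f"{base}{filename}/row/{i}"
        for column, value in row.items():
            triples.append((subject, f"{base}prop/{column}", value))
    return triples


def triplify(filename, content):
    """Pick the parser registered for this data source and run it."""
    for prefix, parser in PARSERS.items():
        if filename.startswith(prefix):
            return parser(filename, content)
    raise LookupError(f"no parser registered for {filename}")


triples = triplify("budget_a.csv", b"year,amount\n2016,100\n")
for triple in triples:
    print(triple)
```

Adding support for a new data source then means registering one more parser function, without touching the dispatch code.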

•SC6 – Processing (II)
Spark Job (I)
https://github.com/big-data-europe/pilot-sc6-cycle1/

•SC6 – Processing (III)
Spark Job (II)

•SC6 – Processing (IV)
Spark Job (III)

•SC6 – Processing (V)
Spark Job (IV) – Parsers
Sample

•SC6 – Processing (VI)
Spark Job (V) – Virtuoso Loader
https://github.com/big-data-europe/virtuoso-utils

•SC6 – Processing (VII)
Spark Job (VI) – Docker Image

•SC6 – Stack (I)
Putting things together - Zookeeper

•SC6 – Stack (II)
Putting things together - Kafka

•SC6 – Stack (III)
Putting things together - Flume

•SC6 – Stack (IV)
Putting things together – Spark (I)

•SC6 – Stack (V)
Putting things together – Spark (II)

•SC6 – Data Representation
PoolParty GraphSearch
https://www.poolparty.biz/poolparty-semantic-

Big Data Europe Integrator Platform Launch
Wednesday 3 May @ 15:00 CEST
Please type your questions at any time. Q&A will follow the presentations
