This talk, presented at ISWC (International Semantic Web Conference) 2017 in Vienna, describes SANSA, a software framework for distributed in-memory ("Big Data") analytics on top of the semantic technology stack.
6. “Big Data” Processing (Spark/Flink) vs. Semantic Technology Stack

Data integration: manual pre-processing vs. partially automated, standardised
Modelling: simple (often flat feature vectors) vs. expressive
Support for data exchange: limited (heterogeneous formats with limited schema information) vs. yes (RDF & OWL W3C Standards)
Business value: direct vs. indirect
Horizontally scalable: yes vs. no

Idea: combine the advantages of both worlds
9. RDF Layer:
val graph: TripleRDD = NTripleReader.load(spark, uri) // read an N-Triples file into a distributed collection of triples
graph.find(ANY, URI("http://dbpedia.org/ontology/influenced"), ANY) // triple pattern matching on a fixed predicate
val rdf_stats_prop_dist = PropertyUsage(graph, spark).PostProc() // RDF statistics: property usage distribution
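A minimal continuation sketch, assuming (as in SANSA's RDF layer) that find returns a standard Spark RDD of Jena Triple objects and the same imports and Spark session as above; the variable names are illustrative:

// sketch: count and sample the matched triples using ordinary Spark actions
val influenced = graph.find(ANY, URI("http://dbpedia.org/ontology/influenced"), ANY)
println("Matched triples: " + influenced.count()) // action triggers the distributed scan
influenced.take(5).foreach(println) // inspect a few triples on the driver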
11. OWL Layer:
val rdd = ManchesterSyntaxOWLAxiomsRDDBuilder.build(spark, "file.owl") // parse an OWL file (Manchester syntax) into a distributed RDD of OWL axioms
// get all subclass-of axioms
val sco = rdd.filter(_.isInstanceOf[OWLSubClassOfAxiom])
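Since the filtered axiom set is again just a Spark RDD, standard actions apply; a minimal sketch, assuming the same session and imports as the snippet above:

// sketch: count the SubClassOf axioms and inspect a small sample
println("SubClassOf axioms: " + sco.count())
sco.take(5).foreach(println) // axioms are OWL API objects with readable toString output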
13. Query Layer:
val graphRdd = NTripleReader.load(spark, input) // ingest RDF data
val partitions = RdfPartitionUtilsSpark.partitionGraph(graphRdd) // partition the graph into distributed tables
val rewriter = SparqlifyUtils.createSparqlSqlRewriter(spark, partitions) // SPARQL-to-SQL rewriter (Sparqlify)
val qef = new QueryExecutionFactorySparqlifySpark(spark, rewriter) // query execution factory over Spark
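With the query execution factory in place, SPARQL queries can be issued against the partitioned data. A minimal sketch, assuming qef follows the usual jena-sparql-api QueryExecutionFactory interface; the query string is an illustrative example, not from the slides:

// sketch: run a SPARQL SELECT query through the Sparqlify-based rewriter
val qe = qef.createQueryExecution(
  "SELECT ?s ?o WHERE { ?s <http://dbpedia.org/ontology/influenced> ?o } LIMIT 10")
val rs = qe.execSelect() // standard Jena ResultSet
while (rs.hasNext()) println(rs.next())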
[Architecture diagram: SANSA Engine with an RDF Layer (data ingestion, partitioning) and a Query Layer (sparqlifying, distributed data structures, views, results)]