The talk describes the SANSA software framework for distributed in-memory analytics ("Big Data") based on the semantic technology stack, which was presented at ISWC (International Semantic Web Conference) 2017 in Vienna.
5.
Aspect                    | “Big Data” Processing (Spark/Flink)                             | Semantic Technology Stack
--------------------------|-----------------------------------------------------------------|----------------------------------
Data integration          | Manual pre-processing                                           | Partially automated, standardised
Modelling                 | Simple (often flat feature vectors)                             | Expressive
Support for data exchange | Limited (heterogeneous formats with limited schema information) | Yes (RDF & OWL W3C standards)
Business value            | Direct                                                          | Indirect
Horizontally scalable     | Yes                                                             | No
Idea: combine advantages of both worlds
6.
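// Load an N-Triples file into a distributed collection of triples
val graph: TripleRDD = NTripleReader.load(spark, uri)
// Find every triple whose predicate is dbo:influenced (ANY acts as a wildcard)
graph.find(ANY, URI("http://dbpedia.org/ontology/influenced"), ANY)
// Compute the property usage distribution of the graph
val rdf_stats_prop_dist = PropertyUsage(graph, spark).PostProc()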
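To make the two operations on this slide concrete, here is a minimal sketch of wildcard triple matching and a per-property usage count. It uses plain Scala collections instead of Spark, and it is not SANSA's implementation: the `Triple` case class, the `find` and `propertyUsage` helpers, the `Option`-based wildcard, and the sample data are all hypothetical names introduced only for illustration.

```scala
// Illustration only: plain Scala collections stand in for a distributed RDD.
// Nothing here is SANSA's API; every name below is an assumption.
object TriplePatternSketch {
  // Hypothetical minimal triple: subject, predicate, object as plain strings.
  final case class Triple(s: String, p: String, o: String)

  // None plays the role of the ANY wildcard from the slide's find(ANY, URI(...), ANY).
  def find(triples: Seq[Triple],
           s: Option[String] = None,
           p: Option[String] = None,
           o: Option[String] = None): Seq[Triple] =
    triples.filter { t =>
      s.forall(_ == t.s) && p.forall(_ == t.p) && o.forall(_ == t.o)
    }

  // Count how often each predicate occurs: the idea behind a property-usage statistic.
  def propertyUsage(triples: Seq[Triple]): Map[String, Int] =
    triples.groupBy(_.p).map { case (p, ts) => p -> ts.size }

  val influenced = "http://dbpedia.org/ontology/influenced"
  val birthPlace = "http://dbpedia.org/ontology/birthPlace"

  // Made-up sample data, not taken from DBpedia.
  val sample: Seq[Triple] = Seq(
    Triple("ex:Plato", influenced, "ex:Aristotle"),
    Triple("ex:Socrates", influenced, "ex:Plato"),
    Triple("ex:Plato", birthPlace, "ex:Athens")
  )

  def main(args: Array[String]): Unit = {
    // All triples with the "influenced" predicate, any subject and object.
    find(sample, p = Some(influenced)).foreach(println)
    // One line per predicate with its number of occurrences.
    propertyUsage(sample).foreach { case (p, n) => println(s"$p\t$n") }
  }
}
```

On a cluster the same filter and group-by-count would run over a partitioned dataset rather than an in-memory `Seq`, which is where a framework like Spark comes in.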
8.
// Parse an OWL file written in Manchester syntax into an RDD of OWL axioms
val rdd = ManchesterSyntaxOWLAxiomsRDDBuilder.build(spark, "file.owl")
// Keep only the subclass-of axioms
val sco = rdd.filter(_.isInstanceOf[OWLSubClassOfAxiom])
9.
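// Load the input N-Triples file as an RDD of triples
val graphRdd = NTripleReader.load(spark, input)
// Split the graph into partitions that can be exposed as tables
val partitions = RdfPartitionUtilsSpark.partitionGraph(graphRdd)
// Create a SPARQL-to-SQL rewriter over those partitions
val rewriter = SparqlifyUtils.createSparqlSqlRewriter(spark, partitions)
// Build a query execution factory that runs SPARQL queries on Spark
val qef = new QueryExecutionFactorySparqlifySpark(spark, rewriter)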
[Figure: SANSA Engine. RDF Layer: Data Ingestion, Partitioning. Query Layer: Sparqlifying, Views. Other labels: Distributed Data Structures, Results.]
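The Partitioning and Sparqlifying steps named on this slide can be pictured with a small sketch. The slide does not say how the graph is partitioned, so the code assumes one common strategy: grouping triples by predicate so that each predicate becomes a two-column (subject, object) table, which a SPARQL-to-SQL rewriter could then treat as a view. The `Triple` type, `partitionByPredicate`, `lookup`, and the sample data are hypothetical and are not part of SANSA or Sparqlify.

```scala
// Illustration only: predicate-based partitioning with plain Scala collections.
// The partitioning strategy itself is an assumption, not taken from the slides.
object PredicatePartitionSketch {
  // Hypothetical minimal triple: subject, predicate, object as plain strings.
  final case class Triple(s: String, p: String, o: String)

  // One "table" per predicate, holding (subject, object) rows.
  type Tables = Map[String, Seq[(String, String)]]

  // Group triples by predicate and drop the now-redundant predicate column.
  def partitionByPredicate(triples: Seq[Triple]): Tables =
    triples.groupBy(_.p).map { case (p, ts) => p -> ts.map(t => (t.s, t.o)) }

  // Answer the pattern "?s <predicate> ?o" by reading only the matching table,
  // roughly what "SELECT s, o FROM <table for predicate>" would do in SQL.
  def lookup(tables: Tables, predicate: String): Seq[(String, String)] =
    tables.getOrElse(predicate, Seq.empty)

  // Made-up sample data for illustration.
  val sample: Seq[Triple] = Seq(
    Triple("ex:a", "ex:knows", "ex:b"),
    Triple("ex:b", "ex:knows", "ex:c"),
    Triple("ex:a", "ex:name", "Alice")
  )

  def main(args: Array[String]): Unit = {
    val tables = partitionByPredicate(sample)
    // Only the ex:knows table is scanned; the ex:name rows are never touched.
    lookup(tables, "ex:knows").foreach(println)
  }
}
```

The point of such a layout is that a query mentioning a single predicate never has to scan triples belonging to other predicates.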