Building a Graph based RDF Store for Apache Cassandra
Name: Ravindra Ranwala
ID: 138227T
Supervisor: Dr. Amal Shehan Perera
Agenda
● Introduction
● Basic Concepts
● The Problem
● Literature Review
● Methodology
● Demo
● Evaluation and Result
● Conclusion
Introduction
● RDF is used to represent and query data on the Semantic Web.
● Large RDF stores can contain billions, or even trillions, of triples.
● Today RDF data is everywhere: commercial search engines such as Google, Yahoo and Bing have helped RDF data proliferate.
● SPARQL is used as the query language.
● Different approaches exist to build a triple store.
● The main challenges are system scalability and generality.
Basic Concepts - RDF Triple
● An RDF dataset consists of statements of the form (subject, predicate, object).
● The subject has a property, named by the predicate, whose value is the object.
● Example: <Titanic, has award, Best picture>
● The core of the Semantic Web is built on top of the RDF data model.
● These triples can be stored in different ways.
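As a minimal sketch of the triple model, a statement can be held as a three-field record and looked up with a pattern in which `None` acts as a wildcard. This uses plain Python rather than the Jena API named later in the deck; the names `Triple` and `match` and the "directed by" triple are illustrative assumptions.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def match(triples, s=None, p=None, o=None):
    """Return the triples that agree with every non-None field of the pattern."""
    return [
        t for t in triples
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.obj == o)
    ]


store = [
    Triple("Titanic", "has award", "Best picture"),     # example from the slides
    Triple("Titanic", "directed by", "James Cameron"),  # illustrative addition
    Triple("person1", "name", "Mike"),                  # example from the slides
]

# Which awards does Titanic have? Subject and predicate are fixed, object is open.
awards = [t.obj for t in match(store, s="Titanic", p="has award")]
```

A pattern with one open position corresponds to a single SPARQL triple pattern with one variable.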
The Problem
● Apache Cassandra is a NoSQL database with multi-tenant and multi-data-center support.
● Our objective is to build a scalable RDF store for Apache Cassandra.
● Cassandra is used by eBay, Twitter, Cisco, etc.
● An RDF store would significantly increase the value of Cassandra.
● The largest known Cassandra cluster has 300 TB of data across 400 machines.
● This motivates us to build a distributed, scalable RDF store that answers user queries on such data efficiently.
Literature Review - Concepts
● A triple store can be built on top of any DBMS or file system.
● An RDF dataset consists of statements of the form <subject, predicate, object>.
● The subject has a property, named by the predicate, whose value is the object.
● Example: <person1, name, Mike>
● A typical triple store holds many millions or billions of such triples.
● Efficient and scalable management of RDF data is a fundamental challenge.
● SPARQL queries are submitted to the RDF store.
Kai Zeng, Jiacheng Yang, Haixun Wang, Bin Shao, Zhongyuan Wang, "A Distributed Graph Engine for Web Scale RDF Data"
Apache Cassandra
● Distributed, fault-tolerant (i.e. no single point of failure), post-relational NoSQL database system.
● Peer-to-peer distributed architecture. Supports both strict and eventual consistency.
● All nodes are the same. There are no master or slave nodes.
● Uses a read/write-anywhere architecture.
DataStax Corporation. (2011, October) "Welcome to Apache Cassandra 1.0"
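Whether a Cassandra read behaves strictly or eventually consistent depends on how many replicas must acknowledge each read and write. The sketch below encodes the usual overlap rule (a read sees the latest acknowledged write when R + W > N). The function names are assumptions, and nothing here calls a Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def is_strongly_consistent(replication_factor: int, write_acks: int, read_acks: int) -> bool:
    """True when every read replica set must overlap every write replica set,
    i.e. when R + W > N."""
    if not (1 <= write_acks <= replication_factor and 1 <= read_acks <= replication_factor):
        raise ValueError("ack counts must be between 1 and the replication factor")
    return read_acks + write_acks > replication_factor


n = 3
# Quorum writes and quorum reads overlap in at least one replica.
strict = is_strongly_consistent(n, quorum(n), quorum(n))
# Writing to one replica and reading from one replica may miss the latest write.
eventual = not is_strongly_consistent(n, 1, 1)
```

With three replicas, quorum reads and writes give strict consistency, while single-replica reads and writes trade it for lower latency.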
Triple Store - Approaches
● Different approaches exist to manage RDF data.
● Each approach has its own advantages and disadvantages.
Relational Approach
● Triples are stored using the relational model.
Justin J. Levandoski, Mohamed F. Mokbel, "RDF Data-Centric Storage"
Relational Approach (contd.)
● Triple store - yields costly self-joins over a huge RDF store (trillions of triples).
● N-ary table - eliminates the need for joins, but leads to a higher number of nulls.
● Binary tables - reduce null storage, but introduce costly joins.
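To make the trade-off concrete, here is a small sketch that stores the same four triples in three relational layouts: a single triple table, an n-ary table with one column per predicate, and one two-column table per predicate. Apart from <person1, name, Mike>, the sample data is made up, the helper names are assumptions, and a real system would use SQL tables rather than Python containers.

```python
from collections import defaultdict

triples = [
    ("person1", "name", "Mike"),
    ("person1", "age", "30"),
    ("person2", "name", "Ann"),
    ("person2", "email", "ann@example.org"),
]


def name_and_age(rows):
    """Triple table: asking for the name AND age of the same subject
    joins the table with itself on the subject column."""
    return [
        (s1, o1, o2)
        for (s1, p1, o1) in rows if p1 == "name"
        for (s2, p2, o2) in rows if p2 == "age" and s2 == s1
    ]


def to_nary(rows):
    """N-ary table: one row per subject, one column per predicate.
    No join is needed, but every missing value becomes a null (None)."""
    predicates = sorted({p for _, p, _ in rows})
    table = defaultdict(lambda: dict.fromkeys(predicates))
    for s, p, o in rows:
        table[s][p] = o
    return dict(table)


def to_binary(rows):
    """One (subject, object) table per predicate: no nulls are stored,
    but combining two predicates needs a join between their tables."""
    tables = defaultdict(list)
    for s, p, o in rows:
        tables[p].append((s, o))
    return dict(tables)


def count_nulls(nary):
    return sum(value is None for row in nary.values() for value in row.values())
```

On this sample, the n-ary table holds two nulls (person1 has no email, person2 has no age), while the triple table and the per-predicate tables hold none.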
Graph based approaches
● A newer approach that greatly improves the performance of SPARQL query processing.
● Uses graph exploration instead of joins.
● Unnecessary intermediate results can be pruned.
● Models RDF data in its native graph form.
● Examples: Trinity, TripleRush, etc.
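A rough sketch of exploration in place of joins, assuming an in-memory adjacency index keyed by (node, predicate). This is not the Trinity or TripleRush algorithm, only an illustration of how following edges from already-bound nodes keeps intermediate results small. The data and function names are hypothetical.

```python
from collections import defaultdict


def build_graph(triples):
    """Index outgoing edges: (subject, predicate) -> set of objects."""
    out_edges = defaultdict(set)
    for s, p, o in triples:
        out_edges[(s, p)].add(o)
    return out_edges


def explore(out_edges, start_nodes, path):
    """Follow a chain of predicates from the start nodes.

    After each hop only the nodes actually reached are kept, so bindings
    that cannot satisfy the next pattern are dropped immediately instead
    of being carried into a later join.
    """
    frontier = set(start_nodes)
    for predicate in path:
        frontier = {
            target
            for node in frontier
            for target in out_edges.get((node, predicate), ())
        }
        if not frontier:
            break  # nothing left to extend, stop early
    return frontier


graph = build_graph([
    ("Titanic", "directed by", "James Cameron"),
    ("Titanic", "has award", "Best picture"),
    ("Avatar", "directed by", "James Cameron"),
    ("James Cameron", "born in", "Kapuskasing"),
])

# "Where was the director of Titanic born?" as a two-hop exploration.
birthplaces = explore(graph, {"Titanic"}, ["directed by", "born in"])
```

A join-based plan would first materialise every "directed by" and every "born in" row before combining them; the exploration only touches edges reachable from Titanic.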
Trinity RDF
● Graph-based implementation. Models RDF data as a directed graph.
● Subjects and objects are represented as nodes.
● Each predicate is represented as a directed, labelled edge.
● The graph is stored in memory for fast access.
B. Shao, H. Wang, and Y. Li, "The Trinity graph engine," Technical Report 161291, Microsoft Research
Trinity Architecture
● Distributed in-memory key-value store.
● Partitions the RDF graph across multiple machines by hashing on the nodes.
● Each machine holds a disjoint part of the graph.
● The final result is assembled at the proxy.
Kai Zeng, Jiacheng Yang, Haixun Wang, Bin Shao, Zhongyuan Wang, "A Distributed Graph Engine for Web Scale RDF Data"
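The partitioning step can be illustrated with a short sketch. It assumes a CRC32 hash of the node identifier, a fixed machine count, and placement of each triple with its subject node; the slides do not state which hash function or placement rule Trinity actually uses, so treat all three as assumptions.

```python
import zlib
from collections import defaultdict


def machine_for(node_id: str, num_machines: int) -> int:
    """Map a node to a machine with a hash that is stable across runs."""
    return zlib.crc32(node_id.encode("utf-8")) % num_machines


def partition(triples, num_machines):
    """Place each triple on the machine that owns its subject node, so that
    a node's outgoing edges are stored together and partitions are disjoint."""
    parts = defaultdict(list)
    for s, p, o in triples:
        parts[machine_for(s, num_machines)].append((s, p, o))
    return dict(parts)


sample = [
    ("Titanic", "has award", "Best picture"),
    ("Titanic", "directed by", "James Cameron"),
    ("person1", "name", "Mike"),
]
parts = partition(sample, num_machines=4)
```

Because ownership is a pure function of the node identifier, any machine (or the proxy) can compute where a node lives without a lookup table.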
Methodology
● Use case scenarios
○ Populating data into the Cassandra cluster
○ Building the RDF graph
○ Querying the RDF graph
○ Dropping the RDF store
● Technologies used
○ Apache Jena RDF API
○ Struts 2
○ Java/JSP/XSLT/XML/XPath
System Architecture
Demo
Evaluation and Result
● The DBpedia benchmark was used for the comparison.
● The DBpedia geo-coordinates and homepages dataset was used; it accounts for 0.7 million triples.
● The 4store and Bigdata RDF stores were compared with our implementation.
● Queries used
○ Query One: Finds the homepage of the Metropolitan Museum of Art.
○ Query Two: Finds the homepage of Kevin_Bacon.
○ Query Three: Finds all the resources located near Berlin, together with their homepages.
○ Query Four: Finds all the resources located near New York, together with their homepages.
Benchmark Results
● Query complexity increases from Q1 through Q4.
● The table shows the time each RDF store took to execute the four queries above.
● Query execution time is measured in ms.
Store                 Q1 (ms)   Q2 (ms)   Q3 (ms)   Q4 (ms)
Our implementation    216       7         336       279
4store                16        18        455       416
Bigdata               41        30        2355      1600
DBpedia. (2008, Jan 10.) RDF Store Benchmarks with DBpedia [Online]. Available: http://wifo5-03.informatik.uni-mannheim.de/benchmarks-200801/
Benchmarking Results
Benchmarking Analysis
● The graph-based approach yields larger performance gains as queries become more complex.
● Complexity increases gradually from Query 1 to Query 4.
● This implementation outperforms 4store and Bigdata on Q2 through Q4, especially as query complexity increases.
● The first query is slower because it builds the index structure.
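The analysis can be checked against the reported timings with a little arithmetic. The numbers below are copied from the benchmark table (with "2sec, 355ms" and "1sec, 600ms" written as 2355 ms and 1600 ms); the helper name `speedup` is an assumption.

```python
# Execution times in milliseconds, as reported in the benchmark table.
times_ms = {
    "Our implementation": {"Q1": 216, "Q2": 7, "Q3": 336, "Q4": 279},
    "4store": {"Q1": 16, "Q2": 18, "Q3": 455, "Q4": 416},
    "Bigdata": {"Q1": 41, "Q2": 30, "Q3": 2355, "Q4": 1600},
}


def speedup(other: str, query: str) -> float:
    """Ratio of the other store's time to ours; above 1 means we are faster."""
    return times_ms[other][query] / times_ms["Our implementation"][query]


for query in ("Q1", "Q2", "Q3", "Q4"):
    ratios = {store: round(speedup(store, query), 2) for store in ("4store", "Bigdata")}
    print(query, ratios)
```

On Q1 both ratios are below 1, which matches the note that the first query pays for building the index. On Q2 through Q4 both ratios are above 1.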
Future Work
● The main limitation of the approach is scalability.
● Larger datasets lead to an OutOfMemoryError while building the graph model.
● Solution: a distributed implementation.
Conclusion
● Reviewed the approaches used to model and retrieve RDF data.
● Covered newer approaches that manage RDF data efficiently.
● Focused on the graph-based approach.
● Presented a new implementation:
○ Use case scenarios
○ Evaluation and results using the DBpedia dataset
○ Benchmark analysis
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 

Graph based RDF store for Apache Cassandra

  • 1. Building a Graph based RDF Store for Apache Cassandra. Name: Ravindra Ranwala. ID: 138227T. Supervisor: Dr. Amal Shehan Perera
  • 2. Agenda ● Introduction ● Basic Concepts ● The Problem ● Literature Review ● Methodology ● Demo ● Evaluation and Results ● Conclusion
  • 3. Introduction ● RDF is used to support queries on the semantic web. ● RDF stores can contain trillions of triples. ● Today RDF data is everywhere: commercial search engines (e.g. Google, Yahoo, Bing) proliferate RDF data. ● SPARQL is used as the query language. ● Different approaches exist to build a triple store. ● The main challenges are system scalability and generality.
  • 4. Basic Concepts - RDF Triple ● An RDF dataset consists of statements of the form (subject, predicate, object). ● The subject has a property, named by the predicate, whose value is the object. ● Example: <Titanic, has award, Best picture> ● The core of the semantic web is built on top of the RDF data model. ● These triples can be stored in different ways.
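The triple model on this slide can be sketched in a few lines of Python. This is only an illustration: the `Triple` type and the `objects_of` helper are hypothetical names, and the second triple is the example used later in the literature review slide.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: the subject has a predicate whose value is the object."""
    subject: str
    predicate: str
    obj: str


# Example statements taken from the slides.
triples = [
    Triple("Titanic", "has award", "Best picture"),
    Triple("person1", "name", "Mike"),
]


def objects_of(store, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in the store."""
    return [t.obj for t in store if t.subject == subject and t.predicate == predicate]


print(objects_of(triples, "Titanic", "has award"))
```

A linear scan like this is the simplest possible "store"; the approaches in the following slides differ mainly in how they avoid scanning every triple.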
  • 5. The Problem ● Apache Cassandra is a NoSQL, multi-tenant, multi-data-center database. ● Our objective is to build a scalable RDF store for Apache Cassandra. ● Cassandra is used by eBay, Twitter, Cisco, etc. ● An RDF store would add considerably to the value of Cassandra. ● The largest known Cassandra cluster has 300 TB of data over 400 machines. ● This motivates us to build a distributed, scalable RDF store that answers user queries on such data efficiently.
  • 6. Literature Review - Concepts ● A triple store can be built on top of any DBMS or file system. ● An RDF dataset consists of statements of the form <subject, predicate, object>. ● The subject has a property, named by the predicate, whose value is the object. ● Example: <person1, name, Mike> ● A typical triple store holds millions or billions of such triples. ● Efficient and scalable management of RDF data is a fundamental challenge. ● SPARQL queries are submitted to the RDF store. Kai Zeng, Jiacheng Yang, Haixun Wang, Bin Shao, Zhongyuan Wang, "A Distributed Graph Engine for Web Scale RDF Data"
  • 7. Apache Cassandra ● Distributed, fault-tolerant (i.e. no single point of failure), post-relational NoSQL database system. ● Peer-to-peer distributed architecture. Supports both strict and eventual consistency. ● All nodes are the same: there are no master or slave nodes. ● Uses a read/write-anywhere architecture. DataStax Corporation. (2011, October) “Welcome to Apache Cassandra 1.0”
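The choice between strict and eventual consistency in Cassandra is usually made per request: a read is guaranteed to see the latest write when the number of replicas read plus the number of replicas written exceeds the replication factor. The sketch below only shows that arithmetic; the function names are hypothetical and no Cassandra driver is involved.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


rf = 3
q = quorum(rf)
print(q, is_strongly_consistent(q, q, rf))   # quorum reads and quorum writes
print(is_strongly_consistent(1, 1, rf))      # read ONE, write ONE: eventual only
```

With a replication factor of 3, quorum reads and writes touch 2 replicas each, so the read and write sets always share at least one replica.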
  • 8. Triple store approaches ● Different approaches exist to manage RDF data. ● Each approach has its own advantages and disadvantages.
  • 9. Relational Approach ● Triples are stored using the relational model. Justin J. Levandoski, Mohamed F. Mokbel, "RDF Data-Centric Storage"
  • 10. Relational Approach (contd.) ● Triple store: yields costly self joins over a huge RDF store (trillions of triples). ● N-ary table: eliminates the need for joins, but leads to a higher number of nulls. ● A third layout reduces null storage, but introduces costly joins.
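The trade-off on this slide can be reproduced with Python's built-in `sqlite3` module. This is a toy sketch, not the layout used by any of the cited systems: the table names, the `worksFor` predicate and the data (apart from `<person1, name, Mike>`) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Layout 1: a single three-column triples table.
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("person1", "name", "Mike"),
        ("person1", "worksFor", "org1"),
        ("org1", "name", "Acme"),
        ("person2", "name", "Ann"),
    ],
)

# "Name of each person and the name of the organisation they work for."
# Three triple patterns, so the triples table is joined with itself twice.
rows = conn.execute(
    """
    SELECT t1.o AS person_name, t3.o AS org_name
    FROM triples AS t1
    JOIN triples AS t2 ON t2.s = t1.s AND t2.p = 'worksFor'
    JOIN triples AS t3 ON t3.s = t2.o AND t3.p = 'name'
    WHERE t1.p = 'name'
    """
).fetchall()
print(rows)

# Layout 2: one wide row per subject. No joins are needed to read a subject's
# properties, but every property a subject lacks becomes a NULL cell.
conn.execute("CREATE TABLE wide (s TEXT PRIMARY KEY, name TEXT, worksFor TEXT)")
conn.executemany(
    "INSERT INTO wide VALUES (?, ?, ?)",
    [("person1", "Mike", "org1"), ("org1", "Acme", None), ("person2", "Ann", None)],
)
null_cells = conn.execute(
    "SELECT COUNT(*) FROM wide WHERE worksFor IS NULL"
).fetchone()[0]
print(null_cells)
```

Each extra triple pattern in a query adds another self join in the first layout, while each extra predicate adds another mostly empty column in the second.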
  • 11. Graph based approaches ● A newer approach that greatly improves the performance of SPARQL query processing. ● Uses graph exploration instead of joins. ● Unnecessary intermediate results can be pruned. ● Models RDF data in its native graph form. ● Examples: Trinity, TripleRush, etc.
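Graph exploration can be sketched as follows: triples are loaded into adjacency lists, and a query is answered by walking edges from a start node, dropping a branch as soon as it has no matching edge. This is a minimal single-machine illustration, not the algorithm of Trinity or TripleRush; `out_edges`, `explore` and the sample data are hypothetical.

```python
triples = [
    ("person1", "name", "Mike"),
    ("person1", "worksFor", "org1"),
    ("org1", "name", "Acme"),
    ("person2", "name", "Ann"),
]

# Adjacency lists: node -> predicate -> neighbouring nodes.
out_edges = {}
for s, p, o in triples:
    out_edges.setdefault(s, {}).setdefault(p, []).append(o)


def explore(start, path):
    """Follow a sequence of predicates from `start` and return the nodes reached."""
    frontier = [start]
    for predicate in path:
        frontier = [
            neighbour
            for node in frontier
            for neighbour in out_edges.get(node, {}).get(predicate, [])
        ]
        if not frontier:  # nothing matched: prune instead of carrying empty results
            break
    return frontier


print(explore("person1", ["worksFor", "name"]))
print(explore("person2", ["worksFor", "name"]))
```

The intermediate result at each step is only the set of nodes actually reachable, whereas a join-based plan first materialises every triple that matches each pattern.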
  • 12. Trinity RDF ● Graph based implementation. Models RDF as a directed graph. ● Subjects and objects are represented as nodes. ● Each predicate is represented as a directed labelled edge. ● The graph is stored in memory for fast access. B. Shao, H. Wang, and Y. Li, "The Trinity graph engine," Technical Report 161291, Microsoft Research
  • 13. Trinity Architecture ● Distributed in-memory key-value store. ● Partitions the RDF graph across multiple machines by hashing on the nodes. ● Each machine holds a disjoint part of the graph. ● The final result is assembled at the proxy. Kai Zeng, Jiacheng Yang, Haixun Wang, Bin Shao, Zhongyuan Wang, "A Distributed Graph Engine for Web Scale RDF Data"
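Hash partitioning on nodes can be illustrated with a short sketch. It assumes, for simplicity, that each triple is stored on the machine that owns its subject node; the function names and the choice of SHA-256 are illustrative and are not taken from the Trinity papers.

```python
import hashlib


def machine_for(node: str, num_machines: int) -> int:
    """Map a node ID to a machine index with a hash that is stable across runs."""
    digest = hashlib.sha256(node.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_machines


def partition(triples, num_machines):
    """Place each triple on the machine that owns its subject node (an assumption)."""
    parts = [[] for _ in range(num_machines)]
    for s, p, o in triples:
        parts[machine_for(s, num_machines)].append((s, p, o))
    return parts


triples = [
    ("person1", "name", "Mike"),
    ("person1", "worksFor", "org1"),
    ("org1", "name", "Acme"),
    ("person2", "name", "Ann"),
]
parts = partition(triples, 3)
for index, part in enumerate(parts):
    print(index, part)
```

Because the owner of a node is computed from its ID alone, any machine can work out where a neighbouring node lives without consulting a directory, and the partitions are disjoint by construction.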
  • 14. Methodology ● Use case scenarios ○ Loading data into the Cassandra cluster ○ Building the RDF graph ○ Querying the RDF graph ○ Dropping the RDF store ● Technologies used ○ Apache Jena RDF API ○ Struts 2 ○ Java/JSP/XSLT/XML/XPath
  • 17. Evaluation and Results ● The DBpedia benchmark was used for comparison. ● The DBpedia geo-coordinates and homepages dataset was used. It accounts for 0.7 million triples. ● The 4store and Bigdata RDF stores were compared with our implementation. ● Queries used ○ Query One: finds the homepage of the Metropolitan Museum of Art. ○ Query Two: finds the homepage of Kevin_Bacon. ○ Query Three: finds all resources located near Berlin, together with their homepages. ○ Query Four: finds all resources located near New York, together with their homepages.
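The shape of Queries Three and Four (resources near a location, with their homepages) can be sketched over an in-memory triple list. Everything in this example is an assumption made for illustration: the resource names, coordinates and URLs are invented, the predicate names `geo:lat`, `geo:long` and `foaf:homepage` are assumed, and "near" is approximated by a simple box in degrees rather than by the benchmark's actual query.

```python
# Toy data: resource names, coordinates and URLs are invented for illustration.
triples = [
    ("resA", "geo:lat", 52.52), ("resA", "geo:long", 13.40),
    ("resA", "foaf:homepage", "http://example.org/a"),
    ("resB", "geo:lat", 40.71), ("resB", "geo:long", -74.01),
    ("resB", "foaf:homepage", "http://example.org/b"),
    ("resC", "geo:lat", 52.50), ("resC", "geo:long", 13.45),  # no homepage
]


def by_subject(store):
    """Group triples into subject -> {predicate: object}."""
    index = {}
    for s, p, o in store:
        index.setdefault(s, {})[p] = o
    return index


def homepages_near(store, lat, long, box_deg):
    """Homepages of resources whose coordinates fall inside a box around (lat, long)."""
    needed = {"geo:lat", "geo:long", "foaf:homepage"}
    result = {}
    for subject, props in by_subject(store).items():
        if not needed <= props.keys():
            continue  # the resource must match all three triple patterns
        if (abs(props["geo:lat"] - lat) <= box_deg
                and abs(props["geo:long"] - long) <= box_deg):
            result[subject] = props["foaf:homepage"]
    return result


print(homepages_near(triples, 52.52, 13.40, 0.2))
```

Queries One and Two touch a single resource and a single predicate, whereas this kind of query combines three patterns per resource with a range filter, which is why it is the more demanding case.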
  • 18. Benchmark Results ● Query complexity increases from Q1 through Q4. ● The table shows the execution time, in ms, taken by each RDF store to execute the four queries above.

      Store                 Q1     Q2     Q3     Q4
      Our implementation    216    7      336    279
      4store                16     18     455    416
      Bigdata               41     30     2355   1600

    DBpedia. (2008, Jan 10.) RDF Store Benchmarks with DBpedia [Online]. Available: http://wifo5-03.informatik.uni-mannheim.de/benchmarks-200801/
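The reported times can be turned into relative speedups with a few lines of arithmetic. The numbers below are copied from the slide (with "2sec, 355ms" and "1sec, 600ms" written as 2355 ms and 1600 ms); the `speedup` helper is a hypothetical name.

```python
# Execution times in ms for Q1..Q4, as reported on the slide.
times_ms = {
    "Our implementation": [216, 7, 336, 279],
    "4store": [16, 18, 455, 416],
    "Bigdata": [41, 30, 2355, 1600],
}


def speedup(other: str, query_index: int) -> float:
    """How many times faster our implementation is than `other` (below 1 means slower)."""
    return times_ms[other][query_index] / times_ms["Our implementation"][query_index]


for store in ("4store", "Bigdata"):
    ratios = [round(speedup(store, i), 2) for i in range(4)]
    print(store, ratios)
```

A ratio below 1 for Q1 means the other store was faster on that query; the ratios for Q2 through Q4 are all above 1.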
  • 21. Benchmarking Analysis ● The graph based approach yields larger performance gains as queries become more complex. ● Complexity increases gradually from Query 1 to Query 4. ● This implementation outperforms 4store and Bigdata, especially as query complexity increases. ● The first query is slow because it builds the index structure.
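The cost of the first query can be illustrated with a store that builds its index lazily. This is a generic sketch of the pattern, not the code of the implementation described in the slides; the class name, the `builds` counter and the index layout are assumptions.

```python
class LazyGraphStore:
    """Builds its (subject, predicate) index on first use, then reuses it."""

    def __init__(self, triples):
        self._triples = list(triples)
        self._index = None
        self.builds = 0  # counts index builds, to make the one-off cost visible

    def _ensure_index(self):
        if self._index is None:
            index = {}
            for s, p, o in self._triples:
                index.setdefault((s, p), []).append(o)
            self._index = index
            self.builds += 1
        return self._index

    def objects(self, subject, predicate):
        """Objects of all triples matching (subject, predicate, ?)."""
        return self._ensure_index().get((subject, predicate), [])


store = LazyGraphStore([("person1", "name", "Mike"), ("org1", "name", "Acme")])
print(store.objects("person1", "name"))  # first call pays for the index build
print(store.objects("org1", "name"))     # later calls are plain dictionary lookups
print(store.builds)
```

Under this pattern the build cost is charged to whichever query happens to run first, so a benchmark that includes that query in its timings reports it as slow regardless of its complexity.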
  • 22. Future Work ● The main limitation of the approach is scalability. ● Larger datasets cause an OutOfMemory error while the graph model is being built. ● Solution: a distributed implementation.
  • 23. Conclusion ● Approaches used to model and retrieve RDF data. ● New approaches to manage RDF data efficiently. ● Graph based approach. ● New implementation ○ Use case scenarios ○ Evaluation and results using the DBpedia dataset ○ Benchmark analysis