Heart Proposal

Heart (Highly Extensible & Accumulative RDF Table) is an open-source project that intends to design and implement a system that stores and processes large-scale RDF data. Heart is based on Hadoop and HBase, both open-source Apache projects.

Published in: Technology

  1. Highly Extensible & Available RDF Table: Heart
     Edward J. Yoon (edwardyoon@apache.org), Hyunsik Choi (hyunsik.choi@gmail.com)
  2. Background
       • RDF and graph databases have gained increasing attention from both academia and industry.
       • Most RDF and graph datasets are likely to be very large, for example:
           • web data, biological data, and social networks
       • There is no product for large-scale RDF data.
  3. Proposal
       • A system to store and process large-scale RDF data.
       • Our system is built on the distributed frameworks HBase and Hadoop.
           • HBase and Hadoop are Apache projects.
  4. The Main Goals
       • Scalability for large-scale data
       • Support for a variety of queries
  5. Why RDF on HBase?
       • RDF data is very sparse (when the triples are denormalized into one row per subject).
       • HBase supports very sparse data well.
           • Column-based storage
           • Various compression techniques
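To illustrate the sparsity argument, here is a minimal sketch in plain Python (no HBase client involved). It groups triples into one row per subject, with predicates as columns, and measures how many cells a dense table would leave empty. The function names `denormalize` and `sparsity` and the sample triples are hypothetical and are not taken from the Heart proposal.

```python
from collections import defaultdict

def denormalize(triples):
    """Group (subject, predicate, object) triples into one row per subject.

    Each row maps a predicate (the column) to the list of its objects.
    This layout is an assumption for illustration, not Heart's actual schema.
    """
    rows = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        rows[s][p].append(o)
    return {s: dict(cols) for s, cols in rows.items()}

def sparsity(rows):
    """Fraction of (row, column) cells that would be empty in a dense table."""
    columns = {p for cols in rows.values() for p in cols}
    total = len(rows) * len(columns)
    filled = sum(len(cols) for cols in rows.values())
    return 1 - filled / total if total else 0.0

# Hypothetical sample data: 3 subjects, 4 distinct predicates.
triples = [
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", '"Bob"'),
    ("ex:paper1", "dc:title", '"Heart"'),
    ("ex:paper1", "dc:creator", "ex:alice"),
]
rows = denormalize(triples)
print(rows["ex:alice"])
print(f"empty cells: {sparsity(rows):.0%}")
```

Even in this tiny sample, 7 of the 12 cells are empty. A column-oriented store only materializes the cells that exist, which is the property the slide appeals to.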
  6. Architecture
       • Heart consists of the following components:
           • Heart Data Loader
           • Heart Storage Manager
           • Heart Query Processor
               • Heart Query Language
           • Heart Data Materializer
  7. Heart Data Loader
       • Supports bulk insertion
       • Loads RDF data from a very large RDF or Turtle file
       • Organizes the graph data into an HBase table
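A bulk loader of this kind could be sketched as follows. This is only an illustration under stated assumptions: the input is a simplified N-Triples-like format (one whitespace-separated triple per line, terminated by `.`), not full Turtle, and `put_batch` is a hypothetical callback standing in for whatever client would write to the HBase table. The proposal itself does not specify the loader's interface.

```python
from itertools import islice

def parse_ntriples(lines):
    """Yield (subject, predicate, object) from simplified N-Triples lines.

    Assumption: subject and predicate contain no whitespace; everything
    after them (up to the final '.') is the object. Real Turtle needs a
    proper parser.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if not line.endswith("."):
            raise ValueError(f"missing terminating '.': {line!r}")
        subject, predicate, obj = line[:-1].strip().split(None, 2)
        yield subject, predicate, obj.strip()

def bulk_load(lines, put_batch, batch_size=1000):
    """Stream triples to `put_batch` in fixed-size batches; return the count.

    `put_batch` is a hypothetical sink: it receives a list of
    (row key, column, value) = (subject, predicate, object) tuples.
    """
    triples = parse_ntriples(lines)
    loaded = 0
    while True:
        batch = list(islice(triples, batch_size))
        if not batch:
            return loaded
        put_batch(batch)
        loaded += len(batch)

# Usage with an in-memory list standing in for the table.
table = []
sample = [
    '<http://ex.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice Smith" .',
    "# a comment",
    "<http://ex.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://ex.org/bob> .",
    "",
    '<http://ex.org/bob> <http://xmlns.com/foaf/0.1/name> "Bob" .',
]
count = bulk_load(sample, table.append, batch_size=2)
print(count, [len(batch) for batch in table])
```

Because the parser is a generator, the file is never held in memory as a whole, which matters for the "very large file" case the slide mentions.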
  8. Heart Storage Manager
       • Decides how to partition graphs (i.e., triples) according to their topological characteristics.
       • Determines which clients store each partitioned graph.
       • Preserves locality between adjacent subjects.
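The proposal does not name a partitioning algorithm. As one possible reading of "locality between adjacent subjects", the sketch below groups subjects that are connected by a triple (using a breadth-first search over connected components) and then places each group whole on the least-loaded server. Both the grouping rule and the greedy placement are assumptions made for illustration, and `connected_groups`, `assign`, and the server names are hypothetical.

```python
from collections import defaultdict, deque

def connected_groups(triples):
    """Group subjects that are linked by some triple (edges treated as undirected)."""
    subjects = {s for s, _, _ in triples}
    adjacency = defaultdict(set)
    for s, _, o in triples:
        if o in subjects:  # only subject-to-subject edges affect locality
            adjacency[s].add(o)
            adjacency[o].add(s)
    seen, groups = set(), []
    for start in sorted(subjects):
        if start in seen:
            continue
        seen.add(start)
        queue, group = deque([start]), []
        while queue:  # breadth-first search over one connected component
            node = queue.popleft()
            group.append(node)
            for neighbor in sorted(adjacency[node]):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        groups.append(group)
    return groups

def assign(groups, servers):
    """Place each group whole on the currently least-loaded server (greedy)."""
    load = {name: 0 for name in servers}
    placement = {}
    for group in sorted(groups, key=len, reverse=True):
        target = min(servers, key=lambda name: load[name])
        for subject in group:
            placement[subject] = target
        load[target] += len(group)
    return placement

# Hypothetical sample: two disconnected subgraphs.
triples = [
    ("a", "knows", "b"),
    ("b", "knows", "c"),
    ("c", "name", '"C"'),
    ("x", "knows", "y"),
    ("y", "name", '"Y"'),
]
placement = assign(connected_groups(triples), ["node1", "node2"])
print(placement)
```

A real graph is often one giant connected component, so a practical scheme would need a finer-grained cut than this; the sketch only shows the idea of keeping adjacent subjects together.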
  9. Heart Query Processor
       • Executes RDF queries on large-scale RDF data in an HBase table
       • SPARQL query -> Query Parser -> Query Optimizer -> Execution
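The parser, optimizer, and execution stages can be sketched end to end on a toy scale. The code below is not Heart's design: it accepts only a tiny SPARQL-like subset (`SELECT ... WHERE { s p o . s p o }` with whitespace-free terms), orders patterns by a simple fewest-variables-first heuristic, and evaluates them with a nested-loop join over an in-memory list rather than an HBase table. All function names are hypothetical.

```python
def parse(query):
    """Parse 'SELECT ?a WHERE { s p o . s p o }' into (variables, patterns)."""
    head, _, body = query.partition("WHERE")
    variables = head.split()[1:]  # drop the SELECT keyword
    terms = [t for t in body.strip().strip("{}").split() if t != "."]
    if len(terms) % 3:
        raise ValueError("each triple pattern needs exactly three terms")
    patterns = [tuple(terms[i:i + 3]) for i in range(0, len(terms), 3)]
    return variables, patterns

def optimize(patterns):
    """Heuristic: evaluate patterns with fewer variables (more selective) first."""
    return sorted(patterns, key=lambda p: sum(t.startswith("?") for t in p))

def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None  # variable already bound to a different value
        elif term != value:
            return None  # constant term does not match
    return extended

def execute(variables, patterns, triples):
    """Nested-loop join: extend every partial solution with every matching triple."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in triples
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return [tuple(b[v] for v in variables) for b in solutions]

def run(query, triples):
    variables, patterns = parse(query)
    return execute(variables, optimize(patterns), triples)

# Hypothetical sample data and query: names of the people Alice knows.
data = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:carol", "foaf:name", "Carol"),
]
result = run("SELECT ?n WHERE { ?f foaf:name ?n . ex:alice foaf:knows ?f }", data)
print(result)
```

The optimizer moves the pattern with the constant subject to the front, so the join starts from one binding for `?f` instead of scanning every `foaf:name` triple first.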
  10. Heart Query Language
       • Based on SPARQL (from W3C)
       • We intend to extend SPARQL to support a wider variety of graph queries.
  11. Heart Data Materializer
       • Generates index data for RDF sub-graph queries
           • Results in efficient query processing
       • We have to design an efficient index scheme.
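The slide leaves the index scheme as an open design question. Purely as an example of what materialized index data can buy, the sketch below precomputes a (predicate, object) -> subjects index and answers a star-shaped sub-graph query by set intersection instead of scanning all triples. The index layout and the names `build_po_index` and `star_query` are assumptions, not the scheme Heart proposes.

```python
from collections import defaultdict

def build_po_index(triples):
    """Materialize an index from (predicate, object) to the set of subjects."""
    index = defaultdict(set)
    for s, p, o in triples:
        index[(p, o)].add(s)
    return index

def star_query(index, constraints):
    """Return subjects that have every (predicate, object) pair in `constraints`."""
    candidate_sets = [index.get(pair, set()) for pair in constraints]
    if not candidate_sets:
        return set()
    # Intersect starting from the smallest set to keep intermediate results small.
    candidate_sets.sort(key=len)
    result = set(candidate_sets[0])
    for candidates in candidate_sets[1:]:
        result &= candidates
    return result

# Hypothetical sample data.
triples = [
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "ex:worksAt", "ex:acme"),
    ("ex:bob", "rdf:type", "foaf:Person"),
    ("ex:bob", "ex:worksAt", "ex:initech"),
    ("ex:acme", "rdf:type", "ex:Company"),
]
index = build_po_index(triples)
people_at_acme = star_query(
    index, [("rdf:type", "foaf:Person"), ("ex:worksAt", "ex:acme")]
)
print(people_at_acme)
```

The trade-off is the usual one for materialized indexes: extra storage and maintenance work at load time in exchange for cheaper lookups at query time.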
  12. Summary
       • We propose Heart to store and process large-scale RDF data on HBase.
       • The main goals are to provide scalability and support for a variety of queries.
       • It will be one of the Apache Incubator projects.