Highly Extensible & Available RDF Table: Heart
Edward J. Yoon (edwardyoon@apache.org), Hyunsik Choi (hyunsik.choi@gmail.com)
Background
RDF and graph databases have gained increasing attention in both academia and industry. Most RDF and graph datasets, such as web data, biological data, and social networks, are very large. However, there is as yet no product for large-scale RDF data.
Proposal
A system to store and process large-scale RDF data. Our system is built on the distributed frameworks HBase and Hadoop, both of which are Apache projects.
The Main Goals
Scalability for large-scale data.
Support for various queries.
Why RDF on HBase?
RDF data is very sparse (when triples are denormalized per subject).
HBase supports very sparse data well, with column-based storage and various compression techniques.
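The sparse layout described above can be sketched as follows. This is only an illustration of the idea, with plain Python dictionaries standing in for HBase rows rather than the actual HBase API: each subject becomes one row key, each predicate a column, and a subject that lacks a predicate simply stores nothing for that column.

```python
# Sketch: denormalize triples into a sparse "one row per subject" layout,
# mimicking how a column-based store keeps only the columns a row has.
def denormalize(triples):
    """Map (subject, predicate, object) triples to {subject: {predicate: [objects]}}."""
    rows = {}
    for s, p, o in triples:
        rows.setdefault(s, {}).setdefault(p, []).append(o)
    return rows

triples = [
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob",   "foaf:name", '"Bob"'),
]
rows = denormalize(triples)
# ex:bob has no foaf:knows column at all -- the row stays sparse.
```

Because absent predicates consume no space, a table with thousands of distinct predicates across the whole dataset costs nothing per subject that does not use them, which is what makes a column-based store a good fit here.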
Architecture
Heart consists of the following components: Heart Data Loader, Heart Storage Manager, Heart Query Processor, Heart Query Language, and Heart Data Materializer.
Heart Data Loader
For bulk insertion: loads RDF data from a very large RDF/XML or Turtle file and organizes the graph data into an HBase table.
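The parsing half of the bulk-load path can be sketched with a minimal reader for N-Triples, a line-based subset of Turtle. This is a hedged sketch, not the Heart loader itself; a real loader would use a full Turtle parser and handle escapes and multi-line constructs.

```python
import re

# Sketch: parse N-Triples lines into (subject, predicate, object) tuples
# ready to be organized into rows. Each statement is one line ending in ".".
TRIPLE_RE = re.compile(r'^(\S+)\s+(\S+)\s+(.+?)\s*\.\s*$')

def parse_ntriples(lines):
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        m = TRIPLE_RE.match(line)
        if m:
            yield m.group(1), m.group(2), m.group(3)

data = [
    '# a comment line',
    '<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .',
]
triples = list(parse_ntriples(data))
```

Streaming line by line like this matters for bulk insertion: a very large input file never has to fit in memory before rows are emitted.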
Heart Storage Manager
Concerned with how to partition graphs (i.e., triples) according to their topological characteristics. Determines which nodes store the given partitioned graphs, providing locality between adjacent subjects.
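One way to obtain locality between adjacent subjects, sketched below under the assumption of range partitioning (the scheme HBase itself uses to split a table into regions by sorted row key): subjects that sort near each other fall into the same contiguous partition. The split keys here are invented for illustration.

```python
import bisect

# Sketch: range-partition subjects by sorted split keys, as HBase does
# with regions. Lexicographically adjacent subjects land in the same
# partition, giving the locality between adjacent subjects noted above.
def make_partitioner(split_keys):
    """split_keys must be sorted; partition i holds subjects < split_keys[i]."""
    def partition_of(subject):
        return bisect.bisect_right(split_keys, subject)
    return partition_of

part = make_partitioner(["ex:g", "ex:p"])  # hypothetical split keys
# ex:alice and ex:bob sort adjacently, so they share partition 0.
```

A hash partitioner would balance load more evenly but would scatter adjacent subjects across nodes, which is why a range scheme is the more natural sketch for this goal.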
Heart Query Processor
Executes RDF queries over large-scale RDF data in an HBase table: SPARQL query -> Query Parser -> Query Optimizer -> Execution.
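The execution stage of that pipeline can be sketched as matching a single SPARQL-style triple pattern against stored triples, with terms beginning with "?" treated as variables. The parser and optimizer stages are elided; this is an assumption about the simplest possible executor, not Heart's actual design.

```python
# Sketch: evaluate one triple pattern such as (?x, foaf:knows, ?y),
# yielding one variable-binding dict per matching triple.
def match_pattern(pattern, triples):
    for triple in triples:
        binding = {}
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.get(term, value) != value:
                    ok = False  # same variable bound to two different values
                    break
                binding[term] = value
            elif term != value:
                ok = False  # constant term does not match
                break
        if ok:
            yield binding

triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob",   "foaf:knows", "ex:carol"),
]
results = list(match_pattern(("?x", "foaf:knows", "?y"), triples))
```

A full SPARQL basic graph pattern is a join of several such pattern matches on their shared variables, and join ordering is exactly what the optimizer stage would decide.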
Heart Query Language
Based on SPARQL (from the W3C), but we intend to extend SPARQL to support a wider variety of graph queries.
Heart Data Materializer
Generates index data for RDF subgraph query results, enabling efficient query processing. An efficient index scheme has yet to be designed.
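Since the index scheme is still to be designed, the following is only one candidate sketched for illustration: an inverted index from (predicate, object) pairs to the subjects holding them, so a lookup by value avoids scanning every row.

```python
# Sketch: materialize a (predicate, object) -> {subjects} inverted index.
# With rows keyed by subject, this answers "which subjects have this
# predicate-object pair?" without a full table scan.
def build_po_index(triples):
    index = {}
    for s, p, o in triples:
        index.setdefault((p, o), set()).add(s)
    return index

triples = [
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:bob",   "rdf:type", "foaf:Person"),
    ("ex:bob",   "foaf:knows", "ex:alice"),
]
idx = build_po_index(triples)
```

The trade-off is the usual one for materialized indexes: faster value lookups in exchange for extra storage and index maintenance on every insertion.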
Summary
We propose Heart to store and process large-scale RDF data on HBase. The main goals are scalability and support for various queries. We aim for it to become an Apache Incubator project.