Heart Proposal

Heart (Highly Extensible & Accumulative RDF Table) is an open-source project that aims to design and implement a system that stores and processes large-scale RDF data. Heart is based on Hadoop and HBase, both open-source Apache projects.

Published in: Technology

Transcript

  • 1. Highly Extensible & Available RDF Table: Heart Edward J. Yoon (edwardyoon@apache.org), Hyunsik Choi (hyunsik.choi@gmail.com)
  • 2. Background
    • RDF and graph databases have gained increasing attention from both academia and industry.
    • Most RDF and graph datasets are likely to be very large, for example:
      • web data, biological data, and social networks
    • There is no product for large-scale RDF data.
  • 3. Proposal
    • A system to store and process large-scale RDF data.
    • Our system is based on the distributed frameworks HBase and Hadoop.
      • HBase and Hadoop are Apache projects
  • 4. The Main Goals
    • Scalability for large-scale data
    • Support for various queries
  • 5. Why RDF on HBase?
    • RDF data is very sparse (when RDF triples are denormalized into one row per subject)
    • HBase supports very sparse data well.
      • Column-based storage
      • Various compression techniques
  • 6. Architecture
    • Heart consists of the following components:
      • Heart Data Loader
      • Heart Storage Manager
      • Heart Query Processor
        • Heart Query Language
      • Heart Data Materializer
  • 7. Heart Data Loader
    • For bulk insertion
    • Loads RDF data from a very large RDF or Turtle file
    • Organizes the graph data into an HBase table
  • 8. Heart Storage Manager
    • Concerned with how to partition graphs (i.e., triples) according to their topological characteristics.
    • Determines which clients store the given partitioned graphs.
    • Provides locality between adjacent subjects
  • 9. Heart Query Processor
    • Executes RDF queries on large-scale RDF data in an HBase table
    • SPARQL query -> Query Parser -> Query Optimizer -> Execution
  • 10. Heart Query Language
    • Based on SPARQL (from W3C)
    • However, we intend to extend SPARQL to support a wider variety of graph queries.
  • 11. Heart Data Materializer
    • Generates index data for RDF sub-graph queries
      • Results in efficient query processing
    • We have to design an efficient index scheme.
  • 12. Summary
    • We propose Heart to store and process large-scale RDF data on HBase.
    • The main goals are scalability and support for various queries.
    • It will be one of the Apache Incubator projects.
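
The sparsity argument on slide 5 and the loader step on slide 7 can be illustrated with a small sketch. The slides do not specify Heart's actual table schema, so everything below is an assumption made for illustration: the sample triples, the layout (row key = subject, column = predicate, cell = list of objects), and the helper names denormalize and match are all hypothetical. A plain Python dict stands in for an HBase table; no HBase client is used.

```python
from collections import defaultdict

# Hypothetical sample triples (subject, predicate, object); not from the slides.
TRIPLES = [
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:knows", "ex:carol"),
    ("ex:bob", "foaf:name", '"Bob"'),
    ("ex:bob", "foaf:mbox", "mailto:bob@example.org"),
]


def denormalize(triples):
    """Group triples by subject: one row per subject, one column per predicate.

    Assumed layout only; a subject simply has no entry for predicates
    it does not use, which is what makes the table sparse.
    """
    table = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        table[s][p].append(o)
    # Convert nested defaultdicts to plain dicts so lookups never create rows.
    return {s: dict(cols) for s, cols in table.items()}


def match(table, subject=None, predicate=None):
    """Yield (s, p, o) triples matching an optional subject and predicate."""
    rows = [subject] if subject is not None else sorted(table)
    for s in rows:
        for p, objs in table.get(s, {}).items():
            if predicate is None or p == predicate:
                for o in objs:
                    yield (s, p, o)


table = denormalize(TRIPLES)
names = list(match(table, predicate="foaf:name"))
```

In this layout the row for ex:alice has no foaf:mbox column at all, rather than an empty cell, which is the kind of sparsity a column-oriented store is meant to handle cheaply. A lookup by subject touches a single row, while a lookup by predicate alone has to scan every row, which hints at why slide 11 calls for additional index data.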