Heart Proposal
 


Heart (Highly Extensible & Accumulative RDF Table) is an open-source project that aims to design and implement a system for storing and processing large-scale RDF data. Heart is based on Hadoop and HBase, both open-source Apache projects.


    Heart Proposal: Presentation Transcript

    • Highly Extensible & Available RDF Table: Heart Edward J. Yoon (edwardyoon@apache.org), Hyunsik Choi (hyunsik.choi@gmail.com)
    • Background
      • RDF and graph databases have gained increasing attention from both academia and industry.
      • Most RDF and graph datasets are likely to be very large, for example:
        • web data, biological data, and social networks
      • There is no product for large-scale RDF data.
    • Proposal
      • A system to store and process large-scale RDF data.
      • Our system is built on the distributed frameworks HBase and Hadoop.
        • HBase and Hadoop are Apache projects
    • The Main Goals
      • Scalability for large-scale data
      • Support for a variety of queries
    • Why RDF on HBase?
      • RDF data is very sparse (when triples are denormalized into one row per subject)
      • HBase supports very sparse data well.
        • Column-based storage
        • Various compression techniques
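The sparsity argument above can be illustrated with a minimal Python sketch. The sample triples, the `denormalize` and `sparsity` helpers, and the in-memory dictionary standing in for an HBase table are all assumptions made for illustration; the slides do not specify Heart's actual schema.

```python
from collections import defaultdict

# Hypothetical sample triples (subject, predicate, object); not from the slides.
TRIPLES = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:bob", "ex:worksAt", "ex:acme"),
]


def denormalize(triples):
    """Group triples into one row per subject, with one column per predicate."""
    rows = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        rows[s][p].append(o)
    return {s: dict(cols) for s, cols in rows.items()}


def sparsity(rows):
    """Fraction of cells that would be empty in a dense subject x predicate table."""
    predicates = {p for cols in rows.values() for p in cols}
    total = len(rows) * len(predicates)
    filled = sum(len(cols) for cols in rows.values())
    return 1 - filled / total


ROWS = denormalize(TRIPLES)
```

Each subject uses only a few of the predicates that occur in the dataset, so a dense table would consist mostly of empty cells. A column-oriented store only keeps the cells that exist.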
    • Architecture
      • Heart consists of the following components:
        • Heart Data Loader
        • Heart Storage Manager
        • Heart Query Processor
          • Heart Query Language
        • Heart Data Materializer
    • Heart Data Loader
      • For bulk insertion
      • Loads RDF data from a very large RDF or Turtle file
      • Organizes the graph data into an HBase table
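As a rough sketch of what bulk loading might involve, the Python below parses a simplified N-Triples-style input and groups the results into batches of cells. The regular expression, the `p:` column-family prefix, and the `to_puts` helper are illustrative assumptions, not Heart's actual loader; real RDF/XML or Turtle input would need a full parser.

```python
import re

# Matches a simplified N-Triples line: <s> <p> <o> .  or  <s> <p> "literal" .
# Assumption: no blank nodes, language tags, datatypes, or escaped quotes.
LINE = re.compile(r'^\s*<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"([^"]*)")\s*\.\s*$')


def parse_line(line):
    """Return (subject, predicate, object), or None if the line does not match."""
    m = LINE.match(line)
    if m is None:
        return None
    s, p, iri, literal = m.groups()
    return s, p, iri if iri is not None else literal


def to_puts(lines, batch_size=2):
    """Yield batches of (row_key, column, value) cells for bulk insertion."""
    batch = []
    for line in lines:
        triple = parse_line(line)
        if triple is None:
            continue  # skip comments, blank lines, and unsupported syntax
        s, p, o = triple
        # Hypothetical layout: subject as row key, predicate as column qualifier.
        batch.append((s, "p:" + p, o))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


SAMPLE = [
    '<http://ex.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .',
    "# a comment line",
    "<http://ex.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://ex.org/bob> .",
    '<http://ex.org/bob> <http://xmlns.com/foaf/0.1/name> "Bob" .',
]
BATCHES = list(to_puts(SAMPLE))
```

Batching keeps the number of round trips to the store small. A production loader would also have to decide how to represent predicates that carry several values for the same subject.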
    • Heart Storage Manager
      • Concerned with how to partition graphs (i.e., sets of triples) according to their topological characteristics.
      • Determines which clients store the given partitioned graphs.
      • Provides locality between adjacent subjects
    • Heart Query Processor
      • Executes RDF queries on large-scale RDF data in an HBase table
      • SPARQL query -> Query Parser -> Query Optimizer -> Execution
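The parser, optimizer, and execution stages named above can be sketched in a few lines of Python. This is a toy under stated assumptions: the query syntax is a bare list of triple patterns rather than real SPARQL, the optimizer is a naive heuristic, and execution is a nested-loop join over an in-memory list. None of it is taken from Heart itself.

```python
# Hypothetical in-memory data standing in for triples stored in a table.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("bob", "name", "Bob"),
    ("carol", "name", "Carol"),
]


def parse(query):
    """Parse 's p o . s p o' into 3-term patterns; terms starting with '?' are variables."""
    patterns = [tuple(part.split()) for part in query.split(" . ")]
    if any(len(pattern) != 3 for pattern in patterns):
        raise ValueError("each pattern needs exactly three terms")
    return patterns


def optimize(patterns):
    """Order patterns so those with fewer variables run first (a naive selectivity guess)."""
    return sorted(patterns, key=lambda pat: sum(term.startswith("?") for term in pat))


def match(pattern, triple, binding):
    """Return the binding extended so that pattern equals triple, or None on a mismatch."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def execute(patterns, triples):
    """Nested-loop join: extend every partial binding with every matching triple."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in triples:
                result = match(pattern, triple, binding)
                if result is not None:
                    extended.append(result)
        bindings = extended
    return bindings


def run(query, triples=TRIPLES):
    return execute(optimize(parse(query)), triples)
```

For example, `run("?x knows ?y . ?y name ?n")` returns one binding per person who knows someone with a name. A distributed engine would replace the nested loops with scans and joins over the stored table.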
    • Heart Query Language
      • Based on SPARQL (from W3C)
      • However, we intend to extend SPARQL to support a wider variety of graph queries.
    • Heart Data Materializer
      • Generates index data for RDF subgraph queries
        • Results in efficient query processing
      • An efficient index scheme still has to be designed.
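The slides leave the index scheme open. Purely as an illustration of what materialized index data could look like, the sketch below builds three permuted indexes (subject-first, predicate-first, object-first), a technique commonly used in triple stores. It is a swapped-in example and not Heart's design; all names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample triples; not from the slides.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("dave", "knows", "carol"),
    ("carol", "name", "Carol"),
]


def build_indexes(triples):
    """Materialize three orderings so lookups can start from any bound term."""
    spo, pos, osp = defaultdict(set), defaultdict(set), defaultdict(set)
    for s, p, o in triples:
        spo[s].add((p, o))
        pos[p].add((o, s))
        osp[o].add((s, p))
    return spo, pos, osp


def subjects_with(pos, predicate, obj):
    """Answer '?s predicate obj' from the predicate-first index without a full scan."""
    return sorted(s for o, s in pos.get(predicate, ()) if o == obj)


SPO, POS, OSP = build_indexes(TRIPLES)
```

The trade-off is extra storage and load-time work in exchange for cheaper lookups, which is the general motivation the Data Materializer slide gives.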
    • Summary
      • We propose Heart to store and process large-scale RDF data on HBase.
      • The main goals are scalability and support for a variety of queries.
      • It will be one of the Apache Incubator Projects.