Tilani Gunawardena
 Introduction Architecture             and Design Example application Demostration Scenario
   Managing and analysing massive data    ◦ Provides high performance    ◦ Scales over clusters of thousands of heterogen...
   Database Connector - connects Hadoop with    the single-node database systems.   Data Loader - partitions data and ma...
 Supports any JDBC-compliant database server  as an underlying DBMS layer Applications built on top of HadoopDB  general...
   A semantic web/biological data analysis    application.   A business data warehousing application.
   Semantic web is an effort by the W3C to    enable integration and sharing of data across    dierent applications   RD...
   Find all proteins whose existence in the    `Human organism is uncertain   SPARQL query :
   demonstrate    ◦ how the data administrator should prepare the      dataset.   Analyst- is shielded from the complexi...
   Natural target application for HadoopDB.   Common business data warehousing    workloads are read-mostly and involve ...
   Find 10 highest-revenue unshipped orders   Query :
   Audience is invited to query both data sets    through HadoopDB   Data sets are located in a remote cluster   Multip...
 user   selects dataset SemanticWeb—Biological Data Analysis    - An animation of the behind-the-scenes data    preparat...
 In addition demonstrate HadoopDBs fault-  tolerance with the introduction of a node  failure. For a subset of the prede...
Thank You!
HadoopDB in Action
HadoopDB in Action: Building Real World Applications

Published in: Education
Notes
  • Versatile: system flexibility
  • Key components of HadoopDB
  • HadoopDB therefore pushes computation closer to the data (into the data tier) to achieve maximum parallelization in a multi-node cluster. The complexity of the data tier and its parallel nature are hidden from the application developer.
  • Universal Protein Resource. The presentation layer consists of a web-based interface where analysts specify queries and view results. The logic layer consists of a SPARQL-to-SQL conversion tool. The logic and data layers communicate through JDBC.
  • The presentations give our audience an idea of the effort required for data preparation in HadoopDB.
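The idea of pushing computation into the data tier can be illustrated with a minimal sketch. It is not HadoopDB code: in-memory SQLite databases stand in for the single-node database systems, and the `sales` table, its columns, and the function names are hypothetical. Each node runs the aggregation as SQL locally, and only the small partial results are merged afterwards.

```python
import sqlite3
from collections import Counter

def make_node(rows):
    """Create an in-memory SQLite database standing in for one node's local DBMS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")  # hypothetical schema
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn

def local_aggregate(conn):
    """Map side: the aggregation runs as SQL inside the node's own database."""
    return conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()

def merge_partials(partials):
    """Reduce side: combine the per-node partial sums into global totals."""
    totals = Counter()
    for rows in partials:
        for region, subtotal in rows:
            totals[region] += subtotal
    return dict(totals)

# Two toy "nodes", each holding a different partition of the data.
nodes = [
    make_node([("east", 10.0), ("west", 5.0)]),
    make_node([("east", 2.5), ("north", 4.0)]),
]
result = merge_partials(local_aggregate(n) for n in nodes)
```

Only one row per region leaves each node, which is the point of doing the work where the data lives; the application code calling `merge_partials` never sees how the rows were partitioned.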
  • Transcript of "HadoopDB in Action"

    1. Tilani Gunawardena
    2. • Introduction
       • Architecture and Design
       • Example application
       • Demonstration Scenario
    3. • Managing and analysing massive data
         ◦ Provides high performance
         ◦ Scales over clusters of thousands of heterogeneous machines
         ◦ Versatile: adapts to analytical queries of varying complexity
       How does one build real-world applications with HadoopDB?
    4. • Database Connector: connects Hadoop with the single-node database systems.
       • Data Loader: partitions data and manages parallel loading of data into the database systems.
       • Catalog: tracks locations of different data chunks, including those replicated across multiple nodes.
       • SQL-MapReduce-SQL (SMS) planner: extends Hive to provide a SQL interface to HadoopDB.
    5. • Supports any JDBC-compliant database server as an underlying DBMS layer
       • Applications built on top of HadoopDB generally use the 3-tier architecture
         ◦ data tier
         ◦ business logic tier
         ◦ presentation tier
       • From the application's perspective, HadoopDB is a black box
    6. • A semantic web/biological data analysis application.
       • A business data warehousing application.
    7. • The Semantic Web is an effort by the W3C to enable integration and sharing of data across different applications
       • RDF is a directed, labeled graph data format for representing information on the Web
       • SPARQL is an RDF query language
    8. • Find all proteins whose existence in the human organism is uncertain
       • SPARQL query:
    9. • Demonstrate how the data administrator should prepare the dataset.
       • The analyst is shielded from the complexity of the actual implementation of the RDF storage layer.
    10. • Natural target application for HadoopDB.
        • Common business data warehousing workloads are read-mostly and involve analytical queries over a complex schema
        • To achieve good query performance, the dataset requires significant preparation through data partitioning and replication to optimize for join queries
        • Data and queries: TPC-H benchmark
    11. • Find the 10 highest-revenue unshipped orders
        • Query:
    12. • The audience is invited to query both data sets through HadoopDB
        • The data sets are located in a remote cluster
        • Multi-user interaction: two client machines connect to the clusters.
    13. • The user selects a dataset
          ◦ Semantic web/biological data analysis: an animation of the behind-the-scenes data preparation and loading is presented, with details on the tools used for data conversion from RDF to relational form.
          ◦ Business data warehousing: the animation provides details on the partitioning scheme, the interaction between the loader and catalog components, and a summary of the configuration parameters.
        • The user selects and parametrizes a query to execute
        • The user can then monitor the progress of query execution
    14. • In addition, the demonstration shows HadoopDB's fault tolerance by introducing a node failure.
        • For a subset of the predefined queries, as the query executes in the background, an animation of the flow of data and control through the HadoopDB system is presented at the same time, highlighting which parts of the query execution run in parallel.
    15. Thank You!
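The roles of the data loader and the catalog, and how replication lets a query survive a node failure, can be sketched in a few lines. This is an illustrative toy under stated assumptions, not HadoopDB's actual loader or catalog format: the node names, the round-robin replica placement, the CRC32 hash, and every function name here are hypothetical.

```python
import zlib
from collections import defaultdict

def chunk_of(key, num_chunks):
    """Deterministic hash partitioning: the same key always maps to the same chunk."""
    return zlib.crc32(str(key).encode("utf-8")) % num_chunks

def build_catalog(num_chunks, nodes, replicas=2):
    """Map each chunk id to the nodes holding a copy (round-robin placement).

    Assumes replicas <= len(nodes) so that the copies land on distinct nodes.
    """
    return {
        c: [nodes[(c + r) % len(nodes)] for r in range(replicas)]
        for c in range(num_chunks)
    }

def load(records, key_field, num_chunks):
    """Group records by chunk id, as a loader might before bulk-loading each node."""
    chunks = defaultdict(list)
    for rec in records:
        chunks[chunk_of(rec[key_field], num_chunks)].append(rec)
    return chunks

def locate(catalog, chunk_id, failed=frozenset()):
    """Return the first live node holding the chunk, skipping failed nodes."""
    for node in catalog[chunk_id]:
        if node not in failed:
            return node
    raise LookupError(f"no live replica for chunk {chunk_id}")

nodes = ["node-a", "node-b", "node-c"]
catalog = build_catalog(num_chunks=6, nodes=nodes, replicas=2)
orders = [{"order_id": i, "total": 100 + i} for i in range(12)]
chunks = load(orders, "order_id", num_chunks=6)

# With node-a marked as failed, the lookup falls back to the other replica.
fallback = locate(catalog, 0, failed={"node-a"})
```

Partitioning two tables on the same join key with the same `chunk_of` function would place matching rows in the same chunk, which is one way to read the slides' remark that partitioning is chosen to optimize for join queries.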