SQL-H: A New Way to Enable SQL
Analytics on Hadoop
Sushil Thomas
June 2012
Outline


•    HCatalog primer
•    Aster primer
•    SQL-H definition and features
•    SQL-H example usage




2      Confidential and proprietary. Copyright © 2011 Teradata Corporation.
HCatalog Primer
•  HCatalog provides table management and storage
   management for Apache Hadoop
    -  Provides a shared schema and data type mechanism
    -  Provides a table abstraction so that users need not be concerned
       with where or how their data is stored
    -  Provides interoperability across data processing tools such as Pig,
       MapReduce, Streaming, and Hive


•  Uses Hive-like DDL commands. Supports tables, views,
   partitions.

•  Provides parallel load and store interfaces

•  Agnostic to file format of stored data
    -  Currently supports RCFile, CSV text, JSON text, and SequenceFile

HCatalog Primer: Example Syntax

CREATE EXTERNAL TABLE apachelog (
       host STRING, identity STRING, user STRING,
       time STRING, request STRING, status STRING,
       size STRING, referer STRING, agent STRING)
ROW FORMAT
SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "([^ ]*) …")
STORED AS TEXTFILE
LOCATION 'hdfs://data/apachelogs';

Note: This is run via HCatalog interfaces to record the format of data
stored in HDFS for later use by Hive, Pig, etc. This is not run on the Aster
system.
HCatalog Primer: Read Flow (Hadoop Job
Submission)


    Job Controller  --- Table Name, Partitions --->  HCatalog Server
    Job Controller  <--------- Splits -------------  HCatalog Server
                                                     (on the HCatalog Server Node)




HCatalog Primer: Read Flow (Hadoop Job
Execution)

Processing Nodes (running Hive, Pig or MR jobs)

    Source Data --> Split --> Map Task --> Tuples
    Source Data --> Split --> Map Task --> Tuples
    Source Data --> Split --> Map Task --> Tuples
    …




Aster Primer

•  Queen Node
    -  SQL-MapReduce
    -  SQL Engine: Parser, Optimizer, Executor

•  Worker Nodes
    -  Each worker node runs multiple ARC Engine / Data Partition pairs
    -  Inter Cluster Express (shown alongside each worker node)

Aster SQL-H

•  Direct access to HCatalog data within AsterDB
    -  HCatalog tables available without duplicating DDL commands on
       the Aster side


•  HCatalog tables are first class objects within AsterDB
    -  Full support for all SQL operators


•  We use the HCatalog interfaces to read tuples in parallel on all
   data nodes




Aster Reads From HCatalog (Planning)



    Query Planning Phase

    Aster Optimizer  --- Table Name, Partitions --->  HCatalog Server
    Aster Optimizer  <--------- Splits -------------  HCatalog Server
                                                      (on the HCatalog Server Node)

Aster Reads From HCatalog (Execution)


    Execution Phase On A Single Worker Node

    HDFS Data Nodes --> Split, Split --> Tuples --> ARC Engine / Data Partition
    HDFS Data Nodes --> Split, Split --> Tuples --> ARC Engine / Data Partition
    HDFS Data Nodes --> Split, Split --> Tuples --> ARC Engine / Data Partition
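
The two phases can be pictured with a small Python model. This is only a sketch under stated assumptions: the `catalog` dictionary, the split ids, and the round-robin assignment are all hypothetical, since the slides do not say how Aster hands splits to workers. It shows a table name and partition list being resolved into splits during planning, then each worker taking its own share of splits to read in parallel.

```python
from collections import defaultdict

def plan_splits(catalog, table, partitions):
    """Planning phase: resolve a table name and partition list into splits."""
    return [split for p in partitions for split in catalog[table][p]]

def assign_splits(splits, num_workers):
    """Execution phase setup: deal splits out round-robin (an assumed policy)."""
    assignment = defaultdict(list)
    for i, split in enumerate(splits):
        assignment[i % num_workers].append(split)
    return dict(assignment)

# Hypothetical catalog: table -> partition value -> list of split ids
catalog = {
    "apachelogs": {
        "2012-06-09": ["s0", "s1"],
        "2012-06-10": ["s2", "s3", "s4"],
    }
}

splits = plan_splits(catalog, "apachelogs", ["2012-06-09", "2012-06-10"])
assignment = assign_splits(splits, num_workers=2)
```

Each worker then reads only the splits in its own list, which is why no single node has to pull the whole table.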

Features – Simple and Comprehensive Support

•  Interactions with HCatalog master server and HDFS only
     -  No MapReduce slots used
     -  Hadoop system can be used for other activity simultaneously


•  Aster runs native HCatalog InputReader code for translating
   HCatalog table names into input splits, and then getting data
   from input splits
     -  No impedance mismatch between the two systems
     -  Everything supported by HCatalog interfaces is supported in Aster


•  Changes made on HCatalog are reflected immediately on the
   Aster side
     -  New tables, modified schemas, new partitions etc. are available
        immediately. No extra steps required.


Features - Usability

•  Full integration with BI tools
     -  Tableau, MicroStrategy (MSTR), etc. now work seamlessly with data in Hadoop


•  Data in Hadoop can now be joined with relational data in your
   Aster system
     -  Previously, using data from multiple systems involved complex ETL
        tasks


•  Full SQL support
     -  HCatalog table data can be inserted into a SQL flow just like native
        table data


•  If desired, provides a load pipeline into Aster from Hadoop


Features – Teradata Aster Analytical Foundation

•  Full suite of Aster Analytical Foundation functions available for
   data in Hadoop
     -  Time-Series/Path Analysis
     -  Statistical Analysis
     -  Relational Analysis
     -  Text Analysis
     -  Clustering Analysis
     -  Data Transformations


•  Makes users productive faster

•  Spend time analyzing data, not building functionality and tools



Features - Performance

•  Partition pruning is transparently supported
     -  select * from hadoop_weblogs where ds='2012-06-10'
       •  If "hadoop_weblogs" is partitioned on 'ds', then this command will only
          scan data in this particular partition


•  Performance Notes
     -  Data transfer is required, but the network may not be your
        bottleneck. Time taken for the initial data read may be a small part
        of overall query performance
     -  Aster’s native SQL execution engine is a lot faster than Hive’s MR
        based execution engine
     -  As queries get complex, performance advantage increases
     -  If required, the impact on the Hadoop system and on network
        bandwidth usage can be tuned down
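
To make the partition pruning point concrete, here is a toy Python model. It is a sketch only: the in-memory `hadoop_weblogs` dictionary and the `select_where_ds` helper are hypothetical stand-ins, not Aster or HCatalog code. The table is stored as one bucket of rows per `ds` value, and a filter on `ds` touches only the matching bucket.

```python
# Toy partitioned table: value of the partition column 'ds' -> rows stored under it
hadoop_weblogs = {
    "2012-06-09": [{"userid": "u1", "page": "/home"}],
    "2012-06-10": [{"userid": "u2", "page": "/cart"},
                   {"userid": "u3", "page": "/home"}],
    "2012-06-11": [{"userid": "u4", "page": "/home"}],
}

def select_where_ds(table, ds):
    """Read only the partition matching ds; other partitions are never opened."""
    scanned = [p for p in table if p == ds]
    rows = [dict(row, ds=p) for p in scanned for row in table[p]]
    return rows, scanned

rows, scanned = select_where_ds(hadoop_weblogs, "2012-06-10")
```

Here `scanned` holds a single partition, so the amount of data transferred depends on the size of that partition rather than on the whole table.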



Example SQL Syntax – Remote Catalog
beehive=> extl host=hcatalog1.asterdata.com
List of databases
 Name
----------
 prod
 testdb
(2 rows)

beehive=> extd host=hcatalog1.asterdata.com database=prod
List of tables
 Name
---------
 apachelogs
 movieratings
(2 rows)

Example SQL Syntax – Remote Catalog
beehive=> extd host=hcatalog1.asterdata.com database=prod table=movieratings
Table "prod"."movieratings"
 Name    | Type   | Partitioned Column
---------+--------+--------------------
 userid  | string | f
 movieid | int    | f
 rating  | double | f
 ds      | string | t
(4 rows)




Example SQL Syntax – HCatalog Data Access

SELECT * FROM load_from_hcatalog(
          ON mr_driver
          server('hcatalog1.asterdata.com')
          dbname('prod')
          tablename('movieratings')
          columns('userid', 'movieid', 'rating'));


CREATE VIEW hadoop_weblogs AS
            SELECT * FROM load_from_hcatalog(
                     ON mr_driver
                     . . .);




Example SQL Syntax – Data Load From HCatalog


CREATE TABLE aster_weblogs DISTRIBUTE BY HASH(userid) AS
             SELECT * FROM hadoop_weblogs;
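
What DISTRIBUTE BY HASH(userid) implies for the loaded rows can be sketched in Python. The hash function here (CRC32) and the partition count are assumptions chosen for illustration; the slides do not state which hash Aster uses. The point is only that rows with the same `userid` always land in the same data partition.

```python
import zlib

def partition_for(key, num_partitions):
    """Pick a partition from a stable hash of the key (CRC32 is an assumed choice)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def distribute(rows, key, num_partitions):
    """Place each row in the partition chosen by hashing row[key]."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[partition_for(row[key], num_partitions)].append(row)
    return parts

# Hypothetical rows read from the hadoop_weblogs view
rows = [
    {"userid": "u1", "page": "/home"},
    {"userid": "u2", "page": "/cart"},
    {"userid": "u1", "page": "/checkout"},
]
parts = distribute(rows, "userid", num_partitions=4)
```

Because placement depends only on the key, later operations grouped by `userid` can work on each partition independently.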




Example SQL Syntax – Partition Pruning
beehive=> extd host=hcatalog1.asterdata.com database=prod table=movieratings
Table "prod"."movieratings"
 Name    | Type   | Partitioned Column
---------+--------+--------------------
 userid  | string | f
 movieid | int    | f
 rating  | double | f
 ds      | string | t
(4 rows)


-- Because 'ds' is a partitioned column, the query below
-- will only pull in data from the '2011-06-10' partition
SELECT * FROM hadoop_movieratings
          WHERE ds='2011-06-10';
Example SQL Join Syntax – Complex Queries


-- Join example

select t1.name, t2.page_url, t1.price
from
   aster_product t1,
   hadoop_weblogs t2
where t1.product_id=t2.product_id;




Example SQL-MapReduce Syntax
-- Find all the sessions with a particular page visit pattern where
-- at least 3 products have been checked out during the session

SELECT * FROM npath(
      ON hadoop_weblogs
      PARTITION BY sessionid ORDER BY clicktime
      MODE(nonoverlapping)
      PATTERN('h.h*.d*.c{3,}.d')
      SYMBOLS(pagetype = 'home' as h, pagetype = 'checkout' as c,
              pagetype <> 'home' and pagetype <> 'checkout' as d)
      RESULT(first(sessionid of c) as sessionid,
             max_choose(productprice, productname of c) as most_expensive,
             max(productprice of c) as max_price,
             min_choose(productprice, productname of c) as least_expensive,
             min(productprice of c) as min_price))
ORDER BY sessionid;
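
The matching logic of the query above can be approximated in plain Python for a single session. This is a rough sketch, not nPath itself: it assumes that encoding each row as one letter and applying an ordinary regular expression mirrors the PATTERN and SYMBOLS clauses, and the `match_session` helper and sample rows are hypothetical.

```python
import re

# nPath's PATTERN('h.h*.d*.c{3,}.d') uses '.' to mean "followed by";
# with one letter per row, the same shape is this regular expression.
PATTERN = re.compile(r"hh*d*c{3,}d")

def symbol(pagetype):
    """Map a row to its SYMBOLS letter: h = home, c = checkout, d = anything else."""
    if pagetype == "home":
        return "h"
    if pagetype == "checkout":
        return "c"
    return "d"

def match_session(clicks):
    """Return one result dict per non-overlapping match within a single session."""
    ordered = sorted(clicks, key=lambda row: row["clicktime"])  # ORDER BY clicktime
    encoded = "".join(symbol(row["pagetype"]) for row in ordered)
    results = []
    for m in PATTERN.finditer(encoded):  # finditer yields non-overlapping matches
        checkouts = [r for r in ordered[m.start():m.end()]
                     if r["pagetype"] == "checkout"]
        priciest = max(checkouts, key=lambda r: r["productprice"])
        cheapest = min(checkouts, key=lambda r: r["productprice"])
        results.append({
            "most_expensive": priciest["productname"],
            "max_price": priciest["productprice"],
            "least_expensive": cheapest["productname"],
            "min_price": cheapest["productprice"],
        })
    return results

# Hypothetical session: home, a product page, three checkouts, another page
session = [
    {"clicktime": 1, "pagetype": "home", "productname": None, "productprice": None},
    {"clicktime": 2, "pagetype": "product", "productname": None, "productprice": None},
    {"clicktime": 3, "pagetype": "checkout", "productname": "lamp", "productprice": 10.0},
    {"clicktime": 4, "pagetype": "checkout", "productname": "desk", "productprice": 30.0},
    {"clicktime": 5, "pagetype": "checkout", "productname": "chair", "productprice": 20.0},
    {"clicktime": 6, "pagetype": "product", "productname": None, "productprice": None},
]
result = match_session(session)
```

A session with fewer than three consecutive checkouts produces no match, which corresponds to the `c{3,}` part of the pattern.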


Example BI Tool Usage – Path Analysis on Data
Stored in Aster and Hadoop




 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
CatarinaPereira64715
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 

Recently uploaded (20)

FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 

SQL-H a new way to enable SQL analytics

  • 1. SQL-H: A New Way to Enable SQL Analytics on Hadoop (Sushil Thomas, June 2012)
  • 2. Outline
      •  HCatalog primer
      •  Aster primer
      •  SQL-H definition and features
      •  SQL-H example usage
      Confidential and proprietary. Copyright © 2011 Teradata Corporation.
  • 3. HCatalog Primer
      •  HCatalog provides table management and storage management for Apache Hadoop
          -  Provides a shared schema and data type mechanism
          -  Provides a table abstraction so that users need not be concerned with where or how their data is stored
          -  Provides interoperability across data processing tools such as Pig, MapReduce, Streaming, and Hive
      •  Uses Hive-like DDL commands. Supports tables, views, and partitions.
      •  Provides parallel load and store interfaces
      •  Agnostic to the file format of stored data
          -  Currently supports RCFile, CSV text, JSON text, and SequenceFile
  • 4. HCatalog Primer: Example Syntax
      CREATE EXTERNAL TABLE apachelog (
             host STRING, identity STRING, user STRING,
             time STRING, request STRING, status STRING,
             size STRING, referer STRING, agent STRING)
      ROW FORMAT
      SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
      WITH SERDEPROPERTIES ("input.regex" = "([^]*) …")
      STORED AS TEXTFILE
      LOCATION 'hdfs://data/apachelogs';

      Note: This is run via HCatalog interfaces to record the format of data stored in HDFS for later use by Hive, Pig, etc. It is not run on the Aster system.
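As a rough illustration of what the RegexSerDe declaration on slide 4 does: each capture group in `input.regex` becomes one column of the table. The slide elides the actual pattern, so the pattern below is a hypothetical one for a common Apache access-log layout, not the presenter's. The names `LOG_PATTERN`, `COLUMNS`, and `parse_log_line` are also illustrative only.

```python
import re

# Column names taken from the CREATE EXTERNAL TABLE statement on the slide.
COLUMNS = ["host", "identity", "user", "time", "request",
           "status", "size", "referer", "agent"]

# Hypothetical pattern (an assumption, not the elided one from the slide):
# one capture group per column, in column order.
LOG_PATTERN = re.compile(
    r'(\S+) (\S+) (\S+) \[([^\]]*)\] "([^"]*)" (\S+) (\S+) "([^"]*)" "([^"]*)"'
)


def parse_log_line(line):
    """Return a dict of column name -> captured string, or None if no match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return dict(zip(COLUMNS, match.groups()))


sample = ('192.0.2.10 - alice [10/Jun/2012:13:55:36 -0700] '
          '"GET /index.html HTTP/1.1" 200 2326 '
          '"http://example.com/start" "Mozilla/5.0"')
row = parse_log_line(sample)
```

Every column comes back as a string, which is consistent with the all-STRING schema in the DDL; any casting would happen later in the query.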
  • 5. HCatalog Primer: Read Flow (Hadoop Job Submission)
      [Diagram: the Job Controller sends the table name and partitions to the HCatalog Server on the HCatalog Server Node and receives splits in return.]
  • 6. HCatalog Primer: Read Flow (Hadoop Job Execution)
      [Diagram: on the processing nodes (running Hive, Pig or MR jobs), each map task reads one split of the source data and produces tuples.]
  • 7. Aster Primer
      [Architecture diagram. Labels shown: Queen Node, Parser, Optimizer, Executor, Worker Nodes, SQL-MapReduce, SQL Engine, ARC, Data Engine, Partition, Inter Cluster Express.]
  • 8. Aster SQL-H
      •  Direct access to HCatalog data within AsterDB
          -  HCatalog tables are available without duplicating DDL commands on the Aster side
      •  HCatalog tables are first-class objects within AsterDB
          -  Full support for all SQL operators
      •  We use the HCatalog interfaces to read tuples in parallel on all data nodes
  • 9. Aster Reads From HCatalog (Planning)
      [Diagram, query planning phase: the Aster Optimizer sends the table name and partitions to the HCatalog Server on the HCatalog Server Node and receives splits in return.]
  • 10. Aster Reads From HCatalog (Execution)
      [Diagram, execution phase on a single worker node: splits on the HDFS data nodes are read as tuples into each ARC / Data Engine / Partition unit.]
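The two-phase read described on slides 9 and 10 (resolve a table name and partitions to splits, then read the splits in parallel) can be sketched as follows. This is a toy model under stated assumptions, not the HCatalog or Aster API: `CATALOG`, `plan_splits`, `read_split`, and `scan` are hypothetical names, and a split is modelled as a plain list of tuples.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the catalog: table -> partition -> list of splits.
# Each split is simply a list of tuples.
CATALOG = {
    "movieratings": {
        "ds=2012-06-09": [[("u1", 10, 4.0)], [("u2", 11, 3.5)]],
        "ds=2012-06-10": [[("u3", 10, 5.0), ("u4", 12, 2.0)]],
    }
}


def plan_splits(table, partitions=None):
    """Planning phase: resolve a table name (and optional partitions) to splits."""
    parts = CATALOG[table]
    wanted = partitions if partitions is not None else sorted(parts)
    return [split for name in wanted for split in parts[name]]


def read_split(split):
    """Execution phase: one reader turns one split into tuples."""
    return list(split)


def scan(table, partitions=None, workers=3):
    """Plan the splits once, then read them with a pool of parallel readers."""
    splits = plan_splits(table, partitions)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in split order, so the output is deterministic.
        chunks = pool.map(read_split, splits)
        return [row for chunk in chunks for row in chunk]


all_rows = scan("movieratings")
```

Planning touches only metadata, while the data itself moves only in the execution phase, which is the separation the two diagrams draw.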
  • 11. Features – Simple and Comprehensive Support
      •  Interactions with the HCatalog master server and HDFS only
          -  No MapReduce slots used
          -  The Hadoop system can be used for other activity simultaneously
      •  Aster runs native HCatalog InputReader code to translate HCatalog table names into input splits, and then to get data from those input splits
          -  No impedance mismatch between the two systems
          -  Everything supported by HCatalog interfaces is supported in Aster
      •  Changes made on HCatalog are reflected immediately on the Aster side
          -  New tables, modified schemas, new partitions, etc. are available immediately. No extra steps required.
  • 12. Features - Usability
      •  Full integration with BI tools
          -  Tableau, MSTR, etc. now work with data in Hadoop seamlessly
      •  Data in Hadoop can now be joined with relational data in your Aster system
          -  Previously, using data from multiple systems involved complex ETL tasks
      •  Full SQL support
          -  HCatalog table data can be inserted into a SQL flow just like native table data
      •  If desired, provides a load pipeline into Aster from Hadoop
  • 13. Features – Teradata Aster Analytical Foundation
      •  Full suite of Aster Analytical Foundation functions available for data in Hadoop
          -  Time-Series/Path Analysis
          -  Statistical Analysis
          -  Relational Analysis
          -  Text Analysis
          -  Clustering Analysis
          -  Data Transformations
      •  Makes users productive faster
      •  Spend time analyzing data, not building functionality and tools
  • 14. Features - Performance
      •  Partition pruning is transparently supported
          -  select * from hadoop_weblogs where ds='2012-06-10'
      •  If "hadoop_weblogs" is partitioned on 'ds', then this command will only scan data in this particular partition
      •  Performance Notes
          -  Data transfer is required, but the network may not be your bottleneck. The time taken for the initial data read may be a small part of overall query performance
          -  Aster's native SQL execution engine is a lot faster than Hive's MR-based execution engine
          -  As queries get more complex, the performance advantage increases
          -  If required, the impact on the Hadoop system and network bandwidth usage can be tuned down
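A minimal sketch of the partition-pruning idea from slide 14, assuming data is stored as one group of rows per value of the partition column `ds`. The `PARTITIONS` layout and the `select_where_ds` function are hypothetical and only count how many rows each strategy has to scan.

```python
# Hypothetical layout: one list of rows per value of the partition column "ds".
PARTITIONS = {
    "2012-06-09": [{"userid": "u1", "url": "/a"}, {"userid": "u2", "url": "/b"}],
    "2012-06-10": [{"userid": "u3", "url": "/a"}],
    "2012-06-11": [{"userid": "u4", "url": "/c"}, {"userid": "u5", "url": "/d"}],
}


def select_where_ds(ds_value, prune=True):
    """Return (matching rows, number of rows scanned) for WHERE ds = ds_value."""
    if prune:
        # Only the partition named by the predicate is opened.
        candidates = {ds_value: PARTITIONS.get(ds_value, [])}
    else:
        # Without pruning, every partition is read and filtered afterwards.
        candidates = PARTITIONS
    scanned = 0
    result = []
    for ds, rows in candidates.items():
        for row in rows:
            scanned += 1
            if ds == ds_value:
                result.append(dict(row, ds=ds))
    return result, scanned


pruned_rows, pruned_scanned = select_where_ds("2012-06-10")
full_rows, full_scanned = select_where_ds("2012-06-10", prune=False)
```

Both calls return the same rows; the difference is only in how much data is read to produce them.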
  • 15. Example SQL Syntax – Remote Catalog
      beehive=> extl host=hcatalog1.asterdata.com
      List of databases
         Name
      ----------
       prod
       testdb
      (2 rows)

      beehive=> extd host=hcatalog1.asterdata.com database=prod
      List of tables
          Name
      --------------
       apachelogs
       movieratings
      (2 rows)
  • 16. Example SQL Syntax – Remote Catalog
      beehive=> extd host=hcatalog1.asterdata.com database=prod table=movieratings
      Table "prod"."movieratings"
        Name   |  Type   | Partitioned Column
      ---------+---------+--------------------
       userid  | string  | f
       movieid | int     | f
       rating  | double  | f
       ds      | string  | t
      (4 rows)
  • 17. Example SQL Syntax – HCatalog Data Access
      SELECT * FROM load_from_hcatalog(
          ON mr_driver
          server('hcatalog1.asterdata.com')
          dbname('prod')
          tablename('student')
          columns('userid', 'movieid', 'rating'));

      CREATE VIEW hadoop_weblogs AS
      SELECT * FROM load_from_hcatalog(
          ON mr_driver
          . . .);
  • 18. Example SQL Syntax – Data Load From HCatalog
      CREATE TABLE aster_weblogs DISTRIBUTE BY HASH(userid) AS
      SELECT * FROM hadoop_weblogs;
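To make the `DISTRIBUTE BY HASH(userid)` clause on slide 18 concrete, here is a small sketch of hash distribution: each row goes to the partition chosen by hashing its distribution key, so rows with the same `userid` always land together. The hash function Aster actually uses is not stated in the slides; `zlib.crc32` is an assumption chosen only because it is stable across runs, and `partition_for` and `distribute` are hypothetical names.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a distribution key to a partition index with a stable hash."""
    # crc32 is an illustrative choice, not the database's real hash function.
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def distribute(rows, key, num_partitions):
    """Group rows into num_partitions buckets by the hash of row[key]."""
    buckets = [[] for _ in range(num_partitions)]
    for row in rows:
        buckets[partition_for(row[key], num_partitions)].append(row)
    return buckets


rows = [
    {"userid": "u1", "movieid": 10},
    {"userid": "u2", "movieid": 11},
    {"userid": "u1", "movieid": 12},
]
buckets = distribute(rows, "userid", 4)
```

Co-locating rows by key is what lets later joins or aggregations on `userid` run without moving data between partitions.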
  • 19. Example SQL Syntax – Partition Pruning
      beehive=> extd host=hcatalog1.asterdata.com database=prod table=movieratings
      Table "prod"."movieratings"
        Name   |  Type   | Partitioned Column
      ---------+---------+--------------------
       userid  | string  | f
       movieid | int     | f
       rating  | double  | f
       ds      | string  | t
      (4 rows)

      // Because 'ds' is a partitioned column, the query below
      // will only pull in data from the '2011-06-10' partition
      SELECT * FROM hadoop_movieratings
      WHERE ds='2011-06-10';
  • 20. Example SQL Join Syntax – Complex Queries
      // Join example

      select t1.name, t2.page_url, t1.price
      from
          aster_product t1,
          hadoop_weblogs t2
      where t1.product_id=t2.product_id;
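The join on slide 20 can be tried locally with Python's built-in `sqlite3` module. This is only a stand-in: both tables live in one in-memory SQLite database here, whereas in the deck `hadoop_weblogs` is backed by HCatalog and `aster_product` is a native Aster table. The column types and sample rows are invented for illustration, and an `ORDER BY` is added so the result order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE aster_product (product_id INTEGER, name TEXT, price REAL);
    CREATE TABLE hadoop_weblogs (product_id INTEGER, page_url TEXT);
    INSERT INTO aster_product VALUES (1, 'lamp', 19.5);
    INSERT INTO aster_product VALUES (2, 'desk', 120.0);
    INSERT INTO hadoop_weblogs VALUES (1, '/p/lamp');
    INSERT INTO hadoop_weblogs VALUES (1, '/p/lamp?ref=ad');
    INSERT INTO hadoop_weblogs VALUES (3, '/p/gone');
""")

# Same shape as the slide's query: an inner join on product_id.
joined = conn.execute("""
    SELECT t1.name, t2.page_url, t1.price
    FROM aster_product t1, hadoop_weblogs t2
    WHERE t1.product_id = t2.product_id
    ORDER BY t2.page_url
""").fetchall()
conn.close()
```

Products with no page views and page views with no matching product both drop out, as expected for an inner join.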
  • 21. Example SQL-MapReduce Syntax
      // Find all the sessions with a particular page visit pattern where
      // at least 3 products have been checked out during the session

      SELECT * FROM npath(
          ON hadoop_weblogs
          PARTITION BY sessionid ORDER BY clicktime
          MODE(nonoverlapping)
          PATTERN('h.h*.d*.c{3,}.d')
          SYMBOLS(pagetype = 'home' as h, pagetype='checkout' as c,
                  pagetype<>'home' and pagetype<>'checkout' as d)
          RESULT(first(sessionid of c) as sessionid,
                 max_choose(productprice, productname of c) as most_expensive,
                 max(productprice of c) as max_price,
                 min_choose(productprice, productname of c) as least_expensive,
                 min(productprice of c) as min_price))
      ORDER BY sessionid;
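The pattern matching in the npath call on slide 21 can be approximated in a few lines of Python. This sketch assumes that `.` in the pattern means "followed by", so `h.h*.d*.c{3,}.d` becomes the ordinary regular expression `hh*d*c{3,}d` over a string of one-letter symbols. It is not how npath is implemented, it computes only the `max_price` and `min_price` outputs, and `symbol_for`, `match_sessions`, and the sample clicks are hypothetical.

```python
import re
from itertools import groupby


def symbol_for(pagetype):
    """Map a page type to the one-letter symbol defined in SYMBOLS(...)."""
    if pagetype == "home":
        return "h"
    if pagetype == "checkout":
        return "c"
    return "d"


# 'h.h*.d*.c{3,}.d' with the sequence operator '.' dropped (an assumption).
PATTERN = re.compile(r"hh*d*c{3,}d")


def match_sessions(clicks):
    """clicks: dicts with sessionid, clicktime, pagetype, productprice."""
    ordered = sorted(clicks, key=lambda r: (r["sessionid"], r["clicktime"]))
    results = []
    # groupby plays the role of PARTITION BY sessionid ORDER BY clicktime.
    for sessionid, group in groupby(ordered, key=lambda r: r["sessionid"]):
        rows = list(group)
        symbols = "".join(symbol_for(r["pagetype"]) for r in rows)
        # finditer yields non-overlapping matches, similar to MODE(nonoverlapping).
        for m in PATTERN.finditer(symbols):
            prices = [r["productprice"] for r in rows[m.start():m.end()]
                      if r["pagetype"] == "checkout"]
            results.append({"sessionid": sessionid,
                            "max_price": max(prices),
                            "min_price": min(prices)})
    return results


def click(sessionid, clicktime, pagetype, productprice=None):
    return {"sessionid": sessionid, "clicktime": clicktime,
            "pagetype": pagetype, "productprice": productprice}


clicks = [
    click("s1", 1, "home"), click("s1", 2, "product"),
    click("s1", 3, "checkout", 5), click("s1", 4, "checkout", 20),
    click("s1", 5, "checkout", 12), click("s1", 6, "product"),
    click("s2", 1, "home"), click("s2", 2, "checkout", 9),
    click("s2", 3, "product"),
]
matches = match_sessions(clicks)
```

Session `s1` has the symbol string `hdcccd` and matches; session `s2` has only one checkout (`hcd`) and does not.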
  • 22. Example BI Tool Usage – Path Analysis on Data Stored in Aster and Hadoop
  • 23. Example BI Tool Usage – Path Analysis on Data Stored in Aster and Hadoop