SQL-on-Accumulo with Pivotal HAWQ and PXF
Agenda
• HAWQ & PXF Overview
• Accumulo Connector - Usage
• Accumulo Connector - Advanced Features
• PXF API
• Demo
HAWQ is…
A parallel SQL query engine on Hadoop
[Diagrams: PHD]
PXF is...
A fast, extensible framework connecting
HAWQ to a data store of choice that
exposes a parallel API
[Diagram: PHD, direct analytics, PXF]
[Diagram: PHD, indirect analytics, PXF]
Usage
CREATE EXTERNAL TABLE <table>(<col list>)
LOCATION ('pxf://rest_host:port/<data source>?<plugin options>')
FORMAT '<type>'(<params>)
[LOG ERRORS INTO <err_t> SEGMENT REJECT LIMIT <n> [ROWS|PERCENT]]
-- direct analytics (external)
SELECT <…> FROM <table> WHERE <…>
-- indirect analytics (internal)
INSERT INTO <hawq table> SELECT <…> FROM <table> WHERE <…>
Any SQL operation (joins, aggregates, sorting, etc.) can be executed.
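The LOCATION string in the template above is an ordinary URI, so it can be assembled programmatically. The following is a minimal Python sketch of that assembly; the function name `pxf_location` and the host and port values are hypothetical and only illustrate the shape of the string, not any PXF tooling.

```python
from urllib.parse import urlencode


def pxf_location(host, port, data_source, **options):
    """Assemble a pxf:// LOCATION string: pxf://host:port/<data source>?<plugin options>."""
    # urlencode turns keyword options into "key=value&key=value".
    return f"pxf://{host}:{port}/{data_source}?{urlencode(options)}"


# Hypothetical host and port, used only to show the resulting layout.
location = pxf_location("rest_host", 12345, "sales", profile="accumulo")
print(location)
```

The resulting string is what would go between the quotes of the LOCATION clause.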
Accumulo Connector - Usage
CREATE EXTERNAL TABLE <table>(<col list>)
LOCATION ('pxf://…/<accumulo table name>?profile=accumulo')
FORMAT 'custom'(formatter='pxfwritable_import')
CREATE EXTERNAL TABLE t(
recordkey text,
"cf1:date" date,
"cf1:price" double precision)
LOCATION ('pxf://…/instance:sales?profile=accumulo')
FORMAT 'custom'(formatter='pxfwritable_import')
-- Example of a simple query
SELECT "cf1:date", max("cf1:price")
FROM t
GROUP BY "cf1:date"
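To make the column naming above concrete, here is a small Python sketch of how Accumulo key/value entries could be pivoted into the rows that table t exposes. It assumes that recordkey corresponds to the Accumulo row ID and that each "family:qualifier" column name selects one cell; the sample entries and the helper `to_rows` are hypothetical.

```python
from collections import defaultdict

# Hypothetical Accumulo entries: (row ID, column family, column qualifier, value).
entries = [
    ("r1", "cf1", "date", "2014-06-01"),
    ("r1", "cf1", "price", "9.50"),
    ("r2", "cf1", "date", "2014-06-01"),
    ("r2", "cf1", "price", "12.00"),
]


def to_rows(entries, columns):
    """Group entries by row ID and emit one dict per row, keyed by SQL column name."""
    grouped = defaultdict(dict)
    for row_id, family, qualifier, value in entries:
        grouped[row_id][f"{family}:{qualifier}"] = value
    rows = []
    for row_id in sorted(grouped):
        row = {"recordkey": row_id}
        for col in columns:
            row[col] = grouped[row_id].get(col)  # a missing cell becomes None (NULL)
        rows.append(row)
    return rows


rows = to_rows(entries, ["cf1:date", "cf1:price"])

# Equivalent of: SELECT "cf1:date", max("cf1:price") FROM t GROUP BY "cf1:date"
max_price = {}
for row in rows:
    day, price = row["cf1:date"], float(row["cf1:price"])
    max_price[day] = max(max_price.get(day, price), price)
print(max_price)
```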
Accumulo Connector - Advanced Features
Smart filtering with predicate pushdown
Irrelevant tablets are excluded and values are filtered at the source, according to the
WHERE clause of the HAWQ query.
Error tables for logging badly formatted data without aborting the query
Specify the desired error threshold. Query the error table after the operation to see the
rejected data and the related errors.
Lookup table for easy access to non-textual qualifiers
Define a qualifier lookup table that translates between Accumulo-style naming and SQL-style
naming.
Automatic statistics for better join planning
Run ANALYZE on a PXF-Accumulo table to update HAWQ’s optimizer with table-level and
attribute-level statistics from the Accumulo table.
Mechanism for storing remote credentials
The mapping between a HAWQ user’s credentials and an Accumulo user’s credentials is
entered once in HAWQ and automatically transferred to the Accumulo connector at
runtime.
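The error-table feature listed above can be pictured with a short Python sketch: badly formatted rows are set aside together with their error message, and the load aborts only once a reject limit is reached. This is a simplified, single-process illustration with hypothetical names (`load`, `parse_sale`, `RejectLimitReached`); it does not reproduce HAWQ's per-segment counting or its PERCENT mode.

```python
from datetime import date


class RejectLimitReached(Exception):
    """Raised when the number of rejected rows reaches the configured limit."""


def parse_sale(raw):
    """Parse 'key,YYYY-MM-DD,price'; raises ValueError on malformed input."""
    key, day, price = raw.split(",")
    return key, date.fromisoformat(day), float(price)


def load(raw_rows, parse, reject_limit):
    """Return (good rows, error rows); abort once reject_limit bad rows are seen."""
    good, errors = [], []
    for raw in raw_rows:
        try:
            good.append(parse(raw))
        except ValueError as exc:
            errors.append((raw, str(exc)))  # the "error table": raw data plus reason
            if len(errors) >= reject_limit:
                raise RejectLimitReached(f"{len(errors)} rows rejected")
    return good, errors


good, errors = load(
    ["r1,2014-06-01,9.50", "r2,not-a-date,12.00"], parse_sale, reject_limit=5
)
print(len(good), "loaded;", len(errors), "rejected")
```

With a limit of 5 the single bad row is logged and the load completes; with a limit of 1 the same input would abort.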
Accumulo Connector - Advanced Features
Visibility labels for enhanced security
The Accumulo connector uses Accumulo’s built-in cell-level security to ensure that users
can view only the information to which they have been granted access.
Custom iterators for increased performance
Predicate pushdown is implemented with stackable custom iterators, which speed up
comparison operations (<, <=, >, >=, ==, !=) in a query’s WHERE clause.
Intelligent range filtering
A comparison on recordkey narrows the Accumulo connector’s scan range,
minimizing the amount of data scanned and resulting in faster scans.
Automatic type detection
Data types are detected automatically within the iterator, ensuring that the correct
comparison operations are used.
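Range filtering and stacked comparison filters, as described above, can be sketched in a few lines of Python. The sketch assumes an in-memory table sorted by record key (Accumulo keeps its keys sorted); the `scan` function, the sample data, and the filter tuples are hypothetical and stand in for the connector's real iterators.

```python
import bisect
import operator

# The six comparison operators named on the slide.
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq, "!=": operator.ne}

# Hypothetical table contents, sorted by record key.
TABLE = [
    ("r1", {"cf1:price": 9.5}),
    ("r2", {"cf1:price": 12.0}),
    ("r3", {"cf1:price": 7.25}),
    ("r4", {"cf1:price": 30.0}),
]


def scan(table, key_low=None, key_high=None, filters=()):
    """Scan only keys in [key_low, key_high]; apply every (column, op, value) filter."""
    keys = [key for key, _ in table]
    start = 0 if key_low is None else bisect.bisect_left(keys, key_low)
    stop = len(keys) if key_high is None else bisect.bisect_right(keys, key_high)
    matches, scanned = [], 0
    for key, cells in table[start:stop]:
        scanned += 1
        # Filters are stacked: a row survives only if every comparison holds.
        if all(OPS[op](cells[col], value) for col, op, value in filters):
            matches.append(key)
    return matches, scanned


# WHERE recordkey >= 'r2' AND "cf1:price" > 10
matches, scanned = scan(TABLE, key_low="r2", filters=[("cf1:price", ">", 10)])
print(matches, scanned)
```

The record-key comparison shrinks the scanned range before any value is inspected, while the value comparison is evaluated next to the data rather than after transfer to the query engine.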
PXF API
• Fragmenter – returns a list of data source fragments and their locations
• Accessor – accesses a given list of fragments, reads them, and returns records
• Resolver – deserializes each record according to a given schema or technique
[Diagram: distributed execution threads, distributed database servers]
PXF API
• AccumuloFragmenter
returns a list of Accumulo tablets and their locations for a given table
• AccumuloAccessor
accesses a given list of fragments, reads them, and returns Accumulo records, using filter
pushdown when possible
• AccumuloResolver
converts each qualifier value into a form that HAWQ can understand
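The division of labour among the three components above can be sketched as a small Python pipeline. The classes below are hypothetical stand-ins that mirror the roles described on the slides; they are not the PXF Java API, and the tablet names, hosts, and byte values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    tablet: str  # hypothetical tablet identifier
    host: str    # server assumed to host that tablet


# Hypothetical cluster state: tablet -> (host, raw entries stored as bytes).
TABLETS = {
    "sales;a": ("host1", [(b"r1", b"cf1:price", b"9.50")]),
    "sales;m": ("host2", [(b"r7", b"cf1:price", b"12.00")]),
}


class Fragmenter:
    def fragments(self):
        """Return the list of fragments (tablets) together with their locations."""
        return [Fragment(tablet, host) for tablet, (host, _) in sorted(TABLETS.items())]


class Accessor:
    def read(self, fragment):
        """Read one fragment and yield its raw records."""
        yield from TABLETS[fragment.tablet][1]


class Resolver:
    def resolve(self, record):
        """Convert a raw record into typed values the SQL engine can use."""
        key, column, value = record
        return {"recordkey": key.decode(), column.decode(): float(value.decode())}


rows = [
    Resolver().resolve(record)
    for fragment in Fragmenter().fragments()
    for record in Accessor().read(fragment)
]
print(rows)
```

Because each fragment carries its own location, the read and resolve steps for different fragments are independent of each other, which is what allows them to run in parallel.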
Live Demo
Accumulo Table Contents
User Authorizations
$PHD_ROOT/conf/pxf-profiles.xml
Define Table in HAWQ
Setting Authorizations
Executing a Simple Query
A Query With a Single Pushdown Filter
A Query With Multiple Pushdown Filters
Setting Authorizations
Executing a Query as ‘foo’
Define a Lookup Table in Accumulo
Define a Lookup Table in HAWQ
Performing a Simple Query

Accumulo Summit 2014: SQL-on-Accumulo with Pivotal HAWQ and PXF
