SlideShare a Scribd company logo
1 of 27
Download to read offline
1 ©	Hortonworks	 Inc.	2011	– 2017.	All	Rights	Reserved
Apache	Spark	– Apache	HBase Connector
Feature	Rich	and	Efficient	Access	to	HBase
through	Spark	SQL
Weiqing Yang	
Mingjie Tang	
October,	2017
2 ©	Hortonworks	 Inc.	2011	– 2017.	All	Rights	Reserved
About	Authors
à Weiqing Yang
• Contribute	to	Apache	Spark,	Apache	Hadoop,	Apache	HBase,	Apache	Ambari
• Software	Engineer	at	Hortonworks
à Mingjie Tang
• SparkSQL,	Spark	Mllib,	Spark	Streaming,	Data	Mining,	Machine	Learning
• Software	Engineer	at	Hortonworks
à …	All	Other	SHC	Contributors
3 ©	Hortonworks	 Inc.	2011	– 2016.	All	Rights	Reserved
Agenda
Motivation
Overview
Architecture	&	Implementation
Usage	&	Demo
4 ©	Hortonworks	 Inc.	2011	– 2017.	All	Rights	Reserved
Motivation
à Limited	Spark	Support	in	HBase Upstream
– RDD	level
– But	Spark	Is	Moving	to	DataFrame/Dataset
à Existing	Connectors	in	DataFrame Level
– Complicated	Design
• Embedding	Optimization	Plan	inside	Catalyst	Engine
• Stability	Impact	with	Coprocessor
• Serialized	RDD	Lineage	to	HBase
– Heavy	Maintenance	Overhead
5 ©	Hortonworks	 Inc.	2011	– 2016.	All	Rights	Reserved
Overview
6 ©	Hortonworks	 Inc.	2011	– 2017.	All	Rights	Reserved
Apache	Spark– Apache	HBase Connector	(SHC)
à Combine	Spark	and	HBase
– Spark	Catalyst	Engine	for	Query	Plan	and	Optimization
– HBase as	Fast	Access	KV	Store
– Implement	Standard	External	Data	Source	with	Build-in	Filter,	Maintain	Easily
à Full	Fledged	DataFrameSupport
– Spark	SQL
– Integrated	Language	Query
à High	Performance
– Partition	Pruning,	Data	Locality,	Column	Pruning,	Predicate	Pushdown
– Use	Spark	UnhandledFilters API
– Cache	Spark	HBase Connections
7 ©	Hortonworks	 Inc.	2011	– 2017.	All	Rights	Reserved
Data	Coder	&	Data	Schema
à Support	Different	Data	Coders
– PrimitiveType:	Native	Support	Java	Primitive	Types
– Avro:	Native	Support	Avro	Encoding/Decoding
– Phoenix:	Phoenix	Encoding/Decoding
– Plug-In	Data	Coder
– Can	Run	on	the	Top	of	Existing	HBase Tables
à Support	Composite	Key
– def cat	=	s"""{
|"table":{"namespace":"default",	"name":"shcExampleTable",	"tableCoder":”Phoenix"},
|"rowkey":"key1:key2",
|"columns":{
|"col00":{"cf":"rowkey",	"col":"key1",	"type":"string”},
|"col01":{"cf":"rowkey",	"col":"key2",	"type":"int"},
…
...
8 ©	Hortonworks	 Inc.	2011	– 2016.	All	Rights	Reserved
Architecture	&	Implementation
9 ©	Hortonworks	 Inc.	2011	– 2017.	All	Rights	Reserved
Architecture
…...
Driver
Executor Executor Executor
Region	
Server
Region	
Server
Region	
Server…...
Spark
HBase
Picture	1.	SHC	architecture
Host	1
10 ©	Hortonworks	 Inc.	2011	– 2017.	All	Rights	Reserved
Architecture
…...
Driver
Executor Executor Executor
Region	
Server
Region	
Server
Region	
Server…...
Picture	1.	SHC	architecture
Task
Query
Partition
Filters,	Required	
Columns
RS	start/end	
point
sqlContext.sql("select	
count(col1)	from	table1	
where	key	<	'row050'")
PP P
Scans
BulkGets
11 ©	Hortonworks	 Inc.	2011	– 2017.	All	Rights	Reserved
Implementation
…...
Driver
Executor Executor Executor
Region	
Server
Region	
Server
Region	
Server…...
Picture	1.	SHC	architecture
Task
Query
Partition
Filters,	Required	
ColumnsPartition	Pruning:	Task	Only
Performed	in	Region	Server	
Holding	Requested	Data
PP P
Scans
BulkGets Filters	->	Multiple	Scan	Ranges		
															∩
(Start	point,	end	point)
RS	start/end	
point
12 ©	Hortonworks	 Inc.	2011	– 2017.	All	Rights	Reserved
Implementation
…...
Driver
Executor Executor Executor
Region	
Server
Region	
Server
Region	
Server…...
Picture	1.	SHC	architecture
Task
Query
Partition
Filters,	Required	
Columns
RS	start/end	
point
Data	Locality:	Move	
Computation	to	Data.	
PP P
Scans
BulkGets
RDD	Partition	has	preferred	location:
getPreferredLocations(partition) {
return RS.hostName}
13 ©	Hortonworks	 Inc.	2011	– 2017.	All	Rights	Reserved
Implementation
…...
Driver
Executor Executor Executor
Region	
Server
Region	
Server
Region	
Server…...
Picture	1.	SHC	architecture
Task
Query
Partition
Filters,	Required	
Columns
RS	start/end	
point
Column	Pruning:	Required	
Column
Predicate	Pushdown:	HBase
built-in	Filters
PP P
Filters,	Required	
Columns
Filters,	Required	
Columns
Scans
BulkGets
Filters,	Required	
Columns
14 ©	Hortonworks	 Inc.	2011	– 2017.	All	Rights	Reserved
Implementation
…...
Driver
Executor Executor Executor
Region	
Server
Region	
Server
Region	
Server…...
Picture	1.	SHC	architecture
Task
Query
Partition
Filters,	Required	
Columns
RS	start/end	
point
Scan	and	BulkGets:	Grouped	
by	region	server.	
PP P
Scans
BulkGets
WHERE	column	>	x	and	
column	<	y for	scan
and WHERE	column	=	
x for	get.
15 ©	Hortonworks	 Inc.	2011	– 2016.	All	Rights	Reserved
Usage	&	Demo
16 ©	Hortonworks	 Inc.	2011	– 2017.	All	Rights	Reserved
How	to	Use	SHC?
à Github
– https://github.com/hortonworks-spark/shc	 	
à SHC	Examples
– https://github.com/hortonworks-spark/shc/tree/master/examples
à Apache	HBase Jira
– https://issues.apache.org/jira/browse/HBASE-14789
17 ©	Hortonworks	 Inc.	2011	– 2017.	All	Rights	Reserved
Demo
à Interactive	Jobs	through	Spark Shell
à Batch	Jobs
18 ©	Hortonworks	 Inc.	2011	– 2017.	All	Rights	Reserved
Acknowledgement
à HBase Community	&	Spark	Community
à All	SHC Contributors,	Zhan	Zhang
19 ©	Hortonworks	 Inc.	2011	– 2017.	All	Rights	Reserved
Reference
à Hortonworks	Public	Repo
– http://repo.hortonworks.com/content/repositories/releases/com/hortonworks/
à Apache	Spark
– http://spark.apache.org/
à Apache	HBase
– https://hbase.apache.org/
20 ©	Hortonworks	 Inc.	2011	– 2017.	All	Rights	Reserved
Thanks
Q	&	A
Emails:	
wyang@hortonworks.com
21 ©	Hortonworks	 Inc.	2011	– 2017.	All	Rights	Reserved
BACKUP
22 ©	Hortonworks	 Inc.	2011	– 2017.	All	Rights	Reserved
Kerberos	Cluster
à Kerberos	Ticket
– kinit	-kt foo.keytab foouser or		Principle/Keytab
à Long	Running	Service
– --principal,	--keytab
à Multiple	Secure	HBase Clusters
– Spark	only	Supports	Single	Secure	HBase Cluster
– Use	SHC	Credential	Manager
– Refer	LRJobAccessing2Clusters	Example	in	github
23 ©	Hortonworks	 Inc.	2011	– 2017.	All	Rights	Reserved
Usage
Define	the	catalog	for	the	schema	mapping:
24 ©	Hortonworks	 Inc.	2011	– 2017.	All	Rights	Reserved
Usage
à Prepare	the	data	and	populate	the	HBase table
val data	=	(0	to	255).map	{	i =>	 HBaseRecord(i,	“extra”)}
sc.parallelize(data).toDF.write.options(
Map(HBaseTableCatalog.tableCatalog ->	catalog,	HBaseTableCatalog.newTable->	“5”))
.format(“org.apache.spark.sql.execution.datasources.hbase”)
.save()
25 ©	Hortonworks	 Inc.	2011	– 2017.	All	Rights	Reserved
Usage
à Load	the	DataFrame
def withCatalog(cat:	String):	DataFrame =	{
sqlContext
.read
.options(Map(HBaseTableCatalog.tableCatalog->cat))
.format(“org.apache.spark.sql.execution.datasources.hbase”)
.load()
}
val df =	withCatalog(catalog)
26 ©	Hortonworks	 Inc.	2011	– 2017.	All	Rights	Reserved
Usage
à Query
Language	integrated	query:
val s	=	df.filter((($"col0ʺ	<=	“çrow050ʺ	&&	$”col0”	>	“row040”)	||
$”col0ʺ	===	“row005”	&& ($”col4ʺ	===	1	|| $”col4ʺ	===	42))
.select(“col0”,	“col1”,	“col4”)
SQL:
val s	=	df.filter((($”col0ʺ	<=	“row050ʺ	&&	$”col0”	>	“row040”)
df.registerTempTable(“table”)
sqlContext.sql(“select	count(col1)	from	table”).show
27 ©	Hortonworks	 Inc.	2011	– 2017.	All	Rights	Reserved
Usage
à Work	with	different	data	sources
//	Part	1:	write	data	into	Hive	table	and	read	data	from	it
val df1	=	sql("SELECT	*	FROM	shcHiveTable")
//	Part	2:	read data from Hbase table
val df2	=	withCatalog(cat)
//	Part	3:	join the two dataframes
val s1	=	df1.filter($"key"	<=	"40").select("key",	"col1")
val s2	=	df2.filter($"key"	<=	"20"	&&	$"key"	>=	"1").select("key",	"col2")
val result =		s1.join(s2,	Seq("key"))
result.show()

More Related Content

What's hot

Integrating NiFi and Flink
Integrating NiFi and FlinkIntegrating NiFi and Flink
Integrating NiFi and FlinkBryan Bende
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guideRyan Blue
 
Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...
Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...
Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...Databricks
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversScyllaDB
 
Understanding Storage I/O Under Load
Understanding Storage I/O Under LoadUnderstanding Storage I/O Under Load
Understanding Storage I/O Under LoadScyllaDB
 
Apache Flink 101 - the rise of stream processing and beyond
Apache Flink 101 - the rise of stream processing and beyondApache Flink 101 - the rise of stream processing and beyond
Apache Flink 101 - the rise of stream processing and beyondBowen Li
 
Apache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use casesApache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use casesShivji Kumar Jha
 
RocksDB compaction
RocksDB compactionRocksDB compaction
RocksDB compactionMIJIN AN
 
Migration From Oracle to PostgreSQL
Migration From Oracle to PostgreSQLMigration From Oracle to PostgreSQL
Migration From Oracle to PostgreSQLPGConf APAC
 
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with  Apache Pulsar and Apache PinotBuilding a Real-Time Analytics Application with  Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with Apache Pulsar and Apache PinotAltinity Ltd
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeDatabricks
 
Extending Apache Ranger Authorization Beyond Hadoop: Review of Apache Ranger ...
Extending Apache Ranger Authorization Beyond Hadoop: Review of Apache Ranger ...Extending Apache Ranger Authorization Beyond Hadoop: Review of Apache Ranger ...
Extending Apache Ranger Authorization Beyond Hadoop: Review of Apache Ranger ...DataWorks Summit
 
Monitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusMonitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusGrafana Labs
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
 
Presto query optimizer: pursuit of performance
Presto query optimizer: pursuit of performancePresto query optimizer: pursuit of performance
Presto query optimizer: pursuit of performanceDataWorks Summit
 
File Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & ParquetFile Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & ParquetOwen O'Malley
 
Prometheus design and philosophy
Prometheus design and philosophy   Prometheus design and philosophy
Prometheus design and philosophy Docker, Inc.
 

What's hot (20)

Integrating NiFi and Flink
Integrating NiFi and FlinkIntegrating NiFi and Flink
Integrating NiFi and Flink
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...
Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...
Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the Covers
 
Understanding Storage I/O Under Load
Understanding Storage I/O Under LoadUnderstanding Storage I/O Under Load
Understanding Storage I/O Under Load
 
Apache Flink 101 - the rise of stream processing and beyond
Apache Flink 101 - the rise of stream processing and beyondApache Flink 101 - the rise of stream processing and beyond
Apache Flink 101 - the rise of stream processing and beyond
 
Apache Kafka
Apache Kafka Apache Kafka
Apache Kafka
 
Snowflake Overview
Snowflake OverviewSnowflake Overview
Snowflake Overview
 
Apache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use casesApache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use cases
 
RocksDB compaction
RocksDB compactionRocksDB compaction
RocksDB compaction
 
Migration From Oracle to PostgreSQL
Migration From Oracle to PostgreSQLMigration From Oracle to PostgreSQL
Migration From Oracle to PostgreSQL
 
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with  Apache Pulsar and Apache PinotBuilding a Real-Time Analytics Application with  Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
 
Apache Flume
Apache FlumeApache Flume
Apache Flume
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
 
Extending Apache Ranger Authorization Beyond Hadoop: Review of Apache Ranger ...
Extending Apache Ranger Authorization Beyond Hadoop: Review of Apache Ranger ...Extending Apache Ranger Authorization Beyond Hadoop: Review of Apache Ranger ...
Extending Apache Ranger Authorization Beyond Hadoop: Review of Apache Ranger ...
 
Monitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusMonitoring Kubernetes with Prometheus
Monitoring Kubernetes with Prometheus
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Presto query optimizer: pursuit of performance
Presto query optimizer: pursuit of performancePresto query optimizer: pursuit of performance
Presto query optimizer: pursuit of performance
 
File Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & ParquetFile Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & Parquet
 
Prometheus design and philosophy
Prometheus design and philosophy   Prometheus design and philosophy
Prometheus design and philosophy
 

Viewers also liked

Building the Ideal Stack for Real-Time Analytics
Building the Ideal Stack for Real-Time AnalyticsBuilding the Ideal Stack for Real-Time Analytics
Building the Ideal Stack for Real-Time AnalyticsSingleStore
 
빅데이터윈윈 컨퍼런스_데이터시각화자료
빅데이터윈윈 컨퍼런스_데이터시각화자료빅데이터윈윈 컨퍼런스_데이터시각화자료
빅데이터윈윈 컨퍼런스_데이터시각화자료ABRC_DATA
 
Using Big Data to Transform Your Customer’s Experience - Part 1

Using Big Data to Transform Your Customer’s Experience - Part 1
Using Big Data to Transform Your Customer’s Experience - Part 1

Using Big Data to Transform Your Customer’s Experience - Part 1
Cloudera, Inc.
 
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineSpark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineData Con LA
 
The Fast Path to Building Operational Applications with Spark
The Fast Path to Building Operational Applications with SparkThe Fast Path to Building Operational Applications with Spark
The Fast Path to Building Operational Applications with SparkSingleStore
 
CWIN17 Frankfurt / Cloudera
CWIN17 Frankfurt / ClouderaCWIN17 Frankfurt / Cloudera
CWIN17 Frankfurt / ClouderaCapgemini
 
Webinar - Sehr empfehlenswert: wie man aus Daten durch maschinelles Lernen We...
Webinar - Sehr empfehlenswert: wie man aus Daten durch maschinelles Lernen We...Webinar - Sehr empfehlenswert: wie man aus Daten durch maschinelles Lernen We...
Webinar - Sehr empfehlenswert: wie man aus Daten durch maschinelles Lernen We...Cloudera, Inc.
 
Security implementation on hadoop
Security implementation on hadoopSecurity implementation on hadoop
Security implementation on hadoopWei-Chiu Chuang
 
Spark meetup - Zoomdata Streaming
Spark meetup  - Zoomdata StreamingSpark meetup  - Zoomdata Streaming
Spark meetup - Zoomdata StreamingZoomdata
 
Real-Time Analytics Visualized w/ Kafka + Streamliner + MemSQL + ZoomData, An...
Real-Time Analytics Visualized w/ Kafka + Streamliner + MemSQL + ZoomData, An...Real-Time Analytics Visualized w/ Kafka + Streamliner + MemSQL + ZoomData, An...
Real-Time Analytics Visualized w/ Kafka + Streamliner + MemSQL + ZoomData, An...confluent
 
Cloudera and Qlik: Big Data Analytics for Business
Cloudera and Qlik: Big Data Analytics for BusinessCloudera and Qlik: Big Data Analytics for Business
Cloudera and Qlik: Big Data Analytics for BusinessData IQ Argentina
 
Put Alternative Data to Use in Capital Markets

Put Alternative Data to Use in Capital Markets
Put Alternative Data to Use in Capital Markets

Put Alternative Data to Use in Capital Markets
Cloudera, Inc.
 
Partner Ecosystem Showcase for Apache Ranger and Apache Atlas
Partner Ecosystem Showcase for Apache Ranger and Apache AtlasPartner Ecosystem Showcase for Apache Ranger and Apache Atlas
Partner Ecosystem Showcase for Apache Ranger and Apache AtlasDataWorks Summit
 
The Evolution of Data Architecture
The Evolution of Data ArchitectureThe Evolution of Data Architecture
The Evolution of Data ArchitectureWei-Chiu Chuang
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...Spark Summit
 
Benefits of Transferring Real-Time Data to Hadoop at Scale
Benefits of Transferring Real-Time Data to Hadoop at ScaleBenefits of Transferring Real-Time Data to Hadoop at Scale
Benefits of Transferring Real-Time Data to Hadoop at ScaleHortonworks
 

Viewers also liked (20)

Building the Ideal Stack for Real-Time Analytics
Building the Ideal Stack for Real-Time AnalyticsBuilding the Ideal Stack for Real-Time Analytics
Building the Ideal Stack for Real-Time Analytics
 
빅데이터윈윈 컨퍼런스_데이터시각화자료
빅데이터윈윈 컨퍼런스_데이터시각화자료빅데이터윈윈 컨퍼런스_데이터시각화자료
빅데이터윈윈 컨퍼런스_데이터시각화자료
 
Softnix Security Data Lake
Softnix Security Data Lake Softnix Security Data Lake
Softnix Security Data Lake
 
Softnix Messaging Server
Softnix Messaging ServerSoftnix Messaging Server
Softnix Messaging Server
 
Using Big Data to Transform Your Customer’s Experience - Part 1

Using Big Data to Transform Your Customer’s Experience - Part 1
Using Big Data to Transform Your Customer’s Experience - Part 1

Using Big Data to Transform Your Customer’s Experience - Part 1

 
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineSpark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
 
Zoomdata
ZoomdataZoomdata
Zoomdata
 
The Fast Path to Building Operational Applications with Spark
The Fast Path to Building Operational Applications with SparkThe Fast Path to Building Operational Applications with Spark
The Fast Path to Building Operational Applications with Spark
 
Ibm watson
Ibm watsonIbm watson
Ibm watson
 
CWIN17 Frankfurt / Cloudera
CWIN17 Frankfurt / ClouderaCWIN17 Frankfurt / Cloudera
CWIN17 Frankfurt / Cloudera
 
Webinar - Sehr empfehlenswert: wie man aus Daten durch maschinelles Lernen We...
Webinar - Sehr empfehlenswert: wie man aus Daten durch maschinelles Lernen We...Webinar - Sehr empfehlenswert: wie man aus Daten durch maschinelles Lernen We...
Webinar - Sehr empfehlenswert: wie man aus Daten durch maschinelles Lernen We...
 
Security implementation on hadoop
Security implementation on hadoopSecurity implementation on hadoop
Security implementation on hadoop
 
Spark meetup - Zoomdata Streaming
Spark meetup  - Zoomdata StreamingSpark meetup  - Zoomdata Streaming
Spark meetup - Zoomdata Streaming
 
Real-Time Analytics Visualized w/ Kafka + Streamliner + MemSQL + ZoomData, An...
Real-Time Analytics Visualized w/ Kafka + Streamliner + MemSQL + ZoomData, An...Real-Time Analytics Visualized w/ Kafka + Streamliner + MemSQL + ZoomData, An...
Real-Time Analytics Visualized w/ Kafka + Streamliner + MemSQL + ZoomData, An...
 
Cloudera and Qlik: Big Data Analytics for Business
Cloudera and Qlik: Big Data Analytics for BusinessCloudera and Qlik: Big Data Analytics for Business
Cloudera and Qlik: Big Data Analytics for Business
 
Put Alternative Data to Use in Capital Markets

Put Alternative Data to Use in Capital Markets
Put Alternative Data to Use in Capital Markets

Put Alternative Data to Use in Capital Markets

 
Partner Ecosystem Showcase for Apache Ranger and Apache Atlas
Partner Ecosystem Showcase for Apache Ranger and Apache AtlasPartner Ecosystem Showcase for Apache Ranger and Apache Atlas
Partner Ecosystem Showcase for Apache Ranger and Apache Atlas
 
The Evolution of Data Architecture
The Evolution of Data ArchitectureThe Evolution of Data Architecture
The Evolution of Data Architecture
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
 
Benefits of Transferring Real-Time Data to Hadoop at Scale
Benefits of Transferring Real-Time Data to Hadoop at ScaleBenefits of Transferring Real-Time Data to Hadoop at Scale
Benefits of Transferring Real-Time Data to Hadoop at Scale
 

Similar to Apache Spark—Apache HBase Connector: Feature Rich and Efficient Access to HBase through Spark SQL with Weiqing Yang

HBaseCon2017 Spark HBase Connector: Feature Rich and Efficient Access to HBas...
HBaseCon2017 Spark HBase Connector: Feature Rich and Efficient Access to HBas...HBaseCon2017 Spark HBase Connector: Feature Rich and Efficient Access to HBas...
HBaseCon2017 Spark HBase Connector: Feature Rich and Efficient Access to HBas...HBaseCon
 
SoCal BigData Day
SoCal BigData DaySoCal BigData Day
SoCal BigData DayJohn Park
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopTimothy Spann
 
Achieving Mega-Scale Business Intelligence Through Speed of Thought Analytics...
Achieving Mega-Scale Business Intelligence Through Speed of Thought Analytics...Achieving Mega-Scale Business Intelligence Through Speed of Thought Analytics...
Achieving Mega-Scale Business Intelligence Through Speed of Thought Analytics...VMware Tanzu
 
Hive2 Introduction -- Interactive SQL for Big Data
Hive2 Introduction -- Interactive SQL for Big DataHive2 Introduction -- Interactive SQL for Big Data
Hive2 Introduction -- Interactive SQL for Big DataYifeng Jiang
 
Hadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionHadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionDataWorks Summit/Hadoop Summit
 
Hadoop & cloud storage object store integration in production (final)
Hadoop & cloud storage  object store integration in production (final)Hadoop & cloud storage  object store integration in production (final)
Hadoop & cloud storage object store integration in production (final)Chris Nauroth
 
Hadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionHadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionDataWorks Summit/Hadoop Summit
 
HBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBaseHBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBaseHBaseCon
 
Analytics Modernization: Configuring SAS® Grid Manager for Hadoop
Analytics Modernization: Configuring SAS® Grid Manager for HadoopAnalytics Modernization: Configuring SAS® Grid Manager for Hadoop
Analytics Modernization: Configuring SAS® Grid Manager for HadoopHortonworks
 
A glimpse into the Future of Hadoop & Big Data
A glimpse into the Future of Hadoop & Big DataA glimpse into the Future of Hadoop & Big Data
A glimpse into the Future of Hadoop & Big DataSaurav Kumar Sinha
 
The structured streaming upgrade to Apache Spark and how enterprises can bene...
The structured streaming upgrade to Apache Spark and how enterprises can bene...The structured streaming upgrade to Apache Spark and how enterprises can bene...
The structured streaming upgrade to Apache Spark and how enterprises can bene...Impetus Technologies
 
Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Mac Moore
 
De-Mystifying the Apache Phoenix QueryServer
De-Mystifying the Apache Phoenix QueryServerDe-Mystifying the Apache Phoenix QueryServer
De-Mystifying the Apache Phoenix QueryServerJosh Elser
 
The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...
The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...
The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...Big Data Spain
 
Apache Phoenix and HBase - Hadoop Summit Tokyo, Japan
Apache Phoenix and HBase - Hadoop Summit Tokyo, JapanApache Phoenix and HBase - Hadoop Summit Tokyo, Japan
Apache Phoenix and HBase - Hadoop Summit Tokyo, JapanAnkit Singhal
 
Apache Deep Learning 101 - DWS Berlin 2018
Apache Deep Learning 101 - DWS Berlin 2018Apache Deep Learning 101 - DWS Berlin 2018
Apache Deep Learning 101 - DWS Berlin 2018Timothy Spann
 
Apache Hive 2.0 SQL, Speed, Scale by Alan Gates
Apache Hive 2.0 SQL, Speed, Scale by Alan GatesApache Hive 2.0 SQL, Speed, Scale by Alan Gates
Apache Hive 2.0 SQL, Speed, Scale by Alan GatesBig Data Spain
 

Similar to Apache Spark—Apache HBase Connector: Feature Rich and Efficient Access to HBase through Spark SQL with Weiqing Yang (20)

HBaseCon2017 Spark HBase Connector: Feature Rich and Efficient Access to HBas...
HBaseCon2017 Spark HBase Connector: Feature Rich and Efficient Access to HBas...HBaseCon2017 Spark HBase Connector: Feature Rich and Efficient Access to HBas...
HBaseCon2017 Spark HBase Connector: Feature Rich and Efficient Access to HBas...
 
SoCal BigData Day
SoCal BigData DaySoCal BigData Day
SoCal BigData Day
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Achieving Mega-Scale Business Intelligence Through Speed of Thought Analytics...
Achieving Mega-Scale Business Intelligence Through Speed of Thought Analytics...Achieving Mega-Scale Business Intelligence Through Speed of Thought Analytics...
Achieving Mega-Scale Business Intelligence Through Speed of Thought Analytics...
 
Hive2 Introduction -- Interactive SQL for Big Data
Hive2 Introduction -- Interactive SQL for Big DataHive2 Introduction -- Interactive SQL for Big Data
Hive2 Introduction -- Interactive SQL for Big Data
 
Spark + HBase
Spark + HBase Spark + HBase
Spark + HBase
 
Hadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionHadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in Production
 
Hadoop & cloud storage object store integration in production (final)
Hadoop & cloud storage  object store integration in production (final)Hadoop & cloud storage  object store integration in production (final)
Hadoop & cloud storage object store integration in production (final)
 
Hadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionHadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in Production
 
HBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBaseHBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBase
 
Analytics Modernization: Configuring SAS® Grid Manager for Hadoop
Analytics Modernization: Configuring SAS® Grid Manager for HadoopAnalytics Modernization: Configuring SAS® Grid Manager for Hadoop
Analytics Modernization: Configuring SAS® Grid Manager for Hadoop
 
A glimpse into the Future of Hadoop & Big Data
A glimpse into the Future of Hadoop & Big DataA glimpse into the Future of Hadoop & Big Data
A glimpse into the Future of Hadoop & Big Data
 
The structured streaming upgrade to Apache Spark and how enterprises can bene...
The structured streaming upgrade to Apache Spark and how enterprises can bene...The structured streaming upgrade to Apache Spark and how enterprises can bene...
The structured streaming upgrade to Apache Spark and how enterprises can bene...
 
Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015
 
De-Mystifying the Apache Phoenix QueryServer
De-Mystifying the Apache Phoenix QueryServerDe-Mystifying the Apache Phoenix QueryServer
De-Mystifying the Apache Phoenix QueryServer
 
Munich HUG 21.11.2013
Munich HUG 21.11.2013Munich HUG 21.11.2013
Munich HUG 21.11.2013
 
The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...
The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...
The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...
 
Apache Phoenix and HBase - Hadoop Summit Tokyo, Japan
Apache Phoenix and HBase - Hadoop Summit Tokyo, JapanApache Phoenix and HBase - Hadoop Summit Tokyo, Japan
Apache Phoenix and HBase - Hadoop Summit Tokyo, Japan
 
Apache Deep Learning 101 - DWS Berlin 2018
Apache Deep Learning 101 - DWS Berlin 2018Apache Deep Learning 101 - DWS Berlin 2018
Apache Deep Learning 101 - DWS Berlin 2018
 
Apache Hive 2.0 SQL, Speed, Scale by Alan Gates
Apache Hive 2.0 SQL, Speed, Scale by Alan GatesApache Hive 2.0 SQL, Speed, Scale by Alan Gates
Apache Hive 2.0 SQL, Speed, Scale by Alan Gates
 

More from Spark Summit

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang Spark Summit
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...Spark Summit
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang WuSpark Summit
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya RaghavendraSpark Summit
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...Spark Summit
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...Spark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...Spark Summit
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakSpark Summit
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimSpark Summit
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraSpark Summit
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Spark Summit
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...Spark Summit
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spark Summit
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovSpark Summit
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Spark Summit
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkSpark Summit
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Spark Summit
 
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...Spark Summit
 

More from Spark Summit (20)

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
 

Recently uploaded

20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdfkhraisr
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...nirzagarg
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...nirzagarg
 
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...HyderabadDolls
 
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...HyderabadDolls
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...kumargunjan9515
 
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...vershagrag
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...HyderabadDolls
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制vexqp
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...kumargunjan9515
 
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...ThinkInnovation
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...gajnagarg
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...gajnagarg
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...HyderabadDolls
 
Introduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptxIntroduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptxAniqa Zai
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowgargpaaro
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabiaahmedjiabur940
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraGovindSinghDasila
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 

Recently uploaded (20)

20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
 
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
 
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
 
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
 
Introduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptxIntroduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptx
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 

Apache Spark—Apache HBase Connector: Feature Rich and Efficient Access to HBase through Spark SQL with Weiqing Yang

  • 1. 1 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Apache Spark – Apache HBase Connector Feature Rich and Efficient Access to HBase through Spark SQL Weiqing Yang Mingjie Tang October, 2017
  • 2. 2 © Hortonworks Inc. 2011 – 2017. All Rights Reserved About Authors à Weiqing Yang • Contribute to Apache Spark, Apache Hadoop, Apache HBase, Apache Ambari • Software Engineer at Hortonworks à Mingjie Tang • SparkSQL, Spark Mllib, Spark Streaming, Data Mining, Machine Learning • Software Engineer at Hortonworks à … All Other SHC Contributors
  • 3. 3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Agenda Motivation Overview Architecture & Implementation Usage & Demo
  • 4. 4 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Motivation à Limited Spark Support in HBase Upstream – RDD level – But Spark Is Moving to DataFrame/Dataset à Existing Connectors in DataFrame Level – Complicated Design • Embedding Optimization Plan inside Catalyst Engine • Stability Impact with Coprocessor • Serialized RDD Lineage to HBase – Heavy Maintenance Overhead
  • 5. 5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Overview
  • 6. 6 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Apache Spark– Apache HBase Connector (SHC) à Combine Spark and HBase – Spark Catalyst Engine for Query Plan and Optimization – HBase as Fast Access KV Store – Implement Standard External Data Source with Build-in Filter, Maintain Easily à Full Fledged DataFrameSupport – Spark SQL – Integrated Language Query à High Performance – Partition Pruning, Data Locality, Column Pruning, Predicate Pushdown – Use Spark UnhandledFilters API – Cache Spark HBase Connections
  • 7. 7 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Data Coder & Data Schema à Support Different Data Coders – PrimitiveType: Native Support Java Primitive Types – Avro: Native Support Avro Encoding/Decoding – Phoenix: Phoenix Encoding/Decoding – Plug-In Data Coder – Can Run on the Top of Existing HBase Tables à Support Composite Key – def cat = s"""{ |"table":{"namespace":"default", "name":"shcExampleTable", "tableCoder":”Phoenix"}, |"rowkey":"key1:key2", |"columns":{ |"col00":{"cf":"rowkey", "col":"key1", "type":"string”}, |"col01":{"cf":"rowkey", "col":"key2", "type":"int"}, … ...
  • 8. 8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Architecture & Implementation
  • 9. 9 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Architecture …... Driver Executor Executor Executor Region Server Region Server Region Server…... Spark HBase Picture 1. SHC architecture Host 1
  • 10. 10 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Architecture …... Driver Executor Executor Executor Region Server Region Server Region Server…... Picture 1. SHC architecture Task Query Partition Filters, Required Columns RS start/end point sqlContext.sql("select count(col1) from table1 where key < 'row050'") PP P Scans BulkGets
  • 11. 11 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Implementation …... Driver Executor Executor Executor Region Server Region Server Region Server…... Picture 1. SHC architecture Task Query Partition Filters, Required ColumnsPartition Pruning: Task Only Performed in Region Server Holding Requested Data PP P Scans BulkGets Filters -> Multiple Scan Ranges ∩ (Start point, end point) RS start/end point
  • 12. 12 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Implementation …... Driver Executor Executor Executor Region Server Region Server Region Server…... Picture 1. SHC architecture Task Query Partition Filters, Required Columns RS start/end point Data Locality: Move Computation to Data. PP P Scans BulkGets RDD Partition has preferred location: getPreferredLocations(partition) { return RS.hostName}
  • 13. 13 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Implementation …... Driver Executor Executor Executor Region Server Region Server Region Server…... Picture 1. SHC architecture Task Query Partition Filters, Required Columns RS start/end point Column Pruning: Required Column Predicate Pushdown: HBase built-in Filters PP P Filters, Required Columns Filters, Required Columns Scans BulkGets Filters, Required Columns
  • 14. 14 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Implementation …... Driver Executor Executor Executor Region Server Region Server Region Server…... Picture 1. SHC architecture Task Query Partition Filters, Required Columns RS start/end point Scan and BulkGets: Grouped by region server. PP P Scans BulkGets WHERE column > x and column < y for scan and WHERE column = x for get.
  • 15. 15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Usage & Demo
  • 16. 16 © Hortonworks Inc. 2011 – 2017. All Rights Reserved How to Use SHC? à Github – https://github.com/hortonworks-spark/shc à SHC Examples – https://github.com/hortonworks-spark/shc/tree/master/examples à Apache HBase Jira – https://issues.apache.org/jira/browse/HBASE-14789
  • 17. 17 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Demo à Interactive Jobs through Spark Shell à Batch Jobs
  • 18. 18 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Acknowledgement à HBase Community & Spark Community à All SHC Contributors, Zhan Zhang
  • 19. 19 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Reference à Hortonworks Public Repo – http://repo.hortonworks.com/content/repositories/releases/com/hortonworks/ à Apache Spark – http://spark.apache.org/ à Apache HBase – https://hbase.apache.org/
  • 20. 20 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Thanks Q & A Emails: wyang@hortonworks.com
  • 21. 21 © Hortonworks Inc. 2011 – 2017. All Rights Reserved BACKUP
  • 22. 22 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Kerberos Cluster à Kerberos Ticket – kinit -kt foo.keytab foouser or Principle/Keytab à Long Running Service – --principal, --keytab à Multiple Secure HBase Clusters – Spark only Supports Single Secure HBase Cluster – Use SHC Credential Manager – Refer LRJobAccessing2Clusters Example in github
  • 23. 23 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Usage Define the catalog for the schema mapping:
  • 24. 24 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Usage à Prepare the data and populate the HBase table val data = (0 to 255).map { i => HBaseRecord(i, “extra”)} sc.parallelize(data).toDF.write.options( Map(HBaseTableCatalog.tableCatalog -> catalog, HBaseTableCatalog.newTable-> “5”)) .format(“org.apache.spark.sql.execution.datasources.hbase”) .save()
  • 25. 25 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Usage à Load the DataFrame def withCatalog(cat: String): DataFrame = { sqlContext .read .options(Map(HBaseTableCatalog.tableCatalog->cat)) .format(“org.apache.spark.sql.execution.datasources.hbase”) .load() } val df = withCatalog(catalog)
  • 26. 26 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Usage à Query Language integrated query: val s = df.filter((($"col0ʺ <= “çrow050ʺ && $”col0” > “row040”) || $”col0ʺ === “row005” && ($”col4ʺ === 1 || $”col4ʺ === 42)) .select(“col0”, “col1”, “col4”) SQL: val s = df.filter((($”col0ʺ <= “row050ʺ && $”col0” > “row040”) df.registerTempTable(“table”) sqlContext.sql(“select count(col1) from table”).show
  • 27. 27 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Usage à Work with different data sources // Part 1: write data into Hive table and read data from it val df1 = sql("SELECT * FROM shcHiveTable") // Part 2: read data from Hbase table val df2 = withCatalog(cat) // Part 3: join the two dataframes val s1 = df1.filter($"key" <= "40").select("key", "col1") val s2 = df2.filter($"key" <= "20" && $"key" >= "1").select("key", "col2") val result = s1.join(s2, Seq("key")) result.show()