SlideShare a Scribd company logo
HBase with Spark POC
Demo
21st Dec, 2016
Chetan Khatri - Global Lead of Data Science,
Accionlabs Inc.
Current Environment
● Ubuntu 16.04
● Apache Hadoop 2.7.3
● Apache HBase 1.2.4
● Apache Spark 2.0.2
● Scala 2.12
● SBT(Scala Build Tool) 0.13.13
Note: Everything is in Pseudo Distributed mode
Action
Read Table
Write to Table
Approach
1. Created Table at HBase
2. Generated Dummy Data in Spark as a RDD and Saved to HBase
3. Reading HBase Table as a HadoopRDD in Spark
4. Traversing HadoopRDD Elements and Printing it out.
5. Entire Spark Transformation coded in Scala
6. Generated Executable Uber(“Super”) Jar with SBT
7. Submitted Spark job using spark-submit
Source Code - SparkHbaseJob.scala
Deployment
1. Add SBT Assembly Plugin
a. addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")
b. At Plugins.sbt file under Scala Project
2. Build Uber(“Super”) Jar
a. Sbt clean assembly
3. Submit Spark Job
a. bin/spark-submit --class hbase.spark.chetan.com.SparkHbaseJob
/home/chetan/hbase-spark/SparkMSAPoc-assembly-1.0.jar
Demo
TODO
1. Read HBase Table and Store HBaseRDD to Apache Hive with Spark.
a. Incremental load from HBase to Hive
b. SparkSQL / SparkRDD and Apache Drill Performance Benchmark for Reading
HBase.
2. Save Summarized Spark Aggregation to PostgreSQL
Assumed : High Level Architecture
Event Queuing Kafka Batch
Incremental load /
Transformations
Reporting Tool
(Dashboard)
Processed DW - Ad
hoc Analysis / 1st
level aggregation
Data Ingestion
Pipeline
RestfulAPI’s
Summarized DB - KPI
Reporting
Reference
Github: https://github.com/chetkhatri/SparkHbasePOC

More Related Content

What's hot

Airflow and supervisor
Airflow and supervisorAirflow and supervisor
Airflow and supervisor
Rafael Roman Otero
 
Big data workloads using Apache Sparkon HDInsight
Big data workloads using Apache Sparkon HDInsightBig data workloads using Apache Sparkon HDInsight
Big data workloads using Apache Sparkon HDInsight
Nilesh Gule
 
Sqoop on Spark for Data Ingestion
Sqoop on Spark for Data IngestionSqoop on Spark for Data Ingestion
Sqoop on Spark for Data Ingestion
DataWorks Summit
 
StackStormを1年間データ基盤で使ってみてぶつかったトラブルとその解決策の共有
StackStormを1年間データ基盤で使ってみてぶつかったトラブルとその解決策の共有StackStormを1年間データ基盤で使ってみてぶつかったトラブルとその解決策の共有
StackStormを1年間データ基盤で使ってみてぶつかったトラブルとその解決策の共有
Yoshiyasu SAEKI
 
20150627 bigdatala
20150627 bigdatala20150627 bigdatala
20150627 bigdatala
gethue
 
20161215 python pandas-spark四方山話
20161215 python pandas-spark四方山話20161215 python pandas-spark四方山話
20161215 python pandas-spark四方山話
Ryuji Tamagawa
 
Build, Deploy and Manage Your Releases Like a Boss
Build, Deploy and Manage Your Releases Like a BossBuild, Deploy and Manage Your Releases Like a Boss
Build, Deploy and Manage Your Releases Like a Boss
GlobalLogic Ukraine
 
SF Solr Meetup - Interactively Search and Visualize Your Big Data
SF Solr Meetup - Interactively Search and Visualize Your Big DataSF Solr Meetup - Interactively Search and Visualize Your Big Data
SF Solr Meetup - Interactively Search and Visualize Your Big Data
gethue
 
Haskell Tooling Whirlwind
Haskell Tooling WhirlwindHaskell Tooling Whirlwind
Haskell Tooling Whirlwind
Steven Shaw
 
Azure cli2.0
Azure cli2.0Azure cli2.0
Azure cli2.0
Mohit Chhabra
 
COOKPADでのHadoop利用
COOKPADでのHadoop利用COOKPADでのHadoop利用
COOKPADでのHadoop利用
Tatsuya Sasaki
 
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
Databricks
 

What's hot (12)

Airflow and supervisor
Airflow and supervisorAirflow and supervisor
Airflow and supervisor
 
Big data workloads using Apache Sparkon HDInsight
Big data workloads using Apache Sparkon HDInsightBig data workloads using Apache Sparkon HDInsight
Big data workloads using Apache Sparkon HDInsight
 
Sqoop on Spark for Data Ingestion
Sqoop on Spark for Data IngestionSqoop on Spark for Data Ingestion
Sqoop on Spark for Data Ingestion
 
StackStormを1年間データ基盤で使ってみてぶつかったトラブルとその解決策の共有
StackStormを1年間データ基盤で使ってみてぶつかったトラブルとその解決策の共有StackStormを1年間データ基盤で使ってみてぶつかったトラブルとその解決策の共有
StackStormを1年間データ基盤で使ってみてぶつかったトラブルとその解決策の共有
 
20150627 bigdatala
20150627 bigdatala20150627 bigdatala
20150627 bigdatala
 
20161215 python pandas-spark四方山話
20161215 python pandas-spark四方山話20161215 python pandas-spark四方山話
20161215 python pandas-spark四方山話
 
Build, Deploy and Manage Your Releases Like a Boss
Build, Deploy and Manage Your Releases Like a BossBuild, Deploy and Manage Your Releases Like a Boss
Build, Deploy and Manage Your Releases Like a Boss
 
SF Solr Meetup - Interactively Search and Visualize Your Big Data
SF Solr Meetup - Interactively Search and Visualize Your Big DataSF Solr Meetup - Interactively Search and Visualize Your Big Data
SF Solr Meetup - Interactively Search and Visualize Your Big Data
 
Haskell Tooling Whirlwind
Haskell Tooling WhirlwindHaskell Tooling Whirlwind
Haskell Tooling Whirlwind
 
Azure cli2.0
Azure cli2.0Azure cli2.0
Azure cli2.0
 
COOKPADでのHadoop利用
COOKPADでのHadoop利用COOKPADでのHadoop利用
COOKPADでのHadoop利用
 
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
 

Similar to HBase with Apache Spark POC Demo

DUG'20: 02 - Accelerating apache spark with DAOS on Aurora
DUG'20: 02 - Accelerating apache spark with DAOS on AuroraDUG'20: 02 - Accelerating apache spark with DAOS on Aurora
DUG'20: 02 - Accelerating apache spark with DAOS on Aurora
Andrey Kudryavtsev
 
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
Modern Data Stack France
 
Apache spark installation [autosaved]
Apache spark installation [autosaved]Apache spark installation [autosaved]
Apache spark installation [autosaved]
Shweta Patnaik
 
Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2 Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2
Olalekan Fuad Elesin
 
Apache Cassandra and Apche Spark
Apache Cassandra and Apche SparkApache Cassandra and Apche Spark
Apache Cassandra and Apche Spark
Alex Thompson
 
Fast Data Analytics with Spark and Python
Fast Data Analytics with Spark and PythonFast Data Analytics with Spark and Python
Fast Data Analytics with Spark and Python
Benjamin Bengfort
 
Spark Working Environment in Windows OS
Spark Working Environment in Windows OSSpark Working Environment in Windows OS
Spark Working Environment in Windows OS
Universiti Technologi Malaysia (UTM)
 
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
Mich Talebzadeh (Ph.D.)
 
Introduction to Apache Spark Ecosystem
Introduction to Apache Spark EcosystemIntroduction to Apache Spark Ecosystem
Introduction to Apache Spark Ecosystem
Bojan Babic
 
Playing with Hadoop (NPW2013)
Playing with Hadoop (NPW2013)Playing with Hadoop (NPW2013)
Playing with Hadoop (NPW2013)
Søren Lund
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
Dharmjit Singh
 
Migrating structured data between Hadoop and RDBMS
Migrating structured data between Hadoop and RDBMSMigrating structured data between Hadoop and RDBMS
Migrating structured data between Hadoop and RDBMS
Bouquet
 
Apache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLabApache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLab
CloudxLab
 
TriHUG talk on Spark and Shark
TriHUG talk on Spark and SharkTriHUG talk on Spark and Shark
TriHUG talk on Spark and Shark
trihug
 
Up and running with pyspark
Up and running with pysparkUp and running with pyspark
Up and running with pyspark
Krishna Sangeeth KS
 
How to start using Scala
How to start using ScalaHow to start using Scala
How to start using Scala
Ngoc Dao
 
Using pySpark with Google Colab & Spark 3.0 preview
Using pySpark with Google Colab & Spark 3.0 previewUsing pySpark with Google Colab & Spark 3.0 preview
Using pySpark with Google Colab & Spark 3.0 preview
Mario Cartia
 
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Alex Zeltov
 
spark_v1_2
spark_v1_2spark_v1_2
spark_v1_2
Frank Schroeter
 
Spark Summit EU talk by Jim Dowling
Spark Summit EU talk by Jim DowlingSpark Summit EU talk by Jim Dowling
Spark Summit EU talk by Jim Dowling
Spark Summit
 

Similar to HBase with Apache Spark POC Demo (20)

DUG'20: 02 - Accelerating apache spark with DAOS on Aurora
DUG'20: 02 - Accelerating apache spark with DAOS on AuroraDUG'20: 02 - Accelerating apache spark with DAOS on Aurora
DUG'20: 02 - Accelerating apache spark with DAOS on Aurora
 
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
 
Apache spark installation [autosaved]
Apache spark installation [autosaved]Apache spark installation [autosaved]
Apache spark installation [autosaved]
 
Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2 Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2
 
Apache Cassandra and Apche Spark
Apache Cassandra and Apche SparkApache Cassandra and Apche Spark
Apache Cassandra and Apche Spark
 
Fast Data Analytics with Spark and Python
Fast Data Analytics with Spark and PythonFast Data Analytics with Spark and Python
Fast Data Analytics with Spark and Python
 
Spark Working Environment in Windows OS
Spark Working Environment in Windows OSSpark Working Environment in Windows OS
Spark Working Environment in Windows OS
 
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
 
Introduction to Apache Spark Ecosystem
Introduction to Apache Spark EcosystemIntroduction to Apache Spark Ecosystem
Introduction to Apache Spark Ecosystem
 
Playing with Hadoop (NPW2013)
Playing with Hadoop (NPW2013)Playing with Hadoop (NPW2013)
Playing with Hadoop (NPW2013)
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
Migrating structured data between Hadoop and RDBMS
Migrating structured data between Hadoop and RDBMSMigrating structured data between Hadoop and RDBMS
Migrating structured data between Hadoop and RDBMS
 
Apache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLabApache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLab
 
TriHUG talk on Spark and Shark
TriHUG talk on Spark and SharkTriHUG talk on Spark and Shark
TriHUG talk on Spark and Shark
 
Up and running with pyspark
Up and running with pysparkUp and running with pyspark
Up and running with pyspark
 
How to start using Scala
How to start using ScalaHow to start using Scala
How to start using Scala
 
Using pySpark with Google Colab & Spark 3.0 preview
Using pySpark with Google Colab & Spark 3.0 previewUsing pySpark with Google Colab & Spark 3.0 preview
Using pySpark with Google Colab & Spark 3.0 preview
 
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
 
spark_v1_2
spark_v1_2spark_v1_2
spark_v1_2
 
Spark Summit EU talk by Jim Dowling
Spark Summit EU talk by Jim DowlingSpark Summit EU talk by Jim Dowling
Spark Summit EU talk by Jim Dowling
 

More from Chetan Khatri

Data Science for Beginner by Chetan Khatri and Deptt. of Computer Science, Ka...
Data Science for Beginner by Chetan Khatri and Deptt. of Computer Science, Ka...Data Science for Beginner by Chetan Khatri and Deptt. of Computer Science, Ka...
Data Science for Beginner by Chetan Khatri and Deptt. of Computer Science, Ka...
Chetan Khatri
 
Demystify Information Security & Threats for Data-Driven Platforms With Cheta...
Demystify Information Security & Threats for Data-Driven Platforms With Cheta...Demystify Information Security & Threats for Data-Driven Platforms With Cheta...
Demystify Information Security & Threats for Data-Driven Platforms With Cheta...
Chetan Khatri
 
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-AirflowPyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
Chetan Khatri
 
ScalaTo July 2019 - No more struggles with Apache Spark workloads in production
ScalaTo July 2019 - No more struggles with Apache Spark workloads in productionScalaTo July 2019 - No more struggles with Apache Spark workloads in production
ScalaTo July 2019 - No more struggles with Apache Spark workloads in production
Chetan Khatri
 
No more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in productionNo more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in production
Chetan Khatri
 
PyConLT19-No_more_struggles_with_Apache_Spark_(PySpark)_workloads_in_production
PyConLT19-No_more_struggles_with_Apache_Spark_(PySpark)_workloads_in_productionPyConLT19-No_more_struggles_with_Apache_Spark_(PySpark)_workloads_in_production
PyConLT19-No_more_struggles_with_Apache_Spark_(PySpark)_workloads_in_production
Chetan Khatri
 
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scalaAutomate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Chetan Khatri
 
HBaseConAsia 2018 - Scaling 30 TB's of Data lake with Apache HBase and Scala ...
HBaseConAsia 2018 - Scaling 30 TB's of Data lake with Apache HBase and Scala ...HBaseConAsia 2018 - Scaling 30 TB's of Data lake with Apache HBase and Scala ...
HBaseConAsia 2018 - Scaling 30 TB's of Data lake with Apache HBase and Scala ...
Chetan Khatri
 
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
Chetan Khatri
 
An Introduction to Spark with Scala
An Introduction to Spark with ScalaAn Introduction to Spark with Scala
An Introduction to Spark with Scala
Chetan Khatri
 
HKOSCon18 - Chetan Khatri - Open Source AI / ML Technologies and Application ...
HKOSCon18 - Chetan Khatri - Open Source AI / ML Technologies and Application ...HKOSCon18 - Chetan Khatri - Open Source AI / ML Technologies and Application ...
HKOSCon18 - Chetan Khatri - Open Source AI / ML Technologies and Application ...
Chetan Khatri
 
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
Chetan Khatri
 
Fossasia 2018-chetan-khatri
Fossasia 2018-chetan-khatriFossasia 2018-chetan-khatri
Fossasia 2018-chetan-khatri
Chetan Khatri
 
Fossasia ai-ml technologies and application for product development-chetan kh...
Fossasia ai-ml technologies and application for product development-chetan kh...Fossasia ai-ml technologies and application for product development-chetan kh...
Fossasia ai-ml technologies and application for product development-chetan kh...
Chetan Khatri
 
An Introduction Linear Algebra for Neural Networks and Deep learning
An Introduction Linear Algebra for Neural Networks and Deep learningAn Introduction Linear Algebra for Neural Networks and Deep learning
An Introduction Linear Algebra for Neural Networks and Deep learning
Chetan Khatri
 
Introduction to Computer Science
Introduction to Computer ScienceIntroduction to Computer Science
Introduction to Computer Science
Chetan Khatri
 
An introduction to Git with Atlassian Suite
An introduction to Git with Atlassian SuiteAn introduction to Git with Atlassian Suite
An introduction to Git with Atlassian Suite
Chetan Khatri
 
Think machine-learning-with-scikit-learn-chetan
Think machine-learning-with-scikit-learn-chetanThink machine-learning-with-scikit-learn-chetan
Think machine-learning-with-scikit-learn-chetan
Chetan Khatri
 
A step towards machine learning at accionlabs
A step towards machine learning at accionlabsA step towards machine learning at accionlabs
A step towards machine learning at accionlabs
Chetan Khatri
 
Voltage measurement using arduino
Voltage measurement using arduinoVoltage measurement using arduino
Voltage measurement using arduino
Chetan Khatri
 

More from Chetan Khatri (20)

Data Science for Beginner by Chetan Khatri and Deptt. of Computer Science, Ka...
Data Science for Beginner by Chetan Khatri and Deptt. of Computer Science, Ka...Data Science for Beginner by Chetan Khatri and Deptt. of Computer Science, Ka...
Data Science for Beginner by Chetan Khatri and Deptt. of Computer Science, Ka...
 
Demystify Information Security & Threats for Data-Driven Platforms With Cheta...
Demystify Information Security & Threats for Data-Driven Platforms With Cheta...Demystify Information Security & Threats for Data-Driven Platforms With Cheta...
Demystify Information Security & Threats for Data-Driven Platforms With Cheta...
 
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-AirflowPyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
 
ScalaTo July 2019 - No more struggles with Apache Spark workloads in production
ScalaTo July 2019 - No more struggles with Apache Spark workloads in productionScalaTo July 2019 - No more struggles with Apache Spark workloads in production
ScalaTo July 2019 - No more struggles with Apache Spark workloads in production
 
No more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in productionNo more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in production
 
PyConLT19-No_more_struggles_with_Apache_Spark_(PySpark)_workloads_in_production
PyConLT19-No_more_struggles_with_Apache_Spark_(PySpark)_workloads_in_productionPyConLT19-No_more_struggles_with_Apache_Spark_(PySpark)_workloads_in_production
PyConLT19-No_more_struggles_with_Apache_Spark_(PySpark)_workloads_in_production
 
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scalaAutomate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
 
HBaseConAsia 2018 - Scaling 30 TB's of Data lake with Apache HBase and Scala ...
HBaseConAsia 2018 - Scaling 30 TB's of Data lake with Apache HBase and Scala ...HBaseConAsia 2018 - Scaling 30 TB's of Data lake with Apache HBase and Scala ...
HBaseConAsia 2018 - Scaling 30 TB's of Data lake with Apache HBase and Scala ...
 
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
 
An Introduction to Spark with Scala
An Introduction to Spark with ScalaAn Introduction to Spark with Scala
An Introduction to Spark with Scala
 
HKOSCon18 - Chetan Khatri - Open Source AI / ML Technologies and Application ...
HKOSCon18 - Chetan Khatri - Open Source AI / ML Technologies and Application ...HKOSCon18 - Chetan Khatri - Open Source AI / ML Technologies and Application ...
HKOSCon18 - Chetan Khatri - Open Source AI / ML Technologies and Application ...
 
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
 
Fossasia 2018-chetan-khatri
Fossasia 2018-chetan-khatriFossasia 2018-chetan-khatri
Fossasia 2018-chetan-khatri
 
Fossasia ai-ml technologies and application for product development-chetan kh...
Fossasia ai-ml technologies and application for product development-chetan kh...Fossasia ai-ml technologies and application for product development-chetan kh...
Fossasia ai-ml technologies and application for product development-chetan kh...
 
An Introduction Linear Algebra for Neural Networks and Deep learning
An Introduction Linear Algebra for Neural Networks and Deep learningAn Introduction Linear Algebra for Neural Networks and Deep learning
An Introduction Linear Algebra for Neural Networks and Deep learning
 
Introduction to Computer Science
Introduction to Computer ScienceIntroduction to Computer Science
Introduction to Computer Science
 
An introduction to Git with Atlassian Suite
An introduction to Git with Atlassian SuiteAn introduction to Git with Atlassian Suite
An introduction to Git with Atlassian Suite
 
Think machine-learning-with-scikit-learn-chetan
Think machine-learning-with-scikit-learn-chetanThink machine-learning-with-scikit-learn-chetan
Think machine-learning-with-scikit-learn-chetan
 
A step towards machine learning at accionlabs
A step towards machine learning at accionlabsA step towards machine learning at accionlabs
A step towards machine learning at accionlabs
 
Voltage measurement using arduino
Voltage measurement using arduinoVoltage measurement using arduino
Voltage measurement using arduino
 

Recently uploaded

Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
ginni singh$A17
 
DataScienceConcept_Kanchana_Weerasinghe.pptx
DataScienceConcept_Kanchana_Weerasinghe.pptxDataScienceConcept_Kanchana_Weerasinghe.pptx
DataScienceConcept_Kanchana_Weerasinghe.pptx
Kanchana Weerasinghe
 
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
revolutionary575
 
High Girls Call Nagpur 000XX00000 Provide Best And Top Girl Service And No1 i...
High Girls Call Nagpur 000XX00000 Provide Best And Top Girl Service And No1 i...High Girls Call Nagpur 000XX00000 Provide Best And Top Girl Service And No1 i...
High Girls Call Nagpur 000XX00000 Provide Best And Top Girl Service And No1 i...
saadkhan1485265
 
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
45unexpected
 
Busty Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And...
Busty Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And...Busty Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And...
Busty Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And...
tanupasswan6
 
Data Preprocessing Cheatsheet for learners
Data Preprocessing Cheatsheet for learnersData Preprocessing Cheatsheet for learners
Data Preprocessing Cheatsheet for learners
mohamed Ibrahim
 
CHAPTER-1-Introduction-to-Marketing.pptx
CHAPTER-1-Introduction-to-Marketing.pptxCHAPTER-1-Introduction-to-Marketing.pptx
CHAPTER-1-Introduction-to-Marketing.pptx
girewiy968
 
🚂🚘 Premium Girls Call Bangalore 🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...
🚂🚘 Premium Girls Call Bangalore  🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...🚂🚘 Premium Girls Call Bangalore  🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...
🚂🚘 Premium Girls Call Bangalore 🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...
bhupeshkumar0889
 
Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...
Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...
Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...
sharonblush
 
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
dizzycaye
 
Potential Uses of the Floyd-Warshall Algorithm as appropriate
Potential Uses of the Floyd-Warshall Algorithm as appropriatePotential Uses of the Floyd-Warshall Algorithm as appropriate
Potential Uses of the Floyd-Warshall Algorithm as appropriate
huseindihon
 
the unexpected potential of Dijkstra's Algorithm
the unexpected potential of Dijkstra's Algorithmthe unexpected potential of Dijkstra's Algorithm
the unexpected potential of Dijkstra's Algorithm
huseindihon
 
Celonis Busniess Analyst Virtual Internship.pptx
Celonis Busniess Analyst Virtual Internship.pptxCelonis Busniess Analyst Virtual Internship.pptx
Celonis Busniess Analyst Virtual Internship.pptx
AnujaGaikwad28
 
ch8_multiplexing cs553 st07 slide share ss
ch8_multiplexing cs553 st07 slide share ssch8_multiplexing cs553 st07 slide share ss
ch8_multiplexing cs553 st07 slide share ss
MinThetLwin1
 
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
kinni singh$A17
 
BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...
BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...
BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...
fatima shekh$A17
 
Coimbatore Girls call Service 000XX00000 Provide Best And Top Girl Service An...
Coimbatore Girls call Service 000XX00000 Provide Best And Top Girl Service An...Coimbatore Girls call Service 000XX00000 Provide Best And Top Girl Service An...
Coimbatore Girls call Service 000XX00000 Provide Best And Top Girl Service An...
vrvipin164
 
potential usefulness of multi-agent maze-solving in general
potential usefulness of multi-agent maze-solving in generalpotential usefulness of multi-agent maze-solving in general
potential usefulness of multi-agent maze-solving in general
huseindihon
 
potential development of the A* search algorithm specifically
potential development of the A* search algorithm specificallypotential development of the A* search algorithm specifically
potential development of the A* search algorithm specifically
huseindihon
 

Recently uploaded (20)

Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
 
DataScienceConcept_Kanchana_Weerasinghe.pptx
DataScienceConcept_Kanchana_Weerasinghe.pptxDataScienceConcept_Kanchana_Weerasinghe.pptx
DataScienceConcept_Kanchana_Weerasinghe.pptx
 
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
 
High Girls Call Nagpur 000XX00000 Provide Best And Top Girl Service And No1 i...
High Girls Call Nagpur 000XX00000 Provide Best And Top Girl Service And No1 i...High Girls Call Nagpur 000XX00000 Provide Best And Top Girl Service And No1 i...
High Girls Call Nagpur 000XX00000 Provide Best And Top Girl Service And No1 i...
 
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
 
Busty Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And...
Busty Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And...Busty Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And...
Busty Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And...
 
Data Preprocessing Cheatsheet for learners
Data Preprocessing Cheatsheet for learnersData Preprocessing Cheatsheet for learners
Data Preprocessing Cheatsheet for learners
 
CHAPTER-1-Introduction-to-Marketing.pptx
CHAPTER-1-Introduction-to-Marketing.pptxCHAPTER-1-Introduction-to-Marketing.pptx
CHAPTER-1-Introduction-to-Marketing.pptx
 
🚂🚘 Premium Girls Call Bangalore 🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...
🚂🚘 Premium Girls Call Bangalore  🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...🚂🚘 Premium Girls Call Bangalore  🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...
🚂🚘 Premium Girls Call Bangalore 🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...
 
Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...
Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...
Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...
 
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
 
Potential Uses of the Floyd-Warshall Algorithm as appropriate
Potential Uses of the Floyd-Warshall Algorithm as appropriatePotential Uses of the Floyd-Warshall Algorithm as appropriate
Potential Uses of the Floyd-Warshall Algorithm as appropriate
 
the unexpected potential of Dijkstra's Algorithm
the unexpected potential of Dijkstra's Algorithmthe unexpected potential of Dijkstra's Algorithm
the unexpected potential of Dijkstra's Algorithm
 
Celonis Busniess Analyst Virtual Internship.pptx
Celonis Busniess Analyst Virtual Internship.pptxCelonis Busniess Analyst Virtual Internship.pptx
Celonis Busniess Analyst Virtual Internship.pptx
 
ch8_multiplexing cs553 st07 slide share ss
ch8_multiplexing cs553 st07 slide share ssch8_multiplexing cs553 st07 slide share ss
ch8_multiplexing cs553 st07 slide share ss
 
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
 
BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...
BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...
BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...
 
Coimbatore Girls call Service 000XX00000 Provide Best And Top Girl Service An...
Coimbatore Girls call Service 000XX00000 Provide Best And Top Girl Service An...Coimbatore Girls call Service 000XX00000 Provide Best And Top Girl Service An...
Coimbatore Girls call Service 000XX00000 Provide Best And Top Girl Service An...
 
potential usefulness of multi-agent maze-solving in general
potential usefulness of multi-agent maze-solving in generalpotential usefulness of multi-agent maze-solving in general
potential usefulness of multi-agent maze-solving in general
 
potential development of the A* search algorithm specifically
potential development of the A* search algorithm specificallypotential development of the A* search algorithm specifically
potential development of the A* search algorithm specifically
 

HBase with Apache Spark POC Demo

  • 1. HBase with Spark POC Demo 21st Dec, 2016 Chetan Khatri - Global Lead of Data Science, Accionlabs Inc.
  • 2. Current Environment ● Ubuntu 16.04 ● Apache Hadoop 2.7.3 ● Apache HBase 1.2.4 ● Apache Spark 2.0.2 ● Scala 2.12 ● SBT(Scala Build Tool) 0.13.13 Note: Everything is in Pseudo Distributed mode
  • 4. Approach 1. Created Table at HBase 2. Generated Dummy Data in Spark as a RDD and Saved to HBase 3. Reading HBase Table as a HadoopRDD in Spark 4. Traversing HadoopRDD Elements and Printing it out. 5. Entire Spark Transformation coded in Scala 6. Generated Executable Uber(“Super”) Jar with SBT 7. Submitted Spark job using spark-submit
  • 5. Source Code - SparkHbaseJob.scala
  • 6. Deployment 1. Add SBT Assembly Plugin a. addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3") b. At Plugins.sbt file under Scala Project 2. Build Uber(“Super”) Jar a. Sbt clean assembly 3. Submit Spark Job a. bin/spark-submit --class hbase.spark.chetan.com.SparkHbaseJob /home/chetan/hbase-spark/SparkMSAPoc-assembly-1.0.jar
  • 8. TODO 1. Read HBase Table and Store HBaseRDD to Apache Hive with Spark. a. Incremental load from HBase to Hive b. SparkSQL / SparkRDD and Apache Drill Performance Benchmark for Reading HBase. 2. Save Summarized Spark Aggregation to PostgreSQL
  • 9. Assumed : High Level Architecture Event Queuing Kafka Batch Incremental load / Transformations Reporting Tool (Dashboard) Processed DW - Ad hoc Analysis / 1st level aggregation Data Ingestion Pipeline RestfulAPI’s Summarized DB - KPI Reporting