Jaws - Data Warehouse with Spark SQL by Ema Orhian

Ema Orhian
@emaorhian

About me
• Big Data analytics / Machine Learning
• 4+ years of experience with the Hadoop ecosystem
• 2 years of experience with Spark
http://bigdataresearch.io/
• Co-founder of Big Data Research Group
• Provides open source solutions around Big Data analytics
http://atigeo.com/

Agenda
• jaws-spark-sql-rest (Jaws) intro
• Main features
• Architecture
• Scaling
• Resource manager
• Working with Tachyon
• Working with Parquet files
• Configure Spark Sql context
• Demo

• Shared Spark SQL context
• Concurrent query runs
• Query history
• Paged results
• Query editor

Jaws
• Highly scalable and resilient data warehouse explorer
• RESTful alternative to Spark SQL JDBC, and more …
• Support for Spark 0.9.1/Shark through Spark 1.5
• Support for Hive/MR
https://github.com/atigeo/jaws-spark-sql-rest

Main features
• Submit queries concurrently and asynchronously
• Provides persisted logs, query history, results with paging
• Pluggable persistent layer (Cassandra/HDFS)
• Supports load balancing with query cancelation
• Provides a metadata browser
• In-memory Parquet warehouse with Tachyon
• Configuration file to fine tune Spark context
• Pluggable UI
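
The submit-asynchronously-then-page pattern from the list above can be sketched with a toy in-memory service. This is not Jaws code or its REST API: `QueryService`, `fake_engine`, and the `offset`/`limit` parameter names are hypothetical and only illustrate the idea of getting a query id back immediately and fetching results page by page later.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class QueryService:
    """Toy in-memory model of submit-then-poll query execution."""

    def __init__(self, run_query, max_workers=4):
        self._run_query = run_query      # callable: query text -> list of rows
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}               # query id -> Future holding the rows

    def submit(self, query):
        """Start the query in the background and return its id at once."""
        query_id = uuid.uuid4().hex
        self._futures[query_id] = self._pool.submit(self._run_query, query)
        return query_id

    def is_done(self, query_id):
        """Report whether the query has finished, without blocking."""
        return self._futures[query_id].done()

    def results(self, query_id, offset=0, limit=100):
        """Return one page of rows; blocks until the query has finished."""
        rows = self._futures[query_id].result()
        return rows[offset:offset + limit]

    def close(self):
        """Wait for running queries and release the worker threads."""
        self._pool.shutdown(wait=True)


def fake_engine(query):
    # Stand-in for a real SQL engine: always returns ten single-column rows.
    return [(i,) for i in range(10)]
```

A caller would keep the id returned by `submit` and later ask for pages such as `results(query_id, offset=0, limit=3)`; several queries can be in flight at once because each runs on its own worker thread.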

Scaling
• Standalone mode
• Mesos
‣ Fine-grained mode
‣ Coarse-grained mode
• YARN

Results persistence
• Queries with limited number of results:
‣ Cassandra
‣ HDFS
• Queries with unlimited number of results:
‣ HDFS
‣ Tachyon
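
The pairing of result size and storage backend listed above can be written down as a small lookup. This is only an illustration of the rule on the slide: the function name and the lowercase backend labels are hypothetical, not Jaws configuration values.

```python
# Which persistence layers the slide lists for each kind of query.
ALLOWED_BACKENDS = {
    True: {"cassandra", "hdfs"},    # queries with a limited number of results
    False: {"hdfs", "tachyon"},     # queries with an unlimited number of results
}


def check_results_backend(limited, backend):
    """Return the normalized backend name, or raise if the pairing is not listed."""
    name = backend.lower()
    if name not in ALLOWED_BACKENDS[bool(limited)]:
        kind = "limited" if limited else "unlimited"
        raise ValueError(f"{backend!r} is not listed for {kind} results")
    return name
```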

Working with Tachyon
• Persists unlimited results in Tachyon
• Registers tables over Parquet files from Tachyon
Tachyon benefits:
★ in-memory storage system
★ shares data between applications at memory speed

Working with Parquet files
• Register tables on top of Parquet files
Parquet
★ columnar format
★ nested data structures
★ supports schema evolution
★ efficient compression
• Files stored on HDFS or Tachyon
• MetaInfo about table stored in Cassandra (feature before Spark 1.3)
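
Registering a table on top of a Parquet file can be sketched in plain Spark SQL terms. This is not Jaws's own code: it assumes a PySpark 1.4+ `SQLContext` is passed in, and uses its `read.parquet` reader and the Spark 1.x `DataFrame.registerTempTable` method. The function name, path, and table name are hypothetical.

```python
def register_parquet_table(sql_context, path, table_name):
    """Expose the Parquet data at `path` as a queryable temporary table.

    `sql_context` is assumed to be a PySpark 1.4+ SQLContext (or HiveContext).
    `path` may point at HDFS or Tachyon, e.g. a hdfs:// or tachyon:// URI.
    """
    data_frame = sql_context.read.parquet(path)   # schema comes from the Parquet footer
    data_frame.registerTempTable(table_name)      # Spark 1.x API for temporary tables
    return data_frame
```

After this call the table can be queried by name, for example with `sql_context.sql("SELECT COUNT(*) FROM events")` if it was registered as `events`.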

Configuring Jaws
• Cassandra
• HDFS
• Spray
• Application
• Spark
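sparkConfiguration {
  # spark-master takes a standalone URL, a Mesos URL, or yarn-client:
  #   "spark://devbox.local:7077" / "mesos://devbox.local:5050" / yarn-client
  spark-master="spark://devbox.local:7077"
  # false / true; true selects Mesos coarse-grained mode
  spark-mesos-coarse=false
  spark-cores-max=100
  spark-executor-instances=10
}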
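
The settings in the sparkConfiguration block line up with standard Spark properties (`spark.master`, `spark.mesos.coarse`, `spark.cores.max`, `spark.executor.instances`). A minimal sketch of that correspondence follows, assuming the hyphenated keys are turned into Spark property names by replacing hyphens with dots. That translation rule and the function name are assumptions for illustration, not a description of how Jaws itself does it.

```python
def to_spark_properties(jaws_settings):
    """Map hyphenated setting names to dotted Spark property names.

    Assumption: 'spark-cores-max' corresponds to 'spark.cores.max', and so on.
    Values are rendered as strings, with booleans lowercased ('true'/'false').
    """
    properties = {}
    for key, value in jaws_settings.items():
        spark_key = key.replace("-", ".")
        if isinstance(value, bool):
            value = str(value).lower()
        properties[spark_key] = str(value)
    return properties


settings = {
    "spark-master": "spark://devbox.local:7077",
    "spark-mesos-coarse": False,
    "spark-cores-max": 100,
    "spark-executor-instances": 10,
}
spark_properties = to_spark_properties(settings)
```

The resulting dictionary could then be applied to a `SparkConf` one pair at a time with `conf.set(key, value)`.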
