RubiX
A caching framework for big data engines in the cloud
Strata + Hadoop World
March 2017
Shubham Tagra (stagra@qubole.com)
Agenda
● Intro
● Why Caching?
● Path to Rubix
● Rubix Architecture
● Future of Rubix
● QnA
Built for Anyone who Uses Data
Analysts | Data Scientists | Data Engineers | Data Admins
Optimize performance,
cost, and scale through
automation, control and
orchestration of big data
workloads.
A Single Platform for Any Use Case
ETL & Reporting | Ad Hoc Queries | Machine Learning | Streaming | Vertical Apps
Open Source Engines, Optimized for the Cloud
Native Integration with multiple cloud providers
Qubole operates at Cloud Scale
● 500 PB: Data Processed in the Cloud Monthly
● 500 Nodes: Largest Spark Cluster in the Cloud
● 2000: Clusters Started per Month
[Chart: data processed, with labels 6 PB, 80 PB, 150 PB, 500 PB]
Why Caching
● Popularity of Cloud Stores like S3
+ Near-infinite capacity
+ Inexpensive
+ Ease of use
- Network Latencies
- Back-offs
Rubix ancestors
● File cache
○ Benefits: as much as 10x performance improvement
○ Problems
■ Huge warm-ups
■ Cache size
■ Tied to Presto
■ Required Presto scheduler changes
Requirements for new cache
● Improve performance
● Abstracted from user
○ Ease of use
● Support Columnar formats
○ Improves speed
● Work well with autoscaling
○ Saves cost
● Ease of extension to clouds and engines
Alternatives Considered: FUSE FileSystem
● Mount S3 paths on EC2
● Rely on the OS for page caching, read-ahead, etc.
● Problems
○ Requires exclusive control over the bucket
○ Data corruption on external updates
○ Not production ready
Alternatives Considered: HTTP Caching
● Worked fine with text data
● Problems
○ Columnar formats and Byte-Range based Varnish Keys
■ Poor hit ratio
■ Redundant copies
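The hit-ratio problem above can be illustrated with a small sketch. It assumes, as a simplification, that an HTTP cache keys each entry on the exact byte range requested, and compares that with keying on fixed 1 MB blocks (the block size mentioned on the Architecture slides). The file path and the helper names `range_key` and `block_keys` are hypothetical.

```python
BLOCK_SIZE = 1024 * 1024  # 1 MB blocks, as on the Architecture slides


def range_key(path, start, end):
    """Cache key when entries are keyed on the exact byte range (end exclusive)."""
    return (path, start, end)


def block_keys(path, start, end, block_size=BLOCK_SIZE):
    """Cache keys when the same read is split into fixed-size blocks."""
    first = start // block_size
    last = (end - 1) // block_size
    return [(path, block) for block in range(first, last + 1)]


# Two overlapping reads of the same (hypothetical) columnar file.
path = "s3://bucket/table/part-0.orc"
reads = [(path, 0, 3 * BLOCK_SIZE), (path, BLOCK_SIZE, 4 * BLOCK_SIZE)]

range_cache, block_cache = set(), set()
range_bytes = 0
for p, start, end in reads:
    key = range_key(p, start, end)
    if key not in range_cache:
        # The second read overlaps the first but has a different key: a miss.
        range_cache.add(key)
        range_bytes += end - start
    block_cache.update(block_keys(p, start, end))

block_bytes = len(block_cache) * BLOCK_SIZE
```

With range keys, the second read misses even though two thirds of its bytes are already cached, and the overlap is stored twice. With block keys, the overlapping blocks are shared.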
Tachyon/Alluxio
● More than just a caching system
● We required a lightweight system
● SQL first
Rubix
● Extensible to many engines
● Columnar format friendly
● Works well with autoscaling
● Share-able across engines/instances
Architecture
● Split ownership assignment system
● Data Caching System
● Plugins
Architecture
● Split ownership assignment system
○ Used on the master node during split computation
○ Calculates which node owns a particular split of a file
○ Uses consistent hashing to work well with autoscaling
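A minimal sketch of how consistent hashing keeps split ownership stable under autoscaling. This is not Rubix's actual implementation: the class name `ConsistentHashRing`, the use of MD5, the virtual-node count, the worker names, and the split key format are all assumptions made for illustration.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps split keys to nodes; each node owns several points on the ring."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes  # virtual nodes per physical node (assumed value)
        self._ring = []       # sorted list of (hash, node) points
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(h, n) for (h, n) in self._ring if n != node]

    def owner(self, split_key):
        """Return the node at the first ring point at or after the key's hash."""
        if not self._ring:
            raise ValueError("ring has no nodes")
        idx = bisect.bisect_left(self._ring, (self._hash(split_key),))
        return self._ring[idx % len(self._ring)][1]  # wrap around the ring


ring = ConsistentHashRing(["worker-1", "worker-2", "worker-3"])
splits = [f"s3://bucket/data/part-{i}.orc:0" for i in range(1000)]
before = {s: ring.owner(s) for s in splits}

ring.add_node("worker-4")  # the cluster scales up
after = {s: ring.owner(s) for s in splits}
moved = [s for s in splits if before[s] != after[s]]
```

Every split in `moved` now belongs to `worker-4`; no split is reshuffled between the existing workers, so their caches stay warm when the cluster grows or shrinks.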
Architecture
● Data Caching System
○ Used on worker nodes when data is read
○ Reads from disk or remote as per the metadata
○ Metadata stored in units of blocks (1 MB each)
○ BookKeeper provides metadata for each block
○ Metadata is also checkpointed to local disk
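The read path described above can be sketched as follows. This is only an illustration under assumptions: `BlockMetadata` is an in-memory stand-in for the metadata BookKeeper serves, and `plan_read` is a hypothetical helper that splits a read into runs of blocks to fetch from local disk or from the remote store.

```python
BLOCK_SIZE = 1024 * 1024  # metadata is tracked per 1 MB block


class BlockMetadata:
    """In-memory stand-in for per-file block metadata (assumed structure)."""

    def __init__(self):
        self._cached = {}  # path -> set of cached block indexes

    def is_cached(self, path, block):
        return block in self._cached.get(path, set())

    def mark_cached(self, path, block):
        self._cached.setdefault(path, set()).add(block)


def plan_read(meta, path, offset, length, block_size=BLOCK_SIZE):
    """Return (source, start_block, end_block) runs, end_block exclusive."""
    if length <= 0:
        return []
    first = offset // block_size
    last = (offset + length - 1) // block_size
    runs = []
    for block in range(first, last + 1):
        source = "disk" if meta.is_cached(path, block) else "remote"
        if runs and runs[-1][0] == source:
            # Extend the current run so adjacent blocks are read together.
            runs[-1] = (source, runs[-1][1], block + 1)
        else:
            runs.append((source, block, block + 1))
    return runs


meta = BlockMetadata()
path = "s3://bucket/data/part-0.orc"  # hypothetical file
meta.mark_cached(path, 1)
meta.mark_cached(path, 2)
plan = plan_read(meta, path, 0, 4 * BLOCK_SIZE)
```

Here `plan` asks for block 0 from remote, blocks 1 and 2 from disk, and block 3 from remote. After a remote read, the caller would mark those blocks as cached so later reads are served from disk.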
Architecture
● Plugin
○ Provides two types of information
■ How to get the list of nodes in the system
■ FileSystem for remote reads
○ E.g. presto plugin, hadoop1 plugin, hadoop2 plugin
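As a rough sketch of the plugin contract, the two pieces of information a plugin provides could be modeled as an abstract interface. The names `EnginePlugin`, `list_nodes`, `remote_filesystem`, and the in-memory filesystem below are hypothetical and do not reflect Rubix's actual classes.

```python
import io
from abc import ABC, abstractmethod


class EnginePlugin(ABC):
    """What an engine-specific plugin supplies to the caching layer."""

    @abstractmethod
    def list_nodes(self):
        """Return the nodes currently in the system."""

    @abstractmethod
    def remote_filesystem(self):
        """Return the filesystem object used for remote reads."""


class InMemoryFileSystem:
    """Toy stand-in for a cloud store client, used only for this example."""

    def __init__(self, files):
        self._files = dict(files)

    def open(self, path):
        return io.BytesIO(self._files[path])


class StaticClusterPlugin(EnginePlugin):
    """Plugin backed by a fixed node list and a supplied filesystem."""

    def __init__(self, nodes, filesystem):
        self._nodes = list(nodes)
        self._filesystem = filesystem

    def list_nodes(self):
        return list(self._nodes)

    def remote_filesystem(self):
        return self._filesystem


def read_remote(plugin, path, offset, length):
    """Read a byte range through whatever filesystem the plugin supplies."""
    with plugin.remote_filesystem().open(path) as stream:
        stream.seek(offset)
        return stream.read(length)


fs = InMemoryFileSystem({"s3://bucket/a": b"abcdefgh"})
plugin = StaticClusterPlugin(["worker-1", "worker-2"], fs)
data = read_remote(plugin, "s3://bucket/a", 2, 3)
```

Keeping these two concerns behind one interface is what lets the same caching core be reused across engines and cloud stores.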
Plugins: Presto
● Presto provides tight control over the scheduling of local splits
● This ensures that splits are always scheduled locally
● Worked well for our customers
Plugins: Hadoop
● Strict local scheduling was not possible with Hadoop
● This meant a lot of warm-ups and redundant copies of data
● Options:
○ Read directly from remote for non-local reads
○ Figure out the correct owner and read from it
○ Implement non-local reads for Hadoop support
○ Learnings
■ 100% strict location-based scheduling is not possible in H2
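The options above amount to a per-read decision. A minimal sketch, assuming the reader knows which node owns the split; the `ReadPath` enum, the `owner_reachable` flag, and the fallback order are assumptions for illustration, not Rubix's documented behavior.

```python
from enum import Enum


class ReadPath(Enum):
    LOCAL_CACHE = "local cache"
    OWNER_CACHE = "owner's cache"
    REMOTE_STORE = "remote store"


def choose_read_path(current_node, owner_node, owner_reachable=True):
    """Pick where a task should read a split from."""
    if current_node == owner_node:
        # The split was scheduled on its owner: serve it from the local cache.
        return ReadPath.LOCAL_CACHE
    if owner_reachable:
        # Non-local read: fetch from the owner, avoiding a redundant local copy.
        return ReadPath.OWNER_CACHE
    # Otherwise read directly from the cloud store.
    return ReadPath.REMOTE_STORE


local = choose_read_path("worker-1", "worker-1")
non_local = choose_read_path("worker-2", "worker-1")
fallback = choose_read_path("worker-2", "worker-1", owner_reachable=False)
```

Reading from the owner avoids the extra warm-ups and duplicate copies that arise when every node caches whatever it happens to read.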
Using Rubix with Presto
● Configure disk mount point
○ Assumes disks are mounted on /media/ephemeral0, /media/ephemeral1, etc. by default
● Start BookKeeper
● Place rubix jars in hive-hadoop2 plugin of Presto
● Configure Presto to use Rubix FileSystem for the cloud store
Using Rubix with Presto in Qubole
Using Rubix with Hadoop
● Configure disk mount point
○ Assumes disks are mounted on /media/ephemeral0, /media/ephemeral1, etc. by default
● Start BookKeeper
● Place rubix jars with hadoop libraries
● Configure Hadoop to use Rubix FileSystem for the cloud store
Extending to other Engines and Clouds
Performance gains
Future Work
● Extend to other clouds and engines
● Table aware objects in Rubix
● Caching policies for Hive Partitions
● Subquery caching
Questions?
