OBJECT STORAGE FOR BIG DATA
CEPH DAY SAN JOSE
Mengmeng Liu, Senior Manager, Big Fast Data Technology
Kyle Bader, Senior Solution Architect
THE STAKEHOLDERS
● Business stakeholder
● Data engineer
● IT ops
● Data scientist
● Data analyst
● Infrastructure engineer
WHAT THEY WANT
● Support diverse ecosystem of data analysis tools
● Independent scaling of compute and storage
● Rapid provisioning of data labs
● Public Cloud / Private Cloud architectural parity
● Controlled, predictable costs
THE ELEPHANT IN THE ROOM
Batch analytics with Map-Reduce on HDFS.
[Diagram: data is ingested into persistent HDFS storage, processed with Map-Reduce, and exits to other data warehouses]
PLURALITY OF ANALYTICS TOOLS
Single source of truth.
[Diagram: data is ingested into shared persistent storage serving multiple analytics tools and exits to other data warehouses]
BIG DATA LINEAGE
Multiple data copies from each stage of data transformation, potentially in different places.
The more analytical processing clusters you have, the harder it is to know which data is where.
DISAGGREGATION HIGHLIGHTS
● Ad-hoc provisioning of transient analytics clusters
● Scale compute up for the workload
● Terminate compute after the workload to reclaim resources
● Allow non-analytic workloads to make use of excess compute
● Break the linear relationship between compute and storage scaling
CEPH OBJECT STORE HIGHLIGHTS
● Striped erasure coding – 1.5x overhead vs. replication
● Strongly consistent
● Rich access control semantics
● Bucket versioning (see sketch below)
● Active/active multi-site replication (asynchronous)
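As a concrete illustration of the S3 compatibility and bucket-versioning points above, here is a minimal boto3 sketch against a Ceph RGW endpoint. The endpoint URL, credentials, and bucket/key names are placeholders, not values from the talk.

import boto3
from botocore.client import Config

# Hypothetical RGW endpoint and credentials; path-style addressing is
# commonly used with RGW.
s3 = boto3.client(
    "s3",
    endpoint_url="http://rgw.example.com:8080",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
    config=Config(s3={"addressing_style": "path"}),
)

# Create a bucket on the object store and enable versioning, so
# overwritten or deleted objects retain prior versions.
s3.create_bucket(Bucket="datalake")
s3.put_bucket_versioning(
    Bucket="datalake",
    VersioningConfiguration={"Status": "Enabled"},
)

# Each PUT now creates a new version instead of replacing the object.
s3.put_object(Bucket="datalake", Key="raw/events.json", Body=b'{"v": 1}')
s3.put_object(Bucket="datalake", Key="raw/events.json", Body=b'{"v": 2}')
print(s3.list_object_versions(Bucket="datalake", Prefix="raw/events.json"))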
ARCHITECTURE—ELASTIC DATA LAKE
Disaggregating compute resources from an object storage solution enables the most flexibility:
• Ingest from multiple sources using the S3A API
• Analytics operate directly on the object store, without expensive, time-consuming ETL or replication (see the sketch after the diagram)
• Analytics performed using batch or interactive tools
• Exploratory analysis supported by ephemeral clusters
[Diagram: ingest apps and streams write to an S3A-compatible object storage solution over S3/NFS; batch analytics, interactive frameworks, ephemeral clusters, and HDFS-backed data laboratories access the elastic data lake in situ via S3A, with query engines and resource management on top]
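To make the "analytics operate directly, no ETL" bullet concrete, here is a minimal PySpark sketch that queries JSON data in place over S3A. The endpoint, credentials, bucket path, and column name are hypothetical, and the hadoop-aws (S3A) connector must be on the Spark classpath.

from pyspark.sql import SparkSession

# Hypothetical Ceph RGW endpoint and credentials.
spark = (
    SparkSession.builder.appName("in-situ-analytics")
    .config("spark.hadoop.fs.s3a.endpoint", "http://rgw.example.com:8080")
    .config("spark.hadoop.fs.s3a.access.key", "ACCESS_KEY")
    .config("spark.hadoop.fs.s3a.secret.key", "SECRET_KEY")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()
)

# Read the data where it lives in the object store; no copy into HDFS first.
events = spark.read.json("s3a://datalake/raw/events/")
events.groupBy("event_type").count().show()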
WHAT WE’RE TESTING
Measuring key use cases from our Center of Excellence, from the query layer down to the storage platform.
Test execution:
● Use cases and workloads
● Structured / log data
● …
Parameters (swept in the sketch below):
● Query engines
● Data volume
● Object storage locations
● Configurations
[Diagram: the same elastic data lake architecture as before, exercised by a benchmark evaluation suite and compared against AWS]
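As a rough sketch of the parameter sweep described above, the snippet below times one representative aggregate query against two hypothetical object storage locations with Spark (assuming S3A is configured as shown later). The paths and TPC-DS-style column names are placeholders; the actual benchmark evaluation suite is not detailed in the deck.

import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("storage-benchmark").getOrCreate()

# Hypothetical storage locations to compare; the other parameters from the
# slide (query engine, data volume, configuration) would be swept the same way.
LOCATIONS = [
    "s3a://ceph-datalake/tpcds/store_sales/",
    "s3a://aws-baseline/tpcds/store_sales/",
]

for path in LOCATIONS:
    df = spark.read.parquet(path)
    start = time.perf_counter()
    df.groupBy("ss_store_sk").sum("ss_net_paid").collect()
    print(f"{path}: {time.perf_counter() - start:.1f}s")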
OUR BIG DATA JOURNEY @WALMARTLABS
Started building on-premise OpenStack/Ceph clouds in early 2016:
○ Decouple compute from storage
○ Leverage OpenStack/Ceph and OneOps (a lifecycle management tool open sourced by @WalmartLabs: https://github.com/oneops)
○ Predictable SLAs, flexibility of big data software versions, tenant-specific cloud deployment and operations, and a shared data lake with 1.5x storage overhead (erasure coding)
○ Analogous to the AWS + S3 model
MAKING THE CONNECTION
● Ceph RGW supports S3- and Swift-compatible RESTful APIs
● The S3A and Swift Hadoop connectors/drivers ship in the Hadoop codebase
● Interact with Ceph objects like files in HDFS (but with no append support)
● Register the data as external tables in the Hive metastore
MAKING THE CONNECTION: S3A
● https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html
● S3A connector in the HDFS client, configured via:
  fs.s3a.{access,secret}.key
  fs.s3a.endpoint
● Use in conjunction with Hive external tables (see the sketch below):
  create database mydb location 's3a://bucket/mydb';
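A minimal sketch of wiring the fs.s3a.* settings above into a Spark job with Hive support, then creating the external database and a table backed by objects in the Ceph bucket. The endpoint, credentials, bucket, and table schema are hypothetical.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("s3a-hive")
    .config("spark.hadoop.fs.s3a.access.key", "ACCESS_KEY")
    .config("spark.hadoop.fs.s3a.secret.key", "SECRET_KEY")
    .config("spark.hadoop.fs.s3a.endpoint", "http://rgw.example.com:8080")
    .enableHiveSupport()
    .getOrCreate()
)

# Hive database and external table whose data lives in the object store.
spark.sql("CREATE DATABASE IF NOT EXISTS mydb LOCATION 's3a://bucket/mydb'")
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS mydb.events (id BIGINT, payload STRING)
    STORED AS PARQUET
    LOCATION 's3a://bucket/mydb/events'
""")
spark.sql("SELECT COUNT(*) FROM mydb.events").show()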
MAKING THE CONNECTION: SWIFT
● http://hadoop.apache.org/docs/current/hadoop-openstack/index.html
● Swift connector in the HDFS client, configured via (see the sketch below):
  fs.swift.service.region_name.{username,password,tenant,region,public}
  fs.swift.service.region_name.auth.url
● Use in conjunction with Hive external tables:
  create database mydb location 'swift://container.region_name/mydb';
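The same wiring for the Swift connector, as a brief sketch. The service name ("region_name"), credentials, Keystone auth URL, and container path are placeholders.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("swift-read")
    .config("spark.hadoop.fs.swift.service.region_name.auth.url",
            "http://keystone.example.com:5000/v2.0/tokens")
    .config("spark.hadoop.fs.swift.service.region_name.username", "USER")
    .config("spark.hadoop.fs.swift.service.region_name.password", "PASSWORD")
    .config("spark.hadoop.fs.swift.service.region_name.tenant", "TENANT")
    .config("spark.hadoop.fs.swift.service.region_name.public", "true")
    .getOrCreate()
)

# Read objects from the "container" Swift container in place.
df = spark.read.parquet("swift://container.region_name/mydb/events")
df.show(5)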
LIMITATIONS OF S3A CONNECTOR
● Performant S3A connector features are only in the unreleased Hadoop 2.8.0-SNAPSHOT and later:
  ○ Lazy seek
  ○ Better thread-pool implementation
  ○ Multi-region support
  ○ http://events.linuxfoundation.org/sites/events/files/slides/2016-11-08-Hadoop,%20Hive,%20Spark%20and%20Object%20Stores%20-final.pdf
● The aws-sdk-java library that S3A depends on keeps changing, and Ceph needs to catch up by testing against the latest release
● A lot of testing against S3A is going on in the Ceph community
LIMITATIONS OF SWIFT CONNECTOR
● Job failures and inferior performance when running large-scale Hive/Spark/Presto queries:
  ○ Uncontrolled number of HTTP requests (list, rename, copy, etc.)
  ○ Listing large numbers of objects (bounded by a hard limit, e.g., 10,000 objects)
  ○ ORC range queries throw EOF exceptions
  ○ Re-authentication expiration issue causing deadlocks
INTRODUCING SWIFTA: A PERFORMANT SWIFT CONNECTOR
Many performance improvements while retaining the core functionality of the Swift API (@WalmartLabs will open source the SwiftA Hadoop connector this year and present it at the OpenStack Summit 2017 in Boston):
1. Re-designed and implemented new thread pools for list, copy, delete, and rename
2. Re-designed and implemented pagination and iterator-based object listing to reduce the memory footprint when retrieving large numbers of objects
3. Implemented lazy seek, which speeds up some Presto queries by 30x-100x (see sketch below)
4. Implemented an LRU cache to reduce the overhead of HEAD requests
5. Fixed the frequent re-authentication expiration error and its associated deadlocks
6. Addressed the ORC range query issue with dynamic offsets based on file lengths
7. Added metrics support for the input stream, output stream, Swift file system, etc.
8. Added support for parallel uploads during data creation (multi-part upload)
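To illustrate the lazy-seek idea behind item 3, here is a small conceptual sketch (not the actual SwiftA code, which is not shown in the deck): seek() only records the target offset, and the ranged GET is deferred until read() actually needs bytes.

class LazySeekStream:
    """Conceptual sketch of lazy seek over an object store.

    fetch_range(offset, length) stands in for a ranged GET against the
    store; it is a placeholder, not a real SwiftA or S3A API.
    """

    def __init__(self, fetch_range, object_length):
        self._fetch_range = fetch_range
        self._length = object_length
        self._pos = 0  # target offset; no HTTP request issued yet

    def seek(self, offset):
        # Lazy: just remember where the caller wants to be.
        self._pos = offset

    def read(self, size):
        # The ranged GET happens only when bytes are actually needed, so
        # seek-heavy readers (ORC/Parquet footers, Presto range scans)
        # avoid fetching and discarding data on every seek.
        size = min(size, self._length - self._pos)
        data = self._fetch_range(self._pos, size)
        self._pos += len(data)
        return data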
THANKS!
