Hadoop & cloud storage object store integration in production (final)

© Hortonworks Inc. 2011 – 2016. All Rights Reserved
Hadoop & Cloud Storage: Object Store Integration in Production
Chris Nauroth
Rajesh Balamohan
Hadoop Summit 2016

About Us
Rajesh Balamohan, rbalamohan@hortonworks.com, Twitter: @rajeshbalamohan
– Apache Tez Committer, PMC Member
– Mainly working on performance in Tez
– Have been using Hadoop since 2009
Chris Nauroth, cnauroth@hortonworks.com, Twitter: @cnauroth
– Apache Hadoop committer, PMC member, and Apache Software Foundation member
– Working on HDFS and alternative file systems such as WASB and S3A
– Hadoop user since 2010
Steve Loughran, stevel@hortonworks.com, Twitter: @steveloughran
– Apache Hadoop committer, PMC member, and Apache Software Foundation member
– Hadoop deployment since 2008, especially cloud integration; Filesystem Spec author.
– Working on: Apache Slider, Spark+cloud integration, Hadoop + Cloud

Agenda
⬢ Hadoop/Cloud Storage Integration Use Cases
⬢ Hadoop-compatible File System Architecture
⬢ Recent Enhancements in S3A FileSystem Connector
⬢ Hive Access Patterns
⬢ Performance Improvements and TPC-DS Benchmarks with Hive-TestBench
⬢ Next Steps for S3A and other Object Stores
⬢ Q & A

Why Hadoop in the Cloud?

Hadoop Cloud Storage Utilization Evolution
[Figure: three stages of HDFS/application data flow, labeled Input/Output with Backup/Restore, Input/Output with Copy, and Input/Output with tmp. Goal: evolution towards cloud storage as the primary Data Lake.]

What is the Problem?
Cloud Object Stores designed for
⬢ Scale
⬢ Cost
⬢ Geographic Distribution
⬢ Availability
⬢ Cloud app writers often modify apps to deal with cloud storage semantics and limitations
Challenges - Hadoop apps should work on HDFS or Cloud Storage transparently
⬢ Eventual consistency
⬢ Performance - separated from compute
⬢ Cloud Storage not designed for file-like access patterns
⬢ Limitations in APIs (e.g. rename)

Goal and Approach
Goals
⬢ Integrate with unique functionality of each cloud
⬢ Optimize each cloud’s object store connector
⬢ Optimize upper layers for cloud object stores
Overall Approach
⬢ Consistency in the face of eventual consistency (use a secondary metadata store)
⬢ Performance in the connector (e.g. lazy seek)
⬢ Upper layer improvements (Hive, ORC, Tez, etc.)
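The secondary-metadata-store approach can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the S3A design: `EventuallyConsistentStore` and `ConsistentListing` are hypothetical classes, listing lag is simulated by hiding new keys until `settle()` is called, and only newly created objects are handled (deletes would need extra bookkeeping).

```python
class EventuallyConsistentStore:
    """Hypothetical object store whose listings lag behind writes."""

    def __init__(self):
        self._objects = {}     # key -> bytes; reads by key are up to date
        self._listed = set()   # keys that listings currently return

    def put(self, key, data):
        # The write succeeds, but does not show up in listings yet.
        self._objects[key] = data

    def settle(self):
        # Simulate the store eventually catching up.
        self._listed = set(self._objects)

    def list(self, prefix):
        return sorted(k for k in self._listed if k.startswith(prefix))


class ConsistentListing:
    """Hypothetical wrapper that records writes in a consistent side table."""

    def __init__(self, store):
        self.store = store
        self.metadata = set()  # stands in for the secondary metadata store

    def put(self, key, data):
        self.store.put(key, data)
        self.metadata.add(key)

    def list(self, prefix):
        # Merge what the store lists with what we know we have written.
        listed = set(self.store.list(prefix))
        known = {k for k in self.metadata if k.startswith(prefix)}
        return sorted(listed | known)


store = EventuallyConsistentStore()
fs = ConsistentListing(store)
fs.put("table/part-0000", b"rows")
print(store.list("table/"))  # the raw listing misses the new object
print(fs.list("table/"))     # the merged listing includes it
```

The design choice is that the object store stays the source of truth for data, while the consistent side table only answers "what should a listing contain right now".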

Hadoop-compatible File System Architecture

Hadoop-compatible File System Architecture
⬢ Applications
– File system interactions coded to a file system-agnostic abstraction layer.
• FileSystem class - traditional API
• FileContext/AbstractFileSystem classes - newer API providing split between client API and provider API
– Can be retargeted to a different file system by configuration changes (not code changes).
• Caveat: Different FileSystem implementations may offer a limited feature set.
• Example: Only HDFS and WASB can run HBase.
⬢ File System Abstraction Layer
– Defines interface of common file system operations: create, open, rename, etc.
– Supports additional mix-in interfaces to indicate implementation of optional features.
– Semantics of each operation documented in formal specification, derived from HDFS behavior.
⬢ File System Implementation Layer
– Each file system provides a set of concrete classes implementing the interface.
– A set of common file system contract tests executes against each implementation to prove its adherence to the specified semantics.
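To make configuration-driven retargeting concrete, here is a minimal Python sketch of scheme-based dispatch. Hadoop itself does this in Java and also consults `fs.<scheme>.impl` configuration keys; in this sketch `FileSystem`, `InMemoryFileSystem`, `get_filesystem`, and the `mem` scheme are hypothetical stand-ins, and `conf` is a plain dict mapping keys to classes rather than class names.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class FileSystem(ABC):
    """Hypothetical file system-agnostic interface that applications code to."""

    @abstractmethod
    def open(self, path): ...

    @abstractmethod
    def rename(self, src, dst): ...


class InMemoryFileSystem(FileSystem):
    """Hypothetical toy implementation; real ones talk to HDFS, S3, Azure, etc."""

    def __init__(self):
        self.files = {}

    def open(self, path):
        return self.files[path]

    def rename(self, src, dst):
        if src not in self.files or dst in self.files:
            return False
        self.files[dst] = self.files.pop(src)
        return True


def get_filesystem(uri, conf):
    """Choose an implementation by URI scheme, in the spirit of fs.<scheme>.impl."""
    scheme = urlparse(uri).scheme
    impl = conf.get(f"fs.{scheme}.impl")
    if impl is None:
        raise ValueError(f"No FileSystem for scheme: {scheme}")
    return impl()


# Retargeting is a configuration change: the calling code stays the same.
conf = {"fs.mem.impl": InMemoryFileSystem}
fs = get_filesystem("mem://bucket/data", conf)
```

A contract test suite in this style would be a set of functions that take any `FileSystem` instance and assert the specified behavior, so every implementation runs the same checks.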

Cloud Storage Connectors
⬢ Azure
– WASB
• Strongly consistent
• Good performance
• Well-tested on applications (incl. HBase)
– ADL
• Strongly consistent
• Tuned for big data analytics workloads
⬢ Amazon Web Services
– S3A
• Eventually consistent; consistency work in progress by Hortonworks
• Performance improvements in progress
• Active development in Apache
– EMRFS
• Proprietary connector used in EMR
• Optional strong consistency for a cost
⬢ Google Cloud Platform
– GCS
• Multiple configurable consistency policies
• Currently Google open source
• Good performance
• Work under way for contribution to Apache

Case Study: S3A Functionality and Performance

Authentication
⬢ Basic
– AWS Access Key ID and Secret Access Key in Hadoop Configuration Files
– Hadoop Credential Provider API to avoid using world-readable configuration files
⬢ EC2 Metadata
– Reads credentials that AWS publishes directly into EC2 VM instances
– More secure, because secrets need not be distributed externally
⬢ AWS Environment Variables
– Less secure, but potentially easier integration for some applications
⬢ Session Credentials
– Temporary security credentials issued by the AWS Security Token Service (STS)
– Fixed lifetime reduces impact of credential leak
⬢ Anonymous Login
– Easy read-only access to public buckets for early prototyping
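Several of these mechanisms can be tried in order until one yields credentials. The sketch below assumes such an ordered chain purely for illustration; the real S3A connector is configured in Java, and the order here is not a statement of its defaults. The configuration keys `fs.s3a.access.key`/`fs.s3a.secret.key` and the environment variables `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY` are real names; the function names and the stand-in for the EC2 metadata lookup are hypothetical.

```python
import os
from typing import Callable, Iterable, Mapping, Optional, Tuple

Credentials = Tuple[str, str]  # (access key ID, secret access key)


def from_config(conf: Mapping[str, str]) -> Optional[Credentials]:
    """Basic: keys held in Hadoop configuration (ideally via a credential provider)."""
    key, secret = conf.get("fs.s3a.access.key"), conf.get("fs.s3a.secret.key")
    return (key, secret) if key and secret else None


def from_environment(env: Mapping[str, str] = os.environ) -> Optional[Credentials]:
    """AWS environment variables."""
    key, secret = env.get("AWS_ACCESS_KEY_ID"), env.get("AWS_SECRET_ACCESS_KEY")
    return (key, secret) if key and secret else None


def from_instance_metadata() -> Optional[Credentials]:
    """Hypothetical stand-in for reading credentials from EC2 instance metadata."""
    return None


def resolve_credentials(
    providers: Iterable[Callable[[], Optional[Credentials]]],
) -> Optional[Credentials]:
    """Return the first credentials any provider yields; None means anonymous."""
    for provider in providers:
        creds = provider()
        if creds is not None:
            return creds
    return None


chain = [
    lambda: from_config({}),
    lambda: from_environment({"AWS_ACCESS_KEY_ID": "AKIDEXAMPLE",
                              "AWS_SECRET_ACCESS_KEY": "example-secret"}),
    from_instance_metadata,
]
creds = resolve_credentials(chain)
```

Returning `None` at the end of the chain models the anonymous, read-only fallback for public buckets.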

Encryption
⬢ S3 Server-Side Encryption
– Encryption of data at rest at S3
– Supports the SSE-S3 option: each object is encrypted with a unique key using the AES-256 cipher
– Now covered in S3A automated test suites
– Support for additional options under development (SSE-KMS and SSE-C)

Supportability
⬢ Documentation
– Backfill missing documentation, and include documentation in new enhancements
– To be published to hadoop.apache.org with Apache Hadoop 2.8.0 release
– Meanwhile, raw content visible on GitHub:
• https://github.com/apache/hadoop/blob/branch-2.8/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
⬢ Error Reporting
– Identify common user errors and provide more descriptive error messages
– S3 HTTP error codes examined and translated to specific error types
⬢ Instrumentation
– Internal metrics covering a wide range of metadata and data operations
– Already proven helpful in flagging a potential performance regression in a patch
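Translating HTTP status codes into specific error types can be sketched as a small lookup. Python's built-in exceptions stand in for the connector's Java exception types, `translate_s3_error` is a hypothetical helper, and the particular mapping is an illustrative assumption rather than a copy of S3A's rules.

```python
def translate_s3_error(operation: str, path: str, status: int, message: str) -> OSError:
    """Map an S3 HTTP status code to a more specific exception (illustrative)."""
    detail = f"{operation} on {path}: {message} (HTTP {status})"
    if status in (401, 403):
        return PermissionError(detail)    # credentials missing or not authorized
    if status == 404:
        return FileNotFoundError(detail)  # no object at this key
    return OSError(detail)                # everything else stays generic


err = translate_s3_error("open", "s3a://bucket/data/part-0000", 404, "Not Found")
```

Keeping the operation and path in the message is what makes the error actionable for a user, rather than a bare status code.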

Performance Improvements
⬢ Lazy Seek
– Earlier implementation
• Reopened the file on every seek call, aborting the connection on every reopen
• Positional read was expensive (seek, read, seek back)
– Current implementation
• Seek is a no-op on the wire; it only records the target position
• The real seek is performed only when data is actually read
⬢ Connection Abort Problem
– Backward seeks caused connection aborts
– Recent modifications to S3AFileSystem fix these and add support for sequential and random read patterns, selected with:
• fs.s3a.experimental.input.fadvise
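The following Python sketch models lazy seek over a remote object and counts GET requests so the effect is visible. It assumes an HTTP response body can only be consumed forwards, so a backward seek forces a reopen; it also assumes forward skips always reuse the open stream. The two `policy` values loosely mirror the sequential and random read patterns above. `RemoteObject`, `LazySeekInputStream`, and `readahead` are hypothetical names, not S3A classes or settings.

```python
class RemoteObject:
    """Hypothetical remote object that counts ranged GET requests."""

    def __init__(self, data: bytes):
        self.data = data
        self.gets = 0

    def get_range(self, start: int, end: int) -> bytes:
        self.gets += 1
        return self.data[start:end]


class LazySeekInputStream:
    """Hypothetical stream: seek() records a position; read() does the real work."""

    def __init__(self, obj: RemoteObject, policy: str = "sequential", readahead: int = 64):
        if policy not in ("sequential", "random"):
            raise ValueError(f"unknown policy: {policy}")
        self.obj = obj
        self.policy = policy
        self.readahead = readahead
        self.pos = 0           # position the caller asked for
        self.buffer = b""      # bytes returned by the current GET
        self.conn_start = 0    # object offset where the current GET begins
        self.conn_end = 0      # object offset where the current GET ends
        self.wire_pos = None   # next unread offset on the open stream; None = closed

    def seek(self, pos: int) -> None:
        self.pos = pos         # lazy: no request is issued here

    def read(self, n: int) -> bytes:
        size = len(self.obj.data)  # object length, e.g. known from a HEAD request
        n = min(n, size - self.pos)
        if n <= 0:
            return b""
        end = self.pos + n
        # A response body can only be consumed forwards, so the open stream is
        # reusable only for forward positions inside its range.
        reusable = (self.wire_pos is not None
                    and self.wire_pos <= self.pos and end <= self.conn_end)
        if not reusable:
            # The real seek: open a new GET at the requested position.
            if self.policy == "random":
                conn_end = min(size, self.pos + max(n, self.readahead))
            else:
                conn_end = size  # sequential: stream to the end of the object
            self.buffer = self.obj.get_range(self.pos, conn_end)
            self.conn_start, self.conn_end = self.pos, conn_end
        offset = self.pos - self.conn_start
        chunk = self.buffer[offset:offset + n]
        self.pos = self.wire_pos = end
        return chunk


data = bytes(range(256)) * 4   # a 1 KiB object
obj = RemoteObject(data)
stream = LazySeekInputStream(obj)
stream.seek(100)
stream.seek(10)                # repeated seeks cost nothing
first = stream.read(4)         # the first GET happens here
```

The sequential policy favors long scans (one GET serves many reads), while the random policy keeps each GET short so that jumping around a columnar file does not waste bandwidth on bytes that are later discarded.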

Hive Access Patterns
⬢ ETL and Admin Activities
– Bringing in datasets / Creating Tables
– Cleansing / Transforming Data
– Analyze Tables, Compute Column Statistics
– MSCK to fix partition-related information
⬢ Read
– Running Queries
⬢ Write
– Store Output

Hive - MSCK Improvements
⬢ MSCK helps fix the metastore for partitioned datasets
– Scans the table path to identify missing partitions (expensive in S3)
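Because S3 can return every key under a prefix with a flat (non-delimited) listing, partitions can be derived from object keys without walking directories one level at a time. This minimal sketch assumes Hive-style `key=value` directory names and plain-string keys; `find_missing_partitions` is a hypothetical helper, not Hive's MSCK implementation.

```python
def find_missing_partitions(object_keys, table_root, known_partitions):
    """Return partition specs found under table_root but not yet registered.

    object_keys: keys from a flat listing of the table prefix.
    known_partitions: specs already in the metastore, e.g. {"dt=2016-06-27"}.
    """
    prefix = table_root.rstrip("/") + "/"
    on_storage = set()
    for key in object_keys:
        if not key.startswith(prefix):
            continue
        dirs = key[len(prefix):].split("/")[:-1]   # drop the file name
        if dirs and all("=" in d for d in dirs):   # Hive-style key=value dirs
            on_storage.add("/".join(dirs))
    return sorted(on_storage - set(known_partitions))


keys = [
    "warehouse/sales/dt=2016-06-27/part-0000",
    "warehouse/sales/dt=2016-06-28/part-0000",
    "warehouse/sales/dt=2016-06-28/part-0001",
    "warehouse/sales/_SUCCESS",
]
missing = find_missing_partitions(keys, "warehouse/sales", {"dt=2016-06-27"})
```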

Hive - Analyze Column Statistics Improvements
⬢ Hive needs statistics to run queries efficiently
– Gathering table and column statistics can be expensive in partitioned datasets

Performance Considerations When Running Hive Queries
⬢ Splits Generation
– File formats like ORC provide a thread pool for split generation
⬢ ORC Footer Cache
– hive.orc.cache.stripe.details.size > 0
– Caches footer details; helps reduce data reads during split generation
⬢ Reduce S3A reads on the task side
– hive.orc.splits.include.file.footer=true
– Sends ORC footer information in the splits payload
– Helps reduce the amount of data read on the task side
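A footer cache is essentially an LRU map from file to footer. In this sketch `FooterCache` and the `read_footer` callback are hypothetical; keying on path, length, and modification time (so a rewritten file misses the cache) is an assumption, and `max_entries=0` disables caching to mirror the `> 0` requirement above.

```python
from collections import OrderedDict


class FooterCache:
    """Hypothetical LRU cache of file footers for split generation."""

    def __init__(self, max_entries, read_footer):
        self.max_entries = max_entries    # 0 disables caching
        self.read_footer = read_footer    # callable doing the remote S3 read
        self.entries = OrderedDict()
        self.remote_reads = 0

    def get(self, path, length, mtime):
        # Assumption: length + mtime in the key make a rewritten file miss.
        key = (path, length, mtime)
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        footer = self.read_footer(path)
        self.remote_reads += 1
        if self.max_entries > 0:
            self.entries[key] = footer
            if len(self.entries) > self.max_entries:
                self.entries.popitem(last=False)  # evict least recently used
        return footer


cache = FooterCache(max_entries=2, read_footer=lambda p: f"footer of {p}")
for _ in range(3):                 # e.g. the same table planned three times
    cache.get("s3a://bucket/t/part-0000.orc", 1024, 1)
```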

Performance Considerations When Running Hive Queries
⬢ Tez Splits Grouping
– Hive uses Tez as its default execution engine
– Tez groups splits based on min/max group size settings, location details, and so on
– S3A always provides “localhost” as its block location information
– When split lengths fall below the minimum group size, Tez aggressively groups them into a single split. This causes issues with S3A, as a single task ends up reading all the data sequentially.
– Fixed in recent releases
⬢ Container Launches
– S3A always provides “localhost” for block locations
– Recommended: set “yarn.scheduler.capacity.node-locality-delay=0”
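The single-split effect can be shown with a simplified size-clamping model. This is an assumption-laden sketch, not Tez's grouping code: `group_count` is a hypothetical function that ignores locations, waves, and every other input Tez really uses, and only clamps the per-group size between the minimum and maximum.

```python
import math

MB = 1024 * 1024


def group_count(split_lengths, desired_tasks, min_group, max_group):
    """Grouped splits (tasks) under a simplified min/max size-clamping model.

    Hypothetical model: ignores locations, waves, and everything else Tez uses.
    """
    total = sum(split_lengths)
    if total == 0 or desired_tasks <= 0:
        return 1
    per_group = total / desired_tasks
    if per_group > max_group:
        return math.ceil(total / max_group)   # too much data per task: split up
    if per_group < min_group:
        return max(1, total // min_group)     # too little: merge into fewer groups
    return desired_tasks


# Five 8 MB files with a 50 MB minimum group size collapse into one task.
tasks = group_count([8 * MB] * 5, desired_tasks=20, min_group=50 * MB, max_group=1024 * MB)
```

Under this model, one task then reads all 40 MB from S3 one file after another, which is exactly the sequential behavior described above.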

Hive-TestBench Benchmark Results
⬢ Hive-TestBench has a subset of queries from TPC-DS (https://github.com/hortonworks/hive-testbench)
⬢ m4.4xlarge - 5 nodes
⬢ TPC-DS @ 200 GB scale in S3
⬢ “HDP 2.3 + S3 in cloud” vs “HDP 2.4 + S3 in cloud”
– Average speedup 2.5x
– Queries such as 15, 17, 25, 73, and 75 did not run in HDP 2.3 (they threw AWS timeout exceptions)

Hive-TestBench Benchmark Results - LLAP
⬢ LLAP DAG runtime comparison with Hive
⬢ LLAP significantly reduces the amount of data read from S3, improving runtime

Best Practices
⬢ Tune multipart settings
– fs.s3a.multipart.threshold (default: Integer.MAX_VALUE)
– fs.s3a.multipart.size (default: 100 MB)
– fs.s3a.connection.timeout (default: 200 seconds)
⬢ Tune File Committer Algorithm
– mapreduce.fileoutputcommitter.algorithm.version=2
⬢ Disable node locality delay in YARN
– Set “yarn.scheduler.capacity.node-locality-delay=0” to avoid delays in container launches
⬢ Disable storage-based authorization in Hive
– hive.security.metastore.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.DefaultHiveMetastoreAuthorizationProvider
– hive.metastore.pre.event.listeners= (set to empty value)
⬢ Tune ORC threads for reducing split generation times
– hive.orc.compute.splits.num.threads (default 10)
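The benefit of committer algorithm version 2 on an object store can be estimated with simple arithmetic. This sketch assumes rename is implemented as copy plus delete (as with S3A, where a directory rename copies every object under it) and models only the bytes copied; `committer_copy_cost` is a hypothetical helper.

```python
def committer_copy_cost(task_output_bytes, algorithm_version):
    """Bytes copied by renames during commit, assuming rename = copy + delete.

    v1: task commit renames the attempt directory into the job's temporary
        area, then job commit moves every task's files to the final output.
    v2: task commit moves files straight to the final output.
    """
    if algorithm_version not in (1, 2):
        raise ValueError("algorithm version must be 1 or 2")
    total = sum(task_output_bytes)
    return {
        "task_commit": total,                                   # done in each task
        "job_commit": total if algorithm_version == 1 else 0,   # done by the job committer
    }


outputs = [256 * 1024 * 1024] * 4      # four tasks writing 256 MiB each
v1 = committer_copy_cost(outputs, 1)
v2 = committer_copy_cost(outputs, 2)
```

Under this model, version 2 halves the copying and removes the job-commit copy pass entirely. The trade-off is that it gives up job-level atomicity: files from committed tasks can appear in the destination before the whole job succeeds.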

Next Steps for S3A and other Object Stores
⬢ S3A Phase III
– https://issues.apache.org/jira/browse/HADOOP-13204
⬢ Output Committers
– Logical commit operation decoupled from rename (non-atomic and costly in object stores)
⬢ Object Store Abstraction Layer
– Avoid impedance mismatch with FileSystem API
– Provide specific APIs for better integration with object stores: saving, listing, copying
⬢ Ongoing Performance Improvement
– Less chatty call pattern for object listings
– Metadata caching to mask latency of remote object store calls
⬢ Consistency

Summary
⬢ Evolution towards cloud storage
⬢ Hadoop-compatible File System Architecture fosters integration with cloud storage
⬢ Integration with multiple cloud providers available: Azure, AWS, Google
⬢ Recent enhancements in S3A
⬢ Hive usage and TPC-DS benchmarks show significant S3A performance improvements
⬢ More coming soon for S3A and other object stores

Q & A
Thank You!
