Cloudy with a chance of Hadoop
Running Hadoop in the cloud(s)
Ram Venkatesh
Mingliang Liu
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Presenters
Mingliang Liu
Software Engineer, Hortonworks
Apache Hadoop Committer
Ram Venkatesh
Senior Director of Engineering, Hortonworks
Agenda
 Use cases and scenarios
 Problems encountered and lessons learned
 A couple deep dives
– Fault tolerance
– Object storage consistency
 Wrap-up
Hadoop-in-the-cloud Use Cases
 Full on-premise multi-user, multi-tenant cluster with all Hadoop ecosystem components
 Per-workload clusters for specific use cases
– Clusters for processing batch jobs such as MapReduce, Pig, Hive or Spark jobs
– Clusters for running interactive workloads such as Hive LLAP or Livy
 Dev, QA, UAT setups for non-production and pre-production use cases
 Production setups with SLAs, monitoring, and more
 Self-service vs full-service
– Some clusters set up by sophisticated DevOps and admin groups
– Some clusters spun up by end users such as data engineers or data scientists
 Long-running vs ephemeral
 Varying security and compliance requirements
No one-size-fits-all solution possible!
Hortonworks Cloud Solutions
                            Microsoft         AWS                              Google
  Managed                   Azure HDInsight   –                                –
  Non-Managed/Marketplace   –                 Hortonworks Data Cloud for AWS   –
  Cloud IaaS                Hortonworks Data Platform (via Cloudbreak and Ambari)
                            ^ FOCUS OF THIS TALK
Easily Launch HDP on Any Cloud with Cloudbreak
Use cases: Dev/Test, BI/Analytics, IoT, On-Premises.
Cloudbreak is a tool for provisioning clusters on cloud infrastructure. It allows enterprises to simplify the provisioning of clusters in the cloud and optimize their use of cloud resources as workloads change.
Cloudbreak Goals and Motivations
 Declarative/full Hadoop stack provisioning in all major cloud providers
 Automate and unify the process
 Same process through the cluster lifecycle (Dev, QA, UAT, Prod)
 Provide first-class DevOps tooling – UI, REST API and CLI/shell
 Flexible cluster shapes – security, HA, cluster topologies
 Cloud friendly – elasticity, auto-scaling, fault tolerance, auto recovery
Lessons Learned
 Not all cloud providers are the same
– Difference in performance, storage and functionality
 Know your customer – capacity planning is fundamentally different
– Based on workload type (batch / interactive and ad-hoc / long running)
– Use heterogeneous clusters
– Cluster size is a variable in your calculations
– Trial and error – mistakes are cheap, iterate until you find your best fit
 Storage and what you do with it matters
– Multiple choices (ephemeral, block storage and object stores)
– Speed, Cost, Reliability are all important factors to think about
– defaultFS vs cloud object store connector architecture
– Default Hive warehouse directory configuration
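As an illustration of that last point, pointing the Hive warehouse at cloud storage rather than the cluster-local defaultFS might look like the following hive-site.xml fragment; the bucket name here is hypothetical:

```xml
<property>
  <name>hive.metastore.warehouse.dir</name>
  <!-- hypothetical bucket; keeps warehouse data outside the ephemeral cluster -->
  <value>s3a://example-bucket/apps/hive/warehouse</value>
</property>
```

With this, table data survives cluster teardown even though the cluster's HDFS does not.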
Lessons Learned: Cloud Provider Specific
 Compute
– Find your instance types for the workload, use heterogeneous clusters
– Different instance types for transient (e.g. C4, M4) and long running (e.g. H2, D2) clusters
– Dedicated instances (to avoid noisy neighbors and for regulations, e.g. HIPAA)
 Network
– Use enhanced networking (enabled by default on Amazon Linux; for RHEL-based images, apply a patch)
– Placement groups, cross AZ deployments
– Not all instance types can use the 10 Gbit network (e.g. only the larger 8x sizes can)
 Storage
– Azure: the ephemeral disk is faster than the root disk, but its contents do not survive auto-updates
– Multiple storage choices – WASB, ADLS
– Multiple connector choices for S3 – pick S3A, it’s the latest
Lessons You DON’T Want to Learn
 Security considerations – defense in depth, layered from the outside in: Network → Compute → Workload → Data
Some Things to Think About
 Network security
– Private networks, internet connectivity, ports and protocols, security groups
 Compute
– Edge nodes vs cluster nodes, SSH access and who needs it, IAM roles are your friend
 Workloads
– Authenticated end points, API security, workload-specific authentication and authorization
 Data Protection
– At rest, in motion and everything in between
 Extra Credit
– Audit and traceability
– DevOps as code
– Declarative automation for review and change management
And Finally: “Cloud Readiness” and Fault Tolerance
 VM != Bare Metal
 Cloud Storage != HDFS
 HDFS NameNode & YARN ResourceManager HA != Cloud fault tolerance
 All the parts matter… it’s all ephemeral, y’all
 Externalize everything – files, tables, schemas, policies, Ambari state, Cloudbreak state
Demo
Auto-scaling and Fault Tolerance
Cloud Storage Made Better
Bridging S3 consistency model and Hadoop applications
HDFS And Cloud Storage Are Not Mutually Exclusive
[Diagram: evolution towards cloud storage as the primary Data Lake]
1. HDFS-only: the application reads its input from and writes its output to HDFS.
2. Transitional: the application still uses HDFS for input and output, with backup/restore and copies to cloud storage.
3. Goal: the application reads input from and writes output to cloud storage directly, keeping HDFS only for temporary/intermediate data.
An Object Store Pretending to Be a FileSystem
 Cloud Object Stores designed for
– Scale
– Cost
– Geographic Distribution
– Availability
 Cloud-native apps deal directly with cloud storage semantics and limitations
 Hadoop apps should work on cloud storage transparently
– S3A and WASB partially adhere to the FileSystem specification
– ADL supports the WebHDFS REST API
Hadoop FileSystem: One Interface Fits All
org.apache.hadoop.fs.FileSystem
hdfs | s3a | wasb | adl | swift | gs
Practical Problems Using Cloud Storage Service in Hadoop
 P1: Performance
– Separated from compute (e.g. data locality)
– Slow metadata read
 P2: Limitations in APIs
– File formats and access patterns vs object oriented streaming
– Non-atomic operations
• delete(path, recursive=true)
• rename(source, dest)
 P3: Eventual consistency (S3 specific)
– List
– Delete
– Update
P2: Not Atomic API: rename()
 On an object store, rename() is not atomic – it is a series of operations on the client. For example, renaming /work/pending/part-01 to /work/complete/part-01 across servers s01–s04:
– hash("/work/pending/part-01") → ["s02", "s03", "s04"]
– copy("/work/pending/part-01", "/work/complete/part-01")
– delete("/work/pending/part-01")
– hash("/work/pending/part-00") → ["s01", "s02", "s04"], and the copy/delete sequence repeats for the next file
A client failure between the copy and the delete leaves the namespace in an inconsistent, half-renamed state.
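The client-side rename can be sketched with a toy in-memory model of an object store (illustrative only, not the real S3A code); it shows that rename is a copy followed by a delete, with a window in which both paths exist:

```python
# Toy model of client-side rename() on an object store (not the real S3A code).
class ToyObjectStore:
    def __init__(self):
        self.objects = {}  # key -> bytes

    def put(self, key, data):
        self.objects[key] = data

    def copy(self, src, dest):
        self.objects[dest] = self.objects[src]

    def delete(self, key):
        self.objects.pop(key, None)

    def rename(self, src, dest, fail_between=False):
        # rename() is emulated as copy-then-delete: two separate calls.
        self.copy(src, dest)
        if fail_between:
            raise RuntimeError("client died after copy, before delete")
        self.delete(src)

store = ToyObjectStore()
store.put("/work/pending/part-01", b"data")
try:
    store.rename("/work/pending/part-01", "/work/complete/part-01",
                 fail_between=True)
except RuntimeError:
    pass
# After the simulated failure, BOTH paths exist: the rename was not atomic.
print(sorted(store.objects))
```

On HDFS the same rename is a single atomic metadata operation, which is why committers that rely on rename() behave differently on object stores.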
P3: Eventual Consistency From FileSystem’s View
 When listing a directory
– Newly created files may not yet be visible, deleted ones still present
 After updating a file
– Opening and reading the file may still return the previous data
 After deleting a file
– Opening the file may succeed, returning the data
 While reading an object
– If object is updated or deleted during the process
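These behaviors can be reproduced with a toy eventually-consistent store (illustrative only; real S3 behavior depends on timing and replication). Writes land on a primary copy, while reads and listings are served from a replica that lags behind:

```python
# Toy eventually-consistent store: writes go to a primary copy, but reads
# and listings are served from a stale replica until propagate() is called.
class EventuallyConsistentStore:
    def __init__(self):
        self.primary = {}   # latest writes
        self.replica = {}   # stale view served to readers

    def put(self, key, data):
        self.primary[key] = data

    def delete(self, key):
        self.primary.pop(key, None)

    def get(self, key):
        return self.replica.get(key)   # may return deleted or old data

    def list_keys(self, prefix):
        return sorted(k for k in self.replica if k.startswith(prefix))

    def propagate(self):
        self.replica = dict(self.primary)   # replica catches up

store = EventuallyConsistentStore()
store.put("/work/part-00", b"v1")
store.propagate()
store.delete("/work/part-00")       # delete acknowledged by the primary...
print(store.get("/work/part-00"))   # ...but a read still returns b'v1'
store.put("/work/part-01", b"new")
print(store.list_keys("/work/"))    # deleted file still listed, new one missing
store.propagate()
print(store.list_keys("/work/"))    # now only the new file remains
```

This is exactly the window in which a Hive or Spark job can compute over incomplete or stale input.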
P3: Eventually Consistent – Seeing Deleted Data
[Diagram: across servers s01–s04, the client issues DELETE /work/pending/part-00 and receives 200; subsequent GET /work/pending/part-00 requests also return 200 with the old data, because not all replicas have observed the delete yet.]
S3Guard: Fast And Consistent S3 Metadata
 Goals
– Provide consistent list and get status operations on S3 objects written with S3Guard enabled
• listStatus() after put and delete
• getFileStatus() after put and delete
– Performance improvements that impact real workloads
– Provide tools to manage associated metadata and caching policies.
 Again, 100% open source in the Apache Hadoop community
– Hortonworks, Cloudera, Western Digital, Disney, …
 Inspired by the Apache-licensed S3mper project from Netflix
 Seamless integration with S3AFileSystem
S3Guard: Core Ideas
 Using a consistent store (DynamoDB) for indexing metadata
 Mutating file system operations
– Update both S3 and DynamoDB
 Read operations
– First check results against the metadata in DynamoDB
– Return results to callers as sourced from S3
– On disagreement, S3A waits and rechecks both S3 and DynamoDB
 Try it today!
<property>
<name>fs.s3a.metadatastore.impl</name>
<value>org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore</value>
</property>
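A minimal sketch of the core idea (not the actual S3Guard implementation): mutating operations record creations and deletions in a consistent metadata store, and listings union the possibly-stale S3 results with that metadata, applying tombstones for deletes:

```python
# Sketch of the S3Guard idea: a consistent metadata store alongside an
# eventually-consistent object store. Not the real implementation.
class GuardedStore:
    def __init__(self, stale_list):
        self.stale_list = stale_list  # callable simulating a (possibly stale) S3 LIST
        self.created = set()          # metadata store: keys known to exist
        self.deleted = set()          # metadata store: tombstones for deletes

    def create(self, key):
        # Mutations update both S3 (elided here) and the metadata store.
        self.created.add(key)
        self.deleted.discard(key)

    def delete(self, key):
        self.deleted.add(key)
        self.created.discard(key)

    def list_keys(self, prefix):
        from_s3 = set(self.stale_list(prefix))
        from_meta = {k for k in self.created if k.startswith(prefix)}
        # The union adds keys S3 hasn't surfaced yet;
        # tombstones drop entries S3 still (stalely) reports.
        return sorted((from_s3 | from_meta) - self.deleted)

# A stale S3 LIST: still shows a deleted key, misses a newly created one.
stale = lambda prefix: ["/work/old", "/work/deleted"]
g = GuardedStore(stale)
g.create("/work/old")
g.create("/work/new")        # written with S3Guard enabled
g.delete("/work/deleted")    # tombstoned in the metadata store
print(g.list_keys("/work/"))  # ['/work/new', '/work/old']
```

S3 remains the source of truth for object data; the metadata store only corrects the view of the namespace.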
S3Guard Write/Read Path
[Diagram] The Hadoop application issues FileSystem operations against S3AFileSystem (the Hadoop FileSystem for Amazon S3), which wraps a DynamoDB client and an S3 client.
Write path: (1) write object data to Amazon S3, then (2) write fs metadata to Amazon DynamoDB.
Read path: (1) read fs metadata from DynamoDB, (2) list object info from S3, (3) read object data from S3.
S3Guard: Faster Metadata Listing
[Bar chart: Hive query performance with and without S3Guard, runtime in seconds (0–70); bars compare total runtime of the query and split computation runtime. The lower, the better – S3Guard reduces both.]
Practical Problems Using Amazon S3 in Hadoop
 P1: Performance
– Separated from compute
– Slow metadata read
 P2: Limitations in APIs
 P3: Eventual consistency
– Listing inconsistency
– Delete inconsistency
– Update inconsistency
Learn More
 Try Cloudbreak Today
– https://hortonworks.com/open-source/cloudbreak/
 Try Hortonworks Data Cloud Today
– GA: https://aws.amazon.com/marketplace/pp/B01LXOQBOU
– Technical Preview: http://hortonworks.github.io/hdp-aws/
 BREAKOUT SESSIONS
– Wednesday, June 14 @ 5:50p, Don’t Let Spark Burn Your House: Perspectives on Securing Spark
 CRASH COURSE
– Thursday, June 15 @ 3:00p – 6:00p, Apache Spark and Apache Hive processing on the Cloud
 BIRDS OF A FEATHER
– Thursday, June 15 @ 5:00p, Security and Governance
– Thursday, June 15 @ 5:00p, Cloud and Operations
Thank you


Editor's Notes

  • #15 Next we will talk about the challenges of using cloud storage options for real-world applications. Specifically, we will discuss the Amazon S3 consistency model and how to solve the problems that arise when Hadoop applications work directly with Amazon S3.
  • #16 So, you may have heard opinions stating that, since migrating Hadoop from on-premise clusters to the cloud brings many benefits, cloud storage can replace HDFS. Some argue that Amazon S3, for example, is 10X better than HDFS. There are indeed aspects where S3 is better than HDFS: lower total cost, elasticity, high availability and durability, reduced operational burden, etc. However, there are also problems with using S3 to replace HDFS. For example, HDFS provides much higher read/write throughput and strong file system guarantees. We will talk about the practical problems shortly. In fact, they have different pros and cons for different use cases, and fortunately they are not mutually exclusive. We suggest our customers use the right storage in the right place. So in this picture, the cloud object storage service is the final input and output for the Hadoop application, while HDFS serves as intermediate data storage during the lifecycle of the virtual cluster in the cloud. In this way, we can exploit the advantages of the cloud storage service while keeping high throughput, so overall performance is not compromised.
  • #17 How do we do that? Ideally, the cloud storage connectors adhere to the file system specification, so the upper-level applications are not aware of the underlying implementation and migration is extremely easy. However, a cloud storage service like Amazon S3 has its own REST API, so we have to wrap its specific semantics and present it as a file system.
  • #18 Why Hadoop FileSystem? Because in the Hadoop world, we have one interface that fits all. In this picture, the upper level is the Hadoop applications and the lower level is the storage system implementations, including HDFS, WASB, S3, …
  • #19 Listing inconsistency: it can take time for newly created objects to appear in listings; there can be a lag in observing changed metadata on existing objects; and there can be a lag in observing deleted objects.
  • #21 Some real-world use cases that can be impacted by the S3 eventual consistency model: - Listing files: newly created files might not be visible to data processing. In Hive, Spark and MapReduce, this can lead to erroneous results from incomplete source data or failure to commit all intermediate results. - Extract, Transform, Load (ETL) workflows: systems like Oozie rely on marker files to trigger subsequent workflows. Any delay in the visibility of these files can delay the subsequent workflows.
  • #23 As it stands, using Amazon S3 as the direct output filesystem for queries is dangerous, as the eventually-consistent nature of the store means that invalid results may be generated. This is more likely with larger datasets and longer queries — the kind used in production, rather than development.
  • #24 Currently S3Guard uses Amazon DynamoDB as the metadata store because of its low latency, high availability and seamless scalability. More importantly, because users are already using the Amazon S3 web service, it makes perfect sense for them to use another fully-managed web service like DynamoDB instead of maintaining a secondary metadata store themselves. S3 is still the source of truth.
  • #25  BACKUP As indicated in the above figure, hadoop applications use the S3A filesystem client whose reads and writes we have already sped up. Only now, the client has had the transparent S3Guard extension enabled. Now any write operations that mutate the file system tree such as file creation and deletion will firstly go to S3 for persisting objects, after which they will update the DynamoDB metadata store accordingly. Read operations continue to return results to callers as sourced from S3 as the system of record, but those operations first check their results against the metadata in the consistent store. By unioning the results from both S3 and DynamoDB, listing operations (e.g. listStatus, listLocatedStatus and listFiles) will have latest view of the file system tree information. Overall, S3Guard enables S3 to be used as the intermediate store of queries and the direct destination of output with a consistent model.
  • #26 Meanwhile, S3Guard reduces the number of calls to S3 and helps improve the performance of listing files – a core operation in the "split calculation" at the start of queries. S3Guard provides tangible benefits when executing queries, as well as the less visible but more critical prevention of inconsistent listings. Our early performance benchmarking shows that S3Guard can cut split computation time in half on datasets involving a large number of partitions. S3Guard doesn't just deliver consistency – it delivers speed. Work is still in progress to improve performance further by making S3Guard data authoritative, which can reduce the runtime by a further large margin.
  • #27 Future work: our ongoing effort is to enhance delete consistency, to implement retry policies for dealing with disagreements between S3 and the metadata store, and to further improve list performance when the subtree of the file system is fully tracked in the metadata store. Alongside this, we are working on the next big step change: a zero-rename committer for S3 using S3Guard.