SlideShare a Scribd company logo
404 Not found
Gabor Bota
Sw Engineer at Cloudera
April 2020.
and other S3A bits in hadoop 3.3
404 Coffee - Puerto Rico
S3A and Filesystems
HADOOP client
FileSystem API
Spark Hive HBase
HDFS
...
Azure AWS S3
Google cloud
storage
S3A, S3Guard
● S3: not a real FS
● List&co consistency: S3Guard
● HBoss
Hadoop
S3A
Client
AWS
DynamoDB
AWS
S3
S3Guard no longer experimental
HADOOP-14936 Resolved:
● Rename work
● On-demand ddb tables
● Etags/versions for read inconsistency
● Metadata expiry (OOB deletions and creation)
● Authoritative (exclusive) mode for path
● Fsck
● and many other fixes, improvements.
404 (Not Found) - Bristol, UK
What’s not found?
● QE: “Input/output error while copying data from local to S3 folder”
● Issue with command: hdfs dfs -copyFromLocal
● Nightly test failure
● Intermittent failures
● Turning S3Guard on did not help
What’s not found?
# hdfs dfs -copyFromLocal /someData s3a://qe-bucket/st-hdfs-190612-085408-
s7z/user/hrt_1/
copyFromLocal: rename `s3a://qe-bucket/SF100_05.txt._COPYING_' to `s3a://qe-
bucket/SF100_05.txt': Input/output error
What’s not found?
More logging...
10:18:24,181 HEAD jar1.jar._COPYING (S3AFS.create) -> 404
10:18:24,944 HEAD jar1.jar._COPYING_ (deleteOnExit() registration) -> 404
10:18:25,733 PUT jar1.jar._COPYING_ (put file to S3A bucket) ->200
10:18:26,740 HEAD jar1.jar._COPYING_ (as out of band check at the start of
rename) -> 404
10:18:27,411 HEAD jar1.jar._COPYING_ (as preamble to COPY) -> 404
...repeated until 10:18:38,939 and final failure
What happened?
● S3 caches negative response
● It’s in the cache / load balancers for a while
● Every HEAD/GET refreshes the 404 in the cache
● How long - hours? No official answer in docs.
● Based on our tests it’s short amount (minutes)
Workaround #1
● Hotfix: increase timeout for operations. (fs.s3a.retry.limit and
fs.s3a.retry.interval)
● Overall performance impacted negatively
● Easy, config level
● Emergency workaround
● Linear waiting
● Stopped the test failing.
Workaround #2
HADOOP-16499. S3A retry policy to be exponential
● Add a non-linear policy
● Structured workaround
● Really it’s proportional
○ RetryUpToMaximumCountWithProportionalSleep
● Still one shared retry policy for everything
Fixing it
HADOOP-16490. Avoid/handle cached 404s during S3A file creation
● Better exception propagation
● Don’t do HEAD checks when creating files with overwrite=true
● Shell commands to avoid deleteOnExit calls (exists() checks)
● How would you test this?
○ Mocking - simulate getting back 404 util it’s 200.
Issues
1. Basic commit protocols (like distcp) use commit with rename in the end. This is
not good for cloud storages.
2. the -d option wasn't used in copyFromLocal, so the file is copied to a temp file
and then renamed.
3. S3Guard is off. So when rename() looks for a file and it isn't found, it doesn't
retry to see if it turns up. We use different policies for retry with S3Guard.
Recommendations
● Use -d for copyFromLocal uploads, maybe even -t for more threads on the
(slow) upload
● Use -direct for distcp for cloud storages!
● Spark: S3AStagingCommitter
● Turn S3Guard on.
Follow-up: HADOOP-16484 S3Guard: fail or warn if turned off!
404 Coffee Shop - Wisconsin, US
New in S3A Hadoop 3.3
● Delegation Token
○ AWS STS is used to request short-lived session tokens
● Support for etags and versioning
○ Credit for Cerner (Ben Roling)
● OpenSSL support via WildFly
● S3Select (cli tool, api)
● Delete: thundering herd.
○ Problem: Delete 1000 object at a time.
○ Don’t try the same request - split up to smaller page sizes.
● Rename parallelized, failure handling
○ DDB is updated even when failing partially
S3Guard FSCK
● CLI command
● hadoop s3guard fsck [-check | -internal] [-fix] (s3a://BUCKET |
s3a://PATH_PREFIX)
● Check: S3->DDB
● Internal: DDB internal consistency of DDB
● Fix: remove orphan entries
S3Guard FSCK
● It’s in hadoop 3.3
● Good starting point to get into
S3A and S3Guard
● Contributions welcome
?What’s next
● Directory marker problems: HADOOP-13230
● New stat interface: IOStatistics
● Vectored IO
○ We’d like to do it across the stack
Takeaways
● CDP QE is constantly working to find out issues before customers/users do.
● Fast turnaround for updates in upstream
○ Only a few weeks
○ We add incremental PRs instead of huge ones.
● Hadoop 3.3.0 has branched
○ We’ll be releasing soon
○ Come and help test the new release
○ It’s better to find the bugs before we ship
Q/A

More Related Content

What's hot

HBaseCon 2013: OpenTSDB at Box
HBaseCon 2013: OpenTSDB at BoxHBaseCon 2013: OpenTSDB at Box
HBaseCon 2013: OpenTSDB at Box
Cloudera, Inc.
 
Путь мониторинга 2.0 всё стало другим / Всеволод Поляков (Grammarly)
Путь мониторинга 2.0 всё стало другим / Всеволод Поляков (Grammarly)Путь мониторинга 2.0 всё стало другим / Всеволод Поляков (Grammarly)
Путь мониторинга 2.0 всё стало другим / Всеволод Поляков (Grammarly)
Ontico
 
Building Spark as Service in Cloud
Building Spark as Service in CloudBuilding Spark as Service in Cloud
Building Spark as Service in Cloud
InMobi Technology
 
Keynote: Apache HBase at Yahoo! Scale
Keynote: Apache HBase at Yahoo! ScaleKeynote: Apache HBase at Yahoo! Scale
Keynote: Apache HBase at Yahoo! Scale
HBaseCon
 
Introduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLab
CloudxLab
 
Developing High Performance Application with Aerospike & Go
Developing High Performance Application with Aerospike & GoDeveloping High Performance Application with Aerospike & Go
Developing High Performance Application with Aerospike & Go
Chris Stivers
 
Full Text Search in PostgreSQL
Full Text Search in PostgreSQLFull Text Search in PostgreSQL
Full Text Search in PostgreSQL
Aleksander Alekseev
 
ELK: Moose-ively scaling your log system
ELK: Moose-ively scaling your log systemELK: Moose-ively scaling your log system
ELK: Moose-ively scaling your log system
Avleen Vig
 
Tuning tips for Apache Spark Jobs
Tuning tips for Apache Spark JobsTuning tips for Apache Spark Jobs
Tuning tips for Apache Spark Jobs
Samir Bessalah
 
Let's Compare: A Benchmark review of InfluxDB and Elasticsearch
Let's Compare: A Benchmark review of InfluxDB and ElasticsearchLet's Compare: A Benchmark review of InfluxDB and Elasticsearch
Let's Compare: A Benchmark review of InfluxDB and Elasticsearch
InfluxData
 
Streaming huge databases using logical decoding
Streaming huge databases using logical decodingStreaming huge databases using logical decoding
Streaming huge databases using logical decoding
Alexander Shulgin
 
PGConf APAC 2018 - High performance json postgre-sql vs. mongodb
PGConf APAC 2018 - High performance json  postgre-sql vs. mongodbPGConf APAC 2018 - High performance json  postgre-sql vs. mongodb
PGConf APAC 2018 - High performance json postgre-sql vs. mongodb
PGConf APAC
 
PGConf APAC 2018 - Tale from Trenches
PGConf APAC 2018 - Tale from TrenchesPGConf APAC 2018 - Tale from Trenches
PGConf APAC 2018 - Tale from Trenches
PGConf APAC
 
pgday.seoul 2019: TimescaleDB
pgday.seoul 2019: TimescaleDBpgday.seoul 2019: TimescaleDB
pgday.seoul 2019: TimescaleDB
Chan Shik Lim
 
OpenTSDB: HBaseCon2017
OpenTSDB: HBaseCon2017OpenTSDB: HBaseCon2017
OpenTSDB: HBaseCon2017
HBaseCon
 
Мониторинг. Опять, rootconf 2016
Мониторинг. Опять, rootconf 2016Мониторинг. Опять, rootconf 2016
Мониторинг. Опять, rootconf 2016
Vsevolod Polyakov
 
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
Wei Shan Ang
 
SPARQLstream and Morph-streams
SPARQLstream and Morph-streamsSPARQLstream and Morph-streams
SPARQLstream and Morph-streams
Jean-Paul Calbimonte
 
Metrics: where and how
Metrics: where and howMetrics: where and how
Metrics: where and how
Vsevolod Polyakov
 
Out of the box replication in postgres 9.4(pg confus)
Out of the box replication in postgres 9.4(pg confus)Out of the box replication in postgres 9.4(pg confus)
Out of the box replication in postgres 9.4(pg confus)
Denish Patel
 

What's hot (20)

HBaseCon 2013: OpenTSDB at Box
HBaseCon 2013: OpenTSDB at BoxHBaseCon 2013: OpenTSDB at Box
HBaseCon 2013: OpenTSDB at Box
 
Путь мониторинга 2.0 всё стало другим / Всеволод Поляков (Grammarly)
Путь мониторинга 2.0 всё стало другим / Всеволод Поляков (Grammarly)Путь мониторинга 2.0 всё стало другим / Всеволод Поляков (Grammarly)
Путь мониторинга 2.0 всё стало другим / Всеволод Поляков (Grammarly)
 
Building Spark as Service in Cloud
Building Spark as Service in CloudBuilding Spark as Service in Cloud
Building Spark as Service in Cloud
 
Keynote: Apache HBase at Yahoo! Scale
Keynote: Apache HBase at Yahoo! ScaleKeynote: Apache HBase at Yahoo! Scale
Keynote: Apache HBase at Yahoo! Scale
 
Introduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLab
 
Developing High Performance Application with Aerospike & Go
Developing High Performance Application with Aerospike & GoDeveloping High Performance Application with Aerospike & Go
Developing High Performance Application with Aerospike & Go
 
Full Text Search in PostgreSQL
Full Text Search in PostgreSQLFull Text Search in PostgreSQL
Full Text Search in PostgreSQL
 
ELK: Moose-ively scaling your log system
ELK: Moose-ively scaling your log systemELK: Moose-ively scaling your log system
ELK: Moose-ively scaling your log system
 
Tuning tips for Apache Spark Jobs
Tuning tips for Apache Spark JobsTuning tips for Apache Spark Jobs
Tuning tips for Apache Spark Jobs
 
Let's Compare: A Benchmark review of InfluxDB and Elasticsearch
Let's Compare: A Benchmark review of InfluxDB and ElasticsearchLet's Compare: A Benchmark review of InfluxDB and Elasticsearch
Let's Compare: A Benchmark review of InfluxDB and Elasticsearch
 
Streaming huge databases using logical decoding
Streaming huge databases using logical decodingStreaming huge databases using logical decoding
Streaming huge databases using logical decoding
 
PGConf APAC 2018 - High performance json postgre-sql vs. mongodb
PGConf APAC 2018 - High performance json  postgre-sql vs. mongodbPGConf APAC 2018 - High performance json  postgre-sql vs. mongodb
PGConf APAC 2018 - High performance json postgre-sql vs. mongodb
 
PGConf APAC 2018 - Tale from Trenches
PGConf APAC 2018 - Tale from TrenchesPGConf APAC 2018 - Tale from Trenches
PGConf APAC 2018 - Tale from Trenches
 
pgday.seoul 2019: TimescaleDB
pgday.seoul 2019: TimescaleDBpgday.seoul 2019: TimescaleDB
pgday.seoul 2019: TimescaleDB
 
OpenTSDB: HBaseCon2017
OpenTSDB: HBaseCon2017OpenTSDB: HBaseCon2017
OpenTSDB: HBaseCon2017
 
Мониторинг. Опять, rootconf 2016
Мониторинг. Опять, rootconf 2016Мониторинг. Опять, rootconf 2016
Мониторинг. Опять, rootconf 2016
 
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
 
SPARQLstream and Morph-streams
SPARQLstream and Morph-streamsSPARQLstream and Morph-streams
SPARQLstream and Morph-streams
 
Metrics: where and how
Metrics: where and howMetrics: where and how
Metrics: where and how
 
Out of the box replication in postgres 9.4(pg confus)
Out of the box replication in postgres 9.4(pg confus)Out of the box replication in postgres 9.4(pg confus)
Out of the box replication in postgres 9.4(pg confus)
 

Similar to Hadoop 3.3.0 S3A community call 202004

From HDFS to S3: Migrate Pinterest Apache Spark Clusters
From HDFS to S3: Migrate Pinterest Apache Spark ClustersFrom HDFS to S3: Migrate Pinterest Apache Spark Clusters
From HDFS to S3: Migrate Pinterest Apache Spark Clusters
Databricks
 
Leveraging Databricks for Spark Pipelines
Leveraging Databricks for Spark PipelinesLeveraging Databricks for Spark Pipelines
Leveraging Databricks for Spark Pipelines
Rose Toomey
 
Leveraging Databricks for Spark pipelines
Leveraging Databricks for Spark pipelinesLeveraging Databricks for Spark pipelines
Leveraging Databricks for Spark pipelines
Rose Toomey
 
Plain english guide to drupal 8 criticals
Plain english guide to drupal 8 criticalsPlain english guide to drupal 8 criticals
Plain english guide to drupal 8 criticals
Angela Byron
 
Lcna tutorial-2012
Lcna tutorial-2012Lcna tutorial-2012
Lcna tutorial-2012
Gluster.org
 
Lcna 2012-tutorial
Lcna 2012-tutorialLcna 2012-tutorial
Lcna 2012-tutorial
Gluster.org
 
Top 10 Perl Performance Tips
Top 10 Perl Performance TipsTop 10 Perl Performance Tips
Top 10 Perl Performance Tips
Perrin Harkins
 
Transactional writes to cloud storage with Eric Liang
Transactional writes to cloud storage with Eric LiangTransactional writes to cloud storage with Eric Liang
Transactional writes to cloud storage with Eric Liang
Databricks
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
javier ramirez
 
What no one tells you about writing a streaming app
What no one tells you about writing a streaming appWhat no one tells you about writing a streaming app
What no one tells you about writing a streaming app
hadooparchbook
 
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
Spark Summit
 
Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...
Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...
Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...
Databricks
 
Clug 2012 March web server optimisation
Clug 2012 March   web server optimisationClug 2012 March   web server optimisation
Clug 2012 March web server optimisation
grooverdan
 
Tracing the Breadcrumbs: Apache Spark Workload Diagnostics
Tracing the Breadcrumbs: Apache Spark Workload DiagnosticsTracing the Breadcrumbs: Apache Spark Workload Diagnostics
Tracing the Breadcrumbs: Apache Spark Workload Diagnostics
Databricks
 
Scaling ELK Stack - DevOpsDays Singapore
Scaling ELK Stack - DevOpsDays SingaporeScaling ELK Stack - DevOpsDays Singapore
Scaling ELK Stack - DevOpsDays Singapore
Angad Singh
 
Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017
Junping Du
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community Update
DataWorks Summit
 
The Accidental DBA
The Accidental DBAThe Accidental DBA
The Accidental DBA
PostgreSQL Experts, Inc.
 
Introduction to Apache Airflow
Introduction to Apache AirflowIntroduction to Apache Airflow
Introduction to Apache Airflow
mutt_data
 
Hadoop Vectored IO
Hadoop Vectored IOHadoop Vectored IO
Hadoop Vectored IO
Steve Loughran
 

Similar to Hadoop 3.3.0 S3A community call 202004 (20)

From HDFS to S3: Migrate Pinterest Apache Spark Clusters
From HDFS to S3: Migrate Pinterest Apache Spark ClustersFrom HDFS to S3: Migrate Pinterest Apache Spark Clusters
From HDFS to S3: Migrate Pinterest Apache Spark Clusters
 
Leveraging Databricks for Spark Pipelines
Leveraging Databricks for Spark PipelinesLeveraging Databricks for Spark Pipelines
Leveraging Databricks for Spark Pipelines
 
Leveraging Databricks for Spark pipelines
Leveraging Databricks for Spark pipelinesLeveraging Databricks for Spark pipelines
Leveraging Databricks for Spark pipelines
 
Plain english guide to drupal 8 criticals
Plain english guide to drupal 8 criticalsPlain english guide to drupal 8 criticals
Plain english guide to drupal 8 criticals
 
Lcna tutorial-2012
Lcna tutorial-2012Lcna tutorial-2012
Lcna tutorial-2012
 
Lcna 2012-tutorial
Lcna 2012-tutorialLcna 2012-tutorial
Lcna 2012-tutorial
 
Top 10 Perl Performance Tips
Top 10 Perl Performance TipsTop 10 Perl Performance Tips
Top 10 Perl Performance Tips
 
Transactional writes to cloud storage with Eric Liang
Transactional writes to cloud storage with Eric LiangTransactional writes to cloud storage with Eric Liang
Transactional writes to cloud storage with Eric Liang
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
 
What no one tells you about writing a streaming app
What no one tells you about writing a streaming appWhat no one tells you about writing a streaming app
What no one tells you about writing a streaming app
 
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
 
Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...
Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...
Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...
 
Clug 2012 March web server optimisation
Clug 2012 March   web server optimisationClug 2012 March   web server optimisation
Clug 2012 March web server optimisation
 
Tracing the Breadcrumbs: Apache Spark Workload Diagnostics
Tracing the Breadcrumbs: Apache Spark Workload DiagnosticsTracing the Breadcrumbs: Apache Spark Workload Diagnostics
Tracing the Breadcrumbs: Apache Spark Workload Diagnostics
 
Scaling ELK Stack - DevOpsDays Singapore
Scaling ELK Stack - DevOpsDays SingaporeScaling ELK Stack - DevOpsDays Singapore
Scaling ELK Stack - DevOpsDays Singapore
 
Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community Update
 
The Accidental DBA
The Accidental DBAThe Accidental DBA
The Accidental DBA
 
Introduction to Apache Airflow
Introduction to Apache AirflowIntroduction to Apache Airflow
Introduction to Apache Airflow
 
Hadoop Vectored IO
Hadoop Vectored IOHadoop Vectored IO
Hadoop Vectored IO
 

Recently uploaded

All you need to know about Spring Boot and GraalVM
All you need to know about Spring Boot and GraalVMAll you need to know about Spring Boot and GraalVM
All you need to know about Spring Boot and GraalVM
Alina Yurenko
 
SMS API Integration in Saudi Arabia| Best SMS API Service
SMS API Integration in Saudi Arabia| Best SMS API ServiceSMS API Integration in Saudi Arabia| Best SMS API Service
SMS API Integration in Saudi Arabia| Best SMS API Service
Yara Milbes
 
E-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet DynamicsE-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet Dynamics
Hornet Dynamics
 
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
kalichargn70th171
 
Odoo ERP Vs. Traditional ERP Systems – A Comparative Analysis
Odoo ERP Vs. Traditional ERP Systems – A Comparative AnalysisOdoo ERP Vs. Traditional ERP Systems – A Comparative Analysis
Odoo ERP Vs. Traditional ERP Systems – A Comparative Analysis
Envertis Software Solutions
 
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CDKuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
rodomar2
 
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
Bert Jan Schrijver
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
Ayan Halder
 
socradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdfsocradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdf
SOCRadar
 
Oracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptxOracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptx
Remote DBA Services
 
What next after learning python programming basics
What next after learning python programming basicsWhat next after learning python programming basics
What next after learning python programming basics
Rakesh Kumar R
 
Requirement Traceability in Xen Functional Safety
Requirement Traceability in Xen Functional SafetyRequirement Traceability in Xen Functional Safety
Requirement Traceability in Xen Functional Safety
Ayan Halder
 
WWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders AustinWWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders Austin
Patrick Weigel
 
Modelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - AmsterdamModelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - Amsterdam
Alberto Brandolini
 
How to write a program in any programming language
How to write a program in any programming languageHow to write a program in any programming language
How to write a program in any programming language
Rakesh Kumar R
 
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
XfilesPro
 
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Julian Hyde
 
Top 9 Trends in Cybersecurity for 2024.pptx
Top 9 Trends in Cybersecurity for 2024.pptxTop 9 Trends in Cybersecurity for 2024.pptx
Top 9 Trends in Cybersecurity for 2024.pptx
devvsandy
 
GreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-JurisicGreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-Jurisic
Green Software Development
 
How Can Hiring A Mobile App Development Company Help Your Business Grow?
How Can Hiring A Mobile App Development Company Help Your Business Grow?How Can Hiring A Mobile App Development Company Help Your Business Grow?
How Can Hiring A Mobile App Development Company Help Your Business Grow?
ToXSL Technologies
 

Recently uploaded (20)

All you need to know about Spring Boot and GraalVM
All you need to know about Spring Boot and GraalVMAll you need to know about Spring Boot and GraalVM
All you need to know about Spring Boot and GraalVM
 
SMS API Integration in Saudi Arabia| Best SMS API Service
SMS API Integration in Saudi Arabia| Best SMS API ServiceSMS API Integration in Saudi Arabia| Best SMS API Service
SMS API Integration in Saudi Arabia| Best SMS API Service
 
E-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet DynamicsE-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet Dynamics
 
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
 
Odoo ERP Vs. Traditional ERP Systems – A Comparative Analysis
Odoo ERP Vs. Traditional ERP Systems – A Comparative AnalysisOdoo ERP Vs. Traditional ERP Systems – A Comparative Analysis
Odoo ERP Vs. Traditional ERP Systems – A Comparative Analysis
 
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CDKuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
 
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
 
socradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdfsocradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdf
 
Oracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptxOracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptx
 
What next after learning python programming basics
What next after learning python programming basicsWhat next after learning python programming basics
What next after learning python programming basics
 
Requirement Traceability in Xen Functional Safety
Requirement Traceability in Xen Functional SafetyRequirement Traceability in Xen Functional Safety
Requirement Traceability in Xen Functional Safety
 
WWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders AustinWWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders Austin
 
Modelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - AmsterdamModelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - Amsterdam
 
How to write a program in any programming language
How to write a program in any programming languageHow to write a program in any programming language
How to write a program in any programming language
 
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
 
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)
 
Top 9 Trends in Cybersecurity for 2024.pptx
Top 9 Trends in Cybersecurity for 2024.pptxTop 9 Trends in Cybersecurity for 2024.pptx
Top 9 Trends in Cybersecurity for 2024.pptx
 
GreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-JurisicGreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-Jurisic
 
How Can Hiring A Mobile App Development Company Help Your Business Grow?
How Can Hiring A Mobile App Development Company Help Your Business Grow?How Can Hiring A Mobile App Development Company Help Your Business Grow?
How Can Hiring A Mobile App Development Company Help Your Business Grow?
 

Hadoop 3.3.0 S3A community call 202004

  • 1. 404 Not found Gabor Bota Sw Engineer at Cloudera April 2020. and other S3A bits in hadoop 3.3
  • 2. 404 Coffee - Puerto Rico
  • 3. S3A and Filesystems HADOOP client FileSystem API Spark Hive HBase HDFS ... Azure AWS S3 Google cloud storage
  • 4. S3A, S3Guard ● S3: not a real FS ● List&co consistency: S3Guard ● HBoss Hadoop S3A Client AWS DynamoDB AWS S3
  • 5.
  • 6. S3Guard no longer experimental HADOOP-14936 Resolved: ● Rename work ● On-demand ddb tables ● Etags/versions for read inconsistency ● Metadata expiry (OOB deletions and creation) ● Authoritative (exclusive) mode for path ● Fsck ● and many other fixes, improvements.
  • 7. 404 (Not Found) - Bristol, UK
  • 8. What’s not found? ● QE: “Input/output error while copying data from local to S3 folder” ● Issue with command: hdfs dfs -copyFromLocal ● Nightly test failure ● Intermittent failures ● Turning S3Guard on did not help
  • 9. What’s not found? # hdfs dfs -copyFromLocal /someData s3a://qe-bucket/st-hdfs-190612-085408- s7z/user/hrt_1/ copyFromLocal: rename `s3a://qe-bucket/SF100_05.txt._COPYING_' to `s3a://qe- bucket/SF100_05.txt': Input/output error
  • 10. What’s not found? More logging... 10:18:24,181 HEAD jar1.jar._COPYING (S3AFS.create) -> 404 10:18:24,944 HEAD jar1.jar._COPYING_ (deleteOnExit() registration) -> 404 10:18:25,733 PUT jar1.jar._COPYING_ (put file to S3A bucket) ->200 10:18:26,740 HEAD jar1.jar._COPYING_ (as out of band check at the start of rename) -> 404 10:18:27,411 HEAD jar1.jar._COPYING_ (as preamble to COPY) -> 404 ...repeated until 10:18:38,939 and final failure
  • 11. What happened? ● S3 caches negative response ● It’s in the cache / load balancers for a while ● Every HEAD/GET refreshes the 404 in the cache ● How long - hours? No official answer in docs. ● Based on our tests it’s short amount (minutes)
  • 12. Workaround #1 ● Hotfix: increase timeout for operations. (fs.s3a.retry.limit and fs.s3a.retry.interval) ● Overall performance impacted negatively ● Easy, config level ● Emergency workaround ● Linear waiting ● Stopped the test failing.
  • 13. Workaround #2 HADOOP-16499. S3A retry policy to be exponential ● Add a non-linear policy ● Structured workaround ● Really it’s proportional ○ RetryUpToMaximumCountWithProportionalSleep ● Still one shared retry policy for everything
  • 14. Fixing it HADOOP-16490. Avoid/handle cached 404s during S3A file creation ● Better exception propagation ● Don’t do HEAD checks when creating files with overwrite=true ● Shell commands to avoid deleteOnExit calls (exists() checks) ● How would you test this? ○ Mocking - simulate getting back 404 util it’s 200.
  • 15. Issues 1. Basic commit protocols (like distcp) use commit with rename in the end. This is not good for cloud storages. 2. the -d option wasn't used in copyFromLocal, so the file is copied to a temp file and then renamed. 3. S3Guard is off. So when rename() looks for a file and it isn't found, it doesn't retry to see if it turns up. We use different policies for retry with S3Guard.
  • 16. Recommendations ● Use -d for copyFromLocal uploads, maybe even -t for more threads on the (slow) upload ● Use -direct for distcp for cloud storages! ● Spark: S3AStagingCommitter ● Turn S3Guard on. Follow-up: HADOOP-16484 S3Guard: fail or warn if turned off!
  • 17. 404 Coffee Shop - Wisconsin, US
  • 18. New in S3A Hadoop 3.3 ● Delegation Token ○ AWS STS is used to request short-lived session tokens ● Support for etags and versioning ○ Credit for Cerner (Ben Roling) ● OpenSSL support via WildFly ● S3Select (cli tool, api) ● Delete: thundering herd. ○ Problem: Delete 1000 object at a time. ○ Don’t try the same request - split up to smaller page sizes. ● Rename parallelized, failure handling ○ DDB is updated even when failing partially
  • 19. S3Guard FSCK ● CLI command ● hadoop s3guard fsck [-check | -internal] [-fix] (s3a://BUCKET | s3a://PATH_PREFIX) ● Check: S3->DDB ● Internal: DDB internal consistency of DDB ● Fix: remove orphan entries
  • 20. S3Guard FSCK ● It’s in hadoop 3.3 ● Good starting point to get into S3A and S3Guard ● Contributions welcome
  • 21. ?What’s next ● Directory marker problems: HADOOP-13230 ● New stat interface: IOStatistics ● Vectored IO ○ We’d like to do it across the stack
  • 22. Takeaways ● CDP QE is constantly working to find out issues before customers/users do. ● Fast turnaround for updates in upstream ○ Only a few weeks ○ We add incremental PRs instead of huge ones. ● Hadoop 3.3.0 has branched ○ We’ll be releasing soon ○ Come and help test the new release ○ It’s better to find the bugs before we ship
  • 23. Q/A