SMB-3.0 Offload Data Transfer (ODX)
(fast server-side copy)
Gordon Ross
Nexenta Systems, Inc.
Sept. 2016
What’s driving this? (Hyper-V)
• Hyper-V requirements for storage on SMB
– SMB 3 protocol level
– CA shares (or resilient handles)
– Authenticate machine accounts
• Hyper-V “nice to have”
– Offload Data Transfer (ODX)
for fast VM deployment
Background: Copy Chunks
• Traditional server-side copy uses the
“copy chunks” ioctl (etc.)
• Client-side outline: (simplified)
    open src & dst files
    ioctl FSCTL_SRV_REQUEST_RESUME_KEY on src
    if src is sparse
        ioctl FSCTL_QUERY_ALLOCATED_RANGES on src
    build list of ranges to copy
    ioctl FSCTL_SRV_COPYCHUNK on dst
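The "build list of ranges to copy" step can be sketched as below: each allocated range is split into chunks no larger than some server limit. This is only an illustration. The struct names, the helper build_chunk_list() and the max_chunk parameter are hypothetical and are not taken from the slides or from any SMB client.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One allocated (non-hole) range of the source file. Hypothetical type. */
struct alloc_range {
    uint64_t off;
    uint64_t len;
};

/* One chunk to copy; the same offset is used in src and dst. Hypothetical type. */
struct copy_chunk {
    uint64_t src_off;
    uint64_t dst_off;
    uint32_t len;
};

/*
 * Split the allocated ranges into chunks of at most max_chunk bytes.
 * Returns the number of chunks stored in out, or -1 if more than
 * max_out chunks would be needed.
 */
int
build_chunk_list(const struct alloc_range *r, size_t nr,
    uint32_t max_chunk, struct copy_chunk *out, size_t max_out)
{
    size_t n = 0;

    assert(max_chunk > 0);
    for (size_t i = 0; i < nr; i++) {
        uint64_t off = r[i].off;
        uint64_t left = r[i].len;

        while (left > 0) {
            uint32_t len = (left > max_chunk) ?
                max_chunk : (uint32_t)left;

            if (n == max_out)
                return (-1);
            out[n].src_off = off;
            out[n].dst_off = off;
            out[n].len = len;
            n++;
            off += len;
            left -= len;
        }
    }
    return ((int)n);
}
```

Holes never appear in the list, so the client has to work out the sparse layout itself. ODX moves that decision to the server.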
Why another server-side copy?
• Short answer: Allows the server to optimize.
http://technet.microsoft.com/en-us/library/hh831628.aspx
• The server is in control of:
– Representation of data being copied
(the “token” is opaque to clients)
– Length of data represented by ODX read
– Length of data transferred by ODX write
• Much more flexible than “copy chunks”.
• Allows the server to be “in control”!
• Allows “T10” level copy with RSVD
(optional, and not covered here)
How it works (client side)
• Client-side outline:
    open src_file (and get size etc)
    create dst_file (and set its size)
    while src_file not all copied
        offload_read(src_file, &token, &rsize);
        while token not all copied (rsize > 0)
            offload_write(dst_file, token, &wsize);
• Two-level loop – not obvious from spec.
• The server controls read & write stride!
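A minimal sketch of the two-level loop, run against a mock server so that it is self-contained. Everything here is an assumption made for illustration: the mock_server and mock_token types, the mock_offload_read() and mock_offload_write() functions, and the fixed strides. A real client would issue the ODX read and write ioctls instead.

```c
#include <assert.h>
#include <stdint.h>

#define MB (1024ULL * 1024ULL)

/* Hypothetical stand-in for the opaque ODX token. */
struct mock_token {
    uint64_t src_off;   /* where the represented data starts */
    uint64_t len;       /* how much data the token represents */
};

/* Hypothetical mock server: fixed strides plus request counters. */
struct mock_server {
    uint64_t read_stride;
    uint64_t write_stride;
    unsigned nreads;
    unsigned nwrites;
};

static uint64_t
min_u64(uint64_t a, uint64_t b)
{
    return (a < b ? a : b);
}

/* Returns the length the token represents; may be less than copy_len. */
uint64_t
mock_offload_read(struct mock_server *s, uint64_t off, uint64_t copy_len,
    struct mock_token *tok)
{
    s->nreads++;
    tok->src_off = off;
    tok->len = min_u64(copy_len, s->read_stride);
    return (tok->len);
}

/* Returns the length actually written; may be less than copy_len. */
uint64_t
mock_offload_write(struct mock_server *s, const struct mock_token *tok,
    uint64_t xfer_off, uint64_t copy_len)
{
    assert(xfer_off + copy_len <= tok->len);
    s->nwrites++;
    return (min_u64(copy_len, s->write_stride));
}

/* The two-level client loop: outer over the file, inner over one token. */
void
odx_copy(struct mock_server *s, uint64_t file_size)
{
    uint64_t off = 0;

    while (off < file_size) {
        struct mock_token tok;
        uint64_t rsize = mock_offload_read(s, off, file_size - off, &tok);
        uint64_t done = 0;

        if (rsize == 0)
            break;      /* no progress; stop rather than spin */
        while (done < rsize) {
            uint64_t wsize = mock_offload_write(s, &tok, done,
                rsize - done);
            if (wsize == 0)
                return;
            done += wsize;
        }
        off += rsize;
    }
}
```

With a 256 MB read stride and a 16 MB write stride, a 512 MB source takes two offload reads and 32 offload writes in this model.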
Details: ODX read
• Inputs:
– src file handle
– src file offset
– copy_length
• Outputs:
– xfer_length
– ODX “token”
• Note about ODX read out.xfer_length:
The server may return xfer_length < copy_length,
e.g. to limit the number of ranges so they fit in the token, or
to control the read “stride”.
• … what’s that “token” thing? (next)
Details: ODX “token”
• Essentially a fixed-size “union”
• The fixed part specifies the type
• One type is pre-defined for “all zeros”
• All other types are vendor defined.
• Every token must represent the data range(s)
covered by the off+len returned by ODX read.
• Server decides what the token looks like, and
(more important) the read transfer length.
Example ODX token layout
(as implemented in NexentaStor)
#define STORAGE_OFFLOAD_TOKEN_TYPE_ZERO_DATA 0xFFFF0001 /* protocol defined */
#define STORAGE_OFFLOAD_TOKEN_TYPE_NATIVE1 0x10001 /* implementation defined */
#define TOKEN_TOTAL_SIZE 512
#define TOKEN_MAX_PAYLOAD 504 /* 512 - 8 */
typedef struct smb_odx_token {
    uint32_t tok_type;      /* big-endian on the wire */
    uint16_t tok_reserved;  /* zero */
    uint16_t tok_len;       /* big-endian on the wire */
    union {
        uint8_t u_tok_other[TOKEN_MAX_PAYLOAD];
        struct tok_native1 {
            smb2fid_t tn1_fid;
            uint64_t tn1_off;
            uint64_t tn1_eof;
        } u_tok_native1;
    } tok_u;
} smb_odx_token_t;
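One way to put the 8-byte header of such a token on the wire, with the type and length fields in big-endian order as the comments above require. This is a sketch, not the NexentaStor code: token_encode(), token_decode_hdr() and the byte-order helpers are hypothetical names. The payload is treated as an opaque byte string, because its layout is private to the server.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOKEN_TOTAL_SIZE    512
#define TOKEN_HDR_SIZE      8
#define TOKEN_MAX_PAYLOAD   (TOKEN_TOTAL_SIZE - TOKEN_HDR_SIZE)

static void
put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

static void
put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)((v >> 16) & 0xff);
    p[2] = (uint8_t)((v >> 8) & 0xff);
    p[3] = (uint8_t)(v & 0xff);
}

static uint16_t
get_be16(const uint8_t *p)
{
    return ((uint16_t)((p[0] << 8) | p[1]));
}

static uint32_t
get_be32(const uint8_t *p)
{
    return (((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
        ((uint32_t)p[2] << 8) | (uint32_t)p[3]);
}

/*
 * Fill a 512-byte wire token: type (BE32), reserved (zero),
 * length (BE16), then the payload, zero padded.
 * Returns 0, or -1 if the payload does not fit.
 */
int
token_encode(uint8_t *wire, uint32_t type, const void *payload, uint16_t len)
{
    assert(wire != NULL);
    if (len > TOKEN_MAX_PAYLOAD)
        return (-1);
    memset(wire, 0, TOKEN_TOTAL_SIZE);
    put_be32(wire, type);
    put_be16(wire + 6, len);
    if (len > 0)
        memcpy(wire + TOKEN_HDR_SIZE, payload, len);
    return (0);
}

/* Read back the header; returns -1 if the length field is out of range. */
int
token_decode_hdr(const uint8_t *wire, uint32_t *type, uint16_t *len)
{
    *type = get_be32(wire);
    *len = get_be16(wire + 6);
    return ((*len <= TOKEN_MAX_PAYLOAD) ? 0 : -1);
}
```

Encoding field by field avoids depending on host byte order or on struct padding.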
Details: ODX write
• Inputs:
– dst file handle
– dst file offset
– copy_length
– xfer offset (relative to start of token)
– ODX “token”
• Outputs:
– xfer_length
• Note about ODX write out.xfer_length:
The server may return xfer_length < copy_length,
e.g. to limit the amount of I/O per ODX write request.
The client then issues more ODX write calls with the same
token until all data represented by the token is copied.
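A sketch of how a server might resolve the "xfer offset" of one ODX write against a token. It assumes the server can recover, from the token, the source offset and length of the data the token represents. The tok_range type and odx_write_resolve() are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of what a server recovers from a data token. */
struct tok_range {
    uint64_t src_off;   /* source offset where the token's data starts */
    uint64_t len;       /* bytes of data the token represents */
};

/*
 * Resolve one ODX write request against a token.
 * On success returns 0 and sets *src_off (where to read from) and
 * *xfer_len (how much this request may move). Returns -1 if the
 * transfer offset lies beyond the data the token represents.
 */
int
odx_write_resolve(const struct tok_range *tr, uint64_t xfer_off,
    uint64_t copy_len, uint64_t *src_off, uint64_t *xfer_len)
{
    uint64_t avail;

    assert(tr != NULL);
    if (xfer_off > tr->len)
        return (-1);
    avail = tr->len - xfer_off;
    *src_off = tr->src_off + xfer_off;
    *xfer_len = (copy_len < avail) ? copy_len : avail;
    return (0);
}
```

After a short write the client advances both the dst file offset and the xfer offset by the returned xfer_length and sends the same token again.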
Optimizing ODX read
Easy optimizations:
– Use the “all zeros” token when the range to be
copied is entirely within a “hole” (common with
virtual disks, e.g. with Hyper-V).
– When there’s no more data after the requested
range, set (return) the flag ..._ALL_ZERO_BEYOND
which tells the client it can stop copying here.
Note that we don’t encode ranges in ODX read.
(Just encode the “src” handle, and do “sparse copy” in write.)
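The two read-side decisions can be sketched as a pure function. It assumes the server already knows where the next data at or after the requested offset begins, and where the last data in the file ends (for example from a hole-aware seek). The function plan_odx_read(), the odx_read_result type, and the flag name and value ODX_READ_FLAG_ALL_ZERO_BEYOND are hypothetical stand-ins. The token type values are the ones from the layout slide.

```c
#include <assert.h>
#include <stdint.h>

#define STORAGE_OFFLOAD_TOKEN_TYPE_ZERO_DATA  0xFFFF0001u
#define STORAGE_OFFLOAD_TOKEN_TYPE_NATIVE1    0x10001u
/* Hypothetical name and value standing in for the real flag. */
#define ODX_READ_FLAG_ALL_ZERO_BEYOND         0x1u

struct odx_read_result {
    uint32_t tok_type;
    uint64_t xfer_len;
    uint32_t flags;
};

/*
 * Decide what an ODX read returns.
 *   off, copy_len   requested range
 *   next_data       offset of the first data at or after off
 *                   (pass a value >= off + copy_len if there is none)
 *   last_data_end   offset just past the last data in the file
 *   stride          largest range the server puts in one token
 */
void
plan_odx_read(uint64_t off, uint64_t copy_len, uint64_t next_data,
    uint64_t last_data_end, uint64_t stride, struct odx_read_result *rr)
{
    uint64_t len = (copy_len < stride) ? copy_len : stride;

    assert(next_data >= off);
    rr->xfer_len = len;
    /* Range entirely inside a hole: hand out the all-zeros token. */
    rr->tok_type = (next_data >= off + len) ?
        STORAGE_OFFLOAD_TOKEN_TYPE_ZERO_DATA :
        STORAGE_OFFLOAD_TOKEN_TYPE_NATIVE1;
    /* Nothing but zeros after this range: tell the client to stop. */
    rr->flags = (off + len >= last_data_end) ?
        ODX_READ_FLAG_ALL_ZERO_BEYOND : 0;
}
```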
Optimizing ODX write
Easy optimizations:
– When given the all-zeros token, “punch a hole”
(free the range with fcntl F_FREESP or equiv.)
– When given a “native” (data) token, just call a
common “sparse copy” function (“hole aware”)
– Limit the amount of I/O per write, to avoid
causing timeouts and to let the client
provide reasonable progress feedback.
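A sketch of the write-side dispatch on token type. The odx_ops hooks and the recording stand-ins log_punch() and log_copy() are hypothetical. A real server would call its file system's hole-punch and sparse-copy routines here. Applying the per-write limit only to data tokens is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STORAGE_OFFLOAD_TOKEN_TYPE_ZERO_DATA  0xFFFF0001u
#define STORAGE_OFFLOAD_TOKEN_TYPE_NATIVE1    0x10001u

/* Hypothetical file system hooks. */
struct odx_ops {
    int (*punch_hole)(void *ctx, uint64_t dst_off, uint64_t len);
    int (*sparse_copy)(void *ctx, uint64_t src_off, uint64_t dst_off,
        uint64_t len);
    void *ctx;
};

/* Recording stand-ins, used only to exercise the dispatcher. */
struct io_log {
    uint64_t freed;
    uint64_t copied;
};

int
log_punch(void *ctx, uint64_t dst_off, uint64_t len)
{
    struct io_log *l = ctx;

    (void)dst_off;
    l->freed += len;
    return (0);
}

int
log_copy(void *ctx, uint64_t src_off, uint64_t dst_off, uint64_t len)
{
    struct io_log *l = ctx;

    (void)src_off;
    (void)dst_off;
    l->copied += len;
    return (0);
}

/*
 * Handle one ODX write. Zero tokens free the whole range at once;
 * data tokens are limited to max_io bytes per request.
 * Returns 0 and sets *xfer_len on success, -1 for an unknown token type.
 */
int
odx_write_dispatch(const struct odx_ops *ops, uint32_t tok_type,
    uint64_t src_off, uint64_t dst_off, uint64_t len, uint64_t max_io,
    uint64_t *xfer_len)
{
    int rc;

    assert(ops != NULL && xfer_len != NULL);
    switch (tok_type) {
    case STORAGE_OFFLOAD_TOKEN_TYPE_ZERO_DATA:
        rc = ops->punch_hole(ops->ctx, dst_off, len);
        break;
    case STORAGE_OFFLOAD_TOKEN_TYPE_NATIVE1:
        if (len > max_io)
            len = max_io;
        rc = ops->sparse_copy(ops->ctx, src_off, dst_off, len);
        break;
    default:
        return (-1);
    }
    *xfer_len = (rc == 0) ? len : 0;
    return (rc);
}
```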
ODX copy scenarios
How this looks “on the wire”:
– When copying “all-zeros” we have a “stride” of
256MB (one ODX read and one ODX write)
– When copying data, the read “stride” is 256MB
and the write “stride” is 16MB. (Client does
one ODX read and then 16 ODX writes.)
[ wireshark examples ]
Testing ODX copy
• This part of the protocol is tricky to test
thanks to the amount of control left to
servers about tokens, strides etc.
• Have a simple ODX copy test for smbtorture:
https://github.com/gwr/samba/blob/gwr-m2
source4/torture/smb2/ioctl.c
Follow-up
• Where to see our code:
• https://github.com/Nexenta/illumos-nexenta
usr/src/uts/common/fs/smbsrv/smb2_fsctl_odx.c
(not published yet, but will be “soon”)
• Questions?
• Gordon Ross <gwr@nexenta.com>
