SMB-3.0 Offload Data Transfer (ODX)
(fast server-side copy)
Gordon Ross
Nexenta Systems, Inc.
Sept. 2016
What’s driving this? (Hyper-V)
• Hyper-V requirements for storage on SMB
– SMB 3 protocol level
– CA shares (or resilient handles)
– Authenticate machine accounts
• Hyper-V “nice to have”
– Offload Data Transfer (ODX)
for fast VM deployment
Background: Copy Chunks
• Traditional server-side copy uses the
“copy chunks” ioctl (etc.)
• Client-side outline: (simplified)
    Open src & dst files
    ioctl FSCTL_SRV_REQUEST_RESUME_KEY on src
    if src is sparse
        ioctl FSCTL_QUERY_ALLOCATED_RANGES on src
    build list of ranges to copy
    ioctl FSCTL_SRV_COPYCHUNK on dst
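The "build list of ranges to copy" step can be sketched in C roughly as follows. This is an illustrative sketch, not code from the talk: the helper name `build_chunks`, the `alloc_range` type, and the 1 MiB per-chunk cap are assumptions, and the `copychunk` fields only mirror the shape of the SRV_COPYCHUNK element described in MS-SMB2.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed per-chunk cap, for illustration; real servers enforce their own. */
#define MAX_CHUNK_LEN	(1024u * 1024u)

/* One allocated range, as FSCTL_QUERY_ALLOCATED_RANGES would report it. */
struct alloc_range {
	uint64_t off;
	uint64_t len;
};

/* Field shape modeled on the SRV_COPYCHUNK element (hypothetical names). */
struct copychunk {
	uint64_t src_off;
	uint64_t dst_off;
	uint32_t length;
	uint32_t reserved;
};

/*
 * Split each allocated range into chunks of at most MAX_CHUNK_LEN bytes.
 * Source and destination offsets are identical (a plain file copy).
 * Returns the number of chunks written to out, never more than max_out.
 */
size_t
build_chunks(const struct alloc_range *r, size_t nranges,
    struct copychunk *out, size_t max_out)
{
	size_t n = 0;

	for (size_t i = 0; i < nranges; i++) {
		uint64_t off = r[i].off;
		uint64_t left = r[i].len;

		while (left > 0 && n < max_out) {
			uint32_t len = (left > MAX_CHUNK_LEN) ?
			    MAX_CHUNK_LEN : (uint32_t)left;

			out[n].src_off = off;
			out[n].dst_off = off;
			out[n].length = len;
			out[n].reserved = 0;
			n++;
			off += len;
			left -= len;
		}
	}
	return (n);
}
```

The client decides all chunk boundaries here, which is the main contrast with ODX, where the server picks the transfer lengths.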
Why another server-side copy?
• Short answer: Allows the server to optimize.
http://technet.microsoft.com/en-us/library/hh831628.aspx
• The server is in control of:
– Representation of data being copied
(the “token” is opaque to clients)
– Length of data represented by ODX read
– Length of data transferred by ODX write
• Much more flexible than “copy chunks”.
• Allows the server to be “in control”!
• Allows “T10” level copy with RSVD
(optional, and not covered here)
How it works (client side)
• Client-side outline:
    open src_file (and get size etc.)
    create dst_file (and set its size)
    while src_file not all copied
        offload_read(src_file, &token, &rsize);
        while token not all copied (rsize > 0)
            offload_write(dst_file, token, &wsize);
• Two-level loop – not obvious from spec.
• The server controls read & write stride!
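The two-level loop can be made concrete with a small C simulation. This is a sketch under stated assumptions: `offload_read` and `offload_write` here are local stand-ins for the real ioctls, the token contents are invented, and the 256MB and 16MB strides are just the values quoted later in this deck.

```c
#include <stdint.h>

#define READ_STRIDE	(256ull << 20)	/* assumed server read stride */
#define WRITE_STRIDE	(16ull << 20)	/* assumed server write stride */

/* Stand-in token; a real client treats the token as opaque bytes. */
struct odx_token {
	uint64_t src_off;
	uint64_t len;
};

struct odx_stats {
	unsigned reads;
	unsigned writes;
};

/* Stand-in for the ODX read ioctl: the "server" picks the length. */
static uint64_t
offload_read(uint64_t src_off, uint64_t copy_len, struct odx_token *tok)
{
	uint64_t xfer = (copy_len < READ_STRIDE) ? copy_len : READ_STRIDE;

	tok->src_off = src_off;
	tok->len = xfer;
	return (xfer);
}

/* Stand-in for the ODX write ioctl: the "server" limits I/O per call. */
static uint64_t
offload_write(const struct odx_token *tok, uint64_t xfer_off,
    uint64_t copy_len)
{
	uint64_t avail = tok->len - xfer_off;
	uint64_t want = (copy_len < avail) ? copy_len : avail;

	return ((want < WRITE_STRIDE) ? want : WRITE_STRIDE);
}

/* Client-side copy loop; returns the number of bytes copied. */
uint64_t
odx_copy(uint64_t file_size, struct odx_stats *st)
{
	uint64_t done = 0;

	while (done < file_size) {		/* outer loop: one token each */
		struct odx_token tok;
		uint64_t rsize = offload_read(done, file_size - done, &tok);

		st->reads++;
		if (rsize == 0)
			break;

		uint64_t tok_off = 0;
		while (tok_off < rsize) {	/* inner loop: drain the token */
			uint64_t wsize = offload_write(&tok, tok_off,
			    rsize - tok_off);

			st->writes++;
			if (wsize == 0)
				return (done + tok_off);
			tok_off += wsize;
		}
		done += rsize;
	}
	return (done);
}
```

With these strides, a 600MB file takes 3 reads (256 + 256 + 88 MB) and 38 writes (16 + 16 + 6).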
Details: ODX read
• Inputs:
– src file handle
– src file offset
– copy_length
• Outputs:
– xfer_length
– ODX “token”
• Note about ODX read out.xfer_length:
The server may return xfer_length < copy_length,
e.g. to limit the number of ranges to fit in the token, or
to control the read “stride”.
• … what’s that “token” thing? (next)
Details: ODX “token”
• Essentially a fixed-size “union”
• The fixed part specifies the type
• One type is pre-defined for “all zeros”
• All other types are vendor defined.
• Every token must represent the data range(s)
covered by the off+len returned by ODX read.
• Server decides what the token looks like, and
(more important) the read transfer length.
Example ODX token layout
(as implemented in NexentaStor)
#define STORAGE_OFFLOAD_TOKEN_TYPE_ZERO_DATA	0xFFFF0001	/* protocol defined */
#define STORAGE_OFFLOAD_TOKEN_TYPE_NATIVE1	0x10001		/* implementation defined */
#define TOKEN_TOTAL_SIZE	512
#define TOKEN_MAX_PAYLOAD	504	/* 512 - 8 */

typedef struct smb_odx_token {
	uint32_t	tok_type;	/* big-endian on the wire */
	uint16_t	tok_reserved;	/* zero */
	uint16_t	tok_len;	/* big-endian on the wire */
	union {
		uint8_t u_tok_other[TOKEN_MAX_PAYLOAD];
		struct tok_native1 {
			smb2fid_t	tn1_fid;
			uint64_t	tn1_off;
			uint64_t	tn1_eof;
		} u_tok_native1;
	} tok_u;
} smb_odx_token_t;
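The "big-endian on the wire" comments can be illustrated with a small encoder for the 8-byte fixed header. This is a sketch, not the NexentaStor code: the helper names `odx_token_encode` and `odx_token_type` are hypothetical, the payload is treated as opaque bytes, and `tok_len` is assumed to carry the payload length.

```c
#include <stdint.h>
#include <string.h>

#define TOKEN_TOTAL_SIZE	512
#define TOKEN_MAX_PAYLOAD	504	/* 512 - 8 */

/* Store a 32-bit value in big-endian byte order. */
static void
put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Store a 16-bit value in big-endian byte order. */
static void
put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)v;
}

/*
 * Build a fixed-size token: type (4 bytes), reserved (2 bytes, zero),
 * length (2 bytes), then the payload, zero-padded to 512 bytes.
 * Returns 0 on success, -1 if the payload does not fit.
 */
int
odx_token_encode(uint8_t wire[TOKEN_TOTAL_SIZE], uint32_t type,
    const void *payload, uint16_t paylen)
{
	if (paylen > TOKEN_MAX_PAYLOAD)
		return (-1);

	memset(wire, 0, TOKEN_TOTAL_SIZE);
	put_be32(wire, type);
	put_be16(wire + 6, paylen);
	if (paylen > 0)
		memcpy(wire + 8, payload, paylen);
	return (0);
}

/* Read back the token type from the wire form. */
uint32_t
odx_token_type(const uint8_t wire[TOKEN_TOTAL_SIZE])
{
	return (((uint32_t)wire[0] << 24) | ((uint32_t)wire[1] << 16) |
	    ((uint32_t)wire[2] << 8) | (uint32_t)wire[3]);
}
```

Only the header layout matters to clients; everything after byte 8 of a vendor-defined token is private to the server that issued it.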
Details: ODX write
• Inputs:
– dst file handle
– dst file offset
– copy_length
– xfer offset (relative to start of token)
– ODX “token”
• Outputs:
– xfer_length
• Note about ODX write out.xfer_length:
The server may return xfer_length < copy_length,
e.g. to limit the amount of I/O per ODX write request.
The client will issue more ODX write calls with the same
token until all data represented by the token is copied.
Optimizing ODX read
Easy optimizations:
– Use the “all zeros” token when the range to be
copied is entirely within a “hole” (common with
virtual disks, e.g. with Hyper-V).
– When there’s no more data after the requested
range, set (return) the flag ..._ALL_ZERO_BEYOND
which tells the client it can stop copying here.
Note that we don’t encode ranges in ODX read.
(Just encode the “src” handle, and do “sparse copy” in write.)
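The read-side decisions above can be sketched as a pure function over a list of allocated extents. This is an illustration only: `odx_read_plan`, the `extent` list, and the `read_result` struct are hypothetical names, a real server would query its file system for holes rather than take an array, and the 256MB stride is the value quoted in this deck.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOK_ZERO	0xFFFF0001u	/* protocol-defined "all zeros" type */
#define TOK_NATIVE1	0x10001u	/* implementation-defined data type */
#define READ_STRIDE	(256ull << 20)	/* assumed read stride */

/* An allocated (non-hole) region of the source file. */
struct extent {
	uint64_t off;
	uint64_t len;
};

struct read_result {
	uint32_t tok_type;	/* which kind of token to return */
	uint64_t xfer_len;	/* may be less than the requested length */
	bool all_zero_beyond;	/* no data after the returned range */
};

/*
 * Decide what an ODX read should return for [off, off + copy_len).
 * Extents may be in any order; eof is the source file size.
 */
struct read_result
odx_read_plan(const struct extent *ext, size_t n, uint64_t eof,
    uint64_t off, uint64_t copy_len)
{
	struct read_result r = { TOK_ZERO, 0, true };

	if (off >= eof)
		return (r);

	uint64_t len = copy_len;
	if (len > eof - off)
		len = eof - off;	/* do not read past end of file */
	if (len > READ_STRIDE)
		len = READ_STRIDE;	/* server-chosen stride */
	r.xfer_len = len;

	uint64_t end = off + len;
	for (size_t i = 0; i < n; i++) {
		uint64_t e_end = ext[i].off + ext[i].len;

		if (ext[i].off < end && e_end > off)
			r.tok_type = TOK_NATIVE1;	/* data in range */
		if (e_end > end)
			r.all_zero_beyond = false;	/* data after range */
	}
	return (r);
}
```

Note that the extents never go into the token in this scheme, matching the remark above that ranges are not encoded in ODX read.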
Optimizing ODX write
Easy optimizations:
– When given the all-zeros token, “punch a hole”
(free the range with fcntl F_FREESP or equiv.)
– When given a “native” (data) token, just call a
common “sparse copy” function (“hole aware”)
– Limit the amount of I/O per write, to avoid
causing timeouts and to allow the client to
provide reasonable progress feedback.
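A write handler following these three points might be shaped like the sketch below. It is a simulation under loud assumptions: files are in-memory buffers (`mem_file`), `memset` stands in for punching a hole (F_FREESP or equivalent), `memcpy` stands in for the hole-aware sparse copy, and `WRITE_MAX` is a tiny stand-in for the real per-write I/O limit. None of these names come from the NexentaStor code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOK_ZERO	0xFFFF0001u	/* protocol-defined "all zeros" type */
#define TOK_NATIVE1	0x10001u	/* implementation-defined data type */
#define WRITE_MAX	4096u		/* stand-in for the write stride */

/* In-memory stand-in for an open file. */
struct mem_file {
	uint8_t *data;
	uint64_t size;
};

/* Decoded token; assumed to have been validated when it was issued. */
struct tok {
	uint32_t type;
	const struct mem_file *src;	/* unused for TOK_ZERO */
	uint64_t src_off;
	uint64_t len;
};

/*
 * Handle one ODX write. Returns the number of bytes transferred, which
 * may be less than copy_len; the client then calls again with a larger
 * xfer_off until the token is fully consumed.
 */
uint64_t
odx_write(struct mem_file *dst, uint64_t dst_off, uint64_t copy_len,
    uint64_t xfer_off, const struct tok *t)
{
	if (xfer_off >= t->len || dst_off >= dst->size)
		return (0);

	uint64_t n = t->len - xfer_off;
	if (n > copy_len)
		n = copy_len;
	if (n > dst->size - dst_off)
		n = dst->size - dst_off;

	if (t->type == TOK_ZERO) {
		/* Cheap: a real server would free the range instead. */
		memset(dst->data + dst_off, 0, (size_t)n);
		return (n);
	}

	/* Data token: bound the I/O done by a single request. */
	if (n > WRITE_MAX)
		n = WRITE_MAX;
	memcpy(dst->data + dst_off, t->src->data + t->src_off + xfer_off,
	    (size_t)n);
	return (n);
}
```

The zero-token path is not clamped because it moves no data, which fits the one-read, one-write pattern described for all-zero ranges.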
ODX copy scenarios
How this looks “on the wire”:
– When copying “all-zeros” we have a “stride” of
256MB (one ODX read and one ODX write)
– When copying data, the read “stride” is 256MB
and the write “stride” is 16MB. (Client does
one ODX read and then 16 ODX writes.)
[ wireshark examples ]
Testing ODX copy
• This part of the protocol is tricky to test
thanks to the amount of control left to
servers about tokens, strides etc.
• We have a simple ODX copy test for smbtorture:
https://github.com/gwr/samba/blob/gwr-m2
source4/torture/smb2/ioctl.c
Follow-up
• Where to see our code:
• https://github.com/Nexenta/illumos-nexenta
usr/src/uts/common/fs/smbsrv/smb2_fsctl_odx.c
(not published yet, but will be “soon”)
• Questions?
• Gordon Ross <gwr@nexenta.com>