Network flows
Tomer Perry
IBM Spectrum Scale Network
Outline
● Overview
● Types of network traffic in an IBM Spectrum Scale environment
● Components
● Component traffic flow
● Other resources
Overview
● IBM Spectrum Scale, like any distributed software, depends heavily on the network
infrastructure in order to operate efficiently.
● The goal of this presentation is to describe the various network traffic flows involved
in an IBM Spectrum Scale environment.
● Network designers can use this information to design a suitable network for IBM
Spectrum Scale environments.
Types of network traffic in an IBM Spectrum Scale environment
● In general, network traffic in IBM Spectrum Scale can be divided into three major types:
– Data/metadata traffic:
This type of traffic is mostly actual data movement between the various IBM Spectrum Scale components. It might travel over TCP/IP or use RDMA, depending on the network type and
protocol being used.
Examples include: NSD traffic, GNR synchronization, public NAS protocol traffic, AFM replication traffic, external pool traffic etc.
– Control traffic:
This type of traffic consists of messages (usually with small payloads) that are used for control purposes. It includes both internal and external traffic (internal traffic is control traffic
between IBM Spectrum Scale components, while external traffic is control traffic between IBM Spectrum Scale and external entities).
Examples include: token traffic, disk lease traffic, encryption-related key traffic, quota management traffic, monitoring traffic, CTDB traffic, configuration management traffic, authentication
traffic etc.
– Administration traffic:
This type of traffic relates to administrative operations performed on an IBM Spectrum Scale environment.
Examples include: remote shell traffic, Web GUI traffic (external), REST administration traffic etc.
● Different components use the above traffic types in different ways. Thus, we will try to describe each traffic type on a per-component
basis.
Note: In the last couple of years, many new features were introduced on top of the traditional GPFS product, making IBM Spectrum Scale more
feature-rich. Many of those features required introducing new types of network flows.
IBM Spectrum Scale component - "Core"
● The traditional GPFS is the core component of IBM Spectrum Scale.
● All the non-"core" IBM Spectrum Scale components essentially use the "core"
component while enriching its capabilities (access using standard protocols,
management capabilities etc.). Thus, the flows discussed in this section apply to the other
components as well.
● Some traffic flows depend on the features being used (for example: encryption, quotas
etc.).
● Some traffic flows are affected by the configuration being used (for example: replication
traffic).
IBM Spectrum Scale components - "Core"
Data traffic
● The basic data traffic type in a "Core" environment is the data transfer between the NSD
servers and the various clients (which might have other roles, like protocol nodes, TSM nodes,
AFM GWs, TCT nodes etc.). In the context of this document, data includes filesystem
metadata as well.
● Data traffic uses the NSD protocol (Network Shared Disk) over either TCP/IP and/or
RDMA (over Ethernet – a.k.a. RoCE, over IB and over OPA). RDMA is considered
more efficient, especially for large I/O sizes.
● The required characteristics of the data channel highly depend on the customer workload
(throughput oriented, latency oriented etc.), while metadata workload is usually latency-
sensitive.
● For sequential workloads, data message size may be as large as the file system block size.
● When using IBM Spectrum Scale sync replication, the client is responsible for writing multiple
copies. Thus, it is required to take those into account as well (2X/3X writes for 2/3 replicas).
For FPO configurations where enableRepWriteStream is enabled, the first NSD server will
write the third replica instead of the client (see the sketch after the diagram below).
● Other data paths that are still considered "core" are the ones related to the various features
for moving data out of/into IBM Spectrum Scale environments:
– AFM GW: reads/writes (configuration dependent) data to/from external sources (either a
remote GPFS cluster or a generic NFS server)
– Spectrum Protect HSM: when using Spectrum Protect as an external pool for ILM,
data will move between IBM Spectrum Scale and the IBM Spectrum Protect
infrastructure.
– TCT: Transparent Cloud Tiering is used as an external pool as well; data will be moved
to/from the external cloud/object infrastructure.
[Diagram: NSD clients accessing NSD servers; AFM GWs replicating to a remote cluster/NFS server; TSM HSM nodes moving data to the TSM server; TCT GWs moving data to the cloud]
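To make the replication write fan-out concrete, here is a minimal Python sketch (illustrative only, not Spectrum Scale code; the function and dictionary keys are hypothetical). It only counts how many NSD write messages each party issues per block for 2/3 replicas, and how enableRepWriteStream shifts the third copy onto the first NSD server in FPO configurations.

# Illustrative sketch: number of NSD write messages issued for one block
# under different replication settings. Not real GPFS code.

def client_writes(replicas: int, fpo_rep_write_stream: bool = False) -> dict:
    """Return how many copies each party writes for a single block."""
    if fpo_rep_write_stream and replicas == 3:
        # FPO with enableRepWriteStream: the client writes two copies and
        # the first NSD server writes the third replica on its behalf.
        return {"client": 2, "first_nsd_server": 1}
    # Default sync replication: the client is responsible for all copies.
    return {"client": replicas, "first_nsd_server": 0}

if __name__ == "__main__":
    print(client_writes(replicas=2))                              # 2X writes from the client
    print(client_writes(replicas=3))                              # 3X writes from the client
    print(client_writes(replicas=3, fpo_rep_write_stream=True))   # third copy offloaded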
IBM Spectrum Scale components - "Core"
Control traffic
● Control traffic in IBM Spectrum Scale environments can
be divided into three major types:
– Cluster "well being" - i.e. disk leases ("heartbeat")
– Data access or consistency management
(tokens, quotas, allocation etc.)
– Configuration or feature related (key
management, Zimon collectors, configuration
management, AFM-related RPCs etc.)
● Most of these types of communication are latency
sensitive (i.e. slow response time will affect
performance). However, some of them, especially the
"well being" ones, are less latency sensitive but highly
impact the availability of nodes in the cluster if message
delay exceeds specific values (e.g. heartbeat timeouts).
● Note: In most cases, a node can play multiple roles.
[Diagram: NSD clients and servers together with the CL MGR, FS MGR, AFM GW, Token SRV, Meta Node, KLM SRV, CCR SRV and Zimon collector roles involved in control traffic]
IBM Spectrum Scale components - "Core"
Control traffic
"Cluster well being" - a.k.a. disk lease traffic
● In each cluster, the cluster manager manages the disk
leases. The cluster manager is automatically elected from
the list of "quorum nodes".
● Each node (regardless of its role) needs to renew its lease
with the cluster manager (dynamically elected) at
regular intervals (35 sec. by default).
● If a node fails to renew its lease, the cluster manager will
try to ping the "suspected node" before expelling it.
● Another expel case might take place if a node can't talk to
another node. In that case, it will ask the CL MGR to expel
the node, and the CL MGR will decide which node needs to be
expelled.
● All nodes in the local cluster (and potentially in remote
clusters) participate in the lease-renewal mechanism,
regardless of their role (see the sketch after the diagram below).
[Diagram: all cluster nodes renewing their disk leases with the CL MGR (same control-traffic role layout as above)]
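The timing relationship between lease renewal and node suspicion can be illustrated with a minimal Python sketch. It is not the mmfsd implementation: the class, the 65-second timeout value and the node names are assumptions for illustration; only the 35-second renewal interval comes from the slide above.

# Illustrative sketch of the lease bookkeeping described above.
import time

LEASE_INTERVAL = 35.0   # default lease renewal interval from the slide (seconds)
LEASE_TIMEOUT = 65.0    # assumed value for illustration only

class ClusterManager:
    def __init__(self):
        self.last_renewal = {}                 # node name -> last renewal timestamp

    def renew_lease(self, node: str) -> None:
        """Called by every node (regardless of role) roughly every LEASE_INTERVAL."""
        self.last_renewal[node] = time.monotonic()

    def suspected_nodes(self) -> list:
        """Nodes with an overdue lease; the manager would ping these before expelling."""
        now = time.monotonic()
        return [n for n, t in self.last_renewal.items() if now - t > LEASE_TIMEOUT]

if __name__ == "__main__":
    mgr = ClusterManager()
    mgr.renew_lease("nsd-client-01")
    print(mgr.suspected_nodes())   # [] while the lease is fresh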
IBM Spectrum Scale components - "Core"
Control traffic
"Data access/consistency management"
● As a distributed filesystem, IBM Spectrum Scale needs some
mechanisms in order to assure data consistency as well as capacity
management. We describe those cases below:
– Tokens:
Overview:
One of the basic constructs of IBM Spectrum Scale, as a parallel
filesystem, is the use of tokens in order to maintain consistency.
Essentially, in order to perform an operation on a filesystem object
(file/directory etc.) a relevant token is required (for reading,
writing, caching etc.).
Workflow:
At least one token server is automatically selected from the list of
"manager" nodes. The token servers manage the tokens, and
each client (regardless of its role) needs to contact a token server.
The token servers "divide" the object responsibility between them
based on inode number, thus achieving load balancing and better
resiliency (see the sketch after the diagram below).
Network Requirements:
In general, token traffic is based on small RPC messages, which
are mostly latency sensitive (i.e. not throughput oriented). Slow
response time (due to latency, network congestion etc.) might
result in "hangs" or "hiccups" from a user/application perspective.
[Diagram: nodes obtaining tokens from the Token SRV nodes (same control-traffic role layout as above)]
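A short sketch of dividing token responsibility between token servers by inode number follows. The modulo hash shown here is a deliberate simplification and is not the actual GPFS distribution algorithm; the server names are hypothetical.

# Simplified illustration of spreading token ownership across token servers
# by inode number. The real GPFS distribution algorithm differs.

TOKEN_SERVERS = ["mgr-node-1", "mgr-node-2", "mgr-node-3"]   # selected "manager" nodes

def token_server_for(inode: int) -> str:
    """Pick the token server responsible for the object with this inode number."""
    return TOKEN_SERVERS[inode % len(TOKEN_SERVERS)]

if __name__ == "__main__":
    for inode in (1001, 1002, 1003, 1004):
        print(inode, "->", token_server_for(inode))   # load spreads over the servers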
IBM Spectrum Scale components - "Core"
Metadata traffic in presence of file sharing
"Data access/consistency management"
– Metanode:
Overview:
A corner case in token management (in order to achieve better
performance) is the use of a "metanode".
One of the challenges in multi-node access to the same object is
the ability to maintain metadata consistency. While in theory the
common token system could be used, updating (or querying) object
metadata might result in many token "steals". Thus, a
slightly different mechanism is used, in which one node (usually the first
node that opened the object) will act as a "metadata node" for that
object. Any other node that would like to update/query the object's
metadata (inode fields) will talk to that node – thus achieving
better scalability for multiple objects. The metanode will be the only
one updating the object's metadata through the NSD server (see the
sketch after the diagram below).
Workflow:
The main difference in the workflow when considering the metanode
role is that there will be metadata traffic not only between NSD
clients and servers, but also directly between different NSD clients.
Network Requirements:
Metanode traffic is similar to generic token traffic, i.e. latency-
sensitive for small messages.
[Diagram: NSD clients sending metadata updates directly to the Meta Node (same control-traffic role layout as above)]
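A minimal sketch of the metanode idea follows: the first node to open an object takes the metanode role, and other nodes funnel their inode updates through it. The class, method names and node names are hypothetical and only illustrate the traffic pattern described above, not GPFS internals.

# Sketch only: first opener becomes the metanode; other nodes route inode
# (metadata) updates to it, and it alone writes the inode through the NSD server.

class FileObject:
    def __init__(self, name: str):
        self.name = name
        self.metanode = None          # elected on first open
        self.pending_updates = []     # inode updates funnelled through the metanode

    def open(self, node: str) -> str:
        if self.metanode is None:
            self.metanode = node      # first opener takes the metanode role
        return self.metanode          # later openers learn whom to talk to

    def update_metadata(self, node: str, field: str, value) -> None:
        # Client-to-metanode traffic: small, latency-sensitive messages.
        self.pending_updates.append((node, field, value))

if __name__ == "__main__":
    f = FileObject("/gpfs/fs0/shared.log")
    print(f.open("client-a"))         # client-a becomes the metanode
    print(f.open("client-b"))         # client-b is told the metanode is client-a
    f.update_metadata("client-b", "mtime", 1700000000)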
IBM Spectrum Scale components - "Core"
Control traffic
"Data access/consistency management"
– Filesystem allocation and quota:
Overview:
Managing block and inode allocation in a highly distributed filesystem is a
big challenge. While there are similarities between standard allocation and
quotas, the main difference is that quotas operate at a finer granularity
(there are separate quotas for each user/group/fileset etc.). Thus different
mechanisms are used, which affect the communication impact of both.
Workflow:
For filesystem-based block/inode allocation, a set of predefined allocation
regions is used. Thus, when a node needs block or inode allocation, it
just needs to be assigned an available region. The communication
impact is relatively low until the region is used up or "stolen" by another node.
The FS MGR manages the region allocation.
Quota is a bit more challenging, as it requires much more frequent
messaging. Thus, an eventually consistent usage scheme was implemented: each
node gets a "share" of blocks/inodes for the relevant quota entity
(autotuned post 4.1.1). The quota manager, running on the FS MGR,
manages the assignment of those shares upon request (see the sketch after
the diagram below).
Network Requirements:
Block/inode allocation is also based on small messages and is thus
latency sensitive. While regular allocation might not be affected by
slowness due to the region-based approach, quota allocation might affect
write/create performance.
In both cases, all nodes will contact the respective FS MGR in order to get
allocation regions and/or quota shares.
[Diagram: nodes requesting allocation regions and quota shares from the FS MGR (same control-traffic role layout as above)]
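The share-based quota scheme can be illustrated with a short Python sketch. It is not the GPFS quota manager: the class, the share size and the node names are assumptions; the point is only that a node talks to the FS MGR when it needs a new share, not on every allocation.

# Sketch of the "share" based quota scheme described above: the quota manager
# (on the FS MGR) hands each node a chunk of blocks it can allocate locally,
# so network messages happen only when a share runs out.

class QuotaManager:
    def __init__(self, limit_blocks: int, share_size: int = 1024):
        self.limit = limit_blocks
        self.granted = 0
        self.share_size = share_size

    def request_share(self, node: str) -> int:
        """Grant the requesting node a share of blocks (0 means quota exhausted)."""
        remaining = self.limit - self.granted
        share = min(self.share_size, remaining)
        self.granted += share
        return share

if __name__ == "__main__":
    qm = QuotaManager(limit_blocks=2500)
    print(qm.request_share("client-a"))   # 1024
    print(qm.request_share("client-b"))   # 1024
    print(qm.request_share("client-a"))   # 452 -> the remainder
    print(qm.request_share("client-c"))   # 0 -> the quota entity is exhausted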
IBM Spectrum Scale components - "Core"
Control traffic
"Configuration/feature related"
– Key management:
Overview:
When IBM Spectrum Scale introduced native
filesystem encryption, a new dependency (or
communication flow) was introduced: the ability to talk
to a key server (currently either SKLM or Vormetric).
Workflow:
Since any node in the environment (in either the local or a
remote cluster) might need to access encrypted data,
all nodes need to have access to a key server.
While the nodes will cache the keys for some time
(configurable), not being able to fetch keys from the key
server might result in a data access error (see the sketch
after the diagram below).
Network Requirements:
Key fetching from the key server/s mostly involves small
messages to the designated set of servers. As
mentioned above, all nodes accessing encrypted data
need to have access to those servers.
[Diagram: nodes fetching encryption keys from the KLM SRV (same control-traffic role layout as above)]
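A minimal sketch of the key-caching behaviour follows. It is not the Spectrum Scale key client: the class, the 600-second TTL and the fetch callback are assumptions; it only shows that a warm cache avoids network traffic while a cold cache plus an unreachable key server turns into a data access error.

# Illustrative sketch of key fetching with a configurable cache lifetime.
import time

CACHE_TTL = 600.0   # assumed cache lifetime in seconds; the real value is configurable

class KeyClient:
    def __init__(self, fetch_from_server):
        self.fetch = fetch_from_server        # callable: key_id -> key bytes (RPC to the key server)
        self.cache = {}                       # key_id -> (key, fetch_time)

    def get_key(self, key_id: str) -> bytes:
        entry = self.cache.get(key_id)
        if entry and time.monotonic() - entry[1] < CACHE_TTL:
            return entry[0]                   # served from the local cache, no network traffic
        try:
            key = self.fetch(key_id)          # small message to the key server
        except OSError as err:
            # cold cache + unreachable key server -> the encrypted file cannot be read
            raise RuntimeError(f"data access error: cannot fetch key {key_id}") from err
        self.cache[key_id] = (key, time.monotonic())
        return key

if __name__ == "__main__":
    kc = KeyClient(lambda key_id: b"0" * 32)  # stand-in for the real key server call
    print(len(kc.get_key("MEK-1")))           # 32, fetched once and then cached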
IBM Spectrum Scale components - "Core"
Control traffic
"Configuration/feature related"
– Performance collection (Zimon):
Overview:
Starting with IBM Spectrum Scale 4.1.1, a performance collection
solution is included with the product. It is based on the Zimon
technology. Zimon collects a variety of performance-related metrics
to be later presented using either the IBM Spectrum Scale GUI,
Grafana (using a bridge) or the CLI (using the mmperfmon command).
Workflow:
Like most performance monitoring systems, Zimon (starting with
IBM Spectrum Scale 4.2) can potentially use a federated structure
of reporters (every node) and collectors (specifically defined nodes).
Each node reports to its designated collector, allowing the various
collectors to be queried for the collected data. The collectors know
about each other and are thus able to satisfy queries for performance data
stored on other collectors (see the sketch after the diagram below).
Network Requirements:
Zimon reporting is also based on relatively small messages, flowing
from the monitored nodes to their designated collector. In order to
achieve HA, one client might report to more than one collector.
That said, it is important to note that Zimon traffic is not in the "data
path", meaning that an inability to report in a timely fashion will result
in some performance data missing, but will not affect actual data
access (i.e. no user impact, only admin impact).
[Diagram: monitored nodes reporting to the Zimon collectors (same control-traffic role layout as above)]
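The federated reporter/collector layout can be sketched as below. This is purely illustrative (the real transport is the zimon protocol); the class and node names are hypothetical. It shows a node reporting to its designated collector and a peer collector answering a query for that data.

# Sketch of the federated reporter/collector structure described above.

class Collector:
    def __init__(self, name: str, peers=None):
        self.name = name
        self.peers = peers or []              # other collectors in the federation
        self.metrics = {}                     # reporting node -> latest sample

    def report(self, node: str, sample: dict) -> None:
        """Called by the nodes that have this collector as their designated one."""
        self.metrics[node] = sample

    def query(self, node: str):
        if node in self.metrics:
            return self.metrics[node]
        for peer in self.peers:               # federation: ask the other collectors
            if node in peer.metrics:
                return peer.metrics[node]
        return None

if __name__ == "__main__":
    c1, c2 = Collector("collector-1"), Collector("collector-2")
    c1.peers, c2.peers = [c2], [c1]
    c2.report("nsd-client-07", {"read_ops": 1200})
    print(c1.query("nsd-client-07"))          # satisfied via the peer collector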
IBM Spectrum Scale components - "Core"
Control traffic
"Configuration/feature related"
– Configuration management (CCR):
Overview:
In the past, GPFS used two designated configuration server nodes to
manage the authoritative version of the cluster configuration. Starting
with version 4.1, a new Cluster Configuration Repository (CCR) was
introduced. Today, all quorum nodes act as CCR nodes, meaning
that they manage the distributed configuration state of the cluster.
CCR also allows using multiple configuration files – thus allowing the IBM
Spectrum Scale configuration repository to manage much more information
(including protocols configuration, authentication configuration etc.).
Workflow:
In general, there are two major operations that can be performed on
configuration managed by CCR: update and query.
Both can be done from any node, but an update will change (and distribute the
changes) to all CCR nodes in the cluster and potentially (depending
on the specific configuration file) to all cluster nodes.
On update, the whole file is pushed to the CCR nodes – and while
the file is usually small, it might grow to several MB on large clusters
(see the sketch after the diagram below).
Network Requirements:
CCR usually uses small messages as well, from all cluster nodes.
Special nodes that utilize the CCR more than others (protocol nodes,
GNR nodes, GUI etc.) might introduce more traffic than others.
[Diagram: nodes updating/querying configuration held by the CCR SRV quorum nodes (same control-traffic role layout as above)]
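The two CCR operations mentioned above (update and query) can be sketched as below. The classes and node names are hypothetical; the sketch only mirrors the statement that an update pushes the whole file to every CCR (quorum) node, while a query can be served from any of them.

# Sketch of CCR update/query: whole-file push to all CCR nodes on update.

class CCRNode:
    def __init__(self, name: str):
        self.name = name
        self.files = {}              # file name -> (version, content)

class CCRCluster:
    def __init__(self, quorum_nodes):
        self.nodes = [CCRNode(n) for n in quorum_nodes]

    def update(self, fname: str, content: str) -> int:
        """Push the complete file to every CCR node, bumping its version."""
        version = self.query_version(fname) + 1
        for node in self.nodes:
            node.files[fname] = (version, content)
        return version

    def query_version(self, fname: str) -> int:
        """A query can be answered by any CCR node; take the highest known version."""
        return max(n.files.get(fname, (0, ""))[0] for n in self.nodes)

if __name__ == "__main__":
    ccr = CCRCluster(["quorum-1", "quorum-2", "quorum-3"])
    print(ccr.update("cluster.cfg", "clusterName=demo ..."))   # version 1, pushed to all
    print(ccr.query_version("cluster.cfg"))                    # 1, served from any node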
IBM Spectrum Scale components - "Core"
Control traffic
"Configuration/feature related"
– AFM related RPCs:
Overview:
While the data traffic part of AFM was discussed earlier,
AFM also uses client-to-AFM-gateway RPCs in order to
let the GW know about filesystem operations that need
to be replicated by the GW.
Workflow:
When a client node (any client that does I/O on an AFM
fileset) performs a read/write operation on a filesystem
object, it sends a special RPC to the AFM GW so that the GW
can either add the specific op to the fileset queue (for
writes/creates etc.) or validate/fetch that data on reads
(see the sketch after the diagram below).
In order to perform the various operations, the AFM GW
node might require other types of traffic, data or control,
in order to actually perform the required operations.
Network Requirements:
The AFM-related RPCs are small messages, as only
the op itself is transmitted, not the data itself.
The traffic might flow to/from any node that accesses the
AFM fileset (on either local or remote clusters).
[Diagram: client nodes sending AFM RPCs to the AFM GW nodes (same control-traffic role layout as above)]
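A minimal sketch of the client-to-gateway notification follows. It is not the AFM implementation: the class, the operation names and the queue shape are assumptions. It only shows that the RPC carries the operation (not the data), that writes/creates are queued for replication, and that reads trigger validation/fetch.

# Sketch of the client -> AFM gateway RPCs described above.
from collections import deque

class AFMGateway:
    def __init__(self):
        self.queue = deque()          # simplified per-fileset replication queue

    def notify(self, op: str, path: str) -> str:
        """Receive the op itself (small RPC); the data travels separately."""
        if op in ("write", "create", "remove", "rename"):
            self.queue.append((op, path))      # replayed asynchronously to the home site
            return "queued"
        if op == "read":
            return "validate/fetch from home if not cached"
        return "ignored"

if __name__ == "__main__":
    gw = AFMGateway()
    print(gw.notify("create", "/gpfs/cache-fileset/new.dat"))
    print(gw.notify("read", "/gpfs/cache-fileset/old.dat"))
    print(list(gw.queue))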
IBM Spectrum Scale components - "Core"
Administration traffic
● Administration traffic in the core components can be divided into internal
traffic (i.e. traffic that is used to perform operations on the various cluster
nodes) and external traffic (traffic used by external entities in order to
"talk" to components performing operations on their behalf).
● The first type of traffic mostly uses remote shell commands (ssh on
modern systems). Some parts of internal traffic also use special RPCs
that replaced the remote shell commands on older versions of GPFS (in
order to minimize the dependency on remote shell). Those operations
are mostly around distributing config files to new nodes, NSD creation
etc. Some customers use a dedicated "management" network in
order to perform those operations (admin interface).
Unless otherwise defined (using the adminMode config option), any node
in the cluster might be the source of those operations; in most cases,
some other node will perform the actual operation on the requester's
behalf (cluster manager/filesystem manager etc.).
● The second type (external administration) is mostly web-based
GUI access and/or the new REST interface to IBM Spectrum Scale
commands.
● Both operations are not in the data path and are usually used by
interactive commands (thus, extremely long delays will cause
complaints by administrators), but their performance requirements are
not critical.
[Diagram: NSD clients/servers, CL MGR, FS MGR and CCR SRV nodes, with external management clients reaching the GUI and REST mgmt nodes]
"Protocols"
IBM Spectrum Scale components - "Protocols"
● The "protocols" component of IBM Spectrum Scale provides the ability to access data stored on various
filesystems using standard protocols.
● Today, the supported protocols are: NFS, CIFS, Swift (Object), HDFS and iSCSI (boot only).
● "Protocol nodes" are nodes that run the various services. From an IBM Spectrum Scale perspective
they are usually "client" nodes (function wise, not licensing wise) – meaning that they access the data as
any other IBM Spectrum Scale client node.
● Thus, on one hand they utilize all the data/control/management traffic mentioned in the "core" section,
but on the other hand, they use other protocols' data/control traffic (protocol dependent).
● Some protocols require a special control protocol for cluster awareness (CIFS) and most protocols also
require authentication control traffic (usually to external authentication server/s).
IBM Spectrum Scale components - "Protocols"
Data traffic
● As mentioned earlier, protocol nodes use
the core data traffic on their "backend" in order
to talk to the NSD servers.
● On the "frontend", protocol nodes use the
relevant protocol data path.
● Depending on the access pattern and dataset,
the traffic might be throughput oriented or
latency oriented.
● Some protocols might assign different roles to
different nodes (e.g. HDFS datanode vs.
namenode).
● It is advised not to use the same interfaces for
both IBM Spectrum Scale (backend) traffic and
protocols (frontend) traffic.
[Diagram: protocol nodes (CIFS, NFS, SWIFT, HDFS data) served by the NSD servers on the backend]
IBM Spectrum Scale components - "Protocols"
Control traffic
● As mentioned earlier, protocol nodes use
the same control traffic on their
"backend" as any IBM Spectrum Scale client
node.
● On the protocol-specific part, there is both
common control traffic (usually authentication)
and potentially protocol-specific control traffic:
– Authentication: Most protocols (unless
using local authentication) will contact the
configured external authentication
servers in order to get authentication-
related information. For the most part
those are relatively small, latency-
sensitive operations.
[Diagram: protocol nodes (CIFS, NFS, SWIFT, HDFS name node) contacting external authentication servers]
IBM Spectrum Scale components - "Protocols"
Control traffic
● Protocol-specific control traffic
– NFS:
IBM Spectrum Scale uses the Ganesha NFS
server for providing NFS services.
Back-end:
Ganesha doesn't use any special internode
communication; it relies on GPFS as the
clustering mechanism.
Front-end:
For the most part, the NFS protocol includes
mixed data and control traffic. Control traffic
on NFS includes locking (separate in V3 and
integrated in V4), the delegation backchannel (V4
– not yet supported) and the rquota protocol.
All of the front-end control traffic consists mostly
of small messages, somewhat latency
sensitive (application dependent).
[Diagram: NFS clients and NFS servers exchanging locking, backchannel and rquota control traffic]
IBM Spectrum Scale components - "Protocols"
Control traffic
● Protocol-specific control traffic
– CIFS:
In order to provide CIFS services, IBM Spectrum
Scale uses the SAMBA software together
with the CTDB clustering software.
Back-end:
In order to manage the CIFS locking state inside
the cluster, CTDB needs to communicate
between the nodes. The traffic is based on
latency-sensitive small messages.
Front-end:
For the most part, the CIFS protocol mixes both
data and control traffic, so it is a bit hard to
differentiate between them. Control traffic
includes locking and the callback channel (i.e. when
the server contacts the client to break oplocks
etc.). Both largely depend on the specific
workload and dataset.
[Diagram: CIFS clients and CIFS servers exchanging locking and callback-channel traffic, with CTDB traffic between the servers]
IBM Spectrum Scale components - "Protocols"
Data traffic
● Protocol-specific control traffic
– SWIFT/S3:
In order to provide object-based services, IBM
Spectrum Scale uses the OpenStack Swift software
with some enhancements in order to provide better
S3 support.
Back-end:
Currently, the only traffic on the backend network with
the object implementation is the same traffic as for the
regular "core" components.
Front-end:
The object implementation uses several data and
control messages over the public network:
– Data: With the object protocol, a client requests a read/write
operation from a protocol node over the front end (CES) network. With
traditional Swift mode, that request may be processed by the same
protocol node or passed to a different protocol node for processing.
With unified file and object access mode (Swift on File), the request is
always processed by the protocol node that received the client request.
The front end (CES) network is used for passing requests between
protocol nodes.
[Diagram: object clients sending data requests to the SWIFT protocol servers over the front-end (CES) network]
IBM Spectrum Scale components - "Protocols"
Control traffic
● Protocol-specific control traffic
– SWIFT/S3 (continued)
Control: Control traffic in Swift is mostly
authentication traffic, and somewhat depends on the
chosen configuration:
● If the keystone authentication service is provided
by the protocol nodes (the common case), then clients
will contact the keystone node in order to validate the
user and get a security token.
When using the Swift protocol, the server will cache the
token using memcached, based on the token TTL, in
order to reduce the authentication traffic; when using S3,
the server will not cache any tokens (see the sketch after
the diagram below).
● When clients contact a protocol node with their security
token, the node will contact the keystone server in order
to validate the token. In some cases, the keystone server
might be running on a different node or on a 3rd party node
(when using an external keystone).
– Note: Currently, all the authentication traffic takes place
using the front-end (CES) IPs and not the back-end network.
[Diagram: object clients and SWIFT protocol servers authenticating against the keystone server]
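The token validation flow above can be sketched as follows. This is not the Swift/keystone code: the class, the callback and the protocol switch are assumptions; it only mirrors the statement that the Swift path caches the validated token (memcached in the real deployment) for its TTL, while the S3 path validates against keystone on every request.

# Sketch of protocol-node token validation against keystone, with Swift-only caching.
import time

class TokenValidator:
    def __init__(self, validate_with_keystone, protocol: str = "swift"):
        self.validate = validate_with_keystone   # callable: token -> TTL in seconds (0 = invalid)
        self.protocol = protocol                 # "swift" caches, "s3" does not
        self.cache = {}                          # token -> expiry timestamp

    def check(self, token: str) -> bool:
        if self.protocol == "swift":
            expiry = self.cache.get(token)
            if expiry and time.time() < expiry:
                return True                      # no round trip to keystone
        ttl = self.validate(token)               # control traffic to the keystone server
        if ttl <= 0:
            return False
        if self.protocol == "swift":
            self.cache[token] = time.time() + ttl
        return True

if __name__ == "__main__":
    validator = TokenValidator(lambda token: 300, protocol="swift")
    print(validator.check("abc123"))   # validated via keystone
    print(validator.check("abc123"))   # served from the cache until the TTL expires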
IBM Spectrum Scale components - "Protocols"
Control traffic
● Protocol-specific control traffic
– HDFS:
For Hadoop-like workloads (or applications that use
the DFS API in order to access data), IBM
Spectrum Scale implements the transparent Hadoop
connector. The control data is managed by a
single "namenode" at a time, which tells the client
which data nodes to get the actual data from.
Back-end:
There is no special back-end traffic going on for the
Hadoop transparent connector.
Front-end:
Today, a single node acts as the namenode at any
given time. Clients will first contact this node, using the
standard HDFS RPCs, in order to query the location of
specific blocks (see the sketch after the diagram below).
As with most control protocols, those are small
messages, relatively latency sensitive (since the actual
data is usually large, the latency might not be that
critical).
[Diagram: HDFS clients querying the HDFS name node and transferring data with the HDFS data nodes]
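A minimal sketch of the control/data split in this flow follows. It is not the transparent Hadoop connector itself; the class, path and node names are assumptions. It only shows the pattern: a small control RPC to the single active namenode for block locations, followed by the large data transfers to/from the data nodes.

# Sketch of the HDFS-style split between control and data traffic.

class NameNode:
    def __init__(self, block_map: dict):
        self.block_map = block_map         # path -> ordered list of data nodes holding it

    def get_block_locations(self, path: str) -> list:
        """Small, latency-sensitive control RPC answered by the active namenode."""
        return self.block_map.get(path, [])

if __name__ == "__main__":
    nn = NameNode({"/data/part-0000": ["datanode-2", "datanode-5"]})
    for node in nn.get_block_locations("/data/part-0000"):
        print(f"read block data from {node}")   # the large data transfer goes here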
"GNR"
IBM Spectrum Scale components - "GNR"
● GPFS Native RAID (a.k.a. IBM Spectrum Scale RAID) is a software-based RAID
implementation that is used with several storage products, for example the Elastic
Storage Server (ESS) from IBM.
● From a "core" IBM Spectrum Scale perspective, GNR nodes are considered NSD servers,
where "below" the NSD layer we have the GNR layer itself, which takes care of physical
disk management and data protection using declustered RAID technology.
● That said, the GNR layer itself introduces several new network-related flows in order to
provide efficient write caching as well as monitoring of the partner node in each GNR
building block.
IBM Spectrum Scale components - "GNR"
Data traffic
● As mentioned earlier, GNR nodes also act as NSD servers, thus
the main data flow with GNR is reading/writing
to/from clients. That part is discussed in the "core" component.
That said, the high performance that GNR nodes usually provide
requires adequate network bandwidth in order to really utilize the GNR
nodes' capabilities.
● Another special type of traffic with GNR is NVRAM
replication. Essentially, GNR uses the internal NVRAM on the
nodes in order to implement a "logTip", as an extremely fast small-
write cache (technically, it is an append to a circular log) and for
internal GNR metadata operations. Since the NVRAM is internal to
each server, a special protocol (NSPD) is used in order to replicate
those operations to the building block "partner". The replication is
based on small messages which are highly latency sensitive
(higher latency will impact small-write performance). Under some
workloads, it might make sense to separate this traffic onto a fast
interconnect (e.g. IB) in order to enhance performance and
avoid competition with the main data stream (see the sketch after
the diagram below).
[Diagram: main data flow between NSD clients and GNR servers, with NSPD traffic between building-block partners]
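A minimal sketch of the logTip idea follows. It is not the NSPD protocol or GNR code: the class, the capacity and the synchronous partner call are assumptions made for illustration. It only conveys that a small write is appended to a circular NVRAM log and mirrored to the building-block partner, so that replication hop sits on the small-write latency path.

# Sketch of the logTip append + partner replication described above.

class LogTip:
    def __init__(self, partner=None, capacity: int = 8):
        self.entries = []
        self.capacity = capacity            # circular log, kept tiny here for illustration
        self.partner = partner              # the other GNR server in the building block

    def append(self, payload: bytes) -> None:
        self.entries.append(payload)
        if len(self.entries) > self.capacity:
            self.entries.pop(0)             # wrap the circular log
        if self.partner is not None:
            self.partner.append(payload)    # replication hop: its latency is seen by the writer

if __name__ == "__main__":
    partner = LogTip()                      # the partner keeps its own copy of the log
    local = LogTip(partner=partner)
    local.append(b"small-write-1")
    print(len(partner.entries))             # 1 -> the append was replicated to the partner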
IBM Spectrum Scale components - "GNR"
Control traffic
● Much of GNR's control traffic is the same
as that of the rest of the "core" components.
That said, due to the special need to
manage a large number of disks, there is
some special control traffic going on in
the background in order to monitor each
building block partner.
● The nature of this traffic is relatively
small messages, but since they
are performed in the background, latency
is not that critical for them.
[Diagram: GNR building-block partners exchanging NSPD monitoring traffic while serving NSD clients]
Other resources
● List of ports being used by IBM Spectrum Scale components:
https://ibm.biz/BdixM4
  • 6. IBM Spectrum Scale components - "Core": Data traffic
    ● The basic data traffic type in a "Core" environment is the data transfer between the NSD servers and the various clients (which might have other roles, like protocol nodes, TSM nodes, AFM GW, TCT nodes etc.). In the context of this document, data includes filesystem metadata as well.
    ● Data traffic uses the NSD protocol (Network Shared Disk) over either TCP/IP and/or RDMA (over Ethernet - a.k.a. RoCE, over IB and over OPA). The RDMA protocol is considered more efficient, especially for large I/O sizes.
    ● The required characteristics of the data channel highly depend on the customer workload (throughput oriented, latency oriented etc.), while metadata workload is usually latency sensitive.
    ● For sequential workloads, data message size may be as large as the file system block size.
    ● When using IBM Spectrum Scale sync replication, the client is responsible for writing multiple copies, so those need to be taken into account as well (2X/3X writes for 2/3 replicas). For FPO configurations, where enableRepWriteStream is enabled, the first NSD server will write the third replica instead of the client.
    ● Other data paths, which are still considered "core", are the ones related to various features that move data out of/into IBM Spectrum Scale environments:
      - AFM GW: reads/writes (configuration dependent) data to/from external sources (either a remote GPFS cluster or a generic NFS server)
      - Spectrum Protect HSM: when using Spectrum Protect as an external pool for ILM, data moves between IBM Spectrum Scale and the IBM Spectrum Protect infrastructure.
      - TCT: Transparent Cloud Tiering is used as an external pool as well; data is moved to/from the external cloud/object infrastructure.
    [Slide diagram: NSD servers, NSD clients, AFM GWs, TSM HSM nodes, TSM server, TCT GWs, cloud, remote cluster / NFS server]
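To make the replication write amplification mentioned above concrete, here is a minimal back-of-the-envelope calculation in Python. The application write rate used below is an illustrative assumption, not a measured value.

```python
# Rough estimate of NSD client egress bandwidth when the client writes all
# replicas itself (IBM Spectrum Scale synchronous replication).
# The 1 GB/s application write rate below is an illustrative assumption.

def client_egress_gbit_per_s(app_write_mb_per_s: float, replicas: int) -> float:
    """With sync replication the client sends each block once per replica,
    so network egress is roughly application throughput x replica count."""
    return app_write_mb_per_s * replicas * 8 / 1000  # MB/s -> Gbit/s

if __name__ == "__main__":
    for replicas in (1, 2, 3):
        gbps = client_egress_gbit_per_s(app_write_mb_per_s=1000, replicas=replicas)
        print(f"{replicas} replica(s): ~{gbps:.1f} Gbit/s of client egress "
              f"for a 1 GB/s application write stream")
```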
  • 7. IBM Spectrum Scale components - "Core": Control traffic
    ● Control traffic in IBM Spectrum Scale environments can be divided into three major types:
      - Cluster "well being" - i.e. disk leases ("heartbeat")
      - Data access or consistency management (tokens, quotas, allocation etc.)
      - Configuration or feature related (key management, Zimon collectors, configuration management, AFM related RPCs etc.)
    ● Most of these types of communication are latency sensitive (i.e. slow response time will affect performance). However, some of them, especially the "well being" ones, are less latency sensitive but highly impact the availability of nodes in the cluster if message delay exceeds specific values (e.g. heartbeat timeouts).
    ● Note: In most cases, a node can play multiple roles.
    [Slide diagram: NSD servers, NSD clients, AFM GWs, CL MGR, FS MGR, Token servers, Meta nodes, KLM server, CCR servers, Zimon collectors]
  • 8. IBM Spectrum Scale components - "Core": Control traffic - "Cluster well being", a.k.a. disk leases traffic
    ● In each cluster, the cluster manager manages the disk leases. The cluster manager is automatically elected from the list of "quorum nodes".
    ● Each node (regardless of its role) needs to renew its lease with the cluster manager (dynamically elected) at regular intervals (35 seconds by default).
    ● If a node fails to renew its lease, the cluster manager will try to ping the "suspected node" before expelling it.
    ● Another expel case might take place if a node can't talk to another node. In that case, it will ask the cluster manager to expel the other node, and the cluster manager will decide which of the two nodes needs to be expelled.
    ● All nodes in the local cluster (and potentially on remote clusters) participate in the lease-renewal mechanism, regardless of their role.
    [Slide diagram: NSD servers, NSD clients, AFM GWs, CL MGR, FS MGR, Token servers, Meta nodes, KLM server, CCR servers, Zimon collectors]
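The renew/ping/expel flow above can be illustrated with a small sketch. This is a toy model, not the product's implementation: the lease interval mirrors the 35-second default mentioned in the slide, while the ping timeout and the node names are assumptions.

```python
import time

# Toy model of the disk-lease "heartbeat" flow described above.
# LEASE_INTERVAL mirrors the 35 s default mentioned in the slide;
# PING_TIMEOUT is an illustrative assumption, not a product default.
LEASE_INTERVAL = 35.0
PING_TIMEOUT = 5.0

class ClusterManager:
    def __init__(self):
        self.last_renewal = {}          # node -> timestamp of last lease renewal

    def renew_lease(self, node: str) -> None:
        self.last_renewal[node] = time.monotonic()

    def check_leases(self, ping) -> list:
        """Return nodes to expel: lease expired and the node does not answer a ping."""
        expelled = []
        now = time.monotonic()
        for node, ts in self.last_renewal.items():
            if now - ts > LEASE_INTERVAL:
                if not ping(node, timeout=PING_TIMEOUT):   # try to reach the suspect first
                    expelled.append(node)
        return expelled

if __name__ == "__main__":
    mgr = ClusterManager()
    mgr.renew_lease("nsd-client-01")
    mgr.last_renewal["nsd-client-01"] -= 100                # pretend the lease went stale
    print(mgr.check_leases(ping=lambda node, timeout: False))   # -> ['nsd-client-01']
```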
  • 9. IBM Spectrum Scale components - "Core": Control traffic - "Data access/consistency management"
    ● As a distributed filesystem, IBM Spectrum Scale needs some mechanisms in order to assure data consistency as well as capacity management. We describe those cases below:
      - Tokens:
        Overview: One of the basic constructs of IBM Spectrum Scale, as a parallel filesystem, is the use of tokens in order to maintain consistency. Essentially, in order to perform an operation on a filesystem object (file/directory etc.) a relevant token is required (for reading, writing, caching etc.).
        Workflow: At least one token server is automatically selected from the list of "manager" nodes. The token servers manage the tokens, and each client (regardless of its role) needs to contact a token server. The token servers "divide" the object responsibility between them based on inode number, thus achieving load balancing and better resiliency.
        Network Requirements: In general, token traffic is based on small RPC messages, which are mostly latency sensitive (i.e. not throughput oriented). Slow response time (due to latency, network congestion etc.) might result in "hangs" or "hiccups" from a user/application perspective.
    [Slide diagram: NSD servers, NSD clients, AFM GWs, CL MGR, FS MGR, Token servers, Meta nodes, KLM server, CCR servers, Zimon collectors]
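The "divide responsibility by inode number" idea can be sketched very simply. The real distribution logic is internal to GPFS; the modulo hashing and manager node names below are only illustrative assumptions.

```python
# Simplified illustration of spreading token responsibility across several
# token servers by inode number. The actual GPFS distribution algorithm is
# internal to the product; modulo hashing here is only an illustration.

TOKEN_SERVERS = ["mgr-node-1", "mgr-node-2", "mgr-node-3"]   # hypothetical manager nodes

def token_server_for(inode: int) -> str:
    """Pick the token server responsible for a filesystem object."""
    return TOKEN_SERVERS[inode % len(TOKEN_SERVERS)]

if __name__ == "__main__":
    for inode in (1024, 1025, 1026, 50001):
        print(f"inode {inode} -> token server {token_server_for(inode)}")
```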
  • 10. IBM Spectrum Scale components - "Core": Metadata traffic in the presence of file sharing - "Data access/consistency management"
      - Metanode:
        Overview: A corner case in token management (in order to achieve better performance) is the use of a "metanode". One of the challenges in multi-node access to the same object is the ability to maintain metadata consistency. While in theory the common token system could be used, updating (or querying) object metadata might result in many token "steals". Thus, a slightly different mechanism is used, in which one node (usually the first node that opened the object) acts as a "metadata node" for that object. Any other node that would like to update/query the object's metadata (inode fields) will talk to that node - thus achieving better scalability for multiple objects. The metanode will be the only one updating the object's metadata through the NSD server.
        Workflow: The main difference in workflow when considering the metanode role is that there will be metadata traffic not only between NSD clients and servers, but also directly between different NSD clients.
        Network Requirements: Metanode traffic is similar to generic token traffic, i.e. latency sensitive for small messages.
    [Slide diagram: NSD servers, NSD clients, AFM GWs, CL MGR, FS MGR, Token servers, Meta nodes, KLM server, CCR servers, Zimon collectors]
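A minimal sketch of the metanode idea, assuming the simplest possible election rule (first opener wins); the actual election and hand-off logic in the product is more involved, and the paths and node names are hypothetical.

```python
# Toy model of the metanode concept: the first node to open an object becomes
# its metanode, and later openers send inode updates to that node instead of
# competing for metadata tokens. Purely illustrative.

class MetanodeRegistry:
    def __init__(self):
        self.metanode = {}   # object path -> node acting as metanode

    def open(self, path: str, node: str) -> str:
        """Return the metanode for 'path', electing 'node' if none exists yet."""
        return self.metanode.setdefault(path, node)

if __name__ == "__main__":
    reg = MetanodeRegistry()
    print(reg.open("/gpfs/fs1/shared.log", "client-a"))   # client-a becomes metanode
    print(reg.open("/gpfs/fs1/shared.log", "client-b"))   # client-b is directed to client-a
```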
  • 11. IBM Spectrum Scale components - "Core": Control traffic - "Data access/consistency management"
      - Filesystem allocation and quota:
        Overview: Managing block and inode allocation in a highly distributed filesystem is a big challenge. While there are similarities between standard allocation and quotas, the main difference is that quota operates at a finer granularity (there are separate quotas for each user/group/fileset etc.), so different mechanisms are used, which affects the communication impact of both.
        Workflow: For filesystem based block/inode allocation, a set of predefined allocation regions is used. Thus, when a node needs block or inode allocation, it just needs to be assigned an available region. The communication impact is relatively low until the region is used up or "stolen" by another node. The FS MGR manages the region allocation. Quota is a bit more challenging as it requires much more frequent messaging, so an eventually consistent scheme was implemented: each node gets a "share" of blocks/inodes for the relevant quota entity (autotuned post 4.1.1). The quota manager, running on the FS MGR, manages the assignment of those shares upon request.
        Network Requirements: Block/inode allocation is also based on small messages and thus is latency sensitive. While regular allocation might not be affected by slowness due to the region based approach, quota share allocation might affect write/create performance. In both cases, all nodes will contact the respective FS MGR in order to get allocation regions and/or quota shares.
    [Slide diagram: NSD servers, NSD clients, AFM GWs, CL MGR, FS MGR, Token servers, Meta nodes, KLM server, CCR servers, Zimon collectors]
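To illustrate the quota "share" hand-out described above, here is a minimal sketch. The share size, quota size and node names are assumptions (the real share size is autotuned, as the slide notes).

```python
# Illustrative model of quota "shares": the quota manager (on the FS MGR) hands
# each node a share of blocks so the node can satisfy writes locally and only
# comes back when its share is exhausted. Share and quota sizes are assumptions.

class QuotaManager:
    def __init__(self, quota_blocks: int, share_size: int = 1024):
        self.remaining = quota_blocks
        self.share_size = share_size

    def request_share(self, node: str) -> int:
        """Grant up to one share of blocks to the requesting node."""
        grant = min(self.share_size, self.remaining)
        self.remaining -= grant
        print(f"granted {grant} blocks to {node}, {self.remaining} left in quota")
        return grant

if __name__ == "__main__":
    qm = QuotaManager(quota_blocks=2500)
    for node in ("client-a", "client-b", "client-c"):
        qm.request_share(node)   # the third request gets the remainder only
```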
  • 12. IBM Spectrum Scale components - "Core": Control traffic - "Configuration/feature related"
      - Key management:
        Overview: When IBM Spectrum Scale introduced native filesystem encryption, a new dependency (or communication flow) was introduced: the ability to talk to a key server (currently either SKLM or Vormetric).
        Workflow: Since any node in the environment (either in the local or a remote cluster) might need to access encrypted data, all nodes need to have access to a key server. While nodes will cache the keys for some time (configurable), not being able to fetch keys from the key server might result in data access errors.
        Network Requirements: Key fetching from the key server(s) mostly consists of small messages to the designated set of servers. As mentioned above, all nodes accessing encrypted data need to have access to those servers.
    [Slide diagram: NSD servers, NSD clients, AFM GWs, CL MGR, FS MGR, Token servers, Meta nodes, KLM server, CCR servers, Zimon collectors]
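A minimal sketch of the client-side key caching behaviour described above. The TTL value, key IDs and the fetch stand-in are assumptions for illustration; they do not reflect the actual key-management implementation.

```python
import time

# Sketch of key caching: keys fetched from the key server (e.g. SKLM) are
# cached for a configurable time; a fetch failure after cache expiry means the
# encrypted data cannot be accessed. TTL and fetch function are assumptions.

class KeyCache:
    def __init__(self, fetch, ttl_seconds: float = 600.0):
        self.fetch = fetch              # callable that contacts the key server
        self.ttl = ttl_seconds
        self.cache = {}                 # key id -> (key material, expiry time)

    def get(self, key_id: str) -> bytes:
        entry = self.cache.get(key_id)
        now = time.monotonic()
        if entry and entry[1] > now:
            return entry[0]             # served from cache, no network traffic
        key = self.fetch(key_id)        # small RPC to the key server
        self.cache[key_id] = (key, now + self.ttl)
        return key

if __name__ == "__main__":
    cache = KeyCache(fetch=lambda key_id: b"0123456789abcdef")  # stand-in key server
    print(cache.get("MEK-1"))   # first call goes to the "server"
    print(cache.get("MEK-1"))   # second call is served from cache
```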
  • 13. IBM Spectrum Scale components - "Core": Control traffic - "Configuration/feature related"
      - Performance collection (Zimon):
        Overview: Starting with IBM Spectrum Scale 4.1.1, a performance collection solution is included with the product. It is based on the Zimon technology. Zimon collects a variety of performance related metrics to be later presented using either the IBM Spectrum Scale GUI, Grafana (using a bridge) or the CLI (using the mmperfmon command).
        Workflow: Like most performance monitoring systems, Zimon (starting with IBM Spectrum Scale 4.2) can potentially use a federated structure of reporters (every node) and collectors (specifically defined nodes). Each node reports to its designated collector, allowing the various collectors to be queried for the collected data. The collectors know about each other and are thus able to satisfy queries for performance data stored on other collectors.
        Network Requirements: Zimon reporting is also based on relatively small messages, flowing from monitored nodes to their designated collector. In order to achieve HA, one client might report to more than one collector. That said, it is important to note that Zimon traffic is not in the "data path", meaning that a lack of ability to report in a timely fashion will result in some performance data missing, but will not affect actual data access (i.e. no user impact, only admin impact).
    [Slide diagram: NSD servers, NSD clients, AFM GWs, CL MGR, FS MGR, Token servers, Meta nodes, KLM server, CCR servers, Zimon collectors]
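The reporter-to-collector relationship, including reporting to more than one collector for HA, can be sketched as below. Collector names and the metric payload are illustrative assumptions, not Zimon's actual wire format.

```python
# Sketch of the reporter/collector relationship described above: each node
# reports its metrics to one (or, for HA, more than one) designated collector.

from typing import Dict, List

class Collector:
    def __init__(self, name: str):
        self.name = name
        self.samples: List[Dict] = []

    def receive(self, sample: Dict) -> None:
        self.samples.append(sample)

def report(sample: Dict, collectors: List[Collector]) -> None:
    """Send the same sample to every designated collector (HA reporting)."""
    for c in collectors:
        c.receive(sample)

if __name__ == "__main__":
    primary, backup = Collector("collector-1"), Collector("collector-2")
    report({"node": "nsd-client-01", "metric": "read_ops", "value": 1200},
           [primary, backup])
    print(len(primary.samples), len(backup.samples))   # 1 1
```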
  • 14. IBM Spectrum Scale components - "Core": Control traffic - "Configuration/feature related"
      - Configuration Management (CCR):
        Overview: In the past, GPFS used two designated configuration server nodes to manage the authoritative version of the cluster configuration. Starting with version 4.1, a new Cluster Configuration Repository (CCR) was introduced. Today, all quorum nodes act as CCR nodes, meaning that they manage the distributed configuration state of the cluster. CCR also allows using multiple configuration files - thus allowing the IBM Spectrum Scale configuration repository to manage much more information (including protocols configuration, authentication configuration etc.).
        Workflow: In general, there are two major operations that can be performed on configuration managed by CCR: update and query. Both can be done from any node, but an update will change (and distribute the changes to) all CCR nodes in the cluster and potentially (depending on the specific configuration file) all cluster nodes. On update, the whole file is pushed to the CCR nodes - and while the file is usually small, it might grow to several MB on large clusters.
        Network Requirements: CCR usually uses small messages as well, from all cluster nodes. Special nodes that utilize CCR more than others (protocol nodes, GNR nodes, GUI etc.) might introduce more traffic than others.
    [Slide diagram: NSD servers, NSD clients, AFM GWs, CL MGR, FS MGR, Token servers, Meta nodes, KLM server, CCR servers, Zimon collectors]
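A toy illustration of the update path described above: the whole configuration file is pushed, with a version number, to every quorum node. The majority-commit rule and node names below are assumptions about the general concept, not CCR's actual protocol.

```python
# Toy illustration of a CCR-style update: the full configuration file is
# pushed to every quorum node together with a version number; here the update
# is treated as committed only if a majority of quorum nodes accept it.
# The majority rule and node names are illustrative assumptions.

QUORUM_NODES = ["quorum-1", "quorum-2", "quorum-3"]

def push_update(stores: dict, filename: str, content: str, version: int) -> bool:
    acks = 0
    for node in QUORUM_NODES:
        node_store = stores.setdefault(node, {})
        current_version = node_store.get(filename, (0, ""))[0]
        if version > current_version:          # the whole file is replaced on update
            node_store[filename] = (version, content)
            acks += 1
    return acks > len(QUORUM_NODES) // 2       # committed only with a majority

if __name__ == "__main__":
    stores = {}
    print(push_update(stores, "cluster.cfg", "cluster config v2 ...", version=2))  # True
```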
  • 15. IBM Spectrum Scale components - "Core": Control traffic - "Configuration/feature related"
      - AFM related RPCs:
        Overview: While the data traffic part of AFM was discussed earlier, AFM also uses client to AFM gateway RPCs in order to let the GW know about filesystem operations that need to be replicated by the GW.
        Workflow: When a client node (any client that does I/O on an AFM fileset) performs a read/write operation on a filesystem object, it sends a special RPC to the AFM GW so the GW can either add the specific operation to the fileset queue (for writes/creates etc.) or validate/fetch that data on reads. In order to perform the various operations, the AFM GW node might require other types of traffic, data or control, in order to actually perform the required operations.
        Network Requirements: The AFM related RPCs are small messages, as only the operation itself is being transmitted, not the data. The traffic might flow to/from any node that accesses the AFM fileset (on either local or remote clusters).
    [Slide diagram: NSD servers, NSD clients, AFM GWs, CL MGR, FS MGR, Token servers, Meta nodes, KLM server, CCR servers, Zimon collectors]
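A minimal sketch of the client-to-gateway notification described above: the client sends a small record describing the operation (not the data), and the gateway queues it. Field names and paths are hypothetical.

```python
# Sketch of the client-to-AFM-gateway flow: a small RPC describing the
# operation is sent to the gateway, which queues it for replication to home.
# Field names and the single shared queue are illustrative assumptions.

from collections import deque

class AfmGateway:
    def __init__(self):
        self.queue = deque()    # per-fileset replication queue (simplified to one)

    def notify(self, op: str, path: str, client: str) -> None:
        """Small control message: operation type + object, no file data."""
        self.queue.append({"op": op, "path": path, "from": client})

if __name__ == "__main__":
    gw = AfmGateway()
    gw.notify("create", "/gpfs/fs1/cache-fileset/newfile", client="nsd-client-07")
    gw.notify("write",  "/gpfs/fs1/cache-fileset/newfile", client="nsd-client-07")
    print(list(gw.queue))
```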
  • 16. IBM Spectrum Scale components - "Core": Administration traffic
    ● Administration traffic in the core components can be divided into internal traffic (i.e. traffic that is used to perform operations on the various cluster nodes) and external traffic (traffic used by external entities in order to "talk" to components performing operations on their behalf).
    ● The first type of traffic mostly uses remote shell commands (ssh on modern systems). Some parts of internal traffic also use special RPCs that replaced the remote shell commands on older versions of GPFS (in order to minimize the dependency on remote shell). Those operations are mostly around distributing config files to new nodes, NSD creation etc. Some customers use a dedicated "management" network in order to perform those (admin interface). Unless otherwise defined (using the adminMode config option), any node in the cluster might be the source of those operations; in most cases, some other node will perform the actual operation on the requester's behalf (cluster manager/filesystem manager etc.).
    ● The second type (external administration) is mostly around web based GUI access and/or the new REST interface to IBM Spectrum Scale commands.
    ● Both operations are not in the data path and are usually used by interactive commands (thus, extremely long delays will cause complaints by administrators), but their performance requirements are not critical.
    [Slide diagram: NSD servers, NSD clients, GUI, REST mgmt, external management clients, CL MGR, FS MGR, CCR servers]
  • 18. IBM Spectrum Scale components - "Protocols"
    ● The "protocols" component of IBM Spectrum Scale provides the ability to access data stored on various filesystems using standard protocols.
    ● Today, the supported protocols are: NFS, CIFS, Swift (Object), HDFS and iSCSI (boot only).
    ● "Protocol nodes" are nodes that run the various services. From an IBM Spectrum Scale perspective they are usually "client" nodes (function wise, not licensing wise) - meaning that they access the data as any other IBM Spectrum Scale client node.
    ● Thus, on one hand they utilize all the data/control/management traffic mentioned in the "core" section, but on the other hand, they use other protocols' data/control traffic (protocol dependent).
    ● Some protocols require a special control protocol for cluster awareness (CIFS), and most protocols also require authentication control traffic (usually to external authentication server/s).
  • 19. IBM Spectrum Scale components - "Protocols": Data traffic
    ● As mentioned earlier, protocol nodes use the core data traffic on their "backend" in order to talk to the NSD servers.
    ● On the "frontend", protocol nodes use the relevant protocol data path.
    ● Depending on the access pattern and dataset, the traffic might be throughput oriented or latency oriented.
    ● Some protocols might assign different roles to different nodes (e.g. HDFS datanode vs. namenode).
    ● It is advised not to use the same interfaces for both IBM Spectrum Scale (backend) traffic and protocols (frontend) traffic.
    [Slide diagram: NSD servers, CIFS servers, NFS servers, SWIFT servers, HDFS data nodes]
  • 20. IBM Spectrum Scale components - "Protocols": Control traffic
    ● As mentioned earlier, protocol nodes use the same control traffic on their "backend" as any IBM Spectrum Scale client node.
    ● On the protocol specific part, there is both common control traffic (usually authentication) and potentially protocol specific control traffic:
      - Authentication: Most protocols (unless using local authentication) will contact the configured external authentication servers in order to get authentication related information. For the most part those are relatively small, latency sensitive operations.
    [Slide diagram: CIFS servers, NFS servers, SWIFT servers, HDFS name node, authentication servers]
  • 21. IBM Spectrum Scale components - "Protocols": Control traffic
    ● Protocol specific control traffic
      - NFS: IBM Spectrum Scale uses the Ganesha NFS server for providing NFS services.
        Back-end: Ganesha doesn't use any special internode communication; it relies on GPFS as a clustering mechanism.
        Front-end: For the most part, the NFS protocol includes mixed data and control traffic. Control traffic on NFS includes locking (separate in V3 and integrated in V4), the delegation backchannel (V4 - not yet supported) and the rquota protocol. All of the front end control traffic is mostly based on small messages and is somewhat latency sensitive (application dependent).
    [Slide diagram: NFS servers, NFS clients; locking, backchannel and rquota flows]
  • 22. IBM Spectrum Scale components - "Protocols": Control traffic
    ● Protocol specific control traffic
      - CIFS: In order to provide CIFS services, IBM Spectrum Scale uses the SAMBA software together with the CTDB clustering software.
        Back-end: In order to manage the CIFS locking state inside the cluster, CTDB needs to communicate between the nodes. The traffic is based on latency-sensitive small messages.
        Front-end: For the most part, the CIFS protocol mixes both data and control traffic, so it is a bit hard to differentiate between them. Control traffic includes locking and the callback channel (i.e. when the server contacts the client to break oplocks etc.). Both largely depend on the specific workload and dataset.
    [Slide diagram: CIFS servers, CIFS clients, CTDB; locking and backchannel flows]
  • 23. IBM Spectrum Scale components - "Protocols": Data traffic
    ● Protocol specific data traffic
      - SWIFT/S3: In order to provide object based services, IBM Spectrum Scale uses the OpenStack Swift software with some enhancements in order to provide better S3 support.
        Back-end: Currently, the only traffic on the backend network with the object implementation is the same traffic as the regular "core" components.
        Front-end: The object implementation uses several data and control messages over the public network:
        - Data: With object protocol data, a client requests a read/write operation to a protocol node over the front end (CES) network. With traditional Swift mode, that request may be processed by the same protocol node or passed to a different protocol node for processing. With unified file and object access mode (Swift on File), the request is always processed by the protocol node that received the client request. The front end (CES) network is used for passing requests between protocol nodes.
    [Slide diagram: SWIFT servers, object clients; data flow]
  • 24. IBM Spectrum Scale components - "Protocols": Control traffic
    ● Protocol specific control traffic
      - SWIFT/S3 (continued)
        Control: Control traffic in Swift is mostly authentication traffic, and somewhat depends on the chosen configuration:
        ● If the keystone authentication service is being provided by the protocol nodes (the common case), then clients will contact the keystone node in order to validate the user and get a security token. When using the Swift protocol, the server will cache the token using memcached, based on the token TTL, in order to reduce the authentication traffic; when using S3, the server will not cache any tokens.
        ● When clients contact a protocol node with their security token, the node will contact the keystone server in order to validate the token. In some cases, the keystone server might be running on a different node or on a 3rd party node (when using external keystone).
      - Note: Currently, all the authentication traffic takes place using the front-end (CES) IPs and not the back-end network.
    [Slide diagram: SWIFT servers, object clients, keystone server; authentication flow]
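The Swift-versus-S3 token caching difference described above can be sketched as below. The validate() stand-in, the TTL value and the token string are assumptions; the real implementation uses memcached and keystone, which are not modelled here.

```python
import time

# Illustration of the token-validation behaviour described above: for the
# Swift protocol a validated token is cached until its TTL expires (reducing
# traffic to keystone), while for S3 every request is validated again.

class TokenValidator:
    def __init__(self, validate, protocol: str):
        self.validate = validate        # callable that contacts keystone
        self.protocol = protocol        # "swift" or "s3"
        self.cache = {}                 # token -> expiry time

    def is_valid(self, token: str, ttl: float = 300.0) -> bool:
        now = time.monotonic()
        if self.protocol == "swift" and self.cache.get(token, 0) > now:
            return True                 # served from cache, no keystone round trip
        ok = self.validate(token)       # control traffic to the keystone server
        if ok and self.protocol == "swift":
            self.cache[token] = now + ttl
        return ok

if __name__ == "__main__":
    swift = TokenValidator(validate=lambda t: True, protocol="swift")
    print(swift.is_valid("AUTH_tk123"), swift.is_valid("AUTH_tk123"))  # 2nd hit is cached
```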
  • 25. IBM Spectrum Scale components - "Protocols": Control traffic
    ● Protocol specific control traffic
      - HDFS: For Hadoop-like workloads (or applications that use the DFS API in order to access data), IBM Spectrum Scale implements the transparent Hadoop connector. The control data is managed by a single "namenode" at a time, which provides the client with information about which data nodes to get the actual data from.
        Back-end: There is no special back-end traffic for the Hadoop transparent connector.
        Front-end: Today, a single node acts as the namenode at any given time. Clients will first contact this node, using the standard HDFS RPCs, in order to query the location of specific blocks. As with most control protocols, those are small messages, relatively latency sensitive (since the actual data is usually large, the latency might not be that critical).
    [Slide diagram: HDFS name node, HDFS data nodes, HDFS clients]
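The access pattern above (small control RPC to the namenode, bulk reads from data nodes) is sketched below. The block map, file path and node names are illustrative assumptions.

```python
# Sketch of the HDFS-style access pattern described above: a small control
# RPC to the namenode returns block locations, and the bulk data is then read
# from the data nodes. The block map below is an illustrative assumption.

BLOCK_MAP = {   # hypothetical namenode view: file -> list of (block, datanode)
    "/user/app/part-0001": [("blk_1", "datanode-3"), ("blk_2", "datanode-5")],
}

def get_block_locations(path: str):
    """Small, latency-sensitive control RPC answered by the namenode."""
    return BLOCK_MAP.get(path, [])

def read_file(path: str) -> None:
    for block, datanode in get_block_locations(path):
        # The large, throughput-oriented transfer happens against the datanode.
        print(f"reading {block} of {path} from {datanode}")

if __name__ == "__main__":
    read_file("/user/app/part-0001")
```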
  • 27. IBM Spectrum Scale components - "GNR"
    ● GPFS Native RAID (a.k.a. IBM Spectrum Scale RAID) is a software based RAID implementation that is used with several storage products, for example the Elastic Storage Server (ESS) from IBM.
    ● From a "core" IBM Spectrum Scale perspective, GNR nodes are considered NSD servers, where "below" the NSD layer sits the GNR layer itself, which takes care of physical disk management and data protection using declustered RAID technology.
    ● That said, the GNR layer itself presents several new network related flows in order to provide efficient write caching as well as monitoring of the partner node in each GNR building block.
  • 28. IBM Spectrum Scale components - "GNR": Data traffic
    ● As mentioned earlier, GNR nodes also act as NSD servers, thus the main data flow with GNR is reading/writing to/from clients. That part is discussed in the "core" component. That said, the high performance that GNR nodes usually provide requires adequate network bandwidth in order to really utilize the GNR nodes' capabilities.
    ● Another special type of traffic with GNR is NVRAM replication. Essentially, GNR uses the internal NVRAM on the nodes in order to implement a "logTip", an extremely fast small-write cache (technically, it is an append to a circular log) that is also used for internal GNR metadata operations. Since the NVRAM is internal to each server, a special protocol (NSPD) is used in order to replicate those operations to the building block "partner". The replication is based on small messages which are highly latency sensitive (higher latency will impact small write performance). Under some workloads, it might make sense to separate this traffic onto a fast interconnect (e.g. IB) in order to enhance performance and avoid competition with the main data stream.
    [Slide diagram: main data flow between GNR servers and NSD clients; NSPD replication between building-block partners]
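The logTip idea above, an append to a circular log that is replicated to the partner before the small write completes, can be sketched as follows. The slot count, record format and the replicate() stand-in are assumptions; the real NSPD protocol is not modelled.

```python
# Toy model of the logTip: a small write is appended to a fixed-size circular
# log in local "NVRAM" and the same record is sent to the building-block
# partner over an NSPD-like replication channel before being acknowledged.

class LogTip:
    def __init__(self, slots: int, replicate):
        self.buf = [None] * slots       # circular log standing in for NVRAM
        self.head = 0
        self.replicate = replicate      # callable sending the record to the partner

    def append(self, record: bytes) -> None:
        self.buf[self.head] = record
        self.head = (self.head + 1) % len(self.buf)
        self.replicate(record)          # latency of this call gates small-write latency

if __name__ == "__main__":
    sent = []
    log = LogTip(slots=4, replicate=sent.append)
    for i in range(5):                  # the fifth append wraps around the circular log
        log.append(f"small-write-{i}".encode())
    print(len(sent), log.buf)
```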
  • 29. IBM Spectrum Scale components - "GNR": Control traffic
    ● Much of GNR's control traffic is the same as for the rest of the "core" components. That said, due to the special need to manage a large number of disks, there is some special control traffic going on in the background in order to monitor each building block partner.
    ● This traffic consists of relatively small messages, but since they are sent in the background, latency is not that critical for them.
    [Slide diagram: main data flow between GNR servers and NSD clients; NSPD replication between building-block partners]
  • 31. Other resources
    ● List of ports being used by IBM Spectrum Scale components: https://ibm.biz/BdixM4