SlideShare a Scribd company logo
1 of 25
Spectrum Scale Memory
Usage
Tomer Perry
Spectrum Scale Development
<tomp@il.ibm.com>
Outline
●
Relevant Scale Basics
●
Scale Memory Pools
●
Memory Calculations
●
Is that all?
●
Others
Outline
●
Relevant Scale Basics
●
Scale Memory Pools
●
Memory Calculations
●
Is that all?
●
Others
Some relevant Scale Basics
”The Cluster”
• Cluster – A group of operating system
instances, or nodes, on which an
instance of Spectrum Scale is deployed
• Network Shared Disk - Any block
storage (disk, partition, or LUN) given to
Spectrum Scale for its use
• NSD server – A node making an NSD
available to other nodes over the network
• GPFS filesystem – Combination of data
and metadata which is deployed over
NSDs and managed by the cluster
• Client – A node running Scale Software
accessing a filesystem
nsd
nsd
srv
nsd nsd
nsd
srv
nsd
srv
nsd
srv
CCCCC
CCCCC
CCCCC
CCCCC
CCCCC
CCCCC
CCCCC
CCCCC
CCCCC
CCCCC
CCCCC
CCCCC
CCCCC
CCCCC
CCCCC
Some relevant Scale Basics
MultiCluster
• A Cluster can share some or all of its
filesystems with other clusters
• Such cluster can be referred to as
“Storage Cluster”
• A cluster that don’t have any local
filesystems can be referred to as “Client
Cluster”
• A Client cluster can connect to 31
clusters ( outbound)
• A Storage cluster can be mounted by
unlimited number of client clusters
( Inbound) – 16383 really.
• Client cluster nodes “joins” the storage
cluster upon mount
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCC
Some relevant Scale Basics
Node Roles
• While the general concept in Scale is “all nodes were made equal” some has special roles
Config Servers

Holds cluster
configuration files

4.1 onward supports
CCR - quorum

Not in the data path
Cluster Manager

One of the quorum
nodes

Manage leases,
failures and recoveries

Selects filesystem
managers

mm*mgr -c
Filesystem Manager

One of the “manager”
nodes

One per filesystem

Manage filesystem
configurations ( disks)

Space allocation

Quota management
Token Manager/s

Multiple per filesystem

Each manage portion
of tokens for each
filesystem based on
inode number
Some relevant Scale Basics
Node Internal Structure
User Mode Daemon:
I/O Optimization, Communications, Clustering Kernel Module
Pagepool (Buffer Pool)
Pinned Memory
Disk Descriptor
File System Descriptor
File metadata
File Data
Application
Shared Storage
Communication
TCP/IP
RDMA
Sockets
GPFS Commands
/usr/lpp/mmfs/bin
Configuration
CCR
Other Memory
Non-pinned Memory
Other Nodes
Shell
Outline
●
Relevant Scale Basics
●
Scale Memory Pools
●
Memory Calculations
●
Is that all?
●
Others
Scale Memory Pools
• At a high level, Scale is using several
memory pools:
• Pagepool
data
• Shared Segment
“memory metadata”, shared ( U+K)
• External/heap
Used by “external” parts of Scale ( AFM
queue, Ganesha, SAMBA etc.)
External/Heap
SharedSegment
Pagepool
Scale Memory Pools
• At a high level, Scale is using several
memory pools:
• Pagepool
data
• Shared Segment
“memory metadata”, shared ( U+K)
• External/heap
Used by “external” parts of Scale ( AFM
queue, Ganesha, SAMBA etc.)
External/Heap
SharedSegment
Pagepool
Scale Memory Pools
Pagepool
• Pagepool size is statically defined using
the pagepool config option. It is allocated
on daemon startup and can be changed
using mmchconfig with the -i option
( NUMA, fragmentation)
• Pagepool stores data only
• All pagepool content is referenced by the
sharedSegment ( data is worthless without
metadata…)
• Not all the sharedSegment points to the
pagepool
SharedSegment
Pagepool
Scale Memory Pools
SharedSegment
• The shared segment has two major pools
• Pool2 - “Shared Segment”
-
All references to data stored in
the pagepool ( bufferDesc,
Openfiles, IndBlocks,
CliTokens) etc.
-
Statcache ( compactOpenfile)
• Pool3 - “Token Memory”
-
Tokens on token servers
-
Some client tokens ( BR)
Pool2
“UNPINNED”
Pool3
TokenMemory
Scale Memory Pools
SharedSegment
• What’s the difference between
maxFilesToCache and maxStatCache?
– maxFilesToCache:
How many files data and metadata
can we cache in the pagepool.
Implies how many concurrent open
files we can have
– maxStatCache:
If a file needs to be evicted from the
pagepool due to MFTC above, we
can still take its stat structure,
compact it and store it here. So we
won’t need an IOP for those stat
calls
maxStatCache
maxFilesToCache
Outline
●
Relevant Scale Basics
●
Scale Memory Pools
●
Memory Calculations
●
Is that all?
●
Others
Memory Calculations
”So how much memory do I need?”
• Pagepool - ”it depends...”:
How much DATA do you need to cache?
How much RAM can you spend?
• Shared Segment - “it depends...”
How many files do you need to cache?
How many inodes do you need to cache?
What is your access pattern ( BR)
SharedSegment
Pagepool
Memory Calculations
SharedSegment
0 10000 20000 30000 40000 50000 60000
100
200
300
400
500
Memory Use
per cached files
No Of Files
MB
• Pool 2 - “Shared Segment”
– Scope: Almost completely local
( not much dependency on other
nodes)
– Relevant parameters:
●
maxFilesToCache
●
maxStatCache
– Cost:
●
Each MFTC =~ 10k
●
Each MSC =~ 480b
For example, using MFTC of 50k and MSC of 20k and
opening 60k files
Memory Calculations
SharedSegment
• Pool 3 - “Token Memory”
– Scope: “Storage cluster” manager nodes
– Relevant parameters:
●
MFTC, MSC, number of nodes, “hot objects”
– Cost:
●
TokenMemPerFile ~= 464b, ServerBRTreeNode = 56b
●
BasicTokenMem = (MFTC+MSC)*NoOfNodes*TokenMemPerFile
●
ClusterBRTokens = NoOfHotBlocksRandomAccess * ServerBRTreeNode
●
TotalTokenMemory = BasicTokenMem + ClusterBRTokens
●
PerTokenServerMem = TotalTokenMemory / NoOfTokenServers
* The number of the byte range tokens, in the worse case scenario, would be one byte range per block for all the blocks being accessed randomly
Memory Calculations
SharedSegment
• Pool 3 - “Token Memory”
– Scope: “Storage cluster” manager nodes
– Relevant parameters:
●
MFTC, MSC, number of nodes, “hot objects”
– Cost:
●
TokenMemPerFile ~= 464b, ServerBRTreeNode = 56b
●
BasicTokenMem = (MFTC+MSC)*NoOfNodes*TokenMemPerFile
●
ClusterBRTokens = NoOfHotBlocksRandomAccess * ServerBRTreeNode
●
TotalTokenMemory = BasicTokenMem + ClusterBRTokens
●
PerTokenServerMem = TotalTokenMemory / NoOfTokenServers
* The number of the byte range tokens, in the worse case scenario, would be one byte range per block for all the blocks being accessed randomly
Overall number of cached objects in the cluster
( different nodes might cache different numbers) * 0.5K
+ monitor token memory usage
Memory Calculations
SharedSegment
0 10000 20000 30000 40000 50000 60000
200
400
600
800
1000
1200
Token Server Memory Use
Per cached file
SrvTM SrvTM*25
NoOfFiles
MB
• Pool 3 - “Shared Segment”
– As can be seen in the graph, single
client don’t usually large memory
foot print ( blue line)
– Using 25 clients, already brings us
closer to 1G
– There is no difference between the
“cost” of MFTC and MSC
For example, using MFTC of 50k and MSC of 20k and
opening 60k files
Memory Calculations
SharedSegment
• But...where can we find those numbers?
– Mmdiag is your friend
– “The Command-Who-Must-Not-Be
Named” can provide even more
details ( dump malloc)
Outline
●
Relevant Scale Basics
●
Scale Memory Pools
●
Memory Calculations
●
Is that all?
●
Others
Is that all?
Related OS memory usage
0 10000 20000 30000 40000 50000 60000
100
200
300
400
500
Memory Utilization
Shared2 dEntryOSz inodeOvSz
No Of Files
MB
• “Scale don’t use the operating system
cache”...well...sort of
– Directory Cache ( a.k.a dcache) - ~192b
– VFS inode cache - ~1K
• But is it really important? The SharedSeg
takes much more memory
Outline
●
Relevant Scale Basics
●
Scale Memory Pools
●
Memory Calculations
●
Is that all?( Linux memory)
●
Others
Others
Out of scope here
• TCP Memory
• AFM queue memory
• Policy ( and its affect on cached objects)
• Ganesha (MDCache)
• SAMBA
• Sysmon, zimon
• GUI
• TCT, Archive
• fsck, other maintenance commands
Thank
You

More Related Content

What's hot

Introduction to IBM Spectrum Scale and Its Use in Life Science
Introduction to IBM Spectrum Scale and Its Use in Life ScienceIntroduction to IBM Spectrum Scale and Its Use in Life Science
Introduction to IBM Spectrum Scale and Its Use in Life ScienceSandeep Patil
 
1 introduction to windows server 2016
1  introduction to windows server 20161  introduction to windows server 2016
1 introduction to windows server 2016Hameda Hurmat
 
Structure of shared memory space
Structure of shared memory spaceStructure of shared memory space
Structure of shared memory spaceCoder Tech
 
RESTful services on IBM Domino/XWork
RESTful services on IBM Domino/XWorkRESTful services on IBM Domino/XWork
RESTful services on IBM Domino/XWorkJohn Dalsgaard
 
active-directory-domain-services
active-directory-domain-servicesactive-directory-domain-services
active-directory-domain-services202066
 
Chapter01 Introduction To Windows Server 2003
Chapter01     Introduction To  Windows  Server 2003Chapter01     Introduction To  Windows  Server 2003
Chapter01 Introduction To Windows Server 2003Raja Waseem Akhtar
 
IBM Spectrum Scale Authentication for File Access - Deep Dive
IBM Spectrum Scale Authentication for File Access - Deep DiveIBM Spectrum Scale Authentication for File Access - Deep Dive
IBM Spectrum Scale Authentication for File Access - Deep DiveShradha Nayak Thakare
 
Processes and Processors in Distributed Systems
Processes and Processors in Distributed SystemsProcesses and Processors in Distributed Systems
Processes and Processors in Distributed SystemsDr Sandeep Kumar Poonia
 
Lecture 1 (distributed systems)
Lecture 1 (distributed systems)Lecture 1 (distributed systems)
Lecture 1 (distributed systems)Fazli Amin
 
Distributed Operating System,Network OS and Middle-ware.??
Distributed Operating System,Network OS and Middle-ware.??Distributed Operating System,Network OS and Middle-ware.??
Distributed Operating System,Network OS and Middle-ware.??Abdul Aslam
 
Lecture 6
Lecture  6Lecture  6
Lecture 6Mr SMAK
 
Ibm spectrum scale fundamentals workshop for americas part 8 spectrumscale ba...
Ibm spectrum scale fundamentals workshop for americas part 8 spectrumscale ba...Ibm spectrum scale fundamentals workshop for americas part 8 spectrumscale ba...
Ibm spectrum scale fundamentals workshop for americas part 8 spectrumscale ba...xKinAnx
 
Backroll: Production Grade KVM Backup Solution Integrated in CloudStack
Backroll: Production Grade KVM Backup Solution Integrated in CloudStackBackroll: Production Grade KVM Backup Solution Integrated in CloudStack
Backroll: Production Grade KVM Backup Solution Integrated in CloudStackShapeBlue
 
Backup & restore in windows
Backup & restore in windowsBackup & restore in windows
Backup & restore in windowsJab Vtl
 
VMware Log Insight
VMware Log Insight VMware Log Insight
VMware Log Insight Iwan Rahabok
 
11 distributed file_systems
11 distributed file_systems11 distributed file_systems
11 distributed file_systemslongly
 

What's hot (20)

Introduction to IBM Spectrum Scale and Its Use in Life Science
Introduction to IBM Spectrum Scale and Its Use in Life ScienceIntroduction to IBM Spectrum Scale and Its Use in Life Science
Introduction to IBM Spectrum Scale and Its Use in Life Science
 
1 introduction to windows server 2016
1  introduction to windows server 20161  introduction to windows server 2016
1 introduction to windows server 2016
 
Structure of shared memory space
Structure of shared memory spaceStructure of shared memory space
Structure of shared memory space
 
RESTful services on IBM Domino/XWork
RESTful services on IBM Domino/XWorkRESTful services on IBM Domino/XWork
RESTful services on IBM Domino/XWork
 
active-directory-domain-services
active-directory-domain-servicesactive-directory-domain-services
active-directory-domain-services
 
Chapter01 Introduction To Windows Server 2003
Chapter01     Introduction To  Windows  Server 2003Chapter01     Introduction To  Windows  Server 2003
Chapter01 Introduction To Windows Server 2003
 
IBM Spectrum Scale Authentication for File Access - Deep Dive
IBM Spectrum Scale Authentication for File Access - Deep DiveIBM Spectrum Scale Authentication for File Access - Deep Dive
IBM Spectrum Scale Authentication for File Access - Deep Dive
 
Processes and Processors in Distributed Systems
Processes and Processors in Distributed SystemsProcesses and Processors in Distributed Systems
Processes and Processors in Distributed Systems
 
Lecture 1 (distributed systems)
Lecture 1 (distributed systems)Lecture 1 (distributed systems)
Lecture 1 (distributed systems)
 
Lecture 3 threads
Lecture 3   threadsLecture 3   threads
Lecture 3 threads
 
Distributed Operating System,Network OS and Middle-ware.??
Distributed Operating System,Network OS and Middle-ware.??Distributed Operating System,Network OS and Middle-ware.??
Distributed Operating System,Network OS and Middle-ware.??
 
Unit 1
Unit 1Unit 1
Unit 1
 
Lecture 6
Lecture  6Lecture  6
Lecture 6
 
Ibm spectrum scale fundamentals workshop for americas part 8 spectrumscale ba...
Ibm spectrum scale fundamentals workshop for americas part 8 spectrumscale ba...Ibm spectrum scale fundamentals workshop for americas part 8 spectrumscale ba...
Ibm spectrum scale fundamentals workshop for americas part 8 spectrumscale ba...
 
Backroll: Production Grade KVM Backup Solution Integrated in CloudStack
Backroll: Production Grade KVM Backup Solution Integrated in CloudStackBackroll: Production Grade KVM Backup Solution Integrated in CloudStack
Backroll: Production Grade KVM Backup Solution Integrated in CloudStack
 
Backup & restore in windows
Backup & restore in windowsBackup & restore in windows
Backup & restore in windows
 
NUMA overview
NUMA overviewNUMA overview
NUMA overview
 
Distributed file systems chapter 9
Distributed file systems chapter 9Distributed file systems chapter 9
Distributed file systems chapter 9
 
VMware Log Insight
VMware Log Insight VMware Log Insight
VMware Log Insight
 
11 distributed file_systems
11 distributed file_systems11 distributed file_systems
11 distributed file_systems
 

Similar to Spectrum Scale Memory Usage

Optimizing Performance and Computing Resource Efficiency of In-Memory Big Dat...
Optimizing Performance and Computing Resource Efficiency of In-Memory Big Dat...Optimizing Performance and Computing Resource Efficiency of In-Memory Big Dat...
Optimizing Performance and Computing Resource Efficiency of In-Memory Big Dat...Databricks
 
Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)MongoDB
 
Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters
Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters
Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters Ceph Community
 
Deployment Strategies
Deployment StrategiesDeployment Strategies
Deployment StrategiesMongoDB
 
SUSE Storage: Sizing and Performance (Ceph)
SUSE Storage: Sizing and Performance (Ceph)SUSE Storage: Sizing and Performance (Ceph)
SUSE Storage: Sizing and Performance (Ceph)Lars Marowsky-Brée
 
CPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCoburn Watson
 
Driver development – memory management
Driver development – memory managementDriver development – memory management
Driver development – memory managementVandana Salve
 
Deployment Strategy
Deployment StrategyDeployment Strategy
Deployment StrategyMongoDB
 
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013Amazon Web Services
 
Vmwareperformancetroubleshooting 100224104321-phpapp02 (1)
Vmwareperformancetroubleshooting 100224104321-phpapp02 (1)Vmwareperformancetroubleshooting 100224104321-phpapp02 (1)
Vmwareperformancetroubleshooting 100224104321-phpapp02 (1)Suresh Kumar
 
Vmwareperformancetroubleshooting 100224104321-phpapp02
Vmwareperformancetroubleshooting 100224104321-phpapp02Vmwareperformancetroubleshooting 100224104321-phpapp02
Vmwareperformancetroubleshooting 100224104321-phpapp02Suresh Kumar
 
Modeling, estimating, and predicting Ceph (Linux Foundation - Vault 2015)
Modeling, estimating, and predicting Ceph (Linux Foundation - Vault 2015)Modeling, estimating, and predicting Ceph (Linux Foundation - Vault 2015)
Modeling, estimating, and predicting Ceph (Linux Foundation - Vault 2015)Lars Marowsky-Brée
 
Mac Memory Analysis with Volatility
Mac Memory Analysis with VolatilityMac Memory Analysis with Volatility
Mac Memory Analysis with VolatilityAndrew Case
 
Windows memory manager internals
Windows memory manager internalsWindows memory manager internals
Windows memory manager internalsSisimon Soman
 
Quick-and-Easy Deployment of a Ceph Storage Cluster
Quick-and-Easy Deployment of a Ceph Storage ClusterQuick-and-Easy Deployment of a Ceph Storage Cluster
Quick-and-Easy Deployment of a Ceph Storage ClusterPatrick Quairoli
 
Deterministic Memory Abstraction and Supporting Multicore System Architecture
Deterministic Memory Abstraction and Supporting Multicore System ArchitectureDeterministic Memory Abstraction and Supporting Multicore System Architecture
Deterministic Memory Abstraction and Supporting Multicore System ArchitectureHeechul Yun
 
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...Chester Chen
 
Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...
Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...
Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...Flink Forward
 

Similar to Spectrum Scale Memory Usage (20)

Optimizing Performance and Computing Resource Efficiency of In-Memory Big Dat...
Optimizing Performance and Computing Resource Efficiency of In-Memory Big Dat...Optimizing Performance and Computing Resource Efficiency of In-Memory Big Dat...
Optimizing Performance and Computing Resource Efficiency of In-Memory Big Dat...
 
Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)
 
Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters
Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters
Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters
 
Deployment Strategies
Deployment StrategiesDeployment Strategies
Deployment Strategies
 
SUSE Storage: Sizing and Performance (Ceph)
SUSE Storage: Sizing and Performance (Ceph)SUSE Storage: Sizing and Performance (Ceph)
SUSE Storage: Sizing and Performance (Ceph)
 
CPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performance
 
Driver development – memory management
Driver development – memory managementDriver development – memory management
Driver development – memory management
 
Deployment Strategy
Deployment StrategyDeployment Strategy
Deployment Strategy
 
Memory (Computer Organization)
Memory (Computer Organization)Memory (Computer Organization)
Memory (Computer Organization)
 
Kinetic basho public
Kinetic basho publicKinetic basho public
Kinetic basho public
 
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
 
Vmwareperformancetroubleshooting 100224104321-phpapp02 (1)
Vmwareperformancetroubleshooting 100224104321-phpapp02 (1)Vmwareperformancetroubleshooting 100224104321-phpapp02 (1)
Vmwareperformancetroubleshooting 100224104321-phpapp02 (1)
 
Vmwareperformancetroubleshooting 100224104321-phpapp02
Vmwareperformancetroubleshooting 100224104321-phpapp02Vmwareperformancetroubleshooting 100224104321-phpapp02
Vmwareperformancetroubleshooting 100224104321-phpapp02
 
Modeling, estimating, and predicting Ceph (Linux Foundation - Vault 2015)
Modeling, estimating, and predicting Ceph (Linux Foundation - Vault 2015)Modeling, estimating, and predicting Ceph (Linux Foundation - Vault 2015)
Modeling, estimating, and predicting Ceph (Linux Foundation - Vault 2015)
 
Mac Memory Analysis with Volatility
Mac Memory Analysis with VolatilityMac Memory Analysis with Volatility
Mac Memory Analysis with Volatility
 
Windows memory manager internals
Windows memory manager internalsWindows memory manager internals
Windows memory manager internals
 
Quick-and-Easy Deployment of a Ceph Storage Cluster
Quick-and-Easy Deployment of a Ceph Storage ClusterQuick-and-Easy Deployment of a Ceph Storage Cluster
Quick-and-Easy Deployment of a Ceph Storage Cluster
 
Deterministic Memory Abstraction and Supporting Multicore System Architecture
Deterministic Memory Abstraction and Supporting Multicore System ArchitectureDeterministic Memory Abstraction and Supporting Multicore System Architecture
Deterministic Memory Abstraction and Supporting Multicore System Architecture
 
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
 
Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...
Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...
Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...
 

Recently uploaded

CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 

Recently uploaded (20)

CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 

Spectrum Scale Memory Usage

  • 1. Spectrum Scale Memory Usage Tomer Perry Spectrum Scale Development <tomp@il.ibm.com>
  • 2. Outline ● Relevant Scale Basics ● Scale Memory Pools ● Memory Calculations ● Is that all? ● Others
  • 3. Outline ● Relevant Scale Basics ● Scale Memory Pools ● Memory Calculations ● Is that all? ● Others
  • 4. Some relevant Scale Basics ”The Cluster” • Cluster – A group of operating system instances, or nodes, on which an instance of Spectrum Scale is deployed • Network Shared Disk - Any block storage (disk, partition, or LUN) given to Spectrum Scale for its use • NSD server – A node making an NSD available to other nodes over the network • GPFS filesystem – Combination of data and metadata which is deployed over NSDs and managed by the cluster • Client – A node running Scale Software accessing a filesystem nsd nsd srv nsd nsd nsd srv nsd srv nsd srv CCCCC CCCCC CCCCC CCCCC CCCCC CCCCC CCCCC CCCCC CCCCC CCCCC CCCCC CCCCC CCCCC CCCCC CCCCC
  • 5. Some relevant Scale Basics MultiCluster • A Cluster can share some or all of its filesystems with other clusters • Such cluster can be referred to as “Storage Cluster” • A cluster that don’t have any local filesystems can be referred to as “Client Cluster” • A Client cluster can connect to 31 clusters ( outbound) • A Storage cluster can be mounted by unlimited number of client clusters ( Inbound) – 16383 really. • Client cluster nodes “joins” the storage cluster upon mount CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCC
  • 6. Some relevant Scale Basics Node Roles • While the general concept in Scale is “all nodes were made equal” some has special roles Config Servers  Holds cluster configuration files  4.1 onward supports CCR - quorum  Not in the data path Cluster Manager  One of the quorum nodes  Manage leases, failures and recoveries  Selects filesystem managers  mm*mgr -c Filesystem Manager  One of the “manager” nodes  One per filesystem  Manage filesystem configurations ( disks)  Space allocation  Quota management Token Manager/s  Multiple per filesystem  Each manage portion of tokens for each filesystem based on inode number
  • 7. Some relevant Scale Basics Node Internal Structure User Mode Daemon: I/O Optimization, Communications, Clustering Kernel Module Pagepool (Buffer Pool) Pinned Memory Disk Descriptor File System Descriptor File metadata File Data Application Shared Storage Communication TCP/IP RDMA Sockets GPFS Commands /usr/lpp/mmfs/bin Configuration CCR Other Memory Non-pinned Memory Other Nodes Shell
  • 8. Outline ● Relevant Scale Basics ● Scale Memory Pools ● Memory Calculations ● Is that all? ● Others
  • 9. Scale Memory Pools • At a high level, Scale is using several memory pools: • Pagepool data • Shared Segment “memory metadata”, shared ( U+K) • External/heap Used by “external” parts of Scale ( AFM queue, Ganesha, SAMBA etc.) External/Heap SharedSegment Pagepool
  • 10. Scale Memory Pools • At a high level, Scale is using several memory pools: • Pagepool data • Shared Segment “memory metadata”, shared ( U+K) • External/heap Used by “external” parts of Scale ( AFM queue, Ganesha, SAMBA etc.) External/Heap SharedSegment Pagepool
  • 11. Scale Memory Pools Pagepool • Pagepool size is statically defined using the pagepool config option. It is allocated on daemon startup and can be changed using mmchconfig with the -i option ( NUMA, fragmentation) • Pagepool stores data only • All pagepool content is referenced by the sharedSegment ( data is worthless without metadata…) • Not all the sharedSegment points to the pagepool SharedSegment Pagepool
  • 12. Scale Memory Pools SharedSegment • The shared segment has two major pools • Pool2 - “Shared Segment” - All references to data stored in the pagepool ( bufferDesc, Openfiles, IndBlocks, CliTokens) etc. - Statcache ( compactOpenfile) • Pool3 - “Token Memory” - Tokens on token servers - Some client tokens ( BR) Pool2 “UNPINNED” Pool3 TokenMemory
  • 13. Scale Memory Pools SharedSegment • What’s the difference between maxFilesToCache and maxStatCache? – maxFilesToCache: How many files data and metadata can we cache in the pagepool. Implies how many concurrent open files we can have – maxStatCache: If a file needs to be evicted from the pagepool due to MFTC above, we can still take its stat structure, compact it and store it here. So we won’t need an IOP for those stat calls maxStatCache maxFilesToCache
  • 14. Outline ● Relevant Scale Basics ● Scale Memory Pools ● Memory Calculations ● Is that all? ● Others
  • 15. Memory Calculations ”So how much memory do I need?” • Pagepool - ”it depends...”: How much DATA do you need to cache? How much RAM can you spend? • Shared Segment - “it depends...” How many files do you need to cache? How many inodes do you need to cache? What is your access pattern ( BR) SharedSegment Pagepool
  • 16. Memory Calculations SharedSegment 0 10000 20000 30000 40000 50000 60000 100 200 300 400 500 Memory Use per cached files No Of Files MB • Pool 2 - “Shared Segment” – Scope: Almost completely local ( not much dependency on other nodes) – Relevant parameters: ● maxFilesToCache ● maxStatCache – Cost: ● Each MFTC =~ 10k ● Each MSC =~ 480b For example, using MFTC of 50k and MSC of 20k and opening 60k files
  • 17. Memory Calculations SharedSegment • Pool 3 - “Token Memory” – Scope: “Storage cluster” manager nodes – Relevant parameters: ● MFTC, MSC, number of nodes, “hot objects” – Cost: ● TokenMemPerFile ~= 464b, ServerBRTreeNode = 56b ● BasicTokenMem = (MFTC+MSC)*NoOfNodes*TokenMemPerFile ● ClusterBRTokens = NoOfHotBlocksRandomAccess * ServerBRTreeNode ● TotalTokenMemory = BasicTokenMem + ClusterBRTokens ● PerTokenServerMem = TotalTokenMemory / NoOfTokenServers * The number of the byte range tokens, in the worse case scenario, would be one byte range per block for all the blocks being accessed randomly
  • 18. Memory Calculations SharedSegment • Pool 3 - “Token Memory” – Scope: “Storage cluster” manager nodes – Relevant parameters: ● MFTC, MSC, number of nodes, “hot objects” – Cost: ● TokenMemPerFile ~= 464b, ServerBRTreeNode = 56b ● BasicTokenMem = (MFTC+MSC)*NoOfNodes*TokenMemPerFile ● ClusterBRTokens = NoOfHotBlocksRandomAccess * ServerBRTreeNode ● TotalTokenMemory = BasicTokenMem + ClusterBRTokens ● PerTokenServerMem = TotalTokenMemory / NoOfTokenServers * The number of the byte range tokens, in the worse case scenario, would be one byte range per block for all the blocks being accessed randomly Overall number of cached objects in the cluster ( different nodes might cache different numbers) * 0.5K + monitor token memory usage
  • 19. Memory Calculations SharedSegment 0 10000 20000 30000 40000 50000 60000 200 400 600 800 1000 1200 Token Server Memory Use Per cached file SrvTM SrvTM*25 NoOfFiles MB • Pool 3 - “Shared Segment” – As can be seen in the graph, single client don’t usually large memory foot print ( blue line) – Using 25 clients, already brings us closer to 1G – There is no difference between the “cost” of MFTC and MSC For example, using MFTC of 50k and MSC of 20k and opening 60k files
  • 20. Memory Calculations SharedSegment • But...where can we find those numbers? – Mmdiag is your friend – “The Command-Who-Must-Not-Be Named” can provide even more details ( dump malloc)
  • 21. Outline ● Relevant Scale Basics ● Scale Memory Pools ● Memory Calculations ● Is that all? ● Others
  • 22. Is that all? Related OS memory usage 0 10000 20000 30000 40000 50000 60000 100 200 300 400 500 Memory Utilization Shared2 dEntryOSz inodeOvSz No Of Files MB • “Scale don’t use the operating system cache”...well...sort of – Directory Cache ( a.k.a dcache) - ~192b – VFS inode cache - ~1K • But is it really important? The SharedSeg takes much more memory
  • 23. Outline ● Relevant Scale Basics ● Scale Memory Pools ● Memory Calculations ● Is that all?( Linux memory) ● Others
  • 24. Others Out of scope here • TCP Memory • AFM queue memory • Policy ( and its affect on cached objects) • Ganesha (MDCache) • SAMBA • Sysmon, zimon • GUI • TCT, Archive • fsck, other maintenance commands