1
About the Presenters
Jan Bigalke
SAS Architect at Allianz Managed Operations & Services SE
Greg Nelson
CEO and Founder of ThotWave Technologies
2
Beyond Best Practice: Grid
Computing in the Modern World
11562-2016
3
Five Reasons you should stay…
1. 90% of the time, an out-of-the-box grid install won't handle your use cases
2. You will gain a better appreciation of what happens when applications such as SAS Enterprise Guide (EG) run with SAS Grid Manager
3. You'll better understand how queue options affect your users under differing workloads
4. You will learn how to estimate how many jobs your grid environment can theoretically run
5. You will get a tutorial on how to go beyond the installation and configure your system for high availability
6. Plus more…
4
Agenda
§ Introduction to Modern SAS Architectures and SAS Grid
§ Understanding how SAS workloads can impact different resources
§ How SAS Grid Processes Work
§ Best practices for Post-Installation Configuration
§ Calculating Capacity
§ Implementing High Availability
§ Maintaining your SAS Software in a Grid Environment
5
Trends / Motivation
§ Workload management
§ Commodity x86 hardware
§ Scalability
§ Start small and grow with customer demand
§ Efficiency (reduce cost)
6
Types of Workload in an Enterprise SAS Environment
7
Figure: the default architecture (SAS WebServer, SAS WebApp, SAS Meta, and a single SAS Compute server) compared with a grid architecture (the same tiers with several SAS Compute servers on a shared file system)
Changes with GRID:
§ A shared file system for the compute servers is necessary
§ Workload is distributed between servers
§ Scalability
8
A SAS Grid computing environment is one in which SAS computing tasks are distributed among multiple computers on a network, all under the control of SAS Grid Manager.
Figure: Architecture, SAS Grid @work
9
Design considerations
§ Memory, CPU and I/O
§ Utilization, latency, throughput
§ Different types of workload have different needs
10
SAS Versions and grid capabilities
SAS version: new capabilities (key points)
§ 9.4 M2: Grid Manager plug-in for SAS Environment Manager
§ 9.4 M1: grid-launched stored process servers and pooled workspace servers
§ 9.4 M0: grid options; grid-launched workspace servers
§ 9.3: load balancing for stored process servers, OLAP servers, and pooled workspace servers
§ 9.2: SAS Code Analyzer; grid-launched batch SAS jobs; load balancing for SAS Workspace Servers
11
SAS Grid request flow
Figure: SAS Enterprise Guide, the SAS Metadata Server, the SAS Object Spawner, LSF (gridrun), and a Workspace Server on a SAS Grid Node, with the request steps numbered 1 to 4
Request flow:
1. Metadata Server
2. Object Spawner
3. Get grid options
4. Spawn task on Grid Node
12
Metadata and GRID
Object Spawner Console Log:
2015-10-07T21:09:23,801 INFO (gridrun.c:590) - commHandler: command received is >[INIT] [PROVNAME]:"Platform" [MODNAME]:"" [SRVHOST]:"sas94-app1-syst.testdomain" [SRVPORT]:"0" [USERNAME]:"" [PASSWORD]:"" [TIMEOUT]:"0" [OPTIONS]:<project=SASApp><.
2015-10-07T21:09:23,836 INFO (gridrun.c:609) - commHandler: command response is >[DONE]<.
2015-10-07T21:09:23,837 INFO (gridrun.c:590) - commHandler: command received is >[STARTJOB] [JOBNAME]:"SAS Enterprise Guide_SASApp - Workspace Server_4101A009-6340-2B46-8E08-8DB2933E8182" [RESOURCES]:"" [COMMAND]:</var/opt/data/sas/sas94/configAPP/Lev1/SASApp/WorkspaceServer/WorkspaceServer.sh> [ARGUMENTS]:<-noterminal -noxcmd -netencryptalgorithm AES -metaserver sas94-meta-syst.testdomain -metaport 8561 -metarepository Foundation -locale en_US -objectserver -objectserverparms "delayconn sph=hosta.testdomain protocol=bridge spawned spp=42449 cid=0 pb classfactory=440196D4-90F0-11D0-9F41-00A024BB830C server=OMSOBJ:SERVERCOMPONENT/A5ZI7NU4.AY0000WN cel=everything lb recon grid "keepalive=30"" -METAUSER '"testuser@!*(generatedpassworddomain)*!"' -METAPASS 49944139d506b727d1555D7b1d8E6162 > [OPTIONS]:<> [ARMCORR]:"" [FLAGS]:"0" [INFILES]:"" [OUTFILES]:"" [HOSTS]:"sas94-app1-syst.testdomain,sas94-app2-syst.testdomain" [MPIPROCS]:"0" [PROCSHOST]:"0"<.
Job <33707> is submitted to queue <qiSASApp>.
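To make the log excerpt easier to read, the sketch below pulls the bracketed fields out of a gridrun record and the job ID and queue out of the LSF submission line. It is a minimal Python illustration: the regular expressions are inferred from this one excerpt and are an assumption, not a documented SAS log format, and the helper names `parse_gridrun_fields` and `parse_submission` are hypothetical.

```python
import re

# [NAME]:"value" or [NAME]:<value>, as seen in the gridrun records above.
# This layout is inferred from the sample log only; it is not a documented format.
FIELD_RE = re.compile(r'\[(\w+)\]:(?:"([^"]*)"|<([^>]*)>)')

# LSF submission confirmation, e.g. "Job <33707> is submitted to queue <qiSASApp>."
SUBMIT_RE = re.compile(r'Job <(\d+)> is submitted to queue <([^>]+)>')


def parse_gridrun_fields(record: str) -> dict:
    """Map field name -> value for one gridrun log record."""
    fields = {}
    for name, quoted, angled in FIELD_RE.findall(record):
        # Only one alternative matches; the other group comes back as "".
        fields[name] = quoted or angled
    return fields


def parse_submission(line: str):
    """Return (job_id, queue) from an LSF submission line, or None."""
    match = SUBMIT_RE.search(line)
    return (int(match.group(1)), match.group(2)) if match else None


if __name__ == "__main__":
    # Shortened STARTJOB record; the ARGUMENTS value is trimmed for readability.
    record = (
        '>[STARTJOB] [JOBNAME]:"SAS Enterprise Guide_SASApp - Workspace Server_4101A009" '
        '[RESOURCES]:"" '
        '[COMMAND]:</var/opt/data/sas/sas94/configAPP/Lev1/SASApp/WorkspaceServer/WorkspaceServer.sh> '
        '[ARGUMENTS]:<-noterminal -noxcmd -metaport 8561> '
        '[HOSTS]:"sas94-app1-syst.testdomain,sas94-app2-syst.testdomain"<.'
    )
    fields = parse_gridrun_fields(record)
    print("command:", fields["COMMAND"])
    print("candidate hosts:", fields["HOSTS"].split(","))
    print("job/queue:", parse_submission("Job <33707> is submitted to queue <qiSASApp>."))
```

The HOSTS field shows which grid nodes were candidates for the workspace server, and the submission line ties the request to an LSF job ID that can then be looked up with LSF's `bjobs` command.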
13
Job Flow Processing Inside LSF
SEQUENCE:
1. Submit the job
2. Schedule the job
3. Dispatch the job
4. Run the job
5. Return output
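Figure: LSF job flow. gridrun submits the job to a queue on the master host (state PEND); the master schedules it using resource information and dispatches it to a compute host, which runs it (state RUN) and returns the job output (state DONE). Parameters shown: mbatchd: JOB_SCHEDULING_INTERVAL; sbatchd: JOB_ACCEPT_INTERVAL, SBD_SLEEP_TIME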
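The five-step sequence can be modeled as a tiny state machine over the three job states shown in the figure (PEND, RUN, DONE). This is a simplified Python sketch for illustration only: the `Job` class and its methods are hypothetical, and real LSF tracks additional states such as PSUSP, USUSP, SSUSP, and EXIT that are ignored here.

```python
from enum import Enum


class JobState(Enum):
    PEND = "PEND"  # submitted, waiting in the queue on the master host
    RUN = "RUN"    # dispatched to a compute host and executing
    DONE = "DONE"  # finished; output returned


# Allowed transitions in this simplified model of the five-step sequence.
TRANSITIONS = {
    JobState.PEND: {JobState.RUN},   # steps 2-3: schedule and dispatch
    JobState.RUN: {JobState.DONE},   # steps 4-5: run and return output
    JobState.DONE: set(),
}


class Job:
    def __init__(self, job_id: int):
        self.job_id = job_id
        self.state = JobState.PEND  # step 1: submitting puts the job in PEND
        self.host = None

    def dispatch(self, host: str) -> None:
        """Move PEND -> RUN and record the compute host."""
        self._move(JobState.RUN)
        self.host = host

    def finish(self) -> None:
        """Move RUN -> DONE."""
        self._move(JobState.DONE)

    def _move(self, new_state: JobState) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state


if __name__ == "__main__":
    job = Job(33707)
    job.dispatch("sas94-app2-syst.testdomain")
    job.finish()
    print(job.job_id, job.host, job.state.name)
```

Encoding the transitions as data makes an out-of-order event (for example, finishing a job that was never dispatched) fail loudly instead of silently corrupting the state.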
14
Best Practices
Configuration in Metadata
Beyond the basics with Queues
Multi-tenant considerations
SAS Grid and Hadoop
Theoretical number of jobs
High availability
Software Updates
15
SAS configuration
§ Stored process (STP), workspace, and batch servers
§ STP: load balancing only (check whether this helps)
§ Workspace servers: grid-launched
§ Batch: grid integration into the enterprise scheduler
16
Grid Configuration in Metadata
§ Workspace Server
17
Grid Configuration in Metadata
§ Stored Process Server
18
Grid Configuration in Metadata
§ Pooled Workspace Server
19
GRID and workload considerations
§ GRID load balancing (based on utilization)
§ Web requests need low latency
§ Use the grid for longer-running requests (batch, workspace server)
§ Distribute online requests via the Object Spawner (lower latency)
20
Manage Workloads with Queues
Queues can manage workloads with different requirements
Protect access with groups
Parameters per queue:
• Priority
• Limits
• Stop and start conditions
Figure: SAS clients, the scheduler, and SAS programs submit work to the Interactive, Batch, and default queues
21
Why Queues Matter
Parameter | Example (interactive queue) | Example (batch queue) | Definition
PRIORITY | PRIORITY=50 | PRIORITY=20 | Relative priority compared with other queues
NICE | NICE=20 | NICE=10 | Execution priority change, based on Linux "nice" values
CPULIMIT | CPULIMIT=5 | CPULIMIT=15 | CPU time limit applied to jobs (in minutes)
UJOB_LIMIT | UJOB_LIMIT=5 | UJOB_LIMIT=2 | Maximum job slots per user in the queue
PJOB_LIMIT | PJOB_LIMIT=10 | PJOB_LIMIT=5 | Maximum job slots per processor in the queue
QJOB_LIMIT | QJOB_LIMIT=120 | QJOB_LIMIT=60 | Maximum total job slots the queue can use
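One way to keep queue definitions reviewable is to generate them from the example values in the table. The sketch below renders `Begin Queue`/`End Queue` stanzas in the style of LSF's `lsb.queues` file; the queue names "interactive" and "batch" are hypothetical, and the output should be checked against the `lsb.queues` reference for your LSF version before use.

```python
# Example values from the "Why Queues Matter" table above.
# The queue names "interactive" and "batch" are hypothetical.
INTERACTIVE = {"PRIORITY": 50, "NICE": 20, "CPULIMIT": 5,
               "UJOB_LIMIT": 5, "PJOB_LIMIT": 10, "QJOB_LIMIT": 120}
BATCH = {"PRIORITY": 20, "NICE": 10, "CPULIMIT": 15,
         "UJOB_LIMIT": 2, "PJOB_LIMIT": 5, "QJOB_LIMIT": 60}


def render_queue(name: str, params: dict) -> str:
    """Render one Begin Queue ... End Queue stanza in lsb.queues style."""
    lines = ["Begin Queue", f"{'QUEUE_NAME':<12} = {name}"]
    for key, value in params.items():
        # Left-align keys so the '=' signs line up.
        lines.append(f"{key:<12} = {value}")
    lines.append("End Queue")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_queue("interactive", INTERACTIVE))
    print()
    print(render_queue("batch", BATCH))
```

After `lsb.queues` is edited, LSF normally needs `badmin reconfig` before the new queue settings take effect.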
22
Why Queues Matter
Parameter | Example (interactive queue) | Example (batch queue) | Definition
HJOB_LIMIT | HJOB_LIMIT=4 | HJOB_LIMIT=4 | Maximum number of job slots that this queue can use on any host
CHUNK_JOB_SIZE | CHUNK_JOB_SIZE=4 | CHUNK_JOB_SIZE=4 | Maximum number of jobs allowed to be dispatched together in a chunk
r1m | r1m=0.3/1.5 | r1m=0.3/1.5 | 1-minute CPU run queue length (alias: cpu)
ut | ut=0.2 | ut=0.2 | 1-minute CPU utilization (0.0 to 1.0)
r15s | r15s=0.3/1.5 | r15s=0.3/1.5 | 15-second CPU run queue length
it | it=10/1 | it=10/1 | Idle time in minutes (alias: idle)
Load thresholds take the form loadSched/loadStop: beyond loadSched no new jobs are dispatched to the host, and beyond loadStop running jobs are suspended (for idle time, lower values count as busier).
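The loadSched/loadStop pairs in the table can be read as two cut-offs per load index. The sketch below evaluates them for one host under a simplified model: a host stops receiving new jobs once any index crosses loadSched, and running jobs are flagged for suspension once an index crosses loadStop, with the comparison reversed for idle time (`it`). The helper names are hypothetical, and LSF's real scheduling logic considers more than these thresholds.

```python
# Thresholds from the table: "a/b" means loadSched=a, loadStop=b;
# a single value (ut=0.2) sets only loadSched.
THRESHOLDS = {"r1m": "0.3/1.5", "ut": "0.2", "r15s": "0.3/1.5", "it": "10/1"}

# Idle time is a decreasing index: a LOWER value means a busier host.
DECREASING = {"it"}


def parse_threshold(spec: str):
    """Split "loadSched[/loadStop]" into floats (loadStop may be None)."""
    parts = spec.split("/")
    sched = float(parts[0])
    stop = float(parts[1]) if len(parts) > 1 else None
    return sched, stop


def is_busier(index: str, value: float, limit: float) -> bool:
    """True if value is on the busy side of limit for this index."""
    return value < limit if index in DECREASING else value > limit


def evaluate(load: dict):
    """Return (can_dispatch, should_suspend) for one host's load readings."""
    can_dispatch, should_suspend = True, False
    for index, spec in THRESHOLDS.items():
        if index not in load:
            continue  # no reading for this index: ignored in this sketch
        sched, stop = parse_threshold(spec)
        if is_busier(index, load[index], sched):
            can_dispatch = False
        if stop is not None and is_busier(index, load[index], stop):
            should_suspend = True
    return can_dispatch, should_suspend


if __name__ == "__main__":
    print(evaluate({"r1m": 0.1, "ut": 0.1, "r15s": 0.1}))
    print(evaluate({"r1m": 2.0, "ut": 0.9, "r15s": 0.4}))
```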
23
Multi-tenancy considerations
Queues per tenant
§ Allow different settings per tenant
Queues and workload
§ stop_cond = select[(cpuusg > 95.0) && (cguxx > 90.0)]
§ resume_cond = select[(cguxx < 95.0) || (cpuusg < 95.0)]
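The stop and resume conditions above can be exercised offline with sample readings. The sketch mirrors the two `select[]` expressions in Python; `cpuusg` and `cguxx` are the slide's resource names, whose meaning is site-specific and assumed here to be percentages, and the functions are hypothetical helpers, not LSF APIs.

```python
# cpuusg and cguxx are the resource names used on the slide; their meaning
# is site-specific. Values are assumed to be percentages in this sketch.

def stop_cond(m: dict) -> bool:
    """select[(cpuusg > 95.0) && (cguxx > 90.0)]"""
    return m["cpuusg"] > 95.0 and m["cguxx"] > 90.0


def resume_cond(m: dict) -> bool:
    """select[(cguxx < 95.0) || (cpuusg < 95.0)]"""
    return m["cguxx"] < 95.0 or m["cpuusg"] < 95.0


def next_suspended(suspended: bool, m: dict) -> bool:
    """Advance a job's suspended flag for one sample of the metrics."""
    if not suspended and stop_cond(m):
        return True
    if suspended and resume_cond(m):
        return False
    return suspended


if __name__ == "__main__":
    suspended = False
    for sample in [{"cpuusg": 99.0, "cguxx": 99.0}, {"cpuusg": 50.0, "cguxx": 99.0}]:
        suspended = next_suspended(suspended, sample)
        print(sample, "suspended" if suspended else "running")
```

Feeding in a reading such as cpuusg=96 and cguxx=92 shows that both conditions hold at once with these sample thresholds, so it is worth checking that the stop and resume thresholds leave a gap between them.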
24
GRID and interaction with other systems (databases / Hadoop)
§ Database access interfaces (the same database client configuration on every node)
§ Hadoop (Java JAR files, Kerberos authentication, grid)
Figure: grid nodes 1 to n, each connecting to the RDBMS
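Because every grid node needs the same database client configuration, a quick consistency check helps. The sketch below hashes the contents of a client configuration file collected from each node and reports nodes that differ from the majority; how the files are collected, and which files matter (for example `tnsnames.ora` or `odbc.ini`), are assumptions left to the caller, and `find_outliers` is a hypothetical helper.

```python
import hashlib
from collections import Counter


def fingerprint(text: str) -> str:
    """SHA-256 of a config file's contents."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_outliers(configs: dict) -> list:
    """Return the nodes whose config differs from the most common version."""
    if not configs:
        return []
    prints = {node: fingerprint(text) for node, text in configs.items()}
    # The most frequent fingerprint is treated as the reference version.
    reference, _count = Counter(prints.values()).most_common(1)[0]
    return sorted(node for node, fp in prints.items() if fp != reference)


if __name__ == "__main__":
    collected = {
        "node1": "DSN=warehouse\nDriver=/opt/driver/lib.so\n",
        "node2": "DSN=warehouse\nDriver=/opt/driver/lib.so\n",
        "node3": "DSN=warehouse\nDriver=/opt/old-driver/lib.so\n",
    }
    print(find_outliers(collected))
```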
25
Capacity Sample
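§ $\sum_{k=1}^{n} \mathrm{Jobs}(k) = \mathrm{MXJ}_1 + \mathrm{MXJ}_2 + \dots + \mathrm{MXJ}_n$, where $\mathrm{MXJ}_k$ is the maximum number of job slots on host $k$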
§ JOB_ACCEPT_INTERVAL is 1
§ MBD_SLEEP_TIME is 5
§ Platform LSF dispatches one job to a particular machine and then waits JOB_ACCEPT_INTERVAL × MBD_SLEEP_TIME = 1 × 5 = 5 seconds before dispatching another job to the same machine, regardless of how long each job takes
26
Capacity Sample
§ Average job duration: 5 seconds
§ JOB_ACCEPT_INTERVAL: 1
§ MBD_SLEEP_TIME: 5
§ 4 cores per host
§ 2 hosts
§ 4 job slots per core
27
Capacity Sample
§ Jobs per host per minute = 60 / (JOB_ACCEPT_INTERVAL × MBD_SLEEP_TIME) = 60 / (1 × 5) = 12
§ 12 jobs per host × 2 hosts = 24 jobs per minute in the GRID
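The capacity arithmetic above can be wrapped in a small calculator. `dispatch_rate_per_host` and `total_slots` follow the slides' formulas directly; the `slot_bound` term, which caps throughput when jobs run long enough to fill every slot, is an added assumption not taken from the slides. All function names are hypothetical.

```python
def dispatch_rate_per_host(job_accept_interval: float, mbd_sleep_time: float) -> float:
    """Jobs per minute one host can receive: 60 / (JOB_ACCEPT_INTERVAL * MBD_SLEEP_TIME)."""
    return 60 / (job_accept_interval * mbd_sleep_time)


def total_slots(mxj_per_host: list) -> int:
    """Sum of MXJ over all hosts, as in the first Capacity Sample formula."""
    return sum(mxj_per_host)


def grid_jobs_per_minute(hosts: int, job_accept_interval: float, mbd_sleep_time: float,
                         slots_per_host: int, avg_job_seconds: float) -> float:
    """Theoretical jobs per minute for the whole grid."""
    dispatch_bound = dispatch_rate_per_host(job_accept_interval, mbd_sleep_time)
    # Added assumption: a host cannot complete more than
    # slots_per_host * 60 / avg_job_seconds jobs per minute.
    slot_bound = slots_per_host * 60 / avg_job_seconds
    return hosts * min(dispatch_bound, slot_bound)


if __name__ == "__main__":
    slots = 4 * 4  # 4 cores per host * 4 job slots per core
    print(total_slots([slots, slots]))              # total job slots in the 2-host grid
    print(grid_jobs_per_minute(2, 1, 5, slots, 5))  # sample values from the slides
```

With the sample values, the dispatch interval (12 jobs per host per minute) is the binding limit rather than the 16 slots per host, which matches the slide's result of 24 jobs per minute for two hosts.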
28
Failover Options
What’s Included
§ Core LSF processes (Base and Batch)
What’s Not Included
§ SAS Management Console
§ SAS Mid-Tier
§ SAS Object Spawner
§ Platform Process Manager
§ Platform Grid Management Service
29
LSF Daemons on UNIX
Figure: daemons on a server host and on the master host
§ LSF Base: LIM, RES, PIM, PEM
§ LSF Batch: SBATCHD on every server host; MBATCHD and MBSCHD on the master host
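A basic health check can compare the daemons actually running on a host with those expected for its role. The expected sets below are an assumption based on the figure (LSF Base daemons everywhere, sbatchd on every server host including the master, mbatchd and mbschd only on the master), written as lowercase process names; `missing_daemons` is a hypothetical helper.

```python
# Expected daemons per host role, as lowercase process names.
# Grouping assumed from the figure above.
LSF_BASE = {"lim", "res", "pim", "pem"}
EXPECTED = {
    "server": LSF_BASE | {"sbatchd"},
    "master": LSF_BASE | {"sbatchd", "mbatchd", "mbschd"},
}


def missing_daemons(role: str, running: list) -> list:
    """Return expected daemons that are not in the list of running processes."""
    return sorted(EXPECTED[role] - set(running))


if __name__ == "__main__":
    print(missing_daemons("master", ["lim", "res", "pim", "pem", "sbatchd", "mbatchd"]))
```

The `running` list could come from something like `ps -e -o comm=` on each host.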
30
Other Daemons on UNIX
§ Platform Process Manager (PM): JFD
§ Grid Management Service (GMS): GABD
31
GRID failover with EGO
§ Where grid can help: the compute cluster
Figure: a mid-tier cluster (SAS WebServer and SAS WebApp instances) and a metadata cluster (SAS Meta instances) alongside SAS Compute servers whose services are managed via EGO on the grid
32
GRID and failover
§ EGO defines the services and the number of instances; if one server goes down, EGO restarts the service on a failover node
Figure: a service (S) on each of Node 1, Node 2, and Node 3
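The failover behavior can be illustrated with a toy reconciliation loop: a service has a desired instance count, and when a node fails its instance is restarted on the least-loaded healthy node. This is a hypothetical Python model of the idea only; it does not use or imitate EGO's actual interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    nodes: list
    instances: int                           # desired number of service instances
    down: set = field(default_factory=set)   # nodes currently failed
    placement: list = field(default_factory=list)  # node of each running instance

    def reconcile(self) -> list:
        """Drop instances on failed nodes and start replacements elsewhere."""
        healthy = [n for n in self.nodes if n not in self.down]
        self.placement = [n for n in self.placement if n not in self.down]
        while len(self.placement) < self.instances and healthy:
            # Least-loaded healthy node first; ties go to the earliest node.
            target = min(healthy, key=self.placement.count)
            self.placement.append(target)
        return self.placement

    def fail(self, node: str) -> list:
        """Mark a node as down and restore the desired instance count."""
        self.down.add(node)
        return self.reconcile()


if __name__ == "__main__":
    cluster = Cluster(nodes=["node1", "node2", "node3"], instances=1)
    print(cluster.reconcile())
    print(cluster.fail("node1"))
```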
33
Failover
34
GRID and the hot-fixing process
§ Update considerations
§ Base and beyond
Figure: Nodes 1 to 3 with a shared store; the fix is applied to each node in turn:
1. Close the node in LSF
2. Stop the services in EGO
3. Wait for running SAS processes to end
4. Apply the fix to the binaries, then open the node
5. Sync (rsync)
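The per-node sequence can be written down as a dry-run plan before anything is executed. The sketch only builds command strings; `badmin hclose`/`badmin hopen`, `egosh service stop`, and `bjobs` are standard LSF/EGO commands, while the service names, the fix command, the rsync direction (from the shared store to the node), and the paths are hypothetical.

```python
import shlex


def node_update_plan(node: str, ego_services: list, fix_command: str,
                     shared_store: str, local_dir: str) -> list:
    """Dry-run command list for patching one grid node, in the slide's order."""
    # 1. Close the host so LSF dispatches no new jobs to it.
    steps = [f"badmin hclose {node}"]
    # 2. Stop the relevant EGO services (service names are site-specific).
    steps += [f"egosh service stop {svc}" for svc in ego_services]
    # 3. List running jobs on the host; repeat until no SAS jobs remain.
    steps.append(f"bjobs -u all -r -m {node}")
    # 4. Apply the fix remotely (hypothetical command), then reopen the host.
    steps.append(f"ssh {node} {shlex.quote(fix_command)}")
    steps.append(f"badmin hopen {node}")
    # 5. Sync files from the shared store to the node (direction and paths assumed).
    steps.append(f"rsync -a {shared_store}/ {node}:{local_dir}/")
    return steps


if __name__ == "__main__":
    for node in ["node1", "node2", "node3"]:  # one node at a time
        for command in node_update_plan(node, ["SASApp-WS"], "/opt/fix/apply.sh",
                                        "/shared/sas", "/opt/sas"):
            print(command)
```

Generating the plan first makes it easy to review the exact order of operations, and patching one node at a time keeps the rest of the grid available.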
35
Conclusion
§ GRID allows us to scale horizontally
§ Different Workloads need different settings
§ We can optimize workloads with Queues
§ We use EGO to manage service failover
§ GRID is not only SAS: EGO and LSF settings matter too
36
Jan Bigalke
Allianz Managed Operations & Services SE
jan.bigalke@allianz.com
Greg Nelson
ThotWave Technologies
greg@ThotWave.com

More Related Content

What's hot

Tool it Up! - Session #3 - MySQL
Tool it Up! - Session #3 - MySQLTool it Up! - Session #3 - MySQL
Tool it Up! - Session #3 - MySQL
toolitup
 
TechTalk v2.0 - Performance tuning Cassandra + AWS
TechTalk v2.0 - Performance tuning Cassandra + AWSTechTalk v2.0 - Performance tuning Cassandra + AWS
TechTalk v2.0 - Performance tuning Cassandra + AWS
Pythian
 
Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...
Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...
Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...
DataStax
 
Background Tasks in Node - Evan Tahler, TaskRabbit
Background Tasks in Node - Evan Tahler, TaskRabbitBackground Tasks in Node - Evan Tahler, TaskRabbit
Background Tasks in Node - Evan Tahler, TaskRabbit
Redis Labs
 
Postgres in Amazon RDS
Postgres in Amazon RDSPostgres in Amazon RDS
Postgres in Amazon RDS
Denish Patel
 
Introduction to .Net Driver
Introduction to .Net DriverIntroduction to .Net Driver
Introduction to .Net Driver
DataStax Academy
 
Spark / Mesos Cluster Optimization
Spark / Mesos Cluster OptimizationSpark / Mesos Cluster Optimization
Spark / Mesos Cluster Optimization
ebiznext
 
Managing Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyManaging Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al Tobey
DataStax Academy
 
Using advanced options in MariaDB Connector/J
Using advanced options in MariaDB Connector/JUsing advanced options in MariaDB Connector/J
Using advanced options in MariaDB Connector/J
MariaDB plc
 
Mesosphere and Contentteam: A New Way to Run Cassandra
Mesosphere and Contentteam: A New Way to Run CassandraMesosphere and Contentteam: A New Way to Run Cassandra
Mesosphere and Contentteam: A New Way to Run Cassandra
DataStax Academy
 
Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1
Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1
Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1
DataStax Academy
 
Performance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla ClusterPerformance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla Cluster
ScyllaDB
 
Troubleshooting redis
Troubleshooting redisTroubleshooting redis
Troubleshooting redis
DaeMyung Kang
 
Lightweight Transactions at Lightning Speed
Lightweight Transactions at Lightning SpeedLightweight Transactions at Lightning Speed
Lightweight Transactions at Lightning Speed
ScyllaDB
 
Cassandra Community Webinar | Data Model on Fire
Cassandra Community Webinar | Data Model on FireCassandra Community Webinar | Data Model on Fire
Cassandra Community Webinar | Data Model on Fire
DataStax
 
Advanced Operations
Advanced OperationsAdvanced Operations
Advanced Operations
DataStax Academy
 
Spilo, отказоустойчивый PostgreSQL кластер / Oleksii Kliukin (Zalando SE)
Spilo, отказоустойчивый PostgreSQL кластер / Oleksii Kliukin (Zalando SE)Spilo, отказоустойчивый PostgreSQL кластер / Oleksii Kliukin (Zalando SE)
Spilo, отказоустойчивый PostgreSQL кластер / Oleksii Kliukin (Zalando SE)
Ontico
 
HighLoad Solutions On MySQL / Xiaobin Lin (Alibaba)
HighLoad Solutions On MySQL / Xiaobin Lin (Alibaba)HighLoad Solutions On MySQL / Xiaobin Lin (Alibaba)
HighLoad Solutions On MySQL / Xiaobin Lin (Alibaba)
Ontico
 
How We Made Scylla Maintenance Easier, Safer and Faster
How We Made Scylla Maintenance Easier, Safer and FasterHow We Made Scylla Maintenance Easier, Safer and Faster
How We Made Scylla Maintenance Easier, Safer and Faster
ScyllaDB
 
ceph optimization on ssd ilsoo byun-short
ceph optimization on ssd ilsoo byun-shortceph optimization on ssd ilsoo byun-short
ceph optimization on ssd ilsoo byun-short
NAVER D2
 

What's hot (20)

Tool it Up! - Session #3 - MySQL
Tool it Up! - Session #3 - MySQLTool it Up! - Session #3 - MySQL
Tool it Up! - Session #3 - MySQL
 
TechTalk v2.0 - Performance tuning Cassandra + AWS
TechTalk v2.0 - Performance tuning Cassandra + AWSTechTalk v2.0 - Performance tuning Cassandra + AWS
TechTalk v2.0 - Performance tuning Cassandra + AWS
 
Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...
Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...
Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...
 
Background Tasks in Node - Evan Tahler, TaskRabbit
Background Tasks in Node - Evan Tahler, TaskRabbitBackground Tasks in Node - Evan Tahler, TaskRabbit
Background Tasks in Node - Evan Tahler, TaskRabbit
 
Postgres in Amazon RDS
Postgres in Amazon RDSPostgres in Amazon RDS
Postgres in Amazon RDS
 
Introduction to .Net Driver
Introduction to .Net DriverIntroduction to .Net Driver
Introduction to .Net Driver
 
Spark / Mesos Cluster Optimization
Spark / Mesos Cluster OptimizationSpark / Mesos Cluster Optimization
Spark / Mesos Cluster Optimization
 
Managing Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyManaging Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al Tobey
 
Using advanced options in MariaDB Connector/J
Using advanced options in MariaDB Connector/JUsing advanced options in MariaDB Connector/J
Using advanced options in MariaDB Connector/J
 
Mesosphere and Contentteam: A New Way to Run Cassandra
Mesosphere and Contentteam: A New Way to Run CassandraMesosphere and Contentteam: A New Way to Run Cassandra
Mesosphere and Contentteam: A New Way to Run Cassandra
 
Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1
Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1
Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1
 
Performance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla ClusterPerformance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla Cluster
 
Troubleshooting redis
Troubleshooting redisTroubleshooting redis
Troubleshooting redis
 
Lightweight Transactions at Lightning Speed
Lightweight Transactions at Lightning SpeedLightweight Transactions at Lightning Speed
Lightweight Transactions at Lightning Speed
 
Cassandra Community Webinar | Data Model on Fire
Cassandra Community Webinar | Data Model on FireCassandra Community Webinar | Data Model on Fire
Cassandra Community Webinar | Data Model on Fire
 
Advanced Operations
Advanced OperationsAdvanced Operations
Advanced Operations
 
Spilo, отказоустойчивый PostgreSQL кластер / Oleksii Kliukin (Zalando SE)
Spilo, отказоустойчивый PostgreSQL кластер / Oleksii Kliukin (Zalando SE)Spilo, отказоустойчивый PostgreSQL кластер / Oleksii Kliukin (Zalando SE)
Spilo, отказоустойчивый PostgreSQL кластер / Oleksii Kliukin (Zalando SE)
 
HighLoad Solutions On MySQL / Xiaobin Lin (Alibaba)
HighLoad Solutions On MySQL / Xiaobin Lin (Alibaba)HighLoad Solutions On MySQL / Xiaobin Lin (Alibaba)
HighLoad Solutions On MySQL / Xiaobin Lin (Alibaba)
 
How We Made Scylla Maintenance Easier, Safer and Faster
How We Made Scylla Maintenance Easier, Safer and FasterHow We Made Scylla Maintenance Easier, Safer and Faster
How We Made Scylla Maintenance Easier, Safer and Faster
 
ceph optimization on ssd ilsoo byun-short
ceph optimization on ssd ilsoo byun-shortceph optimization on ssd ilsoo byun-short
ceph optimization on ssd ilsoo byun-short
 

Similar to Beyond Best Practice: Grid Computing in the Modern World

Cloud Native Applications on OpenShift
Cloud Native Applications on OpenShiftCloud Native Applications on OpenShift
Cloud Native Applications on OpenShift
Serhat Dirik
 
Slow things down to make them go faster [FOSDEM 2022]
Slow things down to make them go faster [FOSDEM 2022]Slow things down to make them go faster [FOSDEM 2022]
Slow things down to make them go faster [FOSDEM 2022]
Jimmy Angelakos
 
VMworld 2013: How SRP Delivers More Than Power to Their Customers
VMworld 2013: How SRP Delivers More Than Power to Their Customers VMworld 2013: How SRP Delivers More Than Power to Their Customers
VMworld 2013: How SRP Delivers More Than Power to Their Customers
VMworld
 
Troubleshooting SQL Server
Troubleshooting SQL ServerTroubleshooting SQL Server
Troubleshooting SQL Server
Stephen Rose
 
Windows Azure Acid Test
Windows Azure Acid TestWindows Azure Acid Test
Windows Azure Acid Test
expanz
 
OpenDS_Jazoon2010
OpenDS_Jazoon2010OpenDS_Jazoon2010
OpenDS_Jazoon2010
Ludovic Poitou
 
Running Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on HadoopRunning Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on Hadoop
clairvoyantllc
 
Leveraging Cassandra for real-time multi-datacenter public cloud analytics
Leveraging Cassandra for real-time multi-datacenter public cloud analyticsLeveraging Cassandra for real-time multi-datacenter public cloud analytics
Leveraging Cassandra for real-time multi-datacenter public cloud analytics
Julien Anguenot
 
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
DataStax Academy
 
Building a Dynamic Rules Engine with Kafka Streams
Building a Dynamic Rules Engine with Kafka StreamsBuilding a Dynamic Rules Engine with Kafka Streams
Building a Dynamic Rules Engine with Kafka Streams
HostedbyConfluent
 
weblogic perfomence tuning
weblogic perfomence tuningweblogic perfomence tuning
weblogic perfomence tuning
prathap kumar
 
Linux Kernel vs DPDK: HTTP Performance Showdown
Linux Kernel vs DPDK: HTTP Performance ShowdownLinux Kernel vs DPDK: HTTP Performance Showdown
Linux Kernel vs DPDK: HTTP Performance Showdown
ScyllaDB
 
Oracle RAC Presentation at Oracle Open World
Oracle RAC Presentation at Oracle Open WorldOracle RAC Presentation at Oracle Open World
Oracle RAC Presentation at Oracle Open World
Paul Marden
 
Midwest PHP Presentation - New MSQL Features
Midwest PHP Presentation - New MSQL FeaturesMidwest PHP Presentation - New MSQL Features
Midwest PHP Presentation - New MSQL Features
Dave Stokes
 
Strata Singapore: Gearpump Real time DAG-Processing with Akka at Scale
Strata Singapore: GearpumpReal time DAG-Processing with Akka at ScaleStrata Singapore: GearpumpReal time DAG-Processing with Akka at Scale
Strata Singapore: Gearpump Real time DAG-Processing with Akka at Scale
Sean Zhong
 
Network Automation with Salt and NAPALM: Introuction
Network Automation with Salt and NAPALM: IntrouctionNetwork Automation with Salt and NAPALM: Introuction
Network Automation with Salt and NAPALM: Introuction
Cloudflare
 
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
DataStax
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
DataWorks Summit/Hadoop Summit
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to StreamingBravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Yaroslav Tkachenko
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
HostedbyConfluent
 

Similar to Beyond Best Practice: Grid Computing in the Modern World (20)

Cloud Native Applications on OpenShift
Cloud Native Applications on OpenShiftCloud Native Applications on OpenShift
Cloud Native Applications on OpenShift
 
Slow things down to make them go faster [FOSDEM 2022]
Slow things down to make them go faster [FOSDEM 2022]Slow things down to make them go faster [FOSDEM 2022]
Slow things down to make them go faster [FOSDEM 2022]
 
VMworld 2013: How SRP Delivers More Than Power to Their Customers
VMworld 2013: How SRP Delivers More Than Power to Their Customers VMworld 2013: How SRP Delivers More Than Power to Their Customers
VMworld 2013: How SRP Delivers More Than Power to Their Customers
 
Troubleshooting SQL Server
Troubleshooting SQL ServerTroubleshooting SQL Server
Troubleshooting SQL Server
 
Windows Azure Acid Test
Windows Azure Acid TestWindows Azure Acid Test
Windows Azure Acid Test
 
OpenDS_Jazoon2010
OpenDS_Jazoon2010OpenDS_Jazoon2010
OpenDS_Jazoon2010
 
Running Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on HadoopRunning Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on Hadoop
 
Leveraging Cassandra for real-time multi-datacenter public cloud analytics
Leveraging Cassandra for real-time multi-datacenter public cloud analyticsLeveraging Cassandra for real-time multi-datacenter public cloud analytics
Leveraging Cassandra for real-time multi-datacenter public cloud analytics
 
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
 
Building a Dynamic Rules Engine with Kafka Streams
Building a Dynamic Rules Engine with Kafka StreamsBuilding a Dynamic Rules Engine with Kafka Streams
Building a Dynamic Rules Engine with Kafka Streams
 
weblogic perfomence tuning
weblogic perfomence tuningweblogic perfomence tuning
weblogic perfomence tuning
 
Linux Kernel vs DPDK: HTTP Performance Showdown
Linux Kernel vs DPDK: HTTP Performance ShowdownLinux Kernel vs DPDK: HTTP Performance Showdown
Linux Kernel vs DPDK: HTTP Performance Showdown
 
Oracle RAC Presentation at Oracle Open World
Oracle RAC Presentation at Oracle Open WorldOracle RAC Presentation at Oracle Open World
Oracle RAC Presentation at Oracle Open World
 
Midwest PHP Presentation - New MSQL Features
Midwest PHP Presentation - New MSQL FeaturesMidwest PHP Presentation - New MSQL Features
Midwest PHP Presentation - New MSQL Features
 
Strata Singapore: Gearpump Real time DAG-Processing with Akka at Scale
Strata Singapore: GearpumpReal time DAG-Processing with Akka at ScaleStrata Singapore: GearpumpReal time DAG-Processing with Akka at Scale
Strata Singapore: Gearpump Real time DAG-Processing with Akka at Scale
 
Network Automation with Salt and NAPALM: Introuction
Network Automation with Salt and NAPALM: IntrouctionNetwork Automation with Salt and NAPALM: Introuction
Network Automation with Salt and NAPALM: Introuction
 
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to StreamingBravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
 

Recently uploaded

一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
z6osjkqvd
 
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
aguty
 
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
osoyvvf
 
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
actyx
 
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
hqfek
 
8 things to know before you start to code in 2024
8 things to know before you start to code in 20248 things to know before you start to code in 2024
8 things to know before you start to code in 2024
ArianaRamos54
 
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
9gr6pty
 
一比一原版南昆士兰大学毕业证如何办理
一比一原版南昆士兰大学毕业证如何办理一比一原版南昆士兰大学毕业证如何办理
一比一原版南昆士兰大学毕业证如何办理
ugydym
 
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
eoxhsaa
 
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
ywqeos
 
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
Rebecca Bilbro
 
Data Scientist Machine Learning Profiles .pdf
Data Scientist Machine Learning  Profiles .pdfData Scientist Machine Learning  Profiles .pdf
Data Scientist Machine Learning Profiles .pdf
Vineet
 
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
uevausa
 
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
lzdvtmy8
 
原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理
原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理 原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理
原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理
tzu5xla
 
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
nyvan3
 
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
Timothy Spann
 
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics May 2024
Vietnam Cotton & Spinning Association
 
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCAModule 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
yuvarajkumar334
 
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
eudsoh
 

Recently uploaded (20)

一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
 
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
 
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
 
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
 
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
 
8 things to know before you start to code in 2024
8 things to know before you start to code in 20248 things to know before you start to code in 2024
8 things to know before you start to code in 2024
 
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理一比一原版(uob毕业证书)伯明翰大学毕业证如何办理

Beyond Best Practice: Grid Computing in the Modern World

  • 1. About the Presenters
      Jan Bigalke, SAS Architect at Allianz Managed Operations & Services SE
      Greg Nelson, CEO and Founder of ThotWave Technologies
  • 2. Beyond Best Practice: Grid Computing in the Modern World (11562-2016)
  • 3. Five Reasons You Should Stay…
      1. 90% of the time, an out-of-the-box grid install won't handle your use cases.
      2. You will gain a better appreciation of what happens when you use applications such as SAS Enterprise Guide (EG) with SAS Grid Manager.
      3. You'll better understand how queue options affect your users under differing workloads.
      4. You will learn how to estimate how many jobs your grid environment can theoretically run.
      5. You will get a tutorial on going beyond the installation and configuring your system for high availability.
      6. Plus more…
  • 4. Agenda
      - Introduction: Modern SAS Architectures and SAS Grid
      - Understanding how SAS workloads can impact different resources
      - How SAS Grid processes work
      - Best practices for post-installation configuration
      - Calculating capacity
      - Implementing high availability
      - Maintaining your SAS software in a grid environment
  • 5. Trends / Motivation
      - Workload management
      - Commodity x86 hardware
      - Scalability
      - Start small and grow with customer demand
      - Efficiency (reduce cost)
  • 6. Types of Workload in an Enterprise SAS Environment
  • 7. Grid Architecture
      [Diagram: the default architecture (SAS Web Server, SAS Web App, SAS Metadata, one SAS Compute server) compared with the grid architecture (the same tiers in front of several SAS Compute servers on a shared file system)]
      Changes with GRID:
      - A shared file system is necessary for the compute servers
      - Workload is distributed between servers
      - Scalability
  • 8. Architecture: SAS Grid @work
      A SAS Grid computing environment is one in which SAS computing tasks are distributed among multiple computers on a network, all under the control of SAS Grid Manager.
  • 9. Design Considerations
      - Memory, CPU and I/O
      - Utilization, latency, throughput
      - Different types of workload have different needs
  • 10. SAS Versions and Grid Capabilities (key new capabilities per version)
      - 9.4 M2: Grid Manager plug-in for SAS Environment Manager
      - 9.4 M1: grid-launched stored process servers and pooled workspace servers
      - 9.4 M0: grid options, grid-launched workspace servers
      - 9.3: load balancing for stored process servers, OLAP servers and pooled workspace servers
      - 9.2: SAS Code Analyzer, grid-launched batch SAS jobs, load balancing for SAS Workspace Servers
  • 11. SAS Grid Request Flow
      [Diagram: SAS Enterprise Guide, SAS Metadata Server, SAS Object Spawner, LSF gridrun, and a Workspace Server started by LSF on a SAS Grid node]
      Request flow:
      1. Metadata Server
      2. Object Spawner
      3. Get grid options
      4. Spawn task on grid node
  • 12. Metadata and GRID
      Object Spawner console log:
      2015-10-07T21:09:23,801 INFO (gridrun.c:590) - commHandler: command received is >[INIT] [PROVNAME]:"Platform" [MODNAME]:"" [SRVHOST]:"sas94-app1-syst.testdomain" [SRVPORT]:"0" [USERNAME]:"" [PASSWORD]:"" [TIMEOUT]:"0" [OPTIONS]:<project=SASApp><.
      2015-10-07T21:09:23,836 INFO (gridrun.c:609) - commHandler: command response is >[DONE]<.
      2015-10-07T21:09:23,837 INFO (gridrun.c:590) - commHandler: command received is >[STARTJOB] [JOBNAME]:"SAS Enterprise Guide_SASApp - Workspace Server_4101A009-6340-2B46-8E08-8DB2933E8182" [RESOURCES]:"" [COMMAND]:</var/opt/data/sas/sas94/configAPP/Lev1/SASApp/WorkspaceServer/WorkspaceServer.sh> [ARGUMENTS]:<-noterminal -noxcmd -netencryptalgorithm AES -metaserver sas94-meta-syst.testdomain -metaport 8561 -metarepository Foundation -locale en_US -objectserver -objectserverparms "delayconn sph=hosta.testdomain protocol=bridge spawned spp=42449 cid=0 pb classfactory=440196D4-90F0-11D0-9F41-00A024BB830C server=OMSOBJ:SERVERCOMPONENT/A5ZI7NU4.AY0000WN cel=everything lb recon grid "keepalive=30"" -METAUSER '"testuser@!*(generatedpassworddomain)*!"' -METAPASS 49944139d506b727d1555D7b1d8E6162 > [OPTIONS]:<> [ARMCORR]:"" [FLAGS]:"0" [INFILES]:"" [OUTFILES]:"" [HOSTS]:"sas94-app1-syst.testdomain,sas94-app2-syst.testdomain" [MPIPROCS]:"0" [PROCSHOST]:"0"<.
      Job <33707> is submitted to queue <qiSASApp>.
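To read such gridrun entries programmatically, the fields can be split with two regular expressions. The sketch below is a minimal Python example, assuming every field has the form `[KEY]:"value"` or `[KEY]:<value>` exactly as in the excerpt above and that bracketed values never contain `>`. The function names are hypothetical; this is not a SAS-provided parser.

```python
import re

# Command keyword right after "command received is >", e.g. INIT or STARTJOB.
COMMAND_RE = re.compile(r"command received is >\[([A-Z]+)\]")
# [KEY]:"value" or [KEY]:<value> pairs; assumes bracketed values contain no ">".
FIELD_RE = re.compile(r'\[([A-Z]+)\]:(?:"([^"]*)"|<([^>]*)>)')
# LSF submission message that follows the STARTJOB command.
SUBMIT_RE = re.compile(r"Job <(\d+)> is submitted to queue <([^>]+)>")


def parse_gridrun_line(line: str) -> dict:
    """Split one commHandler log line into its command keyword and fields."""
    command = COMMAND_RE.search(line)
    fields = {}
    for match in FIELD_RE.finditer(line):
        key, quoted, bracketed = match.groups()
        fields[key] = quoted if quoted is not None else bracketed
    return {"command": command.group(1) if command else None, "fields": fields}


def parse_submission(line: str):
    """Return (job_id, queue) from an LSF submission message, or None."""
    match = SUBMIT_RE.search(line)
    return (int(match.group(1)), match.group(2)) if match else None


# Shortened copy of the STARTJOB line from the log above.
start = (
    '2015-10-07T21:09:23,837 INFO (gridrun.c:590) - commHandler: command received is '
    '>[STARTJOB] [JOBNAME]:"SAS Enterprise Guide_SASApp - Workspace Server_'
    '4101A009-6340-2B46-8E08-8DB2933E8182" '
    '[COMMAND]:</var/opt/data/sas/sas94/configAPP/Lev1/SASApp/WorkspaceServer/WorkspaceServer.sh> '
    '[HOSTS]:"sas94-app1-syst.testdomain,sas94-app2-syst.testdomain"<.'
)
parsed = parse_gridrun_line(start)
print(parsed["command"])                     # STARTJOB
print(parsed["fields"]["HOSTS"].split(","))  # candidate grid hosts
print(parse_submission("Job <33707> is submitted to queue <qiSASApp>."))  # (33707, 'qiSASApp')
```

The `[HOSTS]` field is the interesting one for grid troubleshooting: it lists the candidate nodes LSF may place the workspace server on.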
  • 13. Job Flow Processing Inside LSF
      Sequence:
      1. Submit the job (master host, job state PEND)
      2. Schedule the job
      3. Dispatch the job to a compute host (job state RUN)
      4. Run the job
      5. Return the output (job state DONE)
      [Diagram labels: queue; gridrun; mbatchd: JOB_SCHEDULING_INTERVAL; sbatchd: JOB_ACCEPT_INTERVAL, SBD_SLEEP_TIME; resource info]
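The five steps can be modeled as a tiny state machine. This is a simplified sketch that only knows the three states shown on the slide (PEND, RUN, DONE); real LSF jobs can also end up in states such as EXIT or SSUSP, which are deliberately left out. The class and transition table are hypothetical illustrations, not part of any LSF API.

```python
from enum import Enum


class JobState(Enum):
    """Only the three states drawn on the slide; real LSF has more (EXIT, SSUSP, ...)."""
    PEND = "PEND"  # steps 1-2: submitted to the master host, waiting to be scheduled
    RUN = "RUN"    # steps 3-4: dispatched to a compute host and running
    DONE = "DONE"  # step 5: finished, output returned


# Allowed transitions in this simplified model.
TRANSITIONS = {
    JobState.PEND: {JobState.RUN},
    JobState.RUN: {JobState.DONE},
    JobState.DONE: set(),
}


class Job:
    def __init__(self, job_id: int):
        self.job_id = job_id
        self.state = JobState.PEND  # submission always starts in PEND
        self.history = [self.state]

    def advance(self, new_state: JobState) -> None:
        """Move to new_state, rejecting transitions the model does not allow."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state
        self.history.append(new_state)


job = Job(33707)  # job ID taken from the log excerpt
job.advance(JobState.RUN)
job.advance(JobState.DONE)
print([s.name for s in job.history])  # ['PEND', 'RUN', 'DONE']
```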
  • 14. Best Practices
      - Configuration in metadata
      - Beyond the basics with queues
      - Multi-tenant considerations
      - SAS Grid and Hadoop
      - Theoretical number of jobs
      - High availability
      - Software updates
  • 15. SAS Configuration
      - Stored process (STP), workspace and batch servers
      - STP: load balancing only (whether this helps still needs to be checked)
      - Workspace servers: grid-launched
      - Batch: grid integration into the enterprise scheduler
  • 16. Grid Configuration in Metadata: Workspace Server
  • 17. Grid Configuration in Metadata: Stored Process Server
  • 18. Grid Configuration in Metadata: Pooled Workspace Server
  • 19. GRID and Workload Considerations
      - The grid load-balances based on utilization
      - Web requests need short latency
      - Use the grid for longer-running requests (batch, workspace server)
      - Distribute online requests via the Object Spawner (shorter latency)
  • 20. Manage Workloads with Queues
      Queues can manage workloads with different requirements; access by SAS clients is protected with groups.
      Parameters per queue:
      - Priority
      - Limits
      - Stop and start conditions
      [Diagram: SAS clients and the scheduler submit SAS programs to interactive, batch and default queues]
  • 21. Why Queues Matter
      - PRIORITY (interactive: 50, batch: 20): the relative priority compared with other queues; a higher value means a higher priority
      - NICE (interactive: 20, batch: 10): the change in execution priority, based on Linux "nice" values
      - CPULIMIT (interactive: 5, batch: 15): a CPU time limit applied to each job
      - UJOB_LIMIT (interactive: 5, batch: 2): the maximum job slots per user in the queue
      - PJOB_LIMIT (interactive: 10, batch: 5): the maximum job slots per processor in the queue
      - QJOB_LIMIT (interactive: 120, batch: 60): the maximum job slots in the queue
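The two example columns translate into queue definitions in LSF's lsb.queues file, which uses `Begin Queue` / `End Queue` stanzas. The Python sketch below renders such stanzas from the slide's values. The queue names `interactive` and `batch` are hypothetical, and the output is a starting point to review, not a tested production configuration.

```python
# Example values from the interactive and batch columns of the slide.
QUEUES = {
    "interactive": {  # hypothetical queue name
        "PRIORITY": 50,  # higher value = higher priority relative to other queues
        "NICE": 20,
        "CPULIMIT": 5,   # LSF reads this as minutes of normalized CPU time
        "UJOB_LIMIT": 5,
        "PJOB_LIMIT": 10,
        "QJOB_LIMIT": 120,
    },
    "batch": {  # hypothetical queue name
        "PRIORITY": 20,
        "NICE": 10,
        "CPULIMIT": 15,
        "UJOB_LIMIT": 2,
        "PJOB_LIMIT": 5,
        "QJOB_LIMIT": 60,
    },
}


def render_queue(name: str, params: dict) -> str:
    """Render one Begin Queue ... End Queue stanza in lsb.queues style."""
    lines = ["Begin Queue", f"{'QUEUE_NAME':<12} = {name}"]
    lines += [f"{key:<12} = {value}" for key, value in params.items()]
    lines.append("End Queue")
    return "\n".join(lines)


print("\n\n".join(render_queue(name, params) for name, params in QUEUES.items()))
```

Keeping the values in one dictionary makes it easy to diff the interactive and batch settings before they are copied into the real configuration file.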
  • 22. Why Queues Matter (continued)
      The example values are identical for the interactive and batch queues; load thresholds are written as loadSched/loadStop.
      - HJOB_LIMIT = 4: the maximum number of job slots this queue can use on any host
      - CHUNK_JOB_SIZE = 4: the maximum number of jobs allowed to be dispatched together in a chunk
      - r1m = 0.3/1.5: 1-minute CPU run queue length (alias: cpu)
      - ut = 0.2: 1-minute CPU utilization (0.0 to 1.0)
      - r15s = 0.3/1.5: 15-second CPU run queue length
      - it = 10/1: idle time in minutes (alias: idle)
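The load thresholds are written as loadSched/loadStop: a host receives new jobs from the queue only while the index is below loadSched, and running jobs are suspended once it exceeds loadStop. For idle time (`it`) the comparison runs the other way, since a larger value means an idler host. The sketch below encodes that reading as a simplified model: it compares raw values, ignores any normalization LSF applies to run-queue lengths, and the host readings are hypothetical.

```python
from typing import Optional, Tuple

# Indices where a larger value means an idler host; for the others, larger means busier.
HIGHER_IS_IDLER = {"it"}


def parse_threshold(spec: str) -> Tuple[Optional[float], Optional[float]]:
    """Parse 'loadSched/loadStop' (e.g. '0.3/1.5') or just 'loadSched' (e.g. '0.2')."""
    sched, _, stop = spec.partition("/")
    return (float(sched) if sched else None, float(stop) if stop else None)


def evaluate(index: str, spec: str, value: float) -> dict:
    """Simplified check of one load index against a queue threshold."""
    sched, stop = parse_threshold(spec)
    if index in HIGHER_IS_IDLER:
        can_dispatch = sched is None or value > sched
        suspend = stop is not None and value < stop
    else:
        can_dispatch = sched is None or value < sched
        suspend = stop is not None and value > stop
    return {"dispatch": can_dispatch, "suspend": suspend}


thresholds = {"r1m": "0.3/1.5", "ut": "0.2", "r15s": "0.3/1.5", "it": "10/1"}
host_load = {"r1m": 0.8, "ut": 0.15, "r15s": 1.7, "it": 12}  # hypothetical readings
for index, spec in thresholds.items():
    print(index, evaluate(index, spec, host_load[index]))
```

The gap between the two numbers matters: with r1m = 0.3/1.5, a host at 0.8 gets no new jobs but also does not suspend the ones it already runs.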
  • 23. Multi-Tenancy Considerations
      Queues per tenant:
      - allow different settings per tenant
      Queues and workload:
      - stop_cond = select[ (cpuusg > 95.0) && (cguxx > 90.0) ]
      - resume_cond = select[ (cguxx < 95.0) || (cpuusg < 95.0) ]
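Together, the two conditions act as a hysteresis: jobs are suspended when both indices are high and resumed once either one drops below its resume threshold. The sketch below evaluates the slide's expressions in Python. It assumes `cpuusg` and `cguxx` are site-specific custom load indices reported as percentages (they are not built-in LSF indices), and the sample readings are hypothetical.

```python
def should_stop(load: dict) -> bool:
    # stop_cond from the slide: select[ (cpuusg > 95.0) && (cguxx > 90.0) ]
    return load["cpuusg"] > 95.0 and load["cguxx"] > 90.0


def should_resume(load: dict) -> bool:
    # resume_cond from the slide: select[ (cguxx < 95.0) || (cpuusg < 95.0) ]
    return load["cguxx"] < 95.0 or load["cpuusg"] < 95.0


def next_state(suspended: bool, load: dict) -> bool:
    """Return True if the queue's jobs should be suspended after this sample."""
    if not suspended and should_stop(load):
        return True
    if suspended and should_resume(load):
        return False
    return suspended


samples = [
    {"cpuusg": 80.0, "cguxx": 70.0},  # normal load: keep running
    {"cpuusg": 97.0, "cguxx": 92.0},  # both above the stop thresholds: suspend
    {"cpuusg": 96.0, "cguxx": 96.0},  # neither below 95: stay suspended
    {"cpuusg": 90.0, "cguxx": 96.0},  # cpuusg below 95: resume
]
state = False
for sample in samples:
    state = next_state(state, sample)
    print(sample, "suspended" if state else "running")
```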
  • 24. GRID and Interaction with Other Systems (Databases / Hadoop)
      - Database access interfaces: the same database client configuration on every node
      - Hadoop: Java JAR files, Kerberos authentication, grid
      [Diagram: grid nodes 1 to n, each connected to the RDBMS]
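  • 25. Capacity Sample
      - $\sum_{k=1}^{n} \mathrm{Jobs}(k)_{\max} = \mathrm{MXJ}_1 + \mathrm{MXJ}_2 + \dots + \mathrm{MXJ}_n$, where $\mathrm{MXJ}_k$ is the maximum number of job slots on host $k$ of $n$
      - JOB_ACCEPT_INTERVAL is 1
      - MBD_SLEEP_TIME is 5
      - Platform LSF dispatches one job to a particular machine and then waits 1 × 5 = 5 seconds (JOB_ACCEPT_INTERVAL × MBD_SLEEP_TIME) before dispatching another job to the same machine, regardless of how long each job takes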
  • 26. Capacity Sample
      - Average job duration: 5 seconds
      - JOB_ACCEPT_INTERVAL: 1
      - MBD_SLEEP_TIME: 5 seconds
      - 4 cores per host
      - 2 hosts
      - 4 job slots per core
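Plugging this sample into the MXJ formula from the previous Capacity Sample slide gives the theoretical number of concurrently running jobs. The sketch assumes MXJ on each host is set to cores × job slots per core (16 here); the host names are borrowed from the log excerpt purely for illustration.

```python
def mxj(cores: int, slots_per_core: int) -> int:
    """Job slots on one host, assuming MXJ is set to cores x slots per core."""
    return cores * slots_per_core


def max_concurrent_jobs(hosts: dict) -> int:
    """Sum of MXJ_k over all hosts k: the theoretical maximum of running jobs."""
    return sum(mxj(**spec) for spec in hosts.values())


# Two hosts with 4 cores and 4 job slots per core each, as in the capacity sample.
hosts = {
    "sas94-app1-syst.testdomain": {"cores": 4, "slots_per_core": 4},
    "sas94-app2-syst.testdomain": {"cores": 4, "slots_per_core": 4},
}
print(max_concurrent_jobs(hosts))  # 32
```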
  • 27. Capacity Sample
      - Jobs per host = 60 / (1 × 5) = 12 jobs per host per minute
      - 12 × 2 = 24 jobs per minute in the GRID
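The 24 jobs per minute is a dispatch limit, not a slot limit. A simple way to see which one binds is to compute both and take the minimum. This is a simplified model under stated assumptions: every host accepts one job per JOB_ACCEPT_INTERVAL × MBD_SLEEP_TIME seconds, every job lasts exactly the 5-second average, and the 32 slots assume MXJ = 4 cores × 4 slots per core on 2 hosts.

```python
def dispatch_limited_per_minute(hosts: int, job_accept_interval: int,
                                mbd_sleep_time: float) -> float:
    """Jobs per minute if each host takes one job every
    JOB_ACCEPT_INTERVAL * MBD_SLEEP_TIME seconds (the rule from the slides)."""
    if job_accept_interval <= 0:
        raise ValueError("this simple model needs a positive JOB_ACCEPT_INTERVAL")
    return 60 / (job_accept_interval * mbd_sleep_time) * hosts


def slot_limited_per_minute(total_slots: int, avg_job_seconds: float) -> float:
    """Jobs per minute if every slot ran back-to-back jobs of average length."""
    return total_slots * 60 / avg_job_seconds


dispatch_cap = dispatch_limited_per_minute(hosts=2, job_accept_interval=1, mbd_sleep_time=5)
slot_cap = slot_limited_per_minute(total_slots=32, avg_job_seconds=5)
print(dispatch_cap, slot_cap, min(dispatch_cap, slot_cap))  # 24.0 384.0 24.0
```

With these numbers the dispatch interval caps throughput at 24 jobs per minute, far below the 384 jobs per minute the slots alone could absorb.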
  • 28. Failover Options
      What's included:
      - Core LSF processes (Base and Batch)
      What's not included:
      - SAS Management Console
      - SAS Mid-Tier
      - SAS Object Spawner
      - Platform Process Manager
      - Platform Grid Management Service
  • 29. LSF Daemons on UNIX
      [Diagram: server host and master server running the LSF Base daemons (LIM, RES, PIM, PEM) and the LSF Batch daemons (SBATCHD, MBATCHD, MBSCHD)]
  • 30. Other Daemons on UNIX: JFD, PM, GMSGABD
  • 31. GRID Failover (EGO)
      - Where can the grid help? In the compute cluster.
      [Diagram: SAS Web Server; SAS WebApp instances in a mid-tier cluster; SAS Metadata Servers in a metadata cluster; SAS Compute servers whose compute services run via EGO on the grid]
  • 32. GRID and Failover
      - EGO defines the services and the number of instances of each; if one server goes down, EGO restarts the service on a failover node
      [Diagram: service instances (S) on Node 1, Node 2 and Node 3]
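The restart behavior can be illustrated with a toy model in which only the instances on the failed node move and everything else stays put. This is not EGO's actual placement algorithm; the service names and the "least-loaded survivor" rule are hypothetical choices made for the sketch.

```python
from collections import Counter


def fail_over(placement: dict, nodes: list, failed: str) -> dict:
    """Move the service instances on `failed` to surviving nodes.

    placement maps a service name to the list of nodes running its instances.
    Instances on healthy nodes stay where they are. Toy model, not EGO's logic.
    """
    survivors = [n for n in nodes if n != failed]
    if not survivors:
        raise RuntimeError("no surviving node to fail over to")
    # Current number of instances per surviving node.
    load = Counter(n for hosts in placement.values() for n in hosts if n != failed)
    result = {}
    for service, hosts in placement.items():
        new_hosts = []
        for node in hosts:
            if node != failed:
                new_hosts.append(node)
                continue
            # Prefer a survivor not already running this service, then the least loaded;
            # ties are broken by the order of the node list.
            target = min(
                survivors,
                key=lambda n: (n in hosts or n in new_hosts, load[n], survivors.index(n)),
            )
            load[target] += 1
            new_hosts.append(target)
        result[service] = new_hosts
    return result


nodes = ["node1", "node2", "node3"]
placement = {"ObjectSpawner": ["node1", "node2"], "ProcessManager": ["node1"]}  # hypothetical
print(fail_over(placement, nodes, failed="node1"))
# {'ObjectSpawner': ['node3', 'node2'], 'ProcessManager': ['node2']}
```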
  • 34. GRID and the Hot-Fixing Process
      - Update considerations
      - Base and beyond
      Steps:
      1. Close the node in LSF and stop its services in EGO
      2. Wait for the SAS processes to end
      3. Fix the binaries and open the node
      4. Sync (rsync)
      [Diagram: the fix applied to Node 1, Node 2, Node 3 and the shared store]
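The per-node procedure lends itself to a rolling, one-node-at-a-time plan. The dry-run sketch below only builds the command list and executes nothing. `badmin hclose`, `badmin hopen`, `bjobs -u all -m` and `rsync -a` are standard LSF and rsync commands, but the node names and paths are hypothetical, the EGO step is left as a site-specific placeholder, and applying the fix by rsyncing patched binaries from the shared store is one assumed reading of the slide's "Sync (rsync)" step.

```python
import shlex
from typing import List, Optional, Tuple

# (description, shell command); None marks a step whose command is site-specific.
Step = Tuple[str, Optional[str]]


def hotfix_plan(host: str, shared_store: str, install_dir: str) -> List[Step]:
    """Dry-run plan for patching one grid node, in the order shown on the slide."""
    h = shlex.quote(host)
    src = shlex.quote(shared_store.rstrip("/") + "/")  # trailing slash: copy contents
    dst = shlex.quote(install_dir.rstrip("/") + "/")
    return [
        ("close the node in LSF", f"badmin hclose {h}"),
        ("stop EGO-managed services on the node", None),  # depends on the EGO setup
        ("wait for running jobs to end (repeat until none)", f"bjobs -u all -m {h}"),
        ("fix binaries from the shared store", f"rsync -a {src} {h}:{dst}"),
        ("open the node in LSF", f"badmin hopen {h}"),
    ]


def rolling_plan(hosts: List[str], shared_store: str, install_dir: str) -> List[Step]:
    """Patch one node at a time so the remaining nodes keep accepting jobs."""
    plan: List[Step] = []
    for host in hosts:
        plan.extend(hotfix_plan(host, shared_store, install_dir))
    return plan


# Hypothetical node names and paths.
for step, cmd in rolling_plan(["node1", "node2"], "/shared/sas/SASHome", "/opt/sas/SASHome"):
    print(f"{step:<50} {cmd or '(site-specific)'}")
```

Generating the plan first, rather than running commands directly, leaves room to review it before touching a production grid.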
  • 35. Conclusion
      - GRID allows us to scale horizontally
      - Different workloads need different settings
      - We can optimize workloads with queues
      - We use EGO to manage service failover
      - GRID is not only SAS: the EGO/LSF settings matter too
  • 36. Jan Bigalke, Allianz Managed Operations & Services SE, jan.bigalke@allianz.com
      Greg Nelson, ThotWave Technologies, greg@ThotWave.com