5. Our thinking is broken
Customer: “I can’t get to my desktop”
Support/Admin: “The desktops aren’t working because storage failed”
CIO/Boss: “We need to ensure storage never fails”
7. Netflix Chaos Monkey
2010: Netflix moves to AWS
2011: US-East Outage - Netflix posts lessons learned
The best way to avoid failure is to fail constantly
Since 2013: Chaos Monkey is run in production except on holidays and weekends
9. Before you buy more stuff – try this
• How do you respond to events today?
• How long to identify them?
• How long to solve them?
• Mean Time Between Failures (MTBF) is a legacy metric
• Focus on Mean Time To Resolution (MTTR) or cycle time
MTBF vs. MTTR
10. Before you buy more stuff – try this
• How are you rolling out Citrix or changes?
• AUTOMATE!!!
• RULE: If you do it twice, it should be automated
• Focus on reducing Cycle time
• time(what is wrong) + time(how to fix it) + time(implement fix) = cycle time
• Immutable Servers
• Servers are rebuilt from scratch for changes
11. Survive Failure - Architecture
• Does Citrix still work if:
• Your storage fails (SAN, local, whatever)?
• Your database fails?
• NetScaler fails?
• What can your users handle?
• Most can handle getting logged off if they can log in again
• Most can NOT handle
• Application hangs
• Print failures
• Can’t log in or connect
Source: theoatmeal.com
12. User Profiles and Folders
• Redirect Folders as much as possible
• This is where the data people use lives (My Docs, Downloads, etc.)
• Profiles
• Profiles should be as light as possible
• Can you use mandatory profile settings?
• Replicate profiles across 2 data centers
• Profiles will not work on DFS-R without corruption (except one-way replication)
• Active/Passive only (not active/active)
• Split users so some are active for one data center, passive for the other
• Use cloud storage
• Hack OneDrive for My Docs - https://office365drivemap.codeplex.com/
13. Storage / DB
• Use redundancy in the software, not hardware
• PVS fails over on the fly (not for CIFS/SMB though!)
• Local disk with PVS is better than an expensive SAN (and likely performs better, especially if you have local SSD)
Diagram: local disk on each server (Whiptail_61, Whiptail_62); mirror-aware vs. standalone databases; Primary Database (APS-DCXA1SQL01), Mirror Database (APS-DCXA2SQL02), and a Witness with no database (APS-DCXDCSQL03)
14. PVS HA/DR Components
SQL Database (highly available)
2 PVS Servers, each with its own vDisk store
DHCP – can be split-scoped on 2008 R2/2012
TFTP can be load balanced with a hardware load balancer
2 different locations
Mirror – storage resilient; Cluster – server resilient
15. Network
• Multiple sites = NetScaler GSLB
• Active/Passive is easiest to setup
• All components should be load balanced if possible
• Even TFTP, double up on every component
• No standalone (“stag”) NetScalers in production
• HA/Failover Pair
• They share the VIP but have separate IP info (so the VIP floats)
• 1 NS + Hypervisor != Pair
Diagram: NetScaler load balancers in Zone US-East1 and Zone US-West1, sharing a floating VIP
16. BLUE/GREEN
Diagram: LB in front of App v1.0 nodes (Db v1.0) and App v1.1 nodes (Db v1.1)
Limiting Downtime
• Like active/passive
• Don’t use DNS for this – you can’t trust TTLs
When to use
• ANY database/schema upgrade
• Restoring from backup is too large/long
17. • Like active/active but with a purpose
• Canary in the coal mine
• See if someone screams!
• Live to production
• Limiting Risk
• Back up your data
• All nodes use production database
• Route new connections to new nodes
CANARY
Diagram: LB in front of two App v1.0 nodes and one App v1.1 node, all sharing Db v1.0
18. Atlanta Public Schools – Citrix Delivery Overview
Architect: Thomas Gamull
Company: Presidio
Date: 3/17/2014
Diagram: external users (24,000 zero clients across the school districts) come through the external firewall to a pair of MPX 11500 NetScalers; internal users and printers sit behind the internal firewall. The CLL Data Center delivers 8,000 concurrent desktops for students via Citrix PVS from three SCVMM-managed clusters (XA1, XA2, XDC), each with 2 Delivery Controllers, 2 Provisioning Servers and an SCVMM server, delivering 2008 R2 desktops, 2008 R2 applications and Windows 7 desktops. Shared services include StoreFront, License Servers, an App-V cluster (APPVPublish, APPVReport), a SQL mirror, a file server for profiles and user data, and print servers.
20. Rack Layout
Diagram: a pair of NetScalers and a pair of top-of-rack switches, several compute blade chassis, compute rack-mount servers with local disk storage, and paired iSCSI/FC storage arrays
Storage is always in pairs if needed
• Prefer multiple smaller arrays over monolithic SAN
• Let app/software do the work
Network redundancy is important
• Load balancers can remove switch dependencies
• Leverage common NIC cabling
Server choice can vary
• Blades are dense but lack local disk
• Rack Mounts are often very flexible
• Without automation you will have scaling problems
21. “Je n’ai fait celle-ci plus longue que parce que je n’ai pas eu le loisir de la faire plus courte.” – Blaise Pascal, Provincial Letters: Letter XVI, 1657
English translation: “If I had more time, I would have written a shorter letter.”
Why
This information isn’t useful without explaining why
I will spend no more than half the speaking time on this
You don’t need to write anything down; just try to grasp my message
What
Some examples
Actual architecture and things you can do
Also
I will finish with at least 10 minutes for Q&A
I respond to email and Twitter
I was the Practice Manager for Workforce Mobility at Presidio, which is a great company and Citrix partner. One of my accomplishments there was the Atlanta Public Schools XenApp/XenDesktop 7 deployment for 50,000 students (one of the first large XenDesktop 7 deployments from a partner). I honestly wanted to do more and joined Ericsson earlier this year as a Consulting Manager – I could list buzzwords like DevOps, OpenStack, CI/CD, SDN and NFV but in reality I currently help customers align their entire deployment pipeline (including software development) with how their company produces value.
Failures can stop business flow and cost companies money. If you’ve ever worked in Operations, you might think its sole job is to prevent failures above everything else. On top of that, we consume better hardware every year and expect stable performance. Why do newer phones seem to have battery life issues and problems making calls? It’s amazing that I can grab a cell phone from 10 years ago and it will last all day on a charge. I had a Volkswagen Beetle that still runs; can we seriously not make data center hardware reliable? We can shorten this philosophy to “five nines”: 99.999% uptime seems to be written into every CIO’s wish list for any architecture today.
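To put that wish in concrete numbers, here is a quick back-of-the-envelope calculation (my own illustration, not from the deck) of the downtime budget each availability target actually leaves you per year:

```python
# Illustrative only: yearly downtime budget implied by an availability target.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

for label, target in [("three 9s", 0.999), ("four 9s", 0.9999), ("five 9s", 0.99999)]:
    budget = MINUTES_PER_YEAR * (1 - target)
    print(f"{label} ({target:.3%} uptime): about {budget:.0f} minutes of downtime per year")
```

Five nines leaves roughly five minutes a year, which is not a target you buy your way into with hardware alone.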
Failure is a tough thing to avoid or predict. We really should be looking at things a different way. I also realize that many of us have different roles and may think we don’t have a say in this. I disagree: if you can relate anything back to business value, you will get people’s ears or, at worst, a better job.
Let’s walk through a hypothetical. Our customer or end user can’t get to the desktop, and we find out the desktop can’t pull profile data from the storage server. In fact, our storage appears to have failed! “Never mind the details!” says the Director or CIO. “We need this fixed now. We need to ensure storage does not fail again!”
Let’s get a storage expert in here! The solution is a new or upgraded SAN with better performance, more reliability and a promise that it will not fail, or your money back (terms and conditions apply!). The problem with this solution is that it confuses eliminating a problem with finding a solution. It does not address the underlying cause.
Could this have been the storage driver? How does SAN uptime prevent that? What if it’s just space/performance/latency?
Just because the desktop failed when storage did doesn’t mean that storage is the cause
You are now forever justifying this fix (can you honestly admit it’s wrong if you find out?) Also, how’s the SAN fabric looking?
One of the first systems our engineers built in AWS is called the Chaos Monkey. The Chaos Monkey’s job is to randomly kill instances and services within our architecture.
If we aren’t constantly testing our ability to succeed despite failure, then it isn’t likely to work when it matters most – in the event of an unexpected outage.
Rambo architecture, each component can survive failures of the other components it depends on
If our recommendations system is down, we degrade the quality of our responses to our customers, but we still respond. We’ll show popular titles instead of personalized picks. If our search system is intolerably slow, streaming should still work perfectly fine.
http://techblog.netflix.com/2010/12/5-lessons-weve-learned-using-aws.html
http://techblog.netflix.com/2011/04/lessons-netflix-learned-from-aws-outage.html
Automation is key. A few years ago Aaron Parker had a session where he asked how people are still not automating; there is no reason not to. I did not automate much back then, but I do now! If you do something twice, you need to automate it. Humans are not good at repetitive entry, but computers are. Chef or Puppet is something you should look into if you haven’t yet. Also, our focus should ultimately be on cycle time. Finally, the concept of immutable servers is also a worthwhile approach: treat servers like inkjet printers; it’s often easier to just replace them.
http://www.thoughtworks.com/insights/blog/rethinking-building-cloud-part-4-immutable-servers
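A minimal sketch of the immutable-server idea, assuming some provisioning and load balancer hooks exist in your environment (provision, add_to_lb, remove_from_lb and destroy are hypothetical placeholders for your own tooling): never patch a server in place, build a fresh one from a versioned definition and swap it in.

```python
def roll_out(new_image_version, current_servers, provision, add_to_lb,
             remove_from_lb, destroy):
    """Replace every server with a freshly built one instead of patching in place."""
    for old in current_servers:
        new = provision(image=new_image_version)  # rebuild from scratch, every time
        add_to_lb(new)                            # the new node starts taking traffic
        remove_from_lb(old)                       # drain the old node
        destroy(old)                              # don't fix it; replace it
```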
Let’s talk about architecture for XenDesktop 7.6 and how we survive failure. Think of keeping desktops and apps running even while necessary components fail. A better way to focus this is to evaluate what end users can handle. Surprisingly, I’ve found most handle logoffs better than slow performance. People often don’t report a logoff if they can log back in, but when a print job takes 30 minutes or longer, you can be assured of a ticket.
Another session, SYN502, discussed issues with SMB, folder redirection and newer technologies. I’m still a fan of redirection, but mainly for Documents and file data, not for Desktop or AppData. This is one of the biggest areas to tackle for failure issues; in my earlier example, it was the profile that failed, causing the desktop not to load. I have seen profile replication in failover scenarios where one data center is primary for a set of users while the other is primary for another set. End user feedback is important to get this right: is it worth extra hardware and slowness just because people use the desktop for their My Documents? Usually not.
For more info see Synergy 2015 - SYN502: I’ve got 99 problems, and folder redirection is every one of them (Helge Klein, Sean Bass, Aaron Parker)
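As a purely illustrative sketch of that active/passive split, here is one way to pin each user to a single active data center for profile writes (a hash-based split; an OU- or site-based split works the same way). The data center names are made up.

```python
import hashlib

DATA_CENTERS = ["DC-A", "DC-B"]  # hypothetical names

def primary_dc(username: str) -> str:
    """Deterministically assign the user's active data center for profile writes."""
    digest = hashlib.md5(username.lower().encode("utf-8")).hexdigest()
    return DATA_CENTERS[int(digest, 16) % len(DATA_CENTERS)]

def passive_dc(username: str) -> str:
    """The other data center only ever receives one-way replicated copies."""
    active = primary_dc(username)
    return next(dc for dc in DATA_CENTERS if dc != active)
```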
Did I mention how easy it is to scale later using cheap hardware, storage, compute?
Perhaps take out APS refs in the picture?
For HA we should always add another PVS server with a SEPARATE vdisk store (you can mix SAN/local disk, etc here)
If we leave DHCP alone, we add a point of failure where target devices may fail to boot. You can use 2008 R2 or 2012 to provide a split scope, or use a more redundant solution such as BlueCat or Infoblox.
PXE and TFTP are another HA concern; you can only provide true HA with a hardware load balancer. I often do NOT provide HA for TFTP, but if you have a hardware load balancer there is no reason not to. PXE will load the bootstrap, which won’t work if your PVS servers aren’t specified in it (you need to add them).
Use mirroring with SQL if you can. It’s great, and clustering doesn’t really protect you from issues such as the storage failing! If your storage will never, ever fail then that’s awesome, but keep in mind I can use local storage and mirroring and get pretty much the same benefits, except for the feeling of spending tons of money. Clustering helps you update SQL nodes one at a time while keeping SQL up; that is generally not something I do, but I do recommend mirroring.
Mirroring requires a witness server, a third server that doesn’t do anything other than help with quorum (SQL deciding which server is primary). If you set this up and lose both the mirror and the witness, the primary will stop. I often put my witness on a local disk.
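A tiny sketch of the quorum rule behind that behavior (my illustration, not SQL Server internals): with a witness configured, the principal keeps the database online only while it can see a majority of the three partners.

```python
def principal_stays_online(sees_mirror: bool, sees_witness: bool) -> bool:
    """Majority (2 of 3) quorum: the principal always counts its own vote."""
    votes = 1 + int(sees_mirror) + int(sees_witness)
    return votes >= 2

assert principal_stays_online(True, True)        # everything healthy
assert principal_stays_online(False, True)       # mirror down, witness up: stays online
assert not principal_stays_online(False, False)  # lose mirror AND witness: primary stops
```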
Load balancers are your friend. I reference NetScaler for obvious reasons, but keep in mind there are free, Linux-based virtual load balancers that can do some of this work. You don’t have to be a Cisco CCIE to figure this stuff out either; there are tons of blogs and walkthroughs out there to guide you. That being said, GSLB is a LOT harder than just load balancing internal components.
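A toy sketch of the active/passive decision behind GSLB (my illustration, not NetScaler configuration; the site names and hostnames are made up): hand out the primary site’s address while its health probe passes, otherwise fall back to the secondary site.

```python
import socket

SITES = [
    ("primary",   "vip.dc-east.example.com", 443),
    ("secondary", "vip.dc-west.example.com", 443),
]

def site_is_up(host: str, port: int, timeout: float = 2.0) -> bool:
    """Crude health probe: can we open a TCP connection to the site's VIP?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def resolve() -> str:
    """Return the first healthy site, preferring the primary (active/passive)."""
    for _name, host, port in SITES:
        if site_is_up(host, port):
            return host
    return SITES[-1][1]  # nothing answers; return the passive site as a last resort
```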
This diagram is actually for application/dev updates, but the theory is the same for other scenarios. We can use blue/green for upgrades, new feature rollouts, etc. Note we actually snapshot or clone the database, then flip over to the other application set (or data center, database, etc.). If your backups are too big or take too long to restore, this method of rolling out changes is ideal.
Limiting Downtime – Blue/Green Deployments
• Create a live replica of the database
• Duplicate all app nodes with the new code/config
• Adjust routing to activate the new code
When to use
• You are updating your schema
• No object-versioned DB
• No feature flags
• You can test the feature outside production
• Restoring from a backup is not practical (big data sets)
Plan for the worst-case scenario: oops, my feature blew up
http://www.slideshare.net/adrianjotto/docker-102-immutable-infrastructure
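Pulling the steps above together, here is a condensed sketch of the blue/green flow. Every function name is a hypothetical placeholder for your own tooling (storage snapshot, provisioning automation, load balancer API); only the ordering and the cheap rollback path are the point.

```python
def blue_green_release(blue_nodes, new_version, clone_db, provision, point_lb_at):
    """Stand up a full green stack, then flip routing; blue stays intact for rollback."""
    green_db = clone_db("production")                    # live replica of the database
    green_nodes = [provision(new_version, db=green_db)   # duplicate every app node
                   for _ in blue_nodes]
    point_lb_at(green_nodes)                             # flip routing at the LB, not via DNS
    return blue_nodes                                    # keep blue around for rollback

# Rollback is simply point_lb_at(blue_nodes); nothing was upgraded in place.
```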
Limiting Risk
• Requires feature flags or sticky LB sessions
• Back up your data
• All nodes use the production database
• Route new connections to the new nodes
When to use
• No contract-breaking changes to the schema
• You have an object-versioned DB
• You use feature flags
• It is impractical to test the feature outside production
• You have a full backup of your data and can restore it
http://www.slideshare.net/adrianjotto/docker-102-immutable-infrastructure
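For contrast, a sketch of the canary rule above (again purely illustrative): sessions stay sticky to their node, a small slice of new connections lands on the v1.1 node, and everything shares the production database.

```python
import random

CANARY_WEIGHT = 0.05  # 5% of brand-new connections hit the canary node

def pick_node(session_node, stable_nodes, canary_node):
    """Sticky sessions keep their node; a small share of new sessions try the canary."""
    if session_node is not None:
        return session_node                  # returning user: don't move them
    if random.random() < CANARY_WEIGHT:
        return canary_node                   # the canary: see if someone screams
    return random.choice(stable_nodes)       # everyone else stays on v1.0

# Rolling back is just setting CANARY_WEIGHT to 0; the shared production
# database was never forked.
```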
Note the right side with 3 SCVMM (Hyper-V) clusters; we use all of them, but we can survive the failure of an entire cluster. All the clusters share the same SQL mirror, StoreFront farm and file server for profiles.
This is one cluster of 2 or more for Hyper-V
2 Blades do the work (so one blade can fail and my cluster is up). If they both fail, I have another cluster.
I have 2 of everything
Don’t skimp on anything; make it two or more of EVERYTHING you can.
Notice the pair of NetScalers at the top of the rack?
I have two storage appliances at each data center (In this case flash storage using PVS)
Primary data center – CLL – 48 blades, with an Invicta and two 6296s in each rack.
Secondary – Brewer – 32 blades, 2 Invictas, 2 NetScalers and 2 6296s in a single rack.
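As a closing back-of-the-envelope check (my own illustration; the per-blade desktop density is an assumed figure, not from the deck), this is the kind of N+1 arithmetic behind “one blade can fail and my cluster is up”:

```python
def survives_one_blade_failure(blades: int, desktops: int, desktops_per_blade: int) -> bool:
    """Can the remaining blades still carry the workload after one blade fails?"""
    return (blades - 1) * desktops_per_blade >= desktops

# With an assumed density of 175 desktops per blade, 48 blades still carry
# 8,000 concurrent desktops after losing one blade.
print(survives_one_blade_failure(blades=48, desktops=8000, desktops_per_blade=175))  # True
```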