SYN408: XenDesktop 7.6 Architecture: Dealing with Failure
Tom Gamull – Ericsson Consulting Manager
Citrix Synergy – May 2015
@magicalyak
WHAT WOULD YOU SAY YOU DO HERE?
Prevent Failures to begin with
• Failures are bad events
• Today’s technology should be bulletproof
• Is 99.999% uptime the new normal?
“The perfect is the enemy of the good” - Voltaire
Our thinking is broken
Customer: “I can’t get to my desktop”
Support/Admin: “The desktops aren’t working because storage failed”
CIO/Boss: “We need to ensure storage never fails”
Solution
• Upgrade the SAN or make it redundant
• Somehow believe replication can occur without penalty (the sales guy promised)
• Storage stays up!
Netflix Chaos Monkey
2010: Netflix moves to AWS
2011: US-East Outage - Netflix posts lessons learned
The best way to avoid failure is to fail constantly
Since 2013: Chaos Monkey is run in production except on holidays and weekends
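The rule on this slide (fail on purpose, in production, but never on weekends or holidays) can be sketched in a few lines of Python. This is not Netflix's Chaos Monkey code; it only shows the scheduling guard and the random choice of a victim. The holiday list, the 9:00 to 15:00 window, and the server names are assumptions, and the actual shutdown call is deliberately left out.

```python
from __future__ import annotations

import random
from datetime import date, datetime

# Hypothetical holiday calendar; replace with your organization's own list.
HOLIDAYS = {date(2015, 5, 25), date(2015, 12, 25)}


def in_chaos_window(now: datetime, start_hour: int = 9, end_hour: int = 15) -> bool:
    """True only on non-holiday weekdays between start_hour and end_hour."""
    if now.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    if now.date() in HOLIDAYS:
        return False
    return start_hour <= now.hour < end_hour


def pick_victim(servers: list[str], now: datetime, rng: random.Random) -> str | None:
    """Pick one server to shut down, or None outside the allowed window."""
    if not servers or not in_chaos_window(now):
        return None
    return rng.choice(servers)
```

A scheduled job could call `pick_victim(["XA-VDA-01", "XA-VDA-02"], datetime.now(), random.Random())` (hypothetical VDA names) and hand the result to whatever shutdown mechanism your hypervisor provides. Keeping the window check separate makes it easy to test that nothing fires on a Saturday.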
Before you buy more stuff – try this
• How do you respond to events today?
• How long does it take to identify them?
• How long does it take to solve them?
• Mean time between failures (MTBF) is legacy
• Focus on mean time to resolution (MTTR), or cycle time
MTBF vs. MTTR
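Both metrics fall out of an ordinary incident log, which makes the slide's point concrete: MTTR is something you can shrink with faster diagnosis and automation, whatever your hardware. A minimal sketch, assuming each incident is recorded with a start and a resolution timestamp (the `Incident` type is hypothetical):

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    started: datetime   # when users were first affected
    resolved: datetime  # when service was restored


def downtime(incidents: list[Incident]) -> timedelta:
    """Total time spent in outages."""
    return sum((i.resolved - i.started for i in incidents), timedelta())


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean time to resolution: average length of an outage."""
    return downtime(incidents) / len(incidents)


def mtbf(incidents: list[Incident], period_start: datetime, period_end: datetime) -> timedelta:
    """Mean time between failures: uptime in the period divided by the failure count."""
    uptime = (period_end - period_start) - downtime(incidents)
    return uptime / len(incidents)
```

For two outages of 1 and 3 hours in a 10-day window, MTTR is 2 hours and MTBF is 118 hours. Halving MTTR is usually far cheaper than doubling MTBF.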
Before you buy more stuff – try this
• How are you rolling out Citrix or changes?
• AUTOMATE!!!
  • RULE: If you do it twice, it should be automated
• Focus on reducing cycle time
  • cycle time = time(what is wrong) + time(how to fix it) + time(implement fix)
• Immutable servers
  • Servers are rebuilt from scratch for changes
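The cycle-time formula and the "do it twice" rule translate directly into code. A sketch with hypothetical helper names; the task log is assumed to be a plain list of task descriptions exported from your change or ticket system:

```python
from __future__ import annotations

from collections import Counter
from datetime import timedelta


def cycle_time(what_is_wrong: timedelta, how_to_fix: timedelta, implement_fix: timedelta) -> timedelta:
    """cycle time = time(what is wrong) + time(how to fix it) + time(implement fix)"""
    return what_is_wrong + how_to_fix + implement_fix


def automation_candidates(task_log: list[str]) -> list[str]:
    """Tasks done two or more times; by the slide's rule these should be automated."""
    counts = Counter(task_log)
    return sorted(task for task, n in counts.items() if n >= 2)
```

With illustrative numbers, a hand fix of 45 + 30 + 60 minutes gives a 2 h 15 min cycle. With immutable servers the "how to fix it" step is always "rebuild from the automated image", so the same incident might take 45 + 0 + 15 minutes. The figures are made up; measure your own.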
Survive Failure - Architecture
• Does Citrix still work if:
  • Your storage fails (SAN, local, whatever)?
  • Your database fails?
  • NetScaler fails?
• What can your users handle?
  • Most can handle getting logged off if they can log in again
  • Most can NOT handle:
    • Application hangs
    • Print failures
    • Being unable to log in or connect
Source: theoatmeal.com
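One way to answer "Does Citrix still work if X fails?" systematically is to list every component on the logon path and flag anything with a single instance. The inventory below is hypothetical, and the model is deliberately simple: it assumes every listed component is required for logon and that instances of the same component are interchangeable.

```python
from __future__ import annotations

# Hypothetical inventory: component -> number of independent instances.
INVENTORY = {
    "NetScaler": 2,
    "StoreFront": 2,
    "Delivery Controller": 2,
    "PVS server": 2,
    "SQL database": 1,
    "Storage array": 1,
}


def survives_single_failure(inventory: dict[str, int]) -> dict[str, bool]:
    """For each component: can users still log on if one instance fails?"""
    return {name: count >= 2 for name, count in inventory.items()}


def single_points_of_failure(inventory: dict[str, int]) -> list[str]:
    """Components whose loss alone would stop logons, in alphabetical order."""
    return sorted(name for name, ok in survives_single_failure(inventory).items() if not ok)
```

For this inventory the answer is the database and the storage array, which is exactly where the next slides focus.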
User Profiles and Folders
• Redirect folders as much as possible
  • This is where the data people actually use lives (My Docs, Downloads, etc.)
• Profiles
  • Profiles should be as light as possible
  • Can you use mandatory profile settings?
• Replicate profiles across 2 data centers
  • Profiles will not work on DFS-R without corruption (except with one-way replication)
  • Active/passive only (not active/active)
  • Split users so some are active in one data center and passive in the other
• Use cloud storage
  • Hack OneDrive for My Docs - https://office365drivemap.codeplex.com/
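To split users so that roughly half are active in each data center, a stable hash of the username is one option that needs no lookup table. A sketch, assuming two hypothetical site names; many environments would use AD group membership instead. Each user's profile would be written only in the active site and replicated one way to the passive site.

```python
from __future__ import annotations

import hashlib

DATA_CENTERS = ("DC-A", "DC-B")  # hypothetical site names


def profile_sites(username: str) -> tuple[str, str]:
    """Return (active, passive) data centers for a user; stable across logons."""
    # Lower-case first so "JDoe" and "jdoe" always land in the same site.
    digest = hashlib.sha256(username.lower().encode("utf-8")).digest()
    active = digest[0] % 2
    return DATA_CENTERS[active], DATA_CENTERS[1 - active]
```

Because the assignment depends only on the name, every broker and script computes the same answer, and a failover simply means treating the passive site as active for the affected half of the users.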
Storage / DB
• Use redundancy in the software, not the hardware
• PVS fails over on the fly (not for CIFS/SMB, though!)
• Local disk with PVS is a better choice than an expensive SAN (and likely performs better, especially with local SSD)
[Diagram: local disk on server; storage arrays Whiptail_61 and Whiptail_62; mirror-aware and standalone databases; SQL mirroring with primary database APS-DCXA1SQL01, mirror database APS-DCXA2SQL02, and witness (no database) APS-DCXDCSQL03]
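For a mirrored database, clients need to know both partners. SQL Server clients that support database mirroring (for example, ADO.NET SqlClient) accept a `Failover Partner` keyword in the connection string. The sketch below builds such a string from the server names in the diagram; the database name `CitrixSite` is hypothetical, and how each Citrix component consumes the string should be checked against Citrix documentation. The second function is a simplified model of when high-safety mirroring with a witness fails over automatically.

```python
def mirror_aware_connection_string(principal: str, mirror: str, database: str) -> str:
    """Build a SQL Server connection string that names the mirror as failover partner.

    The witness is not listed: it holds no database and only arbitrates
    automatic failover between the principal and the mirror.
    """
    parts = {
        "Server": principal,
        "Failover Partner": mirror,
        "Initial Catalog": database,
        "Integrated Security": "True",
    }
    return ";".join(f"{key}={value}" for key, value in parts.items())


def can_auto_failover(principal_up: bool, mirror_up: bool, witness_up: bool) -> bool:
    """Simplified rule: the mirror takes over automatically only when the principal
    is down and both the mirror and the witness are up. Assumes the mirror was
    synchronized and that the mirror and witness can reach each other."""
    return (not principal_up) and mirror_up and witness_up
```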
PVS HA/DR Components
• SQL database (highly available)
• Two PVS servers, each with its own vDisk store
• DHCP – can be split on 2008 R2/2012
• TFTP can be load balanced with a hardware load balancer
• 2 different locations
• Mirror – storage resilient
• Cluster – server resilient
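On Windows Server 2008 R2, DHCP redundancy is typically done with a split scope: both servers hold the same scope and each excludes the other's share of addresses, commonly 80/20. Windows Server 2012 also adds DHCP failover as an alternative. A sketch of the split arithmetic using Python's `ipaddress` module; the 80% default is a common convention, not a requirement, and the function name is hypothetical.

```python
from __future__ import annotations

import ipaddress


def split_scope(first: str, last: str, primary_percent: int = 80) -> tuple[tuple[str, str], tuple[str, str]]:
    """Split a DHCP address range between a primary and a secondary server.

    Both servers get the full scope; the primary excludes the secondary's range
    and vice versa. Returns ((primary_first, primary_last), (secondary_first, secondary_last)).
    """
    start = ipaddress.IPv4Address(first)
    end = ipaddress.IPv4Address(last)
    if end < start:
        raise ValueError("last address precedes first address")
    size = int(end) - int(start) + 1
    primary_count = size * primary_percent // 100  # integer math avoids float rounding
    boundary = start + primary_count  # first address leased by the secondary
    return (str(start), str(boundary - 1)), (str(boundary), str(end))
```

The same call works for a hypothetical range inside the VLAN 68 subnet from the later management-cluster slide; each server's exclusion range is simply the other server's half of the result.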
Network
• Multiple sites = NetScaler GSLB
  • Active/passive is easiest to set up
• Load balance all components if possible
  • Even TFTP: double up on every component
• No standalone ("stag") NetScalers in production
  • Use an HA/failover pair
  • The pair shares the VIP, but each node has its own IP address (so the VIP floats)
  • 1 NS + hypervisor != pair
[Diagram: NS LB instances spanning Zone US-East1 and Zone US-West1, with a VIP]
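Active/passive GSLB boils down to a priority list plus health checks: send users to the first healthy site. The sketch below is not NetScaler configuration; it only models that decision, with a crude TCP probe as an assumed health check.

```python
from __future__ import annotations

import socket
from typing import Callable


def tcp_healthy(host: str, port: int = 443, timeout: float = 2.0) -> bool:
    """Crude health probe: can we open a TCP connection to the site's VIP?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def select_site(sites_in_priority_order: list[str], is_healthy: Callable[[str], bool]) -> str | None:
    """Active/passive: return the first healthy site, or None if every site is down."""
    for site in sites_in_priority_order:
        if is_healthy(site):
            return site
    return None
```

With hypothetical host names, `select_site(["citrix-east.example.com", "citrix-west.example.com"], tcp_healthy)` returns the east site while it answers on port 443 and the west site otherwise. Passing the health check in as a function keeps the selection logic testable without a network.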
BLUE/GREEN: Limiting Downtime
• Like active/passive
• Don’t use DNS for this: you can’t trust TTL
• When to use:
  • ANY database/schema upgrade
  • A restore from backup would be too large or take too long
[Diagram: LB in front of App v1.0, App v1.0 (Db v1.0) and App v1.1, App v1.1 (Db v1.1)]
CANARY: Limiting Risk
• Like active/active, but with a purpose
• Canary in the coal mine: see if someone screams!
• Live to production
• Back up your data
• All nodes use the production database
• Route new connections to new nodes
[Diagram: LB in front of App v1.0, App v1.0, and App v1.1, all on Db v1.0]
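Both patterns come down to what the load balancer does with new connections. A toy router (a hypothetical class, not a NetScaler API) shows the difference: blue/green flips all new sessions from one pool to the other in a single step, and rollback is flipping back; canary sends a small weighted share of new sessions to the new version. In both cases existing sessions stay pinned to the node they started on, so no DNS TTL is involved.

```python
from __future__ import annotations

import random


class Router:
    """Toy load balancer: pins existing sessions, routes new ones by pool weight."""

    def __init__(self, pools: dict[str, list[str]], weights: dict[str, int], seed: int | None = None) -> None:
        self.pools = pools                  # pool name -> node names
        self.weights = dict(weights)        # pool name -> relative share of NEW sessions
        self.sessions: dict[str, str] = {}  # session id -> pinned node
        self.rng = random.Random(seed)

    def set_weights(self, weights: dict[str, int]) -> None:
        """Change where new sessions go; existing sessions are unaffected."""
        self.weights = dict(weights)

    def route(self, session_id: str) -> str:
        if session_id in self.sessions:  # existing connection: stay on its node
            return self.sessions[session_id]
        names = [name for name, w in self.weights.items() if w > 0]
        if not names:
            raise RuntimeError("no pool has a positive weight")
        pool = self.rng.choices(names, weights=[self.weights[n] for n in names])[0]
        node = self.rng.choice(self.pools[pool])
        self.sessions[session_id] = node
        return node
```

Blue/green: start with `{"blue": 100, "green": 0}` and call `set_weights({"blue": 0, "green": 100})` to cut over. Canary: start with something like `{"v1.0": 90, "v1.1": 10}` and raise the v1.1 weight as confidence grows.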
[Diagram: Atlanta Public Schools, Citrix Delivery Overview. Architect: Thomas Gamull; Company: Presidio; Date: 3/17/2014. Shows external and internal users, 24,000 zero clients, school districts, and printers; external and internal firewalls; 2 NetScaler MPX 11500 appliances; StoreFront, license servers, an App-V cluster (APPVPublish, APPVReport), a SQL mirror, PVS, profiles, user data, file server, and print servers; and three SCVMM-managed sites (XA1, XA2, XDC), each with 2 Delivery Controllers, 2 Provisioning Servers, and an SCVMM server, delivering Windows Server 2008 R2 desktops and applications (XA1, XA2) and Windows 7 desktops (XDC).]
CLL Data Center - 8,000 Concurrent Desktops for Students
[Diagram: XENAPP1 (APS-DCXA1) management cluster on hosts APS-DCXA1HOST01 and APS-DCXA1HOST02. Virtual switches: vSS-iSCSI-B; vSS-PVS-XAPP1-B (10.90.68.0/23, VLAN 68); vSS-XAPP1-A (10.90.72.0/23, VLAN 72); vSS-Servers-A. VMs: APS-DCXA1PVS01, APS-DCXA1SF01, APS-DCXA1DDC01, APS-DCXA1VMM01, APS-DCXA1WDM01, APS-DCXA1SQL01, APS-DCXA1APPV01, PVS02, SF02, DDC02.]
Rack Layout
[Diagram: rack with a NetScaler pair, a pair of top-of-rack switches, compute blades, compute rack-mount servers with local disk storage, and a pair of iSCSI/FC storage arrays]
Storage is always in pairs if needed
• Prefer multiple smaller arrays over a monolithic SAN
• Let the app/software do the work
Network redundancy is important
• Load balancers can remove switch dependencies
• Leverage common NIC cabling
Server choice can vary
• Blades are dense but lack local disk
• Rack-mount servers are often very flexible
• Without automation you will have scaling problems
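The "multiple smaller arrays" advice is blast-radius arithmetic. A sketch, assuming desktops are spread evenly across arrays and, when each vDisk has two copies on different arrays, that software (such as PVS servers with their own vDisk stores) serves from the surviving copy. The 8,000 figure is borrowed from the CLL data center slide purely as an example size.

```python
def blast_radius(total_desktops: int, arrays: int, copies_per_vdisk: int = 1) -> int:
    """Desktops that lose their storage when a single array fails."""
    if arrays < 1:
        raise ValueError("need at least one array")
    if copies_per_vdisk >= 2 and arrays >= 2:
        return 0  # assumed: software fails over to a copy on a surviving array
    return -(-total_desktops // arrays)  # ceiling division: the busiest array's share
```

One monolithic SAN puts all 8,000 desktops at risk; four arrays cap a single failure at 2,000; two copies per vDisk on separate arrays bring a single-array failure down to zero.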
“Je n’ai fait celle-ci plus longue que parce que je n’ai pas eu le loisir de la faire plus courte.” – Blaise Pascal, Provincial Letters: Letter XVI, 1657
English translation: “If I had more time, I would have written a shorter letter.”
Tom Gamull
@magicalyak
http://magicalyak.org

More Related Content

What's hot

Scaling Systems: Architectures that grow
Scaling Systems: Architectures that growScaling Systems: Architectures that grow
Scaling Systems: Architectures that growGibraltar Software
 
DevCon13 System Administration Basics
DevCon13 System Administration BasicsDevCon13 System Administration Basics
DevCon13 System Administration Basicssysnickm
 
Modern Cloud Fundamentals: Misconceptions and Industry Trends
Modern Cloud Fundamentals: Misconceptions and Industry TrendsModern Cloud Fundamentals: Misconceptions and Industry Trends
Modern Cloud Fundamentals: Misconceptions and Industry TrendsChristopher Bennage
 
E g innovations
E g innovationsE g innovations
E g innovationsdvmug1
 
Webinar: Five Problems Facing Business-Critical NFS Deployments
Webinar: Five Problems Facing Business-Critical NFS DeploymentsWebinar: Five Problems Facing Business-Critical NFS Deployments
Webinar: Five Problems Facing Business-Critical NFS DeploymentsStorage Switzerland
 
Virtualization for competitive advantage - Eric Vanderburg
Virtualization for competitive advantage - Eric VanderburgVirtualization for competitive advantage - Eric Vanderburg
Virtualization for competitive advantage - Eric VanderburgEric Vanderburg
 
Branch Office Infrastructure
Branch Office InfrastructureBranch Office Infrastructure
Branch Office InfrastructureAidan Finn
 
Building azure applications ireland
Building azure applications irelandBuilding azure applications ireland
Building azure applications irelandMichael Meagher
 
#Surgeconf Scaling Twitter to go After the Fail Whale
#Surgeconf Scaling Twitter to go After the Fail Whale#Surgeconf Scaling Twitter to go After the Fail Whale
#Surgeconf Scaling Twitter to go After the Fail WhaleJonathan Reichhold
 
Exploring Windows XP to 7 Migration Options
Exploring Windows XP to 7 Migration OptionsExploring Windows XP to 7 Migration Options
Exploring Windows XP to 7 Migration OptionsDavid Strom
 
Kscope 2013 delphix
Kscope 2013 delphixKscope 2013 delphix
Kscope 2013 delphixKyle Hailey
 
Capacity - Ransomware - Protection - Three Windows File Server Upgrades to Avoid
Capacity - Ransomware - Protection - Three Windows File Server Upgrades to AvoidCapacity - Ransomware - Protection - Three Windows File Server Upgrades to Avoid
Capacity - Ransomware - Protection - Three Windows File Server Upgrades to AvoidStorage Switzerland
 
10 zig
10 zig10 zig
10 zigdvmug1
 
Webinar: Eventual Consistency != Hopeful Consistency
Webinar: Eventual Consistency != Hopeful ConsistencyWebinar: Eventual Consistency != Hopeful Consistency
Webinar: Eventual Consistency != Hopeful ConsistencyDataStax
 
Super Sizing Youtube with Python
Super Sizing Youtube with PythonSuper Sizing Youtube with Python
Super Sizing Youtube with Pythondidip
 
Digital Transformation with Kubernetes, Containers, and Microservices
Digital Transformation with Kubernetes, Containers, and MicroservicesDigital Transformation with Kubernetes, Containers, and Microservices
Digital Transformation with Kubernetes, Containers, and MicroservicesLightbend
 

What's hot (20)

Scaling Systems: Architectures that grow
Scaling Systems: Architectures that growScaling Systems: Architectures that grow
Scaling Systems: Architectures that grow
 
DevCon13 System Administration Basics
DevCon13 System Administration BasicsDevCon13 System Administration Basics
DevCon13 System Administration Basics
 
Hp
HpHp
Hp
 
Modern Cloud Fundamentals: Misconceptions and Industry Trends
Modern Cloud Fundamentals: Misconceptions and Industry TrendsModern Cloud Fundamentals: Misconceptions and Industry Trends
Modern Cloud Fundamentals: Misconceptions and Industry Trends
 
Delphix
DelphixDelphix
Delphix
 
E g innovations
E g innovationsE g innovations
E g innovations
 
Webinar: Five Problems Facing Business-Critical NFS Deployments
Webinar: Five Problems Facing Business-Critical NFS DeploymentsWebinar: Five Problems Facing Business-Critical NFS Deployments
Webinar: Five Problems Facing Business-Critical NFS Deployments
 
Virtualization for competitive advantage - Eric Vanderburg
Virtualization for competitive advantage - Eric VanderburgVirtualization for competitive advantage - Eric Vanderburg
Virtualization for competitive advantage - Eric Vanderburg
 
Branch Office Infrastructure
Branch Office InfrastructureBranch Office Infrastructure
Branch Office Infrastructure
 
3 migration
3 migration3 migration
3 migration
 
Building azure applications ireland
Building azure applications irelandBuilding azure applications ireland
Building azure applications ireland
 
#Surgeconf Scaling Twitter to go After the Fail Whale
#Surgeconf Scaling Twitter to go After the Fail Whale#Surgeconf Scaling Twitter to go After the Fail Whale
#Surgeconf Scaling Twitter to go After the Fail Whale
 
Exploring Windows XP to 7 Migration Options
Exploring Windows XP to 7 Migration OptionsExploring Windows XP to 7 Migration Options
Exploring Windows XP to 7 Migration Options
 
Kscope 2013 delphix
Kscope 2013 delphixKscope 2013 delphix
Kscope 2013 delphix
 
Capacity - Ransomware - Protection - Three Windows File Server Upgrades to Avoid
Capacity - Ransomware - Protection - Three Windows File Server Upgrades to AvoidCapacity - Ransomware - Protection - Three Windows File Server Upgrades to Avoid
Capacity - Ransomware - Protection - Three Windows File Server Upgrades to Avoid
 
Supporting SQLserver
Supporting SQLserverSupporting SQLserver
Supporting SQLserver
 
10 zig
10 zig10 zig
10 zig
 
Webinar: Eventual Consistency != Hopeful Consistency
Webinar: Eventual Consistency != Hopeful ConsistencyWebinar: Eventual Consistency != Hopeful Consistency
Webinar: Eventual Consistency != Hopeful Consistency
 
Super Sizing Youtube with Python
Super Sizing Youtube with PythonSuper Sizing Youtube with Python
Super Sizing Youtube with Python
 
Digital Transformation with Kubernetes, Containers, and Microservices
Digital Transformation with Kubernetes, Containers, and MicroservicesDigital Transformation with Kubernetes, Containers, and Microservices
Digital Transformation with Kubernetes, Containers, and Microservices
 

Similar to Citrix XenDesktop: Dealing with Failure - SYN408

Data Virtualization: revolutionizing database cloning
Data Virtualization: revolutionizing database cloningData Virtualization: revolutionizing database cloning
Data Virtualization: revolutionizing database cloningKyle Hailey
 
The Hard Problems of Continuous Deployment
The Hard Problems of Continuous DeploymentThe Hard Problems of Continuous Deployment
The Hard Problems of Continuous DeploymentTimothy Fitz
 
Mtc learnings from isv & enterprise (dated - Dec -2014)
Mtc learnings from isv & enterprise (dated - Dec -2014)Mtc learnings from isv & enterprise (dated - Dec -2014)
Mtc learnings from isv & enterprise (dated - Dec -2014)Govind Kanshi
 
Mtc learnings from isv & enterprise interaction
Mtc learnings from isv & enterprise  interactionMtc learnings from isv & enterprise  interaction
Mtc learnings from isv & enterprise interactionGovind Kanshi
 
Performance Optimization of Cloud Based Applications by Peter Smith, ACL
Performance Optimization of Cloud Based Applications by Peter Smith, ACLPerformance Optimization of Cloud Based Applications by Peter Smith, ACL
Performance Optimization of Cloud Based Applications by Peter Smith, ACLTriNimbus
 
Denver devops : enabling DevOps with data virtualization
Denver devops : enabling DevOps with data virtualizationDenver devops : enabling DevOps with data virtualization
Denver devops : enabling DevOps with data virtualizationKyle Hailey
 
Webinar: Hyperconvergence is Broken, Learn How to Fix it!
Webinar: Hyperconvergence is Broken, Learn How to Fix it!Webinar: Hyperconvergence is Broken, Learn How to Fix it!
Webinar: Hyperconvergence is Broken, Learn How to Fix it!Storage Switzerland
 
Nagios Conference 2014 - Jeremy Rust - Avoiding Downtime Using Linux High Ava...
Nagios Conference 2014 - Jeremy Rust - Avoiding Downtime Using Linux High Ava...Nagios Conference 2014 - Jeremy Rust - Avoiding Downtime Using Linux High Ava...
Nagios Conference 2014 - Jeremy Rust - Avoiding Downtime Using Linux High Ava...Nagios
 
Planning For Catastrophe with IBM WAS and IBM BPM
Planning For Catastrophe with IBM WAS and IBM BPMPlanning For Catastrophe with IBM WAS and IBM BPM
Planning For Catastrophe with IBM WAS and IBM BPMWASdev Community
 
Webinar: Cloud Data Masking - Tips to Test Software Securely
Webinar: Cloud Data Masking - Tips to Test Software Securely Webinar: Cloud Data Masking - Tips to Test Software Securely
Webinar: Cloud Data Masking - Tips to Test Software Securely Skytap Cloud
 
Building data intensive applications
Building data intensive applicationsBuilding data intensive applications
Building data intensive applicationsAmit Kejriwal
 
Share on LinkedIn Share on Twitter Share on Facebook Share on Google+ Share b...
Share on LinkedIn Share on Twitter Share on Facebook Share on Google+ Share b...Share on LinkedIn Share on Twitter Share on Facebook Share on Google+ Share b...
Share on LinkedIn Share on Twitter Share on Facebook Share on Google+ Share b...Avere Systems
 
Accelerating workloads and bursting data with Google Dataproc & Alluxio
Accelerating workloads and bursting data with Google Dataproc & AlluxioAccelerating workloads and bursting data with Google Dataproc & Alluxio
Accelerating workloads and bursting data with Google Dataproc & AlluxioAlluxio, Inc.
 
How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...Alluxio, Inc.
 
Synergy 2015 Session Slides: SYN408 XenDesktop 7.6 Architecture - Dealing Wit...
Synergy 2015 Session Slides: SYN408 XenDesktop 7.6 Architecture - Dealing Wit...Synergy 2015 Session Slides: SYN408 XenDesktop 7.6 Architecture - Dealing Wit...
Synergy 2015 Session Slides: SYN408 XenDesktop 7.6 Architecture - Dealing Wit...Citrix
 
Agile Data: revolutionizing data and database cloning
Agile Data: revolutionizing data and database cloningAgile Data: revolutionizing data and database cloning
Agile Data: revolutionizing data and database cloningKyle Hailey
 
Can Your Mobile Infrastructure Survive 1 Million Concurrent Users?
Can Your Mobile Infrastructure Survive 1 Million Concurrent Users?Can Your Mobile Infrastructure Survive 1 Million Concurrent Users?
Can Your Mobile Infrastructure Survive 1 Million Concurrent Users?TechWell
 
Patterns and Pains of Migrating Legacy Applications to Kubernetes
Patterns and Pains of Migrating Legacy Applications to KubernetesPatterns and Pains of Migrating Legacy Applications to Kubernetes
Patterns and Pains of Migrating Legacy Applications to KubernetesJosef Adersberger
 
Patterns and Pains of Migrating Legacy Applications to Kubernetes
Patterns and Pains of Migrating Legacy Applications to KubernetesPatterns and Pains of Migrating Legacy Applications to Kubernetes
Patterns and Pains of Migrating Legacy Applications to KubernetesQAware GmbH
 

Similar to Citrix XenDesktop: Dealing with Failure - SYN408 (20)

Data Virtualization: revolutionizing database cloning
Data Virtualization: revolutionizing database cloningData Virtualization: revolutionizing database cloning
Data Virtualization: revolutionizing database cloning
 
The Hard Problems of Continuous Deployment
The Hard Problems of Continuous DeploymentThe Hard Problems of Continuous Deployment
The Hard Problems of Continuous Deployment
 
Mtc learnings from isv & enterprise (dated - Dec -2014)
Mtc learnings from isv & enterprise (dated - Dec -2014)Mtc learnings from isv & enterprise (dated - Dec -2014)
Mtc learnings from isv & enterprise (dated - Dec -2014)
 
Mtc learnings from isv & enterprise interaction
Mtc learnings from isv & enterprise  interactionMtc learnings from isv & enterprise  interaction
Mtc learnings from isv & enterprise interaction
 
Performance Optimization of Cloud Based Applications by Peter Smith, ACL
Performance Optimization of Cloud Based Applications by Peter Smith, ACLPerformance Optimization of Cloud Based Applications by Peter Smith, ACL
Performance Optimization of Cloud Based Applications by Peter Smith, ACL
 
Denver devops : enabling DevOps with data virtualization
Denver devops : enabling DevOps with data virtualizationDenver devops : enabling DevOps with data virtualization
Denver devops : enabling DevOps with data virtualization
 
Webinar: Hyperconvergence is Broken, Learn How to Fix it!
Webinar: Hyperconvergence is Broken, Learn How to Fix it!Webinar: Hyperconvergence is Broken, Learn How to Fix it!
Webinar: Hyperconvergence is Broken, Learn How to Fix it!
 
Nagios Conference 2014 - Jeremy Rust - Avoiding Downtime Using Linux High Ava...
Nagios Conference 2014 - Jeremy Rust - Avoiding Downtime Using Linux High Ava...Nagios Conference 2014 - Jeremy Rust - Avoiding Downtime Using Linux High Ava...
Nagios Conference 2014 - Jeremy Rust - Avoiding Downtime Using Linux High Ava...
 
Planning For Catastrophe with IBM WAS and IBM BPM
Planning For Catastrophe with IBM WAS and IBM BPMPlanning For Catastrophe with IBM WAS and IBM BPM
Planning For Catastrophe with IBM WAS and IBM BPM
 
Webinar: Cloud Data Masking - Tips to Test Software Securely
Webinar: Cloud Data Masking - Tips to Test Software Securely Webinar: Cloud Data Masking - Tips to Test Software Securely
Webinar: Cloud Data Masking - Tips to Test Software Securely
 
Building data intensive applications
Building data intensive applicationsBuilding data intensive applications
Building data intensive applications
 
SQL Saturday San Diego
SQL Saturday San DiegoSQL Saturday San Diego
SQL Saturday San Diego
 
Share on LinkedIn Share on Twitter Share on Facebook Share on Google+ Share b...
Share on LinkedIn Share on Twitter Share on Facebook Share on Google+ Share b...Share on LinkedIn Share on Twitter Share on Facebook Share on Google+ Share b...
Share on LinkedIn Share on Twitter Share on Facebook Share on Google+ Share b...
 
Accelerating workloads and bursting data with Google Dataproc & Alluxio
Accelerating workloads and bursting data with Google Dataproc & AlluxioAccelerating workloads and bursting data with Google Dataproc & Alluxio
Accelerating workloads and bursting data with Google Dataproc & Alluxio
 
How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...
 
Synergy 2015 Session Slides: SYN408 XenDesktop 7.6 Architecture - Dealing Wit...
Synergy 2015 Session Slides: SYN408 XenDesktop 7.6 Architecture - Dealing Wit...Synergy 2015 Session Slides: SYN408 XenDesktop 7.6 Architecture - Dealing Wit...
Synergy 2015 Session Slides: SYN408 XenDesktop 7.6 Architecture - Dealing Wit...
 
Agile Data: revolutionizing data and database cloning
Agile Data: revolutionizing data and database cloningAgile Data: revolutionizing data and database cloning
Agile Data: revolutionizing data and database cloning
 
Can Your Mobile Infrastructure Survive 1 Million Concurrent Users?
Can Your Mobile Infrastructure Survive 1 Million Concurrent Users?Can Your Mobile Infrastructure Survive 1 Million Concurrent Users?
Can Your Mobile Infrastructure Survive 1 Million Concurrent Users?
 
Patterns and Pains of Migrating Legacy Applications to Kubernetes
Patterns and Pains of Migrating Legacy Applications to KubernetesPatterns and Pains of Migrating Legacy Applications to Kubernetes
Patterns and Pains of Migrating Legacy Applications to Kubernetes
 
Patterns and Pains of Migrating Legacy Applications to Kubernetes
Patterns and Pains of Migrating Legacy Applications to KubernetesPatterns and Pains of Migrating Legacy Applications to Kubernetes
Patterns and Pains of Migrating Legacy Applications to Kubernetes
 

Recently uploaded

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 

Recently uploaded (20)

Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 

Citrix XenDesktop: Dealing with Failure - SYN408

  • 1. SYN408: XenDesktop 7.6 Architecture: dealing with failure Tom Gamull – Ericsson Consulting Manager Citrix Synergy – May 2015 @magicalyak
  • 2. WHAT WOULD YOU SAY YOU DO HERE? 2
  • 3. Prevent Failures to begin with •Failures are bad events •Today’s technology should be bulletproof •Is 99.999% uptime the new normal?
  • 4. “The perfect is the enemy of the good” - Voltaire
  • 5. Our thinking is broken Customer: “I can’t get to my desktop” Support/Admin: “The desktops aren’t working because storage failed” CIO/Boss: “We need to ensure storage never fails”
  • 6. Solution • Upgrade/Redundant SAN • Somehow believe replication can occur without penalty (sales guy promised) • Storage stays up!
  • 7. Netflix Chaos Monkey 2010: Netflix moves to AWS 2011: US-East Outage - Netflix posts lessons learned The best way to avoid failure is to fail constantly Since 2013: Chaos Monkey is run in Production except holidays and weekends
  • 8.
  • 9. Before you buy more stuff – try this • How do you respond to events today? • How long does it take to identify them? • How long to solve them? • Mean time between failures (MTBF) is legacy • Focus on mean time to resolution (MTTR) or cycle time • MTBF vs. MTTR
  • 10. Before you buy more stuff – try this • How are you rolling out Citrix or changes? • AUTOMATE!!! • RULE: If you do it twice, it should be automated • Focus on reducing Cycle time • time(what is wrong) + time(how to fix it) + time(implement fix) = cycle time • Immutable Servers • Servers are rebuilt from scratch for changes
  • 11. Survive Failure - Architecture • Does Citrix still work if: • Your storage fails (SAN, local, whatever)? • Your database fails? • NetScaler fails? • What can your users handle? • Most can handle getting logged off if they can log in again • Most can NOT handle • Application hangs • Print failures • Can’t log in or connect Source: theoatmeal.com
  • 12. User Profiles and Folders • Redirect folders as much as possible • This is where the data people use lives (My Docs, Downloads, etc.) • Profiles • Profiles should be as light as possible • Can you use mandatory profile settings? • Replicate profiles across 2 data centers • Profiles are not going to work on DFS-R without corruption (except one-way) • Active/passive only (not active/active) • Split users so some are active in one data center and passive in the other • Use cloud storage • Hack OneDrive for My Docs - https://office365drivemap.codeplex.com/
  • 13. Storage / DB • Use redundancy in the software, not hardware • PVS fails over on the fly (not for CIFS/SMB though!) • Local disk with PVS is better than an expensive SAN (and likely performs better, especially if you have local SSD) [Diagram: local disk on server; Whiptail_61 and Whiptail_62; mirror-aware and standalone databases: primary database APS-DCXA1SQL01, mirror database APS-DCXA2SQL02, witness (no database) APS-DCXDCSQL03]
  • 14. PVS HA/DR Components [Diagram: highly available SQL database; two PVS servers in 2 different locations, each with its own vDisk store; mirror for storage resilience, cluster for server resilience] • DHCP can be split on 2008 R2/2012 • TFTP can be load balanced with a hardware load balancer
  • 15. Network • Multiple sites = NetScaler GSLB • Active/passive is easiest to set up • All components should be load balanced if possible • Even TFTP; double up on every component • No standalone (“stag”) NetScalers in production • HA/failover pair • They share the VIP but have separate IP info (so the VIP floats) • 1 NS + hypervisor != pair [Diagram: NetScaler GSLB across Zone US-East1 and Zone US-West1, each zone with an NS LB, in front of a shared VIP]
  • 16. BLUE/GREEN [Diagram: LB in front of two App v1.0 nodes on Db v1.0 and two App v1.1 nodes on Db v1.1] Limiting Downtime • Like active/passive • Don’t use DNS for this; you can’t trust TTL When to use • ANY database/schema upgrade • Restore from backup is too large/long
  • 17. CANARY [Diagram: LB in front of two App v1.0 nodes and one App v1.1 node, all on Db v1.0] • Like active/active but with a purpose • Canary in the coal mine • See if someone screams! • Live to production Limiting Risk • Back up your data • All nodes use production database • Route new connections to new nodes
  • 18. [Diagram: Atlanta Public Schools Citrix Delivery Overview. Architect: Thomas Gamull; Company: Presidio; Date: 3/17/2014. External and internal firewalls; 2 NetScaler MPX 11500; external and internal users; 24,000 zero clients in school districts; printers. Three SCVMM clusters (XA1, XA2, XDC), each with 2 Delivery Controllers, 2 Provisioning Servers, and an SCVMM server, hosting 2008 R2 desktops and applications (two clusters) and Windows 7 desktops (one cluster). Also shown: Citrix PVS, license servers, App-V cluster (APPVPublish, APPVReport), StoreFront, SQL mirror, profiles and user data, file server, print servers. CLL Data Center: 8,000 concurrent desktops for students.]
  • 19. [Diagram: XENAPP1 (APS-DCXA1) management cluster on hosts APS-DCXA1HOST01 and APS-DCXA1HOST02. Virtual switches: vSS-iSCSI-B; vSS-PVS-XAPP1-B (10.90.68.0/23, VLAN 68); vSS-XAPP1-A (10.90.72.0/23, VLAN 72); vSS-Servers-A. Server VMs: APS-DCXA1PVS01, APS-DCXA1SF01, APS-DCXA1DDC01, APS-DCXA1VMM01, APS-DCXA1WDM01, APS-DCXA1SQL01, APS-DCXA1APPV01, plus PVS02, SF02, DDC02.]
  • 20. Rack Layout [Diagram: a NetScaler pair, two top-of-rack switches, compute blades, compute rack-mount servers with local disk storage, and two iSCSI/FC storage arrays] Storage is always in pairs if needed • Prefer multiple smaller arrays over a monolithic SAN • Let app/software do the work Network redundancy is important • Load balancers can remove switch dependencies • Leverage common NIC cabling Server choice can vary • Blades are dense but lack local disk • Rack mounts are often very flexible • Without automation you will have scaling problems
  • 21. “Je n’ai fait celle-ci plus longue que parce que je n’ai pas eu le loisir de la faire plus courte.” – Blaise Pascal, Provincial Letters: Letter XVI, 1657 English Translation: “If I had more time, I would have written a shorter letter.”
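Slide 9 argues for focusing on MTTR rather than MTBF. Here is a minimal Python sketch of why, assuming the common steady-state formula availability = MTBF / (MTBF + MTTR) and a hypothetical incident log. With that formula, halving the time to resolve an outage raises availability exactly as much as doubling the time between failures. Five nines leaves only about 5.26 minutes of downtime per year. The `Incident` type and all the numbers are illustrative, not measurements.

```python
from dataclasses import dataclass
from typing import List, Tuple

HOURS_PER_YEAR = 365 * 24

@dataclass
class Incident:
    start_h: float     # hours into the observation window when the outage began
    restored_h: float  # hours into the window when users could work again

def mtbf_mttr(incidents: List[Incident], window_h: float) -> Tuple[float, float]:
    """Return (MTBF, MTTR) in hours for one observation window."""
    if not incidents:
        raise ValueError("no incidents: MTBF is undefined for this window")
    downtime = sum(i.restored_h - i.start_h for i in incidents)
    uptime = window_h - downtime
    return uptime / len(incidents), downtime / len(incidents)

def availability(mtbf: float, mttr: float) -> float:
    """Steady-state availability: the share of time the service is up."""
    return mtbf / (mtbf + mttr)

def downtime_minutes_per_year(avail: float) -> float:
    """Downtime budget implied by an availability target."""
    return (1.0 - avail) * HOURS_PER_YEAR * 60

if __name__ == "__main__":
    # Hypothetical month: three 2-hour outages in a 720-hour window.
    log = [Incident(10, 12), Incident(200, 202), Incident(500, 502)]
    mtbf, mttr = mtbf_mttr(log, 720)
    print(f"MTBF={mtbf:.0f}h MTTR={mttr:.0f}h availability={availability(mtbf, mttr):.4%}")
    # Halving MTTR gives the same availability as doubling MTBF.
    print(f"{availability(mtbf, mttr / 2):.4%} vs {availability(mtbf * 2, mttr):.4%}")
    print(f"five nines allows {downtime_minutes_per_year(0.99999):.2f} min/year")
```

Improving detection and repair is usually cheaper than buying hardware that fails half as often, and by this formula it pays off just as well.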
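Slide 10 defines cycle time as time(what is wrong) + time(how to fix it) + time(implement fix). The sketch below, in Python, adds up those three phases. It also applies the "if you do it twice, it should be automated" rule to a log of manual tasks. The task names and minute counts are hypothetical. Note that an automated rebuild of an immutable server shrinks only the last term, so the first two still need attention.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List

@dataclass
class Cycle:
    identify_min: float   # time(what is wrong)
    plan_min: float       # time(how to fix it)
    implement_min: float  # time(implement fix)

    def total_min(self) -> float:
        """Cycle time, added up exactly as on the slide."""
        return self.identify_min + self.plan_min + self.implement_min

def tasks_to_automate(task_log: Iterable[str]) -> List[str]:
    """Apply 'if you do it twice, it should be automated' to a log of manual tasks."""
    counts = Counter(task_log)
    return sorted(task for task, n in counts.items() if n >= 2)

if __name__ == "__main__":
    # Hypothetical numbers: same diagnosis, but the fix is an automated rebuild
    # of an immutable server instead of hand-patching the existing one.
    manual = Cycle(identify_min=30, plan_min=60, implement_min=120)
    rebuilt = Cycle(identify_min=30, plan_min=60, implement_min=15)
    print(manual.total_min(), rebuilt.total_min())
    log = ["add PVS target", "update vDisk", "add PVS target", "reset profile"]
    print(tasks_to_automate(log))
```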
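Slide 12 suggests splitting users so that each data center is active for some profiles and passive for the rest. One way to make that split deterministic is to hash the user name, as in this Python sketch. The data center names, the hashing scheme, and both helper functions are assumptions for illustration. A real deployment could just as well split by OU or user group, and the passive copy remains a one-way replica, not active/active.

```python
import hashlib
from typing import Sequence, Set, Tuple

def profile_sites(username: str, sites: Sequence[str]) -> Tuple[str, str]:
    """Return (active, passive) data centers for one user's profile.

    sha256 is used instead of hash() because Python salts str hashes per
    process, and the assignment must be identical on every server.
    """
    if len(sites) != 2:
        raise ValueError("this sketch assumes exactly two data centers")
    i = hashlib.sha256(username.lower().encode("utf-8")).digest()[0] % 2
    return sites[i], sites[1 - i]

def logon_site(username: str, sites: Sequence[str], down: Set[str]) -> str:
    """Log on where the profile is active; fall back to the one-way replica."""
    active, passive = profile_sites(username, sites)
    if active not in down:
        return active
    if passive not in down:
        return passive
    raise RuntimeError(f"no data center available for {username}")

if __name__ == "__main__":
    sites = ["DC-A", "DC-B"]
    for user in (f"student{n:03d}" for n in range(10)):
        print(user, profile_sites(user, sites), logon_site(user, sites, down={"DC-A"}))
```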
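Slide 13 and editor's note 12 describe SQL mirroring with a principal, a mirror, and a witness that holds no database. This Python sketch models the quorum rule in simplified form, which shows why losing both the mirror and the witness stops the principal. It is an illustration, not SQL Server's implementation. It ignores network partitions and synchronization state, and the node labels stand in for servers such as APS-DCXA1SQL01, APS-DCXA2SQL02, and APS-DCXDCSQL03.

```python
from itertools import combinations
from typing import Optional, Set

PRINCIPAL, MIRROR, WITNESS = "principal", "mirror", "witness"

def serving_node(up: Set[str]) -> Optional[str]:
    """Which partner serves the database, given which of the three servers are up.

    Simplified model of mirroring with a witness: a partner needs quorum (itself
    plus at least one other server) to serve. Assumes every link between running
    servers works and the mirror was synchronized when the principal was lost.
    """
    if PRINCIPAL in up and (MIRROR in up or WITNESS in up):
        return PRINCIPAL
    if PRINCIPAL not in up and MIRROR in up and WITNESS in up:
        return MIRROR  # automatic failover: mirror and witness agree
    return None  # no quorum: offline until a server returns or an admin forces service

if __name__ == "__main__":
    nodes = [PRINCIPAL, MIRROR, WITNESS]
    for k in range(len(nodes), -1, -1):
        for alive in combinations(nodes, k):
            print(f"{', '.join(alive) or 'nothing':35} -> {serving_node(set(alive))}")
```

The printed table shows that the witness only needs to be cheap and independent. It never serves data, which fits the note's habit of putting it on local disk.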
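Slide 15 combines two layers: an HA pair of NetScalers sharing a floating VIP, and active/passive GSLB across zones. The Python sketch below models only the decision logic. It assumes a priority-ordered site list and uses documentation IP addresses (192.0.2.10, 198.51.100.10) as stand-in VIPs. It is not NetScaler configuration; on the appliance this is typically handled by GSLB virtual servers with health monitors.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class HaPair:
    """Two NetScalers sharing one VIP; each keeps its own IP information."""
    primary_up: bool
    secondary_up: bool

    def vip_owner(self) -> Optional[str]:
        # The VIP floats to whichever node is alive, primary first.
        if self.primary_up:
            return "primary"
        if self.secondary_up:
            return "secondary"
        return None

@dataclass
class Site:
    zone: str
    vip: str
    pair: HaPair

def gslb_answer(sites_by_priority: List[Site]) -> Optional[str]:
    """Active/passive GSLB: answer with the VIP of the first site whose pair is up."""
    for site in sites_by_priority:
        if site.pair.vip_owner() is not None:
            return site.vip
    return None

if __name__ == "__main__":
    east = Site("US-East1", "192.0.2.10", HaPair(True, True))
    west = Site("US-West1", "198.51.100.10", HaPair(True, True))
    print(gslb_answer([east, west]))                         # active site
    east.pair.primary_up = False
    print(gslb_answer([east, west]), east.pair.vip_owner())  # VIP floated, same site
    east.pair.secondary_up = False
    print(gslb_answer([east, west]))                         # passive site takes over
```

A single NetScaler on a hypervisor would be one `HaPair` with no second node. In that case one failure drops straight to the GSLB layer, or to nothing.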
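Slide 16 describes blue/green deployment: two complete environments behind a load balancer. The switch happens at the load balancer rather than in DNS, because cached TTLs can't be trusted. Here is a minimal Python sketch, assuming hypothetical node names and simple round-robin. The router sends all traffic to the live pool, `switch()` flips to the other pool under a lock, and calling it a second time is the rollback.

```python
import threading
from typing import Dict, List

class BlueGreenRouter:
    """Send every connection to one complete environment; flip at the LB, not in DNS."""

    def __init__(self, pools: Dict[str, List[str]], live: str = "blue") -> None:
        if set(pools) != {"blue", "green"} or not all(pools.values()):
            raise ValueError("need non-empty 'blue' and 'green' pools")
        if live not in pools:
            raise ValueError("live must be 'blue' or 'green'")
        self._pools = pools
        self._live = live
        self._next = 0
        self._lock = threading.Lock()

    @property
    def live(self) -> str:
        return self._live

    def route(self) -> str:
        """Round-robin across the live environment's app nodes."""
        with self._lock:
            nodes = self._pools[self._live]
            node = nodes[self._next % len(nodes)]
            self._next += 1
            return node

    def switch(self) -> str:
        """Make the idle environment live; calling it again is the rollback."""
        with self._lock:
            self._live = "green" if self._live == "blue" else "blue"
            return self._live

if __name__ == "__main__":
    # Blue: App v1.0 on Db v1.0. Green: App v1.1 on a cloned, upgraded Db v1.1.
    lb = BlueGreenRouter({"blue": ["app-v1.0-a", "app-v1.0-b"],
                          "green": ["app-v1.1-a", "app-v1.1-b"]})
    print(lb.route(), lb.route())
    lb.switch()  # cut over once green passes its checks
    print(lb.live, lb.route())
    lb.switch()  # "oops, my feature blew up": flip back to blue
    print(lb.live, lb.route())
```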
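Slide 17's canary routes new connections to new nodes while every node shares the production database. This Python sketch assumes session IDs are visible to the load balancer. Each new session is bucketed by a stable hash into 0 to 99, buckets below `percent` go to the canary, and a session stays pinned to the pool it started on. Names and percentages are hypothetical; real load balancers express the same idea with weights and persistence settings.

```python
import hashlib
from typing import Dict

class CanaryRouter:
    """Send a slice of NEW sessions to the new build; existing sessions stay sticky."""

    def __init__(self, stable: str, canary: str, percent: int = 0) -> None:
        self.stable = stable
        self.canary = canary
        self.percent = percent             # share of new sessions for the canary, 0-100
        self._pinned: Dict[str, str] = {}  # session id -> pool it started on

    @staticmethod
    def _bucket(session_id: str) -> int:
        digest = hashlib.sha256(session_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % 100

    def route(self, session_id: str) -> str:
        if session_id not in self._pinned:
            use_canary = self._bucket(session_id) < self.percent
            self._pinned[session_id] = self.canary if use_canary else self.stable
        return self._pinned[session_id]

if __name__ == "__main__":
    lb = CanaryRouter(stable="app-v1.0", canary="app-v1.1", percent=10)
    picks = [lb.route(f"session-{n}") for n in range(1000)]
    print(picks.count("app-v1.1"), "of 1000 new sessions hit the canary")
    lb.percent = 0  # someone screamed: stop sending new sessions to v1.1
    print(lb.route("session-5000"))
```

Because both pools use the same database, this fits only changes that don't break the schema contract, as note 15 points out.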

Editor's Notes

  1. Why: This information isn’t useful without explaining why, so I will spend no more than half the speaking time on this. You don’t need to write anything down; just try to grasp my message. What: Some examples, the actual architecture, and things you can do. I will finish with at least 10 minutes for Q&A. I respond to email and Twitter.
  2. I was the Practice Manager for Workforce Mobility at Presidio, which is a great company and Citrix partner. One of my accomplishments there was the Atlanta Public Schools XenApp/XenDesktop 7 deployment for 50,000 students (one of the first large XenDesktop 7 deployments from a partner). I honestly wanted to do more and joined Ericsson earlier this year as a Consulting Manager – I could list buzzwords like DevOps, OpenStack, CI/CD, SDN and NFV but in reality I currently help customers align their entire deployment pipeline (including software development) with how their company produces value.
  3. Failures can stop business flow and cost companies money. If you’ve ever worked in Operations, you might think its sole job is to prevent failures above everything else. On top of that, we buy better hardware every year and expect stable performance. Yet why do newer phones seem to have battery life issues and problems making calls? It’s amazing that I can grab a cell phone from 10 years ago and it will last all day on a charge. I had a Volkswagen Beetle that still runs; we seriously can’t make data center hardware reliable? This philosophy boils down to five nines: 99.999% uptime seems to be written into every CIO’s wishlist for any architecture today.
  4. Failure is hard to avoid or predict. We really should look at things a different way. I also realize that many of us have different roles and may think we don’t have a say in this. I disagree: if you can relate anything back to business value, you will get people’s ears or, at worst, a better job.
  5. Let’s walk through a hypothetical. Our customer or end user can’t get to the desktop, and we find out the desktop can’t pull the profile data from the storage server. In fact, our storage appears to have failed! “Never mind the details!” says the Director or CIO. We need this fixed now. We need to ensure storage does not fail again!
  6. Let’s get a storage expert in here! The solution is a new or upgraded SAN with better performance, more reliability, and a promise that it will not fail or your money back (terms and conditions apply!). The problem with this solution is that it confuses eliminating a problem with finding a solution. It does not address the underlying cause. Could this have been the storage driver? How does SAN uptime prevent that? What if it’s just space, performance, or latency? Just because the desktop failed when storage did doesn’t mean storage is the cause. You are now forever justifying this fix (can you honestly admit it’s wrong if you find out?). Also, how’s the SAN fabric looking?
  7. One of the first systems our engineers built in AWS is called the Chaos Monkey. The Chaos Monkey’s job is to randomly kill instances and services within our architecture. If we aren’t constantly testing our ability to succeed despite failure, then it isn’t likely to work when it matters most – in the event of an unexpected outage. Rambo architecture: each component can survive failures of the other components it depends on. If our recommendations system is down, we degrade the quality of our responses to our customers, but we still respond. We’ll show popular titles instead of personalized picks. If our search system is intolerably slow, streaming should still work perfectly fine. http://techblog.netflix.com/2010/12/5-lessons-weve-learned-using-aws.html http://techblog.netflix.com/2011/04/lessons-netflix-learned-from-aws-outage.html
  8. Automation is key. A few years ago Aaron Parker gave a session and asked why people aren’t automating; there is no reason not to. I did not automate much back then, but I do now! If you do something twice, you need to automate it. Humans are not good at repetitive entry, but computers are. Chef or Puppet is something you should look into if you haven’t yet. Our focus should ultimately be on cycle time. Finally, the concept of immutable servers is also worthwhile: treat servers like inkjet printers; it’s often easier to just replace them. http://www.thoughtworks.com/insights/blog/rethinking-building-cloud-part-4-immutable-servers
  9. Let’s talk about architecture for XenDesktop 7.6 and how we survive failure. Think about keeping desktops and apps running even when components they depend on fail. A better way to frame this is to evaluate what end users can handle. Surprisingly, I’ve found most users handle logoffs better than slow performance. People often don’t report a logoff if they can log back in, but when a print job takes 30 minutes or longer, you can be assured of a ticket.
  10. Another session, SYN502, discussed issues with SMB, folder redirection, and newer technologies. I’m still a fan of redirection, but mainly for Documents and file data, not for Desktop or AppData. This is one of the biggest areas to tackle for failure issues: in my earlier example, it was the profile that failed, causing the desktop not to load. I have seen profile replication in failover scenarios where one data center is primary for one set of users while the other is primary for another set. End-user feedback is important for resolving this issue: is it worth the hardware and slowness just because people use the Desktop as their My Documents? Usually not. For more info see Synergy 2015 - SYN502: I’ve got 99 problems, and folder redirection is every one of them (Helge Klein, Sean Bass, Aaron Parker)
  11. Did I mention how easy it is to scale later using cheap hardware, storage, compute? Perhaps take out APS refs in the picture?
  12. For HA we should always add another PVS server with a SEPARATE vDisk store (you can mix SAN, local disk, etc. here). If we leave DHCP alone, we add a point of failure where target devices may fail to boot. You can use 2008 R2 or 2012 to provide split scope, or use a more redundant solution such as BlueCat or Infoblox. PXE and TFTP are another HA concern; you can only provide true HA with a hardware load balancer. I often do NOT provide HA for TFTP, but if you have a hardware load balancer there is no reason not to. PXE will load the bootstrap, which won’t work unless your PVS servers are specified in it (you need to add them). Use mirroring with SQL if you can. It’s great, and clustering doesn’t really protect you from issues such as the storage failing! If your storage will never ever fail, that’s awesome, but keep in mind I can use local storage and mirroring and get pretty much the same benefits, except for the feeling of spending tons of money. Clustering lets you update SQL nodes one at a time while keeping SQL up; I generally don’t do this, but I do recommend mirroring. Mirroring with automatic failover requires a witness server, a third server that does nothing other than help with the quorum (SQL deciding which server is primary). If you set this up and lose a secondary and a witness, the primary will stop. I often put my witness on a local disk.
  13. Load balancers are your friend. I reference NetScaler for obvious reasons, but keep in mind there are free, Linux-based virtual load balancers that can do some of this work. You don’t have to be a Cisco CCIE to figure this stuff out either; there are tons of blogs and walkthroughs out there to guide you. That being said, GSLB is a LOT harder than just load balancing internal components.
  14. This diagram is actually for application/dev updates, but the theory is the same for other scenarios. We can use blue/green for upgrades, new feature rollouts, etc. Note that we actually snapshot or clone the database, then flip over to the other application set (or data center, database, etc.). If your backups are too long and big, this method of updating or rolling out changes is ideal. Limiting Downtime (Green/Blue Deployments): create a live replica of the database; duplicate all app nodes with the new code/config; adjust routing to activate the new code. When to use: you are updating your schema; no object-versioned DB; no feature flags; you can test the feature outside production; restoring from a backup is not practical (big data sets). Plan for the worst-case scenario: Oops, my feature blew up. http://www.slideshare.net/adrianjotto/docker-102-immutable-infrastructure
  15. Limiting Risk: requires feature flags or sticky LB sessions; back up your data; all nodes use the production database; route new connections to new nodes. When to use: no contract-breaking changes to the schema; you have an object-versioned DB; you use feature flags; it’s impractical to test the feature outside prod; you have a full backup of your data and can restore it. http://www.slideshare.net/adrianjotto/docker-102-immutable-infrastructure
  16. Note the right side with 3 SCVMM (Hyper-V) clusters: we use all of the clusters but can survive the failure of an entire cluster. All the clusters share the same SQL mirror, StoreFront farm, and file server for profiles.
  17. This is one Hyper-V cluster (of 2 or more). 2 blades do the work, so one blade can fail and my cluster stays up. If they both fail, I have another cluster. I have 2 of everything. Don’t skimp on anything; make it two or more of EVERYTHING you can.
  18. Notice the pair of NetScalers at the top of the rack? I have two storage appliances at each data center (in this case, flash storage used with PVS). Primary data center, CLL: 48 blades with an Invicta and two 6296s in each rack. Secondary, Brewer: 32 blades, 2 Invictas, 2 NetScalers, and 2 6296s in a single rack.
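The Netflix quote in note 7 describes degrading instead of failing: show popular titles when the recommendations system is down. Here is a minimal Python sketch of that pattern. It assumes the personalization call signals trouble by raising `ConnectionError` or `TimeoutError`; `titles_for` and `POPULAR_TITLES` are hypothetical names, not Netflix code.

```python
from typing import Callable, List

# Hypothetical static fallback list, e.g. refreshed on a schedule.
POPULAR_TITLES = ["Popular title 1", "Popular title 2", "Popular title 3"]

def titles_for(user: str, personalized: Callable[[str], List[str]]) -> List[str]:
    """Degrade instead of failing when the personalization dependency breaks."""
    try:
        picks = personalized(user)
    except (ConnectionError, TimeoutError):
        return list(POPULAR_TITLES)  # worse answer, but still an answer
    return picks or list(POPULAR_TITLES)

if __name__ == "__main__":
    def healthy(user: str) -> List[str]:
        return [f"Pick for {user}"]

    def broken(user: str) -> List[str]:
        raise TimeoutError("recommendation service too slow")

    print(titles_for("alice", healthy))
    print(titles_for("alice", broken))
```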
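Note 12 suggests split-scope DHCP so that losing one DHCP server doesn't stop target devices from booting. Each server leases only from its own non-overlapping slice of the same subnet. Here is a Python sketch using the standard `ipaddress` module. It assumes an 80/20 split (a common convention, not a requirement) and uses the PVS VLAN 10.90.68.0/23 from slide 19 as the example. It does not reserve gateways or other static addresses, which a real scope would exclude.

```python
import ipaddress
from typing import Tuple

Range = Tuple[str, str]

def split_scope(cidr: str, first_share: float = 0.8) -> Tuple[Range, Range]:
    """Split the usable host addresses of `cidr` into two non-overlapping ranges.

    Server 1 leases only from the first range and server 2 only from the second,
    so either one can keep PXE clients booting while the other is down.
    """
    hosts = list(ipaddress.ip_network(cidr).hosts())
    cut = int(len(hosts) * first_share)
    if not 0 < cut < len(hosts):
        raise ValueError("the split must leave addresses for both servers")
    return (str(hosts[0]), str(hosts[cut - 1])), (str(hosts[cut]), str(hosts[-1]))

if __name__ == "__main__":
    first, second = split_scope("10.90.68.0/23")  # PVS VLAN 68 from slide 19
    print("DHCP server 1 leases", first)
    print("DHCP server 2 leases", second)
```

With the defaults, server 1 gets 10.90.68.1 through 10.90.69.152 and server 2 gets 10.90.69.153 through 10.90.69.254. In Windows split-scope setups, this is typically expressed as the same scope on both servers, with the other server's slice excluded.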
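Note 17 describes building each Hyper-V cluster so one blade can fail and the cluster stays up, with another cluster behind it. That is an N+1 rule. A quick Python check for it removes the largest member and asks whether the remaining capacity still covers the load. The desktop counts are hypothetical, and real sizing involves more than one number per host.

```python
from typing import Dict

def survives_single_failure(capacity: Dict[str, int], demand: int) -> bool:
    """N+1 check: lose the biggest member; is there still room for the demand?

    Works at any level: blades in a cluster, or whole clusters in a site.
    """
    if len(capacity) < 2:
        return False  # one of anything is a single point of failure
    return sum(capacity.values()) - max(capacity.values()) >= demand

if __name__ == "__main__":
    # Hypothetical desktop counts per blade and per cluster.
    blades = {"blade01": 150, "blade02": 150}
    print(survives_single_failure(blades, demand=140))
    print(survives_single_failure(blades, demand=200))  # fits only while both are up
    clusters = {"XA1": 300, "XA2": 300, "XDC": 300}
    print(survives_single_failure(clusters, demand=550))
```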