SharePoint Disaster Avoidance Architecture for Large Scale Enterprises



SharePoint best practices dictate that a proper disaster recovery plan should be in place before the launch of your SharePoint farm. Standard disaster planning methodologies for SharePoint deal with the traditional type of scenario, where your datacenter is a smoldering hole in the ground. Processes such as SQL Server database backups or STSADM backups for site collections are often employed to cater to such scenarios. But when something seemingly benign like a Secure Store Service Application corruption strikes, architects and administrators often come to the sad conclusion that a complete farm rebuild is their only recourse. Additionally, the regular bi-monthly SharePoint Cumulative Updates and periodic service packs, none of which have uninstall or undo features, increase the probability of experiencing a complete emergency farm rebuild at some point in an architect’s or administrator’s career. Long after a rebuild is completed and business has been restored to "almost" normal status, you’ll still be troubleshooting server configurations and tweaking the environment to get back to your pre-disaster level.
This workshop takes you through a dramatically new way of architecting your disaster plan. By applying the principles of this new methodology, you’ll cut your disaster response time to the point of almost avoiding disasters entirely.

Published in: Technology, Business


  1. SharePoint Disaster Avoidance Architecture for Large Scale Enterprises
     • Cornelius J. van Dyk, Crayveon Corporation
     • Jason Himmelstein, @sharepointlhorn
  2. • Chief Architect, Crayveon Corporation
     • 7-time MVP, MCITP, MCTS
     • Twitter: @cjvandyk
  3. • SharePoint Practice Director, Sentri Inc.
     • MCITP, MCTS SharePoint 2010
     • Microsoft vTSP
       ● virtual Technology Solutions Professional
     • SharePoint Foundation Logger
     • Twitter: @sharepointlhorn
  4. Why do we do this?
     • Jason’s Family
     • Cornelius’ Family
  5. GET TO KNOW YOU
     • Name
     • Company
     • What you do with SharePoint
     • Something interesting about yourself
  6. DISASTER
     • Outage vs Disaster
     • When is a disaster actually a disaster?
     • Traditional disaster planning
  7. DISCUSSION GROUP BREAKOUT
     • What is disaster planning to you?
     • In the context of SharePoint
     • Critical points
  8. BUSINESS CONTINUITY PLANNING
     • Business continuity planning identifies an organization’s exposure to internal and external threats and synthesizes hard and soft assets to provide effective prevention and recovery for the organization, whilst maintaining competitive advantage and system integrity.
     • Components
       ● Planning
       ● Testing
       ● Validation
  9. STRATEGIES
     • Recovery Point Objective (RPO)
     • Recovery Time Objective (RTO)
     • Tolerance for down time
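One way to make RPO and RTO concrete is to check a proposed backup schedule and recovery runbook against them. The following is a minimal sketch, not part of the original deck: the function names and the breakdown of recovery into detect, restore, and validate phases are assumptions made for illustration.

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Worst-case data loss is the gap between two backups,
    so the backup interval must not exceed the RPO."""
    return backup_interval <= rpo


def meets_rto(detect: timedelta, restore: timedelta,
              validate: timedelta, rto: timedelta) -> bool:
    """Total time to recover (hypothetical phases: detect, restore,
    validate) must not exceed the RTO."""
    return detect + restore + validate <= rto


if __name__ == "__main__":
    # Hypothetical targets: lose at most 4 hours of data, be back within 8 hours.
    rpo = timedelta(hours=4)
    rto = timedelta(hours=8)
    print("RPO met:", meets_rpo(timedelta(hours=1), rpo))
    print("RTO met:", meets_rto(timedelta(minutes=30), timedelta(hours=6),
                                timedelta(hours=2), rto))
```

In this hypothetical run the hourly backups satisfy the RPO, while the 8.5 hour recovery path misses the 8 hour RTO, which is the kind of gap a tested plan is meant to surface.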
  10. DISASTER PLANNING STEPS
      • Executive Management Commitment
        ● This costs money
        ● Must invest to protect
        ● Think of Insurance
  11. DISASTER PLANNING STEPS
      • Planning Committee
        ● All business units represented
        ● One person to lead – think Chief Justice
        ● Responsibility
        ● Authority
  12. DISASTER PLANNING STEPS
      • Risk Assessment
        ● Business Impact Analysis
          • Natural Disasters
          • Technical Disasters
          • Human threats
          • Terrorism
  13. DISASTER PLANNING STEPS
      • Determine SLA
        ● SLA for corporate users
        ● SLA for internal customers
        ● SLA for partner companies
        ● SLA for public
  14. DISASTER PLANNING STEPS
      • Establish Priorities for Recovery
        ● Critical Operations
        ● Key Personnel
        ● Vital Systems
        ● Documentation/Records/Policies & Procedures
  15. DISASTER PLANNING STEPS
      • Determine Recovery Strategies
        ● Facilities
          • Destroyed
          • Impaired
        ● Hardware
          • Servers – replacement availability
          • Network – service providers
  16. DISASTER PLANNING STEPS
      • Determine Recovery Strategies
        ● Software
          • Install ISOs
          • Updates
        ● Communications
          • Inter-company
          • Partners & Public
  17. DISASTER PLANNING STEPS
      • Determine Recovery Strategies
        ● Data
          • Backups
          • Availability
        ● Company Services
        ● Customer Services
  18. DISASTER PLANNING STEPS
      • Determine Recovery Strategies
        ● Distributed architecture
          • Hot Site
          • Warm Site
          • Cold Site
  19. DISASTER PLANNING STEPS
      • Determine Recovery Strategies
        ● Vendor Agreements
          • Circumstances constituting an emergency
          • Contract Duration
          • Termination Conditions
          • Cost
          • Testing
  20. DISASTER PLANNING STEPS
      • Determine Recovery Strategies
        ● Vendor Agreements (cont.)
          • Security procedures
          • System change notifications
          • Hours of operation
          • Hardware requirements
          • Personnel requirements
  21. DISASTER PLANNING STEPS
      • Determine Recovery Strategies
        ● Vendor Agreements (cont.)
          • Compatibility guarantee
          • Availability guarantee
          • Priorities with other customers
  22. DISASTER PLANNING STEPS
      • Perform Data Collection
        ● Critical phone numbers
        ● Hardware inventory
          • Vendor contact and equipment information
        ● Software inventory
        ● Notification checklist
  23. DISASTER PLANNING STEPS
      • Organize & Document a Written Plan
        ● Plan should follow a checklist
        ● Think rebuild from scratch
          • Notifications
          • Hardware
          • Software
          • Restore backups
  24. DISASTER PLANNING STEPS
      • Organize & Document a Written Plan (cont.)
        ● Think rebuild from scratch (cont.)
          • Re-establish systems
          • Test & Validate
          • Communicate
          • After Action Review
  25. DISASTER PLANNING STEPS
      • Develop Testing Criteria & Procedures
      • Test the plan
      • Test the plan again
      • Approve the plan
  26. DISASTER PLANNING STEPS
      • Ongoing plan validation
        ● Annual testing
        ● Scenario testing
        ● Testing when something changes
  27. TRADITIONAL DISASTER PLANNING
      • Backups
      • Log Shipping
      • SQL Replication
      • Hot Site
  28. SHAREPOINT ARCHITECTURE
      • Farm configuration
      • 2 WFE, 2 APP, SQL Cluster
      • The role of virtualization
  29. RECOVERY vs AVOIDANCE
      • What is Disaster Avoidance?
      • A new way of looking at DR
      • Why another DR strategy?
      • What makes SPDAALSE different?
  30. CAUSES OF DISASTERS
      • Natural disasters such as floods, hurricanes, earthquakes, tornadoes, storms, etc.
      • Human induced, such as accidents, acts of terrorism, etc.
      • Hardware failures such as drive crashes, memory or board failures, etc.
  31. CAUSES OF DISASTERS (cont.)
      • Malware such as worms, viruses, etc.
      • The one everyone forgets about…
      • Software incompatibility when upgrading:
        ● Operating systems
        ● Software service pack
        ● Software patches
  32. SHAREPOINT CUMULATIVE UPDATES
      • Bi-monthly
      • Recommended by support
      • History of hot fixes and re-releases
      • Famously broke User Profile Services
  33. CUs: A NECESSARY EVIL
      • Why apply them at all?
      • What’s their risk?
      • Can’t we just uninstall them?
      • Compared to Exchange…
  34. HOW DOES SPDAALSE HELP?
      • Farm Architecture
      • SharePoint databases
      • Difference between data and configuration
      • What makes Large Scale Enterprises different?
  37. THINKING DIFFERENT
      • Separation of data and configuration
      • Performance considerations
      • Adding virtualization
  38. IN ACTION
      • Building the farm based on SPDAALSE
      • Preparing the farm for testing
      • Snapping the farm
      • Backups
  39. IN ACTION (cont.)
      • Patching the farm
      • Testing the patch
      • Rolling back
      • Validating rollback
  40. IN ACTION (cont.)
      • Demo
  41. Agenda
      • Infrastructure Design
        ● Analyze Customer Requirements
        ● Hardware requirements
        ● Server configuration
        ● Network recommendations
        ● Virtual vs. Physical
      • SQL Server Performance
        ● Pre-grow vs. Auto-growth
        ● IO requirements
        ● Sizing recommendations
        ● Database Isolation
      • SharePoint Server Performance
        ● Tier isolation vs. Location Proximity Requirements
        ● Load balancing your App Tier
        ● Load testing in your environment
        ● Governance & Troubleshooting
  42. Infrastructure Design
      • Analyze Customer Requirements
        ● High Availability
        ● Disaster Recovery
        ● Budget Constraints
        ● Location Awareness
        ● Number of Concurrent Users
  43. Infrastructure Design
      • Hardware requirements
        ● Web servers & Application servers
          • Developer or Evaluation environments: CPU 4 cores, 64-bit required; RAM 4 GB; hard drive space 80 GB
          • Production in Single Server or farm environments: CPU 4 cores, 64-bit required; RAM 8 GB; hard drive space 80 GB
        ● SQL servers
          • Small Farm: CPU 4 cores, 64-bit required; RAM 8 GB; hard drive space 80 GB
          • Medium Farm: CPU 8 cores, 64-bit required; RAM 16 GB; hard drive space 80 GB
          • Large Farm, up to 2 TB of Content DBs: RAM 32 GB
          • Large Farm, from 2 TB to 5 TB of Content DBs: RAM 64 GB
      • What constitutes a small/medium/large farm?
  44. Infrastructure Design
      • Server configuration – Small Farm
  45. Infrastructure Design
      • Server configuration – Scaled Farm
  46. Infrastructure Design
  47. Infrastructure Design
      • Network recommendations
        ● Traffic Isolation
          • Web
          • Database
          • Search
          • Service Applications
          • Authentication
        ● Number of NICs per server
        ● Limit the number of hops
        ● Colocation of servers
  48. Infrastructure Design
      • Physical
        ● Benefits
          • No virtualization overhead
          • Ability to target DBs to separate physical spindles
          • Only OS limits on Hardware
          • Simple Networking
        ● Drawbacks
          • Backup & recovery time
          • Limited snapshot ability
          • Costly & lacking Centralized Management
          • Failover limitations
  49. Infrastructure Design
      • Virtualization
        ● Benefits
          • Snapshot capability
          • Rapid system deployment
          • HADR ability
          • Centralized Management
        ● Drawbacks
          • Loss of minimum 8% compute for overhead
          • Limitations on addressing full hardware
          • Disks are stored as single/multi-file
          • Centralized Networking
  50. SQL Server Performance
      • Pre-grow databases
        ● Requires more space initially
        ● Dramatic increase in performance
        ● Databases like contiguous space
      • Auto-growth
        ● Immediately change from 1 MB increments
        ● Do not use the “Grow by %” setting
        ● 50-100 MB maximum growth increment, as required
        ● Schedule a maintenance task to check size & grow in off-peak hours as required
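To see why the growth increment matters, it helps to count how many auto-growth operations a data file goes through on its way to a target size. The sketch below is illustrative arithmetic only and does not touch SQL Server; the function name and the example sizes are assumptions, not figures from the deck.

```python
import math


def autogrowth_events(initial_mb: int, target_mb: int, increment_mb: int) -> int:
    """Number of fixed-size growth operations needed to take a file
    from initial_mb to at least target_mb."""
    if increment_mb <= 0:
        raise ValueError("increment_mb must be positive")
    if target_mb <= initial_mb:
        return 0  # a pre-grown file never needs to auto-grow
    return math.ceil((target_mb - initial_mb) / increment_mb)


if __name__ == "__main__":
    # Hypothetical database that starts at 100 MB and ends up at 10,000 MB.
    for increment in (1, 50, 100):
        events = autogrowth_events(100, 10_000, increment)
        print(f"{increment:>3} MB increments -> {events} growth operations")
    # Pre-growing to the expected size up front avoids growth entirely.
    print("pre-grown:", autogrowth_events(10_000, 10_000, 100))
```

Each growth operation happens while users are waiting on a write, so fewer and larger steps, or none at all for a pre-grown file, keep that cost out of peak hours.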
  51. SQL Server Performance
      • IO requirements
        #   DB Files                    RAID Level  Optimization
        1   TempDB data                 10          Write
        2   TempDB logs                 10          Write
        3   ContentDB data              10          ReadWrite
        4   ContentDB logs              10          Write
        5   Crawl DB logs               10          Write
        6   Crawl DB data               10          ReadWrite
        7   Property DB logs            10          Write
        8   Property DB data            10          Write
        9   Services DB logs            10          Write
        10  Services DB data            5/10        ReadWrite
        11  Archive Content DB          5           Read
        12  Publishing Site Content DB  5           Read
  52. SQL Server Performance
      • Sizing recommendations
        ● Recommended limit for ContentDBs: 200 GB
          • Maximum supported: 4 TB – includes Remote BLOBs
        ● Backup/Restore timing
        ● Simple vs. Full recovery mode
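The two sizing thresholds on this slide lend themselves to a simple periodic check. The following is a minimal sketch under stated assumptions: the thresholds come from the slide, 1 TB is taken as 1024 GB, and the database names and sizes in the example are hypothetical. How the sizes are collected from SQL Server is left out.

```python
RECOMMENDED_LIMIT_GB = 200       # recommended limit per content database (from the slide)
MAX_SUPPORTED_GB = 4 * 1024      # 4 TB maximum supported, assuming 1 TB = 1024 GB


def classify_content_db(size_gb: float) -> str:
    """Label a content database size against the two thresholds."""
    if size_gb <= RECOMMENDED_LIMIT_GB:
        return "within recommended limit"
    if size_gb <= MAX_SUPPORTED_GB:
        return "above recommended limit"
    return "above maximum supported size"


if __name__ == "__main__":
    # Hypothetical inventory of content databases and their sizes in GB.
    inventory = {"WSS_Content_Intranet": 120, "WSS_Content_Projects": 350}
    for name, size_gb in inventory.items():
        print(f"{name}: {size_gb} GB, {classify_content_db(size_gb)}")
```

A database flagged as above the recommended limit is still supported, but it is a candidate for splitting site collections across additional content databases before backup and restore times start to threaten the RTO.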
  53. SQL Server Performance
      • Database Instance Isolation
        ● Secure Store Database
        ● SharePoint core databases
        ● Content Databases
        ● Search
        ● Highly Transactional non-SharePoint DBs
      • Drawback
        ● Lose the central management in a single SQL Server Management Studio window
  54. SharePoint Server Performance
      • Tier isolation vs. Location Proximity Requirements
        ● Separation via vLAN
          • Less chatter
          • Increased hop count
        ● Collocating SharePoint in a single vLAN
          • Increased chatter
          • Lower hop count
      • Key takeaway
        ● Know your network; determine your topology based upon traffic & requirements
  55. SharePoint Server Performance
      • Load balancing your App Tier
        ● Know your load
        ● Scale based upon need, not perception
      • Find your choke point, then release the grasp
        ● Don’t assume, validate!
  56. SharePoint Server Performance
      • Load testing in your environment
        ● Example
          • 2 Web Servers (4 cores, 16 GB RAM) using NLB
          • 1 App Server (4 cores, 16 GB RAM)
          • 1 SQL Server Instance (16 cores, 128 GB RAM)
          • Simple CRUD operations – Login, create list item, open item, modify item, save item, delete item, log out
  57. SharePoint Server Performance
      • Load testing in your environment
        ● Results
          • Farm was completely non-responsive at ~500 concurrent users
        ● Root cause
          • Watching this test on the server side, we found that we were immediately CPU bound.
        ● Conclusion
          • Add CPUs or Web Servers to the farm to handle additional load
  58. References
      • Jason’s Blog
      • Sentri, Inc
      • SharePoint Foundation Logger
      • My Article on SharePoint Pro
      • Cornelius J. van Dyk’s Blog
      • Eric Shupps’s Blog
      • SharePoint Server 2010 Hardware and software requirements
      • SharePoint Server 2010 Capacity Management: Software Boundaries and Limits
      • Capacity Management and Sizing Overview for SharePoint Server 2010
      • Capacity Planning for SharePoint Server 2010
      • Performance Testing for SharePoint Server 2010
      • Storage and SQL Server Capacity Planning and Configuration
      • Performance and Capacity Technical Case Studies
      • Monitoring and Maintaining SharePoint Server 2010
      • The Load Testing Kit for Visual Studio Team System
      • Web Capacity Analysis Tool (WCAT)
  59. REFERENCES
      • @cjvandyk @sharepointlhorn
      • Deck download
      • Painless deck
      • Logging deck
      • PowerPivot deck
      • Versions List
      • Corne’s Utils
  60. Your Feedback is Important
      Please fill out a session evaluation form and drop it off at the conference registration desk. Thank you!