#engageug
Fixing Server Sickness
Gabriella Davis
Technical Director
The Turtle Partnership
How to contact me:
Gabriella Davis
gabriella@turtlepartnership.com
Twitter: gabturtle

Fixing Domino Server Sickness

From Engage 2014 - Breda, NL

Updated presentation on working with Domino tools to analyse and fix problems


  1. Fixing Server Sickness
     Gabriella Davis
     Technical Director
     The Turtle Partnership
  2. Fixing Your Server
     • What causes server sickness
     • Tools to spot sickness
     • Getting Your Server Back to Full Health
  3. Server Sickness
     • The problem with Domino
     • How does a server get sick?
     • Vulnerabilities
     • Aging Configurations
     • Bad Habits
  4. Server Sickness
     • The problem with Domino
     • How does a server get sick?
     • Vulnerabilities
     • Aging Configurations
     • Bad Habits
     • Developers Gone Wild
  5. The Problem With Domino
     • “My Server Is Running Fine”
     • Server Stability
     • Often despite our best efforts
     • Tasks that just run
     • even without being properly configured
  6. Vulnerabilities
     • Start with the OS
     • patch levels
     • unnecessary processes with exposed ports
     • disk and data security
     • Then the hardware
     • It’s all about disk performance
     • Using a SAN? Is the SAN configured for Domino?
     • Transaction logs configured?
  7. Vulnerabilities
     • Security
     • ACLs
     • -Default- and Anonymous
     • LocalDomainServers
     • HTTP vs HTTPS
     • LDAP
     • DIIOP
     • Sametime
  8. Aging Configurations
     • What can give you problems over time
     • Database sizes
     • More users
     • More tasks and features
  9. Bad Habits
     • What are your users doing?
     • what features are they using
     • how are they using them
     • are they creating repeating 10yr appointments for instance
     • are they copying themselves on emails
     • Password quality for HTTP passwords
  10. Giving Developers Power
      • Allowing development to dictate replication and agent scheduling
      • The curse of XPages code that has not been production tested
      • Demands for “LDAP” or “DIIOP” for an application to work
  11. Tools to Spot Sickness
      • Understanding Priorities
      • DDM Probes and Event Analysis
  12. Tools to Spot Sickness
      • Understanding Priorities
      • DDM Probes and Event Analysis
      • Statistics
      • Catalog.nsf
      • QoS - new with Domino 9
      • Enhanced Fault Reporting - new with Domino 9
  13. Understanding Priorities
      • Server role
      • What do you want from your server
      • What are statistics telling you
      • Warning Levels
      • Is it safe to ignore ‘Warning (Low)’ and focus on ‘Fatal’ or ‘Failure’
  14. Bringing Problems to You
      • Event Handlers, Event Generators, Statistics, Fault Reports and DDM Probes - where to start
      • Setting Statistic Thresholds
      • Choosing and configuring probes
      • Reviewing Faults
      • Setting up QoS behaviour
  15. Bringing Problems To You
      • Why we set up collection hierarchies for DDM
      • and how
      • Daily and Weekly DDM reviews
      • What to look out for
  16. Probes for Mail Servers
      • Security - Weekly
      • Directory Performance
      • Critical mail routes
      • Mail ‘Slack’
  17. Probes for Application Servers
      • Agent run times
      • agent cpu usage
      • Security and Web Configuration
  18. Probes for Struggling Servers
      • OS level
      • disk performance (beware of reported SAN problems)
      • memory
      • network
  19. What to look for
      • Fatal problems
      • Persistent Warnings
      • Peak activity behaviour
      • uptick in problems at 9am, 1pm etc
      • Repetitive low level ‘annoyances’
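The "uptick in problems at 9am, 1pm" check can be sketched by bucketing event times by hour. This is only an illustration: the timestamps below are invented, and the export format and the 1.2 threshold factor are assumptions, not anything DDM provides.

```python
from collections import Counter
from datetime import datetime

# Hypothetical event timestamps, e.g. copied out of a DDM event view.
events = [
    "2014-03-17 09:02", "2014-03-17 09:05", "2014-03-17 09:11",
    "2014-03-17 09:40", "2014-03-17 11:15", "2014-03-17 13:01",
    "2014-03-17 13:07", "2014-03-17 13:30", "2014-03-17 16:45",
]

def events_per_hour(timestamps):
    """Count how many events fall in each hour of the day."""
    hours = (datetime.strptime(t, "%Y-%m-%d %H:%M").hour for t in timestamps)
    return Counter(hours)

def peak_hours(counts, factor=1.2):
    """Return the hours whose count exceeds `factor` times the mean count."""
    if not counts:
        return []
    mean = sum(counts.values()) / len(counts)
    return sorted(hour for hour, n in counts.items() if n > factor * mean)

counts = events_per_hour(events)
print(peak_hours(counts))
```

Running the same count over several days makes it easier to tell a recurring peak from a one-off.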
  20. Catalog.nsf
      • Not every database is immediately visible but they are all there (just hidden with selection formulae)
      • It’s a good place to start looking for multiple replicas
      • It’s a good place to find ACL issues
      • Replicates around your domain and updates overnight
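One reading of "multiple replicas" is the same replica ID appearing more than once on a single server. A minimal sketch of that check, assuming the catalog entries have already been exported as (server, path, replica ID) tuples; the server names, paths and IDs below are made up:

```python
from collections import defaultdict

# Hypothetical rows exported from catalog.nsf: (server, file path, replica ID).
catalog = [
    ("Mail01/Acme", "mail/jsmith.nsf", "C1257C9A:0043E1B2"),
    ("Mail01/Acme", "mail/archive/jsmith.nsf", "C1257C9A:0043E1B2"),
    ("Mail01/Acme", "names.nsf", "80257B11:005A2C10"),
    ("App01/Acme", "names.nsf", "80257B11:005A2C10"),
]

def duplicate_replicas(entries):
    """Map (server, replica ID) to its file paths where there is more than one."""
    by_key = defaultdict(list)
    for server, path, replica_id in entries:
        by_key[(server, replica_id)].append(path)
    return {key: sorted(paths) for key, paths in by_key.items() if len(paths) > 1}

for (server, replica_id), paths in duplicate_replicas(catalog).items():
    print(server, replica_id, paths)
```

A replica of names.nsf on two different servers is expected, so only the same-server case is reported.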
  21. QoS - Quality of Service
      • Monitors server health and performance
      • Monitors application behavior, stability and hangs
      • Restarts Domino if it thinks there are memory issues or an application is hung
      • Shuts down Domino if a clean shutdown doesn’t happen and the server hangs
      • Controlled via notes.ini settings and dcontroller.ini
      • Requires Domino to be running under the Java Controller
      • nserver -jc
  22. QoS Configuration
      • Starting Domino under Java Controller should create a dcontroller.ini file
      • QOS_Enable=1
      • In notes.ini
      • QOS_ProbeInterval (defaults to 1 min)
      • QOS_ProbeTimeout (defaults to 5 mins)
      • QOS_Shutdown_Timeout
      • QOS_Apps_Timeout
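To see at a glance which of the QoS settings named above are actually set, a small script can read the text of an ini file and report them. This is a sketch, not an official tool: the setting names are copied from the slide, the sample text is invented, and in practice you would run it once against dcontroller.ini and once against notes.ini.

```python
# Setting names as they appear on the slide; matched case-insensitively.
QOS_NAMES = [
    "QOS_Enable",
    "QOS_ProbeInterval",
    "QOS_ProbeTimeout",
    "QOS_Shutdown_Timeout",
    "QOS_Apps_Timeout",
]

def parse_ini(text):
    """Parse NAME=value lines into a dict keyed by lower-cased name."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";") or "=" not in line:
            continue  # skip blanks, comments and section headers such as [Notes]
        name, _, value = line.partition("=")
        settings[name.strip().lower()] = value.strip()
    return settings

def qos_report(text):
    """Return each QoS setting with its value, or None when it is not set."""
    settings = parse_ini(text)
    return {name: settings.get(name.lower()) for name in QOS_NAMES}

# Hypothetical file contents for illustration only.
sample = """
[Notes]
QOS_Enable=1
QOS_ProbeTimeout=5
ServerTasks=Replica,Router,Update,AMgr,AdminP
"""

report = qos_report(sample)
for name, value in report.items():
    print(f"{name}: {value if value is not None else 'not set'}")
```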
  23. QOS - Potential Problems
      • QOS doesn’t support passwords on server IDs, the restart will pause at the password entry screen
      • QOS timeouts being too low
      • Don’t enable QOS on servers without transaction logging
  24. Enhanced Fault Reporting
      • Fault Reporting Database - lndfr.nsf
      • Expanded to include a by Disposition view
      • all faults when analyzed have a disposition value that categorises as
      • Problem
      • Possible Problem (possibly actionable)
      • Possible Problem (likely NOT actionable)
      • Informational
      • Unknown (investigate)
  25. Possible Problem - Actionable
      • Out Of Memory: Represents a crash in which the Java virtual machine (JVM) ran out of a memory resource such as heap space.
      • Launched Notes multiple times: Indicates that the user quickly launched multiple instances of the Notes client
      • Possible hang: Indicates that the Notes client was manually terminated while it appeared to be doing useful work.
      • User Kill: Indicates that the user manually terminated the client while it appeared to be waiting for input or network timeout
  26. Back to Full Health
      • Getting Control
      • Mail, Databases and ECLs
      • SMTP
      • Agent Scheduling
      • Directories
      • Adminp
      • LDAP
      • Tasks and Internet Site Documents
      • Domino Configuration Tuner
  27. Back to Full Health
      • Getting Control
      • Mail, Databases and ECLs
      • SMTP
      • Agent Scheduling
      • Directories
      • Adminp
      • LDAP
      • Tasks and Internet Site Documents
      • Domino Configuration Tuner
  28. Getting Control - Mail and Databases
      • Setting ACLs at directory level (Editor)
      • Lock down ECLs via Policies
      • Introducing quotas alongside server based archiving
      • Consider archiving files to a dedicated server
      • Upgrade to 8 and enable OOO router instead of agents
      • Disable forwarding rules set up by users
      • Use message tracking and mail rules very sparingly
      • Disable on the fly searching of non indexed databases
  29. Database Management Tools
      • DBMT Server Command
      • runs copy-style compact operations
      • purges deletion stubs
      • expires soft deleted entries
      • updates views
      • reorganizes folders
      • merges full-text indexes
      • updates unread lists
      • ensures that critical views are created for failover
      • Replaces Updall
      • load updall -nodbmt tells updall to run but not perform the functions that DBMT already does
  30. DBMT Parameters
      • -compactThreads
      • -updallThreads
      • -ftiThreads
      • -timeLimit refers to compact timeout for DBMT
      • -range starttime stoptime
      • compactNdays (run Compact every x days)
      • ftiNdays (run FT Index every x days)
      • force d (day, Sunday = 1): fixup if compact fails for consecutive days
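As a rough illustration of how these parameters combine, the helper below assembles a DBMT console command from keyword arguments. The thread counts, time limit, time format, the "mail/" target and its position at the end of the command are all placeholder assumptions; check the DBMT documentation for your Domino release before using any of them.

```python
def dbmt_command(target, **options):
    """Build a 'load dbmt' console command string from option names and values."""
    parts = ["load", "dbmt"]
    for name, value in options.items():
        parts.append(f"-{name}")
        if isinstance(value, (list, tuple)):
            # Options such as -range take two values (start and stop time).
            parts.extend(str(v) for v in value)
        else:
            parts.append(str(value))
    parts.append(target)
    return " ".join(parts)

# Placeholder values for illustration only.
cmd = dbmt_command(
    "mail/",
    compactThreads=4,
    updallThreads=4,
    ftiThreads=2,
    timeLimit=120,
    range=("1:00AM", "5:00AM"),
    compactNdays=7,
    ftiNdays=3,
)
print(cmd)
```

Generating the line from one place makes it easier to keep the program documents on different servers consistent.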
  31. Getting Control - SMTP
      • Restrict relaying to specific IP addresses not network ranges
      • Beware of allowing authenticated relaying and opening up to dictionary attacks
      • Restrict rights to send to internal groups from internet addresses
      • Don’t accept mail for local part matches
      • Configure your server for HTML mail not plain text
  32. Getting Control - SMTP (more)
      • Don’t allow all connecting hosts to deliver mail inbound, if you use a service restrict to those hosts
      • Use services / tools to spot attacks such as
      • persistent attempts to mass deliver within a time period
      • continual failures by a host to deliver to a correct address
      • Move responsibility for that first line of defense away from native Domino
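The "continual failures by a host" pattern is essentially a sliding-window count per connecting host. A minimal sketch of the idea, assuming delivery failures have already been extracted as (time in seconds, host) pairs in time order; the window size, limit and addresses are invented:

```python
from collections import defaultdict, deque

def hosts_over_limit(failures, window_seconds=600, limit=3):
    """Flag hosts with at least `limit` failures inside any `window_seconds` span.

    `failures` is an iterable of (epoch_seconds, host) sorted by time.
    """
    recent = defaultdict(deque)
    flagged = set()
    for ts, host in failures:
        queue = recent[host]
        queue.append(ts)
        # Drop failures that have fallen out of the window.
        while queue and ts - queue[0] > window_seconds:
            queue.popleft()
        if len(queue) >= limit:
            flagged.add(host)
    return sorted(flagged)

# Hypothetical failure log using documentation-only IP ranges.
log = [
    (0, "203.0.113.9"), (30, "203.0.113.9"), (45, "198.51.100.4"),
    (90, "203.0.113.9"), (4000, "198.51.100.4"), (8000, "198.51.100.4"),
]
print(hosts_over_limit(log))
```

In line with the last bullet, this kind of check normally belongs in a gateway or filtering service in front of Domino rather than on the Domino server itself.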
  33. Getting Control - Agent Scheduling
      • When are agents set to run
      • amgr_newmaileventdelay
      • amgr_newmailagentmininterval
      • If you’re using OOO agents how often are they scheduled
      • Do users have private agents running
      • Sh Agents [DBName]
      • All shared and private agents in a database
      • Who has rights to run agents
  34. Getting Control - Directories
      • Avoid adding additional views to the Domino Directory
      • The risk of allowing local replicas with Author rights
      • Directory Assistance
      • Sh xdir
  35. Getting Control - Adminp
      • Purge old documents
      • Requests awaiting approval
      • Tell adminp process NEW not ALL
  36. Getting Control - LDAP
      • Allowing anonymous access to query LDAP
      • Authenticating LDAP queries
      • Extended Directory Catalog used by LDAP
      • Relying on DNS
      • Not configuring the LDAP task correctly to allow large searches with no timeouts
      • Maintaining schema.nsf
  37. Getting Control - Tasks and Program Documents
      • Disable tasks you don’t need
      • Schedule overnight tasks so they don’t overlap
      • and don’t conflict with backups
      • Use program documents so you can review and manage easily
      • sh config servertasksat*
      • Keeping templates on every server
      • Using compact -B
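Checking that overnight tasks neither overlap each other nor collide with the backup is a simple interval comparison. The sketch below assumes you have written down each task's expected start and end time by hand; the task names and times are invented, and windows that cross midnight are not handled.

```python
from itertools import combinations

def to_minutes(hhmm):
    """Convert 'HH:MM' to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)

def overlaps(a, b):
    """True if two (start, end) windows on the same night overlap."""
    return to_minutes(a[0]) < to_minutes(b[1]) and to_minutes(b[0]) < to_minutes(a[1])

def conflicts(windows):
    """Return sorted pairs of task names whose windows overlap."""
    return sorted(
        tuple(sorted((name1, name2)))
        for (name1, w1), (name2, w2) in combinations(windows.items(), 2)
        if overlaps(w1, w2)
    )

# Hypothetical schedule: expected start and end time per task.
schedule = {
    "backup": ("01:00", "03:00"),
    "compact": ("02:30", "04:00"),
    "updall": ("04:00", "05:00"),
}
print(conflicts(schedule))
```

A task that starts exactly when another ends is not counted as a conflict here; add a margin if your tasks tend to overrun.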
  38. Getting Control - Internet Site Documents
      • Web Configuration means TCPIP tasks are configured in the server document and are server wide
      • often enabled by default
      • Internet site documents require you to opt in for TCPIP services
      • configured by hostname
  39. Domino Configuration Tuner
      • Domino Configuration Tuner is an analysis tool based on a set of pre-configured best practice/worst practice rules
      • The Rules are shipped by IBM with the Lotus installs and are updated via a public update site
      • Makes recommendations on configuration changes to enhance performance and security and reduce TCO
  40. How does it work?
      • Run and installed via the Domino Configuration Tuner database
      • Updated by online template updates and rule updates
      • DCT rules and results are held in a local database and will require a restart of the client for changes to take effect
      • Scans
      • Server documents
      • notes.ini settings
      • advanced database properties
      • Intended to scan servers in a single domain
  41. How does it work?
      • Creates reports on each scanned server based on the rules you select
      • Each report contains
      • Issues
      • recommendations for adjustments
      • links to supporting documentation
  42. Pre-requisites
      • v8 Notes client (standard or basic) or administrator
      • dct.nsf database and dct.ntf template
      • servers 7.x or higher
  43. Setup
      • DCT.NSF
      • StdDominoConfigTuner Template (dct.ntf)
      • ID must have reader access to names.nsf
      • ID must have ‘View Administrator’ rights
      • Requires no server or domain changes
  44. View Administrator Rights
      • Server Document
      • Security Tab
      • View Administrator is a subset of ‘Administrator’ rights
      • Think of it as ‘Show’ not ‘Tell’ rights
      • Sh users - YES
      • tell http refresh - NO
  45. DCT Preferences
      • List of all rules
      • Review rule, description and supporting documentation
      • All rules are enabled by default for all scans
      • Enable and Disable rules
  46. DCT Updates
      • Connects to the IBM site to download
      • must have outbound connectivity
  47. DCT Updates
      • Click ‘check for updates’
      • Connects to an external IBM site to identify any template or rule updates
  48. DCT Updates
      • Accept license and updates download
      • It’s not possible to selectively download
  49. DCT Updates - Finished
      • “Successful” screen will notify you to restart your client
      • You may need to do 2 client restarts before DCT can be used
  50. Running the tuner
      • First select the servers in your current domain you want to run against
      • The list of servers is retrieved from the domain of the home server identified in your location document
      • Change locations to scan a different domain
  51. Running the tuner
      • You can manually type in the full hierarchical names of any other servers you want to scan as part of this analysis
      • Separate multiple server names with commas, semicolons or new lines
      • You can only scan servers you can reach so you need a connection document to any you list
      • or the server needs to be available via your passthru server in your location
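If you keep your server list in a text file and want to tidy it before pasting it into DCT, the separator rule above (commas, semicolons or new lines) can be mirrored in a few lines. This only illustrates the rule; it is not DCT's own parser, and the server names are invented.

```python
import re

def parse_server_list(text):
    """Split a free-text server list on commas, semicolons or new lines."""
    names = (part.strip() for part in re.split(r"[,;\n]", text))
    return [name for name in names if name]

# Hypothetical input mixing all three separators.
raw = "Mail01/Acme, Mail02/Acme;App01/Acme\nHub01/Acme"
print(parse_server_list(raw))
```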
  52. Understanding the Results
      • Summary results
      • Issues by criticality
  53. Understanding the Results
      • Summary results
      • Servers that failed to scan
      • reason why scan failed
  54. Understanding the Results
      • Summary results
      • Detailed list of rules evaluated
  55. Understanding the Results
      • View the current report
      • Select ‘change’ to view a different report
  56. Understanding the Results
      • Filter results to make analysis easier
      • by server
      • by specific rules
      • by severity
  57. Understanding the results
      • Categorised results of recommendations
      • Sorted by criticality and then by server name
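The same ordering (criticality first, then server name) is easy to reproduce if you copy recommendations out of DCT for your own tracking. A small sketch, assuming each recommendation is a record with a server, a severity label and a rule name; the labels "High", "Medium" and "Low" and the sample rows are hypothetical:

```python
# Hypothetical severity labels, most critical first.
SEVERITY_ORDER = {"High": 0, "Medium": 1, "Low": 2}

# Hypothetical recommendations copied out of a DCT report.
results = [
    {"server": "Mail02/Acme", "severity": "Low", "rule": "Rule C"},
    {"server": "Mail01/Acme", "severity": "High", "rule": "Rule A"},
    {"server": "App01/Acme", "severity": "High", "rule": "Rule B"},
    {"server": "App01/Acme", "severity": "Medium", "rule": "Rule D"},
]

def sort_results(rows):
    """Sort by criticality (most critical first), then by server name."""
    return sorted(rows, key=lambda r: (SEVERITY_ORDER[r["severity"]], r["server"]))

ordered = sort_results(results)
for row in ordered:
    print(row["severity"], row["server"], row["rule"])
```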
  58. Understanding the results
      • Each recommendation comes with an explanation so you can evaluate on a result by result basis if you want to make the change
  59. Understanding the results
      • Each recommendation is provided with a link to best / worst practices supporting documentation
  60. Working with Rules
      • Disabling and enabling rules can be done through the ‘Preferences’
  61. Working with Rules
      • Selecting a rule shows the description and links to the best / worst practice documentation
  62. Making Changes
      • Advanced Database Properties
      • assigned en masse via Domino Admin
      • notes.ini settings
      • assigned via the command set config xxx = x
      • shown via the command sh config xxx
      • Many recommendations refer to ‘some databases’ but don’t specify which ones - check which ones will be affected
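When applying several notes.ini recommendations, it helps to record the current value before changing it. The helper below just prints the console commands in that order for you to review and paste; the setting name is taken from the agent scheduling slide and the value 5 is a placeholder, not a recommendation.

```python
def config_commands(settings):
    """Return console commands that show, then set, each notes.ini setting."""
    commands = []
    for name, value in settings.items():
        commands.append(f"show config {name}")          # record the current value
        commands.append(f"set config {name}={value}")   # apply the new value
    return commands

# Placeholder change for illustration only.
changes = {"AMgr_NewMailEventDelay": 5}
for line in config_commands(changes):
    print(line)
```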
  63. Resources
      • Domino Configuration Tuner blog
      • http://www.bleedyellow.com/blogs/DCT/
      • details and explanations of new rules published each month
  64. Summary
      • No matter how well your servers are configured they will continue to degrade in performance over time unless you pro-actively monitor and fix
      • Many server performance issues will be seen by your users before they filter down to you
      • Make reviewing your server configuration using DDM probes followed by a DCT analysis part of every server upgrade
      • Enable probes that are specific to the server role: Mail and Directory probes on Mail servers and Agent probes on Application servers
      • Use Security and Database probes configured in DDM to stay on top of any low level warnings that could cause larger problems in the future
      • Don’t over configure your servers to monitor everything or you’ll be looking for a needle in a haystack. Ask your servers to tell you only what you need to be aware of immediately
      • Use the built in tools, DCT, Statistics, DDM, Catalog, Activity Trends to monitor your servers and gain a good understanding of what is their ‘normal’ behaviour so you can more easily spot when something goes wrong.
  65. Questions
      How to contact me:
      Gabriella Davis
      gabriella@turtlepartnership.com
      Twitter: gabturtle