Fixing Domino Server Sickness

From Engage 2014 - Breda, NL

Updated presentation on working with Domino tools to analyse and fix problems

Transcript

  • 1. #engageug Fixing Server Sickness • Gabriella Davis, Technical Director, The Turtle Partnership
  • 2. Fixing Your Server • What causes server sickness • Tools to spot sickness • Getting Your Server Back to Full Health
  • 3. Server Sickness • The problem with Domino • How does a server get sick? • Vulnerabilities • Aging Configurations • Bad Habits
  • 4. Server Sickness • The problem with Domino • How does a server get sick? • Vulnerabilities • Aging Configurations • Bad Habits • Developers Gone Wild
  • 5. The Problem With Domino • “My Server Is Running Fine” • Server Stability • Often despite our best efforts • Tasks that just run • even without being properly configured
  • 6. Vulnerabilities • Start with the OS • patch levels • unnecessary processes with exposed ports • disk and data security • Then the hardware • It’s all about disk performance • Using a SAN? Is the SAN configured for Domino? • Transaction logs configured?
  • 7. Vulnerabilities • Security • ACLs • -Default- and Anonymous • LocalDomainServers • HTTP vs HTTPS • LDAP • DIIOP • Sametime
  • 8. Aging Configurations • What can give you problems over time • Database sizes • More users • More tasks and features
  • 9. Bad Habits • What are your users doing? • what features are they using • how are they using them • are they creating repeating 10-year appointments, for instance • are they copying themselves on emails • Password quality for HTTP passwords
  • 10. Giving Developers Power • Allowing development to dictate replication and agent scheduling • The curse of XPages code that has not been production tested • Demands for “LDAP” or “DIIOP” for an application to work
  • 11. Tools to Spot Sickness • Understanding Priorities • DDM Probes and Event Analysis
  • 12. Tools to Spot Sickness • Understanding Priorities • DDM Probes and Event Analysis • Statistics • Catalog.nsf • QoS - new with Domino 9 • Enhanced Fault Reporting - new with Domino 9
  • 13. Understanding Priorities • Server role • What do you want from your server • What are statistics telling you • Warning Levels • Is it safe to ignore ‘Warning (Low)’ and focus on ‘Fatal’ or ‘Failure’?
  • 14. Bringing Problems to You • Event Handlers, Event Generators, Statistics, Fault Reports and DDM Probes - where to start • Setting Statistic Thresholds • Choosing and configuring probes • Reviewing Faults • Setting up QoS behaviour
  • 15. Bringing Problems To You • Why we set up collection hierarchies for DDM • and how • Daily and Weekly DDM reviews • What to look out for
  • 16. Probes for Mail Servers • Security - Weekly • Directory Performance • Critical mail routes • Mail ‘Slack’
  • 17. Probes for Application Servers • Agent run times • Agent CPU usage • Security and Web Configuration
  • 18. Probes for Struggling Servers • OS level • disk performance (beware of reported SAN problems) • memory • network
  • 19. What to look for • Fatal problems • Persistent Warnings • Peak activity behaviour • uptick in problems at 9am, 1pm etc • Repetitive low level ‘annoyances’
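
The review described on this slide can be sketched in a few lines of Python. This is only an illustration, assuming events have already been exported from DDM or the log as (timestamp, severity, message) tuples. The export format, the sample messages and both function names are hypothetical, not part of Domino.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(events):
    """Count events per hour of day to expose peaks such as 9am or 1pm."""
    return Counter(ts.hour for ts, _severity, _message in events)


def persistent_messages(events, min_repeats=3):
    """Return (message, count) pairs seen at least min_repeats times,
    most frequent first. These are the repetitive low level annoyances."""
    counts = Counter(message for _ts, _severity, message in events)
    return [(msg, n) for msg, n in counts.most_common() if n >= min_repeats]


# Hypothetical sample data; real messages will differ.
sample = [
    (datetime(2014, 3, 17, 9, 2), "Warning (Low)", "Replication skipped"),
    (datetime(2014, 3, 17, 9, 15), "Warning (Low)", "Replication skipped"),
    (datetime(2014, 3, 17, 9, 40), "Failure", "Mail routing stalled"),
    (datetime(2014, 3, 17, 13, 5), "Warning (Low)", "Replication skipped"),
]

print(hourly_counts(sample).most_common(2))
print(persistent_messages(sample))
```

Counting by hour and by message keeps the daily review short: a peak hour or a message that keeps recurring stands out even when every individual event is low severity.
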
  • 20. Catalog.nsf • Not every database is immediately visible but they are all there (just hidden with selection formulae) • It’s a good place to start looking for multiple replicas • It’s a good place to find ACL issues • Replicates around your domain and updates overnight
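
As a rough sketch of the "multiple replicas" check, assume the catalog entries have been exported as (server, file path, replica ID) tuples. That export format, the server name and the replica ID below are all made up for illustration. Grouping by server and replica ID then shows any server holding more than one copy of the same database.

```python
from collections import defaultdict


def duplicate_replicas(entries):
    """Group entries by (server, replica_id) and return only the groups
    that contain more than one file path on the same server."""
    groups = defaultdict(list)
    for server, path, replica_id in entries:
        groups[(server, replica_id)].append(path)
    return {key: sorted(paths) for key, paths in groups.items() if len(paths) > 1}


# Hypothetical catalog export.
catalog = [
    ("Mail01/Acme", "mail/jdoe.nsf", "85257C0A:0051B3D2"),
    ("Mail01/Acme", "mail/old/jdoe.nsf", "85257C0A:0051B3D2"),
    ("Mail02/Acme", "mail/jdoe.nsf", "85257C0A:0051B3D2"),
    ("Mail01/Acme", "names.nsf", "85257C0A:00000001"),
]

for (server, replica_id), paths in duplicate_replicas(catalog).items():
    print(server, replica_id, paths)
```

The same replica ID on two different servers is normal replication, so only repeats on one server are reported.
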
  • 21. QoS - Quality of Service • Monitors server health and performance • Monitors application behavior, stability and hangs • Restarts Domino if it thinks there are memory issues or an application is hung • Shuts down Domino if a clean shutdown doesn’t happen and the server hangs • Controlled via notes.ini settings and dcontroller.ini • Requires Domino to be running under the Java Controller • nserver -jc
  • 22. QoS Configuration • Starting Domino under Java Controller should create a dcontroller.ini file • QOS_Enable=1 • In notes.ini • QOS_ProbeInterval (defaults to 1 min) • QOS_ProbeTimeout (defaults to 5 mins) • QOS_Apps_Timeout • QOS_Shutdown_Timeout
  • 23. QoS - Potential Problems • QoS doesn’t support passwords on server IDs; the restart will pause at the password entry screen • QoS timeouts being too low • Don’t enable QoS on servers without transaction logging
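
The three warnings on this slide can be turned into a small pre-flight check. The sketch below is an assumption-heavy illustration: it reads KEY=value text that you have copied out of dcontroller.ini and notes.ini, it assumes QOS_ProbeTimeout is expressed in minutes (the slide gives its default as 5 mins), and it expects you to tell it whether transaction logging is on and whether the server ID has a password. The function names and the "floor" threshold are hypothetical.

```python
def parse_settings(text):
    """Parse simple KEY=value lines, skipping blanks, comments and section headers.
    Keys are lower-cased so lookups are case-insensitive."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in ";#[" or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip().lower()] = value.strip()
    return settings


def qos_warnings(settings, translog_enabled, server_id_has_password, min_probe_timeout=5):
    """Return a list of human-readable warnings; empty if QoS is not enabled."""
    if settings.get("qos_enable") != "1":
        return []
    warnings = []
    if not translog_enabled:
        warnings.append("QoS is enabled but transaction logging is not")
    if server_id_has_password:
        warnings.append("server ID has a password; a QoS restart will wait at the prompt")
    timeout = settings.get("qos_probetimeout")
    if timeout is not None and int(timeout) < min_probe_timeout:
        warnings.append(
            f"QOS_ProbeTimeout={timeout} is below the chosen floor of {min_probe_timeout}"
        )
    return warnings


# Hypothetical values merged from both ini files.
merged = parse_settings("QOS_Enable=1\nQOS_ProbeTimeout=2\n")
for warning in qos_warnings(merged, translog_enabled=False, server_id_has_password=False):
    print(warning)
```

What counts as "too low" depends on the server, so the floor is a parameter rather than a fixed value.
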
  • 24. Enhanced Fault Reporting • Fault Reporting Database - lndfr.nsf • Expanded to include a ‘by Disposition’ view • all analyzed faults have a disposition value that categorises them as • Problem • Possible Problem (possibly actionable) • Possible Problem (likely NOT actionable) • Informational • Unknown (investigate)
  • 25. Possible Problem - Actionable • Out Of Memory: Represents a crash in which the Java virtual machine (JVM) ran out of a memory resource such as heap space. • Launched Notes multiple times: Indicates that the user quickly launched multiple instances of the Notes client • Possible hang: Indicates that the Notes client was manually terminated while it appeared to be doing useful work. • User Kill: Indicates that the user manually terminated the client while it appeared to be waiting for input or network timeout
  • 26. Back to Full Health • Getting Control • Mail, Databases and ECLs • SMTP • Agent Scheduling • Directories • Adminp • LDAP • Tasks and Internet Site Documents • Domino Configuration Tuner
  • 28. Getting Control - Mail and Databases • Setting ACLs at directory level (Editor) • Lock down ECLs via Policies • Introducing quotas alongside server based archiving • Consider archiving files to a dedicated server • Upgrade to Domino 8 and enable OOO in the router instead of agents • Disable forwarding rules set up by users • Use message tracking and mail rules very sparingly • Disable on the fly searching of non indexed databases
  • 29. Database Management Tools • DBMT Server Command • runs copy-style compact operations • purges deletion stubs • expires soft deleted entries • updates views • reorganizes folders • merges full-text indexes • updates unread lists • ensures that critical views are created for failover • Replaces Updall • load updall -nodbmt tells updall to run but not perform the functions that DBMT already does
  • 30. DBMT Parameters • -compactThreads • -updallThreads • -ftiThreads • -timeLimit refers to compact timeout for DBMT • -range starttime stoptime • -compactNdays (run Compact every x days) • -ftiNdays (run FT Index every x days) • -force d (day, Sunday = 1) fixup if compact fails for consecutive days
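
To show how the parameters on this slide combine, here is a hedged Python sketch that assembles a console command string for a program document or the server console. It uses only option names listed on the slide and assumes compactNdays and ftiNdays take a leading hyphen like the other options. The helper name, the thread counts and the time format in the example are illustrative assumptions, so check the DBMT documentation for your release before using any of them.

```python
def build_dbmt_command(compact_threads=None, updall_threads=None, fti_threads=None,
                       time_limit=None, compact_n_days=None, fti_n_days=None,
                       time_range=None):
    """Assemble a 'load dbmt ...' command from the options that are set.
    time_range is a (start, stop) pair of strings."""
    parts = ["load", "dbmt"]
    options = [
        ("-compactThreads", compact_threads),
        ("-updallThreads", updall_threads),
        ("-ftiThreads", fti_threads),
        ("-timeLimit", time_limit),
        ("-compactNdays", compact_n_days),
        ("-ftiNdays", fti_n_days),
    ]
    for flag, value in options:
        if value is not None:
            parts += [flag, str(value)]
    if time_range is not None:
        start, stop = time_range
        parts += ["-range", start, stop]
    return " ".join(parts)


# Illustrative values only.
print(build_dbmt_command(compact_threads=4, updall_threads=4,
                         compact_n_days=5, time_range=("2:00AM", "6:00AM")))
```

Generating the string from named arguments makes it easier to keep the same settings consistent across several servers' program documents.
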
  • 31. Getting Control - SMTP • Restrict relaying to specific IP addresses, not network ranges • Beware of allowing authenticated relaying and opening up to dictionary attacks • Restrict rights to send to internal groups from internet addresses • Don’t accept mail for local part matches • Configure your server for HTML mail not plain text
  • 32. Getting Control - SMTP (more) • Don’t allow all connecting hosts to deliver mail inbound; if you use a service, restrict to those hosts • Use services / tools to spot attacks such as • persistent attempts to mass deliver within a time period • continual failures by a host to deliver to a correct address • Move responsibility for that first line of defense away from native Domino
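
The kind of detection this slide hands off to an external service or tool can be illustrated with a sliding-window counter. This is a generic sketch, not a Domino feature: the class name, the thresholds and the idea of feeding it (host, timestamp) pairs parsed from SMTP logs are all assumptions.

```python
from collections import defaultdict, deque


class FailureWindow:
    """Track failed deliveries per connecting host inside a sliding time window."""

    def __init__(self, window_seconds=600, max_failures=20):
        self.window_seconds = window_seconds
        self.max_failures = max_failures
        self._failures = defaultdict(deque)

    def record_failure(self, host, timestamp):
        """Record one failure (timestamp in seconds). Return True once the host
        has more than max_failures failures inside the window."""
        recent = self._failures[host]
        recent.append(timestamp)
        # Drop failures that have aged out of the window.
        while recent and recent[0] <= timestamp - self.window_seconds:
            recent.popleft()
        return len(recent) > self.max_failures


window = FailureWindow(window_seconds=60, max_failures=2)
for second in (0, 10, 20):
    suspicious = window.record_failure("203.0.113.9", second)
print(suspicious)
```

A per-host window catches both patterns on the slide: a burst of deliveries in a short period and a host that keeps failing to find a valid address.
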
  • 33. Getting Control - Agent Scheduling • When are agents set to run • amgr_newmaileventdelay • amgr_newmailagentmininterval • If you’re using OOO agents how often are they scheduled • Do users have private agents running • Sh Agents [DBName] • All shared and private agents in a database • Who has rights to run agents
  • 34. Getting Control - Directories • Avoid adding additional views to the Domino Directory • The risk of allowing local replicas with Author rights • Directory Assistance • Sh xdir
  • 35. Getting Control - Adminp • Purge old documents • Requests awaiting approval • Tell adminp process NEW not ALL
  • 36. Getting Control - LDAP • Allowing anonymous access to query LDAP • Authenticating LDAP queries • Extended Directory Catalog used by LDAP • Relying on DNS • Not configuring the LDAP task correctly to allow large searches with no timeouts • Maintaining schema.nsf
  • 37. Getting Control - Tasks and Program Documents • Disable tasks you don’t need • Schedule overnight tasks so they don’t overlap • and don’t conflict with backups • Use program documents so you can review and manage easily • sh config servertasksat* • Keeping templates on every server • Using compact -B
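
Checking that overnight tasks do not overlap each other or the backup window is easy to do by hand for three tasks and tedious for fifteen. A minimal sketch follows, assuming you have written down each task's expected start and end as 'HH:MM' strings on the same night with no window crossing midnight. The task names, times and function names are hypothetical.

```python
def to_minutes(hhmm):
    """Convert 'HH:MM' to minutes after midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def overlapping_tasks(schedule):
    """schedule is a list of (name, start, end). Return pairs of task names
    whose windows overlap, ordered by start time."""
    items = sorted((to_minutes(start), to_minutes(end), name)
                   for name, start, end in schedule)
    clashes = []
    for i, (_start1, end1, name1) in enumerate(items):
        for start2, _end2, name2 in items[i + 1:]:
            if start2 >= end1:
                break  # later tasks start even later, so no more clashes
            clashes.append((name1, name2))
    return clashes


# Hypothetical overnight schedule.
night = [
    ("compact", "01:00", "03:00"),
    ("backup", "02:30", "04:00"),
    ("updall", "04:00", "05:00"),
]
print(overlapping_tasks(night))
```

A task that ends exactly when the next one starts is not reported as a clash.
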
  • 38. Getting Control - Internet Site Documents • Web Configuration means TCP/IP tasks are configured in the server document and are server wide • often enabled by default • Internet site documents require you to opt in for TCP/IP services • configured by hostname
  • 39. Domino Configuration Tuner • Domino Configuration Tuner is an analysis tool based on a set of pre-configured best practice/worst practice rules • The Rules are shipped by IBM with the Lotus installs and are updated via a public update site • Makes recommendations on configuration changes to enhance performance and security and reduce TCO
  • 40. How does it work? • Run and installed via the Domino Configuration Tuner database • Updated by online template updates and rule updates • DCT rules and results are held in a local database and will require a restart of the client for changes to take effect • Scans • Server documents • notes.ini settings • advanced database properties • Intended to scan servers in a single domain
  • 41. How does it work? • Creates reports on each scanned server based on the rules you select • Each report contains • Issues • recommendations for adjustments • links to supporting documentation
  • 42. Pre-requisites • v8 Notes client (standard or basic) or administrator • dct.nsf database and dct.ntf template • servers 7.x or higher
  • 43. Setup • DCT.NSF • StdDominoConfigTuner Template (dct.ntf) • ID must have reader access to names.nsf • ID must have ‘View Administrator’ rights • Requires no server or domain changes
  • 44. View Administrator Rights • Server Document • Security Tab • View Administrator is a subset of ‘Administrator’ rights • Think of it as ‘Show’ not ‘Tell’ rights • Sh users - YES • tell http refresh - NO
  • 45. DCT Preferences • List of all rules • Review rule, description and supporting documentation • All rules are enabled by default for all scans • Enable and Disable rules
  • 46. DCT Updates • Connects to the IBM site to download • must have outbound connectivity
  • 47. DCT Updates • Click ‘check for updates’ • Connects to an external IBM site to identify any template or rule updates
  • 48. DCT Updates • Accept license and updates download • It’s not possible to selectively download
  • 49. DCT Updates - Finished • “Successful” screen will notify you to restart your client • You may need to do 2 client restarts before DCT can be used
  • 50. Running the tuner • First select the servers in your current domain you want to run against • The list of servers is retrieved from the domain of the home server identified in your location document • Change locations to scan a different domain
  • 51. Running the tuner • You can manually type in the full hierarchical names of any other servers you want to scan as part of this analysis • Separate multiple server names with commas, semicolons or new lines • You can only scan servers you can reach so you need a connection document to any you list • or the server needs to be available via your passthru server in your location
  • 52. Understanding the Results • Summary results • Issues by criticality
  • 53. Understanding the Results • Summary results • Servers that failed to scan • reason why scan failed
  • 54. Understanding the Results • Summary results • Detailed list of rules evaluated
  • 55. Understanding the Results • View the current report • Select ‘change’ to view a different report
  • 56. Understanding the Results • Filter results to make analysis easier • by server • by specific rules • by severity
  • 57. Understanding the results • Categorised results of recommendations • Sorted by criticality and then by server name
  • 58. Understanding the results • Each recommendation comes with an explanation so you can evaluate, result by result, whether you want to make the change
  • 59. Understanding the results • Each recommendation is provided with a link to supporting best / worst practice documentation
  • 60. Working with Rules • Disabling and enabling rules can be done through the ‘Preferences’
  • 61. Working with Rules • Selecting a rule shows the description and links to the best / worst practice documentation
  • 62. Making Changes • Advanced Database Properties • assigned en masse via Domino Admin • notes.ini settings • assigned via the command set config xxx = x • shown via the command sh config xxx • Many recommendations refer to ‘some databases’ but don’t specify which ones - check which ones will be affected
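
Once you have decided which notes.ini recommendations to accept, the console commands can be generated rather than typed one at a time. The sketch below is a trivial illustration of that step. The helper name and the values are hypothetical, and the two setting names are simply the agent manager settings mentioned earlier in this deck.

```python
def set_config_commands(changes):
    """Return one 'set config NAME=value' console command per accepted change,
    sorted by setting name so the output is stable and easy to review."""
    return [f"set config {name}={value}" for name, value in sorted(changes.items())]


# Hypothetical values for illustration only.
accepted = {
    "AMgr_NewMailEventDelay": 5,
    "AMgr_NewMailAgentMinInterval": 2,
}
for command in set_config_commands(accepted):
    print(command)
```

Keeping the accepted changes in one reviewed list also gives you a record of what was changed after each DCT run.
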
  • 63. Resources • Domino Configuration Tuner blog • http://www.bleedyellow.com/blogs/DCT/ • details and explanations of new rules published each month
  • 64. Summary • No matter how well your servers are configured, their performance will continue to degrade over time unless you proactively monitor and fix them • Many server performance issues will be seen by your users before they filter down to you • Make reviewing your server configuration using DDM probes followed by a DCT analysis part of every server upgrade • Enable probes that are specific to the server role: Mail and Directory probes on mail servers and Agent probes on application servers • Use Security and Database probes configured in DDM to stay on top of any low-level warnings that could cause larger problems in the future • Don’t over-configure your servers to monitor everything or you’ll be looking for a needle in a haystack. Ask your servers to tell you only what you need to be aware of immediately • Use the built-in tools (DCT, Statistics, DDM, Catalog, Activity Trends) to monitor your servers and gain a good understanding of their ‘normal’ behaviour so you can more easily spot when something goes wrong.
  • 65. Questions • How to contact me: Gabriella Davis • gabriella@turtlepartnership.com • Twitter: gabturtle