AusLUG2012 - Client, Server and Application Monitoring and Optimization done right!

Published in: Technology, Business

  1. AusLUG2012: Client, Server and Application Monitoring and Optimization done right
     Florian Vogler | CEO & CTO | panagenda | www.panagenda.com
     "Efficiency describes the extent to which time or effort is well used for an intended task or purpose."
     Meet.Share.Learn, 29th & 30th March, Melbourne, Victoria, Australia
  2. Agenda: coming up next …
     • Who am I? … and about panagenda
     • Laying the basics of what is actually possible, or:
       – What admins and IT departments have to cope with
     • Deep diving …
       – The 30 most important server statistics (out of ~2,000)
       – … and clients?
       – … and groups?
       – … and databases?
  3. About Florian Vogler
     • CEO & CTO, (hopefully) representative of the great work of my colleagues at panagenda
     • Born in Hamburg (DE); lived in London (UK), Vienna (AT), Frankfurt (DE), Alicante (ES); currently back in Frankfurt (DE)
     • Lotus Notes / Domino since 1992
     • Started to work with Notes at Raiffeisen Austria
       – Administration and development
       – 35,000 users worldwide (today > 100,000)
     • Since 2002, core competency: client management, Notes / Domino infrastructure analysis and optimization
     • I enjoy working with many great companies in many different countries (I travel *a lot*)
  4. About panagenda
     • We cultivate symbiotic relationships with our customers and partners for an ongoing joint win-win
     • HQ: Vienna/AT; offices in Heppenheim near Frankfurt/DE and Boston/USA
     • Development of standard products
     • > 4 million licenses in over 70 countries
     • IBM Lotus Notes client management: MarvelClient :: "99%" manageability (not "just" IBM Lotus Domino)
     • Server analytics, monitoring & reporting: GreenLight :: realtime, longterm, smart
     • Analyze groups, certifiers and ACLs: GroupExplorer :: better transparency, security & automation
     • Plus: NameChanger (name changes), DatabaseExplorer (design analysis), Notes2Web (web transformation)
  6. What admins and IT departments have to cope with
     • Above all: lack of knowledge (apologies)
       – Mostly because of overload
         → No time (anymore) for the inner workings of clients, servers, and systems
         → Growing complexity of single systems
         → Growing number of systems
         → "Laying the egg" = yes; (proactive) "nurturing" = no
       – Unknown sources of knowledge
     • Lack of time
       – If you don't take the time to do things right, you'll need the time to do them over
     • "Wrong" and/or missing tooling
     • Grown environments: large servers are fundamentally different from small ones; new ones (8) from old ones (< 8)!
     [Image: development stages of teddy bears: newborn bear without fur; 3-month-old bear with thick fur; full-grown teddy]
  7. What admins and IT departments have to cope with
     Systemic interactions / dependencies in Lotus Notes / Domino
     • Servers: hardware (CPU, memory), data storage, network connection, configuration, databases, tasks, mail traffic, …
     • Clients: hardware, data storage, network connection, configuration, databases, …
     • Databases: ODS, size, reader fields, design, number & size of documents, …
     • Across all: geographies, network (bandwidth, structure), online/offline, clustering/load balancing, …
  8. Lotus Domino "out of the box" tooling
     • Public NAB
       – Servers
       – Clusters
       – People/Groups
       – Directory
       – Messaging
       – Replication
       – Policies
       – Web Configuration
  9. Lotus Domino "out of the box" tooling
     • Public NAB (8)
     • Log.nsf
       – Miscellaneous (!!)
       – Mail
       – Replication
       – (Database) Usage
       – Passthru Connections
  10. Lotus Domino "out of the box" tooling
      • Public NAB (8)
      • Log.nsf (5)
      • Admin Client
        – Monitoring
      • Tip 1: Enable Health Monitoring in Admin Preferences
      • Tip 2: Disable "Refresh server bookmarks"
  11. Lotus Domino "out of the box" tooling
      • Public NAB (8)
      • Log.nsf (5)
      • Admin Client
        – Monitoring (1)
        – Analysis (~15) (ACL, Catalog, AdminP, ...)
        – Statistics (1 or ~1,200)
        – Activity Trends ("1")
        – Messaging ("1")
        – Replication ("1")
  12. Lotus Domino "out of the box" tooling
      • Public NAB (8)
      • Log.nsf (5)
      • Admin Client (20)
      • Events (1) & DDM …
        – Probes
        – Filters
        – Collection Hierarchy
        – Event Handlers
        – Event Generators
  13. Lotus Domino "out of the box" tooling
      • Public NAB (8)
      • Log.nsf (5)
      • Admin Client (20)
      • Events & DDM (6)
      • Monitoring Results (statrep)
        – Alarms
        – Events
        – Statistic Reports
  14. Lotus Domino "out of the box" tooling: 42
      • Public NAB (8)
      • Log.nsf (5)
      • Admin Client (20)
      • Events & DDM (6)
      • Monitoring Results (statrep) (3)
      • 8 + 5 + 20 + 6 + 3 = 42
      • That's at least 42 views / areas that one should monitor ... for LN/D monitoring & analysis
      • Although 42 is "the answer to life, the universe and everything" (according to the Hitchhiker's Guide to the Galaxy), that doesn't help much
      • Tip 3: In case you don't know the Hitchhiker's Guide to the Galaxy by Douglas Adams → must read
  15. Making more of what you already have
      • Many companies don't even use what's already in the box …
      • (As said earlier): Realtime server monitoring with Health Monitoring
      • DDM: Domino Domain Monitoring (sometimes a bit too much, but then again much better than nothing!)
      • Frequent reviews of groups
      • Frequent checking of the most important server stats (more on that later)
      • Look through Lotusphere presentations
      • …
      • Investigate the usage views in log.nsf; for example …
  16. A sample analysis of usage information from log.nsf (that you can easily do yourself)
      Copy/paste into Excel → Data → Sort by, e.g., transactions
  17. Possibilities are endless (unfortunately, time is not)
      • In almost all of the aforementioned areas one can (and should) "dig deeper"
      • Unfortunately, digging deeper requires (time-consuming) correlation of data, e.g. …
      • Connection documents and log.nsf (db usage):
        → How much mail and/or replication traffic is there between which servers?
      • Clients and log.nsf (database usage):
        → Which users cause what load from where?
      • Database details from clients and servers:
        → Who has replicas of databases s/he no longer has access to?
        → Who has (unencrypted) replicas of critical databases?
      • Network compression between servers and clients
      • A lot of the data is either already there or (relatively ;-)) easy to get hold of
      • Correlation pays back (repeatedly) …
  18. A picture says a thousand words …
      Topological visualization of mail and replication traffic between servers
  19. A picture says a thousand words …
      One way to look at network compression
      • 87% of your IBM Lotus Domino servers use port compression (33 of 38 servers); pictogram scale: one icon = 1 server
      • 75% of your IBM Notes clients use port compression (35,409 of 47,212 clients); pictogram scale: one icon = 1,000 clients
  20. A picture says a thousand words …
      Another way to look at network compression
      [Chart: transferred vs. saved volume (GByte) per setup; transferred: current setup 2.30, no port compression 3.30, full port compression 1.65]
      • Network transfer volume per day: 3.3 GByte
      • Current settings: 60% configured "correctly" → ~1 GByte / 30% saved
      • Applying port compression to all your servers and clients could save you an additional ~0.65 GByte every day, which is an additional 28% reduction / an absolute 50% reduction of traffic
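
The percentages on this slide can be re-derived from the three chart values. The following is a minimal Python sketch, assuming those values are daily transfer volumes in GByte; the function and key names are illustrative and do not come from the talk.

```python
def compression_savings(no_compression, current, full_compression):
    """Derive savings figures from three daily transfer volumes (GByte)."""
    saved_now = no_compression - current      # already saved by the current setup
    extra = current - full_compression        # additional saving with full compression
    return {
        "saved_now_gb": saved_now,
        "saved_now_pct": saved_now / no_compression,
        "extra_gb": extra,
        "extra_pct_of_current": extra / current,
        "total_pct": (no_compression - full_compression) / no_compression,
    }


# Chart values from the slide.
result = compression_savings(no_compression=3.30, current=2.30, full_compression=1.65)
for key, value in result.items():
    print(f"{key}: {value:.2f}")
```

Note that the "additional 28%" is relative to the current 2.30 GByte, while the "absolute 50%" is relative to the uncompressed 3.30 GByte.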
  22. Before we look at the 30 most important server statistics …
      • Difficult, if not impossible, to test in the lab
      • Start with the obvious / easy things
      • Note down current settings before changing them
      • Think about possible interdependencies
      • "Too much of a good thing" can actually harm performance (or lead to "Out of Memory")
      • Don't change (too) many things at once
        – Unless it's absolutely necessary / "documented" that way
      • Watch your servers for a sensible amount of time after making changes
      • Check whether your servers are actually doing better
      • "Google"
      • Think along / think ahead
      • Have the heart to try
      • This is just the beginning: stay curious!
  23. And another preliminary note (last one(s), promised ;-))
      • Many of the following statistics cannot be grasped with a single "sh sta", but require analysis over time
        – Otherwise you won't know whether you're looking at a permanent / recurring / one-time / occasional problem
        – Otherwise you won't know whether changes actually improved things (or made things worse)
      • A picture says a thousand words …
      • The Admin Client can be used as a starting point … (unfortunately, it is very limited)
  24. ViewRebuildDir & disk optimization(s)
      • Most important of all: free disk space & disk performance ("30%" free to prevent fragmentation)
      • Separate, dedicated disks for …
        – Translog
        – Data
        – If possible, a separate disk for page file / OS
        – "ViewRebuildDir"=… (view indexing on its own disk)
        – From 8.5.3 on, where necessary/wanted: .ft directories on their own disk
        – DAOS ("cheap")
  25. Server.Availability
      • Shows how available (= "ready to respond") a server is, in %
      • < 30% means trouble (or load balancing), IF the Availability Index is correct in the first place …
      • (Only!) if the server is really busy: run "sh ai" on the server console; it results in a recommendation on how to tune ini:SERVER_TRANSINFO_RANGE
      • From Notes 8.5 and up, you are advised to set:
        – notes.ini: Server_MinPossibleTransTime=1500
        – notes.ini: Server_MaxPossibleTransTime=20000000
      • Important: delete loadmon.ncf after server shutdown in order to delete old values
  26. Keep an eye on Monitor.* warnings; examples:
      • Monitor.Last.ADMIN PROCESS.Warning(High)Text = Disk space statistics could not be found on Servername/Cert.
      • Monitor.Last.EVENT MONITOR.Warning(High)Text = Event: Error adding event document to Domino Domain Monitoring: Event correlation cache is full. You can increase its size via the NOTES.INI setting EVENT_CORRELATION_POOL_SIZE.
      • Monitor.Last.INDEX ALL.Warning(High)Text = Error updating view #4538 in mailnameabc.nsf: The single copy template associated with this database cannot be located.
      • Monitor.Last.SMTP SERVER.FailureText = SMTP Server: Initialization failure: Message Queue name already in use.
      • Monitor.Last.STATISTICS.Warning(High)Text = Unable to update activity document in log database for mailnamexyz.nsf: Cannot write to database because the database would exceed its allowed size. (message translated from German)
  27. Server.Sessions.Dropped
      • Tells you how many sessions have been "dropped" since the last server restart
      • Happens when
        – issuing a server-side "Drop all" (a "different" problem)
        – pressing Ctrl+Break on clients ("frustration meter")
      • Example: Server.Sessions.Dropped = 25407
        – 18/6 to 18/10 = 4 * 30 = 120 days
        – 25407 / 120 = 211.7, i.e. more than 211 sessions dropped per day
      • Should be further correlated with the peak number of users
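
The per-day figure can be checked with a few lines of Python. This is a rough sketch: the helper name is made up for illustration, and the year 2011 is an assumption, since the slide only gives day and month.

```python
from datetime import date


def dropped_per_day(dropped_total, start, end):
    """Average number of dropped sessions per calendar day between two dates."""
    days = (end - start).days
    return dropped_total / days


# The slide's approximation: four months counted as 4 * 30 = 120 days.
approx_rate = 25407 / 120

# Exact calendar days; the year 2011 is an assumption (the slide gives none).
exact_days = (date(2011, 10, 18) - date(2011, 6, 18)).days
exact_rate = dropped_per_day(25407, date(2011, 6, 18), date(2011, 10, 18))

print(f"approximate: {approx_rate:.1f} per day, exact: {exact_rate:.1f} per day over {exact_days} days")
```

Either way the order of magnitude is the same: roughly 200 dropped sessions per day, which only becomes meaningful once it is set against the number of users.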
  28. Platform.LogicalDisk.*
      Platform.LogicalDisk.<n>.<statistic>    Disk 1        Disk 2
      AssignedName                            D             C
      AvgQueueLen                             0             0.01
      AvgQueueLen.Avg                         0.01          0.73
      AvgQueueLen.Peak                        1.01          34.74
      BytesReadPerSec                         0             17,272.75
      BytesWrittenPerSec                      10,172.49     63,697.52
      PctUtil                                 0.22          1.11
      PctUtil.Avg                             0.86          72.8
      PctUtil.Peak                            101.07        3,473.81
      ReadsPerSec                             0             2.58
      WritesPerSec                            2.07          7.3
      Interpretation
      • AvgQueueLen: GOOD below 2%, BAD above 5% (careful: 1-100% is reported as 0.01 to 1.0!)
      • PctUtil: GOOD below 80% (1-100% is reported as 1 to 100)
      • NOTE: you may need to divide by the number of spindles → SAN/NAS
      Solution
      • Various parameters (buffer pool, cache, name lookup) and OS / disk tuning
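
The interpretation rules for the disk statistics can be expressed as a small classifier. The sketch below uses the slide's sample values, written with decimal points. "WATCH" is a hypothetical label for the range between GOOD and BAD, which the slide does not name, and the `spindles` parameter is an illustrative way to apply the note about dividing by the number of spindles.

```python
# Thresholds as read from the slide; "WATCH" is a hypothetical label for the
# range between GOOD and BAD, which the slide does not name.
QUEUE_GOOD_BELOW = 0.02    # 2 %, reported by Domino as 0.02
QUEUE_BAD_ABOVE = 0.05     # 5 %, reported by Domino as 0.05
PCT_UTIL_GOOD_BELOW = 80   # PctUtil is reported on a 1-100 scale


def classify_queue_len(avg_queue_len):
    """Classify an AvgQueueLen value against the slide's thresholds."""
    if avg_queue_len < QUEUE_GOOD_BELOW:
        return "GOOD"
    if avg_queue_len > QUEUE_BAD_ABOVE:
        return "BAD"
    return "WATCH"


def classify_pct_util(pct_util, spindles=1):
    """Divide by the number of spindles first, as the slide's note suggests."""
    return "GOOD" if pct_util / spindles < PCT_UTIL_GOOD_BELOW else "BAD"


# Average values from the slide, keyed by assigned drive name.
disks = {
    "D": {"AvgQueueLen.Avg": 0.01, "PctUtil.Avg": 0.86},
    "C": {"AvgQueueLen.Avg": 0.73, "PctUtil.Avg": 72.8},
}
report = {
    name: (classify_queue_len(s["AvgQueueLen.Avg"]), classify_pct_util(s["PctUtil.Avg"]))
    for name, s in disks.items()
}
print(report)
```

With these sample values, drive C stands out on queue length even though its average utilization is still under the 80% mark.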
  29. Platform.LogicalDisk.#.PctUtil
  30. Mail.Mailbox.*
      • (Mail.Mailbox.AccessConflicts / Mail.Mailbox.Accesses) x 100 must be < 2; otherwise, add another mailbox (the benefit diminishes beyond 4-5 mailboxes)
      • Example:
        – Mail.Mailbox.AccessConflicts = 1636
        – Mail.Mailbox.Accesses = 189864
        – (1636 / 189864) x 100 = 0.86 = OK
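
As a quick worked example, the mailbox conflict ratio can be computed like this. The function name and the guard for zero accesses are illustrative additions, not part of the talk.

```python
def mailbox_conflict_pct(access_conflicts, accesses):
    """Access conflicts as a percentage of all mailbox accesses."""
    if accesses == 0:
        return 0.0  # no accesses yet, so nothing to evaluate
    return access_conflicts / accesses * 100


# Sample values from the slide.
pct = mailbox_conflict_pct(1636, 189864)
needs_another_mailbox = pct >= 2  # the slide's rule: the ratio must stay below 2

print(f"{pct:.2f}% conflicts, add another mailbox: {needs_another_mailbox}")
```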
  31. Update.PendingList
      • Update.PendingList = number of views waiting to be updated
      • If Update.PendingList "is often" > 0, then … → separate FTI and view updates
      • Notes.ini:
        – Update_Fulltext_Thread=1
        – FTUPDATE_IDLE_TIME=4
      • Background:
        – If you have many databases/apps … and a busy update task, the full text index could be the reason for slowing down / "blocking" view indexing
        – Update_Fulltext_Thread=1: FTI then runs in its own thread, which improves performance
      • Speaking of full text indexing: you can isolate the FTI thread from the limited Domino update pool with ftg_use_sys_memory=1; the FTI thread then gets its memory from the OS pool, which relieves Domino system memory
  32. Database.Database.BufferPool.*
      • Database.Database.BufferPool.PerCentReadsInBuffer = 78.96
      • PerCentReadsInBuffer: BAD below 90%, PERFECT above 98% (but 99.9% is bad, too!)
        – A low value typically leads to too many requests going to disk
        – The server needs a larger buffer pool
      • Solution: notes.ini NSF_Buffer_Pool_Size_MB=n (in MB)
        – Default: 512 MB
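
One possible way to encode the buffer pool rule of thumb is sketched below. The labels "OK" and "SUSPICIOUS" are hypothetical names for the ranges the slide leaves unnamed (90-98%, and the "99.9% is bad, too" case).

```python
def classify_buffer_pool(pct_reads_in_buffer):
    """Classify PerCentReadsInBuffer using the thresholds from the slide."""
    if pct_reads_in_buffer >= 99.9:
        return "SUSPICIOUS"  # hypothetical label: the slide calls 99.9% "bad, too"
    if pct_reads_in_buffer > 98:
        return "PERFECT"
    if pct_reads_in_buffer >= 90:
        return "OK"          # hypothetical label for the 90-98% range
    return "BAD"


# Sample value from the slide.
verdict = classify_buffer_pool(78.96)
print(verdict)
```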
  33. Database.DbCache.*
      • Database.DbCache.CurrentEntries = 1647
      • Database.DbCache.HighWaterMark = 1691
      • Database.DbCache.MaxEntries = 1536
      • Database.DbCache.OvercrowdingRejections = 0
      • GOOD = HighWaterMark < MaxEntries
      • GOOD = 0 OvercrowdingRejections
      • Solution: notes.ini NSF_DbCache_MaxEntries=n
        – Default: NSF_Buffer_Pool_Size x 3
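
The two "GOOD" conditions can be checked mechanically. This is a small sketch with the slide's sample values; the dictionary layout and the function name are assumptions made for illustration.

```python
def check_db_cache(stats):
    """Return a list of findings for Database.DbCache.* statistics."""
    findings = []
    if stats["HighWaterMark"] >= stats["MaxEntries"]:
        findings.append("HighWaterMark has reached or exceeded MaxEntries")
    if stats["OvercrowdingRejections"] > 0:
        findings.append("OvercrowdingRejections is above 0")
    return findings


# Sample values from the slide, keyed by the last part of the statistic name.
sample = {
    "CurrentEntries": 1647,
    "HighWaterMark": 1691,
    "MaxEntries": 1536,
    "OvercrowdingRejections": 0,
}
findings = check_db_cache(sample)
print(findings or "DbCache looks fine")
```

For the sample server the high-water mark (1691) is above MaxEntries (1536), so the first condition fails even though nothing has been rejected yet.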
  34. Replica.Cluster.*
      • Replica.Cluster.Failed
      • Replica.Cluster.SecondsOnQueue
      • Replica.Cluster.WorkQueueDepth
      • SecondsOnQueue: PERFECT below 10, BAD above 15
      • WorkQueueDepth: PERFECT below 10, BAD above 15
      • Solution:
        – Add more cluster replicators
        – Optimize cluster load (e.g. "manually" balance users across the cluster if it is not load-balanced)
  35. Server.Trans.PerMinute
      • Server.Trans.PerMinute = 956
      • Server.Users = 26 → 956 / 26 = 36.8 transactions per minute per user
      • Trans.PerMinute (per user): LIGHT below 10, HEAVY above 30
      • Solution:
        – Identify the users causing load (db usage view!)
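
The per-user load calculation, sketched in Python with the sample values from the slide. "NORMAL" is a hypothetical label for the range between LIGHT and HEAVY, which the slide does not name.

```python
def trans_per_user(trans_per_minute, users):
    """Transactions per minute per connected user."""
    if users <= 0:
        raise ValueError("users must be positive")
    return trans_per_minute / users


def classify_load(per_user):
    """LIGHT below 10, HEAVY above 30; "NORMAL" is a hypothetical in-between label."""
    if per_user > 30:
        return "HEAVY"
    if per_user < 10:
        return "LIGHT"
    return "NORMAL"


# Server.Trans.PerMinute and Server.Users from the slide.
per_user = trans_per_user(956, 26)
load = classify_load(per_user)
print(f"{per_user:.1f} transactions per minute per user: {load}")
```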
  36. Database.NAMELookupCache*
      • Database.NAMELookupCacheCacheSize = 2,513,328
      • Database.NAMELookupCacheHits = 24,628,339
      • Database.NAMELookupCacheMisses = 48,160,502
      • IMPORTANT: NoHitHits! → cache too small or too large(!)
      • Misses > Hits: "double-check" ini:NLCache_Size=16000000
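
To make the "Misses > Hits" check concrete, here is a short sketch that computes the hit ratio from the two counters on the slide. The hit ratio itself is not shown in the talk; it is derived here only for illustration.

```python
def cache_hit_ratio(hits, misses):
    """Fraction of lookups answered from the cache."""
    total = hits + misses
    return hits / total if total else 0.0


# Counter values from the slide.
hits = 24_628_339
misses = 48_160_502

ratio = cache_hit_ratio(hits, misses)
double_check_nlcache_size = misses > hits  # the slide's trigger for reviewing NLCache_Size

print(f"hit ratio: {ratio:.1%}, review NLCache_Size: {double_check_nlcache_size}")
```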
  37. Server.ConcurrentTasks*
      • Server.ConcurrentTasks
      • Server.ConcurrentTasks.Waiting
      • Waiting should be ZERO (0)
      • Solution:
        – Server_Pool_Tasks = n (e.g. 80)
        – Server_Max_Concurrent_Trans = m (e.g. Server_Pool_Tasks * number of ports)
  38. Platform.PagingFile.Total.*
      • Platform.PagingFile.Total.PctUtil = 0.28
      • Platform.PagingFile.Total.PctUtil.Avg = 0.14
      • Platform.PagingFile.Total.PctUtil.Peak = 0.8
      • PctUtil.Avg: OK close to 0%, BAD above 10%
      • Solution: OS-level tuning, check memory
      • Note: if "sh sta" doesn't show Platform.* stats → see the Admin Help
  40. Sponsor Break: Sneak Peek during Social Evening
      http://panagenda.com/giftoftransparency
      • Efficient client analysis is impossible without additional tooling
      • FREE 4-week license of panagenda GreenLight, our server monitoring and reporting solution; includes Database Analyzer for 1 year for one of your servers
      • FREE one-year license of panagenda MarvelClient Analyze
        – The results speak for themselves on "just" the client side
        – The results can also be used together with GreenLight
      • For groups and databases, we also have GroupExplorer and DatabaseExplorer
      • Whether we may help you is up to you
  41. Timeout
      Spending 60 minutes on performance improvements can be compared to a walk on the tip of the iceberg: we have worked on MANY more business cases and solved MANY more problems than those mentioned just now. If your problem was not mentioned in this session, be it a client, server, design, admin or other challenge: we would love to hear from you.
  42. Thank you for listening – Questions? Answers! Q&A
  43. Contact me – I look forward to hearing from you!
      panagenda GmbH
      Doblhoffgasse 7 / 6a :: 1010 Vienna :: Austria
      Web: http://www.panagenda.com
      Email: office@panagenda.com
      Fax: +43 1 89 012 89 – 15
  44. Resources / Links
      • Daniel Nashed, Nash!Com
      • LS08: BP112
      • LS11: BP102, BP110, BP118
      • LS12: BP110, BP121, ID112, ID114
      • Windows Indexing: http://bit.ly/ACzO6Z
      • "The internet": google "Domino performance ibm"; great IBM whitepapers and articles, and some very good sites out there