Read the full guide at: http://www.datadoghq.com/blog/monitoring-windows-server/
Lifting the Blinds: Monitoring Windows Server 2012
Datadog Overview
• SaaS-based infrastructure and app monitoring
• Open-source Agent
• Time series data (metrics and events)
• Processing nearly a trillion data points per day
• Intelligent alerting and insightful dashboards
Monitor Everything
Operating systems, cloud providers (AWS), containers, web servers, datastores, caches, queues, and more...
Agenda
- Why should I monitor Windows Server?
- What are some indicators of performance issues?
- How can I collect performance metrics for analysis?
What to monitor?
CPU metrics
- PercentProcessorTime
- ContextSwitchesPersec
- ProcessorQueueLength
- DPCsQueuedPersec
- PercentPrivilegedTime
- PercentDPCTime
- PercentInterruptTime
CPU: ContextSwitchesPersec
What it tracks:
Rate at which processors switch from one thread to another (switches per second, system-wide)
Correlate with:
Memory: PageFaultsPersec
Disk: DiskTransfersPersec
Network: BytesSentPersec/BytesReceivedPersec
Issue resolution:
Adding processors, thread partitioning, DPC partitioning,
hardware interrupt partitioning, disable I/O counters
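ContextSwitchesPersec is a rate. If you sample the underlying cumulative count yourself (for example, from a raw counter), the per-second rate is the difference between two readings divided by the elapsed time. A minimal sketch, assuming two (count, timestamp) samples; it also normalizes by logical processor count so hosts of different sizes can be compared. The `Sample` type, function names, and readings are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """A cumulative counter reading and the time (in seconds) it was taken."""
    count: int
    timestamp: float


def rate_per_second(first: Sample, second: Sample) -> float:
    """Per-second rate of a monotonically increasing counter between two samples."""
    elapsed = second.timestamp - first.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = second.count - first.count
    if delta < 0:
        raise ValueError("counter went backwards (reset or wrap)")
    return delta / elapsed


def per_processor(rate: float, logical_processors: int) -> float:
    """Normalize a system-wide rate by the number of logical processors."""
    if logical_processors < 1:
        raise ValueError("need at least one logical processor")
    return rate / logical_processors


if __name__ == "__main__":
    # Hypothetical readings taken two seconds apart on a 4-processor host.
    switches = rate_per_second(Sample(1_000, 0.0), Sample(31_000, 2.0))
    print(switches)                    # 15000.0
    print(per_processor(switches, 4))  # 3750.0
```

If you read the formatted counter (already per second), only the normalization step applies.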
CPU: PercentProcessorTime
What it tracks:
Percentage of time spent performing work (not idle)
Correlate with:
ProcessorQueueLength
Issue resolution:
More processors, bigger instance, optimize offending application
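A single spike in PercentProcessorTime is rarely actionable; sustained saturation is. A minimal sketch, assuming one sample per collection interval: report a breach only when every sample in a sliding window exceeds a threshold. The 90% threshold, five-sample window, and function name are illustrative assumptions, not values from the guide.

```python
from collections import deque
from typing import Iterable, Iterator


def sustained_breaches(samples: Iterable[float], threshold: float = 90.0,
                       window: int = 5) -> Iterator[int]:
    """Yield the index of each sample that completes a window in which
    every value is strictly above `threshold`."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent: deque = deque(maxlen=window)  # keeps only the last `window` samples
    for index, value in enumerate(samples):
        recent.append(value)
        if len(recent) == window and min(recent) > threshold:
            yield index


if __name__ == "__main__":
    cpu = [50.0, 95.0, 96.0, 97.0, 98.0, 99.0, 60.0]
    print(list(sustained_breaches(cpu)))  # [5]
```

A short spike such as `[50.0, 99.0, 50.0]` produces no breach, which is the point of windowing.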
CPU: ProcessorQueueLength
What it tracks:
Number of threads ready to run but waiting for a processor (a single queue shared by all processors)
Correlate with:
CPU: PercentProcessorTime, PercentPrivilegedTime, PercentDPCTime, PercentInterruptTime
Issue resolution:
Adding processors, thread partitioning, DPC partitioning,
hardware interrupt partitioning, disable I/O counters
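Because there is one queue for all processors, ProcessorQueueLength is usually read per processor and alongside PercentProcessorTime. A long queue while the CPU is also busy suggests the host is CPU-bound; a long queue with modest utilization is a cue to look at the other counters listed under "Correlate with". A minimal sketch of that reasoning; both thresholds and the labels are hypothetical defaults to tune per workload.

```python
def diagnose_cpu(queue_length: int, logical_processors: int,
                 percent_processor_time: float,
                 queue_per_cpu_limit: float = 2.0,
                 busy_limit: float = 85.0) -> str:
    """Classify one sample of ProcessorQueueLength and PercentProcessorTime.

    Thresholds are illustrative assumptions, not official guidance.
    """
    if logical_processors < 1:
        raise ValueError("need at least one logical processor")
    queue_per_cpu = queue_length / logical_processors
    long_queue = queue_per_cpu > queue_per_cpu_limit
    busy = percent_processor_time > busy_limit
    if long_queue and busy:
        return "cpu-bound"    # more runnable threads than the CPUs can serve
    if long_queue:
        return "investigate"  # queueing without high utilization
    return "ok"
```

For example, a queue of 20 on a 4-processor host at 95% busy classifies as `"cpu-bound"`, while the same queue at 30% busy returns `"investigate"`.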
CPU: DPCsQueuedPersec
What it tracks:
Deferred procedure call (DPC) enqueue rate
Correlate with:
CPU: PercentDPCTime
Disk: DiskTransfersPersec
Network: BytesSentPersec/BytesReceivedPersec
Issue resolution:
Remove buggy device, roll back driver
CPU: PercentPrivilegedTime/PercentDPCTime/PercentInterruptTime
What they track:
Percentage of time the CPU spent in privileged mode, in deferred procedure calls, and servicing interrupts
Correlate with:
ContextSwitchesPersec, and each other (PercentPrivilegedTime, PercentDPCTime, PercentInterruptTime)
Issue resolution:
Adding processors, thread partitioning, DPC partitioning,
hardware interrupt partitioning, disable I/O counters
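DPC partitioning and hardware interrupt partitioning, listed above as resolutions, target the case where DPC and interrupt work piles up on a few processors. A minimal sketch, assuming you collect PercentDPCTime and PercentInterruptTime per processor instance rather than only `_Total`: flag any processor whose combined DPC and interrupt time is well above the average of the other processors. The thresholds and function name are hypothetical.

```python
def interrupt_hotspots(per_cpu: dict[str, tuple[float, float]],
                       ratio_limit: float = 3.0,
                       floor: float = 5.0) -> list[str]:
    """Return processor instances whose DPC + interrupt time stands out.

    per_cpu maps an instance name (e.g. "0", "1") to
    (PercentDPCTime, PercentInterruptTime) for the same interval.
    A CPU is flagged when its combined time exceeds `floor` percent AND is
    more than `ratio_limit` times the mean of the other CPUs.
    """
    combined = {cpu: dpc + irq for cpu, (dpc, irq) in per_cpu.items()}
    if len(combined) < 2:
        return []  # nothing to compare against
    total = sum(combined.values())
    hot = []
    for cpu, value in combined.items():
        others_mean = (total - value) / (len(combined) - 1)
        if value > floor and value > ratio_limit * others_mean:
            hot.append(cpu)
    return sorted(hot)
```

Comparing against the *other* processors (not the overall mean) keeps the check meaningful on hosts with only two CPUs.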
Memory metrics
- PoolNonpagedBytes
- PageFaultsPersec
- PagesInputPersec
Memory: PoolNonpagedBytes
What it tracks:
Bytes of kernel memory in the nonpaged pool (memory that cannot be paged out to disk)
Correlate with:
Windows Event 2019 “Nonpaged Memory Pool Empty”
Issue resolution:
Identify troublesome driver/roll back to known good state
Memory: PageFaultsPersec
What it tracks:
Rate of page faults (both soft and hard)
Correlate with:
PagesInputPersec
Issue resolution:
Increase system memory
Memory: PagesInputPersec
What it tracks:
Rate at which pages are read from disk into memory
Correlate with:
PageFaultsPersec, Disk: DiskTransfersPersec
Issue resolution:
Increase system memory, move page file to separate physical disk
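PageFaultsPersec counts both soft faults (resolved from memory) and hard faults (which require a disk read), while PagesInputPersec counts pages actually read from disk. Comparing the two gives a rough sense of how much faulting reaches the disk. A minimal sketch; the ratio is an approximation (one hard fault can read several pages, so it can exceed 1.0), and the function name is hypothetical.

```python
def disk_backed_fault_ratio(pages_input_per_sec: float,
                            page_faults_per_sec: float) -> float:
    """Approximate share of faulting that required reading from disk.

    Returns 0.0 when there were no faults in the interval.
    """
    if page_faults_per_sec < 0 or pages_input_per_sec < 0:
        raise ValueError("rates cannot be negative")
    if page_faults_per_sec == 0:
        return 0.0
    return pages_input_per_sec / page_faults_per_sec
```

A host with 500 faults/s but only 50 pages input/s is mostly soft-faulting; a ratio that climbs over time alongside DiskTransfersPersec points to memory pressure.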
Disk metrics
- AvgDiskQueueLength
- DiskTransfersPersec
- PercentIdleTime
Disk: AvgDiskQueueLength
What it tracks:
Average number of read and write requests queued for the disk during the sample interval
Correlate with:
DiskTransfersPersec
Issue resolution:
Move data for I/O-intensive applications to separate disk; add disks to system
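AvgDiskQueueLength and DiskTransfersPersec are related through Little's law: the average number of requests in the system equals the arrival rate times the average time each request spends there. A worked sketch, assuming you also collect average latency per transfer (the "Avg. Disk sec/Transfer" counter, which the slides do not list): multiplying the two estimates queue length and serves as a sanity check on the reported value. The function name is hypothetical.

```python
def estimated_queue_length(transfers_per_sec: float,
                           avg_sec_per_transfer: float) -> float:
    """Little's law, L = lambda * W: requests in flight = rate * latency."""
    if transfers_per_sec < 0 or avg_sec_per_transfer < 0:
        raise ValueError("inputs must be non-negative")
    return transfers_per_sec * avg_sec_per_transfer


if __name__ == "__main__":
    # 250 transfers/s at 8 ms each keeps about 2 requests outstanding.
    print(estimated_queue_length(250.0, 0.008))
```

The same relation explains why the two counters rise together: at fixed latency, more transfers per second means a longer queue.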
Disk: DiskTransfersPersec
What it tracks:
Combined rate of read and write operations on the disk
Correlate with:
AvgDiskQueueLength
Issue resolution:
Move data for I/O-intensive applications to separate disk; add disks to system; increase disk cache
Disk: PercentIdleTime
What it tracks:
Percent of time disk is idle
Correlate with:
AvgDiskQueueLength
Issue resolution:
Move page file to separate disk; add disks to system; use SSDs
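PercentIdleTime is often easier to reason about inverted into busy time. A minimal sketch, assuming evenly spaced samples: convert each idle reading to percent busy (defensively clamped to 0 to 100) and report the share of the period in which the disk was busier than a threshold. The 90% limit and function names are hypothetical.

```python
def busy_percent(percent_idle_time: float) -> float:
    """Convert PercentIdleTime into percent busy, clamped to 0..100."""
    return min(100.0, max(0.0, 100.0 - percent_idle_time))


def fraction_saturated(idle_samples: list[float], busy_limit: float = 90.0) -> float:
    """Share of samples in which the disk was busier than `busy_limit` percent."""
    if not idle_samples:
        raise ValueError("no samples")
    saturated = sum(1 for idle in idle_samples if busy_percent(idle) > busy_limit)
    return saturated / len(idle_samples)
```

For example, idle readings of 50, 5, 2, and 80 correspond to 50%, 95%, 98%, and 20% busy, so half the samples count as saturated.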
Tooling
Word of Warning
PowerShell
- Windows' scripting language (no more batch files!)
- Powerful language with deep OS support
- Integrates natively with C# (.NET)
- Output consists of typed objects, not plain text (unlike *NIX shells)
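The metric names used throughout this deck match property names on Windows' formatted-data WMI performance classes, which PowerShell can query with `Get-CimInstance -Query`. The class mapping below is an assumption based on the standard `Win32_PerfFormattedData_*` classes and is worth verifying on your Windows version. To keep examples in one language, this sketch is Python that only builds WQL strings; running the queries requires Windows.

```python
# Assumed mapping: WMI class -> (has per-instance rows?, metric properties).
WMI_CLASSES: dict[str, tuple[bool, tuple[str, ...]]] = {
    "Win32_PerfFormattedData_PerfOS_Processor": (True, (
        "PercentProcessorTime", "PercentPrivilegedTime", "PercentDPCTime",
        "PercentInterruptTime", "DPCsQueuedPersec")),
    "Win32_PerfFormattedData_PerfOS_System": (False, (
        "ContextSwitchesPersec", "ProcessorQueueLength")),
    "Win32_PerfFormattedData_PerfOS_Memory": (False, (
        "PoolNonpagedBytes", "PageFaultsPersec", "PagesInputPersec")),
    "Win32_PerfFormattedData_PerfDisk_PhysicalDisk": (True, (
        "AvgDiskQueueLength", "DiskTransfersPersec", "PercentIdleTime")),
}


def wql_for(metric: str, instance: str = "_Total") -> str:
    """Build a WQL SELECT for one metric; per-instance classes are filtered by Name."""
    if "'" in instance:
        raise ValueError("instance name must not contain quotes")
    for wmi_class, (per_instance, properties) in WMI_CLASSES.items():
        if metric in properties:
            if per_instance:
                return f"SELECT {metric} FROM {wmi_class} WHERE Name = '{instance}'"
            return f"SELECT {metric} FROM {wmi_class}"  # singleton class
    raise KeyError(f"unknown metric: {metric}")
```

On a Windows host, a generated string such as `SELECT PercentIdleTime FROM Win32_PerfFormattedData_PerfDisk_PhysicalDisk WHERE Name = '_Total'` could be passed to `Get-CimInstance -Query`.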
Perfmon
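Perfmon counters can also be captured from the command line and saved as CSV for offline analysis, for example with `typeperf "\Processor(_Total)\% Processor Time" -si 5 -sc 12 -o cpu.csv`. Assuming that CSV layout (a timestamp in the first column, then one column per counter path), the standard `csv` module is enough to pull out each series. The helper name is hypothetical, and the sample text in the test is hand-written rather than captured output.

```python
import csv
import io


def read_counter_csv(text: str) -> dict[str, list[float]]:
    """Parse PDH-style CSV into {counter path: [values]}.

    The first column is assumed to be a timestamp. Blank or non-numeric
    cells (e.g. a missed sample written as " ") are skipped.
    """
    rows = csv.reader(io.StringIO(text))
    header = next(rows)
    series: dict[str, list[float]] = {name: [] for name in header[1:]}
    for row in rows:
        for name, cell in zip(header[1:], row[1:]):
            try:
                series[name].append(float(cell))
            except ValueError:
                continue  # missing or malformed sample
    return series
```

The resulting lists can feed any of the checks sketched earlier in this deck.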
Windows Performance Toolkit
Requires the Windows Assessment and Deployment Kit (Windows ADK), which includes the Windows Performance Toolkit
https://www.microsoft.com/en-US/download/details.aspx?id=39982
Windows Performance Recorder
Questions?
Evan Mouzakitis
Research Engineer
Twitter: @vagelim
Email: evan@datadoghq.com
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Globus
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
Globus
 
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptxText-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
ShamsuddeenMuhammadA
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
Matt Welsh
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
Neo4j
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
NYGGS Automation Suite
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Globus
 

Recently uploaded (20)

AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
 
Top 7 Unique WhatsApp API Benefits | Saudi Arabia
Top 7 Unique WhatsApp API Benefits | Saudi ArabiaTop 7 Unique WhatsApp API Benefits | Saudi Arabia
Top 7 Unique WhatsApp API Benefits | Saudi Arabia
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
 
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptxText-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
 

Lifting the Blinds: Monitoring Windows Server 2012

  • 1. Lifting the Blinds: Monitoring Windows Server. Read the full guide at: http://www.datadoghq.com/blog/monitoring-windows-server/
  • 2. Datadog Overview: SaaS-based infrastructure and app monitoring; open source Agent; time series data (metrics and events); processing nearly a trillion data points per day; intelligent alerting and insightful dashboards.
  • 3. Monitor Everything: operating systems, cloud providers (AWS), containers, web servers, datastores, caches, queues, and more.
  • 4. Agenda: Why should I monitor Windows Server? What are some indicators of performance issues? How can I collect performance metrics for analysis?
  • 8. CPU metrics: PercentProcessorTime, ContextSwitchesPersec, ProcessorQueueLength, DPCsQueuedPersec, PercentPrivilegedTime, PercentDPCTime, PercentInterruptTime.
  • 9. CPU: ContextSwitchesPersec. What it tracks: number of times the processor switched to a new thread. Correlate with: Memory: PageFaultsPersec; Disk: DiskTransfersPersec; Network: BytesSentPersec/BytesReceivedPersec. Issue resolution: adding processors, thread partitioning, DPC partitioning, hardware interrupt partitioning, disabling I/O counters.
  • 10. CPU: PercentProcessorTime. What it tracks: percentage of time spent performing work (not idle). Correlate with: ProcessorQueueLength. Issue resolution: more processors, a bigger instance, optimizing the offending application.
  • 11. CPU: ProcessorQueueLength. What it tracks: size of the processor queue. Correlate with: CPU: PercentProcessorTime, PercentPrivilegedTime, PercentDPCTime, PercentInterruptTime. Issue resolution: adding processors, thread partitioning, DPC partitioning, hardware interrupt partitioning, disabling I/O counters.
  • 12. CPU: DPCsQueuedPersec. What it tracks: deferred procedure call (DPC) enqueue rate. Correlate with: CPU: PercentDPCTime; Disk: DiskTransfersPersec; Network: BytesSentPersec/BytesReceivedPersec. Issue resolution: remove the buggy device, roll back the driver.
  • 13. CPU: PercentPrivilegedTime/PercentDPCTime/PercentInterruptTime. What they track: percentage of time the CPU spent in privileged mode/deferred procedure calls/interrupts. Correlate with: ContextSwitchesPersec/PercentPrivilegedTime/PercentDPCTime/PercentInterruptTime. Issue resolution: adding processors, thread partitioning, DPC partitioning, hardware interrupt partitioning, disabling I/O counters.
  • 14. Memory metrics: PoolNonpagedBytes, PageFaultsPersec, PagesInputPersec.
  • 15. Memory: PoolNonpagedBytes. What it tracks: amount of nonpaged memory in use. Correlate with: Windows Event 2019 “Nonpaged Memory Pool Empty”. Issue resolution: identify the troublesome driver/roll back to a known good state.
  • 16. Memory: PageFaultsPersec. What it tracks: rate of page faults. Correlate with: PagesInputPersec. Issue resolution: increase system memory.
  • 17. Memory: PagesInputPersec. What it tracks: rate at which pages are read (from disk) into memory. Correlate with: PageFaultsPersec/DiskTransfersPersec. Issue resolution: increase system memory; move the page file to a separate physical disk.
  • 18. Disk metrics: AvgDiskQueueLength, DiskTransfersPersec, PercentIdleTime.
  • 19. Disk: AvgDiskQueueLength. What it tracks: running average of I/O operations in the queue. Correlate with: DiskTransfersPersec. Issue resolution: move data for I/O-intensive applications to a separate disk; add disks to the system.
  • 20. Disk: DiskTransfersPersec. What it tracks: aggregate I/O rate. Correlate with: AvgDiskQueueLength. Issue resolution: move data for I/O-intensive applications to a separate disk; add disks to the system; increase disk cache.
  • 21. Disk: PercentIdleTime. What it tracks: percent of time the disk is idle. Correlate with: AvgDiskQueueLength. Issue resolution: move the page file to a separate disk; add disks to the system; use SSDs.
  • 24. PowerShell: Windows’ scripting language (no more batch files!); a powerful language with deep OS support; integrates natively with C#; output is typed (unlike *NIX).
  • 28. Windows Performance Toolkit: requires the Windows Assessment and Deployment Kit (ADK), which now includes the toolkit: https://www.microsoft.com/en-US/download/details.aspx?id=39982
  • 30. Questions? Evan Mouzakitis, Research Engineer. Twitter: @vagelim. Email: evan@datadoghq.com. Read the full guide at: http://www.datadoghq.com/blog/monitoring-windows-server/
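The “Correlate with” pairings on slides 9 through 21 amount to a small lookup table, which can be handy when deciding which graphs to place side by side on a dashboard. Below is a minimal Python sketch of that table. The names CORRELATE_WITH and related_metrics are hypothetical illustrations, not part of Datadog, WMI, or any Windows API; the counter names are copied verbatim from the slides, and slide 13 is read as pairing each of the privileged/DPC/interrupt trio with ContextSwitchesPersec and the other two members of the trio.

```python
from typing import Dict, List

# "Correlate with" pairings transcribed from the CPU, memory, and disk slides.
# Keys and values are the counter names used on the slides.
CORRELATE_WITH: Dict[str, List[str]] = {
    "ContextSwitchesPersec": ["PageFaultsPersec", "DiskTransfersPersec",
                              "BytesSentPersec", "BytesReceivedPersec"],
    "PercentProcessorTime": ["ProcessorQueueLength"],
    "ProcessorQueueLength": ["PercentProcessorTime", "PercentPrivilegedTime",
                             "PercentDPCTime", "PercentInterruptTime"],
    "DPCsQueuedPersec": ["PercentDPCTime", "DiskTransfersPersec",
                         "BytesSentPersec", "BytesReceivedPersec"],
    "PoolNonpagedBytes": ["Windows Event 2019 (Nonpaged Memory Pool Empty)"],
    "PageFaultsPersec": ["PagesInputPersec"],
    "PagesInputPersec": ["PageFaultsPersec", "DiskTransfersPersec"],
    "AvgDiskQueueLength": ["DiskTransfersPersec"],
    "DiskTransfersPersec": ["AvgDiskQueueLength"],
    "PercentIdleTime": ["AvgDiskQueueLength"],
}

# Slide 13 (as interpreted here): each member of the trio is paired with
# ContextSwitchesPersec and with the other two members.
_TRIO = ["PercentPrivilegedTime", "PercentDPCTime", "PercentInterruptTime"]
for _name in _TRIO:
    CORRELATE_WITH[_name] = ["ContextSwitchesPersec"] + [m for m in _TRIO if m != _name]


def related_metrics(metric: str) -> List[str]:
    """Return the metric followed by the counters the slides pair with it."""
    return [metric] + CORRELATE_WITH.get(metric, [])


if __name__ == "__main__":
    # Graphs to group with processor queue length on one dashboard row.
    print(related_metrics("ProcessorQueueLength"))
```

The pairings are kept directional, exactly as the slides state them, rather than being made symmetric; a metric with no slide entry simply comes back on its own.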

Editor's Notes

  1. Our goal is to help you monitor everything at all levels of your stack so that you can make intelligent, data-based decisions about your applications and infrastructure.
  2. Why monitor Windows in the first place? Monitoring the performance of the applications that run your business is critical, but applications don’t live in a vacuum. They interact with the underlying operating system, often to request resources, preempt the execution of other processes, access hardware devices, and more. Knowing the health and performance of the operating system gives you more information when troubleshooting issues anywhere higher up in the stack (not to mention that monitoring the operating system is critical for insight into hardware issues). For example, is a SQL Server query slow because of the query itself, or because SQL Server is hosted alongside Exchange and the two are competing for disk access? Issues like these surface only when you monitor both the application in question and the underlying operating system.
  3. A monitoring plan typically covers work metrics, resource metrics, and non-metric data such as events or code changes. Because the operating system brokers access between applications and hardware, monitoring Windows Server is primarily about resource metrics: that is what the operating system manages. Work metrics usually apply more to application-level monitoring, but as you will see, we’ll cover a few work metrics related to disk access here too.
  4. What kind of resources are we interested in monitoring? What kinds of metrics can we surface from those resources? Generally speaking, the most useful resources to monitor are CPU, RAM, disk, and network. Things like power consumption, thermal monitoring, noise and data of a similar nature, while useful, don’t usually add meaningful context to application or operating system performance issues.
  5. At the highest level, the following metrics are useful for assessing CPU performance, and they can shed light on bottlenecks depending on the kind of work the CPU spends most of its time performing.
  6. ContextSwitchesPersec tracks the number of times the processor switched to a new execution context. Context switches are computationally expensive: before the processor can enter the execution context of another thread, it must save the current context, push the old context to the bottom of its priority queue, find the highest-priority queue containing an executable thread, pop that thread from its queue, load its context, and finally execute it. On a multi-core machine (common today), context switching adds significant overhead. A related tuning point concerns I/O counters: by default, Windows measures I/O per process, and attributing I/O to a particular process in a multi-core, multithreaded environment can have a drastic performance impact under heavy I/O loads. If that’s the case, you would benefit from disabling global and per-process I/O counters by adding a CountOperations entry as a REG_DWORD with a value of 0 to the registry under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\I/O System\
  7. PercentProcessorTime is a metric almost everyone is familiar with, even if they don’t know the name. It tracks the percentage of time the CPU was doing something other than idling. On its own, this metric isn’t all that useful. For example, if I’m analyzing data on a single-core machine, I’d expect the CPU to be in use 100% of the time. However, when you correlate it with ProcessorQueueLength, which tracks the number of pending threads, you have enough information to determine whether the system is suffering a CPU bottleneck. A queue length greater than 2 * the number of processors, coupled with prolonged periods of maxed-out CPU utilization, clearly indicates that the system does not have enough processor resources to perform all of its tasks.
  8. The processor queue length reflects the number of threads that are ready to run but cannot get onto the processor. As a rule of thumb, a healthy processor queue length stays at or below about 2 * the number of processors on the system. Even on multicore machines, there is only one ProcessorQueueLength performance counter. Sustained high values for this counter clearly indicate CPU contention. You can correlate this metric with other CPU metrics such as PercentProcessorTime, PercentPrivilegedTime, PercentDPCTime, and PercentInterruptTime to see where the CPU is spending its time and to confirm whether the CPU is the bottleneck behind the backed-up queue.
  9. Hardware devices need real-time, unfettered access to the CPU so that high-priority work (like accepting keyboard input) is performed when it is needed. Interrupts let a device interrupt the processor and force it to perform the requested operation (triggering a context switch). Some device work can be put off until later but must still be completed in a timely manner. Enter DPCs. Through DPCs, real-time code like device drivers can schedule lower-priority tasks to run after higher-priority interrupts are handled. DPCs are created by the kernel and can only be called by kernel-mode programs. A large or near-constant number of DPCs could point to issues with low-level system software; an unused but buggy sound driver could be the culprit, for example.
  10. Taken together, this trio of metrics helps shed light on where the CPU is spending its time. In particular, privileged time reflects the time spent executing instructions for kernel-mode programs. Code executing in privileged mode has unrestricted access to the system’s hardware; this includes device drivers, core operating system functions, and so on. If a system spends 30 percent or more of its time processing privileged instructions, check the values of PercentDPCTime and PercentInterruptTime. If either of those metrics reports a value greater than 20%, a poorly written device driver or a very busy peripheral is likely the culprit.
  11. As with CPU metrics, Windows exposes a wealth of performance counters tracking memory statistics. We’ve omitted AvailableMemory and similar metrics from this webinar because they are fairly self-explanatory. The three listed here, PageFaultsPersec, PoolNonpagedBytes, and PagesInputPersec, provide insight into the nature of issues that may be affecting performance. We’ll touch on each in turn, but at a high level: PageFaultsPersec tracks the rate of page faults, PoolNonpagedBytes reports the current size of non-pageable memory, and PagesInputPersec reports the rate at which pages are read from disk (which is distinct from the number of read operations, since a single read can bring in several pages).
  12. Windows maintains two general pools of kernel memory: a paged pool and a nonpaged pool. Memory in the paged pool, like the ordinary memory of user space applications, can be written out to disk and must be read back in (paged in) when it is needed again. That is acceptable for code that tolerates latency, that is, code without real-time requirements. Because kernel-level software such as device drivers has real-time execution requirements, it makes use of the nonpaged pool. The nonpaged pool is guaranteed to reside in physical memory at all times, with no possibility of being paged to disk (hence the name “nonpaged”). This significantly reduces latency by ruling out page faults. No memory pool is infinite, and poorly written device drivers could exhaust the entire nonpaged pool if left unchecked. If you are seeing reports of Event 2019, it’s already too late. Keeping an eye on the size of this pool and its growth over time is necessary to identify and deal with any troublesome drivers or hardware.
  13. Page faults occur when a thread references a page that is not in its current set of memory-resident pages. Because the thread can’t do its work without the requested memory, a page fault exception is raised and the processor enters kernel mode (resulting in a context switch both upon entering and upon exiting kernel mode), where the kernel attempts to locate the page. If the page is found elsewhere in memory, that address is returned to the requesting thread; this is called a “soft” page fault. If the page is not elsewhere in memory, the kernel looks in the page file and reads the page into memory; this is called a “hard” page fault. Because this operation requires accessing the disk, it is considerably more expensive. Page faults occur under normal operating conditions, but a spike in page faults could cause serious performance degradation, depending on the “hardness” of the faults. By tracking the page fault rate alongside the page input rate, you can differentiate between hard and soft page faults: high values of both metrics indicate hard page faults. There’s not much you can do to prevent soft page faults, but increasing the RAM available on the system is a straightforward way to alleviate hard page faults. It is worth mentioning that when a hard page fault does occur, Windows attempts to read multiple contiguous pages into memory to maximize the work performed by each read. This, in turn, can increase a page fault’s performance impact, as more disk bandwidth is consumed reading in potentially unneeded pages. All of this can potentially be avoided by putting your page file on a separate physical (not logical) disk, or by increasing the amount of RAM available to your system.
  14. As I mentioned, there are two types of page faults, and tracking PagesInputPersec alongside PageFaultsPersec gives you the information you need to tell which type is occurring. If you are seeing high values of both metrics, the page faults are hard. The effects of hard page faults are exacerbated when disk is a contended resource. To give a simplified example, if you have a system with one disk running an I/O-intensive application, page faults will hit that system harder (and the application’s performance will degrade) because Windows is competing with the application for disk access (and Windows always wins). This shows that an excessive number of page faults can be responsible for system-wide effects whose cause lies outside the application experiencing the performance degradation.
  15. Though there are many disk metrics worth tracking, I’ve distilled the list to the most essential, while omitting the obvious, like PercentFreeSpace.
  16. The AvgDiskQueueLength counter gives an estimated average of the number of I/O operations currently awaiting execution. Generally speaking, this counter should not exceed 2 * the number of drives on the system. If you are seeing greater values than that, it means the system cannot service the number of I/O requests it’s receiving in a timely manner, which can lead to processing delays, degraded application performance, and more.
  17. DiskTransfersPersec is an aggregate measure of both disk reads and writes, and it is useful for shedding light on the cause of bottlenecks. High values for this metric do not always indicate issues; for example, if you are running I/O-intensive applications on your server, you are definitely going to observe high values for this metric (and most likely low values for PercentIdleTime). However, if I/O operations are not being enqueued (per the AvgDiskQueueLength metric) and applications are not short of memory (and thus paging to disk), there should be no observable performance impact.
  18. PercentIdleTime is an intuitive metric that tracks the percent of time disks are idle. Depending on the role of the system under investigation, low idle times may be expected, especially when running I/O-intensive applications like SQL Server or Exchange. If that’s not the case, low values should be investigated. If you don’t already have your page file stored on a separate drive, move it there. Otherwise, consider adding disks to the system to increase performance, or swapping HDDs for SSDs if possible.
  19. Windows offers numerous methods by which you can collect, store, and visualize system performance data. Because the methods are so varied, I will only go through a couple of the tools that I have experience with. All of the tools mentioned are native to Windows Server 2012 R2 so you can get up and running quickly.
  20. Reading performance counters does not generally appear to have much of an impact on system performance. In my tests, collecting 2631 counters at a 1-second sample rate caused a 4 percent increase in user CPU usage (by perfmon). There are a few things to keep in mind, though. Depending on the data collected and the duration of the collection, the collected data could be very large: in a test collecting handle and kernel base events, page faults, CPU, I/O, and memory samples, the data grew at a rate approaching 100 MB/min. Additionally, if you are collecting data on your local machine, you may see occasional spikes in I/O latency; in my tests I observed response times for some user space applications in excess of 2000 ms! I also did not attempt to collect performance counters from user applications, which may have an impact on the application’s performance. And as I mentioned earlier in the CPU section, if you are sampling I/O with processor-specific information, you will almost certainly observe performance degradation.
  21. PowerShell is great for collecting performance counters programmatically. You can also query the event log from PowerShell, and you can use it to collect metrics from both local and remote machines.
  22. Here are some example PowerShell commands for retrieving CPU-related performance counters. As you can see, they follow a regular pattern. For a full list of commands to retrieve performance counters for CPU, memory, disk, network, and events, check out my “How to collect Windows Server 2012 metrics” article on the Datadog blog: https://www.datadoghq.com/blog/collect-windows-server-2012-metrics/#toc-powershell
  23. One last thing about PowerShell: if there’s no pre-packaged cmdlet that gets you what you want, you can always interface with WMI to get what you’re looking for.
  24. In my opinion, perfmon is not nearly as useful as xperf or Windows Performance Recorder for investigating performance issues. It is a good tool for spotting issues, but not so good for getting into the nitty-gritty. Here’s a screenshot of perfmon collecting the “System Performance” counter set, which is provided out of the box. As you can see, there is a lot going on. My investigation focused on the cause of excessive memory use, visualized as the black bar nearly pinned to the 100 mark. From this image it’s clear that something is going on, but since I was only collecting total memory usage (as opposed to per-process usage), it isn’t clear which process is exhausting RAM. Determining the underlying cause in this case requires re-running perfmon, this time collecting per-process counters in addition to the total, and hoping that the issue arises again. As you’re about to see, we can do better.
  25. The Windows Performance Toolkit contains Windows Performance Recorder (WPR) and Windows Performance Analyzer (WPA). Though not strictly “native”, since it requires a download, it is a useful graphical tool made by Microsoft for collecting and analyzing Windows performance data.
  26. Windows Performance Recorder is a modern replacement for xperf. It features both graphical and command-line interfaces. Here you can see the available collection profiles. Collecting data with Windows Performance Recorder is as easy as clicking “Start”. Technically, Windows Performance Recorder (and xperf) do not merely collect performance counters; they are tracing tools for collecting fine-grained performance data. As you will see, traces are superior to performance counters when investigating performance issues.
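Several of the notes give numeric rules of thumb: a processor queue longer than 2 * the number of processors during sustained maxed-out CPU (notes 7 and 8), 30 percent or more privileged time with DPC or interrupt time above 20% (note 10), and an average disk queue longer than 2 * the number of drives (note 16). The Python sketch below applies those checks to counter samples you have already collected (for example with perfmon or PowerShell). It is an illustration under stated assumptions: the Sample record and the function names are hypothetical, and the 95% cutoff standing in for “maxed out” CPU is an assumed value, not one given in the talk.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class Sample:
    """Counter values for one sampling interval (hypothetical layout)."""
    percent_processor_time: float
    processor_queue_length: float
    percent_privileged_time: float
    percent_dpc_time: float
    percent_interrupt_time: float
    avg_disk_queue_length: float


def cpu_bottleneck(samples: List[Sample], num_processors: int,
                   busy_pct: float = 95.0) -> bool:
    """True if CPU stays at or above busy_pct for the whole window AND the
    average processor queue exceeds 2 * processors (notes 7 and 8).
    busy_pct is an assumed cutoff for "maxed out"."""
    if not samples:
        return False
    pinned = all(s.percent_processor_time >= busy_pct for s in samples)
    queued = mean(s.processor_queue_length for s in samples) > 2 * num_processors
    return pinned and queued


def suspect_driver(sample: Sample) -> bool:
    """30%+ privileged time with DPC or interrupt time above 20% suggests a
    poorly written driver or a very busy peripheral (note 10)."""
    return sample.percent_privileged_time >= 30 and (
        sample.percent_dpc_time > 20 or sample.percent_interrupt_time > 20)


def disk_backlog(samples: List[Sample], num_drives: int) -> bool:
    """Average disk queue above 2 * drives means I/O requests are not being
    serviced in a timely manner (note 16)."""
    if not samples:
        return False
    return mean(s.avg_disk_queue_length for s in samples) > 2 * num_drives


if __name__ == "__main__":
    window = [Sample(99, 10, 35, 22, 3, 1.5), Sample(98, 12, 33, 25, 2, 2.5)]
    print(cpu_bottleneck(window, num_processors=4))  # True: avg queue 11 > 8
    print(suspect_driver(window[0]))                 # True: 35% privileged, 22% DPC
    print(disk_backlog(window, num_drives=2))        # False: avg queue 2.0 <= 4
```

Checks like these only flag symptoms; as the notes stress throughout, the next step is correlating a flagged counter with its related metrics to find the actual cause.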