Recently I needed to identify potential memory leaks in a Windows application running in an AWS environment. I used CloudWatch metrics to store this information over time so that business leaders could track the performance of our web server application. Below I explain how I set up the CloudWatch agent on our Windows servers and how I configured it to monitor specific performance counters that might indicate application memory leaks.
Your EC2 instance must have an IAM role attached that includes the AWS managed policy 'CloudWatchAgentServerPolicy'. Instructions on how to do that can be found here.
CloudWatch Agent Configuration
Open your File Explorer and navigate to 'C:\ProgramData\Amazon\AmazonCloudWatchAgent\Configs'.
Save the below JSON to a file called 'config.json'.
This will monitor the Chrome process and publish the metrics to the CWAgent namespace in us-east-1. Edit agent.region in config.json to match the region in which your EC2 instance is located, for example us-west-2 or eu-west-1.
Copy the 'config.json' from the previous step into 'C:\ProgramData\Amazon\AmazonCloudWatchAgent\Configs'.
Start the CloudWatch Agent Service.
In a few minutes you should start seeing metrics show up in CloudWatch under the CWAgent namespace in us-east-1.
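To make the configuration step more concrete, here is a minimal Python sketch that builds a config.json from the counters discussed in this post. It is my reconstruction under assumptions, not the exact file I deployed: the function name build_agent_config is hypothetical, the "resources" wildcards are assumed, and you should check that the procstat measurements are supported on your agent version before relying on them.

```python
import json


def build_agent_config(region="us-east-1", process_name="chrome"):
    """Return a CloudWatch agent config dict (assumed layout, see lead-in)."""
    return {
        # agent.region must match the region of the EC2 instance.
        "agent": {"region": region},
        "metrics": {
            "namespace": "CWAgent",
            "metrics_collected": {
                # Windows performance counter objects and their counters.
                "Memory": {"measurement": ["Pages/sec"]},
                "LogicalDisk": {
                    "measurement": ["Avg. Disk sec/Transfer"],
                    "resources": ["*"],  # assumption: all logical disks
                },
                "Paging File": {
                    "measurement": ["% Usage"],
                    "resources": ["*"],  # assumption: all paging files
                },
                "Process": {
                    "measurement": [
                        "Page File Bytes",
                        "Pool Nonpaged Bytes",
                        "Pool Paged Bytes",
                        "Private Bytes",
                        "Thread Count",
                    ],
                    "resources": ["*"],  # assumption: every process instance
                },
                # Per-process metrics for the monitored executable.
                "procstat": [
                    {
                        "exe": process_name,
                        "measurement": ["cpu_time", "memory_rss", "memory_vms"],
                    }
                ],
            },
        },
    }


if __name__ == "__main__":
    # Print the JSON so it can be saved as config.json.
    print(json.dumps(build_agent_config(), indent=2))
```

Redirecting the output to a file gives you something to start from. Change the region and process_name arguments to match your own instance and application.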
Below I will outline why we are logging specific metrics:
Pages/sec - Pages/sec is the rate at which pages are read from or written to disk to resolve hard page faults.
We are logging the memory counter "Pages/sec" and the LogicalDisk counter "Avg. Disk sec/Transfer" because if the product of these counters exceeds 0.1, paging is taking more than 10 percent of disk access time. That can cause problems and can indicate some form of a memory leak.
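The rule of thumb above is just a multiplication, so here is a small Python sketch of it. The function names and sample readings are hypothetical. In practice the two inputs would be values of "Pages/sec" and "Avg. Disk sec/Transfer" taken over the same time window.

```python
def paging_disk_fraction(pages_per_sec, avg_disk_sec_per_transfer):
    """Approximate fraction of disk access time spent on paging."""
    return pages_per_sec * avg_disk_sec_per_transfer


def paging_is_excessive(pages_per_sec, avg_disk_sec_per_transfer, threshold=0.1):
    """True when paging takes more than `threshold` (10% by default) of disk time."""
    return paging_disk_fraction(pages_per_sec, avg_disk_sec_per_transfer) > threshold


if __name__ == "__main__":
    # Hypothetical readings: 40 pages/sec at 4 ms per transfer is about 16%.
    print(paging_is_excessive(40, 0.004))
    # Hypothetical readings: 10 pages/sec at 4 ms per transfer is about 4%.
    print(paging_is_excessive(10, 0.004))
```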
% Paging File Usage - Displays the percentage of the paging file that is currently in use. If the counter shows the paging file at or near 100% usage, the system and its applications will not be able to function properly and the machine will slow down noticeably. You want your paging file to be large enough that, at any given time, at most 50% to 75% of it is in use, although even lower numbers are preferred.
We are logging "Paging File\% Usage" and "Paging File\%" because a gradual increase in either of these values over time indicates a memory leak.
Page File Bytes - The amount of data (in bytes) stored in virtual memory which the process has reserved for use in the paging file(s). An increase in the page file over time indicates a memory leak.
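One way to turn "an increase over time" into a number is to fit a straight line through the samples and look at its slope. The sketch below does this with an ordinary least-squares fit. The function name growth_rate and the sample values are hypothetical, and this is only one possible approach, not something CloudWatch does for you.

```python
def growth_rate(samples):
    """Least-squares slope of (timestamp_seconds, value) samples.

    Returns the average change in value per second. A slope that stays
    positive across long windows suggests steady growth.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    numerator = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    denominator = sum((t - mean_t) ** 2 for t, _ in samples)
    if denominator == 0:
        raise ValueError("all samples share the same timestamp")
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical Page File Bytes readings taken one minute apart.
    readings = [(0, 100_000_000), (60, 100_600_000), (120, 101_200_000)]
    print(growth_rate(readings), "bytes per second")
```

A single positive slope is not proof of a leak, since normal load changes also move these counters. Growth that never levels off across days of samples is the more telling sign.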
Process Pool Paged Bytes - Virtual memory that can be paged in and out of the system.
Process Pool Nonpaged Bytes - Virtual memory addresses that reside in physical memory as long as the corresponding kernel objects are allocated.
If we do have a memory leak, we can identify which process is leaking, and when the leak started, by monitoring the following counters: "Process\Page File Bytes", "Process\Pool Nonpaged Bytes", "Process\Pool Paged Bytes", "Process\Private Bytes", and "Process\Thread Count".
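To narrow a leak down to a process, I find it useful to compare how much each process has grown over the same window. Here is a minimal Python sketch, assuming you have already exported one counter (for example "Process\Private Bytes") per process as a list of readings, oldest first. The function name rank_by_growth, the process names, and the numbers are all hypothetical.

```python
def rank_by_growth(series_by_process):
    """Rank processes by percentage growth between first and last reading.

    series_by_process maps a process name to its readings, oldest first.
    Processes with fewer than two readings or a non-positive first reading
    are skipped because a percentage is not meaningful for them.
    """
    ranked = []
    for name, readings in series_by_process.items():
        if len(readings) < 2 or readings[0] <= 0:
            continue
        growth_pct = (readings[-1] - readings[0]) / readings[0] * 100.0
        ranked.append((name, growth_pct))
    # Largest growth first: the most likely leak candidates lead the list.
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


if __name__ == "__main__":
    # Hypothetical Private Bytes readings for three processes.
    private_bytes = {
        "w3wp": [100, 150, 200],
        "chrome": [100, 101, 100],
        "idle": [0, 0],
    }
    for name, pct in rank_by_growth(private_bytes):
        print(f"{name}: {pct:.1f}% growth")
```

Running the same ranking over successive windows also hints at when the growth started, because the leaking process moves to the top of the list from that window onward.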
cpu_time - The amount of time that the process uses the CPU. This metric is measured in hundredths of a second.
memory_rss - The amount of real memory (resident set) that the process is using.
memory_vms - The amount of virtual memory that the process is using.