
CloudWatch Monitor EC2 Linux services and set alarms

Prerequisite

Set up a service to monitor
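If you do not already have a service to watch, Apache works well for following along. A minimal way to stand it up, assuming a yum-based distro such as Amazon Linux:

sudo yum install -y httpd
sudo systemctl enable --now httpd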

At the moment CloudWatch does not monitor services the way we would normally expect a monitoring system with an agent to: is the service up or down? So we need to create the metric ourselves.

Systemctl can help us there with:

systemctl -q is-active httpd.service

is-active PATTERN Check whether any of the specified units are active (i.e. running). Returns an exit code of 0 if at least one is active, or non-zero otherwise. Unless --quiet is specified, this will also print the current unit state to standard output.

Appending && echo "Running:1" || echo "Warning:0" uses the shell's AND and OR operators: if systemctl exits with code 0 (the service is active), the && branch echoes "Running:1"; otherwise the || branch echoes "Warning:0", indicating the service is down.
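You can see the exit code behaviour for yourself before wiring anything up:

systemctl -q is-active httpd.service; echo $?

This prints 0 when httpd is active and a non-zero value when it is stopped or failed.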

Adding the metric to a log file

Now we want the metric written to a log file that CloudWatch can pull in. In this example I'm using the Apache package httpd. Change the log file folder and .log file name to whatever service you are wanting to monitor.

mkdir -p /var/log/services/ && touch /var/log/services/httpd_service.log

Now the command will look like this.

Remember to change the service folder and log name to match whichever service you are using.

systemctl -q is-active httpd.service && echo "Running:1" > /var/log/services/httpd_service.log || echo "Warning:0" > /var/log/services/httpd_service.log

Let's check; we should see either a 1 (Running) or a 0 (Warning):

cat /var/log/services/httpd_service.log

If nothing is showing, go back and check that the directory exists and that the path in the command is correct.
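A couple of quick checks can narrow it down:

# Confirm the directory and log file exist
ls -l /var/log/services/
# Re-run the check without -q so systemctl prints the unit state
systemctl is-active httpd.service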

Adding this to CRON

Now that we have data going to the log file, we need to automate it with cron.

As root

crontab -e

and add the following:

*/1 * * * * systemctl -q is-active httpd.service && echo "Running:1" >> /var/log/services/httpd_service.log || echo "Warning:0" >> /var/log/services/httpd_service.log

This will run every minute for testing purposes, but feel free to adjust it to what you need.

Do not forget to change the log file path; the above example is for httpd.
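To confirm the job was saved and entries keep arriving, you can list the crontab and watch the log (using the example paths above):

crontab -l
tail -f /var/log/services/httpd_service.log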

Edit the CloudWatch agent config

Now we need to configure the CloudWatch agent. Edit the CloudWatch agent configuration file.

vim /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent.json

This is an example that pulls in the log file only:

{
    "agent": {
        "logfile": "/opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log",
        "metrics_collection_interval": 60,
        "run_as_user": "root",
        "debug": false
    },
    "logs": {
        "logs_collected": {
            "files": {
                "collect_list": [
                    {
                        "file_path": "/var/log/services/httpd_service.log",
                        "log_group_name": "/ec2/CloudWatchAgentServiceLog/",
                        "log_stream_name": "{instance_id}_{hostname}",
                        "timezone": "Local"
                    }
                ]
            }
        }
    }
}
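Before loading the file, it is worth sanity-checking the JSON syntax. One quick way, assuming Python is available on the instance:

python3 -m json.tool /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent.json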

Now tell the agent to use the new config and restart the agent.

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent.json

Wait a minute or so and you should now see logs being pulled into CloudWatch.

If there are errors, they will show in the output of the previous command.

You can check the agent's status with:

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a status
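If the status is not what you expect, the agent's own log file (the path set in the config above) is the first place to look:

sudo tail -n 50 /opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log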

Set up logrotate

Now that we have logs being created and ingested into CloudWatch, we need to do some housekeeping on the system with logrotate.

Create a logrotate config that covers not just this one service, but any others we may add in the future.

touch /etc/logrotate.d/monitored_services

Now let's edit the config:

vim /etc/logrotate.d/monitored_services

/var/log/services/*log {
    weekly
    missingok
    copytruncate
    rotate 12
    compress
    delaycompress
    notifempty
}

This is a good starting point: it will keep 12 weeks (3 months) of logs and compress them. These rules apply to everything in /var/log/services/. If you need to isolate a service with different rules, just move its log into its own directory, such as /var/log/services/httpd/.
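You can dry-run the new rules to make sure logrotate picks up the file before the first real rotation:

sudo logrotate -d /etc/logrotate.d/monitored_services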

Further reading

I would highly recommend going through the docs for “Manually create or edit the CloudWatch agent configuration file”.

Setting up the CloudWatch Metric and Alarm

Metric

OK, so now we want alerts from CloudWatch if our service goes down. We need to create a metric filter in CloudWatch and then create an alarm based on the metric it produces.

In the AWS console, navigate to

CloudWatch > CloudWatch Logs> Log groups > NAMEOFLOG

If you are following along with the example, the log group name would be /ec2/CloudWatchAgentServiceLog/.

Then under the Actions menu select Create metric filter

Now we need to fill out the Define pattern step. For Filter pattern, enter Warning.

This is the keyword written to our log file when a service goes down.

Click Next.

  • Filter name enter Apache down
  • Metric namespace enter Services

We can add other services to this namespace later; it's just a grouping.

  • Metric name enter httpd down
  • Metric value enter 1
  • Default value – optional enter 0

This will allow us to see when the service goes down on a line graph, with a 1 indicating the service is down and a 0 indicating normal operation.

Click Next.

Review and confirm
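If you prefer to script this step instead of clicking through the console, the same filter can be created with the AWS CLI; this sketch uses the example names above, so adjust them to your own:

aws logs put-metric-filter \
  --log-group-name "/ec2/CloudWatchAgentServiceLog/" \
  --filter-name "Apache down" \
  --filter-pattern "Warning" \
  --metric-transformations '[{"metricName":"httpd down","metricNamespace":"Services","metricValue":"1","defaultValue":0}]'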

Now when you look at the metric httpd down, you will see when the service has gone down. Set your Period to 1 min and Statistic to Max in a line graph view.

Alarm

To create our alarm we need to navigate to All alarms on the left hand side panel, then click the orange Create alarm.

On the new page select Select metric.

Now select the namespace we created earlier, Services, then select Metrics with no dimensions and pick httpd down.

The metric name will be whatever you named your metric earlier.

Specify metric and conditions

Now we need to set the conditions for the alarm. You can also change the metric name here if you want to.

  • Statistic select Maximum

    You want this to be Maximum, so the value is either a 1 or a 0: the service is either up or down.

  • Period select 1 minute

    Change this to meet your needs

  • Threshold type select Static
  • Define the alarm condition select Greater/Equal
  • Define the threshold value enter 1

Click Next

Configure actions

  • Alarm state trigger set to In alarm
  • Send a notification to the following SNS topic

    If you have not already set up an SNS topic, you will need to create a new one.

  • You can also create an OK alert by clicking Add notification and adding the same SNS topic you created. You will then get an alert both when the service goes down and when it comes back up.

Click Next

Add name and description

  • Give the alarm a name and description.

Click Next

Preview and create

Review the alarm settings and click Create alarm.
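As with the metric filter, the whole alarm can also be created from the CLI. This sketch mirrors the console choices above; the SNS topic ARN is a placeholder you would replace with your own:

aws cloudwatch put-metric-alarm \
  --alarm-name "httpd down" \
  --alarm-description "Apache service down on the instance" \
  --namespace Services \
  --metric-name "httpd down" \
  --statistic Maximum \
  --period 60 \
  --evaluation-periods 1 \
  --threshold 1 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --alarm-actions arn:aws:sns:eu-west-1:123456789012:your-topic \
  --ok-actions arn:aws:sns:eu-west-1:123456789012:your-topic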

This post is licensed under CC BY 4.0 by the author.