Paul Sullivan

Systems Monitoring: It starts with a desire to know when systems are in an unusable state without having to hear it from users or clients, but you can head down a rabbit hole that you never return from.

Nagios monitoring and exception reporting on a myriad of systems and their variables

The most basic setup

Every time I install Nagios in a new environment (3 times so far) it begins as an effort to know the most basic information and fulfil a best practice requirement.  A basic howto covers "is this server and some of its services UP".  A setup that stops there provides a false sense of security.

Ah, actually UP doesn't mean "OK"

Almost as soon as the first fault slips past Nagios (say, an email is not delivered), someone realises that monitoring a host with ICMP or a service with a TCP connect is not sufficient: both checks can pass while the service is failing its users.  Users don't care that the service is there, they care that it works.
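
To make the difference between UP and OK concrete, here is a minimal sketch of a check that goes one step beyond a TCP connect: it reads the SMTP greeting and times the response, then maps the result onto the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function names, thresholds and message wording are my own assumptions for illustration, not part of any stock plugin, and a true end-to-end test would go further still by sending a message and confirming that it arrives.

```python
import socket
import sys
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(banner, elapsed, warn_secs=2.0, crit_secs=5.0):
    """Map an SMTP greeting and response time to (exit_code, message).

    The thresholds are illustrative defaults, not recommendations.
    """
    if not banner.startswith("220"):
        # The port answered, but the service did not greet us properly.
        return CRITICAL, "SMTP CRITICAL - unexpected greeting: %r" % banner[:40]
    if elapsed >= crit_secs:
        return CRITICAL, "SMTP CRITICAL - greeting took %.2fs" % elapsed
    if elapsed >= warn_secs:
        return WARNING, "SMTP WARNING - greeting took %.2fs" % elapsed
    return OK, "SMTP OK - greeting in %.2fs" % elapsed


def check_smtp_greeting(host, port=25, timeout=10.0):
    """Connect, read the greeting line, and classify the result."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            banner = sock.recv(512).decode("ascii", "replace").strip()
    except OSError as exc:
        # Refused connections, timeouts and DNS failures all land here.
        return CRITICAL, "SMTP CRITICAL - %s" % exc
    return classify(banner, time.monotonic() - start)


def main(argv):
    """Print one status line and return the exit code Nagios expects."""
    if len(argv) != 2:
        print("UNKNOWN - usage: %s <host>" % argv[0])
        return UNKNOWN
    code, message = check_smtp_greeting(argv[1])
    print(message)
    return code
```

To run it as a plugin, end the file with `sys.exit(main(sys.argv))` and point a Nagios command definition at the script. A server that accepts the connection but answers with a 554 greeting, or takes several seconds to greet, would pass a plain TCP connect check yet be reported as a problem here.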

Beyond TCP Connect AKA check_tcp

Checking through the service end to end

These are my notes on Nagios (or general system monitoring) best practice, with specific examples.

Originally from East London in South Africa, I recently moved to Dorking, Surrey, having previously lived in the London Borough of Sutton and in Cape Town.