Monitoring links and devices using Nagios, SNMP and shell scripting
Monitoring and controlling systems is a day-to-day activity for a Systems Administrator. I've been doing this in some capacity since late 1996 and am familiar with the following tools.
I use Nagios extensively for exception reporting of items requiring immediate attention.
For looking at trends over the last day or last year, it's useful to feed data into a Round Robin Database (RRD). I've used this to spot trends in hard drive temperatures, RAID array throughput and IOPS, network traffic flows and CPU utilisation. Seeing these values graphed gives a very good indication of long-term trends that need to be planned for and, in the shorter term, a very good out-of-band warning of a sudden change in trend that may need further investigation.
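As an illustration of feeding readings into an RRD, here is a minimal sketch using rrdtool. The file path, the data source name "temp", the 0-100 range and the 5-minute step are all assumptions chosen for the example, not details of my actual setup. It keeps 5-minute samples for one day and daily averages for one year.

```shell
#!/usr/bin/env bash
# Sketch only: path, data source name and ranges are illustrative assumptions.

RRD_FILE="${RRD_FILE:-/tmp/hdd_temp.rrd}"

# Create the database: one GAUGE sampled every 300 s, stored at full
# resolution for a day (288 x 5 min) and as daily averages for a year
# (365 rows, each consolidating 288 samples).
rrd_create() {
    rrdtool create "$RRD_FILE" --step 300 \
        DS:temp:GAUGE:600:0:100 \
        RRA:AVERAGE:0.5:1:288 \
        RRA:AVERAGE:0.5:288:365
}

# Build the "N:<value>" argument for rrdtool update ("N" means "now").
# Non-numeric input is rejected so bad readings never reach the database.
rrd_update_arg() {
    local value="$1"
    local re='^-?[0-9]+(\.[0-9]+)?$'
    if [[ "$value" =~ $re ]]; then
        printf 'N:%s\n' "$value"
    else
        echo "not a number: $value" >&2
        return 1
    fi
}

# Store one reading, creating the database on first use.
rrd_store() {
    local arg
    arg="$(rrd_update_arg "$1")" || return 1
    [ -f "$RRD_FILE" ] || rrd_create || return 1
    rrdtool update "$RRD_FILE" "$arg"
}
```

Run from cron every five minutes, something like `rrd_store 41` would record a reading of 41, after which `rrdtool graph` can draw the daily and yearly views.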
Whilst Nagios notifications via email are very useful, some notifications are critical, such as when a server has failed completely. In this case I've implemented a bash solution using a Linux SIP client to place a call to a mobile phone, so that someone is immediately alerted to investigate further.
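The escalation logic can be sketched roughly as follows. This is not the script I actually run: SIP_CALL_CMD is a hypothetical placeholder for whichever command-line SIP client is installed, and it defaults to echo so the sketch is a harmless dry run. The arguments are assumed to come from the standard Nagios macros $NOTIFICATIONTYPE$, $HOSTSTATE$ and $HOSTNAME$ in a notification command definition.

```shell
#!/usr/bin/env bash
# Sketch only: SIP_CALL_CMD is a hypothetical wrapper around a SIP client.

# Succeed only for a new problem on a host that is completely down.
needs_phone_call() {
    local notification_type="$1" host_state="$2"
    [ "$notification_type" = "PROBLEM" ] && [ "$host_state" = "DOWN" ]
}

# Place the call (or just print what would be sent when no client is set).
escalate() {
    local notification_type="$1" host_state="$2" host_name="$3" number="$4"
    if needs_phone_call "$notification_type" "$host_state"; then
        "${SIP_CALL_CMD:-echo}" "$number" "$host_name is $host_state"
    fi
}
```

Recoveries, acknowledgements and unreachable hosts are deliberately left to email, so the phone only rings for a host that Nagios considers down.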
Dell PowerEdge R620: monitoring the Power Supply (PSU) with SNMP
[root@localhost srvadmin]# snmpwalk -v 1 -c public localhost 1.3.6.1.4.1.674.10892.1.600.12.1.5.1.1
SNMPv2-SMI::enterprises.674.10892.1.600.12.1.5.1.1 = INTEGER: 3
[root@localhost srvadmin]# snmpwalk -v 1 -c public localhost 1.3.6.1.4.1.674.10892.1.600.12.1.5.1.1
SNMPv2-SMI::enterprises.674.10892.1.600.12.1.5.1.1 = INTEGER: 5
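To turn a reading like this into a Nagios check, the integer has to be mapped to a plugin exit code. The sketch below assumes the value follows Dell's usual status enumeration (3 = ok, 4 = non-critical, 5 = critical, 6 = non-recoverable), which is worth confirming against the Dell MIB for the firmware in use. The function names are my own for this example, and the OID is passed in rather than hard-coded.

```shell
#!/usr/bin/env bash
# Sketch only: the status mapping is assumed from Dell's status enumeration.

# Translate a Dell status integer into Nagios plugin output and exit code
# (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN).
dell_status_to_nagios() {
    case "$1" in
        3) echo "OK - PSU status ok"; return 0 ;;
        4) echo "WARNING - PSU status non-critical"; return 1 ;;
        5) echo "CRITICAL - PSU status critical"; return 2 ;;
        6) echo "CRITICAL - PSU status non-recoverable"; return 2 ;;
        *) echo "UNKNOWN - PSU status value '$1'"; return 3 ;;
    esac
}

# Query one OID and report it. -Oqve prints only the value, with
# enumerations shown as numbers even when the MIB is loaded.
check_psu() {
    local host="$1" community="$2" oid="$3" value
    value="$(snmpget -v 1 -c "$community" -Oqve "$host" "$oid" 2>/dev/null)" || value=""
    dell_status_to_nagios "$value"
}
```

With the OID from the session above, `check_psu localhost public .1.3.6.1.4.1.674.10892.1.600.12.1.5.1.1` would report OK for the first reading (3) and CRITICAL for the second (5). A failed or empty SNMP query falls through to UNKNOWN.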
Happy to work as an Employee, Consultant, Contractor / Self Employed or via Limited Company