Here's where Nagios comes in.
Nagios is a system that lets you monitor your site and send alerts for site-related issues. It's highly flexible: you can write the checks on your site's health in any language. For the purposes of this article, though, we're going to talk about a powerful feature that probably doesn't get as much use as it should: support for event handlers.
So, what exactly is an event handler? An event handler is a command that gets run whenever the state of a service changes. A change can be a switch between any of the states OK, WARNING, CRITICAL, and UNKNOWN, or a change in substate. By substate, I mean the SOFT and HARD state types, as well as each increment of the check attempt counter while the service is in a problem state.
While this does add complexity to the options, it also gives you the ability to fine-tune when your response commands get run. Let's look at an example of an event handler script:
#!/bin/sh
# Define the Nagios command as:
# restart_service.sh $HOSTNAME$ $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ $ARG1$
case "$2" in
WARNING)
    # The service has entered the WARNING state.
    # Only take action on the 20th check attempt.
    case "$4" in
    20)
        # Stop and start the init.d service on the remote host.
        ssh "$1" "/etc/init.d/$5 stop ; sleep 2 ; /etc/init.d/$5 start"
        ;;
    esac
    ;;
esac
As you can see, this command takes five arguments. They are:
HOSTNAME - the hostname where the service is running
SERVICESTATE - one of OK, WARNING, CRITICAL, or UNKNOWN
SERVICESTATETYPE - the state type, either SOFT or HARD (passed in, but not used by this example)
SERVICEATTEMPT - the number of the check attempt we are on
ARG1 - the name of the Linux init.d service that we want to restart
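Before hooking a handler into Nagios, it can help to run the same logic by hand with the five arguments listed above. What follows is only a minimal sketch, not part of the original script: it wraps the logic in a hypothetical handle_event function and adds an assumed DRY_RUN variable so that the ssh command is printed rather than executed. The hostname web01.example.com is a placeholder.

```shell
#!/bin/sh
# Hypothetical dry-run harness around the event handler logic.
# DRY_RUN is an assumption of this sketch, not a Nagios feature:
# unless it is set to 0, the remote command is printed, not run.
handle_event() {
    host=$1
    state=$2
    state_type=$3   # accepted for parity with the Nagios command line; unused here
    attempt=$4
    service=$5

    case "$state" in
    WARNING)
        case "$attempt" in
        20)
            cmd="/etc/init.d/$service stop ; sleep 2 ; /etc/init.d/$service start"
            if [ "${DRY_RUN:-1}" = "1" ]; then
                echo "would run on $host: $cmd"
            else
                ssh "$host" "$cmd"
            fi
            ;;
        esac
        ;;
    esac
}

# 20th WARNING attempt: the restart command is reported.
handle_event web01.example.com WARNING SOFT 20 httpd
# Any other attempt number: nothing happens.
handle_event web01.example.com WARNING SOFT 3 httpd
```

Running it this way makes it easy to confirm which combinations of state and attempt number trigger a restart before any real service is touched.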
Go ahead and set up the event handler command as suggested in the comments:
define command {
    command_name restart_service
    command_line $USER2$/restart_service.sh $HOSTNAME$ $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ $ARG1$
}
Then, all that's left is to hook it into your service definition:
define service {
    ... [your service definition goes here]
    event_handler restart_service!httpd
}
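Done! Nagios will now automatically restart the httpd service once it has reported WARNING on the 20th check attempt in a row. Keep in mind that the attempt counter never goes past the service's max_check_attempts, so that setting has to be at least 20 for the handler to fire. Admittedly, this example is far from complete and needs many more pieces, but I'll leave that as an exercise for the reader.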
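One of those missing pieces is deciding what to do in the CRITICAL state and how to treat SOFT versus HARD. As a rough sketch only, here is one possible policy, loosely following the pattern commonly shown for Nagios event handlers: try a restart on an early soft attempt, then again when the state goes HARD. The should_restart function name and the threshold of 3 are assumptions made for illustration, not Nagios defaults.

```shell
#!/bin/sh
# Hypothetical policy helper: exit status 0 means "restart now".
# Arguments: $1 = $SERVICESTATE$, $2 = $SERVICESTATETYPE$, $3 = $SERVICEATTEMPT$
# The soft-attempt threshold (3) is an illustrative assumption.
should_restart() {
    case "$1" in
    CRITICAL)
        case "$2" in
        SOFT)
            # Try an early restart on the third soft attempt only.
            [ "$3" = "3" ] && return 0
            return 1
            ;;
        HARD)
            # The early restart did not help, so try once more.
            return 0
            ;;
        esac
        return 1
        ;;
    esac
    # OK, WARNING and UNKNOWN: leave the service alone.
    return 1
}

if should_restart CRITICAL SOFT 3; then
    echo "restart"
fi
```

Keeping the decision in a small function like this separates the policy from the restart command itself, so each can be changed and tested on its own.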
For more information on Nagios, visit the project's website (http://www.nagios.org).