Installing the Nagios Monitoring System on FreeBSD

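As the previous article, "Installing the Nagios Monitoring System on Linux with perl-fcgi and nginx", showed, installing Nagios is fairly simple, even when nginx replaces Apache. The hard part is the configuration. Relax, though: we are going to get it working, aren't we? Let's begin.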

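1: Get the latest packages from http://www.nagios.org/download
2: Log in to the server as root and download the following (the latest Nagios release at the time of writing is 3.0.6):
1) Nagios, version 3.0.6: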

wget http://nchc.dl.sourceforge.net/sourceforge/nagios/nagios-3.0.6.tar.gz

2) The Nagios plugins, version 1.4.13:

wget http://jaist.dl.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.13.tar.gz

3) The image pack:

http://dl.sf.net/nagios/imagepak-base.tar.gz

4) NRPE, version 2.5.2:

http://ufpr.dl.sourceforge.net/sourceforge/nagios/nrpe-2.5.2.tar.gz

5) NSCA, version 2.6:

http://kent.dl.sourceforge.net/sourceforge/nagios/nsca-2.6.tar.gz
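
With five archives to collect, it is easy to miss one. The following is a minimal sketch, not part of the original procedure, that reports which URLs have not been downloaded into the current directory yet. The function names tarball_name and missing_downloads are made up for this example.

```shell
#!/bin/sh
# Print the local file name that a download URL is saved under.
tarball_name() {
    printf '%s\n' "${1##*/}"
}

# missing_downloads URL...
# Print every URL whose archive is not present in the current directory.
missing_downloads() {
    for url in "$@"; do
        name=$(tarball_name "$url")
        [ -f "$name" ] || printf '%s\n' "$url"
    done
}
```

The output can be fed to the downloader, for example: missing_downloads URL1 URL2 | while read -r u; do fetch "$u"; done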

3: Switch to the root user:
sudo su

4: Unpack the archive:
tar zxvf nagios-3.0.6.tar.gz

5: Create the user that Nagios will run as:
adduser nagios

6: Create the Nagios installation directory and make nagios:nagios its owner:
mkdir /usr/local/nagios
chown nagios:nagios /usr/local/nagios

7: Identify the web server user
Commands may be issued through the web interface, so you need to know which user the web server runs as (usually apache):
grep "^User" /usr/local/apache2/conf/httpd.conf

8: Create the command file group
This new group contains both the Apache user and the nagios user:
pw groupadd nagcmd
pw usermod apache -G nagcmd
pw usermod nagios -G nagcmd
———————————-
cat /etc/group
nagcmd:*:9007:apache,nagios
———————————-
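
Rather than reading /etc/group by eye, the membership can be checked mechanically. This is a small sketch under the assumption that the file follows the usual group(5) layout of name:password:gid:members; group_has_member is a hypothetical helper name, not a system command.

```shell
#!/bin/sh
# group_has_member GROUPFILE GROUP USER
# Succeed if USER appears in the member list of GROUP in a group(5)-format file.
group_has_member() {
    awk -F: -v g="$2" -v u="$3" '
        $1 == g {
            n = split($4, members, ",")
            for (i = 1; i <= n; i++) if (members[i] == u) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}
```

For example: group_has_member /etc/group nagcmd apache && echo "apache is in nagcmd"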

9: Run the configure script and install Nagios
cd nagios-3.0.6

./configure --prefix=/usr/local/nagios --with-gd-lib=/usr/local/lib --with-gd-inc=/usr/local/include
make all
make install
make install-init
make install-commandmode
make install-config

10: Install nagios-plugins
tar zxvf nagios-plugins-1.4.13.tar.gz
cd nagios-plugins-1.4.13
./configure --prefix=/usr/local/nagios-plugins
make all
make install
After installation, /usr/local/nagios-plugins contains a libexec directory. Move that whole directory under /usr/local/nagios:
mv /usr/local/nagios-plugins/libexec/ /usr/local/nagios/

11: Install imagepak-base.tar.gz
tar -xvzf imagepak-base.tar.gz
The archive unpacks into a directory named base:
mv base/ /usr/local/nagios/share/images/logos/

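----------------------------------------------------------------------
Configuration
----------------------------------------------------------------------
1: Configure the web interface
This assumes Apache is already running. If it is not, see: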

http://localhost/upload/blog.php?do-showone-tid-18.html

vi /usr/local/apache2/conf/httpd.conf
Add the following:
ScriptAlias /nagios/cgi-bin /usr/local/nagios/sbin
<Directory "/usr/local/nagios/sbin">
Options ExecCGI
AllowOverride None
Order allow,deny
Allow from all
AuthName "Nagios Access"
AuthType Basic
AuthUserFile /usr/local/nagios/etc/htpasswd.users
Require valid-user
</Directory>

Alias /nagios /usr/local/nagios/share
<Directory "/usr/local/nagios/share">
Options None
AllowOverride None
Order allow,deny
Allow from all
AuthName "Nagios Access"
AuthType Basic
AuthUserFile /usr/local/nagios/etc/htpasswd.users
Require valid-user
</Directory>

When you are done, save the file and restart Apache:
/usr/local/apache2/bin/apachectl restart

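2: Configure Apache Basic authentication:
Create the password file with a user named nagios and enter the password when prompted:
/usr/local/apache2/bin/htpasswd -c /usr/local/nagios/etc/htpasswd.users nagios
The Apache side is now configured.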

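Now configure Nagios:
cd /usr/local/nagios/etc/
/usr/local/nagios/etc contains the sample configuration files, named *.cfg-sample. Copy each one to the matching .cfg name, for example:
cp nagios.cfg-sample nagios.cfg
Repeat this for every sample file. (The -sample naming comes from the Nagios 2.x layout. If make install-config has already written plain .cfg files, there is nothing to copy.)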
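
Copying the files one by one is tedious, so here is a minimal sketch of the same step as a loop. It assumes the sample files end in .cfg-sample as described above; copy_samples is a name invented for this example, and it deliberately leaves any existing .cfg file untouched.

```shell
#!/bin/sh
# copy_samples DIR
# Copy every DIR/*.cfg-sample to DIR/*.cfg without overwriting existing files.
copy_samples() {
    for sample in "$1"/*.cfg-sample; do
        [ -e "$sample" ] || continue      # no match: the glob stays literal
        target=${sample%-sample}          # strip the "-sample" suffix
        [ -e "$target" ] || cp "$sample" "$target"
    done
}
```

For example: copy_samples /usr/local/nagios/etc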

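vi minimal.cfg
Comment out all the command definitions by putting a "#" at the start of every line of each definition.
Then edit cgi.cfg and change use_authentication=1 to use_authentication=0, which turns off authentication in the CGIs. Otherwise some pages will not display.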

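Now check the configuration files for syntax errors:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
If everything is correct, the output ends with:
Total Warnings: 0
Total Errors: 0
Otherwise, fix the configuration as the messages indicate.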
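
If the check is run from a script, the summary lines can be tested instead of read. The sketch below assumes the output contains a "Total Errors:" line as shown above; preflight_ok is a hypothetical helper name.

```shell
#!/bin/sh
# preflight_ok
# Read "nagios -v" output on stdin and succeed only if it reports
# "Total Errors: 0". Output without a summary line counts as a failure.
preflight_ok() {
    awk '
        /^Total Errors:/ { seen = 1; errors = $3 }
        END { exit ((seen && errors == 0) ? 0 : 1) }
    '
}
```

For example: /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | preflight_ok && echo "configuration is clean"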

We will come back to the configuration files later. For now, start Nagios:
/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

To have Nagios restarted automatically if it exits unexpectedly, we run it under daemontools.
Install daemontools:
mkdir -p /package
chmod 1755 /package
cd /package
fetch http://cr.yp.to/daemontools/daemontools-0.76.tar.gz
tar zxpf daemontools-0.76.tar.gz
cd admin/daemontools-0.76/
package/install
Check that the svscan process is running:
ps aux | grep svscan
root 376 0.0 0.0 1636 0 con- IW - 0:00.00 /bin/sh /command/svscanboot
root 411 0.0 0.0 1224 208 con- S 8Jul06 0:42.50 svscan /service

OK, it started normally.
cd /service
mkdir nagios
chmod 1755 nagios
cd nagios
touch ./run
chmod 755 ./run
vi run
#!/bin/sh
PATH=/usr/local/bin:/usr/bin:/bin
export PATH

exec env - PATH=$PATH \
/usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg

mkdir log
cd log
touch ./run
chmod 755 ./run
vi ./run
#!/bin/sh
exec setuidgid nagios multilog t s1000000 n100 ./main

mkdir main
chmod 777 main
chown nagios:nagios main
touch status
chown nagios:nagios status

svc -u /service/nagios/
svstat /service/nagios/
root@## ps auxww | grep nagios
root 23276 0.0 0.1 1176 488 ?? I 5:00PM 0:01.71 supervise nagios
nagios 34251 0.0 0.3 2316 1552 ?? S 6:06PM 0:00.10 /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg
root@##

OK, Nagios now runs as a supervised service that starts automatically.
Use the svc command to start or stop it.
———————————————————————————
svc opts services
opts is a series of getopt-style options. services consists of any number of arguments, each argument naming a directory used by supervise.

-u: Up. If the service is not running, start it. If the service stops, restart it.
-d: Down. If the service is running, send it a TERM signal and then a CONT signal. After it stops, do not restart it.
-o: Once. If the service is not running, start it. Do not restart it if it stops.
-p: Pause. Send the service a STOP signal.
-c: Continue. Send the service a CONT signal.
-h: Hangup. Send the service a HUP signal.
-a: Alarm. Send the service an ALRM signal.
-i: Interrupt. Send the service an INT signal.
-t: Terminate. Send the service a TERM signal.
-k: Kill. Send the service a KILL signal.
-x: Exit. supervise will exit as soon as the service is down. If you use this option on a stable system, you’re doing something wrong; supervise is designed to run forever.
———————————————————————————
For example:
Stop Nagios: svc -d /service/nagios/
Restart Nagios: svc -t /service/nagios/
Start Nagios: svc -u /service/nagios/
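
After an svc command it helps to confirm what state the service ended up in. The sketch below classifies one line of svstat output. It assumes svstat prints lines of the form "/service/nagios/: up (pid 34251) 12 seconds", which is an assumption to verify on your system; service_is_up is a name made up for this example.

```shell
#!/bin/sh
# service_is_up
# Read one line of svstat output on stdin and succeed if it reports "up".
service_is_up() {
    status=$(head -n 1)
    case $status in
        *": up (pid "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: svstat /service/nagios/ | service_is_up || echo "nagios is not running"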

Of course, you can also use the rc.d init script instead:
/usr/local/etc/rc.d/nagios start/stop

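Anyway, daemontools is powerful and you can get to know it over time. Back to the main topic.
Now open http://localhost/nagios/ in a browser.
You will be pleasantly surprised: the status of the server and its services is all clearly visible.
At the moment Nagios knows only one host, itself (localhost). We will add other hosts and their services shortly. First, let's get to know Nagios properly.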

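Configuring Nagios:

1) Add a service to a host
2) Add a host together with its services
3) Stop a service
4) Delete a host and its services
5) View problems across all hosts
6) View the status of a specific host
7) Change the notification interval
8) Change the number of retries before a problem is reported
9) Use external commands in Nagios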

1) Add a service to a host
To monitor the qmail service on localhost, do the following:
vi minimal.cfg
define service{
use generic-service ; Name of service template to use
host_name localhost
service_description qmail_smtp
is_volatile 0
check_period 24x7
max_check_attempts 1
normal_check_interval 1
retry_check_interval 1
contact_groups admins
notification_options w,u,c,r
notification_interval 960
notification_period 24x7
check_command check_qmail
}

You can simply copy an existing definition and modify it; the one above was copied from the stock check_local_disk service.
Change host_name, service_description, check_command and so on.

define service{
use generic-service ; Name of service template to use
host_name localhost
service_description qmail_pop3
is_volatile 0
check_period 24x7
max_check_attempts 1
normal_check_interval 1
retry_check_interval 1
contact_groups admins
notification_options w,u,c,r
notification_interval 960
notification_period 24x7
check_command check_pop3
}
Adapt it in the same way, then edit:
vi checkcommands.cfg
# 'check_qmail' command definition
define command{
command_name check_qmail
command_line $USER1$/check_smtp -H 127.0.0.1
}
define command{
command_name check_pop3
command_line $USER1$/check_pop -H 127.0.0.1
}
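Save the file, then check the configuration:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
If there are no errors, you will see:
Total Warnings: 0
Total Errors: 0
If errors are reported, fix them as the messages indicate.
Restart Nagios: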
svc -d /service/nagios/ && svc -u /service/nagios/
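
A common cause of errors at this stage is a service whose check_command names a command that has no command_name definition. The following is a rough sketch that cross-checks the two across a set of configuration files. It assumes one directive per line, as in the definitions above; undefined_commands is a hypothetical helper name.

```shell
#!/bin/sh
# undefined_commands FILE...
# Print each command referenced by a check_command directive that has no
# matching command_name definition in the given files.
undefined_commands() {
    awk '
        $1 == "command_name"  { defined[$2] = 1 }
        # Everything after the first "!" is an argument, not part of the name.
        $1 == "check_command" { split($2, parts, "!"); used[parts[1]] = 1 }
        END { for (name in used) if (!(name in defined)) print name }
    ' "$@" | sort
}
```

For example: undefined_commands /usr/local/nagios/etc/minimal.cfg /usr/local/nagios/etc/checkcommands.cfg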
Check the result in the web interface:

http://10.5.1.153/nagios/

Click "Service Detail"; the newly added services appear in the list.

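2) Add a host together with its services
On this host we will monitor things that are not exposed through a network port, such as load and disk usage, as well as its services: apache, mysql, qmail, ntp and so on. Can Nagios check the port-less items directly? No. For those we must install NRPE on both machines; the NRPE daemon listens on port 5666 on the remote host and returns check results to the monitoring server on request.
OK, let's add apache, mysql, qmail and ntp first. This time we put the monitored host and its services in a new file: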
cd /usr/local/nagios/etc/
touch 10_5_1_156.cfg
vi nagios.cfg
cfg_file=/usr/local/nagios/etc/10_5_1_156.cfg

vi 10_5_1_156.cfg
Define a host:
define host{
use generic-host ; Name of host template to use
host_name test_nrpe
alias client
address 10.5.1.156
check_command check-host-alive
max_check_attempts 1
check_period 24x7
notification_interval 120
notification_period 24x7
notification_options d,r
contact_groups admins
}

Define the services to check on the host:
define service{
use generic-service ; Name of service template to use
host_name test_nrpe
service_description PING
is_volatile 0
check_period 24x7
max_check_attempts 1
normal_check_interval 1
retry_check_interval 1
contact_groups admins
notification_options w,u,c,r
notification_interval 960
notification_period 24x7
check_command check_ping!100.0,20%!500.0,60%
}

define service{
use generic-service ; Name of service template to use
host_name test_nrpe
service_description apache
is_volatile 0
check_period 24x7
max_check_attempts 1
normal_check_interval 1
retry_check_interval 1
contact_groups admins
notification_options w,u,c,r
notification_interval 960
notification_period 24x7
check_command check_http!100.0,20%!500.0,60%
}

define service{
use generic-service ; Name of service template to use
host_name test_nrpe
service_description mysql
is_volatile 0
check_period 24x7
max_check_attempts 1
normal_check_interval 1
retry_check_interval 1
contact_groups admins
notification_options w,u,c,r
notification_interval 960
notification_period 24x7
check_command check_mysql!100.0,20%!500.0,60%
}

define service{
use generic-service ; Name of service template to use
host_name test_nrpe
service_description ntp
is_volatile 0
check_period 24x7
max_check_attempts 1
normal_check_interval 1
retry_check_interval 1
contact_groups admins
notification_options w,u,c,r
notification_interval 960
notification_period 24x7
check_command check_ntp!100.0,20%!500.0,60%
}

define service{
use generic-service ; Name of service template to use
host_name test_nrpe
service_description qmail_smtp
is_volatile 0
check_period 24x7
max_check_attempts 1
normal_check_interval 1
retry_check_interval 1
contact_groups admins
notification_options w,u,c,r
notification_interval 960
notification_period 24x7
check_command check_smtp!100.0,20%!500.0,60%
}

define service{
use generic-service ; Name of service template to use
host_name test_nrpe
service_description qmail_pop3
is_volatile 0
check_period 24x7
max_check_attempts 1
normal_check_interval 1
retry_check_interval 1
contact_groups admins
notification_options w,u,c,r
notification_interval 960
notification_period 24x7
check_command check_pop!100.0,20%!500.0,60%
}
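The services are now defined as well. As before, check the configuration and restart Nagios.

The web interface should now show one more host with its services underneath. Problems you may run into when adding hosts and services:
1: Configuration errors. If you start Nagios without checking the configuration, it may start successfully but display incorrectly.
Fix: correct the configuration.
2: Connection refused
When I hit this, I first suspected that passwordless SSH login had failed, but the real cause was that the service was not running on the target server. Starting the service fixed it.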

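These are all services that listen on a port, though. How do we check state that has no port?
With NRPE. Let's install it now:
I. Configuring the remote host
1. Install and configure NRPE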
fetch http://ufpr.dl.sourceforge.net/sourceforge/nagios/nrpe-2.5.2.tar.gz
tar zxvf nrpe-2.5.2.tar.gz
cd nrpe-2.5.2
./configure --enable-ssl --enable-command-args
make all
mkdir -p /usr/local/nagios/etc
mkdir /usr/local/nagios/bin
mkdir /usr/local/nagios/libexec
pw addgroup nagios
pw useradd nagios -g nagios -d /usr/local/nagios/ -s /sbin/nologin
chown -R nagios:nagios /usr/local/nagios
cp ./sample-config/nrpe.cfg /usr/local/nagios/etc
cp src/nrpe /usr/local/nagios/bin
2. Start NRPE; it listens on port 5666
/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
netstat -ant | grep 5666
tcp4 0 0 *.5666 *.* LISTEN
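
The same check can be scripted so that it fails loudly when the daemon is not listening. This is a minimal sketch that reads netstat -an output and looks for a LISTEN socket on the given port. It assumes the local address is the fourth column, as in the line above; port_is_listening is a hypothetical helper name.

```shell
#!/bin/sh
# port_is_listening PORT
# Read "netstat -an" output on stdin and succeed if a socket is in LISTEN
# state on PORT. Handles both "*.5666" and "0.0.0.0:5666" address styles.
port_is_listening() {
    awk -v port="$1" '
        $NF == "LISTEN" {
            n = split($4, a, /[.:]/)      # the port is the last component
            if (a[n] == port) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

For example: netstat -an | port_is_listening 5666 || echo "nrpe is not listening"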

II. Configuring the monitoring server
1. Install NRPE (mainly for the check_nrpe plugin)
fetch http://ufpr.dl.sourceforge.net/sourceforge/nagios/nrpe-2.5.2.tar.gz
tar zxvf nrpe-2.5.2.tar.gz
cd nrpe-2.5.2
./configure --enable-ssl --enable-command-args
make all
cp src/check_nrpe /usr/local/nagios/libexec
2. Configure Nagios
vi checkcommands.cfg
Define the check_nrpe command:
# 'check_nrpe' command definition
define command{
command_name check_nrpe
command_line /usr/local/nagios/libexec/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
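
To see what this definition turns into at run time, the sketch below substitutes the two macros by hand. It only illustrates the substitution and is not how Nagios itself is implemented. expand_check_nrpe is a made-up name, and the example values (10.5.1.156, check_load) assume a service written as check_nrpe!check_load on the host defined earlier.

```shell
#!/bin/sh
# expand_check_nrpe HOSTADDRESS ARG1
# Print the command line obtained by replacing $HOSTADDRESS$ and $ARG1$
# in the check_nrpe command_line shown above.
# Values containing "|" or "&" would confuse sed; this is only a sketch.
expand_check_nrpe() {
    template='/usr/local/nagios/libexec/check_nrpe -H $HOSTADDRESS$ -c $ARG1$'
    printf '%s\n' "$template" |
        sed -e 's|\$HOSTADDRESS\$|'"$1"'|' -e 's|\$ARG1\$|'"$2"'|'
}
```

For example, expand_check_nrpe 10.5.1.156 check_load prints the command that the monitoring server would run for that service.
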
III. Part of this was configured above; the final configuration is:
define host{
use generic-host ; Name of host template to use
host_name test_nrpe
alias client
address 10.5.1.156
check_command check-host-alive
max_check_attempts 1
check_period 24x7
notification_interval 120
notification_period 24x7
notification_options d,r
contact_groups admins
}

# 'check_load' command definition
define command{
command_name check_load
command_line $USER1$/check_load -w $ARG1$ -c $ARG2$
}

# 'check_disk' command definition
define command{
command_name check_disk
command_line $USER1$/check_disk -w $ARG1$ -c $ARG2$
}
define service{
use generic-service ; Name of service template to use
host_name test_nrpe
service_description PING
is_volatile 0
check_period 24x7
max_check_attempts 1
normal_check_interval 1
retry_check_interval 1
contact_groups admins
notification_options w,u,c,r
notification_interval 960
notification_period 24x7
check_command check_ping!100.0,20%!500.0,60%
}

define service{
use generic-service ; Name of service template to use
host_name test_nrpe
service_description apache
is_volatile 0
check_period 24x7
max_check_attempts 1
normal_check_interval 1
retry_check_interval 1
contact_groups admins
notification_options w,u,c,r
notification_interval 960
notification_period 24x7
check_command check_http!100.0,20%!500.0,60%
}

define service{
use generic-service ; Name of service template to use
host_name test_nrpe
service_description mysql
is_volatile 0
check_period 24x7
max_check_attempts 1
normal_check_interval 1
retry_check_interval 1
contact_groups admins
notification_options w,u,c,r
notification_interval 960
notification_period 24x7
check_command check_mysql!100.0,20%!500.0,60%
}

define service{
use generic-service ; Name of service template to use
host_name test_nrpe
service_description ntp
is_volatile 0
check_period 24x7
max_check_attempts 1
normal_check_interval 1
retry_check_interval 1
contact_groups admins
notification_options w,u,c,r
notification_interval 960
notification_period 24x7
check_command check_ntp!100.0,20%!500.0,60%
}

define service{
use generic-service ; Name of service template to use
host_name test_nrpe
service_description qmail_smtp
is_volatile 0
check_period 24x7
max_check_attempts 1
normal_check_interval 1
retry_check_interval 1
contact_groups admins
notification_options w,u,c,r
notification_interval 960
notification_period 24x7
check_command check_smtp!100.0,20%!500.0,60%
}

define service{
use generic-service ; Name of service template to use
host_name test_nrpe
service_description qmail_pop3
is_volatile 0
check_period 24x7
max_check_attempts 1
normal_check_interval 1
retry_check_interval 1
contact_groups admins
notification_options w,u,c,r
notification_interval 960
notification_period 24x7
check_command check_pop!100.0,20%!500.0,60%
}

define service{
use generic-service ; Name of service template to use
host_name test_nrpe
service_description test_load
is_volatile 0
check_period 24x7
max_check_attempts 1
normal_check_interval 1
retry_check_interval 1
contact_groups admins
notification_options w,u,c,r
notification_interval 960
notification_period 24x7
check_command check_nrpe!check_load ; assumes command[check_load] is defined in the remote nrpe.cfg
}

define service{
use generic-service ; Name of service template to use
host_name test_nrpe
service_description test_disk
is_volatile 0
check_period 24x7
max_check_attempts 1
normal_check_interval 1
retry_check_interval 1
contact_groups admins
notification_options w,u,c,r
notification_interval 960
notification_period 24x7
check_command check_nrpe!check_disk ; assumes command[check_disk] is defined in the remote nrpe.cfg
}

IV. Check the configuration and restart Nagios

9) Use external commands in Nagios
vi /usr/local/nagios/etc/nagios.cfg
check_external_commands=1

mkdir /usr/local/nagios/var/rw
chown nagios:nagcmd /usr/local/nagios/var/rw
chmod u+rw /usr/local/nagios/var/rw
chmod g+rw /usr/local/nagios/var/rw
chmod g+s /usr/local/nagios/var/rw

svc -t /service/nagios/
/usr/local/apache2/bin/apachectl restart
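
With external commands enabled, Nagios reads lines of the form "[timestamp] COMMAND;arg;arg" from its command file. The sketch below only formats such a line; external_command is a name invented for this example. The command file path in the comment assumes the default command_file location under the var/rw directory created above, and the host and service names are the ones used earlier in this article.

```shell
#!/bin/sh
# external_command TIMESTAMP COMMAND [ARG...]
# Print one line in the external command file format.
external_command() {
    ts=$1; shift
    line=$1; shift
    for arg in "$@"; do
        line="$line;$arg"             # arguments are separated by semicolons
    done
    printf '[%s] %s\n' "$ts" "$line"
}

# Usage sketch: force an immediate check of the PING service on test_nrpe.
#   now=$(date +%s)
#   external_command "$now" SCHEDULE_FORCED_SVC_CHECK test_nrpe PING "$now" \
#       > /usr/local/nagios/var/rw/nagios.cmd
```

Keeping the formatting separate from the write makes it easy to inspect a command before sending it to the running daemon.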
