[Nagiosplug-help] monitoring F5 bigIP Load Balancers

Heiko rupertt at gmail.com
Wed Jun 18 15:27:03 CEST 2008


On Wed, Jun 18, 2008 at 2:33 PM, Thomas Guyot-Sionnest <dermoth at aei.ca> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 18/06/08 02:51 AM, Heiko wrote:
>> On Wed, Jun 18, 2008 at 3:32 AM, Thomas Guyot-Sionnest <dermoth at aei.ca> wrote:
>> On 17/06/08 08:36 AM, Heiko wrote:
>>>>>>> Hello Thomas,
>>>>>>> sorry for the late response, I've been (and still am) busy here.
>>>>>>> I restarted the snmpd daemon on one LB, which gave me some stats back.
>>>>>>> To get the rest working I had to set the global Nagios timeout to 180 seconds;
>>>>>>> at first I only set it with the -t option, which got overridden by Nagios...
>>>>>>>
>>>>>>> Now I'm getting happy. Nice plugin, thanks for that and for the help.
>>>>>>>
>>>>>>>
>>>>>> Mmh, after some minutes it went back to the old status: the vservers are
>>>>>> monitored but the pools get a timeout again :(.
>> Hello. Please keep the discussion on the list as it may help others as
>> well...
>>> Hello Thomas,
>>
>>> Sorry about that, the reply button in gmail only adds the sender..
>> Pool monitoring on the 9.x requires walking the table and counting the
>> lines... this can take some time if there's some latency between your
>> monitoring server and the BigIP.
>>
>> Increasing the timeout could help, but it might be worth trying this as
>> well:
>>
>> Locate the line near the top which says:
>> $snmpwalkcmd = '/usr/bin/snmpwalk';
>>
>> and change it to:
>> $snmpwalkcmd = '/usr/bin/snmpbulkwalk';
>>
>> This will do SNMP BULKWALK requests instead, which should be faster.
>> AFAIK the parameters for snmpbulkwalk are the same...
>>
>>> this seems to work much better, but I still get notifications about
>>> pools that have no users,
>
> I'm not sure what you mean here...
>
It reported a wrong status, saying that nodes were offline when they weren't.
It can't be that one machine sees only 1 node while the other sees both.
In this case the second BigIP reports a timeout:

[root at monitoring-1:/usr/local/nagios/libexec]# date
Wed Jun 18 13:18:22 UTC 2008
[root at monitoring-1:/usr/local/nagios/libexec]# /usr/local/nagios/libexec/check_bigip_pool -H 172.17.1.12 -C public -S 9 -vw 51 -c 26 -P pool1_PRODUCTION_www_v2 -t 180
Getting 'MemberQty' trough SNMP
Matching F5-BIGIP-LOCAL-MIB::ltmPoolMemberPoolName against
'\.1\.3\.6\.1\.4\.1\.3375\.2\.2\.5\.3\.2\.1\.1\.27\.67\.66\.105\.100\.101\.97\.115\.116\.118\.95\.80\.82\.79\.68\.85\.67\.84\.73\.79\.78\.95\.119\.'
Getting 'ActiveMemberCount' trough SNMP
CHECK_BIGIP_POOL WARNING - pool1_PRODUCTION_www_v2 1/2 nodes online
[root at monitoring-1:/usr/local/nagios/libexec]# date
Wed Jun 18 13:18:26 UTC 2008
[root at monitoring-1:/usr/local/nagios/libexec]# /usr/local/nagios/libexec/check_bigip_pool -H 172.17.1.11 -C public -S 9 -vw 51 -c 26 -P pool1_PRODUCTION_www_v2 -t 180
Getting 'MemberQty' trough SNMP
Matching F5-BIGIP-LOCAL-MIB::ltmPoolMemberPoolName against
'\.1\.3\.6\.1\.4\.1\.3375\.2\.2\.5\.3\.2\.1\.1\.27\.67\.66\.105\.100\.101\.97\.115\.116\.118\.95\.80\.82\.79\.68\.85\.67\.84\.73\.79\.78\.95\.119\.'
Getting 'ActiveMemberCount' trough SNMP
CHECK_BIGIP_POOL OK - pool1_PRODUCTION_www_v2 all 2 nodes online


>>> aren't reachable and messages like this:
>
> You mean the BigIP isn't reachable? Have you tried snmpwalk'ing it by
> hand? Raising the timeout?
>
My timeout is at 180 seconds, so I set the normal_check_interval to 5 minutes.
But we still get a lot of timeouts. They often recover on the next
check, but in this situation we can't use it in a production environment.
We think it is a BigIP problem: maybe it gives SNMP queries a low
priority, so they get processed too late or never.
We have some heavy load on these machines, but even on the standby
unit we get some timeouts on pools.
The strange thing is that the vservers are always reported as they should be.
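
To tell whether the table walk itself is what runs into the timeout, a rough
sketch like the one below could time both walk commands against the pool
member table. It is only an illustration: the function name time_walk, the
use of SNMP v2c, and the -t/-r values are assumptions, not taken from the
plugin. The host, community and OID are the ones shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper, not part of check_bigip_pool.
# HOST and COMMUNITY default to the values used in the checks above.
HOST="${HOST:-172.17.1.12}"
COMMUNITY="${COMMUNITY:-public}"
# OID prefix of ltmPoolMemberPoolName, as printed by the plugin with -v
OID=".1.3.6.1.4.1.3375.2.2.5.3.2.1.1"

time_walk() {
    # $1 = walk command to use (snmpwalk or snmpbulkwalk)
    start=$(date +%s)
    # Count returned rows; SNMP v2c is assumed (snmpbulkwalk needs v2c or v3)
    rows=$("$1" -v2c -c "$COMMUNITY" -t 10 -r 1 "$HOST" "$OID" 2>/dev/null | wc -l | tr -d ' ')
    end=$(date +%s)
    echo "$1: $rows rows in $((end - start))s"
}

# Only touch the network when called with "run" as the first argument
if [ "${1:-}" = "run" ]; then
    time_walk snmpwalk
    time_walk snmpbulkwalk
fi
```

Saved to a file and started with "run" as its first argument, this prints one
line per command with the row count and the elapsed seconds. If both walks are
slow even on the standby unit, that would point at the BigIP rather than at
the plugin.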


greetings

heiko


>>>      CHECK_BIGIP_POOL UNKNOWN - cbw-www1_mysql 1/0 nodes make no sense
>>> The Vserver stats are ok somehow.
>>> Is there anything else we can do?
>
> Send me the full output of the plugin with -vvv (in private if you
> wish). I'll have a look.
>
These messages didn't appear in the last few hours.

> Basically the error above means that the BigIP reported one node online,
> but walking the node list from the pool returned no results.
>
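
The check described here can be pictured roughly as follows. This is a
sketch of the logic as it appears from the outside, not the plugin's actual
code: the function name pool_status is made up, and treating -w/-c as
"percent of nodes online" thresholds is an assumption based on the 1/2 and
2/2 results above. The return codes follow the usual Nagios plugin
convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```shell
#!/bin/sh
# Hypothetical sketch, not taken from check_bigip_pool.
pool_status() {
    # $1 = active members reported by the BigIP
    # $2 = members counted by walking the pool member table
    # $3 = warning threshold (%), $4 = critical threshold (%)
    active=$1; total=$2; warn=$3; crit=$4

    # More active nodes than counted members (e.g. 1/0) is inconsistent
    if [ "$active" -gt "$total" ]; then
        echo "UNKNOWN - $active/$total nodes make no sense"
        return 3
    fi
    # An empty walk with no active nodes gives nothing to evaluate (assumption)
    if [ "$total" -eq 0 ]; then
        echo "UNKNOWN - no pool members found"
        return 3
    fi

    pct=$((active * 100 / total))
    if [ "$pct" -lt "$crit" ]; then
        echo "CRITICAL - $active/$total nodes online"
        return 2
    elif [ "$pct" -lt "$warn" ]; then
        echo "WARNING - $active/$total nodes online"
        return 1
    fi
    echo "OK - all $total nodes online"
    return 0
}
```

With the thresholds from the commands above, "pool_status 1 2 51 26" gives a
WARNING and "pool_status 1 0 51 26" gives the UNKNOWN case, which is what one
would expect if the member table walk timed out or came back empty while the
active count was still answered.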
> Thomas
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.6 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFIWQCM6dZ+Kt5BchYRAj3YAKCdU9Eqp3jPBz6RlJYc2zRv0l74owCgor3K
> goIXxuGQEQkmEIsxLl2oYRQ=
> =m9Wx
> -----END PGP SIGNATURE-----
>



