[Nagiosplug-help] Logos don't display on status map or in host status views

Andreas Ericsson ae at op5.se
Thu Dec 14 10:11:35 CET 2006


Ralph.Grothe at itdz-berlin.de wrote:
>> From: Andreas Ericsson [mailto:ae at op5.se]
> 
> Hello Andreas
>>
>> That's because the gd-library reads the .png images, re-renders them to
>> .gd2 images internally and then creates the statusmap (which is
>> basically just one big image) from those re-rendered gd2 images.
>> If you had instead done
>>
>> 	%s/\(.*icon_image.*\)\.gd2/\1.png/
>>
>> (that is, replace only .gd2 for the icon_image variable) it would still
>> have worked, but the statusmap.cgi binary wouldn't have to re-render the
>> images internally.
> 
> Hope I got it right.
> So the gd2 graphic meta data files are basically held in stock
> to speed up the rendering of the status map.

Yes.

> Then I should refer to the true graphic file in PNG when
> referring to the icon_image
> (to give the browser something meaningful)
> and for the status_map image it's better to refer to the GD2
> files?
> 

Yes.
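
As an illustration of the split confirmed above (PNG for icon_image, GD2 for statusmap_image), here is a minimal Python sketch that rewrites only the icon_image lines of a config. The function name, the sample hostextinfo block and the file names in it are made up for the example, and it assumes one directive per line with a value that contains no spaces.

```python
import re

# Matches an icon_image directive whose value ends in .gd2. The directive
# name and the file name without its extension are kept in group 1.
ICON_GD2 = re.compile(r"^(\s*icon_image\s+\S+)\.gd2\s*$")


def fix_icon_images(config_text: str) -> str:
    """Point icon_image at .png files and leave every other line alone."""
    fixed = []
    for line in config_text.splitlines():
        fixed.append(ICON_GD2.sub(r"\1.png", line))
    return "\n".join(fixed)


# Hypothetical sample definition, for illustration only.
sample = """define hostextinfo{
    host_name        node1
    icon_image       server.gd2
    statusmap_image  server.gd2
}"""

print(fix_icon_images(sample))
```

The statusmap_image line is not touched because the pattern only matches lines whose first word is icon_image.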

> As for the status map,
> I found that the auto positioning of hosts and parent links in
> the map isn't really nice
> because I have a few hosts that are cluster nodes which
> themselves host cluster packages
> that are assigned virtual IP addresses (VIPs), and that I defined
> as true hosts within
> my nagios config.
> This leads to a seemingly random shuffling between VIP hosts and
> separate non-cluster hosts in the same
> LAN segment, with many crisscrossing lines between widely separated cluster
> packages that should rather visually stick together.
> To circumvent this would one have to assign the 2d_coords
> parameter with x,y coordinates to position
> them predictably?

Yes.

> Would one need a graphics editor or similar to measure the
> coordinates?
> 

No, the coordinates are measured from the upper left corner and you can 
space things as far apart as you like. The statusmap.cgi program will 
adjust the size of the image accordingly.
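
Since the coordinates are just pixel offsets from the upper left corner, they can be generated rather than measured. The following is a rough sketch, not a recommended layout: it puts each cluster node in its own column and stacks that node's VIP hosts below it. The host names, spacing values and function name are assumptions for the example.

```python
def layout_clusters(clusters, x_step=150, y_step=80, origin=(50, 50)):
    """Return {host_name: (x, y)} with each node's VIPs stacked below it.

    Coordinates are pixels measured from the upper left corner of the map.
    """
    coords = {}
    x0, y0 = origin
    for col, (node, vips) in enumerate(clusters.items()):
        x = x0 + col * x_step
        coords[node] = (x, y0)
        # VIP hosts go in the same column as their node, one row each.
        for row, vip in enumerate(vips, start=1):
            coords[vip] = (x, y0 + row * y_step)
    return coords


# Hypothetical host names, for illustration only.
clusters = {
    "node1": ["vip-db", "vip-web"],
    "node2": ["vip-mail"],
}

for host, (x, y) in layout_clusters(clusters).items():
    print(f"{host}: 2d_coords {x},{y}")
```

Each printed pair can be pasted as the 2d_coords value for the corresponding host.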

> 
> Besides, I have another issue with the VIP hosts
> that is not a specific plug-in question but rather some basic
> nagios configuration
> matter I suppose.
> 
> For all hosts that reside in lan segments where gateways don't
> block ICMP echo requests
> I defined as the basic check_command the check_host link to
> check_icmp.
> Additionally I defined a generic check_icmp service for all non
> firewalled hosts
> because I would like to make use of the performance data of these
> service checks.
> 
> Therefore I sort of have a redundant notification if a VIP host
> goes down.
> Because the check_host command is immediately executed repeatedly
> five times
> (my generic host definition max_check_attempts assignment)
> a down notification is instantly sent out,
> far earlier than the service notification of the check_icmp would
> be
> because of the defined 3 minute retry_check_interval for
> services.
> 
> As every cluster node hosts about 10-20 such VIP hosts
> contacts get really bombarded with host down notifications
> even during a relatively short reboot of a new kernel where the
> service checks
> thanks to their inert recheck behavior relapse back into a hard OK
> state
> before notifications were due.
> 
> I also defined parents relations for every VIP host that refer to
> its hosting cluster node because I hoped that host down
> notifications
> would then be restricted to the parents if all nodes went down.

And they should, but it won't work if the host checking logic takes 
longer to complete than it takes for the system to reboot. You might 
have to reduce the number of max_check_attempts for hosts in order to 
get this to work. You can use something along the lines of

/path/to/check_host -n 15 $HOSTADDRESS$

for your hostcheck, along with a max_check_attempts of 1, which will 
trigger an ICMP_HOSTUNREACH from wherever the last hop on the route is 
if the host is actually not reachable via ICMP (note that you can get 
ICMP_HOSTUNREACH from the machine you're running check_icmp on if 
there's no router in between). This short-circuits the logic in 
check_icmp / check_host and makes it return early.

In 99% of the cases this is the Right Thing to do, because the host will 
have to be up a couple of seconds and send some traffic for the switch 
it's connected to to pick up its MAC address and be able to do the 
ARP resolution anyway, so when it's actually down it will show up as 
down really quickly.

Given the fact that hostchecks are serialized anyway and your problem 
doesn't seem to be temporary glitches in stability, but rather that the 
checks take longer to complete than the system does to reboot, this 
should work for you for those particular hosts. Note that it may raise 
the level of false positives though. YMMV.

> Would one rather have to define a host dependency to map this?
> 

No, parent relations should work.
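
To show why parent relations are enough, here is a simplified Python model of the reachability decision. It is a sketch of the idea, not the actual Nagios code: a host that fails its check counts as DOWN only if at least one of its parents is still UP, and as UNREACHABLE otherwise. The host names and the function name are made up for the example.

```python
def classify(host, parents, responding):
    """Return 'UP', 'DOWN' or 'UNREACHABLE' for host.

    parents:    {host: [parent, ...]}; hosts without an entry have no parents
    responding: set of hosts that currently answer their host check
    """
    if host in responding:
        return "UP"
    host_parents = parents.get(host, [])
    if not host_parents:
        return "DOWN"
    # A failed host is only DOWN if some parent is still UP.
    if any(classify(p, parents, responding) == "UP" for p in host_parents):
        return "DOWN"
    return "UNREACHABLE"


# Hypothetical VIP hosts hanging off one cluster node.
parents = {"vip-db": ["node1"], "vip-web": ["node1"]}

# The whole node is rebooting: nothing answers.
for name in ("node1", "vip-db", "vip-web"):
    print(name, classify(name, parents, responding=set()))

# Only one package is gone while the node is still up.
print("vip-db", classify("vip-db", parents, responding={"node1", "vip-web"}))
```

With the VIP hosts classified as UNREACHABLE rather than DOWN, only the node itself produces a down notification, provided the VIP hosts' notification_options do not include u.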

> 
> Because of all this mess I decided to disable host notifications
> for those VIP hosts.
> 
> But I am not aware of the difference between a
> 
> notification_options	n
> 
> and a
> 
> notifications_enabled	0
> 
> in the host definition block.
> 

There is none, although notifications_enabled 0 is most often used when 
you temporarily want to disable notifications without forgetting your 
normal settings for the host.

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231



