From karl at debisschop.net Tue Jul 1 03:55:09 2003
From: karl at debisschop.net (Karl DeBisschop)
Date: Tue Jul 1 03:55:09 2003
Subject: [Nagiosplug-devel] Re: Issues on check_disk
In-Reply-To: <53104E20A25CD411B556009027E50636079A9CE2@pnnemp02.pn.egg.com>
References: <53104E20A25CD411B556009027E50636079A9CE2@pnnemp02.pn.egg.com>
Message-ID: <1057056708.21850.42.camel@miles.debisschop.net>

On Mon, 2003-06-30 at 13:04, Voon, Ton wrote:
> (Damn Outlook makes it hard for me to add comments inline - apologies for
> appending at the top)
>
> Karl,
>
> -w -1% is fine for clearing thresholds. Just seemed like a lot of dashes on
> the command line, but you're right - the alternatives are not much better.
>
> Fixed "check_disk warn crit [path]". This syntax had thresholds at used
> levels so I've left it like that, whereas the new code is reporting and
> expecting -w and -c on free levels so these two are equivalent:
>
> check_disk -w 10% -c 5% -p /
> check_disk 90 95 /
>
> Personally, I think it is a bit peculiar to support a syntax which is a few
> releases old, especially as we are breaking more current syntax...

I would propose that this syntax never be advertised. But to my mind,
retaining it does not seem to hurt. I am constantly surprised how old
some installs are, judging by this constant dribble of 0.0.7 questions...

> The way it is currently coded, when -p is seen, it will "save" the last set
> of thresholds specified. If a threshold is set after the path is specified,
> then this will be ignored. At the moment, you can't say "check 5% for /var
> and 10% for everything else" - you have to list "everything else". Is this a
> limitation?

I think so.

> If so, what syntax do you propose? Are you saying a later -w -c
> without a -p means "this threshold for everything else"?

Instead of thinking of early thresholds as a 'default', we could think
of them as a state.
So a threshold setting would apply to all partitions
before it with no threshold set, and all that follow until another
threshold is defined (or it is unset, of course). Does that make sense?

I think it's not too hard to do with the code we currently have.

> (All this syntax stuff is making me think that threshold parameter should
> really be held as object variables. I think this is how Patrol does it
> (badly) - send all values back to the central server which then does the
> checking of thresholds)
>
> Ton
>
> > -----Original Message-----
> > From: Karl DeBisschop [SMTP:karl at debisschop.net]
> > Sent: Monday, June 30, 2003 1:49 PM
> > To: NagiosPlug Devel
> > Subject: [Nagiosplug-devel] Re: Issues on check_disk
> >
> > Voon, Ton writes:
> >
> > > The code for clearing thresholds is already there! Use -1% or -1:
> > >
> > > $ ./check_disk -v -v -v -w 10% -c 5% -p /tmp -w 10000 -c 5000 -p /var
> > > DISK OK [846 MB (85%) free on /var] [1886 MB (93%) free on /tmp]
> > > 846 of 992 MB (85%) free on /dev/dsk/c0t0d0s3 (type ufs mounted on /var) warn:10000 crit:5000 warn%:10% crit%:5%
> > > 1886 of 2034 MB (93%) free on swap (type tmpfs mounted on /tmp) warn:-1 crit:-1 warn%:10% crit%:5%
> >
> > I didn't have a chance to check, but I'm not surprised.
> >
> > > $ ./check_disk -v -v -v -w 10% -c 5% -p /tmp -w 10000 -c 5000 -w -1% -c -1% -p /var
> > > DISK OK [846 MB (85%) free on /var] [1887 MB (93%) free on /tmp]
> > > 846 of 992 MB (85%) free on /dev/dsk/c0t0d0s3 (type ufs mounted on /var) warn:10000 crit:5000 warn%:-1% crit%:-1%
> > > 1887 of 2035 MB (93%) free on swap (type tmpfs mounted on /tmp) warn:-1 crit:-1 warn%:10% crit%:5%
> > >
> > > Looks like any values are accepted, but checked at the end of all parameter
> > > parsing. It looks a nightmare to read though.
> >
> > I don't find it a nightmare to read. Not pretty, but a nightmare? Can you
> > put your finger on what you find disconcerting?
> >
> > > Do you think it should be something else (-w C -c C?)
> >
> > I knew the checking happened late, and thought about 'undef' -- that would
> > be pretty clear to a wide audience. But 'undef%' seemed odd as would 'C%'. I
> > don't mind accepting a short list of strings like 'undef' and 'null' however
> > (but I fail to see how 'C' is intuitive).
> >
> > Also, -C as an option to clear all previous defaults is fine. It makes much
> > more sense to me in the context of this framework. On its own, as I have
> > expressed before, it was sort of ad hoc to me.
> >
> > I think we also need to make a clear statement on what a threshold becomes
> > if it is not specified for a drive -- is it the last one used, or is it a
> > 'default' that would be specified before any paths are specified?
> >
> > > Also, this currently does not work:
> > >
> > > check_disk -w 10% -c 5% / /tmp /var
> > >
> > > You need to specify as:
> > >
> > > check_disk -w 10% -c 5% -p / -p /tmp -p /var
> > >
> > > I think it makes sense to do it the top way, but check_disk looks like it is
> > > expecting:
> > >
> > > check_disk warn crit path
> > >
> > > I seem to have broken this with my latest changes. Instead of fixing that,
> > > can I propose removing warn and crit and assume all additional parameters to
> > > check_disk are considered as paths?
> >
> > 'check_disk warn crit path' is the oldest form of usage, and was originally
> > the only valid invocation. I would prefer to keep that as well, since I
> > think it can be accepted without too much trouble, it does not violate
> > POSIX, and my policy has been that reverse compatibility should be preserved
> > if reasonably possible. Again, if there is a groundswell of disagreement, I
> > will defer. But I do feel rather strenuously that old invocations should be
> > supported and would be decidedly less happy if we choose not to go that
> > way.
> > Release schedule has no core principles attached; this does.
> >
> > --
> > Karl
>
>
> This private and confidential e-mail has been sent to you by Egg.
> The Egg group of companies includes Egg Banking plc
> (registered no. 2999842), Egg Financial Products Ltd (registered
> no. 3319027) and Egg Investments Ltd (registered no. 3403963) which
> carries out investment business on behalf of Egg and is regulated
> by the Financial Services Authority.
> Registered in England and Wales. Registered offices: 1 Waterhouse Square,
> 138-142 Holborn, London EC1N 2NA.
> If you are not the intended recipient of this e-mail and have
> received it in error, please notify the sender by replying with
> 'received in error' as the subject and then delete it from your
> mailbox.
>
> _______________________________________________
> Nagiosplug-devel mailing list
> Nagiosplug-devel at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagiosplug-devel
> ::: Please include plugins version (-v) and OS when reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null

From Ton.Voon at egg.com Tue Jul 1 05:33:06 2003
From: Ton.Voon at egg.com (Voon, Ton)
Date: Tue Jul 1 05:33:06 2003
Subject: [Nagiosplug-devel] Re: Issues on check_disk
Message-ID: <53104E20A25CD411B556009027E50636079A9CE8@pnnemp02.pn.egg.com>

Just so I get this clear before I start making changes, for the case:

"warn 10% crit 5% for /tmp & /, warn 10MB crit 5MB for /var, everything else
warn (20% or 10MB) crit (10% or 5MB)"

I think the syntax should look like:

check_disk -w 10% -c 5% -p /tmp -p / -C -w 10000 -c 5000 -p /var -w 20% -c 10%

The order is vital. Picturing the code, it may be possible to add -p DEFAULT
at the end of the above so 20% 10% is the default for everything else if you
think this makes more sense. I can make sure it still works without it.

Ton
From karl at debisschop.net Tue Jul 1 07:55:07 2003
From: karl at debisschop.net (Karl DeBisschop)
Date: Tue Jul 1 07:55:07 2003
Subject: [Nagiosplug-devel] Re: Issues on check_disk
In-Reply-To: <53104E20A25CD411B556009027E50636079A9CE8@pnnemp02.pn.egg.com>
References: <53104E20A25CD411B556009027E50636079A9CE8@pnnemp02.pn.egg.com>
Message-ID:

Voon, Ton writes:

> Just so I get this clear before I start making changes, for the case:
>
> "warn 10% crit 5% for /tmp & /, warn 10MB crit 5MB for /var, everything else
> warn (20% or 10MB) crit (10% or 5MB)"
>
> I think the syntax should look like:
>
> check_disk -w 10% -c 5% -p /tmp -p / -C -w 10000 -c 5000 -p /var -w 20% -c 10%

Yes. Except the last pair of thresholds does nothing because no disks follow
them.

I am starting with the rule that a threshold applies to all partitions that
follow it. But I feel this expected usage should also be valid:

check_disk -p / -w 10% -c 5%

So I modify the rule to add that a threshold applies to the preceding
partitions if and only if they have not yet been set. Which I think means we
need to mark a difference between initially unset thresholds, and thresholds
that have been reset (say '-2').

There is also the oddity that this rule would mean

check_disk -p / -w 10% -c 5% -p /tmp -w 10000 -c 5000

is exactly the same as

check_disk -w 10% -c 5% -w 10000 -c 5000 -p / -p /tmp

and

check_disk -p / -p /tmp -w 10% -c 5% -w 10000 -c 5000

and of course

check_disk -w 10% -c 5% -w 10000 -c 5000 / /tmp

>
> The order is vital. Picturing the code, it maybe possible to add -p DEFAULT
> at the end of the above so 20% 10% is the default for everything else if you
> think this makes more sense. I can make sure it still works without it.

If I follow that idea, I don't think I want to go that way.
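
A minimal sketch of the rule described above, written in Python purely for
illustration (check_disk itself is C, and none of these names come from its
source). It assumes four independent threshold slots per path (absolute and
percentage, warning and critical), uses None for "never set" so that an
explicit reset such as -1% is not overwritten by a later threshold, and
treats bare arguments as paths. The -C option is not modelled.

```python
from typing import Dict, List, Optional

# Hypothetical representation: one slot per threshold kind, None = never set.
Thresholds = Dict[str, Optional[str]]


def assign_thresholds(argv: List[str]) -> Dict[str, Thresholds]:
    """Apply each threshold to all later paths, and to earlier paths only
    where that slot has never been set."""
    state: Thresholds = {"w": None, "c": None, "w%": None, "c%": None}
    seen: Dict[str, Thresholds] = {}
    i = 0
    while i < len(argv):
        opt = argv[i]
        if opt in ("-w", "-c", "-p"):
            if i + 1 >= len(argv):
                raise ValueError(f"{opt} needs a value")
            value = argv[i + 1]
            i += 2
        else:
            # Assumption: a bare argument is a path, as in "check_disk ... / /tmp".
            opt, value = "-p", opt
            i += 1
        if opt == "-p":
            # A path snapshots the threshold state seen so far.
            seen[value] = dict(state)
        else:
            slot = opt[1] + ("%" if value.endswith("%") else "")
            state[slot] = value
            # Backfill earlier paths, but only slots that were never set.
            for thresholds in seen.values():
                if thresholds[slot] is None:
                    thresholds[slot] = value
    return seen
```

Under these assumptions the four invocations listed above all yield the same
thresholds for / and /tmp, which is the oddity being pointed out.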
--
Karl

From Ton.Voon at egg.com Wed Jul 2 02:37:05 2003
From: Ton.Voon at egg.com (Voon, Ton)
Date: Wed Jul 2 02:37:05 2003
Subject: [Nagiosplug-devel] RE: Issues on check_disk
Message-ID: <53104E20A25CD411B556009027E50636079A9CF5@pnnemp02.pn.egg.com>

I've been trying to get my head round this and I think it comes to this:

We currently have check_disk with multiple thresholds, with thresholds read
via a "left to right" scheme (eg, "check_disk -w 10% -c 5% -p /tmp").

There are 2 desirable improvements:

1) backward syntax compatibility, eg "check_disk -p /tmp -w 10% -c 5%"
should set thresholds of 10%,5% for /tmp. This would effectively mean that
the thresholds could be read "bidirectionally"

2) a method of saying "check /tmp at 10%, 5% with 20%, 15% for everything
else"

I think there are 3 roads to go down:

A) Not bother with (1), but implement (2). This would allow us to use
"check_disk -w 10% -c 5% -p /tmp -w 20% -c 15%" so a threshold without a
path following is considered for everything else. This does not break the
"left to right" methodology and seems a logical extension of the r1.3 syntax
of "check_disk -w 20% -c 15%".

B) Implement (1), but not (2). Because you can read thresholds
"bidirectionally", it is not possible to set a threshold for everything
else.

C) Do (1) and (2) by using "-p default". As the optarg to -p is a path, only
the namespace beginning with "/" is used, so we could use labels such as
"default" as the threshold for everything else. However, Karl has expressed
a dislike of this approach.

My vote is for (A). We are breaking current syntax (-p "/var /tmp" will not
work anymore) so as long as we issue clear guidance on how the current
syntax works, people can make their changes accordingly - currently,
"check_disk -p /tmp -w 10% -c 5%" will give an UNKNOWN status since the
syntax is wrong. I think the functionality of (2) is a lot more desirable
than (1).
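
Option (A) can be sketched as follows. This is an illustrative Python model
only, not the check_disk code: the function name, the single warn/crit pair
(ignoring the split between percentage and absolute values) and the
`mounted` list of mount points are all assumptions made for the example.

```python
from typing import Dict, List, Optional

Thresholds = Dict[str, Optional[str]]


def parse_left_to_right(argv: List[str], mounted: List[str]) -> Dict[str, Thresholds]:
    """Strict left-to-right reading: -p saves the thresholds seen so far;
    thresholds with no path after them apply to every other mount."""
    if len(argv) % 2:
        raise ValueError("every option needs a value")
    current: Thresholds = {"warn": None, "crit": None}
    explicit: Dict[str, Thresholds] = {}
    trailing = False  # True when a threshold was given since the last -p
    for opt, value in zip(argv[0::2], argv[1::2]):
        if opt == "-w":
            current["warn"], trailing = value, True
        elif opt == "-c":
            current["crit"], trailing = value, True
        elif opt == "-p":
            explicit[value] = dict(current)
            trailing = False
        else:
            raise ValueError(f"unknown option {opt}")
    result = dict(explicit)
    if trailing:
        # Hypothetical "everything else": mounts not named with -p.
        for mount in mounted:
            result.setdefault(mount, dict(current))
    return result
```

With `mounted` set to /, /tmp and /var, the example from (A) gives /tmp the
10%/5% pair and the other two mounts 20%/15%, while the r1.3 form with no -p
applies 20%/15% to all three. A path given before any threshold is stored
with empty thresholds here; a real implementation would presumably reject
that case.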
Ton

From noreply at sourceforge.net Wed Jul 2 08:59:03 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Wed Jul 2 08:59:03 2003
Subject: [Nagiosplug-devel] [ nagiosplug-Patches-740132 ] check_disk_smb update for smbclient 2.2.7
Message-ID:

Patches item #740132, was opened at 2003-05-19 23:09
Message generated for change (Comment added) made by tonvoon
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=397599&aid=740132&group_id=29880

Category: Enhancement
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: Cove Schneider (coveschneider)
>Assigned to: Ton Voon (tonvoon)
Summary: check_disk_smb update for smbclient 2.2.7

Initial Comment:
Catches the error messages from smbclient Version
2.2.7-security-rollup-fix correctly.

----------------------------------------------------------------------

>Comment By: Ton Voon (tonvoon)
Date: 2003-07-02 16:58

Message:
Logged In: YES
user_id=664364

Thanks very much for the patch. Applied to r1_3_0 and HEAD.

Ton

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=397599&aid=740132&group_id=29880

From noreply at sourceforge.net Wed Jul 2 09:02:03 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Wed Jul 2 09:02:03 2003
Subject: [Nagiosplug-devel] [ nagiosplug-Patches-740132 ] check_disk_smb update for smbclient 2.2.7
Message-ID:

Patches item #740132, was opened at 2003-05-19 22:09
Message generated for change (Comment added) made by coveschneider
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=397599&aid=740132&group_id=29880

Category: Enhancement
Group: None
Status: Closed
Resolution: Fixed
Priority: 5
Submitted By: Cove Schneider (coveschneider)
Assigned to: Ton Voon (tonvoon)
Summary: check_disk_smb update for smbclient 2.2.7

Initial Comment:
Catches the error messages from smbclient Version
2.2.7-security-rollup-fix correctly.

----------------------------------------------------------------------

>Comment By: Cove Schneider (coveschneider)
Date: 2003-07-02 16:01

Message:
Logged In: YES
user_id=763815

Very welcome!

----------------------------------------------------------------------

Comment By: Ton Voon (tonvoon)
Date: 2003-07-02 15:58

Message:
Logged In: YES
user_id=664364

Thanks very much for the patch. Applied to r1_3_0 and HEAD.

Ton

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=397599&aid=740132&group_id=29880

From noreply at sourceforge.net Wed Jul 2 09:24:05 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Wed Jul 2 09:24:05 2003
Subject: [Nagiosplug-devel] [ nagiosplug-Patches-755456 ] check_oracle fix if Oracle-Error is reported
Message-ID:

Patches item #755456, was opened at 2003-06-16 18:57
Message generated for change (Comment added) made by tonvoon
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=397599&aid=755456&group_id=29880

Category: Bugfix
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: Sven Meyer (scm)
>Assigned to: Ton Voon (tonvoon)
Summary: check_oracle fix if Oracle-Error is reported

Initial Comment:
I frequently stumble into a corrupted database, which typically
causes an ORA-12500 to be reported to any connection request.
I found that "check_oracle --tablespace" (admittedly the only
one tested) then reports no error, but a tablespace size of 0,
usage 0. I'd like a nice error description, so that I can treat
these problems immediately.

Adding the following patch to check_oracle (taken from Plugins
Release 1.3.0)

31a32
> . /etc/profile.d/oracle.sh
254a256,260
> if [ -n "`echo $result | grep ORA-`" ] ; then
> echo $result
> exit $STATE_UNKNOWN
> fi
>

will report an UNKNOWN state and reissue the problem as status
information.

"ERROR: ORA-12500: TNS:listener failed to start a dedicated
server process Invalid option. Usage: CONNECT [AS SYSDBA ..."

----------------------------------------------------------------------

>Comment By: Ton Voon (tonvoon)
Date: 2003-07-02 17:23

Message:
Logged In: YES
user_id=664364

Thanks very much for the patch. Applied to r1_3_0 and HEAD.
Have changed to STATE_CRITICAL instead of STATE_UNKNOWN, as
we should know about these failures.

Ton

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=397599&aid=755456&group_id=29880

From noreply at sourceforge.net Wed Jul 2 10:23:04 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Wed Jul 2 10:23:04 2003
Subject: [Nagiosplug-devel] [ nagiosplug-Patches-764745 ] check_procs SIGSEGV fix for some Solaris systems w/ zombies
Message-ID:

Patches item #764745, was opened at 2003-07-02 12:22
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=397599&aid=764745&group_id=29880

Category: Bugfix
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Alexander Matey (amatey)
Assigned to: Nobody/Anonymous (nobody)
Summary: check_procs SIGSEGV fix for some Solaris systems w/ zombies

Initial Comment:
check_procs (current CVS rev 1.5) dies with a segmentation
fault on _some_ of my Solaris systems. I was seeing it on
both Solaris 7 and 8 with different patchlevels.

Running check_procs under gdb revealed that in all cases it
died on check_procs.c:211 because the asprintf() call on the
previous line returned procargs == NULL. The process entry
in ps output that caused this was always a zombie process
with input_buffer + pos pointing to 0x0 instead of '\n'
which the code assumed to be there.

Attached patch corrects this behavior by not making this
assumption. This may help other OSes besides Solaris.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=397599&aid=764745&group_id=29880

From karl at debisschop.net Wed Jul 2 20:48:06 2003
From: karl at debisschop.net (Karl DeBisschop)
Date: Wed Jul 2 20:48:06 2003
Subject: [Nagiosplug-devel] Re: Nagios plugin translation in French
In-Reply-To: <3EF83F36.10905@ifsic.univ-rennes1.fr>
References: <3E4BC633.2050805@ifsic.univ-rennes1.fr> <3ECC9BF5.2090709@ifsic.univ-rennes1.fr> <1053600186.18021.3.camel@miles.debisschop.net> <3ECCBC29.8030704@ifsic.univ-rennes1.fr> <1053656465.11092.6.camel@localhost.localdomain> <3EF83F36.10905@ifsic.univ-rennes1.fr>
Message-ID: <1057203972.16780.5.camel@miles.debisschop.net>

On Tue, 2003-06-24 at 08:08, Pierre-Antoine.Angelini admin wrote:
> Hi, Karl,
>
> what's up ?

I just did the first part of the gettext conversion in my CVS. I need to
get a little more comfortable before I commit it back to the repository.

I will start by converting one of the C plugins for translation.

I have also researched and found that I will be able to add you and the
other translators to CVS in such a way that you can do translations
without having write access to the entire CVS tree, which should enable
us to take on a raft of translators without the long acceptance period
we have for full-fledged developers.

Do you have a sourceforge account? I will need to know your sourceforge
user name at some point.

Thanks for your patience -- we are finally making progress I think.
--
Karl

From noreply at sourceforge.net Thu Jul 3 09:50:11 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu Jul 3 09:50:11 2003
Subject: [Nagiosplug-devel] [ nagiosplug-Patches-764745 ] check_procs SIGSEGV fix for some Solaris systems w/ zombies
Message-ID:

Patches item #764745, was opened at 2003-07-02 18:22
Message generated for change (Comment added) made by tonvoon
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=397599&aid=764745&group_id=29880

Category: Bugfix
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: Alexander Matey (amatey)
>Assigned to: Ton Voon (tonvoon)
Summary: check_procs SIGSEGV fix for some Solaris systems w/ zombies

Initial Comment:
check_procs (current CVS rev 1.5) dies with a segmentation
fault on _some_ of my Solaris systems. I was seeing it on
both Solaris 7 and 8 with different patchlevels.

Running check_procs under gdb revealed that in all cases it
died on check_procs.c:211 because the asprintf() call on the
previous line returned procargs == NULL. The process entry
in ps output that caused this was always a zombie process
with input_buffer + pos pointing to 0x0 instead of '\n'
which the code assumed to be there.

Attached patch corrects this behavior by not making this
assumption. This may help other OSes besides Solaris.

----------------------------------------------------------------------

>Comment By: Ton Voon (tonvoon)
Date: 2003-07-03 17:49

Message:
Logged In: YES
user_id=664364

Thanks for the patch. I can't seem to reproduce the problem on
my SunOS 5.6, but you do say it is a bit erratic. My guess is
that the ps command is not returning a carriage return or is
missing the field for the comm column. However, the patch
doesn't seem to cause a problem on my Solaris server, so I
think it is safe to add it in.

Committed to check_procs.c v1.16.

---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=397599&aid=764745&group_id=29880 From kjell.sundtjonn at elkem.no Fri Jul 4 04:45:02 2003 From: kjell.sundtjonn at elkem.no (kjell.sundtjonn at elkem.no) Date: Fri Jul 4 04:45:02 2003 Subject: [Nagiosplug-devel] Performance data Message-ID: I have two questions regarding the output of performance data from nagios plugins. Will the next release of the standard nagios plugins include performance data as a part of the output string on those plugins where that is relevant? Are there any programming guidelines on how the performance data should be formatted? Thanks Kjell Sundtjønn From karl at debisschop.net Fri Jul 4 05:09:15 2003 From: karl at debisschop.net (Karl DeBisschop) Date: Fri Jul 4 05:09:15 2003 Subject: [Nagiosplug-devel] Performance data In-Reply-To: References: Message-ID: <1057320445.29839.2.camel@localhost.localdomain> On Fri, 2003-07-04 at 07:43, kjell.sundtjonn at elkem.no wrote: > I have two questions regarding the output of performance data from nagios > plugins > > Will the next release of the standard nagios plugins include performance > data as a part of the output string on those plugins where that is relevant > ? The 1.4.x series will have full support for performance data. 1.3.x is bugfix only and will not get additional perf data support. Release of 1.4.x is tentatively planned for sometime around mid-August, at least for beta versions. > Are there any programming guidelines on how the performance data should be > formatted ? Look at the Nagios docs. I don't remember which section. -- Karl From RLAdams at Kelsey-Seybold.com Mon Jul 7 09:03:09 2003 From: RLAdams at Kelsey-Seybold.com (Russell Adams) Date: Mon Jul 7 09:03:09 2003 Subject: [Nagiosplug-devel] Plugin Templates Message-ID: <20030707160220.GG2707@soja.ksnet.com.> Has anyone written any plugin templates in C, shell, or Perl?
Kind of a best practices template with signal/error handling, timeouts, etc? Figured it's about time for me to clean up some of the custom plugins I wrote for my local use with Netsaint, as I'm upgrading to Nagios 1.1. Comments? Russell From Dmitri.Smirnov at fusepoint.com Mon Jul 7 10:05:13 2003 From: Dmitri.Smirnov at fusepoint.com (Dmitri Smirnov) Date: Mon Jul 7 10:05:13 2003 Subject: [Nagiosplug-devel] check_http cookie and app-proxy support Message-ID: <77F055FA968580429F4546414D8C10E70134E778@s102b.rhcci.net> Hi guys, I've found a number of sites on our infrastructure that require the check_http plugin to have cookie support for session management and 'Connection: Keep-Alive' in the HTTP header to work correctly. Below is a little patch for check_http (latest from CVS) I've made. I would appreciate it, guys, if you would review and incorporate such functionality in standard check_http (wrapped by cmd arguments probably). Dmitri -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: check_http.patch.txt URL: From Ton.Voon at egg.com Tue Jul 8 03:59:15 2003 From: Ton.Voon at egg.com (Voon, Ton) Date: Tue Jul 8 03:59:15 2003 Subject: [Nagiosplug-devel] RFC: Performance data guidelines Message-ID: Hi! One of the features required for 1.4 is performance data. I would like to write up the guidelines for this, but wanted confirmation if this is the right way to go, so any comments would be appreciated. I think perf data should have/be: - short labels - generic and common labels across plugins if possible - comma separated, no spaces.
Regex format: [a-z0-9]+=[0-9]?\.?[0-9]+
- redundant data removed (eg, if check_disk returns pct and number (free), can calculate used bytes)

My suggestions for labels are:

Name ; Units ; printf format ; Details
time ; seconds ; %.3f ; time taken to do a specific check (eg DNS query, HTTP request, ping RTA)
pct ; percent ; %.3f ; percentage (free rather than used if applicable) (eg total disk, total swap, ping percent loss)
number ; must be bytes if applicable ; %d ; a given number of things (free rather than used if applicable) (eg processes, users, bytes used such as total disk or total swap)
numberf ; float ; %.3f ; a given number of things that may be fractional (eg, load average, average bytes transmitted)
counter ; a continuous counter (must be bytes if applicable) ; %d ; a continuous counter (eg bytes transmitted on an interface)
load1 ; load ; %.2f ; load average over 1 min
load5 ; load ; %.2f ; load average over 5 min
load15 ; load ; %.2f ; load average over 15 min

Contentious points:
- loadx. Not really keen on these, but they don't seem to fit into any other labels, unless we only return load5 and use numberf
- taking free values rather than used. This is consistent with the output for check_disk and check_swap. Looking at graphs, I guess you want to see it nearer zero which is your definite limit, rather than continuously increasing
- maybe numberf is not required, but we say that number could be fractional. I think this may be better as RRD doesn't care whether values are integers or not
- too reductionist? Would you prefer labels that describe the measure? I think the labels should be generic and the plugin describes the context

As an example, the patches submitted on SF for check_ping had perf labels of rta and loss, but I think these should be time and pct respectively. I think this makes it easier for something like RRD to work out what type of value it is to draw the graphs.
Why the returned values are bad is then up to interpretation (and that is the key to any performance analysis!). Ton This private and confidential e-mail has been sent to you by Egg. The Egg group of companies includes Egg Banking plc (registered no. 2999842), Egg Financial Products Ltd (registered no. 3319027) and Egg Investments Ltd (registered no. 3403963) which carries out investment business on behalf of Egg and is regulated by the Financial Services Authority. Registered in England and Wales. Registered offices: 1 Waterhouse Square, 138-142 Holborn, London EC1N 2NA. If you are not the intended recipient of this e-mail and have received it in error, please notify the sender by replying with 'received in error' as the subject and then delete it from your mailbox. From Peter.Hoogendijk at atosorigin.com Tue Jul 8 06:37:19 2003 From: Peter.Hoogendijk at atosorigin.com (Hoogendijk, Peter) Date: Tue Jul 8 06:37:19 2003 Subject: [Nagiosplug-devel] RFC: Performance data guidelines Message-ID: <63C0E7F555D57547BBC0A4457E8E05EB60234D@pwi8004.sd.bnet.nl> Ton, We are in the process of developing a plugin to check information collected by another datacollection system. Based on the 'Performance Data' chapter in the Nagios documentation, we decided on comma-separated 'name=value' pairs. As we want to be able to transparently support the names and values used by the other system, both the name and the value part can optionally be quoted (with either single or double quotes). The result is: Plugin Output|name1=value1, 'name 2'=value2, name3='11"', name4="Peter's PC" To check our procedures for processing the performance data, I also modified the check_ping plugin. It now reports: PING OK - Packet loss = 0%, RTA = 1.96 ms|"Packet loss"=0% RTA="1.96 ms" The problem we are facing with this format is indeed the interpretation by RRD (or in our case the script that's feeding RRD), so we are open for suggestions. 
Your proposed guideline at least seems to help us find the right direction. Peter. _______________________________________________ Nagiosplug-devel mailing list Nagiosplug-devel at lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nagiosplug-devel ::: Please include plugins version (-v) and OS when reporting any issue. ::: Messages without supporting info will risk being sent to /dev/null From paraic at novara.ie Wed Jul 9 05:51:03 2003 From: paraic at novara.ie (Paraic OCeallaigh) Date: Wed Jul 9 05:51:03 2003 Subject: [Nagiosplug-devel] check_udp doesn't work on nagios 1.1 Message-ID: <067e01c34618$9cab7e70$c800a8c0@mediamogul> Hi it looks like check_udp is broken in particular for the port I'm checking - syslog, port 514. Here's the output regardless of what host info or port number I use: nagios:/usr/local/nagios/etc# ../libexec/check_udp -H 172.16.3.10 -p 514 Host name was not supplied Usage: check_udp -H [-p port] [-w warn_time] [-c crit_time] [-e expect] [-s send] [-t to_sec] [-v] nagios:/usr/local/nagios/etc# ../libexec/check_udp -H 172.16.1.10 -p 37 -v Host name was not supplied Usage: check_udp -H [-p port] [-w warn_time] [-c crit_time] [-e expect] [-s send] [-t to_sec] [-v] This seems to be the only feedback I can generate from the check_udp command regardless what I put into it. Let me know if theres a fix or if its scheduled for the next release. Great job with Nagios in general though...cracking app - esp the SSL implementation in nrpe 2.0. Nice one.
Rgds, Paraic www.host.ie From noreply at sourceforge.net Wed Jul 9 06:17:09 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Wed Jul 9 06:17:09 2003 Subject: [Nagiosplug-devel] [ nagiosplug-Patches-768445 ] Add Exim support to check_mailq.pl Message-ID: Patches item #768445, was opened at 2003-07-09 15:16 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=397599&aid=768445&group_id=29880 Category: Enhancement Group: None Status: Open Resolution: None Priority: 5 Submitted By: Eric Bollengier (ricozz) Assigned to: Nobody/Anonymous (nobody) Summary: Add Exim support to check_mailq.pl Initial Comment: Add Exim support to check_mailq.pl ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=397599&aid=768445&group_id=29880 From jbautista at icnet.com.ve Wed Jul 9 06:50:03 2003 From: jbautista at icnet.com.ve (Jeyri Bautista) Date: Wed Jul 9 06:50:03 2003 Subject: [Nagiosplug-devel] Reports Message-ID: <5.1.0.14.0.20030709094741.00b9cd98@pop.icnet.com.ve> Hi, I want to know if I can export the trends and availability reports to Excel or another tool. Thanks Jeyri From jeremy+nagios at undergrid.net Wed Jul 9 09:17:06 2003 From: jeremy+nagios at undergrid.net (Jeremy T.
Bouse) Date: Wed Jul 9 09:17:06 2003 Subject: [Nagiosplug-devel] check_udp doesn't work on nagios 1.1 In-Reply-To: <067e01c34618$9cab7e70$c800a8c0@mediamogul> References: <067e01c34618$9cab7e70$c800a8c0@mediamogul> Message-ID: <20030709161401.GD27557@UnderGrid.net> Paraic, If you would take a look at the bug report #731467 (when the SourceForge site comes back from maintenance) you would find that the check_udp and check_udp2 are both reported as having problems and appear to need a possible total re-write to get them functioning properly as UDP packets and services do not respond in the same manner as TCP packets due to the nature of UDP packets not using the handshaking to establish a connection as TCP does... Regards, Jeremy From lists-nagios at host.ie Wed Jul 9 09:23:10 2003 From: lists-nagios at host.ie (lists-nagios at host.ie) Date: Wed Jul 9 09:23:10 2003 Subject: [Nagiosplug-devel] check_udp doesn't work on nagios 1.1 References: <067e01c34618$9cab7e70$c800a8c0@mediamogul> <20030709161401.GD27557@UnderGrid.net> Message-ID: <079c01c34636$52c92c80$c800a8c0@mediamogul> H Jeremy, Thanks for the update. Just wondering if there is an alternative or do I just forget about udp checks for the moment until the problems are sorted? Cheers, Paraic www.host.ie From noreply at sourceforge.net Thu Jul 10 07:18:03 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Thu Jul 10 07:18:03 2003 Subject: [Nagiosplug-devel] [ nagiosplug-New Plugins-769145 ] Allows checking of multiple qmail queues.
Message-ID: New Plugins item #769145, was opened at 2003-07-10 14:17 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=541465&aid=769145&group_id=29880 Category: Perl plugin Group: None Status: Open Resolution: None Priority: 5 Submitted By: Jason Burnett (trig_monkeypr0n) Assigned to: Nobody/Anonymous (nobody) Summary: Allows checking of multiple qmail queues. Initial Comment: # The user that your plugins run as must be in the qmail group that has # access to the $qmail_home/queue/mess directories. Based on the check_mailq plugin. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=541465&aid=769145&group_id=29880 From noreply at sourceforge.net Thu Jul 10 07:21:13 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Thu Jul 10 07:21:13 2003 Subject: [Nagiosplug-devel] [ nagiosplug-New Plugins-769145 ] Allows checking of multiple qmail queues. Message-ID: New Plugins item #769145, was opened at 2003-07-10 14:17 Message generated for change (Comment added) made by trig_monkeypr0n You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=541465&aid=769145&group_id=29880 Category: Perl plugin Group: None Status: Open Resolution: None Priority: 5 Submitted By: Jason Burnett (trig_monkeypr0n) Assigned to: Nobody/Anonymous (nobody) Summary: Allows checking of multiple qmail queues. Initial Comment: # The user that your plugins run as must be in the qmail group that has # access to the $qmail_home/queue/mess directories. Based on the check_mailq plugin. ---------------------------------------------------------------------- >Comment By: Jason Burnett (trig_monkeypr0n) Date: 2003-07-10 14:20 Message: Logged In: YES user_id=778916 oops forgot the utils.pm addition. 
Need to add $PATH_TO_QMAIL = "/var/qmail"; to utils.pm ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=541465&aid=769145&group_id=29880 From Ton.Voon at egg.com Thu Jul 10 07:28:16 2003 From: Ton.Voon at egg.com (Voon, Ton) Date: Thu Jul 10 07:28:16 2003 Subject: [Nagiosplug-devel] RFC: Performance data guidelines Message-ID: Peter, Thanks for your reply. I like the idea of quoting the attributes/values, but I don't think they will be necessary if we get the standard attributes and their values right. I think perfdata should be space separated data (just to save processing), but I'm happy to take a consensus. Comma separated may make it a bit easier to parse visually. Any other opinions? Based on my guidelines, an example output of check_ping would be: PING OK - Packet loss = 0%, RTA = 1.96 ms|pct=0 time=1.96 Three things that spring to mind: - it's a bit shorter! - time means something different from check_http, check_tcp, etc. Those mean "time taken to do a check". For check_ping, it would mean average time for a packet - pct is at 0, which is a "good" result (0% packet loss). However - according to my proposal - check_disk would return pct=5 for 5% free on total disk, which, as it gets closer to 0%, would be "bad". Maybe it should be reversed, so pct=100% to mean no packet loss - should 0% always be considered the worst case? This may not be easy for "number" attributes. As you can see, it is hard to standardise on what the values actually tell you. This is what I meant by "Why the returned values are bad is then up to interpretation (and that is the key to any performance analysis!)". However, what the guidelines will do is allow the RRD generation to happen easier. 
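The space-separated form shown in the check_ping example above could be consumed by an RRD feeder script along the following lines. This is a minimal sketch under assumptions, not part of any Nagios release: the function name parse_perfdata is hypothetical, and the number pattern is a slightly more permissive stand-in rather than the regex quoted in the proposal.

```python
import re

# Assumed token shape: lowercase label, "=", plain decimal number without units.
PAIR = re.compile(r"([a-z0-9]+)=([0-9]+(?:\.[0-9]+)?)")


def parse_perfdata(plugin_output):
    """Return (label, value) pairs from the text after the first '|'."""
    _, sep, perf = plugin_output.partition("|")
    if not sep:
        return []  # the plugin printed no performance data
    pairs = []
    for token in perf.split():
        match = PAIR.fullmatch(token)
        if match is None:
            raise ValueError("unrecognised perf token: %r" % token)
        pairs.append((match.group(1), float(match.group(2))))
    return pairs


print(parse_perfdata("PING OK - Packet loss = 0%, RTA = 1.96 ms|pct=0 time=1.96"))
# prints [('pct', 0.0), ('time', 1.96)]
```

A list of pairs is returned rather than a dictionary because generic labels such as pct or number might plausibly appear more than once in one plugin's output.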
Ton From noreply at sourceforge.net Thu Jul 10 10:14:18 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Thu Jul 10 10:14:18 2003 Subject: [Nagiosplug-devel] [ nagiosplug-Support Requests-769227 ] check_hprsc.pl problem Message-ID: Support Requests item #769227, was opened at 2003-07-10 20:04 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=397598&aid=769227&group_id=29880 Category: None Group: None Status: Open Priority: 5 Submitted By: MiikaT (mlistsf) Assigned to: Nobody/Anonymous (nobody) Summary: check_hprsc.pl problem Initial Comment: I'm trying to use the check_hprsc.pl contrib plugin to monitor HP-UX 11i disk resources on RH 8 nagios. When I execute the script, I get this: ./check_hprsc.pl --show-filesystems --host pvv.finansium.fi --community public snmpwalk: Timeout kid exited 256 at ./check_hprsc.pl line 108. snmpwalk: Timeout kid exited 256 at ./check_hprsc.pl line 108. snmpwalk: Timeout kid exited 256 at ./check_hprsc.pl line 108.
filesystemID1 mounted filesystem filesystem path On the HP-UX client tcpdump shows this: tcpdump: listening on lan1 20:01:20.817510 10.209.12.27.33620 > 10.209.12.21.snmp: F=r [|snmp][|snmp] (DF) 20:01:21.819009 10.209.12.27.33620 > 10.209.12.21: F=r [|snmp][|snmp] (DF) The same thing happens when I manually issue the command snmpwalk -cpublic 10.209.12.21. When I add "-v1" to the snmpwalk command, everything works ok. Looks like the Perl script lacks the version switch for snmpwalk. Is that something that could be fixed? -MiikaT ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=397598&aid=769227&group_id=29880 From noreply at sourceforge.net Thu Jul 10 14:26:16 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Thu Jul 10 14:26:16 2003 Subject: [Nagiosplug-devel] [ nagiosplug-Patches-769311 ] adds smtp auth ability to the check_smtp plugin Message-ID: Patches item #769311, was opened at 2003-07-10 20:47 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=397599&aid=769311&group_id=29880 Category: Enhancement Group: None Status: Open Resolution: None Priority: 5 Submitted By: Jason Burnett (trig_monkeypr0n) Assigned to: Nobody/Anonymous (nobody) Summary: adds smtp auth ability to the check_smtp plugin Initial Comment: Adds the ability to confirm that your smtp auth mechanism is working on your smtp server.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=397599&aid=769311&group_id=29880 From noreply at sourceforge.net Thu Jul 10 17:00:14 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Thu Jul 10 17:00:14 2003 Subject: [Nagiosplug-devel] [ nagiosplug-Bugs-769398 ] Urlize and search strings in check_http Message-ID: Bugs item #769398, was opened at 2003-07-10 16:59 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=397597&aid=769398&group_id=29880 Category: Argument proccessing Group: Release (specify) Status: Open Resolution: None Priority: 5 Submitted By: Jeff Rodriguez (bigtangringo) Assigned to: Nobody/Anonymous (nobody) Summary: Urlize and search strings in check_http Initial Comment: Create a check_http command which does a search that has multiple words. Make the first word actually exist on the page and the following words bogus. The check will fail (as expected) Then URLIZE the check you just made, it succeeds which it should not do. You never changed the actual check itself! 
COMMAND: ./check_http -H 'www.google.com' -I 'www.google.com' -u '/' -s 'Google this does not exist' OUTPUT: HTTP CRITICAL: string not found|time= 0.177 COMMAND: ./urlize 'http://www.google.com' ./check_http -H 'www.google.com' -I 'www.google.com' -u '/' -s 'Google this does not exist' OUTPUT: HTTP ok: HTTP/1.0 200 OK - 0.377 second response time |time= 0.377 ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=397597&aid=769398&group_id=29880 From noreply at sourceforge.net Thu Jul 10 17:01:33 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Thu Jul 10 17:01:33 2003 Subject: [Nagiosplug-devel] [ nagiosplug-Bugs-769398 ] Urlize and search strings in check_http Message-ID: Bugs item #769398, was opened at 2003-07-10 16:59 Message generated for change (Comment added) made by bigtangringo You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=397597&aid=769398&group_id=29880 Category: Argument proccessing Group: Release (specify) Status: Open Resolution: None Priority: 5 Submitted By: Jeff Rodriguez (bigtangringo) Assigned to: Nobody/Anonymous (nobody) Summary: Urlize and search strings in check_http Initial Comment: Create a check_http command which does a search that has multiple words. Make the first word actually exist on the page and the following words bogus. The check will fail (as expected) Then URLIZE the check you just made, it succeeds which it should not do. You never changed the actual check itself!
COMMAND: ./check_http -H 'www.google.com' - I 'www.google.com' -u '/' -s 'Google this does not exist' OUTPUT: HTTP CRITICAL: string not found|time= 0.177 COMMAND: ./urlize 'http://www.google.com' ./check_http -H 'www.google.com' -I 'www.google.com' -u '/' - s 'Google this does not exist' OUTPUT: HTTP ok: HTTP/1.0 200 OK - 0.377 second response time |time= 0.377 ---------------------------------------------------------------------- >Comment By: Jeff Rodriguez (bigtangringo) Date: 2003-07-10 17:00 Message: Logged In: YES user_id=744211 check_http (nagios-plugins 1.3.0) 1.24 urlize (nagios-plugins 1.3.0) 1.5 ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=397597&aid=769398&group_id=29880 From sweetz at domain-hoster.com Thu Jul 10 20:16:36 2003 From: sweetz at domain-hoster.com (Verona Trautmann) Date: Thu Jul 10 20:16:36 2003 Subject: [Nagiosplug-devel] Livecam Botschaft für Nagiosplugdevel Message-ID: An HTML attachment was scrubbed... URL: From karl at debisschop.net Thu Jul 10 21:40:05 2003 From: karl at debisschop.net (Karl DeBisschop) Date: Thu Jul 10 21:40:05 2003 Subject: [Nagiosplug-devel] RFC: Performance data guidelines In-Reply-To: References: Message-ID: <1057898272.4306.44.camel@miles.debisschop.net> On Thu, 2003-07-10 at 10:30, Voon, Ton wrote: > I like the idea of quoting the attributes/values, but I don't think they > will be necessary if we get the standard attributes and their values right. I agree somewhat - spaces in attributes especially seem avoidable. > I think perfdata should be space separated data (just to save processing), > but I'm happy to take a consensus. Comma separated may make it a bit easier > to parse visually. Any other opinions? While spaces in attributes seem avoidable, I am less sure about spaces in values. 
I could imagine a plugin where the perf data was a string from an SNMP
OID, where we would not really have control over what was in that string.

> Based on my guidelines, an example output of check_ping would be:
>
> PING OK - Packet loss = 0%, RTA = 1.96 ms|pct=0 time=1.96

Why do we not allow the plugin perf data to return units like:

PING OK - Packet loss = 0%, RTA = 1.96 ms|loss=0%,time=1.96 ms

I only ask because there are implementations of ping that can return
'us' instead of 'ms' - I've always felt things are less likely to get
confused if you keep units explicit (just ask NASA and the Mars lander
team).

> Three things that spring to mind:
> - it's a bit shorter!

Short is good. But not so good that reliability, accuracy, or reasonable
clarity should be sacrificed.

> - time means something different from check_http, check_tcp, etc. Those mean
> "time taken to do a check". For check_ping, it would mean average time for a
> packet

Hence the idea of allowing units.

> - pct is at 0, which is a "good" result (0% packet loss). However -
> according to my proposal - check_disk would return pct=5 for 5% free on
> total disk, which, as it gets closer to 0%, would be "bad". Maybe it should
> be reversed, so pct=100% to mean no packet loss - should 0% always be
> considered the worst case? This may not be easy for "number" attributes.

If you allow units, check_disk could return either

DISK OK [6390 MB (42%) free on /]|free=42%

or

DISK OK [6390 MB (42%) free on /]|used=58%

And I would suggest the latter.

> As you can see, it is hard to standardise on what the values actually tell
> you. This is what I meant by "Why the returned values are bad is then up to
> interpretation (and that is the key to any performance analysis!)". However,
> what the guidelines will do is allow the RRD generation to happen easier.
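
The argument above for explicit units can be illustrated with a minimal Python sketch. It is not taken from any plugin: the helper name `to_seconds` and the unit table are assumptions made for illustration only. It shows how a consumer could normalise an RTA to seconds whether the local ping prints 'ms' or 'us', which is only possible when the unit travels with the value.

```python
# Hypothetical helper, not part of nagios-plugins: normalise a time value
# that carries an explicit unit (as in "time=1.96 ms") to seconds.
UNIT_TO_SECONDS = {"s": 1.0, "ms": 1e-3, "us": 1e-6}


def to_seconds(text):
    """Convert strings such as '1.96 ms' or '350us' to float seconds."""
    text = text.strip()
    # Try the longest unit names first so 'ms' is not mistaken for 's'.
    for unit in sorted(UNIT_TO_SECONDS, key=len, reverse=True):
        if text.endswith(unit):
            number = text[: -len(unit)].strip()
            return float(number) * UNIT_TO_SECONDS[unit]
    # A bare number is rejected on purpose: without a unit we cannot tell
    # milliseconds from microseconds.
    raise ValueError("no recognised unit in %r" % text)
```

With a bare `time=1.96` the same consumer would have to guess the unit, which is the confusion the explicit form avoids.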
> > > From: Hoogendijk, Peter [mailto:Peter.Hoogendijk at atosorigin.com] > > > > We are in the process of developing a plugin to check information > > collected by another datacollection system. Based on the 'Performance > > Data' chapter in the Nagios documentation, we decided on > > comma-separated > > 'name=value' pairs. As we want to be able to transparently support the > > names and values used by the other system, both the name and the value > > part can optionally be quoted (with either single or double > > quotes). The > > result is: > > > > Plugin Output|name1=value1, 'name 2'=value2, name3='11"', > > name4="Peter's PC" > > > > To check our procedures for processing the performance data, I also > > modified the check_ping plugin. It now reports: > > > > PING OK - Packet loss = 0%, RTA = 1.96 ms|"Packet loss"=0% > > RTA="1.96 ms" > > > > The problem we are facing with this format is indeed the > > interpretation by RRD (or in our case the script that's > > feeding RRD), so we are open for suggestions. Your proposed > > guideline at least seems to help us find the right direction. > > > > > From: Voon, Ton [mailto:Ton.Voon at egg.com] > > > > > > One of the features required for 1.4 is performance data. I would like > > > to write up the guidelines for this, but wanted confirmation > > > if this is the right way to go, so any comments would be appreciated. Ton - thanks for kicking this off - sorry I was unable to respond immediately. > > > I think perf data should have/be: > > > > > > - short labels > > > - generic and common labels across plugins if possible > > > - comma separated, no spaces. 
> > >   Regex format: [a-z0-9]+=[0-9]?\.?[0-9]+
> > > - redundant data removed (eg, if check_disk returns pct and number
> > >   (free), can calculate used bytes)
> > >
> > > My suggestion for labels are:
> > >
> > > Name ; Units ; printf format ; Details
> > > time ; seconds ; %.3f ; time taken to do a specific check (eg DNS query, HTTP request, ping RTA)
> > > pct ; percent ; %.3f ; percentage (free rather than used if applicable) (eg total disk, total swap, ping percent loss)
> > > number ; must be bytes if applicable ; %d ; a given number of things (free rather than used if applicable) (eg processes, users, bytes used such as total disk or total swap)
> > > numberf ; float ; %.3f ; a given number of things that may be fractional (eg, load average, average bytes transmitted)
> > > counter ; a continuous counter (must be bytes if applicable) ; %d ; a continuous counter (eg bytes transmitted on an interface)
> > > load1 ; load ; %.2f ; load average over 1 min
> > > load5 ; load ; %.2f ; load average over 5 min
> > > load15 ; load ; %.2f ; load average over 15 min
> > >
> > > Contentious points:
> > > - loadx. Not really keen on these, but don't seem to fit into any
> > >   other labels, unless we only return load5 and use numberf
> > > - taking free values rather than used. This is consistent with the
> > >   output for check_disk and check_swap. Looking at graphs, I guess you
> > >   want to see it nearer zero which is your definite limit, rather than
> > >   continuously increasing
> > > - maybe numberf is not required, but we say that number could be
> > >   fractional. I think this maybe better as RRD doesn't care whether
> > >   values are integers or not
> > > - too reductionalist? Would you prefer labels that describe the
> > >   measure?
> > > I think the labels should be generic and the plugin describes the
> > > context
> > >
> > > As an example, the patches submitted on SF for check_ping had perf
> > > labels of rta and loss, but I think these should be time and pct
> > > respectively. I think this makes it easier for something like RRD to
> > > work out what type of value it is to draw the graphs. Why the returned
> > > values are bad is then up to interpretation (and that is the key to
> > > any performance analysis!).

--
Karl

From Stanley.Hopcroft at IPAustralia.Gov.AU Thu Jul 10 22:05:15 2003
From: Stanley.Hopcroft at IPAustralia.Gov.AU (Stanley Hopcroft)
Date: Thu Jul 10 22:05:15 2003
Subject: [Nagiosplug-devel] Re: [Nagios-devel] Adding more advanced correlation to nagios with sec (any interest?)
In-Reply-To: <200306281948.h5SJmGxN020728@mx1.cs.umb.edu>; from rouilj@cs.umb.edu on Sat, Jun 28, 2003 at 03:48:16PM -0400
References: <200306281948.h5SJmGxN020728@mx1.cs.umb.edu>
Message-ID: <20030711150412.B84683@IPAustralia.Gov.AU>

Dear Sir,

I am writing to thank you for your well conceived and expressed letter
and say,

On Sat, Jun 28, 2003 at 03:48:16PM -0400, John P. Rouillard wrote:
> However, I have some things that I want to do that are not easily
> done within nagios. E.G.
>
>   If a system jumpstart is in progress, ignore warnings about high
>   interface usage (on one interface), or dropped packets (on the
>   hub).
>
>   If an index operation of the HTTP server is in progress, ignore
>   warnings about the http interface being slow.
>
> I also want to show a host/service down if a given system went down,
> (as determined by a syslog message) but I want the report done
> ONLY if the system isn't back up in 5 minutes.
>
> Note that none of the rebooting, indexing, or jumpstarting operations
> occur at fixed times, so I can't schedule these in advance.
>

that this, as you say, demonstrates the case for Nagios being able to
provide better event correlation than it does now.
However, please would you spell out which events, and from what origin,
Sec correlates to avoid spurious alarms in these cases - especially the
first two. Is Sec correlating plugin failures with syslog messages?

> Other things can sort of be done in nagios, but it is a bit tough to
> configure. E.G. I have a single snmp_trap service defined for my
> hosts. The service is considered volatile, and is state_stalked. I
> want to do the following:
>
>   If an (particular range of) interfaces on a switch goes down (and
>   sends a trap) ignore it unless it has gone down/up 3 times in
>   five minutes. Don't clear it until it has stayed up for at least
>   15 minutes.
>
>   Other interfaces on the same switch should be reported immediately
>
> I could do part of this by adding every one of my 20 interfaces on the
> switch as services, but that doesn't really handle the timing aspects.
> It makes the services a lot more difficult to read and configure.
>
> Another thing I want to do is:
>
>   Synthesize an event that notes if two of my three links to
>   a remote site are having problems. That is two of my three
>   routers may be in a warn state, and I want to place the
>   "Access to 16 net" service in a critical state.
>
> This can be done by event handlers, but you end up writing a portion
> of sec to do it, so you might just as well use sec in the first place.
>

Agreed.

> I have a method of integrating sec
> into nagios to handle these issues and more.
>
> Using sec to process traps (or other passive checks) is straight
> forward. The trap collector running from snmptrapd just dumps the trap
> report (formatted as a nagios passive service check) into sec's input
> fifo and then sec processes it, and reports it (if needed) into the
> nagios.cmd pipe.
>

And a very attractive means of handling SNMP traps it is too. Sec has
become, for me, the standard way of providing event and trap handlers.
For example, I have a general host and service handler that updates a
MySQL DB with the outage interval. To do this it must retain state (and
does so with a Perl hash tied to a DB file) so it can determine if there
has been a transition and, if so, how long it was. This would probably be
easier to do with Sec contexts.

> However for polled items, it is more difficult. I don't want to have a
> flapping service where the plugin determines that there is a problem,
> nagios reacts to that, and then sec reacts to that (being fed its info
> by an event handler) by clearing the service because sec determines
> that there is not yet a problem. This leads to a flapping service as
> nagios and sec disagree on what is a true problem, and leads to
> spurious notifications because I can't put in a high
> max_check_attempts and have nagios respond to sec when it has a real
> problem (unless I define yet another service yech).
>
> What I did was write a plugin in perl (sec_filter) that runs the
> nagios command (sort of like check_ssh). It always passes the output
> of the plugin to sec's input pipe. However, depending on the flags
> given to the sec_filter script, it will exit:
>
>   with an "ignore OK" code, and no output
>   with an "ignore ERROR" code, and no output
>   with the exit code and output of the plugin
>
> I have chosen exit status of 5 for "ignore OK" and 6 for "ignore
> ERROR". (It looks like code 4 is used internally for pending states,
> and I didn't want to use that number hence my choice of 5 and 6.)
>
> The reason for these new codes is to make nagios not change any status
> for the polled service based on the poll. The new status will be sent
> to it by a passive check command generated from sec.
>
> That is I want nagios to be an (almost) dumb poller and to let sec
> filter all the data.
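
The wrapper behaviour quoted above can be sketched roughly as follows. This is a Python illustration only, not the Perl sec_filter script itself (which was not posted here). The function name, the `passthrough` flag, the `status;output` line written to SEC, and the rule that a zero plugin status maps to "ignore OK" are all assumptions; only the exit codes 5 and 6 come from the quoted text.

```python
import subprocess

IGNORE_OK = 5      # exit codes chosen in the quoted proposal
IGNORE_ERROR = 6


def sec_filter(plugin_argv, sec_input, passthrough=False):
    """Run a plugin, copy its result to SEC, and choose what Nagios sees.

    Returns (exit_status, text_for_nagios). `sec_input` is any writable
    text stream; in a real setup it would be the FIFO that SEC reads.
    """
    result = subprocess.run(plugin_argv, stdout=subprocess.PIPE,
                            universal_newlines=True)
    # SEC always receives the plugin's status and output.
    # The "status;output" layout is a made-up format for this sketch.
    sec_input.write("%d;%s\n" % (result.returncode, result.stdout.strip()))
    sec_input.flush()
    if passthrough:
        # Behave like the plain plugin: same exit code, same output.
        return result.returncode, result.stdout
    # Assumption: status 0 becomes "ignore OK", anything else
    # becomes "ignore ERROR"; no output is handed to Nagios.
    status = IGNORE_OK if result.returncode == 0 else IGNORE_ERROR
    return status, ""
```

A caller would print the returned text and exit with the returned status, leaving the final service state to the passive check result that SEC submits later.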

If I understand correctly, the proposal is

1 When Nag schedules a service check, of any and all service checks, it
  in fact execs sec_filter with the real plugin name and flags that
  determine sec_filter's behaviour by allowing it to either

  1.1 treat the service as a normal Nag service (a 'polled' service, for
      which no event correlation by Sec is necessary)

  1.2 treat the service as requiring Sec processing to accurately
      determine the service state. Sec will get the plugin output and
      use this with other Sec inputs and Sec context to determine the
      service state

2 Sec_filter writes

  2.1 For those services requiring Sec,

      2.1.1 An event to Sec

      2.1.2 One of the new status codes to Nagios

  2.2 Otherwise, in the case of 'polled' services, the usual Nag status
      codes and plugin output are written to Nag's input queue

3 Nag processes former status codes with no changes (ie CRITICAL leads
  to the check being repeated at retry_interval and, if the state
  persists, to Notification), but those with the new code of
  IGNORE_ERROR are recognised as requiring retry at the retry_interval
  but _no_ other processing.

4 Sec will eventually submit a PROCESS_SERVICE_CHECK_RESULT to the Nag
  input queue (for the services that have formerly been reported as
  IGNORE_\w+).

Is this correct?

My remarks are

1 This _may_ be better done in the Nag core. Nag could be equipped with
  configuration directives for Sec processing so that Nag itself could
  submit the event to Sec (rather than the plugin sec_filter). This
  saves an extra fork.

2 I am not sure how your proposal relates to the embedded Perl stuff
  (where each plugin is called as a function from the Nagios address
  space). This is probably trivial since sec_filter simply becomes
  another Perl plugin that Nag calls (and sec_filter 'requires' the real
  Perl plugin so that re-compilation of the real plugin is avoided).

3 I like the bit about making Sec processing optional (depending on the
  options specified to sec_filter)

> Using sec provides much better control over flap
> detection, and multiple service correlation. Above I said I wanted
> nagios to be an almost dumb poller. This is because I want nagios to
> poll at the retry_check_interval if there is a problem found by the
> plugin. If sec_filter exits with status 6, then nagios polls at the
> faster retry interval. This allows sec to better determine the trouble
> the system is in, or more easily determine when the system recovers.
>

For me, I am quite happy with Nag's processing of most services. I can't
say that the scenarios you mention are problematic for me.

However, I would very much like the option of event correlation when
required.

> I have set it up so that sec itself is a passive nagios service, and
> automatically sends notifications to nagios, as well as nagios being
> able to poll the sec service if its data gets stale.
>
> So is anybody interested in my mods (about 30 lines) to nagios to
> support this, and my plugin?

This needs the comment of the Nagios developer. It sounds attractive to
me however.

I am sorry if these remarks are stupid or based on misunderstanding. I
think I would need to see the mods for a (marginally) better response.

It may simply be worth posting them to Nagios-Devel. AFAIK this is not
on the Nag road map so it simply may be a golden opportunity for a big
benefit.

Finally, you have identified a good area for future development. Root
cause analysis and event correlation is one area where commercial
products can claim superiority.

Thank you very much.

>
> Note, there is a issue with sec in that ;'s can't be embedded in its
> action commands.
> This is a problem since nagios' passive commands are ;
> delimited. There should be a new version of sec out (2.1.8) once
> testing is complete that addresses this issue.
>

As you say, this has been dealt with to my satisfaction in 2.1.8.

> -- rouilj
> John Rouillard
> ===========================================================================
> My employers don't acknowledge my existence much less my opinions.
>

Yours sincerely.

--
------------------------------------------------------------------------
Stanley Hopcroft
------------------------------------------------------------------------

'...No man is an island, entire of itself; every man is a piece of the
continent, a part of the main. If a clod be washed away by the sea,
Europe is the less, as well as if a promontory were, as well as if a
manor of thy friend's or of thine own were. Any man's death diminishes
me, because I am involved in mankind; and therefore never send to know
for whom the bell tolls; it tolls for thee...'

from Meditation 17, J Donne.

From karl at debisschop.net Thu Jul 10 23:41:04 2003
From: karl at debisschop.net (Karl DeBisschop)
Date: Thu Jul 10 23:41:04 2003
Subject: [Nagiosplug-devel] release of 1.3.1
Message-ID: <1057905536.4306.52.camel@miles.debisschop.net>

I am packing 1.3.1 now, as work in stable bug fixes seems to be slack.
If people have more stuff to add, we can always do the 1.3.2 release.

I found one nasty thing out, however. The Changelog that Emacs/CVS
produces does not separate between the stable and development branches.

I have long felt that since I am putting good CVS comments into the log,
I should not need to write a changelog narrative. Sorry, but I just
don't have the time.

But I also must confess that our current changelog (derived from CVS) is
horrible. Any good suggestions on a low effort way to address this?
I suppose it would be quite easy enough to make a little script that
wrapped the cvs commit with something that took that same comment and
put it into the change log. Better ideas, anyone?

Anyway, the changelog is junk, but there's a bunch of bugs fixed, and
we've waited long enough. Expect a release within the hour.

--
Karl

From noreply at sourceforge.net Fri Jul 11 00:17:14 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Fri Jul 11 00:17:14 2003
Subject: [Nagiosplug-devel] [ nagiosplug-Bugs-769398 ] Urlize and search strings in check_http
Message-ID:

Bugs item #769398, was opened at 2003-07-10 19:59
Message generated for change (Comment added) made by kdebisschop
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=397597&aid=769398&group_id=29880

Category: Argument proccessing
Group: Release (specify)
>Status: Closed
>Resolution: Invalid
Priority: 5
Submitted By: Jeff Rodriguez (bigtangringo)
Assigned to: Nobody/Anonymous (nobody)
Summary: Urlize and search strings in check_http

----------------------------------------------------------------------

>Comment By: Karl DeBisschop (kdebisschop)
Date: 2003-07-11 03:16

Message:
Logged In: YES
user_id=1671

This is not a flaw in urlize - the shell strips away your quotes before
urlize ever has a chance to see them. You want something like:

./urlize 'http://www.google.com' "./check_http -H 'www.google.com' -I 'www.google.com' -u '/' -s 'Google this does not exist'"

----------------------------------------------------------------------

Comment By: Jeff Rodriguez (bigtangringo)
Date: 2003-07-10 20:00

Message:
Logged In: YES
user_id=744211

check_http (nagios-plugins 1.3.0) 1.24
urlize (nagios-plugins 1.3.0) 1.5

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=397597&aid=769398&group_id=29880

From kjell.sundtjonn at elkem.no Fri Jul 11 01:15:14 2003
From: kjell.sundtjonn at elkem.no (kjell.sundtjonn at elkem.no)
Date: Fri Jul 11 01:15:14 2003
Subject: [Nagiosplug-devel] RFC: Performance data guidelines
Message-ID:

As I understand it, the major reason for introducing performance data is
to be able to integrate Nagios with RRDtool. (Performance data is an
open architecture, but it seems that integration with RRDtool is what
everyone is talking about.)

With that as a background I will propose a layout for the PING example
as

PING OK - Packet loss = 0%, RTA = 1.96 ms|Packet_loss=0%,RTA=1.96ms

More generally, performance data should be a comma separated
'name=value[UOM]' list.
Name should be a valid and meaningful RRD DataSource name (1 to 19
characters long in the characters [a-zA-Z0-9_]). UOM is an optional unit
of measurement (%, MB etc, no whitespace allowed).

This format is easy to parse, and RRDtool update statements are easy to
generate because the RRD datasource name is given directly in the
performance data string. Given some reasonable assumption about the
consolidation structure of our RRD databases you should even be able to
create new RRD databases on-the-fly for any new service that starts to
deliver performance data through Nagios.

Kjell Sundtjønn

From Ton.Voon at egg.com Fri Jul 11 02:57:11 2003
From: Ton.Voon at egg.com (Voon, Ton)
Date: Fri Jul 11 02:57:11 2003
Subject: [Nagiosplug-devel] release of 1.3.1
Message-ID:

Some of my comments are not worth putting into a Changelog. For
instance, I keep doing typos which I don't realise until after a commit.
Can I suggest something like prefixing at the beginning of a comment:

- "-" for deletion from the Changelog
- "*" to signify an important amendment, worthy of a short "features for
  this release" narrative

You could use a perl script to pull these out of the Changelog after it
is generated.

This private and confidential e-mail has been sent to you by Egg. The
Egg group of companies includes Egg Banking plc (registered no.
2999842), Egg Financial Products Ltd (registered no. 3319027) and Egg
Investments Ltd (registered no. 3403963) which carries out investment
business on behalf of Egg and is regulated by the Financial Services
Authority. Registered in England and Wales. Registered offices: 1
Waterhouse Square, 138-142 Holborn, London EC1N 2NA. If you are not the
intended recipient of this e-mail and have received it in error, please
notify the sender by replying with 'received in error' as the subject
and then delete it from your mailbox.

From karl at debisschop.net Fri Jul 11 04:47:05 2003
From: karl at debisschop.net (Karl DeBisschop)
Date: Fri Jul 11 04:47:05 2003
Subject: [Nagiosplug-devel] release of 1.3.1
In-Reply-To:
References:
Message-ID: <1057923898.4306.61.camel@miles.debisschop.net>

On Fri, 2003-07-11 at 05:31, Voon, Ton wrote:
> Some of my comments are not worth putting into a Changelog. For instance, I
> keep doing typos which I don't realise until after a commit.
> Can I suggest something like prefixing at the beginning of a comment:
>
> - "-" for deletion from the Changelog
> - "*" to signify an important amendment, worthy of a short "features for
>   this release" narrative
>
> You could use a perl script to pull these out of the Changelog after it is
> generated.

I like that idea. It seems like it should be compatible with the other suggestion here, cvs2cl.

So let's adopt that as a practice.

--
Karl

From Ton.Voon at egg.com Fri Jul 11 06:10:02 2003
From: Ton.Voon at egg.com (Voon, Ton)
Date: Fri Jul 11 06:10:02 2003
Subject: [Nagiosplug-devel] release of 1.3.1
Message-ID:

I've updated the developer-guidelines to reflect this. http://nagiosplug.sourceforge.net/developer-guidelines.html will be automatically updated soon (there's a proxy at SF that serves static pages, so it will take a few days)

From administrator at net-and-works.de Fri Jul 11 06:59:23 2003
From: administrator at net-and-works.de (administrator at net-and-works.de)
Date: Fri Jul 11 06:59:23 2003
Subject: [Nagiosplug-devel] Windows Eventlog Addon/Plugin published
Message-ID:

Hi,

we have just released our first public version of a Windows Eventlog Plugin for Nagios. Details can be found on http://naplax.sourceforge.net

This addon allows Nagios to monitor Windows EventLogs by querying an agent installed on the Windows machine (the agent is part of this package). While by default every event is notified by Nagios, extensive filtering can be defined through various parameters. You can do "anything but XY" or "nothing but XY" notifications or some strange things between these two.

Martin Schmitz
net&works Netzwerke und Service GmbH
Luetzerodestrasse 12
D-30161 Hannover, Germany
PGP fingerprint: 225E A59C C08A 9ED5 9003 01A1 399B BFE0 6450 CA40
*** Visit us on the web: http://www.naw.de !!! ***

From Ton.Voon at egg.com Fri Jul 11 07:08:01 2003
From: Ton.Voon at egg.com (Voon, Ton)
Date: Fri Jul 11 07:08:01 2003
Subject: [Nagiosplug-devel] RFC: Performance data guidelines
Message-ID:

I'm starting to side with Kjell's and Karl's idea of labels being separate from the units. I think that was the flaw in my original proposal - if we can standardise on the units, then RRD generation should be fairly easy and then you can keep labels descriptive and whatever you think is suitable for a particular plugin.

So my amended proposal is:

- output of format 'label=value[UOM]', comma separated
- labels 1-19 characters long in class [a-zA-Z0-9_] (should spaces be allowed?)
- special labels of warn, warnp, crit and critp (or just warn and crit with different units?). These pass the threshold levels specified on the command line. My idea on this is that you can then use RRD to draw yellow/red lines to show where the warning levels are.
- values in class [-0-9.]. No spaces. Karl has a worry about returned values from SNMP OIDs, but I think values should always be a number, so it can be parsed to remove extraneous characters
- units one of:

  no unit specified - assume a number (int or float) of things (users, processes, load averages)
  s - seconds (also us, ms)
  % - percentage
  b - bytes (also kb, Mb, Tb)
  c - a continuous counter (such as bytes transmitted on an interface) (Does this interfere with a standard unit?)

So some examples:

check_ping:
PING OK - Packet loss = 0%, RTA = 1.00 ms|packet_loss=0%,rta=1ms,warnp=10%,critp=20%

check_disk:
DISK OK [1150211 kB (57%) free on /dev/dsk/c0t0d0s0]|free_percent=57%,free=1150Mb,warn=100Mb,warnp=10%
I still think that you do not need the total, used and used_percent because these are calculable from free and free_percent. I would also use free rather than used because the lowest limit is 0 and the output shows free. I think if you specify a set of disks, then data is returned for the total of the disks.

check_swap:
CRITICAL - Swap used: 18% (778368 out of 4194272)|free_percent=82%,free=778Mb,warnp=5%

check_load:
OK - load average: 0.03, 0.04, 0.05|load1=0.03,warn=1,crit=2
I think we should only return performance data for 1 set of timings, otherwise it gets very complicated (on a side issue, is it possible to have a plugin return % values instead of load levels?)

check_procs:
OK - 5 processes running with command name /usr/local/apache/bin/httpd|processes=5,warn=10
Hmmm, this goes against my check_disk example of using 0 as a lower bound, as check_procs can only be reported "upwards"

check_users:
USERS OK - 2 users currently logged in|users=2,warn=10,crit=20

Are we getting closer?

Ton

From pietrob at lansystems.it Fri Jul 11 07:35:12 2003
From: pietrob at lansystems.it (Pietro Bandera)
Date: Fri Jul 11 07:35:12 2003
Subject: [Nagiosplug-devel] Last CVS check_snmp command problem
Message-ID:

Hi all,

I found this problem in the latest CVS check_snmp command. If I run the command

./check_snmp 10.11.58.44 -C public -o enterprises.ibm.ibmProd.ibmServeRaid.ibmServeRaidMIB.ibmServeRaidMibObjects.ibmServeRaidInfo.ibmServeRaidPhysDeviceTable.ibmServeRaidPhysDeviceEntry.ibmServeRaidPhysDeviceStatus.\"113\" -l Disco -s "online(2)"

then instead of this output:

OK online(2)

I get this one:

Could not open pipe: ./snmpget -m ALL -v 1 -c public 10.11.58.44:161 enterprises.ibm.ibmProd.ibmServeRaid.ibmServeRaidMIB.ibmServeRaidMibObjects.ibmServeRaidInfo.ibmServeRaidPhysDeviceTable.ibmServeRaidPhysDeviceEntry.ibmServeRaidPhysDeviceStatus."113"

Here is the strace of the command:

[root at dns plugins]# strace -v ./check_snmp 10.11.58.44 -C public -o
enterprises.ibm.ibmProd.ibmServeRaid.ibmServeRaidMIB.ibmServeRaidMibObjects.ibmServeRaidInfo.ibmServeRaidPhysDeviceTable.ibmServeRaidPhysDeviceEntry.ibmServeRaidPhysDeviceStatus.\"113\" -l Disco -s "online(2)"
execve("./check_snmp", ["./check_snmp", "10.11.58.44", "-C", "public", "-o", "enterprises.ibm.ibmProd.ibmServeRaid.ibmServeRaidMIB.ibmServeRaidMibObjects.ibmServeRaidInfo.ibmServeRaidPhysDeviceTable.ibmServeRaidPhysDeviceEntry.ibmServeRaidPhysDeviceStatus.\"113\"", "-l", "Disco", "-s", "online(2)"], [/* 18 vars */]) = 0
uname({sysname="Linux", nodename="dns.lansystems.it", release="2.4.18-24.7.x", version="#1 Fri Jan 31 07:06:03 EST 2003", machine="i686"}) = 0
brk(0) = 0x804e290
open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_dev=makedev(104, 3), st_ino=17680, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=56, st_size=24707, st_atime=2003/07/11-16:35:39, st_mtime=2003/06/13-17:18:16, st_ctime=2003/06/13-17:18:16}) = 0
old_mmap(NULL, 24707, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40014000
close(3) = 0
open("/lib/i686/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0Pv\1B4\0"..., 1024) = 1024
fstat64(3, {st_dev=makedev(104, 3), st_ino=55883, st_mode=S_IFREG|0755, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=2752, st_size=1402035, st_atime=2003/07/11-16:35:39, st_mtime=2002/10/10-14:58:59, st_ctime=2002/11/08-12:21:58}) = 0
old_mmap(0x42000000, 1264960, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x42000000
mprotect(0x4212c000, 36160, PROT_NONE) = 0
old_mmap(0x4212c000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x12c000) = 0x4212c000
old_mmap(0x42131000, 15680, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x42131000
close(3) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x4001b000
munmap(0x40014000, 24707) = 0
brk(0) = 0x804e290
brk(0x804e2c0) = 0x804e2c0
brk(0x804f000) = 0x804f000
getrlimit(0x4, 0xbffff2a0) = 0
setrlimit(RLIMIT_CORE, {rlim_cur=0, rlim_max=RLIM_INFINITY}) = 0
fstat64(1, {st_dev=makedev(0, 6), st_ino=3, st_mode=S_IFCHR|0620, st_nlink=1, st_uid=502, st_gid=5, st_blksize=1024, st_blocks=0, st_rdev=makedev(136, 1), st_atime=2003/07/11-16:35:39, st_mtime=2003/07/11-16:35:39, st_ctime=2003/07/11-16:25:28}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40014000
write(1, "Could not open pipe: /usr/bin/sn"..., 271Could not open pipe: /usr/bin/snmpget -t 1 -r 9 -m ALL -v 1 -c public 10.11.58.44:161 enterprises.ibm.ibmProd.ibmServeRaid.ibmServeRaidMIB.ibmServeRaidMibObjects.ibmServeRaidInfo.ibmServeRaidPhysDeviceTable.ibmServeRaidPhysDeviceEntry.ibmServeRaidPhysDeviceStatus."113"
) = 271
munmap(0x40014000, 4096) = 0
_exit(3) = ?
[root at dns plugins]#

Lan Systems Srl
via Roncati 9
40143 Bologna (Italy)
tel +39 051 6150511
fax +39 051 6150535
mobile 348 7112587

From kjell.sundtjonn at elkem.no Sat Jul 12 09:43:20 2003
From: kjell.sundtjonn at elkem.no (kjell.sundtjonn at elkem.no)
Date: Sat Jul 12 09:43:20 2003
Subject: [Nagiosplug-devel] RFC: Performance data guidelines
Message-ID:

I really like the idea of including the critical and warning levels together with max and min values in the performance data, but let me propose an alternative layout based on colon (:) separated fields:

- output of format 'label=value[UOM]:[critical]:[warning]:[max]:[min]', comma separated
- labels 1-19 characters long in class [a-zA-Z0-9_] (spaces allowed, but not recommended)
- values, critical, warning, max, min in class [-0-9.]. No spaces.
- critical and warning are the thresholds for this measurement
- max and min are the maximum/minimum values for the measurement

I think this is easier to parse than the proposal from Ton based on 'magical' words.
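
To illustrate how a consumer might read the colon-separated layout proposed above, here is a minimal Python sketch. It is not code from any plugin or from Nagios: the names PerfField and parse_perfdata are invented for this example, and it assumes the perf data follows the first '|' in the plugin output, that items are comma separated, and that an empty field means "undefined".

```python
import re
from typing import List, NamedTuple, Optional

# Assumption: a value is a number in the class [-0-9.] followed by an
# optional unit of measure (UOM) such as %, s, ms or Mb.
_VALUE_RE = re.compile(r"^(-?[0-9.]+)([A-Za-z%]*)$")


class PerfField(NamedTuple):
    """One 'label=value[UOM]:[critical]:[warning]:[max]:[min]' item."""
    label: str
    value: float
    uom: str
    critical: Optional[float]
    warning: Optional[float]
    maximum: Optional[float]
    minimum: Optional[float]


def _number_or_none(text: str) -> Optional[float]:
    # An empty field is treated as undefined.
    return float(text) if text else None


def parse_perfdata(plugin_output: str) -> List[PerfField]:
    """Parse the perf data that follows the first '|' in plugin output."""
    _, _, perf = plugin_output.partition("|")
    fields = []
    for item in perf.split(","):
        item = item.strip()
        if not item:
            continue
        label, _, rest = item.partition("=")
        parts = rest.split(":")
        parts += [""] * (5 - len(parts))  # pad missing trailing fields
        match = _VALUE_RE.match(parts[0])
        if match is None:
            raise ValueError("unparseable value in %r" % item)
        critical, warning, maximum, minimum = (
            _number_or_none(p) for p in parts[1:5]
        )
        fields.append(PerfField(label, float(match.group(1)), match.group(2),
                                critical, warning, maximum, minimum))
    return fields


if __name__ == "__main__":
    line = ("PING OK - Packet loss = 0%, RTA = 1.00ms"
            "|packet_loss=0%:20:10:100:0,RTA=1ms:20:30::0")
    for field in parse_perfdata(line):
        print(field)
```

Because the fields are positional, the parser needs no list of special label names, which seems to be the point of preferring this layout over reserved words such as warn and crit.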

Example

Disk space

DISK OK [22118452 kB (84%) free on /dev/hda3] [81574 kB (85%) free on /dev/hda2] [252600 kB (100%) free on /dev/shm]|_dev_hda3=84%:10:25:100:0,_dev_hda2=85%:10:25:100:0,_dev_shm=100%:10:25:100:0

For disk space and other plugins where the UOM is defined when the plugin is called, use the active UOM as the value for the performance data. Notice how the / is replaced with _ to ensure a valid RRD datasource name. It is necessary to show the performance data for each disk in a disk set, not only for the total as Ton proposes.

PING

PING OK - Packet loss = 0%, RTA = 1.00ms|packet_loss=0%:20:10:100:0,RTA=1ms:20:30::0

The empty max value for RTA is understood as undefined.

Kjell Sundtjønn

From Peter.Hoogendijk at atosorigin.com Mon Jul 14 06:44:49 2003
From: Peter.Hoogendijk at atosorigin.com (Hoogendijk, Peter)
Date: Mon Jul 14 06:44:49 2003
Subject: [Nagiosplug-devel] RFC: Performance data guidelines
Message-ID: <63C0E7F555D57547BBC0A4457E8E05EB602368@pwi8004.sd.bnet.nl>

Karl, Ton,

I have been thinking about this during the weekend. In my opinion there are two types of plugins:

1) Plugins that perform a specific (direct) check and return a specific answer. In this case you (the author of the plugin) can make an exact choice about both the plugin output and the performance data format.

2) Plugins that perform a lookup (indirect) check and return (an interpretation) of the result. This is the case with plugins checking SNMP or the Microsoft Windows Perfmon data.

This second type of plugin is causing the problems.
Karl remarks that 'spaces in attributes seem avoidable', but looking at the results returned by Microsoft Windows Perfmon, we see a lot of objects, counters and results with spaces:

'\System\System Up Time'='15693 sec'

We could decide to remove the spaces, or replace them by underscores, but this makes the whole process less transparent. As a result, I prefer a set of guidelines that allows for strings containing any characters.

To summarize the questions I came up with while defining the output/perfdata format for a lookup (indirect) plugin:

- Do I use single quotes or double quotes?
- How do I escape this character if it exists in a string?
- Do I use spaces or commas to separate the data?

I myself prefer to use single quotes as used in MySQL queries: put single quotes around the string and double any single quotes in the string itself. For the separating character I have no preference: I just used the character as proposed in the 'Performance Data' chapter of the Nagios documentation.

Peter.

P.S. If the strings themselves contain spaces, but don't contain '=' characters or separator characters, the quotes aren't even needed!

From Ton.Voon at egg.com Tue Jul 15 06:31:03 2003
From: Ton.Voon at egg.com (Voon, Ton)
Date: Tue Jul 15 06:31:03 2003
Subject: [Nagiosplug-devel] RFC: Performance data guidelines
Message-ID:

Kjell,

Firstly, just want to say thank you for your contribution. This is a fascinating thread. I would much rather have this discussion now than have it raised as design problems afterwards!

Yeah, I thought afterwards that check_disk has to be different as a summation does not really tell you anything useful. My preference is that the output reflects the filesystem, not the device, but we can use a switch for that.

I think the : separated fields instead of crit,warn,critp,warnp are better too - the new check_disk allows different thresholds per disk, so this fits in well. However, some questions pop up:

1) I don't like the min and max values. I think that information is held with the UOM (% is 0-100, seconds is 0-infinity). If there is no UOM, then assume any value.

2) what about check_disk -w 5% -w 10000? If there is no min/max, then it could be: 'label=value[UOM][:critical:warning[:critical:warning]]'

3) what about "critical at 10%, but no warning levels"? Can just use a null, I guess.

4) check_procs allows you to say -c 5:5 to mean alert if not exactly 5 processes. Is this doable at all? If so, would we need to change the separators?
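
Questions 3 and 4 can be made concrete with a small Python sketch. This is only an illustration under stated assumptions: split_item is a made-up helper, the field order is the value:critical:warning order discussed in this thread, and the ';' separator in the last call is a hypothetical alternative chosen for the demonstration, not something anyone has proposed here.

```python
from typing import List, Tuple


def split_item(item: str, sep: str) -> Tuple[str, List[str]]:
    """Split one 'label=value<sep>critical<sep>warning' item into fields."""
    label, _, rest = item.partition("=")
    return label, rest.split(sep)


if __name__ == "__main__":
    # Question 3: critical at 10% with no warning level. The warning
    # field is simply left empty (a null).
    print(split_item("free=57%:10:", ":"))
    # -> ('free', ['57%', '10', ''])

    # Question 4: a critical range of 5:5 written with ':' as the field
    # separator. The range is cut in two, so the reader sees four fields
    # and cannot tell which ones belong to the range.
    print(split_item("processes=5:5:5:", ":"))
    # -> ('processes', ['5', '5', '5', ''])

    # The same data with a hypothetical ';' field separator keeps the
    # range intact.
    print(split_item("processes=5;5:5;", ";"))
    # -> ('processes', ['5', '5:5', ''])
```

So an empty field is enough for a missing threshold, but range thresholds only survive if the field separator differs from the range separator.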

Ton

From Ton.Voon at egg.com Tue Jul 15 09:01:11 2003
From: Ton.Voon at egg.com (Voon, Ton)
Date: Tue Jul 15 09:01:11 2003
Subject: [Nagiosplug-devel] check_http cookie and app-proxy support
Message-ID:

Dmitri,

Thanks very much for your patch. I'm sorry it has taken so long to look at it.

I've given it a try and it seems to work okay with sites that do set cookies. However, it seems to fail when a site does not check for cookies - it just hangs when querying the site. I think there's a bug in your patch somewhere?

If you do update your patch, please post it on SourceForge so we can keep track of it: http://sourceforge.net/tracker/?group_id=29880&atid=397599

Thanks,

Ton

> -----Original Message-----
> From: Dmitri Smirnov [mailto:Dmitri.Smirnov at fusepoint.com]
> Sent: Monday, July 07, 2003 6:04 PM
> To: nagiosplug-devel at lists.sourceforge.net
> Subject: [Nagiosplug-devel] check_http cookie and app-proxy support
>
> Hi guys,
>
> I've found a number of sites on our infrastructure that require the
> check_http plugin to have cookie support for session management and
> 'Connection: Keep-Alive' in the HTTP header to work correctly.
> Below is a little patch for check_http (latest from CVS) I've made.
> I would appreciate it, guys, if you would review it and incorporate
> such functionality in the standard check_http (probably wrapped by
> command-line arguments).
>
> Dmitri

From Ton.Voon at egg.com Tue Jul 15 09:56:19 2003
From: Ton.Voon at egg.com (Voon, Ton)
Date: Tue Jul 15 09:56:19 2003
Subject: [Nagiosplug-devel] RFC: Performance data guidelines
Message-ID:

Peter,

Firstly, I just want to say thank you for your contribution. This is a fascinating thread. I would much rather have this discussion now than have it raised as design problems afterwards!

Good point about the two different types of plugins.
I think we are starting to nail down "homegrown" plugins, so I think that will be finalised soon.

Regarding plugins through indirect checks, I think there has to be a level of translation - it's just trying to work out where. To get RRD graphs (for homegrown or indirect plugins), I think there are 4 generic steps:

1) Perf data returned by the plugin
2) Data stored (in db or file)
3) Perf data extracted into an RRD
4) Graph drawn

Given that indirect plugins return their performance metrics in different formats, there needs to be a translation at some point. If the plugins just return whatever the result of the lookup is, then the translation needs to happen at step 2 or 3. The advantage is that the code is held only once (instead of in check_snmp and check_nt). The disadvantage is you will not get useful data like what the thresholds were.

I propose that the translation happen at the plugin - step 1. So, from your example,

'\System\System Up Time'='15693 sec'

is returned as

'\System\System Up Time'=15693s

(Or whatever we decide the format of the performance data will eventually be.)

Re the labels, looking at RRD's manual, it says labels must be between 1-19 chars in the class [a-zA-Z0-9_], which is going to make your example very difficult. I say RRD has the limitation, so keep it like the example and let step 3 handle the conversion for RRD (if a different graphing program was used, there may not be the same limitation).

Does this make sense? Any further comments?

Ton

> -----Original Message-----
> From: Hoogendijk, Peter [mailto:Peter.Hoogendijk at atosorigin.com]
> Sent: Monday, July 14, 2003 2:44 PM
> To: Karl DeBisschop; Voon, Ton
> Cc: NagiosPlug Devel
> Subject: RE: [Nagiosplug-devel] RFC: Performance data guidelines
>
> Karl, Ton,
>
> I have been thinking about this during the weekend. In my opinion there
> are two types of plugins:
>
> 1) Plugins that perform a specific (direct) check and return a
> specific answer.
> In this case you (the author of the plugin) can make an exact choice
> about both the plugin output and the performance data format.
>
> 2) Plugins that perform a lookup (indirect) check and return (an
> interpretation of) the result. This is the case with plugins checking
> SNMP or the Microsoft Windows Perfmon data.
>
> This second type of plugin is causing the problems. Karl remarks that
> 'spaces in attributes seem avoidable', but looking at the results
> returned by Microsoft Windows Perfmon, we see a lot of object counters
> and results with spaces:
>
> '\System\System Up Time'='15693 sec'
>
> We could decide to remove the spaces, or replace them by underscores,
> but this makes the whole process less transparent. As a result, I prefer
> a set of guidelines that allows for strings containing any characters.
> To summarize the questions I came up with while defining the
> output/perfdata format for a lookup (indirect) plugin:
>
> - Do I use single quotes or double quotes?
> - How do I escape this character if it exists in a string?
> - Do I use spaces or commas to separate the data?
>
> I myself prefer to use single quotes as used in MySQL queries: put
> single quotes around the string and double any single quotes in the
> string itself. For the separating character I have no preference: I just
> used the character as proposed in the 'Performance Data' chapter of the
> Nagios documentation.
>
> Peter.
>
> P.S. If the strings themselves contain spaces, but don't contain '='
> characters or separator characters, the quotes aren't even needed!
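
The quoting rule Peter describes above (wrap the string in single quotes and double any single quote inside it) can be sketched as follows. This is only an illustration of that rule, not code from any plugin; the function names quote, unquote and format_pair are made up for the example.

```python
def quote(text: str) -> str:
    """Wrap text in single quotes, doubling any embedded single quote."""
    return "'" + text.replace("'", "''") + "'"


def unquote(text: str) -> str:
    """Reverse quote(); input that is not quoted is returned unchanged."""
    if len(text) >= 2 and text[0] == "'" and text[-1] == "'":
        return text[1:-1].replace("''", "'")
    return text


def format_pair(label: str, value: str) -> str:
    """Build one label=value pair with both sides quoted (hypothetical helper)."""
    return quote(label) + "=" + quote(value)


if __name__ == "__main__":
    # The Perfmon counter from the thread, plus a value containing a quote.
    print(format_pair(r"\System\System Up Time", "15693 sec"))
    print(quote("Peter's PC"))
```

Doubling the quote character means a parser never needs a separate escape character; it only has to treat two consecutive single quotes inside a quoted string as one literal quote.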
>
> -----Original Message-----
> From: Karl DeBisschop [mailto:karl at debisschop.net]
> Sent: Friday, July 11, 2003 06:38
> To: Voon, Ton
> Cc: Hoogendijk, Peter; NagiosPlug Devel
> Subject: RE: [Nagiosplug-devel] RFC: Performance data guidelines
>
> On Thu, 2003-07-10 at 10:30, Voon, Ton wrote:
>
> > I like the idea of quoting the attributes/values, but I don't think
> > they will be necessary if we get the standard attributes and their
> > values right.
>
> I agree somewhat - spaces in attributes especially seem avoidable.
>
> > I think perfdata should be space separated data (just to save
> > processing), but I'm happy to take a consensus. Comma separated may
> > make it a bit easier to parse visually. Any other opinions?
>
> While spaces in attributes seem avoidable, I am less sure about spaces
> in values. I could imagine a plugin where the perf data was a string
> from an SNMP OID, where we would not really have control over what was
> in that string.
>
> > Based on my guidelines, an example output of check_ping would be:
> >
> > PING OK - Packet loss = 0%, RTA = 1.96 ms|pct=0 time=1.96
>
> Why do we not allow the plugin perf data to return units like:
>
> PING OK - Packet loss = 0%, RTA = 1.96 ms|loss=0%,time=1.96 ms
>
> I only ask because there are implementations of ping that can return
> 'us' instead of 'ms' - I've always felt things are less likely to get
> confused if you keep units explicit (just ask NASA and the Mars lander
> team).
>
> > Three things that spring to mind:
> > - it's a bit shorter!
>
> Short is good. But not so good that reliability, accuracy, or
> reasonable clarity should be sacrificed.
>
> > - time means something different from check_http, check_tcp, etc.
> > Those mean "time taken to do a check". For check_ping, it would mean
> > average time for a packet
>
> Hence the idea of allowing units
>
> > - pct is at 0, which is a "good" result (0% packet loss).
> > However - according to my proposal - check_disk would return pct=5
> > for 5% free on total disk, which, as it gets closer to 0%, would be
> > "bad". Maybe it should be reversed, so pct=100% to mean no packet
> > loss - should 0% always be considered the worst case? This may not be
> > easy for "number" attributes.
>
> If you allow units, check_disk could return either
>
> DISK OK [6390 MB (42%) free on /]|free=42%
>
> or
>
> DISK OK [6390 MB (42%) free on /]|used=58%
>
> And I would suggest the latter.
>
> > As you can see, it is hard to standardise on what the values actually
> > tell you. This is what I meant by "Why the returned values are bad is
> > then up to interpretation (and that is the key to any performance
> > analysis!)". However, what the guidelines will do is allow the RRD
> > generation to happen easier.
> >
> > > From: Hoogendijk, Peter [mailto:Peter.Hoogendijk at atosorigin.com]
> > >
> > > We are in the process of developing a plugin to check information
> > > collected by another datacollection system. Based on the
> > > 'Performance Data' chapter in the Nagios documentation, we decided
> > > on comma-separated 'name=value' pairs. As we want to be able to
> > > transparently support the names and values used by the other system,
> > > both the name and the value part can optionally be quoted (with
> > > either single or double quotes). The result is:
> > >
> > > Plugin Output|name1=value1, 'name 2'=value2, name3='11"', name4="Peter's PC"
> > >
> > > To check our procedures for processing the performance data, I also
> > > modified the check_ping plugin. It now reports:
> > >
> > > PING OK - Packet loss = 0%, RTA = 1.96 ms|"Packet loss"=0% RTA="1.96 ms"
> > >
> > > The problem we are facing with this format is indeed the
> > > interpretation by RRD (or in our case the script that's
> > > feeding RRD), so we are open for suggestions.
> > > Your proposed guideline at least seems to help us find the right
> > > direction.
> > >
> > > > From: Voon, Ton [mailto:Ton.Voon at egg.com]
> > > >
> > > > One of the features required for 1.4 is performance data. I would
> > > > like to write up the guidelines for this, but wanted confirmation
> > > > if this is the right way to go, so any comments would be
> > > > appreciated.
>
> Ton - thanks for kicking this off - sorry I was unable to respond
> immediately.
>
> > > > I think perf data should have/be:
> > > >
> > > > - short labels
> > > > - generic and common labels across plugins if possible
> > > > - comma separated, no spaces. Regex format:
> > > > [a-z0-9]+=[0-9]?\.?[0-9]+
> > > > - redundant data removed (eg, if check_disk returns pct and number
> > > > (free), can calculate used bytes)
> > > >
> > > > My suggestion for labels are:
> > > >
> > > > Name ; Units ; printf format ; Details
> > > > time ; seconds ; %.3f ; time taken to do a specific check (eg DNS query, HTTP request, ping RTA)
> > > > pct ; percent ; %.3f ; percentage (free rather than used if applicable) (eg total disk, total swap, ping percent loss)
> > > > number ; must be bytes if applicable ; %d ; a given number of things (free rather than used if applicable) (eg processes, users, bytes used such as total disk or total swap)
> > > > numberf ; float ; %.3f ; a given number of things that may be fractional (eg, load average, average bytes transmitted)
> > > > counter ; a continuous counter (must be bytes if applicable) ; %d ; a continuous counter (eg bytes transmitted on an interface)
> > > > load1 ; load ; %.2f ; load average over 1 min
> > > > load5 ; load ; %.2f ; load average over 5 min
> > > > load15 ; load ; %.2f ; load average over 15 min
> > > >
> > > > Contentious points:
> > > > - loadx.
> > > > Not really keen on these, but don't seem to fit into any other
> > > > labels, unless we only return load5 and use numberf
> > > > - taking free values rather than used. This is consistent with the
> > > > output for check_disk and check_swap. Looking at graphs, I guess you
> > > > want to see it nearer zero which is your definite limit, rather than
> > > > continuously increasing
> > > > - maybe numberf is not required, but we say that number could be
> > > > fractional. I think this may be better as RRD doesn't care whether
> > > > values are integers or not
> > > > - too reductionist? Would you prefer labels that describe the
> > > > measure? I think the labels should be generic and the plugin
> > > > describes the context
> > > >
> > > > As an example, the patches submitted on SF for check_ping had perf
> > > > labels of rta and loss, but I think these should be time and pct
> > > > respectively. I think this makes it easier for something like RRD
> > > > to work out what type of value it is to draw the graphs. Why the
> > > > returned values are bad is then up to interpretation (and that is
> > > > the key to any performance analysis!).
>
> --
> Karl
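
Ton's message above leaves the conversion of free-form labels into RRD names to step 3 (extracting the perf data into an RRD). A minimal sketch of such a conversion is shown below, using only the limit quoted in the thread (1-19 characters from [a-zA-Z0-9_]). The function name rrd_ds_name and the "unnamed" fallback are assumptions made for the example, and the sketch does nothing about two long labels that truncate to the same name.

```python
import re

RRD_MAX_LEN = 19  # length limit quoted in the thread from the RRD manual


def rrd_ds_name(label: str) -> str:
    """Map a free-form perfdata label to 1-19 chars in [a-zA-Z0-9_]."""
    # Replace every run of disallowed characters with a single underscore.
    cleaned = re.sub(r"[^a-zA-Z0-9_]+", "_", label).strip("_")
    if not cleaned:
        cleaned = "unnamed"  # hypothetical fallback so the name is never empty
    return cleaned[:RRD_MAX_LEN]


if __name__ == "__main__":
    print(rrd_ds_name(r"\System\System Up Time"))
    print(rrd_ds_name("Packet loss"))
```

Keeping this mapping in the script that feeds RRD, rather than in the plugin, matches the suggestion that the plugin output should not be shaped by one graphing tool's limits.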

From karl at debisschop.net Tue Jul 15 16:49:19 2003
From: karl at debisschop.net (Karl DeBisschop)
Date: Tue Jul 15 16:49:19 2003
Subject: [Nagiosplug-devel] check_http cookie and app-proxy support
In-Reply-To:
References:
Message-ID: <1058312846.7138.1.camel@miles.debisschop.net>

On Tue, 2003-07-15 at 12:04, Voon, Ton wrote:
> Dmitri,
>
> Thanks very much for your patch. I'm sorry it has taken so long to look at
> it.
>
> I've given it a try and it seems to work okay with sites that do set
> cookies. However, it seems to fail when a site does not check for cookies -
> it just hangs when querying the site. I think there's a bug in your patch
> somewhere?
>
> If you do update your patch, please post on sourceforge so we can keep track
> of it: http://sourceforge.net/tracker/?group_id=29880&atid=397599

When I think of taking check_http any further than it is, the curl libs seem like the way to go, AFAICT. I haven't bitten that one off yet, but people may want to take it into consideration.

--
Karl

From Peter.Hoogendijk at atosorigin.com Wed Jul 16 00:29:32 2003
From: Peter.Hoogendijk at atosorigin.com (Hoogendijk, Peter)
Date: Wed Jul 16 00:29:32 2003
Subject: [Nagiosplug-devel] RFC: Performance data guidelines
Message-ID: <63C0E7F555D57547BBC0A4457E8E05EB60237B@pwi8004.sd.bnet.nl>

Ton,

This certainly makes sense. I was thinking along the same lines and concluded that I need two extra (optional) plugin options:

1) An option to set the label: -L label (--label)
2) An option to specify the format of the data: -P printf (--printf)

This solves the problem of the RRD labels. It also proves you are right with your proposal to do the translations at the plugin, as this is also the place where I have to configure the perfmon counter to be checked (for this discussion I'll stick to the Microsoft Windows Perfmon example).
As a result, the perfmon plugin would take the following options:

  -f filename (--filename)
  -C counter (--counter)
  -S scanf (--scanf)
  -L label (--label)
  -P printf (--printf)
  -w warning threshold (--warning)
  -c critical threshold (--critical)

The resulting command to perform the check would be:

  ./check_perfmon -f /var/log/perfmon/hostname -C "\System\System Up Time" -S "%l" -L "SystemUpTime" -P "%ls"

The filename, as specified with the -f option, points to the file that contains a list of Microsoft Windows Perfmon counters and their values for the host being checked. This file is generated using a third-party product, running as a service on the Microsoft Windows servers.

The option names I used are open to discussion, but the principle solves the problems being discussed. It also leaves the format of the perfdata free to be adapted to the program that will process this data.

This leaves me with the specification of the thresholds. The developer guidelines are mostly clear, but just to make sure: how do I specify a warning below 10 and a critical above 45? For counters having a known range, this is clear, but what do I do with a signed counter value, when I don't know the possible minimum and maximum values?

Peter.

From karl at debisschop.net Wed Jul 16 04:24:26 2003
From: karl at debisschop.net (Karl DeBisschop)
Date: Wed Jul 16 04:24:26 2003
Subject: [Nagiosplug-devel] RFC: Performance data guidelines
In-Reply-To: <63C0E7F555D57547BBC0A4457E8E05EB60237B@pwi8004.sd.bnet.nl>
References: <63C0E7F555D57547BBC0A4457E8E05EB60237B@pwi8004.sd.bnet.nl>
Message-ID: <1058354492.4954.26.camel@miles.debisschop.net>

On Wed, 2003-07-16 at 03:28, Hoogendijk, Peter wrote:
> Ton,
>
> This certainly makes sense. I was thinking along the same lines and
> concluded that I need two extra (optional) plugin options:
>
> 1) An option to set the label: -L label (--label)
> 2) An option to specify the format of the data: -P printf (--printf)
>
> This solves the problem of the RRD labels. It also proves you are right
> with your proposal to do the translations at the plugin, as this is also
> the place where I have to configure the perfmon counter to be checked
> (for this discussion I'll stick to the Microsoft Windows Perfmon
> example). As a result, the perfmon plugin would take the following
> options:
>
> -f filename (--filename)
> -C counter (--counter)
> -S scanf (--scanf)
> -L label (--label)
> -P printf (--printf)
> -w warning threshold (--warning)
> -c critical threshold (--critical)
>
> The resulting command to perform the check would be:
>
> ./check_perfmon -f /var/log/perfmon/hostname -C "\System\System Up Time" -S "%l" -L "SystemUpTime" -P "%ls"
>
> The filename, as specified with the -f option, points to the file that
> contains a list of Microsoft Windows Perfmon counters and their values
> for the host being checked. This file is generated using a third-party
> product, running as a service on the Microsoft Windows servers.
>
> The option names I used are open to discussion, but the principle
> solves the problems being discussed.
> It also leaves the format of the perfdata free to be adapted to the
> program that will process this data.

Why not:

  ./check_perfmon -f /var/log/perfmon/hostname \
    -C "\System\System Up Time" -S "%l" \
    -P "SystemUpTime=%ls"

Since you are providing a printf format, you really don't need to separately specify the label AFAICS.

> how do I specify a warning below 10 and a critical above 45 ?

Most (all?) plugins will balk at this. For instance, you can warn outside the range 10-25 and send a critical response outside the range 0-25, which would be similar. But in general the plugins should and do check to make sure that the values that generate critical conditions are a subset of the warning specification, possibly inclusive.

But there is the problem of passing ranges out through the perf data. In most cases, it's a single value - but for some plugins the "good" zone may be above the threshold, and for others below. For ranges, unless RRDtool or others have a native syntax for specifying ranges, I would think we just pass ours out - the good range is within a colon-separated pair.

> but what do I do with a signed counter value, when
> I don't know the possible minimum and maximum values?

ISTM that if the possible minimum/maximum values are not known, a well behaved application would not require that they be specified.

--
Karl

From noreply at sourceforge.net Wed Jul 16 08:05:13 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Wed Jul 16 08:05:13 2003
Subject: [Nagiosplug-devel] [ nagiosplug-Bugs-772366 ] check_udp2 on 1.3.1 ?
Message-ID:

Bugs item #772366, was opened at 2003-07-16 17:04
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=397597&aid=772366&group_id=29880

Category: None
Group: Release (specify)
Status: Open
Resolution: None
Priority: 5
Submitted By: R?nald CASAGRAUDE (kipit)
Assigned to: Nobody/Anonymous (nobody)
Summary: check_udp2 on 1.3.1 ?
Initial Comment: Is it normal that check_udp2 (symbolic link to check_tcp) disappeared from this release? This link is present on nagios-plugins-CVS, and creating the link by hand with 1.3.1 (release) does the job... If the disappearance of check_udp2 is intended, how do I check whether a UDP port is open? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=397597&aid=772366&group_id=29880 From kjell.sundtjonn at elkem.no Wed Jul 16 10:55:21 2003 From: kjell.sundtjonn at elkem.no (kjell.sundtjonn at elkem.no) Date: Wed Jul 16 10:55:21 2003 Subject: [Nagiosplug-devel] RFC: Performance data guidelines Message-ID: Ton A few comments >1) I don't like the min and max values. I think that information is held >with the UOM (% is 0-100, seconds is 0-infinity). If there is no UOM, then >assume any value. My reason for including the max and min values is to carry as much information as possible in the performance data string. Max and min are relevant for getting correct scaling on graphs drawn by tools such as RRD. RRD accepts them as optional parameters in the data source definition, and I think they should be included when available (that is, when the plugin logic can deduce them in a sensible way). If the UOM is % you can assume min=0, max=100, but if you monitor your free disk space in GB, the total disk space available is valuable information that the plugin can provide. >2) what about check_disk -w 5% -w 10000? If there is no min/max, then it >could be: 'label=value[UOM][:critical:warning[:critical:warning]]' What about changing the general layout to 'label=value1[UOM];value2[UOM];...[:[critical1;critical2;...][:[warning1;warning2;...][:[max1;max2;...][:min1;min2;...]]]]' This can handle the situation you describe; all information is carried over to whichever tool you choose for parsing the performance data, in a structured, easily parseable format.
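A minimal Python sketch of how a consumer might split one item of the colon-separated layout discussed in this thread. It assumes the single-value form label=value[UOM]:critical:warning:max:min with optional single-quoted fields; the multi-value ';' variant is not handled, and the function names are hypothetical rather than part of any plugin or of Nagios itself.

```python
import re

FIELD_NAMES = ("critical", "warning", "max", "min")


def split_fields(text, sep=":"):
    """Split text on sep, ignoring separators inside single quotes."""
    fields, current, quoted = [], [], False
    for ch in text:
        if ch == "'":
            quoted = not quoted          # toggle quoting, drop the quote itself
        elif ch == sep and not quoted:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


def parse_perfdata_item(item):
    """Parse one 'label=value[UOM]:crit:warn:max:min' item into a dict."""
    label, _, rest = item.partition("=")     # labels may not contain '='
    fields = split_fields(rest)
    fields += [""] * (5 - len(fields))       # trailing fields are optional
    match = re.match(r"(-?[0-9.]+)(.*)$", fields[0])
    if match is None:
        raise ValueError("no numeric value in %r" % item)
    result = {
        "label": label,
        "value": float(match.group(1)),
        "uom": match.group(2),
    }
    for name, raw in zip(FIELD_NAMES, fields[1:5]):
        result[name] = raw or None           # an empty field means "undefined"
    return result


if __name__ == "__main__":
    for item in ("_dev_hda3=84%:10:25:100:0",
                 "RTA=1ms:20:30::0",
                 "No_processes=5:'5:5'::10:0"):
        print(parse_perfdata_item(item))
```

Thresholds are kept as strings on purpose, so that a quoted range such as '5:5' survives untouched and the consuming tool can decide how to interpret it.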
We should allow any character (except '=') in the label. Converting it into a valid RRD data source name (or the equivalent for another tool of your choice) should be left to the program you select to parse the data. >3) what about "critical at 10%, but no warning levels"? Can just use a null, >I guess. >4) check_procs allows you to say -c 5:5 to mean alert if not exactly 5 >processes. Is this doable at all? If so, would we need to change the >separators? Let us allow for embedding parameters in quotes: No_processes=5:'5:5'::10:0 I must say that I am sceptical of the proposal by Peter Hoogendijk to use scanf and printf format specifiers as parameters to the plugins. Let us try to develop a common recommendation on performance data that enables all relevant information to be forwarded from the plugin to the tool you select to parse the data. It is important that this is a common definition that simplifies the program needed for parsing the data. Kjell Sundtjønn On 15.07.2003 15:32, "Voon, Ton" wrote (sent by nagiosplug-devel-admin at lists.sourceforge.net; To: "'kjell.sundtjonn at elkem.no'", NagiosPlug Devel; Subject: RE: [Nagiosplug-devel] RFC: Performance data guidelines): Kjell, Firstly, just want to say thank you for your contribution. This is a fascinating thread. I would much rather have this discussion now than have it raised as design problems afterwards! Yeah, I thought afterwards that check_disk has to be different as a summation does not really tell you anything useful. My preference is that the output reflects the filesystem, not the device, but we can use a switch for that.
I think the colon-separated fields, instead of crit,warn,critp,warnp, are better too - the new check_disk allows different thresholds per disk, so this fits in well. However, some questions pop up: 1) I don't like the min and max values. I think that information is held with the UOM (% is 0-100, seconds is 0-infinity). If there is no UOM, then assume any value. 2) what about check_disk -w 5% -w 10000? If there is no min/max, then it could be: 'label=value[UOM][:critical:warning[:critical:warning]]' 3) what about "critical at 10%, but no warning levels"? Can just use a null, I guess. 4) check_procs allows you to say -c 5:5 to mean alert if not exactly 5 processes. Is this doable at all? If so, would we need to change the separators? Ton > -----Original Message----- > From: kjell.sundtjonn at elkem.no [mailto:kjell.sundtjonn at elkem.no] > Sent: Saturday, July 12, 2003 5:41 PM > To: NagiosPlug Devel > Subject: RE: [Nagiosplug-devel] RFC: Performance data guidelines > > > > I really like the idea of including the critical and warning > level together > with max and min values in the performance data, but let me propose an > alternative layout based on colon (:) separated fields : > > - output of format 'label=value[UOM]:[critical]:[warning]:[max]:[min]' > comma separated > - labels 1-19 characters long in class [a-zA-Z0-9_] (spaces > allowed, but > not recommended) > - values, critical, warning, max, min in class [-0-9.]. No spaces. > - critical and warning are the thresholds for this measurement > - max and min are the maximum/minimum values for the measurement > > I think this is easier to parse than the proposal from Ton based on > 'magical' words.
> > Example > > Disk space > DISK OK [22118452 kB (84%) free on /dev/hda3] [81574 kB (85%) free on > /dev/hda2] [252600 kB (100%) free on > /dev/shm]|_dev_hda3=84%:10:25:100:0, > _dev_hda2=85%:10:25:100:0,_dev_shm=100%:10:25:100:0 > > For disk space and other plugins where the UOM is defined > when the plugin > is called, use the active OUM as the value for the > performance data. Notice > how the / is replaced with _ to ensure a valid RRD datasource > name. It is > necessary to show the performance data for each disk in a > disk set, not > only for the total as Ton proposes. > > PING > > PING OK - Packet loss = 0%, RTA = > 1.00ms|packet_loss=0%:20:10:100:0,RTA=1ms:20:30::0 > > The empty max value for RTA is understood as undefined. > > > > Kjell Sundtj?nn > > > > |---------+--------------------------------------------> > | | "Voon, Ton" | > | | Sent by: | > | | nagiosplug-devel-admin at lists.sour| > | | ceforge.net | > | | | > | | | > | | 11.07.2003 16:10 | > | | | > |---------+--------------------------------------------> > > >------------------------------------------------------------- > ---------------------------------| > | > | > | To: NagiosPlug Devel > | > | cc: > | > | Subject: RE: [Nagiosplug-devel] RFC: Performance > data guidelines | > > >------------------------------------------------------------- > ---------------------------------| > > > > > I'm starting to side with Kjell's and Karl's idea of labels > being separate > from the units. I think that was the flaw in my original > proposal - if we > can standarise on the units, then RRD generation should be > fairly easy and > then you can keep labels descriptive and whatever you think > is suitable for > a particular plugin. > > So my amended proposal is: > > - output of format 'label=value[UOM]' comma separated > - labels 1-19 characters long in class [a-zA-Z0-9_] (should spaces be > allowed?) > - special labels of warn, warnp, crit and critp (or just warn > and crit with > different units?). 
These pass the threshold levels specified > on the command > line. My idea on this is that you can then use RRD to draw > yellow/red lines > to show where the warning levels are. > - values in class [-0-9.]. No spaces. Karl has a worry about returned > values > from SNMP OIDs, but I think values should always be a number, > so it can be > parsed to remove extraneous characters > - units one of: > > no unit specified - assume a number (int or float) of things (users, > processes, load averages) > s - seconds (also, us, ms) > % - percentage > b - bytes (also kb, Mb, Tb) > c - a continuous counter (such as bytes transmitted on an > interface) (Does > this interfere with a standard unit?) > > So some examples: > > check_ping: > PING OK - Packet loss = 0%, RTA = 1.00 > ms|packet_loss=0%,rta=1ms,warnp=10%,critp=20% > > check_disk: > DISK OK [1150211 kB (57%) free on > /dev/dsk/c0t0d0s0]|free_percent=57%,free=1150Mb,warn=100Mb,warnp=10% > I still think that you do not need the total, used and > used_percent because > these are calculatable from free and free_percent. I would > also use free > rather than used because the lowest limit is 0 and the output > shows free. I > think if you specify a set of disks, then data is returned > for the total of > the disks. > > check_swap: > CRITICAL - Swap used: 18% (778368 out of > 4194272)|free_percent=82%,free=778Mb,warnp=5% > > check_load: > OK - load average: 0.03, 0.04, 0.05|load1=0.03,warn=1,crit=2 > I think we should only return performance data for 1 set of timings, > otherwise it gets very complicated (on a side issue, it is > possible to have > a plugin return % values instead of load levels?) 
> > check_procs: > OK - 5 processes running with command name > /usr/local/apache/bin/httpd|processes=5,warn=10 > Hmmm, this goes against my check_disk example of using 0 as a > lower bound > as > check_procs can only be reported "upwards" > > check_users: > USERS OK - 2 users currently logged in|users=2,warn=10,crit=20 > > Are we getting closer? > > Ton
From noreply at sourceforge.net Wed Jul 16 19:59:03 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Wed Jul 16 19:59:03 2003 Subject: [Nagiosplug-devel] [ nagiosplug-Patches-772757 ] check_disk same thing mentioned in ( 726552 ). Message-ID: Patches item #772757, was opened at 2003-07-16 21:58 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=397599&aid=772757&group_id=29880 Category: Enhancement Group: None Status: Open Resolution: None Priority: 5 Submitted By: Jason Burnett (trig_monkeypr0n) Assigned to: Nobody/Anonymous (nobody) Summary: check_disk same thing mentioned in ( 726552 ). Initial Comment: This is the same enhancement mentioned in ( 726552 ) but the actual patch is attached.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=397599&aid=772757&group_id=29880 From noreply at sourceforge.net Wed Jul 16 22:16:25 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Wed Jul 16 22:16:25 2003 Subject: [Nagiosplug-devel] [ nagiosplug-Bugs-772366 ] check_udp2 on 1.3.1 ? Message-ID: Bugs item #772366, was opened at 2003-07-16 08:04 Message generated for change (Comment added) made by undrgrid You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=397597&aid=772366&group_id=29880 Category: None Group: Release (specify) Status: Open Resolution: None Priority: 5 Submitted By: R?nald CASAGRAUDE (kipit) >Assigned to: Jeremy T. Bouse (undrgrid) Summary: check_udp2 on 1.3.1 ? Initial Comment: Is it normal that check_udp2 (symbolic link to check_tcp) disappear from this release ? This link is present on nagios-plugins-CVS and creating the link by hand with 1.3.1 (release) do the job... If the disappearance of check_udp2 is normal, how to check if an udp port is open ? ---------------------------------------------------------------------- >Comment By: Jeremy T. Bouse (undrgrid) Date: 2003-07-16 22:15 Message: Logged In: YES user_id=10485 The link was removed from the 1.3.1 release as it was only added to the CVS HEAD tag... It also has been found to not operate properly, thus it's removal keeps repeated bugs saying it doesn't work from being filed until it can be fixed properly... 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=397597&aid=772366&group_id=29880 From noreply at sourceforge.net Wed Jul 16 22:19:03 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Wed Jul 16 22:19:03 2003 Subject: [Nagiosplug-devel] [ nagiosplug-Patches-769311 ] adds smtp auth ability to the check_smtp plugin Message-ID: Patches item #769311, was opened at 2003-07-10 13:47 Message generated for change (Comment added) made by undrgrid You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=397599&aid=769311&group_id=29880 Category: Enhancement Group: None Status: Open Resolution: None Priority: 5 Submitted By: Jason Burnett (trig_monkeypr0n) >Assigned to: Jeremy T. Bouse (undrgrid) Summary: adds smtp auth ability to the check_smtp plugin Initial Comment: Adds the ability to confirm that your smtp auth mechanism is working on your smtp server. ---------------------------------------------------------------------- >Comment By: Jeremy T. 
Bouse (undrgrid) Date: 2003-07-16 22:18 Message: Logged In: YES user_id=10485 Looking into the patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=397599&aid=769311&group_id=29880 From noreply at sourceforge.net Wed Jul 16 22:39:07 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Wed Jul 16 22:39:07 2003 Subject: [Nagiosplug-devel] [ nagiosplug-Patches-769311 ] adds smtp auth ability to the check_smtp plugin Message-ID: Patches item #769311, was opened at 2003-07-10 13:47 Message generated for change (Comment added) made by undrgrid You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=397599&aid=769311&group_id=29880 Category: Enhancement Group: None Status: Open Resolution: None Priority: 5 Submitted By: Jason Burnett (trig_monkeypr0n) Assigned to: Jeremy T. Bouse (undrgrid) Summary: adds smtp auth ability to the check_smtp plugin Initial Comment: Adds the ability to confirm that your smtp auth mechanism is working on your smtp server. ---------------------------------------------------------------------- >Comment By: Jeremy T. Bouse (undrgrid) Date: 2003-07-16 22:38 Message: Logged In: YES user_id=10485 Can you provide a patch against a recent version of the CVS code? This patch appears to be against a very old version that has had many changes made to it since then. ---------------------------------------------------------------------- Comment By: Jeremy T. 
Bouse (undrgrid) Date: 2003-07-16 22:18 Message: Logged In: YES user_id=10485 Looking into the patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=397599&aid=769311&group_id=29880 From code at monkeypr0n.org Thu Jul 17 01:18:13 2003 From: code at monkeypr0n.org (code at monkeypr0n.org) Date: Thu Jul 17 01:18:13 2003 Subject: [Nagiosplug-devel] patches Message-ID: <20030717081656.GA11823@cannonfodder.org> Is there a standard for what version to submit patches against? I am not using the cvs versions and have been submitting patches against 1.3.0. -- If I were a NetHack monster, I would be a cockatrice. People tend to go out of their way to avoid me; those that don't have to treat me very carefully indeed. From Ton.Voon at egg.com Thu Jul 17 02:40:01 2003 From: Ton.Voon at egg.com (Voon, Ton) Date: Thu Jul 17 02:40:01 2003 Subject: [Nagiosplug-devel] patches Message-ID: I've just updated the developer-guidelines, but it won't show up on http://nagiosplug.sf.net until after a few days (some proxy thing...). If the patch is a bug patch, please supply a unified or context diff with the version you are using. If the patch is for new features, please supply against CVS HEAD. Thanks for contributing! Ton > -----Original Message----- > From: code at monkeypr0n.org [mailto:code at monkeypr0n.org] > Sent: Thursday, July 17, 2003 9:17 AM > To: nagiosplug-devel at lists.sourceforge.net > Subject: [Nagiosplug-devel] patches > > > Is there a standard for what version to submit patches > against? I am not > using the cvs versions and have been submitting patches > against 1.3.0. > > -- > If I were a NetHack monster, I would be a cockatrice. People > tend to go out of their way to avoid me; those that don't > have to treat me very carefully indeed. 
This private and confidential e-mail has been sent to you by Egg. The Egg group of companies includes Egg Banking plc (registered no. 2999842), Egg Financial Products Ltd (registered no. 3319027) and Egg Investments Ltd (registered no. 3403963) which carries out investment business on behalf of Egg and is regulated by the Financial Services Authority. Registered in England and Wales. Registered offices: 1 Waterhouse Square, 138-142 Holborn, London EC1N 2NA. If you are not the intended recipient of this e-mail and have received it in error, please notify the sender by replying with 'received in error' as the subject and then delete it from your mailbox. From jeremy+nagios at undergrid.net Thu Jul 17 06:28:10 2003 From: jeremy+nagios at undergrid.net (Jeremy T. Bouse) Date: Thu Jul 17 06:28:10 2003 Subject: [Nagiosplug-devel] patches In-Reply-To: <20030717081656.GA11823@cannonfodder.org> References: <20030717081656.GA11823@cannonfodder.org> Message-ID: <20030717132502.GA31247@UnderGrid.net> Patches should be made against CVS HEAD, as the code in CVS HEAD has changed considerably since 1.3.0 and 1.3.1... Work done against the 1.3.x versions will be bug fixes, and things submitted into CVS HEAD will probably not make it back to them.
Regards, Jeremy On Thu, Jul 17, 2003 at 03:16:56AM -0500, code at monkeypr0n.org wrote: > Is there a standard for what version to submit patches against? I am not > using the cvs versions and have been submitting patches against 1.3.0. > > -- > If I were a NetHack monster, I would be a cockatrice. People tend to go out of their way to avoid me; those that don't have to treat me very carefully indeed. From noreply at sourceforge.net Fri Jul 18 05:02:20 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Jul 18 05:02:20 2003 Subject: [Nagiosplug-devel] [ nagiosplug-New Plugins-773584 ] Check Printer status by SNMP Message-ID: New Plugins item #773584, was opened at 2003-07-18 12:01 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=541465&aid=773584&group_id=29880 Category: Network device plugin Group: None Status: Open Resolution: None Priority: 5 Submitted By: Brad Meier (knightorc) Assigned to: Nobody/Anonymous (nobody) Summary: Check Printer status by SNMP Initial Comment: Heavily hacked version of CHECK_HPJD which adheres to the Printer MIB definition. I know there is a PERL version being worked on, but this should be more efficient hopefully!
My code is untidy and not the best quality, C/C++ isn't my strongest language. Please feel free to fix/suggest & test this plugin extensively. I have run it against various Canon and HP printers and it seems to be catching and reporting the errors correctly, please test on more printers and give feedback! ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=541465&aid=773584&group_id=29880 From ragnar at skolelinux.no Fri Jul 18 05:02:21 2003 From: ragnar at skolelinux.no (=?ISO-8859-1?B?UmFnbmFyIFdpc2z4ZmY=?=) Date: Fri Jul 18 05:02:21 2003 Subject: [Nagiosplug-devel] Nagios-plugins as Debian packages Message-ID: <1058529609.3f17e149b1058@ragnar.mine.nu> I saw in the archives that Jeremy Bouse asked if there was any interest in Debian packages. The Skolelinux project needs them, and we'd be willing to chip in with anything we can contribute. -- Ragnar Wisl?ff From noreply at sourceforge.net Fri Jul 18 05:06:05 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Jul 18 05:06:05 2003 Subject: [Nagiosplug-devel] [ nagiosplug-Bugs-773588 ] check_ntp vs Cisco & Solaris NTP responses Message-ID: Bugs item #773588, was opened at 2003-07-18 12:05 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=397597&aid=773588&group_id=29880 Category: Parsing problem Group: None Status: Open Resolution: None Priority: 5 Submitted By: Brad Meier (knightorc) Assigned to: Nobody/Anonymous (nobody) Summary: check_ntp vs Cisco & Solaris NTP responses Initial Comment: Found some Solaris machines that don't match against line 263's parsing of the reply from ntpq, they use a #, not * or o. Changed it to (\*|o|\#) instead of (\*|o) in the regex and its happy again, checked against Tardis on windows and xntpd on Linux and Solaris. 
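The effect of widening the tally-character alternation can be illustrated with a small Python sketch. This is only an illustration under stated assumptions: the two patterns are hypothetical, simplified stand-ins that match just the leading character (not the full expression on line 263 of the Perl plugin), and the peer lines are made-up examples in the general shape of `ntpq -p` output.

```python
import re

# Hypothetical, simplified patterns: only the leading tally character is
# matched here, not the plugin's complete regular expression.
OLD_TALLY = re.compile(r"^(\*|o)")
NEW_TALLY = re.compile(r"^(\*|o|\#)")

# Made-up peer lines, loosely shaped like `ntpq -p` output.
SAMPLE_PEERS = [
    "*ntp1.example.com  .GPS.      1 u  33  64  377  0.512   0.104  0.031",
    "#ntp2.example.com  10.0.0.1   2 u  12  64  377  1.204  -0.310  0.089",
    "+ntp3.example.com  10.0.0.1   2 u  40  64  377  0.998   0.221  0.054",
]


def matching_peers(pattern, lines):
    """Return the peer names (tally character stripped) that the pattern accepts."""
    return [line.split()[0][1:] for line in lines if pattern.match(line)]


if __name__ == "__main__":
    print("old:", matching_peers(OLD_TALLY, SAMPLE_PEERS))
    print("new:", matching_peers(NEW_TALLY, SAMPLE_PEERS))
```

With the original alternation the '#' line is skipped entirely, which is consistent with the symptom reported for the Solaris hosts.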
Found another problem, tried against a cisco and it returned a - where the script expects l,u,m,b. Line 263 again. Changed ([lumb]+) to ([lumb-]+) Patch tested against Linux, Solaris, Windows (Tardis) and Cisco ntp's. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=397597&aid=773588&group_id=29880 From noreply at sourceforge.net Fri Jul 18 05:51:15 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Jul 18 05:51:15 2003 Subject: [Nagiosplug-devel] [ nagiosplug-Bugs-773588 ] check_ntp vs Cisco & Solaris NTP responses Message-ID: Bugs item #773588, was opened at 2003-07-18 12:05 Message generated for change (Settings changed) made by knightorc You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=397597&aid=773588&group_id=29880 Category: Parsing problem >Group: v1.3.0 beta3 Status: Open Resolution: None Priority: 5 Submitted By: Brad Meier (knightorc) >Assigned to: Subhendu Ghosh (sghosh) Summary: check_ntp vs Cisco & Solaris NTP responses Initial Comment: Found some Solaris machines that don't match against line 263's parsing of the reply from ntpq, they use a #, not * or o. Changed it to (\*|o|\#) instead of (\*|o) in the regex and its happy again, checked against Tardis on windows and xntpd on Linux and Solaris. Found another problem, tried against a cisco and it returned a - where the script expects l,u,m,b. Line 263 again. Changed ([lumb]+) to ([lumb-]+) Patch tested against Linux, Solaris, Windows (Tardis) and Cisco ntp's. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=397597&aid=773588&group_id=29880 From jeremy+nagios at undergrid.net Fri Jul 18 06:28:13 2003 From: jeremy+nagios at undergrid.net (Jeremy T. 
Bouse) Date: Fri Jul 18 06:28:13 2003 Subject: [Nagiosplug-devel] Re: Nagios-plugins as Debian packages In-Reply-To: <1058529609.3f17e149b1058@ragnar.mine.nu> References: <1058529609.3f17e149b1058@ragnar.mine.nu> Message-ID: <20030718132424.GA2348@UnderGrid.net> Actually I am already working on them... I've got two sets of packages that I'm working to maintain at this time... One with the 1.3.1 release of the plugins and another a snapshot of the CVS repository... I have not yet released and uploaded them to the Debian mirrors yet as I'm trying to coordinate some fixes/updates to the Nagios Debian package which is maintained by Turbo... I've got a diff patch to send to him but I'm still waiting on reply from my last email to him... I can provide you with the URL for apt-get that I have the current packages I'm testing located but if you're running the default package for Nagios by Turbo the configuration will fail... Regards, Jeremy T. Bouse On Fri, Jul 18, 2003 at 02:00:09PM +0200, Ragnar Wisl?ff wrote: > I saw in the archives that Jeremy Bouse asked if there was any interest in > Debian packages. The Skolelinux project needs them, and we'd be willing to chip > in with anything we can contribute. > > -- > Ragnar Wisl?ff From ragnar at skolelinux.no Fri Jul 18 07:16:03 2003 From: ragnar at skolelinux.no (=?ISO-8859-1?B?UmFnbmFyIFdpc2z4ZmY=?=) Date: Fri Jul 18 07:16:03 2003 Subject: [Nagiosplug-devel] Re: Nagios-plugins as Debian packages In-Reply-To: <20030718132424.GA2348@UnderGrid.net> References: <1058529609.3f17e149b1058@ragnar.mine.nu> <20030718132424.GA2348@UnderGrid.net> Message-ID: <1058537638.3f1800a6d4140@ragnar.mine.nu> Siterer "Jeremy T. Bouse" : > Actually I am already working on them... I've got two sets of packages > that I'm working to maintain at this time... One with the 1.3.1 release > of the plugins and another a snapshot of the CVS repository... OK. I've done a rebuild of the netsaint-plugins, but would like to have the latest and greatest. 
I did not realise there was a 1.3.1. We aim for stability, so I guess 1.3.1 is better than CVS ;-) > > I have not yet released and uploaded them to the Debian mirrors yet as > I'm trying to coordinate some fixes/updates to the Nagios Debian package > which is maintained by Turbo... I've got a diff patch to send to him > but I'm still waiting on reply from my last email to him... > Oh, is the official maintainer also working on a Nagios core package for woody? I know he can't get them into woody, but he has made them anyway? That would be very nice. I have rebuilt them with some tweaks to suit us, but to have the maintainer packages would be the best. > I can provide you with the URL for apt-get that I have the current > packages I'm testing located but if you're running the default package > for Nagios by Turbo the configuration will fail... OK, but that's just a little change to the dependencies to change that in the Nagios package? If you have nagios-packges that go with the plugins we can test them out as well. Thanks for your quick reply. -- Ragnar Wisl?ff From jeremy+nagios at undergrid.net Fri Jul 18 09:11:18 2003 From: jeremy+nagios at undergrid.net (Jeremy T. Bouse) Date: Fri Jul 18 09:11:18 2003 Subject: [Nagiosplug-devel] Re: Nagios-plugins as Debian packages In-Reply-To: <1058537638.3f1800a6d4140@ragnar.mine.nu> References: <1058529609.3f17e149b1058@ragnar.mine.nu> <20030718132424.GA2348@UnderGrid.net> <1058537638.3f1800a6d4140@ragnar.mine.nu> Message-ID: <20030718160824.GB20793@UnderGrid.net> On Fri, Jul 18, 2003 at 04:13:58PM +0200, Ragnar Wisl?ff wrote: > Siterer "Jeremy T. Bouse" : > > > Actually I am already working on them... I've got two sets of packages > > that I'm working to maintain at this time... One with the 1.3.1 release > > of the plugins and another a snapshot of the CVS repository... > > OK. I've done a rebuild of the netsaint-plugins, but would like to have the > latest and greatest. I did not realise there was a 1.3.1. 
We aim for stability, > so I guess 1.3.1 is better than CVS ;-) > Well if you're wanting to monitor any IPv6 enabled devices you'll need the CVS version... That is why I have the snapshot version that I have been working to update about once a month as I haven't automated the process too much... 1.3.1 was the most recent bug fix version and what I'm hoping to get into the Debian mirror as soon as I get Turbo (Nagios package maintainer) to either accept my updates or authorize me to NMU the changes myself... > > > > I have not yet released and uploaded them to the Debian mirrors yet as > > I'm trying to coordinate some fixes/updates to the Nagios Debian package > > which is maintained by Turbo... I've got a diff patch to send to him > > but I'm still waiting on reply from my last email to him... > > > > Oh, is the official maintainer also working on a Nagios core package for woody? > I know he can't get them into woody, but he has made them anyway? That would be > very nice. I have rebuilt them with some tweaks to suit us, but to have the > maintainer packages would be the best. > No he isn't working on the Nagios package for woody as he can't get it included into woody; However I have been working on it as my Nagios monitoring machines are still running woody... Part of my fixes were to modify the Build-depends so that it will build against stable and unstable... I also had to tweak the Depends, Suggests, Provides, Conflicts in the debian/control to allow it to better replace NetSaint and it's plugins... > > I can provide you with the URL for apt-get that I have the current > > packages I'm testing located but if you're running the default package > > for Nagios by Turbo the configuration will fail... > > OK, but that's just a little change to the dependencies to change that in the > Nagios package? If you have nagios-packges that go with the plugins we can test > them out as well. 
>

I have a Nagios 1.1-1.1 package compiled for both Woody and Sid that has my
fixes I'm trying to get approved by Turbo, as I'm not the Nagios maintainer,
along with Nagios-plugins 1.3.1-0 and Nagios-plugins-snapshot
1.3.99-0.cvs.2003.07.11, all reachable via apt-get by adding one of the
following:

deb http://people.debian.org/~jbouse/nagios/ sid/i386/
deb http://people.debian.org/~jbouse/nagios/ woody/i386/

Also the source DEBs are available via:

deb-src http://people.debian.org/~jbouse/nagios/ sid/source/
deb-src http://people.debian.org/~jbouse/nagios/ woody/source/

Again though, the Nagios packages up there are modified by me, not the Nagios
package maintainer. The plugin packages however are by me, so report any
problems to me directly as they are not currently in the Debian mirror and thus
the BTS would not handle the requests properly...

> Thanks for your quick reply.
>

Not a problem... Wife says I live on the computer so what am I to do :)

Regards,
Jeremy


From mm at elabnet.de Fri Jul 18 11:38:15 2003
From: mm at elabnet.de (Michael Markstaller)
Date: Fri Jul 18 11:38:15 2003
Subject: [Nagiosplug-devel] Nagios-plugin to check Cisco Interface input queue
Message-ID: <246BE4BBD2754248AD5C14E535004ABA10207E@elab4.elabnet.com>

Hi,

I just built myself a lousy but working quickhack to monitor whether an
interface input queue on a Cisco router grows above a specific value. Maybe
somebody finds it useful. You might wonder what current exploit this is for; I
just want to make sure, and also monitor it in general for the future, as I had
other bugs where the queue grew that way..

I found no snmp-value to poll this out so I used a (very lousy) thing: rsh
wrapped in a shell-script. I already use rsh to poll NAT/Inspect-stats for mrtg
on several routers - I'm sure there are 100 better ways to do it, but this works
for me so far..
Sometime I'd tend to bring this thing into a perlscript using at least telnet
or even better ssh, but it's Friday evening and the Biergarten is waiting ;)

Maybe this shouldn't go into the contrib or wider production without being
re-written with more time and another concept than rsh, as I don't want to
distribute lousy things ;)

Quick-HowTo:

- enable rsh on the router:
--- cut ---
ip rcmd rsh-enable
ip rcmd remote-host nagios nagios enable
! below is only for testing with root-account
ip rcmd remote-host nagios root enable
--- cut ---

- services.cfg
--- cut ---
define service{
        use                     xyz-template
        host_name               xyz-host
        service_description     IfInQueue
        normal_check_interval   30
        check_command           check_ifqueue!RSH-USER!Serial0/0:0!15!30
        }
--- cut ---

- checkcommands.cfg
--- cut ---
# 'check_ifqueue' command definition
define command{
        command_name    check_ifqueue
        command_line    $USER1$/check_ifqueue.sh $HOSTADDRESS$ $ARG1$ "$ARG2$" $ARG3$ $ARG4$
        }
--- cut ---

- the script check_ifqueue.sh should go into the nagios-plugins-folder, set
  correct rights so nagios can execute it
--- cut ---
#! /bin/sh

Exe=/usr/bin/rsh
PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin

PROGNAME=`basename $0`
PROGPATH=`echo $0 | sed -e 's,[\\/][^\\/][^\\/]*$,,'`
REVISION=`echo '$Revision: 1.0 $' | sed -e 's/[^0-9.]//g'`

. $PROGPATH/utils.sh

print_usage() {
        echo "Usage: $PROGNAME HOSTNAME RSH-USERNAME IF-NAME WARN-LEVEL CRIT-LEVEL"
}

print_help() {
        print_revision $PROGNAME $REVISION
        echo ""
        print_usage
        echo ""
        echo "This plugin checks the given cisco interface input queue by using rsh."
        echo "quickhack 17. July 2003 Michael Markstaller / ElabNET"
        echo "on router: #ip rcmd rsh-enable# and #ip rcmd remote-host nagios nagios enable#"
        echo ""
        support
        exit 3
}

rshUser=$2
ifname=$3
WARN=$4
CRIT=$5

case "$1" in
        --help)
                print_help
                exit 0
                ;;
        -h)
                print_help
                exit 0
                ;;
        --version)
                print_revision $PROGNAME $REVISION
                exit 0
                ;;
        -V)
                print_revision $PROGNAME $REVISION
                exit 0
                ;;
        *)
                # A real check needs all five arguments. Checking here rather
                # than before the case keeps --version/-V reachable.
                if [ "$#" -ne 5 ]; then
                        print_help
                fi
                ifInQlen=`$Exe -l $rshUser $1 "show int $ifname | inc Input" | cut -d ':' -f 2 | cut -d '/' -f 1 | awk '{print $1}' | head -n 1`
                if test "$ifInQlen" -gt "$CRIT"; then
                        echo "CRITICAL - InQueue of $ifname is $ifInQlen (gt $CRIT)"
                        exit 2
                elif test "$ifInQlen" -gt "$WARN"; then
                        echo "WARNING - InQueue of $ifname is $ifInQlen (gt $WARN)"
                        exit 1
                elif test "$ifInQlen" -le "$WARN"; then
                        # -le rather than -lt, so a value equal to WARN is OK
                        # instead of falling through to UNKNOWN
                        echo "OK - InQueue of $ifname is $ifInQlen"
                        exit 0
                else
                        # reached when the router output is not a number
                        echo "UNKNOWN - Output $ifInQlen - in $1 $2 $3 $4 $5"
                        exit 3
                fi
                ;;
esac
--- cut ---

Michael


From cal at calevans.com Fri Jul 18 14:43:10 2003
From: cal at calevans.com (Cal Evans)
Date: Fri Jul 18 14:43:10 2003
Subject: [Nagiosplug-devel] Yet another plugin to check MS SQL.
Message-ID: <2002.192.168.0.90.1058564463.squirrel@192.168.0.150>

This one does not require sqsh. This has not been tested with SQL 7. I would
appreciate any feedback.

Also, this has not been tested extensively at all. I'm in the process of
testing it as I can but would appreciate feedback from anyone who has tested
it.

=C=
*
* Cal Evans
* http://www.christianperformer.com
* Stay plugged in to your audience!
*

#!/bin/sh
#
# Description :
# Checks the status of Microsoft SQL Server 2000. (Possibly other versions)
# Copyright (c) 2003 Cal Evans
#
# License : GPL
#
# Props to :
# Tom DeBlende for the core concepts.
# Jerome Tytgat for the verify_deps code
# Scott Lambert for an excellent example of how to write these things (check_adptraid.sh)
#
# Requirements :
# FreeTDS (http://www.freetds.org/)
#
# Version 1.0 : 07/18/2003
# Initial release.
#
# TODO:
# Parameterize the tmp directory
# Add a verbose mode that gives all output for testing.
#
###################################################################################################################

PROGNAME=`basename $0`
PROGPATH=`echo $0 | sed -e 's,[\\/][^\\/][^\\/]*$,,'`
REVISION=`echo '$Revision: 1.3 $' | sed -e 's/[^0-9.]//g'`

. $PROGPATH/utils.sh

#
# Initialize a few variables
#
HOSTNAME=$1
USERLOGIN=$2
PASSWORD=$3
SERVER=$4
OUTPUT=''
EXITCODE="3"

print_usage() {
        echo "Usage: $PROGNAME"
        echo "check_mssql.sh [host] [username] [password]"
        echo " "
        echo "Options:"
        echo "[host] "
        echo "The name or IP address of the server to check."
        echo " "
        echo "[username] "
        echo "The user login to use when connecting to the server."
        echo " "
        echo "[password]"
        echo "The password for the specified user."
        echo " "
        echo "Example:"
        echo "check_mssql dbserver foo bar"
}

print_help() {
        print_revision $PROGNAME $REVISION
        echo ""
        print_usage
        echo " "
        echo "check_mssql checks Microsoft SQL Server connectivity. It works with versions 7 and 2000."
        echo "You need FreeTDS (http://www.freetds.org/) to connect to the SQL server."
        echo " "
        echo " "
        support
        exit 0
}

verify_dep() {
        needed="bash tsql cat grep mktemp uniq tail"
        for i in `echo $needed`
        do
                # any non-zero status from type means the command is missing
                type $i > /dev/null 2>&1
                if [ $? -ne 0 ]
                then
                        echo "I am missing an important component : $i"
                        echo "Cannot continue, sorry, try to find the missing one..."
                        exit 3
                fi
        done
}

case "$1" in
        --help)
                print_help
                exit 0
                ;;
        -h)
                print_help
                exit 0
                ;;
        --version)
                print_revision $PROGNAME $REVISION
                exit 0
                ;;
        -V)
                print_revision $PROGNAME $REVISION
                exit 0
                ;;
        *)
                verify_dep

                # NOTE: four arguments are required here, although the usage
                # text above documents only three.
                if [ "$#" -ne 4 ]; then
                        echo "You did not supply enough arguments."
                        exit "3"
                fi

                TEMPFILE=`mktemp /tmp/$HOSTNAME.XXXXXX`
                echo "DECLARE @iUsers int," > $TEMPFILE
                echo " @iAgeInMinutesOfOldestProcess int," >> $TEMPFILE
                echo " @iMaxCPU int," >> $TEMPFILE
                echo " @iMaxIO int," >> $TEMPFILE
                echo " @iBlocks int" >> $TEMPFILE
                echo "SELECT @iUsers = COUNT(*) FROM sysprocesses WHERE spid > 50" >> $TEMPFILE
                echo "SELECT @iAgeInMinutesOfOldestProcess = DATEDIFF( mi, MIN( last_batch )," >> $TEMPFILE
                echo " GETDATE() ) FROM sysprocesses WHERE spid > 50" >> $TEMPFILE
                echo "SELECT @iMaxCPU = MAX( cpu ) FROM sysprocesses WHERE spid > 50" >> $TEMPFILE
                echo "SELECT @iMaxIO = MAX( physical_io ) FROM sysprocesses WHERE spid > 50" >> $TEMPFILE
                echo "SELECT @iBlocks = COUNT(*) FROM sysprocesses WHERE blocked > 0" >> $TEMPFILE
                echo "SELECT ('Users = '+convert(varchar,@iUsers)+" >> $TEMPFILE
                echo " ' Age In Minutes Of Oldest User = '+convert(varchar,@iAgeInMinutesOfOldestProcess) +" >> $TEMPFILE
                echo " ' Max CPU User = '+convert(varchar,@iMaxCPU)+" >> $TEMPFILE
                echo " ' Max IO User = '+convert(varchar,@iMaxIO)+" >> $TEMPFILE
                echo " ' TotalBlocks = '+convert(varchar,@iBlocks)" >> $TEMPFILE
                echo " ) as out" >> $TEMPFILE
                echo "go" >> $TEMPFILE

                RESULTFILE=`mktemp /tmp/$HOSTNAME.XXXXXX`
                tsql -S $HOSTNAME -U $USERLOGIN -P $PASSWORD < $TEMPFILE 2>/dev/null | grep -v ">" | tail -n1 > $RESULTFILE

                if [ ! -s $RESULTFILE ]; then
                        OUTPUT="CRITICAL - Could not connect to SQL server running on $HOSTNAME."
                        EXITCODE='2'
                else
                        OUTPUT="$(cat $RESULTFILE)"
                        EXITCODE='0'
                fi

                # Cleaning up.
                rm -f $TEMPFILE $RESULTFILE
                echo $OUTPUT
                # exit with the same variable that both branches above set
                exit $EXITCODE
                ;;
esac


From noreply at sourceforge.net Fri Jul 18 18:35:06 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Fri Jul 18 18:35:06 2003
Subject: [Nagiosplug-devel] [ nagiosplug-Patches-774019 ] Major check_oracle reorg - + new getopts style arg handling
Message-ID:

Patches item #774019, was opened at 2003-07-18 21:34
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=397599&aid=774019&group_id=29880

Category: Enhancement
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: John Marquart (vaix)
Assigned to: Nobody/Anonymous (nobody)
Summary: Major check_oracle reorg - + new getopts style arg handling

Initial Comment:
This is my attempt at re-organizing the check_oracle plugin to enable easier
modifications/updates - as well as 1 new feature (the reason for my
re-organization).

Reorg:
1) Command line argument handling has been supplemented:
   A) getopts style handling has been implemented
   B) old style args still work - however require a leading '--'
2) All "checks" (i.e. cache, tns, login, etc) have been turned into functions.
   - this enables argument handling to be independent of actual check code
3) All "checks" - now use set variables instead of passed arguments
   - this makes the code easier to read & manage
4) "Chaining" - w/ changes 1-3, it is now easier to combine options into
   chains - rather than individual checks.

New:
Extent checking - this new option (-e || --extents) is the first real exercise
of the above mentioned new functionality. When invoked as -e (w/ no other
getopts style checks) or --extents - it checks the entire oracle instance for
tables that cannot be extended - due to insufficient tablespace.
When invoked as -e w/ the -T argument (-T = tablespace check) - it checks the
tablespace for available space - and then checks to see if the tablespace in
question can be extended.

----

This is a pretty large change - it nearly doubles the existing code base
(partially in backwards compatibility support). I have uploaded the entire new
"oracle.sh" - since the diff is larger. If required - I can break this into
smaller iterative patches - and submit them sequentially en masse.
(Unfortunately my zeal prevented me from doing it that way to begin w/)

I really think that this re-organization - or a not dissimilar one - will
encourage / enable easier addition of functionality as well as making
debugging / modification of existing functions simpler.

Please let me know what you think - I intend to send this to the developer
list as well (as specified in the guidelines).

-john marquart (aka vaix)

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=397599&aid=774019&group_id=29880


From rouilj at cs.umb.edu Fri Jul 18 19:27:07 2003
From: rouilj at cs.umb.edu (John P. Rouillard)
Date: Fri Jul 18 19:27:07 2003
Subject: [Nagiosplug-devel] Re: [Nagios-devel] Adding more advanced correlation to nagios with sec (any interest?)
In-Reply-To: Your message of "Fri, 11 Jul 2003 15:04:16 +1000." <20030711150412.B84683@IPAustralia.Gov.AU>
Message-ID: <200307190225.h6J2PtcM009392@mx1.cs.umb.edu>

Hello. I apologize for taking so long to respond to you. Things at work have
been really hectic.

In message <20030711150412.B84683 at IPAustralia.Gov.AU>,
Stanley Hopcroft writes:
>On Sat, Jun 28, 2003 at 03:48:16PM -0400, John P. Rouillard wrote:
>> However, I have some things that I want to do that are not easily
>> done within nagios. E.G.
>>
>> If a system jumpstart is in progress, ignore warnings about high
>> interface usage (on one interface), or dropped packets (on the
>> hub).
>> >> If an index operation of the HTTP server is in progress, ignore >> warnings about the http interface being slow. >> >> I also want to show a host/service down if a given system went down, >> (as determined by a syslog message) but I want the report done >> ONLY if the system isn't back up in 5 minutes. >> >> Note that none of the rebooting, indexing, or jumpstarting operations >> occur at fixed times, so I can't schedule these in advance. > [...] >However, please would you spell what events and their origin are >correlated by Sec to avoid spurious alarms in these cases - especially >the first two. Is Sec correlating plugin failures with syslog messages ? For the first, I have a tftp daemon that logs via syslog. A child sec (invoked via the spawn action) watches the /var/adm/messages file, and sends a message "jumpstart_in_progress_on_16_net" to the parent sec that is handling the nagios alerts. This causes suppression of the warnings until a target file is retrieved near the end of the jumpstart which resets the context allowing the events to pass. The last problem is handled by generating a 5 minute HOSTNAME_REBOOTING context/flag when the reboot syslog message is received. The existance of this context enables a suppress rule that gobbles all of the events for the rebooting device. After 5 minutes or the arrival of a "system up" syslog entry, the context is destroyed. If the host is still down nagios's next poll of the device will cause sec to pass the events to nagios for reporting. The HTTP index program is started from a shell script that sends a trap when the indexing operation starts and stops. >> I have a method of integrating sec >> into nagios to handle these issues and more. >> >> Using sec to process traps (or other passive checks) is straight >> forward. 
The trap collector running from snmptrapd just dumps the trap >> report (formatted as a nagios passive service check) into sec's input >> fifo and then sec processes it, and reports it (if needed) into the >> nagios.cmd pipe. [...] >Sec has become for me, the standard way of providing event and trap >handlers. > >For example, I have a general host and service handler that updates a >MySQL DB with the outage interval. To do this it must retain state (and >does so with a Perl hash tied to a DB file) so it can determine if there >has been a transition and if so, how long it was. > >This would probably be easier to do with Sec contexts. One way to handle it is to store the start and stop times for the event in a context's event store using the add command with %u (the current time as number of seconds since Jan 1 1970). Then report these to an external shell script, and have it subtract them to get duration of the outage. With the ability to trigger perl programs, you could probably do it all within sec and remove the need for an external program. >> However for polled items, it more difficult. I don't want to have a >> flapping service where the plugin determines that there is a problem, >> nagios reacts to that, and then sec reacts to that (being fed its info >> by an event handler) by clearing the service because sec determines >> that there is not yet a problem. This leads to a flapping service as >> nagios and sec disagree on what is a true problem, and leads to >> spurious notifications because I can't put in a high >> max_check_attempts and have nagios respond to sec when it has a real >> problem (unless I define yet another service yech). >> >> What I did was write a plugin in perl (sec_filter) that runs the >> nagios command (sort of like check_ssh). It always passes the output >> of the plugin to sec's input pipe. 
However, depending on the flags >> given to the sec_filter script, it will exit: >> >> with an "ignore OK" code, and no output >> with an "ignore ERROR" code, and no output >> with the exit code and output of the plugin >> >> I have chosen exit status of 5 for "ignore OK" and 6 for "ignore >> ERROR". (It looks like code 4 is used internally for pending states, >> and I didn't want to use that number hence my choice of 5 and 6.) >> >> The reason for these new codes is to make nagios not change any status >> for the polled service based on the poll. The new status will be sent >> to it by a passive check command generated from sec. >> >> That is I want nagios to be a (almost) dumb poller and to let sec >> filter all the data. > >If I understand correctly, the proposal is > >1 When Nag schedules a service check, of any and all service checks, it > in fact execs sec_filter with the real plugin name and flags that > determine sec_filters behaviour by allowing it to either Correct. > 1.1 treat the service as a normal Nag service (a 'polled' service, for > which no event correlation by Sec is necessary) Almost. It may or may not be correlated by Sec. It's just that you want the initial report to be recorded in/acted on by Nag. Sec may still clear the event. > 1.2 treat the service as requiring Sec processing to accurately > determine the service state. Sec will get the plugin output and > use this with other Sec inputs and Sec context to determine the > service state Correct. >2 Sec_filter writes > > 2.1 For those services requiring Sec, I would say: for those services (service events) being reported only via Sec, > 2.1.1 An event to Sec > > 2.1.2 One of the new status codes to Nagios > > 2.2 Otherwise, in the case of 'polled' services, the usual Nag status > codes and plugin output are written to Nags input queue Correct. >3 Nag processes former status codes with no changes (i.e. 
CRITICAL leads >to the check being repeated retry_interval and if the state persists to >Notification), but those with the new code of IGNORE_ERROR are >recognised as requiring retry at the retry_interval but _no_ other >processing. Exactly. However, it looks like I don't have that quite down in my code yet. I sometimes have services dropping into an unknown state when sec is suppressing a report, but I am not sure why. >4 Sec will eventually submit a PROCESS_SERVICE_CHECK_RESULT to the Nag >input queue (for the services that have formerly been reported as >IGNORE_\w+. Yes. It will usually submit the OK state, but there is no requirement for that. Maybe this is where the unknown state is coming from? Is there a default "freshness" on polled items that results in an unknown state? >My remarks are > >1 This _may_ be better done in the Nag core. Nag could be equipped with > configuration directives for Sec processing so that Nag itself could > submit the event to Sec (rather than the plugin sec_filter). This > saves an extra fork. I agree with this. It could be generalized to allow diversion of the plugin's report to an arbitrary file/pipe/program in addition to or instead of sending it to nagios. Another network monitoring package that I use has no method of intercepting the events between the time they are generated and the time they are acted upon by the core. This leads to a lot of useless event traffic running through the system. >2 I am not sure how your proposal relates to the embedded Perl stuff > (where each plugin is called as a function from the Nagios address > space). > I currently use a subroutine call in sec_filter to lock the sec input file so I don't screw up the data. This is probably unnecessary since the size of the data is small enough that it should be an atomic write, but I prefer to be safe. However sec_filter would probably have to be modified to be embedded perl safe. 
> This is probably trivial since sec_filter simply becomes another Perl > plugin that Nag calls (and sec_filter 'requires' the real Perl plugin so > that re-compilation of the real plugin is avoided Hmm, will that work, does require keep it in the same name/function space? Also the sec_filter would have to be rewritten to detect that it is running a perl script, and require it's argument. Currently sec_filter can run any nagios plugin. I think this is an argument for putting the diversion mechanism in the core. >3 I like the bit about making Sec processing optional (depending on the > options specified to sec_filter) I see two uses for optional processing: you may want use the plugin output to affect correlation of other services and not have itself processed/correlated by sec (as you mention above). This allows service, cluster and other dependencies to be implemented in sec rather then in nagios. Triggering Nag's event handlers (especially in soft state). While you can run commands from Sec as well, the soft/hard state and number of calls is handled better from Nagios. These event handlers can be written to provide additional info to sec. It will result in flapping (flipping) service states as nagios and sec disagree about the state of the service, but with properly set retries, and nagios's soft and hard states it may be useful. >For me, I am quite happy with Nags processing of most services. I can't >say that the scenarios you mention are problematic for me. However, I >would very much like the option of event correlation when required. > >> I have set it up so that sec itself is a passive nagios service, and >> automatically sends notifications to nagios, as well as nagios being >> able to poll the sec service if its data gets stale. >> >> So is anybody interested in my mods (about 30 lines) to nagios to >> support this, and my plugin? > >This needs the comment of the Nagios developer. It sounds attractive to >me however. 

I haven't seen any signs of interest from the Nagios developer(s). I'm not
even sure if they are interested in/know of this patch. As I said I still have
a few issues to work out, and I think the developer(s) could do a cleaner
implementation of my patch to add the ignore OK and ignore ERROR
functionality. Obviously, implementing this in the core is something the
developers would need to be involved in, since it is a larger job than what I
have done.

>I am sorry if these remarks are stupid or based on misunderstanding. I
>think I would need to see the mods for a better (marginally) response.
>
>It may simply be worth posting them to Nagios-Devel. AFAIK this is not
>on the Nag road map so it simply may be a golden opportunity for a big
>benefit.

Your remarks are correct. I'll try to pull the patches and things together in
the next couple of weeks. It's still got that annoying unknown state issue,
but it doesn't look like I will be able to do more work on it.

>Finally, you have identified a good area for future development. Root
>cause analysis and event correlation is one area that commercial
>products can claim superiority.

The funny part is that sec was written as a lower cost alternative to HPOV's
correlation engine. I have had at least one report that it is easier to use
than HPOV's commercial tool.

				-- rouilj
John Rouillard
===========================================================================
My employers don't acknowledge my existence much less my opinions.
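
The wrapper idea discussed in the message above (run the real plugin, always
hand its result to sec, and optionally hide the result from Nagios with the
"ignore OK" and "ignore ERROR" codes 5 and 6) can be sketched roughly as
follows. This is only an illustration in shell, not the actual sec_filter,
which the message says is written in Perl. The function name, the mode words
"ignore" and "passthru", and the "STATUS OUTPUT" line format written to the
sink are all assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical sketch of the wrapper described above; not the real sec_filter.
IGNORE_OK=5       # "ignore OK" code chosen in the message above
IGNORE_ERROR=6    # "ignore ERROR" code chosen in the message above

# sec_filter MODE SINK PLUGIN [ARGS...]
#   MODE  "ignore"   -> print nothing, return 5 or 6 so Nagios leaves the
#                       service state alone and waits for sec
#         "passthru" -> behave like the wrapped plugin
#   SINK  file or FIFO that sec reads (the path is up to the caller)
sec_filter() {
        mode=$1
        sink=$2
        shift 2

        # Run the real plugin, keeping both its output and its exit status.
        status=0
        output=`"$@"` || status=$?

        # Always hand the result to sec (assumed format: "STATUS OUTPUT").
        echo "$status $output" >> "$sink"

        case "$mode" in
                ignore)
                        if [ "$status" -eq 0 ]; then
                                return $IGNORE_OK
                        else
                                return $IGNORE_ERROR
                        fi
                        ;;
                *)
                        echo "$output"
                        return $status
                        ;;
        esac
}
```

A call such as `sec_filter ignore /path/to/sec.input ./check_http -H www` would
then return 5 or 6 with no output, while `sec_filter passthru ...` reports
exactly what the plugin reported. Writing to a real FIFO blocks until sec has
it open for reading, and an unpatched Nagios would presumably not treat 5 and 6
specially, which is what the core patch mentioned above is for.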


From noreply at sourceforge.net Sat Jul 19 07:06:04 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Sat Jul 19 07:06:04 2003
Subject: [Nagiosplug-devel] [ nagiosplug-New Plugins-774200 ] check_mssql.sh
Message-ID:

New Plugins item #774200, was opened at 2003-07-19 10:05
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=541465&aid=774200&group_id=29880

Category: Application monitor
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Cal Evans (calevans)
Assigned to: Nobody/Anonymous (nobody)
Summary: check_mssql.sh

Initial Comment:
Yet another plugin to check mssql. This one only requires freetds, not sqsh.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=541465&aid=774200&group_id=29880


From cal at calevans.com Sat Jul 19 07:21:22 2003
From: cal at calevans.com (Cal Evans)
Date: Sat Jul 19 07:21:22 2003
Subject: [Nagiosplug-devel] check_mssql.sh v1.1
Message-ID: <3888.192.168.0.90.1058623252.squirrel@192.168.0.150>

This one has had more testing. I fixed a problem with it not properly
reporting errors.

=C=
*
* Cal Evans
* http://www.christianperformer.com
* Stay plugged in to your audience!
*

#!/bin/sh
#
# Description :
# Checks the status of Microsoft SQL Server 2000. (Possibly other versions)
# Copyright (c) 2003 Cal Evans
#
# License : GPL
#
# Special Thanks to :
# Tom DeBlende for the core concepts.
# Dennis Deming for the nifty TSQL code
# Jerome Tytgat for the verify_deps code
# Scott Lambert for an excellent example of how to write these things
# (check_adptraid.sh)
#
# Requirements :
# FreeTDS (http://www.freetds.org/)
#
# Version 1.0 : 07/18/2003
# Initial release.
#
# Version 1.1 : 07/19/2003
# Fixed the error checking so that a non-connect will be properly reported.
#
# Fixed the parameter check so that 3 parameters are now required but 4 won't
# cause a problem.
#
# Minor cleanups.
#
# TODO:
# Parameterize the tmp directory
# Add a verbose mode that gives all output for testing.
# Find a better parameter parsing routine.
#
################################################################################

PROGNAME=`basename $0`
PROGPATH=`echo $0 | sed -e 's,[\\/][^\\/][^\\/]*$,,'`
REVISION=`echo '$Revision: 1.3 $' | sed -e 's/[^0-9.]//g'`

. $PROGPATH/utils.sh

#
# Initialize a few variables
#
HOSTNAME=$1
USERLOGIN=$2
PASSWORD=$3
SERVER=$4
OUTPUT=''
# EXITCD is the variable the final "exit" uses; default to UNKNOWN
EXITCD="3"
ERRORMSG="There was a problem connecting to the server"

print_usage() {
        echo "Usage: $PROGNAME"
        echo "check_mssql.sh [host] [username] [password]"
        echo " "
        echo "Options:"
        echo "[host] "
        echo "The name or IP address of the server to check."
        echo " "
        echo "[username] "
        echo "The user login to use when connecting to the server."
        echo " "
        echo "[password]"
        echo "The password for the specified user."
        echo " "
        echo "Example:"
        echo "check_mssql dbserver foo bar"
}

print_help() {
        print_revision $PROGNAME $REVISION
        echo ""
        print_usage
        echo " "
        echo "check_mssql checks Microsoft SQL Server connectivity. It works with versions 7 and 2000."
        echo "You need FreeTDS (http://www.freetds.org/) to connect to the SQL server."
        echo " "
        echo " "
        support
        exit 0
}

verify_dep() {
        needed="bash tsql cat grep mktemp uniq tail"
        for i in `echo $needed`
        do
                # any non-zero status from type means the command is missing
                type $i > /dev/null 2>&1
                if [ $? -ne 0 ]
                then
                        echo "I am missing an important component : $i"
                        echo "Cannot continue, sorry, try to find the missing one..."
                        exit 3
                fi
        done
}

case "$1" in
        --help)
                print_help
                exit 0
                ;;
        -h)
                print_help
                exit 0
                ;;
        --version)
                print_revision $PROGNAME $REVISION
                exit 0
                ;;
        -V)
                print_revision $PROGNAME $REVISION
                exit 0
                ;;
        *)
                verify_dep

                if [ "$#" -lt 3 ]; then
                        echo "This plugin requires 3 arguments."
                        exit "3"
                fi

                TEMPFILE=`mktemp /tmp/$HOSTNAME.XXXXXX`
                echo "DECLARE @iUsers int," > $TEMPFILE
                echo " @iAgeInMinutesOfOldestProcess int," >> $TEMPFILE
                echo " @iMaxCPU int," >> $TEMPFILE
                echo " @iMaxIO int," >> $TEMPFILE
                echo " @iBlocks int" >> $TEMPFILE
                echo "SELECT @iUsers = COUNT(*) FROM sysprocesses WHERE spid > 50" >> $TEMPFILE
                echo "SELECT @iAgeInMinutesOfOldestProcess = DATEDIFF( mi, MIN( last_batch )," >> $TEMPFILE
                echo " GETDATE() ) FROM sysprocesses WHERE spid > 50" >> $TEMPFILE
                echo "SELECT @iMaxCPU = MAX( cpu ) FROM sysprocesses WHERE spid > 50" >> $TEMPFILE
                echo "SELECT @iMaxIO = MAX( physical_io ) FROM sysprocesses WHERE spid > 50" >> $TEMPFILE
                echo "SELECT @iBlocks = COUNT(*) FROM sysprocesses WHERE blocked > 0" >> $TEMPFILE
                echo "SELECT ('Users = '+convert(varchar,@iUsers)+" >> $TEMPFILE
                echo " ' Age In Minutes Of Oldest User = '+convert(varchar,@iAgeInMinutesOfOldestProcess) +" >> $TEMPFILE
                echo " ' Max CPU User = '+convert(varchar,@iMaxCPU)+" >> $TEMPFILE
                echo " ' Max IO User = '+convert(varchar,@iMaxIO)+" >> $TEMPFILE
                echo " ' TotalBlocks = '+convert(varchar,@iBlocks)" >> $TEMPFILE
                echo " ) as out" >> $TEMPFILE
                echo "go" >> $TEMPFILE

                RESULTFILE=`mktemp /tmp/$HOSTNAME.XXXXXX`
                ERRORFILE=`mktemp /tmp/$HOSTNAME.XXXXXX`
                tsql -S $HOSTNAME -U $USERLOGIN -P $PASSWORD < $TEMPFILE 2>$ERRORFILE >$RESULTFILE

                # Check for the error message in the error file.
                # SUCCESS holds the number of matching error lines, so
                # anything other than 0 means the connection failed.
                SUCCESS="$(grep "$ERRORMSG" $ERRORFILE | wc -l)"

                if [ "$SUCCESS" -ne 0 ]; then
                        OUTPUT="Error connecting to server running on $HOSTNAME."
                        EXITCD='2'
                        # If we put in a verbose mode, dump the entire error file here.
                else
                        OUTPUT="$(cat $RESULTFILE | grep -v ">" | tail -1)"
                        EXITCD='0'
                fi

                # Clean up.
rm -f $TEMPFILE $RESULTFILE $ERRORFILE echo $OUTPUT exit $EXITCD ;; esac From news at villa-kuip.kabel.utwente.nl Sat Jul 19 08:03:06 2003 From: news at villa-kuip.kabel.utwente.nl (Willempie) Date: Sat Jul 19 08:03:06 2003 Subject: [Nagiosplug-devel] Updated check_disk_smb support annonymous logins Message-ID: Hi, I've updated the check_disk_smb plugin to also allow annonymous logins. It is switched with the -N option (also used in smbclient), and in that case --pass and --user are not used. Can this be included in the plugins? Wim The diff file (with "# $Id: check_disk_smb.pl,v 1.8.2.1 2003/07/02 15:52:23 tonvoon Exp $"): $> diff check_disk_smb.new check_disk_smb 1c1 < #! /usr/bin/perl -w --- > #!/usr/bin/perl -w 26c26 < use vars qw($opt_V $opt_h $opt_H $opt_s $opt_W $opt_u $opt_p $opt_w $opt_c $opt_N $verbose); --- > use vars qw($opt_V $opt_h $opt_H $opt_s $opt_W $opt_u $opt_p $opt_w $opt_c $verbose); 28c28 < use lib "/usr/local/nagios/libexec" ; --- > use lib utils.pm ; 51d50 < "N" => \$opt_N, "annonymous" => \$opt_N, 126,130c125 < if( $opt_N ) { < $res = qx/$smbclient \/\/$host\/$share -W $workgroup -N $smbclientoptions -c ls/; < }else { < $res = qx/$smbclient \/\/$host\/$share $pass -W $workgroup -U $user $smbclientoptions -c ls/; < } --- > $res = qx/$smbclient \/\/$host\/$share $pass -W $workgroup -U $user $smbclientoptions -c ls/; 132,138c127,128 < if( $opt_N ) { < print "$smbclient " . "\/\/$host\/$share" ." -N $smbclientoptions -c ls\n" if ($verbose); < $res = qx/$smbclient \/\/$host\/$share -N $smbclientoptions -c ls/; < }else { < print "$smbclient " . "\/\/$host\/$share" ." $pass -U $user $smbclientoptions -c ls\n" if ($verbose); < $res = qx/$smbclient \/\/$host\/$share $pass -U $user $smbclientoptions -c ls/; < } --- > print "$smbclient " . "\/\/$host\/$share" ." 
$pass -U $user $smbclientoptions -c ls\n" if ($verbose); > $res = qx/$smbclient \/\/$host\/$share $pass -U $user $smbclientoptions -c ls/; 243c233 < -w -c [-W ] [-N]\n"; --- > -w -c [-W ]\n"; 265,266d254 < -N, --annonymous < Log in annonymous From news at villa-kuip.kabel.utwente.nl Sat Jul 19 09:31:07 2003 From: news at villa-kuip.kabel.utwente.nl (Willempie) Date: Sat Jul 19 09:31:07 2003 Subject: [Nagiosplug-devel] check_by_ssh support user-defined timeout reply Message-ID: Hi, I've updated the check_by_ssh plugin to make the reply it gives at timeout configurable with the -r option. It's supporting currently STATE_CRITICAL (2) and STATE_UNKNOWN (3). The option -r takes the integer value of STATE_CRITICAL or STATE_UNKNOWN. Wim * $Id: check_by_ssh.c,v 1.9 2003/01/29 06:15:32 kdebisschop Exp $ $> diff check_by_ssh.c.new check_by_ssh.c.old 41c41 < void timeout_alarm_handler_reply (int); --- > 50d49 < int timeout_reply = 2; 79c78 < if (signal (SIGALRM, timeout_alarm_handler_reply) == SIG_ERR) { --- > if (signal (SIGALRM, popen_timeout_alarm_handler) == SIG_ERR) { 188d186 < {"state_timeout", required_argument, 0, 'r'}, 212c210 < getopt_long (argc, argv, "Vvh46ft:H:O:p:i:u:l:C:n:s:r:", long_options, --- > getopt_long (argc, argv, "Vvh46ft:H:O:p:i:u:l:C:n:s:", long_options, 215c213 < c = getopt (argc, argv, "Vvh46ft:H:O:p:i:u:l:C:n:s:r:"); --- > c = getopt (argc, argv, "Vvh46ft:H:O:p:i:u:l:C:n:s:"); 249,253d246 < case 'r': /* timeout reply */ < if (!is_integer (optarg)) < usage2 ("timeout reply must be an integer", optarg); < timeout_reply = atoi (optarg); < break; 374,375d366 < "-r, --state_timeout=STATE\n" < " reply status when timeout appears [optional]\n" 404c395 < " [-n name] [-s servicelist] [-O outputfile] [-p port] [-r state_timeout]\n" --- > " [-n name] [-s servicelist] [-O outputfile] [-p port]\n" 408,421d398 < < void timeout_alarm_handler_reply (int signo) < { < if (signo == SIGALRM) { < kill (childpid[fileno (child_process)], SIGKILL); < if ( 
 timeout_reply==3 ) {
< printf ("UNKNOWN - Plugin timed out after %d seconds\n",timeout_interval);
< exit (STATE_UNKNOWN);
< }else {
< printf ("CRITICAL - Plugin timed out after %d seconds\n",timeout_interval);
< exit (STATE_CRITICAL);
< }
< }
< }

From noreply at sourceforge.net Sun Jul 20 07:04:18 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Sun Jul 20 07:04:18 2003
Subject: [Nagiosplug-devel] [ nagiosplug-Bugs-774569 ] wrong spec was tagged for r1_3_1
Message-ID:

Bugs item #774569, was opened at 2003-07-20 10:03
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=397597&aid=774569&group_id=29880

Category: None
Group: CVS
Status: Open
Resolution: None
Priority: 4
Submitted By: Karl DeBisschop (kdebisschop)
Assigned to: Karl DeBisschop (kdebisschop)
Summary: wrong spec was tagged for r1_3_1

Initial Comment:
need to move tag to 1.8.2.1 for nagiosplug.spec.in

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=397597&aid=774569&group_id=29880

From noreply at sourceforge.net Sun Jul 20 07:09:13 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Sun Jul 20 07:09:13 2003
Subject: [Nagiosplug-devel] [ nagiosplug-Bugs-772366 ] check_udp2 on 1.3.1 ?
Message-ID:

Bugs item #772366, was opened at 2003-07-16 11:04
Message generated for change (Settings changed) made by kdebisschop
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=397597&aid=772366&group_id=29880

Category: None
Group: Release (specify)
Status: Open
>Resolution: Invalid
Priority: 5
Submitted By: R?nald CASAGRAUDE (kipit)
Assigned to: Jeremy T. Bouse (undrgrid)
Summary: check_udp2 on 1.3.1 ?

Initial Comment:
Is it normal that check_udp2 (symbolic link to check_tcp) has
disappeared from this release?
This link is present in nagios-plugins CVS, and creating the link by
hand with 1.3.1 (release) does the job...

If the disappearance of check_udp2 is normal, how do I check whether a
UDP port is open?

----------------------------------------------------------------------

Comment By: Jeremy T. Bouse (undrgrid)
Date: 2003-07-17 01:15

Message:
Logged In: YES
user_id=10485

The link was removed from the 1.3.1 release as it was only added to the
CVS HEAD tag... It also has been found to not operate properly, thus
its removal keeps repeated bugs saying it doesn't work from being filed
until it can be fixed properly...

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=397597&aid=772366&group_id=29880

From ragnar at skolelinux.no Sun Jul 20 23:41:23 2003
From: ragnar at skolelinux.no (Ragnar Wisløff)
Date: Sun Jul 20 23:41:23 2003
Subject: [Nagiosplug-devel] Re: Nagios-plugins as Debian packages
References: <1058529609.3f17e149b1058@ragnar.mine.nu> <20030718132424.GA2348@UnderGrid.net> <1058537638.3f1800a6d4140@ragnar.mine.nu> <20030718160824.GB20793@UnderGrid.net>
Message-ID: <3F1B8AC1.6030107@skolelinux.no>

Jeremy T. Bouse wrote:

>On Fri, Jul 18, 2003 at 04:13:58PM +0200, Ragnar Wisløff wrote:
>
>>Quoting "Jeremy T. Bouse" :
>>
>>> Actually I am already working on them... I've got two sets of packages
>>>that I'm working to maintain at this time... One with the 1.3.1 release
>>>of the plugins and another a snapshot of the CVS repository...
>>>
>>OK. I've done a rebuild of the netsaint-plugins, but would like to have the
>>latest and greatest. I did not realise there was a 1.3.1. We aim for stability,
>>so I guess 1.3.1 is better than CVS ;-)
>>
> Well if you're wanting to monitor any IPv6 enabled devices
>you'll need the CVS version...
>That is why I have the snapshot version
>that I have been working to update about once a month as I haven't
>automated the process too much... 1.3.1 was the most recent bug fix
>version and what I'm hoping to get into the Debian mirror as soon as I
>get Turbo (Nagios package maintainer) to either accept my updates or
>authorize me to NMU the changes myself...
>

Thanks for the rundown of the versions. No need for IPv6 yet in our
distro, so 1.3.1 looks like the right choice.

>>> I have not yet released and uploaded them to the Debian mirrors as
>>>I'm trying to coordinate some fixes/updates to the Nagios Debian package
>>>which is maintained by Turbo... I've got a diff patch to send to him
>>>but I'm still waiting on reply from my last email to him...
>>>
>>Oh, is the official maintainer also working on a Nagios core package for woody?
>>I know he can't get them into woody, but he has made them anyway? That would be
>>very nice. I have rebuilt them with some tweaks to suit us, but to have the
>>maintainer packages would be the best.
>>
> No he isn't working on the Nagios package for woody as he can't
>get it included into woody; However I have been working on it as my
>Nagios monitoring machines are still running woody... Part of my fixes
>were to modify the Build-depends so that it will build against stable
>and unstable... I also had to tweak the Depends, Suggests, Provides,
>Conflicts in the debian/control to allow it to better replace NetSaint
>and its plugins...
>

And a good job too, the packages install without a hitch :-)

>>> I can provide you with the URL for apt-get where I have the current
>>>packages I'm testing located but if you're running the default package
>>>for Nagios by Turbo the configuration will fail...
>>>
>>OK, but that's just a little change to the dependencies to change that in the
>>Nagios package? If you have nagios-packages that go with the plugins we can test
>>them out as well.
>>
> I have a Nagios 1.1-1.1 package compiled for both Woody and Sid
>that has my fixes I'm trying to get approved by Turbo as I'm not the
>Nagios maintainer along with Nagios-plugins 1.3.1-0 and
>Nagios-plugins-snapshot 1.3.99-0.cvs.2003.07.11 all reachable via
>apt-get by adding one of the following:
>
> deb http://people.debian.org/~jbouse/nagios/ sid/i386/
> deb http://people.debian.org/~jbouse/nagios/ woody/i386/
>
> Also the source DEBs are available via:
>
> deb-src http://people.debian.org/~jbouse/nagios/ sid/source/
> deb-src http://people.debian.org/~jbouse/nagios/ woody/source/
>

Wonderful. Thanks.

> Again though the Nagios packages up there are modified by me
>not the Nagios package maintainer. The plugin packages however are by me
>so report any problems to me directly as they are not currently in the
>Debian mirror and thus the BTS would not handle the requests properly...
>

So far no problems have been observed, but I'll let you know if I come
across something.

>>Thanks for your quick reply.
>>
> Not a problem... Wife says I live on the computer so what am I
>to do :)
>

Heh, sounds like here :-)

Ragnar

From Peter.Hoogendijk at atosorigin.com Mon Jul 21 01:01:18 2003
From: Peter.Hoogendijk at atosorigin.com (Hoogendijk, Peter)
Date: Mon Jul 21 01:01:18 2003
Subject: [Nagiosplug-devel] RFC: Performance data guidelines
Message-ID: <63C0E7F555D57547BBC0A4457E8E05EB821C1F@pwi8004.sd.bnet.nl>

Karl,

You are right, I can include the label with the -P option:

./check_perfmon -f /var/log/perfmon/hostname \
    -C "\System\System Up Time" -S "%l" \
    -P "SystemUpTime=%ls"

The available options for the perfmon plugin are now:

-f filename (--filename)
-C counter (--counter)
-S scanf (--scanf)
-P printf (--printf)
-w warning threshold (--warning)
-c critical threshold (--critical)

Kjell is sceptical of the proposal to use the scanf/printf format
specifiers, but I need some mechanism to specify the formats!
For the input part, the alternative is using a regular expression, but
as I'm writing the plugin in C, the scanf format specifier is the
easiest to implement, so I'll stick to that format for the moment.

Peter.

-----Original Message-----
From: Karl DeBisschop [mailto:karl at debisschop.net]
Sent: Wednesday, 16 July 2003 13:22
To: Hoogendijk, Peter
Cc: Voon, Ton; NagiosPlug Devel
Subject: RE: [Nagiosplug-devel] RFC: Performance data guidelines

On Wed, 2003-07-16 at 03:28, Hoogendijk, Peter wrote:
> Ton,
>
> This certainly makes sense. I was thinking along the same lines and
> concluded that I need two extra (optional) plugin options:
>
> 1) An option to set the label: -L label (--label)
> 2) An option to specify the format of the data: -P printf
> (--printf)
>
> This solves the problem of the RRD labels. It also proves you are
> right with your proposal to do the translations at the plugin, as this
> is also the place where I have to configure the perfmon counter to be
> checked (for this discussion I'll stick to the Microsoft Windows
> Perfmon example). As a result, the perfmon plugin would take the
> following options:
>
> -f filename (--filename)
> -C counter (--counter)
> -S scanf (--scanf)
> -L label (--label)
> -P printf (--printf)
> -w warning threshold (--warning)
> -c critical threshold (--critical)
>
> The resulting command to perform the check would be:
>
> ./check_perfmon -f /var/log/perfmon/hostname -C "\System\System Up
> Time" -S "%l" -L "SystemUpTime" -P "%ls"
>
> The filename, as specified with the -f option, points to the file that
> contains a list of Microsoft Windows Perfmon counters and their values
> for the host being checked. This file is generated using a third-party
> product, running as a service on the Microsoft Windows servers.
>
> The option names I used are open to discussion, but the principle
> solves the problems being discussed.
> It also leaves the format of the
> perfdata free to be adapted to the program that will process this
> data.

Why not:

./check_perfmon -f /var/log/perfmon/hostname \
    -C "\System\System Up Time" -S "%l" \
    -P "SystemUpTime=%ls"

Since you are providing a printf format, you really don't need to
separately specify the label AFAICS.

> how do I specify a warning below 10 and a critical above 45 ?

Most (all?) plugins will balk at this. For instance, you can warn
outside the range 10-25 and send a critical response outside the range
0-25, which would be similar. But in general the plugins should and do
check to make sure that the values that generate critical conditions
are a subset of the warning specification, possibly inclusive.

But there is the problem of passing ranges out through the perf data.
In most cases, it's a single value - but for some plugins the "good"
zone may be above the threshold, and for others below. For ranges,
unless RRDtool or others have a native syntax for specifying ranges, I
would think we just pass ours out - the good range is within a
colon-separated pair.

> but what do I do with a signed counter value, when
> I don't know the possible minimum and maximum values?

ISTM that if the possible minimum/maximum values are not known, a
well-behaved application would not require that they be specified.

--
Karl

From vonessj at intelihealth.com Thu Jul 24 08:45:02 2003
From: vonessj at intelihealth.com (VonEssen, John)
Date: Thu Jul 24 08:45:02 2003
Subject: [Nagiosplug-devel] check_procs
Message-ID: <1150D7750302044CBA06C923B5D6059621BB44@EXCHVS1.corp.intelihealth.com>

I have a question about how check_procs searches for an argument.
Apparently it counts some sort of grep process in its final output. So
if you search for a process that is not running, you'll get a return
value of "OK - 1 processes ...". Is there any plan to fix this? It's
just a matter of grepping out the 'grep' command that was used
initially for the search.

-John

From karl at debisschop.net Thu Jul 24 18:51:03 2003
From: karl at debisschop.net (Karl DeBisschop)
Date: Thu Jul 24 18:51:03 2003
Subject: [Nagiosplug-devel] check_procs
In-Reply-To: <1150D7750302044CBA06C923B5D6059621BB44@EXCHVS1.corp.intelihealth.com>
References: <1150D7750302044CBA06C923B5D6059621BB44@EXCHVS1.corp.intelihealth.com>
Message-ID: <1059097764.6506.4.camel@miles.debisschop.net>

On Thu, 2003-07-24 at 11:44, VonEssen, John wrote:
> I have a question about how check_procs search for an argument.
> Apparently it counts some sort of grep process in its final output.

It does not use grep.

> So if you search for a process that is not running, you'll get a
> return value of "OK - 1 processes ...".

I cannot reproduce the described behaviour. Can you tell us what OS and
plugin version you are using? Please be as specific as you can. For
plugin version, the output of check_procs --version is strongly
preferred. For OS, be as specific as you can. Also kernel version and
the version of ps used can be significant. Also, the exact command that
you see this behaviour with.

If this can be confirmed, it is a bug and we will do our best to fix it.

--
Karl

From noreply at sourceforge.net Fri Jul 25 01:33:04 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Fri Jul 25 01:33:04 2003
Subject: [Nagiosplug-devel] [ nagiosplug-Patches-777416 ] check_ping for IPv6
Message-ID:

Patches item #777416, was opened at 2003-07-25 16:32
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=397599&aid=777416&group_id=29880

Category: Enhancement
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Hendra (ardhen)
Assigned to: Nobody/Anonymous (nobody)
Summary: check_ping for IPv6

Initial Comment:
Hi,

This is a patch for "check_ping.c" to support IPv6.
This patch works for nagios-plugins-1.3.1 (tarball).
And tested under BSD.
You may need to define check_ping6 in Nagios's checkcommands.cfg

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=397599&aid=777416&group_id=29880

From vonessj at intelihealth.com Fri Jul 25 07:54:13 2003
From: vonessj at intelihealth.com (VonEssen, John)
Date: Fri Jul 25 07:54:13 2003
Subject: [Nagiosplug-devel] check_procs
Message-ID: <1150D7750302044CBA06C923B5D6059621BB45@EXCHVS1.corp.intelihealth.com>

System is Solaris 2.8 (2/02). Plugin version is:

check_procs (nagios-plugins 1.3.1) 1.9.2.1

I am generating the output from the command line as follows:

# /usr/local/nagios/libexec/check_procs -w 5 -c 10 -a abc123xyz

and I get:

OK - 1 processes running with args abc123xyz

And there is nothing with 'abc123xyz' in my ps -ef output.

On a related note, using the same command line syntax, I am unable to
get the negate program to work - partly due to the fact that I am not
sure of the proper syntax. If I use:

# ./negate ./check_procs -w 5 -c 10 -a abc123xyz

I get:

OK - 2 processes running with args abc123xyz

-John

-----Original Message-----
From: Karl DeBisschop [mailto:karl at debisschop.net]
Sent: Thursday, July 24, 2003 9:49 PM
To: VonEssen, John
Cc: NagiosPlug Devel
Subject: Re: [Nagiosplug-devel] check_procs

On Thu, 2003-07-24 at 11:44, VonEssen, John wrote:
> I have a question about how check_procs search for an argument.
> Apparently it counts some sort of grep process in its final output.

It does not use grep.

> So if you search for a process that is not running, you'll get a
> return value of "OK - 1 processes ...".

I cannot reproduce the described behaviour. Can you tell us what OS and
plugin version you are using? Please be as specific as you can. For
plugin version, the output of check_procs --version is strongly
preferred. For OS, be as specific as you can. Also kernel version and
the version of ps used can be significant.
Also, the exact command that you see this behaviour with.

If this can be confirmed, it is a bug and we will do our best to fix it.

--
Karl

From noreply at sourceforge.net Fri Jul 25 08:40:03 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Fri Jul 25 08:40:03 2003
Subject: [Nagiosplug-devel] [ nagiosplug-Patches-777416 ] check_ping for IPv6
Message-ID:

Patches item #777416, was opened at 2003-07-25 01:32
Message generated for change (Comment added) made by undrgrid
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=397599&aid=777416&group_id=29880

Category: Enhancement
Group: None
Status: Open
Resolution: None
>Priority: 3
Submitted By: Hendra (ardhen)
>Assigned to: Jeremy T. Bouse (undrgrid)
Summary: check_ping for IPv6

Initial Comment:
Hi,

This is a patch for "check_ping.c" to support IPv6.
This patch works for nagios-plugins-1.3.1 (tarball).
And tested under BSD.
You may need to define check_ping6 in Nagios's checkcommands.cfg

----------------------------------------------------------------------

>Comment By: Jeremy T. Bouse (undrgrid)
Date: 2003-07-25 08:39

Message:
Logged In: YES
user_id=10485

As the one that's already coded the AF-independent code in the CVS tree
I will look at the patch to see if there is anything not already in the
code that might be useful. Also if you want to submit patches in the
future I would recommend reading the Developers Guide
[http://nagiosplug.sourceforge.net/developer-guidelines.html] as
patches should be applied against the CVS HEAD not the release. The
code in CVS HEAD is very different from the 1.3.1 release and thus the
patch itself is fairly unlikely to be able to be applied as is.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=397599&aid=777416&group_id=29880

From noreply at sourceforge.net Fri Jul 25 08:53:04 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Fri Jul 25 08:53:04 2003
Subject: [Nagiosplug-devel] [ nagiosplug-Patches-777416 ] check_ping for IPv6
Message-ID:

Patches item #777416, was opened at 2003-07-25 01:32
Message generated for change (Settings changed) made by undrgrid
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=397599&aid=777416&group_id=29880

Category: Enhancement
Group: None
>Status: Closed
>Resolution: Rejected
Priority: 3
Submitted By: Hendra (ardhen)
Assigned to: Jeremy T. Bouse (undrgrid)
Summary: check_ping for IPv6

Initial Comment:
Hi,

This is a patch for "check_ping.c" to support IPv6.
This patch works for nagios-plugins-1.3.1 (tarball).
And tested under BSD.
You may need to define check_ping6 into Nagios's checkcommands.cfg

----------------------------------------------------------------------

>Comment By: Jeremy T. Bouse (undrgrid)
Date: 2003-07-25 08:52

Message:
Logged In: YES
user_id=10485

Upon examination of the patch supplied there were no changes that
aren't already in the current CVS HEAD revision. Also the code was very
un-portable as the IPv6 command is hard coded into it whereas the
current CVS HEAD tests along with the IPv4 ping command in the
configure script. CVS HEAD also has -4 (--use-ipv4) and -6 (--use-ipv6)
to specify IPv4 or IPv6 as the new resolver function in CVS HEAD
resolves to both IPv6 and IPv4 hostnames as provided by DNS.

----------------------------------------------------------------------

Comment By: Jeremy T.
Bouse (undrgrid)
Date: 2003-07-25 08:39

Message:
Logged In: YES
user_id=10485

As the one that's already coded the AF-independent code in the CVS tree
I will look at the patch to see if there is anything not already in the
code that might be useful. Also if you want to submit patches in the
future I would recommend reading the Developers Guide
[http://nagiosplug.sourceforge.net/developer-guidelines.html] as
patches should be applied against the CVS HEAD not the release. The
code in CVS HEAD is very different from the 1.3.1 release and thus the
patch itself is fairly unlikely to be able to be applied as is.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=397599&aid=777416&group_id=29880

From pagerc at ufl.edu Fri Jul 25 09:46:08 2003
From: pagerc at ufl.edu (Raymond Page)
Date: Fri Jul 25 09:46:08 2003
Subject: [Nagiosplug-devel] Possible forking issue ?
Message-ID: <200307251244.54458.pagerc@ufl.edu>

To whomever might have a clue more than me:

I'm using Cyrus IMAP's imtest to check our mail servers. From the
command line I'm able to get the program to work. However, it is not
automated. I wrote a perl script that uses Perl Expect to interactively
converse with imtest so I don't have to. Running the perl script from
the command line works as I anticipated. The problem is when I have
nagios call the perl script. With Perl Expect, it forks off a child
process, and that child then execs imtest. When called from Nagios, for
some reason, an "OPEN" call fails in the Perl Expect module where it
forks. I'm just curious if anyone here might know why this would happen.

Appreciate any thoughts,
Raymond Page

From bqueen at nas.nasa.gov Fri Jul 25 15:03:02 2003
From: bqueen at nas.nasa.gov (Brian S Queen)
Date: Fri Jul 25 15:03:02 2003
Subject: [Nagiosplug-devel] changes to check_log
Message-ID:

I changed check_log a bit, because I didn't want it to make copies of
my massive log files. It still writes small diff files to tmp, but it
no longer keeps a copy of the log file. I think it would be pretty easy
to eliminate the tmp file too.

Here is a patch (is this a useful patch?):

--- ./check_log Fri Jul 25 14:44:10 2003
+++ /usr/local/nagios/libexec/check_log Fri Jul 25 11:24:51 2003
@@ -66,6 +66,9 @@
 TAIL="/usr/bin/tail"
 CAT="/bin/cat"
 RM="/bin/rm"
+WC="/usr/bin/wc"
+AWK="/bin/awk"
+SED="/bin/sed"

 PROGNAME=`/bin/basename $0`
 PROGPATH=`echo $0 | /bin/sed -e 's,[\\/][^\\/][^\\/]*$,,'`
@@ -173,12 +176,24 @@
 # we're running this test, so copy the original log file over to
 # the old diff file and exit

+new_offset_line=`$WC -l $logfile`
 if [ ! -e $oldlog ]; then
- $CAT $logfile > $oldlog
+ $ECHO '%s %s\n' $new_offset_line > $oldlog
 $ECHO "Log check data initialized...\n"
 exit $STATE_OK
 fi

+#grab the offset
+#this program secretly use printf not echo
+new_offset_count=`$ECHO '%s %s\n' $new_offset_line | $AWK '{print $1}'`
+
+offset_count=`$CAT $oldlog | $AWK '{print $1}' `
+
+if [ "$offset_count" = "0" ]; then # no new data
+ $ECHO "Log check ok - no data found"
+ exitstatus=$STATE_OK
+fi
+
 # The old log file exists, so compare it to the original log now

 # The temporary file that the script should use while
@@ -192,7 +207,10 @@
 chmod 600 $tempdiff
 fi

-$DIFF $logfile $oldlog > $tempdiff
+#start at the NEXT line
+offset_count=$[$offset_count + 1]
+

From JasonT at plumtree.com Fri Jul 25 16:55:01 2003
From: JasonT at plumtree.com (Jason Truong)
Date: Fri Jul 25 16:55:01 2003
Subject: [Nagiosplug-devel] is there an API for nagios?
Message-ID:

My coworker would like to know if there is an API for Nagios and if so,
how can he get a copy of it.
I'm not a programmer and would not know what to do with an API.

Thanks,
Jason T.

From karl at debisschop.net Fri Jul 25 20:31:24 2003
From: karl at debisschop.net (Karl DeBisschop)
Date: Fri Jul 25 20:31:24 2003
Subject: [Nagiosplug-devel] Internationalization
Message-ID: <1059190167.5918.4.camel@miles.debisschop.net>

I have met with some success in creating the framework for
internationalizing the plugins. It seems time to commit this to the CVS
HEAD.

I wish I could promise that everyone's builds will go smoothly, but
frankly I may have to work at it to get exactly the right set of files
committed and leave the right set for developers to build locally.

As with Automake/Autoconf, I will be uploading a fairly small set of
the files - developers will need to have GNU gettext installed for
things to work correctly. I am using gettext-0.11.4-7.

I have set up a second local sandbox to test my commits as I go, but
feel free to email me if you find persistent problems.

--
Karl

From karl at debisschop.net Fri Jul 25 21:48:02 2003
From: karl at debisschop.net (Karl DeBisschop)
Date: Fri Jul 25 21:48:02 2003
Subject: [Nagiosplug-devel] Internationalization
In-Reply-To: <1059190167.5918.4.camel@miles.debisschop.net>
References: <1059190167.5918.4.camel@miles.debisschop.net>
Message-ID: <1059194775.5918.11.camel@miles.debisschop.net>

On Fri, 2003-07-25 at 23:29, Karl DeBisschop wrote:
> I have met with some success in creating the framework for
> internationalizing the plugins. It seems time to commit this to the CVS
> HEAD.

> I have set up a second local sandbox to test my commits as I go, but
> feel free to email me if you find persistent problems.

I have completed my initial commits and I find that I can successfully
build a working set of files with gettext support in my second CVS
sandbox - all the mods were made in my original, then committed, and
then tested in the copy.
The only problem I have found so far is that you may need to 'make'
before you 'make dist'.

So far, check_tcp is the only plugin where I have tried to mark strings
for translation. And I'm not fully happy with that even. But with this,
we may be able to have partial support for multilanguage plugins for
the proposed mid-August alpha release of 1.4.0.

--
Karl

From karl at debisschop.net Fri Jul 25 22:39:07 2003
From: karl at debisschop.net (Karl DeBisschop)
Date: Fri Jul 25 22:39:07 2003
Subject: [Nagiosplug-devel] Internationalization
In-Reply-To: <1059194775.5918.11.camel@miles.debisschop.net>
References: <1059190167.5918.4.camel@miles.debisschop.net> <1059194775.5918.11.camel@miles.debisschop.net>
Message-ID: <1059197843.5918.20.camel@miles.debisschop.net>

I have just committed po/fr.po and po/de.po, which are the source files
for translations for French and German - the two languages where we
have volunteers to do translation.

Eventually, translation team leads should be able to commit the
translated versions of the files to CVS without assistance from the
core developers. For now, however, the point is moot as these files
will be subject to frequent changes as more of the plugins are marked
up for translation.

Nonetheless, translators may want to look at the manual information for
GNU gettext and start getting used to the task at hand.

Also, judging by the list traffic, I think we would be well served to
try and secure additional translation teams for Spanish and for
Italian. Any other languages are welcome as well, I just point those
out on the basis of national domain suffixes I have chanced to notice.

Please post to this list if you are interested in being part of a
translation effort.

--
Karl

From noreply at sourceforge.net Fri Jul 25 23:17:05 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Fri Jul 25 23:17:05 2003
Subject: [Nagiosplug-devel] [ nagiosplug-Patches-777416 ] check_ping for IPv6
Message-ID:

Patches item #777416, was opened at 2003-07-25 04:32
Message generated for change (Comment added) made by kdebisschop
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=397599&aid=777416&group_id=29880

Category: Enhancement
Group: None
Status: Closed
Resolution: Rejected
Priority: 3
Submitted By: Hendra (ardhen)
Assigned to: Jeremy T. Bouse (undrgrid)
Summary: check_ping for IPv6

Initial Comment:
Hi,

This is a patch for "check_ping.c" to support IPv6.
This patch works for nagios-plugins-1.3.1 (tarball).
And tested under BSD.
You may need to define check_ping6 into Nagios's checkcommands.cfg

----------------------------------------------------------------------

>Comment By: Karl DeBisschop (kdebisschop)
Date: 2003-07-25 18:59

Message:
Logged In: YES
user_id=1671

In fairness, we do say that submitting against the stable HEAD is OK as
well. Granted, new features should go against the CVS HEAD, but we are
not as clear as we could be. (more docs to improve)

----------------------------------------------------------------------

Comment By: Jeremy T. Bouse (undrgrid)
Date: 2003-07-25 11:52

Message:
Logged In: YES
user_id=10485

Upon examination of the patch supplied there were no changes that
aren't already in the current CVS HEAD revision. Also the code was very
un-portable as the IPv6 command is hard coded into it whereas the
current CVS HEAD tests along with the IPv4 ping command in the
configure script. CVS HEAD also has -4 (--use-ipv4) and -6 (--use-ipv6)
to specify IPv4 or IPv6 as the new resolver function in CVS HEAD
resolves to both IPv6 and IPv4 hostnames as provided by DNS.

----------------------------------------------------------------------

Comment By: Jeremy T.
Bouse (undrgrid)
Date: 2003-07-25 11:39

Message:
Logged In: YES
user_id=10485

As the one that's already coded the AF-independent code in the CVS tree
I will look at the patch to see if there is anything not already in the
code that might be useful. Also if you want to submit patches in the
future I would recommend reading the Developers Guide
[http://nagiosplug.sourceforge.net/developer-guidelines.html] as
patches should be applied against the CVS HEAD not the release. The
code in CVS HEAD is very different from the 1.3.1 release and thus the
patch itself is fairly unlikely to be able to be applied as is.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=397599&aid=777416&group_id=29880

From jeremy+nagios at undergrid.net Fri Jul 25 23:49:03 2003
From: jeremy+nagios at undergrid.net (Jeremy T. Bouse)
Date: Fri Jul 25 23:49:03 2003
Subject: [Nagiosplug-devel] Internationalization
In-Reply-To: <1059197843.5918.20.camel@miles.debisschop.net>
References: <1059190167.5918.4.camel@miles.debisschop.net> <1059194775.5918.11.camel@miles.debisschop.net> <1059197843.5918.20.camel@miles.debisschop.net>
Message-ID: <20030726064656.GB6053@UnderGrid.net>

Karl,

I guess it was good timing then as I had just recently been working
with getting to use gettext in a PHP project I was working on, so I
should be getting up to speed to help out with the i18n work needed for
the plugins. As for getting the translations done might it not be a
good idea to make a master messages.po file to give to the translation
people to work from. If they all work from that master then they should
be able to use msgmerge to include new translations into the POT file
when there are additions and subtractions of entries.
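
For readers unfamiliar with the merge step mentioned above, the idea can be sketched roughly as follows: an updated master template supplies the current set of message ids, and existing translations are carried over wherever the id still exists. This is only an illustration in Python, not the GNU msgmerge tool; the function name `merge_catalog` and the sample strings are hypothetical, and the real tool additionally does fuzzy matching and keeps obsolete entries as comments in the .po file.

```python
def merge_catalog(template_ids, translations):
    """Carry existing translations over to a refreshed template.

    template_ids -- message ids found in the updated master template
    translations -- dict of msgid -> translated text from the old .po file

    Returns (merged, obsolete): merged has one entry per template id
    (an empty string means "still untranslated"); obsolete holds
    translations whose msgid no longer appears in the template.
    Simplifying assumption: exact msgid matches only, no fuzzy matching.
    """
    merged = {}
    for msgid in template_ids:
        merged[msgid] = translations.get(msgid, "")
    obsolete = {
        msgid: text
        for msgid, text in translations.items()
        if msgid not in merged
    }
    return merged, obsolete


if __name__ == "__main__":
    # Hypothetical sample data: one string kept, one added, one removed.
    old_fr = {
        "Connection refused\n": "Connexion refusée\n",
        "Old message\n": "Ancien message\n",
    }
    new_template = ["Connection refused\n", "Invalid hostname\n"]
    merged, obsolete = merge_catalog(new_template, old_fr)
    for msgid, text in merged.items():
        print(repr(msgid), "->", repr(text))
    print("obsolete:", sorted(obsolete))
```

With one catalog per language, each team would only need to run this kind of update against its own file whenever the template changes.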

One thought I had while reading the patches you submitted for check_tcp
was that I thought from what I read that adding the \n to the end of
the entry within _("") might cause problems when run through xgettext.
Have you found any problems with this in your testing or might this be
something we need to be aware of and watchful for as we proceed.

I do like your idea that having partial gettext support by the
mid-August targeted release date is a reachable goal. If we could have
English, French and German by then that would be an ideal goal. If you
want to split up the plugins that still need work I can take on some of
the load. I have plenty of time while commuting back and forth to work
on the train now that I have a Sony Vaio which is light enough to take
back and forth (replacement for 8lbs Toshiba laptop). I'll also be
taking some vacation time the 30th - 1st so I'll have some free time to
work on things but not the whole time as wife wants me to spend some
time with her.

Regards,
Jeremy

From karl at debisschop.net Sat Jul 26 04:40:02 2003
From: karl at debisschop.net (Karl DeBisschop)
Date: Sat Jul 26 04:40:02 2003
Subject: [Nagiosplug-devel] Internationalization
In-Reply-To: <20030726064656.GB6053@UnderGrid.net>
References: <1059190167.5918.4.camel@miles.debisschop.net> <1059194775.5918.11.camel@miles.debisschop.net> <1059197843.5918.20.camel@miles.debisschop.net> <20030726064656.GB6053@UnderGrid.net>
Message-ID: <1059219445.23887.21.camel@miles.debisschop.net>

On Sat, 2003-07-26 at 02:46, Jeremy T. Bouse wrote:
> Karl,
>
> I guess it was good timing then as I had just recently been working
> with getting to use gettext in a PHP project I was working on, so I
> should be getting up to speed to help out with the i18n work needed for
> the plugins. As for getting the translations done might it not be a good
> idea to make a master messages.po file to give to the translation people
> to work from.
If they all work from that master then they should be able > to use msgmerge to include new translations into the POT file when there > are additions and subtractions of entries. I thought 1 .po for each language might work well - there's no reason a Polish translation team should have to consult with the Chinese team on changes. And that's what automake does by default anyway (as an aside, while autoconf has always been a big win, automake in my opinion was closer to break even until this work - I have a feeling this would have been a lot harder without automake support.) > One thought I had while reading the patches you submitted for > check_tcp was that I thought from what I read that adding the \n to the > end of the entry within _("") might cause problems when ran through > xgettext. Have you found any problems with this in your testing or might > this be something we need to be aware of and watchful for as we proceed. No - works fine. > I do like your idea of having partial gettext support by the > mid-august targeted release date is a reachable goal. If we could have > english, french and german by then that would be an ideal goal. If you > want to split up the plugins that still need work I can take on some of > the load. I have plenty of time while commuting back and forth to work > on the train now that I have a Sony Vaio which is light enough to take > back and forth (replacement for 8lbs Toshiba laptop). I'll also be > taking some vacation time the 30th - 1st so I'll have some free time > to work on things but not the whole time as wife wants me to spend some > time with her. Great. Here's what I've figured out so far: 1) the style I had where the strings were initialized as constant char's up front will not work. Once gettext comes into play, they are no longer const char. 2) I could go back to #defines defined up front, but gettext guidance has you keep the message sizes to a 'paragraph' -- so many of the plugins would need several #defines.
(I'm also looking at a size limitation in the constants that would break an old ANSI C compiler). 3) I still plan to extract embedded docbook out of these strings at a later date (beyond the mid-august alpha, probably beyond 1.4 entirely). Actually, gettext makes this pretty easy. If I define S_() as a second macro for gettext(), I can just grep through the file to collect them for the SGML. But to do this, things need to be in the order they would be read. So, I had to move the print_usage and print_help functions above main. Historically, I haven't wanted to do that because it makes people slog through non-core code before they see the actual program logic. But it seemed to be the right course given all these constraints. As for approach, when people pick up a plugin to mark for translation, they could give a shout on this list, or they could enter it in a tracker (we could use bugs, or we could just create a separate tracker for advisory "checkouts" like this) What works for you? -- Karl From noreply at sourceforge.net Sat Jul 26 12:29:02 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Sat Jul 26 12:29:02 2003 Subject: [Nagiosplug-devel] [ nagiosplug-New Plugins-778194 ] Performance CoPilot checker Message-ID: New Plugins item #778194, was opened at 2003-07-26 21:28 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=541465&aid=778194&group_id=29880 Category: Application monitor Group: None Status: Open Resolution: None Priority: 5 Submitted By: Jan-Frode Myklebust (smuff) Assigned to: Nobody/Anonymous (nobody) Summary: Performance CoPilot checker Initial Comment: Performance Co-Pilot (PCP) is a framework and services to support system-level performance monitoring and performance management. It can monitor, collect and report on a lot of detailed system-level metrics. E.g.
cpuload, filesystem usage, network usage, network errors, memory usage, paging activity, etc. Plus, it's extensible, so you can write your own agents to monitor services that are not already in the default package. For more info, check out the PCP homepage at http://oss.sgi.com/projects/pcp/ (it might even be a competitor to Nagios, but a bit more low level). I've written a small plugin that will connect to a pmcd (performance metrics collector daemon) and check any of the metrics the PCP on that host knows about. It can e.g. check cpuload, filesystem usage, network traffic, number of users, interrupts per second, number of processes, etc. It can monitor just about everything, but at the moment it only runs on Linux and IRIX. This small script is just a wrapper around the PCP 'pmval' command, so that it can easily be used with Nagios. I currently use it for monitoring load, filesystem usage and machine room temperature. Example usage:

define command{
    command_name check-pcp-filesys
    command_line $USER1$/check_pcpmetric.py -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -i $ARG3$ -m filesys.full
}

ARG3 would here be the filesystem device to check.
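As a rough illustration of what any such wrapper has to do once it holds a metric value, the sketch below maps a number onto the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL). It is written in C and is not taken from check_pcpmetric.py, whose logic is not shown in this thread; the name state_for_value and the "higher is worse" threshold direction are assumptions made for the example.

```c
/* Standard Nagios plugin exit codes. */
enum { STATE_OK = 0, STATE_WARNING = 1, STATE_CRITICAL = 2 };

/* Hypothetical helper: compare a metric against -w / -c style thresholds.
 * Assumes larger values are worse (e.g. percentage of a filesystem used),
 * so the critical threshold is checked before the warning one. */
int state_for_value(double value, double warn, double crit)
{
    if (value >= crit)
        return STATE_CRITICAL;
    if (value >= warn)
        return STATE_WARNING;
    return STATE_OK;
}
```

A wrapper would print a one-line status message and then exit with the value this function returns, for instance state_for_value(93.0, 80.0, 90.0) for a filesystem that is 93% full.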
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=541465&aid=778194&group_id=29880 From noreply at sourceforge.net Sat Jul 26 16:50:04 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Sat Jul 26 16:50:04 2003 Subject: [Nagiosplug-devel] [ nagiosplug-New Plugins-778194 ] Performance CoPilot checker Message-ID: New Plugins item #778194, was opened at 2003-07-26 21:28 Message generated for change (Comment added) made by smuff You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=541465&aid=778194&group_id=29880 Category: Application monitor Group: None Status: Open Resolution: None Priority: 5 Submitted By: Jan-Frode Myklebust (smuff) Assigned to: Nobody/Anonymous (nobody) Summary: Performance CoPilot checker Initial Comment: Performance Co-Pilot (PCP) is a framework and services to support system-level performance monitoring and performance management. It can monitor, collect and report on a lot of detailed system-level metrics. E.g. cpuload, filesystem usage, network usage, network errors, memory usage, paging activity, etc. Plus, it's extensible, so you can write your own agents to monitor services that are not already in the default package. For more info, check out the PCP homepage at http://oss.sgi.com/projects/pcp/ (it might even be a competitor to Nagios, but a bit more low level). I've written a small plugin that will connect to a pmcd (performance metrics collector daemon) and check any of the metrics the PCP on that host knows about. It can e.g. check cpuload, filesystem usage, network traffic, number of users, interrupts per second, number of processes, etc. It can monitor just about everything, but at the moment it only runs on Linux and IRIX. This small script is just a wrapper around the PCP 'pmval' command, so that it can easily be used with Nagios.
I currently use it for monitoring load, filesystem usage and machine room temperature. Example usage:

define command{
    command_name check-pcp-filesys
    command_line $USER1$/check_pcpmetric.py -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -i $ARG3$ -m filesys.full
}

ARG3 would here be the filesystem device to check. ---------------------------------------------------------------------- >Comment By: Jan-Frode Myklebust (smuff) Date: 2003-07-27 01:49 Message: Logged In: YES user_id=361 Oops, forgot the attachment. Run './check_pcpmetric.py -h' for detailed usage info. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=541465&aid=778194&group_id=29880 From noreply at sourceforge.net Sun Jul 27 07:47:06 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Sun Jul 27 07:47:06 2003 Subject: [Nagiosplug-devel] [ nagiosplug-New Plugins-778477 ] check_frontpage Message-ID: New Plugins item #778477, was opened at 2003-07-27 15:46 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=541465&aid=778477&group_id=29880 Category: Perl plugin Group: None Status: Open Resolution: None Priority: 5 Submitted By: Kev Green (kyrian) Assigned to: Nobody/Anonymous (nobody) Summary: check_frontpage Initial Comment: In anger at that particular bane of unix sysadmins' lives, or perhaps my own willingness to install it, the other day I wrote a Nagios plugin to monitor whether FrontPage appeared to be working on a given site. Do with it as you will, but I accept no responsibility for its failing to allow you to sleep soundly at night, save you from tearing your hair out, etc.
Obviously FrontPage remains a trademark/copyright/whatever of Microsoft, and the fact that this plugin has ever proved necessary as a result of various misconfigurations and website migrations from server to server is not intended as any kind of indicator for or against the quality of Microsoft's software. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=541465&aid=778477&group_id=29880 From noreply at sourceforge.net Sun Jul 27 15:59:11 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Sun Jul 27 15:59:11 2003 Subject: [Nagiosplug-devel] [ nagiosplug-Patches-778644 ] check_http cookies and keep-alive support Message-ID: Patches item #778644, was opened at 2003-07-27 12:58 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=397599&aid=778644&group_id=29880 Category: Enhancement Group: None Status: Open Resolution: None Priority: 5 Submitted By: Dmitri Smirnov (trbmaker) Assigned to: Nobody/Anonymous (nobody) Summary: check_http cookies and keep-alive support Initial Comment: Finally found a problem with my initial patch. Some IIS servers are not respond if Keep-Alive header supplied in request. I've made new patch for check_http 1.3.1 with 'Keep- Alive' disabled by default (option '-k' to enable). Patch attached below. -----Original Message----- From: Voon, Ton [mailto:Ton.Voon at egg.com] Sent: Tuesday, July 15, 2003 9:04 AM To: Dmitri Smirnov; nagiosplug- devel at lists.sourceforge.net Subject: RE: [Nagiosplug-devel] check_http cookie and app-proxy support Dmitri, Thanks very much for your patch. I'm sorry it has taken so long to look at it. I've given it a try and it seems to work okay with sites that do set cookies. However, it seems to fail when a site does not check for cookies - it just hangs when querying the site. 
I think there's a bug in your patch somewhere? If you do update your patch, please post on sourceforge so we can keep track of it: http://sourceforge.net/tracker/? group_id=29880&atid=397599 Thanks, Ton > -----Original Message----- > From: Dmitri Smirnov [mailto:Dmitri.Smirnov at fusepoint.com] > Sent: Monday, July 07, 2003 6:04 PM > To: nagiosplug-devel at lists.sourceforge.net > Subject: [Nagiosplug-devel] check_http cookie and app-proxy support > > > Hi guys, > > I've found a number of sites on our infrastructure that require > check_http plugin to have cookie support for sessions management and > 'Connection: Keep-Alive' in HTTP header to work correctly. > Below is a little patch for check_http (latest from CVS) I've made. > Will apriciate, guys, if you will review and incorporate such > functionality in standard check_http (wrapped by cmd arguments > probably). > > Dmitri > ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=397599&aid=778644&group_id=29880 From noreply at sourceforge.net Sun Jul 27 17:09:15 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Sun Jul 27 17:09:15 2003 Subject: [Nagiosplug-devel] [ nagiosplug-New Plugins-703898 ] New Plugin: check_traceroute Message-ID: New Plugins item #703898, was opened at 2003-03-14 23:03 Message generated for change (Comment added) made by kyrian You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=541465&aid=703898&group_id=29880 Category: Perl plugin Group: None Status: Open Resolution: None Priority: 5 Submitted By: Myke Place (mplace) Assigned to: Subhendu Ghosh (sghosh) Summary: New Plugin: check_traceroute Initial Comment: check_traceroute is a plugin that sends alerts to Nagios if a specified host is greater than a certain number of hops away. 
In cases where a multihomed network loses a connection and the remote side can be reached through another gateway, check_ping will still report the connection as being up, though it has actually gone down. This is an admittedly rare instance, but perhaps others will find varied uses for this plugin as well. check_traceroute is written in Perl and utilizes Net::Traceroute and Getopt::Long. Usage is as follows: check_traceroute -t [-w ] [-c ] ---------------------------------------------------------------------- Comment By: Kev Green (kyrian) Date: 2003-07-28 01:08 Message: Logged In: YES user_id=99923 Just by way of random thought, it might be a useful addition to this plugin to add the facility to check for a traceroute via a given host (and give the option of whether a traceroute via this host should issue "OK", "WARNING", or "CRITICAL" nagios status), so eg. a network administrator could use it to check that their upstream transit had not failed over to a backup router/connection - or indeed that a routing loop had occurred -, as it's common for the hostname of particular routers/hops in traceroute to change in the event of such a failover. HTH, HAND. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=541465&aid=703898&group_id=29880 From Ton.Voon at egg.com Mon Jul 28 01:59:12 2003 From: Ton.Voon at egg.com (Voon, Ton) Date: Mon Jul 28 01:59:12 2003 Subject: [Nagiosplug-devel] check_procs Message-ID: John, check_procs -v will tell you the ps command used. From here, you can see which command has the arguments abc123xyz. But I can tell you now, it is the check_procs command itself that it has caught. This has been fixed in the CVS HEAD version of check_procs.
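For readers wondering how a process checker can avoid counting itself, here is a minimal sketch in C. It is not necessarily what the CVS HEAD fix does; struct proc_entry and count_matching are hypothetical names, and the process list is assumed to have been parsed from ps output already.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record for one line of parsed ps output. */
struct proc_entry {
    long pid;
    const char *args;
};

/* Count entries whose argument string contains `needle`, skipping the
 * entry that belongs to the checker itself (self_pid, e.g. from getpid()).
 * Without that skip, a search for "abc123xyz" also matches the checker's
 * own command line, because the pattern appears in its arguments. */
size_t count_matching(const struct proc_entry *procs, size_t n,
                      const char *needle, long self_pid)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (procs[i].pid == self_pid)
            continue;
        if (strstr(procs[i].args, needle) != NULL)
            count++;
    }
    return count;
}
```

Skipping only the checker's own PID would presumably still leave a wrapper such as negate in the count, since the wrapper's command line carries the same arguments; that may be why the negate run quoted below reports 2 processes.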
Ton > -----Original Message----- > From: VonEssen, John [mailto:vonessj at intelihealth.com] > Sent: Friday, July 25, 2003 3:54 PM > To: Karl DeBisschop > Cc: NagiosPlug Devel > Subject: RE: [Nagiosplug-devel] check_procs > > > System is Solaris 2.8 (2/02). Plugin version is: > > check_procs (nagios-plugins 1.3.1) 1.9.2.1 > > I am generating the output from the command line as follows: > > # /usr/local/nagios/libexec/check_procs -w 5 -c 10 -a abc123xyz > > and I get: > > OK - 1 processes running with args abc123xyz > > > And there is nothing with 'abc123xyz' is my ps -ef output. > > On a related note, using the same commond line syntax, I am unable to > get the negate program to work - partly do to the fact that I am not > sure of the proper syntax. If I use: > > # ./negate ./check_procs -w 5 -c 10 -a abc123xyz > > I get: > > OK - 2 processes running with args abc123xyz > > -John > > -----Original Message----- > From: Karl DeBisschop [mailto:karl at debisschop.net] > Sent: Thursday, July 24, 2003 9:49 PM > To: VonEssen, John > Cc: NagiosPlug Devel > Subject: Re: [Nagiosplug-devel] check_procs > > On Thu, 2003-07-24 at 11:44, VonEssen, John wrote: > > I have a question about how check_procs search for an argument. > > Apparently it counts some sort of grep process in its final output. > > It does not use grep. > > > So if you search for a process that is not running, you'll get a > > return value of "OK - 1 processes ...". > > I cannot reproduce the described bevaiour. > > Can you tell us what OS and plugin version you are using? > > Please be as specific as you can. For plugin version, the output of > check_procs --version is strongly preferred. For OS, be as specific as > you can. Also kernel version and the version of ps used can be > significant. > > Also, the exact comment that you see this behaviour with. > > If this can be confirmed, it is a bug and we will do our best > to fix it. 
> > -- > Karl > > > > > ------------------------------------------------------- > This SF.Net email sponsored by: Free pre-built ASP.NET sites including > Data Reports, E-commerce, Portals, and Forums are available now. > Download today and enter to win an XBOX or Visual Studio .NET. > http://aspnet.click-url.com/go/psa00100003ave/direct;at.aspnet > _072303_01/01 > _______________________________________________ > Nagiosplug-devel mailing list > Nagiosplug-devel at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/nagiosplug-devel > ::: Please include plugins version (-v) and OS when reporting > any issue. > ::: Messages without supporting info will risk being sent to /dev/null > This private and confidential e-mail has been sent to you by Egg. The Egg group of companies includes Egg Banking plc (registered no. 2999842), Egg Financial Products Ltd (registered no. 3319027) and Egg Investments Ltd (registered no. 3403963) which carries out investment business on behalf of Egg and is regulated by the Financial Services Authority. Registered in England and Wales. Registered offices: 1 Waterhouse Square, 138-142 Holborn, London EC1N 2NA. If you are not the intended recipient of this e-mail and have received it in error, please notify the sender by replying with 'received in error' as the subject and then delete it from your mailbox. From ajernejcic at leasfinanz.at Mon Jul 28 03:13:14 2003 From: ajernejcic at leasfinanz.at (Jernejcic Alexander) Date: Mon Jul 28 03:13:14 2003 Subject: [Nagiosplug-devel] Check Plugin for WuT 57101 Web Thermometer Message-ID: <92442012046.20030728121219@leasfinanz.at> hi, if somebody finds that usefull: a little check for a Wiesemann & Theis 57101 web thermometer. it tries to get the temperature and alarms on the given options. should do its job with other WuT web thermometers if the readout command string is adjusted. 
(eg 'GET /Single1' for 57601) alexander

-- cut ---
#!/usr/bin/perl -w
# check_wt57101 - check the temperature from W&T 57101 WEB Thermometer
#
# License Information:
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
#
############################################################################

use POSIX;
use strict;
use Getopt::Long;
use vars qw($opt_V $opt_h $opt_H $opt_p $opt_w $opt_W $opt_c $opt_C $PROGNAME
            $host $port $loww $lowc $highw $highc $status
            $DEFAULT_LOW_WARN $DEFAULT_LOW_CRIT $DEFAULT_HIGH_WARN
            $DEFAULT_HIGH_CRIT $DEFAULT_PORT $opt_t);
use lib "/usr/local/nagios/libexec" ;
use utils qw($TIMEOUT %ERRORS &print_revision &support);
use IO::Socket;

sub print_help ();
sub print_usage ();
sub process_arguments ();

# default temperatures
# too cold
$DEFAULT_LOW_WARN=15.0;
$DEFAULT_LOW_CRIT=10.0;
# too hot
$DEFAULT_HIGH_WARN=28.0;
$DEFAULT_HIGH_CRIT=35.0;
# default port
$DEFAULT_PORT="80";
# ProgName
$PROGNAME="check_wt57101";
# little helper
my $EOL = "\015\012";

Getopt::Long::Configure('bundling');
$status = process_arguments();
if ($status){
    print "ERROR: processing arguments\n";
    exit $ERRORS{"UNKNOWN"};
}

# Timeout
$SIG{'ALRM'} = sub {
    print ("ERROR: timed out waiting for thermometer $host \n");
    exit $ERRORS{"WARNING"};
};
alarm($opt_t);

# here we get the temperature
my $remote;
if (! ($remote = IO::Socket::INET->new(Proto => "tcp",
                                       PeerAddr => $host,
                                       PeerPort => $port))) {
    print "Error: cannot connect to $port at $host !\n";
    exit $ERRORS{"UNKNOWN"};
}
$remote->autoflush(1);

# CHANGE READOUT COMMAND HERE!
print $remote "GET /Single B".$EOL;

# read byte by byte until the 'C' that ends the reading
my ($byte,$answer);
ANSWER: while (sysread($remote, $byte, 1) == 1) {
    $answer .= $byte;
    last ANSWER if ($byte eq 'C');
}
close $remote;

# the numeric part is everything before the degree sign
my ($temperatur,$junk)= split('°',$answer,2);

my $msg;
if ($temperatur <= $lowc || $temperatur >= $highc) {
    $status = $ERRORS{'CRITICAL'};
    $msg = "CRITICAL: ";
} elsif ($temperatur <= $loww || $temperatur >= $highw) {
    $status = $ERRORS{'WARNING'};
    $msg = "WARNING: ";
} else {
    $status = $ERRORS{'OK'};
    $msg = "OK: ";
}
print "$msg Temperature is $answer\n";
exit $status;

# Subs
sub process_arguments(){
    GetOptions
        ("V" => \$opt_V, "version" => \$opt_V,
         "h" => \$opt_h, "help" => \$opt_h,
         "H=s" => \$opt_H, "hostname=s" => \$opt_H,
         "p=s" => \$opt_p, "port=s" => \$opt_p,
         "w=s" => \$opt_w, "lowwarning=f" => \$opt_w,    # warning if below this
         "W=s" => \$opt_W, "highwarning=f" => \$opt_W,   # warning if above this
         "c=s" => \$opt_c, "lowcritical=f" => \$opt_c,   # critical if below this
         "C=s" => \$opt_C, "highcritical=f" => \$opt_C,  # critical if above this
         "t=i" => \$opt_t, "timeout=i" => \$opt_t
        );

    if ($opt_V) {
        print_revision($PROGNAME,'$Revision: 0.1 $ ');
        exit $ERRORS{'OK'};
    }
    if ($opt_h) {
        print_help();
        exit $ERRORS{'OK'};
    }
    unless (defined $opt_H ) {
        print_usage();
        exit $ERRORS{'UNKNOWN'};
    }
    unless (defined $opt_t) {
        $opt_t = $utils::TIMEOUT ; # default timeout
    }

    $host = $opt_H;
    $port = $opt_p || $DEFAULT_PORT;
    $loww = $opt_w || $DEFAULT_LOW_WARN;
    $lowc = $opt_c || $DEFAULT_LOW_CRIT;
    $highw = $opt_W || $DEFAULT_HIGH_WARN;
    $highc = $opt_C || $DEFAULT_HIGH_CRIT;

    if ($port !~ /\d+/) {
        print "Warning: invalid port!\n";
    }
    if (($lowc !~ /\d+/) || ($loww !~ /\d+/) || ($highw !~ /\d+/) || ($highc !~ /\d+/) ) {
        print "Warning: invalid temperature!\n";
        exit $ERRORS{'UNKNOWN'};
    }
    if ($highw >= $highc) {
        print "Warning: high temp warning can not be above high temp critical!\n";
        exit $ERRORS{'UNKNOWN'};
    }
    if ($loww <= $lowc) {
        print "Warning: low temp warning can not be below low temp critical!\n";
        exit $ERRORS{'UNKNOWN'};
    }
    if ($loww >= $highw) {
        print "Warning: low temp warning must be below high temp warning!\n";
        exit $ERRORS{'UNKNOWN'};
    }
    return $ERRORS{'OK'};
}

sub print_usage () {
    print "Usage: $PROGNAME -H [-p ][-w ] [-c ] [-W ] [-C ] [-t ]\n";
}

sub print_help () {
    print_revision($PROGNAME,'$Revision: 0.1 $');
    print "\nauthor: Alexander Jernejcic\n";
    print "significant parts of code are \"copy and paste\" \n";
    print "from check_mailq by Subhendu Ghosh !\n";
    print "\n";
    print_usage();
    print "\n";
    print " Checks the temperature from a W\&T 57101 WEB Thermometer (http://www.wut.de)\n";
    print "-H (--hostname) = IP-Address of the WebThermometer\n";
    print "-p (--port) = port, the WebThermometer is listening on (default=$DEFAULT_PORT)\n";
    print "-w (--lowwarning) = warning if below this Temperature (default=$DEFAULT_LOW_WARN °C)\n";
    print "-W (--highwarning) = warning if above Temperature (default=$DEFAULT_HIGH_WARN °C)\n";
    print "-c (--lowcritical) = critical if below Temperature (default=$DEFAULT_LOW_CRIT °C)\n";
    print "-C (--highcritical) = critical if above Temperature (default=$DEFAULT_HIGH_CRIT °C)\n";
    print "-t (--timeout) = Plugin timeout in seconds (default = $utils::TIMEOUT)\n";
    print "-h (--help)\n";
    print "-V (--version)\n";
    print "\n\n";
    support();
}

From wdinyes at ourvacationstore.com Mon Jul 28 14:47:08 2003 From: wdinyes at ourvacationstore.com (William Dinyes) Date: Mon Jul 28 14:47:08 2003 Subject: [Nagiosplug-devel] Issue with custom nagios plugin Message-ID: <202DB470B8484B469E8B4302E44A7139121447@ovs_exchanger.icegallery.com> In toying around with writing my own nagios plugin, I've run into a rather strange error, which I am hoping someone here has seen before.
The script I am running sets a date as part of a URL to check freshness of stats pages on my sites. Here's the script (simple? oh yes, and a kludge to boot):

#!/bin/bash
#
# This is a simple script to check our stats servers to make sure that
# new stats were uploaded properly today.

PATH=/bin:/usr/bin
URL=$1

if [ "$1" == "--help" -o "$1" == "-h" ]
then
    : # simple help output
fi

URLDATE=`date -d yesterday +"%Y\/%m"`
SEARCHDATE=`date +"%d %b %Y"`
URLOUT=`/bin/sed -e "s/DATE/$URLDATE/g" << EOF
$URL
EOF`

/usr/bin/lynx -dump http://$URLOUT | /bin/egrep "Last Update :[[:space:]]+$SEARCHDATE" > /dev/null
OUTCODE=$?

if [ "$OUTCODE" -eq "1" ]
then
    echo "CRITICAL: Update failed, check hex logs."
    exit 2
elif [ "$OUTCODE" -eq "0" ]
then
    echo "OK: Update for $SEARCHDATE has occurred."
    exit 0
else
    echo "WARNING: Unable to parse file, website down?"
    exit 1
fi

If I run this from the command line as nagios, I get:

nagios@tuxedo:~/libexec 125$ ./check_stats 10.1.1.50/DATE/awstats.market.html
OK: Update for 28 Jul 2003 has occurred.

But when I set it up and run it from nagios itself against the same server, I get the CRITICAL: Update failed... line every time. Things I've tried to no avail:
- putting the full path on every command (from sed to date)
- outputting additional information indicates that the URL is being constructed fine, the dates are all set properly.
- checked services.cfg, checkcommands.cfg, etc. All seem fine.

Frankly, all I can think of is that my return code is somehow always getting set to 1 for no reason. Is there something I am missing (like pathing, or some sort of nagios config I've missed)? I did try this with check_http, but getting the date set properly wasn't working at all, no matter how many $USERx$ lines I had.

William Dinyes Website Administrator OurVacationStore.com 10030 N. 25th Ave. Phoenix, AZ 85021 --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.504 / Virus Database: 302 - Release Date: 7/24/2003 -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/ms-tnef Size: 3488 bytes Desc: not available URL: From kessib at asp-platform.com Mon Jul 28 18:38:06 2003 From: kessib at asp-platform.com (Jessica Brauer) Date: Mon Jul 28 18:38:06 2003 Subject: [Nagiosplug-devel] Livecam Botschaft für Nagiosplugdevel Message-ID: An HTML attachment was scrubbed... URL: From roy at karlsbakk.net Tue Jul 29 02:20:13 2003 From: roy at karlsbakk.net (Roy Sigurd Karlsbakk) Date: Tue Jul 29 02:20:13 2003 Subject: [Nagiosplug-devel] Writing a new universal cross-platform plugin? Message-ID: <200307291116.47784.roy@karlsbakk.net> hi all This friend of mine came up with a rather neat idea: Write a new universal cross-platform plugin in perl (run as service in windows). Configure the service to expose the services to be monitored (disk space on device c: or /dev/sda, cpu etc), and have an auto-dicover script dicover all local hosts/services, creating nagios configs for it. Perhaps this should be done directly in a database? will nagios 2 support database-based configs? roy From pagerc at ufl.edu Tue Jul 29 14:11:08 2003 From: pagerc at ufl.edu (Raymond Page) Date: Tue Jul 29 14:11:08 2003 Subject: [Nagiosplug-devel] Need to know who wrote OutputTrap.pm Message-ID: <200307291710.29875.pagerc@ufl.edu> To whomever: I wrote earlier about writing a perl script that uses the Expect module. I get the following error from Nagios whenever the Expect module's spawn function is called. Perhaps the author could explain what functionality is lacking for me? Appreciate any comments, Raymond Page From roy at karlsbakk.net Tue Jul 29 15:07:06 2003 From: roy at karlsbakk.net (Roy Sigurd Karlsbakk) Date: Tue Jul 29 15:07:06 2003 Subject: [Nagiosplug-devel] Writing a new universal cross-platform plugin? 
In-Reply-To: <3F26DDC7.6000606@verisign.com> References: <200307291116.47784.roy@karlsbakk.net> <3F26DDC7.6000606@verisign.com> Message-ID: <200307300003.58358.roy@karlsbakk.net> I'm quite sure that this would benefit users. The reason of making a general plugin instead of mrtg, nsclient and all sorts of stuff is to simplify it all. SNMP is generally more tricky to set up than an agent on most systems, as you'll need to know SNMP, obviously. Most people don't. The auto-discovery part is almost finished already using multicast 'pings' (talking to the agents). So - give us a week or ten, I beleive this project will gain Nagios's quality a lot :) roy On Tuesday 29 July 2003 22:49, you wrote: > I'm not sure what kind of benefit you would gain from writing the client > side portion of this since SNMP agents exist for nearly all platforms > you could conceive. The non-trivial part is the auto-discovery. HP > OpenView does this via pings and SNMP queries, but you run into the > problem where machines with multiple interfaces end up as multiple > instances within your configuration. And that can be very messy. > > I think many people would be interested if some sort of simple > auto-discovery tool was available for Nagios in order to pre-populate > configurations. > > --Andy > > Roy Sigurd Karlsbakk wrote: > > hi all > > > > This friend of mine came up with a rather neat idea: > > > > Write a new universal cross-platform plugin in perl (run as service in > > windows). Configure the service to expose the services to be monitored > > (disk space on device c: or /dev/sda, cpu etc), and have an auto-dicover > > script dicover all local hosts/services, creating nagios configs for it. > > > > Perhaps this should be done directly in a database? will nagios 2 support > > database-based configs? 
> > > > roy > > > > > > > > ------------------------------------------------------- > > This SF.Net email sponsored by: Free pre-built ASP.NET sites including > > Data Reports, E-commerce, Portals, and Forums are available now. > > Download today and enter to win an XBOX or Visual Studio .NET. > > http://aspnet.click-url.com/go/psa00100003ave/direct;at.aspnet_072303_01/ > >01 _______________________________________________ > > Nagiosplug-devel mailing list > > Nagiosplug-devel at lists.sourceforge.net > > https://lists.sourceforge.net/lists/listinfo/nagiosplug-devel > > > > ::: Please include plugins version (-v) and OS when reporting any issue. > > ::: Messages without supporting info will risk being sent to /dev/null From kdebisschop at alert.infoplease.com Tue Jul 29 19:40:04 2003 From: kdebisschop at alert.infoplease.com (Karl DeBisschop) Date: Tue Jul 29 19:40:04 2003 Subject: [Nagiosplug-devel] check_nt.c Message-ID: <1059532723.15303.24.camel@miles.debisschop.net> Hi Yves, I'm trying to clenup code for a 1.4.0 alpha on the plugins, so I've got my compiler set very strict. I come up with this warning about comparison of unsigned expressions: gcc -O3 -march=athlon-xp -pedantic -W -Wimplicit-int -Wmain -Wreturn-type -Wunused -Wswitch -Wcomment -Wtrigraphs -Wformat -Wchar-subscripts -Wparentheses -Wtraditional -Wshadow -Wcast-qual -Wpointer-arith -L. -L/usr/lib -o check_dig check_dig.o netutils.o utils.o ../lib/libnagiosplug.a -lnsl -lresolv popen.o if gcc -DLOCALEDIR=\"/usr/local/nagios/share/locale\" -DHAVE_CONFIG_H -I. -I../../plugins -I. -I.. 
-I../../lib -I../../intl -I/usr/include/ldap -I/include -I/usr/include -I/usr/kerberos/include -O3 -march=athlon-xp -pedantic -W -Wimplicit-int -Wmain -Wreturn-type -Wunused -Wswitch -Wcomment -Wtrigraphs -Wformat -Wchar-subscripts -Wparentheses -Wtraditional -Wshadow -Wcast-qual -Wpointer-arith -MT check_nt.o -MD -MP -MF ".deps/check_nt.Tpo" \
  -c -o check_nt.o `test -f '../../plugins/check_nt.c' || echo '../../plugins/'`../../plugins/check_nt.c; \
then mv -f ".deps/check_nt.Tpo" ".deps/check_nt.Po"; \
else rm -f ".deps/check_nt.Tpo"; exit 1; \
fi
../../plugins/check_nt.c: In function `main':
../../plugins/check_nt.c:125: warning: comparison of unsigned expression >= 0 is always true
../../plugins/check_nt.c:127: warning: comparison of unsigned expression >= 0 is always true
gcc -O3 -march=athlon-xp -pedantic -W -Wimplicit-int -Wmain -Wreturn-type -Wunused -Wswitch -Wcomment -Wtrigraphs -Wformat -Wchar-subscripts -Wparentheses -Wtraditional -Wshadow -Wcast-qual -Wpointer-arith -L. -L/usr/lib -o check_nt check_nt.o netutils.o utils.o ../lib/libnagiosplug.a -lnsl -lresolv

The lines look like:

   else if(vars_to_check==CHECK_CPULOAD){
     if (check_value_list==TRUE) {
       if (strtolarray(&lvalue_list,value_list,",")==TRUE) {
         /* -l parameters is present with only integers */
         return_code=STATE_OK;
         asprintf(&temp_string,"CPU Load");
         while (lvalue_list[0+offset]>(unsigned long)0 &&
                lvalue_list[0+offset]<=(unsigned long)17280 &&
->              lvalue_list[1+offset]>=(unsigned long)0 &&
                lvalue_list[1+offset]<=(unsigned long)100 &&
->              lvalue_list[2+offset]>=(unsigned long)0 &&
                lvalue_list[2+offset]<=(unsigned long)100) {
           /* loop until one of the parameters is wrong or not present */
           /* Send request and retrieve data */

If there are no bugs in the program, those lines could be removed. I could also imagine there is a bug, and the comparator should be '>' instead of '>='. However, as I do not use nsclient or check_nt, I have only just now started looking at the code.
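To make the two options concrete, here is a small C sketch. It assumes the values really are unsigned long, as the casts in the excerpt suggest, and uses hypothetical helper names that do not appear in check_nt.c. For an unsigned type the lower bound of zero is implied, so only the upper bound needs testing; if zero must be rejected, that has to be written as an explicit '> 0'.

```c
#include <stdbool.h>

/* Hypothetical helper: range check for a 0..100 threshold held in an
 * unsigned long. No `value >= 0` term is needed (it is always true and
 * is what gcc warns about); the upper bound alone does the work. */
bool percent_in_range(unsigned long value)
{
    return value <= 100UL;
}

/* Hypothetical helper: the first element in the excerpt is tested with
 * `> 0`, i.e. zero is invalid there. If that were the intent for the
 * other fields too, the check would look like this instead. */
bool interval_in_range(unsigned long minutes)
{
    return minutes > 0UL && minutes <= 17280UL;
}
```

One side effect worth knowing: if a negative number is parsed with an unsigned conversion such as strtoul, it wraps to a very large value and fails the upper-bound test, so dropping the '>= 0' term does not let negative input through.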

Can you suggest the proper way to modify this code so the warning will be suppressed?

--
Karl DeBisschop

From andreas.unterkircher at cubit.at Thu Jul 31 01:51:10 2003
From: andreas.unterkircher at cubit.at (Andreas Unterkircher)
Date: Thu Jul 31 01:51:10 2003
Subject: [Nagiosplug-devel] check_traceroute
Message-ID: <1059641445.530.101.camel@winsucks>

Hello!

A few days ago there was an announcement for a plugin called check_traceroute, which I thought I saw in the CVS tree of nagiosplug... now it's gone... when will it be back?

greetings, andi

Andreas Unterkircher
CUBiT IT Solutions GmbH
Albertgasse 43
A-1080 Wien
Tel: +43-1-7189880-0
Fax: +43-1-7189880-11
andreas.unterkircher at cubit.at
http://www.cubit.at

From noreply at sourceforge.net Thu Jul 31 10:53:02 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu Jul 31 10:53:02 2003
Subject: [Nagiosplug-devel] [ nagiosplug-Feature Requests-781002 ] Modify check_ping for packet size
Message-ID:

Feature Requests item #781002, was opened at 2003-07-31 11:51
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=397600&aid=781002&group_id=29880

Category: None
Group: None
Status: Open
Priority: 5
Submitted By: J. Casalino (thedoc31)
Assigned to: Nobody/Anonymous (nobody)
Summary: Modify check_ping for packet size

Initial Comment:
check_ping doesn't currently support the -s (packetsize) feature of the ping command. I have an instance where normal ICMP packets sometimes don't show packet loss across our WAN, but larger packets do. It would be handy to be able to specify the packet size on the command line of check_ping, so I can set up a small-packet check and a large-packet check for WAN latency in Nagios.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=397600&aid=781002&group_id=29880

From noreply at sourceforge.net Thu Jul 31 15:29:03 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu Jul 31 15:29:03 2003
Subject: [Nagiosplug-devel] [ nagiosplug-Patches-769311 ] adds smtp auth ability to the check_smtp plugin
Message-ID:

Patches item #769311, was opened at 2003-07-10 15:47
Message generated for change (Comment added) made by trig_monkeypr0n
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=397599&aid=769311&group_id=29880

Category: Enhancement
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Jason Burnett (trig_monkeypr0n)
Assigned to: Jeremy T. Bouse (undrgrid)
Summary: adds smtp auth ability to the check_smtp plugin

Initial Comment:
Adds the ability to confirm that your smtp auth mechanism is working on your smtp server.

----------------------------------------------------------------------

>Comment By: Jason Burnett (trig_monkeypr0n)
Date: 2003-07-31 17:20

Message:
Logged In: YES
user_id=778916

Sorry been out of the office. I will try and get that patch against cvs done tonight.

----------------------------------------------------------------------

Comment By: Jeremy T. Bouse (undrgrid)
Date: 2003-07-17 00:38

Message:
Logged In: YES
user_id=10485

Can you provide a patch against a recent version of the CVS code? This patch appears to be against a very old version that has had many changes made to it since then.

----------------------------------------------------------------------

Comment By: Jeremy T. Bouse (undrgrid)
Date: 2003-07-17 00:18

Message:
Logged In: YES
user_id=10485

Looking into the patch

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=397599&aid=769311&group_id=29880

From noreply at sourceforge.net Thu Jul 31 17:17:21 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu Jul 31 17:17:21 2003
Subject: [Nagiosplug-devel] [ nagiosplug-Patches-769311 ] adds smtp auth ability to the check_smtp plugin
Message-ID:

Patches item #769311, was opened at 2003-07-10 15:47
Message generated for change (Comment added) made by trig_monkeypr0n
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=397599&aid=769311&group_id=29880

Category: Enhancement
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Jason Burnett (trig_monkeypr0n)
Assigned to: Jeremy T. Bouse (undrgrid)
Summary: adds smtp auth ability to the check_smtp plugin

Initial Comment:
Adds the ability to confirm that your smtp auth mechanism is working on your smtp server.

----------------------------------------------------------------------

>Comment By: Jason Burnett (trig_monkeypr0n)
Date: 2003-07-31 19:05

Message:
Logged In: YES
user_id=778916

This is the patch against the cvs version of nagiosplug checked out from anonymous cvs earlier today.

----------------------------------------------------------------------

Comment By: Jason Burnett (trig_monkeypr0n)
Date: 2003-07-31 17:20

Message:
Logged In: YES
user_id=778916

Sorry been out of the office. I will try and get that patch against cvs done tonight.

----------------------------------------------------------------------

Comment By: Jeremy T. Bouse (undrgrid)
Date: 2003-07-17 00:38

Message:
Logged In: YES
user_id=10485

Can you provide a patch against a recent version of the CVS code? This patch appears to be against a very old version that has had many changes made to it since then.

----------------------------------------------------------------------

Comment By: Jeremy T. Bouse (undrgrid)
Date: 2003-07-17 00:18

Message:
Logged In: YES
user_id=10485

Looking into the patch

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=397599&aid=769311&group_id=29880

From noreply at sourceforge.net Thu Jul 31 18:21:04 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu Jul 31 18:21:04 2003
Subject: [Nagiosplug-devel] [ nagiosplug-Patches-781227 ] Can now specify a port for check_disk_smb to use.
Message-ID:

Patches item #781227, was opened at 2003-07-31 20:20
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=397599&aid=781227&group_id=29880

Category: Enhancement
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Jason Burnett (trig_monkeypr0n)
Assigned to: Nobody/Anonymous (nobody)
Summary: Can now specify a port for check_disk_smb to use.

Initial Comment:
We needed the ability to specify which port to connect (139 or 445) depending on the host. This patch allows you to specify or leave it blank and use the default of your smbclient.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=397599&aid=781227&group_id=29880

From karl at debisschop.net Thu Jul 31 20:00:03 2003
From: karl at debisschop.net (Karl DeBisschop)
Date: Thu Jul 31 20:00:03 2003
Subject: [Nagiosplug-devel] setting LC=ALL in spopen.c
Message-ID: <1059706669.16157.56.camel@miles.debisschop.net>

The role of spopen() has been to provide a central piece of code for exec of system utilities like 'ps'. In the interest of security, it is fairly restrictive - it excludes shell expansions and strips the environment down to nothing.

In marking up the plugins for translation, I noticed that where we use spopen and scan through the resulting output for various strings, we could get flummoxed if the system default locale is not English. My proposed solution is to change:

   char *environ[] = { NULL };

to:

   char *environ[] = { "LC_ALL=C", (char *)0 };

I have two questions:

1) Is this really a problem we need to worry about, or does the current NULL environment override any language defaults on the machine?

2) If it is a problem, do people feel that the above is an appropriate way to handle it?

--
Karl
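
The proposal above can be sketched in isolation as follows. This is only an illustration under stated assumptions, not the spopen.c code: the names minimal_env and env_lookup are hypothetical, and the sketch assumes the array is handed to the child through an envp-style argument such as the third parameter of POSIX execve().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name: a stripped-down environment that carries only
 * LC_ALL=C, so utilities such as 'ps' emit untranslated output.
 * An envp-style array must end with a null pointer. */
static char *minimal_env[] = { "LC_ALL=C", NULL };

/* Hypothetical helper: return the value of `name` in an envp-style
 * array, or NULL if the variable is not present. */
static const char *env_lookup(char *const envp[], const char *name)
{
    size_t len = strlen(name);
    size_t i;

    for (i = 0; envp[i] != NULL; i++) {
        /* Match "NAME=" exactly, not just a shared prefix. */
        if (strncmp(envp[i], name, len) == 0 && envp[i][len] == '=')
            return envp[i] + len + 1;
    }
    return NULL;
}
```

A caller would pass minimal_env where it currently passes the empty array, for example as execve(path, argv, minimal_env). On question 1: POSIX leaves the locale implementation-defined when neither LANG nor any LC_* variable is set, so setting LC_ALL=C explicitly removes the dependence on whatever default a given system picks.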