[Nagiosplug-devel] check_ntp (Was: Flight 1.4.8, ready for boarding)

Thomas Guyot-Sionnest dermoth at aei.ca
Thu Apr 5 00:05:40 CEST 2007


On 04/04/07 04:03 PM, sean finney wrote:
> hey thomas,
> 
> On Wed, 2007-04-04 at 13:37 -0400, Thomas Guyot-Sionnest wrote:
>> The problem with xntpd is that it doesn't have a jitter value (ntp v3).
>> I'll work up a patch a bit different from what I sent but basically
>> it'll do the same: use dispersion in place of jitter.
> 
> okay.  i *think* that's the same thing with just a different name,
> right?

Not really. I don't know what's taken into account when calculating the
jitter, but the dispersion is definitely higher in general (or always?).

ntpq> lass

ind assID status  conf reach auth condition  last_event cnt
===========================================================
[...]
  3 63998  97f4   yes   yes  none  pps.peer   reachable 15
[...]
ntpq> rv 63998 jitter,dispersion
assID=63998 status=97f4 reach, conf, sel_pps.peer, 15 events, event_reach,
jitter=0.002, dispersion=0.926
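
For illustration, here's a minimal sketch of the fallback I have in mind
(this is not the actual check_ntp code; the parsing helper is made up):
if the readvar reply carries no "jitter" variable, take "dispersion"
instead.

/* Sketch only: given the text of a readvar reply such as
 * "jitter=0.002, dispersion=0.926", return the jitter, falling back to
 * dispersion for NTPv3 (xntpd) servers that don't report jitter. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Naive lookup of "name=value" in a comma-separated variable list;
 * returns the value, or -1.0 if the variable is not present. */
static double ntp_var(const char *reply, const char *name)
{
    const char *p = strstr(reply, name);
    if (p == NULL || p[strlen(name)] != '=')
        return -1.0;
    return atof(p + strlen(name) + 1);
}

static double peer_jitter(const char *reply)
{
    double j = ntp_var(reply, "jitter");
    if (j < 0.0)                        /* xntpd: no jitter variable */
        j = ntp_var(reply, "dispersion");
    return j;
}

int main(void)
{
    printf("%.3f\n", peer_jitter("jitter=0.002, dispersion=0.926")); /* 0.002 */
    printf("%.3f\n", peer_jitter("dispersion=0.926"));               /* 0.926 */
    return 0;
}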

>> While we were working on this check, Holger and I raised a few
>> issues. I worked up a todo list and I'd like to share it with you. Only
>> the older ntp client support would go in before the 1.4.8 release.
>>
>> 1. Older ntp server support (I'm working on it)
>>
>> 2. The offset and jitter don't change on every call, so there's no
>> reason to poll 4 times and compute the average. I'd like to remove all
>> code related to that.
> 
> one of the design goals when i first wrote the plugin was to mimic as
> closely as possible the behaviour of the previous perl-based check_ntp
> plugin.  after analyzing the plugin plus the source for ntp plus packet
> dumping what the ntp cmdline programs did, this was the behaviour i
> found.  that doesn't necessarily mean that various behaviours are the
> ideal behaviour, but just so you know where it came from :)
> 
> but anyway, as far as sampling/averaging goes, the offset/delay can vary a
> bit more if the network is less than reliable iirc, hence the multiple
> requests.  this is what the ntp cmdline client does as well.

What you're polling in the jitter section is local variables on the
remote server. That server will update them as time goes by, but they're
not affected by network conditions.

>> 3. Allow -H to be used multiple times
> 
> seems reasonable, though i'd probably have a separate nagios check for
> each host.  see comments below for (4) though.
> 
>> 3a. Do one lookup for the servers and store an array of IPs for the
>> various functions. (Is it worth it? Will avoid code duplication
>> implementing #4)
> 
> isn't that what's already done with the getaddrinfo(), and
> array-of-sockets allocation?

The hostname is resolved again in the jitter section. Anyway, that's not
a big deal; it would only be useful for implementing #4.
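
Something like this rough sketch of 3a is what I mean (made-up example,
not the current plugin code): resolve each -H argument once with
getaddrinfo() and keep the resulting addresses in one array that both
the offset and jitter code paths can reuse.

#include <stdio.h>
#include <string.h>
#include <netdb.h>
#include <sys/socket.h>

#define MAX_ADDRS 32

static struct sockaddr_storage addrs[MAX_ADDRS];
static socklen_t addr_lens[MAX_ADDRS];
static int num_addrs = 0;

/* Resolve one server name and append every address it maps to. */
static int add_server(const char *host)
{
    struct addrinfo hints, *res, *ai;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_DGRAM;    /* NTP is UDP, port 123 */

    if (getaddrinfo(host, "123", &hints, &res) != 0)
        return -1;

    for (ai = res; ai != NULL && num_addrs < MAX_ADDRS; ai = ai->ai_next) {
        memcpy(&addrs[num_addrs], ai->ai_addr, ai->ai_addrlen);
        addr_lens[num_addrs] = ai->ai_addrlen;
        num_addrs++;
    }
    freeaddrinfo(res);
    return 0;
}

int main(int argc, char **argv)
{
    int i;
    for (i = 1; i < argc; i++)         /* e.g. one argument per -H */
        add_server(argv[i]);
    printf("resolved %d address(es)\n", num_addrs);
    return 0;
}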

>> 4. When multiple servers are specified (either multiple IPs per hostname
>> or multiple -H arguments), check the jitter for all servers.
> 
> i think this falls back into the mimicking-behaviour design again.
> previously i believe we only checked the jitter on the remote clock
> declared as the sync source, but i could be wrong.  i don't really think
> this is the *right* behaviour, but before i went fixing it the idea was
> to get something that was compatible with the current version.

Right now it gets the first server listed in DNS (while the offset
function gets them all) and finds the synchronization source. It then
checks the sync source; if there is none it will check all candidates.
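
In case it helps, here's a hedged sketch of that selection logic (not
the plugin source verbatim): the peer status word in a mode-6 response
carries a 3-bit selection field in bits 8-10, where 6 (sys.peer) or 7
(pps.peer) marks the synchronization source and 4 or above marks a
candidate. In the ntpq output above, status 97f4 gives
(0x97f4 >> 8) & 0x7 == 7, i.e. pps.peer.

#include <stdio.h>

#define PEER_SEL(status)  (((status) >> 8) & 0x7)
#define SEL_CANDIDATE     4   /* selection candidate */
#define SEL_SYSPEER       6   /* current sync source (7 means pps.peer) */

int main(void)
{
    /* made-up example status words for three associations */
    unsigned short statuses[] = { 0x97f4, 0x9414, 0x9024 };
    int i, have_syspeer = 0;

    /* first pass: is any association the sync source? */
    for (i = 0; i < 3; i++)
        if (PEER_SEL(statuses[i]) >= SEL_SYSPEER)
            have_syspeer = 1;

    /* query the sync source if there is one, otherwise all candidates */
    for (i = 0; i < 3; i++) {
        int sel = PEER_SEL(statuses[i]);
        if ((have_syspeer && sel >= SEL_SYSPEER) ||
            (!have_syspeer && sel >= SEL_CANDIDATE))
            printf("would query jitter for association %d (sel=%d)\n", i, sel);
    }
    return 0;
}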

> actually, istr someone pointing out several months ago that we were
> really doing the wrong thing to begin with wrt jitter checking, and that
> we ought to really be checking the local jitter and not the jitter of
> remote systems to begin with, or something like that.  i'm going from
> some hazy memory here, but i think ultimately the problem is that there
> are two use cases for check_ntp, but the code has in the past and still
> currently not differentiated between the two cases. 

The server jitter is somewhat related to its peers' jitter, so that's not
a big deal. Moreover, older ntp servers do not report a dispersion for
the server itself, which makes supporting them even harder.

> first you have the case of checking the status of the local system, by
> connecting to peers specified on the cmdline and verifying the offset.
> in such cases we really want to see the local jitter and not the remote
> jitter.

Can you explain? I remember the old perl script used to show a jitter of
0 (or almost 0) on localhost, but that's definitely not what we get when
fetching the server jitter. It looks more like it was getting the time
and then showing the jitter of that operation.

> the second case is when you're actually interested in the status of the
> remote system, and in this case you're comparing the state of its clock
> with that of yours (or others), and in which case you're interested in
> the jitter on the remote system.

This is what check_ntp currently does, and it should be the default behavior IMHO.

> if i'm remembering all of this correctly, i think it would be best to
> provide a flag for which form of check we're doing and then have the
> plugin behave appropriately based on that.

Agreed. But for testing the first case we should only allow running it
locally (e.g. through NRPE on a remote time server); otherwise it doesn't
make much sense.

>> 5. Look into the possibility of storing some of the sent headers in a
>> linked list on write and then matching them on reads. That will allow
>> sending all packets as fast as possible (ex. when checking the jitter of
>> all sync candidates) and also easily dropping odd packets. If put in a
>> separate routine that would also make it easy to loop for additional
>> packets and append the data. (Any other suggestion?)
> 
> i'm not quite sure i follow here.  how this is different from poll on an
> array of sockets...?  currently afaik the data *is* sent as fast as
> possible, and we read the data as fast as it comes in.  if we need more
> per-host information, we know ahead of time how many hosts/sockets/etc
> that are needed, so i don't think there's any need for a linked list
> instead of a pre-allocated array for whatever extra data we need to
> track.

This has nothing to do with the array of sockets; it's about making sure
that what we get back is what we expect. This is of the lowest priority,
but it should speed up some sequential operations a bit when there is
latency (like checking the jitter on all candidates).
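
Here's a rough sketch of what I mean (made-up structures, not a patch;
I'm using a plain array instead of a linked list since, as you say, we
know the count ahead of time): an NTP server copies the client's
transmit timestamp into the reply's originate timestamp field, so if we
remember what we sent we can fire off all requests at once and then
match (or drop) replies as they arrive, in any order.

#include <stdio.h>
#include <stdint.h>

struct sent_req {
    uint64_t xmt;       /* transmit timestamp we put in the request */
    int      peer;      /* which peer/socket it was sent to */
    int      answered;  /* reply seen yet? */
};

/* Find the outstanding request whose transmit timestamp matches the
 * originate timestamp of a received reply; returns its index or -1. */
static int match_reply(struct sent_req *sent, int nsent, uint64_t org)
{
    int i;
    for (i = 0; i < nsent; i++)
        if (!sent[i].answered && sent[i].xmt == org)
            return i;
    return -1;          /* odd or duplicate packet: drop it */
}

int main(void)
{
    struct sent_req sent[2] = { { 0x1111aaaaULL, 0, 0 },
                                { 0x2222bbbbULL, 1, 0 } };

    /* a reply whose originate timestamp echoes the second request */
    int idx = match_reply(sent, 2, 0x2222bbbbULL);
    if (idx >= 0) {
        sent[idx].answered = 1;
        printf("reply matches request to peer %d\n", sent[idx].peer);
    } else {
        printf("unexpected packet, dropping\n");
    }
    return 0;
}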

Thomas



