[Nagiosplug-devel] check_ntp (Was: Flight 1.4.8, ready for boarding)

Thomas Guyot-Sionnest dermoth at aei.ca
Thu Apr 5 15:29:28 CEST 2007


On 05/04/07 03:08 AM, sean finney wrote:
> heya,
> 
> On Wed, 2007-04-04 at 18:05 -0400, Thomas Guyot-Sionnest wrote:
>> Not really. I don't know what's taken into account when calculating the
>> jitter but the dispersion is definitely higher in general (or always?)
> 
> okay, here are some definitions i got from an ntp powerpoint
> presentation made by the author:
> 
> Jitter: exponential average of first-order time differences
> Dispersion: maximum error due to oscillator frequency tolerance.
> 
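The jitter definition quoted above can be illustrated with a short C sketch. It keeps an exponential average of the absolute differences between successive offset samples. The function name and the weight `alpha` are hypothetical, and ntpd's actual computation may differ (for instance by averaging squared differences).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Exponential average of first-order time differences: each new
 * |offset[i] - offset[i-1]| pulls the running value towards itself by a
 * fraction `alpha` (hypothetical weight, 0 < alpha <= 1). The average
 * starts at 0.0. Offsets are in seconds.
 */
double exp_avg_jitter(const double *offsets, size_t n, double alpha)
{
    double jitter = 0.0;

    assert(alpha > 0.0 && alpha <= 1.0);
    for (size_t i = 1; i < n; i++) {
        double diff = offsets[i] - offsets[i - 1];   /* first-order difference */
        if (diff < 0.0)
            diff = -diff;
        jitter += alpha * (diff - jitter);           /* exponential averaging step */
    }
    return jitter;
}
```

With offsets {0.0, 1.0, 1.0, 3.0} and alpha = 0.5, the differences 1, 0, 2 give running values 0.5, 0.25 and 1.125.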
> so yeah, not quite the same thing.  and as you pointed out the
> differences in value are quite big.  should we really be checking it
> with jitter then?  maybe instead we could extend the plugin to check
> various variables/thresholds, and have it return some failure status
> when a non-existent variable is requested?

That could confuse users who don't know much about NTP, especially
since "ntpq -p <host>" will return the dispersion if there is
no jitter. I'd suggest the same behavior for check_ntp.
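A minimal sketch of that fallback, combined with Sean's idea of reporting failure when a variable isn't there. It assumes the readvar reply data is ASCII of the form `name=value, name=value, ...`, possibly wrapped with CR/LF; the function names are hypothetical and not check_ntp's actual code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Separators assumed between "name=value" pairs in the reply data. */
static int is_sep(char c)
{
    return c == ',' || c == ' ' || c == '\r' || c == '\n';
}

/*
 * Look up `name` in a variable list such as "offset=0.1, jitter=0.2".
 * Returns 0 and stores the value in *out, or -1 if the variable is absent.
 */
int ntp_lookup_var(const char *data, const char *name, double *out)
{
    size_t len;
    const char *p = data;

    assert(data != NULL && name != NULL && out != NULL);
    len = strlen(name);
    while (*p != '\0') {
        while (is_sep(*p))                      /* skip to the start of a pair */
            p++;
        if (strncmp(p, name, len) == 0 && p[len] == '=') {
            *out = strtod(p + len + 1, NULL);
            return 0;
        }
        p = strchr(p, ',');                     /* jump to the next pair */
        if (p == NULL)
            break;
    }
    return -1;
}

/*
 * Prefer "jitter"; fall back to "dispersion" when the server does not
 * report jitter. Returns -1 if neither variable is present.
 */
int ntp_jitter_or_dispersion(const char *data, double *out)
{
    if (ntp_lookup_var(data, "jitter", out) == 0)
        return 0;
    return ntp_lookup_var(data, "dispersion", out);
}
```

Matching only at the start of a pair keeps a name like `sys_jitter` from being mistaken for `jitter`.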


>> What you're polling in the jitter section are local variables on the
>> remote server. That server will update them as time goes on, but they're
>> not affected by network conditions.
> 
> ah, i see.  so the control packet data probably won't change in the
> interval that it's being checked.  in that case it doesn't make any
> sense to average it, i agree.  however, since this is udp we're talking
> about, maybe it's still worthwhile to throw a couple extra packets on
> the wire to make sure one of them is received?  or perhaps we could
> default to a single packet, but provide a configurable retry parameter
> or something.

I'll take a look at what the other check used to do... But I doubt it
would be useful for the jitter section.
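Sean's suggestion of a single packet by default plus a configurable retry count could look roughly like this. The attempt callback, the result enum and the `flaky_attempt` test double are all hypothetical; a real attempt would send one UDP packet and wait for the reply with a timeout.

```c
#include <assert.h>
#include <stddef.h>

/* Result of a single (hypothetical) request attempt. */
enum attempt_result { ATTEMPT_OK, ATTEMPT_TIMEOUT, ATTEMPT_ERROR };

/* One attempt: send a packet and wait, with a timeout, for the reply. */
typedef enum attempt_result (*attempt_fn)(void *ctx);

/*
 * Send one request and retry only on timeout, up to `retries` extra
 * attempts. Hard errors are not retried.
 * Returns the number of attempts made, or -1 if none succeeded.
 */
int request_with_retries(attempt_fn attempt, void *ctx, int retries)
{
    assert(attempt != NULL && retries >= 0);
    for (int i = 0; i <= retries; i++) {
        enum attempt_result r = attempt(ctx);
        if (r == ATTEMPT_OK)
            return i + 1;
        if (r == ATTEMPT_ERROR)
            break;              /* e.g. host unreachable: retrying won't help */
    }
    return -1;
}

/* Test double: times out until the counter in ctx reaches zero. */
static enum attempt_result flaky_attempt(void *ctx)
{
    int *timeouts_left = ctx;

    if (*timeouts_left > 0) {
        (*timeouts_left)--;
        return ATTEMPT_TIMEOUT;
    }
    return ATTEMPT_OK;
}
```

With a default of `retries = 0` the plugin sends exactly one packet, which matches the current behavior for the jitter section.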

>> Right now it gets the first server listed in DNS (while the offset
>> function gets them all) and finds the synchronization source. It then
>> checks the sync source; if there is none, it will check all candidates.
> 
> right.  again to be filed under "behaves as before" wrt check_ntp and
> ntpq/ntpdate.

So you're ok to check jitter on all servers?
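Selecting the sync source first and falling back to all candidates can be sketched from the peer status words returned by a mode-6 read-status request. The sketch assumes the RFC 1305 Appendix B layout (5 status bits, then a 3-bit peer selection field); the macro and function names, and the cutoff of 4 for "candidate", are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Peer selection field of a peer status word: the 3 bits just below the
 * top 5 status bits. `status` must already be in host byte order.
 */
#define PEER_SEL(status)   (((status) >> 8) & 0x07)
#define SEL_CANDIDATE      4   /* assumed: values >= 4 survived selection */
#define SEL_SYNCSOURCE     6   /* current synchronization source */

/*
 * Fill `out` (room for n entries) with the indices of the peers to check:
 * the sync source if there is one, otherwise every candidate.
 * Returns how many indices were stored.
 */
size_t peers_to_check(const uint16_t *status, size_t n, size_t *out)
{
    size_t count = 0;

    assert(status != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (PEER_SEL(status[i]) == SEL_SYNCSOURCE) {
            out[0] = i;
            return 1;
        }
    }
    for (size_t i = 0; i < n; i++) {
        if (PEER_SEL(status[i]) >= SEL_CANDIDATE)
            out[count++] = i;
    }
    return count;
}
```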

>>> actually, istr someone pointing out several months ago that we were
>>> really doing the wrong thing to begin with wrt jitter checking, and that
>>> we ought to really be checking the local jitter and not the jitter of
>>> remote systems to begin with, or something like that.  i'm going from
>>> some hazy memory here, but i think ultimately the problem is that there
>>> are two use cases for check_ntp, but the code has never differentiated
>>> between the two cases, and still doesn't.
>> The server jitter is somewhat related to its peers' jitter, so that's not a
>> big deal. Moreover, older ntp servers do not have dispersion for the
>> server itself, which makes supporting them even harder.
> 
> out of curiosity, is there any difference on the packet level that could
> let us know the version/vendor of the ntp server?

Yes, but I felt it was much safer to just try jitter first, then try
dispersion if jitter doesn't work. There are many different versions/forks
of ntp daemons out there, and it would be really difficult to behave
correctly on all of them.
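For reference, the packet-level information Sean asks about sits in the first byte of every NTP packet: leap indicator, protocol version and mode. The struct and function names below are hypothetical. Note that the version number identifies the protocol revision a peer speaks, not its vendor or fork, which is consistent with simply trying jitter and falling back to dispersion.

```c
#include <assert.h>

/* The three fields packed into the first byte of every NTP packet. */
struct ntp_hdr_bits {
    unsigned int li;     /* leap indicator, bits 7-6 */
    unsigned int vn;     /* version number, bits 5-3 */
    unsigned int mode;   /* mode, bits 2-0 (3 = client, 4 = server, 6 = control) */
};

struct ntp_hdr_bits ntp_decode_first_byte(unsigned int b)
{
    struct ntp_hdr_bits h;

    assert(b <= 0xFFu);          /* expects a single octet */
    h.li = (b >> 6) & 0x03u;
    h.vn = (b >> 3) & 0x07u;
    h.mode = b & 0x07u;
    return h;
}
```

For example, 0x1c decodes to LI 0, version 3, mode 4 (a server reply).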

>>> first you have the case of checking the status of the local system, by
>>> connecting to peers specified on the cmdline and verifying the offset.
>>> in such cases we really want to see the local jitter and not the remote
>>> jitter.
>> Can you explain? I remember the old perl script used to show a zero or near-zero
>> jitter on localhost, but that's definitely not what we get when
>> getting the server jitter. It looks more like it was getting the time
>> and then showing the jitter in that operation.
> 
> my memory is hazy, but i'll go digging through the list archives and see
> if i can find the message i'm thinking of.
> 
>>> the second case is when you're actually interested in the status of the
>>> remote system, and in this case you're comparing the state of its clock
>>> with that of yours (or others), and in which case you're interested in
>>> the jitter on the remote system.
>> This is what check_ntp currently does and should be the default behavior IMHO.
> 
> so for clarity: with check_ntp -H host, should the jitter on the host be
> calculated, or the jitter of its sync source / candidate sync sources be
> checked?

You mean: check_ntp -H host -j x -k y

Without -j or -k, jitter is not calculated at all.
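How the -j/-k thresholds might map onto plugin states, as a sketch. It uses the standard Nagios plugin return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and assumes -j is the warning and -k the critical jitter threshold, with a negative value meaning "not given". Returning UNKNOWN when neither jitter nor dispersion was available is also an assumption, not documented check_ntp behavior.

```c
#include <assert.h>

/* Standard Nagios plugin return codes. */
enum nagios_state { STATE_OK = 0, STATE_WARNING = 1, STATE_CRITICAL = 2, STATE_UNKNOWN = 3 };

/*
 * Hypothetical jitter check. `warn` (-j) and `crit` (-k) are negative
 * when not given; with neither set, jitter is not evaluated at all.
 * `have_jitter` is 0 when neither jitter nor dispersion could be read.
 */
enum nagios_state check_jitter(double jitter, int have_jitter, double warn, double crit)
{
    assert(!have_jitter || jitter >= 0.0);
    if (warn < 0.0 && crit < 0.0)
        return STATE_OK;               /* no -j/-k: jitter is not checked */
    if (!have_jitter)
        return STATE_UNKNOWN;
    if (crit >= 0.0 && jitter > crit)
        return STATE_CRITICAL;
    if (warn >= 0.0 && jitter > warn)
        return STATE_WARNING;
    return STATE_OK;
}
```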

Checking the host jitter or its sync source(s) is pretty much the same
(e.g. if you have only one peer, both values are equal).

The more I think about it, the more I believe there should be no other
way. The NTP server itself is the best place to look for time health. We
could, however, add more checks, like reachability...

>>> if i'm remembering all of this correctly, i think it would be best to
>>> provide a flag for which form of check we're doing and then have the
>>> plugin behave appropriately based on that.
>> Agreed. But for testing the first case we should only allow it to run
>> locally (e.g. through NRPE on a remote time server); otherwise it doesn't
>> make much sense.
> 
> agreed on both these.
> 
>>>> 5. Look into the possibility of storing some of the sent header fields in a
>>>> linked list on write and then matching them on reads. That would allow us to
>>>> send all packets as fast as possible (e.g. when checking the jitter of
>>>> all sync candidates) and also to easily drop odd packets. If put in a
>>>> separate routine, that would also make it easy to loop for additional
>>>> packets and append the data. (Any other suggestions?)
>>> i'm not quite sure i follow here.  how is this different from poll on an
>>> array of sockets...?  currently afaik the data *is* sent as fast as
>>> possible, and we read the data as fast as it comes in.  if we need more
>>> per-host information, we know ahead of time how many hosts/sockets/etc
>>> are needed, so i don't think there's any need for a linked list
>>> instead of a pre-allocated array for whatever extra data we need to
>>> track.
>> This has nothing to do with the array of sockets, but rather with making
>> sure what we get back is what we expect. This is of the lowest priority but
>> should speed up some sequential operations a bit when there is latency
>> (like checking jitter on all candidates).
> 
> i think i was looking at the offset_request function again, oops.
> anyway, probably the same method could still be used, though the setup
> might be a little more complicated if the total size results from a few
> different getaddrinfo calls.  is that what you were thinking of using
> the linked list for?  i.e. the setup and not the actual i/o?

I was thinking of resolving the addresses once and putting them in an
array. The linked list would contain some of the header fields on sent
packets so that when we get a packet back we can match it. This is
longer-term though so we'll see later.
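The matching idea could be sketched as a singly linked list of pending requests, keyed on the sequence number and association ID fields of the mode-6 control header. The struct and function names are hypothetical; a reply that matches nothing returns -1 so the caller can drop it as an odd packet.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Header fields remembered for each request that is still in flight. */
struct pending_req {
    uint16_t sequence;            /* control message sequence number */
    uint16_t assoc;               /* association ID that was queried */
    int host;                     /* index into the resolved-address array */
    struct pending_req *next;
};

/* Remember a request just written to the wire. Returns 0, or -1 if out of memory. */
int pending_add(struct pending_req **head, uint16_t seq, uint16_t assoc, int host)
{
    struct pending_req *r;

    assert(head != NULL);
    r = malloc(sizeof *r);
    if (r == NULL)
        return -1;
    r->sequence = seq;
    r->assoc = assoc;
    r->host = host;
    r->next = *head;
    *head = r;
    return 0;
}

/*
 * Match a received reply against the pending list. On a match the entry
 * is unlinked and freed and its host index returned; otherwise -1.
 */
int pending_match(struct pending_req **head, uint16_t seq, uint16_t assoc)
{
    assert(head != NULL);
    for (struct pending_req **pp = head; *pp != NULL; pp = &(*pp)->next) {
        struct pending_req *r = *pp;
        if (r->sequence == seq && r->assoc == assoc) {
            int host = r->host;
            *pp = r->next;
            free(r);
            return host;
        }
    }
    return -1;
}
```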

Thomas



