[Nagiosplug-help] IIS Load and throughput

Thomas Guyot-Sionnest dermoth at aei.ca
Fri Oct 5 14:26:44 CEST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 05/10/07 01:55 AM, Anthony Montibello wrote:
> "The problem with them is that they cannot get
> the average value for a real period of time. Also NC_Net doesn't handle
> counters that go away momentarily (i.e. restarted service).
> <http://sourceforge.net/projects/nc-net>"
>  
> Did you test NC_Net v 4.1a for the counters going away issue? I re-coded
> the Counter check such that it should be fixed, unless the counter is
> away at the moment of testing, but you should be able to recheck it for
> a good value. (please let me know about this)

That's good news. However, as I said, we're not using this system to graph
our systems' perfmon counters as originally planned, and the decision isn't
mine. There are still some NC_Net instances here and there, but I most
likely won't have time to play with that anytime soon. And finally, the
most problematic counters are not monitored anymore because of some
architectural changes (they're useless now).

> I agree about the problem in recovering the rate/time values, and
> have a few notes about it.
> Does the accuracy of the value really matter? When it is sampled you
> get a snapshot of the value at that moment, and with lots of
> instantaneous samples put into an RRD, isn't that close enough to give a
> good overview of the load?
>  
> I would think many monitoring apps do not take into account that these
> rates need to be prefetched in order to determine a rate/time for X min.
> It is all a compromise of resources, and since the users are removed from
> the details of the implementation they naturally assume it is what the
> label implies. To put it simply, a brief 1/2 sec check every 5 min of
> a time/rate value is NOT a 5 min average, and in some cases it may not be
> a good representation of the average.

Well, the problem is two-sided. Not only is the time sampled very short,
but it still takes a bit too long to have a poller run through all counters
every minute. As a workaround we use Nagios to poll the counters and use a
Nagios performance data caching daemon to hold the latest value so that
Cacti can fetch them in almost no time (in our system we get samples
every minute - that's also the precision of our RRDs).
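
To make the caching idea concrete, here is a minimal sketch of a
latest-value cache. It is not the daemon we actually run; the class name,
the (host, counter) key and the max_age cutoff are all assumptions made
for illustration.

```python
import time


class LatestValueCache:
    """Hold the most recent sample per (host, counter), with its timestamp.

    Hypothetical sketch: the poller calls store(), the grapher calls fetch().
    """

    def __init__(self, max_age=120.0, clock=time.time):
        self.max_age = max_age      # seconds before a sample counts as stale
        self._clock = clock         # injectable clock, handy for testing
        self._values = {}

    def store(self, host, counter, value):
        # Overwrite whatever was there: only the latest sample is kept.
        self._values[(host, counter)] = (self._clock(), value)

    def fetch(self, host, counter):
        # Return the cached value, or None if missing or too old.
        entry = self._values.get((host, counter))
        if entry is None:
            return None
        stamp, value = entry
        if self._clock() - stamp > self.max_age:
            return None
        return value
```

The point of the design is that the fetch side never waits on a remote
host: it only reads memory, so the slow polling and the fast graphing run
independently.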

The real source of the problem is that Windows cannot provide real
counters from which you can derive precise values by dividing the change
in the counter by the time between two polls. Since Windows can't be
fixed, the only solution left is a daemon that does the work for us.
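
For comparison, this is roughly what a "real" (ever-increasing) counter
makes possible. It is a sketch only: the function name and the
(timestamp, value) tuple layout are assumptions, and treating a decrease
as a reset is one possible policy among others.

```python
def rate_between_polls(prev, curr):
    """Average rate over the interval between two polls.

    prev and curr are (timestamp_seconds, counter_value) tuples taken from
    a counter that only ever increases (hypothetical example input).
    """
    (t0, c0), (t1, c1) = prev, curr
    elapsed = t1 - t0
    if elapsed <= 0:
        raise ValueError("polls must be in increasing time order")
    if c1 < c0:
        # Counter went backwards, e.g. the service restarted: skip interval.
        return None
    return (c1 - c0) / elapsed
```

With such a counter, two polls five minutes apart give a true five-minute
average no matter how brief each poll is, which is exactly what an
instantaneous sample cannot give.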

> I think in some cases this does matter; that's why NC_Net implements
> CPULOAD in the way you described. It keeps an internal RRD of the CPU
> (_TOTAL) with the time between samples configurable in the
> config, roughly 12 times a min. Then when CPU load is requested it
> calculates the average.
>  
> I have also thought of a similar proxy for Windows that basically does
> what you describe, although I have not had funding to begin working on
> it. One of the issues is that most people assume that if you can retrieve
> the counter or the value from WMI then it's OK, and they forget that
> Microsoft is sometimes a bit misleading.
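
The sampling scheme quoted above (frequent samples kept locally, averaged
when the value is requested) can be sketched as follows. This is not
NC_Net's code; the class name, the time-based window and the method names
are assumptions for illustration.

```python
from collections import deque


class RollingAverage:
    """Keep recent samples and average them on request (hypothetical)."""

    def __init__(self, window_seconds=300.0):
        self.window = window_seconds
        self._samples = deque()     # (timestamp, value), oldest first

    def add(self, timestamp, value):
        # Record a sample, then drop anything older than the window.
        self._samples.append((timestamp, value))
        cutoff = timestamp - self.window
        while self._samples and self._samples[0][0] < cutoff:
            self._samples.popleft()

    def average(self):
        # Mean of the samples still inside the window, or None if empty.
        if not self._samples:
            return None
        return sum(v for _, v in self._samples) / len(self._samples)
```

A sampler thread would call add() every few seconds (about 12 times a
minute in the quoted description), and the check handler would call
average() when polled.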

Yes, that's another issue. I understood the problem by reading M$ KB
articles about the different ways to poll performance data, but I don't
expect many people will even find (or look for) them :)

A good start could be explaining this issue on your NC_Net page and hoping
NC_Net users will read it...

Thomas
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHBi2E6dZ+Kt5BchYRAiMIAJ41P+dG6kTfsM+p6w8XwbAlBJtmEACcD0u6
dTFltlPu7vDnqGryeESLsBs=
=7/b0
-----END PGP SIGNATURE-----



