<div>"The real source of the problem is Windows that cannot provide real<br>counters from which you can derive precise values by calculating the<br>time between two polls. Since Windows can't be fixed the only solution <br>left is a daemon that does the work for us."</div> <div> </div> <div>I will stop responding after this message, since I think the thread may no longer benefit the Nagios community.</div> <div> </div> <div>Windows does actually provide real counters (and the sample interval between polls can be varied), but I agree the problem lies with Windows.</div> <div>There is no good mechanism for retrieving average values remotely without keeping a local cache of intermediate values. </div> <div>For example, if you want a 5 min average of a counter, it MUST be prepared ahead of time (Windows cannot provide it in real time); otherwise the monitoring server would have to wait 5 min (unacceptable). </div> <div> </div> <div>Is anyone else interested in retrieving more accurate averages of these values, or is this a dead end?</div> <div> </div> <div>Tony (author of NC_Net)</div> <div> </div> <div><br><br> </div> <div><span class="gmail_quote">On 10/5/07, <b class="gmail_sendername">Thomas Guyot-Sionnest</b> <<a href="mailto:dermoth@aei.ca">dermoth@aei.ca</a>> wrote:</span> <blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid">-----BEGIN PGP SIGNED MESSAGE-----<br>Hash: SHA1<br><br>On 05/10/07 01:55 AM, Anthony Montibello wrote:<br> > "The problem with them is that they cannot get<br>> the average value for a real period of time. Also NC_Net doesn't handle<br>> counters that go away momentarily (i.e. restarted service).<br>> < <a href="http://sourceforge.net/projects/nc-net">http://sourceforge.net/projects/nc-net</a>>"<br>><br>> Did you test NC_Net v 4.1a for the counters going away issue? 
I re-coded<br>> the Counter check such that it should be fixed, unless the counter is <br>> away at the moment of testing, but you should be able to recheck it for<br>> a good value. (please let me know about this)<br><br>That's good news. However, as I said, we're not using this system to graph <br>our systems' perfmons as was originally planned, and the decision isn't<br>mine. There are still some NC_Net instances here and there but I most<br>likely won't have time to play with that anytime soon. And finally the <br>most problematic counters are not monitored anymore because of some<br>architectural changes (they're useless now).<br><br>> I agree about the problem in recovering the rate-over-time values, and<br>> have a few notes about it. <br>> Does the accuracy of the value really matter? When it is sampled you<br>> get a snapshot of the value at that moment, and with lots of<br>> instantaneous samples put into an RRD, isn't that close enough to give a <br>> good overview of the load?<br>><br>> I would think many monitoring apps do not take into account that these<br>> rates need to be prefetched in order to determine a rate/time for X min.<br>> It is all a compromise of resources, and since the users are removed from <br>> the details of the implementation they naturally assume it is what the<br>> label implies. To put it simply, a brief 1/2 sec check every 5 min of<br>> a time/rate value is NOT a 5 min average and in some cases it may not be <br>> a good representation of the average.<br><br>Well, the problem is dual-faced. Not only is the time sampled very short,<br>it's still a bit long to have a poller run through all counters every<br>minute. 
As a workaround we use Nagios to poll the counters and use a <br>Nagios performance data caching daemon to hold the latest value so that<br>Cacti can fetch them in almost no time (in our system we get samples<br>every minute - that's also the precision of our RRDs).<br><br>The real source of the problem is Windows that cannot provide real <br>counters from which you can derive precise values by calculating the<br>time between two polls. Since Windows can't be fixed the only solution<br>left is a daemon that does the work for us.<br><br>> I think in some cases this does matter, that's why nc_net implements <br>> CPULOAD in the way you described, it keeps an internal RRD of the CPU<br>> (_TOTAL) with the time between samples configurable in the<br>> config, roughly 12 times a min. Then when CPU load is requested it <br>> calculates the average.<br>><br>> I have also thought of a similar proxy for Windows that basically does<br>> what you describe, although I have not had funding to begin working on<br>> it. One of the issues is that most people assume if you can retrieve <br>> the counter or the value from WMI then it's OK, and they forget that<br>> Microsoft sometimes is a bit misleading.<br><br>Yes that's another issue. I understood the problem by reading M$ KB<br>articles about the different ways to poll performance data but I don't <br>expect many people will even find (or look for) them :)<br><br>A good start could be explaining this issue in your NC_Net page and hope<br>NC_Net users will read it...<br><br>Thomas<br>-----BEGIN PGP SIGNATURE-----<br> Version: GnuPG v1.4.6 (GNU/Linux)<br>Comment: Using GnuPG with Mozilla - <a href="http://enigmail.mozdev.org">http://enigmail.mozdev.org</a><br><br>iD8DBQFHBi2E6dZ+Kt5BchYRAiMIAJ41P+dG6kTfsM+p6w8XwbAlBJtmEACcD0u6<br>dTFltlPu7vDnqGryeESLsBs= <br>=7/b0<br>-----END PGP SIGNATURE-----<br></blockquote></div><br>
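<div>As an illustration of the "local cache of intermediate values" idea discussed in this thread, here is a minimal sketch in Python. It is not NC_Net's actual code; the class name <b>RollingAverage</b> and the parameters <b>window_seconds</b> and <b>interval_seconds</b> are hypothetical, and it assumes some separate timer calls <b>add()</b> once per interval with a fresh counter reading.</div>

```python
from collections import deque


class RollingAverage:
    """Hypothetical local cache of recent samples (illustrative only).

    A background timer is assumed to call add() once per interval, so the
    average over the window is already prepared when a check asks for it.
    """

    def __init__(self, window_seconds=300, interval_seconds=5):
        # With a 5 s interval this keeps roughly 12 samples per minute.
        # The deque silently drops the oldest sample once it is full.
        self.samples = deque(maxlen=window_seconds // interval_seconds)

    def add(self, value):
        # Store one instantaneous counter reading.
        self.samples.append(value)

    def average(self):
        # Answer immediately from the cache; None if nothing was sampled yet.
        if not self.samples:
            return None
        return sum(self.samples) / len(self.samples)


if __name__ == "__main__":
    cpu = RollingAverage(window_seconds=60, interval_seconds=5)
    for reading in (10, 20, 30):
        cpu.add(reading)
    print(cpu.average())  # prints 20.0
```

<div>The point of the sketch is only that the monitoring server gets its answer at once, because the sampling happened ahead of time on the monitored host.</div>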