[Nagiosplug-devel] Re: SNMP + Nag Was: Kickoff for 1.5
Stanley.Hopcroft at IPAustralia.Gov.AU
Thu Mar 10 03:30:01 CET 2005
Not much here but,
On Wed, Mar 09, 2005 at 09:21:23PM -0500, Subhendu Ghosh wrote:
> On Wed, 9 Mar 2005, Harper Mann wrote:
> >Hi Everyone,
> >There are several items in an SNMP plugin discussion we're interested in
> >and are working on. What I can remember off the top of my head is:
> >1) How to manage and alarm on counter data like interface traffic, etc. We
> >use check_rrd, which was mentioned earlier in this thread, and perhaps
> >that's sufficient since we customarily store and graph, but standardizing
> >this would be good. We're not sure RRDTool will scale to sufficient size
If the devices support RMON (and most do), then the alarm group
transforms the problem into one of trap harvesting (i.e. define alarm
thresholds on trunks in the switch/router and have it send traps when
the threshold is exceeded). The only con is that the thresholds are
static, not adaptive.
See below if you want to allow for diurnal/seasonal variation.
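For anyone doing the same comparison in a poller rather than on the device, here is a minimal Python sketch of a static rising threshold on counter data. It is only similar in spirit to an RMON alarm, not a copy of it; the function names and the assumption of a 32-bit counter that wraps at most once between samples are mine.

```python
COUNTER32_MODULUS = 2 ** 32  # assumption: 32-bit counters such as ifInOctets


def counter_rate(prev, curr, seconds, modulus=COUNTER32_MODULUS):
    """Per-second rate between two counter samples, allowing for one wrap."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    # Modular subtraction gives the right delta even if the counter wrapped once.
    delta = (curr - prev) % modulus
    return delta / seconds


def exceeds_threshold(prev, curr, seconds, rising_threshold):
    """True when the rate is at or above a fixed (non-adaptive) threshold."""
    return counter_rate(prev, curr, seconds) >= rising_threshold
```

The threshold here is just as static as the RMON one, so it has the same weakness with daily or seasonal variation.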
> >2) We've had a request to collect 3-4 SNMP values (in, out, errors) from
> >more than 10,000 interfaces every 15 minutes so we're looking into how to
> >scale to such a large installation. Aside from how to get plugins to keep
> >up with collecting, what's the best way to store so much performance data?
> >3) Fix the performance data so it conforms to the project standards and
> >manages OIDs and Symbolic names well for multiple requests.
> Separate out the functionality - Nagios is primarily a fault management
> tool. For 10k interface performance choose a performance
> management(monitor) tool.
Absolutely. I think the nomenclature is
1 a poller/collector - interrogates the thingys and saves the data
2 an analyser/presenter - summarise the saved data and report by various
These are best implemented as separate processes so they can perform
Non-blocking IO with Net::SNMP outperforms forking a Net::SNMP::get.
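To illustrate the non-blocking idea without Perl, here is a rough sketch using Python's asyncio instead of Net::SNMP. The `fake_snmp_get` coroutine is a stand-in I made up so the example runs without a network; a real poller would put its SNMP request there. The host names and the `max_in_flight` limit are also assumptions.

```python
import asyncio


async def fake_snmp_get(host, delay=0.01):
    # Hypothetical stand-in for a non-blocking SNMP request; sends no traffic.
    await asyncio.sleep(delay)
    return host, {"ifInOctets": 0, "ifOutOctets": 0}


async def poll_all(hosts, max_in_flight=100):
    # One process keeps many requests outstanding instead of forking per host.
    limit = asyncio.Semaphore(max_in_flight)

    async def poll_one(host):
        async with limit:
            return await fake_snmp_get(host)

    pairs = await asyncio.gather(*(poll_one(h) for h in hosts))
    return dict(pairs)


def run_poll(hosts):
    """Synchronous entry point: poll every host and return {host: values}."""
    return asyncio.run(poll_all(hosts))
```

With the simulated delay, polling many hosts takes roughly one delay period per batch of `max_in_flight` rather than one per host, which is the point being made about forking.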
Storing data in RRDs has the advantages that
1 Lots of third party applications know and love RRDs (orca, cricket)
2 The Holt-Winters time series prediction algorithm can let the analyser
distinguish a daily surge from an anomaly/problem
NB Toby the RRD man has got funding from a client to bring the dev
branch RRD - with the HW stuff - into supported production form.
3 the RRDs are self maintaining. Except in exceptional cases there is no
need to unload and resize databases when the db fills up (it never does)
4 the storage of an RRD never exceeds what is allocated when the RRD is
created
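As a rough illustration of point 2, here is a small additive Holt-Winters sketch in Python. It is not RRDtool's implementation (which also tracks deviation bands); the smoothing constants, the function name and the way the first season seeds the state are all my assumptions. It returns how far each sample lies from the seasonal prediction, so a regular daily surge scores near zero and an unexpected spike scores high.

```python
def seasonal_residuals(series, period, alpha=0.5, beta=0.1, gamma=0.3):
    """Return |actual - predicted| for every point after the first season."""
    if period <= 0 or len(series) < 2 * period:
        raise ValueError("need at least two full seasons of data")
    first = series[:period]
    level = sum(first) / period            # baseline from the first season
    trend = 0.0                            # assumption: start with no trend
    season = [x - level for x in first]    # seasonal offsets from the baseline
    residuals = []
    for t in range(period, len(series)):
        i = t % period
        x = series[t]
        predicted = level + trend + season[i]
        residuals.append(abs(x - predicted))
        # Standard additive Holt-Winters updates.
        new_level = alpha * (x - season[i]) + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        season[i] = gamma * (x - new_level) + (1 - gamma) * season[i]
        level = new_level
    return residuals
```

An analyser could flag a sample when its residual is well above what recent history shows, rather than comparing against one fixed number.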
> I've been partial to Cricket for snmp data collection - the snmp
> pretty well designed so that each device is only contacted once and all
> the different oids are requested together. (cricket.sf.net)
> I've seen it scale quite well so long as you can stagger the host
> groups (ie. not everything runs at the same 15 min interval) and you can
> use snmp v2 and get-bulk
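To put numbers on the load and the staggering, here is a back-of-envelope sketch in Python. The figures come from the request quoted earlier (10,000 interfaces, up to 4 values each, every 15 minutes); the function names and the even spacing of groups are my assumptions, not anything Cricket is known to do.

```python
def average_rate(interfaces, values_each, interval_seconds):
    """Average number of values to collect per second over one interval."""
    return interfaces * values_each / interval_seconds


def stagger_offsets(groups, interval_seconds=900):
    """Spread host groups evenly over the interval: {group: start offset in s}."""
    if not groups:
        return {}
    step = interval_seconds / len(groups)
    return {name: round(i * step) for i, name in enumerate(groups)}
```

For example, `average_rate(10000, 4, 900)` is about 44 values per second when averaged over the interval, but far more than that if every host is polled at the same instant, which is why staggering matters.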
> For alarms - either check_rrd or snmptraps from Cricket (and possibly
> Cacti in the near future).
Sounds good to me if you can't get RMON (or don't/can't configure your
devices - although if that were the case, you probably couldn't poll them).
> By forcing Nagios to do traffic measurements from snmp - the scalability
> is not present based on the plugin architecture. You need something else
> to do the active monitor and check the results.
Hear, hear. Let Nag present part of the conclusions - it's neat to have
the plugin output return a hyperlink to an RRDtool or other CGI that
allows the Nag viewer to display the RRDtool graphs.
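A minimal sketch of what such plugin output might look like, in Python. The exit codes are the usual plugin ones (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); the CGI path, the query parameter names and the function name are hypothetical, and whether the link is rendered depends on how the Nagios CGIs are set up.

```python
import html
from urllib.parse import urlencode

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def plugin_line(state, message, graph_base_url, host, service):
    """One line of plugin output ending in a link to a graph CGI."""
    # graph_base_url and the 'host'/'service' parameters are assumptions.
    url = "%s?%s" % (graph_base_url, urlencode({"host": host, "service": service}))
    link = '<a href="%s">graph</a>' % html.escape(url, quote=True)
    return "%s: %s %s" % (STATE_NAMES[state], message, link)
```

A real plugin would print this line and exit with `state` as its return code, leaving the graphing itself to the RRDtool side.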
> For small installs that
> don't want multiple tools, it would work, but large installs like yours
> should definitely use separate tools.
> I used to monitor about the same number of interfaces with mrtg around
> '98-'00. disk i/o was the biggest issue. (ram disk to the rescue).
> RRDtool scales as well as the underlying hardware (disk i/o) and file
The bottleneck is more likely to be in the poller than RRDtool in my
view (that's why there are fpings and so on).
Does this adequately sum up what's been presented that's relevant to Nag
SNMP plugins ?
1 the plugins should probably confine themselves to checking state
rather than collecting/storing performance data (leaving this to a
standalone poller that may or may not interact with Nag directly)
2 traffic thresholds are best dealt with by
2.1 standalone poller + analyser submitting passive service check
results to Nag (possibly via traps to a trap collector), or
2.2 device specific means (RMON)
3 The problems of dealing with large numbers of communities remain, although
it seems to me that the -C option should go a long way to help (maybe in
conjunction with a heap of included files defining different arguments
4 Plugins that save/store state probably don't scale and should therefore
be excluded from developer focus
5 It may be worth recognising that SNMP pollers/managers are a good
supplement to Nag; the poller is getting close to peak development and
therefore effort is only needed in exploiting synergy rather than
seeking to do it again with plugins.
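For option 2.1, a sketch of how a standalone analyser might format a passive service check result, in Python. The line follows the Nagios external command syntax for PROCESS_SERVICE_CHECK_RESULT; the function name is mine, and where the line gets written (the Nagios command file) depends on the installation, so that part is left out.

```python
import time


def passive_result_line(host, service, return_code, output, now=None):
    """Format one PROCESS_SERVICE_CHECK_RESULT external command line."""
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return_code must be 0 (OK), 1, 2 or 3")
    # 'now' is only here so the timestamp can be fixed when testing.
    timestamp = int(time.time() if now is None else now)
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, return_code, output)
```

The host and service names have to match what is defined in the Nagios configuration, otherwise the result is discarded.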
Ph: (02) 6283 3189 Fax: (02) 6281 1353
PO Box 200 Woden ACT 2606