[Nagiosplug-devel] Re: SNMP + Nag Was: Kickoff for 1.5

Stanley Hopcroft Stanley.Hopcroft at IPAustralia.Gov.AU
Thu Mar 10 03:30:01 CET 2005

Dear Folks,

Not much here but,

On Wed, Mar 09, 2005 at 09:21:23PM -0500, Subhendu Ghosh wrote:
> On Wed, 9 Mar 2005, Harper Mann wrote:
> >Hi Everyone,
> >
> >There are several items in an SNMP plugin discussion we're interested in
> >and are working on.  What I can remember off the top of my head is:
> >
> >1) How to manage and alarm on counter data like interface traffic, etc.  We
> >use check_rrd, which was mentioned earlier in this thread, and perhaps
> >that's sufficient since we customarily store and graph, but standardizing
> >this would be good.  We're not sure RRDTool will scale to sufficient size
> >installations.
> >

If the devices support RMON (and most do), then the alarm group 
transforms the problem into one of trap harvesting: define alarm 
thresholds on trunks in the switch/router and have it send traps when 
a threshold is exceeded. The only con is that the thresholds are static 
and non-adaptive.

See below if you want to allow for diurnal/seasonal variation.
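
To make the static-threshold behaviour concrete, here is a rough Python sketch of rising/falling thresholds with hysteresis, in the spirit of the RMON alarm group. It is an illustration only: the function name, the sample values and the choice to start in the "not alarmed" state are my assumptions, and a real device does this in the agent, not in a script.

```python
def threshold_events(samples, rising, falling):
    """Return (index, kind) events for static rising/falling thresholds.

    Assumption: we start un-alarmed, so the first event is always "rising".
    """
    events = []
    alarmed = False
    for i, value in enumerate(samples):
        if not alarmed and value >= rising:
            # Crossed the rising threshold: this is where a trap would be sent.
            events.append((i, "rising"))
            alarmed = True
        elif alarmed and value <= falling:
            # Dropped to the falling threshold: re-arm the rising alarm.
            events.append((i, "falling"))
            alarmed = False
    return events


# Hypothetical utilisation samples (percent) on one trunk.
samples = [10, 50, 85, 90, 70, 88, 30, 95]
events = threshold_events(samples, rising=80, falling=40)
print(events)
```

Because the thresholds are fixed numbers, a normal daily peak and a genuine problem look the same to this logic.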

> >2) We've had a request to collect 3-4 SNMP values (in, out, errors) from
> >more than 10,000 interfaces every 15 minutes so we're looking into how to
> >scale to such a large installation.  Aside from how to get plugins to keep
> >up with collecting, what's the best way to store so much performance data?
> >
> >3) Fix the performance data so it conforms to the project standards and
> >manages OIDs and Symbolic names well for multiple requests.
> >
> Separate out the functionality  - Nagios is primarily a fault management 
> tool. For 10k interface performance choose a performance 
> management(monitor) tool.

Absolutely. I think the nomenclature is 

1 a poller/collector - interrogates the thingys and saves the data

2 an analyser/presenter - summarise the saved data and report by various 

These are best implemented as separate processes so they can perform 
without tradeoffs. 

Non-blocking I/O with Net::SNMP outperforms forking a Net::SNMP::get.
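
As a rough illustration of why non-blocking I/O wins, here is a sketch in Python's asyncio rather than Perl's Net::SNMP. The fake_snmp_get function is a hypothetical stand-in that only sleeps for a simulated round trip; no SNMP is spoken. The point is that one process keeps all requests in flight at once instead of forking per host.

```python
import asyncio
import time


async def fake_snmp_get(host, delay=0.05):
    # Hypothetical stand-in for an SNMP GET: waits like a network round trip.
    await asyncio.sleep(delay)
    return host, 42


async def poll_all(hosts):
    # All requests are outstanding together inside a single process.
    return await asyncio.gather(*(fake_snmp_get(h) for h in hosts))


hosts = [f"switch{i}" for i in range(100)]
start = time.monotonic()
results = asyncio.run(poll_all(hosts))
elapsed = time.monotonic() - start
print(f"polled {len(results)} hosts in {elapsed:.2f}s")
```

Polled one after another, the same 100 simulated hosts would take about 5 seconds; here the waits overlap.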

Storing data in RRDs has the advantages that

1 Lots of third party applications know and love RRDs (orca, cricket)

2 The Holt-Winters time series prediction algorithm can let the analyser 
distinguish a daily surge from an anomaly/problem

NB Toby the RRD man has got funding from a client to bring the dev 
branch RRD - with the HW stuff - into supported production form.

3 the RRDs are self maintaining. Except in exceptional cases there is no 
need to unload and resize databases when the db fills up (it never does)

4 the storage of an RRD never exceeds what is allocated when the RRD is 
created.
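
To show the idea behind point 2, here is a minimal additive Holt-Winters sketch in Python with a per-season deviation band. It is not RRDtool's implementation: the smoothing parameters, the band width and the min_band floor are all assumptions chosen for the example, and the series is made up.

```python
def holt_winters_anomalies(series, period, alpha=0.2, beta=0.05, gamma=0.2,
                           width=3.0, min_band=1.0):
    """Return indices whose value falls outside the predicted band."""
    # Initialise level and seasonal offsets from the first full period.
    level = sum(series[:period]) / period
    trend = 0.0
    season = [x - level for x in series[:period]]
    deviation = [0.0] * period
    flagged = []
    for t in range(period, len(series)):
        s = t % period
        x = series[t]
        predicted = level + trend + season[s]
        error = abs(x - predicted)
        # min_band is an assumed floor so a zero deviation does not flag noise.
        if error > max(width * deviation[s], min_band):
            flagged.append(t)
        # Standard additive updates for level, trend, season and deviation.
        new_level = alpha * (x - season[s]) + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        season[s] = gamma * (x - new_level) + (1 - gamma) * season[s]
        deviation[s] = gamma * error + (1 - gamma) * deviation[s]
        level = new_level
    return flagged


# A made-up "daily" cycle of four samples, repeated, with one spike injected.
daily = [10, 20, 30, 20]
series = daily * 6
series[18] = 80
flagged = holt_winters_anomalies(series, period=4)
print(flagged)
```

The regular peak of 30 is never flagged because the seasonal term expects it; the spike at index 18 is. A few points after the spike may also be flagged while the model recovers.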

> I've been partial to Cricket for snmp data collection - the snmp engine is 
> pretty well designed so that each device is only contacted once and all 
> the different oids are requested together. (cricket.sf.net)
> I've seen it scale quite well so long as you can stagger the host 
> groups (ie. not everything runs at the same 15 min interval) and you can 
> use snmp v2 and get-bulk
> For alarms - either check_rrd or snmptraps from Cricket (and possibly 
> Cacti in the near future).

Sounds good to me if you can't get RMON (or don't/can't configure your 
devices - although if that were the case, you probably couldn't poll 
them either).
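
The staggering Subhendu mentions is simple arithmetic; a small Python sketch follows. The group names are hypothetical, and the figure of 4 values per interface is my reading of the "3-4 SNMP values" in Harper's request.

```python
def stagger(groups, interval_s=900):
    """Spread host groups evenly across one polling interval (seconds)."""
    step = interval_s / len(groups)
    return {group: int(i * step) for i, group in enumerate(groups)}


# Hypothetical host groups, each polled every 15 minutes at its own offset.
offsets = stagger(["core", "distribution", "access-a", "access-b", "wan"])
print(offsets)

# Average load for 10,000 interfaces x 4 values every 900 seconds.
values_per_second = 10_000 * 4 / 900
print(f"{values_per_second:.1f} values/s on average")
```

Spread evenly, the load is modest on average; the trouble comes when every group fires at the same instant.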

> By forcing Nagios to do traffic measurements from snmp - the scalability 
> is not present based on the plugin architecture.  You need something else 
> to do the active monitor and check the results.

Hear, hear. Let Nag present part of the conclusions - it's neat to have 
the plugin output return a hyperlink to an RRDtool or other CGI that 
allows the Nag viewer to display the RRDtool graphs.
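
A sketch of what such plugin output might look like, in Python. The CGI path, its query parameters, the host and interface names and the thresholds are all hypothetical; only the 0/1/2/3 exit-code convention is the standard plugin one. Whether the link is rendered depends on how your web interface treats HTML in plugin output.

```python
from urllib.parse import urlencode

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard plugin exit codes
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def traffic_status(host, interface, mbps, warn, crit, graph_cgi):
    """Return (exit_code, output_line) with a link to a graphing CGI."""
    if mbps >= crit:
        code = CRITICAL
    elif mbps >= warn:
        code = WARNING
    else:
        code = OK
    # graph_cgi and its parameter names are assumptions for illustration.
    query = urlencode({"host": host, "if": interface})
    link = f'<a href="{graph_cgi}?{query}">graph</a>'
    return code, f"TRAFFIC {LABELS[code]} - {interface} {mbps:.1f} Mbit/s {link}"


code, line = traffic_status("sw1", "eth0", 73.2, warn=60, crit=90,
                            graph_cgi="/cgi-bin/graph.cgi")
print(line)
# A real plugin would finish by exiting with `code` as its status.
```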

>  For small installs that 
> don't want multiple tools, it would work, but large installs like yours 
> should definitely use separate tools.

Amen brother.

> I used to monitor about the same number of interfaces with mrtg around 
> '98-'00.  disk i/o was the biggest issue. (ram disk to the rescue).
> RRDtool scales as well as the underlying hardware (disk i/o) and file 
> layout.

The bottleneck is more likely to be in the poller than RRDtool in my 
view (that's why there are fpings and so on).

> -- 
> -sg

Does this adequately sum up what's been presented that's relevant to Nag 
SNMP plugins ?

1 the plugins should probably confine themselves to checking state 
rather than collecting/storing performance data (leaving this to a 
standalone poller that may or may not interact with Nag directly)

2 traffic thresholds are best dealt with by

2.1 standalone poller + analyser submitting passive service check 
results to Nag (possibly via traps to a trap collector), or

2.2 device specific means (RMON)

3 The probs of dealing with large numbers of communities remain, although 
it seems to me that the -C option should go a long way to help (maybe in 
conjunction with a heap of included files defining different arguments 
for commands).

4 Plugins that save/store state probably don't scale and should thereby 
be excluded from developer focus

5 It may be worth recognising that SNMP pollers/managers are a good 
supplement to Nag; the poller is getting close to peak development and 
therefore effort is only needed in exploiting synergy rather than 
seeking to do it again with plugins.
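
For point 2.1, a passive result is just one line written to the Nagios external command file. Here is a Python sketch of formatting that line; the host name, service description and output text are made up, and the location of the command file is site specific so it is not shown.

```python
import time


def passive_result(host, service, code, output, now=None):
    """Format a PROCESS_SERVICE_CHECK_RESULT external command line."""
    ts = int(time.time() if now is None else now)
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{code};{output}\n"


# Hypothetical result from a standalone poller/analyser.
line = passive_result("core-sw1", "Trunk traffic", 2,
                      "CRITICAL - 95% utilisation", now=1110421801)
print(line, end="")
# To submit it, the poller would append this line to the external command file.
```

The poller and analyser stay outside Nag; Nag only sees the conclusion.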

Yours sincerely.

Stanley Hopcroft

IP Australia
Ph: (02) 6283 3189  Fax: (02) 6281 1353
PO Box 200 Woden  ACT 2606