Maintaining narf

In general narf is almost entirely hands-off once it has been installed properly. However, there is one vital thing you must do periodically: you must roll the logfile. Apart from that, we suggest keeping an eye on narf's memory usage and on the state of any article backlog you may have (or not have).

Rolling the narf log

Narf's log accumulates rapidly, especially with its shipped default of logging at least one line for every article that it examines. Unless you have infinite disk space, you will want to roll (and gzip) these log files periodically. In order to roll narf's logfile, follow these steps:

rename $LOGFILE
Rename (mv) the log file you have set narf to use to some other temporary name.
send narf a SIGHUP signal
Using narf's process ID file, send narf a SIGHUP signal. This asks it to (among other things) restart its logfile soon.
wait a bit
narf only notices signals after it has finished processing its current batch file, so before doing anything else you should wait long enough for this to happen. Sixty seconds should be safe.
You can now do whatever you want to the old log file. You probably want to gzip it in order to save space. We roll our narf log file every 24 hours; it accumulates to about 60-70 megabytes in that time, and compresses to about 16 to 18 megabytes.
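The rolling procedure above can be sketched as a small shell function. The logfile and PID file paths in the example are assumptions for illustration, not narf's shipped defaults; substitute your own $LOGFILE and narf's actual process ID file.

```shell
#!/bin/sh
# Sketch of rolling narf's log: rename, SIGHUP, wait, then compress.
# All paths used with this function are hypothetical examples.
roll_narf_log() {
    logfile=$1; pidfile=$2; waitsecs=${3:-60}

    mv "$logfile" "$logfile.0"            # 1: rename the live log aside
    if [ -f "$pidfile" ]; then
        kill -HUP "$(cat "$pidfile")"     # 2: ask narf to restart its logfile
    fi
    sleep "$waitsecs"                     # 3: let narf finish its current batch
    gzip "$logfile.0"                     # old log is now safe to compress
}

# For example: roll_narf_log /var/log/narf/log /var/run/narf.pid
```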

You may want to produce reports at this time, as we do; our somewhat rough and ready software for this is covered in the installation instructions. We have put our reports on the web, but our software to do this is so hackish that we are not distributing it.

Controlling narf

A running narf process can be poked in several ways. The most important is the SIGHUP signal, which (among other things) makes narf restart its logfile and save its recognized EMP signatures.

Locally, we send narf a SIGHUP every half hour, so that in a crash (system or otherwise) we will lose as few newly recognized EMP signatures as possible.
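A cron entry along the following lines does the half-hourly SIGHUP; the PID file path is an assumption for illustration, not a shipped default.

```shell
# crontab fragment: SIGHUP narf every half hour so that newly recognized
# EMP signatures are saved frequently. The PID file path is hypothetical.
0,30 * * * * kill -HUP "`cat /var/run/narf.pid`"
```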

Narf will automatically reload the filter_innd.pl filter file at the end of processing a run of batches if the file has changed. This is different from INN, where the filter must be explicitly reloaded. While narf makes a fair attempt not to die if something goes wrong with the filter reload, it is possible to take the daemon down with a particularly bad mistake or typo. If the filter has changed and does not reload cleanly, narf will pause until it can be successfully reloaded (and will log errors to standard error).

Narf will reload the $CONFFILE file if it exists and changes. Despite the name, this file is not really intended for configuration; it is more a hacker's interface for loading code or changing variables in the running daemon without having to restart it. Unlike with the filter file, narf will not stop processing batches if loading this file fails. Unless you understand the implications of lexical scoping and perl's do function, you should not attempt to use this to redefine narf's own functions on the fly.

If the $STOPFILE file exists, narf will stop processing batches until the file is removed. Narf will continue to reload the filter and $CONFFILE despite being stopped, and will save recognized EMP signatures if hit with a SIGHUP. On many systems, narf will immediately resume processing batches if this file is removed and narf is sent a signal that it catches.
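As an illustration, pausing and resuming can be wrapped in a pair of shell helpers. The caller supplies the configured $STOPFILE path and narf's PID file; the example paths in the comments are assumptions, not shipped defaults.

```shell
#!/bin/sh
# Illustrative helpers for narf's stop-file mechanism.
pause_narf() {
    touch "$1"                      # narf stops processing batches
}
resume_narf() {
    rm -f "$1"                      # remove the stop file...
    if [ -f "$2" ]; then
        kill -HUP "$(cat "$2")"     # ...and signal narf so it resumes promptly
    fi
}

# e.g.: pause_narf /news/narf/stop
#       resume_narf /news/narf/stop /var/run/narf.pid
```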

Inspecting $DUMPDIR

Narf saves copies of various sorts of rejected articles in $DUMPDIR if this has been configured in. In particular, all local postings that narf rejects are saved here (either in the file local or in files whose names start with local-), whether or not anything else would normally be saved. What is written is controlled by a subroutine &filter_logname that is normally defined in your filter; if this routine is not defined, nothing but rejected local posts will be logged.
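For a quick periodic look at what has accumulated, something like the following helper works; the $DUMPDIR path in the usage comments is an assumption, so use your configured value.

```shell
#!/bin/sh
# List the most recently modified files in a dump directory (default:
# the ten newest). The directory name passed in is whatever $DUMPDIR
# is configured to on your system.
recent_dumps() {
    ls -t "$1" | head -n "${2:-10}"
}

# e.g.: recent_dumps /news/narf/dump
#       less /news/narf/dump/local /news/narf/dump/local-*
```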

The copies don't normally accumulate fast enough to require automatic maintenance; we just look at them periodically to see what's turned up. As shipped and with our filter, narf logs several sorts of rejected articles here.

Updating narf's filter

Our filter (and most INN filters) will automatically recognize much (but not all) new spam and reject future instances of it. Unless you feel like hunting around for new spam and new spammers, you should have no need to update your filter. On the other hand, the author finds that hunting spam can be an amusing and satisfying way of spending some time.

At least some familiarity with perl will be required to update the filter, especially if you plan on doing anything complex.

Narf automatically reloads the filter if it's been changed (see the cautions in the Controlling Narf section).

Avoid restarting narf

Unless narf grows too large, you should not need to stop and restart it. Because narf and the filter keep various pieces of information in memory (such as cancels narf will reject if it sees them soon, or the signatures of recent articles to use in recognizing excessive posting), restarting narf should be avoided whenever possible.

Narf must be restarted for certain sorts of configuration changes to take effect. You should not normally need to do this if all you're doing is changing the filter.

Tuning narf

Making narf use less memory

As shipped, both narf and our filter have generous limits and narf is configured conservatively; both of these may result in large memory usage. Tuning our filter is discussed in its documentation; some of that discussion is also relevant to an unmodified cleanfeed-inn filter, or to our MD5-modified version of it.

The best way to reduce narf memory usage is to remove its need to keep track of recently seen message-ids. In order to do this safely you must arrange that your NNTP daemons do their best to accept no duplicate message-ids. We do this with Paul Vixie's message-id daemon, which keeps an in-memory collection of recently accepted message-ids; we see basically no dups at all in the stream of articles that our narf processes. You should measure your dup rate before turning off narf's tracking of recent message-ids. To turn it off, change the configuration variable $DoMSGHist (and restart narf).
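One rough way to measure the dup rate is to count repeated message-IDs in a log of your incoming feed. This sketch assumes each relevant log line carries the article's message-ID in angle brackets, which is an assumption about your log format, not documented narf behavior; adjust the pattern to match your actual logs.

```shell
#!/bin/sh
# Count message-IDs that appear more than once in a log file. Assumes
# '<local-part@domain>' style message-IDs embedded in the log lines.
count_dup_msgids() {
    grep -o '<[^>]*@[^>]*>' "$1" | sort | uniq -d | wc -l
}

# e.g.: count_dup_msgids /var/log/narf/log
```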

If you are allowing narf to do cancel rejection, you can change the size of the cache of prospective cancels that narf keeps. Although measuring your actual data is best, our statistics suggest that we could get good hit rates with even a very small cache (see the comments in the source code). The variable $CMSGHIST is what you want to tweak.

Extracting a copy of the article body from the article itself is one of the things that contributes to fragmenting perl's memory and thus growing its total virtual memory. If your filter only examines the first so many bytes of the article body, you should change it to set the global variable $::MaxArtSize. Narf will then copy no more than this much of the actual article body into the __BODY__ element of the %::hdr hash, which has resulted in a significant reduction in memory growth for us. It is vital that your filter behave the same for all article bodies at or over this size; otherwise you may get incorrect results.

If your chosen filter does not look at the article body at all (cleanfeed-inn and ours do look at the article body), you can delete the setting of the __BODY__ element of the %::hdr hash in the &headercrack subroutine.

Internal variables

Narf has a number of internal configuration variables that may not have obvious effects. Although we believe that the shipped narf defaults are fine, you may wish to tune them in special circumstances. The narf source code is the final authority on them, but here is a guide to some of them and their effects.

$xtra
Log various extra information about rejected articles. Our top sources reports depend on this information being available, and it's relatively short; we recommend leaving it on.
$logngs
Log the newsgroups a rejected article was posted to. Effective only if $xtra is also on. Logging newsgroups is a judgement call depending on how much gory information you want versus how much space you have for logs.
$waitint
This variable (normally 120) is how many seconds narf will sleep before looking for more batches to process. Lowering it reduces the latency between your NNTP daemon accepting an article and the article potentially becoming visible to relaynews; however, it may cost more CPU time and result in smaller batches being passed to relaynews. There is an obvious relationship between this and how often you run relaynews via newsrun; it is pointless to aim for low latency in narf while you have high latency in newsrun.
$MAXARTSIZE
Articles over this many bytes in size will be summarily rejected, without even their message-ID being parsed out for logging. Large articles can contribute to growing narf's total virtual memory, especially with unwary filters (our filter is wary).
&dumpxtraid
This subroutine returns a string (or nothing) in order to tag log entries for rejected articles with identifiers that may not otherwise be easily recovered. Narfhippo can optionally use these tags. This is mostly convenient when you want to accurately track a particular source that disguises itself, such as Netzilla.
In general you should always read and understand the comments in the source code before changing anything not exposed as a runtime option.

You can change many of these parameters (and the &dumpxtraid routine) via code tucked in the $CONFFILE file, although you should know what you are doing. Narf must be restarted for some changes to take effect.

Further information

This page is part of our narf pages.



This page and much of our precautions are maintained by Chris Siebenmann, who hates junk email and other spam.