SAForkAndTest

Adrian Bridgett adrian at smop.co.uk
Thu Sep 26 22:35:00 IST 2002


In our current email setup, we just pipe emails through spamassassin using
spamd/spamc.  I've configured spamassassin to put its report in the headers
in English rather than as the short rule names.  One small perl tweak later
and it now does this for all emails rather than just the ones with spam in
them.  In this way we can see which SA checks were triggered and decide
whether to raise their associated scores.

So, I started delving into MailScanner to see how to set this up and came
across SAForkAndTest.  This function creates a pipe, forks, and then runs the
tests in the child process, sending the results through the pipe to the
parent process.  In order to implement a time limit on SA, the parent process
actually runs inside an "eval", with a timer set to trigger a die function,
which is then caught outside the eval.  Cute.
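
The fork, pipe and timer arrangement described above can be sketched roughly
as follows.  This is an illustration in Python, not the MailScanner code
(which is Perl): the names fork_and_test, check and time_limit are made up
for the example, SIGALRM stands in for the Perl alarm, and a raised exception
stands in for die/eval.

```python
import os
import signal


class Timeout(Exception):
    """Raised by the alarm handler when the child takes too long."""


def _on_alarm(signum, frame):
    raise Timeout()


def fork_and_test(check, message, time_limit=5):
    """Run check(message) in a forked child and return its text output.

    Returns None if the time limit (whole seconds) expires first.
    'check' is a hypothetical stand-in for the SpamAssassin tests.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: run the tests, write the result down the pipe, then exit.
        os.close(read_fd)
        status = 1
        try:
            with os.fdopen(write_fd, "w") as pipe:
                pipe.write(check(message))
            status = 0
        finally:
            os._exit(status)

    # Parent: read from the pipe with the timer armed.
    os.close(write_fd)
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(time_limit)
    try:
        with os.fdopen(read_fd) as pipe:
            result = pipe.read()
    except Timeout:
        # The "die" was caught: give up on the child and report no result.
        os.kill(pid, signal.SIGKILL)
        result = None
    finally:
        signal.alarm(0)
        signal.signal(signal.SIGALRM, old_handler)
        os.waitpid(pid, 0)  # reap the child so it does not become a zombie
    return result
```

The child exits after every message, so anything it leaked goes with it; the
price is one fork per message.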

However, I wondered how much overhead all this forking added, so I've
ripped it out, moving the tests into the old "parent process" eval code and
removing the now-redundant fork.  On a sample run of 100 calls (over a 10
line email), the time drops from 9.8 seconds to 6.1! (Note, this is just the
SAForkAndTest code ripped out into a tiny program.)  On a larger email (600
lines) the time drops from 57.5/59 seconds to 55/57 seconds.  Not much, but
maybe it's worth investigating.

This was on a 1.4GHz Athlon, which is probably close enough to your dual
1GHz, so for your 20,000 message benchmark it might knock 6 or 7 minutes off
your 130 minute time, which ain't bad.
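
For reference, the extrapolation from the 100-call timings to 20,000 messages
can be worked through like this.  It is a back-of-the-envelope sketch that
assumes the saving is a fixed per-call cost and scales linearly; the pairing
of the 57.5/59 and 55/57 figures is also an assumption.

```python
CALLS = 100                  # calls in each timed test run
BENCHMARK_MESSAGES = 20_000  # size of the benchmark being extrapolated to


def projected_saving_minutes(before_s, after_s):
    """Scale the saving seen over CALLS calls up to the full benchmark."""
    per_message_s = (before_s - after_s) / CALLS
    return per_message_s * BENCHMARK_MESSAGES / 60


small_email = projected_saving_minutes(9.8, 6.1)        # about 12.3 minutes
large_email_a = projected_saving_minutes(57.5, 55.0)    # about 8.3 minutes
large_email_b = projected_saving_minutes(59.0, 57.0)    # about 6.7 minutes
```

The 6 or 7 minute figure matches the 2 second saving seen on the larger
email; the 10 line email timings would suggest rather more.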

There is a problem with this (and I suspect it is why it was coded this way
in the first place): SA appears to leak memory.  Going by "SIZE" in top, the
perl process grew from around 13000KB to 14000KB by the end of the 100
messages.  OTOH with forking, the main process sat at 12000KB, spawning a
12300KB process (as I said before, these were drastically reduced test
programs).

Back to the original reason I started looking at this: support for an English
text report in the headers.  Currently SAForkAndTest prints three results
down the pipe, which are read at the other end using regexp matches.  I'm
wondering if this could be done using Data::Dumper so that other things
can be passed down the pipe if needed.  The result could then be eval()ed.
This could be used elsewhere (RBLs.pm for instance).  Perhaps there could
even be a generic wrapper around such functions so that the sysadmin can
decide if they want a faster, leaky version or a slower, non-leaky version.
One problem I can see is what happens with an incomplete pipe, but that can
be fixed by only using the result if the pipe is closed.
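
A rough sketch of that wrapper idea, again in Python for illustration rather
than as a patch: JSON takes the place of Data::Dumper plus eval(), and
run_check, check and use_fork are invented names.  The checks on the child's
exit status and on the payload parsing are an extra guard of my own on top of
"the pipe is closed", since the pipe also closes when the child dies.  The
time limit is left out to keep it short.

```python
import json
import os


def run_check(check, message, use_fork=True):
    """Run check(message), which must return JSON-serialisable data.

    use_fork=False: call in-process (faster, but any leak stays with us).
    use_fork=True:  call in a child and pass the result back down a pipe.
    Returns None if the child did not deliver a complete result.
    """
    if not use_fork:
        return check(message)

    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: serialise the whole result and write it down the pipe.
        os.close(read_fd)
        status = 1
        try:
            payload = json.dumps(check(message))
            with os.fdopen(write_fd, "w") as pipe:
                pipe.write(payload)
            status = 0
        finally:
            os._exit(status)

    # Parent: read() returns only once the child's end of the pipe is closed.
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        payload = pipe.read()
    _, wait_status = os.waitpid(pid, 0)
    if wait_status != 0:
        return None  # child crashed or failed, so ignore whatever arrived
    try:
        return json.loads(payload)
    except ValueError:
        return None  # truncated or garbled payload
```

Because the result is a single structured value, adding a fourth or fifth
field later needs no change to the parsing at the parent's end.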

The final question is, do you think any of this is worth pursuing or is it
all just a waste of time?

Cheers

Adrian

Email: adrian at smop.co.uk
Windows NT - Unix in beta-testing. GPG/PGP keys available on public key servers
Debian GNU/Linux  -*-  By professionals for professionals  -*-  www.debian.org


