SAForkAndTest
Julian Field
mailscanner at ecs.soton.ac.uk
Thu Sep 26 22:57:06 IST 2002
There are two main reasons why I wrap the SpamAssassin calls inside evals
and forks. First, SA leaks memory like a sieve, as you have found. Second, it
doesn't always terminate in a reasonable time. When I first wrote support for
SA, it was relatively easy for it to hit a regexp that took about 24 hours
to evaluate, due to all the backtracking in the regexp. So I
*had* to implement timeouts around it, as otherwise it was unusable. I won't
release code to people that could fail quite so spectacularly :-)
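Here is a minimal sketch of the fork-and-timeout pattern described above. It is written in Python rather than MailScanner's Perl purely as an illustration. The names fork_and_test, fake_sa_check and runaway_check are hypothetical, and the one-line text report is an assumed format. The child does the work, so any memory it leaks is freed when it exits. The parent waits on the pipe under a SIGALRM timer and kills the child if the timer fires, which works even when the stuck code itself cannot be interrupted.

```python
import os
import signal
import time


class ChildTimeout(Exception):
    """Raised in the parent when the child takes too long."""


def _on_alarm(signum, frame):
    # Plays the role of die() called from an alarm handler inside an eval.
    raise ChildTimeout()


def fork_and_test(test_func, message, timeout):
    """Run test_func(message) in a forked child and return its text report.

    Returns None if the child fails or does not finish within `timeout` seconds.
    Unix only; must be called from the main thread (SIGALRM restriction).
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: run the possibly leaky, possibly slow test, then exit.
        # Whatever memory it leaks goes back to the OS when it exits.
        os.close(read_fd)
        status = 1
        try:
            with os.fdopen(write_fd, "wb") as pipe:
                pipe.write(test_func(message).encode())
            status = 0
        finally:
            os._exit(status)  # never fall back into the parent's code path

    # Parent: read the child's report, giving up after `timeout` seconds.
    os.close(write_fd)
    chunks = []
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, timeout)
    try:
        while True:
            chunk = os.read(read_fd, 4096)
            if not chunk:  # EOF: the child has closed its end of the pipe
                break
            chunks.append(chunk)
        signal.setitimer(signal.ITIMER_REAL, 0)
    except ChildTimeout:
        os.kill(pid, signal.SIGKILL)  # a runaway regexp cannot survive this
        chunks = None
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)
        os.close(read_fd)
    _, wait_status = os.waitpid(pid, 0)  # reap the child so it is not left a zombie
    if chunks is None or not os.WIFEXITED(wait_status) or os.WEXITSTATUS(wait_status) != 0:
        return None
    return b"".join(chunks).decode()


def fake_sa_check(message):
    # Hypothetical stand-in for the SpamAssassin call.
    return "spam=no score=1.2"


def runaway_check(message):
    # Hypothetical stand-in for a regexp that backtracks for hours.
    time.sleep(60)
    return "never reached"
```

For example, fork_and_test(fake_sa_check, msg, 5) should return "spam=no score=1.2", while fork_and_test(runaway_check, msg, 0.5) should return None after about half a second. The price is one fork per message, which is the overhead measured in the quoted message.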
We also had a problem recently when a couple of the RBLs disappeared off
the net for a day. This meant that every DNS lookup against those RBLs was
taking a long time, as each one had to wait for a DNS lookup timeout of
several seconds. The end result was that our incoming mail flow reduced to
a trickle. I wrapped the lookups in timeouts and added the "abandon it if n
consecutive lookups time out" code, so that MailScanner stops using a dead
RBL once it has failed several times in a row. When MailScanner restarts itself a
few hours later, these counters are reset and it gives the RBL another chance.
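The "abandon it if n consecutive lookups time out" rule can be sketched as a small counter per RBL. This is a hypothetical Python illustration, not MailScanner's RBL code. RblTracker and fake_lookup are invented names, and do_lookup is assumed to enforce its own per-query DNS timeout and raise TimeoutError when that timeout expires.

```python
class RblTracker:
    """Give up on an RBL after N consecutive lookup timeouts.

    State is kept only in memory, so a process restart resets every counter
    and gives each list another chance.
    """

    def __init__(self, max_consecutive_timeouts):
        self.max_consecutive_timeouts = max_consecutive_timeouts
        self.timeouts = {}  # RBL zone -> length of the current run of timeouts

    def is_abandoned(self, rbl):
        return self.timeouts.get(rbl, 0) >= self.max_consecutive_timeouts

    def lookup(self, rbl, query, do_lookup):
        """Return do_lookup(rbl, query), or None if the RBL is abandoned or timed out."""
        if self.is_abandoned(rbl):
            return None  # skip the dead list without waiting on DNS at all
        try:
            result = do_lookup(rbl, query)
        except TimeoutError:
            self.timeouts[rbl] = self.timeouts.get(rbl, 0) + 1
            return None
        self.timeouts[rbl] = 0  # any answer breaks the run of timeouts
        return result


# Hypothetical resolver: "dead.example.org" always times out.
calls = []


def fake_lookup(rbl, query):
    calls.append(rbl)
    if rbl == "dead.example.org":
        raise TimeoutError
    return "127.0.0.2"


tracker = RblTracker(max_consecutive_timeouts=3)
for _ in range(10):
    tracker.lookup("dead.example.org", "1.2.3.4", fake_lookup)
print(len(calls))  # 3: after three timeouts the dead list is no longer queried
```

Resetting the count on any successful answer means only an unbroken run of timeouts abandons a list, so one slow reply does not.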
So yes, I admit that the resulting code is not as fast as it absolutely
could be, but I prefer reliability to a few percent of raw speed. You'll never
get people using a system which is 5% faster but very unreliable :-)
I haven't ever looked at Data::Dumper though. What can it do for me?
At 22:35 26/09/2002, you wrote:
>So, I started delving into MailScanner to see how to set this up and came
>across SAForkAndTest. This function creates a pipe, forks and then runs the
>tests in the child process, sending the results through the pipe to the parent
>process. In order to implement a time limit on SA, the parent process is
>actually in an "eval", with a timer set to call a die function which is caught
>outside the eval. Cute.
>
>However, I wondered how much overhead all this forking added, so I've
>ripped it out, moving the tests into the old "parent process" eval code and
>removing the now-redundant fork. On a sample run of 100 calls (on a 10-line
>email), the time drops from 9.8 seconds to 6.1! (Note that this is just the
>SAForkAndTest code ripped out into a tiny program.) On a larger email (600
>lines) the time drops from 57.5/59 seconds to 55/57 seconds. Not much, but
>maybe it's worth investigating.
>
>This was on a 1.4GHz Athlon, which is probably close enough to your dual 1GHz
>machine, so for your 20,000-message benchmark it might knock 6 or 7 minutes
>off your 130-minute time, which ain't bad.
>
>There is a problem with this (which I suspect is why it was coded this way in
>the first place): SA appears to leak memory. Going by "SIZE" in top, the perl
>process grew from around 13000KB to 14000KB by the end of the 100 messages.
>OTOH, with forking, the main process sat at 12000KB, spawning a 12300KB
>child process (as I said before, these were drastically reduced test programs).
>
>Back to the original reason I started looking at this: support for the English
>text report in the headers. Currently SAForkAndTest prints three results
>down the pipe, which are read at the other end using regexp matches. I'm
>wondering if these could be done using Data::Dumper so that other things
>can be passed down the pipe if needed. The result could then be eval()ed.
>This could be used elsewhere (RBLs.pm for instance). Perhaps even a generic
>wrapper around such functions, so that the sysadmin can decide if they want
>a faster, leaky version or a slower, non-leaky version. One problem I can see
>is what happens with an incomplete pipe, but that can be fixed by only using
>the result if the pipe is closed.
>
>The final question is, do you think any of this is worth pursuing or is it
>all just a waste of time?
>
>Cheers
>
>Adrian
>
>Email: adrian at smop.co.uk
>Windows NT - Unix in beta-testing. GPG/PGP keys available on public key
>servers
>Debian GNU/Linux -*- By professionals for professionals -*- www.debian.org
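Adrian's suggestion of passing a structured result down the pipe, combined with a switch between the forked and in-process versions, might look like the following sketch. It uses Python's json module as an analogue of Data::Dumper (which is Perl), so the receiving side parses data rather than eval()ing code. run_check, sa_check and the isolate flag are hypothetical names, and timeout handling is left out to keep the focus on the result format.

```python
import json
import os


def run_check(func, arg, isolate=True):
    """Return func(arg), a JSON-serializable dict, optionally from a forked child.

    isolate=False: no fork, faster, but anything func leaks stays in this process.
    isolate=True: one fork per call, slower, but leaks die with the child.
    Returns None if the child failed or its output was incomplete.
    """
    if not isolate:
        return func(arg)

    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(read_fd)
        status = 1
        try:
            with os.fdopen(write_fd, "wb") as pipe:
                # One structured payload instead of fixed lines parsed with regexps.
                pipe.write(json.dumps(func(arg)).encode())
            status = 0
        finally:
            os._exit(status)

    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        payload = pipe.read()  # returns at EOF, once the child has closed its end
    _, wait_status = os.waitpid(pid, 0)
    # EOF alone does not prove the payload is complete: a child that dies
    # mid-write also closes the pipe, so check its exit status as well.
    if not os.WIFEXITED(wait_status) or os.WEXITSTATUS(wait_status) != 0:
        return None
    try:
        return json.loads(payload)
    except json.JSONDecodeError:
        return None


def sa_check(message):
    # Hypothetical SpamAssassin stand-in; the multi-line "report" field is the
    # kind of extra data a line-per-result pipe protocol handles awkwardly.
    return {"is_spam": False, "score": 1.2, "report": "RULE_A 0.5\nRULE_B 0.7"}


print(run_check(sa_check, "msg") == run_check(sa_check, "msg", isolate=False))  # True
```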
--
Julian Field Teaching Systems Manager
jkf at ecs.soton.ac.uk Dept. of Electronics & Computer Science
Tel. 023 8059 2817 University of Southampton
Southampton SO17 1BJ