Possible MailScanner bottleneck

David Lee t.d.lee at DURHAM.AC.UK
Fri Jan 18 15:08:24 GMT 2002


Last week, on the campus mail machines which receive email from the
outside world (MX records for our domains), we moved from an early version
of MailScanner to 3.02-1. I also enabled "Spam Checks" (which we had
disabled in the earlier version).

Since then, on the MX-preferred machine, which typically handles, say,
2,000 msgs/hour, we have frequently seen big build-ups of email, often
considerably over 1,000 messages, sitting in the "incoming" directory,
waiting for the "mailscanner" process to get around to processing them.

The output side of mailscanner is configured:
   Delivery Method = batch
   Deliver In Background = yes

specifically to try to keep itself moving, yet we still get this logjam.

When this does happen, a "truss -p <pid>" of mailscanner often shows it
waiting for a long time (a few minutes) in the Solaris "door_call" function.
This obscurely named routine is, I understand, the kernel interface to
name lookups such as userids, group names and host names.

My strong suspicion is the code at sendmail.pl, approx line 185:
    for ($i=0; $i<@Config::SpamNames; $i++) {
      # Look up $relay in each of the @Config::SpamDomains we have
      $RBLEntry = gethostbyname("$relay." . $Config::SpamDomains[$i]);

and that it is hanging on "gethostbyname(...)" for a remote site.

In other words, the email throughput of the entire machine can be brought
to a snail's pace or slower by one DNS lookup.

(Note that even if the above analysis of this particular problem is
incorrect, I still believe there is a potential bottleneck here anyway.)
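
One way to confirm or refute this suspicion would be to time each lookup
and log the slow ones.  The following is a minimal sketch only, not
MailScanner code: "timed_call", "timed_lookup" and the threshold argument
are hypothetical names introduced for illustration.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Call $code in scalar context and return (elapsed_seconds, result).
sub timed_call {
    my ($code) = @_;
    my $start  = time();
    my $result = $code->();
    return (time() - $start, $result);
}

# Hypothetical wrapper: do the lookup, warn if it took longer than
# $threshold seconds, and return the packed address (or undef).
sub timed_lookup {
    my ($name, $threshold) = @_;
    my ($elapsed, $packed) = timed_call(sub { scalar gethostbyname($name) });
    warn sprintf("slow lookup: %s took %.1fs\n", $name, $elapsed)
        if $elapsed > $threshold;
    return $packed;
}
```

If the warnings line up with the times the "incoming" directory backs up,
that would point at the lookup rather than at something else.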

I see a few possible solutions (there may be more):

1. Multi-threading: allow the mailscanner process to be multi-threaded;

2. Multi-processing: allow multiple parallel invocations of mailscanner;

3. Timeout: guard the "gethostbyname(...)" code with a timer.


Multi-threading may be a non-starter: different OSes do it different ways,
if at all.  The programming effort may be large.

Multi-processing may be useful anyway (it's what sendmail does in forking
a new process).  But I seem to recall that MailScanner cannot, at
present, work this way (is my recollection faulty?).  Is the programming
effort to implement this small, medium, or large?  Multi-processing could
also be augmented by timeout ...
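
For illustration, here is a rough sketch of how parallel workers could
share one "incoming" directory without two of them taking the same
message.  Everything in it is an assumption: it is not how MailScanner
works, it pretends each message is a single file (sendmail really queues
separate qf/df files), and "claim_message" and "run_workers" are made-up
names.  The idea is that rename() is atomic within one filesystem, so only
one worker can win any given message.

```perl
use strict;
use warnings;
use POSIX ();

# Try to claim one message by moving it into this worker's private
# directory.  Returns true only for the worker whose rename() succeeds.
sub claim_message {
    my ($incoming, $workdir, $name) = @_;
    return rename("$incoming/$name", "$workdir/$name");
}

# Start $count workers.  Each scans $incoming once, claims what it can,
# and calls $process->($path) on every message it won.
sub run_workers {
    my ($incoming, $workroot, $count, $process) = @_;
    my @pids;
    for my $n (1 .. $count) {
        my $workdir = "$workroot/worker$n";
        mkdir $workdir or die "mkdir $workdir: $!";
        my $pid = fork();
        die "fork: $!" unless defined $pid;
        if ($pid == 0) {                        # child
            opendir(my $dh, $incoming) or POSIX::_exit(1);
            for my $name (grep { !/^\./ } readdir $dh) {
                next unless claim_message($incoming, $workdir, $name);
                $process->("$workdir/$name");
            }
            closedir $dh;
            POSIX::_exit(0);
        }
        push @pids, $pid;
    }
    waitpid($_, 0) for @pids;                   # parent waits for all
    return scalar @pids;
}
```

With this shape, one worker stuck in a slow lookup would delay only the
message it is holding, while the others keep draining the directory.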

Timeout: I don't know the code well enough, but guess that extra coding
work would be needed to classify messages for which "gethostbyname(...)"
was abandoned (timed-out).
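
As a concrete starting point, here is a minimal sketch of a guarded
lookup.  It is not MailScanner code, and "run_with_timeout", "rbl_lookup"
and the status strings are hypothetical.  It runs the lookup in a
short-lived child process and waits on a pipe with a timeout, rather than
relying on alarm(), because newer Perls defer signal handlers and an alarm
may not interrupt a blocked gethostbyname().

```perl
use strict;
use warnings;
use IO::Select;
use POSIX ();
use Socket qw(inet_ntoa);

# Run $code in a child process; give up after $timeout seconds.
# Returns ('ok', $string), ('timeout', undef) or ('error', undef).
sub run_with_timeout {
    my ($code, $timeout) = @_;
    pipe(my $reader, my $writer) or return ('error', undef);
    my $pid = fork();
    return ('error', undef) unless defined $pid;

    if ($pid == 0) {                        # child: do the slow work
        close $reader;
        my $answer = eval { $code->() };
        print {$writer} defined $answer ? $answer : '';
        close $writer;                      # flushes the answer
        POSIX::_exit(0);
    }

    close $writer;                          # parent: wait, but not forever
    my ($status, $result) = ('timeout', undef);
    my @ready = IO::Select->new($reader)->can_read($timeout);
    if (@ready) {
        local $/;                           # read everything the child sent
        $result = <$reader>;
        $status = 'ok';
    } else {
        kill 'KILL', $pid;                  # abandon the stuck lookup
    }
    close $reader;
    waitpid($pid, 0);
    return ($status, $result);
}

# Hypothetical use for one blacklist lookup; $relay and $zone are
# placeholders.  An empty string means "no such entry".
sub rbl_lookup {
    my ($relay, $zone, $timeout) = @_;
    return run_with_timeout(sub {
        my $packed = gethostbyname("$relay.$zone");   # may block for minutes
        return defined $packed ? inet_ntoa($packed) : '';
    }, $timeout);
}
```

The separate 'timeout' status is the point: the caller can tell "not
listed" apart from "lookup abandoned" and decide how to classify the
message.  The price is one fork per lookup.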


Thoughts?  (Or have I completely missed something obvious?)

--

:  David Lee                                I.T. Service          :
:  Systems Programmer                       Computer Centre       :
:                                           University of Durham  :
:  http://www.dur.ac.uk/t.d.lee/            South Road            :
:                                           Durham                :
:  Phone: +44 191 374 2882                  U.K.                  :


