bottlenecking?

Julian Field mailscanner at ecs.soton.ac.uk
Thu Sep 26 17:01:22 IST 2002


Next time it happens, any chance you could tar up the incoming dir and the
mqueue.in dir (the incoming dir is the most useful), and take a look at it
off-line some time, to see if there is anything odd in there?

Also, a "ps -fel" or "ps auxww" at the time when it is stuck, just saved to
a file, would be useful as that might tell us what process is actually
hanging up (it might be the virus scanner, it might be MailScanner, it
might be the TNEF decoder, all sorts of things).
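
A minimal sketch of capturing both of those things in one go, assuming Python is available on the mail server. The function names and the directory paths in the usage note are illustrative assumptions, not part of MailScanner.

```python
import os
import subprocess
import tarfile
import time


def snapshot_ps(out_dir):
    """Save the output of `ps auxww` to a timestamped file; return its path."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    out_path = os.path.join(out_dir, "ps-" + stamp + ".txt")
    result = subprocess.run(["ps", "auxww"], capture_output=True, text=True)
    with open(out_path, "w") as handle:
        handle.write(result.stdout)
    return out_path


def archive_dir(src_dir, out_dir):
    """Tar and gzip src_dir into out_dir; return the archive path."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    name = os.path.basename(os.path.normpath(src_dir))
    out_path = os.path.join(out_dir, name + "-" + stamp + ".tar.gz")
    with tarfile.open(out_path, "w:gz") as tar:
        # Store members under the directory's own name, not the full path.
        tar.add(src_dir, arcname=name)
    return out_path
```

Something like snapshot_ps("/root/stuck") followed by archive_dir("/var/spool/MailScanner/incoming", "/root/stuck") would be run before killing anything. Both paths are guesses and will differ between installations.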

Hopefully this will help me get to the bottom of this one.
Jules.

At 15:24 26/09/2002, you wrote:
>On Thu, 2002-09-26 at 04:56, Julian Field wrote:
> > At 03:19 26/09/2002, you wrote:
> > > >
> > >FWIW: I've seen this sort of problem to one degree or another with every
> > >V3 version of MailScanner that I've deployed. It's too soon to say if v4
> > >will have the same sorts of problems. My solution is to use a smart Perl
> > >monitor, rather than a shell script, to manage the MailScanner
> > >processes. The Perl code watches for excessive CPU consumption,
> > >excessive process size, or a MailScanner that's run longer than it
> > >should. If any of the boundary conditions are seen the offending process
> > >is killed and restarted. While I suppose a clever shell script could be
> > >written to do the same thing it was very easy to do with Perl and I took
> > >the path of least resistance.
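
The monitor described above was written in Perl and its code is not shown in this thread. What follows is only a rough Python sketch of the same three boundary checks (CPU, process size, run time). The thresholds and function names are hypothetical, and restarting the process after the kill is left to the caller.

```python
import os
import signal
import subprocess

# Hypothetical limits; tune them for your own servers.
MAX_CPU_PERCENT = 90.0
MAX_RSS_KB = 200 * 1024
MAX_ELAPSED_SECONDS = 30 * 60


def parse_etime(etime):
    """Convert a ps etime value ([[dd-]hh:]mm:ss) to seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)  # pad missing hours
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def exceeded_limits(cpu_percent, rss_kb, elapsed_seconds):
    """Return the names of the boundary conditions that were crossed."""
    reasons = []
    if cpu_percent > MAX_CPU_PERCENT:
        reasons.append("cpu")
    if rss_kb > MAX_RSS_KB:
        reasons.append("size")
    if elapsed_seconds > MAX_ELAPSED_SECONDS:
        reasons.append("runtime")
    return reasons


def sample_process(pid):
    """Ask ps for CPU percentage, resident size (KB) and elapsed time."""
    out = subprocess.run(
        ["ps", "-o", "pcpu=", "-o", "rss=", "-o", "etime=", "-p", str(pid)],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    cpu, rss, etime = out
    return float(cpu), int(rss), parse_etime(etime)


def check_and_kill(pid):
    """Send SIGTERM to pid if any limit is exceeded; return the reasons."""
    reasons = exceeded_limits(*sample_process(pid))
    if reasons:
        os.kill(pid, signal.SIGTERM)
    return reasons
```

Note that on many systems the pcpu figure from ps is averaged over the life of the process, so a sudden spin may take a while to show up in it.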
> >
> > It would be interesting to discover what is actually causing the problem,
> > as I've never seen it on our systems here at all. Have you checked
> > everywhere under /var/spool/MailScanner for "core" files? These can take a
> > very long time to scan, and should just be deleted most of the time. If
> > many other people were seeing the same problem as you, I would have heard
> > about it a lot. And I haven't, so I can only think this is a fairly unusual
> > problem.
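
A small sketch of the "core" file check, assuming Python is to hand. The function name is illustrative, and matching "core.<something>" as well as plain "core" is an assumption about how some systems name their dumps.

```python
import os


def find_core_files(root):
    """Return the sorted paths of files named 'core' or 'core.*' under root."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name == "core" or name.startswith("core."):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

Calling find_core_files("/var/spool/MailScanner") lists the candidates; deleting them is left as a separate, deliberate step.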
> >
>I certainly wouldn't say that it is a common problem or that it happens
>at all frequently. I only see it happen at infrequent intervals. I don't
>know if the problem is load related or message related, but when it
>happens all processing of messages from mqueue.in stops and mail starts
>backing up. By the time I'd notice the problem (usually 15 minutes to an
>hour later) I might have 10-15K messages in the input queue. At that
>point the name of the game is to get the queue cleared and make the
>phone stop ringing, so investigative work mostly has to be done in
>retrospect. I have looked for core files and not found any. So far, simply
>killing the MS process and restarting it causes message processing to
>resume.
>
>For a while I thought that the problem only occurred on my large volume
>servers and was leaning towards a load related cause. But I have
>observed it (even less frequently) on low volume servers (less than 15k
>messages/day). So far I haven't been able to duplicate that failure when
>I save off the contents of the mqueue.in dir and run that through my test
>jig. That might imply that there's some critical set of conditions that
>has to occur to cause MailScanner to go walk-about. One other thing that
>I've observed is that MailScanner always has a batch of messages in
>process at the time of the failure. The same message IDs exist both in
>the work directory and in the input queue. I guess I don't know exactly
>what MS was doing at the time it ran off into the weeds, only that it
>appeared to have been processing messages.
>
> > >The V4 implementation brings new challenges. Not only do you have the
> > >master process, but you also have a number of child processes to deal
> > >with. I'd like to see a pid file for each of the children, perhaps with
> > >a name of the form mailscanner1.pid, mailscanner2.pid, etc. And it would
> > >be awfully nice if killing the master process would cause it to reap its
> > >children.
> >
> > I happened to write that for you last night. There are pid files for all of
> > the children, and the master creates and destroys these as the children
> > start and stop. I've written an init.d script for it (for RedHat) that has
> > start, stop, restart, status and reload commands. It does the "reload"
> > operation by doing a "kill -HUP" on all the MailScanner processes.
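
The init.d script itself is not included in this message. As a rough illustration of the "reload" idea only, here is a Python sketch that reads every pid file in a directory and sends SIGHUP to each process. The "mailscanner*.pid" pattern follows the naming suggested earlier in the thread; the directory and the function names are assumptions.

```python
import glob
import os
import signal


def read_pids(pid_dir, pattern="mailscanner*.pid"):
    """Return the PIDs found in the pid files under pid_dir, in filename order."""
    pids = []
    for path in sorted(glob.glob(os.path.join(pid_dir, pattern))):
        with open(path) as handle:
            text = handle.read().strip()
        if text.isdigit():
            pids.append(int(text))
    return pids


def reload_all(pid_dir, send=os.kill):
    """Send SIGHUP to every PID listed; return the PIDs actually signalled."""
    signalled = []
    for pid in read_pids(pid_dir):
        try:
            send(pid, signal.SIGHUP)
            signalled.append(pid)
        except ProcessLookupError:
            pass  # stale pid file: the process has already gone
    return signalled
```

The send parameter exists only so the function can be exercised without signalling real processes; in normal use it defaults to os.kill.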
> >
>Very nice.
>--
>The instructions said to use Windows 98 or better, so I installed
>RedHat.

--
Julian Field                Teaching Systems Manager
jkf at ecs.soton.ac.uk         Dept. of Electronics & Computer Science
Tel. 023 8059 2817          University of Southampton
                             Southampton SO17 1BJ


