Vexing problem

Thomas DuVally thomas_duvally at BROWN.EDU
Wed Jul 23 14:07:32 IST 2003


On Tue, 2003-07-22 at 21:35, Michael Janssen wrote:

> Have you got reasons to suspect a memory problem? 16 MS workers should
> consume up to 550MB (I count 33MB resident set size, RSS, given by "top"
> per worker). This should be fine with 4GB (your sendmail(?)/virus-scanner/SA
> can't take all the rest). Is the machine swapping? It's mostly no problem
> at all when the machine has swapped out some never-used data, but it is
> of course a problem if the machine is actually freeing and claiming
> swap space.
>

Swapping doesn't seem to be a problem.  There always seems to be at
least 2GB free.  Sure, some stuff has swapped out, but that's
just cruft, I think.  I see about 23MB RES per worker; multiply that by
10-15 workers and I only get about 230-350MB, not exactly taxing 4GB.
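
To make the arithmetic explicit, here is a minimal sketch that multiplies
out both sets of figures quoted in this thread (23MB and 33MB per worker,
10 to 16 workers, 4GB of RAM).  The function names are made up for
illustration.  Summing RSS also counts pages shared between workers more
than once, so the real footprint is probably lower than these totals.

```python
def worker_memory_mb(workers, rss_mb_per_worker):
    """Upper-bound estimate: every worker holds the full RSS privately."""
    return workers * rss_mb_per_worker


def fraction_of_ram(total_mb, ram_gb):
    """Share of physical RAM used, taking 1 GB as 1024 MB."""
    return total_mb / (ram_gb * 1024)


if __name__ == "__main__":
    # Figures taken from the thread: 23 MB (my box) and 33 MB (Michael's).
    for workers in (10, 15, 16):
        for rss in (23, 33):
            total = worker_memory_mb(workers, rss)
            share = fraction_of_ram(total, 4)
            print(f"{workers} workers x {rss} MB = {total} MB "
                  f"({share:.1%} of 4 GB)")
```

Even the largest combination stays well under a quarter of the 4GB in
the box.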

> What are the MS processes doing? Standing still (what is the last log
> entry? WCHAN and %CPU? strace output? I'm hoping Solaris has all the
> kinds of information I'm familiar with from our Linux systems) or
> running too slow?
>

I'll have to dig a little.

> It's a bit hard to track this for 16 workers. Probably with the help of a
> filter script that sets the log lines for different pids to different
> colors (uhm, 16 readable colors on a console...). Anyway, in case the
> processes are "just" slow, it would be interesting to see whether the TIME
> and CTIME (Cumulative TIME, as far as I know only provided by top via the
> "S" key) of the processes differ much.
>
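
A rough sketch of the kind of filter script described above, in Python.
It assumes syslog-style lines tagged "MailScanner[<pid>]", which may not
match every setup, and the sample lines at the bottom are invented for
the demo.  It cycles through 12 ANSI colors, so with more than 12 pids
in view some colors repeat.

```python
import re

# Assumption: log lines carry a syslog tag of the form "MailScanner[1234]".
PID_RE = re.compile(r"MailScanner\[(\d+)\]")

# ANSI foreground color codes: 6 normal + 6 bright, skipping black and white.
COLORS = [31, 32, 33, 34, 35, 36, 91, 92, 93, 94, 95, 96]


def make_colorizer():
    """Return a function that wraps each line in a color chosen per pid."""
    assigned = {}  # pid -> ANSI color code, in order of first appearance

    def colorize(line):
        match = PID_RE.search(line)
        if match is None:
            return line  # not a worker line, pass through unchanged
        pid = match.group(1)
        if pid not in assigned:
            assigned[pid] = COLORS[len(assigned) % len(COLORS)]
        return f"\033[{assigned[pid]}m{line}\033[0m"

    return colorize


if __name__ == "__main__":
    # Hypothetical sample lines, only to show the effect.
    sample = [
        "Jul 23 14:00:01 host MailScanner[101]: first worker says something",
        "Jul 23 14:00:02 host MailScanner[102]: second worker says something",
        "Jul 23 14:00:03 host MailScanner[101]: first worker again",
    ]
    colorize = make_colorizer()
    for line in sample:
        print(colorize(line))
```

To use it as a real filter, loop over sys.stdin instead of the sample
list and pipe the tail of the mail log into it.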

This gives me a better point to dig deeper and some ideas to get better
stats.  BTW, that 16 number is from Raymond's responses, not mine.  I'm
running 10 and have seen it go up to 15.  (Should that be happening?  I've
never been sure.  I just assumed it was automagically creating what it
needed above the initial 10.  If so, shouldn't it be labeled "Min
Children"?)

>
> By the way: I've just generated a fresh report for our system (MS 4.22-5):
> http://www.rz.uni-frankfurt.de/~janssenm/logstats/daily/07.23.marcy.html
>
> and Batchperformance/ Time/Batch (computing how much time was needed to
> work on one batch) shows a very suspicious pattern alternating between low
> and high scan times. Well, not high in a critical sense, but the pattern
> is there and it is correlated with the "dying of old age" messages in the
> logs. I can't remember seeing such a pattern before and I really don't
> like it, because one might suspect that MS would take more and more time
> without the periodic-restart mechanism (which is by now regarded as a
> hyper-secure guard against problems that may not actually exist). We
> upgraded from v4.12 last week and switched to sophossavi.... Nice, I'd
> love to investigate that deeper.
>
>
> Michael
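
One way to check that correlation against the raw logs would be to count
the "dying of old age" messages per hour and compare the result with the
Time/Batch graph.  A minimal sketch, assuming syslog-style timestamps
("Mon DD HH:MM:SS") at the start of each line; the function name and the
sample lines are made up.

```python
from collections import Counter

MARKER = "dying of old age"  # message text as quoted above


def restarts_per_hour(lines):
    """Count marker lines per hour, keyed by the 'Mon DD HH' prefix."""
    counts = Counter()
    for line in lines:
        if MARKER in line:
            # Assumption: the first 9 characters of a syslog line are
            # month, day and hour, e.g. "Jul 23 14".
            counts[line[:9]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines, only to show the shape of the result.
    sample = [
        "Jul 23 13:58:10 host MailScanner[101]: child dying of old age",
        "Jul 23 14:02:44 host MailScanner[102]: child dying of old age",
        "Jul 23 14:03:01 host MailScanner[103]: some other message",
        "Jul 23 14:40:19 host MailScanner[104]: child dying of old age",
    ]
    for hour, count in sorted(restarts_per_hour(sample).items()):
        print(hour, count)
```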

--
Thomas J. DuVally
Lead Systems Prog.
CIS, Brown Univ.

http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x15F233F6


