It seems that viruses CAN slip through MailScanner under high load!

Wed Sep 3 14:41:59 IST 2003

Hi

Bad news I'm afraid. We've just upgraded to MailScanner 4.23-11 and
viruses are still slipping through.  Admittedly our server is still
under load.

Thanks for any help.

Joan

On Fri, 29 Aug 2003 03:16:47 +0100 Brian Hoy <brian.hoy at OPUS.CO.NZ>
wrote:

> Hi all,
>
> Thanks to everyone for their comments and advice.  It is very much
> appreciated.  And especially to Julian for finding and fixing the problem so
> quickly!
>
> Our sendmail config does have the load settings configured that many of you
> mentioned, but still the mail was flowing in!  The input queue was growing
> faster than MailScanner could scan it, and the problem just kept compounding.
>
> The reason is that the "load average" stats are not always a good measure of
> the real stress that the machine is under.  If a machine is heavily using
> swap space, then the disks and motherboard I/O bandwidth are being consumed
> (and CPU also if the disks are ATA, rather than SCSI), yet no useful work is
> being done.
>
> If a process is waiting on a page fault, I do not think that it is placed in
> the OS's run queue until the page is loaded (and another page swapped out -
> still more disk I/O!).  If this is true then the load average does not
> increase, yet the machine is clearly starting to struggle with the load.
> This is what happened to us the other day.
>
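For anyone who wants a second signal to watch alongside the load average: on a current Linux kernel, /proc/vmstat exposes cumulative pswpin and pswpout counters (pages swapped in and out). The sketch below is only an illustration and not anything MailScanner does. The function names are made up, and the kernels in use when this thread was written exposed these figures differently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Scan "name value" lines (the /proc/vmstat layout) from fp and return the
 * value stored under key, or -1 if the key is not present.
 * vmstat_counter is a hypothetical helper name. */
long long vmstat_counter(FILE *fp, const char *key)
{
    char name[64];
    long long value;

    assert(fp != NULL && key != NULL);
    while (fscanf(fp, "%63s %lld", name, &value) == 2) {
        if (strcmp(name, key) == 0)
            return value;
    }
    return -1;
}

/* Read one counter from the live system. Returns -1 if /proc/vmstat cannot
 * be opened (assumption: Linux with procfs mounted). */
long long read_vmstat(const char *key)
{
    FILE *fp = fopen("/proc/vmstat", "r");
    long long value;

    if (fp == NULL)
        return -1;
    value = vmstat_counter(fp, key);
    fclose(fp);
    return value;
}
```

Calling read_vmstat("pswpin") twice a few seconds apart and subtracting gives the number of pages swapped in over that interval, which can climb while the load average stays flat.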
> If you want to experiment with this idea, compile this C program:
>
> // Compile with: gcc -o vm_tester vm_tester.c
> //
> #include <stdio.h>
> #include <stdlib.h>   /* malloc(), free(), exit() */
>
> #define NUM_PASSES 10
> #define MB_TO_ALLOC 128
> #define BYTES_TO_ALLOC (MB_TO_ALLOC * 1024*1024)
> #define PAGE_BYTES 4096   /* assumed page size */
>
> int main(void)
> {
>   unsigned char *mem;
>   int pass, r, c;
>
>   if ((mem = malloc(BYTES_TO_ALLOC)) == NULL)
>   {
>     fprintf(stderr, "malloc() failed\n");
>     exit(EXIT_FAILURE);
>   }
>
>   for (pass=0; pass<NUM_PASSES; pass++)
>   {
>     /* Column-major walk: consecutive accesses land on different pages,
>        so every page is revisited PAGE_BYTES times per pass. */
>     for (c=0; c<PAGE_BYTES; c++)
>     {
>       for (r=0; r<BYTES_TO_ALLOC/PAGE_BYTES; r++)
>       {
>         mem[r*PAGE_BYTES + c]++;
>       }
>     }
>   }
>
>   free(mem);
>   return 0;
> }
>
> // -----------------------------------------------
>
> It allocates 128M of RAM, and increments bytes in a way that generates as
> many page faults as possible.  As an initial suggestion, run as many of
> these programs as needed to consume all your RAM and watch your other
> processes struggle to get a slice of the CPU.  BTW, don't do this on a
> production server, or try to consume more memory than your total VM - you
> have been warned!
>
> Use top and vmstat to watch things.  As you start more copies of the
> program, you will find that the load average does not increase that much,
> but your disks are flat out, and machine responsiveness goes right out the
> window (especially on ATA disks).
>
> I still think my suggestion (in my first post) for an "unfair" way of
> selecting messages for scanning under "high load" has merit.  When our mail
> gateway was stressed out the other day, I was using strace to monitor the
> system calls in the MailScanner processes, and they were spending 5-30 minutes
> just doing the stat() calls before locking messages for scanning.
>
> When your machine is really overloaded, let's do anything to concentrate the
> meagre available resources on clearing the queue in the most expedient fashion.
>
> Perhaps "high load" can be determined by the length of the input queue
> (rather than the misleading system load average), and be user configurable.
>
> For example, if the input queue has in excess of 1000 messages waiting, peel
> off any 30 for scanning.  Ensure that no other MailScanner process evaluates
> the length of the queue until a user configurable time has passed (15
> mins?).  I know this is easier said than done, but I think it really would
> help when the machine is steaming up shit creek.
>
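A rough sketch of the queue-length idea above, in C for illustration only (MailScanner itself is not written in C and this is not its code). It reads the directory once, never calls stat(), counts the sendmail "qf" control files, and keeps the names of the first few it meets, which is the "any 30" batch. The names scan_queue, free_batch and queue_is_overloaded are hypothetical, and the timed lock-out between processes that Brian mentions is left out.

```c
#define _POSIX_C_SOURCE 200809L   /* for strdup() */
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Scan queue_dir once without stat(). Returns the number of "qf" files, or
 * -1 if the directory cannot be opened. Up to max_batch names are copied
 * into batch in directory order (no attempt at fairness); *n_batch receives
 * how many were stored. */
long scan_queue(const char *queue_dir, char **batch, size_t max_batch,
                size_t *n_batch)
{
    DIR *dir;
    struct dirent *entry;
    long total = 0;

    assert(queue_dir != NULL && batch != NULL && n_batch != NULL);
    *n_batch = 0;
    dir = opendir(queue_dir);
    if (dir == NULL) {
        perror(queue_dir);
        return -1;
    }
    while ((entry = readdir(dir)) != NULL) {
        if (strncmp(entry->d_name, "qf", 2) != 0)
            continue;                  /* skips df files, "." and ".." */
        total++;
        if (*n_batch < max_batch) {
            char *copy = strdup(entry->d_name);
            if (copy != NULL)
                batch[(*n_batch)++] = copy;
        }
    }
    closedir(dir);
    return total;
}

/* Release the names stored by scan_queue(). */
void free_batch(char **batch, size_t n_batch)
{
    for (size_t i = 0; i < n_batch; i++)
        free(batch[i]);
}

/* "High load" judged by queue length instead of the load average. */
int queue_is_overloaded(long total, long threshold)
{
    return total > threshold;
}
```

With a threshold of 1000 and max_batch of 30, a caller would scan the batch as-is when queue_is_overloaded() returns true and fall back to the normal fair selection otherwise.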
> Another thought... Sendmail names all its df and qf files, such that an
> alphabetical listing is sorted by ascending time order too!  If the other
> MTAs are the same, then perhaps this fact could be used to remove all the
> stat()s and still meet the fairness algorithm?
>
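If the file names really do sort in time order (I have not checked this for every sendmail version or for other MTAs), the listing can be ordered by name alone with POSIX scandir() and alphasort(), with no stat() calls. A minimal sketch follows; list_queue_by_name and free_listing are made-up names. Note that alphasort() compares with strcoll(), so the order matches plain byte order only in the default "C" locale.

```c
#define _POSIX_C_SOURCE 200809L   /* for scandir() and alphasort() */
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* scandir() filter: keep only sendmail control files, named "qf...". */
static int is_control_file(const struct dirent *entry)
{
    return strncmp(entry->d_name, "qf", 2) == 0;
}

/* Store the "qf" entries of queue_dir in *list, sorted by file name.
 * Returns the number of entries, or -1 on error. No stat() is involved;
 * the order comes from the names alone. */
int list_queue_by_name(const char *queue_dir, struct dirent ***list)
{
    int n;

    assert(queue_dir != NULL && list != NULL);
    n = scandir(queue_dir, list, is_control_file, alphasort);
    if (n < 0)
        perror(queue_dir);
    return n;
}

/* Release a listing returned by list_queue_by_name(). */
void free_listing(struct dirent **list, int n)
{
    for (int i = 0; i < n; i++)
        free(list[i]);
    free(list);
}
```

Taking the first entries of the sorted list would then give the oldest messages first, provided the naming assumption holds.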
> Comments anyone?
>
> Regards,
> Brian

----------------------
Joan Bryan
Unix Systems Administrator
Information Systems
Telephone: +44 (0) 20 7848 2671
mailto:joan.bryan at kcl.ac.uk