Upgraded to 4.67.6, MailScanner scans a batch then hangs at 100 percent CPU

Steve Crumley scrumley at secure-enterprise.com
Wed Mar 12 20:59:47 GMT 2008


 

> -----Original Message-----
> From: mailscanner-bounces at lists.mailscanner.info 
> [mailto:mailscanner-bounces at lists.mailscanner.info] On Behalf 
> Of Julian Field
> Sent: Tuesday, March 11, 2008 6:50 PM
> To: MailScanner discussion
> Subject: Re: Upgraded to 4.67.6, MailScanner scans a batch 
> then hangs at 100 percent CPU
> 
> * PGP Signed by an unverified key: 03/11/08 at 18:50:26
> 
> 
> 
> Steve Crumley wrote:
> >  
> >
> >   
> >> -----Original Message-----
> >> From: mailscanner-bounces at lists.mailscanner.info 
> >> [mailto:mailscanner-bounces at lists.mailscanner.info] On Behalf 
> >> Of Glenn Steen
> >> Sent: Tuesday, March 11, 2008 4:32 PM
> >> To: MailScanner discussion
> >> Subject: Re: Upgraded to 4.67.6,MailScanner scans a batch 
> >> then hangs at 100 percent CPU
> >>
> >> On 11/03/2008, Steve Crumley <scrumley at secure-enterprise.com> wrote:
> >>>  > -----Original Message-----
> >>>  > From: mailscanner-bounces at lists.mailscanner.info
> >>>  > [mailto:mailscanner-bounces at lists.mailscanner.info] On Behalf
> >>>  > Of Glenn Steen
> >>>  > Sent: Tuesday, March 11, 2008 1:21 PM
> >>>  > To: MailScanner discussion
> >>>  > Subject: Re: Upgraded to 4.67.6,MailScanner scans a batch
> >>>  > then hangs at 100 percent CPU
> >>>  >
> >>>  > On 11/03/2008, Steve Crumley <scrumley at secure-enterprise.com> wrote:
> >>>  > >
> >>>  > >  > -----Original Message-----
> >>>  > >  > From: mailscanner-bounces at lists.mailscanner.info
> >>>  > >  > [mailto:mailscanner-bounces at lists.mailscanner.info] On Behalf
> >>>  > >  > Of --[ UxBoD ]--
> >>>  > >  > Sent: Tuesday, March 11, 2008 11:29 AM
> >>>  > >  > To: MailScanner discussion
> >>>  > >  > Subject: Re: Upgraded to 4.67.6, MailScanner scans a batch
> >>>  > >  > then hangs at 100 percent CPU
> >>>  > >  >
> >>>  > >
> >>>  > >  > do you have strace installed on the server ? if so when the
> >>>  > >  > process is running at 100% CPU connect to it and see what it
> >>>  > >  > is doing.  I had this before, but for the life of me I cannot
> >>>  > >  > remember what I changed to fix it :(
> >>>  > >  >
> >>>  > >  > Things to check :-
> >>>  > >  >
> >>>  > >  > 1) Permissions, are they all correct
> >>>  > >  > 2) Check MailScanner.conf again just to make sure no typos
> >>>  > >  >
> >>>  > >  > Regards,
> >>>  > >  >
> >>>  > >  > --
> >>>  > >
> >>>  > >
> >>>  > > Here is the output from strace:
> >>>  > >
> >>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
> >>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
> >>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
> >>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
> >>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
> >>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
> >>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
> >>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
> >>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
> >>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
> >>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
> >>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
> >>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
> >>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
> >>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
> >>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
> >>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
> >>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
> >>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
> >>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
> >>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
> >>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
> >>>  > >
> >>>  > >
> >>>  > >
> >>>  > >
> >>>  > >  The system had been running fine for over a year. I can't find any
> >>>  > >  permission or setting change that's doing this, but I could be
> >>>  > >  overlooking something.
> >>>  > >  Thanks,
> >>>  > >  -Steve
> >>>  > >
> >>>  > Could perhaps be a busted SQLite SA cache? What does analyse_s<TAB> (I
> >>>  > don't remember if it is sacache or spamassassin_cache ... the command
> >>>  > completion should take care of it:-) say? If it looks fishy, simply
> >>>  > delete the SA cache file and restart MS.
> >>>  >
> >>>  > You've run MailScanner --lint, right? Nothing obvious from that?
> >>>  >
> >>>  > Oh, and what av scanners do you use? Obviously not clamavmodule, but
> >>>  > perhaps clamav or clamd? are those OK?
> >>>  >
> >>>  > Cheers
> >>>  > --
> >>>  > -- Glenn
> >>>  > email: glenn < dot > steen < at > gmail < dot > com
> >>>  > work: glenn < dot > steen < at > ap1 < dot > se
> >>>
> >>>
> >>>
> >>> analyse_SpamAssassin_cache looks clean, MailScanner --lint is clean too.
> >>> I'm running clamd for AV but I've set virus scanning to no while working
> >>> on this.
> >>>
> >>> Thanks,
> >>>  -Steve
> >>>       
> >> Couldn't be something easily mended, huh:-)....
> >>
> >> What you seem to have attached to above (with strace) would be the
> >> main MailScanner process, since it basically just waits for its
> >> children to end... Or is it? What does a ps listing show (one that
> >> shows the command argument list, since Jules rewrites it to show what
> >> it thinks it is basically doing)?
> >> Do the children restart endlessly when hung? How many children are
> >> there, and in what state?
> >> Cheers
> >> -- Glenn
> >>     
> >
> >
> >
> > When I first started it with 8 children, they all ended up quickly
> > hanging and consuming CPU.  For now, I've set it to 1 child and I've
> > been running in debug mode.  The ps gives us a good clue!  It's the only
> > mailscanner process and it reports "MailScanner: extracting attachments"
> >
> > Thanks,
> > -Steve
> >   
> In which case go into "sub Explode" in
> /usr/lib/MailScanner/MailScanner/Message.pm, and add some "print STDERR"
> lines to generate tracing output so you can see how far it gets. When
> you do a "MailScanner --debug" it will show you the STDERR debug output
> in the terminal session.


OK, here is what's happening.  It's using Explode in MessageBatch.pm and
not Message.pm.
Here is where it hangs in MessageBatch.pm:

sub Explode {
  my $this = shift;
  print STDERR "messagebatch\n";  #crumley

  my($key, $message);

  # jjh 2004-03-12 reap as many as we can.
  # JKF Test 2004-11-23 1 until waitpid(-1, &POSIX::WNOHANG) == -1;
  print STDERR "about to hang\n";  
  1 until waitpid(-1, WNOHANG) == -1;
  print STDERR "we never get here\n";  
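
For what it's worth, the strace output quoted earlier (waitpid returning 0 over and over) fits this loop. With WNOHANG, Perl's waitpid returns 0 when child processes exist but none has exited yet, and -1 only when there are no children at all. So "1 until waitpid(-1, WNOHANG) == -1" busy-waits for as long as any child stays alive. Which child is still running here I don't know. Below is a minimal standalone sketch of that behaviour, not MailScanner code and not necessarily the right fix for it: reap_finished_children is a hypothetical helper, and the sleeping child is only a stand-in for whatever process is lingering.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

# Hypothetical helper (not part of MailScanner): reap every child that has
# already exited, and stop as soon as waitpid reports 0 (children exist but
# are still running) or -1 (no children at all).
sub reap_finished_children {
    my $reaped = 0;
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        $reaped++;
    }
    return $reaped;
}

# Stand-in for a long-lived child process.
my $child = fork();
die "fork failed: $!" unless defined $child;
if ($child == 0) {
    sleep 30;
    exit 0;
}

# While the child is alive, a non-blocking waitpid returns 0, not -1,
# so a loop that only stops at -1 would spin at this point.
my $status = waitpid(-1, WNOHANG);
print STDERR "waitpid returned $status while child $child is running\n";

# The bounded loop returns straight away instead of busy-waiting.
my $count = reap_finished_children();
print STDERR "reaped $count finished children\n";

# Clean up the demo child.
kill 'TERM', $child;
waitpid($child, 0);
```

Stopping the loop on any value <= 0 means a still-running child is left alone rather than waited for, so whether that is acceptable inside Explode depends on why the reap is there in the first place.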

