Upgraded to 4.67.6, MailScanner scans a batch then hangs at 100 percent CPU

Julian Field MailScanner at ecs.soton.ac.uk
Wed Mar 12 21:51:26 GMT 2008


Steve Crumley wrote:
>> -----Original Message-----
>> From: mailscanner-bounces at lists.mailscanner.info
>> [mailto:mailscanner-bounces at lists.mailscanner.info] On Behalf
>> Of Julian Field
>> Sent: Tuesday, March 11, 2008 6:50 PM
>> To: MailScanner discussion
>> Subject: Re: Upgraded to 4.67.6, MailScanner scans a batch
>> then hangs at 100 percent CPU
>>
>> Steve Crumley wrote:
>>>> -----Original Message-----
>>>> From: mailscanner-bounces at lists.mailscanner.info
>>>> [mailto:mailscanner-bounces at lists.mailscanner.info] On Behalf
>>>> Of Glenn Steen
>>>> Sent: Tuesday, March 11, 2008 4:32 PM
>>>> To: MailScanner discussion
>>>> Subject: Re: Upgraded to 4.67.6,MailScanner scans a batch
>>>> then hangs at 100 percent CPU
>>>>
>>>> On 11/03/2008, Steve Crumley <scrumley at secure-enterprise.com> wrote:
>>>>>  > -----Original Message-----
>>>>>  > From: mailscanner-bounces at lists.mailscanner.info
>>>>>  > [mailto:mailscanner-bounces at lists.mailscanner.info] On Behalf
>>>>>  > Of Glenn Steen
>>>>>  > Sent: Tuesday, March 11, 2008 1:21 PM
>>>>>  > To: MailScanner discussion
>>>>>  > Subject: Re: Upgraded to 4.67.6,MailScanner scans a batch
>>>>>  > then hangs at 100 percent CPU
>>>>>  >
>>>>>  > On 11/03/2008, Steve Crumley <scrumley at secure-enterprise.com> wrote:
>>>>>  > >
>>>>>  > >  > -----Original Message-----
>>>>>  > >  > From: mailscanner-bounces at lists.mailscanner.info
>>>>>  > >  > [mailto:mailscanner-bounces at lists.mailscanner.info] On Behalf
>>>>>  > >  > Of --[ UxBoD ]--
>>>>>  > >  > Sent: Tuesday, March 11, 2008 11:29 AM
>>>>>  > >  > To: MailScanner discussion
>>>>>  > >  > Subject: Re: Upgraded to 4.67.6, MailScanner scans a batch
>>>>>  > >  > then hangs at 100 percent CPU
>>>>>  > >  >
>>>>>  > >  > do you have strace installed on the server ? if so when the
>>>>>  > >  > process is running at 100% CPU connect to it and see what it
>>>>>  > >  > is doing.  I had this before, but for the life of me I cannot
>>>>>  > >  > remember what I changed to fix it :(
>>>>>  > >  >
>>>>>  > >  > Things to check :-
>>>>>  > >  >
>>>>>  > >  > 1) Permissions, are they all correct
>>>>>  > >  > 2) Check MailScanner.conf again just to make sure no typos
>>>>>  > >  >
>>>>>  > >  > Regards,
>>>>>  > >  >
>>>>>  > >  > --
>>>>>  > >
>>>>>  > >
>>>>>  > > Here is the output from strace:
>>>>>  > >
>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>  > >
>>>>>  > >
>>>>>  > >
>>>>>  > >
>>>>>  > >  The system had been running fine for over a year. I can't find any
>>>>>  > >  permission or setting change that's doing this, but I could be
>>>>>  > >  overlooking something.
>>>>>  > >  Thanks,
>>>>>  > >  -Steve
>>>>>  > >
>>>>>  > Could perhaps be a busted SQLite SA cache? What does analyse_s<TAB> (I
>>>>>  > don't remember if it is sacache or spamassassin_cache ... the command
>>>>>  > completion should take care of it:-) say? If it looks fishy, simply
>>>>>  > delete the SA cache file and restart MS.
>>>>>  >
>>>>>  > You've run MailScanner --lint, right? Nothing obvious from that?
>>>>>  >
>>>>>  > Oh, and what av scanners do you use? Obviously not clamavmodule, but
>>>>>  > perhaps clamav or clamd? are those OK?
>>>>>  >
>>>>>  > Cheers
>>>>>  > --
>>>>>  > -- Glenn
>>>>>  > email: glenn < dot > steen < at > gmail < dot > com
>>>>>  > work: glenn < dot > steen < at > ap1 < dot > se
>>>>>
>>>>> analyse_SpamAssassin_cache looks clean, MailScanner --lint is clean too.
>>>>> I'm running clamd for AV but I've set virus scanning to no while working
>>>>> on this.
>>>>>
>>>>> Thanks,
>>>>>  -Steve
>>>>
>>>> Couldn't be something easily mended, huh:-)....
>>>>
>>>> What you seem to have attached to above (with strace) would be the
>>>> main MailScanner process, since it basically just waits for its
>>>> children to end... Or is it? What does a ps listing show (one that
>>>> shows the command argument list, since Jules rewrites it to show what it
>>>> thinks it is basically doing)?
>>>> Do the children restart endlessly when hung? How many children are
>>>> there, and in what state?
>>>> Cheers
>>>> -- Glenn
>>>
>>> When I first started it with 8 children, they all ended up quickly hanging
>>> and consuming CPU.  For now, I've set it to 1 child and I've been
>>> running in debug mode.  The ps gives us a good clue!  It's the only
>>> mailscanner process and it reports "MailScanner: extracting attachments"
>>> Thanks,
>>> -Steve
>> In which case go into "sub Explode" in
>> /usr/lib/MailScanner/MailScanner/Message.pm, and add some "print STDERR"
>> lines to generate tracing output so you can see how far it gets. When
>> you do a "MailScanner --debug" it will show you the STDERR debug output
>> in the terminal session.
>
> OK, here is what's happening.  It's using Explode in MessageBatch.pm and
> not Message.pm.
> Here is where it dies in MessageBatch.pm:
>
> sub Explode {
>   my $this = shift;
>   print STDERR "messagebatch\n";  #crumley
>
>   my($key, $message);
>
>   # jjh 2004-03-12 reap as many as we can.
>   # JKF Test 2004-11-23 1 until waitpid(-1, &POSIX::WNOHANG) == -1;
>   print STDERR "about to hang\n";
>   1 until waitpid(-1, WNOHANG) == -1;
>   print STDERR "we never get here\n";
>

But as the comments in the code show, this code hasn't been touched
since 2004, so I don't understand why you are only now seeing a change in
behaviour. I suspect you have upgraded something else on your system.

Are other people seeing the same problem?
What OS, distro, version, kernel, etc are you running?
Is anyone else running an identical system?
If so, are they seeing the same symptoms?

From the "perlfunc" man page:
       waitpid PID,FLAGS
               Waits for a particular child process to terminate and returns
               the pid of the deceased process, or "-1" if there is no such
               child process.
So it should reap processes until there aren't any left to be reaped.
What does the documentation for waitpid say on your system? This is a
POSIX function, so it should be the same across most systems.
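
For comparison with the strace output quoted earlier, here is a small standalone Perl sketch. It is not MailScanner code, and the name probe_waitpid is made up for illustration. It records what waitpid(-1, WNOHANG) returns while a child is still running, and again once no children are left. The perlfunc entry for waitpid also notes that with a non-blocking wait, a return value of 0 on some systems means processes are still running, so that is the case to look for.

```perl
use strict;
use warnings;
use POSIX ":sys_wait_h";    # exports WNOHANG

# Hypothetical helper (not part of MailScanner): fork one child that stays
# alive until told to exit, and record what waitpid returns at each stage.
sub probe_waitpid {
    pipe(my $rd, my $wr) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: block until the parent closes its end of the pipe.
        close $wr;
        my $ignored = <$rd>;
        POSIX::_exit(0);
    }

    close $rd;
    my $while_running = waitpid(-1, WNOHANG);    # child is still alive here
    close $wr;                                   # lets the child exit
    my $reaped  = waitpid($pid, 0);              # blocking wait for that child
    my $no_kids = waitpid(-1, WNOHANG);          # nothing left to wait for

    return ($while_running, ($reaped == $pid ? 1 : 0), $no_kids);
}

my @result = probe_waitpid();
print "child running:  waitpid returned $result[0]\n";
print "child reaped:   ", ($result[1] ? "yes" : "no"), "\n";
print "no children:    waitpid returned $result[2]\n";
```

If the first value printed on your system is 0 rather than -1, compare it with the "= 0" results in the strace listing.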

If you take out the waitpid() call, you will collect <defunct> 
processes, as they are terminating but never being reaped. So this call 
is very necessary.
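
To illustrate the reaping point, here is a minimal sketch that assumes nothing about MailScanner's internals. The helper reap_exited is hypothetical and is not a proposed patch. It collects every child that has already exited, so none are left <defunct>, and it returns as soon as waitpid reports 0 or -1 instead of looping until -1.

```perl
use strict;
use warnings;
use POSIX ":sys_wait_h";    # exports WNOHANG

# Hypothetical helper: reap all children that have already terminated,
# without blocking on any that are still running. Returns the number reaped.
sub reap_exited {
    my $count = 0;
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        $count++;
    }
    return $count;
}

# Demo: start three short-lived children, then reap until all are collected.
for my $n (1 .. 3) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    POSIX::_exit(0) if $pid == 0;    # child exits immediately
}

my $reaped = 0;
while ($reaped < 3) {
    $reaped += reap_exited();
    select(undef, undef, undef, 0.05);    # brief pause instead of spinning
}
print "reaped $reaped children\n";
```

The design choice here is only to treat 0 and -1 the same way for the purpose of leaving the loop; whether that is appropriate inside Explode is a separate question.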

I'm not going to touch this code with a 10-foot barge pole unless I have 
*very* good reason to.

Jules

-- 
Julian Field MEng CITP CEng
www.MailScanner.info
Buy the MailScanner book at www.MailScanner.info/store

MailScanner customisation, or any advanced system administration help?
Contact me at Jules at Jules.FM

PGP footprint: EE81 D763 3DB0 0BFD E1DC 7222 11F6 5947 1415 B654
PGP public key: http://www.jules.fm/julesfm.asc

