Upgraded to 4.67.6, MailScanner scans a batch then hangs at 100 percent CPU

Scott Silva ssilva at sgvwater.com
Thu Mar 13 16:17:26 GMT 2008


on 3-12-2008 9:19 PM Steve Crumley spake the following:
>  
> 
>> -----Original Message-----
>> From: mailscanner-bounces at lists.mailscanner.info 
>> [mailto:mailscanner-bounces at lists.mailscanner.info] On Behalf 
>> Of Julian Field
>> Sent: Wednesday, March 12, 2008 5:51 PM
>> To: MailScanner discussion
>> Subject: Re: Upgraded to 4.67.6, MailScanner scans a batch 
>> then hangs at 100 percent CPU
>>
>> Steve Crumley wrote:
>>>
>>>> -----Original Message-----
>>>> From: mailscanner-bounces at lists.mailscanner.info 
>>>> [mailto:mailscanner-bounces at lists.mailscanner.info] On Behalf 
>>>> Of Julian Field
>>>> Sent: Tuesday, March 11, 2008 6:50 PM
>>>> To: MailScanner discussion
>>>> Subject: Re: Upgraded to 4.67.6, MailScanner scans a batch 
>>>> then hangs at 100 percent CPU
>>>>
>>>> Steve Crumley wrote:
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: mailscanner-bounces at lists.mailscanner.info 
>>>>>> [mailto:mailscanner-bounces at lists.mailscanner.info] On Behalf 
>>>>>> Of Glenn Steen
>>>>>> Sent: Tuesday, March 11, 2008 4:32 PM
>>>>>> To: MailScanner discussion
>>>>>> Subject: Re: Upgraded to 4.67.6,MailScanner scans a batch 
>>>>>> then hangs at 100 percent CPU
>>>>>>
>>>>>> On 11/03/2008, Steve Crumley <scrumley at secure-enterprise.com> wrote:
>>>>>>>  > -----Original Message-----
>>>>>>>  > From: mailscanner-bounces at lists.mailscanner.info
>>>>>>>  > [mailto:mailscanner-bounces at lists.mailscanner.info] On Behalf
>>>>>>>  > Of Glenn Steen
>>>>>>>  > Sent: Tuesday, March 11, 2008 1:21 PM
>>>>>>>  > To: MailScanner discussion
>>>>>>>  > Subject: Re: Upgraded to 4.67.6,MailScanner scans a batch
>>>>>>>  > then hangs at 100 percent CPU
>>>>>>>  >
>>>>>>>  > On 11/03/2008, Steve Crumley <scrumley at secure-enterprise.com> wrote:
>>>>>>>  > >
>>>>>>>  > >
>>>>>>>  > >  > -----Original Message-----
>>>>>>>  > >  > From: mailscanner-bounces at lists.mailscanner.info
>>>>>>>  > >  > [mailto:mailscanner-bounces at lists.mailscanner.info] On Behalf
>>>>>>>  > >  > Of --[ UxBoD ]--
>>>>>>>  > >  > Sent: Tuesday, March 11, 2008 11:29 AM
>>>>>>>  > >  > To: MailScanner discussion
>>>>>>>  > >  > Subject: Re: Upgraded to 4.67.6, MailScanner scans a batch
>>>>>>>  > >  > then hangs at 100 percent CPU
>>>>>>>  > >  >
>>>>>>>  > >  > Do you have strace installed on the server? If so, when the
>>>>>>>  > >  > process is running at 100% CPU, connect to it and see what it
>>>>>>>  > >  > is doing. I had this before, but for the life of me I cannot
>>>>>>>  > >  > remember what I changed to fix it :(
>>>>>>>  > >  >
>>>>>>>  > >  > Things to check :-
>>>>>>>  > >  >
>>>>>>>  > >  > 1) Permissions, are they all correct
>>>>>>>  > >  > 2) Check MailScanner.conf again just to make sure no typos
>>>>>>>  > >  >
>>>>>>>  > >  > Regards,
>>>>>>>  > >  >
>>>>>>>  > >  > --
>>>>>>>  > >
>>>>>>>  > >
>>>>>>>  > > Here is the output from strace:
>>>>>>>  > >
>>>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>>>  > >  waitpid(-1, 0xbff09448, WNOHANG)        = 0
>>>>>>>  > >
>>>>>>>  > >
>>>>>>>  > >
>>>>>>>  > >
>>>>>>>  > >  The system had been running fine for over a year. I can't find any
>>>>>>>  > >  permission or setting change that's doing this, but I could be
>>>>>>>  > >  overlooking something.
>>>>>>>  > >  Thanks,
>>>>>>>  > >  -Steve
>>>>>>>  > >
>>>>>>>  > Could perhaps be a busted SQLite SA cache? What does analyse_s<TAB> (I
>>>>>>>  > don't remember if it is sacache or spamassassin_cache ... the command
>>>>>>>  > completion should take care of it:-) say? If it looks fishy, simply
>>>>>>>  > delete the SA cache file and restart MS.
>>>>>>>  >
>>>>>>>  > You've run MailScanner --lint, right? Nothing obvious from that?
>>>>>>>  >
>>>>>>>  > Oh, and what AV scanners do you use? Obviously not clamavmodule, but
>>>>>>>  > perhaps clamav or clamd? Are those OK?
>>>>>>>  >
>>>>>>>  > Cheers
>>>>>>>  > --
>>>>>>>  > -- Glenn
>>>>>>>  > email: glenn < dot > steen < at > gmail < dot > com
>>>>>>>  > work: glenn < dot > steen < at > ap1 < dot > se
>>>>>>>  > --
>>>>>>>  > MailScanner mailing list
>>>>>>>  > mailscanner at lists.mailscanner.info
>>>>>>>  > http://lists.mailscanner.info/mailman/listinfo/mailscanner
>>>>>>>  >
>>>>>>>  > Before posting, read http://wiki.mailscanner.info/posting
>>>>>>>  >
>>>>>>>  > Support MailScanner development - buy the book off the website!
>>>>>>>  >
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> analyse_SpamAssassin_cache looks clean, MailScanner --lint is clean too.
>>>>>>> I'm running clamd for AV but I've set virus scanning to no while working
>>>>>>> on this.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>  -Steve
>>>>>>
>>>>>> Couldn't be something easily mended, huh:-)....
>>>>>>
>>>>>> What you seem to have attached to above (with strace) would be the
>>>>>> main MailScanner process, since it basically just waits for its
>>>>>> children to end... Or is it? What does a ps listing show (one that
>>>>>> shows the command argument list, since Jules rewrites it to show what
>>>>>> it thinks it is basically doing)?
>>>>>> Do the children restart endlessly when hung? How many children are
>>>>>> there, and in what state?
>>>>>> Cheers
>>>>>> -- Glenn
>>>>>
>>>>> When I first started it with 8 children, they all ended up quickly
>>>>> hanging and consuming CPU. For now, I've set it to 1 child and I've
>>>>> been running in debug mode. The ps gives us a good clue! It's the only
>>>>> mailscanner process and it reports "MailScanner: extracting attachments"
>>>>> Thanks,
>>>>> -Steve
>>>>
>>>> In which case go into "sub Explode" in
>>>> /usr/lib/MailScanner/MailScanner/Message.pm, and add some "print STDERR"
>>>> lines to generate tracing output so you can see how far it gets. When
>>>> you do a "MailScanner --debug" it will show you the STDERR debug output
>>>> in the terminal session.
>>>
>>> OK, here is what's happening. It's using Explode in MessageBatch.pm and
>>> not Message.pm.
>>> Here is where it dies in MessageBatch.pm:
>>>
>>> sub Explode {
>>>   my $this = shift;
>>>   print STDERR "messagebatch\n";  #crumley
>>>
>>>   my($key, $message);
>>>
>>>   # jjh 2004-03-12 reap as many as we can.
>>>   # JKF Test 2004-11-23 1 until waitpid(-1, &POSIX::WNOHANG) == -1;
>>>   print STDERR "about to hang\n";  
>>>   1 until waitpid(-1, WNOHANG) == -1;
>>>   print STDERR "we never get here\n";  
>>>   
>> But as the comments in the code show, this code hasn't been touched 
>> since 2004. So I don't understand why you are just seeing a change in 
>> behaviour. I would suspect you have upgraded something else 
>> in your system.
>>
>> Are other people seeing the same problem?
>> What OS, distro, version, kernel, etc are you running?
>> Is anyone else running an identical system?
>> If so, are they seeing the same symptoms?
>>
>> From the "perlfunc" man page:
>>        waitpid PID,FLAGS
>>                Waits for a particular child process to terminate and returns
>>                the pid of the deceased process, or "-1" if there is no such
>>                child process.
>> so it should reap processes until there aren't any left to be reaped. 
>> What does the documentation for waitpid say on your system? This is a 
>> POSIX function, so should be the same across most systems.
>>
>> If you take out the waitpid() call, you will collect <defunct>
>> processes, as they are terminating but never being reaped. So this call
>> is very necessary.
>>
>> I'm not going to touch this code with a 10-foot barge pole unless I have
>> *very* good reason to.
>>
>> Jules
>>
>> -- 
>> Julian Field MEng CITP CEng
> 
> Julian, I really appreciate you looking at this. I understand this code
> hasn't changed and I'm certainly not suggesting you change it now. I'm
> just trying to track this down. I'm running a pretty standard CentOS
> 4.6 system plus the rpmforge repositories, so I'm guessing someone else
> may run into this as well. I think you are probably right: something
> else on the system may be involved. Everything is up to date with a
> "yum upgrade". I just don't have a clue as to what could be causing
> this.
> Thanks,
> -Steve
Rpmforge on 4.6? How about doing an "rpm -qa --last" and posting any RPMs
that changed since the time it quit working? I'm guessing a new Perl module
that is slightly incompatible, like the mail-tools problem earlier in the year.
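A note on the strace output quoted above: with the WNOHANG flag, POSIX waitpid() returns 0 (not -1) when the calling process still has children that have not exited yet. A loop of the form "1 until waitpid(-1, WNOHANG) == -1" therefore keeps polling, and burning CPU, for as long as any child of that process is still running, which would fit the endless "waitpid(...) = 0" lines. It says nothing about which child is stuck or why. The Perl sketch below only illustrates that return-value behaviour under those assumptions; it is not a patch for MessageBatch.pm, and reap_finished_children() is a hypothetical helper name.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

# Hypothetical helper: reap every child that has already exited, without blocking.
# The loop stops on 0 (children exist, none finished yet) as well as on -1
# (no children at all), so it never spins while a child is still running.
sub reap_finished_children {
    my @reaped;
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        push @reaped, $pid;
    }
    return @reaped;
}

# Start one child that stays alive for a couple of seconds.
my $child = fork();
die "fork failed: $!" unless defined $child;
if ($child == 0) {
    sleep 2;
    exit 0;
}

# With WNOHANG, waitpid returns 0 (not -1) while that child is still alive.
# A loop written as "1 until waitpid(-1, WNOHANG) == -1" would keep calling
# waitpid in a tight loop here until the child finally exits.
my $while_running = waitpid(-1, WNOHANG);
print "non-blocking waitpid while child runs: $while_running\n";

# The non-blocking reaper returns immediately with nothing reaped.
my @reaped_early = reap_finished_children();
print "reaped while child runs: ", scalar(@reaped_early), "\n";

# Clean up with a blocking wait, which returns the child's pid once it exits.
my $finished = waitpid($child, 0);
print "blocking waitpid returned the child pid: ",
      ($finished == $child ? "yes" : "no"), "\n";
```

Run standalone, the non-blocking call should report 0 while the child sleeps. Stopping on 0 avoids the spin, but as Jules points out the reaping itself is needed to avoid <defunct> processes, so whether it is safe to move on while a child is still running at that point in MailScanner is a separate question.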

-- 
MailScanner is like deodorant...
You hope everybody uses it, and
you notice quickly if they don't!!!!


