Upgraded to 4.67.6, MailScanner scans a batch then hangs at 100 percent CPU
Scott Silva
ssilva at sgvwater.com
Thu Mar 13 16:17:26 GMT 2008
on 3-12-2008 9:19 PM Steve Crumley spake the following:
>
>
>> -----Original Message-----
>> From: mailscanner-bounces at lists.mailscanner.info
>> [mailto:mailscanner-bounces at lists.mailscanner.info] On Behalf
>> Of Julian Field
>> Sent: Wednesday, March 12, 2008 5:51 PM
>> To: MailScanner discussion
>> Subject: Re: Upgraded to 4.67.6, MailScanner scans a batch
>> then hangs at 100 percent CPU
>>
>> Steve Crumley wrote:
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: mailscanner-bounces at lists.mailscanner.info
>>>> [mailto:mailscanner-bounces at lists.mailscanner.info] On Behalf
>>>> Of Julian Field
>>>> Sent: Tuesday, March 11, 2008 6:50 PM
>>>> To: MailScanner discussion
>>>> Subject: Re: Upgraded to 4.67.6, MailScanner scans a batch
>>>> then hangs at 100 percent CPU
>>>>
>>>> Steve Crumley wrote:
>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: mailscanner-bounces at lists.mailscanner.info
>>>>>> [mailto:mailscanner-bounces at lists.mailscanner.info] On Behalf
>>>>>> Of Glenn Steen
>>>>>> Sent: Tuesday, March 11, 2008 4:32 PM
>>>>>> To: MailScanner discussion
>>>>>> Subject: Re: Upgraded to 4.67.6,MailScanner scans a batch
>>>>>> then hangs at 100 percent CPU
>>>>>>
>>>>>> On 11/03/2008, Steve Crumley
>>>>>> <scrumley at secure-enterprise.com> wrote:
>>>>>>
>>>>>>> > -----Original Message-----
>>>>>>> > From: mailscanner-bounces at lists.mailscanner.info
>>>>>>> > [mailto:mailscanner-bounces at lists.mailscanner.info] On Behalf
>>>>>>> > Of Glenn Steen
>>>>>>> > Sent: Tuesday, March 11, 2008 1:21 PM
>>>>>>> > To: MailScanner discussion
>>>>>>> > Subject: Re: Upgraded to 4.67.6,MailScanner scans a batch
>>>>>>> > then hangs at 100 percent CPU
>>>>>>> >
>>>>>>> > On 11/03/2008, Steve Crumley
>>>>>>> > <scrumley at secure-enterprise.com> wrote:
>>>>>>> > >
>>>>>>> > > > -----Original Message-----
>>>>>>> > > > From: mailscanner-bounces at lists.mailscanner.info
>>>>>>> > > > [mailto:mailscanner-bounces at lists.mailscanner.info] On Behalf
>>>>>>> > > > Of --[ UxBoD ]--
>>>>>>> > > > Sent: Tuesday, March 11, 2008 11:29 AM
>>>>>>> > > > To: MailScanner discussion
>>>>>>> > > > Subject: Re: Upgraded to 4.67.6, MailScanner scans a batch
>>>>>>> > > > then hangs at 100 percent CPU
>>>>>>> > > >
>>>>>>> > > > Do you have strace installed on the server? If so, when the
>>>>>>> > > > process is running at 100% CPU, attach to it and see what it
>>>>>>> > > > is doing. I had this before, but for the life of me I cannot
>>>>>>> > > > remember what I changed to fix it :(
>>>>>>> > > >
>>>>>>> > > > Things to check:
>>>>>>> > > >
>>>>>>> > > > 1) Permissions: are they all correct?
>>>>>>> > > > 2) Check MailScanner.conf again just to make sure there are no typos.
>>>>>>> > > >
>>>>>>> > > > Regards,
>>>>>>> > > >
>>>>>>> > > > --
>>>>>>> > >
>>>>>>> > >
>>>>>>> > > Here is the output from strace:
>>>>>>> > >
>>>>>>> > > waitpid(-1, 0xbff09448, WNOHANG) = 0
>>>>>>> > > waitpid(-1, 0xbff09448, WNOHANG) = 0
>>>>>>> > > waitpid(-1, 0xbff09448, WNOHANG) = 0
>>>>>>> > > waitpid(-1, 0xbff09448, WNOHANG) = 0
>>>>>>> > > waitpid(-1, 0xbff09448, WNOHANG) = 0
>>>>>>> > > waitpid(-1, 0xbff09448, WNOHANG) = 0
>>>>>>> > > waitpid(-1, 0xbff09448, WNOHANG) = 0
>>>>>>> > > waitpid(-1, 0xbff09448, WNOHANG) = 0
>>>>>>> > > waitpid(-1, 0xbff09448, WNOHANG) = 0
>>>>>>> > > waitpid(-1, 0xbff09448, WNOHANG) = 0
>>>>>>> > > waitpid(-1, 0xbff09448, WNOHANG) = 0
>>>>>>> > > waitpid(-1, 0xbff09448, WNOHANG) = 0
>>>>>>> > > waitpid(-1, 0xbff09448, WNOHANG) = 0
>>>>>>> > > waitpid(-1, 0xbff09448, WNOHANG) = 0
>>>>>>> > > waitpid(-1, 0xbff09448, WNOHANG) = 0
>>>>>>> > > waitpid(-1, 0xbff09448, WNOHANG) = 0
>>>>>>> > > waitpid(-1, 0xbff09448, WNOHANG) = 0
>>>>>>> > > waitpid(-1, 0xbff09448, WNOHANG) = 0
>>>>>>> > > waitpid(-1, 0xbff09448, WNOHANG) = 0
>>>>>>> > > waitpid(-1, 0xbff09448, WNOHANG) = 0
>>>>>>> > > waitpid(-1, 0xbff09448, WNOHANG) = 0
>>>>>>> > > waitpid(-1, 0xbff09448, WNOHANG) = 0
>>>>>>> > >
>>>>>>> > >
>>>>>>> > >
>>>>>>> > >
>>>>>>> > > The system had been running fine for over a year. I can't find any
>>>>>>> > > permission or setting change that's doing this, but I could be
>>>>>>> > > overlooking something.
>>>>>>> > > Thanks,
>>>>>>> > > -Steve
>>>>>>> > >
>>>>>>> > Could it be a busted SQLite SA cache? What does analyse_s<TAB>
>>>>>>> > say? (I don't remember if it is sacache or spamassassin_cache...
>>>>>>> > the command completion should take care of it :-) If it looks
>>>>>>> > fishy, simply delete the SA cache file and restart MS.
>>>>>>> >
>>>>>>> > You've run MailScanner --lint, right? Nothing obvious from that?
>>>>>>> >
>>>>>>> > Oh, and what AV scanners do you use? Obviously not
>>>>>>> > clamavmodule, but perhaps clamav or clamd? Are those OK?
>>>>>>> >
>>>>>>> > Cheers
>>>>>>> > --
>>>>>>> > -- Glenn
>>>>>>> > email: glenn < dot > steen < at > gmail < dot > com
>>>>>>> > work: glenn < dot > steen < at > ap1 < dot > se
>>>>>>> > --
>>>>>>> > MailScanner mailing list
>>>>>>> > mailscanner at lists.mailscanner.info
>>>>>>> > http://lists.mailscanner.info/mailman/listinfo/mailscanner
>>>>>>> >
>>>>>>> > Before posting, read http://wiki.mailscanner.info/posting
>>>>>>> >
>>>>>>> > Support MailScanner development - buy the book off the website!
>>>>>>>
>>>>>>> analyse_SpamAssassin_cache looks clean; MailScanner --lint is clean too.
>>>>>>> I'm running clamd for AV, but I've set virus scanning to no while
>>>>>>> working on this.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> -Steve
>>>>>>>
>>>>>>>
>>>>>> Couldn't be something easily mended, huh:-)....
>>>>>>
>>>>>> What you seem to have attached to above (with strace) would be the
>>>>>> main MailScanner process, since it basically just waits for its
>>>>>> children to end... Or is it? What does a ps listing show (one that
>>>>>> shows the command argument list, since Jules rewrites it to show
>>>>>> what it thinks it is basically doing)?
>>>>>> Do the children restart endlessly when hung? How many children are
>>>>>> there, and in what state?
>>>>>> Cheers
>>>>>> -- Glenn
>>>>>>
>>>>>>
>>>>> When I first started it with 8 children, they all quickly ended up
>>>>> hanging and consuming CPU. For now, I've set it to 1 child and I've
>>>>> been running in debug mode. The ps gives us a good clue! It's the
>>>>> only MailScanner process, and it reports "MailScanner: extracting
>>>>> attachments".
>>>>> Thanks,
>>>>> -Steve
>>>>>
>>>>>
>>>> In which case, go into "sub Explode" in
>>>> /usr/lib/MailScanner/MailScanner/Message.pm and add some
>>>> "print STDERR" lines to generate tracing output, so you can see how
>>>> far it gets. When you run "MailScanner --debug", it will show you
>>>> the STDERR debug output in the terminal session.
>>>>
>>>
>>> OK, here is what's happening. It's using Explode in MessageBatch.pm,
>>> not Message.pm.
>>> Here is where it dies in MessageBatch.pm:
>>>
>>> sub Explode {
>>>   my $this = shift;
>>>   print STDERR "messagebatch\n"; #crumley
>>>
>>>   my($key, $message);
>>>
>>>   # jjh 2004-03-12 reap as many as we can.
>>>   # JKF Test 2004-11-23 1 until waitpid(-1, &POSIX::WNOHANG) == -1;
>>>   print STDERR "about to hang\n";
>>>   1 until waitpid(-1, WNOHANG) == -1;
>>>   print STDERR "we never get here\n";
>>>
>> But as the comments in the code show, this code hasn't been touched
>> since 2004. So I don't understand why you are only now seeing a change
>> in behaviour. I would suspect you have upgraded something else on
>> your system.
>>
>> Are other people seeing the same problem?
>> What OS, distro, version, kernel, etc are you running?
>> Is anyone else running an identical system?
>> If so, are they seeing the same symptoms?
>>
>> From the "perlfunc" man page:
>>
>>     waitpid PID,FLAGS
>>         Waits for a particular child process to terminate and returns
>>         the pid of the deceased process, or "-1" if there is no such
>>         child process.
>>
>> So it should reap processes until there aren't any left to be reaped.
>> What does the documentation for waitpid say on your system? This is a
>> POSIX function, so it should be the same across most systems.
>>
>> If you take out the waitpid() call, you will collect <defunct>
>> processes, as they are terminating but never being reaped. So this
>> call is very necessary.
>>
>> I'm not going to touch this code with a 10-foot barge pole unless I
>> have *very* good reason to.
>>
>> Jules
>>
>> --
>> Julian Field MEng CITP CEng
>
> Julian, I really appreciate you looking at this. I understand this code
> hasn't changed, and I'm certainly not suggesting you change it now. I'm
> just trying to track this down. I'm running a pretty standard CentOS
> 4.6 system plus the RPMforge repositories, so I'm guessing someone else
> may run into this as well. I think you are probably right: something
> else on the system may be involved. Everything is up to date via
> "yum upgrade". I just don't have a clue what could be causing this.
> Thanks,
> -Steve
RPMforge on 4.6? How about running "rpm -qa --last" and posting any RPMs
that changed since the time it quit working? I'm guessing it's a new Perl
module that is slightly incompatible, like the mail-tools problem earlier
in the year.
--
MailScanner is like deodorant...
You hope everybody uses it, and
you notice quickly if they don't!!!!
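A note on the waitpid loop quoted in this thread: the perlfunc excerpt covers what waitpid returns when a child has exited or when there are no children, but with WNOHANG the POSIX waitpid() call (and Perl's wrapper) returns 0, not -1, when the caller still has children that have not exited yet. So "1 until waitpid(-1, WNOHANG) == -1;" does not block; it calls waitpid() over and over for as long as any child of that process is alive. That appears to match both the repeated "waitpid(-1, 0xbff09448, WNOHANG) = 0" lines in the strace output and the 100% CPU. On that reading, the open question would be which child process never exits, rather than the loop itself. The script below is a minimal sketch (not MailScanner code) that shows the return values; the half-second sleeping child is a hypothetical stand-in for whatever child is still running.

```perl
#!/usr/bin/perl
# Minimal sketch, not MailScanner code: shows what a non-blocking
# waitpid(-1, WNOHANG) returns while a child is still running, after it
# exits, and once no children remain.
use strict;
use warnings;
use POSIX qw(:sys_wait_h);   # provides WNOHANG
use Time::HiRes qw(sleep);   # allows a fractional-second sleep

# Hypothetical stand-in for a child that has not finished its work yet.
my $pid = fork();
die "fork failed: $!\n" unless defined $pid;
if ($pid == 0) {
    sleep(0.5);        # child: simulate half a second of work
    POSIX::_exit(0);   # exit without running the parent's cleanup
}

# The child is almost certainly still sleeping here, so a non-blocking
# wait reports 0 ("children exist, none has exited"), not -1.
my $first = waitpid(-1, WNOHANG);
print "first waitpid: $first\n";

# Same termination test as the quoted loop (stop only at -1), but count
# how many calls return 0. Each of those would show up in strace as one
# "waitpid(...) = 0" line, and the loop burns CPU the whole time.
my ($zero_returns, $reaped) = (0, 0);
while (1) {
    my $r = waitpid(-1, WNOHANG);
    last if $r == -1;                  # no children left at all
    if   ($r > 0) { $reaped = $r }     # child exited and was reaped
    else          { $zero_returns++ }  # child still running: spin again
}
print "reaped $reaped (child was $pid) after $zero_returns zero returns\n";
```

Run with perl, it should typically print 0 for the first call, then report the child's pid as reaped after a large, timing-dependent number of zero returns; the loop ends only when waitpid() finally returns -1 because no children are left.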