Mailscanner child freezes
Jeffrey Haas
jeffrey at life.illinois.edu
Sun Nov 16 02:36:38 GMT 2008
Hi Glenn -
Thanks for your reply. I'll send you a URL where you could download the
messages. I've collected three so far.
I'm not familiar with the earlier issue with freezing. I've used
MailScanner for years, and it has worked very well in our situation. I
only recently joined the mailing list after I found answers there when
we were upgrading to ClamAV 0.94 with an earlier version of MailScanner.
We're not using any milters with Postfix.
I have a secondary mail system set up on Ubuntu 8.04. I also upgraded
this system using the latest .tgz files last Tuesday. And it also has
had Mailscanner child processes freeze after I wrote my message to the
list. Not as often, but this system receives much less mail. Not sure
what that means exactly, but the problem doesn't seem related to Ubuntu
versions anyway.
On the original system ---
/opt/MailScanner/bin# ./MailScanner -v
Running on
Linux les 2.6.22-15-server #1 SMP Fri Jul 11 19:54:13 UTC 2008 i686
GNU/Linux
This is Perl version 5.008008 (5.8.8)
This is MailScanner version 4.72.5
Module versions are:
1.00 AnyDBM_File
1.16 Archive::Zip
0.21 bignum
1.04 Carp
1.41 Compress::Zlib
1.119 Convert::BinHex
0.17 Convert::TNEF
2.121_08 Data::Dumper
2.27 Date::Parse
1.00 DirHandle
1.05 Fcntl
2.74 File::Basename
2.09 File::Copy
2.01 FileHandle
1.08 File::Path
0.20 File::Temp
0.90 Filesys::Df
1.35 HTML::Entities
3.56 HTML::Parser
2.37 HTML::TokeParser
1.23 IO
1.14 IO::File
1.13 IO::Pipe
2.02 Mail::Header
1.86 Math::BigInt
0.19 Math::BigRat
3.07 MIME::Base64
5.427 MIME::Decoder
5.427 MIME::Decoder::UU
5.427 MIME::Head
5.427 MIME::Parser
3.07 MIME::QuotedPrint
5.427 MIME::Tools
0.11 Net::CIDR
1.25 Net::IP
0.16 OLE::Storage_Lite
1.04 Pod::Escapes
3.05 Pod::Simple
1.09 POSIX
1.19 Scalar::Util
1.78 Socket
2.16 Storable
1.4 Sys::Hostname::Long
0.18 Sys::Syslog
1.26 Test::Pod
0.7 Test::Simple
1.9707 Time::HiRes
1.02 Time::localtime
Optional module versions are:
1.29 Archive::Tar
0.21 bignum
1.82 Business::ISBN
1.10 Business::ISBN::Data
1.08 Data::Dump
1.814 DB_File
1.13 DBD::SQLite
1.56 DBI
1.15 Digest
1.01 Digest::HMAC
2.36 Digest::MD5
2.11 Digest::SHA1
missing Encode::Detect
0.17008 Error
0.18 ExtUtils::CBuilder
2.18 ExtUtils::ParseXS
2.36 Getopt::Long
0.44 Inline
1.08 IO::String
1.04 IO::Zlib
2.21 IP::Country
missing Mail::ClamAV
3.002005 Mail::SpamAssassin
v2.004 Mail::SPF
1.999001 Mail::SPF::Query
0.2808 Module::Build
0.20 Net::CIDR::Lite
0.63 Net::DNS
0.002.2 Net::DNS::Resolver::Programmable
missing Net::LDAP
4.004 NetAddr::IP
1.94 Parse::RecDescent
missing SAVI
2.64 Test::Harness
0.95 Test::Manifest
1.98 Text::Balanced
1.35 URI
0.7203 version
0.62 YAML
---
/opt/MailScanner/bin# ./MailScanner --lint
Trying to setlogsock(unix)
Checking version numbers...
Version number in MailScanner.conf (4.72.5) is correct.
Your envelope_sender_header in spam.assassin.prefs.conf is correct.
MailScanner setting GID to (112)
MailScanner setting UID to (105)
Checking for SpamAssassin errors (if you use it)...
SpamAssassin temporary working directory is
/var/spool/MailScanner/incoming/SpamAssassin-Temp
SpamAssassin temp dir = /var/spool/MailScanner/incoming/SpamAssassin-Temp
Using SpamAssassin results cache
Connected to SpamAssassin cache database
SpamAssassin reported no errors.
Using locktype = posix
MailScanner.conf says "Virus Scanners = clamd"
Found these virus scanners installed: clamd
===========================================================================
Filename Checks: Windows/DOS Executable (1 eicar.com)
Other Checks: Found 1 problems
Virus and Content Scanning: Starting
Clamd::INFECTED:: Eicar-Test-Signature :: ./1/eicar.com
Virus Scanning: Clamd found 2 infections
Infected message 1 came from 10.1.1.1
Virus Scanning: Found 2 viruses
===========================================================================
Virus Scanner test reports:
Clamd said "eicar.com was infected: Eicar-Test-Signature"
If any of your virus scanners (clamd)
are not listed there, you should check that they are installed correctly
and that MailScanner is finding them correctly via its virus.scanners.conf.
---
When things are running smoothly:
/opt/MailScanner/bin# ./MailScanner --debug
In Debugging mode, not forking...
Trying to setlogsock(unix)
SpamAssassin temp dir = /var/spool/MailScanner/incoming/SpamAssassin-Temp
Building a message batch to scan...
Have a batch of 1 message.
max message size is '200k'
Stopping now as you are debugging me.
With the problem message:
/opt/MailScanner/bin# ./MailScanner --debug |tee ms_debug
In Debugging mode, not forking...
Trying to setlogsock(unix)
SpamAssassin temp dir = /var/spool/MailScanner/incoming/SpamAssassin-Temp
Building a message batch to scan...
Have a batch of 1 message.
max message size is '200k'
---
I can resend the body of one of the problem messages to myself, and then
it sends the MailScanner process to 100% CPU. If I connect with "strace
-p 26089" after the problem starts the only output is:
Process 26089 attached - interrupt to quit
'top' shows:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
26089 postfix 25 0 60508 55m 3116 R 100 1.4 4:07.21 MailScanner
I was able to produce lots of strace output with:
strace -f -o ms_strace /opt/MailScanner/bin/MailScanner
From there I worked out that the problem PID was 27473. Then I grep'ed
all the lines for this process. That data ends with:
27473 _llseek(8, 601088, [601088], SEEK_SET) = 0
27473 write(8,
"\r\0\0\0\1\1\265\0\1\265\1E\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
27473 fsync(8) = 0
27473 close(11) = 0
27473
unlink("/var/spool/MailScanner/incoming/SpamAssassin.cache.db-journal") = 0
27473 fcntl64(8, F_SETLK64, {type=F_RDLCK, whence=SEEK_SET,
start=1073741826, len=510}, 0xbf840964) = 0
27473 fcntl64(8, F_SETLK64, {type=F_UNLCK, whence=SEEK_SET,
start=1073741824, len=2}, 0xbf840964) = 0
27473 fcntl64(8, F_SETLK64, {type=F_UNLCK, whence=SEEK_SET, start=0,
len=0}, 0xbf840964) = 0
27473 gettimeofday({1226800996, 724335}, NULL) = 0
27473 brk(0xab17000) = 0xab17000
I'll send you an URL where you can download the full strace if you like.
Thanks for your help!
--jeff
Glenn Steen wrote:
> 2008/11/14 Jeffrey Haas <jeffrey at life.illinois.edu>:
>> I'm having trouble with a MailScanner child process freezing up. If
>> anyone has any suggestions on how to investigate this, I'd be grateful.
>>
>> On Tuesday, I upgraded a MailScanner/ClamAV installation to the latest
>> versions hoping to get past the recently announced buffer overflow in
>> ClamAV. I installed using the .tgz files for MailScanner &
>> SpamAssassin/ClamAV on the mailscanner.info. I configured MailScanner
>> to use clamd.
>>
>> Everything went well at the time (as it usually does - thanks Julian!).
>> However this morning, I found that MailScanner had stopped processing
>> mail. I was in a bit of a panic at the time, so I simply restarted
>> MailScanner, and some mail started flowing, but then things froze up
>> again.
>>
>> I was thinking that there was a message that was gumming up the works
>> somewhere, so I set in MailScanner.conf:
>>
>> #Max Unscanned Messages Per Scan = 30
>> #Max Unsafe Messages Per Scan = 30
>> Max Unscanned Messages Per Scan = 1
>> Max Unsafe Messages Per Scan = 1
>>
>> That kept the good mail from getting tangled up with the bad.
>>
>> Running 'ps auwx|grep MailScanner', I find:
>>
>> postfix 31809 0.0 0.4 25616 20088 ? Ss 17:59 0:00
>> MailScanner: starting child
>> postfix 31810 95.1 1.1 50884 47328 ? R 17:59 333:13
>> MailScanner: cleaning messages
>> postfix 21305 0.0 1.2 56152 51988 ? S 22:02 0:03
>> MailScanner: waiting for messages
>> ...
>>
>>
>> Process 31810 picks up a message to clean, but can't complete for some
>> reason. Inspecting /var/spool/MailScanner/incoming, I can see the
>> contents of the message.
>>
>> It is a bounce message from a Mailman list which contains an attachment,
>> 'text.zip'. I can run clamscan manually on the files extracted from the
>> message and it reports 'Worm.Mydoom.M FOUND'.
>>
>> I thought perhaps this was an issue with clamd, since that is a bit new
>> to me. (I've used the Mail::ClamAV module for many years.) So, I
>> reconfigured to 'Virus Scanners = clamav' to have MailScanner invoke
>> clamscan. I still get the same behavior with that. One child process
>> grabs the troublesome message, and then stays in the 'cleaning messages'
>> state indefinitely. The CPU utilization is 100% for that process. I've
>> left it running for about 6 hours now, but there's no change. I think
>> my freeze up of the entire server this morning was perhaps, all of the
>> children (5) getting tied up in this way.
>>
>> This is an Ubuntu 7.10 system with postfix 2.4.5 & perl 5.8.8 installed
>> from .deb packages. I think everything else of importance came from the
>> .tgz files.
>>
>> The last messages from process 31810 are:
>>
>> Nov 13 17:59:17 les MailScanner[31810]: MailScanner E-Mail Virus Scanner
>> version 4.72.5 starting...
>> Nov 13 17:59:17 les MailScanner[31810]: SpamAssassin temporary working
>> directory is /var/spool/MailScanner/incoming/SpamAssassin-Temp
>> Nov 13 17:59:17 les MailScanner[31810]: Using SpamAssassin results cache
>> Nov 13 17:59:17 les MailScanner[31810]: Connected to SpamAssassin cache
>> database
>> Nov 13 17:59:17 les MailScanner[31810]: Expired 6 records from the
>> SpamAssassin cache
>> Nov 13 17:59:18 les MailScanner[31810]: Using locktype = flock
>> Nov 13 17:59:18 les MailScanner[31810]: New Batch: Found 40 messages
>> waiting
>> Nov 13 17:59:18 les MailScanner[31810]: New Batch: Scanning 1 messages,
>> 45362 bytes
>> Nov 13 17:59:18 les MailScanner[31810]: Spam Checks: Starting
>> Nov 13 17:59:18 les MailScanner[31810]: Message 2D5154E85A3.D9D39 from
>> 127.0.0.1 (mailman-bounces at life.illinois.edu) is whitelisted
>> Nov 13 17:59:18 les MailScanner[31810]: SpamAssassin cache hit for
>> message 2D5154E85A3.D9D39
>> Nov 13 17:59:18 les MailScanner[31810]: Message 2D5154E85A3.D9D39 from
>> 127.0.0.1 (mailman-bounces at life.illinois.edu) to uiuc.edu is not spam
>> (whitelisted), SpamAssassin (cached, score=-1.44, required 6,
>> autolearn=not spam, ALL_TRUSTED -1.44)
>> Nov 13 17:59:18 les MailScanner[31810]: Filename Checks: Possible MS-Dos
>> program shortcut attack (2D5154E85A3.D9D39 text.htm
>> .pif)
>> Nov 13 17:59:18 les MailScanner[31810]: Filetype Checks: No executables
>> (2D5154E85A3.D9D39 text.htm
>> .pif)
>> Nov 13 17:59:18 les MailScanner[31810]: Other Checks: Found 2 problems
>> Nov 13 17:59:18 les MailScanner[31810]: Virus and Content Scanning:
>> Starting
>> Nov 13 17:59:19 les MailScanner[31810]: ./2D5154E85A3.D9D39.message:
>> Worm.Mydoom.M FOUND
>> Nov 13 17:59:19 les MailScanner[31810]: ./2D5154E85A3.D9D39/text1.zip:
>> Worm.Mydoom.M FOUND
>> Nov 13 17:59:19 les MailScanner[31810]:
>> ./2D5154E85A3.D9D39/text.htm.pif: Worm.Mydoom.M FOUND
>> Nov 13 17:59:19 les MailScanner[31810]: ./2D5154E85A3.D9D39/text.zip:
>> Worm.Mydoom.M FOUND
>> Nov 13 17:59:20 les MailScanner[31810]: Virus Scanning: ClamAV found 4
>> infections
>>
>> Any ideas how to prevent this from happening? Thanks for any suggestions.
>>
> Hi Jeffrey,
>
> If you attach to one of the children "getting stuck" with strace, what
> does it seem to be doing? Further, looking with something simple, like
> top or sar... Is it continually "eating memory" (ie leaking....)?
> Since you know which messages cause this, could you lift a couple from
> the hold queue and either send them to Jules or to me, so that we
> could look at what our systems think of them...?
> My gut feeling is that there is some problem with some perl module,
> but ... that's just a gut reaction:-). What does "MailScanner -v" say?
> Also, simple things like "MailScanner --lint" and definitely
> "MailScanner --debug" could perhaps reveal something interesting:).
>
> As I'm sure you know, we've had one ... instance of children "freezing
> up"/"looping forever" with the milter support (specific to PF, I
> missed a place to handle things:-), but that was fixed a while back
> and shouldn't be affecting things with something this new. Obligatory
> question is "do you run any milters?", more for completeness than
> anything:-).
>
> Cheers
More information about the MailScanner
mailing list