Urgent: MailScanner apparently stopped processing...
Julian Field
mailscanner at ecs.soton.ac.uk
Sat May 17 17:51:44 IST 2003
Here is an excerpt from the ChangeLog for 4.21:
- Postfix support now has extra permissions parameter on "mkdir" calls,
solving a syntax error on some versions of Perl.
- Postfix support now won't abandon a message because it could not get
the SMTP client IP address out of it. Will insert 0.0.0.0 if no IP
address could be found.
- Postfix will always pick up IP address of locally-generated mail.
- Postfix detects hash directory depth more cleanly.
- Postfix handles queue files which are still being written.
- Postfix bug fixed when processing messages with no body.
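
To illustrate the 0.0.0.0 fallback mentioned in the ChangeLog, here is a minimal sketch. It is not the code that ships in 4.21: the helper name client_ip_or_default is made up, and it assumes the client address, when present, appears as a dotted-quad somewhere in a text field handed to it.

```perl
use strict;
use warnings;

# Hypothetical helper, not MailScanner's actual code: return the first
# valid-looking dotted-quad IPv4 address found in $text, or '0.0.0.0'
# if none can be found, so the caller never has to abandon the message.
sub client_ip_or_default {
    my ($text) = @_;
    return '0.0.0.0' unless defined $text;

    while ($text =~ /\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\b/g) {
        my @octets = ($1, $2, $3, $4);
        # Skip candidates with an out-of-range octet and keep looking
        next if grep { $_ > 255 } @octets;
        return join '.', @octets;
    }

    return '0.0.0.0';    # nothing usable found
}

print client_ip_or_default('connect from unknown[192.0.2.10]'), "\n";
print client_ip_or_default('no address in this field'), "\n";
```

The point of the design is only that a missing address becomes a placeholder value rather than a reason to reject the queue file.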
At 16:13 17/05/2003, you wrote:
>I am having a similar problem. I installed MailScanner-4.20-3 with
>Mail-SpamAssassin-2.53 on RedHat 8.0. Already using Postfix 1.1.11 for about
>2 months. Postfix is fabulous. Installing and configuring MailScanner was a
>breeze. ;-) It ran great for about 48 hours. Then yesterday mail stopped
>moving. The two Postfix processes (in and out) were on and seemed fine. The
>MailScanner was down to 2 processes when I had allowed 10 threads for the 2
>CPU system (PIII 933mhz 2gb mem). The "dying of old age" messages had
>stopped appearing in the log. I killed everything and started again. Mail
>moved a little, but I noticed the:
>
>"Batch: Found invalid queue file for message..."
>
>repeating quite often. I moved the 20 or so offending files to another path
>and started again. After a while I had more of the same kind of offending
>files. Finally I couldn't afford to have the mail failing, so I took
>MailScanner out of the loop. MailScanner looks like a great product, but I
>can't afford this kind of a problem. The CEO came in and pointed out the
>importance he places on reliable email (it had only been down 4 hours).
>Unfortunately, he noticed the outage before I did.
>
>This problem seems to involve the defer/deferred mail not being cleaned up
>properly.
>
>
>On Fri, 9 May 2003 18:49:47 -0300, Mariano Absatz <mailscanner at LISTS.COM.AR>
>wrote:
> >On 9 May 2003 at 21:54, Julian Field wrote:
> >
> >> I agree it would stay in the queue and, due to the sorting, would always
> >> appear as the first message in the batch. But why would it jam anything?
> >> It would get found and logged at the start of each batch, but any other
> >> messages that later appeared would still be added to the batch.
> >> So it would cause a log warning at the start of each batch, but what harm
> >> would it do otherwise?
> >This was my first thought and that's why I didn't tell you the first time...
> >but Leo thinks something is going wrong there... I'll ask him on Monday to do
> >a bunch of tests (actually, the only time he saw it happen was by actually
> >feeding by hand a manually constructed queue file that contained a typo).
> >
> >> I could add it to a hash of known bad messages if you like, so that it
> >> ignored that message id in subsequent queue scans. But I don't see how the
> >> current system actually breaks.
> >I'd rather quarantine the message at the end of ReadQf() (before the
> >return 0)...
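
A rough sketch of the "hash of known bad messages" idea, combined with the suggestion further down the thread to distinguish a locked queue file from an ill-formed one. This is illustrative only and is not MailScanner's CreateBatch(): the names build_batch and %known_bad, the $parse callback, and its ('message', 'error') return convention are all assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical sketch: ids whose queue file failed to parse are remembered
# here so that later queue scans ignore them instead of retrying forever.
my %known_bad;

# $sorted_ids : array ref of message ids, oldest first
# $parse      : code ref standing in for message construction; returns
#               ($message, undef) on success, or (undef, 'locked') /
#               (undef, 'invalid') on failure (an assumed convention)
# $max        : maximum number of messages in one batch
sub build_batch {
    my ($sorted_ids, $parse, $max) = @_;
    my @batch;

    for my $id (@$sorted_ids) {
        last if @batch >= $max;
        next if $known_bad{$id};    # already reported once; skip quietly

        my ($message, $error) = $parse->($id);
        if (defined $message) {
            push @batch, $message;
        }
        elsif (defined $error && $error eq 'invalid') {
            $known_bad{$id} = 1;    # log once, then ignore in later scans
            warn "Batch: Found invalid queue file for message $id\n";
        }
        # 'locked' is left alone: another process is handling that file
    }

    return @batch;
}
```

With this shape, a corrupt file at the head of the sorted list costs one log line and is then skipped, while a merely locked file is retried on the next scan.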
> >
> >>
> >> At 20:05 09/05/2003, you wrote:
> >> >Julian,
> >> >
> >> >Leo Helman (the guy who actually wrote most of ZMailer support) spotted this
> >> >one a few days ago and I thought it was just inelegant, but it might indeed
> >> >be a bug... if it is so, it affects _all_ versions (sendmail, exim, zmailer &
> >> >postfix)... maybe it showed up because of some problem in the postfix queue
> >> >file parser, but it is there anyway.
> >> >
> >> >Leo says that within MailScanner::Sendmail::CreateBatch() you have the
> >> >following code excerpt:
> >> >
> >> > $batchempty = 1;
> >> >
> >> > while(($file = shift @SortedFiles) &&
> >> > $HitLimit1+$HitLimit2+$HitLimit3+$HitLimit4<1) {
> >> > .... .... ....
> >> > .... .... ....
> >> > $newmessage = MailScanner::Message->new($id, $queuedirname);
> >> > next unless $newmessage;
> >> > .... .... ....
> >> > .... .... ....
> >> > }
> >> >
> >> > # Wait a bit until I check the queue again
> >> > sleep(MailScanner::Config::Value('queuescaninterval')) if $batchempty;
> >> > } while $batchempty; # Keep trying until we get something
> >> >
> >> >Now, $newmessage is false when a lock fails or when there was an error parsing
> >> >the envelope (e.g. missing envelope from, to or origin).
> >> >
> >> >If the lock failed, that is because another MailScanner locked it, and by the
> >> >next round of the loop or so the file will probably not be there, because the
> >> >other MailScanner that had it locked processed it and removed it from the
> >> >queue.
> >> >
> >> >But, if the envelope was corrupt, the file stays in the queue forever, and as
> >> >$batchempty is not modified, it never quits the loop (the $HitLimitX always
> >> >stay 0).
> >> >
> >> >At first I thought that the only problem would be that the queue file would
> >> >stay there forever (or until an operator read the log message produced within
> >> >MailScanner::Sendmail::ReadQf() (smtp MailScanner[xxxx]: Batch: Found invalid
> >> >queue file for message xxxxxx) and manually removed it from the queue)...
> >> >
> >> >In fact, I dismissed a message I was writing to you about this when I thought
> >> >that... now that Leo has read this thread and recalled our dialog back then, I
> >> >re-read it and noticed that, as we always sort the queue files by age, this
> >> >corrupt file will _always_ be the first to be processed and, hence, would
> >> >jam the queue...
> >> >
> >> >I think we should differentiate the way ReadQf() fails if the queue file is
> >> >locked or if it is ill-formed... or change the while() condition...
> >> >
> >> >
> >> >
> >> >On 9 May 2003 at 8:46, Julian Field wrote:
> >> >
> >> > > At 19:25 08/05/2003, you wrote:
> >> > > >I'm hoping someone can shed some light on this one - recently I had
> >> > > >MailScanner, which I implemented on RedHat 8 w/Postfix just yesterday,
> >> > > >abruptly stop processing mail.
> >> > > >
> >> > > >I only happened to notice as the only indication was that no mail was
> >> > > >passing through to my internal mail/pop servers, etc.
> >> > > >
> >> > > >When I checked the maillog I found only entries from the postfix daemon
> >> > > >that receives incoming mail, nothing from MailScanner or the postfix daemon
> >> > > >that then delivers what MailScanner gives it. All processes, including the
> >> > > >MailScanner processes, were running - in fact, MailScanner was using a
> >> > > >majority of CPU time. I tried manually starting up MailScanner and found
> >> > > >that "MailScanner starting" and "xxx messages found to be scanned"
> >> > > >did show up in the maillog; however, nothing else changed and mail did
> >> > > >not start to flow.
> >> > > >
> >> > > >I finally restarted the server and then everything started to move.
> >> > >
> >> > > But was it scanning after you restarted?
> >> > >
> >> > > Have you used redhat-switchmail-nox to set which email system RedHat thinks
> >> > > it is trying to run?
> >> > >
> >> > > >So, based on this I have a few questions:
> >> > > >
> >> > > >1. Any ideas why this happened and how I can prevent it? Also, does
> >> > > >anyone have any scripts out there that detect this kind of thing and then
> >> > > >cleanly shut down MailScanner and restart it?
> >> > > >
> >> > > >2. I realized I don't even know how to cleanly shut down MailScanner
> >> > > >manually. This may seem a stupid question but if someone could answer it
> >> > > >that would be great.
> >> > >
> >> > > service MailScanner stop
> >> > >
> >> > > You can do "service MailScanner" to get a list of the command options you
> >> > > can give it.
> >> > > Does "service MailScanner start" work cleanly, or does it output any errors?
> >> > >
> >> > > >4. I have an error message repeatedly showing up in the maillog that I
> >> > > >have been unable to discover the cause of. It is:
> >> > > >smtp MailScanner[xxxx]: Batch: Found invalid queue file for message xxxxxx
> >> > >
> >> > > For some reason it thinks one of your incoming queue files is corrupt. It
> >> > > needs to be able to find the sender and recipient addresses, and the last
> >> > > hop IP address, in the file it lifts from the queue.
> >> > >
> >> > > Can you send me one of the files from /var/spool/postfix.in/deferred that
> >> > > exhibits this problem?
> >> > > Then I can improve the Postfix parser to stop it happening again.
> >> > > --
> >> > > Julian Field
> >> > > www.MailScanner.info
> >> > > MailScanner thanks transtec Computers for their support
> >> >
> >> >
> >> >--
> >> >Mariano Absatz
> >> >El Baby
> >> >----------------------------------------------------------
> >> >I am not afraid of death, I just don't want to
> >> >be there when it happens.
> >> > -- Woody Allen
> >>
> >> --
> >> Julian Field
> >> www.MailScanner.info
> >> Professional Support Services at www.MailScanner.biz
> >> MailScanner thanks transtec Computers for their support
> >
> >
> >--
> >Mariano Absatz
> >El Baby
> >----------------------------------------------------------
> >Lottery: A tax on people who are bad at math.
--
Julian Field
www.MailScanner.info
Professional Support Services at www.MailScanner.biz
MailScanner thanks transtec Computers for their support