MailScanner/Postfix message duplication - possible fix

Kash, Howard (Civ,ARL/CISD) hmkash at ARL.ARMY.MIL
Tue Dec 16 19:55:21 GMT 2003


I just got nailed with a few more duplicates, and it was up to 40
seconds between the postfix "(deferred transport)" log entry and the
"skipped, still being delivered" log entry.  So I'm going to change the
waiting period on my system to 60 seconds (time + 940) and see how it
goes.


Howard



-----Original Message-----
From: Kash, Howard (Civ,ARL/CISD) 
Sent: Tuesday, December 16, 2003 9:42 AM
To: MAILSCANNER at JISCMAIL.AC.UK
Subject: MailScanner/Postfix message duplication - possible fix


Here is a solution Julian proposed to the postfix/still being
delivered/duplicate message problem back in September.  Based on my
analysis of the Postfix code and logs from actual occurrences of the
bug, I think this is on the right track.  However, Postfix postdates
messages that it moves into the deferred queue by 1000 seconds
(minimal_backoff_time default value).  My version of this patch is:

    next if ($ModDate{$file} + 10) > (time + 1000);

or more efficiently:

    next if $ModDate{$file} > (time + 990);

This accounts for the 1000 second postdate period and adds 10 seconds to
get around the apparent race condition.  In every occurrence that I've
seen of the bug, MailScanner starts its scan just as a message is being
processed (moved into the deferred queue) by Postfix.  I think there is
a brief instant when Postfix does not have a lock on the file and
MailScanner picks it up (and locks it).  Then Postfix tries to lock the
file.  Seeing that it is already locked, it generates the "skipped,
still being delivered" message and backs off for 60 seconds (see
nqmgr/qmgr_active.c:qmgr_active_feed()) and then re-queues the message
again.

You will need to adjust the 1000 second value if you have changed the
default Postfix setting for minimal_backoff_time.  You may also want to
play around with the 10 second delay if it's too long or too short.
Since the bug is very difficult to reproduce and occurs so infrequently,
it's hard to say yet if this is actually working.  If others could try
it out and let the list know if it seems to be working for them, maybe
Julian can add it to the next release.  The only side effect of adding
this line will be a 10 second delay in mail delivery.
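
For anyone who wants to tune the two numbers separately, here is a
minimal sketch of the same comparison with both values pulled out.  The
variable and sub names are made up for illustration and are not part of
Postfix.pm; only the comparison itself comes from the one-line patch
above.

```perl
use strict;
use warnings;

# Hypothetical names, for illustration only (not from Postfix.pm).
my $minimal_backoff_time = 1000;  # seconds; should match Postfix's minimal_backoff_time
my $settle_delay         = 10;    # seconds to wait out the apparent race

# True if a queue file whose modification time is $mtime should be
# skipped on this pass, given the current time $now.
# Same test as: next if $ModDate{$file} > (time + 990);
sub too_recent {
    my ($mtime, $now) = @_;
    return $mtime > $now + $minimal_backoff_time - $settle_delay;
}

my $now = time;
# A file postdated by 1000 seconds just now is skipped.
print too_recent($now + 1000, $now) ? "skip\n" : "scan\n";
# The same file seen 11 seconds later is scanned.
print too_recent($now + 1000, $now + 11) ? "skip\n" : "scan\n";
```

With $settle_delay set to 60 the threshold becomes time + 940.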



Howard


-----Original Message-----
From: Julian Field [mailto:mailscanner at ECS.SOTON.AC.UK] 
Sent: Thursday, September 04, 2003 6:45 AM
To: MAILSCANNER at JISCMAIL.AC.UK
Subject: Re: MailScanner+PostFix ---- try this


Here's a patch to Postfix.pm. I know it's not exactly a neat solution to
the problem, but if it fixes it I will know I have found the problem.

--- Postfix.pm.old      2003-09-01 12:28:21.000000000 +0100
+++ Postfix.pm  2003-09-04 11:49:17.000000000 +0100
@@ -1132,6 +1132,9 @@
        #print STDERR "Files are " . join(', ', @SortedFiles) . "\n";
        while(defined($file = shift @SortedFiles) &&
              $HitLimit1+$HitLimit2+$HitLimit3+$HitLimit4<1) {
+        # Yes I know this is a hack but it will help isolate the problem
+        next if $ModDate{$file} > time-3;
+
          # must separate next two lines or $1 gets re-tainted by being part of
          # same expression as $file [mumble mumble grrr mumble mumble]
          #print STDERR "Reading file $file from list\n";



