skipped, still being delivered -- proposed solution

Julian Field mailscanner at ecs.soton.ac.uk
Sat Oct 4 00:54:09 IST 2003


Okay, I have spent the last hour in bed staring at the ceiling, failing to
get to sleep, thinking about how this could possibly happen. MailScanner
never directly accessed any file in /etc/postfix.in/incoming, of that I am
certain. So the only way this can be happening is if the file in "incoming"
is linked to the file in "deferred".

This could happen if Wietse designed the message creation the way I now
think he did. He gets the envelope information and the headers from the
SMTP session. He uses that to create the queue message file in "incoming".
Based on the headers and envelope, he works out that the message wants to
go into "deferred", so he links the file into there. Then he reads the
message body from SMTP and adds it onto the end of the queue message file
(which is now in both "incoming" and "deferred"). Then he removes the link
from "incoming", leaving the message only in "deferred".

If that is true, it would also explain why you were first seeing a message
with no body, then the same message again but with the body the second time.

What it boils down to is a simple patch for Postfix.pm, which I would
really appreciate you trying. If it doesn't work, it won't do any harm. It
is attached to this message as I didn't want my mailer wrapping it.

I'm going back to bed now...

At 23:45 03/10/2003, you wrote:
>Also, exactly what version of Postfix are you using?
>And on what OS and version?
>
>I have just tried breaking my DNS server, pumping mail as fast as I can at
>Postfix and MailScanner, and doing lots of
>         lsof | grep /var/spool/postfix | grep MailScanner
>and I can't get it show anything other than /var/spool/postfix.in/deferred.
>Which is what it should be doing.
>Any ideas on how I might reproduce the behaviour of your system?
>
>At 21:15 03/10/2003, you wrote:
>>On Wed, Oct 01, 2003 at 03:22:40PM +0100, Julian Field wrote:
>> > Great :(
>> > In this case, I really don't know what is going on. Postfix generates the
>> > error when it tries to open a message to find it is already locked by
>> > something else. I have been *extremely* careful and paranoid about the
>> > locking and have made sure the file is unlocked before it is put in the
>> > queue. So how this is still happening beats me.
>>
>>Within the past couple days I started seeing this problem as well, and
>>there is one thing I have noticed, yet not seen discussed, namely that
>>MailScanner appears to be opening files in a directory -
>>/var/spool/postfix.in/incoming/ - where AFAICT it should not:
>>
>>====================
>>[root at Gabriel log]# lsof | grep /var/spool/postfix.in/incoming
>>nqmgr     15792 root    8r   DIR       72,2     4096    1954564
>>/var/spool/postfix.in/incoming
>>nqmgr     15792 root   12r   DIR       72,2     4096     521221
>>/var/spool/postfix.in/incoming/6
>>nqmgr     15792 root   13r   DIR       72,2     4096     895861
>>/var/spool/postfix.in/incoming/6/3
>>MailScann 15897 root    5u   REG       72,2     3226    1956954
>>/var/spool/postfix.in/incoming/1/8/18C9A1DDC5A
>>MailScann 15897 root   19u   REG       72,2     8602    1957133
>>/var/spool/postfix.in/incoming/4/3/43B771DDD0D
>>MailScann 15897 root   20u   REG       72,2     8713    1957134
>>/var/spool/postfix.in/incoming/4/7/473F31DDD0E
>>MailScann 15897 root   21u   REG       72,2     8256    1956916
>>/var/spool/postfix.in/incoming/4/1/411D11DDC34
>><snip>
>>MailScann 15963 root   28u   REG       72,2     2687    1956812
>>/var/spool/postfix.in/incoming/1/8/18A571DDBCC
>>cleanup   16029 root   12u   REG       72,2        0    1957467
>>/var/spool/postfix.in/incoming/C/A/CADDD1DDE5B
>>cleanup   16276 root   12u   REG       72,2        0    1957469
>>/var/spool/postfix.in/incoming/A/8/A85B71DDE5D
>>cleanup   16308 root   12u   REG       72,2        0    1957464
>>/var/spool/postfix.in/incoming/9/D/9D1D51DDE58
>>cleanup   16366 root   12u   REG       72,2        0    1957466
>>/var/spool/postfix.in/incoming/7/D/7D3C21DDE5A
>>cleanup   16372 root   12u   REG       72,2        0    1957468
>>/var/spool/postfix.in/incoming/8/6/86CB71DDE5C
>>MailScann 16375 root   14u   REG       72,2     2131    1957194
>>/var/spool/postfix.in/incoming/4/1/41AD51DDD4A
>><snip>
>>====================
>>...For brevity's sake I don't feel it necessary to include the rest of the
>>output, as they're all various queue files being accessed by MailScanner
>>or cleanup, but I certainly can if folks want (I omitted maybe two dozen
>>lines or so).
>>
>>Similarly, I can post my full configs, but I think this is sufficient for
>>now:
>>Incoming Queue Dir = /var/spool/postfix.in/deferred
>>Incoming Work Dir = /var/spool/MailScanner/incoming
>>Outgoing Queue Dir = /var/spool/postfix/incoming
>>
>>While it's obvious that MailScanner should read queue files in
>>/var/spool/postfix.in/DEFERRED/ [my caps] and write them to
>>/var/spool/postfix/incoming/ , I don't think it should be touching
>>anything in /var/spool/postfix.IN/INCOMING/ .  Am I wrong in this?
>>
>>Below are some other pieces of information which may or may not be helpful:
>>
>>1) I'm running RedHat 7.3 on a Compaq DL360 with two 1GHz processors and
>>1GB RAM; I've seen no problems with system load.
>>
>>2) MailScanner version is 4.22-5; Postfix version 1.1.12; SpamAssassin
>>version 2.55.
>>
>>3) I thought I had turned off all defunct DNS-based spam checks, but MS
>>seemed (sorry, no hard numbers) to be taking a long time to scan messages,
>>especially late last night (between roughly midnight and 2am GMT-04, but I
>>was also actively changing configurations and restarting MS & Co., which
>>seemed to cause bursts of duplicate/blank emails and "skipped, still being
>>delivered" messages).  I was unable to get MS to complain of any RBL
>>timeouts (including with Debug = yes in MailScanner.conf), but to make a
>>long story shorter, I turned off all of the following DNS-based spam
>>checks on a hunch (some, again, had already been offf):
>>
>>====================
>>Fri Oct  3 15:56:36 /etc/MailScanner
>>root at Gabriel# ls -l /etc/mail/spamassassin/local.cf
>>-r--r--r--    1 root     root          780 Oct  3 02:13
>>/etc/mail/spamassassin/local.cf
>>
>>Fri Oct  3 15:56:39 /etc/MailScanner
>>root at Gabriel# grep score /etc/mail/spamassassin/local.cf
>>score RCVD_IN_RELAYS_ORDB_ORG 0.0
>>score RCVD_IN_OSIRUSOFT_COM 0.0
>>score X_OSIRU_DUL 0.0
>>score X_OSIRU_DUL_FH 0.0
>>score X_OSIRU_OPEN_RELAY 0.0
>>score X_OSIRU_SPAMWARE_SITE 0.0
>>score X_OSIRU_SPAM_SRC 0.0
>>score RCVD_IN_RFCI 0.0
>>score RCVD_IN_ORBS 0.0
>>score RCVD_IN_DSBL 0.0
>>score RCVD_IN_MULTIHOP_DSBL 0.0
>>score RCVD_IN_SBL 0.0
>>score RCVD_IN_UNCONFIRMED_DSBL 0.0
>>score RCVD_IN_BONDEDSENDER 0.0
>>score RCVD_IN_OPM 0.0
>>score RCVD_IN_NJABL 0.0
>>score X_NJABL_OPEN_PROXY 0.0
>>score X_NJABL_DIALUP 0.0
>>====================
>>
>>4) Very quickly after that, my inbound queue from from 6,000 to "zero" and
>>my "skipped, still being delivered" messages stopped - from this I
>>hypothesize that some DNS-based spam check was timing out (I tested some,
>>but not all), causing MailScanner (and I'm going well beyond my expertise
>>here) to start losing a race condition of some sort with the inbound
>>Postfix process.
>>
>>====================
>>Fri Oct  3 02:16:16 /etc/mail/spamassassin
>>
>>---------->Checking inbound, pre-MS/SA queue<----------
>>
>>root at Gabriel# postqueue -c /etc/postfix.in -p | tail -1
>>-- 74771 Kbytes in 6368 Requests.
>>
>>Fri Oct  3 02:16:32 /etc/mail/spamassassin
>>
>>---------->Checking outbound, post-MS/SA queue<----------
>>
>>root at Gabriel# postqueue -p | tail -1
>>-- 511 Kbytes in 257 Requests.
>>
>>Fri Oct  3 02:16:35 /etc/mail/spamassassin
>>root at Gabriel# postqueue -c /etc/postfix.in -p | tail -1
>>-- 74904 Kbytes in 6384 Requests.
>>
>>Fri Oct  3 02:18:38 /etc/mail/spamassassin
>>root at Gabriel# postqueue -p | tail -1
>>-- 644 Kbytes in 360 Requests.
>>
>>Fri Oct  3 02:18:42 /etc/mail/spamassassin
>>root at Gabriel# postqueue -c /etc/postfix.in -p | tail -1
>>-- 75055 Kbytes in 6402 Requests.
>>
>>Fri Oct  3 02:21:19 /etc/mail/spamassassin
>>root at Gabriel# postqueue -p | tail -1
>>-- 1168 Kbytes in 628 Requests.
>>
>>Fri Oct  3 02:21:25 /etc/mail/spamassassin
>>root at Gabriel# postqueue -c /etc/postfix.in -p | tail -1
>>-- 74947 Kbytes in 6394 Requests.
>>
>>Fri Oct  3 02:22:54 /etc/mail/spamassassin
>>root at Gabriel# postqueue -p | tail -1
>>-- 889 Kbytes in 465 Requests.
>>
>>Fri Oct  3 02:22:56 /etc/mail/spamassassin
>>root at Gabriel# postqueue -c /etc/postfix.in -p | tail -1
>>-- 73887 Kbytes in 6245 Requests.
>>
>>Fri Oct  3 02:26:04 /etc/mail/spamassassin
>>root at Gabriel# postqueue -p | tail -1
>>-- 2131 Kbytes in 665 Requests.
>>
>>Fri Oct  3 02:26:07 /etc/mail/spamassassin
>>root at Gabriel# postqueue -c /etc/postfix.in -p | tail -1
>>-- 71968 Kbytes in 5934 Requests.
>>
>>Fri Oct  3 02:27:12 /etc/mail/spamassassin
>>root at Gabriel# postqueue -p | tail -1
>>-- 3866 Kbytes in 851 Requests.
>>
>>Fri Oct  3 02:27:18 /etc/mail/spamassassin
>>root at Gabriel#
>>
>>Fri Oct  3 02:30:28 /etc/mail/spamassassin
>>root at Gabriel# postqueue -c /etc/postfix.in -p | tail -1
>>-- 68222 Kbytes in 5465 Requests.
>>
>>Fri Oct  3 02:30:37 /etc/mail/spamassassin
>>root at Gabriel# postqueue -p | tail -1
>>-- 7310 Kbytes in 1032 Requests.
>>
>>Fri Oct  3 02:30:43 /etc/mail/spamassassin
>>root at Gabriel#
>>
>>Fri Oct  3 03:46:02 /etc/mail/spamassassin
>>root at Gabriel# postqueue -c /etc/postfix.in -p | tail -1
>>Mail queue is empty
>>
>>Fri Oct  3 03:46:03 /etc/mail/spamassassin
>>root at Gabriel# postqueue -p | tail -1
>>-- 2032 Kbytes in 310 Requests.
>>
>>Fri Oct  3 03:46:06 /etc/mail/spamassassin
>>root at Gabriel#
>>
>>Fri Oct  3 09:26:59 /etc/mail/spamassassin
>>root at Gabriel# postqueue -c /etc/postfix.in -p | tail -1
>>Mail queue is empty
>>
>>Fri Oct  3 09:27:00 /etc/mail/spamassassin
>>root at Gabriel# postqueue -p | tail -1
>>-- 35 Kbytes in 6 Requests.
>>
>>Fri Oct  3 09:27:02 /etc/mail/spamassassin
>>root at Gabriel#
>>
>>Fri Oct  3 11:13:48 /etc/mail/spamassassin
>>root at Gabriel# postqueue -c /etc/postfix.in -p | tail -1
>>Mail queue is empty
>>
>>Fri Oct  3 11:13:49 /etc/mail/spamassassin
>>root at Gabriel# postqueue -p | tail -1
>>-- 35 Kbytes in 6 Requests.
>>Fri Oct  3 11:58:53 /etc/mail/spamassassin
>>root at Gabriel# postqueue -c /etc/postfix.in -p | tail -1
>>Mail queue is empty
>>
>>Fri Oct  3 11:58:54 /etc/mail/spamassassin
>>root at Gabriel# postqueue -p | tail -1
>>-- 39 Kbytes in 7 Requests.
>>
>>Fri Oct  3 11:58:57 /etc/mail/spamassassin
>>
>>Fri Oct  3 16:05:27 /etc/MailScanner
>>root at Gabriel# grep "skipped, still being delivered" /var/log/maillog |
>>tail -2
>>Oct  3 02:51:24 Gabriel postfix/nqmgr[13729]: 095461DE774: skipped, still
>>being delivered
>>Oct  3 02:51:24 Gabriel postfix/nqmgr[13729]: 09AE61DEBC9: skipped, still
>>being delivered
>>====================
>>
>>I still need to finish testing the DNS-based spam checks to find which
>>one(s) were timing out.  My testing and documentation weren't the best on
>>this one, but I apparently did something which coincided with the
>>remission of my immediate problem. :)
>>
>>--Eric
>
>--
>Julian Field
>www.MailScanner.info
>Professional Support Services at www.MailScanner.biz
>MailScanner thanks transtec Computers for their support
>PGP footprint: EE81 D763 3DB0 0BFD E1DC  7222 11F6 5947 1415 B654
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Postfix.pm.patch
Type: application/octet-stream
Size: 1674 bytes
Desc: not available
Url : http://lists.mailscanner.info/pipermail/mailscanner/attachments/20031004/0cb930a0/Postfix.pm.obj
-------------- next part --------------
--
Julian Field
www.MailScanner.info
Professional Support Services at www.MailScanner.biz
MailScanner thanks transtec Computers for their support
PGP footprint: EE81 D763 3DB0 0BFD E1DC  7222 11F6 5947 1415 B654


More information about the MailScanner mailing list