Crash protection
David Lee
t.d.lee at durham.ac.uk
Fri Mar 6 09:57:10 GMT 2009
On Thu, 5 Mar 2009, David Lee wrote:
> On Wed, 4 Mar 2009, Julian Field wrote:
> [...]
>> Please try the attached MessageBatch.pm (which I have compressed, of
>> course).
>> Please let me know if this fixes the problem.
>
> Will do; I have just installed it. (I made sure the inbound queue was empty
> and removed the previous "Processing.db" to give it a clean start.)
> [...]
First, the bad news: it is still occurring, so the patch seems not to have
made any difference.
-----------------------------------------------------------
Tries  Message         Last Tried
=====  =======         ==========
1      n2650oUu021398  Fri Mar 6 05:05:35 2009
1      n2647uja010341  Fri Mar 6 04:12:49 2009
1      n2610rCJ022463  Fri Mar 6 01:05:22 2009
1      n2610rjK022464  Fri Mar 6 01:03:38 2009
1      n25J0ovL023772  Thu Mar 5 19:03:52 2009
1      n25I0msJ026885  Thu Mar 5 18:04:11 2009
1      n25H0sF7025852  Thu Mar 5 17:06:29 2009
1      n25H0oK1025828  Thu Mar 5 17:06:26 2009
1      n25C0uSx007184  Thu Mar 5 12:05:31 2009
1      n25A0bJ6029642  Thu Mar 5 10:05:57 2009
1      n25A0qAP029669  Thu Mar 5 10:05:12 2009
1      n25A0ZJX029632  Thu Mar 5 10:04:27 2009
-----------------------------------------------------------
Now the possibly good news.
Note that the times in both the above set and the previous set are
consistently soon after the hour. Pattern? And when I look in the
logfile for the sendmail id (the "n2..."), their final entries are
followed within one or two seconds by all the MS processes catching a
SIGHUP. More than coincidence?
(The above times are actually "next retry" with a random addition to
time-now; what they actually reflect are last updates to "Processing.db"
from a few minutes earlier.)
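
As a quick check of the "soon after the hour" observation, here is a small
Python sketch (not part of MailScanner) that takes the "Last Tried" values
from the table above and works out how many minutes past the hour each one
falls. The function name is just illustrative.

```python
from datetime import datetime

# "Last Tried" values copied from the table above.
LAST_TRIED = [
    "Fri Mar 6 05:05:35 2009",
    "Fri Mar 6 04:12:49 2009",
    "Fri Mar 6 01:05:22 2009",
    "Fri Mar 6 01:03:38 2009",
    "Thu Mar 5 19:03:52 2009",
    "Thu Mar 5 18:04:11 2009",
    "Thu Mar 5 17:06:29 2009",
    "Thu Mar 5 17:06:26 2009",
    "Thu Mar 5 12:05:31 2009",
    "Thu Mar 5 10:05:57 2009",
    "Thu Mar 5 10:05:12 2009",
    "Thu Mar 5 10:04:27 2009",
]


def minutes_past_hour(stamp):
    """Return how far past the hour a timestamp is, in minutes."""
    t = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
    return t.minute + t.second / 60.0


offsets = [minutes_past_hour(s) for s in LAST_TRIED]

# Print the entries, earliest-in-the-hour first.
for stamp, off in sorted(zip(LAST_TRIED, offsets), key=lambda p: p[1]):
    print(f"{off:5.1f} min past the hour  {stamp}")
```

Every entry lands within the first quarter of an hour, which is what one
would expect if the trigger is an hourly job plus a short random delay.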
We have been running your spear-phishing script. And, of course, this has
an hourly cron-job which ends: "service MailScanner reload". Again, more
than coincidence?
I suspect some sort of interaction. Going into the realms of speculation:
When this new, db-enabled, version of MS has successfully processed any
email it now has to do two things:
1. Deliver it to the next stage, e.g. out-queue (ham); deletion (spam)
2. Remove from "Processing.db"
In all cases these need to happen as a single, atomic action. I suspect
there is at least one outcome (particularly when "spam actions are
delete") in which these two steps happen separately and non-atomically,
with the risk of an MS restart coming between them.
Guess: for a spam-deletion, MS firstly removes the {df,qf} pair from
in-queue but only later gets around to removing it from "Processing.db".
If MS stops (HUP signal, etc.) between them, then stale entries are left
in "Processing.db".
Is there sufficient signal-trapping to keep these things atomic? (There
may be other areas where this might apply.)
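
To illustrate the sort of signal-trapping I mean, here is a minimal sketch.
It is in Python rather than MailScanner's Perl, it is Unix-only, and it is
not taken from the MS source: the names hup_blocked and finish_message are
made up, and a plain set and dict stand in for the in-queue and
"Processing.db". The idea is simply to hold back SIGHUP while the two
steps run, so that a reload cannot land between them.

```python
import signal
from contextlib import contextmanager


@contextmanager
def hup_blocked():
    """Hold back SIGHUP until the enclosed block has finished."""
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGHUP})
    try:
        yield
    finally:
        # Restoring the old mask lets a SIGHUP that arrived in the
        # meantime be delivered now, after both steps are complete.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)


def finish_message(msg_id, in_queue, processing_db):
    """Hypothetical stand-in for the two completion steps."""
    with hup_blocked():
        in_queue.discard(msg_id)         # step 1: message leaves the in-queue
        processing_db.pop(msg_id, None)  # step 2: tracking entry is removed
```

This only defers signals that can be blocked. It would not help against
SIGKILL or a real crash, so it narrows the window rather than making the
pair truly atomic.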
Plausible?
--
: David Lee I.T. Service :
: Senior Systems Programmer Computer Centre :
: UNIX Team Leader Durham University :
: South Road :
: http://www.dur.ac.uk/t.d.lee/ Durham DH1 3LE :
: Phone: +44 191 334 2752 U.K. :