An express checkout? [was: Re: Postfix and Mailscanner sitting in a tree k-iss-ing]

paddy paddy at PANICI.NET
Fri Dec 31 19:55:40 GMT 2004


On Fri, Dec 31, 2004 at 06:55:32PM +0000, Julian Field wrote:
> paddy wrote:
>
> >On Fri, Dec 31, 2004 at 05:33:36PM +0000, Julian Field wrote:
> >
> >>paddy wrote:
> >>
> >>>On Thu, Dec 30, 2004 at 06:00:20PM +0000, Julian Field wrote:
> >>>
> >>>>Incidentally, something in the same vein has already been done for
> >>>>Communigate Pro, but I have never looked at that. I suspect (though
> >>>>without evidence either way) that it is not approx. 100% robust in the
> >>>>face of a concerted DoS attack. I go to some lengths to try to ensure
> >>>>that, when under attack, the MTA will give out long before MailScanner
> >>>>does.
> >>>>
> >>>Looking at CriticalQueueSize I wonder if there might be some optimisation
> >>>possible that prefers to process mail that is more likely to be ham by
> >>>some simple criteria, a sort of express checkout.
> >>>
> >>>Possible devices to avoid starvation on the main queue could include
> >>>running a slow queue alongside the fast queue, so that the system is
> >>>always progressing, and/or an elevator algorithm.
> >>>
> >>Either you appear to have changed the subject without changing the
> >>Subject: or else I don't understand the link between the quoted text and
> >>your reply...
> >>
> >
> >Okay, I confess it was a bit of a leap ... :)
> >
> >I started from 'in the face of a concerted DoS attack'.
> >
> >I've seen peaks of up to 500+ messages queued, which can take some time
> >(on a tiny, underpowered, overloaded system) to clear up.  At times like
> >that I find myself doing stuff like
> >
> >tail -f maillog | grep delay | sed 's/\(...............\).*\]: \([^:]*\):.* delay=\([^,]*\),.*/\1 \2 \3/'
> >
> >and trying not to worry about it!
> >
> >(or maybe that should be xdelay, I don't know!)
> >
> >But it's not a feature request, more of a 'what do you think?'
> >
> >So, yes, this no longer bears any relation to the previous Subject, sorry!
> >
> >
> What does your sed command actually do?

It pulls three fields from a sendmail syslog entry.  That gives me something
I imagine to be an indicator of how long mail has languished in the queue.
Output looks like:

Dec 31 17:29:08 iBVHSYR02477 00:00:00
Dec 31 17:32:27 iBVHVSR02862 00:00:00
Dec 31 17:33:45 iBVHX6R02977 00:00:00
Dec 31 17:34:20 iBVHXhR03023 00:00:00

On a good day :)
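For anyone who would rather not decode the sed, here is a rough Python equivalent of the same pipeline, offered as a sketch only. It assumes sendmail delivery lines of the usual "timestamp host program[pid]: queueid: ... delay=HH:MM:SS, ..." shape and, like the sed, picks out the plain delay= field rather than xdelay=. Unlike the sed, it also drops non-matching lines, so it folds in the grep. The module name delays.py is made up for the example.

```python
import re

# Mirrors the sed expression: the first 15 characters are the syslog timestamp,
# the token after "]: " up to the next ":" is the sendmail queue ID, and the
# " delay=" field runs up to the next comma (" delay=" does not match "xdelay=").
DELAY_RE = re.compile(r"^(.{15}).*\]: ([^:]*):.* delay=([^,]*),")


def summarise(line):
    """Return 'timestamp queue_id delay' for a delivery line, or None."""
    m = DELAY_RE.match(line)
    if m is None:
        return None
    return " ".join(m.groups())


def main(lines):
    """Print one summary per matching line, flushing so `tail -f` output appears promptly."""
    for line in lines:
        out = summarise(line)
        if out is not None:
            print(out, flush=True)
```

Saved as delays.py, it could be driven with something like: tail -f maillog | python3 -c 'import delays, sys; delays.main(sys.stdin)'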

> And what would your quick "likely to be ham" test do?

<more hand-waving ahead!>

Perhaps use information in the qf file to match against some 'not-quite-white'
list, generated perhaps in a similar fashion to the WhiteListFutureSender stuff
that I posted here (i.e. there is a history of transactions in the opposite
direction), for example. I realise reading the qf file is a relatively
expensive operation, which is the last thing you need if you're optimising
solely for load, but the objective would be to buy lower latency at that cost.
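A minimal sketch of the 'not-quite-white' check, assuming all it needs is the envelope sender and recipient (however those are pulled out of the qf file, which is not shown here). It remembers which remote addresses local users have sent mail to, i.e. a history of transactions in the opposite direction, and flags inbound mail from those correspondents as likely ham. ReverseHistory, record_outbound, likely_ham and the 90-day expiry are hypothetical names and values chosen for illustration, not part of MailScanner.

```python
import time


class ReverseHistory:
    """Hypothetical 'not-quite-white' list built from outbound mail history.

    A hit is only meant as a queue-ordering hint: the message still gets
    scanned in full, just sooner.
    """

    def __init__(self, max_age_seconds=90 * 24 * 3600):
        self.max_age = max_age_seconds
        # (local_address, remote_address) -> time of the last outbound message
        self._seen = {}

    @staticmethod
    def _norm(addr):
        # Strip whitespace and angle brackets, compare case-insensitively.
        return addr.strip().strip("<>").lower()

    def record_outbound(self, local_sender, remote_recipient, when=None):
        key = (self._norm(local_sender), self._norm(remote_recipient))
        self._seen[key] = time.time() if when is None else when

    def likely_ham(self, remote_sender, local_recipient, now=None):
        # Inbound mail is the reverse direction of a recorded outbound pair.
        key = (self._norm(local_recipient), self._norm(remote_sender))
        last = self._seen.get(key)
        if last is None:
            return False
        now = time.time() if now is None else now
        return now - last <= self.max_age
```

Because envelope senders are trivially forged, a hit here should only move a message up the queue, never skip scanning.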

I don't know really, maybe it's just a harebrained idea. :)

It makes me think, though. My system is almost serialised: there is a very
low limit on the number of simultaneous mailscanner children (because at
some point I thought they were a source of problems; we have limited VM to
work with), but this idea mirrors the kind of scheduling that goes on
inside a mailscanner batch, which has more parallelism.

Rather desperately perhaps, I find the thread of my thought returns to
the postfix conversation:

Imagine for a moment that postfix can take delivery of mail and pass it
on to mailscanner without it paging out to disk, in an operation that
optimises away to a little page-table shuffling (which is going to happen
anyway). I don't know if this optimistic view is possible, but anyway ...
MailScanner would have a chance to do, or kick off, some operations before
the mail pages out to disk. Would this ever improve throughput in a
high-load situation? I really don't know; I can't immediately see how it
would, but perhaps it could reduce the cost of any 'express checkout'.

(for that matter, if mailscanner reads the files as they appear in the
queue, the pages may still be in RAM anyway)

<watch out soapbox ahead!>

Until recently I thought of whitelisting as a rather naive approach to
the problem (what about the unsolicited mail that you want, for example?).
But I've come round to the idea that relatively good mail is often easy to
identify, and that the ability to optimise for latency, to let the mice
through quickly, so to speak, is a useful strategy. Faced with two tasks
of equal priority, one taking five minutes and the other five hours, with
no deadline trouble, I tend to do the short one first. Similarly, I would
do a higher-value task first. I don't really care if spam is badly delayed,
especially within some reasonable limit, and mailscanner already solves
this problem excellently. An express checkout could mean a better mix of
QoS against spamminess in a high-load situation. Then again, I could easily
be missing the bit where this is all done already!
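To make the express checkout idea concrete, here is a sketch of the 'slow queue alongside the fast queue' variant (not the elevator variant). ExpressCheckout, main_every and the message IDs are hypothetical; nothing here is an existing MailScanner setting or hook. Batches are filled from the express lane, but every main_every-th slot goes to the main lane whenever it has mail waiting, so spam is delayed but the main queue never starves.

```python
from collections import deque


class ExpressCheckout:
    """Hypothetical two-lane queue for ordering messages into scan batches.

    'Likely ham' goes to the express lane, everything else to the main lane.
    Every `main_every`-th slot in a batch goes to the main lane if it has
    anything waiting; other slots prefer the express lane.
    """

    def __init__(self, main_every=3):
        if main_every < 1:
            raise ValueError("main_every must be >= 1")
        self.main_every = main_every
        self.express = deque()
        self.main = deque()

    def add(self, msg_id, likely_ham):
        (self.express if likely_ham else self.main).append(msg_id)

    def next_batch(self, size):
        batch = []
        slot = 0
        while len(batch) < size and (self.express or self.main):
            slot += 1
            reserved_for_main = slot % self.main_every == 0
            # Take from the main lane on its reserved slots, or whenever
            # the express lane has run dry; otherwise take express mail.
            if self.main and (reserved_for_main or not self.express):
                batch.append(self.main.popleft())
            else:
                batch.append(self.express.popleft())
        return batch
```

With main_every=3, while both lanes have mail waiting, two of every three slots go to the express lane; the whole latency trade-off sits in that one number.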

Regards,
Paddy
--
Perl 6 will give you the big knob. -- Larry Wall

------------------------ MailScanner list ------------------------
To unsubscribe, email jiscmail at jiscmail.ac.uk with the words:
'leave mailscanner' in the body of the email.
Before posting, read the MAQ (http://www.mailscanner.biz/maq/) and
the archives (http://www.jiscmail.ac.uk/lists/mailscanner.html).

Support MailScanner development - buy the book off the website!