sendmail message splitting defeats bandwidth savings?

Mon Nov 3 15:00:19 GMT 2003

I've had the sendmail message splitting running fine for a while, since it
was the only way to get MailScanner whitelisting to be controlled granularly
(per user instead of per message).

However, I'm concerned that the resulting increase in mail-related bandwidth
consumption is many times greater than the actual savings being gained by
filtering out spam.  (Yes, I realize bandwidth savings aren't the only
reason to filter spam, but they are important.)

I'm unclear on why, technically, it wouldn't make more sense to have MS split
messages itself, and only when needed, instead of using sendmail queue groups
to split every message.  It seems MS is already decoding the messages (MCP,
HTML tag checking, etc.), so the increase in CPU ought to be negligible.  All
messages landing in mqueue should pass through MS, so MS can easily ensure it
doesn't create duplicate queue IDs.

Hopefully I'm just incorrect in my current understanding of what happens
when a message is split by sendmail (please correct me if so), but this is
how I think things change when queue groups are used:

Without queue group message splitting:
1. One message comes in meant for many recipients at the same domain.
2. Sendmail writes one queue file pair.
3. MailScanner scans and re-queues that message.
4. Sendmail delivers the message, sending it ONLY ONCE over the wire to the
next MX.

With queue group message splitting:
1. One message comes in meant for many recipients at the same domain.
2. Sendmail writes many queue file pairs.
3. MailScanner scans and re-queues all of the (now many) messages.
4. Sendmail delivers the messages, one copy per recipient, resulting in the
original message being sent MANY TIMES over the wire to the next MX.
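
To put rough numbers on the difference between the two flows above, here is a
back-of-the-envelope sketch in Python.  It assumes every recipient is served by
the same next-hop MX and ignores SMTP envelope and protocol overhead.  The
function name and the 2 MB / 50 recipient figures are made up for illustration;
they are not measurements from our site.

```python
def bytes_on_wire(message_size, recipients, split_per_recipient):
    """Estimate the bytes sent to the next MX for one incoming message.

    Assumption: without splitting, one copy carries all recipients;
    with splitting, one full copy is sent per recipient.
    """
    copies = recipients if split_per_recipient else 1
    return message_size * copies


# Hypothetical example: a 2 MB message addressed to 50 users at one domain.
size = 2_000_000
unsplit = bytes_on_wire(size, 50, split_per_recipient=False)
split = bytes_on_wire(size, 50, split_per_recipient=True)
print(unsplit, split, split // unsplit)
```

If my understanding is right, the traffic grows linearly with the number of
recipients per message, which is why large mailings to a single remote site
hurt the most.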

The message splitting feature applies to ALL messages, not just spam.  This
means that we may drastically increase our bandwidth usage just by turning
it on, regardless of whether we're doing spam checking.  I've already seen a
few instances where the reason our internal WAN links were pegged for an
hour could be directly traced to this change in delivery architecture.

By contrast, what I'd prefer MS to do is: if a message comes in bound for
multiple recipients and only a few of those recipients should be handled
specially (whitelisted), create separate copies of the message for those
recipients, queuing the files into mqueue by generating its own IDs.
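
A minimal sketch of the grouping I have in mind, written in Python rather than
MailScanner's Perl.  The function plan_copies and the whitelist set are
hypothetical names for illustration, not existing MailScanner code, and the
sketch only decides which recipients share a copy; it does not write queue
files or generate queue IDs.

```python
def plan_copies(recipients, whitelisted):
    """Group recipients into the message copies that would be queued.

    Each whitelisted recipient gets an individual copy so it can be
    handled specially; everyone else shares a single copy.
    Assumption: `whitelisted` holds lowercase addresses.
    """
    special = [r for r in recipients if r.lower() in whitelisted]
    shared = [r for r in recipients if r.lower() not in whitelisted]

    groups = [[r] for r in special]
    if shared:
        groups.append(shared)
    return groups


# Hypothetical addresses: only one of three recipients is whitelisted,
# so two copies are queued instead of three.
copies = plan_copies(
    ["a@example.com", "b@example.com", "c@example.com"],
    {"b@example.com"},
)
print(copies)
```

With no whitelisted recipients this yields a single copy, so ordinary mail
would still cross the wire only once.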

To be fair, I realize this is probably not a big concern for most sites, but
for a site with mail delivery to remote mailbox servers over many expensive
WAN links, this can be a significant problem.

Any suggestions or corrections would be appreciated.