Max SpamAssassin Size problems

Logan Shaw lshaw at emitinc.com
Thu Aug 24 18:23:06 IST 2006


On Thu, 24 Aug 2006, Julian Field wrote:
> Anthony Peacock wrote:
>> Julian Field wrote:

>>> Sounds survivable. After the limit I will keep going until I hit the
>>> first line that only contains white space.

>> I have been watching this discussion with a growing uneasiness.  I
>> could be wrong but doesn't this behaviour open up the system to
>> problems with huge image files...

> Yes, you are absolutely correct. Non-spam may well include huge images.
> The problem with rewinding to the previous boundary is that you may end
> up not giving SpamAssassin _anything_ to work with.
>
> So it's up for a vote:
>
> do I chop half way through an image?
> do I chop at the end of an image?
> do I carry on for a max of 100 lines of Base64 data or until the end of
> an image, whichever is earlier?

I don't like the last option at all.  It still easily allows
a situation where a valid message with a valid image in it
gets detected as a corrupt image and hits a rule that scores
it as spam.

If we assume there are 80 columns of base64 data per line, then
we get 60 bytes per line (since each base64 character carries
6 bits of data, and 80 x 6 = 480 bits).  That means 100 lines
hold only 6K, maximum.
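
As a quick check of that arithmetic, here is a minimal Python sketch.
The function name is made up for illustration, and the 76-column case
is added only for comparison, since MIME limits encoded lines to 76
characters; the 80-column figure is the assumption used above.

```python
def base64_payload_bytes(columns, lines=1):
    """Decoded bytes carried by `lines` full lines of `columns` base64 chars.

    Each base64 character encodes 6 bits, so the payload is
    columns * 6 bits per line, divided by 8 to get bytes.
    """
    return columns * 6 * lines // 8


print(base64_payload_bytes(80))       # 60 bytes in one 80-column line
print(base64_payload_bytes(80, 100))  # 6000 bytes in 100 such lines
print(base64_payload_bytes(76, 100))  # 5700 bytes with 76-column lines
```

Either way, 100 lines buy roughly 6K of image data and no more.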

So this option only works if the chop-off point randomly
happens to fall within the last 6K (or less) of the image.
If the max message size causes the initial chop-off point to
fall any earlier, it still creates an invalid image.  If you
have a 50K max message size and someone sends a 75K image
(which is not out of the ordinary at all), this method will
keep going up to 56K and then quit.
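
The scenario can be sketched in a few lines of Python.  This is only
an illustration under simplifying assumptions, not MailScanner's
actual logic: the message is treated as consisting of nothing but the
image, all sizes are in decoded bytes, K is taken as 1000, and the
function and parameter names are hypothetical.

```python
def stop_point(max_size, image_size, extra_lines=100, bytes_per_line=60):
    """Return (bytes_scanned, image_complete) for the "100 extra lines" option.

    Scanning runs to max_size, then continues for at most extra_lines
    more lines of base64 or until the image ends, whichever comes first.
    """
    if image_size <= max_size:
        # The whole image fits under the size limit; nothing is chopped.
        return image_size, True
    limit = max_size + extra_lines * bytes_per_line
    if image_size <= limit:
        # The image ends within the extra lines, so it survives intact.
        return image_size, True
    # The image runs past the extra lines and is cut off mid-stream.
    return limit, False


K = 1000
print(stop_point(50 * K, 75 * K))  # (56000, False): truncated at 56K
print(stop_point(50 * K, 55 * K))  # (55000, True): ends inside the extra 6K
```

Only images that end within 6K of the limit come through whole.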

Basically, adding the 100 extra lines is really not much better
than chopping right at the max message size barrier, unless
you assume that most images aren't much larger than 6K, which
I don't think is a valid assumption at all.  So, this option
adds extra complexity and doesn't really give much benefit.

   - Logan

