Max SpamAssassin Size problems

Ken A ka at
Fri Aug 25 23:50:07 IST 2006

Glenn Steen wrote:
> On 25/08/06, Anthony Peacock <a.peacock at> wrote:
>> Ken A wrote:
>> >
>> >
>> > Logan Shaw wrote:
>> >> On Thu, 24 Aug 2006, Julian Field wrote:
>> >>> Anthony Peacock wrote:
>> >>>> Julian Field wrote:
>> >>
>> >>>>> Sounds survivable. After the limit I will keep going until I hit the
>> >>>>> first line that only contains white space.
>> >>
>> >>>> I have been watching this discussion with a growing uneasiness.  I
>> >>>> could be wrong but doesn't this behaviour open up the system to
>> >>>> problems with huge image files...
>> >>
>> >>> Yes, you are absolutely correct. Non-spam may well include huge images.
>> >>> The problem with rewinding to the previous boundary is that you may end
>> >>> up not giving SpamAssassin _anything_ to work with.
>> >>>
>> >>> So it's up for a vote:
>> >>>
>> >>> do I chop half way through an image?
>> >>> do I chop at the end of an image?
>> >>> do I carry on for a max of 100 lines of Base64 data or until the end of
>> >>> an image, whichever is earlier?
>> >>
>> >> I don't like the last option at all.  It still easily allows
>> >> a situation where a valid message with a valid image in it
>> >> gets detected as a corrupt image and hits a rule that scores
>> >> it as spam.
>> >>
>> >> If we assume there are 80 columns of base64 data per line, then
>> >> we get 60 bytes per line (since each base64 character carries
>> >> 6 bits of data).  That means 100 lines only holds 6K, maximum.
>> >>
>> >> So this option only works if the chop-off point randomly
>> >> happens to fall within the last 6K (or less) of the image.
>> >> If the max message size causes the initial chop-off point to
>> >> fall any earlier, it still creates an invalid image.  If you
>> >> have a 50K max message size and someone sends a 75K image
>> >> (which is not out of the ordinary at all), this method will
>> >> keep going up to 56K and then quit.
>> >>
>> >> Basically, adding the 100 extra lines is really not much better
>> >> than chopping right at the max message size barrier, unless
>> >> you assume that most images aren't much larger than 6K, which
>> >> I don't think is a valid assumption at all.  So, this option
>> >> adds extra complexity and doesn't really give much benefit.
>> >>
>> >>   - Logan
>> >
>> > I'm all for #3, and just set "score FUZZY_OCR_CORRUPT_IMG 0" if you
>> > are worried about false positives. FuzzyOcr will get better at sorting
>> > this out. And of course in the meantime, don't use Outlook, since it
>> > will probably render corrupt images just fine. (it's a feature)
>> This could be controversial here...
>> <Evil Grin>
>> I have another suggestion, why don't we agree to leave the MailScanner
>> code alone.  Those people who are experiencing problems with broken
>> images can raise the value of "Max SpamAssassin Size" in *THEIR*
>> configurations, the rest of us can carry on as normal.
>> There is already a way for people to adjust how much information SA gets
>> from MailScanner, people who need more information can use that on
>> their systems.
>> </Evil Grin>

Spammers do like to use broken gif images, and MailScanner should not 
default to looking like a spammer to an SA plugin.

Maybe set "Max SpamAssassin Size" to a larger value and roll back to the 
previous MIME boundary if MailScanner would otherwise be truncating an 
image?

Or, would it be possible to skip the MIME part if it was over a certain 
size, and continue with the rest of the message as if that part didn't 
exist?
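
To make the second idea concrete, here is a minimal sketch in Python using the standard `email` package. It is only an illustration of skipping oversized MIME parts, not MailScanner's actual code (MailScanner is written in Perl), and the names `prune_oversized_parts`, `message_for_spam_check` and `max_part_chars` are made up for this example. The size check is applied to the encoded payload text of each part, which is an assumption about where the limit would be measured.

```python
from email import message_from_string
from email.message import Message


def prune_oversized_parts(msg: Message, max_part_chars: int) -> Message:
    """Drop leaf MIME parts whose encoded payload is longer than the limit.

    Hypothetical helper: the limit is measured on the encoded payload
    (e.g. the base64 text), not on the decoded attachment.
    """
    if not msg.is_multipart():
        return msg  # a single-part message is left untouched
    kept = []
    for part in msg.get_payload():
        if part.is_multipart():
            # Recurse into nested multipart containers.
            kept.append(prune_oversized_parts(part, max_part_chars))
        elif len(part.get_payload()) <= max_part_chars:
            kept.append(part)
        # Oversized leaf parts are skipped entirely, never truncated.
    msg.set_payload(kept)
    return msg


def message_for_spam_check(raw: str, max_part_chars: int) -> Message:
    """Parse a raw message and return the copy to hand to the spam checker."""
    return prune_oversized_parts(message_from_string(raw), max_part_chars)
```

With this approach the spam checker never sees a half image, so a corrupt-image rule has nothing to fire on, at the cost of the checker not seeing the large part at all.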

Ken A.

>> <Ducks and Runs>
> No need for dramatic escapes:-)
> You and Logan have made some good arguments for the status quo...
> After all, one needs to assess which is the lesser evil and go with
> that.
> On the first read-through I was simply not looking at this from the
> correct perspective:-). MailScanner shouldn't need to solve this
> "problem", at least not in such a way that it invites a possible DoS
> (which is far more dire than a simple SA rule "misfire", of course).
> That just tells us that both option 1 and 3 are viable though, so any
> argument for option 3 would need to show that it would actually be
> worthwhile to complicate the code further... And I can say I didn't do
> my maths (shame on me), but Logan shows that the usefulness of option
> 3 is rather less than we could assume at the outset. Oh well. Change
> my vote there to number 1.
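
For anyone who wants to redo Logan's maths, the arithmetic can be sketched in a few lines of Python. This follows his simplification (80 base64 characters per line, sizes counted in plain bytes with "K" taken as 1000), and the function names are invented for this example. Note that RFC 2045 limits base64 lines to 76 characters, which gives 57 bytes per line, so the real allowance is slightly smaller still.

```python
def decoded_bytes_per_line(columns: int = 80) -> int:
    """Each base64 character carries 6 bits, so a line holds columns * 6 / 8 bytes."""
    return columns * 6 // 8


def extra_allowance(lines: int = 100, columns: int = 80) -> int:
    """Bytes of image data covered by carrying on for `lines` more lines."""
    return lines * decoded_bytes_per_line(columns)


def option3_cutoff(max_size: int, image_end: int,
                   lines: int = 100, columns: int = 80) -> int:
    """Where option 3 stops, assuming the size limit falls inside the image."""
    return min(image_end, max_size + extra_allowance(lines, columns))


def image_is_intact(max_size: int, image_end: int) -> bool:
    """True only if option 3 reaches the end of the image before giving up."""
    return option3_cutoff(max_size, image_end) >= image_end
```

For a 50000-byte limit and an image ending at 75000 bytes, `option3_cutoff(50000, 75000)` gives 56000, so the image is still cut short, which is the case Logan describes.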
