Max SpamAssassin Size problems

Glenn Steen glenn.steen at
Fri Aug 25 11:53:29 IST 2006

On 25/08/06, Anthony Peacock <a.peacock at> wrote:
> Ken A wrote:
> >
> >
> > Logan Shaw wrote:
> >> On Thu, 24 Aug 2006, Julian Field wrote:
> >>> Anthony Peacock wrote:
> >>>> Julian Field wrote:
> >>
> >>>>> Sounds survivable. After the limit I will keep going until I hit the
> >>>>> first line that only contains white space.
> >>
> >>>> I have been watching this discussion with a growing uneasiness.  I
> >>>> could be wrong but doesn't this behaviour open up the system to
> >>>> problems with huge image files...
> >>
> >>> Yes, you are absolutely correct. Non-spam may well include huge images.
> >>> The problem with rewinding to the previous boundary is that you may end
> >>> up not giving SpamAssassin _anything_ to work with.
> >>>
> >>> So it's up for a vote:
> >>>
> >>> do I chop half way through an image?
> >>> do I chop at the end of an image?
> >>> do I carry on for a max of 100 lines of Base64 data or until the end of
> >>> an image, whichever is earlier?
> >>
> >> I don't like the last option at all.  It still easily allows
> >> a situation where a valid message with a valid image in it
> >> gets detected as a corrupt image and hits a rule that scores
> >> it as spam.
> >>
> >> If we assume there are 80 columns of base64 data per line, then
> >> we get 60 bytes per line (since each base64 character carries
> >> 6 bits of data).  That means 100 lines only holds 6K, maximum.
> >>
> >> So this option only works if the chop-off point randomly
> >> happens to fall within the last 6K (or less) of the image.
> >> If the max message size causes the initial chop-off point to
> >> fall any earlier, it still creates an invalid image.  If you
> >> have a 50K max message size and someone sends a 75K image
> >> (which is not out of the ordinary at all), this method will
> >> keep going up to 56K and then quit.
> >>
> >> Basically, adding the 100 extra lines is really not much better
> >> than chopping right at the max message size barrier, unless
> >> you assume that most images aren't much larger than 6K, which
> >> I don't think is a valid assumption at all.  So, this option
> >> adds extra complexity and doesn't really give much benefit.
> >>
> >>   - Logan
> >
> > I'm all for #3 and just set "score FUZZY_OCR_CORRUPT_IMG 0" if you
> > are worried about false positives. Fuzzyocr will get better at sorting
> > this out. And of course in the mean time, don't use outlook, since it
> > will probably render corrupt images just fine. (it's a feature)
> This could be controversial here...
> <Evil Grin>
> I have another suggestion, why don't we agree to leave the MailScanner
> code alone.  Those people who are experiencing problems with broken
> images can raise the value of "Max SpamAssassin Size" in *THEIR*
> configurations, the rest of us can carry on as normal.
> There is already a way for people to adjust how much information SA gets
> from MailScanner, people who need more information can use that on
> their systems.
> </Evil Grin>
> <Ducks and Runs>
No need for dramatic escapes:-)
You and Logan have made some good arguments for the status quo...
After all, one needs to assess which is the lesser evil and go with
that. On the first readthrough I was simply not looking at this from the
correct perspective:-). MailScanner shouldn't need to solve this
"problem", at least not in such a way that it invites a possible DoS
(which is far more dire than a simple SA rule "misfire", of course).
That just tells us that both option 1 and 3 are viable though, so any
argument for option 3 would need to show that it would actually be
worthwhile to complicate the code further... And I can say I didn't do
my maths (shame on me), but Logan shows that the usefulness of option
3 is rather less than we could assume at the outset. Oh well. Change
my vote there to number 1.
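
For reference, Logan's arithmetic can be sketched roughly as below. This is
only an illustration, not MailScanner code: the function names are made up,
K is taken as 1000 bytes, and (as in Logan's mail) all sizes are treated as
decoded bytes and the image is assumed to start at the top of the message.

```python
# Rough sketch of the arithmetic in Logan's mail (hypothetical names,
# not MailScanner code). Assumes 80 base64 characters per line.
BASE64_BITS_PER_CHAR = 6


def extra_bytes(lines=100, columns=80):
    """Bytes of data carried by `lines` lines of base64 text."""
    return lines * columns * BASE64_BITS_PER_CHAR // 8


def image_survives(max_size, image_size, lines=100, columns=80):
    """True if the image ends before the extended cut-off of option 3."""
    return image_size <= max_size + extra_bytes(lines, columns)


print(extra_bytes())                   # 6000, i.e. about 6K
print(image_survives(50_000, 75_000))  # False: cut at 56K, image is 75K
```

So with a 50K limit, option 3 only rescues images of up to about 56K;
anything larger is still chopped part way through.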

-- Glenn
email: glenn < dot > steen < at > gmail < dot > com
work: glenn < dot > steen < at > ap1 < dot > se
