Max SpamAssassin Size problems

Fri Aug 25 19:40:08 IST 2006

Glenn Steen spake the following on 8/25/2006 3:53 AM:
> On 25/08/06, Anthony Peacock <a.peacock at chime.ucl.ac.uk> wrote:
>> Ken A wrote:
>> >
>> >
>> > Logan Shaw wrote:
>> >> On Thu, 24 Aug 2006, Julian Field wrote:
>> >>> Anthony Peacock wrote:
>> >>>> Julian Field wrote:
>> >>
>> >>>>> Sounds survivable. After the limit I will keep going until I hit the
>> >>>>> first line that only contains white space.
>> >>
>> >>>> I have been watching this discussion with a growing uneasiness.  I
>> >>>> could be wrong but doesn't this behaviour open up the system to
>> >>>> problems with huge image files...
>> >>
>> >>> Yes, you are absolutely correct. Non-spam may well include huge images.
>> >>> The problem with rewinding to the previous boundary is that you may end
>> >>> up not giving SpamAssassin _anything_ to work with.
>> >>>
>> >>> So it's up for a vote:
>> >>>
>> >>> do I chop half way through an image?
>> >>> do I chop at the end of an image?
>> >>> do I carry on for a max of 100 lines of Base64 data or until the end of
>> >>> an image, whichever is earlier?
>> >>
>> >> I don't like the last option at all.  It still easily allows
>> >> a situation where a valid message with a valid image in it
>> >> gets detected as a corrupt image and hits a rule that scores
>> >> it as spam.
>> >>
>> >> If we assume there are 80 columns of base64 data per line, then
>> >> we get 60 bytes per line (since each base64 character carries
>> >> 6 bits of data).  That means 100 lines only holds 6K, maximum.
>> >>
>> >> So this option only works if the chop-off point randomly
>> >> happens to fall within the last 6K (or less) of the image.
>> >> If the max message size causes the initial chop-off point to
>> >> fall any earlier, it still creates an invalid image.  If you
>> >> have a 50K max message size and someone sends a 75K image
>> >> (which is not out of the ordinary at all), this method will
>> >> keep going up to 56K and then quit.
>> >>
>> >> Basically, adding the 100 extra lines is really not much better
>> >> than chopping right at the max message size barrier, unless
>> >> you assume that most images aren't much larger than 6K, which
>> >> I don't think is a valid assumption at all.  So, this option
>> >> adds extra complexity and doesn't really give much benefit.
>> >>
>> >>   - Logan
>> >
>> > I'm all for #3 and just set "score FUZZY_OCR_CORRUPT_IMG 0" if you
>> > are worried about false positives. Fuzzyocr will get better at sorting
>> > this out. And of course in the meantime, don't use outlook, since it
>> > will probably render corrupt images just fine. (it's a feature)
>>
>> This could be controversial here...
>>
>> <Evil Grin>
>> I have another suggestion: why don't we agree to leave the MailScanner
>> code alone?  Those people who are experiencing problems with broken
>> images can raise the value of "Max SpamAssassin Size" in *THEIR*
>> configurations; the rest of us can carry on as normal.
>>
>> There is already a way for people to adjust how much information SA gets
>> from MailScanner; people who need more information can use that on
>> their systems.
>> </Evil Grin>
>>
>> <Ducks and Runs>
>>
> No need for dramatic escapes:-)
> You and Logan have made some good arguments for the status quo...
> After all, one needs to assess which is the lesser evil and go with
> that.
> On the first read-through I was simply not looking at this from the
> correct perspective:-). MailScanner shouldn't need to solve this
> "problem", at least not in such a way that it invites a possible DoS
> (which is far more dire than a simple SA rule "misfire", of course).
> That just tells us that both option 1 and 3 are viable though, so any
> argument for option 3 would need to show that it would actually be
> worthwhile to complicate the code further... And I can say I didn't do
> my maths (shame on me), but Logan shows that the usefulness of option
> 3 is rather less than we could assume at the outset. Oh well. Change
> my vote there to number 1.
> 
I still vote to execute the spammers!
<More Evil Grin>
How much perl code will that take?
Or do you just have to beat them with whatever is handy?
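
For anyone who wants to redo Logan's maths from the quoted text above, here
is a rough sketch in Python (not MailScanner's Perl, and not taken from its
code). The function names are made up for illustration, and it follows
Logan's simplification of treating the 50K limit, the 75K image and the
extra 6K all in the same units. It assumes 80 base64 characters per line as
he did; real MIME base64 lines are at most 76 characters, which gives 57
bytes per line and so slightly less than 6K.

```python
def base64_payload_bytes(lines, columns=80):
    """Decoded bytes carried by `lines` full base64 lines.

    Each base64 character carries 6 bits, so 80 characters give 60 bytes.
    """
    return lines * columns * 6 // 8


def option3_cutoff(max_size, image_end, extra_lines=100, columns=80):
    """Where option 3 would stop: the end of the image or the size limit
    plus `extra_lines` of base64, whichever comes first.

    Hypothetical helper for the estimate only; offsets are in bytes.
    """
    return min(image_end, max_size + base64_payload_bytes(extra_lines, columns))


extra = base64_payload_bytes(100)        # extra data allowed by 100 lines
cutoff = option3_cutoff(50000, 75000)    # Logan's 50K limit, 75K image
image_complete = cutoff >= 75000         # does SA see the whole image?

print(extra, cutoff, image_complete)
```

With Logan's numbers the image is still cut short, which is his point: the
100 extra lines only help when the limit falls within roughly the last 6K
of the image.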

-- 

MailScanner is like deodorant...
You hope everybody uses it, and
you notice quickly if they don't!!!!