Max SpamAssassin Size problems -- round 2

Tue Aug 29 09:19:43 IST 2006

Logan Shaw wrote:
> On Mon, 28 Aug 2006, Julian Field wrote:
>> All fair points. Which brings us back to the beginning.
>> The option which got the biggest number of votes was along the lines of
>> this:
>>
>> for ($lines=$size=0; $lines<100 && $size<20_000; $lines++)
>> {
>>   $line = getnextline();
>>   $size += length($line);
>>   last if $size>20_000;
>>   push @SAinput, $line;
>>   last if $line =~ /^\s*$/;
>> }
>>
>> It should keep copying lines until we hit a line that is only whitespace
>> (or blank) or until we have copied 20k of extra data, whichever comes
>> first. And it won't be confused by nearly 20k of extra data followed by
>> 1 huge line lasting for mbytes.
>>
>> Is that a reasonable compromise?
> 
> I think the idea of trying to be a little intelligent and
> flexible about where you chop the message is a good one.
> That seems to me to have value.  If you can chop at an
> attachment boundary, that's good, so chopping at the first
> boundary within a window (of bytes and/or lines) is a good
> thing.  It will work some of the time.

If we agree that MailScanner (MS) should be as friendly to SpamAssassin
(SA) as possible, and Julian is happy to make some changes, then I think
this is the best option.

I do not like the idea of simply ignoring messages over the "Max SA Size"
and not passing them to SA at all.  That would reduce the overall
effectiveness of spam scanning.  I think that having a flexible window
around the "Max SA Size" to try to find the end of an image is a good idea.
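For illustration, here is a minimal Python sketch of the windowed copy in
Julian's quoted Perl loop. It is not MailScanner's actual code: it assumes
the lines after the "Max SA Size" cutoff are available as an iterable of
strings, counts length in characters, and the function and constant names
are hypothetical.

```python
from typing import Iterable, List

# Hypothetical limits mirroring the quoted proposal: at most 100 extra lines
# and at most 20,000 extra characters past the "Max SA Size" cutoff.
MAX_EXTRA_LINES = 100
MAX_EXTRA_BYTES = 20_000


def extend_to_blank_line(rest: Iterable[str],
                         max_lines: int = MAX_EXTRA_LINES,
                         max_bytes: int = MAX_EXTRA_BYTES) -> List[str]:
    """Return the extra lines to append after the size cutoff.

    Copies lines until a blank or whitespace-only line has been copied, or
    until either limit is reached. A single huge line that would push the
    total over max_bytes is not copied at all.
    """
    extra: List[str] = []
    size = 0
    for line in rest:
        # Same pre-iteration test as the Perl loop condition.
        if len(extra) >= max_lines or size >= max_bytes:
            break
        size += len(line)
        # Stop before copying a line that would exceed the byte budget.
        if size > max_bytes:
            break
        extra.append(line)
        # A blank line is kept, then copying stops (likely end of a part).
        if line.strip() == "":
            break
    return extra
```

Because the size check happens before the line is kept, nearly 20k of data
followed by one multi-megabyte line stops cleanly instead of copying the
huge line.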

> However, I still think there needs to be an answer to the
> question of what to do when the window method fails to solve
> the problem.  I think that will happen frequently enough that
> it's important to be intentional about it.

Agreed!

> So, if the boundary does not lie in the window, what is the best
> thing to do?  It seems to me you have three reasonable options:
> (1) chop somewhere inside the window anyway,
> (2) keep going to the end of the current attachment and
>     chop after it's over,
> (3) roll back to the beginning of the current attachment,
>     and chop before it begins.

I would vote for option 3, as long as it does not make the code changes too
complicated.  It has the advantage of still passing something to SA to
scan (headers, leading text, etc.) without risking sending a broken image
to SA.
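A rough Python sketch of how option 3 could combine with the window, offered
only as an illustration under stated assumptions: the message is a list of
lines, `boundary` is the MIME boundary parameter already taken from the
Content-Type header, lengths are counted in characters, and the function
name and parameters are hypothetical rather than anything in MailScanner.

```python
from typing import List, Optional


def truncate_for_sa(lines: List[str], max_size: int, boundary: str,
                    max_extra_lines: int = 100,
                    max_extra_bytes: int = 20_000) -> List[str]:
    """Return the leading lines of a message to hand to SpamAssassin."""
    delimiter = "--" + boundary          # MIME delimiter lines start with this
    size = 0
    cut = len(lines)                     # index of the first line not copied
    last_delim: Optional[int] = None     # latest delimiter line before cut
    for i, line in enumerate(lines):
        if size + len(line) > max_size:
            cut = i
            break
        size += len(line)
        if line.startswith(delimiter):
            last_delim = i
    if cut == len(lines):
        return list(lines)               # whole message is under the limit

    # Window: look a little further for a blank line ending the current part.
    extra = 0
    for j in range(cut, min(len(lines), cut + max_extra_lines)):
        extra += len(lines[j])
        if extra > max_extra_bytes:
            break
        if lines[j].strip() == "":
            return lines[:j + 1]

    # Option 3: the window failed, so roll back to just before the delimiter
    # that opened the current attachment.
    if last_delim is not None:
        return lines[:last_delim]
    return lines[:cut]                   # no delimiter seen: plain chop
```

If the cutoff lands inside a long base64 image with no blank line in the
window, SA still receives the headers and any earlier text parts, but none
of the partial image.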

<SNIP good summary of effects of above options>

-- 
Anthony Peacock
CHIME, Royal Free & University College Medical School
WWW:    http://www.chime.ucl.ac.uk/~rmhiajp/
"If you have an apple and I have  an apple and we  exchange apples
then you and I will still each have  one apple. But  if you have an
idea and I have an idea and we exchange these ideas, then each of us
will have two ideas." -- George Bernard Shaw