Max SpamAssassin Size problems -- round 2

Logan Shaw lshaw at emitinc.com
Mon Aug 28 19:15:05 IST 2006


On Mon, 28 Aug 2006, Julian Field wrote:
> All fair points. Which brings us back to the beginning.
> The option which got the biggest number of votes was along the lines of
> this:
>
> for ($lines=$size=0; $lines<100 && $size<20_000; $lines++)
> {
>   $line = getnextline();
>   $size += length($line);
>   last if $size>20_000;
>   push @SAinput, $line;
>   last if $line =~ /^\s*$/;
> }
>
> It should keep copying lines until we hit a line that is only whitespace
> (or blank) or until we have copied 20k of extra data, whichever comes
> first. And it won't be confused by nearly 20k of extra data followed by
> 1 huge line lasting for mbytes.
>
> Is that a reasonable compromise?

I like the idea of trying to be a little intelligent and
flexible about where you chop the message is a good one.
That seems to me to have value.  If you can chop at an
attachment boundary, that's good, so chopping at the first
boundary within a window (of bytes and/or lines) is a good
thing.  It will work some of the time.

However, I still think there needs to be an answer to the
question of what to do when the window method fails to solve
the problem.  I think that will happen frequently enough that
it's important to be intentional about it.

So, if the boundary does not lie in the window, what is the best
thing to do?  It seems to me you have three reasonable options:
(1) chop somewhere inside the window anyway,
(2) keep going to the end of the current attachment and
     chop after it's over,
(3) roll back to the beginning of the current attachment,
     and chop before it begins.

Implications of each:
#1: False positives because of how FuzzyOcr behaves.
#2: Possible denial-of-service attack because you're
     allowing input to bypass limits on message size.
#3: False negatives because you're not scanning the
     whole message.

Personally, I think if you have to err, you should err on the
side of false negatives rather than false positives.  So that
eliminates #1 in my mind.  That leaves only #2 and #3.  Now,
I think, if you want to scan the whole message, just turn off
the "Max SpamAssassin Size" limit completely.  That makes #2
fairly useless, or at least redundant.

To me, that means that if you care about what happens when
the intelligent window things fail, you've got to go with #3,
which is roll back to the beginning of the current attachment.
In Perl code, that shouldn't be too terribly hard: if you reach
the end of the window loop and you haven't found a blank line
within the window, then just keep popping lines off @SAinput
until you get a blank line.  Something like this:

     my $found_boundary = 0;
     for ($lines=$size=0; $lines<100 && $size<20_000; $lines++)
     {
 	$line = getnextline();
 	$size += length($line);
 	last if $size>20_000;
 	push @SAinput, $line;

 	if ($line =~ /^\s*$/)
 	{
 	    $found_boundary = 1;
 	    last;
 	}
     }

     # roll back to previous blank line before the window
     if (not $found_boundary)
     {
 	until ($SAinput[$#SAinput] =~ /^\s*$/)
 	{
 	    pop @SAinput;
 	}
     }

Of course, that's not guaranteed to be bug free.  Actually,
it definitely isn't bug free since even the original loop
doesn't check if getnextline() is telling you that you've
already read the last line in the input.  So maybe this:

     my $found_boundary = 0;
     for ($lines=$size=0; $lines<100 && $size<20_000; $lines++)
     {
 	$line = getnextline();
 	if (not defined $line)
 	{
 	    $found_boundary = 1;
 	    last;
 	}

 	$size += length($line);
 	last if $size>20_000;
 	push @SAinput, $line;

 	if ($line =~ /^\s*$/)
 	{
 	    $found_boundary = 1;
 	    last;
 	}
     }

     # roll back to previous blank line before the window
     if (not $found_boundary)
     {
 	until ($SAinput[$#SAinput] =~ /^\s*$/)
 	{
 	    pop @SAinput;
 	}
     }

Well, that gets the general idea across at least.

   - Logan


More information about the MailScanner mailing list