At 17:09 12/12/2003, you wrote:
>On Fri, 2003-12-12 at 03:47, Randal, Phil wrote:
> > RFC 2396 generalises URIs.
>I only skimmed the spec, but what I gathered, unless I completely
>misunderstood the document, is that characters %00 through %1F
>inclusive and %7F are control characters and shouldn't be in a URI.
>    Although they are disallowed within the URI syntax, we include here a
>    description of those US-ASCII characters that have been excluded and
>    the reasons for their exclusion.
>    The control characters in the US-ASCII coded character set are not
>    used within a URI, both because they are non-printable and because
>    they are likely to be misinterpreted by some control mechanisms.
>    control     = <US-ASCII coded characters 00-1F and 7F hexadecimal>
>So how much trouble would we cause if we just disallowed the entire
>range of control characters from URIs? Can anyone think of a real website
>that legitimately uses any of these control codes within its URIs? I'm
>particularly concerned about shopping sites with their massive URIs.

Sounds good to me.
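
For illustration only, here is a minimal sketch of the check being proposed, written in Python rather than as MailScanner or SpamAssassin code. It assumes the URI has already been extracted from the message, which is the hard part. The function name has_control_chars is hypothetical. It flags both raw control characters and their percent-encoded forms, following the "control" set quoted from RFC 2396 above.

```python
import re

# Raw US-ASCII control characters (00-1F and 7F hexadecimal), or the same
# values written as percent-escapes (%00-%1F, %7F), in either hex case.
_CONTROL_IN_URI = re.compile(r"[\x00-\x1f\x7f]|%(?:[01][0-9A-Fa-f]|7[Ff])")


def has_control_chars(uri: str) -> bool:
    """Return True if the URI contains a raw or percent-encoded control character."""
    return _CONTROL_IN_URI.search(uri) is not None


if __name__ == "__main__":
    for candidate in (
        "http://www.example.com%01@other.example/",
        "http://www.example.com/a%20b?x=%7E",
    ):
        print(candidate, has_control_chars(candidate))
```

This only inspects one level of escaping, so a doubly encoded sequence such as %2500 is not flagged. Whether that matters depends on how the mail client decodes the link.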

>I still think I would rather have MailScanner do the checking for this
>so we can notify the recipient properly, rather than just marking
>the message as high spam and/or deleting the message altogether. Perhaps
>we could even have MailScanner remove the link code altogether but still
>deliver the rest of the message.

Spotting the occurrence of these inside URIs is very hard to do reliably.
SpamAssassin goes to considerable lengths to do this, and I don't want to
attempt to duplicate their work. So I still say do it in SpamAssassin, but
probably in the MCP code which is used for direct actions on mail, rather
than the spam detection which is really just attempting to qualify the
