Help with a regexp
Steve Campbell
campbell at cnpapers.com
Tue Aug 26 12:39:01 IST 2008
Thanks Steve and Dominik.
I'll try one and/or the other shortly.
Steve
Schramm, Dominik wrote:
> Hi Steve,
>
> Steve Campbell wrote on Monday, August 25, 2008 3:23 PM:
>
>
>> One of our domain names is cnpapers.com and another is cnpapers.net.
>> The SA rule URI_CHINA_ADJ catches a lot of our mail, and although
>> it is a relatively low scoring rule, it does contribute.
>>
>> The rule is defined as follows:
>>
>> /^(?:https?:\/\/)?.*\.cn.*/i
>>
>
> The regex says:
>
> an optional protocol prefix ("http://" or "https://"), followed
> by an arbitrary amount of arbitrary characters (which may be omitted
> altogether), followed by ".cn", followed by an arbitrary amount of
> arbitrary characters (which may be omitted altogether). So ".cn" is
> the only obligatory character string and sufficient for the regex
> to match; the scanner probably finds something like
> mailhost.cnpapers.com in the headers or http://www.cnpapers.com
> in the footer.
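
To see that behaviour concretely, here is a minimal sketch in Python rather than Perl. It only exercises the pattern as quoted above; how SpamAssassin actually applies the rule is not shown. The sample strings (including the example.cn and cnpapers.net hosts) are made up for illustration.

```python
import re

# The pattern quoted above, translated to Python syntax:
# Perl's /.../i becomes re.IGNORECASE, and "/" needs no escaping.
ORIGINAL = re.compile(r"^(?:https?://)?.*\.cn.*", re.IGNORECASE)

# Hypothetical sample strings, not taken from real mail.
samples = [
    "http://www.cnpapers.com",   # ".cn" is only the start of "cnpapers"
    "mailhost.cnpapers.net",     # same problem, no protocol prefix
    "http://www.example.cn/",    # a genuine .cn address
    "http://www.example.com/",   # no ".cn" anywhere
]

# Collect every sample the pattern matches.
hits = [s for s in samples if ORIGINAL.search(s)]

for s in samples:
    print(f"{s}: {'match' if s in hits else 'no match'}")
```

The first three samples match, because the pattern only requires the literal string ".cn" to appear somewhere on the line.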
>
> What it should catch IMHO is:
>
> an optional protocol prefix ("http://" or "https://"), followed
> by an arbitrary amount of arbitrary characters (which may be omitted
> altogether), followed by ".cn", either followed by a slash or followed
> by whitespace, followed by an arbitrary amount of arbitrary characters
> (which may be omitted altogether).
>
> And that would translate back into a regex like this:
>
> /^(?:https?:\/\/)?.*\.cn(?:\/|\s).*/i
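
A quick way to check the suggested pattern is to run it against a few strings. This is a sketch in Python, assuming the regex translates directly; the helper name proposed_matches and the example.cn host are made up for illustration.

```python
import re

# The suggested pattern: ".cn" must be followed by a slash or whitespace.
PROPOSED = re.compile(r"^(?:https?://)?.*\.cn(?:/|\s).*", re.IGNORECASE)


def proposed_matches(text: str) -> bool:
    """Return True if the suggested pattern matches the given string."""
    return PROPOSED.search(text) is not None


# ".cn" followed by "p" no longer counts.
print(proposed_matches("http://www.cnpapers.com"))           # False
# ".cn" followed by a slash or whitespace does.
print(proposed_matches("http://www.example.cn/index.html"))  # True
print(proposed_matches("visit www.example.cn today"))        # True
# A bare host at the very end of the string has neither after it.
print(proposed_matches("http://www.example.cn"))             # False
```

The last case is worth noting: a URL that ends directly after ".cn", or continues with a port such as ":8080", has neither a slash nor whitespace after it, so this pattern does not match it.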
>
> However, I find the expression rather vague, even like this. It
> should restrict the characters between the optional http(s) and
> ".cn" to those allowed in domain names.
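
One possible reading of that restriction, sketched in Python. The character class [a-z0-9.-] (letters, digits, dots, hyphens) is an assumption about what "allowed in domain names" should mean here, and the names RESTRICTED and restricted_matches are hypothetical. This is not a tested replacement for the SA rule.

```python
import re

# Assumption: the host part may contain only letters, digits, dots and
# hyphens. Everything else is as in the suggested pattern above.
RESTRICTED = re.compile(
    r"^(?:https?://)?[a-z0-9.-]*\.cn(?:/|\s).*",
    re.IGNORECASE,
)


def restricted_matches(text: str) -> bool:
    """Return True if the restricted pattern matches the given string."""
    return RESTRICTED.search(text) is not None


# A .cn host directly after the optional prefix still matches.
print(restricted_matches("http://www.example.cn/index.html"))        # True
# A ".cn/" buried in the path of another host is no longer enough,
# because "/" and "?" are not in the host character class.
print(restricted_matches("http://www.example.com/go?to=foo.cn/"))    # False
# Our own domain stays clean.
print(restricted_matches("http://www.cnpapers.com/"))                # False
```

A side effect of the tighter class together with the leading "^" is that the host must start the string (after the optional prefix), so a line like "visit www.example.cn today" no longer matches.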
>
> Hope this helps,
> Dominik
>
>
More information about the MailScanner mailing list