Help with a regexp
Schramm, Dominik
dominik.schramm at businessmart.de
Mon Aug 25 15:44:31 IST 2008
Hi Steve,
Steve Campbell wrote on Monday, August 25, 2008 3:23 PM:
> One of our domain names is cnpapers.com and another is cnpapers.net.
> The SA rule URI_CHINA_ADJ catches a lot of our mail, and although
> it is a relatively low scoring rule, it does contribute.
>
> The rule is defined as follows:
>
> /^(?:https?:\/\/)?.*\.cn.*/i
The regex says:
an optional protocol prefix ("http://" or "https://"), followed
by an arbitrary amount of arbitrary characters (which may be omitted
altogether), followed by ".cn", followed by an arbitrary amount of
arbitrary characters (which may be omitted altogether). So ".cn" is
the only mandatory string, and it is sufficient for the regex
to match; the scanner probably finds something like
mailhost.cnpapers.com in the headers or http://www.cnpapers.com
in the footer.
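
A quick way to check this reading is to run the rule against a few sample strings. The sketch below transcribes the quoted rule into Python's re syntax (an assumption for illustration; SpamAssassin itself evaluates Perl regexes), and the example.cn and example.com hosts are made-up test values, not taken from any real mail.

```python
import re

# The rule as quoted above; the /i flag becomes re.IGNORECASE.
URI_CHINA_ADJ = re.compile(r"^(?:https?://)?.*\.cn.*", re.IGNORECASE)

samples = [
    "http://www.cnpapers.com",   # from the footer example above
    "mailhost.cnpapers.com",     # from the header example above
    "http://www.example.cn/",    # hypothetical .cn host
    "http://www.example.com",    # hypothetical host without ".cn"
]

# Map each sample to True/False depending on whether the rule fires.
results = {s: URI_CHINA_ADJ.match(s) is not None for s in samples}

for s, hit in results.items():
    print(f"{s}: {'match' if hit else 'no match'}")
```

Both cnpapers strings match simply because ".cnpapers" begins with ".cn".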
What it should catch IMHO is:
an optional protocol prefix ("http://" or "https://"), followed
by an arbitrary amount of arbitrary characters (which may be omitted
altogether), followed by ".cn", either followed by a slash or followed
by whitespace, followed by an arbitrary amount of arbitrary characters
(which may be omitted altogether).
And that would translate back into a regex like this:
/^(?:https?:\/\/)?.*\.cn(?:\/|\s).*/i
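
As a sanity check, here is a minimal sketch that runs this regex against a few strings, again transcribed to Python's re syntax as an assumption. The example.cn hosts are hypothetical. Note that a string ending directly in ".cn" has neither a slash nor whitespace after it, so it is not matched; whether the scanner ever hands over a URI in that form is not something this sketch can tell.

```python
import re

# The regex proposed above, with /i expressed as re.IGNORECASE.
PROPOSED = re.compile(r"^(?:https?://)?.*\.cn(?:/|\s).*", re.IGNORECASE)

def hits(text):
    """Return True if the proposed regex matches the given string."""
    return PROPOSED.match(text) is not None

for text in (
    "http://www.cnpapers.com",           # no longer matched
    "mailhost.cnpapers.net",             # no longer matched
    "http://www.example.cn/index.html",  # ".cn" followed by a slash
    "www.example.cn and more text",      # ".cn" followed by whitespace
    "http://www.example.cn",             # ".cn" at the very end
):
    print(f"{text!r}: {hits(text)}")
```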
However, I find the expression rather vague, even like this. It
should restrict the characters between the optional http(s) and
".cn" to those allowed in domain names.
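
One possible reading of that restriction, as a sketch only: replace the first ".*" with dot-separated labels made of letters, digits and hyphens. The character class and the name RESTRICTED are assumptions for illustration, not a tested SpamAssassin rule, and the pattern is written in Python's re syntax rather than Perl's.

```python
import re

# Hostname labels (letters, digits, hyphens) separated by dots, then ".cn",
# then a slash or whitespace as in the regex proposed above.
RESTRICTED = re.compile(
    r"^(?:https?://)?[a-z0-9-]+(?:\.[a-z0-9-]+)*\.cn(?:/|\s).*",
    re.IGNORECASE,
)

def hits(text):
    """Return True if the restricted regex matches the given string."""
    return RESTRICTED.match(text) is not None

for text in (
    "http://www.example.cn/",            # hypothetical .cn host: matched
    "http://www.cnpapers.com/",          # not matched
    "http://www.example.com/page.cn/x",  # ".cn/" only in the path: not matched
):
    print(f"{text!r}: {hits(text)}")
```

The last sample is the case the looser version still catches, because its leading ".*" can run past the host name into the path.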
Hope this helps,
Dominik