Correct regexp to block mails that advertise certain websites?
Matt Kettler
mkettler at evi-inc.com
Fri Jan 11 18:43:11 GMT 2008
Remco Barendse wrote:
> Hi list!
>
> I would like to block e-mails that always contain a certain url that
> allows to "unsubscribe".
>
> The unsubcribe url from the e-mail is this in plain text :
> <a
> href="http://www.ur...ethanes-technology-international.com/unsubscribe.aspx?id=31797&email=whoever@whatever.com">click
> here to unsubscribe</a> .
>
> I tried to block this using the following regexp :
> body URETH1
> /^http:\/\/.*\.ur...ethanes-technology-international\.com\//
> describe URETH1 urethane rubbish
> score URETH1 100
>
> (deliberately broke the url with some dots as my message doesn't seem to
> make it to the list)
>
> But nothing happens, where do i go wrong?
You can't use a body rule to match HTML tags in SpamAssassin. By definition, all
HTML tags are removed before the body rules run, so that spammers can't
obfuscate strings with comment tags or things like <b></b>. Your URL sits inside
the href attribute of an <a> tag, so the body rule never sees it.
You really want a uri rule for this. A rawbody rule would also work, but it could
sometimes fail to match because of line wraps (both body and uri are normalized
to remove line wraps, but rawbody isn't).
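For comparison, a rawbody version might look like the sketch below. The rule name URETH1_RAW and the describe text are made up for illustration, and the URL is still broken with dots as in the original post. This is an alternative to the uri rule, not something to run alongside it, since both would add their scores.

```
# Hypothetical rawbody alternative (rule name URETH1_RAW is an assumption).
# rawbody sees the undecoded HTML source, so a line wrap that falls inside
# the hostname would stop this from matching.
rawbody  URETH1_RAW  /\.ur...ethanes-technology-international\.com\/unsubscribe\.aspx/
describe URETH1_RAW  urethane rubbish (raw HTML match)
score    URETH1_RAW  100
```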
I'd also warn you about using ^. In general it makes no sense to use ^ in a body
rule, because line breaks are removed for this rule type. The anchor would only
match at the start of the message, not at the start of each line in the body.
(Actually, SA breaks long messages up into chunks, so it would really match the
start of any "chunk".) For uri rules you're probably OK with it being there, but
it's probably not needed.
So try this instead:
uri URETH1 /^http:\/\/.*\.ur...ethanes-technology-international\.com\//
Or you could simplify to:
uri URETH1 /\.ur.{3}ethanes-technology-international\.com\//
However, the latter could also match mailto: URIs, which you might
not want.
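Putting it together, a minimal sketch of a complete rule could look like this. The describe and score lines are taken from the original post. The "https?" and "[^\/]*" parts are assumptions that go beyond the reply above: they limit the match to the host part of an http or https link, which avoids the mailto: case. The URL is still broken with dots, as in the original message.

```
# Sketch of a complete uri rule; describe and score come from the original post.
# Assumption: "https?" and "[^\/]*" are added here so that only the host part
# of an http/https link can match, which excludes mailto: URIs.
# Like the rules above, this requires a subdomain such as "www." before the name.
uri      URETH1  /^https?:\/\/[^\/]*\.ur...ethanes-technology-international\.com\//
describe URETH1  urethane rubbish
score    URETH1  100
```

After adding the rule (typically to local.cf), running "spamassassin --lint" should report any syntax errors before the rule goes live.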