Correct regexp to block mails that advertise certain websites?

Matt Kettler mkettler at evi-inc.com
Fri Jan 11 18:43:11 GMT 2008


Remco Barendse wrote:
> Hi list!
> 
> I would like to block e-mails that always contain a certain URL that 
> allows the recipient to "unsubscribe".
> 
> The unsubscribe URL from the e-mail is this in plain text :
> <a 
> href="http://www.ur...ethanes-technology-international.com/unsubscribe.aspx?id=31797&email=whoever@whatever.com">click 
> here to unsubscribe</a> .
> 
> I tried to block this using the following regexp :
> body     URETH1          /^http:\/\/.*\.ur...ethanes-technology-international\.com\//
> describe URETH1          urethane rubbish
> score    URETH1          100
> 
> (deliberately broke the url with some dots as my message doesn't seem to 
> make it to the list)
> 
> But nothing happens, where do i go wrong?

You can't use a body rule to match HTML tags in SpamAssassin. By definition, all 
HTML tags are removed prior to running the body rules, to avoid spammers 
obfuscating strings with comment tags or things like <b></b>.
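
To see why the body rule never fires, here is a rough sketch in Python (not 
SpamAssassin itself; the helper names TextOnly and visible_text are made up for 
illustration). It strips the tags from the quoted snippet and applies the same 
regexp to what is left. Python's re module behaves like Perl for a pattern this 
simple, and the URL stays broken with dots as in the original post.

```python
import re
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect only text nodes, discarding tags and their attributes."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def visible_text(html):
    """Return the text a reader would see, with all tags removed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


# The snippet from the question; the URL is left broken with dots.
HTML = ('<a href="http://www.ur...ethanes-technology-international.com/'
        'unsubscribe.aspx?id=31797&email=whoever@whatever.com">click '
        'here to unsubscribe</a> .')

PATTERN = re.compile(r"^http://.*\.ur...ethanes-technology-international\.com/")

text = visible_text(HTML)
print(repr(text))                   # 'click here to unsubscribe .'
print(bool(PATTERN.search(text)))   # False: the href is gone with the tag
```

The URL only ever lived inside the href attribute, so once the tag is stripped 
there is nothing left for the pattern to find.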

You really want a uri rule for this. A rawbody rule would also work, but could 
sometimes fail to match because of line wraps (both body and uri are normalized 
to remove line wraps, but rawbody isn't).
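
The line-wrap problem can be sketched roughly like this in Python. It is only a 
model and assumes un-normalized text is tested one line at a time; the wrapped 
lines and the tag-matching pattern are hypothetical, modelled on the snippet 
quoted above.

```python
import re

# Hypothetical raw HTML in which the tag was wrapped across lines,
# as in the quoted snippet. The URL keeps its deliberate dots.
raw_lines = [
    '<a',
    'href="http://www.ur...ethanes-technology-international.com/unsubscribe.aspx">click',
    'here to unsubscribe</a>',
]

# Illustrative pattern that wants the whole opening tag in one piece.
pattern = re.compile(r'<a\s+href="http://[^"]*ethanes-technology-international\.com/')

# Model of matching un-normalized text: each line is tested on its own.
per_line = any(pattern.search(line) for line in raw_lines)

# Model of normalized text: line wraps are removed before matching.
joined = bool(pattern.search(" ".join(raw_lines)))

print(per_line)  # False
print(joined)    # True
```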

I'd also warn you about using ^. In general it makes no sense at all to use ^ in 
body rules, as line breaks are removed for this rule type. It would force the 
rule to match only at the start of the message, not at the start of a line in 
the body. (Actually, for long messages SA breaks the text up into chunks, so it 
would really match at the start of any "chunk".) For uri rules you're probably 
OK with it being there, but it's probably not needed.
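
A small Python illustration of the ^ issue, using a made-up message text and 
example.com as a placeholder domain. It assumes the body has already been 
flattened to a single string, and that a uri rule is tested against each 
extracted URI on its own.

```python
import re

# Hypothetical body text after line breaks have been removed.
flattened = "Dear customer, visit http://www.example.com/ today"

anchored = re.compile(r"^http://")
unanchored = re.compile(r"http://")

print(bool(anchored.search(flattened)))    # False: text does not start with http://
print(bool(unanchored.search(flattened)))  # True

# For a URI tested on its own, the string starts with the scheme,
# so the anchor does no harm.
uri = "http://www.example.com/"
print(bool(anchored.search(uri)))          # True
```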

So try this instead:

uri URETH1	/^http:\/\/.*\.ur...ethanes-technology-international\.com\//

Or you could simplify to:

uri URETH1	/\.ur.{3}ethanes-technology-international\.com\//

However, the latter could also match mailto: URIs, which you might not want.
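
The difference between the two rules can be checked with a quick Python sketch. 
The mailto: URI below is invented purely for illustration, and the domain keeps 
the deliberate dots from the original post (both "..." and ".{3}" match them).

```python
import re

anchored = re.compile(r"^http://.*\.ur...ethanes-technology-international\.com/")
simplified = re.compile(r"\.ur.{3}ethanes-technology-international\.com/")

# URI from the question, still broken with dots.
http_uri = ("http://www.ur...ethanes-technology-international.com/"
            "unsubscribe.aspx?id=31797&email=whoever@whatever.com")

# Hypothetical mailto: URI that happens to carry the same host and slash.
mailto_uri = ("mailto:someone@example.com?body="
              "http://www.ur...ethanes-technology-international.com/")

for name, rx in (("anchored", anchored), ("simplified", simplified)):
    print(name, bool(rx.search(http_uri)), bool(rx.search(mailto_uri)))
# anchored True False
# simplified True True
```

If mailto: hits matter to you, keeping the ^http:\/\/ prefix is the simpler way 
to rule them out.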


