creating rules

Glenn Steen glenn.steen at GMAIL.COM
Thu Sep 29 09:08:51 IST 2005



On 29/09/05, Matt Kettler <mkettler at evi-inc.com> wrote:
> Glenn Steen wrote:
> > On 29/09/05, Matt Kettler <mkettler at evi-inc.com> wrote:
> > (snip)
> >
> >>1) * isn't a character wildcard in regex, it's a repeat-count wildcard. ie: d*
> >>will match any number of d's in a row (including 0).
> >
> > (snip)
> >
> >>in light of 1-3 I'd rewrite that as:
> >>
> >>header PROLO_GSPAM15 Subject =~ /Re\[\d\]:/i
> >
> > Um, wouldn't
> > header PROLO_GSPAM15 Subject =~ /Re\[\d\d*\]:/i
> > or
> > header PROLO_GSPAM15 Subject =~ /Re\[\d+\]:/i
> > be better? I see a fair amount of these, and they're almost always "Re
> > [12[" or somesuch... Then again, for some reason most of these are
> > caught (BLs IIRC... @home now, can check tomorrow if anyone really
> > cares).
> > (snip)
>
> Personally, I'd use a range.. I'd do \d{1,2} or \d{1,3}..
>
> I never use + or * in spamassassin rules as a matter of general principle. While
> it would be harmless to use + or * here, they are quite dangerous and can cause
> extraordinarily large regex expansions which are painfully slow.
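
To make the difference between the variants above concrete, here is a
small sketch. It uses Python's re module as a stand-in for the Perl
regex engine behind SpamAssassin rules (an assumption; the quantifier
syntax shown is the same in both). The sample subjects and the names
SUBJECTS, PATTERNS and match_table are made up for illustration.

```python
import re

# Hypothetical sample subjects, not taken from real spam.
SUBJECTS = ["Re[1]: hi", "Re[12]: hi", "Re[1234]: hi", "Re[]: hi"]

# The variants discussed in the thread, plus the bounded range.
PATTERNS = {
    "single": r"Re\[\d\]:",       # exactly one digit
    "star":   r"Re\[\d\d*\]:",    # one digit, then zero or more digits
    "plus":   r"Re\[\d+\]:",      # one or more digits (same as "star")
    "range":  r"Re\[\d{1,3}\]:",  # one to three digits, bounded
}


def match_table():
    """Return, per pattern name, the sample subjects it matches."""
    table = {}
    for name, pattern in PATTERNS.items():
        table[name] = [
            s for s in SUBJECTS
            if re.search(pattern, s, re.IGNORECASE)  # like the /i flag
        ]
    return table


if __name__ == "__main__":
    for name, hits in match_table().items():
        print(name, hits)
```

With these samples, \d\d* and \d+ behave identically, the single \d
misses "Re[12]:", and \d{1,3} catches the short counters while refusing
the four-digit one.
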
>
> In particular,  putting .* in a SA body rule can burn quite a lot of CPU cycles
> if the text leading up to it is common. For example /e.*xactly/i will do this
> very nicely. For every "e" in the message, it's going to have to scan the rest
> of the body looking for "xactly" anywhere in it, then go back and look for the
> next e... ouch!

Good point, very true.
That's why, _if_ one needs to have a wildcard, one should always try to
"anchor" the RE to some finite (and not too common) boundary... When
scanning input lines, anchoring to the beginning of the line isn't a
bad stratagem.
But best is to avoid the wildcards as much as possible.
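
A rough way to see why, using Matt's /e.*xactly/i example: the sketch
below is a simplified cost model of a naive backtracking matcher, not a
measurement of Perl or SpamAssassin (real engines have optimisations
that this ignores). The function name scan_cost and its parameters are
made up for illustration. It counts how many characters the wildcard
has to walk over, summed over every "e" the matcher may start from.

```python
def scan_cost(text, anchored=False, max_span=None):
    """Estimate wildcard scanning work for a pattern like e.*xactly.

    anchored=True models ^e.*xactly (only position 0 is tried).
    max_span=N models e.{0,N}xactly (the wildcard is bounded).
    This is a simplified model, not a real regex engine.
    """
    n = len(text)
    starts = [0] if anchored else range(n)
    cost = 0
    for i in starts:
        if i < n and text[i].lower() == "e":
            remaining = n - i - 1  # characters the wildcard can consume
            if max_span is not None:
                remaining = min(remaining, max_span)
            cost += remaining
    return cost


if __name__ == "__main__":
    line = "e" * 100  # worst case for this model: every character is an "e"
    print("unanchored .*  :", scan_cost(line))
    print("anchored  ^e.* :", scan_cost(line, anchored=True))
    print("bounded .{0,10}:", scan_cost(line, max_span=10))
```

For that 100-character worst case the model gives 4950 steps
unanchored, 99 anchored and 945 with the bounded wildcard, and the
unanchored figure grows roughly with the square of the line length.
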
>
> >
> >>to messages, for example Microsoft outlook with the auto-bcc plugin.
> >
> > A lot of *curse-words* there:-)
>
> True, but a lot of not-that-uncommon *curse-words* there. Remember, management
> falls in the same class of curse words...

Oh, I live it... every day. Sigh.

--
-- Glenn
email: glenn < dot > steen < at > gmail < dot > com
work: glenn < dot > steen < at > ap1 < dot > se



