creating rules

Matt Kettler mkettler at EVI-INC.COM
Thu Sep 29 01:43:34 IST 2005


No. In summary, the answer is:

/Re\[\d{1,3}\]:/i

Nathan Olson wrote:
> I think, in summary, the answer to his immediate question is /Re\[(\d)*\]:/i

Ouch.. use of * and creation of a back-reference.. two points for the CPU wasters :)

I'd also question the accuracy. The above would also match Re[] with no numbers
in between. Although that's not likely to happen, I don't think it is intended,
and unintended hits are the bane of any regex writer.
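
A quick sketch of the accuracy point. It uses Python's re module as a stand-in for Perl (an assumption on my part; for these particular constructs the two behave the same way), and the subject lines are made up for illustration:

```python
import re

# Python stand-in for the Perl regexes discussed above.
loose = re.compile(r"Re\[(\d)*\]:", re.IGNORECASE)     # pattern from the quoted message
strict = re.compile(r"Re\[\d{1,3}\]:", re.IGNORECASE)  # pattern suggested at the top

# Hypothetical subject lines, invented for this example.
subjects = [
    "Re[2]: creating rules",
    "RE[15]: hello",
    "Re[]: no counter",
    "Re[12345]: too long",
]

# Map each subject to (matched by loose?, matched by strict?).
results = {s: (bool(loose.search(s)), bool(strict.search(s))) for s in subjects}

for subject, (hit_loose, hit_strict) in results.items():
    print(f"{subject!r}: loose={hit_loose} strict={hit_strict}")
```

The loose pattern hits all four subjects, including the empty Re[] and the five-digit counter. The {1,3} version only hits the first two.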


Regex performance tip 1: Never use () in a SA regex without really knowing what
you're doing. In particular, never parenthesize a single element like that unless
you intend to use a back-reference later (i.e. you know what \1 does and mean to
use it). The parens above serve no purpose except to capture a back-reference.
\d* would match the same text as (\d)*, but the latter burns more memory and CPU
to run.

Don't believe me? Try this:

$ perl -Mre=debug -e '/(\d)*/'
Compiling REx `(\d)*'
size 10 Got 84 bytes for offset annotations.

$ perl -Mre=debug -e '/\d*/'
Compiling REx `\d*'
size 3 Got 28 bytes for offset annotations.

Ouch.

If you must use parens, use (?: instead of (. The ?: tells perl to group without
storing a back-reference. Check the rules that come with SA; they ALL use this
syntax. For example:

body FIN_FREE                     /\bfinancial(?:ly)? free/i
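
To see the difference between the two kinds of parens, here is a minimal sketch, again in Python's re module rather than Perl (an assumption; the (?: syntax is the same in both). The sample text is hypothetical:

```python
import re

# Same rule body written with a capturing group and with a non-capturing group.
capturing = re.compile(r"\bfinancial(ly)? free", re.IGNORECASE)
non_capturing = re.compile(r"\bfinancial(?:ly)? free", re.IGNORECASE)

text = "Become financially free today"  # hypothetical message body

m1 = capturing.search(text)
m2 = non_capturing.search(text)

# Both patterns match the same text...
print(m1.group(0), "|", m2.group(0))

# ...but only the capturing version stores a group for later use.
print(capturing.groups, m1.groups())
print(non_capturing.groups, m2.groups())
```

Both versions match the same span of text. The only difference is that the first one keeps "ly" around as group 1, which a rule like this never uses.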



Regex performance tip 2 (reiterated from previous mail): don't use * or + in
your SA rules if you can avoid it. If you must, think LONG and HARD about the
maximal likely expansion size. In general, use the bounded {n,m} syntax instead
to avoid over-long expansions.

While *'s are generally safe in header rules, they can be catastrophic in body
rules, so it's just a good habit to avoid them altogether.
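
A rough illustration of the expansion problem, sketched in Python's re module as a stand-in for Perl (an assumption). The patterns, the limit of 40, and the body text are all hypothetical, picked only to show the effect:

```python
import re

# Unbounded vs. bounded gap between two words.
unbounded = re.compile(r"free.*money", re.IGNORECASE)
bounded = re.compile(r"free.{0,40}money", re.IGNORECASE)

# Hypothetical body: the two words are unrelated and far apart.
filler = "x" * 500
far_apart = "Feel free to call. " + filler + " The invoice lists the money owed."
close_by = "Get free money now"

# The unbounded pattern stretches across the whole filler to find a hit.
m = unbounded.search(far_apart)
span_len = m.end() - m.start()
print("unbounded span length:", span_len)

# The bounded pattern refuses the far-apart text but still hits the close one.
bounded_far = bounded.search(far_apart)
bounded_near = bounded.search(close_by)
print("bounded on far text:", bounded_far is not None)
print("bounded on close text:", bounded_near is not None)
```

The unbounded version happily spans several hundred characters of unrelated text, which is both an unintended hit and more work for the regex engine. The bounded version caps how far it will ever look.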
