MailScanner content scanning for keywords

Sat Jul 16 02:23:48 IST 2005

On Sat, 16 Jul 2005 07:30 am, Matt Kettler wrote:
> Spammers use thousands of variants of the word "Viagra", do you want to
> dictionary them all? 1 regex rule detects absurd numbers of possible
> spellings:
>
> /(?:\b|\s)[_\W]{0,3}(?:\\\/|V)[_\W]{0,3}[a4ij1!|l\xCC-\xCF\xEC-\xEF][_\W]{0,3}[ila40\xC0-\xC6\xE0-\xE6@][_\W]{0,3}[xyz]?[gj][_\W]{0,3}rr?[_\W]{0,3}[a40\xC0-\xC6\xE0-\xE6@][_\W]{0,3}x?[_\W]{0,3}(?:\b|\s)/i
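
To see what the quoted rule catches, here is a minimal sketch that runs it against a few spellings. It uses Python's re module rather than Perl, on the assumption that this pattern behaves the same in both. The pattern is rejoined from the wrapped lines, reading the class split across the wrap as [xyz]. The names OBFU_VIAGRA_RE and looks_obfuscated are made up for this illustration.

```python
import re

# The quoted pattern, rejoined from the mail-wrapped lines.
# Assumption: the class broken by the wrap as "[x" / "yz]" is "[xyz]".
OBFU_VIAGRA_RE = re.compile(
    r"(?:\b|\s)[_\W]{0,3}(?:\\\/|V)[_\W]{0,3}"
    r"[a4ij1!|l\xCC-\xCF\xEC-\xEF][_\W]{0,3}"
    r"[ila40\xC0-\xC6\xE0-\xE6@][_\W]{0,3}"
    r"[xyz]?[gj][_\W]{0,3}rr?[_\W]{0,3}"
    r"[a40\xC0-\xC6\xE0-\xE6@][_\W]{0,3}x?[_\W]{0,3}(?:\b|\s)",
    re.IGNORECASE,  # stands in for the trailing /i
)


def looks_obfuscated(text):
    """Return True if the pattern finds a match anywhere in text."""
    return OBFU_VIAGRA_RE.search(text) is not None


samples = [
    "Viagra",
    "V.i.a.g.r.a",
    "buy V1agra now",
    r"cheap \/iagra here",
    "Niagara Falls",
]
for sample in samples:
    print(repr(sample), looks_obfuscated(sample))
```

Note that the plain spelling matches too, so the rule is not limited to obfuscated forms. "Niagara Falls" does not match because nothing in it satisfies the leading V or \/ alternative.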

Good grief! That looks like a slightly extended version of the OBFU_VIAGRA
rule I wrote about a year ago... I can tell because it still has the
(?:\b|\s) groups which, I believe, can be replaced with [\b\s].  At least
that's how it reads in my custom SA rules /now/, and it works just the same
(and is faster in my testing).

Perl gurus: am I correct? Does (?:\b|\s) == [\b\s]?  If not, what's the
difference?  AFAICT, (?:...) groups a match without creating the $1-style
capture variable used to refer to the match later, and [...] does the same
thing except that it matches one character from a set.
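
One way to check the question empirically is the sketch below. It uses Python's re module as a stand-in for Perl, and that is an assumption about equivalence. Both document the relevant point the same way: inside a bracketed character class, \b means the backspace character (0x08), not a word boundary. In addition, \b outside a class is zero-width, while a character class always consumes exactly one character. The test word and the names ALTERNATION, CHAR_CLASS and compare are illustrative only.

```python
import re

# Zero-width word boundary OR one whitespace character, then the word.
ALTERNATION = re.compile(r"(?:\b|\s)viagra")

# One character that is either a backspace (0x08) or whitespace, then the word.
CHAR_CLASS = re.compile(r"[\b\s]viagra")


def compare(text):
    """Return (alternation_matched, char_class_matched) for text."""
    return (
        ALTERNATION.search(text) is not None,
        CHAR_CLASS.search(text) is not None,
    )


for text in ["viagra", "(viagra", " viagra", "\x08viagra", "xviagra"]:
    print(repr(text), compare(text))
```

The two forms agree when the word is preceded by whitespace, but they diverge at the start of a string or after punctuation. There the alternation matches through \b and the character class does not, because no backspace or whitespace character is available to consume.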

So if you have (?:a|b|c|d|...|z), isn't that exactly the same as [a-z]?
Obviously something like "fuss(?:ing|ed|y)?" is a case where you'd want the
(?:...) syntax - but I'm referring to matching individual characters.
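
For plain single characters, the equivalence can be checked directly. The sketch below again uses Python's re in place of Perl, which is an assumption. It compares a 26-way alternation with [a-z] over every ASCII character, and also shows that (?:...) adds no capture group where (...) adds one. It says nothing about speed, and all names in it are illustrative.

```python
import re
import string

# (?:a|b|c|...|z) built programmatically, and the equivalent class.
ALTERNATION = re.compile("(?:" + "|".join(string.ascii_lowercase) + ")")
CHAR_CLASS = re.compile("[a-z]")


def same_single_char_behaviour():
    """True if both patterns accept exactly the same single ASCII characters."""
    return all(
        (ALTERNATION.fullmatch(ch) is None) == (CHAR_CLASS.fullmatch(ch) is None)
        for ch in map(chr, range(128))
    )


def capture_group_counts():
    """Number of capture groups with and without the ?: prefix."""
    non_capturing = re.compile(r"fuss(?:ing|ed|y)?")
    capturing = re.compile(r"fuss(ing|ed|y)?")
    return non_capturing.groups, capturing.groups


print(same_single_char_behaviour())
print(capture_group_counts())
```

This equivalence holds only when every alternative is an ordinary one-character literal. It does not carry over to zero-width assertions such as \b.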

Cheers,

James
-- 
It is better never to have been born.  But who among us has such luck?
One in a million, perhaps.

