MailScanner content scanning for keywords
James Gray
james at grayonline.id.au
Sat Jul 16 02:23:48 IST 2005
On Sat, 16 Jul 2005 07:30 am, Matt Kettler wrote:
> Spammers use thousands of variants of the word "Viagra", do you want to
> dictionary them all? 1 regex rule detects absurd numbers of possible
> spellings:
>
> /(?:\b|\s)[_\W]{0,3}(?:\\\/|V)[_\W]{0,3}[a4ij1!|l\xCC-\xCF\xEC-\xEF][_\W]{0,3}[ila40\xC0-\xC6\xE0-\xE6@][_\W]{0,3}[xyz]?[gj][_\W]{0,3}rr?[_\W]{0,3}[a40\xC0-\xC6\xE0-\xE6@][_\W]{0,3}x?[_\W]{0,3}(?:\b|\s)/i
Good grief! That looks like a slightly extended version of the OBFU_VIAGRA
rule I wrote about a year ago... I can tell because it's still got the
(?:\b|\s) constructs which, as far as I can tell, can be replaced with
[\b\s]. At least that's how it reads in my custom SA rules /now/, and it
seems to work just the same (and is faster in my testing).
Perl gurus: am I correct? Does (?:\b|\s) == [\b\s]? If not, what's the
difference? AFAICT (?:...) groups a match without creating a $1-style
capture variable to refer to it later, and [...] does the same thing except
that it matches one character from a set.
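
One way to check the question empirically is the sketch below. It is written in Python rather than Perl, on the assumption that Python's re module treats these two constructs the way Perl does: outside a character class \b is a zero-width word boundary, while inside [...] it stands for the backspace character (0x08). The literal "viagra" is only a stand-in for the much longer rule body.

```python
import re

# Zero-width word boundary OR one whitespace character before the word.
group = re.compile(r"(?:\b|\s)viagra")

# Inside [...], \b means backspace (0x08), not a word boundary, so this
# class must always consume one real character before the word.
char_class = re.compile(r"[\b\s]viagra")

samples = ["viagra", "buy viagra", "buy\x08viagra"]
for text in samples:
    # Columns: sample text, does the group form match, does the class form match
    print(repr(text), bool(group.search(text)), bool(char_class.search(text)))
```

If that assumption holds, the two forms are not interchangeable: the class form cannot match at the very start of a body or directly after punctuation, because nothing there is whitespace or a backspace, whereas the group form can.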
So if you have (?:a|b|c|d|...|z) isn't that exactly the same as [a-z]?
Obviously something like "fuss(?:ing|ed|y)?" is where you'd want the
(?:...) syntax, but I'm referring to matching individual characters.
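
For plain single characters the two spellings can be compared directly. The following is a minimal sketch in Python, assuming its re module agrees with Perl on these constructs; the variable names are made up for illustration.

```python
import re
import string

# (?:a|b|c|...|z) built programmatically, and the equivalent class [a-z].
alternation = re.compile("(?:" + "|".join(string.ascii_lowercase) + ")")
char_class = re.compile("[a-z]")

# Compare both patterns on every ASCII character.
same = all(
    bool(alternation.fullmatch(chr(c))) == bool(char_class.fullmatch(chr(c)))
    for c in range(128)
)
print(same)

# Multi-character alternatives are where a group is needed.
suffix = re.compile(r"fuss(?:ing|ed|y)?")
words = ["fuss", "fussing", "fussed", "fussy", "fussx"]
print([suffix.fullmatch(w) is not None for w in words])

# (?:...) creates no capture group, while (...) creates one.
print(suffix.groups, re.compile(r"fuss(ing|ed|y)?").groups)
```

So for ordinary literal characters the alternation and the class accept the same input. The equivalence only covers items that mean the same thing inside and outside brackets, which is why an escape such as \b needs checking separately.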
Cheers,
James
--
It is better never to have been born. But who among us has such luck?
One in a million, perhaps.