search engine use in rule testing.

James Gray james at grayonline.id.au
Tue Jul 19 04:39:32 IST 2005



On Tue, 19 Jul 2005 04:14 am, Matt Kettler wrote:
> James Gray wrote:
> > On Sat, 16 Jul 2005 03:45 am, Matt Kettler wrote:
> >>(Note: the suggestion of using a web search is very powerful indeed.
> >>Gives you a very quick idea of things that could possibly match a
> >>word/phrase of interest.)
> >
> > I agree.  I've also made a little perl script that allows you to enter a
> > regex (cut-and-paste from SA rules etc) then it runs it against any
> > dictionaries you have installed on your system (/usr/share/dict/... etc).
> > I can make it available to the list if anyone is interested.
>
> That's quite interesting too. It serves a different purpose (testing a
> regex for unwanted matched words) than what I use a search engine for
> (testing a word/phrase for nonspam usage), but it's a damn useful thing to
> do.
>
> I'd be interested in that script for rule testing..

OK, I received a number of private mails requesting this script, so I figured 
I'd make it available to everyone.  It's provided "as-is" and is probably a 
pretty poor example of Perl ;)  I've hacked in a second dictionary in case 
people want to see how to add additional languages to the script.  If you need 
to check more than about 3 languages, put the dictionary files into an array 
and walk that in one step (it's neater - see the code for what I mean).
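
For anyone who wants the "array of dictionaries" idea spelled out, here is a
rough sketch. It is written in Python, not the Perl of the attached script,
and it is not taken from that script: the function name search_dictionaries
and the DICTIONARIES list are made up for illustration, and the two paths are
just the ones that show up in the sample run. Python's re module also does
not support every Perl regex construct, so treat it as an approximation.

```python
import re
from pathlib import Path

# Assumed list of word lists; these two paths are the ones shown in the sample run.
DICTIONARIES = [
    "/usr/share/dict/british-english-large",
    "/usr/share/dict/american-english-large",
]


def search_dictionaries(pattern, dictionaries=DICTIONARIES):
    """Return {path: [matching words]} for every dictionary file that exists."""
    regex = re.compile(pattern, re.IGNORECASE)
    results = {}
    for path in dictionaries:  # one loop, however many languages
        if not Path(path).is_file():
            continue  # skip word lists that are not installed
        with open(path, encoding="utf-8", errors="replace") as fh:
            words = (line.strip() for line in fh)
            results[path] = [w for w in words if w and regex.search(w)]
    return results


if __name__ == "__main__":
    for path, words in search_dictionaries(r"p[e3]n[i1][s5]").items():
        print(f"Searching {path} dictionary:")
        print("\n".join(words))
        print(f"{len(words)} dictionary matches - {path}")
```

Adding a language is then a one-line change to the list, with no extra
search code to copy and paste.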

Basically you just run it (it doesn't take command line args), follow the 
instructions and read the results.  Here's a sample interaction:

$ ./regex_test.pl

This program takes a Perl REGEX and does a case insensitive
check against an arbitrary string you specify (spam string)
It will then search the standard dictionary for possible
matches.
-----------------------------------------------------------
ASSUMPTIONS:
- REGEX delimiter '/' so escape any fwd slashes!
  eg, ".+\/foo\/.+" (without the "")is a sample
  of a valid regex for this tool
-----------------------------------------------------------

Enter the Perl REGEX (req'd): p[e3]n[i1][s5]
Enter the spam string (req'd): p3ni5
Enter an (optional) e-mail message or file to test: test1.eml

REGEX matches p3ni5

Dictonary Search:
(Ideally this should return as few as possible)
Searching /usr/share/dict/british-english-large dictionary:
penis
penis's
penises
3 dictionary matches - /usr/share/dict/british-english-large

Searching /usr/share/dict/american-english-large dictionary:
penis
penis's
penises
3 dictionary matches - /usr/share/dict/american-english-large

Testing test1.eml:
13: PENIS HAD ITS OWN OPINION ON THIS QUESTION.<BR>
1 file matches
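
The other two checks in the run above (does the regex hit the spam string,
and which lines of a message does it hit) can be sketched the same way.
Again this is Python rather than Perl and only a guess at the behaviour seen
in the output, not the script's actual code; check_spam_string and
matching_lines are hypothetical names.

```python
import re


def check_spam_string(pattern, spam_string):
    """True if the case-insensitive regex matches anywhere in the spam string."""
    return re.search(pattern, spam_string, re.IGNORECASE) is not None


def matching_lines(pattern, lines):
    """Return (line_number, text) pairs for each line the regex matches.

    Line numbers start at 1, as in the "13: ..." line of the sample output.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if regex.search(line)
    ]
```

To test a message file, open it and pass the file object in, e.g.
matching_lines(pattern, open("test1.eml", errors="replace")), then print
each pair as "number: text" followed by the count.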

Feel free to modify the script - it's GPL'ed and all my code (dunno if I 
should actually admit that or not!) :)  I hope people find it useful.

Cheers,

James
-- 
A professor is one who talks in someone else's sleep.


    [ Attachment: Part 2, application/x-perl, 3.6KB (not shown in this archive). ]



