search engine use in rule testing.
James Gray
james at grayonline.id.au
Tue Jul 19 04:39:32 IST 2005
On Tue, 19 Jul 2005 04:14 am, Matt Kettler wrote:
> James Gray wrote:
> > On Sat, 16 Jul 2005 03:45 am, Matt Kettler wrote:
> >>(Note: the suggestion of using a web search is very powerful indeed.
> >>Gives you a very quick idea of things that could possibly match a
> >>word/phrase of interest.)
> >
> > I agree. I've also made a little perl script that allows you to enter a
> > regex (cut-and-paste from SA rules etc) then it runs it against any
> > dictionaries you have installed on your system (/usr/share/dict/... etc).
> > I can make it available to the list if anyone is interested.
>
> That's quite interesting too. It serves a different purpose (testing a
> regex for unwanted matched words) than what I use a search engine for
> (testing a word/phrase for nonspam usage), but it's a damn useful thing to
> do.
>
> I'd be interested in that script for rule testing..
OK, I received a number of private mails requesting this script, so I figured
I'd make it available to everyone. It's provided "as-is" and is probably a
pretty poor example of Perl ;) I've hacked in a second dictionary in case
people want to see how to add additional languages to the script. If you need
to check more than about three languages, put the dictionary files into an array
and walk that in one step (it's neater; see the code for what I mean).
Basically, you just run it (it doesn't take command-line args), follow the
instructions and read the results. Here's a sample interaction:
$ ./regex_test.pl
This program takes a Perl REGEX and does a case insensitive
check against an arbitrary string you specify (spam string)
It will then search the standard dictionary for possible
matches.
-----------------------------------------------------------
ASSUMPTIONS:
- REGEX delimiter '/' so escape any fwd slashes!
eg, ".+\/foo\/.+" (without the "")is a sample
of a valid regex for this tool
-----------------------------------------------------------
Enter the Perl REGEX (req'd): p[e3]n[i1][s5]
Enter the spam string (req'd): p3ni5
Enter an (optional) e-mail message or file to test: test1.eml
REGEX matches p3ni5
Dictonary Search:
(Ideally this should return as few as possible)
Searching /usr/share/dict/british-english-large dictionary:
penis
penis's
penises
3 dictionary matches - /usr/share/dict/british-english-large
Searching /usr/share/dict/american-english-large dictionary:
penis
penis's
penises
3 dictionary matches - /usr/share/dict/american-english-large
Testing test1.eml:
13: PENIS HAD ITS OWN OPINION ON THIS QUESTION.<BR>
1 file matches
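For anyone reading this without the attachment, the two checks in the session above (the dictionary search and the message-file test) boil down to something like the following. This is only a rough sketch in Python, not the attached Perl script: the function names, the list of dictionary paths and the skip-if-missing behaviour are assumptions made for illustration. Python's re syntax is close to Perl's but not identical, so some SA rules may need adjusting before they compile.

```python
import re
from pathlib import Path


def dictionary_matches(pattern, dict_paths):
    """Return {path: [words]} for every word the regex matches, ignoring case."""
    regex = re.compile(pattern, re.IGNORECASE)
    results = {}
    for path in dict_paths:
        p = Path(path)
        if not p.is_file():
            continue  # assumption: silently skip dictionaries that are not installed
        # Older word lists are often Latin-1; replace undecodable bytes rather than fail.
        with p.open(encoding="utf-8", errors="replace") as fh:
            words = (line.strip() for line in fh)
            results[str(path)] = [w for w in words if w and regex.search(w)]
    return results


def file_matches(pattern, path):
    """Return (line_number, line) pairs for each line of a message file that matches."""
    regex = re.compile(pattern, re.IGNORECASE)
    with open(path, encoding="utf-8", errors="replace") as fh:
        return [(n, line.rstrip("\n"))
                for n, line in enumerate(fh, start=1)
                if regex.search(line)]


if __name__ == "__main__":
    # Dictionaries kept in one list, so adding a language is one more entry.
    dictionaries = [
        "/usr/share/dict/british-english-large",
        "/usr/share/dict/american-english-large",
    ]
    rule = r"p[e3]n[i1][s5]"
    for path, words in dictionary_matches(rule, dictionaries).items():
        print(f"Searching {path} dictionary:")
        for word in words:
            print(word)
        print(f"{len(words)} dictionary matches - {path}")
```

Keeping the dictionaries in a single list and walking it in one loop is the "array" approach mentioned earlier; the fewer ordinary words a rule pulls out of the dictionaries, the less likely it is to hit legitimate mail.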
Feel free to modify the script - it's GPL'ed and all my code (dunno if I
should actually admit that or not!) :) I hope people find it useful.
Cheers,
James
--
A professor is one who talks in someone else's sleep.
------------------------ MailScanner list ------------------------
To unsubscribe, email jiscmail at jiscmail.ac.uk with the words:
'leave mailscanner' in the body of the email.
Before posting, read the Wiki (http://wiki.mailscanner.info/) and
the archives (http://www.jiscmail.ac.uk/lists/mailscanner.html).
Support MailScanner development - buy the book off the website!
[ Part 2, Application/X-PERL 3.6KB. ]