Detecting grids of drug names
Matt Kettler
mkettler at EVI-INC.COM
Mon Nov 14 21:06:21 GMT 2005
Julian Field wrote:
> I have produced a rule which detects grids of letters. They are using a
> table trick to rotate the words by 90 degrees so the letters of the
> first column all come first, followed by all the letters of the second
> column and so on. This stops you detecting words with HTML junk in
> between the letters.
>
> But I can now detect these grids:
>
> rawbody JKF_DRUG_GRID1 /(\>([[:alpha:]]\s){4}[[:alpha:]].*){4}\>/i
> describe JKF_DRUG_GRID1 Grid of letters rotated to produce drug names
> score JKF_DRUG_GRID1 4.5
>
> This detects grids of at least 4x4 characters, which is small enough to
> detect drug names.
> The first "4" sets the minimum number of rows in the grid, the second
> "4" sets the minimum number of columns.
>
> Quite succinct once you work out what you are looking for :-)
> All improvements and comments are most welcome.
>
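For anyone who wants to try the quoted grid rule outside SpamAssassin, here is a rough Python sketch of the same pattern. It is only an approximation: Python's re module has no POSIX [[:alpha:]] class, so [A-Za-z] stands in for it, and the "\>" in the Perl rule is treated as a literal ">". The names GRID_RE, looks_like_letter_grid, sample_grid and sample_plain are made up for this illustration, and the sample markup is invented, not taken from real spam.

```python
import re

# Approximate translation of the JKF_DRUG_GRID1 pattern.
# Assumptions: [A-Za-z] replaces [[:alpha:]], and "\>" is a literal ">".
GRID_RE = re.compile(r"(?:>(?:[A-Za-z]\s){4}[A-Za-z].*){4}>", re.IGNORECASE)


def looks_like_letter_grid(raw_html: str) -> bool:
    """Return True if the raw HTML contains at least four tag-delimited
    runs of five or more single letters separated by whitespace."""
    return GRID_RE.search(raw_html) is not None


# Invented sample: four table cells, each holding five spaced-out letters.
# "." does not match a newline here, so the sample is kept on one line.
sample_grid = (
    "<td>V C X V A</td><td>I I A A M</td>"
    "<td>A A N L B</td><td>G L A I I</td>"
)
sample_plain = "<p>Hello, this is an ordinary message.</p>"

if __name__ == "__main__":
    print(looks_like_letter_grid(sample_grid))
    print(looks_like_letter_grid(sample_plain))
```

Changing the inner {4} or the outer {4} adjusts how long each run of letters must be and how many runs are required.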
Julian, I had a similar concept on Friday.
Mine work a bit differently: they look for a specific drug name in the
post-HTML-stripped text. Thus far they work quite well, but I've kept the scores
low as I'm still testing them.
See attached.
------------------------ MailScanner list ------------------------
To unsubscribe, email jiscmail at jiscmail.ac.uk with the words:
'leave mailscanner' in the body of the email.
Before posting, read the Wiki (http://wiki.mailscanner.info/) and
the archives (http://www.jiscmail.ac.uk/lists/mailscanner.html).
Support MailScanner development - buy the book off the website!
[ Part 2: "Attached Text" ]
body L_COLUMN_VIAG /\bv(?:\s\w){4,6}\si(?:\s\w){4,6}\sa(?:\s\w){4,6}\sg(?:\s\w){4,6}\sr(?:\s\w){4,6}\sa\b/i
describe L_COLUMN_VIAG looks like a column-obfuscated v-pill ad
score L_COLUMN_VIAG 0.5
body L_COLUMN_XAN /\bX(?:\s\w){4,6}\sA(?:\s\w){4,6}\sN(?:\s\w){4,6}\sA(?:\s\w){4,6}\sX\b/i
describe L_COLUMN_XAN looks like a column-obfuscated x-pill ad
score L_COLUMN_XAN 0.5
body L_COLUMN_CIA /\bC(?:\s\w){4,6}\sI(?:\s\w){4,6}\sA(?:\s\w){4,6}\sL(?:\s\w){4,6}\sI(?:\s\w){4,6}\sS\b/i
describe L_COLUMN_CIA looks like a column-obfuscated C-pill ad
score L_COLUMN_CIA 0.5
body L_COLUMN_VAL /\bV(?:\s\w){4,6}\sA(?:\s\w){4,6}\sL(?:\s\w){4,6}\sI(?:\s\w){4,6}\sU(?:\s\w){4,6}\sM\b/i
describe L_COLUMN_VAL looks like a column-obfuscated val-pill ad
score L_COLUMN_VAL 0.5
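The four rules above share one shape: each letter of the drug name is separated from the next by four to six single characters, which corresponds to reading a 5 to 7 column grid row by row. The Python sketch below builds that pattern from a word and tries it on an invented grid. The helper names (column_pattern, render_grid) and the sample words are assumptions made for illustration only; they are not part of SpamAssassin or of the attached rules.

```python
import re


def column_pattern(word: str, min_gap: int = 4, max_gap: int = 6) -> str:
    """Build a regex in the style of the L_COLUMN_* rules: the letters of
    `word`, each separated by min_gap..max_gap single characters."""
    gap = rf"(?:\s\w){{{min_gap},{max_gap}}}\s"
    return r"\b" + gap.join(re.escape(ch) for ch in word) + r"\b"


def render_grid(words: list[str]) -> str:
    """Lay equal-length words out as columns and return the text read row
    by row, roughly as it might appear once the HTML is stripped."""
    height = len(words[0])
    rows = [" ".join(w[i] for w in words) for i in range(height)]
    return "\n".join(rows)


# Invented five-column grid; XANAXZ is padded so every column has six letters.
grid_text = render_grid(["VIAGRA", "CIALIS", "XANAXZ", "VALIUM", "AMBIEN"])


def matches(word: str, text: str) -> bool:
    """Case-insensitive search, mirroring the /i flag on the rules."""
    return re.search(column_pattern(word), text, re.IGNORECASE) is not None


if __name__ == "__main__":
    print(grid_text)
    for name in ("viagra", "xanax", "cialis", "valium"):
        print(name, matches(name, grid_text))
```

Generating the pattern from the word keeps the gap range in one place, so widening it to cover larger grids means changing only min_gap and max_gap.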