Detecting grids of drug names

Matt Kettler mkettler at EVI-INC.COM
Mon Nov 14 21:06:21 GMT 2005


Julian Field wrote:
> I have produced a rule which detects grids of letters. They are using a
> table trick to rotate the words by 90 degrees so the letters of the
> first column all come first, followed by all the letters of the second
> column and so on. This stops you detecting words with HTML junk in
> between the letters.
> 
> But I can now detect these grids:
> 
> rawbody  JKF_DRUG_GRID1 /(\>([[:alpha:]]\s){4}[[:alpha:]].*){4}\>/i
> describe JKF_DRUG_GRID1 Grid of letters rotated to produce drug names
> score    JKF_DRUG_GRID1 4.5
> 
> This detects grids of at least 4x4 characters, which is small enough to
> detect drug names.
> The first "4" sets the minimum number of rows in the grid, the second
> "4" sets the minimum number of columns.
> 
> Quite succinct once you work out what you are looking for :-)
> All improvements and comments are most welcome.
> 
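
For anyone who wants to try the quoted rule outside SpamAssassin, here is a
rough Python sketch. It rests on several assumptions: Python's re module has
no [[:alpha:]] class, so [a-z] with IGNORECASE stands in for it; \> is
treated as a literal ">"; and the sample line is made up for illustration
(one table cell per column, letters separated by whitespace). Real spam
markup will likely differ.

```python
import re

# Approximate Python translation of JKF_DRUG_GRID1 (an assumption, not the
# rule as SpamAssassin runs it). Each repetition of the outer group wants a
# ">" followed by four letter+whitespace pairs and one more letter, that is,
# five letters per cell. The outer {4} asks for four such cells on one line.
GRID = re.compile(r"(>([a-z]\s){4}[a-z].*){4}>", re.IGNORECASE)


def looks_like_grid(raw_line: str) -> bool:
    """Return True if a single raw HTML line matches the grid pattern."""
    return GRID.search(raw_line) is not None


# Hypothetical raw HTML lines, not taken from any real message.
four_cells = (
    "<tr><td>v c v q z</td><td>i i a w x</td>"
    "<td>a a l e c</td><td>g l i r v</td></tr>"
)
three_cells = "<tr><td>v c v q z</td><td>i i a w x</td><td>a a l e c</td></tr>"

print(looks_like_grid(four_cells))
print(looks_like_grid(three_cells))
print(looks_like_grid("<p>Hello world</p>"))
```

The pattern only cares about the shape of the markup, not which letters
appear, so it should fire on any word list laid out this way.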


Julian, I had a similar concept on Friday.

Mine work a bit differently: they look for a specific drug name in the
HTML-stripped text. So far they work quite well, but I've kept the scores
low while I'm still testing them.

See attached.

------------------------ MailScanner list ------------------------
To unsubscribe, email jiscmail at jiscmail.ac.uk with the words:
'leave mailscanner' in the body of the email.
Before posting, read the Wiki (http://wiki.mailscanner.info/) and
the archives (http://www.jiscmail.ac.uk/lists/mailscanner.html).

Support MailScanner development - buy the book off the website!

    [ Part 2: "Attached Text" ]

body L_COLUMN_VIAG      /\bv(?:\s\w){4,6}\si(?:\s\w){4,6}\sa(?:\s\w){4,6}\sg(?:\s\w){4,6}\sr(?:\s\w){4,6}\sa\b/i
describe L_COLUMN_VIAG looks like a column-obfuscated v-pill ad
score L_COLUMN_VIAG     0.5

body L_COLUMN_XAN      /\bX(?:\s\w){4,6}\sA(?:\s\w){4,6}\sN(?:\s\w){4,6}\sA(?:\s\w){4,6}\sX\b/i
describe L_COLUMN_XAN looks like a column-obfuscated x-pill ad
score L_COLUMN_XAN     0.5

body L_COLUMN_CIA      /\bC(?:\s\w){4,6}\sI(?:\s\w){4,6}\sA(?:\s\w){4,6}\sL(?:\s\w){4,6}\sI(?:\s\w){4,6}\sS\b/i
describe L_COLUMN_CIA looks like a column-obfuscated C-pill ad
score L_COLUMN_CIA     0.5

body L_COLUMN_VAL      /\bV(?:\s\w){4,6}\sA(?:\s\w){4,6}\sL(?:\s\w){4,6}\sI(?:\s\w){4,6}\sU(?:\s\w){4,6}\sM\b/i
describe L_COLUMN_VAL looks like a column-obfuscated val-pill ad
score L_COLUMN_VAL    0.5
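
To see why the {4,6} quantifier matters, here is a small Python sketch that
runs the L_COLUMN_VIAG pattern against text in column order. The regex is
copied unchanged from the rule above. The column_order helper and the word
lists are hypothetical, made up only to imitate what the stripped text might
look like. Since each target letter must be followed by 4 to 6 other
whitespace-separated characters, the rule should only fire on grids of five
to seven rows.

```python
import re

# L_COLUMN_VIAG from the attachment, copied unchanged. Python's re module
# accepts the same \b, \s, \w and {m,n} syntax used here.
COLUMN_VIAG = re.compile(
    r"\bv(?:\s\w){4,6}\si(?:\s\w){4,6}\sa(?:\s\w){4,6}"
    r"\sg(?:\s\w){4,6}\sr(?:\s\w){4,6}\sa\b",
    re.IGNORECASE,
)


def column_order(rows):
    """Hypothetical helper: emit equal-length words column by column."""
    assert len({len(r) for r in rows}) == 1, "rows must be the same length"
    return " ".join(" ".join(col) for col in zip(*rows))


# Made-up grids: the last two rows are just filler letters.
five_rows = ["viagra", "cialis", "valium", "qwerty", "zxcvbn"]
three_rows = five_rows[:3]

text5 = column_order(five_rows)   # first letters of every row come first
text3 = column_order(three_rows)

print(text5)
print(bool(COLUMN_VIAG.search(text5)))  # 4 fillers between target letters
print(bool(COLUMN_VIAG.search(text3)))  # only 2 fillers, below the minimum
```

The three-row grid is skipped because only two characters sit between
consecutive target letters. Widening the quantifier would catch smaller
grids, probably at the cost of more false positives.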


