multiple garbage words/bayes

Kevin Spicer kevins at BMRB.CO.UK
Mon Jan 26 19:03:28 GMT 2004

On Mon, 2004-01-26 at 18:46, Dustin Baer wrote:

> body   MULTI_WORD /\w{4,} \w{4,} \w{4,} \w{4,} \w{4,} \w{4,} \w{4,}
> \w{4,} \w{4,} \w{4,} \w{4,} \w{4,} \w{4,} \w{4,} \w{4,} \w{4,} \w{4,}
> \w{4,} \w{4,} \w{4,} \w{4,} \w{4,} \w{4,} \w{4,} \w{4,} \w{4,} \w{4,}
> \w{4,} \w{4,} \w{4,}/i
> describe MULTI_WORD A lot of 4-letter words, with no punctuation
> score MULTI_WORD 0.1
> Since I am not a Perl master, can anyone suggest an easier way to write
> it?
Nice idea, I think.

I'm not a Perl master either, but I'd suggest...

/(\w{4,} ){30,}/

(the trailing i is not required, since \w already matches both upper-
and lower-case letters)

You might further allow varying amounts of whitespace (spaces, tabs,
etc.) between the words, e.g. with \s+ instead of a single space.  It
might also be worthwhile to disable capturing of the parenthesized part
of the expression by writing it as (?:...); if memory serves, this may
make it faster.
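
Putting both of those suggestions together, a minimal sketch of the rule
might look like the one below. It is untested against real mail. The
rule name and score are carried over from the quoted post, and the
describe text is reworded to match the pattern; the use of \s+ for
"any whitespace" and of a non-capturing (?:...) group are assumptions
about how one would spell those two ideas in a Perl regex.

```
# Sketch only: 30 consecutive words of 4+ word characters, separated
# by any run of whitespace (spaces, tabs, newlines).
# (?:...) groups without capturing; no /i is needed because \w
# matches upper- and lower-case letters alike.
body     MULTI_WORD /\w{4,}(?:\s+\w{4,}){29}/
describe MULTI_WORD 30 consecutive words of 4+ characters, no punctuation
score    MULTI_WORD 0.1
```

Writing it as one word followed by 29 repetitions of "whitespace plus
word" matches the same 30-word run as the original long form, whereas
/(\w{4,} ){30,}/ also needs a space after the last word. A fixed count
should be enough here, since the rule hits as soon as 30 such words are
found in a row.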


