Which messages to feed to Bayes?

Eric Dantan Rzewnicki rzewnickie at RFA.ORG
Fri Feb 27 21:01:32 GMT 2004


On Fri, Feb 27, 2004 at 02:23:15PM -0500, Eric Dantan Rzewnicki wrote:
> On Thu, Feb 26, 2004 at 03:47:45PM -0800, Michael St. Laurent wrote:
> > Matt Kettler <mailto:mkettler at EVI-INC.COM> wrote:
> > > At 04:55 PM 2/26/2004, Michael St. Laurent wrote:
> > >> Should we be feeding the Bayes engine in Spamassassin messages that
> > > My answer is a very emphatic YES!
> > Excellent.  Okay, what about spam messages that have lost their headers
> > because the user forwarded them to me (Outlook strips the headers when you do
> > that).  Will it still benefit from looking at just the body of the message?

For what it's worth, below is a cleaned-up version of what I just
posted. I just ran it, and it seems to work OK. In previous tests, and
again just now, an instance of grepmail occasionally exits in the middle
of the for loop with a SIGSEGV:

xargs: grepmail: terminated by signal 11

The script then continues with the next run through the loop. Again, I'm
not entirely sure what I'm doing and welcome any constructive criticism.


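#!/bin/bash

# User-forwarded spam reports, and the MailScanner archive directory.
spam_reports=/home/rzewnickie/SPAM-USER-FEEDBACK
archive=/var/spool/MailScanner/archive

# Octal ranges covering ASCII punctuation. Many of these are protected by
# quoting and may not need to be here. But, I'm not sure which ...
special_chars='\041-\055\072-\077\133-\140\173-\177'

# Collect the unique Subject: lines that follow each forwarded-message marker.
subjects=$(grep -A6 '^-----Original Message-----$' "$spam_reports" |
    grep '^Subject: ' | sort -u)

echo "$subjects"

# For each archive mbox, turn every special character in the subjects into
# "." (which still matches the original character as a regex wildcard), then
# have grepmail append messages whose headers match to /tmp/spam.
for mbox in "$archive"/20*; do
    echo "$subjects" |
    perl -pe "s/[$special_chars]/./g" |
    xargs --replace grepmail -u -h "^{}" "$mbox" >> /tmp/spam
done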


