Which messages to feed to Bayes?
Eric Dantan Rzewnicki
rzewnickie at RFA.ORG
Fri Feb 27 21:01:32 GMT 2004
On Fri, Feb 27, 2004 at 02:23:15PM -0500, Eric Dantan Rzewnicki wrote:
> On Thu, Feb 26, 2004 at 03:47:45PM -0800, Michael St. Laurent wrote:
> > Matt Kettler <mailto:mkettler at EVI-INC.COM> wrote:
> > > At 04:55 PM 2/26/2004, Michael St. Laurent wrote:
> > >> Should we be feeding the Bayes engine in Spamassassin messages that
> > > My answer is a very emphatic YES!
> > Excellent. Okay, what about spam messages that have lost their headers
> > because the user forwarded it to me (Outlook strips the headers when you do
> > that). Will it still benefit from looking at just the body of the message?
For what it's worth, below is a cleaned-up version of the script I just
posted. I just ran it, and it seems to work OK. In previous tests, and
again just now, an instance of grepmail occasionally exited in the middle
of the for loop with a SIGSEGV:
xargs: grepmail: terminated by signal 11
The script then continues with the next pass through the loop. Again, I'm
not entirely sure what I'm doing and welcome any constructive criticism.
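#!/bin/bash
# User-forwarded spam reports, and the directory of archived mboxes.
spam_reports=/home/rzewnickie/SPAM-USER-FEEDBACK
archive=/var/spool/MailScanner/archive
# Octal ranges covering ASCII punctuation (except "." and "/") plus DEL.
# many of these are protected by quoting and may not need to
# be here. But, I'm not sure which ...
special_chars="\041-\055\072-\077\133-\140\173-\177"
# Collect the unique Subject: lines from the forwarded reports.
subjects=$(grep -A6 "^-----Original Message-----$" "$spam_reports" |
  grep "^Subject: " | sort -u)
echo "$subjects"
# Search each archived mbox for messages whose headers match a subject.
for mbox in "$archive"/20*; do
  # Replace special characters with "." so they act as regex wildcards.
  echo "$subjects" |
    perl -p -e "s/[$special_chars]/./g" |
    xargs --replace grepmail -u -h "^{}" "$mbox" >> /tmp/spam
done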
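
To make the substitution step easier to check on its own, here is a minimal sketch of the same idea as a standalone function. The name subject_to_pattern is hypothetical and not part of the script above, and it uses sed with the POSIX [[:punct:]] class instead of the octal ranges. That makes it a close variation rather than an exact equivalent: [[:punct:]] also matches "/" (and "." itself, which is a no-op), and it does not match DEL.

```shell
#!/bin/bash
# Hypothetical helper (an illustration, not the posted script's method):
# turn a Subject line into a loose regex by replacing every ASCII
# punctuation character with ".", which matches any single character.
subject_to_pattern() {
  # LC_ALL=C keeps [[:punct:]] limited to ASCII punctuation.
  printf '%s\n' "$1" | LC_ALL=C sed 's/[[:punct:]]/./g'
}

# Example: print the pattern for one made-up subject line.
subject_to_pattern 'Subject: Re: [FREE] $$$ now!'
```

Running the function by hand on a few awkward subjects shows exactly what pattern would be handed to grepmail, without having to search the archive.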