Training SpamAssassin Bayes

Casey T. Deccio casey at deccio.net
Wed Aug 16 05:53:06 IST 2006


I'm running a Debian system with Exim4, MailScanner, SpamAssassin, and 
Courier-IMAP.  With the default SpamAssassin settings (including 
auto-learn), about half of the spam emails were incorrectly classified 
as ham.  I recently wrote a script (see below) that runs daily as a cron 
job to train the Bayes database, but spam classification has only gotten 
worse since then.  Any ideas?

Thanks,
Casey

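#!/bin/sh
# Daily Bayes training for SpamAssassin: learn ham from each user's inbox
# and spam from each user's Junk folder, then delete week-old spam.

SALEARN=/usr/bin/sa-learn
PREFS=/etc/MailScanner/spam.assassin.prefs.conf
JUNK=.Junk

if [ ! -x "$SALEARN" ]; then
     exit 1
fi

# Temporary lists of message files; removed again when the script exits
HAM=$(mktemp) || exit 1
SPAM=$(mktemp) || exit 1
OLDSPAM=$(mktemp) || exit 1
trap 'rm -f "$HAM" "$SPAM" "$OLDSPAM"' EXIT

# Learn HAM: every message in an inbox's cur/ that is older than the
# most recent day but newer than about a week
for dir in /home/*/Maildir/cur; do
     [ -d "$dir" ] || continue
     find "$dir" -type f -daystart -ctime -7 -ctime +0 >> "$HAM"
done
[ -s "$HAM" ] && "$SALEARN" -p "$PREFS" --ham -f "$HAM"

# Learn SPAM (the {cur,new} brace expansion is written out, since plain
# sh does not support it)
for dir in /home/*/Maildir/"$JUNK"/cur /home/*/Maildir/"$JUNK"/new; do
     [ -d "$dir" ] || continue
     find "$dir" -type f -daystart -ctime +0 >> "$SPAM"

     # Collect old spam (a week or older) for deletion
     find "$dir" -type f -daystart -ctime +6 >> "$OLDSPAM"
done
[ -s "$SPAM" ] && "$SALEARN" -p "$PREFS" --spam -f "$SPAM"

# Delete the old spam, one path per line
while IFS= read -r file; do
     rm -f -- "$file"
done < "$OLDSPAM"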
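
One way to check whether the daily run is actually changing the Bayes 
database is to compare its learned-message counters before and after the 
cron job.  The sketch below is only an illustration: it assumes that 
`sa-learn --dump magic` prints the counters as lines ending in "nspam" 
and "nham" with the count in the third column, and the function name 
bayes_counts is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: read `sa-learn --dump magic` output on stdin and
# print the number of learned spam and ham messages.
# Assumption: the counter lines end in "nspam" / "nham" and carry the
# count in the third whitespace-separated column.
bayes_counts() {
    awk '$NF == "nspam" { spam = $3 }
         $NF == "nham"  { ham = $3 }
         END { printf "spam=%d ham=%d\n", spam, ham }'
}

# Usage (needs sa-learn and the same prefs file as the script above):
#   sa-learn -p /etc/MailScanner/spam.assassin.prefs.conf --dump magic | bayes_counts
```

If the counters do not move after a run, the training may be going into 
a different Bayes database than the one the scanner reads.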


