Training SpamAssassin Bayes
Casey T. Deccio
casey at deccio.net
Wed Aug 16 05:53:06 IST 2006
I'm using a Debian system with
Exim4/MailScanner/SpamAssassin/Courier-IMAP. With the default
SpamAssassin settings (including auto-learn), about half of the spam
was incorrectly classified as ham. I recently wrote the script below
to run daily as a cron job, but spam classification has only gotten
worse since then. Any ideas?
Thanks,
Casey
#!/bin/sh
SALEARN=/usr/bin/sa-learn
PREFS=/etc/MailScanner/spam.assassin.prefs.conf
JUNK=.Junk
HAM=`mktemp`
SPAM=`mktemp`
OLDSPAM=`mktemp`
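if [ ! -x "$SALEARN" ]; then
exit 1
fi
# Learn HAM: read inbox messages changed within the past week, but not today
for dir in /home/*/Maildir/cur; do
# An unmatched glob is passed through literally, so skip non-directories
[ -d "$dir" ] || continue
find "$dir" -type f -daystart -ctime -7 -ctime +0 >> "$HAM"
done
[ -s "$HAM" ] && "$SALEARN" -p "$PREFS" --ham -f "$HAM"
rm -f "$HAM"
# Learn SPAM: everything in each Junk folder from before today.
# cur and new are listed explicitly because {cur,new} brace expansion
# is a bash feature and is not performed by a POSIX /bin/sh such as dash.
for dir in /home/*/Maildir/$JUNK/cur /home/*/Maildir/$JUNK/new; do
[ -d "$dir" ] || continue
find "$dir" -type f -daystart -ctime +0 >> "$SPAM"
# Delete old spam (a week or older)
find "$dir" -type f -daystart -ctime +6 >> "$OLDSPAM"
done
[ -s "$SPAM" ] && "$SALEARN" -p "$PREFS" --spam -f "$SPAM"
rm -f "$SPAM"
[ -s "$OLDSPAM" ] && xargs rm -f < "$OLDSPAM"
rm -f "$OLDSPAM"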
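For anyone testing the Junk-folder loop outside cron, here is a minimal sketch in POSIX sh of how the cur and new subdirectories can be enumerated without bash-only brace expansion, producing the one-path-per-line list that sa-learn's -f option reads. It assumes a throwaway directory (ROOT) standing in for /home and two hypothetical users, alice and bob; ROOT and JUNK_LIST are illustrative names, not part of the script above, and sa-learn itself is not called.

```shell
#!/bin/sh
# Sketch only: ROOT (stand-in for /home) and JUNK_LIST are hypothetical names.
set -eu

ROOT=$(mktemp -d)
JUNK=.Junk
JUNK_LIST=$(mktemp)
# Remove the scratch tree and list when the shell exits.
trap 'rm -rf "$ROOT" "$JUNK_LIST"' EXIT

# Build a throwaway tree: each user gets one message in Junk cur/ and one in new/.
for user in alice bob; do
    mkdir -p "$ROOT/$user/Maildir/$JUNK/cur" "$ROOT/$user/Maildir/$JUNK/new"
    touch "$ROOT/$user/Maildir/$JUNK/cur/1.msg" "$ROOT/$user/Maildir/$JUNK/new/2.msg"
done

# List cur and new explicitly instead of relying on {cur,new} brace expansion.
for dir in "$ROOT"/*/Maildir/"$JUNK"/cur "$ROOT"/*/Maildir/"$JUNK"/new; do
    # An unmatched glob stays literal, so skip anything that is not a directory.
    [ -d "$dir" ] || continue
    find "$dir" -type f >> "$JUNK_LIST"
done

# Count the collected paths (one per line).
wc -l < "$JUNK_LIST"
```

Running it reports four paths for this tree; in the real script the equivalent list is what gets handed to sa-learn --spam -f.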