SpamAssassin auto-learning
Nicolas Viers - SCI
viers at UNILIM.FR
Tue Apr 6 14:25:22 IST 2004
Hi,
I use MS+SA (MailScanner + SpamAssassin) with Bayes enabled.
I also use the spamlearn script (attached) to force SA auto-learning.
But I receive the same message more than once, unfiltered by SA,
before "spamlearn" has run.
Is running it the only way to force SA to learn a spam message?
Thanks a lot
____________________________________________________________
Nicolas Viers | Service Commun Informatique
Email: viers at unilim.fr | 123, avenue Albert Thomas
| 87060 Limoges cedex
Tel: 05-55-45-77-09 | Fax: 05-55-45-75-95
http://www.unilim.fr/sci
____________________________________________________________
-------------- next part --------------
#!/bin/bash
# This script takes a mail file full of SPAM and sa-learns it for you.
# sa-learn apparently will not split the mails apart to learn them. This
# script splits the mails in the mail file apart, runs them through
# spamassassin -d to remove the markup, and feeds them to sa-learn.
# Specify the file on the command line, or change it here:
# this is the file with the spam you need to sa-learn
spamfile='/var/spool/mail/spam'
# Override if you've specified one on the command line
if [[ -n "$1" ]]; then spamfile="$1"; fi
# Temp directory:
tmpdr="/tmp/"
if [[ ! -r "$spamfile" ]]; then
    echo "Can't read $spamfile ... does it exist?" >&2
    exit 1
fi
echo "Learning SPAM in $spamfile . . ."
# Let's copy your file, so if it is changed while we're working with it,
# we're ok. (TODO: implement locking?)
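# mktemp creates the file with an unpredictable name (safer than $RANDOM in /tmp)
spamrnd=$(mktemp "${tmpdr}spam.XXXXXX") || exit 1
cp "$spamfile" "$spamrnd"
spamfile=$spamrnd
# this is a temporary file used for processing
tmpfile=$(mktemp "${tmpdr}tmp.XXXXXX") || exit 1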
# this is the regular expression I stole from grepmail
# tmpfile will have a list of the line numbers that start new emails:
# CREDIT: Written by David Coppit (david at coppit.org, http://coppit.org/)
fromre='^(X-Draft-From: .*|X-From-Line: .*|From [^:]+(:[0-9][0-9]){1,2} ([A-Z]{2,3} [0-9]{4}|[0-9]{4} [+-][0-9]{4}|[0-9]{4})( remote from .*)?)$'
grep --extended-regexp --line-number "$fromre" "$spamfile" | sed "s/:.*//" > "$tmpfile"
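# nummails will have the number of emails:
nummails=$(grep -c . "$tmpfile")
echo "$nummails message(s) . . ."
# Without any separator line there is nothing to split; stop here rather than
# letting the final awk below feed the whole file to sa-learn as one message.
if (( nummails == 0 )); then rm -f "$tmpfile" "$spamfile"; exit 0; fi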
# now we can separate out the emails and work on them.
for ((x=1; x<nummails; x++)); do
    # start line of message x and of message x+1
    linea=$(awk -v a="$x" -- '{ if (FNR == a) print }' < "$tmpfile")
    lineb=$(awk -v a="$((x+1))" -- '{ if (FNR == a) print }' < "$tmpfile")
    # lines linea .. lineb-1 form one message
    awk -v a="$linea" -v b="$lineb" -- '{ if ((FNR>=a)&&(FNR<b)) print }' < "$spamfile" | spamassassin -d | sa-learn --spam
done
# The loop stops one short: the last message runs from its start line to EOF.
linea=$(awk -v a="$x" -- '{ if (FNR == a) print }' < "$tmpfile")
awk -v a="$linea" -- '{ if (FNR>=a) print }' < "$spamfile" | spamassassin -d | sa-learn --spam
rm -f "$tmpfile" "$spamfile"
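The splitting logic in the script can be checked without touching the Bayes database. The sketch below is a dry run, assuming bash: it reuses the grepmail pattern from the script, but instead of piping each message to `spamassassin -d | sa-learn --spam` it prints the line range each message occupies, with the last message running to end of file as in the script's final step. The function name `mbox_ranges` and the sample addresses are hypothetical.

```bash
#!/bin/bash
# Dry run of the spamlearn splitting logic: print "start-end" line ranges
# for each message instead of feeding it to spamassassin/sa-learn.
# Hypothetical names: mbox_ranges and the sample addresses below.

# Same grepmail-derived separator pattern as in the script.
fromre='^(X-Draft-From: .*|X-From-Line: .*|From [^:]+(:[0-9][0-9]){1,2} ([A-Z]{2,3} [0-9]{4}|[0-9]{4} [+-][0-9]{4}|[0-9]{4})( remote from .*)?)$'

mbox_ranges() {
    local mbox=$1 total n i end line
    local starts=()
    total=$(awk 'END { print NR }' "$mbox")    # number of lines in the file
    # Collect the line numbers where a new message starts.
    while IFS= read -r line; do
        starts+=("$line")
    done < <(grep -E -n "$fromre" "$mbox" | cut -d: -f1)
    n=${#starts[@]}
    for ((i = 0; i < n; i++)); do
        if ((i + 1 < n)); then
            end=$((starts[i + 1] - 1))   # stop just before the next separator
        else
            end=$total                   # last message runs to end of file
        fi
        echo "${starts[i]}-${end}"
    done
}

sample=$(mktemp)
cat > "$sample" <<'EOF'
From alice@example.com Tue Apr  6 14:25:22 2004
Subject: first

body one
From bob@example.com Tue Apr  6 14:30:01 2004
Subject: second

body two
EOF
mbox_ranges "$sample"    # prints 1-4, then 5-8
rm -f "$sample"
```

If the printed ranges look wrong for a given spool file (for example, one huge range), the separator pattern is not matching that file's "From " lines, and sa-learn would be fed merged messages.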
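The script's loop runs awk over the whole spool file once per message, so a large spam file is read many times over. An alternative, swapped in here and not part of the original script, is to split the file into one file per message in a single awk pass. This is a sketch assuming bash and awk: `split_mbox` and the `msg.N` file names are hypothetical, and the separator test is simplified to lines beginning with "From " (relying on the usual mbox convention that body lines starting with "From " are escaped as ">From "), so it does not cover the X-Draft-From/X-From-Line variants that the grepmail pattern handles.

```bash
#!/bin/bash
# Alternative to the per-message awk loop: split an mbox file into one file
# per message in a single pass.
# Hypothetical names: split_mbox, msg.N. Separator simplified to /^From /.

split_mbox() {
    local mbox=$1 outdir=$2
    awk -v dir="$outdir" '
        /^From / {
            if (out != "") close(out)    # keep only one output file open
            n++
            out = sprintf("%s/msg.%d", dir, n)
        }
        n > 0 { print > out }            # lines before the first separator are dropped
    ' "$mbox"
}

# Demo on a two-message sample.
workdir=$(mktemp -d)
printf 'From a@example.com Tue Apr  6 14:25:22 2004\nSubject: one\n\nbody\nFrom b@example.com Tue Apr  6 14:30:01 2004\nSubject: two\n' > "$workdir/spam.mbox"
mkdir "$workdir/out"
split_mbox "$workdir/spam.mbox" "$workdir/out"
ls "$workdir/out"    # msg.1 and msg.2

# Each piece could then go through the same pipeline the script uses
# (left commented out so the sketch runs without SpamAssassin installed):
# for f in "$workdir"/out/msg.*; do spamassassin -d < "$f" | sa-learn --spam; done

rm -rf "$workdir"
```

Closing each output file before opening the next keeps awk from running out of file descriptors on a spool with thousands of messages.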
More information about the MailScanner mailing list