SpamAssassin auto-learning

Nicolas Viers - SCI viers at UNILIM.FR
Tue Apr 6 14:25:22 IST 2004


Hi,

I use MS+SA with Bayes enabled.
I also use the attached spamlearn script to force SA to learn spam.
However, before I run "spamlearn", I receive the same message more than
once without SA filtering it.
Is this script the only way to force SA to learn spam messages, or not?

Thanks a lot



____________________________________________________________

Nicolas Viers               |  Service Commun Informatique
Mél: viers at unilim.fr        |  123, avenue Albert Thomas
                             |     87060 Limoges cedex
Tel: 05-55-45-77-09         |  Fax: 05-55-45-75-95
                   http://www.unilim.fr/sci
____________________________________________________________
-------------- next part --------------
#!/bin/bash

# This script takes a mail file full of SPAM and sa-learns it for you.
# sa-learn apparently will not split the mails apart to learn them, so this
# script splits the mails in the mail file apart, runs them through
# spamassassin -d to remove the markup, and feeds them to sa-learn.

# Specify the file on the command line, or change it here:
# this is the file with the spam you need to sa-learn
spamfile='/var/spool/mail/spam'

# Override if you've specified one on the command line
if [[ -n "$1" ]]; then spamfile=$1; fi

# Temp directory:
tmpdr="/tmp/"

if ! [ -r "$spamfile" ]; then echo "Can't read $spamfile ... does it exist?" >&2
exit 1 ; fi

echo "Learning SPAM in $spamfile . . ."

# Let's copy your file, so if it is changed while we're working with it,
# we're ok. (TODO: implement locking?)
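# mktemp creates the copy under a name that is not predictable, unlike $RANDOM
spamrnd=$(mktemp "${tmpdr}spam.XXXXXX") || exit 1
cp "$spamfile" "$spamrnd"
spamfile=$spamrnd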

# this is a temporary file used for processing
tmpfile=$(mktemp "${tmpdr}tmp.XXXXXX") || exit 1

# this is the regular expression I stole from grepmail
# tmpfile will have a list of the line numbers that start new emails:
# CREDIT: Written by David Coppit (david at coppit.org, http://coppit.org/)
grep --extended-regexp --line-number "^(X-Draft-From: .*|X-From-Line: .*|From [^:]+(:[0-9][0-9]){1,2} ([A-Z]{2,3} [0-9]{4}|[0-9]{4} [+-][0-9]{4}|[0-9]{4})( remote from .*)?)\$" "$spamfile" | sed "s/:.*//" > "$tmpfile"

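# nummails will have the number of emails:
nummails=$(grep -c . "$tmpfile")

echo "$nummails message(s) . . ."

# Stop if no message separator was found; otherwise the code below would
# feed the whole file to sa-learn as a single message.
if [ "$nummails" -eq 0 ]; then rm -f "$tmpfile" "$spamfile"; exit 1; fi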

# now we can separate out the emails and work on them.

for ((x=1; x<nummails; x++)); do
    # linea/lineb: first line of message x and first line of the next message
    linea=$(awk -v a="$x" 'FNR == a' "$tmpfile")
    lineb=$(awk -v a="$((x+1))" 'FNR == a' "$tmpfile")
    awk -v a="$linea" -v b="$lineb" 'FNR >= a && FNR < b' "$spamfile" | spamassassin -d | sa-learn --spam
done

# After the loop x equals nummails, so this handles the last message,
# which runs from its separator line to the end of the file.
linea=$(awk -v a="$x" 'FNR == a' "$tmpfile")
awk -v a="$linea" 'FNR >= a' "$spamfile" | spamassassin -d | sa-learn --spam

rm -f "$tmpfile" "$spamfile"
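
The splitting step in the script above reads the mail file several times per message (once for each awk call). As a rough sketch only, the same idea can be done in a single awk pass that writes each message to its own file. The function name split_mbox and the msg.N file names are hypothetical, and the separator test is simplified to "any line starting with From ", which is looser than the grepmail expression used in the script and may split inside a body that contains such a line.

```shell
#!/bin/bash
# split_mbox (hypothetical helper): write each message of an mbox file
# to its own numbered file (msg.1, msg.2, ...) in an output directory.
# Assumption: a message starts at any line beginning with "From ".
split_mbox() {
    local mbox=$1 outdir=$2
    mkdir -p "$outdir" || return 1
    awk -v dir="$outdir" '
        # A separator line: close the previous file and start the next one.
        /^From / { if (out) close(out); n++; out = dir "/msg." n }
        # Copy every line, separator included, into the current message file.
        out      { print > out }
    ' "$mbox"
}
```

Each resulting file could then be fed through the same commands the script uses, for example `spamassassin -d < "$f" | sa-learn --spam` inside a loop over the msg.* files. Some SpamAssassin versions also document an --mbox option for sa-learn; whether your version has it can be checked with `sa-learn --help`.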

