postfix-specific method to feed spam/ham to sa-learn

Neil Robst neilrobst at ALM.ORG.UK
Fri Jan 23 09:32:52 GMT 2004


On Thu, 2004-01-22 at 22:04, Eric Dantan Rzewnicki wrote:
> I worked out one (naive?) way to get user feedback on false positives
> and false negatives using Postfix and MailScanner when users only have
> POP and Outlook. I'm not going to be using this method, but wanted to
> share it in case someone else needs to do it this way (and also to get
> feedback on whether it's even a valid approach). Someone better than me
> at shell scripting and regular expressions could probably do this more
> succinctly.
>
> I set this in MailScanner.conf:
> Archive Mail = /var/spool/MailScanner/archive/
>
> Which creates a directory for each day containing a copy of the original
> queuefile for each message. I took this approach because it seemed it
> would be very easy to set up a simple cronjob that deleted directories
> older than x days.
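
A minimal sketch of that cleanup job, assuming GNU date and that the
archive directories are named YYYYMMDD (as the "date directory" pipeline
further down implies). The function name prune_archive and the 7-day
retention mentioned afterwards are placeholders of mine, not anything
MailScanner provides:

```shell
#!/bin/bash
# Sketch: delete dated archive directories older than a cutoff.
# Assumes GNU date and directory names of the form YYYYMMDD.

prune_archive() {
    local archive_dir=$1 keep_days=$2
    local cutoff dir name

    # Oldest date (as YYYYMMDD) that should be kept
    cutoff=$(date -d "$keep_days days ago" +%Y%m%d) || return 1

    # Only consider entries whose name is exactly eight digits
    for dir in "$archive_dir"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]; do
        [ -d "$dir" ] || continue
        name=${dir##*/}
        # Fixed-width digit names compare numerically in date order
        if [ "$name" -lt "$cutoff" ]; then
            rm -rf -- "$dir"
        fi
    done
}
```

Calling prune_archive /var/spool/MailScanner/archive 7 from a daily
cron script would keep roughly the last week. Anything in the archive
directory that is not named with eight digits is left alone.
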
>
> Basically the only feedback required from the user is to send the
> headers to either spam@dom.tld or notspam@dom.tld.
>
> To get to the full headers in Outlook:
> 1) open the message in its own window
> 2) select View -> Options
> 3) the dialog box contains a scroll window at the bottom labeled
>    "Internet Headers". Cut and paste the text from there into a new
>    message.
>
> Getting the headers this way has been deemed too much work for the
> users, so I'm working out another feedback method. But, anyway, with the
> headers in the bodies of messages in an mbox I was able to get the date
> and queuefile-id with these two command lines (broken with \):

Eric, you can also get the headers if you get the users to forward the
original mail as an attachment. In Outlook Express you can do this by
going to the Tools menu (I think) and selecting Forward as attachment...

> # message id
> cat /var/mail/spam | \
> formail -I "" -s | \
> grep -A2 "^Received:" | \
> grep "by host.dom.tld (Postfix) with .*SMTP" | \
> cut -d" " -f8
>
> # date directory
> cat /var/mail/spam | \
> formail -I "" -s | \
> grep -A3 "^Received:" | \
> grep "for <.*@dom.tld>; ..., .. ... ...." | \
> cut -d";" -f 2 | \
> xargs -I{} date -d {} +%Y%m%d
>
> Thus far, I've just been feeding that output in pairs into the simple
> script below. If I were going further with this approach I'd have added
> the commands above to the script.
>
> #!/bin/bash
> # Arguments: date directory, queue file name, "spam" or "ham"
>
> archive_dir=/var/spool/MailScanner/archive
> sa_prefs=/opt/MailScanner/etc/spam.assassin.prefs.conf
> date_dir=$1
> queue_file=$2
> spam_or_ham=$3
> queue_file_path="$archive_dir/$date_dir/$queue_file"
> line_count=$(postcat "$queue_file_path" | wc -l)
> # Keep everything except the first 6 and last 4 lines of postcat output
> postcat "$queue_file_path" | \
> tail -n $((line_count-6)) | \
> head -n $((line_count-10)) | \
> sa-learn --$spam_or_ham -p "$sa_prefs"
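
The tail/head arithmetic in the script above drops the first 6 and the
last 4 lines of the postcat output. A sketch of the same trimming
without counting lines first, assuming GNU coreutils (the negative count
for head is a GNU extension). The function name trim_edges is made up
for illustration; the defaults of 6 and 4 are the numbers from Eric's
script:

```shell
#!/bin/bash
# Sketch: drop the first N and last M lines of stdin.
# Assumes GNU tail/head; "head -n -M" is not portable to every system.

trim_edges() {
    local skip_top=${1:-6} skip_bottom=${2:-4}
    # tail -n +K starts output at line K, so K = skip_top + 1
    tail -n +"$((skip_top + 1))" | head -n -"$skip_bottom"
}
```

In the script this would read: postcat "$queue_file_path" | trim_edges |
sa-learn ... It also avoids running postcat twice per message.
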
>
>
> Well, that's it. Hopefully it's useful to someone. Apologies if this is
> silly or useless.
>
> -Eric Rz.
> (now to figure this out with forwarded fp/fn and archive as an mbox ...
> should be doable.)
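
On feeding the output "in pairs" into the script: a rough sketch of the
glue, assuming the two pipelines have each been written to a file with
one value per line and in the same message order. The names feed_pairs
and learn-one.sh are hypothetical (learn-one.sh stands for the script
quoted above), and if one pipeline misses a match for some message the
pairs will go out of step, so treat this as a starting point only:

```shell
#!/bin/bash
# Sketch: pair up date directories and queue IDs line by line and run a
# command once per pair.
# Usage: feed_pairs DATES_FILE IDS_FILE spam|ham COMMAND [ARGS...]

feed_pairs() {
    local dates_file=$1 ids_file=$2 spam_or_ham=$3
    shift 3
    local date_dir queue_id

    # paste joins line N of both files with a tab
    paste "$dates_file" "$ids_file" |
    while IFS=$'\t' read -r date_dir queue_id; do
        if [ -z "$date_dir" ] || [ -z "$queue_id" ]; then
            echo "skipping incomplete pair: '$date_dir' '$queue_id'" >&2
            continue
        fi
        # </dev/null keeps the command from consuming the loop's input
        "$@" "$date_dir" "$queue_id" "$spam_or_ham" </dev/null
    done
}
```

For example: feed_pairs dates.txt ids.txt spam ./learn-one.sh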

Regards,
Neil


