Bayes auto learn

James R. Stevens jstevens at ATHENSDISTRIBUTING.COM
Mon Jun 21 21:09:52 IST 2004


Hello everyone,
 
I'm trying to set up an account on my mail gateway where missed spam can
be forwarded, so that sa-learn can be run on it to teach Bayes. The problem
is that forwarded messages carry my own domain's headers, so I need either
to forward them without those headers (so as not to blacklist myself or my
domain) or to have the script parse each message and ignore my domain's
headers. (That is what the script below should do.)
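
One way to keep the local domain out of what Bayes learns is to delete every header field that names the domain before a message reaches sa-learn. The Perl below is a minimal sketch of that idea, not something taken from SpamAssassin or formail. `strip_domain_headers` is a hypothetical helper, and it assumes the whole message is held in a string with LF line endings. It always keeps the mbox `From ` separator line, so the result is still usable as mbox input.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SpamAssassin or formail): remove every
# header field that mentions $domain, including its folded continuation
# lines, and return the message with the body untouched.
# Assumes LF line endings; the mbox "From " separator line is always kept.
sub strip_domain_headers {
    my ($msg, $domain) = @_;
    # headers end at the first blank line
    my ($head, $body) = split /\n\n/, $msg, 2;
    $body = '' unless defined $body;

    # group physical lines into header fields
    my @fields;
    foreach my $line (split /\n/, $head) {
        if ($line =~ /^[ \t]/ && @fields) {
            $fields[-1] .= "\n$line";    # continuation of the previous field
        } else {
            push @fields, $line;
        }
    }

    # keep the From_ line and any field that does not name the domain
    my @kept = grep { /^From / || !/\Q$domain\E/i } @fields;
    return join("\n", @kept) . "\n\n" . $body;
}
```

Removing Received headers this way also removes tokens that Bayes would otherwise learn from, so it may be worth trying on a few messages before feeding a whole mailbox through it.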
 
I pulled this script off spamassassin.org, but it does no logging and it
does not seem to be working. Does anyone have any clues?
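
Because the script logs nothing and silently assumes several paths exist, a first debugging step could be to check those paths and log each command's exit status. The two Perl subs below are a hedged sketch of that; `missing_paths` and `run_logged` are hypothetical names, not part of the original script or of SpamAssassin. The paths follow the script as written: it reads /var/spool/mail/spam and /var/spool/mail/ham, and writes into $working_dir/spam and $working_dir/ham, which it never creates. If /var/spool/mail/spamaccount is that account's mailbox file, as it usually is on mbox systems, those directories cannot exist under it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical pre-flight check (not in the original script): report the
# paths the script relies on but never checks or creates.
sub missing_paths {
    my ($spool, $working_dir, @types) = @_;
    my @problems;
    push @problems, "$working_dir: not a directory" unless -d $working_dir;
    foreach my $type (@types) {
        my $mbox = "$spool/$type";        # read, e.g. /var/spool/mail/spam
        my $dir  = "$working_dir/$type";  # written, one file per message
        push @problems, "$mbox: mailbox file not found" unless -f $mbox;
        push @problems, "$dir: directory not found"     unless -d $dir;
    }
    return @problems;
}

# Hypothetical logging wrapper around system(): print the command and how
# it ended to STDERR with a timestamp; returns true only on a clean exit 0.
sub run_logged {
    my ($cmd) = @_;
    my $stamp = scalar localtime;
    print STDERR "[$stamp] running: $cmd\n";
    if (system($cmd) == -1) {
        print STDERR "[$stamp] could not start: $!\n";
        return 0;
    }
    printf STDERR "[%s] exit status %d, signal %d\n", $stamp, $? >> 8, $? & 127;
    return $? == 0;
}
```

For example, `print STDERR "$_\n" for missing_paths('/var/spool/mail', $working_dir, @types);` could go near the top of the script, and `run_logged("/usr/bin/sa-learn --$type --mbox $dir/*.$type") or warn "sa-learn failed for $type\n";` could replace the bare system call, with the script's standard error redirected to a log file if it runs from cron.
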
I used the script below, changing $domain to athensdistributing.com and
$working_dir to /var/spool/mail/spamaccount.
#!/usr/bin/perl
#
# quick and dirty script to take forwarded ham and spam in mbox format,
# remove the emails sent by our own users, and then feed the rest to sa-learn.
# It's set up to work on a Debian testing box; you will need to change the
# directories and perhaps the paths. It mostly uses system rather than perl.
#
use strict;
use warnings;
 
# change these settings for your setup
my $domain = 'comnet';
my $working_dir = '/home/don/mbox_testing';
 
my @types = qw( spam ham );
 
foreach my $type (@types){
        my $real_mbox = "/var/spool/mail/$type";
        my $working_mbox = "$working_dir/$type-$domain";
        my $dir = "$working_dir/$type";
 
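        # don't bother if there is no new mail:
        next unless (-s $real_mbox);

        # the per-type directory must exist before formail writes into it
        unless (-d $dir) { mkdir $dir or die "Can't create dir $dir: $!"; }

        # copy the ham and spam accounts' mbox files and then zero the
        # originals (uncomment the second line to actually zero them)
        system "/bin/cp $real_mbox $working_mbox";
        #system "/bin/cp /dev/null $real_mbox";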
 
        # get an approximate number of emails to process; the count is
        # doubled to be safe, so the surplus formail runs write empty files
        chomp( my $email_count = `grep -c '^From ' $working_mbox`);
        $email_count *= 2;

        # split the emails, one message per file
        foreach my $i (0 .. ($email_count - 1)) {
                system "/usr/bin/formail +$i -1 -ds < $working_mbox > $dir/$i.$type";
        }
 
        # get rid of emails that are from our users, and of empty files:
        opendir (my $dh, $dir)                  or die "Can't open dir $dir: $!";
        while ( defined ( my $file = readdir $dh ) ) {
                next if $file =~ /^\.\.?$/;
                $file = "$dir/" . $file;
                next unless -f $file;
                if (-z $file) { unlink $file; next; }   # empty: nothing to learn
                chomp (my $from = `grep '^From ' $file`);
                # \Q...\E makes the dots in a domain name match literally
                #print "$file: $from\n" if $from =~ /\Q$domain\E/;
                unlink $file if $from =~ /\Q$domain\E/;
        }
        closedir ($dh)                          or die "Can't close dir $dir: $!";
 
        # process mails further (not added)
        # see man formail for what you can do here; possibilities include:
        # -I To: to ignore To headers
        # -I Date: to ignore Date headers
        # -U Received: to ignore all but the first Received header
        # -I "" to ignore headers altogether

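        # teach sa, and say so if it fails
        system("/usr/bin/sa-learn --$type --mbox $dir/*.$type") == 0
                or warn "sa-learn --$type failed, status $?\n";

        # cleanup: remove this type's message files and the working copy
        system "/bin/rm -f $dir/*.$type";
        system "/bin/rm -f $working_mbox";
}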
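
The split step runs formail once per counted message, and every run reads the whole working mbox again, so the work grows roughly with the square of the number of messages. A single pass in plain Perl can do the split and the domain filter together. This is a sketch under stated assumptions, not a drop-in replacement: `split_mbox` is a hypothetical helper; it treats every line beginning with `From ` as a message separator, reads the whole mbox into memory, and, like the grep in the script, looks for $domain on the `From ` separator line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical single-pass alternative to the formail/grep steps:
# split an mbox on its "From " separator lines, drop messages whose
# separator line mentions $domain, and write the rest one per file
# as "$dir/N.$type". Returns the number of messages written.
sub split_mbox {
    my ($mbox, $dir, $type, $domain) = @_;
    open my $in, '<', $mbox or die "Can't read $mbox: $!";
    my (@msgs, $cur);
    while (my $line = <$in>) {
        if ($line =~ /^From /) {         # start of a new message
            push @msgs, $cur if defined $cur;
            $cur = '';
        }
        $cur .= $line if defined $cur;   # skip junk before the first From_
    }
    push @msgs, $cur if defined $cur;
    close $in;

    my $n = 0;
    foreach my $msg (@msgs) {
        my ($from_line) = $msg =~ /\A(From [^\n]*)/;
        next if $from_line =~ /\Q$domain\E/;   # sent by one of our users
        my $file = "$dir/$n.$type";
        open my $out, '>', $file or die "Can't write $file: $!";
        print $out $msg;
        close $out or die "Can't close $file: $!";
        $n++;
    }
    return $n;
}
```

Inside the loop, a call such as `split_mbox($working_mbox, $dir, $type, $domain);` would then stand in for both the formail loop and the opendir filter, before sa-learn runs.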

-------------------------- MailScanner list ----------------------
To leave, send    leave mailscanner    to jiscmail at jiscmail.ac.uk
Before posting, please see the Most Asked Questions at
http://www.mailscanner.biz/maq/     and the archives at
http://www.jiscmail.ac.uk/lists/mailscanner.html