Which spam stats package?

Denis Beauchemin Denis.Beauchemin at USherbrooke.ca
Fri Oct 13 16:10:18 IST 2006


Paul Welsh wrote:
> I'm looking for a package to report on spam stats.  In particular I'd like
> to know which spamassassin categories have had most hits.
>
> I know of Vispan (http://www.while.org.uk/mailstats/), MailWatch
> (http://mailwatch.sourceforge.net/doku.php) and mailscanner-mrtg
> (http://sourceforge.net/projects/mailscannermrtg).
>
> Any others?  Recommendations?
>
>   
Paul,

I wrote the following Perl script; it might be useful for you too.

You will have to change the search strings at the beginning of the
script to match the messages MailScanner writes in your language (mine
are in French).
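For an English-language MailScanner, something like the following might work. It is an untested sketch. The English wording ("is spam, SpamAssassin", "required ") is an assumption, so check it against a real "is spam" line in your own maillog. The log line below is invented for illustration. The sketch applies the script's matching logic to that one line, and its rule pattern allows a leading minus sign so that rules with negative scores are counted too:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ASSUMED English wording of MailScanner's spam log line; verify it
# against a real "is spam" entry in your own maillog before relying on it.
my $isSpamString  = "is spam, SpamAssassin";
my $scoreString   = "score=";
my $reqdString    = "required ";
my $autoString    = "autolearn=spam";
my $cachedString  = "cached, ";
my $nCachedString = "not cached, ";

# Hypothetical sample line, made up for illustration only.
my $line = 'Oct 13 16:10:18 mx MailScanner[123]: Message k9DFAI from 192.0.2.1 '
         . '(a@example.com) to example.com is spam, SpamAssassin '
         . '(not cached, score=14.2, required 5, autolearn=spam, '
         . 'BAYES_99 3.50, HTML_MESSAGE 0.00, ALL_TRUSTED -1.80, URIBL_BLACK 3.00)';

my %hit;    # rule name => number of spam lines it appeared on

# \Q...\E quotes the search strings and keeps Perl from reading
# "$scoreString[" as an array element.
if ( $line =~ /\Q$isSpamString\E \((?:\Q$cachedString\E|\Q$nCachedString\E)\Q$scoreString\E[\d.]+, \Q$reqdString\E[\d.]+,(?: \Q$autoString\E,)?(.*)$/ ) {
    my $hits = $1;
    # Each rule appears as " NAME score" followed by "," or the closing ")";
    # the optional "-" catches negative scores.
    $hit{$_}++ for $hits =~ / (\S+) -?[\d.]+(?:,|\))/g;
}

printf "%23s %5d\n", $_, $hit{$_} for sort keys %hit;
```

Run as is, it should print the four rule names, each with a count of 1. If nothing matches in your real logs, compare an actual spam line character by character with the six strings first.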

Denis

-- 
   _
  °v°   Denis Beauchemin, analyst
 /(_)\  Université de Sherbrooke, S.T.I.
  ^ ^   T: 819.821.8000x62252 F: 819.821.8045

-------------- next part --------------
#!/usr/bin/perl -w
#
# Script that looks through maillog to find all messages tagged as spam
# by MailScanner.  It then tallies the different SpamAssassin rules that
# fired.
#
# $Id: sa-hits,v 1.12 2006/10/11 12:28:37 bead2306 Exp $
#
# $Log: sa-hits,v $
# Revision 1.12  2006/10/11 12:28:37  bead2306
# Handle "requis" (required score) values with decimals
#
# Revision 1.11  2006/10/02 21:23:58  bead2306
# Accelerated by using grep
#
# Revision 1.9  2006/10/02 20:12:50  bead2306
# Added cache-related strings
#
# Revision 1.8  2006/07/11 18:59:01  bead2306
# Modifications for linback3
#
# Revision 1.7  2005/05/24 14:49:14  bead2306
# Small correction for last hit on the line
#
# Revision 1.6  2005/05/16 20:29:51  bead2306
# Accepts any number of file names to process on the command line.
# They don't have to be preceded with --log.
#
# Revision 1.5  2005/05/16 20:01:26  bead2306
# Can now work with compressed input files.
#
# Revision 1.4  2005/05/16 19:52:37  bead2306
# Added --log option to use an alternate maillog
#
# Revision 1.3  2005/05/16 18:45:23  bead2306
# Added Id and Log tags
#
#
# Denis Beauchemin, 20050516

use Getopt::Long;

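# Strings that MailScanner writes to the maillog; these are the French
# versions, so change them to match the language of your own logs.
my $isSpamString  = "est un polluriel, SpamAssassin";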
my $scoreString   = "score=";
my $reqdString    = "requis ";
my $autoString    = "autolearn=spam";
my $cachedString  = "cached, ";
my $nCachedString = "not cached, ";

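my $maillog = "/var/log/maillog";
# Site-specific: on the hosts linback and linback2, /var/log/maillog is a
# directory holding the log. Adjust or remove this for your own hosts.
$maillog .= "/maillog" if ( `uname -n` =~ /^linback2?$/ );
my @maillogs = ();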

my $sortByName = 0;
my $sortByHits = 0;
my $help = 0;

GetOptions(
    'sortbyname|byname' => \$sortByName,
    'sortbyhits|byhits' => \$sortByHits,
    'log=s' => \@maillogs,
    'help'  => \$help,
);

if ( $help ) {
    print '
This program tallies SpamAssassin\'s rules that were triggered when
an email was detected as spam by MailScanner.

By default it sorts the results by rule name. It can also sort them
by number of hits if called with --sortbyhits (or --byhits).

The option --sortbyname (or --byname) is the default one.

If you don\'t want to use the current maillog, specify a different
one with --log new-maillog.

All unknown command line parameters will be treated as additional
file names to process.
';
    exit;
}

push @maillogs, @ARGV;
@maillogs = ( $maillog ) if ( @maillogs  == 0 );
#print "Maillogs: @maillogs\n";

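# Default to sorting by rule name when no sort option was given
$sortByName = 1 if ( !$sortByName && !$sortByHits );

my %hit;    # rule name => number of spam messages it fired on

foreach my $maillog ( @maillogs ) {
    print "Processing $maillog...\n";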

    my $openCmd = "LANG=C /bin/grep \"$isSpamString\" $maillog |";
    if ( $maillog =~ /\.gz$/ ) {
        $openCmd = "gunzip -c $maillog | LANG=C /bin/grep \"$isSpamString\" |";
    }
    # 'or' binds more loosely than '||', so die now runs if the pipe fails
    open( LOG, $openCmd ) or die "Cannot run $openCmd: $!\n";

    while ( <LOG> ) {
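        # Keep only the list of rules that follows "score=..., requis ...,"
        next unless /\Q$isSpamString\E \((?:\Q$cachedString\E|\Q$nCachedString\E)\Q$scoreString\E[\d.]+, \Q$reqdString\E[\d.]+,(?: \Q$autoString\E,)?(.*)$/;
        my $hits = $1;
        # Each rule is " NAME score," (or "score)" for the last one);
        # scores can be negative, so allow a leading minus sign
        foreach my $hit ( $hits =~ / (\S+) -?[\d.]+(?:,|\))/g ) {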
            $hit{$hit}++;
        }
    }

    close LOG;
}

if ( $sortByName ) {
    foreach my $hit ( sort keys %hit ) {
        printf "%23s %5d\n", $hit, $hit{$hit};
    }
} elsif ( $sortByHits ) {
    foreach my $hit ( sort {$hit{$b}<=>$hit{$a}} keys %hit ) {
        printf "%23s %5d\n", $hit, $hit{$hit};
    }
}


More information about the MailScanner mailing list