New Plugin for SpamAssassin

Daniel Kleinsinger danielk at AVALONPUB.COM
Tue Apr 6 19:48:44 IST 2004


For the last few days I've been using a new SpamAssassin plugin I
found, SpamCopURI (http://sourceforge.net/projects/spamcopuri/).  It
adds points to a message based on the list of spamvertised sites on
www.spamcop.net.  Older versions relied on local caching; the newest
version incorporates a DNSBL (http://www.surbl.org/).  The plugin
extracts the URIs from an email and then checks them against the RBL.
(I'm not quite sure of the details; there's info on the websites about
dealing with randomized domain names and such.)  The RBL is populated as
follows (from the surbl.org website): "Scripts which power the database
and SURBL creation grab data from SpamCop's 'Spamvertised Web Sites'
(http://www.spamcop.net/w3m?action=inprogress&type=www) web page every
couple minutes or so, then merge new entries and expire the data so that
it's never more than 4 days old."  I think of it as a BigEvil-type RBL.
Apparently it currently has about 400 records.
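
As a rough illustration of the "extract the URIs, then check them
against the RBL" step, here is a minimal Python sketch.  It is not
SpamCopURI's actual code.  The URI regex, the naive "keep the last two
labels" domain reduction, and the placeholder zone name are all
assumptions made for the example.  It also assumes the usual DNSBL
convention that a listed name resolves to an address and an unlisted
name does not resolve at all.

```python
import re
import socket
from urllib.parse import urlsplit

# Assumption: a simple pattern for http/https URIs in a plain-text body.
URI_RE = re.compile(r"https?://[^\s\"'<>)]+", re.IGNORECASE)


def extract_hosts(body):
    """Return the set of hostnames found in URIs in the message body."""
    hosts = set()
    for uri in URI_RE.findall(body):
        try:
            host = urlsplit(uri).hostname
        except ValueError:
            continue  # malformed URI, skip it
        if host and not host.replace(".", "").isdigit():
            # Numeric (IP address) hosts are skipped in this sketch.
            hosts.add(host.lower().rstrip("."))
    return hosts


def registered_domain(host):
    """Naive reduction to the last two labels (www.example.com -> example.com).

    Assumption: good enough for a sketch; it is wrong for names under
    two-level country-code suffixes.
    """
    return ".".join(host.split(".")[-2:])


def is_listed(domain, zone, resolve=socket.gethostbyname):
    """True if '<domain>.<zone>' resolves, i.e. the domain is on the list."""
    try:
        resolve(f"{domain}.{zone}")
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
    return True


def listed_domains(body, zone, resolve=socket.gethostbyname):
    """Return the sorted domains from the body that are listed in the zone."""
    domains = {registered_domain(h) for h in extract_hosts(body)}
    return sorted(d for d in domains if is_listed(d, zone, resolve))
```

To query a real list, pass the zone name documented on the list's own
website as `zone` and leave `resolve` at its default.  The `resolve`
parameter exists only so the lookup can be replaced with a stub when
trying the code out without network access.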

The installation is very simple.  It is a patch to SA 2.63.  Basically,
it copies a few files over the SA 2.63 versions of them and you add a
rules file to local.cf or /etc/mail/spamassassin or wherever.  The
default rules score using the local cache method; you should disable
those rules (score 0, more info in the quoted email below) and score the
SPAMCOP_URI_RBL rule appropriately (I score mine 3, same as BigEvil).  I
don't really have a way of gathering statistics for this, but it seems
likely that there is a certain amount of overlap between
RCVD_IN_BL_SPAMCOP_NET and SPAMCOP_URI_RBL because they both gather
their info from the same group of emails.  But what they look for is
pretty different, and I get good results from SpamCop anyway.  For what
it's worth, glancing through my logs I see quite a few emails that hit
SPAMCOP_URI_RBL but don't hit any other blacklists.
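
If the set of rules each spam message hit can be pulled out of the logs
(how to do that depends on the log format and is left out here), the
overlap question comes down to a little counting.  Below is a minimal
Python sketch.  The function names and the four sample messages are made
up for illustration; they are not real measurements.

```python
def hit_rate(messages, rule):
    """Fraction of messages whose set of rule names contains `rule`."""
    if not messages:
        return 0.0
    return sum(1 for hits in messages if rule in hits) / len(messages)


def overlap(messages, rule_a, rule_b):
    """Count messages hit by both rules, only rule_a, and only rule_b."""
    both = sum(1 for h in messages if rule_a in h and rule_b in h)
    only_a = sum(1 for h in messages if rule_a in h and rule_b not in h)
    only_b = sum(1 for h in messages if rule_b in h and rule_a not in h)
    return {"both": both, "only_a": only_a, "only_b": only_b}


# Hypothetical sample: one set of rule names per spam message.
sample = [
    {"SPAMCOP_URI_RBL", "RCVD_IN_BL_SPAMCOP_NET"},
    {"SPAMCOP_URI_RBL"},
    {"RCVD_IN_BL_SPAMCOP_NET", "BAYES_99"},
    {"BAYES_99"},
]

if __name__ == "__main__":
    print(hit_rate(sample, "SPAMCOP_URI_RBL"))
    print(overlap(sample, "SPAMCOP_URI_RBL", "RCVD_IN_BL_SPAMCOP_NET"))
```

A large "only_a" count for SPAMCOP_URI_RBL would mean the URI rule is
catching mail that the sender-IP SpamCop list misses.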

I've found the plugin to be very effective.  It's been my fourth most
effective rule and I haven't seen any false positives.  These are the
hit rates for the top positive SA rules on my smallish mailserver (~4000
emails/day, 50-60% spam).  One thing I was surprised at was how effective
Bayes has become.  Judging from my results, everyone whose config allows
it should be using Bayes.

rule                    spam hitrate
BAYES_99                0.91598
DCC_CHECK               0.61738
RAZOR2_CHECK            0.45551
SPAMCOP_URI_RBL         0.37685
RCVD_IN_BL_SPAMCOP_NET  0.36902
RCVD_IN_SORBS           0.30930
1 or more BIGEVIL       0.28707
RCVD_IN_SPAMHAUS_XBL    0.25700
RCVD_IN_DSBL            0.20964
RCVD_IN_DYNABLOCK       0.19646
RCVD_IN_NJABL           0.16928
RCVD_IN_SBL             0.16310


Here's the announcement I saw; it has more info:

I am pleased to announce that Eric Kolve has added SURBL support
to his SpamAssassin 2.63 plugin called SpamCopURI:

  http://sourceforge.net/projects/spamcopuri/

In order to use the new RBL method, please comment out the
previous tests SPAMCOP_URI and SPAMCOP_URI_HOST and increase
the score for the new test up to something like 2.5:

  score SPAMCOP_URI_RBL  2.5

in the spamcop_uri.cf file.  Values higher than 2.5 may be
appropriate because the test is a highly accurate indicator
of spam, for some of the reasons mentioned at the SURBL site:

  http://www.surbl.org/

Note that unlike URIDNSBL, we are comparing *domains* found in
message bodies to *domains* in SURBL (aka a name or RHSBL), rather
than resolving the names into IP addresses (representing the spam
web site's hosting server) and comparing those addresses to a
number-based RBL.

We consider this a direct approach to the problem of URIs
advertised in spam, and we're confident that the URI data
we are getting from SpamCop and scoring based on report
counts are very useful and relevant.  More information about
the data SURBL is built on can be found at:

  http://spamcheck.freeapp.net/

Cheers,

Jeff C.
--
Jeff Chan
mailto:jeffc at surbl.org-nospam
http://www.surbl.org/


