News from the cesspoll (URIBL)

Matt Kettler mkettler at evi-inc.com
Fri Apr 28 19:51:10 IST 2006


Julian Field wrote:
> Would people like me to add the Black and Grey lists to the
> spam.assassin.prefs.conf file I ship with MailScanner?
> 
> My easy-to-install ClamAV+SpamAssassin package enables the URIBL plugin
> for you during the installation, so most people will have the necessary
> stuff done in v310.pre already.
> 
> The scores I use (very successfully) are
> Black: 3.0
> Grey: 0.25
> 
> Your thoughts please folks!

I personally find URIBL to have a higher FP rate than the other uridnsbls at
surbl. While it's a valuable list, I really don't think it is safe to score it
3.0. I've had EXTENSIVE problems with URIBL and other SURBL lists overlapping in
false positives.

I really don't think URIBL is worth including at this time, but if you do, hack
the default score back a bit. I'm using 1.5, with an additional hack-back to 0.5
if it overlaps with the SURBL rules.
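For anyone wanting to try the same arrangement, here is a rough sketch of how it could look in a local SpamAssassin config. I have not posted my actual rules, so treat this as one possible reading: the meta rule name LOCAL_URIBL_SURBL_OVERLAP is made up for illustration, the set of SURBL rules in the meta is a guess, and it assumes URIBL_BLACK is already defined elsewhere.

```
# Assumes URIBL_BLACK is already defined and the stock SURBL rules are enabled.
score    URIBL_BLACK  1.5

# Hypothetical local meta rule: fires when URIBL_BLACK coincides with a SURBL hit.
meta     LOCAL_URIBL_SURBL_OVERLAP  URIBL_BLACK && (URIBL_WS_SURBL || URIBL_OB_SURBL || URIBL_JP_SURBL || URIBL_SC_SURBL)
describe LOCAL_URIBL_SURBL_OVERLAP  URIBL_BLACK overlaps with a SURBL list
# 1.5 + (-1.0) leaves URIBL_BLACK contributing 0.5 when it overlaps.
score    LOCAL_URIBL_SURBL_OVERLAP  -1.0
```

Running spamassassin --lint after a change like this is a cheap way to catch typos in the rule names.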

A lot of this was covered, in scattered fashion, in the giant near-flame-war
thread I started on spamassassin-users a while ago titled "Over-scoring of SURBL
lists". After the dust settled, the following facts remained:

1) URIBL seems to have a surprising number of FPs in common with WS and OB at my
site. They officially deny any common sources of input. Regardless of
commonality of sources, the fact of common dual-listed FPs between these at my
site is real. I rarely have a URIBL_BLACK FP that isn't also listed in WS or OB.
Fortunately FPs aren't that common, but when they happen they do more often
than not double-hit with another SURBL list, at least at my site.

2) Most URIBL.com (and SURBL) FPs tend to be of a commercial nature. Most seem
to be listings of mixed-use remailing and hosting providers that are used by
spammers and large companies alike. This is largely unsurprising.

3) It's quite rare for two surbl.org lists to overlap on an FP. In my experience
if they do, it is most likely to be OB and WS. It almost never happens with SC,
but recently my overall hit-rate for SC is very low.

4) There are many spams which only match URIBL at time of delivery. URIBL has a
very fast adoption of reports, and in this regard they do very well.

5) Overlap between URIBLs and SURBLs is NOT a problem, in and of itself. It's
only a problem when they overlap on nonspam. (MANY people in that flame thread
failed to read this fact. In fact, nearly all of them did.) That said, looking
at the overlap percentages you cannot rule out problems of duplicated-input any
more than you can prove it.

Summarizing some stats, quoted below:

-There are slightly more URIBL_BLACK hits than all of SURBL combined.

-There are roughly 50% more URIBL_BLACK hits that are not in any SURBL (671) than
SURBL hits that are not in URIBL_BLACK (437). (note: I have not attempted to
adjust this for FP rate.. that 50% might be largely FPs, or might be all spam.)

-There is a significantly greater percentage of overlap between URIBL_BLACK and
any one of WS, OB or JP (all >92%), than there is overlap between any two of WS,
OB and JP (all <77%). Again, this isn't a problem, but it is an interesting fact.



stats (note: I copied mail-log to a separate dir, so this data is not changing
in these greps:)

Total URIBL_BLACK hits:
# grep "URIBL_BLACK" maillog |wc -l
   6995

Total hit stats:
Total SURBL.org hits
# grep "_SURBL" maillog |wc -l
   6761
Total WS hits:
# grep "WS_SURBL" maillog |wc -l
   4174
Total OB hits:
# grep "OB_SURBL" maillog |wc -l
   5246
Total JP hits:
# grep "JP_SURBL" maillog |wc -l
   4718
Total SC hits:
# grep "SC_SURBL" maillog |wc -l
    934
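
If you want to repeat these totals on your own log, a small loop saves retyping. This is only a sketch: the function names count_hits and hit_table are made up here, and it assumes the rule names show up in your log lines the same way they do in mine.

```shell
# count_hits PATTERN FILE: number of lines in FILE containing PATTERN.
# grep -c counts matching lines, the same figure as grep ... | wc -l.
count_hits() {
    grep -c -- "$1" "$2" || true   # grep exits 1 on zero matches; still prints 0
}

# hit_table FILE: print one "pattern count" line per list.
hit_table() {
    for rule in URIBL_BLACK _SURBL WS_SURBL OB_SURBL JP_SURBL SC_SURBL; do
        printf '%-12s %7d\n' "$rule" "$(count_hits "$rule" "$1")"
    done
}
```

Usage would be something like: hit_table maillog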

Overlap stats (take with a grain of salt. Overlap alone is not proof of a
problem. In each percentage, I compared against the list with the lowest total
hit-count for the pairing)

# grep "WS_SURBL" maillog | grep "URIBL_BLACK" |wc -l
   3855   (92% of all WS hits)
# grep "OB_SURBL" maillog | grep "URIBL_BLACK" |wc -l
   5054 (96% of all OB hits)
# grep "JP_SURBL" maillog | grep "URIBL_BLACK" |wc -l
   4492 (95% of JP hits)
# grep "OB_SURBL" maillog | grep "WS_SURBL" |wc -l
   3175 (76% of all WS hits)
# grep "JP_SURBL" maillog | grep "OB_SURBL" |wc -l
   3571 (75% of all JP hits)
# grep "JP_SURBL" maillog | grep "WS_SURBL" |wc -l
   3127 (74.9% of all WS hits)
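
The percentages above are just the overlap count divided by the smaller list's total (for example 3855 of 4174 is about 92.4%). A sketch of doing that in one pass with awk, assuming a made-up helper name overlap_pct and plain substring matching on the rule names:

```shell
# overlap_pct A B FILE: lines containing both A and B, as a percentage
# of whichever of the two patterns has fewer total hits.
overlap_pct() {
    awk -v a="$1" -v b="$2" '
        index($0, a) { na++ }
        index($0, b) { nb++ }
        index($0, a) && index($0, b) { both++ }
        END {
            na += 0; nb += 0; both += 0          # force numeric zeros
            min = (na < nb) ? na : nb
            if (min == 0) { print "n/a"; exit }  # avoid dividing by zero
            printf "%d of %d (%.1f%%)\n", both, min, 100 * both / min
        }
    ' "$3"
}
```

Usage would be something like: overlap_pct WS_SURBL URIBL_BLACK maillog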

SURBL vs URIBL comparison, hits in one but not the other:

# grep -v "_SURBL" maillog | grep "URIBL_BLACK" |wc -l
    671 (9.6% of URIBL_BLACK hits are not in SURBL)
# grep "_SURBL" maillog | grep -v "URIBL_BLACK" |wc -l
    437 (6.4% of all _SURBL hits are not in URIBL_BLACK)
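
The same "in one but not the other" count can be wrapped up for reuse. Again just a sketch with a made-up function name, only_in. As a cross-check on the numbers, subtracting each exclusive count from its list total should give the same overlap both ways: 6995 - 671 = 6324 and 6761 - 437 = 6324.

```shell
# only_in A B FILE: number of lines in FILE that contain A but not B.
only_in() {
    grep -- "$1" "$3" | grep -vc -- "$2" || true   # prints 0 when nothing matches
}
```

Usage would be something like: only_in URIBL_BLACK _SURBL maillog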






