Spamhaus replacement

Matt Kettler mkettler at evi-inc.com
Wed Dec 5 14:55:46 GMT 2007


Steve Freegard wrote:
> Hi Matt,
> 
> Matt Kettler wrote:
>> bl.spamcop.net works pretty well, but does have some significant FPs 
>> now that they list backscatter sites (in the SpamAssassin 3.2 
>> mass-checks, the hits on spamcop were 87.1% spam, and therefore 12.9% 
>> nonspam)
> 
> The last network mass-check on ruleqa.spamassassin.org for Spamcop shows:
> 
> 62.71% spam, 0.10% non-spam (97msgs out of 90160), 0.998 S/O

Yeah, but ruleqa isn't as large or diverse a sample as a full release mass-check.

For example, the details for the last net check you quoted above are (see the 
tiny "source details" clicky on the right side near the top of the list):

OVERALL    SPAM%     HAM%     S/O    RANK   SCORE  NAME
       0   521287    90160    0.853   0.00    0.00  (all messages)
0.00000  85.2547  14.7453    0.853   0.00    0.00  (all messages as %)


However, the 3.2.x set1 set was:
OVERALL    SPAM%     HAM%     S/O    RANK   SCORE  NAME
       0   953545   540903    0.638   0.00    0.00  (all messages)
0.00000  63.8058  36.1942    0.638   0.00    0.00  (all messages as %)


Note that there is about 6 times as much nonspam, and a substantially lower 
S/O for the overall set, in the release mass-check.

For a rule to go from an S/O of 0.871 to an S/O of 0.998 (about 14.6% higher) 
is easy when the corpus itself goes from 0.638 to 0.853 (about 34% higher).
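
As a rough check of the arithmetic above, here is a minimal Python sketch. It 
assumes the overall S/O is simply spam / (spam + ham), which matches the 
"(all messages)" rows quoted above. The function and variable names are made 
up for illustration and are not part of any SpamAssassin tool.

```python
# Message counts copied from the "(all messages)" rows quoted in the post.
# Each tuple is (spam, ham).
ruleqa_net = (521287, 90160)      # last ruleqa net mass-check
release_set1 = (953545, 540903)   # 3.2.x set1 release mass-check


def overall_so(spam, ham):
    """Spam fraction of the whole corpus (assumed: spam / (spam + ham))."""
    return spam / (spam + ham)


def relative_increase(old, new):
    """Percentage by which `new` exceeds `old`."""
    return (new / old - 1.0) * 100.0


so_ruleqa = overall_so(*ruleqa_net)
so_release = overall_so(*release_set1)

# How much more nonspam the release corpus holds than the ruleqa corpus.
ham_ratio = release_set1[1] / ruleqa_net[1]

# Relative change in the rule's S/O versus the corpus's overall S/O.
rule_shift = relative_increase(0.871, 0.998)
corpus_shift = relative_increase(0.638, 0.853)

print(f"ruleqa overall S/O:  {so_ruleqa:.3f}")
print(f"release overall S/O: {so_release:.3f}")
print(f"nonspam ratio (release / ruleqa): {ham_ratio:.1f}")
print(f"rule S/O shift:   {rule_shift:.1f}%")
print(f"corpus S/O shift: {corpus_shift:.1f}%")
```

The point of the comparison is only that the corpus's own S/O moved by more 
than twice as much as the rule's did.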

Be wary of reading too much into S/O's from ruleqa, particularly the net runs. 
It's a good "quick read" but you've got to be aware that the numbers can be 
significantly biased by the makeup of the corpus.





