Spamhaus replacement
Matt Kettler
mkettler at evi-inc.com
Wed Dec 5 14:55:46 GMT 2007
Steve Freegard wrote:
> Hi Matt,
>
> Matt Kettler wrote:
>> bl.spamcop.net works pretty well, but does have some significant FPs
>> now that they list backscatter sites (in the SpamAssassin 3.2
>> mass-checks, the hits on spamcop were 87.1% spam, and therefore 12.9%
>> nonspam)
>
> The last network mass-check on ruleqa.spamassassin.org for Spamcop shows:
>
> 62.71% spam, 0.10% non-spam (97msgs out of 90160), 0.998 S/O
Yeah, but ruleqa isn't as large or as diverse a sample as a full release mass-check.
For example, here are the details for the last net check you quoted above (click
the small "source details" link near the top right of the list):
OVERALL    SPAM%     HAM%    S/O  RANK  SCORE  NAME
      0   521287    90160  0.853  0.00   0.00  (all messages)
0.00000  85.2547  14.7453  0.853  0.00   0.00  (all messages as %)
However, the 3.2.x set1 stats were:
OVERALL    SPAM%     HAM%    S/O  RANK  SCORE  NAME
      0   953545   540903  0.638  0.00   0.00  (all messages)
0.00000  63.8058  36.1942  0.638  0.00   0.00  (all messages as %)
Note that the release mass-check has six times as much nonspam, and a
substantially lower S/O for the overall set.
It's easy for a rule's S/O to rise from 0.871 to 0.998 (roughly 15% higher) when
the S/O of the corpus itself rises from 0.638 to 0.853 (roughly 34% higher).
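The corpus-level numbers above can be rechecked from the raw counts. This is a
minimal sketch that assumes, as the "(all messages)" rows imply, that a corpus's
overall S/O is simply spam / (spam + ham); the helper names are made up for
illustration and are not part of SpamAssassin or ruleqa.

```python
# Corpus totals from the two "(all messages)" rows quoted above.
RULEQA_SPAM, RULEQA_HAM = 521287, 90160      # last ruleqa net check
RELEASE_SPAM, RELEASE_HAM = 953545, 540903   # 3.2.x set1 release mass-check


def corpus_so(spam: int, ham: int) -> float:
    """Overall S/O of a corpus: the fraction of all messages that are spam."""
    return spam / (spam + ham)


def pct_increase(old: float, new: float) -> float:
    """Relative increase from old to new, as a percentage."""
    return (new - old) / old * 100


ruleqa_so = corpus_so(RULEQA_SPAM, RULEQA_HAM)
release_so = corpus_so(RELEASE_SPAM, RELEASE_HAM)

print(f"ruleqa corpus S/O:   {ruleqa_so:.3f}")                 # 0.853
print(f"release corpus S/O:  {release_so:.3f}")                # 0.638
print(f"nonspam ratio:       {RELEASE_HAM / RULEQA_HAM:.1f}x")  # 6.0x
print(f"rule S/O increase:   {pct_increase(0.871, 0.998):.1f}%")
print(f"corpus S/O increase: {pct_increase(release_so, ruleqa_so):.1f}%")
```

The last two lines are the relative increases the argument compares: the rule's
S/O moves by a smaller fraction than the corpus S/O does.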
Be wary of reading too much into S/Os from ruleqa, particularly the net runs.
They're a good "quick read", but you have to keep in mind that the numbers can
be significantly biased by the makeup of the corpus.
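To illustrate how corpus makeup alone can move a rule's numbers, here is a
sketch that assumes S/O is read as the raw fraction of a rule's hits that are
spam (the way the 87.1% spam / 12.9% nonspam figures are read above). The hit
rates are hypothetical, not taken from any real rule: the same rule, behaving
identically on each class of mail, is run against corpora shaped like the two
quoted above.

```python
# Hypothetical rule hit rates (NOT from the post): the rule fires on 60% of
# spam and 2% of nonspam, regardless of which corpus it is run against.
SPAM_HIT_RATE = 0.60
HAM_HIT_RATE = 0.02

# Corpus sizes (spam, nonspam) from the two "(all messages)" rows quoted above.
CORPORA = {
    "ruleqa net check": (521287, 90160),
    "3.2.x set1 release": (953545, 540903),
}


def raw_rule_so(spam: int, ham: int,
                spam_rate: float = SPAM_HIT_RATE,
                ham_rate: float = HAM_HIT_RATE) -> float:
    """Fraction of a rule's hits that are spam, given per-class hit rates."""
    spam_hits = spam * spam_rate
    ham_hits = ham * ham_rate
    return spam_hits / (spam_hits + ham_hits)


results = {name: raw_rule_so(spam, ham) for name, (spam, ham) in CORPORA.items()}
for name, so in results.items():
    print(f"{name:20s} S/O = {so:.3f}")
```

This prints 0.994 for the ruleqa-sized corpus and 0.981 for the release-sized
one: the rule itself hasn't changed, only the ratio of spam to nonspam it was
tested against.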