How does SA auto white-list work?

Thu Jul 13 22:20:33 IST 2006

Scott Silva wrote:
> Matt Kettler spake the following on 7/13/2006 12:09 PM:
>> René Berber wrote:
>>> Hi,
>>>
>>> I'm using the SA auto white-list feature with MailScanner 4.54.6, and there's
>>> something confusing in the result I'm seeing: a score is added if the address is
>>> whitelisted.  Shouldn't it be subtracted?
>>>
>> Despite its name, the AWL is NOT a whitelist. It's called that for lack of any
>> better name that isn't huge.
>>
>> The AWL is really a "History-based average score tracking system with automatic
>> whitelist and blacklist behaviors resulting from factoring past performance into
>> current scores".  But HBASTSAWBBRFPPICS is a rather long acronym.
>>
>> Please read:
>>
>> http://wiki.apache.org/spamassassin/AutoWhitelist
>>
>> and
>>
>> http://wiki.apache.org/spamassassin/AwlWrongWay
> Isn't that what bayes is supposed to be?

No. The AWL tracks only the sender's email address and IP. By default it also
takes no action at all if it has never seen mail from that sender before.
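As a rough illustration of the "history-based average score" idea, here is a minimal Python sketch, not SpamAssassin's code. It assumes the history is keyed on the lower-cased sender address plus the first two octets (/16) of the relay IP, and that the adjustment pulls the message's score part of the way toward the sender's historical mean using a factor of 0.5 (meant to mirror the `auto_whitelist_factor` setting). `History`, `awl_key` and `awl_adjust` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed to mirror SpamAssassin's auto_whitelist_factor (default 0.5).
AWL_FACTOR = 0.5


@dataclass
class History:
    count: int = 0      # number of earlier messages seen for this key
    total: float = 0.0  # sum of their (pre-AWL) scores


def awl_key(sender: str, ip: str) -> tuple[str, str]:
    # Assumption: key on the lower-cased address plus the /16 of the IPv4 relay.
    return sender.lower(), ".".join(ip.split(".")[:2])


def awl_adjust(db: dict, sender: str, ip: str, score: float,
               factor: float = AWL_FACTOR) -> float:
    """Return the AWL delta for a message and record its score in the history."""
    hist = db.setdefault(awl_key(sender, ip), History())
    if hist.count == 0:
        delta = 0.0  # never seen this sender/IP before: take no action
    else:
        mean = hist.total / hist.count
        # Pull the score part of the way toward the sender's historical mean.
        delta = (mean - score) * factor
    # Record the unadjusted score so it feeds future averages.
    hist.count += 1
    hist.total += score
    return delta


if __name__ == "__main__":
    db: dict = {}
    print(awl_adjust(db, "rene@example.com", "192.0.2.10", 0.5))  # 0.0, first sighting
    print(awl_adjust(db, "rene@example.com", "192.0.2.77", 4.5))  # -2.0, pulled down
    print(awl_adjust(db, "rene@example.com", "192.0.2.1", -3.0))  # 2.75, pulled up
```

The third call shows how a "whitelisted" sender can still have points added: the sender's average so far (2.5) is higher than this message's score (-3.0), so the adjustment is positive.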

Bayes is *COMPLETELY* different. Bayes works largely by extracting words from
the body text, turning them into "tokens", and keeping a database of how often
each token appears in spam and in non-spam.
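A toy version of such a token database might look like the sketch below. It is only an illustration under simplifying assumptions: the tokenizer regex, the `TokenDB` class and the Graham-style per-token estimate are hypothetical stand-ins, and SpamAssassin's real Bayes code uses a far more elaborate tokenizer and a different way of combining token probabilities.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical, heavily simplified tokenizer.
TOKEN_RE = re.compile(r"[A-Za-z0-9'$-]+")


def tokenize(text: str) -> set[str]:
    # Count each token once per message, lower-cased for simplicity.
    return {t.lower() for t in TOKEN_RE.findall(text)}


class TokenDB:
    def __init__(self) -> None:
        self.spam_counts: Counter = Counter()  # token -> spam messages containing it
        self.ham_counts: Counter = Counter()   # token -> ham messages containing it
        self.nspam = 0
        self.nham = 0

    def learn(self, text: str, is_spam: bool) -> None:
        toks = tokenize(text)
        if is_spam:
            self.spam_counts.update(toks)
            self.nspam += 1
        else:
            self.ham_counts.update(toks)
            self.nham += 1

    def token_prob(self, token: str) -> float | None:
        """Share of a token's appearances that are spam, normalized per corpus."""
        if self.nspam == 0 or self.nham == 0:
            return None  # need both spam and ham before estimating
        s = self.spam_counts[token] / self.nspam
        h = self.ham_counts[token] / self.nham
        if s + h == 0:
            return None  # token never seen
        return s / (s + h)
```

For example, after learning "Cheap pills, buy now" as spam and "Apache whitelist performance discussion" as ham, `token_prob("pills")` is 1.0 and `token_prob("apache")` is 0.0.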

Since Bayes is word-based, you wind up with a massive amount of
inter-relationship between messages that are only vaguely similar, even if they
come from different senders and discuss different subjects. For example,
the learning from this message will impact:
	discussion of apache (because of the link)
	anything mentioning blacklist or whitelist
	anything discussing performance
	etc.
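The overlap can be seen with a crude sketch: any token learned from one message is looked up again when scoring another message that shares it. The tokenizer and the sample texts below are hypothetical; note that the URL alone contributes "apache" as a token.

```python
from __future__ import annotations

import re


def tokenize(text: str) -> set[str]:
    # Hypothetical tokenizer: lower-cased runs of letters, so URLs split into words.
    return {t.lower() for t in re.findall(r"[A-Za-z]+", text)}


def shared_tokens(learned: str, other: str) -> set[str]:
    # Tokens learned from `learned` that will also be consulted when scoring `other`.
    return tokenize(learned) & tokenize(other)


learned = ("Please read http://wiki.apache.org/spamassassin/AutoWhitelist "
           "on whitelist and blacklist performance")
other = "Tuning Apache performance for a busy site"

if __name__ == "__main__":
    print(sorted(shared_tokens(learned, other)))  # ['apache', 'performance']
```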

Bayes does also tokenize sender addresses and other header fields, but it is
largely dominated by tokens taken from words in the body text.