No bayesian for me?

Mauricio Tavares raubvogel at gmail.com
Wed Aug 12 14:43:38 IST 2009


Glenn Steen wrote:
> 2009/8/12 Mauricio Tavares <raubvogel at gmail.com>:
>> Jules Field wrote:
>>> Did you run sa-learn as the same user you run MailScanner as? ("Run As
>>> User" in MailScanner.conf). Otherwise your Bayes database you've been
>>> training will be in the wrong place.
>>>
>>        I see your point. I was indeed running sa-learn as root, not as
>> postfix, which should be the user MailScanner runs as. So, I guess I should
>> run it then as postfix. Now, should I delete the root-created database?
>> Also, where will it save the database at?
>>
> You should delete the one for root, if it resides in root's home
> directory, since that will be no help at all... Or move it. But I see
> you have configured it to reside somewhere sane, so all you need do is
> make it all owned by postfix.

	Here is an update: I wrote a script that goes through all the virtual
email accounts (/var/spool/vmail/domain.com) and scans the spam (placed in
the .Spam folder) and the ham (placed in all the other mail folders).
Since I am running it as postfix:postfix and that directory is owned by
virtual:virtual, it could not read every mailbox. Is there a way to let the
postfix-owned script check all the mail in the virtual-owned directories?
Make postfix part of the virtual group? I think that is what the sticky bit
is for, right? In any case, here is the output:

postfix at mail /etc/postfix $ sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0       1837          0  non-token data: nspam
0.000          0     179092          0  non-token data: nham
0.000          0    3104505          0  non-token data: ntokens
0.000          0 1053729759          0  non-token data: oldest atime
0.000          0 1250081652          0  non-token data: newest atime
0.000          0 1250081434          0  non-token data: last journal sync atime
0.000          0 1250034247          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count
postfix at mail /etc/postfix $
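
To put a number on the spam/ham balance, here is a minimal sketch that pulls
the counters out of the dump. It assumes the column layout shown above (the
third column is the value, the label follows "non-token data:"); the function
name bayes_counts and the SAMPLE string are made up for illustration.

```python
def bayes_counts(dump_text):
    """Map each 'non-token data' label to its integer value (third column)."""
    counts = {}
    for line in dump_text.splitlines():
        if "non-token data:" not in line:
            continue  # skip shell prompts and anything that is not a counter line
        fields, label = line.split("non-token data:", 1)
        cols = fields.split()
        if len(cols) < 3:
            continue  # not the four-column layout assumed here
        counts[label.strip()] = int(cols[2])
    return counts


# Two lines copied from the dump above.
SAMPLE = """\
0.000          0       1837          0  non-token data: nspam
0.000          0     179092          0  non-token data: nham
"""

counts = bayes_counts(SAMPLE)
ratio = counts["nham"] / counts["nspam"]
print(f"ham messages per spam message: {ratio:.1f}")  # prints 97.5
```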


As you can see, there is a lot more ham than spam. I wonder how much
harm that would cause in my Bayesian filtering...
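
For reference, the training walk described earlier can be sketched roughly as
follows. This is not the actual script, only an illustration: it assumes a
Maildir-style layout in which messages sit in "cur" and "new" directories,
treats anything under a folder named .Spam as spam and everything else as
ham, and only builds the sa-learn command lines (sa-learn accepts a directory
after --spam or --ham) instead of running them. The name plan_training is
hypothetical.

```python
import os


def plan_training(base, spam_folder=".Spam"):
    """Return (commands, skipped) for every 'cur'/'new' directory under base.

    commands is a list of sa-learn argument lists; skipped lists the
    directories that could not be read (for example, wrong owner or group).
    """
    commands = []
    skipped = []

    def note_unreadable(err):
        # os.walk reports directories it cannot list through this callback.
        skipped.append(err.filename)

    for dirpath, dirnames, _filenames in os.walk(base, onerror=note_unreadable):
        dirnames.sort()  # deterministic traversal order
        if os.path.basename(dirpath) not in ("cur", "new"):
            continue
        parts = os.path.relpath(dirpath, base).split(os.sep)
        kind = "--spam" if spam_folder in parts else "--ham"
        commands.append(["sa-learn", kind, dirpath])
    return commands, skipped
```

Each returned list could be handed to subprocess.run() by the postfix user,
and the skipped list shows which mailboxes were missed because of permissions.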

> If you also use MailWatch, you'll need to make the Apache user's group the
> "group owner" for the base directory and all the files, and set the
> GID bit for the directory (/var/spool/MailScanner/bayes in your case),
> so that any new files get the correct group ownership. Once you've
> done that, things should start cooking:-).
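
A minimal sketch of that ownership setup, for the record. It assumes a POSIX
system and enough privileges to change group ownership; the function name
share_with_group is made up, and the group to use (often "apache" or
"www-data", depending on the distribution) is an assumption to check locally.

```python
import os
import stat


def share_with_group(base, gid):
    """Give everything under base to group gid and set the setgid bit on directories."""
    for dirpath, _dirnames, filenames in os.walk(base):
        # Change the group first: chown may clear the setgid bit.
        os.chown(dirpath, -1, gid)  # -1 leaves the owner unchanged
        mode = stat.S_IMODE(os.stat(dirpath).st_mode)
        # With setgid on a directory, new files inherit the directory's group.
        os.chmod(dirpath, mode | stat.S_ISGID)
        for name in filenames:
            os.chown(os.path.join(dirpath, name), -1, gid)
```

It would be called with something like grp.getgrnam("apache").gr_gid for the
group and /var/spool/MailScanner/bayes for the base directory.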

	Thanks for the suggestion! If I ever use MailWatch, I will try to 
remember to use that. =)

> One more thing: Always run your tests (spamassassin --lint and stuff
> like that) as your postfix user, to avoid some subtleties that might
> otherwise bite.

postfix at mail /etc/postfix $ spamassassin --lint
[19591] warn: config: warning: score set for non-existent rule WANTS_CREDIT_CARD
[19591] warn: config: warning: score set for non-existent rule FORGED_RCVD_HELO
[19591] warn: lint: 2 issues detected, please rerun with debug enabled for more information
postfix at mail /etc/postfix $


> Since you aim to disable the score averager (AWL), you don't have to fix
> those perms/ownerships... Just comment out the loadplugin.
> 
> Cheers



