No bayesian for me?
Glenn Steen
glenn.steen at gmail.com
Wed Aug 12 16:30:45 IST 2009
2009/8/12 Mauricio Tavares <raubvogel at gmail.com>:
> Glenn Steen wrote:
>>
>> 2009/8/12 Mauricio Tavares <raubvogel at gmail.com>:
>>>
>>> Jules Field wrote:
>>>>
>>>> Did you run sa-learn as the same user you run MailScanner as? ("Run As
>>>> User" in MailScanner.conf). Otherwise your Bayes database you've been
>>>> training will be in the wrong place.
>>>>
>>> I see your point. I was indeed running sa-learn as root, not as
>>> postfix, which should be the user MailScanner runs as. So, I guess I
>>> should run it then as postfix. Now, should I delete the root-created
>>> database? Also, where will it save the database?
>>>
>> You should delete the one for root, if it resides in root's home
>> directory, since that will be no help at all... Or move it. But I see
>> you have configured it to reside somewhere sane, so all you need do is
>> make it all owned by postfix.
>
> Here is an update: I wrote a script that went through all the virtual
> email accounts (/var/spool/vmail/domain.com) and scanned the spam (placed in
> the .Spam folder) and the ham (placed in all the other mail folders). Since
> I am running it as postfix:postfix and that directory is owned by
> virtual:virtual, I did not get everyone. Is there a way to let the
> postfix-owned script check all the mails in the virtual-owned ones? Make
> postfix part of the virtual group? I think that is what the sticky bit is
> for, right? In any case, here is the output:
>
> postfix at mail /etc/postfix $ sa-learn --dump magic
> 0.000 0 3 0 non-token data: bayes db version
> 0.000 0 1837 0 non-token data: nspam
> 0.000 0 179092 0 non-token data: nham
> 0.000 0 3104505 0 non-token data: ntokens
> 0.000 0 1053729759 0 non-token data: oldest atime
> 0.000 0 1250081652 0 non-token data: newest atime
> 0.000 0 1250081434 0 non-token data: last journal sync atime
> 0.000 0 1250034247 0 non-token data: last expiry atime
> 0.000 0 0 0 non-token data: last expire atime delta
> 0.000 0 0 0 non-token data: last expire reduction count
> postfix at mail /etc/postfix $
>
>
> As you can see, there is a lot more ham than spam. I wonder how much harm
> would that cause in my bayesian filtering...
>
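To put a number on that imbalance, here is a minimal sketch that reads the
"sa-learn --dump magic" output on stdin and prints the two counters and their
ratio. The function name bayes_ratio is made up for illustration, and it
assumes the counters sit in the third column, as in the dump quoted above.

```shell
# bayes_ratio: hypothetical helper, not part of SpamAssassin.
# Reads `sa-learn --dump magic` output on stdin and reports nspam, nham
# and how many ham messages were learned per spam message.
bayes_ratio() {
    awk '
        /non-token data: nspam/ { nspam = $3 }   # third column holds the count
        /non-token data: nham/  { nham  = $3 }
        END {
            if (nspam == "" || nham == "") {
                print "nspam/nham counters not found"
                exit 1
            }
            printf "nspam=%d nham=%d\n", nspam, nham
            if (nspam > 0)
                printf "ham per spam: %.1f\n", nham / nspam
        }'
}
```

Run it as the postfix user with "sa-learn --dump magic | bayes_ratio". For the
numbers quoted above (1837 spam, 179092 ham) that works out to roughly 97.5
ham per spam.
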
>> If you also use MailWatch, you'll need to make the apache user's group the
>> "group owner" for the base directory and all the files, and set the
>> GID bit for the directory (/var/spool/MailScanner/bayes in your case),
>> so that any new files get the correct group ownership. Once you've
>> done that, things should start cooking:-).
>
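For the archives, the group-ownership step described above could be sketched
roughly like this. The function name share_bayes_dir is made up, and the group
name in the usage line is an assumption: the web server's group varies by
distribution, so check yours first.

```shell
# share_bayes_dir DIR GROUP: hypothetical helper for illustration.
# Makes GROUP the group owner of DIR and everything in it, then sets the
# setgid bit on DIR so files created there later inherit that group.
share_bayes_dir() {
    dir=$1
    group=$2
    chgrp -R "$group" "$dir" || return 1   # base directory and all files
    chmod g+s "$dir" || return 1           # the "GID bit" on the directory
}
```

Usage, as root, would be something like
"share_bayes_dir /var/spool/MailScanner/bayes apache". Afterwards
"ls -ld /var/spool/MailScanner/bayes" should show an "s" in the group
permissions.
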
> Thanks for the suggestion! If I ever use MailWatch, I will try to
> remember to use that. =)
>
>> One more thing: Always run your tests (spamassassin --lint and stuff
>> like that) as your postfix user, to avoid some subtleties that might
>> otherwise bite.
>
> postfix at mail /etc/postfix $ spamassassin --lint
> [19591] warn: config: warning: score set for non-existent rule WANTS_CREDIT_CARD
> [19591] warn: config: warning: score set for non-existent rule FORGED_RCVD_HELO
> [19591] warn: lint: 2 issues detected, please rerun with debug enabled for more information
> postfix at mail /etc/postfix $
>
Hm, I wonder if your postfix user really can read all the .cf files...
Do as it suggests and see what debug will tell you (spamassassin
--lint -D, as the PF user). Also try running a message through, or
else it will not test bayes for you:
spamassassin -t -D < /path/to/email/file
... and look carefully at what it says about bayes. You might want to
pipe the output to a file (or less). Don't forget to redirect STDERR
as well (2>&1).
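
Put together, a minimal sketch of that test run might look like the following.
The function name bayes_debug and the default log path are made up for
illustration; only the -t and -D switches and the 2>&1 redirect come from the
advice above.

```shell
# bayes_debug MESSAGE [LOGFILE]: hypothetical wrapper for illustration.
# Runs one message through spamassassin in test mode with debugging on,
# saves stdout and stderr to LOGFILE, then shows the lines mentioning bayes.
bayes_debug() {
    msg=$1
    log=${2:-/tmp/sa-bayes-debug.log}   # assumed default, pick your own
    # Debug output goes to STDERR, so 2>&1 is needed to capture it.
    spamassassin -t -D < "$msg" > "$log" 2>&1
    grep -i 'bayes' "$log"
}
```

Run it as the postfix user, e.g. "bayes_debug /path/to/email/file", and read
the full log with less if the filtered lines are not enough.
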
Cheers
--
-- Glenn
email: glenn < dot > steen < at > gmail < dot > com
work: glenn < dot > steen < at > ap1 < dot > se