No bayesian for me?

Martin Hepworth maxsec at gmail.com
Wed Aug 12 19:04:35 IST 2009


2009/8/12 Mauricio Tavares <raubvogel at gmail.com>

> Glenn Steen wrote:
>
>> 2009/8/12 Mauricio Tavares <raubvogel at gmail.com>:
>>
>>> Glenn Steen wrote:
>>>
>>>> 2009/8/12 Mauricio Tavares <raubvogel at gmail.com>:
>>>>
>>>>> Jules Field wrote:
>>>>>
>>>>>> Did you run sa-learn as the same user you run MailScanner as? ("Run As
>>>>>> User" in MailScanner.conf). Otherwise your Bayes database you've been
>>>>>> training will be in the wrong place.
>>>>>>
>>>>> I see your point. I was indeed running sa-learn as root, not as
>>>>> postfix, which should be the user MailScanner runs as. So I guess I
>>>>> should run it as postfix. Now, should I delete the root-created
>>>>> database? Also, where will it save the database?
>>>>>
>>>> You should delete the one for root, if it resides in root's home
>>>> directory, since that will be no help at all... Or move it. But I see
>>>> you have configured it to reside somewhere sane, so all you need to do
>>>> is make it all owned by postfix.
>>>>
>>> Here is an update: I wrote a script that went through all the virtual
>>> email accounts (/var/spool/vmail/domain.com) and scanned the spam
>>> (placed in the .Spam folder) and the ham (placed in all the other mail
>>> folders). Since I am running it as postfix:postfix and that directory
>>> is owned by virtual:virtual, I did not get everyone. Is there a way to
>>> let the postfix-owned script check all the mail in the virtual-owned
>>> directories? Make postfix part of the virtual group? I think that is
>>> what the sticky bit is for, right? In any case, here is the output:
>>>
>>> postfix at mail /etc/postfix $ sa-learn --dump magic
>>> 0.000          0          3          0  non-token data: bayes db version
>>> 0.000          0       1837          0  non-token data: nspam
>>> 0.000          0     179092          0  non-token data: nham
>>> 0.000          0    3104505          0  non-token data: ntokens
>>> 0.000          0 1053729759          0  non-token data: oldest atime
>>> 0.000          0 1250081652          0  non-token data: newest atime
>>> 0.000          0 1250081434          0  non-token data: last journal sync atime
>>> 0.000          0 1250034247          0  non-token data: last expiry atime
>>> 0.000          0          0          0  non-token data: last expire atime delta
>>> 0.000          0          0          0  non-token data: last expire reduction count
>>> postfix at mail /etc/postfix $
>>>
>>>
>>> As you can see, there is a lot more ham than spam. I wonder how much harm
>>> that would cause in my Bayesian filtering...
>>>
>>>> If you also use MailWatch, you'll need to make the Apache user's group
>>>> the "group owner" of the base directory and all the files, and set the
>>>> setgid bit on the directory (/var/spool/MailScanner/bayes in your case),
>>>> so that any new files get the correct group ownership. Once you've
>>>> done that, things should start cooking :-).
>>>>
>>>       Thanks for the suggestion! If I ever use MailWatch, I will try to
>>> remember to use that. =)
>>>
>>>> One more thing: always run your tests (spamassassin --lint and stuff
>>>> like that) as your postfix user, to avoid some subtleties that might
>>>> otherwise bite.
>>>>
>>> postfix at mail /etc/postfix $ spamassassin --lint
>>> [19591] warn: config: warning: score set for non-existent rule WANTS_CREDIT_CARD
>>> [19591] warn: config: warning: score set for non-existent rule FORGED_RCVD_HELO
>>> [19591] warn: lint: 2 issues detected, please rerun with debug enabled for more information
>>> postfix at mail /etc/postfix $
>>>
>> Hm, I wonder if your postfix user really can read all the .cf files...
>> Do as it suggests and see what debug will tell you (spamassassin
>> --lint -D, as the PF user). Also try running a message through, or
>> else it will not test Bayes for you:
>> spamassassin -t -D < /path/to/email/file
>> ... and look carefully at what it says about Bayes. You might want to
>> pipe the output to a file (or less). Don't forget to redirect STDERR
>> as well (2>&1).
>>
>> Cheers
>>
>
>        Some interesting findings (to me):
>
> postfix at mail /home/raub/Spam $ spamassassin -D < spam9.eml
>
> Content analysis details:   (10.2 points, 5.0 required)
>
>  pts rule name              description
> ---- ---------------------- --------------------------------------------------
>  1.8 BAD_ENC_HEADER         Message has bad MIME encoding in the header
>  3.2 CHARSET_FARAWAY_HEADER A foreign language charset used in headers
>  0.0 BAYES_50               BODY: Bayesian spam probability is 40 to 60%
>                            [score: 0.5000]
>  1.4 MIME_QP_LONG_LINE      RAW: Quoted-printable line longer than 76 chars
>  0.9 RCVD_IN_SORBS_DUL      RBL: SORBS: sent directly from dynamic IP address
>                            [202.132.194.31 listed in dnsbl.sorbs.net]
>  0.9 RCVD_IN_PBL            RBL: Received via a relay in Spamhaus PBL
>                            [202.132.194.31 listed in zen.spamhaus.org]
>  2.0 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
>              [Blocked - see <http://www.spamcop.net/bl.shtml?202.132.194.31>]
>  0.1 RDNS_DYNAMIC           Delivered to trusted network by host with
>                            dynamic-looking rDNS
>  0.0 MISSING_MIMEOLE        Message has X-MSMail-Priority, but no X-MimeOLE
>
> But, as me:
>
> raub at mail ~/Spam $ spamassassin -D < spam9.eml
> [...]
>
> Content analysis details:   (12.7 points, 5.0 required)
>
>  pts rule name              description
> ---- ---------------------- --------------------------------------------------
>  2.9 BAD_ENC_HEADER         Message has bad MIME encoding in the header
>  3.2 CHARSET_FARAWAY_HEADER A foreign language charset used in headers
>  1.8 MIME_QP_LONG_LINE      RAW: Quoted-printable line longer than 76 chars
>  0.9 RCVD_IN_PBL            RBL: Received via a relay in Spamhaus PBL
>                            [202.132.194.31 listed in zen.spamhaus.org]
>  1.6 RCVD_IN_SORBS_DUL      RBL: SORBS: sent directly from dynamic IP address
>                            [202.132.194.31 listed in dnsbl.sorbs.net]
>  2.2 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
>              [Blocked - see <http://www.spamcop.net/bl.shtml?202.132.194.31>]
>  0.1 RDNS_DYNAMIC           Delivered to trusted network by host with
>                            dynamic-looking rDNS
>  0.0 MISSING_MIMEOLE        Message has X-MSMail-Priority, but no X-MimeOLE
>
> So I guess the above means that Bayes was not run when I ran
> spamassassin as me, because my account does not have the rights to access
> the database. I can live with that.
>
> On a related note, why is it saying 5.0 points required if in
> MailScanner.conf I have
>
> Required SpamAssassin Score = 4.7
>
> Do I also have to define required_hits 4.70 in spam.assassin.prefs.conf?

Hi

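There are two settings in MailScanner.conf for SA scores: "Required
SpamAssassin Score" and "High SpamAssassin Score". Mail that reaches the
first can be marked as "maybe spam" and still delivered, while mail that
reaches the second, higher score can be treated as definitely spam and just
dropped.

This differs from SA's view of the world. Run from the command line,
spamassassin reports against its own single threshold (5.0 by default), and
as far as I know MailScanner compares the score SA returns against its own
settings, so the "5.0 required" in that report does not mean your 4.7 is
being ignored.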
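
To make the two-threshold idea concrete, here is a minimal sketch in shell.
It is not MailScanner's code: classify_score is a hypothetical helper, the
"high" value of 10 is only an example, and the "at or above" comparison is
my assumption about how the thresholds are applied.

```shell
#!/bin/sh
# Hypothetical helper, not part of MailScanner or SpamAssassin.
# Usage: classify_score SCORE REQUIRED HIGH
# Assumes a message counts as spam when SCORE >= REQUIRED and as
# high-scoring spam when SCORE >= HIGH.
classify_score() {
    # awk does the floating-point comparison that plain sh cannot do.
    awk -v s="$1" -v r="$2" -v h="$3" 'BEGIN {
        if (s + 0 >= h + 0)      print "high-scoring spam"
        else if (s + 0 >= r + 0) print "spam"
        else                     print "not spam"
    }'
}

# The 10.2 score from the report above, with 4.7 as the required score
# and an example high score of 10.
classify_score 10.2 4.7 10
```

With a required score of 4.7, a message scoring 4.8 would count as spam
here even though SA's own report would still show it as under 5.0.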

-- 
Martin Hepworth
Oxford, UK