No bayesian for me?

Jules Field MailScanner at ecs.soton.ac.uk
Wed Aug 12 19:46:12 IST 2009



On 12/08/2009 19:04, Martin Hepworth wrote:
>
>
> 2009/8/12 Mauricio Tavares <raubvogel at gmail.com>
>
>     Glenn Steen wrote:
>
>         2009/8/12 Mauricio Tavares <raubvogel at gmail.com>:
>
>             Glenn Steen wrote:
>
>                 2009/8/12 Mauricio Tavares <raubvogel at gmail.com>:
>
>                     Jules Field wrote:
>
>                         Did you run sa-learn as the same user you run
>                         MailScanner as ("Run As User" in MailScanner.conf)?
>                         Otherwise the Bayes database you've been training
>                         will be in the wrong place.
>
>                     I see your point. I was indeed running sa-learn as
>                     root, not as postfix, which should be the user
>                     MailScanner runs as. So I guess I should run it as
>                     postfix instead. Now, should I delete the root-created
>                     database? Also, where will it save the database?
>
>                 You should delete the one for root if it resides in
>                 root's home directory, since it will be of no help at
>                 all... or move it. But I see you have configured it to
>                 reside somewhere sane, so all you need to do is make it
>                 all owned by postfix.
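A minimal Python sketch of that ownership check, assuming the database lives in /var/spool/MailScanner/bayes (the path Glenn mentions further down) and that MailScanner runs as postfix; both are only defaults that can be overridden on the command line. It lists the directories and files whose owner is not the run-as user, i.e. what would still need a chown.

```python
import os
import pwd
import sys


def paths_not_owned_by(directory, uid):
    """Walk `directory` and return the paths (the directory itself, its
    subdirectories and its files) whose owner uid differs from `uid`.
    lstat is used, so a symlink's own owner is checked, not its target's."""
    wrong = []
    for root, _dirs, files in os.walk(directory):
        # Every subdirectory is visited once as `root`, so checking root
        # plus its files covers each path exactly once.
        for path in [root] + [os.path.join(root, name) for name in files]:
            if os.lstat(path).st_uid != uid:
                wrong.append(path)
    return wrong


if __name__ == "__main__":
    # Assumed defaults: the bayes path from this thread and the postfix user.
    bayes_dir = sys.argv[1] if len(sys.argv) > 1 else "/var/spool/MailScanner/bayes"
    user = sys.argv[2] if len(sys.argv) > 2 else "postfix"
    for path in paths_not_owned_by(bayes_dir, pwd.getpwnam(user).pw_uid):
        print(path)
```

An empty result means everything in the directory already belongs to the run-as user.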
>
>             Here is an update: I wrote a script that goes through all
>             the virtual email accounts (/var/spool/vmail/domain.com)
>             and scans the spam (placed in the .Spam folder) and the
>             ham (placed in all the other mail folders). Since I am
>             running it as postfix:postfix and that directory is owned
>             by virtual:virtual, it could not reach all of them. Is
>             there a way to let the postfix-owned script read all the
>             mail in the virtual-owned directories? Make postfix part of
>             the virtual group? I think that is what the sticky bit is
>             for, right? In any case, here is the output:
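On the permission question: the sticky bit does not grant access to anything (on a directory it only restricts who may delete or rename the files in it). For postfix to read mail owned by virtual:virtual, postfix would need to be in the virtual group and the mailboxes would need group read permission, plus group execute on the directories. The script itself is not shown in the thread; the following is a hypothetical Python sketch of what it describes, assuming a Maildir++ layout in which each mailbox under /var/spool/vmail/domain.com keeps folders such as .Spam as dot-directories. By default it only prints the sa-learn --spam / --ham commands; pass --run to execute them.

```python
import os
import subprocess
import sys


def training_jobs(vmail_root):
    """Return (flag, directory) pairs for sa-learn, one per Maildir cur/ or
    new/ directory under vmail_root. Mail in a folder named .Spam is
    learned as spam; every other folder is learned as ham."""
    jobs = []
    for root, dirs, _files in os.walk(vmail_root):
        dirs.sort()  # walk in a stable, predictable order
        if os.path.basename(root) in ("cur", "new"):
            folder = os.path.basename(os.path.dirname(root))
            flag = "--spam" if folder == ".Spam" else "--ham"
            jobs.append((flag, root))
    return jobs


def main():
    # Hypothetical usage: train.py [--run] [vmail_root]
    # Without --run the sa-learn commands are only printed.
    run = "--run" in sys.argv[1:]
    args = [a for a in sys.argv[1:] if a != "--run"]
    vmail_root = args[0] if args else "/var/spool/vmail/domain.com"
    for flag, directory in training_jobs(vmail_root):
        cmd = ["sa-learn", flag, directory]
        if run:
            subprocess.run(cmd, check=True)
        else:
            print(" ".join(cmd))


if __name__ == "__main__":
    main()
```

The tmp/ directories are skipped on purpose, since they hold messages still being delivered.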
>
>             postfix@mail /etc/postfix $ sa-learn --dump magic
>             0.000          0          3          0  non-token data: bayes db version
>             0.000          0       1837          0  non-token data: nspam
>             0.000          0     179092          0  non-token data: nham
>             0.000          0    3104505          0  non-token data: ntokens
>             0.000          0 1053729759          0  non-token data: oldest atime
>             0.000          0 1250081652          0  non-token data: newest atime
>             0.000          0 1250081434          0  non-token data: last journal sync atime
>             0.000          0 1250034247          0  non-token data: last expiry atime
>             0.000          0          0          0  non-token data: last expire atime delta
>             0.000          0          0          0  non-token data: last expire reduction count
>             postfix@mail /etc/postfix $
>
>
>             As you can see, there is a lot more ham than spam. I
>             wonder how much harm that would cause to my Bayesian
>             filtering...
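To put a number on the imbalance, here is a small Python sketch that pulls nspam and nham out of sa-learn --dump magic output. For the figures above it works out to roughly 97 ham messages per spam. The 200-message minimums are SpamAssassin's defaults for bayes_min_spam_num and bayes_min_ham_num, below which the Bayes rules are not used at all; with 1837 spam this database is past that point.

```python
import re

# Matches the two message-count lines of `sa-learn --dump magic`, e.g.
# "0.000          0       1837          0  non-token data: nspam"
COUNT_LINE = re.compile(
    r"^\s*[\d.]+\s+\d+\s+(\d+)\s+\d+\s+non-token data: (nspam|nham)\s*$")

# SpamAssassin's defaults for bayes_min_spam_num / bayes_min_ham_num.
MIN_SPAM = MIN_HAM = 200


def bayes_counts(dump_text):
    """Return (nspam, nham) parsed from `sa-learn --dump magic` output."""
    counts = {}
    for line in dump_text.splitlines():
        m = COUNT_LINE.match(line)
        if m:
            counts[m.group(2)] = int(m.group(1))
    return counts["nspam"], counts["nham"]


# Excerpt of the output quoted above.
SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0       1837          0  non-token data: nspam
0.000          0     179092          0  non-token data: nham
0.000          0    3104505          0  non-token data: ntokens
"""

if __name__ == "__main__":
    nspam, nham = bayes_counts(SAMPLE)
    print(f"spam={nspam} ham={nham} ratio={nham / nspam:.1f} ham per spam")
    print("enough to use Bayes:", nspam >= MIN_SPAM and nham >= MIN_HAM)
```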
>
>                 If you also use MailWatch, you'll need to make the
>                 Apache user's group the "group owner" of the base
>                 directory and all the files, and set the setgid bit on
>                 the directory (/var/spool/MailScanner/bayes in your
>                 case), so that any new files get the correct group
>                 ownership. Once you've done that, things should start
>                 cooking :-).
>
>                   Thanks for the suggestion! If I ever use MailWatch,
>             I will try to
>             remember to use that. =)
>
>                 One more thing: always run your tests (spamassassin
>                 --lint and the like) as your postfix user, to avoid some
>                 subtleties that might otherwise bite.
>
>             postfix@mail /etc/postfix $ spamassassin --lint
>             [19591] warn: config: warning: score set for non-existent rule WANTS_CREDIT_CARD
>             [19591] warn: config: warning: score set for non-existent rule FORGED_RCVD_HELO
>             [19591] warn: lint: 2 issues detected, please rerun with debug enabled for more information
>             postfix@mail /etc/postfix $
>
>         Hm, I wonder if your postfix user really can read all the .cf
>         files... Do as it suggests and see what debug tells you
>         (spamassassin --lint -D, as the postfix user). Also try running a
>         message through, or else it will not test Bayes for you:
>         spamassassin -t -D < /path/to/email/file
>         ... and look carefully at what it says about Bayes. You might
>         want to pipe the output to a file (or less). Don't forget to
>         redirect STDERR as well (2>&1).
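The same check can be scripted. A hedged Python sketch that runs spamassassin -t -D on one message, merges STDERR into STDOUT (the equivalent of 2>&1), and keeps only the lines that mention Bayes. Run it as the postfix user, as suggested above; the message path is whatever sample you choose.

```python
import subprocess
import sys


def bayes_lines(text):
    """Keep only the lines that mention Bayes (case-insensitive)."""
    return [line for line in text.splitlines() if "bayes" in line.lower()]


def run_spamassassin_debug(message_path):
    """Run `spamassassin -t -D < message_path` and return stdout and stderr
    combined, like redirecting with 2>&1."""
    with open(message_path, "rb") as fh:
        result = subprocess.run(
            ["spamassassin", "-t", "-D"],
            stdin=fh,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
        )
    return result.stdout.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for line in bayes_lines(run_spamassassin_debug(sys.argv[1])):
        print(line)
```

Filtering on the plain substring also keeps report rows such as BAYES_50, which is usually what you want to see.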
>
>         Cheers
>
>
>            Some interesting findings (to me):
>
>     postfix@mail /home/raub/Spam $ spamassassin -D < spam9.eml
>
>     Content analysis details:   (10.2 points, 5.0 required)
>
>      pts rule name              description
>     ---- ---------------------- --------------------------------------------------
>      1.8 BAD_ENC_HEADER         Message has bad MIME encoding in the header
>      3.2 CHARSET_FARAWAY_HEADER A foreign language charset used in headers
>      0.0 BAYES_50               BODY: Bayesian spam probability is 40 to 60%
>                                [score: 0.5000]
>      1.4 MIME_QP_LONG_LINE      RAW: Quoted-printable line longer than 76 chars
>      0.9 RCVD_IN_SORBS_DUL      RBL: SORBS: sent directly from dynamic IP address
>                                [202.132.194.31 listed in dnsbl.sorbs.net]
>      0.9 RCVD_IN_PBL            RBL: Received via a relay in Spamhaus PBL
>                                [202.132.194.31 listed in zen.spamhaus.org]
>      2.0 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
>                  [Blocked - see <http://www.spamcop.net/bl.shtml?202.132.194.31>]
>      0.1 RDNS_DYNAMIC           Delivered to trusted network by host with
>                                dynamic-looking rDNS
>      0.0 MISSING_MIMEOLE        Message has X-MSMail-Priority, but no X-MimeOLE
>
>     But, as me:
>
>     raub@mail ~/Spam $ spamassassin -D < spam9.eml
>     [...]
>
>     Content analysis details:   (12.7 points, 5.0 required)
>
>      pts rule name              description
>     ---- ---------------------- --------------------------------------------------
>      2.9 BAD_ENC_HEADER         Message has bad MIME encoding in the header
>      3.2 CHARSET_FARAWAY_HEADER A foreign language charset used in headers
>      1.8 MIME_QP_LONG_LINE      RAW: Quoted-printable line longer than 76 chars
>      0.9 RCVD_IN_PBL            RBL: Received via a relay in Spamhaus PBL
>                                [202.132.194.31 listed in zen.spamhaus.org]
>      1.6 RCVD_IN_SORBS_DUL      RBL: SORBS: sent directly from dynamic IP address
>                                [202.132.194.31 listed in dnsbl.sorbs.net]
>      2.2 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
>                  [Blocked - see <http://www.spamcop.net/bl.shtml?202.132.194.31>]
>      0.1 RDNS_DYNAMIC           Delivered to trusted network by host with
>                                dynamic-looking rDNS
>      0.0 MISSING_MIMEOLE        Message has X-MSMail-Priority, but no X-MimeOLE
>
>     So, I guess the above means that Bayes was not run when I ran
>     spamassassin as me, because I did not have the rights to access the
>     database. I can live with that.
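The changed scores are a further clue. SpamAssassin picks one of four score sets depending on whether Bayes and network tests are enabled, so when the same rules fire with different scores in two runs, Bayes being unavailable in one of them is a plausible explanation. A small Python sketch that parses the score rows of the two tables quoted above and reports which rules changed score and which fired only in the first run:

```python
import re

# A report row starts with a score and a rule name, e.g. " 1.8 BAD_ENC_HEADER ..."
ROW = re.compile(r"^\s*(-?\d+\.\d+)\s+([A-Z0-9_]+)\s")


def rule_scores(report):
    """Map rule name -> score for each row of a "Content analysis details" table."""
    scores = {}
    for line in report.splitlines():
        m = ROW.match(line)
        if m:
            scores[m.group(2)] = float(m.group(1))
    return scores


def compare(first, second):
    """Return (rules whose score changed, rules that fired only in `first`)."""
    a, b = rule_scores(first), rule_scores(second)
    changed = {r: (a[r], b[r]) for r in a.keys() & b.keys() if a[r] != b[r]}
    return changed, sorted(a.keys() - b.keys())


# Score rows from the two reports above (continuation lines omitted).
AS_POSTFIX = """
 1.8 BAD_ENC_HEADER         Message has bad MIME encoding in the header
 3.2 CHARSET_FARAWAY_HEADER A foreign language charset used in headers
 0.0 BAYES_50               BODY: Bayesian spam probability is 40 to 60%
 1.4 MIME_QP_LONG_LINE      RAW: Quoted-printable line longer than 76 chars
 0.9 RCVD_IN_SORBS_DUL      RBL: SORBS: sent directly from dynamic IP address
 0.9 RCVD_IN_PBL            RBL: Received via a relay in Spamhaus PBL
 2.0 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
 0.1 RDNS_DYNAMIC           Delivered to trusted network by host with
 0.0 MISSING_MIMEOLE        Message has X-MSMail-Priority, but no X-MimeOLE
"""

AS_RAUB = """
 2.9 BAD_ENC_HEADER         Message has bad MIME encoding in the header
 3.2 CHARSET_FARAWAY_HEADER A foreign language charset used in headers
 1.8 MIME_QP_LONG_LINE      RAW: Quoted-printable line longer than 76 chars
 0.9 RCVD_IN_PBL            RBL: Received via a relay in Spamhaus PBL
 1.6 RCVD_IN_SORBS_DUL      RBL: SORBS: sent directly from dynamic IP address
 2.2 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
 0.1 RDNS_DYNAMIC           Delivered to trusted network by host with
 0.0 MISSING_MIMEOLE        Message has X-MSMail-Priority, but no X-MimeOLE
"""

if __name__ == "__main__":
    changed, only_first = compare(AS_POSTFIX, AS_RAUB)
    for rule in sorted(changed):
        print(rule, changed[rule])
    print("only as postfix:", only_first)
```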
>
>     On a related note, why is it saying 5.0 points required if in
>     MailScanner.conf I have
>
>     Required SpamAssassin Score = 4.7
>
>     Do I also have to define required_hits 4.70 in
>     spam.assassin.prefs.conf?
>
>     -- 
>     MailScanner mailing list
>     mailscanner at lists.mailscanner.info
>     <mailto:mailscanner at lists.mailscanner.info>
>     http://lists.mailscanner.info/mailman/listinfo/mailscanner
>
>     Before posting, read http://wiki.mailscanner.info/posting
>
>     Support MailScanner development - buy the book off the website!
>
>
> Hi
>
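> There are two settings in MailScanner.conf for SA scores ("Required
> SpamAssassin Score" and "High SpamAssassin Score"). This gives you the
> opportunity to mark mail above the lower score as "maybe spam" and still
> deliver it, and to treat mail above the high score as definitely spam
> and just drop it.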
>
> This differs from SA's view of the world.
And you can implement as many extra levels of spam score handling as you
like using "SpamAssassin Rule Actions", where you can specify a set of
actions for arbitrary spam score thresholds. So if you need 15 levels of
spam thresholds, no problem!
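A Python sketch of the two ideas. Command-line spamassassin does not read MailScanner.conf, so its report shows SpamAssassin's own required score (5.0 by default), while MailScanner applies its own thresholds to the score SpamAssassin returns. mailscanner_verdict mimics the two-threshold scheme Martin describes, assuming a score at or above a threshold counts and using an illustrative high threshold of 20; action_for generalises it to any number of levels in the spirit of "SpamAssassin Rule Actions". Neither function reproduces MailScanner's actual configuration syntax or code, and the action names are made up.

```python
def mailscanner_verdict(score, required=4.7, high=20.0):
    """Two-threshold handling in the spirit of "Required SpamAssassin Score"
    and "High SpamAssassin Score". Assumes reaching a threshold counts;
    high=20.0 is only an illustrative value."""
    if score >= high:
        return "high scoring spam"
    if score >= required:
        return "spam"
    return "not spam"


def action_for(score, levels):
    """Generalisation to any number of levels: `levels` is a list of
    (threshold, action) pairs in any order; return the action of the highest
    threshold the score reaches, or None if it reaches none."""
    chosen = None
    for threshold, action in sorted(levels):
        if score >= threshold:
            chosen = action
    return chosen


if __name__ == "__main__":
    # Scores from the two reports in this thread; the levels are hypothetical.
    levels = [(4.7, "deliver tagged"), (10.0, "quarantine"), (15.0, "delete")]
    for score in (10.2, 12.7):
        print(score, mailscanner_verdict(score), action_for(score, levels))
```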

Jules

-- 
Julian Field MEng CITP CEng
www.MailScanner.info
Buy the MailScanner book at www.MailScanner.info/store

Need help customising MailScanner?
Contact me!
Need help fixing or optimising your systems?
Contact me!
Need help getting you started solving new requirements from your boss?
Contact me!

PGP footprint: EE81 D763 3DB0 0BFD E1DC 7222 11F6 5947 1415 B654
Follow me at twitter.com/JulesFM and twitter.com/MailScanner

