identical messages -- some get bayes score, some don't

Cannon Watts cwatts at elsberry.k12.mo.us
Sat Jan 10 17:28:13 GMT 2009


On Sat, January 10, 2009 4:31 am, Kai Schaetzl wrote:
> Cannon Watts wrote on Fri, 9 Jan 2009 15:37:06 -0600 (CST):
>
>> Probably getting beyond the scope of this list, but any tips on debugging
>> this?  This particular box is running its own caching DNS that, prior to
>> seeing that debugging info, I would have said works perfectly.
>
> Look at which tests time out and whether they are always the same ones,
> then do some manual tests against these RBLs.
>
>> How would I go about disabling 'some of these tests'?  Set skip_rbl_checks
>> in /etc/mail/spamassassin/mailscanner.cf?
>
> yes (this doesn't shut off URIBL tests).

Thanks, that certainly cuts down on the timeouts.  The URIBL tests are
still generating 281 timeouts on those 28 messages, but that's a minor
concern now that the bayes issues seem to be sorted out (see below).
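
For anyone following along, the change discussed above amounts to a single
line. This is only a sketch of the setting, assuming it goes in the
mailscanner.cf file mentioned earlier:

```
# Skip the DNS blocklist (RBL) checks.  Per the reply above, this does
# not turn off the URIBL tests.
skip_rbl_checks 1
```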

>
>>
>> I suppose there could be a performance problem, but considering I just
>> moved this server from a 933 MHz Pentium with less than a gig of RAM
>> (where it was working reasonably well) to a 2 GHz quad-core w/ 4 GB of RAM
>> and 15k rpm disks (where I've never seen the system load go over 0.5), I
>> tend to look elsewhere first.
>
> I agree it doesn't look like it should be underpowered. But it depends on
> the number of messages you process each day. How many? How long does a
> spamassassin --lint run take? (use time).

It probably averages around 6000 messages per day.  'time spamassassin --lint'
returns:
     real    0m2.450s
     user    0m2.309s
     sys     0m0.141s

I ran spamassassin --lint -D, and did find something peculiar in the output.

  dbg: bayes: tie-ing to DB file R/O /etc/MailScanner/bayes/bayes_toks
  dbg: bayes: tie-ing to DB file R/O /etc/MailScanner/bayes/bayes_seen
  .....
  dbg: bayes: not available for scanning, only 0 spam(s) in bayes DB < 200
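
In case it helps anyone hunting for the same symptom, here is a rough Python
sketch that scans saved debug output for that "not available for scanning"
line. The function name and the sample text are made up for illustration, and
the pattern is based only on the single line quoted above; that a matching
"ham(s)" variant exists is an assumption on my part.

```python
import re

# Pattern built from the one debug line quoted above, e.g.
#   dbg: bayes: not available for scanning, only 0 spam(s) in bayes DB < 200
# The "ham" alternative is an assumption; only the "spam" form was observed.
BAYES_UNAVAILABLE = re.compile(
    r"bayes: not available for scanning, "
    r"only (\d+) (spam|ham)\(s\) in bayes DB < (\d+)"
)


def find_bayes_shortfalls(debug_output):
    """Return (kind, count, required) for each matching debug line."""
    shortfalls = []
    for line in debug_output.splitlines():
        match = BAYES_UNAVAILABLE.search(line)
        if match:
            count, kind, required = match.groups()
            shortfalls.append((kind, int(count), int(required)))
    return shortfalls


# Sample input copied from the debug output shown above.
sample = """\
dbg: bayes: tie-ing to DB file R/O /etc/MailScanner/bayes/bayes_toks
dbg: bayes: tie-ing to DB file R/O /etc/MailScanner/bayes/bayes_seen
dbg: bayes: not available for scanning, only 0 spam(s) in bayes DB < 200
"""

print(find_bayes_shortfalls(sample))  # prints [('spam', 0, 200)]
```

An empty list would mean no such line was found in the output that was
scanned; it says nothing about whether the database itself is healthy.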

/etc/MailScanner/bayes is the correct location for those files, and sa-learn
has been updating them without any errors, but something is obviously not
right.  I moved the old bayes_toks and bayes_seen files, then fed bayes
around 500 spams and hams via sa-learn to create a new database.

Now, running spamassassin on those 28 messages generates a BAYES_99 score
for each one with no bayes timeouts.

I guess my database was either corrupt, or just too big.  Will have to spend
some time re-training bayes, but I'm hopeful that part of the problem is
solved.  Thanks again for your help.

Cannon


