when is Bayes scoring used?

Dene Ulmschneider dene at DATATECHIE.COM
Tue May 6 19:38:10 IST 2003

Something else to add...

According to the script Julian provided for running sa-learn through cron, my
log is called "learn.spam.log".

When I checked that file, I added up all of the "learned from XX messages"
lines, and the total was 447.

Does "learned from" refer to both spam and ham? Is it possible that I have
87 spam and that the rest are ham? I was fairly sure that more spam than ham
was being processed, but I could be wrong.

Can anyone shed a little light?
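
Adding up the "Learned from N message(s)" lines by hand is error-prone, so here is a short Python sketch that does it. It assumes each sa-learn run wrote a summary line containing "Learned from" followed by a count (the exact wording may vary between versions), and the function names are hypothetical. On its own the total does not say whether messages were learned as spam or ham; that depends on whether the cron script called sa-learn with --spam or --ham.

```python
import re
from pathlib import Path

# Assumed log wording: "Learned from 12 message(s) ...". Case-insensitive so
# minor differences in capitalisation still match.
LEARNED_RE = re.compile(r"learned from (\d+) message", re.IGNORECASE)


def total_learned(lines):
    """Sum the N from every 'Learned from N message(s)' line."""
    total = 0
    for line in lines:
        match = LEARNED_RE.search(line)
        if match:
            total += int(match.group(1))
    return total


def total_learned_in_file(path):
    """Same as total_learned, reading the lines from a log file."""
    with Path(path).open(encoding="utf-8", errors="replace") as fh:
        return total_learned(fh)
```

Usage: `total_learned_in_file("learn.spam.log")` returns the summed count for that log.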


At 02:17 PM 5/6/2003 -0400, you wrote:
>The sa-learn -D --rebuild returned the following output:
>debug: Score set 0 chosen.
>debug: running in taint mode? no
>debug: using "/usr/share/spamassassin" for default rules dir
>debug: using "/etc/mail/spamassassin" for site rules dir
>Failed to create default user preference file /root/.spamassassin/user_prefs
>debug: using "/root/.spamassassin/user_prefs" for user prefs file
>debug: bayes: 17204 tie-ing to DB file R/O /root/.spamassassin/bayes_toks
>debug: bayes: 17204 tie-ing to DB file R/O /root/.spamassassin/bayes_seen
>debug: debug: Only 87 spam(s) in Bayes DB < 200
>debug: bayes: 17204 untie-ing
>debug: bayes: 17204 untie-ing db_toks
>debug: bayes: 17204 untie-ing db_seen
>debug: Score set 0 chosen.
>debug: Initialising learner
>debug: Initialising learner
>debug: lock: 17204 created
>debug: lock: 17204 trying to get lock on /root/.spamassassin/bayes with 0
>debug: lock: 17204 link to /root/.spamassassin/bayes.lock: link ok
>debug: bayes: 17204 tie-ing to DB file R/W /root/.spamassassin/bayes_toks
>debug: bayes: 17204 tie-ing to DB file R/W /root/.spamassassin/bayes_seen
>debug: bayes: 17204 untie-ing
>debug: bayes: 17204 untie-ing db_toks
>debug: bayes: 17204 untie-ing db_seen
>debug: bayes: files locked, now unlocking lock
>debug: unlock: 17204 unlink /root/.spamassassin/bayes.lock
>debug: bayes: 17204 untie-ing
>Does anything look wrong? I am shocked to find that only 87 messages have
>been recorded so far, but that's what the output states.
>Thanks for the help.
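
The key line in that output is "Only 87 spam(s) in Bayes DB < 200": the 87 counts spam in the Bayes database, not all recorded messages. A small Python sketch (hypothetical helper) can pull such lines out of a long -D run. It matches only the wording visible above; whether your version also prints a "ham(s)" variant is an assumption.

```python
import re

# Matches lines like "debug: debug: Only 87 spam(s) in Bayes DB < 200".
# The "ham" alternative is an assumption, not confirmed by this output.
ONLY_RE = re.compile(r"Only (\d+) (spam|ham)\(s\) in Bayes DB < (\d+)")


def find_corpus_shortfalls(debug_lines):
    """Return (kind, have, need) for each 'Only N ... < M' debug line."""
    found = []
    for line in debug_lines:
        match = ONLY_RE.search(line)
        if match:
            have, kind, need = int(match.group(1)), match.group(2), int(match.group(3))
            found.append((kind, have, need))
    return found
```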
At 11:38 AM 5/6/2003 -0400, you wrote:
>>I think you need 200 spam and 200 ham.  Try running spamassassin with the
>>-D switch for debug and see what it says about bayes.  Also, you can run
>>the check_bayes_db command and see how many spam and ham have been
>>learned.  And you can run "sa-learn -D --rebuild" and see if it says
>>anything about there not being enough spam or ham.  These may give you
>>some clues to your questions.
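
The rule of thumb in that reply (200 spam and 200 ham) means both counts must reach the minimum before Bayes results are used. A minimal Python sketch of that check follows; the 200/200 figures come from this thread and are assumptions, not confirmed constants for every SpamAssassin version.

```python
# Minimums as stated in this thread ("200 spam and 200 ham"); treat them as
# assumptions for your SpamAssassin version, not guaranteed constants.
MIN_SPAM = 200
MIN_HAM = 200


def bayes_ready(nspam, nham, min_spam=MIN_SPAM, min_ham=MIN_HAM):
    """True only when BOTH the spam and the ham counts reach their minimums."""
    return nspam >= min_spam and nham >= min_ham


def still_needed(nspam, nham, min_spam=MIN_SPAM, min_ham=MIN_HAM):
    """(spam, ham) still to be learned before the minimums are met; never negative."""
    return max(0, min_spam - nspam), max(0, min_ham - nham)
```

With 87 spam, as in the debug output above, `bayes_ready(87, 1000)` is False regardless of the ham count, and `still_needed(87, 1000)` returns `(113, 0)`.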

>>Hey Julian et al.,
>>Regarding all of the messages I have read saying that Bayes will not start
>>working until the magic number of 200 messages is reached: I am certain
>>that I have processed more than 200 messages, yet I still see no
>>"Bayes" entries in the headers.
>>I have checked the files in /root/.spamassassin and found the following:
>>filename              size       date modified
>>auto-whitelist        644.0 KB   today
>>auto-whitelist.db     12.0 KB    3.28.03
>>bayes_msgcount        3.2 KB     today
>>bayes_seen            1.3 MB     today
>>bayes_seen.db         4.0 KB     3.28.03
>>bayes_toks            2.6 MB     today
>>bayes_toks.db         12.0 KB    3.28.03
>>While I was checking these files, I saw a new file called
>>auto-whitelist.lock get created and then deleted, because the system
>>started processing mail at that moment.
>>My questions are:
>>1. Given previous statements about the size of bayes_msgcount, have I
>>   correctly processed only 3 or 4 emails?
>>2. Why are all of the .db files from a month and a half ago?
>>3. Why are there still no headers containing anything about Bayes?
>>Am I missing something? I have had MailScanner running for about 2 months
>>now and am certain that I have processed enough emails.
>>Any help is appreciated.
>>Thank You
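
Question 1 can be answered with arithmetic if you accept the rule quoted further down this thread, that the bayes_msgcount file size equals the number of messages scanned. Reading "size" as bytes, 3.2 kb is roughly 3,200 to 3,300 bytes, i.e. on the order of 3,200 messages, not 3 or 4. A Python sketch of that conversion; the function names are hypothetical, and whether "kb" means 1000 or 1024 bytes depends on the tool that reported it.

```python
import os


def messages_scanned(msgcount_path):
    """Message count under the thread's rule: one byte of bayes_msgcount per
    scanned message (an assumption taken from the reply, not verified here)."""
    return os.path.getsize(msgcount_path)


def approx_messages_from_kb(size_kb, bytes_per_kb=1024):
    """Turn a file-manager size such as '3.2 kb' into an approximate count.
    Whether 'kb' means 1000 or 1024 bytes depends on the reporting tool."""
    return round(size_kb * bytes_per_kb)
```

Usage: `messages_scanned("/root/.spamassassin/bayes_msgcount")` reads the real byte count instead of the rounded file-manager figure.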
At 02:29 PM 5/6/2003 +0100, you wrote:
At 14:18 06/05/2003, you wrote:
>>>>Well, I have just set up MailScanner 4.20-3 and I have some problems
>>>>with Bayes "scoring".
>>>>I have the Bayes database working, as it is modified each time I receive
>>>>a mail, but when I get spam I never see the BAYES_DB tag in the scoring of
>>>>Is there a minimum size of the Bayes database in order for it to be used for
>>>It won't start using the results of the Bayes data until 200 messages have
>>>been scanned. The bayes_msgcount file will tell you how many it has scanned
>>>(file size == number of messages).
>>>>Thanks in advance for any help
>>>>The command:
>>>>check_bayes_db -db /var/spool/spamassassin/bayes | head -8
>>>>0.000        0        0        0  non-token data: db format = on-the-fly expiry, scan-counting
>>>>0.000        0       16        0  non-token data: nspam
>>>>0.000        0     1233        0  non-token data: nham
>>>>0.000        0    51394        0  non-token data: ntokens
>>>>0.000        0        0        0  non-token data: oldest age
>>>>0.000        0     1382        0  non-token data: current scan-count
>>>>0.000        0        0        0  non-token data: last expiry scan-count
>>>>0.027        0        8      801  english
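
The non-token rows of that check_bayes_db output can also be read programmatically. This Python sketch assumes the column layout shown above (each "non-token data:" row carries its value in the third column) and skips the "db format" row, which holds text rather than a number; the function name is hypothetical.

```python
def parse_non_token_data(lines):
    """Map each 'non-token data: NAME' row to the integer in its third column.

    Assumes the check_bayes_db column layout quoted above; rows whose name
    contains '=' (such as 'db format = ...') are skipped.
    """
    marker = "non-token data:"
    result = {}
    for line in lines:
        if marker not in line:
            continue  # ordinary token rows such as 'english'
        numbers, _, name = line.partition(marker)
        name = name.strip()
        if "=" in name:
            continue  # descriptive row, not a count
        fields = numbers.split()
        if len(fields) >= 3:
            result[name] = int(fields[2])
    return result
```

Applied to the quoted output, it would give nspam = 16 and nham = 1233, so by the 200-spam-and-200-ham rule discussed in this thread, spam is the count falling short.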
