when is Bayes scoring used?

Desai, Jason jase at SENSIS.COM
Tue May 6 20:23:06 IST 2003


Are you sure you're using the same Bayes database for everything? Make sure
you are running everything (MailScanner, sa-learn scripts, etc.) as the same
user, or that you specify the same location for your Bayes database. I think
you can force a location in both your spam.assassin.prefs.conf and in
MailScanner.conf. If you're not specifying a location, it should default to
~/.spamassassin.
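A minimal Python sketch of the "same database" check, assuming only what the debug output in this thread shows: with no location configured, the files (bayes_toks, bayes_seen, ...) share the prefix ~/.spamassassin/bayes. The helper names are hypothetical. Two processes share a database only if their prefixes resolve to the same path.

```python
import os

def default_bayes_prefix(home_dir):
    # Hypothetical helper: with no location configured, the database files
    # (bayes_toks, bayes_seen, ...) share the prefix ~/.spamassassin/bayes.
    return os.path.join(home_dir, ".spamassassin", "bayes")

def same_bayes_db(prefix_a, prefix_b):
    # Two processes share a database only if their prefixes resolve to the
    # same path after normalising "..", symlinks, etc.
    return os.path.realpath(prefix_a) == os.path.realpath(prefix_b)

# MailScanner running as root vs. a job pointed at a shared spool location
# (the /var/spool path is the one used with check_bayes_db later in the thread).
print(same_bayes_db(default_bayes_prefix("/root"),
                    "/var/spool/spamassassin/bayes"))
```

For example, MailScanner running as root would use /root/.spamassassin/bayes, while an sa-learn cron job running as a different user would default to that user's home directory instead.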

Right now, though, it looks like your database (for root) has learned only
about 87 spams.

Jason

-----Original Message-----
From: Dene Ulmschneider [mailto:dene at DATATECHIE.COM]
Sent: Tuesday, May 06, 2003 2:38 PM
To: MAILSCANNER at JISCMAIL.AC.UK
Subject: Re: [MAILSCANNER] when is Bayes scoring used?


Something else to add...

According to the script that Julian provided to run sa-learn through cron, my
log is called "learn.spam.log".

When I checked that file, I added up all of the "learned from XX messages"
lines and the total was 447.
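A tally like this can be scripted. A minimal Python sketch, assuming each relevant log line contains text of the form "Learned from N message(s)"; that wording is an assumption, so adjust the pattern to whatever learn.spam.log actually contains. The function name is hypothetical.

```python
import re

# Assumed log wording (hypothetical): "Learned from 12 message(s) (15 message(s) examined)."
LEARNED_RE = re.compile(r"learned from (\d+) message", re.IGNORECASE)

def total_learned(lines):
    """Sum N over every line containing 'learned from N message'."""
    total = 0
    for line in lines:
        match = LEARNED_RE.search(line)
        if match:
            total += int(match.group(1))
    return total

sample = [
    "Learned from 12 message(s) (15 message(s) examined).",
    "some unrelated line",
    "Learned from 3 message(s) (3 message(s) examined).",
]
print(total_learned(sample))  # 15
```

Against the real log, pass an open file handle, e.g. total_learned(open("learn.spam.log")). The sum only counts what the log reports; it cannot tell which Bayes database those messages were learned into.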

Does "learned from" refer to both spam and ham? Is it possible that I have
87 spam and the rest of them are ham? I was pretty sure that more spam was
getting processed than ham, but I could be wrong.

Can anyone shed a little light?

Dene

At 02:17 PM 5/6/2003 -0400, you wrote:


The sa-learn -D --rebuild returned the following output:

<---snip--->
debug: Score set 0 chosen.
debug: running in taint mode? no
debug: using "/usr/share/spamassassin" for default rules dir
debug: using "/etc/mail/spamassassin" for site rules dir
Failed to create default user preference file /root/.spamassassin/user_prefs

debug: using "/root/.spamassassin/user_prefs" for user prefs file
debug: bayes: 17204 tie-ing to DB file R/O /root/.spamassassin/bayes_toks
debug: bayes: 17204 tie-ing to DB file R/O /root/.spamassassin/bayes_seen
debug: debug: Only 87 spam(s) in Bayes DB < 200
debug: bayes: 17204 untie-ing
debug: bayes: 17204 untie-ing db_toks
debug: bayes: 17204 untie-ing db_seen
debug: Score set 0 chosen.
debug: Initialising learner
debug: Initialising learner
debug: lock: 17204 created /root/.spamassassin/bayes.lock.neo.datatechie.com.17204
debug: lock: 17204 trying to get lock on /root/.spamassassin/bayes with 0 retries
debug: lock: 17204 link to /root/.spamassassin/bayes.lock: link ok
debug: bayes: 17204 tie-ing to DB file R/W /root/.spamassassin/bayes_toks
debug: bayes: 17204 tie-ing to DB file R/W /root/.spamassassin/bayes_seen
debug: bayes: 17204 untie-ing
debug: bayes: 17204 untie-ing db_toks
debug: bayes: 17204 untie-ing db_seen
debug: bayes: files locked, now unlocking lock
debug: unlock: 17204 unlink /root/.spamassassin/bayes.lock
debug: bayes: 17204 untie-ing
<---snip--->

Does anything look wrong? I am shocked to find that only 87 messages have
been recorded so far, but that's what the output states.

Thanks for the help.

Dene

At 11:38 AM 5/6/2003 -0400, you wrote:


I think you need 200 spam and 200 ham.  Try running spamassassin with the -D
switch for debug and see what it says about bayes.  Also, you can run the
check_bayes_db command and see how many spam and ham have been learned.  And
you can run "sa-learn -D --rebuild" and see if it says anything about there
not being enough spam or ham.  These may give you some clues to your
questions.

Jason


-----Original Message-----
From: Dene Ulmschneider [mailto:dene at DATATECHIE.COM]
Sent: Tuesday, May 06, 2003 10:53 AM
To: MAILSCANNER at JISCMAIL.AC.UK
Subject: Re: [MAILSCANNER] when is Bayes scoring used?

Hey Julian et al.,

Regarding all of the messages I have read saying that Bayes will not start
working until the magic number of 200 messages is reached: I am certain that
I have processed more than 200 messages, and yet I still see no "Bayes"
entries in the headers.

I have checked the files in /root/.spamassassin and found the following:

filename            size        date modified
auto-whitelist      644.0 kb    today
auto-whitelist.db   12.0 kb     3.28.03
bayes_msgcount      3.2 kb      today
bayes_seen          1.3 mb      today
bayes_seen.db       4.0 kb      3.28.03
bayes_toks          2.6 mb      today
bayes_toks.db       12.0 kb     3.28.03

While I was checking these files, I saw a file called auto-whitelist.lock
being created and then deleted, because the system started processing mail
at that moment.

The questions that I have are:

1. According to previous statements about the size of bayes_msgcount, have I
   only correctly processed 3 or 4 emails?
2. Why are all of the .db files from a month and a half ago?
3. Why are there still no headers containing anything regarding Bayes?

Am I missing something? I have had MailScanner running for about 2 months
now and am certain that I have processed enough emails.

Any help is appreciated.

Thank You

Dene Ulmschneider
Data Techie Inc.
-------------------------------------------------------------------------
office:         718.738.8859
email:          dene at datatechie.com
pager mail:     denenow at datatechie.com
website:        www.datatechie.com
-------------------------------------------------------------------------
"Life is too short... you should have dessert first"

At 02:29 PM 5/6/2003 +0100, you wrote:


At 14:18 06/05/2003, you wrote:

Well, I have just set up MailScanner 4.20-3 and I am having some problems
with Bayes "scoring".

The Bayes database is working, as it is modified each time I receive a
mail, but when I get spam I never see the BAYES_DB tag in its scoring.
Is there a minimum size the Bayes database must reach before it is used
for scoring?

It won't start using the results of the Bayes data until 200 messages have
been scanned. The bayes_msgcount file will tell you how many it has scanned
(file size == number of messages).

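Taking Julian's description at face value (one byte in bayes_msgcount per scanned message), the count can be read straight from the file size. A minimal Python sketch; the helper names are hypothetical. The second helper converts a file-manager figure such as the "3.2 kb" in Dene's listing back to bytes, which under that description means roughly 3,200 messages rather than 3 or 4.

```python
import os

def messages_scanned(msgcount_path):
    # Per Julian's description: file size in bytes == number of scanned messages.
    return os.path.getsize(msgcount_path)

def kb_to_bytes(kb, bytes_per_kb=1024):
    # File managers may use 1024 or 1000 bytes per "kb"; both are shown below.
    return round(kb * bytes_per_kb)

print(kb_to_bytes(3.2))        # 3277
print(kb_to_bytes(3.2, 1000))  # 3200
```

On a live system, messages_scanned("/root/.spamassassin/bayes_msgcount") would return the byte count directly, without the rounding a file manager applies.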
Thanks in advance for any help.

P.S.
The command

check_bayes_db -db /var/spool/spamassassin/bayes | head -8

gives:

0.000        0        0        0  non-token data: db format = on-the-fly probs, expiry, scan-counting
0.000        0       16        0  non-token data: nspam
0.000        0     1233        0  non-token data: nham
0.000        0    51394        0  non-token data: ntokens
0.000        0        0        0  non-token data: oldest age
0.000        0     1382        0  non-token data: current scan-count
0.000        0        0        0  non-token data: last expiry scan-count
0.027        0        8      801  english

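The spam/ham counts check suggested in this thread can be scripted against output like the above. A minimal Python sketch, assuming the column layout shown (value in the third column, name after "non-token data:") and the 200 spam / 200 ham minimum discussed in this thread; the function names are hypothetical.

```python
import re

# Third numeric column holds the value; the word after "non-token data:" is its name.
NON_TOKEN_RE = re.compile(r"^\s*\S+\s+\d+\s+(\d+)\s+\d+\s+non-token data: (\w+)")

def parse_counts(output):
    """Map e.g. 'nspam' -> 16 from check_bayes_db output."""
    counts = {}
    for line in output.splitlines():
        match = NON_TOKEN_RE.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts

def bayes_ready(counts, min_spam=200, min_ham=200):
    # Assumed thresholds: the 200/200 figure quoted in this thread.
    return counts.get("nspam", 0) >= min_spam and counts.get("nham", 0) >= min_ham

output = """\
0.000        0       16        0  non-token data: nspam
0.000        0     1233        0  non-token data: nham
0.000        0    51394        0  non-token data: ntokens
"""
counts = parse_counts(output)
print(counts["nspam"], counts["nham"], bayes_ready(counts))  # 16 1233 False
```

With only 16 spam learned, the sketch reports the database as not ready even though ham is well past 200, which matches the thread's explanation of why no Bayes scores appear.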
--
Eric Doutreleau
I.N.T                   | Tel   : +33 (0) 160764687
9 rue Charles Fourier   | Fax   : +33 (0) 160764321
91011 Evry   France     | email : Eric.Doutreleau at int-evry.fr
--
Julian Field
www.MailScanner.info
MailScanner thanks transtec Computers for their support
