when is Bayes scoring used?
Desai, Jason
jase at SENSIS.COM
Tue May 6 20:23:06 IST 2003
Are you sure you're using the same Bayes database for everything? Make sure
you are running everything (MailScanner, sa-learn scripts, etc.) as the same
user, or that you specify the same location for your Bayes database everywhere.
I think you can force a location in both your spam.assassin.prefs.conf and in
MailScanner.conf. If you're not specifying a location, it should default to
~/.spamassassin.
Right now though, it looks like your database (for root) has only learned
about 87 spams.
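To illustrate why the user matters: with no location configured, the database path hangs off the home directory of whoever runs the command. A rough sketch in Python, not anything MailScanner or SpamAssassin ships; the helper name and the second home directory are made up for illustration, and the "bayes" file prefix is taken from the debug output quoted further down.

```python
import posixpath

def default_bayes_prefix(home_dir):
    """Hypothetical helper: default Bayes DB prefix under a given home directory."""
    return posixpath.join(home_dir, ".spamassassin", "bayes")

root_db = default_bayes_prefix("/root")
other_db = default_bayes_prefix("/home/mailuser")  # hypothetical second account

print(root_db)
print(other_db)
print("same database:", root_db == other_db)
```

If the two paths differ, whatever sa-learn teaches one database is never seen by the process reading the other.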
Jason
-----Original Message-----
From: Dene Ulmschneider [mailto:dene at DATATECHIE.COM]
Sent: Tuesday, May 06, 2003 2:38 PM
To: MAILSCANNER at JISCMAIL.AC.UK
Subject: Re: [MAILSCANNER] when is Bayes scoring used?
Something else to add...
According to the script that Julian provided to run sa-learn through cron, my
log is called "learn.spam.log".
When I checked that file, I added up all of the "learned from XX messages"
lines and the total was 447.
Is the "learned from" referring to both spam and ham? Is it possible that I
have 87 spam and the rest are ham? I was pretty sure that more
spam was getting processed than ham, but I could be wrong.
Can anyone shed a little light?
Dene
At 02:17 PM 5/6/2003 -0400, you wrote:
The sa-learn -D --rebuild returned the following output:
<---snip--->
debug: Score set 0 chosen.
debug: running in taint mode? no
debug: using "/usr/share/spamassassin" for default rules dir
debug: using "/etc/mail/spamassassin" for site rules dir
Failed to create default user preference file /root/.spamassassin/user_prefs
debug: using "/root/.spamassassin/user_prefs" for user prefs file
debug: bayes: 17204 tie-ing to DB file R/O /root/.spamassassin/bayes_toks
debug: bayes: 17204 tie-ing to DB file R/O /root/.spamassassin/bayes_seen
debug: debug: Only 87 spam(s) in Bayes DB < 200
debug: bayes: 17204 untie-ing
debug: bayes: 17204 untie-ing db_toks
debug: bayes: 17204 untie-ing db_seen
debug: Score set 0 chosen.
debug: Initialising learner
debug: Initialising learner
debug: lock: 17204 created /root/.spamassassin/bayes.lock.neo.datatechie.com.17204
debug: lock: 17204 trying to get lock on /root/.spamassassin/bayes with 0 retries
debug: lock: 17204 link to /root/.spamassassin/bayes.lock: link ok
debug: bayes: 17204 tie-ing to DB file R/W /root/.spamassassin/bayes_toks
debug: bayes: 17204 tie-ing to DB file R/W /root/.spamassassin/bayes_seen
debug: bayes: 17204 untie-ing
debug: bayes: 17204 untie-ing db_toks
debug: bayes: 17204 untie-ing db_seen
debug: bayes: files locked, now unlocking lock
debug: unlock: 17204 unlink /root/.spamassassin/bayes.lock
debug: bayes: 17204 untie-ing
<---snip--->
Does anything look wrong? I am shocked to find that only 87 messages have
been recorded so far, but that's what the output states.
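For anyone wanting to pull the two useful facts out of a long debug run (which database files were opened, and which count fell short), here is a minimal Python sketch. The sample lines are copied from the output above; the function name is made up, and matching a "ham" variant of the "Only N spam(s)" line is an assumption, since only the spam form appears in this output.

```python
import re

# Lines copied from the debug output above.
DEBUG_OUTPUT = """\
debug: bayes: 17204 tie-ing to DB file R/O /root/.spamassassin/bayes_toks
debug: bayes: 17204 tie-ing to DB file R/O /root/.spamassassin/bayes_seen
debug: debug: Only 87 spam(s) in Bayes DB < 200
debug: bayes: 17204 tie-ing to DB file R/W /root/.spamassassin/bayes_toks
"""

def summarize_debug(text):
    """Return (sorted unique DB files opened, list of (kind, have, need) shortfalls)."""
    db_files = sorted(set(re.findall(r"tie-ing to DB file R/[OW] (\S+)", text)))
    # The "ham" alternative is an assumption; only the spam line is shown above.
    pattern = r"Only (\d+) (spam|ham)\(s\) in Bayes DB < (\d+)"
    shortfalls = [(kind, int(have), int(need))
                  for have, kind, need in re.findall(pattern, text)]
    return db_files, shortfalls

db_files, shortfalls = summarize_debug(DEBUG_OUTPUT)
print(db_files)
print(shortfalls)
```

The file list shows which database this particular run used (root's, here), which is worth comparing against whatever MailScanner itself opens.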
Thanks for the help.
Dene
At 11:38 AM 5/6/2003 -0400, you wrote:
I think you need 200 spam and 200 ham. Try running spamassassin with the -D
switch for debug and see what it says about bayes. Also, you can run the
check_bayes_db command and see how many spam and ham have been learned. And
you can run "sa-learn -D --rebuild" and see if it says anything about there
not being enough spam or ham. These may give you some clues to your
questions.
Jason
-----Original Message-----
From: Dene Ulmschneider [mailto:dene at DATATECHIE.COM]
Sent: Tuesday, May 06, 2003 10:53 AM
To: MAILSCANNER at JISCMAIL.AC.UK
Subject: Re: [MAILSCANNER] when is Bayes scoring used?
Hey Julian et al.,
Regarding all of the messages I have read saying that Bayes will not start
working until the magic number of 200 messages is reached: I am certain that
I have processed more than 200 messages, and yet I still see no "Bayes"
entries in the headers.
I have checked the files in /root/.spamassassin and found the following:
filename            size       date modified
auto-whitelist      644.0 kb   today
auto-whitelist.db   12.0 kb    3.28.03
bayes_msgcount      3.2 kb     today
bayes_seen          1.3 mb     today
bayes_seen.db       4.0 kb     3.28.03
bayes_toks          2.6 mb     today
bayes_toks.db       12.0 kb    3.28.03
While I was checking these files, I saw that a new file called
auto-whitelist.lock was created and then deleted, because the system
started processing mail at that moment.
The questions that I have are:
1. According to previous statements about the size of bayes_msgcount, have I
only correctly processed 3 or 4 emails?
2. Why are all of the .db files from a month and a half ago?
3. Why are there still no headers containing anything regarding Bayes?
Am I missing something? I have had MailScanner running for about 2 months
now and am certain that I have processed enough emails.
Any help is appreciated.
Thank You
Dene Ulmschneider
Data Techie Inc.
-------------------------------------------------------------------------
office: 718.738.8859
email: dene at datatechie.com
pager mail: denenow at datatechie.com
website: www.datatechie.com <http://www.datatechie.com/>
-------------------------------------------------------------------------
"Life is too short...-...you should have dessert first"
At 02:29 PM 5/6/2003 +0100, you wrote:
At 14:18 06/05/2003, you wrote:
Well, I have just set up MailScanner 4.20-3 and I have some problems
with Bayes "scoring".
I have the Bayes database working, as it is modified each time I receive
a mail, but when I get spam I never see a BAYES_DB tag in the scoring of the
spam.
Is there a minimum size the Bayes database must reach before it is used for
scoring?
It won't start using the results of the Bayes data until 200 messages have
been scanned. The bayes_msgcount file will tell you how many it has scanned
(file size == number of messages).
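As a rough illustration of the "file size == number of messages" rule of thumb, here is a small Python sketch. The function name is made up, and it runs against a throwaway temporary file rather than a real bayes_msgcount; it assumes the size is counted in bytes.

```python
import os
import tempfile

def scanned_message_count(path):
    """File size in bytes, read as a message count (per the rule of thumb above)."""
    return os.path.getsize(path)

# Demonstrate on a throwaway file rather than a real bayes_msgcount.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 250)  # stand-in for 250 scanned messages
    demo_path = tmp.name

demo_count = scanned_message_count(demo_path)
os.remove(demo_path)
print(demo_count, "messages; past 200:", demo_count >= 200)
```

On a real system the path would be wherever your Bayes files live. Read this way, a file of a few kilobytes would mean a few thousand scanned messages rather than a handful.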
Thanks in advance for any help
P.S
the command
check_bayes_db -db /var/spool/spamassassin/bayes | head -8
0.000 0 0 0 non-token data: db format = on-the-fly probs, expiry, scan-counting
0.000 0 16 0 non-token data: nspam
0.000 0 1233 0 non-token data: nham
0.000 0 51394 0 non-token data: ntokens
0.000 0 0 0 non-token data: oldest age
0.000 0 1382 0 non-token data: current scan-count
0.000 0 0 0 non-token data: last expiry scan-count
0.027 0 8 801 english
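The nspam and nham lines in that output can be checked against the 200-spam and 200-ham minimum suggested elsewhere in this thread. A minimal Python sketch follows; the function name is made up, the sample lines are copied from the output above, and reading the third column as the count is an assumption based on the shape of this output.

```python
# Lines copied from the check_bayes_db output above.
SAMPLE = """\
0.000 0 16 0 non-token data: nspam
0.000 0 1233 0 non-token data: nham
0.000 0 51394 0 non-token data: ntokens
"""

def read_counts(text):
    """Map each 'non-token data' label to its count (assumed to be the third column)."""
    counts = {}
    for line in text.splitlines():
        head, sep, label = line.partition("non-token data:")
        fields = head.split()
        if sep and len(fields) >= 3:
            counts[label.strip()] = int(fields[2])
    return counts

counts = read_counts(SAMPLE)
MIN_EACH = 200  # assumed to apply to spam and ham separately, as suggested in this thread
enough = counts["nspam"] >= MIN_EACH and counts["nham"] >= MIN_EACH
print(counts["nspam"], "spam,", counts["nham"], "ham; enough for Bayes:", enough)
```

With these numbers the ham count is well past 200 but the spam count is only 16, which would explain why no Bayes score shows up even though the scan count is over a thousand.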
--
Eric Doutreleau
I.N.T | Tel : +33 (0) 160764687
9 rue Charles Fourier | Fax : +33 (0) 160764321
91011 Evry France | email : Eric.Doutreleau at int-evry.fr
--
Julian Field
www.MailScanner.info <http://www.mailscanner.info/>
MailScanner thanks transtec Computers for their support