when is Bayes scoring used?
Dene Ulmschneider
dene at DATATECHIE.COM
Tue May 6 19:38:10 IST 2003
Something else to add...
According to the script that Julian provided to run sa-learn through cron, my
log is called "learn.spam.log".
When I checked that file, I added up all of the "learned from XX messages"
lines, and the total was 447.
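To repeat that tally without adding the numbers by hand, a short script can scan the log. This is a minimal sketch in Python. It assumes each sa-learn run writes a line containing "learned from N message..." (the exact wording of sa-learn's summary line is an assumption here). It only sums the counts. It cannot tell spam runs from ham runs, because that depends on whether the cron script calls sa-learn with --spam or --ham.

```python
import re
import sys

# Assumed log format: each sa-learn run writes a line such as
# "Learned from 12 message(s) (15 message(s) examined)."
LEARNED_RE = re.compile(r"learned from (\d+) message", re.IGNORECASE)


def total_learned(lines):
    """Sum the 'learned from N messages' counts across all log lines."""
    total = 0
    for line in lines:
        match = LEARNED_RE.search(line)
        if match:
            total += int(match.group(1))
    return total


if __name__ == "__main__":
    # Defaults to the log name used by the cron script in this thread.
    path = sys.argv[1] if len(sys.argv) > 1 else "learn.spam.log"
    with open(path) as log:
        print(total_learned(log))
```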
Does the "learned from" count include both spam and ham? Is it possible that I
have 87 spam and the rest are ham? I was pretty sure that more spam was being
processed than ham, but I could be wrong.
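You can check whether Bayes has enough data from the nspam and nham lines of check_bayes_db, as in Eric's output further down. This is a minimal sketch in Python. It assumes the line layout shown in that output, with the count in the third column. It also assumes a 200 spam / 200 ham minimum, based on the debug line "Only 87 spam(s) in Bayes DB < 200" and the advice in this thread. The script name in the usage line is hypothetical.

```python
import sys

# Assumed minimums, taken from the debug output and advice in this thread.
MIN_SPAM = 200
MIN_HAM = 200


def parse_counts(lines):
    """Extract nspam/nham from check_bayes_db output.

    Assumes lines shaped like '0.000 0 16 0 non-token data: nspam',
    where the third whitespace-separated field is the count.
    """
    counts = {}
    for line in lines:
        if "non-token data:" not in line:
            continue
        label = line.rsplit(":", 1)[1].strip()
        if label in ("nspam", "nham"):
            counts[label] = int(line.split()[2])
    return counts


def bayes_ready(counts):
    """True only if both spam and ham counts reach the assumed minimums."""
    return counts.get("nspam", 0) >= MIN_SPAM and counts.get("nham", 0) >= MIN_HAM


if __name__ == "__main__":
    counts = parse_counts(sys.stdin)
    print(counts, "ready" if bayes_ready(counts) else "not enough yet")
```

Usage: `check_bayes_db -db /var/spool/spamassassin/bayes | python3 bayes_ready.py`. With Eric's figures (16 spam, 1233 ham), it reports "not enough yet". Plenty of ham does not make up for too little spam.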
Can anyone shed a little light?
Dene
At 02:17 PM 5/6/2003 -0400, you wrote:
>The sa-learn -D --rebuild returned the following output:
>
><---snip--->
>debug: Score set 0 chosen.
>debug: running in taint mode? no
>debug: using "/usr/share/spamassassin" for default rules dir
>debug: using "/etc/mail/spamassassin" for site rules dir
>Failed to create default user preference file /root/.spamassassin/user_prefs
>debug: using "/root/.spamassassin/user_prefs" for user prefs file
>debug: bayes: 17204 tie-ing to DB file R/O /root/.spamassassin/bayes_toks
>debug: bayes: 17204 tie-ing to DB file R/O /root/.spamassassin/bayes_seen
>debug: debug: Only 87 spam(s) in Bayes DB < 200
>debug: bayes: 17204 untie-ing
>debug: bayes: 17204 untie-ing db_toks
>debug: bayes: 17204 untie-ing db_seen
>debug: Score set 0 chosen.
>debug: Initialising learner
>debug: Initialising learner
>debug: lock: 17204 created /root/.spamassassin/bayes.lock.neo.datatechie.com.17204
>debug: lock: 17204 trying to get lock on /root/.spamassassin/bayes with 0 retries
>debug: lock: 17204 link to /root/.spamassassin/bayes.lock: link ok
>debug: bayes: 17204 tie-ing to DB file R/W /root/.spamassassin/bayes_toks
>debug: bayes: 17204 tie-ing to DB file R/W /root/.spamassassin/bayes_seen
>debug: bayes: 17204 untie-ing
>debug: bayes: 17204 untie-ing db_toks
>debug: bayes: 17204 untie-ing db_seen
>debug: bayes: files locked, now unlocking lock
>debug: unlock: 17204 unlink /root/.spamassassin/bayes.lock
>debug: bayes: 17204 untie-ing
><---snip--->
>
>Does anything look wrong? I am shocked to find that only 87 messages have
>been recorded so far, but that's what the output states.
>
>Thanks for the help.
>
>Dene
>
>At 11:38 AM 5/6/2003 -0400, you wrote:
>>I think you need 200 spam and 200 ham. Try running spamassassin with the
>>-D switch for debug and see what it says about bayes. Also, you can run
>>the check_bayes_db command and see how many spam and ham have been
>>learned. And you can run "sa-learn -D --rebuild" and see if it says
>>anything about there not being enough spam or ham. These may give you
>>some clues to your questions.
>>
>>Jason
>>-----Original Message-----
>>From: Dene Ulmschneider [mailto:dene at DATATECHIE.COM]
>>Sent: Tuesday, May 06, 2003 10:53 AM
>>To: MAILSCANNER at JISCMAIL.AC.UK
>>Subject: Re: [MAILSCANNER] when is Bayes scoring used?
>>
>>Hey Julian et al.,
>>
>>Regarding all of the messages I have read saying that Bayes will not
>>start working until the magic number of 200 messages is reached: I am
>>certain that I have processed more than 200 messages, and yet I still see
>>no "Bayes" entries in the headers.
>>
>>I have checked the files in /root/.spamassassin and found the following:
>>
>>filename            size      date modified
>>auto-whitelist      644.0 KB  today
>>auto-whitelist.db   12.0 KB   3.28.03
>>bayes_msgcount      3.2 KB    today
>>bayes_seen          1.3 MB    today
>>bayes_seen.db       4.0 KB    3.28.03
>>bayes_toks          2.6 MB    today
>>bayes_toks.db       12.0 KB   3.28.03
>>
>>While I was checking these files, I saw a file called auto-whitelist.lock
>>being created and then deleted, because the system started processing
>>mail at that moment.
>>
>>The questions that I have are:
>>1. According to previous statements about the size of bayes_msgcount,
>>have I only correctly processed 3 or 4 emails?
>>2. Why are all of the .db files from a month and a half ago?
>>3. Why are there still no headers mentioning Bayes?
>>Am I missing something? I have had MailScanner running for about 2
>>months now and am certain that I have processed enough emails.
>>
>>Any help is appreciated.
>>
>>Thank You
>>
>>Dene Ulmschneider
>>Data Techie Inc.
>>-------------------------------------------------------------------------
>>office: 718.738.8859
>>email: dene at datatechie.com
>>pager mail: denenow at datatechie.com
>>website: www.datatechie.com
>>-------------------------------------------------------------------------
>>"Life is too short...-...you should have dessert first"
>>
>>At 02:29 PM 5/6/2003 +0100, you wrote:
>>>At 14:18 06/05/2003, you wrote:
>>>>Well, I have just set up MailScanner 4.20-3 and I have some problems
>>>>with Bayes scoring.
>>>>
>>>>The Bayes database is working, since it is modified each time I receive
>>>>a mail, but when I get spam I never see the BAYES_DB tag in the spam
>>>>score.
>>>>Is there a minimum size the Bayes database must reach before it is used
>>>>for scoring?
>>>It won't start using the results of the Bayes data until 200 messages have
>>>been scanned. The bayes_msgcount file will tell you how many it has scanned
>>>(file size == number of messages).
>>>
>>>>Thanks in advance for any help
>>>>
>>>>P.S.
>>>>Output of the command:
>>>>check_bayes_db -db /var/spool/spamassassin/bayes | head -8
>>>>0.000 0 0 0 non-token data: db format = on-the-fly probs, expiry, scan-counting
>>>>0.000 0 16 0 non-token data: nspam
>>>>0.000 0 1233 0 non-token data: nham
>>>>0.000 0 51394 0 non-token data: ntokens
>>>>0.000 0 0 0 non-token data: oldest age
>>>>0.000 0 1382 0 non-token data: current scan-count
>>>>0.000 0 0 0 non-token data: last expiry scan-count
>>>>0.027 0 8 801 english
>>>>
>>>>--
>>>>Eric Doutreleau
>>>>I.N.T | Tel : +33 (0) 160764687
>>>>9 rue Charles Fourier | Fax : +33 (0) 160764321
>>>>91011 Evry France | email : Eric.Doutreleau at int-evry.fr
>>>--
>>>Julian Field
>>>www.MailScanner.info
>>>MailScanner thanks transtec Computers for their support