<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META content="MSHTML 6.00.2800.1170" name=GENERATOR></HEAD>
<BODY>
<DIV><SPAN class=130291819-06052003><FONT face=Arial color=#0000ff size=2>Are
you sure you're using the same Bayes database for everything? Make sure
you are running everything (MailScanner, the sa-learn scripts, etc.) as the same user,
or that you specify the same location for your Bayes database. I think you can
force a location in both your spam.assassin.prefs.conf and your
MailScanner.conf. If you don't specify a location, it should default
to ~/.spamassassin for whichever user is running it.</FONT></SPAN></DIV>
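<DIV><FONT face=Arial color=#0000ff size=2>One quick way to check this is to look for more than one set of bayes_* files on the box. Below is a minimal Python sketch (not part of MailScanner or SpamAssassin) that lists them. The two candidate directories are assumptions taken from paths mentioned in this thread (root's default ~/.spamassassin and the /var/spool/spamassassin prefix used with check_bayes_db); pass your own directories on the command line instead.</FONT></DIV>
<PRE>
```python
import os
import sys

# Hypothetical candidate locations, taken from paths mentioned in this thread.
CANDIDATE_DIRS = ["/root/.spamassassin", "/var/spool/spamassassin"]


def find_bayes_files(dirs):
    """Return {directory: {filename: size_in_bytes}} for each bayes_* file found."""
    found = {}
    for d in dirs:
        # isdir() returns False for missing or unreadable paths instead of raising.
        if not os.path.isdir(d):
            continue
        try:
            names = sorted(os.listdir(d))
        except OSError:
            continue
        files = {
            name: os.path.getsize(os.path.join(d, name))
            for name in names
            if name.startswith("bayes_") and os.path.isfile(os.path.join(d, name))
        }
        if files:
            found[d] = files
    return found


if __name__ == "__main__":
    hits = find_bayes_files(sys.argv[1:] or CANDIDATE_DIRS)
    if len(hits) > 1:
        print("More than one Bayes database found; make sure every tool uses the same one.")
    for d, files in hits.items():
        print(d)
        for name, size in files.items():
            print(f"  {name}: {size} bytes")
```
</PRE>
<DIV><FONT face=Arial color=#0000ff size=2>If the script reports two directories, MailScanner and the sa-learn cron job are probably reading and writing different databases.</FONT></DIV>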
<DIV><SPAN class=130291819-06052003><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=130291819-06052003><FONT face=Arial color=#0000ff size=2>Right
now, though, it looks like your database (root's) has learned from only 87
spam messages.</FONT></SPAN></DIV>
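<DIV><FONT face=Arial color=#0000ff size=2>The debug output says explicitly when Bayes is being skipped. Here is a minimal Python sketch that pulls that count out of captured debug text. The line format is copied from the output quoted below; that a matching "ham(s)" line exists is an assumption.</FONT></DIV>
<PRE>
```python
import re

# Matches debug lines like the one quoted in this thread:
#   "debug: Only 87 spam(s) in Bayes DB < 200"
# A matching "ham(s)" variant is assumed; \W+ skips the " < " separator.
SHORTFALL = re.compile(r"Only (\d+) (spam|ham)\(s\) in Bayes DB\W+(\d+)")


def bayes_shortfalls(debug_text):
    """Return {"spam" or "ham": (learned, required)} for each shortfall line."""
    return {
        kind: (int(have), int(need))
        for have, kind, need in SHORTFALL.findall(debug_text)
    }


if __name__ == "__main__":
    sample = "debug: debug: Only 87 spam(s) in Bayes DB < 200\n"
    for kind, (have, need) in bayes_shortfalls(sample).items():
        print(f"{kind}: {have} learned, {need - have} more needed")
```
</PRE>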
<DIV><SPAN class=130291819-06052003><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=130291819-06052003><FONT face=Arial color=#0000ff
size=2>Jason</FONT></SPAN></DIV>
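<DIV><FONT face=Arial color=#0000ff size=2>Rather than adding up the "learned from XX messages" lines in learn.spam.log by hand, as Dene did below, you can let a short script total them. This is a minimal Python sketch; the exact wording sa-learn writes to that log is an assumption, so check the pattern against your own log. Keep in mind that this total only reflects what the cron job reported learning. If the job runs as a different user, or with a different Bayes path, those messages may have gone into a different database than the one root's SpamAssassin reads.</FONT></DIV>
<PRE>
```python
import re
import sys

# Assumed summary format printed by sa-learn, e.g.
#   "Learned from 12 message(s) (15 message(s) examined)."
# Matching is case-insensitive so "learned from 12 messages" also counts.
LEARNED = re.compile(r"learned from (\d+) message", re.IGNORECASE)


def total_learned(log_text):
    """Sum every 'learned from N message(s)' count found in a log."""
    return sum(int(n) for n in LEARNED.findall(log_text))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python total_learned.py learn.spam.log
    with open(sys.argv[1]) as f:
        print(total_learned(f.read()))
```
</PRE>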
<BLOCKQUOTE dir=ltr
style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #0000ff 2px solid; MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader dir=ltr align=left><FONT face=Tahoma
size=2>-----Original Message-----<BR><B>From:</B> Dene Ulmschneider
[mailto:dene@DATATECHIE.COM]<BR><B>Sent:</B> Tuesday, May 06, 2003 2:38
PM<BR><B>To:</B> MAILSCANNER@JISCMAIL.AC.UK<BR><B>Subject:</B> Re:
[MAILSCANNER] when is Bayes scoring used?<BR><BR></FONT></DIV>Something else
to add...<BR><BR>According to the script that Julian provided to run sa-learn
through cron, my log is called "learn.spam.log".<BR><BR>When I checked that
file, I added up all of the "learned from <I>XX</I> messages" lines, and the total
was 447.<BR><BR>Does "learned from" count both spam and ham? Is it
possible that I have 87 spam and the rest are ham? I was fairly sure that more
spam than ham was being processed, but I could be
wrong.<BR><BR>Can anyone shed a little light?<BR><BR>Dene<BR><BR>At 02:17 PM
5/6/2003 -0400, you wrote:<BR>
<BLOCKQUOTE class=cite cite="" type="cite">The sa-learn -D --rebuild
returned the following output:<BR><BR><---snip---><BR><FONT
size=2>debug: Score set 0 chosen. <BR>debug: running in taint mode? no
<BR>debug: using "/usr/share/spamassassin" for default rules dir <BR>debug:
using "/etc/mail/spamassassin" for site rules dir <BR>Failed to create
default user preference file /root/.spamassassin/user_prefs <BR>debug: using
"/root/.spamassassin/user_prefs" for user prefs file <BR>debug: bayes: 17204
tie-ing to DB file R/O /root/.spamassassin/bayes_toks <BR>debug: bayes:
17204 tie-ing to DB file R/O /root/.spamassassin/bayes_seen <BR>debug:
debug: Only 87 spam(s) in Bayes DB < 200 <BR>debug: bayes: 17204
untie-ing <BR>debug: bayes: 17204 untie-ing db_toks <BR>debug: bayes: 17204
untie-ing db_seen <BR>debug: Score set 0 chosen. <BR>debug: Initialising
learner <BR>debug: Initialising learner <BR>debug: lock: 17204 created
<BR>/root/.spamassassin/bayes.lock.neo.datatechie.com.17204 <BR>debug: lock:
17204 trying to get lock on /root/.spamassassin/bayes with 0 <BR>retries
<BR>debug: lock: 17204 link to /root/.spamassassin/bayes.lock: link ok
<BR>debug: bayes: 17204 tie-ing to DB file R/W
/root/.spamassassin/bayes_toks <BR>debug: bayes: 17204 tie-ing to DB file
R/W /root/.spamassassin/bayes_seen <BR>debug: bayes: 17204 untie-ing
<BR>debug: bayes: 17204 untie-ing db_toks <BR>debug: bayes: 17204 untie-ing
db_seen <BR>debug: bayes: files locked, now unlocking lock <BR>debug:
unlock: 17204 unlink /root/.spamassassin/bayes.lock <BR>debug: bayes: 17204
untie-ing <BR></FONT><---snip---><BR><BR>Does anything look wrong? I
am shocked to find that only 87 messages have been recorded so far, but
that's what the output states.<BR><BR>Thanks for the
help.<BR><BR>Dene<BR><BR>At 11:38 AM 5/6/2003 -0400, you wrote:<BR>
<BLOCKQUOTE class=cite cite="" type="cite"><FONT face=arial color=#0000ff
size=2>I think you need at least 200 spam and 200 ham. Try running
spamassassin with the -D switch for debug output and see what it says about
Bayes. You can also run the check_bayes_db command to see how many
spam and ham messages have been learned, and you can run "sa-learn -D
--rebuild" to see whether it reports that there is not enough spam
or ham. These may give you some clues to your
questions.</FONT><BR> <BR><FONT face=arial color=#0000ff
size=2>Jason</FONT>
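<BR><BR><FONT face=arial color=#0000ff size=2>The check_bayes_db output can be checked against the 200/200 figure mechanically. This minimal Python sketch parses the nspam and nham lines; treating the third column as the count is an inference from the sample output Eric posted further down this thread, so verify it against your own output.</FONT>
<PRE>
```python
# The 200/200 threshold is the figure quoted in this thread.
MIN_COUNT = 200


def learned_counts(check_bayes_db_output):
    """Extract nspam and nham from check_bayes_db output.

    Assumes lines shaped like "0.000 0 16 0 non-token data: nspam",
    where the third whitespace-separated field holds the value.
    """
    counts = {}
    for line in check_bayes_db_output.splitlines():
        fields = line.split()
        if "non-token data:" in line and fields[-1] in ("nspam", "nham"):
            counts[fields[-1]] = int(fields[2])
    return counts


def bayes_ready(counts, minimum=MIN_COUNT):
    """True only when both spam and ham counts reach the minimum."""
    return all(counts.get(key, 0) >= minimum for key in ("nspam", "nham"))


if __name__ == "__main__":
    # Two lines from the check_bayes_db output Eric posted in this thread.
    sample = """\
0.000 0 16 0 non-token data: nspam
0.000 0 1233 0 non-token data: nham
"""
    counts = learned_counts(sample)
    print(counts, "ready" if bayes_ready(counts) else "not enough yet")
```
</PRE>
<FONT face=arial color=#0000ff size=2>With Eric's numbers (16 spam, 1233 ham), the ham side is well past 200 but the spam side is not, so Bayes would still be skipped.</FONT>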
<DL><FONT face=tahoma size=2>
<DD>-----Original Message-----
<DD>From: Dene Ulmschneider [<A href="mailto:dene@DATATECHIE.COM"
eudora="autourl">mailto:dene@DATATECHIE.COM</A>]
<DD>Sent: Tuesday, May 06, 2003 10:53 AM
<DD>To: MAILSCANNER@JISCMAIL.AC.UK
<DD>Subject: Re: [MAILSCANNER] when is Bayes scoring
used?<BR><BR></FONT>
<DD>Hey Julian et al.,<BR><BR>
<DD>Regarding all of the messages I have read saying that Bayes will not
start working until the magic number of 200 messages is reached: I am
certain that I have processed more than 200 messages, and yet I still see
no "Bayes" entries in the headers.<BR><BR>
<DD>I have checked the files in /root/.spamassassin and found the
following:<BR><BR>
<DD><TABLE cellSpacing=0 cellPadding=2 border=0>
<TR><TD><B>Filename</B></TD><TD><B>Size</B></TD><TD><B>Date modified</B></TD></TR>
<TR><TD>auto-whitelist</TD><TD>644.0 KB</TD><TD>today</TD></TR>
<TR><TD>auto-whitelist.db</TD><TD>12.0 KB</TD><TD>3.28.03</TD></TR>
<TR><TD>bayes_msgcount</TD><TD>3.2 KB</TD><TD>today</TD></TR>
<TR><TD>bayes_seen</TD><TD>1.3 MB</TD><TD>today</TD></TR>
<TR><TD>bayes_seen.db</TD><TD>4.0 KB</TD><TD>3.28.03</TD></TR>
<TR><TD>bayes_toks</TD><TD>2.6 MB</TD><TD>today</TD></TR>
<TR><TD>bayes_toks.db</TD><TD>12.0 KB</TD><TD>3.28.03</TD></TR>
</TABLE><BR>
<DD>While I was checking these files, I saw a new file called
auto-whitelist.lock being created and then deleted, because the
system started processing mail at that moment.<BR><BR>
<DD>My questions are:
<DD>1. According to previous statements about the size of bayes_msgcount,
have I only correctly processed 3 or 4 emails?
<DD>2. Why are all of the .db files from a month and a half ago?
<DD>3. Why are there still no headers containing anything regarding
Bayes?
<DD>Am I missing something? I have had MailScanner running for about two
months now and am certain that I have processed enough email.<BR><BR>
<DD>Any help is appreciated.<BR><BR>
<DD>Thank You<BR><BR>
<DD>Dene Ulmschneider
<DD>Data Techie Inc.
<DD>-------------------------------------------------------------------------
<DD>office: 718.738.8859
<DD>email: dene@datatechie.com
<DD>pager mail: denenow@datatechie.com
<DD>website: <A
href="http://www.datatechie.com/"
eudora="autourl">www.datatechie.com</A>
<DD>-------------------------------------------------------------------------
<DD>"Life is too short...-...you should have dessert
first"<BR><BR>
<DD>At 02:29 PM 5/6/2003 +0100, you wrote:
<BLOCKQUOTE class=cite cite="" type="cite">
<DD>At 14:18 06/05/2003, you wrote:
<BLOCKQUOTE class=cite cite="" type="cite">
<DD>Well, I have just set up MailScanner 4.20-3 and I have some problems
with Bayes "scoring".<BR><BR>
<DD>I have the Bayes database working, as it is modified each time I
receive a mail, but when I get spam I never see the BAYES_DB tag in the
scoring of the spam.
<DD>Is there a minimum size for the Bayes database before it is used for
scoring?</DD></BLOCKQUOTE>
<DD>It won't start using the results of the Bayes data until 200 messages have
been scanned. The bayes_msgcount file will tell you how many it has scanned
(file size == number of messages).<BR><BR>
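<DD>Julian's rule of thumb (file size == number of messages) can be applied directly. This minimal Python sketch reads the size of bayes_msgcount and converts a file-manager size such as "3.2 kb" into a count; it assumes the rule holds for your SpamAssassin version, and whether a listing means 1024 or 1000 bytes per KB is also an assumption. Applied to Dene's listing above, a 3.2 KB bayes_msgcount would mean roughly 3,200 scanned messages, not 3 or 4. Note, though, that the check_bayes_db and sa-learn -D output elsewhere in this thread report learned spam and ham (nspam, nham) separately from the scan count, and the "Only 87 spam(s) in Bayes DB &lt; 200" message suggests the 200 threshold applies to those learned counts.
<PRE>
```python
import os


def messages_scanned(msgcount_path):
    """Estimate messages scanned from bayes_msgcount, using the rule of thumb
    quoted in this thread: one byte per scanned message."""
    return os.path.getsize(msgcount_path)


def kb_listing_to_messages(size_kb, bytes_per_kb=1024):
    """Convert a file-manager size such as "3.2 kb" to an approximate count.

    Whether the listing uses 1024 or 1000 bytes per KB is an assumption.
    """
    return round(size_kb * bytes_per_kb)


if __name__ == "__main__":
    # Dene's listing shows bayes_msgcount at 3.2 kb.
    print(kb_listing_to_messages(3.2))        # 3277
    print(kb_listing_to_messages(3.2, 1000))  # 3200
```
</PRE>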
<BLOCKQUOTE class=cite cite="" type="cite">
<DD>Thanks in advance for any help<BR><BR>
<DD>P.S.
<DD>The command <TT>check_bayes_db -db /var/spool/spamassassin/bayes | head -8</TT> gives:
<DD><PRE>
0.000  0      0  0    non-token data: db format = on-the-fly probs, expiry, scan-counting
0.000  0     16  0    non-token data: nspam
0.000  0   1233  0    non-token data: nham
0.000  0  51394  0    non-token data: ntokens
0.000  0      0  0    non-token data: oldest age
0.000  0   1382  0    non-token data: current scan-count
0.000  0      0  0    non-token data: last expiry scan-count
0.027  0      8  801  english
</PRE><BR>
<DD>--
<DD>Eric Doutreleau
<DD>I.N.T | Tel: +33 (0) 160764687
<DD>9 rue Charles Fourier | Fax: +33 (0) 160764321
<DD>91011 Evry France | email: Eric.Doutreleau@int-evry.fr</DD></BLOCKQUOTE>
<DD>--
<DD>Julian Field
<DD><A href="http://www.mailscanner.info/"
eudora="autourl">www.MailScanner.info</A>
<DD>MailScanner thanks transtec Computers for their support
</DD></BLOCKQUOTE></DD></DL></BLOCKQUOTE></BLOCKQUOTE>
</BLOCKQUOTE></BODY></HTML>