Is this really how bayes+autolearn works?

Martin Hepworth martinh at solidstatelogic.com
Wed Dec 13 14:44:47 GMT 2006


Furnish, Trever G wrote:
>  
> 
>> -----Original Message-----
>> From: mailscanner-bounces at lists.mailscanner.info 
>> [mailto:mailscanner-bounces at lists.mailscanner.info] On Behalf 
>> Of Scott Silva
>> Sent: Tuesday, December 12, 2006 5:45 PM
>> To: mailscanner at lists.mailscanner.info
>> Subject: Re: Is this really how bayes+autolearn works?
> 
>> Furnish, Trever G spake the following on 12/12/2006 1:59 PM:
>>> So Bayes is getting lots of messages that SA doesn't detect as spam,
>>> and only a few similar messages that I train it to treat as spam. Is
>>> this a plausible explanation for why Bayes would consistently be
>>> misclassifying this mail?
>>>
>>> So far the floods start in the afternoon and the subject strings are
>>> consistent enough that I'm able to correct the damage by:
>>>     - removing my bayes database and retraining from archived spam
>>>       corpus (slow)
>>>     - creating custom rules to, for example, filter out
>>>       "Subject =~ /Good Morning/" (dangerous)
>  
>> I also see a lot of spam coming from bots, but I consistently 
>> catch most of it. Are you using some good add-on rules?
>> Do you have any samples that some of us could run through our 
>> systems to see what we get?
> 
> Requested samples are attached.  They're very simple messages -- but
> they're flooding in without being caught and then Bayes starts to assign
> -2.60 to them. :-(
> 
> --
> Trever

Trever

The latest SARE_STOCKS rules should help here.
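To illustrate the skew described in the quoted message (many flood messages autolearned as ham, only a few trained by hand as spam), here is a minimal Python sketch of a simplified per-token estimate: Paul Graham's spam/ham ratio smoothed with Gary Robinson's f(w). It is not SpamAssassin's actual Bayes code, and a real filter combines many token probabilities into one message score. The function name token_spam_prob, the prior strength s=1.0, the prior x=0.5 and all message counts are hypothetical values chosen for illustration.

```python
# Simplified per-token spam probability (Graham ratio + Robinson smoothing).
# Illustrative sketch only; NOT SpamAssassin's implementation.

def token_spam_prob(spam_hits, ham_hits, nspam, nham, s=1.0, x=0.5):
    """Estimate P(spam | token) from training counts.

    spam_hits / ham_hits: trained spam / ham messages containing the token.
    nspam / nham: total trained spam / ham messages.
    s: strength of the prior (assumed); x: prior probability (assumed).
    """
    spam_ratio = spam_hits / nspam if nspam else 0.0
    ham_ratio = ham_hits / nham if nham else 0.0
    if spam_ratio + ham_ratio == 0:
        return x  # unseen token: fall back to the prior
    p = spam_ratio / (spam_ratio + ham_ratio)  # Graham-style ratio
    n = spam_hits + ham_hits                   # how much evidence we have
    return (s * x + n * p) / (s + n)           # Robinson's f(w)

# Hypothetical flood token: autolearned as ham 200 times, hand-trained as spam 10 times
skewed = token_spam_prob(spam_hits=10, ham_hits=200, nspam=1000, nham=1000)
# The same token if the training counts were the other way round
balanced = token_spam_prob(spam_hits=200, ham_hits=10, nspam=1000, nham=1000)
print(f"{skewed:.3f} {balanced:.3f}")  # prints "0.050 0.950"
```

Because the autolearned ham counts dominate, the flood's tokens drift toward the ham end of the scale, which is consistent with Bayes then scoring similar messages as ham.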

-- 
Martin Hepworth
Senior Systems Administrator
Solid State Logic
Tel: +44 (0)1865 842300