Is this really how bayes+autolearn works?

Tue Dec 12 22:45:11 GMT 2006

Furnish, Trever G spake the following on 12/12/2006 1:59 PM:
> My Bayes db seems to consistently start assigning BAYES_00 (-2.6) to
> messages that are simple plain text messages you'd think would be easily
> caught.  The messages are all seemingly almost identical.  They're
> coming from bots, and only a small percentage of them are caught by
> other SA rules.
>  
> Does that mean that they're being auto-learned as Ham and then
> cancelling out my attempts to teach Bayes later that this is spam?  Out
> of the many thousands that flood in, I'm only able to retrain bayes
> using a small percentage (because only a small subset of my users drag
> spam into the retraining system consistently).
>  
> The reason they're not caught by other SA rules is because they're
> coming from bots.  Many of the samples I've looked at also wouldn't have
> been caught by John Rudd's Botnet plugin.
> 
> So Bayes is getting lots of messages that SA doesn't detect as spam, and
> only a few similar messages that I train it to treat as spam.  Is this a
> plausible explanation for why Bayes would consistently be misclassifying
> this mail?
>  
> So far the floods start in the afternoon and the subject strings are
> consistent enough that I'm able to correct the damage by:
>     - removing my bayes database and retraining from archived spam
>       corpus (slow)
>     - creating custom rules to, for example, filter out
>       "Subject =~ /Good Morning/" (dangerous)
>  
I also see a lot of spam coming from bots, but I consistently catch most of
it. Are you using some good add-on rules?
Do you have any samples that some of us could run through our systems to see
what we get?
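
For anyone following along, here is a rough Python sketch of the mechanism the
question describes. It is not SpamAssassin's actual code. The function names
are hypothetical, the threshold defaults (0.1 and 12.0) are assumed to match
the stock auto-learn settings, and the token probability is a deliberately
naive ratio rather than the combining method SA really uses. The message
counts in the example are made up for illustration.

```python
def autolearn_decision(score, ham_threshold=0.1, spam_threshold=12.0):
    """Decide how a message would be auto-learned from its non-Bayes score.

    Thresholds are assumed defaults; a site may configure different values.
    Returns "ham", "spam", or None (not learned at all).
    """
    if score < ham_threshold:
        return "ham"
    if score > spam_threshold:
        return "spam"
    return None


def token_spam_probability(spam_hits, ham_hits, total_spam, total_ham):
    """Naive per-token spam probability from training counts (illustrative)."""
    spam_rate = spam_hits / total_spam
    ham_rate = ham_hits / total_ham
    return spam_rate / (spam_rate + ham_rate)


if __name__ == "__main__":
    # A bot message that trips no other rules scores near zero,
    # so it falls under the ham threshold and is learned as ham.
    print(autolearn_decision(0.0))

    # Made-up counts: a token seen in 1000 auto-learned "ham" bot messages
    # but only 50 manually trained spam messages.
    p = token_spam_probability(spam_hits=50, ham_hits=1000,
                               total_spam=5050, total_ham=6000)
    print(f"token spam probability: {p:.3f}")
```

Under those assumptions the token ends up looking strongly ham-like, which
would be consistent with the flood scoring BAYES_00 despite some manual
retraining.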

-- 

MailScanner is like deodorant...
You hope everybody uses it, and
you notice quickly if they don't!!!!