Bayes Question

Matt Kettler mkettler at EVI-INC.COM
Thu May 13 18:53:00 IST 2004


At 09:04 AM 5/13/2004, Max Kipness wrote:
>However, I have about 1000 spam messages that a user and I have been
>collecting over the last month in an mbox file (via IMAP). I had read in
>a post on this group that you can just feed the spam without the ham
>successfully. Is this true and should I attempt this?

Well, you can, but you need to feed SA at least *some* ham. It needs at
least 200 hams (as well as 200 spams) before bayes will be used. Hand
training is the fastest way to get there.
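
A minimal sketch of checking that threshold. It assumes the counts come from
text shaped like the output of sa-learn --dump magic, where the relevant lines
end in "non-token data: nham" or "non-token data: nspam" and the count sits in
the third column. The helper names and the sample text are illustrative only,
not part of SpamAssassin.

```python
MIN_HAM = 200  # ham threshold quoted in the post above


def parse_counts(dump_text):
    """Pull the nham/nspam counters out of dump-style text (assumed layout)."""
    counts = {}
    for line in dump_text.splitlines():
        fields = line.split()
        # Assumed layout: prob, spam, ham-or-count, atime, "non-token data: <name>"
        if len(fields) >= 5 and "non-token" in line and fields[-1] in ("nham", "nspam"):
            counts[fields[-1]] = int(fields[2])
    return counts


def enough_ham(counts, min_ham=MIN_HAM):
    """True once the learned ham count reaches the threshold."""
    return counts.get("nham", 0) >= min_ham


# Made-up sample: 1000 spams learned, only 150 hams so far.
SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0       1000          0  non-token data: nspam
0.000          0        150          0  non-token data: nham
"""

counts = parse_counts(SAMPLE)
print(counts, enough_ham(counts))
```

With the sample above the check fails, because 150 hams is still short of 200
no matter how much spam has been learned.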

In a truly perfect world your training would have a ham:spam ratio that's
the same as your inbound mail ratio. However, bayes is pretty tolerant of
considerable deviation from that. I'd still recommend doing at least _some_
ham training so that you end up with a good, well-balanced bayes db.

My automated hamtrap/spamtrap system gives me a training ratio of about
1:15 right now (about 94% spam). My real world ratio is about 1:2 (about
67% spam).

That's quite a difference, but it's not causing any problems.
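
For reference, the arithmetic behind turning a ham:spam ratio into a spam
percentage can be sketched as follows. The function name is just an
illustration; the two ratios are the ones quoted above.

```python
def spam_percent(ham, spam):
    """Spam share of all mail, in percent, for a ham:spam ratio."""
    return 100.0 * spam / (ham + spam)


training = spam_percent(1, 15)   # hamtrap/spamtrap training ratio 1:15
real_world = spam_percent(1, 2)  # inbound mail ratio 1:2

print(f"training:   {training:.2f}% spam")    # training:   93.75% spam
print(f"real world: {real_world:.2f}% spam")  # real world: 66.67% spam
```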

>Or do I just grab around 1000 messages from my inbox and copy them to
>another IMAP attached mbox file?

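I'd recommend grabbing somewhere between 200 and 1000 nonspams if you can.
Just make sure you don't grab mail from any spam-discussion lists such as
spamassassin-users. Those messages quote spam, so training on them as ham
would teach bayes that spammy tokens are fine.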
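
One way to build such a ham file, sketched with Python's standard mailbox
module. The marker list, the header choice and the file names are assumptions
for illustration; adjust them to whatever lists you are actually subscribed to.

```python
import mailbox

# Assumed markers for spam-discussion lists; extend as needed.
EXCLUDED_LIST_MARKERS = ("spamassassin-users",)


def is_spam_discussion(msg, markers=EXCLUDED_LIST_MARKERS):
    """True if list/recipient headers mention one of the excluded lists."""
    headers = " ".join(
        str(msg.get(name, "")) for name in ("List-Id", "List-Post", "To", "Cc")
    ).lower()
    return any(marker in headers for marker in markers)


def copy_ham(src_path, dst_path, limit=1000):
    """Copy up to `limit` messages from one mbox to another, skipping list mail
    that matches the excluded markers. Returns the number of messages copied."""
    src = mailbox.mbox(src_path)
    dst = mailbox.mbox(dst_path)  # created if it does not exist yet
    copied = 0
    try:
        for msg in src:
            if copied >= limit:
                break
            if is_spam_discussion(msg):
                continue
            dst.add(msg)
            copied += 1
        dst.flush()
    finally:
        src.close()
        dst.close()
    return copied
```

Something like copy_ham("inbox.mbox", "ham.mbox") would produce a file that
can then be learned with sa-learn --ham --mbox ham.mbox. Still look through
the result by hand first, since anything spammy left in it gets learned as ham.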

