Bayes Question
Matt Kettler
mkettler at EVI-INC.COM
Thu May 13 18:53:00 IST 2004
At 09:04 AM 5/13/2004, Max Kipness wrote:
>However, I have about 1000 spam messages that a user and I have been
>collecting over the last month in an mbox file (via IMAP). I had read in
>a post on this group that you can just feed the spam without the ham
>successfully. Is this true and should I attempt this?
Well, you can, but you need to feed SA at least *some* ham. With the default
settings it needs at least 200 hams (and 200 spams) learned before bayes will
be used. Hand training is the fastest way to get there.
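For what it's worth, here is a minimal Python sketch of feeding both mbox files to sa-learn. The file names (spam.mbox, ham.mbox) and the helper names are made up for illustration; `--spam`, `--ham` and `--mbox` are real sa-learn options. Run as-is it only prints the command lines, so nothing touches the bayes db until you call `train()` yourself.

```python
import subprocess


def sa_learn_command(kind, mbox_path):
    """Return the sa-learn argument list for training one mbox file.

    kind is "spam" or "ham"; mbox_path is a hypothetical file name.
    """
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    # --mbox tells sa-learn the file holds many messages in mbox format.
    return ["sa-learn", "--" + kind, "--mbox", mbox_path]


def train(kind, mbox_path):
    """Actually run sa-learn (assumes it is installed and on PATH)."""
    subprocess.run(sa_learn_command(kind, mbox_path), check=True)


if __name__ == "__main__":
    # Print the commands instead of running them.
    for kind, path in (("spam", "spam.mbox"), ("ham", "ham.mbox")):
        print(" ".join(sa_learn_command(kind, path)))
```

After training, `sa-learn --dump magic` reports the nspam and nham counts, so you can see whether you have passed the 200 mark.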
In a truly perfect world your training would have a ham:spam ratio that's
the same as your inbound mail ratio. However, bayes is pretty tolerant of
considerable deviation from that. I'd still recommend doing at least _some_
ham training to have a good, well-balanced bayes db.
My automated hamtrap/spamtrap system gives me a training ratio of about
1:15 right now (94% spam). My real world ratio is about 1:2 (67% spam).
That's quite a difference, but it's not causing any problems.
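The percentages are just spam / (ham + spam). A small Python sketch of that arithmetic, with a made-up function name, in case you want to compare your own training ratio against your inbound ratio: 1:15 works out to 15/16 = 93.75% spam, and 1:2 to 2/3, or roughly 66.7%.

```python
def spam_fraction(ham, spam):
    """Fraction of mail that is spam, given the two sides of a ham:spam ratio."""
    total = ham + spam
    if total <= 0:
        raise ValueError("need at least one message")
    return spam / total


if __name__ == "__main__":
    # The two ratios quoted above: training corpus vs. real inbound mail.
    for label, ham, spam in (("training", 1, 15), ("real world", 1, 2)):
        print(f"{label}: {ham}:{spam} -> {spam_fraction(ham, spam):.1%} spam")
```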
>Or do I just grab around 1000 messages from my inbox and copy them to
>another IMAP attached mbox file?
I'd recommend grabbing somewhere between 200 and 1000 nonspams if you can.
Just make sure you don't grab any mail from spam-discussion lists like
spamassassin-users. Those posts often quote spam, so training them as ham
teaches bayes the wrong thing.
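If you'd rather not pick the ham by hand, something along these lines could do the copying. This is only a sketch using Python's standard mailbox module; the list names, header choices, function names and file paths are all my assumptions, so adjust them to the lists you actually subscribe to and eyeball the result before training on it.

```python
import mailbox

# Hypothetical list names to skip; adjust to whatever lists you read.
SPAM_DISCUSSION_LISTS = ("spamassassin-users", "mailscanner")


def is_spam_discussion(msg, list_names=SPAM_DISCUSSION_LISTS):
    """True if any list-ish header mentions one of the named lists."""
    headers = ("List-Id", "Sender", "To", "Cc")
    haystack = " ".join(str(msg.get(h, "")) for h in headers).lower()
    return any(name in haystack for name in list_names)


def pick_ham(messages, limit=1000):
    """Return up to `limit` messages that are not spam-discussion list mail."""
    picked = []
    for msg in messages:
        if is_spam_discussion(msg):
            continue
        picked.append(msg)
        if len(picked) >= limit:
            break
    return picked


def copy_ham(inbox_path, ham_path, limit=1000):
    """Copy the picked messages from one mbox file into another."""
    inbox = mailbox.mbox(inbox_path)
    ham = mailbox.mbox(ham_path)  # created if it does not exist yet
    try:
        for msg in pick_ham(inbox, limit):
            ham.add(msg)
        ham.flush()
    finally:
        ham.close()
        inbox.close()
```

Note that this does not check whether a message is actually ham. It only skips the list traffic, so any spam sitting in the inbox still has to be weeded out first.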