Reliable spam/nospam bayes learner?

Eric Dantan Rzewnicki rzewnickie at RFA.ORG
Fri Jan 30 17:55:58 GMT 2004

Julian's scripts are in this FAQ.

They will only work on bounced/resent messages.

If you can get the users to send you the headers, it's pretty easy to
match up the date and queuefile-id with archived mail if you keep your
archives as queuefiles. I posted a half-baked solution for this a while
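
A rough sketch of the header-matching idea, in Python. This is not the solution referred to above. It assumes the MTA writes its queue ID into Received as "... id <QUEUEID> ...", the way sendmail does. It also assumes a hypothetical archive layout: a directory tree, perhaps one subdirectory per day, holding sendmail-style qf<ID>/df<ID> files. The queue ID in the example is made up. The date from the headers could narrow which day's directory to search, but the sketch does not use it.

```python
import email
import os
import re

# Assumption: queue IDs appear in Received headers as "... id <QUEUEID> ...",
# e.g. sendmail's "with ESMTP id i0UHtw1A012345 for <user@example.org>; ...".
_QUEUE_ID = re.compile(r'\bid\s+([A-Za-z0-9]+)')


def queue_ids(raw_headers):
    """Collect every queue ID mentioned in the Received headers."""
    msg = email.message_from_string(raw_headers)
    ids = []
    for received in msg.get_all('Received', []):
        ids.extend(_QUEUE_ID.findall(str(received)))
    return ids


def archived_files(archive_dir, ids):
    """Return archived queue files whose names contain one of the IDs.

    Hypothetical layout: archive_dir (searched recursively) holds
    sendmail-style qf<ID>/df<ID> files.
    """
    hits = []
    for root, _dirs, files in os.walk(archive_dir):
        for name in files:
            if any(qid in name for qid in ids):
                hits.append(os.path.join(root, name))
    return sorted(hits)
```

The regex is deliberately loose. An "id" token elsewhere in a Received line would also be picked up, but that only adds a few extra candidate files to look at.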

Others have suggested that the original headers can be recovered if
users forward the mail as an attachment.
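
When a message is forwarded as an attachment, the original travels as a message/rfc822 part with its headers intact. The Python sketch below pulls those headers back out using only the standard email package. The set of header fields it keeps is an arbitrary choice for illustration.

```python
import email
from email import policy

# Header fields to pull from the attached original (an illustrative choice).
FIELDS = ('From', 'To', 'Subject', 'Date', 'Message-ID')


def embedded_headers(raw_forward):
    """Return a dict of selected headers for each message/rfc822 attachment.

    Forwarding "as attachment" wraps the original message in a
    message/rfc822 part, so its headers arrive intact rather than being
    summarised in the body the way an inline forward does it.
    """
    msg = email.message_from_string(raw_forward, policy=policy.default)
    found = []
    for part in msg.walk():
        if part.get_content_type() != 'message/rfc822':
            continue
        inner = part.get_payload(0)  # the attached original message
        found.append({name: None if inner[name] is None else str(inner[name])
                      for name in FIELDS})
    return found
```

A missing header comes back as None, so the caller can tell "not present" apart from an empty value.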

I'm working out a script to do what you have asked for. It's tricky
because Outlook changes the headers it includes in the forward, but I
think I can (maybe) get it to work.

I'm now archiving all mail for 7 days in mbox format. My plan is to use
a combination of formail and grep to get what information I can out of
the users' forwarded spam/notspam messages, i.e. subject, sender and
recipient. Then use grepmail to match that with the pristine original
message in the archive mbox and feed it to sa-learn.
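
Below is a Python sketch of the same match-and-learn step. It swaps the formail/grep/grepmail pipeline for the standard mailbox module and still calls sa-learn with --spam or --ham. It assumes the forwarded copy's subject differs from the original only by "FW:"/"Fwd:" prefixes. It also assumes the sender and recipient appear somewhere in the archived From and To/Cc headers. Bcc'd spam would be missed, so the recipient check may need loosening. The function names are hypothetical.

```python
import mailbox
import re
import subprocess
import tempfile

# Assumption: forwarded copies only add "FW:"/"Fwd:" prefixes to the subject.
_FWD_PREFIX = re.compile(r'^\s*(?:fwd?\s*:\s*)+', re.IGNORECASE)


def strip_forward_prefix(subject):
    """Drop leading FW:/Fwd: prefixes so the subject matches the original."""
    return _FWD_PREFIX.sub('', subject or '').strip()


def _header(msg, name):
    # str() guards against email.header.Header objects for odd encodings.
    return str(msg.get(name, '')).strip().lower()


def find_originals(mbox_path, subject, sender, recipient):
    """Return raw bytes of archived messages matching the forwarded copy.

    The subject must match exactly (ignoring case and forward prefixes).
    The sender and recipient only need to appear somewhere in From and
    To/Cc, since the forwarded copy may show display names instead.
    """
    want_subject = strip_forward_prefix(subject).lower()
    matches = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for key, msg in box.iteritems():
            if _header(msg, 'Subject') != want_subject:
                continue
            if sender.lower() not in _header(msg, 'From'):
                continue
            to_cc = _header(msg, 'To') + ' ' + _header(msg, 'Cc')
            if recipient.lower() not in to_cc:
                continue
            matches.append(box.get_bytes(key))
    finally:
        box.close()
    return matches


def learn(raw_message, is_spam):
    """Hand one pristine archived message to sa-learn as spam or ham."""
    flag = '--spam' if is_spam else '--ham'
    with tempfile.NamedTemporaryFile(suffix='.eml') as tmp:
        tmp.write(raw_message)
        tmp.flush()
        subprocess.run(['sa-learn', flag, tmp.name], check=True)
```

Usage would be roughly: for each hit in find_originals(archive, subject, sender, recipient), call learn(hit, is_spam=True). If more than one archived message matches, it is probably safer to skip that report than to train on all of them.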

Things have been a little crazy here this week what with the weather,
protracted flakiness (several outages including one >7 hours, ugh) on
the part of our ISP, tracking this Mydoom explosion, double checking
everything to make sure it doesn't slip through the cracks, and
responding to all the "I didn't send this. Why are they telling me I
have a virus"-type inquiries from management and users. So, I haven't
gotten very far with it...

I'll see if I can make some more progress on it today. I'll share
whatever I come up with, whenever I come up with it.

-Eric Rz.

