Reliable spam/nospam bayes learner?

Eric Dantan Rzewnicki rzewnickie at RFA.ORG
Fri Jan 30 17:55:58 GMT 2004


Julian's scripts are in this FAQ.

http://www.sng.ecs.soton.ac.uk/mailscanner/serve/cache/98.html

They will only work on bounced/resent messages.

If you can get the users to send you the headers, it's pretty easy to
match up the date and queuefile-id with archived mail if you keep your
archives as queuefiles. I posted a half-baked solution for this a while
back:

http://www.jiscmail.ac.uk/cgi-bin/wa.exe?A2=ind0401&L=mailscanner&P=R65809&I=-1
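
For illustration only (this is not the script from the post linked above),
here is a rough Python sketch of pulling the queue id and date out of the
Received headers a user sends in. It assumes sendmail-style Received lines
with an "id <token>" clause and the timestamp after the last ";". The
function name is made up for the example, and how you then look the id up
depends on how your archive is laid out.

```python
import re
from email import message_from_string
from email.utils import parsedate_to_datetime

# Assumption: the MTA writes an "id <token>" clause in each Received header.
QUEUE_ID_RE = re.compile(r"\bid\s+([A-Za-z0-9-]+)")


def queue_ids_and_dates(header_text):
    """Return (queue_id, datetime-or-None) pairs, one per Received header
    that carries an id, in the order the headers appear."""
    msg = message_from_string(header_text)
    found = []
    for received in msg.get_all("Received", []):
        match = QUEUE_ID_RE.search(received)
        if not match:
            continue
        # The timestamp is whatever follows the last ';' in the header.
        _, _, stamp = received.rpartition(";")
        try:
            when = parsedate_to_datetime(stamp.strip())
        except (TypeError, ValueError):
            when = None  # unparseable or missing date
        found.append((match.group(1), when))
    return found
```

The id and the date together should narrow the search to one queuefile in
that day's archive.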

Others have suggested getting the headers is possible if the users
can forward the mail as an attachment.

I'm working out a script to do what you have asked for. It's tricky
because Outlook changes the headers it includes in the forward, but I
think I can (maybe) get it to work.

I'm now archiving all mail for 7 days in mbox format. My plan is to use
a combination of formail and grep to pull what information I can out of
the users' forwarded spam/notspam messages, i.e. subject, sender and
recipient, then use grepmail to match that against the pristine original
message in the archive mbox and feed it to sa-learn.
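
To make the plan above concrete, here is a minimal Python sketch of the same
idea. It swaps formail/grep/grepmail for the standard library's mailbox
module, so it is not the script I'm actually writing. The function names are
made up for the example. It assumes the forward quotes the original's
"From:", "To:" and "Subject:" lines in its body, which Outlook may or may not
do consistently. The sa-learn call uses only the --spam and --ham switches.

```python
import mailbox
import os
import re
import subprocess
import tempfile

# Assumption: the forward quotes header-like lines, optionally '>'-prefixed.
FIELD_RE = re.compile(r"^\s*>?\s*(From|To|Subject):\s*(.+)$", re.IGNORECASE)
ADDR_RE = re.compile(r"[\w.+-]+@[\w.-]+\w")


def forwarded_fields(body):
    """Collect the first From/To/Subject lines quoted in a forwarded body."""
    fields = {}
    for line in body.splitlines():
        m = FIELD_RE.match(line)
        if m:
            fields.setdefault(m.group(1).lower(), m.group(2).strip())
    return fields


def find_originals(mbox_path, fields):
    """Return archived messages whose headers contain the forwarded values."""
    wanted = {}
    for name, value in fields.items():
        addr = ADDR_RE.search(value) if name in ("from", "to") else None
        # Compare on the bare address when one is present, else the raw text.
        wanted[name] = (addr.group(0) if addr else value).lower()
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return [
            msg for msg in box
            if all(needle in str(msg.get(name, "")).lower()
                   for name, needle in wanted.items())
        ]
    finally:
        box.close()


def learn(msg, is_spam, sa_learn="sa-learn"):
    """Write one archived message to a temp file and hand it to sa-learn."""
    flag = "--spam" if is_spam else "--ham"
    with tempfile.NamedTemporaryFile("wb", suffix=".eml", delete=False) as tmp:
        tmp.write(msg.as_bytes())
    try:
        subprocess.run([sa_learn, flag, tmp.name], check=True)
    finally:
        os.unlink(tmp.name)
```

If more than one archived message matches, it is probably safer to skip the
learn step than to guess which one the user meant.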

Things have been a little crazy here this week what with the weather,
protracted flakiness (several outages including one >7 hours, ugh) on
the part of our ISP, tracking this MyDoom explosion, double checking
everything to make sure it doesn't slip through the cracks, and
responding to all the "I didn't send this. Why are they telling me I
have a virus"-type inquiries from management and users. So, I haven't
gotten very far with it...

I'll see if I can make some more progress on it today. I'll share
whatever I come up with, whenever I come up with it.

-Eric Rz.

On Fri, Jan 30, 2004 at 09:31:21AM +0000, Julian Field wrote:
> I have posted my scripts to do this to this list a few times now. Try
> searching for posts from me which include "notspam".
>
> At 08:10 30/01/2004, you wrote:
> >Hi,
> >
> >maybe I'm missing seeing the link, but I was looking for a script that I
> >can set up so users can forward false positives/negatives to so that
> >they will be learned by SA as spam or ham ... also, as it will be hard
> >enough to teach people to forward correctly, it has to learn from a
> >forwarded, not bounced, mail ... that is, ignore the information added
> >by the mail client and just look at the original mail (as far as
> >information is still available, like headers, etc.)
> >
> >Help appreciated,
> >
> >-garry
>
> --
> Julian Field
> www.MailScanner.info
> MailScanner thanks transtec Computers for their support
>
> PGP footprint: EE81 D763 3DB0 0BFD E1DC 7222 11F6 5947 1415 B654
