sa-learn (Slightly OT)

Mon Jan 28 20:35:07 GMT 2008

This is slightly OT, since it's a SpamAssassin-specific question.

Still, I may have good luck here, so I'll post anyway.

I've built up a small collection of spam that's slipped through my
MailScanner, and was planning on using sa-learn to train the Bayes
filter.

The catch?  We use Outlook/Exchange, and when I save the messages off
as text, I don't get much of the header, but if I save them off as
messages, they're in some weird Outlook format.

I tried using sa-learn with the weird Outlook messages, and now I'm
wondering whether that was a good idea.  Out of about 60 messages it
claimed to learn 84 tokens.  Should I have it un-learn those?  Will
the header-anemic text versions be sufficient for learning?  I'm going
to assume for now that SpamAssassin won't read Outlook message format.
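
For what it's worth, here is a rough Python sketch for telling which
of the saved files are in the binary Outlook format.  It assumes the
"message" saves are .msg files, which are OLE2 compound documents that
begin with a fixed 8-byte signature.  The function name is made up for
illustration, and this only identifies the format; it does not convert
anything into something sa-learn can use.

```python
import sys

# Signature that begins every OLE2 compound file, the container
# format assumed here for Outlook .msg saves.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_outlook_msg(data: bytes) -> bool:
    """Return True if the data starts with the OLE2 signature."""
    return data.startswith(OLE2_MAGIC)


if __name__ == "__main__":
    # Hypothetical usage: python check_msg.py saved1.msg saved2.txt
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            head = fh.read(len(OLE2_MAGIC))
        if looks_like_outlook_msg(head):
            kind = "OLE2 container (likely Outlook .msg)"
        else:
            kind = "not OLE2 (possibly plain text)"
        print(f"{path}: {kind}")
```

Anything flagged as an OLE2 container is binary rather than a plain
RFC 822 message, which would fit the low token count above.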

Thanks,
        Kyle