Working well (SA customization tips)

Mon Nov 18 17:36:05 GMT 2002

Matt,

Thank you very much for the response. That's exactly what I was looking
for, real world examples and experience! (thanks to Mike for his earlier
response too)

I am running 2.43 so I should be up to date. Like I said, on my server
at home it has been 100% successful, but I only have two or three email
accounts that receive any spam. I'm assuming if we use this at our company
we'll come across some messages that get tagged that really aren't spam.

Best regards,
James

On Mon, 2002-11-18 at 12:02, Matt Kettler wrote:
> Ok, I'll admit up front that I do a bit more "rule tinkering" than most end
> users do, and I occasionally make small contributions to the SA development
> effort, so I probably customize SA more than most users will.
>
> That said, I don't think that writing a few simple custom rules is beyond
> the scope of what a "normal" user might want to do.
>
> Before you go off tuning your ruleset, first make sure you're running a
> reasonable version of SA. If you're getting an unreasonable number of false
> positives/negatives, and are running something older than 2.42, upgrade.
> 2.40 and 2.41 had absolutely horrid scores (due to a combination of a few
> bad rules, a minor issue in the GA, and some mis-placed emails in the
> corpus). Older versions aren't likely to be very effective against current
> spam.
>
> Other "general" tweaks you can apply are to increase the threshold, and to
> (lightly) bump down the scores of rules which are false-positive prone on
> your email. You can apply these in your spam.assassin.prefs.conf, and the
> SpamAssassin man page mentioned below should be sufficient to show you the
> format for these options. Note that any score line in your prefs file will
> supersede anything in 50_scores.cf, so to adjust a score, just create a new
> score line in your prefs file like this one (these are adjustments from my
> config, with scores changed a little from what I really use):
>
> # How many hits before a mail is considered spam.
> required_hits           5.2
> # X_OSIRU_SPAM_SRC is high collateral damage, trim score down a little
> score X_OSIRU_SPAM_SRC               1.5
>
>
> SA is tuned for a more or less "general purpose" variety of email.
> Depending on what industry you work in, you might get more "spam-alike"
> marketing than most. Fortunately you also know there are certain "catch
> phrases" for your industry that aren't likely to appear in spam mail.
>
> I tend to have a small handful of "correction" rules that decrease the
> score of emails pertaining to the industry my company works in. This makes
> it a bit less likely that newsletters and marketing information that people
> here have requested will be tagged as spam.
>
> Note: you should not need to make a whole lot of these rules, in general
> I'd think hard before making more than 10 of them.
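
To see why a small negative score can be enough, here is a rough sketch of the arithmetic in Python. It assumes a simplified model: a message's total is the sum of the scores of the rules it hits, and it is tagged once the total reaches required_hits. The names SOME_SPAMMY_RULE and INDUSTRY_TERM and their scores are made up for illustration; 5.2 and 1.5 are the example values from the prefs snippet.

```python
# Simplified model (an assumption, not SpamAssassin's actual code):
# total = sum of the scores of every rule that hit; tagged if total >= threshold.
REQUIRED_HITS = 5.2  # example threshold from the prefs snippet


def total_score(hit_rules, scores):
    """Add up the scores of the rules that matched a message."""
    return sum(scores[name] for name in hit_rules)


def is_tagged(hit_rules, scores, required_hits=REQUIRED_HITS):
    """True if the message would be tagged as spam under this model."""
    return total_score(hit_rules, scores) >= required_hits


scores = {
    "X_OSIRU_SPAM_SRC": 1.5,  # example score from the prefs snippet
    "SOME_SPAMMY_RULE": 4.0,  # hypothetical rule and score
    "INDUSTRY_TERM": -1.0,    # hypothetical "correction" rule
}

newsletter_hits = ["X_OSIRU_SPAM_SRC", "SOME_SPAMMY_RULE"]
corrected_hits = newsletter_hits + ["INDUSTRY_TERM"]

print(total_score(newsletter_hits, scores), is_tagged(newsletter_hits, scores))
print(total_score(corrected_hits, scores), is_tagged(corrected_hits, scores))
```

With these made-up numbers the requested newsletter lands just over the threshold on its own, and a single correction rule pulls it back under.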
>
> What follows is a quickie guide to simple SA rule writing, targeted towards
> MailScanner users
> ---------------------------
>
> The first thing you'll want to do is skim through man
> Mail::SpamAssassin::Conf. Then go to your /usr/share/spamassassin and look
> at some of the rules in 20_head_tests.cf and 20_body_tests.cf. (note: it is
> strongly advised that you NOT edit the files in /usr/share/spamassassin)
>
> Since you're running MailScanner the best place to put your rules is in
> MailScanner's spam.assassin.prefs.conf, but I'd recommend writing and
> testing them using the command-line tools while editing
> /root/.spamassassin/user_prefs.
>
> The simplest rules look for a basic text string and assign a score, like
> this one (this is one of mine):
>
> body BUGTRAQ_MENTIONED          /\bbugtraq\b/i
> describe BUGTRAQ_MENTIONED      mentions bugtraq in body
> score BUGTRAQ_MENTIONED         -1.0
>
> The describe line is optional, and not very relevant to MailScanner setups.
> I put it in there for my own reference.
>
> The body rule itself is just a regex string match which is started and
> terminated with forward slash characters (/).
>
> The \b's are used inside the string to indicate a word break: the position
> between a word character and anything else (a space, tab, newline,
> punctuation, or the start or end of the text). They are generally a good
> idea at the beginning and end of most rules (unless you want it to match
> even if there is no word break). A string match ending with "not" will
> match not, note, notice, etc., but one ending with "not\b" will only match
> not.
>
> The /i at the end makes the entire text match case-insensitive. For some
> rules you might want to leave it off; for others you'll want it on.
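
A quick way to see both effects is to try the patterns outside SA. The sketch below uses Python's re module rather than SA's Perl regex engine (for patterns this simple, \b and case-insensitive matching behave the same way in both); the sample strings are made up.

```python
import re

# Same pattern as the BUGTRAQ_MENTIONED rule, with and without /i.
with_i = re.compile(r"\bbugtraq\b", re.IGNORECASE)
without_i = re.compile(r"\bbugtraq\b")

sample = "Posted to BugTraq yesterday"
print(bool(with_i.search(sample)))     # case differs, still matches
print(bool(without_i.search(sample)))  # no match without the flag

# No word break after "bugtraq" here, so the trailing \b blocks the match.
print(bool(with_i.search("see the bugtraqs archive")))

# "not" versus "not\b"
print(bool(re.search(r"not", "please take notice")))    # matches inside "notice"
print(bool(re.search(r"not\b", "please take notice")))  # no match
print(bool(re.search(r"not\b", "this is not spam")))    # matches the word "not"
```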
>
> The regexes can be a lot more complicated, but most things you'll want to
> do yourself should be simple enough with rules like this one.
>
> After you write a rule, you need to test it. Every time you add a rule you
> risk a typo causing SpamAssassin to skip large chunks of your rules. If you
> followed my advice about trying them on root's user_prefs first, test the
> rules using SpamAssassin's command line:
>
> spamassassin --lint
>
> This will make SA complain about rule syntax. Note that if MailScanner
> calls SpamAssassin and there's a typo it will SILENTLY skip rules until it
> can start parsing the config file again.
>
> You can also test your rules against emails that are in raw text format
> (note: this must be a complete, properly formatted email: headers, then a
> blank line, then the body, as per RFC requirements):
>
> spamassassin -tD <mytest.mail
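
A test message can be typed by hand, but generating one avoids forgetting the blank line between headers and body. This is a minimal sketch using the email package from Python's standard library; the addresses, subject, and body text are placeholders, and the script name in the usage note is hypothetical.

```python
from email.message import EmailMessage

# Placeholder addresses and subject; only the structure matters here.
msg = EmailMessage()
msg["From"] = "sender@example.com"
msg["To"] = "recipient@example.com"
msg["Subject"] = "Test message for custom rules"
msg.set_content("Did anyone see the latest post on bugtraq?\n")

# Serialize to raw text: headers, one blank line, then the body.
raw = msg.as_string()

# The first blank line is the header/body separator.
headers, body = raw.split("\n\n", 1)

print(raw, end="")
```

Saved as, say, make_test_mail.py, the output can be redirected to mytest.mail and fed to the spamassassin command in the original message.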
>
> Once you've got rules that don't error and that suit your needs, put them
> into your spam.assassin.prefs.conf.
>
>
>
>
> At 08:12 AM 11/18/2002 -0500, you wrote:
> >One question I still have is, how do you handle a situation where
> >messages are marked as spam but really aren't? Let's assume it's not
> >because of DNS Blacklist, but because of content. I can't give an
> >example since it hasn't happened to me yet, so this is hypothetically
> >speaking. I assume if it's content that SpamAssassin is what is marking
> >it as spam.
> >
> >Are the config files (content filters) for SpamAssassin configurable?
> >Where would this be done? If it's not SpamAssassin, what would it be?
> >If there's a FAQ or Doc I should be looking at let me know.
>
> Note: Emails authored under this address do not reflect the opinions of my
> employer unless otherwise stated. Facts contained are also prone to human
> error. If either of these statements are not humanly obvious to you, I
> suggest careful thought before leaping to any other conclusions. :)