Working well (SA customization tips)

Mon Nov 18 17:02:12 GMT 2002

Ok, I'll admit up front that I do a bit more "rule tinkering" than most end
users, and I occasionally make small contributions to the SA development
effort, so I probably customize SA more than most users will.

That said, I don't think that writing a few simple custom rules is beyond
the scope of what a "normal" user might want to do.

Before you go off tuning your ruleset, first make sure you're running a
reasonable version of SA. If you're getting an unreasonable number of false
positives/negatives, and are running something older than 2.42, upgrade.
2.40 and 2.41 had absolutely horrid scores (due to a combination of a few
bad rules, a minor issue in the GA, and some mis-placed emails in the
corpus). Older versions aren't likely to be very effective against current
spam.

Other "general" tweaks you can apply are to raise the threshold, and to
(lightly) lower the scores of rules which are false-positive prone on
your email. You can put these in your spam.assassin.prefs.conf, and the
SpamAssassin man page mentioned below should be sufficient to show you the
format for these options. Note that any score line in your prefs file will
supersede anything in 50_scores.cf, so to adjust a score, just add a new
score line to your prefs file like this one (these are adjustments from my
config, with scores changed a little from what I really use):

# How many hits before a mail is considered spam.
required_hits           5.2
# X_OSIRU_SPAM_SRC causes a lot of collateral damage, trim its score a little
score X_OSIRU_SPAM_SRC               1.5
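
As a rough illustration of what those two settings do, here is a toy Python
model of the scoring arithmetic: the scores of all rules that hit are added
up, and the mail is tagged once the total reaches the threshold. This is
only a sketch, not SA's actual code. HYPOTHETICAL_RULE_A and its score are
made up for the example, and 5.0 is assumed to be the stock threshold.

```python
# Toy model of threshold-based scoring (not SpamAssassin's real code).
# X_OSIRU_SPAM_SRC and 5.2 come from the prefs example above;
# HYPOTHETICAL_RULE_A and its score are invented for illustration.
SCORES = {
    "X_OSIRU_SPAM_SRC": 1.5,
    "HYPOTHETICAL_RULE_A": 3.6,
}


def total_score(hits, scores=SCORES):
    """Add up the scores of every rule that hit on a message."""
    return sum(scores[name] for name in hits)


def is_spam(hits, required, scores=SCORES):
    """Assumption: a message is tagged when its total reaches the threshold."""
    return total_score(hits, scores) >= required


hits = ["X_OSIRU_SPAM_SRC", "HYPOTHETICAL_RULE_A"]
print(is_spam(hits, required=5.0))  # tagged at the assumed stock threshold
print(is_spam(hits, required=5.2))  # not tagged once the threshold is raised
```

The same message lands on either side of the line depending on the
threshold, which is why small adjustments are usually enough.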

SA is tuned for a more or less "general purpose" variety of email.
Depending on what industry you work in, you might get more "spam-alike"
marketing than most. Fortunately you also know there are certain "catch
phrases" for your industry that aren't likely to appear in spam mail.

I tend to have a small handful of "correction" rules that decrease the
score of emails pertaining to the industry my company works in. This makes
it a bit less likely that newsletters and marketing information that people
here have requested will be tagged as spam.

Note: you should not need many of these rules. In general, I'd think hard
before making more than 10 of them.

What follows is a quickie guide to simple SA rule writing, targeted towards
MailScanner users
---------------------------

The first thing you'll want to do is skim through the man page for
Mail::SpamAssassin::Conf. Then go to your /usr/share/spamassassin and look
at some of the rules in 20_head_tests.cf and 20_body_tests.cf. (Note: it is
strongly advised that you NOT edit the files in /usr/share/spamassassin.)

Since you're running MailScanner the best place to put your rules is in
MailScanner's spam.assassin.prefs.conf, but I'd recommend writing and
testing them using the command-line tools while editing
/root/.spamassassin/user_prefs.

The simplest rules look for a basic text string and assign a score, like
this one (this is one of mine):

body BUGTRAQ_MENTIONED          /\bbugtraq\b/i
describe BUGTRAQ_MENTIONED      mentions bugtraq in body
score BUGTRAQ_MENTIONED         -1.0

The describe line is optional, and not very relevant to MailScanner setups.
I put it in there for my own reference.

The body rule itself is just a regex string match which is started and
terminated with forward slash characters (/).

The \b's are used inside the pattern to indicate a word boundary: the
position between a word character (letter, digit or underscore) and
anything else, such as a space, tab, newline, punctuation, or the start or
end of the text. They are generally a good idea at the beginning and end of
most rules (unless you want the rule to match inside longer words). A
pattern ending with "not" will match not, note, notice, etc, but one ending
with "not\b" will only match where the word ends at not.

The i after the closing slash makes the entire match case insensitive. For
some rules you might want to leave it off; for others you'll want it on.
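
If you want to see what \b and the case-insensitive flag do before touching
your prefs file, a quick sketch in Python works. SA itself uses Perl
regexes, but for simple patterns like these Python's re module behaves the
same way. Note this only exercises the pattern; it doesn't model how SA
prepares the message body before matching.

```python
import re

# Python stand-in for the rule pattern /\bbugtraq\b/i;
# re.IGNORECASE plays the role of the trailing i.
rule = re.compile(r"\bbugtraq\b", re.IGNORECASE)


def matches(pattern, text):
    """Return True if the compiled pattern is found anywhere in text."""
    return pattern.search(text) is not None


print(matches(rule, "Posted to BugTraq yesterday"))  # True: case is ignored
print(matches(rule, "see the bugtraq-archive"))      # True: "-" ends the word
print(matches(rule, "bugtraqs"))                     # False: no boundary after q

# The "not" example from the text, with and without a trailing \b.
loose = re.compile(r"not")
strict = re.compile(r"not\b")
print(matches(loose, "notice"))         # True
print(matches(strict, "notice"))        # False
print(matches(strict, "do not reply"))  # True
```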

The regexes can be a lot more complicated, but most things you'll want to
do yourself should be simple enough with rules like this one.

After you write a rule, you need to test it. Every time you add a rule you
risk a typo causing SpamAssassin to skip large chunks of your rules. If you
followed my advice about trying them on root's user_prefs first, test the
rules using SpamAssassin's command line:

spamassassin --lint

This will make SA complain about any rule syntax errors. Note that if
MailScanner calls SpamAssassin and there's a typo, it will SILENTLY skip
rules until it can start parsing the config file again.

You can also test your rules against emails that are in raw text format
(note: this must be a complete email as it would be sent over SMTP: headers
first, then a blank line, then the body, as the RFCs require):

spamassassin -tD <mytest.mail
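
If you don't have a suitable raw message lying around, one way to make one
is with Python's standard email library, which takes care of the headers
and the blank line for you. This is only a sketch: the addresses, subject
and body text below are placeholders invented for the example, and the
function names are my own.

```python
from email.message import EmailMessage


def build_test_message():
    """Return a complete raw message: headers, blank line, then body."""
    msg = EmailMessage()
    # All header values below are made-up placeholders.
    msg["From"] = "sender@example.com"
    msg["To"] = "recipient@example.com"
    msg["Subject"] = "Test message for custom rules"
    msg.set_content("This advisory was first posted to bugtraq.\n")
    return msg.as_string()


def write_test_message(path):
    """Write the raw message to a file that can be fed to spamassassin."""
    with open(path, "w") as fh:
        fh.write(build_test_message())


print(build_test_message())
```

Calling write_test_message("mytest.mail") produces a file you can feed to
the command above.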

Once you've got rules that don't produce errors and suit your needs, put
them into your spam.assassin.prefs.conf.

At 08:12 AM 11/18/2002 -0500, you wrote:
>One question I still have is, how do you handle a situation where
>messages are marked as spam but really aren't? Let's assume it's not
>because of DNS Blacklist, but because of content. I can't give an
>example since it hasn't happened to me yet, so this is hypothetically
>speaking. I assume if it's content that SpamAssassin is what is marking
>it as spam.
>
>Are the config files (content filters) for SpamAssassin configurable?
>Where would this be done at? If it's not SpamAssassin, what would it be?
>If there's a FAQ or Doc I should be looking at let me know.

Note: Emails authored under this address do not reflect the opinions of my
employer unless otherwise stated. Facts contained are also prone to human
error. If either of these statements are not humanly obvious to you, I
suggest careful thought before leaping to any other conclusions. :)