Working well (SA customization tips)

Mon Nov 18 17:39:41 GMT 2002

Great info.  One question on the SA versions.  I upgraded to 2.42 (might
have been 2.43) and had huge problems with false negatives.  They
probably tripled.  Have you heard of that happening before?  I run SA
without any RBLs, and without Razor (though I'm thinking about going
down the Razor path soon).

Steve Evans
SDSU Foundation
(619) 594-0653 

-----Original Message-----
From: Matt Kettler [mailto:mkettler at EVI-INC.COM] 
Sent: Monday, November 18, 2002 9:02 AM
To: MAILSCANNER at JISCMAIL.AC.UK
Subject: Re: Working well (SA customization tips)

Ok, I'll admit up front that I do a bit more "rule tinkering" than most
end users do, and I sometimes make small contributions to the SA
development effort, so I probably customize SA more than most users
will.

That said, I don't think that writing a few simple custom rules is
beyond the scope of what a "normal" user might want to do.

Before you go off tuning your ruleset, first make sure you're running a
reasonable version of SA. If you're getting an unreasonable number of
false positives/negatives, and are running something older than 2.42,
upgrade.
2.40 and 2.41 had absolutely horrid scores (due to a combination of a
few bad rules, a minor issue in the GA, and some mis-placed emails in
the corpus). Older versions aren't likely to be very effective against
current spam.

Other "general" tweaks you can apply are to increase the threshold, and
to (lightly) bump down the scores of rules which are false-positive
prone on your email. You can apply these in your
spam.assassin.prefs.conf, and the SpamAssassin man page mentioned below
should be sufficient to show you the format for these options. Note that
any score line in your prefs file will supersede anything in
50_scores.cf, so to adjust a score, just add a new score line to your
prefs file like the ones below (these are adjustments from my config,
with scores changed a little from what I really use):

# How many hits before a mail is considered spam.
required_hits           5.2
#X_OSIRU_SPAM_SRC is high collateral damage, trim score down a little
score X_OSIRU_SPAM_SRC               1.5

SA is tuned for a more or less "general purpose" variety of email.
Depending on what industry you work in, you might get more "spam-alike"
marketing than most. Fortunately you also know there are certain "catch
phrases" for your industry that aren't likely to appear in spam mail.

I tend to have a small handful of "correction" rules that decrease the
score of emails pertaining to the industry my company works in. This
makes it a bit less likely that newsletters and marketing information
that people here have requested will be tagged as spam.

Note: you should not need to make a whole lot of these rules. In
general, I'd think hard before making more than 10 of them.
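
To make the effect of a "correction" rule concrete, here is a rough
sketch of the arithmetic in Python. It assumes a message is tagged when
the sum of its matched rule scores reaches the required_hits threshold.
The rule names and all scores except the 5.2 threshold are hypothetical,
made up purely for illustration.

```python
# Illustrative only: every rule name and score here is hypothetical.
# Only the 5.2 threshold comes from the example config above.
REQUIRED_HITS = 5.2


def total_score(hits):
    """Sum the scores of every rule that matched a message."""
    return sum(hits.values())


def is_spam(hits, required_hits=REQUIRED_HITS):
    """Assume a message is tagged once its total reaches the threshold."""
    return total_score(hits) >= required_hits


# A requested industry newsletter that trips two spam-ish rules.
newsletter = {
    "HYPOTHETICAL_MARKETING_RULE": 3.1,
    "HYPOTHETICAL_HTML_RULE": 2.6,
}
# The same message once a negative-scoring correction rule also matches.
corrected = dict(newsletter, HYPOTHETICAL_INDUSTRY_TERM=-1.0)

print(is_spam(newsletter))  # True  (5.7 against a threshold of 5.2)
print(is_spam(corrected))   # False (4.7 against a threshold of 5.2)
```

A single -1.0 correction is enough to rescue a borderline message
without opening the door to mail that scores well above the threshold,
which is why a small handful of these rules usually suffices.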

What follows is a quickie guide to simple SA rule writing, targeted
towards MailScanner users
---------------------------

The first thing you'll want to do is skim through man
Mail::SpamAssassin::Conf. Then go to your /usr/share/spamassassin and
look at some of the rules in 20_head_tests.cf and 20_body_tests.cf.
(note: it is strongly advised that you NOT edit the files in
/usr/share/spamassassin)

Since you're running MailScanner the best place to put your rules is in
MailScanner's spam.assassin.prefs.conf, but I'd recommend writing and
testing them using the command-line tools while editing
/root/.spamassassin/user_prefs.

The simplest rules look for a basic text string, and assign a score,
like this one (this is one of mine):

body BUGTRAQ_MENTIONED          /\bbugtraq\b/i
describe BUGTRAQ_MENTIONED      mentions bugtraq in body
score BUGTRAQ_MENTIONED         -1.0

The describe line is optional, and not very relevant to MailScanner
setups. I put it in there for my own reference.

The body rule itself is just a regex string match which is started and
terminated with forward slash characters (/).

The \b's are used inside the string to indicate "any kind of word
break", including spaces, tabs, newlines, punctuation, and the start or
end of the text, and are generally a good idea at the beginning and end
of most rules (unless you want the rule to match even if there is no
word break). A string match ending with "not" will match not, note,
notice, etc., but one ending with "not\b" will only match not.

The /i at the end makes the entire text match case insensitive. For some
rules you might want to leave this off; for others you might want it on.
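
If you want to see the word-break and case-insensitivity behaviour
before touching your config, the short Python sketch below shows it.
Note the assumption: Python's re module is standing in for the Perl
regex engine that SpamAssassin actually uses. For simple patterns like
these the two behave the same, but this is only an illustration, not a
substitute for testing the real rule.

```python
import re

# Python's re module stands in for Perl regexes here (an assumption made
# for illustration); \b and case-insensitive matching agree for these cases.
loose = re.compile(r"not")                           # no trailing word break
strict = re.compile(r"\bnot\b")                      # word breaks on both sides
bugtraq = re.compile(r"\bbugtraq\b", re.IGNORECASE)  # like /\bbugtraq\b/i

for text in ["do not reply", "please note this", "take notice"]:
    # Prints the text, whether the loose pattern hits, then the strict one.
    print(text, bool(loose.search(text)), bool(strict.search(text)))

# \b also matches next to punctuation, not just whitespace.
print(bool(bugtraq.search("Posted to BugTraq.")))    # True
# Without a word break after "bugtraq" the rule does not fire.
print(bool(bugtraq.search("bugtraqlist archive")))   # False
```

The loose pattern hits all three sample strings, while the strict one
only hits "do not reply".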

The regexes can be a lot more complicated, but most things you'll want
to do yourself should be simple enough with rules like this one.

After you write a rule, you need to test it. Every time you add a rule
you risk a typo causing SpamAssassin to skip large chunks of your rules.
If you followed my advice about trying them on root's user_prefs first,
test the rules using SpamAssassin's command line:

spamassassin --lint

This will make SA complain about any rule syntax errors. Note that if
MailScanner calls SpamAssassin and there's a typo, it will SILENTLY skip
rules until it can start parsing the config file again.

You can also test your rules against emails that are in raw text format
(note: this must be a complete SMTP formatted email, with headers,
followed by an empty line before the body begins, as per RFC
requirements):

spamassassin -tD <mytest.mail
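
If you don't have a saved message handy, one way to produce a
well-formed test file is with Python's standard email library, as
sketched below. The addresses, subject, and body text are placeholders I
made up, and a real message will carry more headers (Received,
Message-ID, and so on), so treat this as a quick check of your own rule
rather than a realistic sample.

```python
from email.message import EmailMessage

# All addresses and text below are placeholders for illustration.
msg = EmailMessage()
msg["From"] = "sender@example.com"
msg["To"] = "recipient@example.com"
msg["Subject"] = "Rule test"
msg.set_content("This message mentions bugtraq in the body.\n")

# as_bytes() emits the headers, one empty line, then the body.
raw = msg.as_bytes()
with open("mytest.mail", "wb") as fh:
    fh.write(raw)
```

The resulting mytest.mail can then be fed to the spamassassin command
shown above.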

Once you've got rules that don't error and that suit your needs, put
them into your spam.assassin.prefs.conf.

At 08:12 AM 11/18/2002 -0500, you wrote:
>One question I still have is, how do you handle a situation where 
>messages are marked as spam but really aren't? Let's assume it's not 
>because of DNS Blacklist, but because of content. I can't give an 
>example since it hasn't happened to me yet, so this is hypothetically 
>speaking. I assume if it's content that SpamAssassin is what is marking
>it as spam.
>
>Are the config files (content filters) for SpamAssassin configurable? 
>Where would this be done at? If it's not SpamAssassin, what would it 
>be? If there's a FAQ or Doc I should be looking at let me know.

Note: Emails authored under this address do not reflect the opinions of
my employer unless otherwise stated. Facts contained are also prone to
human error. If either of these statements are not humanly obvious to
you, I suggest careful thought before leaping to any other conclusions.
:)