Bayes scoring working wrong

Randal, Phil prandal at HEREFORDSHIRE.GOV.UK
Thu Dec 18 10:47:02 GMT 2003


I've found that the BigEvil list, popcorn, and other rules from
http://www.merchantsoverseas.com/wwwroot/gorilla/sa_rules.htm really make a
difference.

And the detokenising rules from http://www.wot.no-ip.com/cgi-bin/detoken.pl
help too.

In the last two weeks, using the standard SA rules plus the rules from the
above pages and a few custom rules, everything scoring over 11.0 here has
been genuine spam.

Our custom rules are below.

NOTE: The WRONGCURRENCY Rule should only be used where the local currency is
not dollars.

header   UNBELIEVABLE   Subject =~ /unbelie?vable/i
describe UNBELIEVABLE   I cannot believe it is not spam
score    UNBELIEVABLE   4.0

header   FREE_SHIPPING  Subject =~ /free shipping/i
describe FREE_SHIPPING  Free shipping
score    FREE_SHIPPING  3.0

header   NATWEST_SCAM   Subject =~ /NatWest Bank Security Update/i
describe NATWEST_SCAM   I want your NatWest Password now!
score    NATWEST_SCAM   3.0

header   WRONGCURRENCY  Subject =~ /\$|dollar/i
describe WRONGCURRENCY  Wrong currency - dollar in subject
score    WRONGCURRENCY  4.0

header   FROM_PANEL     From =~ /sales\@panelwarehouse\.com/i
describe FROM_PANEL     PanelWarehouse spam
score    FROM_PANEL     4.0

header   TOO_GOOD       Subject =~ /too good to miss/i
describe TOO_GOOD       Too good to not be spam
score    TOO_GOOD       4.0

# This next rule provides some protection against the latest IE
# vulnerability
uri      IE_VULN        /https?:\/\/.*%([01][0-9a-f]|7f).*@/i
score    IE_VULN        100.0
describe IE_VULN        Internet Explorer vulnerability
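
As a rough illustration of what the IE_VULN pattern catches, here is a
minimal Python sketch. It assumes the vulnerability meant is the one where
an encoded control character (such as %01) placed before an '@' hides the
real host name. Python's re module stands in for SpamAssassin's Perl regex
engine, and the sample URLs are made up.

```python
import re

# Pattern copied from the IE_VULN rule above.
IE_VULN = re.compile(r"https?:\/\/.*%([01][0-9a-f]|7f).*@", re.IGNORECASE)

# Hypothetical URLs, for illustration only.
samples = [
    "http://www.example.com%01@192.0.2.1/login.htm",  # control char before '@'
    "http://www.example.com/page%20name.htm",         # ordinary escaped space
    "http://user@www.example.com/",                   # '@' but no control char
]

for url in samples:
    # Prints True only when the URL would trigger the rule's pattern.
    print(bool(IE_VULN.search(url)), url)
```

Only the first URL matches. Plain %20 escapes and ordinary user@host URLs
are left alone.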

header  RCVD_IN_BNBL    eval:check_rbl('bl', 'bl.blueshore.net.')
describe RCVD_IN_BNBL   Listed by BNBL
tflags  RCVD_IN_BNBL    net
score   RCVD_IN_BNBL    2.0

header  TO_MEET         Subject =~ /wants? to meet you/i
describe TO_MEET        A spammer wants to meet you
score   TO_MEET         3.5

header  FREE_LASER      Subject =~ /Free Laser Eye Consultation/i
describe FREE_LASER     You can see this is spam
score   FREE_LASER      3.5
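
To get a feel for how the Subject rules above add up, here is a minimal
Python sketch. It only mimics the regex matching and the score addition for
a handful of the rules. It is not how SpamAssassin itself works internally,
and the subject lines are made up. Python's re module stands in for Perl
here, which should be safe for patterns this simple.

```python
import re

# Subject patterns and scores copied from the rules above.
SUBJECT_RULES = {
    "UNBELIEVABLE": (r"unbelie?vable", 4.0),
    "FREE_SHIPPING": (r"free shipping", 3.0),
    "WRONGCURRENCY": (r"\$|dollar", 4.0),
    "TOO_GOOD": (r"too good to miss", 4.0),
    "TO_MEET": (r"wants? to meet you", 3.5),
}


def score_subject(subject):
    """Return (total score, names of rules that hit) for one Subject line."""
    hits = [
        name
        for name, (pattern, _score) in SUBJECT_RULES.items()
        if re.search(pattern, subject, re.IGNORECASE)
    ]
    total = sum(SUBJECT_RULES[name][1] for name in hits)
    return total, hits


# Hypothetical subject lines, for illustration only.
print(score_subject("Unbelivable offer, free shipping, save $50"))
print(score_subject("Minutes of Tuesday's meeting"))
```

The first subject hits UNBELIEVABLE, FREE_SHIPPING and WRONGCURRENCY for a
total of 11.0 from these rules alone. The second hits nothing.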

Cheers,

Phil
---------------------------------------------
Phil Randal
Network Engineer
Herefordshire Council
Hereford, UK

> -----Original Message-----
> From: MailScanner mailing list [mailto:MAILSCANNER at JISCMAIL.AC.UK]On
> Behalf Of James Gray
> Sent: 18 December 2003 02:50
> To: MAILSCANNER at JISCMAIL.AC.UK
> Subject: Re: Bayes scoring working wrong
>
>
> On Thu, 18 Dec 2003 01:24 pm, Nortex PageGuys wrote:
> > Hello ,
> >
> > > Return-Path: <ppfwts at hongkong.com>
> > > Received: by mailadmin.nortex.net (CommuniGate Pro PIPE 4.1.5)
> > >   with PIPE id 32188222; Wed, 17 Dec 2003 19:41:13 -0600
> > > Received: from [12.158.34.221] (HELO psmtp.com)
> > >   by mailadmin.nortex.net (CommuniGate Pro SMTP 4.1.5)
> > >   with SMTP id 32188182 for **REMOVED FOR SECURITY**;
> > >   Wed, 17 Dec 2003 19:41:00 -0600
> > > Received: from source ([218.235.30.213]) by
> > >   exprod5mx69.postini.com ([12.158.34.245]) with SMTP;
> > >   Wed, 17 Dec 2003 17:40:57 PST
> > > Received: from [218.235.30.213] by rx357.comIP with HTTP;
> > >         Thu, 18 Dec 2003 05:36:45 +0500
> > > From: "Riddle Eric" <ppfwts at hongkong.com>
> > > To: **REMOVED FOR SECURITY**
> > > Subject: Re: %RND_UC_CHAR[2-8], the promised kurolesov
>
> **snipped**
>
> > I have fed the Bayes engine in SpamAssassin lots of spam and ham
> > emails over the past 8 months, and this is the result of all my
> > work: it's marking valid spam as not spam.
> >
> > Any suggestions on what I can do to improve SpamAssassin's scoring on
> > this?
> > Best regards,
> >  Nortex                          mailto:pages at ntin.net
>
> Not a lot we can do about Bayes poisoning :( except create a couple of
> customised rules:
>
> header FROM_SPAMMER01   From =~ /\@.*hongkong\.com/i
> describe FROM_SPAMMER01 Known spam source 'hongkong.com'
> score FROM_SPAMMER01    3.5
>
> body BODY_BAN_CD        /Banned CD/i
> describe BODY_BAN_CD    Mentions 'banned CD'
> score BODY_BAN_CD       2.0
>
> Now unless my math is out: 3.5 + 2.0 - 0.399 = 5.101
>
> Bingo :)  Of course you'll need to keep creating rules for each forged
> address :-/  Not exactly ideal but it works.  Plus with perl's powerful
> regex, you'll find after a while that most spammers are creatures of
> habit and you can create some pretty powerful filters based on common
> themes, like domains that only have numbers (eg, 12345.biz in perl
> would be /[0-9]{5}\.biz/i etc) or common obfuscating patterns (eg,
> /([a-zA-Z](?:\_|\ |-|\.)){3,}/i would catch any sequence of 3 or more
> letters separated by either "_", " ", "-" or ".")
>
> As I said in a post recently our mail filter at work has a combined
> false +ve/-ve rate of less than 0.01%.  We also have two guys (myself
> and the other Unix guy) managing the filters.  We currently have
> created 1523 custom rules to tailor the filters to our specific needs.
> This number will only ever increase :(  However, if you're interested,
> I'm happy to share them (in a modified form - without all our internal
> business-specific stuff.  There's too many internal addresses/lists to
> just "put them up on an ftp somewhere").  Contact me off-list if anyone
> is interested :)
>
> --James
> __________________________________
> A random quote of nothing:
>
> BOFH excuse #295:
>
> The Token fell out of the ring. Call us when you find it.
>
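
The two patterns James describes in the quoted message (the all-digit .biz
domain and the separator-obfuscated word) can be tried out with a short
Python sketch. This is only an illustration: Python's re module stands in
for Perl, the obfuscation pattern is written with "\ " for the escaped
space, and the test strings are made up.

```python
import re

# Patterns taken from the quoted message.
digits_biz = re.compile(r"[0-9]{5}\.biz", re.IGNORECASE)
obfuscated = re.compile(r"([a-zA-Z](?:\_|\ |-|\.)){3,}", re.IGNORECASE)

# Hypothetical strings, for illustration only.
for text in ["http://12345.biz/", "http://shop.biz/"]:
    print("digits_biz", bool(digits_biz.search(text)), text)

for text in ["V_I_A_G_R_A", "v.i.a.g.r.a", "hello world", "U.S.A."]:
    print("obfuscated", bool(obfuscated.search(text)), text)
```

Note that "U.S.A." also matches the obfuscation pattern, so a rule built on
it probably wants a modest score rather than an outright block.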


