Bayes scoring working wrong
Randal, Phil
prandal at HEREFORDSHIRE.GOV.UK
Thu Dec 18 10:47:02 GMT 2003
I've found that the BigEvil list, popcorn, and other rules from
http://www.merchantsoverseas.com/wwwroot/gorilla/sa_rules.htm really make a
difference.
And the detokenising rules from http://www.wot.no-ip.com/cgi-bin/detoken.pl
help too.
In the last two weeks, using the standard SA rules plus the rules from the
above pages and a few custom rules, everything scoring over 11.0 here has
been genuine spam.
Our custom rules are below.
NOTE: The WRONGCURRENCY Rule should only be used where the local currency is
not dollars.
header UNBELIEVABLE Subject =~ /unbelie?vable/i
describe UNBELIEVABLE I cannot believe it is not spam
score UNBELIEVABLE 4.0
header FREE_SHIPPING Subject =~ /free shipping/i
describe FREE_SHIPPING Free shipping
score FREE_SHIPPING 3.0
header NATWEST_SCAM Subject =~ /NatWest Bank Security Update/i
describe NATWEST_SCAM I want your NatWest Password now!
score NATWEST_SCAM 3.0
header WRONGCURRENCY Subject =~ /\$|dollar/i
describe WRONGCURRENCY Wrong currency - dollar in subject
score WRONGCURRENCY 4.0
header FROM_PANEL From =~ /sales\@panelwarehouse\.com/i
describe FROM_PANEL PanelWarehouse spam
score FROM_PANEL 4.0
header TOO_GOOD Subject =~ /too good to miss/i
describe TOO_GOOD Too good to not be spam
score TOO_GOOD 4.0
# This next rule provides some protection against the latest IE vulnerability
uri IE_VULN /https?:\/\/.*%([01][0-9a-f]|7f).*@/i
score IE_VULN 100.0
describe IE_VULN Internet Explorer vulnerability
header RCVD_IN_BNBL eval:check_rbl('bl', 'bl.blueshore.net.')
describe RCVD_IN_BNBL Listed by BNBL
tflags RCVD_IN_BNBL net
score RCVD_IN_BNBL 2.0
header TO_MEET Subject =~ /wants? to meet you/i
describe TO_MEET A spammer wants to meet you
score TO_MEET 3.5
header FREE_LASER Subject =~ /Free Laser Eye Consultation/i
describe FREE_LASER You can see this is spam
score FREE_LASER 3.5
Cheers,
Phil
---------------------------------------------
Phil Randal
Network Engineer
Herefordshire Council
Hereford, UK
> -----Original Message-----
> From: MailScanner mailing list [mailto:MAILSCANNER at JISCMAIL.AC.UK]On
> Behalf Of James Gray
> Sent: 18 December 2003 02:50
> To: MAILSCANNER at JISCMAIL.AC.UK
> Subject: Re: Bayes scoring working wrong
>
>
> On Thu, 18 Dec 2003 01:24 pm, Nortex PageGuys wrote:
> > Hello ,
> >
> > > Return-Path: <ppfwts at hongkong.com>
> > > Received: by mailadmin.nortex.net (CommuniGate Pro PIPE 4.1.5)
> > > with PIPE id 32188222; Wed, 17 Dec 2003 19:41:13 -0600
> > > Received: from [12.158.34.221] (HELO psmtp.com)
> > > by mailadmin.nortex.net (CommuniGate Pro SMTP 4.1.5)
> > > with SMTP id 32188182 for **REMOVED FOR SECURITY**;
> > > Wed, 17 Dec 2003 19:41:00 -0600
> > > Received: from source ([218.235.30.213]) by
> > > exprod5mx69.postini.com ([12.158.34.245]) with SMTP;
> > > Wed, 17 Dec 2003 17:40:57 PST
> > > Received: from [218.235.30.213] by rx357.comIP with HTTP;
> > > Thu, 18 Dec 2003 05:36:45 +0500
> > > From: "Riddle Eric" <ppfwts at hongkong.com>
> > > To: **REMOVED FOR SECURITY**
> > > Subject: Re: %RND_UC_CHAR[2-8], the promised kurolesov
>
> **snipped**
>
> > I have fed the Bayes engine in SpamAssassin lots of spam and ham
> > emails over the past 8 months, and this is the result of all my
> > work: it's reversing valid spam as not spam.
> >
> > Any suggestions on what I can do to improve SpamAssassin's scoring on
> > this?
> > Best regards,
> > Nortex mailto:pages at ntin.net
>
> Not a lot we can do about Bayes poisoning :( except create a couple of
> customised rules:
>
> header FROM_SPAMMER01 From =~ /\@.*hongkong\.com/i
> describe FROM_SPAMMER01 Known spam source 'hongkong.com'
> score FROM_SPAMMER01 3.5
>
> body BODY_BAN_CD /Banned CD/i
> describe BODY_BAN_CD Mentions 'banned CD'
> score BODY_BAN_CD 2.0
>
> Now unless my math is out: 3.5 + 2.0 - 0.399 = 5.101
>
> Bingo :) Of course you'll need to keep creating rules for each forged
> address :-/ Not exactly ideal but it works. Plus with perl's powerful
> regex, you'll find after a while that most spammers are creatures of
> habit and you can create some pretty powerful filters based on common
> themes, like domains that only have numbers (eg, 12345.biz in perl
> would be /[0-9]{5}\.biz/i etc) or common obfuscating patterns (eg,
> /([a-zA-Z](?:\_|\ |-|\.)){3,}/i would catch any sequence of 3 or more
> letters separated by either "_", " ", "-" or ".")
>
> As I said in a post recently our mail filter at work has a combined
> false +ve/-ve rate of less than 0.01%. We also have two guys (myself
> and the other Unix guy) managing the filters. We currently have
> created 1523 custom rules to tailor the filters to our specific needs.
> This number will only ever increase :( However, if you're interested,
> I'm happy to share them (in a modified form - without all our internal
> business-specific stuff. There are too many internal addresses/lists
> to just "put them up on an ftp somewhere"). Contact me off-list if
> anyone is interested :)
>
> --James
> __________________________________
> A random quote of nothing:
>
> BOFH excuse #295:
>
> The Token fell out of the ring. Call us when you find it.
>
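
For anyone wanting to check the quoted rules and arithmetic before deploying them, the following is a minimal sketch in Python rather than SpamAssassin itself. It assumes Python's `re` behaves like Perl's regex engine for these simple patterns, that the -0.399 in the quoted sum is the score the message already had, and that the spam threshold is 5.0. The function name `score_message` and the sample message are hypothetical.

```python
import re

# The two quoted custom rules as (name, message field, pattern, score).
RULES = [
    ("FROM_SPAMMER01", "from", re.compile(r"@.*hongkong\.com", re.I), 3.5),
    ("BODY_BAN_CD", "body", re.compile(r"Banned CD", re.I), 2.0),
]

BAYES_SCORE = -0.399    # taken from the quoted arithmetic
REQUIRED_SCORE = 5.0    # assumption: threshold left at 5.0


def score_message(msg: dict) -> float:
    """Add the score of every matching rule to the starting score."""
    total = BAYES_SCORE
    for _name, field, pattern, score in RULES:
        if pattern.search(msg.get(field, "")):
            total += score
    return round(total, 3)


# The two generic patterns mentioned in the quoted text.
NUMERIC_BIZ = re.compile(r"[0-9]{5}\.biz", re.I)
OBFUSCATED = re.compile(r"([a-zA-Z](?:_| |-|\.)){3,}", re.I)

# Hypothetical message, not the original spam.
sample = {
    "from": '"Someone" <someone@hongkong.com>',
    "body": "Get the Banned CD today",
}

total = score_message(sample)
print(total, total >= REQUIRED_SCORE)
print(bool(OBFUSCATED.search("v-i-a-g-r-a")), bool(OBFUSCATED.search("a-b-c")))
```

One thing the sketch shows: the obfuscation pattern needs each counted letter to be followed by a separator, so "a-b-c-d" matches but a bare "a-b-c" does not.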