Stock image spam blocking
John Rudd
jrudd at ucsc.edu
Tue Apr 25 21:36:51 IST 2006
On Apr 25, 2006, at 11:35, Matt Kettler wrote:
> Derek Chee wrote:
>> Hi,
>>
>> We've been getting bombarded recently with a lot of the embedded GIF
>> image OTCBB stock, pump and dump spam. The one with the random
>> subject,
>> from and sender lines.
>>
>> Has anybody had any luck creating SpamAssassin rules that would help
>> boost the score? Or better yet a good RBL that blocks them? For
>> RBLs,
>> we only run the Spamhaus lists. Being a university, we can't run a
>> very
>> aggressive RBL list as it would cause too many complaints about
>> blocking
>> legitimate email.
>>
>
> the SARE stock ruleset helps here. As do hash-based tests like Razor
> and DCC.
As has been pointed out, the hash-based tests aren't going to catch all
image spam, because the spammers are smart enough to make small changes
to images that the human eye doesn't notice, but which do produce
unique hash results (meaning that they aren't caught by hash-based
systems). As I mentioned last week, someone over on the mimedefang
list is working on an OCR Perl module to feed those images to, so
that you can get a bunch of text. The suggestion on the list is to
then attach that text to the message, so that when you feed it to
SpamAssassin, it gets picked up by Bayes (both for training and scoring).
It might be a good thing to cross-pollinate into MailScanner.
> Finally, many seem to be sent from DUL listed hosts.
I recently took the stuff Steve Freegard posted to this list (under the
topic about Greylisting) and converted it to code for use with
MIMEDefang. It's doing a great job of catching all sorts of dynamic
and dial-up type host names.
Here's what he suggested, and my comments:
> 1) Check the PTR record (no lookup required Sendmail already does
> this).
> - TEMPFAIL the connection if no record exists.
>
> 2) Check the A record for the hostname returned by the reverse lookup.
> - (Optional), TEMPFAIL the connection if no record exists.
I do both of these. _AND_ if the A record does exist, but doesn't
match the relay's IP address, I give a permanent failure instead of a
tempfail.
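A minimal sketch of the PTR/A-record decision described above, assuming the
DNS answers have already been looked up and are passed in. The function name
fcrdns_action and its argument layout are hypothetical, chosen only for
illustration; they are not taken from the posted filter.

```perl
use strict;
use warnings;

# Hypothetical helper: decide what to do with a connection given the relay's
# IP address, the PTR name (undef if none) and the A-record addresses found
# for that name. Returns 'TEMPFAIL', 'REJECT' or 'CONTINUE'.
sub fcrdns_action {
    my ($relay_ip, $ptr_name, @a_records) = @_;

    # 1) No PTR record at all: temporary failure.
    return 'TEMPFAIL' unless defined $ptr_name && length $ptr_name;

    # 2) The PTR name has no A record: temporary failure.
    return 'TEMPFAIL' unless @a_records;

    # The A record exists but does not point back at the relay:
    # permanent failure.
    return 'REJECT' unless grep { $_ eq $relay_ip } @a_records;

    return 'CONTINUE';
}

print fcrdns_action('192.0.2.10', 'mail.example.org', '192.0.2.10'), "\n";
```

The addresses and names are documentation placeholders. In a real filter the
three return values would map onto whatever the milter framework expects.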
> 3) Run a series of regexp tests against the hostname and REJECT the
> message if any match:
> - Hex encoded IP address appears within the hostname
> - all IP octets appear within the hostname (fwd/rev)
> - IP address without the .'s appears within the hostname (fwd/rev)
> - Last two octets appears within the hostname (fwd/rev)
> - Last octet appears within the hostname
> - Hostname contains any of the following (.adsl. .dsl. .dip. .ddns.)
The regexes I use here are:
    elsif ( ($hostname =~ /(catv|cable|dsl|adsl|dhcp|ddns)/ ) ||
            ($hostname =~ /(dial-?up|dynamic|static|$e|$j)/ ) ||
            ($hostname =~ /($a.?0*$b|$b.?0*$c|$c.?0*$d)/ ) ||
            ($hostname =~ /($d.?0*$c|$c.?0*$b|$b.?0*$a)/ ) ||
            ($hostname =~ /($f.?0*$g|$g.?0*$h|$h.?0*$i)/ ) ||
            ($hostname =~ /($i.?0*$h|$h.?0*$g|$g.?0*$f)/ ) ) {
($a-$d are the decimal octets, $e is the entire IP address as a single
decimal value, $f-$i are the hex octets, and $j is the entire IP
address as a single hex value ... though $j is technically redundant,
since it won't be distinct from $f$g$h$i. Everything has been converted
to lower case, including the hostname.)
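For illustration, the different forms of the address could be derived
roughly as follows. This is a sketch under assumptions: the helper name
ip_forms and the hash keys are made up here, the per-octet hex values are
left unpadded, and the whole-address hex value is written as eight digits.
The post does not say which padding the real filter uses.

```perl
use strict;
use warnings;

# Hypothetical helper: build the decimal and hex forms of an IPv4 address
# from which hostname patterns can be assembled.
sub ip_forms {
    my ($ip) = @_;
    my @dec = split /\./, $ip;                  # decimal octets
    my @hex = map { sprintf '%x', $_ } @dec;    # lower-case hex octets, unpadded

    # Whole address as a single decimal number and as eight hex digits.
    my $whole_dec = ($dec[0] << 24) + ($dec[1] << 16) + ($dec[2] << 8) + $dec[3];
    my $whole_hex = sprintf '%02x%02x%02x%02x', @dec;

    return {
        dec       => \@dec,
        hex       => \@hex,
        whole_dec => $whole_dec,
        whole_hex => $whole_hex,
    };
}

my $forms = ip_forms('192.0.2.10');
printf "%s %s %s %s\n",
    join('.', @{ $forms->{dec} }), join('.', @{ $forms->{hex} }),
    $forms->{whole_dec}, $forms->{whole_hex};
```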
So, I eliminated dip ... I was uncomfortable with it being too generic,
and all of the hosts I saw that had it were caught by other parts of
this (or by the greet_pause, or by having given me my own host name as
their HELO string). Same with "if the last octet is in the hostname"
-- it was identifying hosts that looked like they were
non-dial-up/dynamic/end-user addresses (server-XX.someplace.com for
example).
So, my version of his #3 is:
- all hostname checks for IP addrs are done in both decimal and hex
- if any pair of adjacent octets from the IP address appears in the
hostname, separated by any 1 character (or not), and including any
leading zero padding (that I saw in some such hostnames), in forward or
reverse order
- if the entire hex IP address, or the total decimal value of the IP
address appears
- if the hostname contains catv cable dsl dhcp ddns dialup dial-up
dynamic or static (I don't require the leading and trailing .'s).
(and, yeah, in my regexp I put both dsl and adsl, even though dsl is
sufficient, I did it just for mental completeness ... besides, they
line up all pretty that way ... or mostly anyway).
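The rules in the list above can be sketched as one self-contained function.
This is an illustrative reading of the description, not the code from the
posted filter: the name looks_dynamic is hypothetical, the hex octets are
assumed to be unpadded, and the whole-address hex form is assumed to be
eight digits.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $hostname looks like a dynamic or
# end-user address for $ip under the rules listed above, 0 otherwise.
sub looks_dynamic {
    my ($hostname, $ip) = @_;
    $hostname = lc $hostname;

    # Keyword rule (no leading or trailing dots required).
    return 1 if $hostname =~ /catv|cable|dsl|dhcp|ddns|dial-?up|dynamic|static/;

    my @dec = split /\./, $ip;
    my @hex = map { sprintf '%x', $_ } @dec;

    # Whole address as one decimal number or as eight hex digits.
    my $whole_dec = ($dec[0] << 24) + ($dec[1] << 16) + ($dec[2] << 8) + $dec[3];
    my $whole_hex = sprintf '%02x%02x%02x%02x', @dec;
    return 1 if index($hostname, $whole_dec) >= 0
             || index($hostname, $whole_hex) >= 0;

    # Adjacent octet pairs, forward and reverse, with at most one separator
    # character and optional zero padding before the second octet.
    for my $set (\@dec, \@hex) {
        for my $i (0 .. 2) {
            my ($x, $y) = @{$set}[$i, $i + 1];
            return 1 if $hostname =~ /\Q$x\E.?0*\Q$y\E/
                     || $hostname =~ /\Q$y\E.?0*\Q$x\E/;
        }
    }
    return 0;
}

for my $name ('10-2-0-192.pool.example.net', 'server-10.someplace.com') {
    printf "%s => %d\n", $name, looks_dynamic($name, '192.0.2.10');
}
```

Note that a hostname carrying only the last octet, such as the
server-10.someplace.com placeholder, is not flagged, which matches the
decision to drop the "last octet" rule.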
I do all of that in filter_sender, so it happens after SMTP-AUTH. That
puts this check after one that basically says "don't worry about the
DNS things if they did an SMTP-AUTH, or they come from one of my own IP
addresses".
The error I give for this case instructs the sender (assuming it
bounces back to a human) to use their ISP's email server instead of
connecting directly to ours.
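The exemption for authenticated senders and local addresses could look
something like the sketch below. All names here (skip_dns_checks, in_cidr,
ip_to_int) and the CIDR notation for "my own IP addresses" are assumptions
made for illustration; how the authenticated user name is obtained depends
on the milter setup and is not shown.

```perl
use strict;
use warnings;

# Convert a dotted-quad IPv4 address to an integer.
sub ip_to_int {
    my @o = split /\./, shift;
    return ($o[0] << 24) | ($o[1] << 16) | ($o[2] << 8) | $o[3];
}

# True if $ip lies inside $cidr (for example '192.0.2.0/24').
sub in_cidr {
    my ($ip, $cidr) = @_;
    my ($net, $bits) = split m{/}, $cidr;
    my $mask = $bits == 0 ? 0 : (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF;
    return (ip_to_int($ip) & $mask) == (ip_to_int($net) & $mask);
}

# Hypothetical helper: returns 1 when the DNS-based checks should be skipped,
# either because the client authenticated or because it is a local address.
sub skip_dns_checks {
    my ($auth_user, $relay_ip, @own_networks) = @_;
    return 1 if defined $auth_user && length $auth_user;
    return 1 if grep { in_cidr($relay_ip, $_) } @own_networks;
    return 0;
}

print skip_dns_checks(undef, '192.0.2.77', '192.0.2.0/24') ? "skip\n" : "check\n";
```

Only when this returns 0 would the PTR, A-record and hostname pattern checks
run, and a failed pattern check would then carry the "use your ISP's mail
server" message.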
Since I started these checks, they have mostly impacted my mail flow in
two ways. They have greatly reduced the number of hosts caught by SBL and
XBL: those hosts are now being caught by these checks, which happen
before the SBL and XBL checks (I use delaychecks). And they have reduced
the number of messages I have to catch via SpamAssassin, which improves
my system load by quite a bit. Looking at the greet_pause results, it
looks like 90% of those would be caught by the above rules as well. So,
I may start to relax the greet_pause a little.
If you want to see the full code, it's at
http://www.rudd.cc/mimedefang-filter
Note: I use this at home, not yet at work, and I no longer use
MailScanner at home. You could use them together ... if you did, I
would modify filter_begin to remove the virus checking, and modify
filter_end to remove the SpamAssassin stuff.