Stock image spam blocking

John Rudd jrudd at ucsc.edu
Tue Apr 25 21:36:51 IST 2006


On Apr 25, 2006, at 11:35, Matt Kettler wrote:

> Derek Chee wrote:
>> Hi,
>>
>> We've been getting bombarded recently with a lot of the embedded GIF
>> image OTCBB stock, pump and dump spam.  The one with the random
>> subject, from and sender lines.
>>
>> Has anybody had any luck creating SpamAssassin rules that would help
>> boost the score?  Or better yet a good RBL that blocks them?  For
>> RBLs, we only run the Spamhaus lists.  Being a university, we can't
>> run a very aggressive RBL list as it would cause too many complaints
>> about blocking legitimate email.
>>
>
> the SARE stock ruleset helps here. As do hash-based tests like Razor
> and DCC.

As has been pointed out, hash-based tests won't catch all image spam:
spammers make small changes to each image that the human eye doesn't
notice but that produce a different hash, so hash-based systems miss
them.  As I mentioned last week, someone on the MIMEDefang list is
working on an OCR Perl module that these images can be fed to in order
to extract their text.  The suggestion on that list is to attach the
extracted text to the message, so that when the message goes to
SpamAssassin, Bayes picks the text up (for both training and scoring).

It might be a good idea to cross-pollinate into MailScanner.


> Finally, many seem to be sent from DUL listed hosts.

I recently took the stuff Steve Freegard posted to this list (under the 
topic about Greylisting) and converted it to code for use with 
MIMEDefang.  It's doing a great job of catching all sorts of dynamic 
and dial-up type host names.

Here's what he suggested, and my comments:


> 1) Check the PTR record (no lookup required; Sendmail already does
>    this).
>  - TEMPFAIL the connection if no record exists.
>
> 2) Check the A record for the hostname returned by the reverse lookup.
>  - (Optional), TEMPFAIL the connection if no record exists.

I do both of these.  _AND_ if the A record does exist, but doesn't 
match the relay's IP address, I give a permanent failure instead of a 
tempfail.
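
The decision logic for steps 1 and 2 can be sketched roughly as
follows.  This is a minimal illustration, not the code from the filter
linked later in this message: the function name rdns_decision is
hypothetical, and the PTR name and A records are passed in as arguments
(in a real filter they would come from Sendmail or a DNS lookup).

```perl
use strict;
use warnings;

# Decide what to do with a connecting relay based on its reverse DNS.
#   $relay_ip  : IP address of the connecting relay
#   $ptr_name  : hostname from the PTR lookup, or undef if none exists
#   $a_records : array ref of addresses that hostname resolves to
# Returns 'TEMPFAIL', 'REJECT', or 'CONTINUE'.
sub rdns_decision {
    my ($relay_ip, $ptr_name, $a_records) = @_;

    # Step 1: no PTR record at all -> temporary failure.
    return 'TEMPFAIL' unless defined $ptr_name && length $ptr_name;

    # Step 2: the PTR name has no A record -> temporary failure.
    return 'TEMPFAIL' unless $a_records && @$a_records;

    # The A record exists, but none of its addresses is the relay's IP:
    # permanent failure instead of a tempfail.
    return 'REJECT' unless grep { $_ eq $relay_ip } @$a_records;

    return 'CONTINUE';
}
```

Returning a tempfail for missing records gives a legitimate sender
whose DNS is briefly broken a chance to retry, while a PTR name that
resolves elsewhere is treated as a hard mismatch.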


> 3) Run a series of regexp tests against the hostname and REJECT the
> message if any match:
>  - Hex encoded IP address appears within the hostname
>  - all IP octets appear within the hostname (fwd/rev)
>  - IP address without the .'s appears within the hostname (fwd/rev)
>  - Last two octets appears within the hostname (fwd/rev)
>  - Last octet appears within the hostname
>  - Hostname contains any of the following (.adsl. .dsl. .dip. .ddns.)


The regexes I use here are:

          elsif ( ($hostname =~ /(catv|cable|dsl|adsl|dhcp|ddns)/ ) ||
                  ($hostname =~ /(dial-?up|dynamic|static|$e|$j)/ ) ||
                  ($hostname =~ /($a.?0*$b|$b.?0*$c|$c.?0*$d)/    ) ||  # decimal pairs, forward
                  ($hostname =~ /($d.?0*$c|$c.?0*$b|$b.?0*$a)/    ) ||  # decimal pairs, reverse
                  ($hostname =~ /($f.?0*$g|$g.?0*$h|$h.?0*$i)/    ) ||  # hex pairs, forward
                  ($hostname =~ /($i.?0*$h|$h.?0*$g|$g.?0*$f)/    ) ) { # hex pairs, reverse

($a-$d are the decimal octets, $e is the entire IP address as a single
decimal value, $f-$i are the hex octets, and $j is the entire IP
address as a single hex value.  $j is technically redundant, since it
won't be distinct from $f$g$h$i.  Everything, including the hostname,
has been converted to lower case.)
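
For illustration, here is one way those ten values could be derived
from a dotted-quad address.  This is a sketch under assumptions, not
the filter's actual code: the helper name ip_parts is hypothetical, and
the hex octets are assumed to be two lowercase digits each, which is
what makes the whole-address hex value equal to the four hex octets
run together.

```perl
use strict;
use warnings;

# Build the strings used in the hostname patterns from an IPv4 address.
# Returns a hash ref with keys a..d (decimal octets), e (whole address
# as one decimal number), f..i (hex octets) and j (whole address in
# hex), or nothing if the address is not a valid dotted quad.
sub ip_parts {
    my ($ip) = @_;
    my @oct = split /\./, $ip;
    return unless @oct == 4;
    return if grep { !/\A\d+\z/ || $_ > 255 } @oct;

    my %p;
    @p{qw(a b c d)} = map { $_ + 0 } @oct;                # "010" -> 10
    $p{e} = ($oct[0] << 24) + ($oct[1] << 16) + ($oct[2] << 8) + $oct[3];
    @p{qw(f g h i)} = map { sprintf '%02x', $_ } @oct;    # assumed width
    $p{j} = join '', @p{qw(f g h i)};
    return \%p;
}

my $p = ip_parts('192.0.2.10');
print "$p->{e} $p->{j}\n";    # prints "3221225994 c000020a"
```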

So, I eliminated dip ... I was uncomfortable with it being too generic,
and all of the hosts I saw that had it were caught by other parts of
this (or by the greet_pause, or by having given me my own host name as
their HELO string).  I dropped the "last octet appears within the
hostname" test as well: it was flagging hosts that did not look like
dial-up, dynamic, or end-user addresses (server-XX.someplace.com, for
example).

So, my version of his #3 is:

- all hostname checks for IP addrs are done in both decimal and hex
- if any pair of adjacent octets of the IP address appears in the
hostname, in forward or reverse order, separated by at most one
character, and allowing for leading zero padding (which I saw in some
such hostnames)
- if the entire hex IP address, or the total decimal value of the IP
address, appears
- if the hostname contains catv, cable, dsl, dhcp, ddns, dialup,
dial-up, dynamic, or static (I don't require the leading and trailing
.'s).  (And, yeah, in my regexp I put both dsl and adsl, even though
dsl is sufficient; I did it just for mental completeness ... besides,
they line up all pretty that way ... or mostly anyway.)
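
Put together, the rules in that list could look something like the
following standalone function.  It is a sketch of the technique rather
than the code from the filter: the name looks_dynamic is hypothetical,
the two-digit lowercase hex form of each octet is an assumption, and
the sample hostnames use documentation addresses.

```perl
use strict;
use warnings;

# Return 1 if $hostname looks like a dynamic or end-user name for $ip.
sub looks_dynamic {
    my ($hostname, $ip) = @_;
    $hostname = lc $hostname;

    # Keyword test; no leading or trailing dots are required.
    return 1
        if $hostname =~ /catv|cable|dsl|dhcp|ddns|dial-?up|dynamic|static/;

    return 0
        unless $ip =~ /\A(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\z/;
    my @dec = map { $_ + 0 } ($1, $2, $3, $4);
    my @hex = map { sprintf '%02x', $_ } @dec;    # assumed hex form

    # Whole address as one decimal number, or as one hex string.
    my $whole_dec = ($dec[0] << 24) + ($dec[1] << 16) + ($dec[2] << 8) + $dec[3];
    my $whole_hex = join '', @hex;
    return 1 if index($hostname, $whole_dec) >= 0;
    return 1 if index($hostname, $whole_hex) >= 0;

    # Adjacent octet pairs, forward and reverse, in decimal and hex,
    # with at most one separator character and optional zero padding.
    for my $set (\@dec, \@hex) {
        for my $n (0 .. 2) {
            my ($x, $y) = @{$set}[$n, $n + 1];
            return 1 if $hostname =~ /$x.?0*$y/ || $hostname =~ /$y.?0*$x/;
        }
    }
    return 0;
}

for my $h (qw(host-192-0-2-10.example.net server-10.example.com)) {
    printf "%s => %d\n", $h, looks_dynamic($h, '192.0.2.10');
}
```

With 192.0.2.10 as the relay address, the first sample name is flagged
(it contains the octet pair 192-0) while the second is not, since it
only carries the last octet.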

I do all of that in filter_sender, which runs after SMTP-AUTH.  That
lets me put these checks behind one that basically says "don't worry
about the DNS things if they did an SMTP-AUTH, or they come from one of
my own IP addresses".
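
A rough sketch of that gating check follows.  The function names, the
way the authenticated user is passed in, and the list of local networks
are all assumptions made for illustration; how you learn whether the
client authenticated depends on your own setup.

```perl
use strict;
use warnings;

# Convert a dotted-quad IPv4 address to an integer.
sub ip_to_int {
    my ($ip) = @_;
    my @o = split /\./, $ip;
    return ($o[0] << 24) + ($o[1] << 16) + ($o[2] << 8) + $o[3];
}

# True if $ip falls inside the CIDR block $cidr (e.g. "192.0.2.0/24").
sub in_cidr {
    my ($ip, $cidr) = @_;
    my ($net, $bits) = split m{/}, $cidr;
    my $mask = $bits == 0
        ? 0
        : (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF;
    return (ip_to_int($ip) & $mask) == (ip_to_int($net) & $mask);
}

# Decide whether the DNS and hostname checks should run at all.
#   $auth_user : authenticated SMTP user name, or undef/empty if none
#   $my_nets   : array ref of CIDR blocks treated as "my own" addresses
sub needs_dns_checks {
    my ($ip, $auth_user, $my_nets) = @_;
    return 0 if defined $auth_user && length $auth_user;   # SMTP-AUTH
    return 0 if grep { in_cidr($ip, $_) } @$my_nets;       # local client
    return 1;
}
```

Running this test first means roaming users on dynamic addresses can
still send mail, as long as they authenticate.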

The error I give for this case instructs the sender (assuming it 
bounces back to a human) to use their ISP's email server instead of 
connecting directly to ours.


Since I started these checks, the main effects on my mail flow have
been a large drop in the number of hosts caught by SBL and XBL (they're
now caught by these checks, which happen before the SBL and XBL checks
because I use delay_checks), and a drop in the number of messages I
have to catch via SpamAssassin (which improves my system load by quite
a bit).  And, looking at the greet_pause results, it looks like 90% of
those would be caught by the above rules as well.  So, I may start to
relax the greet_pause a little.


If you want to see the full code, it's at 
http://www.rudd.cc/mimedefang-filter

Note: I use this at home, not yet at work, and I no longer use
MailScanner at home.  You could use them together; if you did, I would
modify filter_begin to remove the virus checking, and modify filter_end
to remove the SpamAssassin stuff.


