how to detect koi8-r characters

Mark Nienberg lists at tippingmar.com
Mon Sep 15 22:55:01 IST 2008


Kevin Howard wrote:
> We're receiving a lot of spam comprising Cyrillic characters in the 
> subject line, example Subject: =?koi8-r?B?8sXLzGHNwSDXIOnObcXSzsVtxSA=?=
>  
> and a message body which is 100% Cyrillic, some messages are plain 
> text and some HTML.
>  
> The plain-text messages are using:
>  
> MIME-Version: 1.0
> Content-Type: text/plain;
>  charset="koi8-r"
> Content-Transfer-Encoding: 8bit
>
>  
> SpamAssassin doesn't seem to be able to detect these reliably despite 
> us training Bayes on these messages and utilising language filters. So 
> we're trying to use MCP to detect them but have had no success 
> whatsoever to date.
>  
> I have tried making a rule to detect " ?koi8 " in the subject line but 
> MailScanner only seems to look at visible characters.
>  
> Any ideas?  My preference is to stop them with MCP if possible.
>
I use a SpamAssassin rule like this:

header   LOCAL_CYRILLIC           Subject:raw =~ /windows\-1251/i
describe LOCAL_CYRILLIC           Cyrillic charset in Subject
score    LOCAL_CYRILLIC           3

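In your case, you probably need to replace windows-1251 with koi8-r.  The 
"raw" part is important: without it, SpamAssassin decodes the MIME 
encoded-word before matching, so the charset name never appears in the 
text the rule sees.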
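
For koi8-r, an untested sketch along the same lines might look like the 
following.  The rule names and scores are placeholders of mine, and the 
second rule assumes the charset is declared in the top-level Content-Type 
header, as in the plain-text sample quoted above; it would not see a 
charset declared only inside a MIME part.

# Hypothetical rule names and scores; adjust to your own setup.
# Match the raw (undecoded) Subject, where the encoded-word prefix is visible.
header   LOCAL_KOI8R_SUBJ         Subject:raw =~ /=\?koi8-r\?/i
describe LOCAL_KOI8R_SUBJ         Subject is MIME-encoded as koi8-r
score    LOCAL_KOI8R_SUBJ         3

# Assumes the charset appears in the top-level Content-Type header.
header   LOCAL_KOI8R_CT           Content-Type =~ /charset\s*=\s*"?koi8-r/i
describe LOCAL_KOI8R_CT           Message declares charset koi8-r
score    LOCAL_KOI8R_CT           3

Running "spamassassin --lint" after adding the rules should catch any 
syntax mistakes before they go live.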

Mark Nienberg


