Blocking by character set

Ljósnet ljosnet at gmail.com
Sun May 17 16:17:20 IST 2009


If you use sendmail, you can reject on the Content-Type header with a
regex map. In your .mc file:

LOCAL_CONFIG
dnl #
dnl regex map for character sets (not case-sensitive)
dnl The regex belongs on an indented continuation line of each K line.
dnl Note: koi8 (Cyrillic) and iso-2022-jp (Japanese) are matched too;
dnl drop them from the lists if you want to accept that mail.
KCharsetKorean regex -a@MATCH
  charset=.*(euc-kr|korean|ks.*c|koi8|iso-2022-kr|KS_C_5601-1987)
KCharsetChinese regex -a@MATCH
  charset=.*(big5|Chinese|cn|gb|koi8|iso-2022-jp|EUC-TW)
dnl #
LOCAL_RULESETS
dnl #
##################################################################
#  Local ruleset - Check Content-Type:                           #
##################################################################
dnl Reject based on Content-Type header
dnl (the two halves of each R line must be separated by tabs, not spaces)
HContent-Type:          $>CheckContentType
D{NoKoreanMsg}Korean not spoken here.
D{NoChineseMsg}Chinese not spoken here.
SCheckContentType
R$*             $: $(CharsetKorean $&{currHeader} $)
R@MATCH         $#error $: 550 5.7.0 ${NoKoreanMsg}
R$*             $: $(CharsetChinese $&{currHeader} $)
R@MATCH         $#error $: 550 5.7.0 ${NoChineseMsg}
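
Before deploying, it may be worth checking what the two patterns actually
catch. Below is a rough Python sketch, not part of the sendmail setup. It
assumes Python's re module with IGNORECASE is a close enough stand-in for
the regex map's matching on these patterns. The verdict() helper and the
sample headers are made up for illustration.

```python
import re

# Same patterns as in the K lines above. IGNORECASE stands in for the
# map's case-insensitive matching (an approximation, not sendmail itself).
KOREAN = re.compile(
    r"charset=.*(euc-kr|korean|ks.*c|koi8|iso-2022-kr|KS_C_5601-1987)",
    re.IGNORECASE,
)
CHINESE = re.compile(
    r"charset=.*(big5|Chinese|cn|gb|koi8|iso-2022-jp|EUC-TW)",
    re.IGNORECASE,
)


def verdict(content_type):
    """Return the rejection text the ruleset would use, or None if the header passes."""
    # The Korean map is consulted first, as in SCheckContentType.
    if KOREAN.search(content_type):
        return "Korean not spoken here."
    if CHINESE.search(content_type):
        return "Chinese not spoken here."
    return None


# Hypothetical Content-Type values, invented for this test.
samples = [
    'text/plain; charset="euc-kr"',
    "text/plain; charset=GB2312",
    "text/plain; charset=us-ascii",
    'text/plain; charset=us-ascii; name="gb-report.txt"',
]

for value in samples:
    print(value, "->", verdict(value))
```

The last sample is rejected even though it is plain us-ascii, because
"charset=.*" lets the short alternatives "cn" and "gb" match anywhere in
the rest of the header. If that worries you, anchoring the charset names
directly after "charset=" (allowing for an optional quote) is one way to
tighten the patterns.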

On Mon, May 11, 2009 at 7:15 PM, Brendan Pirie <bpirie at rma.edu> wrote:
> Denis Beauchemin wrote:
>>
>> Paul Lemmons a écrit :
>>>
>>> Is there any way to recognize a particular character set in a message and
>>> block based on it? We are a non-international company and 100% of the email
>>> containing non-English characters is spam. I would like to use that to my
>>> advantage and simply block mail containing (to us) foreign character sets.
>>
>> Paul,
>>
>> Maybe this SA option could do the trick (from man
>> Mail::SpamAssassin::Conf):
>> ok_locales xx [ yy zz ... ] (default: all)
>>   This option is used to specify which locales are considered OK for
>> incoming mail. Mail using the character sets that are allowed by this option
>> will not be marked as possibly being spam in a foreign language.
>>
>>   If you receive lots of spam in foreign languages, and never get any
>> non-spam in these languages, this may help. Note that all ISO-8859-*
>> character sets, and Windows code page character sets, are always permitted
>> by default.
>>
>>   Set this to all to allow all character sets. This is the default.
>>
>>   The rules CHARSET_FARAWAY, CHARSET_FARAWAY_BODY, and
>> CHARSET_FARAWAY_HEADERS are triggered based on how this is set.
>>
>>   Examples:
>>
>>     ok_locales all         (allow all locales)
>>     ok_locales en          (only allow English)
>>     ok_locales en ja zh    (allow English, Japanese, and Chinese)
>>
>>   Note: if there are multiple ok_locales lines, only the last one is used.
>>
>>   Select the locales to allow from the list below:
>>
>> en - Western character sets in general
>> ja - Japanese character sets
>> ko - Korean character sets
>> ru - Cyrillic character sets
>> th - Thai character sets
>> zh - Chinese (both simplified and traditional) character sets
>>
>> normalize_charset ( 0 | 1) (default: 0)
>>   Whether to detect character sets and normalize message content to
>> Unicode. Requires the Encode::Detect module, HTML::Parser version 3.46 or
>> later, and Perl 5.8.5 or later.
>>
>> Denis
>>
>
> Another possible option is the TextCat plugin included with spamassassin.
>
> Brendan

