Recognising and flagging 'foreign' language e-mails in MCP

Julian Field MailScanner at ecs.soton.ac.uk
Thu May 24 14:28:16 IST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



Quentin Campbell wrote:
> I use a small group of SpamAssassin rules in MCP to add a header to any
> message that looks like it is in Russian. The added header will look
> something like:
>
> X-Newcastle-MailScanner-MCPCheck: MCP-Clean, MCP-Checker (score=0.01,
> 	required 1, MCP_RUSSIAN 0.01)
>
> This allows anyone who expects to receive messages in Russian to set up
> a personal mail filter rule to look for the string "MCP_RUSSIAN" in the
> message headers and move such messages into a "Russian" folder.
>
> The reason they need to do this is that most messages in Russian that
> are received here are tagged as spam. Most are spam! 
>   
I have done this too, but I didn't see any need to do it in MCP, as MCP 
adds a significant processing overhead. A normal SA rule with a small 
score will do fine; just put your initials or something similar at the 
start of the rule name so it is easy to pick out in the headers.
> If this "MCP_RUSSIAN" rule precedes the personal mail filter rules that
> recipients use for dealing with tagged spam then they don't miss
> (possibly) important messages in Russian.
>
> I want to do similar tagging in MCP for messages in German, Chinese and
> Japanese and perhaps other languages if the need arises.
>
> I am probably re-inventing the wheel here. Does anyone have, or know of,
> sets of SpamAssassin rules that reliably recognise e-mail in various
> foreign languages, the three languages above in particular? The SA
> ok_languages and ok_locales options don't quite work in the way that is
> needed to achieve the above.
>   
I ran into the same problem. I just look for the windows-1251 character 
set string appearing in the Subject: line. The other languages you are 
interested in have similar character set strings that can be matched in 
the same way.
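
The check described above can be sketched in Python, for example to 
see which character set strings turn up in your own mail before 
writing rules. This is only an illustration, not the actual rule in 
use: the function name, the tag names and the list of character sets 
are all assumptions. It relies on the fact that a non-ASCII Subject: 
line is normally sent as a MIME encoded-word of the form 
=?charset?B?...?= or =?charset?Q?...?=, so the character set name is 
visible in the raw header. German is left out on purpose, because 
Western European character sets such as iso-8859-1 are shared by many 
languages and do not identify German on their own.

```python
import re

# Matches a MIME encoded-word and captures its character set label,
# e.g. "windows-1251" in "=?windows-1251?B?...?=".
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?[bq]\?[^?\s]*\?=", re.IGNORECASE)

# Hypothetical mapping from character set label to a language tag.
# The labels are common ones, but the list is illustrative only.
CHARSET_TAGS = {
    "windows-1251": "RUSSIAN",
    "koi8-r": "RUSSIAN",
    "gb2312": "CHINESE",
    "big5": "CHINESE",
    "iso-2022-jp": "JAPANESE",
    "shift_jis": "JAPANESE",
    "euc-jp": "JAPANESE",
}


def subject_language_tags(raw_subject):
    """Return sorted tags for the character sets named in a raw Subject."""
    tags = set()
    for charset in ENCODED_WORD.findall(raw_subject):
        # Drop an optional "*language" suffix and normalise the case.
        label = charset.lower().split("*")[0]
        tag = CHARSET_TAGS.get(label)
        if tag:
            tags.add(tag)
    return sorted(tags)


if __name__ == "__main__":
    print(subject_language_tags("=?windows-1251?B?0J/RgNC40LLQtdGC?="))
```

The function must be given the raw, undecoded header. Once the 
Subject: line has been decoded to text, the character set string is 
no longer there to match.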


Jules

- -- 
Julian Field MEng CITP
www.MailScanner.info
Buy the MailScanner book at www.MailScanner.info/store

MailScanner customisation, or any advanced system administration help?
Contact me at Jules at Jules.FM

PGP footprint: EE81 D763 3DB0 0BFD E1DC 7222 11F6 5947 1415 B654
For all your IT requirements visit www.transtec.co.uk



-----BEGIN PGP SIGNATURE-----
Version: PGP Desktop 9.6.1 (Build 1012)
Charset: ISO-8859-1

wj8DBQFGVZMfEfZZRxQVtlQRAhv9AKC/UiOWHkgRqIlMR6m1kleByXkvtgCgxRSY
gjaYKwfRPIV2HF33aLBbl4k=
=4DG+
-----END PGP SIGNATURE-----
