Innovative use of MCP - how do I detect and flag certain languages?

Alex Neuman van der Hans alex at NKPANAMA.COM
Fri Aug 19 16:22:53 IST 2005



How about using Cyrillic-only characters in rules? I'm sure there are quite a
few letters you could use in a regex that would indicate the message is
in Russian.

Look for things like (hope this gets interpreted correctly) 

ЁЂЃЄЅІЇЈЉЊЋЌЎЏАБГДЖЗИЙКЛПУФЦЧШЩЩЪЫЬЭЮЯб
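
As a rough illustration of the idea (in Python rather than as a SpamAssassin
rule, and only a sketch), the check below counts letters that exist only in
Cyrillic and have no Latin look-alike. The letter subset, the 0.2 threshold,
the cp1251 default and the function names are all assumptions made for this
example, not anything MailScanner or SpamAssassin provides.

```python
import re

# Illustrative subset of Cyrillic capitals with no Latin look-alike (assumption).
CYRILLIC_ONLY = "БГДЖЗИЙЛПУФЦЧШЩЪЫЬЭЮЯ"

# Character class covering both the capitals and their lowercase forms.
PATTERN = re.compile("[" + CYRILLIC_ONLY + CYRILLIC_ONLY.lower() + "]")


def decode_body(raw, charset="cp1251"):
    """Decode raw body bytes; cp1251 (windows-1251) is an assumed default."""
    return raw.decode(charset, errors="replace")


def cyrillic_ratio(text):
    """Return the fraction of alphabetic characters that are Cyrillic-only."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    hits = sum(1 for ch in letters if PATTERN.match(ch))
    return hits / len(letters)


def looks_russian(text, threshold=0.2):
    """Flag text whose Cyrillic-only letter ratio reaches the threshold."""
    return cyrillic_ratio(text) >= threshold


if __name__ == "__main__":
    raw = "Привет, как дела?".encode("cp1251")
    print(looks_russian(decode_body(raw)))
    print(looks_russian("Hello, how are you?"))
```

Using a ratio instead of a single match means one Russian surname in an
otherwise English message should not trip the check. A real rule would also
need to allow for other encodings such as KOI8-R.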



-----Original Message-----
From: MailScanner mailing list [mailto:MAILSCANNER at JISCMAIL.AC.UK] On Behalf
Of Quentin Campbell
Sent: Friday, August 19, 2005 4:16 AM
To: MAILSCANNER at JISCMAIL.AC.UK
Subject: Innovative use of MCP - how do I detect and flag certain languages?

My question on how to detect and flag Russian (and other language) messages
is at the end. The background to the question is as follows:

We are a large UK university covering areas of study/research which include
Medicine, Agriculture, Engineering and Social Sciences. We also have a large
number of overseas students who send/receive e-mail in foreign languages.

This means that almost any word or language that commonly appears in real
spam may also appear in many genuine messages received here.

Some users here make unreasonable demands on the University to detect and
remove centrally _all_ offensive spam e-mail or all, say, Russian language
spam. As an alternative they expect to be able to detect and deal with it in
their personal mail filters. 

We have operational problems doing either of these:

1. For the reasons given above, central detection and deletion is impossible
without falsely tagging a large number of genuine messages.

2. The bulk of our 20,000+ users use mailers that only allow filtering on
text in the message headers, not the message body.

This last problem is a real pain. It would be nice if users could simply
auto-delete any message whose body contains words/languages to which they
object and would not expect to see in genuine mail.

The way I am dealing with this last problem is by using SpamAssassin
rulesets in MCP to generate MCP_NCL_ALERT_* message headers that will
appear in a received message if the message body contains certain
"objectionable" words/phrases notified to me. The "*" in the header text is
replaced by a range of specific "content type tags". All our users'
personal mail filters can detect these MCP_NCL_ALERT_* strings in the
message headers.
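
To make the header side of this concrete, here is a minimal Python sketch of
what a header-only filter has to do: scan the header values for
MCP_NCL_ALERT_* strings and collect the tags. The header name
X-Example-MCP-Report and the tags RUSSIAN and ADULT are hypothetical,
invented purely for the example.

```python
import re
from email import message_from_string

# Captures the tag that replaces "*" in MCP_NCL_ALERT_*.
ALERT_RE = re.compile(r"MCP_NCL_ALERT_([A-Z0-9_]+)")


def alert_tags(raw_message):
    """Return the set of alert tags found in the headers (never the body)."""
    msg = message_from_string(raw_message)
    tags = set()
    for _name, value in msg.items():
        tags.update(ALERT_RE.findall(str(value)))
    return tags


if __name__ == "__main__":
    sample = (
        "From: sender@example.com\n"
        "Subject: test\n"
        # Hypothetical header name and tags, for illustration only.
        "X-Example-MCP-Report: MCP_NCL_ALERT_RUSSIAN, MCP_NCL_ALERT_ADULT\n"
        "\n"
        "Body text mentioning MCP_NCL_ALERT_BODY is ignored.\n"
    )
    print(sorted(alert_tags(sample)))
```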

This scheme works well enough but may not scale; however that is another
matter.

I want to extend this MCP detection scheme to use SpamAssassin rules to
detect and flag mail whose message body is in particular languages.
Russian is my current target.
 
How do I do this? Using "ok_locales" and "ok_languages" does not appear to
be appropriate in this context.

Quentin
---
PHONE: +44 191 222 8209    Information Systems and Services (ISS),
                           University of Newcastle,
                           Newcastle upon Tyne,
FAX:   +44 191 222 8765    United Kingdom, NE1 7RU.
------------------------------------------------------------------------
Any opinion expressed above is mine. The University can get its own. 

------------------------ MailScanner list ------------------------ To
unsubscribe, email jiscmail at jiscmail.ac.uk with the words:
'leave mailscanner' in the body of the email.
Before posting, read the Wiki (http://wiki.mailscanner.info/) and the
archives (http://www.jiscmail.ac.uk/lists/mailscanner.html).

Support MailScanner development - buy the book off the website!



