Innovative use of MCP - how do I detect and flag certain languages?

Quentin Campbell Q.G.Campbell at NEWCASTLE.AC.UK
Fri Aug 19 10:15:52 IST 2005


My question on how to detect and flag Russian (and other language)
messages is at the end. The background to the question is as follows:

We are a large UK university covering areas of study/research which
include Medicine, Agriculture, Engineering and Social Sciences. We also
have a large number of overseas students who send/receive e-mail in
foreign languages.

This means that almost any word or language that commonly appears in
real spam may also appear in many genuine messages received here.

Some users here make unreasonable demands on the University to detect
and remove centrally _all_ offensive spam e-mail or all, say, Russian
language spam. As an alternative they expect to be able to detect and
deal with it in their personal mail filters. 

We have operational problems doing either of these:

1. For the reasons given above, central detection and deletion is
impossible without generating a large number of false positives.

2. The bulk of our 20,000+ users use mailers that only allow filtering
on text in the message headers, not the message body.

This last problem is a real pain. It would be nice if users could simply
auto-delete any message whose body contains words/languages to which
they object and would not expect to see in genuine mail.

The way I am dealing with this last problem is by using SpamAssassin
rulesets in MCP to generate MCP_NCL_ALERT_* message headers that will
appear in a received message if the message body contains certain
"objectionable" words/phrases notified to me. The "*" in the header text
is replaced by a range of specific "content type tags". All our users'
personal mail filters can detect these MCP_NCL_ALERT_* strings in the
message headers.
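
As a rough illustration only, the tagging logic amounts to something
like the sketch below. It is written in Python rather than actual
SpamAssassin rule syntax, and the tag names and phrases in it are made
up for the example; they are not the real lists in use here.

```python
import re

# Hypothetical content-type tags and trigger phrases (assumptions for
# illustration; the real lists are not given in this message).
ALERT_PHRASES = {
    "PHARMA": ["cheap meds", "no prescription"],
    "GAMBLING": ["online casino"],
}

def alert_headers(body, phrases=ALERT_PHRASES):
    """Return the MCP_NCL_ALERT_* strings whose phrases occur in body."""
    headers = []
    for tag, words in sorted(phrases.items()):
        # Whole-word, case-insensitive match on any phrase for this tag.
        pattern = "|".join(r"\b" + re.escape(w) + r"\b" for w in words)
        if re.search(pattern, body, re.IGNORECASE):
            headers.append("MCP_NCL_ALERT_" + tag)
    return headers
```

A user's mail filter then only needs to match a string such as
MCP_NCL_ALERT_GAMBLING in the headers, which header-only mailers can do.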

This scheme works well enough but may not scale; however that is another
matter.

I want to extend this MCP detection scheme to use SpamAssassin rules to
detect and flag mail whose message body is in particular languages.
Russian is my current target.
 
How do I do this? Using "ok_locales" and "ok_languages" does not appear
to be appropriate in this context.
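
To make clear what kind of test I have in mind, here is a minimal
sketch in Python (not a SpamAssassin rule). It assumes the message body
has already been decoded to Unicode, and the function names and the 0.5
threshold are arbitrary choices for the example. It detects Cyrillic
script rather than Russian as such, so Ukrainian or Bulgarian mail would
be flagged too.

```python
import unicodedata

def cyrillic_ratio(text):
    """Fraction of alphabetic characters in text that are Cyrillic."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    cyrillic = sum(
        1 for ch in letters if "CYRILLIC" in unicodedata.name(ch, "")
    )
    return cyrillic / len(letters)

def looks_russian(text, threshold=0.5):
    """Crude check: True if at least `threshold` of the letters are Cyrillic."""
    return cyrillic_ratio(text) >= threshold
```

A message passing this test would get an extra MCP_NCL_ALERT_* header,
in the same way as the word/phrase matches described above.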

Quentin
---
PHONE: +44 191 222 8209    Information Systems and Services (ISS),
                           University of Newcastle,
                           Newcastle upon Tyne,
FAX:   +44 191 222 8765    United Kingdom, NE1 7RU.
------------------------------------------------------------------------
Any opinion expressed above is mine. The University can get its own. 
