Porn msg identification?

Mariano Absatz mailscanner at LISTS.COM.AR
Thu Apr 10 15:22:44 IST 2003


On 9 Apr 2003 at 14:45, Richard D Alloway wrote:

> On Tue, 8 Apr 2003, Mariano Absatz wrote:
> 
> > Hi Rich,
> > 
> > The point is that MailScanner doesn't know anything about scoring messages... 
> > the spam score you see in MailScanner is actually done by the SpamAssassin 
> > library that MailScanner optionally uses.
> 
> This is, of course, quite true :)
> 
> The reason I was suggesting it be part of MailScanner is the fact that
> MailScanner takes the output of SpamAssassin and modifies the subject
> and/or adds a header to the message.
>  
> > Now, _that_ library, including the rules that come with it, is developed and 
> > optimized to tag as much spam as possible _avoiding_ as many false positives 
> > as it can.
> 
> Well, I'm not necessarily looking to detect spam... legitimate email with
> mature content might not be spam. :)
Right, but my point is that, so far, MailScanner invokes SpamAssassin at most 
once, and thus, it only uses one set of SA rules that, by default, is 
configured to detect spam.

It would be easy (only a matter of configuration, not programming) to change 
the SA rules (and/or their scoring) to detect adult content, and to modify 
MailScanner.conf so that the X-MailScanner-xxxx header and the Subject: line 
report the 'adulthood' rather than the 'spamhood' of the message.

The problem is if you want the _same_ MailScanner to do _both_ spam & adult 
content detection.

For that to work you would have to modify MS to invoke SA twice, each time 
with a different set of rules, and to generate two sets of headers and 
Subject: modifications, based on what each of the two SA invocations yields.

That would mean duplicating, under different names, some of MS's data 
structures that represent messages, along with the related configuration 
variables and their defaults, etc.
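
As a rough illustration of the two-pass idea above, here is a toy sketch in 
Python. It is not SpamAssassin or MailScanner code: the RuleSet class, the 
scan() function, the X-Example-* header names, the patterns and the scores 
are all made up for the example. The only point is that the same message is 
scored once per rule set, and each pass produces its own header and its own 
subject tag.

```python
import re
from dataclasses import dataclass


@dataclass
class RuleSet:
    """Hypothetical stand-in for one set of scoring rules."""
    name: str          # used to build the header name and the subject tag
    threshold: float   # score at or above which the message is tagged
    rules: list        # (compiled regex, points) pairs

    def score(self, text):
        # Add up the points of every rule whose pattern appears in the text.
        return sum(points for pattern, points in self.rules
                   if pattern.search(text))


def scan(text, rule_sets):
    """Score the same message once per rule set.

    Returns a dict with one header per rule set and a list of subject tags
    for the rule sets whose threshold was reached.
    """
    headers = {}
    tags = []
    for rs in rule_sets:
        s = rs.score(text)
        hit = s >= rs.threshold
        headers[f"X-Example-{rs.name}"] = (
            f"{'yes' if hit else 'no'}, score={s:.1f}, "
            f"required={rs.threshold:.1f}"
        )
        if hit:
            tags.append("{" + rs.name + "?}")
    return headers, tags


# Two independent, made-up rule sets: one for spam, one for adult content.
SPAM = RuleSet("Spam", 5.0, [
    (re.compile(r"free money", re.I), 3.0),
    (re.compile(r"click here", re.I), 2.5),
])
ADULT = RuleSet("Adult", 4.0, [
    (re.compile(r"xxx", re.I), 2.0),
    (re.compile(r"adults only", re.I), 2.5),
])
```

Calling scan(message_text, [SPAM, ADULT]) gives two independent verdicts, so 
a message can be tagged as adult content without being tagged as spam, and 
vice versa.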

A slower (from a performance point of view) but faster (from a development 
point of view) solution would be to run 2 instances of MailScanner on the 
same machine, one to do the usual spam & virus detection and the other one to 
do adult content detection.

For this you'll have to set up another queue directory like 
/var/spool/mqueue.mid and set the first MS with that as the "output" 
directory and the second MS with that as the "input" directory...

For the second MS you should also change all the messages that speak about 
"spam" to speak about "adult content", configure it not to query any RBL 
(either internally or via SA) and not to check for viruses, and eliminate 
the internal MS content checks (IFRAME, attachment extensions, etc.) so as 
to avoid as much double-processing as you can.

The first MS should also change its "Sendmail2" invocation... I don't know 
much about Sendmail and Exim but, from what I see, it should be something 
like "/bin/true", since every file that the first MS leaves in 
/var/spool/mqueue.mid will automatically be picked up and processed by the 
second MailScanner, which does not need to be invoked the way sendmail does...

Am I wrong, Julian, Nick?
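
For what it's worth, the directory hand-off described above can be sketched 
as a toy in Python. This is only a model of the idea, not how MailScanner 
handles its queues: run_stage(), demo(), the X-Stage header and the 
mqueue.in and mqueue directory names are assumptions made for the example 
(only mqueue.mid comes from the setup above), and real queue files have a 
different format.

```python
import shutil
import tempfile
from pathlib import Path


def run_stage(in_dir, out_dir, tag):
    """Process every file queued in in_dir once, then move it to out_dir.

    'Processing' here is just appending a marker line; a real scanner
    would do its checks at this point.
    """
    moved = []
    for path in sorted(Path(in_dir).iterdir()):
        with path.open("a") as f:
            f.write(f"X-Stage: {tag}\n")
        shutil.move(str(path), str(Path(out_dir) / path.name))
        moved.append(path.name)
    return moved


def demo():
    """Chain two stages through an intermediate queue directory."""
    root = Path(tempfile.mkdtemp())
    incoming, mid, outgoing = (root / name for name in
                               ("mqueue.in", "mqueue.mid", "mqueue"))
    for d in (incoming, mid, outgoing):
        d.mkdir()
    (incoming / "msg1").write_text("Subject: hello\n")

    # First instance: output directory is the intermediate queue.
    run_stage(incoming, mid, "spam+virus")
    # Second instance: input directory is that same intermediate queue.
    # Nothing has to "deliver" the file to it; it simply finds it there.
    run_stage(mid, outgoing, "adult-content")

    result = (outgoing / "msg1").read_text()
    shutil.rmtree(root)
    return result
```

In this model the second stage needs no trigger from the first one: it just 
looks in the intermediate directory, which is why the first instance's 
delivery command could be a no-op.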

>  
> > Thus, SpamAssassin scans the message looking for patterns and it adds or 
> > subtracts from the score as some conditions are met or not...
> 
> Which is the same functionality I'd be looking for in a word/phrase
> detection routine, but with a separate set of actions from the spam
> portion.
>  
> > You _could_ create a different set of rules for SpamAssassin and invoke it 
> > twice, once for spam detection and the other for "adulthood" detection, but 
> > that would imply at least modifying MailScanner and using a secondary set of 
> > SpamAssassin rules... it _will_ require some time and an effort to do it...
> 
> It seems I may be one of the very few actually looking for this type of
> feature...perhaps I will have to throw on the ol' coding hat for a while
> :)
> 
> Julian, if I am (or anybody else is) able to create a relatively
> lightweight way of adding this feature to MailScanner, would you consider
> adding it to the production version?  
> 
> Thanks again for everyone's feedback!
> 
> -Rich

--
Mariano Absatz
El Baby
----------------------------------------------------------
Honey, I Formatted the Kid!



