Porn msg identification?
Mariano Absatz
mailscanner at LISTS.COM.AR
Thu Apr 10 15:22:44 IST 2003
On 9 Apr 2003 at 14:45, Richard D Alloway wrote:
> On Tue, 8 Apr 2003, Mariano Absatz wrote:
>
> > Hi Rich,
> >
> > The point is that MailScanner doesn't know anything about scoring messages...
> > the spam score you see in MailScanner is actually done by the SpamAssassin
> > library that MailScanner optionally uses.
>
> This is, of course, quite true :)
>
> The reason I was suggesting it be part of MailScanner is the fact that
> MailScanner takes the output of SpamAssassin and modifies the subject
> and/or adds a header to the message.
>
> > Now, _that_ library, including the rules that come with it, is developed and
> > optimized to tag as much spam as possible _avoiding_ as many false positives
> > as it can.
>
> Well, I'm not necessarily looking to detect spam... legitimate email with
> mature content might not be spam. :)
Right, but my point is that, so far, MailScanner invokes SpamAssassin at most
once, and thus, it only uses one set of SA rules that, by default, is
configured to detect spam.
It would be easy (only a matter of configuration, not programming) to change
the SA rules (and/or their scoring) to detect adult content, and to modify
MailScanner.conf so that the X-MailScanner-xxxx header and the Subject report
the 'adulthood' rather than the 'spamhood' of the message.
The problem is if you want the _same_ MailScanner to do _both_ spam & adult
content detection.
For that to work you would have to modify MS to invoke SA twice, each time
with a different set of rules, and generate two sets of headers and Subject:
modifications, based on what each of the two SA invocations yields.
That would mean duplicating, under different names, some of MS's data
structures representing messages, as well as configuration variables and
their defaults, etc.
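
To make the "two rule sets, two sets of headers" idea concrete, here is a
minimal sketch in Python. It is not how MailScanner or SpamAssassin are
implemented; the rule patterns, scores, thresholds and the X-Example-*
header names are all made up for illustration. It only shows each rule set
being scored independently and producing its own header.

```python
import re

# Hypothetical rule sets: (pattern, score) pairs. Real SpamAssassin rules
# are far richer; these only illustrate the additive scoring idea.
SPAM_RULES = [
    (re.compile(r"free money", re.I), 2.5),
    (re.compile(r"click here", re.I), 1.5),
    (re.compile(r"unsubscribe", re.I), 0.5),
]
ADULT_RULES = [
    (re.compile(r"xxx", re.I), 3.0),
    (re.compile(r"adults only", re.I), 2.5),
]


def score(text, rules):
    """Sum the score of every rule whose pattern matches the text."""
    return sum(points for pattern, points in rules if pattern.search(text))


def classify(text, spam_threshold=3.0, adult_threshold=3.0):
    """Run both rule sets independently and build one header per verdict."""
    headers = {}
    spam_score = score(text, SPAM_RULES)
    adult_score = score(text, ADULT_RULES)
    if spam_score >= spam_threshold:
        # Header name is an assumption, not a real MailScanner header.
        headers["X-Example-SpamCheck"] = f"spam, score={spam_score}"
    if adult_score >= adult_threshold:
        # Header name is an assumption, not a real MailScanner header.
        headers["X-Example-AdultCheck"] = f"adult, score={adult_score}"
    return headers
```

Because the two scores never mix, a legitimate message with mature content
can get the adult header without ever being tagged as spam.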
A slower (from a performance point of view) but faster (from a development
point of view) solution would be to run 2 instances of MailScanner on the
same machine, one to do the usual spam & virus detection and the other one to
do adult content detection.
For this you'll have to set up another queue directory like
/var/spool/mqueue.mid and set the first MS with that as the "output"
directory and the second MS with that as the "input" directory...
For the second MS, you should also change all the messages that speak about
"spam" to speak about "adult content", configure it not to query any RBL
(either internally or via SA), not to check for viruses, and disable the
internal MS content checks (IFRAME, attachment extensions, etc.) so as to
avoid as much double-processing as you can...
The first MS should also change its "Sendmail2" invocation... I don't know
much about Sendmail and Exim, but, from what I can see, it should be something
like "/bin/true", since every file that the second MS finds in
/var/spool/mqueue.mid (left there by the first MS) will automatically be
processed by the second MailScanner without it needing to be invoked the way
sendmail is...
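
The hand-off between the two instances can be sketched as a toy Python
simulation. This is not MailScanner code: run_stage, demo and the
X-Example-Stage marker are hypothetical, and the directory names mqueue.in
and mqueue are assumed stand-ins for the real incoming and outgoing queues
(only mqueue.mid comes from the description above). It just shows that the
first stage's output directory is the second stage's input directory, so
nothing has to be invoked in between.

```python
import os
import shutil
import tempfile


def run_stage(in_dir, out_dir, tag):
    """Process every file waiting in in_dir, then move it to out_dir."""
    processed = []
    for name in sorted(os.listdir(in_dir)):
        src = os.path.join(in_dir, name)
        # Stand-in for the real scanning work: append a marker line.
        with open(src, "a") as fh:
            fh.write(f"X-Example-Stage: {tag}\n")
        shutil.move(src, os.path.join(out_dir, name))
        processed.append(name)
    return processed


def demo():
    """Push one message through both stages and return its final text."""
    root = tempfile.mkdtemp()
    # Assumed stand-ins for the incoming, intermediate and outgoing queues.
    incoming, middle, outgoing = (
        os.path.join(root, d) for d in ("mqueue.in", "mqueue.mid", "mqueue")
    )
    for d in (incoming, middle, outgoing):
        os.mkdir(d)
    with open(os.path.join(incoming, "msg1"), "w") as fh:
        fh.write("Subject: test\n")
    run_stage(incoming, middle, "spam-and-virus")   # first instance
    run_stage(middle, outgoing, "adult-content")    # second instance
    with open(os.path.join(outgoing, "msg1")) as fh:
        result = fh.read()
    shutil.rmtree(root)
    return result
```

In the real setup each instance would poll its input directory on its own
schedule; here the two stages are simply called one after the other.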
Am I wrong, Julian, Nick?
>
> > Thus, SpamAssassin scans the message looking for patterns and it adds or
> > subtracts from the score as some conditions are met or not...
>
> Which is the same functionality I'd be looking for in a word/phrase
> detection routine, but with a separate set of actions from the spam
> portion.
>
> > You _could_ create a different set of rules for SpamAssassin and invoke it
> > twice, once for spam detection and the other for "adulthood" detection, but
> > that would imply at least modifying MailScanner and using a secondary set of
> > SpamAssassin rules... it _will_ require some time and an effort to do it...
>
> It seems I may be one of the very few actually looking for this type of
> feature...perhaps I will have to throw on the ol' coding hat for a while
> :)
>
> Julian, if I am (or anybody else is) able to create a relatively
> lightweight way of adding this feature to MailScanner, would you consider
> adding it to the production version?
>
> Thanks again for everyone's feedback!
>
> -Rich
--
Mariano Absatz
El Baby
----------------------------------------------------------
Honey, I Formatted the Kid!