Picture analysis

Julian Field mailscanner at ecs.soton.ac.uk
Sat May 17 17:34:32 IST 2003

These are more possible applications for the general-purpose content filter
I want to write.
It's going to be a fairly large job, with lots of protocols and other
details to sort out before MailScanner and the content filter can
communicate.

There are loads of applications for it; it's a matter of working out
exactly how to write it, particularly so that external projects can easily
be plugged into it.

I am basically going to pass it filenames of attachments and chunks of
sanitized MIME header info. It will then do stuff with the contents of the
file, possibly also using the MIME header info. It then needs to be able to
optionally change the MIME header info as well, and then tell MailScanner
it has done it. I want to keep the communication extremely simple so that
external filters can be written in a wide variety of languages very easily.
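
To make the idea concrete, here is a minimal sketch of one possible exchange, assuming one line of JSON per request and one per reply. The wire format, the field names ("id", "file", "headers", "changed") and the example_filter function are all assumptions made up for illustration; none of this is an existing MailScanner interface.

```python
import json


def encode_request(req_id, filename, headers):
    """Serialize one work request as a single line of JSON (assumed format)."""
    return json.dumps({"id": req_id, "file": filename, "headers": headers})


def example_filter(line):
    """Hypothetical external filter: take one request line, return one reply line."""
    req = json.loads(line)
    headers = dict(req["headers"])
    changed = False
    # Illustrative rule only: relabel a .txt attachment sent as octet-stream.
    if (req["file"].endswith(".txt")
            and headers.get("Content-Type") == "application/octet-stream"):
        headers["Content-Type"] = "text/plain"
        changed = True
    reply = {"id": req["id"], "changed": changed}
    if changed:
        # Headers are only sent back when the filter actually modified them.
        reply["headers"] = headers
    return json.dumps(reply)


def decode_reply(line):
    """Parse one reply line back into a dictionary."""
    return json.loads(line)
```

A line-per-message text format like this can be read and written from almost any language, which is the point of keeping the communication simple. The "id" field is there so that a reply can be matched to its request.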

On the other hand it also needs to be very fast, even in languages with
large startup overheads (such as cranking up a Java VM) and so must be able
to handle lots of files at once.

That means the files may not be returned in the same order they were
presented. You could have a farm of processes sitting there waiting for
work requests, and these will naturally return the simplest requests first.
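
A rough sketch of the out-of-order behaviour, assuming each request carries an identifier so replies can be matched up afterwards. For brevity it uses a pool of threads as a stand-in for a farm of processes; the scan function, the "cost" value and the "clean" verdict are hypothetical placeholders, not real checks.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def scan(request):
    """Stand-in for a real content check; 'cost' fakes how long the file takes."""
    req_id, filename, cost = request
    time.sleep(cost)
    return req_id, "clean"


def run_batch(requests, workers=4):
    """Submit every request at once and collect replies as they finish."""
    results = {}
    completion_order = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(scan, r) for r in requests]
        # as_completed yields futures in finishing order, not submission order.
        for fut in as_completed(futures):
            req_id, verdict = fut.result()
            results[req_id] = verdict
            completion_order.append(req_id)
    return results, completion_order
```

Because the results are keyed by request identifier, the caller does not care which reply arrives first; cheap requests will usually show up in completion_order ahead of expensive ones.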

Some of the files may never be returned at all, or the content filter may
crash, so it all needs wrapping in timeouts as well.
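
One way the timeout wrapping could look, as a sketch under stated assumptions: the filter is an external command that takes a batch of filenames as arguments and prints one "filename<TAB>verdict" line per file. That reply format, the verdict strings and DEMO_FILTER are invented for illustration only.

```python
import subprocess
import sys

# Hypothetical stand-in filter: reports "clean" for its first file only,
# so any other file in the batch never gets a reply.
DEMO_FILTER = [sys.executable, "-c",
               "import sys; print(sys.argv[1], 'clean', sep=chr(9))"]


def run_filter(cmd, filenames, timeout_s):
    """Run one filter over a batch of files, never waiting longer than timeout_s."""
    # Start by assuming no file was answered; replies overwrite this below.
    verdicts = {name: "no-reply" for name in filenames}
    try:
        proc = subprocess.run(cmd + list(filenames), capture_output=True,
                              text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left hanging.
        return {name: "timeout" for name in filenames}
    if proc.returncode != 0:
        return {name: "crashed" for name in filenames}
    for line in proc.stdout.splitlines():
        name, _, verdict = line.partition("\t")
        if name in verdicts and verdict:
            verdicts[name] = verdict
    return verdicts
```

Calling run_filter(DEMO_FILTER, ["a.jpg", "b.jpg"], 10) marks a.jpg as "clean" and leaves b.jpg as "no-reply", so the caller can decide what to do with files the filter never answered for.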

As you (hopefully) see, it's not quite as simple as it looks. But there's
no point doing it at all unless it is fast, robust and highly scalable. I
leave it to the commercial guys to produce half-baked systems that are
slow, dodgy and unscalable.

At 12:39 17/05/2003, you wrote:
>I was thinking to myself why anyone would want to spend a fortune for a
>service such as Messagelabs (and similar) when they could easily build (or
>buy) a MailScanner solution.
>The only real benefit, as I see it, would be their "Porn filter".  It could
>be something very useful for schools, and quite possibly other organizations
>as well.  Up until recently there has been no open source initiative (that I
>know of) in this field.
>Now there is "Poesia", see http://sourceforge.net/projects/poesia/ or
>I'm no programmer and I see the project involves Java, something I know
>MailScanner doesn't use at all.
>Would it be a huge task to implement Poesias Pics- and Imagefilter into
>Another thing that struck me after reading a recent article,
>Isn't it possible that some organizations might worry about "stegged" content
>in otherwise allowed files?
>"Stegcheck" at http://www.outguess.org/detection.php doesn't strike me as too
>hard (although I may be wrong...) to implement with MailScanner.
>(I'm sure there are other tools for detecting stegged content in various
>files, stegcheck would be a good start though)
>I gather both these features would require quite a lot of processing power
>and time but I'm sure it would be acceptable for those who really need
>these features.
>Wouldn't a "Porn" and "Stego" feature in MailScanner be worth investigating?
>regards, Tony

Julian Field
Professional Support Services at www.MailScanner.biz
MailScanner thanks transtec Computers for their support
