filtering file types vs. extensions

Julian Field mailscanner at ecs.soton.ac.uk
Fri Jun 6 18:42:57 IST 2003


At 18:29 06/06/2003, you wrote:
> > Does anyone know of a Perl module that uses the magic file? I would very
> > much like to avoid having to write this, but I don't want to have to crank
> > up the file command for every message batch if I can avoid it.
>
>Maybe you missed Mariano's post with the link in it (it ended up in a
>different thread in my mail reader), so here's the link he found:
>http://search.cpan.org/author/KNOK/File-MMagic-1.19/

I hadn't seen his post when I replied.
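
For illustration only: at its core, a magic-file lookup compares an attachment's leading bytes against known signatures, and that can run in-process rather than starting the file command for each batch. The sketch below is Python rather than Perl, uses a tiny hand-written signature table (an assumption, not the real magic database), and does not reflect File::MMagic's API.

```python
# Tiny, hand-written signature table (illustrative assumption). The magic
# database used by the `file` command is far larger and supports offsets,
# masks and string tests; this sketch only checks leading bytes.
MAGIC_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"PK\x03\x04", "application/zip"),
    (b"MZ", "application/x-dosexec"),
]


def sniff_mime_type(data: bytes, default: str = "application/octet-stream") -> str:
    """Guess a MIME type from the first bytes of an attachment."""
    for signature, mime_type in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return mime_type
    # Nothing recognised: fall back to a generic binary type.
    return default


if __name__ == "__main__":
    print(sniff_mime_type(b"MZ\x90\x00"))  # application/x-dosexec
    print(sniff_mime_type(b"hello"))       # application/octet-stream
```

Because the table is ordered, more specific signatures should be listed before shorter ones that could also match.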

>It looks like this returns a MIME type, which is probably the right way to go
>about this (it also saves parsing the output of file).
>
>Given MIME types, I think probably the easiest way would be to have a
>mimetypes.rules.conf which matches using REs in the same way
>filename.rules.conf does.
>
>I guess you run into problems if the filename rules and the mimetype
>rules disagree (does reject take precedence?)
>
>I don't think combining filename rules and mime types into one file would
>be very easy as it would be difficult to deal with wildcard matching,
>double extensions etc.
>
>One suggestion, which would complicate the implementation but make it much
>easier to build rulesets based on file type, is to have both a filename
>rules file and a mimetype rules file which assign category names (rather
>than a simple yes/no), then have a much simpler ruleset determining the
>action based on category (again, reject takes precedence). Category names
>need to be arbitrary so that users can extend the range of categories.
>
>I guess that's not easy, but it could be quite handy!
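
A minimal sketch of the two-stage category scheme suggested above, assuming hypothetical in-memory rule tables rather than any real MailScanner configuration format: both rule sets map regexes to arbitrary category names, a second, simpler table maps categories to actions, and reject wins over allow.

```python
import re

# Hypothetical stage-one tables: regex -> administrator-chosen category name.
FILENAME_CATEGORIES = [
    (re.compile(r"\.(exe|com|scr|pif)$", re.IGNORECASE), "executable"),
    (re.compile(r"\.(doc|xls)$", re.IGNORECASE), "office"),
]
MIMETYPE_CATEGORIES = [
    (re.compile(r"^application/x-dosexec$"), "executable"),
    (re.compile(r"^image/"), "image"),
]

# Hypothetical stage-two table: category -> action. Unlisted categories allow.
CATEGORY_ACTIONS = {"executable": "reject", "office": "allow", "image": "allow"}


def categories(value, table):
    """Return the category of every rule in `table` whose regex matches."""
    return {category for pattern, category in table if pattern.search(value)}


def decide(filename, mime_type):
    """Combine both rule sets; any category mapped to reject wins."""
    cats = categories(filename, FILENAME_CATEGORIES) | categories(
        mime_type, MIMETYPE_CATEGORIES
    )
    actions = {CATEGORY_ACTIONS.get(c, "allow") for c in cats}
    return "reject" if "reject" in actions else "allow"
```

For example, decide("holiday.jpg", "application/x-dosexec") rejects because the detected type falls in the "executable" category even though the filename looks harmless, while decide("photo.jpg", "image/jpeg") allows.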

I want to keep it very simple to use. Very few people ever change these
files, as they are complicated enough already. Mapping a mimetype or a
filename rule to another keyword, then deny/allow based on those keywords,
is a bit too complicated in my opinion.

A file like filename.rules.conf that matches mimetypes (or possibly "file"
output) would be the easiest thing to do. It would not catch files whose
content doesn't match their filename, but maybe that isn't actually a
problem: I think enforcing that is likely to cause you more trouble than
it's worth anyway.
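
A minimal sketch of that simpler approach, assuming a hypothetical mimetype rules file modelled on filename.rules.conf (tab-separated action, regex, log text and user report text, first matching rule wins). Allowing types that match no rule is also an assumption. As noted above, it checks only the detected type and never compares it with the filename.

```python
import re

# Hypothetical rules file contents:
# action <TAB> regex <TAB> log text <TAB> user report text
EXAMPLE_RULES = """\
# Lines starting with # are comments
allow\t^text/plain$\t-\t-
deny\t^application/x-dosexec$\tWindows executable\tNo programs allowed
allow\t^image/\t-\t-
"""


def parse_rules(text):
    """Parse rule lines into (action, compiled regex, log text, report text)."""
    rules = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        fields = re.split(r"\t+", stripped, maxsplit=3)
        if len(fields) != 4:
            raise ValueError(f"malformed rule: {line!r}")
        action, pattern, log_text, report_text = fields
        rules.append((action.lower(), re.compile(pattern, re.IGNORECASE),
                      log_text, report_text))
    return rules


def check_mime_type(mime_type, rules):
    """First matching rule wins; no match falls through to allow (assumption)."""
    for action, pattern, log_text, report_text in rules:
        if pattern.search(mime_type):
            return action, log_text, report_text
    return "allow", "-", "-"
```

With the example rules, check_mime_type("application/x-dosexec", parse_rules(EXAMPLE_RULES)) returns the deny verdict with its log and report text, so an administrator edits one familiar-looking file and nothing else.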

It needs to be fast, fairly easy to implement, but above all easy to use.
It doesn't need to be able to do absolutely everything, though that would
be nice :-)
--
Julian Field
www.MailScanner.info
Professional Support Services at www.MailScanner.biz
MailScanner thanks transtec Computers for their support


