Dangerous content detection with "file" command

hugh.fraser at arcelormittal.com
Tue Sep 29 18:05:15 IST 2009

I have a Word document that was mistakenly flagged as "executable".
Adding some debugging to the "SweepOther.pm" code revealed that the
document has a Title property of "The Quest of the Self". The Linux
"file" command, which is used to identify file types, returns this
property (along with the author and others) in its output as follows:

Support.doc: CDF V2 Document, Little Endian, Os: Windows, Version 5.1,
Code page: 1252, Title: The Quest of the Self, Author: johndoe,
Template: Normal, Last Saved By: JOHN DOE, Revision Number: 2,
Name of Creating Application: Office Word, Total Editing Time: 01:00,
Create Time/Date: Thu Sep 17 09:57:00 2009,
Last Saved Time/Date: Thu Sep 17 09:57:00 2009, Number of Pages: 1,
Number of Words: 2597, Number of Characters: 14289, Security: 0

MailScanner does a simple regex compare of the output from the "file"
command and sees the string "ELF" in it (in the word Self), and flags
the file as executable. This will happen with any Word document that
contains a matching string in its title, subject, author, category,
comments, or any other property field.
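
The false positive described above can be reproduced with a short
sketch. It is written in Python rather than Perl so that it runs on its
own, and it assumes the rule is a case-insensitive match for "ELF"; the
actual MailScanner rule is not quoted in this post, so treat the pattern
as a stand-in.

```python
import re

# Abbreviated `file` output from the post (properties after Author trimmed).
file_output = (
    "CDF V2 Document, Little Endian, Os: Windows, Version 5.1, "
    "Code page: 1252, Title: The Quest of the Self, Author: johndoe"
)

# Assumption: a case-insensitive search for "ELF". The real rule may differ.
rule = re.compile(r"ELF", re.IGNORECASE)

match = rule.search(file_output)
matched_text = match.group(0) if match else None
# Include one character of left context to show which word triggered the rule.
context = file_output[match.start() - 1 : match.end()] if match else None

print(matched_text, context)
```

The match lands inside the Title property, not in the part of the
output that names the file type.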

A simple change to the regex used in CheckFileContentTypes, so that it
captures the "file" command's output only up to the first ",", does the
trick. I've checked some other files in quarantine to see whether the
change would cause problems, and so far I don't see any.

The diff for SweepOther.pm is as follows:

<         $FileTypes{$1}{$2} = $3 if /^([^\/]+)\/([^:]+):\s*(.*)$/;
>         $FileTypes{$1}{$2} = $3 if /^([^\/]+)\/([^:]+):\s*([^,]*),/;
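
To see what each pattern captures, here is a rough Python port of the
two regexes. The input lines ("msg1/Support.doc: ..." and
"msg1/notes.txt: ASCII text") are made-up examples of the
"directory/filename: description" shape the patterns expect, not actual
MailScanner output.

```python
import re

# Python ports of the two Perl patterns from the diff.
OLD = re.compile(r"^([^/]+)/([^:]+):\s*(.*)$")
NEW = re.compile(r"^([^/]+)/([^:]+):\s*([^,]*),")

# Hypothetical input lines; the directory and file names are assumptions.
line = ("msg1/Support.doc: CDF V2 Document, Little Endian, "
        "Title: The Quest of the Self")
plain = "msg1/notes.txt: ASCII text"

old_type = OLD.match(line).group(3)   # whole description, properties included
new_type = NEW.match(line).group(3)   # only the text before the first comma
no_comma = NEW.match(plain)           # description with no comma at all

print(old_type)
print(new_type)
print(no_comma)
```

With the patched pattern the captured type is just "CDF V2 Document", so
the Title text is never compared. One thing that may be worth checking:
the patched pattern requires a "," in the output, so a description
without any comma (such as the hypothetical "ASCII text" line) does not
match at all and would not be recorded in %FileTypes.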
