Issue with "Add Text Of Doc" feature

Anthony Cartmell ajcartmell at
Fri May 15 14:51:50 IST 2009

> You can make it always generate utf-8 by editing and  
> changing the text around line 200 to say this:
>    $parententity->attach(     Type => "text/plain",
>                               Charset => "utf-8",
>                               Encoding => "8bit",
>                               Disposition => "attachment",
>                               Filename => $attachfile,
>                               Path => "$dir/$unpackfile");
> (note the "Charset" setting). Then just edit the setting in  
> MailScanner.conf to say
> Antiword = /usr/bin/antiword -f -m UTF-8.txt
> and that's all you need to do. If you find this works okay, that's what  
> will go in the next release.

That would work, but the problem is that the "Charset" setting in
$parententity->attach has to match the character set of Antiword's output,
which is set with the -m flag or defaults from the LANG environment variables.

If someone set

Antiword = /usr/bin/antiword -f -m 8859-1.txt

(or if they set
Antiword = /usr/bin/antiword -f
and had iso-8859-1 as their default character set)

then the attachment headers would be specifying the wrong character set,  
resulting in corrupted text display.
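To make the dependency concrete, here is a minimal Perl sketch of how the correct Charset follows from the Antiword setting. The helper name antiword_charset is hypothetical and not part of MailScanner. It assumes mapping files are named like the ones above (UTF-8.txt, 8859-1.txt). It also assumes antiword falls back to Latin-1 when there is no -m flag and the locale is not UTF-8, which I haven't checked against the antiword source.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MailScanner): work out which MIME charset
# antiword will write, given the Antiword command from MailScanner.conf and
# a locale string such as "en_GB.UTF-8".
sub antiword_charset {
    my ($cmd, $locale) = @_;

    # An explicit "-m <mapping>.txt" flag takes precedence over the locale.
    if ($cmd =~ /(?:^|\s)-m\s+(\S+?)\.txt(?:\s|$)/) {
        my $map = $1;
        return 'utf-8'    if $map =~ /^utf-?8$/i;
        return "iso-$map" if $map =~ /^8859-\d+$/;
        return;    # unrecognised mapping file: let the caller decide
    }

    # No -m flag: assume antiword follows the locale, and (assumption)
    # falls back to Latin-1 when the locale is not UTF-8.
    $locale = '' unless defined $locale;
    return $locale =~ /\.utf-?8/i ? 'utf-8' : 'iso-8859-1';
}
```

Both problem cases above come out as iso-8859-1, which is exactly why a hard-coded Charset => "utf-8" would be wrong for them. A caller would typically pass `$ENV{LC_ALL} || $ENV{LC_CTYPE} || $ENV{LANG}` as the locale, following the usual POSIX precedence.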

So if you hard-code "utf-8" into the attachment headers, then you should probably also hard-code "-m UTF-8.txt"
into the antiword command line.

So line 120 would become:

  my $cmd = "$antiword -m UTF-8.txt '$dir/$docname' > '$dir/$unpackfile'";

to partner with line 200:

  Charset => "utf-8",
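One way to pair line 120 with line 200 is to have both read from a single pair of constants, so that neither value can change without the other. This is a minimal sketch and the constant and sub names are hypothetical. It assumes the Antiword setting in MailScanner.conf does not also pass its own -m flag. The attach() keys are the ones already used in the quoted code.

```perl
use strict;
use warnings;

# Hypothetical constants: the antiword mapping file and the MIME charset it
# produces are defined side by side so they cannot drift apart.
use constant ANTIWORD_MAPPING => 'UTF-8.txt';
use constant ATTACH_CHARSET   => 'utf-8';

# Equivalent of line 120: build the antiword command with the fixed mapping.
sub build_antiword_cmd {
    my ($antiword, $dir, $docname, $unpackfile) = @_;
    return sprintf("%s -m %s '%s/%s' > '%s/%s'",
        $antiword, ANTIWORD_MAPPING, $dir, $docname, $dir, $unpackfile);
}

# Equivalent of line 200: the attach() arguments, using the matching charset.
sub attach_args {
    my ($dir, $unpackfile, $attachfile) = @_;
    return (
        Type        => 'text/plain',
        Charset     => ATTACH_CHARSET,
        Encoding    => '8bit',
        Disposition => 'attachment',
        Filename    => $attachfile,
        Path        => "$dir/$unpackfile",
    );
}
```

The attach call would then read `$parententity->attach(attach_args($dir, $unpackfile, $attachfile));`.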

Then the question is whether antiword always ships with UTF-8.txt, which I
think it probably does. Choosing UTF-8 should be safe, as it covers
pretty much any character, is the default for XML and modern Apache HTML, ...
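Whether UTF-8.txt is present on a given install could be checked before relying on it. The following is a minimal Perl sketch with a hypothetical function name. The directories to search are passed in by the caller, because I'm only assuming that antiword reads mapping files from places like ~/.antiword and /usr/share/antiword. Packagers may use other paths.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical check: return the path of the first readable copy of the
# given antiword mapping file in the listed directories, or undef if none.
sub find_antiword_mapping {
    my ($mapping, @dirs) = @_;
    for my $dir (@dirs) {
        next unless defined $dir && length $dir;    # skip unset entries
        my $path = File::Spec->catfile($dir, $mapping);
        return $path if -r $path;
    }
    return;
}
```

For example, `find_antiword_mapping('UTF-8.txt', "$ENV{HOME}/.antiword", '/usr/share/antiword')` returns the file's path if it is installed in either of those (assumed) locations.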



More information about the MailScanner mailing list