Issue with "Add Text Of Doc" feature
Anthony Cartmell
ajcartmell at fonant.com
Fri May 15 14:51:50 IST 2009
> You can make it always generate utf-8 by editing Antiword.pm and
> changing the text around line 200 to say this:
> $parententity->attach( Type        => "text/plain",
>                        Charset     => "utf-8",
>                        Encoding    => "8bit",
>                        Disposition => "attachment",
>                        Filename    => $attachfile,
>                        Path        => "$dir/$unpackfile");
>
> (note the "Charset" setting). Then just edit the setting in
> MailScanner.conf to say
> Antiword = /usr/bin/antiword -f -m UTF-8.txt
>
> and that's all you need to do. If you find this works okay, that's what
> will go in the next release.
That would work, but the problem is that the "Charset" setting in
$parententity->attach has to match the charset that antiword actually
outputs, which is set with the -m flag or, failing that, defaulted from
the locale environment variables (LANG etc.).
If someone set
Antiword = /usr/bin/antiword -f -m 8859-1.txt
(or if they set
Antiword = /usr/bin/antiword -f
and had iso-8859-1 as their default character set),
then the attachment headers would specify the wrong character set,
resulting in corrupted text display.
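One way to keep the two in sync, rather than hard-coding either, would be to derive the header charset from the -m option in the configured command. The following is only a rough sketch: charset_for_antiword_cmd is a hypothetical helper, not something that exists in MailScanner, and the mapping-file names it recognises are assumed from the files antiword usually ships with.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MailScanner): guess the MIME charset
# from the "-m <mapping file>" option in the configured Antiword command.
# Returns undef when there is no -m option (antiword then chooses its
# mapping file from the locale) or when the mapping file is not recognised.
sub charset_for_antiword_cmd {
    my ($cmd) = @_;
    return unless $cmd =~ /(?:^|\s)-m\s+(\S+)/;
    my $map = $1;
    $map =~ s{^.*/}{};      # drop any directory part
    $map =~ s/\.txt$//i;    # drop the .txt extension
    return 'utf-8'       if $map =~ /^utf-8$/i;
    return "iso-$map"    if $map =~ /^8859-\d+$/;
    return "windows-$1"  if $map =~ /^cp(125\d)$/i;
    return lc $map       if $map =~ /^koi8-[ru]$/i;
    return;                 # unknown mapping file: caller must decide
}

my $charset = charset_for_antiword_cmd('/usr/bin/antiword -f -m 8859-1.txt');
print defined $charset ? $charset : 'unknown', "\n";
```

If the helper returns undef, the safest fallback is probably to force "-m UTF-8.txt" on the command line and label the attachment utf-8.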
So if you hard-code the "utf-8" into the attachment headers in
Antiword.pm, then you should probably also hard-code the "-m UTF-8.txt"
into the call to the antiword command.
So line 120 in Antiword.pm would become:
my $cmd = "$antiword -m UTF-8.txt '$dir/$docname' > '$dir/$unpackfile'";
to partner with line 200:
Charset => "utf-8",
Then the question is whether antiword always comes with UTF-8.txt, which I
think it probably does. Choosing UTF-8 should be safe, as it covers
pretty much any character and is the default for XML, modern Apache HTML,
etc.
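If there is any doubt about UTF-8.txt being installed, it could be checked before relying on it. This is just a sketch: find_antiword_mapping is a hypothetical helper, and the directories searched (ANTIWORDHOME, ~/.antiword and /usr/share/antiword) are my assumption about where antiword looks, so adjust them for the local install.

```perl
use strict;
use warnings;

# Hypothetical check (not part of MailScanner): return the full path of the
# first readable copy of the named mapping file found in @dirs, or undef
# if none of the directories contains it.
sub find_antiword_mapping {
    my ($mapfile, @dirs) = @_;
    for my $dir (@dirs) {
        return "$dir/$mapfile" if -r "$dir/$mapfile";
    }
    return;
}

# Assumed search locations; these may differ between installations.
my @dirs = grep { defined($_) && length($_) } (
    $ENV{ANTIWORDHOME},
    defined $ENV{HOME} ? "$ENV{HOME}/.antiword" : undef,
    '/usr/share/antiword',
);

my $found = find_antiword_mapping('UTF-8.txt', @dirs);
print defined $found ? "found $found\n" : "UTF-8.txt not found\n";
```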
HTH,
Anthony
--
www.fonant.com - Quality web sites