Issue with "Add Text Of Doc" feature
MailScanner at ecs.soton.ac.uk
Fri May 15 15:08:36 IST 2009
On 15/05/2009 14:51, Anthony Cartmell wrote:
>> You can make it always generate utf-8 by editing Antiword.pm and
>> changing the text around line 200 to say this:
>> $parententity->attach( Type        => "text/plain",
>>                        Charset     => "utf-8",
>>                        Encoding    => "8bit",
>>                        Disposition => "attachment",
>>                        Filename    => $attachfile,
>>                        Path        => "$dir/$unpackfile");
>> (note the "Charset" setting). Then just edit the setting in
>> MailScanner.conf to say
>> Antiword = /usr/bin/antiword -f -m UTF-8.txt
>> and that's all you need to do. If you find this works okay, that's
>> what will go in the next release.
> That would work, but the problem is that the "Charset" setting in
> $parententity->attach has to match the charset output from Antiword,
> set with the -m flag or defaulted from the LANG environment variables.
> If someone set
> Antiword = /usr/bin/antiword -f -m 8859-1.txt
> (or if they set
> Antiword = /usr/bin/antiword -f
> and had iso-8859-1 as their default character set)
> then the attachment headers would be specifying the wrong character
> set, resulting in corrupted text display.
> So if you hard-code the "utf-8" into the attachment headers in
> Antiword.pm, then you should probably also hard-code the "-m
> UTF-8.txt" into the calling of the antiword command.
> So line 120 in Antiword.pm would become:
> my $cmd = "$antiword -m UTF-8.txt '$dir/$docname' > '$dir/$unpackfile'";
> to partner with line 200:
> Charset => "utf-8",
> Then the question is whether antiword always ships with UTF-8.txt,
> which I think it probably does. Choosing UTF-8 should be safe, as it
> covers pretty much any character and is the default for XML, modern
> Apache HTML, etc.
All done. It will be in the next release.
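
For anyone patching an older copy by hand, here is a minimal sketch of the pairing discussed above: the mapping file passed to antiword with -m and the Charset put in the attachment headers come from one lookup, so they cannot drift apart. This is not the code that went into Antiword.pm; the names %CHARSET_FOR_MAP and antiword_settings and the example paths are hypothetical, and the table only lists the two mapping files mentioned in this thread.

```perl
use strict;
use warnings;

# Hypothetical lookup: antiword mapping file => MIME charset to declare.
# Only the two mapping files named in this thread are listed.
my %CHARSET_FOR_MAP = (
    'UTF-8.txt'  => 'utf-8',
    '8859-1.txt' => 'iso-8859-1',
);

# Hypothetical helper: returns the shell command to run and the charset
# that the attachment headers must declare for that command's output.
sub antiword_settings {
    my ($antiword, $map, $dir, $docname, $unpackfile) = @_;

    # Refuse to guess: an unknown mapping file means an unknown charset.
    my $charset = $CHARSET_FOR_MAP{$map}
        or die "No known MIME charset for antiword mapping file '$map'\n";

    # Same shape as the command quoted above for line 120.
    my $cmd = "$antiword -m $map '$dir/$docname' > '$dir/$unpackfile'";

    return ($cmd, $charset);
}

# Example paths are assumptions, for illustration only.
my ($cmd, $charset) = antiword_settings(
    '/usr/bin/antiword', 'UTF-8.txt', '/tmp/work', 'report.doc', 'report.txt');

print "Command: $cmd\n";
print "Charset: $charset\n";
```

The returned $charset would then be the value given to Charset in the $parententity->attach call, in place of a separately hard-coded string.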
Julian Field MEng CITP CEng