Issue with "Add Text Of Doc" feature

Julian Field MailScanner at ecs.soton.ac.uk
Fri May 15 15:08:36 IST 2009



On 15/05/2009 14:51, Anthony Cartmell wrote:
>> You can make it always generate utf-8 by editing Antiword.pm and 
>> changing the text around line 200 to say this:
>>    $parententity->attach(     Type => "text/plain",
>>                               Charset => "utf-8",
>>                               Encoding => "8bit",
>>                               Disposition => "attachment",
>>                               Filename => $attachfile,
>>                               Path => "$dir/$unpackfile");
>>
>> (note the "Charset" setting). Then just edit the setting in 
>> MailScanner.conf to say
>> Antiword = /usr/bin/antiword -f -m UTF-8.txt
>>
>> and that's all you need to do. If you find this works okay, that's 
>> what will go in the next release.
>
> That would work, but the problem is that the "Charset" setting in 
> $parententity->attach has to match the charset output from Antiword, 
> set with the -m flag or defaulted from the LANG environment variables.
>
> If someone set
>
> Antiword = /usr/bin/antiword -f -m 8859-1.txt
>
> (or if they set
> Antiword = /usr/bin/antiword -f
> and had iso-8859-1 as their default character set)
>
> then the attachment headers would be specifying the wrong character 
> set, resulting in corrupted text display.
>
> So if you hard-code the "utf-8" into the attachment headers in 
> Antiword.pm, then you should probably also hard-code the "-m 
> UTF-8.txt" into the calling of the antiword command.
>
> So line 120 in Antiword.pm would become:
>
>  my $cmd = "$antiword -m UTF-8.txt '$dir/$docname' > '$dir/$unpackfile'";
>
> to partner with line 200:
>
>  Charset => "utf-8",
>
> Then the question is whether antiword always comes with UTF-8.txt, 
> which I think it probably does. Choosing UTF-8 should be safe, as it 
> covers pretty much any character and is the default for XML, modern 
> Apache HTML, etc.
All done. It will be in the next release.
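
For anyone who would rather keep the -m flag configurable than hard-code
it, the constraint described above (the header Charset must match the
antiword mapping file) could be sketched roughly as follows. This is only
an illustration: charset_for_antiword is a hypothetical helper, not
anything in MailScanner or Antiword.pm, and it only recognises the
UTF-8.txt and 8859-N.txt mapping file names mentioned in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MailScanner): given the configured
# antiword command line, return the MIME charset that matches the -m
# mapping file, or undef when it cannot be determined.
sub charset_for_antiword {
    my ($cmdline) = @_;

    # Without -m, antiword picks a mapping from the locale (LANG etc.),
    # so the charset cannot be read off the command line.
    return unless $cmdline =~ /(?:^|\s)-m\s+(\S+?)(?:\.txt)?(?:\s|$)/;
    my $map = $1;

    return 'utf-8'    if $map =~ /^UTF-8$/i;
    return "iso-$map" if $map =~ /^8859-\d+$/;

    # Other mapping files are deliberately not handled in this sketch.
    return;
}

for my $cmd ('/usr/bin/antiword -f -m UTF-8.txt',
             '/usr/bin/antiword -f -m 8859-1.txt',
             '/usr/bin/antiword -f') {
    my $charset = charset_for_antiword($cmd);
    printf "%-40s => %s\n", $cmd, defined $charset ? $charset : 'unknown';
}
```

When the helper returns undef there is no safe value to put in the
Charset header, which is the argument for hard-coding both "-m UTF-8.txt"
and "utf-8" together as described above.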

Jules

-- 
Julian Field MEng CITP CEng
www.MailScanner.info
