PDF Woes --- Solved?
Julian Field
mailscanner at ecs.soton.ac.uk
Fri May 28 20:33:35 IST 2004
If a few people could try this from PC and Unix platforms over the weekend,
it would be really useful. Lacking any other indication, I will include it
in the 1st June release on the assumption that it won't cause trouble to
Unix and Mac users.
At 16:32 28/05/2004, you wrote:
>Here is a patch for Message.pm. It has taken me all day to add one line of
>code to a function :-(
>
>-----SNIP-----
>--- Message.pm.old 2004-05-22 13:30:15.000000000 +0100
>+++ Message.pm 2004-05-28 16:31:38.000000000 +0100
>@@ -4114,5 +4114,37 @@
> $top_ent;
> }
>
>+#
>+# Overload the MIME quoted-printable decoder.
>+# This version will make lines that end in \n now end in \r\n.
>+# This hopefully fixes problems with PDF files as they are now extracted
>+# correctly.
>+#
>+package MIME::QuotedPrint;
>+
>+sub decode_qp ($)
>+{
>+ my $res = shift;
>+ $res =~ s/[ \t]+?(\r?\n)/$1/g; # rule #3 (trailing space must be deleted)
>+ $res =~ s/=\r?\n//g; # rule #5 (soft line breaks)
>+ $res =~ s/([^\r])\n\Z/$1\r\n/; # JKF rule to replace trailing \n with \r\n
>+ if (ord('A') == 193) { # EBCDIC style machine
>+ if (ord('[') == 173) {
>+ $res =~ s/=([\da-fA-F]{2})/Encode::encode('cp1047',Encode::decode('iso-8859-1',pack("C", hex($1))))/ge;
>+ }
>+ elsif (ord('[') == 187) {
>+ $res =~ s/=([\da-fA-F]{2})/Encode::encode('posix-bc',Encode::decode('iso-8859-1',pack("C", hex($1))))/ge;
>+ }
>+ elsif (ord('[') == 186) {
>+ $res =~ s/=([\da-fA-F]{2})/Encode::encode('cp37',Encode::decode('iso-8859-1',pack("C", hex($1))))/ge;
>+ }
>+ }
>+ else { # ASCII style machine
>+ $res =~ s/=([\da-fA-F]{2})/pack("C", hex($1))/ge;
>+ }
>+ $res;
>+}
>+
>+
> 1;
>-----SNIP-----
>
>Please can you give that a go, both on PDF files and non-PDF files, and
>let me know if it works.
>
>My tests (only with PDF files) have appeared to work fine. I have "Sign
>Clean Messages = yes" and diff reports no changes between the input and
>output files.
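
For anyone testing, the effect of the added line can be checked in isolation. The following is a minimal sketch, not part of the patch: decode_qp_crlf is a made-up name for a stand-alone copy of the ASCII branch, and the first sample input is taken from the quoted-printable dump quoted further down this thread. It assumes the decoder is handed one line at a time, because the new rule only touches a newline at the very end of the string it is given.

```perl
use strict;
use warnings;

# Hypothetical stand-alone copy of the ASCII branch of the patched
# decode_qp; the name decode_qp_crlf is made up for this sketch.
sub decode_qp_crlf {
    my $res = shift;
    $res =~ s/[ \t]+?(\r?\n)/$1/g;                     # trailing whitespace is deleted
    $res =~ s/=\r?\n//g;                               # soft line breaks are removed
    $res =~ s/([^\r])\n\Z/$1\r\n/;                     # bare trailing \n becomes \r\n
    $res =~ s/=([\da-fA-F]{2})/pack("C", hex($1))/ge;  # =XX becomes one byte
    return $res;
}

# One line ending in a hard break and one ending in a soft break.
my $hard = decode_qp_crlf("%PDF-1.2=0D%=E2=E3=CF=D3\n");
my $soft = decode_qp_crlf("6326 0 obj=\n");

printf "hard break ends in CRLF: %s\n", ($hard =~ /\r\n\z/ ? "yes" : "no");
printf "soft break adds nothing: %s\n", ($soft eq "6326 0 obj" ? "yes" : "no");
```

Two things worth checking when you test: a line that already ends in \r\n is left alone, and a line consisting of nothing but \n is also left alone, because the rule needs a non-\r character in front of the newline.
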
>
>Jules.
>
>At 10:06 28/05/2004, you wrote:
>>This is totally repeatable using mutt from the command line. The receiving
>>client doesn't matter: when run from the command line, mutt always sends
>>PDFs as quoted-printable. If you want a PDF that is guaranteed to break I
>>can supply one, though it is 2 MBytes...
>>
>>-----Original Message-----
>>From: MailScanner mailing list [mailto:MAILSCANNER at JISCMAIL.AC.UK] On
>>Behalf Of Julian Field
>>Sent: 27 May 2004 16:49
>>To: MAILSCANNER at JISCMAIL.AC.UK
>>Subject: Re: PDF Woes
>>
>>At 09:22 27/05/2004, you wrote:
>> >Karl Bailey wrote:
>> > > Guys,
>> > >
>> > > I'm having a very frustrating problem. We run a production process
>> > > that uses mutt to mail PDF's to customers. Now I know mutt has some
>> > > known issues with PDF's, but, the problems introduced are compounded
>> > > by adding a signature to the email after scanning using MailScanner.
>> > > The footer seems to cause the PDF to corrupt to the point it is
>> > > unusable in SOME CASES. I know this is to do with the fact that mutt
>> > > uses quoted-printable content transfer encoding cos if I use mutt
>> > > interactively & force the encoding type to base64 then everything
>> > > works.. attach from the command line & it all corrupts.
>> > >
>> >
>> >Below is information Julian posted after I found out our pdf's were
>> >getting mangled after passing through MailScanner. This problem is a
>> >quoted-printable/signing messages problem. In our case MS Exchange
>> >incorrectly decides to encode some binary pdf's as quoted-printable,
>> >which in turn is corrupted when MailScanner signs them. Base64 always
>> >passes through correctly. We took the view of always zipping up pdf's
>> >which gets around the problem. Another thing to note is that I found
>> >pdf's created in different software are treated differently when being
>> >encoded in MS Exchange, so it seems that the pdf file version is also
>> >taken into consideration when the message is created.
>> >
>> >Hope this helps
>> >
>> >Dean Plant
>> >
>> >Previous post from Julian.
>> >
>> >Dean has kindly sent me the qf+df files from a message containing a PDF
>> >file that is corrupted. He has also sent me the original untouched PDF
>> >file to compare with the df file.
>> >
>> >Well, whatever generated the original quoted-printable message
>> > X-Mailer: Internet Mail Service (5.5.2653.19) did it wrong.
>> >
>> >If you do an "od -c" on the test1.pdf file you get this:
>> >0000000   %   P   D   F   -   1   .   2  \r   %   â   ã   Ï   Ó  \r  \n
>> >0000020   6   3   2   6       0       o   b   j  \r   <   <      \r   /
>> >Note the \r\n at the end of the first line, just before the 6326.
>> >
>> >but if you do an "od -c" of the quoted-printable message contents (so
>> >you can see any embedded newline characters and so on), you get this:
>> >0000000   %   P   D   F   -   1   .   2   =   0   D   %   =   E   2   =
>> >0000020   E   3   =   C   F   =   D   3  \n   6   3   2   6       0
>> >0000040   o   b   j   =   0   D   <   <       =   0   D   /
>> >Now look what has happened to the data just before the 6326. It has
>> >been squashed into 1 \n character, thereby destroying the \r in the
>> >original.
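
The difference between the two dumps can be reproduced with the stock decoder. This is a minimal sketch using MIME::QuotedPrint from the Perl core. Both input strings are assumptions typed in by hand from the dumps above: the first follows the message as it was sent, the second is what a binary-safe encoder would presumably have written, with the line ending spelled out as =0D=0A.

```perl
use strict;
use warnings;
use MIME::QuotedPrint qw(decode_qp);

# What the message actually contained: the CRLF after =D3 has become a
# literal line break, which decodes to a bare \n.
my $as_sent = "%PDF-1.2=0D%=E2=E3=CF=D3\n6326 0 obj=0D";

# Assumed binary-safe encoding of the same bytes: the CRLF is written
# as =0D=0A, so both bytes survive decoding.
my $binary_safe = "%PDF-1.2=0D%=E2=E3=CF=D3=0D=0A6326 0 obj=0D";

my $got  = decode_qp($as_sent);
my $want = decode_qp($binary_safe);

printf "as sent:     %d bytes\n", length $got;
printf "binary safe: %d bytes\n", length $want;
printf "identical:   %s\n", ($got eq $want ? "yes" : "no");
```

The decoded data comes out one byte short for every line ending treated this way. Since a PDF locates its objects by byte offset, that alone may be enough to upset a reader, quite apart from any damage inside binary streams.
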
>> >
>> >I can only imagine that Outlook/Exchange saw the \r\n sequence near the
>> >start of the file, and concluded that it was a text-based file. It
>> >therefore saw nothing wrong in squashing \r\n into just \n, which would
>> >work fine on a text file. Unfortunately its original decision about the
>> >file was wrong in this case :-(
>> >
>> >This makes it
>> >a) Microsoft's fault
>> >and
>> >b) Not a problem I can work around, as their software has destroyed
>> >data that I cannot reconstruct.
>> >
>> >Outlook XP always appears to use Base64, so I suspect the problem may
>> >just exist in Exchange 5.5 and/or Outlook 97. Don't know about Outlook
>> >2000.
>> >
>> >Whether Acrobat Reader (on some platforms) will continue to be able to
>> >use the damaged file is another matter entirely, something over which I
>> >have no control.
>> >
>> >All I can suggest is you request people using the particular
>> >troublesome versions always zip their PDF files to stop Outlook
>> >destroying them.
>> >
>> >If anyone has any ideas about a software workaround I could implement,
>> >please let me know as I can't think of any way of doing it right now.
>>
>>I have just tried it with a new PDF file from Acrobat 6, sent using
>>Outlook 2003, and it sent it as Base64 so I can't even investigate the
>>problem any more :-( And the PDF file I was using before (which Outlook
>>2003 sent as quoted-printable) turns out to be broken from the start, so
>>I couldn't get any version to work.
>>I need a PDF file which was generated with Acrobat 5 which Outlook 2003
>>will send as quoted-printable. Then I stand a chance of being able to
>>test it.
>>
>>One thought I had was to traverse the MIME tree looking for
>>quoted-printable sections and change them to Base64 (or even just do it
>>to PDF attachments). Doing it to everything would make the message bigger
>>and is probably unnecessary; it's just PDF which is the problem.
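
As a rough illustration of that idea, here is a minimal sketch that walks a message tree and re-encodes quoted-printable PDF parts as Base64. It deliberately does not use the MIME-tools API: the tree is a hypothetical nested hash (type, encoding, body, parts), qp_pdf_to_base64 is a made-up name, and only the core MIME::QuotedPrint and MIME::Base64 modules are used. It cannot bring back a \r that the sender has already thrown away; it only shows how a part could be moved to an encoding that has no line-ending rules of its own.

```perl
use strict;
use warnings;
use MIME::QuotedPrint qw(decode_qp);
use MIME::Base64 qw(encode_base64);

# Recursively visit a hypothetical message tree; each node is a hash
# with 'type', 'encoding', 'body' and optional 'parts'.
sub qp_pdf_to_base64 {
    my ($node) = @_;
    if (   lc($node->{type}     || '') eq 'application/pdf'
        && lc($node->{encoding} || '') eq 'quoted-printable') {
        # Decode to raw bytes, then re-encode the same bytes as Base64.
        $node->{body}     = encode_base64(decode_qp($node->{body}));
        $node->{encoding} = 'base64';
    }
    qp_pdf_to_base64($_) for @{ $node->{parts} || [] };
    return $node;
}

my $msg = {
    type  => 'multipart/mixed',
    parts => [
        { type => 'text/plain', encoding => 'quoted-printable',
          body => "Caf=E9\n" },
        { type => 'application/pdf', encoding => 'quoted-printable',
          body => "%PDF-1.2=0D%=E2=E3=CF=D3=0D=0A" },
    ],
};

qp_pdf_to_base64($msg);
print "$_->{type}: $_->{encoding}\n" for @{ $msg->{parts} };
```

Only the PDF part is touched; the text part keeps its quoted-printable encoding, which avoids growing every message for no benefit.
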
>>--
>>Julian Field
>>www.MailScanner.info
>>MailScanner thanks transtec Computers for their support
>>
>>PGP footprint: EE81 D763 3DB0 0BFD E1DC 7222 11F6 5947 1415 B654
>>
>>-------------------------- MailScanner list ----------------------
>>To leave, send leave mailscanner to jiscmail at jiscmail.ac.uk
>>Before posting, please see the Most Asked Questions at
>>http://www.mailscanner.biz/maq/ and the archives at
>>http://www.jiscmail.ac.uk/lists/mailscanner.html
>
--
Julian Field
www.MailScanner.info
Professional Support Services at www.MailScanner.biz
MailScanner thanks transtec Computers for their support
PGP footprint: EE81 D763 3DB0 0BFD E1DC 7222 11F6 5947 1415 B654