Corrupt pdf files, any advice.

Julian Field mailscanner at ecs.soton.ac.uk
Tue Jul 29 12:31:14 IST 2003


Dean has kindly sent me the qf+df files from a message containing a PDF 
file that is corrupted. He has also sent me the original untouched PDF file 
to compare with the df file.

Well, whatever generated the original quoted-printable message
         X-Mailer: Internet Mail Service (5.5.2653.19)
did it wrong.

If you do an "od -c" on the test1.pdf file you get this:
0000000   %   P   D   F   -   1   .   2  \r   %   â   ã   Ï   Ó  \r  \n
0000020   6   3   2   6       0       o   b   j  \r   <   <      \r   /
Note the \r\n at the end of the first line, just before the 6326.

But if you do an "od -c" of the quoted-printable message contents (so you 
can see any embedded newline characters and so on), you get this:
0000000   %   P   D   F   -   1   .   2   =   0   D   %   =   E   2   =
0000020   E   3   =   C   F   =   D   3  \n   6   3   2   6       0
0000040   o   b   j   =   0   D   <   <       =   0   D   /
Now look at what has happened to the bytes just before the 6326. The \r\n 
pair has been squashed into a single \n character, thereby destroying the 
\r that was in the original.
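
The loss can be reproduced from the two dumps above. What follows is only a
minimal sketch in Python (chosen for illustration, it is not anything
MailScanner itself runs): it decodes the quoted-printable bytes from the
second dump with the standard quopri module and compares the result with the
bytes from the first dump. The variable names are made up for the example.

```python
import quopri

# First 32 bytes of the untouched test1.pdf, as shown by the first od dump.
original = b"%PDF-1.2\r%\xe2\xe3\xcf\xd3\r\n6326 0 obj\r<< \r/"

# The same region as it appears in the quoted-printable message body.
# The CR LF pair before "6326" has become a bare line break.
qp_body = b"%PDF-1.2=0D%=E2=E3=CF=D3\n6326 0 obj=0D<< =0D/"

decoded = quopri.decodestring(qp_body)

# Every =0D escape decodes back to \r, but the bare line break stays \n.
print(decoded == original)            # False
print(len(original) - len(decoded))   # 1 (the missing \r)
```

The decoder is doing its job correctly here. The \r was already gone before
the message was sent.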

I can only imagine that Outlook/Exchange saw the \r\n sequence near the 
start of the file, and concluded that it was a text-based file. It 
therefore saw nothing wrong in squashing \r\n into just \n, which would 
work fine on a text file. Unfortunately its original decision about the 
file was wrong in this case :-(
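
For comparison, here is a rough sketch of what a binary-safe quoted-printable
encoding of the same bytes looks like. It uses Python's binascii.b2a_qp with
istext=False, which escapes CR and LF as =0D and =0A so that no raw line
break carries data. This is only an illustration of the encoding rule. It is
not a claim about how any Microsoft product is implemented.

```python
import binascii

# First 32 bytes of test1.pdf, taken from the od dump of the original file.
original = b"%PDF-1.2\r%\xe2\xe3\xcf\xd3\r\n6326 0 obj\r<< \r/"

# istext=False tells the encoder the input is binary, so CR and LF
# are escaped instead of being treated as text line ends.
encoded = binascii.b2a_qp(original, istext=False)
restored = binascii.a2b_qp(encoded)

print(b"=0D=0A6326" in encoded)  # the CR LF pair is escaped, not collapsed
print(restored == original)      # the round trip gives back the same bytes
```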

This makes it
a) Microsoft's fault
and
b) Not a problem I can work around, as their software has destroyed data 
that I cannot reconstruct.
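
To see why the data cannot be reconstructed, consider this small sketch in
Python. The function squash_crlf is a hypothetical model of the faulty step
(the name and the function are mine, purely for illustration). Two different
inputs come out identical, so once the squashing has happened there is no way
to tell whether a given \n was originally \r\n or a lone \n.

```python
def squash_crlf(data: bytes) -> bytes:
    """Hypothetical model of the faulty step: treat CR LF as a text line end."""
    return data.replace(b"\r\n", b"\n")


# Two different binary inputs...
with_crlf = b"\xd3\r\n6326"
with_lf = b"\xd3\n6326"

# ...give the same output, so the receiver cannot tell which one was sent.
print(squash_crlf(with_crlf) == squash_crlf(with_lf))  # True
```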

Outlook XP always appears to use Base64, so I suspect the problem may just 
exist in Exchange 5.5 and/or Outlook 97. Don't know about Outlook 2000.
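
Base64 avoids the problem because the encoded text never contains the
original \r or \n bytes at all. A short Python sketch of the round trip,
using the standard base64 module, is below. It only shows that the encoding
itself is lossless and says nothing about what a particular Outlook version
actually does.

```python
import base64

# First 32 bytes of test1.pdf, taken from the od dump of the original file.
original = b"%PDF-1.2\r%\xe2\xe3\xcf\xd3\r\n6326 0 obj\r<< \r/"

encoded = base64.encodebytes(original)
restored = base64.decodebytes(encoded)

print(b"\r" in encoded)      # False: no raw CR survives into the encoded form
print(restored == original)  # True: every byte comes back, including \r\n
```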

Whether Acrobat Reader (on some platforms) will continue to be able to use 
the damaged file is another matter entirely, something over which I have no 
control.

All I can suggest is that you ask people using the troublesome versions to 
always zip their PDF files, to stop Outlook destroying them.
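
For anyone who wants to script the zipping, here is a minimal sketch using
Python's standard zipfile module. The file name test1.pdf and the sample
bytes are just the ones from the dumps above. The sketch only shows that the
archive returns the PDF bytes intact. That the mail client will then leave
the attachment alone is my expectation, not something this code checks.

```python
import io
import zipfile

# Sample bytes standing in for the real PDF contents.
pdf_bytes = b"%PDF-1.2\r%\xe2\xe3\xcf\xd3\r\n6326 0 obj\r<< \r/"

# Build the zip archive in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("test1.pdf", pdf_bytes)
zipped = buffer.getvalue()

# Read the PDF back out of the archive and compare.
with zipfile.ZipFile(io.BytesIO(zipped)) as archive:
    restored = archive.read("test1.pdf")

print(zipped[:2])             # b'PK', the zip signature
print(restored == pdf_bytes)  # True
```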

If anyone has any ideas about a software workaround I could implement, 
please let me know as I can't think of any way of doing it right now.
-- 
Julian Field
www.MailScanner.info
MailScanner thanks transtec Computers for their support



