More encoded subject woes

Tue May 23 09:31:47 IST 2006

On 5/20/06, Nick Smith <nick.smith67 at googlemail.com> wrote:
> On 5/19/06, Nick Smith <nick.smith67 at googlemail.com> wrote:
> > Hi,
> >
> > MS 4.54-2 / Postfix 2.10
> >
> > I've got more trouble with encoded subject headers being "mishandled"
> > from a recipient's point of view. The issue occurs when, for whatever
> > reason, MIME-Tools is unable to decode an encoded subject properly -
> > this example is UTF-8, but I don't know whether it affects other
> > encoding types too.
> >
> > =?UTF-8?B?5oOF5aCx6YCj57Wh56WoIC0gVVNHcumVt+WQiOitsOW+heOBoSA=?==?
> > UTF-8?B?LSDnrKzvvJTvvJjlm57lhajml6XmnKwgLSDlsZXnpLrkvJrjga7lh7rlsZU=?=
> >
> > If you feed that string to MIME::WordDecoder::unmime it returns:
> >
> > ????? - USGr????? - ??????? - ??????
> >
> > I have absolutely no idea why this happens, or whether it's a bug or
> > expected behaviour on the part of MIME-Tools, but I assume that each
> > question mark represents a multi-byte (Japanese in this case)
> > character that could not be decoded.
> >
> > Drop the same string into an Outlook message and send it via SMTP
> > (making sure that it bypasses MailScanner), and when it arrives it
> > should show a bunch of Japanese characters. The recipients are
> > understandably not happy that the subject of their email when it shows
> > up has been replaced by a bunch of question marks
> >
> > I've worked around this problem with a patch against Postfix.pm
> > (attached), but I'm less than comfortable with it. Basically what it
> > does is to unmime into a temporary holding string instead of the
> > $message structure and then take a look at the results of its
> > handiwork. If it sees more than an arbitrary number of consecutive ?'s
> > (I picked more than 3 as a reasonable number), it assumes that the
> > unmime was unsuccessful and allows the original encoded subject to
> > pass. Otherwise it assumes decode success and fills the
> > message->{subject} structure with the unmime result
> >
> > The first problem is that the ???? test is far from foolproof -
> > there's loads of scope for false positives and false negatives. The
> > second problem is I'm not sure what issues this might cause if MS has
> > to alter the subject later. I'm not altering any subjects at all so it
> > wouldn't be a worry on my system but...
> >
> > Clearly I'm working with Postfix here, but this affects other MTAs
> > too. Equally clearly the proper answer is to figure out what's up with
> > MIME-Tools, but I'm afraid that's way beyond my capabilities :(
> >
> > Thoughts appreciated
> >
> > Thanks
> >
> > Nick
> >
> >
> >
>
> Please ignore all of this - I think I've been fed old news by the
> group that reported this to me as an issue
>
> I'm pretty certain that their problem was actually the "Postfix
> truncates multi-line subject" thing that Julian already fixed for me,
> and that when they said they were still having the issue after
> re-testing they were mistaken
>
> I am working on the assumption that the ???? output from the unmime
> function is just an ASCII representation but it was plenty enough to
> confuse me :(
>
> Sorry for the false alarm
>
> Thanks
>
> Nick
>
Oh dear - it seems that maybe there is something in what I first
suggested. Please take a look at this UTF-8 encoded string from a mail
subject:

Subject: =?UTF-8?B?NDXmmYLplpPmrovmpa3otoXpgY7nlLPoq4sg5om/6KqN5L6d6aC8IFtJ?=
 =?UTF-8?B?S0VEQSBZT0hFSSDmsaDnlLAg5rSL5bmzXSAg?=

MIME-Tools doesn't seem able to decode this, and the original encoded
subject does get replaced by a bunch of ?'s (a single ? in place of
each multi-byte Japanese character). Microsoft seems to have no
problem decoding this.
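
As a cross-check outside Perl, here is a rough sketch using Python's
standard email.header module. It is not what MailScanner or MIME-Tools
does internally; it only shows that the header above can be decoded as
strict UTF-8, and that forcing the decoded text into ASCII yields one ?
per non-ASCII character, which matches the pattern described. The names
RAW_SUBJECT and decode_subject are made up for this illustration.

```python
from email.header import decode_header

# The folded Subject value quoted above (continuation line kept as-is).
RAW_SUBJECT = (
    "=?UTF-8?B?NDXmmYLplpPmrovmpa3otoXpgY7nlLPoq4sg5om/6KqN5L6d6aC8IFtJ?=\n"
    " =?UTF-8?B?S0VEQSBZT0hFSSDmsaDnlLAg5rSL5bmzXSAg?="
)


def decode_subject(raw):
    """Decode RFC 2047 encoded-words strictly; raises on undecodable bytes."""
    pieces = []
    for data, charset in decode_header(raw):
        # decode_header returns bytes for encoded parts, str for plain headers.
        if isinstance(data, bytes):
            data = data.decode(charset or "ascii", errors="strict")
        pieces.append(data)
    return "".join(pieces)


subject = decode_subject(RAW_SUBJECT)

# Squeezing the decoded text into ASCII replaces each non-ASCII
# character with a single "?", whatever its UTF-8 byte length was.
ascii_view = subject.encode("ascii", errors="replace").decode("ascii")
print(ascii_view)
```

If this holds, a string of ?'s on its own does not prove the decode
failed; it may only mean the result was rendered in a charset that
cannot represent the characters.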

The thing I still don't get at all with MailScanner is under what
circumstances the original encoded subject header gets replaced by the
unmimed version as part of onward delivery.

What I mean by this is that if a subject gets successfully unmimed
then it gets sent onwards in its original MIME form - if the unmime is
not successful however (as in this case) then the subject header in
the message itself gets physically replaced with the "broken" ASCII
representation where ?'s substitute for multi-byte characters.

I'd very much appreciate any insight into this problem - does the
unmime function have a return code that could be tested for success
before using its output for example?
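
To illustrate the kind of check being asked for, here is a minimal
Python sketch of "decode, and keep the original header if decoding
fails". It is an assumption about one possible approach, not the
MIME::WordDecoder API, and try_decode_subject is a hypothetical helper
name. Success is judged by whether a strict decode raises, rather than
by counting ?'s in the output.

```python
from email.errors import HeaderParseError
from email.header import decode_header


def try_decode_subject(raw):
    """Return (text, ok). When ok is False, text is the untouched raw header."""
    try:
        pieces = []
        for data, charset in decode_header(raw):
            if isinstance(data, bytes):
                # Strict decoding raises instead of substituting characters.
                data = data.decode(charset or "ascii", errors="strict")
            pieces.append(data)
    except (HeaderParseError, LookupError, UnicodeDecodeError):
        # Malformed base64, unknown charset, or invalid bytes:
        # pass the original encoded header through unchanged.
        return raw, False
    return "".join(pieces), True


good, good_ok = try_decode_subject("=?UTF-8?B?5pmC6ZaT?=")
bad, bad_ok = try_decode_subject("=?UTF-8?B?/w==?=")  # 0xFF is not valid UTF-8
```

With a check of this shape, the failing case keeps the sender's
encoded subject intact instead of replacing it with a lossy rendering.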

Unfortunately my previous strategy of testing for n successive ?'s
isn't going to work, because I think all multi-byte characters will
appear as a ? in the Perl string test whether the decode was successful
or not. I also have not managed to figure out what dependencies there
are here that affect MailScanner's ability to do a subject rewrite if
it needs to insert a string of its own.

Thanks

Nick