More encoded subject woes

Julian Field MailScanner at ecs.soton.ac.uk
Wed May 24 14:01:01 IST 2006


On 24 May 2006, at 13:39, Nick Smith wrote:

> On 5/23/06, Nick Smith <nick.smith67 at googlemail.com> wrote:
>> On 5/20/06, Nick Smith <nick.smith67 at googlemail.com> wrote:
>> > On 5/19/06, Nick Smith <nick.smith67 at googlemail.com> wrote:
>> > > Hi,
>> > >
>> > > MS 4.54-2 / Postfix 2.10
>> > >
>> > > I've got more trouble with encoded subject headers being
>> > > "mishandled" from a recipient's point of view. The issue occurs
>> > > when, for whatever reason, MIME-Tools is unable to decode an
>> > > encoded subject properly - this example is UTF-8, but I don't
>> > > know if it may affect other encoding types too
>> > >
>> > > =?UTF-8?B?5oOF5aCx6YCj57Wh56WoIC0gVVNHcumVt+WQiOitsOW+heOBoSA=?==?UTF-8?B?LSDnrKzvvJTvvJjlm57lhajml6XmnKwgLSDlsZXnpLrkvJrjga7lh7rlsZU=?=
>> > >
>> > > If you feed that string to MIME::WordDecoder::unmime it returns:
>> > >
>> > > ????? - USGr????? - ??????? - ??????
>> > >
>> > > I have absolutely no idea why this happens - whether it's a bug
>> > > or expected behaviour on the part of MIME-Tools, but I assume
>> > > that each question mark represents a multi-byte (Japanese in
>> > > this case) character that it was not possible to decode
>> > >
>> > > Drop the same string into an Outlook message and send it via SMTP
>> > > (making sure that it bypasses MailScanner), and when it arrives
>> > > it should show a bunch of Japanese characters. The recipients
>> > > are understandably not happy that the subject of their email
>> > > when it shows up has been replaced by a bunch of question marks
>> > >
>> > > I've worked around this problem with a patch against Postfix.pm
>> > > (attached), but I'm less than comfortable with it. Basically
>> > > what it does is to unmime into a temporary holding string
>> > > instead of the $message structure and then take a look at the
>> > > results of its handiwork. If it sees more than an arbitrary
>> > > number of consecutive ?'s (I picked more than 3 as a reasonable
>> > > number), it assumes that the unmime was unsuccessful and allows
>> > > the original encoded subject to pass. Otherwise it assumes
>> > > decode success and fills the $message->{subject} structure with
>> > > the unmime result
>> > >
>> > > The first problem is that the ???? test is far from foolproof -
>> > > there's loads of scope for false +ves and false -ves. The second
>> > > problem is I'm not sure what issues this might cause if MS has to
>> > > alter the subject later. I'm not altering any subjects at all so
>> > > it wouldn't be a worry on my system but...
>> > >
>> > > Clearly I'm working with Postfix here, but this affects other
>> > > MTAs too. Equally clearly the proper answer is to figure out
>> > > what's up with MIME-Tools, but I'm afraid that's way beyond my
>> > > capabilities :(
>> > >
>> > > Thoughts appreciated
>> > >
>> > > Thanks
>> > >
>> > > Nick
>> > >
>> > >
>> > >
>> >
>> > Please ignore all of this - I think I've been fed old news by the
>> > group that reported this to me as an issue
>> >
>> > I'm pretty certain that their problem was actually the "Postfix
>> > truncates multi-line subject" thing that Julian already fixed for
>> > me, and that when they said they were still having the issue after
>> > re-testing they were mistaken
>> >
>> > I am working on the assumption that the ???? output from the unmime
>> > function is just an ASCII representation but it was plenty enough
>> > to confuse me :(
>> >
>> > Sorry for the false alarm
>> >
>> > Thanks
>> >
>> > Nick
>> >
>> Oh dear - it seems that maybe there is something in what I first
>> suggested. Please take a look at this UTF-8 encoded string from a
>> mail subject:
>>
>> Subject: =?UTF-8?B?NDXmmYLplpPmrovmpa3otoXpgY7nlLPoq4sg5om/6KqN5L6d6aC8IFtJ?=
>>  =?UTF-8?B?S0VEQSBZT0hFSSDmsaDnlLAg5rSL5bmzXSAg?=
>>
>> MIME-Tools doesn't seem able to decode this, and the original encoded
>> subject does get replaced by a bunch of ?'s (a single ? in place of
>> where each double byte Japanese character should be). Microsoft seems
>> to have no problem decoding this
>>
>> The thing I still don't get at all with MailScanner is under what
>> circumstances the original encoded format subject header gets
>> replaced by the unmimed version as part of onward delivery
>>
>> What I mean by this is that if a subject gets successfully unmimed
>> then it gets sent onwards in its original MIME form - if the unmime
>> is not successful however (as in this case) then the subject header
>> in the message itself gets physically replaced with the "broken"
>> ASCII representation where ?'s substitute for double byte characters
>>
>> I'd very much appreciate any insight into this problem - does the
>> unmime function have a return code that could be tested for success
>> before using its output for example?
>>
>> Unfortunately my previous strategy of testing for n successive ?'s
>> isn't going to work because I think all double-byte characters will
>> appear as a ? in the perl string test whether the decode was
>> successful or not. I also have not managed to figure out what
>> dependencies there are here that affect MailScanner's ability to do
>> a subject rewrite if it needs to insert a string of its own
>>
>> Thanks
>>
>> Nick
>>
> OK - I wonder what the record is for replying to your own posts on  
> this list :)
>
> ...anyway, I have finally figured out the exact cause of this so no
> more aimless rambling or speculation
>
> When decoded, the string
> "=?UTF-8?B?NDXmmYLplpPmrovmpa3otoXpgY7nlLPoq4sg5om/6KqN5L6d6aC8IFtJ?==?UTF-8?B?S0VEQSBZT0hFSSDmsaDnlLAg5rSL5bmzXSAg?="
> contains 2 trailing spaces. Not immediately obvious, but the
> SweepContent module does a bit of checking for evidence of malicious
> subjects, and attempts to clean up. This isn't configurable or
> optional in any way, it is just what MS does
>
> One of the things it does is to remove trailing whitespace. However,
> if the subject is MIME encoded, it can't act on the subject itself
> directly, and instead does its work on the decoded version as returned
> by the unmime function
>
> This is fine until the encoded string contains multibyte Unicode
> data, which of course cannot be represented in an ASCII string (which
> is why it was encoded to begin with). The unmime function uses a ? as
> a placeholder when it finds a multibyte character
>
> Provided that SweepContent doesn't find any "badness" in the decoded
> representation of the subject that it's looking at, MS will allow the
> *original* encoded subject to pass unmolested. However, if it decides
> any changes need to be made it completely replaces the original
> encoded subject header with the ("cleaned") decoded representation
>
> It may well be that this is considered unfortunate but unavoidable
> collateral damage by the MS team and that the fix is "don't do that"
> when it comes to putting spaces at the end of subjects that have to be
> encoded. However, I'm sure everybody would agree that it isn't easy
> sometimes to convince developers of applications that their code is
> "wrong", particularly when it involves a practice which is not
> actually forbidden as such and even more so when "it works fine with
> every other mail gateway"
>
> Anyway, for the meantime, I am doing this with Postfix.pm - which will
> allow MS to tolerate up to 2 trailing spaces (not tabs) if the subject
> has been encoded:
>
> -    $message->{subject} = MIME::WordDecoder::unmime($message->{subject});
> +    my $TmpSubject = ""; # Temp storage
> +    $TmpSubject = MIME::WordDecoder::unmime($message->{subject});
> +    # Compare as strings with ne; != would compare numerically
> +    if ($TmpSubject ne $message->{subject}) {
> +      # The unmime function did something - we must be dealing with
> +      # an encoded subject. Remove up to 2 trailing spaces if present
> +      # so that SweepContent cuts us a little slack. Total replacement
> +      # and hence probable destruction of unicode subjects for the
> +      # sake of one or two probably harmless trailing spaces is a
> +      # little harsh
> +      $TmpSubject =~ s/ {1,2}$//;
> +      $message->{subject} = $TmpSubject;
> +    }
>
> I'd be grateful if consideration could be given to this problem - my
> "fix" probably isn't the most elegant, but perhaps there's a smarter
> way round the issue
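
For anyone who wants to check the two trailing spaces Nick describes,
here is a minimal sketch. It decodes only the base64 payload of the
second encoded-word by hand with MIME::Base64 and Encode, so it does not
depend on what unmime does. The variable names are made up for
illustration, and the "ASCII view" is just a rough imitation of the ?
placeholders, not the actual unmime output.

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);
use Encode qw(decode);

# Base64 payload of the second encoded-word in the Subject: quoted above.
my $payload = 'S0VEQSBZT0hFSSDmsaDnlLAg5rSL5bmzXSAg';

# Decode base64 to bytes, then interpret those bytes as UTF-8 characters.
my $text = decode('UTF-8', decode_base64($payload));

# Count the trailing spaces that a whitespace cleanup would strip.
my ($trailing) = $text =~ /( *)$/;
my $trailing_count = length $trailing;

# Rough imitation of an ASCII-only rendering (an assumption, not the real
# unmime behaviour): one ? for every non-ASCII character.
(my $ascii_view = $text) =~ s/[^\x00-\x7F]/?/g;

print "trailing spaces: $trailing_count\n";
print "ASCII view: [$ascii_view]\n";
```

Run on its own, this should report 2 trailing spaces, and the bracketed
ASCII view makes them visible after the closing "]" of the name.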

Before I read your solution I was already thinking that if I allow 20
spaces on the end it would provide reasonable security against
malicious subjects but still allow trailing spaces on
possibly-malicious MIME-encoded Subject: lines.

So I would go for your suggestion, but how about we compromise on 10  
spaces instead of 2 or 20?
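
As a rough sketch of what that compromise could look like, assuming the
same approach as Nick's patch with the limit made a parameter: the
helper name relax_trailing_spaces and its arguments are hypothetical and
are not part of MailScanner, and the decoded string here is a stand-in
rather than real unmime output. Note the string comparison uses eq,
since == and != compare numerically in Perl.

```perl
use strict;
use warnings;

# Hypothetical helper, not MailScanner code. $raw is the Subject: value as
# received, $decoded is what unmime returned for it, and $max is the number
# of trailing spaces to tolerate on an encoded subject.
sub relax_trailing_spaces {
    my ($raw, $decoded, $max) = @_;

    # String comparison: if decoding changed nothing, the subject was not
    # encoded, so it is returned untouched for the normal checks.
    return $raw if $decoded eq $raw;

    # Encoded subject: drop at most $max trailing spaces (spaces, not tabs).
    $decoded =~ s/ {1,$max}$//;
    return $decoded;
}

my $raw     = '=?UTF-8?B?S0VEQSBZT0hFSSDmsaDnlLAg5rSL5bmzXSAg?=';
my $decoded = 'KEDA YOHEI ?? ??]  ';    # stand-in for the unmime result
my $cleaned = relax_trailing_spaces($raw, $decoded, 10);
print "[$cleaned]\n";
```

With a limit of 10, a subject ending in 12 spaces would still be left
with 2 of them, so the later trailing-whitespace check would still see
it, while the one or two spaces in Nick's example are removed quietly.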

There are many things like this where I have to apply as strict  
security as I can get while not breaking reasonable use of things  
like Subject: lines. It's a judgement call as to where to draw the line.

I always err on the cautious side, as it is much better to slacken it  
off a little bit for some specific problem later, than it is to get a  
security vulnerability into the code that can actually be exploited.  
I believe firmly in "defence in depth" and so every bit of  
MailScanner is written looking from a hacker's point of view, so that  
you never actually create an exploitable vulnerability as there are  
so many layers the hacker would have to get through.

-- 
Julian Field
www.MailScanner.info
Buy the MailScanner book at www.MailScanner.info/store
PGP footprint: EE81 D763 3DB0 0BFD E1DC 7222 11F6 5947 1415 B654

