running html2text but still the e-mails are not completely clean?

Julian Field mailscanner at ecs.soton.ac.uk
Fri Jan 10 16:38:15 GMT 2003


At 16:26 10/01/2003, you wrote:
>I am trying out the html2text feature.
>
>When I look through a mail box I can see that not all html crap is
>removed. The filtered e-mails are about half the size they were before they went
>through the html2text filter, but there is still loads of crap visible
>when looking at these mails in pine.
>
>This problem mostly seems to occur when the sender is using M$ Word as
>their e-mail editor for Outlook, the rest is filtered out pretty nicely.
>
>In pine loads of this chatter is visible:
>@font-face { font-family: MS Mincho; } @font-face { font-family: @MS
>Mincho; } @page Section1
>{size: 595.35pt 842.0pt; margin: 26.95pt 70.9pt 1.0in 70.9pt;
>mso-header-margin: .5in;
>mso-footer-margin: .5in; mso-paper-source: 0; } P.MsoNormal { FONT-SIZE:
>12pt; MARGIN: 0in 0in 0pt;
>FONT-FAMILY: Arial; mso-style-parent: ""; mso-pagination: widow-orphan;
>mso-fareast-font-family:
>"MS Mincho"; mso-bidi-font-family: "Times New Roman"; mso-ansi-language:
>NL; mso-fareast-language:
>JA; mso-bidi-font-weight: bold } LI.MsoNormal { FONT-SIZE: 12pt; MARGIN:
>0in 0in 0pt; FONT-FAMILY:
>Arial; mso-style-parent: ""; mso-pagination: widow-orphan;
>mso-fareast-font-family: "MS Mincho";
>mso-bidi-font-family: "Times New Roman"; mso-ansi-language: NL;
>mso-fareast-language: JA;
>
>Is this a bug in the filter?

It appears to be a problem with HTML-Parser not coping with the HTML
produced by some versions of MS Word. Version 3.26 of HTML-Parser is the
latest, and it is what I distribute. Unfortunately, I'm not sure there is
much I can do about this right away. I have just tried it with Office XP,
and the chatter you quote above doesn't appear in the HTML file at all.
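The quoted chatter is CSS (@font-face, @page and mso-* style rules) that MS Word places inside a <style> element, often wrapped in an HTML comment. A converter that emits every text node will print it unless it deliberately skips <style> and <script> content. Below is a minimal sketch of that idea, assuming Python's standard html.parser as a stand-in; it is not MailScanner's html2text or the Perl HTML-Parser module, and the names TextExtractor and html_to_text are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical extractor: keeps visible text, drops CSS/JS and comments."""

    SKIP_TAGS = {"style", "script"}           # contents never shown to a reader
    BREAK_TAGS = {"br", "p", "div", "li", "tr"}  # start a new output line

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &nbsp; etc. into characters
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BREAK_TAGS:
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Text inside <style>/<script> (e.g. Word's @font-face rules) is ignored.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def handle_comment(self, data):
        pass  # <!-- ... --> comments are dropped entirely

    def text(self):
        raw = "".join(self._chunks)
        # Collapse runs of whitespace on each line and drop empty lines.
        lines = (" ".join(line.split()) for line in raw.splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

For example, feeding it a Word-style message whose <head> holds a commented-out <style> block returns only the paragraph text, without any of the font or page rules.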
--
Julian Field
www.MailScanner.info
MailScanner thanks transtec Computers for their support


.



This footnote also confirms that this email message has been swept by MIMEsweeper for the presence of computer viruses
***********************************************************************************