html2text output not really clean?

Julian Field mailscanner at ecs.soton.ac.uk
Fri Nov 15 09:16:19 GMT 2002


Can you send me one of the messages to experiment with, please?
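
A saved copy of the HTML part can also be checked outside MailScanner. The following is a minimal Python sketch, not MailScanner's own conversion code (MailScanner is written in Perl), and the names TextOnly and html_to_text are hypothetical. It assumes the "@font-face" rubble is the content of <style> elements being emitted as text, and it drops that content along with Word's <![if ...]> conditional markers while decoding entities such as &#8230;.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect visible text, skipping <style> and <script> content."""

    SKIP = {"style", "script"}

    def __init__(self):
        # convert_charrefs=True decodes entities such as &#8230; for us.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Declarations like <![if ...]> never reach handle_data,
        # so only real text outside skipped elements is kept.
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the text of an HTML string with whitespace collapsed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


if __name__ == "__main__":
    sample = (
        "<html><head><style>@font-face { font-family: Tahoma; }</style>"
        "</head><body><p>Hello<![if !supportEmptyParas]><![endif]>"
        " world&#8230;</p></body></html>"
    )
    print(html_to_text(sample))
```

Comparing the length of this output with the size of the delivered message gives a rough idea of how much of the 13 Kb is text and how much is leftover markup.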

At 08:59 15/11/2002, you wrote:
>Hi!
>
>I am using MailScanner 4.05-3 and have a mobile user collecting mail on
>his laptop. I want to use the html2text feature to prevent expensive
>phone calls to collect e-mail in HTML format that keep the connection open
>for hours. MS is running on a RedHat 7.3 box.
>
>I have this line in my /etc/MailScanner/MailScanner.conf :
>Convert HTML To Text = /etc/MailScanner/rules/html2text.rules
>
>The html2text.rules contains:
>To              r.barendse at somedomain.com       yes
>To              remco at somedomain.com            yes
>Fromorto        default                         no
>
>The output in maillog seems correct:
>Nov 15 09:44:19 linuxgw MailScanner[7367]: Content Checks: Need to convert
>HTML to plain text in 1 messages
>Nov 15 09:44:20 linuxgw MailScanner[7367]: Content Checks: Detected and
>will convert HTML message to plain text in gAF8iAN07366
>
>When I start pine and look in the inbox, I still see that small messages
>are huge in size (13-40 Kb). The top of the e-mail contains stuff like :
>@font-face { font-family: Tahoma; } @font-face { font-family: Verdana; }
>@page Section1 {size:595.35pt 842.0pt; margin: 26.95pt 70.9pt 1.0in
>70.9pt; mso-header-margin:
>
>and similar rubble throughout the e-mail :
>….Whaaat ??
><![if !supportEmptyParas]><![endif]>
>You gotta be kidding me&#8230;.?!
>
>Now if I retrieve the contents of the mailbox using Outlook Express, the
>e-mail *appears* to be stripped of HTML rubble because the formatting has
>changed (colors and font sizes are different). The size of the e-mail is
>only slightly reduced (the original HTML mail was 21 Kb, the end result is
>13 Kb, still too much for only 80 lines of text).
>
>Why is there still all this font and other rubble in the e-mails and how
>can I strip them completely?
>
>Thanks!!
>
>Remco
>
>
>
>--
>This message has been scanned for viruses and
>dangerous content by MailScanner, and is
>believed to be clean.

--
Julian Field                Teaching Systems Manager
jkf at ecs.soton.ac.uk         Dept. of Electronics & Computer Science
Tel. 023 8059 2817          University of Southampton
                             Southampton SO17 1BJ


