html2text output not really clean?

Julian Field mailscanner at ecs.soton.ac.uk
Fri Nov 15 14:36:25 GMT 2002


At 09:48 15/11/2002, you wrote:
>Ok, I will bounce you two messages, can I send them to a non-public
>address?

Even better, can you put the queue files into a password-protected zip and
mail that to me please?

>One thing that might matter : the e-mail I will send will
>be generated by M$ Word as e-mail editor. I have noticed that the html output
>of Outlook itself is a lot cleaner and doesn't contain anywhere near the
>amount of rubble that Word throws in....
>Maybe it is only the Word specific rubble that isn't cleaned?

Maybe so.
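
If it does turn out to be the Word-specific markup, a pre-filter along these lines might help to test the theory. This is only a rough sketch, not what MailScanner or its HTML converter actually does: the function name strip_word_rubble is made up for illustration, and the regular expressions are guesses based on the fragments you quoted (the @font-face/@page rules and the <![if ...]> markers).

```python
import html
import re


def strip_word_rubble(markup):
    """Hypothetical helper: remove Word-style markup and return plain text."""
    # Drop <style> blocks, which carry the @font-face and @page rules.
    markup = re.sub(r"(?is)<style\b.*?</style>", "", markup)
    # Drop hidden conditional comments together with their content.
    markup = re.sub(r"(?is)<!--\[if\b.*?<!\[endif\]-->", "", markup)
    # Drop bare markers such as <![if !supportEmptyParas]> and <![endif]>.
    markup = re.sub(r"(?i)<!\[(?:if\b[^\]]*|endif)\]>", "", markup)
    # Drop whatever tags remain, then decode entities such as &#8230;.
    text = re.sub(r"(?s)<[^>]+>", "", markup)
    return html.unescape(text)


sample = (
    "<style>@font-face { font-family: Tahoma; }</style>"
    "<p>Hello<![if !supportEmptyParas]>&nbsp;<![endif]></p>"
    "<p>Wait&#8230;</p>"
)
cleaned = strip_word_rubble(sample)
```

Regular expressions are a blunt tool for HTML, so treat this as a way to check whether the leftover text comes from the style block and the conditional markers, not as a fix.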

>Also I have found something else. If html2text is enabled for one specific
>user (smith at somedomain.com) but the message is cc'ed to another user
>on the same domain/box (chris at somedomain.com) then both users will get the
>message in `plain' text. This is logical because the one df/qf message is
>converted but may be undesirable. Maybe a thing to add at the bottom of
>the todo list if it's possible at all?

I have never done any splitting of messages. What applies to 1 recipient
applies to all recipients. I'm unwilling to change that unless there is a
very good reason to.
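
To illustrate why both recipients end up with the converted text, here is a minimal sketch of a per-message decision. It is not MailScanner's code: the RULES list is a hand-written stand-in for the html2text.rules file quoted below, the function names are invented, and the "yes if any recipient matches yes" combination is an assumption that merely reproduces the behaviour you describe.

```python
from fnmatch import fnmatch

# Hypothetical in-memory form of the quoted html2text.rules file.
RULES = [
    ("To", "r.barendse@somedomain.com", "yes"),
    ("To", "remco@somedomain.com", "yes"),
    ("FromOrTo", "default", "no"),
]


def rule_value(recipient, rules=RULES):
    """Return the value of the first rule that matches one recipient."""
    for direction, pattern, value in rules:
        if pattern == "default":
            return value
        if direction in ("To", "FromOrTo") and fnmatch(
            recipient.lower(), pattern.lower()
        ):
            return value
    return "no"


def convert_message(recipients, rules=RULES):
    """One decision per message (assumed): convert if any recipient says yes."""
    return any(rule_value(r, rules) == "yes" for r in recipients)


# One queued message addressed to both users gets a single answer.
both = convert_message(["remco@somedomain.com", "chris@somedomain.com"])
only_chris = convert_message(["chris@somedomain.com"])
```

Because there is only one df/qf pair per message, the answer computed here applies to every recipient of that message; giving each recipient a different result would mean splitting the message into separate copies.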


>Remco
>
>On Fri, 15 Nov 2002, Julian Field wrote:
>
> > Can you send me 1 of the messages for me to experiment with please?
> >
> > At 08:59 15/11/2002, you wrote:
> > >Hi!
> > >
> > >I am using MailScanner 4.05-3 and have a mobile user collecting mail on
> > >his laptop. I want to use the html2text feature to prevent expensive
> > >phonecalls to collect e-mail in HTML format that keep the connection open
> > >for hours. MS is running on a RedHat 7.3 box.
> > >
> > >I have this line in my /etc/MailScanner/MailScanner.conf :
> > >Convert HTML To Text = /etc/MailScanner/rules/html2text.rules
> > >
> > >The html2text.rules contains:
> > >To              r.barendse at somedomain.com       yes
> > >To              remco at somedomain.com            yes
> > >Fromorto        default                         no
> > >
> > >The output in maillog seems correct:
> > >Nov 15 09:44:19 linuxgw MailScanner[7367]: Content Checks: Need to convert
> > >HTML to plain text in 1 messages
> > >Nov 15 09:44:20 linuxgw MailScanner[7367]: Content Checks: Detected and
> > >will convert HTML message to plain text in gAF8iAN07366
> > >
> > >When I start pine and look in the inbox, I still see small messages
> > >being huge in size (13-40 Kb). The top of the e-mail contains stuff like :
> > >@font-face { font-family: Tahoma; } @font-face { font-family: Verdana; }
> > >@page Section1 {size:595.35pt 842.0pt; margin: 26.95pt 70.9pt 1.0in
> > >70.9pt; mso-header-margin:
> > >
> > >and similar rubble throughout the e-mail :
> > >….Whaaat ??
> > ><![if !supportEmptyParas]><![endif]>
> > >You gotta be kidding me&#8230;.?!
> > >
> > >Now if I retrieve the contents of the mailbox using Outlook Express the
> > >e-mail *appears* to be stripped of html rubble because the formatting has
> > >changed (colors and font sizes are different). The size of the e-mail is
> > >slightly reduced (the original HTML mail was 21 Kb, the end result is 13
> > >Kb, still too much for only 80 lines of text).
> > >
> > >Why is there still all this font and other rubble in the e-mails and how
> > >can I strip them completely?
> > >
> > >Thanks!!
> > >
> > >Remco
> > >
> > >
> > >
> > >--
> > >This message has been scanned for viruses and
> > >dangerous content by MailScanner, and is
> > >believed to be clean.
> >
> > --
> > Julian Field                Teaching Systems Manager
> > jkf at ecs.soton.ac.uk         Dept. of Electronics & Computer Science
> > Tel. 023 8059 2817          University of Southampton
> >                              Southampton SO17 1BJ
> >
> >
>
>

--
Julian Field                Teaching Systems Manager
jkf at ecs.soton.ac.uk         Dept. of Electronics & Computer Science
Tel. 023 8059 2817          University of Southampton
                             Southampton SO17 1BJ


