html2text output not really clean?

Remco Barendse mailscanner at BARENDSE.TO
Fri Nov 15 09:48:03 GMT 2002


OK, I will bounce you two messages. Can I send them to a non-public
address?

One thing that might matter: the e-mails I will send were composed with
M$ Word as the e-mail editor. I have noticed that the HTML output of Outlook
itself is a lot cleaner and doesn't contain anywhere near the amount of
rubble that Word throws in...
Maybe it is only the Word-specific rubble that isn't cleaned?
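
To show what I mean by "Word-specific rubble", here is a rough Python sketch
of the cleaning I would expect. It is NOT MailScanner's own html2text code,
just an illustration built on Python's standard html.parser. It assumes that
Word puts its CSS (@font-face, @page ...) inside <style> and uses
<![if ...]>/<![endif]> markers; the names html_to_text and TextExtractor are
made up for this example.

```python
import re
from html.parser import HTMLParser

# Word-style conditional markers such as <![if !supportEmptyParas]> and
# <![endif]>. Stripped before parsing so they never reach the text output.
WORD_CONDITIONAL = re.compile(r"<!\[(?:if|else|endif)\b[^>]*>", re.IGNORECASE)

SKIP_TAGS = {"style", "script"}  # their contents are CSS/JS, not text
BLOCK_TAGS = {"p", "br", "div", "tr", "li", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextExtractor(HTMLParser):
    """Collects visible text; drops style/script bodies and comments."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # &#8230; becomes a real ellipsis
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)

    # handle_comment is inherited as a no-op, so <!-- ... --> blocks are dropped.


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(WORD_CONDITIONAL.sub("", html))
    parser.close()
    text = "".join(parser.parts)
    # Collapse spaces (including &nbsp;) and the blank lines left by the markup.
    text = re.sub(r"[ \t\r\f\v\xa0]+", " ", text)
    text = re.sub(r" *\n[ \n]*", "\n", text)
    return text.strip()


sample = (
    "<html><head><style>@font-face { font-family: Tahoma; }"
    "@page Section1 {size:595.35pt 842.0pt;}</style></head>"
    "<body><p>Whaaat ??</p>"
    "<p><![if !supportEmptyParas]>&nbsp;<![endif]><o:p></o:p></p>"
    "<p>You gotta be kidding me&#8230;.?!</p></body></html>"
)
print(html_to_text(sample))
```

With the sample above only the two lines of real text survive; the CSS, the
conditional markers and the empty Word paragraph are gone.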

Also, I have found something else. If html2text is enabled for one specific
user (smith at somedomain.com) but the message is cc'ed to another user
on the same domain/box (chris at somedomain.com), then both users will get the
message in `plain' text. This is logical, because only the one df/qf copy of
the message is converted, but it may be undesirable. Maybe something to add at
the bottom of the to-do list, if it's possible at all?
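
A rough Python sketch of why this happens and what a fix would have to do:
there is one queued copy per message, so only one setting can be applied to
it. A per-recipient fix would have to group the recipients by the setting
each one gets and split the message whenever the groups disagree. The
in-memory rule format, the first-match-wins behaviour with "default" as a
fallback, and the names wants_text/plan_delivery/needs_split are all my own
assumptions for illustration, not MailScanner's actual code.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory form of an html2text.rules file:
# (direction, address pattern, convert-to-text?).
# Assumption: first explicit match wins; "default" is only a fallback.
RULES = [
    ("to", "smith@somedomain.com", True),
    ("fromorto", "default", False),
]


def wants_text(sender, recipient, rules=RULES):
    """Return the conversion setting for one sender/recipient pair."""
    sender, recipient = sender.lower(), recipient.lower()
    for direction, pattern, value in rules:
        if pattern == "default":
            continue  # fallback, checked after all explicit rules
        pattern = pattern.lower()
        if direction in ("to", "fromorto") and fnmatchcase(recipient, pattern):
            return value
        if direction in ("from", "fromorto") and fnmatchcase(sender, pattern):
            return value
    for _direction, pattern, value in rules:
        if pattern == "default":
            return value
    return False  # no rule and no default: leave the HTML alone


def plan_delivery(sender, recipients, rules=RULES):
    """Group the recipients of ONE queued message by the setting they get."""
    groups = {}
    for rcpt in recipients:
        groups.setdefault(wants_text(sender, rcpt, rules), []).append(rcpt)
    return groups


def needs_split(groups):
    """True if recipients disagree, so one converted copy can't serve all."""
    return len(groups) > 1


plan = plan_delivery("someone@example.org",
                     ["smith@somedomain.com", "chris@somedomain.com"])
print(plan)  # {True: ['smith@somedomain.com'], False: ['chris@somedomain.com']}
print(needs_split(plan))  # True
```

Only when needs_split() is true would the message have to be copied (one
df/qf pair per group) before converting, which is probably why it is not
trivial to add.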

Remco

On Fri, 15 Nov 2002, Julian Field wrote:

> Can you send me one of the messages to experiment with, please?
>
> At 08:59 15/11/2002, you wrote:
> >Hi!
> >
> >I am using MailScanner 4.05-3 and have a mobile user collecting mail on
> >his laptop. I want to use the html2text feature to avoid expensive
> >phone calls: collecting e-mail in HTML format keeps the connection open
> >for hours. MailScanner is running on a Red Hat 7.3 box.
> >
> >I have this line in my /etc/MailScanner/MailScanner.conf :
> >Convert HTML To Text = /etc/MailScanner/rules/html2text.rules
> >
> >The html2text.rules contains:
> >To              r.barendse at somedomain.com       yes
> >To              remco at somedomain.com            yes
> >Fromorto        default                         no
> >
> >The output in maillog seems correct:
> >Nov 15 09:44:19 linuxgw MailScanner[7367]: Content Checks: Need to convert
> >HTML to plain text in 1 messages
> >Nov 15 09:44:20 linuxgw MailScanner[7367]: Content Checks: Detected and
> >will convert HTML message to plain text in gAF8iAN07366
> >
> >When I start Pine and look in the inbox, I still see short messages
> >that are huge (13-40 KB). The top of each e-mail contains stuff like:
> >@font-face { font-family: Tahoma; } @font-face { font-family: Verdana; }
> >@page Section1 {size:595.35pt 842.0pt; margin: 26.95pt 70.9pt 1.0in
> >70.9pt; mso-header-margin:
> >
> >and similar rubble throughout the e-mail:
> >….Whaaat ??
> ><![if !supportEmptyParas]><![endif]>
> >You gotta be kidding me&#8230;.?!
> >
> >Now if I retrieve the contents of the mailbox using Outlook Express, the
> >e-mail *appears* to be stripped of HTML rubble, because the formatting has
> >changed (colors and font sizes are different). The size of the e-mail is
> >reduced somewhat: the original HTML mail was 21 KB and the end result is
> >13 KB (still too much for only 80 lines of text).
> >
> >Why is there still all this font and other rubble in the e-mails, and how
> >can I strip it completely?
> >
> >Thanks!!
> >
> >Remco
> >
> >
> >
> >--
> >This message has been scanned for viruses and
> >dangerous content by MailScanner, and is
> >believed to be clean.
>
> --
> Julian Field                Teaching Systems Manager
> jkf at ecs.soton.ac.uk         Dept. of Electronics & Computer Science
> Tel. 023 8059 2817          University of Southampton
>                              Southampton SO17 1BJ
>
>



More information about the MailScanner mailing list