Phishing detection gets confused by malformed HTML

John Wilcock john at TRADOC.FR
Thu Feb 17 10:37:59 GMT 2005

Julian Field wrote:
> John Wilcock wrote:
>> If you have a bit more time for phishing mods, how about the two false
>> positive cases I reported in January?
>>> Click here to <a href="">visit
> Only looking at the last "word" in the text is a dodgy thing to do as
> spammers could completely defeat it by putting in 1 space in the text,
> and most users wouldn't notice the extra space.

Hmm. I hadn't thought this through enough.

I was going to suggest looking at the only "word" that "looks like" a
URL, but then I realised that there are cases like

<a href=""/><span style=invisible></span>yourbank.<span style=tiny> </span>com</a>

I can't see any way you could detect tricks like that (and worse) without
also triggering on my example above. Forget I asked - you're one step ahead
of most of us as usual.
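
To make the problem concrete, here is a minimal Python sketch of what a checker sees once the tags are stripped from the link above. It is not MailScanner's code: `visible_text`, `last_word` and `squeezed` are hypothetical helpers, and the standard-library `html.parser` module stands in for whatever parser a real implementation would use.

```python
from html.parser import HTMLParser


class LinkTextExtractor(HTMLParser):
    """Collects the text of an HTML fragment and ignores every tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def visible_text(fragment: str) -> str:
    """Return the fragment's text with all markup removed (hypothetical helper)."""
    parser = LinkTextExtractor()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.parts)


def last_word(text: str) -> str:
    """The 'last word only' heuristic: the final whitespace-separated token."""
    words = text.split()
    return words[-1] if words else ""


def squeezed(text: str) -> str:
    """The opposite extreme: drop all whitespace before looking for a hostname."""
    return "".join(text.split())


fragment = ('<a href=""/><span style=invisible></span>yourbank.'
            '<span style=tiny> </span>com</a>')
text = visible_text(fragment)

print(repr(text))            # the text still contains the hidden space
print(repr(last_word(text)))  # the last word alone is not a hostname
print(repr(squeezed(text)))   # squeezing whitespace reassembles the hostname
```

Looking only at the last word misses the disguised hostname, while squeezing out all whitespace recovers it but would also glue together the words of any harmless multi-word link text, which is the tension described above.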

>>> <a href="">all about .net technology</a>
> Look for .net with a space both sides of it? It would help but wouldn't
> be a complete solution by any means.

It would have to be .net with a word-separator either side, in case the
".net" is at the beginning or the end of the text.

And even that might be open to abuse by phishing for .net domains.
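
As a rough illustration of the "word-separator either side" idea, here is a small Python sketch. It is not taken from MailScanner: the pattern and the helper name `mentions_standalone_dotnet` are assumptions, and "word-separator" is taken to mean whitespace or the start or end of the link text.

```python
import re

# Assumed pattern: ".net" counts as standalone only when it is not attached
# to any non-whitespace character on either side. The lookarounds also
# succeed at the very start and very end of the text.
STANDALONE_DOTNET = re.compile(r"(?<!\S)\.net(?!\S)", re.IGNORECASE)


def mentions_standalone_dotnet(link_text: str) -> bool:
    """True if the link text contains '.net' as a word of its own."""
    return STANDALONE_DOTNET.search(link_text) is not None


for sample in ("all about .net technology",
               ".net technology",
               "all about .net",
               "www.yourbank.net",
               "yourbank .net"):
    print(sample, "->", mentions_standalone_dotnet(sample))
```

The last sample shows the weakness mentioned above: a phisher targeting a .net domain only has to insert one space before ".net" for the text to be treated as a harmless mention.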


