Email is only an image - tag as spam?

Mariano Absatz mailscanner at LISTS.COM.AR
Thu Dec 4 15:35:44 GMT 2003


These standard SpamAssassin 2.6 rules should match these messages:

# HTML_IMAGE_AREA - lots of image area (absolute)
body HTML_IMAGE_AREA_04  eval:html_range('image_area','400000','500000')
body HTML_IMAGE_AREA_05  eval:html_range('image_area','500000','600000')
body HTML_IMAGE_AREA_06  eval:html_range('image_area','600000','700000')
body HTML_IMAGE_AREA_07  eval:html_range('image_area','700000','800000')
body HTML_IMAGE_AREA_08  eval:html_range('image_area','800000','900000')
body HTML_IMAGE_AREA_09  eval:html_range('image_area','900000')
describe HTML_IMAGE_AREA_04     HTML has 400-500 kilopixels of images
describe HTML_IMAGE_AREA_05     HTML has 500-600 kilopixels of images
describe HTML_IMAGE_AREA_06     HTML has 600-700 kilopixels of images
describe HTML_IMAGE_AREA_07     HTML has 700-800 kilopixels of images
describe HTML_IMAGE_AREA_08     HTML has 800-900 kilopixels of images
describe HTML_IMAGE_AREA_09     HTML has over 900 kilopixels of images
# HTML_IMAGE_ONLY - not much text with images (absolute)
body HTML_IMAGE_ONLY_02         eval:html_image_only('0000','0200')
body HTML_IMAGE_ONLY_04         eval:html_image_only('0200','0400')
body HTML_IMAGE_ONLY_06         eval:html_image_only('0400','0600')
body HTML_IMAGE_ONLY_08         eval:html_image_only('0600','0800')
body HTML_IMAGE_ONLY_10         eval:html_image_only('0800','1000')
body HTML_IMAGE_ONLY_12         eval:html_image_only('1000','1200')
describe HTML_IMAGE_ONLY_02     HTML: images with 0-200 bytes of words
describe HTML_IMAGE_ONLY_04     HTML: images with 200-400 bytes of words
describe HTML_IMAGE_ONLY_06     HTML: images with 400-600 bytes of words
describe HTML_IMAGE_ONLY_08     HTML: images with 600-800 bytes of words
describe HTML_IMAGE_ONLY_10     HTML: images with 800-1000 bytes of words
describe HTML_IMAGE_ONLY_12     HTML: images with 1000-1200 bytes of words
# HTML_IMAGE_RATIO - more image area than text (ratio)
body HTML_IMAGE_RATIO_02        eval:html_image_ratio('0.000','0.002')
body HTML_IMAGE_RATIO_04        eval:html_image_ratio('0.002','0.004')
body HTML_IMAGE_RATIO_06        eval:html_image_ratio('0.004','0.006')
body HTML_IMAGE_RATIO_08        eval:html_image_ratio('0.006','0.008')
body HTML_IMAGE_RATIO_10        eval:html_image_ratio('0.008','0.010')
body HTML_IMAGE_RATIO_12        eval:html_image_ratio('0.010','0.012')
body HTML_IMAGE_RATIO_14        eval:html_image_ratio('0.012','0.014')
describe HTML_IMAGE_RATIO_02  HTML has a low ratio of text to image area
describe HTML_IMAGE_RATIO_04  HTML has a low ratio of text to image area
describe HTML_IMAGE_RATIO_06  HTML has a low ratio of text to image area
describe HTML_IMAGE_RATIO_08  HTML has a low ratio of text to image area
describe HTML_IMAGE_RATIO_10  HTML has a low ratio of text to image area
describe HTML_IMAGE_RATIO_12  HTML has a low ratio of text to image area
describe HTML_IMAGE_RATIO_14  HTML has a low ratio of text to image area

And these are the standard scores for them:
score HTML_IMAGE_AREA_05 0.283 1.342 1.122 2.199
score HTML_IMAGE_AREA_07 1.615 1.681 1.997 1.022
score HTML_IMAGE_ONLY_02 2.751 2.244 1.472 1.230
score HTML_IMAGE_ONLY_04 1.898 1.527 1.136 1.001
score HTML_IMAGE_ONLY_06 1.531 1.709 0.527 1.439
score HTML_IMAGE_ONLY_08 0.525 0.837 0 0
score HTML_IMAGE_ONLY_10 0.615 1.138 0.431 0.019
score HTML_IMAGE_ONLY_12 0.787 1.012 0.483 0
score HTML_IMAGE_RATIO_04 0.821 0.892 0.667 1.050
score HTML_IMAGE_RATIO_06 0.935 0.317 0.649 0
score HTML_IMAGE_RATIO_08 0.605 0.408 0.413 0.359
score HTML_IMAGE_RATIO_10 0.535 0.488 0.619 0.315
score HTML_IMAGE_RATIO_12 0.324 0 0 0
score HTML_IMAGE_RATIO_14 0 0.276 0 0
score HTML_IMAGE_AREA_04 0
score HTML_IMAGE_AREA_09 0
score HTML_IMAGE_AREA_08 0
score HTML_IMAGE_AREA_06 0
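For reference, the score directive takes either one value or four. A single value is used everywhere. With four values, SpamAssassin picks one according to the "score set" in effect: set 0 when both Bayes and network tests are disabled, set 1 with network tests but no Bayes, set 2 with Bayes but no network tests, and set 3 with both. The lines below just annotate two of the stock lines above with those columns:

```
# Columns:               set 0     set 1     set 2     set 3
#                        no Bayes  no Bayes  Bayes     Bayes
#                        no net    net       no net    net
score HTML_IMAGE_ONLY_08 0.525     0.837     0         0

# A single value is used in all four score sets:
score HTML_IMAGE_AREA_04 0
```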

Strangely enough (I'll never understand the "genetic algorithms" used to 
generate these scores), some of the ones "in the middle" are 0. For 
example, HTML_IMAGE_ONLY_06 and HTML_IMAGE_ONLY_10 are non-zero, but 
HTML_IMAGE_ONLY_08 is 0 in the third and fourth columns.

What you can do is raise these scores in spam.assassin.conf, so that 
messages hitting these rules get a higher total score and are more likely 
to be tagged as spam.
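A minimal sketch of such overrides, assuming they go in a SpamAssassin config file that is read after the stock rules (so later score lines win). The values are purely illustrative, not tuned. Note that in SpamAssassin a score of 0 effectively disables a rule, so giving the zero-scored rules a non-zero value is what switches them on.

```
# Local overrides: illustrative values only, check against your own mail.
# A single value applies to all four score sets.
score HTML_IMAGE_AREA_06  0.5
score HTML_IMAGE_AREA_08  0.5
score HTML_IMAGE_AREA_09  0.5

# Four values: keep the stock set 0/1 scores, fill in the zeroed sets 2/3.
score HTML_IMAGE_ONLY_08  0.525 0.837 0.5 0.5
```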

I've also seen messages that appear to consist of nothing but an image, 
but that contain hidden text (the same color as the background), sometimes 
even specially crafted "non-spam-looking" text that lowers the score and 
dodges some of these rules. I've even seen almost identical messages score 
above 5 one day and below 3 the next... evidently many spammers are testing 
their messages against SpamAssassin and adjusting them. Playing around with 
some scores (especially raising these "0" scores) might help you a lot, but 
be careful with false positives and check your logs.
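For the original question (tagging mail that is basically just one big image), you could combine the same eval tests into a single local meta rule. This is a sketch under a few assumptions. It assumes the stock eval tests can be reused from a local config file. It also assumes they accept the argument forms seen in the stock rules above: html_range with only a lower bound, and html_image_only with a min/max pair. The rule names are hypothetical, and the thresholds (400000 pixels, 400 bytes of words) and the 2.0 score are illustrative. Sub-rules whose names start with "__" are not scored on their own. As noted above, hidden text can push a message past the word limit, so this will not catch everything.

```
# Hypothetical local rules: names, thresholds and score are illustrative.
# At least ~400 kilopixels of images (lower bound only, like AREA_09):
body __LOCAL_BIG_IMAGES     eval:html_range('image_area','400000')
# Images with under 400 bytes of words (covers the ONLY_02 and ONLY_04 ranges):
body __LOCAL_LITTLE_TEXT    eval:html_image_only('0000','0400')

# Fires only when both sub-rules hit.
meta LOCAL_IMAGE_ONLY_SPAM  (__LOCAL_BIG_IMAGES && __LOCAL_LITTLE_TEXT)
describe LOCAL_IMAGE_ONLY_SPAM  Large images with almost no text
score LOCAL_IMAGE_ONLY_SPAM     2.0
```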


On 4 Dec 2003 at 9:04, Jody Cleveland wrote:

> Hello,
> I've noticed a new trend with spam lately. I've been getting emails that
> are one big image, which aren't caught by MailScanner or SpamAssassin.
> Is there a rule somewhere where I can specify that if an email contains
> only an image, it should be tagged as spam?
> --
> Jody Cleveland
> (cleveland at

Mariano Absatz
El Baby
Suicidal twin kills sister by mistake!
