OT: pdf spam

Rick Cooper rcooper at dwford.com
Wed Jun 20 20:12:29 IST 2007


 > -----Original Message-----
 > From: mailscanner-bounces at lists.mailscanner.info 
 > [mailto:mailscanner-bounces at lists.mailscanner.info] On 
 > Behalf Of Glenn Steen
 > Sent: Wednesday, June 20, 2007 11:24 AM
 > To: MailScanner discussion
 > Subject: Re: OT: pdf spam
 > Ow, looks good, doesn't it:-).
 > I wonder if one could do something with pdftotext (that less uses),
 > since it mostly is text anyway ... pdftotext (or similar tools...
 > that's just the one used by lesspipe) aren't that horrendous, not like
 > fuzzyocr, but still... and how soon the b*stards will start having
 > "only image PDFs"...

Not too difficult to handle with pdftotext; however, you have to remove all
the missing-header hits from the report, recalculate the score without them
(I have a proof-of-concept program written), and of course, if it's only an
image file, you have to extract the image (and probably convert it from .ppm
to .jpg), create a dummy email, and attach the image so FuzzyOcr can work
on it.
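
A rough sketch of those steps in Python, for illustration only; this is not
the proof-of-concept program mentioned above. The function names, the
(rule, score) hit format, the localhost addresses and the exact set of
missing-header rule names are all assumptions. It also assumes the image has
already been extracted from the PDF and converted to JPEG by some other tool.

```python
import subprocess
from email.message import EmailMessage

# Assumed set of rules that fire only because text extracted from a PDF
# carries no mail headers of its own. Adjust to match your rule set.
MISSING_HEADER_RULES = {
    "MISSING_HEADERS", "MISSING_SUBJECT", "MISSING_DATE", "MISSING_MID",
}


def pdf_to_text(pdf_path):
    """Run pdftotext on a file; '-' as the output name writes to stdout."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"], capture_output=True, check=True
    )
    return result.stdout.decode("utf-8", errors="replace")


def rescore(hits):
    """Drop missing-header hits and recalculate the total score.

    `hits` is a list of (rule_name, score) pairs, a hypothetical
    pre-parsed form of the report rather than SpamAssassin's own format.
    """
    kept = [(rule, score) for rule, score in hits
            if rule not in MISSING_HEADER_RULES]
    return kept, sum(score for _, score in kept)


def wrap_image(image_bytes, filename="page.jpg"):
    """Build a dummy email carrying one JPEG so an OCR plugin can scan it."""
    msg = EmailMessage()
    msg["From"] = "scanner@localhost"      # placeholder address
    msg["To"] = "scanner@localhost"        # placeholder address
    msg["Subject"] = "extracted PDF image"
    msg.set_content("Image extracted from a PDF attachment.")
    msg.add_attachment(image_bytes, maintype="image", subtype="jpeg",
                       filename=filename)
    return msg
```

The text from pdf_to_text() would be scored and then passed through
rescore(); if a PDF yields no text at all, the extracted image would go
through wrap_image() and the resulting message would be scanned instead.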

This is why I wish SpamAssassin supported pre-processors, which would be a
better use of FuzzyOcr. A pre-processor would create the additional text, and
SpamAssassin would consider that output along with the message itself. I have
suggested adding pre-processing to SpamAssassin before and have never gotten a

