OT: pdf spam

Rick Cooper rcooper at dwford.com
Wed Jun 20 20:12:29 IST 2007


 

 > -----Original Message-----
 > From: mailscanner-bounces at lists.mailscanner.info 
 > [mailto:mailscanner-bounces at lists.mailscanner.info] On 
 > Behalf Of Glenn Steen
 > Sent: Wednesday, June 20, 2007 11:24 AM
 > To: MailScanner discussion
 > Subject: Re: OT: pdf spam
 > 
[...]
 > Ow, looks good, doesn't it:-).
 > I wonder if one could do something with pdftotext (that less uses),
 > since it mostly is text anyway ... pdftotext (or similar tools...
 > that's just the one used by lesspipe) aren't that horrendous, not like
 > fuzzyocr, but still... and how soon the b*stards will start having
 > "only image PDFs"...
 > 

Not too difficult to handle with pdftotext. However, you have to remove all
the missing-header stuff from the report and recalc the score without it
(I have a proof of concept program written). And of course, if it's only an
image file, you have to extract the image (and probably convert it from .ppm
to .jpg), then create a dummy email and attach the image so FuzzyOcr would
work on it.
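
To make those steps a bit more concrete, here is a rough Python sketch. It is
not the proof of concept program mentioned above, just an illustration under
some assumptions: that the "missing header stuff" shows up as rule names
starting with MISSING_ (presumably because the extracted text has no mail
headers of its own), that the rule hits are already parsed into a name/score
mapping, and that the image has already been extracted and converted to JPEG
by some other tool. The function names and the pdfcheck@localhost address are
made up for the example.

```python
import subprocess
from email.message import EmailMessage

# Assumption: header-related hits all use this rule-name prefix.
MISSING_HEADER_PREFIX = "MISSING_"


def pdf_to_text(pdf_path):
    """Run pdftotext and return the extracted text ("-" means stdout)."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def recalc_score(hits):
    """Drop the missing-header rules and re-total the remaining scores.

    hits is a hypothetical mapping of rule name -> score taken from the
    report; returns (kept_hits, new_total).
    """
    kept = {name: score for name, score in hits.items()
            if not name.startswith(MISSING_HEADER_PREFIX)}
    return kept, round(sum(kept.values()), 2)


def build_dummy_message(image_bytes, filename="page.jpg"):
    """Wrap an already-converted JPEG in a minimal message for scanning."""
    msg = EmailMessage()
    msg["From"] = "pdfcheck@localhost"   # placeholder address
    msg["To"] = "pdfcheck@localhost"     # placeholder address
    msg["Subject"] = "extracted PDF image"
    msg.set_content("Image extracted from a PDF attachment.")
    msg.add_attachment(image_bytes, maintype="image", subtype="jpeg",
                       filename=filename)
    return msg
```

The dummy message carries real From/To/Subject headers so the image scan is
less likely to pick up the same header-related hits that had to be stripped
from the text scan.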

This is why I wish SpamAssassin supported pre-processors, which would be a
better use of FuzzyOcr. It would create the additional text, and SpamAssassin
would consider that output along with the message itself. I have suggested
adding pre-processing to SpamAssassin before and have never gotten a
response.

Rick





