OT: pdf spam
Rick Cooper
rcooper at dwford.com
Wed Jun 20 20:12:29 IST 2007
> -----Original Message-----
> From: mailscanner-bounces at lists.mailscanner.info
> [mailto:mailscanner-bounces at lists.mailscanner.info] On
> Behalf Of Glenn Steen
> Sent: Wednesday, June 20, 2007 11:24 AM
> To: MailScanner discussion
> Subject: Re: OT: pdf spam
>
[...]
> Ow, looks good, doesn't it:-).
> I wonder if one could do something with pdftotext (that less uses),
> since it mostly is text anyway ... pdftotext (or similar tools...
> that's just the one used by lesspipe) aren't that horrendous, not like
> fuzzyocr, but still... and how soon the b*stards will start having
> "only image PDFs"...
>
Not too difficult to handle with pdftotext. However, you have to remove all
the missing-header stuff from the report and recalculate the score without
it (I have a proof-of-concept program written). And of course, if it's only
an image file, you have to extract the image (and probably convert it from
.ppm to .jpg), then create a dummy email and attach the image so FuzzyOcr
can work on it.
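A rough Python sketch of those steps, under stated assumptions. It is not the
proof-of-concept program mentioned above. It assumes the pdftotext command is
on the PATH, that the rule hits have already been parsed from the report into
a name-to-score mapping (parsing not shown), and that the "missing header"
hits are the rules whose names start with MISSING_. The function names, the
placeholder addresses and the file name are hypothetical. Extracting the
image and converting .ppm to .jpg are left out.

```python
import subprocess
from email.message import EmailMessage


def pdf_to_text(pdf_path):
    """Return the text layer of a PDF using the pdftotext command-line tool."""
    # Passing "-" as the output file makes pdftotext write to stdout.
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def rescore(hits, skip_prefix="MISSING_"):
    """Drop hits caused by the absent headers and re-sum the score.

    `hits` maps rule name -> score. The MISSING_ prefix is an assumption
    about which rules fire only because the extracted text has no headers.
    """
    kept = {name: score for name, score in hits.items()
            if not name.startswith(skip_prefix)}
    return kept, round(sum(kept.values()), 3)


def dummy_email_with_image(jpeg_bytes, filename="extracted.jpg"):
    """Wrap an extracted image in a throwaway message for an OCR plugin."""
    msg = EmailMessage()
    msg["From"] = "pdf-extract@localhost"   # placeholder address
    msg["To"] = "scanner@localhost"         # placeholder address
    msg["Subject"] = "Image extracted from PDF attachment"
    msg.set_content("Image extracted from a PDF attachment.")
    # Adding an attachment turns the message into multipart/mixed.
    msg.add_attachment(jpeg_bytes, maintype="image", subtype="jpeg",
                       filename=filename)
    return msg
```

The text from pdf_to_text() would be scanned as a header-less message, which
is why the score has to be recalculated with rescore() afterwards. The dummy
message is only needed for the image-only case.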
This is why I wish SpamAssassin supported pre-processors, which would be a
better use of FuzzyOcr. A pre-processor would create the additional text, and
SpamAssassin would consider that output along with the message itself. I have
suggested adding pre-processing to SpamAssassin before and have never gotten
a response.
Rick