OT: pdf spam

Gareth list-mailscanner at linguaphone.com
Wed Jun 20 15:35:40 IST 2007


On Wed, 2007-06-20 at 14:49, Sattler, Tim wrote:
> Hello,
> 
> today we received a lot of penny stock spam with just dummy text and a
> pdf attachment "<username>_report.pdf". All "spammy" key words are
> inside the pdf document, so these mails are not marked as spam in the
> majority of cases. If this becomes fashion, I guess it will require new
> techniques like regex filtering inside attachments or hash databases for
> "spammy" documents.  

I was just about to post about these myself. I have attached an example.

I have found that if I use 'less' to view the document, it is rendered
as plain text and is very readable. So would it be possible to convert a
PDF to plain text and append it to the email message for the purposes of
the SpamAssassin checks?
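
To make the idea concrete, here is a rough sketch in Python. It is not
anything MailScanner or SpamAssassin provides. It assumes the pdftotext
command from poppler-utils is installed, and the function names
(pdf_to_text, text_for_scanning) are made up for illustration. It pulls
out the plain-text body, converts each PDF attachment to text, and joins
the two so a content filter could look at both.

```python
import email
import subprocess
import tempfile
from email import policy


def pdf_to_text(pdf_bytes):
    """Convert PDF bytes to text. Assumes the pdftotext CLI is on PATH."""
    with tempfile.NamedTemporaryFile(suffix=".pdf") as tmp:
        tmp.write(pdf_bytes)
        tmp.flush()
        # "-" as the output file makes pdftotext write to stdout.
        result = subprocess.run(
            ["pdftotext", tmp.name, "-"],
            stdout=subprocess.PIPE,
            check=True,
        )
    return result.stdout.decode("utf-8", errors="replace")


def text_for_scanning(raw_message, converter=pdf_to_text):
    """Return the body text followed by the text of any PDF attachments."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    chunks = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        filename = (part.get_filename() or "").lower()
        if part.get_content_type() == "application/pdf" or filename.endswith(".pdf"):
            # get_payload(decode=True) undoes base64 and returns raw bytes.
            chunks.append(converter(part.get_payload(decode=True)))
        elif part.get_content_type() == "text/plain" and not part.is_attachment():
            chunks.append(part.get_content())
    return "\n".join(chunks)
```

The converter is passed in as a parameter so a different tool could be
swapped in. Running an external program for every PDF that arrives would
add load, so a size limit or timeout would probably be needed in practice.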

Alternatively, perhaps this is a job for MCP?

Another possibility would be for the author of FuzzyOcr to recognise
.pdf files and render them so they can be scanned for keywords. I can
think of a few keyword-matching and server-load issues this could cause,
though.


