OT: pdf spam
list-mailscanner at linguaphone.com
Wed Jun 20 15:35:40 IST 2007
On Wed, 2007-06-20 at 14:49, Sattler, Tim wrote:
> today we received a lot of penny stock spam with just dummy text and a
> pdf attachment "<username>_report.pdf". All "spammy" key words are
> inside the pdf document, so these mails are not marked as spam in the
> majority of cases. If this becomes fashion, I guess it will require new
> techniques like regex filtering inside attachments or hash databases for
> "spammy" documents.
I was just about to post about these myself. I have attached an example.
I have found that if I use 'less' to view the document, it is rendered
as plain text and is very readable. So would it be possible to convert a pdf to
plain text and append it to the email message for the purposes of the
Alternatively, perhaps this is a job for MCP?
Another possibility would be for the author of fuzzyocr to recognise
.pdf files and render them so they can be scanned for keywords. I can
think of a few keyword and load issues this could cause though.
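
To make the "convert the pdf to plain text, then scan it" idea concrete,
here is a rough sketch in Python. It is not MailScanner, MCP, or fuzzyocr
code. It assumes poppler's pdftotext is installed and on the PATH, and the
keyword list and function names are made up purely for illustration. The
timeout is one possible way to limit the load concern mentioned above.

```python
import email
import re
import subprocess
import tempfile
from email import policy

# Hypothetical keyword list, for illustration only.
KEYWORDS = ("strong buy", "target price", "penny stock")


def pdf_attachments(raw_message):
    """Yield (filename, payload bytes) for each PDF part of a raw message."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    for part in msg.walk():
        name = part.get_filename() or ""
        # Check the filename too, since the declared content type may be generic.
        if part.get_content_type() == "application/pdf" or name.lower().endswith(".pdf"):
            yield name, part.get_payload(decode=True)


def pdf_to_text(pdf_bytes, timeout=10.0):
    """Render a PDF to text with pdftotext (assumed installed, POSIX temp files)."""
    with tempfile.NamedTemporaryFile(suffix=".pdf") as tmp:
        tmp.write(pdf_bytes)
        tmp.flush()
        # "-" as the output file sends the text to stdout.
        result = subprocess.run(
            ["pdftotext", tmp.name, "-"],
            capture_output=True,
            timeout=timeout,  # give up on slow or oversized documents
            check=True,
        )
    return result.stdout.decode("utf-8", errors="replace")


def matched_keywords(text, keywords=KEYWORDS):
    """Return the keywords found in text, ignoring case and extra whitespace."""
    normalised = re.sub(r"\s+", " ", text).lower()
    return [kw for kw in keywords if kw in normalised]


def scan_message(raw_message):
    """Map each PDF attachment name to the keywords found in its text."""
    return {
        name: matched_keywords(pdf_to_text(data))
        for name, data in pdf_attachments(raw_message)
    }
```

Calling scan_message() on the raw bytes of a message would give a dict of
attachment names and matched keywords, which a scoring rule could then use.
Note that this only helps while the pdf contains real text. If the spammers
switch to embedding images in the pdf, it would need OCR as well.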