OT: pdf spam

Glenn Steen glenn.steen at gmail.com
Wed Jun 20 16:23:30 IST 2007


On 20/06/07, Gareth <list-mailscanner at linguaphone.com> wrote:
> On Wed, 2007-06-20 at 15:52, Daniel Maher wrote:
> > > I was just about to post about these myself. I have attached an example.
> > >
> > > I have found if I use 'less' to view the document it renders it to plain
> > > text and is very readable. So would it be possible to convert a pdf to
> > > plain text and append it to the email message for the purposes of the
> > > spamassassin checks?
> > >
> > > Alternatively, perhaps this is a job for MCP?
> > >
> > > Another possibility would be for the author of fuzzyocr to recognise
> > > .pdf files and render them so they can be scanned for keywords. I can
> > > think of a few keyword and load issues this could cause though.
> >
> > I'm not sure that the example was attached - at the very least, I didn't get it over here. :)  Would you be so kind as to forward a sample?  Thanks!
> >
> It was too big to send so I have uploaded it :-
> http://www.gbnetwork.co.uk/temp/ee_report.pdf
>
Ow, looks good, doesn't it :-).
I wonder if one could do something with pdftotext (that's what less
uses, via lesspipe), since the PDF is mostly text anyway. pdftotext
and similar tools aren't that horrendous to run, not like fuzzyocr,
but still... and how soon will the b*stards start sending
"image only" PDFs?

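A rough sketch of the pdftotext idea in Python, not a tested
MailScanner or SpamAssassin hook. Assumptions: the pdftotext binary
(xpdf or poppler) is on the PATH, and the function names pdf_to_text
and extract_pdf_text are made up for illustration. It only pulls the
text out of PDF parts; how that text gets handed to the spam checks
is left open.

```python
import subprocess
import tempfile
from email import message_from_bytes, policy


def pdf_to_text(pdf_bytes, timeout=30):
    """Convert PDF bytes to plain text by shelling out to pdftotext.

    Assumes `pdftotext` is installed; "-" as the output file means stdout.
    The timeout guards against PDFs crafted to be slow to render.
    Note: reopening a NamedTemporaryFile by name works on Unix only.
    """
    with tempfile.NamedTemporaryFile(suffix=".pdf") as tmp:
        tmp.write(pdf_bytes)
        tmp.flush()
        result = subprocess.run(
            ["pdftotext", tmp.name, "-"],
            capture_output=True,
            timeout=timeout,
            check=True,
        )
    return result.stdout.decode("utf-8", errors="replace")


def extract_pdf_text(raw_message, converter=pdf_to_text):
    """Return the text of every PDF part in a raw RFC 822 message.

    `converter` is injectable so the logic can be exercised without
    pdftotext being present.
    """
    msg = message_from_bytes(raw_message, policy=policy.default)
    chunks = []
    for part in msg.walk():
        name = (part.get_filename() or "").lower()
        is_pdf = part.get_content_type() == "application/pdf"
        if is_pdf or name.endswith(".pdf"):
            data = part.get_payload(decode=True)  # undo base64 etc.
            if data:
                chunks.append(converter(data))
    return "\n".join(chunks)
```

An image-only PDF would come back more or less empty from this, which
is exactly the worry above; an empty result on a PDF attachment could
itself be treated as a hint, but that is speculation.
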
-- 
-- Glenn
email: glenn < dot > steen < at > gmail < dot > com
work: glenn < dot > steen < at > ap1 < dot > se

