OT: pdf spam
prandal at herefordshire.gov.uk
Wed Jun 20 16:45:58 IST 2007
pdftotext does indeed convert that example into text we can do things with.
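
For anyone who wants to experiment, here is a rough sketch of the idea from the thread: run pdftotext on the attachment and tack the result onto the message body before handing it to the spam checks. It assumes pdftotext is on the PATH and relies on its documented behaviour of writing to stdout when the output file is "-". The function names (pdf_to_text, body_with_pdf_text) and the 30 second timeout are made up for illustration; this is not MailScanner or SpamAssassin code, and wiring it into either is left open.

```python
import subprocess


def pdf_to_text(pdf_path, timeout=30):
    """Return the text pdftotext extracts from pdf_path, or '' on any failure.

    Hypothetical helper: assumes the pdftotext binary is installed.
    """
    try:
        result = subprocess.run(
            # "-q" suppresses messages; "-" as output file means stdout.
            ["pdftotext", "-q", pdf_path, "-"],
            capture_output=True,
            timeout=timeout,
            check=True,
        )
    except (OSError, subprocess.SubprocessError):
        # Missing binary, unreadable PDF, non-zero exit, or timeout.
        return ""
    return result.stdout.decode("utf-8", errors="replace")


def body_with_pdf_text(body, pdf_text):
    """Append extracted PDF text to the message body (hypothetical helper)."""
    pdf_text = pdf_text.strip()
    if not pdf_text:
        # Nothing usable extracted (e.g. an image-only PDF): leave body alone.
        return body
    return body.rstrip("\n") + "\n\n" + pdf_text + "\n"
```

Usage would be something like body_with_pdf_text(body, pdf_to_text("ee_report.pdf")). Note that an image-only PDF would come back empty and the body would pass through unchanged, so this does nothing for that case.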
> -----Original Message-----
> From: mailscanner-bounces at lists.mailscanner.info
> [mailto:mailscanner-bounces at lists.mailscanner.info] On Behalf
> Of Glenn Steen
> Sent: 20 June 2007 16:24
> To: MailScanner discussion
> Subject: Re: OT: pdf spam
> On 20/06/07, Gareth <list-mailscanner at linguaphone.com> wrote:
> > On Wed, 2007-06-20 at 15:52, Daniel Maher wrote:
> > > > I was just about to post about these myself. I have attached an
> > > > example.
> > > >
> > > > I have found if I use 'less' to view the document it renders it to
> > > > plain text and is very readable. So would it be possible to convert
> > > > a pdf to plain text and append it to the email message for the
> > > > purposes of the spamassassin checks?
> > > >
> > > > Alternatively perhaps this is a job for MCP?
> > > >
> > > > Another possibility would be for the author of fuzzyocr to recognise
> > > > .pdf files and render them so they can be scanned for keywords. I
> > > > can think of a few keyword and load issues this could cause though.
> > >
> > > I'm not sure that the example was attached - at the very least, I
> > > didn't get it over here. :) Would you be so kind as to forward a
> > > sample? Thanks!
> > >
> > It was too big to send so I have uploaded it :-
> > http://www.gbnetwork.co.uk/temp/ee_report.pdf
> Ow, looks good, doesn't it:-).
> I wonder if one could do something with pdftotext (which less uses),
> since it is mostly text anyway ... pdftotext (or similar tools...
> that's just the one used by lesspipe) isn't that horrendous, not like
> fuzzyocr, but still... and how soon will the b*stards start sending
> "only image PDFs"...
> -- Glenn
> email: glenn < dot > steen < at > gmail < dot > com
> work: glenn < dot > steen < at > ap1 < dot > se
> MailScanner mailing list
> mailscanner at lists.mailscanner.info
> Before posting, read http://wiki.mailscanner.info/posting
> Support MailScanner development - buy the book off the website!