OT: pdf spam

Randal, Phil prandal at herefordshire.gov.uk
Wed Jun 20 16:45:58 IST 2007


pdftotext does indeed convert that example into text we can do things
with.
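
For anyone who wants to try this, here is a minimal sketch of the idea floated in the thread: run pdftotext on the attachment and append the result to the message text before it is scanned. The function names, the marker line and the 10-second timeout are my own assumptions for illustration; how (or whether) this would hook into MailScanner or SpamAssassin is not shown.

```python
import shutil
import subprocess


def build_pdftotext_cmd(pdf_path):
    # "-q" suppresses messages, "-enc UTF-8" fixes the output encoding,
    # and the final "-" tells pdftotext to write the text to stdout.
    return ["pdftotext", "-q", "-enc", "UTF-8", pdf_path, "-"]


def pdf_to_text(pdf_path, timeout=10):
    """Return the text of a PDF, or "" if it cannot be extracted."""
    if shutil.which("pdftotext") is None:
        return ""  # pdftotext is not installed on this machine
    try:
        result = subprocess.run(
            build_pdftotext_cmd(pdf_path),
            capture_output=True,
            timeout=timeout,  # assumed limit, to keep load under control
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        return ""  # timed out, exited non-zero, or could not be run
    return result.stdout.decode("utf-8", errors="replace")


def append_pdf_text(body, pdf_text):
    """Append extracted PDF text to a message body for scanning."""
    if not pdf_text.strip():
        return body  # nothing useful extracted, leave the body alone
    # The marker line below is hypothetical, not any tool's convention.
    marker = "[text extracted from PDF attachment]"
    return body.rstrip("\n") + "\n\n" + marker + "\n" + pdf_text
```

Something like append_pdf_text(body, pdf_to_text("ee_report.pdf")) would then give the text to check. Note that a PDF containing only images will yield little or no text this way, so the body would come back unchanged.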

Cheers,

Phil

--
Phil Randal
Network Engineer
Herefordshire Council
Hereford, UK  

> -----Original Message-----
> From: mailscanner-bounces at lists.mailscanner.info 
> [mailto:mailscanner-bounces at lists.mailscanner.info] On Behalf 
> Of Glenn Steen
> Sent: 20 June 2007 16:24
> To: MailScanner discussion
> Subject: Re: OT: pdf spam
> 
> On 20/06/07, Gareth <list-mailscanner at linguaphone.com> wrote:
> > On Wed, 2007-06-20 at 15:52, Daniel Maher wrote:
> > > > > I was just about to post about these myself. I have attached an
> > > > > example.
> > > > >
> > > > > I have found if I use 'less' to view the document it renders it to
> > > > > plain text and is very readable. So would it be possible to convert
> > > > > a pdf to plain text and append it to the email message for the
> > > > > purposes of the spamassassin checks?
> > > > >
> > > > > Alternatively perhaps this is a job for MCP?
> > > > >
> > > > > Another possibility would be for the author of fuzzyocr to recognise
> > > > > .pdf files and render them so they can be scanned for keywords. I can
> > > > > think of a few keyword and load issues this could cause though.
> > >
> > > > I'm not sure that the example was attached - at the very least, I
> > > > didn't get it over here. :)  Would you be so kind as to forward a
> > > > sample?  Thanks!
> > >
> > It was too big to send so I have uploaded it :-
> > http://www.gbnetwork.co.uk/temp/ee_report.pdf
> >
> Ow, looks good, doesn't it :-).
> I wonder if one could do something with pdftotext (which less uses),
> since it is mostly text anyway ... pdftotext (or similar tools; that's
> just the one used by lesspipe) isn't that horrendous, not like
> fuzzyocr, but still... and how soon will the b*stards start sending
> "image-only PDFs"...
> 
> -- 
> -- Glenn
> email: glenn < dot > steen < at > gmail < dot > com
> work: glenn < dot > steen < at > ap1 < dot > se

