OT: pdf spam

Scott Silva ssilva at sgvwater.com
Wed Jun 20 17:21:18 IST 2007


Glenn Steen spake the following on 6/20/2007 8:23 AM:
> On 20/06/07, Gareth <list-mailscanner at linguaphone.com> wrote:
>> On Wed, 2007-06-20 at 15:52, Daniel Maher wrote:
>> > > I was just about to post about these myself. I have attached an
>> > > example.
>> > >
>> > > I have found if I use 'less' to view the document it renders it to
>> > > plain text and is very readable. So would it be possible to convert a
>> > > pdf to plain text and append it to the email message for the purposes
>> > > of the spamassassin checks?
>> > >
>> > > Alternatively, perhaps this is a job for MCP?
>> > >
>> > > Another possibility would be for the author of fuzzyocr to recognise
>> > > .pdf files and render them so they can be scanned for keywords. I can
>> > > think of a few keyword and load issues this could cause though.
>> >
>> > I'm not sure that the example was attached - at the very least, I
>> > didn't get it over here. :)  Would you be so kind as to forward a
>> > sample?  Thanks!
>> >
>> It was too big to send so I have uploaded it :-
>> http://www.gbnetwork.co.uk/temp/ee_report.pdf
>>
> Ow, looks good, doesn't it:-).
> I wonder if one could do something with pdftotext (which less uses),
> since it is mostly text anyway ... pdftotext (or similar tools...
> that's just the one used by lesspipe) isn't that horrendous, not like
> fuzzyocr, but still... and how soon will the b*stards start sending
> "only image PDFs"...
> 
That explains why it was unintelligible on my system. No xpdf so no pdftotext.
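
For what it's worth, the "convert the pdf to plain text and append it to the message" idea from earlier in the thread could be sketched roughly like this. This is only an illustration, not anything MailScanner or SpamAssassin does today: the function names (build_command, pdf_to_text, append_pdf_text) and the "[extracted PDF text]" marker are made up for the example. The only real piece is the pdftotext command from xpdf, called with "-" as the output file so it writes to stdout. If pdftotext is not installed (as on my box), or the PDF is image-only and yields no text, the body is passed through unchanged.

```python
import shutil
import subprocess


def build_command(pdf_path, binary="pdftotext"):
    # "-" as the output file tells pdftotext to write the text to stdout.
    return [binary, "-enc", "UTF-8", pdf_path, "-"]


def pdf_to_text(pdf_path, binary="pdftotext", timeout=30):
    """Return the text of a PDF, or None if it cannot be extracted."""
    # No xpdf means no pdftotext, so give up quietly.
    if shutil.which(binary) is None:
        return None
    try:
        result = subprocess.run(
            build_command(pdf_path, binary),
            capture_output=True,
            timeout=timeout,  # a hostile PDF should not stall the scanner
            check=False,
        )
    except (subprocess.TimeoutExpired, OSError):
        return None
    if result.returncode != 0:
        return None
    return result.stdout.decode("utf-8", errors="replace")


def append_pdf_text(body, pdf_text):
    """Append extracted PDF text to a message body (hypothetical helper)."""
    # Image-only PDFs give empty output; leave the body alone in that case.
    if not pdf_text or not pdf_text.strip():
        return body
    return body + "\n\n[extracted PDF text]\n" + pdf_text
```

The combined text could then be handed to whatever does the keyword checks. The timeout is there because of the load concerns mentioned above; 30 seconds is an arbitrary choice.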

-- 

MailScanner is like deodorant...
You hope everybody uses it, and
you notice quickly if they don't!!!!


