FuzzyOcr working but not via MailScanner

Glenn Steen glenn.steen at gmail.com
Thu Oct 19 14:32:29 IST 2006


On 19/10/06, Rick Cooper <rcooper at dwford.com> wrote:
>
>
> > -----Original Message-----
> > From: mailscanner-bounces at lists.mailscanner.info
> > [mailto:mailscanner-bounces at lists.mailscanner.info] On Behalf
> > Of Glenn Steen
> > Sent: Thursday, October 19, 2006 4:42 AM
> > To: MailScanner discussion
> > Subject: Re: FuzzyOcr working but not via MailScanner
> >
> > On 19/10/06, Glenn Steen <glenn.steen at gmail.com> wrote:
> > > On 19/10/06, Anthony Cartmell <ajcartmell at fonant.com> wrote:
> > > > > I think you are on to something there Scott. I'll offer
> > a guess...
> > > > > Anthony, are you by any chance running Postfix?
> > > >
> > > > Nope, sendmail.
> > >
> > > OK. Was just a thought:-).
> > >
> > > > It seems to be a "search path" issue: MailScanner skips a
> > lot of setup
> > > > stuff that spamassassin does from the command line. From
> > MailScanner, the
> > > > whole /var/lib/spamassassin/3.001003 directory was being
> > missed and hence
> > > > a whole load of default rules.
> [...]
>
> > (just proving my PF "roots"....)
> >
> > I looked at SpamAssassin.pm, and this snippet is the clincher:
> > -----
> > @default_rules_path = (
> >   '__local_state_dir__/__version__',
> >   '__def_rules_dir__',
> >   '__prefix__/share/spamassassin',
> >   '/usr/local/share/spamassassin',
> >   '/usr/share/spamassassin',
> > );
> > -----
> > As you can see, the normal way to find the sa-updated files is via
> > "__local_state_dir__/__version__", where __local_state_dir__ defaults
> > to /var/lib/spamassassin and __version__ is set (just above that)
> > to something like 3.00100X. If you did an upgrade (and perhaps
> > didn't do an sa-update afterwards), you could end up in a
> > situation where the new SA version can't find the updated files
> > (someone more fluent than me in how SA is instantiated
> > will probably be able to tell if this supposition is correct).
> > I suppose running sa-update should clear any such problem. And you
> > might clear the FuzzyOcr problem by resetting the MailScanner option
> > for site rules.
>
> I can tell you from experience that if you do not run sa-update following an
> SA update, your /var/lib/spamassassin/<version> directory is not created, and
> when SA does its path checks it will (there is a thread on this on one of
> the SA lists):
>
> 1. check for /var/lib/spamassassin/<version>
> 2. if step one fails, use the local site dir
>
> It's unnecessarily complicated in my opinion compared to how it used to be.
> One would think you could do something like the old
>
>         $defaultrules = $test->{default_rules_path};
>         $defaultrules ||= $test->first_existing_path(
>             @Mail::SpamAssassin::default_rules_path);
>
> and use
>
>         $defaultupdaterules = $test->{default_update_rules_path};
>         $defaultupdaterules ||= $test->first_existing_path(
>             @Mail::SpamAssassin::default_update_rules_path);
>
> If I am remembering the balance of the thread it's now something like
>
>         my $SAVersion = $Mail::SpamAssassin::VERSION;
>         my $defaultrules = "/var/lib/spamassassin/$SAVersion"
>             if -d "/var/lib/spamassassin/$SAVersion";
>         $defaultrules ||= $test->first_existing_path(
>             @Mail::SpamAssassin::default_rules_path);

As I thought. Thanks for the corroboration, Rick.
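
For what it's worth, here is a minimal standalone sketch of that lookup
order as I understand it from the above. It is not SA's actual code:
sa_version_dir_name and pick_rules_dir are names I made up for
illustration, and the fallback list is just the tail end of
@default_rules_path quoted earlier.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a dotted version such as "3.1.7" into the
# "3.001007" form seen in the directory names discussed in this thread.
sub sa_version_dir_name {
    my ($dotted) = @_;
    my ($major, $minor, $patch) = split /\./, $dotted;
    return sprintf('%d.%03d%03d', $major, $minor, $patch);
}

# Hypothetical helper: prefer the per-version sa-update directory; if it
# does not exist, return the first fallback directory that does exist,
# or undef when none of them exists.
sub pick_rules_dir {
    my ($state_dir, $version, @fallbacks) = @_;
    my $versioned = "$state_dir/$version";
    return $versioned if -d $versioned;
    for my $dir (@fallbacks) {
        return $dir if -d $dir;
    }
    return;
}

my $version = sa_version_dir_name('3.1.7');
my $dir = pick_rules_dir(
    '/var/lib/spamassassin', $version,
    '/usr/local/share/spamassassin',
    '/usr/share/spamassassin',
);
print defined $dir ? "rules dir: $dir\n" : "no rules dir found\n";
```

If the versioned directory is missing (no sa-update run since the
upgrade), this falls straight through to the stock rules, which matches
what Rick saw. An older versioned directory is never considered.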

> In my case I forgot the sa-update after 3.1.7 and
> /var/lib/spamassassin/3.001007 was not created, so it then defaulted to
> /usr/share/spamassassin even though /var/lib/spamassassin/3.001001 was still
> there (yes, I skipped some updates; time constraints). I did not change the
> "SpamAssassin Default Rules Dir" setting from the default (blank), so I
> am wondering whether the proper /var/lib/spamassassin/3.001007 directory is
> being used when SA is run from MailScanner. Julian, can you answer that
> question?

Since MS instantiates two SpamAssassin objects (one for SA, one for
MCP) and the logic for this is in the object creation method, I assume
this is working without changing that setting. And my MailWatch
searches (on what rules fire in the SpamReport) back me up in that
assumption:-).
But a confirming comment from Jules wouldn't hurt:-D.

> Also, along the lines of the thread. When I first installed FuzzyOcr and was
> testing it I forgot to set the value of focr_autodisable_score (in
> FuzzyOcr.cf) lower (I think the default is 10 or something like that) so it
> appeared that Fuzzy wasn't doing anything because the spams with images were
> already scoring above that. I lowered the score to 2 (temporarily) and then
> Fuzzy was hitting. Of course I raised it back to my hit range so it wouldn't
> waste resources when the message was already scored as spam.

Right. Forgot about that. Thanks a bundle Rick.
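
For anyone else testing, the temporary setting Rick describes would look
roughly like this in FuzzyOcr.cf. This is a sketch only: 2 is just his
test value, and the default on your install may differ from the 10 he
remembers.

```
# Temporary, for testing only: let FuzzyOcr run on lower-scoring mail.
focr_autodisable_score 2
```

Raise it back to your normal spam score range once you have seen
FuzzyOcr hit, so it doesn't waste resources on mail that is already
scored as spam.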

-- 
-- Glenn
email: glenn < dot > steen < at > gmail < dot > com
work: glenn < dot > steen < at > ap1 < dot > se

