FuzzyOcr working but not via MailScanner

Glenn Steen glenn.steen at gmail.com
Thu Oct 19 08:52:45 IST 2006


On 18/10/06, Scott Silva <ssilva at sgvwater.com> wrote:
> Anthony Cartmell spake the following on 10/18/2006 1:05 PM:
> >> Answer 10:    MailScanner by default only passes the first 30kb of the
> >> mail to SpamAssassin.
> >
> > Interesting. Most of the spam in question is less than 30kb in total
> > size, though, and I don't see any error messages.
> >
> >> Another thing to try
> >> ====================
> >> Also try setting 'focr_verbose 2' in the config file, most messages
> >> report something like this..
> >
> > I get a lot of
> >
> > [2006-10-18 20:39:26] Debug mode: Set scansets to values:
> >                       $gocr -i -
> >                       $gocr -l 180 -d 2 -i -
> >
> > But only get messages like:
> >
> > [2006-10-18 16:17:11] Debug mode: Starting FuzzyOcr...
> > [2006-10-18 16:17:11] Debug mode: Attempting to load personal wordlist...
> > [2006-10-18 16:17:11] Debug mode: No personal wordlist found, skipping...
> > [2006-10-18 16:17:11] Debug mode: FuzzyOcr ending successfully...
> >
> > when I run the spamassassin test manually, not when it's run via
> > MailScanner :(
> >
> > The spam messages with inline GIFs are found by SARE_GIF_ATTACH, but
> > aren't scoring high enough to be marked.
> >
> > For example, a message that went through unmarked as spam, gets marked
> > as spam if I run spamassassin manually:
> >
> > spamassassin --debug -t <
> > /var/spool/MailScanner/quarantine/20061018/nonspam/k9IHujkc027719
> >
> > Hmmmm... it also gets a much higher score from this, as other tests also
> > seem to be missed when run from MailScanner...
> >
> > MailScanner score (1.508):
> >
> > 0.75    SARE_GIF_ATTACH    Email has a inline gif
> > 0.08    TW_DF    Odd Letter Triples with DF
> > 0.08    TW_GG    Odd Letter Triples with GG
> > 0.08    TW_GZ    Odd Letter Triples with GZ
> > 0.08    TW_RG    Odd Letter Triples with RG
> >
> > Manual spamassassin score (38.9):
> >
> > 3.8 HELO_DYNAMIC_IPADDR2   Relay HELO'd using suspicious hostname (IP
> > addr 2)
> >  1.1 EXTRA_MPART_TYPE       Header has extraneous Content-type:...type=
> > entry
> >  0.1 TW_GZ                  BODY: Odd Letter Triples with GZ
> >  0.1 TW_RG                  BODY: Odd Letter Triples with RG
> >  0.1 TW_GG                  BODY: Odd Letter Triples with GG
> >  0.1 TW_DF                  BODY: Odd Letter Triples with DF
> >  1.8 TVD_FW_GRAPHIC_NAME_LONG BODY: TVD_FW_GRAPHIC_NAME_LONG
> >  1.2 HTML_IMAGE_ONLY_20     BODY: HTML: images with 1600-2000 bytes of
> > words
> >  2.8 TVD_FW_GRAPHIC_ID1     BODY: TVD_FW_GRAPHIC_ID1
> >  0.0 HTML_MESSAGE           BODY: HTML included in message
> >  0.0 BAYES_50               BODY: Bayesian spam probability is 40 to
> > 60%   [score: 0.4908]
> >  0.8 SARE_GIF_ATTACH        FULL: Email has a inline gif
> >  2.0 RCVD_IN_SORBS_DUL      RBL: SORBS: sent directly from dynamic IP
> > address   [84.122.43.158 listed in dnsbl.sorbs.net]
> >  2.6 RCVD_IN_DSBL           RBL: Received via a relay in list.dsbl.org
> > [<http://dsbl.org/listing?84.122.43.158>]
> >  3.9 RCVD_IN_XBL            RBL: Received via a relay in Spamhaus XBL
> > [84.122.43.158 listed in sbl-xbl.spamhaus.org]
> >  1.7 SARE_GIF_STOX          Inline Gif with little HTML
> >   17 FUZZY_OCR              BODY: Mail contains an image with common
> > spam text inside
> >                             Words found:
> >                             "alert" in 3 lines
> >                             "news" in 1 lines
> >                             "alert" in 3 lines
> >                             "stock" in 1 lines
> >                             "investor" in 2 lines
> >                             "company" in 1 lines
> >                             "trade" in 1 lines
> >                             "service" in 1 lines
> >                             "levitra" in 2 lines
> >                             (15 word occurrences found)
> >
> You must have some permission problems, as I did the same thing, and got near
> identical scores (at least to the first decimal - 32.7 in spamassassin, 32.73
> in mailscanner).
> Maybe Julian can confirm if spamassassin called by mailscanner can still load
> plugins that have their loadplugin line in a .cf file instead of being called
> in a .pre file.. I seem to remember some sort of privilege change when
> spamassassin 3.0.0 or maybe 3.1.0 came out.
>
I think you are on to something there, Scott. I'll offer a guess...
Anthony, are you by any chance running Postfix? There is likely a
problem with the user your MTA (and hence MailScanner) runs as:
you don't get the network tests, Bayes etc., and that is the real
"killer" here.

Run your SA lint and a test message as the user your MTA runs as...
You'll likely see the same result you had in MailScanner then.
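
As a rough sketch of that check: the helper below only prints the two
commands so they can be reviewed before being run. It assumes sudo is
available; the function name sa_check_cmds is made up for illustration,
and the user name and message path are whatever applies on your system.

```shell
#!/bin/sh
# Hypothetical helper: print the SpamAssassin commands to run as the MTA user.
# $1 = user the MTA (and MailScanner) runs as, $2 = path to a saved message.
sa_check_cmds() {
    mta_user=$1
    msg=$2
    # sudo -H sets HOME to the target user's home directory, so SpamAssassin
    # looks for ~/.spamassassin where it would under MailScanner.
    printf 'sudo -u %s -H spamassassin --lint\n' "$mta_user"
    # -D turns on debug output, -t runs in test mode and appends the report.
    printf 'sudo -u %s -H spamassassin -D -t < %s\n' "$mta_user" "$msg"
}
```

For example, "sa_check_cmds postfix /path/to/message" prints the lint
and test-message commands for the postfix user. Note that the input
redirection is done by the calling shell, so the message file must be
readable by whoever runs the printed command.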

If you do run Postfix, make sure there is a writable SpamAssassin
State Dir set, and/or that you create ~/.spamassassin, ~/.razor and
~/.pyzor (as appropriate for the set of tools you use) for the postfix
user, and that that user can write to those directories. Also, make
sure you have (in local.cf or mailscanner.cf) a correct bayes_path
(which actually ends in the first fragment of the bayes filenames,
e.g. a path ending in /bayes for the files bayes_toks and bayes_seen)
and bayes_file_mode (it should be 0770, or similar... I need that
since I use MailWatch and let the webserver's group have write
perms...).
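
The directory part of this can be checked with something like the
following sketch. The function name check_sa_dirs is made up for
illustration, and which home directory to pass is an assumption that
depends on how the postfix user is set up on your system. It tests
writability for whoever runs it, so run it as the MTA user.

```shell
#!/bin/sh
# Hypothetical helper: report whether the per-user state directories
# exist under the given home directory and are writable by the caller.
# $1 = home directory of the MTA user.
check_sa_dirs() {
    home_dir=$1
    rc=0
    for d in .spamassassin .razor .pyzor; do
        if [ -d "$home_dir/$d" ] && [ -w "$home_dir/$d" ]; then
            echo "ok       $home_dir/$d"
        else
            echo "PROBLEM  $home_dir/$d (missing or not writable)"
            rc=1
        fi
    done
    # Non-zero exit status if any directory failed the check.
    return $rc
}
```

Drop .razor or .pyzor from the list if you don't use those tools. Any
line starting with PROBLEM points at a directory to create or to chown
to the MTA user.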

HtH
-- 
-- Glenn
email: glenn < dot > steen < at > gmail < dot > com
work: glenn < dot > steen < at > ap1 < dot > se

