FuzzyOcr working but not via MailScanner
Glenn Steen
glenn.steen at gmail.com
Thu Oct 19 08:52:45 IST 2006
On 18/10/06, Scott Silva <ssilva at sgvwater.com> wrote:
> Anthony Cartmell spake the following on 10/18/2006 1:05 PM:
> >> Answer 10: MailScanner by default only passes the first 30kb of the
> >> mail to SpamAssassin.
> >
> > Interesting. Most of the spam in question is less than 30kb in total
> > size, though, and I don't see any error messages.
> >
> >> Another thing to try
> >> ====================
> >> Also try setting 'focr_verbose 2' in the config file, most messages
> >> report something like this..
> >
> > I get a lot of
> >
> > [2006-10-18 20:39:26] Debug mode: Set scansets to values:
> > $gocr -i -
> > $gocr -l 180 -d 2 -i -
> >
> > But only get messages like:
> >
> > [2006-10-18 16:17:11] Debug mode: Starting FuzzyOcr...
> > [2006-10-18 16:17:11] Debug mode: Attempting to load personal wordlist...
> > [2006-10-18 16:17:11] Debug mode: No personal wordlist found, skipping...
> > [2006-10-18 16:17:11] Debug mode: FuzzyOcr ending successfully...
> >
> > when I run the spamassassin test manually, not when it's run via
> > MailScanner :(
> >
> > The spam messages with inline GIFs are found by SARE_GIF_ATTACH, but
> > aren't scoring high enough to be marked.
> >
> > For example, a message that went through unmarked as spam, gets marked
> > as spam if I run spamassassin manually:
> >
> > spamassassin --debug -t <
> > /var/spool/MailScanner/quarantine/20061018/nonspam/k9IHujkc027719
> >
> > Hmmmm... it also gets a much higher score from this, as other tests also
> > seem to be missed when run from MailScanner...
> >
> > MailScanner score (1.508):
> >
> > 0.75 SARE_GIF_ATTACH Email has a inline gif
> > 0.08 TW_DF Odd Letter Triples with DF
> > 0.08 TW_GG Odd Letter Triples with GG
> > 0.08 TW_GZ Odd Letter Triples with GZ
> > 0.08 TW_RG Odd Letter Triples with RG
> >
> > Manual spamassassin score (38.9):
> >
> > 3.8 HELO_DYNAMIC_IPADDR2 Relay HELO'd using suspicious hostname (IP
> > addr 2)
> > 1.1 EXTRA_MPART_TYPE Header has extraneous Content-type:...type=
> > entry
> > 0.1 TW_GZ BODY: Odd Letter Triples with GZ
> > 0.1 TW_RG BODY: Odd Letter Triples with RG
> > 0.1 TW_GG BODY: Odd Letter Triples with GG
> > 0.1 TW_DF BODY: Odd Letter Triples with DF
> > 1.8 TVD_FW_GRAPHIC_NAME_LONG BODY: TVD_FW_GRAPHIC_NAME_LONG
> > 1.2 HTML_IMAGE_ONLY_20 BODY: HTML: images with 1600-2000 bytes of
> > words
> > 2.8 TVD_FW_GRAPHIC_ID1 BODY: TVD_FW_GRAPHIC_ID1
> > 0.0 HTML_MESSAGE BODY: HTML included in message
> > 0.0 BAYES_50 BODY: Bayesian spam probability is 40 to
> > 60% [score: 0.4908]
> > 0.8 SARE_GIF_ATTACH FULL: Email has a inline gif
> > 2.0 RCVD_IN_SORBS_DUL RBL: SORBS: sent directly from dynamic IP
> > address [84.122.43.158 listed in dnsbl.sorbs.net]
> > 2.6 RCVD_IN_DSBL RBL: Received via a relay in list.dsbl.org
> > [<http://dsbl.org/listing?84.122.43.158>]
> > 3.9 RCVD_IN_XBL RBL: Received via a relay in Spamhaus XBL
> > [84.122.43.158 listed in sbl-xbl.spamhaus.org]
> > 1.7 SARE_GIF_STOX Inline Gif with little HTML
> > 17 FUZZY_OCR BODY: Mail contains an image with common
> > spam text inside
> > Words found:
> > "alert" in 3 lines
> > "news" in 1 lines
> > "alert" in 3 lines
> > "stock" in 1 lines
> > "investor" in 2 lines
> > "company" in 1 lines
> > "trade" in 1 lines
> > "service" in 1 lines
> > "levitra" in 2 lines
> > (15 word occurrences found)
> >
> You must have some permission problems, as I did the same thing and got
> near-identical scores (at least to the first decimal: 32.7 in spamassassin,
> 32.73 in MailScanner).
> Maybe Julian can confirm if spamassassin called by mailscanner can still load
> plugins that have their loadplugin line in a .cf file instead of being called
> in a .pre file.. I seem to remember some sort of privilege change when
> spamassassin 3.0.0 or maybe 3.1.0 came out.
>
I think you are on to something there, Scott. I'll offer a guess...
Anthony, are you by any chance running Postfix? There is likely a
permissions problem for the user your MTA (and hence MailScanner) runs
as. You don't get the network tests, Bayes etc., and that is the real
"killer" here.
Check your SA lint and a test message as the user you have for your
MTA... You'll likely see the same result you had in MailScanner then.
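
Something along these lines should do for that check. It is only a sketch: it assumes the MTA user is called postfix and that sudo is available on the box, and sa_as_mta is just a helper name made up for illustration.

```shell
#!/bin/sh
# Assumption: the MTA (and hence MailScanner) runs as "postfix".
# Set MTA_USER in the environment if your setup differs.
MTA_USER="${MTA_USER:-postfix}"

# Run spamassassin as the MTA user instead of as root or yourself.
# -H makes sudo set HOME to that user's home directory, so the per-user
# state (~/.spamassassin and friends) is looked up where the MTA user
# would look for it.
sa_as_mta() {
    sudo -u "$MTA_USER" -H spamassassin "$@"
}
```

With that defined, `sa_as_mta --lint` does the lint, and `sa_as_mta --debug -t < /var/spool/MailScanner/quarantine/20061018/nonspam/k9IHujkc027719` reruns the test message from above. The redirect is opened by your own shell, so you need read access to the quarantined file yourself.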
If you do run Postfix, make sure there is a writable SpamAssassin
User State Dir set, and/or that you create ~/.spamassassin, ~/.razor and
~/.pyzor (as appropriate for the set of tools you use) for the postfix
user, and that that user can write to those directories. Also, make
sure you have (in local.cf or mailscanner.cf) a correct bayes_path
(which actually ends in the first fragment of the Bayes filenames, not
in a directory name) and a suitable bayes_file_mode (it should be 0770,
or similar... I need that since I use MailWatch and let the web
server's group have write perms...).
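
Creating the directories could look roughly like this. Again a sketch under assumptions: make_state_dirs is a made-up helper name, the 0770 mode simply mirrors the group-writable setup mentioned above, and you pass in whatever home directory and owner apply on your system.

```shell
#!/bin/sh
# make_state_dirs HOME_DIR [OWNER]
# Creates the per-user state directories under HOME_DIR and makes them
# writable for user and group. If OWNER (e.g. "postfix" or
# "postfix:apache", both hypothetical here) is given, ownership is
# changed too, which normally requires root.
make_state_dirs() {
    home_dir="$1"
    owner="${2:-}"
    for d in .spamassassin .razor .pyzor; do
        mkdir -p "$home_dir/$d" || return 1
        chmod 0770 "$home_dir/$d" || return 1
        if [ -n "$owner" ]; then
            chown "$owner" "$home_dir/$d" || return 1
        fi
    done
}

# Reminder on bayes_path: the last component is a filename prefix.
# With a (hypothetical) setting of
#     bayes_path /some/dir/bayes
# SpamAssassin uses files such as /some/dir/bayes_toks and
# /some/dir/bayes_seen, so /some/dir must be writable by the MTA user.
```

Run as root, it would be called with the postfix user's real home directory as the first argument and the owner as the second; only create the directories for the tools you actually use.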
HtH
--
-- Glenn
email: glenn < dot > steen < at > gmail < dot > com
work: glenn < dot > steen < at > ap1 < dot > se