Problems with some 'add-on' apps...

Tue Feb 6 15:43:37 CET 2007

I've inherited a system that's being used as a spam proxy/gateway. I'm
getting my head around it. Prior to getting this system we'd been
using some commercial stuff called Declude and Message Sniffer. We're
moving away from that in favor of something that's OSS and more
effective.

The system itself is using MailScanner (currently version 4.55.10) and
SpamAssassin 3.1.5 with a slew of add-ons, including FuzzyOcr, Rules
Du Jour, Pyzor, Razor, and a few other things, along with PostGrey 1.27
and PostFix 2.2.2.

I'm running into a couple of issues, and I see newer versions are out.
Normally I'd guess that upgrading is simply the answer, but that's
almost like saying the fix to /every/ Windows(r) problem is to reboot
(i.e., most of the time yes, but not every time).

For example.

Take FuzzyOcr. I turned its verbosity up to 3 (debug), and it doesn't
log any complaints about failing to find the images in the image spam
(stock alerts, etc.). But when I view the messages MailScanner (MS) is
catching in MailWatch, none of them show any FuzzyOcr hits.

I'm still catching a large number of these image spam messages in
quarantine (/var/spool/MailScanner/quarantine/[date]/spam), so I have
plenty to test with. I know how to use spamassassin -t < (messageid),
and it shows things like the FuzzyOcr hits. But is there a way to test
a message from MailScanner's point of view?
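To batch-test the quarantined samples from the command line, a rough Python sketch like the one below could run spamassassin -t over each file and flag which ones hit FUZZY_OCR. It assumes the quarantine layout <date>/spam/<messageid>, that spamassassin is on the PATH, and that the report table looks like the one quoted further down; the helper names are my own (hypothetical). Note that it only reproduces the command-line view, not MailScanner's own SpamAssassin call.

```python
# Hypothetical helper: batch-run `spamassassin -t` over quarantined spam.
# Assumes a <root>/<date>/spam/<messageid> layout and `spamassassin` on PATH.
import re
import subprocess
import sys
from pathlib import Path

# A hit in the report table: score (possibly negative), then the rule name.
# Wrapped continuation lines ("1)", "[score: 1.0000]", ...) do not match.
RULE_LINE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s+([A-Z0-9_]+)\b")


def quarantined_spam(root: Path) -> list[Path]:
    """Return every file under <root>/<date>/spam/, sorted by path."""
    if not root.is_dir():
        return []
    return sorted(p for p in root.glob("*/spam/*") if p.is_file())


def rule_hits(report: str) -> dict[str, float]:
    """Extract {rule: score} from a 'Content analysis details' table."""
    hits: dict[str, float] = {}
    in_table = False
    for line in report.splitlines():
        if line.startswith("----"):
            in_table = True            # dashed line under the column headers
        elif in_table and not line.strip():
            in_table = False           # a blank line ends the table
        elif in_table:
            m = RULE_LINE.match(line)
            if m:
                hits[m.group(2)] = float(m.group(1))
    return hits


def scan_message(path: Path) -> dict[str, float]:
    """Equivalent of `spamassassin -t < path`; returns the rules that fired."""
    with path.open("rb") as msg:
        result = subprocess.run(["spamassassin", "-t"], stdin=msg,
                                capture_output=True, check=True)
    return rule_hits(result.stdout.decode("utf-8", errors="replace"))


if __name__ == "__main__":
    root = Path(sys.argv[1] if len(sys.argv) > 1
                else "/var/spool/MailScanner/quarantine")
    for path in quarantined_spam(root):
        hits = scan_message(path)
        print(f"{path}: FUZZY_OCR={'FUZZY_OCR' in hits}, rules={len(hits)}")
```

Pass a different quarantine root as the first argument if yours lives elsewhere.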

Here's an example:

From the web GUI (MailWatch), on a message that contains image spam:

not cached, score=19.406, required 4, autolearn=spam

-0.18  BAYES_40                 Bayesian spam probability is 20 to 40%
 3.07  HELO_DYNAMIC_DHCP        Relay HELO'd using suspicious hostname (DHCP)
 4.20  HELO_DYNAMIC_IPADDR      Relay HELO'd using suspicious hostname (IP addr 1)
 0.50  HTML_40_50               Message is 40% to 50% HTML
 3.13  HTML_IMAGE_ONLY_08       HTML: images with 400-800 bytes of words
 0.00  HTML_MESSAGE             HTML included in message
 0.00  MIME_HTML_ONLY           Message only has text/html MIME parts
 1.56  RCVD_IN_BL_SPAMCOP_NET   Received via a relay in bl.spamcop.net
 2.05  RCVD_IN_SORBS_DUL        SORBS: sent directly from dynamic IP address
 3.90  RCVD_IN_XBL              Received via a relay in Spamhaus XBL
 1.20  TVD_FW_GRAPHIC_NAME_MID

Running the same message through spamassassin -t < (messageid) reports
the following:

Content analysis details:   (33.6 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 3.1 HELO_DYNAMIC_DHCP      Relay HELO'd using suspicious hostname (DHCP)
 4.2 HELO_DYNAMIC_IPADDR    Relay HELO'd using suspicious hostname (IP addr
                            1)
 0.5 HTML_40_50             BODY: Message is 40% to 50% HTML
 1.2 TVD_FW_GRAPHIC_NAME_MID BODY: TVD_FW_GRAPHIC_NAME_MID
 0.0 HTML_MESSAGE           BODY: HTML included in message
 3.1 HTML_IMAGE_ONLY_08     BODY: HTML: images with 400-800 bytes of words
 3.5 BAYES_99               BODY: Bayesian spam probability is 99 to 100%
                            [score: 1.0000]
 0.0 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
  10 FUZZY_OCR              BODY: Mail contains an image with common
                            spam text inside
                            Words found:
                            "buy" in 1 lines
                            "symbol" in 1 lines
                            "tuesday" in 1 lines
                            "news" in 2 lines
                            (7.5 word occurrences found)
 2.0 RCVD_IN_SORBS_DUL      RBL: SORBS: sent directly from dynamic IP address
                            [72.225.192.40 listed in dnsbl.sorbs.net]
 1.6 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
               [Blocked - see <http://www.spamcop.net/bl.shtml?72.225.192.40>]
 3.9 RCVD_IN_XBL            RBL: Received via a relay in Spamhaus XBL
                            [72.225.192.40 listed in zen.spamhaus.org]

Some things jump out at me. First, the command-line test says 5.0
points are required, but MailScanner is only looking for 4. Is this
because I'm running the CLI test of SA as root, so it's reading a
different prefs file? Second, the FUZZY_OCR hit appears in the CLI
test but not in the MailWatch/MailScanner result, even though
TVD_FW_GRAPHIC_NAME_MID and most of the other rules show up in both.
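To pin down exactly which hits differ, here is a small Python sketch that diffs the rule names from the two reports quoted above. The scores are transcribed by hand from the MailWatch and spamassassin -t output; the function name and the 0.051 tolerance (the CLI report rounds scores to one decimal place) are my own assumptions, not anything MailScanner or SpamAssassin provides.

```python
# Rule hits transcribed from the two reports above (rule name -> score).
MAILSCANNER = {
    "BAYES_40": -0.18, "HELO_DYNAMIC_DHCP": 3.07, "HELO_DYNAMIC_IPADDR": 4.20,
    "HTML_40_50": 0.50, "HTML_IMAGE_ONLY_08": 3.13, "HTML_MESSAGE": 0.00,
    "MIME_HTML_ONLY": 0.00, "RCVD_IN_BL_SPAMCOP_NET": 1.56,
    "RCVD_IN_SORBS_DUL": 2.05, "RCVD_IN_XBL": 3.90,
    "TVD_FW_GRAPHIC_NAME_MID": 1.20,
}
CLI = {
    "HELO_DYNAMIC_DHCP": 3.1, "HELO_DYNAMIC_IPADDR": 4.2, "HTML_40_50": 0.5,
    "TVD_FW_GRAPHIC_NAME_MID": 1.2, "HTML_MESSAGE": 0.0,
    "HTML_IMAGE_ONLY_08": 3.1, "BAYES_99": 3.5, "MIME_HTML_ONLY": 0.0,
    "FUZZY_OCR": 10.0, "RCVD_IN_SORBS_DUL": 2.0,
    "RCVD_IN_BL_SPAMCOP_NET": 1.6, "RCVD_IN_XBL": 3.9,
}


def diff_hits(ms: dict[str, float], cli: dict[str, float],
              tol: float = 0.051) -> tuple[list[str], list[str], list[str]]:
    """Return (only in MailScanner, only in CLI, in both but scored differently)."""
    only_ms = sorted(ms.keys() - cli.keys())
    only_cli = sorted(cli.keys() - ms.keys())
    # The CLI report rounds to one decimal, so ignore differences within ~0.05.
    changed = sorted(r for r in ms.keys() & cli.keys()
                     if abs(ms[r] - cli[r]) > tol)
    return only_ms, only_cli, changed


only_ms, only_cli, changed = diff_hits(MAILSCANNER, CLI)
print("Only in MailScanner:", ", ".join(only_ms) or "-")  # BAYES_40
print("Only in CLI:", ", ".join(only_cli) or "-")         # BAYES_99, FUZZY_OCR
print("Scored differently:", ", ".join(changed) or "-")   # -
```

Besides the missing FUZZY_OCR hit, this also shows the Bayes result differs between the two runs (BAYES_40 vs. BAYES_99).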

Thanks in advance for any help looking at this and getting it
straightened out.

Angelo