FuzzyOcr working but not via MailScanner
Scott Silva
ssilva at sgvwater.com
Wed Oct 18 22:10:27 IST 2006
Anthony Cartmell spake the following on 10/18/2006 1:05 PM:
>> Answer 10: MailScanner by default only passes the first 30kb of the
>> mail to SpamAssassin.
>
> Interesting. Most of the spam in question is less than 30kb in total
> size, though, and I don't see any error messages.
>
>> Another thing to try
>> ====================
>> Also try setting 'focr_verbose 2' in the config file, most messages
>> report something like this..
>
> I get a lot of
>
> [2006-10-18 20:39:26] Debug mode: Set scansets to values:
> $gocr -i -
> $gocr -l 180 -d 2 -i -
>
> But only get messages like:
>
> [2006-10-18 16:17:11] Debug mode: Starting FuzzyOcr...
> [2006-10-18 16:17:11] Debug mode: Attempting to load personal wordlist...
> [2006-10-18 16:17:11] Debug mode: No personal wordlist found, skipping...
> [2006-10-18 16:17:11] Debug mode: FuzzyOcr ending successfully...
>
> when I run the spamassassin test manually, not when it's run via
> MailScanner :(
>
> The spam messages with inline GIFs are found by SARE_GIF_ATTACH, but
> aren't scoring high enough to be marked.
>
> For example, a message that went through unmarked as spam, gets marked
> as spam if I run spamassassin manually:
>
> spamassassin --debug -t < /var/spool/MailScanner/quarantine/20061018/nonspam/k9IHujkc027719
>
> Hmmmm... it also gets a much higher score from this, as other tests also
> seem to be missed when run from MailScanner...
>
> MailScanner score (1.508):
>
> 0.75 SARE_GIF_ATTACH Email has a inline gif
> 0.08 TW_DF Odd Letter Triples with DF
> 0.08 TW_GG Odd Letter Triples with GG
> 0.08 TW_GZ Odd Letter Triples with GZ
> 0.08 TW_RG Odd Letter Triples with RG
>
> Manual spamassassin score (38.9):
>
> 3.8 HELO_DYNAMIC_IPADDR2 Relay HELO'd using suspicious hostname (IP
> addr 2)
> 1.1 EXTRA_MPART_TYPE Header has extraneous Content-type:...type=
> entry
> 0.1 TW_GZ BODY: Odd Letter Triples with GZ
> 0.1 TW_RG BODY: Odd Letter Triples with RG
> 0.1 TW_GG BODY: Odd Letter Triples with GG
> 0.1 TW_DF BODY: Odd Letter Triples with DF
> 1.8 TVD_FW_GRAPHIC_NAME_LONG BODY: TVD_FW_GRAPHIC_NAME_LONG
> 1.2 HTML_IMAGE_ONLY_20 BODY: HTML: images with 1600-2000 bytes of
> words
> 2.8 TVD_FW_GRAPHIC_ID1 BODY: TVD_FW_GRAPHIC_ID1
> 0.0 HTML_MESSAGE BODY: HTML included in message
> 0.0 BAYES_50 BODY: Bayesian spam probability is 40 to
> 60% [score: 0.4908]
> 0.8 SARE_GIF_ATTACH FULL: Email has a inline gif
> 2.0 RCVD_IN_SORBS_DUL RBL: SORBS: sent directly from dynamic IP
> address [84.122.43.158 listed in dnsbl.sorbs.net]
> 2.6 RCVD_IN_DSBL RBL: Received via a relay in list.dsbl.org
> [<http://dsbl.org/listing?84.122.43.158>]
> 3.9 RCVD_IN_XBL RBL: Received via a relay in Spamhaus XBL
> [84.122.43.158 listed in sbl-xbl.spamhaus.org]
> 1.7 SARE_GIF_STOX Inline Gif with little HTML
> 17 FUZZY_OCR BODY: Mail contains an image with common
> spam text inside
> Words found:
> "alert" in 3 lines
> "news" in 1 lines
> "alert" in 3 lines
> "stock" in 1 lines
> "investor" in 2 lines
> "company" in 1 lines
> "trade" in 1 lines
> "service" in 1 lines
> "levitra" in 2 lines
> (15 word occurrences found)
>
You must have some permission problems, as I did the same thing and got
near-identical scores (at least to the first decimal: 32.7 in spamassassin,
32.73 in MailScanner).
Maybe Julian can confirm whether SpamAssassin, when called by MailScanner, can
still load plugins whose loadplugin line is in a .cf file rather than a .pre
file. I seem to remember some sort of privilege change when SpamAssassin 3.0.0
or maybe 3.1.0 came out.
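
One quick way to see where each loadplugin line actually lives is to list
them per file. This is only a minimal sketch for a POSIX shell: the function
name list_loadplugin is hypothetical, and the directory you pass in is
whatever site configuration directory your install uses (commonly
/etc/mail/spamassassin, but that is an assumption, not something confirmed
in this thread).

```shell
# Hypothetical helper: print every loadplugin line found in the .cf and
# .pre files of one directory, prefixed with the file it came from.
list_loadplugin() {
    dir="$1"
    for f in "$dir"/*.cf "$dir"/*.pre; do
        # Skip unmatched globs and anything that is not a regular file.
        [ -f "$f" ] || continue
        # -H forces the file name prefix even for a single file;
        # "|| true" keeps a file with no match from counting as a failure.
        grep -H -E '^[[:space:]]*loadplugin[[:space:]]' "$f" || true
    done
}
```

Running `list_loadplugin /etc/mail/spamassassin` (path assumed) shows at a
glance whether the FuzzyOcr plugin is loaded from a .cf or a .pre file. It
does not test permissions; for that, the files would still need to be
checked as the user MailScanner runs under.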
--
MailScanner is like deodorant...
You hope everybody uses it, and
you notice quickly if they don't!!!!