FuzzyOcr working but not via MailScanner

Anthony Cartmell ajcartmell at fonant.com
Wed Oct 18 21:05:25 IST 2006


> Answer 10: MailScanner by default only passes the first 30kb of the
> mail to SpamAssassin.

Interesting. Most of the spam in question is less than 30kb in total size,  
though, and I don't see any error messages.
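One way to check whether the quoted 30kb cap could matter for a given message is to compare the raw size of the quarantined file against it. Below is a minimal Python sketch. It assumes the cap applies to the raw message size in bytes. The thread doesn't say whether "30kb" means 30,000 or 30,720 bytes, so the `ASSUMED_LIMIT_BYTES` constant is a guess. `would_be_truncated` is a hypothetical helper, not part of MailScanner.

```python
import os
import sys

# ASSUMPTION: the quoted answer says MailScanner passes only the first 30kb
# to SpamAssassin. Whether "30kb" is 30,000 or 30,720 bytes is not stated,
# so this hypothetical default is a guess and the limit is a parameter.
ASSUMED_LIMIT_BYTES = 30 * 1024


def would_be_truncated(path, limit=ASSUMED_LIMIT_BYTES):
    """Return (size in bytes, True if the raw message exceeds the limit)."""
    size = os.path.getsize(path)
    return size, size > limit


if __name__ == "__main__":
    # Each command-line argument is a path to a raw message file.
    for msg_path in sys.argv[1:]:
        size, over = would_be_truncated(msg_path)
        status = "over the assumed limit" if over else "within the assumed limit"
        print(f"{msg_path}: {size} bytes ({status})")
```

For example, saving this as `check_size.py` (a hypothetical name) and running `python3 check_size.py /var/spool/MailScanner/quarantine/20061018/nonspam/k9IHujkc027719` prints that message's size and whether it exceeds the assumed cap.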

> Another thing to try
> ====================
> Also try setting 'focr_verbose 2' in the config file; most messages
> report something like this:

I get a lot of lines like:

[2006-10-18 20:39:26] Debug mode: Set scansets to values:
                       $gocr -i -
                       $gocr -l 180 -d 2 -i -

But I only get messages like:

[2006-10-18 16:17:11] Debug mode: Starting FuzzyOcr...
[2006-10-18 16:17:11] Debug mode: Attempting to load personal wordlist...
[2006-10-18 16:17:11] Debug mode: No personal wordlist found, skipping...
[2006-10-18 16:17:11] Debug mode: FuzzyOcr ending successfully...

when I run the spamassassin test manually, not when it's run via  
MailScanner :(

The spam messages with inline GIFs are found by SARE_GIF_ATTACH, but  
aren't scoring high enough to be marked.

For example, a message that went through MailScanner unmarked gets marked
as spam if I run spamassassin on it manually:

spamassassin --debug -t < /var/spool/MailScanner/quarantine/20061018/nonspam/k9IHujkc027719

Hmmmm... it also gets a much higher score this way, because other tests
also seem to be missed when it's run from MailScanner...

MailScanner score (1.508):

0.75	SARE_GIF_ATTACH	Email has a inline gif
0.08	TW_DF	Odd Letter Triples with DF
0.08	TW_GG	Odd Letter Triples with GG
0.08	TW_GZ	Odd Letter Triples with GZ
0.08	TW_RG	Odd Letter Triples with RG

Manual spamassassin score (38.9):

  3.8 HELO_DYNAMIC_IPADDR2   Relay HELO'd using suspicious hostname (IP addr 2)
  1.1 EXTRA_MPART_TYPE       Header has extraneous Content-type:...type= entry
  0.1 TW_GZ                  BODY: Odd Letter Triples with GZ
  0.1 TW_RG                  BODY: Odd Letter Triples with RG
  0.1 TW_GG                  BODY: Odd Letter Triples with GG
  0.1 TW_DF                  BODY: Odd Letter Triples with DF
  1.8 TVD_FW_GRAPHIC_NAME_LONG BODY: TVD_FW_GRAPHIC_NAME_LONG
  1.2 HTML_IMAGE_ONLY_20     BODY: HTML: images with 1600-2000 bytes of words
  2.8 TVD_FW_GRAPHIC_ID1     BODY: TVD_FW_GRAPHIC_ID1
  0.0 HTML_MESSAGE           BODY: HTML included in message
  0.0 BAYES_50               BODY: Bayesian spam probability is 40 to 60%
                             [score: 0.4908]
  0.8 SARE_GIF_ATTACH        FULL: Email has a inline gif
  2.0 RCVD_IN_SORBS_DUL      RBL: SORBS: sent directly from dynamic IP address
                             [84.122.43.158 listed in dnsbl.sorbs.net]
  2.6 RCVD_IN_DSBL           RBL: Received via a relay in list.dsbl.org
                             [<http://dsbl.org/listing?84.122.43.158>]
  3.9 RCVD_IN_XBL            RBL: Received via a relay in Spamhaus XBL
                             [84.122.43.158 listed in sbl-xbl.spamhaus.org]
  1.7 SARE_GIF_STOX          Inline Gif with little HTML
   17 FUZZY_OCR              BODY: Mail contains an image with common spam text inside
                             Words found:
                             "alert" in 3 lines
                             "news" in 1 lines
                             "alert" in 3 lines
                             "stock" in 1 lines
                             "investor" in 2 lines
                             "company" in 1 lines
                             "trade" in 1 lines
                             "service" in 1 lines
                             "levitra" in 2 lines
                             (15 word occurrences found)
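Comparing the two reports rule by rule makes the gap concrete. The following Python sketch copies the rule hits exactly as displayed above, so the manual-run scores are rounded to one decimal place. It lists the rules that fired only in the manual run, along with their combined score. The helper name `missing_rules` is illustrative. The sketch only compares the two reports and does not explain why MailScanner skipped those tests.

```python
# Rule hits copied from the two reports above, using the scores as displayed
# (the manual-run values are rounded to one decimal place).
MAILSCANNER_HITS = {
    "SARE_GIF_ATTACH": 0.75,
    "TW_DF": 0.08,
    "TW_GG": 0.08,
    "TW_GZ": 0.08,
    "TW_RG": 0.08,
}

MANUAL_HITS = {
    "HELO_DYNAMIC_IPADDR2": 3.8,
    "EXTRA_MPART_TYPE": 1.1,
    "TW_GZ": 0.1,
    "TW_RG": 0.1,
    "TW_GG": 0.1,
    "TW_DF": 0.1,
    "TVD_FW_GRAPHIC_NAME_LONG": 1.8,
    "HTML_IMAGE_ONLY_20": 1.2,
    "TVD_FW_GRAPHIC_ID1": 2.8,
    "HTML_MESSAGE": 0.0,
    "BAYES_50": 0.0,
    "SARE_GIF_ATTACH": 0.8,
    "RCVD_IN_SORBS_DUL": 2.0,
    "RCVD_IN_DSBL": 2.6,
    "RCVD_IN_XBL": 3.9,
    "SARE_GIF_STOX": 1.7,
    "FUZZY_OCR": 17.0,
}


def missing_rules(partial, full):
    """Rules that fired in `full` but not in `partial`, highest score first."""
    missing = {name: score for name, score in full.items() if name not in partial}
    # Sort by descending score, breaking ties alphabetically by rule name.
    return sorted(missing.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    missing = missing_rules(MAILSCANNER_HITS, MANUAL_HITS)
    for name, score in missing:
        print(f"{score:5.1f}  {name}")
    total = sum(score for _, score in missing)
    print(f"{total:5.1f}  total from rules that did not fire under MailScanner")
```

Run against these two reports, FUZZY_OCR on its own accounts for 17 of the missing points. The DNS blocklist (RBL) hits are also absent from the MailScanner run.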


Thanks for the ideas,

Anthony
-- 
www.fonant.com - Quality web sites

