FuzzyOcr working but not via MailScanner
Anthony Cartmell
ajcartmell at fonant.com
Wed Oct 18 21:05:25 IST 2006
> Answer 10: MailScanner by default only passes the first 30kb of the
> mail to SpamAssassin.
Interesting. Most of the spam in question is less than 30kb in total size,
though, and I don't see any error messages.
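A quick way to rule the size limit in or out for a given quarantined message is to compare its size on disk with the limit. This is only a sketch: the function name and the byte value are assumptions (30kb is taken here as 30 * 1024 bytes, and the limit actually configured may differ).

```python
import os

# Assumption: "30kb" is treated as 30 * 1024 bytes. The limit actually
# configured on a given installation may be a different value.
ASSUMED_LIMIT_BYTES = 30 * 1024


def exceeds_limit(path, limit=ASSUMED_LIMIT_BYTES):
    """Return True if the file at `path` is larger than `limit` bytes."""
    return os.path.getsize(path) > limit
```

Calling `exceeds_limit()` on the path of a quarantined message shows whether that message could have been truncated at all.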
> Another thing to try
> ====================
> Also try setting 'focr_verbose 2' in the config file, most messages
> report something like this..
I get a lot of these:
[2006-10-18 20:39:26] Debug mode: Set scansets to values:
$gocr -i -
$gocr -l 180 -d 2 -i -
But I only get messages like:
[2006-10-18 16:17:11] Debug mode: Starting FuzzyOcr...
[2006-10-18 16:17:11] Debug mode: Attempting to load personal wordlist...
[2006-10-18 16:17:11] Debug mode: No personal wordlist found, skipping...
[2006-10-18 16:17:11] Debug mode: FuzzyOcr ending successfully...
when I run the spamassassin test manually, not when it's run via
MailScanner :(
The spam messages with inline GIFs are found by SARE_GIF_ATTACH, but
aren't scoring high enough to be marked.
For example, a message that got through without being marked as spam does
get marked as spam if I run spamassassin manually:
spamassassin --debug -t < /var/spool/MailScanner/quarantine/20061018/nonspam/k9IHujkc027719
Hmmmm... it also gets a much higher score from this, as other tests also
seem to be missed when run from MailScanner...
MailScanner score (1.508):
0.75 SARE_GIF_ATTACH Email has a inline gif
0.08 TW_DF Odd Letter Triples with DF
0.08 TW_GG Odd Letter Triples with GG
0.08 TW_GZ Odd Letter Triples with GZ
0.08 TW_RG Odd Letter Triples with RG
Manual spamassassin score (38.9):
3.8 HELO_DYNAMIC_IPADDR2 Relay HELO'd using suspicious hostname (IP addr 2)
1.1 EXTRA_MPART_TYPE Header has extraneous Content-type:...type= entry
0.1 TW_GZ BODY: Odd Letter Triples with GZ
0.1 TW_RG BODY: Odd Letter Triples with RG
0.1 TW_GG BODY: Odd Letter Triples with GG
0.1 TW_DF BODY: Odd Letter Triples with DF
1.8 TVD_FW_GRAPHIC_NAME_LONG BODY: TVD_FW_GRAPHIC_NAME_LONG
1.2 HTML_IMAGE_ONLY_20 BODY: HTML: images with 1600-2000 bytes of words
2.8 TVD_FW_GRAPHIC_ID1 BODY: TVD_FW_GRAPHIC_ID1
0.0 HTML_MESSAGE BODY: HTML included in message
0.0 BAYES_50 BODY: Bayesian spam probability is 40 to 60% [score: 0.4908]
0.8 SARE_GIF_ATTACH FULL: Email has a inline gif
2.0 RCVD_IN_SORBS_DUL RBL: SORBS: sent directly from dynamic IP address [84.122.43.158 listed in dnsbl.sorbs.net]
2.6 RCVD_IN_DSBL RBL: Received via a relay in list.dsbl.org [<http://dsbl.org/listing?84.122.43.158>]
3.9 RCVD_IN_XBL RBL: Received via a relay in Spamhaus XBL [84.122.43.158 listed in sbl-xbl.spamhaus.org]
1.7 SARE_GIF_STOX Inline Gif with little HTML
17 FUZZY_OCR BODY: Mail contains an image with common spam text inside
Words found:
"alert" in 3 lines
"news" in 1 lines
"alert" in 3 lines
"stock" in 1 lines
"investor" in 2 lines
"company" in 1 lines
"trade" in 1 lines
"service" in 1 lines
"levitra" in 2 lines
(15 word occurrences found)
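To see at a glance which rules fire only in the manual run, the two reports can be compared mechanically. The following is a rough sketch rather than a proper report parser: the function names are made up for illustration, and it assumes each hit appears as "score RULE_NAME description" at the start of a line, as in the listings above.

```python
import re

# Assumed line shape: optional indent, a numeric score, then an upper-case
# rule name. Wrapped continuation lines ("2)", "[score: ...]") do not match.
RULE_LINE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s+([A-Z][A-Z0-9_]*)(?=\s|$)")


def parse_hits(report):
    """Map rule name -> score for every rule line found in `report`."""
    hits = {}
    for line in report.splitlines():
        match = RULE_LINE.match(line)
        if match:
            hits[match.group(2)] = float(match.group(1))
    return hits


def only_in_second(first_report, second_report):
    """Return the rules (with scores) that appear only in `second_report`."""
    first = parse_hits(first_report)
    second = parse_hits(second_report)
    return {name: score for name, score in second.items() if name not in first}
```

Passing the MailScanner report as the first argument and the manual report as the second returns the rules that were hit only in the manual run, which is the list worth investigating.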
Thanks for the ideas,
Anthony
--
www.fonant.com - Quality web sites