crm question

Koopmann, Jan-Peter jan-peter at
Wed Dec 19 23:51:48 GMT 2007


I set up CRM114 today, and two things have me wondering:

1.	I see a lot of learned documents in spam.css, more than the number of
spams that actually scored above the spam threshold.
2.	The accuracy right now is poor:

Id:1J4v83-000EIG-MntSA Score:50.184 CRM114 Score:-0.15 
Id:1J4vSG-000Eng-8jtSA Score:7.415 CRM114 Score:-0.56 
Id:1J4vVR-000Esm-GHtSA Score:8.595 CRM114 Score:-0.10 
Id:1J4vZO-000Eva-MOtSA Score:14.439 CRM114 Score:-0.33 
Id:1J4vah-000ExY-GOtSA Score:21.129 CRM114 Score:14.96 
Id:1J4vbt-000F0R-FqtSA Score:6.463 CRM114 Score:-1.52 
Id:1J4vgb-000F5b-QGtSA Score:6.401 CRM114 Score:-0.40 
Id:1J4vgf-000F5Q-JrtSA Score:6.439 CRM114 Score:-0.36 
Id:1J4vgf-000F5S-HHtSA Score:6.401 CRM114 Score:-0.40 
Id:1J4vsE-000FNu-1ktSA Score:7.616 CRM114 Score:-0.05 
Id:1J4w68-000FeL-HwtSA Score:10.054 CRM114 Score:-0.09 
Id:1J4xFR-000Hpa-CatSA Score:12.041 CRM114 Score:-0.33 
Id:1J4xqF-000Ipa-0itSA Score:46.701 CRM114 Score:-0.22 
Id:1J4y6U-000JJk-PhtSA Score:47.709 CRM114 Score:0.18 
Id:1J4y8u-000JQj-CWtSA Score:10.574 CRM114 Score:0.46 
Id:1J4yLX-000Jjj-HPtSA Score:39.997 CRM114 Score:0.90 
Id:1J4yjL-000KTm-BXtSA Score:24.844 CRM114 Score:0.36 
Id:1J4yjB-000KTa-94tSA Score:11.199 CRM114 Score:-0.02 
Id:1J4ywH-000L0s-4VtSA Score:17.607 CRM114 Score:-0.03 
Id:1J4yyY-000L5j-6RtSA Score:20.166 CRM114 Score:-0.29 
Id:1J4z0m-000LC5-JmtSA Score:18.519 CRM114 Score:-0.40 
Id:1J4zRY-000M1J-UrtSA Score:7.77 CRM114 Score:0.65 
Id:1J4zhl-000Mbf-8htSA Score:11.473 CRM114 Score:0.47 
Id:1J4znT-000MlI-UOtSA Score:12.217 CRM114 Score:2.63 
Id:1J4zqC-000MvI-9wtSA Score:31.898 CRM114 Score:-0.11 
Id:1J4zrk-000N1K-HztSA Score:21.626 CRM114 Score:0.73 
Id:1J506N-000NWJ-4KtSA Score:35.986 CRM114 Score:0.75 
Id:1J50eF-000Ocu-3UtSA Score:8.262 CRM114 Score:-0.48 
Id:1J51He-000PxA-UotSA Score:21.383 CRM114 Score:0.81 
Id:1J51WW-0000Qq-SutSA Score:7.187 CRM114 Score:-0.40 
Id:1J525Y-0001I4-IktSA Score:43.963 CRM114 Score:1.06 
Id:1J52Aw-0001QU-0ctSA Score:40.391 CRM114 Score:0.66 
Id:1J52Sj-0001wY-8btSA Score:25.329 CRM114 Score:0.66 
Id:1J54Ir-0004JM-IvtSA Score:18.02 CRM114 Score:0.02 
Id:1J54ph-0005HH-TktSA Score:30.694 CRM114 Score:0.81 
Id:1J54th-0005Ph-IZtSA Score:28.211 CRM114 Score:1.05 
Id:1J55q8-0006Zw-VptSA Score:46.701 CRM114 Score:-0.22 
Id:1J55qB-0006bS-VPtSA Score:36.446 CRM114 Score:0.65 
Id:1J561i-0006p1-3itSA Score:18.866 CRM114 Score:0.36 
Id:1J56cJ-0007at-M1tSA Score:21.154 CRM114 Score:0.48 
Id:1J57fa-0008qe-BHtSA Score:15.118 CRM114 Score:0.09
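
To put numbers on the list above rather than eyeballing it, here is a minimal Python sketch that parses lines of this shape and counts how the CRM114 scores fall by sign. The function names and the `sa_threshold` default of 6.0 are my own assumptions for illustration, not MailScanner or CRM114 settings, and the sketch deliberately does not decide which sign means spam. The Id field is simply taken as everything up to the first whitespace.

```python
import re

# Matches lines shaped like the listing above:
#   Id:<token> Score:<SpamAssassin score> CRM114 Score:<CRM114 score>
# An optional "SA " in front of "Score:" is tolerated as well.
LINE_RE = re.compile(
    r"Id:(?P<id>\S+)\s+(?:SA\s+)?Score:(?P<sa>-?\d+(?:\.\d+)?)\s+"
    r"CRM114 Score:(?P<crm>-?\d+(?:\.\d+)?)"
)


def parse_scores(text):
    """Return a list of (id, sa_score, crm_score) tuples, one per matching line."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            rows.append((m.group("id"), float(m.group("sa")), float(m.group("crm"))))
    return rows


def summarise(rows, sa_threshold=6.0):
    """Count CRM114 score signs among messages at or above sa_threshold.

    sa_threshold is a hypothetical cut-off chosen for this sketch.
    """
    flagged = [r for r in rows if r[1] >= sa_threshold]
    return {
        "total": len(rows),
        "sa_flagged": len(flagged),
        "crm_negative": sum(1 for r in flagged if r[2] < 0),
        "crm_positive": sum(1 for r in flagged if r[2] > 0),
    }


if __name__ == "__main__":
    sample = (
        "Id:1J4v83-000EIG-MntSA Score:50.184 CRM114 Score:-0.15\n"
        "Id:1J4vah-000ExY-GOtSA Score:21.129 CRM114 Score:14.96\n"
        "Id:1J4y6U-000JJk-PhtSA Score:47.709 CRM114 Score:0.18\n"
    )
    print(summarise(parse_scores(sample)))
```

Feeding the full listing into `parse_scores` and looking at the negative and positive counts side by side should show whether the CRM114 scores are scattered around zero or lean one way.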

Is this because so few documents have been learned on my server so far?

proxy:/server-root/spamlearn/crm # cssutil -b -r spam.css 

 Sparse spectra file spam.css statistics: 

 Total available buckets          :      1048577 
 Total buckets in use             :        26847  
 Total in-use zero-count buckets  :            0  
 Total buckets with value >= max  :            0  
 Total hashed datums in file      :        30840
 Documents learned                :          645  
 Features learned                 :        30841  
 Average datums per bucket        :         1.15
 Maximum length of overflow chain :            4  
 Average length of overflow chain :         1.04 
 Average packing density          :         0.03

proxy:/server-root/spamlearn/crm # cssutil -b -r nonspam.css 

 Sparse spectra file nonspam.css statistics: 

 Total available buckets          :      1048577 
 Total buckets in use             :        55127  
 Total in-use zero-count buckets  :            0  
 Total buckets with value >= max  :            0  
 Total hashed datums in file      :        62625
 Documents learned                :          666  
 Features learned                 :        62626  
 Average datums per bucket        :         1.14
 Maximum length of overflow chain :            5  
 Average length of overflow chain :         1.08 
 Average packing density          :         0.05
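
For comparing the two files, here is a small Python sketch that reads `cssutil` output like the above and derives a few ratios from the raw counts. The function names and dictionary keys are my own. The packing density and datums-per-bucket ratios seem to match the reported values once rounded to two places, but that is only an observation from these numbers, not a statement about how cssutil computes them.

```python
import re

# One "label : number" line of cssutil output, e.g.
#   " Total buckets in use             :        26847  "
STAT_RE = re.compile(r"\s*(.+?)\s*:\s*(-?\d+(?:\.\d+)?)\s*$")


def parse_cssutil(text):
    """Map each 'label : value' line to a float; other lines are ignored."""
    stats = {}
    for line in text.splitlines():
        m = STAT_RE.match(line)
        if m:
            stats[m.group(1)] = float(m.group(2))
    return stats


def derived(stats):
    """Ratios computed from the raw counts (names are illustrative)."""
    in_use = stats["Total buckets in use"]
    return {
        "packing_density": in_use / stats["Total available buckets"],
        "datums_per_bucket": stats["Total hashed datums in file"] / in_use,
        "features_per_document": stats["Features learned"] / stats["Documents learned"],
    }


if __name__ == "__main__":
    spam_stats = """
 Total available buckets          :      1048577
 Total buckets in use             :        26847
 Total hashed datums in file      :        30840
 Documents learned                :          645
 Features learned                 :        30841
"""
    for name, value in derived(parse_cssutil(spam_stats)).items():
        print(f"{name}: {value:.2f}")
```

Run against both files, the features-per-document ratio comes out at roughly 48 for spam.css and roughly 94 for nonspam.css, so the two files hold quite different amounts of learned data despite the similar document counts.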

Will this improve automatically or is there something wrong with my

Kind regards,

More information about the MailScanner mailing list