Maybe a bit OT, auto adjusting high scoring value..
David
dh at UPTIME.AT
Sun Mar 16 14:08:48 GMT 2003
Hello.
First of all let me explain my setup.
I have a "low" score of 5.3 and a "high" score of 13. High scoring spam
is deleted, but the message is still forwarded to me, so I can
check that it really is not a message that has some value to the user.
This is something we all agreed on.
Out of curiosity I collected 631 Spam messages, all verified by me to
be actual spam. Some of them are above the threshold of 13, others are
within the range of 5.3-13.
I have written a little Perl script which reads that mbox file, collects
all the spam scores and puts them into an array on which I can
perform some statistical operations using Statistics::Lite.
For me that returns:
Max Value: 31.7
Min Value: 5.3 (kinda expected)
Data Range: 26.4
Variance: 26.2935....
Std. Deviation: 5.0292...
Mean Score: 13.81410...
Median: 13.4
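
A minimal sketch of this kind of collection step, written in Python rather than the Perl script described above. The header name and the "score=" pattern are assumptions for illustration; adjust them to whatever your installation actually writes into the messages. All function names are hypothetical.

```python
import mailbox
import re
import statistics

# Assumption: the spam score is stored in this header as "score=<number>".
# Both the header name and the pattern are placeholders, not a documented format.
SCORE_HEADER = "X-MailScanner-SpamCheck"
SCORE_PATTERN = re.compile(r"score=(-?\d+(?:\.\d+)?)")


def extract_score(header_value):
    """Return the score found in a header value, or None if there is none."""
    if not header_value:
        return None
    match = SCORE_PATTERN.search(str(header_value))
    return float(match.group(1)) if match else None


def scores_from_mbox(path):
    """Collect one score per message that carries a parsable score header."""
    scores = []
    for message in mailbox.mbox(path):
        score = extract_score(message.get(SCORE_HEADER))
        if score is not None:
            scores.append(score)
    return scores


def summarize(scores):
    """Descriptive statistics of the kind listed above (needs >= 2 scores)."""
    return {
        "max": max(scores),
        "min": min(scores),
        "range": max(scores) - min(scores),
        "variance": statistics.variance(scores),  # sample variance (n - 1)
        "stddev": statistics.stdev(scores),       # sample standard deviation
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
    }
```

Calling summarize(scores_from_mbox("verified-spam.mbox")) (the file name is a placeholder) would give a dictionary with the same kind of figures as in the list above.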
Now to my questions, which I am posting to this list because I know there
are many talented mathematicians out there:
a) Does collecting data this way make sense?
b) Which statistical functions would make sense?
What I am trying to do is the following.
I am noticing that there is a LOT of verified spam in the range
from 5.3 to 13, and I am trying to find the best value for our
typical spam flow, one which will catch most verified spam and still allow
the rare false positives to pass through to the user. If you recall,
I delete the high scoring spam.
So basically I need to find the best value for "High scoring".
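
To make the goal concrete, here is a rough sketch (in Python, not my Perl script) that tabulates how much of the verified spam each candidate "High scoring" value would catch. The scores in it are made up for illustration and are not the 631 collected values, and the function names are hypothetical. It only looks at spam; it says nothing about false positives, for which scores of legitimate mail would be needed as well.

```python
def fraction_caught(scores, threshold):
    """Share of verified-spam scores at or above the threshold."""
    return sum(1 for s in scores if s >= threshold) / len(scores)


def threshold_for_catch_rate(scores, target):
    """Highest observed score that still catches at least `target` of the spam."""
    for candidate in sorted(set(scores), reverse=True):
        if fraction_caught(scores, candidate) >= target:
            return candidate
    return min(scores)


# Made-up scores for illustration only; not real collected data.
example_scores = [5.3, 6.1, 7.8, 9.0, 11.2, 13.4, 15.9, 18.3, 22.6, 31.7]

# Print the catch rate for a few candidate "High scoring" values.
for value in (8, 10, 13):
    print(value, fraction_caught(example_scores, value))
```

Lowering the value catches more of the spam, so the open question is how far it can go down before legitimate mail starts getting deleted.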
I would be very happy if you could tell me how to tackle this, because
I really know nothing about math and I think what I just did has little
to no value.
-d
- ❜ Imagination is more important than knowledge.❛ - Albert Einstein