Bayes still effective?
John Rudd
jrudd at UCSC.EDU
Wed Jul 28 23:31:58 IST 2004
On Jul 28, 2004, at 3:16 PM, Raymond Dijkxhoorn wrote:
> Hi!
>
>>>> This strikes me as a pretty effective way of circumventing bayes. Is
>>>> worth bothering with?
>>>
>>> Its VERY effective, still... todays stats:
>>>
>>> SpamAssassin tag hits: (top 100)
>>> #1 202469 BAYES_99
>>
>> 202469 messages incorrectly marked as spam? out of 10 billion messages
>> per day?
>
> Out of 2M today so far.
>
>> (just saying, posting a list of hit rates doesn't say anything about
>> the effectiveness of a particular tag ... it's almost meaningless,
>> really)
>
> I dont share your feeling. We have good results with bayes and also have a
> large test set to test on FP's. We for example do this for the SURBL
> project.

You don't share the feeling that a post consisting of a single list of
one raw datum has no meaning? What does that have to do with your FP
rate, or with the fact that you do it for SURBL?

> But what exactly did you want to add to the discussion ? I dont mind
> postings like you did, but it would be nice if you could share your own
> results instead of just commenting on other peoples posts. Doesnt help
> much.

I think you've completely misunderstood what I was saying. I didn't
say Bayes is meaningless or ineffective. I didn't say anything one
way or the other about that. (For the record, I find Bayes quite useful.)

What I was saying is that your post is meaningless without outside
context, like the number of messages those hits are matched against, the
spam/ham ratio, the hit/FP ratios for the BAYES_99 tag specifically, etc.

The comments illustrate that: your raw number of BAYES_99 hits could be
all FPs for all we know, because you didn't give any data other than the
raw number of tag hits (my first comment). Or (from my second
comment), it could still be an incredibly small number of hits; there is
no way to tell without knowing the total number of messages that goes
with it.
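
To make that concrete, here is a minimal Python sketch of the arithmetic. Only the 202469 figure comes from the thread. The 2,000,000 total is a rounding of "2M today so far", the 10 billion total is the deliberately absurd figure from my earlier comment, and the spam/ham splits are purely hypothetical numbers chosen to show the two extremes. The function names are made up for illustration and are not part of SpamAssassin or MailScanner.

```python
def hit_rate(tag_hits, total_messages):
    """Fraction of all scanned messages that triggered the tag."""
    return tag_hits / total_messages


def tag_precision(spam_hits, ham_hits):
    """Fraction of the tag's hits that landed on real spam.

    ham_hits are the false positives (FPs) for the tag.
    """
    return spam_hits / (spam_hits + ham_hits)


BAYES_99_HITS = 202469  # raw count quoted in the thread

# Same raw count, two different totals: the meaning changes completely.
rate_2m = hit_rate(BAYES_99_HITS, 2_000_000)        # "2M today so far" (rounded)
rate_10b = hit_rate(BAYES_99_HITS, 10_000_000_000)  # the hypothetical 10 billion

# Same raw count, two HYPOTHETICAL spam/ham splits (not real data).
all_fp = tag_precision(spam_hits=0, ham_hits=BAYES_99_HITS)
mostly_spam = tag_precision(spam_hits=202_000, ham_hits=469)

print(f"hit rate out of 2M:   {rate_2m:.4%}")
print(f"hit rate out of 10B:  {rate_10b:.6%}")
print(f"precision if all FPs: {all_fp:.2%}")
print(f"precision if 469 FPs: {mostly_spam:.2%}")
```

The raw count is identical in every line of output; only the surrounding context (total volume and the FP split) tells you whether the tag is doing anything useful.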

It's only in your follow-up messages that we get more context that
starts to support the conclusion you drew from the data. But without
that context, the first message (which just listed your top 100 tags and
hit counts) is meaningless. It doesn't say anything, one way or the
other, about the effectiveness of Bayes.

(And I suppose it might be true that I didn't add anything to the
discussion about whether or not Bayes is effective ... but a) with your
first post, neither did you (for the reasons I gave), and b) my post
was about the information content of raw data, which is more of a
meta-discussion, and one that was more or less demanded by a post that
tried to support a conclusion with data that carried no meaning.)