Bayes still effective?
Mariano Absatz
el.baby at GMAIL.COM
Thu Jul 29 00:22:38 IST 2004
On Wed, 28 Jul 2004 15:31:58 -0700, John Rudd <jrudd at ucsc.edu> wrote:
> On Jul 28, 2004, at 3:16 PM, Raymond Dijkxhoorn wrote:
>
> > Hi!
> >
> >>>> This strikes me as a pretty effective way of circumventing bayes.
> >>>> Is it worth bothering with?
> >>>
> >>> It's VERY effective, still... today's stats:
> >>>
> >>> SpamAssassin tag hits: (top 100)
> >>> #1 202469 BAYES_99
> >>
> >> 202469 messages incorrectly marked as spam? Out of 10 billion messages
> >> per day?
> >
> > Out of 2M today so far.
> >
> >> (just saying, posting a list of hit rates doesn't say anything about
> >> the effectiveness of a particular tag ... it's almost meaningless,
> >> really)
> >
> > I don't share your feeling. We have good results with bayes and also
> > have a large test set to test for FPs. We do this for the SURBL
> > project, for example.
>
> You don't share the feeling that posting a single raw datum
> has no meaning? What does that have to do with your FP rate, or with
> the fact that you do it for SURBL?
>
> > But what exactly did you want to add to the discussion? I don't mind
> > postings like yours, but it would be nice if you could share your own
> > results instead of just commenting on other people's posts. That doesn't
> > help much.
>
> I think you've completely misunderstood what I was saying. I didn't
> say Bayes is meaningless nor ineffective. I didn't say anything one
> way or another about that. (for the record, I find bayes quite useful)
>
> What I was saying is that your post is meaningless without outside
> context like number of messages those hits match against, spam/ham
> ratio, hit/fp ratios for the Bayes_99 tag specifically, etc.
>
> The comments illustrate that: your raw number of Bayes_99 hits can be
> all FP's for all we know, because you didn't give any other data than
> raw number of tag hits (my first comment). Or (from my second
> comment), it could still be an incredibly small number of hits without
> knowing the total number of messages that go with it.
...
<SNIP>
Mmhhh... I think we have more a problem of form than of
matter, and I see both points.
I understand what John says about the lack of context in Raymond's
posting, and I agree with it, but I also think that the 'tone' that I,
at least, 'hear' in John's posting is quite rude... I can think of several
ways to ask precisely the same thing much more politely... I don't
mean that form should prevail over matter, but there are better and
worse ways of answering questions and asking for more data...
Raymond is a very frequent poster and, most of the time, he's helping
other people who know a lot less than he does, and IIRC he's usually
quite kind, even when he's replying to your average "it just doesn't
work" statement that doesn't even mention the software version or other
relevant data.
I participate in several mailing lists and feel much more at
home in lists like this one, where people are either kind or silent,
than in lists like, say, the one for djbdns, where you are usually
shouted at whenever you ask something that 'the wiser ones' consider
either irrelevant or plain stupid...
So, let's try to keep our tone civilized and go ahead with the
discussion... As I'm not using bayes, I'll keep reading
this thread silently...
Good evening, ladies and gentlemen.
:-)
--
Mariano Absatz - El Baby
el (dot) baby (AT) gmail (dot) com
el (punto) baby (ARROBA:@) gmail (punto) com
-------------------------- MailScanner list ----------------------
To leave, send leave mailscanner to jiscmail at jiscmail.ac.uk
Before posting, please see the Most Asked Questions at
http://www.mailscanner.biz/maq/ and the archives at
http://www.jiscmail.ac.uk/lists/mailscanner.html