Found nn messages in the processing-messages database

Fri Apr 17 16:25:03 IST 2009

> -----Original Message-----
> From: mailscanner-bounces at lists.mailscanner.info 
> [mailto:mailscanner-bounces at lists.mailscanner.info] On Behalf 
> Of Julian Field
> Sent: Friday, April 17, 2009 10:49 AM
> To: MailScanner discussion
> Subject: Re: Found nn messages in the processing-messages database
> 
> 
> 
> On 17/04/2009 15:29, Kai Schaetzl wrote:
> >
> > Julian Field wrote on Fri, 17 Apr 2009 14:00:16 +0100:
> >
> >    
[...]
> >> It just occurred to me that the processing database won't work well
> >> with Postfix at all.
> >> Postfix re-uses the message id numbers too fast for them to be
> >> considered good ids, which is why I have to add a random number to
> >> the end.
> >> But I add a new random number to the end every time I pick up the
> >> message, so every time it sees the same message it will create
> >> different message ids for it.
> >> So the processing database idea breaks down :-(
> >> Poo :-(
> >>      
> > But this shouldn't be the case here, right? And I don't see evidence
> > that it reuses these ids "too fast". I can't see a reuse for these
> > ids for the whole month.
> >    
> It certainly used to, that's why I had to add the random characters on
> the end. I had more messages coming in with the same id in high load
> environments, so had to produce a better unique key for each message. I
> don't want to change that, as it was very definitely necessary at the
> time and so is still necessary on some systems. Maybe ones with small
> queue filesystems?
> 

Rather than a random number, wouldn't a file checksum or MD5 hash work just
as well against the inode reuse issue, while still letting you see that it's
the same message as one you picked up before? I mean, the chance that a
different message with the exact same hash/checksum ends up attached to the
exact same inode has to be vanishingly small, I would think.
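
A minimal sketch of what I mean, in Python rather than MailScanner's Perl just
to keep it short. The function name, the "queue_id.suffix" layout and the
8-character suffix length are my own assumptions for illustration, not what
MailScanner actually does:

```python
import hashlib


def stable_message_id(queue_id: str, content: bytes, digest_chars: int = 8) -> str:
    """Build an id from the queue id plus a content-derived suffix.

    Hypothetical helper: unlike a random suffix, the same queue id with the
    same message content always yields the same id, so a message picked up
    a second time can be recognised.
    """
    digest = hashlib.md5(content).hexdigest()
    return f"{queue_id}.{digest[:digest_chars]}"


if __name__ == "__main__":
    first = stable_message_id("4F2A1C", b"Subject: test\n\nhello\n")
    again = stable_message_id("4F2A1C", b"Subject: test\n\nhello\n")
    other = stable_message_id("4F2A1C", b"Subject: test\n\ngoodbye\n")
    print(first == again)  # same message on the same inode: same id
    print(first == other)  # different message reusing the inode: different id
```

The trade-off is that a truncated digest collides more easily than the full
32 characters, so the suffix length would need choosing with that in mind.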

I did some tests using the Unix cksum program and the Perl
String::CRC::Cksum module, and the Perl module sucks big time. Running both
against a 13M file (I know 13M is a bit large for an email but...), cksum
finishes in a pretty consistent 0.077 seconds and the Perl module takes about
16.57 seconds (too long). Now if you use the Perl Digest::MD5 module against
the same file, it takes about 0.188 seconds to calculate a 32 char checksum
like d41d8cd98f00b204e9800998ecf8427e
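
For anyone wanting to try the MD5 side of this, here is a rough Python
equivalent (not the Perl Digest::MD5 code I timed; the function name and the
64 KB chunk size are assumptions). It reads the file in chunks so a large
message never has to sit in memory all at once. Note that the example value
above happens to be the MD5 of empty input:

```python
import hashlib


def md5_file(path: str, chunk_size: int = 64 * 1024) -> str:
    """Return the 32-character hex MD5 digest of the file at `path`."""
    md5 = hashlib.md5()
    with open(path, "rb") as handle:
        # Feed the file to the hash one chunk at a time.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()
```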

I know you don't really like stepping outside of perl but I *think* cksum is
pretty universal in the *nix world and it's pretty damn fast. Could put it
in a safepipe function with timeout too.
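
Roughly what I have in mind for the external call, sketched in Python rather
than as a real MailScanner safepipe routine. The function name and the
10-second default are assumptions. The file is fed to cksum on stdin, so the
output is just "CRC SIZE" with no filename to parse around:

```python
import subprocess


def cksum_with_timeout(path: str, timeout: float = 10.0) -> tuple[int, int]:
    """Run the system `cksum` on a file and return (crc, size_in_bytes).

    Raises subprocess.TimeoutExpired if cksum does not finish in time
    (subprocess.run kills the child in that case), and
    subprocess.CalledProcessError if cksum exits with an error.
    """
    with open(path, "rb") as handle:
        result = subprocess.run(
            ["cksum"],
            stdin=handle,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,
        )
    crc, size = result.stdout.split()[:2]
    return int(crc), int(size)
```

A caller would catch the timeout and fall back to something else (a random
suffix, say) rather than hold up the batch.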

Rick
