MS ruleset file sizes

Julian Field mailscanner at ecs.soton.ac.uk
Sat Mar 22 17:14:23 GMT 2003


At 22:37 21/03/2003, you wrote:
>On Friday 21 March 2003 16:05, you wrote:
> > Hi!
> >
> > > >So what's the current limitation, how long could those files be? :)
> > >
> > > As I have said elsewhere, the best approach for large config files is to
> > > slurp them all in from a database at startup time, then look them up in
> > > local hash tables at run time. I can't see tens of thousands causing much
> > > of a problem. There is no hard-wired limit at all.
> >
> > I'll do some field testing with large lists soon, and will post some
> > reports on the list when I am ready.
> >
>
>In the next week or two, I too plan to test w/ somewhere in that range of
>mailbox rules.  I too will share what I find.
>
>Would a database work for files that are order dependent?  If the current
>data is not order dependent I'll save you time, stop reading, the rest of
>this email means nothing.... :)

The current implementation of the per-domain black+whitelists is based on
the fact that they aren't order-dependent at all, so they work very fast as
they are just a few hash table lookups.
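
A minimal sketch of why order-independent lists reduce to a few hash lookups. It is written in Python rather than MailScanner's Perl, and it is not the actual implementation: the layout (one set of entries per recipient domain plus a "default" set), the function name and the sample addresses are all assumptions made for illustration.

```python
# Hypothetical data: one set of whitelisted senders per recipient domain,
# plus a "default" set that applies to every domain.
whitelist = {
    "example.com": {"friend@partner.example", "partner.example"},
    "default": {"trusted.example"},
}


def is_whitelisted(sender, recipient_domain, lists):
    """Return True if the sender, or the sender's domain, is listed for
    the recipient's domain or in the default list."""
    sender = sender.lower()
    sender_domain = sender.rpartition("@")[2]
    # At most two dictionary lookups and four set membership tests,
    # however many entries the lists contain.
    for key in (recipient_domain.lower(), "default"):
        entries = lists.get(key, set())
        if sender in entries or sender_domain in entries:
            return True
    return False
```

Because no entry can override another, the cost per message does not grow with the length of the lists, which is why tens of thousands of entries should not be a problem for this kind of data.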

>From the source, it looks like each file is loaded into an array.  Thus,
>searches on the array will be linear.  Am I understanding the code
>correctly?  Each file is stored as an array and then each array is put into
>a hash based on the variable name of the file?

That's basically it, yes.

>Does anyone have performance data on when the number of rules might become
>relevant?  If someone does need 'log n' performance, maybe parts of the file
>can be loaded into a memory hash where the order doesn't matter.  With a goal
>of minimal changes/hacking in mind, here's an idea.
>
>Maybe, within the data files, comments could be used to start and end
>sections of entries that are not order dependent.  Then, as a modification
>to MailScanner, we can look for those comments.  All data in between the
>comments could be treated as one element in the array.  That element would
>be a reference to a hash of the elements within.  Then when reading the
>arrays, we'd watch for hash references and search down through them when
>found.

This would end up being (I think) a hash of lists of hashes of lookup
values. Wow, this is going to have a lot of brackets in it! :-)
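
To make that shape concrete, here is a rough sketch of the proposal, again in Python rather than Perl. Everything in it is hypothetical: the marker comments, the "pattern value" line format, the function names and the "Spam Checks" key are invented for illustration and are not MailScanner's file format or API. Ordered rules stay in a list and are matched first-match-wins; each marked block becomes a single list element holding a hash of exact-match entries.

```python
from fnmatch import fnmatchcase

# Hypothetical marker comments delimiting an order-independent block.
BEGIN, END = "#BEGIN-UNORDERED", "#END-UNORDERED"


def parse_rules(lines):
    """Parse 'pattern value' lines into a list of rules.

    Ordered rules become (pattern, value) tuples. Each marked block becomes
    one dict in the list, keyed by exact address."""
    rules, block = [], None
    for raw in lines:
        line = raw.strip()
        if line == BEGIN:
            block = {}
            rules.append(block)
        elif line == END:
            block = None
        elif line and not line.startswith("#"):
            parts = line.split(None, 1)
            if len(parts) != 2:
                continue  # skip malformed lines in this sketch
            pattern, value = parts[0].lower(), parts[1]
            if block is not None:
                block.setdefault(pattern, value)  # first entry for a key wins
            else:
                rules.append((pattern, value))
    return rules


def lookup(rules, address, default=None):
    """Walk the list in order; a dict element costs one hash lookup."""
    address = address.lower()
    for rule in rules:
        if isinstance(rule, dict):
            if address in rule:
                return rule[address]
        else:
            pattern, value = rule
            if fnmatchcase(address, pattern):
                return value
    return default


# The outer hash: config variable name -> list of rules (some being hashes).
config = {
    "Spam Checks": parse_rules([
        "boss@example.com yes",
        BEGIN,
        "a@example.com no",
        "b@example.com yes",
        END,
        "*@example.com no",
        "* yes",
    ]),
}
```

With this layout a block of thousands of exact addresses costs one lookup instead of a linear scan, while the wildcard rules around it keep their first-match ordering.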

>It's just an idea, any thoughts?  If needed, is there any communal
>interest in working on this?

This could all be implemented in Custom Functions, so you don't have to
play with the main code at all, which will make life much easier for you.

In what situations do you actually need lists of hashes for a particular
config variable? Look at it from a user perspective first: what problem are
you trying to solve? Once you have concrete examples of what people need to
be able to do, you get a much clearer idea of what you are trying to
implement. Don't go in hacking code from the start, without doing some
(informal) user requirements analysis first. It *will* save you time...

>I really like the modularity and readability of the code.  Cheers to Julian!!

On good days, even I can figure out most of it now :-)
--
Julian Field
www.MailScanner.info
Professional Support Services at www.MailScanner.biz
MailScanner thanks transtec Computers for their support


