Measuring spare capacity

Wed Dec 8 17:37:31 GMT 2004

Bart

A 50-second scan time is quite high - mine's normally well under 10.
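To put that figure in context, the arithmetic from your example further
down (twenty children, 1200 messages an hour, 50 second average delay)
can be written out as a rough sketch. It assumes each child works on one
message at a time and that the whole logged delay is filtering time,
neither of which is guaranteed. The function names are made up for
illustration.

```python
def utilisation(children: int, messages_per_hour: float,
                avg_delay_seconds: float) -> float:
    """Fraction of the available child time spent on messages.

    Assumption: each child handles one message at a time, and the
    average delay is entirely time spent filtering.
    """
    if children <= 0 or messages_per_hour <= 0:
        raise ValueError("children and messages_per_hour must be positive")
    # Seconds each child has available per message at this load.
    seconds_available = children * 3600.0 / messages_per_hour
    return avg_delay_seconds / seconds_available


def saturation_rate(children: int, avg_delay_seconds: float) -> float:
    """Messages per hour at which utilisation would reach 1.0."""
    if children <= 0 or avg_delay_seconds <= 0:
        raise ValueError("children and avg_delay_seconds must be positive")
    return children * 3600.0 / avg_delay_seconds


if __name__ == "__main__":
    print(round(utilisation(20, 1200, 50), 2))
    print(saturation_rate(20, 50))
```

On those numbers the children would be about 83% busy and would run out
of time at roughly 1440 messages an hour, so treat it as a rough
indicator only.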

Things like RBLs and Razor-type apps can impact the measurement quite
badly, especially if you don't keep the zone files locally.

Of course, in order to measure the throughput you're going to have to:
1) Collect some good examples of what's flowing at the moment (probably
several thousand emails in your case).
2) Split them up so you can trickle-feed the test system at different
rates. Repeat a couple of times to get averages.
3) Repeat with differing numbers of MS children and batch sizes. Maybe
repeat with different RBLs and URIRBLs etc. to see what impact a local
zone file would make.
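
As a rough sketch of the trickle-feed step, something like the following
could pace a saved set of messages at a chosen rate. The names
send_times and trickle_feed are hypothetical, and the send callable is
left for you to supply (for example a thin wrapper around
smtplib.SMTP.sendmail pointed at the test box).

```python
import time
from typing import Callable, Iterable, List


def send_times(count: int, per_hour: float) -> List[float]:
    """Offsets in seconds from the start at which to send each message."""
    if per_hour <= 0:
        raise ValueError("per_hour must be positive")
    interval = 3600.0 / per_hour
    return [i * interval for i in range(count)]


def trickle_feed(messages: Iterable[bytes], per_hour: float,
                 send: Callable[[bytes], None],
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> int:
    """Send messages evenly spaced at per_hour; return how many were sent.

    sleep and clock are parameters only so the pacing can be tested
    without waiting in real time.
    """
    batch = list(messages)
    start = clock()
    for offset, message in zip(send_times(len(batch), per_hour), batch):
        wait = start + offset - clock()
        if wait > 0:
            sleep(wait)
        send(message)
    return len(batch)
```

Running the same message set at, say, 2000, 4000 and 6000 per hour and
noting where the inbound queue stops draining gives you repeatable
numbers to compare between runs.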

I guess the biggest issue would be building the test harness so you can
repeat the tests with ease.

There are so many variables here that any percentage capacity
measurement is going to be very specific to your setup, and I'd use the
test harness to tune up the system first.

A basic measurement would be to watch the inbound queue and make sure
it's not growing over a set time. If the inbound queue is growing then
MS isn't processing the emails fast enough. Of course you'll get usage
spikes, but the queue should be cleared or reducing within a set period.
If the inbound MTA can log to a different logfile from the outbound MTA
and MS, then you've got some log files to look at and start analysing.
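
The queue check could be sketched along these lines. It assumes a
sendmail-style queue directory where each queued message has a control
file whose name starts with "qf"; the function names and the simple
first-half versus second-half comparison are my own choices, not
anything MailScanner provides.

```python
import os
from typing import Sequence


def queue_depth(queue_dir: str) -> int:
    """Count queued messages by counting sendmail 'qf' control files."""
    return sum(1 for name in os.listdir(queue_dir) if name.startswith("qf"))


def queue_is_growing(depths: Sequence[int], tolerance: float = 0.0) -> bool:
    """Compare the average depth of the later samples with the earlier ones.

    depths is a series of queue_depth() readings taken at a fixed
    interval. A short spike that clears again does not count as growth.
    """
    if len(depths) < 2:
        return False
    half = len(depths) // 2
    earlier = sum(depths[:half]) / half
    later = sum(depths[-half:]) / half
    return later - earlier > tolerance
```

Sampling queue_depth() on the inbound queue directory once a minute
from cron and feeding the last ten or so readings to queue_is_growing()
would give a much finer view than 5 minute averages.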

--
Martin Hepworth
Snr Systems Administrator
Solid State Logic
Tel: +44 (0)1865 842300

MailScanner wrote:
> Thanks Martin,
>
> Would you mind keeping it on the list for a while longer? I hope to get
> a metric out of this which will benefit others too ;-)
>
> I can take one of the systems off-line on a weekend and do some tests
> with varying email loads. What would be the easiest way to simulate a
> particular load?
>
> I've had another thought; what if I look at all the 'stat=Sent' log
> entries and collect their delay values? The amount of time spent
> filtering by the MS children would equal the average delay multiplied by
> the number of children. The average time between log entries would give
> me the time they have available to do that work.
>
> E.g. if I have twenty children handling 1200 messages in an hour then
> they each have an average of one minute to work on each message. My
> average delay is 50 seconds, so I'm all right under that load.
>
> Wouldn't that give me a ratio indicating the utilisation of the system?
> It only gives me information on the messages that are actually passed
> on, but those are the ones whose performance I most care about anyway.
>
>
> Bart...
>
> -----Original Message-----
> From: MailScanner mailing list [mailto:MAILSCANNER at JISCMAIL.AC.UK] On
> Behalf Of Martin Hepworth
> Posted At: 08 December 2004 09:42
> Posted To: MailScanner
> Conversation: Measuring spare capacity
> Subject: Re: Measuring spare capacity
>
> Hi
>
> Do you have a duplicate test system to 'play with'?
>
> That way you can do some repeatable tests with the same emails to
> measure trip points etc.
>
> I've worked on performance-related issues on other systems
> (Solaris/RDBMSs mainly) but not Linux or heavy MailScanner stuff.
>
> The MRTG stuff will only show 5 min peaks and averages when you
> really need something a little more granular.
>
> If you want to take this off list for a bit we can chat...
>
> --
> Martin Hepworth
> Snr Systems Administrator
> Solid State Logic
> Tel: +44 (0)1865 842300
>
>
> MailScanner wrote:
>
>>Hi Martin,
>>
>>Thank you for your quick response.
>>
>>Two dual Xeon HT machines with 2GB RAM running Fedora Core and
>>sendmail.
>>We have tuned the systems for performance already.
>>
>>It's not so much that we want to get more performance out of the
>>system, (although that would be nice). We rather want to be able to do
>>some capacity planning.
>>
>>When we have a problem with the service all we see is a spike in the
>>number of messages in mqueue.in. It seems that the systems have a
>>'trip point'. Before this point they are able to keep the queue down
>>to one or two, after it the queue rockets to thousands of messages.
>>
>>I'm after a metric I can get from the systems which tells me how far I
>>am away from this 'trip point'. Something like: 'the current load is
>>5000 messages per hour. The system will trip at 7500 messages per
>>hour'.
>>
>>The load, CPU and memory graphs in MRTG are flat, apparently random or
>>spiking with the queue. I don't have any figures that show that the
>>system has less spare capacity during the day compared to the middle
>>of the night, which is what I expect to see.
>>
>>If we can work out the spare capacity, then we can plan for additional
>>system(s) to ensure that we do not run into trouble.
>>
>>Bart...
>>
>>-----Original Message-----
>>From: MailScanner mailing list [mailto:MAILSCANNER at JISCMAIL.AC.UK] On
>>Behalf Of Martin Hepworth
>>Posted At: 07 December 2004 15:44
>>Posted To: MailScanner
>>Conversation: Measuring spare capacity
>>Subject: Re: Measuring spare capacity
>>
>>Hi
>>
>>what hardware and what OS/MTA???
>>
>>what performance tuning have you done
>>
>>what does load average look like?
>>
>>might want to reduce batch size and max children to see how it gets
>>on...these params are very much trial and error...
>>
>>--
>>Martin Hepworth
>>Snr Systems Administrator
>>Solid State Logic
>>Tel: +44 (0)1865 842300
>>
>>
>>MailScanner wrote:
>>
>>
>>>I am trying to get a handle on the amount of spare capacity in my MS
>>>boxes.
>>>
>>>We are running two MS which handle about 250,000 messages per day (MRTG
>>>count by recipient). The 'Max Unscanned Messages Per Scan' and 'Max
>>>Unsafe Messages Per Scan' are both set at 50 and max children at 15.
>>>
>>>I isolated a day's worth of maillog entries out of a rotated file and
>>>looked at some stats. Counting all instances of 'Scanning X messages'
>>>I found that 75% had just one message, 17% two, 5% three and the more
>>>populous batches hardly registering at all. Doing the same for 'Found
>>>X messages waiting' I found a slightly wider spread but tapering off
>>>quickly after 15.
>>>
>>>These are the distributions for the first 15 counts:
>>>
>>>New Batch: Scanning $a messages,
>>>1       36153
>>>2       8187
>>>3       2188
>>>4       767
>>>5       319
>>>6       139
>>>7       70
>>>8       45
>>>9       18
>>>10      11
>>>11      17
>>>12      6
>>>13      4
>>>14      3
>>>15      5
>>>
>>>Found $a messages waiting
>>>1       0
>>>2       7545
>>>3       7019
>>>4       5126
>>>5       3484
>>>6       2535
>>>7       1724
>>>8       1238
>>>9       915
>>>10      618
>>>11      489
>>>12      372
>>>13      260
>>>14      211
>>>15      161
>>>
>>>This surprised me. I was expecting the batch size to grow during busy
>>>periods. It seems that the batch size is generally a single message,
>>>even though more messages are waiting to be processed. Looking at log
>>>snippets in the mailing list archives confirms that this is common.
>>>Looking at the time distribution of (rather rare) larger batches I
>>>found these spread randomly over the day.
>>>
>>>We regularly get a peak in the incoming messages queue of a few
>>>thousand messages. This makes me believe that there is not that much
>>>slack in the capacity. During these peaks the number of messages per
>>>batch does go up to the maximum.
>>>
>>>Is there a way to measure how many more messages per day a given
>>>system can take?
>>>
>>>Thanks for any ideas.
>>>
>>>Bart...
>>>
