Tue May 4 21:28:53 IST 2004

Jason Williams wrote:
>
> As I'm monitoring the server here, as far as resources, what should I be
> keeping an eye on?
>

The size of your queues. (At our site, this is by far the most
important thing to monitor on our MailScanner mail servers.)

What I do is a really basic/cheesy report that just runs "ls
$QUEUE/qf* | wc -l" for each queue (for me, that's /var/spool/mqueue.in
and /var/spool/mqueue).  I run it via inetd, and then I have a process on
my workstation ("mqc", for mail queue counts) that polls each of my
mail servers every 30 seconds and displays the output in a "waterfall"
format:

 cats-mx1(in:m)  cats-mx2(in:m)  cats-mx3(in:m)  cats-mx4(in:m)
         39:726          53:550          37:452          19:544
         30:717          68:549          19:453          19:544
         26:705          91:548          26:450          19:543
         22:711          96:556          38:450          39:543
         13:715         103:549          46:450          53:545
         11:711          75:582          42:452          57:546
          7:699          89:553          58:452         142:543
          8:703          92:553          63:452         151:543
          4:701          84:564          64:457         149:543
         18:701          79:569          60:453         142:543
         43:697          97:555          35:463         145:544

("in" is the number of messages in mqueue.in, and "m" is the number of
messages in the outgoing/regular mqueue; the report repeats the header
line every 20 lines, so that it never completely scrolls off the window)
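A minimal bash sketch of that kind of per-queue counter follows. It is not the actual script from this post: the name "mqcount", the inetd.conf entry, the port and the user in the comments are hypothetical, and the two paths are simply the queues mentioned above. Running it as root is also an assumption; any account that can read both queue directories would do. It counts qf files with a glob loop instead of "ls | wc -l", which avoids the "argument list too long" error that ls can hit when a queue holds tens of thousands of files.

```shell
#!/bin/bash
# mqcount (hypothetical name): print "<in>:<m>", the number of qf files in
# mqueue.in and in mqueue, as one line. Meant to be run from inetd, so a
# client only has to connect to the port and read the reply.
#
# Hypothetical inetd.conf entry (service name, port and user are assumptions;
# the user must be able to read both queue directories):
#   mqcount stream tcp nowait root /usr/local/bin/mqcount mqcount
# with a matching /etc/services line such as:
#   mqcount 7777/tcp

# Count qf (queue control) files in one directory. The glob is expanded by
# the shell itself, so there is no exec() argument limit as with "ls qf*".
count_qf() {
    local dir=$1 n=0 f
    for f in "$dir"/qf*; do
        # An unmatched glob stays literal, so check the file really exists.
        [ -e "$f" ] && n=$((n + 1))
    done
    echo "$n"
}

mqcount() {
    local in_dir=${1:-/var/spool/mqueue.in}
    local out_dir=${2:-/var/spool/mqueue}
    echo "$(count_qf "$in_dir"):$(count_qf "$out_dir")"
}

mqcount "$@"
```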

So, you can see here that cats-mx4 has had a little extra traffic
kick in recently, and cats-mx2 had a small peak as well.  On a
typical day, I see counts in the low hundreds most of the time.  When I
get hit by a virus or a mass mailing, they shoot up.  If mqueue.in gets
above 1200, I get nervous.  If it goes over 3000, I'm pretty sure that it
won't get back to normal without outside intervention.  In my experience,
if normal daytime traffic plus an extra load pushes the queue into the
3000 range, it won't drop back below that during business hours, so if
I need it back under that level, I have to do something about it.

Somewhere around 1200-1600 messages in mqueue.in is where we hit the
"15 minutes from SMTP receipt to SMTP relay/local delivery" threshold,
which is our service level agreement with the rest of campus; that's why
I start getting nervous around there.  If we stay in that range too
long, I also need to figure out what I'm going to do about it.
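Those thresholds are easy to encode for a monitor. The bash sketch below is hypothetical and not part of the setup described here: the cut-offs (above 1200 and above 3000) come from this post, while the function names and status words are made up for illustration. It only looks at the mqueue.in half of an "in:m" sample, since that is the queue the numbers above refer to.

```shell
#!/bin/bash
# Map an mqueue.in count to a status word using the thresholds from the
# post: above 1200 we are drifting toward the 15-minute SLA, above 3000 the
# queue is unlikely to drain during business hours without intervention.
# The status words are hypothetical; a monitor could map them to colours.
queue_status() {
    local in_count=$1
    if [ "$in_count" -gt 3000 ]; then
        echo "intervene"
    elif [ "$in_count" -gt 1200 ]; then
        echo "nervous"
    else
        echo "ok"
    fi
}

# Classify one "in:m" sample such as "39:726" by its mqueue.in part.
sample_status() {
    queue_status "${1%%:*}"
}
```

For example, "sample_status 1450:544" prints "nervous".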

This also helps me see problems as they're forming.  For example,
there's a virus that turns some of our resnet (students in the dorms)
computers into spam generators, and they send messages at a VERY high
rate: tens of thousands per hour.  Having a background window with that
cascade running lets me see those as they build momentum, and then I
have a script ("qstat") that rips through the qf files and identifies
which host (via the $_ relay line of the qf file) has the most traffic
waiting in mqueue.in.  Usually, if I see one host with more than 100
submissions (or, once the queue is over 1000, more than 10% of it),
that's a bad sign.  You can then look at the qf and df files for some of
those messages (if that's legal at your site; here the law basically
restricts me to analyzing the qf file only) and verify that the traffic
is malicious.  Once I'm confident it's an attack of some sort, I
usually:

a) kill MailScanner (so that new messages from that host don't leak
through while I clean up; I don't kill sendmail because I don't want
to interrupt service from a user's perspective)
b) add that host to my sendmail access file for blocking
c) push that access file out to all of my mail servers (that process
includes converting the access file into the access database)
d) run a script ("qflush") that removes every queue file whose qf $_
line matches the host (this part can take a long time)
e) restart MailScanner
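The qstat and qflush steps could look roughly like the bash sketch below. These are not the scripts offered in this post; only the names follow it. Everything else is an assumption, in particular treating the bracketed IP address on the $_ line as "the host" (falling back to the whole $_ value when there is none) and deleting the qf, df and xf files of each matching queue ID. It needs bash 4 or later for the associative array, and, like step d), it presumes MailScanner is stopped so nothing else is working on mqueue.in while files are removed.

```shell
#!/bin/bash
# Sketches of "qstat"/"qflush"-style helpers keyed on the $_ macro line of
# each sendmail qf file. Using the bracketed IP as the host is an assumption.

# Print the relay key for one qf file, or fail if it has no $_ line.
relay_host() {
    local line re='\[([^]]+)]'
    while IFS= read -r line; do
        case $line in
            '$_'*)
                line=${line#??}            # drop the leading "$_"
                if [[ $line =~ $re ]]; then
                    echo "${BASH_REMATCH[1]}"
                else
                    echo "$line"
                fi
                return 0
                ;;
        esac
    done < "$1"
    return 1
}

# Rule of thumb from the post: more than 100 messages from one host, or,
# once the queue holds over 1000, more than 10% of it.
is_suspect() {
    local n=$1 total=$2
    if [ "$total" -gt 1000 ]; then
        [ $((n * 10)) -gt "$total" ]
    else
        [ "$n" -gt 100 ]
    fi
}

# qstat: count queued messages per relay host, busiest first.
qstat() {
    local dir=${1:-/var/spool/mqueue.in} f host total=0 flag
    local -A counts=()
    for f in "$dir"/qf*; do
        [ -e "$f" ] || continue
        host=$(relay_host "$f") && [ -n "$host" ] || continue
        counts[$host]=$(( ${counts[$host]:-0} + 1 ))
        total=$((total + 1))
    done
    for host in "${!counts[@]}"; do
        flag=""
        if is_suspect "${counts[$host]}" "$total"; then flag="  <-- suspect"; fi
        echo "${counts[$host]} $host$flag"
    done | sort -rn
}

# qflush: delete the qf/df/xf files of every queued message from one host
# and print how many messages were removed.
qflush() {
    local target=$1 dir=${2:-/var/spool/mqueue.in} f id removed=0
    [ -n "$target" ] || { echo "usage: qflush host [dir]" >&2; return 2; }
    for f in "$dir"/qf*; do
        [ -e "$f" ] || continue
        [ "$(relay_host "$f")" = "$target" ] || continue
        id=${f##*/qf}                      # queue ID shared by qf/df/xf
        rm -f -- "$dir/qf$id" "$dir/df$id" "$dir/xf$id"
        removed=$((removed + 1))
    done
    echo "$removed"
}
```

For example, "qstat | head" lists the busiest relays in mqueue.in, and "qflush 10.1.2.3" (a made-up address) removes that host's messages and prints how many it removed.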

If anyone wants to see the scripts (the inetd.conf scripts, mqc, qstat
and qflush), I'd be happy to share them.  Some of it (mqc in particular)
might be overkill if you're just running one server with two queues, but
it's still good to see the health of your queues over time.
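On the workstation side, an mqc-like poller could be sketched as follows. It is not the mqc from this post: it assumes each server answers on a hypothetical TCP port 7777 with a single "in:m" line and that a netcat ("nc") is installed. nc options differ between netcat variants, so that line is the one most likely to need adjusting. The 30-second interval and the header repeated every 20 rows come from the post; the 15-character columns just mimic the sample output, and MQC_ROWS is an added knob for stopping after a fixed number of rows.

```shell
#!/bin/bash
# mqc sketch: poll each mail server and print a "waterfall" of in:m counts.
# Assumptions: each server answers on TCP port MQC_PORT (7777 is made up)
# with a single "in:m" line, and nc is available.
MQC_PORT=${MQC_PORT:-7777}
MQC_INTERVAL=${MQC_INTERVAL:-30}         # seconds between polls
MQC_HEADER_EVERY=${MQC_HEADER_EVERY:-20} # repeat the header every N rows
MQC_ROWS=${MQC_ROWS:-0}                  # stop after N rows; 0 = forever

# Print the arguments as right-aligned 15-character columns.
print_row() {
    local cell sep=""
    for cell in "$@"; do
        printf '%s%15s' "$sep" "$cell"
        sep=" "
    done
    printf '\n'
}

# Ask one server for its "in:m" line; print "?:?" if it does not answer.
fetch_counts() {
    local reply
    reply=$(nc -w 5 "$1" "$MQC_PORT" </dev/null 2>/dev/null | tr -d '\r')
    echo "${reply:-?:?}"
}

mqc() {
    local -a header=() counts=()
    local host rows=0
    for host in "$@"; do header+=("$host(in:m)"); done
    while [ "$MQC_ROWS" -eq 0 ] || [ "$rows" -lt "$MQC_ROWS" ]; do
        # Re-print the header so it never scrolls completely off the window.
        if [ $((rows % MQC_HEADER_EVERY)) -eq 0 ]; then
            print_row "${header[@]}"
        fi
        counts=()
        for host in "$@"; do counts+=("$(fetch_counts "$host")"); done
        print_row "${counts[@]}"
        rows=$((rows + 1))
        sleep "$MQC_INTERVAL"
    done
}

if [ $# -gt 0 ]; then
    mqc "$@"
fi
```

Something like "mqc cats-mx1 cats-mx2 cats-mx3 cats-mx4" in a spare terminal would give a view like the sample output earlier. Because the servers are polled one after another, each row takes a little longer than 30 seconds; polling in parallel would fix that if it matters.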

The inetd.conf scripts are also used by our Big Brother / Big Sister
server to raise warnings when various thresholds are crossed.

-------------------------- MailScanner list ----------------------
To leave, send    leave mailscanner    to jiscmail at jiscmail.ac.uk
Before posting, please see the Most Asked Questions at
http://www.mailscanner.biz/maq/     and the archives at
http://www.jiscmail.ac.uk/lists/mailscanner.html