SQL Redux

Magnusson, Andrew Andrew.Magnusson at COCC.COM
Mon Aug 4 14:29:34 IST 2003


Brief note: I'm the one who modified the code to split the data into three
tables, in order to make it easier and more consistent for our internal
reporting and analysis scripts. (Once they're complete, I'll see what I can
do about getting them released to you folks if anyone's interested. The
focus is on email usage reports for a large number of email domains that
pass through our servers.)

If there's really no performance hit to logging after every email instead of
in batches, I'll see about changing my code to do it that way too. Nothing
worse than having to wait for all of the children to finish logging before I
can do a full restart of MailScanner.

Andrew Magnusson
Internet Product Analyst
COCC
1-877-678-0444 extension 640



-----Original Message-----
From: Steve Freegard [mailto:steve.freegard at LBSLTD.CO.UK]
Sent: Monday, August 04, 2003 5:04 AM
To: MAILSCANNER at JISCMAIL.AC.UK
Subject: Re: SQL Redux


Hi Trevor,

>Could someone do a quick round-up?

Certainly - there are two current methods for SQL Logging:

1) The code included in recent MailScanner CustomConfig.pm code

Originally written by Julian with the SQL bits from me, laterly changed by
someone else (not sure who though!).  This version uses temporary files to
log the data and loads the data into MySQL every time MailScanner is
restarted, or when it auto-restarts.

It splits the data into three SQL tables - maillog_mail, maillog_recipient
and maillog_report.

Currently - I know of no interface that uses this schema.

2) The code included (as a patch to CustomConfig.pm) with MailWatch

As Julian's original code, except that the temporary files have been removed
so that the data is inserted per message batch processed by MailScanner.

This puts all data into a table called maillog which is then used by
MailWatch for display and reporting.

You can get MailWatch from http://www.smf.f2s.com/mailscanner/ which is
currently at version 0.2 - there haven't been later patches for bugs - but
there were a couple of files missed from the tarball when I created it
(create.sql and CustomConfig.pm).

Several people have had a problem when trying to use MailWatch with FreeBSD
- but the problem seems to be with Perl DBI and MySQL, and this is still
under investigation.

I'm currently awaiting for approval for a Sourceforge project for MailWatch
- which I should get in the next couple of days.  I'll release 0.3 shortly
after which will contain the new MailWatch.pm file containing the SQL
Logging routines making it easier to install (and easier for me and Julian
to work out which code people are using!) and a couple of fixes.

Kind regards,
Steve.

-----Original Message-----
From: Julian Field [mailto:mailscanner at ECS.SOTON.AC.UK]
Sent: 03 August 2003 21:38
To: MAILSCANNER at JISCMAIL.AC.UK
Subject: Re: SQL Redux


You want MailWatch.
Hopefully someone else can comment on its exact current state.

At 20:32 01/08/2003, you wrote:
>Ok, so um, for those of us who aren't keeping several patchset branches in
>our heads but who now want to start sql logging, what are the options?
>Could someone do a quick round-up?
>
>I need to provide a web interface that provides plenty of detail about each
>message by middle of next week and I'm wondering whether I should do my own
>thing with a flat file log (since I have little time) or configure in
>support for one of the existing sql logging mechanisms.
>
>"Mailwatch for Mailscanner" seems to be at version 0.2 but I recall
mentions
>of patches for bugs post 0.2.  Is there a later version available?
>
>There's the sql logging code already in CustomConfig.pm, but is there a web
>interface for built yet for the tables it creates?  And I'm assuming I'll
>need the latest version of mailscanner to get the fixes listed in this
>thread.
>
>"David While's Mailstats" looks nice (though I'm not doing virus scanning
>and don't particularly care about geo-locating stuff), but I need a
>per-message interface, and mailstats seems to be more for performance
>reporting than for log analysis.
>
>The mailscanner-mrtg package again is for performance reporting, not log
>analysis.
>
>...So...
>
>What's my best bet for a web interface to logged data that includes such
>things as subject, recipients, spam tests, etc in the short term?
>
>-t.
>
>
> >-----Original Message-----
> >From: Kearney, Rob [mailto:RKearney at AZERTY.COM]
> >Sent: Thursday, July 31, 2003 10:47 AM
> >To: MAILSCANNER at JISCMAIL.AC.UK
> >Subject: Re: SQL Redux
> >
> >
> >oh.. and yes..
> >
> >thanks for the code tippets.. I'll have to change this. as I'm
> >not good with
> >perl  either.
> >
> >-rob
> >
> >-----Original Message-----
> >From: Steve Freegard [mailto:steve.freegard at lbsltd.co.uk]
> >Sent: Thursday, July 31, 2003 11:19 AM
> >To: 'Kearney, Rob '; 'MAILSCANNER at JISCMAIL.AC.UK '
> >Subject: RE: SQL Redux
> >
> >
> >Hi Rob,
> >
> >I'm not really much good with Perl (maybe Julian can back me
> >up on this) -
> >but my understanding is that in calling your SQLRTLogging
> >procedure without
> >the Init & End procedures will mean that the
> >connection/disconnection/prepare and execution of the SQL will
> >happen for
> >every message batch processed by MailScanner which would slow things up
> >quite considerably depending on the volume of messages you processes.
> >
> >The most expensive processes are connecting and preparing the
> >statement, so
> >it's better only to do this once (per child), then running the prepared
> >statements once per message batch.
> >
> >A better way is to have:
> >
> >InitSQLRTLogging:  (this is done once per MailScanner child)
> > - Connect to the database
> > - Prepare each SQL statement required
> >
> >SQLRTLogging: (done once for each message batch)
> > - Tidy-up the data to make it suitable for SQL
> > - Execute the prepared statements
> >
> >EndSQLRTLogging: (done once as each child dies)
> > - Disconnect from the database
> >
> >Cheers,
> >Steve.
> >
> >-----Original Message-----
> >From: Kearney, Rob
> >To: MAILSCANNER at JISCMAIL.AC.UK
> >Sent: 31/07/03 15:56
> >Subject: Re: SQL Redux
> >
> >here is what we did for SQL logging, to bypass temp-file stuff.
> >
> >Just took the SQLLogging and made SQLRTLogging, to write
> >directly to DB,
> >We
> >have not noticed any degradation in performance
> >Basically, we took the functions of SQLLogging and
> >EndSQLLogging and put
> >them together.
> >(dont forget Init and End scripts also
> >
> >---
> >sub SQLRTLogging {
> >  my($message) = @_;
> >  my($dbh);
> >  $dbh =
> >DBI->connect("DBI:mysql:mailscanner:localhost:mysql_socket=/var
> >/database
> >/mys
> >ql/mysql.sock",
> >                      "mailscanner", "mailscanner",
> >                      {'PrintError' => 0})
> >   or MailScanner::Log::DieLog("Cannot connect to the database: %s",
> >                                $DBI::errstr);
> >
> >  my $id = $message->{id};
> >  my $size = $message->{size};
> >  my $from = $message->{from};
> >  my ($from_user, $from_domain);
> >
> >  # split the from address into user and domain bits.
> >  # This may be unnecessary for you; we use it to more easily determine
> >  # inbound vs outbound email in a multi-domain environment.
> >  # HINT: refine queries using SQL 'join' with a table containing local
> >  # domains.
> >
> >  ($from_user, $from_domain) = split /\@/, $from;
> >
> >  my @to   = @{$message->{to}};
> >  my $subject = $message->{subject};
> >  my $clientip = $message->{clientip};
> >  my $archives = join(',', @{$message->{archiveplaces}});
> >  my $isspam = $message->{isspam};
> >  my $ishighspam = $message->{ishigh};
> >  my $sascore = $message->{sascore};
> >  my $spamreport = $message->{spamreport};
> >
> >  # Get rid of control chars and tidy-up SpamAssassin report
> >  $spamreport =~ s/\n/ /g;
> >  $spamreport =~ s/\t//g;
> >
> >  # Get timestamp, and format it so it is suitable to use with MySQL
> >  my($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) =
> >localtime();
> >  my($timestamp) = sprintf("%d-%02d-%02d %02d:%02d:%02d",
> >                           $year+1900,$mon+1,$mday,$hour,$min,$sec);
> >
> ># maillog_mail insert
> >    my @fields=($timestamp, $id, $size, $from_user, $from_domain,
> >                $subject, $clientip, $archives, $isspam, $ishighspam,
> >                $sascore, $spamreport);
> >    map { s/\'/\\'/g } @fields;
> >
> >    # Insert @fields into a database table
> >    my($sth) = $dbh->prepare("INSERT INTO maillog_mail (time, msg_id,
> >size,
> >from_user, from_domain, subject, clientip, archives, isspam,
> >ishighspam,
> >sascore, spamreport) VALUES (?,?,?,?,?,?,?,?,?,?,?,?)");
> >
> >$sth->execute($fields[0],$fields[1],$fields[2],$fields[3],$fiel
> >ds[4],$fi
> >elds
> >[5],$fields[6],$fields[7],$fields[8],$fields[9],$fields[10],$fi
> >elds[11])
> >or
> >MailScanner::Log::DieLog("Cannot insert row: %s", $DBI::errstr);
> >
> >
> >  my($file, $text);
> >  while(($file, $text) = each %{$message->{allreports}}) {
> >    $file = "the entire message" if $file eq "";
> >    # Use the sanitised filename to avoid problems caused by people
> >forcing
> >    # logging of attachment filenames which contain nasty SQL
> >instructions.
> >    $file = $message->{file2safefile}{$file} or $file;
> >    $text =~ s/\n/ /;  # Make sure text report only contains 1 line
> >    $text =~ s/\t/ /; # and no tab characters
> >
> >    my @fields = ($id, $file, $text);
> >    map { s/\'/\\'/g } @fields;
> >
> >    my($sth) = $dbh->prepare("INSERT INTO maillog_report (msg_id,
> >filename,
> >filereport) VALUES (?,?,?)");
> >    $sth->execute($fields[0],$fields[1],$fields[2]) or
> >MailScanner::Log::DieLog("Cannot insert row: %s", $DBI::errstr);
> >
> >  }
> >
> >  for (@to) {
> >    # again, split the recipient's email into user and domain halves
> >first.
> >    # see comment above about splitting the email like this.
> >
> >    my ($to_user, $to_domain);
> >    ($to_user, $to_domain) = split /\@/, $_;
> >    my @fields = ($id, $to_user, $to_domain);
> >    map { s/\'/\\'/g } @fields;
> >    my($sth) = $dbh->prepare("INSERT INTO maillog_recipient (msg_id,
> >to_user, to_domain) VALUES (?,?,?)");
> >    $sth->execute($fields[0],$fields[1],$fields[2]) or
> >MailScanner::Log::DieLog("Cannot insert row: %s", $DBI::errstr);
> >  }
> >
> >  # Close database connection
> >  $dbh->disconnect();
> >
> >}
> >
> >  MailScanner::Log::InfoLog("Ending SQL Real-Time Logging");
> >}
> >
> >1;
> >
> >
> >
> >-rob
> >
> >--
> >This email and any files transmitted with it are confidential and
> >intended solely for the use of the individual or entity to whom they
> >are addressed. If you have received this email in error please notify
> >the sender and delete the message from your mailbox.
> >
> >This footnote also confirms that this email message has been swept by
> >MailScanner (www.mailscanner.info) for the presence of
> >computer viruses.
> >

--
Julian Field
www.MailScanner.info
Professional Support Services at www.MailScanner.biz
MailScanner thanks transtec Computers for their support

--
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. If you have received this email in error please notify
the sender and delete the message from your mailbox.

This footnote also confirms that this email message has been swept by
MailScanner (www.mailscanner.info) for the presence of computer viruses.

--
*** This message originates from COCC, Inc.

If the reader of this message, regardless of the address or routing, is not an intended recipient, you are hereby notified that you have received this transmittal in error and any review; use, distribution, dissemination or copying is strictly prohibited.  If you have received this message in error, please delete this e-mail and all files transmitted with it from your system and immediately notify COCC, Inc. by sending reply e-mail to the sender of this message.

Thank you. ***



More information about the MailScanner mailing list