SQL Redux

Steve Freegard steve.freegard at LBSLTD.CO.UK
Thu Jul 31 16:19:16 IST 2003


Hi Rob,

I'm not really much good with Perl (maybe Julian can back me up on this) -
but my understanding is that in calling your SQLRTLogging procedure without
the Init & End procedures will mean that the
connection/disconnection/prepare and execution of the SQL will happen for
every message batch processed by MailScanner which would slow things up
quite considerably depending on the volume of messages you processes.

The most expensive processes are connecting and preparing the statement, so
it's better only to do this once (per child), then running the prepared
statements once per message batch.

A better way is to have:

InitSQLRTLogging:  (this is done once per MailScanner child)
 - Connect to the database
 - Prepare each SQL statement required

SQLRTLogging: (done once for each message batch)
 - Tidy-up the data to make it suitable for SQL
 - Execute the prepared statements

EndSQLRTLogging: (done once as each child dies)
 - Disconnect from the database

Cheers,
Steve.

-----Original Message-----
From: Kearney, Rob
To: MAILSCANNER at JISCMAIL.AC.UK
Sent: 31/07/03 15:56
Subject: Re: SQL Redux

here is what we did for SQL logging, to bypass temp-file stuff.

Just took the SQLLogging and made SQLRTLogging, to write directly to DB,
We
have not noticed any degradation in performance
Basically, we took the functions of SQLLogging and EndSQLLogging and put
them together.
(dont forget Init and End scripts also

---
sub SQLRTLogging {
  my($message) = @_;
  my($dbh);
  $dbh =
DBI->connect("DBI:mysql:mailscanner:localhost:mysql_socket=/var/database
/mys
ql/mysql.sock",
                      "mailscanner", "mailscanner",
                      {'PrintError' => 0})
   or MailScanner::Log::DieLog("Cannot connect to the database: %s",
                                $DBI::errstr);

  my $id = $message->{id};
  my $size = $message->{size};
  my $from = $message->{from};
  my ($from_user, $from_domain);

  # split the from address into user and domain bits.
  # This may be unnecessary for you; we use it to more easily determine
  # inbound vs outbound email in a multi-domain environment.
  # HINT: refine queries using SQL 'join' with a table containing local
  # domains.

  ($from_user, $from_domain) = split /\@/, $from;

  my @to   = @{$message->{to}};
  my $subject = $message->{subject};
  my $clientip = $message->{clientip};
  my $archives = join(',', @{$message->{archiveplaces}});
  my $isspam = $message->{isspam};
  my $ishighspam = $message->{ishigh};
  my $sascore = $message->{sascore};
  my $spamreport = $message->{spamreport};

  # Get rid of control chars and tidy-up SpamAssassin report
  $spamreport =~ s/\n/ /g;
  $spamreport =~ s/\t//g;

  # Get timestamp, and format it so it is suitable to use with MySQL
  my($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime();
  my($timestamp) = sprintf("%d-%02d-%02d %02d:%02d:%02d",
                           $year+1900,$mon+1,$mday,$hour,$min,$sec);

# maillog_mail insert
    my @fields=($timestamp, $id, $size, $from_user, $from_domain,
                $subject, $clientip, $archives, $isspam, $ishighspam,
                $sascore, $spamreport);
    map { s/\'/\\'/g } @fields;

    # Insert @fields into a database table
    my($sth) = $dbh->prepare("INSERT INTO maillog_mail (time, msg_id,
size,
from_user, from_domain, subject, clientip, archives, isspam, ishighspam,
sascore, spamreport) VALUES (?,?,?,?,?,?,?,?,?,?,?,?)");

$sth->execute($fields[0],$fields[1],$fields[2],$fields[3],$fields[4],$fi
elds
[5],$fields[6],$fields[7],$fields[8],$fields[9],$fields[10],$fields[11])
or
MailScanner::Log::DieLog("Cannot insert row: %s", $DBI::errstr);


  my($file, $text);
  while(($file, $text) = each %{$message->{allreports}}) {
    $file = "the entire message" if $file eq "";
    # Use the sanitised filename to avoid problems caused by people
forcing
    # logging of attachment filenames which contain nasty SQL
instructions.
    $file = $message->{file2safefile}{$file} or $file;
    $text =~ s/\n/ /;  # Make sure text report only contains 1 line
    $text =~ s/\t/ /; # and no tab characters

    my @fields = ($id, $file, $text);
    map { s/\'/\\'/g } @fields;

    my($sth) = $dbh->prepare("INSERT INTO maillog_report (msg_id,
filename,
filereport) VALUES (?,?,?)");
    $sth->execute($fields[0],$fields[1],$fields[2]) or
MailScanner::Log::DieLog("Cannot insert row: %s", $DBI::errstr);

  }

  for (@to) {
    # again, split the recipient's email into user and domain halves
first.
    # see comment above about splitting the email like this.

    my ($to_user, $to_domain);
    ($to_user, $to_domain) = split /\@/, $_;
    my @fields = ($id, $to_user, $to_domain);
    map { s/\'/\\'/g } @fields;
    my($sth) = $dbh->prepare("INSERT INTO maillog_recipient (msg_id,
to_user, to_domain) VALUES (?,?,?)");
    $sth->execute($fields[0],$fields[1],$fields[2]) or
MailScanner::Log::DieLog("Cannot insert row: %s", $DBI::errstr);
  }

  # Close database connection
  $dbh->disconnect();

}

  MailScanner::Log::InfoLog("Ending SQL Real-Time Logging");
}

1;



-rob

--
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. If you have received this email in error please notify
the sender and delete the message from your mailbox.

This footnote also confirms that this email message has been swept by
MailScanner (www.mailscanner.info) for the presence of computer viruses.



More information about the MailScanner mailing list