POPFile
(http://popfile.sourceforge.net/)
is a program that uses Bayesian techniques to classify email into one of
several categories ("buckets") by examining the words that make
up the message. It is typically used to classify mail as spam or non-spam.
The standard implementation of POPFile is as a proxy, either between the
Internet and the mail server or between the mail server and the mail readers.
Figure A shows an SMTP proxy that receives mail from the Internet and
classifies it using Bayesian techniques before handing it off to a mail server.
Figure B shows a POP3 proxy that retrieves mail from a mail server on
behalf of a mail client program, classifies it using Bayesian techniques
and then hands it off to the mail client that requested the mail.
POPFileD is a Mercury/32 Daemon that implements a third way to use POPFile,
by having Mercury/32 call POPFile to examine mail. This allows Mercury/32
to serve as the direct interface both to the Internet and to the mail
reader clients, without having POPFile as a proxy.
Figure C shows the mail server receiving mail directly from the Internet,
passing it to the Bayesian Classifier for classification, and then
giving it to a mail client when needed.
The sequence of processing using POPFileD is as follows:
Mercury/32 receives an email message.
The Mercury/32 Daemon passes a file with the email to POPFile.
This takes place before Policies, Content Filters and Mail Filtering Rules
are applied. By default, the Daemon uses port 111
to communicate with POPFile, but this is configurable.
POPFile classifies the email and returns to Mercury/32 the name
of the bucket where the email belongs.
Mercury/32 adds a header to the email containing the name of the
bucket.
Mercury/32 then continues processing the email. Either the
Mercury/32 filters or the end recipient's filters can check for the
header that contains the bucket name and take appropriate action.
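The header step in this sequence can be sketched in Python with the standard email package. This is only an illustration: the real Daemon is a DLL, the function name tag_message and the sample message are hypothetical, and the header name X-Text-Classification is taken from the option descriptions later in this document.

```python
from email import message_from_string


def tag_message(raw: str, bucket: str) -> str:
    """Return the message with one header naming the POPFile bucket."""
    msg = message_from_string(raw)
    # Remove any classification header that arrived with the mail,
    # so the message ends up with exactly one.
    del msg["X-Text-Classification"]
    msg["X-Text-Classification"] = bucket
    return msg.as_string()


# Hypothetical sample message and bucket name.
raw = "From: a@example.com\nSubject: hello\n\nbody text\n"
tagged = tag_message(raw, "spam")
```

A Mercury/32 filter or the recipient's mail client can then test for this header and act on the bucket name.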
The Mercury/32 daemon has optional features to reduce unnecessary
processing and to aid in the handling of spam once it has been detected.
A size limit so that POPFile does not process messages larger
than a given size. This eliminates the overhead of processing email
with large attachments. The assumption is that spammers do not send
very large messages because it ties up their bandwidth.
A regular expression filter. The lines in the header of the message
being processed are compared with a set of regular expressions
and if a match is found the message is not given to POPFile for
processing.
Optionally, if a message received by Mercury/32 has at least
one non-local recipient, it will not be checked by POPFile, independent
of how many local recipients it has. The assumption is that you do
not send spam and you do not have an open relay, so you do not need
to check the mail your system is sending. This is particularly
important if you host mailing lists: a message received for the list will
generate one outbound copy for each member of your list, and without this
option every copy would be checked by POPFile.
The ability to modify the Subject: header by inserting the name of
the POPFile bucket. This allows some email clients
(e.g. Outlook Express) that can only
examine the Subject: line to still discriminate spam from non-spam.
The ability to add headers containing the name of the recipient
to spam email. Many systems handle spam by redirecting it to a separate
mailbox that is checked periodically for mail incorrectly tagged as spam.
These false positives are manually forwarded to the intended recipient.
Sometimes the "To:" or "CC:" headers in the mail
do not contain the name of the intended recipient, and it is difficult to find
out who the false positive was meant for. This feature simplifies
the problem by adding the
name of the recipient from the email envelope to the mail headers.
Note that this feature should be used with some thought since it will
add to the mail the names of the "BCC:" recipients,
thereby destroying the confidentiality afforded by the use of
"BCC:". For this reason, the names of the recipients are
added only to mail that has been tagged with the X-Text-Classification
or X-POPFile-Link headers, and not to mail without these headers.
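The recipient-tagging feature can be sketched as follows. The header names are the ones listed with the -t option in the installation section. Everything else is an assumption for illustration: the function name, the choice of one X-Recipient header per envelope recipient, and the sample addresses.

```python
from email import message_from_string

# Headers that mark a message as tagged (named under the -t option).
TAG_HEADERS = ("X-Text-Classification", "X-POPFile-Link")


def add_recipient_headers(raw: str, envelope_recipients: list[str]) -> str:
    """Add X-Recipient headers, but only to mail that is already tagged."""
    msg = message_from_string(raw)
    # Untagged mail is left alone so that BCC recipients stay confidential.
    if any(name in msg for name in TAG_HEADERS):
        for rcpt in envelope_recipients:
            # Assumption: one header per recipient; assignment appends.
            msg["X-Recipient"] = rcpt
    return msg.as_string()
```

Whoever reviews the spam mailbox can then read the intended recipient directly from the headers of a false positive.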
Communication between Mercury/32 and POPFile uses a TCP/IP socket.
This adds robustness to the configuration. If POPFile crashes,
Mercury/32 will continue processing mail, albeit without checking it
for spam. In a proxy configuration, a crash of POPFile can completely
halt mail processing.
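The fail-open behaviour described here can be sketched with a plain TCP client. The wire format below (send the path of the message file, read back one line with the bucket name) is purely hypothetical and is not the actual protocol between the Daemon and MERC.pm. The function name and the timeout value are also assumptions.

```python
import socket
from typing import Optional


def ask_popfile(path: str, host: str = "127.0.0.1", port: int = 111,
                timeout: float = 5.0) -> Optional[str]:
    """Return the bucket name, or None if POPFile cannot be reached."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            # Hypothetical wire format: one line out, one line back.
            conn.sendall(path.encode("utf-8") + b"\n")
            reply = conn.makefile("rb").readline()
    except OSError:
        # POPFile is down or not answering: continue without a classification.
        return None
    bucket = reply.decode("utf-8", "replace").strip()
    return bucket or None
```

A caller that gets None simply delivers the message unclassified, which is the difference from a proxy setup, where no mail flows at all while POPFile is down.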
Note that mail from a local sender to a local recipient must always be
checked, because the sender could actually be a spammer forging the local
sender's address.
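Taken together, the size limit and the local-recipients-only option amount to a pre-check made before POPFile is called. A minimal sketch follows, assuming 1024 bytes per kilobyte, a limit that skips messages of exactly the limit or more, and that addresses without a domain are local. All names are hypothetical. The sender is deliberately never consulted, in line with the note above.

```python
from typing import Optional, Sequence


def should_classify(size_bytes: int, recipients: Sequence[str],
                    local_domains: Sequence[str],
                    max_kb: Optional[int] = None,
                    local_only: bool = False) -> bool:
    """Decide whether a message should be handed to POPFile."""
    # Size limit: messages of max_kb kilobytes or more are skipped.
    if max_kb is not None and size_bytes >= max_kb * 1024:
        return False
    if local_only:
        domains = {d.lower() for d in local_domains}
        for rcpt in recipients:
            _, at, domain = rcpt.rpartition("@")
            # Assumption: an address without "@" is a local mailbox.
            if at and domain.lower() not in domains:
                # One non-local recipient is enough to skip the check.
                return False
    return True
```

For example, with max_kb=50 and local_only=True, a 10 KB message addressed only to local users is classified, while the same message with one outside recipient is not.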
Installation Instructions
You should have received the following files:
POPFileD.dll: Mercury Daemon DLL
daemon.ini: Mercury Daemon description file
MERC.pm: POPFile side of the interface
English.msg: POPFile text strings
POPFileD.html: This file
proxies.gif: The graphics in this document
popfiled.rxp: Sample regular expressions
Both Mercury/32 and POPFile should already be installed on the same
computer. Mercury should be operational and correctly handling
your mail without using POPFile as a proxy.
This version of the Daemon works with POPFile Version 0.21.0. POPFile
should not be running.
Copy the files POPFileD.dll
and popfiled.rxp to the directory where Mercury.exe is installed.
If your Mercury directory does not have a file called daemon.ini, copy
the file provided to this directory.
If your Mercury directory already has a file called daemon.ini,
append the text from the file provided to your file.
Use Notepad to edit the file daemon.ini. It should have at least two
lines that look like this:
Leave the first line as-is. The second line should have on the right of
the equal sign:
the full path to the POPFileD DLL
a semicolon
optional configuration parameters for the daemon. These parameters are:
-p<number>
The TCP port used to
communicate with POPFile. The default is 111.
-l
If present, only mail whose recipients
are all local is processed by POPFile. The default is to process all email.
-s<number>
If present, mail larger
than <number> KB will not be processed by POPFile. The default
is to process all email. Note that the value 50 in the example
means that email messages of 50 KB or more will not be checked
by POPFile at all. If you are receiving spam messages
of more than 50 KB, you should adjust this value to suit your needs.
Some users have reported receiving spam messages as large as 120 KB.
-t
If present, mail tagged with either the
X-Text-Classification or the X-POPFile-Link headers
will also receive the new header
X-Recipient: <recipient_name>
-f<filename>
The full path of the
file containing regular expressions to filter mail. See below for
the format of this file.
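To make the option list concrete, here is a small Python sketch that reads such a parameter string into a settings object. It assumes the options are separated by whitespace and that the value follows the flag letter directly, as the -p<number> notation suggests. The class, the function and the sample path are hypothetical, and the Daemon's own parsing may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DaemonOptions:
    port: int = 111                   # -p<number>
    local_only: bool = False          # -l
    max_kb: Optional[int] = None      # -s<number>
    tag_recipients: bool = False      # -t
    regex_file: Optional[str] = None  # -f<filename>


def parse_options(text: str) -> DaemonOptions:
    """Parse a whitespace-separated option string (assumed format)."""
    opts = DaemonOptions()
    for token in text.split():
        flag, value = token[:2], token[2:]
        if flag == "-p":
            opts.port = int(value)
        elif flag == "-l":
            opts.local_only = True
        elif flag == "-s":
            opts.max_kb = int(value)
        elif flag == "-t":
            opts.tag_recipients = True
        elif flag == "-f":
            opts.regex_file = value
        else:
            raise ValueError(f"unknown option: {token}")
    return opts
```

Under these assumptions, parse_options(r"-p8111 -l -s50 -t -fC:\MERCURY\popfiled.rxp") yields port 8111, local-only checking, a 50 KB limit, recipient tagging and the given rule file. An empty string yields the defaults.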
Edit the file popfiled.rxp to match your domain and your policies.
Restart Mercury/32. Check the messages in the System Messages window;
make sure the daemon has
started.
The Daemon will report its version number as it starts.
To view the System Messages window,
click on Window/System Messages.
Copy the file MERC.pm to the "proxy" subdirectory of your
POPFile installation. The name of the file is case sensitive.
Since you will not be using POPFile as a proxy, you can delete the file
POP3.pm and the file SMTP.pm (if it exists) from that directory,
or move them to a safe place.
Do not remove the file Proxy.pm from that directory;
without it POPFile will not run.
Copy the file English.msg to the languages subdirectory of your
POPFile installation, replacing the file by that name already there.
This file contains text strings used by POPFile to construct its browser
interface. POPFileD adds four lines at the end of this file.
If you are using a language other than American English, you will need to
copy those four lines to the appropriate language file in your languages
directory, and translate the text to your language. Sorry, but my
technical Spanish is rusty and my Bulgarian is non-existent.
Start POPFile. Use its configuration interface to verify that the Mercury
module has been recognized and that it is listening on the same TCP port
that you configured in step 4 (the default is 111). If you need to,
change the port number and restart POPFile.
Test by sending an email into your Mercury/32 server.
Check the System Messages window in Mercury/32 for errors.
Regular Expressions
The Daemon can compare the headers of a message against a set of
text strings called regular expressions. If the header lines match
the regular expressions then the Daemon will allow the message to proceed
without passing it to POPFile for classification. Possible uses of
this feature are:
Recognizing mail from internal users. If you know your users
are not sending spam, there is no need to check their mail.
Recognizing mail from an external user that is being resent by
an internal user.
Recognizing mail being automatically resent by Mercury's
forwarding file capability.
Recognizing mail that has been detected as spam by a DNS-based
Black List (DNSBL). If the DNSBL has determined the mail is spam, there
is no point in having POPFile check it for spam.
For a good explanation of Regular Expressions see the
Mercury/32 Help file. For consistency, the Daemon uses the
same regular expressions as Mercury/32.
The regular expression file contains one regular expression per line.
A regular expression must be preceded by one of two keywords,
"if" and "and". Their use is easiest
to explain with examples. If the file contained:
if <expression_1>
if <expression_2>
then the message would not be processed by POPFile if the message headers
contained a line that matched either <expression_1> or <expression_2>.
If the file contained:
if <expression_1>
and <expression_2>
if <expression_3>
then the message would not be processed by POPFile if the message headers
contained either a line matching <expression_1> and
a line matching
<expression_2> or if the headers contained a line matching
<expression_3>.
In general, the file consists of groups of regular
expressions. The first regular expression in each group is
introduced by the keyword "if". This regular expression is
optionally
followed by regular expressions that are introduced by the keyword
"and". The message will not be
handed off to POPFile if its headers contain lines matching
all the regular expressions in at least one group.
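The group logic can be sketched in Python as an OR over groups of ANDed expressions. Note that Python's re module stands in here for Mercury/32's own regular expression syntax, which is different. The function names, the use of a substring search rather than a whole-line match, and the sample rules are assumptions made for illustration.

```python
import re


def parse_rules(text: str) -> list[list[re.Pattern]]:
    """Split a rule file into groups: "if" starts a group, "and" extends it."""
    groups: list[list[re.Pattern]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        keyword, _, expr = line.partition(" ")
        pattern = re.compile(expr.strip())
        if keyword == "if":
            groups.append([pattern])
        elif keyword == "and" and groups:
            groups[-1].append(pattern)
        else:
            raise ValueError(f"bad rule line: {line!r}")
    return groups


def skip_popfile(header_lines: list[str], groups: list[list[re.Pattern]]) -> bool:
    """True if every expression in at least one group matches some header line."""
    return any(
        all(any(p.search(h) for h in header_lines) for p in group)
        for group in groups
    )
```

With the hypothetical rules "if ^From: .*@example\.com", "and ^X-Mailer: Internal" and "if ^X-DNSBL: listed", a message bypasses POPFile when it has both of the first two headers, or the third one alone.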
Support
I can provide support only via e-mail and in my spare time,
which usually means evenings.
Send support questions and suggestions to
.
Change History
New in Version 1.2.x
The following are the salient differences between POPFileD 1.2.x and 1.1.x:
POPFileD 1.2.x supports regular expressions to give some control
over what messages are passed to POPFile.
POPFileD 1.2.x can handle malformed mail messages that consist
of headers without a message body.
Version 1.2.1: Correct problem with filename passed incorrectly
to POPFile when -f option not given in Daemon command line.
Version 1.2.2: Correct bug causing Daemon to delete blank
line separating header from body.
New in Version 1.1.x
POPFileD version 1.1.x was created for compatibility with
POPFile 0.21.0. If you
are using POPFile 0.20.x, you should use POPFileD version 1.0.1.
The following are the salient differences between POPFileD 1.0.x and 1.1.x:
The -b flags used in version 1.0.x of the daemon to define
which buckets require adding headers to the mail and which buckets do not,
have been dropped. Instead, the daemon uses the new POPFile
configuration feature that lets you define what headers
should be added to the mail on a per-bucket basis.
Daemon can now modify the Subject: header with the name of the bucket.
This is also controlled from the POPFile configuration.
If the incoming mail contains X-Text-Classification: or X-POPFile-Link:
headers, the Daemon now deletes them before adding its own headers to prevent
duplicated headers.
Both the POPFile extension and the Daemon report their version
number at startup.
Terms and conditions of use
PopFile Daemon is free software and may be used by any number of users
on any number of systems without fee or obligation, subject only to
the terms and conditions laid out below.
PopFile Daemon is NOT in the public domain - the author,
Marketing Matrix, Inc., retains ownership and copyright, and exclusively
reserves all rights to the software. In countries where assertion of the
right to be identified as the author is required for copyright purposes,
Marketing Matrix, Inc. asserts its right to be recognized as the author
and owner of the PopFile Daemon and all its associated components.
Modification of the program or its resources or associated data files
without the author's explicit written permission is strictly forbidden.
Unauthorized modifications of any component of the PopFile Daemon DLL
constitute a breach of intellectual property laws in most countries and
will be pursued vigorously to the full extent of the law.
PopFile Daemon may be used by anyone and may be freely distributed
via any medium, either commercial or non-commercial, provided the
following conditions are met:
Distribution:
PopFile Daemon must be distributed complete
and unaltered in the original ZIP archive file or self-extracting archive,
with all messages intact. System administrators and ISPs wishing to
repackage the PopFile Daemon archive for supply to their users may do
so provided some basic guidelines are followed - please send e-mail to
the address above for more information on this. In the event that
the PopFile Daemon software is being distributed as part of another
package or software bundle, or in association with software or services
for which a charge is being levied, the author's permission must be
obtained before distribution occurs. We will authorize by fax or
by an e-mail message signed with our public key signature at our discretion.
Bona fide Internet Service Providers are exempted from the requirement to
obtain formal permission (see section 6, below).
Charging for distribution:
No charge may be directly levied for
the PopFile Daemon itself. Fair copying and support charges may be applied
but you must not represent that you are actually selling the software
itself. The intent of this statement is to allow book publishers to
distribute the system freely with books, and to permit Software Libraries
and BBS systems to distribute PopFile Daemon in their catalogues provided
only reasonable handling and duplication fees are charged.
Prohibited supply:
The supply or promotion of PopFile Daemon
for the purpose of sending bulk, unsolicited e-mail is incompatible with
the basic aims of the program, which revolve around the free provision
of a service that enhances the quality of communication between people.
PopFile Daemon may not be included in any package designed for this purpose,
whether free or otherwise, nor may vendors of such packages use the
"PopFile Daemon" trademark or other related material in the promotion of
their package. Similarly, we do not consider bulk, unsolicited e-mail
to be an appropriate use of PopFile Daemon and reserve the right to
decline technical support to people using it for this purpose.
Prohibited use:
PopFile Daemon may not be used for the purpose
of sending Bulk Unsolicited Commercial Electronic Mail. For the purposes
of this section, this shall be construed to mean electronic mail sent
to a total of more than 50 recipients for the purpose of advertising
a commercial product or service, where the recipient has not
explicitly expressed interest in receiving such advertisements.
Ownership:
Ownership of the PopFile Daemon Software remains
vested in the author, Marketing Matrix, Inc. You may not represent
ownership or copyright in the system in the course of distribution,
and you must not represent any specific connection with, or authorization
or license from the author.
Distribution by ISPs:
Bona-fide Internet Service Providers are
explicitly granted permission to bundle the PopFile Daemon Software with
their standard subscriber access package if they wish, even if a charge
is levied for that access package, provided the conditions laid out
in section (1) above are met. If you supply PopFile Daemon as a separate
item (as opposed to being part of an access bundle), you may only charge
a reasonable duplication or handling fee and must otherwise abide by
all other terms and conditions defined herein.
No liability:
Although all possible care has been taken to
ensure that the PopFile Daemon Software is as reliable as possible,
the diversity of environments in which it might be used means that we
can accept no responsibility for loss or damage, whether real
or consequential, arising from its use. By using the software you
explicitly agree to hold the author blameless for any such losses
or damages.
All rights reserved:
We reserve the right to change the terms
and conditions of use and distribution of PopFile Daemon without specific
notice, although we will make reasonable efforts to advise of any such
change through normal channels (user groups, mailing lists and so on).
The current terms and conditions of use of PopFile Daemon can be obtained at
any time by sending a message to the address above requesting them.
All the legalese aside, it is our strong desire that PopFile Daemon
be as widely used as is possible in the hope that by furthering
communication between people, we may in some small way come to
understand and accept each other better.