POPFile
(http://getpopfile.org/)
is a program that uses Bayesian techniques to classify email into one of
several categories ("buckets") by examining the words that make
up the message. It is typically used to classify mail as spam or non-spam.
The standard implementation of POPFile is as a proxy, either between the
Internet and the mail server or between the mail server and the mail readers.
Figure A shows an SMTP proxy that receives mail from the Internet and
classifies it using Bayesian techniques before handing it off to a mail server.
Figure B shows a POP3 proxy that retrieves mail from a mail server on
behalf of a mail client program, classifies it using Bayesian techniques,
and then hands it off to the mail client that requested the mail.
POPFileD is a Mercury/32 Daemon that implements a third way to use POPFile,
by having Mercury/32 call POPFile to examine mail. This allows Mercury/32
to serve as the direct interface both to the Internet and to the mail
reader clients, without having POPFile as a proxy.
Figure C shows the mail server receiving mail directly from the Internet,
passing it to the Bayesian Classifier for classification, and then
giving it to a mail client when needed.
The sequence of processing using POPFileD is as follows:
Mercury/32 receives an email message.
The Mercury/32 Daemon passes a file with the email to POPFile.
This takes place before Policies, Content Filters and Mail Filtering Rules
are applied. By default the Daemon communicates with POPFile on port 111,
but this is configurable.
POPFile classifies the email and returns to Mercury/32 the name
of the bucket where the email belongs.
Mercury/32 adds a header to the email containing the name of the
bucket.
Mercury/32 then continues processing the email. Either the
Mercury/32 filters or the end recipient's filters can check for the
header that contains the bucket name and take appropriate action.
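The header step in the sequence above can be sketched in a few lines of Python. This is only an illustration, not the Daemon's actual code: the header name `X-Text-Classification` is taken from the change history later in this document, the function name `add_bucket_header` is hypothetical, and the sketch assumes line endings have already been normalized to `\n`.

```python
def add_bucket_header(raw_message: str, bucket: str,
                      header_name: str = "X-Text-Classification") -> str:
    """Return raw_message with a classification header appended to its headers.

    Hypothetical helper: assumes '\n' line endings and that the header
    block ends at the first blank line.
    """
    # Split the header block from the body at the first blank line.
    head, _sep, body = raw_message.partition("\n\n")
    # Append the new header as the last line of the header block.
    head = head.rstrip("\n") + f"\n{header_name}: {bucket}"
    # Always restore the blank separator line, even if there is no body.
    return head + "\n\n" + body


message = "From: a@example.com\nSubject: Hello\n\nBody text\n"
tagged = add_bucket_header(message, "spam")
```

A recipient's filter can then test for the added header line instead of inspecting the message content.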
The Mercury/32 daemon has optional features to reduce unnecessary
processing and to aid in the handling of spam once it has been detected.
A size limit so that POPFile does not process messages larger
than a given size. This eliminates the overhead of processing email
with large attachments. The assumption is that spammers do not send
very large messages because it ties up their bandwidth.
A regular expression
filter (not available in POPFileD version 1.1.x).
The lines in the header of the message
being processed are compared with a set of regular expressions
and if a match is found the message is not given to POPFile for
processing.
Optionally, if a message received by Mercury/32 has at least
one non-local recipient, it will not be checked by POPFile, independent
of how many local recipients it has. The assumption is that you do
not send spam and you do not have an open relay, so you do not need
to check the mail your system is sending. This is particularly
important if you host mailing lists: a message received for the list will
generate one outbound copy for each member of your list, and without this
option every copy would be checked by POPFile.
The ability to modify the Subject: header by inserting the name of
the POPFile bucket. This allows some email clients
(e.g. Outlook Express) that can only
examine the Subject: line to still discriminate spam from non-spam.
The ability to add headers containing the name of the recipient
to spam email. Many systems handle spam by redirecting it to a separate
mailbox that is checked periodically for mail incorrectly tagged as spam.
These false positives are manually forwarded to the intended recipient.
Sometimes the "To:" or "CC:" headers in the mail
do not contain the name of the intended recipient, and it is difficult to find
out who the false positive was meant for. This feature simplifies
the problem by adding the
name of the recipient from the email envelope to the mail headers.
Note that this feature should be used with some thought, since it
adds the names of the "BCC:" recipients to the mail,
thereby destroying the confidentiality afforded by the use of
"BCC:". For this reason, the names of the recipients are
added only to mail that has been tagged with special headers;
mail that has not received these headers is left unchanged.
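The size limit and the non-local recipient option described above both amount to a decision about whether a message is handed to POPFile at all. A minimal sketch of that decision in Python follows. The function and parameter names are hypothetical and do not correspond to actual Daemon options, and the sketch assumes every recipient address is fully qualified (`user@domain`).

```python
def should_classify(size_bytes, recipients, local_domains,
                    max_size_bytes=None, skip_if_any_nonlocal=True):
    """Decide whether a message should be passed to POPFile.

    Hypothetical sketch; names and defaults are illustrative only.
    """
    # Size limit: skip messages larger than the configured maximum.
    if max_size_bytes is not None and size_bytes > max_size_bytes:
        return False
    # Non-local recipients: one outbound recipient is enough to skip
    # the message, regardless of how many local recipients it has.
    if skip_if_any_nonlocal:
        for address in recipients:
            domain = address.rpartition("@")[2].lower()
            if domain not in local_domains:
                return False
    return True


local = {"example.org"}
print(should_classify(2_000, ["bob@example.org"], local, max_size_bytes=100_000))
print(should_classify(2_000, ["bob@example.org", "eve@elsewhere.net"], local))
```

The first call prints `True` and the second prints `False`, because the second message has a recipient outside the local domain.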
Communication between Mercury/32 and POPFile uses a TCP/IP socket.
This adds robustness to the configuration. If POPFile crashes,
Mercury/32 will continue processing mail, albeit without checking it
for spam. In a proxy configuration, a crash of POPFile can completely
halt mail processing.
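The robustness argument can be illustrated with a small client-side sketch in Python. The wire format used here (one line containing the message file name, answered by one line containing the bucket name) is an assumption made for the example; the actual exchange between the Daemon and POPFile is not described in this document. The function name `classify_or_skip` is also hypothetical. Port 111 is the value mentioned earlier.

```python
import socket


def classify_or_skip(message_path, host="127.0.0.1", port=111, timeout=5.0):
    """Ask a classifier for a bucket name; return None if it is unreachable.

    Hypothetical protocol: send the file name on one line, read the
    bucket name back on one line.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(message_path.encode("utf-8") + b"\n")
            reply = conn.makefile("r", encoding="utf-8").readline()
    except OSError:
        # Connection refused, timed out, or dropped: the caller simply
        # continues processing the mail without a classification.
        return None
    return reply.strip() or None
```

Because a failure yields `None` rather than an error, mail delivery in this sketch never depends on the classifier being up, which is the property the paragraph above describes.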
Note that mail from a local sender to a local recipient must always be
checked, because the sender could actually be a spammer forging the local
sender's address.
The version of POPFileD you need to download depends on the version of
POPFile you are using.
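The version pairs mentioned in the change history of this document can be collected into a small lookup. The Python sketch below is only a convenience illustration; the table contains just the pairs stated in this document, and the names `COMPATIBLE_DAEMON` and `daemon_for` are hypothetical.

```python
# POPFile release series -> POPFileD version named in this document.
COMPATIBLE_DAEMON = {
    "0.20": "1.0.1",
    "0.21": "1.2.2",
    "0.22": "1.22.4",
}


def daemon_for(popfile_version):
    """Return the POPFileD version for a POPFile version, or None if unknown."""
    # Reduce e.g. "0.22.4" to its release series "0.22".
    series = ".".join(popfile_version.split(".")[:2])
    return COMPATIBLE_DAEMON.get(series)


print(daemon_for("0.22.4"))
```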
(This feature is not available in POPFileD version 1.1.x and earlier.)
The Daemon can compare the headers of a message against a set of
text strings called regular expressions. If the header lines match
the regular expressions then the Daemon will allow the message to proceed
without passing it to POPFile for classification. Possible uses of
this feature are:
Recognizing mail from internal users. If you know your users
are not sending spam, there is no need to check their mail.
Recognizing mail from an external user that is being resent by
an internal user.
Recognizing mail being automatically resent by Mercury's
forwarding file capability.
Recognizing mail that has been detected as spam by a DNS-based
Black List (DNSBL). If the DNSBL has determined the mail is spam, there
is no point in having POPFile check it for spam.
For a good explanation of Regular Expressions see the
Mercury/32 Help file. For consistency, the Daemon uses the
same regular expressions as Mercury/32.
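As a rough illustration of the idea, the following Python sketch checks header lines against a list of patterns and reports whether classification can be skipped. It uses Python's `re` syntax, which is not the Mercury/32 regular expression syntax the Daemon actually uses. The patterns, the `X-Blocklist-Result` header, and the function name are all hypothetical.

```python
import re

# Hypothetical patterns: mail from a local domain, and mail already
# flagged by a blocklist check (the header name is invented).
SKIP_PATTERNS = [
    re.compile(r"^From:.*@example\.org>?\s*$", re.IGNORECASE),
    re.compile(r"^X-Blocklist-Result:\s*listed", re.IGNORECASE),
]


def matches_skip_pattern(header_lines, patterns=SKIP_PATTERNS):
    """Return True if any header line matches any skip pattern."""
    return any(p.search(line) for line in header_lines for p in patterns)


headers = ["From: Alice <alice@example.org>", "Subject: hi"]
print(matches_skip_pattern(headers))
```

A pattern that trusts the From: header should be used with care, since that header can be forged.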
Support
I can provide support only via e-mail and in my spare time,
which usually means evenings.
Send support questions and suggestions to
.
Change History
New in Version 1.22.4
POPFileD 1.22.4 contains internal modifications to use hooks
implemented in Mercury/32 V4.10 and newer that provide a documented
way for the Daemon to modify incoming messages. In Mercury/32 V4.01b and
earlier, there was no documented way to do this, and as a result POPFileD
would sometimes not be able to work when other Mercury/32 daemons were
installed. POPFileD is backwards compatible and will still work with
Mercury/32 V4.01b and earlier using the old method of message
modification.
POPFileD 1.22.4 allows extensive debugging output, which should
simplify the diagnosis of problems.
POPFileD 1.22.4 also contains a change in the installation procedure
that eliminates some minor glitches in the POPFile user interface when using
POPFile Version 0.22.4. It is recommended that all POPFileD users upgrade to
this new version of the Daemon:
Shut POPFile down.
Reinstall POPFile (now is the time to upgrade to the latest version).
This will reinstall the original POPFile user interface internationalization
files, which were replaced when you installed an older version of the Daemon.
Install POPFileD following the instructions in this document.
New in Version 1.22.0
POPFileD 1.22.0 is the successor to POPFileD 1.2.2. It was created for
compatibility with POPFile 0.22.x, and can only be used with this version
of POPFile. If you are using POPFile 0.21.x, you should be using POPFileD 1.2.2.
The POPFileD version numbering has been changed to track the POPFile version numbers.
POPFileD 1.22.0 only contains modifications to the Perl modules, to
account for the design and implementation changes between POPFile 0.21.x and
POPFile 0.22.0. There are no functionality changes in POPFileD 1.22.0.
New in Version 1.2.x
The following are the salient differences between POPFileD 1.2.x and 1.1.x:
POPFileD 1.2.x supports regular expressions to give some control
over what messages are passed to POPFile.
POPFileD 1.2.x can handle malformed mail messages that consist
of headers without a message body.
Version 1.2.1: Corrected a problem in which the filename was passed
incorrectly to POPFile when the -f option was not given on the Daemon command line.
Version 1.2.2: Corrected a bug that caused the Daemon to delete the blank
line separating the header from the body.
New in Version 1.1.x
POPFileD version 1.1.x was created for compatibility with
POPFile 0.21.0. If you
are using POPFile 0.20.x, you should use POPFileD version 1.0.1.
The following are the salient differences between POPFileD 1.0.x and 1.1.x:
The -b flags used in version 1.0.x of the daemon to define
which buckets require adding headers to the mail and which buckets do not,
have been dropped. Instead, the daemon uses the new POPFile
configuration feature that lets you define what headers
should be added to the mail on a per-bucket basis.
The Daemon can now modify the Subject: header with the name of the bucket.
This is also controlled from the POPFile configuration.
If the incoming mail contains X-Text-Classification: or X-POPFile-Link:
headers, the Daemon now deletes them before adding its own headers to prevent
duplicate headers.
Both the POPFile extension and the Daemon report their version
number at startup.
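The header clean-up described in the list above can be sketched as follows. This Python fragment is an illustration under stated assumptions, not the Daemon's code: the function name is hypothetical, and the input is assumed to be the header block already split into lines, with folded continuation lines starting with a space or tab.

```python
def strip_classification_headers(header_lines):
    """Drop existing X-Text-Classification: and X-POPFile-Link: headers.

    Hypothetical helper; continuation lines of a dropped header are
    dropped along with it.
    """
    prefixes = ("x-text-classification:", "x-popfile-link:")
    kept = []
    dropping = False
    for line in header_lines:
        if line[:1] in (" ", "\t"):
            # Folded continuation: shares the fate of the header it continues.
            if not dropping:
                kept.append(line)
            continue
        dropping = line.lower().startswith(prefixes)
        if not dropping:
            kept.append(line)
    return kept


headers = [
    "From: a@example.com",
    "X-Text-Classification: ham",
    "X-POPFile-Link: http://127.0.0.1:8080/",
    " continued",
    "Subject: Hi",
]
print(strip_classification_headers(headers))
```

Removing the old headers first means a message that passes through a classifier twice ends up with exactly one classification header.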