I've been 100% spam free for 4 days.
Bayesian spam filtering is cool. I encourage you to search Google
and read up on it. The basic idea is: take all your email and split it
into a folder of spam and a folder of nonspam. Then break each email
up into one-word chunks (tokens), and calculate the percentage of spam
vs. nonspam emails each token shows up in. Then, when you get a new
email, break it up into the same kind of tokens and, using Bayesian math
(I haven't bothered to look up the algorithms), calculate the probability
that it is spam.
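Here's a rough sketch of that idea in Python. It is a plain naive Bayes classifier with add-one smoothing, which is an assumption for illustration and not necessarily what spamprobe or any other tool does internally. The function names (tokenize, train, spam_probability) are made up for this example.

```python
import math
from collections import Counter

def tokenize(text):
    """Split an email into lowercase one-word tokens (a simplification)."""
    return text.lower().split()

def train(spam_emails, ham_emails):
    """Count how many emails of each kind contain each token."""
    spam_counts = Counter()
    ham_counts = Counter()
    for email in spam_emails:
        spam_counts.update(set(tokenize(email)))
    for email in ham_emails:
        ham_counts.update(set(tokenize(email)))
    return spam_counts, ham_counts, len(spam_emails), len(ham_emails)

def spam_probability(email, model):
    """Estimate the probability that an email is spam (naive Bayes)."""
    spam_counts, ham_counts, n_spam, n_ham = model
    # Start from the overall share of spam vs. nonspam in the training mail.
    log_spam = math.log(n_spam / (n_spam + n_ham))
    log_ham = math.log(n_ham / (n_spam + n_ham))
    for token in set(tokenize(email)):
        # Add-one smoothing so an unseen token never gives a zero probability.
        log_spam += math.log((spam_counts[token] + 1) / (n_spam + 2))
        log_ham += math.log((ham_counts[token] + 1) / (n_ham + 2))
    # Turn the two log scores back into a probability without overflowing.
    diff = log_spam - log_ham
    if diff >= 0:
        return 1 / (1 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1 + e)

model = train(
    ["buy cheap pills now", "cheap pills online"],
    ["lunch tomorrow at noon", "see you at lunch"],
)
print(spam_probability("cheap pills", model))
print(spam_probability("lunch at noon", model))
```

With this toy training set, the first message scores well above 0.5 and the second well below it. A real filter trains on thousands of messages and also looks at headers, not just body words.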
The only mail reader I know of that does this on multiple OSes is
Mozilla Mail with its Bayes Junk Tool. I haven't used it, but it should
work on Windows, MacOS, Linux... pretty much everything. (Thanks to bj
for this info.) (I believe it only does one-word tokens; I've requested
they add two-word tokens.)
If you use another mail reader (one that doesn't run on unix / can't
reasonably be configured to use spamprobe), I strongly encourage you
to email the author of your mail reader (microsoft, qualcomm, whoever)
and request multi-word Bayesian spam filtering. With the amount of
spam that happens these days, it is a completely reasonable request,
and I expect it to be a popular enough feature to win them a noticeable
number of users.
These URLs are only likely to be useful for those with unix email accounts:
http://spamprobe.sourceforge.net/
http://www.chaosreigns.com/spamprobe.directions.txt - short set up directions
Recent versions of spamassassin do this via sa-learn, but they only use
one-word (case sensitive) tokens and still include the rest of
spamassassin's hard-coded spam ranking rules. That gave me about 96-97%
accuracy, which left a lot of spam in my inbox. Still way better than
nothing, and easy to set up.
spamprobe uses one- and two-word (or more, via a command-line option)
case-insensitive tokens, and... has performed perfectly since I set it up
on 10/2. That day I trained it on the previous 21 days of email. Normally,
with Bayesian spam filtering, you continue to train on new email (skimming
addresses and subjects of spam first, to verify they've been correctly
classified), but since it's been 100% accurate so far, I've decided to wait
and see how long it takes to get one wrong.
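To make the one-word vs. two-word token difference concrete, here's a minimal Python sketch. It only splits on whitespace and lowercases everything. spamprobe's real tokenizer is certainly more involved, and the function name word_tokens and its max_words parameter are made up for this example.

```python
def word_tokens(text, max_words=2):
    """Return case-insensitive tokens of 1..max_words consecutive words."""
    words = text.lower().split()
    tokens = []
    for n in range(1, max_words + 1):
        # Slide a window of n words across the message.
        for i in range(len(words) - n + 1):
            tokens.append(" ".join(words[i:i + n]))
    return tokens

print(word_tokens("Buy Cheap Pills"))
# ['buy', 'cheap', 'pills', 'buy cheap', 'cheap pills']
```

The two-word tokens let a filter tell a phrase like "cheap pills" apart from the same words showing up separately in legitimate mail, which is presumably why they help.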
I've received 28 legitimate personal emails, and 657 spams during
this time. If it were to mis-classify an email now, it would be 99.85%
accurate, but it's still at 100%.
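The 99.85% figure is just one miss out of all the mail counted above. A quick check of the arithmetic in Python (the variable names are only for illustration):

```python
legit = 28   # legitimate personal emails
spam = 657   # spams
total = legit + spam

# Accuracy if exactly one message out of the total were misclassified.
accuracy_after_one_miss = (total - 1) / total * 100
print(f"{total} emails, {accuracy_after_one_miss:.2f}% after one miss")
# 685 emails, 99.85% after one miss
```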
This has been so successful that I've really been struck by how little
email I get without spam.
AOL has begun including bayesian spam filtering in their client. I don't
know if it uses multi-word tokens yet though.
I can't wait till Microsoft starts including it in their mail readers.
Not because I would ever use either of these, but because they have
such large market share that if everyone that uses them has perfect spam
filtering, maybe the spammers will stop.