I do some pretty aggressive spam filtering, because I get a lot of spam. I even wrote my own multi-word token Bayesian filter, and it works, but no better than SpamProbe. So my complete inability to block the new style of spam, which pads the message with random text and puts all of its spamminess in an inlined attached image, depressed me. My discussions with the authors of SpamProbe and CRM114 (very effective adaptive filters) did not improve my mood.

A big problem in dealing with spam is what to do with it once it has been identified.
The two options I was aware of were:
1. Write it to a folder and never look at it again (or discard it). Then if it turns out to be legitimate mail, you never see it and the sender never finds out.
2. Bounce it back to the sending address, which was probably forged, so the bounce does nobody any good.

A few days ago I was talking to [info]beowabbit about spam, and he mentioned that spam can be scored and rejected before the initial incoming SMTP connection is closed, which makes the sending mail server responsible for sending the rejection message. This is wonderful because if the message was sent by a spammer, no bounce gets sent, and if it was a legit email, the sender gets an email from their own mail server saying their mail didn't get delivered.
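To make the idea concrete, here is a rough Python sketch of the decision, not a real mail server. The function name, the score threshold, and the reply texts are hypothetical; the only real pieces are the standard SMTP reply classes, 250 for accepted and 550 for a permanent rejection. The point is that the reply goes out while the sender is still connected, so my server never has to generate a bounce.

```python
def reply_after_data(spam_score: float, threshold: float = 5.0) -> str:
    """Pick the SMTP reply to send while the sending server is still connected.

    spam_score and threshold are illustrative; any scoring scheme would do.
    """
    if spam_score >= threshold:
        # Permanent failure: the message was never accepted, so the sending
        # server stays responsible for telling its own user about it.
        return "550 5.7.1 Message rejected as spam"
    # Accepted: from here on the message is our responsibility.
    return "250 2.0.0 Ok: queued"


if __name__ == "__main__":
    for score in (0.5, 9.0):
        print(score, "->", reply_after_data(score))
```

A spammer's software typically ignores the 550 and moves on, while a legitimate sending server turns it into a delivery failure notice for its own user.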

From there it was a pretty simple matter of looking up how Postfix (my mail server software) rejects email based on regex (pattern) matches, which is body_checks.

Then I looked around to see if anybody had created a relevant regex already, and they had, a rather nice one:

/\bsrc\s*=(?:3D)?\s*"?cid:/ REJECT
from http://archives.neohapsis.com/archives/postfix/2006-05/0430.html
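If you want to see what that pattern does and doesn't catch before trusting it with your mail, here is a small Python sketch. It only approximates Postfix: body_checks looks at the raw message body one line at a time and matches case-insensitively by default, which I mimic with re.IGNORECASE. The optional 3D covers quoted-printable bodies, where "=" is encoded as "=3D". The function name and the sample lines are made up for illustration.

```python
import re

# The pattern from the post. IGNORECASE approximates Postfix's default
# case-insensitive matching for body_checks patterns.
INLINE_IMAGE = re.compile(r'\bsrc\s*=(?:3D)?\s*"?cid:', re.IGNORECASE)


def would_reject(body_line: str) -> bool:
    """Return True if this single body line references an inlined attachment."""
    return INLINE_IMAGE.search(body_line) is not None


# Hypothetical sample lines, not taken from real mail.
SAMPLES = [
    '<img src="cid:image001.gif@01C6">',         # plain HTML part
    '<IMG SRC=3D"cid:part1.0123@example.com">',  # quoted-printable HTML part
    '<img src="http://example.com/logo.png">',   # remote image, not inlined
    'Content-ID: <image001.gif@01C6>',           # attachment header only
]

if __name__ == "__main__":
    for line in SAMPLES:
        print("REJECT" if would_reject(line) else "ok    ", line)
```

Only the two cid: references match, which is the behaviour described above: a plain attachment gets through, an inlined one doesn't.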

Yes, I am aware of the solutions integrating OCR into SpamAssassin, and I think it's a terrible idea.

Yes, this means if you email me an image, you cannot embed the attached image in the body of the email. Just attaching it is fine, you just can't <img src> it. I believe this will inevitably become common email policy, because I believe other solutions will not be useful for long.
  • Why waste all that time on a non-problem? You could be doing something fun instead. I just file anything with an image as junk unless it's from an address in my address book, and you know what? I haven't had one spam of that class since. Checking my junk folder hasn't found a single false positive either. I realize it's pretty low tech, but it works great. Just an FYI.

    I fell off the rabid anti spam bandwagon a long time ago, so I'm probably sounding like a heretic.

    The 2-3 spams I get per week at this point are actually of the 'ascii art' variety. I don't even care where they're from, I just delete them.
    • I get, and [info]darxus probably gets, hundreds and hundreds of pieces of spam a day at least before spam filtering. After aggressive spam filtering, I still see about a dozen pieces of spam a day hit my inbox. If [info]darxus and I didn’t block spam in advance, we wouldn’t be able to find anything else in our mailboxes.

      (At work, about 95% of our incoming messages are spam. The overwhelming majority of it isn’t even to valid addresses and will just bounce anyway, but that’s what our mail server has to deal with. On my home server, I think the ratio is more like 70%.)

      Who’s your ISP? If you’re only getting 2-3 spams per week, I can almost guarantee you that that’s because they’re doing lots and lots of spam filtering before the mail gets to you, and that’s exactly what [info]darxus is talking about doing here.
      • Ok, I implied that it was simpler than it was. I used to get hundreds of spams per day. Evidently it was very important for me to buy pills that would make my penis larger.

        Really, we're all getting the amount of spam you'd expect if you've had your address for any period of time or have participated on any open discussion list of size on the internet.

        1. I'm hosting my own mail. My servers are colo'd and have been for the past 8 years or so.

        2. I implemented greylisting on the server. This dropped the spam by... a large percent. The log stats confirm this. The ratio of spam to legitimate mail is pitiful: 10 to 1 on a good day. [1]

        3. I'm letting SpamAssassin catch the easy stuff. It's not the best, but it does its job fairly well.

        4. The remainder consists of image spam and 'ascii art', which I've taught my MUA to handle with the aforementioned rules since, as Darxus mentioned, trying to teach a Bayesian filter about those 2 concepts is futile.

        So, I did have to invest some time in priming the pump, as it were. However, I haven't devoted a significant amount of time to spam ever since I realized that my vigilance amounted to me pissing in the wind.

        I used to be 'that guy'. The guy who would forward spam, taking great pains to ensure headers were intact, to the appropriate abuse departments. I used to stand up like a 2 year old and yell 'I found it!!!', or at least the moral equivalent.

        So, hi. My name is Jason and I'm a recovering spam nazi.

        My real point is, for the remainder of easily spotted spam, it's simple to create a couple of rules that do a great job of catching the obvious rest. Granted, I still get a couple of spams a week, but that's not too bad when you consider the percentage that represents.

        I also eliminated the 'catch all' spam sponge accounts for my domains.
