200.165.162.110
187.45.193.174
80.95.70.173
70.166.17.109
66.207.161.158
66.207.161.157
66.207.161.156
66.207.161.155
62.193.234.9
213.186.38.38
93.157.47.1
89.215.98.86
87.244.217.116
87.213.66.191
87.106.10.20
82.140.81.3
80.65.16.71
74.53.39.250
74.50.95.10
74.205.124.9
69.55.236.96
91.189.90.139
209.234.229.51
199.201.145.165
91.189.94.204
82.195.75.100
208.93.0.128
208.80.57.240
74.125.82.51
74.125.82.173
72.172.89.42
66.220.144.154
66.220.144.140
66.220.144.139
66.211.161.26
66.211.161.25
62.180.227.30
209.85.214.173
209.85.210.173
209.46.25.134

IP Reputation

This is an automated, free, public email IP reputation system. For people contributing data, the results are already better than anything else used with SpamAssassin. Now we just need more data to make it useful for everybody else.

The primary goal is a whitelist. Other data is provided as a consequence.

It is usable and fully automated as of 2011-03-31.

For each IP, the data gives the actual percentage of its email that is ham (normalized like SpamAssassin's S/O ratio) and the total number of emails seen from it (stored as a logarithm).
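SpamAssassin's hit-frequencies S/O ratio is the spam hit rate divided by the sum of the spam and ham hit rates, where each rate is a fraction of its own corpus. The sketch below assumes the ham percentage here is the mirror image of that (ham rate over the sum of the two rates), and that the logarithm is base 10; neither detail is confirmed by this page, and the function names are hypothetical rather than taken from iprep.pl.

```python
import math


def normalized_ham_percent(ham_from_ip: int, spam_from_ip: int,
                           total_ham: int, total_spam: int) -> float:
    """Ham share for one IP, normalized for corpus size (S/O style).

    Each count is first turned into a fraction of its own corpus, so a
    corpus with far more spam than ham does not skew the result.
    """
    ham_rate = ham_from_ip / total_ham if total_ham else 0.0
    spam_rate = spam_from_ip / total_spam if total_spam else 0.0
    if ham_rate + spam_rate == 0:
        raise ValueError("no mail seen from this IP")
    return 100.0 * ham_rate / (ham_rate + spam_rate)


def log_count(ham_from_ip: int, spam_from_ip: int) -> float:
    """Total volume from the IP on a log scale (base 10 is an assumption)."""
    return math.log10(ham_from_ip + spam_from_ip)
```

With 9 ham and 1 spam from an IP, out of 100 ham and 1,000 spam overall, the raw ham share is 90%, but the normalized value is about 98.9%, because that one spam is a much smaller fraction of the larger spam corpus.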

Data provided (updated daily).
BIND zone file (DNS format), intended to be replaced by rsync and a SpamAssassin plugin.

The two data files above are released into the public domain.

Reporting Script

iprep.pl

Run as:

./iprep.pl ham:dir:~/mail/ham spam:dir:~/mail/spam/

The arguments are the same "targets" used by SpamAssassin's mass-check: mail folders containing email that has been hand-verified to be entirely ham or entirely spam.

<class>:<format>:<location>
<class> is "spam" or "ham"
<format> is "dir" (maildir), "file", "mbx", "mbox", or "detect"
<location> is a file or directory name; globbing of ~ and * is supported
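To make the target syntax concrete, here is a small Python parser for `<class>:<format>:<location>` strings. It is a hypothetical sketch that follows the rules listed above; how iprep.pl or mass-check actually validate and expand targets is not described here, so the error handling and the "fall back to the literal path" behaviour are assumptions.

```python
import glob
import os
from typing import List, NamedTuple

VALID_CLASSES = {"spam", "ham"}
VALID_FORMATS = {"dir", "file", "mbx", "mbox", "detect"}


class Target(NamedTuple):
    klass: str        # "spam" or "ham"
    fmt: str          # mailbox format
    paths: List[str]  # expanded locations


def parse_target(target: str) -> Target:
    # Split at most twice so a location containing ':' is kept intact.
    klass, fmt, location = target.split(":", 2)
    if klass not in VALID_CLASSES:
        raise ValueError(f"unknown class: {klass!r}")
    if fmt not in VALID_FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    # Expand a leading '~' to the home directory.
    expanded = os.path.expanduser(location)
    # Expand '*' wildcards; fall back to the literal path if nothing matches.
    paths = sorted(glob.glob(expanded)) or [expanded]
    return Target(klass, fmt, paths)
```

For example, `parse_target("ham:dir:~/mail/ham")` returns a `Target` whose `paths` holds the home-expanded maildir path.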

Config file ~/.ipreprc:

$trusted_networks = '<space delimited list of trusted hosts>';
$user = 'username';
$pass = 'password';

$trusted_networks is very important: it prevents you from reporting the IP address of your own trusted relays instead of the IP that actually sent the email. Include the IPs (or CIDRs) from both SpamAssassin's trusted_networks and internal_networks settings, documented in SpamAssassin's network test options and trust path documentation.
It's quite normal for this list to be empty.
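The point of `$trusted_networks` is to skip your own relays and report the first hop outside them. The Python sketch below shows that idea, assuming you already have the relay IPs in Received-header order (most recent hop first). It is a simplification of SpamAssassin's trust path, which also weighs things like authentication, and `first_untrusted_ip` is a hypothetical name, not a function from iprep.pl.

```python
import ipaddress
from typing import Iterable, Optional


def first_untrusted_ip(relay_ips: Iterable[str],
                       trusted_networks: str) -> Optional[str]:
    """Return the IP that handed the message to your trusted relays.

    relay_ips: relay addresses ordered from the hop nearest your mailbox
    outward, i.e. in the order the Received headers appear, top first.
    trusted_networks: space-delimited IPs/CIDRs, as in ~/.ipreprc.
    """
    # A bare IP becomes a /32 (or /128) network.
    nets = [ipaddress.ip_network(n, strict=False)
            for n in trusted_networks.split()]
    for ip in relay_ips:
        addr = ipaddress.ip_address(ip)
        # IPv4 vs IPv6 membership tests simply return False.
        if not any(addr in net for net in nets):
            return ip  # first hop outside the trusted list
    return None  # every relay was trusted
```

With an empty trusted list, the first relay is reported, which is the right answer when your own server received the message directly from the sender.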

Please run it as a daily cron job.

Alternatively, feed each email to the script on STDIN with the --live-ham or --live-spam argument, and upload the collected data later with the --upload argument (probably from cron):

cat ham.txt | ./iprep.pl --live-ham
./iprep.pl --upload
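When a message arrives on STDIN, the sending IP has to be pulled out of its headers. The sketch below shows one rough way to collect candidate relay IPs from the Received headers using Python's standard `email` package. It is illustrative only: iprep.pl's internals aren't described on this page, and the sketch assumes each hop is recorded as a bracketed IPv4 literal, ignoring IPv6 and other header styles.

```python
import re
from email import message_from_string
from typing import List

# Matches a bracketed IPv4 literal such as "[192.0.2.7]"; a common but
# not universal way MTAs record the connecting client in Received headers.
_BRACKETED_IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def received_ips(raw_message: str) -> List[str]:
    """Return one IP per Received header, most recent hop first."""
    msg = message_from_string(raw_message)
    ips = []
    # get_all preserves header order, so the newest hop comes first.
    for header in msg.get_all("Received", []):
        match = _BRACKETED_IPV4.search(header)
        if match:
            ips.append(match.group(1))
    return ips
```

The resulting list still needs your trusted relays skipped before an IP is worth reporting.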

Account

Email me to request an account so you can upload. Please email me from a non-freemail address; major freemail providers I'd rather you not write from include gmail.com, yahoo.com, and hotmail.com, and SpamAssassin has a more complete list of freemail providers. This is just an attempt to make it slightly harder for spammers to send me bad data.

Please let me know what username you'd like, so I don't have to guess. And I'd be curious to hear how you found out about this project.

DNS White / Black list

While I don't want to use DNS to provide the data long term, I am doing it now for testing.

SpamAssassin Rules

ifplugin Mail::SpamAssassin::Plugin::DNSEval
header   __RCVD_IN_IPREPDNS     eval:check_rbl('iprep-firsttrusted', 'iprep.chaosreigns.com.')
tflags   __RCVD_IN_IPREPDNS     nice net

header   RCVD_IN_IPREPDNS_100   eval:check_rbl_sub('iprep-firsttrusted', '^127\.\d+\.\d+\.100$')
describe RCVD_IN_IPREPDNS_100   Sender listed at http://www.chaosreigns.com/iprep/, 100% ham
tflags   RCVD_IN_IPREPDNS_100   nice net

header   RCVD_IN_IPREPDNS_50    eval:check_rbl_sub('iprep-firsttrusted', '^127\.\d+\.\d+\.50$')
describe RCVD_IN_IPREPDNS_50    Sender listed at http://www.chaosreigns.com/iprep/, 50% ham
tflags   RCVD_IN_IPREPDNS_50    nice net

header   RCVD_IN_IPREPDNS_0     eval:check_rbl_sub('iprep-firsttrusted', '^127\.\d+\.\d+\.0$')
describe RCVD_IN_IPREPDNS_0     Sender listed at http://www.chaosreigns.com/iprep/, 0% ham
tflags   RCVD_IN_IPREPDNS_0     net

meta     RCVD_NOT_IN_IPREPDNS   ( ! RCVD_IN_IPREPDNS_100 && ! RCVD_IN_IPREPDNS_50 && ! RCVD_IN_IPREPDNS_0 && ! NO_RELAYS && ! ALL_TRUSTED )
describe RCVD_NOT_IN_IPREPDNS   Sender not listed at http://www.chaosreigns.com/iprep/
tflags   RCVD_NOT_IN_IPREPDNS   net

score    RCVD_IN_IPREPDNS_100   -0.1
score    RCVD_IN_IPREPDNS_50    -0.0001
score    RCVD_IN_IPREPDNS_0     0.1
score    RCVD_NOT_IN_IPREPDNS   0.0001
endif
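To see how a lookup result turns into a score, here is a simplified Python emulation of the rules above. The regexes and scores are copied from the configuration. `NO_RELAYS` and `ALL_TRUSTED` are SpamAssassin's own rules and are modelled here as plain booleans, so this sketch is not a replacement for running SpamAssassin.

```python
import re
from typing import Dict, Optional

# Patterns and scores copied from the rules above.
SUB_RULES = {
    "RCVD_IN_IPREPDNS_100": (re.compile(r"^127\.\d+\.\d+\.100$"), -0.1),
    "RCVD_IN_IPREPDNS_50": (re.compile(r"^127\.\d+\.\d+\.50$"), -0.0001),
    "RCVD_IN_IPREPDNS_0": (re.compile(r"^127\.\d+\.\d+\.0$"), 0.1),
}
NOT_LISTED_SCORE = 0.0001


def fired_rules(answer: Optional[str], no_relays: bool = False,
                all_trusted: bool = False) -> Dict[str, float]:
    """Map a DNS answer for the sender's IP (None = not listed) to rule scores."""
    hits = {}
    if answer is not None:
        for name, (pattern, score) in SUB_RULES.items():
            if pattern.match(answer):
                hits[name] = score
    # Meta rule: none of the sub-rules hit, and the message actually had
    # an untrusted relay to look up.
    if not hits and not no_relays and not all_trusted:
        hits["RCVD_NOT_IN_IPREPDNS"] = NOT_LISTED_SCORE
    return hits
```

Because the patterns anchor on the final octet, an answer ending in `.100` never triggers the `.0` rule, and an answer ending in `.150` would match none of them.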

The zone is iprep.chaosreigns.com, queried with the typical reversed-IP-address lookup. Answers have the form 127.x.x.<type>, where the last octet is 0, 50, or 100: 0 means 0% of the mail from the IP has been ham, 100 means 100% has been ham, and 50 means anything in between. Only 0.04% of the data falls strictly between 0% and 100%, which is why I'm not currently providing finer ranges. So to look up 74.125.82.51, do:

$ host 51.82.125.74.iprep.chaosreigns.com
51.82.125.74.iprep.chaosreigns.com has address 127.0.40.50
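The same lookup can be sketched in Python: reverse the IPv4 octets, append the zone, and read the last octet of the answer. The third octet (40 in the example above) isn't explained on this page, so the sketch ignores it. The helper names are hypothetical.

```python
import ipaddress

ZONE = "iprep.chaosreigns.com"


def query_name(ip: str) -> str:
    """Reverse the octets of an IPv4 address and append the zone."""
    # IPv4Address validates the input before we reverse it.
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + ZONE


def describe_answer(answer: str) -> str:
    """Interpret the last octet of a 127.x.x.<type> answer."""
    last = int(answer.split(".")[-1])
    meanings = {0: "0% ham", 50: "between 0% and 100% ham", 100: "100% ham"}
    return meanings.get(last, "unknown value")
```

To query live DNS you could pass `query_name(ip)` to `socket.gethostbyname`, which raises `socket.gaierror` when the name does not exist, meaning the IP is not listed.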

Results

Training on 400 of my emails, then testing on 100 of my own emails (not used in training):
RCVD_IN_IPREPDNS_100 hit 79.3% of ham and no spam.
RCVD_IN_IPREPDNS_0 hit 27.8% of spam and no ham.

Those are crazy good numbers on their own.

After training on 170,000 emails from me and one other person, then testing on 10,000 of our emails:
RCVD_IN_IPREPDNS_100 hit 94.1% of ham and 0.010% of spam.
RCVD_IN_IPREPDNS_0 hit 64% of spam and no ham.

Also crazy good numbers.

So for people contributing data, the results are better than anything else available for SpamAssassin. But for it to be useful to people who aren't contributing data, we need more data.

Uploaded Data

The actual data you upload looks like this, just a timestamp and IP address from each email:

$ head ~/iprep/iprep-spam-darxus.log
1298981048 77.55.116.13
1299202198 208.89.10.45
1299245987 120.138.17.204
1299246951 120.138.17.204
1299792485 208.109.80.73
1299880708 66.207.161.156
1299934739 182.99.187.27
1299953420 74.117.209.132
1300488351 64.120.223.237
1300494388 64.120.223.238
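Each line is a Unix timestamp followed by an IP. How the server processes these uploads isn't documented here, but as an illustration, this minimal Python sketch tallies messages per IP from such a log, skipping malformed lines. The function name is hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_by_ip(lines: Iterable[str]) -> Counter:
    """Count messages per IP from '<unix timestamp> <ip>' log lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        timestamp, ip = fields
        if not timestamp.isdigit():
            continue  # timestamp must be whole seconds since the epoch
        counts[ip] += 1
    return counts
```

Run on the ten sample lines above, it finds nine distinct IPs, with 120.138.17.204 counted twice.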

Plans

I'm planning to provide the data only via rsync, because I think this will reduce bandwidth loads. I'll create a SpamAssassin plugin to retrieve the data directly and create the SpamAssassin tests for it.

IPv6

IPv6 is supported. IPv6 addresses are aggregated into /48 blocks, so all addresses in 1234:5678:9012:* are lumped together. It is entirely possible this will change.
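The /48 aggregation can be sketched with Python's `ipaddress` module: mask off everything past the first 48 bits and use the resulting network as the key. Keeping IPv4 addresses individually is an assumption based on the rest of this page describing per-IP data, and the function name is hypothetical.

```python
import ipaddress


def aggregation_key(ip: str) -> str:
    """Return the key an address is counted under.

    IPv6 addresses collapse to their /48 block; IPv4 addresses are
    assumed to be kept individually.
    """
    addr = ipaddress.ip_address(ip)
    if addr.version == 6:
        # strict=False zeroes the host bits instead of raising.
        return str(ipaddress.ip_network(f"{addr}/48", strict=False))
    return str(addr)
```

Both `1234:5678:9012:abcd::1` and `1234:5678:9012:ffff:1::2` map to the key `1234:5678:9012::/48`.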

Mutt (mail reader) colorization

The mail reader I use is mutt. In my ~/.muttrc I have the following, to easily see what hasn't been flagged as ham by this data:

color index     yellow     default   ~hX-Spam-Status:.*RCVD_NOT_IN_IPREPDNS
color index     yellow     default   ~hX-Spam-Status:.*RCVD_IN_IPREPDNS_0

Google's white paper on reputation systems

Google presented a white paper on their email reputation system at CEAS 2006.


"Seems like this could all be more useful if there was a good way to automatically report addresses that sent non-spam."
- Darxus, November 2006, discussing dnswl.org. This sort of automation is still not used by dnswl.org, and that is a substantial part of my reason for creating this project.
I have been involved with DNSWL since then. I have provided a DNSWL DNS mirror since March 2007.


Mon Feb 27 16:00:54 EST 2012
Contact Darxus