What is Bayesian spam filtering?

Bayesian spam filters calculate the probability that a message is spam based on its content. Unlike simple word-based filters, Bayesian spam filters learn from both the spam and the good email a user receives. The result is a robust, adaptive, and efficient anti-spam approach that rarely produces false positives.


Simple word-based spam filters do not account for which words are unusual for a particular email user, even though an unusual word can indicate that a message is spam. Nor can they change the rules they use to identify spam over time. Bayesian spam filters are different because they do both.

Bayesian spam filters build up a list of unwanted words over time. They analyze both spam and good messages to calculate how likely each feature (typically a word) is to appear in spam and in good mail. As new unwanted words show up, they are added to the list.
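The learning step described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular filter is implemented: the function names (`tokenize`, `train`, `word_frequency`), the whitespace tokenizer, and the choice to count each word once per message are all assumptions made for the example.

```python
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()

def train(messages):
    """Count, per class, how many messages contain each word.

    messages: iterable of (text, is_spam) pairs.
    Returns (spam_counts, ham_counts, n_spam, n_ham).
    """
    spam_counts, ham_counts = Counter(), Counter()
    n_spam = n_ham = 0
    for text, is_spam in messages:
        words = set(tokenize(text))  # count each word once per message
        if is_spam:
            spam_counts.update(words)
            n_spam += 1
        else:
            ham_counts.update(words)
            n_ham += 1
    return spam_counts, ham_counts, n_spam, n_ham

def word_frequency(word, counts, n_messages):
    """Fraction of messages in one class that contain the word."""
    return counts[word] / n_messages if n_messages else 0.0
```

Calling `train` again on a larger set of labeled messages updates the frequencies, which is what lets the filter adapt as a user's mail changes.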

If a word never appears in spam but frequently appears in the legitimate email you receive, the chance that it indicates spam is almost zero. For example, suppose you receive a lot of legitimate messages that contain the word Cartesian. That fact reduces the chance that any email you receive containing the word Cartesian is spam. Conversely, suppose you rarely or never receive legitimate messages containing the word toner. If you then receive a message that contains the word toner, it is more likely to be spam.
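The Cartesian and toner reasoning above is an application of Bayes' rule to a single word. The sketch below shows the arithmetic. The function name `spam_probability`, the 50% prior, and the word frequencies are hypothetical values chosen for illustration, not measurements.

```python
def spam_probability(word_in_spam, word_in_ham, prior_spam=0.5):
    """P(spam | word) by Bayes' rule.

    word_in_spam: P(word | spam), fraction of spam containing the word.
    word_in_ham:  P(word | ham), fraction of good mail containing the word.
    prior_spam:   assumed overall fraction of mail that is spam.
    """
    numerator = word_in_spam * prior_spam
    denominator = numerator + word_in_ham * (1.0 - prior_spam)
    if denominator == 0:
        return prior_spam  # word never seen: fall back to the prior
    return numerator / denominator

# Hypothetical frequencies for one user's mailbox.
cartesian = spam_probability(word_in_spam=0.0, word_in_ham=0.30)
toner = spam_probability(word_in_spam=0.20, word_in_ham=0.01)

print(f"Cartesian: {cartesian:.3f}")
print(f"toner:     {toner:.3f}")
```

With these made-up numbers, Cartesian scores exactly zero and toner scores above 0.9. Practical filters usually smooth or clamp the estimates so that a word never seen in spam yields a probability near zero rather than exactly zero, and they combine the scores of many words instead of relying on one.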