Examination of anti-spam methods
There are a number of services and software systems that mail sites
and users can use to reduce the load of spam on their systems and
mailboxes. Some of these depend upon rejecting email from Internet
sites known or likely to send spam. Others rely on automatically
analyzing the content of email messages and weeding out those which
resemble spam. These two approaches are sometimes termed blocking and
filtering.
Blocking and filtering each have their advocates and advantages. While
both reduce the amount of spam delivered to users' mailboxes, blocking
does much more to alleviate the bandwidth cost of spam, since spam can
be rejected before the message is transmitted to the recipient's mail
server. Filtering tends to be more thorough, since it can examine all
the details of a message. Many modern spam filtering systems take
advantage of machine learning techniques, which can markedly improve
accuracy over manually maintained rules. However, some people regard
filtering as an intrusion on their privacy, and many mail
administrators prefer blocking in order to deny access to their
systems from sites that tolerate spammers.
DNSBLs
DNS-based Blackhole Lists, or DNSBLs, are used for heuristic filtering
and blocking. A site publishes lists (typically of IP addresses) via
the DNS, in such a way that mail servers can easily be set to reject
mail from those sources. There are literally scores of DNSBLs, each of
which reflects different policies: some list sites known to emit spam;
others list open mail relays or proxies; others list ISPs known to
support spam. Other DNS-based anti-spam systems list known good
("white") or bad ("black") IPs, domains, or URLs; these include
RHSBLs and URIBLs.
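As an illustration, a DNSBL check can be as simple as a reverse-octet
DNS lookup. The Python sketch below uses only the standard library;
the zone name is just one example of a real-world DNSBL, and a
production mail server would normally consult several lists:

    import socket

    def is_listed(ip_address, dnsbl_zone="zen.spamhaus.org"):
        """Return True if ip_address appears in the given DNSBL zone.

        A DNSBL query reverses the octets of the IP and appends the
        zone, e.g. 192.0.2.1 -> 1.2.0.192.zen.spamhaus.org. A
        successful A-record lookup means the address is listed;
        NXDOMAIN means it is not.
        """
        reversed_octets = ".".join(reversed(ip_address.split(".")))
        query_name = f"{reversed_octets}.{dnsbl_zone}"
        try:
            socket.gethostbyname(query_name)   # resolves only if listed
            return True
        except socket.gaierror:                # NXDOMAIN: not listed
            return False

    # Example: reject the SMTP connection early if the client is listed.
    if is_listed("192.0.2.1"):
        print("550 connection refused: sending host is on a DNSBL")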
Content-based filtering
Until recently, content filtering techniques relied on mail
administrators specifying lists of words or regular expressions
disallowed in mail messages. Thus, if a site receives spam advertising
"herbal Viagra", the administrator might place these words in the
filter configuration. The mail server would then reject any message
containing the phrase.
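A minimal sketch of such a phrase blocklist in Python might look like
the following; the banned phrases here are purely illustrative and
would in practice come from the administrator's filter configuration:

    import re

    # Phrases the administrator has chosen to disallow (illustrative only).
    BLOCKED_PATTERNS = [
        re.compile(r"herbal\s+viagra", re.IGNORECASE),
        re.compile(r"mortgage\s+refinanc\w*", re.IGNORECASE),
    ]

    def violates_policy(message_body):
        """Return the first blocked pattern found in the body, or None."""
        for pattern in BLOCKED_PATTERNS:
            if pattern.search(message_body):
                return pattern.pattern
        return None

    match = violates_policy("Buy cheap herbal  Viagra today!")
    if match:
        print(f"550 message rejected: matched banned phrase {match!r}")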
Content-based filtering can also act on features other than the words
and phrases that make up the body of the message. Primarily, this
means examining the email header, the part of the message that carries
information about the message rather than its body text. Spammers
often spoof header fields to hide their identities or to make the
email look more legitimate than it is, and many of these spoofing
methods can be detected. In addition, spam-sending software often
produces headers that violate RFC 2822, the standard that specifies
how an email header is supposed to be formed.
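The sketch below uses Python's standard email module to flag a few of
the header irregularities described above. The particular checks
(missing From or Date, an unparseable Date, no Message-ID) are only
examples of the kinds of problems a filter might look for, not an
exhaustive or authoritative list:

    from email import message_from_string
    from email.utils import parsedate_tz

    REQUIRED_HEADERS = ("From", "Date")   # RFC 2822 requires these fields

    def header_problems(raw_message):
        """Return a list of header irregularities often seen in spam."""
        msg = message_from_string(raw_message)
        problems = []
        for name in REQUIRED_HEADERS:
            if msg[name] is None:
                problems.append(f"missing required header: {name}")
        # A Date header that does not parse is another common giveaway.
        if msg["Date"] is not None and parsedate_tz(msg["Date"]) is None:
            problems.append("malformed Date header")
        # Message-ID is not strictly required, but its absence is suspicious.
        if msg["Message-ID"] is None:
            problems.append("no Message-ID header")
        return problems

    raw = "Subject: hello\n\nno From, Date or Message-ID headers at all\n"
    print(header_problems(raw))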
Such static filtering has several disadvantages. First, it is
time-consuming to maintain. Second, it is prone to false positives.
Third, these false positives are not evenly distributed: manual
content filtering tends to reject legitimate messages on topics
related to the products advertised in spam. A system administrator who
tries to reject spam advertising mortgage refinancing may easily block
legitimate mail on the same subject as well. Finally, spammers can
change the phrases and spellings they use, or employ tricks designed
to defeat phrase detectors, all of which means more work for the
administrator. These evasions, however, also hand some advantages to
the spam fighter. If a spammer starts spelling "Viagra" as "V1agra"
(see leet) or "Via_gra", it makes it harder for the spammer's intended
audience to read their messages. If they try to trip up the phrase
detector, by, for example, inserting an invisible-to-the-user HTML
comment in the middle of a word ("Via<!---->gra"), this sleight of
hand is itself easily detectable, and is a good indication that the
message is spam. And if they send spam that consists entirely of
images, so that anti-spam software can't analyze the words and phrases
in the message, the fact that there is no readable text in the body
can be detected.
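The comment-in-the-middle-of-a-word trick mentioned above, for
instance, can be spotted by finding HTML comments whose removal makes
letters join up on both sides. The Python sketch below counts such
occurrences; the heuristic is illustrative rather than a production
rule:

    import re

    COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)

    def hidden_comment_tricks(html_body):
        """Count HTML comments embedded inside words, e.g. 'Via<!---->gra'."""
        tricks = 0
        for match in COMMENT.finditer(html_body):
            before = html_body[:match.start()]
            after = html_body[match.end():]
            # A letter immediately on each side suggests the comment
            # exists only to split a word and defeat phrase filters.
            if before[-1:].isalpha() and after[:1].isalpha():
                tricks += 1
        return tricks

    print(hidden_comment_tricks("Cheap Via<!---->gra here"))         # 1
    print(hidden_comment_tricks("<!-- ordinary comment --> Hello"))  # 0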
Content filtering can also be implemented by examining the URLs
present in an email message (i.e. the sites being spamvertised). This
form of content filtering is much harder to disguise, since the URLs
must resolve to valid domain names. Extracting the links from a
message and comparing them against published lists of spamvertised
domains is a simple and reliable way to eliminate a large percentage
of spam through content analysis.
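A simple sketch of this approach in Python follows; the blocklisted
domains are placeholders standing in for a published feed such as a
URIBL, and the URL-matching regular expression is deliberately crude:

    import re
    from urllib.parse import urlparse

    # Placeholder for a published list of spamvertised domains.
    KNOWN_SPAMVERTISED = {"example-pills.invalid", "cheap-meds.invalid"}

    URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)

    def spamvertised_domains(message_body):
        """Return the set of linked domains that appear on the blocklist."""
        hits = set()
        for url in URL_PATTERN.findall(message_body):
            domain = urlparse(url).hostname
            if domain and domain.lower() in KNOWN_SPAMVERTISED:
                hits.add(domain.lower())
        return hits

    body = "Order now at http://example-pills.invalid/buy?ref=42 today"
    print(spamvertised_domains(body))   # {'example-pills.invalid'}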
Statistical filtering
Statistical filtering was first proposed in 1998 by Mehran Sahami et
al., at the AAAI-98 Workshop on Learning for Text Categorization. A
statistical filter is a kind of document classification system, and a
number of machine learning researchers have turned their attention to
the problem. Statistical filtering was popularized by Paul Graham's
influential 2002 article A Plan for Spam, which proposed the use of
naive Bayes classifiers to predict whether messages are spam or not,
based on collections of spam and nonspam ("ham") email submitted by
users.
Statistical filtering, once set up, requires no maintenance per se:
instead, users mark messages as spam or nonspam and the filtering
software learns from these judgements. Thus, a statistical filter does
not reflect the software author's or administrator's biases as to
content, but it does reflect the user's biases as to content; a
biochemist who is researching Viagra won't have messages containing
the word "Viagra" flagged as spam, because "Viagra" will show up often
in his or her legitimate messages. A statistical filter can also
respond quickly to changes in spam content, without administrative
intervention.
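To make the idea concrete, the following Python sketch implements a
small word-level naive Bayes filter with Laplace smoothing. It is a
simplified textbook formulation rather than Graham's exact scoring
scheme, and the training messages and token pattern are illustrative
only:

    import math
    import re
    from collections import Counter

    TOKEN = re.compile(r"[A-Za-z0-9$'!-]+")

    class NaiveBayesFilter:
        """Word-level naive Bayes spam filter with Laplace smoothing."""

        def __init__(self):
            self.word_counts = {"spam": Counter(), "ham": Counter()}
            self.message_counts = {"spam": 0, "ham": 0}

        def train(self, text, label):
            """Learn from a message the user marked as 'spam' or 'ham'."""
            self.message_counts[label] += 1
            self.word_counts[label].update(
                w.lower() for w in TOKEN.findall(text))

        def spam_probability(self, text):
            """P(spam | words), assuming at least one message per class."""
            words = [w.lower() for w in TOKEN.findall(text)]
            vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
            total = sum(self.message_counts.values())
            # Start from the log-odds of the class priors.
            log_odds = (math.log(self.message_counts["spam"] / total)
                        - math.log(self.message_counts["ham"] / total))
            for w in words:
                p_w_spam = ((self.word_counts["spam"][w] + 1) /
                            (sum(self.word_counts["spam"].values())
                             + len(vocab) + 1))
                p_w_ham = ((self.word_counts["ham"][w] + 1) /
                           (sum(self.word_counts["ham"].values())
                            + len(vocab) + 1))
                log_odds += math.log(p_w_spam) - math.log(p_w_ham)
            return 1 / (1 + math.exp(-log_odds))

    f = NaiveBayesFilter()
    f.train("cheap viagra limited offer", "spam")
    f.train("meeting notes attached for review", "ham")
    print(f.spam_probability("cheap offer"))             # high: spammy words
    print(f.spam_probability("notes from the meeting"))  # low: hammy words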