Spam Archive

Spam Archive: the largest public library of junk e-mail on the Internet

Is your spouse dissatisfied with the size of your spam? A brand-new website has made several hundred thousand pieces of unsolicited commercial e-mail available for you to download today. Act now!

After a quiet online debut in 2002, the Spam Archive is making quick strides toward becoming the largest public library of junk e-mail on the Internet.

Paul Judge, director of research and development for CipherTrust, the e-mail security firm backing the project, says the site received roughly 5,000 forwarded messages a day during its first week.

He predicts the archive will amass a corpus of 10 million unsolicited commercial e-mails over the next eight years. The archive’s FTP site will begin to make its spam available, 10,000 messages at a time, starting Dec. 4, 2002.

People have never been so excited to get junk e-mail.

“Its sheer size will make it an invaluable tool,” said programming language designer Paul Graham, who first made an open call for such an undertaking in his widely circulated treatise on spam filtering, A Plan for Spam, published online in August 2002.

Filter builder William Yerazunis applauds the undertaking. He says antispammers need a common source of fresh spam.

“I don’t retain spam that’s over a month old,” he said. “Spam has the same shelf life as fresh food.”

Yerazunis created CRM114, a remarkably accurate filter, using his own private junk mail stash. But he said the archive will advance filter research.

“You have to have repeatability” in producing and testing antispam software, he said. “It’s absolutely necessary for good science to get done.”

Although a bevy of newsgroups and individual archives have been gathering spam for years, experts say they are too small and disorganized to provide researchers with meaningful data.

On the other hand, the FTC maintains an enormous database of spam that sees 40,000 new e-mails every day.