As Blogs Grow, So Does Spam
Inboxes aren't the only things getting flooded by unwanted content. "Splogs" are taking root on the Web and threatening the existence of some sites by inundating them with fake blog
April 21, 2006
Spam blogs are a big headache for Taylor Bayouth.
He wants the Web site he founded, tBlog.com, a combination social network and blog publishing platform, to parse the words of its more than 200,000 members to update their profiles every time they post. Bayouth believes the "thought matching" system would be unique. But one of the biggest roadblocks he faces - besides going up against much bigger rivals, like MySpace.com - is the number of spam blogs that hit his site.
"Spam is our No. 1 enemy," he says. "This is what we battle with on a daily basis. Spam could literally just kill this thing."
It may be little consolation to Bayouth, but in fact, there are millions of spam blogs - or "splogs" - with more added every day. And they aren't going away.
"It's not getting any better and it's probably getting worse," saysTim Finin, a computer science professor at University of Maryland, Baltimore County, who helped write a paper about detecting splogs that was presented last month at an American Association for Artificial Intelligence conference.That may be true, but Natalie Glance, senior research scientist with Nielsen BuzzMetrics, which tracks and analyzes what consumers say online about companies and their brands, says there is some good news: blog search engines are getting better at separating the garbage from their results.
"Identifying spam isn't all that hard," she says. "The thing is it's a game of escalation."
Nielsen BuzzMetrics operates its own search engine, BlogPulse, which reported Wednesday morning that it has identified more than 26 million blogs, nearly 87,000 of them new within the previous 24 hours. The measurement company indexed 828,890 posts in the same time period. By comparison, search engine Technorati says the blogosphere is even bigger and doubling in size every six months. The company tracks more than 35 million blogs and about 1.2 million new posts each day, which works out to about 50,000 per hour. It reports that about 9 percent of new blogs are spam, and that 60 percent of pings - the messages blogs send to a centralized network service to notify it of newly published posts - are from known spam sources. Technorati says it blocks these spam pings, or spings.
"Spam blogs and their cousins spings continue to present infrastructure providers like Technorati a challenge," founder and CEO David Sifry wrote on the site's blog Monday. "Aside from a few notable spam storms ... the high level of interesting, original content being created greatly outweighs the fake or duplicate content listed on splogs."
A study last December by the eBiquity Research Group at UMBC found that the share of pings that are spam is even higher, nearly 75 percent. eBiquity also discovered that more than half of the blogs pinging one particular ping server, weblogs.com, are spam.
Finin, who helps run eBiquity at UMBC, says Technorati is as good as any search engine at picking out splogs, but that one out of every five blogs it counts is actually fake.
The people who create splogs - or, more accurately, the people who write the programs that do it for them - rarely intend for anyone to actually read their posts. They're just building a giant clump of links that refer back to some other site - one that, say, promotes gambling or sells something like Viagra - and thus increase the page rank of that site on different search engines.
Then, on the off chance that anyone actually reads their junk posts, the creators put ads on them that generate a small commission, usually a fraction of a dollar, for every click.
As sploggers get more sophisticated, one of their newer strategies is to plagiarize material from other online sources rather than simply generating random text. Then they insert a generic sentence that points to the site they're promoting. "It's not easy for even a human to tell" if the blog is real, Finin says. "It takes a minute or two."
A case in point: Recently Finin noticed, through Technorati, that a blog had copied content he had written about OWL, the Web Ontology Language. When he visited the site, he found links to other stories that had to do with owls - the winged creatures, yes, but also the Temple University basketball team, a bar in Baltimore, a street in Houston and other random examples.
"On further investigation, the person who set this up also set up hundreds of others, focused on different keywords or phrases," he says. Finin believes the site is a "splog farm" that may look legitimate now but eventually will carry ads and links to target sites.Part of the problem is that blog search is still in its infancy and the companies doing it are small, unlike the huge companies that dominate Web search, which have huge teams dedicated to researching data quality, says BuzzMetrics' Glance. The company, which is owned by Dutch publishing giant VNU, bought competitor Intelliseek earlier this year.
On top of that, Web search results are ranked by relevance, which means that spam sites are allowed to exist; they just don't show up on the first few pages of results.
"On blog search, what people are interested in seeing is not the most relevant but the most recent," Glance says. "If there's a spam attack on a particular topic on a given day, it will be on the first page unless we filter them out. We can't just give them a low relevance score."
In the end, Finin believes that splogs, just like email spam, will never be eliminated completely. "It's something we'll just have to live with," he says. "But a certain level of effort has to be put into trying to suppress or eliminate it; otherwise it will completely overrun everything."