The Aftermath of a Wordpress Spam Injection (and a Tool to Prevent it)

by Jorge Escobar on April 20, 2009

Exactly one month ago my blog was hit by a spam injection attack whose consequences are still with me to this day. Even though I am a web developer with years of experience and a sound approach to security, I was brought to my knees for days without even knowing it.

In this post I will explain what happened, what to look for and how to prevent this from happening to you.

Tell-tale signs

The very first sign that something was going on was that my feed, which I subscribe to myself, started showing a list of spammy keywords at the end of each post. They didn’t have any links, just the words. I checked my blog and everything seemed okay, so I thought maybe this was some sort of problem with Feedburner (which has had a lot of issues lately, so why not one more, I thought).

A couple of days later, I started noticing some strange ads popping up on my blog. They had to do with health and pills, which was odd, as most of my content is centered around social media and technology. I was starting to get worried but had no clue what was going on: when I viewed the HTML source and my Wordpress installation, nothing looked out of place. Again, I thought maybe some health company had purchased space on my blog. (Unfortunately you’ll see the same ads in this post as well, as Google thinks this post is about that.)

Because I was protected with Akismet for my comments, and had FTP turned off for my blog, I was 100% sure I wasn’t infected with anything strange.

The truth was that I was infected.

The Final Discovery

After some research, I found out about some clever code injections that are pushed via templates or plugins downloaded from non-Wordpress sites. I remembered I had downloaded a couple of plugins from external sites and went into panic mode.

The first recommended check was doing a site search with spammy keywords. I did, and was in for a rude surprise: they were all there.

The funny thing was that if I clicked on the link to my site, I didn’t see them. The only way to see them was to go to the “Cached” version and then see the results in text mode.

I also went to my Google Webmaster Tools account (if you don’t have one, you should set it up immediately) and saw all the spammy keywords in the content analysis.

I was totally infected. This had been happening for days, if not weeks.

How the injection works

The spam keywords and links are hidden in chunks of code that are not human readable, usually encoded with PHP’s base64_encode function, which converts the HTML into a string of letters, digits and a few symbols that can later be decoded with base64_decode.
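
To make the encoding step concrete, here is a minimal Python sketch of the same round trip. The injected code on a Wordpress site would be PHP; the payload, the example.com link and the helper names hide and reveal are made up for illustration.

```python
import base64

# Hypothetical payload: a stand-in for the kind of hidden link an injection carries.
spam_html = '<a href="http://example.com/pills">cheap pills</a>'


def hide(html: str) -> str:
    """Encode HTML into an opaque Base64 string (letters, digits, '+', '/', '=')."""
    return base64.b64encode(html.encode("utf-8")).decode("ascii")


def reveal(blob: str) -> str:
    """Decode a Base64 string back into the original HTML."""
    return base64.b64decode(blob).decode("utf-8")


encoded = hide(spam_html)
print(encoded)          # unreadable at a glance, no angle brackets or spaces
print(reveal(encoded))  # the original link again
```

This is why reading through a template by eye rarely turns anything up: the markup is not in the file in readable form until the decode call runs.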

But when are they decoded? This is the smart part: when you view your site, the spammy code reads your browser’s user-agent string and renders nothing. But when the Google Bot or another bot requests the page, it decodes and prints out the spam.

Other times, it decodes the payload at random, so only some visitors see it.
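
The decision logic described above can be sketched in a few lines. This is an illustration of the idea only, not the code that was found on the blog: the list of bot markers, the random_rate parameter and the function name are assumptions.

```python
import random

# Assumed substrings that identify crawlers; a real injection may check others.
BOT_MARKERS = ("googlebot", "slurp", "msnbot")


def should_reveal(user_agent: str, random_rate: float = 0.0, rng=random.random) -> bool:
    """Return True when the hidden content would be printed for this request."""
    ua = user_agent.lower()
    # Crawlers always get the spam, so it ends up in the search index.
    if any(marker in ua for marker in BOT_MARKERS):
        return True
    # Everyone else gets it only occasionally (never, when random_rate is 0).
    return rng() < random_rate
```

Seen this way, the defence is obvious: request your own pages while claiming to be a crawler, and compare the result with what your browser gets.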

One way to check how this is triggered is to crawl your site using cURL, a tool that’s available on most Linux installations. When I ran the following command, I could see the spam links in my footer section:

curl --no-sessionid --user-agent "Googlebot/2.1 (+http://www.googlebot.com/bot.html)" http://jungleg.com
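
The same check can be scripted so that the bot’s view and the browser’s view are compared automatically. The sketch below is a rough Python counterpart of the command above, under a few assumptions: the BROWSER_UA string is made up, the helper names are hypothetical, and only the User-Agent header is set (the other cURL flag is not reproduced).

```python
import re
import urllib.request

GOOGLEBOT_UA = "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) Firefox/3.0"  # assumed browser string


def fetch(url: str, user_agent: str) -> str:
    """Download a page while presenting the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def words(html: str) -> set:
    """Strip tags crudely and return the set of lowercase words of 3+ letters."""
    text = re.sub(r"<[^>]+>", " ", html)
    return set(re.findall(r"[a-z]{3,}", text.lower()))


def bot_only_words(browser_html: str, bot_html: str) -> set:
    """Words that appear only in the version served to the bot."""
    return words(bot_html) - words(browser_html)


# Usage (performs two real HTTP requests):
#   extra = bot_only_words(fetch(url, BROWSER_UA), fetch(url, GOOGLEBOT_UA))
#   print(sorted(extra))
```

On a clean site the difference should be empty or close to it; a block of unrelated words that only the bot sees is the warning sign.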

Steps to solve it

You can try to pinpoint which function is triggering the spam links. In my case, I simply backed up the blog database and installed a fresh Wordpress folder from scratch, adding the plugins and templates only from Wordpress.
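
For anyone who does want to hunt for the offending code, one possible starting point is to list the PHP files that contain decode-and-run calls. The sketch below assumes two patterns, base64_decode( and eval(, as a heuristic; the function name is hypothetical. Legitimate code can use these calls too, so hits are leads to inspect, not proof.

```python
import os
import re

# Assumed heuristic: calls commonly used to unpack and run hidden code.
SUSPICIOUS = re.compile(rb"base64_decode\s*\(|eval\s*\(", re.IGNORECASE)


def suspicious_php_files(root: str) -> list:
    """Return sorted paths of .php files under root that match the heuristic."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(".php"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                if SUSPICIOUS.search(handle.read()):
                    hits.append(path)
    return sorted(hits)
```

Calling suspicious_php_files on the blog’s root folder gives a short list to review by hand, which is far less work than reading every template and plugin.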

It is very important to notify Google about the attack as soon as you can. For me it was too late: my PageRank had gone down from 3 to zero. I wrote a reconsideration request, and even though I haven’t heard back from them, my blog did get back to a PR 2 and most of the spammy content is gone, although I still see those pesky health ads every so often.

Monitoring: the hard part

In theory, we would all have to do this monitoring every day, hopefully before the Google bot hits our site. But who has time to run the cURL command or keep checking their own site’s Google search results? What if it’s only one of your older posts?

As a developer, I thought this would be a good tool to write, and on Saturday I released version 1 of it to the blogosphere: it’s called SpamCheckr.

SpamCheckr crawls your site acting as one of a handful of bots to surface spammy keywords, and shows you the text content the bot sees. Since Saturday 84 people have checked their sites, with at least 2 finding some sort of spam content.
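
As a rough idea of what such a check involves (this is not SpamCheckr’s actual code), the sketch below pulls the text out of a page and counts occurrences of a keyword list. The keywords and all names are assumptions chosen for illustration.

```python
from html.parser import HTMLParser

# Assumed keyword list; a real checker would use a much larger one.
SPAM_KEYWORDS = ("pills", "pharmacy", "casino")


class TextExtractor(HTMLParser):
    """Collect the text of a page, skipping script and style contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def page_text(html: str) -> str:
    """Return the page's text content as a single string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def find_spam(html: str, keywords=SPAM_KEYWORDS) -> dict:
    """Map each keyword found in the page text to its number of occurrences."""
    text = page_text(html).lower()
    return {word: text.count(word) for word in keywords if word in text}
```

Note that text inside elements hidden with CSS is still reported, which is the point: a crawler reads it even though a visitor never sees it.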

I will write, as time permits, a second version of the tool that will crawl your blog on a schedule and alert you via email or SMS if it finds spam, hopefully before Google indexes the content and ruins your hard-earned PageRank and ad revenue.

Have you been infected by blog spam? Tell me your war stories!

Image courtesy of hegarty_david

{ 1 trackback }

Spam…..Its True – I have been hacked!!!!! » Gremlin’s Fireside Chat
October 15, 2009 at 3:23 pm

{ 11 comments… read them below or add one }

Joel Escobar April 20, 2009 at 10:51 pm

Very cool tool. My only problem is the name. I would have spelled it with an “er”.

I don’t run a blog, but I had a similar situation recently. In the last month or so, my installation of roundcube was compromised. The attacker was able to execute code that replaced my index page with a redirect to a phishing bank site. It had been that way for like 10 days before I noticed. This installation is shared by all my clients, but no one complained so I assume that no one had tried using webmail in that time. My wife brought it to my attention when she was trying to use roundcube from her computer and complained that it kept sending her to a weird site. I thought for sure she was infected with some sort of malware. Then I tried from my machine and had the same problem so I knew it wasn’t her computer. I went into panic mode thinking my server was hacked. After some investigation, it turned out my webmail vhost was the only one affected. So I deleted everything and installed the latest version from scratch. The new version is supposed to include some security updates that may or may not have been related to how my installation was compromised.

mssmotorrd May 3, 2009 at 7:52 am

It’s the first time I commented here and I must say you share us genuine, and quality information for bloggers! Good job.
p.s. You have a very good template for your blog. Where did you find it?

Dave Newman July 23, 2009 at 10:27 am

I just found this same sort of thing on my Wordpress install – version 2.8.2. There was a file in the wp-includes directory called feed-atom2.php and included the Base64_decode for a remote user. I’ve saved the file if you’d like to have it :)

I couldn’t have found the problem without your post. Thank you.

Adam Pieniazek September 10, 2009 at 10:24 am

Another good idea is to set up Google alerts for spammy keywords for your domain. Hopefully you can catch it before it starts setting off Google alerts, but if not it’s a nice, free fail-safe.

Denton Gentry September 12, 2009 at 7:54 am

> “you’ll see the same ads in this post as well, as Google thinks this post is about that”

FYI regarding the health ads on this page, you can use HTML comments to provide hints to Google’s Ad crawler of which portions of the page should be emphasized or de-emphasized. To suppress the health-related keywords, you’d surround the paragraphs with those keywords with the following:

<!-- google_ad_section_start(weight=ignore) -->

<!-- google_ad_section_end -->

Google’s description of the technique is here:
https://www.google.com/adsense/support/bin/answer.py?hl=en&answer=23168

Denton Gentry September 12, 2009 at 8:39 am

> curl --no-sessionid --user-agent "Googlebot/2.1 …

Might the next escalation in the spambot war be for them to start checking not only the user-agent, but also that the IP address resolves back to the google.com domain?
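
For reference, the check described in this comment is usually done as a reverse DNS lookup followed by a forward lookup to confirm the name maps back to the same address. The Python sketch below shows the idea under stated assumptions: the accepted domains are googlebot.com and google.com, the function names are hypothetical, and the resolver functions are passed in as parameters so the logic can be tried without network access.

```python
import socket


def is_google_host(hostname: str) -> bool:
    """True if the hostname sits under googlebot.com or google.com."""
    host = hostname.rstrip(".").lower()
    return host.endswith(".googlebot.com") or host.endswith(".google.com")


def looks_like_googlebot(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Reverse-resolve the IP, check the domain, then confirm with a forward lookup."""
    try:
        hostname = reverse(ip)[0]
        if not is_google_host(hostname):
            return False
        # The forward lookup must include the original IP, or the PTR record is a fake.
        return ip in forward(hostname)[2]
    except OSError:
        # Lookup failures (socket.herror, socket.gaierror) count as "not verified".
        return False
```

A spoofed user-agent string from an ordinary machine fails this test, which is why a check like the cURL command would stop surfacing spam that verifies the requester this way.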
