Protecting your website from the dreaded scrapers...

While it might be flattering that someone thinks your content is good enough to steal, it’s rarely good for your site’s search engine rankings. Google typically frowns on duplicated content, yet it’s still surprisingly common to see a ‘scraper’ site outrank the original source domain for a content-related search.

Search engines are discerning about duplicate content: rather than showing multiple results for the same material, they try to ascertain which version is likely to be the original. The challenge is to defeat the scrapers and ensure that it’s your original page that shows, rather than the bogus alternative.

What can I do?

Fortunately there are several things you can do to ensure that your content is prioritised as the original source:

  • Set up a Google Alert – While this won’t protect you against scraping, it will at least alert you if any of your content is duplicated. Simply set up a Google Alert using a sentence from your newly published page. Scrapers are, by definition, lazy, so there’s a good chance your content will be republished in its entirety. When that happens, you’ll be notified.
  • Ping the major blogging services – The simplest way to let Google et al. know that you are the source of the content is to alert them as soon as it’s posted. You can find instructions on how to set up these ping alerts at Google Blog Search, Technorati, Yahoo, etc. It’s a simple process that should only take a few moments (see the first sketch after this list).
  • Use a pinging service – Alternatively (and even easier), services such as Ping-O-Matic take care of the process for you. Simply visit Ping-O-Matic each time your content is updated, enter a few details, and it notifies the major services on your behalf. The same ping can also be sent programmatically, as shown below.
  • Finally – If scraping is becoming a real problem and your content is regularly duplicated, a number of tools and services will actively block certain ‘bots’ from accessing your site. They blacklist known scrapers and put obstacles such as CAPTCHAs in the bots’ way (a simple version of this idea is sketched after this list). If all else fails, services from companies such as ShieldSquare or Distil Networks will help you to identify and block rogue site visitors.
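
For the technically inclined, the ‘ping’ that Ping-O-Matic and similar services send is a standard weblogUpdates.ping XML-RPC call. Below is a minimal sketch in Python of sending that ping yourself; the blog name and URL are placeholders, and http://rpc.pingomatic.com/ is Ping-O-Matic’s published endpoint (worth verifying against their current documentation before relying on it).

    import xmlrpc.client

    def ping(blog_name, blog_url, endpoint="http://rpc.pingomatic.com/"):
        """Send a standard weblogUpdates.ping and return the server's reply."""
        server = xmlrpc.client.ServerProxy(endpoint)
        result = server.weblogUpdates.ping(blog_name, blog_url)
        # Per the weblogUpdates convention, the reply is a struct with a
        # boolean 'flerror' flag and a human-readable 'message'.
        if result.get("flerror"):
            raise RuntimeError("Ping rejected: " + str(result.get("message")))
        return result.get("message", "")

    if __name__ == "__main__":
        # Placeholder values - substitute your own blog's details.
        print(ping("My Blog", "https://www.example.com/blog/"))

Run after each new post (or hooked into your CMS’s publish event), this tells the aggregators about your page within moments of it going live, which is exactly the timestamp evidence you want on record.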
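And to give a flavour of the blocking approach, here is a minimal sketch of user-agent blacklisting written as plain Python WSGI middleware. The agent strings are illustrative examples rather than a vetted scraper list, and dedicated services do far more (IP reputation, rate limiting, CAPTCHAs), but the principle is the same: inspect each request and turn known bots away.

    from wsgiref.simple_server import make_server

    # Illustrative examples only - maintain a real list from your own logs.
    BLOCKED_AGENTS = ("HTTrack", "WebCopier", "SiteSnagger")

    def block_scrapers(app):
        """Wrap a WSGI app so blacklisted user agents receive a 403."""
        def middleware(environ, start_response):
            ua = environ.get("HTTP_USER_AGENT", "")
            if any(bot.lower() in ua.lower() for bot in BLOCKED_AGENTS):
                start_response("403 Forbidden",
                               [("Content-Type", "text/plain")])
                return [b"Forbidden"]
            return app(environ, start_response)
        return middleware

    def site(environ, start_response):
        # Stand-in for your real site.
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"Hello, legitimate visitor"]

    if __name__ == "__main__":
        make_server("", 8000, block_scrapers(site)).serve_forever()

Bear in mind that determined scrapers spoof their user agents, which is why the commercial services lean on behavioural signals as well; a blacklist like this only deters the laziest offenders.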

Scraper networks and domains are still a real issue for the legitimate webmaster and can seriously undermine your site’s search performance, undoing all the good work from your content marketing. Fortunately, with a little vigilance and some timely ‘pinging’, you can keep your site’s content out of the scrapers’ hands.