Don't 301 Your Robots.txt

Delegator has a client we'll call the Widget Company. In September 2011, they decided to switch from Brand-keyword-stuffed-domain.com to Brand.com AND do a complete site redesign and restructuring at the same time. Being a forward-thinking company that relies heavily on search traffic, they wanted to make the changeover as smooth as possible.

Good Intentions

Leading up to the new site launch and domain changeover, hours were spent manually choosing 301 redirect paths for the category pages and product pages. Then, not wanting to overlook any straggler pages, Widget Co. created a universal URL rewrite rule to cover any pages they might have missed in the manual redirection effort. Everything not already on the list would be automatically redirected from Brand-keyword-stuffed-domain.com to the shiny new Brand.com homepage.
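We don't know Widget Co.'s server or exact rule, but here's a minimal sketch of what a blanket rule can look like in Apache's mod_rewrite, with placeholder domains. Whether the catch-all sends everything to the homepage or preserves paths, a pattern like ^(.*)$ matches /robots.txt just like any other URL:

    # A sketch of a hypothetical blanket rule in .htaccess -- not Widget Co.'s
    # actual config. The manually chosen page-to-page 301s would sit above it.
    RewriteEngine On
    RewriteCond %{HTTP_HOST} ^(www\.)?brand-keyword-stuffed-domain\.com$ [NC]
    # Path-preserving catch-all: /anything -> brand.com/anything.
    # Note that this matches /robots.txt too, which is the trap this post is about.
    RewriteRule ^(.*)$ http://www.brand.com/$1 [R=301,L]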

Robots.txt

Normally, that would be a pretty good strategy. BUT they didn't account for their robots.txt file.

For almost a year now, Brand-keyword-stuffed-domain.com/robots.txt has been 301 redirecting to Brand.com/robots.txt, and Google has been using it!

That means every change made to the robots.txt file on the new site over the last nine months has also been applied to the old domain.
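This is easy to check for yourself. With our placeholder domains, a quick header check would show something like this (illustrative output, not a real capture):

    $ curl -I http://www.brand-keyword-stuffed-domain.com/robots.txt
    HTTP/1.1 301 Moved Permanently
    Location: http://www.brand.com/robots.txt

If the Location header points at the new domain's robots.txt, the crawler is being handed the new site's rules for the old site's URLs.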

Nightmare Scenario

Why is that so bad, you ask? For some reason, during development, there was a need to block the /shop/ directory in robots.txt on the new site. Sadly, that also happened to be the same directory on the old site that contained...wait for it...all of their product and category pages!
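Reconstructing the relevant lines, the offending directive on Brand.com would have looked something like this:

    # robots.txt on Brand.com -- and, thanks to the 301, effectively
    # the robots.txt for the old domain as well
    User-agent: *
    Disallow: /shop/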

Months later, thousands of these same /shop/ URLs on the old domain were still in the index. Googlebot was applying the robots.txt instructions to the old domain!

Unable to finally die or pass link juice, these URLs lay for months, stuck in the oblivion between a 301 redirect and a robots.txt block, while organic traffic steadily declined.

Result

All of the painstaking work of manually checking redirects and preserving link juice was lost. Although all of the 301s for these pages worked properly, Googlebot was being stopped by the robots.txt. Therefore, Googlebot could never crawl these pages, discover the 301s, pass the link juice, de-index the old site, and rank the new site well.

Moral of the story

  1. Be very careful with blanket redirect rules (a simple safeguard is sketched below).
  2. Googlebot will follow a 301 redirect to a robots.txt file on a new domain and apply its directives to the original domain.
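
If you do need a blanket rule, exempt robots.txt so the old domain keeps serving its own copy. A sketch in Apache mod_rewrite, using the same placeholder domains as above:

    RewriteEngine On
    # Serve robots.txt locally on the old domain instead of redirecting it...
    RewriteRule ^robots\.txt$ - [L]
    # ...then let the blanket 301 handle everything else.
    RewriteCond %{HTTP_HOST} ^(www\.)?brand-keyword-stuffed-domain\.com$ [NC]
    RewriteRule ^(.*)$ http://www.brand.com/$1 [R=301,L]

The robots.txt left behind on the old domain should allow crawling everything, so Googlebot can reach the old URLs, see the 301s, and pass the link juice along.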

Photo Credit: Mark Strozier