Chris Pearson is a stud when it comes to all things web, and the guy’s relentless pursuit of WordPress goodness keeps me coming back to his blog again and again.
Yesterday morning, a post from him via DIYThemes.com grabbed my attention. It was all about how to use the Robots Meta Tags option in the very popular theme Thesis. While I don’t use Thesis, I still found it extremely interesting and started digging into its central claim:

One of the biggest problems with WordPress is that it automatically generates different kinds of archive pages that can be indexed by search engines. From date-based archives (daily, monthly, yearly) to tags to categories, these auto-generated pages all contain duplicate content that doesn’t belong in search engines.

As he has been many times before, he’s exactly right.

Maybe you don’t think it’s a big deal. Okay, I get that…but think of this:

The job of search engines is to index your site’s pages and determine what your site is about. If you have 100 unique article pages, then ideally, search engines should only have to crawl 100 pages to index your site fully.

However, if you’re using categories, tags, and date-based archives (and depending on how overboard you tend to go with categorization and tagging), then you’re going to have at least one additional page per category, per tag, per month, etc.

Now, instead of having 100 pages to index, you may have 600 pages. Considering you only have 100 pages of unique content, forcing a search engine to index 600 pages to determine what your site is about just doesn’t make any sense.
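
Just to make that math concrete, here’s a rough back-of-the-napkin count in PHP. The numbers below are made up purely for illustration (they’re not from Chris’s post or from any real site), and I’m ignoring paginated archives (/page/2/ and friends), which only make things worse:

// Hypothetical counts, purely for illustration.
$posts      = 100; // unique articles
$categories = 15;  // one archive page per category
$tags       = 60;  // one archive page per tag
$months     = 36;  // monthly date archives
$years      = 3;   // yearly date archives
$days       = 90;  // daily date archives (days with at least one post)
$authors    = 1;   // author archives

$archives = $categories + $tags + $months + $years + $days + $authors; // 205
$total    = $posts + $archives;                                        // 305

echo "Unique content pages: $posts\n";
echo "Auto-generated archive pages: $archives\n";
echo "Pages a crawler has to chew through: $total\n";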

Think of it this way: Would you rather read a 100-page book or a 600-page book that tells the same story? Further, which one do you think you’ll understand better? Which will hold your focus better?

Wow, now that’s heavy.

Functions Hack

So since it’s implemented so beautifully in Thesis, and since I love to hack my functions.php file…I figured I’d do just that: put together a super-tiny function that fixes this (albeit small) issue for my specific setup.

// META ROBOTS functions.php hack for proper indexing by Search Engines

function tq_meta_robots() {
    // Home page: allow indexing, following, and archiving, and keep search
    // engines from pulling your site's description from the ODP (DMOZ) or
    // the Yahoo! Directory. Drop noodp/noydir if you want those descriptions.
    if ( is_home() )
        echo '<meta name="robots" content="index,follow,archive,noodp,noydir" />';

    // Archive and taxonomy pages: allow indexing, but don't follow links
    // and don't keep a cached copy.
    if ( is_archive() || is_tax() )
        echo '<meta name="robots" content="index,nofollow,noarchive" />';
}
add_action( 'wp_head', 'tq_meta_robots' );

There…that’ll do it.

The first part checks to see if it’s the home page (since “noodp” and “noydir” are domain-specific settings), then sets the robots meta tag to allow indexing, link following, and archiving, while preventing search engines from pulling extra information about your site from the ODP (DMOZ) and the Yahoo! Directory.
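
One caveat if you borrow this: is_home() is only true on the blog posts index, so on a site that uses a static front page the tag above won’t show up on that front page. If that’s your setup, a tweak along these lines (my assumption about your configuration, not something from the original hack) would cover both:

// Variation: also match a static front page. Assumes you want the same
// directives on the front page and the posts index; adjust to taste.
if ( is_front_page() || is_home() )
    echo '<meta name="robots" content="index,follow,archive,noodp,noydir" />';
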
The second part checks to see if the page is any archive-type page (category, tag, author, and date-based pages are all types of archives) or any taxonomy-related page, and sets the directives for those. This allows indexing, tells search engines NOT to follow the links on the page, and keeps them from archiving (caching) it.
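
If you’d rather take Chris’s point all the way and keep archive pages out of the index entirely (my function above indexes them with nofollow instead), the swap is a one-liner. Just to be clear, this is an alternative I’m sketching, not what the function above does:

// Alternative: drop archive and taxonomy pages from the index entirely,
// but still let crawlers follow their links so individual posts get found.
if ( is_archive() || is_tax() )
    echo '<meta name="robots" content="noindex,follow" />';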

Done and Done.

