The robots.txt file, a product of the Robots Exclusion Protocol, is a plain-text file stored in a website's root directory (e.g., www.google.com/robots.txt). The robots.txt file gives instructions to automated web crawlers visiting your site, including search crawlers.
By using robots.txt, webmasters can indicate to search engines which areas of a site they would like to disallow bots from crawling, as well as indicate the locations of sitemap files and crawl-delay parameters. You can read more details about this at the robots.txt Knowledge Center page.
The following commands are available:
Disallow – Prevents compliant robots from accessing specific pages or folders.
Sitemap – Indicates the location of a website's sitemap or sitemaps.
Crawl-Delay – Indicates the delay, in seconds, that a robot should wait between successive requests to the server.
An Example of Robots.txt

# Robots.txt file for www.example.com/robots.txt
User-agent: *
Disallow:

# Don't allow spambot to crawl any pages
User-agent: spambot
Disallow: /

Sitemap: www.example.com/sitemap.xml
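You can check how a compliant crawler would interpret rules like those above with Python's standard-library robots.txt parser. This is a minimal sketch; the user-agent names and URLs are illustrative, not part of the original example.

```python
# Minimal sketch: how a compliant crawler interprets the example
# robots.txt rules above, using Python's standard library.
import urllib.robotparser

robots_txt = """\
User-agent: *
Disallow:

# Don't allow spambot to crawl any pages
User-agent: spambot
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# spambot is disallowed everywhere; other bots fall back to the
# wildcard rule, whose empty Disallow permits all crawling.
print(rp.can_fetch("spambot", "http://www.example.com/page"))      # False
print(rp.can_fetch("FriendlyBot", "http://www.example.com/page"))  # True
```

Note that this only tells you what a well-behaved crawler would do; as the warning below explains, nothing forces a robot to obey these rules.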
Warning: Not all web robots follow robots.txt. People with bad intentions (e.g., e-mail address scrapers) build bots that don’t follow this protocol; and in extreme cases they can use it to identify the location of private information. For this reason, it is recommended that the location of administration sections and other private sections of publicly accessible websites not be included in the robots.txt file. Instead, these pages can utilize the meta robots tag (discussed next) to keep the major search engines from indexing their high-risk content.
Remember how links act as votes? The rel="nofollow" attribute allows you to link to a resource while removing your "vote" for search engine purposes. Literally, "nofollow" tells search engines not to follow the link, although some engines still follow them to discover new pages. These links certainly pass less value (and in most cases no link juice) than their followed counterparts, but are useful in various situations where you link to an untrusted source.
An Example of nofollow

<a href="http://www.example.com" title="Example" rel="nofollow">Example Link</a>

In the example above, the value of the link would not be passed to example.com, as the rel="nofollow" attribute has been added.
An Example of rel="canonical" for the URL http://example.com/default.asp

<html>
<head>
<title>The Best Webpage on the Internet</title>
<link rel="canonical" href="http://www.example.com">
</head>
Often, two or more copies of the exact same content appear on your website under different URLs. For example, the following URLs can all refer to a single homepage:

http://www.example.com/
http://www.example.com/default.asp
http://example.com/
http://example.com/default.asp
http://example.com/Default.asp
To search engines, these appear as five separate pages. Because the content is identical on each page, this can cause the search engines to devalue the content and its potential rankings.
The canonical tag solves this problem by telling search robots which page is the singular, authoritative version that should count in web results.
Search Engine Tools
Google Webmaster Tools
Geographic Target – If a given site targets users in a particular location, webmasters can provide Google with information that will help determine how that site appears in its country-specific search results, and also improve Google search results for geographic queries.
Preferred Domain – The preferred domain is the one that a webmaster would like used to index their site’s pages. If a webmaster specifies a preferred domain as http://www.example.com and Google finds a link to that site that is formatted as http://example.com, Google will treat that link as if it were pointing at http://www.example.com.
URL Parameters – You can give Google information about each parameter on your site, such as "sort=price" and "sessionid=2". This helps Google crawl your site more efficiently.
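The value of declaring a parameter like "sessionid" irrelevant can be sketched in a few lines: URLs that differ only by that parameter collapse to a single address, so a crawler fetches one page instead of many duplicates. The URLs and parameter names below are hypothetical examples, not part of Google's tool.

```python
# Illustrative sketch: stripping a parameter declared irrelevant
# ("sessionid") collapses otherwise-duplicate URLs into one.
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

IGNORED_PARAMS = {"sessionid"}  # hypothetical parameter that doesn't change content

def normalize(url):
    """Return the URL with content-irrelevant query parameters removed."""
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in IGNORED_PARAMS]
    return urlunparse(parts._replace(query=urlencode(kept)))

urls = [
    "http://www.example.com/shoes?sort=price&sessionid=2",
    "http://www.example.com/shoes?sort=price&sessionid=98",
]
# Both variants normalize to http://www.example.com/shoes?sort=price,
# so the set below contains a single URL.
print({normalize(u) for u in urls})
```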
Crawl Rate – The crawl rate affects the speed (but not the frequency) of Googlebot’s requests during the crawl process.
Malware – Google will inform you if it has found any malware on your site. Malware creates a bad user experience, and hurts your rankings.
Crawl Errors – If Googlebot encounters significant errors while crawling your site, such as 404s, it will report these.
HTML Suggestions – Google looks for search engine-unfriendly HTML elements such as issues with meta descriptions and title tags.
Your Site on the Web
Statistics provided by search engine tools offer unique insight to SEOs, like keyword impressions, click-through rates, top pages delivered in search results, and linking statistics.
This important section allows you to submit sitemaps, test robots.txt files, adjust sitelinks, and submit change-of-address requests when you move your website from one domain to another. This area also contains the Settings and URL Parameters sections discussed above.
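A sitemap file like the ones submitted here is a short XML document following the sitemaps.org protocol. This is a minimal sketch with a placeholder URL:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
  </url>
</urlset>
```

Each page you want discovered gets its own <url> entry; optional child elements such as <lastmod> can give search engines additional hints.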
When users share your content on Google+ with the +1 button, this activity is often annotated in search results. Watch this illuminating video on Google+ to understand why this is important. In this section, Google Webmaster Tools reports the effect of +1 sharing on your site's performance in search results.
The Labs section of Webmaster Tools contains reports that Google considers still in the experimental stage, but which can nonetheless be useful to webmasters. One of the most important of these reports is Site Performance, which indicates how fast or slow your site loads for visitors.
Bing Webmaster Center
Sites Overview – This interface provides a single overview of all your websites' performance in Bing-powered search results. Metrics at a glance include clicks, impressions, pages indexed, and number of pages crawled for each site.
Crawl Stats – Here you can view reports on how many pages of your site Bing has crawled and discover any errors encountered. Like Google Webmaster Tools, you can also submit sitemaps to help Bing to discover and prioritize your content.
Index – This section allows webmasters to view and help control how Bing indexes their web pages. Again, similar to settings in Google Webmaster Tools, here you can explore how your content is organized within Bing, submit URLs, remove URLs from search results, explore inbound links, and adjust parameter settings.
Traffic – The traffic summary in Bing Webmaster Center reports impressions and click-through data by combining data from both Bing and Yahoo! search results. Reports here show average position as well as cost estimates if you were to buy ads targeting each keyword.
Search engines have only recently started providing better tools to help webmasters improve their search results. This is a big step forward in SEO and the webmaster/search engine relationship. That said, the engines can only go so far to help webmasters. It is true today, and will likely be true in the future, that the ultimate responsibility for SEO lies with marketers and webmasters.
It is for this reason that learning SEO for yourself is so important.