SEO Information

How to Prevent Duplicate Content with Effective Use of the Robots.txt and Robots Meta Tag

Duplicate content is one of the problems that we regularly come across as part of the search engine optimization services we offer. If the search engines determine that your site contains duplicate or very similar content, this may result in penalties and even exclusion from the search engines. Fortunately, it's a problem that is easily rectified.

Your primary weapon of choice against duplicate content can be found in the Robots Exclusion Protocol, which has now been adopted by all the major search engines.

There are two ways to control how the search engine spiders index your site.

1. The Robots Exclusion File, or "robots.txt", and

2. The Robots <meta> Tag

The Robots Exclusion File (robots.txt)
This is a simple text file that can be created in Notepad. Once created, you must upload the file to the root directory of your website, e.g. www.yourwebsite.com/robots.txt. Before a search engine spider crawls your website, it looks for this file, which tells it which parts of your site's content it may index.
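Because spiders only request the file from the root of the host, a robots.txt placed in a subdirectory is simply ignored. As a minimal sketch (Python, standard library only; the function name robots_txt_url is our own illustration, not part of any spider's API), this is how the expected location can be derived from any page address on the site:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the address a spider checks for robots.txt before crawling page_url.

    Only the scheme and host are kept: the file always lives at /robots.txt
    in the root of the site, whatever page is being requested.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://www.yourwebsite.com/products/widgets.html?id=7"))
# https://www.yourwebsite.com/robots.txt
```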

The robots.txt file is best suited to static HTML sites, or to excluding certain files on dynamic sites. If the majority of your site is dynamically generated, consider using the robots meta tag instead.

Creating your robots.txt file

Example 1 Scenario
If you wanted the file to apply to all search engine spiders and to make the entire site available for indexing, the robots.txt file would look like this:

User-agent: *
Disallow:

Explanation
The asterisk in the "User-agent" line means this record applies to all search engine spiders. Because the "Disallow" value is left blank, nothing is excluded and all parts of the site are available for indexing.
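If you want to confirm how a spider will read the file, Python's standard urllib.robotparser module implements the exclusion rules. The sketch below (the helper name allowed and the sample paths are illustrative assumptions) feeds it the Example 1 file and checks a few addresses; every one comes back as allowed, because an empty Disallow value excludes nothing.

```python
from urllib.robotparser import RobotFileParser

# Example 1 from the article: one record for every spider, nothing disallowed.
ROBOTS_TXT = """\
User-agent: *
Disallow:
"""


def allowed(robots_txt, agent, url):
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


for path in ("/", "/faq/", "/images/logo.gif"):
    url = "https://www.yourwebsite.com" + path
    print(path, allowed(ROBOTS_TXT, "Googlebot", url))  # True for every path
```

Python's parser is a reasonable stand-in, but each search engine runs its own implementation, so treat the result as a sanity check rather than a guarantee.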

Example 2 Scenario
If you wanted the file to apply to all search engine spiders and to stop them from indexing the faq, cgi-bin and images directories, as well as a specific page called faqs.html in the root directory, the robots.txt file would look like this:

User-agent: *
Disallow: /faq/
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /faqs.html

Explanation
The asterisk in the "User-agent" line means this record applies to all search engine spiders. Directories are excluded by naming them, and the specific page is referenced by its path. The named files and directories will no longer be indexed by any search engine spider that obeys the protocol.
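The same standard-library parser can confirm the Example 2 rules. Each Disallow value is matched as a prefix of the requested path: /faq/ blocks /faq/index.html but not /faqs.html, which is why the page has to be listed separately. The sample paths and the Bingbot agent name are chosen purely for illustration.

```python
from urllib.robotparser import RobotFileParser

# Example 2 from the article: every spider, four excluded locations.
ROBOTS_TXT = """\
User-agent: *
Disallow: /faq/
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /faqs.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

SITE = "https://www.yourwebsite.com"
for path in ("/faq/index.html", "/cgi-bin/search.cgi", "/images/logo.gif",
             "/faqs.html", "/about.html"):
    # Any spider name falls back to the "User-agent: *" record here.
    verdict = "allowed" if parser.can_fetch("Bingbot", SITE + path) else "blocked"
    print(f"{path:22} {verdict}")  # only /about.html is allowed
```

Writing Disallow: /faq without the trailing slash would also block /faqs.html (and any other path beginning with /faq), so the trailing slash matters.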

Example 3 Scenario
If you wanted the file to apply only to Google's spider, Googlebot, and to stop it from indexing the faq, cgi-bin and images directories and a specific HTML page called faqs.html in the root directory, the robots.txt file would look like this:

User-agent: googlebot
Disallow: /faq/
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /faqs.html

Explanation

By naming a particular spider in the "User-agent" line, you prevent only that spider from indexing the content you specify. Directories are excluded by simply naming them, and the specific page is referenced by its path. The named files and directories will not be indexed by Google; other spiders are unaffected by this record.
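A quick check shows that a record naming googlebot leaves other spiders untouched. In this sketch Bingbot stands in for "any other spider" (an illustrative choice); Python's parser, like Google, compares the User-agent value without regard to case, so googlebot in the file matches Googlebot.

```python
from urllib.robotparser import RobotFileParser

# Example 3 from the article: the only record names Googlebot.
ROBOTS_TXT = """\
User-agent: googlebot
Disallow: /faq/
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /faqs.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://www.yourwebsite.com/faq/index.html"
for agent in ("Googlebot", "Bingbot"):
    # No "User-agent: *" record exists, so unnamed spiders are not restricted.
    print(agent, parser.can_fetch(agent, url))  # Googlebot False, Bingbot True
```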

That's all there is to it!

As mentioned earlier, the robots.txt file can be difficult to implement on dynamic sites, and in that case it's probably necessary to use a combination of the robots.txt file and the robots meta tag.

The Robots Tag
This alternative way of telling the search engines what to do with site content appears in the <head> section of a web page. A simple example would be as follows:

<meta name="robots" content="noindex, nofollow">

In this example we are telling all search engines not to index the page and not to follow any of the links contained within the page.
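To see which directives a page is actually sending, a tag such as <meta name="robots" content="noindex, nofollow"> can be read back out of the HTML. This minimal sketch uses Python's built-in html.parser; the class and function names and the sample page are hypothetical, and it only looks at tags named robots.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; values keep their case.
        if tag != "meta":
            return
        attributes = {name: value or "" for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for directive in attributes.get("content", "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def robots_directives(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Hypothetical duplicate page that should stay out of the index.
PAGE = """<html><head>
<title>Printer-friendly copy</title>
<meta name="robots" content="noindex, nofollow">
</head><body>...</body></html>"""

directives = robots_directives(PAGE)
print("may index:", "noindex" not in directives)          # may index: False
print("may follow links:", "nofollow" not in directives)  # may follow links: False
```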

In this second example I don't want Google to cache the page, because the site contains time-sensitive information. This can be achieved simply by adding the "noarchive" directive.
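One way to send the noarchive directive to Google alone is a tag named after its spider, such as <meta name="googlebot" content="noarchive">; treat that exact markup as an assumption. The sketch below also assumes a crawler honors both the generic robots tag and a tag carrying its own name, merging the directives from the two. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class MetaDirectiveParser(HTMLParser):
    """Gathers robots meta directives addressed to one crawler.

    Assumption: a tag counts if its name is the generic "robots" or the
    crawler's own name (e.g. "googlebot"); directives from both are merged.
    """

    def __init__(self, crawler):
        super().__init__()
        self.names = {"robots", crawler.lower()}
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: value or "" for name, value in attrs}
        if attributes.get("name", "").lower() in self.names:
            for directive in attributes.get("content", "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def directives_for(crawler, html):
    parser = MetaDirectiveParser(crawler)
    parser.feed(html)
    parser.close()
    return parser.directives


# Hypothetical time-sensitive page: Google should not keep a cached copy.
PAGE = '<head><meta name="googlebot" content="noarchive"></head>'

for crawler in ("googlebot", "bingbot"):
    print(crawler, "noarchive" in directives_for(crawler, PAGE))  # True, then False
```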

What could be simpler!

Although there are other ways of preventing duplicate content from appearing in the search engines, this is the simplest to implement, and every website should use a robots.txt file, a robots meta tag, or a combination of the two.

Should you require further information about our search engine marketing or optimization services, please visit us at http://www.e-prominence.co.uk - The search marketing company


