What a nightmare! I have spent the past couple of days trying to get rid of all the link problems on this site and others. The main problem is that external sites, and even my own internal links, are not consistent, so the same page often ends up with many different URL variations. Google does not like this and neither do web visitors.

Imagine you write an article and its URL is http://www.mydomain.com/category/article/

Well, you and other websites might try linking to it in the following ways:

http://mydomain.com/category/article/
http://www.mydomain.com/category/article/index.html
http://www.mydomain.com/category/article
http://mydomain.com/category/article

I recently discovered, thanks to Google's webmaster tools, that all these variations are treated as separate paths and can often send your visitors to 404 error pages. I figured this out when I checked the reports in webmaster tools and saw a little section listing all the pages that were resolving to 404 errors.

After a quick glance at the 404 pages I realized that all these articles and pages did exist and that external websites were just linking to them incorrectly. I did some research and found out how to not only standardize my URL system but also fix other webmasters' mistakes without having to knock on doors and ask them all to fix their links.

The answer is all in the .htaccess file, which uses mod_rewrite and its RewriteRule and RewriteCond directives. This is by no means a tutorial on .htaccess or mod_rewrite, just my finds of neat little tricks of code that helped me standardize my URLs internally and externally so visitors are never sent to 404 error pages.

First up, make sure you have the following turned on, otherwise these neat little tricks will not work:

Options +FollowSymLinks
RewriteEngine On

Fixing Links with a missing www

Now let's say, for example, you want to ensure that www is always used in your URL, whether or not you or other webmasters remember to use it when linking to an article or page. Add the following lines to ensure your www naming convention always remains the same.

RewriteCond %{HTTP_HOST} ^mydomain\.com$ [NC]
RewriteRule ^(.*)$ http://www.mydomain.com/$1 [R=301,L]

Now I have noticed some other people use this same code but with THE_REQUEST instead of HTTP_HOST. Like I said, I am no expert on this subject, but THE_REQUEST holds the raw request line (for example GET /article1 HTTP/1.1), which does not contain the host name, so a pattern like ^mydomain\.com$ can never match it. HTTP_HOST holds the host name the visitor asked for, and the rule above redirects sub pages as well as the home page. If only your home page gets fixed, a likely cause is that these lines sit below your CMS's own rewrite rules, so move them above.

Now if anyone types in

mydomain.com
mydomain.com/article1

they will always end up at

www.mydomain.com
www.mydomain.com/article1
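
A variation that avoids hard-coding the domain is sketched below. It assumes the .htaccess file sits in the site's document root, that the site is served over plain http, and that every host name pointing at the site should get the www prefix (a subdomain such as blog.mydomain.com would become www.blog.mydomain.com, which may not be what you want). Treat it as a starting point, not a drop-in.

```apache
# Sketch only: add www to any host name that lacks it.
# Assumes plain http and an .htaccess file in the document root.
RewriteCond %{HTTP_HOST} !^www\. [NC]
# Skip requests that sent no Host header at all (old HTTP/1.0 clients).
RewriteCond %{HTTP_HOST} !^$
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]
```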

Remove Trailing File Names

Let's say, for example, you use a CMS such as WordPress or Drupal with the permalink option. If you write an article and permalinks make the destination URL www.mydomain.com/article1/, you may want to prevent accidental linking with trailing file names such as

www.mydomain.com/article1/index.html
www.mydomain.com/article1/index.php

Then try adding this code

RewriteCond %{THE_REQUEST} /index\.php\ HTTP [NC]
RewriteRule ^(.*)index\.php$ /$1 [R=301,L]
RewriteCond %{THE_REQUEST} /index\.html\ HTTP [NC]
RewriteRule ^(.*)index\.html$ /$1 [R=301,L]

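This will remove the trailing file name from the URL and redirect the visitor back to the original naming convention. The condition looks at THE_REQUEST, the raw request line the browser sent, so the redirect only fires when the visitor actually asked for index.php or index.html, and not when the CMS rewrites to index.php internally.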
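
The two file names can also be folded into one condition and one rule. The sketch below is a variation, not the code I started from: it assumes the .htaccess file is in the document root, and it also catches requests that carry a query string, such as /article1/index.php?page=2 (mod_rewrite passes the query string along to the redirect target by default).

```apache
# Sketch only: strip a trailing index.php or index.html in one rule.
# \s matches the spaces around the path in the raw request line,
# [?\s] allows either a query string or the end of the path.
RewriteCond %{THE_REQUEST} \s/+([^?\s]*/)?index\.(php|html)[?\s] [NC]
RewriteRule ^(.*)index\.(php|html)$ /$1 [R=301,L]
```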

Ensure Trailing Slashes on your url

How about if someone linked to that same article again, but this time forgot the last / sending their traffic to

www.mydomain.com/article1

instead of

www.mydomain.com/article1/

Try adding this code to put the forward slash back onto the end of the url string

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !(.*)/$
RewriteRule ^(.*)$ http://www.mydomain.com/$1/ [R=301,L]

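A special note about this code: the first condition checks that the request does not point at a real file (-f tests for files, not directories), so real files such as images and style sheets are left alone and do not get a slash added. The second condition skips URLs that already end in a slash. If someone requests a directory that really exists, what they see depends on the server's settings; with directory listings turned off and no index file in the directory, they get a forbidden access page, which protects the files inside directories that are accidentally linked to.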
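
One gap remains: a link to a file that does not exist, say /images/missing.jpg, passes the !-f test and is redirected to /images/missing.jpg/. A possible refinement, shown only as a sketch, is to also skip anything that ends in what looks like a file extension. The extension pattern is an assumption; if your article slugs can end in a dot followed by a few letters or digits, they would be skipped too.

```apache
# Sketch only: add the trailing slash unless the URL is a real file,
# already ends in a slash, or looks like a file name (ends in .ext).
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !/$
RewriteCond %{REQUEST_URI} !\.[a-zA-Z0-9]{1,5}$
RewriteRule ^(.*)$ http://www.mydomain.com/$1/ [R=301,L]
```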

Well, let's give all of these a try now.

This article's URL is

http://www.robdogg.com/wordpress/2008/10/16/my-3-htaccess-mod-rewrite-rewriterule-rewritecondition-tricks-and-tips/

Let's mess with it a bit

Let's forget the www
http://robdogg.com/wordpress/2008/10/16/my-3-htaccess-mod-rewrite-rewriterule-rewritecondition-tricks-and-tips/

How about dropping the trailing slash and forgetting the www
http://robdogg.com/wordpress/2008/10/16/my-3-htaccess-mod-rewrite-rewriterule-rewritecondition-tricks-and-tips

How about forgetting the www and adding index.html to the end of the url
http://robdogg.com/wordpress/2008/10/16/my-3-htaccess-mod-rewrite-rewriterule-rewritecondition-tricks-and-tips/index.html

All these variations should end up on my new universal naming convention

http://www.robdogg.com/wordpress/2008/10/16/my-3-htaccess-mod-rewrite-rewriterule-rewritecondition-tricks-and-tips/
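
For reference, the three tricks might sit together in one .htaccess file roughly like this. It is a sketch under a few assumptions: the file lives in the document root of www.mydomain.com, the site is served over plain http, the host test uses HTTP_HOST, and any CMS rewrite block (WordPress's, for example) comes after these lines so the redirects run first.

```apache
Options +FollowSymLinks
RewriteEngine On

# 1. Always use www.
RewriteCond %{HTTP_HOST} ^mydomain\.com$ [NC]
RewriteRule ^(.*)$ http://www.mydomain.com/$1 [R=301,L]

# 2. Strip a trailing index.php or index.html that the visitor asked for.
RewriteCond %{THE_REQUEST} /index\.php\ HTTP [NC]
RewriteRule ^(.*)index\.php$ /$1 [R=301,L]
RewriteCond %{THE_REQUEST} /index\.html\ HTTP [NC]
RewriteRule ^(.*)index\.html$ /$1 [R=301,L]

# 3. Add a trailing slash to anything that is not a real file.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !/$
RewriteRule ^(.*)$ http://www.mydomain.com/$1/ [R=301,L]

# The CMS's own rewrite rules would go below this line.
```

A link that breaks more than one convention, such as a missing www plus a trailing index.html, is fixed in more than one hop: each rule issues its own 301 and the browser follows them in turn.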

Well, I hope that if any of you are having any of my issues, this helps resolve some of them.
