What a nightmare! I have spent the past couple of days trying to figure out how to get rid of all my link problems on this site and others. The main problem is that links from external sites, and even my own internal links, are not consistent, so the same page often ends up with many different URL variations. Google does not like this and neither do web visitors.
Imagine you write an article and its URL is http://www.mydomain.com/category/article/
Well, you and other websites might try linking to it in the following ways
http://mydomain.com/category/article/
http://www.mydomain.com/category/article/index.html
http://www.mydomain.com/category/article
http://mydomain.com/category/article
I recently discovered, thanks to Google's webmaster tools, that all these variations are treated as separate paths and can often send your visitors to 404 error pages. I figured this out when I checked the reports in webmaster tools and saw a little section listing all the pages that were resolving to 404 error pages.
After a quick glance at the 404 pages I realized that all these articles and pages did exist and that external websites were just linking to my pages incorrectly. I did some research and found out how to not only standardize my URL system but also fix other webmasters' mistakes without having to knock on doors and ask them all to fix their links.
The answer is all in the .htaccess file, using mod_rewrite and its RewriteRule and RewriteCond directives. This is by no means a tutorial on .htaccess or mod_rewrite, just my finds of neat little tricks of code that helped me standardize my URLs internally and externally so visitors are never sent to 404 error pages.
First up, make sure you have the following turned on, otherwise these neat little tricks will not work
Options +FollowSymLinks
RewriteEngine On
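If you are not sure mod_rewrite is loaded on your host, one common pattern is to wrap everything in an IfModule block. This is only a sketch under that assumption. The catch is that when the module is missing, the rules are silently skipped instead of causing a server error, so the redirects simply will not happen.

```apache
<IfModule mod_rewrite.c>
    # Same two lines as above, only applied when mod_rewrite is available
    Options +FollowSymLinks
    RewriteEngine On

    # ... the rewrite rules from the sections below would go here ...
</IfModule>
```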
Fixing Links with a missing www
Now let's say, for example, you want to ensure that www is always used in your URL, whether or not you or other webmasters remember to use it when linking to an article or page. Add the following lines to ensure your www naming convention always remains the same.
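RewriteCond %{HTTP_HOST} ^mydomain\.com$ [NC]
RewriteRule ^(.*)$ http://www.mydomain.com/$1 [R=301,L]
Note that the condition tests HTTP_HOST, which holds the host name the visitor asked for. I have seen this same code written with THE_REQUEST instead, but THE_REQUEST holds the whole request line (something like GET /article1 HTTP/1.1), which never is just the bare host name, so that version of the condition cannot match. Like I said, I am no expert on this subject, but if the HTTP_HOST version only seems to fix your home page and not your sub pages, a likely cause is rule order: try placing these lines above any rewrite rules your CMS added, since those can handle the sub page request first.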
Now if anyone types in
yourdomain.com
yourdomain.com/article1
they will always end up at
www.yourdomain.com
www.yourdomain.com/article1
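A variant that avoids hard-coding the domain could look like the sketch below. It reuses whatever host name was requested, and it assumes no other subdomains point at the same folder: a request for blog.mydomain.com (a made-up example) would be sent to www.blog.mydomain.com.

```apache
# Skip requests that arrived without any Host header
RewriteCond %{HTTP_HOST} !^$
# Only act when the requested host does not already start with www.
RewriteCond %{HTTP_HOST} !^www\. [NC]
# Redirect to the same host with www. in front, keeping the path
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]
```

The upside of this form is that the same lines can be copied between sites unchanged. The hard-coded form is safer when several host names share one folder.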
Remove Trailing File Names
Let's say, for example, you use a CMS such as WordPress or Drupal with the permalink option. If you write an article and permalinks makes the destination URL www.mydomain.com/article1/, you may want to prevent accidental linking with trailing file names such as
www.mydomain.com/article1/index.html
www.mydomain.com/article1/index.php
Then try adding this code
RewriteCond %{THE_REQUEST} /index\.php\ HTTP [NC]
RewriteRule ^(.*)index\.php$ /$1 [R=301,L]
RewriteCond %{THE_REQUEST} /index\.html\ HTTP [NC]
RewriteRule ^(.*)index\.html$ /$1 [R=301,L]
This will remove the trailing file name from the URL and redirect the visitor back to the original naming convention.
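The two pairs of lines can also be folded into one. The following is a sketch, not a tested drop-in: it assumes the .htaccess file sits in the site's root folder, and it additionally catches links that carry a query string after the file name (for example index.php?x=1), which a pattern ending in a space followed by HTTP does not match.

```apache
# THE_REQUEST is the original request line, e.g. "GET /article1/index.php HTTP/1.1".
# It is not changed by internal rewrites, so a CMS that routes everything
# through index.php internally does not trigger this rule in a loop.
# [?\s] matches either the start of a query string or the space before HTTP.
RewriteCond %{THE_REQUEST} /index\.(php|html)[?\s] [NC]
# $1 is the folder part of the path (empty for the site root)
RewriteRule ^(.*/)?index\.(php|html)$ /$1 [R=301,L,NC]
```

Any query string on the original link is carried over to the redirect target automatically.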
Ensure Trailing Slashes on your URL
How about if someone linked to that same article again, but this time forgot the last /, sending their traffic to
www.mydomain.com/article1
instead of
www.mydomain.com/article1/
Try adding this code to put the forward slash back onto the end of the url string
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !(.*)/$
RewriteRule ^(.*)$ http://www.mydomain.com/$1/ [R=301,L]
A special note about this code: the first condition makes sure the request does not point to a file that really exists, so real files are served as they are and never get a trailing slash added. The second condition skips any URL that already ends in a slash. If someone requests a directory that really exists, they will simply get that directory, which depending on your server settings may show its index page, a file listing or a forbidden access page.
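A variant of the trailing slash rule that does not hard-code the domain might look like the sketch below. It assumes the .htaccess file sits in the site's root folder, and it also leaves real directories alone, since Apache's mod_dir normally adds the slash for those by itself.

```apache
# Leave real files alone
RewriteCond %{REQUEST_FILENAME} !-f
# Leave real directories alone
RewriteCond %{REQUEST_FILENAME} !-d
# Match only paths whose last character is not a slash, then add one
RewriteRule ^(.*[^/])$ /$1/ [R=301,L]
```

Because the target starts with a slash, the redirect stays on whatever host name was requested. A link missing both the www and the slash may therefore take two redirects to reach the final URL, depending on the order of the rules.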
Well, let's give all of these a try now
This article's URL is
Let's mess with it a bit
Let's forget the www
http://robdogg.com/wordpress/2008/10/16/my-3-htaccess-mod-rewrite-rewriterule-rewritecondition-tricks-and-tips/
How about dropping the trailing slash and forgetting the www
http://robdogg.com/wordpress/2008/10/16/my-3-htaccess-mod-rewrite-rewriterule-rewritecondition-tricks-and-tips
How about forgetting the www and adding index.html to the end of the url
http://robdogg.com/wordpress/2008/10/16/my-3-htaccess-mod-rewrite-rewriterule-rewritecondition-tricks-and-tips/index.html
All these variations should end up on my new universal naming convention.
Well, I hope that if any of you are having any of my issues, this helps resolve some of them.