5 Most Useful Applications of the Htaccess File
The .htaccess file can be best understood as a per-directory server configuration file. It is stored in the server's document root (".../htdocs") or any of its subdirectories. The file contains directives (server commands) that apply to resources served out of the directory in which the file is stored, and that allow users to override settings from the server's main configuration file.
Below we have compiled a list of the 5 most common tasks that can be easily implemented using the .htaccess file to improve the functionality and SEO-compatibility of your website. These include URL redirection, custom error pages, password protection, IP access control, and setting an alternative index file.
URL Redirects
Perhaps the most powerful use of the .htaccess file from the SEO perspective is the Redirect directive. The directive can redirect the client not just to a different file on the same server but also to a different website. Redirecting helps guide visitors to the right content after a website reorganization or changes to file names. In addition, the Redirect directive can help solve the problem of duplicate content, which could otherwise adversely affect your website's SEO. It can be implemented as follows:
Redirect old_url new_url
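As a concrete sketch, the lines below redirect a single renamed page and a whole moved directory. The paths and the example.com domain are placeholders chosen for illustration. The optional status argument (301 for a permanent move) is stated explicitly because Redirect issues a temporary (302) redirect when it is omitted.

```apache
# Hypothetical paths and domain, for illustration only.
# Permanently redirect one renamed page to its new address.
Redirect 301 /old-page.html https://www.example.com/new-page.html

# Permanently redirect everything under /blog to another site;
# the rest of the requested path is appended to the target URL.
Redirect 301 /blog https://blog.example.com
```

With the second rule, a request for /blog/first-post would be sent to https://blog.example.com/first-post.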
The Redirect directive simply maps one URL onto a new one. In terms of redirecting traffic, however, it is just a drop in the ocean. If you are looking for something more powerful and versatile, try the URL Rewriting Engine, mod_rewrite, a standard Apache module that uses a PCRE regular-expression parser to rewrite requested URLs dynamically. One application of the Rewriting Engine is to redirect users who access your domain with the leading "www.", for example www.yourwebsite.com, to the address without the "www.", i.e. yourwebsite.com. Here is a snippet from the .htaccess file shipped with Drupal 7, a popular open-source CMS, which achieves just that:
RewriteEngine on
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^ http://%1%{REQUEST_URI} [L,R=301]
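mod_rewrite can also map URLs by pattern rather than one at a time. The sketch below assumes a hypothetical script, article.php, that accepts an id query parameter, and a .htaccess file placed in the document root; neither name comes from the Drupal file above.

```apache
RewriteEngine on
# Hypothetical example: answer a request for /article/42 with
# /article.php?id=42, without changing the address shown in the browser.
# In a .htaccess file the pattern is matched without the leading slash.
# QSA keeps any query string the visitor supplied; L stops further rules.
RewriteRule ^article/([0-9]+)/?$ /article.php?id=$1 [L,QSA]
```

Because no R flag is given, this is an internal rewrite rather than a redirect, so search engines and visitors only ever see the clean URL.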
If you would like to find out more about the Rewriting Engine, see the Apache documentation for the mod_rewrite module.
Custom Error Pages
Another great use of the .htaccess file is to specify custom error pages. When a server returns an HTTP error (e.g. 404 - Not Found), whether due to a broken link, site reorganization, or other reasons, you might wish to show visitors your own error page instead of relying on the default page provided by your host. The custom page can be made significantly more informative, thus improving user experience and allowing you to control the professional feel and behaviour of your website. In addition, if the error page is generated dynamically, for example by a CGI program, it can capture the visitor's referrer, making it possible to identify who has bad links to your site.
In order to set a custom error page, use the following directive:
ErrorDocument ErrorNumber /error.html
ErrorNumber is a required argument that specifies the HTTP response status code the page applies to. The most common error codes include 400 - Bad Request, 401 - Unauthorized, 403 - Forbidden, 404 - Not Found, and 500 - Internal Server Error. The last argument specifies the path, relative to the document root, of the error document that you want visitors to see when they encounter that error.
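For instance, a site might define one document per status code. The file names under /errors/ below are hypothetical and would need to exist on your server.

```apache
# Hypothetical file names, for illustration only.
# Serve a local page when the requested resource is missing.
ErrorDocument 404 /errors/not-found.html
# Serve a local page when access is forbidden.
ErrorDocument 403 /errors/forbidden.html
# A quoted string is sent as the message text itself.
ErrorDocument 500 "Sorry, something went wrong on our side."
```

Local paths are usually preferable to full URLs here: when the target is a remote URL, Apache redirects the client to it, and the original error status is no longer what the client receives.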
Password Protection
You can protect directories on your server with password authorization by following two simple steps. First, add the following lines to your .htaccess file:
AuthName "Name of Protected Directory"
AuthType Basic
AuthUserFile /path/to/.htpasswd
Require valid-user
AuthName defines the realm in which the users' names and passwords are valid, which is simply the part of the website that you want to protect. A realm name that contains spaces must be enclosed in quotation marks. AuthType specifies the type of authentication; Basic is the type most commonly supported by browsers. Require is the key directive that tells the server to apply password checking. Its argument, valid-user, specifies that any user listed in the .htpasswd file (see next paragraph) can be authorized to access the protected directory.
AuthUserFile relates to the second step of the password protection. It specifies the file-system path (not a URL) to the .htpasswd file, which you will need to create. Although you can store the .htpasswd file anywhere on the server, it is strongly advised that you place the file outside of the web root directory. The .htpasswd file contains lines of the format:
username:password
Each username is followed by a colon and a hashed password. To generate the hashed version of your chosen password you can use the htpasswd utility that ships with Apache, or one of the many tools available on the Internet, such as the 4webhelp password tool. The protected part of your website can then be accessed via:
http://username:password@www.website.com/name_of_protected_directory/
Note, however, that the password in the above URL is the user's actual password and not the hashed version that is saved in the .htpasswd file. It is also possible to use custom scripts that embed a password authorization form in your website.
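Returning to the Require directive: valid-user admits every account in the .htpasswd file. To narrow access to particular accounts, Require also accepts a list of user names. The realm and user names below are made up for illustration.

```apache
# Hypothetical realm and user names, for illustration only.
AuthType Basic
AuthName "Staff Area"
AuthUserFile /path/to/.htpasswd
# Only these two accounts may log in, even if .htpasswd lists others.
Require user alice bob
```

This keeps a single .htpasswd file usable for several protected directories, each admitting a different subset of users.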
Denying and Allowing IP Addresses
You can easily control access to your site content via the Allow and Deny directives, which let you set the hosts or IP addresses that should be granted or denied access to a directory. For example, if a certain directory contains documents internal to your organization, you might want to limit access to hosts within your own network. This can be accomplished with the following directives, placed in a .htaccess file in that directory.
Deny from all
Allow from yourorganization.com
Order Deny,Allow
In the above code the Deny directive specifies that all users should be denied access. The second line applies the Allow directive to yourorganization.com, thus granting access to all users who reach the directory from that domain. Although the above example uses a domain name, you could just as well use an IP address, e.g. "Allow from 172.16.254.1".
Finally, the last line uses the Order directive to control the order in which the foregoing directives are applied. In the above case the server first applies the Deny directive (preventing all addresses from accessing the content), and only then grants access from yourorganization.com via the Allow directive. If the order were reversed (Order Allow,Deny), the Deny directive would be applied last and all visitors would be denied access to the directory. One thing to keep in mind is that the argument for the Order directive must be a single word and cannot contain any white space characters, so write "Deny,Allow" rather than "Deny, Allow".
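On Apache 2.4 and later, Order, Allow, and Deny are kept only for backward compatibility (through mod_access_compat), and the Require directive is the recommended way to express the same rules. A rough equivalent of the example above, assuming Apache 2.4 with mod_authz_host loaded, might look like this:

```apache
# Assumes Apache 2.4+ with mod_authz_host; host and address are placeholders.
# Several Require lines at this level are combined as "any of these",
# so a visitor matching either line is admitted and everyone else is refused.
Require host yourorganization.com
Require ip 172.16.254.1
```

It is safest not to mix the old and new directives in the same .htaccess file, since their interaction is hard to reason about.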
Alternative Index File
Whenever a client opens a URL that points to a directory containing an index.html file, Apache automatically returns that file to the client. This behaviour can be changed so that you can specify an alternative index file for a directory. For example, if you build your site using PHP, you might wish to use the index.php file instead of the default index.html if the two happen to be stored in the same directory.
Alternative index files can be given as a list. The server works through the list from left to right, checking whether each file exists. If it does, the server serves that file; if it does not, the server moves on to the next file in the list.
DirectoryIndex index.php main.php index.html index.htm default.html
In the above case, if none of the listed files is found in the directory, the default behaviour of the server is often to show a full list of all files in that directory. To override this you can add the following line to the .htaccess file:
Options -Indexes
Caveats - Performance and Security
Finally, a word of advice: the use of the .htaccess file does not come without a cost. The downsides primarily occur in two forms: loss in performance and security issues.
When the server is configured to ignore .htaccess files, working out which directives apply to a particular directory when a file is requested relies only on the information in the main configuration file, which is parsed at start-up and stored in memory. This process can be very quick because it basically involves only checking for references to the affected directory or file.
If the use of .htaccess files is enabled, the server is forced to perform additional work. Apache has to check for a .htaccess file in each directory along the path of the requested resource. This is because .htaccess files apply not only to the directory they are stored in, but also to all of its subdirectories. Hence, in order to determine which directives apply to a particular directory, the server has to check all parent directories up to and including the root. As this additional work slows down the server, performance-wise it is preferable to set AllowOverride to None (which disables the use of .htaccess files) in Apache's main configuration file and to put any directory-specific settings inside <Directory> sections there.
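A minimal sketch of that approach is shown below. It belongs in the main configuration file rather than in a .htaccess file, and the directory path is a hypothetical placeholder.

```apache
# Main configuration fragment (e.g. httpd.conf), not a .htaccess file.
# /var/www/html/private is a hypothetical path.
<Directory "/var/www/html/private">
    # Stop Apache from looking for .htaccess files here and below.
    AllowOverride None
    # Directives that would otherwise live in .htaccess go here instead.
    Options -Indexes
    DirectoryIndex index.php index.html
</Directory>
```

The trade-off is that changes to the main configuration only take effect after the server is reloaded or restarted, whereas .htaccess files are re-read on every request.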
There are also security concerns, which stem from the fact that you allow multiple users with access to directories on your server to override the main configuration. An example of a potential security risk would be a case where a user accidentally (or out of malice) changes the way files are served in a manner that can lead to a denial of service, e.g. by making files load with incorrect MIME types, redirecting content to other locations, or making other configuration changes. Thus, you should always ensure that only trusted users are granted rights to override server configuration.
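One way to limit that exposure, short of disabling .htaccess altogether, is to restrict which categories of directives users may override. The fragment below is a sketch for the main configuration file; the document root path is hypothetical.

```apache
# Main configuration fragment; /var/www/html is a hypothetical document root.
<Directory "/var/www/html">
    # Let .htaccess files set only authentication directives and
    # directory-indexing directives (such as DirectoryIndex).
    # Any other directive type found in a .htaccess file causes an error
    # instead of taking effect.
    AllowOverride AuthConfig Indexes
</Directory>
```

With this setting, the password protection and DirectoryIndex examples from this article would still work in a .htaccess file, while the Redirect and mod_rewrite examples would need the FileInfo category to be allowed as well.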