The HTTP Protocol and Redirection
In discussions with others about web development in general, I realized that many developers do not have a firm grasp of how redirection on the web really works. My hope is that developers who want to learn but are afraid to ask will read this post and come away with a better understanding of the HTTP protocol and redirection.
Protocols
A protocol is nothing but a set of rules followed by two independent parties. For example, if I walk by you in the hallway and we’ve met before, I might say, “Hello.” You might say the same thing in return, or possibly, “Hi.” You interpreted my initial input, “Hello,” and responded with “Hi.” That exchange could be described as a protocol.
In computing, a protocol governs how computers communicate with each other across a network. Many protocols are already well established, and many of them are documented in detail by the Internet Engineering Task Force (IETF) in documents called Requests for Comments (RFCs). Examples of protocol RFCs are:
We make extensive use of these protocols every day, usually three or four at once. On most systems, when you enter a URL in a web browser, you use HTTP, which runs on top of TCP, which runs on top of IP, which in turn runs on top of Ethernet.
Hypertext Transfer Protocol
Browsers send data to and receive data from web servers. HTTP is the application-layer language that browsers and web servers use to exchange that data across the Internet. Two important components of this language are request methods and status codes.
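Because HTTP is a plain-text language carried over TCP, you can speak it by hand. The minimal sketch below uses only Python's standard library: it starts a throwaway local server (HelloHandler is a hypothetical name, and the operating system picks a free port), writes a GET request directly onto a TCP socket, and prints the status line and body that come back.

```python
import http.server
import socket
import threading


class HelloHandler(http.server.BaseHTTPRequestHandler):
    """Throwaway server: answers every GET with a short plain-text body."""

    def do_GET(self):
        body = b"Hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default request logging


def raw_get(host, port, path="/"):
    """Write an HTTP/1.1 request by hand over a plain TCP socket."""
    request = (
        f"GET {path} HTTP/1.1\r\n"   # request line: method, resource, version
        f"Host: {host}:{port}\r\n"   # the Host header is required in HTTP/1.1
        "Connection: close\r\n"      # ask the server to close when it is done
        "\r\n"                       # a blank line ends the headers
    )
    with socket.create_connection((host, port)) as sock:
        sock.sendall(request.encode("ascii"))
        chunks = []
        while chunk := sock.recv(4096):  # read until the server closes
            chunks.append(chunk)
    return b"".join(chunks).decode("iso-8859-1")


# Port 0 lets the operating system choose a free port.
server = http.server.HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    response = raw_get("127.0.0.1", server.server_address[1])
finally:
    server.shutdown()
    server.server_close()

head, _, body = response.partition("\r\n\r\n")
status_line = head.split("\r\n", 1)[0]
print(status_line)  # HTTP/1.0 200 OK (http.server replies as HTTP/1.0 by default)
print(body)         # Hello
```

Everything above the blank line is protocol: a request line and headers going out, a status line and headers coming back. The browser does exactly this for you, over the same TCP connection type.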
Request Methods
Request methods are the verbs of client/server communication over HTTP. They indicate what action the client wants performed on a given resource. Most people are familiar with GET and POST, but there are several others:
HEAD – Like GET, but returns only the headers, not the resource itself.
PUT – Uploads a representation of the resource.
DELETE – Deletes the resource.
TRACE – Echoes back the received request.
OPTIONS – Returns the HTTP methods the server supports.
CONNECT – Converts the request connection to a transparent TCP/IP tunnel.
Of these, HEAD, GET, and POST are probably the most important and are implemented by almost all web servers.
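The difference between HEAD and GET is easy to observe. In this minimal sketch (PageHandler and PAGE are hypothetical names), a local server sends identical headers for both methods but writes the body only for GET. Python's http.client then reports the same Content-Length for each request, with an empty body for HEAD.

```python
import http.client
import http.server
import threading

PAGE = b"<html><body>Hello, HTTP</body></html>"


class PageHandler(http.server.BaseHTTPRequestHandler):
    def send_page_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()

    def do_GET(self):
        self.send_page_headers()
        self.wfile.write(PAGE)    # GET: headers followed by the resource

    def do_HEAD(self):
        self.send_page_headers()  # HEAD: the same headers, but no body

    def log_message(self, format, *args):
        pass  # silence the default request logging


def fetch(method, port):
    """Return (status, Content-Length header, body bytes) for one request."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    try:
        conn.request(method, "/")
        resp = conn.getresponse()
        return resp.status, resp.getheader("Content-Length"), resp.read()
    finally:
        conn.close()


server = http.server.HTTPServer(("127.0.0.1", 0), PageHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    get_result = fetch("GET", server.server_address[1])
    head_result = fetch("HEAD", server.server_address[1])
finally:
    server.shutdown()
    server.server_close()

print(get_result)   # (200, '37', b'<html><body>Hello, HTTP</body></html>')
print(head_result)  # (200, '37', b'')
```

This is why HEAD is useful for checking whether a resource exists, or how large it is, without downloading it.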
Status Codes
Status codes are the adjectives of client/server communication over HTTP. They give us additional information about the outcome of a request for a given resource. There are far more status codes than methods, but they fall into five classes, identified by the first digit:
1xx – Informational
2xx – Success
3xx – Redirection
4xx – Client error
5xx – Server error
When things are going well, 200 (OK) is common. When things are not going so well, we usually see 404 (Not Found) or 500 (Internal Server Error). With regard to redirection, pay special attention to 301 vs. 302: 301 means Moved Permanently, while 302 means Found, which is treated as a temporary redirect.
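The class of any code follows from its first digit, and Python's standard http.HTTPStatus enum carries the registered reason phrases, so the names above are easy to check. In this small sketch, describe and STATUS_CLASSES are hypothetical helpers.

```python
from http import HTTPStatus

# Status classes, keyed by the first digit of the code.
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def describe(code):
    """Return a label such as '301 Moved Permanently (Redirection)'."""
    status_class = STATUS_CLASSES.get(code // 100, "Unknown class")
    try:
        phrase = HTTPStatus(code).phrase  # standard reason phrase
    except ValueError:
        phrase = "Unknown status"         # code not in Python's registry
    return f"{code} {phrase} ({status_class})"


for code in (200, 301, 302, 404, 500):
    print(describe(code))
# 200 OK (Success)
# 301 Moved Permanently (Redirection)
# 302 Found (Redirection)
# 404 Not Found (Client error)
# 500 Internal Server Error (Server error)
```

Because the class is determined by the first digit alone, a client that meets an unfamiliar code (say, 299) can still tell that it signals success.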
Redirection
In general, a common goal for web content providers is to get users the resource they requested and, if that is not available, to give them meaningful information so they can adjust accordingly. Several redirection techniques are available to make a web page reachable via multiple URLs. Server-side techniques are usually preferred because they let us use the HTTP status codes described above. They can also make redirection completely transparent to the requester.
Server-side Scripting
Most server-side scripting languages allow you to add HTTP headers to a response before the response body is sent. In raw PHP, this looks something like:
<?php header('Location: http://www.google.com', true, 301); ?>
The code above redirects users to the absolute URL http://www.google.com with an HTTP status code of 301. If you don’t specify a status code, most server-side languages default to a temporary redirect, 302. PHP, for example, sends a 302 along with a Location header unless a 201 or 3xx status has already been set.
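For comparison, here is a rough Python equivalent of the PHP line, built on the standard library's http.server rather than PHP. The paths /old, /temp and /new are hypothetical, and the server listens on a random local port. http.client does not follow redirects, so it shows the raw status code and Location header. A urllib opener follows them automatically, much as a browser does.

```python
import http.client
import http.server
import threading
import urllib.request


class RedirectHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical routes: /old -> 301, /temp -> 302, /new -> real content."""

    def do_GET(self):
        if self.path in ("/old", "/temp"):
            self.send_response(301 if self.path == "/old" else 302)
            # Absolute target URL, as in the PHP example (here: this test server).
            port = self.server.server_address[1]
            self.send_header("Location", f"http://127.0.0.1:{port}/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        elif self.path == "/new":
            body = b"new page"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # silence the default request logging


def raw_status(port, path):
    """One request, no redirect following: return (status, Location)."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


server = http.server.HTTPServer(("127.0.0.1", 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
try:
    permanent = raw_status(port, "/old")
    temporary = raw_status(port, "/temp")
    # Empty ProxyHandler: ignore any proxy settings for this local demo.
    # build_opener() includes the redirect handler, which follows Location.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"http://127.0.0.1:{port}/old") as resp:
        final_url, final_body = resp.geturl(), resp.read()
finally:
    server.shutdown()
    server.server_close()

print(permanent)   # 301 plus the Location header
print(temporary)   # 302 plus the same Location header
print(final_url, final_body)  # the /new URL and b'new page'
```

At the HTTP level, the only difference between the two redirects is the status code. Clients and search engines decide from that number whether to treat the move as permanent.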
mod_rewrite
mod_rewrite is an Apache web server module that provides a rewriting engine, based on regular expressions, for modifying requested URLs on the fly. A de facto standard, it is used primarily by website maintainers who want to create persistent URLs or preserve existing ones. mod_rewrite can issue HTTP redirects, and it can also perform completely transparent server-side rewriting. The directive below matches requests for puppy.cfm and executes smalldog.aspx, while puppy.cfm stays in the user’s address bar. This is done entirely on the server side: no redirect status code is sent, and the client simply receives a normal response for the URL it asked for. (Written without a leading slash, the pattern is meant for a per-directory context such as an .htaccess file.)
RewriteRule ^puppy\.cfm$ smalldog.aspx
The directive below reaches a similar end result, except that the redirection is done via an HTTP 301 status code, and smalldog.aspx replaces puppy.cfm in the address bar:
RewriteRule ^puppy\.cfm$ smalldog.aspx [R=301,L]
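To make the difference between the two directives concrete, here is a toy Python model. It is not Apache's actual rewriting engine. Rule and apply_rules are hypothetical names, and the model makes several assumptions: the path is matched the way a per-directory (.htaccess) rule sees it, without the leading slash; the dot is escaped so it matches only a literal period; and processing stops at the first matching rule, loosely like the [L] flag.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    pattern: str       # regular expression tested against the request path
    substitution: str  # path to serve (internal) or to redirect to
    redirect: int = 0  # 0 = internal rewrite; otherwise the [R=...] status code


def apply_rules(path, rules):
    """Return (status, headers, path actually served) for a request path."""
    for rule in rules:
        if re.search(rule.pattern, path):
            if rule.redirect:
                # External redirect: the browser is told to request a new URL,
                # so its address bar changes. (Apache would prefix scheme and host.)
                return rule.redirect, {"Location": "/" + rule.substitution}, None
            # Internal rewrite: a different resource is served under the old URL.
            return 200, {}, rule.substitution
    return 200, {}, path  # no rule matched: serve the path as requested


internal = [Rule(r"^puppy\.cfm$", "smalldog.aspx")]
external = [Rule(r"^puppy\.cfm$", "smalldog.aspx", redirect=301)]

print(apply_rules("puppy.cfm", internal))   # (200, {}, 'smalldog.aspx')
print(apply_rules("puppy.cfm", external))   # (301, {'Location': '/smalldog.aspx'}, None)
print(apply_rules("kitten.cfm", internal))  # (200, {}, 'kitten.cfm')
```

In the internal case, the client only ever sees a 200 for puppy.cfm. In the external case, it receives a 301 and must make a second request for smalldog.aspx.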
Refresh Meta Tag, JavaScript, Frames
These methods are client-side workarounds for situations where you have no control over status codes. Besides being less full-featured, client-side redirection can distort browser navigation history and can hurt how search engines view and index your website. A sound redirection strategy is aware of client-side techniques but relies on server-side scripting languages or URL rewriting engines whenever possible.
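One way to see why client-side redirects are weaker: with a refresh meta tag, the server's response is an ordinary 200 OK, and the redirect instruction lives inside the HTML. Only a client that downloads and parses the page will discover it. The minimal sketch below uses Python's html.parser to find the target the way such a client might. The page and the www.example.com URL are placeholders, RefreshFinder is a hypothetical name, and the parsing of the content attribute is deliberately simplified.

```python
from html.parser import HTMLParser

# Placeholder page: delivered with status 200, the redirect is only in the HTML.
PAGE = """<!DOCTYPE html>
<html><head>
<meta http-equiv="refresh" content="0; url=https://www.example.com/smalldog.aspx">
</head><body>Moved.</body></html>"""


class RefreshFinder(HTMLParser):
    """Collects the target of a <meta http-equiv="refresh"> tag, if any."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute names arrive lowercased
        if tag == "meta" and (attrs.get("http-equiv") or "").lower() == "refresh":
            # Simplified parsing of content="<seconds>; url=<target>"
            _, _, rest = (attrs.get("content") or "").partition(";")
            rest = rest.strip()
            if rest.lower().startswith("url="):
                self.target = rest[4:].strip()


finder = RefreshFinder()
finder.feed(PAGE)
print(finder.target)  # https://www.example.com/smalldog.aspx
```

Nothing in the HTTP status line or headers signals the move, which is why tools that rely on status codes cannot treat such a page as a redirect.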