URLs and HTTP: The Basics

This article was published by ComputorEdge, issue #2349, as the cover article, in both their print edition (on pages 14-15) and on their website.

Anyone who has browsed Web pages on the Internet has likely noticed that every different page appears to have a unique corresponding address, usually displayed near the top of their Web browser's window. For instance, the home page of the search engine Google has the address "http://www.google.com/", while the page of their maps tool has the address "http://www.google.com/maps".

That same page can be found using the address "http://maps.google.com/", which suggests that a particular page can have more than one address. But what purpose could that serve? For one thing, there are times when you may want to send a Web address to someone in an email message, but the address is so long that you fear that the recipient's email program would chop up the address, making it unclickable within your message. Online services such as SnipURL and urlSNIP can generate much shorter URLs that will take the recipient to your intended page.

In fact, it works the other way as well: A single Web address can display two or more pages that look dramatically different from one another. In most cases, this is because the page's contents are being generated the moment before they are sent from the website's server to your computer's browser. These so-called dynamic Web pages differ from the more conventional static ones, whose contents stay the same from one viewing to the next and change only when the page itself is changed on the server, to put it simply.

A Location for Every Resource

But for the purposes of this article, we can consider that each Web page has its own address on the Internet, similar to every household having its own postal address. These Web addresses are usually referred to as Uniform Resource Locators (URLs), because the address specifies the location of an Internet resource (such as a Web page or an image) following a uniform, standardized format. Another common term (well, perhaps not that common) is Uniform Resource Identifier (URI), a broader category that covers any standardized identifier for a resource, whether it says where the resource is located or merely names it. Thus, a URL is a type of URI. RU confused yet?

If so, then you'll understand why most people don't bother with such terms as URI and URN (Uniform Resource Name). Simply bear in mind that a URL is a unique address for something located on the Internet. An example can illustrate the standard format used for URLs. Let's say that an online bookseller, AmazingRiver.com, has a page devoted to a particular title, "Acronym Soup", in their cookbook section. Let's also assume that the firm started off by selling only books, but now they sell just about everything under the South American sun. The URL for the particular book might be http://www.amazingriver.com/books/cooking/AcronymSoup.html.

The first part of the URL, "http", specifies the protocol, or communications standard: in this case, hypertext transfer protocol (HTTP), which is discussed in more detail below. After the "://" separator, you see the website's server name, "www.amazingriver.com". The server is simply the computer program that delivers or "serves" the website's resources to other computers, the "clients", on the Internet. The server name comprises a subdomain, in this case "www", and a second-level domain, "amazingriver.com". In the earlier example of "http://maps.google.com/", the subdomain is "maps" rather than the most common one, "www".
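For readers who like to see such things in code, here is a minimal Python sketch that pulls the example URL apart into the pieces just described. It relies on the standard library's urlsplit function. The final step, which divides the server name at its first dot, is only a simplifying assumption that happens to fit this example; it is not a general rule for finding subdomains.

```python
from urllib.parse import urlsplit

url = "http://www.amazingriver.com/books/cooking/AcronymSoup.html"
parts = urlsplit(url)

scheme = parts.scheme    # the protocol named before "://"
host = parts.hostname    # the server name

# Assumption for illustration: the subdomain is everything before the
# first dot. This fits "www.amazingriver.com" but not every server name.
subdomain, _, domain = host.partition(".")

print(scheme)     # http
print(host)       # www.amazingriver.com
print(subdomain)  # www
print(domain)     # amazingriver.com
```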

In our "www.amazingriver.com" example, the second-level domain ends in a top-level domain, namely "com". Other well-known possibilities include "net", "edu", and "gov". Of these, "com" and "net" are open to registrants anywhere in the world, while "edu" and "gov" are used almost exclusively by institutions in the United States. There are also so-called country domains, from "ac" (Ascension Island) to "zw" (Zimbabwe).

The server name can be followed by a colon and a port number, which can range from 0 to 65535; the numbers from 0 to 1023 are set aside for well-known services. A port number identifies a particular network service on a computer reached over the Internet Protocol (IP), and it ties together your computer's request with the appropriate program running on the server. The default port number for Web pages is 80, which is used for HTTP traffic; it specifies the Web server as the appropriate program, and your browser assumes it whenever an "http" URL does not name a port. Try "http://maps.google.com:80/" and you can verify that it works.
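The way a missing port number falls back to a default can be sketched in a few lines of Python. The function name effective_port and the small table of defaults are made up for this illustration, and the "localhost:8080" address is a hypothetical example; only urlsplit comes from the standard library.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes covered by this sketch.
DEFAULT_PORTS = {"http": 80, "https": 443}

def effective_port(url):
    """Return the port a browser would connect to for the given URL."""
    parts = urlsplit(url)
    # parts.port is None when the URL does not spell out a port.
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]

print(effective_port("http://maps.google.com/"))     # 80
print(effective_port("http://maps.google.com:80/"))  # 80
print(effective_port("http://localhost:8080/"))      # 8080 (hypothetical address)
```

The first two addresses lead to the same port, which is why adding ":80" to an "http" URL changes nothing.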

Somewhere on AmazingRiver.com's computer network, their technical staff has chosen a particular directory (or "folder", in Windows and Mac parlance) to act as the root of a tree of nested directories. In our example, the Web page AcronymSoup.html is located inside of a directory named "cooking", which is inside (and is thus a subdirectory of) another directory named "books", which is inside the chosen root directory. So the directory path to get from the root directory to the given Web page is "/books/cooking/AcronymSoup.html".
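The nesting of directories can be made visible with a short Python sketch. It assumes, purely for illustration, that the staff chose a root directory named "/var/www/amazingriver"; that name is hypothetical, as is the idea that the URL path maps straight onto files on disk, which is common but not guaranteed.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

url = "http://www.amazingriver.com/books/cooking/AcronymSoup.html"
url_path = PurePosixPath(urlsplit(url).path)

# The directories between the root and the page, outermost first.
directories = url_path.parent.parts[1:]   # skip the leading "/"
page = url_path.name

# Hypothetical root directory chosen by the site's technical staff.
document_root = PurePosixPath("/var/www/amazingriver")
file_on_server = document_root / url_path.relative_to("/")

print(directories)     # ('books', 'cooking')
print(page)            # AcronymSoup.html
print(file_on_server)  # /var/www/amazingriver/books/cooking/AcronymSoup.html
```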

Fast Words on the Net

Earlier it was noted that the communications protocol used for transmitting Web pages from one computer to another is known as hypertext transfer protocol (HTTP). To the uninitiated, this terminology might sound like something referring to the overly energetic transfer of words over the Internet. But in this case, "hypertext" has a more specific meaning, dating back to the original vision of being able to link one page of text on a computer network to another.

Back in the middle of the 1960s, Ted Nelson, an American filmmaker and computer scientist, developed the idea of a "docuverse" (a universe of documents) in which all documents can be stored once, with no redundancy, and in which none are ever lost or deleted. More importantly, all of the documents in such a realm would be connected or linked to one another, allowing the reader to jump from one document to another at will, thereby greatly speeding up research efforts. To put it another way, the network of linked documents would be more powerful than the conventional text documents one would find in a paper-based library.

The European and American scientists who developed the Internet and in turn the World Wide Web were greatly influenced by Nelson's concept of hypertext. In fact, the original term for a Web link is "hyperlink", revealing that it refers to any word, phrase, or object on a Web page that uses hypertext to link to another resource on the Web.

As one might expect, HTTP is a technical standard, defined and improved over time, whose primary purpose is to lay the ground rules for establishing a connection between a Web server (such as "www.amazingriver.com", in our example) and your Web browser, and then communicate your requests and the server's responses, back and forth. Your requests would result from, for instance, clicking on links and buttons, or submitting the text you have entered into the fields of a form. The Web server's responses would consist of transmitting HTML pages, and the resources embedded within them, back to your computer. In essence, HTTP is the language that binds together the World Wide Web.
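To make that back-and-forth a little more concrete, here is a small Python sketch of one complete exchange. Since AmazingRiver.com is an imaginary bookseller, the sketch starts a tiny stand-in Web server on your own computer and then plays the part of the browser. The names BookHandler, PAGE, and fetch are invented for this example, and the page contents are placeholders; the server and client classes themselves come from Python's standard library.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder contents for the imaginary book page.
PAGE = b"<html><body>Acronym Soup</body></html>"

class BookHandler(BaseHTTPRequestHandler):
    """Stand-in server: answers GET requests for a single known page."""

    def do_GET(self):
        if self.path == "/books/cooking/AcronymSoup.html":
            self.send_response(200)                      # status: OK
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(PAGE)))
            self.end_headers()
            self.wfile.write(PAGE)                       # the page itself
        else:
            self.send_error(404)                         # status: Not Found

    def log_message(self, format, *args):
        pass  # keep the demonstration quiet

def fetch(path):
    """Start the stand-in server, request one path, return (status, body)."""
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), BookHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", path)          # the browser's side of the exchange
        response = conn.getresponse()      # the server's side
        result = (response.status, response.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()

status, body = fetch("/books/cooking/AcronymSoup.html")
print(status)  # 200
print(body == PAGE)  # True
```

Asking for a path the stand-in server does not know, such as fetch("/missing"), comes back with the status 404 instead, the same "Not Found" code that real Web servers send.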

So the next time you find yourself being crowded at the buffet bar, just begin explaining to everyone within earshot about URLs and HTTP, and pretty soon the buffet will be all yours. Now that's what I call useful knowledge.

Copyright © 2005 Michael J. Ross. All rights reserved.