Tuesday, April 28, 2009

How Spyware Works

Has your computer ever become so slow that you can fix yourself a snack in the time it takes your word processor to open? Perhaps spyware is to blame.

Spyware is a category of computer programs that attach themselves to your operating system in nefarious ways. They can suck the life out of your computer's processing power. They're designed to track your Internet habits, nag you with unwanted sales offers or generate traffic for their host Web site. According to some estimates, more than 80 percent of all personal computers are infected with some kind of spyware [source: FaceTime Communications]. But before you chuck your computer out the window and move to a desert island, you might want to read on. In this article we'll explain how spyware gets installed on your computer, what it does there and how you can get rid of it.


Some people mistake spyware for a computer virus. A computer virus is a piece of code designed to replicate itself as many times as possible, spreading from one host computer to any other computers connected to it. It usually has a payload that may damage your personal files or even your operating system.

Spyware, on the other hand, generally isn't designed to damage your computer. Spyware is defined broadly as any program that gets into your computer without your permission and hides in the background while it makes unwanted changes to your user experience. The damage it does is more a by-product of its main mission, which is to serve you targeted advertisements or make your browser display certain sites or search results.

At present, most spyware targets only the Windows operating system. Some of the more notorious spyware threats include Trymedia, Nuvens, Estalive, Hotbar and New.Net.Domain.Plugin.

How Your Computer Gets Spyware

Spyware usually ends up on your machine because of something you do, like clicking a button on a pop-up window, installing a software package or agreeing to add functionality to your Web browser. These applications often use trickery to get you to install them, from fake system alert messages to buttons that say "cancel" when they really install spyware. Here are some of the general ways in which spyware finds its way into your computer:

#Piggybacked software installation - Some applications -- particularly peer-to-peer file-sharing clients -- will install spyware as a part of their standard installation procedure. If you don't read the installation list closely, you might not notice that you're getting more than the file-sharing application you want. This is especially true of the "free" versions that are advertised as alternatives to software you have to buy. As the old saying goes, there's no such thing as a free lunch.



#Drive-by download - This is when a Web site or pop-up window automatically tries to download and install spyware on your machine. The only warning you might get would be your browser's standard message telling you the name of the software and asking if it's okay to install it. If your security settings are set low enough, you won't even get the warning.



#Browser add-ons - These are pieces of software that add enhancements to your Web browser, like a toolbar, animated pal or additional search box. Sometimes, these really do what they say they'll do but also include elements of spyware as part of the deal. Or sometimes they are nothing more than thinly veiled spyware themselves. Particularly nasty add-ons are considered browser hijackers -- these embed themselves deeply in your machine and take quite a bit of work to get rid of.




#Masquerading as anti-spyware - This is one of the cruelest tricks in the book. This type of software convinces you that it's a tool to detect and remove spyware. When you run the tool, it tells you your computer is clean while it installs additional spyware of its own.

What Spyware Can Do

Spyware can do any number of things once it's installed on your computer.
At a minimum, most spyware runs as an application in the background as soon as you start your computer up, hogging RAM and processor power. It can generate endless pop-up ads that make your Web browser so slow it becomes unusable. It can reset your browser's home page to display an ad every time you open it. Some spyware redirects your Web searches, controlling the results you see and making your search engine practically useless. It can also modify the dynamic-link libraries (DLLs) your computer uses to connect to the Internet, causing connectivity failures that are hard to diagnose. At its very worst, spyware can record the words you type, your Web browsing history, passwords and other private information.

Certain types of spyware can modify your Internet settings so that, if you connect through a dial-up service, your modem dials out to expensive premium-rate telephone numbers. Like a bad guest, some spyware changes your firewall settings, inviting in more unwanted pieces of software. Some forms are even smart enough to notice when you try to remove their entries from the Windows registry and block your attempts to do so.

The point of all this from the spyware makers' perspective isn't always clear. One reason it's used is to pad advertisers' Web traffic statistics. If they can force your computer to show you tons of pop-up ads and fake search results, they can claim credit for displaying that ad to you over and over again. And each time you click the ad by accident, they can count that as someone expressing interest in the advertised product.

Another use of spyware is to steal affiliate credits. Major shopping sites like Amazon and eBay offer credit to a Web site that successfully directs traffic to their item pages. Certain spyware applications capture your requests to view sites like Amazon and eBay and then take the credit for sending you there.


Other "Ware"

Malware -- a general term for any program that makes changes (does malicious or "bad" things) without your express permission
Adware -- programs designed specifically to deliver unrequested advertising
Stealware -- specific spyware designed to capture clicks or Web-site referral credits
Browser hijacker -- a malicious program that becomes deeply embedded in your Web browser

Legality

So is it legal to install difficult-to-remove software without the user's permission? Not really. There's an increasing body of state legislation that explicitly bans spyware, including the Spyware Control Act in Utah and the Consumer Protection Against Computer Spyware Act in California. But even without these new state laws, federal law already prohibits spyware. The Computer Fraud and Abuse Act covers any unauthorized software installations. Deceptive trade practices of any kind also violate the Federal Trade Commission Act. Additionally, the Electronic Communications Privacy Act makes it unlawful for companies to violate the security of customers' personal information.

Just like anti-spam legislation, these spyware laws can be very difficult to enforce in practice, and the perpetrators know it. It can be tough to find hard evidence connecting individual companies to their spyware products, and, as with all Internet-related lawsuits, there are often battles over which court's jurisdiction applies to the case. Just because it's illegal doesn't mean it's easy to stop.


How can you protect yourself against spyware, and what can you do if you think you already have some on your computer? Here are a few suggestions.

Use a spyware scanner.
There are several applications you can turn to for trustworthy spyware detection and removal, including Ad-aware, Spybot and Microsoft AntiSpyware, which is currently in beta. All three offer free personal editions. These work just like your anti-virus software and can provide active protection as well as detection. They will also detect Internet cookies and tell you which sites they refer back to.

Note - Once you know which spyware is on your computer, in some cases you'll need to seek specific instructions on how to remove it. Here are a few more solutions:

Use a pop-up blocker.
Many of the current browsers, including Internet Explorer 6.0 and Mozilla Firefox 1.0, have the ability to block all Web sites from serving you pop-up windows. This function can be configured to be on all of the time or to alert you each time a site wants to pop up a new window. It can also tell you where the pop-up is coming from and selectively allow windows from trusted sources.

Disable ActiveX.
Most browsers have security settings in their preferences that let you specify which actions Web sites are allowed to take on your machine. Since many spyware applications take advantage of ActiveX, a Microsoft technology that lets Web pages download and run software components on Windows, it's not a bad idea to simply disable ActiveX in your browser. Note that if you do this, you will also block legitimate uses of ActiveX, which may interfere with the functionality of some Web sites.

Be suspicious of installing new software.
In general, it pays to be suspicious when a site asks to install something new on your computer. If it's not a plug-in you recognize, like Flash, QuickTime or the latest Java engine, the safest plan of action is to reject the installation of new components unless you have some specific reason to trust them. Today's Web sites are sophisticated enough that the vast majority of functionality happens inside your browser, requiring only a bare minimum of standard plug-ins. Besides, it never hurts to reject the installation first and see if you can get on without it. A trustworthy site will always give you the opportunity to go back and download a needed component later.

Use the "X" to close pop-up windows.
Get to know what your computer's system messages look like so that you can spot a fake. It's usually pretty easy to tell the difference once you get to know the standard look of your system alerts. Stay away from the "No thanks" buttons if you can help it, and instead close the window with the default "X" in the corner of the title bar. For an even more reliable option, use the keyboard shortcut for "close window" built into your software (Alt+F4 in Windows). You can look in your browser's "File" menu to find it.

Friday, April 17, 2009

How Shared Computing Works

Imagine that you've been assigned the task of pushing a very heavy car up a hill. You're allowed to recruit people who aren't doing anything else to help you move the car. You've got two choices: You can look around for one person big and strong enough to do it all by him or herself, or you could grab several average people to push together. While you might eventually find someone large enough to push the car alone, most of the time it will be easier to just gather a group of average-sized people. It might sound strange, but shared computer systems use the same principle.
When a computational problem is really complex, it can take a single computer a long time to process it -- millions of days, in some cases. Even supercomputers have processing limitations. They're also rare and expensive. Many research facilities require a lot of computational power, but don't have access to a supercomputer. For these organizations, shared computing is often an attractive alternative to supercomputers.



Shared computing is a kind of high-performance computing. A shared computing system is a network of computers that work together to accomplish a specific task. Each computer donates part of its processing power -- and sometimes other resources -- to help achieve a goal. By networking thousands of computers together, a shared computing system can equal or even surpass the processing power of a supercomputer.


Most of the time, your computer isn't using all of its computational resources. There are other times when you might have your computer on, but aren't actually using it. A shared computing system takes advantage of these resources that otherwise would remain unused.


Shared computing systems are great for certain complex problems, but aren't useful for others. They can be complicated to design and administer. While several computer scientists are working on a way to standardize shared computing systems, many existing systems rely on unique hardware, software and architecture.

Shared Computing Systems

In a traditional high-performance computing system, all the computers are the same model and run on the same operating system. Much of the time, every application run on the system has its own dedicated server. Sometimes the entire network relies on hardwired connections, meaning all the elements in the system connect to each other through various hubs. The entire system is efficient and elegant.


A shared computing system can be just as efficient, but it doesn't necessarily look very elegant. A shared computing system is limited only by the software it relies upon to connect computers together. With the right software, a shared computing system can work on different kinds of computers running different operating systems. Network connections might exist over hardwired networks, local area networks (LANs), wireless networks, wide area networks (WANs) or the Internet. The biggest advantage a shared computing system has over traditional high-performance computing (HPC) systems is that it's easier to add more resources to a shared computing system. Anyone with a computer capable of running the system's software can join.

The system's software is what gives it access to each computer's unused processing power. Every computer connected to the system must have this software installed in order to participate. There's no definitive shared computing software kit, but in general the software must do the following:

Contact the system's administrative server to get a chunk of data
Monitor the host computer's CPU usage and utilize the processing power whenever it's available
Send analyzed data back to the administrative server in exchange for new data
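To make those three duties concrete, here is a minimal sketch of a volunteer's client loop. Everything in it is hypothetical: FakeAdminServer stands in for the real administrative server (which would be reached over the network), the CPU load readings are simulated, and the 25 percent idle threshold is an arbitrary assumption rather than what any particular project uses.

```python
from collections import deque


class FakeAdminServer:
    """Hypothetical stand-in for the administrative server."""

    def __init__(self, work_units):
        self.pending = deque(work_units)  # (unit_id, data) chunks waiting to go out
        self.results = {}                 # unit_id -> analyzed result

    def get_work(self):
        """Hand out the next chunk of data, or None when nothing is left."""
        return self.pending.popleft() if self.pending else None

    def submit(self, unit_id, result):
        """Accept analyzed data from a volunteer."""
        self.results[unit_id] = result


def cpu_is_idle(load, threshold=0.25):
    """Assumed idle test: treat the CPU as available below a load threshold."""
    return load < threshold


def run_client(server, load_samples, analyze):
    """Volunteer loop: fetch a chunk, compute only while idle, send it back."""
    for load in load_samples:      # each sample stands for one polling interval
        if not cpu_is_idle(load):
            continue               # the owner is busy, so stay out of the way
        unit = server.get_work()
        if unit is None:
            break                  # the server has no more work
        unit_id, data = unit
        server.submit(unit_id, analyze(data))


server = FakeAdminServer([(0, [1, 2, 3]), (1, [4, 5])])
run_client(server, load_samples=[0.1, 0.9, 0.05, 0.1], analyze=sum)
print(server.results)  # {0: 6, 1: 9}
```

The busy reading (0.9) is simply skipped, which is the whole point: the volunteer's own work always comes first.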

Shared computing systems have a relatively narrow use. They're great for solving big computational problems that scientists can break down into smaller sections. If breaking the problem into smaller chunks is particularly simple, it's called an embarrassingly parallel problem.
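Counting prime numbers in a big range is a classic embarrassingly parallel job: each slice of the range can be checked without knowing anything about the others, and the partial counts just add up at the end. The sketch below is only an illustration; it uses a thread pool from Python's standard library to stand in for separate volunteer computers (in CPython, threads won't actually speed up this kind of arithmetic, so the point is the independence of the chunks, not the timing).

```python
from concurrent.futures import ThreadPoolExecutor


def is_prime(n):
    """Simple trial division; fine for small numbers."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def count_primes(chunk):
    """Work on one chunk; needs nothing from any other chunk."""
    start, stop = chunk
    return sum(1 for n in range(start, stop) if is_prime(n))


def split_range(start, stop, pieces):
    """Cut [start, stop) into roughly equal, non-overlapping chunks."""
    step = -(-(stop - start) // pieces)  # ceiling division
    return [(lo, min(lo + step, stop)) for lo in range(start, stop, step)]


chunks = split_range(0, 10_000, pieces=8)

# Each worker plays the part of one volunteer computer.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_counts = list(pool.map(count_primes, chunks))

total = sum(partial_counts)
print(total)  # 1229 primes below 10,000
```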


For small computational problems or problems that aren't easy to break up, shared computing systems are less useful. The whole point of the system is to decrease the amount of time it takes to finish complex calculations. It won't necessarily increase the speed of simple calculations across the network.

Shared Computing Architecture

Unlike grid computing systems -- which in theory can have as many network interface points as there are users -- a shared computing system usually only has a few points of control. That's because most shared computing systems have specific purposes and aren't general utilities.

It's useful to imagine a typical shared computing system as having a front end and a back end. On the front end are all the computers that are volunteering CPU resources to the project. On the back end are the computers and servers that manage the overall project, divide the main task into smaller chunks, communicate with the computers on the front end and store the information the front end computers send after completing an analysis.

Virtual Servers

Some shared computing systems use virtual servers. To create virtual servers, an engineer installs special software on a single, physical server. The software divides the server into multiple exclusive platforms, each of which can run an operating system independently of the others. Why do this? Just as the average computer owner rarely uses all of his or her computer's processing power, it's rare for the average server to work at full capacity. Using virtual servers means that a single physical server runs closer to its full potential and reduces the need for additional hardware.


In general, the job of dividing up the computational problem into smaller chunks falls to a program on a back end computer, usually a server. This computer uses specific software to divide up the task into smaller pieces that are easier for an average computer system to manage. When contacted by the companion software installed on a front end computer, the server will send data over the network for analysis. Upon receiving a completed analysis job, the server will direct the data to an appropriate database.


The system's administrators will usually use another computer to piece completed analyses together. The end goal is to come to a solution of a very large problem by solving it in tiny bits. In many cases, the system's administrators will publish the results so that others can benefit from the information.
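Here is a rough sketch of that back-end bookkeeping: divide the job into numbered work units, record each completed analysis as it arrives, and piece the answers together once every unit is in. The names (make_work_units, ResultStore) are hypothetical, a dictionary stands in for the real database, and summing numbers stands in for whatever analysis a real project performs.

```python
def make_work_units(data, unit_size):
    """Divide the main task into smaller, numbered chunks."""
    return {
        unit_id: data[i:i + unit_size]
        for unit_id, i in enumerate(range(0, len(data), unit_size))
    }


class ResultStore:
    """Stand-in for the database that completed analyses are sent to."""

    def __init__(self, expected_ids):
        self.expected = set(expected_ids)
        self.results = {}

    def record(self, unit_id, result):
        # Ignore results for units this job never handed out.
        if unit_id in self.expected:
            self.results[unit_id] = result

    def complete(self):
        return set(self.results) == self.expected

    def assemble(self, combine):
        """Piece completed analyses together, in work-unit order."""
        if not self.complete():
            raise RuntimeError("some work units are still outstanding")
        return combine(self.results[i] for i in sorted(self.results))


data = list(range(1, 101))
units = make_work_units(data, unit_size=25)
store = ResultStore(units)

for unit_id, chunk in units.items():
    store.record(unit_id, sum(chunk))  # pretend a volunteer computed this

print(store.assemble(sum))  # 5050
```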


If this architecture description seems a little vague, it's because there's no single way to create and administer a shared computing system. Each system has its own unique software and architecture. In most cases, a programmer customizes the software for the specific system's goals. While two different shared computer systems might work the same way in general, once you dig down into details, they can look very different.

Shared Computing Applications

There are dozens of active shared computing system projects, each with its own networks and computational tasks. Some of these networks overlap -- it's possible for a user to participate in more than one network, though it does mean that different projects have to divvy up the idle resources. As a result, each individual task takes a little longer.

One example of a shared computer system is the Grid Laboratory of Wisconsin (GLOW). The University of Wisconsin-Madison uses GLOW for multiple projects, which in some ways sets it apart from most shared computing systems. One project uses the GLOW network to study the human genome. Another takes advantage of GLOW's resources to research potential treatments for cancer. Unlike the shared computing systems that are dedicated to a single task, GLOW can accommodate multiple projects.



The software that makes GLOW possible is called Condor. It's Condor's job to seek out idle processors within the GLOW network and use them to work on individual projects. When one project is inactive, Condor borrows its resources for the other projects. However, if any previously inactive project comes back online, Condor releases the respective computers' processors.
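The borrowing behavior described above can be pictured with a toy allocation rule: idle machines are shared among whichever projects are currently active, so an inactive project's share automatically goes to the others, and it gets machines back as soon as it returns. This is a simplified round-robin sketch under that assumption only; the real Condor matches jobs to machines using its own ClassAd matchmaking system, which this does not attempt to reproduce. The machine and project names are hypothetical.

```python
def allocate(machines, projects):
    """Share idle machines among the currently active projects (round-robin).

    Inactive projects get nothing, so their share is "borrowed" by the
    active ones. Recomputing the plan when a project comes back online
    releases machines to it.
    """
    active = [name for name, is_active in projects.items() if is_active]
    plan = {name: [] for name in projects}
    if not active:
        return plan
    for i, machine in enumerate(machines):
        plan[active[i % len(active)]].append(machine)
    return plan


machines = ["m1", "m2", "m3", "m4"]
projects = {"genome": True, "cancer": False}

before = allocate(machines, projects)  # genome borrows all four machines
projects["cancer"] = True              # the inactive project comes back online
after = allocate(machines, projects)   # machines are released and shared

print(before)  # {'genome': ['m1', 'm2', 'm3', 'm4'], 'cancer': []}
print(after)   # {'genome': ['m1', 'm3'], 'cancer': ['m2', 'm4']}
```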

Concerns About Shared Computing
Any time a system allows one computer access to another computer's resources, questions come up about safety and privacy. What stops the program's administrators from snooping around a particular user's computer? If the administrators can tap into CPU power, can they also access files and sensitive data?


The simple answer to this question is that it depends on the software the participating computer has to install to be part of the system. Everything a shared computing system can do with an individual computer depends upon that software application. Most of the time, the software doesn't allow anyone direct access to the contents on the host computer. Everything is automated, and only the CPU's processing power is accessible.


There are exceptions, though. A zombie computer system, or botnet, is an example of a malicious shared computing system. Headed by a hacker, a zombie computer system turns innocent computer owners into victims. First, the victim must install specific software on his or her computer before a hacker can access it. Usually, such a software application is disguised as a harmless program. Once the software is installed, the hacker can use the victim's computer to perform malicious tasks, like launching a distributed denial-of-service (DDoS) attack or sending out massive amounts of spam. A botnet can span hundreds or thousands of computers, all without the victims being aware of what's going on.


Shared computing systems also need a plan in place for the times when a particular computer goes offline or otherwise becomes unavailable for an extended time. Most systems have a procedure in place that puts a time limit on each task. If the participant's computer doesn't complete the task in a certain amount of time, the control server will cancel that computer's task and assign the task to a new computer.
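A time limit like that is easy to express in code. The sketch below is a hypothetical version of the control server's bookkeeping: it remembers when each work unit was handed out and, on request, cancels overdue units and gives them to spare computers. The one-hour deadline and the computer names are made up, and a fake clock is used so the example runs instantly; a real system would also need to decide what to do if the original computer reports back late.

```python
import time


class WorkTracker:
    """Hypothetical control-server bookkeeping for task deadlines."""

    def __init__(self, deadline_seconds, clock=time.monotonic):
        self.deadline = deadline_seconds
        self.clock = clock
        self.assigned = {}  # unit_id -> (computer, time handed out)

    def assign(self, unit_id, computer):
        self.assigned[unit_id] = (computer, self.clock())

    def complete(self, unit_id):
        self.assigned.pop(unit_id, None)

    def reassign_overdue(self, spare_computers):
        """Cancel overdue units and hand them to new computers."""
        now = self.clock()
        moved = {}
        for unit_id, (computer, started) in list(self.assigned.items()):
            if now - started > self.deadline and spare_computers:
                new_computer = spare_computers.pop(0)
                self.assign(unit_id, new_computer)
                moved[unit_id] = (computer, new_computer)
        return moved


fake_now = [0.0]  # a controllable clock, in seconds
tracker = WorkTracker(deadline_seconds=3600, clock=lambda: fake_now[0])
tracker.assign("unit-1", "alice-pc")
tracker.assign("unit-2", "bob-pc")

fake_now[0] = 1800.0
tracker.complete("unit-2")  # bob's computer finished in time

fake_now[0] = 7200.0        # alice's computer has gone quiet for two hours
moved = tracker.reassign_overdue(["carol-pc"])
print(moved)  # {'unit-1': ('alice-pc', 'carol-pc')}
```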


One criticism of shared computing is that while it capitalizes on idle processors, it increases power consumption and heat output. As computers use more of their processing power, they require more electricity. Some shared computing system administrators urge participants to leave their computers on all the time so that the system has constant access to resources. Sometimes a shared computing system initiative comes into conflict with green initiatives, which emphasize energy conservation.


Perhaps the biggest criticism of shared computing systems is that they aren't comprehensive enough. While they pool processing power resources together, they don't take advantage of other resources like storage. For that reason, many organizations are looking at implementing grid computing systems, which take advantage of more resources and allow a larger variety of applications to leverage networks.


Are shared computing systems the future, or will grid computing systems take their place? As both models become more commonplace, we'll see which system wins out.

Sunday, April 5, 2009

How Internet Infrastructure Works

One of the greatest things about the Internet is that nobody really owns it. It is a global collection of networks, both big and small. These networks connect together in many different ways to form the single entity that we know as the Internet. In fact, the very name comes from this idea of interconnected networks.

Since its beginning in 1969, the Internet has grown from four host computer systems to tens of millions. However, just because nobody owns the Internet, it doesn't mean it is not monitored and maintained in different ways. The Internet Society, a non-profit group established in 1992, oversees the formation of the policies and protocols that define how we use and interact with the Internet.


The Internet: Computer Network Hierarchy

Every computer that is connected to the Internet is part of a network, even the one in your home. For example, you may use a modem and dial a local number to connect to an Internet Service Provider (ISP). At work, you may be part of a local area network (LAN), but you most likely still connect to the Internet using an ISP that your company has contracted with. When you connect to your ISP, you become part of their network. The ISP may then connect to a larger network and become part of their network. The Internet is simply a network of networks.

Most large communications companies have their own dedicated backbones connecting various regions. In each region, the company has a Point of Presence (POP). The POP is a place for local users to access the company's network, often through a local phone number or dedicated line. The amazing thing here is that there is no overall controlling network. Instead, there are several high-level networks connecting to each other through Network Access Points or NAPs.

Internet Network Example

Here's an example. Imagine that Company A is a large ISP. In each major city, Company A has a POP. The POP in each city is a rack full of modems that the ISP's customers dial into. Company A leases fiber optic lines from the phone company to connect the POPs together.
Imagine that Company B is a corporate ISP. Company B builds large buildings in major cities and corporations locate their Internet server machines in these buildings. Company B is such a large company that it runs its own fiber optic lines between its buildings so that they are all interconnected.

In this arrangement, all of Company A's customers can talk to each other, and all of Company B's customers can talk to each other, but there is no way for Company A's customers and Company B's customers to intercommunicate. Therefore, Company A and Company B both agree to connect to NAPs in various cities, and traffic between the two companies flows between the networks at the NAPs.
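You can model this arrangement as a graph: points on each company's network are nodes, and lines between them are links. A breadth-first search then answers the question "can this customer reach that one?" The sketch below uses a made-up topology (the city names and node labels are hypothetical) to show that the two networks are islands until both connect to the same NAP.

```python
from collections import deque


def reachable(links, source, target):
    """Breadth-first search over an undirected graph of network links."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


# Hypothetical topology: each company's POPs or buildings are linked internally.
company_a = [("A-customer", "A-POP-Chicago"), ("A-POP-Chicago", "A-POP-Dallas")]
company_b = [("B-building-Dallas", "B-building-NYC"), ("B-building-NYC", "B-customer")]

before_nap = reachable(company_a + company_b, "A-customer", "B-customer")

# Both companies connect to the same NAP in Dallas.
nap_links = [("A-POP-Dallas", "NAP-Dallas"), ("NAP-Dallas", "B-building-Dallas")]
after_nap = reachable(company_a + company_b + nap_links, "A-customer", "B-customer")

print(before_nap, after_nap)  # False True
```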

In the real Internet, dozens of large Internet providers interconnect at NAPs in various cities, and trillions of bytes of data flow between the individual networks at these points. The Internet is a collection of huge corporate networks that agree to all intercommunicate with each other at the NAPs. In this way, every computer on the Internet connects to every other.

The Function of an Internet Router

All of these networks rely on NAPs, backbones and routers to talk to each other. What is incredible about this process is that a message can leave one computer and travel halfway across the world through several different networks and arrive at another computer in a fraction of a second!
The routers determine where to send information from one computer to another. Routers are specialized computers that send your messages and those of every other Internet user speeding to their destinations along thousands of pathways. A router has two separate, but related, jobs:

It ensures that information doesn't go where it's not needed. This is crucial for keeping large volumes of data from clogging the connections of "innocent bystanders."
It makes sure that information does make it to the intended destination.
In performing these two jobs, a router is extremely useful in dealing with two separate computer networks. It joins the two networks, passing information from one to the other. It also protects the networks from one another, preventing the traffic on one from unnecessarily spilling over to the other. Regardless of how many networks are attached, the basic operation and function of the router remains the same. Since the Internet is one huge network made up of tens of thousands of smaller networks, its use of routers is an absolute necessity. For more information, read How Routers Work.
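The article doesn't describe how a router picks a path, but the standard technique in IP routers is a forwarding table searched by longest-prefix match: among all table entries whose network contains the destination address, the most specific one wins, and a default route catches everything else. Here is a minimal sketch of that lookup using Python's ipaddress module. The table entries and interface names are hypothetical; the addresses are the example IPs used later in this article.

```python
import ipaddress

# Hypothetical forwarding table: destination prefix -> outgoing interface.
FORWARDING_TABLE = {
    ipaddress.ip_network("0.0.0.0/0"): "uplink-to-ISP",  # default route
    ipaddress.ip_network("216.27.0.0/16"): "backbone-east",
    ipaddress.ip_network("216.27.61.0/24"): "office-lan",
}


def next_hop(destination):
    """Pick the most specific (longest) matching prefix for an address."""
    address = ipaddress.ip_address(destination)
    matches = [net for net in FORWARDING_TABLE if address in net]
    best = max(matches, key=lambda net: net.prefixlen)  # default route always matches
    return FORWARDING_TABLE[best]


print(next_hop("216.27.61.137"))  # office-lan
print(next_hop("216.27.22.162"))  # backbone-east
print(next_hop("8.8.8.8"))        # uplink-to-ISP
```

Real routers use specialized data structures (and often hardware) to do this lookup millions of times per second, but the rule they apply is the same.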

Internet Backbone

The National Science Foundation (NSF) created the first high-speed backbone in 1987. Called NSFNET, it was a T1 line that connected 170 smaller networks together and operated at 1.544 Mbps (million bits per second). IBM, MCI and Merit worked with NSF to create the backbone and developed a T3 (45 Mbps) backbone the following year.
Backbones are typically fiber optic trunk lines. The trunk line has multiple fiber optic cables combined together to increase the capacity. Fiber optic line speeds are designated OC, for optical carrier, such as OC-3, OC-12 or OC-48. An OC-3 line is capable of transmitting 155 Mbps while an OC-48 can transmit 2,488 Mbps (2.488 Gbps). Compare that to a typical 56K modem transmitting 56,000 bps and you see just how fast a modern backbone is.
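The OC numbers follow a simple rule: an OC-n line carries n times the base OC-1 rate of 51.84 Mbps, which is where the 155 Mbps (OC-3) and 2,488 Mbps (OC-48) figures come from. The short calculation below uses that rule to compare idealized transfer times for a hypothetical 100 MB file; it ignores protocol overhead and congestion, so real transfers would be slower.

```python
OC1_MBPS = 51.84  # base SONET rate; an OC-n line carries n times this


def oc_rate_mbps(n):
    """Line rate of an OC-n optical carrier in megabits per second."""
    return n * OC1_MBPS


def transfer_seconds(size_bytes, rate_bps):
    """Idealized transfer time: 8 bits per byte divided by the line rate."""
    return size_bytes * 8 / rate_bps


FILE_BYTES = 100_000_000  # hypothetical 100 MB file

links = {
    "56K modem": 56_000,
    "T1": 1_544_000,
    "OC-3": oc_rate_mbps(3) * 1_000_000,
    "OC-48": oc_rate_mbps(48) * 1_000_000,
}

for name, bps in links.items():
    print(f"{name:>10}: {transfer_seconds(FILE_BYTES, bps):10.2f} s")
```

The modem needs roughly four hours for the file, while the OC-3 line finishes in about five seconds and the OC-48 line in about a third of a second.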

Today there are many companies that operate their own high-capacity backbones, and all of them interconnect at various NAPs around the world. In this way, everyone on the Internet, no matter where they are and what company they use, is able to talk to everyone else on the planet. The entire Internet is a gigantic, sprawling agreement between companies to intercommunicate freely.

Internet Protocol: IP Addresses

Every machine on the Internet has a unique identifying number, called an IP Address. The IP stands for Internet Protocol, which is the language that computers use to communicate over the Internet. A protocol is the pre-defined way that someone who wants to use a service talks with that service. The "someone" could be a person, but more often it is a computer program like a Web browser.
A typical IP address looks like this:


216.27.61.137
To make them easier for us humans to remember, IP addresses are normally written in dotted decimal form, like the one above. But computers communicate in binary form. Look at the same IP address in binary:


11011000.00011011.00111101.10001001
The four numbers in an IP address are called octets, because they each have eight positions when viewed in binary form. Four octets of eight positions each add up to 32, which is why IP addresses are considered 32-bit numbers. Since each of the eight positions can have two different states (1 or 0), the total number of possible combinations per octet is 2^8, or 256. So each octet can contain any value between 0 and 255. Combine the four octets and you get 2^32, or a possible 4,294,967,296 unique values!
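You can check the conversion yourself with a few lines of code. This small sketch (the function names are just illustrative) prints each octet as eight binary digits and then reads all 32 digits as one number, which is the form a computer actually works with.

```python
def to_binary(dotted):
    """Show each octet of a dotted-decimal IPv4 address as 8 binary digits."""
    octets = [int(part) for part in dotted.split(".")]
    if len(octets) != 4 or not all(0 <= o <= 255 for o in octets):
        raise ValueError(f"not a valid IPv4 address: {dotted!r}")
    return ".".join(format(o, "08b") for o in octets)


def to_int(dotted):
    """Read the 32 binary digits as one number."""
    return int(to_binary(dotted).replace(".", ""), 2)


print(to_binary("216.27.61.137"))  # 11011000.00011011.00111101.10001001
print(to_int("216.27.61.137"))     # 3625663881
print(2 ** 8, 2 ** 32)             # 256 4294967296
```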

Out of the almost 4.3 billion possible combinations, certain values are restricted from use as typical IP addresses. For example, the IP address 0.0.0.0 is reserved for the default network and the address 255.255.255.255 is used for broadcasts.

The octets serve a purpose other than simply separating the numbers. They are used to create classes of IP addresses that can be assigned to a particular business, government or other entity based on size and need. The octets are split into two sections: Net and Host. The Net section always contains the first octet. It is used to identify the network that a computer belongs to. Host (sometimes referred to as Node) identifies the actual computer on the network. The Host section always contains the last octet. There are five IP classes plus certain special addresses. You can learn more about IP classes in What is an IP address?
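Under the original classful scheme, the value of the first octet tells you the class, and the class tells you how many octets belong to Net and how many to Host: one Net octet for class A, two for class B and three for class C, while classes D (multicast) and E (experimental) don't use the split. The sketch below encodes those standard boundaries. Keep in mind that this scheme has since been largely replaced by classless addressing (CIDR), and that a few values inside the class ranges, such as 0 and 127, are reserved.

```python
def ip_class(dotted):
    """Classify an IPv4 address under the original (classful) scheme."""
    first = int(dotted.split(".")[0])
    if first < 128:
        return "A"  # leading bit 0   (0 and 127 are reserved)
    if first < 192:
        return "B"  # leading bits 10
    if first < 224:
        return "C"  # leading bits 110
    if first < 240:
        return "D"  # multicast
    return "E"      # experimental


NET_OCTETS = {"A": 1, "B": 2, "C": 3}  # how many octets form the Net section


def split_net_host(dotted):
    """Return the (Net, Host) parts of a class A, B or C address."""
    cls = ip_class(dotted)
    if cls not in NET_OCTETS:
        raise ValueError(f"class {cls} addresses have no Net/Host split")
    octets = dotted.split(".")
    n = NET_OCTETS[cls]
    return ".".join(octets[:n]), ".".join(octets[n:])


print(ip_class("216.27.61.137"))        # C
print(split_net_host("216.27.61.137"))  # ('216.27.61', '137')
```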

Internet Protocol: Domain Name System

When the Internet was in its infancy, it consisted of a small number of computers hooked together with modems and telephone lines. You could only make connections by providing the IP address of the computer you wanted to establish a link with. For example, a typical IP address might be 216.27.22.162. This was fine when there were only a few hosts out there, but it became unwieldy as more and more systems came online.

The first solution to the problem was a simple text file maintained by the Network Information Center that mapped names to IP addresses. Soon this text file became so large it was too cumbersome to manage. In 1983, Paul Mockapetris of the University of Southern California's Information Sciences Institute designed the Domain Name System (DNS), which maps text names to IP addresses automatically.
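To see why that first approach worked at small scale but not at large scale, here is a toy version of a name-to-address text file and the code to read it. The "address name" layout is a simplified format similar to a modern hosts file, not the Network Information Center's exact layout, and the entries are hypothetical. Every computer needed its own complete, up-to-date copy of a file like this, which is exactly what stopped scaling.

```python
# Hypothetical entries: one "address name" pair per line.
HOSTS_TEXT = """
# address         name
216.27.61.137     webserver
216.27.22.162     mailhost
"""


def parse_hosts(text):
    """Build a name -> address dictionary, skipping blanks and comments."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table[name.lower()] = address
    return table


hosts = parse_hosts(HOSTS_TEXT)
print(hosts["webserver"])  # 216.27.61.137
```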

URL: Uniform Resource Locator

When you use the Web or send an e-mail message, you use a domain name to do it. For example, the Uniform Resource Locator (URL) "http://www.xxx.com" contains the domain name xxx.com, and an e-mail address such as example@xxxx.com contains the domain name xxxx.com. Every time you use a domain name, you use the Internet's DNS servers to translate the human-readable domain name into the machine-readable IP address. Check out How Domain Name Servers Work for more in-depth information on DNS.
Top-level domain names, also called first-level domain names, include .COM, .ORG, .NET, .EDU and .GOV. Within every top-level domain there is a huge list of second-level domains. For example, the .COM first-level domain includes:


Yahoo
Microsoft
Every name in the .COM top-level domain must be unique. The left-most word, like www, is the host name. It specifies the name of a specific machine (with a specific IP address) in a domain. A given domain can, potentially, contain millions of host names as long as they are all unique within that domain.
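Reading a name from right to left gives you the hierarchy: top-level domain, then second-level domain, then host name. A minimal sketch of that split is below; the function name is illustrative, and it deliberately ignores multi-part suffixes such as .co.uk, which a real tool would need to handle.

```python
def split_host_name(fqdn):
    """Break a host name into host, second-level domain and top-level domain."""
    labels = fqdn.rstrip(".").lower().split(".")
    if len(labels) < 2:
        raise ValueError(f"expected at least a name and a top-level domain: {fqdn!r}")
    top_level = labels[-1]
    second_level = labels[-2]
    host = ".".join(labels[:-2]) or None  # e.g. "www"; absent for a bare domain
    return {"host": host, "second_level": second_level, "top_level": top_level}


print(split_host_name("www.yahoo.com"))
# {'host': 'www', 'second_level': 'yahoo', 'top_level': 'com'}
```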

DNS servers accept requests from programs and other name servers to convert domain names into IP addresses. When a request comes in, the DNS server can do one of four things with it:

It can answer the request with an IP address because it already knows the IP address for the requested domain.
It can contact another DNS server and try to find the IP address for the name requested. It may have to do this multiple times.
It can say, "I don't know the IP address for the domain you requested, but here's the IP address for a DNS server that knows more than I do."
It can return an error message because the requested domain name is invalid or does not exist.
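The four possible outcomes fit together into one lookup procedure. The simulation below is a hypothetical sketch: three in-memory "servers" stand in for real DNS servers, example.com is a domain reserved for documentation, and 192.0.2.10 comes from an address range reserved for examples. Each server either answers, refers you to a server that knows more, or reports an error, and the resolve function keeps asking until it gets an answer or an error. Real DNS servers also cache answers, which this leaves out.

```python
# Hypothetical in-memory DNS servers.
SERVERS = {
    "root": {"referrals": {"com": "com-server"}},
    "com-server": {"referrals": {"example.com": "example-server"}},
    "example-server": {"answers": {"www.example.com": "192.0.2.10"}},
}


def ask(server_name, query):
    """One server's reply: ('answer', ip), ('referral', server) or ('error', msg)."""
    server = SERVERS[server_name]
    if query in server.get("answers", {}):
        return "answer", server["answers"][query]  # it already knows the address
    # Otherwise refer to the server for the longest matching zone, if any.
    best = None
    for zone, next_server in server.get("referrals", {}).items():
        if query == zone or query.endswith("." + zone):
            if best is None or len(zone) > len(best[0]):
                best = (zone, next_server)
    if best:
        return "referral", best[1]
    return "error", f"{query} does not exist"


def resolve(query, start="root", max_steps=10):
    """Keep contacting other servers (possibly many times) until done."""
    server = start
    for _ in range(max_steps):
        kind, value = ask(server, query)
        if kind == "referral":
            server = value
            continue
        if kind == "answer":
            return value
        raise LookupError(value)
    raise LookupError("too many referrals")


print(resolve("www.example.com"))  # 192.0.2.10
```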

How Web Servers Work

Have you ever wondered about the mechanisms that delivered this page to you? Chances are you are sitting at a computer right now, viewing this page in a browser. So, when you clicked on the link for this page, or typed in its URL (uniform resource locator), what happened behind the scenes to bring this page onto your screen?



If you've ever been curious about the process, or have ever wanted to know some of the specific mechanisms that allow you to surf the Internet, then read on. In this article, you will learn how Web servers bring pages into your home, school or office. Let's get started!

The Basic Process

Let's say that you are sitting at your computer, surfing the Web, and you get a call from a friend who says, "I just read a great article! Type in this URL and check it out. It's at the xxxx site." So you type that URL into your browser and press return. And magically, no matter where in the world that URL lives, the page pops up on your screen.

At the most basic level, the steps that brought that page to your screen were simple: your browser formed a connection to a Web server, requested a page and received it.

Behind the Scenes

If you want to get into a bit more detail on the process of getting a Web page onto your computer screen, here are the basic steps that occurred behind the scenes:
The browser broke the URL into three parts:
The protocol ("http")
The server name ("www.xxxx.com")
The file name ("xx--xx.htm")

The browser communicated with a name server to translate the server name "www.xxxxx.com" into an IP address, which it used to connect to the server machine.

The browser then formed a connection to the server at that IP address on port 80.

Following the HTTP protocol, the browser sent a GET request to the server, asking for the file "http://www.xxxxx.com/xxxx.htm." (Note that cookies may be sent from browser to server with the GET request -- see How Internet Cookies Work for details.)

The server then sent the HTML text for the Web page to the browser. (Cookies may also be sent from server to browser in the header for the page.)

The browser read the HTML tags and formatted the page onto your screen.
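The first step, breaking the URL into three parts, can be reproduced with Python's standard urllib.parse module. In this sketch the helper name url_parts is ours, and the file name comes back with its leading slash, which is the form a browser puts in its GET request.

```python
from urllib.parse import urlsplit

def url_parts(url):
    """Split a URL into protocol, server name and file name."""
    parts = urlsplit(url)
    # An empty path means the site's default page, requested as "/".
    return parts.scheme, parts.hostname, parts.path or "/"

# Placeholder URL in the same style as the article's examples.
print(url_parts("http://www.xxxx.com/xxxx.htm"))
# ('http', 'www.xxxx.com', '/xxxx.htm')
```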
If you've never explored this process before, that's a lot of new vocabulary. To understand this whole process in detail, you need to learn about IP addresses, ports, protocols...

The Internet

So what is "the Internet"? The Internet is a gigantic collection of millions of computers, all linked together on a computer network. The network allows all of the computers to communicate with one another. A home computer may be linked to the Internet using a phone-line modem, DSL or cable modem that talks to an Internet service provider (ISP). A computer in a business or university will usually have a network interface card (NIC) that directly connects it to a local area network (LAN) inside the business. The business can then connect its LAN to an ISP using a high-speed phone line like a T1 line. A T1 line can handle approximately 1.5 million bits per second, while a normal phone line using a modem can typically handle 30,000 to 50,000 bits per second.
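To make those speeds concrete, here is the arithmetic for moving one file over each kind of link. The 1-megabyte file size is an illustrative assumption, and the calculation ignores protocol overhead, so real transfers would take somewhat longer.

```python
def transfer_seconds(size_bits, bits_per_second):
    """Time to move size_bits over a link, ignoring protocol overhead."""
    return size_bits / bits_per_second

FILE_BITS = 1_000_000 * 8  # an illustrative 1-megabyte file, 8 bits per byte

LINKS = {
    "T1 line": 1_500_000,     # about 1.5 million bits per second
    "modem (fast)": 50_000,
    "modem (slow)": 30_000,
}

for name, speed in LINKS.items():
    print(f"{name}: about {transfer_seconds(FILE_BITS, speed):.0f} seconds")
```

The T1 line moves the file in about 5 seconds, while the modem needs roughly 160 to 267 seconds.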
ISPs then connect to larger ISPs, and the largest ISPs maintain fiber-optic "backbones" for an entire nation or region. Backbones around the world are connected through fiber-optic lines, undersea cables or satellite links (see An Atlas of Cyberspaces for some interesting backbone maps). In this way, every computer on the Internet is connected to every other computer on the Internet.



Clients and Servers

In general, all of the machines on the Internet can be categorized as two types: servers and clients. Those machines that provide services (like Web servers or FTP servers) to other machines are servers. And the machines that are used to connect to those services are clients. When you connect to Yahoo! at www.yahoo.com to read a page, Yahoo! is providing a machine (probably a cluster of very large machines), for use on the Internet, to service your request. Yahoo! is providing a server. Your machine, on the other hand, is probably providing no services to anyone else on the Internet. Therefore, it is a user machine, also known as a client. It is possible and common for a machine to be both a server and a client, but for our purposes here you can think of most machines as one or the other.
A server machine may provide one or more services on the Internet. For example, a server machine might have software running on it that allows it to act as a Web server, an e-mail server and an FTP server. Clients that come to a server machine do so with a specific intent, so clients direct their requests to a specific software server running on the overall server machine. For example, if you are running a Web browser on your machine, it will most likely want to talk to the Web server on the server machine. Your Telnet application will want to talk to the Telnet server, your e-mail application will talk to the e-mail server, and so on...

IP Addresses

To keep all of these machines straight, each machine on the Internet is assigned a unique address called an IP address. IP stands for Internet protocol, and these addresses are 32-bit numbers, normally expressed as four "octets" in a "dotted decimal number." A typical IP address looks like this:

216.27.61.137

The four numbers in an IP address are called octets because each one is 8 bits long, so it can have a value between 0 and 255: 2^8 = 256 possibilities per octet.
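Because each octet is exactly 8 bits, the dotted form is just a readable way of writing one 32-bit number. A minimal sketch of the conversion in both directions follows; the function names are ours, and Python's standard ipaddress module offers the same conversion through int(ipaddress.IPv4Address(...)).

```python
def to_int(dotted):
    """Pack four dotted-decimal octets into one 32-bit number."""
    octets = [int(part) for part in dotted.split(".")]
    if len(octets) != 4 or not all(0 <= o <= 255 for o in octets):
        raise ValueError(f"not a dotted-decimal IPv4 address: {dotted!r}")
    value = 0
    for octet in octets:
        value = (value << 8) | octet  # each octet fills the next 8 bits
    return value

def to_dotted(value):
    """Unpack a 32-bit number back into dotted-decimal form."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))

n = to_int("216.27.61.137")
print(n)             # 3625663881
print(to_dotted(n))  # 216.27.61.137
```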

Every machine on the Internet has a unique IP address. A server has a static IP address that does not change very often. A home machine that is dialing up through a modem often has an IP address that is assigned by the ISP when the machine dials in. That IP address is unique for that session -- it may be different the next time the machine dials in. This way, an ISP only needs one IP address for each modem it supports, rather than for each customer.
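The idea of lending one address per modem can be sketched as a pool that hands out a free address when a customer dials in and takes it back on hang-up. AddressPool is a hypothetical class, the addresses come from a range reserved for documentation, and the way real ISPs' access equipment assigned addresses is not modeled here.

```python
class AddressPool:
    """Hypothetical pool: one address per modem, lent to whoever dials in."""

    def __init__(self, addresses):
        self.free = list(addresses)  # addresses not currently in use
        self.in_use = {}             # session id -> address

    def dial_in(self, session):
        if not self.free:
            raise RuntimeError("all modems busy")
        address = self.free.pop(0)
        self.in_use[session] = address
        return address

    def hang_up(self, session):
        # The address goes to the back of the queue for the next caller.
        self.free.append(self.in_use.pop(session))

pool = AddressPool(["198.51.100.1", "198.51.100.2"])
first = pool.dial_in("alice-monday")
pool.hang_up("alice-monday")
second = pool.dial_in("alice-tuesday")
print(first, second)  # 198.51.100.1 198.51.100.2
```

The same customer gets a different address on the second call, yet the ISP never needs more addresses than it has modems.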

If you are working on a Windows machine, you can view a lot of the Internet information for your machine, including your current IP address and hostname, with the command WINIPCFG.EXE (IPCONFIG.EXE for Windows 2000/XP). On a UNIX machine, type nslookup at the command prompt followed by a machine name (for example, "nslookup www.google.com") to display the IP address of that machine, and use the command hostname to learn the name of your own machine. (For more information on IP addresses, see IANA.)
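Programs can make the same lookups through Python's standard socket module: gethostname returns the local machine's name, and gethostbyname returns an IPv4 address for a name. The helper names below are ours, and the results depend on your machine and network, so the only output shown is for a numeric address, which gethostbyname returns unchanged without consulting a name server.

```python
import socket

def describe_local_machine():
    """Return this machine's host name and the IPv4 address it resolves to."""
    name = socket.gethostname()
    try:
        address = socket.gethostbyname(name)
    except OSError:      # some machines cannot resolve their own name
        address = None
    return name, address

def lookup(name):
    """Rough programmatic equivalent of nslookup (IPv4 only)."""
    return socket.gethostbyname(name)

print(describe_local_machine())
print(lookup("127.0.0.1"))  # 127.0.0.1
```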

As far as the Internet's machines are concerned, an IP address is all you need to talk to a server. For example, in your browser, you can type the URL http://209.116.69.66 and arrive at the machine that contains the Web server for HowStuffWorks. On some servers the IP address alone is not sufficient, but on most large servers it is.


Domain Names

Because most people have trouble remembering the strings of numbers that make up IP addresses, and because IP addresses sometimes need to change, all servers on the Internet also have human-readable names, called domain names. For example, www.xxxxx.com is a permanent, human-readable name. It is easier for most of us to remember www.howstuffworks.com than it is to remember 209.116.69.66.
The name www.xxxxx.com actually has three parts:

The host name ("www")
The domain name ("xxxxx")
The top-level domain name ("com")
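Splitting a name into these three parts is simple string work. This sketch, with the hypothetical helper name_parts, assumes the common host.domain.tld layout; names under country domains such as .co.uk have more levels and would need more care.

```python
def name_parts(server_name):
    """Split a host.domain.tld name into its three parts."""
    labels = server_name.lower().rstrip(".").split(".")
    if len(labels) < 3:
        raise ValueError("expected host.domain.tld")
    host = ".".join(labels[:-2])  # everything left of the domain, e.g. "www"
    return host, labels[-2], labels[-1]

print(name_parts("www.xxxxx.com"))    # ('www', 'xxxxx', 'com')
print(name_parts("encarta.msn.com"))  # ('encarta', 'msn', 'com')
```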

Domain names within the ".com" domain are managed by the registry operator VeriSign, which also manages ".net" domain names. Other registry operators (like RegistryPro, NeuLevel and Public Interest Registry) manage other top-level domains (like .pro, .biz and .org). VeriSign keeps track of the names registered within its top-level domains and guarantees that each one is unique. VeriSign also maintains contact information for each site and runs the "whois" database. The host name is created by the company hosting the domain. "www" is a very common host name, but many places now either omit it or replace it with a different host name that indicates a specific area of the site. For example, in encarta.msn.com, the domain name for Microsoft's Encarta encyclopedia, "encarta" is designated as the host name instead of "www."

Name Servers

A set of servers called domain name servers (DNS) maps the human-readable names to the IP addresses. These servers are simple databases that map names to IP addresses, and they are distributed all over the Internet. Most individual companies, ISPs and universities maintain small name servers to map host names to IP addresses. There are also central name servers that use data supplied by VeriSign to map domain names to IP addresses.
If you type the URL "http://www.xxxx.com/xxxx.htm" into your browser, your browser extracts the name "www.xxxx.com," passes it to a domain name server, and the domain name server returns the correct IP address for www.xxxx.com. A number of name servers may be involved to get the right IP address. For example, in the case of www.xxxx.com, the name server for the "com" top-level domain will know the IP address of the name server that knows the host names within xxxx.com, and a separate query to that name server, operated by the xxxx ISP, may deliver the actual IP address for the HowStuffWorks server machine.

On a UNIX machine, you can access the same service using the nslookup command. Simply type a name like "www.xxxx.com" into the command line, and the command will query the name servers and deliver the corresponding IP address to you.

So here it is: The Internet is made up of millions of machines, each with a unique IP address. Many of these machines are server machines, meaning that they provide services to other machines on the Internet. You have heard of many of these servers: e-mail servers, Web servers, FTP servers, Gopher servers and Telnet servers, to name a few. All of these are provided by server machines.

Ports

Any server machine makes its services available to the Internet using numbered ports, one for each service that is available on the server. For example, if a server machine is running a Web server and an FTP server, the Web server would typically be available on port 80, and the FTP server would be available on port 21. Clients connect to a service at a specific IP address and on a specific port.
Each of the most well-known services is available at a well-known port number. Here are some common port numbers:

echo 7
daytime 13
qotd 17 (Quote of the Day)
ftp 21
telnet 23
smtp 25 (Simple Mail Transfer, meaning e-mail)
time 37
nicname 43 (Who Is)
domain 53 (DNS name server)
gopher 70
finger 79
WWW 80

If the server machine accepts connections on a port from the outside world, and if a firewall is not protecting the port, you can connect to the port from anywhere on the Internet and use the service. Note that there is nothing that forces, for example, a Web server to be on port 80. If you were to set up your own machine and load Web server software on it, you could put the Web server on port 918, or any other unused port, if you wanted to. Then, if your machine were known as xxx.yyy.com, someone on the Internet could connect to your server with the URL http://xxx.yyy.com:918. The ":918" explicitly specifies the port number, and would have to be included for someone to reach your server. When no port is specified, the browser simply assumes that the server is using the well-known port 80.
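The rule "use the port in the URL if there is one, otherwise the well-known port" fits in a few lines. In this sketch the helper name target and the WELL_KNOWN_PORTS table are ours; the table holds only a few entries from the list above, keyed by URL scheme names.

```python
from urllib.parse import urlsplit

# A few well-known ports, keyed by the scheme name used in URLs.
WELL_KNOWN_PORTS = {"ftp": 21, "telnet": 23, "http": 80}

def target(url):
    """Return the (host, port) a client should connect to for a URL."""
    parts = urlsplit(url)
    # An explicit ":port" wins; otherwise fall back to the well-known port.
    port = parts.port or WELL_KNOWN_PORTS[parts.scheme]
    return parts.hostname, port

print(target("http://xxx.yyy.com:918/"))  # ('xxx.yyy.com', 918)
print(target("http://xxx.yyy.com/"))      # ('xxx.yyy.com', 80)
```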

Protocols

Once a client has connected to a service on a particular port, it accesses the service using a specific protocol. The protocol is the pre-defined way that someone who wants to use a service talks with that service. The "someone" could be a person, but more often it is a computer program like a Web browser. Protocols are often text, and simply describe how the client and server will have their conversation.
Perhaps the simplest protocol is the daytime protocol. If you connect to port 13 on a machine that supports a daytime server, the server will send you its impression of the current date and time and then close the connection. The protocol is, "If you connect to me, I will send you the date and time and then disconnect." Most UNIX machines support this server. If you would like to try it out, you can connect to one with the Telnet application. In UNIX, the session would look like this:


%telnet web67.ntx.net 13
Trying 216.27.61.137...
Connected to web67.ntx.net.
Escape character is '^]'.
Sun Oct 25 08:34:06 1998
Connection closed by foreign host.

On a Windows machine, you can access this server by typing "telnet web67.ntx.net 13" at the MSDOS prompt.

In this example, web67.ntx.net is the name of the UNIX server machine, and 13 is the port number for the daytime service. The Telnet application connects to port 13 (Telnet connects to port 23 by default, but you can direct it to connect to any port), then the server sends the date and time and disconnects. Most versions of Telnet allow you to specify a port number, so you can try this using whatever version of Telnet you have available on your machine.
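Because the daytime protocol is so simple, both ends fit in a short program. This sketch uses Python's standard socket and threading modules and runs the server and client on your own machine; the function names are ours, and it listens on a free port picked by the operating system rather than port 13, since ports below 1024 usually require administrator rights on UNIX.

```python
import socket
import threading
import time

def serve_daytime_once(listener):
    """Accept one connection, send the date and time, then hang up."""
    conn, _ = listener.accept()
    with conn:  # leaving the block closes the connection, as the protocol says
        conn.sendall((time.ctime() + "\r\n").encode("ascii"))

def daytime_client(host, port):
    """Connect and read until the server closes the connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=5) as sock:
        while True:
            data = sock.recv(1024)
            if not data:  # empty read: the server has disconnected
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii").strip()

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
listener.listen(1)
port = listener.getsockname()[1]
server = threading.Thread(target=serve_daytime_once, args=(listener,))
server.start()
reply = daytime_client("127.0.0.1", port)
server.join()
listener.close()
print(reply)  # same format as the Telnet session above, with today's date
```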

Most protocols are more involved than daytime and are specified in Request for Comment (RFC) documents that are publicly available (see http://sunsite.auc.dk/RFC/ for a nice archive of all RFCs). Every Web server on the Internet conforms to the HTTP protocol, summarized nicely in The Original HTTP as defined in 1991. The most basic form of the protocol understood by an HTTP server involves just one command: GET. If you connect to a server that understands the HTTP protocol and tell it to "GET filename," the server will respond by sending you the contents of the named file and then disconnecting. Here's a typical session:


%telnet www.xxxxx.com 80
Trying 216.27.61.137...
Connected to xxxx.com.
Escape character is '^]'.
GET http://www.xxxxxx.com/


Welcome to xxxxx
...


Connection closed by foreign host.

In the original HTTP protocol, all you would have sent was the actual filename, such as "/" or "/xxxx.htm." The protocol was later modified to handle the sending of the complete URL. This has allowed companies that host virtual domains, where many domains live on a single machine, to use one IP address for all of the domains they host. It turns out that hundreds of domains are hosted on 209.116.69.66 -- the xxxx IP address.
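Under HTTP/1.1, a browser normally sends just the path in the request line and names the site in a Host header, which HTTP/1.1 requires; that header is what lets one IP address serve many domains. The sketch below shows both sides. build_get, choose_site and the SITES table are hypothetical, and the .test names are placeholders from a top-level domain reserved for testing.

```python
def build_get(host, path):
    """Build a minimal HTTP/1.1 GET request for one page on a named host."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"   # tells a shared server which site is meant
        "Connection: close\r\n"
        "\r\n"                # a blank line ends the request headers
    ).encode("ascii")

# Hypothetical sites that share one IP address.
SITES = {"www.site-one.test": "<h1>Site one</h1>",
         "www.site-two.test": "<h1>Site two</h1>"}

def choose_site(raw_request):
    """Server side: pick the virtual domain named in the Host header."""
    for line in raw_request.decode("ascii").split("\r\n")[1:]:
        if line.lower().startswith("host:"):
            host = line.split(":", 1)[1].strip().lower().split(":")[0]  # drop any port
            return SITES.get(host, "<h1>Unknown site</h1>")
    return "<h1>No Host header</h1>"

request = build_get("www.site-two.test", "/")
print(choose_site(request))  # <h1>Site two</h1>
```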

Putting It All Together

Now you know a tremendous amount about the Internet. You know that when you type a URL into a browser, the following steps occur:

The browser breaks the URL into three parts:

The protocol ("http")
The server name ("www.xxxx.com")
The file name ("xxxx.htm")

The browser communicates with a name server to translate the server name, "www.xxxxx.com," into an IP address, which it uses to connect to that server machine.

The browser then forms a connection to the Web server at that IP address on port 80.

Following the HTTP protocol, the browser sends a GET request to the server, asking for the file "http://www.xxxx.com/xxxx.htm." (Note that cookies may be sent from browser to server with the GET request -- see How Internet Cookies Work for details.)

The server sends the HTML text for the Web page to the browser. (Cookies may also be sent from server to browser in the header for the page.)

The browser reads the HTML tags and formats the page onto your screen.
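The last two steps can be sketched from the browser's side. The server's reply starts with a status line and headers (cookies travel in Set-Cookie headers), then a blank line, then the HTML. This parser is a simplified illustration assuming a complete, uncompressed response already held in memory; the function name and the sample response are made up, and it is not a full HTTP implementation.

```python
def parse_response(raw):
    """Split a raw HTTP response into status code, headers, cookies and body."""
    head, _, body = raw.partition(b"\r\n\r\n")  # a blank line separates them
    lines = head.decode("iso-8859-1").split("\r\n")
    status_code = int(lines[0].split()[1])      # "HTTP/1.1 200 OK" -> 200
    headers = []                                # a list, since names can repeat
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers.append((name.strip().lower(), value.strip()))
    cookies = [value for name, value in headers if name == "set-cookie"]
    return status_code, headers, cookies, body.decode("utf-8")

# A made-up reply of the kind a server sends back for a GET request.
RAW = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Set-Cookie: visitor=42\r\n"
    b"\r\n"
    b"<html><body>Welcome</body></html>"
)

status, headers, cookies, html = parse_response(RAW)
print(status, cookies)  # 200 ['visitor=42']
```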

Security

You can see from this description that a Web server can be a pretty simple piece of software. It takes the file name sent in with the GET command, retrieves that file and sends it down the wire to the browser. Even if you take into account all of the code to handle the ports and port connections, you could easily create a C program that implements a simple Web server in less than 500 lines of code. Obviously, a full-blown enterprise-level Web server is more involved, but the basics are very simple.
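To show how little a static Web server needs, here is a sketch in Python rather than C. It assumes the request line arrives in the first read, answers only GET, serves files from one document directory (with a hypothetical default page named index.htm), labels everything as HTML and handles one connection at a time; the function names are ours. It does refuse file names that would escape the document directory.

```python
import os
import socket

def handle(conn, docroot):
    """Serve one GET request from files under docroot, then hang up."""
    with conn:
        request = conn.recv(4096).decode("iso-8859-1")
        parts = request.split("\r\n", 1)[0].split()  # "GET /page.htm HTTP/1.0"
        if len(parts) < 2 or parts[0] != "GET":
            conn.sendall(b"HTTP/1.0 501 Not Implemented\r\n\r\n")
            return
        name = parts[1].lstrip("/") or "index.htm"
        root = os.path.realpath(docroot)
        path = os.path.realpath(os.path.join(root, name))
        # Refuse names like "../secret" that escape the document root.
        if not path.startswith(root + os.sep) or not os.path.isfile(path):
            conn.sendall(b"HTTP/1.0 404 Not Found\r\n\r\n")
            return
        with open(path, "rb") as f:
            body = f.read()
        header = ("HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n"
                  f"Content-Length: {len(body)}\r\n\r\n")
        conn.sendall(header.encode("ascii") + body)

def serve_forever(docroot, port=8080):
    """Accept connections one at a time; stop with Ctrl+C."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        listener.bind(("127.0.0.1", port))
        listener.listen(5)
        while True:
            conn, _ = listener.accept()
            handle(conn, docroot)
```

To try it, call serve_forever on a directory containing an index.htm file and visit http://127.0.0.1:8080/ in a browser.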

Most servers add some level of security to the serving process. For example, if you have ever gone to a Web page and had the browser pop up a dialog box asking for your name and password, you have encountered a password-protected page. The server lets the owner of the page maintain a list of names and passwords for those people who are allowed to access the page; the server lets only those people who know the proper password see the page. More advanced servers add further security to allow an encrypted connection between server and browser, so that sensitive information like credit card numbers can be sent on the Internet.
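The article does not say which mechanism such servers use; one common choice is HTTP Basic authentication, in which the browser sends the name and password base64-encoded in an Authorization header. The sketch below checks such a header on the server side. ALLOWED and is_authorized are hypothetical, and the plain-text password table is only for brevity, since a real server would store password hashes. Basic authentication merely encodes the password rather than encrypting it, which is why it is usually paired with the encrypted connections mentioned above.

```python
import base64
import hmac

# Hypothetical list of names and passwords kept by the page's owner.
ALLOWED = {"alice": "open-sesame"}

def is_authorized(authorization_header):
    """Check an 'Authorization: Basic ...' header value against ALLOWED."""
    scheme, _, encoded = authorization_header.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
    except ValueError:  # malformed base64 or non-UTF-8 bytes
        return False
    name, sep, password = decoded.partition(":")
    expected = ALLOWED.get(name)
    if not sep or expected is None:
        return False
    # compare_digest avoids revealing how many leading characters matched.
    return hmac.compare_digest(password.encode(), expected.encode())

header = "Basic " + base64.b64encode(b"alice:open-sesame").decode("ascii")
print(is_authorized(header))  # True
```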

That's really all there is to a Web server that delivers standard, static pages. Static pages are those that do not change unless the creator edits the page.

Dynamic Pages

But what about the Web pages that are dynamic? For example:
Any guest book allows you to enter a message in an HTML form, and the next time the guest book is viewed, the page will contain the new entry.

The whois form at Network Solutions allows you to enter a domain name on a form, and the page returned is different depending on the domain name entered.

Any search engine lets you enter keywords on an HTML form, and then it dynamically creates a page based on the keywords you enter.
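What these examples share is that the server builds the page when the request arrives instead of reading a finished file. A minimal sketch for the search case follows, assuming a hypothetical /search path with the keywords in a q parameter; a real search engine would look the keywords up in an index rather than simply echoing them back.

```python
import html
from urllib.parse import parse_qs, urlsplit

def search_page(request_path):
    """Build a results page on the fly from the keywords in the query string."""
    query = parse_qs(urlsplit(request_path).query)  # "+" becomes a space
    keywords = query.get("q", [""])[0]
    safe = html.escape(keywords)  # never echo user input into HTML unescaped
    return f"<html><body><h1>Results for {safe}</h1></body></html>"

print(search_page("/search?q=web+servers"))
# <html><body><h1>Results for web servers</h1></body></html>
```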