2. c How web browsers find web servers

Web browsers and web servers talk to each other over the Internet. It is the client - the browser - which initiates the exchange, and it does so when the user enters a URL into the address bar, for example http://www.example.com.

The Internet is a network of millions of servers distributed all over the world. One of these millions of servers serves the www.example.com website. How does your browser connect to exactly this one server?

The first thing the browser needs to do in its quest to find the right server is to translate the domain name you gave it - www.example.com - into an actual Internet address, which is a number, not a name.

To do so, the browser utilizes an Internet component which is not a part of the World Wide Web itself, but plays a crucial role in making it more comfortable to use: the Domain Name System, or DNS.

You can think of the DNS as a huge telephone book. You know the name of a person, and you use it to look up their telephone number. The DNS offers the same service for browsers - the browser queries the DNS with a domain name like www.example.com, and the DNS tells the browser the Internet address.

You can also query the Domain Name System yourself, from the command line of your virtual Linux box.

To get to a command line, you need to start a Terminal application. To do so, simply click on the Ubuntu symbol in the upper left corner of your Linux desktop. In the search dialogue which opens, type terminal.

You will get a list of three or four applications as the search result. Simply click on the first result, named “Terminal”, the one with the black and white icon of a window with a >_ symbol in it.

A new application window opens which provides a text-based interface - a command line.

At the so-called prompt, which probably looks like this:

yourname@ubuntu:~$

you are able to type commands which the Linux system then executes.

One of these commands is called dig. We can use dig to ask the Domain Name System for the Internet address of www.example.com like so:

yourname@ubuntu:~$ dig www.example.com A

Type this command and press Enter. This starts the command line application dig with two parameters: the name for which to look up information in the DNS, and the type of information you want to look up. In this case, that is A, the address record, which holds the Internet Protocol version 4 address (IPv4 address) of the name. As version 4 is still the predominant version of the Internet Protocol, this is often simply called the IP address.

As a result, your terminal window will output a text block similar to this one:

; <<>> DiG 9.10.3-P4-Ubuntu <<>> www.example.com A
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 51563
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.example.com.       IN  A

;; ANSWER SECTION:
www.example.com.    7157    IN  A   93.184.216.34

;; Query time: 12 msec
;; SERVER: 127.0.1.1#53(127.0.1.1)
;; WHEN: Tue May 30 08:30:17 CEST 2017
;; MSG SIZE  rcvd: 60

While there is a lot of additional information here which we will ignore for now, we also get the information we asked for in the ANSWER SECTION, on line 13: the IPv4 address of www.example.com is 93.184.216.34 (at the time this output was recorded).

When you ask your browser to open www.example.com, it does the same - it tries to retrieve this numeric address via the DNS.

It doesn’t start the dig command in the background - that command is for human users. Instead, it issues the query directly, using its own program code. The details of this do not matter to us.
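If you are curious what "issuing the query directly" can look like, here is a minimal Python sketch that builds the question a program sends to a DNS server, using the message format defined in RFC 1035. It is an illustration, not how any particular browser works: the helper name build_dns_query is made up for this example and the query ID 0x1234 is arbitrary. Note that the message carries the same two pieces of information you passed to dig: the name, and the record type A (type number 1).

```python
import struct


def build_dns_query(name: str, record_type: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a DNS query message in the wire format described in RFC 1035.

    record_type 1 means A (IPv4 address); the query class 1 means IN (Internet).
    """
    # Header, six 16-bit fields: query ID, flags (0x0100 = "recursion desired",
    # i.e. please do the full lookup for me), then the number of questions,
    # answers, authority records and additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)

    # The name is sent as length-prefixed labels, ended by a zero byte:
    # www.example.com -> \x03www \x07example \x03com \x00
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"

    # The question: the encoded name, followed by the record type and the class.
    question = qname + struct.pack("!HH", record_type, 1)
    return header + question


query = build_dns_query("www.example.com", record_type=1)  # like: dig www.example.com A
print(len(query), query.hex())
```

Sending these bytes via UDP to port 53 of a DNS server, and decoding the reply, would complete the lookup. The sketch deliberately stops before that step, so it runs without network access.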

You can sometimes even see the browser performing this step: especially if your Internet connection is slow, you may briefly see a status message in the lower left corner of the browser window saying something like “Looking up www.example.com…”.

So, you have entered the address www.example.com into the address bar of your browser, you hit enter, and your browser uses the Domain Name System to look up the IP address of the server system which serves the www.example.com website.
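In practice, a program rarely builds DNS messages by hand; it usually asks the operating system to resolve the name for it. Here is a minimal Python sketch of this lookup step, using the standard library function socket.getaddrinfo (the helper name lookup_ipv4 is made up for this example):

```python
import socket


def lookup_ipv4(hostname: str) -> list:
    """Ask the operating system's resolver for the IPv4 addresses of hostname."""
    # AF_INET restricts the answer to IPv4 addresses, much like asking dig for
    # the A record; SOCK_STREAM avoids getting each address once per socket type.
    results = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    addresses = []
    # Each result is a (family, type, proto, canonname, sockaddr) tuple, and for
    # IPv4 the sockaddr is an (address, port) pair.
    for _family, _type, _proto, _canonname, sockaddr in results:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


print(lookup_ipv4("localhost"))  # usually resolved locally, no network needed
```

Calling lookup_ipv4("www.example.com") performs a real DNS lookup and needs a working Internet connection. Don't be surprised if the address you get differs from the 93.184.216.34 shown above: the addresses behind a domain name can change over time.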

How computers on the Internet establish a network connection

What happens next?

Using the IP address it looked up, the browser can now attempt to establish a network connection to the target server.

To do so, the browser uses the mechanisms of the Internet Protocol, whose job is to make network connections between computer systems possible. The most important of these mechanisms is called routing. Routing is the process of finding a path between a source node on the Internet (your computer) and a target node on the Internet (the computer system with IP address 93.184.216.34, in our example).

The beauty of this routing mechanism is that while your computer needs to know the exact target address it wants to talk to, it doesn’t need to know how to get its data to this target address.

Again, a metaphor from real life comes to mind.

Let’s assume you want to write a letter to, say, the European Organization for Nuclear Research in Switzerland, also known as CERN. In order to do so, you need a target postal address, and you need to know the next post station where you can mail the letter - and nothing more. The postal system takes care of the rest, and the exact route of the letter and the intermediary post stations involved is neither known to you nor important to you. You can rest assured that the system will route your letter correctly to its destination.

With Internet Protocol routing, it’s the same: your computer knows the target address, plus the address of its own “next” Internet “station”, which is called the default gateway. When you connect your computer to the Wi-Fi box in your home, the IP address of the Wi-Fi box becomes the default gateway of your computer, and a system operated by your Internet provider is in turn the default gateway of your Wi-Fi box.
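On your computer, this knowledge about the default gateway lives in a routing table. The following Python sketch models such a table with the standard ipaddress module. The network 192.168.1.0/24 and the gateway address 192.168.1.1 are hypothetical values typical for a home network, and next_hop is a made-up helper name. The lookup uses longest-prefix matching, the rule IP routing tables follow: the most specific matching entry wins, and the catch-all entry 0.0.0.0/0, the default route, points at the default gateway.

```python
import ipaddress

# A hypothetical routing table of a home computer in the network 192.168.1.0/24.
# The Wi-Fi box, i.e. the default gateway, is assumed to have address 192.168.1.1.
ROUTING_TABLE = {
    ipaddress.ip_network("192.168.1.0/24"): "deliver directly (local network)",
    ipaddress.ip_network("0.0.0.0/0"): "send to 192.168.1.1 (default gateway)",
}


def next_hop(destination: str) -> str:
    """Look up where to send a packet for destination (longest-prefix match)."""
    address = ipaddress.ip_address(destination)
    # Every table entry whose network contains the destination address ...
    matches = [network for network in ROUTING_TABLE if address in network]
    # ... and of those, the most specific one, i.e. the one with the longest
    # prefix. 0.0.0.0/0 contains every IPv4 address, so there is always a match.
    best = max(matches, key=lambda network: network.prefixlen)
    return ROUTING_TABLE[best]


print(next_hop("192.168.1.20"))   # another device in your home network
print(next_hop("93.184.216.34"))  # www.example.com, somewhere on the Internet
```

On your virtual Linux box, you can display the real routing table with the command ip route; the line starting with default names your default gateway.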

Once your computer delivers a data packet to this default gateway, the packet travels through the Internet towards its destination, passing through many different systems - or “nodes” - on the Internet, each of which brings it closer to its target.

Of course, it isn’t enough for each node to simply hand the data packet over to its one “next” node, because that would only work if all nodes, including your source and target node, were connected in a single chain. But the Internet is a network of many nodes interconnected with each other.

Thus, for routing to be useful, several nodes on the Internet have multiple “next nodes” configured, and depending on the IP address of the target system, will decide to route the data packet in one direction or the other:

                                                                   NodeJ  
                                                                    ^     
                                                                   /      
                                                                  /       
                             NodeC  -->  NodeG  -->  NodeI  -->  NodeK
                              ^            \                                 
                             /              \                                
                            /                \                               
                           /                  v                              
Source  -->  NodeA  -->  NodeB              NodeH                           
                          /\
                         /  \
                        /    \
                       v      \
                     NodeD     v
                             NodeE  -->  Target
                                         (93.184.216.34)

The above diagram is meant to be a very simplified illustration of the logical structure of a very tiny part of the Internet.

The Source node could be your computer, which has NodeA (probably your DSL router) as its default gateway. When your computer tries to reach the node with IP 93.184.216.34, it has no choice but to hand over data to the only node it knows, NodeA. NodeA also only has one default gateway (probably a system operated by your Internet provider), NodeB.

NodeB, however, has routes to multiple other nodes, NodeC, NodeD and NodeE, and it also knows which of these nodes is the best next hop for a data packet addressed to 93.184.216.34.

NodeE, then, has a direct route to the target node, and can deliver the data packet.

Thus, Source -> A -> B -> E -> Target is the route over which your computer and the target system can establish a connection.
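The important point is that no single node knows the whole route; each node only makes a local decision about the next hop. This Python sketch replays the path through the diagram above with hypothetical next-hop tables. The table contents, including NodeB's default entry pointing at NodeC, are made up to match the example, and trace_route is an illustrative helper name.

```python
# Hypothetical next-hop tables for the nodes of the diagram above. A node only
# knows where to send a packet next, never the complete route. The key
# "default" is used when a node has no more specific entry for a destination.
TARGET = "93.184.216.34"

NEXT_HOPS = {
    "Source": {"default": "NodeA"},                  # your computer
    "NodeA": {"default": "NodeB"},                   # e.g. your DSL router
    "NodeB": {TARGET: "NodeE", "default": "NodeC"},  # several routes to choose from
    "NodeE": {TARGET: TARGET},                       # directly connected to the target
}


def trace_route(source: str, destination: str, max_hops: int = 30) -> list:
    """Follow a packet hop by hop, recording every node it passes."""
    route = [source]
    node = source
    while node != destination:
        table = NEXT_HOPS.get(node, {})
        # Purely local decision: a specific entry for the destination if the
        # node has one, otherwise its default next hop (if any).
        next_node = table.get(destination, table.get("default"))
        if next_node is None or len(route) > max_hops:
            raise RuntimeError(f"no route from {node} to {destination}")
        route.append(next_node)
        node = next_node
    return route


print(" -> ".join(trace_route("Source", TARGET)))
# prints: Source -> NodeA -> NodeB -> NodeE -> 93.184.216.34
```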

This routing capability is the foundation of the Internet - it allows two computer systems to exchange data with each other.

How computer programs talk to each other over the Internet

By now, we have established a general understanding of how computers find other computers on the Internet, and how they can establish a network connection via IP addresses and data packet routing using these addresses.

We now need to zoom in even closer, and have a look at how exactly data is exchanged between a client and a server.

First of all, it’s important to note that it is not computers which exchange data through the Internet, it is applications which do.

Our computers are just the physical shell in which our applications live, providing the physical means like network cards and network cables (or radio signals) which enable remote applications to talk to each other.

In our case, the two applications talking to each other are the web browser application and the web server application.

Let’s update a diagram we have used before with more details:

Your computer system                      Web server system
┌───────────────────┐                     ┌───────────────────┐
│                   │                     │                   │
│  Web browser      │                     │  Web server       │
│  application      │                     │  application      │
│  ┌────────────┐   │ requests content    │  ┌────────────┐   │
│  │          --│---│---------------------│--│-->         │   │
│  │            │   │                     │  │            │   │
│  │            │   │                     │  │            │   │
│  │            │   │                     │  │            │   │
│  │            │   │                     │  │            │   │
│  │         <--│---│---------------------│--│--          │   │
│  └────────────┘   │       responds with │  └────────────┘   │
│                   │             content │                   │
│                   │                     │                   │
└───────────────────┘                     └───────────────────┘

As you can see, the word server is used ambiguously: it can mean the physical machine - the hardware - which is connected to a network like the Internet in order to serve data (e.g. a web server for web pages, a file server for files, a mail server for email), but it can also mean the application - the software - which does the serving of web pages, files, or email.

To distinguish between these two, I will be more precise in this text: I will talk about the server application when referring to the piece of software which serves web pages, and about the server system when referring to the computer which runs the server application.
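To make the difference between server application and server system concrete, here is a minimal Python sketch in which a server application and a client application exchange data over a network connection. To keep it self-contained, both run on the same computer, using the address 127.0.0.1, which always refers to the local system. The code needs one detail this text has not covered yet: a port number, which tells the operating system of the server system which application a connection is meant for. The request text /index.html and the response format are made up for illustration; this is not yet the language real browsers and web servers speak.

```python
import socket
import threading


def run_server_application(listener: socket.socket) -> None:
    """A tiny server application: accept one connection, answer one request."""
    conn, _address = listener.accept()
    with conn:
        # Read until the client signals that its request is complete.
        request = b""
        while chunk := conn.recv(1024):
            request += chunk
        conn.sendall(b"content for: " + request)


# The server application asks the operating system for a listening socket on
# 127.0.0.1 (this computer). Port 0 lets the operating system pick a free port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
host, port = listener.getsockname()

server = threading.Thread(target=run_server_application, args=(listener,))
server.start()

# The client application connects to the server system's IP address plus the
# port number that identifies the server application on that system.
with socket.create_connection((host, port)) as client:
    client.sendall(b"/index.html")
    client.shutdown(socket.SHUT_WR)  # tell the server the request is complete
    response = b""
    while chunk := client.recv(1024):  # read until the server closes
        response += chunk

server.join()
listener.close()
print(response.decode())  # prints: content for: /index.html
```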

The Internet mechanisms we have seen so far - DNS, IP addressing, and routing - are sufficient to establish a network connection between two computer systems, but not to make the applications on those systems talk to each other.

How web browsers and web servers talk to each other

Let’s zoom in even further. We have learned how your web browser is able to find the IP address of the web server it wants to contact using the Domain Name System, and we have learned how your web browser can establish a connection to the web server system available under this IP address using the routing mechanisms of the Internet.

Now that a connection is established, we will have a look at the actual data exchange.
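As a small preview, this is roughly what the first message from a browser to a web server can look like once the connection stands: an HTTP request. The Python sketch below builds a minimal HTTP/1.1 GET request; real browsers send many more header lines, and build_get_request is a made-up helper name.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request for the resource at path on host."""
    lines = [
        f"GET {path} HTTP/1.1",  # method, path of the resource, protocol version
        f"Host: {host}",         # which website the request is meant for
        "Connection: close",     # ask the server to close the connection afterwards
        "",                      # an empty line marks the end of the headers
        "",
    ]
    # HTTP uses CR LF ("\r\n") as its line ending.
    return "\r\n".join(lines).encode("ascii")


print(build_get_request("www.example.com").decode("ascii"))
```

Sent over an established connection to the web server system, this text asks the server application for the page at path / of www.example.com.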

The Web Development Beginner Tutorial