
How the Internet works

From the web address to your screen

Justin Masayda
Sep 10, 2021

So, you type in a website address and press Enter. Shortly after, a webpage appears on your screen. How exactly did that happen? In the moments after you pressed Enter, there’s actually a lot happening behind the scenes. Let’s take a look at what usually transpires when you visit a website.

Overview

A website is really just a bunch of files on another computer connected to the Internet. In order for your device to load a website, the computer with the website’s files needs to be prepared to send copies of those files to other computers upon request.

Programs called web servers (e.g., Apache, Nginx) exist for this purpose; they wait for requests, interpret what’s being asked for, and send a response. Some computers are specifically designed for or dedicated to serving websites, and in such cases the whole computer may be referred to as a web server. Generally, any software or computer that delivers some kind of content may be called a server.

Complementary to servers are clients. In computing, a client is any software or device which requests information from a server. Software which requests web content is called a web client. When browsing the Internet, our web browser (e.g., Chrome, Safari, Firefox) is a web client.

When you visit a website, your browser sends messages to other computers (servers), waits for them to send a response, and displays the result on your screen. Often, the web servers will need to talk to other servers in order to get relevant information and dynamically create the webpage you want to see before sending it back to your browser.
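To make the request/response cycle concrete, here’s a minimal sketch in Python that plays both roles on one machine. It is only an illustration: HelloHandler and fetch_from_local_server are names I made up, the server listens on a throwaway local port, and real web servers like Apache or Nginx do far more than this.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET request with a tiny HTML page."""

    def do_GET(self):
        body = b"<h1>Hello from the server</h1>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet instead of logging each request


def fetch_from_local_server(path="/"):
    """Start a server on a free local port, request `path`, return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0 = any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        # The "client" half: connect, send a request, wait for the response.
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", path)
        response = conn.getresponse()
        status, body = response.status, response.read()
        conn.close()
        return status, body
    finally:
        server.shutdown()
        server.server_close()
```

Calling fetch_from_local_server() returns the status code and the page bytes, which is essentially what a browser receives before it starts drawing anything.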

Let’s take a closer look at what happens on the client and server machines when you visit a website.

Client side

URL

It starts with the Uniform Resource Locator, or URL. Suppose I want to go to the Holberton School website, which has the URL https://www.holbertonschool.com. Nobody really types in https://, though, do they? We might skip the www, too. If I want to go to Holberton School’s website, I’d just type holbertonschool.com and press Enter.

The web browser makes a few assumptions about what I typed. Modern browsers have a two-in-one search and address bar, so the browser first determines whether I am trying to navigate to a website or perform a search. If it (correctly) determines that I entered a web address, it will use the Hypertext Transfer Protocol (HTTP) and prefix the URL with http:// or the encrypted version, https://, which a modern browser will try first. More on HTTP shortly. The browser will also append a / at the end of the URL, which basically tells the server to send its default page. Technically, / means “root directory,” but the web server will send whatever it has been configured to send, root directory or not.
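Here’s a rough sketch of that completion step in Python. The function name normalize_address and the choice to default to https are my own assumptions for illustration; a real browser’s logic (including the search-or-address decision, which I skip here) is considerably more involved.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_address(typed, default_scheme="https"):
    """Turn address-bar input like 'holbertonschool.com' into a full URL."""
    typed = typed.strip()
    if "://" not in typed:
        # No scheme was typed, so assume the encrypted one first.
        typed = f"{default_scheme}://{typed}"
    parts = urlsplit(typed)
    path = parts.path or "/"  # an empty path becomes the root path
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

With this sketch, normalize_address("holbertonschool.com") gives "https://holbertonschool.com/", while a URL that already has a scheme and a path is returned unchanged.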

Finally, the browser may take one more step for my convenience. Some websites, including holbertonschool.com, will send an HTTP redirect message, which instructs the browser, “I want you to talk to me at this URL instead.” A “smart” browser will remember that and autofill that URL the next time I try to visit that website, which slightly speeds up the loading time. In my example, https://holbertonschool.com/ redirects to https://www.holbertonschool.com/ (adding the www subdomain), so my browser will go directly there.
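The remembering part can be sketched as a small lookup table. Everything here is hypothetical: RedirectMemory is a name I invented, and I’m assuming the site answers with a permanent redirect (status 301 or 308), which I haven’t verified. Temporary redirects are deliberately not remembered.

```python
class RedirectMemory:
    """Hypothetical sketch of a browser remembering permanent redirects."""

    PERMANENT = (301, 308)  # status codes that mean "moved for good"

    def __init__(self):
        self._known = {}

    def remember(self, requested_url, status, location):
        """Record where `requested_url` pointed, if the redirect is permanent."""
        if status in self.PERMANENT:
            self._known[requested_url] = location

    def resolve(self, url, max_hops=10):
        """Follow remembered redirects, giving up after `max_hops` steps."""
        for _ in range(max_hops):
            if url not in self._known:
                return url
            url = self._known[url]
        return url
```

After remembering the redirect once, resolve("https://holbertonschool.com/") would hand back the www address right away, saving one round trip to the server.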

HTTP

HTTP messages are written in human-readable text (at least in HTTP/1.1, the version shown here). They contain webpage content, metadata, form submissions, etc. Here’s part of an HTTP response I got from http://www.holbertonschool.com/:

HTTP/1.1 200 OK
Server: nginx
Date: Fri, 10 Sep 2021 07:20:52 GMT
Content-Length: 153
Connection: keep-alive
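To see how readable these messages are, here’s a minimal Python sketch that pulls the response above apart with nothing but string handling. The function name parse_response_head is my own, and the sketch ignores details a real HTTP parser must handle (repeated headers, for example).

```python
def parse_response_head(raw):
    """Split the head of an HTTP/1.1 response into its status parts and headers."""
    lines = raw.strip().splitlines()
    # The first line is the status line: version, status code, reason phrase.
    version, status_code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line:
            break  # a blank line marks the end of the headers
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(status_code), reason, headers


# The response shown above, copied as-is.
SAMPLE = """\
HTTP/1.1 200 OK
Server: nginx
Date: Fri, 10 Sep 2021 07:20:52 GMT
Content-Length: 153
Connection: keep-alive
"""
```

Running parse_response_head(SAMPLE) yields the version, the numeric status code 200, the reason phrase, and a dictionary of the four headers.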

Because they’re plain text, it’s possible for someone “in between” a client and a server to read what they’re sending each other. That’s why sensitive information like passwords and credit card info should never be sent over an HTTP connection. This is where HTTPS comes in.

HTTPS & SSL/TLS

HTTPS ensures that:

  • the server you’re talking to is who it claims to be,
  • nobody except that server can read the messages you’re sending each other,
  • nobody has altered any messages after they have been sent.

It does this through the SSL/TLS protocol, which uses public-key cryptography to authenticate the server and set up encryption. While the whole process is rather involved, the basic idea is that the client will ask the server, “are you really www.holbertonschool.com?” The server will confirm its identity by sending a certificate signed by a certificate authority (CA). The client can then check that signature against the list of CAs it already trusts to confirm that the certificate is legit.

Once the client confirms the server’s identity, the client sends a message that only the real server would know how to read. That message contains a secret that the two sides use to encrypt everything they send each other from then on. Now, only the client and server will understand what they’re communicating, even if someone else is listening to their conversation.
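In Python, the standard ssl module handles these checks. The sketch below is only an illustration; the two function names are mine. It builds a context that verifies the certificate chain against the CAs the system trusts and checks that the certificate matches the host name, then opens a connection.

```python
import socket
import ssl


def make_verifying_context():
    """Build a TLS context that checks the certificate chain and the host name."""
    return ssl.create_default_context()


def fetch_certificate_subject(hostname, port=443, timeout=5):
    """Open a verified TLS connection and return the server certificate's subject.

    Raises ssl.SSLCertVerificationError if the certificate cannot be trusted
    or does not match `hostname`.
    """
    context = make_verifying_context()
    with socket.create_connection((hostname, port), timeout=timeout) as tcp:
        # The TLS handshake happens here, before any HTTP is sent.
        with context.wrap_socket(tcp, server_hostname=hostname) as tls:
            return tls.getpeercert()["subject"]
```

Calling fetch_certificate_subject("www.holbertonschool.com") needs a network connection, so I’m not showing its output here. The point is that verification is the default, not something you have to opt into.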

TCP/IP & DNS

Now, before the browser actually sends anything, it needs to know which of the millions of computers on the Internet should receive the message. It will use TCP/IP, another protocol, to send the request. In order to establish a TCP connection, the browser needs the IP address of the computer it wants to talk to. An IP address is a number which uniquely identifies a computer within a network.

Finding the IP address of a website is ultimately done by a service called the Domain Name System, or DNS. DNS is a complex hierarchy of servers distributed around the world. When a website domain name such as holbertonschool.com is purchased, the owner must create a DNS resource record to map that domain name to the IP address of their web server(s).* That DNS record will then be stored on an authoritative name server. When a client wants to visit a website, it checks a series of locations (typically local caches first, then other DNS servers), terminating with the authoritative name server, which is guaranteed to possess the record.

*A domain is often linked to more than one IP address.

Using the Linux utility host, we can view the IP addresses for www.holbertonschool.com:

$ host www.holbertonschool.com
www.holbertonschool.com is an alias for hbtnweb-prod.us-east-1.elasticbeanstalk.com.
hbtnweb-prod.us-east-1.elasticbeanstalk.com has address 75.101.239.242
hbtnweb-prod.us-east-1.elasticbeanstalk.com has address 18.204.177.203
hbtnweb-prod.us-east-1.elasticbeanstalk.com has address 54.174.134.143
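A program can ask the operating system for the same information. Here’s a small Python sketch using the standard socket module; resolve_ipv4 is a name I chose, and it only looks at IPv4 addresses to keep things simple.

```python
import socket


def resolve_ipv4(hostname):
    """Return the sorted, de-duplicated IPv4 addresses for `hostname`."""
    results = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each result is (family, type, proto, canonname, sockaddr),
    # and for IPv4 the sockaddr is an (address, port) pair.
    return sorted({sockaddr[0] for *_, sockaddr in results})
```

Calling resolve_ipv4("www.holbertonschool.com") should return a list of addresses, though they may differ from the ones above since DNS records change over time.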

Typing either hbtnweb-prod.us-east-1.elasticbeanstalk.com or any of the IP addresses into your browser’s address bar will take you to the same website as www.holbertonschool.com. However, if you do that, your browser will warn you that it cannot establish a secure connection that way. That’s because certificates are issued for specific domain names. Your browser will compare the domain name on the certificate with the domain name it’s accessing, notice that they don’t match (www.holbertonschool.com ≠ 75.101.239.242), and fail to establish a TLS connection.
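The name comparison can be sketched like this. It’s a deliberately simplified version of what browsers do, hostname_matches is a made-up name, and I’m assuming for illustration that the certificate lists only www.holbertonschool.com (I haven’t inspected the real certificate).

```python
def hostname_matches(cert_names, requested_host):
    """Simplified check: does any name on the certificate cover the requested host?"""
    requested = requested_host.lower().rstrip(".")
    for name in cert_names:
        name = name.lower().rstrip(".")
        if name == requested:
            return True
        # "*.example.com" covers exactly one extra label, e.g. "www.example.com".
        if name.startswith("*."):
            first_label, _, rest = requested.partition(".")
            if first_label and rest == name[2:]:
                return True
    return False


# Assumed certificate contents, for illustration only.
ASSUMED_CERT_NAMES = ["www.holbertonschool.com"]
```

Under that assumption, the check passes for www.holbertonschool.com but fails for both 75.101.239.242 and the elasticbeanstalk.com alias, which is why the browser refuses the secure connection.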

With the URL complete, and the domain name resolved to an IP address, the browser can finally communicate with the server.

Server side

The server-side details depend on the application service/server architecture. We will make some assumptions about the architecture based on common designs.

  1. An HTTPS request comes into the server’s load balancer. The load balancer will be listening on certain ports, and all others should be blocked by a firewall. Websites are typically hosted on multiple web servers to handle more traffic, and as a backup in case of failure. The load balancer routes incoming requests to an available web server based on some load balancing algorithm.
  2. If the web page has dynamic content, the selected web server will ask an application server to generate a web page. Otherwise, the web server can respond immediately with a static web page.
  3. The application server will pull information from a database (e.g., Oracle, MySQL, MongoDB) and generate a web page. Databases store large amounts of information in an organized fashion, much like a bunch of spreadsheets.
  4. The application server returns the page to the web server, which returns the page to the load balancer, which delivers the page to the client as an HTTPS response.
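The four steps above can be sketched as a toy simulation in Python. All of the names are hypothetical, a plain dictionary stands in for the database, and I picked round-robin as the load balancing algorithm purely as an example; real deployments may use something else.

```python
from itertools import cycle


def make_app_server(database):
    """Return a function that builds a page from a dict standing in for a database."""
    def app_server(path):
        record = database.get(path)  # step 3: pull information from the database
        if record is None:
            return "404 Not Found"
        return f"<h1>{record}</h1>"
    return app_server


class WebServer:
    """Hypothetical web server: serves static pages or defers to an app server."""

    def __init__(self, name, static_pages, app_server):
        self.name = name
        self.static_pages = static_pages
        self.app_server = app_server

    def handle(self, path):
        if path in self.static_pages:   # step 2: static page, respond immediately
            return self.static_pages[path]
        return self.app_server(path)    # step 2: dynamic page, ask the app server


class RoundRobinLoadBalancer:
    """Step 1: hand each incoming request to the next web server in turn."""

    def __init__(self, servers):
        self._servers = cycle(servers)

    def route(self, path):
        server = next(self._servers)
        # Step 4: the page travels back out through the load balancer.
        return server.name, server.handle(path)
```

For example, with two web servers behind the balancer, the first request goes to the first server, the second request to the second, and the third wraps back around to the first.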

Rendering

Once the web client gets a response, it can then render a webpage. It will parse the webpage’s HTML and might need to ask the server for more resources (images, scripts, etc.) to complete the page.
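Here’s a minimal sketch of the “find what else to request” step, using Python’s built-in HTML parser. ResourceFinder and find_resources are names I made up, and a real browser looks at many more tags and attributes than the three handled here.

```python
from html.parser import HTMLParser


class ResourceFinder(HTMLParser):
    """Collect URLs of images, scripts, and stylesheets referenced by a page."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("img", "script") and attributes.get("src"):
            self.resources.append(attributes["src"])
        elif tag == "link" and attributes.get("rel") == "stylesheet" and attributes.get("href"):
            self.resources.append(attributes["href"])


def find_resources(html):
    """Return the extra resources a page refers to, in document order."""
    finder = ResourceFinder()
    finder.feed(html)
    return finder.resources
```

Each URL it returns would trigger another request like the ones described above, which is why a single page visit usually involves many round trips.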

