How does the web stack work on top of the internet?

What happens when you type https://www.holbertonschool.com in your browser and press Enter?

Santiago Peña Mosquera
7 min read · Aug 28, 2020

If you are reading this post, you have surely used the internet and wondered how it works: what kind of magic happens every time we type the name of a page in the browser and it shows us all that information, available with a single click. This post explains what the internet is and how it works, in a simple way that anyone can follow without a degree in a technology-related field.

Internet

The Internet is a worldwide network of computers that exchange information with one another. These computers communicate over telephone lines, cable, radio waves and other technologies, and they understand each other because they share a common language, or protocol, that establishes how data has to travel through the network.

TCP/IP

To connect to the Internet, the first thing we have to do is hire the services of an ISP (Internet Service Provider), such as Claro or Orange, which provides us with an internet connection. The provider assigns our connection a unique number within the network so that it can be identified when it connects. This number is called an IP (Internet Protocol) address and is very similar to our home address or zip code. No two devices can have the same IP address within the same network, which allows devices to communicate without confusion or errors. An IP address looks like this:

34.234.197.104 -> IP address for www.holbertonschool.com

IP addresses like this one (IPv4 addresses) are made up of four numbers separated by dots, each of which can take a value between 0 and 255. The first numbers identify the network and the last ones identify the device. Where that split falls traditionally depended on the class of the address (A, B or C), but we will not delve into that here; if you want to know more, you can follow this link: IP address.
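To make the "four numbers between 0 and 255" idea concrete, here is a minimal sketch using Python's standard ipaddress module. The helper name octets is made up for this illustration.

```python
import ipaddress


def octets(address: str) -> list[int]:
    """Split a dotted IPv4 address into its four numbers."""
    # IPv4Address validates the text: exactly four parts, each 0-255.
    ip = ipaddress.IPv4Address(address)
    # .packed is the 4-byte form of the address; each byte is one number.
    return list(ip.packed)


print(octets("34.234.197.104"))  # [34, 234, 197, 104]
```

An address with a part outside the 0-255 range, such as 256.1.1.1, is rejected with a ValueError.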

Another of the Internet's standard protocols is TCP (Transmission Control Protocol), which establishes how information is sent. The information is divided into small packets, which have to be put back together at the final destination to recover the original information. How these packets are divided, sent and joined is what TCP establishes.

TCP’s mission is to divide data into packets. During this process, it gives each packet a header that contains, among other things, a sequence number indicating the order in which the packets must be joined later and a checksum of the packet's contents. The address the packet should be sent to travels in the IP header that wraps each packet. When a packet arrives at its destination, a new checksum is computed and compared with the original one. If they do not match, the data was corrupted on the way, and if a packet never arrives at all, it was lost; in both cases the origin sends the packet again. Finally, when all the packets have been verified, TCP joins them together in order, rebuilding the initial message.
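The split, verify and rejoin cycle described above can be sketched in a few lines of Python. This is a simplified illustration, not real TCP: the function names are invented for the example, CRC32 stands in for TCP's own 16-bit checksum, and packets are numbered 0, 1, 2... rather than by byte offset.

```python
import random
import zlib


def split_into_packets(message: bytes, size: int) -> list[dict]:
    """Cut the message into numbered chunks, each with its own checksum."""
    packets = []
    for seq, start in enumerate(range(0, len(message), size)):
        chunk = message[start:start + size]
        packets.append({"seq": seq, "checksum": zlib.crc32(chunk), "data": chunk})
    return packets


def reassemble(packets: list[dict]) -> bytes:
    """Verify every chunk, then join the chunks in sequence order."""
    for packet in packets:
        if zlib.crc32(packet["data"]) != packet["checksum"]:
            # In real TCP the sender would transmit this packet again.
            raise ValueError(f"packet {packet['seq']} is corrupted")
    ordered = sorted(packets, key=lambda packet: packet["seq"])
    return b"".join(packet["data"] for packet in ordered)


message = b"Hello from www.holbertonschool.com"
packets = split_into_packets(message, size=8)
random.shuffle(packets)  # packets may arrive out of order
print(reassemble(packets) == message)  # True
```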

DNS request

As mentioned previously, every device connected to the internet must have an IP address, but when we use our browser we write names such as www.google.com or www.facebook.com. This is because names are much easier for humans to remember than the sequences of numbers in IP addresses, and it is what gave rise to DNS (Domain Name System).

Every time we enter the name of a page in the browser, for example www.holbertonschool.com, the browser first looks in its cache for the address corresponding to this domain. If it does not find it, it asks the operating system, which sends the request to a DNS server. DNS is a distributed, hierarchical database that stores information associated with domain names on networks such as the Internet.

Finally, the DNS server returns the IP address corresponding to the website (34.234.197.104), which the browser can then use to send a request to the website's servers.
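The "check the cache first, then ask further up" behaviour can be sketched as follows. The CachingResolver class and the fake_dns function are hypothetical names for this example; fake_dns replaces a real DNS server so the snippet runs without a network connection. The default system_lookup uses socket.gethostbyname, which asks the operating system's resolver.

```python
import socket
from typing import Callable


def system_lookup(hostname: str) -> str:
    """Ask the operating system's resolver, which queries DNS if needed."""
    return socket.gethostbyname(hostname)


class CachingResolver:
    def __init__(self, lookup: Callable[[str], str] = system_lookup):
        self.lookup = lookup
        self.cache: dict[str, str] = {}

    def resolve(self, hostname: str) -> str:
        if hostname not in self.cache:  # cache miss: ask further up
            self.cache[hostname] = self.lookup(hostname)
        return self.cache[hostname]  # cache hit: no lookup needed


# Stand-in for a real DNS server so the example runs offline.
calls = []

def fake_dns(hostname: str) -> str:
    calls.append(hostname)
    return {"www.holbertonschool.com": "34.234.197.104"}[hostname]


resolver = CachingResolver(lookup=fake_dns)
print(resolver.resolve("www.holbertonschool.com"))  # 34.234.197.104
print(resolver.resolve("www.holbertonschool.com"))  # 34.234.197.104 (from cache)
print(len(calls))  # 1: the second answer never reached the "DNS server"
```

Creating the resolver as CachingResolver() with no arguments would use the real system resolver instead; the address it returns today may differ from the one quoted in this post.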

HTTPS/SSL

Every time our browser makes a request to a website, it does so through HTTP (Hypertext Transfer Protocol). This protocol mediates between the client’s requests and the server’s responses, so that both sides communicate fluently in the same “language”. It establishes the guidelines to follow and the request methods, called verbs (such as GET and POST), and it is flexible enough to incorporate new methods and functionality.

When a request is sent to the server through this protocol, the server replies with a structured response equipped with a series of metadata, such as a status code and headers, which establishes the guidelines for the start, development and closure of the transmission of information.

Because information sent over plain HTTP travels as is, it can easily be intercepted and read by third parties. That is why HTTPS was created. The “S” at the end stands for secure, which means that the information travels encrypted using the SSL (Secure Sockets Layer) protocol, known in its current versions as TLS (Transport Layer Security), so that anyone who intercepts it cannot, in practice, decrypt it.
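To show what a request and its structured response look like as text, here is a small sketch. The functions build_request and parse_response are written for this example only, and the sample response is invented rather than captured from a real server.

```python
def build_request(host: str, path: str = "/") -> str:
    """Build the text of a minimal HTTP/1.1 GET request."""
    return (
        f"GET {path} HTTP/1.1\r\n"  # verb, resource and protocol version
        f"Host: {host}\r\n"         # which site on the server we want
        "Connection: close\r\n"
        "\r\n"                      # a blank line ends the headers
    )


def parse_response(raw: str) -> tuple[int, dict[str, str], str]:
    """Split a raw response into status code, headers and body."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    status_code = int(status_line.split(" ")[1])  # "HTTP/1.1 200 OK" -> 200
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


sample = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<h1>Hello</h1>"
print(parse_response(sample))
# (200, {'content-type': 'text/html'}, '<h1>Hello</h1>')
```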

https://www.instantssl.com/http-vs-https
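As a rough illustration of what the "secure" part involves on the client side, Python's standard ssl module can build the settings a client would use for an encrypted connection. The helper open_secure_connection is a hypothetical name; it is defined but not called here because it needs network access.

```python
import socket
import ssl

# The default context uses the system's trusted certificate authorities.
context = ssl.create_default_context()

# The server must present a valid certificate...
print(context.verify_mode == ssl.CERT_REQUIRED)  # True
# ...and that certificate must match the hostname we asked for.
print(context.check_hostname)  # True


def open_secure_connection(host: str, port: int = 443) -> ssl.SSLSocket:
    """Open a TCP connection and wrap it in an encrypted channel."""
    raw = socket.create_connection((host, port))
    return context.wrap_socket(raw, server_hostname=host)
```

Everything written to the socket returned by open_secure_connection is encrypted before it leaves the machine, which is what keeps an eavesdropper from reading it.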

Firewall

Besides protecting the information that is sent through HTTPS, it is also necessary to protect both client and server computers from the information they receive. This is the job of a firewall, a system that protects a computer or a computer network from intrusions coming from another network by filtering the data packets that pass through it. This matters especially for machines that are permanently connected to the Internet.

That is why, when a request arrives over HTTPS, the first thing that happens on the server side is that the firewall checks that the data packets do not violate any security rule.
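Packet filtering can be pictured as a function that looks at each packet and answers yes or no. The sketch below assumes a made-up policy (only ports 80 and 443 are open, and one example network is blocked); real firewalls have far richer rules, and the Packet class is a simplification invented for this example.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Packet:
    source_ip: str
    dest_port: int


# Hypothetical policy: only web traffic is allowed, one network is blocked.
ALLOWED_PORTS = {80, 443}
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def is_allowed(packet: Packet) -> bool:
    """Return True if the packet may pass through the firewall."""
    source = ipaddress.ip_address(packet.source_ip)
    if any(source in network for network in BLOCKED_NETWORKS):
        return False
    return packet.dest_port in ALLOWED_PORTS


print(is_allowed(Packet("198.51.100.7", 443)))  # True
print(is_allowed(Packet("198.51.100.7", 22)))   # False
print(is_allowed(Packet("203.0.113.9", 443)))   # False
```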

Load-balancer

Although I said earlier that the browser communicates with the server that hosts the website, the truth is that it does not, or at least not directly: it does so through a load balancer. This is a hardware device or piece of software placed in front of the set of servers that serve an application, and it is responsible for distributing the requests that come from clients among those servers using some algorithm. Its goals also include minimizing response times, improving service performance and avoiding saturation.

Because the load balancer is the one the client communicates with directly, the IP address of the website we want to access is actually the load balancer's. So when we send a request to www.holbertonschool.com, we are sending it to the load balancer, which forwards it to one of the servers that host the web page.

https://www.cybercureme.com/load-balancer-how-does-it-work-with-reconnaissance-phase-during-penetration-testing/
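One of the simplest balancing algorithms is round robin, where requests are handed to each server in turn. The sketch below is only an illustration of that idea; the class name and the server names web-01 to web-03 are invented, and nothing here says which algorithm a particular website actually uses.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out servers one after another, starting over at the end."""

    def __init__(self, servers: list[str]):
        self._servers = cycle(servers)

    def pick(self) -> str:
        return next(self._servers)


balancer = RoundRobinBalancer(["web-01", "web-02", "web-03"])
print([balancer.pick() for _ in range(5)])
# ['web-01', 'web-02', 'web-03', 'web-01', 'web-02']
```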

Host Server

Many websites run on a single-server infrastructure with a LAMP stack (Linux, Apache, MySQL, PHP): operating system, web server, database and application server, respectively. This means that all the services that make up a web page are hosted within a single server. We will assume this is the case for www.holbertonschool.com.

  • Web server: Software, together with the hardware it runs on, that uses HTTP (Hypertext Transfer Protocol) and other protocols to respond to client requests. Its main function is to deliver website content by storing, processing and serving web pages.
  • Application server: It is in charge of providing application services to clients. An application server typically handles the data access and business logic of an application.
  • Database: A program capable of storing a large amount of related, structured data that can be queried quickly according to the desired criteria.

When the request reaches the host server, it is first filtered by the firewall, as mentioned above, and then reaches the web server, which is responsible for returning the static content of the page. For dynamic content, the web server communicates with the application server, which in turn communicates with the database and returns the dynamic content.

Finally, the server returns all the content of the page www.holbertonschool.com along with an HTTP status code of 200 OK, indicating that the request was successful. The response travels over TCP/IP, which, as mentioned above, divides all this information into packets that are joined again when they reach our computer, so that we can see the web page with all its content in our browser.
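The division of work between web server, application server and database can be sketched in miniature. Everything here is a stand-in chosen for the example: an in-memory SQLite table plays the role of MySQL, two plain Python functions play the roles of Apache and PHP, and the paths and course names are invented.

```python
import sqlite3

# Database: an in-memory SQLite table stands in for MySQL.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE courses (name TEXT)")
db.executemany("INSERT INTO courses VALUES (?)", [("Linux",), ("Networking",)])

# Static content the web server can return on its own.
STATIC_FILES = {"/style.css": "body { font-family: sans-serif; }"}


def application() -> str:
    """Application server: builds dynamic content from database rows."""
    rows = db.execute("SELECT name FROM courses ORDER BY name").fetchall()
    items = "".join(f"<li>{name}</li>" for (name,) in rows)
    return f"<ul>{items}</ul>"


def web_server(path: str) -> tuple[int, str]:
    """Web server: serves static files itself, forwards the rest."""
    if path in STATIC_FILES:
        return 200, STATIC_FILES[path]
    if path == "/courses":
        return 200, application()
    return 404, "Not Found"


print(web_server("/courses"))
# (200, '<ul><li>Linux</li><li>Networking</li></ul>')
```

A request for /style.css never touches the database, while a request for an unknown path gets a 404 status instead of 200.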

Summary

When we type www.holbertonschool.com and hit Enter, the browser looks in its cache for the IP address corresponding to this domain. If it does not find it, it requests it from the operating system, which requests it from the DNS server, which, after making a series of queries to other servers, returns the IP address to the browser. Once it has the IP address, the browser sends a request to the corresponding server using HTTP or HTTPS (which uses the SSL protocol) over TCP. The request is received by a load balancer, which is responsible for forwarding it to the servers that contain the website, but not before the request has been analyzed by a firewall. When the request reaches the servers that host the website, it is processed by the web server, which, together with the application server and the database, returns the website and all its information. Finally, the browser renders this information, and this is how we can see the page on our screens.
