After the articles dedicated to the HTTP protocol and the HTTP request/response cycle, it's time to dedicate a post to the main protagonist of the interaction between the client/browser and the World Wide Web: the web server (or HTTP server), that is, the software application responsible for receiving and handling HTTP requests.
In this article we use the term web server as Wikipedia defines it: a service or software application running on a physical or virtual server machine. The term is also often used for the machine on which this application is installed. The two meanings are largely interchangeable, since both the application and the machine are required to perform the same main functions.
As mentioned above, the web server is a service installed on a physical or virtual machine (typically running Windows, Linux, or another operating system) that takes care of "listening" on one or more TCP ports (usually 80 for HTTP and/or 443 for HTTPS) and managing the requests addressed to them by clients/browsers. Managing these requests involves negotiating the connection (and the encrypted channel, in the case of HTTPS), "unpacking" the URL of the request, and using its individual parts to carry out the following activities:
- Verify that the required protocol is among those managed/accepted.
- Verify that the required hostname is among those configured as valid.
- Verify that the TCP port used is among those configured as valid.
- Verify that the path points to a physical file with a valid extension or is managed by a handler, a CGI script, or another server-side component properly configured to handle that type of request.
- Handle the request and transmit the response to the client.
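The first three checks in the list above can be sketched in a few lines of Python. This is only an illustration, not how any particular web server is implemented: the allowed schemes, hostnames, and ports are hypothetical configuration values, and validate_url is a name invented for this example.

```python
from typing import Tuple
from urllib.parse import urlsplit

# Hypothetical configuration values, chosen only for illustration.
ALLOWED_SCHEMES = {"http", "https"}
ALLOWED_HOSTS = {"www.example.com", "example.com"}
ALLOWED_PORTS = {80, 443}
DEFAULT_PORTS = {"http": 80, "https": 443}


def validate_url(url: str) -> Tuple[bool, str]:
    """Return (True, path) if the URL passes every check, else (False, reason)."""
    parts = urlsplit(url)  # "unpack" the URL into its individual parts
    if parts.scheme not in ALLOWED_SCHEMES:
        return False, "protocol not accepted"
    if parts.hostname not in ALLOWED_HOSTS:
        return False, "hostname not configured"
    # When the URL carries no explicit port, fall back to the scheme's default.
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    if port not in ALLOWED_PORTS:
        return False, "port not configured"
    return True, parts.path or "/"
```

With these assumed settings, validate_url("https://www.example.com/index.html") is accepted and yields the path, while a request to port 8080 is refused before the path is even looked at.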
IMPORTANT: the web server is typically configured to serve only certain extensions (usually defined through a whitelist): requests for any other extension are rejected with Status Code 404 - Not Found, even if the corresponding file is present on the file system.
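As a rough sketch of this behavior, the function below checks the extension before it checks whether the file exists. The whitelist contents and the name status_for_path are assumptions made for the example, and the set of existing files stands in for the real file system.

```python
import posixpath

# Hypothetical whitelist of extensions the server agrees to handle.
ALLOWED_EXTENSIONS = {".html", ".css", ".js", ".gif", ".php"}


def status_for_path(path: str, existing_files: set) -> int:
    """Return the HTTP status code a whitelist-based server would answer with."""
    extension = posixpath.splitext(path)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        # Rejected even when the file is physically present.
        return 404
    return 200 if path in existing_files else 404
```

Here a request for "/site.bak" returns 404 even when "/site.bak" is in existing_files, because ".bak" is not on the assumed whitelist.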
Static vs Dynamic Pages
- Static pages are those that the web server retrieves and transmits from the file system: in other words, they are immutable pages that are identical every time a user views them and for all users.
- Dynamic pages are those generated on each request depending on a series of changing conditions (GET parameters, POST parameters, date and time, etc.), and are therefore potentially different for each call and for each user.
To send a static page to the client, the web server does the following:
- Retrieves the page from the filesystem (as a "content" file). Some examples: hello.html, image1.gif, etc.
- Determines the content type from the file extension
- Reads and transmits the content of the file (and its content type) to the browser
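The three static-page steps can be sketched with Python's standard library. The function name serve_static and the document_root parameter are invented for this example, and the check that keeps requests inside the document root is an extra precaution not described in the steps themselves.

```python
import mimetypes
from pathlib import Path
from typing import Tuple


def serve_static(document_root: Path, url_path: str) -> Tuple[int, str, bytes]:
    """Return (status code, content type, body) for a static file request."""
    root = document_root.resolve()
    file_path = (root / url_path.lstrip("/")).resolve()
    # Refuse paths that escape the document root (e.g. "/../secret.txt").
    if root not in file_path.parents or not file_path.is_file():
        return 404, "text/plain", b"Not Found"
    # Step 2: determine the content type from the file extension.
    content_type, _encoding = mimetypes.guess_type(file_path.name)
    # Steps 1 and 3: retrieve the file and hand back its content and type.
    body = file_path.read_bytes()
    return 200, content_type or "application/octet-stream", body
```

For a file such as hello.html the guessed content type is text/html; unknown extensions fall back to application/octet-stream in this sketch.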
To send a dynamic page to the client, the web server does the following:
- Retrieves the page from the filesystem (in the form of an "executable" file). Some examples: index.php, default.asp, etc.
- Executes the file using its server-side framework (PHP, ASP, ASP.NET, Python, etc.), retrieving the execution result (including the content type)
- Transmits the result of the execution (and related content type) to the browser
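To make the contrast with static pages concrete, here is a minimal sketch of the kind of code a server-side framework might execute. The function render_greeting and the "name" GET parameter are hypothetical; the point is only that the output depends on the request and on the current time, so it is generated rather than read from disk.

```python
from datetime import datetime
from html import escape
from typing import Tuple
from urllib.parse import parse_qs


def render_greeting(query_string: str, now: datetime) -> Tuple[str, bytes]:
    """Build the response body from GET parameters and the current time."""
    params = parse_qs(query_string)
    name = params.get("name", ["world"])[0]
    # Escape user input so it cannot inject markup into the page.
    body = f"<p>Hello, {escape(name)}! It is {now:%H:%M}.</p>"
    # The execution result includes the content type as well as the body.
    return "text/html; charset=utf-8", body.encode("utf-8")
```

Two requests with different query strings, or made at different times, receive different bodies from the same file, which is exactly what distinguishes a dynamic page from a static one.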
How does the web server "know" which server-side framework to use? Answer: again, through the extension. The extension of the "dynamic" file determines which framework to use, or more precisely which application (handler, parser, pre-processor, etc.) receives the HTTP Request and "generates" the HTTP Response to be transmitted to the client.
Obviously, the framework must be installed and appropriately configured on the web server: for example, in order to "run" pages with the .php extension, the PHP interpreter must be installed on the server. Otherwise, those pages will most likely return an error (Status Code 404 - Not Found), as there is no handler associated with that particular extension.
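The association between extensions and handlers can be pictured as a simple lookup table. The sketch below is an assumption about how such a table might look, not the configuration format of any real web server; the handler functions only describe what they would do instead of running anything.

```python
import posixpath
from typing import Callable, Dict, Tuple


def static_handler(path: str) -> str:
    return f"send file contents of {path}"


def php_handler(path: str) -> str:
    return f"run {path} through the PHP interpreter"


# Only extensions registered here count as "installed and configured".
HANDLERS: Dict[str, Callable[[str], str]] = {
    ".html": static_handler,
    ".gif": static_handler,
    ".php": php_handler,
}


def dispatch(path: str) -> Tuple[int, str]:
    """Pick the handler from the file extension, or answer 404 if none exists."""
    extension = posixpath.splitext(path)[1].lower()
    handler = HANDLERS.get(extension)
    if handler is None:
        return 404, "Not Found"
    return 200, handler(path)
```

In this hypothetical setup, "/default.asp" yields 404 because no ASP handler is registered, while registering one in HANDLERS would be enough to make the same request succeed.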
IMPORTANT: the client has no information about the server-side framework (and/or its programming language) used by the server to execute the dynamic pages, nor does it have any need to know, as everything it receives is the result of such processing. This means that:
- The client cannot know for sure whether the web server uses PHP, ASP, ASP.NET, etc.: it can only guess the server-side technology used, based on the standard behaviors of the various frameworks and/or the "traces" that they can leave in the various elements that make up the HTTP response. For example, if the web application's internal links point to pages with a .php extension, it is reasonable to assume that the site was built in PHP.
- The client does not need to have any server-side framework installed on its system, as it merely receives the result of the processing of the dynamic pages in the form of an HTTP Response from the web server. In other words, it is not the client that has to "execute" those pages: its job is to ask the web server (request) for the result of this execution, receive it (response), and display it on the screen. For further information on this HTTP request/response dynamic, we recommend reading this article.
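The kind of guesswork described in the first point can be sketched as follows. The function name guess_server_technology is invented for this example; the X-Powered-By header and the PHPSESSID and ASP.NET_SessionId cookie names are common default traces, but a server can remove or fake them, so the result is only ever a hint. The sketch also assumes the headers arrive as a plain dictionary with exactly these key spellings.

```python
import re
from typing import Dict, List


def guess_server_technology(headers: Dict[str, str], body: str) -> List[str]:
    """Collect hints about the server-side technology from an HTTP response."""
    hints = []
    powered_by = headers.get("X-Powered-By")
    if powered_by:
        hints.append(f"X-Powered-By header: {powered_by}")
    cookie = headers.get("Set-Cookie", "")
    if "PHPSESSID" in cookie:
        hints.append("PHP session cookie")
    if "ASP.NET_SessionId" in cookie:
        hints.append("ASP.NET session cookie")
    # Internal links that end in a framework-specific extension.
    extensions = re.findall(r'href="[^"?#]*(\.php|\.aspx?)["?#]', body)
    for extension in sorted(set(extensions)):
        hints.append(f"links ending in {extension}")
    return hints
```

An empty list does not mean the pages are static: it only means the response carries none of the traces this sketch looks for.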
That's it for now: we hope that this overview, despite its inevitable simplifications, will be useful to those who are approaching these fundamental concepts of the World Wide Web for the first time.