The abbreviation URL stands for Uniform Resource Locator and is used to specify the address of a resource on the web. A URL can contain a domain name like logic-practice.com or an IP address such as 195.201.68.81.
Syntax:
scheme://userinfo@host:port/path?query#fragment
The structure of a URL consists of the following components:
- scheme: It indicates the protocol used to access the resource, such as HTTP or HTTPS.
- userinfo: This optional component may contain authentication information like a username and password.
- host: It specifies the domain name or IP address of the server hosting the resource.
- port: An optional port number on which the server is listening for requests.
- path: The path to the specific resource on the server.
- query: Parameters or data sent to the server, typically in key-value pairs.
- fragment: A specific section of the resource, often used in web pages for navigation.
Here,
- scheme defines the type of Internet service (most commonly http or https).
- host is the domain name or IP address of the server (e.g., www.logic-practice.com).
- domain is the registered Internet domain name within the host (e.g., logic-practice.com).
- port defines the port number at the host (the default is 80 for http and 443 for https).
- path is the hierarchical location of a resource on the server (e.g., /index.html).
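The components listed above can be pulled out of a URL programmatically. The following is a minimal sketch using Python's standard `urllib.parse.urlsplit`; the URL itself, including the user name, password, port, and query values, is a made-up example chosen only so that every component is present.

```python
from urllib.parse import urlsplit

# Hypothetical URL that contains every component described above.
url = "https://user:secret@www.logic-practice.com:8443/docs/index.html?lang=en&page=2#intro"

parts = urlsplit(url)

print(parts.scheme)    # https
print(parts.username)  # user
print(parts.password)  # secret
print(parts.hostname)  # www.logic-practice.com
print(parts.port)      # 8443
print(parts.path)      # /docs/index.html
print(parts.query)     # lang=en&page=2
print(parts.fragment)  # intro
```

Components that are absent from a URL come back as an empty string (or `None` for the user name, password, and port), so the same code works for simpler addresses.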
The following schemes are commonly used in URLs:
- http (HyperText Transfer Protocol): Common web pages. Not encrypted.
- https (HyperText Transfer Protocol Secure): Secure web pages. Encrypted.
- ftp (File Transfer Protocol): Downloading or uploading files.
- file: A file on your computer.
URL Encoding
URL encoding converts characters into a format that can be transmitted over the Internet. This is necessary because URLs can be sent using the ASCII character set only, so any character in a URL that falls outside that set must be converted.
URL encoding replaces each such character with a "%" sign followed by two hexadecimal digits for every byte of the character's encoded form.
Spaces are not allowed in URLs. When encoding a URL, a space is typically replaced with %20, or with a plus (+) sign in query strings.
The table below lists some characters with their percent-encoded representations in UTF-8 (modern browsers typically use UTF-8 for URL encoding).
| Character | Percent-encoded (UTF-8) |
|---|---|
| € | %E2%82%AC |
| £ | %C2%A3 |
| © | %C2%A9 |
| ® | %C2%AE |
| À | %C3%80 |
| Á | %C3%81 |
| Â | %C3%82 |
| Ã | %C3%83 |
| Ä | %C3%84 |
| Å | %C3%85 |
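The values in the table can be reproduced by taking the UTF-8 bytes of a character and writing each byte as "%" plus two hexadecimal digits. The sketch below does this by hand and compares the result with Python's standard `urllib.parse.quote`. The helper name `percent_encode_char` is hypothetical, and it encodes every character it is given, so it is meant only to illustrate the non-ASCII case.

```python
from urllib.parse import quote

def percent_encode_char(ch: str) -> str:
    """Percent-encode one character from its UTF-8 bytes (illustrative helper)."""
    return "".join(f"%{byte:02X}" for byte in ch.encode("utf-8"))

for ch in "€£©À":
    # The manual result and the library result should match for non-ASCII input.
    print(ch, percent_encode_char(ch), quote(ch))
```

In practice the library function is the better choice, because it also knows which ASCII characters can be left as they are.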
Reserved Characters in URLs
Certain characters in URLs have specific meanings and need to be encoded if they are to be used as normal text.
| Character | Meaning in URL | Encoded Value |
|---|---|---|
| : | Scheme delimiter | %3A |
| / | Path separator | %2F |
| ? | Query string start | %3F |
| # | Fragment identifier | %23 |
| & | Separator between query parameters | %26 |
| = | Assigns values to parameters | %3D |
| + | Represents space (in query strings) | %2B |
For example, to send a query parameter whose value is C# tutorial, the encoded URL would be:
https://logic-practice.com/search?q=C%23+tutorial
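The same URL can be built without encoding anything by hand. This is a small sketch using Python's standard `urllib.parse.urlencode`, which encodes query values in the form style (spaces become +). The variable names are illustrative only.

```python
from urllib.parse import urlencode

base = "https://logic-practice.com/search"

# urlencode percent-encodes the "#" and turns the space into "+".
query = urlencode({"q": "C# tutorial"})
url = f"{base}?{query}"

print(url)  # https://logic-practice.com/search?q=C%23+tutorial
```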
What is the importance of URL encoding?
URLs can be sent over the Internet using only a limited set of ASCII characters. If a URL contains spaces, reserved characters used as data, or non-ASCII characters, they must be encoded. This encoding ensures that the URL remains valid and is interpreted correctly by both browsers and servers.
What are the consequences of not encoding special characters within a URL?
If special characters (such as #, &, ?, or spaces) are left unencoded, the browser or the server may misinterpret the URL. This can lead to broken links, incorrect query parameters, or security problems.
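To see how this goes wrong, the sketch below parses a URL whose query value contains an unencoded "#" and compares it with the correctly encoded form. It uses Python's standard `urllib.parse`; the `sort=new` parameter is a made-up addition for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# Unencoded: everything after "#" is treated as the fragment, not the query.
raw = "https://logic-practice.com/search?q=C# tutorial&sort=new"
raw_parts = urlsplit(raw)
print(raw_parts.query)            # q=C
print(repr(raw_parts.fragment))   # ' tutorial&sort=new'
print(parse_qs(raw_parts.query))  # {'q': ['C']}

# Encoded: the value and the second parameter both survive.
encoded = "https://logic-practice.com/search?q=C%23+tutorial&sort=new"
encoded_parts = urlsplit(encoded)
print(parse_qs(encoded_parts.query))  # {'q': ['C# tutorial'], 'sort': ['new']}
```

Because the fragment is normally not sent to the server, the unencoded request would search for "C" and lose the sort parameter.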
What's the difference between encoding spaces as + and %20?
Both + and %20 can represent a space, but they are used differently. + stands for a space only in query strings, whereas %20 is valid anywhere in a URL, including the path. Modern browsers and servers generally accept both in query strings, and %20 is considered the more universal of the two.
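Python's standard library exposes the two conventions as separate functions, which makes the difference easy to see. In this sketch, `quote` and `unquote` follow the %20 convention, while `quote_plus` and `unquote_plus` follow the + convention used in query strings. The sample text is arbitrary.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

text = "a b+c"

print(quote(text))       # a%20b%2Bc  (space -> %20, literal plus -> %2B)
print(quote_plus(text))  # a+b%2Bc    (space -> +,   literal plus -> %2B)

# Decoding: only the *_plus variant treats "+" as a space.
print(unquote("a+b"))       # a+b
print(unquote_plus("a+b"))  # a b
```

Note that a literal plus sign is encoded as %2B in both cases, so it cannot be confused with a space when decoding.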
Which character encoding is used for URL encoding today?
Modern browsers use UTF-8 as the character encoding for URL encoding. It can represent characters from virtually all languages, ensuring global compatibility.
Do reserved characters always need to be encoded?
No. Reserved characters like :, /, ?, and # have specific functions within URLs. They are left unencoded when they serve their special purpose and must be encoded only when they are used as regular text. For instance, if a hash sign (#) is part of a search keyword, it should be written as %23.
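As a rough illustration of a reserved character used as data, the sketch below encodes a path segment that contains a slash. It relies on the `safe` parameter of Python's standard `urllib.parse.quote`, which lists characters to leave unencoded (by default only "/"). The `/bands/` path and the segment value are hypothetical.

```python
from urllib.parse import quote

segment = "AC/DC"  # the slash here is data, not a path separator

# Default: "/" is considered safe and is left alone.
print(quote(segment))           # AC/DC

# safe="" encodes the slash so it cannot be read as a separator.
print(quote(segment, safe=""))  # AC%2FDC

url = "https://logic-practice.com/bands/" + quote(segment, safe="")
print(url)  # https://logic-practice.com/bands/AC%2FDC
```

The slashes that separate `bands` from the rest of the path stay unencoded because they really are delimiters.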
Is it advisable to encode URLs manually, or is it better to depend on functions for this task?
Manual URL encoding is workable for simple cases that involve only a few characters. For anything more complex, built-in functions such as encodeURIComponent in JavaScript are advisable, because they avoid missed characters and other mistakes.
Is URL encoding the same as HTML encoding?
No, URL encoding and HTML encoding serve distinct purposes. URL encoding makes URLs safe for transmission over the Internet by converting special and non-ASCII characters into valid ASCII representations. HTML encoding, on the other hand, converts reserved HTML characters like <, > and & into entities so that they are displayed as plain text on a webpage. The two methods are separate and are not interchangeable.
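The contrast is easiest to see by running the same text through both kinds of encoding. This sketch uses Python's standard `urllib.parse.quote` for URL encoding and `html.escape` for HTML encoding; the sample string is arbitrary.

```python
from html import escape
from urllib.parse import quote

text = "Tom & Jerry <3"

# URL encoding: unsafe characters become %XX sequences.
print(quote(text, safe=""))  # Tom%20%26%20Jerry%20%3C3

# HTML encoding: reserved HTML characters become entities; spaces are untouched.
print(escape(text))          # Tom &amp; Jerry &lt;3
```

A value placed in a link inside a web page often needs both: URL encoding for the address itself, then HTML encoding for the attribute that holds it.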
Conclusion
URL encoding plays a crucial role in transmitting web addresses reliably over the Internet. Because only a restricted range of characters is considered safe in URLs, encoding converts reserved and non-ASCII characters into a standardized format.
In modern web standards, UTF-8 is the preferred encoding system. Web developers need to understand how special characters, spaces, and symbols are encoded to prevent broken links and ensure seamless communication between clients and servers.