Clever Geek Handbook
URL

A Uniform Resource Locator (URL, pronounced [ˌjuːɑːrˈel]) is a system of unified addresses for electronic resources, or a uniform indicator of the location of a resource (file) [1].

It is used as the standard notation for links to objects on the Internet (hypertext links on the World Wide Web).

History

The URL was invented by Tim Berners-Lee in 1990 at the European Council for Nuclear Research (French: Conseil Européen pour la Recherche Nucléaire, CERN) in Geneva, Switzerland. The URL became a fundamental innovation of the Internet. Initially, the URL was intended to indicate the locations of resources (most often files) on the World Wide Web. Today URLs are used to denote the addresses of almost all Internet resources. The URL syntax is defined in RFC 3986. The URL is now treated as part of the more general URI resource identification system, and the term URL is gradually giving way to the broader term URI. The standard is maintained by the IETF and its working groups.

In 2009, Tim Berners-Lee expressed the opinion that the double slash (//) at the beginning of a URL, after the network protocol, is redundant [2].

URL structure

Initially, the URL was designed as a system for indicating the location of network resources as naturally as possible. The locator had to be easily extensible and use only a limited set of ASCII characters (for example, a space is never used in a URL). As a result, the following traditional form of writing a URL emerged:

<scheme>:[//[<login>[:<password>]@]<host>[:<port>]][/<URL path>][?<parameters>][#<anchor>]

In this notation:

scheme
resource access scheme; in most cases the network protocol is meant
login
username used to access the resource
password
password of the specified user
host
fully qualified domain name of the host in the DNS, or the IP address of the host written as four groups of decimal numbers separated by dots, each number being an integer from 0 to 255
port
port on the host to connect to
URL path
clarifying information about the location of the resource; depends on the protocol.
parameters
query string with parameters passed to the server (GET method). It starts with the ? character, and the parameters are separated by the & sign. Example: ?parameter_1=value_1&parameter_2=value_2&parameter3=value_3
anchor
identifier of an "anchor", preceded by a # . An anchor can refer to a heading inside a document or to the id attribute of an element. Following such a link, the browser opens the page and scrolls the window to the specified element. For example, a link to this section of the article: https://ru.wikipedia.org/wiki/URL#Структура_URL .
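The components listed above can be picked apart with Python's standard `urllib.parse` module. The following is only a sketch: the example URL, its credentials and the helper name `describe_url` are made up for illustration and do not come from the article.

```python
from urllib.parse import urlsplit, parse_qs


def describe_url(url: str) -> dict:
    """Split a URL into the components named in the template above."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "login": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "URL path": parts.path,
        # parse_qs splits the query string on "&" and "="
        "parameters": parse_qs(parts.query),
        "anchor": parts.fragment,
    }


# Hypothetical URL that uses every optional component.
EXAMPLE_URL = "https://user:secret@example.com:8443/docs/index.html?lang=en&page=2#section-1"

info = describe_url(EXAMPLE_URL)
for name, value in info.items():
    print(f"{name}: {value}")
```

Components that are absent from a URL come back as `None` (login, password, port) or as an empty string (path, query, anchor), which mirrors the optional brackets in the template.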

URL schemes (protocols)

Common URL schemes include:

  • ftp - File Transfer Protocol (FTP)
  • http - Hypertext Transfer Protocol (HTTP)
  • rtmp - Real Time Messaging Protocol, a proprietary streaming protocol used mainly for streaming video and audio from webcams over the Internet
  • rtsp - Real Time Streaming Protocol
  • https - a variant of HTTP that uses encryption (usually SSL or TLS)
  • gopher - Gopher Protocol
  • mailto - Email Address
  • news - Usenet news
  • nntp - Usenet News via NNTP
  • irc - IRC Protocol
  • smb - SMB / CIFS protocol
  • prospero - Prospero Directory Service
  • telnet - Link to telnet interactive session
  • wais - WAIS Database
  • xmpp - XMPP Protocol (part of Jabber )
  • file - local file name
  • data - data embedded directly in the URL (data: URL)
  • tel - call the specified phone number

Exotic URL schemes:

  • afs - global file name in the Andrew File System
  • cid - Content ID for MIME parts
  • mid - Message ID for email
  • mailserver - Access data from mail servers
  • nfs - File name on the NFS network file system
  • tn3270 - Emulate an interactive Telnet 3270 session
  • z39.50 - Access to ANSI Z39.50 services
  • skype - Skype protocol
  • smsto - opens the SMS editor on some mobile phones
  • ed2k - eDonkey P2P file-sharing network
  • market - Android Market
  • steam - Steam protocol
  • bitcoin - Bitcoin cryptocurrency
  • ob - OpenBazaar
  • tg - Telegram

URL schemes in browsers:

  • view-source - view the source code of the specified web page in various browsers
  • chrome - service pages of the Google Chrome browser or of browsers based on the Gecko engine [3]. In Yandex Browser it redirects to browser://
  • opera - service pages of the Opera browser
  • browser - service pages of Yandex Browser

URL encoding

The URL standard uses the US-ASCII character set. This is a serious drawback, since only Latin letters, digits and a few punctuation marks may be used. All other characters must be re-encoded: for example, Cyrillic letters, accented letters, ligatures and ideographs. The encoding is described in RFC 3986 and is called URL encoding, URL-encoded, or percent-encoding.

An example of this encoding can be seen in the Russian-language Wikipedia, which uses Russian words in its URLs. For example, a string like:

  https://ru.wikipedia.org/wiki/Джигурда

is encoded as:

  https://ru.wikipedia.org/wiki/%D0%94%D0%B6%D0%B8%D0%B3%D1%83%D1%80%D0%B4%D0

Implementation

The conversion takes place in two stages: first, each Cyrillic character is encoded in UTF-8 as a sequence of two bytes, and then each byte of that sequence is written in hexadecimal with a preceding percent sign (%):

  Д → D0 and 94 → %D0%94
  ж → D0 and B6 → %D0%B6
  и → D0 and B8 → %D0%B8
  г → D0 and B3 → %D0%B3, etc.
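The two stages can be reproduced step by step in Python. This is a minimal sketch for illustration: the helper names are hypothetical, and the four Cyrillic letters are the ones that correspond to the byte values listed above. Unlike a real URL encoder, `percent_encode` escapes every character it is given, so it is only meant for non-ASCII text.

```python
from urllib.parse import quote


def utf8_hex_bytes(ch: str) -> list:
    """Stage 1: encode one character in UTF-8 and return its bytes as hex strings."""
    return [f"{byte:02X}" for byte in ch.encode("utf-8")]


def percent_encode(text: str) -> str:
    """Stage 2: write every UTF-8 byte in hexadecimal, preceded by '%'."""
    return "".join("%" + hex_byte for ch in text for hex_byte in utf8_hex_bytes(ch))


for letter in "Джиг":
    stage_one = utf8_hex_bytes(letter)
    print(letter, "->", " and ".join(stage_one), "->", percent_encode(letter))

# The standard library encoder gives the same result for purely non-ASCII text.
matches_library = percent_encode("Джиг") == quote("Джиг")
print("same as urllib.parse.quote:", matches_library)
```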

Reserved characters [4]:
  !*'();:@&=+$,/?#[]

Unreserved characters [4]:
  ABCDEFGHIJKLMNOPQRSTUVWXYZ
  abcdefghijklmnopqrstuvwxyz
  0123456789-_.~

All other characters in the URI are encoded.

Reserved and other special characters are encoded as follows:

  Character   Encoded
  !           %21
  "           %22
  # [5]       %23
  $           %24
  %           %25
  & [5]       %26
  '           %27
  *           %2A
  +           %2B
  , [5]       %2C
  : [5]       %3A
  ; [5]       %3B
  <           %3C
  = [5]       %3D
  >           %3E
  ? [5]       %3F
  [           %5B
  ]           %5D
  ^           %5E
  `           %60
  {           %7B
  |           %7C
  }           %7D
  <space>     %20 [6]
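The same rules can be checked against Python's `urllib.parse.quote`. The sketch below assumes Python 3.7 or later, where `quote` treats `~` as unreserved; `encode_component` is a hypothetical helper name. Passing `safe=""` asks `quote` to escape the reserved characters as well, which is what is wanted when a value is placed inside a single URL component.

```python
import string
from urllib.parse import quote, quote_plus

RESERVED = "!*'();:@&=+$,/?#[]"
UNRESERVED = string.ascii_letters + string.digits + "-_.~"


def encode_component(text: str) -> str:
    """Percent-encode everything except the unreserved characters."""
    return quote(text, safe="")


# Every reserved character becomes a %XX escape.
reserved_table = {ch: encode_component(ch) for ch in RESERVED}
for ch, code in reserved_table.items():
    print(ch, code)

# Unreserved characters pass through unchanged.
unchanged = encode_component(UNRESERVED)

# A value containing a space and two reserved characters.
value = encode_component("a b&c=d")

# quote_plus uses "+" for the space, as many query strings do.
form_value = quote_plus("a b")
print(value, form_value)
```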

The encoding of parameters in Internet Explorer and older versions of Firefox differs somewhat [7].

In some cases, URLs are generated using Base58 encoding [8].
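As a rough illustration of how a numeric identifier can be turned into a short URL fragment, here is a minimal Base58 sketch. The alphabet below (digits, then lowercase, then uppercase, without the easily confused 0, O, I and l) is an assumption made for this example; services that use Base58 may order the characters differently, and the function names are hypothetical.

```python
# Assumed alphabet: 58 characters, omitting 0, O, I and l.
ALPHABET = "123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ"
BASE = len(ALPHABET)


def base58_encode(number: int) -> str:
    """Convert a non-negative integer to a Base58 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, BASE)
        digits.append(ALPHABET[remainder])
    # Digits were produced least significant first.
    return "".join(reversed(digits))


def base58_decode(text: str) -> int:
    """Convert a Base58 string back to an integer."""
    number = 0
    for ch in text:
        number = number * BASE + ALPHABET.index(ch)
    return number


short_id = base58_encode(1234567890)
print(short_id, base58_decode(short_id))
```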

IRI Standard

Since the letters of every alphabet except basic Latin undergo this transformation, URLs containing words from the vast majority of languages can become unreadable to humans.

All of this contradicts the principle of internationalization proclaimed by all leading Internet organizations, including the W3C and ISOC. The IRI standard (Internationalized Resource Identifier) is intended to solve this problem: it defines internationalized resource identifiers in which Unicode characters can be used freely, and which therefore do not disadvantage other languages. It is difficult to say in advance, however, whether IRIs will ever be able to replace the widely used URLs (and URIs in general).
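The relationship between a readable identifier with Unicode characters and its percent-encoded URL form can be sketched as follows. This is a simplified illustration, not a full implementation of the IRI standard: the function names are hypothetical, non-ASCII host names (which need separate handling) are not covered, and `uri_to_iri` is suitable for display only, because it also decodes escapes of reserved characters.

```python
from urllib.parse import quote, unquote

RESERVED = "!*'();:@&=+$,/?#[]"


def iri_to_uri(iri: str) -> str:
    """Percent-encode non-ASCII characters, keeping delimiters and existing escapes."""
    return quote(iri, safe=RESERVED + "%")


def uri_to_iri(uri: str) -> str:
    """Decode %XX escapes so the identifier is readable (display only)."""
    return unquote(uri)


IRI = "https://ru.wikipedia.org/wiki/URL#Структура_URL"

uri = iri_to_uri(IRI)
print(uri)
print(uri_to_iri(uri))
```

Keeping `%` in the safe set means an identifier that is already encoded is left unchanged if it is passed through `iri_to_uri` a second time.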

Length Limit

Formally, the length of a URL is not limited, but browsers impose their own limits. It is not recommended to use URLs longer than 2048 characters, since Microsoft Internet Explorer cannot handle URLs longer than 2083 characters [9].

PURL Initiative

Another major drawback of URLs is their lack of flexibility. Resources on the World Wide Web and the Internet get moved, while the URL links remain and point to resources that no longer exist. This is especially painful for digital libraries, catalogs and encyclopedias. To solve this problem, persistent locators called PURLs (Persistent Uniform Resource Locators) were proposed. These are essentially ordinary URLs, but instead of pointing to the specific location of a resource they point to a record in a PURL database, which in turn holds the actual URL of the resource. When a PURL is accessed, the server finds the corresponding record in the database and redirects the request to the current location of the resource. If the address of the resource changes, there is no need to correct the countless links to it; it is enough to update the record in the database. At present, this idea is not standardized and not widely adopted.
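The lookup-and-redirect idea can be modelled in a few lines. This is a toy sketch under stated assumptions: the class name, the in-memory dictionary standing in for the PURL database, the example paths and the choice of HTTP status codes 302 and 404 are all illustrative and not taken from any real PURL service.

```python
from typing import Dict, Optional, Tuple


class PurlResolver:
    """Toy model of a PURL server: persistent path -> current URL."""

    def __init__(self) -> None:
        self._records: Dict[str, str] = {}

    def register(self, purl_path: str, target_url: str) -> None:
        """Create or update the database record for a persistent path."""
        self._records[purl_path] = target_url

    def resolve(self, purl_path: str) -> Tuple[int, Optional[str]]:
        """Return an HTTP-style (status, location) pair for a request."""
        target = self._records.get(purl_path)
        if target is None:
            return 404, None
        return 302, target  # redirect to the current location


resolver = PurlResolver()
resolver.register("/library/book-42", "https://old.example.org/books/42")
first = resolver.resolve("/library/book-42")

# The resource moves: only the database record changes, the PURL stays the same.
resolver.register("/library/book-42", "https://new.example.org/catalog/42")
second = resolver.resolve("/library/book-42")

print(first, second, resolver.resolve("/library/unknown"))
```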

See also

  • URN
  • XRI
  • Long domain names
  • Human-readable URL
  • URL normalization

Notes

  1. ↑ URL (Uniform Resource Locator): address of an information resource (file) on the Internet. Collection of acronyms in the field of computer technology and programming. 2006.
  2. ↑ Double slash in Internet addresses appeared due to the haste of its creator. RIA Novosti (2009-10-14 19:05). Retrieved July 11, 2010. Archived August 25, 2011.
  3. ↑ The Chrome URL. Retrieved December 9, 2014.
  4. ↑ 1 2 RFC 3986, section 2.2
  5. ↑ 1 2 3 4 5 6 7 The character is valid, but when written literally it has a special syntactic meaning
  6. ↑ MediaWiki avoids encoding the space as %20; instead it replaces the space everywhere with the underscore "_". Many search engines replace the space with a "+" symbol.
  7. ↑ HTTP, RFC 3986 and browsers
  8. ↑ Flickr Services
  9. ↑ The maximum length of a URL in Internet Explorer is 2083 characters. microsoft.com.

Links

  • RFC 3986
  • URL coding
  • Internet creator regrets double slash
Source - https://ru.wikipedia.org/w/index.php?title=URL&oldid=100363457

