Uniform Resource Locator
In computing, a uniform resource locator or universal resource locator (URL) is a specific character string that constitutes a reference to an Internet resource.
A URL is technically a type of uniform resource identifier (URI), but in many technical documents and verbal discussions URL is often used as a synonym for URI.
History
The Uniform Resource Locator was created in 1994 by Tim Berners-Lee and the URI working group of the Internet Engineering Task Force (IETF) as an outcome of collaboration started at the IETF Living Documents "Birds of a Feather" session in 1992. The format combines the pre-existing system of domain names (created in 1985) with file path syntax, where forward slashes are used to separate folder and file names. Conventions already existed where server names could be prepended to complete file paths, preceded by a double-slash (//).
Syntax
Every URL consists of some of the following: the scheme name (commonly called protocol), followed by a colon, two slashes, then, depending on scheme, a domain name (alternatively, IP address), a port number, the path of the resource to be fetched or the program to be run, then, for programs such as Common Gateway Interface (CGI) scripts, a query string, and an optional fragment identifier.
The syntax is
scheme://domain:port/path?query_string#fragment_id
- The scheme name defines the namespace, purpose, and the syntax of the remaining part of the URL. Software will try to process a URL according to its scheme and context. For example, a web browser will usually dereference the URL http://example.org:80 by performing an HTTP request to the host at example.org, using port number 80. The URL mailto:bob@example.com may start an e-mail composer with the address bob@example.com in the To field.
  Other examples of scheme names include https:, gopher:, wais:, ftp:. URLs with https as a scheme (such as https://example.com/) require that requests and responses be made over a secure connection to the website. Some schemes that require authentication allow a username, and perhaps a password too, to be embedded in the URL, for example ftp://asmith@ftp.example.org. Passwords embedded in this way are not conducive to secure working, but the full possible syntax is
  scheme://username:password@domain:port/path?query_string#fragment_id
- The domain name or IP address gives the destination location for the URL. The domain google.com, or its IP address 72.14.207.99, is the address of Google's website.
- The domain name portion of a URL is not case sensitive since DNS ignores case: http://en.example.org/ and HTTP://EN.EXAMPLE.ORG/ both open the same page.
- The port number is optional; if omitted, the default for the scheme is used. For example, http://vnc.example.com:5800 connects to port 5800 of vnc.example.com, which may be appropriate for a VNC remote control session. If the port number is omitted for an http: URL, the browser will connect on port 80, the default HTTP port. The default port for an https: request is 443.
- The path is used to specify and perhaps find the resource requested. It is case-sensitive, though it may be treated as case-insensitive by some servers, especially those based on Microsoft Windows. If the server is case sensitive and http://en.example.org/wiki/URL is correct, http://en.example.org/WIKI/URL or http://en.example.org/wiki/url will display an HTTP 404 error page, unless these URLs point to valid resources themselves.
- The query string contains data to be passed to software running on the server. It may contain name/value pairs separated by ampersands, for example ?first_name=John&last_name=Doe.
- The fragment identifier, if present, specifies a part or a position within the overall resource or document. When used with HTTP, it usually specifies a section or location within the page, and the browser may scroll to display that part of the page.
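The components listed above can be pulled apart programmatically. The following is a minimal sketch using Python's standard urllib.parse module; the describe function and the DEFAULT_PORTS table are illustrative names introduced here, not part of any specification, and only the two default ports mentioned in the text are included.

```python
from urllib.parse import urlsplit, parse_qs

# Default ports named in the text; other schemes are left out of this sketch.
DEFAULT_PORTS = {"http": 80, "https": 443}

def describe(url):
    """Split a URL into the components listed above (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,        # urlsplit lowercases the scheme
        "username": parts.username,    # None when no user info is embedded
        "password": parts.password,
        "host": parts.hostname,        # the hostname property is lowercased
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path,            # case is preserved
        "query": parse_qs(parts.query),
        "fragment": parts.fragment,
    }

info = describe("HTTP://EN.EXAMPLE.ORG/wiki/URL?first_name=John&last_name=Doe#Syntax")
print(info["host"], info["port"], info["path"])
print(info["query"])
```

Note that the host comes back in lower case while the path keeps its original case, which mirrors the case-sensitivity rules described in the list.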
Absolute and relative URLs
According to RFC 1738, which defined URLs in 1994, when resources contain references to other resources, they can use relative links to define the location of the second resource as if to say, "in the same place as this one except with the following relative path". It went on to say that such relative URLs are dependent on the original URL containing a hierarchical structure against which the relative link is based, and that the ftp, http, and file URL schemes are examples of some that can be considered hierarchical, with the components of the hierarchy being separated by "/".
URLs as locators
A URL is a URI that, "in addition to identifying a resource, provides a means of locating the resource by describing its primary access mechanism (e.g., its network location)".
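A relative link only locates a resource once it has been resolved against a hierarchical base URL, as described under absolute and relative URLs. The sketch below uses Python's urllib.parse.urljoin as a stand-in for that resolution step; its rules come from later URI specifications rather than from RFC 1738 itself, and the base URL and relative paths are made-up examples.

```python
from urllib.parse import urljoin

# Hypothetical base URL of the document that contains the relative links.
base = "http://en.example.org/wiki/URL"

# "In the same place as this one, except with the following relative path".
sibling = urljoin(base, "URI")

# ".." climbs one level in the "/"-separated hierarchy.
parent = urljoin(base, "../help/")

# A leading "/" restarts from the root of the same host.
rooted = urljoin(base, "/index.html")

for resolved in (sibling, parent, rooted):
    print(resolved)
```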
Internet hostnames
On the Internet, a hostname is a domain name assigned to a host computer. This is usually a combination of the host's local name with its parent domain's name. For example, en.example.org consists of a local hostname (en) and the domain name example.org. The hostname is translated into an IP address via the local hosts file, or the domain name system (DNS) resolver. It is possible for a single host computer to have several hostnames, but generally the operating system of the host prefers to have one hostname that the host uses for itself.
Any domain name can also be a hostname, as long as the restrictions mentioned below are followed. For example, both "en.example.org" and "example.org" can be hostnames if they both have IP addresses assigned to them. The domain name "xyz.example.org" may not be a hostname if it does not have an IP address, but "aa.xyz.example.org" may still be a hostname. All hostnames are domain names, but not all domain names are hostnames.
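The two ideas above, splitting a hostname into its local name and parent domain, and checking whether a name has an IP address, can be sketched in Python as follows. The function names split_hostname and resolve are illustrative only. The lookup relies on socket.getaddrinfo, which asks the operating system's resolver (typically the hosts file and DNS), so its result depends on the machine and network it runs on.

```python
import socket

def split_hostname(hostname):
    """Return (local name, parent domain) for a dotted hostname."""
    local, _, parent = hostname.partition(".")
    return local, parent

def resolve(hostname):
    """Return the IP addresses the system resolver reports, or [] if none."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # No address found: the name is not usable as a hostname here.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    return sorted({info[4][0] for info in infos})

local, parent = split_hostname("en.example.org")
print(local, parent)
```

Under this sketch, a domain name for which resolve returns an empty list would be the kind of name that "may not be a hostname" in the sense used above.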
See also
- CURIE (Compact URI)
- Extensible Resource Identifier (XRI)
- Fragment identifier
- Internationalized Resource Identifier (IRI)
- URL normalization