Web cache
A web cache is a mechanism for the temporary storage (caching) of web documents, such as HTML pages and images, to reduce bandwidth usage, server load, and perceived lag. A web cache stores copies of documents passing through it; subsequent requests may be satisfied from the cache if certain conditions are met.
It should not be confused with a web archive, a site that keeps old versions of web pages.
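The basic idea of serving repeat requests from stored copies can be sketched in a few lines of Python. This is a minimal illustration under an explicit assumption: the "certain conditions" are reduced to a single fixed time-to-live (`ttl_seconds`). The class `SimpleWebCache` and its `fetch` callback are hypothetical names, not part of any real product.

```python
import time
from typing import Callable, Dict, Tuple


class SimpleWebCache:
    """Hypothetical in-memory web cache keyed by URL.

    A stored copy is reused only while it is younger than `ttl_seconds`.
    This single age check stands in for the conditions a real cache would
    evaluate (HTTP headers, request method, and so on).
    """

    def __init__(self, fetch: Callable[[str], bytes], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch            # function that contacts the origin server
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, bytes]] = {}
        self.origin_requests = 0       # how often the origin had to be contacted

    def get(self, url: str) -> bytes:
        entry = self._store.get(url)
        now = self._clock()
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]            # cache hit: no origin bandwidth or load
        body = self._fetch(url)        # miss or expired copy: ask the origin
        self.origin_requests += 1
        self._store[url] = (now, body)
        return body


if __name__ == "__main__":
    fake_time = [0.0]
    cache = SimpleWebCache(lambda url: b"<html>...</html>", ttl_seconds=60,
                           clock=lambda: fake_time[0])
    cache.get("http://example.com/")   # miss: fetched from the origin
    cache.get("http://example.com/")   # hit: served from the cache
    print(cache.origin_requests)       # prints 1
```

The counter of origin requests makes the benefit visible: each hit is a request the origin server never sees.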
Systems
Web caches can be used in various systems:
- A search engine may cache a website.
- A forward cache is a cache outside the web server's network, e.g. at the client's ISP or on a company network.
- A network-aware forward cache works like a forward cache but caches only heavily accessed items.
- A reverse cache sits in front of one or more web servers and web applications, accelerating requests from the Internet.
- A client, such as a web browser, can store web content for reuse. For example, if the back button is pressed, the locally cached version of a page may be displayed instead of a new request being sent to the web server.
- A web proxy sitting between the client and the server can evaluate HTTP headers and choose whether to store web content.
- A content delivery network can retain copies of web content at various points throughout a network.
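The network-aware variant differs from a plain forward cache only in its admission rule. The sketch below assumes that "heavily accessed" means a URL has been requested at least `min_hits` times. That threshold, the class name `NetworkAwareForwardCache`, and the `fetch` callback are hypothetical, and real products may measure popularity differently.

```python
from collections import Counter
from typing import Callable, Dict


class NetworkAwareForwardCache:
    """Sketch of a forward cache that stores only heavily accessed items.

    Assumption: an item counts as heavily accessed once its URL has been
    requested at least `min_hits` times.
    """

    def __init__(self, fetch: Callable[[str], bytes], min_hits: int = 3) -> None:
        self._fetch = fetch            # function that contacts the origin server
        self._min_hits = min_hits
        self._counts: Counter = Counter()
        self._store: Dict[str, bytes] = {}

    def get(self, url: str) -> bytes:
        self._counts[url] += 1
        if url in self._store:
            return self._store[url]    # popular item: served locally
        body = self._fetch(url)
        # Admit the response only once the URL has proven popular, so that
        # rarely requested items do not displace frequently requested ones.
        if self._counts[url] >= self._min_hits:
            self._store[url] = body
        return body

    def is_cached(self, url: str) -> bool:
        return url in self._store
```

With `min_hits=3`, the first three requests for a URL all reach the origin. The third one stores the response, so later requests are answered from the cache.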
Cache control
HTTP defines three basic mechanisms for controlling caches: freshness, validation, and invalidation.
- Freshness allows a response to be used without re-checking it on the origin server, and can be controlled by both the server and the client. For example, the Expires response header gives a date when the document becomes stale, and the Cache-Control: max-age directive tells the cache for how many seconds the response remains fresh.
- Validation can be used to check whether a cached response is still valid after it becomes stale. For example, if the response has a Last-Modified header, a cache can make a conditional request using the If-Modified-Since header to see whether the document has changed. The ETag (entity tag) mechanism, in which the cache sends the stored tag back in an If-None-Match header, also allows for both strong and weak validation.
- Invalidation is usually a side effect of another request that passes through the cache. For example, if the URL associated with a cached response subsequently receives a POST, PUT or DELETE request, the cached response is invalidated.
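The three mechanisms can be illustrated with a small Python sketch. It makes several simplifying assumptions:

- Header names are lower-cased.
- The Age header and heuristic freshness are ignored.
- Only Cache-Control: max-age and Expires (measured from the Date header) set the freshness lifetime.
- Conditional requests use the standard If-Modified-Since and If-None-Match headers.

The names `CachedResponse`, `freshness_lifetime`, `is_fresh`, `conditional_headers` and `Cache` are hypothetical, not part of any library.

```python
from dataclasses import dataclass
from email.utils import parsedate_to_datetime
from typing import Dict, Optional

UNSAFE_METHODS = {"POST", "PUT", "DELETE"}


@dataclass
class CachedResponse:
    headers: Dict[str, str]  # response headers with lower-cased names
    body: bytes
    stored_at: float         # time (seconds) at which the cache stored the response


def freshness_lifetime(headers: Dict[str, str]) -> Optional[float]:
    """Return how many seconds the response is fresh, or None if unspecified.

    Cache-Control: max-age takes precedence over Expires (measured from Date).
    """
    for directive in headers.get("cache-control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return float(value)
    if "expires" in headers and "date" in headers:
        try:
            expires = parsedate_to_datetime(headers["expires"])
            date = parsedate_to_datetime(headers["date"])
            return (expires - date).total_seconds()
        except (TypeError, ValueError):
            return 0.0  # an unparsable date is treated as already stale
    return None


def is_fresh(entry: CachedResponse, now: float) -> bool:
    """Freshness: may the entry be served without contacting the origin server?"""
    lifetime = freshness_lifetime(entry.headers)
    if lifetime is None:
        return False  # simplification: no heuristic freshness in this sketch
    return now - entry.stored_at < lifetime


def conditional_headers(entry: CachedResponse) -> Dict[str, str]:
    """Validation: headers for a conditional request that re-checks a stale entry."""
    headers = {}
    if "etag" in entry.headers:
        headers["If-None-Match"] = entry.headers["etag"]
    if "last-modified" in entry.headers:
        headers["If-Modified-Since"] = entry.headers["last-modified"]
    return headers


class Cache:
    def __init__(self) -> None:
        self.entries: Dict[str, CachedResponse] = {}

    def observe_request(self, method: str, url: str) -> None:
        """Invalidation: an unsafe request to a URL drops its cached response."""
        if method.upper() in UNSAFE_METHODS:
            self.entries.pop(url, None)
```

For example, a response stored at time 1000 with `Cache-Control: max-age=60` is fresh at time 1030 and stale at time 1061. At that point a real cache would send the headers from `conditional_headers` to the origin rather than discarding the entry outright.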
Browser cache
Web browsers cache content on the client machine, in memory and on disk.
Legal issues
In 1998, the Digital Millennium Copyright Act (DMCA) added rules to the United States Code (17 U.S.C. § 512) that exempt system operators from copyright liability for the purposes of caching.
Comparison of web caches
Name | Type | Operating System | Forward Mode | Reverse Mode | License
---|---|---|---|---|---
ApplianSys CACHEbox | Appliance | Linux | | | Commercial
Blue Coat ProxySG | Appliance | SGOS | | | Commercial
Nginx | Software | Linux, Unix | | | 2-clause BSD-like
Microsoft Forefront Threat Management Gateway | Software | Windows | | | Commercial
Polipo | Software | Linux, Unix, Windows | | | GNU GPL
Squid | Software | Linux, Unix, Windows | | | GNU GPL
Traffic Server | Software | Linux, Unix | | | Apache License 2.0
Untangle | Software | Linux | | | Commercial
Varnish | Software | Linux, Unix | | | BSD
WinGate | Software | Windows | | | Commercial
See also
- Harvest project
- Proxy server
- Web accelerator
- Cache manifest in HTML5
Further reading
- Ari Luotonen, Web Proxy Servers (Prentice Hall, 1997) ISBN 0-13-680612-0
- Duane Wessels, Web Caching (O'Reilly and Associates, 2001). ISBN 1-56592-536-X
- Michael Rabinovich and Oliver Spatschak, Web Caching and Replication (Addison Wesley, 2001). ISBN 0-201-61570-3
External links
- Caching Tutorial for Web Authors and Webmasters
- Web Caching and Content Delivery Resources
- Web Caching: web caching in general, with some references to Squid
- Cache control directives demystified: explanations, dos and don'ts