Database caching
Many applications today are developed and deployed in multi-tier environments that involve browser-based clients, web application servers and backend databases. Because their content is dynamic, these applications must generate web pages on demand by querying backend databases, which makes middle-tier database caching an effective way to achieve high scalability and performance.
In a three-tier architecture, the application tier and the data tier run on different hosts, so application throughput is limited by network speed. This network overhead can be avoided by placing a database at the application tier. Because commercial databases are heavyweight, it is not practical to run the application and the commercial database on the same host. Instead, one of the many lightweight databases on the market can be used to cache data from the commercial database.
Benefits
Scalability: the query workload is distributed from the backend to multiple cheap front-end systems.
Flexibility: quality of service (QoS) can be achieved by having each cache host a different part of the backend data; e.g., the data of Platinum customers is cached while that of ordinary customers is not.
Availability: applications that depend only on cached tables continue to be served even when the backend server is unavailable.
Performance: responses can be faster because of data locality, and load peaks are smoothed out by avoiding round trips between the middle tier and the data tier.
Requirements of a caching solution
Updateable Cache Tables
Most existing cache solutions are read-only, which limits their use to a small segment of applications, namely non-real-time ones.
Bi-Directional updates
For updateable caches, updates made in the cache should be propagated to the target database, and any updates made directly on the target database should be reflected in the cache automatically.
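The two propagation directions can be illustrated with a minimal sketch. Plain dictionaries stand in for the cache table and the target table. Each side records its own direct changes in a change log, and that log is shipped to the other side. The names `Store`, `change_log` and `sync_both_ways` are hypothetical, and real products use their own capture mechanisms (for example triggers or log mining) rather than this in-process log.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """A key/value table standing in for a cache or target table (hypothetical)."""

    rows: dict = field(default_factory=dict)
    # Changes made directly on this store, waiting to be sent to the other side.
    change_log: list = field(default_factory=list)

    def write(self, key, value, log: bool = True) -> None:
        self.rows[key] = value
        if log:
            self.change_log.append((key, value))


def sync_both_ways(cache: Store, target: Store) -> None:
    """Ship each side's logged changes to the other side, then clear both logs.

    Conflicts (the same key changed on both sides since the last sync) are not
    handled here; a real product needs a resolution rule for them.
    """
    for key, value in cache.change_log:
        target.write(key, value, log=False)  # cache -> target
    for key, value in target.change_log:
        cache.write(key, value, log=False)   # target -> cache
    cache.change_log.clear()
    target.change_log.clear()


cache, target = Store(), Store()
cache.write("cust:1", "Platinum")  # update made in the cache
target.write("cust:2", "Gold")     # update made directly on the target database
sync_both_ways(cache, target)
print(cache.rows == target.rows)   # True
```

Applied writes use `log=False` so that a propagated change is not echoed back to the side it came from.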
Synchronous and asynchronous update propagation
Updates on a cache table can be propagated to the target database in two modes. Synchronous mode ensures that by the time a database operation completes, its updates have also been applied at the target database. In asynchronous mode, propagation of the updates to the target database is deferred.
Synchronous mode gives high cache consistency and suits real-time applications. Asynchronous mode gives high throughput and suits near-real-time applications.
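The difference between the two modes shows up in when the target sees a write. The sketch below is minimal: a dictionary stands in for the target database, and the `CacheTable` class and its `flush` method are hypothetical. In a real system, `flush` would typically run on a background thread or timer.

```python
from collections import deque


class CacheTable:
    """Hypothetical cache table that propagates writes to a target (a dict here)."""

    def __init__(self, target: dict, synchronous: bool):
        self.rows = {}
        self.target = target
        self.synchronous = synchronous
        self.pending = deque()  # updates not yet applied to the target (async mode)

    def update(self, key, value) -> None:
        self.rows[key] = value
        if self.synchronous:
            # Synchronous: the target is updated before the operation returns.
            self.target[key] = value
        else:
            # Asynchronous: remember the update and return immediately.
            self.pending.append((key, value))

    def flush(self) -> None:
        """Apply deferred updates to the target in the order they were made."""
        while self.pending:
            key, value = self.pending.popleft()
            self.target[key] = value


target_db = {}
table = CacheTable(target_db, synchronous=False)
table.update("order:7", "shipped")
print(target_db)  # {} : not yet propagated
table.flush()
print(target_db)  # {'order:7': 'shipped'}
```

The window between `update` and `flush` is where asynchronous mode gains throughput and where the cache and the target can temporarily disagree.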
Multiple cache granularity: Database level, Table level and Result-set caching
Much of a typical corporate database is historical and infrequently accessed, but some information, such as premium customers' data, must be instantly accessible.
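The premium-customer example suggests caching only the frequently needed rows of a table rather than the whole table. A minimal sketch follows, using two in-memory SQLite databases as stand-ins for the backend and for the lightweight cache. The `customers` table, its `tier` column, the sample rows and the `load_partial_table` helper are all hypothetical.

```python
import sqlite3

SCHEMA = "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, tier TEXT)"

# Stand-in for the backend database (hypothetical schema and data).
backend = sqlite3.connect(":memory:")
backend.execute(SCHEMA)
backend.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ada", "Platinum"), (2, "Bob", "Ordinary"), (3, "Cy", "Platinum")],
)

# Stand-in for the lightweight cache database on the application tier.
cache = sqlite3.connect(":memory:")
cache.execute(SCHEMA)


def load_partial_table(predicate: str, params: tuple) -> int:
    """Copy only the backend rows matching `predicate` into the cache table.

    `predicate` is assumed to come from trusted configuration, never user input,
    because it is spliced into the SQL text.
    """
    rows = backend.execute(
        f"SELECT id, name, tier FROM customers WHERE {predicate}", params
    ).fetchall()
    cache.executemany("INSERT OR REPLACE INTO customers VALUES (?, ?, ?)", rows)
    cache.commit()
    return len(rows)


print(load_partial_table("tier = ?", ("Platinum",)))  # 2
```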
Recovery for cached tables
After a system or power failure, all committed transactions on the cached tables should be recovered when the caching platform restarts.
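One common way to meet this requirement is a redo log. Every write and every commit is appended to a durable log before it is acknowledged. On restart, only the transactions that have a commit record are replayed. The sketch below makes several assumptions: the log is a JSON-lines file, the record fields `op`, `tx`, `key` and `value` are hypothetical, and checkpointing and log truncation are left out.

```python
import json
import os
import tempfile


def append_log(path: str, record: dict) -> None:
    """Append one record to the redo log and force it to stable storage."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()
        os.fsync(f.fileno())


def recover(path: str) -> dict:
    """Rebuild a cached table by replaying only committed transactions."""
    table, uncommitted = {}, {}
    if not os.path.exists(path):
        return table
    with open(path, encoding="utf-8") as f:
        for line in f:
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                break  # torn final record from the crash: ignore it and anything after
            if rec["op"] == "write":
                uncommitted.setdefault(rec["tx"], []).append((rec["key"], rec["value"]))
            elif rec["op"] == "commit":
                for key, value in uncommitted.pop(rec["tx"], []):
                    table[key] = value
    # Writes of transactions that never logged a commit record are dropped.
    return table


log_path = os.path.join(tempfile.mkdtemp(), "cache.log")
append_log(log_path, {"op": "write", "tx": 1, "key": "a", "value": 10})
append_log(log_path, {"op": "commit", "tx": 1})
append_log(log_path, {"op": "write", "tx": 2, "key": "b", "value": 20})  # crash before commit
print(recover(log_path))  # {'a': 10}
```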
Tools to validate the coherence of cache
With asynchronous update propagation, the caches on different nodes and the target database may diverge. Such divergence has to be resolved manually, so the caching solution should provide tools to identify mismatches and take corrective action if required.
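A coherence tool of this kind can be sketched by comparing the rows of a cached table with those of the target table, keyed by primary key. The sketch assumes both snapshots fit in memory as `{primary_key: row}` dictionaries, and the function names are hypothetical. Each row is reduced to a hash. Locally that is redundant, but it hints at how a real tool could compare nodes over the network by exchanging digests instead of full rows.

```python
import hashlib
import json


def row_digest(row: dict) -> str:
    """Stable hash of a row, so rows can be compared without shipping full values."""
    return hashlib.sha256(json.dumps(row, sort_keys=True).encode()).hexdigest()


def find_mismatches(cache_rows: dict, target_rows: dict) -> dict:
    """Compare two {primary_key: row} snapshots and classify the differences."""
    report = {"missing_in_cache": [], "missing_in_target": [], "different": []}
    for key in target_rows.keys() - cache_rows.keys():
        report["missing_in_cache"].append(key)
    for key in cache_rows.keys() - target_rows.keys():
        report["missing_in_target"].append(key)
    for key in cache_rows.keys() & target_rows.keys():
        if row_digest(cache_rows[key]) != row_digest(target_rows[key]):
            report["different"].append(key)
    for keys in report.values():
        keys.sort()  # deterministic output for a human reviewing the report
    return report


cache_snapshot = {1: {"qty": 5}, 2: {"qty": 7}}
target_snapshot = {1: {"qty": 5}, 2: {"qty": 8}, 3: {"qty": 1}}
print(find_mismatches(cache_snapshot, target_snapshot))
# {'missing_in_cache': [3], 'missing_in_target': [], 'different': [2]}
```

Deciding which side is correct for each reported key remains the manual, corrective step the requirement describes.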
Horizontally Scalable
Many solutions use clustering to increase availability and to balance load. The caching platform should work in a clustered environment spanning multiple nodes while keeping the cached data coherent across those nodes.
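One simple way to keep node caches coherent is write-through plus invalidation. A node that writes updates the shared target database and tells its peers to drop their copies, so the peers' next reads reload the fresh value. The sketch below is minimal and entirely in-process, and the `Node` class and `make_cluster` helper are hypothetical. A real cluster delivers invalidations over the network, usually asynchronously, and must cope with lost or delayed messages.

```python
class Node:
    """One caching node in a hypothetical cluster sharing a single target database."""

    def __init__(self, name: str, target: dict):
        self.name = name
        self.target = target
        self.cache = {}
        self.peers = []

    def read(self, key):
        if key not in self.cache:  # miss: load from the target database
            self.cache[key] = self.target[key]
        return self.cache[key]

    def write(self, key, value) -> None:
        self.target[key] = value   # write through to the target
        self.cache[key] = value
        for peer in self.peers:    # tell the other nodes their copy is stale
            peer.invalidate(key)

    def invalidate(self, key) -> None:
        self.cache.pop(key, None)


def make_cluster(target: dict, names):
    nodes = [Node(n, target) for n in names]
    for node in nodes:
        node.peers = [p for p in nodes if p is not node]
    return nodes


db = {"price": 100}
node_a, node_b = make_cluster(db, ["a", "b"])
print(node_b.read("price"))  # 100
node_a.write("price", 120)
print(node_b.read("price"))  # 120
```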
Transparent access to non-cached tables residing in the target database
The database cache should keep track of queries and route each one to the cache or to the origin database based on data locality, without any modification to application code.
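A routing decision based on data locality can be sketched as follows. Find the tables a statement references. If all of them are cached, send the statement to the cache; otherwise send it to the origin database. The `Router` class and its regular expression are hypothetical simplifications, and a real router would use a proper SQL parser. SQLite connections stand in for the cache and the origin database.

```python
import re
import sqlite3

# Naive extraction of table names; a real router would parse the SQL properly.
TABLE_REF = re.compile(
    r"\b(?:FROM|JOIN|INTO|UPDATE)\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE
)


class Router:
    """Send a statement to the cache only if every table it touches is cached."""

    def __init__(self, cache: sqlite3.Connection, backend: sqlite3.Connection, cached_tables):
        self.cache = cache
        self.backend = backend
        self.cached_tables = {t.lower() for t in cached_tables}

    def choose(self, sql: str) -> sqlite3.Connection:
        tables = {t.lower() for t in TABLE_REF.findall(sql)}
        # Statements with no recognised table (e.g. stored procedure calls) go to the backend.
        if tables and tables <= self.cached_tables:
            return self.cache
        return self.backend

    def execute(self, sql: str, params=()):
        return self.choose(sql).execute(sql, params)


cache_conn = sqlite3.connect(":memory:")
backend_conn = sqlite3.connect(":memory:")
router = Router(cache_conn, backend_conn, cached_tables=["customers"])
print(router.choose("SELECT * FROM customers WHERE id = 1") is cache_conn)  # True
print(router.choose("SELECT * FROM orders o JOIN customers c ON o.cid = c.id") is backend_conn)  # True
```

Because non-matching statements fall through to the origin database, the same rule also sends stored procedure calls to the target database, as the later requirement on standard interfaces asks.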
Transparent Failover
A failure of the caching platform should not cause a service outage; client connections should instead be routed to the target database.
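Client-side failover can be sketched as a connection wrapper. It uses the cache while the cache is healthy and switches to the target database when the cache stops responding. The wrapper must tell an ordinary SQL error from a dead cache, so it runs a trivial health-check query before deciding. `FailoverConnection`, `is_alive`, and the `open_cache` / `open_target` callables are hypothetical, and SQLite is used only for illustration. Retrying the failed statement on the target is only safe if that statement did not already take effect, and in-flight transactions are out of scope here.

```python
import sqlite3


def is_alive(conn) -> bool:
    """Cheap health check: can the connection still run a trivial query?"""
    try:
        conn.execute("SELECT 1")
        return True
    except sqlite3.Error:
        return False


class FailoverConnection:
    """Use the cache while it is healthy; otherwise switch to the target database."""

    def __init__(self, open_cache, open_target):
        self.open_target = open_target
        try:
            self.conn, self.on_cache = open_cache(), True
        except sqlite3.Error:
            self.conn, self.on_cache = open_target(), False  # cache down at connect time

    def execute(self, sql: str, params=()):
        try:
            return self.conn.execute(sql, params)
        except sqlite3.Error:
            # A genuine SQL error on a healthy cache, or a failing target, is re-raised.
            if not self.on_cache or is_alive(self.conn):
                raise
            # The cache itself is down: route this and later statements to the target.
            self.conn, self.on_cache = self.open_target(), False
            return self.conn.execute(sql, params)


cache_db = sqlite3.connect(":memory:")
target_db = sqlite3.connect(":memory:")
target_db.execute("CREATE TABLE t (x INTEGER)")
target_db.execute("INSERT INTO t VALUES (42)")
conn = FailoverConnection(lambda: cache_db, lambda: target_db)
cache_db.close()  # simulate a cache failure
print(conn.execute("SELECT x FROM t").fetchone())  # (42,)
```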
No or very few application changes for the caching solution
The caching solution should support standard interfaces such as JDBC and ODBC, so that applications work seamlessly without any code changes. It should also route all stored procedure calls to the target database, so that stored procedures do not need to be migrated.
Some products, such as memcached, are based on result-set caching and are best suited for read-only applications. CSQL Cache and TimesTen provide updateable, bi-directional caching at table-level granularity.
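Result-set caching can be illustrated with a short sketch, which also shows why it suits read-only workloads. Query results are cached under a key made of the SQL text and its parameters. Writes made through the wrapper clear the cache, but writes made directly on the database go unnoticed, so cached results can become stale. This shows only the idea: `ResultSetCache` is hypothetical, it is not memcached's client API, and SQLite stands in for the backend.

```python
import sqlite3


class ResultSetCache:
    """Cache query results keyed by (SQL text, parameters).

    Any write through this wrapper clears the whole cache, which keeps the
    sketch simple but coarse; parameters must be a hashable tuple.
    """

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.results = {}
        self.hits = 0

    def query(self, sql: str, params: tuple = ()):
        key = (sql, params)
        if key in self.results:
            self.hits += 1
        else:
            self.results[key] = self.conn.execute(sql, params).fetchall()
        return list(self.results[key])  # copy so callers cannot alter the cached list

    def write(self, sql: str, params: tuple = ()) -> None:
        self.conn.execute(sql, params)
        self.conn.commit()
        self.results.clear()  # any cached result set may now be stale


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (x INTEGER)")
db.execute("INSERT INTO t VALUES (1)")
rsc = ResultSetCache(db)
rsc.query("SELECT x FROM t")
rsc.query("SELECT x FROM t")
print(rsc.hits)  # 1
```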
Database Caching products
- CSQL Cache: to cache tables from MySQL, Postgres and Oracle.
- memcached: to cache result sets of queries.
- TimesTen: to cache Oracle tables.
- SafePeak: automated caching of result sets of queries and procedures from SQL Server, with automated cache eviction for full data correctness.