Single-instance storage
Single-instance storage (SIS) is a system's ability to keep one copy of content that multiple users or computers share. It is a means of eliminating data duplication and increasing efficiency. SIS is frequently implemented in file systems, e-mail server software, data backup and other storage-related solutions.
In the case of an e-mail server, single-instance storage means that a single copy of a message is held within the server's database whilst individual mailboxes access the content through a reference pointer. A common misconception is that the primary benefit of single-instance storage in mail server solutions is a reduction in disk space requirements. In fact, its primary benefit is to greatly improve the delivery efficiency of messages sent to large distribution lists. In a mail server scenario, the disk space savings from single-instance storage are transient and drop off very quickly over time.
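The reference-pointer arrangement described above can be sketched roughly as follows. This is a minimal illustration only, not the design of any particular mail server: the class `MailStore`, its fields and its reference-counting cleanup are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass
class MailStore:
    """Toy single-instance message store (hypothetical, for illustration)."""
    messages: dict = field(default_factory=dict)   # message id -> body, stored once
    refcount: dict = field(default_factory=dict)   # message id -> number of mailbox references
    mailboxes: dict = field(default_factory=dict)  # user -> list of message ids
    next_id: int = 0

    def deliver(self, body, recipients):
        """Write the body once, then give every recipient a reference to it."""
        message_id = self.next_id
        self.next_id += 1
        self.messages[message_id] = body
        self.refcount[message_id] = 0
        for user in recipients:
            self.mailboxes.setdefault(user, []).append(message_id)
            self.refcount[message_id] += 1
        return message_id

    def read(self, user):
        """Resolve a mailbox's references back to message bodies."""
        return [self.messages[m] for m in self.mailboxes.get(user, [])]

    def delete(self, user, message_id):
        """Drop one reference; free the body when no mailbox refers to it."""
        self.mailboxes[user].remove(message_id)
        self.refcount[message_id] -= 1
        if self.refcount[message_id] == 0:
            del self.messages[message_id]
            del self.refcount[message_id]


store = MailStore()
mid = store.deliver("Quarterly report", ["ann", "bob", "cy"])
print(len(store.messages), store.refcount[mid])
```

Delivery to a large distribution list costs one body write plus one small reference per recipient, which is the delivery-efficiency benefit the paragraph describes. The sketch also hints at why the space savings are transient: once recipients edit, forward or delete their copies independently, fewer mailboxes share the same stored body.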
When used in conjunction with a backup solution, single-instance storage can reduce the quantity of archive media required, since it avoids storing duplicate copies of the same file. Identical files, such as operating system files, are often installed on multiple computers. With solutions that use single-instance storage, only one copy of a file is written to the backup media, which reduces the space required. This becomes more important when the storage is offsite and in the cloud, for example with a storage-as-a-service offering such as Amazon S3. In such cases, it has been reported that deduplication can reduce storage costs, bandwidth costs and backup windows by a ratio of up to 10:1.
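A minimal sketch of the backup case, assuming a content-addressed store keyed by SHA-256 digests. The class `BackupStore`, its method names and the choice of hash are assumptions made for illustration; no specific backup product is being described.

```python
import hashlib


class BackupStore:
    """Toy single-instance backup target (hypothetical, for illustration)."""

    def __init__(self):
        self.blobs = {}     # digest -> file contents, written at most once
        self.manifest = {}  # (machine, path) -> digest

    def backup(self, machine, path, data):
        """Record the file; write its contents only if they are new."""
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:
            self.blobs[digest] = data
        self.manifest[(machine, path)] = digest
        return digest

    def restore(self, machine, path):
        """Look up the shared contents through the manifest entry."""
        return self.blobs[self.manifest[(machine, path)]]

    def logical_bytes(self):
        """Size of the backup set if every file were stored separately."""
        return sum(len(self.blobs[d]) for d in self.manifest.values())

    def stored_bytes(self):
        """Size actually written to the backup media."""
        return sum(len(b) for b in self.blobs.values())


store = BackupStore()
system_file = b"\x00" * 1000  # stands in for an identical operating system file
for machine in ["pc1", "pc2", "pc3", "pc4"]:
    store.backup(machine, "C:/system/kernel.bin", system_file)
store.backup("pc1", "C:/users/ann/notes.txt", b"unique notes")

print(store.logical_bytes(), store.stored_bytes())
```

The ratio of logical to stored bytes depends entirely on how much of the data is duplicated, so the figures in this example say nothing about the ratios reported for real deployments.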
Novell GroupWise was built on single-instance storage, which accounts for the large data stores that GroupWise is able to achieve.
ISO CD/DVD image files can be optimized with SIS: if a CD/DVD compilation contains enough duplicated files, its size can be reduced so that it fits onto smaller media.
SIS is related to system-wide duplicate file search and multiple file instance detection tools such as the P2P application BearShare (versions 5.n and below), but differs in that SIS reduces storage utilization automatically and creates and retains symbolic linkages, whereas BearShare allows for manual deletion of duplicates and of the associated user-level file system links (Windows Explorer-type icon links).
Microsoft
In the United States, Microsoft has a patent related to Single Instance Storage.
Single Instance Storage (SIS) was introduced with the Remote Installation Services feature of Windows 2000 Server. A typical server might hold ten or more unique installation configurations (perhaps with different drivers or software suites), but perhaps only 20% of the data may be unique between configurations. Microsoft states that "SIS works by searching a hard disk volume to identify duplicate files. When SIS finds identical files, it saves one copy of the file to a central repository, called the SIS Common Store, and replaces other copies with pointers to the stored versions." Files are compared solely by their hashes; files with different names or dates can be consolidated so long as the data itself is identical. Windows Server 2003 Standard Edition has SIS capabilities, but they are limited to OEM OS system installs.
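The scan-and-replace process in the quoted description can be approximated with the sketch below. It is not Microsoft's implementation: SHA-256 is an assumed hash (the source does not name one), ordinary hard links stand in for the pointers that replace duplicate copies, and the function names `file_digest` and `consolidate` are hypothetical.

```python
import hashlib
import os
import shutil


def file_digest(path, chunk_size=1 << 20):
    """Hash a file's contents only; its name and dates play no part."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def consolidate(root, common_store):
    """Keep one copy per distinct content in common_store and hard-link
    every original path to it. Returns the number of duplicates replaced.

    Assumes root and common_store are on the same file system, because
    hard links cannot cross file systems.
    """
    os.makedirs(common_store, exist_ok=True)
    store_abs = os.path.abspath(common_store)
    replaced = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into the common store if it lives under root.
        dirnames[:] = [d for d in dirnames
                       if os.path.abspath(os.path.join(dirpath, d)) != store_abs]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            stored = os.path.join(common_store, file_digest(path))
            if os.path.exists(stored):
                if os.path.samefile(path, stored):
                    continue  # already consolidated on an earlier run
                os.remove(path)  # duplicate content: drop this copy
                replaced += 1
            else:
                shutil.move(path, stored)  # first copy becomes the stored one
            os.link(stored, path)  # the original path now points at the store
    return replaced
```

Calling `consolidate("images", "common_store")` on a tree of installation images would leave every path readable as before while identical files share one set of disk blocks. Unlike this sketch, a production system also has to cope with a consolidated file being modified through one of its paths, since a plain hard link would propagate that change to every other path.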
The file-based Windows Imaging Format introduced in Windows Vista also supports single-instance storage. Single-instance storage has been a feature of Microsoft Exchange Server since version 4.0 and is also present in Microsoft's Windows Home Server. In Exchange 2007 it deduplicates attachments only, and it was dropped completely in Microsoft Exchange Server 2010. It is protected by several patent applications, including United States Patent numbers 6389433 and 6477544.
Microsoft announced Windows Storage Server 2008 (WSS2008) with Single Instance Storage on June 1, 2009, and stated that this feature is not available on Windows Server 2008.
See also
- WinFS
- Peer-to-peer file sharing
- Data deduplication
- Capacity optimization