Filesystem API
A file system API is an application programming interface through which a utility or user program requests services of a file system. An operating system may provide abstractions for accessing different file systems transparently.
Some file system APIs may also include interfaces for maintenance operations, such as creating or initializing a file system, verifying the file system for integrity, and defragmentation.
Each operating system includes the APIs needed for the filesystems it supports. Microsoft Windows has file system APIs for NTFS and several FAT file systems. Linux systems can include APIs for ext2, ext3, ReiserFS, and Btrfs, to name a few.
History
Some early operating systems were capable of handling only tape and disk file systems. These provided the most basic of interfaces with:
- Write, read and position
More coordination such as device allocation and deallocation required the addition of:
- Open and close
As file systems provided more services, more interfaces were defined:
- Meta data management
- File system maintenance
As the number of filesystem types, hierarchy structures and supported media grew, additional features required some specialized functions:
- Directory management
- Data structure management
- Record management
- Non-data operations
Multi-user systems required APIs for:
- Sharing
- Restricting access
- Encryption
write, read and position
Writing user data to a file system is provided for use directly by the user program or the run-time library. The run-time library for some programming languages may provide type conversion, formatting and blocking. Some file systems provide identification of records by key and may include re-writing an existing record. This operation is sometimes called PUT, or PUTX if the record exists (see http://www.prycroft6.com.au/misc/download/GC26-3875-0_MVS_DataMgmtSrvcsGde_Aug78OCR.pdf).
Reading user data, sometimes called GET, may include a direction (forward or reverse) or, in the case of a keyed file system, a specific key. As with writing, run-time libraries may intercede for the user program.
Positioning includes adjusting the location of the next record. This may include skipping forward or reverse as well as positioning to the beginning or end of the file.
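The three basic operations can be sketched with Python's thin wrappers over the operating system's descriptor calls. This is only an illustration: the function name `write_read_position_demo` and the 8-byte record layout are assumptions made for the example, not part of any particular file system API.

```python
import os
import tempfile

def write_read_position_demo():
    """Write three fixed-length records, then reposition and read two back."""
    fd, path = tempfile.mkstemp()  # temporary file opened for reading and writing
    try:
        # Write: store three 8-byte records one after another.
        for record in (b"record-0", b"record-1", b"record-2"):
            os.write(fd, record)

        # Position: move to the start of the second record (byte offset 8).
        os.lseek(fd, 8, os.SEEK_SET)
        # Read: fetch exactly one record from the current position.
        second = os.read(fd, 8)

        # Position relative to the end of the file, then read the last record.
        os.lseek(fd, -8, os.SEEK_END)
        last = os.read(fd, 8)
        return second, last
    finally:
        os.close(fd)
        os.remove(path)
```

Higher-level run-time libraries typically wrap these calls and add buffering and formatting on top.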
open and close
The open API may be explicitly requested or implicitly invoked upon the issuance of the first operation by a process on an object. It may cause the mounting of removable media, establishing a connection to another host and validating the location and accessibility of the object. It updates system structures to indicate that the object is in use.
Usual requirements for requesting access to a file system object include:
- The object which is to be accessed (file, directory, media and location)
- The intended type of operations to be performed after the open (reads, updates, deletions)
Additional information may be necessary, for example a password.
The request may also declare that other processes may access the same object while the opening process is using it (sharing); whether this is allowed may depend on the intent of the other process. Conversely, it may declare that no other process may access the object regardless of the other process's intent (exclusive use).
These are requested via a programming language library which may provide coordination among modules in the process in addition to forwarding the request to the file system.
It must be expected that something may go wrong during the processing of the open:
- The object or intent may be improperly specified (the name may include an unacceptable character or the intent may be unrecognized).
- The process may be prohibited from accessing the object (it may be only accessible by a group or specific user).
- The file system may be unable to create or update structures required to coordinate activities among users.
- In the case of a new (or replacement) object there may not be sufficient capacity on the media.
Depending on the programming language, additional specifications in the open may establish the modules to handle these conditions. Some libraries specify a library module to the file system, permitting analysis should the opening program be unable to perform any meaningful action as a result of a failure. For example, if the failure occurs on the attempt to open a necessary input file, the only action may be to report the failure and abort the program. Some languages simply return a code indicating the type of failure, which the program must always check before deciding what to report and whether it can continue.
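As a rough illustration of declaring intent on open and checking the resulting failure code, the following sketch uses Python's `os.open`. The helper names `try_open` and `open_demo` and the file name `input.dat` are hypothetical. Note that `O_EXCL` only guards creation of a new object; it is not the sharing or exclusive-use declaration described above, which is platform specific.

```python
import errno
import os
import tempfile

def try_open(path, flags):
    """Attempt an open; return (descriptor, None) or (None, symbolic error name)."""
    try:
        return os.open(path, flags, 0o600), None
    except OSError as exc:
        return None, errno.errorcode.get(exc.errno, str(exc.errno))

def open_demo():
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "input.dat")
    results = []

    # Intent: read an object that does not exist yet, so the open fails.
    fd, err = try_open(path, os.O_RDONLY)
    results.append(err)

    # Intent: create a new object, failing if it already exists.
    fd, err = try_open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    results.append(err)
    os.close(fd)

    # Same request again: the object now exists, so the open is refused.
    fd, err = try_open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    results.append(err)

    os.remove(path)
    os.rmdir(workdir)
    return results
```

Here the caller inspects the returned code and decides whether to continue, which corresponds to the "return a code" style of failure handling; Python's own default is to raise an exception instead.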
Close may cause un-mounting or ejecting removable media and will update library and file system structures to indicate that the object is no longer in use.
The minimal specification to the close references the object. Additionally, some file systems allow specifying a disposition of the object, which may indicate that the object is to be discarded and no longer be part of the file system.
Similar to the open, it must be expected that something may go wrong:
- The specification of the object may be incorrect.
- There may not be sufficient capacity on the media to save any data being buffered or to output a structure indicating that the object was successfully updated.
- A device error may occur on the media where the object is stored while writing buffered data, writing the completion structure, or updating metadata related to the object (for example last access time).
- A specification to release the object may be inconsistent with the fact that other processes are still using the object.
Considerations for handling a failure are similar to those of the open.
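A minimal sketch of a close that can fail, again using Python's descriptor-level calls. The function name `close_demo` is hypothetical, and the example covers only the first failure case listed above (an incorrect object specification), provoked here by closing the same descriptor twice.

```python
import errno
import os
import tempfile

def close_demo():
    """Close a descriptor twice and report the error from the second attempt."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"buffered data")
        os.fsync(fd)      # ask the operating system to push buffered data to the media
        os.close(fd)      # first close succeeds
        try:
            os.close(fd)  # the descriptor no longer refers to an open object
        except OSError as exc:
            return errno.errorcode.get(exc.errno)
        return None
    finally:
        os.remove(path)
```

Calling `os.fsync` before the close surfaces capacity or device errors while the program can still react to them, rather than leaving them to be reported (or lost) at close time.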
meta data management
Information about the data in a file is called metadata. Some of the metadata is maintained by the filesystem, for example the last-modification date (and various other dates depending on the filesystem), the location of the beginning of the file, the size of the file, and whether the filesystem backup utility has saved the current version of the file. These items cannot usually be altered by a user program.
Additional metadata supported by some file systems may include the owner of the file, the group to which the file belongs, permissions and/or access control (i.e. what access and updates various users or groups may perform), and whether the file is normally visible when the directory is listed. These items are usually modifiable by file system utilities, which may be executed by the owner.
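The distinction between metadata the file system maintains and metadata the owner may change can be illustrated with Python's `os.stat` and `os.chmod`. The function name `metadata_demo` is an assumption for this sketch, and which fields are available or changeable varies by file system and platform.

```python
import os
import stat
import tempfile

def metadata_demo():
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"12345")
        os.close(fd)

        # Metadata maintained by the file system itself.
        info = os.stat(path)
        size = info.st_size    # size of the file in bytes
        mtime = info.st_mtime  # last-modification time, seconds since the epoch

        # Metadata the owner may change: make the file read-only.
        os.chmod(path, stat.S_IRUSR)
        read_only = not (os.stat(path).st_mode & stat.S_IWUSR)
        return size, read_only, mtime > 0
    finally:
        # Restore write permission so the file can be removed on every platform.
        os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
        os.remove(path)
```

The program never sets the size or modification time directly; they change as a side effect of the write.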
Some applications store more metadata. For images, the metadata may include the camera model and settings used to take the photo. For audio files, the metadata may include the album, the artist who made the recording, and comments about the recording which may be specific to a particular copy of the file (i.e. different copies of the same recording may have different comments, as updated by the owner of the file). Documents may include items like checked-by, approved-by, etc.
directory management
Renaming a file, moving a file (or a subdirectory) from one directory to another, and deleting a file are examples of the operations provided by the file system for the management of directories.
Metadata operations such as permitting or restricting access to a directory by various users or groups of users are usually included.
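The rename, move and delete operations mentioned above might look as follows through Python's `os` module. The directory and file names (`inbox`, `archive`, `draft.txt`) and the function name `directory_demo` are made up for the example.

```python
import os
import tempfile

def directory_demo():
    root = tempfile.mkdtemp()
    src_dir = os.path.join(root, "inbox")
    dst_dir = os.path.join(root, "archive")
    os.mkdir(src_dir)
    os.mkdir(dst_dir)

    original = os.path.join(src_dir, "draft.txt")
    with open(original, "w") as f:
        f.write("hello")

    # Rename within the same directory.
    renamed = os.path.join(src_dir, "final.txt")
    os.rename(original, renamed)

    # Move to another directory; on the same file system a rename suffices.
    moved = os.path.join(dst_dir, "final.txt")
    os.rename(renamed, moved)
    listing = (sorted(os.listdir(src_dir)), sorted(os.listdir(dst_dir)))

    # Delete the file, then the now-empty directories.
    os.remove(moved)
    os.rmdir(src_dir)
    os.rmdir(dst_dir)
    os.rmdir(root)
    return listing
```

A move between different file systems generally cannot be done by a rename alone and requires copying the data and then deleting the original.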
filesystem maintenance
As a filesystem is used, directories, files and records may be added, deleted or modified. This usually causes inefficiencies in the underlying data structures, such as logically sequential blocks distributed across the media in a way that causes excessive repositioning, or partially used and even empty blocks included in linked structures. Incomplete structures or other inconsistencies may be caused by device or media errors, inadequate time between detection of impending loss of power and actual power loss, improper system shutdown or media removal, and on very rare occasions filesystem coding errors.
Specialized routines in the file system are included to optimize or repair these structures. They are not usually invoked by the user directly but triggered within the filesystem itself. Internal counters, such as the number of levels of structures or the number of inserted objects, may be compared against thresholds. Crossing a threshold may cause user access to a specific structure to be suspended (usually to the displeasure of the affected users), or the routines may be started as low-priority asynchronous tasks or deferred to a time of low user activity. Sometimes these routines are invoked or scheduled by the system manager, as in the case of defragmentation.
Kernel-level API
The API is "kernel-level" when the kernel not only provides the interfaces for filesystem developers but is also the space in which the filesystem code resides.
It differs from the old scheme in that the kernel uses its own facilities to talk with the filesystem driver and vice versa, as opposed to the kernel handling the filesystem layout and the filesystem directly accessing the hardware.
It is not the cleanest scheme, but it avoids the major rewrites that the old scheme required.
With modular kernels it allows filesystems to be added like any other kernel module, even third-party ones. With non-modular kernels, however, it requires the kernel to be recompiled with the new filesystem code (and in closed-source kernels, this makes third-party filesystems impossible).
Unixes and Unix-like systems such as Linux have used this modular scheme.
There is a variation of this scheme used in MS-DOS (DOS 4.0 onward) and compatibles to support CD-ROM and network filesystems. Instead of adding code to the kernel, as in the old scheme, or using kernel facilities, as in the kernel-based scheme, it traps all calls to a file and determines whether each should be redirected to the kernel's equivalent function or handled by the specific filesystem driver; the filesystem driver then "directly" accesses the disk contents using low-level BIOS functions.
Driver-based API
The API is "driver-based" when the kernel provides facilities but the filesystem code resides entirely outside the kernel (not even as a module of a modular kernel).
It is a cleaner scheme because the filesystem code is totally independent: it allows filesystems to be created for closed-source kernels and to be added to or removed from the system while it is running.
Examples of this scheme are the respective Installable File Systems (IFSs) of Windows NT and OS/2.
Mixed kernel-driver-based API
In this API all filesystems are in the kernel, as in kernel-based APIs, but calls to them are automatically trapped by the OS through another API that is driver-based.
This scheme was used in Windows 3.1 to provide a cached FAT filesystem driver running in 32-bit protected mode (VFAT) that bypassed the DOS FAT driver in the kernel (MSDOS.SYS) completely. It was later used in the Windows 9x series (95, 98 and Me) for VFAT, the ISO9660 filesystem driver (along with Joliet), network shares, and third-party filesystem drivers. It also added the LFN API to the original DOS APIs: IFS drivers can not only intercept the existing DOS file APIs but also add new ones from within the 32-bit protected-mode executable.
However, that API was not completely documented, and third parties found themselves in a "make-it-by-yourself" scenario even worse than with kernel-based APIs.
User space API
The API is in user space when the filesystem does not directly use kernel facilities but accesses disks using high-level operating system functions and provides functions in a library that a series of utilities use to access the filesystem.
This is useful for handling disk images.
The advantage is that a filesystem can be made portable between operating systems as the high-level operating system functions it uses can be as common as ANSI C, but the disadvantage is that the API is unique to each application that implements one.
Examples of this scheme are the hfsutils and the adflib.
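To give a feel for the user-space approach, the sketch below reads fixed-size sectors from a disk image using nothing but ordinary file I/O. It is a toy: the 512-byte sector size, the function names `read_sector` and `make_demo_image`, and the image contents are assumptions for illustration and do not reflect the actual interfaces of hfsutils or adflib.

```python
import os
import tempfile

SECTOR_SIZE = 512  # assumed sector size for this toy image

def read_sector(image_path, index):
    """Return one sector of a disk image using only ordinary file I/O."""
    with open(image_path, "rb") as image:
        image.seek(index * SECTOR_SIZE)
        data = image.read(SECTOR_SIZE)
    if len(data) != SECTOR_SIZE:
        raise ValueError("sector %d is outside the image" % index)
    return data

def make_demo_image():
    """Create a two-sector image file: one sector of 'A' bytes, one of 'B' bytes."""
    fd, path = tempfile.mkstemp(suffix=".img")
    with os.fdopen(fd, "wb") as image:
        image.write(b"A" * SECTOR_SIZE + b"B" * SECTOR_SIZE)
    return path
```

A real user-space filesystem library would build directory and file parsing on top of a primitive like `read_sector`, which is why such code ports readily between operating systems.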
Interoperability between filesystem APIs
As all filesystems (at least the disk ones) need equivalent functions provided by the kernel, it is possible to easily port filesystem code from one API to another, even if they are of different types.
For example, the ext2 driver for OS/2 is simply a wrapper from the Linux VFS to the OS/2 IFS around the Linux kernel-based ext2 code, and the HFS driver for OS/2 is a port of hfsutils to the OS/2 IFS. There also exists a project that uses a Windows NT IFS driver to make NTFS work under Linux.
See also
- List of file systems
- Comparison of file systems
- File system
- Filename extension
- Filing Open Service Interface Definition (OSID)
- Installable File System (IFS)
- Virtual file system