SFV
Simple file verification (SFV) is a file format for storing CRC32 checksums of files so that their integrity can be verified. SFV is used to verify that a file has not been corrupted, but it does not otherwise verify the file's authenticity. SFV files usually use the .sfv file extension.
Files can become corrupted for a variety of reasons, including faulty storage media, errors in transmission, write errors during copying or moving, and software bugs. SFV verification checks that a file has not been corrupted by comparing the file's CRC hash value to a previously calculated value. Due to the nature of hash functions, hash collisions may result in false positives (a corrupted file that still passes verification), but the likelihood of a collision is usually negligible with random corruption. The number of possible checksums is large but finite (2^32 for CRC32), so with any checksum scheme many files share the same checksum; nevertheless, the probability that a corrupted file has the same checksum as its original is exceedingly small unless the file was deliberately constructed to preserve the checksum.
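The comparison described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code of any particular SFV utility: the helper names `crc32_hex` and `matches` are hypothetical, and the sketch assumes the CRC32 variant implemented by Python's standard `zlib` module.

```python
import io
import zlib


def crc32_hex(stream, chunk_size=65536):
    """Return the CRC32 of a binary stream as 8 lowercase hex digits."""
    crc = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Feed the running value back in so large files are never
        # loaded into memory all at once.
        crc = zlib.crc32(chunk, crc)
    return f"{crc & 0xFFFFFFFF:08x}"


def matches(stream, expected_hex):
    """Compare a freshly computed CRC32 with a previously stored value."""
    return crc32_hex(stream) == expected_hex.lower()


# b"123456789" is the conventional CRC-32 test input.
print(crc32_hex(io.BytesIO(b"123456789")))  # cbf43926
```

For a file on disk, the same functions can be given a handle opened with `open(path, "rb")`.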
SFV cannot be used to verify the authenticity of files, as CRC32 is not a collision-resistant hash function; even if the hash sum file is not tampered with, it is computationally trivial for an attacker to cause deliberate hash collisions, meaning that a malicious change in the file is not detected by a hash comparison. In cryptography, this attack is called a collision attack. For this reason, the md5sum and sha1sum utilities, which use the MD5 and SHA-1 cryptographic hash functions respectively, are often preferred on Unix operating systems.
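One way to see why CRC32 offers no protection against deliberate changes is its algebraic structure. The sketch below is an illustration added here, not something from an SFV tool: it checks that for three equal-length inputs, the CRC32 of their XOR equals the XOR of their CRC32 values. An attacker can exploit this kind of predictability to adjust a modified file until its checksum matches the original. The helper `xor_bytes` and the sample inputs are hypothetical.

```python
import zlib


def xor_bytes(*blocks):
    """XOR several equal-length byte strings together."""
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all inputs must have the same length")
    out = bytearray(length)
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


# Three arbitrary 16-byte inputs.
a = b"file contents A!"
b = b"file contents B?"
c = b"other 16 bytes.."

combined = zlib.crc32(xor_bytes(a, b, c))
predicted = zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)

# The checksum of the combined data is predictable from the individual
# checksums alone, which a cryptographic hash is designed to prevent.
print(combined == predicted)  # True
```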
Even a single-bit error causes both SFV's CRC and md5sum's cryptographic hash to fail, requiring the entire file to be re-fetched. The Parchive and rsync utilities are often preferred when a file may have been accidentally corrupted in transmission, since they can correct common small errors with a much shorter download.
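The effect of a single-bit error can be demonstrated with a short sketch. It uses Python's `zlib.crc32`, and the `flip_bit` helper and sample payload are hypothetical. Every possible one-bit corruption of the payload changes the checksum, but the mismatch alone does not say which bit is wrong, which is why the whole file has to be fetched again.

```python
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


original = b"example payload"
stored_crc = zlib.crc32(original)

# Try every possible single-bit error and count how many are detected.
detected = sum(
    zlib.crc32(flip_bit(original, i)) != stored_crc
    for i in range(len(original) * 8)
)
print(detected == len(original) * 8)  # True: each one changes the CRC
```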
Despite the weaknesses of the SFV format, it remains popular because SFV utilities take relatively little time to calculate CRC32 checksums compared with the time needed to calculate cryptographic hashes such as MD5 or SHA-1.
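Readers who want to check the speed difference on their own machine could use a rough timing sketch like the one below. The function name `time_checksums` and the 8 MiB test buffer are arbitrary choices for illustration, and the measured ratios depend heavily on the hardware and library build, so no expected figures are given.

```python
import hashlib
import time
import zlib


def time_checksums(data: bytes, repeats: int = 5) -> dict:
    """Return the best wall-clock time in seconds for each algorithm."""
    algorithms = {
        "crc32": lambda d: zlib.crc32(d),
        "md5": lambda d: hashlib.md5(d).digest(),
        "sha1": lambda d: hashlib.sha1(d).digest(),
    }
    results = {}
    for name, func in algorithms.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            func(data)
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results


if __name__ == "__main__":
    payload = bytes(8 * 1024 * 1024)  # 8 MiB of zero bytes
    for name, seconds in time_checksums(payload).items():
        print(f"{name}: {seconds * 1000:.2f} ms")
```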
SFV uses a plain text file containing one line for each file and its checksum, in the format FILENAME<spaces>CHECKSUM. Any line starting with a semicolon (';') is treated as a comment and is ignored for the purposes of file verification. The delimiter between the filename and the checksum is always one or more spaces; tabs are never used. A sample SFV file is:
file_one.zip c45ad668
file_two.zip 7903b8e6
file_three.zip e99a65fb
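A parser for this format can be sketched as follows. The function names `parse_sfv` and `format_sfv_line` are hypothetical. Splitting each line at its last space, so that filenames containing spaces still parse, is an assumption of this sketch and is not stated in the format description above.

```python
import zlib


def parse_sfv(text: str) -> list:
    """Parse SFV text into a list of (filename, checksum) pairs."""
    entries = []
    for raw_line in text.splitlines():
        line = raw_line.rstrip()
        # Skip blank lines and comment lines starting with a semicolon.
        if not line or line.startswith(";"):
            continue
        # Assumption: the checksum is the last space-separated field.
        parts = line.rsplit(" ", 1)
        if len(parts) != 2:
            raise ValueError(f"malformed SFV line: {raw_line!r}")
        name, checksum = parts
        entries.append((name.rstrip(" "), int(checksum, 16)))
    return entries


def format_sfv_line(name: str, data: bytes) -> str:
    """Build one SFV line for a file name and its contents."""
    return f"{name} {zlib.crc32(data) & 0xFFFFFFFF:08x}"


SAMPLE = """\
; comment lines start with a semicolon
file_one.zip c45ad668
file_two.zip 7903b8e6
file_three.zip e99a65fb
"""

for name, crc in parse_sfv(SAMPLE):
    print(name, f"{crc:08x}")
```

To verify a file, each parsed checksum would be compared with the CRC32 recomputed from that file's current contents.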
See also
- File verification
- Comparison of file verification software
- Cyclic redundancy check (CRC)
External links
- Online SFV Calculator
- QuickSFV SFV checksum verifier (Windows and Linux)
- wxChecksums - Open-source Windows/Linux application
- Check SFV - SFV software for UNIX systems
- checkSum+, Mac OS X, MD5 compatible, free
- Checksums calculator - a checksums calculator for Windows, Linux, and Mac OS X
Windows only
- RapidCRC - Freeware application
- Advanced CheckSum Verifier - SFV and MD5 utility
- AmoK SFV Utility - CRC32 and MD5 Compatible
- SFV Checker
- SFVManager
- SlavaSoft FSUM - Fast File Integrity Checker
- HashCheck Shell Extension - SFV, MD4, MD5, SHA1 (Multi-Language)
- Total Commander - supports creation and verification of SFV files
- hkSFV - supports creation and verification of SFV files (crashes on massive SFV files check)
- DySFV - Open Source (Free) application (the best for massive SFV files check)