Data validation
In computer science, data validation is the process of ensuring that a program operates on clean, correct and useful data. It uses routines, often called "validation rules" or "check routines", that check for correctness, meaningfulness, and security of data that are input to the system. The rules may be implemented through the automated facilities of a data dictionary, or by the inclusion of explicit application program validation logic.
For business applications, data validation can be defined through declarative data integrity rules, or procedure-based business rules. Data that does not conform to these rules will negatively affect business process execution. Therefore, data validation should start with the business process definition and the set of business rules within this process. Rules can be collected through the requirements capture exercise.
The simplest data validation verifies that the characters provided come from a valid set. For example, telephone numbers should include the digits and possibly the characters +, -, (, and ) (plus, minus, and brackets). A more sophisticated data validation routine would check that the user had entered a valid country code, i.e., that the number of digits entered matched the convention for the country or area specified.
Incorrect data validation can lead to data corruption or a security vulnerability. Data validation checks that data are valid, sensible, reasonable, and secure before they are processed.
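The telephone number example above can be sketched in Python as a membership test against a set of permitted characters. The function name is hypothetical, and accepting spaces is an added assumption not stated in the text.

```python
# Characters named in the text, plus a space (an assumption for readability).
ALLOWED_PHONE_CHARS = set("0123456789+-() ")

def has_only_phone_chars(value: str) -> bool:
    """Return True if value is non-empty and uses only permitted characters."""
    return bool(value) and all(ch in ALLOWED_PHONE_CHARS for ch in value)
```

This only verifies the character set. It does not check that the country code or the number of digits is valid, which is the more sophisticated routine the text describes.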
Validation methods
Allowed character checks
- Checks that ascertain that only expected characters are present in a field. For example, a numeric field may only allow the digits 0-9, the decimal point and perhaps a minus sign or commas. A text field such as a personal name might disallow characters such as < and >, as they could be evidence of a markup-based security attack. An e-mail address might require exactly one @ sign and various other structural details. Regular expressions are effective ways of implementing such checks. (See also data type checks below.)
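As an illustration of an allowed character check, the following Python sketch uses a regular expression for a numeric field and a simple count for the @ sign in an e-mail address. The pattern is one possible choice: it assumes an optional leading minus sign and at most one decimal point, and it does not accept commas.

```python
import re

# Optional minus sign, digits, and an optional fractional part (assumed format).
NUMERIC_FIELD = re.compile(r"-?[0-9]+(\.[0-9]+)?")

def is_numeric_field(text: str) -> bool:
    """Return True if the whole string matches the assumed numeric format."""
    return NUMERIC_FIELD.fullmatch(text) is not None

def has_single_at_sign(text: str) -> bool:
    """Return True if the string contains exactly one @ character."""
    return text.count("@") == 1
```

The @ test covers only the one structural detail named above. A real e-mail check would need further rules.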
Batch totals
- Checks for missing records. Numerical fields may be added together for all records in a batch. The batch total is entered and the computer checks that the total is correct, e.g., add the 'Total Cost' field of a number of transactions together.
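A batch total comparison might look like the following Python sketch. The field name total_cost and the use of strings for amounts are assumptions made for the example.

```python
from decimal import Decimal

def batch_total_matches(records, entered_total, field="total_cost"):
    """Compare the sum of one field across a batch with the total entered by hand."""
    computed = sum((Decimal(r[field]) for r in records), Decimal("0"))
    return computed == Decimal(entered_total)
```

If a record is missing from the batch, the computed sum no longer equals the entered total and the check fails. Decimal is used here so that currency amounts are not affected by binary floating-point rounding.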
Cardinality check
- Checks that a record has a valid number of related records. For example, if a Contact record is classified as a Customer, it must have at least one associated Order (Cardinality > 0). If no order exists for a "customer" record, then either the record must be changed to "seed" or the order must be created. This type of rule can be complicated by additional conditions. For example, if a contact record in a Payroll database is marked as "former employee", then this record must not have any associated salary payments after the date on which the employee left the organisation (Cardinality = 0).
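The Customer and Order rule can be sketched in Python as follows. The dictionary keys id, type and contact_id are hypothetical names chosen for the example.

```python
def customers_without_orders(contacts, orders):
    """Return ids of contacts classified as Customer that have no associated order."""
    contacts_with_orders = {order["contact_id"] for order in orders}
    return [
        contact["id"]
        for contact in contacts
        if contact["type"] == "Customer" and contact["id"] not in contacts_with_orders
    ]
```

Each id returned violates the rule Cardinality > 0 and would need to be reclassified or given an order.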
Check digits
- Used for numerical data. An extra digit, calculated from the other digits, is added to a number. The computer repeats this calculation when data are entered and compares the result with the digit supplied. For example, the last digit of a 13-digit ISBN for a book is a check digit calculated modulo 10 (the older 10-digit ISBN uses modulo 11).
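For a 13-digit ISBN, the check digit is calculated modulo 10 from the first twelve digits, which are weighted alternately by 1 and 3. A minimal Python sketch, with hypothetical function names, might be:

```python
def isbn13_check_digit(first12: str) -> int:
    """Compute the ISBN-13 check digit from the first twelve digits."""
    if len(first12) != 12 or not (first12.isascii() and first12.isdigit()):
        raise ValueError("expected exactly 12 digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return (10 - total % 10) % 10

def is_valid_isbn13(isbn: str) -> bool:
    """Return True if the final digit matches the computed check digit."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    return isbn13_check_digit(digits[:12]) == int(digits[12])
```

For the ISBN 978-0-306-40615-7 the weighted sum of the first twelve digits is 93, so the check digit is (10 - 3) mod 10 = 7, which matches the final digit.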
Consistency checks
- Checks fields to ensure that the data in them correspond, e.g., if Title = "Mr.", then Gender = "M".
Control totals
- This is a total done on one or more numeric fields which appear in every record. This is a meaningful total, e.g., add the total payment for a number of customers.
Cross-system consistency checks
- Compares data in different systems to ensure they are consistent, e.g., the address for the customer with the same id is the same in both systems. The data may be represented differently in different systems and may need to be transformed to a common format to be compared, e.g., one system may store the customer name in a single Name field as 'Doe, John Q', while another stores it in three different fields: First_Name (John), Last_Name (Doe) and Middle_Name (Quality). To compare the two, the validation engine would have to transform the data from the second system to match the data from the first. For example, the SQL expression Last_Name || ', ' || First_Name || ' ' || substr(Middle_Name, 1, 1) would convert the data from the second system to look like the data from the first: 'Doe, John Q'.
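The same transformation can be sketched in Python. The function names are hypothetical, and the handling of an empty middle name is an assumption not covered by the example above.

```python
def to_single_name(first_name: str, last_name: str, middle_name: str = "") -> str:
    """Build 'Last, First M' from three separate name fields."""
    initial = f" {middle_name[0]}" if middle_name else ""
    return f"{last_name}, {first_name}{initial}"

def names_consistent(single_name: str, first_name: str, last_name: str,
                     middle_name: str = "") -> bool:
    """Compare a single Name field with the transformed three-field form."""
    return single_name == to_single_name(first_name, last_name, middle_name)
```

Note the space inserted before the middle initial. Without it the result would be 'Doe, JohnQ' and the comparison would fail.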
Data type checks
- Checks the data type of the input and gives an error message if the input data does not match the chosen data type, e.g., in an input box accepting numeric data, if the letter 'O' was typed instead of the number zero, an error message would appear.
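A data type check for a whole-number field could be sketched in Python by attempting the conversion and reporting a message on failure. The function name and the wording of the message are illustrative only.

```python
def parse_integer_field(text: str):
    """Return (value, None) on success or (None, error message) on failure."""
    try:
        # Note: int() also accepts surrounding whitespace and underscores.
        return int(text), None
    except ValueError:
        return None, f"{text!r} is not a whole number"
```

With this sketch, the input "1O" (digit one followed by the letter O) is rejected, while "10" is accepted.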
File existence check
- Checks that a file with a specified name exists. This check is essential for programs that use file handling.
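In Python, one way to sketch a file existence check is with the standard pathlib module. The function name is hypothetical.

```python
from pathlib import Path

def file_exists(name: str) -> bool:
    """Return True if name refers to an existing regular file."""
    return Path(name).is_file()
```

A check of this kind can become outdated between the test and the later use of the file, so programs usually still handle errors raised when the file is opened.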
Format or picture check
- Checks that the data is in a specified format (template), e.g., dates have to be in the format DD/MM/YYYY.
- Regular expressions should be considered for this type of validation.
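A format check for the DD/MM/YYYY example might be sketched in Python as a regular expression for the template, followed by a calendar check. The second step goes slightly beyond a pure picture check and is an addition for illustration.

```python
import re
from datetime import datetime

# Two digits, slash, two digits, slash, four digits.
DATE_TEMPLATE = re.compile(r"[0-9]{2}/[0-9]{2}/[0-9]{4}")

def is_valid_date(text: str) -> bool:
    """Return True if text has the form DD/MM/YYYY and names a real date."""
    if DATE_TEMPLATE.fullmatch(text) is None:
        return False
    try:
        datetime.strptime(text, "%d/%m/%Y")
    except ValueError:
        return False
    return True
```

The template alone would accept 31/02/2024. The strptime call rejects it because that date does not exist.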
Hash totals
- This is just a batch total done on one or more numeric fields which appear in every record. This is a meaningless total, e.g., add the telephone numbers together for a number of customers.
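A hash total over telephone numbers can be sketched in Python as below. The field name telephone is hypothetical, and stripping non-digit characters before summing is an assumption.

```python
def hash_total(records, field="telephone"):
    """Sum the digits-only form of a field across all records."""
    total = 0
    for record in records:
        digits = "".join(ch for ch in record[field] if ch in "0123456789")
        total += int(digits)
    return total
```

The sum has no business meaning. It is only compared with the same sum computed elsewhere, so that a lost or altered record shows up as a mismatch.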
Limit check
- Unlike range checks, data is checked for one limit only, upper OR lower, e.g., data should not be greater than 2 (<=2).
Logic check
- Checks that an input does not yield a logical error, e.g., an input value should not be 0 if another number will be divided by it somewhere in the program.
Presence check
- Checks that important data are actually present and have not been missed out, e.g., customers may be required to have their telephone numbers listed.
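A presence check can be sketched in Python as a function that lists the required fields which are absent or blank. The field names in the usage example are hypothetical.

```python
def missing_fields(record, required):
    """Return the names of required fields that are absent, None, or blank."""
    missing = []
    for name in required:
        value = record.get(name)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(name)
    return missing
```

For instance, missing_fields({"name": "J. Doe", "telephone": " "}, ["name", "telephone"]) reports that the telephone number has been left out.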
Range check
- Checks that the data lie within a specified range of values, e.g., the month of a person's date of birth should lie between 1 and 12.
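A range check, and the one-sided limit check described earlier, can share one Python sketch in which either bound may be omitted. The function name and the inclusive treatment of both bounds are assumptions.

```python
def in_range(value, lower=None, upper=None):
    """Return True if value lies within the given inclusive bounds.

    Passing only one bound turns the range check into a limit check.
    """
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True
```

The month example corresponds to in_range(month, 1, 12), and the limit example "not greater than 2" corresponds to in_range(value, upper=2).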
Referential integrity
- In a modern relational database, values in two tables can be linked through a foreign key and a primary key. If values in the primary key field are not constrained by an internal database mechanism, then they should be validated. Validation of the foreign key field checks that the referencing table always refers to a valid row in the referenced table.
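Where the database does not enforce the constraint itself, the foreign key validation can be sketched in Python as a search for rows whose foreign key has no matching primary key. The function and parameter names are hypothetical.

```python
def orphaned_rows(referencing_rows, referenced_rows, foreign_key, primary_key):
    """Return referencing rows whose foreign key matches no primary key value."""
    valid_keys = {row[primary_key] for row in referenced_rows}
    return [row for row in referencing_rows if row[foreign_key] not in valid_keys]
```

An empty result means every referencing row points to a valid row in the referenced table.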
Spelling and grammar check
- Looks for spelling and grammatical errors.
Uniqueness check
- Checks that each value is unique. This can be applied to a combination of several fields (e.g., Address, First Name, Last Name).
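A uniqueness check over a combination of fields can be sketched in Python by counting how often each combination occurs. The function name and the field names in the test data are hypothetical.

```python
from collections import Counter

def duplicate_keys(records, fields):
    """Return the field-value combinations that occur more than once."""
    counts = Counter(tuple(record[f] for f in fields) for record in records)
    return [key for key, count in counts.items() if count > 1]
```

The comparison here is exact, so values that differ only in case or spacing are treated as distinct unless they are normalised first.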
Table lookup check
- A table look up check takes the entered data item and compares it to a valid list of entries that are stored in a database table.
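A table lookup check against a database table can be sketched with Python's built-in sqlite3 module. The table name valid_codes, its single column code, and both function names are hypothetical choices for the example.

```python
import sqlite3

def make_lookup_table(entries):
    """Create an in-memory table holding the list of valid entries."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE valid_codes (code TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO valid_codes (code) VALUES (?)",
        [(entry,) for entry in entries],
    )
    return conn

def passes_lookup(conn, value):
    """Return True if value appears in the valid_codes table."""
    row = conn.execute(
        "SELECT 1 FROM valid_codes WHERE code = ?", (value,)
    ).fetchone()
    return row is not None
```

The query uses a parameter placeholder rather than string concatenation, so the entered data item cannot alter the SQL statement.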