Psychometrics
Psychometrics is the field of study concerned with the theory and technique of psychological measurement, which includes educational measurement and the measurement of knowledge, abilities, attitudes, and personality traits. The field is primarily concerned with the construction and validation of measurement instruments such as questionnaires, tests, and personality assessments.
It involves two major research tasks: (i) the construction of instruments and procedures for measurement; and (ii) the development and refinement of theoretical approaches to measurement. Those who practice psychometrics are known as psychometricians. Psychometricians typically hold a specific psychometric qualification; many are clinical psychologists, while others work as human resources or learning and development professionals.
Internal consistency, which addresses the homogeneity of a single test form, may be assessed by correlating performance on two halves of a test, which is termed split-half reliability; the value of this Pearson product-moment correlation coefficient for two half-tests is adjusted with the Spearman–Brown prediction formula to correspond to the correlation between two full-length tests. Perhaps the most commonly used index of reliability is Cronbach's α, which is equivalent to the mean of all possible split-half coefficients. Other approaches include the intra-class correlation, which is the ratio of the variance between targets of measurement to the total variance of all measurements.
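The two indices named above can be sketched in a few lines. This is a minimal illustration using the standard textbook formulas, not a reference implementation; the function names and the small response matrix are hypothetical and chosen only for the example.

```python
from statistics import pvariance


def spearman_brown(r_half, factor=2.0):
    """Predicted reliability when test length is multiplied by `factor`.

    With factor=2 this steps a split-half correlation up to full length.
    """
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of per-person item-score lists."""
    k = len(scores[0])  # number of items
    item_variances = [pvariance([person[i] for person in scores]) for i in range(k)]
    total_variance = pvariance([sum(person) for person in scores])
    return (k / (k - 1)) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical data: 5 respondents answering 4 items on a 1-5 scale.
responses = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [4, 4, 4, 3],
]

full_length_r = spearman_brown(0.6)  # a half-test correlation of 0.6 becomes 0.75
alpha = cronbach_alpha(responses)
print(round(full_length_r, 3), round(alpha, 3))
```

Population variances are used throughout; sample variances would give the same α, because the correction factor cancels in the ratio.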
There are a number of different forms of validity. Criterion-related validity can be assessed by correlating a measure with a criterion measure known to be valid. When the criterion measure is collected at the same time as the measure being validated, the goal is to establish concurrent validity; when the criterion is collected later, the goal is to establish predictive validity. A measure has construct validity if it is related to measures of other constructs as required by theory. Content validity is a demonstration that the items of a test are drawn from the domain being measured. In a personnel selection example, test content is based on a defined statement or set of statements of knowledge, skill, ability, or other characteristics obtained from a job analysis.
Item response theory models the relationship between latent traits and responses to test items. Among other advantages, IRT provides a basis for obtaining an estimate of the location of a test-taker on a given latent trait as well as the standard error of measurement of that location. For example, a university student's knowledge of history can be deduced from his or her score on a university test and then be compared reliably with a high school student's knowledge deduced from a less difficult test. Scores derived by classical test theory do not have this characteristic, and actual ability (rather than ability relative to other test-takers) must be assessed by comparing scores to those of a "norm group" randomly selected from the population. In fact, all measures derived from classical test theory are dependent on the sample tested, while, in principle, those derived from item response theory are not.
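As a rough sketch of how a location and its standard error might be obtained, the code below estimates ability by maximum likelihood under the one-parameter (Rasch) logistic model. It assumes the item difficulties are already known. The function names, difficulties, and response patterns are hypothetical.

```python
import math


def rasch_p(theta, b):
    """Probability of a correct response for ability theta and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_ability(responses, difficulties, iterations=25):
    """Newton-Raphson maximum-likelihood estimate of theta; returns (theta, se)."""
    raw = sum(responses)
    if raw == 0 or raw == len(responses):
        raise ValueError("no finite ML estimate for all-wrong or all-correct patterns")
    theta = 0.0
    for _ in range(iterations):
        probs = [rasch_p(theta, b) for b in difficulties]
        gradient = raw - sum(probs)                      # first derivative of log-likelihood
        information = sum(p * (1.0 - p) for p in probs)  # test information at theta
        theta += gradient / information
    information = sum(p * (1.0 - p) for p in (rasch_p(theta, b) for b in difficulties))
    return theta, 1.0 / math.sqrt(information)


# Hypothetical example: the same raw score on an easier and on a harder test.
easy_items = [-1.0, -0.5, 0.0, 0.5, 1.0]
hard_items = [b + 1.0 for b in easy_items]
pattern = [1, 1, 1, 0, 0]

theta_easy, se_easy = estimate_ability(pattern, easy_items)
theta_hard, se_hard = estimate_ability(pattern, hard_items)
print(round(theta_easy, 3), round(theta_hard, 3), round(se_easy, 3))
```

Because both sets of difficulties lie on one scale, the same raw score on the harder test yields an estimate exactly one unit higher. This is the sense in which scores from tests of different difficulty become comparable.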
Validity and reliability typically are viewed as essential elements for determining the quality of any test. However, professional and practitioner associations frequently have placed these concerns within broader contexts when developing standards and making overall judgments about the quality of any test as a whole within a given context. A consideration of concern in many applied research settings is whether or not the metric of a given psychological inventory is meaningful or arbitrary.
The Standards for Educational and Psychological Testing place standards about validity and reliability, along with errors of measurement and related considerations, under the general topic of test construction, evaluation and documentation. The second major topic covers standards related to fairness in testing, including fairness in testing and test use, the rights and responsibilities of test takers, testing individuals of diverse linguistic backgrounds, and testing individuals with disabilities. The third and final major topic covers standards related to testing applications, including the responsibilities of test users, psychological testing and assessment, educational testing and assessment, testing in employment and credentialing, plus testing in program evaluation and public policy.
In the field of evaluation, and in particular educational evaluation, the Joint Committee on Standards for Educational Evaluation has published three sets of standards for evaluations. The Personnel Evaluation Standards was published in 1988, The Program Evaluation Standards (2nd edition) was published in 1994, and The Student Evaluation Standards was published in 2003.
Each publication presents and elaborates a set of standards for use in a variety of educational settings. The standards provide guidelines for designing, implementing, assessing and improving the identified form of evaluation. Each of the standards has been placed in one of four fundamental categories to promote educational evaluations that are proper, useful, feasible, and accurate. In these sets of standards, validity and reliability considerations are covered under the accuracy topic. For example, the student accuracy standards help ensure that student evaluations will provide sound, accurate, and credible information about student learning and performance.
Origins and background
Much of the early theoretical and applied work in psychometrics was undertaken in an attempt to measure intelligence. Francis Galton, often referred to as "the father of psychometrics", devised and included mental tests among his anthropometric measures. However, the origin of psychometrics also has connections to the related field of psychophysics. Two other pioneers of psychometrics obtained doctorates in the Leipzig Psychophysics Laboratory under Wilhelm Wundt: James McKeen Cattell in 1886 and Charles Spearman in 1906.
The psychometrician L. L. Thurstone, founder and first president of the Psychometric Society in 1936, developed and applied a theoretical approach to measurement referred to as the law of comparative judgment, an approach that has close connections to the psychophysical theory of Ernst Heinrich Weber and Gustav Fechner. In addition, Spearman and Thurstone both made important contributions to the theory and application of factor analysis, a statistical method developed and used extensively in psychometrics.
More recently, psychometric theory has been applied in the measurement of personality, attitudes, beliefs, and academic achievement. Measurement of these unobservable phenomena is difficult, and much of the research and accumulated science in this discipline has been developed in an attempt to properly define and quantify such phenomena. Critics, including practitioners in the physical sciences and social activists, have argued that such definition and quantification is impossibly difficult, and that such measurements are often misused, such as with psychometric personality tests used in employment procedures:
- "For example, an employer wanting someone for a role requiring consistent attention to repetitive detail will probably not want to give that job to someone who is very creative and gets bored easily."
Figures who made significant contributions to psychometrics include Karl Pearson, Henry F. Kaiser, L. L. Thurstone, Georg Rasch, Johnson O'Connor, Frederic M. Lord, Ledyard R Tucker, Arthur Jensen, and David Andrich.
Psychometric, psychometrician and psychometrist appreciation week is the first week in November.
Definition of measurement in the social sciences
The definition of measurement in the social sciences has a long history. A currently widespread definition, proposed by Stanley Smith Stevens (1946), is that measurement is "the assignment of numerals to objects or events according to some rule". This definition was introduced in the paper in which Stevens proposed four levels of measurement. Although widely adopted, this definition differs in important respects from the more classical definition of measurement adopted in the physical sciences, which is that measurement is the numerical estimation and expression of the magnitude of one quantity relative to another (Michell, 1997).
Indeed, Stevens's definition of measurement was put forward in response to the British Ferguson Committee, whose chair, A. Ferguson, was a physicist. The committee was appointed in 1932 by the British Association for the Advancement of Science to investigate the possibility of quantitatively estimating sensory events. Although its chair and other members were physicists, the committee also included several psychologists. The committee's report highlighted the importance of the definition of measurement. While Stevens's response was to propose a new definition, which has had considerable influence in the field, this was by no means the only response to the report. Another, notably different, response was to accept the classical definition, as reflected in the following statement:
- "Measurement in psychology and physics are in no sense different. Physicists can measure when they can find the operations by which they may meet the necessary criteria; psychologists have but to do the same. They need not worry about the mysterious differences between the meaning of measurement in the two sciences." (Reese, 1943, p. 49)
These divergent responses are reflected in alternative approaches to measurement. For example, methods based on covariance matrices are typically employed on the premise that numbers, such as raw scores derived from assessments, are measurements. Such approaches implicitly entail Stevens's definition of measurement, which requires only that numbers are assigned according to some rule. The main research task, then, is generally considered to be the discovery of associations between scores, and of factors posited to underlie such associations.
On the other hand, when measurement models such as the Rasch model are employed, numbers are not assigned based on a rule. Instead, in keeping with Reese's statement above, specific criteria for measurement are stated, and the goal is to construct procedures or operations that provide data that meet the relevant criteria. Measurements are estimated based on the models, and tests are conducted to ascertain whether the relevant criteria have been met.
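To make the idea of a stated criterion concrete, the sketch below uses the dichotomous Rasch model, in which the probability of a correct response depends only on the difference between person ability and item difficulty. One criterion often associated with the model is that the comparison between two persons should not depend on which item is used. The function names and parameter values are hypothetical and serve only to illustrate that property.

```python
import math


def rasch_probability(ability, difficulty):
    """Dichotomous Rasch model: P(correct) = exp(a - d) / (1 + exp(a - d))."""
    z = math.exp(ability - difficulty)
    return z / (1.0 + z)


def log_odds(p):
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


# Hypothetical persons with abilities 1.5 and 0.5, compared on three items.
gaps = []
for difficulty in (-1.0, 0.0, 2.0):
    p_high = rasch_probability(1.5, difficulty)
    p_low = rasch_probability(0.5, difficulty)
    gaps.append(log_odds(p_high) - log_odds(p_low))

print([round(g, 6) for g in gaps])
```

If the model holds, every entry of `gaps` equals the ability difference (here 1.0) whichever item is chosen. Testing whether observed data behave this way is one way of checking that the criteria have been met.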
Instruments and procedures
The first psychometric instruments were designed to measure the concept of intelligence. The best known historical approach involved the Stanford-Binet IQ test, developed originally by the French psychologist Alfred Binet. Intelligence tests are useful tools for various purposes. An alternative conception of intelligence is that cognitive capacities within individuals are a manifestation of a general component, or general intelligence factor, as well as cognitive capacity specific to a given domain.
Psychometrics is applied widely in educational assessment to measure abilities in domains such as reading, writing, and mathematics. The main approaches in applying tests in these domains have been Classical Test Theory and the more recent Item Response Theory and Rasch measurement models. These latter approaches permit joint scaling of persons and assessment items, which provides a basis for mapping of developmental continua by allowing descriptions of the skills displayed at various points along a continuum. Such approaches provide powerful information regarding the nature of developmental growth within various domains.
Another major focus in psychometrics has been on personality testing. There have been a range of theoretical approaches to conceptualizing and measuring personality. Some of the better known instruments and models include the Minnesota Multiphasic Personality Inventory, the Five-Factor Model (or "Big 5"), and tools such as the Personality and Preference Inventory and the Myers-Briggs Type Indicator. Attitudes have also been studied extensively using psychometric approaches. A common method in the measurement of attitudes is the use of the Likert scale. An alternative method involves the application of unfolding measurement models, the most general being the Hyperbolic Cosine Model (Andrich & Luo, 1993).
Theoretical approaches
Psychometricians have developed a number of different measurement theories. These include classical test theory (CTT) and item response theory (IRT). An approach which seems mathematically to be similar to IRT but also quite distinctive, in terms of its origins and features, is represented by the Rasch model for measurement. The development of the Rasch model, and the broader class of models to which it belongs, was explicitly founded on requirements of measurement in the physical sciences.
Psychometricians have also developed methods for working with large matrices of correlations and covariances. Techniques in this general tradition include: factor analysis, a method of determining the underlying dimensions of data; multidimensional scaling, a method for finding a simple representation for data with a large number of latent dimensions; and data clustering, an approach to finding objects that are like each other. All these multivariate descriptive methods try to distill large amounts of data into simpler structures. More recently, structural equation modeling and path analysis represent more sophisticated approaches to working with large covariance matrices. These methods allow statistically sophisticated models to be fitted to data and tested to determine if they are adequate fits.
One of the main deficiencies of factor analysis is the lack of consensus on cut-off points for determining the number of latent factors. A common procedure is to stop factoring when eigenvalues drop below one, because a factor with an eigenvalue below one accounts for less variance than a single standardized observed variable. The same lack of agreed cut-off points affects other multivariate methods as well.
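The eigenvalue cut-off described above can be sketched in a few lines. This is only an illustration under stated assumptions: the correlation matrix `R` is invented, the function names are hypothetical, and the eigenvalues are found with a plain cyclic Jacobi rotation so that the example needs nothing beyond the standard library. In practice a linear algebra library would normally be used instead.

```python
import math


def symmetric_eigenvalues(matrix, tol=1e-12, max_sweeps=50):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(matrix)
    a = [list(row) for row in matrix]  # work on a copy
    for _ in range(max_sweeps):
        off_diagonal = sum(
            a[i][j] ** 2 for i in range(n) for j in range(n) if i != j
        )
        if off_diagonal < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes the (p, q) entry.
                if diff == 0:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def kaiser_factor_count(correlation_matrix):
    """Number of eigenvalues greater than one."""
    values = symmetric_eigenvalues(correlation_matrix)
    return sum(1 for value in values if value > 1.0)


# Hypothetical correlation matrix: items 1-2 and items 3-4 form two clusters.
R = [
    [1.0, 0.8, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.6],
    [0.0, 0.0, 0.6, 1.0],
]
eigenvalues = symmetric_eigenvalues(R)
n_factors = kaiser_factor_count(R)
```

For this invented matrix two eigenvalues exceed one, so the rule would retain two factors. The rule is a convention rather than a test, which is the lack of consensus the paragraph above refers to.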
Key concepts
Key concepts in classical test theory are reliability and validity. A reliable measure is one that measures a construct consistently across time, individuals, and situations. A valid measure is one that measures what it is intended to measure. A measure may be reliable without being valid; reliability is necessary, but not sufficient, for validity.
Both reliability and validity can be assessed statistically. Consistency over repeated measures of the same test can be assessed with the Pearson correlation coefficient, and is often called test-retest reliability. Similarly, the equivalence of different versions of the same measure can be indexed by a Pearson correlation, and is called equivalent forms reliability or a similar term.
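As a minimal sketch of test-retest reliability, the Pearson correlation between two administrations of the same test can be computed directly. The score lists below are invented for illustration, and `pearson_r` is a hypothetical helper name, not part of any psychometric package.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores of five test-takers on two occasions.
first_administration = [12, 15, 9, 20, 17]
second_administration = [13, 14, 10, 19, 18]
test_retest = pearson_r(first_administration, second_administration)
```

The same function applied to scores on two parallel versions of a test would give an equivalent forms coefficient.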
Internal consistency, which addresses the homogeneity of a single test form, may be assessed by correlating performance on two halves of a test, which is termed split-half reliability; the value of this Pearson product-moment correlation coefficient for two half-tests is adjusted with the Spearman–Brown prediction formula to correspond to the correlation between two full-length tests. Perhaps the most commonly used index of reliability is Cronbach's α, which is equivalent to the mean of all possible split-half coefficients. Other approaches include the intra-class correlation, which is the ratio of variance of measurements of a given target to the variance of all targets.
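The two internal-consistency indices mentioned above can be sketched as follows. The Spearman–Brown adjustment and the usual variance-based formula for Cronbach's α are standard, but the function names, the half-test correlation of 0.6, and the small score matrix are assumptions made up for this illustration.

```python
from statistics import variance


def spearman_brown(r, length_factor=2.0):
    """Predicted reliability when a test is lengthened by length_factor.

    With length_factor=2 this steps a half-test correlation up to
    the reliability of the full-length test.
    """
    return length_factor * r / (1.0 + (length_factor - 1.0) * r)


def cronbach_alpha(scores):
    """Cronbach's alpha; scores has one row per test-taker, one column per item."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical correlation between two half-tests.
half_test_r = 0.6
full_test_reliability = spearman_brown(half_test_r)

# Hypothetical item scores: four test-takers, two items.
item_scores = [
    [1, 2],
    [2, 1],
    [3, 3],
    [4, 4],
]
alpha = cronbach_alpha(item_scores)
```

Note that `cronbach_alpha` divides by the variance of the total scores, so it fails if every test-taker has the same total.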
There are a number of different forms of validity. Criterion-related validity can be assessed by correlating a measure with a criterion measure known to be valid. When the criterion measure is collected at the same time as the measure being validated, the goal is to establish concurrent validity; when the criterion is collected later, the goal is to establish predictive validity. A measure has construct validity if it is related to measures of other constructs as required by theory. Content validity is a demonstration that the items of a test are drawn from the domain being measured. In a personnel selection example, test content is based on a defined statement or set of statements of knowledge, skill, ability, or other characteristics obtained from a job analysis.
Item response theory models the relationship between latent traits and responses to test items. Among other advantages, IRT provides a basis for obtaining an estimate of the location of a test-taker on a given latent trait as well as the standard error of measurement of that location. For example, a university student's knowledge of history can be deduced from his or her score on a university test and then be compared reliably with a high school student's knowledge deduced from a less difficult test. Scores derived by classical test theory do not have this characteristic, and actual ability (rather than ability relative to other test-takers) must be assessed by comparing scores to those of a "norm group" randomly selected from the population. In fact, all measures derived from classical test theory are dependent on the sample tested, while, in principle, those derived from item response theory are not.
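To make the idea of a location estimate and its standard error concrete, here is a minimal sketch using the one-parameter (Rasch) logistic model, in which the probability of a correct answer depends only on the difference between the test-taker's location and the item's difficulty. It assumes the item difficulties are already known. The difficulty values, response patterns, and function names are all hypothetical, and the Newton-Raphson maximum-likelihood routine is only one of several estimation approaches.

```python
import math


def rasch_probability(theta, difficulty):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def estimate_ability(responses, difficulties, tol=1e-8, max_iter=100):
    """Maximum-likelihood location estimate and its standard error.

    responses: list of 0/1 item scores; difficulties: known item difficulties.
    """
    if len(responses) != len(difficulties):
        raise ValueError("responses and difficulties must have the same length")
    raw_score = sum(responses)
    if raw_score == 0 or raw_score == len(responses):
        # The likelihood has no finite maximum for these patterns.
        raise ValueError("no finite estimate for all-wrong or all-right patterns")
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_probability(theta, b) for b in difficulties]
        gradient = raw_score - sum(probs)
        information = sum(p * (1.0 - p) for p in probs)
        step = gradient / information  # Newton-Raphson update
        theta += step
        if abs(step) < tol:
            break
    probs = [rasch_probability(theta, b) for b in difficulties]
    information = sum(p * (1.0 - p) for p in probs)
    return theta, 1.0 / math.sqrt(information)


# Hypothetical item difficulties on a common scale.
hard_test = [0.5, 1.0, 1.5, 2.0]
easy_test = [-2.0, -1.5, -1.0, -0.5]

# Both test-takers answer two of four items correctly.
theta_hard, se_hard = estimate_ability([1, 1, 0, 0], hard_test)
theta_easy, se_easy = estimate_ability([1, 1, 0, 0], easy_test)
```

Although both hypothetical test-takers have the same raw score, the one who took the harder test receives the higher location estimate. This is the kind of cross-test comparison the paragraph describes, and it holds only if the difficulties of both tests are expressed on a common scale.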
Standards of quality
The considerations of validity and reliability typically are viewed as essential elements for determining the quality of any test. However, professional and practitioner associations frequently have placed these concerns within broader contexts when developing standards and making overall judgments about the quality of any test as a whole within a given context. A consideration of concern in many applied research settings is whether or not the metric of a given psychological inventory is meaningful or arbitrary.
Testing standards
In this field, the Standards for Educational and Psychological Testing place standards about validity and reliability, along with errors of measurement and related considerations, under the general topic of test construction, evaluation and documentation. The second major topic covers standards related to fairness in testing, including fairness in testing and test use, the rights and responsibilities of test takers, testing individuals of diverse linguistic backgrounds, and testing individuals with disabilities. The third and final major topic covers standards related to testing applications, including the responsibilities of test users, psychological testing and assessment, educational testing and assessment, testing in employment and credentialing, plus testing in program evaluation and public policy.
Evaluation standards
In the field of evaluation, and in particular educational evaluation, the Joint Committee on Standards for Educational Evaluation has published three sets of standards for evaluations. The Personnel Evaluation Standards was published in 1988, The Program Evaluation Standards (2nd edition) was published in 1994, and The Student Evaluation Standards was published in 2003.
Each publication presents and elaborates a set of standards for use in a variety of educational settings. The standards provide guidelines for designing, implementing, assessing and improving the identified form of evaluation. Each of the standards has been placed in one of four fundamental categories to promote educational evaluations that are proper, useful, feasible, and accurate. In these sets of standards, validity and reliability considerations are covered under the accuracy topic. For example, the student accuracy standards help ensure that student evaluations will provide sound, accurate, and credible information about student learning and performance.
See also
- Classical test theory
- Concept inventory
- Cronbach's alpha
- Educational assessment
- Educational psychology
- Historiometry
- Item response theory
- List of psychometric software
- Operationalisation
- Quantitative psychology
- Rasch model
- Scale (social sciences)
- Aptitude
- School counselor
- School Psychological Examiner
- School psychology
- Standardized test
External links
- APA Standards for Educational and Psychological Testing
- Joint Committee on Standards for Educational Evaluation
- The Psychometrics Centre, University of Cambridge
- Psychometric Society and Psychometrika homepage
- London Psychometric Laboratory
- Rasch analysis in psychometrics
- As Test-Taking Grows, Test-Makers Grow Rarer, May 5, 2006, NY Times. "Psychometrics, one of the most obscure, esoteric and cerebral professions in America, is now also one of the hottest."