Psychometrics (also called psychometry) - a discipline of psychology that studies the theory and methodology of psychological measurement, including the measurement of knowledge, abilities, attitudes, and personal traits. Psychometrics is a branch of psychodiagnostics. Above all, the field is concerned with the construction and validation of measurement instruments such as questionnaires, tests, and methods for describing (assessing) a person. It comprises two main research tasks, namely:
- Creating instruments and constructing measurement procedures;
- Developing and refining theoretical approaches to measurement.
Practitioners of psychometrics may be clinical psychologists, learning and development specialists, or HR professionals; in any case, a separate, dedicated qualification in psychometrics is not required. In the United States, psychometrics is taught at the undergraduate, graduate, and doctoral levels.
Origins of Psychometrics
Most early research in psychometrics was driven by the desire to measure intelligence. Francis Galton, known as the "father of psychometrics," included mental measures among anthropometric data. The origins of psychometrics are also associated with psychophysics: two other pioneers of the field, James McKeen Cattell and Charles Spearman, received their doctorates under Wilhelm Wundt at his Leipzig laboratory.
The psychometrician Louis Leon Thurstone, founder and first president of the Psychometric Society, developed a theoretical approach to measurement known as the law of comparative judgment (1927). The approach is closely related to the psychophysical theories of Ernst Weber and Gustav Fechner. Spearman and Thurstone also made major contributions to the development of factor analysis.
Karl Pearson, Henry Kaiser, Georg Rasch, Johnson O'Connor, Frederic Lord, Ledyard Tucker, and Arthur Jensen also contributed greatly to the development of psychometrics.
The Field of Psychometrics
The field of psychometrics takes a quantitative approach to the analysis of test data. Psychometric theory provides researchers and psychologists with mathematical models for analyzing responses to individual test items, to whole tests, and to batteries of tests; applied psychometrics deals with applying these models and analytical procedures to specific test data. The four areas of psychometric analysis are norming and equating, reliability assessment, validity assessment, and item analysis. Each area contains a set of theoretical principles and specific procedures used to assess the quality of a test in a given case.
The definition of “measurement” in the social sciences
The definition of measurement in the social sciences has a long history. The currently prevailing broad definition, proposed by Stanley Smith Stevens (1946), states that measurement is "the assignment of numerals to objects or events according to some rule." This definition was presented in the paper in which Stevens proposed his four levels of measurement. Although widespread, it differs from the more classical definition adopted in physics, which holds that measurement is the numerical estimation and expression of one quantity in relation to another (Michell, 1997).
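Stevens' four levels of measurement can be summarized in a small, self-contained sketch; the example variables are illustrative choices by the editor, not taken from Stevens' paper:

```python
# Stevens' (1946) four levels of measurement, with a typical example
# variable and the kind of comparison each level supports.
levels = {
    "nominal":  {"example": "diagnostic category",  "supports": "equality"},
    "ordinal":  {"example": "Likert rating",        "supports": "ordering"},
    "interval": {"example": "temperature (deg C)",  "supports": "differences"},
    "ratio":    {"example": "reaction time (ms)",   "supports": "ratios"},
}

for name, info in levels.items():
    print(f"{name}: e.g. {info['example']}; meaningful comparison: {info['supports']}")
```

Each level permits all comparisons of the levels above it in the list, which is why the dispute over what counts as "measurement" matters for which statistics are defensible.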
Indeed, Stevens' definition was put forward in response to the British Ferguson Committee, whose chairman, A. Ferguson, was a physicist. The committee was appointed in 1932 by the British Association for the Advancement of Science to investigate the possibility of quantitatively estimating sensory events. Although its chairman and other members were physicists, the committee also included several psychologists, and its report emphasized the importance of defining measurement. While Stevens' response was to propose a new definition, which has had considerable influence in the field, it was not the only response to the report. Another, strikingly different response was to embrace the classical definition, as reflected in the following statement: "Measurement in psychology and physics are in no sense different. Physicists can measure when they can find the operations by which they may meet the necessary criteria; psychologists have but to do the same. They need not worry about the mysterious differences between the meaning of measurement in the two sciences" (Reese, 1943, p. 49).
These divergent views are reflected in alternative measurement approaches. For example, methods based on covariance matrices typically take numbers, such as raw scores, as measurements. Such approaches implicitly adopt Stevens' definition, which requires only that numbers be assigned according to some rule; the main research task then becomes the discovery of associations between scores and of the factors presumed to underlie those associations.
By contrast, when a measurement model such as the Rasch model is used, numbers are not assigned according to a rule. Instead, in keeping with Reese's statement above, specific criteria for measurement are stated, and the goal is to construct procedures or operations that yield data meeting those criteria. Measurements are evaluated against the model, and tests are carried out to check that the relevant criteria have been met.
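As a minimal sketch of what such a model looks like, the dichotomous Rasch model gives the probability of a correct response as a logistic function of the difference between a person's ability θ and an item's difficulty b; the parameter values below are hypothetical:

```python
import math

def rasch_probability(theta, b):
    """Dichotomous Rasch model: probability of a correct response,
    P(X = 1) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# A person whose ability equals the item's difficulty has a 50% chance.
print(rasch_probability(0.0, 0.0))            # 0.5
# Ability above difficulty raises the probability of success.
print(round(rasch_probability(2.0, 0.0), 3))  # 0.881
```

Because only the difference θ - b enters the formula, persons and items are placed on the same scale, which is one of the criteria the measurement procedure is built to satisfy.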
Test norming and equating
Norming a test is an integral part of its standardization. It usually involves administering the test to a representative sample of individuals, identifying different levels of performance, and translating raw test scores into a common score scale. When several forms of the same test exist, the forms are sometimes equated; equating places the scores from all forms on a common scale.
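A common way to place raw scores on a standard scale is the T-score (mean 50, standard deviation 10). A minimal sketch on hypothetical data; real norming would use a large reference sample:

```python
import statistics

def to_t_scores(raw_scores):
    """Convert raw scores to T-scores (mean 50, SD 10).
    Illustrative sketch, not tied to any published test's norms."""
    mean = statistics.mean(raw_scores)
    sd = statistics.stdev(raw_scores)  # sample standard deviation
    return [50 + 10 * (x - mean) / sd for x in raw_scores]

print(to_t_scores([10, 20, 30]))  # [40.0, 50.0, 60.0]
```

The same pattern yields z-scores (mean 0, SD 1) or any other linear standard scale by changing the two constants.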
There are four main equating strategies. In the first, each test form is administered to an equivalent (for example, randomly selected) group of respondents, and scores on the different forms are then scaled so that equal scores have equal percentile ranks (the same proportion of respondents obtains that score or lower). In a more precise method, all respondents complete all forms of the test, and equations are used to establish the equivalence of scores. A third commonly used method involves administering a common test, or part of a test, to all respondents; this common component serves as an "anchor" test that links all subsequent measurements to a single scale. In a fourth variant, used when a survey employs several forms of the same test, each form includes several "anchor items" that perform the same linking function.
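The first, percentile-rank strategy can be sketched as follows. This toy version on hypothetical score lists ignores the smoothing and interpolation used in practice:

```python
def percentile_rank(score, scores):
    """Fraction of respondents scoring at or below `score`
    (one common definition of percentile rank)."""
    return sum(s <= score for s in scores) / len(scores)

def equipercentile_equivalent(score_a, form_a_scores, form_b_scores):
    """Find the Form B score whose percentile rank is closest to that
    of `score_a` on Form A: a minimal equipercentile-equating sketch."""
    target = percentile_rank(score_a, form_a_scores)
    return min(set(form_b_scores),
               key=lambda b: abs(percentile_rank(b, form_b_scores) - target))

# Hypothetical scores of two equivalent groups on two test forms.
form_a = [10, 12, 15, 18, 20]
form_b = [30, 33, 36, 40, 45]
print(equipercentile_equivalent(15, form_a, form_b))  # 36
```

A score of 15 on Form A and 36 on Form B each sit at the 60th percentile of their group, so the two scores are treated as equivalent on the common scale.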
Testing Requirements
Reliability and validity concern the generalizability of test scores, that is, determining which conclusions drawn from test scores are justified. Reliability refers to conclusions about measurement consistency. Consistency is defined in several ways: as stability over time, as similarity between supposedly equivalent tests, as homogeneity within a single test, or as comparability of ratings made by experts. With the test-retest method, reliability is established by re-administering the test to the same group after a certain period of time; the two sets of scores are then compared to determine their degree of similarity. With the alternate-forms method, two parallel measurements are taken on a sample of subjects. Involving expert raters in scoring yields a measure of reliability called inter-rater reliability; this method is often used when expert judgment is required.
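Test-retest reliability is usually quantified as the Pearson correlation between the two administrations. A self-contained sketch on hypothetical scores:

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores of the same five people on two administrations
# of the same test, separated by an interval of time.
first  = [12, 15, 11, 18, 20]
retest = [13, 14, 12, 19, 19]
print(round(pearson_r(first, retest), 2))  # 0.96
```

A coefficient near 1.0 indicates stable scores over time; the same function applied to scores on two parallel forms gives alternate-forms reliability.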
Validity characterizes the quality of the conclusions obtained on the basis of the results of the measurement procedure.
Validity is understood as the ability of a test to serve its stated purpose and to justify the adequacy of decisions made on the basis of its results. An insufficiently valid test cannot be considered a measurement instrument or used in practice, since the result obtained can seriously affect the future of the person tested.
Three types of test validity are distinguished.
Construct (conceptual) validity. This must be established when the test measures a property that is abstract in nature, that is, not amenable to direct measurement. In such cases a conceptual model must be created to explain the property; the test results then either support or refute this model.
Criterion (empirical) validity. Shows how test results relate to a chosen external criterion. It exists in two forms: concurrent criterion validity, the correlation of test results with a criterion available at the time of testing; and predictive criterion validity, the correlation of results with a criterion that will appear in the future, which indicates how well the test predicts the emergence of the measured quality, taking into account external factors and the test-taker's own activity.
Content validity. Determines how closely the test corresponds to its subject area, that is, whether its items representatively sample the quality it is intended to measure. To maintain content validity, the test must be checked regularly for continued fit, since how a given quality manifests itself in the population may change over time. Content validity should be assessed by an expert in the test's subject area.
Validating a test should not be merely a collection of evidence of its validity, but a set of measures to increase that validity.
Item analysis
Most item-analysis procedures involve: (a) recording the number of subjects who answered a particular item correctly or incorrectly; (b) correlating individual items with other variables; and (c) checking items for systematic error (or "bias"). The proportion of subjects who answer an item correctly is called, perhaps not quite accurately, the item's difficulty. One way to improve items is to calculate the percentage of respondents choosing each option of a multiple-choice item; it is also useful to calculate the mean test score of the subjects who chose each option. These procedures make it possible to check that distractors look plausible to poorly prepared subjects but do not seem correct to the most knowledgeable ones. Selecting items that correlate strongly with the total test score maximizes reliability in the sense of internal consistency, while selecting items that correlate strongly with an external criterion maximizes predictive validity. The descriptive analogue of these correlations is the item characteristic curve: typically, a plot of the proportion of subjects answering an item correctly against their total test score. For effective items these plots are rising curves that do not decrease as ability grows.
See also
- Psychological testing
- Intelligence quotient
- General intelligence factor
- Psychometric Entrance Test (Israel)
- Item Response Theory
External links
- An article about psychogeometry - personality testing based on geometric shapes
- List of well-known neuropsychological (cognitive) tests
- Psychometric fundamentals of psychodiagnostics - article on the principles of standardization of psychodiagnostic techniques
- PEBL Computer Psychometric Test Suite
Literature
- R. Corsini, A. Auerbach. "Psychological Encyclopedia."
- V.S. Kim. "Testing educational achievements." - Ussuriysk: USPI, 2007.