A
Achievement – previous learning.
Acquiescence – the tendency to agree or to endorse a test item as true.
Adverse impact – the effect of any test used for selection purposes if it systematically rejects substantially higher proportions of minority than majority job applicants.
Age differentiation – discrimination based on the fact that older children have greater capabilities than do younger children.
Aptitude – potential for learning a specific skill.
Assessment – a procedure used to evaluate an individual so that one can describe the person in terms of current functioning and also so that one can predict future functioning. Tests are used in the assessment process.
B
Basal – the level at which a minimum criterion number of correct responses is obtained.
Basal age – in the Stanford-Binet scale, the highest year level at which the subject successfully passes all tests.
Base rate – in decision analysis, the proportion of people expected to succeed on a criterion if they are chosen at random.
Biserial correlation – an index used to express the relationship between a continuous variable and an artificially dichotomous variable.
C
Category format – a rating-scale format that often uses the categories 1 to 10.
Ceiling – the highest score possible on a test. When the test is too easy, many people may get the highest score and the test cannot discriminate between the top-level performers and those at lower levels.
Class interval – the unit for the horizontal axis in a frequency distribution.
Closed-ended question – in interviewing, a question that can be answered specifically.
Coefficient alpha – a generalized method for estimating reliability. Alpha is similar to the KR 20 formula, except that it allows items to take on values other than 0 and 1.
Coefficient of alienation – in correlation and regression analysis, the index of nonassociation between two variables.
Coefficient of determination – the correlation coefficient squared; gives an estimate of the percentage of variation in Y that is known as a function of knowing X (and vice versa).
Concurrent validity evidence – evidence for criterion validity in which the test and the criterion are administered at the same point in time.
Confrontation – a statement that points out a discrepancy or inconsistency.
Criterion validity evidence – the evidence that a test score corresponds to an accurate measure of interest. The measure of interest is called the criterion.
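The Coefficient alpha and Kuder-Richardson 20 entries describe the same internal-consistency idea: alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores), where k is the number of items. The Python sketch below is illustrative only; the function names and the five-person response matrix are hypothetical. It assumes population variances throughout and shows that for items scored 0 or 1, where each item variance equals pq, alpha and KR 20 give the same value.

```python
from statistics import pvariance

def coefficient_alpha(rows):
    """Coefficient alpha for a score matrix: one row per test taker, one column per item.

    Population variances (divide by N) are used for items and totals alike;
    any consistent choice gives the same ratio.
    """
    k = len(rows[0])  # number of items; must be at least 2
    totals = [sum(row) for row in rows]
    item_vars = [pvariance([row[j] for row in rows]) for j in range(k)]
    return (k / (k - 1)) * (1 - sum(item_vars) / pvariance(totals))

def kr20(rows):
    """Kuder-Richardson 20; every item must be scored 0 or 1."""
    n, k = len(rows), len(rows[0])
    totals = [sum(row) for row in rows]
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in rows) / n  # proportion passing item j
        sum_pq += p * (1 - p)                # q = 1 - p
    return (k / (k - 1)) * (1 - sum_pq / pvariance(totals))

# Hypothetical responses: 5 test takers x 4 items, scored 0/1.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(coefficient_alpha(responses), 3), round(kr20(responses), 3))  # 0.8 0.8
```

With items scored on a wider range (for example, 1 to 5), only `coefficient_alpha` applies, which is the generalization the entry describes.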
Construct validity evidence – a process used to establish the meaning of a test through a series of studies wherein a researcher simultaneously defines some construct and develops the instrumentation to measure it.
Content validity evidence – the evidence that the content of a test represents the conceptual domain it is designed to cover.
Convergent evidence – evidence obtained to demonstrate that a test measures the same attribute as do other measures that purport to measure the same thing. It is a form of construct validity evidence.
Correction for attenuation – the correction for attenuation formula is used to estimate what the correlation would have been if the variables had been perfectly reliable.
Correlation coefficient – a mathematical index used to describe the direction and the magnitude of a relationship between two variables. The correlation coefficient ranges between −1.0 and 1.0.
Criterion-referenced test – a test that describes the specific types of skills, tasks, or knowledge of an individual relative to a well-defined mastery criterion. The content of criterion-referenced tests is limited to certain well-defined objectives.
Cross validation – the process of evaluating a test or a regression equation for a sample other than the one used for the original studies.
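The correlation entries fit together arithmetically: the coefficient of determination is r squared, the coefficient of alienation is the square root of 1 - r squared, and the correction for attenuation divides the observed correlation by the square root of the product of the two reliabilities. The Python sketch below is a minimal illustration with hypothetical function names, scores, and reliability values. (Python 3.10 and later also provide `statistics.correlation`, which computes the same Pearson r.)

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson product moment correlation of two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def coefficient_of_determination(r):
    return r ** 2              # share of variance in Y known from X

def coefficient_of_alienation(r):
    return sqrt(1 - r ** 2)    # index of nonassociation

def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimated correlation if both measures were perfectly reliable."""
    return r_xy / sqrt(r_xx * r_yy)

# Hypothetical paired scores for five people.
test_scores = [1, 2, 3, 4, 5]
criterion = [2, 4, 5, 4, 5]
r = pearson_r(test_scores, criterion)
print(round(r, 3), round(coefficient_of_determination(r), 3))  # 0.775 0.6
# Hypothetical: observed r of 0.30, reliabilities of 0.75 (test) and 0.80 (criterion).
print(round(correct_for_attenuation(0.30, 0.75, 0.80), 3))    # 0.387
```

Note that determination plus alienation squared always equals 1, which is why the two indexes are read as "known" and "unknown" shares of variation.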
D
Deciles – points that divide the frequency distribution into equal tenths.
Descriptive statistics – methods used to provide a concise description of a collection of quantitative information.
Developmental quotient (DQ) – in the Gesell Developmental Schedules, a test score that is obtained by assessing the presence or absence of behaviors associated with maturation.
Dichotomous format – a test item format in which there are two alternatives for each item.
Differential validity – the extent to which a test has different meanings for different groups of people.
Discriminability – in item analysis, how well an item performs in relation to some criterion; for example, items may be compared according to how well they separate groups who score high and low on the test. The index of discrimination would then be the association between performance on an item and performance on the whole test.
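One way to express discriminability, assuming the extreme-group approach the Discriminability entry alludes to, is the difference between the proportion of high scorers and the proportion of low scorers who pass the item. The sketch below is hypothetical in its names and data; the 27% group size is a common convention, treated here as an adjustable parameter rather than a fixed rule.

```python
def discrimination_index(item, totals, fraction=0.27):
    """Extreme-group discrimination index D = p(upper) - p(lower).

    item:     0/1 scores on one item, one per test taker
    totals:   total test scores in the same order
    fraction: share of test takers in each extreme group (assumed convention)
    """
    n = len(totals)
    g = max(1, round(n * fraction))                    # size of each extreme group
    order = sorted(range(n), key=lambda i: totals[i])  # indices, lowest total first
    low, high = order[:g], order[-g:]
    p_high = sum(item[i] for i in high) / g
    p_low = sum(item[i] for i in low) / g
    return p_high - p_low

# Hypothetical data: ten test takers.
totals = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
item = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
print(round(discrimination_index(item, totals), 3))  # 0.667
```

Values near 1 mean the item separates high and low scorers well; values near 0 or below suggest a weak or miskeyed item. Ties at a group boundary are broken by input order here, which is a simplification.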
Discriminant analysis – a multivariate data analysis method for finding the linear combination of variables that best describes the classification of groups into discrete categories.
Discriminant evidence – evidence obtained to demonstrate that a test measures something different from what other available tests measure.
Distractors – alternatives on a multiple-choice exam that are not correct or for which no credit is given.
Drift – the tendency for observers in behavioral studies to stray from the definitions they learned during training and to develop their own idiosyncratic definitions of behaviors.
E
Evaluative statement – a statement in interviewing that judges or evaluates.
Expectancy effect – the tendency for results to be influenced by what experimenters or test administrators expect to find (also known as the Rosenthal effect, after the psychologist who has studied this problem intensively).
F
Face validity – the extent to which items on a test appear to be meaningful and relevant. Face validity is actually not evidence for validity because it is not a basis for inference.
Factor analysis – a set of multivariate data analysis methods for reducing large matrices of correlations to fewer variables.
False negative – in test-decision theory, a case in which the test suggests a negative classification, yet the correct classification is positive.
False positive – in test-decision analysis, a case in which the test suggests a positive classification, yet the correct classification is negative.
Four-fifths rule – a rule used by federal agencies in deciding whether there is equal employment opportunity. Any procedure that results in a selection rate for any race, gender, or ethnic group that is less than four-fifths (80%) of the selection rate for the group with the highest rate is regarded as having an adverse impact.
Frequency distribution – the systematic arrangement of scores on a measure to reflect how frequently each value on the measure occurred.
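The Four-fifths rule is a simple ratio check: compute each group's selection rate, divide it by the highest group's rate, and flag ratios below 0.8. The Python sketch below is illustrative; the function name, group labels, and counts are hypothetical, and exact fractions are used so a ratio of exactly 4/5 is not misclassified through floating-point rounding.

```python
from fractions import Fraction

def four_fifths_check(selected, applicants):
    """Apply the four-fifths rule to selection counts per group.

    selected, applicants: dicts mapping group name -> count.
    Returns group -> (selection rate, ratio to highest rate, adverse impact flag).
    """
    rates = {g: Fraction(selected[g], applicants[g]) for g in applicants}
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no group has any selections")
    return {g: (rate, rate / highest, rate / highest < Fraction(4, 5))
            for g, rate in rates.items()}

# Hypothetical counts for two applicant groups.
result = four_fifths_check({"group_a": 60, "group_b": 20},
                           {"group_a": 100, "group_b": 50})
for group, (rate, ratio, flagged) in result.items():
    print(group, float(rate), round(float(ratio), 3), flagged)
# group_a 0.6 1.0 False
# group_b 0.4 0.667 True
```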
G
General cognitive index (GCI) – in the McCarthy Scales of Children’s Abilities, a standard score with a mean of 100 and a standard deviation of 16.
Group test – a test that a single test administrator can give to more than one person at a time.
H
Hit rate – in test-decision analysis, the proportion of cases in which a test accurately predicts success or failure.
Human ability – behaviors that reflect either what a person has learned or the person’s capacity to emit a specific behavior.
I
Individual tests – tests that can be given to only one person at a time.
Inferences – logical deductions (from evidence) about something that one cannot observe directly.
Inferential statistics – methods used to make inferences from a small group of observations, called a sample. These inferences are then applied to a larger group of individuals, known as a population. Typically, the researcher wants to make statements about the larger group but cannot make all of the necessary observations.
Intelligence – general potential independent of previous learning.
Intelligence quotient (IQ) – a unit for expressing the results of intelligence tests. The intelligence quotient is based on the ratio of the individual’s mental age (MA) (as determined by the test) to actual or chronological age (CA): IQ = MA/CA × 100.
Intercept – on a two-dimensional graph, the point on the Y axis where X equals 0. In regression, this is the point at which the regression line intersects the Y axis.
Interquartile range – the interval of scores bounded by the 25th and the 75th percentiles.
Interval scale – a scale that one can use to rank order objects and on which the units reflect equivalent magnitudes of the property being measured.
Interview – a method of gathering information by talk, discussion, or direct questions.
Ipsative score – a test result presented in relative rather than absolute terms. Ipsative scores compare the individual against him- or herself. Each person thus provides his or her own frame of reference.
Isodensity curve – an ellipse on a scatterplot (or two-dimensional scatter diagram) that encircles a specified proportion of the cases constituting particular groups.
Item – a specific stimulus to which a person responds overtly and that can be scored or evaluated.
Item analysis – a set of methods used to evaluate test items. The most common techniques involve assessment of item difficulty and item discriminability.
Item characteristic curve – a graph prepared as part of the process of item analysis. One graph is prepared for each test item and shows the total test score on the X axis and the proportion of test takers passing the item on the Y axis.
Item difficulty – a form of item analysis used to assess how difficult items are. The most common index of difficulty is the percentage of test takers who respond with the correct choice.
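The Item difficulty and Item characteristic curve entries translate directly into counts: difficulty is the proportion answering correctly, and the empirical curve plots, for each total score, the proportion passing the item. The Python sketch below uses hypothetical names and data; with real samples, total scores are often grouped into intervals before plotting, which this sketch does not do.

```python
from collections import defaultdict

def item_difficulty(item):
    """Proportion of test takers answering the item correctly (items scored 0/1).
    Higher values mean easier items."""
    return sum(item) / len(item)

def icc_points(item, totals):
    """Empirical item characteristic curve: (total score, proportion passing)."""
    count = defaultdict(int)
    passed = defaultdict(int)
    for score, correct in zip(totals, item):
        count[score] += 1
        passed[score] += correct
    return [(s, passed[s] / count[s]) for s in sorted(count)]

# Hypothetical data for eight test takers.
item = [0, 0, 1, 0, 1, 1, 1, 1]
totals = [1, 1, 2, 2, 3, 3, 4, 4]
print(item_difficulty(item))     # 0.625
print(icc_points(item, totals))  # [(1, 0.0), (2, 0.5), (3, 1.0), (4, 1.0)]
```

A curve that rises with total score, as here, indicates an item that discriminates in the expected direction.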
K
Kuder-Richardson 20 – a formula for estimating the internal consistency of a test. The KR 20 method is equivalent to the average split-half correlation obtained from all possible splits of the items. For the KR 20 formula to be applied, all items must be scored either 0 or 1.
L
Likert format – a format for attitude scale items in which subjects indicate their degree of agreement with statements.
M
McCall’s T – a standardized score system with a mean of 50 and a standard deviation of 10. McCall’s T can be obtained from a simple linear transformation of Z scores (T = 10Z + 50).
Mean – the arithmetic average of a set of scores on a variable.
Measurement error – the component of an observed test score that is neither the true score nor the quality you wish to measure.
Median – the point on a frequency distribution marking the 50th percentile.
Mental age – a unit for expressing the results of intelligence tests; this unit is based on comparing the individual’s performance on the test with the average performance of individuals in a specific chronological age group.
Multiple regression – a multivariate data analysis method that considers the relationship between a continuous outcome variable and the linear combination of two or more predictor variables.
Multivariate analysis – a set of methods for data analysis that considers the relationships between combinations of three or more variables.
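McCall’s T is a two-step linear transformation: standardize each raw score to Z, then compute T = 10Z + 50. The Python sketch below assumes Z is based on the population standard deviation of the group supplied; the function names and raw scores are hypothetical, and a published test would instead use the mean and SD of its normative sample.

```python
from statistics import mean, pstdev

def z_scores(raw):
    """Standardize raw scores (population SD assumed)."""
    m, s = mean(raw), pstdev(raw)
    return [(x - m) / s for x in raw]

def mccall_t(raw):
    """McCall's T: T = 10Z + 50."""
    return [10 * z + 50 for z in z_scores(raw)]

raw = [70, 80, 80, 80, 85, 85, 95, 105]  # hypothetical raw scores (mean 85, SD 10)
print(mccall_t(raw))  # [35.0, 45.0, 45.0, 45.0, 50.0, 50.0, 60.0, 70.0]
```

Because the transformation is linear, the T scores keep the shape of the raw-score distribution; only the mean and standard deviation change.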
N
Nominal scales – systems that arbitrarily assign numbers to objects; mathematical manipulation of numbers from a nominal scale is not justified. For example, numbers on the backs of football players’ uniforms are a nominal scale.
Normative sample – a comparison group consisting of individuals who have been administered a test under standard conditions—that is, with the instructions, format, and general procedures outlined in the test manual for administering the test.
Norm-referenced test – a test that evaluates each individual relative to a normative group.
Norms – a summary of the performance of a group of individuals on which a test was standardized. The norms usually include the mean and the standard deviation for the reference group and information on how to translate a raw score into a percentile rank.
O
One-tailed test – a directional test of the null hypothesis. With a one-tailed test, the experimenter states the specific end of the null distribution that should be used for the region of rejection of the null hypothesis.
Open-ended question – a question that usually cannot be answered specifically; such questions require the interviewee to produce something spontaneously.
Ordinal scale – a scale that one can use to rank order objects or individuals.
P
Parallel forms reliability – the method of reliability assessment used to evaluate the error associated with the use of a particular set of items. Equivalent forms of a test are developed by generating two forms using the same rules. The correlation between the two forms is the estimate of parallel forms reliability.
Pearson product moment correlation – an index of correlation between two continuous variables.
Percentile band – the range of percentiles that are likely to represent a subject’s true score. It is created by forming an interval one standard error of measurement above and below the obtained score and converting the resulting values to percentiles.
Percentile rank – the proportion of scores that fall below a particular score.
Performance scale – a test that consists of tasks that require a subject to do something rather than to answer questions.
Personality tests – tests that measure overt and covert dispositions of individuals. Personality tests measure typical human behavior.
Point scale – a test in which points (0, 1, or 2, for example) are assigned to each item. In a point scale, all items with a particular content can be grouped together.
Polytomous format – a format for objective tests in which three or more alternative responses are given for each item. This format is popular for multiple-choice exams.
Predictive validity evidence – the evidence that a test forecasts scores on the criterion at some future time.
Probing statement – a statement in interviewing that demands more information than the interviewee has been willing to provide of his or her own accord.
Projective hypothesis – the proposal that when a person attempts to understand an ambiguous or vague stimulus, his or her interpretation reflects needs, feelings, experiences, prior conditioning, thought processes, and so forth.
Projective personality tests – tests in which the stimulus or the required response or both are ambiguous.
Prophecy formula – a formula developed by Spearman and Brown that one can use to correct for the loss of reliability that occurs when the split-half method is used and each half of the test is one-half as long as the whole test. The method can also be used to estimate how much the test length must be increased to bring the test to a desired level of reliability.
Psychological test – a device for measuring characteristics of human beings that pertain to overt and covert behavior.
Psychological testing – the use of psychological tests. Psychological testing refers to all of the possible uses, applications, and underlying concepts of psychological tests.
Q
Quartiles – points that divide the frequency distribution into equal fourths.
R
Randomly parallel tests – tests created by successive random sampling of items from a domain or universe of items.
Ratio scale – an interval scale with an absolute zero, or point at which there is none of the property being measured.
Reactivity – the phenomenon that causes the reliability of a scale in behavior studies to be higher when an observer knows that his or her work is being monitored.
Reassuring statement – a statement intended to comfort or support.
Receptive vocabulary – in the Peabody Picture Vocabulary Test, a nonverbal estimate of verbal intelligence; in general, the ability to understand language.
Regression line – the best-fitting straight line through a set of points in a scatter diagram.
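The Prophecy formula entry covers two uses: predicting reliability when test length changes by a factor n, r' = nr / (1 + (n - 1)r), and solving for the length factor needed to reach a desired reliability. The Python sketch below is a minimal illustration; the function names and the 0.60 split-half correlation are hypothetical.

```python
def spearman_brown(r, n):
    """Predicted reliability of a test whose length is multiplied by n."""
    return n * r / (1 + (n - 1) * r)

def length_factor(r_current, r_desired):
    """How many times longer the test must be to reach r_desired."""
    return r_desired * (1 - r_current) / (r_current * (1 - r_desired))

# Hypothetical correlation of 0.60 between two halves of a test.
full = spearman_brown(0.60, 2)              # corrected whole-test reliability
print(round(full, 2))                       # 0.75
print(round(length_factor(full, 0.90), 2))  # 3.0
```

In the split-half case n = 2, because each half is one-half as long as the whole test; a length factor of 3.0 means the test would need three times as many comparable items.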
Reliability – the extent to which a score or measure is free of measurement error. Theoretically, reliability is the ratio of true score variance to observed score variance.
Representative sample – a sample drawn in an unbiased or random fashion so that it is composed of individuals with characteristics similar to those for whom the test is to be used.
Residual – the difference between predicted and observed values from a regression equation.
Response style – the tendency to mark a test item in a certain way irrespective of content.
Restricted range – in correlation and regression, variability on one measure is used to forecast variability on a second measure. If the variability is restricted on either measure, the observed correlation is likely to be low.
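The Regression line, Intercept, Residual, and Standard error of estimate entries can be tied together in a short least-squares fit. The Python sketch below is illustrative, with hypothetical names and data; it computes residuals as observed minus predicted values and assumes the common N - 2 denominator for the standard error of estimate.

```python
from math import sqrt
from statistics import mean

def fit_line(x, y):
    """Least-squares regression line Y' = a + bX; returns (intercept a, slope b)."""
    mx, my = mean(x), mean(y)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx  # the intercept: predicted Y when X equals 0
    return a, b

def residuals(x, y, a, b):
    """Observed minus predicted Y for each case."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def standard_error_of_estimate(res):
    """Standard deviation of the residuals, with N - 2 in the denominator."""
    return sqrt(sum(e ** 2 for e in res) / (len(res) - 2))

x = [1, 2, 3, 4, 5]  # hypothetical predictor scores
y = [2, 4, 5, 4, 5]  # hypothetical criterion scores
a, b = fit_line(x, y)
res = residuals(x, y, a, b)
print(round(a, 2), round(b, 2), round(standard_error_of_estimate(res), 3))  # 2.2 0.6 0.894
```

The least-squares residuals always sum to zero, so the standard error of estimate reflects only how widely the points scatter around the line.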
S
Scales – tools that relate raw scores on test items to some defined theoretical or empirical distribution.
Scatter diagram – a picture of the relationship between two variables. For each individual, a pair of observations is obtained, and the values are plotted in a two-dimensional space created by variables X and Y.
Selection ratio – in test-decision analysis, the proportion of applicants who are selected.
Self-report questionnaire – a questionnaire that provides a list of statements about an individual and requires him or her to respond in some way to each, such as “True” or “False.”
Shrinkage – the amount of decrease in the strength of the relationship from the original sample to the sample with which the equation is used.
Spearman’s rho – a method for finding the correlation between two sets of ranks.
Split-half reliability – a method for evaluating reliability in which a test is split into halves.
Standard administration – the procedures outlined in the test manual for administering a test.
Standard deviation – a measure of variability in a distribution of scores; the square root of the variance.
Standard error of estimate – an index of the accuracy of a regression equation. It is equivalent to the standard deviation of the residuals from a regression analysis. Prediction is most accurate when the standard error of estimate is small.
Standard error of measurement – an index of the amount of error in a test or measure. The standard error of measurement is a standard deviation of a set of observations for the same test.
Standardization sample – a comparison group consisting of individuals who have been administered a test under standard conditions—that is, with the instructions, format, and general procedures outlined in the test manual for administering the test.
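The Standard error of measurement, Percentile rank, and Percentile band entries combine in one calculation. The Python sketch below assumes the usual formula SEM = SD × √(1 − reliability); the function names, the normative scores, and the SD and reliability values are hypothetical. Percentile rank follows this glossary's definition, the percentage of normative scores falling below a given score.

```python
from bisect import bisect_left
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)

def percentile_rank(norms, score):
    """Percentage of normative scores falling below `score`."""
    ordered = sorted(norms)
    return 100 * bisect_left(ordered, score) / len(ordered)

def percentile_band(norms, obtained, sd, reliability):
    """Percentile ranks at one SEM below and above the obtained score."""
    sem = standard_error_of_measurement(sd, reliability)
    return (percentile_rank(norms, obtained - sem),
            percentile_rank(norms, obtained + sem))

norms = list(range(51, 151))                    # hypothetical normative scores 51..150
sem = standard_error_of_measurement(15, 0.91)   # hypothetical SD and reliability
print(round(sem, 2))                            # 4.5
print(percentile_band(norms, 100, 15, 0.91))    # (45.0, 54.0)
```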
Standardized interview – an interview conducted under standard conditions that are well defined in a manual or procedure book.
Stanine system – a system for assigning the numbers 1 through 9 to a test score. The system was developed by the U.S. Air Force. The standardized stanine distribution has a mean of 5 and a standard deviation of approximately 2.
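One common way to assign stanines is by percentile rank, using the conventional split of 4, 7, 12, 17, 20, 17, 12, 7, and 4 percent of cases across stanines 1 through 9. The Python sketch below is a hypothetical helper that follows that approach; test manuals may publish their own conversion tables, and in this sketch tied raw scores always share a stanine.

```python
from bisect import bisect_left, bisect_right

# Cumulative percentage of cases at the top of stanines 1-8, from the
# conventional 4-7-12-17-20-17-12-7-4 percent split.
STANINE_CUTS = [4, 11, 23, 40, 60, 77, 89, 96]

def stanines(scores):
    """Assign stanines 1-9 from each score's percentile rank within the group."""
    ordered = sorted(scores)
    n = len(ordered)
    result = []
    for s in scores:
        pr = 100 * bisect_left(ordered, s) / n  # percent of scores below s
        result.append(1 + bisect_right(STANINE_CUTS, pr))
    return result

group = list(range(100))  # hypothetical: 100 distinct raw scores
assigned = stanines(group)
counts = [assigned.count(k) for k in range(1, 10)]
print(counts)  # [4, 7, 12, 17, 20, 17, 12, 7, 4]
```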
Stress – response to situations that pose demands, place constraints, or give opportunities.
Structured personality tests – tests that provide a statement, usually of the self-report variety, and require the subject to choose between two or more alternative responses.
T
Taylor-Russell tables – a series of tables one can use to evaluate the validity of a test in relation to the amount of information it contributes beyond what would be known by chance.
Test – a measurement device that quantifies behavior.
Test administration – the act of giving a test.
Test administrator – the person giving a test.
Test battery – a collection of tests, the scores of which are used together in appraising an individual.
Test-retest reliability – a method for estimating how much measurement error is caused by time sampling, or administering the test at two different points in time. Test-retest reliability is usually estimated from the correlation between performances on two different administrations of the test.
Tracking – the tendency to stay at about the same level of growth or performance relative to peers who are the same age.
Trait anxiety – a personality characteristic reflecting the differences among people in the intensity of their reaction to stressful situations.
Traits – enduring or persistent characteristics of an individual that are independent of situations.
True score – the score that would be obtained on a test or measure if there were no measurement error. In practice, the true score can be estimated but not directly observed.
Two-tailed test – a nondirectional test of the null hypothesis. The two-tailed test is used to evaluate whether observations are significantly different from chance in either the upper or lower end of the sampling distribution.
U
Understanding response – a statement that communicates understanding.
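Several decision-analysis entries in this glossary (Base rate, False negative, False positive, Hit rate, Selection ratio) describe cells and margins of one 2 × 2 table of test decisions against criterion outcomes. The Python sketch below uses a hypothetical function and hypothetical outcomes for ten applicants. It assumes criterion outcomes are known for everyone, including people the test would reject, and estimates the base rate as the overall success rate in that sample.

```python
def decision_table(predicted, actual):
    """Summarize test decisions against criterion outcomes.

    predicted: True where the test classifies the person as a success (selected)
    actual:    True where the person actually succeeds on the criterion
    """
    pairs = list(zip(predicted, actual))
    n = len(pairs)
    hits = sum(p == a for p, a in pairs)  # correct classifications
    false_pos = sum(p and not a for p, a in pairs)
    false_neg = sum(a and not p for p, a in pairs)
    return {
        "hit_rate": hits / n,
        "false_positives": false_pos,
        "false_negatives": false_neg,
        "base_rate": sum(actual) / n,  # success rate in the whole sample
        "selection_ratio": sum(predicted) / n,
    }

# Hypothetical outcomes for ten applicants.
predicted = [True, True, True, False, False, False, False, False, True, False]
actual = [True, True, False, False, False, True, False, False, True, False]
print(decision_table(predicted, actual))
```

Here the test is correct in 8 of 10 cases, with one false positive and one false negative; the base rate and the selection ratio are both 0.4.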
V
Validity – the extent to which a test measures the quality it purports to measure.
Variance – the average squared deviation around the mean; the standard deviation squared.
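The Variance entry defines variance as the average squared deviation around the mean, with the standard deviation as its square root. The Python sketch below computes both directly from that definition. The function name and scores are hypothetical. It also calls the standard library for comparison: `statistics.pvariance` divides by N, matching this definition, while `statistics.variance` divides by N − 1, the usual sample estimate.

```python
from math import sqrt
from statistics import pvariance, variance

def variance_and_sd(scores):
    """Average squared deviation around the mean, and its square root."""
    m = sum(scores) / len(scores)
    var = sum((x - m) ** 2 for x in scores) / len(scores)
    return var, sqrt(var)

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores
print(variance_and_sd(scores))     # (4.0, 2.0)
print(pvariance(scores), variance(scores))
```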