A Glossary of Terms Used in Educational Assessment
Freedman, M. & Houtz, J.
Parenting for High Potential
National Association for Gifted Children (NAGC)
March 2004

This article presents an alphabetical listing of terms often associated with gifted assessment.

This article concludes our three-part series on tests and testing. Part One was in the June, 2003 issue, and Part Two was in the December, 2003 issue.

"When I use a word," Humpty Dumpty said in rather a scornful tone, "it means just what I choose it to mean - neither more nor less."

"The question is," said Alice, "whether you can make words mean so many different things."

"The question is," said Humpty Dumpty, "which is to be master - that's all."

Carroll, L. (1946/1974). Alice in Wonderland & Through the Looking Glass. New York: Grosset & Dunlap, Inc. (p. 238).

When talking with your children's teachers or other school personnel, you might find that you're feeling a bit like Alice in Wonderland. All professional fields have special vocabularies, or jargon, but in education today there are a great many new terms. And many of these terms are quite technical and specialized, dealing with testing and other forms of assessment.

In response to many new laws, practices, and advances in research and theory, there is more jargon or new vocabulary that makes it harder for you as parents, and for everyone else concerned, to keep "on top" of things, to understand your children's school experience, and to participate fully in your children's education. The purpose of this article is to offer definitions and explanations of some "classic" terms that you will often hear, and of several of the more recent measurement terms that are finding their way into the world of education today. We present the terms alphabetically, in the form of a glossary.

Ability and Aptitude. The terms ability and aptitude are closely related and often difficult to distinguish from each other. Ability, the mental or physical capacity to perform at a given level, is considered to be innate, therefore determined genetically. According to psychological theory, it may be described as possession of one or more of the multiple areas of intelligence that have been described by various theories and models. Aptitude may be described as the proclivity to excel in the performance of specific tasks (as in, "she has a real aptitude for drawing").

Accountability in assessment refers to holding individuals or institutions responsible for the outcomes of instruction. For example, you might hear or read that "students are accountable for their school successes and/or failures," that "teachers (or parents) are accountable for the performance of their students (or children)," or that "school principals are accountable for the achievement of their schools."

Achievement is a measure of the quality and/or the quantity of the success one has in the mastery of knowledge, skills, or understandings. References to academic achievement, for example, usually involve performance in such areas as reading, mathematics, science, or social studies.

Achievement test batteries. Many schools test students using an array of subtests, in a number of academic content areas and at a variety of grade levels under a single overall test name. For example, a particular "Test of Basic Skills" might involve subtests of mathematical skills, language skills, and vocabulary.

Assessment involves the process of "taking stock" of, or understanding, an individual's characteristics, status, or performance, and typically involves considering and interpreting information from several sources of data. It might involve, for example, observations, interviews, or other kinds of information. (Compare with evaluation and measurement.)

Authentic assessment refers to the evaluation of students' work on activities that students engage in that approximate realistic or real-life tasks and performances, rather than answering traditional paper-and-pencil tests. Authentic tasks typically require complex work, problem solving, and integration of a variety of knowledge and skills brought to bear on a realistic task or challenge. For example, students might use grocery store ads, a shopping list, and a budget to spend as a realistic alternative to completing a group of arithmetic "column addition" exercises on a worksheet.

Competency-based assessment. This phrase indicates that students will be evaluated against some specific learning, behavior, or performance objective. This objective, and/or the level of performance that represents "competency" is clearly established in the curriculum and represents an expected level of expertise or mastery of skills or knowledge.

Criterion-referenced testing refers to evaluating students against an absolute standard of achievement, rather than evaluating them in comparison with the performance of other students. A standard of performance is set to represent a level of expertise or mastery of skills or knowledge.

Derived scores or standard scores transform raw scores (the actual number of correct responses) into values that allow us to compare one student's performance in relation to the performance of others of the same age or grade, or to the highest possible score on a test. Common standard scores are z-scores, T-scores, percentiles, and stanines. Derived or standard scores are all computed by determining how far above or below the mean of all scores a student scores, and then representing the results using a standard scale. [Editor's note: The article by Gregory Machek and Jonathan Plucker in the December, 2003 issue of PHP included a chart illustrating many common derived or standard scores.]
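
As a rough illustration of how two of these derived scores can be computed, the sketch below converts a raw score into a z-score and a T-score. The sample scores, the function names, and the choice to treat the sample as the whole comparison group are assumptions made for illustration only. Published tests compute these values from their own norm groups. The sketch assumes the usual conventions: a z-score counts standard deviations from the mean, and a T-score rescales it to a mean of 50 and a standard deviation of 10.

```python
from statistics import mean, pstdev


def z_score(raw, scores):
    """Standard deviations that `raw` lies above (+) or below (-) the group mean."""
    return (raw - mean(scores)) / pstdev(scores)


def t_score(raw, scores):
    """Rescale the z-score so the group mean is 50 and one standard deviation is 10."""
    return 50 + 10 * z_score(raw, scores)


# Hypothetical comparison group (assumption: these five scores are the whole group).
group_scores = [100, 90, 80, 80, 70]

print(round(z_score(90, group_scores), 2))  # a little more than half a standard deviation above the mean
print(round(t_score(90, group_scores), 1))  # the same standing on the T-score scale
```

A raw score exactly at the group mean gives a z-score of 0 and a T-score of 50, which is why these scales make it easy to see at a glance whether a score is above or below average.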

Evaluation represents a judgment or determination of value (e.g., effective or ineffective, or below, at, or above grade level) that is placed on some performance.

Formative evaluation refers to any form of assessment, such as quizzes, tests, essays, projects, interviews, or presentations, in which the goal is to give students feedback about their work while it is in progress, to help students correct errors or missteps, or to improve the work along the way to the final product. In contrast, the goal of summative evaluation is to make a judgment about a final product or about the quality of performance at the end of an instructional unit or course.

Grade equivalent score. A grade equivalent score describes a student's performance on a test in relation to a grade level and number of months during the year of that grade. (A score of 8.2, for example, tells you that your child obtained the same score on a test that an average student in the second month of the eighth grade would obtain.) Of course, if your child is in the fifth grade, that's very good, but if your child is in the tenth grade, that's not so good!

High-stakes testing typically refers to major state or national standardized school achievement tests administered periodically to students at various grade levels. The phrase "high stakes" is used to signify that these test results carry a great deal of weight among school personnel, government agencies, politicians, community leaders, and the general public. These test results often are used to make important decisions about students, teachers, and their schools, such as graduation, grade promotions or retentions, selection for highly competitive programs or schools, or staffing and budget decisions.

Intelligence. Over many years, the concept of intelligence has had many definitions. Intelligence has been defined, to cite several examples, as the ability to think conceptually, to solve problems, to manipulate one's environment, or to develop expertise. Some theorists have proposed that intelligence is mostly innate, inherited, or biologically-based, and others have argued equally strongly that intelligence is influenced by one's environment. Issues regarding the nature and breadth of intelligence continue to be topics of lively discussion among theorists and researchers in several fields of study (including educational psychology, cognitive psychology, and sociology, for example).

Learning objective. A learning objective is a specific statement that describes what the student is to learn, understand, or to be able to do as a result of a lesson or a series of lessons.

Learning outcome. A learning outcome represents what the student actually achieved as a result of a lesson or a series of lessons. The success of lessons may be influenced by the students' prior knowledge, their effort and attention, teaching methods, resources, and time. Learning outcomes refer to the results of instruction, while learning objectives refer to the intended goals and purposes of lessons.

Measurement is simply the process of assigning a number, or a score if you will, to some performance or product. Examples would include grading a test or a homework assignment in terms of number or percent of correct or incorrect responses.

Measures of central tendency are quantitative (numerical) ways to describe the middle of a distribution of scores. Since most individuals in a given population tend to exhibit middle levels of competence or presence of a characteristic, most people tend to earn scores that are near the central portion of the normal curve (see definition below). There are three common measures of central tendency: mean, median, and mode. The mean refers to a numerical average of the scores. It is obtained by adding all the scores and dividing their sum by the number of scores (e.g., scores of 100, 90, 80, 80, and 70 result in a mean of 84). The median is simply the middle score when all scores are placed in ranked order. The median in our example would be 80 because it is the third score counted in from either direction. The mode is the most often occurring score. In our example, the mode is 80 since it occurs more often than any other score.
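
For readers who want to check the arithmetic, the short sketch below recomputes the three measures for the five example scores using Python's standard `statistics` module. The variable names are illustrative assumptions.

```python
from statistics import mean, median, mode

# The five example scores from the definition above.
scores = [100, 90, 80, 80, 70]

average = mean(scores)      # sum of the scores (420) divided by the number of scores (5)
middle = median(scores)     # middle value once the scores are placed in ranked order
most_common = mode(scores)  # the score that occurs most often

print(average, middle, most_common)
```

The three results match the values worked out in the text: a mean of 84, a median of 80, and a mode of 80.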

Minimum competency is a judgment of the lowest level of skill or knowledge a student must have attained to be considered "competent" in that area. Minimum competency tests are often the focus of broad national educational efforts to improve education. It is important to note, especially for high-ability students, that minimum competencies do not represent an adequate standard or expectation of performance, nor do they imply proficiency in, or mastery of, the content or skill being tested.

Normal curve ("bell curve"). The normal or "bell" curve is a common way of representing the distribution of scores for a particular competence or characteristic in a large population. Since most individuals of any population would exhibit "average" competence or presence of a characteristic, their scores appear in the middle area around the crest of the curve. Those who exhibit exceptionally high or low competence or very great or very small presence of a characteristic appear at either end of the curve's shape. [Editor's Note: The second part of this series, in the December, 2003 issue of PHP, also included a diagram of the normal curve.]

Norm-referenced testing (or norm-referenced assessment) refers to testing in which individuals' results are compared to some larger group (such as a national or statewide sample of students). Usually, "norm" or "normal" groups are those in which the students' scores are distributed in a "normal" (or "bell-shaped") pattern. In these cases, an individual's performance is assessed in relation to where his or her score would fall under the normal curve.

Objective test items require the student to select a specific response to a question that can be graded as either correct or incorrect. They are easy to administer and score (and can often be machine-scored). Common examples of objective test items include: true-false, multiple-choice, and matching questions.

Online assessment is an assessment that is accessed on a computer via the Internet or a similar computer network. The assessment or test is read online and the responses are given online by selecting or checking a choice by clicking the mouse, typing a response, or perhaps even touching the computer screen with a special "pen" or speaking a response aloud using voice recognition technology. Online assessment may also be a vehicle for submitting a portfolio of student performances or completed assignments for the teacher to evaluate.

Percentile ranks refer to an individual's standing in relation to the rest of the individuals in the norm or comparison group (i.e., others who are taking the same test). If your child receives a percentile rank of 90, it means that your child achieved a score equal to or better than 90 percent of the rest of the group with whom he or she is being compared.
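
The definition above can be sketched in a few lines of code. The comparison-group scores and the function name below are hypothetical, and the sketch follows the "equal to or better than" wording used here. Test publishers may use slightly different formulas, for example in how they count tied scores.

```python
def percentile_rank(score, comparison_scores):
    """Percent of the comparison group scoring at or below `score`."""
    at_or_below = sum(1 for s in comparison_scores if s <= score)
    return 100 * at_or_below / len(comparison_scores)


# Hypothetical norm group of ten students (assumption for illustration).
norm_group = [55, 60, 62, 68, 70, 73, 75, 81, 88, 95]

# Nine of the ten scores are at or below 88, so this prints 90.0.
print(percentile_rank(88, norm_group))
```

Note that a percentile rank of 90 does not mean the child answered 90 percent of the questions correctly. It describes standing within the group, not the share of correct answers.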

Performance assessment refers to a system of evaluating individuals' abilities or achievements based on actual work or behavior. Performance assessment focuses on the student's ability to apply what he or she has learned to a realistic task: a problem or situation that might be encountered in real life.

Portfolios are collections of an individual's work. Some educators regard portfolio assessment as a better method of observing and evaluating what learners truly know, understand, and can do than are tests and homework exercises, for example. In typical classrooms that employ portfolios, students keep their work (quizzes, test papers, creative writing, homework, book reports, project reports, art projects, etc.) in large folders, boxes, electronic files, or other storage containers. They may keep all their work or, as is more typical and recommended as best practice, students (on their own or with their teachers' guidance) periodically select samples of their work to illustrate their best performances across a variety of activities. Students and teachers also may keep work samples of various degrees of achievement to illustrate growth in ability over time or to help identify and illustrate particular weaknesses or disabilities that require additional attention.

Power tests typically have no time limits or very generous time limits so that the individual has sufficient time to answer all questions. On a power test, the goal is to measure how much the individual can do without the pressure of time limits. (Compare with "speed tests.")

Profile. A student profile is often used to describe a student's characteristics and learning needs, to help guide important educational decisions for a particular individual, or to guide individualized instructional planning. It may contain many different kinds of data (including test scores, observations, anecdotal records, samples of student work, or comments from cumulative records) that describe the student, the circumstances that prompted creating the profile, questions or problems requiring resolution, and suggestions for making desired decisions.

Range. The range of scores is the difference between the highest and lowest recorded scores. If the lowest score is 28 and the highest is 98, then the range is 70.
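
The same subtraction can be written as a one-line calculation. In the sketch below, the lowest and highest scores come from the example above. The three middle scores and the function name are assumptions added only to make a complete list.

```python
def score_range(scores):
    """Difference between the highest and lowest recorded scores."""
    return max(scores) - min(scores)


# 28 and 98 are from the example; 45, 67, and 82 are hypothetical filler scores.
print(score_range([28, 45, 67, 82, 98]))  # prints 70
```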

Reliability refers to the degree of consistency or dependability of a test. A reliable test will produce similar scores and distributions whenever it is given to similar populations. Thus, if a student scored a 90 on an achievement test today, then, if the test is reliable, the student's score would not differ substantially if the test were taken again another day. Reliability may also mean that a student would earn similar scores on two different forms of a test, if tested at about the same time.

Rubric. A rubric is a chart or plan that identifies criteria for evaluating a piece of a student's work, be it an essay test, a paper, or some other student production. The rubric offers a description of the qualities or characteristics of performance for several levels (such as: beginning, intermediate, or advanced, or needs improvement, adequate, or outstanding) that the teacher or other evaluator may assign. The best rubrics offer the clearest details for each category of evaluation so that a student's products can be evaluated consistently. Rubrics may be "analytic" or "holistic." An analytic rubric specifies all the components of a perfect response, and point values are assigned to each component. While holistic scoring also identifies a model or perfect answer, point values are not assigned. Thus, holistic or global scoring is more subjective and may be less reliable than analytic scoring.
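
To make the idea of analytic scoring concrete, the sketch below totals the points awarded for each component of a response. The component names, the point values, and the function name are all hypothetical. A real rubric would use whatever components and values the teacher or school has adopted.

```python
# Hypothetical analytic rubric: each component of a response has its own maximum point value.
ANALYTIC_RUBRIC = {"thesis": 4, "evidence": 6, "organization": 5, "mechanics": 5}


def analytic_score(points_awarded, rubric=ANALYTIC_RUBRIC):
    """Return (points earned, points possible) for one piece of student work."""
    earned = 0
    for component, max_points in rubric.items():
        awarded = points_awarded.get(component, 0)  # an unscored component earns 0
        if not 0 <= awarded <= max_points:
            raise ValueError(f"{component}: {awarded} is outside 0-{max_points}")
        earned += awarded
    return earned, sum(rubric.values())


# Hypothetical scores one evaluator might assign to an essay.
essay = {"thesis": 3, "evidence": 5, "organization": 4, "mechanics": 5}
print(analytic_score(essay))  # prints (17, 20)
```

Because each component is scored separately, two evaluators using the same analytic rubric can compare their judgments component by component, which is one reason analytic scoring tends to be more consistent than holistic scoring.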

Speed tests are tests with specific time limits. Such a test rewards individuals who can work fast to answer the test items. Students with disabilities may be exempt from time limits set for speed tests. (Compare with power tests.)

Standardized tests are instruments that are administered, scored, and interpreted in the same, pre-specified way by all users. There are detailed instructions or rules for how a test is administered and scored. (One example of a well-known standardized test is the Scholastic Aptitude Test or SAT.)

Standards-based. To put "standards-based" in front of such terms as instruction, assessment, testing, measurement, evaluation and other terms typically means that whatever teachers teach and students do in class is evaluated against specifically written and adopted standards, or goals and objectives, of achievement, usually written and adopted at the state or national level.

Subjective tests. This term refers to the approach used to evaluate or score the student's response to a writing prompt, an open-ended task or question, or a "free," unstructured response to a short-answer or essay question. Unlike objective tests, in which the correct or incorrect answer selection is easily and quickly obtained, subjective assessments present a more difficult challenge to score and require considerably more time to read and to analyze carefully and equitably.

Validity is a term that describes how well a test, or a test item, measures what it claims to measure, accurately predicts a behavior, or accurately contributes to decision making about the presence or absence of a characteristic.

A Note of Caution
Any interpretations of the results of an assessment and any educational decisions should be made with the primary goal of understanding and doing what is best for the individual child. These decisions, which may involve the parent, teacher, counselor, principal, psychologist, and the child, should use the score of the measurement instrument only as one piece of information - one of many data inputs into the process.

Your efforts to understand and help your child will require that you seek from your child's teachers and other knowledgeable school personnel additional explanations and clarifications of these terms and how they are used. We provided this glossary to help inform you and to guide you in determining the information you will need, as well as the questions you might raise, to improve communication and build a collaborative relationship with the school. Alice marveled at how words might mean so many things, and although this is true in relation to testing in schools today, we hope this glossary will help you to better understand many of the terms you may encounter.

Author Note. Dr. Michael Freedman is Assistant Professor of Science Education in the Graduate School of Education at Fordham University. Dr. John C. Houtz is Professor of Educational Psychology and Associate Dean for Academic Affairs in the Graduate School of Education at Fordham University.
