History of Standardized Testing

Standardized testing has become the means of accountability for students across the nation. What is standardized testing and how should it be used?

American egos were stunned when the Soviet Union successfully launched Sputnik in 1957. How could the US have finished second in the space race? Americans looked for answers and found one in the failure of the schools to educate children to their full potential. This set the education pendulum swinging. The initial swing brought revisions of science and math programs and pushed schools toward greater science and math requirements, while the arts and social studies were pushed into the background. The pendulum began to swing back in the late sixties and seventies under the influence of the "peace movement," and social issues and the arts made a comeback at the expense of math and science. In the past twenty years, the pendulum has wavered back and forth. The turmoil of the seventies gave way to the reform movements of the eighties, and educational reform has continued to put steady pressure on school systems across the country. Accountability became the buzzword of education in the nineties.

Accountability has been a sign of the times. Parents demand schools that produce a quality product, that is, children who can hold their own in an increasingly competitive world of skilled labor. Local politicians point to the schools as a drain on taxpayers' dollars. National politicians use education as a plank in their election platforms. The news media fill space with stories of declining test scores, rising dropout rates and low-quality teachers. Accountability has become the pennant of political campaigns.

Accountability means holding someone responsible. Parents and politicians look to the schools to be responsible for student learning, and the schools in turn hold students accountable for their learning. Traditionally, a student's accountability involved being evaluated by his or her teacher. Evaluation means "finding the value in." Today the term in education refers to making a decision about a student's learning progress, whether that decision is a report card grade or a diploma.

Evaluations are based on assessments. Assessments are tools used to measure student learning, such as quizzes, homework, oral presentations, portfolios and tests. The teacher appraises the assessments and decides on an evaluation. Teachers have often taken a student's self-esteem into account, since failure can be devastating to some students. Teachers could work with students who tried, or who had a desire to succeed, but were unable to perform well on tests, and could use alternate assessments to determine those students' evaluations. State and national politicians now demand accountability that is free from the bias of teachers and schools. The assessment most states have adopted is the standardized norm-referenced test. In most cases, the result of this single assessment has become the evaluation of a student's progress, and in some school districts this single test has become the benchmark for promotion or graduation.

The call for school reform over the past twenty years has resulted in greater emphasis on standardized norm-referenced tests. The most common standardized tests used in elementary and secondary schools are the Iowa Tests of Basic Skills (ITBS), developed by Riverside Publishing, a division of Houghton Mifflin; the California Achievement Tests (CAT), developed by CTB/McGraw-Hill; and the Metropolitan Achievement Tests, developed by Harcourt Brace. These three companies are private, for-profit companies. The two major college entrance exams are the ACT, developed by ACT, Inc., and the SAT, developed by Educational Testing Service. Both of these organizations are private, not-for-profit companies.

In order for tests to be used for comparison, they need to be standardized, that is, they must have a standard set of instructions, testing conditions, time limits and questions. The standardized tests discussed here are norm-referenced tests, that is, a statistical average is established from a sample group of students, called the norm group. Students who later take the test are compared to the norm group. Statistical methods are used to fit the norm group's scores to a normal curve, which, when graphed, appears as a bell-shaped curve. A tested student's result can then be placed on the bell curve to give a score. This score can then be used to compare a student, or groups of students, to a larger group of students at the local, state or national level. Test publishing companies usually renorm their tests every 5-7 years.
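
To make the bell-curve idea concrete, here is a minimal Python sketch that places a raw score on a normal curve and reads off a percentile rank. It assumes, hypothetically, that the norm group's scores are normally distributed with a mean of 500 and a standard deviation of 100; real publishers derive their norms from their own sample data and conversion tables, so these numbers are illustrative only.

```python
from statistics import NormalDist

# Hypothetical norm-group parameters (illustration only, not from any real test).
NORM_MEAN = 500
NORM_SD = 100

def percentile_rank(raw_score, mean=NORM_MEAN, sd=NORM_SD):
    """Percent of the norm group expected to score below raw_score,
    assuming the norm group's scores follow a normal (bell) curve."""
    share_below = NormalDist(mu=mean, sigma=sd).cdf(raw_score)
    # Percentile ranks are conventionally reported as whole numbers from 1 to 99.
    return min(99, max(1, round(100 * share_below)))

for score in (400, 500, 600):
    print(score, percentile_rank(score))
# 400 16
# 500 50
# 600 84
```

Under these assumed parameters, a student one standard deviation above the norm-group mean lands at about the 84th percentile, and one standard deviation below lands at about the 16th.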

Standardized tests should be reliable and valid. Reliability refers to consistency, which depends largely on how the test is constructed. A reliable test reports similar results when taken by similar groups over a period of time; in other words, its scores are not dominated by errors in construction and measurement. Reliability alone does not make a test meaningful, however: a test can produce consistent scores and still measure the wrong thing. Validity refers to whether the test actually measures what students are learning, and it depends on how the test is given. The test manufacturer's instructions must be followed, so that directions, classroom conditions, and time allotted are essentially the same as they were for the original norm group. Test publishing companies are very concerned about reliability and validity, and they carefully monitor the reliability of standardized tests. Validity is a factor that is more difficult to control.
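
Reliability is usually summarized as a reliability coefficient. One common estimate, not described in the text, is test-retest reliability: give the same test twice to the same students and correlate the two sets of scores. The sketch below computes a Pearson correlation in plain Python; the scores are made up for illustration, and real publishers use much larger samples and several kinds of reliability estimates.

```python
from math import sqrt

def pearson_r(first, second):
    """Pearson correlation between two equal-length lists of scores.

    Used here as a test-retest reliability estimate: values near 1.0 mean
    students kept roughly the same relative standing on both administrations.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(first)
    mean1 = sum(first) / n
    mean2 = sum(second) / n
    covariance = sum((a - mean1) * (b - mean2) for a, b in zip(first, second))
    spread1 = sqrt(sum((a - mean1) ** 2 for a in first))
    spread2 = sqrt(sum((b - mean2) ** 2 for b in second))
    if spread1 == 0 or spread2 == 0:
        raise ValueError("scores must vary to compute a correlation")
    return covariance / (spread1 * spread2)

# Hypothetical raw scores for the same five students, tested twice.
first_sitting = [42, 55, 61, 48, 70]
second_sitting = [45, 53, 63, 50, 68]
print(round(pearson_r(first_sitting, second_sitting), 2))  # 0.98
```

A high coefficient only shows consistency; it says nothing about whether the test measures the right skills, which is the separate question of validity.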

The most common scores reported to parents from standardized tests are National Percentile (NP), stanine and Grade Equivalency (GE). These scores can be used to evaluate a school district, a school, a particular education program, a teacher, or a student.

National Percentiles (NP) represent where the student falls on the bell curve established by the norm group. A student with an NP of 65 scored higher than about 65 percent of the norm group, while about 35 percent scored higher. A student with an NP of 50 is exactly in the middle of the norm group: this is the average student, with half the group scoring higher and half scoring lower. The NP is used to determine a stanine score. Stanines ("standard nines") are single-digit scores from 1 to 9 that group percentile ranks into nine bands: 1-3 is below average, 4-6 is average, and 7-9 represents the upper end of the curve. A student whose percentile rank is 50 will have a stanine of 5. Stanines offer an easier way of categorizing students.
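
The percentile-to-stanine step can be sketched as a lookup. Stanines split the norm group into nine bands holding roughly 4, 7, 12, 17, 20, 17, 12, 7 and 4 percent of students; the cutoffs below follow one common conversion table built from those shares, and an actual test's published table may differ slightly at the edges. The function names are illustrative, and the three band labels mirror the groupings described above.

```python
from bisect import bisect_right

# Lowest percentile rank in stanines 2 through 9 under one common convention
# based on the 4-7-12-17-20-17-12-7-4 percent stanine distribution.
# A publisher's own conversion table may differ by a point at the edges.
STANINE_CUTOFFS = [4, 11, 23, 40, 60, 77, 89, 96]

def stanine(percentile_rank):
    """Convert a national percentile rank (1-99) to a stanine (1-9)."""
    if not 1 <= percentile_rank <= 99:
        raise ValueError("percentile rank must be between 1 and 99")
    # Count how many cutoffs the rank has reached; each one moves it up a stanine.
    return 1 + bisect_right(STANINE_CUTOFFS, percentile_rank)

def stanine_band(s):
    """Group a stanine into the three broad bands."""
    if s <= 3:
        return "below average"
    if s <= 6:
        return "average"
    return "above average"

for pr in (3, 50, 65, 97):
    s = stanine(pr)
    print(pr, s, stanine_band(s))
# 3 1 below average
# 50 5 average
# 65 6 average
# 97 9 above average
```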

The most misunderstood score reported is the Grade Equivalency (GE). It is determined by comparing a student's score with the scores of students who took the test in different grades. A 4th grader with a GE of 6.7, for example, has scored the same as an average student in the seventh month of 6th grade would score on the same test. It does not mean that the 4th grader is doing 6th-grade work or is able to do 6th-grade work; it means only that the student scored the same as an average 6th grader did. It is reasonable to assume that the student is doing well, and the score can then be compared with the student's actual work in class. A student who scores at this level should not be having difficulty with reading assignments in school. It is this score that is most often used in high-stakes testing. It is also reasonable to expect a student to gain about one year of grade equivalency after a year of schooling: a 4th grader who scores a GE of 4.5 should score a GE of 5.5 the following year.
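
Because GEs are written as grade.month, the growth arithmetic described above is easy to get wrong with plain decimals. A minimal sketch, assuming the usual convention that the digit after the point is the month of a ten-month school year (so 6.7 is grade 6, month 7), converts GEs to school months before subtracting. The function names are hypothetical, and GEs are passed as strings to avoid floating-point rounding.

```python
def parse_ge(ge):
    """Split a grade equivalent such as "6.7" into (grade, month).

    Assumes the digit after the decimal point is the month of a
    ten-month school year (0-9).
    """
    grade_text, _, month_text = ge.partition(".")
    grade, month = int(grade_text), int(month_text or "0")
    if not 0 <= month <= 9:
        raise ValueError("month must be between 0 and 9")
    return grade, month

def ge_in_months(ge):
    """Express a grade equivalent as a total count of school months."""
    grade, month = parse_ge(ge)
    return grade * 10 + month

def years_of_growth(earlier_ge, later_ge):
    """Growth between two grade equivalents, measured in school years."""
    return (ge_in_months(later_ge) - ge_in_months(earlier_ge)) / 10

print(parse_ge("6.7"))                # (6, 7)
print(years_of_growth("4.5", "5.5"))  # 1.0
```

A result of 1.0 corresponds to the one year of growth the text treats as a reasonable expectation after a year of schooling.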

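The Normal Curve Equivalent (NCE) is also reported to parents at times. It converts the national percentile into an equal-interval score with a mean of 50 and a standard deviation of 21.06, so that, unlike percentiles, NCEs can be averaged and compared across groups. This makes the score more relevant to school administrators than to parents. There is little sense in including NCE scores on parent reports: reported alongside the NP, the NCE is often confusing, because the two scales agree only at 1, 50 and 99.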
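
The percentile-to-NCE conversion can be sketched with Python's standard `statistics.NormalDist`: find the z-score for the percentile on a standard normal curve, then rescale it to a mean of 50 and a standard deviation of 21.06. This is a minimal illustration with a hypothetical function name; publishers may apply the conversion through their own norms tables.

```python
from statistics import NormalDist

# NCE scale: mean 50, standard deviation 21.06, chosen so that
# NCEs of 1, 50 and 99 line up with percentile ranks 1, 50 and 99.
NCE_MEAN = 50
NCE_SD = 21.06

def nce_from_percentile(percentile_rank):
    """Convert a national percentile rank (1-99) to a Normal Curve Equivalent."""
    if not 1 <= percentile_rank <= 99:
        raise ValueError("percentile rank must be between 1 and 99")
    z = NormalDist().inv_cdf(percentile_rank / 100)  # standard normal z-score
    return NCE_MEAN + NCE_SD * z

for pr in (1, 25, 50, 75, 99):
    print(pr, round(nce_from_percentile(pr), 1))
# 1 1.0
# 25 35.8
# 50 50.0
# 75 64.2
# 99 99.0
```

The output shows why the two scores can confuse parents: they match at the extremes and the middle, but a student at the 25th percentile has an NCE of about 36, and one at the 75th percentile has an NCE of about 64.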

Just how reliable are standardized tests? As long as the test conditions match those of the original norm group, the test should be reliable. It is very important that parents keep this in mind when the test is administered. If a child is sick or upset, if the room is unusually hot or cold, if the directions are not clearly given, or if the time allotted is too long or too short, the validity of the results is open to question. If the conditions are valid and the test is reliable, then comparisons can be made with norm-referenced tests. Teachers should be able to identify such problems, but when tests are high-stakes, teachers are often ignored.

Many states have begun to develop standardized tests aligned with state standards. Supporters argue that students should demonstrate some level of proficiency after a year of education, and that the student, teacher and school need to be accountable for test results. Opponents of standardized testing claim that relying on a single assessment will reduce the quality and level of learning. They say that students will be trained to take a test because teachers will be forced to "teach to the test."

Fairness of comparison is another factor to consider. Proponents of testing point to the need for all students to be exposed to good teaching, and argue that testing will expose schools that need to be reconstituted to meet this need. Opponents point to the great disparities between wealthy school districts with vast resources and poor districts that can barely maintain their buildings.

The use of standardized testing for high-stakes decisions such as promotion and graduation is hotly contested. Proponents claim that it will hold teachers and students accountable. Opponents point to the inherent unfairness of using a single assessment given on a single day to evaluate a student's entire year of learning.

The battle over standardized testing will be fought for years to come. Both sides have strong arguments to support their case. Students, teachers and schools need to demonstrate student learning growth. Does standardized testing fairly measure this growth?

© High Speed Ventures 2011