Measures of States’ Political Ideology:

Issues of Validity, Reliability, and Utility

Throughout the past two decades, a debate has continued over how best to measure the reification that is state political ideology (SPI). Ideology is an extraordinarily difficult concept to measure, even in reference to individuals, and is considerably more troublesome when applied to large populations. Given the definitional, data collection, and conceptual issues involved, a true measurement of ideology is impossible for hundreds of thousands of residents grouped by geography.

In this article, I will show why measuring SPI has proved to be so difficult considering issues of validity, reliability, and utility. I will begin by defining what I mean by these concepts. I will then report the difficulties that can be generalized to the measurements of SPI as a whole. My research will conclude with an in-depth look at recent examples of SPI measurements and discussions of their validity, reliability, and usefulness.

Defining Validity, Reliability, and Utility

SPI is not very useful, and even if it could be shown that a measure was useful, this measurement could not be accomplished with existing data without severe cost in validity. For the purposes of this article, I define validity as how well a test measures what it purports to measure. Of the several different types of validity, only construct validity is often applied to measures of SPI. Construct validity is concerned with the measure conceptually; it assesses whether the measure is consistent with the accepted hypotheses and relationships surrounding particular concepts. Face validity, the idea that a measure is valid "on its face," or that it is a reasonable way to ask or answer a question when all terms are "unequivocal," cannot be applied to measures of SPI because SPI is a reification. A reification cannot necessarily be measured unequivocally because it is not concrete. Content validity, which addresses whether a measure reflects all of the ideas within a specific field, and criterion validity, which assesses the appropriateness of the criteria that are used in a measure, or how well a test predicts what it purports to predict, are not necessarily applicable to measures of SPI because these measures are not observed. In this paper, I test the validity of each measure by asking whether the author or authors define and address the validity of their argument. Specific validity issues will also be addressed within each argument. Because only one type of validity can usually be addressed, validity as a whole is somewhat undermined. If only construct validity can usually be measured, we must question validity as a whole because other types of validity are apparently unavailable.

Reliability does not take on so many different forms. Reliability can be defined as a test’s ability to be repeated and find similar results. No test will always find the same exact results, but the best measures can find very similar results in repeated trials. I will test reliability by assessing whether a test, at least in principle, is replicable.

I define utility as the pertinence of the measure, or how the measure is beneficial to society today or was beneficial to some community when it was published. The terms usefulness and utility are used interchangeably within this paper. My test for usefulness for each of the measures is: does the author specify the substantive gains that we may achieve if we use SPI? If a measure has no meaningful purpose, it can be argued that the research is not necessary. In this paper, I will discuss these three characteristics in a few representative measures of SPI, including the authors’ discussions of validity, reliability, and usefulness, as well as my critique of each author’s treatment of these three criteria.

Overall Problems with SPI

To believe that state political ideology can be assessed, one must accept that aggregating individual scores on ideology scales is a good way to indicate the ideology of the whole. It cannot necessarily be assumed that individuals will have similar political beliefs just because they live in close proximity. In measuring SPI, each researcher is saying that state populations have similar enough political beliefs for each state to be characterized by one word, conservative or liberal, or to be placed upon a spectrum between those terms.

Assuming that people within a state share some political "mindset" also assumes that most individuals have a political ideology at all. Most studies of SPI are based on a liberal-conservative continuum. Using this continuum can be tricky because not everyone defines liberal and conservative the same way. In his article "Varieties of American Ideological Spectra," Herbert J. Gans points out that there are many different continuums in use today and "(w)hile all these spectra have left-right (and center) points, they vary by what they mean by left and right…" By using a liberal-conservative continuum, some validity is sacrificed because the question being answered may not be clear. Asking a participant if he or she is liberal or conservative is only valid if he or she understands what each term means and defines each the same way as the researcher. Reliability also suffers due to this lack of clear definitions because if an individual does not know what an idea means, he or she may not always answer the same way. An individual may one day be conservative and the next day be liberal if he or she does not fully understand the concepts.

Use of the liberal-conservative continuum is also questionable because, even if an individual can call himself or herself a liberal or a conservative, he or she may not know where to put himself or herself on a continuum. Not only do most people not define liberal and conservative the same way, but W. Russell Neuman finds that most people do not have a clear understanding of what the terms mean. In order for a respondent to place himself or herself on a continuum, he or she must have a decent understanding of what the ideas that are being measured represent. Many people do not have this understanding of political indicators such as liberal and conservative. Members of the "mass public" are politically unsophisticated and uninterested, Neuman finds in his research. If most of the population knows little and cares marginally about what is going on in politics, then most of the population probably cannot put themselves on a liberal-conservative continuum with any validity or reliability.

On the other hand, even if a person is somewhat versed in political terminology, or has a firm grasp of what the concepts of liberal and conservative mean, it can be difficult for that individual to place himself or herself on a single-dimensional spectrum. For example, some people may be economically conservative, but at the same time socially liberal. Answers from such a person concerning political ideology will inevitably vary because he or she cannot be placed on a standard, single-dimensional scale. Validity is sacrificed because the ideology of individuals like this is not accurately represented on this type of continuum. Reliability is lost due to the inconsistency of answers that may result from the lack of choices offered by a single-dimensional spectrum.

Exploration of Specific Texts

Though there are many inherent problems with the idea of measuring SPI overall, there are other problems within specific measures concerning validity, reliability, and utility that go beyond these basics. In comparing studies of SPI, there are many levels of validity, reliability and usefulness; some studies are better than others. Authors generally outline their own assessments of their validity and reliability, and occasionally assess how their measure is useful.

Wright, Erikson, and McIver

"Measuring State Political Ideology with Survey Data," by Gerald C. Wright, Robert S. Erikson, and John P. McIver, was the first study I looked at concerning SPI. These researchers argue that ideology at the state level has been ignored because there have not been sufficient data at the state level to assess ideology. They claim to offer the first example of a "survey based estimate of party identification and ideology in the American states." The authors aggregate the responses to a CBS/New York Times telephone poll by state. The survey asked the respondents about their party affiliation and liberalism and conservatism. According to the authors, their study is meritorious because it uses actual interview responses instead of inferring SPI by non-survey measures. Their predecessors used indirect measures such as election outcomes or socio-economic status of the population to find SPI.
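The aggregation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the numeric coding (-1 for liberal, 0 for moderate, +1 for conservative), the function name state_ideology, and the respondents themselves are all hypothetical, and nothing here reproduces the authors' actual procedure or the CBS/New York Times data.

```python
from collections import defaultdict

# Hypothetical coding: -1 = liberal, 0 = moderate, +1 = conservative.
# These respondents are invented for illustration; they are not poll data.
responses = [
    ("Ohio", 1), ("Ohio", 0), ("Ohio", -1), ("Ohio", 1),
    ("Nevada", -1), ("Nevada", 0),
]


def state_ideology(rows):
    """Return {state: (mean self-placement score, number of respondents)}."""
    totals = defaultdict(lambda: [0, 0])  # state -> [sum of scores, count]
    for state, score in rows:
        totals[state][0] += score
        totals[state][1] += 1
    return {state: (total / n, n) for state, (total, n) in totals.items()}


scores = state_ideology(responses)
for state, (mean, n) in sorted(scores.items()):
    print(f"{state}: mean score {mean:+.2f} from {n} respondents")
```

Keeping the respondent count next to each mean matters for the discussion that follows, because a state score resting on a handful of interviews looks no different on the page from one resting on thousands.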

At face value, this measure has some merit. Using direct survey data has more criterion validity than using other factors to infer SPI. If the authors purport to measure ideology, then asking people about their personal ideology is a more valid way to answer their question. Unfortunately, the problems presented by Gans and Neuman arise in surveys that ask respondents to place themselves on a liberal-conservative continuum. If the respondent does not understand the question or does not have the same definition of liberal or conservative as the surveyor, the test loses some validity, because there is no sure way to know if the respondent and the researcher have the same definition of concepts.

The authors defend their construct validity on the basis that their measure corresponds with "earlier multistate surveys," state party registration, and state electoral behavior. They assume that if other representations of data that relate to their data have similar answers to their research, then their research is valid, an assessment using construct validity. This may very well be true, but it rests on assumptions about these other examples of data. If it is acceptable that party registration and electoral behavior indicate ideology, then the construct validity gains strength. This measure passes my test of measurement of validity for construct validity because the authors address this type of validity. Other forms of validity are sacrificed because they cannot be shown, though there could be some ability for a defense of construct validity to be developed.

Wright, Erikson, and McIver’s defense of their reliability is that the surveys used simple random sampling to select the respondents. Usually, simple random sampling would prove to make a study reliable, but Wright, Erikson, and McIver run into a problem with sample size. For a statistical sample to be reliable, it must be very large, and, as later SPI researchers point out, Wright, Erikson, and McIver do not always follow this law of statistics. In 20 of the states that they measure, fewer than 1000 respondents were polled, and in 14 of those, fewer than 500 were polled. Since the survey that Wright, Erikson, and McIver use is a national survey, there is no assurance that the data at the state level will be representative, so reliability may suffer.
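The effect of sample size can be illustrated with the standard approximation for the margin of error of a proportion. The sketch below is mine, not the authors': it assumes simple random sampling within each state, treats the quantity of interest as a single proportion (for example, the share of respondents calling themselves conservative), and uses the worst case p = 0.5 with a 95 percent confidence level. The function name margin_of_error and the sample sizes in the loop are hypothetical choices for illustration.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated
    from a simple random sample of n respondents."""
    return z * math.sqrt(p * (1 - p) / n)


# Illustrative sample sizes, not the actual state counts from the study.
for n in (250, 500, 1000, 4000):
    print(f"n = {n:>4}: +/- {100 * margin_of_error(n):.1f} percentage points")
```

Under these assumptions the margin is about 4.4 percentage points at 500 respondents and about 3.1 at 1000, so states with fewer than 500 respondents carry still wider margins. Whether differences between states are large enough to survive that uncertainty is exactly the reliability question raised here.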

This reliability issue in Wright, Erikson, and McIver is illustrated by their data from Nevada. They find that Nevada is the most liberal state in the United States, a highly debatable finding. The authors make an "adjustment" to Nevada’s data so that it fits in better with the political culture of the states around it. This highlights the reliability problem because if the test were to be done again, the research might not establish the same results, since results were inferred from the states in the same region. This also brings up a validity issue because if we are looking at data affected by regional variation and not state variation, then the question of state political ideology is no longer being answered and the measure is therefore less valid. Validity as a whole is based much on reliability in this case. If the results are not reliable because of small sample sizes, then the validity is not trustworthy. How can a measure be assessed to be measuring what it purports to measure if the data are not reliable?

Wright, Erikson, and McIver’s measure may not pass my test for reliability because it might not be able to be replicated. Especially with the regional assumptions about Nevada, another researcher with the same data might not come to the same conclusions. And if another researcher were to take similar random samples and apply the same methods to similarly small samples, he or she might not get the same results due to the small sizes of some of the samples.

Wright, Erikson, and McIver give little reason for why their study matters, other than that it is the first time that a survey has been used to measure SPI. Thus, they fail my test of utility. They give no indication of what SPI is useful for or why political scientists or the general public should care to know about it. Since they do not indicate a good reason for the research, I have to doubt its usefulness.

Political ideology does not stay the same over many years; it is not a stagnant force. Even when the article was published in 1985, it was already out of date. The ideology of a state in 1974 may do little for anyone now. Even if SPI were useful to someone, that person would probably want an accurate and current measurement, not one that was already old when the article was published. Other researchers have argued that in some states, "citizen ideology varies substantially over time," and that Wright, Erikson, and McIver "…are wrong about the general stability of citizens’ ideological orientations in the American states." The measure is not useful if it does not account for change that may occur over these years, especially because Wright, Erikson, and McIver do not indicate why data from these years should be aggregated.

Overall, Wright, Erikson, and McIver provide little demonstration for the need of their measure. They indicate that a measure like theirs has never been done before, but they lack a sound reason for producing the research. If there is no void in the field that measures of SPI fill, then there is little use for research of it.

Holbrook-Provoe and Poe

Thomas M. Holbrook-Provoe and Steven C. Poe, in their article "Measuring State Political Ideology," examine five different measures of SPI to assess which is most valid and reliable. They find that the "roll-call measure" is the best measure when compared to policy measures and survey-based measures. The roll-call measure uses "the roll-call voting behavior of the state’s congressional delegation." This measure works on the assumption that legislators vote as their constituents want. The authors claim that this measure received "high marks" in "practicality and ease of use," as well as in "empirical performance." They do not address the specific validity and reliability of each test that they assess, but rather that of measuring SPI as a whole. It does seem that the measures correlate well with each other, which the authors see as a good indication of their construct validity and reliability.

In the piece, the authors address criterion validity, so they pass my test of addressing and defining validity. The aforementioned problem with criterion validity in reference to SPI is apparent in this work. SPI is a reification, so its criterion validity is difficult to assume. They treat SPI as a concrete measure, which it is not, and assert that people’s political behavior relies on and affects SPI. The authors seem to miss that their test has strength in construct validity, since different measures are compared to one another, but they do not address this.

There are other specific validity problems with the roll-call test of SPI. First of all, the assumption that the way that a legislator votes is always representative of how his or her constituency feels is contestable. Other things may affect how a legislator votes, such as party indicators, favors to other legislators, or pressure from particular donors or interest groups. This measure is less valid because not all of the reasons a politician has for voting a specific way can be known. By using this measure, Holbrook-Provoe and Poe also assume that issues and a population’s issue preference can be situated on the liberal-conservative continuum. Just as it is with individuals, it is difficult to say whether a specific piece of legislation is conservative or liberal. By using a one-dimensional continuum to establish ideological placement of issues and population, the authors sacrifice some validity. Also, there is still the issue that Neuman established: many people do not fit on the liberal-conservative spectrum because they do not understand it. This influences the validity of the measure. This does not render their measure completely invalid, but, as has been established, using the liberal-conservative continuum is quite problematic.

The use of the continuum may also be problematic for reliability. By attaching a level of political ideology to votes on legislation, Holbrook-Provoe and Poe’s measure may lose some reliability because another researcher attempting the same test may not attach the same indicators to particular legislation. Also, results are only considered for one year, 1984. This is problematic for reliability because there is little evidence that their test would hold up for another year. Even if it is measured against other tests, the same test has not been tried a number of times. There is a certain loss of reliability since it was only performed once. But the authors do provide specific data, so there is some indication that another researcher could replicate the test, and the measure therefore passes my test for reliability.

A one-year test is commendable for utility. A measure using data from one year, if SPI can be measured, is useful to show something about that year. Unlike Wright, Erikson, and McIver, Holbrook-Provoe and Poe do not aggregate results from many years and assume that SPI does not change; they instead use results from one year and so make their test very useful to anyone who wants information about 1984. The authors also provide some indication of what SPI might be used for, and they argue that SPI "…is essential to many analysis carried out at the state level," including economic and political studies. Even though the authors give some indication of a reason for this test, it is still problematic to argue that the test is very useful in the present day. Granting that a form of SPI may be useful for economic or political studies, how does roll-call voting from 1984 make any difference to anyone in 1987 (when the article was published) or in 1999? Since this measure relies on past data to establish SPI, it is hard to say that it could be useful now or even at the time of publishing. The usefulness is also arguable because this measure does not specifically account for the public; it only infers something from legislators.

Medoff

Marshall H. Medoff measures SPI based on several variables including "constituency economic interest, legislator shirking, and political party" in his article "The Political Implications of State Political Ideology: A Measure Tested." By accounting for several variables, Medoff claims that his test is reliable and valid assuming "that a constituent ideology exists." Medoff finds that his test is most reliable when he uses more variables; that is, when he accounts for all three of the variables above, he is more likely to get the same results in a given test. He measures the validity of his test in that the scores he gets for particular states reflect the policy output that is produced in that state.

Medoff specifically addresses validity, passing my test for validity, but he neglects to indicate what form of validity his test addresses. Medoff’s validity test comparing policy outputs to his results, a form of construct validity, is justifiable, but there are still ways that his validity is not secure. It cannot always be assumed that legislation coming out of a state is an accurate representation of the ideology of the people in the state; it can only be assumed to represent that of the legislators. Similar to the assessment of Holbrook-Provoe and Poe, Medoff assumes that legislators are a strong representative of the population as a whole, something that is not completely convincing. His test also assumes that a particular ideology can be assigned to legislation, which does not account for bipartisan legislation or legislation that is not ideologically motivated. There is also the consideration that just because a piece of legislation is conservative or liberal, it does not always mean that the state as a whole is aligned with that legislation. For example, in California, one of the most liberal states by all measures, there has recently been some very conservative legislation that stopped public service availability to illegal immigrants (Proposition 187) and legislation that abolished Affirmative Action (Proposition 209). Even with this arguably conservative legislation, it is not likely that anyone is going to claim that California is not a liberal state ideologically, but it shows that legislation need not be a dependable measure for ideology.

The reliability for this test is stronger than the others that have been examined in this paper. Because Medoff’s test takes into account many different variables, it is far more reliable than single-variable tests. It is likely that this test could be replicated.

For usefulness, like other authors, Medoff makes no argument for the pertinence of his research, failing my test for utility. He does analyze in the end which states a politician might need if he or she were to win the presidency, so he is making a case, indirectly, that SPI shows which states candidates should spend their time working in. He does not directly indicate that his information would be useful in this way. Many other factors affect how much time is allocated to campaigning in a particular state, such as how many electoral votes a state has or how competitive the state is for any candidate, so SPI studies are not likely to greatly affect campaigns. It is doubtful that politicians will stop campaigning in California simply because Medoff says that they also need to spend time in Ohio. This use also depends on SPI not changing from year to year in order for the information to be useful to candidates. Since SPI is not static, for the most part, it is not necessarily useful in this way.

Berry, Ringquist, Fording, and Hanson

Berry and his colleagues, in their article "Measuring Citizen and Government Ideology in the American States, 1960-93," use measures of roll-call voting, outcomes of congressional elections, partisan division of state legislatures, party of the governor, and assumptions about voters to measure SPI. They use data on both elected officials and losing candidates to account for opinions of all voters. In assessing congresspeople, Berry et al. use ratings derived by interest groups. As with Medoff, the merits of this measure come from the fact that the researchers use so many different variables to measure SPI. They include measures from elites and the mass public, so the test is more valid than others that account for only one sector of the population. They also consider the views of challengers who are not in office by estimating their views from other legislators of the same party. They claim that this helps better represent the "average" person. Berry et al. measure the reliability of their scores against scores by other organizations. They find that their findings correlate with other researchers’ findings. The validity of the scores is measured by "correlating it with an accepted measure of ideology" for a given state. They assess the validity of each assumption that they make about voters specifically and assess the construct validity of their scores by looking at "other variables linked to the ideology in the theories of state politics."
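A correlation check of this kind can be sketched as follows. The example is an assumption-laden illustration rather than the authors' computation: the Pearson coefficient is one common choice of correlation, the function name pearson_r is mine, and both lists of state scores are invented numbers, not values from any published measure.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical ideology scores for the same five states under two measures.
new_measure = [10.0, 25.0, 40.0, 55.0, 70.0]
benchmark = [12.0, 20.0, 45.0, 50.0, 75.0]

r = pearson_r(new_measure, benchmark)
print(f"correlation between the two measures: {r:.2f}")
```

A coefficient near 1 shows only that the two measures rank the states similarly. It cannot show that either one captures ideology, which is why agreement among measures supports construct validity at best and leaves the other forms of validity untouched.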

Berry and his colleagues’ test has the most validity of any of the tests that have been explored in this paper. They address specifically the validity of each of the assumptions that they use within their measure, passing that test of validity. Because these are all assumptions, the validity that is measured within each is construct validity. The fact that they account for so many sectors of the population, not just elites and not just the mass public, means that they are considering all parts of the state population. They even go so far as to account for the people who lose elections. This is very impressive considering that measuring state ideology began with surveying one group of people about one issue. The problem that arises within this test is that all the data that they use for voters are derived from assumptions. It is hard to justify a measure that has so little concrete evidence. Berry et al.’s first assumption is that voters are "arrayed on a liberal-conservative ideological continuum." This is a debatable claim if research by Gans and Neuman is considered. It cannot be assumed that voters are arrayed on this continuum, or that there is only one continuum on which they are arrayed. There may be variation of this continuum, if there is one at all, between different states and within any state. Also, the measurements used for candidates who do not win are not real data but instead are based on "the typical ideological position of incumbents from the same party." It is good that Berry et al. try to take in so many sides, but it is only marginally valid to say that this is an accurate assessment of the losing candidate’s ideological position. There will always be assumptions in this type of research, but with so many, there has to be some doubt about the validity of the test.

This measure is also more reliable than others looked at here because it accounts for so many different variables. The great number of assumptions that it is based on, though, also creates problems in reliability. It assumes that another person redoing the same test would make the same assumptions, making the replication more difficult. Only if another researcher makes the same assumptions is the measure useful, and only if the measure is proved useful would another researcher make the same assumptions. Berry et al. also consider many years of data, further showing that their data are reliable, since the measure is not only true for one year but shows the ideological change over many years.

The authors’ discussion of construct validity indicates that their measure of SPI correlates with other variables within a given state. This most recent study finally justifies a use for the test by showing that SPI relates to other variables, but it still rests on the assumption that the use of their research comes out of what they address: that it is important to know what ideology can be established for a given state. By showing change over time in particular states, they do remedy the only-good-for-that-year problem. They show that state political ideology changes; it is not a static element of politics in any given state.

Conclusion

Though it seems that researchers are coming closer to finding valid and reliable ways of measuring SPI, as shown through the progression of texts in this paper, the lack of utility and other problems continue to bedevil these measures. Though both Medoff and Berry et al. use numerous variables in their measures, they base much of their research on assumptions. The problem is that all of these authors assume that the general public can be arrayed on an ideological spectrum, something highly debated. Since Neuman established the problem with the unsophisticated mass public, and Gans pointed out the problems with single-dimensional spectrums, it is difficult to say whether most people have an ideology that can be measured or whether that ideology can be placed on a continuum. It is not fair or accurate to account only for the elites because then the test is no longer valid; it is measuring the ideology of elites, not of the state. The validity of any of these measures is questionable due to the lack of availability of information from the mass public, the difficulties that go along with measuring the mass public in politics, and the problems with measuring only legislative outputs to show SPI.

The non-empirical nature of these measures complicates the validity. Construct validity can be, and usually is, found at some level within each measure, but other forms of validity are impossible to find because SPI is a reification. If we can only find this one form of validity because SPI cannot yet be proven, we must still question the validity of the measures to some extent.

Because of all the assumptions on which these measures are based, we must also question their reliability. Could these results be reproduced? Or would the next researcher not make the same assumptions as this one did? Each test has its own reliability problems, the ones with more variables being certainly more reliable, and more useful, than those that only measure one variable in one year. Some of the tests did pass my test of replicability for reliability.

The question of usefulness is rarely addressed specifically. There has not yet been a great indication for why this research should be done at all. The measures are often outdated by the time they are published, another reason to question how useful they are to the present community. Since ideology is not static, we can learn little from the old data other than what the ideology of a given place was like in a given year. Even if a successfully valid and reliable method for measuring SPI were found, I have yet to see a great use for this measure. Without a distinct use for the results of these measures, there is little reason to continue researching state political ideology.