Maria English and James Kilcup

Abolish Speaker Tabs

A bold case is made that speaker tabs ought to be abolished, as they are a poor metric and create a bad incentive structure.

At the end of a tournament on the British debate circuit several years ago, fortuitous interior architecture at the finals venue made it possible for the Chief Adjudicators to drop printed speaker tabs from a balcony onto debaters waiting below. Like manna from heaven, the tabs fell into the outstretched arms of speakers who eagerly pored over the sheets.

Many debaters attach great importance to the individual speaker tab; for some, it rivals team outcomes. It is therefore important to question the actual value and utility of the speaker tab to debating.

Reward systems in competitive activities should aim to improve the activity’s quality and fairly reward excellence. Speaker tabs are particularly bad at both of those things and so we would not regret their passing. First, we argue that speaker tabs are a deeply and irretrievably flawed metric. Second, we argue that speaker tabs create the wrong mix of incentives, which detracts from the quality and enjoyment of competitive debating as an activity. Finally, we will consider some possible objections to our proposal.

Bad Metric

The primary goal of giving awards at competitive debate competitions is to weigh the relative merits of the competitors and reward their excellence. Because we participate in an activity that is (correctly, we think) viewed as a signal of one’s ability to perform in various professional activities (academia, business, law, activism, etc.), these awards can also be important for individual debaters in their professional aspirations. Competition also requires a basic sense of meritocratic fairness. The integrity of competition is weakened when participants perceive the competition and its awards to be based on arbitrary or unfair methods.

For many reasons, the method by which speaker awards are assigned is hard to describe as fair. We will advance two kinds of criticism: flaws that are essential to awarding speaker points, and flaws that are incidental (i.e. possibly resolved in particular instances) but common enough to warrant concern.

First we need to ask: what is a conscientious judge looking for when allocating speaker points? A judge might try to abstract from the content of the debate and ask simply: who spoke best? This is tremendously flawed. Evaluating speaking as an activity distinct from the content of the arguments advanced turns the attention of the judge away from the most important aspect of debating: the exchange of ideas. Such an approach will also tend to focus on highly subjective aspects of performance, such as the quality of someone’s vocabulary or the origin of their accent: markers of perceived sophistication and presence that are substantively unrelated to the quality of their arguments. A focus on these aspects of speaking need not be deliberately biased against already disadvantaged speakers, but it likely entrenches subconscious preferences against them nonetheless.

Let’s suppose instead that our conscientious judge asks: which speaker most persuasively advanced their case? This too becomes complicated, as it requires weighing other nebulous factors such as role-fulfillment, the quality of rebuttal, and the weight of new material. Is it more persuasive to effectively summarise a round as a whole or to introduce and effectively make the best argument for your bench? Probably the latter, but if that is the case the judge privileges member speeches in principle over whip speeches. Additionally, there is always the danger of misplacing credit based on when a strong argument emerges. Judges have no way of knowing which partner on a debate team formulated the argument, only which debater delivered it first. Frequently, so long as the initial formulation of the argument is not markedly deficient, the first speaker to advance a key argument is associated with that argument. Speaker points allocated on this basis favor first speakers without being able to discern genuine responsibility for argument creation.

The reliability of speaker scales is also undermined by the frequent absence of a direct comparison of speakers. At large tournaments with a limited number of preliminary rounds (such as Oxford, Cambridge, Yale, Hart House, etc.), speakers in the top 10 frequently have not had a single round competing directly against one another in the tournament. This means the comparison between these speakers is based on performances against different opponents in front of different judges.

These reflections suggest that, because of the nature of WUDC format debating, its roles and manner of preparation, as well as the ambiguity of what a speaker is to be rewarded for, the task of accurately assigning speaker points to individuals is almost impossible, even for the most insightful and well-intentioned of judges. These issues are problematic regardless of practical matters such as the quality of judges and community practices surrounding speaker points. Now we turn to two problems with the way these scores are assigned in practice: the first is standardization, the second is time constraints.

The tool used to solve the standardization problem, speaker scales, has been ineffective. The truth is, judges from different regions and from different eras have different conceptions of what an 80 is, and getting all of these individuals to consistently apply a single standard is nigh impossible. In the U.S. for instance, we’ve noticed that judges are extremely reluctant to give speaker scores above 80, while in the Canadian BP circuit, in response to speaker point inflation, there is a decided effort (at least present in the recent North American BP Championships held at Hart House) to bring speaker scores down. The consequence of such differences is that debaters’ speaker scores—relative to the rest of the field—are inflated or deflated based on the random allocation of judges to their rooms.
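To make the arithmetic of this point concrete, here is a small Python sketch. Everything in it is hypothetical: the two speakers, the judge "offsets", and the five-round draw are invented for illustration and are not data from any tournament. It assumes both speakers give speeches of identical quality in every round and differ only in which judges they happened to draw.

```python
# Hypothetical illustration: identical speakers, different judge draws.
# All names and numbers are invented assumptions, not tournament data.

TRUE_QUALITY = 78  # both speakers perform at the same level every round

# Assumed personal offsets: how far each judge's idea of a given score
# sits from the intended speaker scale.
JUDGE_OFFSET = {"strict": -3, "neutral": 0, "generous": 3}


def tab_total(judge_draw):
    """Sum the scores a speaker receives across their drawn judges."""
    return sum(TRUE_QUALITY + JUDGE_OFFSET[judge] for judge in judge_draw)


# Five preliminary rounds with different (assumed) judge allocations.
speaker_a_draw = ["strict", "strict", "neutral", "strict", "neutral"]
speaker_b_draw = ["generous", "neutral", "generous", "generous", "neutral"]

total_a = tab_total(speaker_a_draw)
total_b = tab_total(speaker_b_draw)
gap = total_b - total_a

print(f"Speaker A: {total_a}, Speaker B: {total_b}, gap: {gap}")
```

Under these assumed numbers the two speakers finish 18 points apart over five rounds despite identical performances. The gap comes entirely from the judge draw, which is the kind of distortion described above.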

Secondly, and perhaps more damningly, precious few moments are actually spent discussing speaker points by adjudication panels, and to the extent that there is a conversation, it rarely touches on the topics that would be necessary for determining who deserves credit. The vast majority of the time spent in an adjudication is rightly focused on the placement of the teams in the debate, and only at the very last moment does the adjudication turn to speaker points. Even then, the conversation is constrained by the fact that each speaker’s individual points must add up to a number that corresponds to their team’s placement in the debate. A discussion of the teams’ respective rankings is, by its nature, distinct from an examination of each individual speech. In reality, that examination rarely happens. Speaker points are assigned within seconds after the panel vaguely agrees on which of the two speakers was better. To assign more time for speaker points would mean either less time focused on a comparative discussion of teams (an obviously unacceptable outcome) or more time allocated for adjudications. The latter option might be acceptable were it not for the cascading problems tournament organizers would face by adding an additional 5-10 minutes to adjudications for speaker point deliberation.

If these were the only problems with the fairness of speaker points, and if they could plausibly be resolved through a reform, then perhaps we would advocate for reform of speaker points rather than abolition. Neither of the types of issues mentioned above, however, appears likely to go away.

Bad Behaviour

As well as fairly rewarding excellence, reward systems should aim to improve the quality of the activity in question. So how do we measure quality? Presumably, ‘good debating’ is that behaviour which best enables participants to achieve the purpose of the activity in which they are engaged. Though there are many reasons why people debate, it seems safe to say that the main purpose of competitive debating is to win debates. Performing well as an individual speaker is also important, but subsidiary to the main task; in the ‘game’ of debating, it is teams rather than individuals who win.  This is clearly reflected in the fact that, as previously mentioned, judges rightly spend far more time deciding who won a debate than choosing which speaker was best.

Thus, a good regulatory system should align the incentives of debaters towards this kind of behaviour. By the same token, a bad regulatory system would be one that incentivizes debaters to behave in ways that hinder effective team performance.  So what role does the speaker tab play? It is not essential to the effective functioning of the activity of debating; speaker marks are not used to determine the outcome of debates, and they are not necessary to resolve tabbing issues. The only necessary function they play is in breaking ties between teams on the same number of points. For this reason, we second Doug Cochran’s recommendation at the 2013 World Debate Forum of assigning team “speaker points” that assess the overall quality of the team’s effort on a 50-100 scale. Unlike assigning individual speaker points, this decision by the adjudicators would flow naturally from their discussion of the quality of the teams in the debate.

What the speaker tab does, in theory, provide is useful information for individuals about their own speaking performance and reward for speakers who perform well. This should in turn encourage individuals to become better debaters, and give them signals to guide that improvement, thus enhancing overall team performances.

But in practice, this doesn’t happen.

1. It encourages excessive individualism at the expense of the team activity.

Debaters have very scarce time, both during preparation time and in their speeches, and a limited range of strategic options. Operating from ‘win the debate’ logic, the incentive is to share your best ideas with your partner and to allocate scarce time to building the strongest possible team case, even if that requires giving your partner the best material.

What happens when you introduce the speaker tab into the mix? The speaker tab is an exhaustive system; it directly compares every speaker in the tournament. Because it allocates an explicit ranking to every speaker supported by numerical data, it gives the false appearance of creating fine differentiations in performance. So long as one considers speaker scores a somewhat accurate measure, a speaker’s specific location on the tab signifies something important about their performance relative to the rest of the field.  The effect of this is to place all speakers in competition with each other, including their own teammates.

The speaker tab creates perverse incentives and distorts ‘win the debate’ logic, because the tab will not give you credit for anything that is not in your own speech.  In the worst cases, that means keeping good points for yourself rather than sharing them and stealing arguments your partner was supposed to deliver. More subtly, the speaker tab incentivizes the wrong ‘at the margin’ decisions.  To do well on tab, it makes sense to allocate scarce preparation time to developing your own material, and use your speech to strategically build your own arguments rather than to either lay groundwork for your partner, or make the arguments they have already given look stronger. Because this effect is subtle, it is difficult for speakers to self-monitor or to identify when their partner is not being a fully effective team member.

Despite debaters’ best efforts, the speaker tab ensures the spectre of competition always remains between teammates. Amongst the higher-ranked teams, it is competition to be shown by the tab to have been responsible for a greater part of the team’s success than your partner. Amongst teams that look unlikely to break, beating a partner on tab becomes something of a consolation prize that is still within reach even when a break round is not. This competition is poisonous to the team dynamic, and counter-productive to the main purpose of debating, which is to work effectively as a team to win. Even if speaker scores were fairly objective, well standardized, and decided after some reasonable discussion of each person’s speech, this would be something to worry about. Given the scores are none of those things, there is even less justification for actively making it harder for teams to work together.

2. It actively deters some people from debating.

Our experience suggests there is a point somewhere down the tab where rankings begin to be a deterrent rather than an incentive to try harder.

The nature of the tab is such that for every person who does well, someone else has to do poorly. In particular, the more accolades given to those who come out on top, the worse the defeat is for those who end up on the bottom. If you have finished a tournament on only a handful of team points you will be feeling bad enough. Seeing numerical confirmation that you, personally, were the worst speaker in the whole competition takes the signal of failure to a whole new level; it is an individualized, specific and supposedly objective evaluation.

This harm would be somewhat mitigated if it were the case that younger and less experienced debaters advanced up the tab over time as they became better debaters, creating a motivational feedback loop. But that requires speaking improvements to be reflected in improved speaker tab performance. Because speaker marks are a dodgy measure, they often fail to track such improvements, particularly gradual ones, at least over the short term. There is a reasonable likelihood that people who end up near the bottom of the tab at their first tournament will not be able to get out of the bottom quartile for their next few tournaments. Unfortunately, it is in these first tournament experiences that most people decide whether they want to dedicate a substantial amount of their free time to debating rather than something else. There is also a path dependency problem because positions on the individual speaker tab are to a significant extent predetermined by prior performance. The speaker tab creates an assessment bias whereby judges tend to give higher marks to speakers who have finished highly on tab at previous tournaments, both because they are more likely to perceive that speaker favourably going into the debate, and because if they give a previously successful speaker a lower than expected score they may well be challenged about it, whereas hitting an unknown fresher with a 67 will ruffle few feathers.

The end result is that new and inexperienced debaters who tab poorly are unnecessarily discouraged. This makes it harder for them to persevere and improve, two things which are vital to improving the overall quality of debating. This seems a high price to pay for the further affirmation of those near the top of the tab, who are likely to be winning many debates and getting positive judge feedback anyway.

3. It damages the credibility of judges.

As discussed above, the integrity of debating competitions depends on the rewards system being perceived as fair. But there is a major mismatch between the credibility of the speaker tab as a measure of debating ability, and the value attributed to that measure as a signal of status both within and outside the debating community.

It is clear that the speaker tab is not an accurate measure and that it cannot possibly yield results accurate enough to support the degree of differentiation suggested by the exhaustive list of rankings. How big, really, is the difference between the tenth and twentieth placed speaker on tab in a given tournament? There is no way of knowing, and most likely the answer would differ widely between tournaments of comparable size based on the particular speakers involved and the debates that happened. Debaters are well aware of this fact, and for any who are not, it only takes a few experiences allocating speaker marks on a judging panel to become aware that one rogue generous chair judge or a last-minute decision to add on a few points here and there to ‘make the scores add up’ can send speakers whizzing up and down the tab rankings.

Yet it remains the case that many speakers continue to view the ranking of themselves and others on tab as highly significant, and in some cases more significant than the actual outcome of their debates. Perhaps part of the reason is that, if there is a list, it is human nature to want to be on top of it, even if the list is compiled in a way that bears little resemblance to performance. This is especially the case among the typically quite competitive and intellectually motivated individuals university debating attracts. The boost high tab rankings give to a CV further reinforces this impulse.

A common and totally understandable response to this mismatch is exasperation. It is not clear what you need to do to get from fifteenth to thirteenth; the criteria for doing so are either ambiguous or indiscernible. Many debaters find themselves disappointed either with their overall ranking at the end of tournaments, or with their scores in particular rounds.

As well as causing much frustration, this damages the credibility of judges. Anecdotal evidence suggests that when speakers do not get the marks they feel they deserved, they more often attribute it to the bad judgment of their judges than to the semi-arbitrary nature of the mark allocation process. It is not uncommon to hear speakers claiming that a certain judge ‘just doesn’t like me’ or ‘was too liberal/conservative/ignorant to buy my argument’, whereas it is quite rare to hear a speaker say ‘if only the judges had been able to spend more than one minute discussing my speech, they might have recognized X good thing I did’ (in fact, we have never heard anyone say that). This makes sense; once you have decided you want to succeed within the ranking system, it is self-defeating to dismiss scores as something less than a valuable metric. Frustration with speaker marks can make debaters doubt that they are being judged fairly not just as individuals, but in the debate overall, and thus undermines the adjudication that really matters. It makes debates much less fun for both speakers and judges.

A radical suggestion

The above concerns lead us to the conclusion that the speaker tab should be abolished and replaced by an alternative ‘team score’ system that is used to resolve tab issues but not as a basis for awards. This is not to deny the possibility that certain individuals may excel at tournaments, or that such excellence should be recognized. It is simply that speaker scores are a very poor way of identifying excellence, and they actively encourage people to debate worse, give up debating and doubt the fairness of their judges.

Possible Objections

1. People don’t care that much about how they do on tab, because they are aware of its constraints.

If this is the case, then abolishing it would be unproblematic. Moreover, people do seem to care a lot. Especially for teams that go into tournaments with little chance of breaking, the speaker tab takes on a lot of importance as an indication of success. In competitive or budget-stretched societies it may also be the case that failing to perform well on tab means not being selected for future tournaments.

2. Speaker points help speakers mark their improvement.

If the tab is not accurate, then it’s not accurate feedback. You’re much better off thinking of your progress as a team effort that will show up in your team ranking, and you can certainly get individual feedback from judges. Abolishing the tab might even encourage speakers to seek more feedback from judges as an alternative gauge of individual performance, and this is a far superior source of information. As discussed above, the tab can discourage new debaters from trying to improve.

3. Speaker scores are useful information for debate societies in the aggregate when making decisions on how to allocate limited resources to debaters.

If debate societies are currently making resource allocation decisions based on speaker points, the foregoing analysis suggests that they should stop, and instead use other metrics such as team success and external evaluations and feedback. The fact that such decisions are being made on the basis of speaker points also underscores the plausibility of our concern that teammates have incentives to out-rank each other.

4. The natural incentive to win and be someone others want to speak with prevents the perverse incentive we talk about (i.e. competing against your teammate).

The fact that it is possible for people to ignore the speaker tab in favour of other, better objectives is not an argument for keeping the speaker tab, particularly because the logic of a rewards system is that people should want the rewards it distributes. As long as people want to do well on tab as well as win debates (which we contend they do), perverse incentives towards selfishness will make it more difficult to be an effective team member. The more importance is lent to the speaker tab, the harder it is for other incentives to win out.

5. Individual excellence should be rewarded.

We are not opposed to individuals getting recognition in principle. If a system could be devised for doing so that would be accurate, would not distract from the purpose of adjudications, and would not incentivize the wrong sorts of behavior, we would be in favor of it. That being said, debate is a team activity. Furthermore, we suspect that even if all awards are focused on teams, the individual egos of excellent debaters will likely survive.

Conclusion

In conclusion, the speaker tab should be abolished. It exists to measure performance and reward excellence, but the nature of the individual scoring system and the practical reality of competitions render it a deeply inaccurate measure. It should help debaters strive for improvement, but instead it skews incentives away from good teamwork, perseverance, and credible adjudication.

Rewards based on bad maths are neither meaningful nor fair, and systems that incentivise bad behaviour are counter-productive. Far from heavenly manna, the speaker tab actively moves us away from fair, high quality debating. So let’s stop dropping it from the balcony and reach for more meaningful metrics.