Responding to two arguments against the student evaluation of teaching scheme in Nigerian universities

Student evaluation of teaching (SET) has been practised in universities in the Western and Eastern hemispheres for a few decades now, and it is making inroads into the Nigerian tertiary education sector. In this project, I identify two arguments (or assumptions) behind the opposition to the institution of SET in Nigeria. I demonstrate that these arguments/assumptions are incorrect and that the worries behind them are not enough to scrap the programme. I also show that opposition to SET was witnessed elsewhere before SET gained acceptance.


Introduction
The student evaluation of teaching (SET) scheme, which started in the Western hemisphere, has been introduced in many countries' tertiary education systems. It gives students the opportunity to evaluate the teaching quality of their lecturers every semester. The programme has met with a good number of difficulties in practice, some of which have been dealt with in various parts of the world. In the last decade or more, SET has been introduced into the Nigerian tertiary education system and, as expected, has met with a number of worries and some scepticism. I have dealt with three of these worries in another paper. I address two of these worries in this paper. The first is that students could witch hunt lecturers for giving them low grades by rating the lecturers poorly in teaching. I provide reasons why I think this argument is incorrect and its worry not enough to scrap the programme. The second worry is that the exercise could lead to corruption, with students and lecturers pandering to one another. I show that the exercise is arranged and timed in such a way that this kind of corruption hardly arises.

Literature Review
Research by Seldin (1993) shows that student rating of faculty teaching has become the preferred low-cost mechanism by which university administrators monitor lecturing, prevent shirking of duties and ensure productivity. The attitude of both lecturers and students to the idea of student evaluation of teaching (SET) has received considerable attention in the literature, mainly in the Western hemisphere. Since lecturers are at the receiving end of this exercise, there is naturally more interest in what their reactions would be. In this regard, surveys have often shown an interesting combination of acceptance and trepidation on the part of lecturers regarding their evaluation by students. For instance, research found that academic staff in Western countries are generally positive in their response to evaluation of teaching in general, but often apprehensive of SET, since they often perceive student ratings as a threat to the self-esteem of lecturers (Doyle 1975). This mixed reaction is seen in the fact that although the scheme is perceived by lecturers as ego threatening, most lecturers still concede that some means must be found to achieve improvement. For instance, Ryan and Randhawa (1982) observed that most favoured the idea although some thought it was an invasion of their professional autonomy. Eraut et al (1980) also found that most lecturers express concern about the scheme despite feeling the need for such evaluation (cited in Stringer and Finlay 1993: 96). The reaction of students towards the scheme is much more straightforward: a survey conducted by Lomax (1985) found that the consensus among students is that their views should be taken into account in SET.
Another set of research findings suggests that the introduction of SET is usually resisted at first but accepted and viewed positively in the longer term. This was the finding of Donald (1982) in her university, and Rutherford's (1987) interview with staff of the University of Birmingham suggested that the majority supported the scheme as long as it was administered in a systematic and consistent way. She reports that the majority expressed the idea that the passage of time would help allay fears about the programme and give it the chance and support it needs to improve courses. Stringer and Finlay (1993) note that it was "slow to permeate the United Kingdom higher education system" (p. 122). This may be because, as Adelman and Alexander (1982) noted, the scheme is something that should 'rock the boat' and no one should be impressed with any such exercise whose results are painlessly accommodated. In this regard, Dunrong and Fan (2009: 101) note that the SET scheme in American and European higher institutions "has been developing for several decades to become a highly standardized and perfected teaching quality assurance system." There are generally two aims of this scheme: to judge lecturers (for instance for promotion purposes) and to develop courses and teaching quality. The former is called evaluation and the latter development. The idea is that the two should go hand in hand with equal weighting. But Rutherford (1987) found (and warned) that over-emphasis on the judgmental aspect of the scheme removes the developmental advantages that such schemes may offer. This is supported by Dunrong and Fan (2009: 108-109), who observed that making a clear connection between the student rating of lecturers and the promotion, deployment and disciplining of lecturers is unhealthy and makes it difficult for students to commit their views to paper. 
They argue that emphasis should be more on teaching development than on administrative purposes, and that within administrative purposes the emphasis should be more on rewarding the best performing lecturers than on punishment. They argued that the punitive intents of the exercise could alienate its function, in the sense that it could hardly be performed in a normal or appropriate atmosphere or frame of mind. Chen and Hoshower (2003) found that it is the improvement of teaching, rather than administrative decision making about lecturers, that motivates students to evaluate. Haskell (1998: 3) argued that because student evaluation is used for lecturers' salary, promotion and tenure decision-making, there is pressure to conform to classroom demands regarding teaching, and that this is an infringement on academic freedom. But in spite of all these, interviews with 307 Israeli university lecturers showed that 86.6% agreed that student reviews are important for lecturers' promotion and 88.2% agreed that they are important for faculty tenure (duration of lecturer's/professor's contract). And in a survey by Carey (1993) many lecturers argued that, given the alternative of peer evaluation, they consider student evaluation to be less harmful. Moreover, it appears to me that the 'academic freedom' that Haskell refers to is that whose largely negative impacts on the quality of teaching and the educational system have led to the institution of SET in the first place.
One important item discovered by researchers is the lesson that lecturers should always be made aware of what they and their courses are being evaluated for. For instance, Murray and Newby (1982) observed that many lecturers are opposed to the idea of constructing evaluation questionnaires without their consultation. Elton (1984) explores a model of the scheme that allows an active role to the person being evaluated. Dunrong and Fan (2009: 112) observed that when teachers are committed to a passive role as audiences of the scheme, it is not surprising that they would be sceptical of the results, no matter how accurate. These seem important concerns, since there is no need to keep targets of evaluation in the dark. The very idea behind evaluation is improvement (apart from promotions) and lecturers who are aware of specific items that they are being evaluated against would conceivably endeavour toward improvement in such items of evaluation in the course or teaching.
In terms of impact on the system, Tuckman and Oliver (1968) found that student feedback yields a more positive response compared to no feedback at all, and that such feedback usually engenders behavioural change. Murray (1984) found that it produced significant improvements in course effectiveness.
Regarding reliability, research shows that students are honest and reliable raters of courses and lecturing (Swanson and Sisson 1971). Evidence collated by Lazovik (1972) shows that student judgments are consistent because the diversity in their opinion becomes a stable measure when the judgments are pooled into a mean. Analysis of research by Stringer and Finlay (1993: 120) shows that this consistency cuts across year groups.
On the question of the importance of the scheme, research by Fox (1984) suggested that it is better to evaluate courses directly rather than assume that exam achievement necessarily reflects good learning or a good course.
Perhaps the most crucial findings concern the ability of the scheme to indicate how well teaching is going on or how successful learning is. In this regard, comparisons have been made between student ratings of courses and measures of student learning. Frey (1973) found that student ratings of courses correlated with their learning, and Doyle and Whitely (1974) report moderately high correlations between student ratings and learning.
Regarding the identity of evaluators, the consensus is that evaluations should be anonymous so that students cannot be penalized for rating courses low. Some lecturers have argued that nonanonymous evaluations would encourage students to be thoughtful in their responses (see Stringer and Finlay 1993: 99). But research surveys by Stone et al (1977) show that students who identify themselves in evaluation feel obliged to rate their courses more highly than those who did not. Doyle (1975) also finds that students are more willing to evaluate when their anonymity is assured.
More recently, studies are beginning to cast doubt on the validity of the student evaluation scheme. For instance, Marsh (1987), Wachtel (1998), Ckonko et al (2002), Spooren and Mortelmans (2006) as well as McPherson et al (2009) point to several factors that could bias student ratings of lecturers, such as gender, class size, ranking of the lecturer and grade expectations. Marsh and Roche (1997) complain about a relationship between ratings and the prior interest of the student in the course and reason for taking it. But McPherson et al (2009) show that these problems can largely be solved by ranking adjustment. Centra (2003: 496) and Spooren and Mortelmans (2006) have shown that student rating is generally reliable and only minimally affected by the various course, teacher and student characteristics/factors (class size, lecturer's rank/experience, gender and so on). Moreover, the findings of Spooren and Mortelmans (2006) support the validity hypothesis: student rating reflects level of student learning. A summary of the literature by Spooren et al (2013) also reveals "positive correlations between SET scores and student achievement, expert ratings of teaching behavior, self-ratings, and alumni ratings" (p. 12). And as mentioned, there is also a positive relationship between the quality of lecturers' research and their student ratings, meaning that a lecturer's student ratings quite often coincide with her research productivity (Stack 2003). Shevlin et al (2000) argue that if students have a positive personal and/or social view of the lecturer, such as being a charismatic lecturer, this may lead to more positive ratings irrespective of the actual level of teaching effectiveness. 
But they also admit (2000: 402) that their research could also suggest that lecturers are attributed a level of charisma based on their level of 'lecturer ability' and 'model attributes', that is, the better the lecturer the more charismatic they are rated.
As with any system invented to improve people's behaviour or productivity, SET comes with its own technical difficulties. Some of these difficulties have led to scepticism regarding the ability to keep SET from becoming a platform for witch hunting and a platform for corruption. In this paper, I particularly address the concern that the SET scheme could lead to witch hunting and corruption. I argue that these are technical problems that can be remedied, rather than reasons for rejecting the scheme. No one, including Aleamoni (1999), has discussed these aspects of SET.

Methodology
Perceptions about the student evaluation scheme were obtained through a qualitative survey depending mainly on interviews. The choice of interviews is clearly influenced by the need to listen to the subjective arguments of Nigerian lecturers and students about the success of SET in order to understand their perceptions and general feel. On the other hand, I did not see the need to embark on quantitative research: it seemed irrelevant to ascertain the quantitative aspect of the misperceptions addressed here (knowing the number of people or proportion of the population holding a certain misperception) because the purpose of the paper is not to reach conclusions based on the demographic strength of a perception but to correct misperceptions identified in a demographic setting. It suffices for the paper that certain misperceptions exist and need to be addressed, regardless of how statistically widespread they are in the society. As I have already shown, literature concerning the student evaluation of lecturers is predominantly empirical, and not much room has been given to the conceptual examination of perceptions. That said, the business of the paper is largely conceptual: I seek to show, mainly through conceptual discussion, that two worries about SET are at least not enough to scrap the programme. This is based on my finding that the selected perceptions about the scheme are either based on conceptual misunderstandings or need to take account of the potential of the programme for technical improvements and growth.

The Two Arguments/Assumptions
Let me proceed to list two worries about the reliability of the student evaluation scheme. First, students would witch hunt lecturers for failing them. Second, SET would lead to corruption, with students and lecturers pandering mutually to each other's sensitivities and obfuscating objectivity. Let me address these worries.
Argument/assumption One: students would witch hunt lecturers in student evaluations for failing them in academic examinations.
11 out of 18 lecturers (or 61%) answered "Yes, I think so" to my question, "Do you think that students would witch hunt lecturers in evaluations for failing them in exams?" When I asked for their reasons, some of the answers are worth looking at, including, "Some of them can be vindictive", "Students may not appreciate lecturers that are strict", "Students would want any opportunity to revenge their examination failures", "Some are myopic and so not interested in academic excellence. Their main preoccupation is success even if it is devoid of integrity", "Some of them lack sincerity and will want to get back at lecturers instead of going through their ills and correcting them", among others.
18 out of 41 students (or 43.90%) answered "Yes, I think so" to my question, "Do you think that students would witch hunt lecturers in evaluations for failing them in exams?" When I asked for their reasons, some of the answers showed that even students fear their colleagues could witch hunt lecturers. Some of these answers are, "Anything is possible when a student is upset by a poor grade", "Unserious ones can be dangerous", "Students who are not serious will see it as an opportunity to victimize hard-working lecturers with integrity", "They see it as an opportunity to pay back", "Some students are just naturally mean", "Some students aren't fair", "No student wants to fail", "for a student to attend classes submit assignments and still fail would make the student do such", "Pay back", and "because students are mostly emotional when it comes to issues concerning their courses".
The concern about witch hunting is also shared by some scholars who have proposed the 'grading leniency hypothesis' (Marsh 1987, Centra 2003). According to this hypothesis, a lecturer can buy higher student ratings by giving higher grades and vice versa (students can give higher ratings in return for higher grades, and both sides can punish each other for poor grades/ratings). In fact, Spooren and Mortelmans (2006) found only a moderate influence of grade on student rating of lecturers. Their evidence rather shows that it is students who have higher grades across all courses (brighter students) that rate teaching higher in a particular course. This refutes the 'grading leniency hypothesis' that better students in a particular course give higher ratings on teaching effectiveness in that course (p. 211).
Some of the students who did not believe students would witch hunt lecturers provided answers that I also find weak, such as "When the lecturers teaches well, students can not witch hunt them", and "It's wrong", among many others. I will not argue that students will not witch hunt lecturers because students will not try to, will not have the intention, or will find it wrong. I will argue that witch hunting can be prevented from arising because of the chronological arrangements of the student evaluation of teaching and the lecturer marking of scripts. The problem of the correlation between the grading of students and the student rating of lecturers can be fixed by timing the two processes precisely so that neither is aware of or influenced by the other. So to address this argument or assumption, I would begin by referring to the chronological arrangements for the student evaluation and academic examinations at the University of Ghana. Here, these things have been timed in such a way that neither party would have any idea of the results of the evaluation by the other party until it is too late to do anything about it. Let me examine when students are required to evaluate their courses/lecturers, in relation to when lecturers grade students' scripts. For the same course, students are required to evaluate their courses/lecturers before the end of teaching. This is several weeks before taking their examinations, and therefore several months before they are graded. Lecturers are not made privy to the results of their evaluation by students until the end of the entire academic year, or at least a semester after they must have graded their students and submitted their students' grades. 
Perhaps to seal this arrangement, that is, to avoid the possibility that lecturers could receive their student ratings whilst they still have their student grades with them, the university requires lecturers to submit their student grades at least two weeks into the beginning of the next semester. As such, it is a mutually blind review process on both sides. This technically prevents the issue of witch hunting from arising. Even if the timing were such that it made witch hunting possible, then it would obtain on both sides. In tertiary educational systems where lecturers are known to witch hunt students for things as petty as turning down their sexual advances, (the institution of) SET would have evened up the balance of caprice. But, as we see from the timing of student evaluations at the University of Ghana, this does not even arise.
There are two other reasons why it seems to me that concerns regarding malice on the part of the student toward the lecturer are misplaced, technically speaking. To demonstrate this I will examine the two categories of timing regarding the origin of feelings of malice: either a student developed malice toward a lecturer before she registered to take his course, or during the course itself. Let me begin with the first category of timing. The number of compulsory courses in universities is progressively shrinking, and most courses are increasingly becoming elective courses. This means that the student is acquiring increasing freedom of choice regarding what courses she would love to take. What this means is that students register and embark on most courses by choice. Since courses are also advertised along with their lecturers, it means that she has the chance to not only choose the course, but the lecturer teaching it. This effectively tones down the possibility that a student would embark on a course taught by a lecturer she dislikes. Consequently, I do not see much merit in concerns that lecturers face the risk of being maliciously scored poorly by students who have been disappointed in them from previous courses or semesters. If this exists, it would only affect compulsory courses and does not represent enough reason to scrap SET. Finally, if malice happens, such as in compulsory courses, such courses contain a large enough population of students to neutralize the poor rating the aggrieved student(s) can give. Compulsory courses are courses that every student should take. If a few students (to be generous, say up to ten students) are aggrieved for their individual reasons against a lecturer, and they score the lecturer poorly in their SET, the poor grading of these ten students would be swallowed and neutralized by the average of, say, a thousand students that did the SET for the same course and lecturer. 
As such, the concern that lecturers face risk of malice from students who hate them before choosing to take part in their course does not impress me.
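The dilution claimed above can be checked with simple arithmetic. The following is a minimal sketch under assumed figures that are not taken from the survey: a hypothetical five-point rating scale, a hypothetical typical rating of 4.0 from students with no grievance, and the ten aggrieved students each giving the lowest possible rating of 1.0.

```python
def pooled_mean(n_total, n_aggrieved, typical_rating, lowest_rating):
    """Mean SET score when n_aggrieved students give lowest_rating
    and all remaining students give typical_rating."""
    n_fair = n_total - n_aggrieved
    return (n_fair * typical_rating + n_aggrieved * lowest_rating) / n_total

# Hypothetical figures: 1,000 students in a compulsory course,
# ten of them aggrieved, ratings on an assumed 1-5 scale.
fair_only = pooled_mean(1000, 0, 4.0, 1.0)
with_malice = pooled_mean(1000, 10, 4.0, 1.0)

print(fair_only)    # mean with no malicious ratings
print(with_malice)  # mean with ten malicious ratings
print(round(fair_only - with_malice, 2))  # size of the drop
```

Under these assumed numbers the lecturer's mean falls by about 0.03 points on a five-point scale, which illustrates how little ten malicious ratings move an average taken over a thousand students.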
What about the second category of timing regarding the development of hatred toward a lecturer: what if the student develops malice during the lecturer's course in question? Here we must look at what happened during the course. Let me recall the contents of the student evaluation form: if a lecturer introduces her course well, is punctual, attends most of her lectures, teaches with spirit and dedication, provides regular assignments and discussions of them, and interacts fairly with students (most constitutions recognize students as human beings requiring respect rather than rudeness no matter what), there seems no reason why a student should develop animosity to a lecturer during the lecturer's course. So we are back to the question: how could a student develop malice toward a lecturer while taking his course? Since we cannot see an answer to this in the lecturer's official dedication and delivery, we must look beyond the classroom, and here we find ample reason for student malice. If a lecturer begins to engage in extra-curricular activities with a student, then inter-personal sensitivities could begin to emerge. It seems to me to be these sorts of sensitivities, especially where they are sexual in nature, and where they have begun to malfunction, that can lead to student-to-lecturer animosity. But even here, many students would not develop animosity, since there is still a social hierarchy between a student and a lecturer in spite of a lecturer's behaviour. Surveys by Feldman (1977) revealed only a few differences between sexes on student ratings. And as we have already seen from the accounts of some students, animosity arises mostly from bullying of students. This behaviour toward students is not warranted under any circumstance whatsoever. Consequently, a lecturer must, in one way or another, be responsible for the development of animosity or malice toward him/her during the handling of his or her course. 
I do not see how she can escape a charge of this responsibility.
This leads me to the subject of inter-personal sensitivities: opponents of the SET scheme would argue that it exposes lecturers to the sensitivities of students, and that we can hardly avoid hurting their sensitivities, which they would express with malicious evaluations. But I have argued that the issue of inter-personal sensitivity only arises in the context of an extra-curricular relationship between the lecturer and the student. A lecturer who punctually walks into class, teaches interactively, is not disrespectful to students, and generally upholds the ethics of her job has no fear of hurting the sensitivity of any student. A lecturer who credits a student with a low grade, but sincerely explains to the student why she performed poorly, has no such fear either. In fact, discussions about grades, along with grading itself, occur long after students have completed their evaluations. As such, the issue of hurt sensitivities maliciously affecting student evaluations does not arise. But consider the alternative that has been proposed in some universities: lecturers to be assessed by their direct colleagues in the department. It is here that inter-personal sensitivities become real. Crediting a colleague with low grades in teaching, even when it is a genuine evaluation, could (and from experience does) hurt people's sensitivities. So why complain about the sensitivities of students (which hardly arise in a legitimate setting) and embrace the much more dangerous inter-personal sensitivities involved in evaluating colleagues? Consequently, argument/assumption number one is incorrect.

Argument/assumption Two: student evaluation of courses/lecturers would lead to corruption, with the student and lecturer pandering mutually to each other's sensitivities
More than half of the lecturers I interviewed (10 out of 18, or approximately 56%) said "Yes, I think so" in answer to my question, "Do you think that students' evaluation of teaching could lead to corruption, with the lecturer and student pandering to each other's sensitivities?" When I asked them to explain their worry, some of their answers were interesting, such as, "Lecturers and students will surely indulge in dirty symbiotic interest protection games", "Lecturers may resort to impress the students even at the price of compromising the academic standard", "Both parties are likely to mortgage their conscience and sense of judgement", and "Students evaluation, if made necessary for lecturer overall assessment of his job can influence pandering sensitivities. Lecturers may pass students unduly just to have favorable assessment." In contrast to the lecturers, only 7 out of 41 students (or 17%) answered "Yes, I think so" to my question, "Do you think that students' evaluation of teaching could lead to corruption, with the lecturer and student pandering to each other's sensitivities?" The reasons most of them gave were not very technical, such as "Possible in Nigeria", and "because students are mostly emotional when it comes to issues concerning their courses" among others.
The worry about corruption is understandable given the wider problem of corruption. As with any system, SET sets up a dual duty on the part of lecturers and students to evaluate one another, the lecturer in terms of examination scripts, and the student in terms of SET forms. It is understandable to worry that the two duties could be done in such a manner as to confer mutual benefits on the duty bearers in a way that undermines the objective of the exercises. The good news, however, is that this worry is addressed by the timing arrangement for the student evaluation exercise in relation to their examinations, on which I have dwelt. In the course of a semester in question, the lecturer would have no idea how students have graded his or her teaching. Neither would students have taken their examinations by the time of evaluating the course. Let us imagine that the lecturer wants to extract high grading from students in return for giving them higher marks. How does he/she execute this plan? Since she is aware that she cannot see the SET scores until another semester, and hence cannot know whether the students acceded to her demands, she cannot, technically speaking, make demands. At best, she can only make a plea during her class for favourable grading from students. But there is something belittling about begging marks from students who can be much inferior to one in age and status. Why would a male lecturer who would want to impress his female students consider this? Why would he be comfortable with the idea that they would jest about him after class? What about a female lecturer facing male students in class? Even if the lecturer is confronting the same sex, is it any better? The answer to these questions cannot be positive. This is because any lecturer who has some shame would find such an idea gruesome and counter-intuitive. 
As such, we have our results from analysis: on the grounds of both timing of evaluations in relation to exams, and the desire in humans for respect from others especially those who are subordinate, argument/assumption number two is incorrect.

Conclusions
I have highlighted two worries being expressed in two West African countries against the SET scheme, and I have demonstrated that these worries should not arise because their arguments and assumptions are incorrect. From my experience with universities that do not operate SET, school authorities still receive informal reports from students about the teaching performances of their teachers, and everyone usually knows who the good, hardworking teachers are and who are not. Although school authorities are not able to formally use these pieces of gossip for administrative purposes, the pieces of gossip nevertheless informally influence them in their handling, promotion and appointment of lecturers and professors to various strategically important responsibilities in the school. But this also makes room for malicious or corrupt use of discretion. So why not make student opinion formal?
The student evaluation scheme derives its origin from the University Quality Assurance concept, which in turn originates from economics, where quality is that which satisfies the customer, and the customer in the university context is the student (Ellis 1993: 3). Many universities have embraced the 'serve the student' philosophy in order to present themselves as qualitative and progressive, and to attract more students. But in the absence of any monitoring mechanisms from students about teaching, there is a gap between this philosophy and reality. The SET scheme would fill this philosophy/reality gap, as it has in other parts of the world (Dunrong and Fan 2009: 100-105), and whatever problems arise from the practice have to be fixed by improving the technicalities of practice.
It is important to also appreciate that the accumulation of student views about a particular course over a long time would enable university academic boards or academic improvement committees to make changes to a course and observe a change of pattern introduced by the change. This introduces some scientific rigour and monitoring exactness into the general exercise of improving the quality of university teaching. There must be an objective and rigorous way to monitor course quality to ascertain that they are meeting agreed standards, and findings regarding strengths and weaknesses of courses can help in course development.
It is also important to note that student impressions of a course are made known outside the course and institution. As such, authorities of institutions are better off having these impressions to work with, since, like it or not, the impressions go out into the society to affect people's general attitudes towards the institution. In this regard, what are student impressions today could become public impressions tomorrow, and therefore have the potential of determining the overall image of a university. A sequel to this is that it is better to be armed with student impressions so as to start working to correct them when they are generally going in the negative direction.
Let me conclude by highlighting what I think is the most meritorious aspect of a student evaluation scheme: under such a scheme, a lecturer is accountable, or at least made to feel accountable, for teaching students well, in much the same way that political leaders and aspirants are made to feel accountable to citizens in a democracy. When a lecturer is aware that she will be graded on punctuality, attendance, course delivery, teaching animation, quality of impact on students, regularity of assignments and their marking/discussions, fair interaction with students, and so on, she will be compelled to do these things well. This compulsion is not guaranteed in the absence of this monitoring mechanism. It is true that lecturers are obliged to teach well with or without instruments of monitoring. But we all know how we often handle obligations. Obligation is not enough: the subtle compulsion provided by the SET scheme is a prudent additional motivation for teaching delivery and discipline.