Rating (mark) | Description |
Done very well (4) | You cannot find any significant objections to or criticisms of the candidate's performance in data acquisition, diagnosis, and management of the case. The student is fluent. |
Done well (3) | The candidate's general performance, reasoning and sequence of actions reach an acceptable level of competence. You observe relatively minor errors or inefficiencies in the candidate's approach. |
Poorly done (2) | You observe several important deficiencies in data acquisition, diagnosis and/or management. |
Unacceptable (1) | The candidate is clearly unsafe: he mismanages the case without awareness of his own inadequacies. |
Written
The most important written examination techniques are:
- Essay
- Short answers
- Simulation of initial problem solving (SIMP)
- Modified essay questions (MEQ)
- Written objective tests
Essay
For a long time, essay writing was very popular in schools and training courses. Teachers were convinced that essays allowed them to assess knowledge and understanding of medical subjects effectively. In recent years, however, serious arguments have been raised against using essay questions for assessment, because the marking of the answers is unreliable: teachers disagree about the marks that students' answers deserve. This kind of examination may be traditional, but it has severe limitations, and the purposes for which it is to be used must be considered very carefully beforehand. While it is certainly desirable for a health worker to be trained to write a clear narrative report, training and testing this writing skill is best not done at a professional final examination. If for some reason teachers decide to use this format, they have to take the following into account:
- Write questions that elicit the type of answers described by the objectives.
- Use clear, directive words, for example describe, compare, contrast, explain, list and appraise. These words are in harmony with the system of self-directed learning objectives (Table 6, "Process of the formulation of self guided learning objectives").
- Prepare a marking system in advance. Imagine what kinds of answers students might give and decide in advance whether each is correct, partially correct or incorrect.
Short answers
Short answers, like essays, demand a written response from the student, which must be read by the examiner. This method is nevertheless very powerful, because the answers must be very short and can therefore be marked much more easily than long, extended responses.
For example:
An elderly patient presents with several hypopigmented skin patches on the body. The patches are anaesthetic but not itchy.
What is the likely diagnosis?
ANSWER: Leprosy (1 mark)
Or
An elderly patient presents with several hypopigmented skin patches on the body. The patches are anaesthetic but not itchy. Physical examination reveals multiple enlarged and tender nerves.
What do these signs indicate?
ANSWER: They are signs of a reactive state.
In the same way as with the essay questions, the formulation of the questions is important; they must be clear and direct so that a straightforward answer is possible.
Simulation of initial problem solving (SIMP)
This kind of test is a very simple but effective way to assess the competence of the student when he is confronted with a patient. He is invited to indicate what he would do: What are the most important questions in the history and in the physical examination? What laboratory tests would he request?
It starts with a case such as the following.
The pregnant woman with fever.
A woman in the thirty-third week of a normal pregnancy presents with low-grade fever and productive cough.
What would you do?
Guided by his initial impression, the student must list his planned actions.
For this case the following rating categories (checklist) could be used:
History
- How long has she had the fever?
- How long has she had the productive cough?
- What does the sputum contain?
- Does she have any other symptoms?
- Has she observed any weight loss?
- Is the foetus kicking regularly?
- What treatment has she received so far?
Physical examination
- Check temperature
- Check respiratory rate
- Check for enlarged glands in the neck
- Check the sputum produced for volume, colour, odour and contents e.g. blood
- Check for evidence of anaemia.
Diagnoses (preliminary hypotheses)
- Malaria
- Pneumonia
- Bronchitis
- Tuberculosis
Further tests/investigations
- Blood film for malaria parasites
- Sputum examination for bacteria, including AFB.
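The checklist above can also be written down as a simple marking key. The following Python sketch is only an illustration: the item labels, the `score_simp` name and the rule of one mark per checklist item are assumptions made for this example, not part of the scheme described in the text. The examiner still decides whether a student's written answer matches an item; the code only counts the ticks.

```python
# Hypothetical marking key for the SIMP case above.
# Assumption: every checklist item is worth one mark.
CHECKLIST = {
    "History": [
        "duration of fever",
        "duration of cough",
        "contents of sputum",
        "other symptoms",
        "weight loss",
        "foetal movements",
        "treatment received so far",
    ],
    "Physical examination": [
        "temperature",
        "respiratory rate",
        "enlarged glands in the neck",
        "sputum volume, colour, odour and contents",
        "evidence of anaemia",
    ],
    "Diagnoses": ["malaria", "pneumonia", "bronchitis", "tuberculosis"],
    "Further tests": [
        "blood film for malaria parasites",
        "sputum examination including AFB",
    ],
}


def score_simp(ticked):
    """Count, per category, how many checklist items the examiner ticked.

    `ticked` maps a category name to the set of items the examiner judged
    to be present in the student's answer. Items that are not on the
    checklist are ignored.
    Returns {category: (marks obtained, marks available)}.
    """
    scores = {}
    for category, items in CHECKLIST.items():
        credited = set(ticked.get(category, ())) & set(items)
        scores[category] = (len(credited), len(items))
    return scores


example = {
    "History": {"duration of fever", "weight loss"},
    # "typhoid" is not on the agreed list, so it earns no mark.
    "Diagnoses": {"malaria", "tuberculosis", "typhoid"},
}
result = score_simp(example)
total = sum(got for got, _ in result.values())
available = sum(out_of for _, out_of in result.values())
print(result)
print(f"{total} out of {available}")
```

Because the key is fixed before the examination, two examiners who tick the same items necessarily arrive at the same mark.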
Modified essay questions (MEQ)
In an MEQ the trainee is given a short description of a patient with a limited amount of data and is asked to write a brief answer to a question. After this first answer, further questions are presented, so the format resembles a series of short answer questions. This method allows the examiner to see how the trainee deals with a patient over time, and it is a most valuable method.
Example MEQ. We take the example of the woman with fever again.
A pregnant woman in the thirty-third week of a normal pregnancy presents with low-grade fever and productive cough. Direct sputum examination was positive for AFB.
In an MEQ it is possible to raise specific questions to see how a student deals with this patient over time. For example:
1. What would you do next? Describe what you would recommend.
2. What treatment would you recommend, given that she has not taken any drugs since the illness started?
After the student has given the answers, the following information is supplied to the student.
The woman's husband has travelled to a nearby town and is not expected back for 4 weeks. In addition, she has two young children at home and no household help.
Question 3. Bearing in mind that the policy of the control programme is that all diagnosed patients must be hospitalised during the initial phase of treatment, what would you do?
Good MEQs and SIMPs are not very difficult to prepare. They take some time, but preparing them is very useful training for teachers, who must discuss the most appropriate approach to patients and problems. To make sure that the answers are marked reliably, the examiners must all agree beforehand which answers are acceptable.
Written objective tests
This term is used for tests such as multiple choice questions (MCQ) and True/False questions, in which the marking of the answers is objective. A typical MCQ has a stem and four or five possible answers. A True/False question presents a statement, and the student has to decide whether the statement is true or false.
An example of an MCQ:
Stem: Active immunisation is available against all of the following diseases except
Five possible answers (one correct):
- tuberculosis
- smallpox
- poliomyelitis
- malaria
- yellow fever
Or
The leprosy bacillus was discovered by
- Hansen
- Koch
- Freud
- Bensen
An example of a True/False question:
Statement: A TB patient on treatment who becomes positive at the 5th month should be registered as a failure case.
True/False
Or
A leprosy patient with 5 skin lesions and an enlargement of the ulnar and radial cutaneous nerves should be classified as paucibacillary.
True/False
Or
Clinical diagnosis of leprosy by supervisors in the field can be made according to the following criteria:
T/F Number of skin lesions
T/F Number of enlarged nerve trunks
T/F Clinical features of the skin lesions
T/F The morphological index
T/F The slit skin smear result
MCQs can be used to test a wide base of knowledge, the interpretation of data and reasoning in a clinical problem. Visual aids can be presented with these questions. Especially in leprosy and related dermatological problems, a picture can be used as the basis for the stem, and the questions can then test whether the student recognises what the picture shows and understands its significance. Marking is objective and can be done by machines, computers or administrators. No examiner's time is needed for marking, but the questions and the correct answers must have been agreed before the examination. With MCQs this can be a lengthy process: the stems of many MCQs turn out to be ambiguous, so that they have to be rejected, while other questions cannot be used because their answer options are not mutually exclusive.
True/False is a type of question that is meant to investigate whether the student knows or does not know. With this sort of question the student can be tested across a wide range of knowledge.
As with essays and short answers, the examiner has to make sure that the statements are short and unambiguous, and that each statement is unequivocally true or false. Again, this must be agreed beforehand by the team of examiners.
Machines or administrators can also do the marking of these questions.
In the literature one can read elaborate discussions about which format is best, True/False or MCQ; about how many distracters an MCQ should have, 4 or 5; and about how many answers may be correct: only one, at least one, or possibly none. To avoid difficult discussions the examiner can use some rules of thumb. Many questions are always better than only a few, because the content of the subject is covered more comprehensively. True/False questions are easier to construct than MCQs, and simple MCQs, with only four alternatives and only one correct answer, are easier to construct than more complicated MCQs. Make sure that there are enough questions to test the knowledge of the student and that the questions fairly represent the knowledge that must be known. If MCQs are to be used, however, we prefer those that have a variable number of correct answers, because this discourages guessing.
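The effect of the question format on guessing can be checked with simple arithmetic. The sketch below assumes a student who guesses blindly and a question that is marked all-or-nothing; both are assumptions made for illustration, since the text does not prescribe a marking rule. The function names are hypothetical.

```python
from fractions import Fraction


def chance_true_false():
    """Probability of guessing one True/False statement correctly."""
    return Fraction(1, 2)


def chance_single_correct(n_options):
    """Probability of guessing an MCQ with exactly one correct option."""
    return Fraction(1, n_options)


def chance_variable_correct(n_options, allow_none=False):
    """Probability of guessing the full answer pattern when any number of
    options may be correct and the question is marked all-or-nothing.

    Each option is either correct or not, giving 2**n_options patterns.
    The "no option is correct" pattern is excluded unless allow_none is True.
    """
    patterns = 2 ** n_options
    if not allow_none:
        patterns -= 1
    return Fraction(1, patterns)


for label, p in [
    ("True/False", chance_true_false()),
    ("MCQ, 5 options, one correct", chance_single_correct(5)),
    ("MCQ, 5 options, at least one correct", chance_variable_correct(5)),
]:
    print(f"{label}: {p} ({float(p):.1%})")
```

Under these assumptions the chance of a correct blind guess falls from 1 in 2 for a True/False statement to 1 in 5 for a simple five-option MCQ, and to 1 in 31 when any non-empty combination of the five options may be correct.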
Direct observation
Examinations must also cover all kinds of practical skills. This can be done by observing the performance of the student in practice or in a simulated (role-play) situation. All kinds of skills can be observed: practical skills, for example how nerves are examined, or communication skills, which determine how a candidate talks to a patient, whether to obtain information for a history or to explain what is wrong. While an examiner observes a candidate, no questions are asked, so that the candidate can carry out the examination without hindrance. However, the observations made in this kind of practical examination can also be unreliable, because different examiners see different things and judge these, or even the same things, in different ways. Thus again there is the problem of the examiner: what does he see, and how does he interpret what he sees?
Let us spend a few words on the problems that arise when somebody observes in his own personal way. When you observe something, you put something of yourself into that observation and into your description of it. We would like to illustrate this with an example from art. A famous example of the different ways in which one can look at things is the comparison of two great artists, Velazquez and Goya. Both were famous Spanish realist painters, yet they painted members of the Royal Family in completely different ways. With Velazquez, they all became noblemen, because Velazquez himself was a nobleman. But when Goya painted the Royal Family, he made them look like a butcher's family in their Sunday best.
In the same way we may expect different views in medicine, since different doctors and health workers have different experience and different opinions about what is important and correct.
Various ways have been suggested to deal with these differences in observation and interpretation, all of which seek to reduce the large variations between observers. The most important aids are rating categories (checklists) and rating forms, just as with written and oral examinations.
In constructing a checklist to observe the performance of a student, examiners discuss in advance what performances can be expected in a test situation and which of these performances are correct or incorrect.
In effect, the examiner checks the performance against the checklist, and when he has to judge the quality of the performance he can also use rating scales to mark it.
For an example of observing practical skills, see the checklist below.
Examination of an ulcer of the leg. Rating categories (checklist) and rating scale.
Item | Done satisfactorily | Done inadequately | Not mentioned or attempted |
Site | | | |
Measurement (diameter or length/width) | | | |
Base: bleeding | | | |
Discharge: describe | | | |
Surrounding tissue: indurated, oedematous | | | |
Skin: sensation | | | |
Other trophic lesions | | | |
Opposite limb: normal; if abnormal, describe it | | | |
Lymph nodes | | | |
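A rating scale like the one in this checklist can be turned into a mark once the examiners have agreed on numeric values. The sketch below is a minimal illustration: the values 2, 1 and 0 for the three columns, the shortened item labels and the `total_mark` function are all assumptions, since the checklist itself gives no numbers.

```python
# Hypothetical numeric values for the three columns of the checklist.
# Assumption: 2 = done satisfactorily, 1 = done inadequately,
# 0 = not mentioned or attempted.
RATING_MARKS = {"satisfactory": 2, "inadequate": 1, "not attempted": 0}

# Shortened labels for the checklist items (labels are illustrative).
ULCER_ITEMS = [
    "Site",
    "Measurement",
    "Base",
    "Discharge",
    "Surrounding tissue",
    "Skin sensation",
    "Other trophic lesions",
    "Opposite limb",
    "Lymph nodes",
]


def total_mark(ratings):
    """Sum the marks for one candidate.

    `ratings` maps a checklist item to one of the keys of RATING_MARKS.
    An item the examiner left blank counts as "not attempted".
    Returns (marks obtained, maximum marks).
    """
    obtained = sum(
        RATING_MARKS[ratings.get(item, "not attempted")] for item in ULCER_ITEMS
    )
    maximum = len(ULCER_ITEMS) * max(RATING_MARKS.values())
    return obtained, maximum


candidate = {
    "Site": "satisfactory",
    "Measurement": "inadequate",
    "Base": "satisfactory",
    "Lymph nodes": "satisfactory",
}
print(total_mark(candidate))
```

Whether a total of this kind is a pass or a fail is a separate decision that the examiners would also have to agree on in advance.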
Objective Structured Practical Examination (OSPE)
The Objective Structured Practical Examination (OSPE), or Objective Structured Clinical Examination (OSCE), is a way of examining communication skills, manual skills, decision-making skills and knowledge at the end of a course. A well-designed OSCE tests the student's ability in different areas. The distinctive characteristic of the OSPE is that it consists of at least 10 "stations". Each station focuses on a particular skill that the student must have mastered by the end of the course.
Each student starts the examination at a different station. At each station the student answers a question or carries out a task, which may be practical or written. At the end of a fixed time period (usually 5 minutes) a bell rings and the student moves to the next station. At the end of the examination every student has visited every station. At the practical stations the students may be asked to take a patient's history, examine some part of the patient (a full examination is not possible in 5 minutes), examine data, photographs or the results of laboratory tests, or use a piece of equipment. At a written station that follows a practical station, there is usually a short answer question (or possibly an MCQ) based on the task performed at the practical station. The practical stations have to be observed by an examiner, who uses a checklist or rating scale to assess the student's performance.
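The rotation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions: there are exactly as many students as stations, every student moves to the next station in a fixed order when the bell rings, and changeover time is ignored. The function name and the value of 14 stations are chosen for the example only.

```python
def rotation_schedule(n_stations):
    """Return schedule[round][student] = station number (all 0-based).

    Assumes as many students as stations. Student i starts at station i
    and moves one station further each time the bell rings.
    """
    return [
        [(student + rnd) % n_stations for student in range(n_stations)]
        for rnd in range(n_stations)
    ]


n = 14        # example number of stations
minutes = 5   # usual time per station mentioned in the text
schedule = rotation_schedule(n)

# In every round each station is occupied by exactly one student.
assert all(sorted(rnd) == list(range(n)) for rnd in schedule)

# Over the whole examination every student visits every station once.
assert all(
    sorted(schedule[rnd][student] for rnd in range(n)) == list(range(n))
    for student in range(n)
)

print(f"Examination time without changeovers: {n * minutes} minutes")
```

With fewer students than stations the same rule still works; some stations simply stay empty in each round.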
One of the great advantages of this kind of examination is that students are tested on a wide range of abilities. A well-designed OSCE requires the students to do things that they will normally have to do in the field as qualified health workers. The test is valid for many intellectual and practical skills. Because the OSCE has at least 10 stations, quite a lot of space is needed. An ideal space would have several different rooms, or one large room that can be divided by screens to give privacy to patients and to separate the other stations. Because the OSCE is different from more traditional examinations, it is vital that both teachers and students prepare for it. The students must have practised an OSCE before they are assessed in their final examinations. This is not a waste of time: it is fair, and students can learn from it.
It is essential to prepare all materials thoroughly in advance: checklists, marking systems, instructions for students and examiners, and the technical equipment, which must be in working order. And, as discussed above, make sure the examiners understand the items on the checklists, agree upon them, and know how to use them.
Prepare a master mark sheet to record all the marks of the students on every station.
Example of 14 stations in an OSCE
Stations
- Examination of the foot
- Interpretation of the clinical features of the skin.
- Taking a history from a patient with cough
- Inspection of the skin and palpation of a group of peripheral nerves
- Approach of the patient (establish a good relationship)
- Chest radiograph: TB with cavity.
- Examination of an eye in Leprosy.
- Social/occupational history-anaesthetic fingers.
- History of a patient with TB who has red urine.
- Examination of a claw hand.
- Photograph of Erythema Nodosum Leprosum, description of Types 1&2 reaction.
- Orthopaedic shoe as exhibit: indications/benefits of its use.
- Classification of a leprosy patient's skin lesion
- Examination of stained sputum.
How to choose the most appropriate examination method
Sometimes it is difficult to decide on the most appropriate examination method. For example, what are the best formats to assess the 14 different skills listed in the OSPE described above? And how can one make sure that these formats are used in the most objective way, in order to reduce the subjectivity of different examiners?
To make a good decision about the most appropriate examination one should know something about educational measurement.
To decide what kind of examination is most appropriate to test whether a desired competence has been mastered, the examiner has to consider 3 important requirements:
- Reliability (precision, repeatability): if different examiners observe the same student, are their marks about the same?
- Validity (accuracy): does the examination give the information about a candidate that the examiner wants to know, and does it measure what it is supposed to measure? Do examiners agree upon the rating categories and the level of performance that is acceptable?
- Practicability: is the examination practical in terms of the time of the candidates and the examiners and the resources that must be used to stage it? In technical language: what is the cost-effectiveness of the examination?
To illustrate these requirements in practice we give several examples. The first concerns the measurement of blood pressure. If two doctors measure blood pressure in patients, and one instructs the patient to lie down while the other asks the patient to sit, we may expect two different readings (not reliable, not precise). Similarly, if an adult blood pressure cuff is used on a child, the reading will not be an accurate representation of the true blood pressure. Consider another concrete example: suppose a manufacturer produces blood pressure instruments that are not well calibrated. They all have the same fault, so when they are used by different physicians a reading 15 mm Hg too low is recorded for every blood pressure. The blood pressures are recorded reliably, but they are all incorrect and inaccurate, and so they are not valid.
The second example is the marking, by 2 or more examiners, of the answers provided by the student in the case of the pregnant woman (see SIMP), without rating categories and a rating scale. If some examiners give high marks while others give low marks, then this procedure is not very reliable.
The third example concerns the question whether X-rays indicate that a patient has tuberculosis of the lungs. If 20 X-rays are taken of the same patient and shown to 5 observers trained to recognise tuberculous lesions, and all 5 independently report that all the X-rays show a lesion, then we may say that the method is reliable. If the 5 observers agree that 10 of the X-rays show a lesion but 10 do not, then we must conclude that the method is not reliable. In a different situation, we show 20 X-rays of several patients to these 5 observers. If there is good agreement among them, we may equally conclude that the method is reliable.
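The X-ray example involves two different kinds of consistency: agreement between observers, and consistency of repeated films of the same patient. The following sketch shows one simple way to put a number on each. The function names and the choice of plain percentage agreement are assumptions made for illustration; more refined statistics exist, and the text does not prescribe any particular one.

```python
from itertools import combinations


def observer_agreement(readings):
    """Mean proportion of films on which two observers give the same
    reading, averaged over every pair of observers.

    `readings` holds one list per observer; each list has one True/False
    entry per film (True = lesion reported).
    """
    pair_scores = [
        sum(a == b for a, b in zip(first, second)) / len(first)
        for first, second in combinations(readings, 2)
    ]
    return sum(pair_scores) / len(pair_scores)


def repeat_consistency(film_results):
    """For repeated films of ONE patient: share of films showing the most
    common result. 1.0 means every film gave the same result."""
    positives = sum(film_results)
    return max(positives, len(film_results) - positives) / len(film_results)


# Five observers, 20 films, all report a lesion on every film.
unanimous = [[True] * 20 for _ in range(5)]
print(observer_agreement(unanimous))      # 1.0

# Same patient: the observers agree that 10 films show a lesion, 10 do not.
half_and_half = [True] * 10 + [False] * 10
print(repeat_consistency(half_and_half))  # 0.5
```

In the second case the observers agree perfectly with each other, yet the films of one and the same patient contradict each other, which is why the method as a whole is not reliable.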
With the example of the X-rays we can now introduce the concept of validity. Is a lesion a valid indicator of TB? The answer is no. In fact, the chest X-ray is a reliable but not very valid method for diagnosing TB, even if it does reveal a lesion. A lesion is often, but not always, proof of TB. To prove TB we need evidence of the tubercle bacillus in the sputum. This method is reliable and valid.
Similarly, if an examiner wants to know whether a student can elicit enlargement of a peripheral nerve in a leprosy patient, written examinations are not valid. With a written examination you can test whether a student knows how to do this, but to check whether the student is really able to do so we need a practical test with a real patient, and the use of rating categories.
Much research has been done on the problems of reliability and validity, using sophisticated designs and statistical analysis. It is not necessary to study this in detail: it is enough to use the conclusions. The difference between reliability and validity can easily be seen and understood visually if we consider a gunman firing at a target. The results can show 3 different patterns. See figure 2.
Figure 2.
Neither Reliable nor Valid
The first pattern (figure A) shows a gunman who hits the target as if he were throwing dice. Every shot is random. He never hits the bull's eye! They are bad shots. Suppose different teachers used the same test (compare with the same gun) to test the competence of a candidate and got such results, each widely different from the others. Obviously such a test is unreliable. If the results were reliable (had precision), different examiners would obtain consistent results for the candidate, not just once, but repeatedly.
Reliable but not valid
Look now at figure B. This is different. In this case the gunman's shots are precise and consistent, but he cannot hit the bull's eye. His shots are reliable, but they do not hit the target in the place he wants to hit it. They are precise but not valid or accurate. Validity answers the question "To what extent does a test actually measure what it is intended to measure?" One has to hit the bull's eye: one has to read the correct, real blood pressure. One has to make sure that the tubercle bacillus is present in the sputum. X-rays of the chest are reliable for seeing lesions but not always valid as proof of TB.
Validity and Reliability
When the gunman hits the target (validity) with all his shots consistently (reliability), figure C is the result. Or, to use the blood pressure example: different physicians using well-calibrated instruments record the same blood pressure consistently (reliability) and correctly (validity). When you use an examination to assess a specific skill, for example the examination of an ulcer of the leg, and the examiners have agreed upon the rating categories and the level of performance that is demanded, then this examination can be regarded as reliable and valid.
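The three patterns can also be expressed in numbers using the blood pressure example. In the sketch below, the distance between the average reading and the true value (the bias) stands for validity, and the scatter of the readings (the spread) stands for reliability. The true value of 120 mm Hg, the sample readings and the function name are hypothetical and serve only to illustrate the idea.

```python
from statistics import mean, pstdev


def bias_and_spread(readings, true_value):
    """Return (bias, spread) for repeated measurements of a known value.

    bias   = mean reading minus the true value
             (a large bias points to a validity problem)
    spread = population standard deviation of the readings
             (a large spread points to a reliability problem)
    """
    return mean(readings) - true_value, pstdev(readings)


TRUE_BP = 120  # hypothetical true systolic pressure in mm Hg

# Pattern A: readings scattered widely around the true value.
print(bias_and_spread([95, 150, 110, 135], TRUE_BP))

# Pattern B: a miscalibrated instrument reads about 15 mm Hg too low,
# but the readings of different physicians agree closely.
print(bias_and_spread([105, 104, 106, 105], TRUE_BP))

# Pattern C: well-calibrated instrument, consistent readings.
print(bias_and_spread([120, 119, 121, 120], TRUE_BP))
```

Note that the bias can only be calculated when the true value is known, just as validity in an examination can only be judged once the desired competence has been defined.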
To get valid and reliable results in education a logical sequence has to be followed.
- First, the skills and competencies that a student should be able to demonstrate at the end of the course must be defined, to ensure that the test is valid. These skills and competencies are the bull's eye.
- Second, independent and competent examiners must agree on what constitutes a good answer, a correct behaviour, or an acceptable level of skill for each element of the test or examination. The observation is then no longer dependent on the judgement of an individual examiner, so the test can be called reliable, precise and objective.
In this special case, since examiners are the measuring instruments, reliability is often called objectivity. Teachers examining skills are reliable or objective, when they do not have personal preferences.
Practicability
Finally, is the examination practicable? Many factors have to be considered if a fair test of agreed goals is to be designed.
We all know how difficult it can be to achieve something new. Good ideas fail because of the financial cost, the demands on time, or the number of people available. The teacher must also take these factors into account. On the one hand he is responsible for the test criteria (validity and reliability); on the other hand his decisions are also influenced by very practical considerations such as budget, manpower, time and colleagues.
OSPE Reconsidered
While we have described the important and distinguishing features of a good examination, the best way for examiners to grasp these features is to design their own examination. As an example we have analysed the 3 requirements, reliability (R), validity (V) and practicability (P), for the OSPE described before.
For each competence assessed at one of the 14 stations, we suggest an examination procedure and estimate its validity, reliability and practicability on a five-point scale (++, +, +-, -, --).
Station | Competence | Assessment | Validity | Reliability | Practicability |
1. | Examination of the foot | Observation, real or simulated patient. Checklist/rating scale | ++ | ++ | + |
2. | Interpretation of the clinical features of the skin | Real specimen. Short answers | ++ | ++ | ++ |
3. | Taking a history from a patient with cough | Observation, real patient. Checklist/rating scale | ++ | ++ | + |
4. | Inspection of the skin and palpation of a group of peripheral nerves | Observation, real patients. Checklist/rating scale | ++ | ++ | + |
5. | Approach to the patient (establish a good relationship) | Observation, real patient. Role-play. Checklist/rating scale | ++ | ++ | + |
6. | Chest radiograph: TB with cavity | Short answers | ++ | ++ | ++ |
7. | Examination of an eye in leprosy | Observation, real patient. Checklist | ++ | ++ | + |
8. | Social/occupational history: anaesthetic fingers | Observation, real patient. Checklist/rating scale | ++ | ++ | + |
9. | History of a patient with TB who has red urine | Observation, real patient. Role-play. Checklist/rating scale | ++ | ++ | + |
10. | Examination of a claw hand | Observation, real patient. Checklist/rating scale | ++ | ++ | + |
11. | Photographs of Erythema Nodosum Leprosum: description of Type 1 and Type 2 reactions | Short answers | ++ | ++ | ++ |
12. | Orthopaedic shoe as exhibit: indications/benefits of its use | Short answers | ++ | ++ | ++ |
13. | Classification of a leprosy patient's skin lesions | Real patients, photographs. Short answers | ++ | ++ | ++ |
14. | Examination of stained sputum | Laboratory smear. Short answers | ++ | ++ | ++ |
Note: The stations with observation must have an examiner present to observe the student, apply the rating categories and score the performance.