
Rethinking Multiple-Choice

Noah Zimmermann · Assessment Design

In today's ever-evolving higher education landscape, particularly with the rise of online learning, traditional assessment methods like multiple-choice tests are being reevaluated. This post delves into these tests as a staple in education and explores viable alternatives.

Reviewing Multiple-Choice: A Mixed Bag

Three studies frame this review of multiple-choice tests. Together they highlight the format's upsides and downsides and point to principles for designing such tests well.

  1. The 'Testing Effect': Roediger's 2005 study reveals a double-edged sword. Multiple-choice tests can reinforce learning but also misinform. For instance, students exposed to incorrect answers in a test often later recalled these errors.
  2. Cheating and Academic Integrity: Dendir's 2020 study underscores a major online learning challenge: cheating. It found that proctoring exams reduces cheating and thereby protects academic integrity. Notably, scores dropped under proctored conditions, hinting at possible dishonesty in unproctored exams.
  3. Best Practices: Butler's 2017 study offers guidelines for effective multiple-choice tests, such as simplifying questions, focusing on deep engagement with material, avoiding ambiguous options, finding the right balance in the number of choices, ensuring moderate difficulty, and providing feedback.
· · ·

The Core Issue with Multiple-Choice Tests

While easier to design and grade, multiple-choice tests often fall short in evaluating complex learning outcomes. They can lead to frustration, misinformation, or misunderstanding. Hence, it's crucial to find alternatives that leverage the benefits of multiple-choice tests while minimizing their drawbacks.

Benefits

  • Easy to implement and grade
  • Adjustable complexity
  • Manageable test duration

Drawbacks

  • Risk of learning incorrect information
  • Limited scope in testing complex objectives
  • Predominantly factual, less about knowledge application
  • Prone to cheating
  • Can mislead with item wording
· · ·

A New Approach

Building on the multiple-choice format, this post proposes a 'statement format': the learner sorts accurate statements into their corresponding concepts, which engages them directly with the material they have studied. Difficulty scales with how closely related the topics are, ranging from quite straightforward to very challenging, so careful statement selection by instructors is crucial: a well-prepared learner should be able to tell which statement aligns with which topic. In the example below, the statements sort easily into the correct categories with knowledge already acquired. Now envision a test composed of five closely related topics.

Example

Sort these statements according to the topics.

Topic 1: Environmental Science
Topic 2: World History
Topic 3: Human Anatomy
Topic 4: Astrophysics
Topic 5: Literature Analysis

  1. Statement: The Treaty of Versailles, signed in 1919, ended World War I and imposed heavy reparations and territorial losses on Germany.
  2. Statement: The greenhouse effect is primarily caused by the accumulation of gases like carbon dioxide, methane, and nitrous oxide in the atmosphere.
  3. Statement: Neurons are the fundamental units of the brain and nervous system, responsible for receiving sensory input from the external world, sending motor commands to our muscles, and transforming and relaying the electrical signals at every step in between.
  4. Statement: Deforestation contributes to global warming as trees, which absorb carbon dioxide, are removed, increasing the amount of greenhouse gases in the atmosphere.
  5. Statement: The Silk Road was an ancient network of trade routes that connected the East and West, playing a significant role in the cultural, economic, and political interactions between these regions.
  6. Statement: Shakespeare's use of iambic pentameter in his plays adds a rhythmic quality to the dialogue, mirroring natural speech while allowing for varied emotional expression.
  7. Statement: In George Orwell's "1984," the concept of Big Brother symbolizes the intrusive power of a totalitarian state, where the government exercises massive control over individuals' lives.
  8. Statement: The human heart is a four-chambered organ that pumps blood throughout the body, with the left side pumping oxygenated blood and the right side pumping deoxygenated blood.
  9. Statement: A black hole is a region in space where the gravitational pull is so strong that nothing, not even light, can escape from it.
  10. Statement: The Big Bang Theory is a cosmological model that explains the early development of the Universe, suggesting it began from a very high-density and high-temperature state approximately 13.8 billion years ago.

This format enables the following:

  • clear distinctions between test items
  • reliable differentiation of learner levels
  • adjustable difficulty by adding or removing items or topics
  • further tuning by grouping closely related items (harder) or splitting the test into several smaller statement-and-topic tasks (easier)
  • a moderate overall difficulty level, designed to match the learning materials
  • easy evaluation
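Scoring a statement-sorting task is mechanical. A minimal sketch in R, using a hypothetical answer key derived from the ten example statements above (the key, the learner's responses, and the scoring function are illustrative, not part of any published instrument):

```r
# Hypothetical answer key for the ten example statements:
# statement -> topic number (1 = Environmental Science, 2 = World History,
# 3 = Human Anatomy, 4 = Astrophysics, 5 = Literature Analysis)
answer_key <- c(2, 1, 3, 1, 2, 5, 5, 3, 4, 4)

# A hypothetical learner's responses, with statement 4 misplaced
learner_response <- c(2, 1, 3, 2, 2, 5, 5, 3, 4, 4)

# Score = fraction of statements sorted into the correct topic
score_sorting <- function(responses, key) {
  mean(responses == key)
}

score_sorting(learner_response, answer_key)  # 0.9
```

Because each statement is scored independently against the key, partial credit falls out naturally, which supports the reliable differentiation of learner levels noted above.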

· · ·

Format Comparison

To compare the effectiveness of traditional multiple-choice tests with the proposed statement format, a simulation was conducted. Virtual learners with varied learning times, information retention levels, and stress responses were created; each learner's proficiency, derived from these attributes, determined their probability of success under both formats.

Across multiple simulation runs, learners using the statement format had a 60.01% success rate, while those using multiple-choice tests had a somewhat higher success rate of 67.19%. This suggests that while the statement format is more challenging, it potentially offers a deeper assessment of understanding; multiple-choice tests, with their higher success probability, may simply be easier to pass.
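The gap between the two formats follows directly from the probability models used in the code below. For a learner of proficiency p, the statement model uses a guessing baseline of 1/k with k topics, while the multiple-choice model averages a single-choice term and a guessing-adjusted term over 4 options. A self-contained sketch (formulas and the 0.5 success threshold taken from the model listing; the uniform proficiency distribution here is a simplifying assumption, not the correlated attributes the full model uses):

```r
set.seed(42)
n_topics <- 5   # statement test: five topics to sort into
n_options <- 4  # multiple-choice test: four options per question

# Illustrative uniform proficiencies in [0, 1]
proficiency <- runif(10000)

# Statement format: chance baseline 1/k, scaled up by proficiency
p_statement <- 1 / n_topics + proficiency * (1 - 1 / n_topics)

# Multiple-choice format: mean of single-choice and multi-answer terms
p_single <- 1 / n_options + proficiency * (1 - 1 / n_options)
p_multi  <- 1 - ((1 - 1 / n_options)^2) * (1 - proficiency)
p_mc     <- (p_single + p_multi) / 2

# Success = probability of at least 0.5, as in the model
mean(p_statement >= 0.5)
mean(p_mc >= 0.5)
```

Under these formulas the multiple-choice success rate always exceeds the statement success rate for the same learner population, because its guessing floor is higher, which matches the direction of the simulated results.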

· · ·

Conclusion

Multiple-choice tests are convenient and easily implemented, but they struggle with complex learning objectives. As education evolves, so should our assessment methods. By building on the strengths of multiple-choice tests, we can enhance both teaching and learning experiences.

· · ·
The code for the model, written in R, follows below.
# Model: Compare statement format to multiple-choice format
# Made by Noah Zimmermann (with ChatGPT)

library(tidyverse)

# Initialize parameters for tests
number_of_topics <- 5 # For statement test
number_of_options_per_question <- 4 # For multiple-choice test
set.seed(123) # For reproducibility

# Function to randomize learner attributes with a correlation to success
randomize_learner_attributes <- function(number_of_learners) {
  # Base attributes
  base_effort <- runif(number_of_learners, min = 1, max = 50)
  base_stress_performance <- runif(number_of_learners, min = 1, max = 50)
  base_correct_learning <- runif(number_of_learners, min = 1, max = 50)
  
  # Adjust attributes based on a simulated success probability
  success_probability <- runif(number_of_learners)
  effort <- base_effort + (success_probability * 50)
  stress_performance <- base_stress_performance + (success_probability * 50)
  correct_learning <- base_correct_learning + (success_probability * 50)
  
  tibble(
    LearnerID = 1:number_of_learners,
    Effort = effort,
    StressPerformance = stress_performance,
    CorrectLearning = correct_learning
  )
}

# Function to calculate proficiency: each attribute can reach ~100,
# so dividing the three-attribute sum by 300 normalizes to [0, 1]
calculate_proficiency <- function(effort, stress_performance, correct_learning) {
  (effort + stress_performance + correct_learning) / 300
}

# Apply models and calculate mean probabilities
apply_models_and_calculate_means <- function(learners) {
  # Statement Test Model
  statement_results <- learners %>%
    mutate(Proficiency = calculate_proficiency(Effort, StressPerformance, CorrectLearning)) %>%
    mutate(Probability = 1 / number_of_topics + Proficiency * (1 - 1 / number_of_topics))
  
  # Multiple-Choice Test Model
  multiple_choice_results <- learners %>%
    mutate(Proficiency = calculate_proficiency(Effort, StressPerformance, CorrectLearning)) %>%
    mutate(
      SingleChoiceProbability = 1 / number_of_options_per_question + Proficiency * (1 - 1 / number_of_options_per_question),
      MultipleChoiceProbability = 1 - ((1 - 1 / number_of_options_per_question)^2) * (1 - Proficiency),
      
      
      MeanProbability = (SingleChoiceProbability + MultipleChoiceProbability) / 2
    )
  
  # Determine success based on probability criteria
  statement_results <- statement_results %>%
    mutate(Successful = Probability >= 0.5)
  multiple_choice_results <- multiple_choice_results %>%
    mutate(Successful = MeanProbability >= 0.5)
  
  # Calculate and return mean probabilities and success rates
  mean_statement_prob <- mean(statement_results$Probability)
  mean_multiple_choice_prob <- mean(multiple_choice_results$MeanProbability)
  
  list(StatementMeanProb = mean_statement_prob, MultipleChoiceMeanProb = mean_multiple_choice_prob,
       StatementSuccessRate = mean(statement_results$Successful),
       MultipleChoiceSuccessRate = mean(multiple_choice_results$Successful))
}

# Randomize learners based on a random number of learners and apply models

number_of_learners <- sample(15:60, 1)
learners <- randomize_learner_attributes(number_of_learners)
mean_probs_and_success <- apply_models_and_calculate_means(learners)

# Print mean probabilities and success rates

print(mean_probs_and_success)