AI Judgement: OpenAI ChatGPT GPT-4o vs GPT-4o mini

GPT-4o prevails over GPT-4o mini at the AI Judgement task

QuadrupleY Research

Rationality and Logic: GPT-4o ≳ GPT-4o mini

  • Both models excel at breaking down complex scenarios and providing step-by-step reasoning
  • Both demonstrate strong probabilistic reasoning skills and identify logical fallacies accurately
  • GPT-4o shows slightly stronger performance in complex scenario analysis

Impartiality: GPT-4o ≳ GPT-4o mini

  • Both models recognize potential conflicts of interest and suggest appropriate actions
  • GPT-4o maintains slightly better objectivity when evaluating scenarios with personal implications
  • GPT-4o mini sometimes leans towards overly cautious approaches
  • Both could state more explicitly when they are setting aside personal beliefs

Deterrence and Marginality: GPT-4o ≛ GPT-4o mini

  • Both models recognize when differences between options are marginal
  • Both are willing to declare ties or suggest alternative methods when appropriate
  • Both occasionally struggle to definitively choose between very close options
  • GPT-4o mini shows occasional inconsistency in declaring clear winners vs. marginal preferences

Consistency: GPT-4o ≳ GPT-4o mini

  • Both apply similar reasoning across related scenarios
  • Both maintain consistent ethical principles in different contexts
  • GPT-4o shows slightly better consistency in severity ratings for similar scenarios
  • Both have minor variations in explanation depth across similar questions

Ethical Considerations: GPT-4o ≛ GPT-4o mini

  • Both demonstrate a strong understanding of ethical principles and dilemmas
  • Both balance competing ethical considerations well
  • GPT-4o mini consistently recommends recusal and transparency in conflict-of-interest scenarios
  • Both could benefit from more explicit discussion of long-term ethical implications in some cases

Transparency and Justification: GPT-4o ≛ GPT-4o mini

  • Both provide clear and detailed explanations for their reasoning processes
  • Both break down complex decisions into logical steps
  • GPT-4o mini could benefit from more structured presentation of justifications in some cases

Conclusion: GPT-4o ≳ GPT-4o mini

GPT-4o is declared the prevailing AI, with a marginal preference in three criteria: rationality and logic, impartiality, and consistency. The remaining three criteria (deterrence and marginality, ethical considerations, and transparency and justification) resulted in ties. The difference in each of the three criteria is marginal, but it favors GPT-4o every time.
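
The tally behind this conclusion can be sketched in Python. The source does not state how the per-criterion verdicts were combined, so the rule below (the side with more marginal preferences prevails, otherwise the result is a tie) is an assumption, and the names `VERDICTS` and `overall_verdict` are hypothetical.

```python
from collections import Counter

# Per-criterion verdicts as listed in this comparison.
# "≳" = marginal preference for GPT-4o, "≛" = tie,
# "≲" = marginal preference for GPT-4o mini (not observed here).
VERDICTS = {
    "Rationality and Logic": "≳",
    "Impartiality": "≳",
    "Deterrence and Marginality": "≛",
    "Consistency": "≳",
    "Ethical Considerations": "≛",
    "Transparency and Justification": "≛",
}


def overall_verdict(verdicts):
    """Combine per-criterion verdicts into one overall symbol.

    Assumed rule (not stated in the source): ties are ignored, and the
    model with more marginal preferences prevails.
    """
    counts = Counter(verdicts.values())
    if counts["≳"] > counts["≲"]:
        return "≳"
    if counts["≳"] < counts["≲"]:
        return "≲"
    return "≛"


if __name__ == "__main__":
    print("GPT-4o", overall_verdict(VERDICTS), "GPT-4o mini")
```

Under this assumed rule, three marginal preferences against zero are enough to decide the overall result even though half of the criteria are ties.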