Email Generation for Work: Claude 3.5 vs Llama 3.1 70B

Claude 3.5 prevails over Llama 3.1 70B in the Email Generation for Work trial

QuadrupleY Research

Email Quality: Claude 3.5 Sonnet > Llama 3.1 70B

  • Both AIs consistently produced grammatically correct, clear, and coherent emails
  • Claude 3.5 Sonnet often provided significantly more comprehensive and better-structured responses, particularly in complex scenarios like internal policy announcements and sales pitches
  • Llama 3.1 70B occasionally maintained a more personal tone in certain scenarios, especially in customer service contexts
  • Both AIs showed equal proficiency in crafting job interview follow-up emails

Accuracy and Information Integrity: Claude 3.5 Sonnet ≳ Llama 3.1 70B

  • Both AIs adhered well to the provided information without fabrication
  • Claude 3.5 Sonnet consistently provided more detailed and specific information, notably in welcome emails to new employees and event cancellation notices

Relevance and Customization: Claude 3.5 Sonnet ≳ Llama 3.1 70B

  • Both AIs demonstrated good ability to understand and address specific instructions
  • Claude 3.5 Sonnet showed superior tailoring in complex situations, such as collaboration requests between departments
  • Llama 3.1 70B excelled in maintaining a more personal and empathetic tone in customer-focused scenarios
  • Both AIs showed willingness to offer substantial compensation in customer complaint scenarios

Consistency: Claude 3.5 Sonnet ≳ Llama 3.1 70B

  • Both AIs maintained a uniform voice across multiple email types
  • Claude 3.5 Sonnet demonstrated more consistent quality across varying complexity levels, particularly in handling apology emails for missed deadlines
  • Llama 3.1 70B showed strong adaptability in matching formal tones when required

User Experience: Claude 3.5 Sonnet > Llama 3.1 70B

  • Claude 3.5 Sonnet accepts attached documents, guides, and examples in many formats
  • Llama 3.1 70B has no native support for document attachments, which makes prompts harder to customize

Authenticity: Claude 3.5 Sonnet ≛ Llama 3.1 70B

  • Both AIs generally matched appropriate tones and styles for different email contexts
  • Llama 3.1 70B outperformed Claude 3.5 Sonnet in some customer service scenarios by using a more natural, conversational tone and I-statements

Conclusion: Claude 3.5 Sonnet > Llama 3.1 70B

Claude 3.5 Sonnet and Llama 3.1 70B both performed well in the email generation for work trial, with Claude 3.5 Sonnet showing a clear edge in comprehensiveness, structure, and handling complex scenarios. Claude particularly excelled in formal communications, detailed explanations, and enhancing readability through superior formatting. Llama 3.1 70B demonstrated strengths in maintaining a personal tone, especially in customer service contexts, often achieving a more natural, conversational style. While Claude 3.5 Sonnet generally provided more detailed and structured responses, Llama 3.1 70B showed surprising adaptability in matching formality when required.