
Which LLM Understands PDF Uploads Best?

We tested the top LLMs on PDF understanding—tables, formatting, layout, and semantic accuracy. Here’s which model performs best on real-world document parsing and Q&A.

AI Benchmarks: PDF Understanding


Uploading a PDF to ask questions or extract info sounds simple. But under the hood, it’s one of the hardest things for language models to do reliably.

Why? Because PDFs are:

  • Non-linear (multi-column layouts, footnotes, running headers)

  • Full of tables, charts, and layout-dependent logic

  • Hard to convert cleanly to text without losing structure

We tested four leading models (GPT-4 Turbo, Claude 3.5 Sonnet, Gemini 1.5 Pro, and Mixtral via a RAG pipeline) to see which understands PDF uploads best.

Methodology

  • Uploaded 5 PDFs across legal, scientific, and business formats

  • Asked 10 Q&A tasks per document: extract data, summarize sections, find citations

  • Evaluated:

    • Text retention

    • Table recognition

    • Q&A accuracy

    • Section referencing

    • Factual grounding
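To make the scoring concrete, here is a minimal sketch of how per-model results could be aggregated across the five criteria. The weights, criterion names, and helper function are illustrative assumptions; the article does not publish its exact scoring formula.

```python
# Hypothetical scoring harness for the five evaluation criteria.
# Equal weighting is an assumption, not the benchmark's actual formula.

CRITERIA = [
    "text_retention",
    "table_recognition",
    "qa_accuracy",
    "section_referencing",
    "factual_grounding",
]

def overall_score(criterion_scores, weights=None):
    """Combine per-criterion scores (each 0-100) into one 0-100 score."""
    if weights is None:
        weights = {c: 1.0 for c in criterion_scores}  # equal weighting
    total_weight = sum(weights[c] for c in criterion_scores)
    weighted = sum(criterion_scores[c] * weights[c] for c in criterion_scores)
    return round(weighted / total_weight, 1)

# Example: a model strong on text retention but weak on tables.
scores = {
    "text_retention": 95,
    "table_recognition": 60,
    "qa_accuracy": 80,
    "section_referencing": 75,
    "factual_grounding": 85,
}
print(overall_score(scores))  # 79.0
```

Weighting table-heavy criteria more heavily would shift the rankings for table-dependent workloads, which is worth doing if your documents are mostly spreadsheet-like.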

Results Summary

| Model             | Overall Score (out of 100) |
|-------------------|----------------------------|
| Claude 3.5 Sonnet | 91                         |
| GPT-4 Turbo       | 88                         |
| Gemini 1.5 Pro    | 74                         |
| Mixtral (via RAG) | 65                         |

1. Text and Structure Retention

| Task                 | Claude       | GPT-4       | Gemini          | Mixtral         |
|----------------------|--------------|-------------|-----------------|-----------------|
| Section hierarchy    | ✅ Excellent | ✅ Good     | ❌ Mid          | ❌ Weak         |
| Paragraph continuity | ✅ Strong    | ✅ Strong   | ❌ Inconsistent | ❌ Often broken |
| Page headers/footers | ✅ Filtered  | ❌ Included | ❌ Included     | ❌ Included     |

Winner: Claude — best understanding of layout and relevance filtering.

2. Table Extraction and Parsing

| Task               | Claude      | GPT-4       | Gemini  | Mixtral |
|--------------------|-------------|-------------|---------|---------|
| Table recognition  | ✅ High     | ✅ Mid–High | ❌ Mid  | ❌ Weak |
| Table Q&A accuracy | ✅ 90%      | ✅ 82%      | ❌ 55%  | ❌ 40%  |
| Row–column mapping | ✅ Accurate | ✅ Partial  | ❌ Lost | ❌ Lost |

Winner: Claude, followed by GPT-4.
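One practical workaround for models that lose row-column mapping is to extract tables yourself and serialize them explicitly before prompting. The extraction step is out of scope here; assuming you already have rows as lists (e.g. from a table-extraction tool such as pdfplumber), a minimal sketch of the serialization step (`table_to_markdown` is a hypothetical helper, not a library function):

```python
def table_to_markdown(rows):
    """Serialize a table (list of rows, first row = header) as Markdown.

    Sending tables to an LLM in this explicit row/column form helps
    models that otherwise lose row-column mapping in raw PDF text.
    """
    header, *body = [[("" if c is None else str(c)) for c in r] for r in rows]
    lines = ["| " + " | ".join(header) + " |",
             "| " + " | ".join("---" for _ in header) + " |"]
    for row in body:
        lines.append("| " + " | ".join(row) + " |")
    return "\n".join(lines)

rows = [["Model", "Score"], ["Claude 3.5 Sonnet", 91], ["GPT-4 Turbo", 88]]
print(table_to_markdown(rows))
```

In our experience-adjacent reading of the results, this kind of explicit serialization matters most for the weaker parsers; Claude handled embedded tables well without it.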

3. Document Q&A and Referencing

| Task                        | Claude    | GPT-4     | Gemini    | Mixtral   |
|-----------------------------|-----------|-----------|-----------|-----------|
| Answer using section X      | ✅ 93%    | ✅ 90%    | ❌ 66%    | ❌ 58%    |
| Citation grounding          | ✅ Yes    | ✅ Yes    | ❌ No     | ❌ No     |
| Answering footnote-based Qs | ✅ Strong | ✅ Strong | ❌ Missed | ❌ Missed |

Winner: Claude, with GPT-4 a close second.

Observations

  • Claude 3.5 excels at PDF-specific document parsing. Likely due to:

    • Pre-processing for layout

    • Better document memory and grounding

    • Citation referencing logic

  • GPT-4 Turbo is close, especially with structured documents (e.g. contracts), but struggles with noisy layouts and table-heavy files.

  • Gemini 1.5 often lost structure, treated tables as unstructured text, and hallucinated Q&A references.

  • Mixtral, when used via vector DB RAG pipelines, depended heavily on the embedding quality and chunking strategy—unreliable for detail-heavy tasks.
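Since Mixtral's results hinged on chunking strategy, here is a minimal sketch of what "layout-aware chunking" can mean in practice: split on section headings first, and only fall back to paragraph boundaries when a section is oversized, so chunks don't cut through the middle of a logical unit. The heading regex (numbered headings like "3.1 Results") and size limit are illustrative heuristics, not a general solution.

```python
import re

def layout_aware_chunks(text, max_chars=1500):
    """Split extracted PDF text on section headings, then on blank lines,
    so each chunk stays within one logical unit instead of a fixed window."""
    # Illustrative heuristic: numbered headings at the start of a line.
    heading = re.compile(r"^\d+(\.\d+)*\s+\S", re.MULTILINE)
    starts = [m.start() for m in heading.finditer(text)] or [0]
    if starts[0] != 0:
        starts.insert(0, 0)  # keep any preamble before the first heading
    sections = [text[a:b].strip()
                for a, b in zip(starts, starts[1:] + [len(text)])]
    chunks = []
    for section in sections:
        if len(section) <= max_chars:
            if section:
                chunks.append(section)
            continue
        # Oversized section: fall back to paragraph boundaries.
        buf = ""
        for para in section.split("\n\n"):
            if buf and len(buf) + len(para) > max_chars:
                chunks.append(buf.strip())
                buf = ""
            buf += para + "\n\n"
        if buf.strip():
            chunks.append(buf.strip())
    return chunks

doc = "1 Introduction\nPDFs are hard.\n\n2 Methods\nWe tested four models."
print(layout_aware_chunks(doc))
```

Fixed-size character windows, by contrast, routinely split a table header from its rows, which is one plausible reason detail-heavy retrieval degraded in the RAG setup.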

Final Verdict

| Use Case                         | Best Model                  |
|----------------------------------|-----------------------------|
| Legal contracts                  | GPT-4 or Claude             |
| Scientific papers / tables       | Claude                      |
| Layout-heavy reports / footnotes | Claude                      |
| Fast basic parsing               | GPT-4                       |
| Open-source RAG PDFs             | Mixtral (with heavy tuning) |

What You Can Learn

  • Upload ≠ understanding: PDF parsing requires preprocessing + formatting awareness

  • Claude is the only model that consistently “reads” PDFs like a human

  • If using GPT-4 or Mixtral, pair with tools like unstructured.io, PDFtoText, or layout-aware chunking

  • For production workflows, Claude 3.5 is currently best-in-class
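Only Claude filtered page headers and footers in our tests, but you can do that filtering yourself before the text ever reaches a model. A minimal preprocessing sketch, assuming you already have per-page text from any PDF-to-text tool (the threshold values are illustrative assumptions):

```python
from collections import Counter

def strip_repeated_lines(pages, min_fraction=0.6):
    """Remove header/footer lines that recur on most pages.

    `pages` is a list of per-page text strings. A line appearing on at
    least `min_fraction` of pages is treated as boilerplate; the
    thresholds here are illustrative, not tuned values.
    """
    counts = Counter()
    for page in pages:
        for line in set(page.splitlines()):  # count once per page
            if line.strip():
                counts[line.strip()] += 1
    threshold = max(2, int(len(pages) * min_fraction))
    boilerplate = {line for line, n in counts.items() if n >= threshold}
    cleaned = []
    for page in pages:
        kept = [l for l in page.splitlines() if l.strip() not in boilerplate]
        cleaned.append("\n".join(kept).strip())
    return cleaned

pages = [
    "ACME Corp Annual Report\nRevenue grew 12%.\nPage 1",
    "ACME Corp Annual Report\nCosts fell 3%.\nPage 2",
    "ACME Corp Annual Report\nOutlook is stable.\nPage 3",
]
print(strip_repeated_lines(pages))
```

Varying lines like "Page 1" survive this simple pass; a page-number regex would be the natural next refinement.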

Marco Fazio
Editor, Latestly AI
Forbes 30 Under 30

We hope you enjoyed this Latestly AI edition.
📧 Got an AI tool for us to review, or want to collaborate?
Send us a message and let us know!

Was this edition forwarded to you? Sign up here