
The Claude API Techniques We Use for 95% Extraction Accuracy

15 min read · April 2026 · Avanon Engineering

Abstract

We extract structured data from unstructured documents at production scale — insurance policies, regulatory filings, pharmacy claim files, real estate comps, contracts, invoices. "95% accuracy" here means field-level accuracy on a labeled holdout set of 4,200 documents spanning 11 document types. This paper details the exact techniques that moved our accuracy from a 71% baseline to 95.3%.

Production Performance

Field-level accuracy: 95.3%
Median latency: 11 sec
Cost per document: $0.08
Malformed JSON rate: 0.0%

Figure 1: Accuracy improvement journey. Baseline 71% → + Citations 86% → + Two-pass 90% → + Schema 93% → + Multi-sample 95%.

1. Extract with Citations

The single highest-leverage change. Instead of asking the model to return a JSON object of fields, ask it to return each field alongside the exact verbatim span from the source document that justifies that value. A follow-up deterministic check verifies that the span appears in the source and that the extracted value can be derived from the span.

{
  "contract_value": {
    "value": 250000,
    "citation": "Total Contract Amount: $250,000.00 USD",
    "page": 3,
    "confidence": 0.94
  }
}
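The follow-up deterministic check is small. A minimal sketch in Python (the whitespace normalization and the digits-only derivability rule are illustrative assumptions, not our exact production logic):

import re

def normalize(text: str) -> str:
    # Collapse whitespace and lowercase so minor OCR drift doesn't fail the check.
    return re.sub(r"\s+", " ", text).strip().lower()

def verify_citation(source_text: str, field: dict) -> bool:
    # Accept a field only if its citation appears verbatim in the source
    # and the extracted value is derivable from the cited span.
    if normalize(field["citation"]) not in normalize(source_text):
        return False  # span not in source: likely hallucinated or paraphrased
    # Crude derivability proxy for numeric fields: the value's digits must
    # appear within the citation's digits. Real rules are type-specific.
    value_digits = re.sub(r"\D", "", str(field["value"]))
    span_digits = re.sub(r"\D", "", field["citation"])
    return value_digits != "" and value_digits in span_digits

field = {"value": 250000, "citation": "Total Contract Amount: $250,000.00 USD"}
print(verify_citation("...Total Contract Amount: $250,000.00 USD...", field))  # True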

This one change moved us from 71% to 86% field-level accuracy by catching hallucinated values.

2. Two-Pass Pipeline

Pass one asks the model to identify the page or section where each field of interest lives. Pass two asks it to extract the field given only that localized context. Two-pass is measurably more accurate than one-pass because the model makes fewer attention errors in shorter contexts.

Pass 1 (Locate): "Find the page/section containing the contract value."

Pass 2 (Extract): "Given page 3, extract the contract value with citation."

Cost is roughly 1.4x a single call; accuracy gain averaged 4.2 points in our eval.

3. Schema as Prompt

The JSON schema you ask the model to produce is itself a prompt, and it is the most efficient prompt you have. Each field should have a clear semantic name, a type, a description that covers edge cases, a list of acceptable formats for strings, explicit null-vs-missing semantics, and, where relevant, a small enum of allowed values.

A well-documented schema outperforms several paragraphs of prose instructions. Our extraction accuracy rose 2.8 points when we replaced free-form instructions with schema-as-prompt using Anthropic's tool use for structured output.
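As a sketch, here is what a slice of such a tool definition can look like (the field names, wording, and enum values are illustrative):

extraction_tool = {
    "name": "record_extraction",
    "description": "Record extracted contract fields with citations.",
    "input_schema": {
        "type": "object",
        "properties": {
            "contract_value": {
                "type": ["number", "null"],
                "description": "Total contract amount in USD as a plain number, "
                               "e.g. 250000. Use the grand total, not a per-period "
                               "amount. null means the document states no total.",
            },
            "contract_type": {
                "type": "string",
                "enum": ["MSA", "SOW", "NDA", "amendment", "other"],
                "description": "Use 'other' only when no enum value fits.",
            },
        },
        "required": ["contract_value", "contract_type"],
    },
}

With this passed in the tools parameter and tool_choice set to require the tool, the response is constrained to the schema rather than free-form JSON.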

4. Multi-Sample Agreement

For high-stakes fields (contract value, effective date, counterparty legal name) we run the extraction three times at temperature 0.3 and score the agreement across samples. Fields that agree across all three samples go through without review; fields with disagreement get flagged.
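A minimal sketch of the agreement scoring (extract_fn stands in for the tool-use extraction call at temperature 0.3; the names here are illustrative):

from collections import Counter

def score_agreement(extract_fn, n: int = 3):
    # Run the extraction n times and split fields into agreed vs. flagged.
    samples = [extract_fn() for _ in range(n)]  # each returns {field: value}
    agreed, flagged = {}, {}
    for name in samples[0]:
        values = [s.get(name) for s in samples]
        _, count = Counter(map(str, values)).most_common(1)[0]
        if count == n:
            agreed[name] = values[0]   # unanimous: pass through without review
        else:
            flagged[name] = values     # any disagreement: flag for review
    return agreed, flagged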

Agreement Scoring Results

3-sample agreement rate: 97.1%
Combined signal correlation: 0.79
Inference cost for high-stakes fields: 3x

5. Production Pipeline

1. Document ingestion: PDF/image → OCR → structured text
2. Page-level localization: identify the relevant sections per field
3. Table sub-agent: specialized handling for tabular data
4. Field extraction: tool-use schema with citations
5. Multi-sample scoring: 3x inference for high-stakes fields
6. Validation: citation check + schema validation
7. Routing: auto-accept, auto-retry, or human review (sketched below)
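A minimal sketch of the final routing step (the signals and the retry budget are illustrative assumptions, not our exact thresholds):

from enum import Enum

class Route(Enum):
    AUTO_ACCEPT = "auto_accept"
    AUTO_RETRY = "auto_retry"
    HUMAN_REVIEW = "human_review"

def route_field(citation_valid: bool, schema_valid: bool,
                samples_agree: bool, retries: int, max_retries: int = 2) -> Route:
    # All validation signals clean: accept without human involvement.
    if citation_valid and schema_valid and samples_agree:
        return Route.AUTO_ACCEPT
    # Something failed but retry budget remains: re-run the extraction
    # before paying for a human reviewer.
    if retries < max_retries:
        return Route.AUTO_RETRY
    return Route.HUMAN_REVIEW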

Figure 2: Accuracy by document type. Invoices 97%, Contracts 96%, Insurance 95%, Regulatory 94%, Financial 93%, Claims 92%.

Evaluation Methodology

Eval set: 4,200 documents with field-level gold standards across 11 document types
Model: Claude 3.5 Sonnet via the Anthropic API with tool use
Metrics: field-level accuracy, latency, cost per document, malformed output rate
Labeling: double-blind annotation with adjudication for disagreements

The methods matter less than the eval-driven discipline behind them. Build the eval first. Everything else follows. Contact engineering@avanon.com for implementation details.
