Financial reasoning with citations — agents must ground every number in source documents and surface contradictions across filings, models, and market data.

Finance Bench research

Overview

Finance Bench evaluates agents on the tasks regulators and risk committees actually ask for: extracting metrics, reconciling statements, stress-testing assumptions, and citing the filing behind each figure, rather than hallucinating a plausible ratio.

Why it matters

A single wrong basis point in a model can move capital or trigger enforcement. Finance Bench therefore treats every numeric output as an L1 claim, one that must ascend through corroboration and verification before it influences a decision.
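
As a rough illustration of the claim ladder described above, the following Python sketch models a numeric output that starts at L1 and is promoted only after corroboration and then verification. Only "L1" appears in the text; the higher level names, the `NumericClaim` class, its methods, the tolerance, and the sample values are all hypothetical and are not Finance Bench's actual data model.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class ClaimLevel(IntEnum):
    # Only "L1" is named in the text; the two higher levels are assumed.
    L1_EXTRACTED = 1
    L2_CORROBORATED = 2
    L3_VERIFIED = 3


@dataclass
class NumericClaim:
    metric: str
    value: float
    citations: list[str] = field(default_factory=list)  # hypothetical source ids
    level: ClaimLevel = ClaimLevel.L1_EXTRACTED

    def corroborate(self, independent_value: float, source_id: str,
                    tolerance: float = 1e-4) -> bool:
        """Promote L1 -> L2 only if an independent source agrees within tolerance."""
        if self.level != ClaimLevel.L1_EXTRACTED:
            return False
        if abs(independent_value - self.value) > tolerance:
            return False  # disagreement: the claim stays at L1
        self.citations.append(source_id)
        self.level = ClaimLevel.L2_CORROBORATED
        return True

    def verify(self, reviewer_approved: bool) -> bool:
        """Promote L2 -> L3 only with an explicit approval."""
        if self.level != ClaimLevel.L2_CORROBORATED or not reviewer_approved:
            return False
        self.level = ClaimLevel.L3_VERIFIED
        return True

    def may_influence_decision(self) -> bool:
        """Only fully verified claims are allowed to feed a decision."""
        return self.level == ClaimLevel.L3_VERIFIED


# Illustrative values only, not benchmark data.
claim = NumericClaim("net_leverage", 3.25, ["filing-A"])
claim.corroborate(3.25, "ledger-B")
claim.verify(reviewer_approved=True)
print(claim.level.name, claim.may_influence_decision())
```

The point of the design is that promotion is one-way and gated: a claim cannot skip corroboration, and a disagreeing source leaves it at L1 rather than silently overwriting the value.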

Methodology

  • Corpus: public filings, synthetic ledgers, and tenant-scoped matter files (never commingled)
  • Tasks: ratio computation, covenant checks, scenario comparison, anomaly explanation
  • Scoring: citation validity, temporal correctness, contradiction surfacing, calibration under uncertainty
  • Sharing: federation-safe summaries for cross-desk review without raw document export
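
Two of the scoring dimensions above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the benchmark's scoring code: it assumes a citation is valid when the cited document exists, covers the period that was asked about (a simple stand-in for temporal correctness), and actually contains the quoted value, and it assumes citation precision is the share of valid citations. The `Citation` type, the `corpus` layout (document id, then period, then value), and both tolerances are hypothetical.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Citation:
    doc_id: str          # hypothetical document identifier
    period: str          # reporting period the citation claims to cover
    quoted_value: float  # number the agent attributes to that document


def citation_is_valid(c: Citation, corpus: dict[str, dict[str, float]],
                      asked_period: str, tolerance: float = 1e-6) -> bool:
    """Valid = document exists, period matches the question, value is really there."""
    periods = corpus.get(c.doc_id)
    if periods is None or c.period != asked_period:
        return False
    actual = periods.get(c.period)
    return actual is not None and abs(actual - c.quoted_value) <= tolerance


def citation_precision(citations: list[Citation],
                       corpus: dict[str, dict[str, float]],
                       asked_period: str) -> float:
    """Share of citations that are valid; 0.0 when nothing was cited."""
    if not citations:
        return 0.0
    valid = sum(citation_is_valid(c, corpus, asked_period) for c in citations)
    return valid / len(citations)


def contradictions(observations: list[tuple[str, float]],
                   rel_tol: float = 0.01) -> list[tuple[str, str]]:
    """Return source pairs whose values for the same metric disagree beyond rel_tol."""
    return [
        (src_a, src_b)
        for (src_a, val_a), (src_b, val_b) in combinations(observations, 2)
        if not math.isclose(val_a, val_b, rel_tol=rel_tol)
    ]
```

For example, calling `contradictions` on the same metric taken from a filing, an internal model, and market data returns each pair of sources that disagrees by more than the tolerance, which is the list an agent would be expected to surface rather than average away.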

Results & next steps

We report three measures: accuracy by task type, citation precision, and human escalation rate. Finance Bench aligns with the requirements of regulated verticals and with ArcLegal-grade provenance discipline.

Arkivist Research

Updated February 1, 2026
