Financial reasoning with citations — agents must ground every number in source documents and surface contradictions across filings, models, and market data.

Bench: Finance Bench
Overview
Finance Bench evaluates agents on the tasks regulators and risk committees actually set: extracting metrics, reconciling statements, stress-testing assumptions, and citing the filing instead of hallucinating a plausible ratio.
Why it matters
A single misplaced basis point in a model can move capital or trigger enforcement action. Finance Bench therefore treats every numeric output as an L1 claim that must ascend through corroboration and verification before it influences a decision.
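As a rough illustration of that ascent, the sketch below models a numeric claim that starts at L1 and is promoted only when evidence supports it. Only the "L1" label comes from the text above. The higher level names, the two-distinct-sources rule for corroboration, and the `reviewer_verified` flag are assumptions made for this example, not Finance Bench's actual rules.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class ClaimLevel(IntEnum):
    # Only "L1" appears in the text; the two higher names are hypothetical.
    L1_RAW = 1
    CORROBORATED = 2
    VERIFIED = 3


@dataclass
class NumericClaim:
    label: str
    value: float
    citations: list[str] = field(default_factory=list)  # source identifiers
    reviewer_verified: bool = False                      # assumed sign-off flag
    level: ClaimLevel = ClaimLevel.L1_RAW


def promote(claim: NumericClaim) -> NumericClaim:
    """Raise the claim's level as far as its evidence allows."""
    # Assumed rule: corroboration needs at least two distinct sources.
    if len(set(claim.citations)) >= 2:
        claim.level = ClaimLevel.CORROBORATED
    # Assumed rule: verification needs corroboration plus a reviewer sign-off.
    if claim.level == ClaimLevel.CORROBORATED and claim.reviewer_verified:
        claim.level = ClaimLevel.VERIFIED
    return claim


def may_influence_decision(claim: NumericClaim) -> bool:
    """Only fully verified claims are allowed to feed a decision."""
    return claim.level == ClaimLevel.VERIFIED
```

A claim with a single citation stays at L1 under these rules, so an uncorroborated number cannot reach a decision regardless of how plausible it looks.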
Methodology
- Corpus: public filings, synthetic ledgers, and tenant-scoped matter files (never commingled)
- Tasks: ratio computation, covenant checks, scenario comparison, anomaly explanation
- Scoring: citation validity, temporal correctness, contradiction surfacing, calibration on uncertainty
- Reporting: federation-safe summaries for cross-desk review, with no raw document export
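To make the task and scoring bullets more concrete, here is a minimal sketch of one covenant check that refuses to produce a ratio unless every input is cited and both inputs share a reporting period. The `CitedFigure` type, the `check_leverage_covenant` function, and the choice of a debt-to-EBITDA covenant are illustrative assumptions, not part of the benchmark's published interface.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CitedFigure:
    name: str
    value: float
    period_end: date
    citation: Optional[str]  # None means the agent supplied no source


def check_leverage_covenant(debt: CitedFigure, ebitda: CitedFigure,
                            max_ratio: float) -> dict:
    """Return pass/breach for debt/EBITDA, or escalate if inputs are unsound."""
    issues = []
    # Citation validity (simplified): every input must carry a source.
    for fig in (debt, ebitda):
        if not fig.citation:
            issues.append(f"{fig.name}: missing citation")
    # Temporal correctness (simplified): inputs must share a period end.
    if debt.period_end != ebitda.period_end:
        issues.append("inputs come from different reporting periods")
    if ebitda.value <= 0:
        issues.append("EBITDA is not positive; ratio is not meaningful")
    if issues:
        return {"status": "escalate", "ratio": None, "issues": issues}
    ratio = debt.value / ebitda.value
    status = "pass" if ratio <= max_ratio else "breach"
    return {"status": status, "ratio": ratio, "issues": []}


# Made-up figures for illustration only.
period = date(2025, 12, 31)
debt = CitedFigure("total_debt", 450.0, period, "doc-A#p12")
ebitda = CitedFigure("ebitda", 150.0, period, "doc-A#p30")
result = check_leverage_covenant(debt, ebitda, max_ratio=3.5)
print(result["status"], result["ratio"])
```

Returning an explicit `escalate` status instead of a best-effort number is a design choice in this sketch: it keeps an uncited or period-mismatched input from silently producing a plausible ratio.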
Results & next steps
We report accuracy by task type, citation precision, and human escalation rate. Finance Bench aligns with the requirements of regulated verticals and with ArcLegal-grade provenance discipline.
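The three reported quantities could be aggregated along the following lines. This is a sketch under stated assumptions: the `TaskResult` record is hypothetical, citation precision is taken to mean valid citations divided by citations given, and escalation rate is taken to mean the share of tasks handed to a human. The benchmark's exact definitions may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TaskResult:
    task_type: str        # e.g. "ratio", "covenant"
    correct: bool
    citations_given: int
    citations_valid: int
    escalated: bool       # True if the task was handed to a human


def summarize(results: list[TaskResult]) -> dict:
    """Aggregate per-task accuracy, citation precision, and escalation rate."""
    by_type = defaultdict(lambda: [0, 0])  # task_type -> [correct, total]
    given = valid = escalated = 0
    for r in results:
        by_type[r.task_type][0] += int(r.correct)
        by_type[r.task_type][1] += 1
        given += r.citations_given
        valid += r.citations_valid
        escalated += int(r.escalated)
    return {
        "accuracy_by_task": {t: c / n for t, (c, n) in by_type.items()},
        # None when no citations were given, to avoid dividing by zero.
        "citation_precision": valid / given if given else None,
        "escalation_rate": escalated / len(results) if results else None,
    }
```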
Arkivist Research
Updated February 1, 2026





