Abstraction and reasoning at the frontier of AGI: we evaluate whether neuro-symbolic agents can generalize grid transformations with full provenance rather than by memorizing patterns.

Bench: ARC-AGI
Overview
ARC-AGI (Abstraction and Reasoning Corpus) tests whether a system can discover a rule from a handful of example input-output grid pairs and apply it to novel grids. Arkivist treats each hypothesis as a claim backed by evidence, not as a single-shot neural guess.
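As a rough illustration of what "a claim with evidence" could look like, the sketch below scores two candidate rules against a toy set of training pairs and records which pairs support or refute each one. The `Claim` type, the `evaluate` helper, the two rules, and the grids are all hypothetical and chosen for illustration; they are not Arkivist's actual data model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Grid = List[List[int]]      # a grid is a small 2-D array of color indices
Pair = Tuple[Grid, Grid]    # (input grid, expected output grid)


@dataclass
class Claim:
    """Hypothetical record: a candidate rule plus the evidence for and against it."""
    name: str
    rule: Callable[[Grid], Grid]
    supporting: List[int] = field(default_factory=list)  # training pairs the rule reproduces
    refuting: List[int] = field(default_factory=list)    # training pairs the rule gets wrong

    def holds(self) -> bool:
        # A claim stands only if it has evidence and nothing contradicts it.
        return bool(self.supporting) and not self.refuting


def flip_horizontal(g: Grid) -> Grid:
    return [row[::-1] for row in g]


def transpose(g: Grid) -> Grid:
    return [list(col) for col in zip(*g)]


def evaluate(claim: Claim, train: List[Pair]) -> Claim:
    """Check the rule on every training pair and file each pair as evidence."""
    for i, (x, y) in enumerate(train):
        if claim.rule(x) == y:
            claim.supporting.append(i)
        else:
            claim.refuting.append(i)
    return claim


# Toy training pairs (illustrative only): each output mirrors its input left to right.
train = [
    ([[1, 0], [2, 0]], [[0, 1], [0, 2]]),
    ([[3, 4, 0]], [[0, 4, 3]]),
]
claims = [
    evaluate(Claim("flip_horizontal", flip_horizontal), train),
    evaluate(Claim("transpose", transpose), train),
]
accepted = [c.name for c in claims if c.holds()]
```

Keeping the supporting and refuting pair indices on the claim itself means a rejected rule still carries an explanation of why it was rejected.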
Why it matters
Benchmarks that reward scale alone miss what high-stakes work requires: compositional reasoning under audit. ARC-style tasks expose whether an agent can explain its rule before acting, which is critical when its output becomes operational evidence.
Methodology
- Symbolic program search and graph-backed hypothesis ranking before neural synthesis
- Contradiction checks when multiple rules fit training pairs
- Confidence gated by context match and risk via GOLAG; a low Lagrangian triggers escalation to human review
- Full provenance on every proposed transformation chain
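The steps above can be sketched in miniature as follows. This is a minimal sketch under stated assumptions, not the Arkivist pipeline: the four-entry `PRIMITIVES` library, the `run`, `search`, and `propose` helpers, and the returned record layout are hypothetical. It covers only brute-force program search, a contradiction check, and a provenance record. The graph-backed ranking, neural synthesis, and GOLAG gating are not modeled; a plain "escalate when fitting rules disagree" check stands in for the escalation step.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Grid = List[List[int]]
Pair = Tuple[Grid, Grid]
Chain = Tuple[str, ...]

# Hypothetical primitive library; a real system would need far more operations.
PRIMITIVES: Dict[str, Callable[[Grid], Grid]] = {
    "identity": lambda g: [row[:] for row in g],
    "flip_h": lambda g: [row[::-1] for row in g],
    "flip_v": lambda g: [row[:] for row in g[::-1]],
    "transpose": lambda g: [list(col) for col in zip(*g)],
}


def run(chain: Chain, grid: Grid) -> Grid:
    """Apply a chain of primitives from left to right."""
    for name in chain:
        grid = PRIMITIVES[name](grid)
    return grid


def search(train: List[Pair], max_depth: int = 2) -> List[Chain]:
    """Enumerate chains shortest first, keeping those that fit every training pair."""
    fits: List[Chain] = []
    for depth in range(1, max_depth + 1):
        for chain in product(PRIMITIVES, repeat=depth):
            if all(run(chain, x) == y for x, y in train):
                fits.append(chain)
    return fits


def propose(train: List[Pair], test_input: Grid, max_depth: int = 2) -> dict:
    """Return a prediction with provenance, or escalate if fitting chains disagree."""
    fits = search(train, max_depth)
    # Contradiction check: group fitting chains by what they predict on the test input.
    by_output: Dict[str, List[Chain]] = {}
    for chain in fits:
        by_output.setdefault(repr(run(chain, test_input)), []).append(chain)
    unambiguous = len(by_output) == 1
    return {
        "decision": "accept" if unambiguous else "escalate_to_human",
        "prediction": run(fits[0], test_input) if unambiguous else None,
        "fitting_chains": fits,                   # provenance: every chain that fit
        "distinct_predictions": len(by_output),   # more than 1 means the rules conflict
        "train_pairs_checked": len(train),
    }


# One asymmetric training pair pins the rule down to a horizontal flip.
clear = propose([([[1, 2], [3, 4]], [[2, 1], [4, 3]])], [[5, 6, 7]])

# A left-right symmetric pair fits both identity and flip_h, which disagree on the test grid.
ambiguous = propose([([[1, 1], [2, 2]], [[1, 1], [2, 2]])], [[1, 2], [3, 4]])
```

In this toy version, several chains that all predict the same output count as agreement rather than contradiction; only disagreement on the test grid forces escalation.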
Results & next steps
We publish calibration curves, success rates by task family, and a failure taxonomy (overfitting, under-specified rules, timeouts). ARC-AGI is a north star for verifiable generalization inside the Cartograph substrate.
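For readers unfamiliar with calibration curves, the sketch below shows one common way to compute them: group predictions into equal-width confidence bins and compare each bin's mean confidence with its observed accuracy. The function names, the bin count, and the four data points are assumptions made for illustration; the numbers are made up and are not Arkivist results.

```python
from typing import List, Tuple

Point = Tuple[float, float, int]  # (mean confidence, accuracy, sample count)


def calibration_curve(confidences: List[float], correct: List[bool],
                      n_bins: int = 5) -> List[Point]:
    """Bin predictions by confidence and report accuracy per non-empty bin."""
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((c, ok))
    curve: List[Point] = []
    for b in bins:
        if b:
            mean_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(1 for _, ok in b if ok) / len(b)
            curve.append((mean_conf, accuracy, len(b)))
    return curve


def expected_calibration_error(curve: List[Point]) -> float:
    """Sample-weighted average gap between confidence and accuracy."""
    total = sum(n for _, _, n in curve)
    return sum(n * abs(conf - acc) for conf, acc, n in curve) / total


# Made-up toy data: two confident correct answers, two unconfident wrong ones.
toy_conf = [0.9, 0.9, 0.1, 0.1]
toy_correct = [True, True, False, False]
toy_curve = calibration_curve(toy_conf, toy_correct)
toy_ece = expected_calibration_error(toy_curve)
```

A well-calibrated agent produces points near the diagonal, where mean confidence equals accuracy, and an expected calibration error close to zero.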
Arkivist Research
Updated February 1, 2026





