Abstraction and reasoning at the frontier of AGI: we evaluate whether neuro-symbolic agents can generalize grid transformations with full provenance rather than by memorizing patterns.

ARC-AGI research

Bench: ARC-AGI

Overview

ARC-AGI (built on the Abstraction and Reasoning Corpus) tests whether a system can discover a rule from a handful of example grid pairs and apply it to novel grids. Arkivist treats each hypothesis as a claim backed by evidence, not as a single-shot neural guess.

Why it matters

Benchmarks that reward scale alone miss what high-stakes work requires: compositional reasoning that holds up under audit. ARC-style tasks expose whether an agent can explain its rule before acting, which is critical when the output becomes operational evidence.

Methodology

  • Symbolic program search and graph-backed hypothesis ranking before neural synthesis
  • Contradiction checks when multiple rules fit training pairs
  • Confidence gated by context match and risk via GOLAG; a low Lagrangian escalates to human review
  • Full provenance on every proposed transformation chain
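
The first, second, and fourth points above can be illustrated with a toy sketch. It is not Arkivist's implementation: the four-primitive DSL, the function names (search, group_predictions), and the hypothesis record layout are all assumptions made for illustration, and the GOLAG gating step is not modeled. The sketch enumerates chains of grid transformations, keeps those consistent with every training pair, records which pairs support each chain, and flags a contradiction when surviving chains disagree on a new input.

```python
from itertools import product
from typing import Dict, List, Tuple

Grid = Tuple[Tuple[int, ...], ...]

# Hypothetical toy DSL: four whole-grid primitives.
def identity(g: Grid) -> Grid:
    return g

def flip_h(g: Grid) -> Grid:
    # Mirror each row left-to-right.
    return tuple(tuple(reversed(row)) for row in g)

def flip_v(g: Grid) -> Grid:
    # Reverse the order of the rows.
    return tuple(reversed(g))

def transpose(g: Grid) -> Grid:
    return tuple(zip(*g))

PRIMITIVES = {
    "identity": identity,
    "flip_h": flip_h,
    "flip_v": flip_v,
    "transpose": transpose,
}

def apply_chain(chain: Tuple[str, ...], g: Grid) -> Grid:
    # Apply the named primitives in order.
    for name in chain:
        g = PRIMITIVES[name](g)
    return g

def search(train_pairs: List[Tuple[Grid, Grid]], max_depth: int = 2) -> List[dict]:
    """Return every chain up to max_depth that reproduces all training pairs.

    Each hypothesis carries a small provenance record: the chain itself and
    the indices of the training pairs that support it.
    """
    hypotheses = []
    for depth in range(1, max_depth + 1):
        for chain in product(PRIMITIVES, repeat=depth):
            if all(apply_chain(chain, x) == y for x, y in train_pairs):
                hypotheses.append(
                    {"chain": chain, "evidence": list(range(len(train_pairs)))}
                )
    return hypotheses

def group_predictions(hypotheses: List[dict], test_input: Grid) -> Dict[Grid, List[Tuple[str, ...]]]:
    """Group surviving chains by the output they predict for test_input.

    More than one key means the rules contradict each other on this input.
    """
    predictions: Dict[Grid, List[Tuple[str, ...]]] = {}
    for h in hypotheses:
        out = apply_chain(h["chain"], test_input)
        predictions.setdefault(out, []).append(h["chain"])
    return predictions

# One training pair that both flip_h and flip_v explain.
train = [(((1, 2), (2, 1)), ((2, 1), (1, 2)))]
candidates = search(train, max_depth=1)
by_output = group_predictions(candidates, ((1, 2), (3, 4)))
ambiguous = len(by_output) > 1  # candidates disagree, so do not answer blindly
```

In this example a single training pair is under-specified: two different rules fit it, and they diverge on the test grid. A system following the methodology above would treat that disagreement as a reason to lower confidence or escalate rather than pick one rule silently.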

Results & next steps

We publish calibration curves, success rates by task family, and a failure taxonomy (overfit, under-specified rules, timeout). ARC-AGI is a north star for verifiable generalization inside the Cartograph substrate.
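
As a rough illustration of how such figures can be computed, the sketch below bins predictions by stated confidence and compares each bin's mean confidence with its observed accuracy, then tallies success rates per task family. The function names, the equal-width binning, and the use of expected calibration error as a summary number are assumptions for illustration; the source does not say which calibration method is used.

```python
from typing import Dict, List, Tuple

def calibration_curve(
    confidences: List[float], correct: List[bool], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Return (mean confidence, accuracy, count) for each non-empty bin.

    Bins are equal-width over [0, 1]; a confidence of exactly 1.0 goes
    into the last bin.
    """
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, ok))
    curve = []
    for b in bins:
        if b:
            mean_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(1 for _, ok in b if ok) / len(b)
            curve.append((mean_conf, accuracy, len(b)))
    return curve

def expected_calibration_error(curve: List[Tuple[float, float, int]]) -> float:
    # Count-weighted average gap between confidence and accuracy.
    total = sum(n for _, _, n in curve)
    return sum(n * abs(conf - acc) for conf, acc, n in curve) / total

def success_rate_by_family(results: List[Tuple[str, bool]]) -> Dict[str, float]:
    # results holds (task family, solved?) pairs; family names are arbitrary labels.
    solved: Dict[str, int] = {}
    seen: Dict[str, int] = {}
    for family, ok in results:
        seen[family] = seen.get(family, 0) + 1
        solved[family] = solved.get(family, 0) + (1 if ok else 0)
    return {family: solved[family] / seen[family] for family in seen}
```

A well-calibrated system produces a curve close to the diagonal, where stated confidence matches observed accuracy in every bin.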

Arkivist Research

Updated February 1, 2026
