ARC-AGI

Overview

ARC-AGI (Abstraction and Reasoning Corpus) tests whether systems discover rules from a handful of examples and apply them to novel grids. Arkivist treats each hypothesis as a claim with evidence, not a single-shot neural guess.

Why it matters

Benchmarks that reward scale alone miss what high-stakes work requires: compositional reasoning under audit. ARC-style tasks expose whether an agent can explain its rule before acting — critical when the output becomes operational evidence.

Methodology

Symbolic program search and graph-backed hypothesis ranking before neural synthesis
Contradiction checks when multiple rules fit training pairs
Confidence gated by context match and risk via GOLAG — low Lagrangian escalates to human review
Full provenance on every proposed transformation chain
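
The first two steps and the provenance requirement can be illustrated with a toy example. The sketch below is a minimal illustration, not Arkivist's implementation: the four grid primitives, the `search` and `predict` functions, and the status labels are all hypothetical names chosen for this example. Graph-backed ranking, neural synthesis, and GOLAG gating are not modeled. It enumerates short programs over a tiny DSL, keeps every program that reproduces all training pairs, and flags a contradiction when the surviving programs disagree on the test input.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Grid = Tuple[Tuple[int, ...], ...]
Pair = Tuple[Grid, Grid]

# Hypothetical toy DSL: four whole-grid transformations.
def identity(g: Grid) -> Grid:
    return g

def flip_h(g: Grid) -> Grid:
    # Mirror each row left-to-right.
    return tuple(tuple(reversed(row)) for row in g)

def flip_v(g: Grid) -> Grid:
    # Reverse the order of the rows.
    return tuple(reversed(g))

def transpose(g: Grid) -> Grid:
    return tuple(zip(*g))

PRIMITIVES: Dict[str, Callable[[Grid], Grid]] = {
    "identity": identity,
    "flip_h": flip_h,
    "flip_v": flip_v,
    "transpose": transpose,
}

def run(program: Tuple[str, ...], g: Grid) -> Grid:
    """Apply the named primitives in order."""
    for name in program:
        g = PRIMITIVES[name](g)
    return g

def search(train: List[Pair], max_depth: int = 2) -> List[Tuple[str, ...]]:
    """Return every program up to max_depth that reproduces all training pairs."""
    fits = []
    for depth in range(1, max_depth + 1):
        for program in product(PRIMITIVES, repeat=depth):
            if all(run(program, x) == y for x, y in train):
                fits.append(program)
    return fits

def predict(train: List[Pair], test_input: Grid, max_depth: int = 2) -> dict:
    """Propose an output only when all fitting programs agree on the test input."""
    fits = search(train, max_depth)
    if not fits:
        return {"status": "no_rule", "output": None, "provenance": []}
    outputs = {run(p, test_input) for p in fits}
    if len(outputs) > 1:
        # Several rules fit the training pairs but disagree here: escalate.
        return {"status": "contradiction", "output": None, "provenance": fits}
    # "provenance" lists every program that supports the proposed output.
    return {"status": "ok", "output": outputs.pop(), "provenance": fits}
```

With a single symmetric training pair such as `((1, 2), (2, 1))` mapped to `((2, 1), (1, 2))`, both `flip_h` and `flip_v` fit, and they diverge on an asymmetric test grid, so `predict` reports a contradiction instead of guessing.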

Results & next steps

We publish calibration curves, success rates by task family, and a failure taxonomy (overfit, under-specified rules, timeout). ARC-AGI is a north star for verifiable generalization inside the Cartograph substrate.
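
For readers unfamiliar with calibration curves, the sketch below shows one common way to compute them: bin predictions by stated confidence and compare each bin's mean confidence with its empirical accuracy. The function names, the equal-width binning, and the expected calibration error summary are assumptions made for illustration. The source does not specify how its curves are produced.

```python
from typing import List, Tuple

def calibration_curve(
    confidences: List[float], correct: List[bool], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Return (mean confidence, empirical accuracy, count) for each non-empty bin."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, ok))
    curve = []
    for b in bins:
        if b:
            mean_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(1 for _, ok in b if ok) / len(b)
            curve.append((mean_conf, accuracy, len(b)))
    return curve

def expected_calibration_error(curve: List[Tuple[float, float, int]]) -> float:
    """Count-weighted mean gap between confidence and accuracy across bins."""
    total = sum(n for _, _, n in curve)
    if total == 0:
        return 0.0
    return sum(n * abs(conf - acc) for conf, acc, n in curve) / total
```

A well-calibrated system has per-bin accuracy close to per-bin confidence, so the summary value approaches zero.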

