The Evaluation Commons

CentPol's public artifacts: benchmarks, model cards, evaluation reports, datasets, deployment case studies, and governance patterns — built in the open for African and emerging-market AI.

Building in the open

The first artifacts are on the way.

The commons is being built with partners, and every engagement leaves a reusable trace. Below are the kinds of artifact we publish; if you want to help shape the first benchmarks, let's talk.

What we publish

Six kinds of public artifact

Not a frontier model lab — a downstream one. CentPol adapts, evaluates, deploys, and governs AI in the contexts large labs are unlikely to prioritize, and publishes the evidence.

Benchmark

Open benchmark suites that reflect local languages, infrastructure, and institutional realities.

Model card

Plain-language model cards: what a system is good for, where it fails, and under what conditions.

Evaluation report

Reproducible evaluations comparing open and commercial models on real local tasks and risk scenarios.

Deployment case study

Deployment case studies that document what was built, what it cost, and what actually worked.

Dataset

Documented, consent-aware datasets for evaluation and adaptation in public-interest contexts.

Governance pattern

Reusable governance patterns: risk registers, oversight, and validation that partners can adopt.