The Evaluation Commons
CentPol's public artifacts: benchmarks, model cards, evaluation reports, datasets, deployment case studies, and governance patterns — built in the open for African and emerging-market AI.
Building in the open
The first artifacts are on the way.
The commons is being built with partners, and every engagement leaves a reusable trace. Below are the kinds of artifact we publish; if you want to help shape the first benchmarks, let's talk.
Six kinds of public artifact
Not a frontier model lab — a downstream one. CentPol adapts, evaluates, deploys, and governs AI in the contexts large labs are unlikely to prioritize, and publishes the evidence.
Benchmark
Open benchmark suites that reflect local languages, infrastructure, and institutional realities.
Model card
Plain-language model cards: what a system is good for, where it fails, and under what conditions.
Evaluation report
Reproducible evaluations comparing open and commercial models on real local tasks and risk scenarios.
Deployment case study
Deployment case studies that document what was built, what it cost, and what actually worked.
Dataset
Documented, consent-aware datasets for evaluation and adaptation in public-interest contexts.
Governance pattern
Reusable governance patterns that partners can adopt: risk registers, oversight, and validation.