DopeLab
INK by DopeLab

validation

2 articles

AI Workflow · March 22, 2026

Karpathy Proved It — AI Agents Without a Validation Harness Will Fail Every Time

Karpathy's March of Nines math is brutal: 90% per-step accuracy sounds great until you chain 10 steps and end-to-end success drops to about 35% (0.9^10 ≈ 0.35). Here's how we built a 32-check Validation Harness to fix it.

4 min
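
The compounding arithmetic behind that 35% figure is easy to check. The snippet below is a minimal sketch, not code from the article: it assumes every step succeeds independently with the same accuracy, and `chain_success` is a hypothetical helper name used only for illustration.

```python
def chain_success(step_accuracy: float, steps: int) -> float:
    """End-to-end success rate of a chain of steps.

    Assumption: each step succeeds independently with the same
    probability, so the per-step probabilities multiply.
    """
    if not 0.0 <= step_accuracy <= 1.0:
        raise ValueError("step_accuracy must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return step_accuracy ** steps


if __name__ == "__main__":
    # Compare a few per-step accuracies over a 10-step chain.
    for accuracy in (0.9, 0.99, 0.999):
        rate = chain_success(accuracy, 10)
        print(f"{accuracy:.1%} per step over 10 steps: {rate:.1%} end to end")
```

Under the same independence assumption, raising per-step accuracy from 90% to 99% lifts the 10-step success rate from roughly 35% to roughly 90%, which is why each extra "nine" matters so much.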
AI Workflow · March 22, 2026

Vision Eval — AI That Checks AI (Using Gemini Vision to QA AI-Generated Images)

We generate 20-30 AI images daily but never QA'd them: covers miss safe zones, images come out too dark, and text gets blocked. So we built vision-eval.py on Gemini Vision: 8 criteria, a score out of 80, 3 presets, and a compare mode.

4 min