validation
2 articles
AI Workflow · March 22, 2026
Karpathy Proved It — AI Agents Without a Validation Harness Will Fail Every Time
Karpathy's March of Nines math is brutal: 90% per-step accuracy sounds great until you chain 10 steps and end-to-end success drops to about 35%. Here's how we built a 32-check Validation Harness to fix it.
4 min
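The compounding arithmetic behind that teaser can be checked in a few lines. This is a minimal sketch, assuming every step succeeds independently with the same probability; the function name `chain_success` is hypothetical and is not taken from the article.

```python
def chain_success(step_accuracy: float, steps: int) -> float:
    """Probability that all steps succeed, assuming independent steps
    with identical per-step accuracy (an illustrative assumption)."""
    return step_accuracy ** steps


# Compare a few per-step accuracies over a 10-step chain.
for acc in (0.90, 0.99, 0.999):
    print(f"{acc:.1%} per step, 10 steps -> {chain_success(acc, 10):.1%}")
# 90.0% per step, 10 steps -> 34.9%
# 99.0% per step, 10 steps -> 90.4%
# 99.9% per step, 10 steps -> 99.0%
```

Under this assumption, each extra "nine" of per-step accuracy is what moves the end-to-end number, which is why per-step checks matter in long chains.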
AI Workflow · March 22, 2026
Vision Eval — AI That Checks AI (Using Gemini Vision to QA AI-Generated Images)
We generate 20-30 AI images daily but never QA them, so covers miss safe zones, images come out too dark, and text gets blocked. We built vision-eval.py with Gemini Vision to catch this: 8 criteria, scored out of 80, 3 presets, and a compare mode.
4 min
