I Built a Micro AI Company on a Single Cloud VPS (Hallucination Prevention, Quality Gates, and Model Tuning)

One cloud VPS. Five AI agents. Marketing, development, QA, and trading monitoring, all running automatically every day. The hard part was never getting the AI to move; it was stopping it from going off the rails. This post covers the real lessons: the fake SOL prediction incident, invented tool names, quality gate design, how I tuned the Hermes model, and how I tracked down two bugs that took the whole system down.

2026-05-08 · 12 min · 2498 words · Judy

Open-Source LLM in Production: Why We Chose MiniMax M2.7 for Our AI Team

A firsthand account of deploying MiniMax M2.7 into a multi-agent AI system: why we switched from GPT-4o, the real cost difference between subscription and per-token billing, and three pitfalls of running open-source LLMs in a production agent environment.

2026-04-12 · 4 min · 844 words · Judy

Claude Code Hooks: Automating Our AI Team

A real record of connecting an AI team with 4 Claude Code Hooks, including PreToolUse as the guardrail, PostToolUse as the logger, and Stop as the relay, flipping “human waiting for AI” into AI auto-handoffs. All the pitfalls we’ve hit, laid bare.

2026-03-25 · 5 min · 854 words · Judy

AI Self-Review Pipeline: How We Got Agents to Review Their Own Code Before Sending PRs

When an agent says it’s done, that doesn’t mean it’s actually done. We learned this the hard way at Judy AI Lab. Silent failures in scheduled tasks and a 40% rejection rate on deliveries forced us to design a five-stage self-review loop: spec confirmation, implementation, code review, fix, and Xiaoyue’s QA scoring. After more than a month in production, the rejection rate dropped from 40% to 10%.

2026-03-14 · 5 min · 992 words · Judy