What is Thousand Token Wood and what problem does it solve?

Thousand Token Wood is a multi-agent economic simulation submitted to the Build Small Hackathon. Five forest animal characters trade five goods for stone currency, with every decision made by Qwen2.5-3B, a small 3B-parameter model. Deployed on Modal with vLLM and a Gradio frontend, it uses a single batched GPU call per turn for all characters, which keeps continuous simulation affordable. The project shows that small models can power complex agent economies when rule design carries the cognitive load, producing emergent bubbles, crashes, and wealth inequality without relying on larger LLMs.
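A minimal sketch of the one-call-per-turn pattern, assuming each character's prompt is already built and the model is asked to reply with a single JSON object. `run_turn` and the `generate` callable are hypothetical names, not the project's code. In a vLLM deployment, `generate` could wrap vLLM's offline `LLM.generate(prompts, sampling_params)` and return `[o.outputs[0].text for o in outputs]`. The source does not say whether the project sends one request holding all five prompts or five concurrent requests that vLLM batches on the GPU.

```python
import json
from typing import Callable


def run_turn(
    prompts: dict[str, str],
    generate: Callable[[list[str]], list[str]],
) -> dict[str, dict]:
    """Send every character's prompt in ONE batched call and parse each reply."""
    names = list(prompts)
    replies = generate([prompts[name] for name in names])  # single call per turn
    if len(replies) != len(names):
        raise RuntimeError("generate() must return one reply per prompt")
    decisions = {}
    for name, text in zip(names, replies):
        try:
            decision = json.loads(text)
        except json.JSONDecodeError:
            decision = None
        # Fall back to a no-op instead of crashing the whole turn on one bad reply.
        decisions[name] = decision if isinstance(decision, dict) else {"action": "pass"}
    return decisions
```

Keeping `generate` injectable means the turn loop can be exercised with a stub such as `lambda batch: ['{"action": "pass"}'] * len(batch)` before any GPU is involved.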

How does the system create bubbles and crashes with only a 3B model?

Bubbles emerge from three scarcity rules, not from model intelligence. Characters can eat only one unit of the same food per meal, food rots so it cannot be stockpiled, and winter firewood demand spikes while there is only one supplier. In a scenario modeled on the 1929 bank run, honey crashed from 10 to 3 stones within a few turns when the character Oona sold honey for stones, while firewood spiked from 4 to 7 during the winter crunch. These structured constraints create real trading incentives, so price dynamics arise from the environment rather than from the model's reasoning.
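The three rules can be sketched as plain state updates that the model cannot bypass. This assumes a hypothetical set of food goods and that uneaten food spoils completely at the end of each turn (the source says only that food rots); `FOODS`, `Character`, `eat_meal`, `rot_food`, `firewood_demand`, and the winter multiplier of 3 are illustrative assumptions, not the project's code.

```python
from dataclasses import dataclass, field

# Hypothetical food goods; the source names honey and firewood but not the full list.
FOODS = {"honey", "berries", "fish"}


@dataclass
class Character:
    name: str
    stones: int
    inventory: dict = field(default_factory=dict)


def eat_meal(character: Character, wanted: dict) -> dict:
    """Rule 1: a character eats at most one unit of each food per meal."""
    eaten = {}
    for item, qty in wanted.items():
        if item not in FOODS:
            continue  # non-food goods such as firewood are not eaten
        have = character.inventory.get(item, 0)
        take = min(qty, 1, have)
        if take > 0:
            character.inventory[item] = have - take
            eaten[item] = take
    return eaten


def rot_food(character: Character) -> None:
    """Rule 2 (assumed form): all uneaten food spoils at the end of the turn."""
    for item in FOODS:
        character.inventory.pop(item, None)


def firewood_demand(season: str, base: int = 1, winter_multiplier: int = 3) -> int:
    """Rule 3: winter multiplies firewood demand while supply stays with one producer."""
    return base * winter_multiplier if season == "winter" else base
```

With these rules, a character holding three units of honey can eat only one per meal and loses the rest at the end of the turn, so surplus honey is worth selling for stones rather than hoarding.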

What were the measurable results in the 15-turn test?

Across 15 turns the system made 75 API calls (five per turn, matching the five characters) and achieved 100% valid JSON output, with 3 to 9 successful trades per turn. The Gini coefficient rose from 0.14 to 0.38, showing wealth inequality emerging organically from agent decisions. Format reliability was effectively perfect, while economic reasoning remained the weak link. These numbers indicate that a 3B model can sustain a stable multi-agent interaction loop at low cost, provided the prompt and market rules carry the structural complexity instead of leaning on raw model capability.
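For reference, the Gini coefficient can be computed from the mean absolute difference between every pair of holdings: 0 means perfectly equal wealth, and values approaching 1 mean wealth is concentrated in one character. This sketch assumes wealth is a single number per character (for example, stones held); the source does not say which holdings the project counted, so it will not reproduce the reported 0.14 and 0.38 without the original data.

```python
def gini(wealth: list[float]) -> float:
    """Gini coefficient via the mean absolute difference over all ordered pairs."""
    n = len(wealth)
    if n == 0:
        raise ValueError("wealth must be non-empty")
    mean = sum(wealth) / n
    if mean == 0:
        return 0.0  # everyone holds nothing: treat as perfectly equal
    total_abs_diff = sum(abs(a - b) for a in wealth for b in wealth)
    return total_abs_diff / (2 * n * n * mean)
```

For five characters, `gini([10, 10, 10, 10, 10])` returns `0.0` and `gini([0, 0, 0, 0, 10])` returns `0.8`. The quadratic pair loop is fine for a handful of agents.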

How do I fix weak economic reasoning without upgrading the model?

Do not switch to a larger model. Instead, enrich the prompt with explicit structured context: list each character's produced goods, a blocked-purchase list of items they cannot buy, an out-of-stock list for the current turn, and concrete trade examples. This narrows the agent's decision space and removes ambiguity that small models cannot resolve on their own. The Thousand Token Wood team reported that this approach kept JSON validity at 100% and produced realistic price swings, which illustrates the core lesson: structure beats scale for agent economies.
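A minimal sketch of such a prompt builder. The wording, the JSON reply schema (`action`, `item`, `qty`, `price`), and the name `build_prompt` are assumptions for illustration; the source describes which lists go into the prompt but not their exact format.

```python
import json


def build_prompt(
    name: str,
    produces: list[str],
    blocked: list[str],
    out_of_stock: list[str],
    stones: int,
    inventory: dict,
    examples: list[dict],
) -> str:
    """Assemble a structured decision prompt for one character (hypothetical format)."""
    lines = [
        f"You are {name}, a trader in the forest market.",
        f"You produce: {', '.join(produces) or 'nothing'}.",
        f"You may NOT buy: {', '.join(blocked) or 'none'}.",
        f"Out of stock this turn: {', '.join(out_of_stock) or 'none'}.",
        f"Your stones: {stones}. Your inventory: {json.dumps(inventory, sort_keys=True)}.",
        # Spell out the exact reply shape so the small model only fills in values.
        "Reply with ONE JSON object: "
        '{"action": "buy" | "sell" | "pass", "item": str, "qty": int, "price": int}.',
        "Examples:",
    ]
    lines += [json.dumps(example) for example in examples]
    return "\n".join(lines)
```

Every constraint appears as an explicit list, so the model picks from an enumerated space instead of inferring what it is allowed to do.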

What are the limits and risks of running agent economies on a 3B model?

Economic reasoning is shallow, so agents will not invent strategies, hedge, or anticipate cycles without explicit prompt scaffolding. Remove the scarcity rules and trading collapses because overproduction kills incentives. The model handles format reliably but cannot self-correct flawed market design, meaning any bug in rules or supply constraints propagates directly into prices. Batched GPU inference keeps costs low, yet adding many more agents or longer horizons will stress vLLM throughput. Treat the small model as an executor of structured rules, not as an autonomous economist.
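Since the model cannot self-correct, one option is a deterministic validator between the model and the market so that only rule-abiding decisions reach the price book. This sketch assumes a hypothetical reply schema with `action`, `item`, `qty`, and a per-unit `price`; `validate_decision` is an illustrative name, and returning `None` (which a caller could treat as a pass) is a design choice, not the project's documented behavior.

```python
import json
from typing import Optional


def validate_decision(
    raw: str,
    *,
    stones: int,
    inventory: dict,
    blocked: set,
    out_of_stock: set,
) -> Optional[dict]:
    """Return the parsed decision if it obeys the market rules, else None."""
    try:
        decision = json.loads(raw)
    except json.JSONDecodeError:
        return None  # malformed output is rejected, not repaired
    if not isinstance(decision, dict):
        return None
    action = decision.get("action")
    if action == "pass":
        return decision
    item, qty, price = decision.get("item"), decision.get("qty"), decision.get("price")
    # type(...) is int excludes booleans, which isinstance(..., int) would accept
    if not isinstance(item, str) or type(qty) is not int or type(price) is not int:
        return None
    if qty <= 0 or price <= 0:
        return None
    if action == "buy":
        # Blocked and out-of-stock goods cannot be bought; total cost must be affordable.
        if item in blocked or item in out_of_stock or qty * price > stones:
            return None
    elif action == "sell":
        if inventory.get(item, 0) < qty:
            return None  # cannot sell goods the character does not hold
    else:
        return None  # unknown action
    return decision
```

Because the checks live in code, a flaw in the rules shows up as rejected trades you can log, rather than as silently distorted prices.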

Who is Thousand Token Wood actually built for?

It targets AI engineers, indie hackers, and researchers exploring agent-based simulation, emergent economics, or multi-agent reinforcement learning environments on a tight budget. The Modal, vLLM, and Gradio stack suits hackathon teams, educators teaching market dynamics, and product builders prototyping NPC economies for games or social simulations. Anyone tempted to default to GPT-4-class models for agent worlds should study this project first. If your goal is realistic emergent behavior rather than deep individual reasoning, a 3B model with disciplined rule design can deliver comparable market dynamics at a fraction of the cost.

Why is structure better than scale for multi-agent systems?

Larger models cost more per call and rarely fix design flaws in the simulation itself. Thousand Token Wood shows that three well-chosen constraints — meal limits, food rot, and a single firewood supplier — generated price bubbles, crashes, and a Gini jump from 0.14 to 0.38 using only Qwen2.5-3B. The emergent behavior came from the rules, not the parameters. Investing engineering effort into prompt scaffolding, market mechanics, and scarcity design yields more realistic dynamics than upgrading model size, and keeps inference cheap enough to run long, continuous simulations.

Running a Multi-Agent Economic System on a 3B Parameter Small Model: Thousand Token Wood Practical Report

This article is a deep-dive from JudyAI Lab — an AI engineering playbook series with 100+ published guides, 5,000+ weekly readers across 60+ countries, focused on the practical side of running AI agents, trading systems, and content pipelines in production.

📰 Key Highlights

“Thousand Token Wood” is a multi-agent economic simulation system submitted to the Build Small Hackathon. It uses the small Qwen2.5-3B model to power five forest animal characters who trade five types of goods for stone currency in a fictional market. The entire system is deployed on Modal with vLLM, with Gradio for the frontend, and needs only one batched GPU call per turn to produce every character’s decision, which makes continuous simulation cost-feasible.

The team found that without deliberately designed scarcity in the market, overproduction kills trading incentives, so they added three constraints: only one unit of the same food per meal, food rots and cannot be stockpiled, and winter firewood demand spikes while there is only one supplier. These three rules directly spawned bubbles and crashes. In a scenario based on the 1929 bank run, the character Oona sold honey for stones, and honey prices dropped from 10 to 3 within a few turns; firewood spiked from 4 to 7 during the winter crisis.

In the 15-turn test, 75 API calls achieved 100% valid JSON output, with 3 to 9 trades per turn, and the Gini coefficient expanded from 0.14 to 0.38 as wealth gaps emerged naturally. The model was stable with the JSON format, but its economic reasoning was weak. The fix was to explicitly list each character’s produced goods, a forbidden-purchase list, an out-of-stock list, and examples in the prompt, rather than switching to a larger model. The core conclusion: “Structure beats scale.”

💬 JudyAI Lab Perspective

Thousand Token Wood used the small Qwen2.5-3B model to create bubbles and wealth differentiation, and it tells us something counter-intuitive: you don’t need a bigger model, you need better rule design.

The system drove honey from 10 to 3 and firewood from 4 to 7 in just a few turns. That came not from the model’s economic reasoning ability but from three deliberately designed scarcity rules: food rots, each food is limited to one unit per meal, and there is only one firewood supplier in winter. These rules gave characters real trading incentives and let bubbles emerge naturally. The prompt explicitly listed each character’s produced goods, blocked-purchase list, and out-of-stock list. All 75 API calls returned valid JSON, the Gini coefficient expanded from 0.14 to 0.38, and wealth differentiation emerged without being designed in. The key takeaway: when a multi-agent system doesn’t behave as expected, first tighten the environmental constraints and make the prompt more specific; don’t rush to switch to a larger model.

If you’re designing a multi-agent workflow, try asking one question: after removing all external constraints, do the agents still have reasons to interact with each other? The answer often lies in the rule design, not the model size.

📅 Source Info

Published: 2026-06-05T22:18
Source: https://huggingface.co/blog/build-small-hackathon/thousand-token-wood-sim