On March 31st, I deployed an AI trading agent with one strategy, zero on-chain presence, and a score of 58 out of 100.

Eleven days later, that agent sits at #5 on the leaderboard out of 58 registered teams, with a validation score of 98, a reputation score of 94, and — here’s the part I’m most proud of — a total drawdown of just 0.4% on a $100,000 portfolio through some of the choppiest market conditions we’ve seen all year.

This is the story of how we built WaveRider for the LabLab.ai “AI Trading Agents with ERC-8004” hackathon. Not the polished version. The real one — bugs, failures, 3 AM debugging sessions, and all.

The Problem We Set Out to Solve

Every AI trading agent at a hackathon shows you a backtest number. “90% win rate!” “3x returns!” The slides look great.

But ask a simple question — “How does it perform on data it hasn’t seen before?” — and most of those numbers collapse.

That’s because standard backtesting is a trap. You optimize parameters on the same data you test on. The model literally sees the answers. Of course it performs well. It’s like studying the answer key and calling yourself prepared for the exam.

We wanted to build something different: an agent whose claims you can actually verify.

Day 1–3: The Naive Start

WaveRider launched with a single strategy — EMA crossover + RSI momentum + volume confirmation. A classic trend-following approach.
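To make the idea concrete, here is a minimal sketch of what such a strategy can look like. The indicator periods, thresholds, and function names are illustrative assumptions, not WaveRider's actual parameters:

```python
# Hypothetical sketch of an EMA crossover + RSI momentum + volume
# confirmation signal. All thresholds here are illustrative, not
# WaveRider's tuned values.

def ema(values, period):
    """Exponential moving average over a list of floats."""
    k = 2 / (period + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(v * k + out[-1] * (1 - k))
    return out

def rsi(values, period=14):
    """Classic Wilder-style RSI on closing prices."""
    gains, losses = [], []
    for prev, cur in zip(values, values[1:]):
        diff = cur - prev
        gains.append(max(diff, 0.0))
        losses.append(max(-diff, 0.0))
    avg_gain = sum(gains[-period:]) / period
    avg_loss = sum(losses[-period:]) / period
    if avg_loss == 0:
        return 100.0
    return 100 - 100 / (1 + avg_gain / avg_loss)

def trend_signal(closes, volumes, fast=9, slow=21):
    """Return 'long', 'short', or None for the latest bar."""
    f, s = ema(closes, fast)[-1], ema(closes, slow)[-1]
    momentum = rsi(closes)
    # Volume confirmation: latest bar above 1.2x its 20-bar average.
    vol_ok = volumes[-1] > 1.2 * (sum(volumes[-20:]) / 20)
    if f > s and momentum > 55 and vol_ok:
        return "long"
    if f < s and momentum < 45 and vol_ok:
        return "short"
    return None
```

The crossover provides direction, RSI confirms momentum, and the volume filter rejects low-conviction moves.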

The first few trades looked fine. Then the market went sideways.

Trend-following strategies in ranging markets are like bringing a surfboard to a lake. You sit there waiting for a wave that never comes, and every small ripple costs you money.

First lesson: One strategy isn’t enough.

We added two more engines: BB Squeeze (Bollinger Band compression breakouts) and MACD Divergence (price-momentum divergence for reversals). Three complementary strategies covering trends, breakouts, and reversals.

But which strategy should run when? That led us to regime detection — using ADX, Bollinger Band width, and EMA convergence to classify market conditions into six states: Trending Up, Trending Down, Ranging, High Volatility, Breakout Forming, and Exhaustion.

Each state routes to different strategy combinations. We ended up with a 36-cell matrix — 6 coins × 6 regimes — where each cell has its own optimized parameter set.
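A simplified sketch of the classification and routing step might look like this. The thresholds and the routing table are assumptions for illustration; the real 36-cell matrix also carries per-coin optimized parameters in each cell:

```python
# Illustrative regime classifier over ADX, Bollinger Band width, and
# EMA convergence. Thresholds and routing are assumed values, not the
# actual tuned matrix.

REGIMES = ("trending_up", "trending_down", "ranging",
           "high_volatility", "breakout_forming", "exhaustion")

def classify_regime(adx, plus_di, minus_di, bb_width, ema_gap):
    """Map indicator readings to one of six market states."""
    if bb_width > 0.08:
        return "high_volatility"
    if adx > 40 and abs(ema_gap) < 0.001:
        return "exhaustion"          # strong ADX but EMAs converging
    if adx > 25:
        return "trending_up" if plus_di > minus_di else "trending_down"
    if bb_width < 0.02:
        return "breakout_forming"    # tight bands = compression
    return "ranging"

# Each regime routes to a different strategy combination.
ROUTING = {
    "trending_up":      ["trend_follow"],
    "trending_down":    ["trend_follow"],
    "ranging":          ["macd_divergence"],
    "high_volatility":  [],                  # stand aside
    "breakout_forming": ["bb_squeeze"],
    "exhaustion":       ["macd_divergence"],
}
```

Crossing this with six coins gives the 36-cell parameter matrix: look up `(coin, regime)`, run only the routed strategies with that cell's parameters.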

Day 4–5: Walk-Forward Optimization Exposes the Truth

Here’s where things got uncomfortable.

We ran Walk-Forward Optimization on all our strategies. Unlike standard backtesting, WFO trains on one time window and tests on the next unseen window, repeating this 8 times across 360 days of data. It’s the closest thing to simulating real forward performance without actually trading.

The results were humbling.

BTC Long, which looked great in traditional backtests, came back with a 40% out-of-sample win rate. We blacklisted it immediately.

DOGE Long? 30.3% OOS. Blacklisted.

But ETH? 93.3% long, 97.8% short across 91 out-of-sample trades. SOL held up at 72–76%. LINK hit 100% long (small sample, but consistent).

The overall OOS portfolio: 82.2% win rate across 366 trades that the model had never seen during optimization.

Second lesson: If your strategy can’t survive Walk-Forward, it can’t survive live markets. Kill your darlings based on data, not ego.

Day 6–7: Dual-AI Ensemble (And When It All Went Down)

Raw strategy signals are noisy. A trend-following signal in a choppy market is technically valid but practically worthless. We needed something to filter signal quality.

Enter the dual-AI ensemble: MiniMax M2.7 (cloud-based, strong reasoning) and Qwen 2.5 (local via Ollama, fast inference). Every signal gets reviewed by both models independently. They assess market context, confluence factors, and risk-reward ratio. An agreement bonus boosts confidence when both concur.

This pushed our signal rejection rate to 87% — only the highest-conviction setups get through.
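The core of the ensemble gate can be sketched in a few lines. The bonus size and pass threshold here are illustrative assumptions; the real reviewer also weighs market context and risk-reward:

```python
# Dual-model confidence combination with an agreement bonus.
# Bonus and threshold values are assumptions, not the tuned settings.

def ensemble_confidence(model_a, model_b,
                        agree_bonus=0.10, threshold=0.75):
    """Combine two model confidences (0..1); return (score, passed)."""
    combined = (model_a + model_b) / 2
    if model_a >= 0.5 and model_b >= 0.5:
        # Agreement bonus: both models independently approve.
        combined = min(1.0, combined + agree_bonus)
    return combined, combined >= threshold
```

With a high threshold, a signal only survives when both models independently like it, which is exactly what drives the high rejection rate.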

Then, during scan #92 at 3 AM, all three AI backends (MiniMax, Claude, Ollama) timed out simultaneously. The agent was blind.

Third lesson: AI is a tool, not a crutch.

We reduced per-model timeouts from 45s to 25s, restructured the cascade to fail fast, and added a rule-based fallback that executes at 50% position size when AI is unavailable. The agent should degrade gracefully, never go dark.
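The pattern is a fail-fast cascade with a degraded-mode floor. The backend names, 25-second timeout, and 50% fallback sizing follow what we shipped; the call interface itself is a hypothetical stand-in:

```python
# Fail-fast AI backend cascade with a rule-based fallback at reduced
# position size. The callable interface is an illustrative assumption.

def review_signal(signal, backends, timeout=25, fallback=None):
    """Try each AI backend in order; on total failure, degrade to rules.

    `backends` is a list of (name, callable) pairs where the callable
    raises TimeoutError on failure. Returns (decision, size_mult, source).
    """
    for name, call in backends:
        try:
            decision = call(signal, timeout=timeout)
            return decision, 1.0, name
        except TimeoutError:
            continue  # fail fast, move to the next backend
    # All backends down: rule-based fallback at half position size.
    decision = fallback(signal) if fallback else "hold"
    return decision, 0.5, "rule_fallback"
```

The key property is that the return type never changes: callers always get a decision and a size multiplier, so a blind-AI scan degrades instead of crashing.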

Day 8–9: SOL Teaches Us About Per-Pair Risk

SOL/USDT hit three consecutive stop-losses. Our global consecutive loss counter was at 3 — but the global counter looks at ALL pairs combined. By the time it triggered the position scaling reduction, SOL had already bled through three full-size trades.

Fourth lesson: Global risk controls aren’t granular enough.

We added Layer 6: a per-pair throttle. Two consecutive losses on the same pair trigger a 3-scan cooldown for that specific pair alone. The risk system grew from 5 layers to 7:

  1. Position sizing (max 5% per trade)
  2. Daily loss limit (3% → auto-stop)
  3. Max drawdown (10% → emergency close all)
  4. Global consecutive loss pause
  5. Consecutive loss scaling (50% size)
  6. Per-pair throttle (new)
  7. Batch take-profit (TP1→breakeven, TP2→tighten, TP3→close)
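Layer 6 in isolation is small enough to sketch in full. The 2-loss trigger and 3-scan cooldown match what we deployed; the class shape itself is illustrative:

```python
# Per-pair throttle (Layer 6): two consecutive losses on one pair put
# only that pair into a 3-scan cooldown. The class is an illustrative
# sketch of the deployed logic.

class PerPairThrottle:
    def __init__(self, max_losses=2, cooldown_scans=3):
        self.max_losses = max_losses
        self.cooldown_scans = cooldown_scans
        self.losses = {}    # pair -> consecutive loss count
        self.cooldown = {}  # pair -> scans remaining

    def record_trade(self, pair, won):
        if won:
            self.losses[pair] = 0
            return
        self.losses[pair] = self.losses.get(pair, 0) + 1
        if self.losses[pair] >= self.max_losses:
            self.cooldown[pair] = self.cooldown_scans
            self.losses[pair] = 0

    def on_new_scan(self):
        """Tick down every active cooldown at the start of a scan."""
        for pair in list(self.cooldown):
            self.cooldown[pair] -= 1
            if self.cooldown[pair] <= 0:
                del self.cooldown[pair]

    def allowed(self, pair):
        return pair not in self.cooldown
```

Because the counter is keyed by pair, SOL bleeding no longer has to exhaust a global budget before the brakes engage, and unrelated pairs keep trading normally.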

Result: Total drawdown held to 0.4% across 11 days of adverse market conditions. The 7-layer system prevented over $8,300 in potential losses compared to our worst-case OOS scenario.

Day 10: The Reputation Crisis

The hackathon uses shared smart contracts for scoring. One of them — the ReputationRegistry — allows agents to post reputation updates on-chain.

Except it doesn’t. Not for self-assessment.

Every submitFeedback call with our agent wallet reverted. The contract blocks self-rating by design (to prevent score inflation). We tried feedbackType 0, 1, 2 — all rejected.

Most teams would just accept a zero reputation score and move on. We went a different direction.

We built a zero-base reputation formula from scratch. Starting at 0 — not 50, not 65, not some padded base that masks poor performance. Every point is earned:

  • Risk control: 30 points max (our drawdown under 0.5% = 30/30)
  • Transparency: 20 points (artifacts per trade ratio)
  • Validation quality: 15 points
  • Activity: 15 points
  • Win rate: 10 points
  • PnL: 10 points (can be negative)

Our score: 79/100. Not flashy. But every single point represents measurable, verifiable performance. Run make reputation to see the full breakdown.
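The shape of the formula can be sketched as follows. The per-component caps match the breakdown above; the scaling inside each component is an assumption (the exact breakdown lives behind make reputation):

```python
# Sketch of the zero-base reputation formula. Component caps follow the
# post; the internal scaling of each component is an assumed example.

def reputation_score(max_drawdown_pct, artifacts_per_trade,
                     validation_quality, activity, win_rate, pnl_pct):
    """All components start at 0; every point is earned."""
    # Risk control: full 30 points only for sub-0.5% drawdown.
    risk = 30.0 if max_drawdown_pct < 0.5 else max(0.0, 30 - max_drawdown_pct * 3)
    transparency = min(20.0, artifacts_per_trade * 10)
    validation = min(15.0, validation_quality * 15)  # quality in 0..1
    act = min(15.0, activity * 15)                   # activity in 0..1
    wins = min(10.0, win_rate * 10)                  # win rate in 0..1
    pnl = max(-10.0, min(10.0, pnl_pct))             # can go negative
    return risk + transparency + validation + act + wins + pnl
```

Note that a losing agent can score below its risk-control points: PnL is the one component allowed to subtract, which is the whole point of not padding the base.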

Fifth lesson: When the system blocks you, build a better system. Constraints breed innovation.

Day 10 (continued): Merkle Integrity

If we’re asking judges to trust our validation artifacts, we should give them a way to verify nothing was tampered with after the fact.

We built a SHA-256 Merkle tree over all 205 validation records — trade intents, risk checks, and strategy checkpoints. The root hash is embedded in the agent card and posted on-chain.

Run make verify to recompute the Merkle root independently. If it matches, no records have been modified. If it doesn’t, something changed.
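The verification itself is a few lines of hashing. SHA-256 and the record count are as described above; the leaf serialization (canonical JSON) is a simplifying assumption about our artifact format:

```python
# Merkle root recomputation over validation records. SHA-256 per the
# post; canonical-JSON leaf serialization is a simplifying assumption.

import hashlib
import json

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(records):
    """Recompute the root over canonically serialized records."""
    level = [_h(json.dumps(r, sort_keys=True).encode()) for r in records]
    if not level:
        return _h(b"").hex()
    while len(level) > 1:
        if len(level) % 2:                 # odd count: duplicate last node
            level.append(level[-1])
        level = [_h(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0].hex()

def verify(records, published_root):
    return merkle_root(records) == published_root
```

Changing a single field in any record changes its leaf hash, which propagates up and changes the root, so a matching root over all 205 records implies none were edited after the fact.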

This is trust-minimized verification applied to trading agents. Not “trust my numbers.” Instead: “here’s the math to check them yourself.”

Day 11: The Breakthrough

The hackathon organizers announced they’d fixed a Solidity bug in the ValidationRegistry. The postEIP712Attestation function was using this.postAttestation(...) — an external call that changes msg.sender from the operator wallet to the contract itself. Since the contract wasn’t in its own whitelist, every checkpoint submission reverted.

Within minutes of the fix, we posted 6 validation checkpoints covering risk management, WFO results, Merkle integrity, reputation methodology, and AI ensemble design.

Validation score jumped to 98. Reputation to 94. Leaderboard position: #5 out of 58.

The Numbers We’re Proud Of (And the Ones We’re Not)

Let’s be transparent:

| Metric | OOS Backtest | Live Paper Trading |
| --- | --- | --- |
| Win Rate | 82.2% (366 trades) | 40.0% (25 trades) |
| Max Drawdown | -8.7% | -0.4% |
| Risk Rejection | n/a | 87% of signals filtered |

The 40% live win rate is real, and we’re not hiding it. The hackathon period was dominated by ranging and choppy markets — exactly the regime where trend-following strategies underperform. Our WFO backtest windows included ~60% trending regimes; live was the opposite.

But here’s what matters: the risk system did its job. A 40% win rate with -0.4% drawdown means the agent lost small and protected capital. When markets turn favorable for our strategies, the validated edge is there. When they don’t, the 7-layer risk system keeps the damage negligible.

A production agent that loses 0.4% in bad markets is more valuable than a demo agent that shows 80%+ on cherry-picked data.

What We Built (The Technical Stack)

  • 3 strategy engines with 36-cell regime-adaptive routing
  • 7-layer risk management including per-pair throttle and batch take-profit
  • Dual-AI ensemble (MiniMax M2.7 + Qwen 2.5) with 87% signal rejection
  • Walk-Forward Optimization (8 windows, 366 OOS trades)
  • SHA-256 Merkle tree over 205 validation artifacts
  • Zero-base reputation formula (base=0, fully earned)
  • ERC-8004 on-chain identity (Agent #17, dual-contract registration)
  • 79 EIP-712 signed trade intents submitted to RiskRouter
  • 93 tests (unit + integration + integrity)
  • Graceful shutdown with SIGTERM/SIGINT handling
  • All source files under 800 lines. Docker + systemd ready.

Everything is open source at github.com/JudyaiLab/hackathon-trading-agent.

Run make test && make validate && make verify && make reputation to verify every claim in this post.

What I Learned

1. Validation methodology matters more than backtest numbers. Any model can show 90% on in-sample data. Walk-Forward is the honest test.

2. Risk management is the product. Not the strategy, not the AI, not the fancy indicators. When markets turn against you, the only thing that matters is how much you lose.

3. Transparency is a competitive advantage. Showing your failures alongside your successes builds more trust than a perfect track record ever could.

4. Constraints breed innovation. The contract blocking self-rating forced us to build zero-base reputation. The AI timeout forced us to build fallback systems. SOL bleeding forced us to build per-pair throttling. Every problem made the agent better.

5. On-chain identity changes the game. ERC-8004 isn’t just a hackathon requirement — it’s the future of agent accountability. When any agent can register a verifiable identity and build portable reputation, the entire ecosystem levels up.


WaveRider is Agent #17 on Sepolia. Validation score: 98. Reputation: 94. Leaderboard: #5 of 58.

Built during 11 sleepless days by JudyAI Lab.

GitHub · JudyAI Lab