Context
Our trading system already had three strategies running in Paper Trading: Pipeline (trend following), BB Squeeze (Bollinger Band breakout), and MACD Divergence (divergence reversal).
These three share a common weakness: when the market enters a ranging phase, signals drop to nearly zero.
Our ADX indicator showed all six major coins below 20 — no clear trend. Trend-following strategies simply had nothing to do.
So I built a fourth strategy: Mean Reversion.
The Logic
The concept is straightforward:
- RSI below 35 (oversold) → wait for RSI to bounce → go long
- RSI above 65 (overbought) → wait for RSI to drop → go short
- Stop loss: 1.5× ATR
- Take profit: BB middle band (mean reversion target)
In a ranging market, price bounces within a band. Mean reversion should theoretically be the perfect strategy.
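The rules above can be sketched in a few lines. This is a minimal illustration, not the production code: the function names are hypothetical, and it assumes a "bounce" means RSI crossing back over the threshold between the previous bar and the current one, which the rules do not spell out.

```python
from typing import Optional, Sequence, Tuple

RSI_OVERSOLD = 35.0     # thresholds taken from the rules above
RSI_OVERBOUGHT = 65.0
ATR_STOP_MULT = 1.5     # stop loss distance in ATR multiples


def mean_reversion_signal(rsi: Sequence[float]) -> Optional[str]:
    """Return "long", "short", or None for the latest bar.

    Assumption: the entry fires when RSI was beyond the threshold on the
    previous bar and has crossed back inside it on the current bar.
    """
    if len(rsi) < 2:
        return None
    prev, curr = rsi[-2], rsi[-1]
    if prev < RSI_OVERSOLD <= curr:
        return "long"
    if prev > RSI_OVERBOUGHT >= curr:
        return "short"
    return None


def exit_levels(side: str, entry: float, atr: float, bb_middle: float) -> Tuple[float, float]:
    """Return (stop_loss, take_profit): 1.5x ATR stop, BB middle band target."""
    if side == "long":
        return entry - ATR_STOP_MULT * atr, bb_middle
    return entry + ATR_STOP_MULT * atr, bb_middle


print(mean_reversion_signal([30.0, 36.0]))      # long
print(exit_levels("long", 100.0, 2.0, 104.0))   # (97.0, 104.0)
```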
Backtest Results — Too Perfect
180 days of historical data, 6 coins × 2 directions = 12 combinations:
| Combination | Win Rate | Profit Factor | Trades |
|---|---|---|---|
| SOL LONG | 100% | ∞ | 3 |
| XRP LONG | 100% | ∞ | 3 |
| XRP SHORT | 85.7% | 7.93 | 7 |
| DOGE LONG | 83.3% | 5.19 | 6 |
| DOGE SHORT | 80% | 3.5 | 5 |
| ETH SHORT | 75% | 2.3 | 4 |
| BNB SHORT | 75% | 2.0 | 4 |
| BTC SHORT | 71.4% | 1.8 | 7 |
Eight combinations above 70% win rate. Two at 100%.
If we had gone live right then, we would have thought this strategy was invincible.
The Red Flag: Sample Size
One number kept bothering me: the number of trades.
SOL LONG and XRP LONG, both at 100% win rate, each had only 3 trades.
A 100% win rate on 3 trades has almost zero statistical significance. The probability of flipping heads three times in a row is 12.5%. Not exactly rare.
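The coin-flip comparison generalizes to any win count. A small sketch, assuming each trade is an independent 50/50 outcome (a simplification), that computes how often pure chance produces a record at least this good:

```python
from math import comb


def prob_at_least(wins: int, trades: int, p: float = 0.5) -> float:
    """P(X >= wins) for X ~ Binomial(trades, p)."""
    return sum(
        comb(trades, k) * p**k * (1 - p) ** (trades - k)
        for k in range(wins, trades + 1)
    )


print(prob_at_least(3, 3))  # 0.125  (3 wins out of 3)
print(prob_at_least(6, 7))  # 0.0625 (6 wins out of 7, i.e. 85.7%)
```

Even the 85.7% record on 7 trades shows up by chance about 1 time in 16.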
Out-of-Sample Validation
I ran OOS validation on every combination:
- Split 180 days into first 120 days (In-Sample) and last 60 days (Out-of-Sample)
- Train parameters on the first 120 days
- Test on the last 60 days — data the model has never seen
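The split itself is simple, but it has to be chronological: shuffling would leak future data into training. A minimal sketch, with `chronological_split` as a hypothetical helper and a list of integers standing in for daily bars:

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(bars: Sequence[T], in_sample: int = 120) -> Tuple[List[T], List[T]]:
    """Split time-ordered bars into (in-sample, out-of-sample) without shuffling."""
    if not 0 < in_sample < len(bars):
        raise ValueError("in_sample must leave at least one out-of-sample bar")
    return list(bars[:in_sample]), list(bars[in_sample:])


days = list(range(180))  # stand-in for 180 daily bars, oldest first
is_days, oos_days = chronological_split(days)
print(len(is_days), len(oos_days))  # 120 60
```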
Results:
| Combination | IS Win Rate | OOS Win Rate | Drop |
|---|---|---|---|
| SOL LONG | 100% | 75% | -25% |
| XRP LONG | 100% | 0% | -100% |
| ETH SHORT | 100% | 25% | -75% |
| DOGE LONG | 83% | 50% | -33% |
| BTC SHORT | 80% | 40% | -40% |
| BTC LONG | 67% | 33% | -34% |
| BNB LONG | 67% | 50% | -17% |
XRP LONG: IS 100% → OOS 0%. Perfect on training data, every single trade lost on new data.
ETH SHORT: IS 100% → OOS 25%. Four wins in backtesting, only one win on unseen data.
Only 2 Combinations Survived
Out of 12 combinations, only 2 maintained stable OOS performance:
- SOL LONG: IS 78% → OOS 75% (only a 3-point gap)
- DOGE SHORT: IS 80% → OOS 76% (only a 4-point gap)
We kept only these 2 for Paper Trading. The other 10 were cut.
Why the Overfitting?
The root cause is small sample size:
- Mean reversion requires extreme RSI values to trigger
- Over 180 days, each combination generated only 3-7 trades
- 100% win rate on 3 trades ≠ the strategy works
- 85% win rate on 7 trades is also insufficient to draw conclusions
Statistics 101: as a rule of thumb, you need at least 20-30 samples to have reasonable confidence in a binary outcome (win/loss). Most of our combinations had single digits.
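To make the rule of thumb concrete, here is a rough sketch of the 95% margin of error around a win rate near 50%. It uses the normal approximation, which is itself crude at very small sample sizes, so treat the small-n values as illustrative only.

```python
from math import sqrt


def win_rate_margin(trades: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a win rate near 50%."""
    return z * sqrt(0.25 / trades)


for n in (3, 7, 30, 414):
    print(n, round(win_rate_margin(n), 3))
```

With 3 trades the uncertainty is roughly ±57 percentage points; at 30 trades it narrows to about ±18.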
Z-score Validation
We quantify whether strategy performance significantly exceeds random using Z-scores:
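$$Z = \frac{\hat{p} - 0.5}{\sqrt{0.25 / n}}$$

where $\hat{p}$ is the observed win rate, $n$ is the number of trades, and 0.5 is the win rate of a coin flip.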
- Z > 1.96 → 95% confidence the win rate differs from random (two-sided threshold)
- Z > 2.58 → 99% confidence
Our main strategy Pipeline (414 trades, 72% win rate):
$$Z = \frac{0.72 - 0.5}{\sqrt{0.25 / 414}}$$
Z = 8.94, far above 2.58. This strategy genuinely outperforms random.
Meanwhile MR’s SOL LONG (3 trades, 100%):
$$Z = \frac{1.00 - 0.5}{\sqrt{0.25 / 3}}$$
Z = 1.73, doesn’t even reach 95% confidence. Statistics says: this could just be luck.
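The Z-scores quoted in this post are consistent with a one-sample proportion test against a 50% baseline. A minimal sketch under that assumption (`z_score` is a hypothetical helper name):

```python
from math import sqrt


def z_score(win_rate: float, trades: int, baseline: float = 0.5) -> float:
    """One-sample proportion Z-score against a baseline win rate."""
    std_err = sqrt(baseline * (1 - baseline) / trades)
    return (win_rate - baseline) / std_err


print(round(z_score(1.00, 3), 2))    # 1.73
print(round(z_score(0.72, 414), 2))  # 8.95 when using the rounded 72% win rate
```

The small difference from the 8.94 quoted for Pipeline presumably comes from the win rate being rounded to 72% in the text.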
Our Current Process
After this experience, every new strategy must pass three validations before going live:
- Minimum trade count: at least 20 trades in backtesting
- OOS validation: gap between IS and OOS win rates under 10 percentage points
- Z-score test: Z > 1.96
Fail any one = no deployment. No exceptions.
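The three checks can be expressed as a single gate. This is a sketch of how such a gate might look, not our actual code: `BacktestSummary` and `failed_checks` are hypothetical names, and the Z-score assumes a 50% baseline.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List


@dataclass
class BacktestSummary:
    trades: int          # total trades in the backtest
    win_rate: float      # overall win rate, 0..1
    is_win_rate: float   # in-sample win rate, 0..1
    oos_win_rate: float  # out-of-sample win rate, 0..1


def failed_checks(s: BacktestSummary, min_trades: int = 20,
                  max_gap: float = 0.10, min_z: float = 1.96) -> List[str]:
    """Return the names of the checks that fail; an empty list means deployable."""
    failures = []
    if s.trades < min_trades:
        failures.append("trade count")
    if abs(s.is_win_rate - s.oos_win_rate) >= max_gap:
        failures.append("IS/OOS gap")
    # Z-score against a 50% baseline; zero trades counts as a failure.
    z = (s.win_rate - 0.5) / sqrt(0.25 / s.trades) if s.trades > 0 else float("-inf")
    if z <= min_z:
        failures.append("z-score")
    return failures


# XRP LONG, using the figures from the tables above.
xrp_long = BacktestSummary(trades=3, win_rate=1.0, is_win_rate=1.0, oos_win_rate=0.0)
print(failed_checks(xrp_long))  # ['trade count', 'IS/OOS gap', 'z-score']
```

Returning the list of failed checks rather than a bare boolean makes it obvious in logs why a strategy was rejected.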
Current strategy standings:
| Strategy | Trades | Win Rate | Z-score | Status |
|---|---|---|---|---|
| Pipeline | 414 | 72% | 8.94 | ✅ Primary |
| BB Squeeze | 82 | 56% | 1.09 | ⚠️ Supporting |
| MACD Div | 45 | 33% | -2.28 | ⚠️ Select pairs only |
| Mean Reversion | 53 | 25% | -3.64 | ❌ Only 2 pairs |
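Pipeline is the only strategy with robust validation. BB Squeeze needs more data to build confidence, while MACD Div and Mean Reversion score significantly below random in aggregate, which is why they run on select pairs only.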
Takeaway
If your backtest shows 100% win rate, assume it’s fake until OOS validation proves otherwise.
Small-sample backtest results are the most dangerous trap in trading system development. They create an illusion: this strategy is flawless. But flawless strategies don’t exist — flawless backtests are usually just overfitting.
Backtesting is a tool for eliminating bad strategies, not for proving good ones. A strategy that passes OOS validation might not make money, but one that fails will almost certainly disappoint you in live trading.