Backtesting

Backtesting replays your trading flow against historical candle data to evaluate how it would have performed over a defined time period. This document covers how backtesting works, how to interpret results, and the limitations of historical testing.

How Backtesting Works

When you run a backtest, the engine fetches historical candle data for your selected exchange, symbol, and timeframe. It then iterates through each candle chronologically, executing your flow at each step as if it were a live tick.

At each tick, the engine evaluates all nodes in your flow (triggers, indicators, logic, risk, execution) using the historical data available up to that point. Trade signals are generated and simulated fills are recorded based on the candle's price data.

After processing all candles, the engine calculates performance metrics and displays the results.
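
The loop described above can be sketched roughly as follows. This is an illustration only, not the engine's actual code: the `Candle` class, the `run_backtest` function, and the `momentum_flow` example are hypothetical names, and filling every order at the candle's close with the full balance is an assumption made to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candle:
    timestamp: int  # candle open time, in seconds
    open: float
    high: float
    low: float
    close: float
    volume: float


# Hypothetical stand-in for a flow: it sees only the candles up to the
# current tick and returns "buy", "sell", or None.
Flow = Callable[[List[Candle]], Optional[str]]


def run_backtest(candles: List[Candle], flow: Flow,
                 starting_balance: float = 10_000.0) -> List[float]:
    """Replay candles in chronological order and return the equity curve."""
    cash = starting_balance
    position = 0.0  # units of the base asset currently held
    equity_curve = []
    for i in range(len(candles)):
        history = candles[: i + 1]  # no access to future candles
        signal = flow(history)
        price = candles[i].close  # assumption: simulated fills at the close
        if signal == "buy" and position == 0.0:
            position = cash / price
            cash = 0.0
        elif signal == "sell" and position > 0.0:
            cash = position * price
            position = 0.0
        equity_curve.append(cash + position * price)
    return equity_curve


def momentum_flow(history: List[Candle]) -> Optional[str]:
    """Toy flow: buy after an up candle, sell after a down candle."""
    if len(history) < 2:
        return None
    if history[-1].close > history[-2].close:
        return "buy"
    if history[-1].close < history[-2].close:
        return "sell"
    return None
```

Passing only `candles[: i + 1]` to the flow is the part that mirrors the text: at each tick the flow can see the current candle and everything before it, and nothing after.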

Backtest Configuration

Date Range: The start and end dates for the historical period. Candle data is fetched from your selected exchange for this range.

Exchange: The exchange to source historical data from. Data availability and granularity may vary between exchanges.

Symbol: The trading pair (e.g., BTC/USDT, ETH/USDT). The symbol must be available on the selected exchange.

Timeframe: The candle timeframe (1m, 5m, 15m, 1h, 4h, 1d). Each candle represents one tick in the backtest. Shorter timeframes produce more ticks and longer run times.
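
To get a feel for how the timeframe affects run length, the number of ticks can be estimated from the date range. The helper below is a hypothetical sketch (`estimate_tick_count` is not part of the product) and assumes the exchange has a candle for every interval in the range, which may not hold if the data has gaps.

```python
from datetime import datetime, timezone

# Seconds per candle for the timeframes listed above.
TIMEFRAME_SECONDS = {
    "1m": 60,
    "5m": 300,
    "15m": 900,
    "1h": 3600,
    "4h": 14400,
    "1d": 86400,
}


def estimate_tick_count(start: datetime, end: datetime, timeframe: str) -> int:
    """Estimate how many full candles fit between start and end."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    span_seconds = (end - start).total_seconds()
    return int(span_seconds // TIMEFRAME_SECONDS[timeframe])


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
end = datetime(2024, 1, 31, tzinfo=timezone.utc)
print(estimate_tick_count(start, end, "1d"))  # 30
print(estimate_tick_count(start, end, "1m"))  # 43200
```

Over the same 30 days, a 1m backtest processes 1,440 times as many ticks as a 1d backtest.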

Performance Metrics

After a backtest completes, the following metrics are calculated:

PnL (Profit and Loss)

Total profit or loss in dollar terms and as a percentage of starting balance. This is the bottom-line result of the strategy over the backtest period.
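
As a small worked example of the two forms of PnL, assuming the percentage is taken relative to the starting balance as described above (`total_pnl` is a hypothetical helper name):

```python
def total_pnl(starting_balance: float, ending_equity: float) -> tuple:
    """Return (absolute PnL, PnL as a percentage of starting balance)."""
    if starting_balance <= 0:
        raise ValueError("starting_balance must be positive")
    pnl = ending_equity - starting_balance
    return pnl, pnl / starting_balance * 100.0


pnl, pnl_pct = total_pnl(10_000.0, 11_250.0)
print(pnl, pnl_pct)  # 1250.0 12.5
```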

Win Rate

Percentage of trades that were profitable, calculated as winning trades divided by total trades. A high win rate does not necessarily indicate a good strategy if the average loss exceeds the average win.
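
The caveat about average loss and average win is easy to see with numbers. The sketch below uses hypothetical helper names and assumes break-even trades (PnL of exactly zero) count as non-winning; the engine's treatment of those is not specified here.

```python
def win_rate(trade_pnls: list) -> float:
    """Fraction of trades with a strictly positive PnL."""
    if not trade_pnls:
        raise ValueError("no trades to evaluate")
    wins = sum(1 for pnl in trade_pnls if pnl > 0)
    return wins / len(trade_pnls)


def average_win_and_loss(trade_pnls: list) -> tuple:
    """Return (average winning PnL, average losing PnL as a positive number)."""
    wins = [pnl for pnl in trade_pnls if pnl > 0]
    losses = [-pnl for pnl in trade_pnls if pnl < 0]
    avg_win = sum(wins) / len(wins) if wins else 0.0
    avg_loss = sum(losses) / len(losses) if losses else 0.0
    return avg_win, avg_loss


# Nine small winners and one large loser.
trades = [10.0] * 9 + [-100.0]
print(win_rate(trades))              # 0.9
print(average_win_and_loss(trades))  # (10.0, 100.0)
print(sum(trades))                   # -10.0
```

Here 90% of trades win, yet the strategy loses money overall because the single loss is ten times the size of the average win.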

Max Drawdown

The largest peak-to-trough decline in equity during the backtest, expressed as a dollar amount and as a percentage. This is the worst decline a trader would have experienced had they run the strategy over the test period.
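
One common way to compute this from an equity curve is to track the running peak and measure how far equity falls below it. The function below is a sketch under that assumption (`max_drawdown` is a hypothetical name, and the percentage is taken relative to the peak); the dollar and percentage maxima are tracked separately because they can come from different declines.

```python
def max_drawdown(equity_curve: list) -> tuple:
    """Return (largest drawdown in currency units, largest drawdown in percent)."""
    if not equity_curve:
        raise ValueError("equity curve is empty")
    peak = equity_curve[0]
    max_dd_abs = 0.0
    max_dd_pct = 0.0
    for equity in equity_curve:
        peak = max(peak, equity)  # highest equity seen so far
        drawdown = peak - equity  # distance below that peak
        max_dd_abs = max(max_dd_abs, drawdown)
        if peak > 0:
            max_dd_pct = max(max_dd_pct, drawdown / peak * 100.0)
    return max_dd_abs, max_dd_pct


curve = [10_000.0, 12_000.0, 9_000.0, 11_000.0, 13_000.0, 12_500.0]
print(max_drawdown(curve))  # (3000.0, 25.0)
```

In this example the worst decline runs from the 12,000 peak to the 9,000 trough, even though the curve ends above its starting value.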

Sharpe Ratio

Risk-adjusted return, calculated as the mean return divided by the standard deviation of returns and then annualized. A Sharpe ratio above 1.0 is generally considered acceptable. A ratio below 1.0 suggests the return may not justify the risk taken.
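
A minimal sketch of that calculation follows. Several details are assumptions rather than documented engine behavior: the risk-free rate is taken as zero, the sample standard deviation is used, and annualization multiplies by the square root of the number of return periods per year (for example, 365 for daily returns on a market that trades every day). The name `sharpe_ratio` is hypothetical.

```python
import math
from statistics import mean, stdev


def sharpe_ratio(returns: list, periods_per_year: float) -> float:
    """Annualized mean/stdev of per-period returns, risk-free rate assumed 0."""
    if len(returns) < 2:
        raise ValueError("need at least two returns")
    volatility = stdev(returns)  # sample standard deviation
    if volatility == 0:
        raise ValueError("returns have zero variance; ratio is undefined")
    return mean(returns) / volatility * math.sqrt(periods_per_year)


daily_returns = [0.01, -0.005, 0.02, 0.0, 0.005]
print(sharpe_ratio(daily_returns, periods_per_year=365))
```

Because of the square-root scaling, the same per-period returns give a different annualized ratio depending on the period length, so ratios are only comparable between backtests that use the same convention.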

Profit Factor

The ratio of gross profits to gross losses. A value above 1.0 means the strategy produced more profit than loss. A profit factor of 2.0 means the strategy earned twice as much as it lost.
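
As a worked example, assuming per-trade PnL values are available (`profit_factor` is a hypothetical helper; returning infinity when there are no losing trades is a choice made for this sketch):

```python
import math


def profit_factor(trade_pnls: list) -> float:
    """Gross profits divided by gross losses."""
    gross_profit = sum(pnl for pnl in trade_pnls if pnl > 0)
    gross_loss = -sum(pnl for pnl in trade_pnls if pnl < 0)
    if gross_loss == 0:
        # No losing trades: the ratio is unbounded (or undefined with no wins).
        return math.inf if gross_profit > 0 else math.nan
    return gross_profit / gross_loss


print(profit_factor([200.0, -50.0, 100.0, -100.0]))  # 2.0
```

Gross profits are 300 and gross losses are 150, giving a profit factor of 2.0.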

Equity Curve

A line chart showing account equity over time throughout the backtest. This visually illustrates the growth, decline, and volatility of the strategy across the test period.

Flow Snapshots

Each backtest preserves the exact nodes and edges of your flow at the time of the run. This means you can review the precise strategy configuration that produced a given set of results, even if you later modify the flow.

Flow snapshots are stored alongside the backtest results and can be viewed in the backtest history.
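
Conceptually, a snapshot is an independent copy of the flow taken at run time. The sketch below illustrates the idea only: the dictionary layout with `nodes` and `edges` keys and the `snapshot_flow` function are assumptions for illustration, not the product's storage format.

```python
import copy


def snapshot_flow(flow: dict) -> dict:
    """Return a deep copy of the flow's nodes and edges."""
    return {
        "nodes": copy.deepcopy(flow["nodes"]),
        "edges": copy.deepcopy(flow["edges"]),
    }


flow = {
    "nodes": [{"id": "rsi", "params": {"period": 14}}, {"id": "buy", "params": {}}],
    "edges": [{"from": "rsi", "to": "buy"}],
}
snapshot = snapshot_flow(flow)

# Editing the live flow afterwards does not alter the snapshot.
flow["nodes"][0]["params"]["period"] = 21
print(snapshot["nodes"][0]["params"]["period"])  # 14
```

A deep copy matters here because a shallow copy would still share the nested parameter dictionaries with the live flow.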

Limitations of Backtesting

IMPORTANT: Backtesting has fundamental limitations. Read this section before making any decisions based on backtest results.

Historical data is not predictive. Past market conditions do not repeat exactly. A strategy that performed well on historical data may produce losses in current or future markets.
Overfitting risk. A strategy with many tightly tuned parameters that performs well on a specific historical period may be overfit to that data. Such strategies often fail in live markets. Use out-of-sample testing and keep parameters reasonable.
No real execution dynamics. Backtests do not account for real order book depth, liquidity, slippage, partial fills, exchange latency, or order rejections. These factors can materially affect live performance.
Survivorship bias. Backtests only run on symbols that currently exist. Tokens that were delisted or went to zero during the test period may not be represented in the available data.
Look-ahead bias. Ensure your flow logic only uses data that would have been available at each point in time. The engine processes candles sequentially to prevent this, but you should still account for indicator warmup periods: an indicator with an N-candle lookback has no meaningful value until N candles have been processed.
Data quality. Historical candle data is sourced from third-party exchanges and may contain gaps, errors, or inconsistencies, particularly for less liquid trading pairs or shorter timeframes.
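
For the data quality point above, one simple check is to scan candle timestamps for missing intervals before trusting a result. This is a hypothetical helper, assuming timestamps are integers in seconds, sorted in ascending order, and aligned to the timeframe.

```python
def find_gaps(timestamps: list, step_seconds: int) -> list:
    """Return (last_seen, next_seen, missing_count) for each gap in the series."""
    gaps = []
    for previous, current in zip(timestamps, timestamps[1:]):
        delta = current - previous
        if delta > step_seconds:
            missing = delta // step_seconds - 1  # candles absent in between
            gaps.append((previous, current, missing))
    return gaps


# 1m candles with two missing between t=120 and t=300.
print(find_gaps([0, 60, 120, 300, 360], step_seconds=60))  # [(120, 300, 2)]
```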

Best Practices

Test across multiple time periods. Run your strategy over bull markets, bear markets, and sideways periods to understand how it behaves in different conditions.
Use out-of-sample testing. Develop your strategy using one time period and validate it on a separate, unseen period. If performance degrades significantly, the strategy may be overfit.
Focus on drawdown, not just profit. A strategy with high returns but a 60% drawdown may be unsuitable. Evaluate risk-adjusted metrics like Sharpe ratio and maximum drawdown alongside raw PnL.
Keep strategies simple. Strategies with fewer parameters are less likely to be overfit and more likely to generalize to live markets.
Always paper trade after backtesting. Before going live, validate your strategy with paper trading using real-time data. See the Paper vs. Live Trading documentation.
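
The out-of-sample practice above amounts to a chronological split of the historical data. A minimal sketch, assuming the candles are already sorted by time (`split_in_out_of_sample` and the 70/30 default are illustrative choices, not product settings):

```python
def split_in_out_of_sample(candles: list, in_sample_fraction: float = 0.7) -> tuple:
    """Split chronologically: earlier part for development, later for validation."""
    if not 0.0 < in_sample_fraction < 1.0:
        raise ValueError("in_sample_fraction must be between 0 and 1")
    cut = int(len(candles) * in_sample_fraction)
    return candles[:cut], candles[cut:]


in_sample, out_of_sample = split_in_out_of_sample(list(range(8)), 0.75)
print(in_sample)      # [0, 1, 2, 3, 4, 5]
print(out_of_sample)  # [6, 7]
```

The split is by position rather than random sampling, so the validation period always comes after the development period and no future data leaks into tuning. In practice this corresponds to running two backtests with different date ranges.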