The Monte Carlo envelope is most powerful when used to calibrate expectations rather than to pass/fail a strategy.
Setting Realistic Risk Limits
If the P95 worst case is 49.4%, you should plan for a drawdown near that level — not the 26.4% you observed. The observed drawdown is a single sample. The P95 is the planning boundary: "given my trades, the worst plausible drawdown from bad sequencing is around 49%." If that exceeds your tolerance, reduce position size or tighten stops before deploying.
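The planning boundary can be estimated directly by resampling the trade sequence. Below is a minimal sketch, assuming per-trade fractional returns; the helper names (`max_drawdown`, `p95_drawdown`) are illustrative, not part of the product:

```python
import random

def max_drawdown(returns):
    """Worst peak-to-trough decline of a compounded equity curve, as a fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst

def p95_drawdown(trades, n_sims=10_000, seed=0):
    """Shuffle the trade sequence many times; return the 95th-percentile max drawdown."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_sims):
        sample = trades[:]
        rng.shuffle(sample)          # resequence the same trades
        draws.append(max_drawdown(sample))
    draws.sort()
    return draws[int(0.95 * len(draws))]
```

The key property: `p95_drawdown` never changes the trades themselves, only their order, so it isolates sequencing risk from everything else in the backtest.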
Envelope + Drawdown Analysis
The Drawdown Analysis modal shows the depth, duration, and recovery pattern of the observed drawdowns. The Envelope shows whether that observation was typical. Together, they answer: "how bad was it, and how bad could it have been?" If the observed max drawdown is at the 20th percentile, the strategy was lucky — live trading will likely produce worse drawdowns. If it is at the 80th percentile, the strategy was unlucky — structural risk is actually lower than the backtest implies.
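That percentile comparison is a simple rank lookup: sort the simulated max drawdowns and find where the observed value falls. A sketch with a hypothetical helper name:

```python
from bisect import bisect_left

def drawdown_percentile(observed_dd, simulated_dds):
    """Rank the observed max drawdown within the Monte Carlo distribution.

    Returns a percentile in [0, 100]. Low values mean the backtest was
    lucky (most resequencings were worse); high values mean it was unlucky.
    """
    dds = sorted(simulated_dds)
    return 100.0 * bisect_left(dds, observed_dd) / len(dds)
```

For example, an observed drawdown of 0.25 against simulated values [0.1, 0.2, 0.3, 0.4] ranks at the 50th percentile.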
Envelope + Performance Metrics
The Performance Metrics card shows return-based ratios (Sharpe, Calmar). The envelope adds a dimension: if the range width is large, those ratios are unstable. A Calmar ratio computed from the observed drawdown could be dramatically different if computed from the P95 drawdown instead. For wide-range strategies, compute Calmar using P95 to get a conservative risk-adjusted measure.
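The size of that instability is easy to quantify. Using the document's example figures (26.4% observed, 49.4% P95) and a hypothetical 30% annualised return:

```python
def calmar(annual_return, max_dd):
    """Calmar ratio: annualised return divided by maximum drawdown."""
    return annual_return / max_dd

annual_return = 0.30                         # hypothetical; substitute your own
observed = calmar(annual_return, 0.264)      # from the single observed sequence
conservative = calmar(annual_return, 0.494)  # from the P95 envelope bound
```

Here the conservative Calmar is roughly half the observed one, purely from swapping the drawdown denominator; the wider the envelope, the larger this gap.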
Envelope + Behaviour
The Behaviour card shows win/loss streaks. Long losing streaks are the primary driver of sequencing risk — they create deep drawdowns when clustered at the start of trading. If the behaviour analysis shows a max losing streak of 9 (as in this strategy), the Monte Carlo distribution will naturally be wide because clustering those 9 losses together produces very different equity curves than spreading them evenly.
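The clustering effect can be demonstrated directly: the same trades produce a much deeper drawdown when the 9 losses arrive consecutively than when they are interleaved with wins. A sketch with hypothetical trade sizes:

```python
def max_drawdown(returns):
    """Worst peak-to-trough decline of a compounded equity curve, as a fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst

# Hypothetical trade set: 40 wins of +1%, 9 losses of -2% (a 9-loss streak).
wins, losses = [0.01] * 40, [-0.02] * 9

clustered = losses + wins                    # all 9 losses up front
interleaved = [v for pair in zip(wins[:9], losses) for v in pair] + wins[9:]

# Same multiset of trades, very different equity paths:
assert max_drawdown(clustered) > max_drawdown(interleaved)
```

The clustered ordering loses 1 - 0.98^9, about 16.6%, before any win arrives; interleaving the same losses roughly halves the drawdown. This is why a long max losing streak widens the envelope.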
The Re-run Button
The top-right Re-run button regenerates all 10,000 simulations with fresh random seeds. Because Monte Carlo is stochastic, each run produces slightly different percentiles and bounds. If you run it three times and get materially different results each time, the distribution is unstable: usually because the trade set is very small, and only rarely because 10,000 simulations is insufficient. Consistent results across runs confirm the analysis is reliable.
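The same stability check can be scripted: run the resampling with several seeds and compare the P95 bounds. This sketch uses a hypothetical `simulate_p95` helper and hypothetical trade sizes, not the product's internals:

```python
import random

def simulate_p95(trades, n_sims, seed):
    """One full envelope run: shuffle the trades n_sims times, return the P95 max drawdown."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_sims):
        sample = trades[:]
        rng.shuffle(sample)
        equity, peak, dd = 1.0, 1.0, 0.0
        for r in sample:
            equity *= 1.0 + r
            peak = max(peak, equity)
            dd = max(dd, (peak - equity) / peak)
        draws.append(dd)
    draws.sort()
    return draws[int(0.95 * len(draws))]

trades = [0.015] * 50 + [-0.02] * 25     # hypothetical trade set
runs = [simulate_p95(trades, 5_000, seed) for seed in (1, 2, 3)]
spread = max(runs) - min(runs)           # a material spread flags an unstable envelope
```

With a reasonably sized trade set the spread across seeds should be a fraction of a percentage point; if it is not, more simulations (or more trades) are needed before the percentiles can be trusted.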