What Metrics Best Compare Risk-Adjusted Returns of Two Trading Strategies

Comparing two trading strategies is not as simple as checking which one made more money. Raw returns conceal instability, leverage, drawdown depth, execution friction, and sensitivity to volatility. A strategy that shows lower returns on paper may be far superior once risk is factored in. This is why professional traders generally compare strategies through risk-adjusted metrics rather than absolute P&L.

If you evaluate two strategies without normalising for risk, sample size, and volatility, you are effectively comparing outcomes, not behaviour. The goal is to judge which system delivers more reliable and more efficient returns per unit of risk.

Many retail traders use Stratzy at the idea-generation stage because it gives structured strategy frameworks and directional behaviours. But once you convert a strategy into rules and run full backtests, risk-adjusted metrics become the deciding factor.

What follows is a detailed list of the most dependable metrics used to compare two strategies fairly, including how each one works, what it actually measures, and how to interpret the results.

Sharpe Ratio: Baseline Risk-Adjusted Comparison

Sharpe Ratio standardises return relative to volatility.
It answers one key question: how much return does the strategy produce per unit of total volatility?

Formula
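(Strategy Return – Risk-Free Rate) / Standard Deviation of Returns
Measure all three inputs over the same period (for example, all annualised), and use the same period and data frequency for both strategies.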
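The formula above can be sketched in a few lines. This is a minimal illustration, assuming a list of per-period (here daily) returns and a per-period risk-free rate; the function name `sharpe_ratio`, the 252-period annualisation factor, and the sample returns are assumptions made for the example, not figures from any real backtest.

```python
import math
import statistics

def sharpe_ratio(returns, risk_free_per_period=0.0, periods_per_year=252):
    """Annualised Sharpe ratio from per-period returns (illustrative sketch)."""
    excess = [r - risk_free_per_period for r in returns]
    stdev = statistics.stdev(excess)  # sample standard deviation
    if stdev == 0:
        raise ValueError("returns have zero volatility; Sharpe is undefined")
    # Scale by sqrt(periods_per_year) to annualise (assumes independent periods)
    return statistics.mean(excess) / stdev * math.sqrt(periods_per_year)

# Hypothetical daily returns: both strategies have the same average return,
# but strategy B swings much more widely around it.
strategy_a = [0.010, -0.004, 0.006, -0.002, 0.008, -0.005, 0.003]
strategy_b = [0.030, -0.025, 0.020, -0.018, 0.028, -0.022, 0.003]

sharpe_a = sharpe_ratio(strategy_a)
sharpe_b = sharpe_ratio(strategy_b)
print(f"Sharpe A: {sharpe_a:.2f}, Sharpe B: {sharpe_b:.2f}")
```

With identical average returns, the calmer strategy A scores the higher Sharpe. A seven-day sample is far too short for a real comparison and is used here only to keep the example readable.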

Interpretation

  • Sharpe < 1.0 → Too much noise for the return generated

  • Sharpe 1.0–1.5 → Acceptable for most retail systems

  • Sharpe 1.5–2.0 → Strong

  • Sharpe > 2.0 → Very rare in live markets

Sharpe works well for comparing two strategies when both have roughly normal return distributions. But it penalises upside and downside volatility equally, which misstates risk for options and trend strategies whose returns are skewed.

Sortino Ratio: Downside Risk Only

Sortino improves on Sharpe by counting only downside volatility as risk.

This makes it more reliable when comparing strategies that:

  • Have occasional large winners

  • Show non-normal return distributions

  • Sell or buy options

  • Use breakout or momentum-based entries

If Strategy A has a higher Sharpe but Strategy B has a higher Sortino, B is usually the safer option because more of its volatility comes from gains than from adverse swings.

Rule of thumb: Sortino > 1.5 is considered healthy.
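One common way to compute Sortino is sketched below, under stated assumptions: downside deviation is taken as the root mean square of returns below a target (zero here), measured over all periods. The function name `sortino_ratio`, the target, the annualisation factor, and the sample data are illustrative choices; other conventions for the downside denominator exist.

```python
import math

def sortino_ratio(returns, target_per_period=0.0, periods_per_year=252):
    """Annualised Sortino ratio: mean excess return over downside deviation."""
    excess = [r - target_per_period for r in returns]
    # Only shortfalls below the target count as risk; gains contribute zero
    downside_sq = [min(0.0, e) ** 2 for e in excess]
    downside_dev = math.sqrt(sum(downside_sq) / len(excess))
    if downside_dev == 0:
        raise ValueError("no downside observations; Sortino is undefined")
    mean_excess = sum(excess) / len(excess)
    return mean_excess / downside_dev * math.sqrt(periods_per_year)

# Hypothetical daily returns with the same average:
# 'steady' has regular moderate losses, 'lumpy' has small losses and one large winner.
steady = [0.004, -0.006, 0.005, -0.007, 0.006, -0.005, 0.010]
lumpy = [-0.002, -0.001, -0.002, 0.016, -0.002, -0.001, -0.001]

sortino_steady = sortino_ratio(steady)
sortino_lumpy = sortino_ratio(lumpy)
print(f"Sortino steady: {sortino_steady:.2f}, lumpy: {sortino_lumpy:.2f}")
```

The lumpy series scores higher because its volatility sits almost entirely on the upside, which is exactly the case where Sharpe and Sortino tend to disagree.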

Maximum Drawdown (MDD): Psychological and Structural Risk

Drawdown is a non-negotiable metric during comparison. Two strategies with identical returns can be worlds apart in terms of how painful they are to trade.

What MDD Measures
Largest peak-to-trough fall in equity during the test period.

Why It Matters

  • A strategy with 40 percent MDD and high returns is hard to sustain: a 40 percent loss needs a gain of about 67 percent just to recover the previous peak.

  • A strategy with 15–20 percent MDD and moderate returns is often more scalable.

Drawdown determines whether a strategy is tradeable in the real world, not just on paper.
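Maximum drawdown can be computed in a single pass over the equity curve by tracking the running peak. The sketch below assumes a list of strictly positive equity values; the function name `max_drawdown` and the sample curve are hypothetical.

```python
def max_drawdown(equity_curve):
    """Largest peak-to-trough decline, as a positive fraction of the peak."""
    peak = equity_curve[0]
    worst = 0.0
    for value in equity_curve:
        peak = max(peak, value)                    # highest equity seen so far
        worst = max(worst, (peak - value) / peak)  # current drop from that peak
    return worst

# Hypothetical equity curve: falls from 120 to 90, later from 130 to 104
equity = [100, 120, 90, 110, 130, 104, 140]
mdd = max_drawdown(equity)
print(f"Max drawdown: {mdd:.0%}")
```

Here the deepest fall is from 120 to 90, a 25 percent drawdown, even though the curve ends at a new high.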

Calmar Ratio: Return Efficiency vs Drawdown

Calmar Ratio ties return directly to drawdown, making it one of the most objective ways to compare two systems.

Formula
CAGR / Maximum Drawdown

Why It’s Useful

  • Compares long-term return against worst-case historical loss

  • The higher the ratio, the more efficient the strategy

  • Helps decide which strategy scales better with capital

If Strategy A has higher CAGR but lower Calmar than Strategy B, B is typically the stronger long-term candidate.
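The comparison above can be made concrete with a short sketch. It assumes CAGR and maximum drawdown are both expressed as positive fractions over the same test period; the helper names `cagr` and `calmar_ratio` and the figures for strategies A and B are invented for illustration.

```python
def cagr(start_equity, end_equity, years):
    """Compound annual growth rate as a fraction (0.20 means 20 percent)."""
    return (end_equity / start_equity) ** (1.0 / years) - 1.0

def calmar_ratio(cagr_value, max_drawdown):
    """CAGR divided by maximum drawdown (drawdown given as a positive fraction)."""
    if max_drawdown <= 0:
        raise ValueError("max_drawdown must be a positive fraction")
    return cagr_value / max_drawdown

# Hypothetical results: A grows faster but with a much deeper drawdown
calmar_a = calmar_ratio(0.30, 0.40)  # 30% CAGR, 40% max drawdown
calmar_b = calmar_ratio(0.20, 0.16)  # 20% CAGR, 16% max drawdown
print(f"Calmar A: {calmar_a:.2f}, Calmar B: {calmar_b:.2f}")
```

In this made-up case A has the higher CAGR, but B earns more return per unit of worst-case loss (about 1.25 against about 0.75).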

Profit Factor: How Efficiently the Strategy Converts Trades into Net Profit

Profit Factor (PF) is the gross profit of all winning trades divided by the gross loss of all losing trades.

Interpretation

  • PF < 1 → Losing system

  • PF 1.1–1.3 → Borderline

  • PF 1.5 → Decent

  • PF 2+ → Strong, depending on trade frequency

Profit Factor becomes especially important when comparing systems with very different win rates.

Example:

  • Strategy A wins 80 percent of the time but PF is 1.1 → weak structural edge

  • Strategy B wins 35 percent of the time but PF is 2.1 → significantly more robust
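The example above can be reproduced with hypothetical trade lists. This sketch assumes each entry is the net profit or loss of one closed trade; the function names and the trade sizes are made up so that the win rates and profit factors match the figures quoted.

```python
def profit_factor(trade_pnls):
    """Gross profit of winners divided by gross loss of losers."""
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    if gross_loss == 0:
        return float("inf")  # no losing trades in the sample
    return gross_profit / gross_loss

def win_rate(trade_pnls):
    """Fraction of trades that closed with a profit."""
    return sum(1 for p in trade_pnls if p > 0) / len(trade_pnls)

# Hypothetical trades: A wins often but small, B wins rarely but large
strategy_a_trades = [110] * 8 + [-400] * 2    # 80% win rate
strategy_b_trades = [390] * 7 + [-100] * 13   # 35% win rate

pf_a = profit_factor(strategy_a_trades)
pf_b = profit_factor(strategy_b_trades)
print(f"A: win rate {win_rate(strategy_a_trades):.0%}, PF {pf_a:.1f}")
print(f"B: win rate {win_rate(strategy_b_trades):.0%}, PF {pf_b:.1f}")
```

Strategy A's two large losses nearly cancel its eight wins, which is why a high win rate alone says little about the edge.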

Average Trade (Expectancy): The Strategy’s Core Edge

Average Trade is the mean profit or loss per trade after costs.

Expectancy answers a basic quantitative question:
Does the strategy have a repeatable edge after accounting for slippage and execution friction?

This is crucial when comparing two systems that produce similar Sharpe or CAGR numbers.
A strategy with low expectancy is fragile and usually deteriorates in live conditions.
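A quick way to see this fragility is to compute the average trade before and after costs. The sketch assumes a flat, hypothetical cost per trade covering slippage and fees; the function name `expectancy` and all figures are illustrative.

```python
def expectancy(trade_pnls, cost_per_trade=0.0):
    """Mean profit or loss per trade after subtracting a flat per-trade cost."""
    return sum(p - cost_per_trade for p in trade_pnls) / len(trade_pnls)

# Hypothetical gross P&L per trade, before slippage and fees
trades = [120, -80, 60, -90, 50]

gross_edge = expectancy(trades)                   # average trade before costs
net_edge = expectancy(trades, cost_per_trade=15)  # assumed 15 per trade in friction
print(f"Gross expectancy: {gross_edge:.1f}, net expectancy: {net_edge:.1f}")
```

An average trade of 12 before costs turns negative once 15 per trade of friction is deducted, so this strategy has no edge left after realistic execution.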

Volatility of Returns: Stability Over Time

To compare two strategies fairly, you must check their return variance.

Lower volatility for the same return → better strategy.

High-volatility systems can post inflated CAGR, and sometimes inflated Sharpe, during favourable regimes and then collapse outside them.

This metric helps you spot strategies that depend heavily on specific market environments.
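One simple way to look for regime dependence is a rolling standard deviation of returns. The sketch below is an assumption about how you might check this, not a prescribed method; the function name `rolling_volatility`, the window length, and the sample returns are hypothetical.

```python
import statistics

def rolling_volatility(returns, window):
    """Sample standard deviation of returns over each trailing window."""
    return [
        statistics.stdev(returns[i - window:i])
        for i in range(window, len(returns) + 1)
    ]

# Hypothetical daily returns: a calm stretch followed by a turbulent one
returns = [0.001, -0.001, 0.002, -0.002, 0.001,
           0.020, -0.030, 0.025, -0.020, 0.030]

vols = rolling_volatility(returns, window=5)
print([round(v, 4) for v in vols])
```

If the rolling figure jumps sharply between sub-periods, as it does here, a single full-sample volatility number hides how differently the strategy behaves across regimes.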

Monthly and Yearly Return Consistency

Two strategies with similar CAGR can behave completely differently month-by-month.

Look for:

  • Fewer loss clusters
  • Stable contribution across years
  • No dependency on one or two unusually strong periods

Consistency is often a better predictor of long-term survival than high annual returns.
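The three checks above can be approximated from a list of monthly returns. This is a rough sketch: it sums simple returns rather than compounding them, and the function name `consistency_summary`, its output keys, and both sample series are hypothetical.

```python
def consistency_summary(period_returns):
    """Summarise how evenly returns are spread across periods (e.g. months)."""
    positive = sum(1 for r in period_returns if r > 0)
    longest = current = 0
    for r in period_returns:
        current = current + 1 if r < 0 else 0  # count consecutive losing periods
        longest = max(longest, current)
    total = sum(period_returns)  # simple sum; ignores compounding
    best_share = max(period_returns) / total if total > 0 else None
    return {
        "positive_fraction": positive / len(period_returns),
        "longest_losing_streak": longest,
        "best_period_share": best_share,  # share of total from the best period
    }

# Hypothetical monthly returns with the same summed return (about 11 percent)
monthly_a = [0.02, 0.01, -0.01, 0.02, 0.01, 0.015,
             -0.005, 0.02, 0.01, 0.01, -0.01, 0.02]
monthly_b = [-0.02, -0.01, -0.015, 0.20, -0.01, -0.005,
             0.01, -0.02, -0.01, 0.0, -0.005, -0.005]

summary_a = consistency_summary(monthly_a)
summary_b = consistency_summary(monthly_b)
print(summary_a)
print(summary_b)
```

Strategy B's best month contributes more than its entire yearly total, meaning the rest of the year lost money; that is the dependency on one strong period the list warns about.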

Putting It All Together: How to Compare Two Strategies Correctly

When evaluating two systems, build the comparison on three layers:

Risk-Adjusted Layer

  • Sharpe
  • Sortino
  • Calmar

Behaviour Layer

  • Max drawdown
  • Monthly return consistency
  • Volatility of returns

Structural Layer

  • Profit Factor
  • Expectancy
  • Win/loss distribution

A strategy that ranks higher across these three layers is usually the more sustainable choice.
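A simple tally can make this ranking explicit. The sketch below is one possible approach, assuming equal weight for every metric and covering only the numeric metrics from the three layers; the dictionary `HIGHER_IS_BETTER`, the function `tally`, and all metric values are hypothetical.

```python
# Metric name -> True if a higher value is better, False if lower is better
HIGHER_IS_BETTER = {
    "sharpe": True, "sortino": True, "calmar": True,   # risk-adjusted layer
    "max_drawdown": False, "volatility": False,        # behaviour layer
    "profit_factor": True, "expectancy": True,         # structural layer
}

def tally(metrics_a, metrics_b):
    """Count how many metrics each strategy wins (equal weights assumed)."""
    wins = {"A": 0, "B": 0, "tie": 0}
    for name, higher_better in HIGHER_IS_BETTER.items():
        a, b = metrics_a[name], metrics_b[name]
        if a == b:
            wins["tie"] += 1
        elif (a > b) == higher_better:
            wins["A"] += 1
        else:
            wins["B"] += 1
    return wins

# Hypothetical backtest results for two strategies
metrics_a = {"sharpe": 1.6, "sortino": 1.7, "calmar": 0.75, "max_drawdown": 0.40,
             "volatility": 0.28, "profit_factor": 1.1, "expectancy": 4.0}
metrics_b = {"sharpe": 1.3, "sortino": 2.1, "calmar": 1.25, "max_drawdown": 0.16,
             "volatility": 0.15, "profit_factor": 2.1, "expectancy": 11.0}

result = tally(metrics_a, metrics_b)
print(result)
```

Equal weighting is a deliberate simplification; in practice you might weight drawdown or expectancy more heavily, and review monthly consistency and the win/loss distribution separately since they do not reduce to a single number.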

Where Stratzy Helps Before You Compare Metrics

Stratzy offers simple, actionable frameworks you can study before automating your own version. It creates a clean pathway from manual ideas to algo trading execution.