Q: Many CTAs are continually testing and searching for new ways to improve their trading or for new trading approaches. Much of this research is based on back testing. Is back testing overrated? How much data are necessary in order to have confidence in the results?
Dennis: Back testing is essential. The key question is what time periods should be tested. You can take the point of view that you should use all available data. How would you know that 19th century wheat prices are less relevant than today's wheat prices?
I believe that was the right answer in 1983. Today, I find it hard to be agnostic on the questions of whether markets have changed and how they may have changed. The trends of the 1970s occurred in the absence of computer-generated, trend-following algorithms. The markets of the last ten years are distorted by the onslaught of the technical trader.
As a result, I back test only the last ten years. In the unlikely event markets are as good and undistorted as they were in the good old days, I'll be happy to make less than I might if I had used that early data in an optimization. The trade-off is that if markets continue their perversity, I'm way more likely to have captured a sound way to handle these more difficult markets because I'm fitted only to them.
Eckhardt: I know of no way to validate conjectures concerning technical trading without back testing; however, this procedure is fraught with peril--we all know horror stories. Having adequate amounts of data for reliable inference is only one of many problems facing the technical analyst, but it is as crucial as any. Statisticians tend to consider anything more than about 30 instances a large sample. For futures price research this is a recipe for disaster. The underlying probability distributions in this subject are so exotic and pathological that the subtle techniques statisticians use to squeeze significance out of sparse data are decidedly out of place.
To make even moderately reliable judgments about a kind of trade, you need something like 300 instances. This is a minimum figure. I don't feel comfortable acting on research results unless I have several thousand instances.
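Eckhardt's numbers can be illustrated with a small simulation. Everything below is a hypothetical sketch, not anything from the interview: it assumes a fat-tailed trade distribution (frequent small wins, rare large losses) and measures how much the estimated mean P&L swings across repeated back tests of a given size.

```python
import random
import statistics

random.seed(7)

def trade_pnl():
    # Hypothetical fat-tailed trade distribution: frequent small wins
    # punctuated by rare, large losses (an assumption for illustration).
    if random.random() < 0.95:
        return random.gauss(1.0, 0.5)    # typical small win
    return random.gauss(-15.0, 5.0)      # rare large loss

def spread_of_estimates(sample_size, trials=400):
    # How much the estimated mean P&L varies across repeated back tests
    # of a given size -- a proxy for how misleading one back test can be.
    means = [statistics.mean(trade_pnl() for _ in range(sample_size))
             for _ in range(trials)]
    return statistics.stdev(means)

for n in (30, 300, 3000):
    print(f"{n:>5} trades: estimate spread {spread_of_estimates(n):.3f}")
```

With 30 trades the spread of estimates dwarfs any plausible edge; only in the thousands does it settle down, which is the gist of his minimum figures.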
Q: When you back test particular trading strategies, do you attempt to optimize? If so, how many parameters can you comfortably optimize before falling prey to curve-fitting?
Dennis: There is no escaping a priori decision making in research. There may be no absolutes, but some ideas come close. For example, it would be very hard to justify favoring long positions over short. And no amount of data will validate trading certain markets larger than others (liquidity considerations aside). Research that starts with concepts is much more likely to avoid curve-fitting than blind number-crunching.
Eckhardt: I prefer the term "overfitting" to "curve-fitting". It makes clear that it is also possible to underfit the data. Those CTAs who boast that they never optimize are doing precisely that--grossly underfitting. The topic of fitting raises profound theoretical and practical questions, but it boils down to this: you want to fit to reproducible features, not accidental ones.
To derive an estimate of how much overall good versus bad fitting your optimization labors have produced, the technique statisticians call cross-validation is quite helpful. Of course, this will not tell you where good or bad features originate or how to alter the mixture favorably. For this, it is crucial to assess the quality of degrees of freedom, not only their sheer number. A degree of freedom that has uniformly graduated significance over a manifold of possibilities is better than one that is quirky or that vacillates in meaning for slightly dissimilar cases. It is also important how selective the influence of a degree of freedom is.
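The cross-validation Eckhardt mentions can be sketched as follows. Everything here--the random-walk prices, the moving-average rule, the parameter grid--is a hypothetical illustration, not his method: a parameter is chosen on one segment of data and then scored on a held-out segment.

```python
import random

random.seed(1)

# Synthetic price path: a random walk with slight drift stands in for
# real futures prices (an assumption for illustration).
prices = [100.0]
for _ in range(2000):
    prices.append(prices[-1] + random.gauss(0.02, 1.0))

def rule_pnl(series, lookback):
    # Hypothetical trend rule: long when price is above its lookback-bar
    # average, short otherwise; score is the sum of next-bar P&L.
    total = 0.0
    for t in range(lookback, len(series) - 1):
        avg = sum(series[t - lookback:t]) / lookback
        position = 1 if series[t] > avg else -1
        total += position * (series[t + 1] - series[t])
    return total

# Cross-validation: choose the lookback on the first half of the data,
# then measure the same rule on the untouched second half.
half = len(prices) // 2
train, test = prices[:half], prices[half:]
best = max(range(5, 100, 5), key=lambda lb: rule_pnl(train, lb))
print("chosen lookback:", best)
print("in-sample P&L:", round(rule_pnl(train, best), 1))
print("out-of-sample P&L:", round(rule_pnl(test, best), 1))
```

The gap between the in-sample and out-of-sample figures is the estimate of how much overall good versus bad fitting the optimization produced.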
A preset profit objective, for instance, is a much more suspect degree of freedom than, say, a lookback window. The latter presumably impinges on every trade, whereas the influence of the former tends to be concentrated on a few highly profitable scenarios.
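One crude way to measure how selective a degree of freedom is: perturb it and count how many trade outcomes actually change. The breakout rule and every number below are hypothetical, chosen only to make the comparison concrete.

```python
import random

random.seed(3)

# Synthetic upward-drifting price path (an assumption for illustration).
prices = [100.0]
for _ in range(1500):
    prices.append(prices[-1] + random.gauss(0.03, 1.0))

def trade_results(series, lookback, profit_target):
    # Hypothetical rule: buy a breakout above the lookback-bar high, then
    # exit at the profit target or after 20 bars, whichever comes first.
    results = []
    t = lookback
    while t < len(series) - 21:
        if series[t] > max(series[t - lookback:t]):
            entry = series[t]
            exit_px = series[t + 20]
            for k in range(1, 21):
                if series[t + k] - entry >= profit_target:
                    exit_px = series[t + k]
                    break
            results.append(exit_px - entry)
            t += 20
        else:
            t += 1
    return results

# Entries depend only on the lookback, so bumping the profit target leaves
# the trade list aligned; count the outcomes that actually differ.
base = trade_results(prices, 50, 3.0)
bumped = trade_results(prices, 50, 3.5)
changed = sum(a != b for a, b in zip(base, bumped))
print(f"{changed} of {len(base)} trade outcomes changed")
```

Bumping the lookback instead would shift nearly every entry, which is the point above: the lookback's influence is diffuse, while the profit objective's is concentrated on the subset of trades that reach it.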
The philosophy of science teaches that all observation is theory-laden; there is simply no way to analyze data in a theoretically neutral manner. In fitting to historical data, theoretically unsound procedures can lead to radically invalid conclusions. This is probably why back testing has developed such a bad reputation.