April 06, 2007

Egor Kraev: 6) How Not to Estimate Standard Deviations

To finish off this series, let me cover a very common mistake in estimating standard deviation. Say you’ve got a daily series, going back 1.5 years (say 382 points), and you want to compute its one-year (252-day) rolling moving average. That gives you 130 points of rolling moving average, each representing the average of one year prior to a given date. Now, how do you estimate the standard deviation of the last point in that series?

   Did you think ‘take the std of the 130 points’? If so, congratulations - you just made the mistake I’m talking about. The above approach would work if the errors in the 130 points were independent - but of course in reality they are heavily correlated. They are, after all, averages over largely overlapping periods. Therefore, the std of the 130 points will be much lower than the std of the last point.

   What should you do instead? Why, just treat that last point (and each other point in the rolling series) as a point estimate, and apply the methods I discussed earlier.

   Let’s illustrate this with the following MATLAB simulation. We generate 382 points with mean 2.0 and std of 1.0, compute the rolling averages, and repeat that simulation 1000 times. My previous posts suggest the std of the final point should be $1/\sqrt{252} \approx 0.063$.

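nSim = 1000;                       % number of simulated series
final = zeros(1, nSim);            % last rolling-average point of each series
rollstd = zeros(1, nSim);          % std across the 130 rolling points of each series
for n = 1:nSim
    x = 2 + randn([1 382]);        % generate iid numbers with mean=2 and std=1
    y = cumsum(x) / 252;
    y = y(253:end) - y(1:130);     % 130 rolling 252-day averages
    final(n) = y(end);
    rollstd(n) = std(y);
end
std(final)     % the 'true' standard deviation of the final point
mean(rollstd)  % the average std of the 130 rolling points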

The simulation gives the std of the final point of 0.067, and the average std of the rolling series of 0.024. So using the rolling series can lead us to underestimate our margin of error by almost a factor of 3!
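
   In practice you have only one series rather than 1000 simulated ones. Here is a minimal sketch of the point-estimate approach on a single series, assuming independent daily points (the simulated data and the names `naiveStd` and `pointStd` are illustrative, not part of the simulation above):

```matlab
x = 2 + randn(1, 382);            % hypothetical daily series: iid, mean 2, std 1
w = 252;                          % rolling window length (one year of days)

c = cumsum(x) / w;
y = c(w+1:end) - c(1:end-w);      % the 130 rolling averages
naiveStd = std(y);                % the mistake: std across overlapping averages

window = x(end-w+1:end);          % the 252 points behind the last rolling average
pointStd = std(window) / sqrt(w); % std of a mean of w independent points

fprintf('naive: %.3f, point estimate: %.3f\n', naiveStd, pointStd);
```

The second number should land near $1/\sqrt{252}$ for data like these, while the first depends heavily on the particular path and is typically much smaller.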

   This is the last post in this series. The next series will be about MATLAB - why I love it, why I hate it, and what I miss most in it.

Send in your comments please!

Egor

April 04, 2007

Egor Kraev: 5) Confidence Intervals of Almost Anything at All

In my posts to date, I’ve covered a variety of simple confidence interval/standard deviation estimates for particular estimators. Now, I’ll present a more general, if slower, method that should work with virtually any estimator you want.

   Suppose you have a sample of N points (each point can be a tuple of real numbers) from some population, and you want to estimate a population parameter Θ. For that purpose, you have a function F that takes k points and returns an estimate $\hat{\Theta}$. Suppose this function is a black box; that is, all you know about it is that you can feed k points into it and get an estimate back. How do you compute the confidence interval for such an estimate?

   If you had $M \cdot k$ distinct points from your population, with M ‘very large’, you could split them into M groups of k, feed each group to F, and thus get M independent estimates of Θ. The appropriate percentiles of that group of estimates would give you the confidence interval, and you could use the standard deviation of that group of estimates as a proxy for the std of your estimator.

   However, in reality one seldom has access to huge amounts of independent samples - what do you do then? You bootstrap. That is, you choose a ‘very large’ M and draw $M \cdot k$ random points from your sample of size N, with repetition (that is, drawing a particular point does not exclude that point from subsequent draws). Then proceed as above. Simple? Yes. Crude? Definitely. Does it make sense? I think so. After all, all you know about your population are those N samples you have - so using them as a proxy for your population seems to me both natural and meaningful.
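
   The procedure can be sketched in a few lines of MATLAB. Everything specific here is an assumption made for illustration: the simulated sample, the choice of the median as the black-box estimator F, and the values of k and M.

```matlab
% Hypothetical setup: a sample of N points and a black-box estimator F
% that takes k points and returns one number (the median is a stand-in).
sample = randn(1, 500);           % N = 500 observed points (simulated here)
F = @(pts) median(pts);           % assumed estimator; replace with your own
k = 100;                          % number of points F is fed at a time
M = 2000;                         % number of bootstrap replications

N = numel(sample);
est = zeros(1, M);
for m = 1:M
    idx = randi(N, 1, k);         % draw k indices with repetition
    est(m) = F(sample(idx));      % one bootstrap estimate
end

sorted = sort(est);
lo = sorted(max(1, round(0.025 * M)));  % approximate 2.5th percentile
hi = sorted(round(0.975 * M));          % approximate 97.5th percentile
fprintf('std = %.3f, 95%% interval = [%.3f, %.3f]\n', std(est), lo, hi);
```

With k equal to N this is the ordinary bootstrap of the full-sample estimate. The percentiles are read off the sorted estimates by index so that the sketch does not depend on any toolbox function.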

   If you have other (prior) information about the distribution of Θ, you can then integrate it with the confidence intervals you just obtained using Bayesian tricks (if you don’t know what these are, drop me a line and I’ll do a post about that).

   In the next, final post of this series: how not to estimate standard deviations.

Egor

March 15, 2007

Egor Kraev: 4) Confidence intervals for Correlation

   

Let’s now look at confidence intervals for correlation. The honest way of doing it can be found in Wikipedia under ‘Fisher transform’. Basically, the Fisher transform

$$F(\rho) = \frac{1}{2}\ln\frac{1+\rho}{1-\rho}$$

is a way of transforming the [-1,1] interval onto the real line, such that the standard deviation of $F(\rho)$ is approximately $1/\sqrt{N-3}$ if you have used N pairs of points to compute the correlation. You get the confidence interval for the transform, transform it back, and there you are. Not too hard, but still too fiddly for my taste.

A simpler way is to notice that $\partial_\rho F(\rho) = \frac{1}{1-\rho^2}$, and thus, linearizing the inverse Fisher transform, we get the pleasingly simple formula

$$\mathrm{std}(\rho) \approx \frac{1-\rho^2}{\sqrt{N-3}}$$

So for a correlation of 50% computed using a year’s worth of days (N = 252), the standard deviation is about $0.75/\sqrt{249} \approx 0.048$, so a 95% confidence interval is about 19 percentage points wide.

Be careful for correlations very close to ±1 though, especially for small samples - the distribution there is so skewed that you might be better off generating asymmetric confidence intervals with the exact Fisher transform.
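
To see how close the two routes are, here is a small sketch comparing the interval from the exact Fisher transform with the linearized one. It assumes the standard approximation that the transformed value has std $1/\sqrt{N-3}$ and uses ±2 standard deviations for the 95% interval; the variable names are illustrative.

```matlab
rho = 0.5;                        % estimated correlation (example value)
N = 252;                          % number of pairs used (one year of days)

% Exact route: build the interval in Fisher space, then transform back.
z = atanh(rho);                   % Fisher transform, 0.5*log((1+rho)/(1-rho))
sz = 1 / sqrt(N - 3);             % approximate std of the transformed value
ciExact = tanh([z - 2*sz, z + 2*sz]);

% Linearized route: std(rho) is roughly (1 - rho^2) times std(F(rho)).
sRho = (1 - rho^2) * sz;
ciLinear = [rho - 2*sRho, rho + 2*sRho];

fprintf('exact:  [%.3f, %.3f]\n', ciExact(1), ciExact(2));
fprintf('linear: [%.3f, %.3f]\n', ciLinear(1), ciLinear(2));
```

Rerunning this with rho near ±1 or with a small N makes the exact interval visibly asymmetric, which is where the linearized version stops being trustworthy.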

By the way, all this should work for any correlation estimate, be it Pearson, Spearman, or Kendall (ask Wikipedia if you don’t know what some of these are).

Send me more comments and questions!

Egor


March 09, 2007

Egor Kraev: 3) More fun with confidence intervals

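So, how do we take a standard deviation of a standard deviation? Two formulas are helpful here, namely the results of applying a function before and after averaging. Firstly, for any function f, the mean value of f over the sample is $\overline{f(x)} = \frac{1}{N}\sum_{n=1}^{N} f(x_n)$, and the standard deviation of f over the sample is

$$\mathrm{std}(f(x)) = \sqrt{\frac{1}{N}\sum_{n=1}^{N} f(x_n)^2 - \overline{f(x)}^2}$$

As with any average of N independent points, the standard deviation of the mean $\overline{f(x)}$ itself is this divided by $\sqrt{N}$. Apply that to $f(x) = (x - \bar{x})^2$, whose mean is var(x), and you get

$$\mathrm{std}(\mathrm{var}(x)) = \frac{1}{\sqrt{N}}\sqrt{\frac{1}{N}\sum_{n=1}^{N} (x_n - \bar{x})^4 - \mathrm{var}(x)^2}$$

   Secondly, for any reasonably smooth function F, $\mathrm{std}(F(x)) \approx |\partial_x F(x)| \cdot \mathrm{std}(x)$.

   Applying that to $F(x) = \sqrt{x}$ and using $\mathrm{std}(x) = \sqrt{\mathrm{var}(x)}$ we get

$$\mathrm{std}(\mathrm{std}(x)) = \frac{\mathrm{std}(\mathrm{var}(x))}{2 \cdot \mathrm{std}(x)}$$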
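
   A minimal sketch of this distribution-free estimate on one sample. The simulated returns and the variable names are assumptions for illustration, and the fourth-moment term is divided by N under the square root so that the result is the std of the variance estimate rather than of a single squared deviation.

```matlab
x = 0.01 * randn(1, 1000);        % hypothetical daily log returns (simulated)
N = numel(x);
xbar = mean(x);
v = mean((x - xbar).^2);          % var(x) with the 1/N convention used above
m4 = mean((x - xbar).^4);         % fourth central moment of the sample

stdVar = sqrt((m4 - v^2) / N);    % std of the variance estimate
stdStd = stdVar / (2 * sqrt(v));  % std of the std estimate

fprintf('std(x) = %.5f +/- %.5f\n', sqrt(v), stdStd);
```

Note that MATLAB's own `std` and `var` normalize by N-1 by default, so they will differ slightly from the 1/N versions computed here.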

 

   Piece of cake, eh? But if it’s still too much hassle, and you are ready to pretend that the residual distribution is normal, you can always just use

$$\mathrm{std}(\mathrm{std}(x)) \approx \frac{\mathrm{std}(x)}{\sqrt{2N}}$$

   So, say, on a 30-vol stock, if you use one month’s (22 days’) worth of daily returns, the standard deviation of the realized vol estimate is $30/\sqrt{44} \approx 4.5$ vol points, thus the 95% confidence interval is 18 vol points wide. Something to be aware of, no?
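
   If you would rather check the rule of thumb than trust it, a quick simulation will do. This sketch assumes normally distributed returns and works directly in vol points; the number of simulated months is an arbitrary choice.

```matlab
N = 22;                           % observations per estimate (one month of days)
sigma = 30;                       % true volatility in vol points (example value)
nSim = 5000;                      % number of simulated months (arbitrary choice)

s = zeros(1, nSim);
for i = 1:nSim
    r = sigma * randn(1, N);      % normal returns, scaled to vol points
    s(i) = std(r);                % realized vol estimate for this month
end

fprintf('simulated std of std: %.2f\n', std(s));
fprintf('rule of thumb:        %.2f\n', sigma / sqrt(2 * N));
```

For a sample this small the two numbers agree only roughly, which is about as much as one should ask of a rule of thumb.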

   Posts forthcoming in this series: std of correlation, and how to compute std of absolutely anything.

Egor

March 05, 2007

Egor Kraev: 2) Confidence Intervals

One of the most basic points of estimation is that an estimate is quite meaningless without some kind of confidence interval around it. If I don’t know what the confidence interval is, how can I rely on the number? Is the 1.205 really [1.204, 1.206], or [0.8, 1.5]? If the parameter estimated on last year’s data is 1.2 and on this year’s data 0.9, has it changed or is it just measurement error? I can’t know what I’m doing unless I can answer questions like that - and yet there is a surprising amount of ignorance around regarding confidence intervals. A recent interviewee (and not a bad one at that), when asked “does it even make sense to speak of a confidence interval for a correlation value”, could only suggest [-1, 1].

   So how do you estimate confidence intervals? To a first approximation, which is usually good enough, it’s ±2 standard deviations around the value (unless your distribution is really skewed or has unusual tails). Let’s look at a couple of cases I’ve often needed.

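   The very basic one is the standard deviation of an average. Say you’ve got N points $x_1, \dots, x_N$. Then the sample mean is $\bar{x} = \frac{1}{N}\sum_{n=1}^{N} x_n$ and the standard deviation of the sample is

$$\mathrm{std}(x) = \sqrt{\left(\frac{1}{N}\sum_{n=1}^{N} x_n^2\right) - \bar{x}^2}$$

If the points are independent, the standard deviation of the mean itself is $\mathrm{std}(\bar{x}) = \mathrm{std}(x)/\sqrt{N}$.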
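
   As a small sketch of the resulting confidence interval for a mean (the simulated sample and the variable names are illustrative assumptions, and the points are assumed independent):

```matlab
x = 2 + randn(1, 252);                   % hypothetical sample: 252 iid points
N = numel(x);
xbar = mean(x);                          % sample mean
sx = sqrt(mean(x.^2) - xbar^2);          % spread of the points (1/N convention)
seMean = sx / sqrt(N);                   % std of the mean itself
ci = [xbar - 2*seMean, xbar + 2*seMean]; % rough 95% confidence interval
fprintf('mean = %.3f, 95%% CI = [%.3f, %.3f]\n', xbar, ci(1), ci(2));
```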

 

   Elementary? Perhaps. If so, what is the standard deviation of the standard deviation? This is not a strange question: if $x_n$ is a series of log returns, std(x) is an estimate of its volatility – something we do want a confidence interval for. So, any suggestions on how to get it?

February 28, 2007

Egor Kraev: 1) The Moons of Jupiter and the Whale

Virtually any computational finance project at some point involves estimating some parameters from real data. Suppose that you know what data series you need, and that you have the luxury of timeseries going back for decades. How much should you use?

The major problem with a lot of estimation procedures has been summed up beautifully by Prof. Stephen Figlewski, in his treatise “Forecasting Volatility”:

”I asked [a statistician presenting a complex model at a conference] why we should expect prices observed in the market to behave according to the postulated pricing relationship when there were no actual market participants who understood the model or used it to do the trades that would push prices to the correct values. The answer was that he thought of an equilibrium pricing model for a financial market the way an astronomer thinks of the physical laws of motion that apply to a celestial body. Although the moons of Jupiter do not understand why they behave in a particular way, an outside observer who knows the laws of motion they follow can make very accurate predictions about where they will be thousands of years in the future. [...]

This is essentially the way classical statistics models an estimation problem. There is assumed to be some fixed but unknown underlying structure, or ”data generating process,” and the statistician has a set of observations produced by that process from which estimates of its parameters will be deduced. One aspect of this conceptual framework is that estimation and forecasting are very similar to each other. [...] A second feature of this framework is that one might hope to get arbitrarily good parameter estimates if one has a large enough data sample.

I believe the classical statistics framework fundamentally misrepresents the nature of a financial market [...]. Consider a different and more earthly estimation and prediction problem from that of the moons of Jupiter. Suppose we wanted to predict the movements of a whale, based on observing it over a period of time. Being a large animal with a lot of momentum, the movements of a whale must be fairly predictable over the short run simply by extrapolation. Yet we do not think of a whale as following a fixed and immutable pattern the way the moons of Jupiter do, at least not one that we could ever hope to understand completely.

In this case, we are not looking at a fixed structure with constant but unknown parameters, but rather at a system that evolves over time, and perhaps alters its behavior rapidly on occasion. [...] Prediction is possible only because the system usually evolves slowly and therefore our accumulated information from observing it only decays slowly. In this case, there may be an enormous difference between how well a model fits in-sample and how well it can forecast out-of-sample, and classical goodness of fit statistics may give little guidance about the latter. Also, [...] given that the data is generated by a structure that changes in unknown ways over time, expanding the estimation data set by adding observations from the distant past can easily make the estimates of the current state of the system worse rather than better. Finally, given that the structure does not remain constant, there is a great premium on models and estimation procedures that are robust against small changes. The more detailed and elaborate a model is, the better the fit one is generally able to obtain in-sample, but the faster the model tends to go off track when it is taken out-of-sample.[...]

As should be obvious, I believe a financial market is much more like a whale than like the moons of Jupiter.”

In this series of posts, I will be discussing parameter estimation from the above perspective - things that are simple, robust and adaptive.


Egor
