Theory
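Consider a regularly varying cumulative distribution function $F$, so that, for sufficiently large $x$ and some $\gamma > 0$,
$$1 - F(x) = x^{-1/\gamma}\,\ell_F(x),$$
where $\ell_F$ denotes a slowly varying nuisance function that behaves like a constant asymptotically ($\ell_F(\lambda x)/\ell_F(x) \to 1$ as $x \to \infty$, for every $\lambda > 0$). $\gamma$ is called the extreme value index, and the Pareto or tail index $\alpha = 1/\gamma$ is its reciprocal.
The objective is to estimate the parameter $\gamma$.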
The rank-size regression and the Pareto QQ-plot
The rank-size regression estimator of the extreme value index measures the ultimate slope of the Pareto QQ-plot. This follows because the tail quantile function $U(x) = F^{-1}(1 - 1/x)$, $x > 1$, of the above model is
$$U(x) = x^{\gamma}\,\ell_U(x),$$
where $\ell_U$ is another slowly varying function, which implies $\log U(x)/\log x \to \gamma$ as $x \to \infty$. Replacing these population quantities with their empirical counterparts gives the Pareto QQ-plot, and $\gamma$ is its ultimate slope.
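As a quick numerical illustration of the limit $\log U(x)/\log x \to \gamma$, the following pure-Python sketch (standard library only) uses a Lomax, or Pareto type II, law $1 - F(x) = (1 + x)^{-1/\gamma}$. This distribution is an illustrative choice, not one used in the text. Its tail quantile function has the closed form $U(x) = x^{\gamma} - 1$, so $\ell_U(x) = 1 - x^{-\gamma}$. The helper names are hypothetical.

```python
import math

def lomax_tail_quantile(x, gamma):
    """U(x) = F^{-1}(1 - 1/x) for the Lomax law 1 - F(y) = (1 + y)**(-1/gamma), x > 1."""
    return x ** gamma - 1.0

def log_ratio(x, gamma=0.5):
    """log U(x) / log x, which tends to gamma as x grows."""
    return math.log(lomax_tail_quantile(x, gamma)) / math.log(x)

for x in (1e2, 1e4, 1e8, 1e16):
    # The gap to gamma = 0.5 is log(l_U(x)) / log(x) with l_U(x) = 1 - x**(-gamma).
    print(f"x = {x:.0e}:  log U(x) / log x = {log_ratio(x):.4f}")
```

The ratio approaches $\gamma = 0.5$ from below, but only gradually, because the term $\log \ell_U(x)/\log x$ vanishes slowly. This is the sense in which the Pareto QQ-plot becomes linear only eventually.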
If the tail of the distribution were strictly Pareto, the Pareto QQ-plot would be linear and a linear regression would estimate its slope coefficient. In the general model above, the plot becomes linear only eventually, and a slow decay of the nuisance functions $\ell_F$ and $\ell_U$ then induces asymptotic distortions in the estimator of the slope coefficient. Below, such slow convergence is modeled in the form of second-order regular variation.
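Let $X_{1,n} \le X_{2,n} \le \dots \le X_{n,n}$ denote the order statistics of the given sample of, for example, wealth or income, and consider the $k$ upper order statistics $X_{n-k+1,n}, \dots, X_{n,n}$. The Pareto QQ-plot has coordinates
$$\left(-\log\frac{j}{n+1},\ \log X_{n-j+1,n}\right), \qquad j = 1, \dots, k,$$
where the relative rank is given by $j/(n+1)$, and $j = 1$ for the highest upper order statistic.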
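A minimal pure-Python sketch of the Pareto QQ-plot coordinates $(-\log(j/(n+1)), \log X_{n-j+1,n})$, $j = 1, \dots, k$, for positive data. The function name `pareto_qq` is hypothetical. The usage example feeds in exact Pareto quantiles $((n+1)/j)^{\gamma}$, for which the points fall on a line of slope $\gamma$ through the origin.

```python
import math

def pareto_qq(sample, k):
    """Pareto QQ-plot points (-log(j/(n+1)), log X_{n-j+1,n}) for j = 1..k.

    j = 1 corresponds to the largest observation; all values must be positive.
    """
    xs = sorted(sample, reverse=True)      # xs[j-1] = X_{n-j+1,n}
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    return [(-math.log(j / (n + 1)), math.log(xs[j - 1])) for j in range(1, k + 1)]

# Usage: exact Pareto quantiles with gamma = 0.5 lie on the line y = 0.5 * x.
n, gamma = 999, 0.5
points = pareto_qq([((n + 1) / j) ** gamma for j in range(1, n + 1)], k=100)
print(points[0], points[-1])
```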
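The OLS estimator of the slope parameter in the Pareto QQ-plot is obtained by minimizing the least squares criterion
$$\sum_{j=1}^{k}\left(\log\frac{X_{n-j+1,n}}{X_{n-k,n}} - \gamma\,\log\frac{k+1}{j}\right)^2$$
with respect to $\gamma$, which corresponds to a regression of log sizes on the log of (inverse) relative ranks for the sufficiently large values $X_{n-j+1,n}$, $j = 1, \dots, k$, above the threshold $X_{n-k,n}$. Note that $X_{n-j+1,n}/X_{n-k,n}$ is a normalized size equal to one at the threshold, so the regression line passes through the origin and has no intercept. The resulting OLS estimator is
$$\hat\gamma_k = \frac{\sum_{j=1}^{k}\log\frac{k+1}{j}\,\log\frac{X_{n-j+1,n}}{X_{n-k,n}}}{\sum_{j=1}^{k}\left(\log\frac{k+1}{j}\right)^2}.$$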
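A minimal pure-Python sketch of the OLS estimator, assuming the no-intercept regression of $\log(X_{n-j+1,n}/X_{n-k,n})$ on $\log((k+1)/j)$ over $j = 1, \dots, k$. The name `rank_size_gamma` is hypothetical. This illustrates the formula and is not the package's own implementation.

```python
import math
import random

def rank_size_gamma(sample, k):
    """Anchored OLS slope of the Pareto QQ-plot (hypothetical helper name).

    Regresses log(X_{n-j+1,n} / X_{n-k,n}) on log((k+1)/j), j = 1..k, with no
    intercept, so the fitted line passes through the threshold observation X_{n-k,n}.
    All observations must be positive.
    """
    xs = sorted(sample, reverse=True)           # xs[j-1] = X_{n-j+1,n}
    if not 1 <= k < len(xs):
        raise ValueError("need 1 <= k < n")
    threshold = xs[k]                           # X_{n-k,n}: the (k+1)-th largest value
    num = den = 0.0
    for j in range(1, k + 1):
        reg = math.log((k + 1) / j)             # regressor: log of the inverse relative rank
        resp = math.log(xs[j - 1] / threshold)  # response: log of the normalized size
        num += reg * resp
        den += reg * reg
    return num / den

# Usage: a strict Pareto sample with P(X > x) = x**(-1/gamma), x >= 1, gamma = 0.5.
rng = random.Random(42)
gamma = 0.5
sample = [(1.0 - rng.random()) ** -gamma for _ in range(20_000)]
print(rank_size_gamma(sample, k=500))
```

Because the criterion has a single parameter, the estimator is a ratio of two sums and needs only one pass over the $k$ largest observations after sorting.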
Distributional theory
The distributional theory for the rank-size estimator $\hat\gamma_k$ requires imposing more structure on the behavior of the nuisance functions.
It is common practice in the extreme value literature to strengthen the first-order regular variation representation to second-order regular variation.
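Recall that the above CDF model has the equivalent (first-order regular variation) representation
$$\lim_{t \to \infty}\frac{U(tx) - U(t)}{a(t)} = \frac{x^{\gamma} - 1}{\gamma}, \qquad x > 0,$$
where $a$ is a positive norming function with the property $a(t)/U(t) \to \gamma$ as $t \to \infty$. We then assume that
$$\lim_{t \to \infty}\frac{\log U(tx) - \log U(t) - \gamma\log x}{b(t)} = h_\rho(x)$$
for all $x > 0$, where $h_\rho(x) = (x^{\rho} - 1)/\rho$ with $\rho < 0$. The parameter $\rho$ is the so-called second-order parameter of regular variation, and $b$ is a rate function that is regularly varying with index $\rho$, with $b(t) \to 0$ as $t \to \infty$. As $|\rho|$ decreases, the nuisance part of $U$ decays more slowly.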
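Many heavy-tailed distributions satisfy this second-order representation, such as members of the Hall class of distributions given by
$$1 - F(x) = C\,x^{-1/\gamma}\left(1 + D\,x^{\rho/\gamma} + o\!\left(x^{\rho/\gamma}\right)\right)$$
for large $x$ (with constants $C > 0$ and $D \neq 0$), whose tail quantile function is
$$U(x) = C^{\gamma} x^{\gamma}\left(1 + \gamma D C^{\rho} x^{\rho} + o\!\left(x^{\rho}\right)\right).$$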
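The Hall-class quantile expansion can be checked numerically. The sketch below (pure Python, hypothetical helper names) drops the $o(\cdot)$ term and uses illustrative values $C = 1$, $D = 0.5$, $\gamma = 0.5$, $\rho = -1$ that do not come from the text. It inverts $1 - F$ by bisection and compares the result with the first-order term $C^{\gamma}t^{\gamma}$ and with the second-order expansion $C^{\gamma}t^{\gamma}(1 + \gamma D C^{\rho} t^{\rho})$. The bisection assumes $D \ge 0$, so that $1 - F$ is decreasing on $(0, \infty)$.

```python
def hall_survival(x, C, D, gamma, rho):
    """Hall-class tail 1 - F(x) = C x^(-1/gamma) (1 + D x^(rho/gamma)), with the o(.) term set to zero."""
    return C * x ** (-1.0 / gamma) * (1.0 + D * x ** (rho / gamma))

def tail_quantile(t, C, D, gamma, rho):
    """Solve 1 - F(x) = 1/t by bisection (valid when the survival function is decreasing, D >= 0)."""
    lo, hi = 1e-9, 1.0
    while hall_survival(hi, C, D, gamma, rho) > 1.0 / t:
        hi *= 2.0                               # expand until the root is bracketed
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if hall_survival(mid, C, D, gamma, rho) > 1.0 / t:
            lo = mid                            # survival still too large: root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

def second_order_approx(t, C, D, gamma, rho):
    """C^gamma t^gamma (1 + gamma D C^rho t^rho): the expansion of U(t) without the o(t^rho) term."""
    return C ** gamma * t ** gamma * (1.0 + gamma * D * C ** rho * t ** rho)

C, D, gamma, rho = 1.0, 0.5, 0.5, -1.0          # illustrative values only
for t in (1e2, 1e4):
    exact = tail_quantile(t, C, D, gamma, rho)
    first = (C * t) ** gamma                    # first-order (strict Pareto) term only
    second = second_order_approx(t, C, D, gamma, rho)
    print(f"t = {t:.0e}: rel. error first-order = {abs(first / exact - 1):.2e}, "
          f"second-order = {abs(second / exact - 1):.2e}")
```

In this example the first-order relative error shrinks like $t^{\rho}$, whereas the second-order expansion leaves an error of order $t^{2\rho}$.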
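Schluter (2018) then demonstrates that, as $k \to \infty$ and $k/n \to 0$, this estimator is weakly consistent, and that, if in addition $\sqrt{k}\,b(n/k) \to \lambda$ for some finite $\lambda$, it is asymptotically normal,
$$\sqrt{k}\left(\hat\gamma_k - \gamma\right) \xrightarrow{d} \mathcal{N}\left(\lambda\,\mu(\rho),\ \sigma^2\gamma^2\right),$$
with constants $\mu(\rho)$ and $\sigma^2$ that are given explicitly in Schluter (2018).
Asymptotically, the estimator is thus unbiased if $\lambda = 0$. But if the decay of $b$ is slow ($|\rho|$ small), the estimator will suffer from a higher-order distortion in finite samples, given approximately by $\mu(\rho)\,b(n/k)$.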
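The finite-sample distortion can be illustrated with a small Monte Carlo experiment. The following pure-Python sketch uses hypothetical helper names and illustrative settings ($n = 10{,}000$, 50 replications, $\gamma = 0.5$). It compares a strict Pareto law with a Lomax law $1 - F(x) = (1 + x)^{-1/\gamma}$. The Lomax law belongs to the Hall class with $C = 1$, $D = -1/\gamma$ and $\rho = -\gamma$, so its nuisance term is not negligible at moderate quantiles.

```python
import math
import random

def gamma_hat_desc(xs, k):
    """Anchored rank-size estimate from values already sorted in decreasing order."""
    num = den = 0.0
    for j in range(1, k + 1):
        reg = math.log((k + 1) / j)
        num += reg * math.log(xs[j - 1] / xs[k])
        den += reg * reg
    return num / den

def mean_estimates(n=10_000, ks=(50, 1000), reps=50, gamma=0.5, seed=1):
    """Monte Carlo means of the estimate for a strict Pareto and a Lomax law with the same gamma."""
    rng = random.Random(seed)
    sums = {(law, k): 0.0 for law in ("pareto", "lomax") for k in ks}
    for _ in range(reps):
        # Strict Pareto: P(X > x) = x**(-1/gamma) for x >= 1, sorted in decreasing order.
        pareto = sorted(((1.0 - rng.random()) ** -gamma for _ in range(n)), reverse=True)
        # Lomax: P(X > x) = (1 + x)**(-1/gamma); subtracting 1 preserves the ordering.
        lomax = [x - 1.0 for x in pareto]
        for k in ks:
            sums[("pareto", k)] += gamma_hat_desc(pareto, k)
            sums[("lomax", k)] += gamma_hat_desc(lomax, k)
    return {key: total / reps for key, total in sums.items()}

results = mean_estimates()
for (law, k), value in sorted(results.items()):
    print(f"{law:>6}  k = {k:4d}  mean estimate = {value:.3f}  (true gamma = 0.5)")
```

The average estimate stays close to $\gamma$ for the strict Pareto draws. The Lomax estimates drift upward as $k$ grows, because a larger $k$ reaches further into the body of the distribution, where the nuisance term has not yet died out.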
The choice of the threshold $k$ for the upper order statistics
Any tail index estimator requires a choice of how many upper order statistics, given by $k$, should be taken into account. This choice invariably introduces a trade-off between the bias and the precision of the estimator that is typically ignored by practitioners. This bias-variance trade-off suggests that it is unwise to set the threshold level mechanically (e.g., at a wealth level of 1 million euros, or at 10% of the sample). Instead, we determine the threshold level in a data-dependent manner, using the residuals of the rank-size regression to estimate the asymptotic mean squared error (AMSE) non-parametrically.
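Following Beirlant et al. (1996) and Schluter (2018, 2021), we observe that the expectation of the weighted mean of the theoretical squared deviations in the rank-size regression, with weights $w_{j,k}$, $j = 1, \dots, k$, equals, to first order,
$$c_k\,\mathrm{Var}(\hat\gamma_k) + d_k(\rho)\,b_{k,n}^2,$$
where $b_{k,n}$ denotes the asymptotic bias of $\hat\gamma_k$, for some coefficients $c_k$ depending only on $k$, and $d_k(\rho)$ depending on $k$ and $\rho$ (both also depend on the chosen weights). For an explicit statement of the coefficients $c_k$ and $d_k(\rho)$, see Schluter (2018).
The procedure then consists of applying two different weighting schemes ($w_{j,k}^{(1)}$ and $w_{j,k}^{(2)}$), estimating the corresponding two weighted mean deviations using the residuals of the rank-size regression, and computing a linear combination thereof such that $\mathrm{Var}(\hat\gamma_k) + b_{k,n}^2$ obtains. We proceed in this manner for both weighting schemes and for a set of pre-selected values of $k$, and choose the $k$ that minimizes the estimated AMSE.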
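The AMSE-based choice of $k$ can be sketched in pure Python as follows. This is a placeholder-level illustration. The two weighting schemes and the coefficient function returning $(c_k, d_k(\rho))$ for each scheme are hypothetical inputs that the user must supply: the actual coefficients are stated in Schluter (2018) and are not reproduced here, and the dummy values in the usage example are not meaningful. The sketch shows only the mechanics. It fits the rank-size regression for each candidate $k$, forms the two weighted mean squared residuals, and solves the $2 \times 2$ system that turns them into an estimate of $\mathrm{Var}(\hat\gamma_k) + b_{k,n}^2$. It then keeps the $k$ with the smallest value.

```python
import math
import random

def residuals(xs_desc, k):
    """Fit the anchored rank-size regression on the k largest values; return (gamma_hat, residuals)."""
    regs = [math.log((k + 1) / j) for j in range(1, k + 1)]
    resp = [math.log(xs_desc[j - 1] / xs_desc[k]) for j in range(1, k + 1)]
    g = sum(r * y for r, y in zip(regs, resp)) / sum(r * r for r in regs)
    return g, [y - g * r for r, y in zip(regs, resp)]

def weighted_mean_sq(res, weights):
    """(1/k) * sum_j w_{j,k} * e_j**2 for one weighting scheme."""
    return sum(w * e * e for w, e in zip(weights, res)) / len(res)

def combine(dev1, dev2, c1, c2, d1, d2):
    """Return a1*dev1 + a2*dev2 with a1*c1 + a2*c2 = 1 and a1*d1 + a2*d2 = 1.

    If E[dev_i] ~ c_i * Var + d_i * bias**2, this estimates Var + bias**2 (the AMSE).
    """
    det = c1 * d2 - c2 * d1
    if abs(det) < 1e-12:
        raise ValueError("the two weighting schemes must yield independent coefficients")
    a1 = (d2 - c2) / det                        # Cramer's rule for the 2 x 2 system
    a2 = (c1 - d1) / det
    return a1 * dev1 + a2 * dev2

def choose_k(sample, ks, weight_schemes, coefficients, rho):
    """Return (k, amse_hat, gamma_hat) for the candidate k with the smallest estimated AMSE.

    weight_schemes: two hypothetical callables (j, k) -> w_{j,k}.
    coefficients:   hypothetical callable (k, rho) -> (c1, d1, c2, d2).
    """
    xs_desc = sorted(sample, reverse=True)
    best = None
    for k in ks:
        g, res = residuals(xs_desc, k)
        dev1, dev2 = (weighted_mean_sq(res, [w(j, k) for j in range(1, k + 1)])
                      for w in weight_schemes)
        c1, d1, c2, d2 = coefficients(k, rho)
        amse = combine(dev1, dev2, c1, c2, d1, d2)
        if best is None or amse < best[1]:
            best = (k, amse, g)
    return best

# Usage with PLACEHOLDER inputs (not the coefficients from Schluter, 2018).
rng = random.Random(7)
sample = [(1.0 - rng.random()) ** -0.5 for _ in range(5_000)]
schemes = (lambda j, k: 1.0, lambda j, k: j / (k + 1))            # illustrative weights only
dummy_coefficients = lambda k, rho: (1.0 / k, 1.0, 2.0 / k, 3.0)  # dummy c_k, d_k values
print(choose_k(sample, range(50, 1001, 50), schemes, dummy_coefficients, rho=-0.5))
```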
In particular, based on the experiments reported in Schluter (2018, 2021), we have set a very conservative value of $\rho$, that is, one small in absolute value (implying a slow decay of the slowly varying nuisance function $\ell_U$).
Complex surveys
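Survey data often come with sampling weights to allow inference at the level of the population. The aforementioned theory and methods are easily adapted to this setting if we define the weighted empirical distribution function as
$$\hat F_w(x) = \frac{\sum_{i=1}^{n} w_i\,\mathbb{1}\{X_i \le x\}}{\sum_{i=1}^{n} w_i},$$
where $w_i > 0$ is the sampling weight associated with the $i$-th observation. Examples are a scheme of unity weights ($w_i = 1$ for all $i$), which recovers the ordinary empirical distribution function, or weights derived from the survey design. Let $w_{(i)}$ denote the weight attached to the $i$-th largest observation, and let $W_j = \sum_{i=1}^{j} w_{(i)}$ denote the sum of the survey weights corresponding to the $j$ largest upper order statistics. Then, for the $j$-th largest observation, the weighted share of the sample at or above $X_{n-j+1,n}$ is $W_j/W_n$ (equal to $j/n$ under unity weights). The resulting Pareto QQ-plot has coordinates
$$\left(-\log\frac{W_j}{W_n},\ \log X_{n-j+1,n}\right), \qquad j = 1, \dots, k,$$
and the resulting survey-weights-adjusted estimator of $\gamma$ becomes
$$\hat\gamma_k^{w} = \frac{\sum_{j=1}^{k}\log\frac{W_{k+1}}{W_j}\,\log\frac{X_{n-j+1,n}}{X_{n-k,n}}}{\sum_{j=1}^{k}\left(\log\frac{W_{k+1}}{W_j}\right)^2},$$
which reduces to the unweighted OLS estimator under unity weights.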
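A minimal pure-Python sketch of the survey-weighted estimator. It assumes relative ranks built from cumulative weights $W_j = \sum_{i=1}^{j} w_{(i)}$, where $w_{(i)}$ is the weight of the $i$-th largest observation, so that the regressor is $\log(W_{k+1}/W_j)$. The function name is hypothetical.

```python
import math

def weighted_rank_size_gamma(values, weights, k):
    """Survey-weighted anchored rank-size estimate (hypothetical helper name).

    Uses W_j, the sum of the weights of the j largest observations, in place of the rank j:
    the regressor is log(W_{k+1} / W_j) and the response log(X_{n-j+1,n} / X_{n-k,n}).
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if not 1 <= k < len(values):
        raise ValueError("need 1 <= k < n")
    pairs = sorted(zip(values, weights), key=lambda p: p[0], reverse=True)
    cum, running = [], 0.0
    for _, w in pairs[: k + 1]:
        running += w
        cum.append(running)                     # cum[j-1] = W_j
    threshold = pairs[k][0]                     # X_{n-k,n}
    num = den = 0.0
    for j in range(1, k + 1):
        reg = math.log(cum[k] / cum[j - 1])     # log(W_{k+1} / W_j)
        num += reg * math.log(pairs[j - 1][0] / threshold)
        den += reg * reg
    return num / den

# Usage: with unity weights W_j = j, so the unweighted estimator is recovered.
values = [(1001 / j) ** 0.5 for j in range(1, 1001)]
print(weighted_rank_size_gamma(values, [1.0] * len(values), k=200))
```

Rescaling all weights by a common factor leaves the estimate unchanged, since only the ratios $W_{k+1}/W_j$ enter.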