
Consider a regularly varying cumulative distribution function $F$, so that for sufficiently large $y$ and some $\gamma > 0$

$$F(y) = 1 - l(y) \times y^{-1/\gamma}$$

where $l$ denotes a slowly varying nuisance function, i.e. $l(ty)/l(y) \to 1$ as $y \to \infty$ for every $t > 0$, so $l$ is asymptotically constant. The parameter $\gamma > 0$ is called the extreme value index, and the Pareto or tail index $\alpha \equiv 1/\gamma$ is its reciprocal.

The objective is to estimate the parameter $\gamma$.

The rank-size regression and the Pareto QQ-plot

The rank-size regression estimator of the extreme value index measures the ultimate slope of the Pareto QQ-plot. This follows since the tail quantile function for the above model is

$$U(y) = \inf\{t : \Pr(Y > t) = 1/y\} = y^{\gamma} \times \tilde{l}(y)$$

where $\tilde{l}$ is another slowly varying function, which implies $\log U(y) \sim \gamma \log y$ as $y \to \infty$. Replacing these population quantities with their empirical counterparts gives the Pareto QQ-plot, and $\gamma$ is its ultimate slope.

If the tail of the distribution were strictly Pareto, the Pareto QQ-plot would be linear, and a linear regression would estimate its slope coefficient. In the above model, the plot becomes linear only eventually, and a slow decay of the nuisance functions $l(y)$ and $\tilde{l}(y)$ then induces asymptotic distortions in the estimator of the slope coefficient. Below, such slow convergence is considered in the form of second-order regular variation.

Let $Y_{1,n} \leq \cdots \leq Y_{n,n}$ denote the order statistics of the given sample $Y_1, \ldots, Y_n$ of, for example, wealth or income, and consider the $k$ upper order statistics. The Pareto QQ-plot has coordinates

$$\left(-\log(j/(n+1)),\ \log Y_{n-j+1,n}\right)_{j=1,\ldots,k}$$

where $-\log(j/(n+1))$ is the relative rank and $j = 1$ corresponds to the largest order statistic.
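As a minimal sketch (illustrative code, not the package's own implementation), the QQ-plot coordinates can be computed directly from a sample; the function name and the Pareto example below are assumptions for illustration only.

```python
import numpy as np

def pareto_qq_coords(y, k):
    """Return (-log(j/(n+1)), log Y_{n-j+1,n}) for j = 1, ..., k."""
    y = np.sort(np.asarray(y, dtype=float))  # Y_{1,n} <= ... <= Y_{n,n}
    n = len(y)
    j = np.arange(1, k + 1)
    x = -np.log(j / (n + 1))   # relative ranks; j = 1 is the sample maximum
    z = np.log(y[n - j])       # log Y_{n-j+1,n} (0-based indexing)
    return x, z

# For an exactly Pareto sample the plot is close to a straight line
# with slope gamma.
rng = np.random.default_rng(1)
gamma = 0.5
sample = rng.pareto(1 / gamma, size=2000) + 1.0  # strict Pareto, alpha = 2
x, z = pareto_qq_coords(sample, k=200)
slope = np.polyfit(x, z, 1)[0]                   # rough slope of the plot
```

Here `np.polyfit` fits a line with an intercept, purely to eyeball the slope; the estimator defined below instead regresses through the threshold point.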

The OLS estimator of the slope parameter in the Pareto QQ-plot is obtained by minimizing the least squares criterion

$$\sum_{j=1}^{k} \left( \log \frac{Y_{n-j+1,n}}{Y_{n-k,n}} - \gamma \log \frac{k+1}{j} \right)^2 \quad\quad (1 \leq j \leq k < n)$$

with respect to $\gamma$, which corresponds to a regression of log sizes on the log of relative ranks for observations exceeding the threshold $Y_{n-k,n}$. Note that $\frac{Y_{n-j+1,n}}{Y_{n-k,n}}$ is a normalized size equal to one at the threshold. The resulting OLS estimator is

$$\hat{\gamma} = \frac{\frac{1}{k} \sum_{j=1}^{k} \log\left(\frac{k+1}{j}\right) \left[\log Y_{n-j+1,n} - \log Y_{n-k,n}\right]}{\frac{1}{k} \sum_{j=1}^{k} \left[\log \frac{k+1}{j}\right]^2}.$$
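The estimator $\hat{\gamma}$ above is straightforward to implement. The following is a hedged sketch (the function name is illustrative, not the package API); the sanity check uses a strict Pareto sample, for which $A(t) = 0$ and no bias term arises.

```python
import numpy as np

def rank_size_gamma(y, k):
    """No-intercept OLS slope of the Pareto QQ-plot from the k upper order statistics."""
    y = np.sort(np.asarray(y, dtype=float))
    n = len(y)
    if not (1 <= k < n):
        raise ValueError("need 1 <= k < n")
    j = np.arange(1, k + 1)
    x = np.log((k + 1) / j)                      # log relative ranks
    z = np.log(y[n - j]) - np.log(y[n - k - 1])  # log(Y_{n-j+1,n} / Y_{n-k,n})
    return np.mean(x * z) / np.mean(x ** 2)      # the OLS formula for gamma_hat

# Sanity check on a strict Pareto sample with gamma = 0.5:
rng = np.random.default_rng(42)
gamma = 0.5
sample = rng.pareto(1 / gamma, size=5000) + 1.0
gamma_hat = rank_size_gamma(sample, k=500)
```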

Distributional theory

The distributional theory for $\hat{\gamma}$ requires imposing more structure on the behavior of the nuisance functions.

It is common practice in the extreme value literature to strengthen the first-order regular variation representation to second-order regular variation.

Recall that the above CDF model has the equivalent (first-order regular variation) representation

$$\lim_{t \to \infty} \frac{\log U(ty) - \log U(t)}{a(t)/U(t)} = \log y$$

where $a$ is a positive norming function with the property $a(t)/U(t) \to \gamma$. We then assume that

$$\lim_{t \to \infty} \frac{\frac{\log U(ty) - \log U(t)}{a(t)/U(t)} - \log y}{A(t)} = H_{\gamma,\rho}(y)$$

for all $y > 0$, where $H_{\gamma,\rho}(y) = \frac{1}{\rho}\left(\frac{y^{\rho}-1}{\rho} - \log y\right)$ for $\gamma > 0$ and $\rho < 0$. The parameter $\rho$ is the so-called second-order parameter of regular variation, and $A(t)$ is a rate function that is regularly varying with index $\rho$, with $A(t) \to 0$ as $t \to \infty$. As $|\rho|$ decreases, the nuisance part $l$ decays more slowly.

Many heavy-tailed distributions satisfy this second-order representation, such as members of the Hall class of distributions given by $F(x) = 1 - a x^{-1/\gamma}\left[1 + b x^{\beta} + o(x^{\beta})\right]$ for large $x$ (with $\beta < 0$), whose tail quantile function is $U(x) = c x^{\gamma}\left[1 + d x^{\rho} + o(x^{\rho})\right]$.
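As a concrete, standard illustration (an example we add here, not taken from the references above): the Fréchet distribution belongs to the Hall class, since for large $x$

```latex
F(x) = \exp\left(-x^{-1/\gamma}\right)
     = 1 - x^{-1/\gamma}\left[1 - \tfrac{1}{2}\, x^{-1/\gamma} + o\left(x^{-1/\gamma}\right)\right],
```

so $a = 1$, $b = -1/2$ and $\beta = -1/\gamma$; inverting gives $U(y) = y^{\gamma}\left(1 - \tfrac{\gamma}{2}\, y^{-1} + o(y^{-1})\right)$, hence $\rho = -1$. More generally, $\rho = \gamma \beta$ links the exponents of the two expansions.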

Schluter (2018) then demonstrates that, as $k \to \infty$ and $k/n \to 0$, this estimator is weakly consistent, and if $\sqrt{k}\, A(n/k) \to 0$,

$$\sqrt{k}\,(\hat{\gamma} - \gamma) \to^{d} N\left(0, \frac{5}{4}\gamma^2\right).$$

Asymptotically, the estimator is thus unbiased if $\sqrt{k}\, A(n/k) \to 0$. But if this decay is slow, the estimator suffers in finite samples from a higher-order distortion given by

$$b_{k,n} \equiv \frac{1}{2} \frac{\gamma}{\rho} \frac{2-\rho}{(1-\rho)^2}\, A(n/k) \quad\quad (\gamma > 0,\ \rho < 0).$$
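To get a feel for the magnitude of this distortion, the displayed factor can be evaluated directly; the value of $A(n/k)$ is usually unknown in practice, so the number used below is a pure assumption for illustration.

```python
def bias_term(gamma, rho, A):
    """b_{k,n} = (1/2) * (gamma/rho) * (2 - rho)/(1 - rho)^2 * A(n/k)."""
    assert gamma > 0 and rho < 0
    return 0.5 * (gamma / rho) * (2.0 - rho) / (1.0 - rho) ** 2 * A

# Hypothetical values: gamma = 1, rho = -1, A(n/k) = 0.1
b = bias_term(gamma=1.0, rho=-1.0, A=0.1)
```

Note that the factor $\gamma/\rho$ makes the distortion blow up as $\rho \to 0^-$, consistent with the remark above that a small $|\rho|$ means a slowly decaying nuisance function.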

The choice of the threshold k for the upper order statistics

Any tail index estimator requires choosing how many upper order statistics, $k$, to take into account. This choice invariably introduces a trade-off between bias and precision of the estimator that practitioners typically ignore. This mean-variance trade-off implies that it is unwise to set the threshold mechanically (e.g., at a wealth level of 1 million euros, or at 10% of the sample). Instead, we determine the threshold in a data-dependent manner, using the residuals of the rank-size regression to estimate non-parametrically the asymptotic mean squared error (AMSE).

Following Beirlant et al. (1996) and Schluter (2018, 2021), we observe that the expectation of the mean weighted theoretical squared deviation

$$\frac{1}{k} \sum_{j=1}^{k} w_{j,k}\, E\left( \log\left(\frac{Y_{n-j+1,n}}{Y_{n-k,n}}\right) - \gamma \log\left(\frac{k+1}{j}\right) \right)^2$$

equals, to first order, $c_k\, \mathrm{Var}(\hat{\gamma}) + d_k(\rho)\, b_{k,n}^2$ for some coefficients $c_k$ depending only on $k$, and $d_k(\rho)$ depending on $k$ and $\rho < 0$. For an explicit statement of the coefficients $c_k$ and $d_k$, see Schluter (2018).

The procedure then consists of applying two different weighting schemes $w_{j,k}^{(i)}$ ($i = 1, 2$), estimating the two corresponding mean weighted theoretical deviations using the residuals of the rank-size regression, and computing a linear combination thereof such that an estimate of $\mathrm{Var}(\hat{\gamma}) + b_{k,n}^2$, the AMSE, obtains. We proceed in this manner for the weights $w_{j,k}^{(1)} \equiv 1$ and $w_{j,k}^{(2)} = j/(k+1)$, for a set of pre-selected values of $\rho$.
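The two empirical building blocks of this procedure can be sketched as follows. This is only a rough illustration of the weighted mean squared residuals; the linear combination that isolates $\mathrm{Var}(\hat{\gamma}) + b_{k,n}^2$ requires the coefficients $c_k$ and $d_k(\rho)$ from Schluter (2018), which are not reproduced here.

```python
import numpy as np

def weighted_mse(y, k):
    """Mean weighted squared residuals of the rank-size regression,
    for the two weighting schemes w^(1) = 1 and w^(2) = j/(k+1)."""
    y = np.sort(np.asarray(y, dtype=float))
    n = len(y)
    j = np.arange(1, k + 1)
    x = np.log((k + 1) / j)
    z = np.log(y[n - j]) - np.log(y[n - k - 1])
    gamma_hat = np.mean(x * z) / np.mean(x ** 2)  # rank-size slope
    resid = z - gamma_hat * x                     # regression residuals
    w1 = np.ones(k)                               # w^(1)_{j,k} = 1
    w2 = j / (k + 1)                              # w^(2)_{j,k} = j/(k+1)
    return np.mean(w1 * resid ** 2), np.mean(w2 * resid ** 2)

rng = np.random.default_rng(7)
m1, m2 = weighted_mse(rng.pareto(2.0, size=3000) + 1.0, k=300)
```

Since $w^{(2)}_{j,k} < 1$ for every $j$, the second weighted mean is always smaller than the first; it is the contrast between the two that identifies the bias component.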

In particular, based on the experiments reported in Schluter (2018, 2021), we have set a very conservative value of $\rho = -0.5$ (implying a slow decay of the slowly varying nuisance function $l$).

Complex surveys

Survey data often come with sampling weights to allow inference at the level of the population. The aforementioned theory and methods are easily adapted to this setting if we define the weighted empirical distribution function as

$$F_n(y) = \frac{1}{n} \sum_{i=1}^{n} w_i\, 1(Y_i \leq y)$$

where $w_i$ is the sampling weight associated with the $i$-th observation $Y_i$, and $\sum_{i=1}^{n} w_i = n$. Examples are a scheme of unit weights ($w_i \equiv 1$ for all $i$), or $w_i = \tilde{w}_i\, n$ with $\tilde{w}_i < 1$ and $\sum_i \tilde{w}_i = 1$. Then, for the $j$-th largest observation, we have $F_n(Y_{n-(j-1),n}) = \frac{n - \sum_{i=1}^{j} w_{(i \leq j)}}{n}$, with the implicit notation convention that $\sum_{i=1}^{j} w_{(i \leq j)}$ denotes the sum of the survey weights corresponding to the $j$ largest upper order statistics. The resulting Pareto QQ-plot has coordinates

$$\left(-\log\left(\sum_{i=1}^{j} w_{(i \leq j)}/(n+1)\right),\ \log Y_{n-j+1,n}\right)_{j=1,\ldots,k},$$

and the resulting survey-weights-adjusted estimator of $\gamma$ then becomes

$$\hat{\gamma} = \frac{\frac{1}{k} \sum_{j=1}^{k} \log\left(\frac{\sum_{i=1}^{k+1} w_{(i \leq k+1)}}{\sum_{i=1}^{j} w_{(i \leq j)}}\right) \left[\log Y_{n-j+1,n} - \log Y_{n-k,n}\right]}{\frac{1}{k} \sum_{j=1}^{k} \left[\log \frac{\sum_{i=1}^{k+1} w_{(i \leq k+1)}}{\sum_{i=1}^{j} w_{(i \leq j)}}\right]^2}.$$
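The survey-weighted estimator can be sketched as follows (a hedged illustration, not the package's own code); it expects weights normalized so that $\sum_i w_i = n$, as in the text, and reduces to the unweighted estimator when all weights equal one.

```python
import numpy as np

def rank_size_gamma_weighted(y, w, k):
    """Survey-weights-adjusted rank-size estimator of gamma."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    order = np.argsort(y)[::-1]      # descending: position 0 is the maximum
    y, w = y[order], w[order]
    cw = np.cumsum(w)                # cw[j-1] = sum of weights of j largest obs.
    j = np.arange(k)                 # 0-based positions for j = 1, ..., k
    x = np.log(cw[k] / cw[j])        # log(sum_{i<=k+1} w / sum_{i<=j} w)
    z = np.log(y[j]) - np.log(y[k])  # log(Y_{n-j+1,n} / Y_{n-k,n})
    return np.mean(x * z) / np.mean(x ** 2)

# With unit weights this matches the unweighted estimator; sanity check
# on a strict Pareto sample with gamma = 0.5:
rng = np.random.default_rng(3)
sample = rng.pareto(2.0, size=4000) + 1.0
g_unit = rank_size_gamma_weighted(sample, np.ones(4000), k=400)
```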