Theory • beyondpareto

Consider a regularly varying cumulative distribution function $F$ , so for sufficiently large $y$ and $\gamma >0$

$F(y)=1- l(y) \times y^{-1/ \gamma}$

where $l$ denotes a slowly varying nuisance function that is constant asymptotically ( $l(ty)/l(y)=1$ as $y \to \infty$ ). $\gamma>0$ is called the extreme value index, and the Pareto or tail index ( $\alpha \equiv 1/\gamma$ ) is its reciprocal.

The objective is to estimate the parameter $\gamma$ .

The rank-size regression and the Pareto QQ-plot

The rank-size regression estimator of the extreme value index measures the ultimate slope of the Pareto QQ-plot. This follows since the tail quantile function for above model is

$U(y)=\inf \{t: \Pr(Y>t)=1/y\}=y^\gamma \times \tilde{l}(y)$ where $\tilde{l}(y)$ is another slowly varying function, which then implies $\log U(y) \sim \gamma \log (y)$ as $y \to \infty$ . Replacing these population quantities with their empirical counterparts gives the Pareto QQ-plot, and $\gamma$ is its ultimate slope.

If the tail of the distribution were strictly Pareto, then the Pareto QQ-plot would be linear and a linear regression would estimate its slope coefficient. In the above model , it will become linear only eventually, and a slow decay of the nuisance functions $l(y)$ and $\tilde{l}(y)$ will then induce asymptotic distortions in the estimator of the slope coefficient. Below, such slow convergence will be considered in the form of second-order regular variation.

Let $Y_{1,n} \leq \cdots \leq Y_{n,n}$ denote the order statistics of the given sample $Y_1,\cdots, Y_n$ of, for example, wealth or income, and consider the $k$ upper order statistics. The Pareto QQ-plot quantile plot has coordinates $(- \log(j/(n+1)),\log Y_{n-j+1,n}) _{j=1,\cdots,k}$

where the relative rank is given by $- \log(j/(n+1))$ and $j=1$ for the highest upper-order statistic.

The OLS estimator of the slope parameter in the Pareto QQ-plot is obtained by minimizing the least squares criterion

$\sum_{j=1} ^k \left( \log \frac{Y_{n-j+1,n}}{Y_{n-k,n}} - \gamma \log\frac{k+1}{j}\right)^2 \quad\quad (1 \leq j \leq k < n)$

with respect to $\gamma$ , which corresponds to a regression of log sizes on the log of relative ranks for sufficiently large values given by $Y_{n-k,n}$ . Note that $\frac{Y_{n-j+1,n}}{Y_{n-k,n}}$ is a normalized size equal to one at the threshold. The resulting OLS estimator is

$\begin{eqnarray}\label{gamma_hat} \hat{\gamma} &=& \frac{ \frac{1}{k} \sum _{j=1} ^k \log \left(\frac{k+1}{j } \right) \left[ \log Y_{n-j+1,n} - \log Y_{n-k,n}\right]}{ \frac{1}{k} \sum _{j=1} ^k \left[ \log \frac{k+1}{j } \right]^2 }. \end{eqnarray}$

Distributional theory

The distributional theory for $\hat{\gamma}$ requires imposing more structure on the behavior of nuisance functions.

It is common practice in the extreme value literature to strengthen the first-order regular variation representation to second-order regular variation.

Recall that above CDF model has the equivalent (first-order regular variation) representation $\lim_{t \to \infty} \frac{\log U(ty) - \log U(t)}{(a(t) / U(t)} = \log y$ where $a$ is a positive norming function with the property $a(t)/U(t) \to \gamma$ . We then assume that

$\lim_{t \to \infty} \frac{ \frac{\log U(ty) - \log U(t)}{a(t) / U(t)} - \log y}{A(t)} = H_{\gamma, \rho} (y)$

for all $y >0$ , where $H_{\gamma > 0, \rho < 0} (y) =\frac{1}{\rho} (\frac{y^\rho -1}{\rho} - \log y)$ with $\rho < 0$ . The parameter $\rho$ is the so-called second-order parameter of regular variation, and $A(t)$ is a rate function that is regularly varying with index $\rho$ , with $A(t) \to 0$ as $t \to \infty$ . As $\rho$ falls in magnitude, the nuisance part of $l$ decays more slowly.

Many heavy-tailed distributions satisfy this second-order representation, such as members of the Hall class of distributions given by $F(x)=1-a x^{-1/ \gamma} [1 + b x^\beta + o(x^\beta)$ for large $x$ , whose tail quantile function is $U(x) = c x^\gamma [1 + d x^\rho + o(x^\rho)]$ .

Schluter (2018) then demonstrates that as $k\to \infty$ and $k/n \to 0$ , this estimator is weakly consistent, and if $\sqrt{k} A(n/k) \to 0$

$\sqrt{k} (\hat{\gamma} - \gamma) \to^d N \left( 0,\frac{5}{4} \gamma^2 \right).$

Asymptotically, the estimator is thus unbiased if $\sqrt{k} A(n/k) \to 0$ . But if this decay is slow, the estimator will suffer from a higher order distortion in finite samples given by

$b_{k,n} \equiv \frac{1}{2} \frac{\gamma}{\rho} \frac{2 - \rho}{(1 - \rho)^2} A(n/k) \quad \quad (\gamma > 0,\rho <0)$

The choice of the threshold k for the upper order statistics

Any tail index estimator requires a choice of how many upper order statistics, given by $k$ , should be taken into account. This choice invariably introduces a trade-off between bias and precision of the estimator that is typically ignored by practitioners. However, this mean-variance trade-off suggests that it is unwise to set the threshold level mechanically (e.g., a wealth level of 1 million euros or 10% of the sample). By contrast, we determine this threshold level in a data-dependent manner by using the residuals in the rank-size regression in order to estimate non-parametrically the asymptotic mean-squared error (AMSE).

Following Beirlant et al. (1996) and Schluter (2018,2021), we observe that the expectation of the mean-weighted theoretical squared deviation $\begin{equation}\label{wSE} \frac{1}{k} \sum_{j=1} ^k w_{j,k} E \left( \log \left( \frac{Y_{n-j+1,n}}{Y_{n-k,n}} \right) - \gamma \log \left( \frac{k+1}{j} \right) \right)^2 \end{equation}$ equals, to first order, $c_k Var(\hat{\gamma}) + d_k (\rho) b_{k,n}^2$ for some coefficients $c_k$ depending only on $k$ , and $d_k(\rho)$ depending on $k$ and $\rho<0$ . For an explicit statement of the coefficients $c_k$ and $d_k$ , see Schluter (2018).

The procedure then consists of applying two different weighting schemes $w_{j,k} ^{(i)}$ ( $i=1,2$ ), estimating the corresponding two mean weighted theoretical deviations using the residuals of rank-size regression, and computing a linear combination thereof such that $ Var() + b_{k,n}^2 $ obtains. We proceed in this manner for weights $w_{j,k} ^{(1)}\equiv 1$ and $w_{j,k} ^{(2)} = j/(k+1)$ for a set of pre-selected values of $\rho$ .

In particular, based on the experiments reported in Schluter (2018, 2021), we have set a very conservative value of $\rho=-.5$ (implying a slow decay of the slowly varying nuisance function $l$ ).

Complex surveys

Survey data often come with sampling weights to allow inference on the level of the population. The aforementioned theory and methods are easily adapted to this setting if we define the weighted empirical distribution function as

$F_n (y) = \frac{1}{n} \sum_{i=1} ^n w_i 1(Y_i \leq y)$

where $w_i$ is the sampling weight associated with the $i$ ’s observation $Y_i$ with $\sum_{i=1} ^n w_i = n$ . Examples are a scheme of unity weights ( $w_i \equiv 1$ for all $i$ ), or $w_i= \tilde{w_i} n$ with $\tilde{w_i} < 1$ and $\sum_i \tilde{w_i} = 1$ . Then, for the $j$ ’s largest observation, we have $F_n (Y_{n-(j-1),n}) = \frac{n-\sum_{i=1} ^{j} w_{(i \leq j)}} {n}$ with the implicit notation convention that $\sum_{i=1}^j w_{(i \leq j)}$ denotes the summation of the survey weights corresponding to the $j$ largest upper-order statistics. The resulting Pareto QQ plot has coordinates

$(- \log(\sum_{i=1} ^j w_{(i \leq j)}/(n+1)), \log Y_{n-j+1,n}) _{j=1,\cdots,k},$ and the resulting survey-weights-adjusted estimator of $\gamma$ then becomes $\begin{eqnarray}\label{gamma_hat2} \hat{\gamma} &=& \frac{ \frac{1}{k} \sum _{j=1} ^k \log \left( \frac{\sum_{i=1} ^{k+1} w_{(i \leq k+1)}}{\sum_{i=1} ^j w_{(i \leq j)}} \right) \left[ \log Y_{n-j+1,n} - \log Y_{n-k,n}\right]}{ \frac{1}{k} \sum _{j=1} ^k \left[ \log \frac{\sum_{i=1} ^{k+1} w_{(i \leq k+1)}}{\sum_{i=1} ^j w_{(i \leq j)}} \right]^2 }. \end{eqnarray}$