Extras: Top Income Shares
extras_top_income_shares.Rmd
Should the entire income (or wealth) distribution be available, we
can compute the top income shares. We do so by pasting the Pareto-like
tail (obtained from beyondpareto
) to the empirical
distribution.
The GB2 Benchmark
We use the GB2 again with \(X \sim F(.;a,b,p,q)\), so that \(\gamma = 1 / (bq)\).
It is well know (see e.g. Kleiber and Kotz (2003), equation 6.23) that for the GB2 distribution the income shares can simply be recovered from the appropriately re-parametrised CDF, i.e. \(1 -F(Q(t); a, b, p + 1/a, q - 1/a)\) for the top \((1-t) \times 100\)% where \(Q(t)\) denotes the \(t\)-quantile of the GB2 distribution. For the chosen parameters, this analytical solution yields a top 10% income share of 26.887%, and a top 1% share of 6.44%.
The next chunk generates samples from a GB2, computes the theoretical income shares.
beyondpareto_GB2helper
(toggle to un/fold)
program beyondpareto_GB2helper, eclass sortpreserve
*! part of the beyondpareto suite
*! Christian Schluter, Isabella Retter
*! To learn more about beyondpareto see the vignette at
*! https://christianschluter.github.io/beyondpareto/
version 11.2
syntax , ///
[b (real 5.18) ///
a (real 32754) ///
p (real 0.518) ///
q (real 0.509) ///
rv (string)]
ereturn scalar gamma_GB2 = 1 / (`b' * `q')
tempname income
if "`rv'" !="" {
qui gen double `income' = `a'*( (1/invibeta(`p',`q',runiform()))-1 )^(-1/`b')
qui rename `income' `rv'
}
tempname perc
matrix `perc' = (0.01, 0.05, 0.10, 0.15, 0.20, 0.25, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)'
svmat `perc', name(___)
tempvar percentage tempshares shares pop_share_val
qui gen `percentage' = ___1
drop ___1
qui gen `tempshares' = `a'*( (1/invibeta(`p',`q',1-`percentage'))-1 )^(-1/`b')
qui gen `pop_share_val' = (1 - ibeta(`p' + 1 / `b' ,`q' - 1 / `b', (`tempshares'/`a')^`b'/(1+(`tempshares'/`a')^`b') ) )*100
qui replace `percentage' = (`percentage' * 100)
mkmat `percentage' `pop_share_val', matrix(`shares') nomissing
matrix colnames `shares' = "top [%]" "inc share"
ereturn matrix shares_GB2 = `shares'
end
.37927346
| top [%] inc share
-------------+---------------------
r1 | 1 6.444202
r2 | 5 17.49657
r3 | 10 26.88753
r4 | 15 34.5484
r5 | 20 41.24623
r6 | 25 47.29004
r7 | 30 52.84112
r8 | 40 62.81477
r9 | 50 71.60019
r10 | 60 79.39153
r11 | 70 86.26047
r12 | 80 92.1821
r13 | 90 96.99409
Next we estimate the tail index using beyondpareto
.
No sampling weights given. Using w=1.
Considering the top 19000 of 140000 values, starting from the first 18000 and testing 1001
> values for k base.
----------------------------------------------------------------------
Results | Ybase gamma S.E. [95% Conf. Interval]
----------+-----------------------------------------------------------
income | 59026.925 .38291478 .0031064 .3768262 .3890034
----------------------------------------------------------------------
Optimal k base: 18993
Estimating the top income shares
To this end, one can use the ado function
beyondpareto_topshares
.
Using the following results for gamma: .38291478, kbase: 18993 and Ybase: 59026.925
No sampling weights given. Using w=1.
e(shares)[3,2]
top % sample share in %
r1 1 6.5026712
r2 5 17.555607
r3 10 26.926321
Note that only top income shares above Ybase
are
computed.
Theoretical and sample top income shares are very close:
comp[3,4]
top [%] inc share top % sample share in %
r1 1 6.4442019 1 6.5026712
r2 5 17.496574 5 17.555607
r3 10 26.887531 10 26.926321
The Burr case
The Burr case allows analytical statements. But since the Burr distribution is nested within the GB2 distribution, an alternative computational approach is simply to exploit this feature directly.
The helper function beyondpareto_Burrhelper
codes the
direct computations.
program beyondpareto_Burrhelper, eclass sortpreserve
*! part of the beyondpareto suite
*! Christian Schluter, Isabella Retter
*! To learn more about beyondpareto see the vignette at
*! https://christianschluter.github.io/beyondpareto/
version 11.2
syntax, [gam (real 0.6) rho (real -2) rv(string)]
local a = -`rho' / `gam'
local q = -1 / `rho'
ereturn scalar gamma_Burr = `gam'
tempname income
if "`rv'" != "" {
qui gen double `income' = ((runiform())^(-1/`q') - 1)^(1/`a')
qui rename `income' `rv'
}
local aa = 1 + 1/`a'
local bb = `q' - 1/`a'
tempname perc
matrix `perc' = (0.01, 0.05, 0.10, 0.15, 0.20, 0.25, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)'
svmat `perc', name(___)
tempvar percentage tempshares shares pop_share_val
qui gen `percentage' = ___1
drop ___1
qui gen `tempshares' = 1 - (`percentage')^(1/`q')
qui gen `pop_share_val' = (1-ibeta(`aa',`bb',`tempshares') ) * 100
qui replace `percentage' = (`percentage' * 100)
mkmat `percentage' `pop_share_val', matrix(`shares') nomissing
matrix colnames `shares' = "top [%]" "inc share"
ereturn matrix shares_Burr = `shares'
end
clear
qui set obs 20000
set seed 330033
beyondpareto_Burrhelper, rv(income)
di e(gamma_Burr)
matlist e(shares_Burr)
.6
| top [%] inc share
-------------+---------------------
r1 | 1 17.04568
r2 | 5 32.4441
r3 | 10 42.79416
r4 | 15 50.29764
r5 | 20 56.38168
r6 | 25 61.57469
r7 | 30 66.1389
r8 | 40 73.93049
r9 | 50 80.4332
r10 | 60 85.96696
r11 | 70 90.69451
r12 | 80 94.68317
r13 | 90 97.90305
In contrast to beyondpareto_Burrhelper
, the helper
function `beyondpareto_Burrhelper2
exploits the fact that
the Burr distribution is nested within the GB2 distribution.
program beyondpareto_Burrhelper2, eclass sortpreserve
*! part of the beyondpareto suite
*! Christian Schluter, Isabella Retter
*! To learn more about beyondpareto see the vignette at
*! https://christianschluter.github.io/beyondpareto/
version 11.2
syntax, [gam (real 0.6) rho (real -2)]
local a = -`rho' / `gam'
local q = -1 / `rho'
beyondpareto_GB2helper, a(1) b(`a') p(1.0) q(`q')
end
set seed 330033
beyondpareto_Burrhelper2
di e(gamma_GB2)
matrix comp = e(shares_Burr),e(shares_GB2)
matlist comp
.6
| top [%] inc share top [%] inc share
-------------+-------------------------------------------
r1 | 1 17.04568 1 17.04512
r2 | 5 32.4441 5 32.44411
r3 | 10 42.79416 10 42.79416
r4 | 15 50.29764 15 50.29765
r5 | 20 56.38168 20 56.38167
r6 | 25 61.57469 25 61.57469
r7 | 30 66.1389 30 66.1389
r8 | 40 73.93049 40 73.93048
r9 | 50 80.4332 50 80.4332
r10 | 60 85.96696 60 85.96696
r11 | 70 90.69451 70 90.69452
r12 | 80 94.68317 80 94.68317
r13 | 90 97.90305 90 97.90305
Finally, deploying beyondpareto_topshares
beyondpareto income, nrange(3000, 6000) rho(-0.5)
beyondpareto_topshares income
matrix comp2 = comp[1..4,1..2],e(shares)
matlist comp2
No sampling weights given. Using w=1.
Considering the top 6000 of 20000 values, starting from the first 3000 and testing 3001 va
> lues for k base.
----------------------------------------------------------------------
Results | Ybase gamma S.E. [95% Conf. Interval]
----------+-----------------------------------------------------------
income | 2.8018258 .60229767 .0113035 .5801428 .6244525
----------------------------------------------------------------------
Optimal k base: 3549
Using the following results for gamma: .60229767, kbase: 3549 and Ybase: 2.8018258
No sampling weights given. Using w=1.
| top [%] inc share top % s~e share i~%
-------------+-------------------------------------------
r1 | 1 17.04568 1 17.16358
r2 | 5 32.4441 5 32.55292
r3 | 10 42.79416 10 42.88548
r4 | 15 50.29764 15 50.38975
Some Theory
Assuming that the Pareto QQ plot becomes approximately linear from the \(k\)’s largest observation, \(X_{n-k+1,n} \equiv w_{min}\), the complete income (or wealth) distribution \(F\) is, for \(x > w_{min}\),
\[ F(x)=p + (1-p) \left(1-\left(\frac{x}{w_{min}} \right)^{-1/\gamma}\right) \] with \(p=\hat{F}(w_{min})\) being the empirical CDF of the survey data. Upon inversion, an upper quantile is \[ Q(u)= \left(\frac{1-u}{1-p}\right)^{-\gamma} w_{min}, \quad \quad (u>p). \]
In the unweighted case, the resulting well-known (see, e.g.,
Embrechts et al., 1997, p. 348) estimates of the tail of the wealth
distributions and the top quantile estimate are then, with \(1-p=k/n\), \[
1-\hat{F} (x) = \left(\frac{k}{n} \right) \left( \frac{x}{X_{n-k+1,n}}
\right) ^{-1/\hat{\gamma}} , \quad \quad
\hat{x}_u = \left( \frac{n}{k} (1-u) \right) ^{-\hat{\gamma}}
X_{n-k+1,n}.
\]
Taking into account the survey sampling weights \(\omega_i\) for household \(i\) enumerated from the poorest to the
richest, we have \[
p =\frac{\sum_{i=1}^{n-k} \omega_i}{\sum_{i=1}^{n} \omega_i} .
\]
Expected wealth then is simply \(E(X) = p E_{cens} + (1-p) \left(\frac{\alpha}{\alpha-1} \right) w_{min}\) with \(\alpha = 1/\gamma\) and \(E_{cens}\) being the empirical mean wealth in the survey data conditional on wealth not exceeding \(w_{min}\). The wealth share of the top \(t 100\%\) then is, with \(1-t=u>p\), \[ t^{1-1/\alpha} \left(\frac{\alpha}{\alpha-1} \right) w_{min} (1-p)^{1/\alpha}/ E(X). \] The so-called inverted Pareto coefficient \(E(W| W > w)/w\) with \(w\) equal to the top \(t\) quantile \(x_{1-t}\) and \(1-t=u>p\) is \(\alpha/(\alpha-1)\).
Session Information
STATA version: 18 running on MacOSX 14.1.0
which beyondpareto
*! beyondpareto.ado, version 1.0, 2024-06-14
*! Christian Schluter and Isabella Retter
*! To learn more about beyondpareto see the vignette at
*! https://christianschluter.github.io/beyondpareto/
which beyondpareto_topshares
*! part of the beyondpareto suite
*! Christian Schluter and Isabella Retter
*! To learn more about beyondpareto see the vignette at
*! https://christianschluter.github.io/beyondpareto/