Skip to contents

Should the entire income (or wealth) distribution be available, we can compute the top income shares. We do so by pasting the Pareto-like tail (obtained from beyondpareto) to the empirical distribution.

The GB2 Benchmark

We use the GB2 again with \(X \sim F(.;a,b,p,q)\), so that \(\gamma = 1 / (bq)\).

It is well know (see e.g. Kleiber and Kotz (2003), equation 6.23) that for the GB2 distribution the income shares can simply be recovered from the appropriately re-parametrised CDF, i.e. \(1 -F(Q(t); a, b, p + 1/a, q - 1/a)\) for the top \((1-t) \times 100\)% where \(Q(t)\) denotes the \(t\)-quantile of the GB2 distribution. For the chosen parameters, this analytical solution yields a top 10% income share of 26.887%, and a top 1% share of 6.44%.

The next chunk generates samples from a GB2, computes the theoretical income shares.

beyondpareto_GB2helper (toggle to un/fold)
program beyondpareto_GB2helper, eclass sortpreserve
*! part of the beyondpareto suite
*! Christian Schluter, Isabella Retter
*! To learn more about beyondpareto see the vignette at
*!     https://christianschluter.github.io/beyondpareto/
    version 11.2
    
    syntax  ,         ///
    [b (real 5.18)    ///
     a (real 32754)   ///
     p (real 0.518)   ///
     q (real 0.509)   ///
       rv (string)]
      
    ereturn scalar gamma_GB2 = 1 / (`b' *  `q')
      tempname income
    if "`rv'" !=""  {
           qui gen double `income' =  `a'*( (1/invibeta(`p',`q',runiform()))-1  )^(-1/`b')
           qui rename `income' `rv'
    }
    tempname perc   
    matrix `perc' = (0.01, 0.05, 0.10, 0.15, 0.20, 0.25, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)'
    svmat `perc', name(___)
    tempvar percentage tempshares shares pop_share_val
    qui gen `percentage' = ___1
    drop ___1
    qui gen `tempshares' =  `a'*( (1/invibeta(`p',`q',1-`percentage'))-1  )^(-1/`b')
    qui gen `pop_share_val' = (1 - ibeta(`p' + 1 / `b' ,`q' - 1 / `b', (`tempshares'/`a')^`b'/(1+(`tempshares'/`a')^`b') ) )*100
    qui replace `percentage' = (`percentage' * 100)
    mkmat `percentage' `pop_share_val', matrix(`shares') nomissing
    matrix colnames `shares' = "top [%]" "inc share"        
    ereturn matrix shares_GB2 = `shares'
    
end  
qui set obs 140000
set seed 330033
beyondpareto_GB2helper, rv(income)
di e(gamma_GB2)
.37927346
matrix temp =  e(shares_GB2)
matrix comp = temp[1..3,1..2]
matrix drop temp

matlist e(shares_GB2)
             |   top [%]  inc share 
-------------+---------------------
          r1 |         1   6.444202 
          r2 |         5   17.49657 
          r3 |        10   26.88753 
          r4 |        15    34.5484 
          r5 |        20   41.24623 
          r6 |        25   47.29004 
          r7 |        30   52.84112 
          r8 |        40   62.81477 
          r9 |        50   71.60019 
         r10 |        60   79.39153 
         r11 |        70   86.26047 
         r12 |        80    92.1821 
         r13 |        90   96.99409 

Next we estimate the tail index using beyondpareto.

beyondpareto income, nrange(18000, 19000) rho(-0.5)
No sampling weights given. Using w=1.
Considering the top 19000 of 140000 values, starting from the first 18000 and testing 1001
>  values for k base.

----------------------------------------------------------------------
  Results |      Ybase       gamma        S.E.  [95% Conf.   Interval]
----------+-----------------------------------------------------------
   income |  59026.925   .38291478    .0031064    .3768262    .3890034
----------------------------------------------------------------------
Optimal k base: 18993

Estimating the top income shares

To this end, one can use the ado function beyondpareto_topshares.

beyondpareto_topshares income
Using the following results for gamma: .38291478, kbase: 18993 and Ybase: 59026.925
No sampling weights given. Using w=1.
matrix list e(shares)
e(shares)[3,2]
    top % sample    share in %
r1             1     6.5026712
r2             5     17.555607
r3            10     26.926321

Note that only top income shares above Ybase are computed.

Theoretical and sample top income shares are very close:

matrix comp = comp,e(shares)
matrix list comp 
comp[3,4]
         top [%]     inc share  top % sample    share in %
r1             1     6.4442019             1     6.5026712
r2             5     17.496574             5     17.555607
r3            10     26.887531            10     26.926321

The Burr case

The Burr case allows analytical statements. But since the Burr distribution is nested within the GB2 distribution, an alternative computational approach is simply to exploit this feature directly.

The helper function beyondpareto_Burrhelper codes the direct computations.

program beyondpareto_Burrhelper, eclass sortpreserve
*! part of the beyondpareto suite
*! Christian Schluter, Isabella Retter
*! To learn more about beyondpareto see the vignette at
*!     https://christianschluter.github.io/beyondpareto/
    version 11.2
    
    syntax, [gam (real 0.6) rho (real -2) rv(string)]
    local a = -`rho' / `gam'
    local q = -1 / `rho'
    ereturn scalar gamma_Burr = `gam'
    tempname income
    if "`rv'" != "" {
       qui gen double `income' = ((runiform())^(-1/`q') - 1)^(1/`a')  
       qui rename `income' `rv'
      }
    local aa = 1 + 1/`a'
    local bb = `q' - 1/`a'
    tempname perc   
    matrix `perc' = (0.01, 0.05, 0.10, 0.15, 0.20, 0.25, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)'
    svmat `perc', name(___)
    tempvar percentage tempshares shares pop_share_val
    qui gen `percentage' = ___1
    drop ___1

    qui gen `tempshares' = 1 - (`percentage')^(1/`q')
    qui gen `pop_share_val' = (1-ibeta(`aa',`bb',`tempshares') ) * 100
    
    qui replace `percentage' = (`percentage' * 100)
    mkmat `percentage' `pop_share_val', matrix(`shares') nomissing
    matrix colnames `shares' = "top [%]" "inc share"        
    ereturn matrix shares_Burr = `shares'
end
clear 
qui set obs 20000
set seed 330033

beyondpareto_Burrhelper, rv(income)
di e(gamma_Burr)
matlist e(shares_Burr)
.6

             |   top [%]  inc share 
-------------+---------------------
          r1 |         1   17.04568 
          r2 |         5    32.4441 
          r3 |        10   42.79416 
          r4 |        15   50.29764 
          r5 |        20   56.38168 
          r6 |        25   61.57469 
          r7 |        30    66.1389 
          r8 |        40   73.93049 
          r9 |        50    80.4332 
         r10 |        60   85.96696 
         r11 |        70   90.69451 
         r12 |        80   94.68317 
         r13 |        90   97.90305 

In contrast to beyondpareto_Burrhelper, the helper function `beyondpareto_Burrhelper2 exploits the fact that the Burr distribution is nested within the GB2 distribution.

program beyondpareto_Burrhelper2, eclass sortpreserve
*! part of the beyondpareto suite
*! Christian Schluter, Isabella Retter
*! To learn more about beyondpareto see the vignette at
*!     https://christianschluter.github.io/beyondpareto/
    version 11.2
    
    syntax, [gam (real 0.6) rho (real -2)]
    local a = -`rho' / `gam'
    local q = -1 / `rho'
    beyondpareto_GB2helper, a(1) b(`a')  p(1.0) q(`q')
end
set seed 330033
beyondpareto_Burrhelper2
di e(gamma_GB2)

matrix comp = e(shares_Burr),e(shares_GB2)
matlist comp
.6

             |   top [%]  inc share    top [%]  inc share 
-------------+-------------------------------------------
          r1 |         1   17.04568          1   17.04512 
          r2 |         5    32.4441          5   32.44411 
          r3 |        10   42.79416         10   42.79416 
          r4 |        15   50.29764         15   50.29765 
          r5 |        20   56.38168         20   56.38167 
          r6 |        25   61.57469         25   61.57469 
          r7 |        30    66.1389         30    66.1389 
          r8 |        40   73.93049         40   73.93048 
          r9 |        50    80.4332         50    80.4332 
         r10 |        60   85.96696         60   85.96696 
         r11 |        70   90.69451         70   90.69452 
         r12 |        80   94.68317         80   94.68317 
         r13 |        90   97.90305         90   97.90305 

Finally, deploying beyondpareto_topshares

beyondpareto income, nrange(3000, 6000) rho(-0.5)
beyondpareto_topshares income

matrix comp2 = comp[1..4,1..2],e(shares)
matlist comp2
No sampling weights given. Using w=1.
Considering the top 6000 of 20000 values, starting from the first 3000 and testing 3001 va
> lues for k base.

----------------------------------------------------------------------
  Results |      Ybase       gamma        S.E.  [95% Conf.   Interval]
----------+-----------------------------------------------------------
   income |  2.8018258   .60229767    .0113035    .5801428    .6244525
----------------------------------------------------------------------
Optimal k base: 3549

Using the following results for gamma: .60229767, kbase: 3549 and Ybase: 2.8018258
No sampling weights given. Using w=1.

             |   top [%]  inc share  top % s~e  share i~% 
-------------+-------------------------------------------
          r1 |         1   17.04568          1   17.16358 
          r2 |         5    32.4441          5   32.55292 
          r3 |        10   42.79416         10   42.88548 
          r4 |        15   50.29764         15   50.38975 

Some Theory

Assuming that the Pareto QQ plot becomes approximately linear from the \(k\)’s largest observation, \(X_{n-k+1,n} \equiv w_{min}\), the complete income (or wealth) distribution \(F\) is, for \(x > w_{min}\),

\[ F(x)=p + (1-p) \left(1-\left(\frac{x}{w_{min}} \right)^{-1/\gamma}\right) \] with \(p=\hat{F}(w_{min})\) being the empirical CDF of the survey data. Upon inversion, an upper quantile is \[ Q(u)= \left(\frac{1-u}{1-p}\right)^{-\gamma} w_{min}, \quad \quad (u>p). \]

In the unweighted case, the resulting well-known (see, e.g., Embrechts et al., 1997, p. 348) estimates of the tail of the wealth distributions and the top quantile estimate are then, with \(1-p=k/n\), \[ 1-\hat{F} (x) = \left(\frac{k}{n} \right) \left( \frac{x}{X_{n-k+1,n}} \right) ^{-1/\hat{\gamma}} , \quad \quad \hat{x}_u = \left( \frac{n}{k} (1-u) \right) ^{-\hat{\gamma}} X_{n-k+1,n}. \]
Taking into account the survey sampling weights \(\omega_i\) for household \(i\) enumerated from the poorest to the richest, we have \[ p =\frac{\sum_{i=1}^{n-k} \omega_i}{\sum_{i=1}^{n} \omega_i} . \]

Expected wealth then is simply \(E(X) = p E_{cens} + (1-p) \left(\frac{\alpha}{\alpha-1} \right) w_{min}\) with \(\alpha = 1/\gamma\) and \(E_{cens}\) being the empirical mean wealth in the survey data conditional on wealth not exceeding \(w_{min}\). The wealth share of the top \(t 100\%\) then is, with \(1-t=u>p\), \[ t^{1-1/\alpha} \left(\frac{\alpha}{\alpha-1} \right) w_{min} (1-p)^{1/\alpha}/ E(X). \] The so-called inverted Pareto coefficient \(E(W| W > w)/w\) with \(w\) equal to the top \(t\) quantile \(x_{1-t}\) and \(1-t=u>p\) is \(\alpha/(\alpha-1)\).

Session Information

display "STATA version: `c(stata_version)' running on `c(os)' `c(osdtl)' "
STATA version: 18 running on MacOSX 14.1.0 
which beyondpareto
*! beyondpareto.ado, version 1.0, 2024-06-14
*! Christian Schluter and Isabella Retter
*! To learn more about beyondpareto see the vignette at  
*!     https://christianschluter.github.io/beyondpareto/
which beyondpareto_topshares
*! part of the beyondpareto suite
*! Christian Schluter and Isabella Retter
*! To learn more about beyondpareto see the vignette at
*!     https://christianschluter.github.io/beyondpareto/