Transparency, Clarity, Replicability, and Reproducibility

Introduction

  • Papers and seminar presentations are exercises in persuasion.
  • Even if your paper contains brilliant ideas, if they are not presented to the highest professional standard, they will usually go unnoticed.
  • “The insights of your paper will first be judged by how you present them. If your paper is written in an unprofessional manner, [everything] will be viewed with […] skepticism” (Goldin and Katz).
  • Transparency + Clarity → Better Research.

Replication and Reproducibility

  • The ability to replicate a research finding is central to the scientific paradigm, and helps convince us that a particular result is “real”.

  • In the physical sciences, other labs should be able to exactly reproduce what the original authors did and generate the same results.
  • There is a widespread perception that much work in economics is not reproducible, which has contributed to what has become known as a credibility crisis.

  • Practices are (slowly) changing to help put economic research on a more scientific basis.

Examples of evidence on (un)reliability

  • One of the most important recent studies is Camerer et al. (2016), which repeated 18 behavioral economics lab experiments originally published between 2011 and 2014 in the AER and the QJE to assess their replicability.
  • The estimated effects were statistically significant with the same sign in 11 of the 18 replication studies (61.1%).

Chang and Li (2015) systematically tested the reproducibility of 67 macroeconomics papers:

  • Thirty-five articles were published in journals with data and code sharing requirements, but Chang and Li could obtain the data for only 28 of these (80%) from the journal archives.
  • Of the 26 papers in journals without data sharing requirements, Chang and Li were unable to obtain 15 of the datasets (58%).
  • The overall replication success rate was 29 of 67 (43%), or 29 of 61 (48%) among papers using non-proprietary datasets, so roughly half.

Pure and statistical replication

But what exactly is replication?

Perhaps surprisingly, multiple definitions are used by different scholars, across fields and over time.

Hamermesh (2007) proposes a distinction between:

Pure replication: an exercise to verify that the same results are obtained when one uses the same data and the same methods. Such an exercise can uncover errors in the original analysis.

Statistical replication: an exercise using alternative methods and/or data to test the same hypothesis. Perhaps a better label is reproducibility or re-analysis.

Example: Feldstein (1974, JPE)

An example of a high-profile pure replication controversy:

  • Leimer and Lesnoy (1982) found a coding error in the famous Feldstein (1974, JPE) paper, which claimed that the expansion of Social Security had reduced private savings by 50% (!), with potentially large adverse consequences for U.S. economic growth.
  • In a pure replication exercise that corrected the error, the original result was overturned.

Example: Acemoglu et al. (2001, AER)

  • Albouy (2012) disputes the construction of the historical data used in the famous Acemoglu, Johnson, and Robinson (2001, AER) paper, which uses historical settler mortality as an instrumental variable (IV) for the rule of law and concludes that “institutions” were the key determinant of comparative economic growth outcomes over hundreds of years.
  • Using Albouy’s modified data (which he claims corrects multiple errors), the IV first stage is weaker, implying that the method is no longer appropriate.

  • These two original papers were extremely influential, with potentially important implications for social science and public policy.

Example: Burnside and Dollar (2000, AER)

  • Another example is Easterly, Levine, and Roodman (2004), commenting on a high-profile paper by Burnside and Dollar (2000, AER):

  • The comment extended the original dataset to some additional countries and a few additional years, so it falls somewhere between pure replication and statistical replication.
  • Its main result was to demonstrate that the original finding (that foreign aid induces growth in developing nations with good macroeconomic policy, but does nothing absent such policy) was not robust to the addition of relatively few data points.
  • Yet replication work remains uncommon in the social sciences: the data from the median empirical paper published in a field journal are not shared at all within a few years of publication, and even for articles in leading journals the data are rarely accessed even 6–7 years later (and, when they are, sometimes probably for other purposes, such as graduate teaching).
  • Hamermesh (2007) proposes a number of explanations for this (e.g., weak incentives), as well as possible “remedies” going forward (e.g., institutionalising replications, changes in social norms and practices).

Current practices have improved considerably, but there are still problems:

  • There is a widespread view that the quality of the materials posted on the AER/AEJ sites is often quite low: the materials are never carefully checked by journal staff, and are often unusable (e.g., variable labels are missing). Hence compliance is not what it ought to be.
  • Many other leading journals in Economics, and even more so in other social science fields, do not have a similar data and code posting requirement.
  • Hamermesh also links reproducibility to a perspective on the limitations of empirical research (one that is often forgotten by authors seeking to advertise the perceived importance of their results):

    "By far the most important justification for scientific replication in non-experimental studies is that one cannot expect econometric results produced for one time period or for one economy to carry over to another. Temporal change in econometric structure may alter the size and even the sign of the effects being estimated, so that the hypotheses we are testing might fail to be refuted with data from another time. This alteration might occur because institutions change, because incentives that are not accounted for in the model change and are not separable from the behaviour on which the model focuses, or, crucially, that even without these changes the behaviour is dependent on random shocks specific to the period over which an economy is observed."

Koenker and Zeileis (2009) on Claerbout’s Principle:

  • “An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.

    We view this as a desirable objective for econometric research.”
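
In that spirit, the sketch below shows what a “complete set of instructions” might look like in practice: a single driver script that regenerates every figure and table in a paper from the raw data. This is a minimal illustration only; the script names and output paths are hypothetical placeholders, not taken from any of the papers above.

```python
"""run_all.py: regenerate every figure and table in the paper from raw data."""
import subprocess
import sys
from pathlib import Path

# Each analysis step is an ordinary script, run in a fixed order.
# All names here are hypothetical placeholders.
PIPELINE = [
    "01_clean_data.py",    # raw data -> analysis dataset
    "02_estimate.py",      # analysis dataset -> regression estimates
    "03_make_outputs.py",  # estimates -> the paper's figures and tables
]

# The exact artifacts the paper reports, so reproduction can be verified.
EXPECTED_OUTPUTS = [Path("output/table1.tex"), Path("output/figure1.pdf")]

for script in PIPELINE:
    print(f"Running {script} ...")
    subprocess.run([sys.executable, script], check=True)  # abort on any error

missing = [p for p in EXPECTED_OUTPUTS if not p.exists()]
if missing:
    sys.exit(f"Reproduction incomplete; missing outputs: {missing}")
print("All figures and tables regenerated.")
```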

Literate programming

  • The technology for full and easy pure replication now exists in the form of notebooks (e.g., Jupyter for R or Stata, RStudio for R).

  • The process of generating the output reported in the paper can now be fully automated (see the sketch after this list).

  • All this helps to make your research more credible.
  • Hamermesh (2007): “our ideas are unlikely to be taken seriously if our empirical research is not credible, so that the likelihood of positive payoffs to our research is enhanced if we maintain our data and records and ensure the possibility of replication.”
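
As a concrete illustration of such automation, the sketch below estimates a simple regression and writes the resulting table directly to the LaTeX file included by the paper, so the reported numbers can never drift from the code that produced them. The file names, variable names, and specification are invented for illustration, and the sketch assumes the pandas and statsmodels packages are available.

```python
"""One step of a fully automated pipeline: estimate a model and write the
exact table that appears in the paper, with no manual copy-and-paste."""
import pandas as pd
import statsmodels.formula.api as smf

# Load the (hypothetical) analysis dataset produced by an earlier cleaning step.
df = pd.read_csv("data/analysis_sample.csv")

# Estimate a simple OLS specification (variable names are placeholders).
model = smf.ols("savings_rate ~ social_security_wealth + income", data=df).fit()

# Write the regression summary straight to the LaTeX file the paper includes,
# so rerunning the script is enough to regenerate the published table.
with open("output/table1.tex", "w") as f:
    f.write(model.summary().as_latex())
```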

References

  • Christensen and Miguel (2018, JEL) “Transparency, Reproducibility, and the Credibility of Economics Research”
  • Hamermesh (2007, CJE) “Viewpoint: Replication in economics”
  • Koenker and Zeileis (2009, JAE) “On Reproducible Econometric Research”