Top-coded
Encyclopedia
In econometrics
Econometrics
Econometrics has been defined as "the application of mathematics and statistical methods to economic data" and described as the branch of economics "that aims to give empirical content to economic relations." More precisely, it is "the quantitative analysis of actual economic phenomena based on...

 and statistics
Statistics
Statistics is the study of the collection, organization, analysis, and interpretation of data. It deals with all aspects of this, including the planning of data collection in terms of the design of surveys and experiments....

, a top-coded dataset is one for which the upper bound is not known. This is often done to preserve the anonymity of people participating in the survey (for example, if a survey included a person with wealth of $51 billion, it would not be anonymous because people would know it is Bill Gates
Bill Gates
William Henry "Bill" Gates III is an American business magnate, investor, philanthropist, and author. Gates is the former CEO and current chairman of Microsoft, the software company he founded with Paul Allen...

).

Example: Top-coding of wealth

notes>
id age income
1 26 24778 exact value
2 32 26750 exact value
3 45 26780 exact value
4 32 30000+ top coded
5 45 30000+ top coded


Jacob S. Hacker and Paul Pierson argue that the practice of top-coding, or capping the reported maximum value on tax returns ostensibly to protect the earner's anonymity, complicates the analysis of the distribution of wealth in the United States.

Implications for ordinary least squares
Ordinary least squares
In statistics, ordinary least squares or linear least squares is a method for estimating the unknown parameters in a linear regression model. This method minimizes the sum of squared vertical distances between the observed responses in the dataset and the responses predicted by the linear...

  • If the lower bound of the top-coded group is used as a regressor value (30000 in the example above), OLS is biased and inconsistent.
  • The top-coded group can be omitted from the regression entirely. Provided there are no systematic differences between the omitted group and the included groups, OLS is consistent and unbiased.
  • The Tobit
    Tobit model
    The Tobit model is a statistical model proposed by James Tobin to describe the relationship between a non-negative dependent variable y_i and an independent variable x_i....

    procedure is robust to top coding, and gives unbiased estimates.
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK