In statistics and econometrics, set identification (or partial identification) extends the concept of identifiability (or "point identification") in statistical models to environments where the model and the distribution of observable variables are not sufficient to determine a unique value for the model parameters, but instead constrain the parameters to lie in a strict subset of the parameter space. Statistical models that are set (or partially) identified arise in a variety of settings in economics, including game theory and the Rubin causal model. Unlike approaches that deliver point identification of the model parameters, methods from the literature on partial identification are used to obtain set estimates that are valid under weaker modelling assumptions.[1]
Partial identification continues to be a major theme in research in econometrics. Powell (2017) named partial identification as an example of theoretical progress in the econometrics literature, and Bonhomme & Shaikh (2017) list partial identification as “one of the most prominent recent themes in econometrics.”
Definition
Let $U$ denote a vector of latent variables, let $Z$ denote a vector of observed (possibly endogenous) explanatory variables, and let $Y$ denote a vector of observed endogenous outcome variables. A structure is a pair $s = (h, \mathcal{P}_{U|Z})$, where $\mathcal{P}_{U|Z}$ represents a collection of conditional distributions, and $h$ is a structural function such that $h(y, z, u) = 0$ for all realizations $(y, z, u)$ of the random vectors $(Y, Z, U)$. A model is a collection of admissible (i.e. possible) structures $s$.[2][3]
Let $\mathcal{P}_{Y|Z}(s)$ denote the collection of conditional distributions of $Y \mid Z$ consistent with the structure $s$. The admissible structures $s$ and $s'$ are said to be observationally equivalent if $\mathcal{P}_{Y|Z}(s) = \mathcal{P}_{Y|Z}(s')$.[2][3] Let $s^\star$ denote the true (i.e. data-generating) structure. The model is said to be point-identified if for every admissible $s \neq s^\star$ we have $\mathcal{P}_{Y|Z}(s) \neq \mathcal{P}_{Y|Z}(s^\star)$. More generally, the model is said to be set (or partially) identified if there exists at least one admissible $s \neq s^\star$ such that $\mathcal{P}_{Y|Z}(s) \neq \mathcal{P}_{Y|Z}(s^\star)$. The identified set of structures is the collection of admissible structures that are observationally equivalent to $s^\star$.[4]
In most cases the definition can be substantially simplified. In particular, when $U$ is independent of $Z$ and has a distribution that is known up to some finite-dimensional parameter, and when $h$ is known up to some finite-dimensional vector of parameters, each structure $s$ can be characterized by a finite-dimensional parameter vector $\theta \in \Theta$. If $\theta_0$ denotes the true (i.e. data-generating) vector of parameters, then the identified set, often denoted as $\Theta_I$, is the set of parameter values that are observationally equivalent to $\theta_0$.[4]
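The simplified definition can be written compactly. The notation below is assumed for illustration rather than taken from the cited sources: $\theta$ is a parameter vector in the parameter space $\Theta$, $\theta_0$ is the true value, and $\mathcal{P}_{Y|Z}(\theta)$ is the collection of conditional distributions of the outcomes given the explanatory variables that is consistent with $\theta$.

```latex
% Identified set: all parameter values observationally equivalent to the truth.
\Theta_I \;=\; \bigl\{\, \theta \in \Theta \;:\;
    \mathcal{P}_{Y|Z}(\theta) = \mathcal{P}_{Y|Z}(\theta_0) \,\bigr\}

% Point identification is the special case in which the set is a singleton:
\Theta_I = \{\theta_0\}
```

Under this notation, $\theta_0$ always belongs to $\Theta_I$, and the data are informative about the parameters whenever $\Theta_I$ is a strict subset of $\Theta$.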
Example: missing data
This example is due to Tamer (2010). Suppose there are two binary random variables, $Y$ and $Z$. The econometrician is interested in $P(Y = 1)$. There is a missing data problem, however: $Y$ can only be observed if $Z = 1$.
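A short sketch of how bounds arise in this setting, assuming the quantity of interest is P(Y = 1) and that Y is observed exactly when Z = 1. By the law of total probability, P(Y = 1) = P(Y = 1 | Z = 1) P(Z = 1) + P(Y = 1 | Z = 0) P(Z = 0). The data reveal every term except P(Y = 1 | Z = 0), which can be anything in [0, 1], so without further assumptions P(Y = 1, Z = 1) ≤ P(Y = 1) ≤ P(Y = 1, Z = 1) + P(Z = 0). Bounds of this kind are often called worst-case bounds. The function name `worst_case_bounds` and the sample data are hypothetical, and the code computes sample analogues only, ignoring sampling uncertainty.

```python
from typing import Optional, Sequence, Tuple


def worst_case_bounds(
    y: Sequence[Optional[int]], z: Sequence[int]
) -> Tuple[float, float]:
    """Sample-analogue worst-case bounds on P(Y = 1) when Y is seen only if Z = 1.

    y[i] is the 0/1 outcome and is read only where z[i] == 1;
    entries with z[i] == 0 are treated as missing (any placeholder, e.g. None).
    """
    if len(y) != len(z) or len(z) == 0:
        raise ValueError("y and z must be non-empty and of equal length")

    n = len(z)
    # Units with Y observed and equal to 1: estimates P(Y = 1, Z = 1).
    n_observed_ones = sum(1 for yi, zi in zip(y, z) if zi == 1 and yi == 1)
    # Units with Y missing: estimates P(Z = 0).
    n_missing = sum(1 for zi in z if zi == 0)

    lower = n_observed_ones / n                # every missing Y set to 0
    upper = (n_observed_ones + n_missing) / n  # every missing Y set to 1
    return lower, upper


if __name__ == "__main__":
    # Hypothetical sample: three observed outcomes, two missing.
    y_sample = [1, 0, 1, None, None]
    z_sample = [1, 1, 1, 0, 0]
    print(worst_case_bounds(y_sample, z_sample))
```

The width of the interval equals the estimated share of missing observations, so the bounds collapse to a single point only when no data are missing.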
References
Bonhomme, Stephane; Shaikh, Azeem (2017). "Keeping the econ in econometrics: (micro-)econometrics in the Journal of Political Economy". The Journal of Political Economy. 125 (6): 1846–1853. doi:10.1086/694620.
Frisch, Ragnar (1934). Statistical Confluence Analysis by means of Complete Regression Systems. University Institute of Economics, Oslo.
Manski, Charles (1989). "Anatomy of the Selection Problem". The Journal of Human Resources. 24 (3): 343–360. doi:10.2307/145818.
Manski, Charles (1990). "Nonparametric Bounds on Treatment Effects". The American Economic Review. 80 (2): 319–323. JSTOR 2006592.
Marschak, Jacob; Andrews, Williams (1944). "Random Simultaneous Equations and the Theory of Production". Econometrica. 12 (3/4). The Econometric Society: 143–205. doi:10.2307/1905432.
Powell, James (2017). "Identification and Asymptotic Approximations: Three Examples of Progress in Econometric Theory". Journal of Economic Perspectives. 31 (2): 107–124. doi:10.1257/jep.31.2.107.