Submission declined on 10 October 2023 by Phuzion (talk).
Covariate shift is a phenomenon in machine learning and statistics where the distribution of input features (covariates) changes between the training and test datasets, usually affecting the performance of a machine learning model.[1] It is a common challenge in real-world applications, as models are often trained on historical data and expected to generalize to new, unseen data.[2] Covariate shift can lead to decreased model performance or even model failure,[3] as it violates the assumption that training and test data follow the same distribution.
Covariate shift is also referred to as domain shift and is a special case of dataset shift where only the covariates (inputs) are changing. That is, only $P(X)$ changes, while $P(Y \mid X)$ remains the same. This is distinct from both label shift (where $P(Y)$ changes) and concept drift (where $P(Y \mid X)$ changes).[4]
Mathematical definition
Pure covariate shift occurs when the distribution of input features changes between the training and test data, while the conditional distribution of the target variable given the input features remains the same.[5] Let $P_{\text{train}}(X)$ denote the distribution of input features in the training data and $P_{\text{test}}(X)$ denote the distribution in the test data. Covariate shift is defined as:

$$P_{\text{train}}(X) \neq P_{\text{test}}(X) \quad \text{and} \quad P_{\text{train}}(Y \mid X = x) = P_{\text{test}}(Y \mid X = x) \;\; \text{for all } x \in \mathcal{X},$$

where $X$ represents the input features, $Y$ represents the target variable, and $\mathcal{X}$ is the feature space.[6]
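This definition can be illustrated with a minimal NumPy sketch (all values and names here are illustrative, not from any source): the input distributions differ between training and test, while the input-output relationship stays fixed.

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x, noise):
    # P(Y | X) is the same everywhere: y = 2x + 1 plus small noise
    return 2 * x + 1 + noise

# P_train(X) and P_test(X) differ: the test inputs are shifted
x_train = rng.normal(loc=0.0, scale=1.0, size=1000)
x_test = rng.normal(loc=3.0, scale=1.0, size=1000)

y_train = target(x_train, rng.normal(0, 0.1, 1000))
y_test = target(x_test, rng.normal(0, 0.1, 1000))

print(x_train.mean(), x_test.mean())  # the input means differ (covariate shift)
```

A model fit only on `x_train` would never see inputs near 3 and could extrapolate poorly on the test set, even though the underlying relationship never changed.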
Measuring covariate shift
Covariate shift is usually measured using statistical distances, divergences, and two-sample tests. Some measurement methods work on continuous features, others on categorical features, and some on both. Additionally, some methods are capable of measuring univariate drift while others are capable of measuring multivariate drift.
Maximum mean discrepancy (MMD) (Continuous): MMD is a kernel-based method that measures the distance between two probability distributions by comparing the means of their samples in a reproducing kernel Hilbert space.[7] MMD provides a symmetric, non-negative measure of the difference between the training and test distributions, with higher values indicating a greater degree of covariate shift.
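As a hedged sketch of the idea, the biased MMD estimator with a Gaussian RBF kernel can be written in a few lines of NumPy (sample sizes, the kernel bandwidth `gamma`, and the example distributions are illustrative choices):

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise squared Euclidean distances via broadcasting, then Gaussian kernel
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def mmd_squared(X, Y, gamma=1.0):
    """Biased estimate of squared MMD between samples X and Y."""
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2 * rbf_kernel(X, Y, gamma).mean())

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(500, 2))
test_same = rng.normal(0.0, 1.0, size=(500, 2))   # same distribution
test_shift = rng.normal(2.0, 1.0, size=(500, 2))  # mean-shifted inputs

print(mmd_squared(train, test_same))   # small
print(mmd_squared(train, test_shift))  # substantially larger
```

In practice the kernel bandwidth is often set by a heuristic such as the median pairwise distance rather than fixed in advance.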
Wasserstein distance (Continuous and Categorical): Also known as the Earth mover's distance, the Wasserstein distance quantifies the difference between two probability distributions by measuring the minimum cost required to transform one distribution into the other.[8] This metric provides a symmetric and non-negative measure of the divergence between the training and test distributions, with higher values indicating a more substantial degree of covariate shift.
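For one-dimensional samples, SciPy provides `scipy.stats.wasserstein_distance`; the sketch below (sample sizes and the 0.5 mean shift are illustrative) shows how a shift in a single feature registers as a nonzero distance:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 2000)
test = rng.normal(0.5, 1.0, 2000)  # same shape, mean shifted by 0.5

# For two 1-D samples of equal spread, the distance roughly recovers the mean shift
d = wasserstein_distance(train, test)
print(d)
```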
Hellinger distance (Continuous and Categorical): The Hellinger distance is another symmetric measure of the difference between two probability distributions. It is derived from the Bhattacharyya coefficient, a measure of the similarity between two probability distributions. The Hellinger distance is defined, up to a normalizing factor of $1/\sqrt{2}$, as the square root of the sum of the squared differences between the square roots of the probabilities in the two distributions. Like other statistical distances, the Hellinger distance is non-negative, with higher values indicating a more significant divergence between the training and test distributions.
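For discrete distributions this definition translates directly into code; a minimal NumPy sketch follows (the category frequencies are illustrative):

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # sqrt of sum of squared differences of sqrt-probabilities, normalized by sqrt(2)
    return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)) / np.sqrt(2)

p = [0.6, 0.3, 0.1]  # e.g. category frequencies in the training data
q = [0.4, 0.4, 0.2]  # category frequencies in the test data
print(hellinger(p, q))
```

With this normalization the distance lies in $[0, 1]$: 0 for identical distributions and 1 for distributions with disjoint support.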
Jensen-Shannon distance (Continuous and Categorical): The Jensen-Shannon distance is derived from the JS divergence by applying a transformation to obtain a true distance metric that satisfies the properties of non-negativity, identity of indiscernibles, symmetry, and the triangle inequality. Specifically, the Jensen-Shannon distance is defined as the square root of the JS divergence:

$$D_{\text{JS}}(P, Q) = \sqrt{\mathrm{JSD}(P \parallel Q)}$$
Kullback-Leibler (KL) divergence (Continuous and Categorical): KL divergence is a measure of the difference between two probability distributions. It can be used to compare the training distribution q(x) and test distribution p(x), providing a non-negative value that quantifies the dissimilarity between the two distributions. A higher KL divergence value indicates a more significant degree of covariate shift. However, KL divergence is not symmetric: the divergence from q(x) to p(x) may not equal the divergence from p(x) to q(x).
Jensen-Shannon (JS) divergence (Continuous and Categorical): The JS divergence is a symmetric measure of the difference between two probability distributions, derived from the KL divergence. It can be interpreted as the average of the KL divergences between each distribution and the mixture of the two. The JS divergence is non-negative, with higher values indicating a greater degree of dissimilarity between the training and test distributions. Unlike the KL divergence, the JS divergence is symmetric, providing a measure that does not depend on the order of the two distributions.
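Both quantities are available in SciPy: `scipy.stats.entropy(p, q)` computes the KL divergence, and `scipy.spatial.distance.jensenshannon` returns the JS distance (the square root of the JS divergence). A short sketch with illustrative category frequencies:

```python
import numpy as np
from scipy.stats import entropy
from scipy.spatial.distance import jensenshannon

p = np.array([0.6, 0.3, 0.1])  # training distribution (illustrative)
q = np.array([0.4, 0.4, 0.2])  # test distribution (illustrative)

kl_pq = entropy(p, q)  # KL(p || q)
kl_qp = entropy(q, p)  # KL(q || p): generally different, since KL is asymmetric

js_distance = jensenshannon(p, q)  # square root of the JS divergence
js_divergence = js_distance ** 2

print(kl_pq, kl_qp, js_distance)
```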
Kolmogorov-Smirnov test (Continuous): The Kolmogorov-Smirnov test is a non-parametric statistical hypothesis test used to assess whether two samples come from the same underlying distribution. This test provides a p-value, which can be used to determine the presence of covariate shift. A small p-value (typically below a predetermined significance level, such as 0.05) indicates that the training and test distributions are significantly different, suggesting the presence of covariate shift. The test is designed for continuous distributions and is not appropriate for categorical features.
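The two-sample version of the test is available as `scipy.stats.ks_2samp`; the sketch below (sample sizes, the 0.8 shift, and the 0.05 threshold are illustrative choices) applies it to a single shifted feature:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 1000)
test = rng.normal(0.8, 1.0, 1000)  # the feature's distribution has shifted

stat, p_value = ks_2samp(train, test)
if p_value < 0.05:
    print("covariate shift detected in this feature")
```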
Chi-Squared Test (Categorical): The Chi-Squared Test is a statistical method for detecting covariate shift in categorical features. It evaluates the association between the categorical variables representing the training and test distributions by comparing their observed frequencies in a contingency table to the expected frequencies under the assumption of independence. The test assesses the null hypothesis that there is no significant difference between the training and test distributions. If the null hypothesis is rejected, it suggests the presence of covariate shift. The Chi-Squared Test is applicable only for categorical variables and requires a sufficient sample size and minimum expected frequencies in the contingency table.
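The contingency-table form of the test is available as `scipy.stats.chi2_contingency`; the sketch below (the category counts are illustrative) stacks the training and test counts into a 2x3 table:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts per category in the training and test sets (illustrative)
train_counts = np.array([500, 300, 200])
test_counts = np.array([350, 380, 270])

# Rows: dataset (train/test); columns: feature categories
table = np.vstack([train_counts, test_counts])
chi2, p_value, dof, expected = chi2_contingency(table)
print(p_value)  # a small p-value suggests the category frequencies differ
```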
Software

SciPy: SciPy is an open-source library for the Python programming language, widely used for scientific computing and data analysis tasks. It provides tools for conducting statistical tests, such as the Chi-Squared Test and the Kolmogorov-Smirnov test, as well as tools for calculating statistical distances and divergences, all of which can be used to detect the presence of covariate shift between training and test distributions.
NannyML: An open-source Python library for model monitoring that has functionality for detecting univariate and multivariate distribution drift and estimating machine learning model performance without ground-truth labels. NannyML offers statistical tests, statistical distances, and divergences.
Univariate vs. multivariate covariate shift
Covariate shift can occur in different forms depending on the number of features involved. Univariate covariate shift involves a single feature experiencing a change in distribution, whereas multivariate covariate shift can involve multiple features changing simultaneously or alterations in the correlation structure between features.
Univariate covariate shift
Univariate covariate shift occurs when the distribution of a single feature changes between the training and test datasets. As it involves only one dimension, univariate covariate shift is generally simpler to detect and address compared to its multivariate counterpart. Common techniques for detecting univariate covariate shift include statistical distances such as the Jensen-Shannon distance and the Wasserstein (earth mover's) distance.
Multivariate covariate shift
Multivariate covariate shift arises when the distributions of multiple features change simultaneously between the training and test datasets or when the correlation structure between features is altered. The latter case, where the marginal distributions of individual features remain unchanged but the dependencies among them change, can be particularly challenging to detect and handle. In multivariate covariate shift, the complexity of the distribution shift and potential interactions between features require more advanced techniques for detection.
To address multivariate covariate shift, techniques such as maximum mean discrepancy (MMD) with appropriate kernel functions that consider the relationships between multiple features can be employed.
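The correlation-structure case can be made concrete with a self-contained NumPy sketch (sample sizes, the 0.9 correlation, and the kernel bandwidth are illustrative): both features keep standard normal marginals in the test set, so univariate checks see nothing, yet a kernel MMD on the joint samples still registers the shift.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def mmd_squared(X, Y, gamma=1.0):
    """Biased estimate of squared MMD between samples X and Y."""
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2 * rbf_kernel(X, Y, gamma).mean())

rng = np.random.default_rng(0)
n = 1000

# Training data: two independent standard normal features
train = rng.normal(size=(n, 2))

# Test data: identical N(0, 1) marginals, but strongly correlated features
z1, z2 = rng.normal(size=n), rng.normal(size=n)
test = np.column_stack([z1, 0.9 * z1 + np.sqrt(1 - 0.9**2) * z2])

baseline = rng.normal(size=(n, 2))   # same joint distribution as the training data
print(mmd_squared(train, baseline))  # small
print(mmd_squared(train, test))      # larger: the correlation shift is detected
```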
Internal covariate shift
The term internal covariate shift was introduced in "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift".[9] Internal covariate shift occurs when the distribution of the inputs to a given hidden layer in a neural network shifts as the parameters of a previous layer change. It is hypothesized that batch normalization can reduce internal covariate shift;[9] however, this is contested.[10]
Difference between covariate shift and concept drift
Covariate shift and concept drift are two related but distinct phenomena in machine learning, both of which involve changes in the underlying data distribution. They can occur independently or simultaneously, and both can negatively impact the performance of machine learning models.
The main difference between covariate shift and concept drift is that covariate shift refers to changes in the distribution of input features between the training and test datasets, while concept drift involves changes in the relationship between input features and the target variable over time. In covariate shift, the underlying relationship between the features and the target remains constant, whereas, in concept drift, this relationship itself changes due to evolving processes or external factors.
References

^ Quiñonero-Candela, Joaquin, ed. (2009). Dataset Shift in Machine Learning. Neural Information Processing series. Cambridge, Mass.: MIT Press. ISBN 978-0-262-17005-5.
^ Gretton, Arthur; Borgwardt, Karsten M.; Rasch, Malte J.; Schölkopf, Bernhard; Smola, Alexander (2012). "A Kernel Two-Sample Test" (PDF). The Journal of Machine Learning Research. 13: 723–773.
^ a b Ioffe, Sergey; Szegedy, Christian (2015-03-02). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv:1502.03167.
^ Santurkar, Shibani; Tsipras, Dimitris; Ilyas, Andrew; Madry, Aleksander (2019-04-14). How Does Batch Normalization Help Optimization?. arXiv:1805.11604.