Concept drift

What is concept drift?

In machine learning, predictive modeling, and data mining, concept drift is a change over time in the relationship between input data and output data in the underlying problem. The ‘concept’ in question is the unknown, hidden relationship between input and output variables.

“A difficult problem with learning in many real-world domains is that the concept of interest may depend on some hidden context, not given explicitly in the form of predictive features. Often the cause of change is hidden, not known a priori, making the learning task more complicated.”
— The problem of concept drift: definitions and related work, 2004.

It is a phenomenon in which the statistical properties of the variable that the model is trying to predict change over time. The context changes without the model being aware of the change, so the patterns that the predictive model learned are no longer valid.

Usually, either the law behind the data changes, so the model built on past data can no longer be used, or the assumptions that the model made based on past data need to be revised in light of current data.

What causes concept drift?

The changing relationships between input and output data are what cause concept drift. The properties of the target variables could shift between the static training data and real-world dynamic data. Machine learning models are generally created from training datasets in local or offline environments. After they’re deployed to the real world, it’s possible for the relationships between input and output data to shift and change dynamically. Because of this, the model might not be able to correctly understand emerging trends, which causes a degradation in the model’s effectiveness.

When concept drift occurs in machine learning, the model is left unable to reach the level of accuracy that it achieved during the training process.

As the environment changes, the rules and patterns that the model recognizes might just become obsolete. When the real-world environment of live data deviates from the training environment, the effectiveness of the model will be substantially reduced.

While creating predictive and forecasting models, you should plan for concept drift and actively monitor it. You can also retrain your models on a regular basis to keep up with the evolving and changing data.

How do concept drift changes occur?

The manner in which concept drift takes place plays a role in deciding how the drift should be handled. The concept drift may involve changes that occur:

Suddenly

There is an abrupt shift from an old concept to a new one. For example, the lockdowns triggered by the pandemic caused instantaneous changes in the behavioral patterns of populations around the world.

Gradually

There may be incremental changes from one concept to another as new information comes to light and new concepts emerge. The resulting decline in model quality is usually caused by changes in external factors.

Seasonally

These are recurring changes. For example, buying patterns change during festive seasons and revert to normal after those seasons end.

How do you detect concept drift?

When concept drift takes place, the performance of mining techniques like classification and clustering degrades, because the chance of misclassification rises. Here are a few methods that can be used to detect concept drift in machine learning:

SPC (Statistical Process Control) / Sequential Analysis Concept Drift Detectors

These detectors check whether the predictive model’s error rate is in control, and they alert you when it goes out of control.

Some methods in this family include:

Drift Detection Method (DDM) and Early Drift Detection Method (EDDM): EDDM is more effective at detecting incremental drift, but it is also more sensitive to noise.
LFR (Sequential Analysis)
Page-Hinkley test (PHT): This is usually used to detect a change in the mean of a Gaussian signal.
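
As a rough illustration of how an error-rate detector in this family can work, here is a minimal sketch of the DDM idea using only the Python standard library. It tracks the running error rate p and its standard deviation s, remembers the lowest p + s seen so far, and reports a warning or a drift when the current p + s exceeds that minimum by 2 or 3 standard deviations. The class name SimpleDDM, the helper first_drift, and the 30-sample warm-up are illustrative assumptions, not a reference implementation.

```python
import math


class SimpleDDM:
    """Minimal DDM-style detector over a stream of 0/1 prediction errors."""

    def __init__(self, min_samples=30, warn_factor=2.0, drift_factor=3.0):
        # All three defaults are assumptions chosen for illustration.
        self.min_samples = min_samples
        self.warn_factor = warn_factor
        self.drift_factor = drift_factor
        self.reset()

    def reset(self):
        """Forget all statistics, e.g. after a drift has been signalled."""
        self.n = 0
        self.errors = 0
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, error):
        """Feed 1 for a wrong prediction, 0 for a correct one.

        Returns "stable", "warning" or "drift".
        """
        self.n += 1
        self.errors += error
        p = self.errors / self.n               # running error rate
        s = math.sqrt(p * (1 - p) / self.n)    # its standard deviation
        if self.n < self.min_samples:
            return "stable"
        # Remember the best (lowest) error level seen so far.
        if p + s <= self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + self.drift_factor * self.s_min:
            self.reset()
            return "drift"
        if p + s > self.p_min + self.warn_factor * self.s_min:
            return "warning"
        return "stable"


def first_drift(errors):
    """Return the index of the first sample flagged as drift, or None."""
    detector = SimpleDDM()
    for i, e in enumerate(errors):
        if detector.update(e) == "drift":
            return i
    return None
```

In practice the 0/1 stream would come from comparing the deployed model's predictions with labels as they arrive; a "warning" is often used as the point at which to start collecting data for a replacement model.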

Monitoring drift between different time windows

This family of concept drift detection methods involves comparing a sliding detection window to a fixed reference window to check whether their distributions match. The test dataset is generally used as a reference point for the input received from the sliding window.

Statistical Distribution Properties: These methods employ distribution metrics like KL divergence, Total Variation Distance, and Hellinger Distance. They are very effective when used with input features.
Adaptive Sliding Window (ADWIN): For a window W, if its two sub-windows W0 and W1 are big enough and their means differ significantly, you can drop W0 (the older one) and note a change in distribution.
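
The distribution metrics named above can be sketched for a single numeric feature as follows. This is a minimal standard-library example, assuming both windows are binned into a shared histogram first; the function names, the bin count, and the eps smoothing used for KL divergence are illustrative choices rather than part of any particular library.

```python
import math


def shared_histograms(reference, current, bins=10):
    """Bin both windows over a common range and return normalized frequencies."""
    lo = min(min(reference), min(current))
    hi = max(max(reference), max(current))
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def hist(values):
        counts = [0] * bins
        for v in values:
            idx = min(int((v - lo) / width), bins - 1)  # clamp the max value
            counts[idx] += 1
        return [c / len(values) for c in counts]

    return hist(reference), hist(current)


def hellinger(p, q):
    """Hellinger distance: 0 for identical, 1 for non-overlapping histograms."""
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(total / 2)


def total_variation(p, q):
    """Total variation distance: half the L1 distance between histograms."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def kl_divergence(p, q, eps=1e-9):
    """KL(p || q), with eps added (an assumption) so empty bins stay finite."""
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q) if a > 0)
```

A monitoring job could compute one of these distances per feature for each new detection window and raise an alert when it crosses a threshold tuned on historical windows.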

Contextual approaches

There are several techniques that can be used to check whether the current data holds different signals than the training data, including the judicious use of certain algorithms and their properties. Some include:

Adding a timestamp as a feature to a decision-tree-based classifier. The tree will split the data on the timestamp if it suggests a different context.

Training a large stable model on the large dataset as well as a smaller reactive model on a new, smaller window. If the reactive model performs better than the stable model, the latest window might hold new concepts.
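
The stable-versus-reactive comparison can be sketched with a one-feature least-squares line standing in for both models. This is a simplified illustration using only the standard library; the function names, the holdout split, and the margin parameter are assumptions made for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, my - slope * mx


def mean_squared_error(model, xs, ys):
    """Mean squared error of a (slope, intercept) model on the given points."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def reactive_beats_stable(xs, ys, window, holdout, margin=1e-9):
    """Return True if a model fit on the recent window wins on the newest data.

    The last `holdout` points are kept aside for evaluation. The stable model
    is fit on everything before them, the reactive model only on the last
    `window` points before them. `margin` is the minimum error improvement
    required, so that ties caused by rounding do not count as a win.
    """
    train_x, train_y = xs[:-holdout], ys[:-holdout]
    test_x, test_y = xs[-holdout:], ys[-holdout:]
    stable = fit_line(train_x, train_y)
    reactive = fit_line(train_x[-window:], train_y[-window:])
    stable_err = mean_squared_error(stable, test_x, test_y)
    reactive_err = mean_squared_error(reactive, test_x, test_y)
    return stable_err - reactive_err > margin
```

A True result only suggests that the latest window may hold a new concept; a small window is also noisier, so the margin would normally be tuned to avoid reacting to chance fluctuations.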

How do you fix concept drift?

Here are a few ways to address concept drift:

Static model

Doing nothing and assuming that the data does not change is the most common way of addressing concept drift. If you suspect that your dataset is facing concept drift, you can monitor the skill of your static model to detect concept drift and even use the skill as a baseline to compare to any changes that you make.

Periodically Re-Fit

This involves periodically re-fitting the static model on more recent historical data. You may need to back-test your model to decide how much historical data to include when re-fitting. Sometimes your best bet is to use only a small amount of recent historical data, so the model captures the new relationships between input data and output data.

Periodically Update

This involves updating the existing model fit, rather than discarding it, by using a sample of the most recent historical data. It is best suited to machine learning algorithms that make use of weights or coefficients, like regression algorithms and neural networks.
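
For a model defined by weights, an update of this kind can be sketched as a few extra passes of stochastic gradient descent that start from the current coefficients instead of from scratch. The example below is a minimal standard-library sketch for a linear model; the function names, the learning rate, and the number of epochs are illustrative assumptions.

```python
def predict(weights, bias, x):
    """Linear model prediction for one feature vector x."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def mse(weights, bias, batch):
    """Mean squared error over a batch of (features, target) pairs."""
    return sum((predict(weights, bias, x) - y) ** 2 for x, y in batch) / len(batch)


def sgd_update(weights, bias, batch, lr=0.01, epochs=5):
    """Continue training an existing linear model on a recent batch.

    Starts from the given weights and bias (the previously fitted model)
    and returns updated copies; the inputs are not modified.
    """
    w = list(weights)
    b = bias
    for _ in range(epochs):
        for x, y in batch:
            err = predict(w, b, x) - y
            # Gradient step for squared error on a single sample.
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b
```

Because the old coefficients are the starting point, a small learning rate keeps most of what the model already knew while nudging it toward the recent data.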

Weight Data

This involves using a weighting that is inversely proportional to the age of the data, allowing your model to prioritize the most recent data.
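
One way to sketch such a weighting is shown below, using only the standard library. The weight 1 / (1 + age) is an assumption chosen so that the newest sample (age 0) gets weight 1 and no division by zero occurs; the function names are illustrative, and a one-feature weighted least-squares line stands in for the model.

```python
def age_weights(ages):
    """Weight each sample by 1 / (1 + age); age 0 is the newest sample."""
    return [1.0 / (1.0 + a) for a in ages]


def weighted_fit_line(xs, ys, weights):
    """Weighted least squares for y = slope * x + intercept."""
    total = sum(weights)
    mx = sum(w * x for w, x in zip(weights, xs)) / total
    my = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, my - slope * mx
```

With these weights, a fit over old and new data together is pulled toward the relationship seen in the newest samples, while older samples still contribute a little.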

Learn The Change

This method leaves the static model untouched, while a new model is fit on more recent data.

Detect and Choose Model

In some domains, abrupt changes have occurred in the past, and you may want to check for similar changes in the future. You can design systems that identify such changes and choose a different model to make predictions.

Data Preparation

Sometimes data is expected to change over time (especially in time series forecasting). In that case, the data can be prepared to remove systematic changes over time, such as trends and seasonality, by differencing.
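
Differencing itself is simple to sketch. In the minimal standard-library example below, a lag of 1 removes a linear trend and a lag equal to the season length removes a repeating seasonal pattern; the function names are illustrative.

```python
def difference(series, lag=1):
    """Return series[i] - series[i - lag] for every i >= lag."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def undo_difference(first_values, diffs, lag=1):
    """Rebuild the original series from its first `lag` values and the diffs."""
    series = list(first_values)
    for d in diffs:
        series.append(series[-lag] + d)
    return series
```

A model would be trained on the differenced values, and its forecasts passed through undo_difference to get back to the original scale.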

What’s the difference between concept drift and data drift?

Concept drift is a phenomenon in which the statistical properties of the class variable (the target variable) change, whereas data drift is a change in the properties of the independent variables.