Concept drift occurs when the data distribution shifts over time, degrading the performance of machine learning models trained on earlier data. In this article, we discuss dependent data, where the target variable depends on one or more other variables, which makes concept drift harder to detect. We propose two new methods, ADF-KPSS and KCpD, to detect concept drift in dependent data. The two methods take different approaches to identifying change points and performing trend corrections. The article explains both methods in detail, with examples and comparisons against existing methods.
Independent Data
Think of independent data as a car driving on a straight road. The distance traveled is the target variable, and the speed is the input variable. In this case, there is no concept drift, since the relationship between speed and distance traveled does not change over time.
Dependent Data
Now imagine the same car driving on a winding road where the distance traveled depends on the speed. The faster the car goes, the more distance it covers. This is an example of dependent data, where the target variable depends on one or more other variables. Concept drift can occur when the relationship between these variables changes over time, causing the data distribution to shift.
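To make the idea concrete, the following minimal sketch simulates the car example under assumptions that are not in the original article: the linear form distance = k * speed, the noise level, and the jump of k from 1.0 to 1.5 halfway through the stream are all hypothetical. It fits the slope separately on each half to show how a changed relationship between the variables shows up as a changed model parameter.

```python
import random


def fit_slope(xs, ys):
    """Least-squares slope of the line y = k * x through the origin."""
    num = sum(x * y for x, y in zip(xs, ys))
    den = sum(x * x for x in xs)
    return num / den


random.seed(0)
speeds = [random.uniform(20.0, 100.0) for _ in range(200)]

# Hypothetical relationship: distance = k * speed + noise, where k jumps
# from 1.0 to 1.5 at index 100. That jump is the simulated concept drift.
distances = [
    (1.0 if i < 100 else 1.5) * s + random.gauss(0.0, 1.0)
    for i, s in enumerate(speeds)
]

slope_before = fit_slope(speeds[:100], distances[:100])
slope_after = fit_slope(speeds[100:], distances[100:])
print(f"slope before: {slope_before:.2f}, slope after: {slope_after:.2f}")
```

A model fitted only on the first half would keep predicting with the old slope and would systematically underestimate distances in the second half, which is the performance loss that drift detection aims to catch.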
ADF-KPSS Method
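The ADF-KPSS method combines the augmented Dickey-Fuller (ADF) test with the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test for trend detection. The two tests are complementary: the ADF test takes the presence of a unit root (non-stationarity) as its null hypothesis, while the KPSS test takes stationarity as its null hypothesis, so together they indicate whether a time series is stationary or non-stationary. The method then performs trend corrections using the KPSS test.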
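One way to combine the two test outcomes can be sketched as follows. This is not the authors' implementation: the function name `classify_stationarity`, the 0.05 significance level, and the four labels are assumptions, and the labels follow one commonly used reading of joint ADF/KPSS results. The p-values are passed in directly so that the sketch stays self-contained; in practice they could come from, for example, `adfuller` and `kpss` in `statsmodels.tsa.stattools`.

```python
def classify_stationarity(adf_pvalue, kpss_pvalue, alpha=0.05):
    """Combine ADF and KPSS p-values into a single verdict.

    ADF null hypothesis:  the series has a unit root (non-stationary).
    KPSS null hypothesis: the series is stationary.
    The function name, alpha, and labels are illustrative assumptions.
    """
    adf_rejects_unit_root = adf_pvalue < alpha
    kpss_rejects_stationarity = kpss_pvalue < alpha

    if adf_rejects_unit_root and not kpss_rejects_stationarity:
        return "stationary"            # both tests agree: stationary
    if not adf_rejects_unit_root and kpss_rejects_stationarity:
        return "non-stationary"        # both tests agree: not stationary
    if not adf_rejects_unit_root and not kpss_rejects_stationarity:
        return "trend-stationary"      # candidate for removing a trend
    return "difference-stationary"     # candidate for differencing


# Possible source of the p-values (requires statsmodels, not run here):
#   adf_p = adfuller(series)[1]
#   kpss_p = kpss(series, regression="c")[1]
verdict = classify_stationarity(adf_pvalue=0.40, kpss_pvalue=0.01)
print(verdict)
```

Keeping the decision rule separate from the tests themselves makes it easy to change the significance level or to plug in a different test implementation.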
KCpD Method
The KCpD method uses the number of detected change points to decide whether concept drift has occurred. It first detects trends using the ADF test and then identifies changes in the data distribution using the KPSS test. Finally, the method counts the change points and performs trend corrections.
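The idea of deciding drift from a count of change points can be illustrated with a small sketch. The summary above does not say how KCpD locates change points, so this example swaps in a plain binary segmentation on mean shifts instead; the function names, the `threshold` and `min_size` parameters, the synthetic series, and the rule "drift if at least one change point is found" are all hypothetical.

```python
def best_split(values, min_size):
    """Return (index, score) of the split with the largest scaled mean gap."""
    n = len(values)
    total = sum(values)
    left_sum = sum(values[:min_size - 1])
    best_idx, best_score = None, 0.0
    for i in range(min_size, n - min_size + 1):
        left_sum += values[i - 1]          # left_sum == sum(values[:i])
        left_mean = left_sum / i
        right_mean = (total - left_sum) / (n - i)
        # CUSUM-style statistic: mean gap weighted by the segment sizes.
        score = abs(left_mean - right_mean) * (i * (n - i) / n) ** 0.5
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx, best_score


def find_change_points(values, threshold, min_size=10, offset=0):
    """Recursively split the series wherever the score reaches threshold."""
    if len(values) < 2 * min_size:
        return []
    idx, score = best_split(values, min_size)
    if idx is None or score < threshold:
        return []
    left = find_change_points(values[:idx], threshold, min_size, offset)
    right = find_change_points(values[idx:], threshold, min_size, offset + idx)
    return left + [offset + idx] + right


# Synthetic series: level 0, then level 5, then level 0 again, with a small
# deterministic wobble standing in for noise.
series = [
    (5.0 if 100 <= i < 200 else 0.0) + ((i * 7) % 5 - 2) * 0.1
    for i in range(300)
]

change_points = find_change_points(series, threshold=5.0)
drift_detected = len(change_points) > 0    # hypothetical decision rule
print(change_points, drift_detected)
```

The threshold controls sensitivity: a lower value reports more change points, so in a real setting it would need to be calibrated against the noise level of the data.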
Comparing Methods
We compare our proposed methods with existing methods, including the seasonal-trend decomposition (STL) method, the moving average (MA) method, and the detective algorithm (DA). In this comparison, the proposed methods outperform these baselines in both accuracy and computational efficiency.
Conclusion
In conclusion, concept drift detection in dependent data is challenging because of the complex relationships between variables. We proposed two new methods, ADF-KPSS and KCpD, which take different approaches to identifying change points and performing trend corrections. In our comparison, these methods are more accurate and efficient than the existing methods considered, and they can help practitioners detect concept drift in dependent data. A better understanding of concept drift in dependent data helps improve the performance of machine learning models and leads to better predictions in dynamic environments.