Distributed Concept Drift Detection for Efficient Model Adaptation with Big Data Streams

Published in the 2023 IEEE International Conference on Big Data

We propose a distributed drift detection workflow that pairs the Drift Detection Method (DDM) with a Random Forest predictive model, implemented in Apache Spark. The workflow updates the model adaptively whenever a drift is detected and uses Pandas UDFs to distribute the computation efficiently across multiple worker nodes. Experiments on two real-world datasets show favorable speedup, scaleup, and detection delay compared with single-node implementations.
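For readers unfamiliar with DDM, the following is a minimal single-detector sketch of the standard method, not the authors' distributed Spark implementation. DDM monitors the stream of a classifier's prediction errors, tracks the error rate p and its standard deviation s = sqrt(p(1 - p)/n), and remembers the point where p + s was lowest. It signals a warning when p + s exceeds p_min + 2·s_min and a drift when it exceeds p_min + 3·s_min. The class name `DDM`, the `update` method, and the default `min_instances=30` are illustrative assumptions; the 2 and 3 thresholds follow the conventional DDM formulation.

```python
import math


class DDM:
    """Hypothetical sketch of the Drift Detection Method (DDM).

    Feed it one prediction outcome at a time (1 = misclassified, 0 = correct).
    Returns "stable", "warning", or "drift" for each instance.
    """

    def __init__(self, min_instances=30, warning_level=2.0, drift_level=3.0):
        self.min_instances = min_instances  # assumed warm-up before any signal
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.reset()

    def reset(self):
        # Statistics restart from scratch after a detected drift.
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error):
        self.n += 1
        self.errors += int(error)
        p = self.errors / self.n                 # running error rate
        s = math.sqrt(p * (1 - p) / self.n)      # std. dev. of a binomial rate

        if self.n < self.min_instances:
            return "stable"

        # Remember the best (lowest) p + s seen since the last reset.
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s

        # Strict ">" avoids spurious signals when p_min and s_min are both 0.
        if p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()
            return "drift"
        if p + s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"


if __name__ == "__main__":
    # Synthetic error stream: 10% errors, then the model starts failing on everything.
    stream = [1 if i % 10 == 0 else 0 for i in range(1000)] + [1] * 200
    detector = DDM()
    for i, err in enumerate(stream):
        if detector.update(err) == "drift":
            print(f"drift detected at instance {i}")
            break
```

In a workflow like the one described above, a `"drift"` signal is the natural trigger for retraining the Random Forest on recent data, while `"warning"` can mark the point from which to start buffering instances for that retraining.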

Recommended citation: I. Whitehouse, R. Yepez-Lopez, R. Corizzo. (2023). "Distributed Concept Drift Detection for Efficient Model Adaptation with Big Data Streams." IEEE Big Data 2023.