You are tasked with preparing ads data for AI models and historical analytics, where identifying long-tail and outlier data points is crucial. The data must be cleansed in near real time before the AI models consume it. Which approach should you adopt for data cleansing?