হোম/Roadmap/Chapter 7.02
Phase 7 · Chapter 7.02

Data Drift Detection

পৃথিবী বদলায়, user behavior বদলায়, sensor calibrate বদলায়। Training-time আর serving-time data যত আলাদা, model তত ভুল।

3 Kinds of Drift

শব্দগুলো গুলিয়ে ফেলো না

  • Covariate (Feature) drift: P(X) বদলেছে — input distribution shift।
  • Label / Prior drift: P(Y) বদলেছে — class balance shift।
  • Concept drift: P(Y|X) বদলেছে — same input, different correct answer।
Detection Methods

Statistical tools

  • PSI (Population Stability Index): bin-wise distribution compare। <0.1 OK, 0.1–0.25 watch, >0.25 alert।
  • KS test: numeric continuous distribution — p-value < 0.05 = drifted।
  • Chi-square: categorical feature।
  • Wasserstein / Earth Mover: robust continuous metric।
  • JS divergence: bounded, symmetric KL।
PSI Implementation

হাতে লেখা version

pythonproduction
import numpy as np

def psi(expected, actual, buckets=10):
    breakpoints = np.percentile(expected, np.linspace(0, 100, buckets + 1))
    breakpoints[0], breakpoints[-1] = -np.inf, np.inf

    e_counts, _ = np.histogram(expected, breakpoints)
    a_counts, _ = np.histogram(actual,   breakpoints)

    e_pct = np.clip(e_counts / e_counts.sum(), 1e-6, None)
    a_pct = np.clip(a_counts / a_counts.sum(), 1e-6, None)

    return np.sum((a_pct - e_pct) * np.log(a_pct / e_pct))

score = psi(train_petal_length, prod_petal_length)
print(f"PSI={score:.3f}")  # > 0.25 → retrain alert
Evidently

Production-grade report এক command-এ

pythonproduction
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=train_df, current_data=prod_df)
report.save_html("drift_report.html")
# OR push as JSON to monitoring backend
report_dict = report.as_dict()
Detecting Concept Drift

Label ছাড়া কীভাবে

  • Delayed feedback — কয়েক দিন পর truth এলে rolling accuracy দেখো।
  • Proxy: prediction confidence histogram drift।
  • Domain classifier: train vs prod sample-কে binary classify করো; AUC > 0.7 = covariate drift।
Pipeline

Daily drift job

pythonproduction
# airflow DAG / cron
1. pull last 24h prod logs from warehouse
2. compute PSI per feature vs reference snapshot
3. if any feature PSI > 0.25 → Slack alert + ticket
4. if avg confidence drop > 10% → trigger retrain DAG
5. write metrics to Prometheus pushgateway
Pitfalls

False alarms ও missed signals

  • Reference window স্থির রাখো — না হলে gradual drift মিস হবে।
  • Seasonal pattern — দিন/সপ্তাহ/মৌসুমি cycle drift ভাবলে spam alert।
  • Small sample-এ KS unstable — minimum N rule রাখো।
  • Drift মানেই retrain না — performance drop verify করো আগে।
Mini Project

Drift simulation

  1. Iris-এর petal_length-এ artificial shift (×1.5) করো।
  2. PSI compute, threshold check।
  3. Evidently HTML report generate করো।
Takeaway

মনে রাখো

Drift detection = early warning system। Accuracy পড়ার আগেই signal দেয় — সেটাই এর মূল্য।