Phase 7 · Chapter 7.01

Model Monitoring

Deploy it and fire-and-forget? The model silently rots. Monitoring means continuously measuring the model's pulse in production.

Why

Software vs ML monitoring

When software fails, it throws an error, so alerting is easy. A model fails silently: the API still returns the same 200 OK, but the prediction is wrong. That is why ML needs an extra layer of monitoring.

3 Layers

What to measure

  • Operational: latency, QPS, error rate, CPU/memory (the DevOps world).
  • Statistical: input drift, output drift, prediction distribution.
  • Business / Outcome: conversion, revenue per prediction, user feedback.
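Input drift in the statistical layer can be scored in several ways. Below is a minimal sketch, assuming NumPy is available, of one common choice: the two-sample Kolmogorov-Smirnov (KS) statistic, i.e. the largest gap between the empirical CDFs of a reference window (e.g. training data) and a live window. The function name `ks_statistic` and the synthetic petal-length windows are illustrative assumptions; if SciPy is installed, `scipy.stats.ks_2samp` returns the same statistic plus a p-value.

```python
import numpy as np

def ks_statistic(reference, current):
    """Two-sample KS statistic: max |F_ref(x) - F_cur(x)| over all observed x."""
    ref = np.sort(np.asarray(reference, dtype=float))
    cur = np.sort(np.asarray(current, dtype=float))
    # The largest CDF gap is attained at one of the sample points.
    grid = np.concatenate([ref, cur])
    cdf_ref = np.searchsorted(ref, grid, side="right") / ref.size
    cdf_cur = np.searchsorted(cur, grid, side="right") / cur.size
    return float(np.max(np.abs(cdf_ref - cdf_cur)))

rng = np.random.default_rng(0)
train_petal = rng.normal(3.8, 1.7, size=1000)  # synthetic "training" window
live_petal = rng.normal(4.6, 1.7, size=1000)   # synthetic, shifted live window
print(ks_statistic(train_petal, train_petal))  # identical windows → 0.0
print(ks_statistic(train_petal, live_petal))   # larger value = more drift
```

The statistic lies in [0, 1], which makes it easy to plot per hour next to the operational metrics; the alert threshold is a per-feature judgment call.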
Instrumentation

FastAPI + Prometheus

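```python
import time

import joblib
from fastapi import FastAPI
from prometheus_client import Counter, Histogram, make_asgi_app
from pydantic import BaseModel

MODEL_VERSION = "v1"
# Assumption: a scikit-learn Iris classifier was saved to this path.
model = joblib.load("model.joblib")

app = FastAPI()
# Expose Prometheus metrics at /metrics for scraping.
app.mount("/metrics", make_asgi_app())

PRED_COUNT = Counter(
    "predictions_total", "Total predictions",
    ["model_version", "predicted_class"],
)
PRED_LATENCY = Histogram(
    "prediction_latency_seconds", "Inference latency",
    ["model_version"],
)
INPUT_FEATURE = Histogram(
    "input_petal_length", "Petal length distribution",
    buckets=[1, 2, 3, 4, 5, 6, 7],
)

class IrisFeatures(BaseModel):
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float

    def to_array(self) -> list[float]:
        return [self.sepal_length, self.sepal_width,
                self.petal_length, self.petal_width]

@app.post("/predict")
def predict(req: IrisFeatures):
    # Statistical layer: record the input feature distribution.
    INPUT_FEATURE.observe(req.petal_length)
    # Operational layer: time only the inference call.
    start = time.perf_counter()
    label = str(model.predict([req.to_array()])[0])
    PRED_LATENCY.labels(MODEL_VERSION).observe(time.perf_counter() - start)
    # Statistical layer: prediction class distribution.
    PRED_COUNT.labels(MODEL_VERSION, label).inc()
    return {"prediction": label, "model_version": MODEL_VERSION}
```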
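To see what Prometheus actually scrapes from `/metrics`, the text exposition can be produced without running a server. This is a minimal sketch using `prometheus_client.generate_latest` on an isolated `CollectorRegistry`, with four made-up petal lengths (the values are illustrative only). Histogram buckets in the output are cumulative, so `le="5.0"` counts every observation less than or equal to 5.

```python
from prometheus_client import CollectorRegistry, Histogram, generate_latest

# Isolated registry so this demo does not touch the global default registry.
registry = CollectorRegistry()
input_petal_length = Histogram(
    "input_petal_length", "Petal length distribution",
    buckets=[1, 2, 3, 4, 5, 6, 7], registry=registry,
)

# Made-up observations: 1.4 and 4.5 are <= 5; 5.1 and 6.0 are <= 6.
for value in [1.4, 4.5, 5.1, 6.0]:
    input_petal_length.observe(value)

# The same text format Prometheus receives when it scrapes /metrics.
exposition = generate_latest(registry).decode()
print(exposition)
```

Grafana heatmaps and `histogram_quantile()` queries are built on exactly these `_bucket` series.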
What to Track

Per-layer checklist

```text
Operational
  - request rate, error rate, p50/p95/p99 latency
  - container CPU/mem, GPU utilization

Statistical
  - input feature mean/std/histogram per hour
  - prediction class distribution
  - confidence score histogram
  - drift score (PSI / KS)

Business
  - click-through, conversion
  - delayed label accuracy (when truth arrives)
  - user override / thumbs-down rate
```
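The PSI drift score from the checklist can be sketched as below, assuming NumPy and fixed bin edges. The function name `psi`, the `eps` floor for empty bins, and the 0.5-wide bins are illustrative choices, not a standard. PSI sums (p_cur - p_ref) * ln(p_cur / p_ref) over the bins, so identical windows score exactly 0. Thresholds around 0.1 (watch) and 0.25 (act) are often quoted as a rule of thumb, but they should be tuned on your own data.

```python
import numpy as np

def psi(reference, current, edges, eps=1e-6):
    """Population Stability Index between two samples over fixed bin edges."""
    lo, hi = edges[0], edges[-1]
    # Clip so out-of-range values land in the edge bins instead of being dropped.
    ref_counts, _ = np.histogram(np.clip(reference, lo, hi), bins=edges)
    cur_counts, _ = np.histogram(np.clip(current, lo, hi), bins=edges)
    # Proportions per bin; eps avoids log(0) and division by zero.
    ref_pct = np.clip(ref_counts / ref_counts.sum(), eps, None)
    cur_pct = np.clip(cur_counts / cur_counts.sum(), eps, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

rng = np.random.default_rng(42)
edges = np.linspace(0.0, 9.0, 19)            # 18 bins of width 0.5 (assumed)
reference = rng.normal(4.0, 1.0, size=5000)  # synthetic reference window
shifted = rng.normal(5.0, 1.0, size=5000)    # synthetic drifted window
print(psi(reference, reference, edges))      # no drift → 0.0
print(psi(reference, shifted, edges))        # a 1-sigma shift scores well above 0.25
```

Running this hourly per feature, and on the predicted-class shares, covers both input and output drift before any ground truth arrives.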
The Stack

Tooling combinations

  • Open-source: Prometheus + Grafana + Loki + Evidently.
  • Managed: Datadog, New Relic; excellent for the operational layer.
  • ML-specific: Arize, WhyLabs, Fiddler, Aporia.
  • DIY: log to S3 → batch job → BigQuery → dashboard.
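The DIY route can be sketched in plain Python: log each prediction as a JSON line, then have a batch job join those lines with ground-truth labels that arrive later to compute delayed accuracy. This is a minimal sketch under stated assumptions: an in-memory `io.StringIO` stands in for the S3 object, the record schema and the helpers `log_prediction` / `delayed_accuracy` are hypothetical, and a real pipeline would load the joined result into BigQuery instead of printing it.

```python
import io
import json

def log_prediction(sink, request_id, features, prediction, model_version="v1"):
    """Append one prediction as a JSON line (the 'log' step)."""
    record = {
        "request_id": request_id,
        "model_version": model_version,
        "features": features,
        "prediction": prediction,
    }
    sink.write(json.dumps(record) + "\n")

def delayed_accuracy(log_text, labels):
    """Batch step: join logged predictions with late-arriving ground truth."""
    matched = correct = 0
    for line in log_text.splitlines():
        record = json.loads(line)
        truth = labels.get(record["request_id"])
        if truth is None:
            continue  # label has not arrived yet; skip for now
        matched += 1
        correct += record["prediction"] == truth
    return correct / matched if matched else None

sink = io.StringIO()  # stand-in for an object written to S3
log_prediction(sink, "req-1", {"petal_length": 1.4}, "setosa")
log_prediction(sink, "req-2", {"petal_length": 4.8}, "versicolor")
log_prediction(sink, "req-3", {"petal_length": 5.9}, "virginica")

# Hypothetical late-arriving labels; req-3 has no label yet.
labels = {"req-1": "setosa", "req-2": "virginica"}
print(delayed_accuracy(sink.getvalue(), labels))  # → 0.5
```

Logging a stable `request_id` with every prediction is the design choice that makes this join possible at all.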
Anti-patterns

What goes wrong

  • Monitoring only latency: model accuracy quietly erodes without anyone noticing.
  • Waiting for ground truth: watch proxy metrics (input drift) first.
  • High-cardinality labels (user_id) in Prometheus: the number of time series explodes.
  • Alert fatigue: alerts on every metric mean the important ones get missed.
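Why high-cardinality labels hurt can be seen with `prometheus_client` directly: every distinct label value becomes its own time series. In this minimal sketch the metric name `demo_predictions`, the helper `count_series`, and the user IDs are hypothetical; it counts the `_total` series one Counter creates. For a Histogram, each of those series is further multiplied by the number of buckets.

```python
from prometheus_client import CollectorRegistry, Counter

def count_series(label_name, values):
    """Count the *_total series a Counter creates for the given label values."""
    # Isolated registry so repeated calls don't clash on the global one.
    registry = CollectorRegistry()
    counter = Counter("demo_predictions", "Demo counter", [label_name],
                      registry=registry)
    for value in values:
        counter.labels(value).inc()
    return sum(
        1
        for metric in registry.collect()
        for sample in metric.samples
        if sample.name == "demo_predictions_total"
    )

classes = ["setosa", "versicolor", "virginica"] * 1000  # 3000 requests
user_ids = [f"user-{i}" for i in range(1000)]            # hypothetical IDs
print(count_series("predicted_class", classes))  # → 3
print(count_series("user_id", user_ids))         # → 1000
```

Bounded labels such as `model_version` and `predicted_class` stay cheap; per-user detail belongs in logs (the DIY pipeline), not in metric labels.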
Mini Project

Iris monitoring

  1. Add the 3 metrics above to the Iris API.
  2. Start Prometheus + Grafana with Docker Compose.
  3. Send 1000 dummy requests and build a Grafana dashboard.
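For step 3, a small standard-library load generator is enough. This is a sketch under assumptions: the API is assumed to listen at `http://localhost:8000/predict` and to accept the four Iris fields as JSON (matching an `IrisFeatures` model with `sepal_length`, `sepal_width`, `petal_length`, `petal_width`). The helper names are hypothetical, and the payloads are uniform random values inside the Iris feature ranges, so the class mix will not match the real dataset.

```python
import json
import random
import urllib.request

# Assumed endpoint of the FastAPI app; adjust host/port to your setup.
PREDICT_URL = "http://localhost:8000/predict"

def make_payload(rng):
    """One synthetic request within the Iris min/max ranges (cm)."""
    return {
        "sepal_length": round(rng.uniform(4.3, 7.9), 1),
        "sepal_width": round(rng.uniform(2.0, 4.4), 1),
        "petal_length": round(rng.uniform(1.0, 6.9), 1),
        "petal_width": round(rng.uniform(0.1, 2.5), 1),
    }

def send_requests(n=1000, url=PREDICT_URL, seed=0):
    """POST n dummy requests sequentially; raises on any non-2xx response."""
    rng = random.Random(seed)
    for _ in range(n):
        body = json.dumps(make_payload(rng)).encode()
        req = urllib.request.Request(
            url, data=body,
            headers={"Content-Type": "application/json"}, method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            resp.read()
```

Once the API is running, calling `send_requests()` should make `predictions_total` and `input_petal_length_bucket` show up in Prometheus after the next scrape.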
Takeaway

Remember

Monitoring = operational + statistical + business. Don't skip any one of the three layers.