Phase 7 · Chapter 7.01
Model Monitoring
Deploy and forget? A model decays silently. Monitoring means continuously measuring the model's pulse in production.
Why
Software vs ML monitoring
When software fails, it throws an error, so alerting is easy. A model fails silently: the response is the same 200 OK, but the prediction is wrong. That is why ML needs an extra layer of monitoring.
3 Layers
What to measure
- Operational: latency, QPS, error rate, CPU/mem. This is the DevOps world.
- Statistical: input drift, output drift, prediction distribution.
- Business / Outcome: conversion, revenue per prediction, user feedback.
Instrumentation
FastAPI + Prometheus
pythonproduction
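from prometheus_client import Counter, Histogram, make_asgi_app
from fastapi import FastAPI
from pydantic import BaseModel
import joblib
import time

app = FastAPI()
app.mount("/metrics", make_asgi_app())  # Prometheus scrapes this endpoint

# Assumption: a scikit-learn style classifier at this hypothetical path.
model = joblib.load("model.joblib")

class IrisFeatures(BaseModel):
    # Assumed request schema; the field order must match the training data.
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float

    def to_array(self):
        return [self.sepal_length, self.sepal_width,
                self.petal_length, self.petal_width]

PRED_COUNT = Counter(
    "predictions_total", "Total predictions",
    ["model_version", "predicted_class"],
)
PRED_LATENCY = Histogram(
    "prediction_latency_seconds", "Inference latency",
    ["model_version"],
)
INPUT_FEATURE = Histogram(
    "input_petal_length", "Petal length distribution",
    buckets=[1, 2, 3, 4, 5, 6, 7],
)

@app.post("/predict")
def predict(req: IrisFeatures):
    INPUT_FEATURE.observe(req.petal_length)
    start = time.perf_counter()
    # str() keeps the label JSON-serializable and valid as a Prometheus label.
    label = str(model.predict([req.to_array()])[0])
    PRED_LATENCY.labels("v1").observe(time.perf_counter() - start)
    PRED_COUNT.labels("v1", label).inc()
    return {"prediction": label, "model_version": "v1"}

What to Track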
Per-layer checklist
textproduction
Operational
- request rate, error rate, p50/p95/p99 latency
- container CPU/mem, GPU utilization
Statistical
- input feature mean/std/histogram per hour
- prediction class distribution
- confidence score histogram
- drift score (PSI / KS)
Business
- click-through, conversion
- delayed label accuracy (when truth arrives)
- user override / thumbs-down rate
The Stack
Tooling combinations
- Open-source: Prometheus + Grafana + Loki + Evidently.
- Managed: Datadog, New Relic. Excellent for the operational layer.
- ML-specific: Arize, WhyLabs, Fiddler, Aporia.
- DIY: log to S3 → batch job → BigQuery → dashboard.
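The DIY route usually starts with writing one structured record per prediction so that a batch job can aggregate them later. The following is a minimal sketch under stated assumptions: a local JSON Lines file stands in for the S3 upload, and the function names `log_prediction` and `hourly_class_counts` as well as the record fields are hypothetical, chosen only for illustration.

```python
import json
import tempfile
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path


def log_prediction(path, features, prediction, model_version, ts=None):
    """Append one prediction record as a JSON line (stand-in for an S3 write)."""
    ts = ts or datetime.now(timezone.utc)
    record = {
        "ts": ts.isoformat(),
        "model_version": model_version,
        "features": features,
        "prediction": prediction,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def hourly_class_counts(path):
    """Batch step: count predictions per (hour, class) from the log file."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            hour = rec["ts"][:13]  # ISO prefix "YYYY-MM-DDTHH"
            counts[(hour, rec["prediction"])] += 1
    return counts


# Usage: three predictions in the same hour, written to a temp file.
log_path = Path(tempfile.mkdtemp()) / "predictions.jsonl"
t0 = datetime(2024, 1, 1, 10, 30, tzinfo=timezone.utc)
for petal_length, cls in [(1.4, "setosa"), (5.1, "virginica"), (1.5, "setosa")]:
    log_prediction(log_path, {"petal_length": petal_length}, cls, "v1", ts=t0)
counts = hourly_class_counts(log_path)
```

The per-hour class counts are the kind of table a dashboard can plot directly; in a real pipeline the aggregation would more likely run as a scheduled query in the warehouse than in Python.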
Anti-patterns
What goes wrong
- Monitoring only latency: model accuracy quietly degrades.
- Waiting for ground truth: check a proxy metric (input drift) first.
- High-cardinality labels (user_id) in Prometheus: the number of time series explodes.
- Alert fatigue: alerting on every metric means the important alerts get missed.
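The proxy-metric advice above (watch input drift before ground truth arrives) can be made concrete with the Population Stability Index, one of the drift scores named in the checklist. This is a minimal standard-library sketch, not a production implementation: the function name `psi`, the choice of 10 quantile bins, and the `eps` floor are assumptions, and the synthetic samples only stand in for a real feature such as petal length.

```python
import math
import random
from bisect import bisect_right


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index between a reference and a current sample.

    Bin edges are taken from the reference sample's quantiles; eps keeps
    empty bins from producing log(0).
    """
    ref_sorted = sorted(reference)
    n = len(ref_sorted)
    # bins - 1 interior cut points at reference quantiles.
    edges = [ref_sorted[(n * i) // bins] for i in range(1, bins)]

    def fractions(sample):
        counts = [0] * bins
        for x in sample:
            counts[bisect_right(edges, x)] += 1
        return [max(c / len(sample), eps) for c in counts]

    ref_frac = fractions(reference)
    cur_frac = fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_frac, cur_frac))


# Synthetic demo: same distribution vs. a mean shifted by one standard deviation.
rng = random.Random(0)
train = [rng.gauss(4.0, 1.0) for _ in range(5000)]
same = [rng.gauss(4.0, 1.0) for _ in range(5000)]
shifted = [rng.gauss(5.0, 1.0) for _ in range(5000)]
psi_same = psi(train, same)
psi_shifted = psi(train, shifted)
```

A commonly cited rule of thumb treats PSI above roughly 0.2 as worth investigating, but a sensible threshold depends on the feature and the sample size, so treat it as a starting point rather than a fixed rule.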
Mini Project
Iris monitoring
- Add the three metrics above to the Iris API.
- Start Prometheus + Grafana with Docker Compose.
- Send 1000 dummy requests and build a Grafana dashboard.
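For the dummy-request step, a small load script is enough. The sketch below uses only the standard library. The field names other than `petal_length`, the value ranges, and the helper names `make_payload` and `send_requests` are assumptions for illustration; adjust them to match your own request schema.

```python
import json
import random
import urllib.request


def make_payload(rng):
    """Random Iris-like measurements in cm; the ranges are rough assumptions."""
    return {
        "sepal_length": round(rng.uniform(4.0, 8.0), 1),
        "sepal_width": round(rng.uniform(2.0, 4.5), 1),
        "petal_length": round(rng.uniform(1.0, 7.0), 1),
        "petal_width": round(rng.uniform(0.1, 2.5), 1),
    }


def send_requests(url, n=1000, seed=0):
    """POST n random payloads to url and return the number of HTTP 200 replies.

    urlopen raises urllib.error.HTTPError on 4xx/5xx responses, so a failing
    API stops the loop instead of being counted silently.
    """
    rng = random.Random(seed)
    ok = 0
    for _ in range(n):
        body = json.dumps(make_payload(rng)).encode("utf-8")
        req = urllib.request.Request(
            url, data=body, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(req, timeout=5) as resp:
            if resp.status == 200:
                ok += 1
    return ok
```

With the API running locally, calling `send_requests("http://localhost:8000/predict")` (a hypothetical address) should populate the counters and histograms that the Grafana dashboard reads.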
Takeaway
Remember
Monitoring = operational + statistical + business. Do not skip any of the three layers.