Phase 7 · Chapter 7.03

Performance Metrics Tracking

You cannot improve what you do not measure. But if you measure the wrong metric, you will optimize in the wrong direction.

The Four Quadrants

ML metric framework

textproduction
              | Online (live)         | Offline (batch)
--------------|----------------------|------------------------
System        | latency, QPS, errors | resource cost, build time
ML / Business | CTR, accuracy*,      | holdout accuracy,
              | revenue/req          | fairness, calibration

* Online accuracy depends on ground-truth labels, which arrive with a delay.
Latency 101

Averages lie

  • An average of 50 ms sounds good, but if p99 = 2 s, then 1% of users get a bad experience.
  • Always report p50 / p95 / p99.
  • Tail latency is the real obstacle to scaling and to UX.
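
To make the gap between the mean and the tail concrete, here is a minimal sketch using only the standard library. The latencies are synthetic (985 fast requests and 15 slow ones, chosen purely for illustration), and `percentile` is a hypothetical helper that uses the nearest-rank method; production systems usually estimate percentiles from histograms instead.

```python
import math

def percentile(samples, q):
    """Nearest-rank percentile: smallest value with at least q% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]

# Synthetic latencies in milliseconds (illustrative, not measured data).
latencies_ms = [20.0] * 985 + [2000.0] * 15

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50, p95, p99 = (percentile(latencies_ms, q) for q in (50, 95, 99))

print(f"mean={mean_ms:.1f}ms p50={p50:.0f}ms p95={p95:.0f}ms p99={p99:.0f}ms")
# mean=49.7ms p50=20ms p95=20ms p99=2000ms
```

The mean, p50, and p95 all look healthy here; only p99 exposes the 2-second requests.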
Histogram

Prometheus latency tracking

pythonproduction
from prometheus_client import Histogram

LATENCY = Histogram(
    "predict_latency_seconds",
    "Inference latency",
    ["model_version"],
    buckets=[0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5],
)

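# `model` and `x` are assumed to be an already-loaded model and a feature batch.
# The context manager observes the elapsed time (in seconds) when the block exits.
with LATENCY.labels("v1").time():
    pred = model.predict(x)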
promqlproduction
# Grafana query
histogram_quantile(0.99,
  sum by (le) (rate(predict_latency_seconds_bucket{model_version="v1"}[5m]))
)
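
It can help to see roughly what `histogram_quantile` does with the bucket counters. The sketch below is a simplified Python version of the linear interpolation, not the Prometheus implementation itself: it assumes cumulative `(upper_bound, count)` pairs sorted by bound, a final `+Inf` bucket, and `0 < q <= 1`. The bucket counts are made up for illustration.

```python
import math

def histogram_quantile(q, buckets):
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    Simplified sketch: assumes 0 < q <= 1, buckets sorted by upper bound,
    and a last bucket with upper bound +Inf holding the total count.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for upper, count in buckets:
        if count >= rank:
            if math.isinf(upper):
                # Quantile lies past the largest finite bucket: report that bound.
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (upper - prev_bound) * fraction
        prev_bound, prev_count = upper, count
    return prev_bound

# Illustrative cumulative counts: 50 requests <= 0.1s, 90 <= 0.5s, 100 <= 1s.
buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
p95 = histogram_quantile(0.95, buckets)
print(f"estimated p95 = {p95:.2f}s")
```

Because the estimate is interpolated within a bucket, its precision depends on where the bucket boundaries sit; boundaries near the latency target give the most useful readings.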
Delayed Accuracy

Rolling metric once ground truth arrives

pythonproduction
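# Ground truth arrives via the clickstream about an hour after the prediction (T+1h).
# `warehouse` and `push_to_prometheus` are assumed project helpers, not library calls;
# `warehouse.query` is assumed to return a pandas DataFrame.
def evaluate_predictions_with_label(window="1h"):
    # `window` is interpolated into the SQL text, so pass only trusted values.
    df = warehouse.query(f"""
          SELECT p.prediction, l.label
          FROM predictions p JOIN labels l USING (request_id)
          WHERE l.ts >= now() - INTERVAL '{window}'
        """)
    if df.empty:
        return  # no labels in this window yet; skip rather than push NaN
    acc = (df.prediction == df.label).mean()
    # Derive the metric name from the window so it stays correct if `window` changes.
    push_to_prometheus(f"rolling_accuracy_{window}", acc)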
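
The join itself is easy to try without a warehouse. The following is a minimal in-memory sketch of the same idea, with plain dictionaries keyed by `request_id`; the function name `rolling_accuracy` and the sample data are hypothetical. It also returns how many predictions were matched, since accuracy over a handful of labels is not very informative.

```python
def rolling_accuracy(predictions, labels):
    """Accuracy over predictions whose label has arrived, plus the matched count."""
    matched = [
        (pred, labels[request_id])
        for request_id, pred in predictions.items()
        if request_id in labels  # inner join on request_id
    ]
    if not matched:
        return None, 0  # no ground truth yet
    correct = sum(1 for pred, label in matched if pred == label)
    return correct / len(matched), len(matched)

# Illustrative data: r4 has been served but its label has not arrived yet.
predictions = {"r1": "setosa", "r2": "virginica", "r3": "versicolor", "r4": "setosa"}
labels = {"r1": "setosa", "r2": "versicolor", "r3": "versicolor"}

acc, n = rolling_accuracy(predictions, labels)
print(f"accuracy={acc:.2f} over {n} labeled predictions")
```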
MLflow Tracking

Experiment + production unified

pythonproduction
import mlflow

mlflow.set_experiment("iris-prod")

# `version` and `model` are assumed to be defined earlier in the deployment script.
with mlflow.start_run(run_name=f"deploy-{version}"):
    mlflow.log_param("n_estimators", 200)
    mlflow.log_metric("offline_accuracy", 0.96)
    mlflow.log_metric("online_p95_ms", 38.2)
    mlflow.log_metric("rolling_accuracy_7d", 0.93)
    mlflow.sklearn.log_model(model, "model")
Business Tie-in

Tech metric ≠ value

  • Higher recommendation accuracy does not always mean higher revenue. Measure CTR and downstream conversion as well.
  • A fraud model's recall can go up while false positives triple, which drives up customer support cost.
  • Always pair an ML metric with a cost-aware business KPI.
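
One way to pair a model metric with a cost-aware KPI is to put a monetary value on each outcome. The sketch below does this for the fraud case; every number in it (fraud counts, average loss, cost per false positive) is a made-up assumption for illustration, and `net_value` is a hypothetical helper.

```python
def net_value(caught_fraud, false_positives, avg_fraud_loss, cost_per_false_positive):
    """Money saved by blocking fraud minus the support cost of wrongly flagged customers."""
    return caught_fraud * avg_fraud_loss - false_positives * cost_per_false_positive

TOTAL_FRAUD_CASES = 100  # assumed number of fraud cases in the evaluation window

# Baseline model: recall 0.80 with 100 false positives.
baseline = net_value(80, 100, avg_fraud_loss=200.0, cost_per_false_positive=15.0)
# Candidate model: recall 0.85, but false positives are 3x higher.
candidate = net_value(85, 300, avg_fraud_loss=200.0, cost_per_false_positive=15.0)

print(f"baseline recall={80 / TOTAL_FRAUD_CASES:.2f} net value={baseline:.0f}")
print(f"candidate recall={85 / TOTAL_FRAUD_CASES:.2f} net value={candidate:.0f}")
# baseline recall=0.80 net value=14500
# candidate recall=0.85 net value=12500
```

Under these assumed costs the model with the better recall is the worse business choice; with a different cost per false positive the conclusion could flip, which is why the two numbers belong side by side.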
Pitfalls

Metric trap

  • Average-latency dashboard: the tail is invisible.
  • Accuracy as a single number: it hides class imbalance, so look at precision and recall.
  • Survivorship bias: logging only the successful requests.
  • Goodhart's law: once a metric becomes a target, people start gaming it.
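
The class-imbalance trap is easy to reproduce. In this minimal sketch the data is synthetic (10 positives among 1000 samples) and the "model" simply predicts the negative class every time; `binary_metrics` is a hypothetical helper that returns 0.0 when a ratio is undefined.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels, treating 1 as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # undefined -> 0.0 by convention here
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Synthetic, heavily imbalanced data: 1% positives.
y_true = [1] * 10 + [0] * 990
y_pred = [0] * 1000  # a "model" that never predicts the positive class

accuracy, precision, recall = binary_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
# accuracy=0.99 precision=0.00 recall=0.00
```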
Mini Project

Grafana dashboard

  1. Add a latency histogram to the Iris API.
  2. Add a prediction counter with a per-class breakdown.
  3. In Grafana, build p50/p95/p99 panels and a class-distribution pie chart.
Takeaway

Remember

A good metric is percentile-based, paired, and purposeful. Do not base a decision on a single number.