Phase 7 · Chapter 7.03

Performance Metrics Tracking

You cannot improve what you do not measure. But if you measure the wrong metric, you will optimize in the wrong direction.

The Four Quadrants

ML metric framework

              | Online (live)        | Offline (batch)
--------------|----------------------|--------------------------
System        | latency, QPS, errors | resource cost, build time
ML / Business | CTR, accuracy*,      | holdout accuracy,
              | revenue per request  | fairness, calibration

* online accuracy can only be computed once the delayed labels arrive

Latency 101

Averages lie

An average of 50 ms sounds good, but if p99 = 2 s, then 1% of requests take two seconds or longer, and those users have a bad experience.
Always report p50 / p95 / p99.
Tail latency is the real obstacle to both scaling and UX.
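
To make the gap between the mean and the tail concrete, here is a minimal sketch. The latency values are made up for illustration (980 requests at 10 ms and 20 at 2 s), and `percentile` is a hypothetical nearest-rank helper written for this example, not a library function.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Illustrative data (seconds): 98% fast requests, 2% very slow ones.
latencies = [0.010] * 980 + [2.0] * 20

mean = sum(latencies) / len(latencies)
p50 = percentile(latencies, 50)
p95 = percentile(latencies, 95)
p99 = percentile(latencies, 99)

print(f"mean={mean * 1000:.1f} ms  p50={p50 * 1000:.0f} ms  "
      f"p95={p95 * 1000:.0f} ms  p99={p99 * 1000:.0f} ms")
```

With these numbers the mean lands just under 50 ms while p99 is 2 s, so a dashboard showing only the average would hide the slow requests entirely.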

Histogram

Prometheus latency tracking

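from prometheus_client import Histogram

# Latency histogram labelled by model version; bucket values are upper bounds in seconds.
LATENCY = Histogram(
    "predict_latency_seconds",
    "Inference latency",
    ["model_version"],
    buckets=[0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5],
)

# `model` and `x` come from the serving code; time() observes how long the block takes.
with LATENCY.labels(model_version="v1").time():
    pred = model.predict(x)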

# Grafana query
histogram_quantile(0.99,
  sum by (le) (rate(predict_latency_seconds_bucket{model_version="v1"}[5m]))
)
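
Prometheus stores only per-bucket counts, not individual latencies, so the p99 returned by the query above is an estimate. The sketch below illustrates the general idea (linear interpolation inside the bucket that contains the target rank). It is a simplified illustration, not Prometheus's actual code; the function name `estimate_quantile` and the bucket counts are assumptions made for this example.

```python
import math

def estimate_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    Simplified sketch: find the bucket containing the target rank and
    interpolate linearly between its lower and upper bound.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: fall back to the highest finite bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            fraction = (rank - prev_count) / in_bucket
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound

# Hypothetical cumulative counts, shaped like the `le` buckets of a latency histogram.
buckets = [(0.05, 90), (0.1, 98), (0.25, 100), (math.inf, 100)]

p99 = estimate_quantile(0.99, buckets)
print(f"estimated p99 = {p99:.3f} s")
```

Because the estimate can never be finer than the bucket it falls into, it helps to place bucket boundaries close to the latencies you care about, such as an SLO threshold.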

Delayed Accuracy

Rolling metric once ground truth arrives

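# Ground truth arrives from the clickstream about an hour after serving (T+1h).
# `warehouse` and `push_to_prometheus` are project helpers defined elsewhere;
# the query result is assumed to be a pandas DataFrame.
def evaluate_predictions_with_label(window="1h"):
    # Join each prediction to its label; keep labels that arrived within the window.
    df = warehouse.query(f"""
      SELECT p.prediction, l.label
      FROM predictions p JOIN labels l USING (request_id)
      WHERE l.ts >= now() - INTERVAL '{window}'
    """)
    if df.empty:
        return  # no labels yet; skip instead of pushing NaN
    acc = (df.prediction == df.label).mean()
    push_to_prometheus(f"rolling_accuracy_{window}", acc)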
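
The join logic can be tried without a warehouse. The following is a minimal in-memory sketch under stated assumptions: predictions and labels are plain dicts keyed by `request_id`, and the function name `rolling_accuracy` and the sample data are hypothetical. It also reports label coverage, since accuracy computed on a small labelled fraction can be misleading.

```python
def rolling_accuracy(predictions, labels):
    """Accuracy over predictions whose label has arrived, plus label coverage.

    predictions: dict mapping request_id -> predicted class
    labels:      dict mapping request_id -> true class (only requests labelled so far)
    """
    matched = [(pred, labels[rid]) for rid, pred in predictions.items() if rid in labels]
    if not matched:
        return None, 0.0  # nothing labelled yet
    correct = sum(1 for pred, truth in matched if pred == truth)
    return correct / len(matched), len(matched) / len(predictions)

# Illustrative data: request r4 has been served but its label has not arrived.
predictions = {"r1": "setosa", "r2": "virginica", "r3": "setosa", "r4": "versicolor"}
labels = {"r1": "setosa", "r2": "versicolor", "r3": "setosa"}

accuracy, coverage = rolling_accuracy(predictions, labels)
print(f"accuracy={accuracy:.2f} on {coverage:.0%} of predictions")
```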

MLflow Tracking

Experiment + production unified

import mlflow

mlflow.set_experiment("iris-prod")

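# `version` and `model` come from the deployment script; the metric values are illustrative.
with mlflow.start_run(run_name=f"deploy-{version}"):
    mlflow.log_param("n_estimators", 200)
    mlflow.log_metric("offline_accuracy", 0.96)      # holdout set, before deployment
    mlflow.log_metric("online_p95_ms", 38.2)         # measured on live traffic
    mlflow.log_metric("rolling_accuracy_7d", 0.93)   # computed from delayed labels
    mlflow.sklearn.log_model(model, "model")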

Business Tie-in

Tech metric ≠ value

Higher recommendation accuracy does not always mean higher revenue. Measure CTR together with downstream conversion.
A fraud model's recall goes up, but false positives triple, so customer support cost rises.
Always pair an ML metric with a cost-aware business KPI.
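
The fraud example can be worked through with numbers. This is a minimal sketch: the confusion counts and the per-error costs are invented for illustration, and `Confusion` and `total_cost` are hypothetical names, not part of any library.

```python
from dataclasses import dataclass

@dataclass
class Confusion:
    tp: int  # fraud correctly flagged
    fp: int  # legitimate transactions wrongly flagged
    fn: int  # fraud missed

    @property
    def recall(self):
        return self.tp / (self.tp + self.fn)

def total_cost(c, cost_per_fn, cost_per_fp):
    """Business cost of a model's errors: missed fraud plus false alarms."""
    return c.fn * cost_per_fn + c.fp * cost_per_fp

# Assumed costs per error, in an arbitrary currency (illustrative only).
COST_MISSED_FRAUD = 50.0   # average loss per missed fraud case
COST_FALSE_ALARM = 5.0     # support cost per wrongly blocked customer

old_model = Confusion(tp=80, fp=100, fn=20)
new_model = Confusion(tp=90, fp=300, fn=10)   # higher recall, 3x false positives

for name, c in [("old", old_model), ("new", new_model)]:
    cost = total_cost(c, COST_MISSED_FRAUD, COST_FALSE_ALARM)
    print(f"{name}: recall={c.recall:.2f} cost={cost:.0f}")
```

Under these assumed costs the new model wins on recall (0.90 vs 0.80) but loses on total cost (2000 vs 1500). With different cost assumptions the conclusion could flip, which is why the cost model has to be stated next to the ML metric.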

Pitfalls

Metric trap

Average-latency dashboard: the tail is invisible.
Accuracy as a single number: it hides class imbalance, so look at precision and recall.
Survivorship bias: logging only the successful requests.
Goodhart's law: once a metric becomes a target, people start gaming it.
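
The class-imbalance trap is easy to reproduce. In this minimal sketch the data is synthetic (10 positives out of 1000) and the "model" simply predicts the majority class every time; `accuracy_precision_recall` is a hypothetical helper written for this example.

```python
def accuracy_precision_recall(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for a binary problem.

    Precision and recall are reported as 0.0 when their denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Synthetic, heavily imbalanced labels: 1% positives.
y_true = [1] * 10 + [0] * 990
# A useless model that always predicts the majority class.
y_pred = [0] * 1000

accuracy, precision, recall = accuracy_precision_recall(y_true, y_pred)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Accuracy comes out at 0.99 even though the model never finds a single positive, which recall (0.00) makes obvious.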

Mini Project

Grafana dashboard

Add a latency histogram to the Iris API.
Add a prediction-count counter with a per-class breakdown.
In Grafana, build a p50/p95/p99 panel plus a class-distribution pie chart.

Takeaway

Remember

Metric = percentile, paired, purposeful. Do not make a decision on a single number.
