Phase 7 · Chapter 7.03
Performance Metrics Tracking
What you don't measure, you can't improve. But measure the wrong metric, and you will optimize in the wrong direction.
The Four Quadrants
ML metric framework
```text
              | Online (live)        | Offline (batch)
--------------|----------------------|--------------------------
System        | latency, QPS, errors | resource cost, build time
ML / Business | CTR, accuracy*,      | holdout accuracy,
              | revenue/req          | fairness, calibration

* online accuracy is only available once delayed labels arrive
```
Latency 101
Averages lie
- An average of 50ms sounds fine, but if p99 = 2s, the slowest 1% of requests take 2s or longer, and those users get a bad experience.
- Always report p50 / p95 / p99.
- Tail latency is the real bottleneck for scaling and UX.
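A quick way to see why the mean hides the tail is to compute the mean and the percentiles on the same sample. This is a standard-library sketch. It uses the nearest-rank percentile definition, which is only one of several conventions; NumPy and Prometheus interpolate differently. The latency sample is made up for illustration.

```python
import math
import statistics


def percentile(values: list[float], q: float) -> float:
    """Nearest-rank percentile: smallest sample with at least q% of samples <= it."""
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def latency_report(latencies_ms: list[float]) -> dict[str, float]:
    """Mean next to p50/p95/p99 so the tail is visible alongside the average."""
    return {
        "mean": statistics.fmean(latencies_ms),
        "p50": percentile(latencies_ms, 50),
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
    }


# Hypothetical sample: 98 fast requests (30 ms) and 2 very slow ones (2 s).
samples = [30.0] * 98 + [2000.0] * 2
print(latency_report(samples))
```

On this sample the mean is 69.4 ms and p95 is 30 ms, yet p99 is 2000 ms. Only the p99 figure reveals the two slow requests.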
Histogram
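A Prometheus histogram stores only cumulative counts per bucket (the `_bucket{le="..."}` series), so p99 is never recorded directly. PromQL's `histogram_quantile` estimates it at query time. The sketch below is a simplified Python re-implementation of that estimate, written for illustration and not taken from the Prometheus source. It assumes 0 < q <= 1 and buckets already summed across series. It finds the bucket that contains the target rank and interpolates linearly inside it; if the rank lands in the `+Inf` bucket, it returns the highest finite bound.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative buckets.

    `buckets` holds (upper_bound, cumulative_count) pairs sorted by bound and
    ending with (math.inf, total_count), like Prometheus `le` buckets.
    """
    total = buckets[-1][1]
    if total == 0 or len(buckets) < 2:
        return math.nan
    rank = q * total
    # First finite bucket whose cumulative count reaches the target rank.
    for i, (upper, cumulative) in enumerate(buckets[:-1]):
        if cumulative >= rank:
            break
    else:
        # Rank falls into the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    lower = buckets[i - 1][0] if i > 0 else 0.0
    below = buckets[i - 1][1] if i > 0 else 0.0
    in_bucket = cumulative - below
    # Assume samples are spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - below) / in_bucket


# Hypothetical cumulative counts for 100 requests (bounds in seconds).
example = [(0.05, 90), (0.1, 95), (0.25, 99), (0.5, 100), (math.inf, 100)]
print(histogram_quantile(0.5, example), histogram_quantile(0.99, example))
```

With these counts, p99 comes out as exactly 0.25 s. That holds whether the four requests in the (0.1, 0.25] bucket took 0.11 s or 0.25 s, because only the counts are stored. Placing bucket bounds near your latency SLO keeps the estimate meaningful.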
Prometheus latency tracking
```python
from prometheus_client import Histogram

# Inference latency in seconds, labelled by model version.
LATENCY = Histogram(
    "predict_latency_seconds",
    "Inference latency",
    ["model_version"],
    buckets=[0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5],  # +Inf is appended automatically
)

# Time each prediction; `model` and `x` come from the serving code.
with LATENCY.labels("v1").time():
    pred = model.predict(x)
```

```promql
# Grafana query: p99 latency of model v1, from 5-minute bucket rates
histogram_quantile(0.99,
  sum by (le) (rate(predict_latency_seconds_bucket{model_version="v1"}[5m]))
)
```
Delayed Accuracy
Rolling metric once ground truth arrives
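```python
# Ground truth reaches the clickstream about 1h after the prediction (T+1h).
# `warehouse` and `push_to_prometheus` stand for app-specific helpers.
def evaluate_predictions_with_label(window="1h"):
    # Join logged predictions with the labels that arrived within the window.
    df = warehouse.query(f"""
        SELECT p.prediction, l.label
        FROM predictions p JOIN labels l USING (request_id)
        WHERE l.ts >= now() - INTERVAL '{window}'
    """)
    acc = (df.prediction == df.label).mean()  # fraction of correct predictions
    push_to_prometheus(f"rolling_accuracy_{window}", acc)
    return acc
```
MLflow Tracking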
Experiment + production unified
```python
import mlflow

# `version` (deploy tag) and `model` (fitted scikit-learn estimator) come from the deploy pipeline.
mlflow.set_experiment("iris-prod")

with mlflow.start_run(run_name=f"deploy-{version}"):
    mlflow.log_param("n_estimators", 200)
    mlflow.log_metric("offline_accuracy", 0.96)     # holdout evaluation
    mlflow.log_metric("online_p95_ms", 38.2)        # live latency
    mlflow.log_metric("rolling_accuracy_7d", 0.93)  # accuracy from delayed labels
    mlflow.sklearn.log_model(model, "model")
```
Business Tie-in
Tech metric ≠ value
- Higher recommendation accuracy does not always mean higher revenue. Measure CTR plus downstream conversion.
- Fraud model recall goes up, but false positives triple → customer support costs rise.
- Always pair an ML metric with a cost-aware business KPI.
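The fraud example can be made concrete by scoring each model on a cost-aware KPI as well as on recall. This sketch assumes a hypothetical linear cost model: each missed fraud loses a fixed amount, and each false alarm costs a fixed amount of support time. The model names, counts and costs are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class FraudModelResult:
    name: str
    true_positives: int   # fraud caught
    false_negatives: int  # fraud missed
    false_positives: int  # legitimate transactions flagged


def recall(r: FraudModelResult) -> float:
    return r.true_positives / (r.true_positives + r.false_negatives)


def business_cost(r: FraudModelResult, loss_per_missed_fraud: float,
                  cost_per_false_alarm: float) -> float:
    """Hypothetical KPI: money lost to missed fraud plus support cost of false alarms."""
    return (r.false_negatives * loss_per_missed_fraud
            + r.false_positives * cost_per_false_alarm)


# Made-up numbers: model B catches more fraud but triples the false positives.
a = FraudModelResult("A", true_positives=80, false_negatives=20, false_positives=100)
b = FraudModelResult("B", true_positives=90, false_negatives=10, false_positives=300)
for m in (a, b):
    print(m.name, recall(m), business_cost(m, loss_per_missed_fraud=50, cost_per_false_alarm=10))
```

With these assumed costs, recall rises from 0.8 to 0.9 while total cost rises from 2000 to 3500. With a much higher loss per missed fraud the ranking could flip, which is why the KPI has to carry the cost terms.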
Pitfalls
Metric trap
- Average-latency dashboards: the tail stays invisible.
- Accuracy as a single number: it hides class imbalance; look at precision/recall.
- Survivorship bias: logging only the successes.
- Goodhart's law: once a metric becomes a target, people start gaming it.
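The class-imbalance trap is easy to reproduce on toy data. This sketch uses only the standard library. The data is made up: 95 legitimate transactions, 5 fraudulent ones, and a "model" that never flags fraud. By assumption, precision and recall are reported as 0.0 when undefined; scikit-learn exposes a similar choice via `zero_division`.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class; 0.0 when the ratio is undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Imbalanced toy data; the "model" predicts the majority class every time.
y_true = ["legit"] * 95 + ["fraud"] * 5
y_pred = ["legit"] * 100
print(accuracy(y_true, y_pred))
print(precision_recall(y_true, y_pred, positive="fraud"))
```

Accuracy is 0.95 while fraud recall is 0.0: the single number looks excellent for a model that catches nothing.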
Mini Project
Grafana dashboard
- Add a latency histogram to the Iris API.
- Add a prediction counter with a per-class breakdown.
- In Grafana, build a p50/p95/p99 panel and a class-distribution pie chart.
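A possible starting point for the first two steps is sketched below. It assumes the `prometheus_client` package and a model object with a scikit-learn-style `predict` method. The metric names `iris_predictions` (exposed as `iris_predictions_total`) and `iris_predict_latency_seconds`, and the helper `predict_and_record`, are illustrative choices, not part of any existing Iris API. A dedicated `CollectorRegistry` keeps the example isolated; a real service would more likely use the default registry and expose it with `start_http_server`.

```python
from prometheus_client import CollectorRegistry, Counter, Histogram

registry = CollectorRegistry()  # isolated registry for this sketch

# Prediction count per model version and predicted class (feeds the pie chart).
PREDICTIONS = Counter(
    "iris_predictions",
    "Predictions served, by predicted class",
    ["model_version", "predicted_class"],
    registry=registry,
)
# Latency histogram (feeds the p50/p95/p99 panel).
LATENCY = Histogram(
    "iris_predict_latency_seconds",
    "Inference latency",
    ["model_version"],
    buckets=[0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1],
    registry=registry,
)


def predict_and_record(model, features, version="v1"):
    """Hypothetical helper: time one prediction and count its predicted class."""
    with LATENCY.labels(version).time():
        predicted = model.predict([features])[0]
    PREDICTIONS.labels(version, str(predicted)).inc()
    return predicted
```

For the Grafana panels, one option is `histogram_quantile(0.95, sum by (le) (rate(iris_predict_latency_seconds_bucket[5m])))`, with 0.5 and 0.99 for the other two lines. For the pie chart, `sum by (predicted_class) (increase(iris_predictions_total[1h]))` is one choice.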
Takeaway
Remember
Metric = percentile, paired, purposeful. Never make a decision based on a single number.