Phase 3 · Chapter 3.03
Automated Testing for AI Systems
A software test checks that the code works. An ML test checks that code, data, and model work together. Four layers, four strategies.
The 4-Layer Pyramid
The four layers of ML testing
- 1. Unit tests: feature engineering and utility functions.
- 2. Data tests: schema, value ranges, nulls, distribution.
- 3. Model tests: accuracy, fairness, invariance, directional expectations.
- 4. Integration / API tests: endpoint, contract, smoke tests.
Layer 1
Unit test with pytest
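The tests that follow import `normalize_petal` from `src.features`, a module this chapter does not show. Here is a minimal sketch of a function that would satisfy them. The scale factor `PETAL_MAX_CM = 10.0` is an assumption inferred from the `10.0 → 1.0` assertion, and clipping values above it is a hypothetical design choice.

```python
# src/features.py (hypothetical sketch; the real module is not shown in the chapter)
PETAL_MAX_CM = 10.0  # assumed upper bound, inferred from the test cases


def normalize_petal(value: float) -> float:
    """Scale a petal measurement into [0, 1]; reject negative input."""
    if value < 0:
        raise ValueError(f"petal measurement must be non-negative, got {value}")
    # values above the assumed maximum are clipped to 1.0
    return min(value / PETAL_MAX_CM, 1.0)
```

Raising `ValueError` instead of silently clipping negative input is exactly what `test_normalize_petal_negative_raises` checks for.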
```python
# tests/test_features.py
import pytest

from src.features import normalize_petal


def test_normalize_petal_range():
    # boundary values map to the ends of the [0, 1] range
    assert normalize_petal(0.0) == 0.0
    assert normalize_petal(10.0) == pytest.approx(1.0)


def test_normalize_petal_negative_raises():
    # a negative petal measurement is invalid input
    with pytest.raises(ValueError):
        normalize_petal(-1.0)
```
Layer 2
Data test — Pandera schema
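The schema test that follows requests a `train_df` fixture, which the chapter does not define. One way to provide it, sketched under assumptions: a `tests/conftest.py` that reads a small committed CSV sample. The file name `iris_sample.csv` and the helper `load_iris_sample` are hypothetical.

```python
# tests/conftest.py (hypothetical sketch; the chapter does not define train_df)
from pathlib import Path

import pandas as pd
import pytest

# assumed path of a small sample committed to the repo, relative to the project root
FIXTURE_CSV = Path("tests") / "fixtures" / "iris_sample.csv"

# explicit dtypes so the schema sees float and string columns
IRIS_DTYPES = {
    "sepal_length": float,
    "sepal_width": float,
    "petal_length": float,
    "petal_width": float,
    "species": str,
}


def load_iris_sample(path=FIXTURE_CSV) -> pd.DataFrame:
    """Read the fixed training sample from a CSV path or file-like object."""
    return pd.read_csv(path, dtype=IRIS_DTYPES)


@pytest.fixture(scope="session")
def train_df() -> pd.DataFrame:
    # loaded once per test session and shared by every test that requests it
    return load_iris_sample()
```

Keeping the loader as a plain function makes it reusable outside pytest; `scope="session"` means the CSV is read only once per test run.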
```python
import pandera as pa
from pandera import Check, Column

# Column(...) is non-nullable by default, so missing values also fail validation
iris_schema = pa.DataFrameSchema({
    "sepal_length": Column(float, Check.in_range(4.0, 8.0)),
    "sepal_width": Column(float, Check.in_range(2.0, 4.5)),
    "petal_length": Column(float, Check.in_range(1.0, 7.0)),
    "petal_width": Column(float, Check.in_range(0.1, 2.5)),
    "species": Column(str, Check.isin(["setosa", "versicolor", "virginica"])),
})


def test_training_data_schema(train_df):
    # train_df is a pytest fixture, e.g. defined in tests/conftest.py
    iris_schema.validate(train_df)  # raises SchemaError on bad data: fail fast
```
Layer 3
Model test — quality + behavior
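The model tests that follow load `models/iris.pkl`, but the chapter does not show how that artifact is produced. A minimal training sketch, assuming scikit-learn's `LogisticRegression` (any classifier with `predict_proba` would do) and a hypothetical `train.py` script. Training on string labels keeps `model.classes_` sorted as `setosa`, `versicolor`, `virginica`, so column 0 of `predict_proba` is setosa, which the directional test relies on.

```python
# train.py (hypothetical sketch; the chapter does not show how the model is built)
from pathlib import Path

import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

SEED = 42  # fixed seed so reruns produce the same model


def train_and_save(path: str = "models/iris.pkl") -> LogisticRegression:
    iris = load_iris()
    X = iris.data
    # string labels, so predictions are "setosa" / "versicolor" / "virginica"
    y = iris.target_names[iris.target]
    model = LogisticRegression(max_iter=1000, random_state=SEED)
    model.fit(X, y)
    # create models/ if needed, then persist the fitted estimator
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    joblib.dump(model, path)
    return model


if __name__ == "__main__":
    train_and_save()
```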
```python
import joblib
import numpy as np

# loaded once, when pytest imports this test module
model = joblib.load("models/iris.pkl")


def test_baseline_accuracy(test_df):
    # test_df: fixture exposing features as .X and labels as .y
    acc = (model.predict(test_df.X) == test_df.y).mean()
    assert acc >= 0.90, f"Accuracy dropped: {acc:.3f}"


def test_invariance_to_irrelevant_noise():
    # a tiny perturbation should not change the prediction
    x = np.array([[5.1, 3.5, 1.4, 0.2]])
    base = model.predict(x)[0]
    noisy = model.predict(x + 1e-3)[0]
    assert base == noisy


def test_directional_petal_length():
    # a longer petal should make setosa less likely;
    # column 0 is the first entry of model.classes_, assumed to be setosa
    small = model.predict_proba([[5.0, 3.5, 1.0, 0.2]])[0][0]
    large = model.predict_proba([[5.0, 3.5, 5.0, 0.2]])[0][0]
    assert large < small
```
Layer 4
API test — FastAPI TestClient
```python
from fastapi.testclient import TestClient

from src.main import app

client = TestClient(app)


def test_predict_endpoint_contract():
    r = client.post("/predict", json={
        "sepal_length": 5.1, "sepal_width": 3.5,
        "petal_length": 1.4, "petal_width": 0.2,
    })
    assert r.status_code == 200
    body = r.json()
    # the response must keep the agreed contract fields
    assert "prediction" in body
    assert "model_version" in body
    assert body["prediction"] in {"setosa", "versicolor", "virginica"}


def test_predict_rejects_invalid_input():
    # a wrong type and missing fields must fail request validation
    r = client.post("/predict", json={"sepal_length": "oops"})
    assert r.status_code == 422
```
Best Practices
Rules for writing ML tests
- Fix random seeds, e.g. `np.random.seed(42)`, and pass `random_state` to scikit-learn estimators.
- Commit a small, fixed test dataset (`tests/fixtures/`).
- Keep a baseline metric file so that regressions get detected.
- Mark slow tests separately with `@pytest.mark.slow`.
- Aim for coverage of at least 80%, but remember that coverage is not the same as quality.
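The baseline metric file practice can be sketched as a small helper that compares freshly computed metrics against a committed JSON file. The function name `find_regressions`, the JSON layout (`{"accuracy": 0.95, ...}`), and the default tolerance of 0.01 are all assumptions, and every metric is assumed to be higher-is-better.

```python
# hypothetical helper: compare fresh metrics against a committed baseline file
import json
from pathlib import Path


def find_regressions(current: dict, baseline_path, tolerance: float = 0.01) -> list:
    """Return names of metrics that fell more than `tolerance` below the baseline.

    Assumes higher is better for every metric (e.g. accuracy, F1).
    """
    baseline = json.loads(Path(baseline_path).read_text())
    regressions = []
    for name, base_value in baseline.items():
        value = current.get(name)
        # a metric missing from the current run also counts as a regression
        if value is None or value < base_value - tolerance:
            regressions.append(name)
    return regressions
```

In a test, `assert not find_regressions(metrics, "tests/fixtures/baseline_metrics.json")` would fail the build whenever a metric drops; that path is hypothetical.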
Anti-patterns
What not to do
- Using production data in tests (risk of leaking PII).
- Making network calls in tests: mock them instead.
- Comparing floats for exact equality: use `pytest.approx`.
- Letting one test file grow past 1000 lines: split it into separate files.
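For the network anti-pattern, a sketch with the standard library's `unittest.mock`: the function under test (`fetch_reference_stats`, hypothetical) calls `urllib.request.urlopen`, and the test patches that call so no real request leaves the machine. The URL is a placeholder.

```python
# hypothetical code under test: fetches reference statistics over HTTP
import json
import urllib.request
from unittest.mock import MagicMock, patch


def fetch_reference_stats(url: str) -> dict:
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def test_fetch_reference_stats_without_network():
    fake_resp = MagicMock()
    fake_resp.read.return_value = b'{"sepal_length_mean": 5.84}'
    # replace urlopen for the duration of the block, so no real request is made
    with patch("urllib.request.urlopen") as mock_urlopen:
        # urlopen is used as a context manager, so stub what __enter__ returns
        mock_urlopen.return_value.__enter__.return_value = fake_resp
        stats = fetch_reference_stats("https://example.com/stats.json")
    assert stats == {"sepal_length_mean": 5.84}
    mock_urlopen.assert_called_once_with("https://example.com/stats.json", timeout=5)
```

`patch` swaps out `urlopen` only inside the `with` block; the same idea applies to any HTTP client, as long as you patch the name where the code under test looks it up.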
Mini Project
4-layer test suite
- In the Iris project, create a `tests/` directory with 4 files, one per layer.
- Run `pytest --cov` (provided by the pytest-cov plugin) in GitHub Actions.
- Add an action that posts the coverage report as a PR comment.
Phase 3 Complete
What you learned
CI/CD concepts, GitHub Actions workflows, and four layers of automated testing: your ML code now passes an automatic quality gate before it ever reaches production. Next phase: Cloud Deployment, the world of hosting on AWS, GCP, and Azure.