Phase 3 · Chapter 3.03

Automated Testing for AI Systems

A software test says 'the code works'. An ML test says 'the code, data, and model work together'. Four layers, four strategies.

The 4-Layer Pyramid

The four layers of ML testing

  • 1. Unit tests: feature engineering, utility functions.
  • 2. Data tests: schema, range, null, distribution.
  • 3. Model tests: accuracy, fairness, invariance, directional.
  • 4. Integration / API tests: endpoint, contract, smoke.
Layer 1

Unit test — pytest fixture

python · production
# tests/test_features.py
import pytest
from src.features import normalize_petal

def test_normalize_petal_range():
    # compare floats with a tolerance rather than exact equality
    assert normalize_petal(0.0) == pytest.approx(0.0)
    assert normalize_petal(10.0) == pytest.approx(1.0)

def test_normalize_petal_negative_raises():
    with pytest.raises(ValueError):
        normalize_petal(-1.0)
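
The tests above import normalize_petal from src.features without showing it. A minimal sketch of an implementation that would satisfy them, assuming simple scaling over a 0 to 10 range (the bounds, the max_value parameter, and the error message are assumptions inferred from the tests, not taken from the project):

```python
# src/features.py (hypothetical sketch; the real module may differ)

def normalize_petal(value: float, max_value: float = 10.0) -> float:
    """Scale a petal measurement from [0, max_value] to [0, 1].

    The 0..10 range is an assumption inferred from the tests,
    which expect 0.0 -> 0.0 and 10.0 -> 1.0.
    """
    if value < 0:
        # A physical length cannot be negative, so fail loudly.
        raise ValueError(f"petal measurement must be non-negative, got {value}")
    return value / max_value
```

Raising on bad input, rather than clipping silently, is what lets the second test pin the behavior down with pytest.raises.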
Layer 2

Data test — Pandera schema

python · production
import pandera as pa
from pandera import Column, Check

iris_schema = pa.DataFrameSchema({
    "sepal_length": Column(float, Check.in_range(4.0, 8.0)),
    "sepal_width":  Column(float, Check.in_range(2.0, 4.5)),
    "petal_length": Column(float, Check.in_range(1.0, 7.0)),
    "petal_width":  Column(float, Check.in_range(0.1, 2.5)),
    "species":      Column(str,   Check.isin(["setosa", "versicolor", "virginica"])),
})

def test_training_data_schema(train_df):
    iris_schema.validate(train_df)  # fail fast on bad data
Layer 3

Model test — quality + behavior

python · production
import joblib
import numpy as np

model = joblib.load("models/iris.pkl")

def test_baseline_accuracy(test_df):
    acc = (model.predict(test_df.X) == test_df.y).mean()
    assert acc >= 0.90, f"Accuracy dropped: {acc:.3f}"

def test_invariance_to_irrelevant_noise():
    # small perturbation should not change prediction
    x = np.array([[5.1, 3.5, 1.4, 0.2]])
    base = model.predict(x)[0]
    noisy = model.predict(x + 1e-3)[0]
    assert base == noisy

def test_directional_petal_length():
    # larger petal → less likely setosa
    # (column 0 of predict_proba is assumed to be setosa; check model.classes_)
    small = model.predict_proba([[5.0, 3.5, 1.0, 0.2]])[0][0]
    large = model.predict_proba([[5.0, 3.5, 5.0, 0.2]])[0][0]
    assert large < small
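
The layer list names fairness among the model tests, but the tests above cover only accuracy, invariance, and direction. One possible sketch in plain Python: compute accuracy per slice (here, per class) and assert that no slice falls far behind the best one. The helper names, the 0.30 gap threshold, and the toy labels are illustrative assumptions; a real test would feed in model.predict output and the slicing column that matters for the project.

```python
from collections import defaultdict

MAX_GAP = 0.30  # assumed threshold; tune per project


def accuracy_by_slice(y_true, y_pred, slices):
    """Return {slice_name: accuracy} for predictions grouped by slice."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, name in zip(y_true, y_pred, slices):
        total[name] += 1
        correct[name] += int(truth == pred)
    return {name: correct[name] / total[name] for name in total}


def max_accuracy_gap(per_slice):
    """Difference between the best and the worst slice accuracy."""
    values = list(per_slice.values())
    return max(values) - min(values)


def test_slice_accuracy_gap():
    # Toy labels for illustration only: one versicolor sample is misclassified.
    y_true = ["setosa"] * 4 + ["versicolor"] * 4 + ["virginica"] * 4
    y_pred = ["setosa"] * 4 + ["versicolor"] * 3 + ["virginica"] * 5
    per_class = accuracy_by_slice(y_true, y_pred, slices=y_true)
    assert max_accuracy_gap(per_class) <= MAX_GAP, per_class
```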
Layer 4

API test — FastAPI TestClient

python · production
from fastapi.testclient import TestClient
from src.main import app

client = TestClient(app)

def test_predict_endpoint_contract():
    r = client.post("/predict", json={
        "sepal_length": 5.1, "sepal_width": 3.5,
        "petal_length": 1.4, "petal_width": 0.2,
    })
    assert r.status_code == 200
    body = r.json()
    assert "prediction" in body
    assert "model_version" in body
    assert body["prediction"] in {"setosa", "versicolor", "virginica"}

def test_predict_rejects_invalid_input():
    r = client.post("/predict", json={"sepal_length": "oops"})
    assert r.status_code == 422
Best Practices

Rules for writing ML tests

  • Fix the random seed, e.g. np.random.seed(42).
  • Commit a small, fixed test dataset (tests/fixtures/).
  • Keep a baseline metric file so that regressions are detected.
  • Mark slow tests separately with @pytest.mark.slow.
  • Coverage target ≥ 80%, but coverage ≠ quality.
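
The baseline-metric rule can be sketched with the standard library alone: store the last accepted metrics as JSON and report any metric that drops by more than a tolerance. The function name, the flat JSON layout, and the 0.01 tolerance are assumptions for illustration.

```python
import json
from pathlib import Path


def find_regressions(current: dict, baseline_path: Path, tolerance: float = 0.01) -> dict:
    """Return the metrics that fell more than `tolerance` below the baseline.

    Assumes higher is better for every metric and that the baseline file
    is a flat JSON object such as {"accuracy": 0.95}.
    """
    baseline = json.loads(baseline_path.read_text())
    regressions = {}
    for name, old_value in baseline.items():
        new_value = current.get(name)
        # A metric missing from the current run is treated as a regression too.
        if new_value is None or new_value < old_value - tolerance:
            regressions[name] = {"baseline": old_value, "current": new_value}
    return regressions
```

A model test could then end with something like `assert not find_regressions({"accuracy": acc}, Path("tests/fixtures/baseline_metrics.json"))`, where the fixture path is hypothetical.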
Anti-patterns

What not to do

  • Using production data in tests (risk of PII leaks).
  • Making network calls in tests; mock them instead.
  • Exact float equality; use pytest.approx instead.
  • A single test file of 1000+ lines; split it up.
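
To make the rule about mocking network calls concrete, here is a minimal sketch using unittest.mock from the standard library. The fetch_model_version helper and its URL are hypothetical; the point is that the test swaps out urlopen, so no real request leaves the machine.

```python
import json
import urllib.request
from unittest import mock


def fetch_model_version(url: str) -> str:
    """Hypothetical helper: read {"model_version": ...} from a JSON endpoint."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read())["model_version"]


def test_fetch_model_version_without_network():
    # Fake response object that also works as a context manager.
    fake_response = mock.MagicMock()
    fake_response.__enter__.return_value.read.return_value = b'{"model_version": "1.2.0"}'

    # Replace urlopen for the duration of the call only.
    with mock.patch("urllib.request.urlopen", return_value=fake_response) as fake_urlopen:
        version = fetch_model_version("https://example.invalid/version")

    assert version == "1.2.0"
    fake_urlopen.assert_called_once_with("https://example.invalid/version")
```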
Mini Project

4-layer test suite

  1. Create a tests/ directory in the Iris project with 4 files, one per layer.
  2. Run pytest --cov in GitHub Actions.
  3. Add an action that posts the coverage report as a PR comment.
Phase 3 Complete

What you learned

CI/CD concepts, GitHub Actions workflows, and four-layer automated testing: your ML code now passes through an automatic quality gate before it reaches production. Next phase: Cloud Deployment, the world of hosting on AWS, GCP, and Azure.
