Automated Testing for AI Systems

A software test says "the code works". An ML test says "the code, data, and model work together". Four layers, four strategies.

The 4-Layer Pyramid

The four layers of ML testing

1. Unit tests: feature engineering, utility functions.
2. Data tests: schema, range, nulls, distribution.
3. Model tests: accuracy, fairness, invariance, directional expectations.
4. Integration / API tests: endpoint, contract, smoke.

Layer 1

Unit test — pytest fixture


# tests/test_features.py
import pytest
from src.features import normalize_petal

def test_normalize_petal_range():
    # pytest.approx avoids brittle exact float comparison
    assert normalize_petal(0.0) == pytest.approx(0.0)
    assert normalize_petal(10.0) == pytest.approx(1.0)

def test_normalize_petal_negative_raises():
    # negative measurements are invalid input
    with pytest.raises(ValueError):
        normalize_petal(-1.0)
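
The tests above import `normalize_petal` from `src/features.py`, which is not shown here. As a rough sketch of a function that would satisfy them, assuming min-max scaling over a fixed range of 0 to 10 (the bounds are inferred from the expected values in the tests, and `PETAL_MIN` / `PETAL_MAX` are hypothetical names), it might look like this:

```python
# src/features.py (hypothetical implementation, inferred from the tests above)
PETAL_MIN = 0.0   # assumed lower bound
PETAL_MAX = 10.0  # assumed upper bound


def normalize_petal(value: float) -> float:
    """Min-max scale a petal measurement into [0, 1]."""
    if value < PETAL_MIN:
        raise ValueError(f"petal measurement must be >= {PETAL_MIN}, got {value}")
    # Clamp values above the assumed maximum so the output stays within [0, 1].
    clamped = min(value, PETAL_MAX)
    return (clamped - PETAL_MIN) / (PETAL_MAX - PETAL_MIN)
```

Clamping large values is a design choice made for this sketch; raising an error for them would be equally reasonable, and the tests would then need a matching case.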

Layer 2

Data test — Pandera schema


import pandera as pa
from pandera import Column, Check

iris_schema = pa.DataFrameSchema({
    "sepal_length": Column(float, Check.in_range(4.0, 8.0)),
    "sepal_width":  Column(float, Check.in_range(2.0, 4.5)),
    "petal_length": Column(float, Check.in_range(1.0, 7.0)),
    "petal_width":  Column(float, Check.in_range(0.1, 2.5)),
    "species":      Column(str,   Check.isin(["setosa", "versicolor", "virginica"])),
})

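def test_training_data_schema(train_df):
    # train_df is assumed to be a pytest fixture (e.g. defined in conftest.py)
    iris_schema.validate(train_df)  # raises SchemaError on bad data, so the test fails fast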
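
The schema above covers column types and value ranges. The layer list also names null and distribution checks, so here is a dependency-free sketch of those two ideas. The helper names `count_missing` and `mean_shift_in_std`, the sample values, and the threshold of 3 standard deviations are all illustrative assumptions, not part of any library.

```python
import math
import statistics


def count_missing(values):
    """Count entries that are None or NaN."""
    return sum(
        1 for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )


def mean_shift_in_std(reference, current):
    """Distance between the current and reference means,
    measured in reference standard deviations."""
    ref_mean = statistics.fmean(reference)
    ref_std = statistics.stdev(reference)  # needs at least two reference values
    return abs(statistics.fmean(current) - ref_mean) / ref_std


def test_petal_length_has_no_nulls_and_no_drift():
    # Tiny made-up samples, for illustration only.
    reference = [1.4, 1.5, 4.7, 4.5, 6.0, 5.1]
    current = [1.3, 1.6, 4.4, 4.9, 5.8, 5.5]
    assert count_missing(current) == 0
    # Illustrative threshold: flag a shift larger than 3 reference std devs.
    assert mean_shift_in_std(reference, current) < 3.0
```

A mean comparison is only a coarse drift signal. A real suite would more likely compare full distributions, but the shape of the test stays the same: compute a statistic against a committed reference and assert on a threshold.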

Layer 3

Model test — quality + behavior


import joblib
import numpy as np

# Loaded once at import time; the path is relative to where pytest is run.
model = joblib.load("models/iris.pkl")

def test_baseline_accuracy(test_df):
    # test_df is assumed to be a fixture exposing features as .X and labels as .y
    acc = (model.predict(test_df.X) == test_df.y).mean()
    assert acc >= 0.90, f"Accuracy dropped: {acc:.3f}"

def test_invariance_to_irrelevant_noise():
    # small perturbation should not change prediction
    x = np.array([[5.1, 3.5, 1.4, 0.2]])
    base = model.predict(x)[0]
    noisy = model.predict(x + 1e-3)[0]
    assert base == noisy

def test_directional_petal_length():
    # larger petal → less likely setosa
    # column 0 is assumed to be setosa; confirm against model.classes_
    small = model.predict_proba([[5.0, 3.5, 1.0, 0.2]])[0][0]
    large = model.predict_proba([[5.0, 3.5, 5.0, 0.2]])[0][0]
    assert large < small
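
An overall accuracy threshold can hide one class that the model handles badly. A simple slice-based check, in the spirit of the fairness item in the layer list, is to require a minimum accuracy for every class separately. The sketch below is plain Python; `per_class_accuracy`, `assert_no_weak_class`, and the 0.80 threshold are hypothetical names and values chosen for illustration.

```python
from collections import defaultdict


def per_class_accuracy(y_true, y_pred):
    """Return {label: fraction of that label's rows predicted correctly}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {label: correct[label] / total[label] for label in total}


def assert_no_weak_class(y_true, y_pred, min_accuracy=0.80):
    """Fail if any single class falls below min_accuracy (illustrative threshold)."""
    scores = per_class_accuracy(y_true, y_pred)
    weak = {label: acc for label, acc in scores.items() if acc < min_accuracy}
    assert not weak, f"Classes below {min_accuracy:.2f}: {weak}"
```

Inside a test this could be called as `assert_no_weak_class(test_df.y, model.predict(test_df.X))`, assuming the same `test_df` fixture as the accuracy test.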

Layer 4

API test — FastAPI TestClient


from fastapi.testclient import TestClient
from src.main import app

client = TestClient(app)

def test_predict_endpoint_contract():
    r = client.post("/predict", json={
        "sepal_length": 5.1, "sepal_width": 3.5,
        "petal_length": 1.4, "petal_width": 0.2,
    })
    assert r.status_code == 200
    body = r.json()
    assert "prediction" in body
    assert "model_version" in body
    assert body["prediction"] in {"setosa", "versicolor", "virginica"}

def test_predict_rejects_invalid_input():
    r = client.post("/predict", json={"sepal_length": "oops"})
    assert r.status_code == 422

Best Practices

Rules for writing ML tests

Fix the random seed, e.g. np.random.seed(42).
Commit a small, fixed test dataset (tests/fixtures/).
Keep a baseline metrics file so regressions are detected.
Mark slow tests separately with @pytest.mark.slow.
Target coverage of at least 80%, but remember that coverage ≠ quality.
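
The baseline metrics rule can be sketched as a small comparison helper. This assumes the baseline is a JSON file of metric names mapped to numbers and that higher is better for each metric; `load_baseline`, `find_regressions`, and the 0.01 tolerance are hypothetical choices for this example.

```python
import json
from pathlib import Path


def load_baseline(path):
    """Read committed baseline metrics, e.g. {"accuracy": 0.95}."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def find_regressions(current, baseline, tolerance=0.01):
    """Return metrics that fell more than `tolerance` below baseline.

    Assumes higher is better for every metric listed in the baseline.
    A metric missing from `current` is also reported.
    """
    regressions = {}
    for name, expected in baseline.items():
        actual = current.get(name)
        if actual is None or actual < expected - tolerance:
            regressions[name] = {"baseline": expected, "current": actual}
    return regressions
```

A model test could then assert `not find_regressions({"accuracy": acc}, load_baseline(path))`, where `path` points at whichever baseline file the project commits. The small tolerance keeps the gate from failing on run-to-run noise.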

Anti-patterns

What not to do

Using production data in tests (risk of PII leaks).
Making network calls in tests; mock them instead.
Exact float equality; use pytest.approx instead.
A single test file of 1000+ lines; split it up.
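
To illustrate the advice on network calls, here is a minimal sketch using the standard library's `unittest.mock`. The function `fetch_model_version`, the JSON shape, and the URL are hypothetical; the point is only that `urllib.request.urlopen` is replaced during the test, so nothing leaves the machine.

```python
import json
import urllib.request
from unittest import mock


def fetch_model_version(url: str) -> str:
    """Hypothetical helper that reads a version string from a JSON endpoint."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["model_version"]


def test_fetch_model_version_without_network():
    # Build a fake response object that works as a context manager
    # and returns canned bytes from .read().
    fake = mock.MagicMock()
    fake.__enter__.return_value.read.return_value = b'{"model_version": "1.2.0"}'

    # Swap urlopen for the duration of the block; no real request is made.
    with mock.patch("urllib.request.urlopen", return_value=fake) as fake_open:
        assert fetch_model_version("https://example.invalid/meta") == "1.2.0"

    fake_open.assert_called_once()
```

The same pattern applies to any client library: patch the call at the boundary, return a canned response, and assert on how the code under test handled it.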

Mini Project

4-layer test suite

Create a tests/ directory in the Iris project with four files.
Run pytest --cov in GitHub Actions.
Add an action that posts the coverage report as a PR comment.

Phase 3 Complete

What you learned

CI/CD concepts, GitHub Actions workflows, and four-layer automated testing: your ML code now passes through an automatic quality gate before it reaches production. Next phase: Cloud Deployment, the world of hosting on AWS, GCP, and Azure.
