Phase 8 · Chapter 8.04

AutoML Pipelines

Feature engineering, model search, hyperparameter tuning: AutoML promises to automate all of it. Is that true? Partly.

Definition

What AutoML means

Raw data → preprocessing → algorithm selection → hyperparameter tuning → ensembling → tuned model: the whole pipeline runs automatically. You supply the data and the target column, and the framework handles the rest.
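
To make these stages concrete, here is a minimal hand-written sketch of what an AutoML framework automates, using scikit-learn only. The dataset (Iris), the two candidate algorithms, and the small search grids are assumptions chosen for illustration. Real AutoML frameworks search far larger spaces and add ensembling on top.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# assumption: Iris stands in for "your data + target"
X, y = load_iris(return_X_y=True)

# preprocessing -> model, with the model step left swappable
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# algorithm selection and hyperparameter tuning as one search space:
# each dict swaps in a different estimator for the "clf" step
search_space = [
    {"clf": [LogisticRegression(max_iter=1000)], "clf__C": [0.1, 1.0, 10.0]},
    {"clf": [RandomForestClassifier(random_state=42)], "clf__n_estimators": [50, 100]},
]

search = GridSearchCV(pipe, search_space, cv=5, scoring="accuracy")
search.fit(X, y)

best_name = type(search.best_params_["clf"]).__name__
best_score = search.best_score_
print(best_name, round(best_score, 3))
```

Even this small version needs manual decisions about which algorithms and ranges to try. Making those decisions for you is the part an AutoML framework takes over.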

Spectrum

Three types

  • Library: AutoGluon, H2O AutoML, FLAML, auto-sklearn. These run as local code.
  • Managed cloud: Vertex AutoML, SageMaker Autopilot, Azure AutoML.
  • HPO-only: Optuna, Ray Tune. The model architecture is fixed and only the hyperparameters are searched.
AutoGluon Demo

A production-ready model in 3 lines

python
from autogluon.tabular import TabularPredictor
import pandas as pd

# training data must include the label column ("species")
train = pd.read_csv("iris_train.csv")
predictor = TabularPredictor(label="species", eval_metric="accuracy")
# time_limit is in seconds; "best_quality" favors accuracy over speed
predictor.fit(train, time_limit=300, presets="best_quality")

# leaderboard of all trained models
print(predictor.leaderboard())

# predict on held-out data
test = pd.read_csv("iris_test.csv")
preds = predictor.predict(test)
Optuna

Hyperparameter search

python
import optuna
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# assumption: Iris is used here so the example runs; swap in your own X, y
X, y = load_iris(return_X_y=True)

def objective(trial):
    # the model family is fixed; only these hyperparameters are searched
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 50, 500),
        "max_depth":    trial.suggest_int("max_depth", 3, 20),
        "min_samples_split": trial.suggest_int("min_samples_split", 2, 20),
    }
    model = RandomForestClassifier(**params, random_state=42)
    # mean 5-fold cross-validated accuracy is the value Optuna maximizes
    return cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100, n_jobs=4)
print("best:", study.best_params, study.best_value)
Workflow Integration

Plug AutoML into the MLOps pipeline

text
Airflow DAG (weekly):
  pull_latest_data
   └─> validate_schema
        └─> run_autogluon (time_limit=2h, retrain)
             └─> evaluate_vs_champion
                  ├─ if new_acc > champion + ε:
                  │     register_in_model_registry
                  │     trigger_canary_deploy
                  └─ else: keep_champion, log_attempt
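
The evaluate_vs_champion branch in the DAG above can be sketched as a small pure function. This is an illustration only: the `Decision` type, the default `epsilon` of 0.005, and the reason strings are hypothetical, and the registry and canary steps are left to whatever tooling the pipeline already uses.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    promote: bool
    reason: str


def evaluate_vs_champion(new_acc: float, champion_acc: float,
                         epsilon: float = 0.005) -> Decision:
    """Promote the challenger only if it beats the champion by more than epsilon."""
    if new_acc > champion_acc + epsilon:
        return Decision(True, f"challenger {new_acc:.3f} beats champion "
                              f"{champion_acc:.3f} by more than {epsilon}")
    # a gain inside the epsilon margin is treated as noise: keep the champion
    return Decision(False, "keep champion; attempt logged")


decision = evaluate_vs_champion(new_acc=0.95, champion_acc=0.93)
print(decision.promote, decision.reason)
```

The margin ε guards against promoting a model whose gain is within the run-to-run noise of a weekly retrain. Its value should come from the observed variance of the evaluation metric, not from the placeholder used here.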
When AutoML Wins

Good use cases

  • Tabular data with a well-defined target (classification or regression).
  • A small team that needs a baseline model quickly.
  • Many similar models (per tenant, per country), where manual tuning is not feasible.
  • Periodic refresh: the same problem with fresh data.
Limits

Where AutoML loses

  • Domain-specific feature engineering (e.g. fraud signals): AutoML does not know the domain.
  • Deep learning research: AutoML does not produce novel architectures.
  • Strict latency budgets: AutoML ensembles tend to be large and slow.
  • Interpretability requirements: a black-box stacked ensemble is hard to explain.
  • Poor data quality: garbage in, garbage out, just automated.
Best Practice

Using AutoML responsibly

  • Always compare against a baseline (logistic regression); in many cases AutoML does not beat it.
  • A bigger time budget does not automatically raise accuracy (diminishing returns).
  • Run a fairness and drift audit before deploying to production.
  • Extract feature importance / SHAP values and have a human in the loop review them.
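
The baseline comparison from the first point can be sketched as follows. As an assumption for illustration, a RandomForest stands in for the AutoML candidate so the example runs with scikit-learn alone, and Iris stands in for the real dataset. In practice the candidate score would come from the AutoML run, evaluated on the same folds.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)


def cv_accuracy(model):
    """Mean 5-fold cross-validated accuracy, same protocol for every model."""
    return cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()


baseline_acc = cv_accuracy(LogisticRegression(max_iter=1000))
# stand-in for the AutoML output (assumption, see lead-in)
candidate_acc = cv_accuracy(RandomForestClassifier(n_estimators=200, random_state=42))

gain = candidate_acc - baseline_acc
print(f"baseline={baseline_acc:.3f} candidate={candidate_acc:.3f} gain={gain:+.3f}")
```

If the gain is zero or negative, the simpler baseline is usually the better choice to deploy: it is faster and easier to explain.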
Mini Project

AutoGluon vs Manual

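  1. Run AutoGluon on the Iris dataset with a 5-minute budget.
  2. Compare its accuracy against a hand-tuned RandomForest.
  3. Measure the latency of every model on the leaderboard.
  4. Export the best model and serve it with FastAPI (joblib works for a scikit-learn model; an AutoGluon predictor is saved to its own directory and reloaded with TabularPredictor.load).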
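
For step 3, a simple way to time any predict function is sketched below. The helper name `measure_latency`, the repeat counts, and the `dummy_predict` stand-in are hypothetical. Pass the real model's predict callable and a representative batch instead.

```python
import math
import statistics
import time


def measure_latency(predict_fn, batch, repeats=50, warmup=5):
    """Return median and 95th-percentile latency (ms) of predict_fn(batch)."""
    # warm-up calls are not timed (caches, lazy loading)
    for _ in range(warmup):
        predict_fn(batch)

    timings_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict_fn(batch)
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    timings_ms.sort()
    # nearest-rank 95th percentile
    p95_index = max(0, math.ceil(0.95 * len(timings_ms)) - 1)
    return {"p50_ms": statistics.median(timings_ms), "p95_ms": timings_ms[p95_index]}


# stand-in for model.predict (assumption): sums each row's features
def dummy_predict(rows):
    return [sum(row) for row in rows]


stats = measure_latency(dummy_predict, [[5.1, 3.5, 1.4, 0.2]] * 100)
print(stats)
```

Reporting the 95th percentile alongside the median matters for latency budgets, because a stacked ensemble can have a much heavier tail than a single model.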
Phase 8 Complete

What you learned

A/B testing, canary, blue-green, AutoML: an advanced deployment and experimentation toolkit. Next phase: Real-world AI Systems, covering recommendation, chatbot, CV, and NLP in production.
