Phase 8 · Chapter 8.04
AutoML Pipelines
Feature engineering, model search, hyperparameter tuning: AutoML promises to automate all of it. Does it deliver? Partly.
Definition
What AutoML means
Raw data → preprocessing → algorithm selection → hyperparameter tuning → ensembling → tuned model: the whole pipeline runs automatically. You provide the data and the target column, and the framework handles the rest.
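Here is a toy sketch of the core loop behind that pipeline, in plain Python so it runs anywhere. It tries several algorithms, each with several hyperparameter settings, scores every combination on held-out data, and ranks the results in a leaderboard. The data, the three "algorithms", and names such as auto_fit and SEARCH_SPACE are hypothetical. Real frameworks add preprocessing, cross-validation, time budgets, and ensembling on top.

```python
from collections import Counter
import random


def make_data(n, seed):
    """Hypothetical 1-D data: label is 1 when x > 0.5, with about 10% of labels flipped."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [int((x > 0.5) != (rng.random() < 0.1)) for x in xs]
    return xs, ys


# Each candidate "algorithm" takes training data plus hyperparameters and returns a predict function.
def fit_majority(xs, ys):
    label = Counter(ys).most_common(1)[0][0]
    return lambda x: label


def fit_threshold(xs, ys, t):
    # Here the threshold t is a hyperparameter chosen by the search, not learned from xs/ys.
    return lambda x: int(x > t)


def fit_knn(xs, ys, k):
    def predict(x):
        nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))[:k]
        return Counter(ys[i] for i in nearest).most_common(1)[0][0]
    return predict


# Search space: algorithm name -> (fit function, hyperparameter settings to try).
SEARCH_SPACE = {
    "majority": (fit_majority, [{}]),
    "threshold": (fit_threshold, [{"t": t} for t in (0.3, 0.4, 0.5, 0.6, 0.7)]),
    "knn": (fit_knn, [{"k": k} for k in (1, 5, 15)]),
}


def accuracy(predict, xs, ys):
    return sum(predict(x) == y for x, y in zip(xs, ys)) / len(ys)


def auto_fit(train, valid):
    """Fit every (algorithm, hyperparameters) pair and return a leaderboard, best first."""
    leaderboard = []
    for name, (fit, grid) in SEARCH_SPACE.items():
        for params in grid:
            model = fit(*train, **params)
            leaderboard.append((accuracy(model, *valid), name, params))
    leaderboard.sort(key=lambda row: row[0], reverse=True)
    return leaderboard


board = auto_fit(make_data(200, seed=0), make_data(100, seed=1))
for score, name, params in board[:3]:
    print(f"{name:10s} {params}  val_acc={score:.2f}")
```

Taking board[0] corresponds to the "algorithm selection + hyperparameter tuning" steps. A framework such as AutoGluon would then typically combine several of the top models into an ensemble.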
Spectrum
Three kinds
- Library: AutoGluon, H2O AutoML, FLAML, auto-sklearn; runs locally inside your own code.
- Managed cloud: Vertex AutoML, SageMaker Autopilot, Azure AutoML.
- HPO-only: Optuna, Ray Tune; the model architecture is fixed and only the hyperparameters are searched.
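HPO-only tools keep the model fixed and search only its hyperparameters. As a minimal sketch of that loop, here is plain random search in standard-library Python over a random-forest-style search space (the same integer ranges as the Optuna example in this chapter). Optuna's default sampler is TPE, which uses earlier trials to steer new samples, and Optuna also provides a RandomSampler. toy_objective is a made-up, noise-free stand-in for a cross-validated score, so the sketch runs without any data.

```python
import random

# Integer search space; bounds are inclusive, like Optuna's trial.suggest_int.
SPACE = {"n_estimators": (50, 500), "max_depth": (3, 20), "min_samples_split": (2, 20)}


def toy_objective(p):
    """Hypothetical, noise-free stand-in for a CV score: peaks at (300, 10, 2)."""
    return (1.0
            - abs(p["n_estimators"] - 300) / 1000
            - abs(p["max_depth"] - 10) / 100
            - (p["min_samples_split"] - 2) / 200)


def random_search(objective, space, n_trials, seed=0):
    """Sample each trial independently, evaluate it, and keep the best (maximize)."""
    rng = random.Random(seed)
    history = []
    best_params, best_value = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.randint(low, high) for name, (low, high) in space.items()}
        value = objective(params)
        history.append((params, value))
        if value > best_value:
            best_params, best_value = params, value
    return best_params, best_value, history


best_params, best_value, history = random_search(toy_objective, SPACE, n_trials=100)
print("best:", best_params, round(best_value, 3))
```

Random search ignores what earlier trials found, which keeps it simple and trivially parallel; model-based samplers such as TPE usually reach a good region in fewer trials.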
AutoGluon Demo
A trained model in 3 core lines (create, fit, predict)
```python
from autogluon.tabular import TabularPredictor
import pandas as pd

# Assumes iris_train.csv / iris_test.csv with a "species" label column.
train = pd.read_csv("iris_train.csv")
predictor = TabularPredictor(label="species", eval_metric="accuracy")
predictor.fit(train, time_limit=300, presets="best_quality")  # 5-minute budget

# Leaderboard of all trained models (validation score, fit and predict times)
print(predictor.leaderboard())

# Predict on held-out data
test = pd.read_csv("iris_test.csv")
preds = predictor.predict(test)
```
Optuna
Hyperparameter search
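```python
import optuna
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Any feature matrix X and label vector y work; Iris is used here for illustration.
X, y = load_iris(return_X_y=True)


def objective(trial):
    # Optuna proposes a value for each hyperparameter within the given (inclusive) range.
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 50, 500),
        "max_depth": trial.suggest_int("max_depth", 3, 20),
        "min_samples_split": trial.suggest_int("min_samples_split", 2, 20),
    }
    model = RandomForestClassifier(**params, random_state=42)
    # Mean 5-fold CV accuracy is the value Optuna tries to maximize.
    return cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100, n_jobs=4)  # up to 4 trials run in parallel threads
print("best:", study.best_params, study.best_value)
```
Workflow Integration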
Plugging AutoML into an MLOps pipeline
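The weekly DAG below has two decision points: validate_schema stops the run on malformed input, and evaluate_vs_champion decides whether the retrained model replaces the current one. What follows is a minimal plain-Python sketch of both. The Iris column names and the epsilon of 0.005 are hypothetical. In Airflow, each function would be wrapped in a task, and the branch would be handled by a branching operator such as BranchPythonOperator.

```python
# Hypothetical column names for the Iris CSV used in the AutoGluon example.
REQUIRED_COLUMNS = {"sepal_length", "sepal_width", "petal_length", "petal_width", "species"}


def validate_schema(columns):
    """Fail fast if fresh data is missing a column the model expects."""
    missing = REQUIRED_COLUMNS - set(columns)
    if missing:
        raise ValueError(f"schema check failed, missing columns: {sorted(missing)}")


def evaluate_vs_champion(new_acc, champion_acc, epsilon=0.005):
    """Return the downstream tasks to run, mirroring the branch in the DAG."""
    # epsilon stops a challenger that is better only by noise from replacing the champion.
    if new_acc > champion_acc + epsilon:
        return ["register_in_model_registry", "trigger_canary_deploy"]
    return ["keep_champion", "log_attempt"]


validate_schema(["sepal_length", "sepal_width", "petal_length", "petal_width", "species"])
print(evaluate_vs_champion(new_acc=0.95, champion_acc=0.94))   # clear improvement: promote
print(evaluate_vs_champion(new_acc=0.943, champion_acc=0.94))  # within epsilon: keep champion
```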
```text
Airflow DAG (weekly):
  pull_latest_data
    └─> validate_schema
          └─> run_autogluon (time_limit=2h, retrain)
                └─> evaluate_vs_champion
                      ├─ if new_acc > champion + ε:
                      │     register_in_model_registry
                      │     trigger_canary_deploy
                      └─ else: keep_champion, log_attempt
```
When AutoML Wins
Good use cases
- Tabular data with a well-defined target (classification or regression).
- A small team that needs a baseline model quickly.
- Many similar models (per tenant, per country), where tuning each one by hand is impractical.
- Periodic refresh: the same problem, retrained on fresh data.
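The "many similar models" case is where automation pays off: the same search runs once per tenant, with no hand tuning. Here is a minimal sketch that assumes each tenant's data follows a different hidden threshold rule. The tenants, their thresholds, and the tune_threshold helper are all hypothetical.

```python
import random


def make_tenant_data(true_t, n, seed):
    """Hypothetical data for one tenant: label is 1 when x > true_t (no label noise)."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    return xs, [int(x > true_t) for x in xs]


def tune_threshold(xs, ys, grid):
    """Automated search: pick the threshold in `grid` with the best accuracy.

    Scored on training data for brevity; in practice, score on held-out data.
    """
    def acc(t):
        return sum(int(x > t) == y for x, y in zip(xs, ys)) / len(ys)
    return max(grid, key=acc)


GRID = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95

# Each tenant follows a different hidden rule; one loop tunes a model for every tenant.
TRUE_THRESHOLDS = {"tenant_a": 0.3, "tenant_b": 0.5, "tenant_c": 0.7}
models = {}
for seed, (tenant, true_t) in enumerate(TRUE_THRESHOLDS.items()):
    xs, ys = make_tenant_data(true_t, n=500, seed=seed)
    models[tenant] = tune_threshold(xs, ys, GRID)
print(models)
```

Adding a tenant means adding a dataset, not another round of manual tuning, which is the property the bullet above relies on.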
Limits
Where AutoML falls short
- Domain-specific feature engineering (e.g. fraud signals): AutoML does not know your domain.
- Deep learning research: AutoML does not produce novel architectures.
- Strict latency budgets: AutoML ensembles tend to be large and slow at prediction time.
- Interpretability requirements: a black-box stacked ensemble is hard to explain.
- Poor data quality: garbage in, garbage out, now automated.
Best Practice
Using AutoML responsibly
- Always compare against a simple baseline (e.g. logistic regression); AutoML often fails to beat it.
- A larger time budget does not guarantee higher accuracy (diminishing returns).
- Run a fairness and drift audit before deploying to production.
- Extract feature importance or SHAP values and have a human review them (human-in-the-loop).
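One way to make "always compare against a baseline" concrete is a paired comparison of cross-validation scores computed on the same folds. The decision rule below is an assumption, not a standard: it prefers the AutoML model only if its mean gain is at least min_gain and it wins on a majority of folds. The fold scores are hypothetical.

```python
from statistics import mean


def compare_to_baseline(candidate_scores, baseline_scores, min_gain=0.01):
    """Paired per-fold comparison of CV scores (both models evaluated on the same folds)."""
    if len(candidate_scores) != len(baseline_scores):
        raise ValueError("score lists must come from the same CV folds")
    diffs = [c - b for c, b in zip(candidate_scores, baseline_scores)]
    wins = sum(d > 0 for d in diffs)
    gain = mean(diffs)
    return {
        "mean_gain": gain,
        "wins": wins,
        # Keep the simpler model unless the AutoML model wins clearly and consistently.
        "prefer_automl": gain >= min_gain and wins > len(diffs) / 2,
    }


# Hypothetical 5-fold accuracies, for illustration only.
automl = [0.91, 0.93, 0.90, 0.92, 0.94]
logreg = [0.90, 0.92, 0.91, 0.91, 0.93]
print(compare_to_baseline(automl, logreg))
```

Here AutoML wins 4 of 5 folds, but the mean gain (about 0.006) is below min_gain, so the sketch keeps the logistic regression baseline.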
Mini Project
AutoGluon vs Manual
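- Run AutoGluon on the Iris dataset with a 5-minute budget (time_limit=300).
- Compare its accuracy with a hand-tuned RandomForest.
- Measure the prediction latency of every model on the leaderboard.
- Save the best model and serve it with FastAPI. (An AutoGluon predictor is saved to its path directory and reloaded with TabularPredictor.load; joblib suits a plain scikit-learn model such as the hand-tuned RandomForest.)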
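A small timing helper is enough for the latency step. The sketch below uses time.perf_counter and reports the median per-row latency over several passes. The "single" and "ensemble_of_10" models are hypothetical stand-ins that show why an ensemble of N models costs roughly N times as much per prediction. With AutoGluon, you could wrap a call such as predictor.predict(one_row_df, model=name) for each model name in the leaderboard (check that your version accepts the model argument). The leaderboard itself also reports a pred_time_val column.

```python
import time
from statistics import median


def measure_latency(predict, rows, repeats=20):
    """Median wall-clock seconds per row for `predict`, over `repeats` passes."""
    per_row = []
    for _ in range(repeats):
        start = time.perf_counter()
        for row in rows:
            predict(row)
        per_row.append((time.perf_counter() - start) / len(rows))
    return median(per_row)


# Hypothetical stand-ins: one linear model vs. an ensemble that averages 10 of them.
WEIGHTS = [[(i * j) % 7 / 7 for j in range(200)] for i in range(10)]


def single_model(row):
    return sum(w * x for w, x in zip(WEIGHTS[0], row))


def ensemble(row):
    return sum(sum(w * x for w, x in zip(ws, row)) for ws in WEIGHTS) / len(WEIGHTS)


rows = [[(r + j) % 5 for j in range(200)] for r in range(50)]
latencies = {name: measure_latency(fn, rows)
             for name, fn in {"single": single_model, "ensemble_of_10": ensemble}.items()}
for name, seconds in latencies.items():
    print(f"{name:15s} {seconds * 1e6:8.1f} µs/row")
```

The median is used instead of the mean so that a single slow pass (for example, one interrupted by the OS) does not distort the result.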
Phase 8 Complete
What you learned
A/B testing, canary, blue-green, AutoML: a toolkit for advanced deployment and experimentation. Next phase: Real-world AI Systems, covering recommendation, chatbots, computer vision (CV), and NLP in production.
Next: Recommendation System Deployment (coming soon)