Phase 4 · Chapter 4.02
Model Hosting Services
A managed endpoint means you upload the model and the cloud handles auto-scaling, monitoring, and versioning.
The Spectrum
Three levels of hosting
- Self-managed: your own FastAPI container on EC2/GKE. Full control, full responsibility.
- Managed endpoint: SageMaker / Vertex AI / Azure ML. You upload the model and a config; scaling is automatic.
- Inference-as-a-Service: HuggingFace Inference, Replicate, Modal. You just call an API.
Managed Endpoints
Core flow on the three platforms
- SageMaker: Model artifact → S3, container image → ECR, Endpoint Config → Endpoint.
- Vertex AI: Model upload → Model Registry, deploy → Endpoint, traffic split supported.
- Azure ML: Model register → Environment → Managed Online Endpoint → Deployment.
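To make "traffic split" concrete, here is a minimal local simulation of weighted routing between two deployed versions. It is only a sketch of the idea, not any platform's API: the names `pick_deployment`, `model-v1`, and `model-v2` and the 90/10 weights are assumptions made up for illustration.

```python
import random
from collections import Counter


def pick_deployment(traffic_split, rng):
    """Pick a deployment name with probability proportional to its weight."""
    names = list(traffic_split)
    weights = [traffic_split[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical split: 90% of requests to the current model, 10% to a candidate
traffic_split = {"model-v1": 90, "model-v2": 10}

rng = random.Random(0)  # fixed seed so the run is repeatable
counts = Counter(pick_deployment(traffic_split, rng) for _ in range(10_000))
print(counts)
```

On a managed platform the endpoint does this routing for you; you only declare the percentages.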
SageMaker Example
Deploy with the Python SDK
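import sagemaker
from sagemaker.deserializers import JSONDeserializer
from sagemaker.serializers import JSONSerializer
from sagemaker.sklearn.model import SKLearnModel

# Placeholder ARN; a real one carries a 12-digit account ID
role = "arn:aws:iam::123:role/SageMakerRole"

model = SKLearnModel(
    model_data="s3://my-bucket/iris/model.tar.gz",
    role=role,
    entry_point="inference.py",
    framework_version="1.2-1",
)

# Send and receive JSON so requests match the json.loads / json.dumps in inference.py
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.t2.medium",
    endpoint_name="iris-prod",
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)

# Inference
result = predictor.predict([[5.1, 3.5, 1.4, 0.2]])
print(result)
inference.py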
The SageMaker contract: four functions
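import json
import os
import joblib

def model_fn(model_dir):
    # Load the trained model from the unpacked model.tar.gz
    return joblib.load(os.path.join(model_dir, "model.joblib"))

def input_fn(body, content_type):
    # Parse the request body; this handler assumes a JSON payload
    return json.loads(body)

def predict_fn(data, model):
    # Convert the NumPy result to a plain list so it can be JSON-encoded
    return model.predict(data).tolist()

def output_fn(pred, accept):
    # The body is always JSON, so report that content type
    return json.dumps({"prediction": pred}), "application/json"
Trade-offs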
Managed vs DIY
- Managed pros: Auto-scaling, A/B traffic split, monitoring, model registry, built-in IAM.
- Managed cons: Vendor lock-in, harder debugging, higher cost, limited custom runtimes.
- DIY pros: Full control, portable (Docker), cheaper at scale.
- DIY cons: You carry the pager and are on call.
Pitfalls
What often goes wrong
- Leaving an endpoint running and forgetting about it: it bills by the hour.
- Keeping the default instance type: often oversized for the workload.
- Not measuring cold-start latency: the first request is slow.
- Overwriting a model version: rolling back to the old version becomes impossible.
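The cold-start pitfall is easier to catch if the first request is timed separately from the rest. The helper below is a minimal sketch under stated assumptions: `measure_latency` and `make_fake_endpoint` are hypothetical names, and the fake endpoint only sleeps to imitate a slow first call, so the numbers it prints say nothing about a real service.

```python
import statistics
import time


def measure_latency(call, n_requests=10):
    """Time n_requests calls and report the first call separately from the rest."""
    if n_requests < 1:
        raise ValueError("n_requests must be at least 1")
    timings_ms = []
    for _ in range(n_requests):
        start = time.perf_counter()
        call()
        timings_ms.append((time.perf_counter() - start) * 1000)
    warm = timings_ms[1:] or timings_ms  # fall back if only one request was sent
    return {
        "first_ms": timings_ms[0],
        "warm_median_ms": statistics.median(warm),
        "warm_max_ms": max(warm),
    }


def make_fake_endpoint(cold_delay_s=0.05, warm_delay_s=0.005):
    """Hypothetical stand-in for an endpoint call: slow once, then fast."""
    state = {"warmed_up": False}

    def call():
        time.sleep(warm_delay_s if state["warmed_up"] else cold_delay_s)
        state["warmed_up"] = True

    return call


report = measure_latency(make_fake_endpoint(), n_requests=10)
print(report)
```

Against a real endpoint, the same helper could be given `lambda: predictor.predict([[5.1, 3.5, 1.4, 0.2]])` instead of the fake call.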
Mini Project
Deploy Iris on SageMaker
- Upload the joblib model to S3.
- Write inference.py and deploy an SKLearnModel.
- Send 10 requests and measure latency.
- Delete the endpoint when you are done.
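Before deploying, the four functions in inference.py can be exercised locally in the order the serving container calls them: load the model, parse the input, predict, format the output. The sketch below assumes a hypothetical `run_contract` helper and a stub model, so it runs without joblib or a trained model. It checks only the handler logic, not the HTTP layer the real container adds.

```python
import json
import types


def run_contract(handlers, model_dir, body,
                 content_type="application/json", accept="application/json"):
    """Call the four handlers in order: load, parse, predict, format."""
    model = handlers.model_fn(model_dir)
    data = handlers.input_fn(body, content_type)
    prediction = handlers.predict_fn(data, model)
    return handlers.output_fn(prediction, accept)


class StubModel:
    """Hypothetical stand-in for the trained model; always predicts class 0."""

    def predict(self, rows):
        return [0 for _ in rows]


# Stand-in handlers with the same signatures as the functions in inference.py.
# The stub returns a plain list, so no .tolist() call is needed here.
stub_handlers = types.SimpleNamespace(
    model_fn=lambda model_dir: StubModel(),
    input_fn=lambda body, content_type: json.loads(body),
    predict_fn=lambda data, model: model.predict(data),
    output_fn=lambda pred, accept: (json.dumps({"prediction": pred}), accept),
)

response = run_contract(stub_handlers, model_dir=".", body="[[5.1, 3.5, 1.4, 0.2]]")
print(response)
```

To test the real handlers, replace `stub_handlers` with the imported `inference` module and point `model_dir` at a folder containing model.joblib.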
Takeaway
Remember
Managed endpoint = speed to production. DIY = control + lower cost at scale. At a small startup, start with managed and re-evaluate as scale grows.