Phase 4 · Chapter 4.02

Model Hosting Services

A managed endpoint means you upload the model and the cloud platform handles auto-scaling, monitoring, and versioning for you.

The Spectrum

Three levels of hosting

  • Self-managed: your own FastAPI container on EC2 or GKE. Full control, full responsibility.
  • Managed endpoint: SageMaker, Vertex AI, or Azure ML. You upload the model plus a config, and scaling is automatic.
  • Inference-as-a-Service: HuggingFace Inference, Replicate, Modal. You only make API calls.

Managed Endpoints

Core flow on the three platforms

  • SageMaker: upload the model artifact to S3 and the container image to ECR, then create a Model, an Endpoint Config, and an Endpoint.
  • Vertex AI: upload the model to the Model Registry, then deploy it to an Endpoint. Traffic splitting is supported.
  • Azure ML: register the Model, define an Environment, create a Managed Online Endpoint, then add a Deployment.
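
To make the SageMaker flow above concrete, here is a minimal sketch of the three low-level requests behind it. The helper names `build_requests` and `deploy` are hypothetical, as are the "-config" suffix and the "AllTraffic" variant name. The `client` argument is assumed to behave like `boto3.client("sagemaker")`, whose `create_model`, `create_endpoint_config`, and `create_endpoint` calls take these keyword arguments.

```python
def build_requests(name, image_uri, model_data_url, role_arn,
                   instance_type="ml.t2.medium", instance_count=1):
    """Return the keyword arguments for the three create_* calls, in order."""
    model = {
        "ModelName": name,
        "PrimaryContainer": {
            "Image": image_uri,              # container image stored in ECR
            "ModelDataUrl": model_data_url,  # model artifact stored in S3
        },
        "ExecutionRoleArn": role_arn,
    }
    endpoint_config = {
        "EndpointConfigName": f"{name}-config",  # hypothetical naming scheme
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": name,
            "InstanceType": instance_type,
            "InitialInstanceCount": instance_count,
            "InitialVariantWeight": 1.0,
        }],
    }
    endpoint = {
        "EndpointName": name,
        "EndpointConfigName": endpoint_config["EndpointConfigName"],
    }
    return model, endpoint_config, endpoint


def deploy(client, **kwargs):
    """Issue the three calls in order: Model, Endpoint Config, Endpoint."""
    model, endpoint_config, endpoint = build_requests(**kwargs)
    client.create_model(**model)
    client.create_endpoint_config(**endpoint_config)
    client.create_endpoint(**endpoint)
    return endpoint["EndpointName"]
```

Keeping the request dictionaries separate from the calls lets you inspect or unit-test them without touching AWS. The high-level SDK wraps roughly these same steps in a single `deploy()` call.
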
SageMaker Example

Deploying with the Python SDK

python · production
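from sagemaker.deserializers import JSONDeserializer
from sagemaker.serializers import JSONSerializer
from sagemaker.sklearn.model import SKLearnModel

# Placeholder ARN: replace it with an IAM role that SageMaker can assume.
role = "arn:aws:iam::123:role/SageMakerRole"

model = SKLearnModel(
    model_data="s3://my-bucket/iris/model.tar.gz",  # tarball containing model.joblib
    role=role,
    entry_point="inference.py",  # script with the handler functions
    framework_version="1.2-1",
)

# Creates a real-time endpoint; it keeps billing until you delete it.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.t2.medium",
    endpoint_name="iris-prod",
    # inference.py reads and writes JSON, so override the default NumPy (de)serializers.
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)

# Inference
result = predictor.predict([[5.1, 3.5, 1.4, 0.2]])
print(result)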
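
A service that consumes the endpoint usually does not need the full SageMaker SDK. As a rough sketch, the function below calls the endpoint through the lower-level runtime API. The helper name `invoke` is hypothetical, `runtime_client` is assumed to behave like `boto3.client("sagemaker-runtime")`, and the endpoint is assumed to accept and return JSON.

```python
import json


def invoke(runtime_client, endpoint_name, rows):
    """Send feature rows as JSON and decode the JSON reply."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=json.dumps(rows),
    )
    # The payload comes back as a readable stream under "Body".
    return json.loads(response["Body"].read())
```

With a real client this would be called as `invoke(boto3.client("sagemaker-runtime"), "iris-prod", [[5.1, 3.5, 1.4, 0.2]])`.
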
inference.py

The SageMaker contract: four functions

python · production
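import json
import os

import joblib

def model_fn(model_dir):
    """Load the model once, when the container starts."""
    return joblib.load(os.path.join(model_dir, "model.joblib"))

def input_fn(body, content_type):
    """Deserialize the request body; only JSON is handled here."""
    if content_type != "application/json":
        raise ValueError(f"Unsupported content type: {content_type}")
    return json.loads(body)

def predict_fn(data, model):
    """Run inference and convert the NumPy result to a plain list."""
    return model.predict(data).tolist()

def output_fn(pred, accept):
    """Serialize the prediction as JSON and return (body, content type)."""
    return json.dumps({"prediction": pred}), "application/json"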
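
You can exercise this contract locally before paying for an endpoint. The sketch below chains simplified copies of the per-request hooks in the order the serving container calls them. `StubModel` and `handle_request` are hypothetical names used for illustration; the stub stands in for the estimator that `model_fn` would load from `model.joblib`.

```python
import json


class StubModel:
    """Hypothetical stand-in for the joblib-loaded estimator."""

    def predict(self, rows):
        # Pretend every row belongs to class 0.
        return [0 for _ in rows]


def input_fn(body, content_type):
    if content_type != "application/json":
        raise ValueError(f"Unsupported content type: {content_type}")
    return json.loads(body)


def predict_fn(data, model):
    return list(model.predict(data))


def output_fn(pred, accept):
    return json.dumps({"prediction": pred}), "application/json"


def handle_request(model, body, content_type, accept):
    """Run one request through input_fn -> predict_fn -> output_fn."""
    data = input_fn(body, content_type)
    pred = predict_fn(data, model)
    return output_fn(pred, accept)


body, mimetype = handle_request(
    StubModel(), "[[5.1, 3.5, 1.4, 0.2]]", "application/json", "application/json"
)
print(body)  # {"prediction": [0]}
```

`model_fn` runs once at startup, while the other three hooks run on every request, which is why the harness takes an already loaded model.
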
Trade-offs

Managed vs DIY

  • Managed pros: auto-scaling, A/B traffic splitting, monitoring, a model registry, and built-in IAM.
  • Managed cons: vendor lock-in, harder debugging, higher cost, and limited custom runtimes.
  • DIY pros: full control, portable (Docker), and cheaper at scale.
  • DIY cons: you carry the pager and are on call.

Pitfalls

What often goes wrong

  • Leaving an endpoint running and forgetting about it: a real-time endpoint is billed for every hour it is up.
  • Keeping the default instance type, which can be oversized for the workload.
  • Not measuring cold-start latency: the first request is slow.
  • Overwriting a model version: you can no longer roll back to the old one.
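
The first pitfall is easy to underestimate, so a quick back-of-the-envelope calculation helps. This is only a sketch: `idle_endpoint_cost` is a hypothetical helper, and the hourly rate below is a made-up number, not a real AWS price.

```python
def idle_endpoint_cost(hourly_rate_usd, instance_count=1, hours=24 * 30):
    """Cost of keeping an endpoint up for `hours`, regardless of traffic."""
    return hourly_rate_usd * instance_count * hours


# Hypothetical rate; look up the real price for your instance type and region.
monthly = idle_endpoint_cost(hourly_rate_usd=0.05)
print(f"~${monthly:.2f} per 30 days")  # ~$36.00 per 30 days
```

Even a small instance adds up when it runs all month, and the figure multiplies with every extra instance behind the endpoint.
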
Mini Project

Deploy Iris on SageMaker

  1. Package the joblib model as model.tar.gz and upload it to S3.
  2. Write inference.py and deploy it with SKLearnModel.
  3. Send 10 requests and measure the latency.
  4. Delete the endpoint when you are done.
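
For step 3, a small timing helper is enough. The sketch below is one possible approach: `measure_latency` and `summarize` are hypothetical names, and `call` is any zero-argument function that sends one request.

```python
import statistics
import time


def measure_latency(call, n=10):
    """Invoke `call()` n times and return each duration in milliseconds."""
    durations = []
    for _ in range(n):
        start = time.perf_counter()
        call()
        durations.append((time.perf_counter() - start) * 1000)
    return durations


def summarize(durations_ms):
    """Report the first request separately, since it may include a cold start."""
    first, rest = durations_ms[0], durations_ms[1:]
    return {
        "first_ms": first,
        "median_rest_ms": statistics.median(rest) if rest else None,
        "max_ms": max(durations_ms),
    }
```

With the SDK predictor from the deploy example, this could be used as `summarize(measure_latency(lambda: predictor.predict([[5.1, 3.5, 1.4, 0.2]])))`. For step 4, `predictor.delete_endpoint()` removes the endpoint.
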
Takeaway

Remember

Managed endpoint = speed to production. DIY = control and lower cost at scale. At a small startup, start with managed hosting and re-evaluate as you scale.