Phase 10 · Chapter 10.01

Microservices for AI

Split a large ML app into small, independent services so that preprocessing, the model, and post-processing can each scale separately.

Why Microservices

Problems with a monolith

  • One bug can take the whole system down.
  • Retraining the model means redeploying the whole app.
  • The GPU service and the lightweight API have to scale together, which is costly.
  • Multiple teams working in the same codebase run into conflicts.
Service Boundary

How to split it

  • Gateway: auth, rate limiting, routing.
  • Preprocessor: tokenization, resizing, feature extraction (CPU).
  • Inference: GPU-bound, batched.
  • Postprocessor: business rules, formatting.
  • Feature store / Vector DB: stateful, with a separate lifecycle.
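
The "GPU-bound, batched" boundary above can be sketched as a micro-batcher: single requests are queued and the model is called once per batch. This is a minimal sketch, not a production server. `MicroBatcher`, `fake_predict_batch`, and the `max_batch` / `max_wait_s` values are illustrative names and numbers chosen for this example, and the stand-in model just averages each feature vector.

```python
import asyncio
from typing import Callable, List, Sequence


class MicroBatcher:
    """Collects single requests and runs them through the model in batches."""

    def __init__(self, predict_batch: Callable[[List[Sequence[float]]], List[float]],
                 max_batch: int = 8, max_wait_s: float = 0.01):
        self._predict_batch = predict_batch
        self._max_batch = max_batch
        self._max_wait_s = max_wait_s
        self._queue: asyncio.Queue = asyncio.Queue()
        self.batch_sizes: List[int] = []  # recorded only to make batching visible

    async def predict(self, features: Sequence[float]) -> float:
        # Each caller gets a future that is resolved when its batch has run.
        fut = asyncio.get_running_loop().create_future()
        await self._queue.put((features, fut))
        return await fut

    async def run(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            # Block until the first request arrives, then fill the batch
            # until it is full or the wait budget is spent.
            items = [await self._queue.get()]
            deadline = loop.time() + self._max_wait_s
            while len(items) < self._max_batch:
                timeout = deadline - loop.time()
                if timeout <= 0:
                    break
                try:
                    items.append(await asyncio.wait_for(self._queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            scores = self._predict_batch([features for features, _ in items])
            self.batch_sizes.append(len(items))
            for (_, fut), score in zip(items, scores):
                if not fut.done():
                    fut.set_result(score)


def fake_predict_batch(batch: List[Sequence[float]]) -> List[float]:
    # Stand-in for one batched GPU call: the mean of each feature vector.
    return [sum(x) / len(x) for x in batch]


async def demo():
    batcher = MicroBatcher(fake_predict_batch, max_batch=4, max_wait_s=0.05)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(
        *(batcher.predict([float(i), float(i) + 2.0]) for i in range(6))
    )
    worker.cancel()
    return results, batcher.batch_sizes


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Larger `max_batch` improves GPU utilization; larger `max_wait_s` adds latency to the first request in each batch, so both need tuning against the service's latency budget.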
gRPC Inference Service

High-performance internal API

proto · production
// inference.proto
syntax = "proto3";
service Inference {
  rpc Predict (PredictRequest) returns (PredictResponse);
}
message PredictRequest { repeated float features = 1; }
message PredictResponse { float score = 1; string version = 2; }
python · production
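# server.py
from concurrent import futures

import grpc
import joblib

# Stubs generated from inference.proto (e.g. with grpc_tools.protoc)
import inference_pb2
import inference_pb2_grpc

# Load the model once at startup, not per request
model = joblib.load("model.pkl")

class InferenceServicer(inference_pb2_grpc.InferenceServicer):
    def Predict(self, request, context):
        # One feature vector per request; [0][1] is the positive-class
        # probability, assuming a binary classifier
        score = float(model.predict_proba([list(request.features)])[0][1])
        return inference_pb2.PredictResponse(score=score, version="v1.3")

def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=16))
    inference_pb2_grpc.add_InferenceServicer_to_server(InferenceServicer(), server)
    # Plaintext port: acceptable only if TLS is handled elsewhere (e.g. a service mesh)
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()

if __name__ == "__main__":
    serve()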
API Gateway

Kong / Envoy / Nginx config

yaml · production
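# kong.yaml
_format_version: "3.0"   # declarative format for Kong 3.x; adjust to your version
services:
  - name: inference
    # The upstream speaks gRPC, so the service protocol is grpc, not http
    protocol: grpc
    host: inference-svc.ml.svc.cluster.local
    port: 50051
    routes:
      # Serving JSON on this HTTP path requires gRPC transcoding
      # (e.g. Kong's grpc-gateway plugin)
      - paths: ["/v1/predict"]
    plugins:
      - name: rate-limiting
        config: { minute: 600 }   # at most 600 requests per minute
      - name: jwt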
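
To make the `minute: 600` setting concrete, here is a minimal fixed-window rate limiter sketch. It is a simplified stand-in for the idea, not Kong's implementation. `FixedWindowLimiter` and `client_id` are hypothetical names, state is kept in process memory, and the clock is injectable so the behavior can be checked without waiting a minute.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Allows at most `limit_per_minute` requests per client per clock minute."""

    def __init__(self, limit_per_minute: int,
                 clock: Callable[[], float] = time.time):
        self.limit = limit_per_minute
        self._clock = clock
        # client_id -> (window index, requests counted in that window)
        self._state: Dict[str, Tuple[int, int]] = {}

    def allow(self, client_id: str) -> bool:
        window = int(self._clock() // 60)
        last_window, count = self._state.get(client_id, (window, 0))
        if last_window != window:
            count = 0  # a new minute started, so the counter resets
        if count >= self.limit:
            self._state[client_id] = (window, count)
            return False  # a gateway would answer HTTP 429 here
        self._state[client_id] = (window, count + 1)
        return True


if __name__ == "__main__":
    limiter = FixedWindowLimiter(limit_per_minute=600)
    print(limiter.allow("client-a"))
```

A real gateway running several replicas needs shared counters (for example in Redis), otherwise each replica enforces the limit on its own.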
Communication Pattern

Sync vs Async

  • REST/gRPC: low-latency request-response.
  • Message queue (Kafka/RabbitMQ): heavy jobs, retries, decoupling.
  • WebSocket: streaming inference (chatbots, live captions).
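
The message-queue pattern above can be sketched with an in-process queue: the producer enqueues a job and moves on, while a worker consumes jobs and retries failures. `asyncio.Queue` is only a stand-in for Kafka or RabbitMQ here, and `worker`, `flaky_handler`, and `max_attempts` are illustrative names, not part of any broker's API.

```python
import asyncio
from typing import Awaitable, Callable, Dict


async def worker(queue: asyncio.Queue,
                 handler: Callable[[str], Awaitable[str]],
                 results: Dict[int, str],
                 max_attempts: int = 3) -> None:
    """Consumes (job_id, payload, attempt) tuples and retries failed jobs."""
    while True:
        job_id, payload, attempt = await queue.get()
        try:
            results[job_id] = await handler(payload)
        except Exception:
            if attempt < max_attempts:
                # Re-enqueue before task_done() so queue.join() keeps waiting.
                await queue.put((job_id, payload, attempt + 1))
            else:
                results[job_id] = "failed"  # a real system would dead-letter it
        finally:
            queue.task_done()


async def demo() -> Dict[int, str]:
    queue: asyncio.Queue = asyncio.Queue()
    results: Dict[int, str] = {}
    seen = set()

    async def flaky_handler(payload: str) -> str:
        # Fails the first time it sees a payload to simulate a transient error.
        if payload not in seen:
            seen.add(payload)
            raise RuntimeError("transient failure")
        return payload.upper()

    task = asyncio.create_task(worker(queue, flaky_handler, results))
    for job_id, payload in enumerate(["caption a", "caption b"]):
        await queue.put((job_id, payload, 1))  # the producer does not wait for results
    await queue.join()
    task.cancel()
    return results


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The caller never blocks on the handler, which is what makes this pattern suit heavy jobs; the trade-off is that results arrive later and must be fetched or pushed separately.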
Pitfalls

What breaks with microservices

  • Without distributed tracing, debugging across services is practically impossible (you need Jaeger/OpenTelemetry).
  • Without a service mesh, mTLS, retries, and circuit breakers are hard to manage.
  • Too many services means operational overhead; turning everything into a microservice is a mistake.
  • Network latency adds up: 10 service hops × 5 ms each = 50 ms gone.
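
The circuit breaker mentioned above can be sketched in a few lines: after repeated failures, calls to a downstream service fail fast instead of piling up. A service mesh would normally provide this outside the application. `CircuitBreaker`, `CircuitOpenError`, and the threshold and timeout values are assumptions made for this example.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and the call is rejected without running."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after_s:
                raise CircuitOpenError("circuit open; failing fast")
            # Reset window elapsed: half-open, let one trial call through.
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


if __name__ == "__main__":
    breaker = CircuitBreaker()
    print(breaker.call(lambda: "ok"))
```

Failing fast keeps one slow dependency from consuming every worker thread upstream, which matters more as the number of hops grows.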
Mini Project

Split a monolith

  1. Take a FastAPI monolith (preprocess + predict + log).
  2. Split it into 3 services: gateway, inference (gRPC), logger (Kafka).
  3. Run them with Docker Compose and compare latency.
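
For the latency comparison in step 3, a small helper along these lines may be enough. It times any callable and reports p50/p95. `measure_latency_ms` and `summarize` are hypothetical helper names; in the project the callable would issue one request to the monolith or to the gateway, which is left out here.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_latency_ms(call: Callable[[], object], n: int = 100) -> List[float]:
    """Runs `call` n times and returns each duration in milliseconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize(samples: List[float]) -> Dict[str, float]:
    """Returns median, nearest-rank 95th percentile, and max of the samples."""
    ordered = sorted(samples)
    # Nearest-rank p95: ceil(0.95 * n) - 1, computed with integers.
    p95_index = (95 * len(ordered) + 99) // 100 - 1
    return {
        "p50": statistics.median(ordered),
        "p95": ordered[p95_index],
        "max": ordered[-1],
    }


if __name__ == "__main__":
    # Stand-in workload; replace with one request to each deployment.
    print(summarize(measure_latency_ms(lambda: sum(range(1000)), n=50)))
```

Compare p95 rather than the mean: the extra network hops in the split version tend to show up in the tail first.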
Takeaway

Key point

Microservices are a tool for breaking a problem apart, not a fashion. For a small team a monolith works better; for a large AI platform a service-based architecture is inevitable.
