Phase 10 · Chapter 10.01
Microservices for AI
Split a large ML app into small, independent services so that preprocessing, the model, and post-processing can each scale separately.
Why Microservices
Problems with a monolith
- One bug can take the whole system down.
- Retraining the model means redeploying the whole app.
- The GPU service and the lightweight API have to scale together, which is costly.
- Multiple teams run into conflicts in the same codebase.
Service Boundary
How to split it
- Gateway: auth, rate limiting, routing.
- Preprocessor: tokenization, resizing, feature extraction (CPU).
- Inference: GPU-bound, batched.
- Postprocessor: business rules, formatting.
- Feature store / Vector DB: stateful, with a separate lifecycle.
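The "batched" part of the inference service can be sketched as follows. This is a minimal illustration, not a serving framework: `run_in_batches`, `counting_predict`, and `max_batch_size` are hypothetical names, and the "model" is a stand-in that scores each row by its mean. The idea is only that the expensive model call happens once per batch rather than once per request.

```python
from typing import Callable, List, Sequence


def run_in_batches(
    requests: Sequence[List[float]],
    predict_batch: Callable[[List[List[float]]], List[float]],
    max_batch_size: int = 4,
) -> List[float]:
    """Group feature vectors into batches and call the model once per batch."""
    scores: List[float] = []
    for start in range(0, len(requests), max_batch_size):
        batch = list(requests[start:start + max_batch_size])
        scores.extend(predict_batch(batch))
    return scores


# Hypothetical stand-in for a GPU model: records batch sizes, scores rows by mean.
calls: List[int] = []


def counting_predict(batch: List[List[float]]) -> List[float]:
    calls.append(len(batch))
    return [sum(row) / len(row) for row in batch]


batched_scores = run_in_batches([[1.0, 3.0]] * 10, counting_predict, max_batch_size=4)
print(calls)  # batch sizes seen by the model
```

Ten requests with `max_batch_size=4` reach the model as three calls instead of ten. A real inference service would usually also flush a partial batch after a short timeout so that a lone request does not wait indefinitely.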
gRPC Inference Service
High-performance internal API
```proto
// inference.proto
syntax = "proto3";

service Inference {
  rpc Predict (PredictRequest) returns (PredictResponse);
}

message PredictRequest { repeated float features = 1; }
message PredictResponse { float score = 1; string version = 2; }
```
```python
# server.py
from concurrent import futures

import grpc
import joblib

# Modules generated from inference.proto
import inference_pb2
import inference_pb2_grpc

# Load the model once at startup, not on every request
model = joblib.load("model.pkl")


class InferenceServicer(inference_pb2_grpc.InferenceServicer):
    def Predict(self, req, ctx):
        # predict_proba expects a 2D input (one row per sample); take P(class 1)
        score = float(model.predict_proba([list(req.features)])[0][1])
        return inference_pb2.PredictResponse(score=score, version="v1.3")


server = grpc.server(futures.ThreadPoolExecutor(max_workers=16))
inference_pb2_grpc.add_InferenceServicer_to_server(InferenceServicer(), server)
server.add_insecure_port("[::]:50051")  # plaintext port, no TLS
server.start()
server.wait_for_termination()
```
API Gateway
Kong / Envoy / Nginx config
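```yaml
# kong.yaml
services:
  - name: inference
    # Assumes an HTTP-speaking upstream on this port; a native gRPC
    # upstream would use the grpc protocol instead of http.
    url: http://inference-svc.ml.svc.cluster.local:50051
    routes:
      - paths: ["/v1/predict"]
    plugins:
      - name: rate-limiting
        config: { minute: 600 }
      - name: jwt
```
Communication Pattern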
Sync vs Async
- REST/gRPC: low-latency request-response.
- Message queue (Kafka/RabbitMQ): heavy jobs, retries, decoupling.
- WebSocket: streaming inference (chatbot, live captions).
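To make the queue-based option concrete, here is a rough sketch of retry and decoupling using Python's in-process `queue.Queue` as a stand-in for Kafka or RabbitMQ. Everything here is an assumption for illustration: `process_jobs`, `flaky_handler`, and `max_attempts` are hypothetical names, and a real broker adds persistence, acknowledgements, and consumers in separate processes.

```python
import queue


def process_jobs(jobs, handler, max_attempts=3):
    """Drain a queue, re-enqueueing failed jobs until max_attempts is reached."""
    q = queue.Queue()
    for job in jobs:
        q.put((job, 1))  # (payload, attempt number)

    done, dead = [], []
    while not q.empty():
        job, attempt = q.get()
        try:
            done.append(handler(job))
        except Exception:
            if attempt < max_attempts:
                q.put((job, attempt + 1))  # retry later, behind other jobs
            else:
                dead.append(job)  # give up: a "dead letter" list
    return done, dead


# Hypothetical handler: "b" fails once (transient), "c" always fails.
seen = {}


def flaky_handler(job):
    seen[job] = seen.get(job, 0) + 1
    if job == "b" and seen[job] == 1:
        raise RuntimeError("transient failure")
    if job == "c":
        raise ValueError("always fails")
    return job.upper()


done, dead = process_jobs(["a", "b", "c"], flaky_handler)
print(done, dead)
```

The producer only enqueues work and never waits on the handler, which is the decoupling the list refers to; the transient failure succeeds on retry, while the permanently failing job ends up in the dead list after three attempts.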
Pitfalls
What breaks in microservices
- Without distributed tracing (Jaeger/OpenTelemetry), debugging a bug across services is practically impossible.
- Without a service mesh, mTLS, retries, and circuit breakers are hard to manage.
- Too many services means operational overhead; turning everything into a microservice is a mistake.
- Network latency adds up: 10 service hops × 5 ms = 50 ms gone.
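For readers who have not met a circuit breaker before, the following is a minimal sketch of the pattern in plain Python. It is not how any particular service mesh implements it; `CircuitBreaker`, `failure_threshold`, and `reset_after_s` are hypothetical names, and the clock is injectable only so the example can be run without waiting.

```python
import time


class CircuitBreaker:
    """Stop calling a failing dependency for a while instead of piling on."""

    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                raise RuntimeError("circuit open: call skipped")
            # Half-open: allow one trial call; a single failure reopens.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result


# Demo with a fake clock so no real waiting is needed.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_after_s=10.0, clock=lambda: now[0])


def failing():
    raise ConnectionError("inference service unreachable")


outcomes = []
for _ in range(3):
    try:
        breaker.call(failing)
    except ConnectionError:
        outcomes.append("failed")
    except RuntimeError:
        outcomes.append("skipped")

now[0] = 11.0  # pretend the reset window has passed
outcomes.append(breaker.call(lambda: "ok"))
print(outcomes)
```

After two failures the third call is skipped without touching the network, and once the reset window has passed a successful trial call closes the circuit again. Writing and tuning this per service is the kind of work a mesh takes over.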
Mini Project
Split a monolith
- Take a FastAPI monolith (preprocess + predict + log).
- Split it into 3 services: gateway, inference (gRPC), logger (Kafka).
- Run them with Docker Compose and compare latency.
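For the latency comparison, one possible starting point is a small timing helper like the one below. It is only a sketch: `measure_latency_ms` and `summarize` are hypothetical helpers, the p95 is an approximate nearest-rank percentile, and `call_monolith` is a placeholder to be replaced with a real request to each deployment.

```python
import statistics
import time


def measure_latency_ms(fn, runs=50):
    """Call fn repeatedly and return per-call wall-clock latency in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize(samples):
    """Return median, approximate nearest-rank p95, and max of the samples."""
    ordered = sorted(samples)
    p95_index = max(0, int(round(0.95 * len(ordered))) - 1)
    return {
        "p50": statistics.median(ordered),
        "p95": ordered[p95_index],
        "max": ordered[-1],
    }


# Placeholder workload; swap in a real HTTP/gRPC call per deployment.
def call_monolith():
    sum(range(1000))


monolith_samples = measure_latency_ms(call_monolith, runs=20)
print(summarize(monolith_samples))
```

Running the same helper against the monolith and against the gateway of the split version gives comparable p50 and p95 numbers; tail latency (p95) is usually where the extra network hops show up first.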
Takeaway
The bottom line
Microservices are a tool for breaking down a problem, not a fashion. For a small team a monolith is the better choice; for a large AI platform a service-based architecture is inevitable.