Phase 9 · Chapter 9.04

NLP API Systems

Sentiment, NER, summarization, embedding, translation: every NLP service shares the same backbone: tokenize → encode → decode → post-process.
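The four stages can be sketched with a toy classifier. This is a minimal illustration, not a real model: the `LEXICON` table and the function names `tokenize`, `encode`, `decode`, `post_process`, and `run` are hypothetical, and the regex tokenizer only handles lowercase ASCII words.

```python
import re

# Hypothetical toy lexicon standing in for a trained model's weights.
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}


def tokenize(text: str) -> list[str]:
    """Stage 1: split raw text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def encode(tokens: list[str]) -> float:
    """Stage 2: turn tokens into a numeric score (stand-in for the forward pass)."""
    return sum((LEXICON.get(tok, 0.0) for tok in tokens), 0.0)


def decode(score: float) -> str:
    """Stage 3: map the numeric output back to a human-readable label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def post_process(label: str, score: float) -> dict:
    """Stage 4: shape the result into an API-friendly payload."""
    return {"label": label, "score": round(score, 4)}


def run(text: str) -> dict:
    score = encode(tokenize(text))
    return post_process(decode(score), score)


if __name__ == "__main__":
    print(run("Great food, bad service"))
```

In a real service each stage is swapped for a library component (subword tokenizer, transformer model, label mapping), but the request still flows through the same four steps.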

Task Map

Common NLP API endpoints

  • Classification: sentiment, topic, intent.
  • Token tagging: NER, POS.
  • Embedding: semantic search, clustering.
  • Seq2seq: summarization, translation, paraphrase.
  • Generation: LLMs (covered separately in Chapter 9-02).
HuggingFace + FastAPI

Embedding service

python · production
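from fastapi import FastAPI
from pydantic import BaseModel, Field
from sentence_transformers import SentenceTransformer

app = FastAPI()
# Load once at import time so every request reuses the same GPU-resident model.
model = SentenceTransformer("intfloat/multilingual-e5-base", device="cuda")

class EmbedReq(BaseModel):
    # 1 to 128 texts per request (list length constraint, pydantic v2 syntax).
    # E5 models expect a "query: " or "passage: " prefix on each text.
    texts: list[str] = Field(min_length=1, max_length=128)

@app.post("/embed")
def embed(req: EmbedReq):
    # normalize_embeddings=True returns unit-length vectors,
    # so a dot product between two of them equals their cosine similarity.
    vecs = model.encode(req.texts, batch_size=32, normalize_embeddings=True)
    return {"dim": vecs.shape[1], "vectors": vecs.tolist()}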
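Because the handler above requests unit-length vectors, semantic search on the client side reduces to a dot product. The following is a small sketch under that assumption; the 3-dimensional vectors are made up for illustration, and `normalize`, `dot`, and `top_k` are hypothetical helper names, not part of any library.

```python
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def top_k(query: list[float], corpus: list[list[float]], k: int = 2):
    """Return (index, score) pairs for the k corpus vectors closest to query."""
    scored = [(i, dot(query, vec)) for i, vec in enumerate(corpus)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up 3-dimensional vectors standing in for /embed output.
corpus = [normalize(v) for v in ([1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0])]
query = normalize([3.0, 4.0, 0.0])
hits = top_k(query, corpus, k=2)

if __name__ == "__main__":
    for index, score in hits:
        print(index, round(score, 3))
```

For a corpus of more than a few thousand vectors, a vector index would normally replace this linear scan; the scoring rule stays the same.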
Classification

Sentiment service with batching

python · production
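from pydantic import BaseModel, Field
from transformers import pipeline

clf = pipeline(
    "text-classification",
    model="cardiffnlp/twitter-xlm-roberta-base-sentiment",
    device=0,                    # first GPU
    truncation=True,             # inputs longer than max_length are cut off
    max_length=256,
)

class BatchReq(BaseModel):
    # Assumed limit, mirroring EmbedReq: 1 to 128 texts per request.
    texts: list[str] = Field(min_length=1, max_length=128)

@app.post("/sentiment")          # reuses `app` from the embedding service
def sentiment(req: BatchReq):
    out = clf(req.texts, batch_size=32)   # one {"label", "score"} dict per input text
    return [{"label": o["label"], "score": round(o["score"], 4)} for o in out]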
Server-Side Batching

Triton-style micro-batching

python · production
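# Collect requests for up to 5 ms (or 32 items), then run a single forward pass.
# Reuses `app` and `clf` from the snippets above.
import asyncio
from pydantic import BaseModel

MAX_BATCH = 32
MAX_WAIT_S = 0.005               # 5 ms batching window
QUEUE: asyncio.Queue = asyncio.Queue()

class TextReq(BaseModel):
    text: str

async def batcher():
    loop = asyncio.get_running_loop()
    while True:
        items = [await QUEUE.get()]              # block until the first request arrives
        deadline = loop.time() + MAX_WAIT_S
        while len(items) < MAX_BATCH:
            remaining = deadline - loop.time()
            if remaining <= 0:
                break
            try:
                items.append(await asyncio.wait_for(QUEUE.get(), timeout=remaining))
            except asyncio.TimeoutError:
                break
        texts = [i["text"] for i in items]
        try:
            # The pipeline call blocks, so run it in a worker thread, not on the event loop.
            results = await asyncio.to_thread(clf, texts, batch_size=len(texts))
        except Exception as exc:
            results = [exc] * len(items)         # fail every waiter instead of killing the batcher
        for it, r in zip(items, results):
            if it["fut"].done():                 # caller already timed out or disconnected
                continue
            if isinstance(r, Exception):
                it["fut"].set_exception(r)
            else:
                it["fut"].set_result(r)

@app.on_event("startup")
async def start():
    app.state.batcher = asyncio.create_task(batcher())   # keep a reference to the task

@app.post("/sentiment")          # replaces the synchronous /sentiment handler; register only one
async def sentiment(req: TextReq):
    fut = asyncio.get_running_loop().create_future()
    await QUEUE.put({"text": req.text, "fut": fut})
    return await asyncio.wait_for(fut, timeout=5)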
Multilingual

Language-specific challenges

  • The tokenizer must be language-aware (XLM-R, mBERT, IndicBERT).
  • Run language detection first, then route to the matching model.
  • Right-to-left scripts (Arabic, Hebrew): handle both display and preprocessing.
  • Mixed Bengali-Hindi-Tamil input: detect the script, then transliterate.
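The script-detection step in the list above can be approximated with Unicode block ranges before any model is called. This is a rough sketch, not a language detector: the function names `char_script`, `script_profile`, and `route` are hypothetical, only a handful of scripts are covered, and the transliteration step is left out. Note that Hindi is written in Devanagari, and that the danda "।" used in Bengali text lives in the Devanagari block, which is why punctuation is skipped.

```python
import unicodedata
from collections import Counter
from typing import Optional

# Unicode block ranges (inclusive code points) for a few scripts.
SCRIPT_RANGES = {
    "bengali": (0x0980, 0x09FF),
    "devanagari": (0x0900, 0x097F),
    "tamil": (0x0B80, 0x0BFF),
    "arabic": (0x0600, 0x06FF),
    "hebrew": (0x0590, 0x05FF),
}
RTL_SCRIPTS = {"arabic", "hebrew"}


def char_script(ch: str) -> Optional[str]:
    """Return the script of one character, or None for digits, spaces, punctuation."""
    if unicodedata.category(ch)[0] not in ("L", "M"):  # letters and combining marks only
        return None
    code_point = ord(ch)
    for name, (low, high) in SCRIPT_RANGES.items():
        if low <= code_point <= high:
            return name
    return "latin" if ch.isascii() else None


def script_profile(text: str) -> Counter:
    """Count characters per script."""
    return Counter(s for s in map(char_script, text) if s)


def route(text: str) -> dict:
    """Pick the dominant script and flag mixed-script or right-to-left input."""
    profile = script_profile(text)
    if not profile:
        return {"script": "unknown", "mixed": False, "rtl": False}
    dominant = profile.most_common(1)[0][0]
    return {
        "script": dominant,
        "mixed": len(profile) > 1,
        "rtl": dominant in RTL_SCRIPTS,
    }


if __name__ == "__main__":
    print(route("আমি ভালো আছি, thanks"))
```

A service could use the `mixed` flag to decide whether to transliterate before tokenizing, and the `script` value to choose which model receives the request.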
Optimization

Enemies of NLP latency

  • Slow tokenizer → use_fast=True (Rust-backed).
  • Long text → chunk + aggregate; sometimes truncate.
  • Padding wastes compute → dynamic padding or bucketing.
  • FP16 + ONNX → BERT-class models can run roughly 3–5x faster, depending on hardware and sequence length.
  • Redis cache for sentence embeddings (hash(text) → vec).
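The embedding cache from the last bullet can be sketched as follows. A plain dict stands in for Redis here, and `cache_key`, `EmbeddingCache`, and the `emb:` key prefix are hypothetical names. One assumption worth stating: Python's built-in `hash()` is salted per process for strings, so the sketch uses a SHA-256 digest to get a key that stays stable across workers and restarts.

```python
import hashlib
from typing import Callable


def cache_key(text: str, model_id: str) -> str:
    """Stable key: model id plus SHA-256 of the exact UTF-8 bytes."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"emb:{model_id}:{digest}"


class EmbeddingCache:
    """Dict-backed stand-in for Redis GET/SET around an embedding function."""

    def __init__(self, embed_fn: Callable[[list[str]], list[list[float]]], model_id: str):
        self._embed_fn = embed_fn
        self._model_id = model_id
        self._store: dict[str, list[float]] = {}
        self.hits = 0      # texts served without a model call
        self.misses = 0    # texts that had to be embedded

    def get_many(self, texts: list[str]) -> list[list[float]]:
        keys = [cache_key(t, self._model_id) for t in texts]
        missing: dict[str, str] = {}
        for key, text in zip(keys, texts):
            if key not in self._store and key not in missing:
                missing[key] = text
        if missing:
            # Embed only the uncached texts, in one batch.
            vectors = self._embed_fn(list(missing.values()))
            for key, vec in zip(missing, vectors):
                self._store[key] = vec
        self.misses += len(missing)
        self.hits += len(texts) - len(missing)
        return [self._store[key] for key in keys]
```

Including the model id in the key means a model upgrade does not serve stale vectors from the previous model.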
Pitfalls

What breaks in production

  • Tokenizer mismatch between training and serving: silent accuracy drop.
  • UTF-8 / encoding bugs: emoji, RTL characters.
  • Max length silently truncates: long documents lose context.
  • PII leak: raw text stored in logs.
  • 1.5 GB model: pod cold start takes 30 s and autoscaling breaks.
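One way to avoid the silent-truncation pitfall is to split a long document into overlapping windows, score each window, and combine the scores. The sketch below assumes a length-weighted mean as the aggregation rule, which is only one of several reasonable choices; `chunk_tokens`, `aggregate`, `classify_long`, and the `score_fn` callback are hypothetical names, and `score_fn` stands in for a real model call.

```python
from typing import Callable


def chunk_tokens(tokens: list[str], max_len: int = 256, overlap: int = 32) -> list[list[str]]:
    """Split tokens into windows of at most max_len that share `overlap` tokens."""
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    step = max_len - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):   # this window already reaches the end
            break
    return chunks


def aggregate(chunk_scores: list[dict[str, float]], weights: list[int]) -> dict[str, float]:
    """Length-weighted mean of per-chunk label scores."""
    total = sum(weights)
    return {
        label: sum(scores[label] * w for scores, w in zip(chunk_scores, weights)) / total
        for label in chunk_scores[0]
    }


def classify_long(
    tokens: list[str],
    score_fn: Callable[[list[str]], dict[str, float]],
    max_len: int = 256,
    overlap: int = 32,
) -> dict:
    chunks = chunk_tokens(tokens, max_len, overlap)
    if not chunks:
        raise ValueError("empty input")
    scores = [score_fn(chunk) for chunk in chunks]
    combined = aggregate(scores, [len(chunk) for chunk in chunks])
    label = max(combined, key=combined.get)
    return {"label": label, "score": round(combined[label], 4), "chunks": len(chunks)}
```

With a toy scorer and a 10-token input whose first 4 tokens are negative, a model limited to `max_len=4` would see only the negative opening, while the chunked version weighs all three windows.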
Mini Project

Multilingual sentiment API

  1. Wrap an XLM-R sentiment model in FastAPI.
  2. Add server-side micro-batching.
  3. Redis cache: hash(text) → label.
  4. Measure p95 at 100 concurrent requests.
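For step 4, a small asyncio harness is enough to get a first p95 number. The sketch below uses only the standard library; `fake_request` is a hypothetical stand-in that sleeps instead of sending an HTTP POST to `/sentiment`, and the nearest-rank percentile definition is one common convention among several.

```python
import asyncio
import math
import time
from typing import Awaitable, Callable


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


async def timed_call(call: Callable[[], Awaitable[None]]) -> float:
    """Run one request and return its wall-clock latency in seconds."""
    start = time.perf_counter()
    await call()
    return time.perf_counter() - start


async def load_test(call: Callable[[], Awaitable[None]], concurrency: int = 100) -> dict:
    """Fire `concurrency` requests at once and summarize their latencies."""
    latencies = await asyncio.gather(*(timed_call(call) for _ in range(concurrency)))
    return {
        "p50": percentile(latencies, 50),
        "p95": percentile(latencies, 95),
        "max": max(latencies),
    }


async def fake_request() -> None:
    # Hypothetical stand-in for an HTTP POST to /sentiment.
    await asyncio.sleep(0.01)


if __name__ == "__main__":
    print(asyncio.run(load_test(fake_request, concurrency=100)))
```

Replacing `fake_request` with a coroutine that calls the real endpoint through an async HTTP client turns this into the measurement the step asks for; comparing runs with and without micro-batching shows what the batching window costs or saves at the tail.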
Phase 9 Complete

What you learned

Recsys, chatbot, CV, NLP: production blueprints for four domains. Next phase: AI System Architecture, covering microservices, event-driven design, and low-latency design.
