Phase 9 · Chapter 9.04

NLP API Systems

Sentiment, NER, summarization, embedding, translation: every NLP service shares the same backbone of tokenize → encode → decode → post-process.

Task Map

Common NLP API endpoints

Classification: sentiment, topic, intent.
Token tagging: NER, POS.
Embedding: semantic search, clustering.
Seq2seq: summarization, translation, paraphrase.
Generation: LLM (covered separately in Chapter 9.02).

HuggingFace + FastAPI

Embedding service

python (production)

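import torch
from fastapi import FastAPI
from pydantic import BaseModel, Field
from sentence_transformers import SentenceTransformer

app = FastAPI()
# Fall back to CPU when no GPU is visible so the service still starts.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = SentenceTransformer("intfloat/multilingual-e5-base", device=device)

class EmbedReq(BaseModel):
    # 1 to 128 texts per request (Pydantic v2 list-length constraints)
    texts: list[str] = Field(min_length=1, max_length=128)

@app.post("/embed")
def embed(req: EmbedReq):
    # E5 models expect a "query: " or "passage: " prefix on each input; callers must add it.
    vecs = model.encode(req.texts, batch_size=32, normalize_embeddings=True)
    return {"dim": int(vecs.shape[1]), "vectors": vecs.tolist()}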
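Because the service encodes with normalize_embeddings=True, every returned vector has unit length, so cosine similarity reduces to a plain dot product on the client side. The sketch below illustrates that with toy 3-dimensional vectors standing in for the "vectors" field of an /embed response; the helper names normalize, dot, and rank are hypothetical and not part of the service.

```python
import math

def normalize(v: list[float]) -> list[float]:
    """Scale v to unit length, mimicking normalize_embeddings=True."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a: list[float], b: list[float]) -> float:
    """Dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))

def rank(query: list[float], docs: list[list[float]]) -> list[int]:
    """Return document indices sorted by descending similarity to the query."""
    scores = [dot(query, d) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)

# Toy vectors (assumption): a real response would carry 768-dimensional vectors.
query = normalize([1.0, 0.0, 0.0])
docs = [
    normalize([0.0, 1.0, 0.0]),   # orthogonal to the query
    normalize([2.0, 0.1, 0.0]),   # almost parallel to the query
    normalize([1.0, 1.0, 0.0]),   # 45 degrees from the query
]
order = rank(query, docs)
```

Skipping the norm computation at query time is the main reason to normalize once on the server.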

Classification

Sentiment service with batching

python (production)

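from pydantic import BaseModel, Field
from transformers import pipeline

clf = pipeline(
    "text-classification",
    model="cardiffnlp/twitter-xlm-roberta-base-sentiment",
    device=0,                    # first GPU; use device=-1 for CPU
    truncation=True,             # cut inputs longer than max_length
    max_length=256,
)

class BatchReq(BaseModel):
    # Assumed request schema, mirroring EmbedReq from the embedding service.
    texts: list[str] = Field(min_length=1, max_length=128)

@app.post("/sentiment")          # `app` is the FastAPI instance created earlier
def sentiment(req: BatchReq):
    out = clf(req.texts, batch_size=32)   # one {"label", "score"} dict per input
    return [{"label": o["label"], "score": round(o["score"], 4)} for o in out]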

Server-Side Batching

Triton-style micro-batching

python (production)

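# Collect requests for up to 5 ms (or 32 items), then run one forward pass.
import asyncio
from pydantic import BaseModel

QUEUE: asyncio.Queue = asyncio.Queue()
MAX_BATCH, MAX_WAIT_S = 32, 0.005

class TextReq(BaseModel):
    text: str

async def batcher():
    loop = asyncio.get_running_loop()
    while True:
        items = [await QUEUE.get()]              # block until the first request
        deadline = loop.time() + MAX_WAIT_S
        while len(items) < MAX_BATCH:
            remaining = deadline - loop.time()
            if remaining <= 0:
                break
            try:
                items.append(await asyncio.wait_for(QUEUE.get(), remaining))
            except asyncio.TimeoutError:
                break
        texts = [i["text"] for i in items]
        try:
            # clf is blocking; run it in a worker thread so the event loop stays free
            results = await asyncio.to_thread(clf, texts, batch_size=len(texts))
        except Exception as exc:
            for it in items:
                if not it["fut"].done():
                    it["fut"].set_exception(exc)
            continue
        for it, r in zip(items, results):
            if not it["fut"].done():             # the caller may already have timed out
                it["fut"].set_result(r)

@app.on_event("startup")
async def start():
    app.state.batcher = asyncio.create_task(batcher())   # keep a reference to the task

@app.post("/sentiment")   # replaces the synchronous /sentiment handler above
async def sentiment(req: TextReq):
    fut = asyncio.get_running_loop().create_future()
    await QUEUE.put({"text": req.text, "fut": fut})
    return await asyncio.wait_for(fut, timeout=5)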

Multilingual

Language-specific challenges

The tokenizer must be language-aware (XLM-R, mBERT, IndicBERT).
Detect the language first, then route to the right model.
Right-to-left scripts (Arabic, Hebrew): handle both display and preprocessing.
Mixed Bengali-Hindi-Tamil input: run a script detector, then transliterate.
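The script-detection step can be approximated with the standard library alone, since Unicode character names begin with the script name. The sketch below is a cheap pre-filter under that assumption, not a language identifier: Devanagari, for instance, is shared by Hindi, Marathi, and Nepali, so a real router would still need a language-ID model after it. The function names and the 0.2 mixing threshold are hypothetical choices.

```python
import unicodedata
from collections import Counter
from typing import Optional

# Scripts this sketch distinguishes; the set is an illustrative assumption.
KNOWN_SCRIPTS = {"BENGALI", "DEVANAGARI", "TAMIL", "ARABIC", "HEBREW", "LATIN"}
RTL_SCRIPTS = {"ARABIC", "HEBREW"}

def script_counts(text: str) -> Counter:
    """Count letters and combining marks per script via Unicode character names."""
    counts: Counter = Counter()
    for ch in text:
        if unicodedata.category(ch)[0] not in ("L", "M"):
            continue  # skip digits, punctuation, whitespace, emoji
        script = unicodedata.name(ch, "").split(" ", 1)[0]
        if script in KNOWN_SCRIPTS:
            counts[script] += 1
    return counts

def dominant_script(text: str) -> Optional[str]:
    """Most frequent known script, or None when the text has no known letters."""
    counts = script_counts(text)
    return counts.most_common(1)[0][0] if counts else None

def is_mixed(text: str, threshold: float = 0.2) -> bool:
    """True when a second script accounts for at least `threshold` of the letters."""
    counts = script_counts(text)
    if len(counts) < 2:
        return False
    second = counts.most_common(2)[1][1]
    return second / sum(counts.values()) >= threshold

def needs_rtl_handling(text: str) -> bool:
    """True when the dominant script is written right to left."""
    return dominant_script(text) in RTL_SCRIPTS
```

A router could call dominant_script first, send mixed inputs (is_mixed) to a transliteration step, and flag right-to-left inputs for display handling.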

Optimization

Enemies of NLP latency

Slow tokenizer → use_fast=True (the Rust-backed tokenizer).
Long text → chunk + aggregate, or in some cases truncate.
Padding wastes compute → dynamic padding or bucketing.
FP16 + ONNX → BERT-class models run 3–5x faster.
Cache sentence embeddings in Redis (hash(text) → vec).
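The hash(text) → vec cache can be sketched as follows. A plain dict stands in for Redis here, and the encoder is passed in as a function, so the names cache_key and embed_with_cache, the "emb:" key prefix, and the fake encoder are all assumptions for illustration. One detail worth noting: Python's built-in hash() for strings is salted per process, so it would not produce the same key across workers; a content digest such as SHA-256 does.

```python
import hashlib
from typing import Callable

Vector = list[float]

def cache_key(text: str, model_name: str) -> str:
    """Stable key: SHA-256 of the text, namespaced by model name so that
    swapping the model never serves vectors from the old one."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"emb:{model_name}:{digest}"

def embed_with_cache(
    texts: list[str],
    encode: Callable[[list[str]], list[Vector]],
    cache: dict,
    model_name: str,
) -> list[Vector]:
    """Return one vector per input, encoding only texts missing from the cache."""
    keys = [cache_key(t, model_name) for t in texts]
    missing: dict[str, str] = {}          # key -> text, deduplicated, in order
    for k, t in zip(keys, texts):
        if k not in cache and k not in missing:
            missing[k] = t
    if missing:
        vectors = encode(list(missing.values()))   # one batched model call
        for k, v in zip(missing, vectors):
            cache[k] = v
    return [cache[k] for k in keys]

# Demo with a fake encoder that records which texts actually reach the model.
calls: list[list[str]] = []

def fake_encode(batch: list[str]) -> list[Vector]:
    calls.append(list(batch))
    return [[float(len(t))] for t in batch]

cache: dict = {}
first = embed_with_cache(["hi", "hello", "hi"], fake_encode, cache, "demo-model")
second = embed_with_cache(["hello", "hey"], fake_encode, cache, "demo-model")
```

In the demo the duplicate "hi" is encoded once, and the second request only sends "hey" to the encoder. With a real Redis client the dict lookups would become batched get and set calls, typically with a TTL.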

Pitfalls

What breaks in production

Tokenizer mismatch between training and serving: a silent accuracy drop.
UTF-8 / encoding bugs: emoji, RTL characters.
Silent truncation at max length: long documents lose context.
PII leaks: raw text stored in logs.
A 1.5 GB model: pod cold start takes 30 s, which breaks autoscaling.
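One way to avoid the silent-truncation pitfall is to split a long document into overlapping windows, score each window, and aggregate. The sketch below assumes whitespace tokens as a stand-in for model tokens and a length-weighted mean as the aggregation rule; both are illustrative choices, and chunk_tokens, classify_long, and toy_score are hypothetical names.

```python
from typing import Callable

def chunk_tokens(tokens: list[str], max_len: int, overlap: int) -> list[list[str]]:
    """Split tokens into windows of at most max_len, sharing `overlap` tokens."""
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    step = max_len - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break                      # this window already reached the end
    return chunks

def classify_long(
    text: str,
    score_chunk: Callable[[str], float],
    max_len: int = 256,
    overlap: int = 32,
) -> float:
    """Score every chunk and return the mean weighted by chunk length."""
    chunks = chunk_tokens(text.split(), max_len, overlap)
    if not chunks:
        raise ValueError("text contains no tokens")
    weights = [len(c) for c in chunks]
    scores = [score_chunk(" ".join(c)) for c in chunks]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

def toy_score(chunk: str) -> float:
    """Stand-in for a model call: fraction of words equal to 'good'."""
    words = chunk.split()
    return sum(w == "good" for w in words) / len(words)

windows = chunk_tokens(list("abcdefghij"), max_len=4, overlap=1)
doc_score = classify_long("good " * 6 + "bad " * 4, toy_score, max_len=4, overlap=0)
```

Weighting by length keeps a short trailing chunk from counting as much as a full window; max or voting are alternative aggregations depending on the task.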

Mini Project

Multilingual sentiment API

Wrap an XLM-R sentiment model in FastAPI.
Add server-side micro-batching.
Add a Redis cache mapping hash(text) → label.
Measure p95 latency under 100 concurrent requests.
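For the p95 measurement, a minimal load-test harness might look like the sketch below. It fires 100 coroutines at once and computes the nearest-rank 95th percentile of their latencies. The fake_request coroutine is a placeholder that only sleeps; in a real run it would be replaced by an async HTTP call to the /sentiment endpoint. All function names here are hypothetical.

```python
import asyncio
import math
import time
from typing import Awaitable, Callable

def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of
    all samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

async def timed(call: Callable[[], Awaitable[None]]) -> float:
    """Run one request and return its wall-clock latency in seconds."""
    start = time.perf_counter()
    await call()
    return time.perf_counter() - start

async def load_test(call: Callable[[], Awaitable[None]], concurrency: int = 100) -> list[float]:
    """Launch `concurrency` requests at the same time and collect latencies."""
    return list(await asyncio.gather(*(timed(call) for _ in range(concurrency))))

async def fake_request() -> None:
    """Placeholder for an HTTP call to the API (assumption for this sketch)."""
    await asyncio.sleep(0.01)

latencies = asyncio.run(load_test(fake_request, concurrency=100))
p95 = percentile(latencies, 95)
print(f"p95 latency: {p95 * 1000:.1f} ms over {len(latencies)} requests")
```

Comparing p95 with and without micro-batching and caching enabled shows whether each addition pays for itself under load.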

Phase 9 Complete

What you learned

Recsys, chatbot, CV, NLP: production blueprints for four domains. Next phase: AI System Architecture, covering microservices, event-driven design, and low-latency design.
