Phase 9 · Chapter 9.01

Recommendation System Deployment

Millions of items, up to a billion users, and about 100 ms to show the top 10. Recsys = retrieval + ranking: a dance between two stages.

Why Two Stages

Brute force is infeasible

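10 million items × 1 million users = 10 trillion scores. Computing them in real time is impossible; even a single request cannot afford to run a heavy ranker over all 10 million items. So candidate generation first narrows 10M → 1000, and ranking then narrows 1000 → 10.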
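The arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch using the item, user, and candidate counts quoted in the text; the variable names are illustrative.

```python
# Back-of-the-envelope cost of exhaustive scoring vs. a two-stage funnel.
N_ITEMS = 10_000_000      # catalog size from the text
N_USERS = 1_000_000       # user count from the text
RETRIEVED = 1_000         # candidates kept by candidate generation
SHOWN = 10                # items returned after ranking

# Scoring every user against every item.
all_pairs = N_ITEMS * N_USERS

# Heavy-ranker calls per request: whole catalog vs. retrieved candidates only.
ranker_call_reduction = N_ITEMS // RETRIEVED

print(f"user-item pairs to score exhaustively: {all_pairs:,}")
print(f"ranker calls per request: {N_ITEMS:,} -> {RETRIEVED:,} "
      f"({ranker_call_reduction:,}x fewer), then {RETRIEVED:,} -> {SHOWN}")
```

The point of the funnel is that the expensive model only ever sees the 1000 retrieved candidates, not the full catalog.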

Architecture

Layered design

textproduction

User request
   │
   ▼
[ Candidate Generation ]   60ms budget
  - ANN search (FAISS, ScaNN)
  - collaborative filtering
  - popularity / trending
  - business rules
   │  (≈1000 items)
   ▼
[ Ranking ]                30ms budget
  - gradient boosted tree / DNN
  - features: user × item × context
   │  (top 100)
   ▼
[ Re-ranking / Filters ]   10ms
  - diversity, freshness, business rules
  - dedup, blacklist
   │
   ▼
Top-10 result
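The layered flow can be sketched as three functions chained together, with the time spent in each stage recorded against the roughly 100 ms end-to-end target mentioned at the start of the chapter. All three stage functions here are hypothetical stubs, not real retrieval or ranking code; only the wiring and the timing are the point.

```python
import time

def generate_candidates(user_id):
    # Hypothetical stub: a real system would query the ANN index here.
    return list(range(1000))

def rank_candidates(user_id, items):
    # Hypothetical stub: a deterministic fake score stands in for the ranker.
    return sorted(items, key=lambda i: (i * 7919 + user_id) % 1000, reverse=True)[:100]

def rerank(items):
    # Hypothetical stub: dedup while preserving order, then keep the top 10.
    return list(dict.fromkeys(items))[:10]

def recommend(user_id, total_budget_ms=100.0):
    """Run the three stages in order and record how long each one took."""
    timings = {}

    def timed(name, fn, *args):
        start = time.perf_counter()
        out = fn(*args)
        timings[name] = (time.perf_counter() - start) * 1000.0
        return out

    cands = timed("candidate_generation", generate_candidates, user_id)
    ranked = timed("ranking", rank_candidates, user_id, cands)
    top = timed("reranking", rerank, ranked)
    within_budget = sum(timings.values()) <= total_budget_ms
    return top, timings, within_budget

top, timings, within_budget = recommend(user_id=42)
print(top, timings, within_budget)
```

Recording per-stage timings on every request makes it easy to see which stage is eating the budget when the total drifts upward.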

Two-Tower Retrieval

Embedding-based candidate gen

pythonproduction

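# user_tower(user_features) -> user_emb (128-d)
# item_tower(item_features) -> item_emb (128-d)
# score = dot(user_emb, item_emb)

# OFFLINE: precompute all item embeddings, index them
import faiss
import numpy as np

item_emb = np.ascontiguousarray(item_tower.predict(all_items), dtype="float32")  # (10M, 128)
# IndexFlatIP does exact (brute-force) inner-product search. At 10M items,
# an approximate index (IVF or HNSW) is the usual choice to fit the latency budget.
index = faiss.IndexFlatIP(128)
index.add(item_emb)
faiss.write_index(index, "items.faiss")

# ONLINE: load the index once at startup, then search per request
index = faiss.read_index("items.faiss")

def candidates(user_feat, k=1000):
    u = user_tower.predict([user_feat]).astype("float32")   # shape (1, 128)
    _, ids = index.search(u, k)
    return ids[0]   # row positions in item_emb; map them back to item IDs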
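To make clear what the index search returns, here is a minimal NumPy sketch of exact top-k retrieval by dot product, which is the result an approximate index tries to reproduce more cheaply. The random embeddings and the function name `top_k_inner_product` are illustrative assumptions, not part of the pipeline above.

```python
import numpy as np

def top_k_inner_product(user_emb, item_emb, k):
    """Exact top-k items by dot product with the user embedding."""
    scores = item_emb @ user_emb                  # one score per item, shape (n_items,)
    top = np.argpartition(-scores, k - 1)[:k]     # unordered top-k without a full sort
    top = top[np.argsort(-scores[top])]           # order only those k by score, best first
    return top, scores[top]

# Toy data: random vectors stand in for trained tower outputs.
rng = np.random.default_rng(0)
item_emb = rng.standard_normal((5000, 128)).astype("float32")
user_emb = rng.standard_normal(128).astype("float32")

ids, scores = top_k_inner_product(user_emb, item_emb, k=10)
print(ids, scores)
```

A brute-force function like this is also handy as a recall baseline: run it on a sample of users and check how many of its results the approximate index actually returns.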

Ranker

Per-candidate scoring

pythonproduction

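import lightgbm as lgb
import numpy as np

# ranker: a trained model loaded once at startup, e.g. lgb.Booster(model_file=...)
# build_features: project-specific helper (not shown) returning one row per candidate

def rank(user_id, candidate_ids, top_n=100):
    feats = build_features(user_id, candidate_ids)   # (1000, F)
    scores = ranker.predict(feats)                   # one score per candidate
    order = np.argsort(-scores)[:top_n]              # highest score first
    return [candidate_ids[i] for i in order]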

Serving Stack

Production components

Vector DB: Milvus, Pinecone, Vespa, pgvector.
Feature store: online lookup in under 5 ms (Feast + Redis).
Model serving: Triton or TorchServe, with ONNX-optimized models.
Cache: Redis, holding cached top-N lists for popular users and segments.
Event log: Kafka; impressions and clicks flow back into training.
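The caching idea can be sketched without a running Redis. The class below is a small in-process stand-in with per-key expiry, assuming a cache-aside pattern: look up first, compute and store on a miss. `TTLCache`, `cached_recommend`, and the `rec:{user_id}` key format are hypothetical names chosen for this sketch.

```python
import time

class TTLCache:
    """In-process stand-in for a Redis cache with per-key expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}              # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]      # expired: drop it and report a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (self.clock() + self.ttl, value)

def cached_recommend(user_id, cache, compute):
    """Serve from cache when possible; otherwise compute and store the result."""
    key = f"rec:{user_id}"
    hit = cache.get(key)
    if hit is not None:
        return hit, True
    result = compute(user_id)
    cache.set(key, result)
    return result, False

cache = TTLCache(ttl_seconds=60)
print(cached_recommend(7, cache, lambda u: [101, 102, 103]))   # miss, computed
print(cached_recommend(7, cache, lambda u: [101, 102, 103]))   # hit, from cache
```

The TTL is the knob that trades freshness against load: a short TTL keeps recommendations current, a long one shields the ranking stack from repeat requests.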

Production Concerns

What beginners overlook

Cold start: for a new user, fall back to popularity; for a new item, use a content-based embedding.
Position bias: top positions attract more clicks regardless of relevance; apply inverse propensity weighting during training.
Diversity: if the whole top 10 comes from one category, users get bored; use an MMR (maximal marginal relevance) re-ranker.
Freshness: for news and video, apply a recency penalty to older items.
Exploration: use ε-greedy or a bandit to give new items a chance.
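Of the concerns above, the MMR re-ranker is the easiest to show in code. The sketch below is a greedy MMR, assuming cosine similarity between item embeddings as the redundancy measure and a trade-off weight `lam` (1.0 means pure relevance). The function name, the toy data, and the default `lam=0.7` are assumptions made for illustration.

```python
import numpy as np

def mmr_rerank(relevance, item_emb, k, lam=0.7):
    """Greedy MMR: prefer items that are relevant but unlike those already picked."""
    relevance = np.asarray(relevance, dtype=float)
    item_emb = np.asarray(item_emb, dtype=float)
    # Cosine similarity between all candidates (assumes no zero vectors).
    unit = item_emb / np.linalg.norm(item_emb, axis=1, keepdims=True)
    sim = unit @ unit.T

    selected = [int(np.argmax(relevance))]        # start with the most relevant item
    remaining = set(range(len(relevance))) - set(selected)
    while remaining and len(selected) < k:
        rem = sorted(remaining)
        # Redundancy = similarity to the closest item already selected.
        redundancy = sim[np.ix_(rem, selected)].max(axis=1)
        mmr = lam * relevance[rem] - (1.0 - lam) * redundancy
        best = rem[int(np.argmax(mmr))]
        selected.append(best)
        remaining.remove(best)
    return selected

# Items 0 and 1 are near-duplicates; item 2 is different but slightly less relevant.
relevance = [0.90, 0.89, 0.80]
item_emb = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]]
print(mmr_rerank(relevance, item_emb, k=2, lam=0.5))
print(mmr_rerank(relevance, item_emb, k=2, lam=1.0))
```

With `lam=0.5` the near-duplicate is skipped in favor of the dissimilar item, while `lam=1.0` reduces to plain relevance ordering.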

Pitfalls

Recsys landmines

Strong offline NDCG but weak online CTR: offline evaluation replays logged data and misses the feedback loop between what is shown and what gets clicked.
Filter bubble: without a diversity penalty, the user's world narrows.
Stale item embeddings: without a daily refresh job, recommendations go out of date.
Privacy: user-item interactions are sensitive; anonymize them and enforce a retention policy.
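Since offline NDCG comes up as the metric that can mislead, it helps to see exactly what it measures. The sketch below computes NDCG@k for one ranked list, assuming linear gain (the relevance label itself) with a log2 position discount; some implementations use an exponential gain of 2^rel − 1 instead.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain: relevance discounted by log2 of the position."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG of the given order divided by the DCG of the ideal (sorted) order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Relevance labels listed in the order the model ranked the items.
print(ndcg_at_k([3, 2, 1], k=3))   # already in ideal order
print(ndcg_at_k([0, 0, 1], k=3))   # the only relevant item is ranked last
```

NDCG is computed only over items that were shown and labeled in the logs, which is why a model can score well on it and still underperform once it starts choosing what users see.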

Mini Project

Two-tower toy recsys

Train user and item embeddings on the MovieLens dataset.
Build a FAISS index and expose a FastAPI /recommend/{user_id} endpoint.
Use a LightGBM ranker to cut the top 100 down to the top 10.
Measure p95 latency.
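For the last step, a minimal way to measure p95 latency is to time repeated calls and take the 95th percentile of the samples. The sketch below assumes a nearest-rank percentile and times a placeholder handler in-process; `fake_handler` is hypothetical and stands in for a call to the real endpoint.

```python
import math
import time

def measure_latency_ms(fn, n_requests=200):
    """Call fn once per simulated request and collect wall-clock latencies in ms."""
    samples = []
    for i in range(n_requests):
        start = time.perf_counter()
        fn(i)
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples

def percentile(samples, q):
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]

def fake_handler(user_id):
    # Hypothetical placeholder for the real recommend call.
    return sorted(range(1000), key=lambda i: (i * 31 + user_id) % 997)[:10]

samples = measure_latency_ms(fake_handler)
print(f"p50 = {percentile(samples, 50):.3f} ms, p95 = {percentile(samples, 95):.3f} ms")
```

Reporting p95 rather than the mean matters because a few slow requests can sit well above the average without moving it much.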

Takeaway

Remember

Recsys = retrieve fast, rank smart, re-rank wisely. Split the latency budget across all three stages.
