Phase 9 · Chapter 9.01

Recommendation System Deployment

Millions of items, a billion users, and the top 10 has to appear within 100 ms. A recsys is retrieval plus ranking: a dance between two stages.

Why Two Stages

Brute force is infeasible

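10 million items × 1 million users = 10 trillion scores. Even a single request would need 10 million model evaluations, which is impossible in real time. So candidate generation first narrows 10M → 1000 items, and then ranking (with re-ranking) narrows 1000 → 10.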
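The arithmetic above can be checked in a few lines. This is only a back-of-the-envelope sketch: the item and user counts come from the text, while the cost of 1 microsecond per score is an assumed figure for illustration, not a measurement.

```python
# Back-of-the-envelope cost of brute force vs. two stages.
N_ITEMS = 10_000_000        # 10M items (from the text)
N_USERS = 1_000_000         # 1M users (from the text)
COST_PER_SCORE_US = 1.0     # ASSUMPTION: 1 microsecond per model score

# Every user scored against every item.
all_pairs = N_ITEMS * N_USERS

# One request, heavy model applied to the whole catalog.
brute_force_ms = N_ITEMS * COST_PER_SCORE_US / 1000

# Two stages: the heavy model only sees the ~1000 retrieved candidates.
N_CANDIDATES = 1000
ranker_ms = N_CANDIDATES * COST_PER_SCORE_US / 1000

print(f"user-item pairs:          {all_pairs:,}")
print(f"brute force per request:  {brute_force_ms:,.0f} ms")
print(f"ranking 1000 candidates:  {ranker_ms:,.0f} ms")
```

Under that assumed cost, scoring the full catalog takes seconds per request, while scoring 1000 candidates takes about a millisecond. The real numbers depend on the model and hardware.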

Architecture

Layered design

User request
   │
   ▼
[ Candidate Generation ]   100ms budget
  - ANN search (FAISS, ScaNN)
  - collaborative filtering
  - popularity / trending
  - business rules
   │  (≈1000 items)
   ▼
[ Ranking ]                30ms budget
  - gradient boosted tree / DNN
  - features: user × item × context
   │  (top 100)
   ▼
[ Re-ranking / Filters ]   10ms
  - diversity, freshness, business rules
  - dedup, blacklist
   │
   ▼
Top-10 result
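One way to keep the layers honest about their budgets is to time each stage separately. The sketch below wires three placeholder stages together and records per-stage latency. The budget values are copied from the diagram; the stage functions (`stage_candidates`, `stage_rank`, `stage_rerank`), the fake scoring rule, and the blocked-item set are all hypothetical stand-ins, not real retrieval or ranking code.

```python
import time

# Per-stage budgets in milliseconds, copied from the diagram above.
BUDGET_MS = {"candidates": 100, "ranking": 30, "rerank": 10}

def timed(stage, fn, *args):
    """Run one stage; return its result plus a timing record."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000
    record = {
        "stage": stage,
        "ms": elapsed_ms,
        "within_budget": elapsed_ms <= BUDGET_MS[stage],
    }
    return result, record

# Placeholder stages (hypothetical): real ones would call the ANN index,
# the ranker model, and the diversity / business-rule filters.
def stage_candidates(user_id):
    return list(range(1000))                      # ~1000 item ids

def stage_rank(user_id, items):
    def fake_score(item_id):                      # stand-in for a model score
        return (item_id * 37 + user_id) % 1000
    return sorted(items, key=fake_score, reverse=True)[:100]

def stage_rerank(user_id, items, blocked=frozenset({0})):
    # Dedup while keeping order, then drop blocked items.
    kept = [i for i in dict.fromkeys(items) if i not in blocked]
    return kept[:10]

def recommend(user_id):
    timings = []
    items, rec = timed("candidates", stage_candidates, user_id)
    timings.append(rec)
    items, rec = timed("ranking", stage_rank, user_id, items)
    timings.append(rec)
    items, rec = timed("rerank", stage_rerank, user_id, items)
    timings.append(rec)
    return items, timings

top10, timings = recommend(user_id=42)
for rec in timings:
    print(f"{rec['stage']:<10} {rec['ms']:.3f} ms  within budget: {rec['within_budget']}")
```

In a real service the timing records would go to a metrics system rather than stdout, so that a stage drifting past its budget shows up on a dashboard.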
Two-Tower Retrieval

Embedding-based candidate gen

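# Two-tower model (both towers are assumed to be trained already):
#   user_tower(user_features) -> user_emb (128-d)
#   item_tower(item_features) -> item_emb (128-d)
#   score = dot(user_emb, item_emb)

import faiss
import numpy as np

DIM = 128

# OFFLINE: precompute all item embeddings and index them
item_emb = item_tower.predict(all_items)                  # shape (10M, 128)
item_emb = np.ascontiguousarray(item_emb, dtype="float32")
# Note: IndexFlatIP is an exact inner-product scan, not approximate search.
# At 10M items, an IVF or HNSW index (e.g. faiss.IndexHNSWFlat) gives true ANN.
index = faiss.IndexFlatIP(DIM)
index.add(item_emb)                                       # row i gets FAISS id i
faiss.write_index(index, "items.faiss")

# ONLINE: load the index once at startup, then search per request
index = faiss.read_index("items.faiss")

def candidates(user_feat, k=1000):
    """Return the FAISS ids of the k items with the highest dot-product score."""
    u = user_tower.predict([user_feat])                   # shape (1, 128)
    u = np.ascontiguousarray(u, dtype="float32")
    _, ids = index.search(u, k)                           # ids has shape (1, k)
    return ids[0]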
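To see what the index lookup in `candidates` computes, here is a dependency-free sketch of exact top-k inner-product search. An ANN index aims to return approximately the same ids without scoring every item. The function name `top_k_inner_product` and the toy 3-d vectors are made up for illustration; real embeddings here would be 128-d.

```python
import heapq

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def top_k_inner_product(user_emb, item_embs, k):
    """Exact top-k: score every item, keep the k largest. Returns row ids, best first."""
    scored = ((dot(user_emb, emb), i) for i, emb in enumerate(item_embs))
    return [i for _, i in heapq.nlargest(k, scored)]

# Toy item embeddings (made up for illustration)
items = [
    [1.0, 0.0, 0.0],   # id 0
    [0.0, 1.0, 0.0],   # id 1
    [0.7, 0.7, 0.0],   # id 2
    [0.0, 0.0, 1.0],   # id 3
]
user = [0.9, 0.4, 0.0]

top2 = top_k_inner_product(user, items, k=2)
print(top2)   # [2, 0]: item 2 scores 0.91, item 0 scores 0.90
```

The exact scan costs one dot product per item per request, which is why it stops scaling at catalog sizes in the millions and an approximate index takes over.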
Ranker

Per-candidate scoring

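import lightgbm as lgb
import numpy as np

# Assumed: a ranker trained offline and saved to disk (the path is a placeholder).
ranker = lgb.Booster(model_file="ranker.txt")

def rank(user_id, candidate_ids, top_n=100):
    """Score every candidate and return the top_n ids, best first."""
    # build_features is an assumed helper: one row of user x item x context
    # features per candidate.
    feats = build_features(user_id, candidate_ids)   # shape (1000, F)
    scores = ranker.predict(feats)                   # shape (1000,)
    order = np.argsort(-scores)                      # descending by score
    return [candidate_ids[i] for i in order[:top_n]]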
Serving Stack

Production components

  • Vector DB: Milvus, Pinecone, Vespa, pgvector.
  • Feature store: online lookup in under 5 ms (Feast + Redis).
  • Model serving: Triton or TorchServe, with ONNX-optimized models.
  • Cache: Redis, holding the precomputed top-N for popular users and segments.
  • Event log: Kafka, so impressions and clicks flow back into training.
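The cache entry in the list above follows a simple read-through pattern: serve the stored top-N if it is still fresh, otherwise compute and store it. The sketch below uses an in-memory dictionary as a stand-in for Redis so it runs without a server. `TopNCache`, `recommend_cached`, and the 60-second TTL are hypothetical names and values, and this is not the Redis client API.

```python
import time

class TopNCache:
    """In-memory stand-in for a Redis cache with per-key expiry (illustrative only)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock        # injectable so expiry can be tested without sleeping
        self._store = {}          # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired: drop the entry and report a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (self.clock() + self.ttl, value)

def recommend_cached(segment, cache, compute_top_n):
    """Read-through: return the cached top-N, or compute, store, and return it."""
    hit = cache.get(segment)
    if hit is not None:
        return hit
    result = compute_top_n(segment)
    cache.set(segment, result)
    return result

# Usage with a fake expensive computation
cache = TopNCache(ttl_seconds=60)
print(recommend_cached("sports_fans", cache, lambda seg: [101, 102, 103]))
```

A short TTL trades some freshness for latency: popular segments are served from memory, and the full pipeline only runs when an entry expires.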
Production Concerns

What beginners overlook

  • Cold start: new users get a popularity fallback; new items get a content-based embedding.
  • Position bias: top positions collect more clicks regardless of relevance, so use inverse propensity weighting in training.
  • Diversity: users get annoyed when the top 10 all come from one category, so add an MMR re-ranker.
  • Freshness: for news and video, apply a recency penalty to older items.
  • Exploration: ε-greedy or a bandit, to give new items a chance.
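The MMR (Maximal Marginal Relevance) re-ranker mentioned in the diversity item can be sketched as a greedy loop: each step picks the item with the best trade-off between relevance and similarity to what is already selected. This is a minimal illustration under assumptions: the item ids, relevance scores, and the category-based similarity (1.0 within a category, 0.0 otherwise) are made up, and a real system would more likely use embedding similarity.

```python
def mmr_rerank(candidates, relevance, similarity, k, lam=0.7):
    """Greedy Maximal Marginal Relevance.

    Each step picks the remaining item that maximizes
        lam * relevance[i] - (1 - lam) * max(similarity(i, j) for j already picked).
    lam=1.0 is pure relevance; lower values push harder for diversity.
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def mmr_score(i):
            redundancy = max((similarity(i, j) for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data (made up): three action titles score highest, then a comedy and a drama.
category = {"a1": "action", "a2": "action", "a3": "action", "c1": "comedy", "d1": "drama"}
relevance = {"a1": 0.95, "a2": 0.90, "a3": 0.85, "c1": 0.70, "d1": 0.60}

def same_category(i, j):
    # ASSUMPTION: similarity is 1.0 within a category, 0.0 across categories.
    return 1.0 if category[i] == category[j] else 0.0

top3 = mmr_rerank(list(relevance), relevance, same_category, k=3, lam=0.7)
print(top3)   # ['a1', 'c1', 'd1'] instead of the pure-relevance ['a1', 'a2', 'a3']
```

With `lam=1.0` the same call returns the three action titles, which makes the parameter a direct dial between relevance and diversity.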
Pitfalls

Recsys landmines

  • Good offline NDCG but poor online CTR: the feedback loop is missing.
  • Filter bubble: without a diversity penalty, the user's world narrows.
  • Stale item embeddings: without a daily refresh job, recommendations go out of date.
  • Privacy: user-item interactions are sensitive, so anonymize them and set a retention policy.
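For reference, the offline NDCG from the first pitfall can be computed as below. It is calculated from logged relevance labels only, so it cannot tell how users will react to a ranking they have never been shown, which is one reason it can disagree with online CTR. This sketch uses the linear-gain form of DCG (some implementations use 2^rel − 1 as the gain instead); the function names are illustrative.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain: the label at rank r is divided by log2(r + 1)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )

def ndcg_at_k(relevances, k):
    """DCG divided by the DCG of the ideal (descending) order; 0.0 if nothing is relevant."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Relevance labels in the order the model ranked the items (1 = clicked, 0 = not).
print(ndcg_at_k([1, 0, 0], k=3))   # 1.0: the clicked item is already on top
print(ndcg_at_k([0, 0, 1], k=3))   # 0.5: the clicked item sits at rank 3
```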
Mini Project

Two-tower toy recsys

  1. Train user and item embeddings on the MovieLens dataset.
  2. Build a FAISS index and expose a FastAPI /recommend/{user_id} endpoint.
  3. Use a LightGBM ranker to cut the top 100 down to the top 10.
  4. Measure p95 latency.
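For step 4, a minimal way to get a p95 figure is to time repeated calls and take the 95th percentile with the standard library. In the sketch below `fake_recommend` is a hypothetical stand-in for the real handler; for the project, the timed call would be an HTTP request to the /recommend/{user_id} endpoint so that serialization and network overhead are included.

```python
import statistics
import time

def measure_latency_ms(fn, n_requests=200):
    """Call fn(i) repeatedly and return per-call wall-clock latencies in milliseconds."""
    latencies = []
    for i in range(n_requests):
        start = time.perf_counter()
        fn(i)
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies

def p95(latencies):
    """95th percentile: the last of the 19 cut points that split the data into 20 groups."""
    return statistics.quantiles(latencies, n=20)[-1]

def fake_recommend(user_id):
    # Stand-in workload (hypothetical): sort 2000 fake scores and keep 10.
    return sorted(range(2000), key=lambda x: (x * 31 + user_id) % 2000)[:10]

lat = measure_latency_ms(fake_recommend)
print(f"p50 = {statistics.median(lat):.3f} ms, p95 = {p95(lat):.3f} ms")
```

Reporting p95 rather than the mean matters because a handful of slow requests (cache misses, cold feature lookups) can hide behind a healthy average.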
Takeaway

Remember

Recsys = retrieve fast, rank smart, re-rank wisely. Split the latency budget across all three stages.