Phase 9 · Chapter 9.01
Recommendation System Deployment
A million items, a billion users, and the top 10 must be shown within 100 ms. Recsys = retrieval + ranking: a dance between two stages.
Why Two Stages
Brute force is infeasible
Scoring every pair means 10 million items × 1 million users = 10 trillion scores. That is impossible in real time. So candidate generation first narrows 10M → 1000, and then ranking plus re-ranking narrows 1000 → 10.
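The arithmetic behind that claim can be checked with a few lines. This is only a back-of-the-envelope sketch using the figures quoted above; the constant names are made up for illustration, and it counts calls to the heavy ranking model only (retrieval still touches the index, but with a much cheaper operation).

```python
# Back-of-the-envelope scoring counts, using the figures quoted in the text.
N_ITEMS = 10_000_000       # 10 million items
N_USERS = 1_000_000        # 1 million users
N_CANDIDATES = 1_000       # shortlist size after candidate generation

# Brute force over the whole catalog: score every (user, item) pair.
all_pairs = N_ITEMS * N_USERS

# Per request: brute force runs the ranker on every item,
# the two-stage design runs it only on the shortlist.
brute_force_per_request = N_ITEMS
two_stage_per_request = N_CANDIDATES
reduction = brute_force_per_request // two_stage_per_request

print(f"all pairs: {all_pairs:,}")
print(f"ranker calls per request: {brute_force_per_request:,} -> {two_stage_per_request:,}")
print(f"reduction: {reduction:,}x")
```

With these numbers the ranker sees 10,000 times fewer items per request, which is what makes a tight latency target plausible.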
Architecture
Layered design
User request
│
▼
[ Candidate Generation ] 60ms budget
- ANN search (FAISS, ScaNN)
- collaborative filtering
- popularity / trending
- business rules
│ (≈1000 items)
▼
[ Ranking ] 30ms budget
- gradient boosted tree / DNN
- features: user × item × context
│ (top 100)
▼
[ Re-ranking / Filters ] 10ms
- diversity, freshness, business rules
- dedup, blacklist
│
▼
Top-10 result
Two-Tower Retrieval
Embedding-based candidate gen
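# Two-tower model (user_tower and item_tower are assumed to be trained already):
#   user_tower(user_features) -> user_emb (128-d)
#   item_tower(item_features) -> item_emb (128-d)
#   score = dot(user_emb, item_emb)
import faiss
import numpy as np

# OFFLINE: precompute all item embeddings and index them
item_emb = item_tower.predict(all_items)                     # shape (10M, 128)
item_emb = np.ascontiguousarray(item_emb, dtype="float32")   # FAISS expects float32
# IndexFlatIP is an exact inner-product search; at 10M items an approximate
# index (e.g. IVF or HNSW) is the usual choice for true ANN.
index = faiss.IndexFlatIP(128)
index.add(item_emb)
faiss.write_index(index, "items.faiss")

# ONLINE: load the index once at startup, then search per request
index = faiss.read_index("items.faiss")

def candidates(user_feat, k=1000):
    u = user_tower.predict([user_feat])
    u = np.ascontiguousarray(u, dtype="float32")
    _, ids = index.search(u, k)    # ids has shape (1, k)
    return ids[0]
Ranker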
Per-candidate scoring
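import lightgbm as lgb
import numpy as np

# ranker: a trained LightGBM model, assumed to be loaded at startup
# build_features: an assumed helper that returns one feature row per candidate
def rank(user_id, candidate_ids, top_n=100):
    feats = build_features(user_id, candidate_ids)   # shape (1000, F)
    scores = ranker.predict(feats)
    order = np.argsort(-scores)                      # highest score first
    return [candidate_ids[i] for i in order[:top_n]]
Serving Stack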
Production components
- Vector DB: Milvus, Pinecone, Vespa, pgvector.
- Feature store: online lookup < 5ms (Feast + Redis).
- Model serving: Triton / TorchServe; ONNX-optimized models.
- Cache: Redis, holding the cached top-N for popular users/segments.
- Event log: Kafka; impressions/clicks flow back into training.
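The cache component can be pictured as a cache-aside lookup with expiry. The sketch below uses a small in-process dictionary as a stand-in for Redis so that it runs without any server; `TTLCache`, `recommend`, and `compute_top_n` are hypothetical names introduced for illustration, not part of any library.

```python
import time

class TTLCache:
    """Tiny in-process stand-in for a Redis cache with per-key expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so expiry can be tested
        self._store = {}            # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]    # expired: drop it and report a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (self.clock() + self.ttl, value)


def recommend(segment, cache, compute_top_n):
    """Cache-aside: return the cached top-N for a segment, else compute and store it."""
    cached = cache.get(segment)
    if cached is not None:
        return cached
    result = compute_top_n(segment)   # the expensive retrieve + rank path
    cache.set(segment, result)
    return result
```

A short TTL keeps cached lists from drifting too far from fresh model output; the right value depends on how quickly the catalog and user behavior change.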
Production Concerns
What beginners don't think about
- Cold start: for a new user, fall back to popularity; for a new item, use a content-based embedding.
- Position bias: top positions get more clicks; use inverse propensity weighting in training.
- Diversity: if the whole top-10 is one category, the user gets annoyed; use an MMR re-ranker.
- Freshness: for news / video, penalize older items with a recency-based decay.
- Exploration: ε-greedy or a bandit; give new items a chance.
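The MMR re-ranker mentioned under Diversity can be sketched as a greedy loop: at each step pick the item that maximizes λ · relevance minus (1 − λ) · its highest similarity to anything already selected. In the sketch below, similarity is simply "same category or not", which is a deliberately crude assumption; an embedding-based similarity would be a natural replacement. The function name, parameters, and toy data are illustrative.

```python
def mmr_rerank(items, relevance, category, k=10, lam=0.7):
    """Greedy Maximal Marginal Relevance.

    items:     candidate ids, any order
    relevance: dict id -> ranker score (assumed scaled to [0, 1])
    category:  dict id -> category label; similarity is 1.0 for the same
               category and 0.0 otherwise
    lam:       1.0 = pure relevance, 0.0 = pure diversity
    """
    selected = []
    remaining = list(items)
    while remaining and len(selected) < k:
        def mmr_score(i):
            # Highest similarity between candidate i and anything already picked
            max_sim = max(
                (1.0 if category[i] == category[j] else 0.0 for j in selected),
                default=0.0,
            )
            return lam * relevance[i] - (1.0 - lam) * max_sim
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


relevance = {"a": 0.9, "b": 0.85, "c": 0.8, "d": 0.5}
category = {"a": "action", "b": "action", "c": "comedy", "d": "drama"}

print(mmr_rerank("abcd", relevance, category, k=3, lam=1.0))  # ['a', 'b', 'c']
print(mmr_rerank("abcd", relevance, category, k=3, lam=0.7))  # ['a', 'c', 'd']
```

With λ = 1.0 the list is ordered by relevance alone and two action titles lead it; at λ = 0.7 the second action title is pushed out in favor of other categories.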
Pitfalls
Recsys landmines
- Good offline NDCG but poor online CTR: the feedback loop is missing.
- Filter bubble: without a diversity penalty, the user's world narrows.
- Stale item embeddings: without a daily refresh job, recommendations go out of date.
- Privacy: user-item interactions are sensitive; anonymize them and set a retention policy.
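For reference, the offline NDCG from the first pitfall can be computed as below. This is a minimal sketch using linear gain and a log2 position discount; other gain functions (such as 2^rel − 1) are also common, and the function names are illustrative. Because it is computed on logged interactions, it only reflects items the previous system chose to show, which is one reason it can disagree with online CTR.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k positions (linear gain)."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """relevances: graded relevance of items, in the order the model ranked them."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

print(ndcg_at_k([3, 2, 1, 0], k=4))   # perfect ordering -> 1.0
print(ndcg_at_k([0, 1], k=2))         # relevant item ranked second
```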
Mini Project
Two-tower toy recsys
- Train user + item embeddings on the MovieLens dataset.
- Build a FAISS index and a FastAPI /recommend/{user_id} endpoint.
- Use a LightGBM ranker to go from top-100 → top-10.
- Measure p95 latency.
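For the last step, one simple way to get a p95 figure is to time many calls and take a nearest-rank percentile. The sketch below uses only the standard library; `fake_recommend` is a hypothetical stand-in for a call to the /recommend/{user_id} endpoint, and in a real measurement it would be replaced by an HTTP request against the running service.

```python
import math
import time

def percentile(samples, q):
    """Nearest-rank percentile: smallest sample with at least q% of the data at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]

def measure_latency_ms(fn, n_requests=200):
    """Call fn() n_requests times and return per-call latency in milliseconds."""
    latencies = []
    for _ in range(n_requests):
        start = time.perf_counter()
        fn()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies

def fake_recommend():
    # Hypothetical stand-in for the real endpoint call
    sum(i * i for i in range(10_000))

lat = measure_latency_ms(fake_recommend)
print(f"p50={percentile(lat, 50):.2f} ms  p95={percentile(lat, 95):.2f} ms")
```

Reporting p95 rather than the mean matters because a handful of slow requests can hide behind a good average.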
Takeaway
Remember
Recsys = retrieve fast, rank smart, re-rank wisely. Split the latency budget across all three stages.
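Splitting the budget is easier to enforce when each stage is timed separately. The sketch below is one possible shape for that, not a prescribed design: `run_pipeline` and `BUDGET_MS` are hypothetical names, the 30 ms and 10 ms figures come from the architecture diagram, and the 60 ms retrieval figure is an assumption chosen so that the three stages sum to the 100 ms target.

```python
import time

# Assumed per-stage budgets in milliseconds. 30 and 10 come from the
# architecture diagram; 60 for retrieval is an assumption chosen so the
# three stages add up to the 100 ms end-to-end target.
BUDGET_MS = {"retrieve": 60.0, "rank": 30.0, "rerank": 10.0}

def run_pipeline(request, stages, budget_ms=BUDGET_MS):
    """Run stages in order, timing each one against its budget.

    stages: list of (name, fn); each fn takes the previous stage's output.
    Returns (final_output, report), where report maps
    name -> (elapsed_ms, within_budget).
    """
    report = {}
    data = request
    for name, fn in stages:
        start = time.perf_counter()
        data = fn(data)
        elapsed = (time.perf_counter() - start) * 1000.0
        report[name] = (elapsed, elapsed <= budget_ms[name])
    return data, report

# Toy stages standing in for the real retrieval, ranking and re-ranking code.
stages = [
    ("retrieve", lambda user: list(range(1000))),                # ~1000 candidates
    ("rank",     lambda ids: sorted(ids, reverse=True)[:100]),   # top 100
    ("rerank",   lambda ids: ids[:10]),                          # top 10
]
top10, report = run_pipeline("user_42", stages)
for name, (ms, ok) in report.items():
    print(f"{name}: {ms:.3f} ms ({'ok' if ok else 'over budget'})")
```

In a real service the per-stage timings would more likely be exported as metrics than printed, so that a stage drifting over its share of the budget shows up before the end-to-end target is missed.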