Phase 12 · Chapter 12.02
System Design Interviews for AI
Designing an AI system live in 45-60 minutes is the highest-signal round of the interview. Attempting it without a framework is the most reliable way to fail.
Framework
6-step ML system design
1. Clarify: problem, scale, constraints (5 min)
2. Metric: offline + online, business goal (5 min)
3. Data: source, labeling, freshness (5 min)
4. Model: baseline → advanced, trade-offs (10 min)
5. System: training pipeline + serving + monitoring (15 min)
6. Scale + risk: bottlenecks, failure modes, future work (10 min)
Step 1: Clarify
Questions to ask the interviewer
- QPS? Daily active users? Geographic scope?
- Latency requirement (p99)?
- Personalization level (per-user vs cohort)?
- Cold start expected? New users / new items?
- Privacy / regulatory constraint (GDPR, HIPAA)?
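Once the interviewer answers the scale questions, it helps to turn the answers into numbers on the spot. The sketch below is a back-of-envelope estimate only: the function name, the peak factor, the per-replica throughput, and the 70% headroom are all illustrative assumptions, not figures from any real system.

```python
import math


def estimate_capacity(dau, requests_per_user_per_day, peak_factor,
                      per_replica_qps, headroom=0.7):
    """Back-of-envelope sizing: average QPS, peak QPS, replica count.

    All inputs are assumptions you would confirm with the interviewer.
    """
    seconds_per_day = 86_400
    avg_qps = dau * requests_per_user_per_day / seconds_per_day
    # Traffic is never flat; peak_factor is an assumed peak-to-average ratio.
    peak_qps = avg_qps * peak_factor
    # Plan to run each replica at `headroom` of its measured maximum
    # so that short bursts do not saturate it.
    replicas = math.ceil(peak_qps / (per_replica_qps * headroom))
    return {"avg_qps": avg_qps, "peak_qps": peak_qps, "replicas": replicas}


if __name__ == "__main__":
    # Hypothetical scenario: 10M DAU, 20 requests per user per day,
    # 3x peak, 200 QPS per GPU replica.
    print(estimate_capacity(10_000_000, 20, 3, 200))
```

Saying "about 2,300 QPS on average, roughly 7,000 at peak, so around 50 replicas under these assumptions" is far stronger than saying "it scales".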
Step 2: Metric
Offline vs. online: state both
- Recsys: offline NDCG@10, online CTR + dwell time.
- Search: offline MRR, online click-through, zero-result rate.
- Fraud: offline PR-AUC, online $ saved / false-positive cost.
- LLM: offline BLEU/eval suite, online thumbs-up rate.
- Always tie the metric to the business: revenue, retention, cost.
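Interviewers sometimes ask how the offline metrics are actually computed. A minimal sketch of NDCG@k and MRR follows, assuming graded relevance labels with linear gain, and assuming each ranked list already contains every relevant item (so the ideal ordering can be derived from the list itself). The function names are illustrative.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k items (linear gain)."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (sorted) ordering.

    Assumption: `relevances` holds every relevant item for the query.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def mrr(ranked_hits):
    """Mean reciprocal rank; each entry is a 0/1 list in ranked order."""
    if not ranked_hits:
        return 0.0
    total = 0.0
    for hits in ranked_hits:
        for rank, hit in enumerate(hits, start=1):
            if hit:
                total += 1.0 / rank
                break  # only the first relevant result counts
    return total / len(ranked_hits)


if __name__ == "__main__":
    print(ndcg_at_k([3, 2, 0, 1], k=10))
    print(mrr([[0, 1, 0], [1, 0, 0]]))
```

Both metrics only look at ranking quality on logged data, which is why an online metric such as CTR is still needed alongside them.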
Step 5: Architecture
Reference diagram: always draw this
┌────────────┐   ┌──────────────┐   ┌──────────────┐
│   Client   │──▶│ API Gateway  │──▶│  Inference   │
└────────────┘   │ (auth, rate) │   │   Service    │
                 └──────────────┘   │  (GPU pool)  │
                                    └──────┬───────┘
                                           │
                  ┌────────────────────────┼───────────────┐
                  ▼                        ▼               ▼
            ┌──────────┐            ┌──────────────┐  ┌──────────┐
            │ Feature  │            │ Model Store  │  │  Cache   │
            │  Store   │            │ (MLflow/S3)  │  │ (Redis)  │
            └─────┬────┘            └──────┬───────┘  └──────────┘
                  │                        │
       ┌──────────┴──────────┐      ┌──────┴──────┐
       │ Online (Redis)      │      │    CI/CD    │
       │ Offline (Parquet/BQ)│      │ retrain DAG │
       └─────────────────────┘      │  (Airflow)  │
                                    └─────────────┘
Common Questions
Top 10 asked
- Design a YouTube recommendation system.
- Design a Twitter/X timeline ranker.
- Design Uber ETA prediction.
- Design credit card fraud detection.
- Design ChatGPT serving infrastructure.
- Design Google search autocomplete.
- Design Amazon "people also bought".
- Design the Instagram explore page.
- Design ad click-through prediction.
- Design a self-driving perception pipeline.
Scoring Rubric
What the interviewer scores
- Clarification (10%): did you clarify your assumptions up front?
- Metric (15%): do you understand the offline-online gap?
- Data (15%): did you handle labeling, freshness, and leakage?
- Model (15%): baseline → advanced, with trade-offs.
- System (25%): train + serve + monitor wiring.
- Scale + risk (20%): identify bottlenecks and propose mitigations.
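The "System" item carries the most weight, so it is worth being able to walk through the serving path from the reference diagram in concrete terms. The sketch below shows one possible request flow (cache, then feature store, then model) using a cache-aside pattern. Everything here is a stand-in: plain dictionaries replace Redis and the online feature store, a plain function replaces the loaded model, and the class name, the default features, and the missing cache TTL are simplifying assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class InferenceService:
    # Stand-in for the online feature store (e.g. Redis in the diagram).
    feature_store: Dict[str, List[float]]
    # Stand-in for a model loaded from the model store.
    model: Callable[[List[float]], float]
    # Stand-in for the prediction cache; a real cache would need a TTL.
    cache: Dict[str, float] = field(default_factory=dict)
    # Assumed cold-start fallback when a user has no features yet.
    default_features: List[float] = field(default_factory=lambda: [0.0, 0.0])
    cache_hits: int = 0

    def predict(self, user_id: str) -> float:
        # 1. Cache-aside: return a cached score if one exists.
        if user_id in self.cache:
            self.cache_hits += 1
            return self.cache[user_id]
        # 2. Fetch online features, falling back for cold-start users.
        features = self.feature_store.get(user_id, self.default_features)
        # 3. Run the model and populate the cache for the next request.
        score = self.model(features)
        self.cache[user_id] = score
        return score


if __name__ == "__main__":
    svc = InferenceService({"u1": [1.0, 2.0]}, model=lambda f: sum(f))
    print(svc.predict("u1"), svc.predict("u1"), svc.cache_hits)
```

The trade-off to state out loud: caching lowers latency and GPU cost, but stale scores are served until the entry is invalidated, so the acceptable staleness has to come from the clarification step.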
Pitfalls
What makes candidates fail the interview
- Starting to write the solution without clarifying first.
- Starting with the fanciest model and never mentioning a baseline.
- Skipping monitoring and retraining, as if "deploying the model is the end".
- Making capacity claims without numbers (just saying "scalable" is not enough).
- Imposing decisions without stating the trade-offs.
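To avoid the monitoring pitfall, it helps to name one concrete drift check and a retraining trigger. A minimal sketch using the Population Stability Index (PSI) on a single numeric feature is shown below. The equal-width binning, the epsilon floor, and the 0.2 alert threshold (a commonly quoted rule of thumb, not a standard) are assumptions, and both samples are assumed to be non-empty.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a reference and a live sample.

    Bins are equal-width over the range of `expected`; live values
    outside that range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data

    def histogram(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor each proportion at eps so the log below is defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = histogram(expected), histogram(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def needs_retraining(expected, actual, threshold=0.2):
    """Assumed policy: flag for retraining when PSI exceeds the threshold."""
    return psi(expected, actual) > threshold


if __name__ == "__main__":
    train = [i / 100 for i in range(100)]
    live = [0.5 + i / 200 for i in range(100)]  # shifted distribution
    print(psi(train, train), needs_retraining(train, live))
```

In an interview, the point is less the exact statistic and more the loop: log live features, compare against the training distribution on a schedule, and let the result trigger the retraining DAG.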
Practice Plan
Interview-ready in 4 weeks
- Week 1: memorize the framework and write out 3 mock designs on your own.
- Week 2: read reference solutions for the top 10 questions (Alex Xu, Chip Huyen).
- Week 3: run mock interviews with a peer; record and review them.
- Week 4: take actual interviews and write notes after each one.
Takeaway
The bottom line
System design = communication test. A framework, explicit trade-offs, and concrete numbers are what clear the interview; a perfect solution is not required.