Phase 12 · Chapter 12.03
Production AI Best Practices
Principles you won't find in tutorials: the kind you only learn by getting burned in production.
Reliability
Keep the system from breaking
- Define SLOs (99.5% uptime, p99 < 200 ms), not vague targets.
- Graceful degradation: when the model is down, fall back to a cached or rule-based response.
- Circuit breaker: fail fast when a downstream dependency is slow.
- Idempotency key: avoid duplicate writes on retry.
- Chaos engineering: test by killing pods in production.
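The circuit breaker and graceful-degradation bullets combine naturally. The following is a minimal sketch, not a production implementation: the class name `CircuitBreaker`, the parameters `max_failures` and `reset_after`, and the injectable `clock` are all assumptions made for illustration. After a run of failures it stops calling the downstream service and serves the fallback until a cooldown has passed.

```python
import time


class CircuitBreaker:
    """Fail fast after repeated downstream failures, retry after a cooldown."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # hypothetical threshold
        self.reset_after = reset_after    # hypothetical cooldown in seconds
        self.clock = clock                # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, primary, fallback):
        # While open, skip the downstream call until the cooldown ends.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()
            # Half-open: allow one trial call; a single failure reopens.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = primary()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()  # graceful degradation: cached / rule-based answer
        self.failures = 0
        return result
```

Here `primary` would wrap the model call and `fallback` would return a cached or rule-based answer; both are plain zero-argument callables in this sketch.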
Cost
Where 80% of ML cost optimization comes from
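One of the cheaper savings listed in this section is the embedding cache keyed by hash(input). A minimal in-memory sketch, assuming a hypothetical `embed_fn` callable that stands in for the real embedding model; the class name and the whitespace/case normalization are also assumptions (skip the lowercasing if the model is case-sensitive):

```python
import hashlib


class EmbeddingCache:
    """Cache embeddings under a hash of the input text."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn  # hypothetical: text -> vector
        self.store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(text):
        # Assumption: inputs differing only in case or spacing may share a vector.
        normalized = " ".join(text.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get(self, text):
        k = self.key(text)
        if k in self.store:
            self.hits += 1
            return self.store[k]
        self.misses += 1
        vec = self.embed_fn(text)  # only pay for the model call on a miss
        self.store[k] = vec
        return vec

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Tracking `hit_rate()` makes it possible to check whether a real workload actually reaches the hit range quoted in the table. A shared store such as Redis would replace the dict in a multi-replica deployment.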
Layer          Saving technique
─────────────────────────────────────────────────────
GPU inference  Batching + quantization (FP16/INT8) → 5x
Embedding      Cache hash(input) → vec → 60-80% hit rate
LLM token      Model routing (small → big fallback) → 70%
Storage        S3 Intelligent-Tiering → 30-50%
Train job      Spot instances + checkpointing → 70%
Data egress    Keep traffic in the same region → cross-region transfer is expensive
Security
AI-specific threats
- Prompt injection: user input can override the system prompt. Sanitize input and filter output.
- PII leak: the model can regurgitate training data. Use a redaction pipeline and a DLP scanner.
- Model theft: rate limiting plus query-pattern detection.
- Adversarial examples: imperceptible perturbations in CV. Use robust training.
- Supply chain: a Hugging Face (HF) model can execute untrusted code. Sandbox it and use signed models.
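The redaction step from the PII bullet can be sketched with a couple of regular expressions. This is a deliberately small illustration: the `PATTERNS` table and the `redact` function are hypothetical names, and the two patterns cover only simple email and phone formats, far less than a real DLP scanner.

```python
import re

# Assumption: simple illustrative patterns, not an exhaustive PII definition.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text):
    """Replace each match with a typed placeholder and count the replacements."""
    counts = {}
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts


if __name__ == "__main__":
    print(redact("Mail jane.doe@example.com or call +1 (555) 010-9999 today"))
```

Returning the counts alongside the text lets the pipeline log how much was redacted without logging the sensitive values themselves.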
Governance
Audit, compliance, fairness
- Model card: document training data, metrics, intended use, and limitations.
- Lineage: be able to trace data → feature → model → prediction.
- Bias audit: group-wise metrics across protected attributes (gender, race, age).
- Right to explanation: explain individual decisions with SHAP / LIME.
- Data retention: GDPR/CCPA, honor deletion requests.
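A bias audit in the sense of the list above comes down to computing the same metric separately per group and comparing. The sketch below assumes binary labels (0/1) and uses hypothetical helper names `groupwise_metrics` and `parity_gap`; which metric and which gap counts as acceptable is a policy decision the code does not answer.

```python
from collections import defaultdict


def groupwise_metrics(y_true, y_pred, groups):
    """Accuracy and positive-prediction rate per group for binary labels."""
    stats = defaultdict(lambda: {"n": 0, "correct": 0, "positive": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        s = stats[g]
        s["n"] += 1
        s["correct"] += int(t == p)
        s["positive"] += int(p == 1)
    return {
        g: {
            "accuracy": s["correct"] / s["n"],
            "positive_rate": s["positive"] / s["n"],
        }
        for g, s in stats.items()
    }


def parity_gap(metrics, key="positive_rate"):
    """Largest difference in one metric between any two groups."""
    values = [m[key] for m in metrics.values()]
    return max(values) - min(values)
```

Equal accuracy can hide very different positive rates between groups, which is why the sketch reports both.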
On-call
What to do on a 3 AM page
1. Acknowledge (within 5 min): silence the page
2. Triage: how much user-facing impact? What scope?
3. Mitigate: rollback first, root cause later
4. Communicate: status page + Slack incident channel
5. Resolve: confirm the service is healthy
6. Postmortem: blameless, with action items and a timeline
"Fix first, understand later": in production, that is the rule.
Team
Soft skills of a senior MLOps engineer
- RFC culture: for big changes, write it up, get review, then ship.
- Runbook: step-by-step docs for common incidents, so a junior on-call can handle them.
- Mentor: teach during code review, don't just approve.
- Push back: demand ROI before agreeing to "add AI".
- Boring tech: prefer a proven stack over hype tools.
Pitfalls
Mistakes even seniors make
- "Works on my machine": staging-prod environment parity is broken.
- Manual deploys: a recipe for disaster at 5 PM on a Friday.
- Alert fatigue from monitoring: everyone ends up muting the alerts.
- No model retraining schedule: silent drift.
- Documentation written only for code review, not for real users.
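The silent-drift pitfall can be made visible with a scheduled distribution check. One common choice, used here as an assumption rather than something the list prescribes, is the Population Stability Index (PSI) between a reference sample and a live sample of one numeric feature. The function name `psi`, the bin count, and the `eps` floor are illustrative.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a reference and a live sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero-width bins for constant data

    def fractions(values):
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range live values into the first or last bin.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Values around 0.2 to 0.25 are often quoted as rule-of-thumb alert thresholds; treat any threshold as something to calibrate on your own data before wiring it to a retraining trigger.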
Checklist
Before a production launch
- SLOs + alerts defined? On-call schedule in place?
- Rollback strategy tested?
- Cost dashboard ready?
- Security scan (Snyk, Trivy) clean?
- Model card + runbook published?
- Load test passed (2x peak)?
Takeaway
The bottom line
Production AI = boring engineering + careful experimentation. The most reliable system wins, not the fanciest model.