Phase 6 · Chapter 6.03

Feature Stores

One team builds a feature; ten teams use it. A feature store is a central registry of features, plus offline and online serving, plus a point-in-time guarantee.

The Problem

Why feature stores became necessary

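  • Reuse: five teams compute "user_lifetime_orders" five different ways, so the definition drifts.
  • Train/serve skew: training computes features in pandas, serving computes them in raw SQL, and the two diverge in subtle ways.
  • Latency: a DB join at serving time costs around 200ms, while features are needed within about 10ms.
  • Point-in-time: training must see each feature as it was at the time of the label, to avoid leaking future data.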
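The point-in-time requirement can be sketched in plain Python: for each label timestamp, take the latest feature value written at or before it. This is a minimal illustration, not how any particular feature store implements it; the history list and the feature_as_of helper are hypothetical.

```python
from bisect import bisect_right

# Hypothetical history of total_orders for one user: (event_ts, value), sorted by time.
history = [(1, 3), (5, 4), (9, 7)]

def feature_as_of(history, ts):
    """Return the latest value whose timestamp is <= ts, or None if nothing existed yet."""
    timestamps = [t for t, _ in history]
    idx = bisect_right(timestamps, ts)  # number of rows with timestamp <= ts
    return history[idx - 1][1] if idx > 0 else None

# A label observed at ts=6 must see the value written at ts=5, not the later one from ts=9.
print(feature_as_of(history, 6))  # 4
print(feature_as_of(history, 0))  # None (no feature existed yet)
```

Joining on the newest value (7) instead would leak information from after the label was recorded.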
Architecture

Offline + Online — dual store

text
             ┌──────────── Source ────────────┐
             │  Warehouse / Kafka / DB        │
             └────────────────┬───────────────┘
                              │ transform
                              ▼
                  ┌─────────────────────┐
                  │  Feature definition │  ← single source of truth
                  └──────┬──────────┬───┘
                         │          │
                  offline│          │online
                         ▼          ▼
              Parquet / BigQuery   Redis / DynamoDB
              (training, batch)    (low-latency serve)
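The diagram can be reduced to a toy sketch: one feature definition feeds two stores with different write patterns. Everything here is an assumption for illustration: order_features, ingest, and the in-memory list and dict merely stand in for Parquet/BigQuery and Redis/DynamoDB.

```python
from datetime import datetime, timezone

def order_features(orders):
    """Single feature definition (hypothetical), shared by both paths."""
    total = len(orders)
    avg = sum(orders) / total if total else 0.0
    return {"total_orders": total, "avg_basket": avg}

offline_store = []  # append-only history; stands in for Parquet / BigQuery
online_store = {}   # latest value per key; stands in for Redis / DynamoDB

def ingest(user_id, orders, event_ts):
    row = {"user_id": user_id, "event_ts": event_ts, **order_features(orders)}
    offline_store.append(row)    # keep every version for training
    online_store[user_id] = row  # overwrite: serving only needs the latest

ingest(42, [20.0, 25.0], datetime(2024, 1, 1, tzinfo=timezone.utc))
ingest(42, [20.0, 25.0, 30.0], datetime(2024, 1, 2, tzinfo=timezone.utc))

print(len(offline_store))                # 2
print(online_store[42]["total_orders"])  # 3
```

The offline side keeps the full history so training can look up past values; the online side keeps one row per key so a lookup is a single key read.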
Feast Example

Open-source feature store

python
# feature_repo/features.py
from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64
from datetime import timedelta

user = Entity(name="user_id", join_keys=["user_id"])

source = FileSource(
    path="s3://ml/features/user_stats.parquet",
    timestamp_field="event_ts",
)

user_stats = FeatureView(
    name="user_stats",
    entities=[user],
    ttl=timedelta(days=7),
    schema=[
        Field(name="total_orders",  dtype=Int64),
        Field(name="avg_basket",    dtype=Float32),
        Field(name="days_active",   dtype=Int64),
    ],
    source=source,
)
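The ttl=timedelta(days=7) setting above limits how far back a lookup may reach for a value. A rough sketch of that idea in plain Python follows; it is not Feast's implementation, and value_within_ttl and the sample rows are hypothetical.

```python
from datetime import datetime, timedelta

def value_within_ttl(rows, as_of, ttl):
    """rows: list of (event_ts, value). Return the latest value with
    as_of - ttl <= event_ts <= as_of, or None if every row is too old."""
    eligible = [(ts, v) for ts, v in rows if as_of - ttl <= ts <= as_of]
    return max(eligible, key=lambda r: r[0])[1] if eligible else None

rows = [(datetime(2024, 1, 1), 10), (datetime(2024, 1, 5), 12)]
ttl = timedelta(days=7)

print(value_within_ttl(rows, datetime(2024, 1, 10), ttl))  # 12 (written 5 days earlier)
print(value_within_ttl(rows, datetime(2024, 1, 20), ttl))  # None (newest row is 15 days old)
```

A TTL that is too short turns valid features into missing values; one that is too long lets outdated values through.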
Training + Serving

Same definition, two paths

python
from feast import FeatureStore
store = FeatureStore(repo_path=".")

# --- Training: point-in-time correct join ---
training_df = store.get_historical_features(
    entity_df=labels_df,          # user_id + event_ts + label (Feast's default timestamp column name is event_timestamp)
    features=[
        "user_stats:total_orders",
        "user_stats:avg_basket",
    ],
).to_df()

# --- Online serving ---
features = store.get_online_features(
    features=["user_stats:total_orders", "user_stats:avg_basket"],
    entity_rows=[{"user_id": 42}],
).to_dict()
# e.g. {"user_id": [42], "total_orders": [17], "avg_basket": [22.5]}  (illustrative values)
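Since both paths read the same definition, their outputs should agree for the same entity. One way to check this is a small parity test, sketched below under assumptions: parity_mismatches is a hypothetical helper, and the online response is taken to be a dict of lists, as in the commented output above.

```python
import math

def parity_mismatches(offline_row, online_response, rel_tol=1e-6):
    """Return the names of features whose offline and online values disagree."""
    mismatches = []
    for name, offline_value in offline_row.items():
        online_value = online_response.get(name, [None])[0]
        if online_value is None or not math.isclose(
            offline_value, online_value, rel_tol=rel_tol
        ):
            mismatches.append(name)
    return mismatches

# Hypothetical values for user 42 from the training frame and from the online store.
offline_row = {"total_orders": 17, "avg_basket": 22.5}
online_ok = {"user_id": [42], "total_orders": [17], "avg_basket": [22.5]}
online_skewed = {"user_id": [42], "total_orders": [17], "avg_basket": [22.0]}

print(parity_mismatches(offline_row, online_ok))      # []
print(parity_mismatches(offline_row, online_skewed))  # ['avg_basket']
```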
When to adopt

When a feature store is overkill, and when it is essential

  • Overkill: one model, one team, about ten features. pandas + Redis is enough.
  • Essential: multiple models share features, you need both real-time and batch serving, and you want audit/governance.
Players

Ecosystem

  • Feast: Open-source, BYO infra (Redis + S3/BQ).
  • Tecton: Commercial, end-to-end managed.
  • Vertex AI Feature Store / SageMaker Feature Store: Cloud-native managed.
  • Hopsworks: Open-source + enterprise.
Pitfalls

What breaks

  • Stale online store: the materialization job failed.
  • Wrong TTL: expired feature values get served.
  • Entity key mismatch: user_id is a str in training but an int in serving.
  • No versioning of feature definitions: semantics change silently.
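Two of these failures are cheap to guard against in the serving path. The sketch below uses an in-memory dict in place of a real online store; lookup, is_stale, the written_at field, and the choice of str as the canonical key type are all assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical online store whose keys were written as strings.
online_store = {
    "42": {"total_orders": 17, "written_at": datetime(2024, 1, 1, tzinfo=timezone.utc)}
}

def lookup(store, user_id):
    """Normalize the entity key to one canonical type before the lookup."""
    return store.get(str(user_id))

def is_stale(row, now, max_age):
    """True if the row was written longer ago than max_age."""
    return now - row["written_at"] > max_age

print(online_store.get(42))                      # None: the int key silently misses
print(lookup(online_store, 42)["total_orders"])  # 17

now = datetime(2024, 1, 3, tzinfo=timezone.utc)
print(is_stale(lookup(online_store, 42), now, timedelta(days=1)))  # True
```

The raw lookup with an int key does not raise; it just returns nothing, which is why this kind of mismatch tends to surface as degraded predictions rather than as an error.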
Mini Project

Local Feast

  1. Run pip install feast, then feast init demo.
  2. Build a Parquet file of pseudo user features from the Iris dataset.
  3. Run feast apply, then feast materialize with a start and end timestamp.
  4. Write an online lookup script.
Takeaway

Remember

Feature store = define once, serve everywhere, point-in-time correct.