Hybrid Retrieval: Vector + Keyword + Metadata

Engineering resilient hybrid retrieval for website RAG: blending vector, lexical, metadata, and temporal signals.

retrieval • hybrid • search • rag

Single-modality retrieval fails on edge cases: dense vectors miss rare tokens and IDs, while pure lexical search misses paraphrases and semantic similarity. Hybrid retrieval fuses complementary signals (dense semantic, sparse lexical, structured metadata, temporal freshness) to produce stable, high-precision candidate sets. This article covers architecture, normalization, scoring fusion, failure handling, and evaluation.

Motivation

Failure scenarios:

  • Proper nouns and SKU codes missed by the dense model.
  • Pricing-change queries surfacing a stale snapshot because there is no temporal boost.
  • Long natural-language questions over-weighting stopwords in a sparse-only system.
  • Vector false positives on semantically broad marketing pages, with no lexical anchoring.

Hybrid retrieval reduces these failures by capturing orthogonal dimensions of evidence.

Component Layering

Recommended flow:

  1. Query Embedding -> ANN search (k_vec)
  2. Lexical Search (BM25 / SPLADE / Elasticsearch) (k_lex)
  3. Union -> Score Normalization (per source scaling)
  4. Metadata Filter Pass (locale, access_tier, page_type)
  5. Diversity & Freshness Adjustments
  6. Optional Cross/Mono Re-Ranker
  7. Final Truncation (top K)

Keep the raw pre-fusion scores for auditing.
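Steps 1 to 3 of the flow can be sketched as follows. This is a minimal illustration, not a reference implementation: the `Candidate` class, the `"vec"` / `"lex"` source labels, and the choice of min-max scaling are assumptions made for the example, and the hits are assumed to arrive as plain `{doc_id: score}` mappings from whichever ANN and lexical backends are in use.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    doc_id: str
    raw: dict = field(default_factory=dict)   # raw pre-fusion scores, kept for audit
    norm: dict = field(default_factory=dict)  # per-source normalized scores


def min_max(scores):
    """Scale a {doc_id: score} mapping to [0, 1]; constant inputs map to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def union_candidates(vec_hits, lex_hits):
    """Merge ANN and lexical hits into one candidate set keyed by doc_id."""
    merged = {}
    for source, hits in (("vec", vec_hits), ("lex", lex_hits)):
        normalized = min_max(hits)  # scaling is done per source, before merging
        for doc_id, score in hits.items():
            cand = merged.setdefault(doc_id, Candidate(doc_id))
            cand.raw[source] = score
            cand.norm[source] = normalized[doc_id]
    return merged
```

A document found by only one source simply has no entry for the other; how to score that gap (zero, a floor value, or a prior) is a design choice left open here.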

Query Normalization

Steps:

  • Normalize Unicode to NFKC.
  • Lowercase; preserve a snapshot of the original casing if answer formatting needs it.
  • Tokenize and preserve stopwords; semantic embeddings can make use of the context.
  • Synonym / Alias Expansion: append alternative tokens for internal product codename mappings; use them only for sparse retrieval and do not put them into the model prompt.
  • Numeric & Version Extraction: capture X.Y.Z patterns for targeted lexical scoring.
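The steps above can be sketched with the standard library alone. The alias table, the whitespace tokenizer, and the field names in the returned dictionary are hypothetical and stand in for whatever the real pipeline uses; the version pattern only covers the plain X.Y.Z form.

```python
import re
import unicodedata

# Hypothetical alias table: internal codename -> public names (illustrative only).
ALIASES = {"bluebird": ["acme sync"]}

# Matches plain three-part versions such as 2.4.1.
VERSION_RE = re.compile(r"\b\d+\.\d+\.\d+\b")


def normalize_query(raw):
    """Return the query in the forms needed by the dense and sparse retrievers."""
    nfkc = unicodedata.normalize("NFKC", raw)
    lowered = nfkc.lower()
    tokens = lowered.split()  # naive whitespace tokenization; stopwords are kept
    expansions = [alias for tok in tokens for alias in ALIASES.get(tok, [])]
    return {
        "original": raw,                      # casing snapshot for answer formatting
        "dense_text": lowered,                # embedded as-is, without alias expansion
        "sparse_terms": tokens + expansions,  # aliases go to sparse retrieval only
        "versions": VERSION_RE.findall(lowered),
    }
```

Keeping `original` alongside the lowered text is one way to honor the casing-snapshot step without running normalization twice.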

Metadata & Attribute Filters

Applying filters after the initial candidate union reduces recall loss. Common fields: locale, access_tier, page_type, product_area, updated_bucket. Enforce security filters (tenant / tier) before scoring fusion so that leaked documents cannot influence re-ranking. Provide a debug mode that returns the filtered_out set for inspection.
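One way to keep the two kinds of filter separate is shown below: a hard security gate that runs first, and a softer metadata pass that can report what it dropped. The function names, the dictionary-shaped candidates, and the `required` mapping are assumptions for illustration.

```python
def enforce_security(candidates, allowed_tiers):
    """Hard gate on access_tier; intended to run before any fusion scoring."""
    return [c for c in candidates if c["access_tier"] in allowed_tiers]


def filter_metadata(candidates, required, debug=False):
    """Apply metadata filters to the candidate union.

    required maps a field name to the set of accepted values.
    Returns (kept, filtered_out); filtered_out is populated only when debug=True.
    """
    kept, filtered_out = [], []
    for cand in candidates:
        failed = [name for name, accepted in required.items()
                  if cand.get(name) not in accepted]
        if not failed:
            kept.append(cand)
        elif debug:
            filtered_out.append({"doc_id": cand["doc_id"], "failed": failed})
    return kept, filtered_out
```

Documents removed by `enforce_security` are deliberately not echoed back in the debug output, since listing them could itself leak their existence.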

Re-Ranking Strategy

Run a lightweight cross-encoder on the top N (10-20) candidates. If it would exceed the latency budget, degrade gracefully: skip the re-rank, or reduce the candidate count and raise the lexical weight. To justify the cost, track re_rank_delta = MRR_post - MRR_pre. Cache re-rank results for identical union sets with a short TTL.
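A rough sketch of the skip-on-budget and short-TTL cache ideas follows. It covers only the simplest degrade path (skipping the re-rank); `score_fn` is a placeholder for a real cross-encoder call, and `RerankCache`, `remaining_ms`, and `est_cost_ms` are hypothetical names introduced for the example.

```python
import hashlib
import time


class RerankCache:
    """Tiny in-memory TTL cache keyed by the query plus the set of candidate ids."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    @staticmethod
    def key(query, doc_ids):
        # Sorting makes the key depend on set membership, not on input order.
        payload = query + "\x00" + "\x00".join(sorted(doc_ids))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, key):
        hit = self._store.get(key)
        if hit is None:
            return None
        expires_at, value = hit
        if self.clock() >= expires_at:
            del self._store[key]
            return None
        return value

    def put(self, key, value):
        self._store[key] = (self.clock() + self.ttl, value)


def rerank_or_degrade(query, ranked_ids, score_fn, cache,
                      remaining_ms, est_cost_ms, top_n=20):
    """Re-rank the first top_n ids, or return the input order if over budget."""
    if remaining_ms < est_cost_ms:
        return ranked_ids, "skipped"
    head, tail = ranked_ids[:top_n], ranked_ids[top_n:]
    key = cache.key(query, head)
    cached = cache.get(key)
    if cached is not None:
        return cached + tail, "cached"
    reranked = sorted(head, key=lambda d: score_fn(query, d), reverse=True)
    cache.put(key, reranked)
    return reranked + tail, "reranked"
```

Returning the status string alongside the ranking makes it easy to log how often the re-ranker was skipped, which feeds directly into the cost-versus-lift discussion.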

Freshness & Temporal Signals

Compute freshness_weight = exp(-lambda * age_days), where lambda is tuned per content type (higher for pricing, lower for stable API docs). Combine: final_score = w_sem * sem_score + w_lex * lex_score + w_fresh * freshness_weight + w_meta * meta_priors. Normalize each component first (z-score or min-max) so that no single signal dominates.
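The two formulas translate almost directly into code. In the sketch below the lambda values, the weights, and the half-life helper are illustrative assumptions rather than recommended settings; the inputs to `fuse` are assumed to be normalized already.

```python
import math

# Hypothetical decay rates per content type (per day); tune on an evaluation set.
LAMBDA_BY_TYPE = {"pricing": 0.05, "api": 0.005}

# Hypothetical fusion weights.
WEIGHTS = {"sem": 0.5, "lex": 0.3, "fresh": 0.1, "meta": 0.1}


def lambda_from_half_life(half_life_days):
    """Convert a half-life in days to lambda, since exp(-lambda * h) = 0.5."""
    return math.log(2) / half_life_days


def freshness_weight(age_days, lam):
    """Exponential decay: 1.0 for brand-new content, approaching 0 with age."""
    return math.exp(-lam * age_days)


def fuse(sem_score, lex_score, fresh, meta_prior, weights=WEIGHTS):
    """Weighted linear fusion of already-normalized component scores."""
    return (weights["sem"] * sem_score
            + weights["lex"] * lex_score
            + weights["fresh"] * fresh
            + weights["meta"] * meta_prior)
```

Expressing lambda through a half-life ("pricing pages lose half their freshness in two weeks") is often easier to reason about than the raw rate.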

Failure Modes

| Failure | Cause | Mitigation |
|---|---|---|
| Popularity Bias | Overweight lexical tf-idf | Cap term frequency contribution |
| Stale Results | Freshness weight mis-tuned | Recalibrate lambda using evaluation set |
| Locale Leakage | Late filter application | Move security filters earlier |
| Semantic Drift | Embedding model upgrade | Dual-index and A/B compare before rollout |
| Over-fusion Noise | Unbounded union size | Limit union, diversity pruning |

Evaluation Framework

Experiments:

  • Ablation: vector only, lexical only, hybrid without rerank, full; measure Recall@k and MRR for each.
  • Fusion Weight Tuning: grid search the weights against a validation gold set.
  • Latency Budget: track mean and P95 retrieval latency per configuration.
  • Drift: monitor the weekly relative recall change for head vs tail queries.

Keep an evaluation manifest with config hashes.
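For the ablation runs and the manifest, the two metrics and a stable config hash can be computed as below. This is a small sketch: the `(ranked_ids, relevant_ids)` input shape and the 12-character hash prefix are assumptions chosen for brevity.

```python
import hashlib
import json


def recall_at_k(ranked, relevant, k):
    """Fraction of relevant ids that appear in the top k of the ranking."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)


def mrr(runs):
    """Mean reciprocal rank over a list of (ranked_ids, relevant_ids) pairs."""
    if not runs:
        return 0.0
    total = 0.0
    for ranked, relevant in runs:
        relevant_set = set(relevant)
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in relevant_set:
                total += 1.0 / rank  # only the first relevant hit counts
                break
    return total / len(runs)


def config_hash(config):
    """Short, order-independent hash of a JSON-serializable config dict."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
```

Running `mrr` on the same queries before and after re-ranking gives the two terms of re_rank_delta, and storing `config_hash` next to each result row ties the numbers to the exact weights that produced them.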

Optimization Loop

Cycle:

  1. Log retrieval traces: query, candidates, scores, source_tag.
  2. Find mis-hits (low downstream faithfulness or low citation count) and classify the root cause: missing lexical candidate, semantic false positive, or stale content.
  3. Adjust weights / thresholds; run the offline suite.
  4. Canary the new fusion weights behind a feature flag.
  5. Promote on a statistically significant improvement.
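Steps 1 and 2 of the cycle might look like the sketch below. The trace layout, the `"vec"` / `"lex"` values for `source_tag`, and the classification heuristics are all assumptions for illustration; a real pipeline would likely need per-source candidate lists to tell a missing lexical candidate apart from a missing vector one.

```python
import json


def trace_record(query, candidates, config_id):
    """Serialize one retrieval trace as a JSON line.

    candidates: list of dicts with doc_id, scores, and source_tag,
    assumed to be sorted by final score, best first.
    """
    return json.dumps(
        {"query": query, "config": config_id, "candidates": candidates},
        sort_keys=True,
    )


def classify_miss(trace, gold_id, stale_ids=frozenset()):
    """Assign a coarse root-cause label to a trace, given the expected document."""
    candidates = trace["candidates"]
    if gold_id not in [c["doc_id"] for c in candidates]:
        return "missing_candidate"
    top = candidates[0]
    if top["doc_id"] in stale_ids:
        return "stale_content"
    if top["doc_id"] != gold_id and top["source_tag"] == "vec":
        return "semantic_false_positive"
    return "other"
```

Counting these labels per week gives a simple signal for which dial (lexical recall, vector weight, or freshness lambda) to adjust in step 3.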

Key Takeaways

  • Hybrid retrieval is a system of tunable dials: instrument it relentlessly.
  • Apply security and access filters early; avoid leakage into scoring.
  • Re-ranking must justify its latency with a measurable MRR / Recall lift.
  • Temporal decay keeps outdated, high-authority pages from dominating.
  • Treat fusion changes like code: version, evaluate, roll forward or back.