Hybrid Retrieval: Vector + Keyword + Metadata

Single-modality retrieval fails on edge cases: dense vectors miss rare tokens and IDs, while pure lexical search misses paraphrase and semantic similarity. Hybrid retrieval fuses complementary signals (dense semantic, sparse lexical, structured metadata, temporal freshness) into stable, high-precision candidate sets. This article covers architecture, normalization, score fusion, failure handling, and evaluation.

Motivation

Failure scenarios:

The dense model misses proper nouns and SKU codes.
Pricing-change queries return a stale snapshot when there is no temporal boost.
Long natural-language questions over-weight stopwords in a sparse-only system.
Vector search returns false positives on semantically broad marketing pages when there is no lexical anchoring.

Hybrid retrieval reduces these failures by capturing orthogonal dimensions of evidence.

Component Layering

Recommended flow:

Query Embedding -> ANN search (k_vec)
Lexical Search (BM25 / SPLADE / Elasticsearch) (k_lex)
Union -> Score Normalization (per source scaling)
Metadata Filter Pass (locale, access_tier, page_type)
Diversity & Freshness Adjustments
Optional Cross/Mono Re-Ranker
Final Truncation (top K)

Keep the raw pre-fusion scores for auditing.
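The union and per-source scaling steps in the flow above might look like the following minimal sketch. The function names (`min_max`, `union_candidates`) and the record fields are illustrative assumptions, and min-max is only one possible scaling choice. Raw scores travel alongside the normalized ones so they stay available for audit.

```python
from typing import Dict, Optional

Scores = Dict[str, float]  # doc_id -> raw score from one retriever


def min_max(scores: Scores) -> Scores:
    """Scale one source's scores into [0, 1]; constant scores all map to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def union_candidates(vec_hits: Scores, lex_hits: Scores) -> Dict[str, Dict[str, Optional[float]]]:
    """Union both candidate lists, keeping raw and normalized scores per source."""
    vec_norm, lex_norm = min_max(vec_hits), min_max(lex_hits)
    merged: Dict[str, Dict[str, Optional[float]]] = {}
    for doc_id in set(vec_hits) | set(lex_hits):
        merged[doc_id] = {
            "raw_vec": vec_hits.get(doc_id),   # None if the source missed the doc
            "raw_lex": lex_hits.get(doc_id),
            "norm_vec": vec_norm.get(doc_id, 0.0),
            "norm_lex": lex_norm.get(doc_id, 0.0),
        }
    return merged
```

A document found by only one source gets a normalized score of 0.0 from the other; whether that is the right default depends on the fusion weights and is a design choice left open here.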

Query Normalization

Steps:

Normalize Unicode to NFKC.
Lowercase the text; keep a snapshot of the original casing if answer formatting needs it.
Tokenize but preserve stopwords, since semantic embeddings can use them as context.
Synonym / Alias Expansion: append alternative tokens for internal product codename mappings; use them only for sparse retrieval and keep them out of the model prompt.
Numeric & Version Extraction: capture X.Y.Z patterns for targeted lexical scoring.
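The steps above can be sketched with the Python standard library. The alias table, the `bluebird` codename, the tokenizer regex, and the `NormalizedQuery` fields are all hypothetical and chosen only for illustration.

```python
import re
import unicodedata
from dataclasses import dataclass
from typing import Dict, List, Optional

# Matches X.Y.Z version strings such as 2.3.1.
VERSION_RE = re.compile(r"\b\d+\.\d+\.\d+\b")
# Keeps dotted and hyphenated tokens (versions, codenames) in one piece.
TOKEN_RE = re.compile(r"\w+(?:[.\-]\w+)*")

# Hypothetical codename -> alias table; a real one would come from product data.
ALIASES: Dict[str, List[str]] = {"bluebird": ["search-api-v2"]}


@dataclass
class NormalizedQuery:
    original: str                  # casing snapshot for answer formatting
    text: str                      # NFKC-normalized, lowercased query
    tokens: List[str]              # stopwords are deliberately kept
    lexical_expansions: List[str]  # for sparse retrieval only, never the prompt
    versions: List[str]            # X.Y.Z strings for targeted lexical scoring


def normalize_query(query: str, aliases: Optional[Dict[str, List[str]]] = None) -> NormalizedQuery:
    aliases = ALIASES if aliases is None else aliases
    text = unicodedata.normalize("NFKC", query).lower()
    tokens = TOKEN_RE.findall(text)
    expansions = [alias for tok in tokens for alias in aliases.get(tok, [])]
    return NormalizedQuery(
        original=query,
        text=text,
        tokens=tokens,
        lexical_expansions=expansions,
        versions=VERSION_RE.findall(text),
    )
```

Keeping `lexical_expansions` in a separate field, rather than appending to `text`, makes it harder to leak internal aliases into the embedding input or the model prompt by accident.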

Metadata Filtering

Applying filters after the initial candidate union, rather than inside each retriever, reduces recall loss. Common fields: locale, access_tier, page_type, product_area, updated_bucket. Enforce security filters (tenant / tier) before scoring fusion so that leaked documents cannot influence re-ranking. Provide a debug mode that returns the filtered_out set for inspection.
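A minimal sketch of a two-stage filter pass with a debug mode follows. The function name, the dict-based document shape, and the exact-match comparison are assumptions; a real access_tier check is likely hierarchical rather than an equality test.

```python
from typing import Any, Dict, List, Tuple

Doc = Dict[str, Any]


def _mismatches(doc: Doc, required: Dict[str, Any]) -> List[str]:
    """Return the names of required fields the document does not satisfy."""
    return [name for name, expected in required.items() if doc.get(name) != expected]


def filter_candidates(
    candidates: List[Doc],
    security: Dict[str, Any],
    metadata: Dict[str, Any],
    debug: bool = False,
) -> Tuple[List[Doc], List[Dict[str, Any]]]:
    """Apply security filters first, then metadata filters.

    With debug=True the second return value lists every rejected document
    together with the stage and fields that rejected it.
    """
    kept: List[Doc] = []
    filtered_out: List[Dict[str, Any]] = []
    for doc in candidates:
        stage, failed = "security", _mismatches(doc, security)
        if not failed:
            stage, failed = "metadata", _mismatches(doc, metadata)
        if failed:
            if debug:
                filtered_out.append({"id": doc.get("id"), "stage": stage, "failed": failed})
            continue
        kept.append(doc)
    return kept, filtered_out
```

Calling this before score fusion means a document that fails the tenant or tier check never contributes a score, which is the property the paragraph above asks for.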

Re-Ranking Strategy

Run a lightweight cross-encoder over the top N candidates (10-20). If it would exceed the latency budget, degrade: skip re-ranking, or reduce the candidate count and raise the lexical weight. To justify the cost, track re_rank_delta = MRR_post - MRR_pre. Cache re-rank results for identical union sets under a short TTL.
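The degrade-and-cache behaviour could be sketched as below. Everything here is an assumption for illustration: the `scorer` callable stands in for a cross-encoder, `est_ms_per_doc` is a caller-supplied latency estimate, and the lexical-weight adjustment mentioned above is not modelled.

```python
import time
from typing import Callable, Dict, List, Optional, Sequence, Tuple

CacheKey = Tuple[str, Tuple[str, ...]]


class ReRankCache:
    """Tiny TTL cache keyed on (query, sorted candidate ids)."""

    def __init__(self, ttl_seconds: float = 60.0, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[CacheKey, Tuple[List[str], float]] = {}

    def get(self, key: CacheKey) -> Optional[List[str]]:
        entry = self._store.get(key)
        if entry is None:
            return None
        ranked, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: drop and report a miss
            return None
        return list(ranked)

    def put(self, key: CacheKey, ranked: List[str]) -> None:
        self._store[key] = (list(ranked), self._clock() + self._ttl)


def rerank_with_budget(
    query: str,
    candidates: Sequence[str],
    scorer: Callable[[str, str], float],
    budget_ms: float,
    est_ms_per_doc: float,
    top_n: int = 20,
    cache: Optional[ReRankCache] = None,
) -> Tuple[List[str], str]:
    """Re-rank as many of the top_n candidates as the budget allows.

    Returns the new ordering and a mode tag:
    "reranked", "reduced", "cached", or "skipped".
    """
    wanted = min(top_n, len(candidates))
    affordable = int(budget_ms // est_ms_per_doc)
    n = min(wanted, affordable)
    if n <= 0:
        return list(candidates), "skipped"
    head, tail = list(candidates[:n]), list(candidates[n:])
    key = (query, tuple(sorted(head)))
    if cache is not None:
        cached = cache.get(key)
        if cached is not None:
            return cached + tail, "cached"
    ranked = sorted(head, key=lambda doc_id: scorer(query, doc_id), reverse=True)
    if cache is not None:
        cache.put(key, ranked)
    return ranked + tail, ("reranked" if n == wanted else "reduced")
```

Logging the returned mode tag per request makes it possible to see how often the system degrades, which is useful context when interpreting re_rank_delta.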

Freshness & Temporal Signals

Compute freshness_weight = exp(-lambda * age_days), where lambda is tuned per content type (higher for pricing, lower for stable API content). Combine: final_score = w_sem * sem_score + w_lex * lex_score + w_fresh * freshness_weight + w_meta * meta_priors. Normalize each component first (z-score or min-max) so that no single signal dominates.
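The two formulas translate directly into code. In this sketch the decay rates and fusion weights are made-up placeholder values, not recommendations, and the inputs to `final_score` are assumed to be already normalized.

```python
import math
from typing import Dict

# Hypothetical per-day decay rates: pricing goes stale faster than API docs.
DECAY_LAMBDA: Dict[str, float] = {"pricing": 0.05, "api_reference": 0.005}

# Hypothetical fusion weights; in practice these come from tuning.
WEIGHTS: Dict[str, float] = {"sem": 0.5, "lex": 0.3, "fresh": 0.15, "meta": 0.05}


def freshness_weight(age_days: float, lam: float) -> float:
    """exp(-lambda * age_days); negative ages are clamped to zero."""
    return math.exp(-lam * max(age_days, 0.0))


def final_score(
    sem_score: float,
    lex_score: float,
    fresh: float,
    meta_prior: float,
    weights: Dict[str, float] = WEIGHTS,
) -> float:
    """Weighted sum of already-normalized components."""
    return (
        weights["sem"] * sem_score
        + weights["lex"] * lex_score
        + weights["fresh"] * fresh
        + weights["meta"] * meta_prior
    )
```

With these placeholder rates, a 20-day-old pricing page has a freshness weight of exp(-1), about 0.37, while an API reference page of the same age stays near 0.90.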

Failure Modes

Failure	Cause	Mitigation
Popularity Bias	Overweight lexical tf-idf	Cap term frequency contribution
Stale Results	Freshness weight mis-tuned	Recalibrate lambda using evaluation set
Locale Leakage	Late filter application	Move security filters earlier
Semantic Drift	Embedding model upgrade	Dual-index and A/B compare before rollout
Over-fusion Noise	Unbounded union size	Limit union, diversity pruning

Evaluation Framework

Experiments:

Ablation: vector only, lexical only, hybrid without re-rank, full pipeline; measure Recall@k and MRR for each.
Fusion Weight Tuning: grid-search the weights against a gold validation set.
Latency Budget: track mean and P95 retrieval latency per configuration.
Drift: monitor the weekly relative change in recall for head vs tail queries.

Keep an evaluation manifest that records a config hash for each run.
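As a sketch of what one manifest entry might contain, the code below computes Recall@k and MRR for a run and attaches a hash of its configuration. The entry layout, the 12-character hash prefix, and the function names are assumptions.

```python
import hashlib
import json
from typing import Any, Dict, List, Set


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none is found."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def config_hash(config: Dict[str, Any]) -> str:
    """Stable short hash of a config dict, independent of key order."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def manifest_entry(
    name: str,
    config: Dict[str, Any],
    runs: Dict[str, List[str]],   # query -> ranked doc ids
    gold: Dict[str, Set[str]],    # query -> relevant doc ids
    k: int = 10,
) -> Dict[str, Any]:
    """Summarize one configuration (e.g. one ablation arm) over the gold set."""
    queries = list(gold)
    n = len(queries) or 1  # avoid division by zero on an empty gold set
    recall = sum(recall_at_k(runs.get(q, []), gold[q], k) for q in queries) / n
    mrr = sum(reciprocal_rank(runs.get(q, []), gold[q]) for q in queries) / n
    return {
        "name": name,
        "config_hash": config_hash(config),
        "k": k,
        "recall_at_k": recall,
        "mrr": mrr,
    }
```

Running `manifest_entry` once per ablation arm against the same gold set gives directly comparable rows, and the hash ties each row to the exact weights and thresholds that produced it.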

Optimization Loop

Cycle:

Log retrieval traces: query, candidates, scores, source_tag.
Find mis-hits, signalled by low downstream faithfulness or a low citation count, and classify the root cause: missing lexical candidate, semantic false positive, or stale content.
Adjust weights / thresholds and run the offline suite.
Canary the new fusion weights behind a feature flag.
Promote on a statistically significant improvement.
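The first two steps of the cycle could be sketched as a trace record plus a rule-based classifier. The classification rules and the 180-day staleness threshold are hypothetical heuristics meant only to show the shape of the logic; real root-cause analysis would likely need richer signals.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Candidate:
    doc_id: str
    source_tag: str   # "vec", "lex", or "both"
    score: float
    age_days: float


@dataclass
class RetrievalTrace:
    query: str
    candidates: List[Candidate]  # ranked best first

    def to_json(self) -> str:
        """Serialize the trace as one JSON log line."""
        return json.dumps(asdict(self), sort_keys=True)


def classify_mis_hit(
    trace: RetrievalTrace,
    expected_doc_id: str,
    stale_after_days: float = 180.0,
) -> str:
    """Assign a coarse root cause to a known mis-hit (illustrative rules only)."""
    lexical_ids = {c.doc_id for c in trace.candidates if c.source_tag in ("lex", "both")}
    if expected_doc_id not in lexical_ids:
        return "missing_lexical_candidate"
    top = trace.candidates[0]
    if top.doc_id != expected_doc_id and top.source_tag == "vec":
        return "semantic_false_positive"
    if top.age_days > stale_after_days:
        return "stale_content"
    return "unclassified"
```

Counting the labels over a week of traces gives a rough view of which dial to adjust first: lexical recall, semantic weight, or the freshness decay rate.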

Key Takeaways

Hybrid retrieval is a system of tunable dials: instrument it relentlessly.
Apply security and access filters early to avoid leakage into scoring.
Re-ranking must justify its latency with a measurable MRR / Recall lift.
Temporal decay keeps outdated but high-authority pages from dominating.
Treat fusion changes like code: version, evaluate, and roll forward or back.