Single-modality retrieval fails on edge cases: dense vectors miss rare tokens and IDs, while pure lexical search misses paraphrase and semantic similarity. Hybrid retrieval fuses complementary signals (dense semantic, sparse lexical, structured metadata, temporal freshness) to produce stable, high-precision candidate sets. This article covers architecture, normalization, score fusion, failure handling, and evaluation.
Motivation
Failure scenarios:
- The dense model misses proper nouns and SKU codes.
- Pricing-change queries return a stale snapshot when there is no temporal boost.
- Long natural-language questions over-weight stopwords in a sparse-only system.
- Vector search returns false positives on semantically broad marketing pages because nothing anchors it lexically.
Hybrid retrieval reduces these failures by capturing orthogonal dimensions of evidence.
Component Layering
Recommended flow:
- Query Embedding -> ANN search (k_vec)
- Lexical Search (BM25 / SPLADE / Elasticsearch) (k_lex)
- Union -> Score Normalization (per source scaling)
- Metadata Filter Pass (locale, access_tier, page_type)
- Diversity & Freshness Adjustments
- Optional Cross/Mono Re-Ranker
- Final Truncation (top K)
Keep the raw pre-fusion scores for auditing.
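The union step can be sketched as follows. This is a minimal illustration, not a reference implementation: the `Candidate` class, the `union_candidates` helper, the source tags `"vec"` and `"lex"`, and the sample IDs and scores are all assumptions made for the example. The point it shows is that each candidate carries its raw per-source scores forward, so later normalization and fusion never overwrite the audit trail.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    doc_id: str
    # source_tag -> raw, un-normalized score as returned by that retriever
    raw_scores: dict = field(default_factory=dict)


def union_candidates(vec_hits, lex_hits):
    """Merge ANN and lexical hits by doc_id, keeping each source's raw score."""
    merged = {}
    for source_tag, hits in (("vec", vec_hits), ("lex", lex_hits)):
        for doc_id, score in hits:
            cand = merged.setdefault(doc_id, Candidate(doc_id))
            cand.raw_scores[source_tag] = score
    return list(merged.values())


# Hypothetical retriever outputs: (doc_id, raw_score) pairs.
vec_hits = [("d1", 0.82), ("d2", 0.75)]   # cosine similarities
lex_hits = [("d2", 11.4), ("d3", 9.1)]    # BM25-style scores
union = union_candidates(vec_hits, lex_hits)
```

A document found by only one source simply has one entry in `raw_scores`; how to fill the missing score before fusion is a separate design decision.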
Query Normalization
Steps:
- Unicode normalize NFKC
- Lowercase; preserve a snapshot of the original casing if answer formatting needs it.
- Tokenize and preserve stopwords; semantic embeddings can use them as context.
- Synonym / Alias Expansion: append alternative tokens for internal product codename mappings; use them for sparse retrieval only and keep them out of the model prompt.
- Numeric & Version Extraction: capture X.Y.Z patterns for targeted lexical scoring.
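The steps above could look roughly like this in code. It is a sketch under stated assumptions: the `ALIASES` table and its single entry are invented for illustration, tokenization is plain whitespace splitting (a real system would handle punctuation), and the returned field names are hypothetical.

```python
import re
import unicodedata

# Hypothetical codename -> alias mapping; a real table would come from config.
ALIASES = {"bluebird": ["billing-v2"]}
VERSION_RE = re.compile(r"\b\d+\.\d+\.\d+\b")  # X.Y.Z patterns


def normalize_query(raw):
    text = unicodedata.normalize("NFKC", raw)
    lowered = text.lower()
    tokens = lowered.split()  # stopwords are deliberately kept
    versions = VERSION_RE.findall(lowered)
    expansions = [alias for tok in tokens for alias in ALIASES.get(tok, [])]
    return {
        "original": text,                       # casing snapshot for answer formatting
        "dense_text": lowered,                  # goes to the embedding model, no aliases
        "sparse_tokens": tokens + expansions,   # aliases are used for sparse retrieval only
        "versions": versions,                   # for targeted lexical scoring
    }


q = normalize_query("How do I upgrade Bluebird to 2.4.1")
```

Keeping `dense_text` and `sparse_tokens` separate is what prevents alias expansions from leaking into the embedding input or the model prompt.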
Metadata & Attribute Filters
Applying attribute filters after the initial candidate union reduces recall loss. Common fields: locale, access_tier, page_type, product_area, updated_bucket. Security filters (tenant / tier) are the exception: enforce them before score fusion so that leaked documents cannot influence re-ranking. Provide a debug mode that returns the filtered_out set for inspection.
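A minimal sketch of the two filter stages follows. The document and user dictionaries, the integer `access_tier` comparison, and both function names are assumptions for illustration; the only idea taken from the text is that security checks run on their own, earlier stage and that every dropped document is recorded with a reason.

```python
def security_filter(candidates, user):
    """Stage 1, before fusion: drop docs the user must never see."""
    kept, filtered_out = [], []
    for doc in candidates:
        # Assumes a higher access_tier number means a more restricted document.
        if doc["tenant"] != user["tenant"] or doc["access_tier"] > user["access_tier"]:
            filtered_out.append({"id": doc["id"], "reason": "security"})
        else:
            kept.append(doc)
    return kept, filtered_out


def attribute_filter(candidates, required):
    """Stage 2, after the union: drop docs whose metadata does not match."""
    kept, filtered_out = [], []
    for doc in candidates:
        failed = [k for k, v in required.items() if doc.get(k) != v]
        if failed:
            filtered_out.append({"id": doc["id"], "reason": "attribute:" + ",".join(failed)})
        else:
            kept.append(doc)
    return kept, filtered_out


docs = [
    {"id": "d1", "tenant": "acme", "access_tier": 1, "locale": "en"},
    {"id": "d2", "tenant": "other", "access_tier": 1, "locale": "en"},
    {"id": "d3", "tenant": "acme", "access_tier": 1, "locale": "de"},
    {"id": "d4", "tenant": "acme", "access_tier": 3, "locale": "en"},
]
user = {"tenant": "acme", "access_tier": 2}

visible, dropped_security = security_filter(docs, user)
kept, dropped_attr = attribute_filter(visible, {"locale": "en"})
filtered_out = dropped_security + dropped_attr  # what a debug mode would return
```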
Re-Ranking Strategy
Run a lightweight cross-encoder over the top N (10-20) candidates. If it exceeds the latency budget, degrade: skip the re-rank, or reduce the candidate count and raise the lexical weight. Track re_rank_delta = MRR_post - MRR_pre to justify the cost. Cache re-rank results for identical union sets under a short TTL.
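One way to express the degrade path is to decide up front how many candidates the budget can afford. The sketch below assumes a per-document cost estimate (`est_ms_per_doc`) is available from earlier measurements; that parameter, `min_n`, the mode labels, and the `score_fn` callback standing in for the cross-encoder are all hypothetical. Raising the lexical weight in the degraded modes is left to the caller.

```python
def rerank_with_budget(query, candidates, score_fn, budget_ms, est_ms_per_doc,
                       top_n=20, min_n=5):
    """Re-rank the head of `candidates` (already sorted by fused score).

    Returns (ranked_candidates, mode) where mode is "full", "reduced" or "skipped".
    """
    affordable = int(budget_ms // est_ms_per_doc) if est_ms_per_doc > 0 else top_n
    n = min(top_n, affordable, len(candidates))
    if n < min(min_n, len(candidates)):
        # Too few affordable docs for the re-rank to be worthwhile: keep fused order.
        return list(candidates), "skipped"
    # sorted() is stable, so ties keep their fused-score order.
    head = sorted(candidates[:n], key=lambda d: score_fn(query, d), reverse=True)
    mode = "full" if n == min(top_n, len(candidates)) else "reduced"
    return head + list(candidates[n:]), mode


# Toy data: "rel" stands in for a cross-encoder relevance score.
cands = [{"id": f"d{i}", "rel": i % 4} for i in range(10)]
toy_score = lambda query, doc: doc["rel"]

full, mode_full = rerank_with_budget("q", cands, toy_score, budget_ms=100, est_ms_per_doc=5)
reduced, mode_reduced = rerank_with_budget("q", cands, toy_score, budget_ms=30, est_ms_per_doc=5)
skipped, mode_skipped = rerank_with_budget("q", cands, toy_score, budget_ms=10, est_ms_per_doc=5)
```

Returning the mode alongside the ranking makes it easy to log how often the system degrades, which feeds directly into the latency-versus-lift trade-off.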
Freshness & Temporal Signals
Compute freshness_weight = exp(-lambda * age_days), where lambda is tuned per content type (higher for pricing, lower for stable API docs). Combine: final_score = w_sem * sem_score + w_lex * lex_score + w_fresh * freshness_weight + w_meta * meta_priors. Normalize each component first (z-score or min-max) so that no single signal dominates.
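The formula can be sketched with min-max normalization as below. The weights, the lambda value, and the two sample candidates are illustrative assumptions, not tuned values, and the sketch assumes every candidate already has a score for every component (a missing source score would need a fill rule first).

```python
import math


def min_max(values):
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def freshness_weight(age_days, lam):
    return math.exp(-lam * age_days)


def fuse(cands, weights, lam):
    """Return (doc_id, final_score) pairs sorted by final_score, best first."""
    sem = min_max([c["sem"] for c in cands])
    lex = min_max([c["lex"] for c in cands])
    fresh = min_max([freshness_weight(c["age_days"], lam) for c in cands])
    meta = min_max([c["meta_prior"] for c in cands])
    scored = []
    for i, c in enumerate(cands):
        final_score = (weights["sem"] * sem[i] + weights["lex"] * lex[i]
                       + weights["fresh"] * fresh[i] + weights["meta"] * meta[i])
        scored.append((c["id"], final_score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


cands = [
    {"id": "A", "sem": 0.90, "lex": 5.0, "age_days": 400, "meta_prior": 0.0},
    {"id": "B", "sem": 0.80, "lex": 12.0, "age_days": 2, "meta_prior": 0.0},
]
weights = {"sem": 0.4, "lex": 0.3, "fresh": 0.2, "meta": 0.1}
ranking = fuse(cands, weights, lam=0.01)
```

In this toy case A wins on semantic similarity alone, but B ranks first once lexical match and freshness are weighed in, which is the behavior the fusion is meant to produce for stale, semantically close pages.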
Failure Modes
| Failure | Cause | Mitigation |
|---|---|---|
| Popularity Bias | Overweight lexical tf-idf | Cap term frequency contribution |
| Stale Results | Freshness weight mis-tuned | Recalibrate lambda using evaluation set |
| Locale Leakage | Late filter application | Move security filters earlier |
| Semantic Drift | Embedding model upgrade | Dual-index and A/B compare before rollout |
| Over-fusion Noise | Unbounded union size | Limit union, diversity pruning |
Evaluation Framework
Experiments:
- Ablation: vector only, lexical only, hybrid without re-rank, full; measure Recall@k and MRR for each.
- Fusion Weight Tuning: grid-search the weights against a validation gold set.
- Latency Budget: track mean and P95 retrieval latency per configuration.
- Drift: monitor weekly relative recall change for head vs tail queries.
Keep an evaluation manifest with config hashes.
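The two metrics and the config hash can be sketched in a few lines. The function names, the 12-character hash truncation, and the sample runs are assumptions for illustration; the metric definitions themselves are the standard ones (fraction of relevant documents found in the top k, and mean reciprocal rank of the first relevant hit).

```python
import hashlib
import json


def recall_at_k(ranked, relevant, k):
    """Fraction of relevant doc IDs that appear in the top k of `ranked`."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)


def mrr(runs):
    """Mean reciprocal rank over (ranked_ids, relevant_ids) pairs."""
    total = 0.0
    for ranked, relevant in runs:
        rel = set(relevant)
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in rel:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(runs) if runs else 0.0


def config_hash(config):
    """Stable short hash of a config dict, independent of key order."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


runs_pre = [(["a", "b", "c"], ["b"]), (["x", "y"], ["x"])]
runs_post = [(["b", "a", "c"], ["b"]), (["x", "y"], ["x"])]
re_rank_delta = mrr(runs_post) - mrr(runs_pre)

manifest_entry = {
    "config_hash": config_hash({"w_sem": 0.4, "w_lex": 0.3, "k_vec": 50, "k_lex": 50}),
    "mrr_pre": mrr(runs_pre),
    "mrr_post": mrr(runs_post),
}
```

Storing the hash next to each metric row lets a later reader tie any number in the manifest back to the exact weights and cutoffs that produced it.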
Optimization Loop
Cycle:
- Log retrieval traces: query, candidates, scores, source_tag.
- Find mis-hits (low downstream faithfulness or low citation count) and classify the root cause: missing lexical candidate, semantic false positive, or stale content.
- Adjust weights / thresholds; run the offline suite.
- Canary the new fusion weights behind a feature flag.
- Promote on a statistically significant improvement.
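The first two steps of the cycle could be supported by something as small as the sketch below. The trace layout, the root-cause label strings, and both helper names are hypothetical; the labels are assumed to come from a human or a downstream check, not from this code.

```python
import json
from collections import Counter

ROOT_CAUSES = {"missing_lexical_candidate", "semantic_false_positive", "stale_content"}


def make_trace(query, candidates):
    """Serialize one retrieval trace as a single JSON line."""
    return json.dumps({
        "query": query,
        "candidates": [
            {"doc_id": c["doc_id"], "scores": c["scores"], "source_tag": c["source_tag"]}
            for c in candidates
        ],
    }, sort_keys=True)


def tally_root_causes(labeled_mis_hits):
    """Count root causes over (query, cause) pairs; reject unknown labels."""
    counts = Counter()
    for query, cause in labeled_mis_hits:
        if cause not in ROOT_CAUSES:
            raise ValueError(f"unknown root cause {cause!r} for query {query!r}")
        counts[cause] += 1
    return counts


trace = make_trace("price of plan X", [
    {"doc_id": "d7", "scores": {"vec": 0.81, "lex": 3.2}, "source_tag": "vec+lex"},
])
counts = tally_root_causes([
    ("price of plan X", "stale_content"),
    ("SKU 4411-B specs", "missing_lexical_candidate"),
    ("old pricing page", "stale_content"),
])
```

The tally indicates which dial to adjust first: a majority of stale-content mis-hits points at the freshness lambda, while missing lexical candidates point at k_lex or alias coverage.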
Key Takeaways
- Hybrid retrieval is a system of tunable dials; instrument it relentlessly.
- Apply security and access filters early; avoid leakage into scoring.
- Re-ranking must justify its latency with a measurable MRR / Recall lift.
- Temporal decay keeps outdated, high-authority pages from dominating.
- Treat fusion changes like code: version, evaluate, roll forward or back.