Chunking
Chunking is the splitting of source documents into smaller retrieval units before embedding. Chunk size and boundary strategy determine how precisely a retriever can locate a relevant fact, trading off recall, precision, and embedding cost across a knowledge base.
Related terms: text chunking, document segmentation, passage splitting, chunk strategy
Chunking is where retrieval quality quietly succeeds or fails. The strategy can be a fixed token window, an overlapping sliding window, or boundaries that follow semantic structure such as headings and sections. Each chunk is embedded and indexed together with metadata (source, language, timestamps, content hash) so that retrieval can support filtering, deduplication, and incremental refresh. Because every downstream answer is only as good as the passage that was retrieved, careful chunking is essential for grounded responses with citations.
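As an illustration of the overlapping sliding window strategy and the per-chunk metadata described above, here is a minimal Python sketch. It makes several assumptions that are not part of this entry: tokens are approximated by whitespace splitting in place of a real tokenizer, and the names `Chunk`, `sliding_window_chunks`, and `dedupe`, along with the default `window` and `overlap` values, are hypothetical. Timestamps are left out to keep the example deterministic; a real index would likely record them as well.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str        # where the chunk came from (e.g. a file path or URL)
    language: str
    start_token: int   # offset of the first token within the document
    content_hash: str  # SHA-256 of the chunk text, used for deduplication


def sliding_window_chunks(text, source, language="en", window=200, overlap=50):
    """Split text into overlapping windows of whitespace-separated tokens."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must be non-negative and smaller than window")
    tokens = text.split()  # assumption: stand-in for a real tokenizer
    step = window - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        piece = " ".join(tokens[start:start + window])
        digest = hashlib.sha256(piece.encode("utf-8")).hexdigest()
        chunks.append(Chunk(piece, source, language, start, digest))
        if start + window >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


def dedupe(chunks):
    """Keep the first chunk seen for each content hash."""
    seen = set()
    unique = []
    for chunk in chunks:
        if chunk.content_hash not in seen:
            seen.add(chunk.content_hash)
            unique.append(chunk)
    return unique
```

A larger `overlap` reduces the chance that a fact is cut in half at a window boundary, but it also produces more chunks to embed, which is the recall versus cost trade-off noted above. Comparing each stored `content_hash` against a freshly computed one is one possible way to skip re-embedding unchanged chunks during an incremental refresh.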