Glossary

Chunking

Chunking is the process of splitting source documents into smaller retrieval units before embedding them. Chunk size and boundary strategy determine how precisely a retriever can locate a relevant fact, balancing recall, precision, and embedding cost across the knowledge base.

Synonyms: text chunking, document segmentation, passage splitting, chunk strategy

Chunking is where retrieval quality is quietly won or lost. The strategy can be a fixed token window, an overlapping sliding window, or boundaries that follow semantic structure such as headings and sections. Each chunk is embedded and indexed with metadata (source, language, timestamps, content hash) so that retrieval can filter, deduplicate, and refresh incrementally. Because every downstream answer is only as good as the passage it retrieves, deliberate chunking is a prerequisite for grounded, citable responses.
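As a rough illustration of the overlapping sliding window and the per-chunk metadata described above, here is a minimal Python sketch. It makes several assumptions that are not from the original text: whitespace-separated words stand in for tokens, the function name `chunk_words` and its parameters are hypothetical, and timestamps are left out for brevity. Setting `overlap=0` gives a plain fixed window.

```python
import hashlib


def chunk_words(text, size=200, overlap=40, source="unknown", language="en"):
    """Split text into overlapping word windows and attach basic metadata.

    Assumption: words approximate tokens; a real pipeline would count
    tokens with the tokenizer of its embedding model.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")

    words = text.split()
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        body = " ".join(words[start:start + size])
        chunks.append({
            "text": body,
            "source": source,
            "language": language,
            "start_word": start,
            # The hash lets an indexer skip unchanged chunks and drop duplicates.
            "content_hash": hashlib.sha256(body.encode("utf-8")).hexdigest(),
        })
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

Calling `chunk_words(document_text, size=200, overlap=40, source="handbook.md")` would return a list of dictionaries ready to be embedded. The file name here is only a placeholder.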

Frequently asked questions

What makes a good chunk?
A good chunk is semantically self-contained, sized so that a single fact is not split across boundaries, and carries stable metadata so that it can be filtered, refreshed, and cited reliably.
How does chunking affect answer quality?
Chunks that are too large dilute relevance and waste tokens, while chunks that are too small break context and lose meaning. Boundary choices directly shape the recall and grounding of generated answers.
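
To make the point about boundary choices concrete, the sketch below splits on semantic structure instead of a fixed size, so each section stays in one chunk. It assumes the source uses Markdown-style `#` headings, which the original text does not specify. The function name `split_by_headings` is hypothetical.

```python
import re


def split_by_headings(document):
    """Return one chunk per heading section of a Markdown-style document.

    Assumption: headings are lines starting with 1 to 6 '#' characters.
    Text before the first heading is kept with heading=None.
    """
    sections = []
    title, lines = None, []

    def flush():
        # Store the section collected so far, skipping an empty preamble.
        if title is not None or any(line.strip() for line in lines):
            sections.append({"heading": title, "text": "\n".join(lines).strip()})

    for line in document.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)$", line)
        if match:
            flush()
            title, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    flush()
    return sections
```

Sections that turn out too long for the embedding model could then be passed through a size-based splitter, while short ones stay whole.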